Feature #1036

Search for similar names

Added by Alexander Watzinger 5 months ago. Updated 5 months ago.

Status: Closed
Priority: Low
Category: Backend
Target version:
Start date: 2019-06-07
Estimated time: 16.00 h

Description

To prevent duplicates and catch spelling mistakes, a search for similar names will be implemented.

  • Add the Python library fuzzywuzzy, which uses Levenshtein distance to measure the difference between strings
  • Option to select the minimum similarity ratio
  • Option to select the entity class to search
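
The planned search with its two options could look roughly like the sketch below. It is only an illustration: it uses difflib from the Python standard library as a stand-in for fuzzywuzzy, so the scores may differ from what that library returns, and the function names, the (name, class) tuples, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_ratio(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two names, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def find_similar_names(entities, class_name, min_ratio=90):
    """Return (score, name_a, name_b) for name pairs of one class.

    `entities` is a hypothetical list of (name, class_name) tuples.
    Only pairs scoring at least `min_ratio` are returned, best first.
    """
    names = [name for name, cls in entities if cls == class_name]
    scored = (
        (similarity_ratio(a, b), a, b)
        for a, b in combinations(names, 2)  # each pair is compared once
    )
    return sorted((row for row in scored if row[0] >= min_ratio), reverse=True)


# Hypothetical sample data: the class option limits the search to places.
entities = [
    ("Vienna", "place"),
    ("Viena", "place"),
    ("Graz", "place"),
    ("Vienna", "person"),
]
result = find_similar_names(entities, "place", min_ratio=80)
```

Lowering `min_ratio` yields more candidates at the cost of more false positives, which is why the ratio is exposed as an option rather than fixed.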

Ideas for next version

  • Search for a manually entered string
  • Add a check when inserting an entity and warn if a similar name already exists
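
Both ideas above amount to comparing one given string against the existing names. A minimal sketch, again using difflib as a stand-in for fuzzywuzzy; the function names, the default threshold of 85, and the in-memory list standing in for the database are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similar_existing_names(candidate, existing_names, min_ratio=85):
    """Return (score, name) for existing names scoring >= min_ratio, best first."""
    matches = []
    for name in existing_names:
        ratio = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
        score = round(100 * ratio)
        if score >= min_ratio:
            matches.append((score, name))
    return sorted(matches, reverse=True)


def insert_with_warning(candidate, existing_names, min_ratio=85):
    """Insert the name and return one warning per similar name found beforehand."""
    warnings = [
        f"'{candidate}' is similar to existing name '{name}' ({score}%)"
        for score, name in similar_existing_names(candidate, existing_names, min_ratio)
    ]
    existing_names.append(candidate)  # the insert still happens; the user is only warned
    return warnings


# Hypothetical in-memory stand-in for the stored entity names.
names = ["Salzburg", "Graz"]
warnings = insert_with_warning("Saltzburg", names)
```

The sketch warns without blocking the insert, since a similar name is not necessarily a duplicate.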

History

#1 Updated by Alexander Watzinger 5 months ago

Alternatively, this could be done in PostgreSQL with the pg_trgm extension, which requires installing postgresql-contrib:

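-- Enable trigram matching (part of the postgresql-contrib package).
CREATE EXTENSION pg_trgm;

-- Trigram index on entity names. It is used by pg_trgm operators such as %,
-- not by a plain similarity() comparison like the one below.
CREATE INDEX trgm_idx ON model.entity USING GIST (name gist_trgm_ops);

-- List pairs of place names with a similarity above 0.7, most similar first.
SELECT similarity(n1.name, n2.name) AS sim, n1.name, n2.name
FROM model.entity n1
JOIN model.entity n2 ON n1.id < n2.id  -- each pair once, never an entity with itself
WHERE n1.system_type = 'place'
  AND n2.system_type = 'place'
  AND similarity(n1.name, n2.name) > 0.7
ORDER BY sim DESC;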
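
For comparison with the Levenshtein approach, the score that similarity() returns can be approximated in a few lines of Python. This is a simplified sketch for plain ASCII names, not the extension's actual implementation; the function names are hypothetical. It assumes the documented behaviour that each word is lowercased and padded with two leading spaces and one trailing space before its trigrams are extracted.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm's trigram extraction for plain ASCII text."""
    result = set()
    # Lowercase, then treat every run of non-alphanumeric characters as a word break.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def trigram_similarity(a: str, b: str) -> float:
    """Number of shared trigrams divided by the number of distinct trigrams overall."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# 4 shared trigrams out of 11 distinct ones.
score = trigram_similarity("word", "two words")
```

Unlike an edit distance, this score ignores where in the string the trigrams occur, so reordered words still count as similar.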

#2 Updated by Alexander Watzinger 5 months ago

  • Status changed from In Progress to Closed
  • Description updated (diff)
