Feature #1036

Search for similar names

Added by Alexander Watzinger over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Category:
Backend
Target version:
Start date:
2019-06-07
Estimated time:
16.00 h

Description

To prevent duplicates and spelling mistakes, a search for similar names will be implemented.

  • Add the Python library fuzzywuzzy, which uses Levenshtein distance to measure the difference between two strings
  • Option to select the minimum similarity ratio
  • Option to select the class to search in
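
A minimal sketch of how such a search could work, written in plain Python so it runs without extra dependencies. It is not the fuzzywuzzy API: the functions levenshtein, ratio and similar_names, the entity dictionaries and the normalisation by the longer name are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,          # deletion
                current[j - 1] + 1,       # insertion
                previous[j - 1] + cost,   # substitution
            ))
        previous = current
    return previous[-1]


def ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100, normalised by the longer string (assumed formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


def similar_names(entities, class_name, min_ratio=80):
    """Return (score, name, name) tuples for one class, best matches first."""
    names = [e["name"] for e in entities if e["class"] == class_name]
    hits = []
    for first, second in combinations(names, 2):
        score = ratio(first.lower(), second.lower())
        if score >= min_ratio:
            hits.append((score, first, second))
    return sorted(hits, reverse=True)
```

The selectable ratio and class from the list above map to the min_ratio and class_name parameters. Comparing every pair is quadratic in the number of names, which may be too slow for large classes.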

Ideas for next version

  • Search for a manually entered string
  • Add a check when inserting an entity and warn if a similar name already exists
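
Both ideas could be covered by one lookup that compares a single string against the existing names. The sketch below uses get_close_matches from Python's standard difflib module, which scores with a different measure than Levenshtein distance; the function find_similar, its parameters and the default cutoff are hypothetical.

```python
from difflib import get_close_matches


def find_similar(new_name, existing_names, cutoff=0.8, limit=5):
    """Return existing names that closely resemble new_name, best match first."""
    # Compare case-insensitively but report names in their original spelling.
    by_lowered = {}
    for name in existing_names:
        by_lowered.setdefault(name.lower(), name)
    matches = get_close_matches(
        new_name.lower(), list(by_lowered), n=limit, cutoff=cutoff)
    return [by_lowered[match] for match in matches]


if __name__ == "__main__":
    similar = find_similar("Viena", ["Vienna", "Graz", "Linz"])
    if similar:
        print("Warning: similar names already exist:", ", ".join(similar))
```

An insert form could call such a function before saving and show the warning without blocking the insert.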
Actions #1

Updated by Alexander Watzinger over 5 years ago

Or maybe use PostgreSQL's pg_trgm extension instead (requires installing postgresql-contrib):

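-- Enable trigram matching (shipped with postgresql-contrib)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index on entity names
CREATE INDEX trgm_idx ON model.entity USING GIST (name gist_trgm_ops);

-- Pairs of similar place names, most similar first
SELECT similarity(n1.name, n2.name) AS sim, n1.name AS name_1, n2.name AS name_2
FROM model.entity n1
JOIN model.entity n2 ON n1.id < n2.id -- list each pair only once
WHERE n1.system_type = 'place'
  AND n2.system_type = 'place'
  AND similarity(n1.name, n2.name) > 0.7
ORDER BY sim DESC;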
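
To make the 0.7 threshold easier to judge, here is a rough Python approximation of what similarity() computes: the number of trigrams two strings share, divided by the number of distinct trigrams in both. The padding and word splitting follow the pg_trgm documentation as far as I understand it; the real extension may differ in details, and the function names are made up for this sketch.

```python
import re


def trigrams(text: str) -> set:
    """Collect trigrams per word, padded with two leading and one trailing space."""
    result = set()
    # Lowercase and keep only alphanumeric runs as words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    set_a, set_b = trigrams(a), trigrams(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(trigram_similarity("Vienna", "Viena"))
```

Because the comparison is based on sets of trigrams, a single typo in a short name lowers the score more than the same typo in a long name.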

Actions #2

Updated by Alexander Watzinger over 5 years ago

  • Description updated (diff)
  • Status changed from In Progress to Closed