slightly OT - cleaning up "dirty" keys?
If (!) one had a database where a primary
key field (e.g. name) had been used for a few
years, and the DB had several "variant" spellings
(e.g. "J Smith", "John Smith", "J K Smith", "J. Smith"
all for the same individual), does
anyone know of a tool that would identify
"likely" groupings?
One would like any two names with a small
"edit distance"
(http://en.wikipedia.org/wiki/Edit_distance)
to be grouped together, for human checking.
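
For reference, here is a minimal sketch of the edit distance meant here, assuming the common Levenshtein variant (single-character insertions, deletions and substitutions, each costing 1). The function name edit_distance is just an illustrative choice, not taken from any particular tool.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed variant: unit costs)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("J Smith", "J. Smith"))
print(edit_distance("J Smith", "John Smith"))
```

Each call costs time proportional to the product of the two string lengths, so it is the number of pairs, not the cost of one comparison, that dominates for a large key set.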
But if one had 100,000 keys, this would
involve (in a naive implementation)
10^10 comparisons.
Does anyone know a good algorithm
(and/or heuristic, if this is NP-hard)?
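
One heuristic that may help is "blocking": compare only names that share a cheap key, rather than every pair. Comparing all pairs is quadratic rather than NP-hard, but it is still a lot of work at 100,000 keys. The sketch below is only an illustration under assumptions: the key (last token plus first initial), the threshold of 3, and the names blocking_key and candidate_pairs are all hypothetical choices, and any pair whose keys differ (say, a typo in the surname) would be missed.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (assumed variant)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def blocking_key(name: str) -> tuple:
    """Hypothetical cheap key: (last token, first initial), lower-cased."""
    tokens = name.replace(".", " ").lower().split()
    if not tokens:
        return ("", "")
    return (tokens[-1], tokens[0][0])


def candidate_pairs(names, max_distance=3):
    """Return (name_a, name_b, distance) for close pairs within each block."""
    blocks = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)

    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            # A length gap larger than the threshold already rules the pair out.
            if abs(len(a) - len(b)) > max_distance:
                continue
            distance = edit_distance(a, b)
            if distance <= max_distance:
                pairs.append((a, b, distance))
    return pairs


names = ["J Smith", "John Smith", "J K Smith", "J. Smith", "Mary Jones"]
for a, b, distance in candidate_pairs(names):
    print(f"{a!r} ~ {b!r} (distance {distance})")
```

The pairs that come out are only candidates for human checking; how much work is saved depends entirely on how evenly the chosen key splits the data.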
BugBear
Received on Wed Mar 01 2006 - 07:44:53 CST