Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates
> having done similar work for a client, you will be best off using
> PL/SQL or other procedural language (I also used PERL on that project.)
That's what I'm going to do. I'm not limited to SQL, so in the end I will do
this in PL/SQL.
>
> IF you must use only a SQL solution, then you need some preparation
> work. Assuming this is a spelling issue, then you create a spelling
> correction table. One column is the misspelling, and the second column
> is the correct spelling. As long as the misspelling always maps to only
> one correct spelling then this works. Otherwise you need other
> intervention to bring more context into play, which means at least more
> columns to the spelling table. (Consider that if the data you are
> trying to match is a set of street names, then the context of the
> street name is the city. So City would be a column in the spelling
> table and in your query. For example the misspelled street FAR is FAIR
> in A city, but it is FARE in B city.)
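For anyone reading along, the correction-table lookup described in the quoted text could be sketched roughly as below. This is only an illustration under assumptions: it uses Python with an in-memory SQLite database as a stand-in for Oracle, and the table and column names (address, spelling_fix, misspelling, correct) as well as the function name correct_streets are made up for the example. The composite key on (misspelling, city) is what enforces "one correct spelling per context".

```python
import sqlite3

def correct_streets(addresses, corrections):
    """Return (street, city) rows with known misspellings replaced.

    addresses:   list of (street, city)
    corrections: list of (misspelling, city, correct_spelling)
    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT)"
    )
    # The key (misspelling, city) allows only one correction per context.
    conn.execute(
        "CREATE TABLE spelling_fix ("
        " misspelling TEXT, city TEXT, correct TEXT,"
        " PRIMARY KEY (misspelling, city))"
    )
    conn.executemany("INSERT INTO address (street, city) VALUES (?, ?)", addresses)
    conn.executemany("INSERT INTO spelling_fix VALUES (?, ?, ?)", corrections)
    # LEFT JOIN keeps streets that have no entry in the correction table.
    rows = conn.execute(
        "SELECT COALESCE(f.correct, a.street), a.city"
        " FROM address a"
        " LEFT JOIN spelling_fix f"
        "   ON f.misspelling = a.street AND f.city = a.city"
        " ORDER BY a.id"
    ).fetchall()
    conn.close()
    return rows
```

With the FAR example from the quote, correct_streets([("FAR", "A"), ("FAR", "B")], [("FAR", "A", "FAIR"), ("FAR", "B", "FARE")]) resolves the same misspelling differently per city.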
In fact my context is very similar:
name-city-country, where city+country form an additional context for the name.
I asked about distinguishing names only because it seemed the simplest case, but what the business requirement actually calls for is sorting out candidate duplicates within a given city and country, plus some more conditions ;)
Since I'm dealing with ANY sort of naming (companies, business areas, private people), checking against a dictionary would be a kind of suicide ;)
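To make the requirement concrete, here is a rough sketch of "candidate duplicates within the same city and country" without any dictionary. It is written in Python rather than PL/SQL purely for illustration. The function name candidate_duplicates, the 0.85 threshold, and the choice of difflib's SequenceMatcher as the similarity measure are all assumptions of the sketch, not part of the actual requirement.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def candidate_duplicates(records, threshold=0.85):
    """Find similar names that share the same city and country.

    records:   iterable of (name, city, country)
    threshold: minimum similarity ratio (0..1); 0.85 is an arbitrary guess
    Returns a list of (name_a, name_b, city, country, ratio).
    """
    # Group names by their context so only names in the same
    # city+country are ever compared with each other.
    groups = defaultdict(list)
    for name, city, country in records:
        key = (city.strip().upper(), country.strip().upper())
        groups[key].append(name)

    pairs = []
    for (city, country), names in groups.items():
        # Pairwise comparison is quadratic per group; fine for a sketch.
        for a, b in combinations(names, 2):
            ratio = SequenceMatcher(None, a.upper(), b.upper()).ratio()
            if ratio >= threshold:
                pairs.append((a, b, city, country, ratio))
    return pairs
```

Grouping first keeps the number of comparisons down, since the pairwise step only runs inside each city+country group. The "some more conditions" would go in as extra filters on each pair.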
Thanks and BR,
Kroger
Received on Wed Sep 06 2006 - 11:17:19 CDT