Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates
> And if you have this what do you do?
>
> ID
> 1 aaa
> 2 aaa h
> 3 h
> 4 h aaa
>
> The request makes no business sense. Here's what I would suggest:
In that case (1,2) and (3,4) are candidate duplicates. It makes business sense, looking at a simple example (data entered by dumb or dumber user into name field):
-JohnSmith -John -John Smith Jr
In the app we are developing, we must be able to display that those three entries MAY refer to the same person...
>
> SELECT DISTINCT name
> FROM table;
>
> Spool the output, send it to the manager of the department, and ask
> them to sort it out.
This wouldn't make sense because
1. above
2. data changing approximately 5-20k records per day
>
> Alternatively ask for a list of legal values. Create a foreign key
> constraint and blow everything that doesn't match out of the table
> so they can sort it out.
The point is not to blow duplicates, but to pass the candidates to manual verification.
>
> This is not a programming problem ... this is a business problem
> and there really is no solution other than a subject matter expert
> looking over the garbage value by value.
I wouldn't agree.. In my particular case there is a (reasonable) business request, which I need to implement, with or without magic ;)
I solved the issue though. I still need to polish the code, however the basic idea is to choose ids of the records with the shortest name having candidate duplicates (in your sample records 1 and 3) and then I am able to display all others, with <pseudo-sql> name like(chosen_shortest_name_by_id||'%') </pseudo-sql>
> PS: rename your columns.
Those are just samples, the real table has ca. 15 columns with fine names..
Thanks anyway!
BR,
Kroger
Received on Wed Sep 06 2006 - 11:07:20 CDT
![]() |
![]() |