Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.misc -> Selecting SIMILAR, not the same records (PROBABLE) duplicates
Hi,
I've been struggling with that for two days now... There is a simple solution for finding duplicates - with GROUP BY and HAVING COUNT(*)>1 but it is not enough in my case...
For the example table as follows:
id || name 1 || aaa 2 || aaa xxx 3 || aaa 4 || aaah 5 || bbb 6 || bbb p 7 || ccc
I would need to see the following:
rowId || duplicatedRowId
1 2 1 3 1 4 5 6
This means, that I need to build a query that for a given row selects others, whose beginning of the name is similar...
In fact, something like that:
select t1.id as rowId, t2.id as duplicateRowId from test t1, test t2 where t1.name like t2.name||'%' and t1.id<>t2.id
would do... but in such case I end up with all id-pairs (meaning I get 5 is a duplicate of 6 and 6 is a duplicate of 5)
I'm neither Oracle or SQL expert... Is there any easy way for obtaining what
I desired? I feel I'm getting dumb with that..
Or do I need to go into cursors etc?
I'm also looking for efficient solution for that issue, since the source
table to lookup is pretty big... 1-2M records...
Thanks in advance,
Best regards,
Kroger Received on Tue Sep 05 2006 - 14:05:15 CDT
![]() |
![]() |