Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: felidae <m.mischke_at_gmx.net>
Date: 14 Sep 2006 04:42:46 -0700
Message-ID: <1158234166.453569.301610@b28g2000cwb.googlegroups.com>

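Leaving the data model and integrity questions aside, how about this simple approach: self-join the table and report a row as a probable duplicate whenever its name contains another row's name.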

SQL> select * from test;

        ID NAME
---------- --------------------
         1 aaa
         2 aaa xxx
         3 aaa
         4 aaah
         5 bbb
         6 bbb p
         7 ccc
         8 h
         9 h aaa

9 rows selected.

SQL> select t1.id, t1.name, t2.id matching_id, t2.name duplicate_name
  2  from test t1, test t2
  3  where t2.name like '%' || t1.name || '%'
  4  and t1.id != t2.id;

        ID NAME                 MATCHING_ID DUPLICATE_NAME
---------- -------------------- ----------- --------------------
         3 aaa                            1 aaa
         1 aaa                            2 aaa xxx
         3 aaa                            2 aaa xxx
         1 aaa                            3 aaa
         1 aaa                            4 aaah
         3 aaa                            4 aaah
         8 h                              4 aaah
         5 bbb                            6 bbb p
         1 aaa                            9 h aaa
         3 aaa                            9 h aaa
         8 h                              9 h aaa

11 rows selected.

Received on Thu Sep 14 2006 - 06:42:46 CDT
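
One possible refinement, offered only as a sketch against the same TEST table: exact duplicates show up twice in the output above (ids 1 and 3 match each other in both directions), and LIKE would treat any % or _ inside a name as a wildcard. The variant below assumes that INSTR is an acceptable substitute for the LIKE test and that listing each exact-duplicate pair only once is wanted. Neither assumption comes from the original query.

```sql
-- Sketch only: same TEST(id, name) table as in the session above.
select t1.id, t1.name, t2.id matching_id, t2.name duplicate_name
  from test t1, test t2
 where instr(t2.name, t1.name) > 0   -- literal substring test, % and _ are not wildcards here
   and t1.id != t2.id
   -- identical names match in both directions; keep only the pair with the lower id first
   and (t1.name != t2.name or t1.id < t2.id);
```

On the sample data this returns the same rows as before, except that the row pairing ID 3 with MATCHING_ID 1 is dropped, leaving 10 rows.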