finding duplicate records with typos
From: tom <tomschuring_at_gmail.com>
Date: Sun, 05 Aug 2007 18:13:57 -0700
Message-ID: <1186362837.693799.303390_at_x35g2000prf.googlegroups.com>
hello,
can someone tell me (or point me in the right direction) what the right way is to find duplicates in dirty data (caused by typos)?
hash("hello") => 4323
hash("helo")  => 4334
hash("tree")  => 7326
i'm not sure what direction i should look in; this is just an idea that i had, but any ideas are very welcome.
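
to make the question concrete, here is a minimal sketch of one possible direction. it does not implement the hash idea above; instead it compares strings pairwise with a similarity score from Python's standard difflib module. the function name find_near_duplicates and the 0.8 threshold are made up for illustration, and comparing every pair is only practical for small data sets.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(values, threshold=0.8):
    """Return pairs of strings whose similarity ratio is at least `threshold`.

    Hypothetical helper for illustration only; the threshold is an
    arbitrary assumption and would need tuning on real data.
    """
    pairs = []
    # Compare every unordered pair once: O(n^2) comparisons.
    for a, b in combinations(values, 2):
        # ratio() is 2*M/T, where M is the number of matching characters
        # and T is the combined length of both strings.
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


print(find_near_duplicates(["hello", "helo", "tree"]))
# prints [('hello', 'helo')]
```

"hello" and "helo" share 4 of 9 total characters twice over (ratio 8/9), so they are flagged, while "tree" shares only one character with either and is not.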
thanks,
tom
Received on Mon Aug 06 2007 - 03:13:57 CEST