finding duplicate records with typos
From: tom <tomschuring_at_gmail.com>
Date: Sun, 05 Aug 2007 18:13:57 -0700
Message-ID: <1186362837.693799.303390_at_x35g2000prf.googlegroups.com>
hello,
can someone tell me (or point me in the right direction) what the
right way is to find duplicates in dirty data caused by typos?
is there something like a 'hashing' or 'rating' of text that gives
you a number you can compare, so that similar strings get similar numbers?
for example:
hash("hello") => 4323
hash("helo") => 4334
hash("tree") => 7326
thanks,
tom
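
A minimal sketch of the kind of 'rating' asked about above, assuming Python and its standard difflib module (neither is mentioned in the post). Instead of one number per string, it scores each pair of strings between 0 and 1, since an ordinary hash is designed to scatter similar inputs rather than keep them close. The function names and the 0.8 threshold are hypothetical choices for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the strings are identical."""
    # Case is ignored so that "Hello" and "hello" count as the same text.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(values, threshold=0.8):
    """Return every pair whose score is at or above the threshold.

    The threshold is an arbitrary assumption and needs tuning on real data.
    """
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if similarity(a, b) >= threshold
    ]


print(likely_duplicates(["hello", "helo", "tree"]))
# prints [('hello', 'helo')]
```

This compares every pair, so the work grows with the square of the number of records; on a large table it would need some way of narrowing down the candidate pairs first.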
Received on Mon Aug 06 2007 - 03:13:57 CEST