finding duplicate records with typos

From: tom <tomschuring_at_gmail.com>
Date: Sun, 05 Aug 2007 18:13:57 -0700
Message-ID: <1186362837.693799.303390_at_x35g2000prf.googlegroups.com>

hello,

can someone tell me (or point me in the right direction) what the right way is to find duplicates in dirty data (caused by typos)?

is there something like a 'hashing' or 'rating' of text that gives you a number you can compare, so that similar strings get similar numbers?

hash("hello") => 4323
hash("helo")  => 4334
hash("tree")  => 7326
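
The numbers above are invented, but one existing technique that works roughly like this kind of 'hash' is a phonetic key such as Soundex: words that sound alike are mapped to the same short code, so likely duplicates can be found by grouping on the code. Below is a minimal sketch of classic American Soundex. The function name `soundex` is just an illustrative choice, and the sketch assumes plain English words; it is not meant as a complete solution for dirty data.

```python
def soundex(word: str) -> str:
    """Return a 4-character Soundex key (a letter plus three digits)."""
    # Build the consonant-to-digit table used by classic Soundex.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    key = letters[0].upper()           # the first letter is kept as-is
    prev = codes.get(letters[0], "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate equal codes
        digit = codes.get(ch, "")      # vowels (and y) map to no digit
        if digit and digit != prev:
            key += digit
        prev = digit
    return (key + "000")[:4]           # pad with zeros, cut to 4 characters


print(soundex("hello"), soundex("helo"), soundex("tree"))
```

Here "hello" and "helo" end up with the same key while "tree" gets a different one. Note that the keys are only equal or not equal; unlike the made-up numbers above, they do not say how close two words are, and typos that change the sound of a word will be missed.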

i'm not sure which direction i should look in. this is just an idea i had, but any ideas are very welcome.
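
One direction that fits the 'rating' idea is edit distance: instead of one number per string, you compute a number per pair of strings that counts how many single-character insertions, deletions, or substitutions turn one into the other. The sketch below implements Levenshtein distance and a naive pairwise scan. The names `levenshtein` and `near_duplicates` and the default threshold of 1 are assumptions made for illustration, and comparing every pair is only practical for small tables.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a                    # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def near_duplicates(records, max_distance=1):
    """Return all pairs of records within max_distance edits (case-insensitive)."""
    return [(x, y) for x, y in combinations(records, 2)
            if levenshtein(x.lower(), y.lower()) <= max_distance]


print(near_duplicates(["hello", "helo", "tree"]))
```

With the three example words, only the pair ("hello", "helo") is reported, because the two strings differ by a single missing letter.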

thanks,
tom
