Re: finding duplicate records with typos

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sun, 05 Aug 2007 22:25:00 -0300
Message-ID: <46b67836$0$4038$9a566e8b_at_news.aliant.net>

tom wrote:

> hello,
>
> can someone tell me (or point me in the right direction) what the
> right way is to find duplicates in dirty data (caused by typos)?
>
> is there something like a 'hashing' or 'rating' of text that will give
> you a number that you can compare?
>
> for example
>
> hash( "hello") => 4323
> hash( "helo") => 4334
> hash("tree") => 7326
>
> i'm not sure what direction i should look in; this is just an idea
> that i had, but any ideas are very welcome.
>
> thanks,
> tom
>

If you are looking for duplicates, I assume you want to detect the similarity between "hello" and "helo". The function usually used for that is soundex, a phonetic code that maps similar-sounding words to the same short key: "hello" and "helo" both become H400, while "tree" becomes T600.
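
A minimal sketch of the classic (American) Soundex rules in Python, to show how the key is built and how it could be used to flag candidate duplicates. The names soundex and group_by_soundex are illustrative, not taken from any library, and the grouping step assumes that records whose keys collide are candidates for manual review rather than confirmed matches.

```python
from collections import defaultdict

# American Soundex digit classes; letters not listed (vowels, h, w, y) get no digit.
_CODES = {}
for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex key for word, e.g. 'hello' -> 'H400'."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    # The first letter's own code suppresses an identical code right after it.
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            # Adjacent letters with the same code are written only once.
            if code != prev:
                result.append(code)
            prev = code
        elif ch not in "hw":
            # Vowels (and y) separate consonants, so a repeated code after them counts again.
            prev = ""
        # h and w are transparent: prev is left unchanged.
        if len(result) == 4:
            break
    # Pad short keys with zeros to the fixed length of 4.
    return "".join(result).ljust(4, "0")


def group_by_soundex(names):
    """Bucket names by Soundex key; keep only buckets holding more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return {code: members for code, members in groups.items() if len(members) > 1}


print(soundex("hello"), soundex("helo"), soundex("tree"))
print(group_by_soundex(["hello", "helo", "tree"]))
```

Because Soundex keeps the first letter as-is, a typo in the first character (say "jello" for "hello") produces a different key, so it catches only some kinds of typos. Several SQL databases also provide a built-in SOUNDEX() function, though the details vary between implementations.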
