Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: slightly OT - cleaning up "dirty" keys?
bugbear wrote:
> sybrandb_at_yahoo.com wrote:
>> Try the SOUNDEX function available in Oracle on your data. >> It is heavily English-oriented, but that doesn't seem to be a problem >> in your case. >> SOUNDEX will provide the 'phonetic' representation of a name.
Hardly - soundex only takes the first four characters in account, if these differ, and result in different mappings.
The algorithm is known, and public; this was taken from http://www.blogger.com/publish.g?blogID=12361937&inprogress=true:
# Capitalize all letters in the word and drop all punctuation marks.
Pad the word with rightmost blanks as needed during each procedure step.
# Retain the first letter of the word.
# Change all occurrence of the following letters to '0' (zero):
'A', E', 'I', 'O', 'U', 'H', 'W', 'Y'.
# Change letters from the following sets into the digit given:
# Remove all pairs of digits which occur beside each other from the
string that resulted after step (4).
# Remove all zeros from the string that results from step 5.0 (placed
there in step 3)
# Pad the string that resulted from step (6) with trailing zeros and
return only the first four positions, which will be of the form
<uppercase letter> <digit> <digit> <digit>.
So, your "J Smith", "John Smith" and "J K Smith" will result in totally
different soundex values.
1* select soundex('J Smith'), soundex('John Smith'),soundex('J K
Smith') from dual
SQL> /
SOUN SOUN SOUN
---- ---- ----
J253 J525 J225
-- Regards, Frank van Bortel Top-posting is one way to shut me up...Received on Thu Mar 02 2006 - 14:09:25 CST