Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Usenet -> c.d.o.server -> Re: Detecting typos

Re: Detecting typos

From: Mark C. Stock <mcstockX_at_Xenquery>
Date: Fri, 7 Jan 2005 08:55:41 -0500
Message-ID: <q8mdnbRXIvR9CUPcRVn-3Q@comcast.com>

"Frank van Bortel" <fvanbortel_at_netscape.net> wrote in message news:crm2sp$ocf$1_at_news1.zwoll1.ov.home.nl...
| Rauf Sarwar wrote:
| > CE wrote:
| >
| [major snip]
| > Maybe you should have incorporated SOUNDEX as originally suggested by
| > Sybrand. Soundex will not pick up any gross spelling mistakes but if it
| > sounds similar then it will pick it up. In your example above,
| > 'CHARLIE' does not sound similar to 'CHALRIE' so soundex will not pick
| > it up.
| >
| > SQL> select 1 from dual
| > 2 where soundex('CHARLIE') = soundex('CHARLI');
| >
| > 1
| > ----------
| > 1
| >
| > SQL> select 1 from dual
| > 2 where soundex('CHARLIE') = soundex('CHARLIW');
| >
| > 1
| > ----------
| > 1
| >
| > SQL> select 1 from dual
| > 2 where soundex('CHARLIE') = soundex('CHARL');
| >
| > 1
| > ----------
| > 1
| >
| > SQL> select 1 from dual
| > 2 where soundex('CHARLIE') = soundex('CHALRIE');
| > 1
| > ----------
| >
| > SQL>
| >
| > Regards
| > /Rauf
| >
| Well, soundex only checks for a limited set of
| characters, returning fixed values for the first
| four (after discarding vowels, H, Y and W).
| So, it's only logical that soundex('CHARLIE')
| is different from soundex('CHALRIE'), as in fact
| CLR is compared to CRL.
| See
| http://www.csc.fi/cschelp/sovellukset/stat/sas/sasdoc/sashtml/lgref/z0245948.htm
| for an explanation of soundex
|
| Nice to see a 1918 algorithm for U.S. census records since 1880 is still
| used and discussed :).
|
| --
| Regards,
| Frank van Bortel
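
For anyone who wants to see the codes Frank is describing, here is a minimal Python sketch of the classic soundex rules. It is an approximation written for illustration, not Oracle's actual implementation, and the names CODES and soundex are just mine. It should reproduce the four comparisons in Rauf's transcript: the first three inputs collapse to the same code as 'CHARLIE', while 'CHALRIE' has L and R swapped and so gets a different one.

```python
# Classic soundex digit groups; vowels, H, W and Y carry no digit.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character soundex code (approximation of the classic rules)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                    # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal codes
        code = CODES.get(ch, "")           # vowels and Y map to ""
        if code and code != prev:
            result += code                 # skip repeats of the same code
        prev = code
    return (result + "000")[:4]            # pad with zeros, keep four characters


for name in ("CHARLIE", "CHARLI", "CHARLIW", "CHARL", "CHALRIE"):
    print(name, soundex(name))
```

The last line of output is the interesting one: the transposed name gets C460 instead of C640, which is why the equality test in the transcript returns no row.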

CE....

Try approaching your problem as a spell-check problem. A little googling on that might turn up some interesting stuff.

Basic logic would be something like...

  1. Is the word in the list of known words? Then it's OK.
  2. Is the word in the list of known permutations? Then return an error with the list of possible known words.
  3. Apply a permutation algorithm (more sophisticated than soundex) to the word and check against the appropriate list(s).
  4. Apply a repair algorithm to the word and check against the appropriate list.
  5. 'Learn' the word if the user says so.
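
The five steps above could be sketched roughly as follows. This is only one way to read them, under my own assumptions: the "permutation algorithm" is taken to be swapping adjacent letters, the "repair algorithm" is taken to be a single-letter delete, replace or insert, and every name here (SpellChecker, transpositions, single_edits, known_typos) is made up for the illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def transpositions(word):
    """Step 3 candidates: every string made by swapping two adjacent letters."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
        if word[i] != word[i + 1]
    }


def single_edits(word):
    """Step 4 candidates: delete, replace or insert exactly one letter."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return (deletes | replaces | inserts) - {word}


class SpellChecker:
    def __init__(self, known_words, known_typos=None):
        self.known_words = {w.upper() for w in known_words}
        # Maps a previously seen misspelling to the words it may stand for.
        self.known_typos = {
            typo.upper(): {w.upper() for w in words}
            for typo, words in (known_typos or {}).items()
        }

    def check(self, word):
        """Return (status, suggestions) following steps 1 to 4."""
        w = word.upper()
        if w in self.known_words:                                # step 1
            return ("ok", [w])
        if w in self.known_typos:                                # step 2
            return ("error", sorted(self.known_typos[w]))
        swaps = sorted(transpositions(w) & self.known_words)     # step 3
        if swaps:
            return ("error", swaps)
        repairs = sorted(single_edits(w) & self.known_words)     # step 4
        if repairs:
            return ("error", repairs)
        return ("unknown", [])

    def learn(self, word):
        """Step 5: accept the word because the user said so."""
        self.known_words.add(word.upper())


checker = SpellChecker(["CHARLIE", "CHARLOTTE"], {"CHARLY": ["CHARLIE"]})
print(checker.check("CHALRIE"))   # the transposition that soundex missed
```

With this reading, 'CHALRIE' is caught at step 3 because swapping L and R yields a known word, which is exactly the case soundex let through. A real word list would probably live in a table rather than in memory, but the order of the checks would stay the same.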

mcs
Received on Fri Jan 07 2005 - 07:55:41 CST
