Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.server -> Re: Detecting typos
CE wrote:
> Hi,
>
> I'm working on a project which involves matching customers against an
> existing database using (amonst other things) address, date of birth
> and name.
>
> One problem I have to overcome is allowing for typographical errors.
> E.g. matching "CATHERINE" with "CSTHERINE". So what I though I'd do
is
> have a little function to compare 2 strings and return the number of
> differences:
>
> CREATE OR REPLACE function string_compare
> (string1 IN VARCHAR2,
> string2 IN VARCHAR2) RETURN NUMBER IS
> diffs BINARY_INTEGER := 0;
> BEGIN
> IF least(length(string1),length(string2)) > 0 THEN
> FOR i IN 1..least(length(string1),length(string2)) LOOP
> IF substr(string1,i,1) <> substr(string2,i,1) THEN
> diffs := diffs + 1;
> END IF;
> END LOOP;
> END IF;
> RETURN diffs;
> END;
> /
>
> This works ok (I've got to watch out for comparing MARY and MARK etc,
> though), but doesn't handle a comparison where there might be a
missing
> letter (e.g. "CATHERINE" and "CTHERINE"). Can anybody think of
> anything a little bit cleverer?
>
> Also if anyone could recommend a book on the rationale/logic behind
> such customer matching/de-duplicating, I'd also be grateful.
> Thanks
>
> CE
For completeness sake, this is the function I knocked up in the end. It returns a 1 (as opposed to a 0) if it's an exact match of if there is a single typographical error which I've defined as one of
1) Missing or added character (e.g. CHARLIE and CHARLI) 2) One character incorrect (e.g. CHARLIE and CHARLIW) 3) Pair of characters transposed (e.g. CHARLIE and CHALRIE)
CREATE OR REPLACE function is_typo
(p_string1 IN VARCHAR2,
p_string2 IN VARCHAR2) RETURN NUMBER IS v_differences BINARY_INTEGER := 0; v_string1 VARCHAR2(100) := p_string1; v_string2 VARCHAR2(100) := p_string2; v_is_typo NUMBER := 0;
SUBSTR(v_string1,i+1,1)|| SUBSTR(v_string1,i,1)|| SUBSTR(v_string1,i+2) = v_string2 THEN
Regards
CE Received on Fri Jan 07 2005 - 04:38:45 CST
![]() |
![]() |