Re: ORACLE SUBSTR FUNCTION HELP

From: Ed Prochak <ed.prochak_at_magicinterface.com>
Date: Mon, 07 Oct 2002 17:07:00 GMT
Message-ID: <3DA1EC43.989DB10B@magicinterface.com>

John Roberts wrote:
>
> "Dave" <dchou1108_at_hotmail.com> wrote in message
> news:8f19196e.0209290611.38cc9b2_at_posting.google.com...
> > I am trying to get the city name from an address column of type
> VARCHAR2(80)
> >
> > The address looks as follow:
> > ADDRESS
> > --------------------------------------------------------------------------
> ------
> > 2800 Montrose Ave. Lacresenta, CA 92832
> > 1231 Fullerton Ave. Fullerton, CA 92898
> > 3301 Huston Ave. Garden Grove, CA 92123
> > 1234 Placentia Rd. Placentia, CA 92871
> > 34 Starbucks Rd. Seattle, WA 43249
> > 1214 Ford Ave. LA, CA 98649
> > 1317 Bond Ave. LA, CA 98649
> >
> > How do I get the city for the substring. I am trying to group my query by
> the city.
> >
> > Thanks
> >
>
> It all depends on whether this is a one-shot deal (like a data conversion)
> or a continuing application requirement.
>
> If the latter, then table redesign is appropriate, splitting your ADDRESS
> column into STREET_1, STREET_2, CITY, STATE,
> COUNTRY, POSTAL_CD etc.
>
> For a one-shot requirement, or for a non-critical requirement (such as some
> reports), then some sort of
> parsing algorithm implemented as a custom oracle function might work.

(Data conversions are one of my specialties, and I've been dealing with address issues recently.) I would still agree with sybrand that a table redesign is in order. Ignoring the garbage data issue for moment, if the source file separated the fields consistently, then SQL Loader could have loaded a staging table with the fields.

If the source file was not consistent (either fixed length fields or delimiters or both), I would parse it before loading the staging table. PL/SQL has limited abilities for handling the kinds of parsing needed. Even given this ADDRESS table, I would use another language, likely PERL. Dealing with garbage data is another issue. Solving that requires methods and procedures beyond programming. (i.e. there will be addresses that cannot be parsed that require manual intervention such as calling the customer to verify the address).

> Sometimes you have to work
> with what you're dealt (garbage in = garbage out). Your function should
> begin by breaking up the text into tokens and delimiters.
> An attempt should be made to classify each token by inference from length,
> numeric vs non-numeric, mixed case versus upper case, etc.
> You could apply heuristics such as assuming the natural order of things
> (street, then city, then state/province, then zip/postcode).
> You could also do lookups for known state codes, known city names. Once you
> nail down the city and state, you could assume that anything
> preceeding is the street info, anything following is the zip/postcode. In
> the end you should be able to accurately identfy 95% of the correct
> city/state combos. Whether a 5% error rate is acceptable depends on your
> application. I you are using the addresses for sending junk mail,
> then I suspect a 5% error rate is OK. If on the otherhand you are mailing
> legal notices, I suspect you need 100% accuracy.
>
> Hope this suggestion is worthwhile.
>
> John Roberts

The process John describes is good general advice. If you really are stuck with using this table and PL/SQL, then you need both the SUBSTR() and INSTR() functions. (I really would avoid a SQL only solution.)

HTH,
Ed

-- 
Edward J. Prochak   --- Magic Interface, Ltd.
Ofc: 440-498-3700   --- 7295 Popham Place, Solon, OH 44139
on the web at       --- http://www.magicinterface.com
email: ed.prochak_at_magicinterface.com

Received on Mon Oct 07 2002 - 12:07:00 CDT