Re: Algorithm or ideas wanted for creative text parsing
I did something similar at one time and didn't find anything cleverer than storing somewhere how many "segments" are significant for a given substr(your_stuff, instr(your_stuff, '.', -1, 1) + 1), that is, for whatever follows the last dot. For instance, with a .com, .net or .edu you just need the previous piece; for a .uk or a .sg you need the two previous pieces. But it would be too easy if it were that simple, because for .ca you can have big companies that are [...] or smaller ones that are [...]. Same story with .us, often (but not always) preceded by a state code, or with .fr, because you can have generic stuff (such as .gouv) preceding the termination.
Brace yourself for CASE clauses of death in your statements ...
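A minimal sketch of the lookup idea, for illustration only. The table and column names (tld_segments, n_before, hosts, fqdn) are hypothetical and not from the original post; the five rules are the simple cases named above. The irregular suffixes (.ca, .us, .fr) are deliberately left out, since a single count per suffix cannot describe them.

```sql
-- Hypothetical lookup table: for each final suffix, how many
-- dot-separated pieces before it belong to the domain name.
CREATE TABLE tld_segments (
  tld      VARCHAR2(20) PRIMARY KEY,
  n_before NUMBER(1)    NOT NULL
);

INSERT INTO tld_segments VALUES ('com', 1);
INSERT INTO tld_segments VALUES ('net', 1);
INSERT INTO tld_segments VALUES ('edu', 1);
INSERT INTO tld_segments VALUES ('uk',  2);
INSERT INTO tld_segments VALUES ('sg',  2);

-- hosts(fqdn) is an assumed table holding the fully qualified names.
-- INSTR with a negative start position searches backwards from the end,
-- so occurrence n_before + 1 is the dot just in front of the domain.
-- When there are not that many dots, INSTR returns 0 and SUBSTR(fqdn, 1)
-- yields the whole name unchanged.
SELECT DISTINCT
       SUBSTR(h.fqdn,
              INSTR(h.fqdn, '.', -1, t.n_before + 1) + 1) AS domain_name
FROM   hosts h
JOIN   tld_segments t
  ON   t.tld = LOWER(SUBSTR(h.fqdn, INSTR(h.fqdn, '.', -1, 1) + 1));
```

With these rules a name ending in .com keeps its last two pieces and a name ending in .uk keeps its last three. Names whose suffix is missing from the table drop out of the inner join, so an outer join with a default count may be preferable in practice.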
HTH
Stéphane Faroult
rjamya wrote:
>Basically I am looking to isolate just the (distinct) domain name from
>fully qualified domain names that you'd normally see in web-surfing.
>I am working on a couple of techniques, but it gets complicated since
>TLDs differ in format and there is only so much you can do with
>sample data ...
>and by some magic the output should be ....
>Any ideas, thoughts? I'd prefer to do this in SQL if possible, else
>I'd prefer plsql. The data is already in a database.
>Thanks in advance
>Got RAC?
-- on Mon Apr 10 2006 - 12:37:15 CDT