Re: Algorithm or ideas wanted for creative text parsing

From: Daniel Fink <danielwfink_at_yahoo.com>
Date: Mon, 10 Apr 2006 10:22:02 -0700 (PDT)
Message-ID: <20060410172202.45031.qmail@web37215.mail.mud.yahoo.com>


Raj,

Could you put valid 'extensions' into a table?

Read domain name
Loop over domain extensions table (longest extension first)

   If '.' + domain extension is at the end of domain name

       remove '.' + domain extension from the end of domain name
       domain name = domain name - everything up to and including the last .
       exit loop

end loop
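
For what it's worth, the loop above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the EXTENSIONS list is a hypothetical stand-in for the extensions table, split_domain is a made-up name, trying longer extensions first (so that "co.uk" is matched before "uk") is an added assumption, and returning the matched extension alongside the name is a convenience rather than part of the original steps.

```python
# Hypothetical stand-in for the extensions table; a real one would hold
# every suffix you care about (for example, rows in a database table).
EXTENSIONS = ["co.uk", "com", "net", "org", "uk"]


def split_domain(fqdn, extensions=EXTENSIONS):
    """Return (name, extension) for fqdn, or None if no extension matches."""
    host = fqdn.strip().lower().rstrip(".")
    # Assumption: try longer extensions first so "co.uk" wins over "uk".
    for ext in sorted(extensions, key=len, reverse=True):
        suffix = "." + ext
        if host.endswith(suffix):
            remainder = host[: -len(suffix)]      # e.g. "www.blueyonder"
            name = remainder.rsplit(".", 1)[-1]   # keep text after the last "."
            if name:
                return name, ext
    return None


print(split_domain("www.blueyonder.co.uk"))
print(split_domain("a1234.akamaistream.net"))
print(split_domain("demon.co.uk"))
```

The host names "www.blueyonder.co.uk" and "a1234.akamaistream.net" are made-up inputs built from the domains mentioned in this thread. Joining the two returned parts with a "." gives a value such as "blueyonder.co.uk", if the extension needs to stay attached.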

Regards,
Daniel Fink

rjamya <rjamya_at_gmail.com> wrote:

If I don't distinguish "blueyonder.co.uk" from "demon.co.uk", both become just "co.uk", which covers most of the commercial domains under the UK TLD. It would be akin to bundling most US sites under ".com" alone.

If I only had to take the last 2 parts, it would be a piece of cake, and I wouldn't trouble this list with such a small RTFM issue. The problem I have is much more complicated.
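
To make the point concrete, here is a minimal Python sketch of the "last 2 parts" approach. The function name last_two_labels and the host prefixes ("a1234", "www") are hypothetical and used only for illustration.

```python
def last_two_labels(fqdn):
    """Naive approach: keep only the final two dot-separated labels."""
    return ".".join(fqdn.lower().split(".")[-2:])


hosts = ("a1234.akamaistream.net", "www.blueyonder.co.uk", "www.demon.co.uk")
for host in hosts:
    # Both .co.uk hosts collapse to the same value, "co.uk".
    print(host, "->", last_two_labels(host))
```

This works for akamaistream.net but maps both UK hosts to "co.uk", which is exactly the loss of distinction described above.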

And no, this isn't a rhetorical question at all.

Raj

On 4/10/06, sol beach wrote:
> Rhetorical question -
>
> On what basis will the s/w "decide" whether 2 (akamaistream.net) parts or
> 3 (blueyonder.co.uk) parts is the "right" answer?
>
>
> On 4/10/06, rjamya wrote:
> >
> > Basically I am looking to isolate just the (distinct) domain name from
> > fully qualified domain names that you'd normally see in web-surfing.
> >
> > I am working on a couple of techniques, but it gets complicated since
> > TLDs differ in format and there is only so much you can do with
> > substr().
> >

--
----------------------------------------------
Got RAC?
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 10 2006 - 12:22:02 CDT