Algorithm or ideas wanted for creative text parsing
Basically I am looking to isolate just the (distinct) domain name from
fully qualified domain names that you'd normally see in web-surfing.
I am working on a couple of techniques, but it gets complicated since TLDs differ in format and there is only so much you can do with substr().
sample data ...
a836.v8519e.c8519.g.vm.akamaistream.net
a705.l1923962123.c19239.n.lm.akamaistream.net
db.c7.bf.a0.top.list.ru
a1657.l1923962104.c19239.n.lm.akamaistream.net
a1181.v21080b.c21080.g.vm.akamaistream.net
dl1.games.vip.scd.yahoo.com
lcp.mud.us.music.yahoo.com
www.celhs.osceola.k12.fl.us
w.s0.gc.sj.ipixmedia.com
w.s0.gc.sj.ipixmedia.com
v.s0.gc.sj.ipixmedia.com
lib1.store.vip.sc5.yahoo.com
www.twingroves.district96.k12.il.us
www.twingroves.district96.k12.il.us
www.the-simpsons.hpg.ig.com.br
www.schools.pinellas.k12.fl.us
www.rails4days.pwp.blueyonder.co.uk
www.rails4days.pwp.blueyonder.co.uk
www.garrp.dhr.state.ga.us
www.celhs.osceola.k12.fl.us
www.williamrobertson.pwp.blueyonder.co.uk
www.williamrobertson.pwp.blueyonder.co.uk
lcp.mud.us.music.yahoo.com
c.s0.gc.sj.ipixmedia.com
c.s0.gc.sj.ipixmedia.com
and by some magic the output should be ....
akamaistream.net
apple.com
yahoo.com
fl.us
ipixmedia.com
il.us
ig.com.br
blueyonder.co.uk
ga.us
yellowpagecity.com
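
To make the rule behind that output concrete, here is a rough sketch in Python (used only to illustrate the logic, not as the SQL solution). It assumes the rule is: keep the last two labels, unless the name ends in a two-letter country code preceded by a generic label such as com or co, in which case keep the last three. The names GENERIC_SECOND_LEVEL, registered_domain and distinct_domains are hypothetical, and the set of generic labels is a guess that would need extending for real data.

```python
# Labels treated as registry suffixes when they sit directly under a
# two-letter country-code TLD (e.g. com.br, co.uk). This set is an
# assumption for illustration and is certainly incomplete.
GENERIC_SECOND_LEVEL = {"com", "co", "org", "net", "gov", "edu", "ac"}


def registered_domain(fqdn):
    """Return the trailing 2 or 3 labels of a host name, per the assumed rule."""
    labels = fqdn.strip().lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    tld, second = labels[-1], labels[-2]
    # ccTLD with a generic second level: keep three labels (ig.com.br).
    if len(tld) == 2 and second in GENERIC_SECOND_LEVEL:
        return ".".join(labels[-3:])
    # Everything else: keep two labels (yahoo.com, fl.us).
    return ".".join(labels[-2:])


def distinct_domains(hosts):
    """Distinct domains in first-seen order."""
    return list(dict.fromkeys(registered_domain(h) for h in hosts))


if __name__ == "__main__":
    sample = [
        "a836.v8519e.c8519.g.vm.akamaistream.net",
        "lcp.mud.us.music.yahoo.com",
        "www.celhs.osceola.k12.fl.us",
        "www.the-simpsons.hpg.ig.com.br",
        "www.rails4days.pwp.blueyonder.co.uk",
        "www.rails4days.pwp.blueyonder.co.uk",
    ]
    for domain in distinct_domains(sample):
        print(domain)
```

Under this rule, names such as www.celhs.osceola.k12.fl.us collapse to fl.us, which matches the output listed above. Any TLD format outside the guessed set would need its own entry.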
Any ideas, thoughts? I'd prefer to do this in SQL if possible; otherwise in PL/SQL. The data is already in a 10.1.0.4 database.
Thanks in advance
Raj