Re: Why this query is sooo slow?!

From: <ctcgag_at_hotmail.com>
Date: 02 Jun 2004 23:28:16 GMT
Message-ID: <20040602192816.464$7l@newsreader.com>

ibm_97_at_yahoo.com (JZ) wrote:
> Oracle 9.2.0.5 for Linux on a relatively powerful server (4GB RAM and
> 4 CPU)
> Of course as always, I didn't get any help at all from Oracle tech
> support.
>
> This is the query:
>
> select
> TO_CHAR(n.apptimestamp,'') period,
> c.countryname,
> sum(n.eventcount)
> from
> high n
> ,geocountry_m c
> ,geoipaddress_m ip
> ,geoinformation_m info
> where
> n.apptimestamp >= '05/02/2004 05:00:00'
> and n.apptimestamp < '05/02/2004 06:00:00'
> and n.nfseverity in (4,5)
> and n.deviceid in (24,3,13,22,7,5,21,20,26,12,27,23,9,1,15,18,6,2,11,19,
> 17,10,16,14,25,4,8) and ip.blockid = info.blockid
> and c.countrycode = info.countrycode
> and n.sourceint >= ip.startip
> and n.sourceint <= ip.endip
> group by
> TO_CHAR(n.apptimestamp,'')
> ,c.countryname
> ;
>
> HIGH table has 3.7 millions rows, it only has 10500 rows which meet
> the following conditions in the query:
...
>
> geoipaddress_m table has 1.4 millions rows. It has the following
> columns:
>
> STARTIP NOT NULL NUMBER(38)
> ENDIP NOT NULL NUMBER(38)
> BLOCKID NOT NULL NUMBER(38)
I assume that STARTIP and ENDIP represent the end points of non-overlapping ranges. In other words,

select count(*) from geoipaddress_m a, geoipaddress_m b where a.startip<b.startip and not a.endip < b.startip

Will always return 0.

But of course Oracle doesn't know that. So for each qualifying row in HIGH, it will start at the right spot in geoipaddress_m's index, immediately find one match, and then futilely march all the way to the end of the index looking for more matches (0.7 million rows, on average). I bet you could get good results by re-writing the query such that the look-up into this table is done with a subselect that quits after it finds one row. Actually, you might have to reverse the order of endip and startip in the index to pull that off, because the way it is now I think it starts at the beginning of the index and marches upto the point just before it would quit anyway, at which point it finds the only match.

I don't see how Sybase could pull this query off so rapidly. Perhaps it has some special hack that is optimized for querying into non-overlapping ranges (But how would it know that that is what you mean in the first place?)

Also, if you have access to spatial data indexing option, you could use that to support this type of query cleanly (but I have no hands-on experience at doing so.)

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB

Received on Wed Jun 02 2004 - 18:28:16 CDT