Home » RDBMS Server » Performance Tuning » Indexing question
Indexing question [message #185201] Mon, 31 July 2006 08:32 Go to next message
Fred Easey
Messages: 73
Registered: January 2005
Member

Hi,

I have a load of IP Numbers that I want to check against an IP range to establish IP country.

My list of IP numbers has about 10k rows. My list of ranges (in the format ip_number_from, ip_number_to and country_name) has approx 70k rows.

So my query is something like:

select
   b.country_name,
   count(*)
from
   big_ip_list a,
   ip_country_lookup b
where
   a.ip_number between b.ip_number_from and b.ip_number_to
group by
   b.country_name


so my question is, how should I index my lookup table to get the best performance? (nb- none of the ip_ranges on the lookup table overlap each other)

Cheers,

Fred

[Updated on: Mon, 31 July 2006 08:42]

Report message to a moderator

Re: Indexing question [message #185217 is a reply to message #185201] Mon, 31 July 2006 10:29 Go to previous messageGo to next message
JSI2001
Messages: 1016
Registered: March 2005
Location: Scotland
Senior Member
Is the IP unique? presumably it is. Just create a unique index on that column, I wouldn't even index your range columns. It's a pretty small table and Oracle would probably choose to perform a FTS on it anyway.

Jim
Re: Indexing question [message #185223 is a reply to message #185201] Mon, 31 July 2006 11:14 Go to previous messageGo to next message
Fred Easey
Messages: 73
Registered: January 2005
Member

Yeah that seems to work ok. About 3 minutes all told.

Cheers.
Re: Indexing question [message #185277 is a reply to message #185223] Mon, 31 July 2006 21:50 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
There is another way to do it if you can guarantee no overlaps, but you need to change your SQL.

- Create an index on the IP_NUMBER_FROM

- For extra oomph, add country and ip_number_to to the index, although they are not strictly necessary - don't expect OOM performance improvement - maybe 5-20%.

- Gather stats on the table and index (this is important - it won't work if you don't gather stats).

- Use the SQL:
select
   b.country_name,
   count(*)
from
   big_ip_list a,
   ip_country_lookup b
where
   b.ip_number_from = (
   SELECT max(ip_number_from)
   FROM   big_ip_list
   WHERE  ip_number_from <= a.ip_number
)
and
   a.ip_number between b.ip_number_from and b.ip_number_to
group by
   b.country_name


This exploits a special feature of the CBO to pick the MIN/MAX of a non-unique range with the equivalent I/O of a Unique Index Scan.

Jim's suggestion is good, but scalability is a problem because it (usually) performs Nested Loops lookups on the big table (it can be forced with hints to perform a SORT-MERGE join instead). The above alternative allows you to FTS the big table (much faster way of reading a big table) and then Nested Loops join the smaller (cacheable) table.

Ross Leishman
Re: Indexing question [message #185809 is a reply to message #185277] Thu, 03 August 2006 13:24 Go to previous messageGo to next message
srinivas4u2
Messages: 66
Registered: June 2005
Member
Ross,

If I am not wrong, nested loops is a join method and FTS an access method right?

So the big table and small table could be joined via a nested loops or sort merge or hash join. And the table could be accessed via an index or FTS or patition scan if it's partitioned. Please correct me if I am wrong.

Also can you please explain in detail why you say scalability would be better with your approach?

Thanks,
Srinivas
Re: Indexing question [message #185865 is a reply to message #185809] Thu, 03 August 2006 21:25 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
>> If I am not wrong, nested loops is a join method and FTS an access method right?

Correct

>> So the big table and small table could be joined via a nested loops or sort merge or hash join.

Yes to NL and Sort-Merge. No to Hash. Hash joins are only used for EQUALS joins.

>> And the table could be accessed via an index or FTS or patition scan if it's partitioned. Please correct me if I am wrong.

Correct

>> Also can you please explain in detail why you say scalability would be better with your approach?

A nested loop join using a large table as the outer (non-driving) table is not scalable because you read the entire big table via an index, which is heaps slower than a full table scan. Reading a big table via an index is between 10 and 100 times slower than reading the same rows via FTS.

If Jim's SQL is using Sort-Merge, it will use FTS (good) but it will need to do a sort (bad). If it uses Nested-Loops, it will FTS on the small table and index access the big table (bad). If the NL join was driving off the big-table, it would do an FTS on the big table (good) and an unbounded index range scan on the small table (bad).

My SQL uses an FTS on the big table (good), and a unique index access on the small table (very good). The index access on the small table is not a problem because it is small enough to live entirely in cache, so even though you access it thousands of times via an index, they are not disk-access which is what kills the indexed access on the big table.

Ross Leishman
Re: Indexing question [message #185977 is a reply to message #185201] Fri, 04 August 2006 10:27 Go to previous messageGo to next message
srinivas4u2
Messages: 66
Registered: June 2005
Member
Thanks for the response Ross!

A nested loop join using a large table as the outer (non-driving) table is not scalable because you read the entire big table via an index, which is heaps slower than a full table scan. Reading a big table via an index is between 10 and 100 times slower than reading the same rows via FTS.

Yes, true. But this would be applicable only if the number of rows returned is say > 15 or 20% right? If the rows returned is less than than we could use index right?

If Jim's SQL is using Sort-Merge, it will use FTS (good) but it will need to do a sort (bad). If it uses Nested-Loops, it will FTS on the small table and index access the big table (bad). If the NL join was driving off the big-table, it would do an FTS on the big table (good) and an unbounded index range scan on the small table (bad).

How are you able to tell the join method without seeing the explain plan?

And how does the change in join method change the access path? (Sort-Merge, it will use FTS (good) and Nested-Loops, it will FTS on the small table and index access the big table (bad))

My SQL uses an FTS on the big table (good), and a unique index access on the small table (very good). The index access on the small table is not a problem because it is small enough to live entirely in cache, so even though you access it thousands of times via an index, they are not disk-access which is what kills the indexed access on the big table.

And can you please tell why your query would use unique index access while Jim's unbounded index range scan?

Thanks,
Srinivas
Re: Indexing question [message #186189 is a reply to message #185977] Sun, 06 August 2006 21:08 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
srinivas4u2 wrote on Sat, 05 August 2006 01:27

Thanks for the response Ross!

Quote:

A nested loop join using a large table as the outer (non-driving) table is not scalable because you read the entire big table via an index, which is heaps slower than a full table scan. Reading a big table via an index is between 10 and 100 times slower than reading the same rows via FTS.


Yes, true. But this would be applicable only if the number of rows returned is say > 15 or 20% right? If the rows returned is less than than we could use index right?



Correct, but the OP's query was an unconstrained join.

srinivas4u2 wrote on Sat, 05 August 2006 01:27


Quote:

If Jim's SQL is using Sort-Merge, it will use FTS (good) but it will need to do a sort (bad). If it uses Nested-Loops, it will FTS on the small table and index access the big table (bad). If the NL join was driving off the big-table, it would do an FTS on the big table (good) and an unbounded index range scan on the small table (bad).


How are you able to tell the join method without seeing the explain plan?


You can't.

srinivas4u2 wrote on Sat, 05 August 2006 01:27


And how does the change in join method change the access path?
Quote:

(Sort-Merge, it will use FTS (good) and Nested-Loops, it will FTS on the small table and index access the big table (bad))


The nature of Sort-Merge is that it will (typically) use FTS on both tables unless an index exists with all columns required to resolve the query, usually with a full or fast-full scan. It is still possible to perform an index range scan with a Sort-Merge, but only with constant-comparison predicates, never with join predicates.

srinivas4u2 wrote on Sat, 05 August 2006 01:27


Quote:

My SQL uses an FTS on the big table (good), and a unique index access on the small table (very good). The index access on the small table is not a problem because it is small enough to live entirely in cache, so even though you access it thousands of times via an index, they are not disk-access which is what kills the indexed access on the big table.

And can you please tell why your query would use unique index access while Jim's unbounded index range scan?



Jim's uses a BETWEEN to join to the small table; this expands to
a.ip_number >= b.ip_number_from 
and a.ip_number <= b.ip_number_to
If (b) is the outer table, then it could range scan an index with ip_number_from as the leading column, but then it would have to read every matching row in the index and filter those with a.ip_number > b.ip_number_to - even if ip_number_to is in the index.
ie. If there are 200 Million rows in the "small" table with a.ip_number >= b.ip_number_from, but only 5 of those have a.ip_number <= b.ip_number_to, Oracle will still range-scan the 200-Million rows.
The only way a second or subsequent column in the index can be included in a range-scan is if all leading columns have been used with equals conditions.

Ross Leishman

Re: Indexing question [message #186343 is a reply to message #185201] Mon, 07 August 2006 10:17 Go to previous message
srinivas4u2
Messages: 66
Registered: June 2005
Member
Thanks Ross for your reply. Greatly appreciate it!

-Srinivas
Previous Topic: parallel execution
Next Topic: Replication Issues with external tables
Goto Forum:
  


Current Time: Wed Nov 27 05:52:46 CST 2024