Home » RDBMS Server » Performance Tuning » Before Asking the Index Question (Oracle 9.2.0.3 Unix)
Before Asking the Index Question [message #346231] Sun, 07 September 2008 20:01 Go to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
I have been meaning for sometime to create this post which leads to Nested Loops vs. Hash Join access methods
in SQL Query Performance.

It starts with the quizical look I receive from many jr. programmers at my shop who have stopped by my office for aid in finding the answer to either:

"Am I indexing the correct field(s) in this Query?"
"What other field should I index in this Query?"
or - (the "index question") -
"What fields should I index to make this Query run faster?"

First I direct readers to a very informative artice on this site posted by rleishman http://www.orafaq.com/node/1981
entitled "Tuning High Volume SQL" which contains excellent nuts and bolts data analysis in the background of different query
methods (Starting with a humorous but so relevent anectdote
of Mr. & Mrs. Oracle in the Supermarket!)

99% of the time my answer to Index question or its variations above, after showing the plan path to the programmer, is
"DROP this index" (or force a hash join).

And in all cases the SQL involves table joins. For a while I got to be known as the "Anti-Index" guy at the shop but to show that such is not the case, that I always employ necessary indexes, I started to point out one thing to everyone who comes to me with their queries.

All the queries I am presented with in conjunction to the index question are table join queries. As a starting point in analyzing high volume queries - I ask programmers to first take note of this fact and that Nested Loops are usually the problem and not the solution.

As 90% of all our DB tables are high volume, It has become my rule of thumb to say, for our system, if you are doing a table join then use a hash join. I have my own benchmark in terms of expected rows returned as well.

If your WHERE clause limits the number of rows to be returned for a join key - lets use the following example


SELECT  p.CONTRACT,c.CLAIM_ID  from CLAIMS c,
                                    PROVIDERS p
WHERE c.TAX_ID = p.TAX_ID 
AND   c.TAX_ID IN {..............}


To address how many rows merits the hash join vs. the nested loop, I say for the above, if you want the CONTRACTS for more tax-id's than you would have the time to write out individually in the c.TAX_ID IN {.........} then use a hash join.

Knowing some other facts about our tables, 1% of volume is too much for Nested Loops.

In asserting that I am not the anti-index "villian" I explain that my "NO INDEX" advice always works because the queries presented to me are table join queries. In those regards it is not a contradiction that I tell others who seek my advice to not use indexes while I also talk about the extent of time I take to use proper indexes in my queries.

Its just that the question of "What Index to use" is always posed to me when the query is a table join. Often, when receiving a call to see if I am free to review some SQL, I ask my comrades to do one thing for me before popping by. Make a list of the following:

Is your query a table join. Yes? then identify:
- What criteria in your WHERE clause pertains only to Table A
- What criteria in your WHERE clause pertains only to Table B,etc.
Basically I want to see what criteria can be isolated to an individual table and what criteria defines the joins.

When there is no criteria specific to any one table that limits the rows to be joined, that the only criteria in the WHERE clause involves different foreign key fields to join the table then I usually need look no further regardless of size of the tables.

It becomes paridoxical here because my advice for hash-join rule of thumb now encompasses "When you are joining a lot of data or when you are joining a small amount of data (?!)".

When Table A and B are big, hash join
When Table A and B are small, hash join
When Table A is big and Table B is small, hash join
When Table A is small and Table B is big, hash join.

"Huh??", "So indexes are bad???"
"No I always find benefits for indexing when (repeat the same criteria for Table A and B)"

This blasphemy is resolved when I clarify - When your WHERE criteria onlyt involves the JOIN of the tables.

I summarize to them, and for this post, that our system's capability of in memory hash joins proves to out perform all nested loop joins.

The art of indexing comes into play with all criteria specific to one table that would reduce the number of records to be joined.

As the above pertains to the characteristics in the environment that I work in, I would follow the advice in Ross's article concerning the rule of thumb to be 1 - 10% volume of a table for using nested loops over hash joins.

But my point here is that in evaluating large queries that involve table joins, it is beneficial to take a step back and see what criteria is specific to the join and what criteria is limited to a specific table.

Distinguish the "WHERE A.something = B.something OR A.something_else = B.something_else" from the other criteria specific to the individual tables.. "Where A.Date between...",
"Where B.record_type in '1','2',...", etc.

When the WHERE clauses get large, this can be paramount for understanding what volume you expect back and an initial decision on "Should Indexes be employed".

An eye opener for me, from discussions here, was the importance of clustering and finding out your "best index". Jaws also drop when I ask "What order are the records in" in a specific table.

When all rows can be retrieved in a single block read off a range scan the performance rewards are like winning the lottery.

Regards
Harry



Re: Before Asking the Index Question [message #346316 is a reply to message #346231] Mon, 08 September 2008 03:24 Go to previous messageGo to next message
Olivia
Messages: 519
Registered: June 2008
Senior Member
Thanks for the information.

Regards,
Oli
Re: Before Asking the Index Question [message #346326 is a reply to message #346231] Mon, 08 September 2008 04:07 Go to previous messageGo to next message
Olivia
Messages: 519
Registered: June 2008
Senior Member

@harrysmall3: Thanks for the information.

I want to add a few points here:

In a production database with very large tables, it is not always easy to get your database to invoke hash joins without increasing the RAM regions that control hash joins. For large tables, hash joins requires lots of RAM.

The Oracle hash join is more memory intensive than a nested loop join. To be faster than a nested loop join, we must set the hash_area_size large enough to hold the entire hash table in memory (about 1.6 times the sum of the rows in the table).



Regards,
Oli
Re: Before Asking the Index Question [message #346441 is a reply to message #346326] Mon, 08 September 2008 10:22 Go to previous messageGo to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Olivia - after Oracle 8i (?) is the hash_area_size now something dynamically controlled by the PGA_AGGREGATE_TARGET settings? (hash_area_size setting still present but defunct?)

As I am at work for once with time to type and be at an Oracle terminal at the same time, I'd like to throw together a little benchmark test with the memory size we have.

Just because I know I can't store the 1.6 * size of the tables in memory for a couple Giga-byte size tables, and it would be interesting to see the performance even when Oracle has to write to temp (and how many times) in doing the join.

Thx for the feedback!
Harry
Re: Before Asking the Index Question [message #346613 is a reply to message #346441] Tue, 09 September 2008 02:35 Go to previous messageGo to next message
Olivia
Messages: 519
Registered: June 2008
Senior Member
Harry,


we must set the hash_area_size large enough to hold the entire hash table in memory (about 1.6 times the sum of the rows in the table).

If the Oracle hash join overflows the hash_area_size memory, the hash join will page into the TEMP tablespace, severely degrading the performance of the hash join.


To dynamically allocate the proper hash_area_size for your SQL query in terms of the size of your hash join driving table.

select
   'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
   dba_segments
where
   segment_name = upper('&1');




Regards,
Oli


Re: Before Asking the Index Question [message #346746 is a reply to message #346231] Tue, 09 September 2008 10:40 Go to previous message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
I'm not a DBA, I didnt know this was settable at the session level. More for me to play with Shocked

My interest being reducing that size and seeing the performance reductions as read/write to temp increase when their isnt enough in-memory space.

Thanx very much for the info, Olivia!
Harry
Previous Topic: Perfomance tuning of a view
Next Topic: Cursor versus select into for simple select
Goto Forum:
  


Current Time: Tue Nov 26 14:57:57 CST 2024