|
|
Re: How Oracle optimizer choose joins (hash, Merge & nested loop join) [message #569577 is a reply to message #568993] |
Mon, 29 October 2012 22:56 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
Michel is right on the money. Your first priority should be to write a correct and easy to understand query. The optimizer will do the rest for you.
But since you asked, here is a little information about it. I assume you know what a HASH JOIN is and a NESTED LOOPS JOIN etc. This is to say you know the basics of query construction for Oracle? Michel provided reading for if you don't.
In short, Oracle applies some basic rules of thumb against the query text you supply in order to rewrite it into a different variation that can be better optimized.
Once it has this rewrite, it costs out different plans and selects the plan with the lowest cost.
Cost is an Oracle mashup number based on estimated individual costs added together of I/O, CPU, (and depending upon database version NETWORK / MEMORY CONSUMED / SYSTEM RESOURCE SATURATIONS).
Here is the important part...
For Oracle to cost correctly, you need to provide the optimizer with the right information. You do this by:
1) using a good relational design (3rd normal form data model, constraint definitions, proper data types and nullability)
2) doing the right pre-maintenance on your system (smart statistics collection, use of cpu costing, well reasoned database parameter settings)
3) writing a query that correctly defines the result you want
If you do these three things then only one query in one thousand will be a problem and most of them will be a problem only because of some optimizer limitation or special situation and there will almost always be a database feature you can exploit to fix their performance.
As for joins, here is a very simplified explanation:
In Oracle we mostly concern ourselves with two types of joins: 1) NESTED LOOPS JOIN and 2) HASH JOIN. HASH JOIN comes in two parts: 1) setup and 2) join. NESTED LOOPS JOIN has only one part, 1) join. HASH JOIN join part is almost always more efficient then NESTED LOOPS JOIN join part. But the setup and overhead of HASH JOIN means there is extra cost. As a result, Oracle will usually do a NESTED LOOPS JOIN for a small set of driving rows because this generally makes the NESTED LOOPS JOIN less expensive than a HASH JOIN because of the setup overhead of the HASH JOIN. But after a certain number of rows, HASH JOIN becomes way more efficient inspite of it setup overhead.
NESTED LOOPS JOIN requires very little memory.
HASH JOIN requires lots of memory.
This is another clue that NESTED LOOPS JOIN is much better suited for small joins whereas HASH JOIN is better at large joins.
Remember, I said the above was a vey simple explanation. It leaves out many many details and I am trying to avoid controversy where not everyone agrees on specific opinions.
Kevin
[Updated on: Mon, 29 October 2012 23:01] Report message to a moderator
|
|
|