Two 30 million records tables join query needs tuning [message #349366]
Sat, 20 September 2008 13:38
d.dineshkumar
Messages: 211 Registered: April 2005 Location: Kolkatta
Senior Member
Hi,
We have two tables, each with nearly 30 million rows, and neither is partitioned.
The query gets the SUM of a number column after joining these two tables. The plan shows a nested-loops join of the two tables: the first table is accessed via its primary key, and the second via an index range scan.
In fact, the query gets the sum of the number column over the date range 1-Jan-2008 to the end of the previous month.
How should an aggregate query like this be handled?
Will partitioning these two tables help? Is there any specific way to handle aggregate queries?
Thanks for your time.
Regards
Re: Two 30 million records tables join query needs tuning [message #349553 is a reply to message #349437]
Mon, 22 September 2008 03:45
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
If you are joining large proportions of the tables (e.g. >1M rows from each), then a hash join will certainly out-perform a nested-loops join.
Given that your tables are quite large, a hash join will use quite a bit of memory and spill over to TEMP space. You may even find that you exceed available TEMP space and the query fails.
At that point you can do two things:
- Revert to nested loops. Much slower, but fully scalable. You can make it a bit quicker by placing all of the columns you need to select from the subordinate table in the join-key index.
- Equi-partition the tables and exploit partition-wise joins. See this article for an explanation of why this is the fastest, most scalable join method.
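As a rough sketch of the two options (all table, column, and index names here are hypothetical, since the original schema was not posted):

```sql
-- Hypothetical schema: ORDERS (parent) and ORDER_LINES (child).
-- Option 1: keep nested loops, but make the join-key index "covering"
-- so the subordinate table's data blocks never need to be visited:
CREATE INDEX order_lines_cov_ix
  ON order_lines (order_id, order_date, amount);

-- Option 2: ask the optimizer for a hash join instead:
SELECT /*+ USE_HASH(l) */ SUM(l.amount)
FROM   orders o
JOIN   order_lines l ON l.order_id = o.order_id
WHERE  l.order_date >= DATE '2008-01-01'
AND    l.order_date <  TRUNC(SYSDATE, 'MM');  -- end of previous month
```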
Ross Leishman
Re: Two 30 million records tables join query needs tuning [message #350763 is a reply to message #350727]
Fri, 26 September 2008 04:44
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
I've done quite a bit of benchmarking with clusters and they are becoming less and less useful.
Oracle's hash-join memory management is so s--t hot that hash joins easily out-perform cluster-joins (whose memory management is performed in the clunky buffer cache). On the flip side, cluster joins are naturally scalable, whereas hash joins are dependent on PGA availability. This can be mitigated for hash joins by hash partitioning and partition-wise joins.
It makes a bit of sense logically: the benefit in co-locating related rows in a single block (clustering) is to avoid reading multiple blocks when you need all those rows together. However in a high-volume query, you need to read a significant proportion of the blocks anyway, so the benefit of co-locating is lost.
The biggest problem with clusters is that you cannot partition.
The biggest benefit of clustering is in OLTP applications that need to read a small number of related rows. By co-locating those rows, each small, directed query can read a single block instead of 5 or 6. Multiply this across 100 users on your system and you have a real load reduction.
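For illustration, a minimal cluster definition co-locating a parent and child table on the join key might look like the following (names are hypothetical, not from the original poster's schema):

```sql
-- Hypothetical example: ORDERS and ORDER_LINES stored in one cluster,
-- so an order and its lines share the same block(s).
CREATE CLUSTER order_cluster (order_id NUMBER) SIZE 1024;
CREATE INDEX order_cluster_ix ON CLUSTER order_cluster;

CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  order_date  DATE
) CLUSTER order_cluster (order_id);

CREATE TABLE order_lines (
  order_id  NUMBER,
  amount    NUMBER
) CLUSTER order_cluster (order_id);
```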
Ross Leishman
Re: Two 30 million records tables join query needs tuning [message #350902 is a reply to message #350804]
Sat, 27 September 2008 03:52
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
Senior Member
The solution to super-high-volume, scalable joins is equi-partitioning or equi-sub-partitioning, followed by partition-wise hash joins.
It's probably fair to say that if you can't partition or sub-partition your tables, then you can't cluster them either.
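A sketch of that approach (hypothetical names; 16 partitions chosen arbitrarily): hash-partition both tables identically on the join key, so each pair of matching partitions can be joined independently.

```sql
-- Both tables hash-partitioned identically on the join key, enabling
-- a partition-wise hash join between matching partition pairs, which
-- bounds the memory needed for each individual hash join.
CREATE TABLE orders (
  order_id    NUMBER,
  order_date  DATE
) PARTITION BY HASH (order_id) PARTITIONS 16;

CREATE TABLE order_lines (
  order_id  NUMBER,
  amount    NUMBER
) PARTITION BY HASH (order_id) PARTITIONS 16;
```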
Ross Leishman
Re: Two 30 million records tables join query needs tuning [message #350923 is a reply to message #349366]
Sat, 27 September 2008 12:04
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
Senior Member
Thanks Ross. This is clearly an area where you have more experience than I do. I have not done gobs of partitioning (nor clustering, for that matter). Once again, what you say sounds most reasonable. I am clearly going to do some refresher reading about hashing and partitioning.
To remark once more on clustering: I never had much success with it. But each group of people I remember telling me they did make it work had the same two comments about it every time (different companies, different groups of people, same comments):
Quote:
1) They never clustered just two tables together. They always clustered an entire hierarchy of tables. Four to seven tables seemed to be the normal thing to do, and typically there was only one cluster in their system, which was the focus of their application.
2) They all said that their system would not be workable without the clustering solution. They simply could not get the data put together fast enough for their respective purposes.
This is all just FYI.
I thank you once again Ross for giving me something interesting to read while my wife watches her Saturday Morning "girly shows".
Kevin