Two 30 million records tables join query needs tuning (Oracle 10g, AIX)
Two 30 million records tables join query needs tuning [message #349366] Sat, 20 September 2008 13:38
d.dineshkumar
Messages: 211
Registered: April 2005
Location: Kolkatta
Senior Member
Hi,

We have two tables, both with nearly 30 million rows, and neither is partitioned.

The query gets the SUM of a number column after joining these two tables. The execution plan shows a nested-loop join of the two tables: the first table is accessed via its primary key, and the second via an index range scan.

In fact, the query gets the sum of the number column over the date range 1-Jan-2008 to the end of the previous month.
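
The query has roughly this shape (the table and column names below are made up for illustration; the real ones differ):

SELECT SUM(c.amount)
FROM   parent_tab p
       JOIN child_tab c
         ON c.parent_id = p.parent_id
WHERE  p.txn_date >= DATE '2008-01-01'
AND    p.txn_date <  TRUNC(SYSDATE, 'MM');  -- first day of the current month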

How should an aggregate query like this be handled?

Would partitioning these two tables help? Is there any specific way to handle aggregate queries?

Thanks for your time.

Regards

Re: Two 30 million records tables join query needs tuning [message #349373 is a reply to message #349366] Sat, 20 September 2008 15:00
Kriptas
Messages: 118
Registered: June 2008
Senior Member
Post the SELECT statement and its execution plan.
Try a hash join.
Partitioning may help, but to choose a partitioning scheme you need to know a lot about the conditions and SELECT statements that most often access the table.
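
For example, something like this (hypothetical table names and aliases; the USE_HASH hint asks the optimizer to join the named table with a hash join, and DBMS_XPLAN.DISPLAY shows the plan just explained):

-- ask for a hash join instead of nested loops
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(c) */ SUM(c.amount)
FROM   parent_tab p
       JOIN child_tab c
         ON c.parent_id = p.parent_id
WHERE  p.txn_date >= DATE '2008-01-01'
AND    p.txn_date <  TRUNC(SYSDATE, 'MM');

-- display the execution plan for the statement just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);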
Re: Two 30 million records tables join query needs tuning [message #349437 is a reply to message #349373] Sun, 21 September 2008 16:33
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
I agree, other options should be investigated before making a table modification decision.

My first instinct as well would be to compare against a hash join, given the number of rows to process in a nested loop.

Regards
Harry
Re: Two 30 million records tables join query needs tuning [message #349553 is a reply to message #349437] Mon, 22 September 2008 03:45
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
If you are joining large proportions of the tables (e.g. >1M rows from each), then a HASH join will certainly out-perform a nested-loops join.

Given your table sizes are quite large, a hash join will use quite a bit of memory and spill over to TEMP space. You may even find that you exceed available TEMP space and the query fails.

At this point you can do two things:
- Revert to nested loops. Much slower, but it is fully scalable. You can make it a bit quicker by placing all of the columns you need to select from the subordinate table in the join-key index, so the table itself never has to be visited.
- Equi-partition the tables and exploit partition-wise joins; a sketch of both options follows below. See this article for an explanation of why this is the fastest, most scalable join method.
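
A rough sketch of both options, with hypothetical names throughout:

-- Option 1: put the join key and the summed column in one index,
-- so nested loops never have to visit the child table itself
CREATE INDEX child_cover_ix ON child_tab (parent_id, amount);

-- Option 2: hash equi-partition BOTH tables on the join key, so the
-- optimizer can do a full partition-wise join, one small partition
-- pair at a time, instead of one giant hash build
CREATE TABLE parent_tab_p
PARTITION BY HASH (parent_id) PARTITIONS 16
AS SELECT * FROM parent_tab;

CREATE TABLE child_tab_p
PARTITION BY HASH (parent_id) PARTITIONS 16
AS SELECT * FROM child_tab;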

Ross Leishman
Re: Two 30 million records tables join query needs tuning [message #349594 is a reply to message #349366] Mon, 22 September 2008 05:31
joshirish
Messages: 3
Registered: September 2008
Junior Member
"Infact the query gets the sum of the number column from the date range of 1-jan-2008 to the end of the previous month"

sounds like a decision support/reporting kind of query.

Have you tried materialized views ?
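
For example (hypothetical names; with ENABLE QUERY REWRITE the optimizer can answer the SUM from the pre-aggregated rows instead of joining the two big tables):

CREATE MATERIALIZED VIEW amount_by_month_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT TRUNC(p.txn_date, 'MM') AS txn_month,
       SUM(c.amount)           AS total_amount
FROM   parent_tab p
       JOIN child_tab c
         ON c.parent_id = p.parent_id
GROUP  BY TRUNC(p.txn_date, 'MM');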

--Sjo
Re: Two 30 million records tables join query needs tuning [message #350727 is a reply to message #349366] Fri, 26 September 2008 01:01
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
I am guessing your problem is that the dates are in the parent table. Thus you must find the parent rows you want based on dates, acquire the list of their primary keys, and then do lookups with those primary-key values against the corresponding foreign key in the child table. If so, then I am not sure how partitioning can help you.

Sorry I have no solution for you at this time, but a piece of SQL might help us all.

This might be a situation in which the very old CLUSTERING technique would work. I never had any luck with clustering, but there are people I have talked with in the past who swore by it. They were adamant that under the right circumstances clustering was very useful, and in fact their systems would never have worked without it. They typically, however, had a hierarchy of six or seven tables that they clustered together. Still, it's worth a look, I suppose.
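
For reference, the basic shape is something like this (hypothetical names; both tables share the cluster key, so related parent and child rows land in the same blocks):

-- the cluster and its mandatory cluster index come first
CREATE CLUSTER parent_child_clu (parent_id NUMBER) SIZE 8192;
CREATE INDEX parent_child_clu_ix ON CLUSTER parent_child_clu;

-- then both tables are created inside the cluster, keyed on the join column
CREATE TABLE parent_tab (
  parent_id NUMBER PRIMARY KEY,
  txn_date  DATE
) CLUSTER parent_child_clu (parent_id);

CREATE TABLE child_tab (
  child_id  NUMBER PRIMARY KEY,
  parent_id NUMBER NOT NULL,
  amount    NUMBER
) CLUSTER parent_child_clu (parent_id);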

Good luck, Kevin
Re: Two 30 million records tables join query needs tuning [message #350763 is a reply to message #350727] Fri, 26 September 2008 04:44
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I've done quite a bit of benchmarking with clusters and they are becoming less and less useful.

Oracle's hash-join memory management is so s--t hot that hash joins easily out-perform cluster joins (whose memory management is performed in the clunky buffer cache). On the flip side, cluster joins are naturally scalable, whereas hash joins are dependent on PGA availability. This can be mitigated for hash joins by hash partitioning and partition-wise joins.

It makes a bit of sense logically: the benefit in co-locating related rows in a single block (clustering) is to avoid reading multiple blocks when you need all those rows together. However in a high-volume query, you need to read a significant proportion of the blocks anyway, so the benefit of co-locating is lost.

The biggest problem with clusters is that you cannot partition.

The biggest benefit of clustering is in OLTP applications that need to read a small number of related rows. By co-locating those rows, each small, directed query can read a single block instead of 5 or 6. Multiply this across 100 users on your system and you have a real load reduction.

Ross Leishman
Re: Two 30 million records tables join query needs tuning [message #350804 is a reply to message #350763] Fri, 26 September 2008 07:27
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Thanks Ross, good stuff. But we still do not know if this guy can even partition his dataset effectively to gain any benefit. If his problem is as I described, then it may not be possible.

I am watching this one to see who figures it out as I can likely use the answer if there is one.

So... I agree with all you said. I just think it is likely, given the size of the data he is pushing, that his hash process will go to TEMP space, in which case hashing is hosed.
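
One way to check whether that is actually happening (assuming access to the V$ views; one-pass and multi-pass executions mean the hash join could not complete in memory and went to TEMP):

-- work areas that spilled: onepass/multipass executions > 0
SELECT sql_id, operation_type,
       optimal_executions, onepass_executions, multipasses_executions
FROM   v$sql_workarea
WHERE  operation_type = 'HASH-JOIN'
AND    (onepass_executions > 0 OR multipasses_executions > 0);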

Kevin
Re: Two 30 million records tables join query needs tuning [message #350902 is a reply to message #350804] Sat, 27 September 2008 03:52
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The solution to super-high-volume, scalable joins is equi-partitioning or equi-sub-partitioning, followed by partition-wise hash joins.
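
For the original poster that could mean a composite scheme: RANGE on the filter date, HASH sub-partitions on the join key (hypothetical names and boundaries):

CREATE TABLE parent_tab_cp (
  parent_id NUMBER,
  txn_date  DATE
)
PARTITION BY RANGE (txn_date)
SUBPARTITION BY HASH (parent_id) SUBPARTITIONS 16
( PARTITION p2008h1 VALUES LESS THAN (DATE '2008-07-01'),
  PARTITION p2008h2 VALUES LESS THAN (DATE '2009-01-01'),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- if the child table is hash-(sub)partitioned on parent_id with the
-- same number of (sub)partitions, the join can run partition-wise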

It's probably fair to say that if you can't partition or sub-partition your tables, then you can't cluster them either.

Ross Leishman
Re: Two 30 million records tables join query needs tuning [message #350923 is a reply to message #349366] Sat, 27 September 2008 12:04
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Thanks Ross. This is clearly an area where you have more experience than I do. I have not done gobs of partitioning (nor clustering, for that matter). Once again, what you say sounds most reasonable. I am clearly going to do some refresher reading about hashing and partitioning.

To remark once more on clustering: I never had much success with it. Every group of people I remember telling me they did make it work had the same two comments about it (different companies, different groups of people, same comments):

Quote:
1) They never clustered just two tables together. They always clustered an entire hierarchy of tables; four to seven tables seemed to be a normal thing to do, and typically there was only one cluster in their system, which was the focus of their application.

2) They all said that their system would not be workable without the clustering solution. They simply could not get the data put together fast enough for their respective purposes.

This is all just FYI.

I thank you once again Ross for giving me something interesting to read while my wife watches her Saturday Morning "girly shows".

Kevin