RE: Huge tempspace requirement - doesn't match explain plan
Date: Thu, 21 Aug 2008 21:50:55 -0500
Message-ID: <7ED53A68952D3B4C9540B4EFA5C76E36052452DB@CWYMSX04.Corp.Acxiom.net>
Cam, concerning why a hash join might consume way more temp space than expected, check your columns involved in the join. Frequently we find that temp space over usages is caused by 2 joined tables on non-unique columns ON BOTH SIDES of the join, which is a many-to-many and will consume tons of temp space.
Dave
Dave Herring, DBA | A c x i o m M I C S / C S O 630-944-4762 office | 630-430-5988 wireless | 630-944-4989 fax
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of cam Sent: Thursday, August 21, 2008 4:51 PM
To: oracle-l_at_freelists.org
Subject: Huge tempspace requirement - doesn't match explain plan
Hello all,
I'm having terrible trouble with a complex query that is consuming huge amounts of temp space. Sometimes.
This query is large with several nested queries and uses parallelism. One upshot of this is that the explain plans, already complex, are very challenging to read with the addition of the PX send and receive events - they're over a hundred lines long with deep nesting. In text form from awrsqrpt they are truncated halfway thru the tmpspc column which doesn't help since temp space is a specific concern.
The instance has 320 GB (!) of temp space assigned and this query is using it all, but the tmpspc report in the html form of the explain plan generated by awrsqrpt seems to show a requirement of less than 10 GB. I'm monitoring the tmp space usage with a little script which polls v$tempspace & v$sort_usage and I can see each of 64 parallel servers using 8GB each. the query runs for about 2 hours before failing with temp space exhaustion. Reducing parallelism doesn't seem to have any effect on the eventual temp requirement.
On a test instance on another but fairly similar box, the temp space usage is peaking at around 18 GB - this is with very similar driving table volumes, identical parallelism settings etc. Run completes in under 2 hours. Explain plans are different but it is difficult to track down how and why, given their sheer size and complexity. Also, different 'successful' runs with this smaller temp requirement on the test box are themselves using quite different explain plans.
This is 10.2.0.4, full table stats (i.e. estimate=>null, cascade=>true) have been gathered for all tables involved in the joins and BVP has been disabled with _OPTIM_PEEK_USER_BINDS=false to avoid histograms confusing the picture for the moment.
So, I suppose my questions are as follows for anyone willing to read this far... I know I haven't posted much to go on - I'm pretty sure I'm unable to post actual details for reasons of IPR etc.
- is there a workaround for awrsqrpt truncating horizontal output? I'm guessing it would just be to edit the script but haven't dared to yet... The HTML version is an acceptable workaround but I thought it might be nice to be able to use diff on text versions of plans. I'd also really like to be able to see the parent_id of a rowset - at depths like this, the indentation is close to useless for figuring out siblings and parents.
- Why would the explain plans estimate for temspace requirement be so wildly at odds with what is actually being consumed?
- *Very broadly speaking*, I'm assuming that this massive use of temp space is caused by too many hash joins of full table scans - is trying to force more nested loops a reasonable strategy to reduce temp requirement? I'd be inclined to use optimizer_index_cost_adj and _cache parameters for this since I want to avoid hints if possible. Is this a good way to encourage NLs?
- Are there any tools/techniques for organising/viewing/comparing large explain plans?
Cheers for any insights,
cam
The information contained in this communication is confidential, is intended only for the use of the recipient named above, and may be legally privileged.
If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.
If you have received this communication in error, please resend this communication to the sender and delete the original message or any copy of it from your computer system.
Thank You.
-- http://www.freelists.org/webpage/oracle-lReceived on Thu Aug 21 2008 - 21:50:55 CDT