Re: Error related to hash memory

From: Jared Still <jkstill_at_gmail.com>
Date: Wed, 8 Dec 2021 12:28:17 -0800
Message-ID: <CAORjz=P5j7ZRvH+1GjVREbL+2G-3BdTsmgJ4+NVScyVguZSy8g_at_mail.gmail.com>



Hi Jonathan,

The issue I saw with this occurred after the hash group by spilled to disk.

All temp space would be consumed, then the process would crash.

Regardless of how much temp space was provided, it would all be consumed, and a crash would ensue.

I've only seen this happen on two occasions. I wrote it off as a bug that could not easily be duplicated, and temporarily disabled hash group by until that process completed.
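
For anyone searching the archives, a minimal sketch of one way to disable
hash group by at the session or statement level. The underscore parameter
and hint names are given from memory, so verify them on your own version
(and check with support before touching hidden parameters):

  -- session-level: fall back to SORT GROUP BY
  alter session set "_gby_hash_aggregation_enabled" = false;

  -- statement-level alternative
  select /*+ no_use_hash_aggregation */ some_col, count(*)
  from   some_table        -- hypothetical table
  group  by some_col;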

Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist
Principal Consultant at Pythian
Oracle ACE Alumni
Pythian Blog: http://www.pythian.com/blog/author/still/
Github: https://github.com/jkstill

On Thu, Dec 2, 2021 at 2:11 AM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
> The error message is
> ORA-32690: Hash Table Infrastructure ran out of memory
> Two things to note about the wording:
> It's NOT a message about being unable to allocate PGA memory.
> It's not a message about a hash JOIN, it's about a hash TABLE.
>
> I do not know if I'm reading too much into the words, and I don't know
> exactly how Oracle handles hash aggregation, so I won't be able to answer
> any questions about the detailed mechanisms that might be causing this
> error to appear; but I do know something about how hash joins work and, in
> particular, how the code handles the in-memory BUILD TABLE component if it
> won't fit in memory.
>
> If you look at the Estimated and Actual rows for the hash aggregation you
> can see that the estimate is out by a factor of over 1,000. For hash joins
> Oracle sets up a structure for the build table that anticipates splitting
> it into partitions that can spill to disc if the volume of data is too
> large to handle in memory (hence one-pass and multipass hash joins). It's
> likely that Oracle does something similar for hash aggregation but doesn't
> allow for the estimate being out by a factor of 1,000, and the enormous
> discrepancy might have resulted (for example) in Oracle being unable to
> keep track of data that it had to dump to disc.
>
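Side note, in case it helps while digging: one way to pull the estimated
vs. actual row counts for the statement after a run, assuming the cursor is
still in the library cache. The sql_id is a placeholder, and A-Rows are only
captured with statistics_level = all or the gather_plan_statistics hint:

  select *
  from   table(dbms_xplan.display_cursor(
                 sql_id          => '&your_sql_id',   -- placeholder
                 cursor_child_no => null,
                 format          => 'ALLSTATS LAST'));
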
> I can see two options (apart from the obvious "set the parameter according
> to the workaround suggested" so that the aggregation is a sort aggregate).
> The first is simply to find out why the estimate of rows is so bad and
> allow Oracle to work out a better estimate. Since there's a non-mergeable
> view in the query (operation 20) you may be able to inject a cardinality
> hint (or, to be fussy, an opt_estimate hint) that simply says "there will be
> 168000000 rows coming out of this query block". That might allow Oracle to
> create a suitable internal structure to handle the data volume correctly.
> However, I note that the MAX MEM for the hash aggregation is 2GB and that
> may be a hard limit (assuming it is an accurate report rather than a
> round-up from something much less), so option 2 is to reduce the absolute
> memory requirement of the hash aggregation.
>
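A minimal sketch of the kind of hint injection meant here, assuming the
non-mergeable view has an alias you can reference; the alias and the row
count are illustrative only (the fussier opt_estimate form is undocumented
and its exact syntax varies by version, so it is not shown):

  select /*+ cardinality(v 168000000) */   -- v = alias of the non-mergeable view
         ...
  from   my_non_mergeable_view v           -- hypothetical name
  ...
  group  by ...;
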
>
> Option 2, reducing the memory requirement: the plan is going serial for
> the hash aggregation and it would be good to find out why. If it runs
> parallel the problem may disappear anyway. However, I note that there appear
> to be 8 scalar subqueries in the select list (operations 2 - 18), and I
> think they are growing the row length BEFORE the aggregation takes place.
> If some of these scalar subqueries can be made to operate AFTER the hash
> aggregation then you could be aggregating a much smaller volume of data, and
> the memory requirement and the need to track a large volume spilled to disc
> could be reduced to a level that bypasses the error.
>
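A structural sketch of that kind of rewrite - aggregate first in an inline
view, then attach the scalar subqueries to the (much smaller) aggregated
result; all object and column names here are made up:

  select v.grp_col,
         v.total,
         (select d.description             -- scalar subquery now runs once per
            from dim_table d               -- aggregated row, not once per detail row
           where d.id = v.grp_col) description
  from  (select t.grp_col, sum(t.amount) total
         from   detail_table t
         group  by t.grp_col) v;
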
> I happen to have published a note only yesterday with an example that
> rewrote a query to do a sort before collecting scalar subquery column
> values: https://jonathanlewis.wordpress.com/2021/12/01/best-practice/ At
> the end of it I pointed out that generally "you don't want to do this kind
> of thing unless it's for very special circumstances" - but you may have
> exactly those circumstances. Another reason for considering this strategy
> is that maybe there's something about one (or more) of your scalar
> subqueries that forces your hash aggregation to serialise - e.g. do you call
> a pl/sql function that is not parallel enabled? - and moving one or more
> scalar subqueries out of the hash aggregation may result in parallel
> execution taking place.
>
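On the pl/sql point, a bare-bones illustration of declaring a function
parallel_enable so that it does not block parallel execution of the calling
statement (name and body are placeholders, and the function must genuinely
be safe to run in parallel slaves):

  create or replace function f_lookup (p_id number)
  return varchar2
  parallel_enable              -- allow execution by parallel query slaves
  deterministic                -- only if the result really is deterministic
  is
  begin
      return to_char(p_id);    -- placeholder body
  end;
  /
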
> (As others have pointed out - you're doing a serial insert anyway, so maybe
> a simple "enable parallel dml" would bypass the issue.)
>
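A sketch of the parallel dml variant; the degree and hints are illustrative:

  alter session enable parallel dml;

  insert /*+ append parallel(t 8) */ into target_table t
  select /*+ parallel(s 8) */ ...
  from   source_table s
  ...;

  commit;   -- a direct-path insert must be committed before the table is queried again
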
> (On a side note: I am curious about the operations 4 - 11 of your plan.
> The code to generate the plan "depth" has some errors in 11g, so I'd like
> to know whether that part of the plan represents a single scalar subquery
> with a couple of cascading subqueries, or a scalar subquery which uses
> decode()/case with further scalar subqueries embedded in the decode()/case,
> or has some other structure.)
>
> Regards
> Jonathan Lewis

--
http://www.freelists.org/webpage/oracle-l