Re: high temp space usage for same analytic function on latest version
Date: Wed, 16 Nov 2022 23:23:27 +0530
Message-ID: <CAEzWdqds7BLUrMQ+FcjT5q7JJ7f1L=jtzO5U9p4wvON2_a4B0w_at_mail.gmail.com>
Extremely sorry - it looks like I messed that up. I have tried to correct it here:
https://gist.github.com/databasetech0073/55c5722da8105dfcc647ab3bd1b24a48
Also, regarding the bloom filter bug you highlighted: I tried forcing the no_px_join_filter hint, and also set "_bloom_filter_enabled" and "_bloom_pruning_enabled" at session level to stop the bloom filtering/pruning, and then ran just the SELECT part of the query, but it still uses the same excessive amount of temp space.
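
For reference, this is roughly what was tried before re-running the SELECT part (the two underscore parameters are the ones named above; the hint alias and placement are only illustrative, not the real query):

    alter session set "_bloom_filter_enabled"  = false;
    alter session set "_bloom_pruning_enabled" = false;

    -- and, as an alternative, hinting the probe-side table directly:
    -- select /*+ no_px_join_filter(fpt) */ ...
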
--1st sqlmonitor
Below is the sql monitor and outline from version 11.2.0.4, where the query finished consuming <20GB of temp space.
https://gist.github.com/databasetech0073/630dad9fc43d6ab6c8863251e29f1cf2
--2nd sql monitor
Below is the sql monitor and outline from version 19.15, forcing exactly the same plan (via a sql profile) that it used in 11.2.
https://gist.github.com/databasetech0073/37ec33e73a23bd0055148ba43a0a49ca
--3rd sql monitor
Below is the sql monitor and outline from the current version (i.e. 19.15 without any profile), which is also consuming high temp space.
https://gist.github.com/databasetech0073/4b895f5790a993ed326d389b54295f9b
On Tue, 15 Nov, 2022, 6:39 pm Jonathan Lewis, <jlewisoracle_at_gmail.com> wrote:
>
> Since I've noticed Bloom filters as an anomaly, it might be worth checking
> if you're patched for:
> "Bug 30618406 - Huge PGA memory consumption by bloom filters
> (qesblfilter_seg) in px slaves in full partition-wise join"
>
> I know this plan isn't using full PW joins - but (read the details) - your
> parallel operation uses a lot of memory, and then a lot of temp space.
> Sometimes the bug goes a lot further than the initial description.
>
> Marked as fixed in 23.1, but I can see patches for 19.16 in the RU.
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
> On Tue, 15 Nov 2022 at 07:15, Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> I've taken a few minutes to look through the github material. The
>> headings don't agree with the contents. Here's a list showing each heading
>> plus two lines from the Outline Data under the heading:
>>
>> Display cursor plan from version 11.2.0.4
>> OPTIMIZER_FEATURES_ENABLE('19.1.0')
>> DB_VERSION('19.1.0')
>>
>>
>> Execution which is failing in 19.15 version with high tempspace
>> consumption
>> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
>> DB_VERSION('19.1.0')
>>
>>
>> Execution which was running fine in 19.11 version
>> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
>> DB_VERSION('11.2.0.4')
>>
>>
>> There doesn't seem to be any point in spending time looking at detail
>> when I've got no confidence that the bits of information supplied belong
>> together.
>>
>>
>> A couple of high visibility points, though:
>> a) the version labelled "Execution which was running fine in 19.11
>> version" reported OPT_PARAM('optimizer_dynamic_sampling' 3) in its outline
>> data, and that could have helped Oracle get better estimates on some of the
>> predicates that included functions of columns.
>> b) the version labelled "Display cursor plan from version 11.2.0.4"
>> included Bloom Filtering sent to storage on the massive tablescan where I
>> pointed out the massive CPU anomaly in an earlier post.
>> 76 - storage((COALESCE("FPT"."DEL_FLG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>>      filter((COALESCE("FPT"."DEL_FLG",'N')='N' AND TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> On Mon, 14 Nov 2022 at 17:18, yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> Thank You Sayan and Jonathan.
>>>
>>> As you suggested, as a quick test I added the hint below to the main query
>>> block to tell the optimizer to scale up the estimated rows coming from the
>>> CTE "C" by ~18 times, but it is not reflected in the plan, which means the
>>> optimizer is not considering it. In that case it should at least appear in
>>> the hint report section of the plan as an error or as unused, but it is not
>>> published there either, so I am not sure why this is happening. Is there
>>> anything wrong with the hint below?
>>>
>>> I am using the hint below in the main select query, which performs the
>>> outer join with the CTE table C:
>>>
>>> /*+OPT_ESTIMATE( TABLE, C, scale_rows=18)*/
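>>>
>>> (For clarity, the hint report I am referring to is the 19c report from
>>> dbms_xplan - the sql_id below is just a placeholder:)
>>>
>>> select *
>>> from   table(dbms_xplan.display_cursor(
>>>                  sql_id          => '&sql_id',
>>>                  cursor_child_no => null,
>>>                  format          => 'TYPICAL +HINT_REPORT'));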
>>>
>>> Also, I tried the change below, but saw no difference in the temp space
>>> consumption.
>>>
>>> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>>>           CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>>>           '9') as varchar2(22)) AS HIGH_VALUE
>>>
>>> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>>>           CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>>>           '0') as varchar2(22)) AS LOW_VALUE,
>>>
>>>
>>>
>>>
>>> On Mon, 14 Nov, 2022, 4:22 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>> Still avoiding spending a lot of time on the github text.
>>>>
>>>> a) You materialize a CTE that involves a connect by query generating 18
>>>> rows - use the opt_estimate() hint to tell Oracle that the materialized
>>>> view holds 18 times the rows it expects. (If you examine the plan, the
>>>> optimizer allows for ONE row coming from dual, not 18.)
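>>>>
>>>> Purely as a sketch of where the hint would go (all object names here are
>>>> invented, and the connect-by block is only a stand-in for the real CTE):
>>>>
>>>>     with c as (
>>>>         select /*+ materialize */ level as rn
>>>>         from   dual
>>>>         connect by level <= 18
>>>>     )
>>>>     select /*+ opt_estimate(table, c, scale_rows=18) */
>>>>            t.*, c.rn
>>>>     from   t
>>>>     left outer join c on c.rn = t.col1;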
>>>>
>>>> b) You have several places where you use expressions like
>>>> substr(c_nbr,1,4) in predicates. Create (optionally invisible) virtual
>>>> columns for these expressions and gather stats on them; this will give the
>>>> optimizer a better chance of getting the arithmetic right, especially in
>>>> cases where the expression needs a frequency (or top-frequency) histogram.
>>>> Invisible is safest for existing production code, but you may need to test
>>>> carefully to see whether the optimizer manages to use the stats from the
>>>> virtual columns when it sees the equivalent expressions.
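>>>>
>>>> For illustration only - the table and column names below are assumptions
>>>> based on the posted predicates, not your real objects:
>>>>
>>>>     alter table t add (
>>>>         c_nbr_4 invisible generated always as (substr(c_nbr, 1, 4)) virtual
>>>>     );
>>>>
>>>>     begin
>>>>         dbms_stats.gather_table_stats(
>>>>             ownname    => user,
>>>>             tabname    => 'T',
>>>>             method_opt => 'for columns c_nbr_4 size 254'
>>>>         );
>>>>     end;
>>>>     /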
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, 13 Nov 2022 at 14:28, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you Lok. Actually the join is not exactly on raw columns but on
>>>>> functions of columns, so that might be why it looks like a Cartesian join.
>>>>>
>>>>> However, the point is that this query has not changed. I agree the data
>>>>> may have grown a little in table 'C' or 'T', so the join output has
>>>>> increased from ~7 billion to ~10 billion rows, but how does that justify
>>>>> the temp space requirement going from 20GB to 2.5TB? That is not twice or
>>>>> three times as much, but many times over.
>>>>>
>>>>> The query runs at parallel degree 16 because of the degree set on an
>>>>> underlying table. With no other option at hand, I was wondering whether
>>>>> increasing or decreasing the parallelism with hints would at least let the
>>>>> query complete with less temp space (we currently have 2.5TB of temp
>>>>> space). Or is there any other possible option to make this query succeed?
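>>>>>
>>>>> (By changing the parallelism with hints I mean something like the
>>>>> statement-level hints below - the degree and table names are only
>>>>> placeholders:)
>>>>>
>>>>>     -- parallel DML would additionally need: alter session enable parallel dml;
>>>>>     insert /*+ parallel(8) */ into target_tab
>>>>>     select /*+ parallel(8) */ *
>>>>>     from   source_tab;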
>>>>>
>>>>> Just to note: we initially had 200GB of temp space, but in the hope that
>>>>> the query would finish we have already increased it to 2.5TB. Still no
>>>>> luck - the query keeps failing, and this started after the 19.15 patch.
>>>>>
>>>>>
>>>>> On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> The section you posted below shows that the inputs to the 'Hash join outer'
>>>>>> were ~1 million and ~37 million rows respectively, but the result coming
>>>>>> from the outer join is ~7 billion rows. So is there any oddity in the data
>>>>>> in table 'T' or 'C' that has recently been inflating the number of rows fed
>>>>>> into your analytic function?
>>>>>>
>>>>>>
>>>>>> SQL Plan Monitoring Details (Plan Hash Value=2056239125) - relevant lines only.
>>>>>> (The Write Reqs, Write Bytes, Cell Offload and Temp (Max) columns are blank for
>>>>>> every row shown; the first two activity lines belong to the operation just
>>>>>> above Id 31, which is not part of this extract.)
>>>>>>
>>>>>> ==========================================================================================================================================================================================
>>>>>> | Id | Operation                 | Name                        |    Rows | Cost |      Time |  Start | Execs |     Rows | Read |  Read |   Mem | Activity | Activity Detail                   |
>>>>>> |    |                           |                             | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |      (%) | (# samples)                       |
>>>>>> ==========================================================================================================================================================================================
>>>>>> |    |                           |                             |         |      |           |        |       |          |      |       |       |          | direct path read temp (553)       |
>>>>>> |    |                           |                             |         |      |           |        |       |          |      |       |       |          | direct path write temp (47)       |
>>>>>> | 31 | HASH JOIN RIGHT OUTER     |                             |      9M |   2M |      1676 |   +218 |    16 |       7G |      |       |  217M |    23.08 | Cpu (4041)                        |
>>>>>> | 32 | BUFFER SORT               |                             |         |      |         4 |   +216 |    16 |       1M |      |       |   97M |          |                                   |
>>>>>> | 33 | PX RECEIVE                |                             |   10647 |   24 |         4 |   +216 |    16 |       1M |      |       |       |          |                                   |
>>>>>> | 34 | PX SEND HASH              | :TQ10005                    |   10647 |   24 |      1633 |   +216 |     1 |       1M |      |       |       |     0.01 | Cpu (2)                           |
>>>>>> | 35 | VIEW                      |                             |   10647 |   24 |      1632 |   +217 |     1 |       1M |      |       |       |          |                                   |
>>>>>> | 36 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA1E71E_D71F1BDE |   10647 |   24 |      1632 |   +217 |     1 |       1M |  106 | 104MB |       |     0.01 | cell multiblock physical read (1) |
>>>>>> | 37 | PX RECEIVE                |                             |      9M |   2M |      1676 |   +218 |    16 |      37M |      |       |       |     0.05 | Cpu (9)                           |
>>>>>> | 38 | PX SEND HASH              | :TQ10009                    |      9M |   2M |      1677 |   +217 |    16 |      37M |      |       |       |     0.13 | Cpu (23)                          |
>>>>>> | 39 | VIEW                      |                             |      9M |   2M |      1677 |   +217 |    16 |      37M |      |       |       |          |                                   |
>>>>>> ==========================================================================================================================================================================================
>>>>>>
>>>>>> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <learnerdatabase99_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you Jonathan.
>>>>>>> Actually the original query is a long one, so I was trying to reduce the
>>>>>>> complexity by posting only the particular section of the query that was
>>>>>>> causing the temp space spike. But I agree that a partial query doesn't
>>>>>>> make much sense. My apologies also for the distorted plan format.
>>>>>>>
>>>>>>> I have now posted the exact query below, with the actual object names
>>>>>>> replaced by dummy names. I have posted the plan on github at the link
>>>>>>> below, so the sql execution plan format will stay intact. I don't have a
>>>>>>> "display cursor" plan for the current runs/failures of this INSERT query,
>>>>>>> so I have posted their sql monitors instead; I did have the display cursor
>>>>>>> plan from when it was running fine on 11.2.0.4, so I have posted that for
>>>>>>> reference. I have also copied and pasted the outline section from
>>>>>>> display_awr below each of the sql monitor plans.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>>>>>>>
>>>>>>> As mentioned earlier, the section of the plan doing the outer join is shown
>>>>>>> below; it produces ~10 billion rows, which are then passed to the analytic
>>>>>>> function. I am trying to understand whether there is any possible way to
>>>>>>> minimize the temp space consumption in this scenario, because we kept
>>>>>>> increasing the temp space assuming the query would eventually succeed, but
>>>>>>> it just kept consuming 2.5TB+ and then failed with "ORA-01652: unable to
>>>>>>> extend temp segment ..." once it had saturated all the allocated temp space.
>>>>>>> As far as I know the encryption/TDE was already there in this
>>>>>>> database before we moved from 19.11 to 19.15. I will double check with the
>>>>>>> infra team on this.
>>>>>>>
>>>>>>> "LEFT OUTER JOIN C
>>>>>>> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
>>>>>>> AND LENGTH (T.C_NBR) = C.PANL_CNT
>>>>>>> AND T.C_NBR BETWEEN C.LOW_VALUE AND C.HIGH_VALUE) S"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <
>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> >> We reran the query and it took a different plan hash value; however the
>>>>>>>> path is almost the same with respect to the lines that do the "Outer join"
>>>>>>>> and the "analytical function" evaluation. I have posted the specific section
>>>>>>>> of the sql monitor below, with predicates and column projection, for that run.
>>>>>>>>
>>>>>>>> The supplied outputs are virtually unreadable, are for different
>>>>>>>> execution plans and different versions (and you've changed from 19.11 for
>>>>>>>> the "slow" query to 11.2.0.4!), and are partial. I'm not going to look at them.
>>>>>>>>
>>>>>>>> >> However, the sql monitor showed the status as 'error' after some time
>>>>>>>> (~30 minutes), while the underlying sessions (~33 parallel sessions) kept
>>>>>>>> running for 2hrs+
>>>>>>>>
>>>>>>>> That looks like the behaviour that Mohamed Houri has written about
>>>>>>>> in the past:
>>>>>>>> https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>>>>>>>>
>>>>>>>> I'd forgotten it would be harder to collect the correct statistics
>>>>>>>> from all the processes when running a parallel query - finding sessions by
>>>>>>>> SQL_ID could work, but using views like v$px_sesstat etc. is safer.
>>>>>>>> Nothing stands out from the information you supplied about wait events, and
>>>>>>>> the only "unusual" thing in the session activity is that you are writing
>>>>>>>> encrypted blocks to the temporary tablespace - were you doing that in the
>>>>>>>> earlier version? It shouldn't explain a massive increase in space
>>>>>>>> requirements, but it might be relevant - it would help to explain an
>>>>>>>> increase in CPU.
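>>>>>>>>
>>>>>>>> (A rough sketch of the kind of query I mean, run while the statement is
>>>>>>>> still executing - the statistic names are just examples:)
>>>>>>>>
>>>>>>>>     select sn.name, sum(st.value)
>>>>>>>>     from   v$px_sesstat st, v$statname sn
>>>>>>>>     where  st.statistic# = sn.statistic#
>>>>>>>>     and    sn.name in ('physical reads direct temporary tablespace',
>>>>>>>>                        'physical writes direct temporary tablespace')
>>>>>>>>     group by sn.name;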
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Jonathan Lewis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 11 Nov 2022 at 17:07, yudhi s <learnerdatabase99_at_gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank You Sayan, Jonathan.
>>>>>>>>>
>>>>>>>>> We reran the query and it took a different plan hash value; however the
>>>>>>>>> path is almost the same with respect to the lines that do the "Outer join"
>>>>>>>>> and the "analytical function" evaluation. I have posted the specific
>>>>>>>>> section of the sql monitor below, with predicates and column projection,
>>>>>>>>> for that run. However, the sql monitor showed the status as 'error' after
>>>>>>>>> some time (~30 minutes), while the underlying sessions (~33 parallel
>>>>>>>>> sessions) kept running for 2hrs+; by then I saw ~2TB of temp space already
>>>>>>>>> consumed, and later it failed with an insufficient temp space error.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
--
http://www.freelists.org/webpage/oracle-l

Received on Wed Nov 16 2022 - 18:53:27 CET