Re: How to fix cache buffer chain issue

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Sat, 19 Jun 2021 16:06:09 +0100
Message-ID: <CAGtsp8nhJfvrSHszCvLeip5c6jPOsbJa5Y8gKBosKu2GgNtbHg_at_mail.gmail.com>



Looking at the predicate information for operation 15 you can see that it is an ACCESS predicate (that, since it came from memory) has "lost" the subquery bit.
This means the subquery runs one to generate a value that is effectively then used as a constant to drive the index range scan at operation 15. The index tad_ix2 is (presumably) on the column (date_cr), so this part of the query selects all data for the most recent date by index range scan. (And possibly will be doing this 17 times in the course of the query).

Regards
Jonathan Lewis

On Fri, 18 Jun 2021 at 13:40, Pap <oracle.developer35_at_gmail.com> wrote:

> Thank you Jonathan. I will try to collect ASH information one more time
> when it's running. I need to cross check to see if it's really stuck for
> hours just in the parse stage when multiple sessions submit the same
> query, which I doubt though. And yes the sql monitor capturing also needs
> the underscore parameter change as you rightly pointed out. Will try to get
> that done.
>
> But below is the specific section of the predicate information which I
> captured using the display_cursor function on the same node. And this table
> TAD/index TAD_IX2 was also suspected by other team members that it may be
> the one causing the issue. That index TAD_IX2 is on column DT_CR which has
> ~460 distinct values out of the table TAD holding a total 11million rows.
> So can that index cause so much of an issue?
>
> From the predicate section I am not able to understand if the sort
> aggregate is used as a filter as that exact line is not captured in the
> predicate section. Is there any way I can force the optimizer to perform
> this part efficiently to get rid of the latch cache buffer chain?
>
> There exists another index on column PMENT of table TAD having 909
> distinct values.
>
>
> INSERT INTO GTT1
> SELECT ...
> FROM TAD TAD, TMMC TMMC, TMTD TMTD, TMCT TMCT, TNI TNI
> WHERE TAD.MCD = TMMC.PTMCD
> AND TMTD.PMENT = TAD.PMENT
> AND TMTD.CID = TMCT.SWCTID
> AND TNI.NE = 'XX'
> AND TNI.NID = TMTD.DID
> AND (TRUNC ( :B2) - ADT) <= 1
> AND ( (TAD.MCD = :B1 AND NVL ( :B1, 'ZZZZ') <> 'ZZZZ')
> OR NVL ( :B1, 'ZZZZ') = 'ZZZZ')
> AND TAD.DT_CR IN (SELECT MAX (DT_CR) FROM TAD)
> GROUP BY TMTD.PMENT, TAD.MCD, TMMC.PMCDVAL,
> TMCT.PDCMLPNT
> UNION
> SELECT ..
> FROM TAD TAD, TMMC TMMC, TMTD TMTD, TMCT TMCT, TNI TNI
> WHERE TAD.MCD = TMMC.PTMCD
> AND TMTD.PMENT = TAD.PMENT
> AND TMTD.CID = TMCT.SWCTID
> AND TNI.NE = 'XX'
> AND TNI.NID = TMTD.DID
> AND (TRUNC ( :B2) - ADT) > 1
> AND (TRUNC ( :B2) - ADT) <= 2
> AND ( (TAD.MCD = :B1 AND NVL ( :B1, 'ZZZZ') <> 'ZZZZ')
> OR NVL ( :B1, 'ZZZZ') = 'ZZZZ')
> AND TAD.DT_CR IN (SELECT MAX (DT_CR) FROM TAD)
> GROUP BY TMTD.PMENT, TAD.MCD, TMMC.PMCDVAL, TMCT.PDCMLPNT
>
>
>
>
>
> ----------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name
> | Rows | Bytes | Cost (%CPU)| Time |
>
> ----------------------------------------------------------------------------------------------------------------------
> | 0 | INSERT STATEMENT |
> | | | 12967 (100)| |
> | 1 | LOAD TABLE CONVENTIONAL |
> | | | | |
> | 2 | SORT UNIQUE |
> | 18 | 1548 | 12967 (1)| 00:02:36 |
> | 3 | UNION-ALL |
> | | | | |
> | 4 | HASH GROUP BY |
> | 1 | 86 | 738 (1)| 00:00:09 |
> | 5 | NESTED LOOPS |
> | 1 | 86 | 736 (1)| 00:00:09 |
> | 6 | NESTED LOOPS |
> | 1 | 86 | 736 (1)| 00:00:09 |
> | 7 | NESTED LOOPS |
> | 1 | 79 | 735 (1)| 00:00:09 |
> |* 8 | HASH JOIN |
> | 1 | 60 | 734 (1)| 00:00:09 |
> | 9 | NESTED LOOPS |
> | 12 | 384 | 26 (0)| 00:00:01 |
> | 10 | NESTED LOOPS |
> | 12 | 384 | 26 (0)| 00:00:01 |
> |* 11 | TABLE ACCESS STORAGE FULL | TNI
> | 12 | 192 | 2 (0)| 00:00:01 |
> |* 12 | INDEX UNIQUE SCAN | MTD_PK
> | 1 | | 1 (0)| 00:00:01 |
> | 13 | TABLE ACCESS BY INDEX ROWID | TMTD
> | 1 | 16 | 2 (0)| 00:00:01 |
>
>
>
> *|* 14 | TABLE ACCESS BY INDEX ROWID | TAD
> | 14 | 392 | 708 (1)| 00:00:09 ||* 15 | INDEX RANGE
> SCAN | TAD_IX2 | 28797 | | 116
> (0)| 00:00:02 || 16 | SORT AGGREGATE |
> | 1 | 8 | | || 17 |
> INDEX FULL SCAN (MIN/MAX) | TAD_IX2 | 1 | 8
> | 3 (0)| 00:00:01 |*
> | 18 | TABLE ACCESS BY INDEX ROWID | TMMC
> | 1 | 19 | 1 (0)| 00:00:01 |
> |* 19 | INDEX RANGE SCAN | TMMC_IX1
> | 1 | | 0 (0)| |
> |* 20 | INDEX UNIQUE SCAN | MCT_PK
> | 1 | | 0 (0)| |
> | 21 | TABLE ACCESS BY INDEX ROWID | TMCT
> | 1 | 7 | 1 (0)| 00:00:01 |
> | 22 | HASH GROUP BY |
> | 1 | 86 | 717 (1)| 00:00:09 |
>
>
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 8 - access("TMTD"."PMENT"="TAD"."PMENT")
> 11 - storage("TNI"."NE"='XX')
> filter("TNI"."NE"='XX')
> 12 - access("TNI"."NID"="TMTD"."DID")
> 14 - filter(((NVL(:B1,'ZZZZ')='ZZZZ' OR ("TAD"."MCD"=:B1 AND
> NVL(:B1,'ZZZZ')<>'ZZZZ')) AND
> TRUNC(:B2)-"ADT"<=1))
> 15 - access("TAD"."DT_CR"=)
> 19 - access("TAD"."MCD"="TMMC"."PTMCD")
> 20 - access("TMTD"."CID"="TMCT"."SWCTID")
>
> Regards
> Pap
>
> On Fri, Jun 18, 2021 at 3:36 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> The plan you've shown is 310 lines long and there's a hidden parameter
>> "_sqlmon_max_planlines" which defaults to 300 in your version of Oracle and
>> is the limiting size of a plan that can be monitor. If you set this to a
>> slightly larger value your plan should appear in the monitor data.
>>
>> "the event its showing for the session is "latch: cache buffers chains"
>> but active session history is not showing up any significant activity for
>> that session"
>>
>> Please clarify - if the query takes 1-2 minutes to run when serial that
>> should give you 60 - 120 samples in v$active_session_history unless unless
>> all the time is spent in "SQL*Net message to/from client". So where are you
>> finding reports of "latch: cache buffers chains" and are you sure that this
>> isn't "last wait" rather than "current wait".
>>
>> There are two obvious possibilities for long run-times that become
>> excessive when you issue multiple concurrent versions of the same query -
>> first is parse time escalating (see, for example this case study:
>> https://jonathanlewis.wordpress.com/2014/12/09/parse-time-2/ ). The
>> statement you've shown may be subject to very long partse times as the
>> optimizer plays around with options for subquery factorisation, complex
>> view merging, unnesting, and join factorisation.
>>
>> The second possibility (which would be more consistent with your comment
>> of "cache buffers chains latches" - would be the contention hinted at by
>> the frequent appearance of the claus:
>>
>> AND TAD.DT_CR IN (SELECT MAX (DT_CR)
>> FROM TAD)
>>
>> which results in this repeated section of execution plan:
>>
>> | 15 | TABLE ACCESS BY INDEX ROWID | TAD | 1 | 29 | 512 (1)| 00:00:07 |
>> | 16 | INDEX RANGE SCAN | TAD_IX2 | 21183 | | 81 (0)| 00:00:01 |
>> | 17 | SORT AGGREGATE | | 1 | 8 | | |
>> | 18 | INDEX FULL SCAN (MIN/MAX) | TAD_IX2 | 1 | 8 | 3 (0)| 00:00:01 |
>>
>> You haven't supplied the Predicate Information for the plan, but if the
>> SORT AGGREGATE is operating as a FILTER predicate rather than a driving
>> ACCESS predicate, and since the index range scan of TAD_IX2 looks as if it
>> might be walking through a very large number of rows it's POSSIBLE that
>> your query is doing a very large number of gets on the root, branch and
>> high-value leaf block for the min/max scan. (I wouldn't expect this to be
>> that case - thanks to scalar subquery caching - but it's something to check
>> when you get the SQL Monitor report). (I would also expect the optimizer
>> to transform the IN (select max() ... ) to = (select max() ,..) since the
>> subquery has to be single row, but maybe it hasn't and maybe that's part of
>> the problem.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>>
>> On Fri, 18 Jun 2021 at 06:46, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Hello Listers, Its version 11.2.0.4 of oracle exadata. And we are
>>> facing an issue in which a reporting query(part of plsql procedure) which
>>> normally finishes within ~1-2minutes runs for ~1-2hrs at times. This
>>> happens when the same query is submitted from 5-6 multiple sessions at the
>>> same time and is accessing the same customer data. When we kill them and
>>> rerun them in serial they run fine without any issue and finish in the same
>>> 1-2 minutes duration.
>>>
>>> Few things we observed is , when all the session submitted at same time
>>> and the query runs long , the event its showing for the session is "latch:
>>> cache buffers chains" but active session history is not showing up any
>>> significant activity for that session and also the sql monitor is not
>>> getting logged for that query. Which means it's not doing significant
>>> activity while this issue occurs but kind of stuck. Why is it so? And also
>>> due to that , I am not able to capture the current object on which it's
>>> actually holding that latch.
>>>
>>> The query is an INSERT query which inserts data into a global temporary
>>> table. It has ~17 UNION clauses of which most look similar. So i am
>>> wondering if by someway we can rewrite this query which will help us in
>>> fixing this issue or making the situation better?
>>>
>>> Attached is the sample INSERT query with UNION clauses(I have removed a
>>> few of the UNIONS to make it look simple) and its plan which suffers from
>>> "latch: cache buffers chains".
>>>
>>> Regards
>>> Pap
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Jun 19 2021 - 17:06:09 CEST

Original text of this message