Re: How to fix cache buffer chain issue
Date: Sat, 19 Jun 2021 22:51:42 +0530
Message-ID: <CAEjw_fjTRYXd_Vxh0CvAqfv7xN7s3NaRNxvgppTUeBWA5TVd1g_at_mail.gmail.com>
>
> Looking at the predicate information for operation 15 you can see that it
> is an ACCESS predicate (that, since it came from memory) has "lost" the
> subquery bit.
> This means the subquery runs one to generate a value that is effectively
> then used as a constant to drive the index range scan at operation 15. The
> index tad_ix2 is (presumably) on the column (date_cr), so this part of the
> query selects all data for the most recent date by index range scan. (And
> possibly will be doing this 17 times in the course of the query).
>
> Regards
> Jonathan Lewis
>
>
> On Fri, 18 Jun 2021 at 13:40, Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank you Jonathan. I will try to collect ASH information one more time
>> when it's running. I need to cross check to see if it's really stuck for
>> hours just in the parse stage when multiple sessions submit the same
>> query, which I doubt though. And yes the sql monitor capturing also needs
>> the underscore parameter change as you rightly pointed out. Will try to get
>> that done.
>>
>> But below is the specific section of the predicate information which I
>> captured using the display_cursor function on the same node. And this table
>> TAD/index TAD_IX2 was also suspected by other team members that it may be
>> the one causing the issue. That index TAD_IX2 is on column DT_CR which has
>> ~460 distinct values out of the table TAD holding a total 11million rows.
>> So can that index cause so much of an issue?
>>
>> From the predicate section I am not able to understand if the sort
>> aggregate is used as a filter as that exact line is not captured in the
>> predicate section. Is there any way I can force the optimizer to perform
>> this part efficiently to get rid of the latch cache buffer chain?
>>
>> There exists another index on column PMENT of table TAD having 909
>> distinct values.
>>
>>
>> INSERT INTO GTT1
>> SELECT ...
>> FROM TAD TAD, TMMC TMMC, TMTD TMTD, TMCT TMCT, TNI TNI
>> WHERE TAD.MCD = TMMC.PTMCD
>> AND TMTD.PMENT = TAD.PMENT
>> AND TMTD.CID = TMCT.SWCTID
>> AND TNI.NE = 'XX'
>> AND TNI.NID = TMTD.DID
>> AND (TRUNC ( :B2) - ADT) <= 1
>> AND ( (TAD.MCD = :B1 AND NVL ( :B1, 'ZZZZ') <> 'ZZZZ')
>> OR NVL ( :B1, 'ZZZZ') = 'ZZZZ')
>> AND TAD.DT_CR IN (SELECT MAX (DT_CR) FROM TAD)
>> GROUP BY TMTD.PMENT, TAD.MCD, TMMC.PMCDVAL,
>> TMCT.PDCMLPNT
>> UNION
>> SELECT ..
>> FROM TAD TAD, TMMC TMMC, TMTD TMTD, TMCT TMCT, TNI TNI
>> WHERE TAD.MCD = TMMC.PTMCD
>> AND TMTD.PMENT = TAD.PMENT
>> AND TMTD.CID = TMCT.SWCTID
>> AND TNI.NE = 'XX'
>> AND TNI.NID = TMTD.DID
>> AND (TRUNC ( :B2) - ADT) > 1
>> AND (TRUNC ( :B2) - ADT) <= 2
>> AND ( (TAD.MCD = :B1 AND NVL ( :B1, 'ZZZZ') <> 'ZZZZ')
>> OR NVL ( :B1, 'ZZZZ') = 'ZZZZ')
>> AND TAD.DT_CR IN (SELECT MAX (DT_CR) FROM TAD)
>> GROUP BY TMTD.PMENT, TAD.MCD, TMMC.PMCDVAL, TMCT.PDCMLPNT
>>
>>
>>
>>
>>
>> ----------------------------------------------------------------------------------------------------------------------
>> | Id | Operation | Name
>> | Rows | Bytes | Cost (%CPU)| Time |
>>
>> ----------------------------------------------------------------------------------------------------------------------
>> | 0 | INSERT STATEMENT |
>> | | | 12967 (100)| |
>> | 1 | LOAD TABLE CONVENTIONAL |
>> | | | | |
>> | 2 | SORT UNIQUE |
>> | 18 | 1548 | 12967 (1)| 00:02:36 |
>> | 3 | UNION-ALL |
>> | | | | |
>> | 4 | HASH GROUP BY |
>> | 1 | 86 | 738 (1)| 00:00:09 |
>> | 5 | NESTED LOOPS |
>> | 1 | 86 | 736 (1)| 00:00:09 |
>> | 6 | NESTED LOOPS |
>> | 1 | 86 | 736 (1)| 00:00:09 |
>> | 7 | NESTED LOOPS |
>> | 1 | 79 | 735 (1)| 00:00:09 |
>> |* 8 | HASH JOIN |
>> | 1 | 60 | 734 (1)| 00:00:09 |
>> | 9 | NESTED LOOPS |
>> | 12 | 384 | 26 (0)| 00:00:01 |
>> | 10 | NESTED LOOPS |
>> | 12 | 384 | 26 (0)| 00:00:01 |
>> |* 11 | TABLE ACCESS STORAGE FULL | TNI
>> | 12 | 192 | 2 (0)| 00:00:01 |
>> |* 12 | INDEX UNIQUE SCAN | MTD_PK
>> | 1 | | 1 (0)| 00:00:01 |
>> | 13 | TABLE ACCESS BY INDEX ROWID | TMTD
>> | 1 | 16 | 2 (0)| 00:00:01 |
>>
>>
>>
>> *|* 14 | TABLE ACCESS BY INDEX ROWID | TAD
>> | 14 | 392 | 708 (1)| 00:00:09 ||* 15 | INDEX RANGE
>> SCAN | TAD_IX2 | 28797 | | 116
>> (0)| 00:00:02 || 16 | SORT AGGREGATE |
>> | 1 | 8 | | || 17 |
>> INDEX FULL SCAN (MIN/MAX) | TAD_IX2 | 1 | 8
>> | 3 (0)| 00:00:01 |*
>> | 18 | TABLE ACCESS BY INDEX ROWID | TMMC
>> | 1 | 19 | 1 (0)| 00:00:01 |
>> |* 19 | INDEX RANGE SCAN | TMMC_IX1
>> | 1 | | 0 (0)| |
>> |* 20 | INDEX UNIQUE SCAN | MCT_PK
>> | 1 | | 0 (0)| |
>> | 21 | TABLE ACCESS BY INDEX ROWID | TMCT
>> | 1 | 7 | 1 (0)| 00:00:01 |
>> | 22 | HASH GROUP BY |
>> | 1 | 86 | 717 (1)| 00:00:09 |
>>
>>
>>
>> Predicate Information (identified by operation id):
>> ---------------------------------------------------
>>
>> 8 - access("TMTD"."PMENT"="TAD"."PMENT")
>> 11 - storage("TNI"."NE"='XX')
>> filter("TNI"."NE"='XX')
>> 12 - access("TNI"."NID"="TMTD"."DID")
>> 14 - filter(((NVL(:B1,'ZZZZ')='ZZZZ' OR ("TAD"."MCD"=:B1 AND
>> NVL(:B1,'ZZZZ')<>'ZZZZ')) AND
>> TRUNC(:B2)-"ADT"<=1))
>> 15 - access("TAD"."DT_CR"=)
>> 19 - access("TAD"."MCD"="TMMC"."PTMCD")
>> 20 - access("TMTD"."CID"="TMCT"."SWCTID")
>>
>> Regards
>> Pap
>>
>> On Fri, Jun 18, 2021 at 3:36 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>> The plan you've shown is 310 lines long and there's a hidden parameter
>>> "_sqlmon_max_planlines" which defaults to 300 in your version of Oracle and
>>> is the limiting size of a plan that can be monitor. If you set this to a
>>> slightly larger value your plan should appear in the monitor data.
>>>
>>> "the event its showing for the session is "latch: cache buffers chains"
>>> but active session history is not showing up any significant activity for
>>> that session"
>>>
>>> Please clarify - if the query takes 1-2 minutes to run when serial that
>>> should give you 60 - 120 samples in v$active_session_history unless unless
>>> all the time is spent in "SQL*Net message to/from client". So where are you
>>> finding reports of "latch: cache buffers chains" and are you sure that this
>>> isn't "last wait" rather than "current wait".
>>>
>>> There are two obvious possibilities for long run-times that become
>>> excessive when you issue multiple concurrent versions of the same query -
>>> first is parse time escalating (see, for example this case study:
>>> https://jonathanlewis.wordpress.com/2014/12/09/parse-time-2/ ). The
>>> statement you've shown may be subject to very long partse times as the
>>> optimizer plays around with options for subquery factorisation, complex
>>> view merging, unnesting, and join factorisation.
>>>
>>> The second possibility (which would be more consistent with your comment
>>> of "cache buffers chains latches" - would be the contention hinted at by
>>> the frequent appearance of the claus:
>>>
>>> AND TAD.DT_CR IN (SELECT MAX (DT_CR)
>>> FROM TAD)
>>>
>>> which results in this repeated section of execution plan:
>>>
>>> | 15 | TABLE ACCESS BY INDEX ROWID | TAD | 1 | 29 | 512 (1)| 00:00:07 |
>>> | 16 | INDEX RANGE SCAN | TAD_IX2 | 21183 | | 81 (0)| 00:00:01 |
>>> | 17 | SORT AGGREGATE | | 1 | 8 | | |
>>> | 18 | INDEX FULL SCAN (MIN/MAX) | TAD_IX2 | 1 | 8 | 3 (0)| 00:00:01 |
>>>
>>> You haven't supplied the Predicate Information for the plan, but if the
>>> SORT AGGREGATE is operating as a FILTER predicate rather than a driving
>>> ACCESS predicate, and since the index range scan of TAD_IX2 looks as if it
>>> might be walking through a very large number of rows it's POSSIBLE that
>>> your query is doing a very large number of gets on the root, branch and
>>> high-value leaf block for the min/max scan. (I wouldn't expect this to be
>>> that case - thanks to scalar subquery caching - but it's something to check
>>> when you get the SQL Monitor report). (I would also expect the optimizer
>>> to transform the IN (select max() ... ) to = (select max() ,..) since the
>>> subquery has to be single row, but maybe it hasn't and maybe that's part of
>>> the problem.
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 06:46, Pap <oracle.developer35_at_gmail.com> wrote:
>>>
>>>> Hello Listers, Its version 11.2.0.4 of oracle exadata. And we are
>>>> facing an issue in which a reporting query(part of plsql procedure) which
>>>> normally finishes within ~1-2minutes runs for ~1-2hrs at times. This
>>>> happens when the same query is submitted from 5-6 multiple sessions at the
>>>> same time and is accessing the same customer data. When we kill them and
>>>> rerun them in serial they run fine without any issue and finish in the same
>>>> 1-2 minutes duration.
>>>>
>>>> Few things we observed is , when all the session submitted at same time
>>>> and the query runs long , the event its showing for the session is "latch:
>>>> cache buffers chains" but active session history is not showing up any
>>>> significant activity for that session and also the sql monitor is not
>>>> getting logged for that query. Which means it's not doing significant
>>>> activity while this issue occurs but kind of stuck. Why is it so? And also
>>>> due to that , I am not able to capture the current object on which it's
>>>> actually holding that latch.
>>>>
>>>> The query is an INSERT query which inserts data into a global temporary
>>>> table. It has ~17 UNION clauses of which most look similar. So i am
>>>> wondering if by someway we can rewrite this query which will help us in
>>>> fixing this issue or making the situation better?
>>>>
>>>> Attached is the sample INSERT query with UNION clauses(I have removed a
>>>> few of the UNIONS to make it look simple) and its plan which suffers from
>>>> "latch: cache buffers chains".
>>>>
>>>> Regards
>>>> Pap
>>>>
>>>
-- http://www.freelists.org/webpage/oracle-lReceived on Sat Jun 19 2021 - 19:21:42 CEST