Re: How to fix cache buffer chain issue
Date: Fri, 18 Jun 2021 23:19:43 +0530
Message-ID: <CAEjw_fiUNscXGUX6qmm-1DFuGLjFtqmT6ZXsynynRGvtYUF49g_at_mail.gmail.com>
Thank you Mark for making it a bit clear for me. And my apology that I missed to respond to that key point.
Actually the 'latch:cache buffer chain' which I had noted during the execution of the query was the event in v$session. But now when i see the dba_hist_active_sess_history for that sample time interval for that session, i don't see that exact wait event logged but all most all the entries showing 'ON CPU' , but all of them are on same section of the code but at different parts i.e. "Table access by index row id" i.e. index TAD_IX2. And is_parse and is_hard_parse column is N for all the entries.
So I agree with your point that , while I was monitoring during run time from v$session , it was showing events as "latch:cache buffer chains' ', "latch:free" etc across multiple sessions for the same query.
But then is it that I really need to monitor it when the issue next occurs , as in this case , how come the same query can spend so much time ON CPU for accessing the same amount of rows(using table access by index rowid) when run concurrently vs when run serially? I am still doubting the repeating part of the code(sort aggregate index full scan min/max optimization) which Jonathan pointed out, and how can I tweak that part to make it more efficient?
Regards
Pap
On Fri, Jun 18, 2021 at 7:57 PM Mark J. Bobak <mark_at_bobak.net> wrote:
> Pap,
>
> Jonathan mentioned it briefly, but I don't think you answered him:
> How do you *know* that 'latch: cache buffers chains' is your problem?
> Please note, if you see that V$SESSION.EVENT = 'latch: cache
> buffers chains', then that *may* be the current event or it *may* be the
> last event waited on. The difference is, what's the value of
> V$SESSION.STATE? If it's 'WAITING', then you're actually waiting on
> whatever is in the EVENT column. If it's not 'WAITING', i.e., if it's
> 'WAITED SHORT TIME' or 'WAITED KNOWN TIME' or 'WAITED UNKNOWN TIME' (I sure
> hope it's not that! Turn on timed_statistics!), then EVENT means that's
> the last thing the session waited on, but it's now on CPU, and not waiting
> for anything. So, you could see 'latch: cache buffers chains' in EVENT,
> but in reality, your sesion could have waited on that event for less than a
> centisecond, and is now on CPU. In that case, chasing cause of 'latch:
> cache buffers chains' is going to be fruitless.
>
> If you use a script like snapper.sql (awesome script, thanks Tanel!) it
> will take care of this for you. If you're looking directly at V$SESION raw
> data, then you need to apply the filter to understand what V$SESION is
> telling you.
>
> Hope that's clear....
>
> -Mark
>
> On Fri, Jun 18, 2021 at 8:56 AM Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank you Mark. Actually this is a reporting query and can be hit by
>> multiple/same customers from multiple sessions with each having their
>> session specific data available to them, and no control over how
>> many customer can hit same report at same time, so it would be a little
>> difficult to have individual GTT's created for each reporting query
>> submission. Also it's an existing query running for years , and I need to
>> see the logic , if UNION can really be replaced with UNION ALL. And also i
>> doubt if its the insert part of the GTT which is causing latch cache buffer
>> chain. It seems to be the access of the index /table TAD in that repeating
>> part of the code , which is causing this issue as Jonathan pointed out.
>>
>> Regards
>> Pap
>>
>> On Fri, Jun 18, 2021 at 12:01 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>>
>>> When I see something like this (many UNIONs), I believe it is important
>>> to ask whether (or not) the pieces of the UNION contain mutually exclusive
>>> rows.
>>>
>>>
>>>
>>> Why? Because UNION is required to do deduplication whilst UNION ALL is
>>> not. Even if only some of the pieces are known to be mutually exclusive
>>> (and some are not), it is usually worthwhile to produce the required UNION
>>> deduplication on only the pieces that require mutual de-duplication and
>>> then use UNION ALL for the mutually exclusive pieces.
>>>
>>>
>>>
>>> Good luck. Now if you just don’t know analytically from the predicates
>>> that the pieces are mutually exclusive, don’t take chances. Using UNION ALL
>>> when you should have used UNION potentially produces extraneous rows in
>>> your insert. Oracle has to do the deduplication overhead for UNION even if
>>> it is logically impossible for the pieces to contain duplicates. You may be
>>> able to observe from the predicates that the pieces are disjoint. Apart
>>> from the trivial case where each piece is from a different partition and
>>> includes the partition key or each piece has a mutually exclusive predicate
>>> for a particular column, it can get tricky to be certain deduplication is
>>> never needed.
>>>
>>>
>>>
>>> Anyway, check that first.
>>>
>>>
>>>
>>> Regarding the GTT in 11 you don’t have private GTTs, so multiple
>>> inserter could potentially cause a kerfuffle. Oracle keeps track of which
>>> rows “belong” to which session, but many sessions using the same one is
>>> less friendly to the engine than each having its own. I’m not entirely sure
>>> how that would produce cache buffers chain latch contention, but IF it is
>>> easy to create a unique (sessionid suffixed or something like that) GTT
>>> name and parse each query separately having its own GTT, that also might
>>> make the problem go away. IF it is easy, that’s worth a try.
>>>
>>>
>>>
>>> Understand that it’s worth a try because the time overhead to create a
>>> GTT for a 1-2 minute query is pretty small and you’re only reporting 5-6
>>> concurrent sessions. If you have a few thousand sessions that would gum up
>>> the works creating the object in the dictionary. Then (assuming separate
>>> GTTs are the solution) you’d need to create the GTTs once in advance and
>>> have some sort of check-out system for which one to use. (Presumably more
>>> programming than using the sessionid as a suffix and creating on the fly
>>> and dropping when you’re done.)
>>>
>>>
>>>
>>> If a session’s event IS waiting for a latch, it’s not going to be doing
>>> much until it gets that latch.
>>>
>>>
>>>
>>> Even if using separate GTTs makes the problem go away, I suggest you
>>> still look that that UNION versus UNION ALL issue. Unneeded de-duplication
>>> could be a large percentage of your 1-2 minutes when it runs serially. If
>>> that becomes 5 or 10 seconds, your chance of concurrent sessions drops
>>> dramatically.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Pap
>>> *Sent:* Friday, June 18, 2021 1:46 AM
>>> *To:* Oracle L
>>> *Subject:* How to fix cache buffer chain issue
>>>
>>>
>>>
>>> Hello Listers, Its version 11.2.0.4 of oracle exadata. And we are
>>> facing an issue in which a reporting query(part of plsql procedure) which
>>> normally finishes within ~1-2minutes runs for ~1-2hrs at times. This
>>> happens when the same query is submitted from 5-6 multiple sessions at the
>>> same time and is accessing the same customer data. When we kill them and
>>> rerun them in serial they run fine without any issue and finish in the same
>>> 1-2 minutes duration.
>>>
>>> Few things we observed is , when all the session submitted at same time
>>> and the query runs long , the event its showing for the session is "latch:
>>> cache buffers chains" but active session history is not showing up any
>>> significant activity for that session and also the sql monitor is not
>>> getting logged for that query. Which means it's not doing significant
>>> activity while this issue occurs but kind of stuck. Why is it so? And also
>>> due to that , I am not able to capture the current object on which it's
>>> actually holding that latch.
>>>
>>> The query is an INSERT query which inserts data into a global temporary
>>> table. It has ~17 UNION clauses of which most look similar. So i am
>>> wondering if by someway we can rewrite this query which will help us in
>>> fixing this issue or making the situation better?
>>>
>>> Attached is the sample INSERT query with UNION clauses(I have removed a
>>> few of the UNIONS to make it look simple) and its plan which suffers from
>>> "latch: cache buffers chains".
>>>
>>>
>>>
>>> Regards
>>>
>>> Pap
>>>
>>
-- http://www.freelists.org/webpage/oracle-lReceived on Fri Jun 18 2021 - 19:49:43 CEST