Re: KTSJ / Wnnn

From: Ls Cheng <exriscer_at_gmail.com>
Date: Thu, 15 Jul 2021 14:27:05 +0200
Message-ID: <CAJ2-Qb9wtAWX5cEzk37HYn3=1=HDvXFNbovcwB13Ze_L8NEL=g_at_mail.gmail.com>



Hi

Is this RAC? If it is, then most probably you need to apply this patch:

*Bug 32245850 - txtsdan : dml operations hung on "gc current request" waits*

We had quite a few problems with KTSJ in a 19.10 database running on an Exadata Full Rack, and after raising an SR we were told to apply this patch, which fixed the problems.
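
If you want to confirm whether the fix is already in place before raising an SR, the patch inventory can be checked from SQL along these lines (just a sketch; one-off patches applied without datapatch only show up in "opatch lspatches" at the OS level):

    select patch_id, patch_type, action, status, description
    from   dba_registry_sqlpatch
    where  description like '%32245850%';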

BR

On Thu, Jul 15, 2021 at 1:02 PM Dominic Brooks <dombrooks_at_hotmail.com> wrote:

> Yesterday I was observing a foreground process which was running a series
> of batched updates from Java in a single thread, and it was running very
> slowly.
>
> Each element in the batch was updating a single row via a unique scan.
>
> The execution time of this feed was reported as having tripled since
> moving to 19.6.
>
>
>
> Performance was atrocious. For example, from AWR over a period of 15-16
> hours, an average-size batch of a couple of hundred elements was averaging
> anywhere from 8 to over 100 seconds per execution per hour, with the vast
> majority of the time in cluster-related waits. Averages hide a whole bunch
> of detail, of course, but they are a useful indicator.
>
>
>
> I was observing from GV$SESSION and GV$ASH, and the source of the cluster
> waits seemed to be related to KTSJ slave activity: there was a strong
> correlation between the “two” (the Java update plus multiple active KTSJ
> slaves) working on the same datafile/blocks – series of the two doing gc
> buffer busy release, gc buffer busy acquire and gc current block busy, with
> the occasional cell single block physical read. Blocking session
> information on some of the gc waits occasionally pointed at the other
> (update blocked by KTSJ, or vice versa).
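>
> Roughly the sort of query I was using against ASH to line the two up – a
> sketch rather than exactly what I ran, with &update_sql_id standing in for
> the Java update's SQL_ID:
>
>   select ash.sample_time, ash.inst_id, ash.session_id, ash.program,
>          ash.event, ash.p1 file#, ash.p2 block#,
>          ash.blocking_inst_id, ash.blocking_session
>   from   gv$active_session_history ash
>   where  (ash.program like '%(W0%'       -- the W00n / KTSJ slaves
>           or ash.sql_id = '&update_sql_id')
>   and    ash.event like 'gc%'
>   order by ash.p1, ash.p2, ash.sample_time;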
>
>
>
>
>
> Reading all the responses, the oracle-l thread from May 2020 on KTSJ was
> the best source of information I could find:
>
> https://www.freelists.org/post/oracle-l/reads-by-KTSJ,17
>
> And a couple of bug references lead on from there, not all of which are
> relevant to my version (19.6) but which give indications of what might be
> going on:
>
> - blocks are not marked as free in assm after delete - 12.2 and later
> (Doc ID 30265523.8)
> - performance degradation by w00 processes after applying july 2020
> dbru (Doc ID 32075777.8) superseded by
> - force full repair enabled by fix control and populate repair list
> even if _assm_segment_repair_bg=false (Doc ID 32234161.8)
>
> With mention of the parameter _assm_segment_repair_bg.
>
> Per the explanations in the oracle-l thread, it seems to be the foreground
> session doing something which then prompts a background session to
> check/fix the ASSM information. But in my case, this fixing is causing
> significant contention back on the foreground session.
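>
> For reference, the current setting of that underscore parameter can be
> checked as SYS with something along these lines (a sketch – and obviously
> not a parameter to change without Oracle Support's blessing):
>
>   select i.ksppinm name, v.ksppstvl value, v.ksppstdf is_default
>   from   x$ksppi i, x$ksppcv v
>   where  i.indx = v.indx
>   and    i.ksppinm = '_assm_segment_repair_bg';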
>
>
>
> I ran snapper on some of the KTSJ slaves and, of the ASSM fix related
> statistics, “ASSM bg: slave fix state” was consistently around 5000 in a
> 5-second period. That is not a statistic I have any context to judge the
> value of.
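>
> For anyone who wants to look at the same counters without snapper, the
> underlying statistics can be pulled straight from the session stats –
> again just a sketch:
>
>   select s.inst_id, s.sid, sn.name, st.value
>   from   gv$session s, gv$sesstat st, gv$statname sn
>   where  st.inst_id = s.inst_id and st.sid = s.sid
>   and    sn.inst_id = st.inst_id and sn.statistic# = st.statistic#
>   and    sn.name like 'ASSM bg%'
>   and    s.program like '%(W0%'          -- the W00n / KTSJ slaves
>   order by st.value desc;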
>
>
>
> This is a monthly feed so it doesn't happen every day, but when it does it
> sits in a critical path. It's finished now, so there's not a lot I can look
> at unless it's in ASH. Obviously a next step is to try to reproduce this in
> a test environment.
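>
> Given the “blocks are not marked as free in assm after delete” note, my
> starting point for a repro on a RAC test system would be something like
> this – purely a sketch, with all object names made up:
>
>   -- assumes an ASSM tablespace (segment space management auto)
>   create table repro_t (id number primary key, pad varchar2(200));
>
>   insert into repro_t
>   select level, rpad('x', 200, 'x')
>   from   dual connect by level <= 1000000;
>   commit;
>
>   -- leave plenty of part-empty blocks behind
>   delete from repro_t where mod(id, 2) = 0;
>   commit;
>
>   -- then drive batched single-row updates via the unique index from
>   -- another instance while watching the W00n slaves and gc waits
>   update repro_t set pad = rpad('y', 200, 'y') where id = :id;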
>
>
>
> I just wondered whether anyone had done any further investigation into
> this behaviour.
>
>
>
> Cheers,
>
> Dominic
>
>
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 15 2021 - 14:27:05 CEST
