Re: Same query with no plan change or volume but runs for hours vs minutes

From: Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
Date: Tue, 5 Dec 2023 16:19:00 +0300
Message-ID: <CACGsLCL00V7hFifopz0ziJrx+=-kwAAaO0VLFt59+L=0=wMbZQ_at_mail.gmail.com>



You're welcome. Check http://www.juliandyke.com/Diagnostics/Tools/ORADEBUG/ORADEBUG.php for details; what you need is:

sqlplus / as sysdba
oradebug setospid <OS PID of the long running process>
oradebug short_stack
oradebug short_stack
...

Each short_stack output is a stack trace of the process, which you can paste into orafun.info/stack to decode the meaning of (some of) the functions.
You will need multiple stacks to see which one is the most common (i.e. the code path causing the most CPU usage), but even 2-3 may be enough in your case.
If those 2-3 stacks are all very different, then sampling short_stack output every 10 seconds or so and aggregating the samples is the next natural step.
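For illustration, a minimal SQL*Plus sketch of such sampling (the PID, the 10-second sleep and the spool file name are placeholders, not from the thread):

-- sample_stacks.sql: run as  sqlplus / as sysdba @sample_stacks.sql
-- Replace 12345 with the OS PID of the long-running process.
spool stacks.txt
oradebug setospid 12345
oradebug short_stack
host sleep 10
oradebug short_stack
host sleep 10
oradebug short_stack
spool off
-- Aggregate the spooled stacks afterwards (e.g. sort | uniq -c on the OS side)
-- to find the most common code path, then decode it at orafun.info/stack.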

On Tue, Dec 5, 2023 at 2:23 PM Lok P <loknath.73_at_gmail.com> wrote:

> Thank you so much.
>
> We need some guidance on the tracing part. Should we do something like the
> below while the query is running long? And for how long should we keep the
> trace enabled? I believe we need to wait for the next long execution to
> happen.
>
> alter system set events 'trace[ksq][SQL:XXXXX] ';
> alter system set events 'trace[ksq][SQL:XXXXX] off';
>
> Another thing we discovered: the stats gathering happens from another node
> (node-2) during the same window and also runs longer, ~30+ minutes (vs
> ~10 minutes on normal days), multiple times on the same partition into which
> the data load is happening from this node-1. So we are not sure whether that
> is a cause or an effect of this INSERT query slowness.
>
> Basically, one node is executing the delete + insert conventional
> statements while at the same time another node is executing a stats gather
> on the same partition. Can this cause such an anomaly in flagging the L1
> bitmaps for the index blocks or bumping up the "reject db - optimizer index
> blocks reject" stats, and thus end up consuming a lot more CPU?
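>
> (For illustration, one way to confirm the timing overlap is to check the
> optimizer stats operation history; this is only a sketch, and the table
> name filter and one-day window below are placeholders:)
>
> select operation, target, start_time, end_time, status
>   from dba_optstat_operations
>  where start_time > systimestamp - interval '1' day
>    and target like '%PBDRA_DTL%'
>  order by start_time;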
>
>
>
> On Tue, Dec 5, 2023 at 3:34 PM Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
> wrote:
>
>> https://gist.github.com/oraclelearner/6720861b373814614fe3b0669aee6901
>>
>> Thanks. I was thinking about a possible statement restart due to undo
>> segment extension, since such wait events and "transaction rollbacks"=1
>> appear in the bad run (see
>> https://jonathanlewis.wordpress.com/2019/09/10/update-restarts/ for an
>> example), but it appears that is not the case, or it is simply not reported
>> by SQL Monitor.
>>
>> You need to get stack traces, as they will immediately point you in the
>> right direction.
>>
>> On Tue, Dec 5, 2023 at 11:40 AM Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> The SQL Monitor report for the complete execution (status DONE) of the bad
>>> run is below.
>>>
>>> https://gist.github.com/oraclelearner/6720861b373814614fe3b0669aee6901
>>>
>>> One thing I was wrong about and just found: the same set of rows
>>> (almost ~30 million) gets inserted into this same partition by another
>>> process a day earlier. Then, as part of the current day's run, a delete
>>> query first removes almost all of the rows (~30 million) from that
>>> partition, and then this insert query inserts all of those rows again plus
>>> a few additional ones (~32 million in total). But this pattern happens
>>> daily, so I am unsure why it causes slowness on certain days only. We have
>>> yet to investigate whether this is the correct business logic or whether we
>>> can avoid the delete somehow.
>>>
>>> Also, as Jonathan pointed out regarding the "gc current grant busy"
>>> statistics, I do see that a stats gather was kicking in on the same object
>>> from another node while this INSERT was executing, and that stats gather
>>> finishes in ~10-15 minutes. But hopefully that won't cause such an anomaly
>>> in flagging the "L1 bitmaps" for the index blocks.
>>>
>>> Another thing I observed: the object_id and data_object_id of the table
>>> partitions are all the same, but for the PK index the dba_ind_partitions
>>> entries point to different object_id and data_object_id values in
>>> dba_objects. So I was wondering whether some structural change on the index
>>> partition is causing the insert to run longer. But I see similar
>>> differences between object_id and data_object_id for all of the index
>>> partitions, so that does not add up to the theory. Also, I did not find any
>>> job or related query in dba_hist_sqltext that would be doing an index
>>> rebuild/shrink/coalesce etc., nor did I find any other INSERT running at
>>> the same time on the same object.
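>>>
>>> (For illustration, a minimal sketch of such a comparison, assuming the
>>> index partition names match dba_objects.subobject_name; MY_OWNER and
>>> MY_PK_INDEX are placeholders:)
>>>
>>> select ip.partition_name, o.object_id, o.data_object_id
>>>   from dba_ind_partitions ip
>>>   join dba_objects o
>>>     on o.owner = ip.index_owner
>>>    and o.object_name = ip.index_name
>>>    and o.subobject_name = ip.partition_name
>>>  where ip.index_owner = 'MY_OWNER'
>>>    and ip.index_name  = 'MY_PK_INDEX'
>>>  order by ip.partition_name;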
>>>
>>> On Thu, Nov 30, 2023 at 9:24 PM Timur Akhmadeev <
>>> timur.akhmadeev_at_gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Since it is reproducible, I suggest you take a few stacks of the
>>>> database process with oradebug short_stack and then check them at
>>>> orafun.info/stack.
>>>> If you can't take short stacks, then can you post the SQL Monitor report
>>>> of the bad run after it has fully completed and its status is DONE?
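>>>>
>>>> (For illustration, a minimal sketch for pulling that report once the run
>>>> is DONE; the SQL_ID value below is a placeholder:)
>>>>
>>>> set long 1000000 longchunksize 1000000 linesize 1000 pagesize 0
>>>> select dbms_sqltune.report_sql_monitor(
>>>>          sql_id       => 'abcd1234xyz99',   -- placeholder SQL_ID
>>>>          type         => 'TEXT',
>>>>          report_level => 'ALL')
>>>>   from dual;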
>>>>
>>>>
>>>>
>>>> On Mon, Nov 27, 2023 at 11:15 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> The insert query normally runs for <15 minutes, but on some days it runs
>>>>> for ~6-7 hours with exactly the same plan and approximately the same
>>>>> volume. All the resources seem to be spent on the step "LOAD TABLE
>>>>> CONVENTIONAL", and it is "ON CPU". Wondering where it is spending its time
>>>>> during the long executions?
>>>>>
>>>>> While the long run was happening, we fetched the stats from v$sesstat
>>>>> over a ~5-minute interval and collected the non-zero deltas, but could not
>>>>> find anything odd. Could you please guide me here: what could be the cause
>>>>> of this sudden long execution on some days for this same query with almost
>>>>> the same volume?
>>>>>
>>>>> I have posted the SQL Monitor reports from both runs below and also in
>>>>> the GitHub link below, and have also published the non-zero output from
>>>>> v$sesstat for the long execution over a ~5-minute delta. We do see some
>>>>> stats like "KTFB alloc**"; is that normal, or is some issue causing these
>>>>> stats to pop up?
>>>>>
>>>>> https://gist.github.com/oraclelearner/722a77ef5ebce23396d82e2835958303
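>>>>>
>>>>> (For reference, a minimal sketch of one way to take such a delta,
>>>>> assuming the session's SID is known; 1234 and the ~5-minute wait are
>>>>> placeholders:)
>>>>>
>>>>> -- snapshot 1
>>>>> create table sesstat_t0 as
>>>>>   select n.name, s.value
>>>>>     from v$sesstat s join v$statname n on n.statistic# = s.statistic#
>>>>>    where s.sid = 1234;
>>>>>
>>>>> -- wait ~5 minutes, then report the non-zero deltas
>>>>> select n.name, s.value - t0.value as delta
>>>>>   from v$sesstat s
>>>>>   join v$statname n on n.statistic# = s.statistic#
>>>>>   join sesstat_t0 t0 on t0.name = n.name
>>>>>  where s.sid = 1234
>>>>>    and s.value - t0.value > 0
>>>>>  order by delta desc;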
>>>>>
>>>>> ******** Sql monitor from long run ********
>>>>> Global Information
>>>>> ------------------------------
>>>>> Status : EXECUTING
>>>>>
>>>>> Instance ID : 1
>>>>>
>>>>> SQL Execution ID : 16777216
>>>>>
>>>>> Execution Started : 11/27/2023 03:30:16
>>>>>
>>>>> First Refresh Time : 11/27/2023 03:30:19
>>>>>
>>>>> Last Refresh Time : 11/27/2023 10:06:13
>>>>>
>>>>> Duration : 23759s
>>>>>
>>>>> PLSQL Entry Ids (Object/Subprogram) : 161298,9
>>>>>
>>>>> PLSQL Current Ids (Object/Subprogram) : 161298,15
>>>>>
>>>>>
>>>>> Global Stats
>>>>>
>>>>> ===========================================================================================================================================================
>>>>> | Elapsed | Cpu     | IO       | Application | Concurrency | Cluster  | PL/SQL  | Other    | Buffer | Read | Read  | Uncompressed | Offload    | Offload        | Cell    |
>>>>> | Time(s) | Time(s) | Waits(s) | Waits(s)    | Waits(s)    | Waits(s) | Time(s) | Waits(s) | Gets   | Reqs | Bytes | Bytes        | Elig Bytes | Returned Bytes | Offload |
>>>>> ===========================================================================================================================================================
>>>>> |   23758 |   23570 |     4.88 |        1.28 |        4.78 |       58 |      41 |      119 |    18G |   1M | 719GB |        359GB |      715GB |            4GB |  99.42% |
>>>>> ===========================================================================================================================================================
>>>>>
>>>>> SQL Plan Monitoring Details (Plan Hash Value=557779869)
>>>>>
>>>>> ===========================================================================================================================================================
>>>>> | Id   | Operation                 | Name      | Rows    | Cost | Time      | Start  | Execs | Rows     | Read | Read  | Mem | Activity | Activity Detail (# samples)                                                        | Progress |
>>>>> |      |                           |           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |     | (%)      |                                                                                    |          |
>>>>> ===========================================================================================================================================================
>>>>> | -> 0 | INSERT STATEMENT          |           |         |      |     23761 |     +3 |     1 |        0 |      |       |   . |     0.16 | Cpu (35); cell smart table scan (2)                                                |          |
>>>>> | -> 1 | LOAD TABLE CONVENTIONAL   | PBDRA_DTL |         |      |     23762 |     +2 |     1 |        0 | 499K |   4GB |   . |    98.42 | gc current block 2-way (4); gc current grant 2-way (23); gc current grant busy (18); latch: cache buffers chains (5); log file switch completion (1); undo segment extension (2); Cpu (22695); cell single block physical read: RDMA (8); cell single block physical read: xrmem cache (34); enq: FB - contention (1); ges message buffer allocation (2); cell single block physical read: flash cache (3) |          |
>>>>> | -> 2 | SEQUENCE                  | PBDRA_SEQ |         |      |     23761 |     +3 |     1 |      18M |      |       |   . |     0.50 | Cpu (109); reliable message (6)                                                    |          |
>>>>> | -> 3 | HASH JOIN                 |           |   68259 |   1M |     23761 |     +3 |     1 |      18M |      |       | 5MB |     0.02 | Cpu (5)                                                                            |          |
>>>>> |    4 | TABLE ACCESS STORAGE FULL | PRFS      |      78 |    2 |         1 |     +3 |     1 |       63 |    1 | 40960 |   . |          |                                                                                    |          |
>>>>> | -> 5 | PARTITION RANGE ITERATOR  |           |    152K |   1M |     23761 |     +3 |     1 |      18M |      |       |   . |          |                                                                                    |          |
>>>>> | -> 6 | TABLE ACCESS STORAGE FULL | PBABR_DTL |    152K |   1M |     23761 |     +3 |   215 |      18M | 740K | 715GB | 7MB |     0.33 | Cpu (73); gcs drm freeze in enter server mode (1); reliable message (1); cell smart table scan (1) | 100%     |
>>>>> ===========================================================================================================================================================
>>>>>
>>>>>
>>>>> ********* Quick Run ****************
>>>>>
>>>>> Global Information
>>>>> ------------------------------
>>>>> Status : DONE
>>>>>
>>>>> Instance ID : 1
>>>>>
>>>>> SQL Execution ID : 16777216
>>>>>
>>>>> Execution Started : 11/24/2023 04:39:04
>>>>>
>>>>> First Refresh Time : 11/24/2023 04:39:07
>>>>>
>>>>> Last Refresh Time : 11/24/2023 04:55:37
>>>>>
>>>>> Duration : 993s
>>>>>
>>>>> PLSQL Entry Ids (Object/Subprogram) : 161298,9
>>>>>
>>>>> PLSQL Current Ids (Object/Subprogram) : 161298,15
>>>>>
>>>>>
>>>>> Global Stats
>>>>>
>>>>> ===========================================================================================================================================================
>>>>> | Elapsed | Cpu     | IO       | Application | Concurrency | Cluster  | PL/SQL  | Other    | Buffer | Read | Read  | Uncompressed | Offload    | Offload        | Cell    |
>>>>> | Time(s) | Time(s) | Waits(s) | Waits(s)    | Waits(s)    | Waits(s) | Time(s) | Waits(s) | Gets   | Reqs | Bytes | Bytes        | Elig Bytes | Returned Bytes | Offload |
>>>>> ===========================================================================================================================================================
>>>>> |     993 |     942 |     5.27 |        0.05 |        0.01 |     0.02 |      68 |       46 |   316M |   2M | 867GB |        443GB |      861GB |            7GB |  99.21% |
>>>>> ===========================================================================================================================================================
>>>>>
>>>>> SQL Plan Monitoring Details (Plan Hash Value=557779869)
>>>>>
>>>>> =====================================================================================================================================================
>>>>> | Id | Operation                 | Name      | Rows    | Cost | Time      | Start  | Execs | Rows     | Read | Read  | Mem   | Activity | Activity Detail |
>>>>> |    |                           |           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) | (%)      | (# samples)     |
>>>>> =====================================================================================================================================================
>>>>> |  0 | INSERT STATEMENT          |           |         |      |       991 |     +3 |     1 |        0 |      |       |     . |          |                 |
>>>>> |  1 | LOAD TABLE CONVENTIONAL   | PBDRA_DTL |         |      |       991 |     +3 |     1 |        0 | 874K |   7GB |     . |          |                 |
>>>>> |  2 | SEQUENCE                  | PBDRA_SEQ |         |      |       992 |     +2 |     1 |      30M |      |       |     . |          |                 |
>>>>> |  3 | HASH JOIN                 |           |   66851 |   1M |       991 |     +3 |     1 |      30M |      |       |   5MB |          |                 |
>>>>> |  4 | TABLE ACCESS STORAGE FULL | PRFS      |      78 |    2 |         1 |     +3 |     1 |       63 |    2 | 49152 |   1MB |          |                 |
>>>>> |  5 | PARTITION RANGE ITERATOR  |           |    149K |   1M |       991 |     +3 |     1 |      32M |      |       |     . |          |                 |
>>>>> |  6 | TABLE ACCESS STORAGE FULL | PBABR_DTL |    149K |   1M |       991 |     +3 |   286 |      32M | 892K | 861GB |   7MB |          |                 |
>>>>> =====================================================================================================================================================
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Timur Akhmadeev
>>>>
>>>
>>
>> --
>> Regards
>> Timur Akhmadeev
>>
>

-- 
Regards
Timur Akhmadeev

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Dec 05 2023 - 14:19:00 CET
