Re: Same query with no plan change or volume but runs for hours vs minutes

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Tue, 5 Dec 2023 14:22:07 +0000
Message-ID: <CAGtsp8nb9OODFQBmqWBsTbGtf2+LLM7sMHirmHJjwtMTUdHFiw_at_mail.gmail.com>



One suggestion - you don't want stats gathering on this partition any time around the period when it's deleting and reinserting. I suggest you keep the stats on the partition locked, and unlock them and regather them as the last step of the job that re-inserts the data.

Regards
Jonathan Lewis

On Tue, 5 Dec 2023 at 11:23, Lok P <loknath.73_at_gmail.com> wrote:

> Thank you so much.
>
> Need some guidance on the tracing part. Should we do something like below
> while the query is running long? And also for how much time period should
> we take the trace? . And I believe we need to wait for the next long
> execution to happen.
>
> alter system set events 'trace[ksq][SQL:XXXXX] ';
> alter system set events 'trace[ksq][SQL:XXXXX] off';
>
> Another thing as we discovered , the stats gathering happens from another
> node-2 during the same time and also runs longer for ~30minutes+ (vs
> ~10minutes in normal days) on the same partition multiple times on which
> the data load is happening from this node-1. So not sure if that can be the
> cause or effect of this INSERT query slowness?
>
> Basically one node is executing the delete + insert conventional
> statement at the same time another node is executing stats gathered on the
> same partition. Can this cause such an anomaly in flagging the L1 bitmaps
> for the index blocks or bumping up the "reject db - optimizer index blocks
> reject" stats and thus end up consuming a lot more CPU ?
>
>
>
> On Tue, Dec 5, 2023 at 3:34 PM Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
> wrote:
>
>> https://gist.github.com/oraclelearner/6720861b373814614fe3b0669aee6901
>>
>> Thanks, I was thinking about possible statement restart due to undo
>> segment extension as there are such events and "transaction rollbacks"=1 in
>> the bad run (see
>> https://jonathanlewis.wordpress.com/2019/09/10/update-restarts/ for an
>> example) but it appears it is not the case, or it is just not reported by
>> SQL Monitor.
>>
>> You need to get stack traces as it will immediately point you in the
>> right direction.
>>
>> On Tue, Dec 5, 2023 at 11:40 AM Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> The sql monitor for the complete execution with DONE status of the bad
>>> run is as below.
>>>
>>> https://gist.github.com/oraclelearner/6720861b373814614fe3b0669aee6901
>>>
>>> One thing I was wrong about and just found that the same set of rows
>>> (almost ~30million) gets inserted as part of another process a day before
>>> into this same partition. And then as part of the current day's run , first
>>> there runs a delete query which deletes almost all the rows(~30million)
>>> from that partition and then this insert query runs and inserts all of the
>>> rows again with few additional rows (~32million). But again this pattern
>>> happens daily ,so unsure of the reason why it causes slowness on certain
>>> days only. Yet to investigate , if it's the correct business logic or we
>>> can avoid the delete somehow.
>>>
>>> And also as "Jonathan" pointed on the "gc current grant busy" stats, i
>>> do see stats gather was kicking on the same object from another nopde while
>>> this INSERT was executing and that stats gather finishes in ~10-15minutes.
>>> But hopefully that won't cause such an anomaly in flagging "L1 bitmaps" for
>>> the index blocks.
>>>
>>> Another thing i observed is, the value of the object_id and
>>> data_object_ids of table partition are all same but the for the PK index
>>> i.e dba_ind_partition is pointing to different object_id and
>>> data_object_ids in dba_objects. So I was thinking of any such structural
>>> change occurring on the index partition causing the insert to run longer.
>>> But I see similar differences between object_id and data_object_ids for all
>>> the index partitions, so that is not adding up to the theory. and also i
>>> didn't find any job or related query in the dba_hist_sqltext, which must be
>>> doing index rebuild/shrink/coalesce etc. Also did not find any such INSERT
>>> which must be happening at same time on the same object.
>>>
>>> On Thu, Nov 30, 2023 at 9:24 PM Timur Akhmadeev <
>>> timur.akhmadeev_at_gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Since it is reproducible I suggest you take a few stacks of the
>>>> database process with oradebug short_stack and then check them at
>>>> orafun.info/stack
>>>> If you can't do short stacks then can you post SQL Monitor of the bad
>>>> run after it is fully completed and its status is DONE.
>>>>
>>>>
>>>>
>>>> On Mon, Nov 27, 2023 at 11:15 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> The insert query normally runs for <15minutes somedays runs for
>>>>> ~6-7hrs with exactly the same plan and approx same volume. And all the
>>>>> resources seem to be spent on step "LOAD TABLE CONVENTIONAL" and its "ON
>>>>> CPU". Wondering where it's spending time during the long execution?
>>>>>
>>>>> While the long run was happening , we tried to fetch the stats from
>>>>> v$sesstat for a ~5minutes duration and collected the non zero stats. Not
>>>>> able to find out anything odd. Could you please guide me here, what must be
>>>>> the cause of this sudden long execution on some days for this same query
>>>>> with almost the same volume?
>>>>>
>>>>> I have posted the sql monitors from both the runs below and also in
>>>>> the below github linc. And also published the nonzero output from the
>>>>> v$sesstat for the long execution for a ~5 minutes delta duration.We do see
>>>>> some stats like "KTFB alloc**" , is that normal or some issue is
>>>>> causing these stats to pop up?
>>>>>
>>>>> https://gist.github.com/oraclelearner/722a77ef5ebce23396d82e2835958303
>>>>>
>>>>> ******** Sql monitor from long run ********
>>>>> Global Information
>>>>> ------------------------------
>>>>> Status : EXECUTING
>>>>>
>>>>> Instance ID : 1
>>>>>
>>>>> SQL Execution ID : 16777216
>>>>>
>>>>> Execution Started : 11/27/2023 03:30:16
>>>>>
>>>>> First Refresh Time : 11/27/2023 03:30:19
>>>>>
>>>>> Last Refresh Time : 11/27/2023 10:06:13
>>>>>
>>>>> Duration : 23759s
>>>>>
>>>>> PLSQL Entry Ids (Object/Subprogram) : 161298,9
>>>>>
>>>>> PLSQL Current Ids (Object/Subprogram) : 161298,15
>>>>>
>>>>>
>>>>> Global Stats
>>>>>
>>>>> ===========================================================================================================================================================================
>>>>> | Elapsed | Cpu | IO | Application | Concurrency | Cluster
>>>>> | PL/SQL | Other | Buffer | Read | Read | Uncompressed | Offload |
>>>>> Offload | Cell |
>>>>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Waits(s)
>>>>> | Time(s) | Waits(s) | Gets | Reqs | Bytes | Bytes | Elig Bytes |
>>>>> Returned Bytes | Offload |
>>>>>
>>>>> ===========================================================================================================================================================================
>>>>> | 23758 | 23570 | 4.88 | 1.28 | 4.78 | 58
>>>>> | 41 | 119 | 18G | 1M | 719GB | 359GB | 715GB |
>>>>> 4GB | 99.42% |
>>>>>
>>>>> ===========================================================================================================================================================================
>>>>>
>>>>> SQL Plan Monitoring Details (Plan Hash Value=557779869)
>>>>>
>>>>> ==================================================================================================================================================================================================================================
>>>>> | Id | Operation | Name
>>>>> | Rows | Cost | Time | Start | Execs | Rows | Read | Read
>>>>> | Mem | Activity | Activity Detail |
>>>>> Progress |
>>>>> | | |
>>>>> | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes |
>>>>> | (%) | (# samples) |
>>>>> |
>>>>>
>>>>> ==================================================================================================================================================================================================================================
>>>>> | -> 0 | INSERT STATEMENT |
>>>>> | | | 23761 | +3 | 1 | 0 | | |
>>>>> . | 0.16 | Cpu (35) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | cell smart table scan (2) |
>>>>> |
>>>>> | -> 1 | LOAD TABLE CONVENTIONAL | PBDRA_DTL
>>>>> | | | 23762 | +2 | 1 | 0 | 499K | 4GB |
>>>>> . | 98.42 | gc current block 2-way (4) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | gc current grant 2-way (23) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | gc current grant busy (18) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | latch: cache buffers chains (5) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | log file switch completion (1) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | undo segment extension (2) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | Cpu (22695) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | cell single block physical read: RDMA (8) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | cell single block physical read: xrmem cache (34) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | enq: FB - contention (1) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | ges message buffer allocation (2) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | cell single block physical read: flash cache (3) |
>>>>> |
>>>>> | -> 2 | SEQUENCE | PBDRA_SEQ
>>>>> | | | 23761 | +3 | 1 | 18M | | |
>>>>> . | 0.50 | Cpu (109) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | reliable message (6) |
>>>>> |
>>>>> | -> 3 | HASH JOIN |
>>>>> | 68259 | 1M | 23761 | +3 | 1 | 18M | | |
>>>>> 5MB | 0.02 | Cpu (5) |
>>>>> |
>>>>> | 4 | TABLE ACCESS STORAGE FULL | PRFS
>>>>> | 78 | 2 | 1 | +3 | 1 | 63 | 1 | 40960
>>>>> | . | | |
>>>>> |
>>>>> | -> 5 | PARTITION RANGE ITERATOR |
>>>>> | 152K | 1M | 23761 | +3 | 1 | 18M | | |
>>>>> . | | |
>>>>> |
>>>>> | -> 6 | TABLE ACCESS STORAGE FULL | PBABR_DTL
>>>>> | 152K | 1M | 23761 | +3 | 215 | 18M | 740K | 715GB |
>>>>> 7MB | 0.33 | Cpu (73) |
>>>>> 100% |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | gcs drm freeze in enter server mode (1) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | reliable message (1) |
>>>>> |
>>>>> | | |
>>>>> | | | | | | | | |
>>>>> | | cell smart table scan (1) |
>>>>> |
>>>>>
>>>>> ==================================================================================================================================================================================================================================
>>>>>
>>>>>
>>>>> ********* Quick Run ****************
>>>>>
>>>>> Global Information
>>>>> ------------------------------
>>>>> Status : DONE
>>>>>
>>>>> Instance ID : 1
>>>>>
>>>>> SQL Execution ID : 16777216
>>>>>
>>>>> Execution Started : 11/24/2023 04:39:04
>>>>>
>>>>> First Refresh Time : 11/24/2023 04:39:07
>>>>>
>>>>> Last Refresh Time : 11/24/2023 04:55:37
>>>>>
>>>>> Duration : 993s
>>>>>
>>>>> PLSQL Entry Ids (Object/Subprogram) : 161298,9
>>>>>
>>>>> PLSQL Current Ids (Object/Subprogram) : 161298,15
>>>>>
>>>>>
>>>>> Global Stats
>>>>>
>>>>> ===========================================================================================================================================================================
>>>>> | Elapsed | Cpu | IO | Application | Concurrency | Cluster
>>>>> | PL/SQL | Other | Buffer | Read | Read | Uncompressed | Offload |
>>>>> Offload | Cell |
>>>>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Waits(s)
>>>>> | Time(s) | Waits(s) | Gets | Reqs | Bytes | Bytes | Elig Bytes |
>>>>> Returned Bytes | Offload |
>>>>>
>>>>> ===========================================================================================================================================================================
>>>>> | 993 | 942 | 5.27 | 0.05 | 0.01 | 0.02
>>>>> | 68 | 46 | 316M | 2M | 867GB | 443GB | 861GB |
>>>>> 7GB | 99.21% |
>>>>>
>>>>> ===========================================================================================================================================================================
>>>>>
>>>>> SQL Plan Monitoring Details (Plan Hash Value=557779869)
>>>>>
>>>>> =====================================================================================================================================================================================
>>>>> | Id | Operation | Name
>>>>> | Rows | Cost | Time | Start | Execs | Rows | Read | Read |
>>>>> Mem | Activity | Activity Detail |
>>>>> | | |
>>>>> | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes |
>>>>> (Max) | (%) | (# samples) |
>>>>>
>>>>> =====================================================================================================================================================================================
>>>>> | 0 | INSERT STATEMENT |
>>>>> | | | 991 | +3 | 1 | 0 | | |
>>>>> . | | |
>>>>> | 1 | LOAD TABLE CONVENTIONAL | PBDRA_DTL
>>>>> | | | 991 | +3 | 1 | 0 | 874K | 7GB |
>>>>> . | | |
>>>>> | 2 | SEQUENCE | PBDRA_SEQ
>>>>> | | | 992 | +2 | 1 | 30M | | |
>>>>> . | | |
>>>>> | 3 | HASH JOIN |
>>>>> | 66851 | 1M | 991 | +3 | 1 | 30M | | |
>>>>> 5MB | | |
>>>>> | 4 | TABLE ACCESS STORAGE FULL | PRFS
>>>>> | 78 | 2 | 1 | +3 | 1 | 63 | 2 | 49152 |
>>>>> 1MB | | |
>>>>> | 5 | PARTITION RANGE ITERATOR |
>>>>> | 149K | 1M | 991 | +3 | 1 | 32M | | |
>>>>> . | | |
>>>>> | 6 | TABLE ACCESS STORAGE FULL | PBABR_DTL
>>>>> | 149K | 1M | 991 | +3 | 286 | 32M | 892K | 861GB |
>>>>> 7MB | | |
>>>>>
>>>>> =====================================================================================================================================================================================
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Timur Akhmadeev
>>>>
>>>
>>
>> --
>> Regards
>> Timur Akhmadeev
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Dec 05 2023 - 15:22:07 CET

Original text of this message