Re: Why index access is not happening in parallel
Date: Sun, 25 Jun 2023 09:49:38 +0200
Message-ID: <be67a1e7-e0df-658c-a41c-1cbefbbc2c9a_at_bluewin.ch>
Hi Pap,
I would hint like that to get a parallel insert:
INSERT /*+APPEND *parallel(4)* */ INTO TAB1 (...) SELECT
/*+parallel(4)*/ .....,
It should not be required, but in some occasions it only works with the
parallel Hint in the insert. If that does not work: there could be
other reasons: Does your target table include a LOB?
Let us see if that resolves the index access as well.
If not, just run the query part and do a display_cursor. The very
section of the plan contains the note section with starts with the words
"Note:".
In most cases it will give you a explanation for the breach of parallelism.
Thanks
Lothar
Am 24.06.2023 um 22:17 schrieb Pap:
> Thank You So much Lothar.
>
> I missed these critical things in the sql monitor. Thanks for the
> guidance.
>
> I can see the majority of the time spent in both the INSERT and
> SELECT query is in the PX-coordinator only. But the
> gv$active_session_history was showing highest number of samples count
> at index access path step only. Not sure why.
>
> Actually the expectation behind the usage of higher parallelism was
> faster processing in case of large data volume. and we do have
> 20-30million rows fetched using that index access path and aggregated
> and in those cases the serial execution is causing bottleneck.
>
> Also the hints used in the Insert query are something as below. Do you
> see anything wrong here, which is why INSERT is running in serial? No
> functions used in the queries though. The Decode/case statements are
> used in the SELECT part and that will not stop the parallelism i believe.
>
> INSERT /*+APPEND*/ INTO TAB1 (...) SELECT /*+parallel(4)*/ .....,
>
> And as mentioned, we have scenarios in which the below section of the
> plan actually scans 20-30million rows(mainly from table RFFT) and
> there we see slowness for the index access path (mostly because it's
> happening in serial). And is it correct that , if the INSERT can be
> made parallel then we would be able to make the INDEX Access of table
> RFFT also happen in parallel?
>
> | 10 | PX COORDINATOR |
> | | | 59 | +1 | 5 | 35512 | 1 |
> 8192 | | | . | |
> |
> | 11 | PX SEND QC (RANDOM) | :TQ20000 |
> 63082 | 58 | 56 | +2 | 4 | 35512 | | |
> | | . | | |
> | 12 | PX BLOCK ITERATOR | |
> 63082 | 58 | 56 | +2 | 4 | 35512 | |
> | | | . | | |
> | 13 | TABLE ACCESS STORAGE FULL | RTFX
> | 63082 | 58 | 56 | +2 | 54 | 35512 | 60 |
> 6MB | | | . | 3.03 | cell multiblock physical read
> (2) |
> | 14 | PARTITION LIST ITERATOR |
> | 2 | 8 | 59 | +1 | 35512 | 3M | |
> | | | . | 1.52 | Cpu (1) |
> | 15 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | RFFT
> | 2 | 8 | 59 | +1 | 34820 |
> 3M | | | | | . | 9.09 | gc current block
> 3-way (3) |
> | | | | | | |
> | | | | | | | |
> | Cpu (3) |
> | 16 | INDEX RANGE SCAN | RFFT_IX7 |
> 1068 | 1 | 59 | +1 | 34820 | 3M | | |
> | | . | | |
> | 17 | TABLE ACCESS BY INDEX ROWID BATCHED | RTNI
> | 1 | 2 | 59 | +1 | 3M | 3M |
> 371 | 3MB | | | . | 7.58 | Cpu (5) |
> | 18 | INDEX RANGE SCAN | RTNI_IX6 |
> 1 | 1 | 59 | +1 | 3M | 3M | 49 | 392KB |
> | | . | 4.55 | Cpu (3) |
> =========================================================================================================================================================================================================================================
>
> And also regarding the SELECT query in the second post, that also
> shows parallel for the full table scan RTNI but then table TAB1 is
> getting scanned using index TAB1_IX1 and seems to be happening in
> serial. Is it because of the order by clause? as because no function
> is used in the query here. Here also at times the query fetches
> 20-30million rows and there we are expecting the Index access of tab1
> to happen in parallel threads.
>
> The hint used in the SELECT query is as below.
>
> SELECT /*+parallel(4)*/ ..... DECODE (TAB1.etyp, 'XX', RTNI.SCD,
> TAB1.DCD) .....
>
> -----------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name |
> Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
> -----------------------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | |
> | 33329 (100)| | | | |
> | 1 | SORT GROUP BY | | 16 |
> 3456 | 33329 (1)| 00:00:02 | | | |
> | 2 | NESTED LOOPS | | 1798 |
> 379K| 33328 (1)| 00:00:02 | | | |
> | 3 | PX COORDINATOR | | |
> | | | | | |
> | 4 | PX SEND QC (RANDOM) | :TQ10000 |
> 20 | 2980 | 8 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
> | 5 | PX BLOCK ITERATOR | | 20 |
> 2980 | 8 (0)| 00:00:01 | Q1,00 | PCWC | |
> | 6 | TABLE ACCESS STORAGE FULL | RTNI |
> 20 | 2980 | 8 (0)| 00:00:01 | Q1,00 | PCWP | |
> | 7 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB1 |
> 88 | 5896 | 6005 (1)| 00:00:01 | | | |
> | 8 | INDEX RANGE SCAN | TAB1_IX1 |
> 370 | | 5997 (1)| 00:00:01 | | | |
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>
> On Sat, Jun 24, 2023 at 9:54 PM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
>
> Hi Pap,
>
> as far as I can see, only the "TABLE ACCESS STORAGE FULL " run in
> parallel.
> Before the nested loop joins the result set goes to the
> Coordinator process. That means the nested loops run serial.
> Actually that is ok, since the estimated number of rows are low. I
> see 100k rows. That is not much to go parallel on.
> As you can see, the is not much time spend in the nested loops.
>
> SQL Plan Monitoring Details (Plan Hash Value=655425048)
> ==========================================================================================================================
> | Id | Operation | Name | Rows
> | Rows | Activity |
> | | | | (Estim) | (Actual) | (%) |
> ==========================================================================================================================
> | 0 | INSERT STATEMENT | |
> | 2 | |
> | 1 | LOAD AS SELECT |
> TAB1 | | 2 | 71.21
> |<--------- that is the important part where time is spend, the
> coordinator processdoes all this work
> | | | | | | |
> | | | | | | |
> | 2 | OPTIMIZER STATISTICS GATHERING
> | | 8395 | 0 | |
> | 3 | HASH JOIN | | 8395
> | 3M | 3.03 |<--- 3M row to be inserted
> | 4 | PX COORDINATOR |
> | | 39093 | |<---------- now we go serial
> | 5 | PX SEND QC (RANDOM) |
> :TQ10000 | 39428 | 39093 | |
> | 6 | PX BLOCK ITERATOR |
> | 39428 | 39093 | |
> | 7 | TABLE ACCESS STORAGE FULL | RDC |
> 39428 | 39093 | |
> | 8 | NESTED LOOPS | | 8395
> | 3M | |
> | 9 | NESTED LOOPS | |
> 100K | 3M | |
> | 10 | PX COORDINATOR |
> | | 35512 | |<------- we are serial again
> | 11 | PX SEND QC (RANDOM) |
> :TQ20000 | 63082 | 35512 | |
> | 12 | PX BLOCK ITERATOR |
> | 63082 | 35512 | |
> | 13 | TABLE ACCESS STORAGE FULL | RTFX
> | 63082 | 35512 | 3.03 |
> | 14 | PARTITION LIST ITERATOR
> | | 2 | 3M | 1.52 |
> | 15 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED |
> RFFT | 2 | 3M | 9.09
> |<-------- Activity is neglectable, not much time spend here
> | | | | | | |
> | 16 | INDEX RANGE SCAN |
> RFFT_IX7 | 1068 | 3M | |
> | 17 | TABLE ACCESS BY INDEX ROWID BATCHED | RTNI
> | 1 | 3M | 7.58 |<--------- Noct much time spend here
> | 18 | INDEX RANGE SCAN |
> RTNI_IX6 | 1 | 3M | 4.55 |
> ==========================================================================================================================
>
> The insert part is the important one for run time, And it is
> serial in both cases.
> You can also see the total runtime is almost only on the
> coordinator process.
>
> =======================================================
> | Name | Type | Group# | Server# | Elapsed |
> | | | | | Time(s) |
> =======================================================
> | PX Coordinator | QC | | | 67 | <----- this is
> where the time is spend
> | | | | | |
> | | | | | |
> | p000 | Set 1 | 1 | 1 | 0.00 |
> | p001 | Set 1 | 1 | 2 | 0.00 |
> | p002 | Set 1 | 1 | 3 | 0.00 |
> | p003 | Set 1 | 1 | 4 | 0.00 |
> | p000 | Set 1 | 2 | 1 | 0.22 |
> | p001 | Set 1 | 2 | 2 | 0.17 |
> | p002 | Set 1 | 2 | 3 | 0.13 |
> | p003 | Set 1 | 2 | 4 | 0.13 |
> =======================================================
>
> All serial steps are done by the coordinator process.
>
> Therefore that are two Questions to be answered
>
> _*Why is parallel 2 quicker than parallel 4?
> *In short it is not. The parallel 2 execution has less rows to
> insert (2M versus 3M). Two million rows are not so many that you
> could expect a big uplift from direct path insert.
>
> *Why is the Insert not done in parallel?*
> _Not sure. I do not see your hint , where it is placed. Try to
> write it direct after the append hint.
> There could be different reasons for part of the statement not
> going parallel. Sometimes the use of a function that is not
> parallel enables will prevent parallel from happening.
> Normally the plan note will give you information on parallel
> execution. You should display the plan online.
> (select from (dbmx_plan-display_cursor(...)); )
> In any case, if there are only a couple of thousand rows parallel
> makes not much difference.._
>
>
> _Thanks
>
> Lothar_
>
> _Am 24.06.2023 um 17:24 schrieb Pap:
>> Seeing another query with a similar pattern, though the query is
>> different and its simple "Select query". But in this case too
>> even in one place the query is executed with parallel-4 hint and
>> another with parallel-8 hint in the same database and same
>> objects. But the throughput seems more or less the same. So it
>> seems here also the index access is not happening in parallel too.
>>
>> Below is the sql monitor details :-
>>
>> https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765
>>
>> Not sure if it's a silly one, but I am wondering if my
>> understanding is wrong , when and how the "index access" can
>> happen in parallel fashion? Appreciate your guidance.
>>
>> Regards
>> Pap
>>
>> On Sat, Jun 24, 2023 at 8:13 PM Pap
>> <oracle.developer35_at_gmail.com> wrote:
>>
>> Hello Listers,
>> This Oracle database version 19C and its exadata. We have a
>> customer query in which one of the executions is happening in
>> parallel(2) and another execution is happening in parallel(4)
>> in two different places. The query is exactly the same , just
>> that the place in which it's getting used with PARALLEL(4)
>> it's also happening with the APPEND hint.
>>
>> People here are expecting the query which runs using
>> parallel(4) with Append should be faster as compared to the
>> other one (i.e. with just parallel(2)). But it's not
>> happening that way. Want to understand why?
>>
>> And also in this query the step no which is coming as top
>> contributor in ASH is table access RFFT using index access
>> path - RFFT_IX7. This is a big table partitioned on column
>> C_KEY. Other tables are small tables. So just wondering if
>> the higher parallel threads were not helping the query
>> because the index access is happening using BATCHED index
>> range scan (i.e. TABLE ACCESS BY LOCAL INDEX ROWID BATCHED).
>> or are we misinterpreting the execution plan here and
>> something in the query is not making the index scan to happen
>> in parallel?
>>
>> Below is the sql monitor details of the two plans. Below are
>> two sql monitors, in one case it has processed 2million vs
>> other 3million , however if we see the processing speed i.e.
>> ~67 seconds for 3million vs 38 seconds for 2million. The
>> throughput seems to be more or less the same only. The
>> Parallel-8+ Append seems to be comparatively slower though,
>> which is odd.
>>
>> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773
>>
>> Regards
>> Pap
>>
>
-- http://www.freelists.org/webpage/oracle-lReceived on Sun Jun 25 2023 - 09:49:38 CEST