Re: Why index access is not happening in parallel

From: Lothar Flatz <l.flatz_at_bluewin.ch>
Date: Sun, 25 Jun 2023 09:49:38 +0200
Message-ID: <be67a1e7-e0df-658c-a41c-1cbefbbc2c9a_at_bluewin.ch>

Hi Pap,

I would hint like this to get a parallel insert:

INSERT /*+ APPEND parallel(4) */ INTO TAB1 (...) SELECT /*+ parallel(4) */ .....,

It should not be required, but on some occasions it only works with the parallel hint in the insert. If that does not work, there could be other reasons: does your target table include a LOB? Let us see if that resolves the index access as well. If not, just run the query part and display the plan with display_cursor. The very last section of the plan is the note section, which starts with the word "Note".
In most cases it will give you an explanation for the break in parallelism.
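
A minimal sketch of the two steps described above (hint the insert, then look at the plan note), not a definitive recipe. The tables target_t and source_t and their columns are hypothetical stand-ins for TAB1 and the real query, which the thread does not show in full. The ALTER SESSION line is an assumption about the session setup: Oracle does not run DML in parallel unless parallel DML has been enabled for the session (or through the ENABLE_PARALLEL_DML hint), and when it is missing the note section typically reports that PDML is disabled.

```sql
-- Assumption: parallel DML has not been enabled yet in this session.
ALTER SESSION ENABLE PARALLEL DML;

-- target_t and source_t are hypothetical names used only for illustration.
-- The PARALLEL hint appears both after APPEND in the INSERT and in the SELECT.
INSERT /*+ APPEND PARALLEL(4) */ INTO target_t (id, val)
SELECT /*+ PARALLEL(4) */ s.id, s.val
FROM   source_t s;

-- Show the plan of the last statement executed in this session.
-- The Note section at the end of the output usually explains
-- why a step did or did not run in parallel.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'TYPICAL'));

-- A direct-path insert must be committed (or rolled back) before
-- the same session can query the target table again.
COMMIT;
```

If LOAD AS SELECT shows up below the PX COORDINATOR line in the resulting plan, the load itself is being done by the parallel servers rather than by the coordinator.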

Thanks

Lothar

On 24.06.2023 at 22:17, Pap wrote:
> Thank You So much Lothar.
>
> I missed these critical things in the sql monitor. Thanks for the
> guidance.
>
> I can see the majority of the time spent in both the INSERT and
> SELECT query is in the PX-coordinator only. But the
> gv$active_session_history was showing highest number of samples count
> at index access path step only. Not sure why.
>
> Actually, the expectation behind the usage of higher parallelism was
> faster processing in case of large data volume, and we do have
> 20-30 million rows fetched using that index access path and aggregated;
> in those cases the serial execution is causing a bottleneck.
>
> Also the hints used in the Insert query are something as below. Do you
> see anything wrong here, which is why INSERT is running in serial? No
> functions used in the queries though. The Decode/case statements are
> used in the SELECT part and that will not stop the parallelism i believe.
>
> INSERT /*+APPEND*/ INTO TAB1 (...) SELECT /*+parallel(4)*/ .....,
>
> And as mentioned, we have scenarios in which the below section of the
> plan actually scans 20-30 million rows (mainly from table RFFT), and
> there we see slowness for the index access path (mostly because it's
> happening in serial). Is it correct that, if the INSERT can be
> made parallel, we would also be able to make the index access of table
> RFFT happen in parallel?
>
> | 10 |  PX COORDINATOR                             |          |       |    | 59 | +1 |     5 | 35512 |   1 |  8192 |   |   | . |      |                                   |
> | 11 |   PX SEND QC (RANDOM)                       | :TQ20000 | 63082 | 58 | 56 | +2 |     4 | 35512 |     |       |   |   | . |      |                                   |
> | 12 |    PX BLOCK ITERATOR                        |          | 63082 | 58 | 56 | +2 |     4 | 35512 |     |       |   |   | . |      |                                   |
> | 13 |     TABLE ACCESS STORAGE FULL               | RTFX     | 63082 | 58 | 56 | +2 |    54 | 35512 |  60 |   6MB |   |   | . | 3.03 | cell multiblock physical read (2) |
> | 14 |  PARTITION LIST ITERATOR                    |          |     2 |  8 | 59 | +1 | 35512 |    3M |     |       |   |   | . | 1.52 | Cpu (1)                           |
> | 15 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | RFFT     |     2 |  8 | 59 | +1 | 34820 |    3M |     |       |   |   | . | 9.09 | gc current block 3-way (3)        |
> |    |                                             |          |       |    |    |    |       |       |     |       |   |   |   |      | Cpu (3)                           |
> | 16 |    INDEX RANGE SCAN                         | RFFT_IX7 |  1068 |  1 | 59 | +1 | 34820 |    3M |     |       |   |   | . |      |                                   |
> | 17 | TABLE ACCESS BY INDEX ROWID BATCHED         | RTNI     |     1 |  2 | 59 | +1 |    3M |    3M | 371 |   3MB |   |   | . | 7.58 | Cpu (5)                           |
> | 18 |  INDEX RANGE SCAN                           | RTNI_IX6 |     1 |  1 | 59 | +1 |    3M |    3M |  49 | 392KB |   |   | . | 4.55 | Cpu (3)                           |
> ===========================================================================================================================================================================
>
> Also, regarding the SELECT query in the second post: that one also
> shows parallel execution for the full table scan of RTNI, but then table
> TAB1 is scanned using index TAB1_IX1, and that seems to be happening in
> serial. Is it because of the ORDER BY clause? No function
> is used in the query here. Here also the query at times fetches
> 20-30 million rows, and there we are expecting the index access of TAB1
> to happen in parallel threads.
>
> The hint used in the SELECT query is as below.
>
> SELECT /*+parallel(4)*/ ..... DECODE (TAB1.etyp, 'XX', RTNI.SCD,
> TAB1.DCD) .....
>
> --------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation                              | Name     | Rows | Bytes | Cost (%CPU) | Time     | TQ    | IN-OUT | PQ Distrib |
> --------------------------------------------------------------------------------------------------------------------------------
> |  0 | SELECT STATEMENT                       |          |      |       | 33329 (100) |          |       |        |            |
> |  1 |  SORT GROUP BY                         |          |   16 |  3456 | 33329   (1) | 00:00:02 |       |        |            |
> |  2 |   NESTED LOOPS                         |          | 1798 |  379K | 33328   (1) | 00:00:02 |       |        |            |
> |  3 |    PX COORDINATOR                      |          |      |       |             |          |       |        |            |
> |  4 |     PX SEND QC (RANDOM)                | :TQ10000 |   20 |  2980 |     8   (0) | 00:00:01 | Q1,00 | P->S   | QC (RAND)  |
> |  5 |      PX BLOCK ITERATOR                 |          |   20 |  2980 |     8   (0) | 00:00:01 | Q1,00 | PCWC   |            |
> |  6 |       TABLE ACCESS STORAGE FULL        | RTNI     |   20 |  2980 |     8   (0) | 00:00:01 | Q1,00 | PCWP   |            |
> |  7 |    TABLE ACCESS BY INDEX ROWID BATCHED | TAB1     |   88 |  5896 |  6005   (1) | 00:00:01 |       |        |            |
> |  8 |     INDEX RANGE SCAN                   | TAB1_IX1 |  370 |       |  5997   (1) | 00:00:01 |       |        |            |
> --------------------------------------------------------------------------------------------------------------------------------
>
>
> On Sat, Jun 24, 2023 at 9:54 PM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
>
> Hi Pap,
>
> As far as I can see, only the "TABLE ACCESS STORAGE FULL" steps run in
> parallel.
> Before the nested loop joins, the result set goes to the
> coordinator process. That means the nested loops run serially.
> Actually that is ok, since the estimated number of rows is low. I
> see 100k rows. That is not much to go parallel on.
> As you can see, there is not much time spent in the nested loops.
>
> SQL Plan Monitoring Details (Plan Hash Value=655425048)
> =====================================================================================================
> | Id | Operation                                         | Name     |    Rows |     Rows | Activity |
> |    |                                                   |          | (Estim) | (Actual) |   (%)    |
> =====================================================================================================
> |  0 | INSERT STATEMENT                                  |          |         |        2 |          |
> |  1 |   LOAD AS SELECT                                  | TAB1     |         |        2 |    71.21 | <--- this is the important part where time is spent; the coordinator process does all this work
> |  2 |    OPTIMIZER STATISTICS GATHERING                 |          |    8395 |        0 |          |
> |  3 |     HASH JOIN                                     |          |    8395 |       3M |     3.03 | <--- 3M rows to be inserted
> |  4 |      PX COORDINATOR                               |          |         |    39093 |          | <--- now we go serial
> |  5 |       PX SEND QC (RANDOM)                         | :TQ10000 |   39428 |    39093 |          |
> |  6 |        PX BLOCK ITERATOR                          |          |   39428 |    39093 |          |
> |  7 |         TABLE ACCESS STORAGE FULL                 | RDC      |   39428 |    39093 |          |
> |  8 |      NESTED LOOPS                                 |          |    8395 |       3M |          |
> |  9 |       NESTED LOOPS                                |          |    100K |       3M |          |
> | 10 |        PX COORDINATOR                             |          |         |    35512 |          | <--- we are serial again
> | 11 |         PX SEND QC (RANDOM)                       | :TQ20000 |   63082 |    35512 |          |
> | 12 |          PX BLOCK ITERATOR                        |          |   63082 |    35512 |          |
> | 13 |           TABLE ACCESS STORAGE FULL               | RTFX     |   63082 |    35512 |     3.03 |
> | 14 |        PARTITION LIST ITERATOR                    |          |       2 |       3M |     1.52 |
> | 15 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | RFFT     |       2 |       3M |     9.09 | <--- activity is negligible, not much time spent here
> | 16 |          INDEX RANGE SCAN                         | RFFT_IX7 |    1068 |       3M |          |
> | 17 |       TABLE ACCESS BY INDEX ROWID BATCHED         | RTNI     |       1 |       3M |     7.58 | <--- not much time spent here
> | 18 |        INDEX RANGE SCAN                           | RTNI_IX6 |       1 |       3M |     4.55 |
> =====================================================================================================
>
> The insert part is the important one for run time, and it is
> serial in both cases.
> You can also see that the total runtime is spent almost entirely in
> the coordinator process.
>
> =======================================================
> |      Name      | Type  | Group# | Server# | Elapsed |
> |                |       |        |         | Time(s) |
> =======================================================
> | PX Coordinator | QC    |        |         |      67 | <----- this is where the time is spent
> | p000           | Set 1 |      1 |       1 |    0.00 |
> | p001           | Set 1 |      1 |       2 |    0.00 |
> | p002           | Set 1 |      1 |       3 |    0.00 |
> | p003           | Set 1 |      1 |       4 |    0.00 |
> | p000           | Set 1 |      2 |       1 |    0.22 |
> | p001           | Set 1 |      2 |       2 |    0.17 |
> | p002           | Set 1 |      2 |       3 |    0.13 |
> | p003           | Set 1 |      2 |       4 |    0.13 |
> =======================================================
>
> All serial steps are done by the coordinator process.
>
> Therefore there are two questions to be answered.
>
> Why is parallel 2 quicker than parallel 4?
> In short, it is not. The parallel 2 execution has fewer rows to
> insert (2M versus 3M). Two million rows are not so many that you
> could expect a big uplift from direct path insert.
>
> Why is the insert not done in parallel?
> Not sure. I do not see where your hint is placed. Try to
> write it directly after the append hint.
> There could be different reasons for part of the statement not
> going parallel. Sometimes the use of a function that is not
> parallel enabled will prevent parallel execution from happening.
> Normally the plan note will give you information on parallel
> execution. You should display the plan online
> (select * from table(dbms_xplan.display_cursor(...));).
> In any case, if there are only a couple of thousand rows, parallel
> does not make much difference.
>
>
> Thanks
>
> Lothar
>
> On 24.06.2023 at 17:24, Pap wrote:
>> Seeing another query with a similar pattern, though the query is
>> different and it's a simple SELECT query. In this case too,
>> the query is executed in one place with a parallel-4 hint and in
>> another with a parallel-8 hint, in the same database and on the same
>> objects, but the throughput seems more or less the same. So it
>> seems that here also the index access is not happening in parallel.
>>
>> Below is the sql monitor details :-
>>
>> https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765
>>
>> Not sure if it's a silly one, but I am wondering if my
>> understanding is wrong , when and how the "index access" can
>> happen in parallel fashion? Appreciate your guidance.
>>
>> Regards
>> Pap
>>
>> On Sat, Jun 24, 2023 at 8:13 PM Pap
>> <oracle.developer35_at_gmail.com> wrote:
>>
>> Hello Listers,
>> This is Oracle database version 19c, on Exadata. We have a
>> customer query in which one of the executions is happening in
>> parallel(2) and another execution is happening in parallel(4)
>> in two different places. The query is exactly the same , just
>> that the place in which it's getting used with PARALLEL(4)
>> it's also happening with the APPEND hint.
>>
>> People here are expecting the query which runs using
>> parallel(4) with Append should be faster as compared to the
>> other one (i.e. with just parallel(2)). But it's not
>> happening that way. Want to understand why?
>>
>> And also in this query the step no which is coming as top
>> contributor in ASH is table access RFFT using index access
>> path - RFFT_IX7. This is a big table partitioned on column
>> C_KEY. Other tables are small tables. So just wondering if
>> the higher parallel threads were not helping the query
>> because the index access is happening using BATCHED index
>> range scan (i.e. TABLE ACCESS BY LOCAL INDEX ROWID BATCHED).
>> or are we misinterpreting the execution plan here and
>> something in the query is not making the index scan to happen
>> in parallel?
>>
>> Below are the SQL monitor details of the two plans. In one case
>> it processed 2 million rows and in the other 3 million; however,
>> if we look at the processing speed, i.e. ~67 seconds for 3 million
>> vs 38 seconds for 2 million, the throughput seems to be more or
>> less the same. The Parallel-8+ Append seems to be comparatively
>> slower though, which is odd.
>>
>> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773
>>
>> Regards
>> Pap
>>
>

--
http://www.freelists.org/webpage/oracle-l

Received on Sun Jun 25 2023 - 09:49:38 CEST
