Re: Why index access is not happening in parallel

From: Lothar Flatz <l.flatz_at_bluewin.ch>
Date: Sun, 25 Jun 2023 09:49:38 +0200
Message-ID: <be67a1e7-e0df-658c-a41c-1cbefbbc2c9a_at_bluewin.ch>



Hi Pap,

I would hint like that to get a parallel insert:

INSERT /*+APPEND *parallel(4)* */ INTO TAB1 (...)    SELECT /*+parallel(4)*/ .....,
It should not be required, but in some occasions it only works with the parallel Hint in the insert.  If that does not work: there could be other reasons: Does your target table include a LOB? Let us see if that resolves the index access as well. If not, just run the query part and do a display_cursor. The very section of the plan contains the note section with starts with the words "Note:".
In most cases it will give you a explanation for the breach of parallelism.

Thanks

Lothar

Am 24.06.2023 um 22:17 schrieb Pap:
> Thank You So much Lothar.
>
> I missed these critical things in the sql monitor. Thanks for the
> guidance.
>
>  I can see the majority of the time spent in both the INSERT and
> SELECT query is in the PX-coordinator only. But the
> gv$active_session_history was showing highest number of samples count
> at index access path step only. Not sure why.
>
> Actually the expectation behind the usage of higher parallelism was
> faster processing in case of large data volume. and we do have
> 20-30million rows fetched using that index access path and aggregated
> and in those cases the serial execution is causing bottleneck.
>
> Also the hints used in the Insert query are something as below. Do you
> see anything wrong here, which is why INSERT is running in serial? No
> functions used in the queries though. The Decode/case statements are
> used in the SELECT part and that will not stop the parallelism i believe.
>
> INSERT /*+APPEND*/ INTO TAB1 (...)  SELECT /*+parallel(4)*/ .....,
>
> And as mentioned, we have scenarios in which the below section of the
> plan actually scans 20-30million rows(mainly from table RFFT) and
> there we see slowness for the index access path (mostly because it's
> happening in serial). And is it correct that , if the INSERT can be
> made parallel then we would be able to make the INDEX Access of table
> RFFT also happen in parallel?
>
> | 10 |        PX COORDINATOR                             |            
>       |         |      |        59 | +1 |     5 |    35512 |    1 |
>  8192 |       |       |     . |          |                            
>       |
> | 11 |         PX SEND QC (RANDOM)   | :TQ20000                      |
>   63082 |   58 |      56 |     +2 |     4 |    35512 |      |       |
>     |       |     . |          |             |
> | 12 |          PX BLOCK ITERATOR    |                               |
>   63082 |   58 |        56 |     +2 |     4 |    35512 |      |      
> |       |       |     . |          |               |
> | 13 |           TABLE ACCESS STORAGE FULL   | RTFX                  
>        |   63082 |   58 |      56 |     +2 |    54 |    35512 |   60 |
>   6MB |     |       |     . |     3.03 | cell multiblock physical read
> (2) |
> | 14 |        PARTITION LIST ITERATOR    |                            
>   |       2 |    8 |        59 |     +1 | 35512 |       3M |      |  
>     |       |       |     . |     1.52 | Cpu (1)               |
> | 15 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | RFFT      
>                    |       2 |  8 |        59 |     +1 | 34820 |      
> 3M |      |     |       |       |     . |     9.09 | gc current block
> 3-way (3)        |
> |    |   |                               |         |      |         |
>        |       |          |      |       |     |       |       |      
>    | Cpu (3)             |
> | 16 |          INDEX RANGE SCAN   | RFFT_IX7                      |  
>  1068 |    1 |      59 |     +1 | 34820 |       3M |      |       |  
>   |       |     . |          |             |
> | 17 |       TABLE ACCESS BY INDEX ROWID BATCHED   | RTNI            
>              |       1 |    2 |      59 |     +1 |    3M |       3M |
>  371 |   3MB |     |       |     . |     7.58 | Cpu (5)             |
> | 18 |        INDEX RANGE SCAN   | RTNI_IX6                      |    
>   1 |    1 |      59 |     +1 |    3M |       3M |   49 | 392KB |    
> |       |     . |     4.55 | Cpu (3)             |
> =========================================================================================================================================================================================================================================
>
> And also regarding the SELECT query in the second post,  that also
> shows parallel for the full table scan RTNI but then table TAB1 is
> getting scanned using index TAB1_IX1 and seems to be happening in
> serial. Is it because of the order by clause? as because no function
> is used in the query here. Here also at times the query fetches
> 20-30million rows and there we are expecting the Index access of tab1
> to happen in parallel threads.
>
> The hint used in the SELECT query is as below.
>
> SELECT /*+parallel(4)*/ .....   DECODE (TAB1.etyp, 'XX', RTNI.SCD,  
> TAB1.DCD)    .....
>
> -----------------------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation                             | Name             |
> Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
> -----------------------------------------------------------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT                      |              |       |
>       | 33329 (100)|          |        |      |            |
> |   1 |  SORT GROUP BY                        |              |    16 |
>  3456 | 33329   (1)| 00:00:02 |        |      |            |
> |   2 |   NESTED LOOPS                        |              |  1798 |
>   379K| 33328   (1)| 00:00:02 |        |      |            |
> |   3 |    PX COORDINATOR                     |              |       |
>       |            |          |        |      |            |
> |   4 |     PX SEND QC (RANDOM)               | :TQ10000           |  
>  20 |  2980 |     8   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
> |   5 |      PX BLOCK ITERATOR                |              |    20 |
>  2980 |     8   (0)| 00:00:01 |  Q1,00 | PCWC |            |
> |   6 |       TABLE ACCESS STORAGE FULL       | RTNI             |  
>  20 |  2980 |     8   (0)| 00:00:01 |  Q1,00 | PCWP |            |
> |   7 |    TABLE ACCESS BY INDEX ROWID BATCHED| TAB1             |  
>  88 |  5896 |  6005   (1)| 00:00:01 |        |      |            |
> |   8 |     INDEX RANGE SCAN                  | TAB1_IX1           |  
> 370 |       |  5997   (1)| 00:00:01 |        |      |            |
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>
> On Sat, Jun 24, 2023 at 9:54 PM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
>
> Hi Pap,
>
> as far as I can see, only the "TABLE ACCESS STORAGE FULL " run in
> parallel.
> Before the nested loop joins the result set goes to the
> Coordinator process. That means the nested loops run serial.
> Actually that is ok, since the estimated number of rows are low. I
> see 100k rows. That is not much to go parallel on.
> As you can see, the is not much time spend in the nested loops.
>
> SQL Plan Monitoring Details (Plan Hash Value=655425048)
> ==========================================================================================================================
> | Id | Operation                     | Name              |  Rows  
> |   Rows   | Activity |
> | | |                               | (Estim) | (Actual) |   (%)    |
> ==========================================================================================================================
> |  0 | INSERT STATEMENT |                               |        
> |        2 |          |
> |  1 |   LOAD AS SELECT                                  |
> TAB1                          |         |        2 |    71.21
> |<--------- that is the important part where time is spend, the
> coordinator processdoes all this work
> | | |                               |         | |          |
> | | |                               |         | |          |
> |  2 |    OPTIMIZER STATISTICS GATHERING
> |                               |    8395 |        0 |          |
> |  3 |     HASH JOIN |                               |    8395
> |       3M |     3.03 |<--- 3M row to be inserted
> |  4 |      PX COORDINATOR |                              
> |         |    39093 |          |<---------- now we go serial
> |  5 |       PX SEND QC (RANDOM)                         |
> :TQ10000                      |   39428 |    39093 |          |
> |  6 |        PX BLOCK ITERATOR |                              
> |   39428 |    39093 |          |
> |  7 |         TABLE ACCESS STORAGE FULL                 | RDC |  
> 39428 |    39093 |          |
> |  8 |      NESTED LOOPS |                               |    8395
> |       3M |          |
> |  9 |       NESTED LOOPS |                               |   
> 100K |       3M |          |
> | 10 |        PX COORDINATOR |                              
> |         |    35512 |          |<-------    we are serial again
> | 11 |         PX SEND QC (RANDOM)                       |
> :TQ20000                      |   63082 |    35512 |          |
> | 12 |          PX BLOCK ITERATOR |                              
> |   63082 |    35512 |          |
> | 13 |           TABLE ACCESS STORAGE FULL               | RTFX
> |   63082 |    35512 |     3.03 |
> | 14 |        PARTITION LIST ITERATOR
> |                               |       2 |       3M |     1.52 |
> | 15 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED |
> RFFT                          |       2 |       3M |     9.09
> |<-------- Activity is neglectable, not much time spend here
> | | |                               |         | |          |
> | 16 |          INDEX RANGE SCAN                         |
> RFFT_IX7                      |    1068 |       3M |          |
> | 17 |       TABLE ACCESS BY INDEX ROWID BATCHED         | RTNI
> |       1 |       3M |     7.58 |<--------- Noct much time spend here
> | 18 |        INDEX RANGE SCAN                           |
> RTNI_IX6                      |       1 |       3M |     4.55 |
> ==========================================================================================================================
>
> The insert part is the important one for run time, And it is
> serial in both cases.
> You can also see the total runtime is almost only on the
> coordinator process.
>
> =======================================================
> |      Name      | Type  | Group# | Server# | Elapsed |
> |                |       |        |         | Time(s) |
> =======================================================
> | PX Coordinator | QC    |        |         | 67 | <----- this is
> where the time is spend
> |                |       |        | |         |
> |                |       |        | |         |
> | p000           | Set 1 |      1 |       1 | 0.00 |
> | p001           | Set 1 |      1 |       2 | 0.00 |
> | p002           | Set 1 |      1 |       3 | 0.00 |
> | p003           | Set 1 |      1 |       4 | 0.00 |
> | p000           | Set 1 |      2 |       1 | 0.22 |
> | p001           | Set 1 |      2 |       2 | 0.17 |
> | p002           | Set 1 |      2 |       3 | 0.13 |
> | p003           | Set 1 |      2 |       4 | 0.13 |
> =======================================================
>
> All serial steps are done by the coordinator process.
>
> Therefore that are two Questions to be answered
>
> _*Why is parallel 2 quicker than parallel 4?
> *In short it is not. The parallel 2 execution has less rows to
> insert (2M versus 3M). Two million rows are not so many that you
> could expect a big uplift from direct path insert.
>
> *Why is the Insert not done in parallel?*
> _Not sure. I do not see your hint , where it is placed. Try to
> write it direct after the append hint.
> There could be different reasons for part of the statement not
> going parallel. Sometimes the use of a function that is not
> parallel enables will prevent parallel from happening.
> Normally the plan note will give you information on parallel
> execution. You should display the plan online.
> (select from (dbmx_plan-display_cursor(...)); )
> In any case, if there are only a couple of thousand rows parallel
> makes not much difference.._
>
>
> _Thanks
>
> Lothar_
>
> _Am 24.06.2023 um 17:24 schrieb Pap:
>> Seeing another query with a similar pattern, though the query is
>> different and its simple "Select query". But in this case too
>> even in one place the query is executed with parallel-4 hint and
>> another with parallel-8 hint in the same database and same
>> objects. But the throughput seems more or less the same. So it
>> seems here also the index access is not happening in parallel too.
>>
>> Below is the sql monitor details :-
>>
>> https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765
>>
>> Not sure if it's a silly one, but I am wondering if my
>> understanding is wrong , when and how the "index access" can
>> happen in parallel fashion? Appreciate your guidance.
>>
>> Regards
>> Pap
>>
>> On Sat, Jun 24, 2023 at 8:13 PM Pap
>> <oracle.developer35_at_gmail.com> wrote:
>>
>> Hello Listers,
>> This Oracle database version 19C and its exadata. We have a
>> customer query in which one of the executions is happening in
>> parallel(2) and another execution is happening in parallel(4)
>> in two different places. The query is exactly the same , just
>> that the place in which it's getting used with PARALLEL(4)
>> it's also happening with the APPEND hint.
>>
>> People here are expecting the query which runs using
>> parallel(4) with Append should be faster as compared to the
>> other one (i.e. with just parallel(2)). But it's not
>> happening that way. Want to understand why?
>>
>> And also in this query the step no which is coming as top
>> contributor in ASH is table access RFFT using index access
>> path - RFFT_IX7. This is a big table partitioned on column
>> C_KEY.  Other tables are small tables. So just wondering if
>> the higher parallel threads were not helping the query
>> because the index access is happening using BATCHED index
>> range scan (i.e. TABLE ACCESS BY LOCAL INDEX ROWID BATCHED).
>> or are we misinterpreting the execution plan here and
>> something in the query is not making the index scan to happen
>> in parallel?
>>
>> Below is the sql monitor details of the two plans. Below  are
>> two sql monitors, in one case it has processed 2million vs
>> other 3million , however if we see the processing speed i.e.
>> ~67 seconds for 3million vs 38 seconds for 2million. The
>> throughput seems to be more or less the same only. The
>> Parallel-8+ Append seems to be comparatively slower though,
>> which is odd.
>>
>> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773
>>
>> Regards
>> Pap
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Jun 25 2023 - 09:49:38 CEST

Original text of this message