Re: Why index access is not happening in parallel
Date: Sun, 25 Jun 2023 01:47:01 +0530
Message-ID: <CAEjw_fi2fGd4AtySpnTNyXh-=2YyFdTrNthCkHD2QO80UG+zHw_at_mail.gmail.com>
Thank You So much Lothar.
I missed these critical things in the sql monitor. Thanks for the guidance.
I can see the majority of the time spent in both the INSERT and SELECT
query is in the PX-coordinator only. But the gv$active_session_history was
showing highest number of samples count at index access path step only. Not
sure why.
Actually the expectation behind the usage of higher parallelism was faster
processing in case of large data volume. and we do have 20-30million rows
fetched using that index access path and aggregated and in those cases the
serial execution is causing bottleneck.
Also the hints used in the Insert query are something as below. Do you see
anything wrong here, which is why INSERT is running in serial? No functions
used in the queries though. The Decode/case statements are used in the
SELECT part and that will not stop the parallelism i believe.
INSERT /*+APPEND*/ INTO TAB1 (...) SELECT /*+parallel(4)*/ .....,
And as mentioned, we have scenarios in which the below section of the plan
actually scans 20-30million rows(mainly from table RFFT) and there we see
slowness for the index access path (mostly because it's happening in
serial). And is it correct that , if the INSERT can be made parallel then
we would be able to make the INDEX Access of table RFFT also happen in
parallel?
| 10 | PX COORDINATOR |
| | | 59 | +1 | 5 | 35512 |
1 | 8192 | | | . | |
|
| 11 | PX SEND QC (RANDOM) | :TQ20000
| 63082 | 58 | 56 | +2 | 4 | 35512 |
| | | | . | |
|
| 12 | PX BLOCK ITERATOR |
| 63082 | 58 | 56 | +2 | 4 | 35512 |
| | | | . | |
|
| 13 | TABLE ACCESS STORAGE FULL | RTFX
| 63082 | 58 | 56 | +2 | 54 | 35512 |
60 | 6MB | | | . | 3.03 | cell multiblock physical
read (2) |
| 14 | PARTITION LIST ITERATOR |
| 2 | 8 | 59 | +1 | 35512 | 3M |
| | | | . | 1.52 | Cpu (1)
|
| 15 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | RFFT
| 2 | 8 | 59 | +1 | 34820 | 3M |
| | | | . | 9.09 | gc current block 3-way (3)
|
| | |
| | | | | | |
| | | | | | Cpu (3)
|
| 16 | INDEX RANGE SCAN | RFFT_IX7
| 1068 | 1 | 59 | +1 | 34820 | 3M |
| | | | . | |
|
| 17 | TABLE ACCESS BY INDEX ROWID BATCHED | RTNI
| 1 | 2 | 59 | +1 | 3M | 3M |
371 | 3MB | | | . | 7.58 | Cpu (5)
|
| 18 | INDEX RANGE SCAN | RTNI_IX6
| 1 | 1 | 59 | +1 | 3M | 3M |
49 | 392KB | | | . | 4.55 | Cpu (3)
|
=========================================================================================================================================================================================================================================
And also regarding the SELECT query in the second post, that also shows parallel for the full table scan RTNI but then table TAB1 is getting scanned using index TAB1_IX1 and seems to be happening in serial. Is it because of the order by clause? as because no function is used in the query here. Here also at times the query fetches 20-30million rows and there we are expecting the Index access of tab1 to happen in parallel threads.
The hint used in the SELECT query is as below.
SELECT /*+parallel(4)*/ ..... DECODE (TAB1.etyp, 'XX', RTNI.SCD, TAB1.DCD) ..... -
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
| 0 | SELECT STATEMENT | |
| | 33329 (100)| | | | |
| 1 | SORT GROUP BY | | 16
| 3456 | 33329 (1)| 00:00:02 | | | |
| 2 | NESTED LOOPS | | 1798
| 379K| 33328 (1)| 00:00:02 | | | |
| 3 | PX COORDINATOR | |
| | | | | | |
| 4 | PX SEND QC (RANDOM) | :TQ10000 | 20
| 2980 | 8 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
| 5 | PX BLOCK ITERATOR | | 20
| 2980 | 8 (0)| 00:00:01 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS STORAGE FULL | RTNI | 20
| 2980 | 8 (0)| 00:00:01 | Q1,00 | PCWP | |
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB1 | 88
| 5896 | 6005 (1)| 00:00:01 | | | |
| 8 | INDEX RANGE SCAN | TAB1_IX1 | 370
| | 5997 (1)| 00:00:01 | | | |
On Sat, Jun 24, 2023 at 9:54 PM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
> Hi Pap, > > as far as I can see, only the "TABLE ACCESS STORAGE FULL " run in > parallel. > Before the nested loop joins the result set goes to the Coordinator > process. That means the nested loops run serial. > Actually that is ok, since the estimated number of rows are low. I see > 100k rows. That is not much to go parallel on. > As you can see, the is not much time spend in the nested loops. > > SQL Plan Monitoring Details (Plan Hash Value=655425048) > > ========================================================================================================================== > | Id | Operation | > Name | Rows | Rows | Activity | > | | > | | (Estim) | (Actual) | (%) | > > ========================================================================================================================== > | 0 | INSERT STATEMENT > | | | 2 | | > | 1 | LOAD AS SELECT | > TAB1 | | 2 | 71.21 |<--------- > that is the important part where time is spend, the coordinator processdoes > all this work > | | > | | | | | > | | > | | | | | > | 2 | OPTIMIZER STATISTICS GATHERING > | | 8395 | 0 | | > | 3 | HASH JOIN > | | 8395 | 3M | 3.03 |<--- 3M > row to be inserted > | 4 | PX COORDINATOR > | | | 39093 | > |<---------- now we go serial > | 5 | PX SEND QC (RANDOM) | > :TQ10000 | 39428 | 39093 | | > | 6 | PX BLOCK ITERATOR > | | 39428 | 39093 | | > | 7 | TABLE ACCESS STORAGE FULL | > RDC | 39428 | 39093 | | > | 8 | NESTED LOOPS > | | 8395 | 3M | | > | 9 | NESTED LOOPS > | | 100K | 3M | | > | 10 | PX COORDINATOR > | | | 35512 | > |<------- we are serial again > | 11 | PX SEND QC (RANDOM) | > :TQ20000 | 63082 | 35512 | | > | 12 | PX BLOCK ITERATOR > | | 63082 | 35512 | | > | 13 | TABLE ACCESS STORAGE FULL | > RTFX | 63082 | 35512 | 3.03 | > | 14 | PARTITION LIST ITERATOR > | | 2 | 3M | 1.52 | > | 15 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | > RFFT | 2 | 3M | 9.09 |<-------- > Activity is neglectable, not much time spend here > | | > | | | | | > | 16 | INDEX RANGE SCAN | > RFFT_IX7 | 1068 | 3M | | > | 17 | TABLE ACCESS BY INDEX ROWID BATCHED | > RTNI | 1 | 3M | 7.58 |<--------- > Noct much time spend here > | 18 | INDEX RANGE SCAN | > RTNI_IX6 | 1 | 3M | 4.55 | > > ========================================================================================================================== > > The insert part is the important one for run time, And it is serial in > both cases. > You can also see the total runtime is almost only on the coordinator > process. > > ======================================================= > | Name | Type | Group# | Server# | Elapsed | > | | | | | Time(s) | > ======================================================= > | PX Coordinator | QC | | | 67 | <----- this is > where the time is spend > | | | | | | > | | | | | | > | p000 | Set 1 | 1 | 1 | 0.00 | > | p001 | Set 1 | 1 | 2 | 0.00 | > | p002 | Set 1 | 1 | 3 | 0.00 | > | p003 | Set 1 | 1 | 4 | 0.00 | > | p000 | Set 1 | 2 | 1 | 0.22 | > | p001 | Set 1 | 2 | 2 | 0.17 | > | p002 | Set 1 | 2 | 3 | 0.13 | > | p003 | Set 1 | 2 | 4 | 0.13 | > ======================================================= > > All serial steps are done by the coordinator process. > > Therefore that are two Questions to be answered > > > > > > *Why is parallel 2 quicker than parallel 4? In short it is not. The > parallel 2 execution has less rows to insert (2M versus 3M). Two million > rows are not so many that you could expect a big uplift from direct path > insert. Why is the Insert not done in parallel? *Not sure. I do not see > your hint , where it is placed. Try to write it direct after the append > hint. > There could be different reasons for part of the statement not going > parallel. Sometimes the use of a function that is not parallel enables will > prevent parallel from happening. > Normally the plan note will give you information on parallel execution. > You should display the plan online. > (select from (dbmx_plan-display_cursor(...)); ) > In any case, if there are only a couple of thousand rows parallel makes > not much difference.. > > > Thanks > > Lothar > > Am 24.06.2023 um 17:24 schrieb Pap: > > Seeing another query with a similar pattern, though the query is different > and its simple "Select query". But in this case too even in one place the > query is executed with parallel-4 hint and another with parallel-8 hint in > the same database and same objects. But the throughput seems more or less > the same. So it seems here also the index access is not happening in > parallel too. > > Below is the sql monitor details :- > > https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765 > > Not sure if it's a silly one, but I am wondering if my understanding is > wrong , when and how the "index access" can happen in parallel fashion? > Appreciate your guidance. > > Regards > Pap > > On Sat, Jun 24, 2023 at 8:13 PM Pap <oracle.developer35_at_gmail.com> wrote: > >> Hello Listers, >> This Oracle database version 19C and its exadata. We have a customer >> query in which one of the executions is happening in parallel(2) and >> another execution is happening in parallel(4) in two different places. The >> query is exactly the same , just that the place in which it's getting used >> with PARALLEL(4) it's also happening with the APPEND hint. >> >> People here are expecting the query which runs using parallel(4) with >> Append should be faster as compared to the other one (i.e. with just >> parallel(2)). But it's not happening that way. Want to understand why? >> >> And also in this query the step no which is coming as top contributor in >> ASH is table access RFFT using index access path - RFFT_IX7. This is a big >> table partitioned on column C_KEY. Other tables are small tables. So just >> wondering if the higher parallel threads were not helping the query because >> the index access is happening using BATCHED index range scan (i.e. TABLE >> ACCESS BY LOCAL INDEX ROWID BATCHED). or are we misinterpreting the >> execution plan here and something in the query is not making the index scan >> to happen in parallel? >> >> Below is the sql monitor details of the two plans. Below are two sql >> monitors, in one case it has processed 2million vs other 3million , however >> if we see the processing speed i.e. ~67 seconds for 3million vs 38 seconds >> for 2million. The throughput seems to be more or less the same only. The >> Parallel-8+ Append seems to be comparatively slower though, which is odd. >> >> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773 >> >> Regards >> Pap >> > >
-- http://www.freelists.org/webpage/oracle-lReceived on Sat Jun 24 2023 - 22:17:01 CEST