Re: Why index access is not happening in parallel

From: Pap <oracle.developer35_at_gmail.com>
Date: Fri, 7 Jul 2023 00:00:34 +0530
Message-ID: <CAEjw_fiJstv8rBKhuOpKSD9JbEO2o-rZ3VYoZNii2p-pcSgCYQ_at_mail.gmail.com>



Thank You so much Jonathan for the details. It really helped.

Actually the INSERT and the SELECT queries are both related. The INSERT query is writing data to GTT - "TAB1" on which the SELECT runs at later stage of the job.

As you rightly pointed , Regarding the INSERT query , the GTT - TAB1 has only one composite index(eid,etyp) on it. So adding an "order by eid,etyp" clause at the end of the INSERT query will make the data sorted. So hope it will help making the load faster. However i think it will add some overhead to the sorting of those data. But are you suggesting that because the sql monitor (as posted here
https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773) shows the "INSERT" spending ~70% of the time while loading data itself. and thus its fine to take the additional burden of sorting those data manually in the query itself? Hope this understanding is correct.

I am yet to see if the table "RFFT" can be accessed efficiently rather the way its happening now using index- RFFT_IX7 for reading millions of rows. But as the sql monitor shows, the majority(~70%) of the time is getting spend on the INSERT part of the query itself. So I am thinking to addres that part of the query first. Correct me if wrong.

As you pointed about the "workload skew" evident from the parallel slave statistics section. In all the sql monitor (sample one is here https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765) , its showing 50% of the slaves doing more work as compared to other half. Is it not because table RTFX which is driving and is a global temporary table and thus the access to the heap table "RFFT" is not happening in parallel. Isn't its
because of the related global temporary table restriction which "Stefan koehler" pointed in other thread?

Regarding the point "You might be able to eliminate the workload skew if you recreate RTFX (which appears to be a very small table) with a large PCTFREE (e.g. 90%)", But as i mentioned , table RTFX is a global temporary table. So not sutre of large "PCTFREE like 90" will still help us here?

On Wed, 5 Jul, 2023, 1:07 am Jonathan Lewis, <jlewisoracle_at_gmail.com> wrote:

>
> Why would it do the insert in parallel when the notes (and other readers)
> have already told you that PDML will have been disabled because of the
> indexes on GTTs?
> The only point I was addressing in my comment was the one that hadn't been
> raised by anyone else - the side effect of direct path loads and index
> maintenance.
>
> Since you've posted a new page to github I have looked at it - and any
> further comment is based only on that page and not on any previous material
> you may have posted.
>
> a) the SELECT: the "Rows (Actual)" column tells you that the nested loop
> found 56 million index entries (operation 8), which means you visited the
> table 56M times (operation 7), where "Rows (Actual)" is 42,428 - so you
> spent most of your time reading data that you then threw away. The index is
> not a suitable index for that query so you need to find out why not: did
> Oracle pick the wrong index, or is this the best index but in need of an
> extra column or two, possibly rearranging the order, so that you can do a
> smaller range scan and visit the table far fewer times.
>
> b) the INSERT: processes approximately 3M rows, and as I've said, this may
> be sorting and merging to add index entries, so you need to review what
> indexes you have on the the inserted table. If you have only one index
> then POSSIBLY an insert with an ORDER BY clause that matches the index
> definition might make the insert and subsequence queries faster. Most of
> the elapsed time, though, may be due to a bad choice of execution plan -
> there's a massive discrepancy between the estimated and the actual row
> counts; add to this that the Set 1 parallel servers show a distinct skew in
> the amount of data processed, p00t and p00v seem to do most of the work,
> and doing 3M index and table probes will be expensive. You might be able to
> eliminate the workload skew if you recreate RTFX (which appears to be a
> very small table) with a large PCTFREE (e.g. 90%) You should also see what
> you can do by changing that part of the plan (operations 5,6, and 12-18) -
> different table ordering, or different join methods may help (hash joins
> look more appropriate, but I don't know your data so I can't really
> comment).
>
> On a totally separate train of thought - and one I'm not going to test,
> I'm not even sure it would be legal - could you write a parallel-enabled
> pipelined function that returns the data set that uses the GTT with the
> indexes (hence hiding the GTTs) and write a query that joins the function
> to the rest of the tables in the query.
>
>
> Regards
> Jonathan Lewis
>
>
>
> On Sat, 1 Jul 2023 at 12:04, Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Recaptured the plan for both the INSERT and SELECT again as below and
>> after using Enable_parallel_dml the conventional insert does changed to
>> 'Load as select' which means its doing direct path load, however as the
>> note section showing , it's still not doing the write/Insert in parallel.
>> and Majority of time and resources get spent while writing/Inserting
>> records only.
>>
>> https://gist.github.com/oracle9999/8805315be86cc9b74c52609f73bb4e03
>>
>> On Sat, Jul 1, 2023 at 3:56 PM Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Thank you Jonathan.
>>>
>>> Actually i posted two queries in this thread earlier (as below) one was
>>> INSERT and Other was SELECT. Lothar pointed out in both of the cases the
>>> execution time was spent in the query coordinator section and how the
>>> complete execution path of the query was not happening in parallel. Some
>>> parts were being serialized.
>>>
>>> https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765
>>> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773
>>>
>>> In case of INSERT query as you see, the parallel execution was not
>>> happening initially and it was enable_parallel_dml which was missing. But
>>> after applying that hint, I got the below message in the Note section of
>>> the query , which exposes the restriction which GTT is having in oracle.
>>>
>>> Note
>>> -----
>>> * - PDML disabled because temporary tables with indexes used*
>>> - parallel query server generated this plan using optimizer hints
>>> from coordinator
>>>
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 06 2023 - 20:30:34 CEST

Original text of this message