Re: Why index access is not happening in parallel

From: Pap <oracle.developer35_at_gmail.com>
Date: Fri, 7 Jul 2023 00:15:37 +0530
Message-ID: <CAEjw_fiZj1THe37kp3hDSA35N3-Cv_S87Fj04xu+BSpo86Ljig_at_mail.gmail.com>



For the SELECT query, the index access on table TAB1 returns ~56 million rows, of which only ~42K survive the filter that is applied to the result set after the index access:

filter("TAB1"."MDATE">="RTNI"."STIME" AND DECODE("TAB1"."etyp",'XX',"RTNI"."tkey","TAB1"."DCD"||'/'||"TAB1"."DCD")='XXXX')

So, is there any way to create an index that covers both of these predicates, to give faster access to TAB1?

(https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765)

(https://gist.github.com/oracle9999/8805315be86cc9b74c52609f73bb4e03)
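
For illustration, the kind of covering index being asked about could look like
the sketch below. The index name and column order are assumptions, and the
leading (eid, etyp) columns are guessed from the existing composite index; note
that the DECODE cannot be pre-computed in a TAB1-only index because one branch
references RTNI.tkey, so the idea is only to get the TAB1 columns used by the
filter into the index, so that rows can be rejected before the table is visited:

  -- hypothetical index; verify column names and order against the real plan
  CREATE INDEX tab1_flt_ix ON tab1 (eid, etyp, dcd, mdate);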

On Fri, 7 Jul, 2023, 12:00 am Pap, <oracle.developer35_at_gmail.com> wrote:

> Thank You so much Jonathan for the details. It really helped.
>
> Actually the INSERT and the SELECT queries are related. The INSERT query
> writes data into the GTT "TAB1", on which the SELECT runs at a later stage
> of the job.
>
> As you rightly pointed out regarding the INSERT query, the GTT TAB1 has only
> one composite index (eid, etyp) on it. So adding an "ORDER BY eid, etyp"
> clause at the end of the INSERT query will deliver the data in sorted order,
> which should help make the load faster, although I think the sort itself will
> add some overhead. Are you suggesting that, because the SQL monitor (as
> posted here
> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773)
> shows the INSERT spending ~70% of its time loading the data itself, it is
> fine to take on the additional burden of sorting that data in the query? I
> hope this understanding is correct.
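>
> For illustration, a minimal sketch of that change (only the ORDER BY is the
> point; the column list and SELECT body are placeholders, not the real query):
>
>   INSERT INTO tab1 (eid, etyp, dcd, mdate)
>   SELECT f.eid, f.etyp, f.dcd, f.mdate
>   FROM   rfft f
>   ORDER BY f.eid, f.etyp;  -- match the composite index (eid, etyp) on TAB1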
>
> I have yet to see whether the table "RFFT" can be accessed more efficiently
> than it is now, where index RFFT_IX7 is used to read millions of rows. But as
> the SQL monitor shows, the majority (~70%) of the time is being spent on the
> INSERT part of the query itself, so I am thinking of addressing that part
> first. Correct me if I am wrong.
>
> Regarding the "workload skew" you pointed out, which is evident from the
> parallel slave statistics section: in all the SQL monitor reports (a sample
> is here
> https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765),
> 50% of the slaves are doing more work than the other half. Is that not
> because the driving table RTFX is a global temporary table, so that the
> access to the heap table "RFFT" is not happening in parallel? Isn't it the
> global temporary table restriction that Stefan Koehler pointed out in the
> other thread?
>
> Regarding the point "You might be able to eliminate the workload skew if you
> recreate RTFX (which appears to be a very small table) with a large PCTFREE
> (e.g. 90%)": as I mentioned, table RTFX is a global temporary table, so I am
> not sure whether a large PCTFREE (like 90) will still help us here?
>
>
> On Wed, 5 Jul, 2023, 1:07 am Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> Why would it do the insert in parallel when the notes (and other readers)
>> have already told you that PDML will have been disabled because of the
>> indexes on GTTs?
>> The only point I was addressing in my comment was the one that hadn't
>> been raised by anyone else - the side effect of direct path loads and index
>> maintenance.
>>
>> Since you've posted a new page to github I have looked at it - and any
>> further comment is based only on that page and not on any previous material
>> you may have posted.
>>
>> a) the SELECT: the "Rows (Actual)" column tells you that the nested loop
>> found 56 million index entries (operation 8), which means you visited the
>> table 56M times (operation 7), where "Rows (Actual)" is 42,428 - so you
>> spent most of your time reading data that you then threw away. The index is
>> not a suitable index for that query so you need to find out why not: did
>> Oracle pick the wrong index, or is this the best index but in need of an
>> extra column or two, possibly rearranging the order, so that you can do a
>> smaller range scan and visit the table far fewer times?
>>
>> b) the INSERT: processes approximately 3M rows, and as I've said, this
>> may be sorting and merging to add index entries, so you need to review what
>> indexes you have on the inserted table. If you have only one index
>> then POSSIBLY an insert with an ORDER BY clause that matches the index
>> definition might make the insert and subsequent queries faster. Most of
>> the elapsed time, though, may be due to a bad choice of execution plan -
>> there's a massive discrepancy between the estimated and the actual row
>> counts; add to this that the Set 1 parallel servers show a distinct skew in
>> the amount of data processed, p00t and p00v seem to do most of the work,
>> and doing 3M index and table probes will be expensive. You might be able to
>> eliminate the workload skew if you recreate RTFX (which appears to be a
>> very small table) with a large PCTFREE (e.g. 90%). You should also see what
>> you can do by changing that part of the plan (operations 5,6, and 12-18) -
>> different table ordering, or different join methods may help (hash joins
>> look more appropriate, but I don't know your data so I can't really
>> comment).
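>>
>> As an illustration only (untested, and the join column is invented), the
>> "different join methods" idea expressed as hints might look like:
>>
>>   SELECT /*+ leading(x f) use_hash(f) full(f) */
>>          f.eid, f.etyp
>>   FROM   rtfx x
>>   JOIN   rfft f ON f.eid = x.eid;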
>>
>> On a totally separate train of thought - and one I'm not going to test,
>> I'm not even sure it would be legal - could you write a parallel-enabled
>> pipelined function that returns the data set that uses the GTT with the
>> indexes (hence hiding the GTTs) and write a query that joins the function
>> to the rest of the tables in the query?
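>>
>> A very rough sketch of that idea (all names are hypothetical and untested;
>> the point is just the PIPELINED + PARALLEL_ENABLE declaration over a ref
>> cursor parameter):
>>
>>   CREATE OR REPLACE TYPE tab1_row_t AS OBJECT (eid NUMBER, etyp VARCHAR2(10));
>>   /
>>   CREATE OR REPLACE TYPE tab1_tab_t AS TABLE OF tab1_row_t;
>>   /
>>   CREATE OR REPLACE FUNCTION tab1_rows (p_cur SYS_REFCURSOR)
>>     RETURN tab1_tab_t PIPELINED
>>     PARALLEL_ENABLE (PARTITION p_cur BY ANY)
>>   IS
>>     v_eid  NUMBER;
>>     v_etyp VARCHAR2(10);
>>   BEGIN
>>     LOOP
>>       FETCH p_cur INTO v_eid, v_etyp;
>>       EXIT WHEN p_cur%NOTFOUND;
>>       PIPE ROW (tab1_row_t(v_eid, v_etyp));
>>     END LOOP;
>>     RETURN;
>>   END;
>>   /
>>
>>   -- then join TABLE(tab1_rows(CURSOR(SELECT eid, etyp FROM tab1))) to the
>>   -- rest of the tables instead of referencing the GTT directly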
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> On Sat, 1 Jul 2023 at 12:04, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Recaptured the plans for both the INSERT and the SELECT again, as below.
>>> After using enable_parallel_dml the conventional insert did change to 'LOAD
>>> AS SELECT', which means it is doing a direct path load; however, as the Note
>>> section shows, it is still not doing the write/insert in parallel, and the
>>> majority of time and resources is spent while writing/inserting the records.
>>>
>>> https://gist.github.com/oracle9999/8805315be86cc9b74c52609f73bb4e03
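>>>
>>> For reference, the general shape of the statement being discussed (the hint
>>> names are the point; the degree, column list and SELECT are placeholders,
>>> the real query is in the gist):
>>>
>>>   INSERT /*+ enable_parallel_dml append parallel(8) */ INTO tab1 (eid, etyp)
>>>   SELECT f.eid, f.etyp
>>>   FROM   rfft f;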
>>>
>>> On Sat, Jul 1, 2023 at 3:56 PM Pap <oracle.developer35_at_gmail.com> wrote:
>>>
>>>> Thank you Jonathan.
>>>>
>>>> Actually I posted two queries in this thread earlier (as below): one was an
>>>> INSERT and the other a SELECT. Lothar pointed out that in both cases the
>>>> execution time was spent in the query coordinator section, and that the
>>>> complete execution path of the query was not happening in parallel - some
>>>> parts were being serialized.
>>>>
>>>> https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765
>>>> https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773
>>>>
>>>> In the case of the INSERT query, as you can see, the parallel execution was
>>>> not happening initially because the enable_parallel_dml hint was missing.
>>>> But after applying that hint I got the message below in the Note section of
>>>> the plan, which exposes the restriction that GTTs have in Oracle.
>>>>
>>>> Note
>>>> -----
>>>>    - PDML disabled because temporary tables with indexes used
>>>>    - parallel query server generated this plan using optimizer hints
>>>>      from coordinator
>>>>
>>>>
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 06 2023 - 20:45:37 CEST
