Re: Performance issue on query doing smart scan

From: Lok P <loknath.73_at_gmail.com>
Date: Wed, 27 Jan 2021 01:07:58 +0530
Message-ID: <CAKna9VaJ_AiBRBYZtipDWeJes=qA3_FFJb1gRc=pY_Ts38G3+Q_at_mail.gmail.com>



Thank You Jonathan.

Actually the partition column I_DT is on TAB2 which is part of another with clause query but full scanning that table is not the problem here as we see from the sql monitor. It's the scanning of TAB1 which is partitioned on PART_DT is causing the issue. And we do have an explicit filter on this column for partition pruning, but anyway our requirement is to scan across all the ~180 partitions only.

Hope i haven't misstated here, With respect to index, currently we don't have any local index covering both the column AANUM and DID , but we do have two separate indexes on related columns. One on column (AANUM, TXID) and another one on (DID, TXID, SID) . Also in this table out of total ~56billion rows, the column TXID is most distinct(NUM_DISTINCT is ~20Billion) here but that can't be used as a filter/join in the query as per business logic.Column AANUM is having NUM_DISTINCT ~173million , column DID is having NUM_DISTINCT as ~333K and column SID having NUM_DISTINCT as ~1.6Million.

Considering we have all these indexes are local and a filter present on the partition key PART_DT which is again truncated date column, so i believe we don't really require to include partition key column PART_DT in the index definition. Even in the query , it has another filter on column " o_id " which is having an index on it and having num_distinct as ~261million, but from the sql monitor it looks its not filtering out the data much effectively as the AANUM,DID does in the first place.

Onething i found though by just forcing the index which is on column (AANUM, TXID) is making the query run longer, may be because of the very high distinct value of second column in the index "TXID", so thinking if it's really worth, to compress all those index partitions on 1, i.e. AANUM and will make that index effective and thus this query can be benefited with that index scan without the need of creating a new index? But we already have the ~150 historical table partitions compressed out of total ~180 partitions, in this case, so not sure , if it's already giving us the index compression benefit.

And Jonathan when you said "*The key detail is that you don't want Oracle to visit the table unless it absolutely has to*", are you pointing by someway(may be by INDEX_JOIN hints..etc) just fetch the rowid or minimum columns in first step and then fetch the other column details in another hit(wrapper query) by visiting the table data from the tables using those fetched rowid?

FROM TAB1 a, stage

WHERE     a.aanum = stage.anum    AND a.did = stage.did      AND a.d_ind = 'Y'
AND a.o_id LIKE     CASE   WHEN stage.al = 'XXXX' THEN stage.o_id
ELSE a.o_id END
AND a.part_dt BETWEEN :1 AND :2

Regards
Lok

On Mon, Jan 25, 2021 at 5:06 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
>
>
> *That last observation means that if the inputs and data sizes for this
> query are typical then a nested loop join using a perfect index into the
> partitioned table might have to decompress 415 (query high) to find the
> 415 relevant rows - which would be a tiny amount of I/O and CPU compared to
> the current load . It's not quite that nice, though, because with the best
> local index it looks as if you'd have to probe all 181 partitions of the
> index for each of the 34 driving rows (and you might decide that you don't
> want to create the index). *
>
> It's very much up to you to work out what you consider to be the perfect
> index - but from what you've said so far on this thread the combination of
> predicates you've indicated and your preference for local indexes does
> suggest a locial index on (anum,did) or (dd, anum). I note, though, that
> the original statement you sent us included a predicate "i_dt between :1
> and :2", which also telling us that the table was partitioned on pt_dt but
> no partition elimination was occurring. Is "i_dt" actually the partitioning
> "pt_dt" or is is another column that might need to be in the index.
>
> I suggest you extract a few partitions of data from your table and do a
> few experiments to give you some idea of how Oracle will use the index
> you've suggested with the code you've supplied. The key detail is that you
> don't want Oracle to visit the table unless it absolutely has to.
>
> * - makes me wonder whether you've been caught in the
> "double-decompression trap some of the time, meaning the decompressed data
> to be sent back to the server it larger than the 1MB limit so the CU is
> decompressed at the cell, then sent compressed anyway and decompressed
> again at the datbase. The session stats would give you a clue about that. *
>
> I'd have to check one of the Exadata books (Kerry Osborne et. al.) for
> your version of Oracle on Exadata, but I think the following stats
> highlight the symptom:
>
> cell CUs sent
> compressed
> 866,940.00
> cell physical IO bytes sent directly to DB node to balance CPU
> 15,739,502,952.00
>
> I'm not sure how Oracle is counting (or possibly double counting) because
> I would have expected query high CUs to be close to 32KB each so sending
> 866,940 of them to the database server for decompression should have meant
> something like 26GB sent to DB node; however there is some consistency
> visible in other numbers viz:
>
> EHCC Turbo Scan CUs Decompressed 1,212,123.00 =
> cell CUs sent compressed 866,940.00 + cell CUs sent uncompressed
> 345,183.00
>
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
>
> Regards
> Jonathan Lewis
>
>
>
> On Sat, 23 Jan 2021 at 15:00, Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank You Jonathan.
>>
> Regarding the index , as we are really avoiding any global index because
> of maintenance issues and we have to scan all the partitions data, so are
> you pointing towards trying a composite local index on ANUM, DID ?
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 26 2021 - 20:37:58 CET

Original text of this message