Re: Performance issue on query doing smart scan

From: Lok P <loknath.73_at_gmail.com>
Date: Mon, 1 Feb 2021 16:39:07 +0530
Message-ID: <CAKna9VZ7xG5HRc_qwVce_j63eac=M=B+JG1uodoCfFpcWqjOUw_at_mail.gmail.com>





Thank You Jonathan.

I have modified the table names before posting the query. The actual table/alias name are different. But yes I should have named those better as it's a bit confusing.

In regards to the below point for "Between VS >=, <" Operator, I am not able to fully understand though as input partition key values are bind values in both the cases , so Oracle has to get the exact value in runtime from the input binds. I tried executing a sample query on same table by doing "AND a.part_dt BETWEEN :1 AND :2" and "AND a.part_dt >= :1 AND a.part_dt<:2" and i am not seeing much difference in elapsed time/CPU though. Attached is the sql monitor from both the executions.

*"If you do this then Oracle may be able to discard the predicate (i.e. not waste CPU testing it) if it can deduce that every row in every relevant partition is going to satisfy the predicate. This is particularly interesting for EXADATA with compressed tables, since it MAY mean (and I can't check it) that Oracle won't have to decompress the column to check the value, and this MIGHT have a noticeable benefit for some of the compression options in terms of CPU*."

In regards to the compression, I am seeing the global stats showing the distinct values for AANUM as ~173million which may not be fully accurate considering we are not using incremental stats collection so NDV calculation might go south, but i tried manually seeing the distinct values for a few partitions for column AANUM and those are repeated in all of the partitions. So basically we have AVG ~250million rows in each partition and in those ~135million distinct aanum values , which comes around ~2 index entries for each AANUM and in that prospect the compression of the local index (AANUM,TXID) won't be helpful. And also the same set of AANUM are repeated in each of the partitions. Attached is the columns stats for 4 sample partitions.

And as I see from sample four partitions, out of the columns used as filters in the query the column O_ID seems quite unique , but the way it's used in the query(as a case statement in the right hand side) is not filtering much rows. And as you rightly said, the column d_ind is having two distinct values 'Y' and 'N' but it's having ~40% and 60% rows respectively distributed. So the index on this column is not going to help much. I am planning to first try an index on AANUM,DID and see the performance. Else possibly last option is to have a big index comprising all the columns ( aanum, did, d_ind, o_id, pnm. ) to avoid table read fully, but again considering this table is a big transactional table we may have to think twice before taking decision on creating new indexes.

Regards

Lok

On Fri, Jan 29, 2021 at 4:21 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

> *Actually the partition column I_DT is on TAB2 which is part of another
> with clause query but full scanning that table is not the problem here as
> we see from the sql monitor. It's the scanning of TAB1 which is partitioned
> on PART_DT is causing the issue. And we do have an explicit filter on this
> column for partition pruning, but anyway our requirement is to scan across
> all the ~180 partitions only.*
>
> Given that you have a table TAB1, which you've included in a CTE that
> you've aliased TAB1, which then gets an alias of "a" in the main query;
> with an equally bad appalli1ng n1aming convention involving TAB2 it's not
> entirely surprising that I managed to cross over the two names when working
> through your problem. Do you really have such bad coding standards?
>
> The point remains, however you said that you scanned all the partitions,
> and that the partitioning column was part_dt, and the SQL you sent us
> originally has the predicate "AND a.part_dt BETWEEN :1 AND :2".
> BETWEEN is a very bad option for a predicate on a partition key.A
> partition contains data that is strictly less than the upper boundary, and
> greater than or equal to the upper bound of the previous partitions, so the
> ideally range-based predicate should be of the form:
> part_dt >= {date1} and part_dt < {date2}
> If you do this than Oracle may be able to discard the predicate (i.e. not
> waste CPU testing it) if it can deduce that every row in every relevant
> partition is going to satisfy the predicate. This is particularly
> interesting for EXADATA with compressed tables, since it MAY mean (and I
> can't check it) that Oracle won't have to decompress the column to check
> the value, and this MIGHT have a noticeable benefit for some of the
> compression options in terms of CPU.
>
>
> Index compression and table compression have nothing to do with each

> other. Also the "much longer running" of the query when forced through
> (AANUM, TXID) isn't about the large number of distinct values of TXID, it's
> about the relatively small number of distinct values of AANUM or, to be
> more accurate, the large number of rows for each value of AANUM that are
> now being acquired by single block access.
>
> Your thoughts on compression do make sense as a general strategy for
> reducing storage space and increasing the probability of caching index leaf
> blocks. Since you have 173 million distinct value for AANUM in a table with
> 56 Billion rows then on average each value of AANUM has about 330 related
> rows and if this table were not partitioned you would automatically say
> that this is a good index for compression. BUT this is a local index on a
> table with 180 partitions, so each partition MIGHT only have one or two
> index entries for each value of AANUM in which case you probably wouldn't
> want to compress it; on the other hand perhaps the AANUM has values which
> correlate with time, so maybe any one value of AANUM appears in only about
> 10 partitions, in which case each partition would average 33 rows per AANUM
> and the index would be worth compressing.
>
> Regarding index columns:
>
>
>
> * WHERE a.aanum = stage.anum AND a.did = stage.did
> AND a.d_ind = 'Y' AND a.o_id LIKE CASE WHEN
> stage.al <http://stage.al> = 'XXXX' THEN stage.o_id ELSE a.o_id END
> AND a.part_dt BETWEEN :1 AND :2*
>
>
>
>
>
>
> * WHERE TAB1.anum(+) = stage.anum AND TAB1.did(+) =
> stage.did AND TAB2.anum(+) = stage.anum AND TAB2.did(+) =
> stage.did AND TAB1.pnm(+) = stage.pnm AND TAB2.pnm(+) =
> stage.pnm*
>
> I made two mistakes with my advice on (anum, did).
>
> First, I forgot that the final where clause was against CTEs, so I missed
> the need to have d_ind and o_id in the index if you wanted to avoid
> visiting the table unnecessarily; secondly I missed the TAB1.pnm predicate
> that was in the middle of your TAB2 predicates in the final where clause -
> you'd need to have this in the index as well to avoid visiting the table.
> Again there's a detail to check in your coding standards - the three TAB1
> predicates should have been together to avoid the risk of that mistake
> happening. (Just to add a positive note, I approve of the "next_table =
> current_table" arrangement of the predicates, some people would have
> written "stage.anum = TAB1.anum(+)")
>
> Two more thought on indexing - a predicate like d_ind='Y' suggests the
> column is a yes/no column.
> a) If almost all the data has d_ind='Y' then there may be no significant
> benefit in having that column in the index. If only a tiny fraction of
> the data has d_ind='Y' then if you can modify the query, you might be able
> to create a very small function-based index that would make this query much
> more efficient while adding very little to the workload.
> b) The list of relevant columns for the index has gone up to 5 for the
> "perfect" index - in no particular order: aanum, did, d_ind, o_id, pnm.
> If you know your data you may be able to decide that a combination of 4 or
> 3 of these columns is "good enough" to reduce the workload dramatically at
> query time while not adding too great a maintenance overhead.
>
> Finally a generic warning:
> *a.o_id LIKE CASE WHEN stage.al <http://stage.al> = 'XXXX' THEN
> stage.o_id ELSE a.o_id END *
>
> This is a variant of a far commoner construct which is erroneously written
> as the model for "give me everything unless the user requests specific
> rows", viz:
> *where colX = nvl(:bind_variable, colX)*
> (See: https://jonathanlewis.wordpress.com/2007/01/09/conditional-sql/ )
>
> The "LIKE" is sufficiently different that it's possible the predicate is
> doing exactly what it's supposed to do, but it's worth mentioning that if
> o_id is NULL then the predicate evaluates to false (even when stage.al =
> 'XXXX') and this may not be the intention.
>
>
> Regards
> Jonathan Lewis
>
>
>
>





--
http://www.freelists.org/webpage/oracle-l


  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet attachment: Column_Stats.xlsx
Received on Mon Feb 01 2021 - 12:09:07 CET

Original text of this message