Re: Fixing Performance issue with less selective columns

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Thu, 26 Aug 2021 12:03:22 +0100
Message-ID: <CAGtsp8n-mhxZei0Ziio1tyoULDK0cmXVg9k6Lh7y+z61TCXSwA_at_mail.gmail.com>



Did you find this query as a result of looking at the node 1 load that prompted your question about a query on node 2 running 2 or 3 times as slowly?

Parallelism would be a bad idea - on (a full rack) Exadata a tablescan is effectively parallel 14 anyway because all the cell servers will work concurrently on the tablescan. Added to which you've reported the time as a couple of seconds, and Oracle once published a guideline that there's no point in using parallelism for queries that take less than 6 seconds to complete (because of the startup and coordination overheads).

The SQL Monitor report shows you doing 43,000 read requests to scan 45GB of data - and even with 9.99% offload that's a very large number of requests and a huge load if you want to do it a large number of times, concurrently.

You report roughly 330K distinct values of M_TXT out of 111M rows - that's an average of 333 rows per value. In the worst case that would be 333 real physical read requests to the table - which should perform just as well as 43,000 optimised requests but (more significantly) shouldn't overload your system when several users try to do it concurrently. So, as a first step you should do a select count(distinct trim(m_txt)) to see if the trim() changes the estimates significantly, and if it still looks good then create an index on trim(m_txt).
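In sketch form (the table and index names below are placeholders, not taken from the thread):

```sql
-- Step 1: check whether trim() changes the number of distinct values significantly
SELECT COUNT(DISTINCT TRIM(m_txt)) FROM big_table;

-- Step 2: if the distribution still looks good, index the trimmed expression
CREATE INDEX big_table_trim_mtxt_ix ON big_table (TRIM(m_txt));
```

The query predicate would then need to reference trim(m_txt) (not bare m_txt) for this function-based index to be a candidate access path.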

If you want to reduce contention on index access you could globally hash partition the index on trim(m_txt) - I'd have to check that you could do that in 11.2.0.4 so you've got a little testing to do. However, you might want to start the index with one or two of the other columns with small numbers of distinct values and create the index with compression if that guaranteeably reduces the number of visits to the table by a factor of 2 to 20.
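As a sketch of the hash-partitioned variant (names and partition count are illustrative only, and - as noted above - whether a global hash-partitioned index on an expression is legal in 11.2.0.4 needs testing):

```sql
-- Spread index hot spots across partitions; the partitioning key of a
-- global partitioned index must be a left prefix of the index columns.
CREATE INDEX big_table_trim_mtxt_ix ON big_table (TRIM(m_txt))
GLOBAL PARTITION BY HASH (TRIM(m_txt)) PARTITIONS 16;
```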

The index (even if it were just on trim(m_txt)) wouldn't help you with your substr(trim(m_txt),2,20) predicate unless your access path could use an index fast full scan (or possibly an index full scan). However, if there were a couple of leading columns with equality predicates it might be sufficient to make the optimizer do an index range scan on the leading columns followed by a filter at the index on that predicate.
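For illustration only (dc_code standing in as the hypothetical low-cardinality leading column; names and the COMPRESS prefix are assumptions):

```sql
CREATE INDEX big_table_mtxt_ix ON big_table (dc_code, TRIM(m_txt)) COMPRESS 1;

-- An equality on the leading column can drive an index range scan, with the
-- substr() predicate applied as a filter at the index before visiting the table:
SELECT *
FROM   big_table
WHERE  dc_code = :b4
AND    SUBSTR(TRIM(m_txt), 2, 20) = :b8;
```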

Regards
Jonathan Lewis

On Wed, 25 Aug 2021 at 20:26, Lok P <loknath.73_at_gmail.com> wrote:

> Thank You Andy. Actually, as I stated earlier, this query executes in a
> few seconds only, but it's the higher number of executions (thousands+) for
> different binds which adds up to the overall long run time of the process,
> and also most of the runs return zero rows. I tried putting
> the query in a loop for different binds and executed it with parallel (i 4)
> and without parallel, and it seems it's taking longer with the parallel hints.
> It may be because of the overhead of maintaining and aggregating those
> additional parallel slaves etc.
>
> *"Is :b4:= 'A' representative of your typical query here?" *Apology for
> the confusion; I was trying to camouflage the actual bind values
> and that made it confusing. It's actually coming as 'D', i.e. having
> ~104 million matching rows in that table, so indexing that won't help here
> either.
>
> The column M_TXT is a VARCHAR2(100 byte) column holding string values,
> many of which have spaces at the start/end, so the trim function
> seems necessary. This filter alone reduces the result set to ZERO for this
> query, and it has the highest NUM_DISTINCT values too, so it seems a good
> candidate here for an index. With regards to
> substr(:B8,.50), I will confirm and try to get it removed as that doesn't
> seem to serve any purpose but somehow is there in this legacy code. We
> should be able to simply remove that SUBSTR function without any issue. But
> irrespective of that SUBSTR function on the right hand side of the
> predicate, are you pointing towards creating a function based index on
> trim(M_TXT) at the left hand side to help this query? Actually, I saw in
> many other places this column is utilized in the predicates section like
> *substr(trim(m_txt),2,20)*=decode(col1, 'YYY', substr('XXXXX',1,20)). So
> is it possible to have just one index to cater to both scenarios, or can we
> tweak the query some way so as to utilize the same index for all the queries
> on this string column?
>
>
> On Thu, Aug 26, 2021 at 12:12 AM Andy Sayer <andysayer_at_gmail.com> wrote:
>
>> Hi Lok,
>>
>> " If there exists any other way to make this query faster without
>> creating any new index that would really be helpful."
>> You can use parallelism and have the scan completed by more processes at
>> once.
>>
>> Is :b4:= 'A' representative of your typical query here? The results
>> you've shared suggest this will return 0 rows, however there aren't a lot
>> of distinct values for DC_CODE and you don't have any statistics. If the
>> argument is frequently used then an index and a frequency histogram on this
>> column would be lucrative. It's a bit surprising you have so many
>> histograms on this table but this column's statistics didn't appear.
>>
>> The filter on M_TXT also would provide decent selectivity, I would
>> question whether the trim is really needed. The substr 0.5 is a huge red
>> flag.
>>
>> Thanks,
>> Andrew
>>
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Aug 26 2021 - 13:03:22 CEST