Re: improve cardinality estimate
Date: Fri, 19 Mar 2021 09:59:02 +0200
Message-ID: <CA+riqSWpMFY9sUb+M-zos_JJ0hE22MsBHxup=xpHfetbBdxyMg_at_mail.gmail.com>
I`m curios why rewriting instead of a>=b to a-b>=0 helped to get closer to cardinality estimate
În joi, 18 mar. 2021 la 16:31, Jonathan Lewis <jlewisoracle_at_gmail.com> a scris:
>
> Are all these dates date-only, or is there a time component allowed ?
> Is this one partition per day.
>
> From what you've said, and assuming YES as the answer to the questions
> above, creating a virtual column on (order_processing_date -
> business_processing_date) looks like an obvious thing to do. And then
> create a (frequency) histogram on the virtual column so that the optimizer
> can see there are (usually?) only 3 distinct values and the one you're
> interested in is rare.
>
> Regards
> Jonathan Lewis
>
>
> On Thu, 18 Mar 2021 at 13:52, Laurentiu Oprea <laurentiu.oprea06_at_gmail.com>
> wrote:
>
>> Regarding parallel I`m relaying my affirmation of the fact that I tested
>> with hint /*+ dynamic_sampling(8) */ (the DS level with parallel hint) and
>> gives the same estimate.
>>
>> Yes there is a relation order_procesing_date between
>> business_procesing_date -1 and business_procesing_date + 1 and there is no
>> fooling like null values or year 1900/9999
>>
>> The pattern is constant , grouping by partition_date for the last 7 days
>> it ranges from 0 to 130k (out of a total of 24M) rows filtered by
>> condition order_procesing_date > = business_procesing_date .
>>
>> Also the columns are not virtual columns (strange expressions)
>>
>> În joi, 18 mar. 2021 la 15:14, Jonathan Lewis <jlewisoracle_at_gmail.com> a
>> scris:
>>
>>> The change for parallel processing is one I'd have to think about - it
>>> may be something to do with the way the predicates are rewritten for the
>>> parallel query slaves - so I'll postpone thinking about that bit.
>>>
>>> The questions to ask first of all are:
>>> a) what's the business logic. is there an approximate relationship
>>> between order_processing_date and business_processing date (e.g. the former
>>> is usually about 3 days before the latter).
>>> b) has the optimizer been fooled by a "silly null" effect - like "not
>>> yet processed" = "31-Dec-9999"
>>> c) how much does the pattern change over time - e.g. if the partition
>>> date is sysdate - 1 is this a partition where a lot of rows have not yet
>>> had business_processing_date set, but if partition date = sysdate - 7 then
>>> maybe lots of rows now have a business_processing_date set.
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>>
>>>
>>> On Thu, 18 Mar 2021 at 12:26, Laurentiu Oprea <
>>> laurentiu.oprea06_at_gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I recently saw a query with a wrong execution plan and looks to be
>>>> caused by an overestimate of a particular where clause. Ex:
>>>>
>>>> select
>>>> count(*)
>>>> from
>>>> orders
>>>> where
>>>> partition_date = sysdate-1 and
>>>> order_procesing_date > = business_procesing_date
>>>>
>>>> the count itself is = 42K
>>>> the count itself without second condition ~ 24M
>>>> the cardinality estimate ~ 24M
>>>> the cardinality estimate if I add parallel hint: 1M (this is an
>>>> interesting one and looks to be related with the fact that parallel queries
>>>> get DS level 8)
>>>> the cardinality estimate if I rewrite the condition :
>>>> order_procesing_date - business_procesing_date >= interval '0' minute
>>>> :1.4M
>>>>
>>>> I haven't tried yet but I suspect that if I create extended stats on
>>>> expression ( order_procesing_date - business_procesing_date) can get me
>>>> better results
>>>>
>>>> The question is what is the logic behind beter cardinality estimates if
>>>> I rewrite the where clause?
>>>> What will be the best approach for this query?
>>>>
>>>> Thank you.
>>>>
>>>
-- http://www.freelists.org/webpage/oracle-lReceived on Fri Mar 19 2021 - 08:59:02 CET