Re: Plan Baseline

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Tue, 10 Aug 2021 11:20:03 +0100
Message-ID: <CAGtsp8nS7cu9HJPN2duEcxj8m569CYc9Z-2mtXsw6BvWvnyLkg_at_mail.gmail.com>



The first problem is that you're running 12.1.0.2 which gives you plenty of scope for exposing optimizer gaps.
In particular the one that leap to mind is the fact that you have several MERGE() hints in the outline - and there's an enhancement in 12.2.0.1 which deals with the problem that SPB's that contain MERGE() hints don't always reproduce the same plan because they don't specify what they merge into, only what they merge from. The unambiguous 12.2. syntax looks like: merge(_at_qb1 > qb2), e.g. merge(_at_SEL$12 > SEL$641071AC). So it's possible that with some change in estimated numbers the optimizer has merged query blocks in a different order - getting to the same final query block - and failed to reproduce the plan.

The second thing that stand out in your good/bad plan txt is that one plan has an estimate of 8300 rows, the other has an estimate of 6.7M rows. Tracking back to source this is because there an estimate of 10 rows for f_in_list_num in one plan and 8,168 (the default collection size) in the other.
This is a little surprising since there's a cardinality(10) hint in the text on the SQLT output. (I would change that from a table hint to a query block hint, possibly using opt_estimate() to be up to date; and I'd also bring the syntax up to date the "THE" is a very old requirement. It's odd that the hint has been obeyed in one case and not the other. Since baselines include the hint "IGNORE_OPTIM_EMBEDDED_HINTS" you would have expected the plan that followed the baseline to be using 8168, and the plan that bypassed the baseline to use 10.)

This is actually busy sql and running 10053 trace may cause spikes in application in production. Correct me if I am wrong , I believe even though if we enable 10053 trace (alter system set events 'trace[rdbms.SQL_Optimizer.*][sql:gn1072x46d6sb]';) , we need to wait for the sql to re-parse with the bad plan.

This should only dump the CBO trace file on a hard parse - how long does it take to parse the statement, and how often are you doing it. How much extra time would a hard parse take compared to the time taken and resources wasted by running the wrong plan. (Yes, you would have to wait, but you could flush a single cursor from the shared pool to force a hard parse: https://jonathanlewis.wordpress.com/2019/12/20/purge_cursor/ )

Without getting the production CBO trace you're not likely to find why the baseline is being bypassed - I assume you've checked that the baseline is still accepted etc. in the data dictionary. What,exactly, do you mean by saying the baseline is "pinned", by the way.

Regards
Jonathan Lewis

On Mon, 9 Aug 2021 at 19:56, manikandan <pvmanikandan_at_gmail.com> wrote:

> Hi Jonathan,
>
>
>
> Thanks for the reply. The patch was not related to CBO (24764744: MORE
> THAN ONE CURRENT BUFFER FOR A BLOCK IN EXTREAMLY RARE SCENARIO).
>
> This is actually busy sql and running 10053 trace may cause spikes in
> application in production. Correct me if I am wrong , I believe even though
> if we enable 10053 trace (alter system set events
> 'trace[rdbms.SQL_Optimizer.*][sql:gn1072x46d6sb]';) , we need to wait for
> the sql to re-parse with the bad plan.
>
>
>
> We are not facing this behaviour in our performance testing environment
> which is prod like environment.
>
>
>
> Little background on the query
>
>
>
> We had a query which was picking bad execution plan due to de-correlated
> lateral (DCL) views and causing potential performance issues to the
> application. As per Oracle recommendation, we have added optimizer hint
> opt_param('_optimizer_ansi_join_lateral_enhance', 'false') to the query to
> skip DCL view. After the change, we see that the sql id is picking bad
> plans along with good plans (3975766313 - SPB created after putting hint)
> for few executions.
>
>
>
> After the DB bounce, good PHV is 2563251471 and we pinned SPB for this PHV
> as sql was not picking the SPB which we created earlier(on 3975766313).
>
>
>
> I have attached good plans, bad plans , sql stats and 10053 trace from
> SQLT (which is picking SPB) for your reference.
>
>
>
> One thing I noted is
>
>
>
> From Good Plan:-
>
>
>
> Note
>
> -----
>
> - SQL plan baseline SQL_PLAN_673798mksxnxm66e436ab used for this
> statement
>
> - Warning: basic plan statistics not available. These are only
> collected when:
>
> * hint 'gather_plan_statistics' is used for the statement or
>
> * parameter 'statistics_level' is set to 'ALL', at session or
> system level
>
>
>
> From Bad Plan:-
>
>
>
> Note
>
> -----
>
> - statistics feedback used for this statement
>
> - Warning: basic plan statistics not available. These are only
> collected when:
>
> * hint 'gather_plan_statistics' is used for the statement or
>
> * parameter 'statistics_level' is set to 'ALL', at session or
> system level
>
>
>
> Do you think statistics feedback playing a role here and causing this
> behaviour?
>
>
>
> Thanks,
>
> Mani
>
> On Mon, Aug 9, 2021 at 7:50 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> Mani,
>>
>> A couiple of thoughts:
>>
>> 1: What was the patch for? If it was something to do with wrong results,
>> or CBO picking bad plan, then maybe the patch has blocked some
>> transformation that the SPB depended on,
>>
>> 2: if you can enable the 10053 (CBO) trace and force re-optimization of
>> the query (taking care to ensure that it's being called in exactly the way
>> it gets called in producton) you may find some clue in the trace file
>> explaining by the SPB has not been used.
>>
>>
>> Regards
>> Jonthan Lewis
>>
>>
>>
>>
>> On Mon, 9 Aug 2021 at 07:22, manikandan <pvmanikandan_at_gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have implemented SPB for one query with a good plan. Post
>>> implementation CBO was able to use the baseline without issue. We went for
>>> DB bounce after that due to one-off patch maintenance and after the DB
>>> bounce CBO is unable to use the BASELINE anymore and has created a new
>>> plan. We have tried flushing the cursor multiple time but of no use. How we
>>> can force the SQL to use the baseline? What would be the cause of query not
>>> being used sql plan baseline post DB bounce?. Database version is 12.1.0.2
>>> + RAC 5 nodes + AIX.
>>>
>>> Thanks,
>>> Mani
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Aug 10 2021 - 12:20:03 CEST

Original text of this message