Re: Production and Test Environment Queries Behaving Differently

From: Andy Sayer <andysayer_at_gmail.com>
Date: Tue, 19 Sep 2017 00:05:11 +0100
Message-ID: <CACj1VR5rY2QeiYQRQtNw+8rAH3WBe9rRq4NeNiF4k5PrxEiVSg_at_mail.gmail.com>



Hi Ian,

Your view is masking the full query, it would be worth getting the full DDL for the view so you can see exactly what you're working with. The string-date is obviously a worry, but if approval_date is really a date then your string would be implicitly converted to a date (using your nls_date_format) and it shouldn't really effect you. Of course, it is a terrible thing to do, and you should never rely on implicit conversion, but it's not your problem here.

Thinking out loud:

The "good" execution was parsed with user id 64 and ended up with a plan that starts with hash joins and eventually uses a nonnested subquery filter to go from 336109 rows to 275786 rows. You then do some aggregation and filtering based on the result of that and end up with 5344 rows. The "bad" execution was parsed by SYS and started with a merge join within an internal view named VW_SQ_1, this could be from an unnested correlated subquery. The result of this subquery is 806300000 rows, that's quite a large amount, the CBO estimated 37361654 which is also quite large but not as large...
This is also where you do 135696 physical reads and 3967287 physical writes, which is where your time is going (looking at your wait events and the time part of the row sources - although it takes a bit more concentration to read that!)
Next you hash join to PS_EOAW_USERINST (which you've already read once), although this time it didn't do any work (and it doesn't make too much sense why no gets happened here) the stats think it has 25857 rows. This table only appears once in your first plan but twice in your second... perhaps there is a unique constraint that has allowed a rewrite to get rid of it? I would guess you have some SQL within the view like:

from ps_eoaw_userinst outer
where outer.some_cols = some_values
and outer.pk_col in (select subq.pk_col from PS_EOAW_USERINST subq where subq.some_other_cols = some_values)

Given the transformations in play:
What was the full version number of these instances? What were the optimizer parameters set as (especially optimizer_features_enable)? What constraints exist on PS_EOAW_USERINST, are they the same enable/validated states on both?

You could sanity check if the fast plan can occur on your bad instance, by extracting the outline and applying it as a hint. Using dbms_xplan.display_cursor with a format that includes the outline on the fast cursor is an easy way of getting the outline. If the plan can occur then check what the costing is like, compare it to the plan that the CBO wants to use (the one without the outline), clues should become apparent - maybe a join cardinality has been estimated badly etc...

Hope that helps,
Andrew

On 18 September 2017 at 21:59, Sayan Malakshinov <xt.and.r_at_gmail.com> wrote:

> > A.APPROVAL_DATE > '31-JUL-2017'
>
> And I also suspect that the root cause was implicit NLS conversion..
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 19 2017 - 01:05:11 CEST

Original text of this message