Re: Query running slow because cell offload discrepancy

From: Pap <oracle.developer35_at_gmail.com>
Date: Thu, 20 Jun 2024 10:26:02 +0530
Message-ID: <CAEjw_fjABD-TvE3=++dyPHMCHwH1CSJfdfxgUsUoOORezQTB9w_at_mail.gmail.com>



Thank you Jonathan.

Looking at the call stacks in the trace I collected (linked below), it is actually making the decision for each partition, from partition# 66 through the last partition# 79. One thing I notice in the Node-2 trace, though: even though "ntcache" is non-zero for most of the partitions, the decision comes out as "no direct read" only for the 79th partition, because of its very high "ntcache" value. So does that mean the buffered scan of just the 79th partition took all the time? Or is the final direct read / no direct read decision for all the partitions actually made from the "nsmtio" decision of the last partition? Node-1 shows "ntcache" as 0 for every partition, and thus "direct read" throughout.

https://gist.github.com/databasetech0073/ea635a8c7e58189aabdce1ae6171c7a7
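To cross-check the per-partition "ntcache" figures outside the trace, something like the query below should give a cached-block count per partition on the current instance (the join to dba_objects and the owner value are my assumptions, and it already counts distinct block# as Jonathan suggests further down; gv$bh with inst_id would cover both nodes in one pass):

  -- cached blocks per partition of TAB1 on the current instance
  -- (v$bh.objd is the data_object_id; 'APP_OWNER' is a placeholder)
  select  obj.subobject_name         as partition_name,
          count(distinct buf.block#) as cached_blocks
  from    v$bh        buf,
          dba_objects obj
  where   obj.data_object_id = buf.objd
  and     obj.owner          = 'APP_OWNER'
  and     obj.object_name    = 'TAB1'
  and     buf.status        != 'free'   -- ignore free buffers with stale objd
  group by obj.subobject_name
  order by obj.subobject_name;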

As you mentioned, I had also collected the non-zero differences in the session stats for the query execution on both nodes and published them in the comment section of the gist below. They don't show any "*** - undo records applied" statistics, only the "undo change vector" statistics, and those values are very small. So I am not sure whether they point to an anomaly that could be causing this behaviour.

However, one thing I do see is "table scan disk non-IMC rows gotten": it is ~2K on Node-1 vs ~67 million on Node-2. And in the Node-2 trace I keep seeing the line "cell memory not enabled" - can that be related?

https://gist.github.com/databasetech0073/4974857b245011d8fd7041220d0133d6
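For reference, this is roughly the kind of instance-level check for the counters mentioned in this thread (the LIKE patterns are only meant to pick up the undo / cleanout / cross-instance statistics discussed here; for a single session the same idea works against v$sesstat joined to v$statname):

  -- compare the relevant counters across both instances
  select  inst_id, name, value
  from    gv$sysstat
  where   name like '%undo records applied%'
  or      name like '%cleanout%'
  or      name like 'gc%blocks received'     -- cross-instance transfers
  or      name  =   'table scan disk non-IMC rows gotten'
  order by name, inst_id;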

On Thu, Jun 20, 2024 at 3:57 AM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

> Actually this table TAB1 is inserted/written by an application whose
>> service points to Node-1 and read by another application whose service
>> points to Node-2, and the impacted query is the one running from Node-2.
>> Can this read and write pattern be related in any way?
>
>
> Yes, it's possible. It may be a side effect of node-2 doing something with
> undo records for read-consistency or commit cleanout. You will recall I
> made the comment:
>
>> *The large number of cell single block reads is a little puzzle, of
>> course, but you have the session stats so you can check for anomalies - in
>> particular one that has been mentioned on Oracle-L fairly frequently, reads
>> of the undo tablespace for read-consistency purposes (statistic "xxxx -
>> undo records applied").*
>
>
> Those statistics might be a clue about why node-2 ends up with a lot of
> cached blocks when node-1 is doing all the DML. You could also look at
> other stats relating to block handling (other than just the simplest ones
> about gets and reads) to see if they give you any other clues, e.g.
> cleanouts and cross-instance transfers.
>
> I am a little curious about the use (or not) of direct path reads / smart
> scans - does the trace file show Oracle making the decision just once for
> the whole table? I would have expected it to decide for each partition in
> turn.
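The per-partition decisions in the trace are the NSMTIO ones; for anyone wanting to reproduce it, that tracing can be switched on with something like the following (the exact event syntax is an assumption on my part - please check it against the version in use):

  -- enable the direct-read decision (NSMTIO) trace for the current session
  alter session set events 'trace[NSMTIO] disk highest';
  -- ... run the query, then switch it off again
  alter session set events 'trace[NSMTIO] off';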
>
> Your query is incorrect, by the way. There may be multiple copies of each
> block in the buffer so your count() should be of "distinct buf.block#".
> (You should also be a little careful about how you run this query - if it
> runs from the CDB you ought to include the con_id in the query, otherwise
> you could get some confusing results.)
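For reference, a rough sketch of that cached-block check with both corrections folded in (count of distinct block#, con_id included; the join to cdb_objects is an assumption rather than the exact query that produced the numbers quoted above):

  -- cached blocks for TAB1, one row per container, each block counted once
  -- (run from the root; inside a single PDB, dba_objects works the same way)
  select  buf.con_id,
          count(distinct buf.block#) as cached_blocks
  from    v$bh        buf,
          cdb_objects obj
  where   obj.data_object_id = buf.objd
  and     obj.con_id         = buf.con_id
  and     obj.object_name    = 'TAB1'
  group by buf.con_id;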
>
> Regards
> Jonathan Lewis
>
>
>
> On Wed, 19 Jun 2024 at 19:20, Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank you so much Jonathan and Patrick.
>>
>> I checked v$buffer_pool; during this run both of the nodes showed a
>> buffer_cache size of "32 GB" and "33 GB" respectively.
>>
>> However, when we run the command below, it shows cached_blocks as 16935
>> on Node-1 vs 2652629 on Node-2.
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jun 20 2024 - 06:56:02 CEST
