Re: Query running slow because of cell offload discrepancy

From: Pap <oracle.developer35_at_gmail.com>
Date: Fri, 21 Jun 2024 12:12:45 +0530
Message-ID: <CAEjw_fhbt0WF0rT7kQHZE6Uk9oysZJEmtSmBACb5BKKDv3pzgw_at_mail.gmail.com>



This seems related to the bug below. This one was ON, but after the 19.21 upgrade we see it's OFF. Yet to test, though.

Are there any statistics whose trend I should compare in dba_hist_sysstat between the pre-upgrade and post-upgrade periods, to be certain that this is the issue?

https://community.oracle.com/mosc/discussion/4545546/bug-31626438
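
Something along these lines is what I had in mind for the comparison - just a
sketch, where the statistic names and the 30-day window are only examples and
would need adjusting:

  -- Sketch: per-snapshot deltas of a few offload / direct-read statistics
  -- per instance, so the pre- vs post-upgrade trend can be eyeballed.
  select sn.begin_interval_time,
         st.instance_number,
         st.stat_name,
         st.value
           - lag(st.value) over (partition by st.dbid, st.instance_number, st.stat_name
                                 order by st.snap_id) as delta
  from   dba_hist_sysstat  st
  join   dba_hist_snapshot sn
    on   sn.snap_id         = st.snap_id
   and   sn.dbid            = st.dbid
   and   sn.instance_number = st.instance_number
  where  st.stat_name in (
           'cell physical IO bytes eligible for predicate offload',
           'cell physical IO interconnect bytes returned by smart scan',
           'physical reads direct',
           'table scans (direct read)'
         )
  and    sn.begin_interval_time > systimestamp - interval '30' day
  order  by st.stat_name, st.instance_number, sn.begin_interval_time;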

On Thu, Jun 20, 2024 at 10:26 AM Pap <oracle.developer35_at_gmail.com> wrote:

> Thank you Jonathan.
>
> If I look at the call stacks in the trace I fetched (gist below), it shows
> the decision is actually being made for each partition (from the starting
> partition #66 to the last partition #79). But one thing I notice in the
> Node-2 trace: even though "ntcache" is non-zero for most of the partitions,
> the decision is "no direct read" only for the 79th partition, because of its
> very high "ntcache" value. So does that mean the buffered scan of the 79th
> partition alone took all the time? Or is the final decision between
> directread and nodirectread for all the partitions actually based on the
> "nsmtio" decision for the last partition? Node-1 shows "ntcache" as "0" for
> every partition, and thus "directread".
>
> https://gist.github.com/databasetech0073/ea635a8c7e58189aabdce1ae6171c7a7
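>
> (For reference, per-partition direct-read decisions of this kind show up
> when the NSMTIO trace component is enabled - roughly as below; the level
> spec is just an example and may need checking on 19c:)
>
>   alter session set events 'trace[NSMTIO] disk highest';
>   -- run the statement, then switch the tracing off again
>   alter session set events 'trace[NSMTIO] off';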
>
> As you mentioned, I also collected the non-zero differences in the session
> stats for the query execution on both nodes and published them in the
> comments section of the gist below. They don't show any "***-undo records
> applied" statistics, just the "undo change vector" statistics, and those
> values are very small. So I am not sure whether that points to any anomaly
> that could be causing this behaviour.
>
> However, one thing I do see: "table scan disk non-IMC rows gotten" is ~2K on
> Node-1 vs ~67 million on Node-2. And in the Node-2 trace I keep seeing the
> line "cell memory not enabled" - can that be related?
>
> https://gist.github.com/databasetech0073/4974857b245011d8fd7041220d0133d6
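>
> For context, session-stat deltas of that kind can be pulled roughly like
> this (a sketch - capture the values before and after the run and diff them;
> using the current session's SID here is just for illustration):
>
>   select sn.name, ss.value
>   from   v$sesstat  ss
>   join   v$statname sn
>     on   sn.statistic# = ss.statistic#
>   where  ss.sid = sys_context('userenv','sid')
>   and    ss.value != 0
>   order  by sn.name;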
>
>
> On Thu, Jun 20, 2024 at 3:57 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>> Actually, this table TAB1 is inserted/written by an application whose
>>> service points to Node-1 and read by another application whose service
>>> points to Node-2, and the impacted query is the one from Node-2. Can
>>> this read and write pattern be related in any way?
>>
>>
>> Yes, it's possible. It may be a side effect of node-2 doing something
>> with undo records for read-consistency or commit cleanout. You will recall
>> I made the comment:
>>
>>> The large number of cell single block reads is a little puzzle, of
>>> course, but you have the session stats so you can check for anomalies - in
>>> particular one that has been mentioned on Oracle-L fairly frequently, reads
>>> of the undo tablespace for read-consistency purposes (statistic "xxxx -
>>> undo records applied").
>>
>>
>> Those statistics might be a clue about why node-2 ends up with a lot of
>> cached blocks when node-1 is doing all the DML. You could also look at
>> other stats relating to block handling (other than just the simplest ones
>> about gets and reads) to see if they give you any other clues, e.g.
>> cleanouts and cross-instance transfers.
>>
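>> For instance (just a sketch - the statistic list is an arbitrary starting
>> point, not a definitive set):
>>
>>   select inst_id, name, value
>>   from   gv$sysstat
>>   where  name in ('commit cleanouts',
>>                   'immediate (CR) block cleanout applications',
>>                   'deferred (CURRENT) block cleanout applications',
>>                   'gc cr blocks received',
>>                   'gc current blocks received')
>>   order  by name, inst_id;
>>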
>> I am a little curious about the use (or not) of direct path reads / smart
>> scans - does the trace file show Oracle making the decision just once for
>> the whole table? I would have expected it to decide for each partition in
>> turn.
>>
>> Your query is incorrect, by the way. There may be multiple copies of each
>> block in the buffer, so your count() should be of "distinct buf.block#".
>> (You should also be a little careful about how you run this query -
>> if it runs from the CDB you ought to include the con_id in the query,
>> otherwise you could get some confusing results.)
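>>
>> In other words, something along these lines (a sketch only - I'm guessing
>> at a gv$bh-based query and at the object name, which may differ from what
>> you actually ran):
>>
>>   select buf.inst_id, count(distinct buf.block#) as cached_blocks
>>   from   gv$bh buf
>>   join   dba_objects obj
>>     on   obj.data_object_id = buf.objd
>>   where  obj.object_name = 'TAB1'
>>   and    buf.status != 'free'
>>   group  by buf.inst_id;
>>   -- from the CDB root you would also need to match on con_id (e.g. via cdb_objects)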
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> On Wed, 19 Jun 2024 at 19:20, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Thank you so much Jonathan and Patrick.
>>>
>>> I checked v$buffer_pool; during this run the two nodes showed a buffer
>>> cache size of 32 GB and 33 GB respectively.
>>>
>>> However, when we run the command below, it shows cached_blocks as 16935
>>> on Node-1 vs 2652629 on Node-2.
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l