Re: Query running slow because of cell offload discrepancy

From: Pap <oracle.developer35_at_gmail.com>
Date: Wed, 19 Jun 2024 23:50:32 +0530
Message-ID: <CAEjw_fi-+PEUXV-mRBYHmFpFvmMaXsr2NFfaNO=V=Sxi1bAQVw_at_mail.gmail.com>



Thank you so much Jonathan and Patrick.

I checked v$buffer_pool; during this run the two nodes showed buffer cache sizes of "32 GB" and "33 GB" respectively.
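
For reference, that check was along these lines (a minimal sketch against v$buffer_pool; current_size is reported in MB):

    select name, block_size, current_size
    from   v$buffer_pool;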

However, when we run the query below on each node, it shows cached_blocks as 16935 on Node-1 vs 2652629 on Node-2.

select obj.owner, obj.object_name, obj.object_type,
       count(buf.block#) as cached_blocks
from   v$bh buf
inner join dba_objects obj
    on buf.objd = obj.data_object_id
where buf.class# = 1   -- data blocks
and   buf.status != 'free'
and   lower(obj.object_name) = 'tab1'
group by obj.owner, obj.object_name, obj.object_type;
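
Incidentally, the same check can be done for both instances in one pass via gv$bh (a sketch, assuming standard access to the GV$ views; inst_id identifies the node):

    select buf.inst_id, count(buf.block#) as cached_blocks
    from   gv$bh buf
    inner join dba_objects obj
        on buf.objd = obj.data_object_id
    where buf.class# = 1
    and   buf.status != 'free'
    and   lower(obj.object_name) = 'tab1'
    group by buf.inst_id;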

Then, running with event 10358 at level 2 and with event 'trace [NSMTIO] disk highest' set, the trace output below shows that the "direct read" decision is influenced only by the number of blocks already present in the buffer cache.
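
For anyone reproducing this, the events were set roughly as follows at session level (a sketch; the NSMTIO event string is exactly as quoted above):

    alter session set events '10358 trace name context forever, level 2';
    alter session set events 'trace [NSMTIO] disk highest';
    -- run the query, then switch the tracing off:
    alter session set events '10358 trace name context off';
    alter session set events 'trace [NSMTIO] off';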

The "ntcache" is showing as 1578935 out of the total 2027205 blocks on Node-2 trace. So now I am wondering , how it's happening consistently that one of the nodes is caching so many blocks for that table but not the other one. This issue is reproducible consistently.

This table TAB1 is inserted/written by an application whose service points to Node-1, and is read by another application whose service points to Node-2; the impacted query is the one running on Node-2. Can this read/write pattern be related in any way? Also, to avoid this, is the only option to force "_serial_direct_read" to TRUE at session level before running that query and disable it afterwards, or does some other option exist to handle this?
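
For clarity, the workaround I have in mind is along these lines (a sketch only; "_serial_direct_read" is an underscore parameter, so changing it is normally something to confirm with Oracle Support, and it accepts ALWAYS/AUTO/NEVER as well as TRUE/FALSE):

    alter session set "_serial_direct_read" = always;
    -- run the impacted query, then restore the default:
    alter session set "_serial_direct_read" = auto;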

Below is the key part of the trace from Node-1 and Node-2, which suggests that the higher number of blocks present in the buffer cache is the cause of the difference in cell offloading behaviour between the two nodes.

https://gist.github.com/databasetech0073/4974857b245011d8fd7041220d0133d6

Regards
Pap

On Tue, Jun 18, 2024 at 3:43 PM Patrick Jolliffe <jolliffe_at_gmail.com> wrote:

>
> Did you generate and analyze trace per the following blog posts by Roger?
> https://orasql.org/2017/05/04/when-bloggers-get-it-wrong-part-1/
> https://orasql.org/2017/05/04/when-bloggers-get-it-wrong-part-2/
>
>
> On Tue, Jun 18, 2024 at 11:25:58, Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> If you want to check the actual size of the buffer cache you can query
>> v$buffer_pool; if you want to check if the buffer pool has been shrinking
>> and growing unexpectedly you can query v$sga_resize_ops (
>> https://jonathanlewis.wordpress.com/2007/04/16/sga-resizing/ ) or
>> v$memory_resize_ops. This MAY have had an impact.
>>
>> The decision about direct path also includes an assessment of how much of
>> the object is already cached - and when you compare the stats for the two
>> nodes you can see that they do roughly the same amount of "logical" I/O,
>> despite the fact that node 2 seems to have done only a tiny fraction of the
>> necessary physical I/O to read the table ... i.e. something has already
>> managed to get most of the table into the local buffer cache (or, possibly
>> but it seems unlikely given the timing, a HUGE amount of the table is being
>> accessed by cache transfer from node 1).
>>
>> The large number of cell single block reads is a little puzzle, of
>> course, but you have the session stats so you can check for anomalies - in
>> particular one that has been mentioned on Oracle-L fairly frequently, reads
>> of the undo tablespace for read-consistency purposes (statistic "xxxx -
>> undo records applied").
>>
>> There is a little oddity in the execution stats - specifically the EXECs
>> for the final nested loops, which are higher than they should be given the
>> number of "rows" acquired in the preceding steps. In itself this probably
>> has a minimal impact, but it does raise the question of whether something
>> catastrophic has happened but has only been captured in these two lines.
>>
>> If you want to test the caching hypothesis execute "alter system flush
>> buffer_cache" and re-run the query.
>>
>> N.B. for information purposes only: I think the VLOT, MTT, STT stuff
>> applies at the partition level not the table level.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Mon, 17 Jun 2024 at 20:08, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Can someone guide here, how to proceed to get the root cause of this
>>> issue?
>>>
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 19 2024 - 20:20:32 CEST
