Re: Direct read stopped suddenly

From: Sayan Malakshinov <xt.and.r_at_gmail.com>
Date: Wed, 25 May 2022 16:38:35 +0300
Message-ID: <CAOVevU4LLqqyQ1ON8bGfkk5vLQbVoPa8c84439iZn=gWM-HLhg_at_mail.gmail.com>



Hi Pap,

In short, when segment size is 5+ times bigger than buffer cache, Oracle doesn't analyze other conditions and uses serial direct reads, that is why I suggested using table_stats hint - in this case it doesn't matter how many object's blocks are in the buffer cache. When its size is smaller than STT (Small Table threshold), oracle just uses buffered reads. But when it is under MTT (medium table threshold) it uses a more complex logic: Roger MacNicol wrote about that in details:

http://orasql.org/2017/05/04/when-bloggers-get-it-wrong-part-1/
http://orasql.org/2020/10/13/serial-scans-failing-to-offload/
http://orasql.org/2017/05/04/when-bloggers-get-it-wrong-part-2/

On Wed, May 25, 2022 at 4:25 PM Pap <oracle.developer35_at_gmail.com> wrote:

> Also Sayan, as it appears to be availability or presence of certain
> percentage of blocks in the buffer cache out of total number of blocks of
> the table , is the deciding factor between the smart scan vs buffered read
> of an object. So is it good idea to have some kind of alerting done so as
> to catch this odd situation beforehand and then flush the object from the
> buffer cache proactively to avoid this issue?
>
> On Wed, 25 May 2022, 3:51 pm Pap, <oracle.developer35_at_gmail.com> wrote:
>
>> Thank you so much Sayan. Here the table size is <6GB whereas we have
>> db_cache_size set as 70GB. So i believe this can no way be satisfy that
>> VLOT i.e 5 times the buffer cache condition. So in that way it can noway go
>> for direct read. But as we know from the statistics this query from
>> dba_hist_sqlstat was going for cell smart scan in past days. So was there
>> any other criteria too which was making it eligible for going for smart
>> scan in past?
>>
>> And something related to howmuch percent of that object blocks present in
>> buffer cache caused it to not go for smart scan this time? As because in
>> this case i see out of total ~757K blocks(noted in dba_tables) 756K were
>> noted by "ntcache" in the trace file. So it means there must be some
>> criteria in which it would be checking what percent of blocks are still
>> lying over in buffer cache/dirty and are not flushed into the disk and thus
>> it would have taken the decision to go for buffer read rather direct
>> read/smart scan. What is that limit?
>>
>> we have _small_table_threshold = 260356
>>
>> and _very_large_object_threshold= 500
>>
>>
>> On Wed, 25 May 2022, 4:56 am Sayan Malakshinov, <xt.and.r_at_gmail.com>
>> wrote:
>>
>>> Hi Pap,
>>>
>>> Do you mean to say on DB node-1 the object was fully cached whereas on
>>>> node-2 and 3 and 4 it wasn't, and thus when the query was executed from
>>>> node-2/3/4 it was going for smart scan whereas when it was executed from
>>>> node-1 it was going for buffered read?
>>>
>>>
>>> yes, absolutely correct
>>>
>>> And should we then flush it from the buffer cache but it can then
>>>> happen again , so how should we handle this odd behaviour?
>>>
>>>
>>> Yes, you can flush buffer cache, but also you can use "table_stats" hint
>>> to force serial direct path reads: we know that Oracle choses serial direct
>>> path reads when our segment is bigger than "_very_large_object_threshold"
>>> percent of buffer cache. By default, it's equal to 500, i.e. 500%, so if
>>> that table segment is 5 times bigger than buffer cache, Oracle should
>>> choose direct reads.
>>>
>>> Simple example:
>>> https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
>>>
>>> SQL> _at_vparam_ _very_large_object_threshold
>>>
>>> NAME VALUE DEFLT TYPE
>>> DESCRIPTION
>>> ---------------------------------------- ------------ ------------
>>> ------------ ------------------------------------------------------------
>>> _very_large_object_threshold 500 TRUE
>>> number upper threshold level of object size for direct reads
>>>
>>> SQL> select name,block_size,buffers from v$buffer_pool;
>>>
>>> NAME BLOCK_SIZE BUFFERS
>>> -------------------- ---------- ----------
>>> DEFAULT 8192 41118
>>>
>>> SQL> col vlot new_value vlot;
>>> SQL> select buffers*5+1 as vlot from v$buffer_pool;
>>>
>>> VLOT
>>> ----------
>>> 205591
>>>
>>> SQL> create table t_direct as select 1 x from dual;
>>>
>>> Table created.
>>>
>>> SQL> select name,value from v$statname n,v$sesstat s where
>>> s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads
>>> direct';
>>>
>>> NAME
>>> VALUE
>>> ----------------------------------------------------------------
>>> ----------
>>> physical reads direct
>>> 0
>>>
>>> SQL> select count(*) from t_direct;
>>>
>>> COUNT(*)
>>> ----------
>>> 1
>>>
>>> SQL> select name,value from v$statname n,v$sesstat s where
>>> s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads
>>> direct';
>>>
>>> NAME
>>> VALUE
>>> ----------------------------------------------------------------
>>> ----------
>>> physical reads direct
>>> 0
>>>
>>> SQL> select/*+ table_stats(t_direct set rows=1000000 blocks=&vlot) */
>>> count(*) from t_direct;
>>>
>>> COUNT(*)
>>> ----------
>>> 1
>>>
>>> SQL> select name,value from v$statname n,v$sesstat s where
>>> s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads
>>> direct';
>>>
>>> NAME
>>> VALUE
>>> ----------------------------------------------------------------
>>> ----------
>>> physical reads direct
>>> *1*
>>>
>>> More info:
>>> http://orasql.org/2019/04/16/correct-syntax-for-the-table_stats-hint/
>>>
>>> http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/
>>>
>>>
>>> <http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/>
>>> Full example:
>>> https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
>>>
>>>
>>> On Tue, May 24, 2022 at 11:13 PM Pap <oracle.developer35_at_gmail.com>
>>> wrote:
>>>
>>>> Thank you Sayan. This query has been running fine since the last
>>>> 5-6months and always using cell smart scan , as i can see from 'offload
>>>> returned bytes' column value >0 in dba_hist_sqlstat for this query and thus
>>>> the elapsed time was alway <1second and never encountered such behaviour.
>>>> Do you mean to say on DB node-1 the object was fully cached whereas on
>>>> node-2 and 3 and 4 it wasn't, and thus when the query was executed from
>>>> node-2/3/4 it was going for smart scan whereas when it was executed from
>>>> node-1 it was going for buffered read? And should we then flush it from the
>>>> buffer cache but it can then happen again , so how should we handle this
>>>> odd behaviour?
>>>>
>>>> On Wed, May 25, 2022 at 1:28 AM Sayan Malakshinov <xt.and.r_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Pap,
>>>>>
>>>>> ntcache is a number of blocks of that object in the buffer cache.
>>>>> Since there are almost all blocks in the cache, oracle decides to do not
>>>>> use direct reads.
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Sayan Malakshinov
>>>>> Oracle performance tuning expert
>>>>> Oracle Database Developer Choice Award winner
>>>>> Oracle ACE
>>>>> http://orasql.org
>>>>>
>>>>> вт, 24 мая 2022 г., 22:29 Pap <oracle.developer35_at_gmail.com>:
>>>>>
>>>>>> Hello Listers, We have one legacy system on version 11.2.0.4 and it
>>>>>> is planned to be migrated to 19C. It's a four node database. We have a
>>>>>> select query suddenly stopped doing smartscan from one node and the
>>>>>> response time goes from <1 seconds to >5seconds. And the surprising thing
>>>>>> is , when we manually trigger the sql , it's doing a smart scan when
>>>>>> triggered from other nodes but from one node only it's not doing that and
>>>>>> the response time going high for that specific node. It seems that one
>>>>>> query showing this behaviour and other running fine.Ideally my
>>>>>> understanding was that it should not vary based on DB node as because the
>>>>>> storage nodes are common across all the four DB nodes. So I'm wondering ,
>>>>>> why is it happening?
>>>>>>
>>>>>> Collected the 10358 trace and below is the trace. It does show a
>>>>>> difference in the value of "ntcache" and "ntdist" for both the nodes and
>>>>>> that is possibly driving the direct read/smartscan decision. So I wanted to
>>>>>> understand what is that and why that value is different in both DB nodes
>>>>>> and how to fix this odd behaviour?
>>>>>>
>>>>>> query:-
>>>>>> SELECT c1, c8, c2, c3, c4, c5, c6, c7 from TAB1 where c1 = :1
>>>>>>
>>>>>> Plan hash value: 2108848690
>>>>>>
>>>>>> --------------------------------------------------------------------------------------------------------
>>>>>> | Id | Operation | Name | Rows |
>>>>>> Bytes | Cost (%CPU)| Time |
>>>>>>
>>>>>> --------------------------------------------------------------------------------------------------------
>>>>>> | 0 | SELECT STATEMENT | | |
>>>>>> | 1583 (100)| |
>>>>>> | 1 | TABLE ACCESS STORAGE FULL| TAB1 | 9 |
>>>>>> 711 | 1583 (1)| 00:00:19 |
>>>>>>
>>>>>> --------------------------------------------------------------------------------------------------------
>>>>>> Query Block Name / Object Alias (identified by operation id):
>>>>>> -------------------------------------------------------------
>>>>>> 1 - SEL$1 / TAB1_at_SEL$1
>>>>>>
>>>>>>
>>>>>> Trace from DB node which is doing smartscan:-
>>>>>>
>>>>>> NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read
>>>>>> from caches(local/remote) and checking storage reduction factors (OLTP/EHCC
>>>>>> Comp)
>>>>>> NSMTIO: kcbdpc:DirectRead: tsn: 29, objd: 1607047, objn: 1607047
>>>>>> ckpt: 1, nblks: 757301, ntcache: 10, ntdist:30
>>>>>> Direct Path for tsn 29 objd 1607047 objn 1607047
>>>>>> Direct Path 1 ckpt 1, nblks 757301 ntcache 10 ntdist 30
>>>>>> Direct Path mndb 0 tdiob 12 txiob 0 tciob 66
>>>>>> Direct path diomrc 128 dios 2
>>>>>> NSMTIO: Additional Info: VLOT=65089150
>>>>>> Object# = 1607047, Object_Size = 757301 blocks
>>>>>> SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0
>>>>>> KCBO: prepared ksr msg for cid=131228, tsn=29, obj=1607047, 0
>>>>>>
>>>>>>
>>>>>> Trace from DB node which is not doing smartscan:-
>>>>>>
>>>>>> NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read
>>>>>> from caches(local/remote) and checking storage reduction factors (OLTP/EHCC
>>>>>> Comp)
>>>>>> NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 29, objd: 1607047,
>>>>>> objn: 1607047
>>>>>> ckpt: 0, nblks: 757301, ntcache: 756116, ntdist:1185
>>>>>> Direct Path for tsn 29 objd 1607047 objn 1607047
>>>>>> Direct Path 0 ckpt 0, nblks 757301 ntcache 756116 ntdist 1185
>>>>>> Direct Path mndb 18 tdiob 10 txiob 0 tciob 70
>>>>>> Direct path diomrc 128 dios 2
>>>>>> NSMTIO: Additional Info: VLOT=65089150
>>>>>> Object# = 1607047, Object_Size = 757301 blocks
>>>>>> SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0
>>>>>>
>>>>>
>>>
>>> --
>>> Best regards,
>>> Sayan Malakshinov
>>> Oracle performance tuning engineer
>>> Oracle ACE
>>> http://orasql.org
>>>
>>

-- 
Best regards,
Sayan Malakshinov
Oracle performance tuning engineer
Oracle ACE
http://orasql.org

--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 25 2022 - 15:38:35 CEST

Original text of this message