Re: Direct read stopped suddenly
Date: Wed, 25 May 2022 02:26:18 +0300
Message-ID: <CAOVevU582Jm_0j=f5BmC5tTDbspVRy8jZj1SYtY9=RbjyU08HA_at_mail.gmail.com>
Hi Pap,
Do you mean to say on DB node-1 the object was fully cached whereas on
> node-2 and 3 and 4 it wasn't, and thus when the query was executed from
> node-2/3/4 it was going for smart scan whereas when it was executed from
> node-1 it was going for buffered read?
yes, absolutely correct
And should we then flush it from the buffer cache but it can then happen
> again , so how should we handle this odd behaviour?
Yes, you can flush buffer cache, but also you can use "table_stats" hint to force serial direct path reads: we know that Oracle choses serial direct path reads when our segment is bigger than "_very_large_object_threshold" percent of buffer cache. By default, it's equal to 500, i.e. 500%, so if that table segment is 5 times bigger than buffer cache, Oracle should choose direct reads.
Simple example:
https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
SQL> _at_vparam_ _very_large_object_threshold
NAME VALUE DEFLT TYPE
DESCRIPTION
---------------------------------------- ------------ ------------ ------------ ------------------------------------------------------------ _very_large_object_threshold 500 TRUE number
upper threshold level of object size for direct reads
SQL> select name,block_size,buffers from v$buffer_pool;
NAME BLOCK_SIZE BUFFERS -------------------- ---------- ---------- DEFAULT 8192 41118
SQL> col vlot new_value vlot;
SQL> select buffers*5+1 as vlot from v$buffer_pool;
VLOT
205591
SQL> create table t_direct as select 1 x from dual;
Table created.
SQL> select name,value from v$statname n,v$sesstat s where s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads direct';
NAME VALUE ---------------------------------------------------------------- ---------- physical reads direct 0
SQL> select count(*) from t_direct;
COUNT(*)
1
SQL> select name,value from v$statname n,v$sesstat s where s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads direct';
NAME VALUE ---------------------------------------------------------------- ---------- physical reads direct 0
SQL> select/*+ table_stats(t_direct set rows=1000000 blocks=&vlot) */ count(*) from t_direct;
COUNT(*)
1
SQL> select name,value from v$statname n,v$sesstat s where s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads direct';
NAME VALUE ---------------------------------------------------------------- ----------physical reads direct
*1*
More info:
http://orasql.org/2019/04/16/correct-syntax-for-the-table_stats-hint/
http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/
<http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/>
Full example:
https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
On Tue, May 24, 2022 at 11:13 PM Pap <oracle.developer35_at_gmail.com> wrote:
> Thank you Sayan. This query has been running fine since the last 5-6months
> and always using cell smart scan , as i can see from 'offload returned
> bytes' column value >0 in dba_hist_sqlstat for this query and thus the
> elapsed time was alway <1second and never encountered such behaviour. Do
> you mean to say on DB node-1 the object was fully cached whereas on node-2
> and 3 and 4 it wasn't, and thus when the query was executed from node-2/3/4
> it was going for smart scan whereas when it was executed from node-1 it was
> going for buffered read? And should we then flush it from the buffer cache
> but it can then happen again , so how should we handle this odd behaviour?
>
> On Wed, May 25, 2022 at 1:28 AM Sayan Malakshinov <xt.and.r_at_gmail.com>
> wrote:
> >> Hi Pap, >> >> ntcache is a number of blocks of that object in the buffer cache. Since >> there are almost all blocks in the cache, oracle decides to do not use >> direct reads. >> >> >> Best regards, >> Sayan Malakshinov >> Oracle performance tuning expert >> Oracle Database Developer Choice Award winner >> Oracle ACE >> http://orasql.org >> >> вт, 24 мая 2022 г., 22:29 Pap <oracle.developer35_at_gmail.com>: >> >>> Hello Listers, We have one legacy system on version 11.2.0.4 and it is >>> planned to be migrated to 19C. It's a four node database. We have a select >>> query suddenly stopped doing smartscan from one node and the response time >>> goes from <1 seconds to >5seconds. And the surprising thing is , when we >>> manually trigger the sql , it's doing a smart scan when triggered from >>> other nodes but from one node only it's not doing that and the response >>> time going high for that specific node. It seems that one query showing >>> this behaviour and other running fine.Ideally my understanding was that it >>> should not vary based on DB node as because the storage nodes are common >>> across all the four DB nodes. So I'm wondering , why is it happening? >>> >>> Collected the 10358 trace and below is the trace. It does show a >>> difference in the value of "ntcache" and "ntdist" for both the nodes and >>> that is possibly driving the direct read/smartscan decision. So I wanted to >>> understand what is that and why that value is different in both DB nodes >>> and how to fix this odd behaviour? >>> >>> query:- >>> SELECT c1, c8, c2, c3, c4, c5, c6, c7 from TAB1 where c1 = :1 >>> >>> Plan hash value: 2108848690 >>> >>> -------------------------------------------------------------------------------------------------------- >>> | Id | Operation | Name | Rows | >>> Bytes | Cost (%CPU)| Time | >>> >>> -------------------------------------------------------------------------------------------------------- >>> | 0 | SELECT STATEMENT | | | >>> | 1583 (100)| | >>> | 1 | TABLE ACCESS STORAGE FULL| TAB1 | 9 | >>> 711 | 1583 (1)| 00:00:19 | >>> >>> -------------------------------------------------------------------------------------------------------- >>> Query Block Name / Object Alias (identified by operation id): >>> ------------------------------------------------------------- >>> 1 - SEL$1 / TAB1_at_SEL$1 >>> >>> >>> Trace from DB node which is doing smartscan:- >>> >>> NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read >>> from caches(local/remote) and checking storage reduction factors (OLTP/EHCC >>> Comp) >>> NSMTIO: kcbdpc:DirectRead: tsn: 29, objd: 1607047, objn: 1607047 >>> ckpt: 1, nblks: 757301, ntcache: 10, ntdist:30 >>> Direct Path for tsn 29 objd 1607047 objn 1607047 >>> Direct Path 1 ckpt 1, nblks 757301 ntcache 10 ntdist 30 >>> Direct Path mndb 0 tdiob 12 txiob 0 tciob 66 >>> Direct path diomrc 128 dios 2 >>> NSMTIO: Additional Info: VLOT=65089150 >>> Object# = 1607047, Object_Size = 757301 blocks >>> SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0 >>> KCBO: prepared ksr msg for cid=131228, tsn=29, obj=1607047, 0 >>> >>> >>> Trace from DB node which is not doing smartscan:- >>> >>> NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read >>> from caches(local/remote) and checking storage reduction factors (OLTP/EHCC >>> Comp) >>> NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 29, objd: 1607047, objn: >>> 1607047 >>> ckpt: 0, nblks: 757301, ntcache: 756116, ntdist:1185 >>> Direct Path for tsn 29 objd 1607047 objn 1607047 >>> Direct Path 0 ckpt 0, nblks 757301 ntcache 756116 ntdist 1185 >>> Direct Path mndb 18 tdiob 10 txiob 0 tciob 70 >>> Direct path diomrc 128 dios 2 >>> NSMTIO: Additional Info: VLOT=65089150 >>> Object# = 1607047, Object_Size = 757301 blocks >>> SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0 >>> >>
-- Best regards, Sayan Malakshinov Oracle performance tuning engineer Oracle ACE http://orasql.org -- http://www.freelists.org/webpage/oracle-lReceived on Wed May 25 2022 - 01:26:18 CEST