RE: Direct read stopped suddenly
Date: Wed, 25 May 2022 11:06:21 -0400
Message-ID: <07e801d87048$ff103c80$fd30b580$_at_rsiz.com>
I was thinking that too, but the original post stated an increase in the elapsed time “the response time goes from <1 seconds to >5seconds” implying highly efficient block pruning included in the direct read path.
It seems like this would be the valid basis for an enhancement request when Oracle has entered the decision between buffered read due to high memory cached occupation that they also check the estimated storage index optimization and figure that into the choice. Failing that enhancement, when you know something about a table that the CBO does not (yet) take into account, then Sayan’s advice to hint the query is both exactly the right thing to do AND the spirit of the purpose of hints.
Kevin Closson mentioned recently he thought of keeping storage indexes straight in his mind “dramatically improves searching for a needle in a haystack as opposed to finding where certain pieces of straw exist in a haystack.”
I also used to use a farm metaphor for my customers: If someone throws a bag of diamonds in a manure pile, storage indexes don’t change the size of your shovel or how fast you can throw a shovelful of “manure” on a filter, but they reduce the number of shovelfuls of manure you need to shovel to just the ones that contain diamonds.
IF the predicate c1 in the supplied query “SELECT c1, c8, c2, c3, c4, c5, c6, c7 from TAB1 where c1 = :1” is in fact the dominant attribute and it doesn’t often change in value, then the efficiency of the storage indexes could be supercharged via physically ordering the data (especially with the advent of attribute clustering, but also by hand the hard way prior to that).
So IF c1 is usually the diamond you are looking for and you have storage indexes matching attribute clustering for c1, that pushes the likelihood that you need an index on c1 down.
But definitely consider c1 as a candidate index.
One more thing: IF c1 is a status indicator, especially a “completion date” or a final status value, consider making that column have a final value of NULL and then putting an index on it.
Folks often miss two key points about NULL:
- Oracle (nor any standards compliant RDBMS) cannot assign a value or meaning to NULL, but YOU can for a particular column.
- A single column index does not contain entries for NULL values and auto-prunes itself when a value is punched to NULL.
This dates back to approximately forever, and last time I mentioned it in a paper Graham Wood replied something like “that’s nothing new, we called out SPARSE INDEXES in at least V5” as being important. (Which is correct. It might even have been in V4 or V3.)
IF c1 is in fact a status indicator AND IF you use null as a final value (and you can keep a completion date somewhere else if you need the data, but stop using it for the primary pruning predicate) then you can find non-NULL values of c1 most efficiently with or without storage indexes. (I leave working that out as a priori at least a tie for most efficient to the reader).
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Andy Sayer
Sent: Wednesday, May 25, 2022 9:39 AM
To: oracle.developer35_at_gmail.com
Cc: Oracle L; Sayan Malakshinov
Subject: Re: Direct read stopped suddenly
And something related to howmuch percent of that object blocks present in buffer cache caused it to not go for smart scan this time? As because in this case i see out of total ~757K blocks(noted in dba_tables) 756K were noted by "ntcache" in the trace file. So it means there must be some criteria in which it would be checking what percent of blocks are still lying over in buffer cache/dirty and are not flushed into the disk and thus it would have taken the decision to go for buffer read rather direct read/smart scan. What is that limit?
we have _small_table_threshold = 260356
and _very_large_object_threshold= 500
On Wed, 25 May 2022, 4:56 am Sayan Malakshinov, <xt.and.r_at_gmail.com> wrote:
Hi Pap,
Do you mean to say on DB node-1 the object was fully cached whereas on node-2 and 3 and 4 it wasn't, and thus when the query was executed from node-2/3/4 it was going for smart scan whereas when it was executed from node-1 it was going for buffered read?
yes, absolutely correct
And should we then flush it from the buffer cache but it can then happen again , so how should we handle this odd behaviour?
Yes, you can flush buffer cache, but also you can use "table_stats" hint to force serial direct path reads: we know that Oracle choses serial direct path reads when our segment is bigger than "_very_large_object_threshold" percent of buffer cache. By default, it's equal to 500, i.e. 500%, so if that table segment is 5 times bigger than buffer cache, Oracle should choose direct reads.
Simple example: https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
SQL> _at_vparam_ _very_large_object_threshold
NAME VALUE DEFLT TYPE DESCRIPTION
---------------------------------------- ------------ ------------ ------------ ------------------------------------------------------------
_very_large_object_threshold 500 TRUE number upper threshold level of object size for direct reads
SQL> select name,block_size,buffers from v$buffer_pool;
NAME BLOCK_SIZE BUFFERS
-------------------- ---------- ----------
DEFAULT 8192 41118
SQL> col vlot new_value vlot;
SQL> select buffers*5+1 as vlot from v$buffer_pool;
VLOT
205591
SQL> create table t_direct as select 1 x from dual;
Table created.
SQL> select name,value from v$statname n,v$sesstat s where s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads direct';
NAME VALUE
---------------------------------------------------------------- ----------
physical reads direct 0
SQL> select count(*) from t_direct;
COUNT(*)
1
SQL> select name,value from v$statname n,v$sesstat s where s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads direct';
NAME VALUE
---------------------------------------------------------------- ----------
physical reads direct 0
SQL> select/*+ table_stats(t_direct set rows=1000000 blocks=&vlot) */ count(*) from t_direct;
COUNT(*)
1
SQL> select name,value from v$statname n,v$sesstat s where s.sid=userenv('sid') and s.statistic#=n.statistic# and name='physical reads direct';
NAME VALUE
---------------------------------------------------------------- ----------
physical reads direct 1
More info:
http://orasql.org/2019/04/16/correct-syntax-for-the-table_stats-hint/
http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/
<http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/>
Full example: https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
On Tue, May 24, 2022 at 11:13 PM Pap <oracle.developer35_at_gmail.com> wrote:
Thank you Sayan. This query has been running fine since the last 5-6months and always using cell smart scan , as i can see from 'offload returned bytes' column value >0 in dba_hist_sqlstat for this query and thus the elapsed time was alway <1second and never encountered such behaviour. Do you mean to say on DB node-1 the object was fully cached whereas on node-2 and 3 and 4 it wasn't, and thus when the query was executed from node-2/3/4 it was going for smart scan whereas when it was executed from node-1 it was going for buffered read? And should we then flush it from the buffer cache but it can then happen again , so how should we handle this odd behaviour?
On Wed, May 25, 2022 at 1:28 AM Sayan Malakshinov <xt.and.r_at_gmail.com> wrote:
Hi Pap,
ntcache is a number of blocks of that object in the buffer cache. Since there are almost all blocks in the cache, oracle decides to do not use direct reads.
Best regards,
Sayan Malakshinov
Oracle performance tuning expert
Oracle Database Developer Choice Award winner
Oracle ACE
http://orasql.org
вт, 24 мая 2022 г., 22:29 Pap <oracle.developer35_at_gmail.com>:
Hello Listers, We have one legacy system on version 11.2.0.4 and it is planned to be migrated to 19C. It's a four node database. We have a select query suddenly stopped doing smartscan from one node and the response time goes from <1 seconds to >5seconds. And the surprising thing is , when we manually trigger the sql , it's doing a smart scan when triggered from other nodes but from one node only it's not doing that and the response time going high for that specific node. It seems that one query showing this behaviour and other running fine.Ideally my understanding was that it should not vary based on DB node as because the storage nodes are common across all the four DB nodes. So I'm wondering , why is it happening?
Collected the 10358 trace and below is the trace. It does show a difference in the value of "ntcache" and "ntdist" for both the nodes and that is possibly driving the direct read/smartscan decision. So I wanted to understand what is that and why that value is different in both DB nodes and how to fix this odd behaviour?
query:-
SELECT c1, c8, c2, c3, c4, c5, c6, c7 from TAB1 where c1 = :1
Plan hash value: 2108848690
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |Query Block Name / Object Alias (identified by operation id):
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1583 (100)| | | 1 | TABLE ACCESS STORAGE FULL| TAB1 | 9 | 711 | 1583 (1)| 00:00:19 |
--------------------------------------------------------------------------------------------------------
1 - SEL$1 / TAB1_at_SEL$1
Trace from DB node which is doing smartscan:-
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 29, objd: 1607047, objn: 1607047
ckpt: 1, nblks: 757301, ntcache: 10, ntdist:30
Direct Path for tsn 29 objd 1607047 objn 1607047
Direct Path 1 ckpt 1, nblks 757301 ntcache 10 ntdist 30
Direct Path mndb 0 tdiob 12 txiob 0 tciob 66
Direct path diomrc 128 dios 2
NSMTIO: Additional Info: VLOT=65089150
Object# = 1607047, Object_Size = 757301 blocks
SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0
KCBO: prepared ksr msg for cid=131228, tsn=29, obj=1607047, 0
Trace from DB node which is not doing smartscan:-
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 29, objd: 1607047, objn: 1607047
ckpt: 0, nblks: 757301, ntcache: 756116, ntdist:1185
Direct Path for tsn 29 objd 1607047 objn 1607047
Direct Path 0 ckpt 0, nblks 757301 ntcache 756116 ntdist 1185
Direct Path mndb 18 tdiob 10 txiob 0 tciob 70
Direct path diomrc 128 dios 2
NSMTIO: Additional Info: VLOT=65089150
Object# = 1607047, Object_Size = 757301 blocks
SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0
-- Best regards, Sayan Malakshinov Oracle performance tuning engineer Oracle ACE http://orasql.org -- http://www.freelists.org/webpage/oracle-lReceived on Wed May 25 2022 - 17:06:21 CEST