Re: (long) 2 Oracle doubts

From: Tanel Poder <tanel_at_@peldik.com>
Date: Sun, 27 Jul 2003 21:40:10 +0300
Message-ID: <3f241c9c_1@news.estpak.ee>

Hi!

"Paul Brewer" <paul_at_paul.brewers.org.uk> wrote in message news:3f22dae0_2_at_mk-nntp-1.news.uk.worldonline.com... [snip]
> And if CBO can't recognise that the table will fit into one read, and that
> the index is not a good one to use(whether or not the table is cached),
I'll
> eat my red hat.
>
> Regards,
> Paul
>

Although your mileage may vary, I didn't have anything better to do with my rainy Sunday here and put together a test which should make you eat your red hat ;)

C:\>sqlplus admin/admin

SQL*Plus: Release 9.2.0.1.0 - Production on P Jul 27 13:16:01 2003

Connected to:

Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.1.0 - Production

SQL> create tablespace test datafile 'C:\ORACLE\ORA92\ORCL\TEST01.DBF' size 10m autoextend on segment space management manual;

Tablespace created.

I used old freelist based free space management to avoid any slick tricks Oracle might do with my data..

SQL> create table t (id, name) tablespace test as select obj#, name from sys.obj$ where rownum <= 50;

Table created.

SQL> desc t;

 Name                          Null?    Type

 ----------------------------- -------- --------------------

 ID                            NOT NULL NUMBER

 NAME                          NOT NULL VARCHAR2(30)

Both columns are not null from original table’s definition.

SQL> create unique index i on t(id) tablespace test;

Index created.

Note that this is a unique index (which a primary key should normally use..)

SQL> analyze table t compute statistics;

Table analyzed.

(Compute statistics on table analyzes all it’s indexes by default as well)

SQL> select blocks, empty_blocks, num_rows, avg_space, avg_row_len from user_tables where table_name = 'T';

BLOCKS EMPTY_BLOCKS NUM_ROWS AVG_SPACE AVG_ROW_LEN

------------ ---------- ---------- -----------

1 6 50 7237 14

SQL> select blocks from user_extents where segment_name = 'T';

BLOCKS

We see that a 8 block extent has been allocated for table T, 1 block of it is in use for data and one remaining block is for segment header.

SQL> select blevel, leaf_blocks, distinct_keys, clustering_factor from user_indexes where index_name = 'I';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR

----------- ------------- -----------------

         0           1            50                 1

Index blevel is 0, that means all keys are accessible with only one IO to index root block.

SQL> alter tablespace test offline;

Tablespace altered.

SQL> alter tablespace test online;

Tablespace altered.

SQL> set autotrace on explain stat

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

SQL> select name from t where id = 25;

NAME

I_CDEF1 Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=11)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=1 Card=1 Bytes=

11)

2 1 INDEX (UNIQUE SCAN) OF 'I' (UNIQUE) You see, index is preferred over single block table! The reason here is that optmizer finds out that cost is 2 for table scan, but only 1 for index scan. (_table_scan_cost_plus_one parameter says that optimizer should take segment header block reads into account when computing costs. This parameter defaults to true in 9i, false in 8i). Segment header block reads are necessary for every full table scan (and index fast full scan) because the map of extents belonging to segment is there. The reads can be verified from 10046 trace (but data blocks shouldn’t be in buffer cache in order to see sequential and scattered read waits).

Note that index access cost for *unique* scans is calculated as blevel + 1. That is, for single block index, index traversing cost is 0 and for table block read cost 1 is added. In some cases cost doesn’t always represent number of IOs (and CPU usage), this special issue could be because Oracle knows that unique scans are generally very fast (since they never fetch more than one row) and normally a frequently used index’es root block is in cache anyway. This index cost is 0 as long as table is accesses by index rowid afterwards, if the column queried is actually contained in the index (no table read needed) then index scan cost is assigned a cost of 1, probably because you can’t have a statement with no cost at all.

Statistics

0 recursive calls

0 db block gets

2 consistent gets

2 physical reads

0 redo size

380 bytes sent via SQL*Net to client

503 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

Even though the cost was 1, we got 2 physical reads, one for index root/leaf block and one for actual data block.

Let’s play around with index stats a bit and tell optimizer that index has 3 levels, 1 root, 1 branch and a leaf level:

SQL> exec dbms_stats.set_index_stats(ownname=>'ADMIN', indname=>'I', indlevel=>2);

PL/SQL procedure successfully completed.

SQL> select name from t where id = 25;

NAME

I_CDEF1 Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=11)

1 0 TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=1 Bytes=11)

Since optimizer thinks that for getting to this unique row it requires 4 IO’ s (3 levels of index and one data block) it figures out that full table scan with only 2 IO’s is cheaper.

Statistics

0 recursive calls

0 db block gets

4 consistent gets

1 physical reads

0 redo size

380 bytes sent via SQL*Net to client

503 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

We have only one additional physical read, from trace I saw it was a sequential, single block read of table segment header block. There was no scattered reads probably because Oracle kernel cache layer saw that all table blocks up to segment high water mark are already in cache. If HMW would have been higher, multiblock reads up to it would have occurred.

Let’s make the test more interesting and set costs of both scans equal, by setting index level to 1 (root and leaf levels).

SQL> exec dbms_stats.set_index_stats(ownname=>'ADMIN', indname=>'I', indlevel=>1);

PL/SQL procedure successfully completed.

SQL> select name from t where id = 25;

NAME

I_COL3 Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=11)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=

11)

2 1 INDEX (UNIQUE SCAN) OF 'I' (UNIQUE) (Cost=1 Card=50)

In this case, cost was 2 for both table and index scan but index scan was chosen. Seems that when costs are equal, access path which is better in Rule Based Optimizer sense, is chosen (index over table scan, rowid over index scan). That’s why actual execution path may differ from CBO calculations of BEST_CST in it’s trace file

Ok, let’s “increase” number of leaf blocks now:

SQL> exec dbms_stats.set_index_stats(ownname=>'ADMIN', indname=>'I', indlevel=>1, numlblks=>500, clstfct=>500 );

PL/SQL procedure successfully completed.

SQL> select name from t where id = 25;

NAME

PROXY_ROLE_DATA$ Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=10)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=

10)

2 1 INDEX (UNIQUE SCAN) OF 'I' (UNIQUE) (Cost=1 Card=50)

Even though for CBO the index looks way bigger than table now, index access is still chosen, because optimizer knows that for unique index it only needs to traverse from root block to one single leaf block to get it’s key.

Let’s see what happens with non-unique indexes:

SQL> drop index i;

Index dropped.

SQL> create index i on t(id);

Index created.

SQL> analyze index i compute statistics;

Index analyzed.

SQL> select name from t where id = 25;

NAME

I_ICOL1 Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=10)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=

10)

2 1 INDEX (RANGE SCAN) OF 'I' (NON-UNIQUE) (Cost=1 Card=1)

Index unique scan has changed to index range scan and index access cost has changed to 1, thus with data block fetch summing up to 2. (For unique scan index access cost was 0). Note, that this index has blevel 0, since it’s just built on our 50 rows.

Ok, let’s set index bigger for CBO, two levels (indlevel 1) and number of leaf blocks to 50. Also, clustering factor can’t normally be less than number of leaf blocks, thus has to be adjusted (I actually tried to make it less by using compressed tables and non-compressed indexes but for some reason my experiment didn’t succeed..)

SQL> exec dbms_stats.set_index_stats(ownname=>'ADMIN', indname=>'I', indlevel=>1, numlblks=>50, clstfct=>50);

PL/SQL procedure successfully completed.

SQL> select name from t where id = 25;

NAME

I_ICOL1 Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=10)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=

10)

2 1 INDEX (RANGE SCAN) OF 'I' (NON-UNIQUE) (Cost=1 Card=1)

Even though I set index height to 2 levels using dbms_stats, CBO still finds that index scan costs 1 IO and corresponding data block read also 1 IO. This seems strange, because index traversing cost is considered the same for 1-level and 2-level indexes. Again, this is probably an Oracle trick for handling unique and non-unique scans on small indexes better. Here’s a result of my little experiment with index heights and CBO calculated index traversing costs:

Index height Cost Unique Cost Range (non-unique)

0 0 1

1 1 1 <-------

2 2 3

3 3 4

These are index scan costs only. From the table you see, that for To calculate the whole data fetching cost you have to add 1 for *each* data block read for corresponding key. Note the *each* data block read. This is the place where clustering factor comes into play. Lets set clustering factor a little higher now:

SQL> exec dbms_stats.set_index_stats(ownname=>'ADMIN', indname=>'I', indlevel=>1, numlblks=>50, clstfct=>51);

PL/SQL procedure successfully completed.

SQL> select name from t where id = 25;

NAME

PROXY_ROLE_DATA$ Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=10)

1 0 TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=1 Bytes=10)

We see, that increasing clustering factor probably made index access cost higher than best cost 2 of full table scan. To see the detailed costs, I just put an index hint into the query:

SQL> select /*+ index(t) */ name from t where id = 25;

NAME

PROXY_ROLE_DATA$ Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=10)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=1 Bytes=

10)

2 1 INDEX (RANGE SCAN) OF 'I' (NON-UNIQUE) (Cost=1 Card=1)

The index traversing cost for range scan has remained 1, but table access cost has grown by 1. When number of distinct keys was 50 and also clustering factor was 50, CBO knew that for each key value no more than 1 data block had to be fetched to get data, thus cost of reading appropriate data from table is: ceil(1/NDV*CLUSTERING_FACTOR). NDV means number of distinct values and 1/NDV gives us DENSITY, in other words FILTER FACTOR. Thus the cost=ceil(1/50*50)=1. But as soon as clustering factor raised above 50, then cost of table access also raised to 2. When clustering factor reaches 101, then table access cost will be 3 etc..

What happens if we increase number of leaf blocks as well?

SQL> exec dbms_stats.set_index_stats(ownname=>'ADMIN', indname=>'I', indlevel=>1, numlblks=>51, clstfct=>51);

PL/SQL procedure successfully completed.

SQL> select /*+ index(t) */ name from t where id = 25;

NAME

PROXY_ROLE_DATA$ Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=10)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=4 Card=1 Bytes=

10)

2 1 INDEX (RANGE SCAN) OF 'I' (NON-UNIQUE) (Cost=2 Card=1)

We see that index range scan cost has risen from 1 to 2 as well! This has again to do with number of distinct keys in index. When we had 50 distinct keys and 50 leaf blocks, it was clear that *on average* there is only one leaf block per key value, thus querying by equality condition on a specific key no more than one index leaf block has to be read. But as long as number of leaf blocks (#LB) actually rose higher than number of distinct keys (#DK in 10053 trace), then ceil(1/#DK*#LB) adds 1 to index traversing and leaf block reading cost. The same rule applies, when #LB reaches 101 in our case, the cost will be added 1 etc..

Note that we are still talking about non-unique RANGE scans. But what happens when we add an unique constraint to ID column which has non-unique index?

SQL> alter table t add constraint c unique(id);

Table altered.

SQL> select name from t where id = 25;

NAME

PROXY_ROLE_DATA$ Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=10)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=

10)

2 1 INDEX (RANGE SCAN) OF 'I' (NON-UNIQUE) (Cost=1 Card=50)

Since optimizer knows that contents of ID column and index are definitely unique it doesn’t have to add cost for additional index leaf block scans because in unique index you only have to visit one leaf block to get to your ID (I actually tried playing around with avg row length and avg column length to make unique index leaf block scan more expensive, but with no luck ...)

Just one more experient with disabling constraints:

SQL> alter table t disable constraint c;

Table altered.

SQL> alter table t enable novalidate constraint c;

Table altered.

SQL> select name from t where id = 25;

NAME

PROXY_ROLE_DATA$ Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=10)

1 0 TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=1 Bytes=10)

Oracle now counts full table scan cheaper, because with novalidated constraints it can’t be sure whether all rows in indexed column are unique or not.

SQL> show parameter optimizer

NAME                                 TYPE        VALUE

------------------------------------ ----------- ---------------------------
optimizer_dynamic_sampling           integer     1

optimizer_features_enable            string      9.2.0

optimizer_index_caching              integer     0

optimizer_index_cost_adj             integer     100

optimizer_max_permutations           integer     2000

optimizer_mode                       string      CHOOSE

Here are my optmizer* parameters. Optimizer dynamic sampling didn’t have any effect in my test, because at level 1 it will only sample when tables don’t have statistics (and there’s a join condition if I recall correctly).

Conclusion:

Index access is chosen by CBO for small lookup tables.

especially when the lookup index is unique or there is an unique constraint on lookup column
when non-unique index/constraint then in 9i index is used by default, but in 8i not (the _table_scan_cost_plus_one parameter). If I set optimizer_index_cost_adj to 49, which is quite reasonable value, then index access was used in 8i even when table_scan_cost was left default and non-unique index was used.
The platform used was 9.2.0.1 on Win2k sp3 with 8k blocksize and no special CBO parameters set. db_file_multiblock_read_count was 16, but as I understand, that doesn’t affect costs of scanning so small table like in the experiment.

Since I don’t count myself as an expert on CBO, any additions and corrections are welcome.

As I stated in beginning of my mail, your mileage may vary, but Paul, I’ll be satisfied, when you eat half of your red hat ;)

Cheers,

Tanel. Received on Sun Jul 27 2003 - 13:40:10 CDT