Jonathan Lewis
Everything Changes
I saw a recent tweet (on Bluesky) from SQLDaily highlighting a blog note that Lukas Eder wrote in 2016 with the title: “Avoid using COUNT() in SQL when you could use EXISTS()”. This is a good guideline because it probably maximises the options the optimizer has for producing a good execution plan, but it’s not always necessary in recent versions of Oracle. It depends on what you’re trying to achieve and how you’re using count(), of course, but there are cases where the optimizer is capable of turning a count() subquery into an exists() subquery. Here’s a little demo running on a PDB in 19.11.0.0:
rem
rem Script: count_to_exists.sql
rem Author: Jonathan Lewis
rem Dated: Dec 2024
rem Purpose:
rem
rem Last tested
rem 23.4(ai)
rem 19.11.0.0
rem 12.2.0.1
rem
create table t1 as
select *
from all_objects
where rownum <= 5e4
/
create table t2 as
select *
from t1
where mod(object_id,50) = 0
/
create index t2_i1 on t2(owner);
set linesize 120
set pagesize 60
set trimspool on
set tab off
column object_name format a32
column owner format a32
select owner, count(*)
from t1
group by
owner
order by
count(*)
/
select owner from t1
minus
select owner from t2
/
set feedback only
set serveroutput off
select
owner, object_id, object_name
from
t1
where
t1.owner like 'D%'
and 0 < (
select count(*)
from t2
where t2.owner = t1.owner
)
/
set feedback on
select * from table(dbms_xplan.display_cursor)
/
I’ve included a couple of queries to show the numbers of rows for each owner in t1 and to highlight the owner values that appear in t1 but don’t appear in t2. (I used the results of those queries to pick the ‘D%’ predicate in the final query.) Here’s the execution plan for the final query in 19.11.0.0:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID cwj9t6wzhczgh, child number 0
-------------------------------------
select owner, object_id, object_name from t1 where t1.owner like
'D%' and 0 < ( select count(*) from t2 where t2.owner = t1.owner
)
Plan hash value: 82129279
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 278 (100)| |
|* 1 | HASH JOIN RIGHT SEMI| | 3244 | 167K| 278 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T2_I1 | 144 | 720 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | T1 | 6025 | 282K| 276 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."OWNER"="T1"."OWNER")
2 - access("T2"."OWNER" LIKE 'D%')
filter("T2"."OWNER" LIKE 'D%')
3 - filter("T1"."OWNER" LIKE 'D%')
There’s no sign of any aggregation in the plan – so the count(*) can’t be happening. Unfortunately the optimizer has transformed the subquery out of existence and executed the query as a semi-join. So let’s make it easier to see what’s happened to the count(*) by stopping the optimizer from unnesting the subquery, and let’s use explain plan so that we can see what the complete Predicate Information looks like when the plan operates with a subquery.
explain plan for
select
owner, object_id, object_name
from
t1
where
t1.owner like 'D%'
and 0 < (
select /*+ no_unnest */
count(*)
from t2
where t2.owner = t1.owner
)
/
select * from table(dbms_xplan.display)
/
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3988898734
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 463 | 22224 | 278 (1)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL| T1 | 6025 | 282K| 276 (1)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T2_I1 | 2 | 10 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "T2" WHERE
"T2"."OWNER"=:B1))
2 - filter("T1"."OWNER" LIKE 'D%')
3 - access("T2"."OWNER"=:B1)
As you can see, the optimizer has worked out that “select count(*) > 0” is equivalent to “exists select” and transformed the subquery accordingly; which means it will stop after finding the first matching row.
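For comparison, here’s the explicit exists() version of the subquery – a sketch of the rewrite the optimizer is effectively doing for us (the Predicate Information above shows the transformed text it actually used):
select
owner, object_id, object_name
from
t1
where
t1.owner like 'D%'
and exists (
select /*+ no_unnest */
null
from t2
where t2.owner = t1.owner
)
/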
Conclusion
I started by pointing out that a recent (Bluesky) tweet was echoing a blog note published in 2016. I also agreed that the general principle of using exists() to find the first occurrence rather than using count(*) was a good idea when you didn’t need to know “how many” and just needed to know “are there any”.
However, I then demonstrated that the (Oracle) optimizer is so sophisticated that it can recognise when a count() subquery can be transformed into an exists() subquery – presumably because it gets a “clue” from the comparison with zero.
The point of this blog note, and the moral of the story, is this: when trying to solve a problem by searching the internet, an article published 8 years ago may not give you the most up to date information or the best possible solution.
Consistent?
Here’s a cut-n-paste from a simple SQL*Plus session running under 19.11.0.0 (23.4 produces the same effects):
SQL> drop table t purge;
Table dropped.
SQL> create table t (id number, val number) rowdependencies;
Table created.
SQL> insert into t values(1,0);
1 row created.
SQL> commit;
Commit complete.
SQL> insert into t values(2,0);
1 row created.
SQL> update t set val = 1;
2 rows updated.
SQL> select * from t;
ID VAL
---------- ----------
1 1
2 1
2 rows selected.
Open a second terminal and start an SQL*Plus session that does the following:
SQL> update t set val = val + 1;
This statement is going to hang because the first (and only) row it ought to see at this moment in time is locked by session 1. If session 1 commits then session 2 should detect that it’s trying to update a row that has changed since the start of the transaction and rollback to restart so that it can do an update that is self-consistent as at a specific point in time.
Question – how many rows will session 2 update?
Here’s the cut-n-paste from session 2 that appeared as soon as session 1 committed:
1 row updated.
SQL> select * from t;
ID VAL
---------- ----------
1 2
2 1
2 rows selected.
This doesn’t look right. At one point in time the committed data in the table was the single row (1,0), and at another point in time the committed data in the table was the pair of rows (1,1) and (2,1). How can a single, self-consistent update statement operate as at a single point in time and leave the table holding the rows you see above? Surely the statement should either fail leaving val = 1 in both rows, or it should succeed consistently leaving val = 2 in both rows.
As a further detail, note how I’ve enabled rowdependencies on the table. (This is something I usually do whenever I want to take advantage of the flashback version query feature though it’s not a necessity in this case). Here’s what I see when I commit in session 2 then query the entire versioning history of the table:
column versions_startscn format 99,999,999,999,999
column versions_endscn format 99,999,999,999,999
column versions_starttime format a24
column versions_endtime format a24
select
versions_starttime,
versions_startscn,
versions_endtime,
versions_endscn,
versions_operation,
versions_xid,
id, val
from t versions between scn minvalue and maxvalue
/
VERSIONS_STARTTIME VERSIONS_STARTSCN VERSIONS_ENDTIME VERSIONS_ENDSCN V VERSIONS_XID ID VAL
------------------------ ------------------- ------------------------ ------------------- - ---------------- ---------- ----------
09-DEC-24 02.38.22 PM 12,670,482,437,540 U 04001B004F600000 1 2
09-DEC-24 02.35.52 PM 12,670,482,437,468 I 06000000DA740000 2 1
09-DEC-24 02.35.52 PM 12,670,482,437,468 09-DEC-24 02.38.22 PM 12,670,482,437,540 U 06000000DA740000 1 1
09-DEC-24 02.35.52 PM 12,670,482,437,468 1 0
4 rows selected.
Surprisingly there is no “insert transaction” visible for (1,0), but, reading from the bottom upwards we can see that (1,0) ceased to exist at SCN 12,670,482,437,468 and that transaction 06000000DA740000 updated that row to (1,1) and inserted row (2,1). What we’re seeing as “versions” are the “versions as at commit”, of course, rather than the results of all the individual modifications (which is what you can get from Log Miner).
If we look at the (1,1) row again we can see that that version ceased to exist at SCN 12,670,482,437,540 and the top line of the output tells us that that’s thanks to transaction 04001B004F600000 changing it to (1,2). That’s our update from the second session that we believe should have updated every existing row in the table by adding 1 to the val column.
Finally, when we look at the row for (2,1) we see that it came into existence at the same time (SCN 12,670,482,437,468) as row (1,1) but was not changed by the transaction that “happened” a little later and changed (1,1) to (1,2).
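If you want to double-check any of the intermediate images, a flashback “as of scn” query using one of the SCNs reported above should reproduce the committed content of the table at that moment (assuming the undo is still available) – a minimal sketch:
select id, val
from t as of scn 12670482437540
order by id
/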
This behaviour first came to my attention in a comment to a note I wrote in 2011 about an upgrade from 11.1.0.6 to 11.2.0.2. The sample I’ve shown above is taken from that comment, modified slightly to separate the table creation from the first insert (just in case create as select introduced some peculiar special effects) and there are other examples and observations in the comments following the one I’ve linked to. The anomaly described in the actual blog note was addressed in 11.2.0.3 and 12.1.0.2 but, as you can see, there are other anomalies still waiting to be addressed.
Fetch size
Part of my effort to clear my long backlog of drafts
This is a trivial script I wrote many years ago – the date on the draft (one of the oldest on my blog) was August 2009 – but the script itself must have started life in 8i, if not earlier.
All the script does is execute a simple-minded test to find SQL that may be wasting resources and causing contention by using “small” array fetches to return “large” amounts of data. It does this through a scan of v$sql or (for later versions of Oracle) v$sqlstats or v$sqlstats_plan_hash to compare execution counts, number of fetch calls, and total rows returned.
The original code – probably written for 8i – used v$sql and reported the first_load_time for each child cursor but an update to the script took advantage of the introduction in 10g of the view v$sqlstats and started reporting the last_active_time – which means you could choose to ignore (temporarily, perhaps) any statements that had not been active in the very recent past.
I’ve tended to avoid the view v$sqlarea because, in its earliest form, it was simply an aggregate view of the x$ structure behind v$sql. When v$sqlstats appeared its contents were similar to those of v$sqlarea but its published implementation was a simple query against a different x$ so that’s the one I started using (having failed to notice that the implementation for v$sqlarea had changed to a simple query against yet another x$ at about the same time). Both x$ structures are created on the fly into the PGA so they’re probably doing similar amounts of aggregation under the covers, although with many differences in the details they now report – including a number of “deltas since last AWR snapshot” in the newer view.
rem
rem Script: small_fetch.sql
rem Author: Jonathan Lewis
rem Dated: (pre) Aug 2009
rem
set linesize 156
set pagesize 40
set trimspool on
set tab off
break on row skip 1
column first_load_time format a20
column last_active_time format a20
column sql_text wrap word
spool small_fetch
select
--
-- v$sql.parsing_user_id,
-- v$sql.parsing_schema_id,
-- v$sql.first_load_time,
--
to_char(last_active_time,'yyyy-mm-dd/hh24:mi:ss') last_active_time,
sql_id,
plan_hash_value,
executions,
end_of_fetch_count,
fetches,
rows_processed,
round(rows_processed/nullif(executions,0),0) rows_per_exec,
round(rows_processed/nullif(fetches,0),0) fetchsize,
sql_text
from
v$sqlstats
-- v$sql
-- v$sqlstats_plan_hash
where
plan_hash_value != 0 -- pl/sql block
-- and (v$sql.parsing_user_id != 0 or v$sql.parsing_schema_id != 0)
and fetches > 25000
and round(rows_processed/nullif(executions,0),0) > 5000
and round(rows_processed/nullif(fetches,0),0) < 50
order by
fetches desc
;
spool off
The numbers I’ve used in the SQL above are fairly arbitrary; your choices for “large” and “small” will almost certainly be different.
The original script is probably an ad hoc statement that I wrote in a few minutes on a client site to poke around the SQL being executed by a system that was showing network problems and surprisingly large numbers of waits for “SQL*Net message to client”, so I’ve published it more as an example of the type of thing you could do than an example of “best practice”. It may be helpful, but your specific circumstances and requirements may make you want to change the choice of view you use, the column you report, and the predicates you use to filter the results.
You’ll notice from the code that v$sql includes the parsing_user_id and parsing_schema_id, but v$sqlstats doesn’t. A convenience of using v$sql is that you can eliminate any statements parsed by SYS simply by adding the predicate “parsing_user_id != 0”. The reason why this is convenient is that a lot of the sys-recursive (largely dictionary cache) queries do single row fetches and you don’t want to clutter your report with things you can’t change.
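Here, as a sketch rather than a tested production script, is the sort of variation you might run against v$sql if you wanted to exclude the sys-recursive statements – the thresholds are just as arbitrary as before:
select
to_char(last_active_time,'yyyy-mm-dd/hh24:mi:ss') last_active_time,
sql_id,
plan_hash_value,
executions,
fetches,
rows_processed,
round(rows_processed/nullif(fetches,0),0) fetchsize,
sql_text
from
v$sql
where
plan_hash_value != 0 -- pl/sql block
and (parsing_user_id != 0 or parsing_schema_id != 0)
and fetches > 25000
and round(rows_processed/nullif(executions,0),0) > 5000
and round(rows_processed/nullif(fetches,0),0) < 50
order by
fetches desc
;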
A reason why v$sqlstats might be your preferred choice is that it has a large number of “delta_statistic” columns – which give you counts and times since the last AWR snapshot (though how that works if you’re not licenced to run the AWR, I haven’t checked).
Caution
The header on the script suggests that the last system I used (and edited) it on was 11.1.0.7 – so before container databases came into existence. There are all sorts of columns in the views I’ve mentioned that might be very helpful, but there’s also the potential for traps relating to which schema can see what data, and from which PDB. Here, for example, is a cut-n-paste showing something that looks like a bug when running as SYS in a PDB on an instance running 19.11.0.0: note particularly how I’ve changed the view in the three queries from v$sql to v$sqlstats then back to v$sql.
SQL> select con_id, count(*) from (select sql_id, con_id, count(*) ct from V$sql group by sql_id, con_id ) group by con_id;
CON_ID COUNT(*)
---------- ----------
3 1282
1 row selected.
SQL> select con_id, count(*) from (select sql_id, con_id, count(*) ct from V$sqlstats group by sql_id, con_id ) group by con_id;
CON_ID COUNT(*)
---------- ----------
3 1395
1 row selected.
SQL> select con_id, count(*) from (select sql_id, con_id, count(*) ct from V$sql group by sql_id, con_id ) group by con_id;
CON_ID COUNT(*)
---------- ----------
3 1282
1 row selected.
What can v$sqlstats see that v$sql can’t see?
Optimizer Environment 2
Many years ago I wrote a short note to introduce the view v$sql_optimizer_env (and the session and system equivalents) which report the current values of optimizer related parameters (some of which are not found in the startup file) for a cursor child, or a session, or the system. I also mentioned the underlying x$kqlfsqce (for v$sql_optimizer_env) that held far more parameters than were reported in the v$ view.
Recently I commented on a question about different plans (i.e. child cursors) for the same query for different sessions and noted that changing the optimizer_features_enable might not change any of the visible (v$) parameters – other than optimizer_features_enable itself – so it might be necessary to check the x$ to find out what had changed. Here’s a possible query to do that, followed by a few lines of output:
rem
rem Script: optimizer_env_b.sql
rem Author: Jonathan Lewis
rem Dated: Oct 2020
rem Purpose:
rem
-- select count(*) from all_users;
define m_sql_id = 'b2zqhgr5tzbpk'
spool optimizer_env_b.lst
select
kqlfsqce_pname,
kqlfsqce_chno,
kqlfsqce_pvalue,
kqlfsqce_flags
from
x$kqlfsqce
where
kqlfsqce_sqlid = '&m_sql_id'
order by
kqlfsqce_pname,
kqlfsqce_chno
/
spool off
KQLFSQCE_PNAME KQLFSQCE_CHNO KQLFSQCE_PVALUE KQLFSQCE_FLAGS
-------------------------------------------------- ------------- ------------------------- --------------
BlockChain_ledger_infrastructure 0 0 10
BlockChain_ledger_infrastructure 1 0 10
CLI_internal_cursor 0 0 10
CLI_internal_cursor 1 0 10
PMO_altidx_rebuild 0 0 2
PMO_altidx_rebuild 1 0 2
_adaptive_window_consolidator_enabled 0 true 7
_adaptive_window_consolidator_enabled 1 true 7
_add_stale_mv_to_dependency_list 0 true 6
_add_stale_mv_to_dependency_list 1 true 6
_aggregation_optimization_settings 0 0 7
_aggregation_optimization_settings 1 0 7
_allow_level_without_connect_by 0 false 6
_allow_level_without_connect_by 1 false 6
...
uniaud_internal_cursor 0 0 10
uniaud_internal_cursor 1 0 10
workarea_size_policy 0 auto 2
workarea_size_policy 1 auto 2
1226 rows selected.
The script allows you to set an sql_id to report – the value I’ve used is for the statement “select count(*) from all_users”. The query as it stands has to be run by sys, but you could create a view (say x_$kqlfsqce) in the sys schema, grant select on that view to public, then create a public synonym for it so that everyone else can use the same query as sys:
create or replace view x_$kqlfsqce as select * from x$kqlfsqce;
grant select on x_$kqlfsqce to public;
create public synonym x$kqlfsqce for x_$kqlfsqce;
The drawback to this query is clear in the final lines of the output – in 19.11 it returns 613 rows per child cursor, leaving you to analyze (by eye?) the entire list to find the parameter names where the parameter values differ. A better query would report only those parameters where the values differed (ideally listing the values side by side, but that’s another story). For example:
rem
rem Script: optimizer_env_2.sql
rem Author: Jonathan Lewis
rem Dated: Oct 2020
rem
-- select count(*) from all_users;
define m_sql_id = 'b2zqhgr5tzbpk'
spool optimizer_env_2.lst
with child_count as (
select
/*+ materialize */
count(distinct kqlfsqce_chno) child_count
from
x$kqlfsqce
where
kqlfsqce_sqlid = '&m_sql_id'
)
select
kqlfsqce_chno,
kqlfsqce_pname,
kqlfsqce_pvalue,
kqlfsqce_flags
from
x$kqlfsqce
where
kqlfsqce_sqlid = '&m_sql_id'
and (kqlfsqce_pnum, kqlfsqce_pvalue) in (
select
kqlfsqce_pnum, kqlfsqce_pvalue
from x$kqlfsqce
where kqlfsqce_sqlid = '&m_sql_id'
group by
kqlfsqce_pnum, kqlfsqce_pvalue
having count(*) != (select child_count from child_count)
)
order by
kqlfsqce_pname,
kqlfsqce_chno
/
spool off
KQLFSQCE_CHNO KQLFSQCE_PNAME KQLFSQCE_PVALUE KQLFSQCE_FLAGS
------------- -------------------------------------------------- ------------------------- --------------
0 _bloom_filter_setops_enabled true 7
1 _bloom_filter_setops_enabled false 7
0 _bloom_pruning_setops_enabled true 7
1 _bloom_pruning_setops_enabled false 7
0 _cell_offload_grand_total true 7
1 _cell_offload_grand_total false 7
0 _cell_offload_vector_groupby_fact_key true 7
1 _cell_offload_vector_groupby_fact_key false 7
...
0 _sqlexec_reorder_wif_enabled true 7
1 _sqlexec_reorder_wif_enabled false 7
0 optimizer_features_enable 19.1.0 2
1 optimizer_features_enable 12.2.0.1 0
58 rows selected.
For this example I ran my query against all_users twice, but changed the optimizer_features_enable from 19.1.0 (the default for 19.11) to 12.2.0.1 for the second execution, so I got two child cursors. As you can see from the output the number of optimizer parameters that changed was 29 (half of 58) and the only parameter that would have been visible in v$sql_optimizer_env was optimizer_features_enable itself.
The SQL identifies the duplicates by initially counting the number of child cursors for the sql_id – because you might have more than just two, and knowing the number of child cursors is the only way of identifying which parameters have different values for different child cursors.
The fact that (in general) you won’t necessarily know how many child cursors you have to compare and report is what makes it a little harder to write a query that reports variations in values across the page – so it’s left as an exercise for the reader. Personally I’d be inclined to add a clause “kqlfsqce_chno in (N, M)”, picking child cursors with different execution plans, to limit the number to just two interesting child cursors and stick with reporting down the page.
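As a sketch of that idea (child numbers 0 and 1 here are just placeholders for the pair you pick), the restriction might look like this:
select
kqlfsqce_chno,
kqlfsqce_pname,
kqlfsqce_pvalue
from
x$kqlfsqce
where
kqlfsqce_sqlid = '&m_sql_id'
and kqlfsqce_chno in (0, 1)
and (kqlfsqce_pnum, kqlfsqce_pvalue) in (
select kqlfsqce_pnum, kqlfsqce_pvalue
from x$kqlfsqce
where kqlfsqce_sqlid = '&m_sql_id'
and kqlfsqce_chno in (0, 1)
group by kqlfsqce_pnum, kqlfsqce_pvalue
having count(*) != 2
)
order by
kqlfsqce_pname,
kqlfsqce_chno
/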
Addendum
In case you’re wondering about the optimizer parameters that aren’t startup parameters, here’s a quick query to run as SYS. (The output is from 19.11 again):
select
distinct kqlfsqce_pname
from
x$kqlfsqce
where
kqlfsqce_sqlid = '&m_sql_id'
minus
select
ksppinm
from
x$ksppi
/
KQLFSQCE_PNAME
--------------------------------------------------------------------------------
BlockChain_ledger_infrastructure
CLI_internal_cursor
PMO_altidx_rebuild
_enable_pmo_ctas
_fix_control_key
_is_lock_table_for_ddl_wait_lock
_is_lock_table_for_split_merge
_long_varchar_allow_IOT
_multi_commit_global_index_maint
_nls_forced_binary
_suppress_scn_chk_for_cqn
_update_bji_ipdml_enabled
_update_global_index_from_part_exchange
_use_hidden_partitions
advanced_queuing_internal_cursor
flashback_data_archive_internal_cursor
flashback_table_rpi
ilm_access_tracking
ilm_dml_timestamp
ilm_filter
iot_internal_cursor
is_recur_flags
kkb_drop_empty_segments
only_move_row
optimizer_features_hinted
optimizer_mode_hinted
parallel_autodop
parallel_dblink
parallel_ddl_forced_degree
parallel_ddl_forced_instances
parallel_ddl_mode
parallel_ddldml
parallel_degree
parallel_dml_forced_dop
parallel_dop_doubled
parallel_execution_enabled
parallel_hinted
parallel_max_degree
parallel_query_default_dop
parallel_query_forced_dop
parallel_query_mode
partition_pruning_internal_cursor
rim_node_exist
sql_from_coordinator
sqlstat_enabled
total_cpu_count
total_processor_group_count
transaction_isolation_level
uniaud_internal_cursor
49 rows selected.
Following up my announcement of this note on Bluesky someone asked if this information appeared in the 10053 (CBO) trace file. The answer is yes. Under a heading “PARAMETERS USED BY THE OPTIMIZER” there are three sub-headings (also in capitals, but made more readable below):
- Parameters with altered values
- Parameters with default values
- Parameters in opt_param hint
The first two of these sub-headings share the full set of optimizer parameters, with no duplicates across the two lists. The last sub-heading introduces duplication, listing the parameters where an opt_param() hint has over-ridden the default (system) or altered (session) settings.
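If you want to generate that trace file for a statement, the traditional approach is the 10053 event – a minimal sketch (remember the statement has to be hard parsed while the event is set, so you may need to modify the text slightly or flush it from the shared pool first):
alter session set events '10053 trace name context forever, level 1';
-- hard parse the statement of interest
select count(*) from all_users;
alter session set events '10053 trace name context off';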
WWSS
What would Shakespeare say?
- Q: So, Mr. Shakespeare, which RDBMS do you use to look up your “famous quotes”?
- A: My lords, I refer me to the Oracle.
- Q: And how would you feel about migrating to SQL Server or DB2?
- A: A plague a’ both your houses.
- Q: So what is your basic requirement for finding a good quote to use?
- A: If it were done when ’tis done, then ’twere well it were done quickly.
- Q: Do you think that using multiple block sizes in your database would improve performance?
- A: It is a tale told by an idiot full of sound and fury signifying nothing.
- Q: The CBO has a big part to play in performance – what makes it so difficult to understand?
- A: Her infinite variety.
- Q: And what would you say to a DBA who claims to understand the CBO?
- A: There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.
- Q: Do you think it’s a good idea to use hints when the CBO misbehaves?
- A: Though this be madness, yet there is method in ‘t.
- Q: And how much faith do you have in the execution plans that the CBO generates?
- A: Oft expectation fails, and most oft there Where most it promises.
- Q: NULLs can, of course, make things hard for the CBO – what’s your view on them?
- A: Nothing can come of nothing.
- Q: And how do you interpret NULLs?
- A: To be or not to be, that is the question.
- Q: Do you document the SQL statements you write?
- A: It is a custom more honor’d in the breach than the observance.
- Q: What is your experience with Oracle support?
- A: The course of true love never did run smooth.
- Q: And how do you think support will be affected by the introduction of AI?
- A: More matter with less art.
- Q: Are we going to see 23ai installed at Anne Hathaway’s cottage any time soon?
- A: Expectation is the root of all heartache.
- Q: And finally, what do you make of Sym42 and the MASH program?
- A: O brave new world that has such people in it.
4096 Columns
If you thought 1,000 columns was over the top for a table in Oracle, then you’ll love 23ai which allows you to stretch that to 4,096 columns if you’ve set the parameter max_columns = extended in the spfile (or the pdb equivalent).
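Based on that description, enabling the feature might look something like the following sketch – it’s a static parameter, so it needs an spfile change and an instance restart, and you should check the documentation for the exact pdb-level requirements:
alter system set max_columns = extended scope = spfile;
shutdown immediate
startup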
Naturally I was curious to see if there had been any significant structural changes to the way Oracle stored a row when the number of columns became very large, so I took the code I had written to examine the case of 1,000 columns, and hacked it around a little bit.
rem
rem Script: wide_table_4096.sql
rem Author: Jonathan Lewis
rem Dated: 7th April 2023
rem
rem Last tested
rem 23.4.25.4 (23ai)
rem
rem See also
rem https://jonathanlewis.wordpress.com/2015/02/19/255-columns/
rem https://jonathanlewis.wordpress.com/2015/05/18/migrated-rows/
rem
set pagesize 0
set feedback off
drop table if exists t1 purge;
spool temp.sql
prompt create table t1(
select
'col' || to_char(rownum,'fm0000') || ' varchar2(10),'
from
all_objects
where rownum <= 4095
;
prompt col4096 varchar2(10)
prompt )
prompt /
spool off
@temp
insert /*+ append */ into t1 (col0001, col0002, col0236,col0491, col0746, col4096)
values ('0001','0002', '0236','0491','0746','4096')
;
commit;
set pagesize 40 linesize 132 feedback on
-- column ref_file_no new_value m_fileno
column abs_file_no new_value m_fileno
column block_no new_value m_blockno
column value new_value m_trace_file
spool wide_table_4096
select
dbms_rowid.rowid_to_absolute_fno(rowid, user, 'T1') abs_file_no,
dbms_rowid.rowid_relative_fno(rowid,'BIGFILE') rel_file_no,
dbms_rowid.rowid_relative_fno(rowid,'BIGFILE') big_rel_file_no,
dbms_rowid.rowid_relative_fno(rowid,'SMALLFILE') small_rel_file_no,
dbms_rowid.rowid_relative_fno(rowid) def_rel_file_no,
dbms_rowid.rowid_block_number(rowid,'BIGFILE') block_no,
dbms_rowid.rowid_block_number(rowid,'SMALLFILE') small_block_no,
dbms_rowid.rowid_block_number(rowid) def_block_no
from
t1
/
alter system flush buffer_cache;
alter system dump datafile &m_fileno block &m_blockno;
set heading off
-- select payload from my_trace_file;
select payload
from v$diag_trace_file_contents
where
adr_home = (select value from V$diag_info where name = 'ADR Home')
and trace_filename = (
select substr(
value,
2 + ( select length(value)
from v$diag_info
where name = 'Diag Trace'
)
)
from v$diag_info
where name = 'Default Trace File'
)
/
spool off
The code is simple – I’ve used SQL to generate the SQL to create a table of 4,096 columns, then I’ve inserted a single row into the table, populating the first column, the last column, and a few in between. Then I’ve used a few calls to functions in the dbms_rowid package to convert the rowid of that row into a file and block number. When I installed 23ai and created my “user” tablespace it defaulted to a bigfile tablespace, but I’ve left a few variations of the calls in place to suit different configurations.
After finding the file and block number where the row starts I’ve dumped the entire block to my trace file and then read the tracefile back and spooled it to a flat file. My account is a “normal” account, but I’ve given it privileges to read various dynamic performance views – you might prefer to create a view and public synonym in the sys schema and grant select to public on the view. (See: https://jonathanlewis.wordpress.com/2019/10/03/trace-files-2/)
I’m not going to show you the symbolic dump of the block, instead I’ll show you a few egrep commands and their output:
******
****** Number of rows in the block:
******
[jonathan@localhost working]$ egrep nro wide_table_4096.lst
nrow=17
0xe:pti[0] nrow=17 offs=0
******
****** Number of columns in each row piece, and the row number of each row piece:
******
tab 0, row 0, @0x1e4a
tl: 262 fb: -----L-- lb: 0x1 cc: 255
--
tab 0, row 1, @0x1d42
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 2, @0x1c3a
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 3, @0x1b32
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 4, @0x1a2a
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 5, @0x1922
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 6, @0x181a
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 7, @0x1712
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 9, @0x1502
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 10, @0x13fa
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 11, @0x12f2
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 12, @0x11ea
tl: 264 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 13, @0x10de
tl: 268 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 14, @0xfd2
tl: 268 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 15, @0xec6
tl: 268 fb: -------- lb: 0x1 cc: 255
--
tab 0, row 16, @0xea5
tl: 33 fb: --H-F--- lb: 0x1 cc: 16
******
****** First few columns of row 16 (17th row in the block)
******
[jonathan@localhost working]$ egrep -A+16 "row 16" wide_table_4096.lst
tab 0, row 16, @0xea5
tl: 33 fb: --H-F--- lb: 0x1 cc: 16
nrid: 0x00002275.f
col 0: [ 4] 30 30 30 31
col 1: [ 4] 30 30 30 32
col 2: *NULL*
col 3: *NULL*
The row has been stored as 17 separate row pieces with a maximum of 255 columns per row piece. (Note: 16 * 255 + 16 = 4096.) The final egrep output shows you that the starting columns of the row are stored as the 17th row. The values of col 0 and col 1 are one clue, but the --H-F--- flags (header, first) are the technical indicator. Each row piece points backwards to the next row piece in the chain, ending with row 0 where the -----L-- flag tells us it’s the last row piece of the chain.
So, no great difference in the treatment of tables with very large numbers of columns – it’s just lots of row pieces of up to 255 columns, where the row is (initially) split up into 255 column chunks working from the end of the row, potentially leaving you with a very small number of columns in the first row piece.
There was one significant difference in the block dump that I haven’t shown yet. The ITL (interested transaction list) ended up with 5 entries as a result of the single insert.
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0009.01c.0000033c 0x00000103.0102.2a --U- 17 fsc 0x0000.0128d402
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x03 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000000000000000
0x04 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000000000000000
0x05 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000000000000000
I didn’t know if this was new, or something I had missed when the limit on the number of columns changed from 255 to 1,000, but the number of ITL entries varied according to the number of row pieces created during my single row insert. Repeating the test but changing the “last column populated” I got the following results:
- One row piece (col0255) – itc = 2, Lck = 1
- Two row pieces (col0256) – itc = 3, Lck = 2
- Three row pieces (col0512) – itc = 4, Lck = 3
- Four row pieces (col0768) – itc = 5, Lck = 4
- Five row pieces (col1024) – itc = 5, Lck = 5 — i.e. same itc as 4096 columns/17 row pieces
Of course the “coincidence” that the ITL stopped growing at the fourth row piece made me decide to check back to an older version of Oracle (19.11.0.0) to see what happened where the limit was 1,000 columns – and it turned out that 19c also created extra ITL entries in the same way, up to the same limit. (I can’t help wondering whether this limit was “accidentally” carried forward to 23ai or whether there was a deliberate decision that 4 “spare” ITL slots was the best compromise between ITL waits and unnecessary space use.)
Footnote
There are probably several more questions to ask about features that might (or might not) adapt to the raising of the column limit. For example 11g had a limit of 20 (or ceil(column_count / 10)) for the number of extended stats definitions you could create for a single table when the maximum number of columns allowed was 1,000. Might this have changed for 23ai? (Initial answer: No. Longer answer: to be investigated – could you really get 370 sets of extended stats if you started with 3,700 columns in a table, or would something break at (say) 256 sets?)
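For reference, a quick way to start poking at that limit is to create a column group on the wide table and count the definitions – a sketch only, using columns from the table created above:
select dbms_stats.create_extended_stats(user, 'T1', '(COL0001,COL0002)') from dual;
select count(*) from user_stat_extensions where table_name = 'T1';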
Audit Cleaning #JoelKallmanday
This note is an observation of an odd little detail that’s nice to know about the way that Unified Audit (see part 8 of Neil Chandler’s series on Fundamental Security) handles deleting old data. I don’t have a lot of background experience with Unified Audit, but I was surprised by some comments on a recent thread on the MOSC database forum (needs an account) about problems of chained rows and the difficulty of reclaiming space from the “audit tablespace”, so I spent a couple of hours looking at the tables, views and procedures to see if there were any pre-emptive strategies that would avoid the space management issue. All the comments below are based on a single pdb running under 19.11.
The unified audit data is kept (in recent versions of Oracle) in an interval-partitioned table called aud$unified, owned by user audsys. The partitioning is on a column called event_timestamp – declared as timestamp(6) not null. The default interval is one month but you can change this by calling the procedure dbms_audit_mgmt.alter_partition_interval – and for some people it would be a very good idea to do so.
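If, for example, you clean the audit trail daily then a daily partitioning interval might be more appropriate – a sketch of the call (check the documentation for the allowed frequency values):
begin
dbms_audit_mgmt.alter_partition_interval(
interval_number => 1,
interval_frequency => 'DAY'
);
end;
/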
The forum thread included the complaint that the segment space advisor had found a lot of chained rows, and that a rebuild could remove 49 percent of them. Since the sum of the declared lengths of the table’s columns is around 70KB and there are 3 CLOB columns declared as “enable storage in row”, I don’t think anyone should be surprised to see a lot of chained rows. (And I wouldn’t trust the advisor to do a very good job estimating what might happen with the CLOBs.)
Here’s a simple query to report the partitions, with high_value for the audit table:
select
partition_name, blocks, num_rows, high_value
from
dba_tab_partitions
where
table_name = 'AUD$UNIFIED'
order by
partition_position desc
/
There are all sorts of things you’re not allowed to do with it – even if you’re connected as sys you should only be using the management package to handle it.
delete from audsys.aud$unified where event_timestamp < timestamp'2022-06-14 00:00:00'
*
ERROR at line 1:
ORA-46385: DML and DDL operations are not allowed on table "AUDSYS"."AUD$UNIFIED".
The thinking behind the audit table seems to be that you should be checking it on a regular basis then archiving (or just deleting) the data you’ve checked so that the audit table doesn’t become enormous. To this end Oracle allows you to set a “last archived” timestamp, which you can use in a call to the procedure dbms_audit_mgmt.clean_audit_trail. You can check the current value of this timestamp by querying view dba_audit_mgmt_last_arch_ts.
SQL> select last_archive_ts
2 from dba_audit_mgmt_last_arch_ts
3 /
LAST_ARCHIVE_TS
---------------------------------------------------------------------------
01-MAY-22 12.00.01.000000 AM +00:00
1 row selected.
There is also a function dbms_audit_mgmt.get_last_archive_timestamp() to return the timestamp, but this didn’t seem to work on my 19.11 instance. (The procedure set_last_archive_timestamp() does work.)
There are procedures in the package that let you automate this cleaning, but the point(s) I want to make are things I got to by doing some simple tests calling each step of the process in turn. The oldest few rows in my audit table currently look like this:
PARTITION_NAME BLOCKS NUM_ROWS HIGH_VALUE
---------------------- ---------- ---------- ----------------------------------------
...
SYS_P28453 50 701 TIMESTAMP' 2022-11-01 00:00:00'
SYS_P27945 43 683 TIMESTAMP' 2022-10-01 00:00:00'
SYS_P20893 88 1328 TIMESTAMP' 2022-09-01 00:00:00'
SYS_P20176 110 1766 TIMESTAMP' 2022-08-01 00:00:00'
SYS_P20025 244 3318 TIMESTAMP' 2022-07-01 00:00:00'
SYS_P17365 103 1684 TIMESTAMP' 2022-06-01 00:00:00'
Assume I want to delete everything from the last 4 partitions; I could execute the following anonymous block:
declare
ts timestamp;
begin
dbms_audit_mgmt.set_last_archive_timestamp(
dbms_audit_mgmt.AUDIT_TRAIL_UNIFIED,
timestamp '2022-09-01 00:00:00'
);
select last_archive_ts
into ts
from dba_audit_mgmt_last_arch_ts
;
dbms_output.put_line('Set: ' || ts);
dbms_audit_mgmt.clean_audit_trail(
audit_trail_type => dbms_audit_mgmt.AUDIT_TRAIL_UNIFIED,
use_last_arch_timestamp => true -- default
);
end;
/
Set: 01-SEP-22 12.00.00.000000 AM
PL/SQL procedure successfully completed.
When I check dba_tab_partitions again, the end of the list is now as follows:
PARTITION_NAME BLOCKS NUM_ROWS HIGH_VALUE
---------------------- ---------- ---------- ----------------------------------------
...
SYS_P28453 50 701 TIMESTAMP' 2022-11-01 00:00:00'
SYS_P27945 43 683 TIMESTAMP' 2022-10-01 00:00:00'
SYS_P20893 88 1328 TIMESTAMP' 2022-09-01 00:00:00'
Oracle has dropped three of the 4 partitions – although it has kept the partition with the high_value of ‘2022-09-01’, which is now empty; that, after all, is what I was asking for when I gave Oracle that timestamp. Looking at the trace file – I can see (using a suitable egrep command) three calls (driven by calls to dbms_pdb_exec_sql()) to drop an explicitly named partition followed by this delete statement:
delete from audsys.aud$unified where event_timestamp < :1 and (dbid = :2 or dbid = 0)
So Oracle has been very clever, but not quite clever enough, when removing data from the audit table; it drops any partition whose high_value is (strictly) less than the timestamp you supply and then, if necessary, uses a delete on just the one partition that has become the oldest partition.
So, a couple of suggestions for cleaning the audit trail (subject to you checking that I haven’t missed any important details, and any variations due to version of Oracle).
- If you plan to set up a regular call to clean the audit table it might be a very good idea to match the partitioning interval with the schedule so that you never have to remove part of the data from a partition (i.e. you want to avoid running a big delete)
- When you see what goes on with the last_archive_timestamp in my tests, it looks like a good idea to ensure that the value you pass to the cleaning procedure is a “high_value plus one (micro)second” (did you spot the “spare” second in my query result on dba_audit_mgmt_last_arch_ts?) so that you drop every relevant partition rather than deleting every single row from the new “oldest” partition – see the sketch after this list.
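To illustrate that second point, here’s a variation of the earlier anonymous block (a sketch, not carefully tested) that adds one microsecond to the partition high_value so that the ‘2022-09-01’ partition would also be dropped rather than emptied by a delete:
begin
dbms_audit_mgmt.set_last_archive_timestamp(
dbms_audit_mgmt.AUDIT_TRAIL_UNIFIED,
timestamp '2022-09-01 00:00:00.000001'
);
dbms_audit_mgmt.clean_audit_trail(
audit_trail_type => dbms_audit_mgmt.AUDIT_TRAIL_UNIFIED,
use_last_arch_timestamp => true
);
end;
/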
The delete statement is a little puzzling – how can you drop one partition but think you need to check the dbid before deleting the data from another partition? Maybe the apparent anomaly is a sign that Oracle is reusing a delete statement that would have to replace the drop commands if you had a setup where multiple PDBs were writing to the same (CDB) audit table.
You’ll notice that I’ve passed a (symbolic) parameter value of dbms_audit_mgmt.AUDIT_TRAIL_UNIFIED to the calls to the packaged procedures. If you haven’t moved all the different audit trails (the “legacy” audit, the FGA audit, and any others) into the unified audit there are other parameter values for cleaning the other trails.
Questions appear quite frequently about how to do arithmetic or comparisons with the high_value column from dba_tab_partitions – the problem has disappeared in very recent versions of Oracle, but dbms_audit_mgmt has an interesting utility function: get_part_highval_as_char().
I spoke too soon – the only table that this function will work for is audsys.aud$unified. Unless you’re on 23c (with CLOB and JSON versions of the column) you’ll have to fall back to something like the method I showed in a post I wrote earlier this year.
SQL> select dbms_audit_mgmt.get_part_highval_as_char('SYS_P31737') from dual;
DBMS_AUDIT_MGMT.GET_PART_HIGHVAL_AS_CHAR('SYS_P31737')
--------------------------------------------------------------------------------
2023-04-01 00:00:00
1 row selected.
This is a note about a topic that I’m not really familiar with, written rather quickly, and not yet reviewed for grammar and typing errors.
IN / NOT IN
A question appeared on the Oracle developer forum recently (Oct 2024) that could be answered by a blog note that I started drafting about 18 months ago after answering a very similar question that had appeared in the same forum. Take a look at the following query, with results, about a status column in a fairly large dataset:
select status, count(*)
from my_table
group by status
order by 2 desc;
STATUS COUNT(*)
-------------------------
STATUS_7 10580209
STATUS_6 23902
STATUS_5 1504
STATUS_4 48
STATUS_3 17
STATUS_2 8
STATUS_1 5
As you can see, almost all the data is at ‘STATUS_7’ or ‘STATUS_6’. The user wanted some code to report the rows that were not in these two statuses, which leads to one of two possible constructs:
select message_id
from my_table
where status in ('STATUS_1', 'STATUS_2', 'STATUS_3', 'STATUS_4', 'STATUS_5')
;
select message_id
from my_table
where status not in ('STATUS_6', 'STATUS_7')
;
Thanks to an index on status the first query returned its results in 0.05 seconds, while the second query had to do a full table scan which took 6 seconds to complete. The problem the user faced was that other status values might appear in the future, so the slow query was the safe one because it would never need to be edited. So how do you get that “not in” query to run as quickly as the “in” query?
The first thing to check, of course, is how much freedom you have to modify the database and the code; it’s also important to check that any suggestion you make will produce the same result set. (It’s also a reasonably safe bet (on any forum) that the user has simplified the requirement in a way that means any suggested solution will need some refinement to fit the real problem.)
In this case the answer is (or seems to be) simple – change the question by taking advantage of the right Oracle feature and you can produce a solution that may even eliminate some of the current application overhead. “Add” a column to the table that holds a value only for rows that are in a status of interest, then index that column – and the best way to “add” that column in all supported (and many unsupported) versions of Oracle is probably to create it as a virtual column:
alter table my_table
add rare_status /* invisible */
generated always as
(case when status in ('STATUS_6','STATUS_7') then null else status end)
virtual
/
create index mt_i1 on my_table(rare_status);
execute dbms_stats.gather_table_stats(user,'my_table', method_opt=>'for columns rare_status size 1')
select rare_status, message_id from my_table where rare_status is not null;
You’ll notice, of course, that I’ve included, as a comment, the invisible option when adding the virtual column. This is generally a good idea, though it may be a little confusing for people to see that they can query a table using a column that doesn’t (apparently) exist. The plus point for this approach is that (naughty) code that executes an “insert as select” statement without specifying the list of target columns will still succeed, rather than raising the error “ORA-00947: not enough values” for no apparent reason.
The fact that I’ve used the predicate “is not null” might prompt you to ask if there are any traps in this approach if status is allowed to be null, and that’s something to check if you extend the strategy to handle a more complicated requirement. In this case the “not in (6,7)” and “in (1,2,3,4,5)” return the same results even if there are rows where the status is null.
Here’s the execution plan (ignore the numbers since there’s no data in the table) for the “is not null” query:
Plan hash value: 3640753122
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 38 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 2 | 38 | 2 (0)| 00:00:01 |
|* 2 | INDEX FULL SCAN | MT_I1 | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("RARE_STATUS" IS NOT NULL)
If we return to the user’s original query – it selects message_id, and the plan shows the need to visit the table to collect that value. Since the index we’ve now created on the “interesting” rows is so small we might as well extend it to include message_id – remembering to use the same type of case expression so that we include values only for the necessary small number of rows in the index.
alter table my_table
add rare_message_id invisible
generated always as
(case when status in ('STATUS_6','STATUS_7') then null else message_id end)
virtual
/
drop index mt_i1;
create index mt_i1 on my_table(rare_status, rare_message_id);
execute dbms_stats.gather_table_stats(user,'my_table', method_opt=>'for columns rare_message_id size 1')
select rare_status, rare_message_id from my_table where rare_status is not null;
Execution Plan
----------------------------------------------------------
Plan hash value: 3308683209
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 18 | 1 (0)| 00:00:01 |
|* 1 | INDEX FULL SCAN | MT_I1 | 2 | 18 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RARE_STATUS" IS NOT NULL)
I’ve mentioned it before, and I keep wishing for it to appear in Oracle, but I’d really like to see the “filtered index” mechanism that can be used in SQL Server to handle this type of requirement without editing the code; something like:
create index mt_ss1 on my_table (status)
carrying (message_id)
where status not in ('STATUS_6','STATUS_7')
/
I’ve invented a syntax to say that this index should also include the message_id – again a feature (though I don’t remember the syntax) in SQL Server. Of course Oracle can carry “spare” columns, but only by adding them as if they belong in the index – which means their presence affects the selectivity and clustering_factor of the index and can, therefore, have serious side-effects on execution plans.
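For completeness, the nearest native equivalent in Oracle is simply to add message_id as a trailing column of a real index on status (the index name mt_ss2 here is just an arbitrary choice) – with the caveats just mentioned about its effect on selectivity and clustering_factor:
create index mt_ss2 on my_table(status, message_id);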
NOT IN bug
Some years ago I wrote a note describing how, with a query using identical literal values in the predicate, the optimizer was able to produce exactly the same plan body with exactly the same Predicate Information section while producing different cardinality estimates, depending on how you expressed the requirement for “column not in list of values”.
I won’t go into details of the anomaly since it disappeared in 11.2.0.4, but what I hadn’t paid any attention to was that the arithmetic (for both variations) was wrong, and it was only a recent question on the Oracle database forum that made me revisit the article, allowing me to spot the error and test 23.3 to see if the arithmetic was still wrong. It is.
To demonstrate the problem, here’s a modified version of the code from the previous article. I usually try to write code that will create reproducible data across all versions of Oracle, but I didn’t do that in the previous post, and there was a significant difference between the data generated by 11.2.0.3 and the data generated by 23.3 so, among other details, I’ve changed the code to produce a fixed number of nulls.
rem
rem Script: not_in_anomaly_2.sql
rem Author: Jonathan Lewis
rem Dated: Oct 2024
rem
rem Last tested
rem 23.3.0.0
rem 19.11.0.0
rem
create table t1
as
select
ao.*
from
all_objects ao
where
rownum <= 10000
/
update t1 set
object_type = null
where
rownum <= 1000
-- object_type = 'TABLE'
/
commit;
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1',
method_opt => 'for all columns size 1'
);
end;
/
break on report
compute sum of ct on report
select
object_type, count(*) ct
from
t1
group by
object_type
order by
count(*)
/
select
sample_size, num_nulls, num_distinct, histogram
from
user_tab_columns
where
table_name = 'T1'
and column_name = 'OBJECT_TYPE'
;
This code (in my 23.3 database) produces a table of 10,000 rows, of which 1,000 have object_type set to NULL, with 13 distinct (non-null) values across the rest of the rows. There was a huge skew in the distribution of the data, but I’ve avoided that complication by collecting stats without histograms. Here are the results from the two queries against the data dictionary to confirm those details.
OBJECT_TYPE CT
----------------------- ----------
CONTEXT 2
PROCEDURE 2
DIRECTORY 2
OPERATOR 3
SEQUENCE 3
FUNCTION 14
PACKAGE 22
DOMAIN 22
INDEX 48
TABLE 71
TYPE 805
1000 <<--- NULL
SYNONYM 4002
VIEW 4004
----------
sum 10000
14 rows selected.
SAMPLE_SIZE NUM_NULLS NUM_DISTINCT HISTOGRAM
----------- ---------- ------------ ---------------
9000 1000 13 NONE
After generating this data I checked the execution plans – in particular the predicate information and cardinality estimates – of several different queries involving variations of “not in” predicates; but I just want to show one query, with its execution plan from autotrace, that demonstrates the error very clearly:
select
count(*)
from t1
where object_type not in ('INDEX', 'SYNONYM', 'VIEW')
union all
select
count(*)
from t1
where object_type in ('INDEX', 'SYNONYM', 'VIEW')
;
Execution Plan
----------------------------------------------------------
Plan hash value: 575959041
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 12 | 52 (4)| 00:00:01 |
| 1 | UNION-ALL | | 2 | 12 | 52 (4)| 00:00:01 |
| 2 | SORT AGGREGATE | | 1 | 6 | | |
|* 3 | TABLE ACCESS FULL| T1 | 7079 | 42474 | 26 (4)| 00:00:01 |
| 4 | SORT AGGREGATE | | 1 | 6 | | |
|* 5 | TABLE ACCESS FULL| T1 | 2077 | 12462 | 26 (4)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND
"OBJECT_TYPE"<>'VIEW')
5 - filter("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='SYNONYM' OR
"OBJECT_TYPE"='VIEW')
As you can see, the optimizer has rewritten the “not in()” as a conjunct of inequalities and the “in” as a disjunct of equalities. If I had written the query in exactly that form I would still have got the same results – provided I didn’t go all the way back to 11.2.0.3 in my testing.
The critical point is this: check the Rows column for operations 3 and 5 (the two tablescans), and add the two values together. The result is 9,156. Between them the two branches of the query cover all the rows where object_type is not null – and that’s 9,000 rows. Whatever the optimizer is doing it ought (for such a simple query) to match the total row count.
The “in” branch gives the right estimate: for my 9,000 rows with a non-null object_type the number of distinct values is 13, so the selectivity of “column = constant” is 1/13; adding the selectivities for the three listed values gives us a selectivity of 3/13 and a cardinality of 2076.9 (q.e.d.)
The “not in” branch ought, therefore, to give us a selectivity of 10/13 (i.e. 1 – 3/13) for a cardinality of 6,923(.08). Oracle’s estimate of 7,079 is out by 156, which is only a little over 2% in this case, so maybe the error is hardly ever noticeable to most people.
Where does the error come from? There are a couple of different ways of thinking about how the numbers can be produced. Here’s one viewpoint, which shifts us from addition/subtraction to multiplication/division (which is a little more like the way the optimizer runs the arithmetic, I think).
To eliminate the first value we scale the rows down by one thirteenth; to eliminate the second value we scale that result by one twelfth (there are only 12 distinct values left), to eliminate the final value we scale the result by a further one eleventh – so the total is: 9000 * 12/13 * 11/12 * 10/11 = 6923.07 (q.e.d.)
BUT it looks as if Oracle is not allowing for the reduction in the number of distinct values as it eliminates each value in the list – so the optimizer’s arithmetic becomes: 9000 * 12/13 * 12/13 * 12/13 = 7078.7 (q.e.d).
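You can check both versions of the arithmetic with a trivial query (nothing optimizer-related, just the numbers above):
select
round(9000 * 12/13 * 11/12 * 10/11, 2) expected_cardinality,
round(9000 * power(12/13, 3), 2) optimizer_cardinality
from dual
/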
Conclusion
There is a flaw in the way that the optimizer calculates the selectivity of “not in {list of values}” – the error may be fairly small in most cases, but due to the nature of the error (and with the assistance of histograms and/or large numbers of rows with NULLs in the relevant column) there may be cases where a bad enough cardinality estimate appears and makes the optimizer choose a poor execution plan. The nature of the error is such that the optimizer will over-estimate the cardinality.
Index puzzle
What could make a “create index” statement fail with an error about a table having too many columns (ORA-01792)? And why would this suddenly happen when the code that raised the error had been dropping and recreating the index every night for the best part of 3 years with no problems?
This is a question that came up on the Oracle developer forum recently, and was finally answered by the original poster after getting an important clue from another participant in the thread.
The clue (supplied by Solomon Yakobson) was that when you create a “function-based” index Oracle will create a hidden virtual column on the table but if the table has already reached the limit on the number of columns (1,000 until very recently) the attempt to create the column will fail with error ORA-01792, leading to the create index statement failing.
But this drop/create has been working perfectly for years – and a query for the table definition reported only 40 columns – so how did an extra 960 columns sneak in to break things? It’s a bug that can appear when you mix virtual columns with function-based indexes – Oracle bypasses a section of the code path that is supposed to clean up after the “drop index”. Here’s a very simple example to demonstrate.
rem
rem Script: fbi_puzzle_4.sql
rem Author: Jonathan Lewis / Marco Antonio Romito
rem Dated: Sep 2024
rem Purpose:
rem
rem Last tested
rem 19.11.0.0
drop table t1;
create table t1(
n1 number(6,0),
n2 number(6,0) as (n1 + 1) virtual,
n3 number(6,0)
)
/
prompt =============================
prompt An index that causes problems
prompt =============================
create index t1_i1 on t1(n1, nvl(n3,0), n2);
column column_name format a32
column data_default format a32
select
column_id, column_name,
hidden_column, virtual_column, user_generated, internal_column_id,
data_default
from
user_tab_cols
where
table_name = 'T1'
order by
internal_column_id
/
I’ve created a table of three columns, where the second column is a virtual column. Then I’ve created an index on the table which includes references to all three columns, but one of those references includes a call to the nvl() function – which means Oracle will create a fourth (hidden, virtual) column for that column expression. Here are the results of the subsequent query against user_tab_cols:
COLUMN_ID COLUMN_NAME HID VIR USE INTERNAL_COLUMN_ID DATA_DEFAULT
---------- -------------------------------- --- --- --- ------------------ --------------------------------
1 N1 NO NO YES 1
2 N2 NO YES YES 2 "N1"+1
3 N3 NO NO YES 3
SYS_NC00004$ YES YES NO 4 NVL("N3",0)
4 rows selected.
As you can see n2 is flagged as virtual (but not hidden) generated as “N1” + 1; and there’s a fourth, system-generated, hidden, virtual column generated as NVL(“N3”,0).
As ever, things are more likely to break when you mix features. Here are a few more lines of code to run after creating the index and reporting the columns:
drop index t1_i1;
create index t1_i1 on t1(n1, nvl(n3,0), n2);
drop index t1_i1;
create index t1_i1 on t1(n1, nvl(n3,0), n2);
drop index t1_i1;
create index t1_i1 on t1(n1, nvl(n3,0), n2);
drop index t1_i1;
create index t1_i1 on t1(n1, nvl(n3,0), n2);
drop index t1_i1;
create index t1_i1 on t1(n1, nvl(n3,0), n2);
select
column_id, column_name,
hidden_column, virtual_column, user_generated, internal_column_id,
data_default
from
user_tab_cols
where
table_name = 'T1'
order by
internal_column_id
/
You might expect Oracle to drop the system-generated column as it drops the index – but here are the results we now get from querying user_tab_cols:
COLUMN_ID COLUMN_NAME HID VIR USE INTERNAL_COLUMN_ID DATA_DEFAULT
---------- -------------------------------- --- --- --- ------------------ --------------------------------
1 N1 NO NO YES 1
2 N2 NO YES YES 2 "N1"+1
3 N3 NO NO YES 3
SYS_NC00004$ YES YES NO 4 NVL("N3",0)
SYS_NC00005$ YES YES NO 5 NVL("N3",0)
SYS_NC00006$ YES YES NO 6 NVL("N3",0)
SYS_NC00007$ YES YES NO 7 NVL("N3",0)
SYS_NC00008$ YES YES NO 8 NVL("N3",0)
SYS_NC00009$ YES YES NO 9 NVL("N3",0)
9 rows selected.
Oracle “forgets” to drop the system-generated virtual column, and generates a new virtual column every time the index is recreated. Since the columns are hidden columns you won’t notice that something has gone wrong if you query the (more commonly used) view user_tab_columns – so there may well be a number of sites which have tables with huge numbers of hidden virtual columns, all quietly working their way towards unexpected ORA-01792 errors.
This “failure to drop” isn’t consistent behaviour – if the index had been defined as (n1, n2, nvl(n3,0)) then the system-generated column would have been dropped every time the index was dropped. I haven’t tried to work out exactly what conditions have to be met for the error to appear but I think it may be something to do with a user-defined virtual column appearing in an index after an expression that has to be supported by a system-generated virtual column.
Bear in mind that if I create two indexes (n1, nvl(n3,0)) and (n2, nvl(n3,0)) then Oracle will generate just one virtual column to support the expression nvl(n3,0), so when I drop one of these indexes Oracle will have to say: “does dropping this index mean I should drop a virtual column?” and I’m guessing that somewhere in that area of code Oracle is bypassing some of the checks it should be doing.
Summary
If you have any indexes that contain both virtual columns and column expressions, and if you drop and recreate those indexes from time to time, check user_tab_cols to make sure that you don’t have multiple identical system-generated, hidden, virtual columns that support the expression(s) in those indexes.
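As a starting point for that check, something like the following query may help – it’s only a sketch, since data_default is a LONG column you still have to examine the expressions for each table it reports, and a table may legitimately have several different hidden virtual columns (e.g. for extended stats):
select  table_name, count(*) hidden_virtual_cols
from    user_tab_cols
where   hidden_column  = 'YES'
and     virtual_column = 'YES'
group by
        table_name
having  count(*) > 1
order by
        count(*) desc
/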
Workaround
In my case I found the following steps were sufficient to clear up the excess occurrences:
- drop the problem index (n1, nvl(n3,0), n2)
- create an index on just the problem expression (nvl(n3,0))
- drop that index (which got rid of all the copies of the generated virtual columns)
- recreate the problem index.
In more complicated cases (e.g. several indexes following this pattern, multiple indexes that include the same expression etc.) you may have to develop a more sophisticated approach.
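Translating the list above into SQL against the demo table, the steps would look something like this (t1_temp is just a name I’ve invented for the intermediate index):
drop index t1_i1;                               -- drop the problem index
create index t1_temp on t1(nvl(n3,0));          -- index on just the problem expression
drop index t1_temp;                             -- this drop cleared out all the copies of the generated virtual column
create index t1_i1 on t1(n1, nvl(n3,0), n2);    -- recreate the problem index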
num_index_keys – 2
A recent MOS Community forum posting (needs an account) raised the problem of a query with a specific index hint switching between two plans, one using “inlist iteration of a unique index scan”, the other using a simple “index range scan with filter predicate”. Here’s an example of the pattern of the query, with the two execution plans:
select /*+ index(INDEXES_A_LOCAL_IDX) */
*
from INDEXES_A
WHERE ID in (
:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 ,:15 ,:16 ,:17 ,:18 ,:19 ,:20 ,
:21 ,:22 ,:23 ,:24 ,:25 ,:26 ,:27 ,:28 ,:29 ,:30 ,:31 ,:32 ,:33 ,:34 ,:35 ,:36 ,:37 ,:38 ,:39 ,:40 ,
:41 ,:42 ,:43 ,:44 ,:45 ,:46 ,:47 ,:48 ,:49 ,:50 ,:51 ,:52 ,:53 ,:54 ,:55 ,:56 ,:57 ,:58 ,:59 ,:60 ,
:61 ,:62 ,:63 ,:64 ,:65 ,:66 ,:67 ,:68 ,:69 ,:70 ,:71 ,:72 ,:73 ,:74 ,:75 ,:76 ,:77 ,:78 ,:79 ,:80 ,
:81 ,:82 ,:83 ,:84 ,:85 ,:86 ,:87 ,:88 ,:89 ,:90 ,:91 ,:92 ,:93 ,:94 ,:95 ,:96 ,:97 ,:98 ,:99, :100
)
AND CLIENT_ID = :101
AND PRODUCT_ID = :102
;
Plan hash value: 743077671
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 110 (100)| | | |
| 1 | PARTITION LIST SINGLE | | 8 | 608 | 110 (0)| 00:00:01 | KEY | KEY |
| 2 | INLIST ITERATOR | | | | | | | |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID| INDEXES_A | 8 | 608 | 110 (0)| 00:00:01 | KEY | KEY |
|* 4 | INDEX UNIQUE SCAN | INDEXES_A_LOCAL_IDX | 8 | | 102 (0)| 00:00:01 | KEY | KEY |
---------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1011922169
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 88 (100)| | | |
| 1 | PARTITION LIST SINGLE | | 1 | 79 | 88 (3)| 00:00:01 | KEY | KEY |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| INDEXES_A | 1 | 79 | 88 (3)| 00:00:01 | KEY | KEY |
|* 3 | INDEX RANGE SCAN | INDEXES_A_LOCAL_IDX | 1 | | 87 (3)| 00:00:01 | KEY | KEY |
----------------------------------------------------------------------------------------------------------------------------------
The index in the hint is the primary key index, defined as (client_id, product_id, id) and, as you can see, the query specifies exactly those three columns with equality predicates – albeit with an “in-list” for the final column of the index. You’ll also notice that the table is list-partitioned (the partition key is the client_id, one value per partition) and the index is a local index.
It may not be immediately obvious how the two plans can achieve the same result, until you check the Predicate Information sections of the two plans which, in the same order and with several lines of “OR” predicates removed, look like this:
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CLIENT_ID"=:101 AND "PRODUCT_ID"=:102 AND (("ID"=:1 OR "ID"=:2 OR "ID"=:3 OR "ID"=:4 OR
"ID"=:5 OR "ID"=:6 OR "ID"=:7 OR "ID"=:8 OR "ID"=:9 OR "ID"=:10 OR "ID"=:11 OR "ID"=:12 OR "ID"=:13 OR "ID"=:14 OR
...
"ID"=:95 OR "ID"=:96 OR "ID"=:97 OR "ID"=:98 OR "ID"=:99 OR "ID"=:100)))
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CLIENT_ID"=:101 AND "PRODUCT_ID"=:102)
filter(("ID"=:1 OR "ID"=:2 OR "ID"=:3 OR "ID"=:4 OR "ID"=:5 OR "ID"=:6 OR "ID"=:7 OR "ID"=:8 OR "ID"=:9 OR "ID"=:10 OR
...
"ID"=:95 OR "ID"=:96 OR "ID"=:97 OR "ID"=:98 OR "ID"=:99 OR "ID"=:100))
The plan with the inlist iterator operation uses the list as an access predicate – it does 100 precision probes of the index to find rowids and visits the table for each rowid in turn.
The plan using the index range scan uses the list as a filter predicate – it does a single range scan through the section of the index covering the given client_id and product_id and, for every index entry in that range, checks the id against the list of values, visiting the table only for the cases where it finds a match. (There are probably some cunning little tricks to minimise the work done checking the id – sorting the list into order once at the start of execution would be a reasonable one; starting the range scan at the lowest listed id and stopping at the highest would be another.)
Why can the optimizer produce two different plans? It’s just the result of the costing algorithm. In this case the optimizer compares the cost of doing a unique access 100 times (which is roughly 100 times the cost of doing it once) with the cost of a single large range scan (plus the CPU cost of lots of comparisons).
If you want to jump to it, here’s the summary and workaround.
I’ve blogged about this behaviour in a previous post, though the context in that case was an upgrade from 10g to 12c and the example and consequences were slightly different, so I decided to create a model of this case to demonstrate the point – and then discovered an oddity in the optimizer that probably made the switch appear to be rather random. I’m going to start by creating some data in a list-partitioned table with a local primary key index.
rem
rem Script: num_index_keys_3.sql
rem Author: Jonathan Lewis
rem Dated: Mar 2024
rem Purpose:
rem
rem Last tested
rem 19.11.0.0
rem 23.3.0.0
rem
execute dbms_random.seed(0)
create table t1(
client_id not null,
product_id not null,
id not null,
v1,
padding
)
partition by list (client_id) (
partition p0 values ('Client0'),
partition p1 values ('Client1'),
partition p2 values ('Client2'),
partition p3 values ('Client3'),
partition p4 values ('Client4'),
partition p5 values ('Client5'),
partition p6 values ('Client6'),
partition p7 values ('Client7'),
partition p8 values ('Client8'),
partition p9 values ('Client9')
)
as
with generator as (
select
rownum id
from dual
connect by
level <= 1e4 -- > comment to avoid WordPress format issue
)
select
cast('Client' || least(trunc(abs(3 * dbms_random.normal)),9) as varchar2(254)) client_id,
cast(trunc(abs(400 * dbms_random.normal)) as number(6,0)) product_id,
cast('Id' || lpad(rownum,7,'0') as varchar2(254)) id,
lpad(rownum,10,'0') v1,
lpad('x',250,'x') padding
from
generator v1,
generator v2
where
rownum <= 1e6 -- > comment to avoid WordPress format issue
order by
dbms_random.value
/
execute dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1')
alter table t1 add constraint t1_pk primary key(client_id, product_id, id) using index local
-- in a production system this index (for this data pattern) ought to be defined "compress 2"
/
alter index t1_pk rebuild partition p3 pctfree 92
/
I’ve created a list-partitioned table with one client_id per partition. I’ve used the dbms_random.normal() function to generate an uneven distribution of client_id values and product_id values, and rownum to generate the final id column; then I’ve randomised the order of data before inserting it so that the index (partitions) will have large values for the clustering_factor. I’ve also rebuilt partition p3 of the index with a pctfree of 92 because I wanted to have a physically large index partition without generating a huge amount of table data. (The OP has partition sizes varying from a few hundred rows to a couple of hundred million rows – which is partly why the switch in plan can seem fairly random, and why the performance impact can be catastrophic.)
With my hacking factors I’ve got 10 values for client_id, 1,603 values for product_id, and the distribution of data across partitions is as follows:
select
client_id, count(distinct product_id), count(*), min(id) min_id, max(id) max_id
from t1
group by client_id
order by client_id
/
CLIENT_ID COUNT(DISTINCTPRODUCT_ID) COUNT(*) MIN_ID MAX_ID
------------------------ ------------------------- ---------- ------------ ------------
Client0 1474 261218 Id0000002 Id0999997
Client1 1455 233591 Id0000001 Id1000000
Client2 1448 187657 Id0000006 Id0999998
Client3 1390 134710 Id0000003 Id0999996
Client4 1339 87095 Id0000034 Id0999981
Client5 1270 50005 Id0000056 Id0999902
Client6 1198 25894 Id0000124 Id0999975
Client7 1068 12083 Id0000033 Id0999804
Client8 914 4970 Id0000185 Id0999811
Client9 827 2777 Id0000764 Id0999515
After running a couple more queries to examine the data I picked partition p3 for testing, and 284 as a suitable product_id (249 rows in that partition), and created an anonymous PL/SQL block to execute a query to select 100 of those rows (using a couple of queries to generate most of the text).
Here’s a version of the code – cut back to select only 7 rows with an option for an 8th row – followed by a call to generate the SQL Monitor report (available thanks to the hint in the SQL) for the most recent SQL statement execution. The code counts the rows returned just to confirm that every id in the list really does correspond to a row in the partition:
set serveroutput on
<<anon>>
declare
ct number := 0;
m_pad t1.padding%type;
client_id varchar2(32) := 'Client3';
product_id number := 284;
b1 varchar2(32) := 'Id0000628';
b2 varchar2(32) := 'Id0002350';
b3 varchar2(32) := 'Id0002472';
b4 varchar2(32) := 'Id0007921';
b5 varchar2(32) := 'Id0008073';
b6 varchar2(32) := 'Id0008398';
b7 varchar2(32) := 'Id0012196';
b8 varchar2(32) := 'Id0013212';
begin
for c1 in (
select /*+ monitor index(t1 t1_pk) */
*
from t1 t1
where t1.client_id = anon.client_id
and t1.product_id = anon.product_id
and t1.id in (
b1, b2, b3, b4, b5, b6, b7
-- b1, b2, b3, b4, b5, b6, b7, b8
)
) loop
m_pad := c1.padding;
ct := ct + 1;
end loop;
dbms_output.put_line(ct);
end;
/
set heading off
set pagesize 0 linesize 255
set tab off trimspool on
set long 250000
set longchunksize 250000
column text_Line format a254
select
dbms_sql_monitor.report_sql_monitor(
type =>'TEXT',
report_level => 'all'
) text_line
from dual
/
set pagesize 40 linesize 132
The code above produced the following SQL Monitor plan:
============================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (%) | (# samples) |
============================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 7 | | |
| 1 | PARTITION LIST SINGLE | | 7 | 16 | 1 | +0 | 1 | 7 | | |
| 2 | INLIST ITERATOR | | | | 1 | +0 | 1 | 7 | | |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID | T1 | 7 | 16 | 1 | +0 | 7 | 7 | | |
| 4 | INDEX UNIQUE SCAN | T1_PK | 7 | 9 | 1 | +0 | 7 | 7 | | |
============================================================================================================================================
As you can see, we’ve used an inlist iterator and done 7 unique probes of the index. The cost of the indexing operation alone is 9: i.e. 2 for the root and branch blocks (which get pinned), and one for each of the 7 leaf blocks that Oracle expects to access.
Change the code so that we include the eighth variable in the list and the SQL Monitor plan shows:
===================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (%) | (# samples) |
===================================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 8 | | |
| 1 | PARTITION LIST SINGLE | | 8 | 17 | 1 | +0 | 1 | 8 | | |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | T1 | 8 | 17 | 1 | +0 | 1 | 8 | | |
| 3 | INDEX RANGE SCAN | T1_PK | 8 | 9 | 1 | +0 | 1 | 8 | | |
===================================================================================================================================================
The optimizer has decided to do an index range scan and examine every index entry for ‘Client3’, product 284 to find the 8 requested IDs – and it’s just about arithmetic. Tell the optimizer to use all three columns as access columns by adding the hint /*+ num_index_keys(t1 t1_pk 3) */ to the SQL and the plan changes to the inlist iterator version – with a higher cost:
============================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (%) | (# samples) |
============================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 8 | | |
| 1 | PARTITION LIST SINGLE | | 8 | 18 | 1 | +0 | 1 | 8 | | |
| 2 | INLIST ITERATOR | | | | 1 | +0 | 1 | 8 | | |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID | T1 | 8 | 18 | 1 | +0 | 8 | 8 | | |
| 4 | INDEX UNIQUE SCAN | T1_PK | 8 | 10 | 1 | +0 | 8 | 8 | | |
============================================================================================================================================
Using the inlist iterator the total cost is 18: that’s 2 for the root and branch blocks, 8 for leaf blocks, 8 for table blocks. Using the range scan the total cost is 17: that’s 9 for the index scan plus the extra CPU, plus 8 for visiting 8 table blocks. (If you’re wondering why the cost of the index range scan is 9: there are 9,324 leaf blocks in the index partition and 1,390 product ids in the partition (and one client_id): 9,324/1,390 = 6.7; round up, add one for the root, one for the branch –> total 9. The optimizer is using averages to estimate the size of the range scan needed.)
The switch happens – it’s just arithmetic, and depends on how many leaf blocks Oracle thinks will be needed by the range scan compared to the number of leaf blocks that will be accessed by the iterated unique probe. The larger the list the more likely it is that Oracle will switch to using a range scan and filter.
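If you want to see that arithmetic in one place, here’s a little PL/SQL sketch that reproduces the two costs for the 8-value example – the formulae are my simplification of the behaviour described above, not a statement of Oracle’s internal code:
set serveroutput on
declare
        -- figures taken from the example above
        list_size    number := 8;      -- values in the in-list
        leaf_blocks  number := 9324;   -- leaf blocks in the index partition
        product_ndv  number := 1390;   -- distinct product_ids in the partition
        root_branch  number := 2;      -- root + branch block visits
        range_cost   number := root_branch + ceil(leaf_blocks / product_ndv);   -- = 9
begin
        dbms_output.put_line('inlist iterator   : ' || (root_branch + list_size + list_size));  -- 2 + 8 leaf + 8 table = 18
        dbms_output.put_line('range scan/filter : ' || (range_cost  + list_size));              -- 9 + 8 table         = 17
end;
/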
The anomaly/bug
When I first wrote the PL/SQL block I thought I might need about 100 values to demonstrate the problem. In fact I would have needed a much larger data set to need that many values; nevertheless my first attempt introduced a puzzle that turned out to be an unpleasant glitch in the optimizer arithmetic.
If I set the PL/SQL block up to query using 96 variables I get the following plan:
===================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (%) | (# samples) |
===================================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 96 | | |
| 1 | PARTITION LIST SINGLE | | 96 | 105 | 1 | +0 | 1 | 96 | | |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | T1 | 96 | 105 | 1 | +0 | 1 | 96 | | |
| 3 | INDEX RANGE SCAN | T1_PK | 96 | 9 | 1 | +0 | 1 | 96 | | |
===================================================================================================================================================
It shouldn’t be a surprise to see the range scan/filter plan, with a total cost of 105: 9 for the index range scan + 96 for the table accesses.
Increase the number of variables to 97 and watch the cost (the plan doesn’t change):
===================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (%) | (# samples) |
===================================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 97 | | |
| 1 | PARTITION LIST SINGLE | | 97 | 10 | 1 | +0 | 1 | 97 | | |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | T1 | 97 | 10 | 1 | +0 | 1 | 97 | | |
| 3 | INDEX RANGE SCAN | T1_PK | 1 | 9 | 1 | +0 | 1 | 97 | | |
===================================================================================================================================================
Suddenly the Rows (estim) for the index is 1 (even though the row estimate for the table is 97) with the result that the total cost is now 10 – i.e. 9 for the index range scan plus 1 for the use of the single “predicted” rowid to visit the table.
Given very large data sets, and products with very large numbers of rows (which means big range scans and the appearance of the inlist iterator strategy for relatively long lists) you can imagine how baffling it must be when the optimizer suddenly says – “Hey, one extra value in the list, that means the range scan is really cheap” – and that’s probably why the OP was able to say: “The unique scan operation runs sub second, the range scan takes approx 10 secs to complete.”
You may recall that I pointed out in the code that my index really ought to be defined with “compress 2” – that’s true, but if you’ve got a system like this one already and haven’t used basic index compression, don’t rebuild the index to compress it just yet. If you halve the size of the index then (roughly) you will probably halve the size of the list where the optimizer switches from inlist iteration to the range scan/filter strategy.
Why, you might ask, does this error appear at 97 rows? The answer is in the statistics about the partitions / clients that I listed above. This partition/client has 134,710 rows and 1,390 distinct products; 134,710 / 1,390 = 96.91. In my case (which had no histograms) when the table row estimate exceeded the average number of rows per product the optimizer hit a glitch in the code path and used the wrong numbers in the arithmetic.
Let’s do one more experiment – take the code with 96 variables (which does most of the arithmetic “correctly”) and use the num_index_keys() hint to force Oracle into the iteration strategy. Here’s the plan from SQL Monitor:
============================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (%) | (# samples) |
============================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 96 | | |
| 1 | PARTITION LIST SINGLE | | 96 | 194 | 1 | +0 | 1 | 96 | | |
| 2 | INLIST ITERATOR | | | | 1 | +0 | 1 | 96 | | |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID | T1 | 96 | 194 | 1 | +0 | 96 | 96 | | |
| 4 | INDEX UNIQUE SCAN | T1_PK | 96 | 98 | 1 | +0 | 96 | 96 | | |
============================================================================================================================================
The total cost is 194, compared to the cost of 105 for the range scan/filter plan. Despite the fact that Oracle “thinks” that the relevant section of index is only 9 leaf blocks (and I know it’s 17), it has assumed that every single one of the 96 probes it expects to make will have to do a physical read of a leaf block. The optimizer has not allowed for the “self-caching” that is going to happen as the lists get longer and longer – and that’s a fundamental reason behind this change from inlist iteration to range scanning with filters.
Summary
There may be all sorts of variations behind the switch between “iterate/unique” and “range/filter”, and it’s likely that the presence of histograms would have some effect on the break point, and there may be some arithmetic relating the low/high values in the table/partition compared with the least/greatest values used in the list.
The bottom line, though, is that there’s some instability in the choice of execution plan. If you start seeing it and it’s causing performance problems your best bet is probably to use the num_index_keys() hint to ensure that you get the same plan every time.
Footnote
When I re-ran the test on an instance of 23.3 Free it behaved the same way, with one (slight?) difference. My 19.11 instance had been estimating the number of rows correctly, so the sudden change appeared when I used 97 variables in the list; in 23.3 I had to use 100 variables in the list before the error appeared – but that was exactly the point where the table row estimate hit 97.
Indexing
This is a small case study from one of the MOS Community Fora (needs a MOS account) that I drafted about a year ago. It started out as a question about a two-node RAC system running Oracle 19.17 where an SQL Monitor report showed time spent in a wait for resmgr:cpu quantum. Here’s the report (with a little cosmetic editing) that was included in the posting:
SELECT COUNT (DISTINCT (ITEM_KEY))
FROM WF_NOTIFICATIONS
WHERE
ITEM_KEY = :B2
AND TO_USER = :B1
AND MESSAGE_TYPE = 'APINVAPR'
;
===================================================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | (%) | (# samples) |
===================================================================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +162 | 1 | 1 | | | | |
| 1 | SORT AGGREGATE | | 1 | | 1 | +162 | 1 | 1 | | | | |
| 2 | VIEW | VW_DAG_0 | 1 | 102K | 1 | +162 | 1 | 1 | | | | |
| 3 | SORT GROUP BY NOSORT | | 1 | 102K | 1 | +162 | 1 | 1 | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID | WF_NOTIFICATIONS | 1 | 102K | 162 | +1 | 1 | 1 | 408K | 3GB | 96.91 | Cpu (15) |
| | | | | | | | | | | | | resmgr:cpu quantum (1) |
| | | | | | | | | | | | | db file sequential read (141) |
| 5 | INDEX RANGE SCAN | WF_NOTIFICATIONS_N5 | 740K | 4836 | 160 | +3 | 1 | 3M | 17679 | 138MB | 3.09 | Cpu (3) |
| | | | | | | | | | | | | db file sequential read (2) |
===================================================================================================================================================================================
As far as the original question is concerned there are two key points to note – first that the wait in question appears once in the ASH / Activity data out of a total of 162 samples, so it’s not “important”. (That’s okay, the OP didn’t say it was a problem, only that they wanted to understand what it meant.)
Secondly, from a performance perspective, the index range scan produced 3 million rowids but the table access discarded almost all the rows, returning just one row for the distinct aggregation. Clearly the query could be made a lot more efficient (and, yes, you could consider the option for rewriting it with an “exists” subquery).
One of the other forum users addressed the reasons behind the resource manager wait so I followed up on the basic performance characteristics, commenting on the apparent absence of (or failure to use) a suitable index. The OP responded with details from the data dictionary covering index definitions, data volumes, and (rather usefully) counts of distinct values. I did a little cosmetic editing to the information supplied to produce the following details:
Table Name NUM_ROWS BLOCKS
-------------------- ---------- ----------
WF_NOTIFICATIONS 8151040 406969
Table Distinct Num. Sample
Table Name Attribute Values Density Nulls Size
-------------------- ------------- ---------- ------- ---------- ----------
WF_NOTIFICATIONS MESSAGE_TYPE 11 .0909 0 815104
WF_NOTIFICATIONS TO_USER 1026 .0010 60 815098
WF_NOTIFICATIONS ITEM_KEY 3538701 .0000 1032240 711880
CREATE UNIQUE INDEX "APPLSYS"."WF_NOTIFICATIONS_PK" ON "APPLSYS"."WF_NOTIFICATIONS" ("NOTIFICATION_ID")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N1" ON "APPLSYS"."WF_NOTIFICATIONS" ("RECIPIENT_ROLE", "STATUS")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N2" ON "APPLSYS"."WF_NOTIFICATIONS" ("GROUP_ID", "NOTIFICATION_ID")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N3" ON "APPLSYS"."WF_NOTIFICATIONS" ("MAIL_STATUS", "STATUS")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N4" ON "APPLSYS"."WF_NOTIFICATIONS" ("ORIGINAL_RECIPIENT")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N5" ON "APPLSYS"."WF_NOTIFICATIONS" ("MESSAGE_TYPE", "STATUS")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N6" ON "APPLSYS"."WF_NOTIFICATIONS" ("MORE_INFO_ROLE", "STATUS")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N7" ON "APPLSYS"."WF_NOTIFICATIONS" ("FROM_ROLE", "STATUS")
CREATE INDEX "APPLSYS"."WF_NOTIFICATIONS_N8" ON "APPLSYS"."WF_NOTIFICATIONS" ("RESPONDER")
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
WF_NOTIFICATIONS_PK 2 27130 1914910
WF_NOTIFICATIONS_N1 2 49400 2485980
WF_NOTIFICATIONS_N2 2 37620 1917700
WF_NOTIFICATIONS_N3 2 47060 816790
WF_NOTIFICATIONS_N4 2 36760 2490300
WF_NOTIFICATIONS_N5 2 49200 962650
WF_NOTIFICATIONS_N6 2 50830 727210
WF_NOTIFICATIONS_N7 3 57020 1034680
WF_NOTIFICATIONS_N8 2 9271 455861
There are many details we could pick up from these stats but to start with I’ll just point out that only one of the three columns in the query’s where clause appears in an index, and that column (message_type) has only 11 distinct values. The index is wf_notifications_n5(message_type, status) and for some reason the optimizer has decided to use that index for this query.
The choice is surprising since the reported cost for the query is 102,000 when the table is (apparently) only 406,969 blocks, which means we should expect the tablescan cost (using the default value of 8 for the “_db_file_optimizer_read_count”) to be in the region of 50,000; maybe someone has set the db_file_multiblock_read_count to 4 (or maybe the value for processes is so high that Oracle’s internal algorithm has made a similar adjustment behind the scenes).
The cost for the index range scan alone is 4,836 – which is approximately the number of leaf blocks divided by the number of distinct values for the message_type: 49,200/11 = 4,472
While we’re looking at the statistics you might note that the stats gathering seems to be using estimate_percent => 10 rather than auto_sample_size.
You’ll notice that the Rows (Estim) for the index range scan is 740,000 while the Rows (Actual) is 3 million. This tells you that there’s no histogram on the column (740,000 is very close to 8.15M rows / 11 distinct values) and that a histogram would be a good idea – a correct estimate might have been enough to tip the optimizer into a tablescan. In fact we can also note in passing that the to_user column might benefit from a histogram given that 12c and later can create frequency histograms up to 2,048 buckets.
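If you wanted to test the histogram idea, the stats call might look something like the following – purely a sketch for illustration (the owner comes from the index DDL above, but the bucket sizes and the restriction of method_opt to just these two columns are my assumptions, not a recommendation for a production system):
begin
        dbms_stats.gather_table_stats(
                ownname          => 'APPLSYS',
                tabname          => 'WF_NOTIFICATIONS',
                estimate_percent => dbms_stats.auto_sample_size,
                method_opt       => 'for columns message_type size 254, to_user size 2048'
        );
end;
/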
A tablescan would, of course, still be expensive and do far more work than needed. Ideally we need an index to be able to find the rows that match all three equality predicates. However, we can see that there are roughly 2 rows per value for item_key, so an index on just (item_key) might be good enough. Averages, of course, can be very misleading: we might have one row for each of 3.5M values of item_key and one value of item_key with 3.5M rows in our table, so we do need to know more about the data distribution before we can make any solid suggestions.
You always have to be careful when adding indexes to tables to address specific queries – they may have adverse side-effects on other queries (and on the DML workload); and you should always look for options to maximise the benefit of any index you add. The “obvious” choice in this case is (message_type, to_user, item_key) compress 2, which would allow the query to execute without visiting the table at all, and probably do very little work. A better alternative might be (message_type, item_key) compress 1, because it could be good enough for the supplied query and very helpful if there is a similar query of the form:
SELECT COUNT (DISTINCT (ITEM_KEY))
FROM WF_NOTIFICATIONS
WHERE
ITEM_KEY = :B2
AND FROM_USER = :B1
AND MESSAGE_TYPE = 'APINVAPR'
(Note how I’ve changed to_user to from_user at line 5)
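For illustration only, the DDL for the two candidates discussed above might look like this – the index names are invented, and you would create only one of them after testing:
create index applsys.wf_notifications_n9
        on applsys.wf_notifications(message_type, to_user, item_key)
        compress 2
;
-- or the smaller alternative
create index applsys.wf_notifications_n10
        on applsys.wf_notifications(message_type, item_key)
        compress 1
;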
A Rewrite
Even with a “perfect” index the query could still do quite a lot of unnecessary work. If we have a few values for item_key with a large number of rows Oracle will walk through all the relevant index entries before doing the “distinct aggregate” (even though there’s at most just one possible value of item_key being returned and the final result has to be zero or one). So, with the best index in place it might still be nice to rewrite the query in the following form:
select count(*)
from dual
where exists (
select null
from wf_notifications
where
item_key = :B2
and to_user = :B1
and message_type = 'APINVAPR'
)
/
You’ll notice, by the way, that the original SQL is all in capitals with bind variables of the form :Bnnn (nnn numeric). This suggests it’s code that is embedded in PL/SQL – so maybe it’s from a “3rd party” package (the schema name APPLSYS that appears in the “create index” statements looks familiar) that can’t be rewritten.
ORA-00942
Oracle error 942 translates, for most people, into: “table or view does not exist”. This note is based on a conversation that I had with Mikhail Velikikh on the Oracle-l list server over the last couple of days while looking at the question: “How do I find out which table (or view) does not exist?”.
There are several possibilities – none of them ideal – but one suggestion that came up was to take advantage of dumping the errorstack when error 942 was signalled:
alter session set events '942 trace name errorstack forever, level [N]';
Mikhail suggested using level 2 and then picking up various details from the resulting trace file to allow further information to be extracted from the SGA.
This seemed a little bit like hard work and didn’t match a distant memory I had of solving the problem, so I ran up a little test (based on an extremely simplified model of the task the OP had been working on) using level 1:
create or replace package pack1 as
procedure failure;
end;
/
alter session set tracefile_identifier='level1';
alter session set events '942 trace name errorstack forever, level 1';
create or replace package body pack1 as
procedure failure is
v1 varchar2(64);
begin
select name into v1 from not_a_table where rownum = 1;
dbms_output.put_line(v1);
end;
end;
/
alter session set events '942 trace name errorstack off';
show errors
This produced the output:
Package created.
Session altered.
Session altered.
Warning: Package Body created with compilation errors.
Session altered.
Errors for PACKAGE BODY PACK1:
LINE/COL ERROR
-------- -----------------------------------------------------------------
6/2 PL/SQL: SQL Statement ignored
6/27 PL/SQL: ORA-00942: table or view does not exist
A quick search through the resulting trace file showed me the following pair of consecutive lines, about half way through a 10,000 line file:
5109 :about to signal 942
5110 -Name: NOT_A_TABLE
It’s a shame that the schema name was not included, but it might be sufficient. In fact this pair of lines would have appeared (without the extra 10,000 lines) if I had used “942 trace name context forever”.
Unfortunately Mikhail’s trace file didn’t echo these two lines! This was a surprise, since I was using 19.11 for my test and Mikhail was using 19.23 – details don’t (often) disappear in the upgrade. It didn’t take Mikhail long to find the answer to the puzzle, which I can demonstrate by changing one line of my code. Instead of selecting from a completely non-existent table I’ll change the select statement in the code above to select from a table that does exist, but that I don’t have privileges for:
select name into v1 from sys.obj$ where rownum = 1;
Oracle reports the same error message – but doesn’t write the critical two lines into the trace file.
Moral
I’ve said it many times in the past – it’s hard to test properly; sometimes it doesn’t even occur to you that a test you’ve done is a special case, which means it’s easy to jump from a specific case to a general conclusion that isn’t going to work for everyone. If you want to publish some interesting observation it’s a good idea to publish the model you used to derive that conclusion.
Footnote
The more recent versions of Oracle have seen enhancements to several error messages to improve the information they report. In particular 23ai will now report the schema and object name of the table or view that “does not exist”. (I haven’t checked yet if it distinguishes between “really doesn’t exist” and “exists, but you’re not allowed to access it”.)
Rownum quiz
Here’s a silly little puzzle that baffled me for a few moments until I spotted my typing error. It starts with a small table I’d created to hold a few rows, and then deletes most of them. Here’s a statement to create and populate the table:
create table t1 (id number , c1 clob)
lob(c1) store as basicfile text_lob (
retention disable storage in row
);
insert into t1
select rownum, rpad(rownum,200,'0')
from all_objects
where rownum <= 1000
;
commit;
Here’s what I meant to type to delete most of the data – followed by the response from SQL*Plus:
SQL> delete from t1 where mod(id,20) != 0;
950 rows deleted.
Here’s what I actually typed, with the response, that gave me a “What?!” moment:
SQL> delete from t1 where mod(rownum,20) != 0;
19 rows deleted.
I don’t think it will take long for you to work out why the result is so different; but I think it’s a nice warning about what can happen if you get a bit casual about using rownum.
AWR Snap ID
What to do when you hit a problem (possibly after an incomplete recovery) that reports an “ORA-00001 unique key violation” on sys.wrm$_snapshot_pk – as reported recently in this thread on the MOSC SQL Performance forum (needs a MOS account.)
Snapshot ids are carefully sequenced, without gaps, so somehow the thing that controls the “current” sequence number has gone backwards and is trying to generate a value that is lower than the current highest value in wrm$_snapshot. The thread I referenced above does point to an article dated 2017 on Alibaba discussing methods of checking for corruption and clearing up messes; but as an extra option you could simply try hacking the control table to set the “last used” snapshot id to something higher than the highest value currently in wrm$_snapshot. The table you need to hack is wrm$_wr_control and here’s an example of its contents from an instance of 19.11 (preceded by a check of the current maximum snap_id in wrm$_snapshot):
SQL> select max(snap_id) max_snap_id, max(end_interval_time) max_snap_time from wrm$_snapshot;
MAX_SNAP_ID MAX_SNAP_TIME
----------- ---------------------------------------------------------------------------
7304 09-APR-24 07.00.14.180 PM
SQL> execute print_table('select * from wrm$_wr_control')
DBID : 3158514872
SNAP_INTERVAL : +00000 01:00:00.0
SNAPINT_NUM : 3600
RETENTION : +00008 00:00:00.0
RETENTION_NUM : 691200
MOST_RECENT_SNAP_ID : 7304
MOST_RECENT_SNAP_TIME : 09-APR-24 07.00.15.169 PM
MRCT_SNAP_TIME_NUM : 1712685600
STATUS_FLAG : 2
MOST_RECENT_PURGE_TIME : 09-APR-24 08.35.57.430 AM
MRCT_PURGE_TIME_NUM : 1712648156
MOST_RECENT_SPLIT_ID : 7295
MOST_RECENT_SPLIT_TIME : 1712648156
SWRF_VERSION : 30
REGISTRATION_STATUS : 0
MRCT_BASELINE_ID : 0
TOPNSQL : 2000000000
MRCT_BLTMPL_ID : 0
SRC_DBID : 3158514872
SRC_DBNAME : CDB$ROOT
T2S_DBLINK :
FLUSH_TYPE : 0
SNAP_ALIGN : 0
MRCT_SNAP_STEP_TM : 1712685613
MRCT_SNAP_STEP_ID : 0
TABLESPACE_NAME : SYSAUX
-----------------
1 row(s) selected
PL/SQL procedure successfully completed.
I have to state that I would not try this on a production system without getting permission – possibly written in blood – from Oracle support: but if I were really desperate to keep existing snapshots and to make it possible for Oracle to generate new snapshots as soon as possible I might try updating the most_recent_snap_id to the value shown in wrm$_snapshot.
Of course you ought to update the most_recent_snap_time as well, and the mrct_snap_time_num (which looks like the number of seconds since 1st Jan 1900 GMT (on my instance)).
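Just to show the shape of the hack (and, again, not something to run without Oracle Support’s agreement) the update might look something like the following sketch – it deliberately leaves mrct_snap_time_num, the other “mrct” columns, and the CDB/PDB complications untouched:
update  sys.wrm$_wr_control c
set     (c.most_recent_snap_id, c.most_recent_snap_time) = (
                select  max(s.snap_id), max(s.end_interval_time)
                from    sys.wrm$_snapshot s
                where   s.dbid = c.dbid
        )
where   exists (
                select  null
                from    sys.wrm$_snapshot s
                where   s.dbid = c.dbid
        )
;
commit;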
Then there’s the interpretation and sanity checking of the other “most recent / mrct” columns to worry about, and the possibility of PDBs vs. CDBs – but those are topics that I’m going to leave to someone else to worry about.
dbms_output
Here’s a detail about dbms_output that is probably overlooked because (in most cases) it’s ignorable, except that it can lead to unexpected response times when you try using it to debug “busy” operations.
A question on the Oracle SQL and PL/SQL forum asked: “Why is a PL/SQL ‘for loop’ so slow on Oracle Autonomous DB?” and provided the following code to demonstrate the issue.
begin
for i in 1..36000 loop
dbms_output.put_line ('i value: '|| i);
end loop;
end;
/
The OP reported the elapsed time for this block as 1 minute 40 seconds (compared to Java taking only 2 seconds), and showed a screen shot to prove the point.
A couple of people added calls to systimestamp to the block to show that the loop actually took less than a second and explained that the time reported was mostly the time spent in displaying the results, not in creating them. Correct, of course, but that still raises the question of why it takes so long to display such a small amount of data.
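Here’s the sort of check they did – a sketch of the original block with a timestamp captured at the start so that the loop time is reported separately from the time spent getting the output to the screen:
set serveroutput on size unlimited

declare
        t0 timestamp := systimestamp;
begin
        for i in 1..36000 loop
                dbms_output.put_line ('i value: '|| i);
        end loop;
        -- this line is buffered last, and reports only the time spent in the loop
        dbms_output.put_line('Loop elapsed: ' || (systimestamp - t0));
end;
/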
If you call dbms_output.put_line() in your code you only get to see the results if something subsequently calls dbms_output.get_lines() to extract the results from the buffer created by dbms_output. (In the case of SQL*Plus this call is built in, so that once serveroutput has been enabled there’s a hidden call to BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END; after every call you make to the database until you set serveroutput off.)
You’ll note the plural in get_lines(), and the parameter :numlines – how many lines do you think the call will get by default? In SQL*Plus it’s whatever is set by the last call to set arraysize, which defaults to 15.
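In other words the client is wrapping something like the following around a pair of host bind variables after each call – a minimal sketch of the call shape, with the 15 coming from the default arraysize:
declare
        v_lines    dbms_output.chararr;
        v_numlines integer := 15;       -- "arraysize" lines requested per fetch
begin
        dbms_output.get_lines(v_lines, v_numlines);
        -- v_numlines now reports how many lines were actually available
end;
/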
I set up a little test, connected to a local instance of 19.11, to demonstrate the impact – executing the supplied code from SQL*Plus using an arraysize of 1 (which is actually honoured by dbms_output), 15 (default), and 1000, with the following times:
- set arraysize 1: Time: 8.90 seconds (ca. 36,000 roundtrips)
- set arraysize 15: Time: 1.43 seconds (ca. 2,400 roundtrips)
- set arraysize 1000: Time: 0.41 seconds (ca. 36 roundtrips)
Historically there was a limit of 1MB on the dbms_output buffer but that limit was removed in a fairly recent version of Oracle so it is possible to generate huge amounts of data in a single PL/SQL block. This means that whatever else you do to optimise for network traffic time you may still see time spent in “PGA memory operation” as your shadow process tries to allocate memory chunks for the buffer.
Index Usage – 3
In the second part of this series I described some of the technicalities of Index Usage Tracking and showed an example of what I was doing to test the feature. In this episode I’ll describe some of the index access methods I’ve tested and report the results. I’ve listed the tests I’ve planned so far and will show results as I find time to run the tests – if you can think of more cases add them in the comments and I’ll extend the list. (If you think a test is a very good idea, “upvote” it in the comments and I’ll try to run it sooner.)
Before I’ve done all the tests I’ll add a section on Conclusions and Recommendations. This will be addressing the questions: “Is the feature worth using?” and “What’s the best way to use it?”. This section of the note may develop over time as special cases or boundary conditions show up.
Setup (recap from pt.2)
I’ll be using three sessions to do my testing:
- Session 1 logged in as the test user to execute test statements and query dba_index_usage.
- Session 2 logged in as sys in the same pdb to query x$keiut and x$keiut_info/v$index_usage_info (real time summaries)
- Session 3 logged in as the oracle s/w owner and connected as sys in the cdb$root to call keiutflush to make mmon flush x$keiut to disk. The session may have to “alter session set container = {the test pdb}” to flush the right x$keiut.
The basic cycle for each test is:
- CDB SYS session
  - call keiutFlush twice to clear all active elements from x$keiut_info/v$index_usage_info (check contents)
- PDB SYS session
  - check x$keiut is empty
- User session
  - check dba_index_usage for targeted index(es)
  - execute some statements forcing use of index and check execution plan
- PDB SYS session
  - check contents of x$keiut and x$keiut_info/v$index_usage_info
- CDB SYS session
  - call keiutFlush
- End user session
  - check contents of dba_index_usage (report changes)
For each test I’ll show (in collapsed form) the SQL I used to define the objects being tested, and describe the purpose and method of the test. Then I’ll simply show the critical changes in dba_index_usage and, in some cases, in x$keiut_info/v$index_usage_info and x$keiut, as a result of the test.
Initial list of tests
- Gathering index stats
- “Foreign Key” indexes and primary key indexes during referential integrity
- Single table, single column index, multiple statements
- Unique index (equality and range predicates)
- non-unique index covering unique constraint
- non-unique index generally
- Single table, Multi-column index (access/filter, and skip scan)
- Inlist iterator
- Index-only query (range scan, full scan, fast full scan, index joins)
- Bitmap access (with multiple bitmaps per key value)
- Bitmap and / or / minus
- IOT accessed by primary key
- IOT accessed by secondary key – ensuring “misses on guesses”
- Two-table, nested loop join
- DML – single table access
- DML – with subquery / join methods embedded
- Locally partitioned index (single table partition, multiple table partition)
- Globally partitioned index
I’ve used the following script to generate common data for many of the initial tests in the list above – I may extend this script as I complete more and more of the tests, though I may also introduce separate scripts for some specific cases:
rem
rem Script: iut_02.sql
rem Author: Jonathan Lewis
rem Dated: Mar 2024
rem Purpose:
rem
rem Last tested
rem 19.11.0.0
rem To be tested
rem 23.3.0.0
rem 12.2.0.1
rem
rem Notes:
rem Create some test data to demonstrate index usage tracking
rem
create table t1 as
select
rownum - 1 id,
mod(rownum-1,10000) n1,
trunc((rownum - 1)/20) col1,
trunc((rownum - 1)/10) col2,
rownum - 1 col3,
round(100 * dbms_random.normal()) rand,
cast(rpad(rownum,25) as varchar2(25)) v1,
cast(rpad('x',80,'x') as varchar2(80)) padding
from
all_objects
where
rownum <= 50000
/
create table t2 as select * from t1;
-- single column primary key index (unique)
alter table t1 add constraint t1_pk primary key (id)
using index (
create unique index t1_pk on t1(id)
);
create index t1_col3 on t1(col3);
alter table t1 add constraint t1_uk unique (col3);
create index t1_3col on t1(col1, col2, col3) compress 2;
create index t1_rand on t1(rand);
-- multi-column primary key index (unique)
alter table t2 add constraint t2_pk primary key (col1, id)
using index (
create unique index t2_pk on t2(col1, id) compress
);
-- "foreign key index" (one to many)
alter table t2 add constraint t2_fk_t1 foreign key (col3) references t1(id);
create index t2_col3 on t2(col3);
The SQL used to gather index stats by any of the calls to dbms_stats.gather_xxx_stats() is a simple select statement that is hinted to access the index. In my case the indexes were all fairly small – smaller than the critical number of blocks that triggers the sampling methods – so Oracle examined every block and row in the index, leading to (a check in) index usage stats looking like the following t1 primary key example:
OBJECT_ID : 209180
NAME : T1_PK
OWNER : TEST_USER
TOTAL_ACCESS_COUNT : 1
TOTAL_EXEC_COUNT : 1
TOTAL_ROWS_RETURNED : 50000
BUCKET_1000_PLUS_ACCESS_COUNT : 1
BUCKET_1000_PLUS_ROWS_RETURNED : 50000
LAST_USED : 01-apr-2024 13:48:51
So gathering stats does flag the index as used – but with the enhanced tracking it’s going to be relatively easy to spot cases where a small number of accesses account for a large number of rows – hinting that these accesses may be only for stats collection.
If you’ve read the linked article you’ll see how Oracle’s sampling strategy has changed in recent years, so a check that will be relevant to some of your indexes is whether or not the average rows returned is equivalent to roughly 1,140 leaf blocks. Another simple check that might be appropriate is: “is the very large access happening once per day/week, in line with your expected stats collection strategy?”
Of course, if a large access is happening fairly frequently you can raise the question – does this look like an appropriate result for the table or index, or does it hint at an index that is used when it should not be, or an index that needs to be refined (e.g. by the addition of extra columns to allow better elimination before visiting the table)?
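A rough way of spotting the “probably just stats collection” pattern would be a query like the following against dba_index_usage – only a sketch, and you still have to apply your knowledge of the stats collection schedule to the results:
select  owner, name, total_access_count, total_rows_returned, last_used
from    dba_index_usage
where   total_access_count = bucket_1000_plus_access_count
and     total_access_count > 0
order by
        total_rows_returned desc
/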
Referential Integrity (and DML)
With referential integrity in place Oracle will (in the most commonly used setup):
- check that matching child rows do not exist when you try to delete a parent or modify its key value – will either operation flag a supporting “foreign key” index as used?
- check that the parent row exists if you try to insert a child row or update a child row to associate it with a different parent value – will either operation flag the parent primary key index as used?
In my test script t1 is the parent and t2 is the child. The referential integrity is from t2.col3 to t1.id and the relevant indexes are t2_col3 (the non-unique “foreign key” index) and t1_pk (the unique primary key index) respectively. Test statements are:
-- Delete parent when child exists
-- Delete parent when child deleted
-- --------------------------------
delete from t1 where id = 17000;
-- ORA-02292: integrity constraint (TEST_USER.T2_FK_T1) violated - child record found
delete from t2 where col3 = 17000;
delete from t1 where id = 17000;
commit;
-- Insert into child without parent
-- Insert into child with parent
-- --------------------------------
insert into t2 values(60000, 1,1,1,60000,1,'x','x');
-- ORA-02291: integrity constraint (TEST_USER.T2_FK_T1) violated - parent key not found
insert into t2 values (18000,1,1,1,15000,1,'x','x');
commit;
-- update parent to "abandon" child
-- --------------------------------
update t1 set id = 60000 where id = 25000;
-- ORA-02292: integrity constraint (TEST_USER.T2_FK_T1) violated - child record found
-- update child to change to non-existent parent
-- update child to change to pre-existing parent
-- ---------------------------------------------
update t2 set col3 = 60000 where id = 25000;
-- ORA-02291: integrity constraint (TEST_USER.T2_FK_T1) violated - parent key not found
update t2 set col3 = 30000 where id = 25000;
commit;
After calling keiutFlush and checking that there were no active elements in x$keiut_info/v$index_usage_info, and no rows in x$keiut, I executed all the statements above one after the other (some failed, of course, with their errors shown above). Nothing was captured in x$keiut.
Apart from the implications of “foreign key” indexes not being flagged as used during referential integrity checks, the tests above also show us that updates and deletes driven by index access do not show the driving index flagged as used: t1.id = constant, t2.col3 = constant, t2.id = constant (which used an index skip scan on t2_pk).
Single table, single column index
This set of tests is close to a repeat of the first demonstration in part 2. Here’s the list of statements aimed at index t1_pk. Note that t1_pk(id) is a unique index supporting a primary key constraint, t1_col3(col3) is a non-unique index covering the unique constraint t1_uk, and t2_col3(col3) is a single column non-unique index.
--
-- Out of range, no rows
--
select v1 from t1 where id = -1;
select v1 from t1 where id = 60001;
--
-- Single row
--
select v1 from t1 where id = 1000;
select v1 from t1 where id = 2000;
--
-- multiple rows: 5, 50, 500, 1500
--
select max(v1) from t1 where id between 11 and 15;
select max(v1) from t1 where id between 101 and 150;
select max(v1) from t1 where id between 1001 and 1500;
select max(v1) from t1 where id between 10001 and 11500;
Here are the results after the sequence: call keiutFlush, execute test SQL, report x$keiut, call keiutFlush, report change in dba_index_usage:
SQL> select objname, num_starts, num_execs, rows_returned from x$keiut;
OBJNAME NUM_STARTS NUM_EXECS ROWS_RETURNED
-------------------------------- ---------- ---------- -------------
TEST_USER.T1_PK 8 8 2057
SQL> host expand temp1.lst
OBJECT_ID : 209180
NAME : T1_PK
OWNER : TEST_USER
TOTAL_ACCESS_COUNT : 8
TOTAL_EXEC_COUNT : 8
TOTAL_ROWS_RETURNED : 2057
BUCKET_0_ACCESS_COUNT : 2
BUCKET_1_ACCESS_COUNT : 2
BUCKET_2_10_ACCESS_COUNT : 1
BUCKET_2_10_ROWS_RETURNED : 5
BUCKET_11_100_ACCESS_COUNT : 1
BUCKET_11_100_ROWS_RETURNED : 50
BUCKET_101_1000_ACCESS_COUNT : 1
BUCKET_101_1000_ROWS_RETURNED : 500
BUCKET_1000_PLUS_ACCESS_COUNT : 1
BUCKET_1000_PLUS_ROWS_RETURNED : 1500
LAST_USED : 02-apr-2024 15:29:06
You’ll note that I executed 8 select statements, and expected a total of 2,057 rows (index rowids) being passed to the table access operation, and the changes in stats shown in dba_index_usage are an exact match for the predictions above the listed SQL statements.
Since t1.col3 is an exact match of t1.id, and since t2 is a duplicate of t1, it seems likely that tests that start by cloning the SQL and changing the column or table name as appropriate would give the matching results – and they do, so I won’t bother to print them all up.
There is one final test of a single column index before I move on to simple queries targeting a multi-column index. Here’s the statement I want to test to confirm an important point:
select v1 from t1 where id between 101 and 150 and mod(id,10) = 0;
This query will scan through 50 index entries, discarding all but 5 of them, returning 5 rows from the table. The key question is this – will dba_index_usage report 50 rows accessed or 5 rows accessed? Here’s what the changes in stats looked like after the test:
TOTAL_ACCESS_COUNT : 1
TOTAL_EXEC_COUNT : 1
TOTAL_ROWS_RETURNED : 5
BUCKET_0_ACCESS_COUNT : 0
BUCKET_1_ACCESS_COUNT : 0
BUCKET_2_10_ACCESS_COUNT : 1
BUCKET_2_10_ROWS_RETURNED : 5
BUCKET_11_100_ACCESS_COUNT : 0
BUCKET_11_100_ROWS_RETURNED : 0
BUCKET_101_1000_ACCESS_COUNT : 0
BUCKET_101_1000_ROWS_RETURNED : 0
BUCKET_1000_PLUS_ACCESS_COUNT : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0
This is a very important point: the stats in dba_index_usage do not tell us how many rows (index entries) we visited in the index, they tell us how many rowids (or, possibly, key values) survived to be passed to the parent operation (typically the table access). So you might look at some stats that say: “25,000 executions, all in the 2 to 10 range – good index” when you’ve got a badly designed index that does 90% of the total work of a query and discards 2,000 index entries for each rowid it uses to access a table.
Inlist Iterators
Here’s a sample query (with its result set, and actual execution plan pulled from memory) accessing the t1 table through the index on column rand. If you try to repeat this example it probably won’t give you exactly the same results because I used Oracle’s random number generator to generate a normal distribution of integer values (with mean zero and standard deviation of 100), but there’s a reasonable chance that you’ll see similar numbers in your output as I’ve been careful to pick three values that should return significantly different numbers of rows:
select
rand, count(*)
from t1
where rand in (100, 200, 300)
and v1 is not null
group by
rand
order by
count(*)
/
RAND COUNT(*)
---------- ----------
300 2
200 25
100 114
3 rows selected.
select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------
SQL_ID fbwmnax39jqtq, child number 0
-------------------------------------
select rand, count(*) from t1 where rand in (100, 200,
300) and v1 is not null group by rand order by
count(*)
Plan hash value: 874747030
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | 143 | | | |
| 1 | SORT ORDER BY | | 1 | 3 | 3 |00:00:00.01 | 143 | 2048 | 2048 | 2048 (0)|
| 2 | SORT GROUP BY NOSORT | | 1 | 3 | 3 |00:00:00.01 | 143 | | | |
| 3 | INLIST ITERATOR | | 1 | | 141 |00:00:00.01 | 143 | | | |
|* 4 | TABLE ACCESS BY INDEX ROWID| T1 | 3 | 220 | 141 |00:00:00.01 | 143 | | | |
|* 5 | INDEX RANGE SCAN | T1_RAND | 3 | 220 | 141 |00:00:00.01 | 5 | | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("V1" IS NOT NULL)
5 - access(("RAND"=100 OR "RAND"=200 OR "RAND"=300))
This plan is probably the simplest demonstration of the difference between total_exec_count and total_access_count. Here’s the output from x$keiut after this test, followed by the details available from dba_index_usage after calling keiutFlush:
OBJNAME NUM_STARTS NUM_EXECS ROWS_RETURNED
-------------------------------- ---------- ---------- -------------
TEST_USER.T1_RAND 3 1 141
Given that x$keiut has reported three “starts”, that the plan shows three “Starts” with an “A-Rows” of 141, and that the output shows three rows totalling 141, we might hope that dba_index_usage is going to show us exactly what happened with this query. Here are the results:
SQL> execute print_table(q'(select * from dba_index_usage where name = 'T1_RAND')')
OBJECT_ID : 209738
NAME : T1_RAND
OWNER : TEST_USER
TOTAL_ACCESS_COUNT : 3
TOTAL_EXEC_COUNT : 1
TOTAL_ROWS_RETURNED : 141
BUCKET_0_ACCESS_COUNT : 0
BUCKET_1_ACCESS_COUNT : 0
BUCKET_2_10_ACCESS_COUNT : 0
BUCKET_2_10_ROWS_RETURNED : 0
BUCKET_11_100_ACCESS_COUNT : 3
BUCKET_11_100_ROWS_RETURNED : 141
BUCKET_101_1000_ACCESS_COUNT : 0
BUCKET_101_1000_ROWS_RETURNED : 0
BUCKET_1000_PLUS_ACCESS_COUNT : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0
LAST_USED : 02-apr-2024 19:02:03
The three “starts” from x$keiut show up in the total_access_count but the three separate accesses have been accumulated and averaged to appear as 3 accesses in the 11-100 range (141 / 3 = 47). So there’s some loss of precision in the information. It’s better than just the old “yes/no” report, and in many cases it’s likely to give you numbers that are close enough to the right ballpark to be useful, but there are likely to be some cases where the more detailed information would be more beneficial. It’s worth mentioning that the total_access_count (hence the averaging) will include “Starts” where no rows were returned.
Conclusions and Suggestions
Calls to gather index stats execute queries that will be captured by index usage tracking, so you need a strategy to help you ignore “BUCKET_1000_PLUS_ROWS_RETURNED” accesses that are about stats collection, while taking action when they are indications of index usage that could be made more efficient.
Indexes used during referential integrity checks are not flagged as used. This is probably not much of a threat for the relevant (primary/unique key) indexes on the parent end since you are unlikely to drop indexes that enforce uniqueness even if they don’t appear to be used; however it does mean that there is some risk of dropping a “foreign key” index that is needed to avoid locking problems.
Indexes used to drive delete and update statements are not captured by the new mechanism – at least for simple “delete from single_table where …” and “update table where …”. This could result in indexes being dropped that have been created to enhance DML performance. (There are outstanding tests to check whether DML with subqueries using indexes, or DML driven by joins using indexes, will flag the indexes used to access the other tables in the statement.)
The statistics about “rows returned” tell you how many index entries are passed to the parent operation, not about the number of index entries examined; so a frequently used index that consistently reports a very small number of “rows returned” is not necessarily an efficient, well-designed index.
The stats on inlist iterators are a little disappointing: while the number of iterations appears as the total_access_count, the accesses are all allocated to the same bucket on the basis of total_rows_returned / total_access_count and no allowance is made for different values in the list returning significantly different numbers of rows.
Index Usage – 2
In the first part of this series I introduced Index Usage Tracking and the view dba_index_usage – a feature that appeared in 12.2 as a replacement for index monitoring and the view dba_object_usage. In this note I’ll give a quick sketch of the technicalities of the implementation and comments on how to test and use the feature. Actual tests, results and observations will have to wait until part 3.
A not very deep dive
There are three parameters relating to Index Usage Tracking (iut), shown below with their default values:
- _iut_enable [TRUE]
- _iut_max_entries [30000]
- _iut_stat_collection_type [SAMPLED]
The feature is, as you can see, enabled by default; the tracking, however, is “sampled”, which seems to mean that a tiny number of executions end up being tracked. I can’t find any information about how the sampling is done, and having tried a few tests that executed thousands of statements in a couple of minutes without capturing any details of index usage I’ve given up trying and done all my testing with “_iut_stat_collection_type” set to ALL.
SQL> alter session set "_iut_stat_collection_type"=all;
According to a note on MOS (Doc ID 2977302.1) it doesn’t matter whether you set this parameter for the session or the system: the effect is the same. I found that this seemed to be true in my testing on Oracle 19.11 – either way the effect appeared across all sessions connecting to the PDB, though it didn’t seem to persist across a database restart.
The parameter _iut_max_entries probably limits the amount of global memory allowed for collecting stats about indexes. You might ask whether the 30,000 is per PDB or for the entire instance; I suspect it’s for the instance as a whole, but I’m not going to run up a test to scale on that. While I know of several 3rd party applications holding far more indexes than this, the number is probably sufficient for most investigations.
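If you want to check the current values of these hidden parameters the standard x$ksppi / x$ksppcv query (run as SYS, with the usual caveats about poking at x$ structures and changing hidden parameters) will do the job; this is just a sketch of the common pattern:
select
        i.ksppinm       parameter_name,
        v.ksppstvl      parameter_value
from
        x$ksppi         i,
        x$ksppcv        v
where
        v.indx = i.indx
and     i.ksppinm like '\_iut%' escape '\'
order by
        i.ksppinm
/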
There are eight objects visibly related to Index Usage Tracking: three views, one table, three memory structures and one latch:
- dba_index_usage – the user (dba) friendly view of the accumulated statistics of index usage
- cdb_index_usage – the cdb equivalent of the above
- v$index_usage_info – a view (holding one row) summarising the current tracking status
- sys.wri$_index_usage – the main table behind the xxx_index_usage views above; the views join this table to obj$ and user$, so dropped indexes (and users) disappear from the views.
- x$keiut_info – the memory structure (held in the shared pool) behind the v$index_usage_info
- x$keiut – a structure holding a brief summary for each index actively being tracked. This is generated on demand in the session/process memory and my guess is that it’s an extract or summary of a larger memory structure in the shared pool holding the full histogram data for each index.
- htab_keiutsg – a (small) memory allocation reported by v$sgastat in the shared pool. In my 19.11 the memory size was initially 512 bytes, and in a test with 140 indexes showing up in x$keiut the memory reported was still only 512 bytes (so it’s not a simple list of pointers, more likely a set of pointers to pointers/arrays) – see the query sketches after this list.
- “keiut hash table modification” – a single parent latch which I assume protects the htab_keiutsg memory. It’s possible that this latch is used to add an entry to the x$keiut structure (or, rather, the larger structure behind it) when an index is first tracked by the software, and that each entry in that larger structure is then protected by its own mutex to minimise collision time on updates as the stats are updated (or cleared after flushing).
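If you want to check those last two items for yourself, a couple of simple queries (run as SYS, and assuming the names appear exactly as I’ve reported them above) should be enough:
select  pool, name, bytes
from    v$sgastat
where   name = 'htab_keiutsg'
/

select  name, gets, misses
from    v$latch_parent
where   name = 'keiut hash table modification'
/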
Given that there’s a limit of 30,000 for iut_max_entries and only a small memory allocation for the keiut hash table, it does sound as if Oracle could end up walking a fairly long linked list or array to find the correct entry to update, which makes me wonder about two things: first, have I missed something obvious; secondly, will Oracle skip updating the stats if the alternative means waiting for a mutex? There’s also the question of whether Oracle simply stops collecting when the limit is reached or whether there’s some sort of LRU algorithm that allows it to discard entries for rarely used indexes to get maximum benefit from the available limit.
Another thought that goes with the 30,000 limit: I can find the merge statement that Oracle uses to update the wri$_index_usage table when the stats are flushed from memory to table (an activity that takes place every 15 minutes, with no obvious parameter to change the timing). In my 19.11 instance its sql_id is 5cu0x10yu88sw, and it starts with the text:
merge into
sys.wri$_index_usage iu
using
dual
on (iu.obj# = :objn)
when matched then
update set
iu.total_access_count = iu.total_access_count + :ns,
iu.total_rows_returned = iu.total_rows_returned + :rr,
iu.total_exec_count = iu.total_exec_count + :ne,
...
This statement updates the table one row at a time (which you can confirm if you can find it in v$sql and compare rows_processed with executions). This could take a significant amount of time to complete on a system with a very large number of indexes.
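The check is easy if the statement is still in the library cache – something like the following, using the sql_id I reported above (the sql_id should be the same on your system if the statement text is identical, but it may vary across versions):
select
        sql_id, executions, rows_processed,
        round(rows_processed / nullif(executions,0), 2)  rows_per_exec
from
        v$sql
where
        sql_id = '5cu0x10yu88sw'
/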
The other thing that comes with finding the merge statement is that I couldn’t find any indication that there is a delete statement – either in v$sql, or in the Oracle executable. Spreading the search a little further I queried dba_dependencies and found that the package dbms_auto_index_internal references wri$_index_usage and various of the “autotask” packages – so perhaps there’s something a couple of layers further down the PL/SQL stack that generates dynamic SQL to delete tracking data. On the other hand, there are entries in my copy of wri$_index_usage where the last_used column has dates going back to September 2021, and there are a number of rows where the reported index has been dropped.
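For reference, the dependency search was nothing more sophisticated than something like:
select
        owner, name, type
from
        dba_dependencies
where
        referenced_owner = 'SYS'
and     referenced_name  = 'WRI$_INDEX_USAGE'
order by
        owner, name
/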
Testing the feature
The most significant difficulty testing the mechanism is that it flushes the in-memory stats to the table every 15 minutes, and it’s only possible to see the histogram of index usage from the table. Fortunately it is possible to use oradebug to force mmon to trigger a flush, but I found in my Oracle 19.11 PDB I had to have a session logged into the server as the owner of the Oracle executable, and logged into the cdb$root as the SYS user (though a couple of colleagues had different degrees of success on different versions of Oracle and O/S). The following is a cut and paste after logging in showing appropriate calls to oradebug:
SQL> oradebug setorapname mmon
Oracle pid: 31, Unix process pid: 11580, image: oracle@linux19c (MMON)
SQL> oradebug call keiutFlush
Function returned 0
SQL>
Initially I had assumed I could log on as a rather more ordinary O/S user and connect as SYS to the PDB, but this produced an unexpected error when I tried to execute the flush call:
SQL> oradebug call keiutFlush
ORA-32519: insufficient privileges to execute ORADEBUG command: OS debugger privileges required for client
In my testing, then, I’m going to open three sessions:
- End-user session – a session to execute some carefully designed queries.
- cdb$root SYS session – a session to flush stats from memory to disc.
- PDB SYS session – a session to show the effects of the end-user activity (reporting figures from x$keiut_info, x$keiut, and dba_index_usage)
I’ll be running some simple tests, covering select, insert, update, delete and merge statements with single-column indexes, multi-column indexes, locally partitioned indexes, single table queries, nested loop joins, range scans, fast full scans, skip scans, inlist iterators, union views, stats collection and referential integrity. For each test I’ll describe how the index will be used, then show what the stats look like. Given that what we really need to see are the changes in x$keiut and dba_index_usage I’ll only show the complete “before / after” values in one example here. In part 3 of the series you’ll have to trust that I can do the arithmetic and report the changes correctly.
Example
From the end-user session I have a table created with the following code:
rem
rem Script: iut_01.sql
rem Author: Jonathan Lewis
rem Dated: Mar 2024
rem Purpose:
rem
rem Last tested
rem 19.11.0.0
create table t1 as
select
rownum id,
mod(rownum-1,10000) n1,
trunc((rownum - 1)/20) col1,
trunc((rownum - 1)/10) col2,
rownum - 1 col3,
cast(rpad(rownum,25) as varchar2(25)) v1,
cast(rpad('x',80,'x') as varchar2(80)) padding
from
all_objects
where
rownum <= 50000
/
create index t1_pk on t1(id);
create index t1_n1 on t1(n1);
create index t1_i1 on t1(col1, col2, col3);
From the cdb$root logged on as oracle (executable owner) and connected as SYS:
SQL> startup force
ORACLE instance started.
Total System Global Area 1476391568 bytes
Fixed Size 9134736 bytes
Variable Size 822083584 bytes
Database Buffers 637534208 bytes
Redo Buffers 7639040 bytes
Database mounted.
Database opened.
SQL> oradebug setorapname mmon
Oracle pid: 31, Unix process pid: 27738, image: oracle@linux19c (MMON)
SQL> oradebug call keiutFlush
Function returned 0
From an ordinary O/S user, connected to the PDB as SYS:
SQL> select index_stats_collection_type, alloc_elem_count, active_elem_count, flush_count, last_flush_time from x$keiut_info;
INDEX_STATS_COLLECTION_TYPE ALLOC_ELEM_COUNT ACTIVE_ELEM_COUNT FLUSH_COUNT LAST_FLUSH_TIME
--------------------------- ---------------- ----------------- ----------- ----------------------------
1 0 0 2 19-MAR-24 10.53.50.584 PM
1 row selected.
SQL> alter session set "_iut_stat_collection_type"=all;
Session altered.
SQL> select index_stats_collection_type, alloc_elem_count, active_elem_count, flush_count, last_flush_time from x$keiut_info;
INDEX_STATS_COLLECTION_TYPE ALLOC_ELEM_COUNT ACTIVE_ELEM_COUNT FLUSH_COUNT LAST_FLUSH_TIME
--------------------------- ---------------- ----------------- ----------- ----------------------------
0 0 0 2 19-MAR-24 10.53.50.584 PM
1 row selected.
Note how the index_stats_collection_type changes from 1 to 0 after the “alter session”. I don’t know why the flush_count showed up as 2 when I had only flushed once – but perhaps the second flush is a side effect of altering the collection type.
From an ordinary end-user session
SQL> set feedback only
SQL> select n1 from t1 where id between 1 and 5;
5 rows selected.
SQL> select n1 from t1 where id between 1 and 5;
5 rows selected.
SQL> select n1 from t1 where id between 1 and 50;
50 rows selected.
These queries will use the index t1_pk in an index range scan to access the table by rowid.
From the PDB / SYS
SQL> select index_stats_collection_type, alloc_elem_count, active_elem_count, flush_count, last_flush_time from x$keiut_info;
INDEX_STATS_COLLECTION_TYPE ALLOC_ELEM_COUNT ACTIVE_ELEM_COUNT FLUSH_COUNT LAST_FLUSH_TIME
--------------------------- ---------------- ----------------- ----------- ----------------------------
0 1 1 2 19-MAR-24 10.53.50.584 PM
1 row selected.
SQL> select objnum, objname, num_starts, num_execs, rows_returned from x$keiut;
OBJNUM OBJNAME NUM_STARTS NUM_EXECS ROWS_RETURNED
---------- -------------------------------- ---------- ---------- -------------
208077 TEST_USER.T1_PK 3 3 60
1 row selected.
In the x$keiut_info you can see that Oracle has now allocated one “element”, and has one “active” element. Checking x$keiut (which will report some details of each active element) we can see that my t1_pk index has been used in 3 statement executions, starting a scan a total of 3 times (which matches our expectation) with a total of 60 (= 5 + 5 + 50) rows returned. Of course all we could infer from this one row is that we have returned an average of 20 rows per start, and an average of one start per execution.
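A couple of derived columns make that averaging explicit; this is only a sketch against the x$keiut columns I’ve already listed:
select
        objname,
        num_execs,
        num_starts,
        rows_returned,
        round(rows_returned / nullif(num_starts,0), 1)  rows_per_start,
        round(num_starts    / nullif(num_execs,0), 1)   starts_per_exec
from
        x$keiut
/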
From the cdb$root SYS
SQL> oradebug call keiutFlush
Function returned 0
From the PDB SYS (using Tom Kyte’s “print_table”)
SQL> execute print_table(q'[select * from dba_index_usage where name = 'T1_PK' and owner = 'TEST_USER']')
OBJECT_ID : 208077
NAME : T1_PK
OWNER : TEST_USER
TOTAL_ACCESS_COUNT : 3
TOTAL_EXEC_COUNT : 3
TOTAL_ROWS_RETURNED : 60
BUCKET_0_ACCESS_COUNT : 0
BUCKET_1_ACCESS_COUNT : 0
BUCKET_2_10_ACCESS_COUNT : 2
BUCKET_2_10_ROWS_RETURNED : 10
BUCKET_11_100_ACCESS_COUNT : 1
BUCKET_11_100_ROWS_RETURNED : 50
BUCKET_101_1000_ACCESS_COUNT : 0
BUCKET_101_1000_ROWS_RETURNED : 0
BUCKET_1000_PLUS_ACCESS_COUNT : 0
BUCKET_1000_PLUS_ROWS_RETURNED : 0
LAST_USED : 19-mar-2024 23:08:02
From the data saved in the table we can see that we’ve logged 3 accesses, of which 2 accesses returned (individually) something between 2 and 10 rows (rowids) for a total of 10 rows (5 + 5) and one access returned (individually) something between 11 and 100 rows (rowids) for a total of 50 rows.
Of course we can say confidently that the one larger access actually did return 50 rows; but looking at nothing but these figures we can’t infer that the other two accesses returned 5 rows each: it could have been one query returning 2 rows and the other returning 8, or 3 and 7, or 4 and 6. Still, we do get a reasonable indication of the volume of data from the breakdown of 0, 1, 2 – 10, 11 – 100, 101 – 1000, 1000+.
You might note that we can also see our flush time (reported below) reappearing as the last_used date and time – so we know that we are looking at current statistics.
From the PDB / SYS (again)
SQL> select index_stats_collection_type, alloc_elem_count, active_elem_count, flush_count, last_flush_time from x$keiut_info;
INDEX_STATS_COLLECTION_TYPE ALLOC_ELEM_COUNT ACTIVE_ELEM_COUNT FLUSH_COUNT LAST_FLUSH_TIME
--------------------------- ---------------- ----------------- ----------- ----------------------------
0 1 1 3 19-MAR-24 11.08.02.013 PM
1 row selected.
SQL> select objnum, objname, num_starts, num_execs, rows_returned from x$keiut;
OBJNUM OBJNAME NUM_STARTS NUM_EXECS ROWS_RETURNED
---------- -------------------------------- ---------- ---------- -------------
208077 TEST_USER.T1_PK 0 0 0
1 row selected.
The x$keiut_info shows that a third flush has taken place (and any index flushed at that time will have its last_used set very near to that flush time – the merge command uses sysdate, so the last_used could be a tiny bit after the last_flush_time). It still shows an “active” element and when we check x$keiut we find that t1_pk is still listed but the stats have been reset to zero across the board.
If we were to repeat the flush command the active count would drop to zero and the t1_pk entry would disappear from x$keiut. (Oracle doesn’t remove an element until an entire tracking period has passed with no accesses – a typical type of “lazy” strategy aimed at avoiding unnecessary work.)
That’s all for now – if there are any questions put them in the comments and if their answers belong in this note I’ll update the note.
Index Usage – 1
In 12.2 Oracle introduced Index Usage Tracking to replace the previous option for “alter index xxx monitoring usage”. A recent post on the Oracle database discussion forum prompted me to look for articles about this “new” feature and what people had to say about it. There didn’t seem to be much information online – just a handful of articles starting with Tim Hall a few years ago and ending with Maria Colgan a few months ago – so I thought I’d update my notes a little and publish them.
Unfortunately, by the time I’d written the first 6 pages it was starting to feel like very heavy going, so I decided to rewrite it as a mini-series. In part one I’ll just give you some descriptions and explanations that are missing from the manuals; in part two I’ll do a bit of a fairly shallow dive to talk about what’s happening behind the scenes and how you can do some experiments; in part three I’ll describe some of the experiments and show the results that justify the descriptions I’ve given here in part one.
History
In the bad old days you could enable “monitoring” on an index to see if it was being used. The command to do this was:
alter index {index name} monitoring usage;
After executing this statement you would wait for a bit then check the view dba_object_usage:
SQL> desc dba_object_usage
Name Null? Type
----------------------------- -------- --------------------
OWNER NOT NULL VARCHAR2(128)
INDEX_NAME NOT NULL VARCHAR2(128)
TABLE_NAME NOT NULL VARCHAR2(128)
MONITORING VARCHAR2(3)
USED VARCHAR2(3)
START_MONITORING VARCHAR2(19)
END_MONITORING VARCHAR2(19)
SQL> select * from dba_object_usage;
OWNER INDEX_NAME TABLE_NAME MON USE START_MONITORING END_MONITORING
--------------- -------------------- ------------------------- --- --- ------------------- -------------------
TEST_USER T2_I1 T2 YES YES 03/12/2024 15:31:35
1 row selected.
As you can see, this didn’t give you much information – just “yes it has been used” or “no it hasn’t been used” since the moment you started monitoring it; and that’s almost totally useless as an aid to measuring or understanding the effectiveness of the index.
Apart from the almost complete absence of information, there were collateral issues: I think that, initially, gathering stats, index rebuilds and using explain plan would flag an index as used; at the opposite extreme indexes that were actually used to avoid foreign key locking problems were not flagged as used.
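For completeness – and because it’s easy to forget – the matching command to switch the old mechanism off again was:
alter index {index name} nomonitoring usage;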
And now for something completely different
The promise of Index Usage Tracking is clearly visible in the description of the view you use to report the details captured:
SQL> desc dba_index_usage
Name Null? Type
----------------------------------- -------- ------------------------
OBJECT_ID NOT NULL NUMBER
NAME NOT NULL VARCHAR2(128)
OWNER NOT NULL VARCHAR2(128)
TOTAL_ACCESS_COUNT NUMBER
TOTAL_EXEC_COUNT NUMBER
TOTAL_ROWS_RETURNED NUMBER
BUCKET_0_ACCESS_COUNT NUMBER
BUCKET_1_ACCESS_COUNT NUMBER
BUCKET_2_10_ACCESS_COUNT NUMBER
BUCKET_2_10_ROWS_RETURNED NUMBER
BUCKET_11_100_ACCESS_COUNT NUMBER
BUCKET_11_100_ROWS_RETURNED NUMBER
BUCKET_101_1000_ACCESS_COUNT NUMBER
BUCKET_101_1000_ROWS_RETURNED NUMBER
BUCKET_1000_PLUS_ACCESS_COUNT NUMBER
BUCKET_1000_PLUS_ROWS_RETURNED NUMBER
LAST_USED DATE
Though the columns are not very well described in the reference manuals you can see very clearly that there’s a lot more detail than just “yes/no” here. The columns clearly carry information about “how many times” and “how much data”, breaking the numbers down across a small range-based histogram. Here’s an example of output (using Tom Kyte’s print_table() routine to turn columns to rows):
SQL> execute print_table('select * from dba_index_usage where name = ''T1_I1''')
OBJECT_ID : 206312
NAME : T1_I1
OWNER : TEST_USER
TOTAL_ACCESS_COUNT : 889
TOTAL_EXEC_COUNT : 45
TOTAL_ROWS_RETURNED : 17850
BUCKET_0_ACCESS_COUNT : 0
BUCKET_1_ACCESS_COUNT : 0
BUCKET_2_10_ACCESS_COUNT : 0
BUCKET_2_10_ROWS_RETURNED : 0
BUCKET_11_100_ACCESS_COUNT : 878
BUCKET_11_100_ROWS_RETURNED : 13200
BUCKET_101_1000_ACCESS_COUNT : 9
BUCKET_101_1000_ROWS_RETURNED : 1650
BUCKET_1000_PLUS_ACCESS_COUNT : 2
BUCKET_1000_PLUS_ROWS_RETURNED : 3000
LAST_USED : 11-mar-2024 20:26:26
The order of the columns is just a little odd (in my opinion) so I’ve switched two of them around in my descriptions below:
- Total_exec_count: is the total number of executions that have been captured for SQL statements using this index.
- Total_access_count: is the total number of scans of this index that have been observed. If you think of a nested loop join you will appreciate that a single execution of an SQL statement could result in many accesses of an index – viz: an index range scan into the inner (second) table may happen many times, once for each row acquired from the outer (first) table.
- Total_rows_returned: carries a little trap in the word rows, and in the word returned. In this context “rows” means “index entries”, and “returned” means “passed to the parent operation”. (To be confirmed / clarified)
- Bucket_0_access_count: how many index accesses found no rows (and there’s no bucket_0_row_count because it would always be 0).
- Bucket_1_access_count: how many index accesses found just one row (and there’s no bucket_1_row_count because that would always match the access count).
- Bucket_M_N_access_count: how many index accesses found between M and N rows.
- Bucket_M_N_row_count: sum of rows across all the index accesses that returned between M and N rows.
- Last_used: date and time of the last flush that updated this row of the table/view.
The most important omission in the descriptions given in the manuals is the difference between total_exec_count and total_access_count. (It was a comment on Maria Colgan’s blog note asking about the difference that persuaded me that I really had to write this note.) If you don’t know what an “access” is supposed to be you can’t really know how to interpret the rest of the numbers.
Take another look at the sample output above: it shows 45 executions and 889 accesses. I happen to know (because I did the test) that most of the work I’ve done in this interval has been reporting a two-table join that uses a nested loop from t2 into t1 using an index range scan on index t1_i1 to access table t1. I know my data well enough to know that every time I run my query it’s going to find about 20 rows in t2, and that for every row I find in t2 there will be roughly 15 rows that I will access in t1 through the index.
Give or take a little extra activity round the edges that blurs the numbers, I can see that the numbers make sense:
- 45 executions x 20 rows from t2 = 900 index range scans through t1_i1
- 878 index range scans x 15 rows per scan = 13,170
The numbers are in the right ball-park to meet my expectations. But we do have 11 more accesses reported – 9 of them reported an average of 1,650/9 = 183 rows, 2 of them reported an average of 3,000/2 = 1,500 rows. Again, I know what I did, so I can explain why those numbers have appeared, but in real life you may have to do a little work to find a reasonable explanation. (Spoiler: be suspicious about gathering index stats.)
It’s possible, for example, that there are a few rows in the t2 table that have far more than the 15 row average in t1 and the larger numbers are just some examples from the nested loop query that happened to hit a couple of these outliers in t2. (It’s worth highlighting, as a follow-up to this suggestion, that a single execution could end up reporting accesses and row counts in multiple buckets.)
In fact the 9 “medium sized” accesses were the result of single table queries using a “between” clause that ranged through 10 to 15 values of t1 (returning 150 to 225 rows each), and the two “large” accesses were the result of two index-only queries where I forced an index full scan and an index fast full scan that discarded half the rows of an index holding 3,000 entries.
As I said, I’ll be presenting a few examples in part 3, but a guideline that may be helpful when considering the executions, accesses, and rowcounts is this: if you’re familiar with the SQL Monitor report then you’ll know that each call to dbms_sql_monitor.report_sql_monitor() reports one execution – then the Starts column for any index operation will (probably) be the total access count, and the Rows (Actual) column will (probably) be the total rows returned. As noted above, though, any one execution may end up splitting the total Starts and Rows (Actual) across multiple buckets.
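If you want to produce that SQL Monitor report to cross-check against the tracking figures, a call along the following lines will do it – this is just a sketch: you’ll need to supply a suitable sql_id and have the licences (Diagnostics and Tuning packs) required to use SQL Monitor:
select
        dbms_sql_monitor.report_sql_monitor(
                sql_id       => '&m_sql_id',
                type         => 'TEXT',
                report_level => 'ALL'
        )
from
        dual
/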
Some questions to investigate
I hope this has given you enough information to get you interested in Index Usage Tracking, and some idea of what you’re looking at when you start using the view. There are, however, some technical details you will need to know if you want to do some testing before taking any major steps in production. There are also some questions that ought to be addressed before jumping to conclusions about what the numbers mean, so I thought I’d list several questions that came to mind when I first read about the feature:
- Does a call to dbms_stats.gather_index_stats result in an update to the index usage stats, and does it matter?
- Does a call to explain plan result in an update to the index usage stats, and does it matter.
- Do referential integrity checks result in the parent or child indexes being reported in the usage stats. What if there is a parent delete with “on delete cascade” on the child.
- Do inserts, updates, deletes or merges produce any unexpected results (e.g. double / quadruple counting); what if they’re PL/SQL forall bulk processing, what if (e.g.) you update or delete through a join view.
- Does an index skip scan count as a single access, or does Oracle count each skip as a separate access (I’d hope it would be one access).
- If you have an index range scan with a filter predicate applied to each index entry after the access predicate is the “rows returned” the number of index entries examined (accessed), or the number that survive the filter. (I would like it to be the number examined because that’s the real measure of the work done in the index but the name suggests it counts the survivors.)
- Does an index fast full scan get reported correctly.
- Are IOTs accounted differently from ordinary B-tree indexes
- For bitmap indexes what is a “row” and what does the tracking information look like?
- If you have an “Inlist Iterator” operation does this get summed into one access, or is it an access per iteration (which is what I would expect). And how is the logic applied with partitioned table iteration.
- Does a UNION or UNION ALL operation count multiple accesses (I would expect so), and what happens with things like nvl_or_expansion with “conditional” branches.
- Does a “connect by pump” through an index produce any unexpected results
- Can index usage tracking tell us anything about Domain indexes
- Are there any types of indexes that are not tracked (sys indexes, for example)
If you can think of any other questions where “something different” might happen, feel free to add them as comments.
Summary
Index Usage Tracking (and the supporting view dba_index_usage) can give you a good insight into how Oracle is using your indexes. This note explains the meaning of the data reported in the view and offers a couple of ideas about how you may need to interpret the numbers for a single index.
In the next two articles we’ll look at some of the technical aspects of the feature (including how to enable and test it), and the results captured from different patterns of query execution, concluding (possibly in a 4th article) in suggestions of how to use the feature in a production system.
Footnote
At the start of this note I said it had been prompted by a question on one of the Oracle forums. The thread was about identifying indexes that could be dropped and the question was basically: “Is the old index monitoring obsolete?” The answer is “Yes, definitely, and it has been for years.”
Querying LONGs
Update for 23c: If your only need for using LONGs in predicates is to query the partitioning views by high_value you won’t need to read this note as the views now expose columns high_value_clob and high_value_json. (See comment #3 below.)
Despite their continued presence in the Oracle data dictionary, LONG columns are not an option that anyone should choose; you can’t do much with them and they introduce a number of strange space management problems. Nevertheless a recent thread on the Oracle database forum started with the following question: “How do you use LONG columns in a WHERE clause?”. The basic answer is: “You don’t”.
This user wanted to query all_tab_partitions for a given table_name and high_value, and the high_value is (still) a LONG, so attempts to use it resulted in Oracle error “ORA-00997: illegal use of LONG datatype”. A possible, and fairly obvious but undesirable, solution to the requirement is to write a PL/SQL function to read the current row from all_tab_partitions and return the first N characters of the high_value as a varchar2(). Here’s a version (not quite the one I posted) of such a function, with a sample of use:
rem
rem Script: get_high_value.sql
rem Author: Jonathan Lewis
rem Dated: Mar 2024
rem
create or replace function get_high_value (
i_tab_owner varchar2,
i_tab_name varchar2,
i_part_name varchar2,
i_part_posn number
)
return varchar2
is
v1 varchar2(4000);
begin
select atp.high_value
into v1
from all_tab_partitions atp
where atp.table_owner = upper(i_tab_owner)
and atp.table_name = upper(i_tab_name)
and atp.partition_name = upper(i_part_name)
and atp.partition_position = upper(i_part_posn)
;
return v1;
end;
/
select
apt.table_owner, apt.table_name,
apt.tablespace_name,
apt.partition_name, apt.partition_position,
apt.high_value
from
all_tab_partitions apt
where
apt.table_owner = 'TEST_USER'
and apt.table_name = 'PT_RANGE'
and get_high_value(
apt.table_owner,
apt.table_name,
apt.partition_name,
apt.partition_position
) = '200'
/
This seemed to work quite well and sufficiently rapidly – but I only had two partitioned tables in my schema and a total of 12 partitions, so it’s not sensible to look at the clock to see how efficient the query is.
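For anyone who wants to recreate the test, the pt_range table I queried could be created with something like the following – this is a reconstruction based only on the partition names and boundary values that appear in the outputs in this note, not the script I actually used:
create table pt_range (
        id      number(8,0),
        v1      varchar2(10)
)
partition by range (id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
/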
Another possible solution introduced me to a function that has been around for years (and many versions) which I had never come across: sys_dburigen(). PaulZip supplied the following code (which I’ve altered cosmetically and edited to pick up a table in my schema):
select *
from (
select
dbms_lob.substr(
sys_dburigen (
atp.table_owner,
atp.table_name,
atp.partition_name,
atp.partition_position,
atp.high_value,
'text()'
).getclob(), 4000, 1) high_value_str,
atp.table_owner, atp.table_name, atp.partition_name,
atp.tablespace_name, atp.high_value
from all_tab_partitions atp
where atp.table_owner = 'TEST_USER'
and atp.table_name = 'PT_RANGE'
)
where high_value_str = '200'
/
This was so cute, and looked like a much nicer (i.e. “legal”) solution than my PL/SQL hacking, that I had to take a closer look at sys_dburigen() – first to understand what it was supposed to achieve (yes, I do RTFM) then to see how it actually worked.
Something I did first was simply to strip back the layers of the expression used to supply the high_value_str, which took me through the following four combinations (with and without ‘text()’, with and without getclob()). Each expression is followed by the result for the row selected above:
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value, 'text()').getclob()
200
---
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value).getclob()
<?xml version="1.0"?><HIGH_VALUE>200</HIGH_VALUE>
--
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value, 'text()')
DBURITYPE('/PUBLIC/ALL_TAB_PARTITIONS/ROW[TABLE_OWNER=''TEST_USER'' and TABLE_NAME=''PT_RANGE'' and PARTITION_NAME=''P200'' and PARTITION_POSITION=''1'']/HIGH_VALUE/text()', NULL)
--
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value)
DBURITYPE('/PUBLIC/ALL_TAB_PARTITIONS/ROW[TABLE_OWNER=''TEST_USER'' and TABLE_NAME=''PT_RANGE'' and PARTITION_NAME=''P200'' and PARTITION_POSITION=''1'']/HIGH_VALUE', NULL)
Working from the bottom pair up we see that we start by generating a dburitype which defines the type of thing we want to query and the restriction we want to use while querying. The ‘text()’ option simply adds an extra detail to the dburitype.
The top pair shows us that the get_clob() will then return the value we have requested, either as an XML value, or as the text value described by the XML value if we’ve supplied the ‘text()’ option.
Our call to sys_dburigen() has specified an object we want to access, and 4 columns in that object that will identify a unique row in that object, and a fifth column that we want returned either as an XML value or as a text value.
Tracing
I actually worked through the analysis in the opposite direction to the one I’ve been showing. When I saw the call to sys_dburigen() I suspected that it might be doing the same thing as my PL/SQL function call, so I ran the two queries with SQL tracing enabled to see what activity took place at the database.
Ignoring the driving query against all_tab_partitions, the content of the PL/SQL trace was basically 3 executions (I had 3 partitions in the pt_range table) of:
SELECT ATP.HIGH_VALUE
FROM
ALL_TAB_PARTITIONS ATP WHERE ATP.TABLE_OWNER = UPPER(:B4 ) AND
ATP.TABLE_NAME = UPPER(:B3 ) AND ATP.PARTITION_NAME = UPPER(:B2 ) AND
ATP.PARTITION_POSITION = UPPER(:B1 )
The content of the sys_dburigen() trace was 3 executions of a query like:
SELECT alias000$."HIGH_VALUE" AS HIGH_VALUE
FROM
"ALL_TAB_PARTITIONS" alias000$ WHERE 1 = 1 AND ((((alias000$."TABLE_OWNER"=
'TEST_USER') AND (alias000$."TABLE_NAME"='PT_RANGE')) AND
(alias000$."PARTITION_NAME"='P200')) AND (alias000$."PARTITION_POSITION"=
'1'))
Note particularly the literal values in the predicates (the table_owner, table_name, partition_name and partition_position). This version of the code has to generate and optimise (hard-parse) a new SQL statement for every partition in the table referenced in the driving query. For a table with a large number of partitions, and a system with a large number of partitioned tables, the disruption of the shared pool that this might cause could be severe if (as the user said at one point) “we will be frequently selecting from all_tab_partitions”. [Damage limitation: if the session sets cursor_sharing to FORCE temporarily then the generated SQL will be subject to bind variable substitution; but that’s not an ideal workaround.]
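The damage-limitation idea would look something like this – set the parameter just around the query rather than leaving it changed for the whole session:
alter session set cursor_sharing = force;

-- run the sys_dburigen() query here; the recursive statements generated
-- for each partition should now be subject to bind variable substitution

alter session set cursor_sharing = exact;       -- or whatever your normal setting is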
Summary
Using LONG columns in SQL predicates is not nice – and not likely to be efficient – but there are ways of working around the limitations of LONGs. It’s undesirable to use PL/SQL that calls SQL inside a SQL statement, but we can use a PL/SQL function to return a string from a LONG in the current row – and since that’s pretty much what Oracle seems to be doing with its call to sys_dburigen() it’s hard to insist that the PL/SQL strategy is inappropriate. (But maybe the call to sys_dburigen() in this context would be considered an abuse of a feature anyway – even though it seems much more elegant and flexible once you’ve learned a little about how it works.)
Footnote
As another detail on analysing the cost/benefit of different approaches – it would be possible to avoid creating the PL/SQL function by embedding it in the SQL as a “with function” clause:
with function get_high_value (
i_tab_owner varchar2,
i_tab_name varchar2,
i_part_name varchar2,
i_part_posn number
)
return varchar2
is
v1 varchar2(4000);
begin
select atp.high_value
into v1
from all_tab_partitions atp
where atp.table_owner = upper(i_tab_owner)
and atp.table_name = upper(i_tab_name)
and atp.partition_name = upper(i_part_name)
and atp.partition_position = upper(i_part_posn)
;
return v1;
end;
select
apt.table_owner, apt.table_name,
apt.tablespace_name,
apt.partition_name, apt.partition_position,
apt.high_value
from
all_tab_partitions apt
where
apt.table_owner = 'TEST_USER'
and apt.table_name = 'PT_RANGE'
and get_high_value(
apt.table_owner,
apt.table_name,
apt.partition_name,
apt.partition_position
) = '200'
/
I have asked the user why they want to query all_tab_partitions by high_value since it seems to be a slightly odd thing to do and there may be a better way of doing whatever it is that this query is supposed to support. They haven’t responded to the question, so I’ll take a guess that they want to rename (or move etc.) partitions that they don’t know the name for – perhaps because they are using interval partitioning or automatic list partitioning. If the guess is correct then the solutions offered are irrelevant – you don’t need to know the name of a partition to manipulate it, you need only know some value that is a legal member of the partition:
SQL> select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position;
PARTITION_NAME
----------------------
P200
P400
P600
3 rows selected.
SQL> alter table pt_range rename partition for (199) to pt_0200;
Table altered.
SQL> select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position;
PARTITION_NAME
----------------------
PT_0200
P400
P600
3 rows selected.