RE: null values
Date: Mon, 18 Nov 2013 22:52:39 -0500
Message-ID: <04a701cee4da$cba191d0$62e4b570$_at_rsiz.com>
I think if you apply a predicate to null_col then you'll get good estimates, so *probably* this mis-estimate is limited to counting. You've got the test case handy, so give it a whirl. Something like where null_col > 0
should do it.
-----Original Message-----
From: Martijn Bos [mailto:maboc_at_maboc.nl]
Sent: Monday, November 18, 2013 12:48 PM
To: Mark W. Farnham
Cc: Oracle-L
Subject: Re: null values
Please find my comments inline
Mark, thanks for allready spending time on my problems.
Martijn
On Mon, Nov 18, 2013 at 06:52:21AM -0500, Mark W. Farnham wrote:
> I missed the input column that your stats included 900 num_rows.
>
> You may be on to something. If you add two rows, one null for that
> column and the other non-null, that num_rows stat should move to 901;
> does the estimate move to 1002?
>
insert into cbo_tab_1 values (1001,1,1,null,'filler'); insert into cbo_tab_1 values (1002,1,1,1,'filler'); commit;
- Table Stats ***** TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN --------------- -------- ---------- ---------- ---------- ---------- ------------ -------------------- ----------- CBO_TAB_1 VALID 99 1002 1126 0 2013-NOV-18 15:06:53 515
- Table Column Stats ***** TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLO USE AVG_COL_LEN HISTOGRAM --------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- -------------------- ----------- --- --- ----------- --------------- CBO_TAB_1 NULL_COL 900 C102 C20A64 .001111111 101 1 2013-NOV-18 15:06:53 901 YES NO 4 NONE
- Index Stats *****
INDEX_NAME INDEX_TYPE UNIQUENES
BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZED
------------------------------ --------------------------- ---------
---------- ----------- ------------- ----------------- ----------
CBO_TAB_1_NULL_COL NORMAL NONUNIQUE 1 2 900 901 901 2013-NOV-18 15:06:54 - Index Columns ****
INDEX_NAME TABLE_NAME COLUMN_NAME
COLUMN_POSITION
------------------------------ --------------- -------------------------
CBO_TAB_1_NULL_COL CBO_TAB_1 NULL_COL 1
SQL> explain plan for select count(null_col) from cbo_tab_1; _at_xpln
ID OPERATION OBJECT ALIAS CARDINALITY COST OPTIMIZER BYTES ---------- ---------------------------------------- ---------------------------------------- -------------------- ----------- ---------- --------------- ---------- --- 0 SELECT STATEMENT() 1 2 ALL_ROWS 4 --P 1 SORT(AGGREGATE) 1 4 --P 2 INDEX(FAST FULL SCAN) (INDEX) MARTIJN.CBO_TAB_1_NULL_COL CBO_TAB_1_at_SEL$1 1002 2 ANALYZED 4008
ID PROJECTION
1 (#keys=0) COUNT("NULL_COL")[22] 2 "NULL_COL"[NUMBER,22]
SQL> As you can see, the index_stas display 901 rows in the index. (which I think is correct).
However, the CBO estimates 1002 rows now. So the problem persits
> Perhaps the CBO is (incorrectly) just using the table row stat for
> unrestricted query cardinality estimates, still getting the correct
> index choice because it is smaller by blocks than the table and no
> other index contains the column.
>
> Of course for full scans the size in blocks drives the relevant cost,
> but if the CBO were considering nested loop amounts and multiplying by
> a number far off, it could produce sub optimal plans.
>
Exactly. (See also below)
> Hmm. Try your test with 900 nulls and 100 not-null (easiest test from
> your existing script by reversing logic of mod to insert null). That
> should show us if it is just off by 10% or completely using the table
> stats instead of the index stats for cardinality.
>
The insert now is done as follows:
insert into cbo_tab_1 select level,
mod(level,10), round(level/100), -- decode(mod(level, 10), 0, null, level), decode(mod(level, 10), 0, level, null), 'filler' from dual connect by level<=1000; ***** Table Stats ***** TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN--------------- -------- ---------- ---------- ---------- ----------
------------ -------------------- ----------- CBO_TAB_1 VALID 99 1000 1000 0 2013-NOV-18 15:14:06 512 ***** Table Column Stats ***** TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLO USEAVG_COL_LEN HISTOGRAM
--------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- -------------------- ----------- --- --- ----------- --------------- CBO_TAB_1 NULL_COL 100 C10B C20B .01 900 1 2013-NOV-18 15:14:06 100 YES NO2 NONE
- Index Stats *****
INDEX_NAME INDEX_TYPE UNIQUENES
BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZED
------------------------------ --------------------------- ---------
---------- ----------- ------------- ----------------- ----------
CBO_TAB_1_NULL_COL NORMAL NONUNIQUE 0 1 100 100 100 2013-NOV-18 15:14:06
The statistics show that the index has 100 rows. Again, I think that that should be enough information for the CBO, given the query.
And the plan output:
explain plan for select count(null_col) from cbo_tab_1; _at_xpln
ID OPERATION OBJECT ALIAS CARDINALITY COST OPTIMIZER BYTES ---------- ---------------------------------------- ---------------------------------------- -------------------- ----------- ---------- --------------- ---------- --- 0 SELECT STATEMENT() 1 1 ALL_ROWS 2 --P 1 SORT(AGGREGATE) 1 2 --P 2 INDEX(FULL SCAN) (INDEX) MARTIJN.CBO_TAB_1_NULL_COL CBO_TAB_1_at_SEL$1 1000 1 ANALYZED 2000
ID PROJECTION
1 (#keys=0) COUNT("NULL_COL")[22] 2 "NULL_COL"[NUMBER,22]
SQL> Now the cardianlity estimate is way off. So potentially creating sub-optimal exceution-paths for more complex query's.
I do notice that the excecution path changed from Fast Full Scan to Full Scan.
> "Thought" is the spelling you were wondering about.
>
Thank you. Somehow this is a very difficult word to spell for me.
> -----Original Message-----
> From: Martijn Bos [mailto:maboc_at_maboc.nl]
> Sent: Monday, November 18, 2013 1:22 AM
> To: Mark W. Farnham
> Subject: Re: null values
>
> Hi Mark,
>
> Thanks for your response.
> Please consider my remarks (inline)
>
> Best regards
> Martijn
>
> On Sun, Nov 17, 2013 at 09:57:05PM -0500, Mark W. Farnham wrote:
> > I think it is because there are 900 distinct keys, but the index is
> > non-unique, so it cannot know merely from the stats there actually
> > are no duplicates and the rest as nulls.
>
> whether there are duplicates or not is, as far as I understand, not so
> interesting.
>
> if my data is:
>
> 1
> 2
> null
> 1
> 2
> null
>
> then count(column) == 4
>
> So duplicates are not so important, in this case I think.
>
> > But it does get selected because it will be fewer total blocks for
> > the fast full scan than the table.
> >
>
> In the index_stats is a column num_rows which also says 900. I would
> have thaught (not sure how to spell the past tense of think :-)) that
> that statistics is sufficient.
>
>
> > mwf
> >
> > -----Original Message-----
> > From: oracle-l-bounce_at_freelists.org
> > [mailto:oracle-l-bounce_at_freelists.org]
> > On Behalf Of Martijn Bos
> > Sent: Sunday, November 17, 2013 4:51 PM
> > To: Oracle-L
> > Subject: null values
> >
> > Hi List,
> >
> > On a play-around sytem I do the following:
> > (This is a 11.2.0.4 RDBMS on linux x86 (32 bits))
> >
> > drop table cbo_tab_1;
> > create table cbo_tab_1 (id number, sca number(10,0), clu
> > number(10,0), null_col number(10,0), filler char(500)) pctfree 99
> > pctused 1;
> >
> > -- Following statement makes sure that ther actually are nulls in
> > column null_col insert into cbo_tab_1 select level, mod(level,10),
> > round(level/100),
> > decode(mod(level, 10), 0, null, level), 'filler' from dual connect by
> > level<=1000;
> > commit;
> >
> > create unique index CBO_TAB_1_ID on CBO_TAB_1(id);
> > -- create index cbo_tab_1_id on cbo_tab_1(id); create index
> > cbo_tab_1_sca on cbo_tab_1(sca); create index cbo_tab_1_clu on
> > cbo_tab_1(clu); create index cbo_tab_1_null_col on
> > cbo_tab_1(null_col);
> >
> > exec dbms_stats.gather_table_stats('MARTIJN','CBO_TAB_1',
> > block_sample=>false, cascade=>true, method_opt=>'for all columns
> > size 1'); alter system flush shared_pool; --Since I try it a lot I
> > need it to parse over and over again.
> >
> > explain plan for select count(null_col) from cbo_tab_1; --HERE IS
> > THE SQL OF WHICH I DON'T UNDERSTAND THE EXPLAIN PLAN
> >
> > SQL> _at_xpln -- A script I created just to get a little more
> > SQL> understanding of
> > explaining statement.
> >
> > ID OPERATION OBJECT
> > ALIAS CARDINALITY COST OPTIMIZER BYTES
> > ---------- ----------------------------------------
> > ---------------------------------------- --------------------
> > -----------
> > ---------- --------------- ----------
> > --- 0 SELECT STATEMENT()
> > 1 2 ALL_ROWS 4
> > --P 1 SORT(AGGREGATE)
> > 1 4
> > --P 2 INDEX(FAST FULL SCAN) (INDEX)
> > MARTIJN.CBO_TAB_1_NULL_COL CBO_TAB_1_at_SEL$1 1000
> 2
> > ANALYZED 4000
> >
> > ID PROJECTION
> > ------
> >
> ----------------------------------------------------------------------
> -----
> > 1 (#keys=0) COUNT("NULL_COL")[22]
> > 2 "NULL_COL"[NUMBER,22]
> > SQL>
> >
> > (if you find that suspect, I also added a more documented way to
> > display explain plans)
> >
> > SQL> select * from table(dbms_xplan.display(format=>'all'));
> >
> > PLAN_TABLE_OUTPUT
> > --------------------------------------------------------------------
> > --
> > ------
> > --------------------------------------------------------------------
> > --
> > ------
> > ------------------------------------------------
> > Plan hash value: 2862360477
> >
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------------
> > | Id | Operation | Name | Rows | Bytes |
Cost
> > (%CPU)| Time |
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------------
> > | 0 | SELECT STATEMENT | | 1 | 4 |
2
> > (0)| 00:02:01 |
> > | 1 | SORT AGGREGATE | | 1 | 4 |
> > | |
> > | 2 | INDEX FAST FULL SCAN| CBO_TAB_1_NULL_COL | 1000 | 4000 |
2
> > (0)| 00:02:01 |
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------------
> >
> > Query Block Name / Object Alias (identified by operation id):
> > -------------------------------------------------------------
> >
> > 1 - SEL$1
> > 2 - SEL$1 / CBO_TAB_1_at_SEL$1
> >
> > Column Projection Information (identified by operation id):
> > -----------------------------------------------------------
> >
> > 1 - (#keys=0) COUNT("NULL_COL")[22]
> > 2 - "NULL_COL"[NUMBER,22]
> > SQL>
> >
> >
> >
> > Now the thing that is troubling me is the cardinality estimate at id
> > 2 (1000). I would say that the CBO could esitmate 900 rows.
> > Where does the CBO get's that idea of 1000?
> >
> >
> > SQL> select TABLE_NAME, STATUS, PCT_FREE, PCT_USED, NUM_ROWS,
> > SQL> BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED, AVG_ROW_LEN
> > from user_tables
> > where table_name='CBO_TAB_1';
> >
> > TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS
> > EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN
> > --------------- -------- ---------- ---------- ---------- ----------
> > ------------ -------------------- -----------
> > CBO_TAB_1 VALID 99 1000 1000
> > 0 2013-NOV-17 22:27:10 515
> >
> > SQL> select index_name, index_type, uniqueness, blevel, leaf_blocks,
> > SQL> distinct_keys, clustering_factor, num_rows, last_analyzed
> > from user_indexes
> > where table_name='CBO_TAB_1';
> >
> > INDEX_NAME INDEX_TYPE UNIQUENES
> > BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS
> LAST_ANALYZED
> > ------------------------------ --------------------------- ---------
> > ---------- ----------- ------------- ----------------- ----------
> > --------------------
> > CBO_TAB_1_ID NORMAL UNIQUE
> > 1 2 1000 1000 1000 2013-NOV-17
> > 22:27:10
> > CBO_TAB_1_SCA NORMAL NONUNIQUE
> > 1 2 10 1000 1000 2013-NOV-17
> > 22:27:10
> > CBO_TAB_1_CLU NORMAL NONUNIQUE
> > 1 2 11 1000 1000 2013-NOV-17
> > 22:27:10
> > CBO_TAB_1_NULL_COL NORMAL NONUNIQUE
> > 1 2 900 900 900 2013-NOV-17
> > 22:27:10
> > SQL>
> >
> > As far as I'm concerned/understand the CBO chooses the right index
> > (CBO_TAB_1_NULL_COL). CBO_TAB_1_NULL_COL has 900 num rows, since the
> > other
> > 100 rows are null.
> > I would say that in this specific case it would be easy for the CBO
> > to estimate 900 rows since it only visits this index (all
> > information is in the index).
> >
> > Now I'm thinking that I might oversee something very simple (or that
> > i'm flat out wrong), so would anyone be so kind as to point me in
> > the right direction for some guidance, or (better yet :-)) explain
> > where my thinking is wrong.
> >
> >
> > Best Regards,
> > Martijn
> >
>
-- http://www.freelists.org/webpage/oracle-lReceived on Tue Nov 19 2013 - 04:52:39 CET