Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.server -> CBO: Skewed Data and "NOT EQUALS"
I have a table with a column whose values are very skew. A large
majority of rows has a single value for this column. I have an index,
a not-null constraint and a frequency histogram on this column.
The CBO uses the histogram data correctly when I query with an "equals" condition. By which I mean if I want rows where the column is the majority value a full table scan is used and if I want rows for a low cardinality value then the index is used.
However this all changes if I use a "not equals" condition. If I ask for rows *not equal* to the majority value it does a full table scan when the index would be much better.
Setup:
(I'm running on 9.2 & 10.2)
CREATE TABLE skewtab AS
SELECT 'X' col , RPAD('X',4000) rest
FROM all_objects, all_objects
WHERE ROWNUM <=50000;
ALTER TABLE SKEWTAB
MODIFY(COL NOT NULL);
UPDATE skewtab
SET col = 'Z'
WHERE ROWNUM <= 20;
UPDATE skewtab
SET col = 'Y'
WHERE ROWNUM <= 10;
CREATE INDEX I1 on skewtab (col);
ANALYZE TABLE skewtab COMPUTE STATISTICS;
ANALYZE TABLE skewtab COMPUTE STATISTICS FOR COLUMNS col SIZE 5;
ANALYZE INDEX i1 COMPUTE STATISTICS;
SQL> select *
2 from skewtab
3 where col <> 'X'
4 /
20 rows selected.
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7636 Card=20 Bytes=80020)
1 0 TABLE ACCESS (FULL) OF 'SKEWTAB' (Cost=7636 Card=20 Bytes=80020)
Statistics
50082 consistent gets
...
Forcing the index with a hint :
SQL> select --+ INDEX(skewtab i1)
2 *
3 from skewtab
4 where col <> 'X'
5 /
20 rows selected.
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=20 Bytes=80020)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'SKEWTAB' (Cost=39 Card=20 Bytes=80020)
2 1 INDEX (FULL SCAN) OF 'I1' (NON-UNIQUE) (Cost=92 Card=20)
Statistics
116 consistent gets
...
The optimizer can even see that this is better (Cost = 39 versus 7636) yet doesn't choose it.
Even this is a little disappointing when, rather than a full index scan, the CBO could transform the query and do 2 range scans:
SQL> select *
2 from skewtab
3 where col > 'X'
4 or col < 'X'
5 /
20 rows selected.
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2 Bytes=8002)
1 0 CONCATENATION
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'SKEWTAB' (Cost=2
Card=1 Bytes=4001)
3 2 INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE) (Cost=2 Card=1)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'SKEWTAB' (Cost=2 Card=1 Bytes=4001)
5 4 INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE) (Cost=2 Card=1)
Statistics
28 consistent gets
...
Is this reasonable or should I ask for my money back?
Thanks for your time,
John
Received on Thu Aug 02 2007 - 06:21:20 CDT
![]() |
![]() |