Re: dba_tables.num_rows is less than dba_indexes.num

From: Wolfgang Breitling <breitliw_at_centrexcc.com>
Date: Thu, 11 Aug 2005 09:19:22 -0600
Message-ID: <42FB6C7A.90505@centrexcc.com>

Leng Kaing wrote:

> Wolfgang,
>
> This is all news to me...
>
> 1) I always thought that NUM_ROWS is important. I know that a few weeks
> ago when I noticed the differences in the num_rows in tables vs indexes,
> we manually increased the table's num_rows and the query would revert to
> using an index rather than a full table scan. I have yet to do a test to
> see if what would happen if I set NUM_ROWS to a small number, eg. 5.

num_rows is important, but the CBO only uses the value from *_tables, not the one from *_indexes. As Christian already said, if someone has information to the contrary - with a repeatable testcase - let me know.

>
> 2) I also did "..., estimate_percent=>dbms_stats.auto_sample_size,
> method_opt=>'FOR
> ALL INDEXED COLUMNS', degree=>2, cascade=>true"
>
> Are you saying that FOR ALL INDEX COLUMNS or FOR COLUMNS only gathers
> histograms for the tables, but not the stats? So do I have to do it in 2
> steps - gather stats, then gather histograms?
>
> 3) I've never computed on any of the systems that I've worked on. And it
> certainly cannot be done on the current database as it would take too
> long. I always thought an estimate would be good enough. Now I'm hearing
> that only a compute will do. ARGH!! We cannot afford to do this.

Estimate is usually good enough for tables. In fact amazingly small estimate percentages (.01 or smaller) produce very good estimates for num_rows and avg_row_len.
Estimate is often not good enough for index statistics and IMO never good enough for column histograms.

Maybe one of the reasons you do not have enough time for a gather_compute is that you gather too many histograms and maybe miss a few worth having. A gather_table_stats(.. method_opt=>'for all [indexed] columns size <anything other than 1>') will do a full scan of the table for every column for which a histogram is collected. For estimate_percent=n a temp table with a n% sample of the original table is used instead of the full table.
This testcase shows that even for a very moderatley sized table. I ran every gather twice to alleviate any caching issues:

SQL> select count(*) from pspnlfield;

COUNT(*)

282100

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.04

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 254');
Elapsed: 00:05:33.01

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.02

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 1');
Elapsed: 00:00:56.01

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.03

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 254');
Elapsed: 00:06:04.06

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.02

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 1');
Elapsed: 00:01:01.01

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.01

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 254');
Elapsed: 00:00:32.00

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.00

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 1');
Elapsed: 00:00:06.08

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:00.00

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 254');
Elapsed: 00:00:31.07

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD'); Elapsed: 00:00:01.01

SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 1');
Elapsed: 00:00:06.07

A clarification to the statement that you probably collect too many histograms and at the same time may be missing some: You have method_opt=> 'for all indexed columns [size 75]' a) not all indexed columns really need histograms, only crucial ones with significant skew - that's the too many piece. b) histograms on crucial columns with significant skew, even if not indexed, can lead the optimizer to a better performing plan - that's the too few piece.

If anyone thinks that too many histograms can't hurt (other than the time wasted collecting them), I had an incident where a client changed their statistics gathering from 'all columns size 1' (i.e. no histograms) to 'all indexed columns size skewonly' and the performance of a frequently used sql went from 0.01 seconds to ~ 90 seconds.

-- 
Regards

Wolfgang Breitling
Centrex Consulting Corporation
www.centrexcc.com
--
http://www.freelists.org/webpage/oracle-l

Received on Thu Aug 11 2005 - 10:21:32 CDT

Re: dba_tables.num_rows is less than dba_indexes.num_rows