num_distinct too small with auto sample size on test db but not prod...

From: <bdurrettccci_at_yahoo.com>
Date: 4 Jan 2007 12:28:39 -0800
Message-ID: <1167942518.923027.13440@51g2000cwl.googlegroups.com>

I have two large databases, production and test. They are on 10.1.0.4. In them is a large partitioned table - it has about 1 billion rows. The data in the test and prod versions are not exactly the same, but are close. Stats have been gathered on both tables using AUTO_SAMPLE_SIZE and FOR ALL COLUMNS SIZE AUTO. On the date column which the table is partitioned on I'm getting a very different number of distinct values in test and production.

In production the num_distinct is about right(about 1000). It doesn't use histograms. In test it is way off (17) and has a 17 bucket histogram.

I've done a lot of checking into this and am really puzzled. I saw the bugs on Metalink which talk about num_distinct being way off with auto_sample_size. I did a trace and got the SQL which Oracle uses to get the num_distinct. It is something like this:

select /*+ cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */
count(distinct "MYCOLUMNNAME")
from "MYSCHEMA"."TEST" sample (10) t;

where "MYCOLUMNNAME" is the column name and 10 is the percentage.

When I fill in the percentage with the same percentage that was used by dbms_stats I get the correct value for num_distinct. It is about 1000 on both databases. I got sample_size from dba_tab_columns and divided it by the number of rows from dba_tables and multiplied by 100 to get the sampling percentage for each database. I plugged that percentage into the above query and ran it against the appropriate database.

So, I can't understand why the num_distict came out as 17.

Also, the tables are compressed and have recently had columns added to them.

Thanks for any input on this issue.

Bobby

Received on Thu Jan 04 2007 - 14:28:39 CST