Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> num_distinct too small with auto sample size on test db but not prod...
I have two large databases, production and test. They are on 10.1.0.4.
In them is a large partitioned table - it has about 1 billion rows.
The data in the test and prod versions are not exactly the same, but
are close. Stats have been gathered on both tables using
AUTO_SAMPLE_SIZE and FOR ALL COLUMNS SIZE AUTO. On the date column
which the table is partitioned on I'm getting a very different number
of distinct values in test and production.
In production the num_distinct is about right(about 1000). It doesn't use histograms. In test it is way off (17) and has a 17 bucket histogram.
I've done a lot of checking into this and am really puzzled. I saw the bugs on Metalink which talk about num_distinct being way off with auto_sample_size. I did a trace and got the SQL which Oracle uses to get the num_distinct. It is something like this:
select /*+ cursor_sharing_exact use_weak_name_resl dynamic_sampling(0)
no_monitoring */
count(distinct "MYCOLUMNNAME")
from "MYSCHEMA"."TEST" sample (10) t;
where "MYCOLUMNNAME" is the column name and 10 is the percentage.
When I fill in the percentage with the same percentage that was used by dbms_stats I get the correct value for num_distinct. It is about 1000 on both databases. I got sample_size from dba_tab_columns and divided it by the number of rows from dba_tables and multiplied by 100 to get the sampling percentage for each database. I plugged that percentage into the above query and ran it against the appropriate database.
So, I can't understand why the num_distict came out as 17.
Also, the tables are compressed and have recently had columns added to them.
Thanks for any input on this issue.