Re: DBMS_STATS [resend chomped version]

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Wed, 23 Jun 2004 13:37:29 +0100
Message-ID: <021201c4591e$d9026400$7102a8c0@Primary>

Notes in-line

But I don't understand why the option for including a reply marker doesn't appear when I reply to the list !

Regards

Jonathan Lewis

http://www.jlcomp.demon.co.uk

http://www.jlcomp.demon.co.uk/faq/ind_faq.html The Co-operative Oracle Users' FAQ

http://www.jlcomp.demon.co.uk/seminar.html Optimising Oracle Seminar - schedule updated May 1st

Original Message ----- From: "DENNIS WILLIAMS" <DWILLIAMS_at_LIFETOUCH.COM> To: <oracle-l_at_freelists.org> Sent: Monday, June 21, 2004 3:20 AM Subject: RE: DBMS_STATS [resend chomped version]

Jonathan

Very provocative ideas.
1. I'm assuming you are referring to the AUTO sample size when you say don't let the database work out a sample size? Does anybody know how that works? I hate to trust something if I have no idea how it works.

<JPL>

I was think of the two sets of controls, as indicated below.

begin
dbms_stats.gather_table_stats(
  ownname => 'sys',
  tabname => 'col$',
  estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,   method_opt => 'for all columns size auto' );
end;
/

The two auto samples are the defaults in Oracle 10g, by the way - so be careful is you've left the estimate_percent to default in 9i - there may be side-effects. Defaults for 9i are:
estimate_percent => null, -- equals compute method_opt => 'for all columns size 1' -- no histogram

For auto sizing , Oracle may run two or three sample queries to count the number of rows, then create a temporary table to hold as 'guestimate' sample size.

Then there's a complex query for high, low, distinct and so on of each column in the sample. Then Oracle decides (based on this figures) how many columns could do with histograms, and how many buckets each histogram could use, and generated histograms. Choice of histograms is restricted by an (undocumented) table col_usage$ which (like mon_mods$) counts the number of times you have used different types of predicates against a particular column in each table, so a histogram should not be generated for a column that never appears in a WHERE clause.

</JPL>

2. The study of statistics has a branch related to sample size. Pollsters use that to figure out how many random samples can provide a valid estimate. I'm searching for my old college statistics textbook right now.

<JPL>

I think 1076 or 1097 is the number of individuals that Gallup will poll to get a 95% confidence figure of the opinion of a large population. (The source Niall quoted get 1067 as the result for a large population with 95% confidence of being within 3% of the correct answer).

Oracle default estimate of 1043 rows seems to be pretty close to the requirement for 99% confidence that the result is within 4% of correct.

</JPL>

3. I like your idea of creating a table to hold the sample size for each table in the schema. Other columns could hold the reanalyze interval, date of next analyze, etc. This would ensure each table was analyzed appropriately.

4. My boss heard some Oracle expert say that 30% was the best sample size. Does anyone have an idea of the source?

5. Thanks for your common sense statement that if the machine has excess capacity it doesn't do any damage to analyze to the extreme. And thanks always for your clear insights. Just reading your postings is a great education.

<JPL>

Just to emphasize the point, I said:

"you don't OFTEN do much damage"

Changing the sample size upwards could have a negative impact, particularly on columns that have histograms in place.

Dennis Williams
DBA
Lifetouch, Inc.
dwilliams_at_lifetouch.com

Please see the official ORACLE-L FAQ: http://www.orafaq.com

To unsubscribe send email to: oracle-l-request_at_freelists.org put 'unsubscribe' in the subject line.

--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Received on Wed Jun 23 2004 - 07:37:29 CDT