Attempt to de-mystify DBMS_STATS ( Any Takers ? ) [message #159613] Mon, 20 February 2006 10:43
Navin Prabhu
Messages: 2
Registered: October 2003
Junior Member
OK ... The problem

One large table which has about 400 million rows (partitioned, of course)... Gathering stats on this table takes about 3 - 3.5 hrs. The table grows by around 10% every 3 weeks. The table is cleaned out and reloaded completely every three weeks... (Don't ask... obvious design flaw, but unrelated to the question at hand)

The Question

I guess my question is: once I have gathered stats on the table, do I need to gather stats right from the beginning every time I reload it?

Some information on the table.

The existing rows in the table never change. Only that there will be 10% additional rows.

So with this info, is it safe to assume that if I keep the statistics from the load three weeks before, the queries on the new table (new size: original rows + 10% of original) will not show any noticeable degradation in performance? As is obvious, I am trying to save the gather stats time. One more thing to note is that this table is not in an OLTP environment; it is in a data warehouse type environment. Hence, it is not constantly hit for small chunks of data.

Any replies are most appreciated.... Some of the responses I have so far...

From my company DBAs :-> It is blasphemy to change a table and not gather statistics from scratch...

From one of the company DBAs (after I made him swear never to disclose his name) :-> If the original rows in the table never change and there is only a 10% increase, you need not bother to gather stats, but in the next load, when the effective increase is 20%, you will need to gather stats. (That is still fine, as I can at least save every alternate gather stats run)...

There is a mixed reaction to this issue and I cannot get a clear consensus... I don't expect to get one here either, but it is worth a shot.

Peace,
Navin
Re: Attempt to de-mystify DBMS_STATS ( Any Takers ? ) [message #159623 is a reply to message #159613] Mon, 20 February 2006 11:24
Mahesh Rajendran
Messages: 10708
Registered: March 2002
Location: oracleDocoVille
Senior Member
Account Moderator
It all depends.
Unless you expect the new records to change your plans drastically, you need not gather stats.
What kind of environment is this? Warehouse?
For certain queries, partition-level statistics may be useful.
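For example, here is a minimal sketch of a partition-level gather (the schema, table, and partition names are placeholders, not from the original post); it collects stats only for the partition touched by the latest load instead of re-scanning the whole table:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'DW_OWNER',        -- placeholder schema
    tabname     => 'BIG_FACT',        -- placeholder 400M-row table
    partname    => 'P_LATEST_LOAD',   -- only the newly loaded partition
    granularity => 'PARTITION',       -- partition-level stats, not GLOBAL
    cascade     => TRUE);             -- also gather the local index partitions
END;
/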
Re: Attempt to de-mystify DBMS_STATS ( Any Takers ? ) [message #159672 is a reply to message #159623] Mon, 20 February 2006 20:50
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I agree with Mahesh, it depends.

If the execution plans of ALL your queries are stable, then there is no need to EVER gather stats again.

Two warnings though:
1. If you submit a query like "...AND txn_dt > 'const val'", the column histograms from your stale old stats might tell the CBO there are no rows in that range, so it could choose an index access. But if there are 10M rows there in reality, an index scan would be a bad idea.
If this is a concern, gather stats at every load, or do not analyze the columns (see the first sketch after these warnings). Be careful.

2. Oracle 10g is a clever thing. By default it captures a summary of changes to a table since the last analyze. If it sees that a table has been truncated since the analyze, I can't guarantee the CBO won't take action of its own (which may include dynamically sampling the table during the parse). This is pure supposition - I have no hard evidence to back it up.
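On warning 1, a minimal sketch of gathering stats without building histograms, which sidesteps the stale-histogram issue (schema and table names are placeholders, and the method_opt setting is my suggestion, not something from the original post):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'DW_OWNER',                  -- placeholder schema
    tabname    => 'BIG_FACT',                  -- placeholder table
    method_opt => 'FOR ALL COLUMNS SIZE 1');   -- basic column stats, no histograms
END;
/

On warning 2, if you want to see what 10g's table monitoring has recorded since the last analyze (again with placeholder names; this only inspects the monitoring data, it says nothing about what the CBO will actually do with it):

-- Flush the in-memory monitoring counters so the view is current
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

SELECT table_name, partition_name, inserts, updates, deletes, truncated
  FROM dba_tab_modifications
 WHERE table_owner = 'DW_OWNER'
   AND table_name  = 'BIG_FACT';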

DBAs are understandably anal-retentive about stats. They cannot control the data or the queries, so the only safe advice is to always analyze.
_____________
Ross Leishman
Re: Attempt to de-mystify DBMS_STATS ( Any Takers ? ) [message #159773 is a reply to message #159613] Tue, 21 February 2006 07:53
smartin
Messages: 1803
Registered: March 2005
Location: Jacksonville, Florida
Senior Member
Just wanted to add...

It takes over 3 hours to gather the stats on a single table, did I read that right? Are you using the degree parameter to gather in parallel? If not, you should consider that. Is your table partitioned? At 400 million rows, I'd think it should be.
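Something along these lines (placeholder names; pick a degree your hardware can absorb) would run the gather in parallel:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'DW_OWNER',   -- placeholder schema
    tabname => 'BIG_FACT',   -- placeholder table
    degree  => 8,            -- gather with 8 parallel slaves
    cascade => TRUE);        -- include the indexes
END;
/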

Also, you don't have to compute statistics for every row. On a table that big, you can probably get away with a fairly low estimate_percent. Try tests with various percentages to see if there is any appreciable difference in either

a) the actual statistics and histograms gathered
b) plans of your most important / expensive queries

Suggested estimate percentages would be 2, 5, 10, 15, 20, 30, 100. Start with two tests, at 2 percent and 100 percent. If there is no difference, then stop. Otherwise, increase your percentage slowly and try again. Heck, with 400 million rows you could probably get away with 1%.
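One way to run such a test (a sketch only, with placeholder names): gather at a low estimate_percent, note what landed in the dictionary and how your key plans look, then repeat at a higher percentage and compare:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'DW_OWNER',   -- placeholder schema
    tabname          => 'BIG_FACT',   -- placeholder table
    estimate_percent => 2);           -- then retry at 5, 10, ... if needed
END;
/

-- Compare these figures between runs at different estimate_percent values
SELECT num_rows, blocks, sample_size, last_analyzed
  FROM dba_tables
 WHERE owner = 'DW_OWNER' AND table_name = 'BIG_FACT';

SELECT column_name, num_distinct, num_nulls, density
  FROM dba_tab_columns
 WHERE owner = 'DW_OWNER' AND table_name = 'BIG_FACT';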

Oh, and if your data change is predictable, you can potentially calculate the statistic changes yourself and manually set them in the dictionary via calls to dbms_stats.set_table_stats (and its companions set_column_stats and set_index_stats). But no need to bother if the above approaches knock your gather time down to a negligible amount.
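A rough sketch of that idea, assuming placeholder names and that the load really does only add about 10%: read the current table-level stats and scale the row and block counts by hand. Column and index stats would need similar treatment if you go this route.

DECLARE
  l_numrows NUMBER;
  l_numblks NUMBER;
  l_avgrlen NUMBER;
BEGIN
  -- Read the table stats left over from the previous load
  DBMS_STATS.GET_TABLE_STATS(
    ownname => 'DW_OWNER', tabname => 'BIG_FACT',   -- placeholder names
    numrows => l_numrows, numblks => l_numblks, avgrlen => l_avgrlen);

  -- Scale row and block counts by the known 10% growth
  DBMS_STATS.SET_TABLE_STATS(
    ownname => 'DW_OWNER', tabname => 'BIG_FACT',
    numrows => ROUND(l_numrows * 1.1),
    numblks => ROUND(l_numblks * 1.1),
    avgrlen => l_avgrlen);
END;
/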