Gathering CBO Stats [message #172484]
Tue, 16 May 2006 16:20
smartin (Senior Member)
I'm at a new place running Oracle 9i. They have some sort of vendor COTS application with lots of tables (2,500 or so). Nice box with plenty of CPU power to spare. I'm testing ways to gather stats (since they currently use the RULE optimizer database-wide).
Tables are all in a single schema, so the easiest way to get up and running is to use gather_schema_stats and schedule that however I want. I can even throw a nice degree in there.
But that will still take a long while, even run in parallel with a degree set, because there are many small tables to wade through and actually not that many big ones (or what they consider to be huge, meaning more than a million rows).
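For reference, the baseline I'd be comparing against is roughly this (APP_OWNER just standing in for the real schema name):

begin
  -- APP_OWNER is a placeholder for the actual schema owner
  dbms_stats.gather_schema_stats(
    ownname => 'APP_OWNER',
    degree  => 8,
    cascade => true);
end;
/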
Anyway, so I'm just pondering other options, and was curious as to what any of you had done before in a similar situation. I haven't compiled or tested this yet or anything, but I'm thinking of breaking the tables up into 8 buckets using ntile, and then submitting 8 jobs to run right away, with each one calling a procedure with a bucket number 1-8. Oh and the schema owner.
This procedure would do a select of all tables for that schema with the ntile function, and since the order by clause would be deterministic (table names for a specific owner), then the buckets would line up the same in each procedure call / job submission (assuming no tables were added/dropped during the operation, which I will see to it is the case).
Basically it is do-it-yourself parallelism. Anyone have comments, or care to point out any flaws in the plan? As I said, I haven't compiled, much less tested, the code below, but I am planning on it, and will compare its runtime to that of the standard gather_schema_stats call.
create or replace procedure gather_bucket_stats(p_owner varchar2, p_bucket number)
is
begin
  -- Split the owner's tables into 8 buckets with ntile (deterministic ordering
  -- by table_name), then gather stats only for the tables in this bucket.
  for t in (select table_name
              from (select table_name,
                           ntile(8) over (order by table_name) bucket
                      from dba_tables
                     where owner = p_owner)
             where bucket = p_bucket)
  loop
    dbms_stats.gather_table_stats(p_owner, t.table_name,
                                  cascade    => true,
                                  method_opt => 'for all columns size 254');
  end loop;
end;
/
show errors
create or replace procedure gather_schema_stats(p_owner varchar2)
is
  jobno number;
  what  varchar2(100);
begin
  -- Submit one dbms_job per bucket; the jobs start once the commit is issued.
  -- The owner has to go into the WHAT string as a literal, since p_owner
  -- is not in scope when the job actually executes.
  for i in 1 .. 8 loop
    what := 'gather_bucket_stats(''' || p_owner || ''', ' || i || ');';
    dbms_job.submit(jobno, what);
  end loop;
  commit;
end;
/
show errors
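To kick it off, the call would just be something like this (again, APP_OWNER is a placeholder, and job_queue_processes would need to be at least 8 for all the buckets to actually run at the same time):

-- placeholder owner; needs job_queue_processes >= 8 for full concurrency
exec gather_schema_stats('APP_OWNER');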
Re: Gathering CBO Stats [message #172734 is a reply to message #172491]
Wed, 17 May 2006 17:41
andrew again (Senior Member)
I would definitely not generate histograms unless you know your data is significantly skewed. You'll have a really large amount of histogram data lying around that in many cases adds no value, and there's a ton of overhead to generate it. Additionally, I'd start by limiting histograms to indexed columns and using much smaller bucket sizes. As far as the parallelizing goes, setting the degree of parallelism in dbms_stats.gather_table_stats may be just as good an alternative. If you turn table monitoring on, you can run stats gathering in AUTO or STALE mode to only update stats once they've changed significantly...
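For example, something along these lines (an untested sketch; APP_OWNER and some_big_table stand in for the real schema and table names, and the degree and histogram sizes would need tuning for your system):

-- one-off: turn monitoring on so Oracle tracks DML volumes per table
-- (per table, or via dbms_stats.alter_schema_tab_monitoring for the whole schema)
alter table app_owner.some_big_table monitoring;

-- recurring gathers then only touch tables whose stats have gone stale,
-- with histograms limited to indexed columns and a modest bucket count
begin
  dbms_stats.gather_schema_stats(
    ownname    => 'APP_OWNER',
    options    => 'GATHER STALE',
    method_opt => 'for all indexed columns size 75',
    degree     => 4,
    cascade    => true);
end;
/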
Re: Gathering CBO Stats [message #172920 is a reply to message #172484]
Thu, 18 May 2006 11:06
smartin (Senior Member)
Thanks for the thoughts...
I wonder what would happen if I set the degree to a higher-than-needed value. My thought, though, is that for the tiny tables any degree of parallelism would actually just slow things down, so the higher the degree, the more overhead. But that may not be a factor; I'm just guessing on that.
The vast majority of the tables are small (less than 100k rows). I'm running some tests now with the above procedures, and will try different values for the degree parameter as well as for the number of buckets, to see if there is any improvement.
I'm trying to get the duration of this initial gathering down; the plain gather_schema_stats call took around 12 hours, and I'm curious whether this approach will help. On the first run it didn't, but I think part of that is that the ordering was done by table name, and it seems their bigger tables all have similar names, so they were all being lumped into a single bucket.
So I've changed that... anyway, I'll post back on whether this approach really does help, or whether it isn't worth it and I should just stick with gather_schema_stats and be done with it.
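Roughly the idea of the change (a sketch of the cursor query, not necessarily the exact code I'm running): deal the tables out round-robin by size instead of slicing an alphabetical list, so the big ones get spread across all 8 buckets.

-- replaces the ntile-over-table_name query in gather_bucket_stats;
-- blocks comes from dba_tables and may be null until stats exist, hence nulls last
select table_name
  from (select table_name,
               mod(row_number() over (order by blocks desc nulls last), 8) + 1 bucket
          from dba_tables
         where owner = p_owner)
 where bucket = p_bucket;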
Thing is, there is a ton of spare capacity, at least CPU- and RAM-wise. No point in it sitting idle...
[Updated on: Thu, 18 May 2006 11:11]
Re: Gathering CBO Stats [message #173118 is a reply to message #172484]
Fri, 19 May 2006 08:34
smartin (Senior Member)
Initial test results are in. Nothing concrete enough to prove anything in general, or even in this specific case, but it "looks like" the multiple-job approach is cutting the gather time roughly in half: from 11 or 12 hours down to 5 or 6. Then again, it could just be a matter of finding the right value for degree in dbms_stats.gather_schema_stats that would get the same result.