performance imapct on stats collection [message #612795] |
Thu, 24 April 2014 17:27 |
ravikanth_b
Messages: 42 Registered: November 2007 Location: Bay Area, CA
|
Member |
|
|
Hello experts!
Need your suggestion on this one.
We have several "big" tables in our data warehouse. After each ETL we gather stats on the tables. Each of the table has partition and sub-partition. The complete process of gathering stats is taking hours to finish.
currently, this is what we do for each table.
EXEC DBMS_STATS.GATHER_TABLE_STATS(<SCHEMA_NAME>,<TABLE_NAME>);
Do you recommend us doing the following as recommended at
https://blogs.oracle.com/optimizer/entry/maintaining_statistics_on_large_partitioned_tables
EXEC DBMS_STATS.SET_TABLE_PREFS(<SCHEMA_NAME>,<TABLE_NAME>,'INCREMENTAL','TRUE');
EXEC DBMS_STATS.GATHER_TABLE_STATS(<SCHEMA_NAME>,<TABLE_NAME>);
|
|
|
|
Re: performance imapct on stats collection [message #612798 is a reply to message #612796] |
Thu, 24 April 2014 18:10 |
ravikanth_b
Messages: 42 Registered: November 2007 Location: Bay Area, CA
|
Member |
|
|
In reply to BlackSwan:
The application is part of a product offering.
The number of partitions depends on the clients installation and their data variance.
Typically, there could be 20 to 25 partitions/about 5 subpartitions and rows running into several millions in each partition with considerable column width.
For smaller customers it could be 5 to 10 partitions/about 2-3 subpartitons.
|
|
|
|
Re: performance imapct on stats collection [message #612800 is a reply to message #612799] |
Thu, 24 April 2014 18:32 |
ravikanth_b
Messages: 42 Registered: November 2007 Location: Bay Area, CA
|
Member |
|
|
That is the hard part. We don't know customers data. Meaning, the product should be able to cater to least common denominator. I see no issues with smaller data sets. But some of our customers started complaining about longer processing times. Building stats is taking as much time as ETL, which is not acceptable for them.
|
|
|
|
|
Re: performance imapct on stats collection [message #612803 is a reply to message #612801] |
Thu, 24 April 2014 19:37 |
ravikanth_b
Messages: 42 Registered: November 2007 Location: Bay Area, CA
|
Member |
|
|
The reason we analyze after each ETL because, the aggregates and other reporting applications start querying this data as soon as ETL is done. We have see issue with these queries because of stale statistics. For example, partition A would have about 10k records for the first ETL, 2nd ETL might have a million records or more. This would cause huge impact on reporting queries.
|
|
|
|