Tuning SQL with dynamic sampling

articles: 

Another example of what I think of as "the self-tuning database". Setting optimizer_dynamic_sampling=11 can fix many performance problems, without the DBA needing to use his brain at all.

Anyone who tunes SQL should know that developing a good plan is all about the estimated cardinalities. If you don't believe me, read this book Review - It's All About The Cardinalities Anyone who has spent time with me knows that I believe that many fixable bad plan problems can be fixed with extended statistics, I blogged about that here Extended stats
If you do not have the time to create extended stats, there is an alternative: optimizer dynamic sampling. I have long believed that all databases should have this set to 4. Level 4 means that the CBO will always sample if a query has a complex predicate: a predicate involving two or more columns of a table, or columns enclosed in functions. I have seen this deliver a spectacular performance boost, and have never seen it cause a problem. Hoever, level four is not good enough for all situations. Release 12.x introduces a new level, level 11, and this has been back-ported into release 11.2.0.4.
From the docs:

Quote:
11 Use dynamic statistics automatically when the optimizer deems it necessary. The resulting statistics are persistent in the statistics repository, making them available to other queries.

That sounds pretty cool: just let Uncle Oracle get on with it. But does it work? Well, here is one situation (taken from a topic on OTN earlier today) that level 4 could not handle, but level 11 does. First, create a table and gather stats:
orclz>
orclz> create table t1 as select * from all_objects;

Table created.

orclz> exec dbms_Stats.gather_table_stats(user,'t1',-
> estimate_percent=>dbms_stats.auto_sample_size,method_opt=>'for all columns size skewonly')

PL/SQL procedure successfully completed.

orclz> select column_name,num_buckets,histogram from user_tab_col_statistics where table_name='T1';

COLUMN_NAME                    NUM_BUCKETS HISTOGRAM
------------------------------ ----------- ---------------
ORACLE_MAINTAINED                        2 FREQUENCY
EDITIONABLE                              2 FREQUENCY
SHARING                                  3 FREQUENCY
EDITION_NAME                             0 NONE
NAMESPACE                               23 FREQUENCY
SECONDARY                                2 FREQUENCY
GENERATED                                2 FREQUENCY
TEMPORARY                                2 FREQUENCY
STATUS                                   1 FREQUENCY
TIMESTAMP                              254 HYBRID
LAST_DDL_TIME                          254 HYBRID
CREATED                                254 HYBRID
OBJECT_TYPE                             45 FREQUENCY
DATA_OBJECT_ID                         254 HYBRID
OBJECT_ID                                1 NONE
SUBOBJECT_NAME                         254 HYBRID
OBJECT_NAME                            254 HYBRID
OWNER                                   32 FREQUENCY

18 rows selected.

orclz>

Note that I am using auto sample size, which is a requirement to get the new, and much more accurate, HYBRID histograms. They do not help, though, when I run a query that includes an analytical function (this is the example from the OTN topic). First with level 4 sampling, and the estimate is ludicrous:
orclz>
orclz> set autot on exp
orclz> alter session set optimizer_dynamic_sampling=4;

Session altered.

orclz> SELECT object_type
  2       , object_name
  3       , object_rank
  4  FROM (
  5    SELECT object_type
  6         , object_name
  7         , ROW_NUMBER() OVER (PARTITION BY object_type ORDER BY created) object_rank
  8    FROM   t1
  9  )
 10  WHERE object_name = 'DUAL';

OBJECT_TYPE     OBJECT_NAME                    OBJECT_RANK
--------------- ------------------------------ -----------
SYNONYM         DUAL                                     1
TABLE           DUAL                                    62


Execution Plan
----------------------------------------------------------
Plan hash value: 2273146475

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 90329 |  8115K|       |  1402   (1)| 00:00:01 |
|*  1 |  VIEW               |      | 90329 |  8115K|       |  1402   (1)| 00:00:01 |
|   2 |   WINDOW SORT       |      | 90329 |  3704K|  4616K|  1402   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   | 90329 |  3704K|       |   424   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME"='DUAL')

orclz>

And now try with automatic dynamic sampling:
orclz>
orclz> alter session set optimizer_dynamic_sampling=11;

Session altered.

orclz> SELECT object_type
  2       , object_name
  3       , object_rank
  4  FROM (
  5    SELECT object_type
  6         , object_name
  7         , ROW_NUMBER() OVER (PARTITION BY object_type ORDER BY created) object_rank
  8    FROM   t1
  9  )
 10  WHERE object_name = 'DUAL';

OBJECT_TYPE     OBJECT_NAME                    OBJECT_RANK
--------------- ------------------------------ -----------
SYNONYM         DUAL                                     1
TABLE           DUAL                                    62


Execution Plan
----------------------------------------------------------
Plan hash value: 2273146475

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     2 |   184 |       |  1402   (1)| 00:00:01 |
|*  1 |  VIEW               |      | 90329 |  8115K|       |  1402   (1)| 00:00:01 |
|   2 |   WINDOW SORT       |      | 90329 |  3704K|  4616K|  1402   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   | 90329 |  3704K|       |   424   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME"='DUAL')

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)

orclz>

Not bad. ie, perfect. Clearly, level 11 is doing a much better job than level 4 in this specially devised case, but you will find it works well everywhere. It must now be best practice to use level 11 in all databases.
--
John Watson
Oracle Cetified Master DBA
Skillbuilders Inc

ps - setting something to 11 is reminscent of that awful movie, This Is Spinal Tap. Why is it awful? Because it is so true. I won't tell you how I know this.

Comments

As soon as I saw you suggest setting it to 11, I knew there needed to be a Spinal Tap reference. Sadly for me you got there first, but it is yet another demonstration of the truism that you're articles leave no important stone unturned.

Nice demo (again), although I needed to look back and forth between the plans to understand your point. I was expecting the plans (and therefore performance) to be different and initially thought you'd made a mistake when they came out the same. Your point - of course - is that the ESTIMATE is different. If this was embedded in a larger SQL with joins and/or sub-queries then it COULD actually result in a bad plan, which in turn could result in poorer performance.