RE: Which one is good in terms of performance

From: Jacques Kilchoer <Jacques.Kilchoer_at_quest.com>
Date: Mon, 27 Dec 2004 10:48:13 -0800
Message-ID: <B5C5F99D765BB744B54FFDF35F60262109F87BAF@irvmbxw02>

If I read your table example right, then dbms_stats and block sampling = gave worse results for 50% sampling than it did for 20%, .01%, .000001% = sampling?

-----Original Message-----
From: oracle-l-bounce_at_freelists.org =
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman Sent: vendredi, 24. d=E9cembre 2004 07:23 To: 'Oracle-L (E-mail)
Subject: Re: Which one is good in terms of performance

Seema and Wolfgang,
Here is some info I generated recently, to add to the discussion...

My apologies for the length of the post, but feel free to delete if not interested...

Accuracy when calculating table statistics with different sample sizes: >=20
These are the cumulated averaged results from comparing statistics = gathered
using both DBMS_STATS and ANALYZE TABLE against the baseline of = performing a
COMPUTE using DBMS_STATS. For the first round, I concentrated on table statistics, and ignored column statistics (a.k.a. histograms) by using = =B3FOR
ALL COLUMNS SIZE 1=B2.
=20
I tested against four major tables in the Oracle E-Business Suite ERP system, of varying sizes (from small to humongous).

For each of the table statistics gathered for the cost-based optimizer = (i.e.
the columns NUM_ROWS, BLOCKS, and AVG_ROW_LEN in DBA_TABLES), displayed = is
the percentage deviation from the baseline of a COMPUTE using = DBMS_STATS:
=20

                                                   Nbr of
                                                   tables % Deviation %
Deviation % Deviation
LABEL                                             sampled Nbr of Blks =

Nbr of
Rows Avg Row Len

-------------------------------------------------- ------- -----------
----------- -----------
DBMS_STATS: COMPUTE, FOR ALL COLUMNS SIZE 1              4       0.00
0.00        0.00
DBMS_STATS: 10.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
0.02        0.00
DBMS_STATS: 05.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
0.03        0.00
DBMS_STATS: 01.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
0.10        0.00
DBMS_STATS: 50.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
0.22        0.00
DBMS_STATS: 10.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
0.15        0.11
DBMS_STATS: 00.010000%, FOR ALL COLUMNS SIZE 1           4       0.00
0.46        0.00
DBMS_STATS: 20.000000%, FOR ALL COLUMNS SIZE 1           4       0.00
0.58        0.00
DBMS_STATS: 05.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
0.91        0.11
DBMS_STATS: 00.000001%, FOR ALL COLUMNS SIZE 1           4       0.00
1.12        0.00
DBMS_STATS: 20.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
1.35        0.00
DBMS_STATS: 00.010000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
1.47        0.00
DBMS_STATS: 00.000001% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
1.58        0.00
DBMS_STATS: 50.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
2.03        0.00
DBMS_STATS: 01.000000% BLKSAMP, FOR ALL COLUMNS SI       4       0.00
4.61        0.00
ANALYZE: 05.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
0.42        7.56
ANALYZE: COMPUTE, FOR ALL COLUMNS SIZE 1                 4       0.00
0.00        8.34
ANALYZE: 50.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
0.00        8.34
ANALYZE: 20.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
0.62        7.73
ANALYZE: 10.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
1.44        7.15
ANALYZE: 01.000000%, FOR ALL COLUMNS SIZE 1              4       0.00
7.24        8.62
ANALYZE: 00.010000%, FOR ALL COLUMNS SIZE 1              4       0.00
7.24        8.62
ANALYZE: 00.000001%, FOR ALL COLUMNS SIZE 1              4       0.00
7.24        8.62

=20
The sorting in this report is by the summarization of the three = percentage
columns, from "most accurate" results to "least accurate" results. =20
A couple things to note about the results shown here:

The value in BLOCKS is always completely accurate, regardless of = sample size, because the value is not calculated at all, but is obtained = directly from the segment header
The very worst calculations from the DBMS_STATS package are = (apparently) more accurate than the very best calculations from ANALYZE command
> * The biggest cause seems to be vastly different results on =
AVG_ROW_LEN >> * However, the method of calculating AVG_ROW_LEN by DBMS_STATS is = easily >> exposed by performing SQL trace >>> * Method is: select avg(vsize(col-1)) + avg(vsize(col-2)) + ... + >>> avg(vsize(col-N)) from table-name ...
> * The method of calculating AVG_ROW_LEN by the ANALYZE command cannot =
be > determined using SQL trace
The worst deviation with the results from DBMS_STATS when calculating NUM_ROWS is 4.61%, while the worst deviation with the results from = ANALYZE is almost twice that, at 7.24%
The results from ANALYZE TABLE ... SAMPLE 1 PERCENT, SAMPLE 0.01 = PERCENT, and SAMPLE 0.000001 PERCENT are exactly the same, indicating the lowest valid sampling value for the ANALYZE command is 1 PERCENT
> * DBMS_STATS allows 0.000001% sampling
> * DBMS_STATS also permits the use of BLOCK SAMPLING, which ANALYZE =
does not =20 Accuracy when calculating index statistics with different sample sizes:

In contrast, where even the very worst results obtained by the smallest sample of table statistics is just 7.24% off from performing a COMPUTE operation, the results from using smaller samples when calculating index statistics get much worse very quickly as the sampling size decreases.

Here is the results of testing against the nine indexes of the FND_CONCURRENT_REQUESTS table, which vary widely in terms of density, = data
characteristics, etc. The results are being sorted by the summation of = the
percentage columns, so that the "most accurate" shows first and the = "least
accurate" shows last...

                               Nbr of                             % =

Dev'n %
Dev'n

                              indexes % Dev'n   % Dev'n   % Dev'n Avg =
Leaf
Avg Data % Dev'n       % Dev'n
LABEL                         sampled  BLevel Leaf Blks Dist Keys =

Blks/Key
Blks/Key Clu Fac # Idx Entries

------------------------------ ------- ------- --------- --------- =

------- ------------- DBMS_STATS: INDEX COMPUTE 9 0.00 0.00 0.00 = 0.00 0.00 0.00 0.00 DBMS_STATS: INDEX 05.000000% 9 0.00 0.00 22.13 = 14.85 16.40 0.44 0.00 DBMS_STATS: INDEX 20.000000% 9 0.00 0.00 25.39 = 24.29 24.22 0.44 0.00 DBMS_STATS: INDEX 10.000000% 9 0.00 0.00 25.56 = 16.61 33.89 0.44 0.00 DBMS_STATS: INDEX 00.010000% 9 0.00 0.00 27.45 = 20.13 37.91 0.44 0.00 DBMS_STATS: INDEX 50.000000% 9 0.00 0.00 26.42 = 21.53 38.65 0.44 0.00 DBMS_STATS: INDEX 01.000000% 9 0.00 0.00 31.17 = 27.90 28.18 0.44 0.00 DBMS_STATS: INDEX 00.000001% 9 0.00 0.00 27.81 = 22.46 40.37 0.44 0.00 ANALYZE: INDEX COMPUTE 9 0.00 595.30 0.00 = 239.88 0.00 0.00 0.00 ANALYZE: INDEX 50.000000% 9 0.00 596.27 23.55 = 329.31 13.85 17.29 17.60 ANALYZE: INDEX 01.000000% 9 8.33 530.13 32.28 = 353.86 23.04 32.87 34.08 ANALYZE: INDEX 00.010000% 9 8.33 530.13 32.28 = 353.86 23.04 32.87 34.08 ANALYZE: INDEX 00.000001% 9 8.33 530.13 32.28 = 353.86 23.04 32.87 34.08 ANALYZE: INDEX 20.000000% 9 0.00 596.27 32.97 = 551.43 58.40 19.22 19.51 ANALYZE: INDEX 05.000000% 9 0.00 596.27 40.77 = 854.46 59.40 24.67 23.23 ANALYZE: INDEX 10.000000% 9 0.00 596.27 40.86 = 792.72 141.73 29.25 29.07

First of all, there is the huge deviation the calculations of = LEAF_BLOCKS by
DBMS_STATS and by ANALYZE. Each command (DBMS_STATS or ANALYZE) is consistent, so one must be wrong and the other right. Unfortunately, in this case, it is the DBMS_STATS package that is uniformly wrong on this, because the results produced by the ANALYZE INDEX commands are = corroborated
by similar results from the ANALYZE INDEX ... VALIDATE STRUCTURE = command,
the DBMS_SPACE package, and the values stored in the DBA_SEGMENTS view. = I
searched MetaLink for a bug report related to DBMS_STATS and = LEAF_BLOCKS,
but could not find one. I intend to retest this against 9.2.0.5 and 10.1.0.3, when I get a chance. I=B9ll also test it against 9.2.0.1...

A few more things to note about the results shown here:

DBMS_STATS.GATHER_INDEX_STATS and ANALYZE INDEX produce the exact = same results when COMPUTEing
> * except the LEAF_BLOCKS anomaly
>> * which also apparently causes an anomaly with = AVG_LEAF_BLOCKS_PER_KEY
the ANALYZE INDEX ... COMPUTE results are corroborated by other utilities, so the ANALYZE INDEX command is more accurate in v9.2.0.4 = than the DBMS_STATS.GATHER_INDEX_STATS command
The results from both DBMS_STATS and ANALYZE are wildly worse when performing an estimated sample, instead of compute

That last point is the most important. Nonetheless, since the ANALYZE = INDEX
... COMPUTE command is clearly the most accurate, here is the same = result
data, this time using the results from ANALYZE INDEX .. COMPUTE = STATISTICS
as the baseline instead of DBMS_STATS:
>=20

                               Nbr of                             % =

Dev'n %
Dev'n

                              indexes % Dev'n   % Dev'n   % Dev'n Avg =
Leaf
Avg Data % Dev'n       % Dev'n
LABEL                         sampled  BLevel Leaf Blks Dist Keys =

Blks/Key
Blks/Key Clu Fac # Idx Entries

------------------------------ ------- ------- --------- --------- =

------- ------------- ANALYZE: INDEX COMPUTE 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ANALYZE: INDEX 50.000000% 9 0.00 0.13 23.55 = 15.18 13.85 17.29 17.60 DBMS_STATS: INDEX COMPUTE 9 0.00 85.04 0.00 = 41.30 0.00 0.00 0.00 DBMS_STATS: INDEX 05.000000% 9 0.00 85.04 22.13 = 38.50 16.40 0.44 0.00 DBMS_STATS: INDEX 20.000000% 9 0.00 85.04 25.39 = 36.70 24.22 0.44 0.00 ANALYZE: INDEX 01.000000% 9 8.33 8.42 32.28 = 36.11 23.04 32.87 34.08 ANALYZE: INDEX 00.010000% 9 8.33 8.42 32.28 = 36.11 23.04 32.87 34.08 ANALYZE: INDEX 00.000001% 9 8.33 8.42 32.28 = 36.11 23.04 32.87 34.08 DBMS_STATS: INDEX 01.000000% 9 0.00 85.04 31.17 = 36.24 28.18 0.44 0.00 DBMS_STATS: INDEX 10.000000% 9 0.00 85.04 25.56 = 38.07 33.89 0.44 0.00 ANALYZE: INDEX 20.000000% 9 0.00 0.13 32.97 = 53.44 58.40 19.22 19.51 DBMS_STATS: INDEX 50.000000% 9 0.00 85.04 26.42 = 37.22 38.65 0.44 0.00 DBMS_STATS: INDEX 00.010000% 9 0.00 85.04 27.45 = 37.53 37.91 0.44 0.00 DBMS_STATS: INDEX 00.000001% 9 0.00 85.04 27.81 = 37.05 40.37 0.44 0.00 ANALYZE: INDEX 05.000000% 9 0.00 0.13 40.77 = 107.16 59.40 24.67 23.23 ANALYZE: INDEX 10.000000% 9 0.00 0.13 40.86 = 95.99 141.73 29.25 29.07

Again, this perspective shows the two types of COMPUTE largely agreeing = with
each other (except for the LEAF_BLOCKS and the AVG_LEAF_BLOCKS_PER_KEY = value
derived from it), with all other results showing dramatically less = accurate
due to decreases in the sample size.

For indexes, clearly computing yields the best results, while a bug in DBMS_STATS appears to be skewing the LEAF_BLOCK results in v9.2.0.4. = making
the ANALYZE INDEX command the best alternative for that version.

So, from this data, which is admittedly sparse and limited (needs more testing on a wider range of versions and applications), I think that one = can
conclude:

Perform the gathering of table/column and index statistics separately = from one another
> * Gather statistics on tables and columns in one operation using
> DBMS_STATS.GATHER_TABLE_STATS >> * leave the CASCADE parameter at the default of FALSE >> * use a low ESTIMATE_PERCENT and even use BLOCK_SAMPLE=3D>TRUE, if = necessary
> * Gather statistics on indexes in another operation to either ANALYZE =
INDEX or > DBMS_STATS.GATHER_INDEX_STATS >> * only do COMPUTE
There is a bug in DBMS_STATS.GATHER_INDEX_STATS in 9.2.0.4 regarding = the calculation of LEAF_BLOCKS that needs to be explored further...

>From this, I don=B9t think that one can conclude that it is better to = ANALYZE
INDEX ... COMPUTE instead of
DBMS_STATS.GATHER_INDEX_STATS(ESTIMATE_PERCENT=3D>NULL), because I = don=B9t know
if the anomaly found exists outside of 9.2.0.4...

Hope this helps...

--
http://www.freelists.org/webpage/oracle-l

Received on Mon Dec 27 2004 - 12:47:44 CST