Re: hash group by estimates wrong cardinality with statistics

From: Martin Preiss <mtnpreiss_at_gmx.de>
Date: Thu, 30 Apr 2015 08:24:55 +0200
Message-ID: <5541CAB7.8010009_at_gmx.de>



Hello,

some comments without a particular order:

  • replaying your example in 12.1.0.2 I get similar results: the same plan for both queries and a hash group by cardinality of 1 with statistics and equal to the cardinality of the preceding hash join without statistics.
  • given that the operation returns 19 rows in my system both estimates are questionable: maybe the 1 is even the better guess.
  • as far as I know 11g introduced some changes in the calculation of the cardinality of group by operations. At least that's what Alexandr Antonov mentioned in http://antonoal.blogspot.de/2012/12/11g-group-by-selectivity-improvements.html in 2012. According to this article the change is controlled by the parameter _optimizer_improve_selectivity.
  • in older releases the cardinality was determined by the following strategy: "In general, the optimizer estimates the number of distinct combinations of N columns by multiplying the individual num_distinct values, and then dividing by the square root of 2 (N-1) times." (Jonathan Lewis: Cost Based Oracle Fundamentals, p. 388)
  • taking a look at a CBO trace (event 10053) for both operations I see that the the plan with statistics contains the following lines: SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T101[B] ... Grouping column cardinality [ OWNER] 19 ...
    Now joining: T101[B]#0
    NL Join ... Grouping column cardinality [ OWNER] 1
  • so the CBO has the correct cardinality for the single table, but for the join the value drops to 1 - and I don't remember (or never knew) how the optimizer calculates the number of distinct combinations in this situation.
  • the Grouping column cardinality are not in the plan with dynamic sampling.

I know there are a lot of gaps in this list of observations... - maybe someone else can close them...

Regards

Martin Preiss

Am 30.04.2015 um 00:59 schrieb Ls Cheng:
> Hi
>
> I have noticed in a simple query between two tables when table have no
> statistics dynamic samping estimates good cardinality for hash group
> by however when statistics are gathered ithe CBO estimates one row for
> the hash gruop by operation.
>
> This is a test case:
>
> drop table t100;
> drop table t101;
>
> create table t100 as select * from dba_source where 1 = 0;
> create table t101 as select * from dba_source where 1 = 0;
>
> insert into t100 select * from dba_source;
> delete t100 where owner = 'SYS' and rownum <49457;
> insert into t101 select * from dba_source where owner = 'SYS' and
> rownum < 100001;
>
> select a_owner, count(*)
> from (
> select a.owner a_owner, a.name <http://a.name>, b.name
> <http://b.name> b_name
> from t100 a left outer join t101 b
> on a.owner = b.owner
> and a.name <http://a.name> = b.name <http://b.name>
> and a.type = b.type
> and a.line = b.line)
> where b_name is null
> group by a_owner;
>
> no statistics gives this plan:
>
> Plan hash value: 908433588
>
> ----------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
> (%CPU)| Time |
> ----------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 258K| 27M| | 2929
> (1)| 00:00:36 |
> *| 1 | HASH GROUP BY | | 258K| 27M| | 2929
> (1)| 00:00:36 |*
> |* 2 | FILTER | | | | |
> | |
> |* 3 | HASH JOIN RIGHT OUTER| | 258K| 27M| 6400K| 2915
> (1)| 00:00:35 |
> | 4 | TABLE ACCESS FULL | T101 | 97791 | 5252K| |
> 582 (1)| 00:00:07 |
> | 5 | TABLE ACCESS FULL | T100 | 258K| 13M| | 1200
> (1)| 00:00:15 |
> ----------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - filter("B"."NAME" IS NULL)
> 3 - access("A"."LINE"="B"."LINE"(+) AND "A"."TYPE"="B"."TYPE"(+) AND
> "A"."NAME"="B"."NAME"(+) AND "A"."OWNER"="B"."OWNER"(+))
>
> Note
> -----
> - dynamic sampling used for this statement (level=2)
>
> with statistics
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 908433588
>
> ----------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
> (%CPU)| Time |
> ----------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | 68 | |
> 2575 (1)| 00:00:31 |
> *| 1 | HASH GROUP BY | | 1 | 68 | |
> 2575 (1)| 00:00:31 |*
> |* 2 | FILTER | | | | |
> | |
> |* 3 | HASH JOIN RIGHT OUTER| | 220K| 14M| 4304K|
> 2563 (1)| 00:00:31 |
> | 4 | TABLE ACCESS FULL | T101 | 100K| 3125K| |
> 650 (1)| 00:00:08 |
> | 5 | TABLE ACCESS FULL | T100 | 220K| 7746K| |
> 1199 (1)| 00:00:15 |
> ----------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - filter("B"."NAME" IS NULL)
> 3 - access("A"."LINE"="B"."LINE"(+) AND "A"."TYPE"="B"."TYPE"(+) AND
> "A"."NAME"="B"."NAME"(+) AND "A"."OWNER"="B"."OWNER"(+))
>
>
> This is 11.2.0.4 in Linux. Any lights?
>
> Thanks
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 30 2015 - 08:24:55 CEST

Original text of this message