Re: Wrong cardinality estimates if group by (and possibly a hash join)

From: <Laimutis.Nedzinskas_at_seb.lt>
Date: Fri, 27 Apr 2012 12:58:32 +0300
Message-ID: <OF367EADE0.6F07F413-ONC22579ED.0030396D-C22579ED.0036CCB5_at_seb.lt>



>This really isn't an issue with the group by cardinality as the output of
the GBY

I pasted the test query and plan again at the end of the email.

Consider those two fragments, same tables, same number of rows, same predicates:

select with materialize:

|*  3 |    HASH JOIN RIGHT SEMI    |                             |  1917K|
334M|       | 12038   (1)| 00:02:25 |
|   4 |     INDEX FAST FULL SCAN   | I_CL_ID                     |   109K|
640K|       |    69   (2)| 00:00:01 |
|   5 |     TABLE ACCESS FULL      | IP                          |  1917K|
323M|       | 11957   (1)| 00:02:24 |

select without materialize, i.e. transformations allowed:

|* 2 | HASH JOIN | | 53 | 954 | 12341 (1)| 00:02:29 |
| 3 | SORT UNIQUE | | 109K| 640K| 69 (2)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| I_CL_ID | 109K| 640K| 69 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL | IP | 1917K| 21M| 11902 (1)| 00:02:23 |

(btw, "Sort unique" seems to have no effect)

IMHO, the later HASH JOIN uses this (10053 trace) to estimate the cardinality:

Join order[2]: CL[CL]#1 IP[IP]#0
...
Outer table: Card: 3.00 Cost: 428.52 Resp: 428.52 Degree: 1 Bytes: 6 Inner table: IP Alias: IP
...
Join Card: 52.63 = outer (3.00) * inner (1917174.00) * sel (9.1504e-06) Join Card - Rounded: 53 Computed: 52.63

This sel (9.1504e-06) seems to be comming from this (density ~ 1/NDV):

Column (#1): CL_ID(NUMBER)

    AvgLen: 6.00 NDV: 109285 Nulls: 0 Density: 9.1504e-06 Min: -4 Max: 2109168

But where in the world from comes the outer (3.00) ??

The CL table is:

Table Stats::
  Table: CL Alias: CL
    #Rows: 109285 #Blks: 1824 AvgRowLen: 124.00   Column (#1): CL_ID(NUMBER)
    AvgLen: 6.00 NDV: 109285 Nulls: 0 Density: 9.1504e-06 Min: -4 Max: 2109168

    Histogram: HtBal #Bkts: 254 UncompBkts: 254 EndPtVals: 255

value 3 is NDV of join predicate column IDV_R_F: 2 - access ("CL"."CL_ID"="IP"."IDV_R_F") Table Stats::
  Table: IP Alias: IP
    #Rows: 1917174 #Blks: 43757 AvgRowLen: 177.00   Column (#9): IDV_R_F(NUMBER)
    AvgLen: 6.00 NDV: 3 Nulls: 0 Density: 2.5603e-07 Min: 0 Max: 2050594     Histogram: Freq #Bkts: 3 UncompBkts: 5104 EndPtVals: 3

Hmmmm... I am lost.

p.s.
If asked why I am so crazy about this whole cardinality estimate issue is this: the case is only a part of a larger case. The full query has a third large table joined. Guess what: Cardinality of merely 53 rows yields a plan with NL index join of this large table... 1917K instead of 53 random table reads make quite a difference... The correct plan (which happens to happen in 10gr2 under some special circumstances) has correct estimate and full scan/hash join of this large table.

My guess: the problem lies in view transformation and cardinality estimates and GBY.



The select used is (with or w/o hint):

explain plan for
with v as (select /*+materialize */* FROM sebim.ip

   WHERE
EXISTS (SELECT 1

                   FROM classif.cl
                  WHERE cl.cl_id = ip.idv_r_f
                  )

)
SELECT ip_id from v
group by ip_id
/

Please consider the environment before printing this e-mail

                                                                                                                                                   
  From:       Greg Rahn <greg_at_structureddata.org>                                                                                                  
                                                                                                                                                   
  To:         Laimutis.Nedzinskas_at_seb.lt                                                                                                           
                                                                                                                                                   
  Cc:         oracle-l_at_freelists.org                                                                                                               
                                                                                                                                                   
  Date:       2012.04.26 17:42                                                                                                                     
                                                                                                                                                   
  Subject:    *** SPAM ***  Re: Wrong cardinality estimates if group by (and possibly a hash join)                                                 
                                                                                                                                                   





This really isn't an issue with the group by cardinality as the output of the GBY isn't input into anything other than the final result set. The difference in the plans has to do with the forced materialization and the change of the join type.

On Thu, Apr 26, 2012 at 5:30 AM, <Laimutis.Nedzinskas_at_seb.lt> wrote:.

      In Oracle 11.2.0.3 and 10.2.0.3 group by produces wrong
      cardinalities.  See
      the inner hash join cardinality bellow.

--
Regards,
Greg Rahn  |  blog  |  twitter  |  linkedin:


--
http://www.freelists.org/webpage/oracle-l
Received on Fri Apr 27 2012 - 04:58:32 CDT

Original text of this message