Re: Using bind variables for multiple values

From: Robert Klemme <shortcutter_at_googlemail.com>
Date: Mon, 24 Nov 2008 22:15:24 +0100
Message-ID: <6p0jreF5rm09U1@mid.individual.net>


On 24.11.2008 17:26, Jonathan Lewis wrote:
> "Robert Klemme" <shortcutter_at_googlemail.com> wrote in message
> news:6c7ef223-0934-4cf9-a44b-ad268937b86f_at_j32g2000yqn.googlegroups.com...

>> I wondered: why is it that the join version does a full table scan but
>> the IN ( subselect ) version does not?  I did a 10053 trace and found
>> out that the same access strategy which leads to the INDEX UNIQUE SCAN
>> in the case of IN ( subselect ) is costed much higher (~ factor of 30)
>> in the JOIN case.  BASE STATISTICS are identical in both cases and it
>> seems the culprit is the join cardinality (255 vs. 8168) which seems
>> derived from the outer table cardinality. The big myth to me is: where
>> does the CBO get the cardinality of 255 from all of a sudden since it
>> originally assumed 8168?

>
> If you check the cardinality, you will see that it is 255.25, which
> is EXACTLY 8168 / 32.

Yes that's what the report for the join analysis said. I didn't bother to quote the fractional digits in the text because I did not notice this relationship. Thanks for the heads up!

> When Oracle doesn't know what the real statistics are, it uses some
> arbitrary estimate. In this case, because the IN approach transforms
> into an inline 'select distinct column_value' and Oracle doesn't know
> how many distinct values there are, I guess is simply assumes 32.
>
> Just as 8168 is a side effect of the default block size, this 32 may also
> be based on the block size. So if you've got a database with a
> different block size handy, you might like to test what happens there.

Hm... I'll see whether I find the time to retest with a different non standard sized tablespace.

Interestingly an explicit DISTINCT does not save the join and we now get a hash join - concluding from the rows the CBO does not assume any duplicate rows - which is in line with the 10053 trace of that run, which assumes 8162 rows for the subquery:

SQL> timing start "Join and distinct"
SQL> select length(dat) from t1 join ( select distinct column_value from table(iset(-1,2,3)) ) tt on tt.column_value = t1.id

   2 /

LENGTH(DAT)


          90
          90


Execution Plan (truncated bytes and other cols for readability)



Plan hash value: 731532569
| Id  | Operation                                | Name | Rows  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |      |  8168 |
|*  1 |  HASH JOIN                               |      |  8168 |
|   2 |   VIEW                                   |      |  8168 |
|   3 |    HASH UNIQUE                           |      |  8168 |
|   4 |     COLLECTION ITERATOR CONSTRUCTOR FETCH|      |       |
|   5 |   TABLE ACCESS FULL                      | T1   |  1000K|
------------------------------------------------------------------

Predicate Information (identified by operation id):


    1 - access("TT"."COLUMN_VALUE"="T1"."ID")

Statistics


           1  recursive calls
           0  db block gets
       14192  consistent gets
           0  physical reads
           0  redo size
         363  bytes sent via SQL*Net to client
         377  bytes received via SQL*Net from client
           2  SQL*Net roundtrips to/from client
           0  sorts (memory)
           0  sorts (disk)
           2  rows processed

SQL> timing stop
timing for: Join and distinct
Elapsed: 00:00:00.23

Basically since the IN with subselect is the version that suits my needs best I am quite happy that it is blessed with a good plan. Although I have to say that it makes me a bit wary that I do not exactly understand what goes on in the other cases. I guess it's time for "CBO Fundamentals" - if only I had enough time...

Again, thanks for your help!

Kind regards

        robert Received on Mon Nov 24 2008 - 15:15:24 CST

Original text of this message