Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Re: CBO and cartesian product

Re: CBO and cartesian product

From: Tim Gorman <tim_at_sagelogix.com>
Date: Sat, 11 Oct 2003 10:14:24 -0800
Message-ID: <F001.005D2D5B.20031011101424@fatcity.com>


Here is the short answer:


Here is the long answer:


Starting in the 8i timeframe, the CBO started borrowing some techniques from data warehouse STAR joins when confronted with any type of query that traversed two different entity-relationship heirarchies starting from the same table.

Say you have three tables (to keep it simple). One table is a "child" entity to the other two tables, which are both "parent" entities in ERD terms. The CBO detects that both "parent" tables are much smaller than the "child" table.

OK, so there is no relationship between the two parent tables -- they are both "related" only through the large child table.

Now, think about what traditional join methods are possible:

  1. start with one of the parent tables as the "driving table", do a indexed nested-loop range-scan during the join to the child table, and then perform indexed nested-loop unique-scan during the final join to the other parent table
  2. reverse the order of option #1. Start with the other parent table, join to the child, and then join up to the remaining parent
  3. start with the child table and join up (via indexed unique-scans) to the two parent tables

The weak point of both of these options is probably the access of the child table. Plain and simple, it is difficult to efficiently get rows from it. It is likely that the index supporting the foreign-key relationship from either parent table is not very efficient by itself, resulting in a very expensive range-scan, requiring a massive number of logical I/Os and "cost" calculated by the CBO.

So, the CBO in 8i started utilizing another option, which initially blew my mind first time I saw it happen. It was the point which I realized that the CBO was _way_ smarter than humans...

This additional option is to perform a "cartesian join" between the two parent tables, to come up with one result set. Then, using the filtered cartesian result set from that join, the CBO probes into the large child table using the _combined_ keys from both parent tables!

Rather brilliant choice, in most cases. The cartesian join, despite everybody's visceral fear of it, is actually rather insignificant if the two parent tables are small. And it is even smaller if there are good "filtering" predicates on those tables in the WHERE clause.

So, instead of having to retrieve rows from the large child table using one or the other of the relatively ineffective indexes supporting each foreign key, the CBO merges and uses both keys, resulting in a far more effective access method into the child table.

So, chances are good that this is the situation you are facing. Is this correct? Can you verify the basic relationships between the tables involved?

So, now the question is: why did the CBO make the wrong choice?

First, the default setting of the OPTIMIZER_INDEX_CACHING parameter (i.e. "0") represents a flaw in the basic costing algorithm used by the CBO. Setting the parameter to 90 or so fixes this flaw. For a more detailed explanation, please feel free to view my paper "Search for Intelligent Life in the CBO", available online at "http://www.EvDBT.com/papers.htm".

Changing that alone may cause the CBO to rethink its decision to go with the derived STAR-join scheme involving a cartesian join, and instead choose the indexed nested-loops scheme which is the __only__ possible choice by the RBO. By discounting the cost of index-based access methods, the CBO (which considers _all_ possible access methods and chooses the one with the lowest cost) may now choose the index-based plan. Once again, the RBO only considered the one plan, which in this case turned into a bit of luck for the RBO, making it look good.

You can experiment with this parameter change using ALTER SESSION, if you like. This is one of the _few_ occasions on which changing a parameter actually has an impact on performance.

There are some other parameter settings which may impact how the CBO costs this query. For example, if DB_FILE_MULTIBLOCK_READ_COUNT is set higher than its default value of 8 or 16, then the CBO will think that access plans involving FULL table scans are cheaper than they are.

Another possible cause for the CBO's bad decision is it's default assumption that data values in a column are evenly distributed. Gathering stats for indexed columns only gathers the number of distinct keys and the low/high values, by default. Therefore, the CBO has no choice except to assume an even distribution of data, that each distinct data value is used by an equal number of rows. Gathering column-level statistics creates "histograms" in the data dictionary that help the CBO recognize which data values would benefit from indexed access and which data values would benefit from a FULL table scan or another index.

Anyway, I'm getting writer's cramp. Plus, it's a beautiful Saturday morning and I've been traveling for two straight weeks...

Hope this helps?

-Tim

on 10/11/03 5:49 AM, Dilip at dilip7772002_at_indiatimes.com wrote:

> Hi List,
>
> DB version - 8174. (Oracle Apps 11.5.4).
>
> One of the customized report started running very slowly. One query was taking
> more than 3 GB of TEMP tablespace for 'HASH' type segment and erroring out
> after 2 hours for a lack of space in TEMP.
>
> Tkprof showed that it is doing Cartesian Sort Merge Join.
> After running the same report under RBO, it is using Nested loops and report
> is completing in just 5 minutes.
> The stats were collected last week-end. I couldn't find any reason for this
> CBO's behavior ? Has anybody experienced this before ?
>
> Thanks,
> ~Dilip
>
>
>
>
>
>
> Get Your Private, Free E-mail from Indiatimes at http://email.indiatimes.com
>
> Buy The Best In BOOKS at http://www.bestsellers.indiatimes.com
>
> Bid for for Air Tickets @ Re.1 on Air Sahara Flights. Just log on to
> http://airsahara.indiatimes.com and Bid Now !

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Tim Gorman
  INET: tim_at_sagelogix.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).
Received on Sat Oct 11 2003 - 13:14:24 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US