Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Mailing Lists -> Oracle-L -> Re: Join order and intermediate results
Dan,
Thanks for taking the time to provide such a detailed explanation. I'm probably missing something, but it appears that this explanation of how one-at-a-time (linear) join orders can match (or nearly so) the performance of other join orders is based on the assumption that the tables have a master-detail relationship of some kind. Queries in a data warehouse rarely have this property. In this particular case, joining A to B gave a relatively small result set and joining C to D gave a relatively small result set, so that joining AB to CD was very efficient.
The other combinations were either very inefficient or had no semantic use:

* A cannot be joined to C or D (other than a meaningless Cartesian product) because they have no columns in common
* D cannot be joined to A or B (other than a meaningless Cartesian product) because they have no columns in common
* B joined to C without filtering B through A and C through D gave an enormous intermediate result set
One could argue that the data model was not suited to this query, but that doesn't solve the problem and the data model may be very well suited to many other classes of queries that are useful to end users.
Also, this sentence left me wondering how non-Oracle databases store and retrieve data:
"So, if (AB) costs N logical I/Os, it is very unlikely to contain more than
N rows."
I assume this is not referring to Oracle databases since it is very easy to
write meaningful queries in Oracle that return many more rows than the
number of LIOs needed to retrieve them. Multiple rows per block and a good
clustering factor make this a snap.
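To make the point concrete, here is some back-of-the-envelope arithmetic in Python (every number below is hypothetical, chosen only to illustrate the effect John describes):

```python
# Hypothetical example: a well-clustered index range scan can return far
# more rows than the logical I/Os (LIOs) spent retrieving them.
rows_per_block = 100    # small rows packed into an 8K block (assumed)
table_blocks_read = 10  # table blocks touched; good clustering factor
index_lios = 3          # root + branch + leaf blocks for the range scan

rows_returned = rows_per_block * table_blocks_read  # 1000 rows
total_lios = index_lios + table_blocks_read         # 13 LIOs

print(rows_returned, total_lios)  # 1000 rows for only 13 LIOs
```

With these (made-up) numbers the query returns roughly 77 rows per logical I/O, so "N logical I/Os implies at most N rows" clearly cannot hold in general.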
And while the next statement is true, I didn't see how I could apply it except in unusual cases:
"However, if it contains N rows, and if all the primary keys are indexed,
then there has to be a nested-loops path from (AB) to C and D
*individually* that does *at most* 2N index unique scans and 2N table
accesses by ROWID, ..."
This is only true if (1) all of the tables follow a master-detail chain and
(2) I'm joining from the detail rows first (both of which you've stated as
assumptions). However, this bounding formula only applies to the class of
queries where it is more efficient to filter the detail rows first and then
join to the parents. It appears to ignore the other very large class of
useful queries where it is more efficient to start with the master rows
first and work our way down.
Finally, this statement seems to imply that one-at-a-time paths can always provide reasonable approximations to optimal join paths in terms of LIOs (maybe I'm reading too much into it):
"So, I'm fascinated that you appear to have found a case where the sum of
the costs of (AB) and (CD) was 85 logical I/Os (implying either very small
tables or excellent indexed paths to less than 85 rows for both joins), but
did not find a one-at-a-time path that cost less than (or even close to!)
935 logical I/Os (11*85)."
It seems to me that the one-at-a-time approach has some significant
drawbacks for certain classes of queries if the best it can achieve is an
order of magnitude increase in LIOs over other join paths. The optimal join
order in this case just leaps right out at you if you look at the schema and
the data. It would be very nice if the CBO could see it too and I didn't
have to resort to trickery such as adding a pseudo-column like ROWNUM in the
inline views to force it out of the one-at-a-time path. Not only that,
one-at-a-time paths are serial. Individual row source scans and joins can
be performed in parallel, but the overall structure of the join order is
serial whereas nonlinear join orders lend themselves to parallel execution
by their very nature. This would seem to be a good reason to invest in
improving the CBO to consider nonlinear join orders in parallel processing
environments.
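The parallelism argument can be sketched in a few lines of Python (the tuple encoding and the depth function are my own illustration of join-tree shape, not anything Oracle exposes):

```python
# Model a join order as a nested 2-tuple; a bare string is a base table.
# depth() returns the critical-path length: the number of join steps that
# must run one after another even with unlimited parallelism.
def depth(plan):
    if isinstance(plan, str):      # base table, e.g. "A"
        return 0
    left, right = plan             # inner node: (left_input, right_input)
    return 1 + max(depth(left), depth(right))

linear = ((("A", "B"), "C"), "D")  # one-at-a-time (left-deep) order
bushy = (("A", "B"), ("C", "D"))   # pair-off-then-join (nonlinear) order

print(depth(linear))  # 3: each join waits on the previous result
print(depth(bushy))   # 2: (A join B) and (C join D) can run concurrently
```

The bushy shape has a shorter critical path because its two inner joins have no data dependency on each other, which is exactly the structural advantage John is pointing at.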
Regards,
John Smiley
So, let's examine what normally happens with a 4-way join where you might want to join A to B to make (AB), C to D to make (CD), followed by a join (presumably a hash or a sort-merge join, since there can't be a useful nested-loops path into a pre-joined result) of (AB) to (CD):
In a normal join tree, one of these 4 tables will be the "detail" table of the whole query, and joins will reach the other 3 tables by three detail-to-master joins that begin from that table. Since we can assume there's a join between A and B and another join between C and D, we can assume, without any significant loss of generality that A is the detail table, and that the joins look either like
A.Fkey1=B.Pkey1 AND C.Fkey2=D.Pkey2 AND A.FKey3=C.PKey3
or
A.Fkey1=B.Pkey1 AND C.Fkey2=D.Pkey2 AND B.FKey3=C.PKey3
Now, assuming, again without loss of generality, that A or B is the detail
table, and that table is large enough that optimization is an issue, it is
almost invariably the case that you are going to need an indexed path such
that the number of logical I/Os to achieve (AB) is at least, roughly, as
high as (and quite possibly several times higher than) the number of rows in
the join result
(AB) (since each result row will probably map to a specific table-access by
ROWID, not even counting other indexed accesses and the table access for the
other table). So, if (AB) costs N logical I/Os, it is very unlikely to
contain more than N rows. However, if it contains N rows, and if all the
primary keys are indexed, then there has to be a nested-loops path from (AB)
to C and D
*individually* that does *at most* 2N index unique scans and 2N table
accesses by ROWID, *without* pre-joining C and D. Since an index unique scan
is very unlikely to require more than 4 (usually less) logical I/Os, and a
table access by ROWID requires precisely 1 logical I/O, the entire
worst-case plan that drives from (AB) to C and D individually should have an
upper-bound cost of 11N logical I/Os if (AB) cost N logical I/Os.
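That arithmetic can be restated as a tiny sketch (the function and its default constants merely restate the bounds from the paragraph above; they are illustrative, not an Oracle costing formula):

```python
# Worst-case LIO bound for the one-at-a-time plan that drives from (AB)
# to C and D individually, per the reasoning above:
#   N LIOs for (AB) itself (and hence at most N rows),
#   at most 2N index unique scans at <= 4 LIOs each,
#   at most 2N table accesses by ROWID at exactly 1 LIO each.
def worst_case_one_at_a_time_lios(n, unique_scan_max=4, rowid_access=1):
    return n + 2 * n * unique_scan_max + 2 * n * rowid_access  # = 11N

print(worst_case_one_at_a_time_lios(85))  # 935 = 11 * 85
```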
Furthermore, the ratio of *runtime* is probably even better (a lot better,
usually) than 11-to-1 between the whole cost and the cost of (AB), because,
A or B being the detail table, and therefore probably the biggest table, you
are more likely to see physical I/O during the reads of A and B than during
the reads of C and D. Even if the join (CD) was *free*, that would mean that
the best one-at-a-time plan in the style A-to-B-to-C-to-D or
A-to-B-to-D-to-C should cost no more than 11 times as much as the best plan
(A-to-B)-to-(C-to-D).
Keep in mind that all this is true even restricting ourselves to nested-loops
joins to C and D. When we allow for hash joins (one at a time, only where
they are best) to C and/or D, the likelihood of dramatic advantages for
(A-to-B)-to-(C-to-D) grows even smaller.
When you factor in all the practical details, the probable ratio of the best
plan in the one-at-a-time style to the best plan that first pairs tables off
is
*usually* <= one (one-at-a-time wins or ties), and *very* rarely more than 2
or 3. In fact, I've known of the possibility of plans joining already-joined
results for years, and it's a trick I haven't needed even once (which is why
I didn't talk about it in my book) to get an excellent result in real-world
tuning.
So, I'm fascinated that you appear to have found a case where the sum of the
costs of (AB) and (CD) was 85 logical I/Os (implying either very small
tables or excellent indexed paths to less than 85 rows for both joins), but
did not find a one-at-a-time path that cost less than (or even close to!)
935 logical I/Os (11*85). Without going into too much more tedious detail,
even unusual join trees that do not have a single detail table, as long as
all joins have a primary key on one end, and point to a single table on both
ends, also obey this 11-to-1 upper bound (between the best one-at-a-time
cost and the best
(AB)-to-(CD) cost) if we allow for hash joins to individual tables. I can
only see a few possibilities:
- You're missing at least one primary-key index, or
- You didn't find the *best* one-at-a-time join order, and the best one would cost less than 935 logical I/Os and run in less than a second, or
- You have an unusual join tree, where at least one of the joins is many-to-many, or some even more unusual feature. I discuss most cases of unusual join trees in my book, and they usually indicate a functional mistake in the query, though there are of course exceptions to that rule.
Thanks,
Dan Tow
650-858-1557
www.singingsql.com
Quoting Smiley John - IL <SMILEYJ_at_tusc.com>:
> Suppose we have four tables A, B, C, and D that are all used in a
> SELECT statement. Let's further suppose that we know that the best
> way to join these tables (best meaning fastest RT, and fewest LIOs) is
> to join A to B to produce row source AB, join C to D to produce row
> source CD, then join AB to CD.
>
> How do I get Oracle to do this? At first I simply wrote the query as
> a simple SELECT like this:
>
> SELECT ...
> FROM A, B, C, D
> WHERE ...
>
> When that didn't work and I made sure the table, index, and column
> stats were correct, I tried various hints for join order and join
> method. When that didn't work, I decided to smack Oracle in the face
> with the answer like
> this:
>
> SELECT ...
> FROM (SELECT ... FROM A, B WHERE ...) AB, (SELECT ... FROM C, D WHERE
> ...) CD WHERE ...
>
> However, it stubbornly refused to join in the correct order and I had
> to resort to storing the intermediate results in GTTs and then joining
> the GTTs.
>
> This is not a very satisfying means of solving the problem. Oracle
> seems to doggedly follow a linear path for satisfying the query, which
> goes something like this: join two row sources to create an
> intermediate result, pick another row source and join it to the
> intermediate result to produce a new intermediate result, repeat until
> done (in the case of parallel query, each PQ slave seems to follow
> this same approach and then the query coordinator merges the results).
>
> Expressed as a tree, it looks like this (where / | and \ are join
> operations):
>
> D (or C)
> /
> C (or D)
> /
> A
> |
> B
>
> When what I want is this:
>
> Result
> / \
> A C
> | |
> B D
>
>
> In the case I'm describing, this makes the difference between a
> response time of 0.1 seconds with 85 LIOs vs. a response time of 120
> seconds with 60,000 LIOs.
>
> I have a feeling I'm going to be smacking my forehead on this one, but
> I just don't see what I'm missing.
>
> John Smiley
>
> P.S. This is 9.2.0.4
>
> --
> http://www.freelists.org/webpage/oracle-l
>
--
http://www.freelists.org/webpage/oracle-l

Received on Sat Oct 02 2004 - 19:11:28 CDT