Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Mailing Lists -> Oracle-L -> Re: Join order and intermediate results
Dan,
Thanks for taking the time to provide such a detailed explanation. I'm probably missing something, but it appears that this explanation of how one-at-a-time (linear) join orders can match (or nearly so) the performance of other join orders is based on the assumption that the tables have a master-detail relationship of some kind. Queries in a data warehouse rarely have this property. In this particular case, joining A to B gave a relatively small result set and joining C to D gave a relatively small result set, so that joining AB to CD was very efficient.
The other combinations were either very inefficient or had no semantic use:

* A cannot be joined to C or D (other than a meaningless Cartesian product) because they have no columns in common
* D cannot be joined to A or B (other than a meaningless Cartesian product) because they have no columns in common
* B joined to C without filtering B through A and C through D gave an enormous intermediate result set
One could argue that the data model was not suited to this query, but that doesn't solve the problem and the data model may be very well suited to many other classes of queries that are useful to end users.
Also, this sentence left me wondering how non-Oracle databases store and retrieve data:
"So, if (AB) costs N logical I/Os, it is very unlikely to contain more than
N rows."
I assume this is not referring to Oracle databases since it is very easy to
write meaningful queries in Oracle that return many more rows than the
number of LIOs needed to retrieve them. Multiple rows per block and a good
clustering factor make this a snap.
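To make the point concrete, here is some back-of-the-envelope arithmetic in Python (every number below is hypothetical, chosen only to illustrate the effect John describes):

```python
# Hypothetical example: a well-clustered index range scan can return far
# more rows than the logical I/Os (LIOs) spent retrieving them.
rows_per_block = 100    # small rows packed into an 8K block (assumed)
table_blocks_read = 10  # table blocks touched; good clustering factor
index_lios = 3          # root + branch + leaf blocks for the range scan

rows_returned = rows_per_block * table_blocks_read  # 1000 rows
total_lios = index_lios + table_blocks_read         # 13 LIOs

print(rows_returned, total_lios)  # 1000 rows for only 13 LIOs
```

With these (made-up) numbers the query returns roughly 77 rows per logical I/O, so "N logical I/Os implies at most N rows" clearly cannot hold in general.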
And while the next statement is true, I didn't see how I could apply it except in unusual cases:
"However, if it contains N rows, and if all the primary keys are indexed,
then there has to be a nested-loops path from (AB) to C and D
*individually* that does *at most* 2N index unique scans and 2N table
accesses by ROWID, ..."
This is only true if (1) all of the tables follow a master-detail chain and
(2) I'm joining from the detail rows first (both of which you've stated as
assumptions). However, this bounding formula only applies to the class of
queries where it is more efficient to filter the detail rows first and then
join to the parents. It appears to ignore the other very large class of
useful queries where it is more efficient to start with the master rows
first and work our way down.
Finally, this statement seems to imply that one-at-a-time paths can always provide reasonable approximations to optimal join paths in terms of LIOs (maybe I'm reading too much into it):
"So, I'm fascinated that you appear to have found a case where the sum of
the costs of (AB) and (CD) was 85 logical I/Os (implying either very small
tables or excellent indexed paths to less than 85 rows for both joins), but
did not find a one-at-a-time path that cost less than (or even close to!)
935 logical I/Os (11*85)."
It seems to me that the one-at-a-time approach has some significant
drawbacks for certain classes of queries if the best it can achieve is an
order of magnitude increase in LIOs over other join paths. The optimal join
order in this case just leaps right out at you if you look at the schema and
the data. It would be very nice if the CBO could see it too and I didn't
have to resort to trickery such as adding a pseudo-column like ROWNUM in the
inline views to force it out of the one-at-a-time path. Not only that,
one-at-a-time paths are serial. Individual row source scans and joins can
be performed in parallel, but the overall structure of the join order is
serial whereas nonlinear join orders lend themselves to parallel execution
by their very nature. This would seem to be a good reason to invest in
improving the CBO to consider nonlinear join orders in parallel processing
environments.
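The parallelism argument can be sketched in a few lines of Python (the tuple encoding and the depth function are my own illustration of join-tree shape, not anything Oracle exposes):

```python
# Model a join order as a nested 2-tuple; a bare string is a base table.
# depth() returns the critical-path length: the number of join steps that
# must run one after another even with unlimited parallelism.
def depth(plan):
    if isinstance(plan, str):      # base table, e.g. "A"
        return 0
    left, right = plan             # inner node: (left_input, right_input)
    return 1 + max(depth(left), depth(right))

linear = ((("A", "B"), "C"), "D")  # one-at-a-time (left-deep) order
bushy = (("A", "B"), ("C", "D"))   # pair-off-then-join (nonlinear) order

print(depth(linear))  # 3: each join waits on the previous result
print(depth(bushy))   # 2: (A join B) and (C join D) can run concurrently
```

The bushy shape has a shorter critical path because its two inner joins have no data dependency on each other, which is exactly the structural advantage John is pointing at.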
Regards,
John Smiley
So, let's examine what normally happens with a 4-way join where you might want to join A to B to make (AB), C to D to make (CD), followed by a join (presumably a hash or a sort-merge join, since there can't be a useful nested-loops path into a pre-joined result) of (AB) to (CD):
In a normal join tree, one of these 4 tables will be the "detail" table of the whole query, and joins will reach the other 3 tables by three detail-to-master joins that begin from that table. Since we can assume there's a join between A and B and another join between C and D, we can assume, without any significant loss of generality that A is the detail table, and that the joins look either like
A.Fkey1=B.Pkey1 AND C.Fkey2=D.Pkey2 AND A.FKey3=C.PKey3
or
A.Fkey1=B.Pkey1 AND C.Fkey2=D.Pkey2 AND B.FKey3=C.PKey3
Now, assuming, again without loss of generality, that A or B is the detail
table, and that table is large enough that optimization is an issue, it is
almost invariably the case that you are going to need an indexed path such
that the number of logical I/Os to achieve (AB) is at least, roughly, as
high as (and quite possibly several times higher than) the number of rows in
the join result
(AB) (since each result row will probably map to a specific table-access by
ROWID, not even counting other indexed accesses and the table access for the
other table). So, if (AB) costs N logical I/Os, it is very unlikely to
contain more than N rows. However, if it contains N rows, and if all the
primary keys are indexed, then there has to be a nested-loops path from (AB)
to C and D
*individually* that does *at most* 2N index unique scans and 2N table
accesses by ROWID, *without* pre-joining C and D. Since an index unique scan
is very unlikely to require more than 4 (usually less) logical I/Os, and a
table access by ROWID requires precisely 1 logical I/O, the entire
worst-case plan that drives from (AB) to C and D individually should have an
upper-bound cost of 11N logical I/Os if (AB) cost N logical I/Os.
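That arithmetic can be restated as a tiny sketch (the function and its default constants merely restate the bounds from the paragraph above; they are illustrative, not an Oracle costing formula):

```python
# Worst-case LIO bound for the one-at-a-time plan that drives from (AB)
# to C and D individually, per the reasoning above:
#   N LIOs for (AB) itself (and hence at most N rows),
#   at most 2N index unique scans at <= 4 LIOs each,
#   at most 2N table accesses by ROWID at exactly 1 LIO each.
def worst_case_one_at_a_time_lios(n, unique_scan_max=4, rowid_access=1):
    return n + 2 * n * unique_scan_max + 2 * n * rowid_access  # = 11N

print(worst_case_one_at_a_time_lios(85))  # 935 = 11 * 85
```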
Furthermore, the ratio of *runtime* is probably even better (a lot better,
usually) than 11-to-1 between the whole cost and the cost of (AB), because,
A or B being the detail table, and therefore probably the biggest table, you
are more likely to see physical I/O during the reads of A and B than during
the reads of C and D. Even if the join (CD) was *free*, that would mean that
the best one-at-a-time plan in the style A-to-B-to-C-to-D or
A-to-B-to-D-to-C should cost no more than 11 times as much as the best plan
(A-to-B)-to-(C-to-D).
Keep in mind that all this is true even restricting ourselves to nested-loops
joins to C and D. When we allow for hash joins (one at a time, only where
they are best) to C and/or D, the likelihood of dramatic advantages for
(A-to-B)-to-(C-to-D) grows even smaller.
When you factor in all the practical details, the probable ratio of the best
plan in the one-at-a-time style to the best plan that first pairs tables off
is
*usually* <= one (one-at-a-time wins or ties), and *very* rarely more than 2
or 3. In fact, I've known of the possibility of plans joining already-joined
results for years, and it's a trick I haven't needed even once (which is why
I didn't talk about it in my book) to get an excellent result in real-world
tuning.
So, I'm fascinated that you appear to have found a case where the sum of the
costs of (AB) and (CD) was 85 logical I/Os (implying either very small
tables or excellent indexed paths to less than 85 rows for both joins), but
did not find a one-at-a-time path that cost less than (or even close to!)
935 logical I/Os (11*85). Without going into too much more tedious detail,
even unusual join trees that do not have a single detail table, as long as
all joins have a primary key on one end, and point to a single table on both
ends, also obey this 11-to-1 upper bound (between the best one-at-a-time
cost and the best
(AB)-to-(CD) cost) if we allow for hash joins to individual tables. I can
only see a few possibilities:
- You're missing at least one primary-key index, or
- You didn't find the *best* one-at-a-time join order, and the best one would cost less than 935 logical I/Os and run in less than a second, or
- You have an unusual join tree, where at least one of the joins is many-to-many, or some even more unusual feature. I discuss most cases of unusual join trees in my book, and they usually indicate a functional mistake in the query, though there are of course exceptions to that rule.
Thanks,
Dan Tow
650-858-1557
www.singingsql.com
Quoting Smiley John - IL <SMILEYJ_at_tusc.com>:
> Suppose we have four tables A, B, C, and D that are all used in a
> SELECT statement. Let's further suppose that we know that the best
> way to join these tables (best meaning fastest RT, and fewest LIOs) is
> to join A to B to produce row source AB, join C to D to produce row
> source CD, then join AB to CD.
>
> How do I get Oracle to do this? At first I simply wrote the query as
> a simple SELECT like this:
>
> SELECT ...
> FROM A, B, C, D
> WHERE ...
>
> When that didn't work and I made sure the table, index, and column
> stats were correct, I tried various hints for join order and join
> method. When that didn't work, I decided to smack Oracle in the face
> with the answer like
> this:
>
> SELECT ...
> FROM (SELECT ... FROM A, B WHERE ...) AB, (SELECT ... FROM C, D WHERE
> ...) CD WHERE ...
>
> However, it stubbornly refused to join in the correct order and I had
> to resort to storing the intermediate results in GTTs and then joining
> the GTTs.
>
> This is not a very satisfying means of solving the problem. Oracle
> seems to doggedly follow a linear path for satisfying the query, which
> goes something like this: join two row sources to create an
> intermediate result, pick another row source and join it to the
> intermediate result to produce a new intermediate result, repeat until
> done (in the case of parallel query, each PQ slave seems to follow
> this same approach and then the query coordinator merges the results).
>
> Expressed as a tree, it looks like this (where / | and \ are join
> operations):
>
> D (or C)
> /
> C (or D)
> /
> A
> |
> B
>
> When what I want is this:
>
> Result
> / \
> A C
> | |
> B D
>
>
> In the case I'm describing, this makes the difference between a
> response time of 0.1 seconds with 85 LIOs vs. a response time of 120
> seconds with 60,000 LIOs.
>
> I have a feeling I'm going to be smacking my forehead on this one, but
> I just don't see what I'm missing.
>
> John Smiley
>
> P.S. This is 9.2.0.4
>
> --
> http://www.freelists.org/webpage/oracle-l
>
--
http://www.freelists.org/webpage/oracle-l

Received on Sat Oct 02 2004 - 19:11:28 CDT