Re: HASH vs NESTED LOOP join: when is one better than the other ?

From: Jaap W. van Dijk <j.w.vandijk.removethis_at_hetnet.nl>
Date: Fri, 24 Mar 2006 20:54:24 GMT
Message-ID: <44245c4c.1054765@news.hetnet.nl>

On 22 Mar 2006 16:46:51 -0800, "William Robertson" <william.robertson_at_bigfoot.com> wrote:

>Spendius wrote:
>> <snip>
>> Can the algorithm be simply summarized?
>
>In the nested-loops approach, for each row in driving table A you
>search table B for a corresponding value. This will typically (though
>not necessarily) be an index search (visit the index, retrieve the
>rowid, visit the table - potentially accessing a block you have already
>visited in an earlier loop iteration).
>
>In the hash-join approach, as I understand it, you first read the
>smaller table and construct an in-memory (if possible, subject to PGA
>resources etc) array using hash values generated from the specified
>key. Then you work through the second table applying the same hash
>function to the corresponding key values, allowing you to match rows
>without a sort but at some cost in memory and CPU.
>
>Typically the second approach is significantly more efficient when
>joining two large sets of rows. However if you have a relatively small
>number of values to search for, you might as well fetch them directly
>(via nested loops) and avoid the overhead of building large hashes.
>
>There is also the sort-merge approach (largely superseded by hash joins
>since 7.3) in which two large tables are matched by sorting both sets
>of keys.
>

And due to the nature of this mechanism: hash join can only handle equi-joins.

Jaap. Received on Fri Mar 24 2006 - 14:54:24 CST