Re: difference between hash semi join and merge semi join

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Sat, 5 Jun 2004 06:44:20 +0000 (UTC)
Message-ID: <c9rq44$isc$1@titan.btinternet.com>

The semi-join operator is a modified join used for existence (you IN subquery has
been turned into an EXISTS subquery).

Once the subquery has been turned into
a join, Oracle can choose between a
hash, merge or NL - it doesn't really
matter that it is a semi- rather than normal join.

The performance tuning guide and ref
is quite good on describing the joins
in the 9.2 version (although I think it
may have two descriptions of the hash,
one of which is back to front).

Merge:

    The two data sets have to be sorted
    on the join columns. The presence of     a suitable index on the first input means     it may not need a sort operation. The     second input always goes through a
    sort, even if it is acquired from a suitable     index. Oracle then walks the two sorted     sets in tandem, identifying the overlaps.

Hash join

The first data set is distributed into a hash table, ideally in memory, based on hashing the join columns. For large data sets, the hash table will probably overflow to disc. The second data set is acquired; and for each row the join columns are hashed, so that a quick lookup into the hash table can place only for possible matches.

Dependent on resources, data volumes, or data distributions, either method could be faster.

The problem with the explain plan is that Oracle has produced an arithmetic model
that doesn't match the actual data sets. It happens for a variety of reasons - often down to inadequate statistics.

-- 
Regards

Jonathan Lewis

http://www.jlcomp.demon.co.uk

http://www.jlcomp.demon.co.uk/faq/ind_faq.html
The Co-operative Oracle Users' FAQ

http://www.jlcomp.demon.co.uk/seminar.html
Optimising Oracle Seminar - schedule updated May 1st


<samuels_at_red.seas.upenn.edu> wrote in message
news:c9q4p2$386j$1_at_netnews.upenn.edu...


> hi,

> can someone explain to me the difference between a hash semi join and a

merge semi join?


> the version of oracle i'm running is Oracle9i Enterprise Edition Release

9.2.0.4.0  on SunOS 5.8.


> the problem i'm having is that this query ran within minutes on 8.0.6

(same os):


> select customer_number, customer_id, customer_seq_number from

customer_table_1 where customer_id in


> (select distinct customer_id from customer_table_2);

>

> customer_table_1 has about 5 million rows and the primary key on this

table is customer_number, customer_id, and customer_seq_number.


> customer_table_2 has about 3 million rows and the primary key contains the

customer_id columns (the 2nd key value in the index).


> when i do an explain plan on the query (in 9.2.0.4.0), a hash semi-join is

used and the query runs forever (about 4 hours, before i kill it).


> there are about 14 distinct customer_id's and the result set for the query

contains about 1 million rows.


> when i change the above query to the following:

> select customer_number, customer_id, customer_seq_number from

customer_table_1 where customer_id in


> (select /*+merge_sj*/ customer_id from customer_table_2);

>

> the query runs within 2 minutes (when i do an explain plan, the cost is

about 5 times higher than when oracle was using a hash semi-join).


> so i am just wondering, what exactly is going on here?

> if anyone can point me to any websites or documentation that clearly

explains the difference between hash and merge, that would be great.


> and what's the deal with the explain plan? why is oracle choose a bad

access method (i mean i know the hash has a lower value, but it clearly


> is not more efficient)?

>

> thanks in advance.

> -maurice

>

> samuels_at_seas.upenn.edu

Received on Sat Jun 05 2004 - 01:44:20 CDT