Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: difference between hash semi join and merge semi join
The semi-join operator is a modified join
used for existence (you IN subquery has
been turned into an EXISTS subquery).
Once the subquery has been turned into
a join, Oracle can choose between a
hash, merge or NL - it doesn't really
matter that it is a semi- rather than normal
join.
The performance tuning guide and ref
is quite good on describing the joins
in the 9.2 version (although I think it
may have two descriptions of the hash,
one of which is back to front).
Merge:
The two data sets have to be sorted
on the join columns. The presence of
a suitable index on the first input means
it may not need a sort operation. The
second input always goes through a
sort, even if it is acquired from a suitable
index. Oracle then walks the two sorted
sets in tandem, identifying the overlaps.
Hash join
The first data set is distributed into a hash table, ideally in memory, based on hashing the join columns. For large data sets, the hash table will probably overflow to disc. The second data set is acquired; and for each row the join columns are hashed, so that a quick lookup into the hash table can place only for possible matches.
Dependent on resources, data volumes, or data distributions, either method could be faster.
The problem with the explain plan is that
Oracle has produced an arithmetic model
that doesn't match the actual data sets. It
happens for a variety of reasons - often
down to inadequate statistics.
-- Regards Jonathan Lewis http://www.jlcomp.demon.co.uk http://www.jlcomp.demon.co.uk/faq/ind_faq.html The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/seminar.html Optimising Oracle Seminar - schedule updated May 1st <samuels_at_red.seas.upenn.edu> wrote in message news:c9q4p2$386j$1_at_netnews.upenn.edu...Received on Sat Jun 05 2004 - 01:44:20 CDT
> hi,
> can someone explain to me the difference between a hash semi join and a
merge semi join?
> the version of oracle i'm running is Oracle9i Enterprise Edition Release
9.2.0.4.0 on SunOS 5.8.
> the problem i'm having is that this query ran within minutes on 8.0.6
(same os):
> select customer_number, customer_id, customer_seq_number from
customer_table_1 where customer_id in
> (select distinct customer_id from customer_table_2);
>
> customer_table_1 has about 5 million rows and the primary key on this
table is customer_number, customer_id, and customer_seq_number.
> customer_table_2 has about 3 million rows and the primary key contains the
customer_id columns (the 2nd key value in the index).
> when i do an explain plan on the query (in 9.2.0.4.0), a hash semi-join is
used and the query runs forever (about 4 hours, before i kill it).
> there are about 14 distinct customer_id's and the result set for the query
contains about 1 million rows.
> when i change the above query to the following:
> select customer_number, customer_id, customer_seq_number from
customer_table_1 where customer_id in
> (select /*+merge_sj*/ customer_id from customer_table_2);
>
> the query runs within 2 minutes (when i do an explain plan, the cost is
about 5 times higher than when oracle was using a hash semi-join).
> so i am just wondering, what exactly is going on here?
> if anyone can point me to any websites or documentation that clearly
explains the difference between hash and merge, that would be great.
> and what's the deal with the explain plan? why is oracle choose a bad
access method (i mean i know the hash has a lower value, but it clearly
> is not more efficient)?
>
> thanks in advance.
> -maurice
>
> samuels_at_seas.upenn.edu