Oracle FAQ | Your Portal to the Oracle Knowledge Grid
c.d.o.server -> Re: Join Two Extremely Large Tables
You are quite right about the quantity of logical I/O, and probably physical I/O, but I think you may have missed the start of the thread. A key feature of the processing is the need to collate all the 'B' rows for a given 'A' row. The only way to achieve this with a hash join is via an explicit 'order by'.
Whilst an unsorted output could be generated quite rapidly by a hash join (given a hash_area_size somewhat larger than the first input) the cost of the subsequent necessary sort could easily outweigh the benefit of the hash.
As the OP also says, the nested loop solution has better scalability (although I would have used the term "predictability"). Hash joins, like sorts, have a 'catastrophe' performance profile - there are points where a small change in the inputs results in a large change in the response time. Sometimes it is simply better to go for the consistent, slow solution rather than live on the edge with a solution that is sometimes faster, and sometimes much slower.
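As a sketch of the two approaches being compared (table names, column names, and the index name are hypothetical, not taken from the thread), the trade-off might look like this in Oracle:

```sql
-- Hash join: builds unordered output quickly (given sufficient
-- hash_area_size), but collating all the 'B' rows per 'A' row
-- then requires an explicit - and potentially expensive - sort.
SELECT /*+ USE_HASH(b) */ a.a_id, a.a_data, b.b_data
FROM   big_a a, big_b b
WHERE  b.a_id = a.a_id
ORDER  BY a.a_id;

-- Nested loop driven through an index on big_b(a_id): the 'B' rows
-- for each 'A' row arrive already grouped together, so no extra
-- sort step is needed to collate them.
SELECT /*+ ORDERED USE_NL(b) INDEX(b big_b_aid_idx) */
       a.a_id, a.a_data, b.b_data
FROM   big_a a, big_b b
WHERE  b.a_id = a.a_id;
```

The hints shown (USE_HASH, USE_NL, ORDERED, INDEX) are standard Oracle optimizer hints; whether the optimizer honours them as written depends on the actual schema and statistics.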
--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

The educated person is not the person who can answer the questions,
but the person who can question the answers -- T. Schick Jr

One-day tutorials: http://www.jlcomp.demon.co.uk/tutorial.html
____UK_______April 22nd
____USA_(FL)_May 2nd
____Denmark__May 21-23rd
____Sweden___June
____Finland__September
____Norway___September

Three-day seminar: see http://www.jlcomp.demon.co.uk/seminar.html
____UK_(Manchester)_May x 2
____Estonia___June (provisional)
____Australia_June (provisional)
____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

"D.Y." <dyou98_at_aol.com> wrote in message news:f369a0eb.0304211322.21f44f6c_at_posting.google.com...

Received on Mon Apr 21 2003 - 16:52:11 CDT
> bchorng_at_yahoo.com (Bass Chorng) wrote in message
> news:<bd9a9a76.0304211032.681cd7ba_at_posting.google.com>...
> > "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message
> > news:<b7mull$l1b$1$8300dec7_at_news.demon.co.uk>...
> ...
> > > Sounds like the nested loop is it. You should get
> > > a noticeable improvement by doing the join (hinted
> > > very carefully to do exactly what you want) and
> > > then using array fetches of a few dozen to a few
> > > hundred rows at a time. It's messier to code
> > > at the boundary conditions than one row at a
> > > time - but significantly more efficient.
> >
> > I think nested loop gives you scalability too.
> > If I use a hash join, I have to calculate an
> > appropriate hash_area_size to make sure it performs well.
> > This makes the memory requirement unscalable.
> >
>
> I am not sure nested loop is the best choice, simply because it
> requires many times more logical I/O (compared to a hash join),
> and most of it will be turned into physical I/O for tables and
> indexes this large. If you can pull all the columns you need in one
> join, and your temp tablespace is not created on a very, very slow
> drive, the hash join should be the winner. Of course your temp
> tablespace needs to be large enough. If nested loop turns out to be
> faster for joining these tables I'd be very interested to know.
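The "array fetches of a few dozen to a few hundred rows at a time" suggested in the quoted advice can be sketched in PL/SQL with BULK COLLECT and a LIMIT clause (the cursor, tables, and batch size here are hypothetical illustrations, not code from the thread):

```sql
DECLARE
  CURSOR c IS
    SELECT /*+ ORDERED USE_NL(b) */ a.a_id, b.b_data
    FROM   big_a a, big_b b
    WHERE  b.a_id = a.a_id;
  TYPE row_tab IS TABLE OF c%ROWTYPE;
  rows row_tab;
BEGIN
  OPEN c;
  LOOP
    -- Array fetch: 100 rows per round trip instead of one at a time.
    FETCH c BULK COLLECT INTO rows LIMIT 100;
    EXIT WHEN rows.COUNT = 0;
    FOR i IN 1 .. rows.COUNT LOOP
      NULL;  -- process each row here
    END LOOP;
    -- Note the boundary condition: a given 'A' row's set of 'B' rows
    -- may straddle two batches, which is the extra coding messiness
    -- the quoted advice warns about.
  END LOOP;
  CLOSE c;
END;
/
```

The batch size (a few dozen to a few hundred) trades memory per fetch against the number of round trips, which is the efficiency gain the advice refers to.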