performance problem when use parallel and hash join on large tables [message #255427] |
Tue, 31 July 2007 17:57 |
fbucco
Messages: 3 Registered: July 2007
|
Junior Member |
|
|
I´m trying to create a table using parallel and hash join. The first one has 27Gb and the second has 300Gb. I see at v$session_longops that oracle has already read both table and started to join them using hash join, but the expected time to finished this hash is 138 hours and it has only used 1:30 hours to read both tables. Is there anything that I can do to improve this step?
create table tmp
tablespace ncnologging
parallel 10
as SELECT /*+ use_hash(b,a) */ a.*, b.data_servico, b.id_subparticao
FROM a, b
WHERE a.if_ident_ano = b.ident_ano
AND a.if_ident_sequencial_remessa = b.ident_sequencial_remessa
AND a.if_ident_gerador = b.ident_gerador
AND a.if_ident_sequencial_registro = b.ident_sequencial_registro
Regards.
|
|
|
|
Re: performance problem when use parallel and hash join on large tables [message #255442 is a reply to message #255427] |
Tue, 31 July 2007 23:00 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
It's worth finding out whether the bottleneck is in the SELECT or the CREATE.
Try the following:
SELECT *
FROM (
SELECT /*+ use_hash(b,a) PARALLEL(a) PARALLEL(b)*/ a.*, b.data_servico, b.id_subparticao
FROM a, b
WHERE a.if_ident_ano = b.ident_ano
AND a.if_ident_sequencial_remessa = b.ident_sequencial_remessa
AND a.if_ident_gerador = b.ident_gerador
AND a.if_ident_sequencial_registro = b.ident_sequencial_registro
)
WHERE ROWNUM > 1
It will run the SELECT and get all of the data from disk, but will return no rows. It will give you a benchmark for the SELECT. If that is quick, then writing the data must be the slow bit.
Also, if both tables are very large, the Hash Join may require multiple passes and use heaps of temp space. There's not much you can do about that other than increase the hash area, or you could hash partition the tables (discussed here). But benchmark the SQL first (above), otherwise you might be wasting your time.
Ross Leishman
|
|
|
|
|
Re: performance problem when use parallel and hash join on large tables [message #255768 is a reply to message #255688] |
Wed, 01 August 2007 13:08 |
fbucco
Messages: 3 Registered: July 2007
|
Junior Member |
|
|
Let me try to explain what I have and I need.
I have two tables, A with 27Gb and B with 300Gb. The B table will be partitioned by one column and subpartitioned by another column and both columns are new. Some rows from B table will be discarded. The rows that I have to migrate come from a join between A and B. The join that I need to get these rows is:
[
select a.*, b.data_servico, b.id_subparticao
FROM a, b
WHERE a.if_ident_ano = b.ident_ano
AND a.if_ident_sequencial_remessa = b.ident_sequencial_remessa
AND a.if_ident_gerador = b.ident_gerador
AND a.if_ident_sequencial_registro = b.ident_sequencial_registro
]
There is an index on B table with these columns: (if_ident_ano, if_ident_sequencial_remessa , if_ident_gerador, if_ident_sequencial_registro).
So, I have some options to execute it:
1) create table as select without any parallel
2) create table as select using parallel and hash join
3) create table as select using parallel and nested loop
4) use copy command
I thought that the second one will be the best one, but I´m having this problem with the hash join.
Any sugestions are welcome.
Regards.
|
|
|