Hash Joins and Memory Requirements [message #389589] Mon, 02 March 2009 18:25
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
I have tables like the following:
CREATE TABLE exp_item
(
  file_group_id NUMBER,
  batch_num     NUMBER,
  item_id       NUMBER,
  type1         NUMBER,
  type2         NUMBER,
  type3         NUMBER,
  type4         NUMBER
)

CREATE TABLE revision
(
  job_id          NUMBER,
  insert_ts       TIMESTAMP,
  parent_id       NUMBER,
  item_id         NUMBER,
  division_num    NUMBER,
  group_num       NUMBER,
  revision_num    NUMBER,
  header_stg_id   NUMBER,
  revision_status RAW(4),
  revision_ts     TIMESTAMP,
  group_end_ts    TIMESTAMP,
  revision_factor NUMBER,
  program_id      NUMBER,
  is_time_synch   NUMBER(1),
  is_exp          NUMBER(1),
  status_flags    RAW(4),
  is_artificial   NUMBER
)

CREATE TABLE revision_detail
(
  revision_ts   TIMESTAMP,
  item_id       NUMBER,
  division_num  NUMBER,
  group_num     NUMBER,
  revision_num  NUMBER,
  detail_num    NUMBER,
  detail_status NUMBER,
  value         NUMBER,
  value_alt     NUMBER,
  last          NUMBER,
  last_alt      NUMBER,
  uom_id        NUMBER,
  abs_value     NUMBER,
  abs_value_alt NUMBER
)

CREATE TABLE program_detail
(
  program_id NUMBER,
  detail_num       NUMBER,
  sf               NUMBER,
  group_end_sf     NUMBER,
  type_id          NUMBER,
  type_sf          NUMBER,
  abc_id           NUMBER,
  scalar           NUMBER,
  divisor          NUMBER,
  multiplier       NUMBER,
  ratio_a          NUMBER,
  ratio_b          NUMBER,
  detail_type      NUMBER,
  source_select    NUMBER,
  abc_name         VARCHAR2(60),
  multiplier_ud    NUMBER,
  multiplier_ud_sf NUMBER,
  insert_ts        TIMESTAMP(6) DEFAULT SYSDATE
)
I have a query like the following:
INSERT INTO tmp_exp
SELECT p_file_group_id
      ,ed.batch_num
      ,ed.item_id
      ,li.revision_ts
      ,lic.detail_num
      ,li.job_id
      ,lic.value
      ,lic.value_alt
      ,li.revision_status
      ,lic.detail_status
      ,li.insert_ts
      ,li.revision_factor
      ,lic.last
      ,lic.last_alt
      ,li.revision_num
      ,li.group_num
      ,0 is_adjusted
      ,li.status_flags
      ,li.program_id
      ,mpc.exp_detail_str
FROM exp_item ed
JOIN revision li ON li.item_id = ed.item_id
               AND li.revision_ts > v_exp_period_start
               AND li.revision_ts <= v_exp_period_end
JOIN revision_detail lic ON lic.item_id = li.item_id
                        AND lic.division_num = li.division_num
                        AND lic.group_num = li.group_num
                        AND lic.revision_num = li.revision_num
                        AND lic.revision_ts > v_exp_period_start
                        AND lic.revision_ts <= v_exp_period_end
JOIN program_detail mpc ON mpc.program_id = li.program_id
                                     AND mpc.detail_num = lic.detail_num
                                     AND (mpc.is_expt_detail = 1 OR p_export_detail = 1)
WHERE ed.file_group_id = p_file_group_id;
Nothing fancy here. My problem is the number of rows returned. It is quite possible that exp_item returns 10 million records, revision returns 24 records for each exp_item row, and revision_detail returns 4 records for each revision, for a grand total of around one billion records. When the CBO uses nested loops for exp_item, revision, and revision_detail, the query never completes; at least, we have never waited long enough for it to finish.

Revision and revision_detail are both partitioned and indexed on revision_ts, so it seems quite reasonable to me that hash joining all four tables would be much quicker. Our DBA suggests, however, that if the query returns too many rows, then we will use up all the memory and kill the DB. So my questions are: 1) is that true, and 2) how much memory will this query require, given the sizes here?
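For reference, a hedged way to check how much work-area memory the instance is configured with and using, assuming automatic PGA management (the v$pgastat statistic names below may vary by version):

-- sketch: how much PGA the instance is targeting and actually using
SELECT name, value, unit
FROM   v$pgastat
WHERE  name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'total PGA used for auto workareas',
                'over allocation count');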

Question 2 is really a question about what Oracle stores when doing hash joins. Does it, for example: 1) read from exp_item, hash the item_ids and pair them with the respective rowids; then 2) read from revision, hash the item_id with rowid and hash the item_id/division_num/group_num/revision_num with rowid; then 3) read from revision_detail and hash the item_id/division_num/group_num/revision_num with rowid? When it is done with all that, it could join by rowid for the first batch of records. If that is the case, then the memory required would just be (size_of_hash_value + size_of_rowid) * number_of_hash_records. Assuming the hash and rowid are 10 bytes apiece, that would be about 20 GB for the hash of revision_detail. Or does it have to place the entire row for each record it retrieves into memory? That seems like it would be quite a bit more costly. If there is any documentation online that discusses this, pointing me to it would suffice. Thanks in advance for any input.
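One way to observe what a running hash join actually consumes, rather than estimating it, is to watch the active work areas; a hedged sketch assuming 10g and the v$sql_workarea_active view:

-- sketch: memory and TEMP actually used by active hash-join work areas
SELECT sid,
       operation_type,                      -- e.g. HASH-JOIN
       actual_mem_used/1024/1024 AS mem_mb,
       tempseg_size/1024/1024    AS temp_mb,
       number_passes                        -- 0 = optimal, 1 = one-pass, >1 = multi-pass
FROM   v$sql_workarea_active
WHERE  operation_type LIKE 'HASH%';

Work areas that exceed their memory allowance spill to TEMP rather than growing without bound, which shows up in the tempseg_size column.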
Re: Hash Joins and Memory Requirements [message #389646 is a reply to message #389589] Tue, 03 March 2009 00:38
alexzeng
Messages: 133
Registered: August 2005
Location: alexzeng.wordpress.com
Senior Member
Hi,

As your SQL will generate about one billion rows (1,000,000,000), my humble opinion is to use temp tables and do it step by step. Some SQL statements FYI:

create table tmp_revision as
select li.item_id, li.division_num, li.group_num, li.revision_num, li.revision_ts,
       li.job_id, li.insert_ts, li.revision_factor, li.revision_status, li.status_flags, li.program_id
from revision li
where li.revision_ts > v_exp_period_start AND li.revision_ts <= v_exp_period_end;
-- as it will return many millions of rows, make sure it uses a full table scan instead of an index

create table tmp1 as
select ed.batch_num, li.*
from tmp_revision li, exp_item ed
where li.item_id = ed.item_id
and ed.file_group_id = p_file_group_id;
-- a unique index on exp_item(item_id) would help here

create table tmp2 as
select ... from tmp1 JOIN program_detail ...
-- a unique index on program_detail(program_id) would help here

insert into tmp_exp
select ... from tmp2 join revision_detail ...
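For illustration, a hedged sketch of what those last two steps might look like spelled out with the columns from the original query (tmp_revision/tmp1 as created above; the mpc filter columns are assumed from the original join):

create table tmp2 as
select t1.*, mpc.detail_num, mpc.exp_detail_str
from   tmp1 t1
join   program_detail mpc on mpc.program_id = t1.program_id
where  (mpc.is_expt_detail = 1 or p_export_detail = 1);

insert into tmp_exp
select p_file_group_id, t2.batch_num, t2.item_id, t2.revision_ts, t2.detail_num,
       t2.job_id, lic.value, lic.value_alt, t2.revision_status, lic.detail_status,
       t2.insert_ts, t2.revision_factor, lic.last, lic.last_alt, t2.revision_num,
       t2.group_num, 0 is_adjusted, t2.status_flags, t2.program_id, t2.exp_detail_str
from   tmp2 t2
join   revision_detail lic on  lic.item_id      = t2.item_id
                           and lic.division_num = t2.division_num
                           and lic.group_num    = t2.group_num
                           and lic.revision_num = t2.revision_num
                           and lic.detail_num   = t2.detail_num
                           and lic.revision_ts  > v_exp_period_start
                           and lic.revision_ts <= v_exp_period_end;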

Regards,
Alex
Re: Hash Joins and Memory Requirements [message #389696 is a reply to message #389589] Tue, 03 March 2009 04:36
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
How many rows are you expecting the query to return after the date ranges and other restrictions are taken into consideration?

I would be surprised if splitting it into sections and performing DDL turned out to be quicker - it does happen occasionally, but very rarely.
Re: Hash Joins and Memory Requirements [message #389800 is a reply to message #389696] Tue, 03 March 2009 11:05
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
The numbers I posted are with the date ranges. Without the date ranges, and depending upon the retention scheme we implement, the query could return 45 to 60 billion records, maybe more. For every record in exp_item, we are getting between 24 and 96 records each day.
Re: Hash Joins and Memory Requirements [message #389857 is a reply to message #389589] Tue, 03 March 2009 16:45
coleing
Messages: 213
Registered: February 2008
Senior Member
1 billion rows is a lot of rows to hash join in a single query.

Your DBA would need a large temp area to spill this into.

Alexzeng has the right idea for a query of this size.

Manageable chunks are the key.

The last thing you need is for the query to fail at the last moment because it ran out of space after 30 hours of processing or something.

Use additional speed techniques like parallel query (if resources permit), NOLOGGING on the DDL objects, and APPEND hints for the inserts.

If you can use parallel query, then you can also use parallel DML. Partition your target tables so that portions of the data are spread across as many different disk groups as you can; this will give you faster write speed.
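For instance, a hedged sketch of how those options might be combined for the final load (the degree of 8 and NOLOGGING on tmp_exp are assumptions; join predicates abbreviated to the leading columns):

-- sketch: parallel DML plus direct-path (APPEND) insert into the target
ALTER TABLE tmp_exp NOLOGGING;
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(tmp_exp, 8) */ INTO tmp_exp
SELECT /*+ PARALLEL(ed, 8) PARALLEL(li, 8) PARALLEL(lic, 8) PARALLEL(mpc, 8) */
       p_file_group_id, ed.batch_num, ed.item_id       -- remaining columns as in the original SELECT
FROM   exp_item ed
JOIN   revision li         ON li.item_id = ed.item_id  -- plus the date-range predicates
JOIN   revision_detail lic ON lic.item_id = li.item_id -- plus the remaining join keys
JOIN   program_detail mpc  ON mpc.program_id = li.program_id
                          AND mpc.detail_num = lic.detail_num
WHERE  ed.file_group_id = p_file_group_id;

COMMIT;   -- direct-path inserted rows are not visible to the session until commit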


Re: Hash Joins and Memory Requirements [message #389871 is a reply to message #389857] Tue, 03 March 2009 20:04
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
A hash join creates a hash table from the smaller row source and probes it row by row with the larger source, returning rows to the next step of the plan (the next step may itself be a hash join).

If you have enough memory to hash N-1 of the row-sources in your join, hash join is awesome because the possibly monolithic Nth table does not need to be stored in memory.

That is not the case here.

When any of the <N row sources are too large to fit into memory, they are internally partitioned and all but one partition are written to TEMP segments. The performance profile is kind of bi-linear: run-time increases proportionally with volume up to a certain threshold (dictated by memory) then it increases at a faster rate beyond the threshold.

Ultimately you will exceed your TEMP segment availability and it will crash.

Hash joins do NOT resolve two tables completely and then move on to the next. For each join, one of the row sources is chosen as the hash table, and Oracle tries to build all of the hash tables together. This means that memory and TEMP segments need to be shared across ALL of the joins, not just one at a time.

The divide and conquer method proposed above is - in my opinion - NEVER faster, but it can scale more when resources are limited (which is always the case - memory and TEMP are not infinite). So if your job is crashing with hash joins, you can take a performance hit and use GTTs to allow some higher volumes, or you can take a bigger performance hit and use permanent tables to handle greater volumes still.
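For example, a hedged sketch of the GTT variant (names are illustrative; ON COMMIT PRESERVE ROWS keeps the staged rows for the life of the session):

-- sketch: staging one intermediate result in a global temporary table
CREATE GLOBAL TEMPORARY TABLE gtt_revision_slice
ON COMMIT PRESERVE ROWS
AS
SELECT li.*
FROM   revision li
WHERE  1 = 0;                  -- create empty, load per run

INSERT INTO gtt_revision_slice
SELECT li.*
FROM   revision li
WHERE  li.revision_ts >  v_exp_period_start
AND    li.revision_ts <= v_exp_period_end;

A GTT's blocks live in the temporary tablespace and it generates less redo than a permanent table, which is roughly where the smaller of the two performance hits comes from.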

For true limitless scalability, you will need to restructure your tables and use the technique described in this article.

Ross Leishman
Re: Hash Joins and Memory Requirements [message #390047 is a reply to message #389871] Wed, 04 March 2009 13:41
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
Ross,

Thanks for the input. I think that, from a very high level, I understand your article. If we can create partitions/subpartitions that produce small enough chunks, then we could avoid problems with memory and TEMP space. For the problem I have posted, the revision and revision_detail tables are already partitioned by day on revision_ts. I am pretty sure I could, without much collateral damage, add hash subpartitions on item_id. I could also hash partition exp_item on item_id with no problem whatsoever. If I am interpreting the article correctly, that should "solve" this problem. Is my interpretation correct? If so, that leaves one more question: how do I figure out how many subpartitions I will need?
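For what it's worth, a hedged sketch of the kind of DDL I am describing (subpartition count, partition names, and the abbreviated column lists are illustrative only):

-- sketch: range partition by day, hash subpartition by item_id
CREATE TABLE revision_part
( item_id      NUMBER
, division_num NUMBER
, group_num    NUMBER
, revision_num NUMBER
, revision_ts  TIMESTAMP
  -- ...remaining REVISION columns
)
PARTITION BY RANGE (revision_ts)
SUBPARTITION BY HASH (item_id) SUBPARTITIONS 16
( PARTITION p_2009_03_01 VALUES LESS THAN (TIMESTAMP '2009-03-02 00:00:00')
, PARTITION p_2009_03_02 VALUES LESS THAN (TIMESTAMP '2009-03-03 00:00:00')
);

-- and exp_item hash partitioned on the same key with a matching count,
-- so a partition-wise join becomes possible
CREATE TABLE exp_item_part
( file_group_id NUMBER
, batch_num     NUMBER
, item_id       NUMBER
, type1         NUMBER
, type2         NUMBER
, type3         NUMBER
, type4         NUMBER
)
PARTITION BY HASH (item_id) PARTITIONS 16;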

If I have this straight, you are saying that the smaller table, in this case exp_item, is read into memory and hashed. So if I have x items, and the table is partitioned into y hash partitions, it would begin by hashing x/y rows. So I would assume the amount of memory used at that time would be (x/y) * k, where k is the bytes per row required for the hash. But is k an absolute constant, such as bytes_for_the_hash_key + bytes_for_the_row_id, or a contextual constant, such as bytes_for_the_hash_key + sum_of_the_bytes_for_all_the_columns_for_the_table_row?

Then it queries revision, hashes on item_id, and writes the matched rows to TEMP segments. Can I assume that only the columns that are actually in the SELECT or the WHERE clause are written to TEMP, or does it write the whole row from each table?

Then I assume that hash is released, and it reads from TEMP and creates a new hash on item_id/revision_ts. The final size of that hash will be the maximum memory I require for this query, right? I again have the problem of how to calculate how big that will be, as described above. It then reads from revision_detail, hashes each row and attempts to match them against the hash, and, I would hope, writes the matches directly to the tmp_exp table, meaning it does not add any more burden on TEMP at that point.

If all that is correct, then it would seem I just need to find out what k is, which columns are written to TEMP, how much TEMP space and memory I have, and I will know how many subpartitions I need. So any help with the first two would be greatly appreciated.

Thanks again,
Scott

Re: Hash Joins and Memory Requirements [message #390058 is a reply to message #390047] Wed, 04 March 2009 15:01
gintsp
Messages: 118
Registered: February 2007
Senior Member
It is not completely true that one cannot predict temp space.
At least in 10g it is doable - see the example below (the TempSpc column in the execution plan):

SQL> create table a as select * from dba_objects;

Table created.

SQL> insert into a select * from a;

49893 rows created.
...
SQL> /

798288 rows created.
SQL> explain plan for 
  2  select a1.* from a a1, a a2
  3  where a1.owner = a2.owner;

Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1585513397

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   193G|    34T|       |  1681K (99)| 05:36:21 |
|*  1 |  HASH JOIN         |      |   193G|    34T|    43M|  1681K (99)| 05:36:21 |
|   2 |   TABLE ACCESS FULL| A    |  1584K|    25M|       |  4870   (1)| 00:00:59 |
|   3 |   TABLE ACCESS FULL| A    |  1584K|   267M|       |  4902   (2)| 00:00:59 |
-----------------------------------------------------------------------------------

Of course the question remains how accurate it is - you can try it out yourself on smaller queries. It obviously depends on how accurately Oracle predicts the resultant cardinalities.
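To check how close the estimate comes after the fact, the plan can be re-displayed with runtime statistics; a hedged sketch assuming 10g, the gather_plan_statistics hint, and dbms_xplan.display_cursor (actual usage shows up in the Used-Mem and Used-Tmp columns):

SQL> select /*+ gather_plan_statistics */ count(*)
  2  from a a1, a a2
  3  where a1.owner = a2.owner;

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));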
Re: Hash Joins and Memory Requirements [message #390080 is a reply to message #390058] Wed, 04 March 2009 21:11
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Scott, you may be making some mistaken assumptions about how hash joins work. They (generally?) do not store the result of one join before proceeding to the next. All joins happen at once. This is made more complicated by internal partitioning when memory is exceeded, but it is still true.

By default, the resultant row set of every hash join is the second row set of the subsequent join. In this way, the result-set is always used to probe the next (pre-built) hash table, and result-sets never need to be stored in memory. If the result-set of a hash join were the FIRST row-set of the next hash join, then it would have to be resolved fully before the next join could start. This could probably be forced using inline views, but I don't recall ever seeing it as default behaviour.

Consider your case: EXP_ITEM and REVISION are the smallest row sets. A plan that hashed these two and then probed with REVISION_DETAIL would look like this:
HASH JOIN
  FULL SCAN EXP_ITEM
  HASH JOIN
    FULL SCAN REVISION
    FULL SCAN REVISION_DETAIL

This is the perfect plan in an unlimited-memory scenario, because we don't store result-sets that are of an order-of-magnitude of the largest table.

When memory is exceeded, the two hash tables (REVISION and EXP_ITEM) are internally partitioned; one partition for each is kept in memory and the others written to TEMP. REVISION_DETAIL is then processed, rows matching the in-memory partition of REVISION are joined and passed to the next join, non-matching rows are partitioned and written to TEMP. After the first pass, partitions are then loaded one-by-one and joined.
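As an aside, whether work areas on an instance are mostly completing in memory or spilling can be checked afterwards; a hedged sketch assuming the 10g v$sql_workarea_histogram view:

-- sketch: how often work areas complete in memory vs. spill to TEMP
SELECT low_optimal_size/1024      AS low_kb,
       (high_optimal_size+1)/1024 AS high_kb,
       optimal_executions,
       onepass_executions,
       multipasses_executions
FROM   v$sql_workarea_histogram
WHERE  total_executions > 0
ORDER  BY low_optimal_size;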

When you use physical hash partitioning to make the join scalable, you are trying to COMPLETELY avoid use of TEMP.

That means memory must be large enough to accommodate the entire hash table for one hash partition of each table in the join except the largest. Of course, the hash partition only stores the columns required to satisfy the query.

Sizing the hash partitions is not an exact science because AVAILABLE memory is not a fixed commodity; depending on what else is running on the database, you may not have as much memory as you hope.
Also, you might size them perfectly for this query, but then if you add in another join or even another selected column, the extra space requirement may exceed available memory.

Oracle can help you decide. Look at the following plan:
  1  select /*+ORDERED USE_HASH(o c m) cardinality(o 1000000) cardinality(c 1000000) cardinality(m 1000000)*/ *
  2  from tmt_obj o
  3  join tmt_col c on c.obj_id = o.obj_id
  4* join tmt_col_map m on m.col_id = c.col_id

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |  1701G|   587T|       |    93M (20)|313:18:26 |
|*  1 |  HASH JOIN          |             |  1701G|   587T|    38M|    93M (20)|313:18:26 |
|   2 |   TABLE ACCESS FULL | TMT_COL_MAP |  1000K|    26M|       |    28  (47)| 00:00:01 |
|*  3 |   HASH JOIN         |             |  4385M|  1437G|   226M| 65242  (73)| 00:13:03 |
|   4 |    TABLE ACCESS FULL| TMT_OBJ     |  1000K|   214M|       |    75  (80)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| TMT_COL     |  1000K|   121M|       |    39  (62)| 00:00:01 |
-------------------------------------------------------------------------------------------

Note the TempSpc column tells you that memory is deemed to be insufficient.

As a reasonable start, you might consider partitioning such that around 1M rows are included in each join. Note that if you are sub-partitioning, you should consider the TOTAL of all partitions for a single sub-partition. eg. If you have 10 partitions, each with 3 sub-partitions of 300,000 rows, then the partition-wise join will process 3x300,000 rows at a time.
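To get a rough figure for that sizing, one hedged approach is to measure only the columns the join actually needs for one day's partition and divide by a candidate subpartition count (the partition name and the count of 16 below are illustrative):

-- sketch: rows and approximate bytes per planned hash subpartition of REVISION
SELECT day_rows,
       day_bytes,
       ROUND(day_bytes / 16) AS approx_bytes_per_subpartition    -- 16 is illustrative
FROM  (SELECT COUNT(*) AS day_rows,
              SUM( NVL(VSIZE(item_id),0)
                 + NVL(VSIZE(division_num),0)
                 + NVL(VSIZE(group_num),0)
                 + NVL(VSIZE(revision_num),0)
                 + NVL(VSIZE(revision_ts),0)
                 + NVL(VSIZE(program_id),0) ) AS day_bytes
       FROM   revision PARTITION (p_20090301));                  -- partition name is illustrative

There is per-row overhead in the hash table on top of the raw column bytes, so treat the result as a lower bound.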

Ross Leishman
Re: Hash Joins and Memory Requirements [message #390307 is a reply to message #390080] Thu, 05 March 2009 16:59
scottwmackey
Messages: 515
Registered: March 2005
Senior Member
Ross,

Thanks ever so much for the detailed explanation. It all makes sense to me except for your last statement. If it processes all the subpartitions in the same iteration, it doesn't seem like there is any benefit of subpartitioning as it relates to this problem. What am I missing?

Scott
Re: Hash Joins and Memory Requirements [message #390311 is a reply to message #390307] Thu, 05 March 2009 19:18
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I'm talking about a sub-partitioned table. Say you have composite partitioning on a date range and an ID hash.
P1: January
  S1: 1M Rows
  S2: 1M Rows
  S3: 1M Rows
P2: February
  S1: 500K Rows
  S2: 500K Rows
  S3: 500K Rows

Then you partition-wise join to a table hash partitioned on ID.
P1: 2M rows
P2: 2M rows
P3: 2M rows

When you do the partition-wise hash join, the composite table is the smaller row-source, so it will be used to build the hash table. But it will iterate three times, with a hash table of 1.5M rows (1M + 500K) on each iteration. It will not iterate six times with alternating hash tables of 1M and 500K rows.
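In DDL terms, the example corresponds to something like the following hedged sketch (table names, columns and counts are illustrative; the point is the matching hash key and bucket count):

-- sketch: composite range-hash table joined partition-wise to a table
-- hash partitioned on the same key with the same number of buckets
CREATE TABLE fact_a
( id NUMBER, dt DATE, payload VARCHAR2(30) )
PARTITION BY RANGE (dt)
SUBPARTITION BY HASH (id) SUBPARTITIONS 3
( PARTITION p_jan VALUES LESS THAN (DATE '2009-02-01')
, PARTITION p_feb VALUES LESS THAN (DATE '2009-03-01')
);

CREATE TABLE fact_b
( id NUMBER, payload VARCHAR2(30) )
PARTITION BY HASH (id) PARTITIONS 3;

-- a full partition-wise join processes one hash bucket at a time:
-- all January and February rows of bucket S1 against bucket P1 of FACT_B, and so on
SELECT /*+ USE_HASH(a b) */ COUNT(*)
FROM   fact_a a JOIN fact_b b ON b.id = a.id;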

Ross Leishman
Re: Hash Joins and Memory Requirements [message #390357 is a reply to message #389871] Fri, 06 March 2009 01:40
alexzeng
Messages: 133
Registered: August 2005
Location: alexzeng.wordpress.com
Senior Member
I don't mean that the divide and conquer method will be fast. I mean it will give you the result in a reasonable time. That is the baseline. On that baseline, we can do some optimization, such as parallel, nologging, partitioning, etc.

With the other approach done directly, it may not even produce a result. Memory will surely not be enough for the whole hash; even the temporary tablespace may not be enough.
