Home » RDBMS Server » Performance Tuning » Hash cluster with small (tiny) rows ? (11g)
Hash cluster with small (tiny) rows ? [message #382952] |
Mon, 26 January 2009 11:34 |
chris3110
Messages: 3 Registered: January 2009
|
Junior Member |
|
|
Hi there,
I have a large table (25M customers, indexed by phone number) that I need to query by PK a lot of times during some parallel computations. I thought a nice way to do that would be to use a hash cluster storage, however to populate the table from the raw data would take a long time because hash clusters cannot be populated in parallel. Currently the table is populated through parallel direct load sqlldr and it is very fast.
Also since basically I only need to retrieve one integer ID for each client for the computations, I thought I could possibly create the cluster table as a CREATE AS SELECT TABLE, extracting only the minimum amount of data necessary, i.e., the PK and the ID. In fact the ID requires a small join to be retrieved, so this would have the additional advantage of precomputing this join.
Now the problem is Oracle refuses to create a hash cluster for such tiny rows because it leads to more clusters in an 8k data block than the maximum permissible (256). So now I'm stuck with this issue: I would definitely like to use a hash cluster table but without transfering the whole table data to the cluster, only a single integer per row. Is there a solution to this problem ? What I'm really trying to do is to store a hash map into a table.
Any advice here ?
Thanks for your help
Chris
|
|
|
|
Re: Hash cluster with small (tiny) rows ? [message #383157 is a reply to message #382954] |
Tue, 27 January 2009 07:38 |
chris3110
Messages: 3 Registered: January 2009
|
Junior Member |
|
|
Hi Swan,
thanks for the hint. In fact I saw the long AskTom thread but didn't find an answer to my question there (although it's difficult to be sure, the thread is really long). So in order to be a bit more clear, this is what I did up to now.
First the DB version: Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
I created a sample standard table to get an idea of the average row length (result: 15 bytes)
CREATE TABLE test1 AS
SELECT CAST(substr(cst.msisdn, 4) AS char(8)) suffix, cst.id profile
FROM customers cst
WHERE cst.msisdn LIKE '123________'
AND ROWNUM < 10000;
Then I tried to create the hash cluster as per the documentation but got the error message I mentioned:
CREATE CLUSTER test2 (suffix char(8))
SIZE 16
SINGLE TABLE
HASHKEYS 26000000
TABLESPACE TICKETS;
CREATE CLUSTER test2 (suffix char(8))
*
ERROR at line 1:
ORA-02454: Number of hash keys per block (505) exceeds maximum of 256
But I think I may have found the solution to this problem: simply use larger cluster size, which means that I won't have a one-to-one record/hashkey relationship, but is it a problem ?
CREATE CLUSTER profile_cluster (suffix char(8))
SIZE 32
SINGLE TABLE
HASHKEYS 13000000
TABLESPACE TICKETS
PARALLEL 6;
Now my table is loading, it takes quite some time because there is little parallelism here:
INSERT INTO cust_profile_map
SELECT /*+ FULL(cst) PARALLEL(cst, 6) */
CAST(substr(cst.msisdn, 4) AS char(8)), cst.id
FROM customers cst
WHERE cst.msisdn LIKE '123________';
By the way, if I understand it correctly Oracle will apply on a given key a hash function which result is statistically uniformly distributed over the result space, i.e., in this case hopefully 2 hits per cluster. But this means that statistically there will be cases where more than 2 rows will contend for the same cluster. Then I guess a collision occurs and a new cluster needs to be allocated to the key, which means 2 I/O instead of one, reduced performance, etc. Is there a way to know about how many collisions have occurred during the table population ? Should I take some measure to try to avoid collisions ? The documentation seems to advise nothing particular when the number of clusters per block is large.
Thanks for your help,
Chris
|
|
|
Re: Hash cluster with small (tiny) rows ? [message #383238 is a reply to message #383157] |
Tue, 27 January 2009 20:35 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I have done much the same thing in the past.
I started off with a 1:1 ratio of values to hashkeys, but since you don't get a perfect distribution, you get a lot of collisions. Since the blocks are full, the colliding row(s) get chained into other blocks.
Since Oracle cannot read any LESS than one block, there is (theoretically) no harm in creating fewer hash keys - so long as collisions do not result in chaining.
At 16 Bytes per key, you can fit more than 500 into an 8KB block. Lets say you leave a little room for growth and for collisions and plan only to put 450 keys in each block. With 26M rows, you will need 57778 (lets call it 58000) blocks. So set your SIZE to 8K and your HASHKEYS to 58000.
Ross Leishman
|
|
|
Re: Hash cluster with small (tiny) rows ? [message #383330 is a reply to message #383238] |
Wed, 28 January 2009 05:49 |
chris3110
Messages: 3 Registered: January 2009
|
Junior Member |
|
|
Hi Ross,
thanks, your suggestion of using the largest possible cluster size in order to even out the non-uniformity of the keys distribution makes sense. However I have a concern regarding the best cluster size: it seems to me that since the data blocks are not wholly usable due to header overhead, 8k might not be the best possible size, or am I mislead here ?
Since in my case the overhead is about 140 bytes, I might as well use the next possible value for cluster size, i.e., 256 bytes (since the table will be used solely for reading, I intend to set PCTFREE to 0). So I would end up with 1625000 clusters of 256 bytes each, which allows to store about 26M rows, leaving some space for possible collisions. Does this make sense to you ?
Besides is there a way to get stats about the frequency of collisions in the cluster after loading ?
Thanks a lot for you help,
Chris
|
|
|
Re: Hash cluster with small (tiny) rows ? [message #383434 is a reply to message #383330] |
Wed, 28 January 2009 20:43 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I use the following SQL to measure collisions in a single table hash cluster
ACCEPT cluster_name PROMPT 'Cluster Name: '
COLUMN table_name NEW_VALUE table_name
SELECT table_name
FROM user_tables
WHERE cluster_name = upper('&cluster_name')
/
COLUMN db_block_size NEW_VALUE db_block_size
SELECT value AS db_block_size
FROM v$parameter
WHERE name = 'db_block_size'
/
COLUMN nblocks NEW_VALUE nblocks
SELECT round((&db_block_size / key_size) * hashkeys) AS nblocks
FROM user_clusters
WHERE cluster_name = upper('&cluster_name')
/
SELECT count(*) AS overflow_blocks
, sum(row_count) AS overflow_rows
FROM (
SELECT file#
, block#
, row_count
, row_number() OVER (ORDER BY file#, block#) AS block_seq
FROM (
SELECT dbms_rowid.rowid_relative_fno(ROWID) AS file#
, dbms_rowid.rowid_block_number(ROWID) AS block#
, count(*) row_count
FROM &table_name
GROUP BY
dbms_rowid.rowid_relative_fno(ROWID)
, dbms_rowid.rowid_block_number(ROWID)
)
)
WHERE block_seq > &nblocks
/
Your allowance for a header makes sense. I have never made such an allowance, yet I have still managed to get the utilisation I expect using a cluster size of exactly 1 block. I don't have a good explanation for this, although I allow for the possibility that I have been mistaken. You could experiment yourself and see what you see.
Ross Leishman
|
|
|
Goto Forum:
Current Time: Sun Jan 26 07:44:21 CST 2025
|