Horizontal vs Vertical data table design [message #433205] |
Mon, 30 November 2009 11:56 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
Hello,
I am at the design level of some huge data table and I am not sure which table design would be better :
- multiple columns (horizontal) and DDL
- multiple rows (vertical) without DDL
- other suggestions?
Please, assist.
Expected inserts: ~ 100 000 000 values (numbers) inserted per hour
Expected data: as in example
Expected selects: ~ 30 000 selects like in the example
Database release: Oracle 10.2G
Example source data:
key_column, val_1, val_20, ... , val_500
Note that not all the columns are present. We cannot predict which columns are inserted before we got the data. Data needs to be inserted in many separate bulk loads: ~1000 values per transaction (couple of rows for "horizontal" table).
Table designs on my mind:
CREATE TABLE horizontal
( key_column NUMBER
, val_1 NUMBER
, val_2 NUMBER
, ...
, val_n NUMBER
);
ALTER TABLE horizontal ADD PRIMARY KEY (key_column);
CREATE TYPE my_parent_type AS OBJECT
( dummy CHAR(1) ) NOT FINAL;
CREATE TYPE my_type UNDER my_parent_type
( val_1 NUMBER
, val_20 NUMBER
, val_500 NUMBER
);
CREATE TABLE ora_objects
( key_column NUMBER
, vals my_parent_type
);
ALTER TABLE ora_objects ADD PRIMARY KEY (key_column);
CREATE TABLE vertical
( key_column NUMBER
, value_type NUMBER
, val NUMBER
);
ALTER TABLE vertical ADD PRIMARY KEY (key_column, value_type);
...or nested tables (correct me if I am wrong but type should be created for them as well - I have no experience with nested tables).
Example queries:
SELECT
val_1
+ val_30
FROM horizontal
WHERE key_column IN (:1);
SELECT
TREAT(val AS my_type).val_1
+ TREAT(val AS my_type).val_30
FROM ora_objects
WHERE key_column IN (:1);
SELECT
MAX(DECODE(value_type,:1,val,NULL))
+ MAX(DECODE(value_type,:2,val,NULL))
FROM vertical
WHERE key_column IN (:3);
Different queries (like in example above but with formulas different than val_1+val_30) are executed every hour. Every formula would be executed exactly once against given set of values of key_column.
Cardinality of queries: hundreds of different formulas.
Total SQLs per hour: ~ hundreds * 30 000.
SQL output will be stored in the DB table.
My concerns:
- Queries performance - rows access and data pivoting require additional processing
- Database size - val_type would be repeated for "vertical" design
- Multiple NULLs in "horizontal" design - because we fill only some of the columns
- CREATE TABLE / DROP TABLE which would be executed for every hour - we don't need the data after formula calculation... but we need to remove them at some point
Thanks in advance
|
|
|
Re: Horizontal vs Vertical data table design [message #433811 is a reply to message #433205] |
Fri, 04 December 2009 06:19 |
DBA_SangramKeshari
Messages: 44 Registered: October 2009 Location: Mumbai
|
Member |
|
|
Hi Wakula,
I can help you in logically.
Horigental approach leads to big row size and may cause ROW CHAINING. Which means more than one database blocks will be read.
If you will face this kind of issue they you have to go bith bigger size of data blocks. go with this option if the average row size is less than 4kb and more than 1 kb
Vertical approach may cause disk i/o issue. but as per my thinking this will be the best if you fails in rowsize in horigental approach
|
|
|
Re: Horizontal vs Vertical data table design [message #433856 is a reply to message #433811] |
Fri, 04 December 2009 10:07 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
You only get row chaining when you insert/update a row that is longer than the database block size.
As we're looking at 500 numeric columns, that's 500*(21+1) = 11k, plus a little bit - so a 16k block size will ensure that this never happens.
You might get migration if the columns are updated and become larger than will fit in the block, but the OP doesn't mention whether the data gets updated at all or not.
If your queries are going to select a lot of values from the same row and operate on them, then I'd go with the horizontal model as it' will make the queries simpler.
|
|
|
|
Re: Horizontal vs Vertical data table design [message #434166 is a reply to message #434031] |
Mon, 07 December 2009 09:36 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
You may well only go for a 16K or 32K block size in those circumstances - I have neither the ability nor the desire to check.
I prefer to look at the individual situation and make a decision based on the requirements of the system in question. In this situation you'd need to look at the data distribution in some detail to make a definite decision about blocksize, but if a high percentage of the columns are populated in general, I'd definitely look at a 16k or 32k blocksize to avoid frequent row-chaining.
Even if the rest of the database were using 8k blocks, there's no reason why you can't create a tablespace for this table that uses 16k blocks.
|
|
|
Re: Horizontal vs Vertical data table design [message #434220 is a reply to message #433205] |
Mon, 07 December 2009 16:35 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
Some additional details about the data to be stored:- We have 3 types of data:
- 15 types of values - the number should not increase in the future. Usually only a subset of the values will be present.
- 200 types of values - the number will increase in the future. Usually all of the values are present.
- 600 types of values - the number will increase in the future. Usually all of the values are present.
- Values are usually numeric - I believe that the precision might be decreased from Oracle's default.
- Data are going to be stored on NFS inside multiple datafiles and possibly different dataspaces - correct me if I am wrong but the block size might vary.
- Queries would often use multiple columns - thus horizontal approach would simplify them. Pivoting the data require additional resource - again horizontal design is the winner. However we can hit the limit of ~1000 columns - then horizontal would be a great looser (re-design of the DB and code).
- I would expect that row chaining should not happen for "15 values" type and would happen for sure for the "600 values" type.
Today I have talked with some people who are far more experienced than me and the points were that:- Oracle's limit of 1000 columns is a real threat
- Fragmentation might occur but I should not wory about the performance that much
- Data pivoting is a real problem - but the vertical design seems to be optimal for "200 values" and "600 values" types
I will keep the thread updated if we make any final decision internally - to share the knowledge with the community. At the same time I am open to any other ideas/suggestions/comments.
|
|
|
Re: Horizontal vs Vertical data table design [message #434249 is a reply to message #434220] |
Tue, 08 December 2009 02:41 |
JRowbottom
Messages: 5933 Registered: June 2006 Location: Sunny North Yorkshire, ho...
|
Senior Member |
|
|
Row Chaining isn't something to dismiss - it effectively doubles the read cost for affected rows.
I'd still be tempted to go for the horizontal approach - you can work round the 1000 column limit by creating a second table who's primary key is an Id held on the primary table.
|
|
|