Subpartitions number [message #380843] |
Tue, 13 January 2009 09:48 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
Hi,
I was wondering whether there are any drawbacks of creating large number of subpartitions.
I have a partitioned table (partitioned by date thus it seems that I can only use hash subpartition template) and I wish to divide it into 64 or more partitions. Is there any negative impact of such subpartitioning?
|
|
|
|
Re: Subpartitions number [message #380865 is a reply to message #380843] |
Tue, 13 January 2009 11:15 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
The goal is to split the table into smaller chunks - without changing its name or partitioning schema (new partitions are added/dropped by some other code).
I have big table (average 20 000 000 rows per partition) partitioned by date column. That partition holds about 100 sets of data. Queries are only for a single set of data.
CREATE TABLE my_table
(col_1 DATE
,col_2 NUMBER -- FOREIGN KEY
,col_3 NUMBER
,target VARCHAR2(1024)
);
Partitioned by col_1.
Example (simplified) query:
SELECT something
FROM other_table
WHERE other_table.target IN
(
SELECT target
FROM my_table
WHERE col_1 = to_date('2009-01-13','YYYY-MM-DD')
AND col_2 = :variable_1
AND col_3 > :variable_2
)
If I could - I would create separate tables for every col_2 value. Right now I seek how to split that existing table into smaller chunks (using col_2) without changes in the schema (there is some code that rely on that table).
I am not sure about the hard drives - probably come RAID matrix.
Additional question:
The query is executed using DBMS_SQL.EXECUTE. We had to hard-code that to_date(...) into the query statement - full table scan was performed when it was specified as a variable. What could have been the reason?
I would be grateful for any hints/answers.
|
|
|
|
Re: Subpartitions number [message #380881 is a reply to message #380843] |
Tue, 13 January 2009 14:36 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
Michel,
Should I understand that I would get no benefit from dividing that table into subpartitions?
Maybe this is worth mentioning that the table is used only by my PL/SQL code and the some maintenance scripts that I have mentioned already (responsible for partitions adding/dropping).
|
|
|
|
Re: Subpartitions number [message #380920 is a reply to message #380908] |
Wed, 14 January 2009 01:31 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I disagree with Michel. If you have 20M rows per partiton and you typically perform SQLs that scan 0.5M - 10M rows, then further partitioning will help full partition scans.
First of all, are your current partitions DAILY, or are they some larger grain (monthly)? If they are bigger than daily and most of your queries are for a single day or a VERY small range of days, then make your partitions DAILY.
If you have partitioned to the grain of your queries (ie. daily queries and daily partitions) and you still have 20M rows per partition, then sub-partitions can help queries that include that column. Especially if you can reduce a 20M row scan down to say 1M rows.
If you have 100 distinct values that are EVENLY DISTRIBUTED, then I would suggest 5-10 hash sub-partitions.
If the values are SKEWED, you should use LIST based sub-partitions to get the most populous values in their own sub-partitions.
If the values are HEAVILY SKEWED (>90% with a single value) then there will be no benefit to sub-partitioning. You would be better to index the column; this will provide indexed access for uncommon values and full scans for the common value.
The one thing you have to watch out for when you create lots of partitions is LOCALLY PARTITIONED INDEXES.
If you have 100 partitions and 10 sub-partitions, that's 1000 index segments. If you perform a query that does NOT include the partition key, then Oracle will need to perform 1000 separate range scans.
This is not a problem with BITMAP indexes, nor is it a problem when you fetch thousands of rows (ie. long range scan or full scan). It is only problematic when you fetch a small number of rows without providing the partition key in the SQL.
Note that this is a NON-STANDARD implementation of sub-partitons. The best reasons for sub-partitioning are partition-wise joins and parallel full table scans. If you have NO NEED WHATSOEVER for these popular features, you will be OK. However if these are a factor for you, then they may be at odd with the partition-pruning approach that you are currently taking.
Ross Leishman
|
|
|
|
Re: Subpartitions number [message #380931 is a reply to message #380843] |
Wed, 14 January 2009 03:03 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
Data distribution:
The queries are almost evenly distributed. Some data sets contains 400'000 rows while some other 50'000. Since this is about 1% of the partition - I guess that it would qualify to "small numer of rows". Data are inserted daily - so the insertion order should match partitioning key order.
Partitioning:
Partitions are created on a monthly basis.
We have 3 types of queries:
1) Query that is executed daily on 1 day of data
2) Query that is executed weekly on 7 days of data
3) Query is executed usually on 28 days of data (usually that would target 2 partitions).
Dividing the data into 1 day partitions is tempting (especially for the first group of queries).
Once in a month we drop oldest partitions and add a new one for upcoming month. This is performed by a shell/sqlplus script executed from cron (yeah - I know that it is not the best approach).
Indexes:
Indexes on almost all tables ale locally partitioned. Queries are always performed using the partitioning key (the key is always a date, the query statement has always the date range embedded into it as a string instead of variable) and some extra columns. Subpartitions would be created on hash of the "extra columns" used in the query. Local indexes are created on date and other columns. Date should be always the first query after WHERE keyword (just in case since Oracle performs the query from left to right).
Server stats:
I am not sure about the stats but some time ago the server used only 1 to 3 CPU's while processing, up to 16 GB of RAM, some swap and main bottleneck was I/O. Not sure about the disk (it is mounted using NFS) - it might be a single disk but most probably this is a RAID with 4 or more drives. The bottleneck for my queries is I/O but the table is partitioned and has local indexes. Single (huge) query should fetch/aggregate up to 2M rows at once from different tables - it takes seconds to minutes to do that. Adding small subquery (to the table that I was asking about) to that long statement would add 50k-400k rows to fetch (without aggregation) and increase processing time to minutes/hours.
Comment:
Until now I was thinking that it is not possible to use range/index partitioning/subpartitioning with a subpartitioning template. Creating the subpartitioning manually could make the logic too complex. Could you confirm that it is possible to use range/index using subpartition template?
[Updated on: Wed, 14 January 2009 04:05] Report message to a moderator
|
|
|
Re: Subpartitions number [message #381205 is a reply to message #380922] |
Thu, 15 January 2009 14:46 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Michel Cadot wrote on Wed, 14 January 2009 18:45 | What benefit could you have if you only have 1 disk?
Just more contention.
What if the statement is not parallelizable?
What if the workload is 100%CPU yet?
Regards
Michel
|
I don't understand the importance of CPU and number of disks to Partition Pruning. If you can partition your data in such a way that your query scans less data, it will be faster. Right?
Suggest you try daily partitioning on your test system. Instead of creating 1 partition per month, create 28-31 of them.
You may find then that IO is reduced to an extent that you are happy to live with the few that are not significantly improved.
If you are going down the sub-partitioning path, I would go with List Partitioning on a single column since it seems your data is somewhat skewed. The syntax is pretty straightforward - just read the manual. Make sure you do lots of prototyping - this is a pretty serious step that you shouldn't be rushing into production.
Ross Leishman
|
|
|
|
Re: Subpartitions number [message #381279 is a reply to message #381260] |
Fri, 16 January 2009 02:21 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Michel Cadot wrote on Fri, 16 January 2009 17:53 | Quote: | I don't understand the importance of CPU and number of disks to Partition Pruning
|
OP was talking about hash (sub)partitioning, there is no partition pruning with hash partitioning just spreading IO (and CPU work if parallel query).
Regards
Michel
|
Oracle Data Warehousing Guide 10.2 | Oracle Database prunes partitions when you use range, LIKE, equality, and IN-list predicates on the range or list partitioning columns, and when you use equality and IN-list predicates on the hash partitioning columns.
On composite partitioned objects, Oracle can prune at both the range partition level and at the hash or list subpartition level using the relevant predicates.
|
|
|
|
|
|
Re: Subpartitions number [message #381628 is a reply to message #381304] |
Mon, 19 January 2009 00:39 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
When all else fails, try it out:
SQL> CREATE TABLE sptest (
2 a number
3 , b number
4 , c number
5 )
6 PARTITION BY RANGE (a)
7 SUBPARTITION BY LIST (b)
8 SUBPARTITION TEMPLATE (
9 SUBPARTITION S1 VALUES (1)
10 , SUBPARTITION S2 VALUES (2)
11 , SUBPARTITION DF VALUES (DEFAULT)
12 )
13 (
14 PARTITION VALUES LESS THAN (1)
15 , PARTITION VALUES LESS THAN (2)
16 )
17 /
Table created.
SQL>
SQL> ALTER TABLE sptest
2 ADD PARTITION VALUES LESS THAN (3)
3 /
Table altered.
The documentation is at http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7002.htm#i2146320
Ross Leishman
|
|
|
Re: Subpartitions number [message #388790 is a reply to message #380843] |
Thu, 26 February 2009 04:13 |
wakula
Messages: 150 Registered: February 2008 Location: Poland
|
Senior Member |
|
|
Hello again,
I have been doing some experiments with the subpartitions.
Although I don't have as much knowledge about that subject, as I would like to (my company is not going to "waste" money on trainings and they want cheap stuff instead - sadly this is the approach of most companies nowadays) - I will try to share my experiences with anyone who might read this thread.
When I have created multiple partitions/subpartitions (total about 365 partitions with 16 subpartitions in each partition) then my database went down. The cause was disk space usage - too many extents were reserved. The database had grown a lot (initial size was about 4GiB). NOTE: the partitions/subpartitions were empty. Used Oracle was 10.2.
Right now I'm going to use bitmap indexes instead (not sure what will be the disk space usage though.
My new DB structure is going to look like (the code bellow could contain some syntax misspelling):
CREATE TABLE a_lot_of_data (EventDate DATE, CompanyID INTEGER, more_columns SOMETHING) PARTITION BY RANGE (EventDate);
There is 1 partition created every month (I am not sure if the duration is not too long thoug).
Then indexes:
CREATE BITMAP INDEX a_lot_of_data_date ON a_lot_of_data(EventDate);
CREATE BITMAP INDEX a_lot_of_data_company ON a_lot_of_data(companyid);
And code hints to use:
SELECT /*+ hint INDEX_COMBINE(a_lot_of_data a_lot_of_data_date a_lot_of_data_company) */ more_columns FROM a_lot_of_data WHERE EventDate BETWEEN :1 AND :2 AND CompanyID=:3
I will update this thread once I have completed my work.
|
|
|