big table partitioning (Oracle 10gR2) [message #513535]
Tue, 28 June 2011 01:46
mr_wee (Junior Member)
Messages: 18, Registered: June 2011
Hi all,
I have a big table (about 50 million records now, growing by about 3 million records every month). The table size is growing fast, so I want to partition it, but I am having trouble selecting a partition key. This is my table:
logTable:
computer_id number,
ip varchar2,
port number,
datelog date
I have two types of query on this table, one based on "datelog" and another based on "port":
select * from logTable where computer_id =x and datelog between y and z;
select max(datelog) from logTable where computer_id =x and port=r;
The second query is executed about five times more often per day than the first. Now I need help choosing the partitioning type. Is it better to use list partitioning on the "port" column or range partitioning on the "datelog" column? And should I use local indexes on the partitions or a global index on the whole table?
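The two layouts I am considering look roughly like this (the boundary dates, port groupings, and varchar2 length are placeholders, not my real values):

-- Option A: range partition by datelog, one partition per month
create table logTable_a
( computer_id number, ip varchar2(30), port number, datelog date )
partition by range (datelog)
( partition p_jan2011 values less than (to_date('02/2011','MM/YYYY'))
, partition p_feb2011 values less than (to_date('03/2011','MM/YYYY'))
, partition p_rest    values less than (maxvalue)
);

-- Option B: list partition by port
create table logTable_b
( computer_id number, ip varchar2(30), port number, datelog date )
partition by list (port)
( partition p_web    values (80, 443)
, partition p_other  values (default)
);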
Regards,
wee
[Updated on: Tue, 28 June 2011 01:49]
Re: big table partitioning [message #513585 is a reply to message #513535]
Tue, 28 June 2011 05:53
S.Rajaram (Senior Member)
Messages: 1027, Registered: October 2006, Location: United Kingdom
Quote:
the table size is growing fast, so I want to partition it.
Can I assume that you want to partition it because the table is growing fast? Is that your main reason for partitioning this table, or is there anything else you want to share with us?
Thanks
Raj
Re: big table partitioning [message #513731 is a reply to message #513592]
Wed, 29 June 2011 03:31
S.Rajaram (Senior Member)
Messages: 1027, Registered: October 2006, Location: United Kingdom
Quote:
I want to partition the table to increase DB performance and to ease maintenance operations (export, import, purging historical data, ...), not just because the table size is growing fast. What I mean is that I should do the partitioning as soon as possible.
a) Are you experiencing any performance problems currently? If so, what are they?
b) Export, import: could you add a bit more clarity to it?
c) Purge historical data: how do you purge it? Is it purely based on a date criterion, like records older than <n> days, weeks or months?
Last but not least, partitioning is not a silver bullet. It is not like telling Oracle to turn on a fast switch; partitioning can make things slower as well.
Check this reply from Tom.
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:3579793200346543970
Why you want to do this is just as important as how you want to do it. I hope you will bear this in mind next time.
Regards
Raj
Re: big table partitioning [message #514158 is a reply to message #513731]
Fri, 01 July 2011 23:41
Kevin Meade (Senior Member)
Messages: 2103, Registered: December 1999, Location: Connecticut USA
I have taken a lot away from this thread, as I am not as experienced with partitioning as I would like to be. To that end, I would like to recap what I have understood and add some commentary of my own. This will help me, and maybe you too, Mr. Wee.
Previous commentary I understand and like:
1) The most important thing to understand about partitioning is that it is not an automatic fast switch. To be successful, you must make up your mind what the ONE MOST IMPORTANT reason is for attempting partitioning.
a) if your goal is archival of aged data, then you should consider range partitioning by date
b) if your goal is performance, then you should align your partitions with your queries. Hash partitioning is often a good method for this, both as a primary and as a secondary partitioning strategy.
2) Global indexes and local indexes behave differently from a performance perspective, and the wrong choice can cause slowdowns just as easily as the right choice can help. Maybe someone could recap the differences for us?
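In the meantime, a bare-bones sketch of the two flavors against a table like Mr. Wee's (index names and column choices here are just for illustration):

-- LOCAL: one index partition per table partition, equipartitioned with the
-- table; dropping a table partition drops its index partition along with it
create index logtable_lix on logtable (datelog, computer_id) local;

-- GLOBAL: one index structure spanning all partitions; dropping a table
-- partition leaves the index unusable unless the DDL says UPDATE GLOBAL INDEXES
create index logtable_gix on logtable (computer_id, port, datelog);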
Mr. Wee, it appears that your goals are not fully stated. You should declare what your primary reason for partitioning is (ARCHIVAL? or PERFORMANCE?). Assume you can't have both easy archival-based partitioning and fast performance-based partitioning, and tell us which of the two you want most.
That said, looking at your two queries, you might be in one of those rare situations where you can actually get both ease of archiving and better performance. I suggest you try the following strategy:
select * from logTable where computer_id =x and datelog between y and z;
select max(datelog) from logTable where computer_id =x and port=r;
1) range partition on datelog with hash subpartitions on computer_id
2) global index on (port, computer_id, datelog)
Notice that partitioning and indexing need not be considered alone. You can use them together.
The global index in #2 contains all the columns in your second query. This fact, plus the equality predicates and the max() function, means that you should be able to get an instant lookup of the one value you need from this index via the MAX/MIN OF INDEX access method when executing the second query. This is an old strategy and, like most old tried-and-true things, quite useful.
The partitioning scheme in #1 should allow you to archive old partitions easily, yet still restrict your first query to only the partitions it actually needs via partition pruning. Thus you won't be reading any data other than what you need.
I would say it seems obvious, but then I could be wrong. I am sure you will be doing lots of testing to prove that this works for you as you want before moving forward with it.
Good luck, Kevin
Normally I don't provide this much finished product in a thread, because I expect the OP to do most of the work. But I am getting as much out of doing this as you are, so since it helps me to do it, I feel it is OK to post it.
-- drop table logtable;
create table logtable
(
computer_id number not null
, ip varchar2(30) not null
, port number not null
, datelog date not null
)
PARTITION BY RANGE(datelog)
SUBPARTITION BY HASH(computer_id)
SUBPARTITION TEMPLATE(
SUBPARTITION sp1,
SUBPARTITION sp2,
SUBPARTITION sp3,
SUBPARTITION sp4)
(
PARTITION jan2000 VALUES LESS THAN (TO_DATE('02/2000','MM/YYYY'))
, PARTITION feb2000 VALUES LESS THAN (TO_DATE('03/2000','MM/YYYY'))
, PARTITION mar2000 VALUES LESS THAN (TO_DATE('04/2000','MM/YYYY'))
, PARTITION apr2000 VALUES LESS THAN (TO_DATE('05/2000','MM/YYYY'))
, PARTITION may2000 VALUES LESS THAN (TO_DATE('06/2000','MM/YYYY'))
, PARTITION future VALUES LESS THAN (MAXVALUE)
)
/
create index logtable_xk1 on logtable (port,computer_id,datelog)
/
set autotrace on
Notice how the plan shows partition intelligence: it goes after only those partitions it thinks it really needs. In this case I believe the plan is telling us it visited two of the six range partitions (Pstart=2, Pstop=3). I believe it is also telling us that from each of those partitions it read only one hash subpartition, but then again, as I said, I do not read the 11gR2 partition plans as well as I would like. Maybe someone else can comment.
SQL> select *
2 from logTable
3 where computer_id = 1
4 and datelog between TO_DATE('02/2000','MM/YYYY') and TO_DATE('03/2000','MM/YYYY')
5 /
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 4018555309
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ITERATOR| | 1 | 52 | 2 (0)| 00:00:01 | 2 | 3 |
| 2 | PARTITION HASH SINGLE | | 1 | 52 | 2 (0)| 00:00:01 | 4 | 4 |
|* 3 | TABLE ACCESS FULL | LOGTABLE | 1 | 52 | 2 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("COMPUTER_ID"=1 AND "DATELOG"<=TO_DATE(' 2000-03-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
538 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL>
Notice in this plan the access method used, and note the FIRST ROW step. They keep changing the names on these things, but in the end I believe this is how 11gR2 shows a one-row lookup using what we used to call MAX/MIN OF INDEX. This lookup is possible because the second query asks for the last row of a set of rows, and that set is directly addressable because the test conditions are equality tests on the index's leading columns. Thus you do one index probe and get the one row right away. No going back to the table, no scanning the index for lots of rows on each lookup.
SQL> select max(datelog) from logTable where computer_id = 1 and port = 2
2 /
MAX(DATEL
---------
Execution Plan
----------------------------------------------------------
Plan hash value: 1317331059
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
| 2 | FIRST ROW | | 1 | 35 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| LOGTABLE_XK1 | 1 | 35 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("PORT"=2 AND "COMPUTER_ID"=1)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
528 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
Since the table is range partitioned on datelog, as these partitions age out you can use partition maintenance to remove them easily and add new ones easily (a sketch follows below).
Since your first query uses datelog and computer_id (the partitioning columns), it will visit only the partitions required, making it about as fast as it can get.
Since your second query is a simple lookup, using the global index gives you very fast access to the one row you require.
In the end it looks like you just might get all that you want. But again, you should be doing lots of testing and talking this through in depth with other DBAs.
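As a sketch of that maintenance (partition names are from the example above; note that the global index must be maintained during the drop or it will be left unusable):

-- drop the aged-out month; UPDATE GLOBAL INDEXES keeps logtable_xk1 usable
alter table logtable drop partition jan2000 update global indexes;

-- open up the next month by splitting the catch-all FUTURE partition
alter table logtable split partition future
  at (TO_DATE('07/2000','MM/YYYY'))
  into (partition jun2000, partition future);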
[Updated on: Sat, 02 July 2011 00:05]
Re: big table partitioning [message #514175 is a reply to message #514158]
Sat, 02 July 2011 03:27
mr_wee (Junior Member)
Messages: 18, Registered: June 2011
Thanks all for the useful comments.
I think I should explain my goal clearly. My main goal is achieving better performance, and after that archiving data. I will try to answer what was ambiguous.
Quote: a) Are you experiencing any performance problems currently? If so, what are they?
Right now I have an index on (computer_id, port), and I had not had any problems until recently. In the last few days, though, ADDM has been showing a performance problem on this table, as follows:
Individual SQL statements responsible for significant user I/O wait were found.
SQL statement : select max(datelog) from logTable where computer_id =x and port=r;
Wait class "User I/O" was consuming significant database time.
So I decided to change my table structure to a partitioned table.
Quote: b) Export, import: could you add a bit more clarity to it?
This table holds 1.5 years of data. When it grows to 3 years of data, I can remove the first year from the table, which means deleting about 30 million rows; that is time-consuming. Also, before removing the data I transfer it to another database that keeps 10 years of history (using export before the removal and importing just the first year's data). So I would prefer to archive aged data in a way that speeds this process up.
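Once the table is range partitioned, the export side could target just the aged partitions rather than pulling a year of rows through a query; a rough sketch (directory, file, and partition names here are placeholders):

# logtable_exp.par : Data Pump parameter file
DIRECTORY = DATA_PUMP_DIR
DUMPFILE  = logtable_year1.dmp
LOGFILE   = logtable_year1.log
TABLES    = logtable:jan2010

expdp system/password parfile=logtable_exp.par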
Quote: c) Purge historical data: how do you purge it? Is it purely based on a date criterion, like records older than <n> days, weeks or months?
As I stated above, transferring data is based purely on a date criterion.
Quote: Since your second query is a simple lookup, using the global index gives you very fast access to the one row you require.
Dear Kevin, as I declared above, I already have a global index on this table, as you suggested; nevertheless, I have a performance problem. So it may not be a good solution.
Regards.
Wee
Re: big table partitioning [message #514425 is a reply to message #514175]
Mon, 04 July 2011 12:51
Kevin Meade (Senior Member)
Messages: 2103, Registered: December 1999, Location: Connecticut USA
Well Mr. Wee, there are no guarantees.
However, please note that the index I suggested is not the same as the one you are currently using.
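To make the difference concrete, here are the two side by side (the first is my reading of your current index; the names are illustrative):

-- current index: datelog is not in the index, so answering MAX(datelog)
-- still requires table visits for the matching (computer_id, port) entries
create index logtable_old on logtable (computer_id, port);

-- suggested index: datelog rides along in the index, so MAX(datelog) is
-- answered by a single MIN/MAX probe with no table access at all
create index logtable_xk1 on logtable (port, computer_id, datelog);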
Also note that the partitioning scheme I provided scans only the partitions you need.
Each of your two queries benefits from the solution I have suggested; it seems to me it should offer excellent performance for both.
However, if I were tuning this problem at my place of employment, I would be asking for lots of information which so far you have not provided. So let me suggest you post the following:
1) create table statement
2) create index statement
3) dbms_stats statement
4) runtime example from sqlplus with AUTOTRACE ON
5) rowcounts for each table
6) rowcounts after filters are applied to each table
With this information there is a chance to determine whether something is really amiss, or whether your expectations are simply too great.
Consider, for example, that you have provided two queries, but no example of running them with their associated stats. This makes it difficult to do much beyond offering generalized advice.
Still, I stand by my original posts and suggest that you try these ideas before shooting them down. Create the partitioning I suggested in a copy of the table, add the global index I suggested, and then show us the runtime details for your two queries.
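For item 3, something along these lines would do (owner and option values assumed; adjust to your own standards):

begin
  dbms_stats.gather_table_stats(
    ownname     => user,
    tabname     => 'LOGTABLE',
    granularity => 'ALL',   -- global plus partition/subpartition statistics
    cascade     => true);   -- gather index statistics as well
end;
/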
Good luck, Kevin
[Updated on: Mon, 04 July 2011 12:56]
Re: big table partitioning [message #517193 is a reply to message #514425]
Sun, 24 July 2011 03:49
mr_wee (Junior Member)
Messages: 18, Registered: June 2011
Hi all,
After much testing, considering various partitioning schemes with different global/local indexes, and using the tips and suggestions from everyone here, I was finally able to choose the best solution for my problem.
I used range partitioning with a global index on the (computer_id, port, datelog) columns. This works best and has the minimum cost and time.
Thanks all for the useful tips and solutions.
Good luck,
Wee