Home » RDBMS Server » Performance Tuning » big table partitioning (oracle 10gR2 )
big table partitioning [message #513535] Tue, 28 June 2011 01:46 Go to next message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Hi all,

I have a big table ( about 50 million records now + 3 million records in every month). so, table size is growing fast, I want to partition it. but I have a problem about selecting partition key. this is my table :

logTable:
 
computer_id     number,
ip              varchar2,
port            number,
datelog         date


I have two types of query on this table, one based on "datelog" another based on "port" :

select * from logTable where computer_id =x and datelog between y and z;
select max(datelog) from logTable where computer_id =x and port=r;


the second query is executed 5 times more than the first one in a day. Now I need help about choosing partitioning type. Is it better using partition by List on "port" column or partition by range on "datelog" column? should I use local index on partitions or global index on whole table?

Regards,
wee

[Updated on: Tue, 28 June 2011 01:49]

Report message to a moderator

Re: big table partitioning [message #513549 is a reply to message #513535] Tue, 28 June 2011 02:59 Go to previous messageGo to next message
Michel Cadot
Messages: 68732
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
Quote:
Is it better using partition by List on "port" column or partition by range on "datelog" column? should I use local index on partitions or global index on whole table?

Why don't you consider computer_id which is in both query?
What is its value domain?
Is there a correlation between computer_id and port?

Regards
Michel
Re: big table partitioning [message #513554 is a reply to message #513549] Tue, 28 June 2011 03:39 Go to previous messageGo to next message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Quote:
Why don't you consider computer_id which is in both query?
What is its value domain?

every computer in our domain has a computer_id which is a unique for that computer and we have about 400,000 computer_id. our goal is to log logins of each computer. these id's don't follow any regular pattern that I can use any partition type (ex. partition by range).

Quote:
Is there a correlation between computer_id and port?

No, port says that computer from which network connects.

Regards,
wee
Re: big table partitioning [message #513566 is a reply to message #513554] Tue, 28 June 2011 04:51 Go to previous messageGo to next message
Michel Cadot
Messages: 68732
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
You may investigate in a hash partitioning on computer_id and local index on datelog.

Regards
Michel
Re: big table partitioning [message #513585 is a reply to message #513535] Tue, 28 June 2011 05:53 Go to previous messageGo to next message
S.Rajaram
Messages: 1027
Registered: October 2006
Location: United Kingdom
Senior Member
Quote:

so, table size is growing fast, I want to partition it.

Can I assume that you want to partition it because the table is growing faster ? Is that your main reason behind partitioning this table or is there anything else you want to share with us ?

Thanks

Raj
Re: big table partitioning [message #513592 is a reply to message #513585] Tue, 28 June 2011 06:16 Go to previous messageGo to next message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Raj

I want to partition table because of increasing DB performance and maintenance operation (export,import, purge historical data,...) not just table size is growing fast. I want to say I should do partitioning as soon as possible.

Thanks
Re: big table partitioning [message #513595 is a reply to message #513592] Tue, 28 June 2011 06:24 Go to previous messageGo to next message
Michel Cadot
Messages: 68732
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
Quote:
purge historical data

This is a KEY point.
In this case, you have no choice, you MUST use date columns as partitioning if you want it to ease your historical data maintenance.

Regards
Michel
Re: big table partitioning [message #513598 is a reply to message #513592] Tue, 28 June 2011 06:27 Go to previous messageGo to next message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Michel

Thanks for your solution. how should I estimate the number of needed partitions for hash partitioning?

Regards,
Wee
Re: big table partitioning [message #513601 is a reply to message #513598] Tue, 28 June 2011 06:33 Go to previous messageGo to next message
Michel Cadot
Messages: 68732
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
Hash partitioning is ONLY for performances, if you want to use partitioning for historical data maintenance you have to use a range partitioning (on date column).

Regards
Michel
Re: big table partitioning [message #513607 is a reply to message #513598] Tue, 28 June 2011 06:52 Go to previous messageGo to next message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Michel

Quote:
you MUST use date columns as partitioning if you want it to ease your historical data maintenance


but imagine the second query ( query based on "port" ) run 2000 times on average in a day. as the number of records increase, wouldn't global index on "computer_id,port" decrease performance?
Re: big table partitioning [message #513613 is a reply to message #513607] Tue, 28 June 2011 06:57 Go to previous messageGo to next message
Michel Cadot
Messages: 68732
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
You have to choose your priority then you can choose the partitioning option.
As I said, for better performances of these queries, hash partition on computer_id, for ease of maintenance of historical data, range partition on date column.
Maybe you can choose a composite partition on date/computer_id but you have to benchmark it.

Regards
Michel
Re: big table partitioning [message #513731 is a reply to message #513592] Wed, 29 June 2011 03:31 Go to previous messageGo to next message
S.Rajaram
Messages: 1027
Registered: October 2006
Location: United Kingdom
Senior Member
Quote:

I want to partition table because of increasing DB performance and maintenance operation (export,import, purge historical data,...) not just table size is growing fast. I want to say I should do partitioning as soon as possible.

a) Are you experiencing any performance problem currently ? If so what are they ?
b) Export, import could you add bit more clarity to it.
c) Purge historical data, how do you purge it. Is it purely based on a date criteria like records older than <n> days, weeks or months ?

Last but not least, partitioning is not a silver bullet. It's not like telling oracle turn on the fast switch. Partitioning can make things to slower as well.

Check this reply from Tom.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:3579793200346543970

Why you want to do this is equally important as how you want to do that.

I hope you will bear this in mind from next time.

Regards

Raj
Re: big table partitioning [message #514158 is a reply to message #513731] Fri, 01 July 2011 23:41 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
I have taken a lot away from this thread as I am not as experienced with partitioning as I would like. To that end I would like to recap what I have understood and add some commentary of my own. This will help me and maybe you too Mr. Wee.

Previous Commentary I Understand and like:

1) the most important things to understand about partitioning is that is it not an automatic fast switch. To be successful you must make up your mind what the ONE MOST IMPORTANT reason is for attempting partitioning.

a) if your goal is archival of aged data then you should consider range partitioning by date
b) if your goal is performance then you should align your partitions according to your queries. Hash partitioning is often a good partitioning method for this both as a primary and secondary partitioning strategy.

2) global indexes and local indexes behave differently from a performance perspective and the wrong choices can cause performance slowdowns just as easily as good choices can help. Maybe someone could recap the differences for us?

Mr. Wee, it appears that your goals are not fully stated. You should declare what your primary reason for partitioning is (ARCHIVAL? or PERFORMANCE?). Assume you can't have both easy archival based partitioning and fast performance based partitioning and tell us which one of the the two you want most.

That said, looking at your two queries, you might be in one of those rare situatoins where you can actually get both ease of archiving and better performance. I suggest you try the following strategies:

select * from logTable where computer_id =x and datelog between y and z;
select max(datelog) from logTable where computer_id =x and port=r;

Quote:
1) range partition on datelog with hash partition on computer_id
2) global index on computer_id,port,datelog

Notice that partitioning and indexing need not be considered alone. You can use them together.

The global index in #2 contains all the columns in your second query. This fact plus the equi-joins and max() function means that you should be able to get instant lookups to the one value you need from this index via MAX/MIN OF INDEX access method when executing the second query. This is an old strategy and like most old tried and true things is quite useful.

The partitioning scheme in #1 should allow you to archive old partitions easily, yet still provide access to only those partitions your first query actually needs via partition pruning. Thus you won't be reading any data other than that which you need.

I would say it seems obvious but then I would be wrong. I am sure you will be doing lots of testing to prove that this works for you as you want before moving forward with it.

Good luck, Kevin

Normally I don't provide this much finished product in a thread because I expect the OP to do most of the work. But I am getting as much out of doing this work as you are so since it helps me to do it I feel it is OK for me to post it.

-- drop table logtable;

create table logtable
(
   computer_id number not null
 , ip varchar2(30) not null
 , port number not null
 , datelog date not null
)
PARTITION BY RANGE(datelog) 
SUBPARTITION BY HASH(computer_id)
SUBPARTITION TEMPLATE(
SUBPARTITION sp1,
SUBPARTITION sp2,
SUBPARTITION sp3,
SUBPARTITION sp4)
(
    PARTITION jan2000 VALUES LESS THAN (TO_DATE('02/2000','MM/YYYY'))
  , PARTITION feb2000 VALUES LESS THAN (TO_DATE('03/2000','MM/YYYY'))
  , PARTITION mar2000 VALUES LESS THAN (TO_DATE('04/2000','MM/YYYY'))
  , PARTITION apr2000 VALUES LESS THAN (TO_DATE('05/2000','MM/YYYY'))
  , PARTITION may2000 VALUES LESS THAN (TO_DATE('06/2000','MM/YYYY'))
  , PARTITION future  VALUES LESS THAN (MAXVALUE)
)
/

create index logtable_xk1 on logtable (port,computer_id,datelog)
/

set autotrace on


Notice how the plan shows partition intelligence. It is only going after those partitions it thinks it really needs. In this case I believe the plan is telling us it went after two of the six range partitions. I believe also it is telling us that from each of these partitions it is taking only one subpartition but then again as I said I do not read the 11gR2 partition plans as well as I would like. Maybe someone else can comment.

SQL> select *
  2  from logTable
  3  where computer_id = 1
  4  and datelog between TO_DATE('02/2000','MM/YYYY') and TO_DATE('03/2000','MM/YYYY')
  5  /

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 4018555309

-----------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |     1 |    52 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|          |     1 |    52 |     2   (0)| 00:00:01 |     2 |     3 |
|   2 |   PARTITION HASH SINGLE  |          |     1 |    52 |     2   (0)| 00:00:01 |     4 |     4 |
|*  3 |    TABLE ACCESS FULL     | LOGTABLE |     1 |    52 |     2   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("COMPUTER_ID"=1 AND "DATELOG"<=TO_DATE(' 2000-03-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        508  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL>


Notice in this plan the access method used. Also note the use of FIRST ROW. They keep changing the names on these things but in the end I believe this is how 11gR2 shows a one row lookup using what we used to call MAX/MIN OF INDEX. This lookup is possible because the second query is looking for the last row of a set of rows and the set of rows is directly addressable because the test conditions are equi tests and based on the index's front columns. Thus you do one index probe and get one row right away. No going back to the table, no scanning the index for lots of rows each time you look.

SQL> select max(datelog) from logTable where computer_id = 1 and port = 2
  2  /

MAX(DATEL
---------



Execution Plan
----------------------------------------------------------
Plan hash value: 1317331059

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |    35 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |              |     1 |    35 |            |          |
|   2 |   FIRST ROW                  |              |     1 |    35 |     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| LOGTABLE_XK1 |     1 |    35 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PORT"=2 AND "COMPUTER_ID"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        528  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>


Since the table is range partitioned on datelog, as these partitions age out, you can use partition maintenance to remove them easily and add new ones easily.

Since your first query uses datelog and computer_id (the partitioning columns), this query will visit only the partitions required thus making it about as fast as it can get.

Since your second query is a simple lookup, using the global index gives you very fast access to the one row you require.

In the end it looks like you must might get all that you want. But again you should be doing lots of testing and talking about this stuff in depth with other DBAs.

[Updated on: Sat, 02 July 2011 00:05]

Report message to a moderator

Re: big table partitioning [message #514175 is a reply to message #514158] Sat, 02 July 2011 03:27 Go to previous messageGo to next message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Thanks all for usefull comments.
I think I should explain my goal clearly. My main goal is achieving better performance and then archiveing data. I try to answer what was ambiguous for guys.

Quote:
a) Are you experiencing any performance problem currently ? If so what are they ?


Now, I have a index on columns(computer_id,port) and I haven't had any problems untill recently. In recent days, ADDM show me performance problem on this table as follows :

Individual SQL statements responsible for significant user I/O wait were 
found.
SQL statement : select max(datelog) from logTable where computer_id =x and port=r; 
Wait class "User I/O" was consuming significant database time.


So, I decide to change my table structure to partitioned table.

Quote:
b) Export, import could you add bit more clarity to it.


this table has data of 1.5 year and when it grows up to 3 years I can remove the first year data from table and this means removing about 30 millions rows from table. this is time consuming. Also, after removing data, I transfer data to another database (using export before removing data and importing just data of first year) that keep history of 10 years. So, I prefer to archive aged data to speed up this process.

Quote:
c) Purge historical data, how do you purge it. Is it purely based on a date criteria like records older than <n> days, weeks or months ?


As I stated above, transfering data is just based on date criteria.

Quote:
Since your second query is a simple lookup, using the global index gives you very fast access to the one row you require.


Dear Kevin, As I declared above I have global index on this table at this moment as you suggested, nevertheless I have performance problem. So, it may not be a good solution.

Regards.
Wee



Re: big table partitioning [message #514425 is a reply to message #514175] Mon, 04 July 2011 12:51 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Well Mr. Wee, there are no guarantees.

However, please note that the index I suggested is not the same as the one you are currently using.

Also note that the partitioning scheme I provided shows that you will only be scanning the partitions you need with this scheme.

Each of your two queries benefits from the solution I have suggested. Seems to me the solutions I have suggested should offer excellent performance for both queries.

However, if I were tuning this problem at my place of employement, I would be asking for lots of information which so far you have not provided. So let me suggest the following:

1) create table statement
2) create index statement
3) dbms_stats statement
4) runtime example from sqlplus with AUTOTRACE ON
5) rowcounts for each table
6) rowcounts after filters are applied to each table

With this information there is a chance to determine if there is something really amiss, or if your expectations may simply be too great.

Consider for example, that you have provided two queries, but you have not provided any exmaple of running them with their associated stats. This makes it difficult to do much but provide generalized adivce.

Still, I stand by my original posts and suggest that you try these ideas before shooting them down. Create the partitioning I suggested in a copy of the table, and add the global index I suggested. Then show us the runtime details for your two queries.

Good luck, Kevin

[Updated on: Mon, 04 July 2011 12:56]

Report message to a moderator

Re: big table partitioning [message #517193 is a reply to message #514425] Sun, 24 July 2011 03:49 Go to previous message
mr_wee
Messages: 18
Registered: June 2011
Junior Member
Hi all guys,

After much testing and considering various partitioning schemes with different global/local indexes and Using tips and suggestions from all friends here, finally I can choose the best solution for my problem.
I used the range partitioning with global index on (computer_id,port,datelog) columns. this works the best and has the minimum cost and time.

Thanks all for useful tips and solutions.

Good luck,
Wee
Previous Topic: delete statement taking forever
Next Topic: query optimization
Goto Forum:
  


Current Time: Sun Jan 26 13:24:50 CST 2025