Re: partitioning options for a system that will add 50 million rows/day

From: Job Miller <jobmiller_at_yahoo.com>
Date: Tue, 11 Sep 2007 05:33:02 -0700 (PDT)
Message-ID: <233515.37877.qm@web53904.mail.re2.yahoo.com>

range with sub-hash is quite valuable when you need to do large partition wise joins. oracle can parallelize the join by sub hash partition. it sounded like you would be doing simple key lookups that return 150 rows? do you have anything meaningful that is part of the planned query to hash on? are you expecting a query performance benefit from the prune, or is it just to introduce additional loading concurrency?

If you use hash partitions underneath range and you expect to get some kind of performance benefit you have to assure that the hash part key is part of the query. even if it is, compare that to the range without the hash sub.

my tests in the past have shown that the second level prune operation into a smaller index is equivalent to the i/o that would be done if there was just a little larger index without the sub partitions. your mileage will vary, but just don't expect a big benefit from a second level prune, it should be for management reasons, i/o distributions, concurrency on load, etc..

corrections welcome.

ryan_gaffuri_at_comcast.net wrote: I was quoted numbers of 50 Gbs/day so that assumes 1,000 bytes/row. I have not confirmed this. This database will probably grow to around 2 TBs since older data will be purged. All data will be added with bulk loads using external tables. There will be a query element that will use indexes. About 150 rows/query. Not sure what the load will be on this. Right now I am working on a partitioning strategy. We are going to use RAC and we are in 10.2.

We will definitely use date partitions with local indexes. There has been some talk of 1 partition/hour. Some data loads may take a while. This way newer loads can start on newer partitions.

Should I explore hash sub-partitions? I need to run tests, but I believe hash partitions will actually hurt performance of inserts since Oracle has to decided where to put the record.

I think we definitely need ASM because its too hard to manage spreading out partitions across the LUNs on the SAN manually and since we are adding and dropping alot of partitions we would need code to determine which LUN to put partitions on.

The data model is relatively simple. So there is not alot of complex business logic in the database. The rows are scrubbed before getting to the database. Its just straight array inserts off of external tables.

anyone have any suggestions or comments?

Luggage? GPS? Comic books?
Check out fitting gifts for grads at Yahoo! Search.
--

http://www.freelists.org/webpage/oracle-l Received on Tue Sep 11 2007 - 07:33:02 CDT