Partitioning on a VARCHAR2 column [message #620069] |
Mon, 28 July 2014 10:19 |
|
mafoe
Messages: 6 Registered: July 2014
|
Junior Member |
|
|
I have a table with ~7 million entries.
One column of this table is called "model" and its data type is VARCHAR2. There are about 400 different models (aka model has a cardinality of 400), and I select from this table with the model in the WHERE clause. So let's assume there are ~18k rows per model in this table.
Now I could use an index on the "model" column, but to my knowledge, if a select returns a lot of entries, it's expensive for Oracle to link the row ids from the index with the actual data rows because Oracle has to do it 18k times if the number of returned rows is 18k.
My understanding is that the "model" column would be a good key on which to partition. Then a select with the desired model in the WHERE clause could to a full partition scan on the relevant partition, without using any index, and return all the data it has read.
How would I partition this table, seeing that I cannot use interval partitioning on VARCHAR2, without having to maintain a very burdensome list partition? Or is this not a good use case for partitioning?
Thanks in advance.
|
|
|
|
Re: Partitioning on a VARCHAR2 column [message #620073 is a reply to message #620070] |
Mon, 28 July 2014 10:47 |
|
mafoe
Messages: 6 Registered: July 2014
|
Junior Member |
|
|
List partitions have to be created explicitly, no?
I don't want to have to add or remove partitions when a new model is introduced in a year from now.
I've read about interval partitioning where new partitions are created automatically, and I thought something like that would be nice for VARCHARs.
[Updated on: Mon, 28 July 2014 10:49] Report message to a moderator
|
|
|
Re: Partitioning on a VARCHAR2 column [message #620107 is a reply to message #620073] |
Mon, 28 July 2014 19:32 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
mafoe wrote on Tue, 29 July 2014 01:47List partitions have to be created explicitly, no?
That's right, they do.
mafoe wrote on Tue, 29 July 2014 01:47I don't want to have to add or remove partitions when a new model is introduced in a year from now.
If new models are that infrequent, set up a DEFAULT partition to capture the new ones and periodically check the size of the DEFAULT partition. When it gets too big, migrate the new models out to their own LIST partitions.
Alternative: you could use HASH partitions. You won't get one partition per distinct value, but if you used (say) 32 hash partitions then you should get a reasonable distribution and probably no more than 500K rows per partition, which should full scan in a second or two.
Also worth considering: upgrade to 12.1.0.2 and use Zone Maps - they would be perfect for this
Ross Leishman
|
|
|
Re: Partitioning on a VARCHAR2 column [message #620116 is a reply to message #620069] |
Tue, 29 July 2014 00:57 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
Do you frequently query on all the models in the table? As you said you keep getting new models in course of time. Assuming you have a date type column showing the date when the new model was inserted, is it not possible to create a RANGE-LIST partition?
For example, in your query predicate you mention,
where dt_tm_stamp > sysdate - 30 and model ='x'
Above would only look into the previous month's partition and the sub-partition 'x'.
|
|
|
Re: Partitioning on a VARCHAR2 column [message #620123 is a reply to message #620107] |
Tue, 29 July 2014 04:21 |
|
mafoe
Messages: 6 Registered: July 2014
|
Junior Member |
|
|
Thanks for the advise.
rleishman wrote on Tue, 29 July 2014 02:32If new models are that infrequent, set up a DEFAULT partition to capture the new ones and periodically check the size of the DEFAULT partition. When it gets too big, migrate the new models out to their own LIST partitions.
When I'm gone from this project, nobody will continue doing that, so I need a solution that doesn't require maintenance in the future.
rleishman wrote on Tue, 29 July 2014 02:32Alternative: you could use HASH partitions. You won't get one partition per distinct value, but if you used (say) 32 hash partitions then you should get a reasonable distribution and probably no more than 500K rows per partition, which should full scan in a second or two.
That sounds good. I'll check that out.
Upgrading to Oracle 12 is not in the near future. Upgrades are slow around here.
Lalit Kumar B wrote on Tue, 29 July 2014 07:57Do you frequently query on all the models in the table? As you said you keep getting new models in course of time. Assuming you have a date type column showing the date when the new model was inserted, is it not possible to create a RANGE-LIST partition?
For example, in your query predicate you mention,
where dt_tm_stamp > sysdate - 30 and model ='x'
Above would only look into the previous month's partition and the sub-partition 'x'.
It is entirely unpredictable and out of my hand when and how many new models are created. This table is for persisting data we get from another system, and the inserts are user-triggered.
|
|
|
|