Re: Hash to range partition
Date: Sun, 2 Aug 2020 16:26:19 +0100
Message-ID: <CAGtsp8kxs3UYcwO+y-SVxLxi3kab7dCn0wS45mUAva5bd3vwag_at_mail.gmail.com>
You said all your indexes were global (including the unique index on the
partition key).
You've also said that data access is mostly by ID from recent months, but
haven't given any indication of what the queries might look like - whether
it's "id between 10000000 and 11000000" and whether it's only on equality.
Note - the "hash partition to avoid index contention on sequences" guideline is about globally hash partitioning the index to avoid contention on highly concurrent inserts - you've got the "opposite" construction a global index with hash partitioned tables !
If Oracle is going to visit a row by ID it will do an index probe and visit the table - it's not going to visit all the partitions. The only difference you would get by range partitioning on ID is that a local index on PK might have blevel 3 instead of blevel 4, which would give one less buffer visit and (MAYBE) one less physical read to get to the row. But you could hash partition the index to the same effect.
If you are doing large index range scans you will be visiting all 32 partitions on a single segment index range scan, but you'd still do the same number of table blocks visits and (except for very small range scans) reads, and you'd still be caching the same fraction of the table (the "recent months". The most significant effect of range partitioning with local indexing is that the (default) clustering factor of the ID index would drop because rows with consecutive IDs would tend to be in adjacent blocks rather than rotating around 32 blocks from hash partitions - which means seeing tablescans prematurely when you're expecting to see index range scans. You could avoid this with the global index simply by setting the table preference "table cache blocks" to a value greater than 32 (I'd go for the maximum of 255 in your cast) when gathering index stats. See: https://jonathanlewis.wordpress.com/2013/05/09/clustering_factor-2/ and follow the links to Richard Foote's blog on the topic.
(I've assumed your overnight load is in big batches, low level of concurrency, with a batch having a set of sequential IDs - variations from that pattern will have an impact on the analysis).
Regards
Jonathan Lewis
On Thu, Jul 30, 2020 at 11:17 PM Ram Raman <veeeraman_at_gmail.com> wrote:
> All,
>
> We have a 1.4B row table that is hash partitioned by a key (ID). The key
> is increasing monotonically and seems to be coming from a sequence. The
> loads happen nightly and those are the only time the table is inserted to;
> during the day time only SELECTs run against the table.
>
> The table has 4 indexes on it, including one on the ID column. None of the
> indexes are partitioned (!)
>
> table t:
> --------
>
> ID NUMBER(16)
> ...
> EFFDT DATE
> ..
>
> Unique Index on ID column. Two other single column indexes on 2 number
> columns and one single column index on a date column.
>
> Queries that run in the day time typically access past several months of
> data from the table by the ID key mostly. It is accessed via other columns
> as well, but that is less than 20% of the time. There are about 10M rows
> per month and we have 32 hash partitions. No INSERTS in the day time.
>
> Since the load is happening nightly and there are only queries in the day
> time, I am considering testing out range partitioning the table rather than
> hash partitioning. It seems hash partitioning is recommended for keys based
> on sequences to avoid index contention during inserts. However, with the
> hash based approach the queries are scanning all partitions having to go
> through all 1.4B rows for a few months of records. If I go with range
> partitioning (one per month) on the ID key and if the queries access past
> few months of data only, I feel we can see a substantial performance
> improvement with queries only having to visit a fraction of the partitions
> and rows.
>
> I am also planning on creating a locally partitioned index on the ID
> column and also partitioned indexes on the other 3 columns as well. Not
> sure if it will increase the load time. One option for the indexes is that
> the locally partitioned index will have the new partition created as needed
> and the remaining 3 indexes dropped and rebuilt nightly. Will this be
> faster than the index getting updated during the daily loads?
>
> Can the listers share their thoughts on moving to range partition and
> indexes during load.
>
> Ram.
> --
>
>
>
-- http://www.freelists.org/webpage/oracle-lReceived on Sun Aug 02 2020 - 17:26:19 CEST