Re: Hash to range partition
Date: Fri, 31 Jul 2020 17:52:44 -0500
Message-ID: <CAHSa0M3RLwT-7NMRU_DzbNyXPA1tu=SgL=LxPHnX3ND5d=9s9w_at_mail.gmail.com>
Mladen, Mark and Lothar thank you all for your time.
This is not a Peoplesoft system. The DATE column has a different name, but I put down as EFFDT (I had worked on Peoplesoft systems before - maybe subconsciously it came out). I am not talking about duplicates of an id here. The queries access the table by the ID column mostly and mostly access past few months of data. But the IDs themselves correspond to months, for eg, IDs from 10,000,000 to 20,000,000 may be for June; 20,000,000 to 30,000,000 for July and so on. We have approximately 8 to 14M rows per month.
The reason I posted the question was because I felt that we could get faster response times with range partitioning during SELECTs, but was also wondering about INSERTs which in our case happen only in the nights. I was hoping we could have better SELECT performance in the day at the expense of nightly loads.
With the current hash partitioning the queries have to visit all partitions and all rows to read 3 or 4 month data; very expensive to acquire 30 to 40M rows by having to read all 1.4B rows. I was wondering why the table was hash partitioned and I realized that the ID key feed was coming from a sequence; for those cases it looks like hash partitioning is best because of index contention during INSERTs. I was trying to strike a happy medium by range partitioning that could improve SELECTs (that query only a few months of data by ID column) vs the possibility of slower INSERTs due to index maintenance with range partitioning.
I will test it out anyway and update my observations.
Thanks
Ram.
On Fri, Jul 31, 2020 at 11:18 AM Mark W. Farnham <mwf_at_rsiz.com> wrote:
> ... and ignore my previous advice. For packaged applications you need to
> do things "their way" because you need to service their queries.
>
> Lothar is prompting you to do a careful search to see whether a particular
> approach might be useful.
>
> Just remember as you do this that the vendors can upset your applecart at
> any time. Now if this is a datawarehouse fed FROM a package and you control
> the queries, you may still be in business to significantly improve
> performance.
>
> IF you know the queries you need to service, that is an ideal starting
> point.
>
> Good luck, and Lothar's test should still be done, I think.
>
> mwf
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of Lothar Flatz
> Sent: Friday, July 31, 2020 3:31 AM
> To: oracle-l_at_freelists.org
> Subject: Re: Hash to range partition
>
> Hi Ram,
>
> ok, that looks like Peoplesoft. I am suprised that you do not mention
> effdt as search criteria. It would be the usual procedure to use id and
> effdt combined in the query. ("Find the most resent row of an id.") Well,
> in any way if that would be the case we would have duplicates of the id.
> Do we?
> You will understand that this makes a major difference, thus we have to
> find out first.
> If effdt is used as I know it from peoplesoft, effdt would be the column
> to range partition on.
> If I get your thinking right, you want to seperate more recent from older
> rows. That would help your buffer cache.
> The id (higher means newer) is an indirect way to seperate new from old.
> Effdt is straight forward.
> However, if effdt is never used in queries you must not partition on it.
> Run this query to find out (you must be sysdba):
>
> select column_name, equality_preds, range_preds, like_preds,
> equijoin_preds from ( select oo.name owner, o.name, c.name column_name,
> u.equality_preds, u.equijoin_preds, u.nonequijoin_preds, u.range_preds,
> u.like_preds, u.null_preds, u.timestamp from sys.col_usage$ u, sys.obj$ o,
> sys.user$ oo, sys.col$ c where o.obj# = u.obj# and oo.user# = o.owner# and
> c.obj# = u.obj# and c.col# = u.intcol#) dba_col_usage where owner='&owner'
> AND NAME=upper('&table')
> ORDER BY equality_preds+ range_preds+ like_preds desc;
>
> Regards
>
> Lothar
>
> Am 31.07.2020 um 00:16 schrieb Ram Raman:
> > All,
> >
> > We have a 1.4B row table that is hash partitioned by a key (ID). The
> > key is increasing monotonically and seems to be coming from a
> > sequence. The loads happen nightly and those are the only time the
> > table is inserted to; during the day time only SELECTs run against the
> > table.
> >
> > The table has 4 indexes on it, including one on the ID column. None of
> > the indexes are partitioned (!)
> >
> > table t:
> > --------
> >
> > ID NUMBER(16)
> > ...
> > EFFDT DATE
> > ..
> >
> > Unique Index on ID column. Two other single column indexes on 2 number
> > columns and one single column index on a date column.
> >
> > Queries that run in the day time typically access past several months
> > of data from the table by the ID key mostly. It is accessed via other
> > columns as well, but that is less than 20% of the time. There are
> > about 10M rows per month and we have 32 hash partitions. No INSERTS in
> > the day time.
> >
> > Since the load is happening nightly and there are only queries in the
> > day time, I am considering testing out range partitioning the table
> > rather than hash partitioning. It seems hash partitioning is
> > recommended for keys based on sequences to avoid index contention
> > during inserts. However, with the hash based approach the queries are
> > scanning all partitions having to go through all 1.4B rows for a few
> > months of records. If I go with range partitioning (one per month) on
> > the ID key and if the queries access past few months of data only, I
> > feel we can see a substantial performance improvement with queries
> > only having to visit a fraction of the partitions and rows.
> > I am also planning on creating a locally partitioned index on the ID
> > column and also partitioned indexes on the other 3 columns as well.
> > Not sure if it will increase the load time. One option for the indexes
> > is that the locally partitioned index will have the new partition
> > created as needed and the remaining 3 indexes dropped and rebuilt
> > nightly. Will this be faster than the index getting updated during the
> > daily loads?
> >
> > Can the listers share their thoughts on moving to range partition and
> > indexes during load.
> > Ram.
> > --
> >
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
-- -- http://www.freelists.org/webpage/oracle-lReceived on Sat Aug 01 2020 - 00:52:44 CEST