RE: Suggestion on compression

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 13 Jan 2023 13:56:34 -0500
Message-ID: <51b901d92780$c2770f20$47652d60$_at_rsiz.com>



With regard to attribute clustering, it is in fact not difficult to create a laboratory test case specifically with anti-correlation of values between two columns (x, y) such that the sum of execution times of queries on y get worse if you physically order on x.  

In data in “the wild” I have yet to discover a case where ordering on x degrades performance on y. Most data in the wild is actually randomly associated between two columns in most cases or slightly correlated rather than anti-correlated. Dates and id numbers tend (for example) to rise together unless the ids use a pattern to diverge to avoid hot spots, in which case the association is random. But finding actual anti-correlation that is not intentionally constructed to demonstrate the possibility is rare (zero times in my experience.)  

You win on both if they are even slightly correlated for the obvious reason and you win if they are mutually random because you can column compress the one (or more) you order by to best effect, minimizing the total size yet leaving the random pattern of the second column unchanged.  

JL’s caution of the possibility is responsible. Certainly test that possibility.  

IF you do some attribute clustering, please let me know if you discover a significant anti-correlation. (IF the cluster factor of an index on the non-ordered column gets worse by enough to matter, that would be sufficient cause to scrutinize performance of queries predicated or ordered on the non-ordered column).  

Good luck.  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis Sent: Friday, January 13, 2023 11:42 AM
To: Oracle L
Subject: Re: Suggestion on compression    

Compression does apply only at the index level, not at the partition level, so all partitions have to have the same number of compressed columns; but I find it hard to imagine difference (time-based) partitioned needing different number of columns compressed. (The same MIGHT not be true of list partitions, of course).  

Attribute Clusterind - you have to make the choice: if you sort the data in any way to benefit one pattern of queries you may easily make other queries slower, but there are many ways in which you have to trade "I can make X faster, but it will make Y slower". How often and how important/urgent X and Y might be is something only you (or your users) can decide.  

Partial indexing: you were the one that suggested that partitions in the far past weren't used, or didn't matter much. It's the same "X vs. Y" trade-off. If a query that used to be run once per week and take 20 seconds now takes 5 minutes but the space saving is 200 GB, would you want the space or the speed. Checking obj# isn't a bad idea but you're probably better off querying v$segstat (dba_hist_segstat) which record counts of physical reads and buffer gets. ASH history is a sample every 10 seconds, and could easily deceive.  

Regards

Jonathan Lewis      

On Wed, 11 Jan 2023 at 18:45, yudhi s <learnerdatabase99_at_gmail.com> wrote:

Thank you so much for the guidance here.  

These are eye opener for me, each of the points which Jonathan mentioned. We will test these options. And I am still trying to digest the points which Mark posted, which seems like a bigger design+code change.  

However regarding the 'prefix compression' my understanding was, it can't be done on a partial/historical set of index partitions rather it has to be applied on global index level i.e. across all the index partitions, so I was thinking of automatic index compression/advance compression strategy which has syntax of making it happen on specific set of partition level only. But I will test this out , I may be wrong.  

Never tried "attribute clustering" before ,but what i read in docs is that it will sort the table data on specific columns/attributes, so i can think of why that will help in increasing efficiency/clustering factor of an index based on those exact columns/order and also better compression ratio etc. , but will it not degrade other indexes clustering factor/efficiency/compression ratio at same time , as the ordering of data is strictly based on some other set of columns now?  

With regards to the option of setting the "indexing off" for the historical partition after declaring partial index, I was thinking just in case if any one off queries would be using any of the "most selective index" for current and + historical partitions scan, that will take a hit. So just for the safer side I am planning to check the current_obj# of the dba_hist_active_sess_history and if any of the index partitions are not showing up here(i.e not getting read/accessed) that should be a good candidate for this option. Please correct if my approach is wrong?  

On Tue, Jan 10, 2023 at 10:38 PM Clay Jackson <dmarc-noreply_at_freelists.org> wrote:

What JL and MWF said, ESPECIALLY:

“Doing cyclical operational stuff like this strikes some folks as complicated. Designing it so that it is relatively simple and routine in execution and in particular is relatively simple for operations to always get right is important”  

It seems to me that the savings here would provide more than enough “incentive” to set up the cycles.  

Also don’t forget to TEST, instrument the processes so that they can be monitored and have contingency plans for failures in place BEFORE they happen i.e – plan for running out of space in the partitioning/unhook process.  

Clay Jackson  

From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> On Behalf Of Mark W. Farnham Sent: Tuesday, January 10, 2023 8:54 AM
To: jlewisoracle_at_gmail.com; 'Oracle L' <oracle-l_at_freelists.org> Subject: RE: Suggestion on compression  

CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe.  

Everything JL wrote, +1.  

Another approach entirely is to unhook each partition as it passes the commonly queried period (you might want to do that for 7 days even though you only need 5 to facilitate remaining sane on a weekly operational frequency.) It is possible that no partitioning at all and a daily cycle is practical (probably using the alter synonym trick at midnight), leaving you with both a daily and week old schedule (one for shifting where insert update delete is possible [daily], the other for changing from indexes optimal for recent data queries to 30 day queries).  

The unhook would be to a cleverly and consistently named table, probably <thing>_MMDDYY or something similar. Then your monthly queries become union all queries of the seven partitions and the youngest 23 unhooked tables, which come up as effectively parallel 30 with serial on each of the 30 individuals.  

A reason WHY you might want to do this is if fewer indexes are needed for the “monthly” queries and/or slightly different indexes that might be a drag on insert, update, and delete might make queries on the disconnected partitions that have become tables much more efficient.  

A reason WHY you might want to do this is if you don’t really need the detail for some (or all!) of your monthly queries, so permanently creating permanently cast aggregations as new tables becomes part of the unhook process. IF ALL the monthly queries are aggregations, you might consider a schedule less than 365 days for discarding detail tables from the online database. (Unloaded into something easy to reload, of course, in line with your overall information retention and purging schedule. You might need a detail row to prove something you are never routinely interested in reporting on.)  

A reason WHY you might NOT even think about this is if you have a non-period based uniqueness index across all time (which I doubt, since you write you are limited to 365-ish days).

A reason WHY you might NOT want to pursue this is if your ad hoc 30 day window queries that require different aggregations or indexes frequently erupt.  

There certainly may be additional pro or con reasons beyond my terse enumeration.  

Probably the general health and friendliness (to the CBO) of an object with 7 partitions is better than one with 365 partitions.

Possibly the total online storage size required drops because either you don’t need all the indexes or you might need many fewer “partitions” on line or you only need to keep aggregates older than 7 days on line. The “fan-in” of any possible aggregations has an enormous correlation on whether or not dramatic reductions in size are possible.

Possibly old investigative queries on offline detail transaction can be done on a different hardware resource.  

Doing cyclical operational stuff like this strikes some folks as complicated. Designing it so that it is relatively simple and routine in execution and in particular is relatively simple for operations to always get right is important.  

Good luck,  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis Sent: Tuesday, January 10, 2023 5:32 AM
To: Oracle L
Subject: Re: Suggestion on compression    

Do you have a decent-sized test system? It doesn't matter what the manuals say about how things work (and how well things work), you need to test for YOUR specific setup because you may have some extreme case or boundary condition that causes the generic implementation to break.  

Having said that (and using 45 to give you a safety margin over the 30 days you say need need as "current".  

  1. Set MOST of your indexes to "indexing partial", then for (table) partitions older that 45 days set indexing off.
  2. I wouldn't set index compression to automatic, generall it's a declaration that you don't know your data and if you're worried about the impact of this specific table then you should know the data very well. I would set specific indexes to fixed size compression.
  3. Why "query high" when you've implied that you hardly ever read data older than 30 days, why not archive high? You may want to play a little safer and say (e.g. archive high over 180 days, query high from 45 to 180 days). (Query high might be a little CPU intensive, so you might want to use query low rather than query high, but you'll have to do the cost/benefit analysis for your system.)
  4. At the point you move from non-compressed to compressed you could consider include an "attribute clustering" clause - this might improve the performance of some queries, and might improve the compression slightly.
  5. Monitor dba_tab_modifications for the table (and its partitions) so that you can see how many changes are reported for compressed partitions - each change is likely to lead to a chained row, so you need to come up with an algorithm for choosing to rebuild any partition that has had "too many" changes.

Regards

Jonathan Lewis      

On Wed, 4 Jan 2023 at 07:01, yudhi s <learnerdatabase99_at_gmail.com> wrote:

Hello Experts, Happy new year to all.

Considering Oracle database version 19C(19.15 to be specific and its Exadata-X8 machine). The database having majority of the transaction tables are daily range partitioned on a business date column holding truncated date values(without any time component). These tables are holding ~1 year+ worth of data i.e. ~365+ partitions.

The DMLS/write only happens on the live partitions or say today's day data(Say Nth partition).

The querying or reading happens mainly last ~5days data or latest N-5 partitions. However during monthend process(which runs once in a month) , it reads/scans a month worth of data i.e. all the partitions > sysdate-30 or latest N-30 partitions. Rest of the data/partitions are read/written in adhoc basis with minimal frequency.

Now, we are seeing a space crunch in the box and also considering future growth we have been asked to compress all the possible table/indexes partitions. My question is , in the above scenario, will it be a good approach..

1)To just HCC compress all the historical table partitions i.e. all the partitions with partition position <N-30th using 'compress for query high' option. 2)And for all <N-30th local index partitions with 'compress advanced low' option. As "ADVANCE" index compression will let oracle choose the best index prefix key in local partition level and leaf block level automatically rather than us to worry about that.

Or any other strategy would be best in the above situation for these transaction tables?

As I understand the writing will not be a concern as we wont be touching the live partitions, however in case the reading(i.e. index scan, cell smart scan etc.) will happen on the compressed table/index data , it can change the current performance figures. Is my above understanding correct here?  

Regards

Yudhi

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 13 2023 - 19:56:34 CET

Original text of this message