Re: How to minimise sorting time in a query
Date: Mon, 7 Nov 2022 19:56:26 +0000
Message-ID: <CAGtsp8=rQsFcSicQe7t0sKZczUHheSdHiqAaUm-j-_5amEcCXQ_at_mail.gmail.com>
create index t1_i1 on t1(c1,c2,c3) global partition by range (c1) (
partition p1 values less than ('31'), partition p2 values less than ('32'), partition pm values less than (maxvalue))
/
Because t1_i1 is not a (non-unique) globally partitioned index starting with (c1, c2), you can also enforce uniqueness on (c1, c2) by executing:
alter table t1 add constraint t1_uk unique (c1, c2) using index t1_i1;
Regards
Jonathan Lewis
On Mon, 7 Nov 2022 at 16:45, Lok P <loknath.73_at_gmail.com> wrote:
> Thank you Jonathan, Andy, Lothar.
>
> "*Since you want to partition by create_date the globally partitioned
> index on (c1,c2,c3) is probably the best bet. You've said that (c1,c2) is
> unique (and the primary key) - so for the heap table solution you could
> create the globally partitioned index on (c1, c2, c3) as a non-unique index
> and then associate the primary key with it then drop the old unique index."*
>
> Am I misinterpreting the above? as because the global partitioned index
> will need to be prefixed , so in that case for the index to be partitioned
> on create_date, it will ask for create_date to be the leading column
> something as in below plan and in that case the sorting will happen across
> all index partitions data.
>
> And though the combination of (C1,C2) is unique , the current PK is on
> (C1,c2 ,date_create), so adding yet another column c3 to it won't
> satisfy the PK uniqueness as it was doing before.
>
> create index idx on tab(date_create,c1,c2,c3) global PARTITION BY RANGE
> (date_create)......;
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 2815965656
>
> --------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
> Pstart| Pstop |
>
> --------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 5666 | 199K| 14 (8)| 00:00:01 |
> | |
> | 1 | SORT ORDER BY | | 5666 | 199K| 14 (8)| 00:00:01 |
> | |
> | 2 | PARTITION RANGE ALL | | 5666 | 199K| 13 (0)| 00:00:01 |
> 1 | 5 |
> | 3 | INDEX STORAGE FAST FULL SCAN| IDX12 | 5666 | 199K| 13 (0)|
> 00:00:01 | 1 | 5 |
>
> --------------------------------------------------------------------------------------------------------
>
> In regards to ETL logic, it picks the data from this database and cache it
> and try to use this as lookup while reading data from other database tables
> and filter it through. The ETL tool has some restrictions due to which it
> cant persist the data so as to use the same sorted cached data throughout
> the day across multiple workflows. And thinking of joining the data across
> the databases using DB links, they are not allowed in this environment here
> due to some security issues. and yes, C1 is a character representation of
> number like data which is why posing a challenge to range partition through
> it in equal intervals. We tested earlier fetching all data unsorted and
> sorting it in ETL was taking a lot more time as compared to it does on the
> DB side.
>
>
> On Mon, Nov 7, 2022 at 4:43 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> >> And also when you mentioned "*and you have to worry about the impact
>> on data loading. If you need other indexes on the table you then have to
>> worry about the effects on those indexes (for both loading and querying).*"
>> I was not able to fully understand. Is this something as below..
>>
>> Without knowing anything about your table, column and index definitions
>> it's not possible (and could be counter-productive) to make specific
>> suggestions, hence the need for generic warnings that leave it up to you to
>> apply the extra information you have. The thoughts you've had about the
>> effect of the costs and benefits of setting the table up as an IOT are
>> essentially correct - it's all driven by the fact that
>>
>> a) when you load a heap table the table data goes into the next few
>> allocated blocks, densely packed and Oracle has some scope for optimising
>> the index maintenance,
>>
>> compared to
>>
>> b) when you load a heap table the data tends to end up scattered over a
>> large number of ("leaf") blocks of the IOT, and then the primary key and
>> block address of each row has to be used in the secondary index
>>
>> Another factor in the querying by secondary index is that Oracle uses the
>> stored block address as a "guess" of where the row will be in the IOT, and
>> if the guess is wrong it then uses the stored primary key to find the row.
>> When the data is constantly growing the guesses often become wrong fairly
>> frequently. (See
>> https://jonathanlewis.wordpress.com/2019/12/16/iot-bug/#footnote - which
>> also describes a bug in secondary indexes that I haven't checked for the
>> latest versions of Oracle, though it does appear in 19.3).
>>
>>
>> Since you want to partition by create_date the globally partitioned index
>> on (c1,c2,c3) is probably the best bet. You've said that (c1,c2) is unique
>> (and the primary key) - so for the heap table solution you could create the
>> globally partitioned index on (c1, c2, c3) as a non-unique index and then
>> associate the primary key with it then drop the old unique index.
>>
>> You've shown a plan where Oracle did a partition-wise index fast full
>> scan - that was just a costing thing, if you hint the query to use an index
>> but NOT an index_ffs() then it should be able to read the index in order.
>>
>> As Andrew Sayer points out, though, dragging 2 billion rows across the
>> network 10 times per day is an extreme thing to do to satisfy your ETL
>> tool. Insisting on having them sorted is even worse. An incremental
>> mechanism might be a lot better if the tool will accept it, or if you have
>> to drag all the data across then using a tablescan and letting the ETL tool
>> do the sorting might be the best bet. In passing, I think we've noted in
>> the past that C1 is a character representation of a fixed width number - so
>> yet another optimisation detail (for network traffic) might be to pass a
>> to_number(c1) to the ETL tool (which, who knows, might be converting the
>> incoming character to numeric anyway).
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sun, 6 Nov 2022 at 12:57, Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> Thank you so much Jonathan and Mladen.
>>>
>>> Currently the table is a yearly range partitioned on DT_CREATE column
>>> and it has data for last ~8 years so total ~8 partitions. Also the table
>>> has a primary key in column (c1,c2,DT_CREATE). And we have another b-tree
>>> index present on column C3. All are local index only. And the combination
>>> of (c1,c2) is unique only.
>>>
>>>
-- http://www.freelists.org/webpage/oracle-lReceived on Mon Nov 07 2022 - 20:56:26 CET