Re: Deigning suggestion for table and indexes

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Mon, 12 Feb 2024 02:07:01 +0530
Message-ID: <CAEzWdqdL48UQz7caD=H1RX+M-FJ6uW7BiGrXJSceUHHVb+ux4Q_at_mail.gmail.com>



Thank You Andy.

Even if the count query is removed, I think the TOP N will still need to be doing the sorting to find the top based on a certain column. And I think to make the sorting in a less resource intensive way, below Jonathan's Blog was giving helpful pointers. It's mostly sorting cheaply using indexes and grabbing the rowids first and then getting the data using those rowids.

https://jonathanlewis.wordpress.com/2008/05/09/manual-optimisation-2/

In regards to make the above posted query runs faster in general , will it be good idea to have the composite indexes (i.e. Filters along with the JOIN column PR_ID) or is it advisable to just create the indexes on the filter column alone?

Regards
Yudhi

On Mon, Feb 12, 2024 at 1:23 AM Andy Sayer <andysayer_at_gmail.com> wrote:

> There was another recent thread with the same sort of question. But the
> summary of my thoughts are: if you want to do fast top N style pagination
> then you do not want to also be reporting the count of results.
>
> Thanks,
> Andy
>
> On Sun, 11 Feb 2024 at 05:40, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
>> Hello All,
>>
>> Below was the question in a discussion and I want to understand experts'
>> opinion.
>>
>> The requirement is to have the response time in <1 sec for our UI search
>> query requirement. These will be pagination queries. These read queries
>> will be on big transaction tables (will have ~500+ attributes approx will
>> have approx. rows size of ~1KB) having a continuous stream of inserts
>> consumed from source. And these tables will be a daily range partitioned on
>> the processing_date column. Each partition is going to hold approx
>> ~450million rows when it will serve in full capacity to all the customers
>> transactions.
>>
>> The customer will be given the capability to search on a Max date range
>> of 30 days of transaction data i.e ~30 range partitions and are supposed to
>> get the query response back in <1 sec as it will be UI from which those
>> queries will be submitted. And pagination means the first page will show
>> the latest 100 records , the second page will show 100 to 200 and so on.
>> And count(*) is in the query because , UI also going to show how many total
>> pages are going to be there as per total record set.
>>
>> select count(*) over() as total_record, *
>> from
>> (select .......
>> from TABLE1
>> Left join schema1.TABLE2 on TABLE2.PR_ID = TABLE1.PR_ID and
>> TABLE2.MID = TABLE1.MID
>> and TABLE2.processing_date=TABLE1.processing_date
>> where TABLE2.processing_date between '2023-04-20' and
>> '2023-05-21'-- Partition pruning
>> and TABLE2.ACN_NBR = 'XXXX'
>> and ( TABLE1.MID in (XXXXXX) OR TABLE1.CID in (XXXXXX))
>> order by TABLE1.PR_TIME DESC
>> )
>> limit 100 offset 0;
>>
>> 1)What would be the appropriate indexes to make this above search query
>> run in the quickest possible time?
>>
>> one Index on table1(MID) , one index Table1(CID), one index on
>> table2(ACN_NBR)?
>> OR
>> Should we create a composite index here combining PR_ID i.e (PR_ID,
>> MID), (PR_ID, CID), (PR_ID, ACN_NBR) as that is the most unique attribute
>> here?
>>
>> The data pattern for the columns used in predicate are as below:- Table1
>> will be the driving table.
>>
>> count(distinct ACN_NBR) - 25million
>> count(distinct MID) - 223k
>> count(distinct CID) - 59k
>> count(*)from table1 and table2- ~350 million
>> PR_ID is a unique key.
>>
>> 3)One of the use cases is that the customer should be able to search on
>> certain attributes and should be able to see the transactions in "desc by
>> processing_date" i.e. latest transactions on the first page on the UI. And
>> in such scenario, if the search attribute is less unique and the customer
>> puts a date range of a month i.e. over 30 partitions , it may results in
>> scanning and sorting billions of rows to get the top/recent ~100
>> transactions and most likely not going to respond back in <1 sec, even goes
>> for the index. So how should we handle or design indexes for catering such
>> queries? For e.g. if we have the only filter on column "TABLE1.CID" in the
>> above query, which is very less unique then how to handle it?
>>
>> Regards
>> Yudhi
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Feb 11 2024 - 21:37:01 CET

Original text of this message