Re: Deigning suggestion for table and indexes

From: Andy Sayer <andysayer_at_gmail.com>
Date: Sun, 11 Feb 2024 11:53:12 -0800
Message-ID: <CACj1VR6_Vbhveed0nqj1rrSuS5DM5F260StCzpHtFjObWXJiUQ_at_mail.gmail.com>



There was another recent thread with the same sort of question. But the summary of my thoughts are: if you want to do fast top N style pagination then you do not want to also be reporting the count of results.

Thanks,
Andy

On Sun, 11 Feb 2024 at 05:40, yudhi s <learnerdatabase99_at_gmail.com> wrote:

> Hello All,
>
> Below was the question in a discussion and I want to understand experts'
> opinion.
>
> The requirement is to have the response time in <1 sec for our UI search
> query requirement. These will be pagination queries. These read queries
> will be on big transaction tables (will have ~500+ attributes approx will
> have approx. rows size of ~1KB) having a continuous stream of inserts
> consumed from source. And these tables will be a daily range partitioned on
> the processing_date column. Each partition is going to hold approx
> ~450million rows when it will serve in full capacity to all the customers
> transactions.
>
> The customer will be given the capability to search on a Max date range
> of 30 days of transaction data i.e ~30 range partitions and are supposed to
> get the query response back in <1 sec as it will be UI from which those
> queries will be submitted. And pagination means the first page will show
> the latest 100 records , the second page will show 100 to 200 and so on.
> And count(*) is in the query because , UI also going to show how many total
> pages are going to be there as per total record set.
>
> select count(*) over() as total_record, *
> from
> (select .......
> from TABLE1
> Left join schema1.TABLE2 on TABLE2.PR_ID = TABLE1.PR_ID and
> TABLE2.MID = TABLE1.MID
> and TABLE2.processing_date=TABLE1.processing_date
> where TABLE2.processing_date between '2023-04-20' and
> '2023-05-21'-- Partition pruning
> and TABLE2.ACN_NBR = 'XXXX'
> and ( TABLE1.MID in (XXXXXX) OR TABLE1.CID in (XXXXXX))
> order by TABLE1.PR_TIME DESC
> )
> limit 100 offset 0;
>
> 1)What would be the appropriate indexes to make this above search query
> run in the quickest possible time?
>
> one Index on table1(MID) , one index Table1(CID), one index on
> table2(ACN_NBR)?
> OR
> Should we create a composite index here combining PR_ID i.e (PR_ID,
> MID), (PR_ID, CID), (PR_ID, ACN_NBR) as that is the most unique attribute
> here?
>
> The data pattern for the columns used in predicate are as below:- Table1
> will be the driving table.
>
> count(distinct ACN_NBR) - 25million
> count(distinct MID) - 223k
> count(distinct CID) - 59k
> count(*)from table1 and table2- ~350 million
> PR_ID is a unique key.
>
> 3)One of the use cases is that the customer should be able to search on
> certain attributes and should be able to see the transactions in "desc by
> processing_date" i.e. latest transactions on the first page on the UI. And
> in such scenario, if the search attribute is less unique and the customer
> puts a date range of a month i.e. over 30 partitions , it may results in
> scanning and sorting billions of rows to get the top/recent ~100
> transactions and most likely not going to respond back in <1 sec, even goes
> for the index. So how should we handle or design indexes for catering such
> queries? For e.g. if we have the only filter on column "TABLE1.CID" in the
> above query, which is very less unique then how to handle it?
>
> Regards
> Yudhi
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Feb 11 2024 - 20:53:12 CET

Original text of this message