RE: Deigning suggestion for table and indexes

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sun, 11 Feb 2024 15:56:39 -0500
Message-ID: <53eb01da5d2c$d0053f80$700fbe80$_at_rsiz.com>



+1.  

Particularly in the same query.  

Now if you really must do that to satisfy consumers:  

If you have a list of the included partition names and a local index led by the identifier of the thing for which you are counting the total, you can probably fire that off separately very fast iterating through the pairs of partition names (either by generating the relevant pairwise partition queries with just enough columns to satisfy getting the count strung together by union alls, or by iterating through in a pl/sql loop operating on the pairs of names).  

If you also bring back the rowids of the qualifying rows in the order you want in the pagination (or pairs of rowids in your case with two tables) into a global temporary table, you might consider using that gtt to drive a much simpler query bringing back all the attributes. That is sort of the old fashioned way, from before we had analytical functions or gtts, so we had to use an actual table to stuff rowids into, but it was still very fast.  

You would lose the need for these two predicate lines:

and TABLE2.processing_date=TABLE1.processing_date

        where TABLE2.processing_date between '2023-04-20' and '2023-05-21'-- Partition pruning

(because those are implicit by name in your partition pairs by name)

but you would need to update your list of pairs of partition names once a day. Probably right after you manually generate the empty new partitions so the first insert of the day doesn’t have to wait on storage and structure manipulations.  

Good luck. Doing both count and n-pagination in a single query can be done, but the query becomes much more complicated to get right and to read than it needs to be. Just because you can do something doesn’t mean you should.  

Quoting Andy: “But the summary of my thoughts are: if you want to do fast top N style pagination then you do not want to also be reporting the count of results.”      

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Andy Sayer Sent: Sunday, February 11, 2024 2:53 PM
To: learnerdatabase99_at_gmail.com
Cc: Oracle L
Subject: Re: Deigning suggestion for table and indexes  

There was another recent thread with the same sort of question. But the summary of my thoughts are: if you want to do fast top N style pagination then you do not want to also be reporting the count of results.  

Thanks,

Andy  

On Sun, 11 Feb 2024 at 05:40, yudhi s <learnerdatabase99_at_gmail.com> wrote:

Hello All,

Below was the question in a discussion and I want to understand experts' opinion.  

The requirement is to have the response time in <1 sec for our UI search query requirement. These will be pagination queries. These read queries will be on big transaction tables (will have ~500+ attributes approx will have approx. rows size of ~1KB) having a continuous stream of inserts consumed from source. And these tables will be a daily range partitioned on the processing_date column. Each partition is going to hold approx ~450million rows when it will serve in full capacity to all the customers transactions.  

 The customer will be given the capability to search on a Max date range of 30 days of transaction data i.e ~30 range partitions and are supposed to get the query response back in <1 sec as it will be UI from which those queries will be submitted. And pagination means the first page will show the latest 100 records , the second page will show 100 to 200 and so on. And count(*) is in the query because , UI also going to show how many total pages are going to be there as per total record set.

select count(*) over() as total_record, * from

    (select .......

        from   TABLE1
            Left join schema1.TABLE2  on TABLE2.PR_ID = TABLE1.PR_ID  and TABLE2.MID = TABLE1.MID
and TABLE2.processing_date=TABLE1.processing_date
        where   TABLE2.processing_date between '2023-04-20' and '2023-05-21'-- Partition pruning
            and TABLE2.ACN_NBR = 'XXXX'
            and ( TABLE1.MID in (XXXXXX) OR TABLE1.CID in (XXXXXX))
        order by   TABLE1.PR_TIME DESC

    )
limit 100 offset 0;

 1)What would be the appropriate indexes to make this above search query run in the quickest possible time?  

  one Index on table1(MID) , one index Table1(CID), one index on table2(ACN_NBR)?   OR
  Should we create a composite index here combining PR_ID i.e (PR_ID, MID), (PR_ID, CID), (PR_ID, ACN_NBR) as that is the most unique attribute here?  

The data pattern for the columns used in predicate are as below:- Table1 will be the driving table.

count(distinct ACN_NBR) - 25million
count(distinct MID) - 223k
count(distinct CID) - 59k
count(*)from table1 and table2- ~350 million
PR_ID is a unique key.

3)One of the use cases is that the customer should be able to search on certain attributes and should be able to see the transactions in "desc by processing_date" i.e. latest transactions on the first page on the UI. And in such scenario, if the search attribute is less unique and the customer puts a date range of a month i.e. over 30 partitions , it may results in scanning and sorting billions of rows to get the top/recent ~100 transactions and most likely not going to respond back in <1 sec, even goes for the index. So how should we handle or design indexes for catering such queries? For e.g. if we have the only filter on column "TABLE1.CID" in the above query, which is very less unique then how to handle it?  

Regards

Yudhi

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Feb 11 2024 - 21:56:39 CET

Original text of this message