Re: Estimations are as good as it can be, what else can be done here?
Date: Fri, 17 Nov 2023 16:53:56 +0530
Message-ID: <CAEjw_fhTpaY1eQXvnX5guVhJjLox3VSo2+s_6ZyMFaQ1xKgewg_at_mail.gmail.com>
Jonathan or others may comment.
On Fri, 17 Nov, 2023, 4:31 pm yudhi s, <learnerdatabase99_at_gmail.com> wrote:
> Thank you Jonathan. The blog you pointed is really awesome. It helps in
> getting an idea of how to approach the problem without throwing everything
> to the optimizer.
>
> So basically its a two pass thing. The key strategy should be to get the
> sorting of the large record sets(Millions or billions) happen with minimal
> resources and that can only be achieved if all the column which are getting
> read(which should mostly be PK column or rowid) and the columns those are
> part of filter/join criteria + the sorting criteria ,are part of an index,
> so that it can be sorted as efficiently as possible. And then the related
> handful of rows can be fetched efficiently using the selected PK or Rowids
> and passed to the UI.
>
> It may be challenging when the query is having Joins from multiple tables
> or the columns in select query or the sorting criteria are from different
> tables. But then flattening the data before or de-normalizing the results
> set such that they would be available in one single table(materialized
> view), will make the things easier in such situation.
>
> But yes sorting has to happen during run time only , we cant avoid the run
> time sorting by storing the results in sorted way before hand and just show
> it to the UI user upfront. Correct me if my understanding is wrong.
>
>
> On Wed, 15 Nov, 2023, 8:17 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>> For examples where you have to select a large volume of data, sort it,
>> and then return only the first few sorted results (pagination queries), you
>> will find that sometimes the only thing to do is a complex rewrite of the
>> query that tries to find the smallest amount of data that needs to be
>> selected to allow the correct sorting followed by a "join back" by rowid or
>> primary key to the tables that supply other non-sorted data.
>>
>> There are some indications of technique at Manual Optimisation | Oracle
>> Scratchpad (wordpress.com)
>> <https://jonathanlewis.wordpress.com/2008/04/27/manual-optimisation/> with
>> a 2nd and 3rd linked at the end of the article.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Wed, 15 Nov 2023 at 05:18, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Jonathan/Andy,
>>>
>>> The OP's query here is really doing a lot of work like passing ~2million
>>> rows all the way across 10+ table joins and also sorting all of them before
>>> publishing the first_n_rows to the client. But a common case could be like,
>>> a scenario in which the UI user has to see latest ~100 transactions details
>>> of certain type for a user within a date range(say one week/month). And in
>>> cases where the daily transaction can go up to 100's of millions and even a
>>> two table join criteria with an order by clause(for finding the latest
>>> transaction details based on date column) can take lot of time. So how
>>> should one handle the design of such a requirement?
>>>
>>> Even one can have the de-normalized/flatten table or materialized view
>>> to persist the two table JOIN data before hand but the order is not
>>> guaranteed while we fetch the data, so we have to have the "order by
>>> date_column" clause added to the query on top of the materialized view
>>> scan(which might be on 100's of millions rows) , and that can well be the
>>> bottleneck then. How should one design to cater such user needs? Can
>>> Parallel processing help here?
>>>
>>> And it's also true that UI users don't understand about the detailed
>>> work it had to go through before publishing those handful of rows in the UI
>>> screen but they want it in a few seconds.
>>>
>>> On Fri, Nov 10, 2023 at 6:27 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>>
>>>> It's not totally trivial working out the timing of activity in a
>>>> parallel plan, although a good starting point is to follow the ordering
>>>> given by the TQ details for each individual DFO tree. And the "Start
>>>> Active / Time Active" is an interesting problem - presumably it's likely to
>>>> cover the first slave start to the last slave ending time, which in
>>>> unbalanced cases could lead to some confusion.
>>>>
>>>> Looking at your plan here, though, the tablescan at operation 86 starts
>>>> at +9 and takes 4 seconds to complete. I'm having a little difficulty
>>>> working out exactly how this table scan passed through lagers of slave to
>>>> get to the sort group by at operation 6 that starts at +12 and runs for 5
>>>> seconds - though that 5 seconds is probably about receiving data from the
>>>> tablescan and join, and not really about sorting.
>>>>
>>>> Given that it's a parallel query, it's not surprising that there are
>>>> two sort group by operations - this is Oracle attempting to reduce the data
>>>> size each PX slave has picked up from its hash join before distributing by
>>>> range for a final sort group by. The sort group by at operation 3 starts
>>>> at +16 and appears to take 24 seconds, but again I think very little time
>>>> is spent in the actual sort, and most of the active time is related to
>>>> passing the data to the client; the "PX SEND QC" is probably the indicator
>>>> that the sort completed at +17.
>>>>
>>>> Given that you seem to have 2M rows moving up the pipeline it's
>>>> possible that the two-pass sorting could be eliminated. If you check the
>>>> Outline Data you probably had a gby_pushdown(_at_qbname) in it. If you change
>>>> this to no_gby_pushdown(_at_qbname) you should find that the preliminary sort
>>>> group by disappears - this might make a couple of seconds difference. (I've
>>>> had a draft note on that hint for 9 years but haven't got around to
>>>> finishing it - there is a mention of it in
>>>> https://jonathanlewis.wordpress.com/2022/04/13/adaptive-joins/ )
>>>>
>>>> As far as a small amount of data making a big difference - that's
>>>> always a possibility in any query. Given the complete cascade of hash
>>>> joins, though, I don't think you're likely to see that happen. I have to
>>>> say, though, that you do seem to have a skew in your data that might be
>>>> adding a couple of seconds to your query. In the PX details you can see
>>>> that slave p005 uses far more CPU time than the rest in slave set 1, and
>>>> slave p00a does far more I/O to temp that the rest in slave set 2.
>>>> Snapshots of PX session stats may give you some clues about those two
>>>> anomalies.
>>>>
>>>> Your general question about timing for 10 table joins with 2M rows -
>>>> you've got more joins than that, and I can see 4, 5, and 8m rows reported,
>>>> so plenty of work to do. 40 seconds doesn't seem unreasonable, until you
>>>> say you only want to look at the first few. Possibly you could find a way
>>>> to create a materialized view (perhaps of only a subset of the tables) that
>>>> reduces that enough to make it look a lot better.
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>> On Thu, 9 Nov 2023 at 10:51, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> I ran the query manually with parallel(6) and below is the sql
>>>>> monitor. It does finish in ~40 seconds overall. But yes need to execute it
>>>>> from the UI by changing the view definition with parallel hint , to see
>>>>> howmuch extent its helping the real production run which runs with a
>>>>> first_rows kind of setup.
>>>>>
>>>>>
>>>>> https://gist.github.com/databasetech0073/62b51587b3f590914e9a92218609672a
>>>>>
>>>>> But again, if volume increase a bit the response time may again exceed
>>>>> and goes beyond ~10 second reponse which is not good experience for any UI
>>>>> user. So my doubt is, if we should really not expect these type of queries
>>>>> (e.g. with 10 table joins +~2million matching rows +sorting on top) to
>>>>> finish in <10 seconds and thus some different design approach should be
>>>>> followed to cater such requirement? Like persisting the
>>>>> transformed/aggregated results before hand in a materialized view or
>>>>> anything else?
>>>>>
>>>>>>
>>>>>>>>>>
-- http://www.freelists.org/webpage/oracle-lReceived on Fri Nov 17 2023 - 12:23:56 CET