Re: Same query with no plan change or volume but runs for hours vs minutes
Date: Thu, 30 Nov 2023 02:56:16 +0530
Message-ID: <CAKna9VYNYqKxMtiwC5RFFzKRivO+OvO_0C6Qz7xq9hiY+uC8zw_at_mail.gmail.com>
Thank you.
*Is the primary key index local or global?* The primary key index is local.
On Wed, Nov 29, 2023 at 9:44 PM Rajeev Prabhakar <rprabha01_at_gmail.com> wrote:
> One more note worth a look - 2057107.1
>
> On Nov 29, 2023 at 10:43 AM, <Rajeev Prabhakar <rprabha01_at_gmail.com>>
> wrote:
>
> Lok,
>
> If you haven’t already, can you check if
> Oracle note 1948087.1 is of any help/
> relevance to your case ?
>
> Rajeev
>
>
> On Nov 29, 2023 at 9:48 AM, <Mark W. Farnham <mwf_at_rsiz.com>> wrote:
>
> That all makes sense.
>
>
>
> And recreating the index in the hopes that the error was an unlikely
> event, but which should make the problem go away.
>
>
>
> To the scenario possibilities I will add a delete operation that was
> interrupted in an unfriendly way before the commit. Hmm, I wonder if an
> aborted insert could burn space it shouldn’t permanently burn, leaving the
> index leaf marked with plenty of space available, but the free space in the
> block somehow trashed.
>
>
>
> All the “fixes” that come to mind would destroy this as a research case,
> and if something in their operational schedule triggers an otherwise
> unusual event, the usual thing that happens when problems that go away
> unexplained will probably happen.
>
>
>
> Thanks for saving me from drilling into the actual details.
>
>
>
> It certainly would be tedious to dump the blocks of the index looking for
> perversity, and then tying anything odd to a past event is likely to be an
> anecdotal search. “What unusual thing did all y’all do after the last
> ‘good’ run (that might have cleaned some things up) before the next ‘bad’
> run?”
>
>
>
> Happy Holidays.
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Jonathan Lewis
> *Sent:* Wednesday, November 29, 2023 4:45 AM
> *To:* Oracle L
> *Subject:* Re: Same query with no plan change or volume but runs for
> hours vs minutes
>
>
>
> Mark,
>
>
>
> I'm still trying to work out a consistent story of what the names of the
> statistics are trying to tell us, but it seems likely that part (possibly a
> major part) of the problem relates to Oracle trying to find usable blocks
> in the index segment. One very clear indicator is that the fast run doesn't
> show a value for "ASSM gsp:Optimized index block rejects" whlie the slow
> run reports 438M
>
>
>
> Other observations from the slow stats:
>
> ASSM gsp:reject db = 622315351
>
> ASSM cbk:blocks examined = 184M which is very close to: "reject db -
> optimizer index blocks reject"
>
>
>
> I think this tells us that "optimized" means I checked a "bit" so I didn't
> need to check the block it references.
>
>
>
> So when we see
>
> ASSM gsp:L2 bitmaps examined = 151,000
>
> ASSM gsp:Optimized reject l1 = 1,150
>
> ASSM gsp:L1 bitmaps examined = 42M
>
>
>
> I think that's telling us that there are lots of L1 bitmaps that should
> have been flagged in the L2 as "no space in my data blocks", but haven't.
>
> Somewhere in the code path we have something is failing to set bits in the
> L1.
>
>
>
> The trouble is, I don't think it's the insert that's doing it wrong. Check
> these three stats:
>
> leaf node splits 4,602
>
> branch node splits 8
>
> ASSM cbk:blocks marked full 4,610
>
>
>
> We know that Oracle had to allocate (or re-use) 4,610 index leaf blocks
> because of the block splits - and it has immediately marked those blocks as
> "full" (index blocks are only ever "unformatted" "newly formatted",
> "reusable (FS2)" or "in use (FULL)" as far as the bitmaps are concerned).
>
>
>
> The only ideas I have at present are:
>
> a) an index coalesce or a shrink space has taken place, freed up no (or
> very few) blocks, but erroneously marked the leaf blocks as reusable.
>
> b) something happened to cause two copies of the code to run, and one
> eventually failed or stuck; or the PL/SQL block controlling the insert
> rolled back and restarted, and the internal code left lots of blocks in the
> wrong state for some reason.
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
>
>
> On Tue, 28 Nov 2023 at 23:20, Mark W. Farnham <mwf_at_rsiz.com> wrote:
>
> “So ideally it should be a brand new segment without any fragmentation
> etc.”
>
>
>
> ? Does that apply to both the table partition and the index?
>
>
>
> (A quick skim of what JL wrote makes me thing that index space available
> for new inserts is the problem, but I really just glanced.)
>
>
>
> ? Do you insert rows with key values and lots of attributes null and then
> immediately update the null attributes to the real values?
>
>
>
> (If you’re doing that, then do it someplace else (with a huge pctfree) and
> copy the fully formed rows into the real destination. This interim location
> does not need to be indexed, let the secondary insert into the “real”
> destination handle any relational integrity issues, and if your initial
> insert is really noisy with violations possibly test index it for
> uniqueness after the initial insert.)
>
>
>
> ? What is the size of the initial extent of the new partition compared to
> the amount of a batch being stuffed in?
>
>
>
> IF the answer to Dominic’s question is that your function is not direct
> insert friendly, then think about putting the fully formed rows into an
> interim location unindexed and then use those fully formed rows as a source
> for direct inserts into the “real” destination.
>
>
>
> Good luck.
>
>
>
> mwf
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Dominic Brooks
> *Sent:* Tuesday, November 28, 2023 4:24 PM
> *To:* loknath.73_at_gmail.com
> *Cc:* jlewisoracle_at_gmail.com; Oracle L
> *Subject:* Re: Same query with no plan change or volume but runs for
> hours vs minutes
>
>
>
> Is there a good reason for not using direct path inserts?
>
>
>
> Sent from my iPhone
>
>
>
> On 28 Nov 2023, at 21:03, Lok P <loknath.73_at_gmail.com> wrote:
>
>
>
> Thank you Jonathan.
>
> So, It means the time spent is really on doing INSERT only but not the
> "reading/select" part of the query. While loading , it spends a lot of time
> scanning blocks that are full and thus unable to insert rows in them.
> However, onething want to share here, that the target partition key/date is
> hard coded in this query, which means the data inserted into a fresh new
> blank partition each day. So ideally it should be a brand new segment
> without any fragmentation etc.
>
> *"When I asked about a "catastrophic" event, I wanted you to think whether
> there was any large-scale activity you might have done in the recent past.* "
>
>
> In the recent past this database moved from the older version of
> exadata to the latest X9 and also the database was moved to multitenant
> PDB. How can that be related? But then why do we see the slowness on
> certain days only but not daily when this data load happens?
>
>
>
> Also we see some high "block" count for almost the same amount num_rows,.
> I.e for the same ~30million rows , some partitions showing the blocks as
> ~1million and some are showing ~1.5million. Is this expected?.
>
>
>
> PARTITION_NAME NUM_ROWS BLOCKS AVG_ROW_LEN COMPRESSION
> PBDRA_DTL_11272023_P 33734616 1487844 197 DISABLED
> PBDRA_DTL_11262023_P 33469912 980070 198 DISABLED
> PBDRA_DTL_11252023_P 33527112 994892 198 DISABLED
> PBDRA_DTL_11242023_P 33948522 991827 198 DISABLED
> PBDRA_DTL_11232023_P 32667974 993492 198 DISABLED
> PBDRA_DTL_11222023_P 34653004 1529017 198 DISABLED
> PBDRA_DTL_11212023_P 33233838 968435 198 DISABLED
> PBDRA_DTL_11202023_P 32870086 1472631 198 DISABLED
> PBDRA_DTL_11192023_P 32485084 947193 198 DISABLED
>
>
>
> Yet to find the space usage percentage from the new segments/partitions.
> Hope i can use the scripts in the blog below for the same.
>
>
>
> https://jonathanlewis.wordpress.com/2016/09/11/space-usage/
>
>
>
> On Tue, Nov 28, 2023 at 11:38 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>
>
> an index on (large sequence value, date_value) should store about 310 to
> 320 index entries per leaf block when full.
>
> So the leaf node splits suggest you've inserted roughly 4602 * 310 = 1.4M
> rows in the bad case and 40928 * 310 = 12.7M rows in the good case.
>
> These figures are in line with the execute count (etc.) of 1424230
> and 12752800. This suggests that virtually every row that gets through the
> other join predicates executes the select statement in fun1 and survives
> the predicate check.
>
>
>
> The "index range scans" statistic match the "execution count" statistic in
> both cases, and the various figures for consistent gets suggest that the
> access path is the same in both cases (and is index only-only). So you can
> discount something odd happening with the function.
>
>
>
> Take the execute count as the number of rows inserted
>
> Take "ASSM gsp: get free block" - "leaf node splits" - "branch node
> splits" as the number of new blocks requested for the table.
>
> Divide "rows inserted" by "new table blocks" and the answer is about
> 34.5: multiply that by 203 (avg_row_len + 5 for the row overhead) and the
> answer is about 7,000 - so the figures about data size etc. hang together
> quite well, and make it reasonable to look for the problem in "other
> activity">
>
>
>
> So in the bad case the session requests some ASSM space for inserting
> table rows 41,500 times (46,150 - 4,600). To find that space is examines
>
> To find that space Oracle examines 42.6M Level 1 bitmap blocks: (ASSM
> gsp:L1 bitmaps examined 42601388)
>
> and 184M data blocks: (ASSM cbk:blocks examined 184273393)
>
> Of these(!), Oracle reports 633M rejected (ASSM gsp:reject db
> 622315351)
>
>
>
> I think the inconsistency in the examined/rejected tells us that there's
> an error somewhere in how Oracle is totalling up different subsets, and
> there are lots of ways you can kick around all the examine/reject figures
> to try and guess what they mean and how they've been summed incorrectly.
>
>
>
> What I do think, though, is that the 184M blocks examined, and the 42M L1
> bitmap blocks example are where your code is burning its excess CPU in the
> slow case -- checking blocks which aren't flagged as FULL, but can't accept
> the data that needs to be inserted. In passing 184M + 42M = 226M, which is
> most of the session logical I/O.
>
>
>
>
>
> When I asked about a "catastrophic" event, I wanted you to think whether
> there was any large-scale activity you might have done in the recent past.
> If I had wanted to know if you had done a big delete I would have asked if
> you had done a big delete. We need to find a reason why there are lots of
> blocks that seem to be marked as "space available" in the L1 bitmap, but
> aren't usable once Oracle gets to the block. (At some point in the past
> Oracle Corp. added code for a session to record a list of blocks that it
> had visited unsuccessfully "a few" times so that it stopped visiting them,
> but that may have changed since the initial implementation.)
>
>
>
> You could use the dbms_space.space_usage procedure to check the state of
> the blocks in one of the target partitions to see if there are lots of
> blocks that between 75 and 10% full. This might be a bit of information to
> pass to Oracle.
>
>
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> There exists only one composite (TXN_ID, WOD_DT) primary key index on this
> table-PBDRA_DTL. This table is daily range partitioned on a date column.
> And also the sequence -PBDRAD_SEQ does fed into the leading column(TXN_ID)
> of the PK index.
>
>
>
> The AVG_ROW_LEN i see in dba_tab_partition for last few of the partitions
> are showing ~198 and similar for others too.
>
> PARTITION_NAME NUM_ROWS BLOCKS AVG_ROW_LEN COMPRESSION
> PBDRA_DTL_11272023_P 33734616 1487844 197 DISABLED
> PBDRA_DTL_11262023_P 33469912 980070 198 DISABLED
> PBDRA_DTL_11252023_P 33527112 994892 198 DISABLED
> PBDRA_DTL_11242023_P 33948522 991827 198 DISABLED
> PBDRA_DTL_11232023_P 32667974 993492 198 DISABLED
> PBDRA_DTL_11222023_P 34653004 1529017 198 DISABLED
> PBDRA_DTL_11212023_P 33233838 968435 198 DISABLED
> PBDRA_DTL_11202023_P 32870086 1472631 198 DISABLED
> PBDRA_DTL_11192023_P 32485084 947193 198 DISABLED
>
> Select list doesnt have any plsql function.
>
> Today I have captured the sql monitor and related stats from v$sesstat for
> a ~5minutes delta period for a quick/good run in which the complete query
> finished in ~15minutes. I have posted the sql monitor and stats in same
> github link, The session stats i have posted as comments to the same github
> linc.
>
> https://gist.github.com/oraclelearner/722a77ef5ebce23396d82e2835958303
>
> When you mentioned "*some catastrophic activity on the base table might
> cause the ASSM related stats*", are you pointing to delete or anything?
> Because we normally dont performa DELETE but just use partition DROP for
> data purge on these tables. Below is the stats specific to ASSM i do
> observe , and those i have not noticed in past frequently. And also as we
> encounter this slow run intermittently, so not sure if we are encountering
> any bugs related to ASSM here?
>
>
>
>
-- http://www.freelists.org/webpage/oracle-lReceived on Wed Nov 29 2023 - 22:26:16 CET