RE: High Buffer busy waits

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 27 Jul 2023 07:14:06 -0400
Message-ID: <50b201d9c07b$770336b0$6509a410$_at_rsiz.com>



A useful thing to do is review the total volume of redo log switched through compared with the number of inserts completed in a given time. That volume, multiplied by the number of inserts you attempted and divided by the number completed, gives you a reasonable minimum target for the redo you need in order not to wait on it. Round up.  
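
A sketch of the redo side of that arithmetic (just v$log_history and v$log; the counts of inserts attempted/completed would come from your own application figures or AWR):

-- Log switches per hour multiplied by the (largest) online log size
-- gives a rough figure for the redo volume switched through per hour.
SELECT TRUNC(h.first_time, 'HH')                AS hour,
       COUNT(*)                                 AS log_switches,
       ROUND(COUNT(*) * l.bytes / 1024 / 1024)  AS approx_redo_mb
FROM   v$log_history h,
       (SELECT MAX(bytes) AS bytes FROM v$log) l
GROUP  BY TRUNC(h.first_time, 'HH'), l.bytes
ORDER  BY 1;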

IF your load is peaky and you have a good prediction of when a peak will start, you can do a manual log switch and checkpoint shortly before the onslaught begins. The value of doing this diminishes as the number of log groups rises, but at 10 log groups it gives you up to 10% extra space before you wait on the incomplete checkpoint, and that checkpoint should complete relatively quickly during a lull compared to the peak.  
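
For example, run a few minutes ahead of the expected peak (one way to do it; a dbms_scheduler job or cron would both work):

-- Switch to a fresh online log and flush dirty buffers before the peak starts.
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;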

IF your load is relatively constant this won't do anything useful. Having many large online redo log groups really only costs you disk space.  

IF there are signs that the archive process is falling behind, there are several ways to improve archive throughput, starting with making sure the archiver is not starved for CPU and that the archive destination has plenty of available write throughput capacity, disjoint from the resources used by the multiple concurrent single row inserters. (That includes network and non-Oracle load.)  
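
A quick sanity check for that (a sketch, just reading v$log) is whether un-archived groups are stacking up behind the current one:

-- Several groups with ARCHIVED = 'NO' (other than the CURRENT one)
-- suggest the archiver cannot keep up with the rate of log switches.
SELECT group#, thread#, sequence#, status, archived
FROM   v$log
ORDER  BY thread#, sequence#;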

Also, just curious, are these single row inserts also single row commits?  

Good luck.  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis
Sent: Thursday, July 27, 2023 4:06 AM
To: Oracle L
Subject: Re: High Buffer busy waits    

As Tim has said, your primary problem is the "checkpoint incomplete" waits. To avoid waits on "log file switch (checkpoint incomplete)" you generally need more redo log. Tim has suggested making the existing log files larger, but if you don't like the idea of very large log files you could just add more files of the same size; so long as you don't pick a silly (tiny) size, it's the total volume of the redo log that matters most, not the specific size chosen for the files.  
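
If you do add files, it would be something along these lines (group numbers, size and file names below are only placeholders - match the size of your existing groups, and with OMF/ASM you can omit the file name):

-- Add extra online redo log groups of the same size as the existing ones.
ALTER DATABASE ADD LOGFILE GROUP 11 ('/u01/oradata/ORCL/redo11.log') SIZE 2G;
ALTER DATABASE ADD LOGFILE GROUP 12 ('/u01/oradata/ORCL/redo12.log') SIZE 2G;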

Since you refer to this as a "sudden" occurrence, it is possible that the log writer was temporarily delayed by an unusual slowdown in the database writer - so it's worth checking whether there was some load on your I/O subsystem (possibly from outside Oracle) that slowed down Oracle's I/O rates.  
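
One way to check that (a sketch using v$event_histogram, which covers the current instance lifetime; the dba_hist views would give you the picture for a past window):

-- Latency distribution of the writes done by DBWR and LGWR.
-- A fat tail at the high-millisecond end points at a struggling I/O subsystem.
SELECT event, wait_time_milli, wait_count
FROM   v$event_histogram
WHERE  event IN ('db file parallel write', 'log file parallel write')
ORDER  BY event, wait_time_milli;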

You'll notice that the buffer busy waits at the top of the list are labelled "[1st level bmb]" - that's about space management blocks, not about competition on the data blocks, so changing the initrans setting wouldn't make a difference. The much smaller percentage of buffer busy waits labelled [data block] would be the ones that might improve with an increase in initrans, but I wouldn't mess about with that setting until after I'd checked the effect of eliminating the checkpoint waits.  

An idea that I've not experimented with, for systems like yours that have a large number of concurrent processes doing single row inserts, is whether the size of the extents makes any difference. The way in which Oracle associates processes with data blocks does change with the size of the extent, and I have wondered whether the algorithm goes a little further and allocates more level 1 bitmap blocks as the number of inserting processes grows. You could examine the block addresses of the bmbs that show up in the buffer busy waits to see how many different blocks are involved in a short period of time; this might give us some clues about how the feature works in extreme cases.  
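
A starting point for that check could be something like this (a sketch against v$active_session_history; for buffer busy waits p1 is the file number and p2 the block number - adjust the time window to your problem period):

-- How many distinct blocks were being fought over, and how hard?
SELECT p1 AS file#, p2 AS block#, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  event = 'buffer busy waits'
AND    sample_time > SYSTIMESTAMP - INTERVAL '15' MINUTE
GROUP  BY p1, p2
ORDER  BY samples DESC;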

Regards

Jonathan Lewis          

On Wed, 26 Jul 2023 at 22:17, yudhi s <learnerdatabase99_at_gmail.com> wrote:

Hello Listers,  

We suddenly see higher response times for our incoming transactions during certain time periods, so we ran Tanel's dash_wait_chains script and got the wait chains below. They point to high 'buffer busy waits', and the top SQL they point to is a single row INSERT query.  

As it's a single row insert, the plan_hash_value shows as '0', and the average number of executions of the INSERT query remains the same. So we are wondering what the cause of the issue could be. Could it be something related to the infrastructure, as the final blocker shows as "DB writer"?

Or, as the current_obj# in ASH points only to the table, which has the default settings (ini_trans = 1, max_trans = 255) and we are using ASSM, should we consider increasing the value of ini_trans here?    
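
(For reference, what we are looking at is roughly the below - owner and table name are placeholders, and we understand an increased INITRANS would only apply to newly formatted blocks:)

-- Current settings on the segment.
SELECT ini_trans, max_trans
FROM   dba_tables
WHERE  owner = 'USER1' AND table_name = 'SOME_TABLE';

-- If we do decide to change it:
ALTER TABLE user1.some_table INITRANS 4;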

https://github.com/tanelpoder/tpt-oracle/blob/master/ash/dash_wait_chains.sql    

%This  SECONDS  AAS   DISTINCT_SIDS  FIRST_SEEN      LAST_SEEN       WAIT_CHAIN
38%    5710     15.9  19             7/25/2023 5:46  7/25/2023 5:46  -> USER1:(PROGMGR) buffer busy waits [1st level bmb] -> USER1:(PROGMGR) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]
20%    3050     8.5   166            7/25/2023 5:46  7/25/2023 5:46  -> USER1:(PROGMGR) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]
7%     990      2.8   47             7/25/2023 5:46  7/25/2023 5:51  -> USER1:(PROGCAP) ON CPU
5%     770      2.1   40             7/25/2023 5:46  7/25/2023 5:46  -> USER1:(PROGCAP) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]
5%     700      1.9   16             7/25/2023 5:46  7/25/2023 5:46  -> USER1:(PROGMGR) buffer busy waits [data block] -> USER1:(PROGMGR) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]
3%     390      1.1   5              7/25/2023 5:46  7/25/2023 5:46  -> USER1:(PROGCAP1) buffer busy waits [data block] -> USER1:(PROGCAP1) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]
2%     330      0.9   5              7/25/2023 5:46  7/25/2023 5:51  -> RPT_USER:(JDBC Thin Client) ON CPU
2%     290      0.8   6              7/25/2023 5:46  7/25/2023 5:51  -> ADMIN:(JDBC Thin Client) ON CPU
2%     270      0.8   22             7/25/2023 5:46  7/25/2023 5:51  -> USER1:(PROGMGR) ON CPU

--
http://www.freelists.org/webpage/oracle-l