RE: High Buffer busy waits

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 28 Jul 2023 22:02:44 -0400
Message-ID: <526001d9c1c0$c48da8c0$4da8fa40$_at_rsiz.com>



If you have 4 log groups in a 2 node rac you probably only have 2 on the node that is doing all the inserting. So you probably want to increase that. Its not going to use the redo for the other node.  

I prefer 10 online logs on each active node and over time you can see how close you get to wrapping and having to checkpoint wait from the history of times each becomes active.  

Some guy used to write a lot about this and say stuff like “Disk is cheap, especially when lack of using enough disk is causing you a problem.”  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of yudhi s Sent: Friday, July 28, 2023 6:31 PM
To: Priit Piipuu; Jonathan Lewis; Tim Gorman; Timur Akhmadeev; Mark W. Farnham; Lothar Flatz; Oracle L Subject: Re: High Buffer busy waits  

Resending as the initial reply got bounced back.......  

Thank You So much.  

I verified in v$log, we have 4 groups and each having size is ~4GB. And no changes have been done in this recently. And not seeing any change in read/write IO response time for the storage cell servers which could be suspected here.  

Also Jonathan when you mentioned below, can you please give some more pointers , "how can i get the block addresses for the bmbs show up in the bbws" , considering the fact that we have the AWR data with us, as the issues occurred in the past?  

" The way in which Oracle associates processes with data blocks does change with the size of the extent, and I have wondered whether the algorithm goes a little further and allocates more level 1 bitmap blocks as the number of inserting processes grows. You could examine the block addresses of the bmbs that show up in bbws to see how many different blocks are involved in a short period of time, this might give us some clues about how the feature works in extreme cases"  

Additionally I think if the smaller value of initrans would be a factor here, then we would have seen "enq: TX - allocate ITL entry" as the event. Correct me if I'm wrong here.  

When I run the script which "Timur Akhmadeev" has pointed to, to see the amount of redo log needed , I am seeing some of the specific times of the day when the demand is showing very high i.e. 20-40GB . Wondering if it's really pointing towards a smaller REDO or something else? and yes it's a 2 Node RAC system but the application is configured to run on one dedicated service/instance only.  

I have added the full redo log trends for 3days, with each snap for the whole day in the below location.  

https://gist.github.com/databasetech0073/bb4f8c4a7251d513775eebe2ead7bcda  

Below is a sample redo output from Timur's script, where it shows ~20-40GB Redo requirement.

TIM CPU_SEC PHY_READS PHY_WRITES CR_SERVED CURRENT_SERVED REDO_MB PROCESSES AVG_DF_SEQ AVG_DF_SCAT REDO_DIFF_TO_MD_PCT AVG_LFPW AVG_LOG_SYNC LOG_CKPT_SEC REDO_NEEDED 20230722 2000

6333.12

18370113

2806698

2126

674

12038.07

1171    

323.9741

0.119394

0.261889

2070.364

27539.44

20230723 0530

12983.13

14971155

3009666

1566

390

16915.52

1166    

495.7551

0.182841

0.217913

1135.729

21251.59

20230723 2000

5865.12

17578971

2544062

2529

1197

10661.8

1125    

275.5025

0.108198

0.180787

3583.654

42172.39

20230723 2045

6125.02

9263717

2993726

16254

14899

14932.66

1103    

425.92

0.111488

0.446633

3245.135

53604.53

20230723 2145

4593.99

5385775

1848283

19228

12376

8389.169

1158    

195.4619

0.094307

0.177062

1998.818

18549.14    

On Fri, Jul 28, 2023 at 11:11 AM Priit Piipuu <priit.piipuu_at_gmail.com> wrote:

Heh! I guessed it's RAC just by looking at the wait chains. There have been some bugs in 19c about dbwr not picking up writes, for example Bug 33973908.  

On Thu, 27 Jul 2023, 00:48 yudhi s, <learnerdatabase99_at_gmail.com> wrote:

Missed to mention, it's version 19C of oracle and on a exadata box.  

On Thu, 27 Jul, 2023, 2:46 am yudhi s, <learnerdatabase99_at_gmail.com> wrote:

Hello Listers,  

We suddenly see higher response time for our incoming transactions during certain time periods. and then tried running tanels DASH_WAIT_CHAIN getting below wait chains. It's pointing to high 'Buffer busy waits' and the top sqls it's pointing to is a single row INSERT query.  

And as ist a single row insert , so the plan_hash_value is showing as '0' and also the Avg number of execution of the Insert query remains same. So wondering what can be the cause of the issue? Can it be associated with something related to infrastructure as the final blocker showing as "DB writer"?

 or

As the current_obj# in ASH is pointing to table only and it has default settings i.e. ini_trans - 1 and max_trans- 255 and we are using ASSM. Should we consider increasing the value of ini_trans here?    

https://github.com/tanelpoder/tpt-oracle/blob/master/ash/dash_wait_chains.sql    

%This

SECONDS AAS DISTINCT_SIDS WAIT_CHAIN FIRST_SEEN LAST_SEEN 38%

5710

15.9

19

-> USER1:(PROGMGR) buffer busy waits [1st level bmb] -> USER1:(PROGMGR) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]

7/25/2023 5:46

7/25/2023 5:46

20%

3050

8.5

166

-> USER1:(PROGMGR) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]

7/25/2023 5:46

7/25/2023 5:46

7%

990

2.8

47

-> USER1:(PROGCAP) ON CPU
7/25/2023 5:46

7/25/2023 5:51

5%

770

2.1

40

-> USER1:(PROGCAP) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]

7/25/2023 5:46

7/25/2023 5:46

5%

700

1.9

16

-> USER1:(PROGMGR) buffer busy waits [data block] -> USER1:(PROGMGR) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]

7/25/2023 5:46

7/25/2023 5:46

3%

390

1.1

5

-> USER1:(PROGCAP1) buffer busy waits [data block] -> USER1:(PROGCAP1) log file switch (checkpoint incomplete) -> [idle blocker 1,2395,34375 (oracle_at_XXXXXXXXXXX (DBW0))]

7/25/2023 5:46

7/25/2023 5:46

2%

330

0.9

5

-> RPT_USER:(JDBC Thin Client) ON CPU

7/25/2023 5:46

7/25/2023 5:51

2%

290

0.8

6

-> ADMIN:(JDBC Thin Client) ON CPU

7/25/2023 5:46

7/25/2023 5:51

2%

270

0.8

22

-> USER1:(PROGMGR) ON CPU
7/25/2023 5:46

7/25/2023 5:51  

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Jul 29 2023 - 04:02:44 CEST

Original text of this message