RE: Redo Log improvement - how to analyze and attack?
Date: Thu, 15 Nov 2012 15:29:02 -0600
Message-ID: <F05D8DF1FB25F44085DB74CB916678E885651242A3_at_NADCWPMSGCMS10.hca.corpad.net>
Luis,
Any idea how much its slowing down my system? How can we (as DBAs) quantify the "how much" is "too much"?
This is kind of turned into a thought exercise I guess..I'm not trying to be an a$$.
Looks to me like my wait time over 5 hours was 998s - that's about 16 minutes.
I would "assume" that is high - but is it really? How do we know?
Is anyone waiting on it? No - it's a batch process that runs from 00:00 - 05:00 in the morning.
This goes back to my first question I guess which was: "Are high numbers of log file switches (during certain hours) enough validation for that an improvement is needed?"
Avg %Time Total Wait wait Waits Wait Class Waits -outs Time (s) (ms) /txn -------------------- ---------------- ------ ---------------- ------- --------- Configuration 63,949 .6 998 16 0.7 Avg %Time Total Wait wait Waits Event Waits -outs Time (s) (ms) /txn ---------------------------- -------------- ------ ----------- ------- --------- log file sequential read 32,177 .0 648 20 0.3 log file switch(ckpt incomp 10,330 1.0 572 55 0.1 log file sync 59,236 .0 183 3 0.6 log file switch completion 1,706 3.5 138 81 0.0 log file switch (archiving n 2,927 .2 121 41 0.0 Log archive I/O 27,030 .0 22 1 0.3 Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- redo log space requests 114,499 6.4 1.2 redo log space wait time 85,008 4.7 0.9 redo ordering marks 1,597,131 88.7 16.7 redo size 26,127,511,776 1,451,121.6 272,624.5 redo subscn max counts 163,257 9.1 1.7 redo synch time 18,839 1.1 0.2 redo synch writes 58,819 3.3 0.6 redo wastage 151,718,608 8,426.5 1,583.1 redo write time 131,085 7.3 1.4 redo writer latching time 136 0.0 0.0 redo writes 499,869 27.8 5.2
Instance Activity Stats - Thread ActivityDB/Inst: CCMNASP1/CCMNASP1N1 Snaps:
-> Statistics identified by '(derived)' come from sources other than SYSSTAT
Statistic Total per Hour -------------------------------- ------------------ --------- log switches (derived) 711 142.16 -------------------------------------------------------------
From: Luis [mailto:lcarapinha_at_gmail.com]
Sent: Thursday, November 15, 2012 2:11 PM
To: Taylor Christopher - Nashville
Cc: oracle-l_at_freelists.org
Subject: Re: Redo Log improvement - how to analyze and attack?
From my humble experience and for a OLTP system (batch jobs?), 300 redo log switchs per hour is _VERY_ high. Some say that rule of thumb is 2 to 5 redo log switches per hour/max, so you are getting it too high. Please consider increase you log file size.
Looking at your AWR, the system is producing too much redo before LGWR can actually do its work, so users need to wait for it (wait: log file switch completion). As expected, you also have a Incomplete Checkpoint wait, that means that your DBWR is not able to complete the checkpoint as of course, it is slowing down your operations..
On Wed, Nov 14, 2012 at 9:49 PM, <Christopher.Taylor2_at_parallon.net<mailto:Christopher.Taylor2_at_parallon.net>> wrote:
I have this 3-Node RAC (10.2.0.4) that I've inherited (as I've mentioned) and after putting out a few issues I'm ready to move onto the redo logs (and perhaps I should have started here earlier).
First question:
Are high numbers of log file switches (during certain hours) enough validation for that an improvement is needed?
In my case, during heavy batch processing, this RAC cluster may experience 300 log switches during an hour (I know, I know - sounds bad doesn't it)
(Caveat: The log file switches were generated from within Toad - I'm assuming they are valid)
My first step was to enable FAST_START_MTTR_TARGET and it is set for 3 minutes (180 seconds).
Here's some of my numbers for the time interval on Node1:
Redo Size:
1,451,121.63 per second
AWR redo/log stats (formatted) here:
https://gist.github.com/4075050
Second question:
V$INSTANCE_RECOVERY OPTIMAL_LOGFILE_SIZE = 5421 (~5 GB)
Is OPTIMAL_LOGFILE_SIZE calculated by the *EXISTING LOGFILE GROUPS*? So if I had more groups, would this number go down?
Third Question:
I've seen various mentions of " archive_lag_target" - what do I need to think about regarding this parameter in relation to check points?
Any suggestions/thoughts are appreciated.
Regards,
Chris Taylor
Oracle DBA
Parallon IT&S
christopher.taylor2_at_parallon.net<mailto:christopher.taylor2_at_parallon.net><mailto:christopher.taylor2_at_parallon.net<mailto:christopher.taylor2_at_parallon.net>>
www.parallon.net<http://www.parallon.net>
-- http://www.freelists.org/webpage/oracle-l -- Cumprimentos, Lu�s Marques -- http://www.freelists.org/webpage/oracle-lReceived on Thu Nov 15 2012 - 22:29:02 CET