Re: Application of archived logs on physical standby very slow
Date: Thu, 28 Jan 2010 12:58:23 +0000
Message-ID: <7765c8971001280458q5a8ce9ek70f87bce8320c0_at_mail.gmail.com>
What does the 10046 trace for the MRP process show up? You may well also endup doing os level tracing with strace or similar.
On Thu, Jan 28, 2010 at 9:15 AM, Martin Bach <development_at_the-playground.de>wrote:
> Hi Niall!
>
> couldn't agree more. I seem to have the dilemma where my storage array
> seems to be in good working order and my OS doesn't report any unusually
> high IO service times. I'm wondering where else I could look for clues as to
> why, trying to follow a methodical approach to troubleshooting.
>
> The load is clearly caused by Oracle processes, shutting the standby down
> reduces the iowaits to 0 (but that doesn't really solve my problem :)
>
> Regards,
>
> Martin
>
>
> On 01/27/2010 07:06 PM, Niall Litchfield wrote:
>
>> I'd have thought this problem was eminently suited to method-r.
>>
>> On 1/27/10, Johnson, William L (TEIS)<WLJohnson_at_tycoelectronics.com>
>> wrote:
>>
>>> What do you have your parallel_execution_message_size set to on the
>>> standby
>>> database?
>>>
>>> -----Original Message-----
>>> From: oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org]
>>> On Behalf Of Martin Bach
>>> Sent: Wednesday, January 27, 2010 12:13 PM
>>> To: ORACLE-L
>>> Subject: Application of archived logs on physical standby very slow
>>>
>>> Dear listers,
>>>
>>> I'd like to add another "interesting" problem to today's interesting
>>> RMAN problem. I have quite a bit of output from diagnostic tools,
>>> hopefully this doesn't got too badly mangled...
>>>
>>> Our alerting system fired when my physical standby database went out of
>>> sync with production. DR and production share the same storage (an entry
>>> level MSA) and are multipathed. I could observe this on the hosts (both
>>> RHEL 5.2 32bit, don't ask):
>>>
>>> Wed Jan 27 10:42:42 2010
>>> RFS[2]: Archived Log: '/u03/oradata/dr/arch/1_49264_596779339.dbf'
>>> Primary database is in MAXIMUM PERFORMANCE mode
>>> Wed Jan 27 10:47:55 2010
>>> Media Recovery Log /u03/oradata/dr/arch/1_49252_596779339.dbf
>>> Wed Jan 27 10:55:27 2010
>>> Media Recovery Log /u03/oradata/dr/arch/1_49253_596779339.dbf
>>> Wed Jan 27 11:01:46 2010
>>>
>>> Online redo logs are 500M in size and so are most archived logs, and as
>>> you can see it takes 5-10 minutes to apply just one, which is consistent.
>>>
>>> top - 11:25:37 up 57 days, 17:34, 13 users, load average: 0.55, 1.05,
>>> 1.19
>>> Tasks: 209 total, 1 running, 208 sleeping, 0 stopped, 0 zombie
>>> Cpu0 : 2.8%us, 1.9%sy, 0.0%ni, 51.9%id, 43.5%wa, 0.0%hi, 0.0%si,
>>> 0.0%st
>>> Cpu1 : 1.9%us, 0.9%sy, 0.0%ni, 42.6%id, 54.6%wa, 0.0%hi, 0.0%si,
>>> 0.0%st
>>> Cpu2 : 0.9%us, 0.0%sy, 0.0%ni, 98.1%id, 0.9%wa, 0.0%hi, 0.0%si,
>>> 0.0%st
>>> Cpu3 : 0.0%us, 0.9%sy, 0.0%ni, 99.1%id, 0.0%wa, 0.0%hi, 0.0%si,
>>> 0.0%st
>>> Mem: 8310064k total, 8019968k used, 290096k free, 36456k buffers
>>> Swap: 8152948k total, 70124k used, 8082824k free, 7568912k cached
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>
>>> 21647 oracle 15 0 2178m 48m 31m S 3 0.6 0:00.41 ora_mrp0_dr
>>>
>>> 7496 oracle 15 0 2145m 1.5g 1.5g S 2 18.4 190:47.10
>>> ora_p000_dr
>>> 7500 oracle 15 0 2145m 1.5g 1.5g S 2 18.4 199:38.12
>>> ora_p002_dr
>>> 2580 root 10 -5 0 0 0 S 1 0.0 17:42.07
>>> [kjournald]
>>>
>>> 7498 oracle 16 0 2145m 1.5g 1.5g D 1 18.4 189:40.17
>>> ora_p001_dr
>>>
>>> The only processes on cpu are MRP0 and its parallel slaves. Funny
>>> enough, iostat doesn't seem to see a problem:
>>>
>>> avg-cpu: %user %nice %system %iowait %steal %idle
>>> 0.00 0.00 0.25 24.75 0.00 75.00
>>>
>>> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz
>>> avgqu-sz await svctm %util
>>> cciss/c0d0 0.00 4.00 0.00 8.00 0.00 48.00 12.00
>>> 0.01 0.88 0.50 0.40
>>> cciss/c0d1 0.00 5.00 0.00 18.00 0.00 92.00 10.22
>>> 0.02 0.89 0.06 0.10
>>> sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> sdc 0.00 211.00 0.00 189.00 0.00 1600.00 16.93
>>> 0.49 2.57 2.57 48.60
>>> sdd 0.00 203.00 0.00 183.00 0.00 1544.00 16.87
>>> 0.50 2.71 2.74 50.10
>>> sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> dm-2 0.00 0.00 0.00 786.00 0.00 3144.00 8.00
>>> 2.09 2.64 1.26 98.70
>>> dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> dm-5 0.00 0.00 0.00 786.00 0.00 3144.00 8.00
>>> 2.09 2.65 1.26 98.80
>>> dm-6 0.00 0.00 0.00 23.00 0.00 92.00 8.00
>>> 0.02 0.83 0.04 0.10
>>> dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>> dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 0.00 0.00 0.00 0.00
>>>
>>> Service time is pretty decent I'd say and the SAN administrators tell me
>>> that everything on the array is fine. I checked for AIO and that's also
>>> used:
>>>
>>> [oracle]$ cat /proc/slabinfo | grep ^k
>>> kioctx 25 60 192 20 1 : tunables 120 60
>>> 8 : slabdata 3 3 0
>>> kiocb 64 90 128 30 1 : tunables 120 60
>>> 8 : slabdata 3 3 0
>>>
>>> Data files are scattered over mount points /u02-/u06, and there are no
>>> multipathing (or other I/O related errors) reported in dmesg and
>>> /var/log/messages. Archive logs are received on the DR box in time, and
>>> there are no gaps. Our sys admin tells me that device mapper multipath
>>> is happily distributing IO round robin. Oh yes, and it's Oracle 10.2.0.4
>>> 32bit.
>>>
>>> Now does anyone have an idea on how to proceed?
>>>
>>> Regards,
>>>
>>> Martin
>>> --
>>> Martin Bach
>>> OCM 10g
>>> http://martincarstenbach.wordpress.com
>>> http://www.linkedin.com/in/martincarstenbach
>>> --
>>> http://www.freelists.org/webpage/oracle-l
>>>
>>>
>>> --
>>> http://www.freelists.org/webpage/oracle-l
>>>
>>>
>>>
>>>
>>
> --
> Martin Bach
> OCM 10g
> http://martincarstenbach.wordpress.com
> http://www.linkedin.com/in/martincarstenbach
>
-- Niall Litchfield Oracle DBA http://www.orawin.info -- http://www.freelists.org/webpage/oracle-lReceived on Thu Jan 28 2010 - 06:58:23 CST