RE: ASM of any significant value when switching to Direct NFS / NetApp / non-RAC?
Date: Mon, 20 Aug 2012 16:04:48 +0000
Message-ID: <9F15274DDC89C24387BE933E68BE3FD33A36BD_at_MISOUT7MSGUSR9D.ITServices.sbc.com>
What is your setting for this parameter?
SQL> alter system set "_high_priority_processes"='LMS*|VKTM|LGWR' scope=spfile sid='*';
System altered.
If LGWR is not running at RT (real-time) priority, that might be the reason behind the higher log file sync times.
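To see what the parameter is currently set to, something along these lines should work. This is only a sketch: it assumes you are connected as SYS (the x$ views are not visible to ordinary users), and since x$ksppi and x$ksppcv are undocumented, the column names may differ between releases.

```sql
-- Show the current value of the hidden parameter.
-- Assumes a SYS connection; x$ksppi holds parameter names,
-- x$ksppcv holds the current values, joined on indx.
select a.ksppinm  as name,
       b.ksppstvl as value
from   x$ksppi  a,
       x$ksppcv b
where  a.indx = b.indx
and    a.ksppinm = '_high_priority_processes';
```

Because the alter system above uses scope=spfile, the new value only takes effect after the instances are restarted, so the query will keep showing the old value until then.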
-----Original Message-----
From: Austin Hackett [mailto:hacketta_57_at_me.com]
Sent: Sunday, August 19, 2012 5:59 AM
To: CRISLER, JON A
Cc: Oracle-L_at_freelists.org
Subject: Re: ASM of any significant value when switching to Direct NFS / NetApp / non-RAC?
Hi Jon
Interesting - thanks for the info.
Yes, we also see those symptoms - a big spike in log file sync, accompanied by some GCS waits. When the spikes occurred, we checked CPU utilization on the storage controller, and it was less than 50%. Write latencies, IOPS, and throughput were all within acceptable limits, and actually much lower than in other periods when performance had been fine.
We're using dNFS, so we aren't using DM-multipath. Indeed, there is only a single storage NIC; a decision that predates me and that we're working to address. We are on OEL 5.4, which is interesting.
One idea is that this could be caused by an incorrect MTU on the storage NIC. It's currently set to 8000 (a setting I'm told was inherited when they switched from Solaris to Linux a while back), whereas it's 9000 on the filer and switch.
Out of curiosity, what has your biggest log write elapsed warning been? We see 1 or 2 spikes a week and the biggest has been 92 seconds - yes, 92 seconds!
On 17 Aug 2012, at 04:35, CRISLER, JON A wrote:
> Austin- we have observed the exact same behavior, and it appears to be
> periodic spikes on the NetApp controller / cpu utilization in a RAC
> environment. The info is fuzzy right now but if you have a LGWR
> delay, it also causes a GCS delay in passing the dirty block to
> another node that needs it. In our case it's a SAN-ASM-RAC
> environment, and the NetApp cpu is always churning above 80%. In our
> case we found that RH tuning and multipath issues contributed to the
> cause, which seems to have been mostly addressed with RH 5.8 (was 5.4).
> In a FC SAN environment something like Sanscreen that can measure end
> to end FC response time helped to narrow down some of the contributing
> factors. You can set an undocumented parameter to allow the gcs dirty
> block to be passed over to the other nodes while a lgwr wait occurs,
> but you risk data corruption in the event of a node crash (hence we
> passed on that tip).
--
http://www.freelists.org/webpage/oracle-l

Received on Mon Aug 20 2012 - 11:04:48 CDT