Re: IO Contention on the Redo Logs
Date: Wed, 14 Oct 2009 15:32:38 -0700 (PDT)
Message-ID: <0c7cfbd0-fd48-4431-b265-df25020648dd_at_e8g2000yqo.googlegroups.com>
On Oct 14, 4:58 pm, Pat <pat.ca..._at_service-now.com> wrote:
snip
> I've been trying to troubleshoot a troublesome (sic) performance issue
> on one of our busiest Oracle servers and I was hoping somebody here
> might have some insight into what I'm seeing.
>
> Every "now and then" (its not predictable), when the box is under a
> whole lot of IO load (lots of read/write activity), the whole box bogs
> down incredibly and I get about a 3 minute "hang". If you look at the
> wait tree, everybody is waiting on the log_file_parallel_write.
>
> Problem is, if I look at my SAR reports (or vmstat) during one of
> these 3 minute hangs, the IOs on the box drop to the floor, e.g. we do
> far fewer IOs during a hang than before and after. We're seeing maybe
> 25k blocks/sec in/out before and after, and drop down to 25-50 blocks/
> sec in/out during the hang.
It sounds like other workload on the SAN may be impacting you: you are trying to do IO but cannot get it serviced quickly.
My systems see around 1 to 2 ms for log file parallel write. And 25k blocks/sec is when things are good? Ouch!
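If you want to see how your latency actually distributes, the event histogram is a quick check. A minimal query, assuming you are on 10g or later (v$event_histogram showed up in 10g):

    select wait_time_milli, wait_count
      from v$event_histogram
     where event = 'log file parallel write';

A pile of waits in the 1-2 ms buckets with a handful out past 1000 ms would match the "fine except for the hangs" pattern you describe.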
> We've engaged Oracle support on this, and, while they're not certain
> they know what's going on, they have pointed to generally poor
> performance of the IO subsystem when writing REDO logs.
>
> Right now, the entire database has everything mounted on a single
> Fibre Channel LUN on /u01. Even the REDO logs are on that same LUN.
You really want to have separate LUNs. All my redo logs for any kind of production system are on RAID 10. You probably do not want RAID 5 for redo logs.
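Once the storage guys give you a dedicated mount, moving the logs is just add-and-drop. A rough sketch only; the /u02 path, size, and group numbers here are made up, so adjust them to your layout:

    alter database add logfile group 5 ('/u02/redo/redo05.log') size 512m;
    alter database add logfile group 6 ('/u02/redo/redo06.log') size 512m;
    -- switch until the old groups show INACTIVE in v$log, then drop them
    alter system switch logfile;
    alter database drop logfile group 1;

No downtime needed; you just cannot drop a group while it is CURRENT or ACTIVE.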
> One of the things the storage guys have been pointing to is that
> there's a single IO queue on our QLogic cards per LUN, so my REDO
> traffic is, in fact, fighting its way down the same scheduler queue as
> my normal data blocks, and they've suggested provisioning a new pair of
> smaller LUNs, one for each half of the REDO log group.
Sure.
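While you wait on provisioning, extended iostat output captured during one of the hangs would tell you whether requests are queuing on the host or stuck out at the array (sysstat's iostat, 5 second samples):

    iostat -xk 5
    # during a hang, watch await and avgqu-sz for the /u01 device;
    # a modest queue with await in the hundreds of ms points at the array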
> Another thing that's been suggested is that I switch the RedHat IO
> scheduler from CFQ to NOOP and just let the HBA and SAN handle the
> block reordering. I'm dubious about this one though since I'm not
> seeing a bottleneck on the host scheduler and I have to assume there's
> some benefit to the block reordering going on here.
Dunno about that, but we are using ASM, which is pretty similar to raw devices, and the async IO setup goes along with all that.
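That said, noop is cheap to test at runtime if you want to rule it in or out; no reboot needed. sdb below is a stand-in for whatever device actually backs your LUN:

    cat /sys/block/sdb/queue/scheduler          # current elevator shown in brackets
    echo noop > /sys/block/sdb/queue/scheduler
    # add elevator=noop to the kernel boot line if you want it to persist

And if you stay on a plain filesystem rather than ASM, it is worth checking filesystemio_options; setall enables both async and direct IO where the platform supports it.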
My question is ... what are the SAN people telling you about what else is impacting the SAN when your IO throughput goes through the floor?