Re: IO Contention on the Redo Logs

From: joel garry <joel-garry_at_home.com>
Date: Wed, 14 Oct 2009 15:59:25 -0700 (PDT)
Message-ID: <850c3152-67ae-4b4f-b313-13d882ea6cb3_at_u16g2000pru.googlegroups.com>



On Oct 14, 1:58 pm, Pat <pat.ca..._at_service-now.com> wrote:
> I've been trying to troubleshoot a troublesome (sic) performance issue
> on one of our busiest Oracle servers and I was hoping somebody here
> might have some insight into what I'm seeing.
>
> Every "now and then" (it's not predictable), when the box is under a
> whole lot of IO load (lots of read/write activity), the whole box bogs
> down incredibly and I get about a 3 minute "hang". If you look at the
> wait tree, everybody is waiting on the log_file_parallel_write.
>
> Problem is, if I look at my SAR reports (or vmstat) during one of
> these 3 minute hangs, the IOs on the box drop to the floor e.g. we do
> a lot fewer IOs during a hang than before and after. We're seeing maybe
> 25k blocks/sec in/out before and after, and drop down to 25-50 blocks/
> sec in/out during the hang.
>
> We've engaged Oracle support on this, and, while they're not certain
> they know what's going on, they have pointed to generally poor
> performance of the IO subsystem when writing REDO logs.
>
> Right now, the entire database has everything mounted on a single
> Fiber Channel LUN on /u01. Even the REDO logs are on that same LUN.
>
> One of the things the storage guys have been pointing to is that
> there's a single IO queue on our QLogic cards per lun, so my REDO
> traffic is, in fact, fighting its way down the same scheduler queue as
> my normal data blocks and they've suggested provisioning a new pair of
> smaller luns, one for each half of the REDO log group.
>
> Another thing that's been suggested is that I switch the RedHat IO
> scheduler from CFQ to NOOP and just let the HBA and SAN handle the
> block reordering. I'm dubious about this one though since I'm not
> seeing a bottleneck on the host scheduler and I have to assume there's
> some benefit to the block reordering going on here.
>
> So I suppose my questions to the group are:
>
> 1) Has anybody else seen similar "hangups" with the characteristic
> lack of IO throughput I identified above?
> 2) If you're deploying Oracle on a SAN, what, in your experience, is
> the optimal layout of files on LUNs? I know how I lay things out on
> DASD, but the rules in the SAN world look to be subtly different.
> 3) Does anybody have any experience tweaking the RedHat IO schedulers?
> What are folks' experiences with the different options?
>
> Particulars:
> Oracle: 10.2.0.4
> Host: 8 cores (intel) 32G
> OS: RedHat EL 5
> Storage: Netapp 3040
> HBA: QLogic

Don't know anything about it, but this has an interesting graph: http://www.redhat.com/magazine/008jun05/features/schedulers/ . The comment about not wanting to use noop unless you have a saturated cpu seems reasonable too, although simple changes to, say, SGA size or some programs can change the cpu characteristics enormously.
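For what it's worth, on RHEL 5 you can check and flip the scheduler per device at runtime before committing to anything permanent; a rough sketch (sda is a placeholder for whatever device backs /u01):

```shell
# Show the available schedulers for a device; the bracketed entry is active.
cat /sys/block/sda/queue/scheduler
# typically prints something like: noop anticipatory deadline [cfq]

# Switch that one device to noop at runtime (reverts on reboot):
echo noop > /sys/block/sda/queue/scheduler

# To make it stick across reboots, add elevator=noop to the kernel line
# in /boot/grub/grub.conf (note that sets it globally, not per device).
```

Flipping it at runtime on one LUN means you can test the noop theory during load and back it out instantly if it makes things worse.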

Note that redo is the Achilles' heel of Oracle - if you mess it up, you can lose data. That's why you want to have it on the fastest possible serially writing device, with redundancy. That separate LUN suggestion may be what you need. It is entirely possible that you simply fill up the hardware buffer and then everyone has to wait - the lgwr can't continue until it is told the write has really happened, and that's not something you want to turn off (google asynchronous commit oracle if you want to be stupid, but note that pl/sql does it, because it is smart enough to wait for final ack).
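On that note, it's worth confirming lgwr really is the slow part before moving LUNs around. A sketch of how you might compare the redo write waits before and during a hang, assuming you can get a sysdba sqlplus session on the box (10.2 exposes time_waited_micro in v$system_event):

```shell
# Snapshot the cumulative lgwr-related waits; run before and during a
# hang and diff the numbers. Login details are placeholders.
sqlplus -s / as sysdba <<'EOF'
set linesize 120
column event format a30
select event, total_waits,
       round(time_waited_micro / 1000) as time_waited_ms
from   v$system_event
where  event in ('log file parallel write', 'log file sync');
EOF
```

If 'log file parallel write' time jumps while total_waits barely moves during the hang, that looks like a slow device rather than a burst of commits.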

Check out the first (especially the links) and last posts here: http://www.linux-archive.org/device-mapper-development/8763-performance-considerations-io-schedulers-dmmultipathing.html

It's worth it to read through this entire thread: http://www.freelists.org/post/oracle-l/log-writer-tuning

Not really apropos, but interesting thoughts on investigating this kind of thing nonetheless:
http://oraclesponge.wordpress.com/2006/10/02/linux-26-kernel-io-schedulers-for-oracle-data-warehousing-part-ii/

jg

--
_at_home.com is bogus.
http://www.networkworld.com/community/node/46155
Received on Wed Oct 14 2009 - 17:59:25 CDT
