Re: RMAN innocent bystanders killed on linux

From: Rajeev Prabhakar <rprabha01_at_gmail.com>
Date: Thu, 28 Feb 2008 15:03:19 -0500
Message-ID: <2ba656800802281203i4976a7bdk597a97c38c4d8f6a@mail.gmail.com>


Hello,

Given our recent experience, I am not 100% sure this issue is confined to 2.4
kernels. Here is what we saw...

A few weeks back we were facing instance crashes on a RAC cluster (10.2.0.3,
Linux 2.6.9-55.0.6.0.1.ELsmp). They occurred only during the RMAN backup
window, and subsequent troubleshooting and research led us to reduce the
parallelism and filesperset in the RMAN configuration. That has so far avoided
the zero memory/swap scenario we saw in some Oracle trace files, and we haven't
had any instance crashes during the RMAN backup window since then. However, the
OS utilities continued to show a relatively "normal" system from a memory/swap
standpoint during those problematic backup windows. So, given what we have
seen, I would agree with Christo that this is an issue associated with
large/heavy I/O operations and the filesystem cache.
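
For anyone who wants to watch memory, swap and filesystem cache during a backup
window, here is a minimal sketch. It is an illustration only, not the tooling
we used. It assumes a Linux host exposing the usual /proc/meminfo fields
(MemTotal, MemFree, Buffers, Cached, SwapFree). The function names
parse_meminfo and cache_summary are made up for this example.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # kB for the memory fields
    return info


def cache_summary(info):
    """Summarize free memory, free swap and buffer/page cache usage."""
    total = info.get("MemTotal", 0)
    cache = info.get("Cached", 0) + info.get("Buffers", 0)
    return {
        "free_kb": info.get("MemFree", 0),
        "cache_kb": cache,
        "swap_free_kb": info.get("SwapFree", 0),
        "cache_pct": round(100.0 * cache / total, 1) if total else 0.0,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Take one timestamped sample; run it from cron or a loop during the backup.
    with open("/proc/meminfo") as f:
        summary = cache_summary(parse_meminfo(f.read()))
    print(time.strftime("%Y-%m-%d %H:%M:%S"), summary)
```

Sampling this every minute or so across the backup window would at least show
how much of the memory is sitting in cache. As noted above, though, these
figures looked normal to us while the crashes were happening, so they may not
reveal the problem.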

-Rajeev

On Thu, Feb 28, 2008 at 1:38 PM, Christo Kutrovsky <kutrovsky.oracle_at_gmail.com> wrote:
> Hello,
>
> This is a known issue with 2.4 kernels. It has less to do with low
> memory than with incorrect memory accounting by the OOM module.
> It is related to large file I/O operations, which use a lot of file
> system cache.
>
> Enable DIRECTIO (filesystemio_options=directio). In the 2.4 kernel you
> have either DIRECTIO or ASYNC for ext3 (I am assuming you are using
> ext3), not both. If you set "setall", async will take precedence.
>
> Note that this will only help you with your duplicate. If you start a
> "cp", some process will still get killed. I believe there's a bugfix
> for the 2.4 kernel. Make sure you are using the latest 2.4 kernel.
>
> If you really need more info, I can try to look up the kernel that had
> this issue, and the kernel that did not.
>
>
> --
> Christo Kutrovsky
> DBA Team Lead
> The Pythian Group - www.pythian.com
> I blog at http://www.pythian.com/blogs/

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Feb 28 2008 - 14:03:19 CST
