Re: OOM killer terminating database on AWS EC2
Date: Tue, 14 Jan 2020 13:00:55 -0700
Message-ID: <CAJzM94Ccp9QXsEdurUj1Dw4+w_hGvTCweDWQfqg6agMrny0qMA_at_mail.gmail.com>
Mark,
It's been a busy morning, starting with the page just before midnight.
Spinnaker called me and got my account activated, so I've opened a ticket
with them. We'll see what they come back with. I wish I had better
answers to your questions. I'm pretty new to AWS and have had no training
to speak of. Having to cover for the DBA who left 2 weeks ago while being
ignorant about the configuration is not a happy place.
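For the swap and hugepages questions, the same figures can be read straight from /proc/meminfo. What follows is only a minimal sketch in Python, not something we have running here: the function names are made up for illustration, and it assumes a Linux host whose /proc/meminfo exposes the MemAvailable, SwapTotal, SwapFree and HugePages_* fields.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Most fields are in kB; the HugePages_* fields are page counts.
            values[name.strip()] = int(parts[0])
    return values


def summarize(values):
    """Pick out the numbers relevant to an OOM investigation."""
    return {
        "mem_total_kib": values.get("MemTotal", 0),
        "mem_available_kib": values.get("MemAvailable", 0),
        "swap_used_kib": values.get("SwapTotal", 0) - values.get("SwapFree", 0),
        "hugepages_total": values.get("HugePages_Total", 0),
        "hugepages_free": values.get("HugePages_Free", 0),
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and summarize the live values (Linux only)."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

Calling read_meminfo() from cron and logging the result would give a history to look at after the next kill. A hugepages_total of 0 would mean hugepages are not configured.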
top:
top - 19:59:10 up 4 days, 20:10, 2 users, load average: 0.11, 0.08, 0.13
Tasks: 385 total, 1 running, 384 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13070175+total, 1245316 free, 20508408 used, 10894803+buff/cache
KiB Swap: 5242876 total, 3013140 free, 2229736 used. 18428996 avail Mem
Sandy
On Mon, Jan 13, 2020 at 1:44 PM Mark J. Bobak <mark_at_bobak.net> wrote:
> Hi Sandy,
>
> I know it's (almost certainly) happening *way* above your level, but
> dropping Oracle support on *any* database, let alone a production database,
> is foolishness, and certainly *not* a cost savings, not in the long run.....
>
> I run Oracle on EC2, w/ mail enabled, and so far, have never run into an
> OOM situation. The system has to be *really* low on memory for the
> kernel's OOM killer to wake up and start killing stuff. When it does,
> Oracle is a big target, because it (almost certainly) is (and should be)
> the big memory consumer on your (EC2) instance.
>
> Some questions:
> 1.) What instance type(s) are you running? Do you have instance store
> volumes configured for swap? Do you have swap configured at all? What is
> the level of swap usage you are seeing?
> 2.) How is your Oracle memory usage configured? Do you have hugepages
> configured? (Please say yes....)
> 3.) What do the outputs of 'free -h' and 'top' tell you? How about
> 'vmstat'? 'sar -B'?
>
> -Mark
>
>
> On Mon, Jan 13, 2020 at 2:33 PM Sandra Becker <sbecker6925_at_gmail.com>
> wrote:
>
>> Server: AWS EC2
>> RHEL: 7.6
>> Oracle: 12.1.0.2
>>
>> We have a database on an AWS EC2 server that the OOM killer has
>> terminated twice in the last 5 days; both times it was the ora_dbw0_dwprod
>> process. On 1/8 postfix was enabled to allow us to email the DBA team
>> through an AWS relay server when a backup failed. We stopped running daily
>> backups and cronjobs that did a quick check for expired accounts. We've
>> left postfix enabled for sending emails. We are searching for answers but
>> have none yet as to why this is happening. We also no longer have Oracle
>> support available to us. (management saving money again).
>>
>> Questions:
>>
>> 1. Could postfix be related to the memory issues even though we
>> haven't sent any emails since the first crash 5 days ago?
>> 2. How can we monitor the memory usage of an EC2 instance?
>> 3. How do you disable the OOM killer in EC2 should we decide to go
>> that route? (we have it disabled on our on-prem servers) The docs I've
>> found so far have not been helpful.
>>
>> I appreciate any help you can give us or pointing us in the right
>> direction.
>>
>> Thank you,
>> --
>> Sandy B.
>>
>>
--
Sandy B.
--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 14 2020 - 21:00:55 CET