Re: To Swap, or not to Swap
Date: Thu, 30 Mar 2023 14:51:54 -0400
Message-ID: <CAP79kiQ3i6gFjHyBy-t0drm3eYmxoH+tUo5GZ+779+wHqk5ghA_at_mail.gmail.com>
What I mean is that, after recently going through an OOM event that killed our databases (caused by a memory leak), we now monitor swap usage and MemAvailable over time, especially MemAvailable.
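For anyone who wants to try the same kind of monitoring, here is a minimal sketch of the sampling side, not our actual script. It assumes a Linux kernel new enough to report MemAvailable in /proc/meminfo; the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_snapshot(text):
    """Reduce meminfo text to the two numbers being tracked."""
    info = parse_meminfo(text)
    return {
        "mem_available_kb": info["MemAvailable"],
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }


def read_snapshot(path="/proc/meminfo"):
    """Take one sample from the live system (Linux only)."""
    with open(path) as f:
        return memory_snapshot(f.read())
```

Calling read_snapshot() from cron or a small loop and appending the result to a log with a timestamp gives the "over time" history.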
If MemAvailable drops below a threshold we can kill individual processes that are hogging memory. If some memory has been paged out to swap, we can get that back into RAM by doing a swapoff/swapon once the offending process has been terminated.
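A rough sketch of the two checks involved, with hypothetical helper names and an arbitrary 1 GiB headroom default. It ranks processes by VmRSS from /proc/<pid>/status and checks that the swapped-out pages will fit back into RAM before a swapoff is attempted, since swapoff has to pull every swapped page back into memory. Note that RSS includes shared memory a process has touched, so Oracle processes attached to a non-hugepage SGA can look bigger than their private usage.

```python
import os


def rss_kb_from_status(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or 0 if the field
    is absent (kernel threads have no VmRSS line)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def top_rss(proc_root="/proc", count=5):
    """Return up to `count` (rss_kb, pid) tuples, largest RSS first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        usage.append((rss_kb_from_status(text), int(entry)))
    return sorted(usage, reverse=True)[:count]


def safe_to_swapoff(mem_available_kb, swap_used_kb, headroom_kb=1024 * 1024):
    """True if everything in swap fits in RAM with headroom_kb to spare.

    The headroom default is an assumption; pick one that suits the server.
    """
    return mem_available_kb >= swap_used_kb + headroom_kb
```

The idea is to run swapoff only when safe_to_swapoff() returns True after the offending process is gone; otherwise the swapoff itself can push the box toward OOM.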
This should (except in extreme or rare cases) catch potential OOM issues before the server starts thrashing on swap. As you say, it's rare for a process to "suddenly" chew up all your RAM and flush a bunch of crap out into swap. Usually it's a "bleeding to death" scenario where swap usage slowly grows and grows until the machine grinds to a halt.
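One way to spot the slow bleed from logged samples is to fit a straight line through recent MemAvailable readings and estimate when it reaches a floor. This is only a sketch with made-up function names; it assumes samples are (unix_time, kB) pairs and that the leak is roughly linear, which a real leak may not be.

```python
def slope_per_second(samples):
    """Least-squares slope of value against time for (time, value) pairs."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return 0.0  # all samples share one timestamp
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return num / denom


def seconds_until_exhausted(samples, floor_kb=0):
    """Estimate seconds until the series falls to floor_kb.

    Returns None when the trend is flat or rising.
    """
    slope = slope_per_second(samples)
    if slope >= 0:
        return None
    latest = samples[-1][1]
    return max(0.0, (latest - floor_kb) / -slope)
```

Alerting when the estimate drops under, say, a few hours leaves time to find and restart the leaking process well before the threshold check would fire.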
Chris
On Thu, Mar 30, 2023 at 2:46 PM Jared Still <jkstill_at_gmail.com> wrote:
> The test case is extreme: I used 'tail /dev/zero' to consume all memory,
> and it works pretty quickly.
>
> Assuming the issue is a program or set of programs with memory leaks, this
> may be something you monitor.
>
> I have seen scripts used to regularly kill and restart apps with leaks,
> because that was the only available recourse.
>
> If something causes a server to use too much memory, what are you going to
> kill if it is an unknown and new condition?
>
> And if the CPU is reduced to managing swap for active memory, you may not
> even be able to log on to the server.
>
> Probably best to just let OOM deal with it, then find out what happened.
>
> Tanel Poder's 0xtools can be used for forensic analysis of this; I tested
> it for this specific case.
>
> On Thu, Mar 30, 2023 at 11:18 AM Chris Taylor <
> christopherdtaylor1994_at_gmail.com> wrote:
>
>> Well, that's an interesting use case. Not sure what to think about
>> that. It could be argued that if you're monitoring swap usage, you'd catch
>> the problem before OOM got to it, right?
>>
>> Without swap you lose that opportunity, right?
>>
>> Chris
>>
>>
>> On Thu, Mar 30, 2023 at 1:47 PM Jared Still <jkstill_at_gmail.com> wrote:
>>
>>> A colleague recently asked me this same question.
>>>
>>> He had been asked by a client, with a fairly well regarded sysadmin team.
>>>
>>> They wanted to eliminate swap: here's why.
>>>
>>> If a process is consuming memory at a prodigious rate, then the OOM (out
>>> of memory) killer is going to catch up to it and kill it eventually.
>>>
>>> Their position was that with a swap partition, this process was
>>> prolonged far too long.
>>>
>>> Without swap, the process gets killed relatively quickly.
>>>
>>> With swap, it can take many minutes. The CPU spends so much time
>>> managing memory on swap (remember, we are at an OOM condition), which is
>>> slow, that the time to kill the process is prolonged to many minutes.
>>>
>>> At first my position was "what, no swap! we can't do that!"
>>>
>>> But, I decided to test it a bit.
>>>
>>> A small physical server, 4 cores and 32G of RAM, is running Oracle 19.3.
>>>
>>> A swingbench test is running, 10 sessions per core.
>>>
>>> When I caused an OOM condition with the 16G swap partition enabled, it
>>> took the system between 7.5 and 8 minutes to kill the process.
>>>
>>> (For the client, the amount of time was 20+ minutes.)
>>>
>>> And during that time, it was impossible to log on to the server. The CPU
>>> was too busy thrashing around in the swap partition.
>>>
>>> The next step of course is to disable the swap.
>>>
>>> Same OOM condition caused. Time to resolution is now 7 seconds.
>>>
>>> There is no swap to manage as if it were RAM.
>>>
>>> That is quite a big difference.
>>>
>>> Of course I wondered 'what about paging in memory for new processes?',
>>> as that often uses a page in swap.
>>>
>>> Without swap, it just takes place in memory.
>>>
>>> Swap is also a landing place for some pages used to initialize
>>> processes, as they can only be used once.
>>>
>>> This is a minimal amount, and can just be left in memory.
>>>
>>> If one really wants to conserve, there is a thing called ZRAM
>>> (compressed memory) where those pages can be parked, instead of swap.
>>>
>>> So, does anyone see any other need for a swap partition?
>>>
>>> It seems to have outlived its usefulness.
>>>
>>> Jared Still
>>> Certifiable Oracle DBA and Part Time Perl Evangelist
>>> Principal Consultant at Pythian
>>> Oracle ACE Alumni
>>> Pythian Blog http://www.pythian.com/blog/author/still/
>>> Github: https://github.com/jkstill
>>> Personality: http://www.personalitypage.com/INTJ.html
>>>
>>> On Thu, Mar 30, 2023 at 9:24 AM Jared Still <jkstill_at_gmail.com> wrote:
>>>
>>>> That is the question.
>>>>
>>>> I am curious about current thoughts on having or not having a swap
>>>> partition on Linux based Oracle servers.
>>>>
>>>> Let's assume typical production standard servers with a reasonable
>>>> amount of RAM, say 256G or more.
>>>>
>>>> I have some thoughts on this myself, but would like to see others'
>>>> thoughts on this.
>>>>
>>>>
>>>> Jared Still
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Mar 30 2023 - 20:51:54 CEST