Re: To Swap, or not to Swap

From: Niklas Iveslatt <niklas.iveslatt_at_arisant.com>
Date: Mon, 3 Apr 2023 10:59:43 -0600
Message-ID: <CAHLzPNeqgX7D71jGyvLWhYQ2y3a8z8g3eayc-=CCRC7NwN80VQ_at_mail.gmail.com>

"These days" Linux has a metric called PSI (Pressure Stall Information), exposed for memory in /proc/pressure/memory. I know we look at this in our monitoring. I think *by default* we look at 3 distinct minutes in sequence, and if there are any stalls we issue a warning for someone to look at it. I am not saying this is how it should be done in all cases; monitoring thresholds may vary depending on the system and its characteristics.

I also looked, and "these days" appears to mean Linux kernel >= 4.20.
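
A minimal sketch of such a check, in Python, might look like the following. It is an illustration only, not the monitoring described above. It assumes the usual layout of /proc/pressure/memory (a "some" line and a "full" line, each with avg10, avg60, avg300 and a cumulative total in microseconds), and it assumes one possible reading of the "3 minutes in sequence" rule: warn when every one of the last 3 one-minute intervals saw the "some" total grow. The names parse_psi, stalled_between, ConsecutiveStallAlert and watch are hypothetical.

```python
import time
from collections import deque

PSI_PATH = "/proc/pressure/memory"  # present on kernels >= 4.20 built with PSI enabled


def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": 0.0, ..., "total": 0}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is cumulative stall time in microseconds; avgN are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_psi(path=PSI_PATH):
    """Read and parse the current memory pressure counters."""
    with open(path) as handle:
        return parse_psi(handle.read())


def stalled_between(prev, curr, kind="some"):
    """True if cumulative stall time grew between two samples."""
    return curr[kind]["total"] > prev[kind]["total"]


class ConsecutiveStallAlert:
    """Fire when `window` consecutive intervals all contained stalls (assumed rule)."""

    def __init__(self, window=3):
        self.recent = deque(maxlen=window)

    def observe(self, stalled):
        self.recent.append(stalled)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def watch(interval_seconds=60, window=3, path=PSI_PATH):
    """Sample once per interval and print a warning when the rule fires."""
    alert = ConsecutiveStallAlert(window)
    prev = read_psi(path)
    while True:
        time.sleep(interval_seconds)
        curr = read_psi(path)
        if alert.observe(stalled_between(prev, curr)):
            print(f"WARNING: memory stalls in {window} consecutive intervals")
        prev = curr
```

Calling watch() on a host with PSI enabled would sample once a minute. Comparing the cumulative total rather than avg10/avg60 means even a very short stall within the interval is counted, which seems closest to "any stalls"; a threshold on avg60 would be a reasonable alternative on noisier systems.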

Niklas Iveslatt
Senior Partner

Arisant LLC ~ http://www.arisant.com
44 Inverness Dr. E Bldg. C Suite 2 ~ Englewood, CO 80112 mobile: 303.882.4461 ~ main: 303.330.4065 ~ fax: 888.889.0155

Need to send me something securely? *Click here* <https://arisant.sendsafely.com/u/niklas.iveslatt>

On Mon, Apr 3, 2023 at 10:36 AM kyle Hailey <kylelf_at_gmail.com> wrote:

>
>
> was just thinking, there are so many folks here who know unix internals way
> better than me ... what metric do you use to track memory pressure? Let me
> frame the question with some similar context ... CPU %Utilization is a bit
> of a crappy metric ... If we are at 100% I don't know if that is meeting
> the demand exactly or if there is a huge backlog of executable code
> wanting to run. Run queue is a bit crappy because it has I/O waiters and
> other issues mixed in ... including internal OS locks, waits on memory
> allocation etc. AAS on the other hand is pretty cool. It tells me if 8
> processes/queries want to run on 8 vCPUs, which might be fine, or if 80
> processes want to run on those 8 vCPUs, which is some serious CPU concurrency.
> What can one use to monitor memory demand? Similarly to CPU %utilization
> being bad, I'd say MemAvailable is similarly bad and even weaker. Page out
> is a canary in the coal mine. Page in is just bad. But that requires swap.
> Working on systems without swap, I have seen scan rates mentioned over
> the years and I never used them because I used page in/out. Then at RDS,
> working on a test suite that had major memory issues, I used perf and noticed
> that, before the system went down, perf showed the top function calls to
> be scans of memory lists.
>
> Perf Top:
> 21.90% [kernel] [k] shrink_inactive_list
> 3.28% libjvm.so [.] BacktraceBuilder::push
> 3.26% [kernel] [k] shrink_page_list
> 3.25% [kernel] [k] __lock_text_start
> 2.00% libjvm.so [.] CodeHeap::find_blob_unsafe
> 1.46% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock
> 1.39% [kernel] [k] finish_task_switch
>
>
>
> On Mon, Apr 3, 2023 at 12:56 AM Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
> wrote:
>
>> Just an example for zero swap from Netflix:
>> https://www.brendangregg.com/Slides/AWSreInvent2017_performance_tuning_EC2.pdf
>>
>> Usage: - Swappiness is set to zero to disable swapping and favor ditching
>>> the file system page cache first to free memory. (This tunable doesn’t make
>>> much difference, as swap devices are usually absent.)
>>
>>
>> On Mon, Apr 3, 2023 at 2:13 AM Jared Still <jkstill_at_gmail.com> wrote:
>>
>>> So, I would like to devise some testing for this, with and without swap.
>>>
>>> Suggestions for metrics to track?
>>>
>>> There are certain things I would like to track, mostly from an app
>>> perspective.
>>>
>>> But also I would like to see how responsive the system is under severe
>>> memory pressure, both with and without swap.
>>>
>>>
>>>
>>> On Sat, Apr 1, 2023 at 06:29 Frits Hoogland <frits.hoogland_at_gmail.com>
>>> wrote:
>>>
>>>> I too keep on coming across systems with no swap. Our YugabyteDB
>>>> systems are setup with no swap.
>>>> And my first reaction was identical to most others: what?! No swap?!
>>>>
>>>> Since then, my position on swap versus no swap has moved much more towards
>>>> seeing the benefit of no swap, and not being fiercely against it.
>>>> Of course the only right answer to swap or not is: it depends.
>>>>
>>>> The way I see it, swap is like a soft pillow. If you are
>>>> running into memory shortage, swap will soften the landing, and make the
>>>> system increasingly slower, then bring it to a standstill, and then still kill
>>>> the system. And therefore the question to ask is: do we want to get into a
>>>> situation of unpredictable slowness before the OOM kill? The latter is how
>>>> I look at it now: there have been countless hours spent on trying to make
>>>> sense of swap, trying to understand and tune swap and swapping, whilst
>>>> there always is, and must be, an actual problem that caused swap. So
>>>> removing it removes that discussion and gets you more straight into facing
>>>> the problem.
>>>>
>>>> There are two things that I see additionally:
>>>> - you might argue that swap will only be mildly used (…in your
>>>> case). I would argue that if you cannot control the server to only take the
>>>> actual memory, and it swaps, how the hell can you control it to just mildly
>>>> swap?
>>>> - many servers perform some swapping whilst memory pressure is never
>>>> seen. One common reason for that is that buffered IO is treated with equal
>>>> priority as memory allocations. That means that if you start performing
>>>> lots of IOs using buffered calls, the OS might, and will, start paging out
>>>> existing memory allocations that have not recently been touched, such as
>>>> bootstrap code for an application, because the buffered IO has gotten higher
>>>> priority. One extremely common case where this happens is with most
>>>> common backups. (I hope this will be an “aha” moment for lots of people
>>>> asking why their database server starts to allocate some swap, whilst it
>>>> never did get overallocated.)
>>>>
>>>>
>>>> *Frits Hoogland*
>>>>
>>>>
>>>>
>>>>
>>>> On 1 Apr 2023, at 15:10, Mark W. Farnham <mwf_at_rsiz.com> wrote:
>>>>
>>>> “Are there no longer any scenarios where the swapfile allows the system
>>>> to recover, without failing or hanging?”
>>>>
>>>> First, good ask Jared, excellent analysis Tim, from my viewpoint.
>>>>
>>>> I would slightly alter Tim’s question:
>>>>
>>>> “For the goal of the server in question, are there any scenarios where
>>>> a swapfile allows the system to recover without failing or hanging?”
>>>>
>>>> For a server with a primary goal of providing the support of one or
>>>> more instances of Oracle which are allocated within the bounds of the
>>>> server, I can imagine some “clients” of the database services being allowed
>>>> to run directly on the database server to eliminate all the latencies that
>>>> occur between servers.
>>>> With a very fast swapfile AND a decently implemented sniping monitor, I
>>>> further imagine the database services continue to deliver within the
>>>> planned service quality while the rogue client is paused or killed with
>>>> data and logs for analysis. (Hint, if the rogue client is holding a system
>>>> lock or an application lock that needs to be shared, I’m thinking pausing
>>>> is not an option.)
>>>>
>>>> There are certainly other scenarios where fail as fast as possible and
>>>> recover is better for the goal than even trying to recover.
>>>>
>>>> So I think Clay is also right that “it depends,” begging the question
>>>> of what is the best solution for someone supporting a fleet of generically
>>>> configured servers. Frankly, it would never have occurred to me to
>>>> **NOT** have swap on the popular OS copied from UNIX, mostly because I
>>>> don’t know whether lack of a certain amount of swap still tosses warnings
>>>> that freak out customers (or actually fail the install) when installing my
>>>> favorite RDBMS.
>>>>
>>>> Seymour Cray, of course, used to say things about only implementing
>>>> virtual memory if you want things to be slower than they need to be.
>>>>
>>>> For database services there is probably room for an OS built for a
>>>> direct addressing cpu/memory complex. In that case programs would only
>>>> start if real space declared to be needed is available and addresses are
>>>> resolved to real addresses at program load time. I’m not even sure whether
>>>> modern chip technology could be faster with direct addressing than with
>>>> virtual addressing.
>>>>
>>>> I suppose quantum computing has problems if you try to use virtual
>>>> memory and/or swapping….
>>>>
>>>> mwf
>>>>
>>>>
>>>>
>>>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>>>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Tim Gorman
>>>> *Sent:* Thursday, March 30, 2023 8:25 PM
>>>> *To:* jkstill_at_gmail.com; Oracle-L Freelists
>>>> *Subject:* Re: To Swap, or not to Swap
>>>>
>>>>
>>>> Jared,
>>>>
>>>> You've made a good point with your testing. In essence, *fail fast*.
>>>> If it is just *fail fast* versus *fail slow*, then of course we all
>>>> choose to *fail fast* and then recover.
>>>>
>>>> The only question that comes to my mind is whether the presence of a
>>>> swapfile always means slow failure.
>>>>
>>>> Are there no longer any scenarios where the swapfile allows the system
>>>> to recover, without failing or hanging?
>>>>
>>>> For example, in Azure, VMs can use remote storage (a.k.a. OsDisk) for
>>>> the swapfile, or VMs can locate the swapfile on optional direct-attached
>>>> SSD storage that is considered "temporary" or ephemeral, because when the
>>>> VM is stopped and deallocated, the direct-attached storage has to be
>>>> erased, because another VM may be allocated to it in future. It is not
>>>> quality of storage that makes it "ephemeral", just the use-case. Anyway,
>>>> the OsDisk has I/O latency averaging 0.70 ms for both reads and writes, but
>>>> the so-called "ephemeral" disk provides less than 0.05 ms I/O latency,
>>>> which is about 14x faster.
>>>>
>>>> Clearly the performance of the storage on which the swapfile resides is
>>>> going to make a difference in its usefulness. If your testing involved
>>>> slow storage, then I can see where the machine would take 7-8 mins to
>>>> fail. I'm not trying to denigrate the resources you used, but I'm trying
>>>> to ask if the swapfile is on fast storage, then perhaps could it be more
>>>> helpful, even in extreme situations?
>>>>
>>>> In other words, shouldn't we ensure that a swapfile is fast, as well as
>>>> big enough? Wouldn't more performant storage allow the swapfile to recover
>>>> the situation?
>>>>
>>>> Thanks so much for the thought exercise!
>>>>
>>>> -Tim
>>>>
>>>> On 3/30/2023 10:46 AM, Jared Still wrote:
>>>>
>>>> I was recently asked by a colleague this same question.
>>>>
>>>> He had been asked by a client, with a fairly well regarded sysadmin
>>>> team.
>>>>
>>>> They wanted to eliminate swap: here's why.
>>>>
>>>> If a process is consuming memory at a prodigious rate, then the OOM
>>>> (out of memory) killer is going to catch up to it and kill it eventually.
>>>>
>>>> Their position was that with a swap partition, this process was
>>>> prolonged far too long.
>>>>
>>>> Without swap, the process gets killed relatively quickly.
>>>>
>>>> With swap, it can take many minutes. The CPU spends so much time
>>>> managing memory on swap (remember, we are at an OOM condition), which is
>>>> slow, that the time to kill the process is prolonged to many minutes.
>>>>
>>>> At first my position was "what, no swap! we can't do that!"
>>>>
>>>> But, I decided to test it a bit.
>>>>
>>>> A small physical server, 4 cores and 32G of RAM, is running Oracle 19.3.
>>>>
>>>> A swingbench test is running, 10 sessions per core.
>>>>
>>>> When I caused an OOM condition with the 16G swap partition enabled, it
>>>> took the system between 7.5 and 8 minutes to kill the process.
>>>>
>>>> (For the client, the amount of time was 20+ minutes.)
>>>>
>>>> And during that time, it was impossible to logon to the server. The
>>>> CPU was too busy thrashing around in the swap partition.
>>>>
>>>> The next step of course is to disable the swap.
>>>>
>>>> Same OOM condition caused. Time to resolution is now 7 seconds.
>>>>
>>>> There is no swap to manage as if it were RAM.
>>>>
>>>> That is quite a big difference.
>>>>
>>>> Of course I wondered 'what about paging in memory for new processes?',
>>>> as that often uses a page in swap.
>>>>
>>>> Without swap, it just takes place in memory.
>>>>
>>>> Swap is also a landing place for some pages used to initialize
>>>> processes, as they can only be used once.
>>>>
>>>> This is a minimal amount, and can just be left in memory.
>>>>
>>>> If one really wants to conserve, there is a thing called ZRAM
>>>> (compressed memory) where those pages can be parked, instead of swap.
>>>>
>>>> So, does anyone see any other need for a swap partition?
>>>>
>>>> It seems to have outlived its usefulness.
>>>>
>>>> Jared Still
>>>> Certifiable Oracle DBA and Part Time Perl Evangelist
>>>> Principal Consultant at Pythian
>>>> Oracle ACE Alumni
>>>> Pythian Blog http://www.pythian.com/blog/author/still/
>>>> Github: https://github.com/jkstill
>>>>
>>>> Personality: http://www.personalitypage.com/INTJ.html
>>>>
>>>>
>>>>
>>>> On Thu, Mar 30, 2023 at 9:24 AM Jared Still <jkstill_at_gmail.com> wrote:
>>>>
>>>> That is the question.
>>>>
>>>> I am curious about current thoughts on having or not having a swap
>>>> partition on Linux based Oracle servers.
>>>>
>>>> Let's assume typical production standard servers with a reasonable
>>>> amount of RAM, say 256G or more.
>>>>
>>>> I have some thoughts on this myself, but would like to see others'
>>>> thoughts on this.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Regards
>> Timur Akhmadeev
>>
>

--
http://www.freelists.org/webpage/oracle-l

Received on Mon Apr 03 2023 - 18:59:43 CEST
