Re: huge CPU load after UEK upgrade

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Thu, 12 Aug 2021 13:29:33 -0400
Message-ID: <CAMHX9JK9NwrZc3Zd6Q8EbaFkUa61_sUBHFkUB0AocsvzAv7Z5Q_at_mail.gmail.com>



Oh, UEK 4 is Linux kernel 4.1.x. I don't recall if that bug/problem exists there, but you can still use the techniques from my blog entry to drill down deeper. One of the main questions is: is your Linux system load high due to on-CPU threads or threads in D state? If it's on-CPU threads, perf record -g would help you drill down into the CPU usage, and from the full stack traces you'd see which kernel functions are trying to get that spinlock you already saw contention on. If it's mostly D state sleeping threads, then /proc/PID/stack or psnapper would tell you more.
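
To answer the on-CPU vs D state question quickly, something like the following rough sketch could tally thread states from /proc. It is only an illustration, not psnapper and not anything from the blog entry; the function names (task_state, count_states, read_stat_lines) are made up for this example. It relies on the Linux load average counting both runnable (R) and uninterruptible (D) tasks, and on the state letter being the first field after the parenthesized command name in each stat file.

```python
import glob
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state from a /proc/PID/task/TID/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Tally thread states (R, D, S, ...) across the given stat lines."""
    return Counter(task_state(line) for line in stat_lines)


def read_stat_lines(pattern="/proc/[0-9]*/task/[0-9]*/stat"):
    """Read every thread's stat file; threads may exit while we scan."""
    lines = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                content = f.read()
        except OSError:
            continue  # thread went away between glob and open
        if content.strip():
            lines.append(content)
    return lines


if __name__ == "__main__":
    states = count_states(read_stat_lines())
    print("R (runnable / on CPU):", states.get("R", 0))
    print("D (uninterruptible sleep):", states.get("D", 0))
```

A single run is only one sample, so it would need to be run a few times in a row. If R dominates, perf record -g is the next step; if D dominates, look at /proc/PID/stack for those threads.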

On Thu, Aug 12, 2021 at 1:26 PM Tanel Poder <tanel_at_tanelpoder.com> wrote:

> Hi,
>
> Run psnapper or just sudo cat /proc/PID/stack for some of the Oracle &
> kworker processes (that are in D state) and check if you have *wbt_wait*
> in the stack. UEK 4 (kernel 4.14) has a Linux kernel write throttling
> bug/problem and lots of kworkers showing up is one of the symptoms. I've
> written about drilling down into high Linux system load in general here:
>
> https://tanelpoder.com/posts/high-system-load-low-cpu-utilization-on-linux/
>
> Incidentally, this is one of the problem scenarios I cover at the upcoming
> "troubleshooting very complex Oracle performance problems" virtual
> "conference" (where all speakers are me - https://tanelpoder.com/events
> ;-)
>
> --
> Tanel Poder
>
>
> On Thu, Aug 12, 2021 at 2:41 AM Laurentiu Oprea <
> laurentiu.oprea06_at_gmail.com> wrote:
>
>> Hello everyone,
>>
>> I know this is more of a linux question but maybe someone had a similar
>> issue.
>>
>> I started to upgrade the kernel version from UEK3 to UEK4 on servers (rac
>> 2/3 nodes) running OL6. I had no issue upgrading the kernel on virtual
>> machines.
>>
>> When I upgraded the first RAC node located on physical hardware, after
>> starting the Oracle instances the CPU load went crazy and the whole system
>> was far slower. For example, the CPU load on the other RAC nodes with UEK3
>> was 4/3/3, while the upgraded node's load was 107/104/77 with only half the
>> instances started. This load appeared even though the instances had no
>> significant workload (they had just been started).
>>
>> I observed that with the new kernel version, when the Oracle instances
>> start there is quite a large number of kworker processes. perf top shows
>> the following overhead:
>> 17.24% [kernel] _raw_spin_lock
>> 4.50% [kernel] acpi_os_write_port
>>
>> Does anyone have any idea on what could be the issue?
>>
>> Thanks a lot.
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Aug 12 2021 - 19:29:33 CEST
