Re: intermittent long "log file sync" waits

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Thu, 6 Feb 2020 23:59:05 -0500
Message-ID: <CAMHX9JLvmrCki_724tyQRSYEuOndmUK2CL+FUWSOZ0TGYRigcg_at_mail.gmail.com>



I don't see a clear direct connection between OS directory entry cache and log file sync duration (when it looked like LGWR was idle). Of course sometimes there are surprises and it makes sense in hindsight, but this would be a pretty complex chain of unfortunate events then :)

  1. Dentry cache size (and any slowness) would only affect system calls when trying to find, open files etc. Once the file is open, dentries should not be in play as further system calls refer to the file via their file descriptor that points directly to the inode.
  2. Or perhaps a large dentry cache just consumes lots of kernel (SLAB/SLUB allocator) memory and when reaching some limit, *other* system calls that also hope to quickly allocate a standard-sized chunk from SLAB, end up waiting (or even trying to reclaim other kernel memory!)

You could systematically measure this by running pSnapper on LGWR and one of the test FG session processes at the same time. You can sample just those 2 PIDs at first, sampling everything will make pSnapper slower.

You can run something like this in a loop (change 123,456 to your PIDs of interest) - and it would show if those processes were in some uninterruptible sleep (D) when executing a semop system call.

sudo psn -a -G pid,syscall,wchan,kstack -p 123,456

P.S. For others benefit, if you haven't used the Linux Process Snapper yet, you should! ;-)
Examples here: https://tanelpoder.com/psnapper/

--
Tanel Poder
https://tanelpoder.com/seminar

On Thu, Feb 6, 2020 at 5:21 PM Chris Stephens <cstephens16_at_gmail.com> wrote:


> I received this update from Oracle:
>
> Discussed this issue with a colleague who mentioned the following--- Bug 25962858 : LOG FILE SYNCS ON A RELATIVELY IDLE DATABASE relates to DENTRY
> Closed as not a bug.The solution in that bug was to periodically clear the DENTRY cache.To check the size issueslabtop --onceThis will show something like thisscorreia-linux::oracle> slabtop --once
> Active / Total Objects (% used) : 10136104 / 10227087 (99.1%)
> Active / Total Slabs (% used) : 273801 / 273801 (100.0%)
> Active / Total Caches (% used) : 78 / 108 (72.2%)
> Active / Total Size (% used) : 1572176.58K / 1601572.34K (98.2%)
> Minimum / Average / Maximum Object : 0.01K / 0.16K / 15.19KOBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 5516472 5516241 99% 0.10K 141448 39 565792K buffer_head
> 1422582 1421855 99% 0.19K 67742 21 270968K dentry <<<<----------------------------
> 481664 474527 98% 0.03K 3763 128 15052K kmalloc-32To clear the dentry cache you can issue.... a reboot would do the sameecho 2 > /proc/sys/vm/drop_caches
>
>
>
>>>>>>>>>>
>>>>>>>>>>
-- http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 07 2020 - 05:59:05 CET

Original text of this message