Re: intermittent long "log file sync" waits
Date: Wed, 29 Jan 2020 09:13:36 +0100 (CET)
Message-ID: <578860919.117416.1580285616352_at_ox.hosteurope.de>
Hello Chris,
as I already mentioned - please use pSnapper ( https://github.com/tanelpoder/psnapper ) by Tanel Poder.
pSnapper samples the /proc file system and, unlike pstack, does not suspend the process. pstack is just a wrapper around gdb, which uses ptrace() under the hood, and suspending a process that way can screw it up :)
In addition, pSnapper provides more useful information (e.g. the process state, the C function that runs in kernel space for a specific syscall, etc.).
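
To illustrate the difference, here is a minimal sketch of /proc-based sampling. It is not pSnapper's actual implementation, and the function name sample_proc is made up for this example. It assumes a Linux system where /proc/PID/status and /proc/PID/wchan are readable by the calling user (wchan may show 0 or be unreadable without sufficient privileges). Nothing attaches to the target, so the process is never stopped.

```shell
#!/bin/bash
# Sketch only: read a process's state and kernel wait channel from /proc.
# No ptrace() is involved, so the sampled process keeps running.

sample_proc() {
    local pid=$1
    local state wchan
    # "State:" line of /proc/PID/status, e.g. "S (sleeping)" -> "S"
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || return 1
    [ -n "$state" ] || return 1
    # Kernel function the task is sleeping in; may be "0" or unreadable
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    echo "$(date +%T) pid=$pid state=$state wchan=${wchan:-?}"
}

# Usage example: sample this shell three times, one second apart
for i in 1 2 3; do
    sample_proc $$
    sleep 1
done
```

Replacing $$ with a PID such as the LGWR process would sample that process instead. pSnapper does this kind of sampling for many threads at once and aggregates the results.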
Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Twitter: _at_OracleSK

> Chris Stephens <cstephens16_at_gmail.com> wrote on 28 January 2020 at 23:49:
>
> was just getting ready to sign off and noticed the archivelog backup scheduled to run every hour seems to be stuck and has been for at least 10 mins:
>
> [oracle_at_lsst-oradb04 ~]$ ps -ef | grep oracle_backup.sh
> oracle 13369 31167 0 16:43 pts/0 00:00:00 grep --color=auto oracle_backup.sh
> oracle 14649 14645 0 16:00 ? 00:00:00 /bin/bash /home/oracle/scripts/rman/oracle_backup.sh -d lsst2db -s archivelog.rman -c lsst2db_rman -e cs2018_at_ncsa.illinois.edu -r iddsdba_rman
> [oracle_at_lsst-oradb04 ~]$ pstack 15649
> Process 15649 not found.
> [oracle_at_lsst-oradb04 ~]$ pstack 14649
> #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> #1 0x0000000000440b84 in waitchld.isra.10 ()
> #2 0x0000000000441e3c in wait_for ()
> #3 0x0000000000433b0e in execute_command_internal ()
> #4 0x0000000000433d2e in execute_command ()
> #5 0x000000000041e365 in reader_loop ()
> #6 0x000000000041c9ce in main ()
> [oracle_at_lsst-oradb04 ~]$ pstack 14649
> #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> #1 0x0000000000440b84 in waitchld.isra.10 ()
> #2 0x0000000000441e3c in wait_for ()
> #3 0x0000000000433b0e in execute_command_internal ()
> #4 0x0000000000433d2e in execute_command ()
> #5 0x000000000041e365 in reader_loop ()
> #6 0x000000000041c9ce in main ()
> [oracle_at_lsst-oradb04 ~]$ pstack 14649
> #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> #1 0x0000000000440b84 in waitchld.isra.10 ()
> #2 0x0000000000441e3c in wait_for ()
> #3 0x0000000000433b0e in execute_command_internal ()
> #4 0x0000000000433d2e in execute_command ()
> #5 0x000000000041e365 in reader_loop ()
> #6 0x000000000041c9ce in main ()
> [oracle_at_lsst-oradb04 ~]$ pstack 14649
> #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> #1 0x0000000000440b84 in waitchld.isra.10 ()
> #2 0x0000000000441e3c in wait_for ()
> #3 0x0000000000433b0e in execute_command_internal ()
> #4 0x0000000000433d2e in execute_command ()
> #5 0x000000000041e365 in reader_loop ()
> #6 0x000000000041c9ce in main ()
> [oracle_at_lsst-oradb04 ~]$
>
>
> On Tue, Jan 28, 2020 at 4:30 PM Chris Stephens <cstephens16_at_gmail.com> wrote:
> > ok. i've had the following running since lunchtime and will let it run through the night in hopes that issue occurs again:
> >
> > _at_snapper ash,stats,trace 10 999999 lgwr
> >
> > I will run the following script as well:
> >
> > #!/bin/bash
> >
> > # Every 5 seconds, append a timestamp and a stack dump for LGWR (26552)
> > # and its workers LG00 (26556) and LG01 (26560) to one file per PID.
> > while true
> > do
> >   for pid in 26552 26556 26560
> >   do
> >     date >> /tmp/pstack.$pid
> >     pstack $pid >> /tmp/pstack.$pid
> >   done
> >   sleep 5
> > done
> >
> > based off:
> >
> > [oracle_at_lsst-oradb05 bin]$ ps -ef | egrep "lg.*lsst2db2"
> > oracle 26552 1 0 Jan17 ? 00:06:14 ora_lgwr_lsst2db2
> > oracle 26556 1 0 Jan17 ? 00:00:13 ora_lg00_lsst2db2
> > oracle 26560 1 0 Jan17 ? 00:00:12 ora_lg01_lsst2db2
> >
> > All sessions for this workload are connecting to service w/ tracing enabled so we'll have trace data as well.
> >
> > I will (hopefully) have updates in the morning.
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 29 2020 - 09:13:36 CET