Re: intermittent long "log file sync" waits
Date: Tue, 28 Jan 2020 09:42:36 -0800
Message-ID: <d70d59ac-9dd0-1704-de20-370a43a1c1a2_at_gmail.com>
Chris,
This would be a good use-case for 10046 extended tracing, just sayin'...
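If you do go that route, here is a minimal sketch using DBMS_MONITOR (the
<sid>/<serial#> values are placeholders for the session running the
SQLAlchemy workload, which you'd have to look up first):

-- find the application session (the PROGRAM filter is just an example)
SELECT inst_id, sid, serial# FROM gv$session WHERE program LIKE 'python%';

-- enable 10046-style tracing (waits + binds) for that session
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => <sid>, serial_num => <serial#>, waits => TRUE, binds => TRUE);

-- ...reproduce a "slow" run, then turn it off...
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => <sid>, serial_num => <serial#>);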
But looking at the differences between the two profiles (i.e. "fast" and
"slow"), the call counts differ by just one for "log file sync", "row
cache lock", and "DLM cross inst call completion", while "library cache
lock" and "library cache pin" show 3 more and 14 fewer calls,
respectively.
So "log file sync" might be the root cause, but trouble pinning objects in
the Library Cache might be as well. Contention for library cache locks
would also be consistent with the absence of waits in "iostat", since the
Library Cache lives entirely in memory (within the Shared Pool in the SGA).
But before obtaining a trace, may I first take a swing for the fences in
the form of a SWAG? Can you check GV$SGA_RESIZE_OPS and verify whether the
Shared Pool is being shrunk in one (or more) of the RAC instances at the
time the "slow" profile occurs?
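Something along these lines should show any recent resize operations (a
rough sketch; I'm listing the GV$SGA_RESIZE_OPS columns from memory, so
adjust names as needed):

-- look for OPER_TYPE = 'SHRINK' rows against the shared pool
SELECT inst_id, component, oper_type, oper_mode,
       initial_size, final_size, status, start_time, end_time
  FROM gv$sga_resize_ops
 WHERE component = 'shared pool'
 ORDER BY start_time DESC;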
Thanks,
-Tim
On 1/28/2020 8:09 AM, Chris Stephens wrote:
> 3-node Oracle 19.3 RAC
> Centos 7
>
> We have a SQLAlchemy/Python-based application workload that runs
> exactly the same steps with widely varying response times, which appear
> to be related to varying "log file sync" wait times.
>
> Here is a profile of a "fast" run:
>
> CALL-NAME                       DURATION      %  CALLS     MEAN      MIN       MAX
> ------------------------------ --------- ------ ------ -------- -------- ---------
> SQL*Net message from client    53.197782  91.8% 10,092 0.005271 0.000177 28.568493
> EXEC                            3.759177   6.5%  9,816 0.000383 0.000000  0.239592
> row cache lock                  0.233153   0.4%    541 0.000431 0.000113  0.000941
> PARSE                           0.140399   0.2%  4,867 0.000029 0.000000  0.006620
> DLM cross inst call completion  0.137330   0.2%    956 0.000144 0.000004  0.000505
> library cache lock              0.100171   0.2%    215 0.000466 0.000151  0.002133
> library cache pin               0.079729   0.1%    216 0.000369 0.000056  0.000710
> FETCH                           0.058253   0.1%  1,062 0.000055 0.000000  0.004148
> log file sync                   0.048217   0.1%    149 0.000324 0.000259  0.000505
> CLOSE                           0.045416   0.1%  4,929 0.000009 0.000000  0.000073
> 20 others                       0.135624   0.2% 11,854 0.000011 0.000000  0.001700
> ------------------------------ --------- ------ ------ -------- -------- ---------
> TOTAL (30)                     57.935251 100.0% 44,697 0.001296 0.000000 28.568493
>
> Here is a profile of a "slow" run:
>
> CALL-NAME                        DURATION      %  CALLS     MEAN      MIN        MAX
> ------------------------------ ---------- ------ ------ -------- -------- ----------
> SQL*Net message from client    131.186118  61.0% 10,092 0.012999 0.000212 106.789360
> log file sync                   79.291166  36.8%    150 0.528608 0.000264   2.986575
> EXEC                             3.728402   1.7%  9,816 0.000380 0.000000   0.221403
> row cache lock                   0.248868   0.1%    542 0.000459 0.000111   0.001036
> PARSE                            0.164267   0.1%  4,867 0.000034 0.000000   0.004652
> DLM cross inst call completion   0.146981   0.1%    957 0.000154 0.000005   0.001188
> library cache lock               0.104354   0.0%    218 0.000479 0.000160   0.000728
> library cache pin                0.082504   0.0%    202 0.000408 0.000157   0.000672
> FETCH                            0.056687   0.0%  1,062 0.000053 0.000000   0.003969
> CLOSE                            0.043590   0.0%  4,929 0.000009 0.000000   0.000180
> 20 others                        0.142044   0.1% 11,866 0.000012 0.000000   0.001792
> ------------------------------ ---------- ------ ------ -------- -------- ----------
> TOTAL (30)                     215.194981 100.0% 44,701 0.004814 0.000000 106.789360
>
> looking at the event histogram for that event:
>
> SQL> @evh "log file sync"
>
> EVH_EVENT           EVH_WAIT_TIME_MILLI    WAIT_COUNT    EVH_EST_TIME LAST_UPDATE_TIME
> ________________ ______________________ _____________ _______________ ______________________________________
> log file sync                       < 1        200051         100.026 27-JAN-20 11.39.57.344734 PM -06:00
> log file sync                       < 2           165           0.248 28-JAN-20 12.18.10.429089 AM -06:00
> log file sync                       < 4           150            0.45 27-JAN-20 11.18.31.158102 PM -06:00
> log file sync                       < 8           199           1.194 27-JAN-20 11.19.14.209947 PM -06:00
> log file sync                      < 16           253           3.036 28-JAN-20 08.03.17.851328 AM -06:00
> log file sync                      < 32           472          11.328 27-JAN-20 11.20.22.746033 PM -06:00
> log file sync                      < 64           728          34.944 28-JAN-20 01.13.37.364541 AM -06:00
> log file sync                     < 128           691          66.336 27-JAN-20 11.31.37.400504 PM -06:00
> log file sync                     < 256           414          79.488 28-JAN-20 12.18.10.423987 AM -06:00
> log file sync                     < 512           405          155.52 28-JAN-20 03.27.50.540383 AM -06:00
> log file sync                    < 1024           459         352.512 27-JAN-20 11.35.14.378363 PM -06:00
> log file sync                    < 2048           482         740.352 28-JAN-20 01.18.20.556248 AM -06:00
> log file sync                    < 4096           576        1769.472 27-JAN-20 11.21.05.084998 PM -06:00
> log file sync                    < 8192            89         546.816 27-JAN-20 11.57.36.436460 AM -06:00
> log file sync                   < 16384            60          737.28 25-JAN-20 07.48.31.460408 AM -06:00
> log file sync                   < 32768            39         958.464 27-JAN-20 11.59.09.869286 AM -06:00
> log file sync                   < 65536            27        1327.104 25-JAN-20 09.49.13.856563 AM -06:00
>
> The weird thing is that I don't see corresponding log I/O waits
> (await) in the iostat output.
>
> I have a ticket open with Oracle, but does anyone have suggestions for
> finding the root cause and/or a solution?