Re: Deadlock ITL Waits
Date: Mon, 25 Jul 2011 14:10:11 -0700
Message-ID: <CA+FfP7hLw+Q0UJYoc=Rd2nTgMsheppm0JTgWECL7dr6-H9oAvg_at_mail.gmail.com>
They are 15K RPM, 300GB drives.
Thanks, Harel, for the pointers. I will report back when I hear from the storage vendor.
On Mon, Jul 25, 2011 at 12:20 PM, Harel Safra <harel.safra_at_gmail.com> wrote:
> Stalin,
> You haven't specified whether the drives are 15k or 10k RPM, or the size and
> configuration of the SAN cache, so let's assume 15k and a write-through cache
> and do some back-of-the-napkin calculations:
> As a rule of thumb, a 15k RPM SAS drive can do about 180 IOPS. Since you
> have 22 drives in your array, the whole array can do 180*22=3960 IOPS; let's
> call that 4000 IOPS.
> Your array is RAID 1+0, so every database write IO means twice the write IO
> on the drives, so your 1769 writes/s mean ~3500 IOPS to the array. Add the
> ~250 reads/s and you're indeed getting really close to the limit of the array.
> Even if the SAN is writing to cache only, if you're sustaining ~1750 w/s
> the cache quite possibly can't be flushed fast enough.
>
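The napkin arithmetic above can be written out as a short Python sketch. The 180 IOPS per drive figure and the RAID 1+0 write penalty of 2 are the rule-of-thumb assumptions from the quoted text, not measured values, and the variable names are only illustrative. It uses the exact iostat figures rather than the rounded ones.

```python
# Back-of-the-napkin array capacity check, using the figures quoted in the thread.
DRIVES = 22
IOPS_PER_DRIVE = 180         # rule-of-thumb assumption for a 15k RPM SAS drive
RAID10_WRITE_PENALTY = 2     # assumption: each host write lands on both mirror halves

reads_per_s = 253.7          # r/s from the iostat sample
writes_per_s = 1769.0        # w/s from the iostat sample

# Nominal capacity of the whole array, ignoring any cache effects.
array_iops = DRIVES * IOPS_PER_DRIVE

# Load actually reaching the spindles once writes are mirrored.
backend_iops = reads_per_s + writes_per_s * RAID10_WRITE_PENALTY

utilisation = backend_iops / array_iops

print(f"array capacity : {array_iops} IOPS")
print(f"back-end load  : {backend_iops:.1f} IOPS")
print(f"utilisation    : {utilisation:.0%}")
```

With the unrounded numbers the back-end load works out to roughly 96% of the nominal capacity, which agrees with the "real close to the limit" conclusion. A write-back cache would change the picture, so treat this as a sanity check only.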
> Grill your storage vendor; they should have the metrics to test whether the
> array is reaching its limits.
>
> Harel Safra
>
>
> On Mon, Jul 25, 2011 at 8:34 PM, Stalin <stalinsk_at_gmail.com> wrote:
>
>> Well, this is a T5220 CoolThreads server, apparently good for OLTP-type
>> applications but not for batch or warehouse-type applications, unless
>> you use the parallel query options.
>>
>> I got the iostat numbers during the slowness period, and they seem
>> a little puzzling to me.
>>
>>                     extended device statistics
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>   253.7 1769.0 2048.6 15844.8 222.5 253.2  110.0  125.2  94 100 /data
>>
>> With 16MB/s of writes, we are seeing a service time of 125ms. Looking at
>> the wait time in the queue as well, it seems we are pushing the array to
>> its limits, which I can't believe. Is this normal for an array with 22
>> disks in RAID 1+0 (300G SAS drives, FC attached, SAN StorageTek 2540)?
>> We have a ticket open with Sun/Oracle, but no progress has been made thus far.
>>
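One way to sanity-check the iostat line is Little's law (queue length = arrival rate x time in queue). The sketch below is an added illustration, not something from the original posts. It assumes the usual Solaris `iostat -x` meanings: `wait` and `actv` are average queue lengths, `wsvc_t` and `asvc_t` are average times in milliseconds.

```python
# Figures copied from the iostat sample for /data.
reads_per_s, writes_per_s = 253.7, 1769.0
kb_read_per_s, kb_written_per_s = 2048.6, 15844.8
wsvc_t_ms, asvc_t_ms = 110.0, 125.2
reported_wait, reported_actv = 222.5, 253.2

iops = reads_per_s + writes_per_s

# Little's law: average queue length = throughput * average time in that queue.
expected_wait = iops * wsvc_t_ms / 1000
expected_actv = iops * asvc_t_ms / 1000

# Average I/O size and throughput, to see what kind of load this is.
avg_write_kb = kb_written_per_s / writes_per_s
write_mb_per_s = kb_written_per_s / 1024

print(f"wait queue : expected {expected_wait:.1f}, reported {reported_wait}")
print(f"actv queue : expected {expected_actv:.1f}, reported {reported_actv}")
print(f"avg write  : {avg_write_kb:.1f} KB, {write_mb_per_s:.1f} MB/s")
```

The expected queue lengths match the reported ones, so the sample is internally consistent. The average write is about 9 KB, so the roughly 16MB/s is made up of many small writes, which makes IOPS rather than bandwidth the more relevant limit here.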
>> We had a bad drive; however, the spare kicked in and the drive is scheduled
>> for replacement. No errors are seen in the path to the array. Any clues as
>> to what might be happening?
>>
>>  On Thu, Jul 21, 2011 at 8:47 PM, Chitale, Hemant Krishnarao <
>> Hemant.Chitale_at_sc.com> wrote:
>>
>>>
>>> This seems to be similar to this thread :
>>> http://forums.oracle.com/forums/thread.jspa?threadID=2256521&tstart=0
>>>
>>>
>>> 1.4 million commits and 1.4 million 'log file sync' waits of 3 seconds
>>> each?!
>>>
>>>
>>> Given that you have reported (from another email)
>>>
>>> Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
>>> -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
>>> log file parallel write      38K  72.5  15.4   5.4   2.0    .8    .4   1.3   2.2
>>> log file sync               838K   2.9   1.0    .5   1.7   1.7    .8   7.6  83.8
>>>
>>> I would guess that there are certain very, very large spikes in I/O response
>>> times (or that there's a bug in the timed_statistics).
>>>
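To put the histogram percentages into absolute terms, here is a small Python sketch (an added illustration). The wait counts are the rounded 38K and 838K figures from the table, so the results are approximate.

```python
# Wait-time histogram from the quoted report: (total waits, % of waits per bucket).
# Buckets: <1ms, <2ms, <4ms, <8ms, <16ms, <32ms, <=1s, >1s
histogram = {
    "log file parallel write": (38_000, [72.5, 15.4, 5.4, 2.0, 0.8, 0.4, 1.3, 2.2]),
    "log file sync": (838_000, [2.9, 1.0, 0.5, 1.7, 1.7, 0.8, 7.6, 83.8]),
}

# Approximate number of waits that took longer than one second (last bucket).
slow_waits = {
    event: round(waits * pct[-1] / 100)
    for event, (waits, pct) in histogram.items()
}

for event, (waits, pct) in histogram.items():
    print(f"{event:<26} buckets sum to {sum(pct):.1f}%, "
          f"~{slow_waits[event]:,} waits over 1s")
```

Roughly 700K 'log file sync' waits exceeded one second, against fewer than a thousand 'log file parallel write' waits in the same bucket. That contrast fits the guess about a few very large spikes, though the histogram alone cannot confirm it.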
>>> (A 64 CPU install without the Diagnostic Pack licence ?)
>>>
>>>
>>> Hemant K Chitale
>>>
>>> ________________________________________
>>> From: oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org] On Behalf Of Stalin
>>> Sent: Thursday, July 21, 2011 6:37 AM
>>> To: oracle-l
>>> Subject: Deadlock ITL Waits
>>>
>>> We have been seeing lots of deadlock errors lately in load testing
>>> environments, and they have all been due to enq: TX - allocate ITL entry. In
>>> reviewing the statspack report for the periods of deadlock, I see that log
>>> file sync is the top consumer, with a terrible wait time. That makes
>>> me think the deadlock is just a symptom of high log file sync wait
>>> times. Below is the snippet from statspack. Looking at these numbers,
>>> especially with the CPU not being heavily loaded, I wonder if this could be
>>> a storage issue. The sys admins are checking the storage layer, but I
>>> thought I would check here for any opinions/feedback.
>>>
>>> Top 5 Timed Events                                                    Avg %Total
>>> ~~~~~~~~~~~~~~~~~~                                                   wait   Call
>>> Event                                            Waits    Time (s)   (ms)   Time
>>> ----------------------------------------- ------------ ----------- ------ ------
>>> log file sync                                1,400,773   4,357,902   3111   91.4
>>> db file sequential read                        457,568     334,834    732    7.0
>>> db file parallel write                         565,843      27,573     49     .6
>>> read by other session                           16,168       7,395    457     .2
>>> enq: TX - allocate ITL entry                       575       6,854  11919     .1
>>>          -------------------------------------------------------------
>>> Host CPU  (CPUs: 64  Cores: 8  Sockets: 1)
>>> ~~~~~~~~              Load Average
>>>                      Begin     End      User  System    Idle     WIO     WCPU
>>>                    ------- -------   ------- ------- ------- ------- --------
>>>                       3.13    7.04      2.26    3.30   94.44    0.00     7.81
>>>
>>> Statistic                                      Total     per Second    per Trans
>>> --------------------------------- ------------------ -------------- ------------
>>> redo synch time                          435,852,302      120,969.3        309.7
>>> redo synch writes                          1,400,807          388.8          1.0
>>> redo wastage                               5,128,804        1,423.5          3.6
>>> redo write time                              357,414           99.2          0.3
>>> redo writes                                    9,935            2.8          0.0
>>> user commits                               1,400,619          388.7          1.0
>>>
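The averages in the statspack snippet can be cross-checked with a few divisions. The sketch below is an added illustration. It assumes 'redo synch time' and 'redo write time' are reported in centiseconds (tens of milliseconds), which is how Oracle documents these statistics. The derived elapsed time and concurrency figures are estimates, not values from the report.

```python
# Figures copied from the statspack snippet.
log_file_sync_waits = 1_400_773
log_file_sync_time_s = 4_357_902
redo_synch_time_cs = 435_852_302   # assumed unit: centiseconds
redo_synch_writes = 1_400_807
redo_write_time_cs = 357_414       # assumed unit: centiseconds
redo_writes = 9_935
user_commits = 1_400_619
commits_per_s = 388.7

# Average wait per 'log file sync', two ways; they should agree.
avg_sync_ms = log_file_sync_time_s / log_file_sync_waits * 1000
avg_redo_synch_ms = redo_synch_time_cs * 10 / redo_synch_writes

# Average time LGWR spent per redo write, and commits batched per write.
avg_redo_write_ms = redo_write_time_cs * 10 / redo_writes
commits_per_redo_write = user_commits / redo_writes

# Estimated snapshot length and average number of sessions waiting at once.
elapsed_s = user_commits / commits_per_s
avg_sessions_waiting = log_file_sync_time_s / elapsed_s

print(f"avg log file sync       : {avg_sync_ms:.0f} ms")
print(f"avg redo synch          : {avg_redo_synch_ms:.0f} ms")
print(f"avg redo write          : {avg_redo_write_ms:.0f} ms")
print(f"commits per redo write  : {commits_per_redo_write:.0f}")
print(f"snapshot length (est.)  : {elapsed_s:.0f} s")
print(f"sessions waiting (est.) : {avg_sessions_waiting:.0f}")
```

Both routes give about 3.1 seconds per commit wait, matching the 3111 ms in the Top 5 table, which supports the centisecond assumption. Under the same assumption each redo write averages around 360 ms and covers about 140 commits, so slow redo writes look like a plausible contributor. The numbers by themselves do not explain the whole 3-second wait.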
>>>
>>> Environment : 11gr2 EE (11.2.0.1), Sol 10 Sparc
>>>
>>> Thanks,
>>> Stalin
>>>
>>>
>>
>>
>>
>> --
>> Thanks,
>>
>> Stalin
>>
>
>
--
Thanks,
Stalin
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 25 2011 - 16:10:11 CDT
