Re: Deadlock ITL Waits

From: kyle Hailey <kylelf_at_gmail.com>
Date: Wed, 20 Jul 2011 22:20:29 -0700
Message-ID: <CADsdiQiUzTcZ6fMnNRYngUzpmQx46pL1wcOA5ZvtsqfUHMyQjg_at_mail.gmail.com>



Yeah, the massive log waits could exacerbate the ITL waits as things will be slow commiting their changes.
The db file sequential reads are huge as well. Looks like you have a problem on your storage.

Check the log write times ( I blogged some of this at http://dboptimizer.com/2011/07/20/wait-event-and-wait-class-metrics-vs-vsystem_event/)

Latencies in the past minute

col name for a25

select m.intsize_csec,
       n.name ,
       m.time_waited,
       m.wait_count,
       10*m.time_waited/nullif(m.wait_count,0) avgms
from v$eventmetric m,
     v$event_name n

where m.event_id=n.event_id
  and n.name in (
                  'log file sync',
                  'log file parallel write'
);

Latencies averaged over each hour

select

       btime,
       (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms
from (
select
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
       total_waits count_end,
       time_waited_micro/1000 time_ms_end,
       Lag (e.time_waited_micro/1000)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
       Lag (e.total_waits)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
       DBA_HIST_SYSTEM_EVENT e,
       DBA_HIST_SNAPSHOT s
where
         s.snap_id=e.snap_id
   and e.event_name in (
                  'log file sync',
                  'log file parallel write'
)
order by begin_interval_time
)
order by btime
/

If the log file parallel writes are bad then probably something wrong on the storage system.

On Wed, Jul 20, 2011 at 3:37 PM, Stalin <stalinsk_at_gmail.com> wrote:

> We have been seeing lots of deadlock errors lately in load testing
> environments and they all have been due to enq: TX - allocate ITL entry. In
> reviewing the statspack report for the periods of deadlock, i see that, log
> file sync wait being the top consumer with a terrible wait time. That makes
> to me think the deadlock, is just a symptom of high log file sync wait
> times. Below is the snippet from statspack and looking at these numbers,
> especially CPU not being heavily loaded, wondering if this could be a case
> of storage issue. Sys Admins are checking the storage layer but thought
> would check here get any opinions/feedback.
>
> Top 5 Timed Events Avg
> %Total
> ~~~~~~~~~~~~~~~~~~ wait
> Call
> Event Waits Time (s) (ms)
> Time
> ----------------------------------------- ------------ ----------- ------
> ------
> log file sync 1,400,773 4,357,902 3111
> 91.4
> db file sequential read 457,568 334,834 732
> 7.0
> db file parallel write 565,843 27,573 49
> .6
> read by other session 16,168 7,395 457
> .2
> enq: TX - allocate ITL entry 575 6,854 11919
> .1
> -------------------------------------------------------------
> Host CPU (CPUs: 64 Cores: 8 Sockets: 1)
> ~~~~~~~~ Load Average
> Begin End User System Idle WIO
> WCPU
> ------- ------- ------- ------- ------- -------
> --------
> 3.13 7.04 2.26 3.30 94.44 0.00
> 7.81
>
> Statistic Total per Second per
> Trans
> --------------------------------- ------------------ --------------
> ------------
> redo synch time 435,852,302 120,969.3
> 309.7
> redo synch writes 1,400,807 388.8
> 1.0
> redo wastage 5,128,804 1,423.5
> 3.6
> redo write time 357,414 99.2
> 0.3
> redo writes 9,935 2.8
> 0.0
> user commits 1,400,619 388.7
> 1.0
>
>
> Environment : 11gr2 EE (11.2.0.1), Sol 10 Sparc
>
> Thanks,
> Stalin
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 21 2011 - 00:20:29 CDT

Original text of this message