Re: How to isolate root cause of increasing libcache latch waits? (Subtitle: Hoping to upstage Oracle Support)
Date: Wed, 14 Oct 2009 10:21:12 -0400
Message-Id: <4EBC478B-6630-4C39-BBE7-A706CEB92618@gmail.com>
What does the library cache dump show?
oradebug dump library_cache <level>  (levels 1 through 4)
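
For example, as SYSDBA in SQL*Plus (level 1 is a summary and higher
levels add progressively more hash-chain/object detail -- check the
exact level contents against your release, I am going from memory):

  SQL> oradebug setmypid
  SQL> oradebug dump library_cache 2
  SQL> oradebug tracefile_name  -- prints the path of the trace file
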
On Oct 14, 2009, at 9:42 AM, Charles Schultz <sacrophyte@gmail.com> wrote:
> Good day, list,
>
> The Symptoms
> We have observed a gradual rise in library cache latch waits in our
> Production database. Oracle Enterprise Manager shows these as small,
> short-lived "spikes" in the Top Activity Monitor. Our experience is
> that these spikes generally grow in intensity and duration
> (impacting more users for a longer period of time). We have observed
> that turning off OEM and the agent helps to alleviate these latch
> waits.
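>
> (For illustration, one way to watch this without OEM is to sample
> v$latch directly, something along these lines:
>
>   select name, gets, misses, sleeps
>     from v$latch
>    where name like 'library cache%';
>
> rising misses and sleeps between samples line up with the spikes
> OEM shows.)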
>
> The System
> Oracle EE 10.2.0.2 on Solaris 10. Upgrading to 10.2.0.4 this
> weekend. Hardware is Sun F15k with 44 CPUs and 96GB of RAM.
>
> The Support
> In our first SR with Oracle, the "resolution" was to stop using
> OEM. In my opinion this is a very poor response: it does not
> address the root problem at all, offers no insight into what is
> causing the latch waits in the first place, and merely assuages
> the symptom rather than eliminating it.
>
> In our second SR, I have an analyst who wants me to turn on event
> 10046 level 12. Good and dandy; I would have done that myself if
> not for the fact that we often get hit with mutex hangs triggered
> by rowsource execution statistics. Besides, the analyst is simply
> fishing; he has given no indication that he really understands the
> situation. He has identified several large, CPU-intensive queries,
> but how does CPU-intensive relate to latch waits, especially when
> our box is only about 12% CPU utilized? I admit, perhaps I am
> especially dense today and he is spot on. Perhaps.
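>
> (For the record, what he is asking for would look something like
> this at the session level:
>
>   alter session set events '10046 trace name context forever, level 12';
>   -- ... reproduce the workload ...
>   alter session set events '10046 trace name context off';
>
> Level 12 adds wait events and bind values to the trace; as I
> understand it, tracing also switches on the rowsource execution
> statistics that have triggered the mutex hangs for us.)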
>
> The Synopsis
> Since it is library cache latches that we are talking about, I
> looked at shared pool fragmentation. Fragmentation generally gets
> worse as time progresses, so my theory is that the libcache latch
> requests are taking longer and longer to fulfill as there are more
> (and smaller) chunks of memory to search. I am currently running a
> couple of diagnostic scripts (I forget which Metalink note I got
> them from, but it is a memory diagnostic suite); a sketch of the
> kind of check I mean follows below. We know from past experience
> that flushing the shared pool has a marginal effect, which makes
> sense, as x$ksmsp shows us that only a small fraction of the shared
> pool is flushable. We also know from past experience that if we let
> the latch waits continue to get worse, the system approaches
> critical mass and we have to bounce it. And nobody likes to bounce
> the Production system. =)
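>
> (A sketch of the kind of fragmentation check I mean, against the
> undocumented x$ksmsp -- be warned that the query itself takes the
> shared pool latch, so it is not free on a busy system:
>
>   select ksmchcls chunk_class, count(*) chunks,
>          sum(ksmchsiz) total_bytes, max(ksmchsiz) largest_chunk
>     from x$ksmsp
>    group by ksmchcls;
>
> Many small 'free' chunks and a shrinking largest chunk would
> support the fragmentation theory. And flushing is simply
>
>   alter system flush shared_pool;
>
> which, as noted, buys us little since most of the pool is not
> flushable.)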
>
> So, one big (obvious) caveat. We are upgrading to 10.2.0.4 this
> weekend, which means we will inherit a ton of bug fixes, as well
> as the wonderful new bugs that come along with them. Unfortunately,
> the PSU (the 10.2.0.4.1 bundle) came out too late for our
> regression tests, but I have scheduled it for a future round. If
> memory fragmentation actually is the root problem, I have my
> doubts as to whether 10.2.0.4 is going to help us. If anyone has
> any insights either way, I am all ears.
>
> --
> Charles Schultz
--
http://www.freelists.org/webpage/oracle-l

Received on Wed Oct 14 2009 - 09:21:12 CDT