RE: high "latch: cache buffers chains" waits in 10.2.0.3 DB

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 27 Feb 2008 01:43:22 -0500
Message-ID: <001b01c8790c$0daa9270$0312010a@rsiz.com>


It seems that at some number of connections you exceed the ability of the machine and database to keep up with the workload, and then the existing system degrades.

If you cannot get the folks writing the system to improve the system, you have two paradoxical possibilities.

  1. If the load is peaked and you could survive if you could just get enough additional connections for a while, then either shared servers or simply a higher processes setting might do the trick (see the parameter sketch after this list). This would tend to be the case if the workload of connections varies wildly, with many of them being quite lightweight, and you just once in a while have the unhappy coincidence of too many heavy hitters. Of course, if you could identify the worst heavy hitters, then you might convince the application builders to check the number of executors of the heavy requests and hold off firing the next heavy hitter until the number of concurrently running heavy hitters is low enough to be supported by the system. This contemplates the notion that the heavy-hitter queries might be necessarily so, rather than simply badly written, wasteful queries. Oracle's own "Concurrent Manager" from the eBusiness Suite is an example of handling this.
  2. If the load reaches a sustained plateau over which you cannot keep up with the workload, then lower the number of connections allowed to the number you can sustain with the current application (again, see the sketch below). Users will just have to live with "Try again later," but you will at least serve those who get a turn reasonably well. This will also give you a chance to evaluate the queries generating the work for waste. The most obvious opportunity for improvement is in the queries that take the longest and consume the most resources, though some folks enter an infinite loop trying to improve a costly query that is already optimal. Another load that is sometimes overlooked is the chronic background drag of many unused indexes being updated, and I'm feeling like I'm about to start typing a laundry list, so I'll stop. If you (or the developers) cannot reduce the work required per request, then the best you can do is serve the number who can be served.
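For what it's worth, here is a rough sketch of the two knobs I mean. The values are made up; the right settings depend entirely on your system, and the processes change requires a restart:

-- Option 1: stretch the connection ceiling for peaky, mostly lightweight sessions
ALTER SYSTEM SET shared_servers = 20;
ALTER SYSTEM SET dispatchers = '(PROTOCOL=TCP)(DISPATCHERS=4)';
-- ...or simply allow more dedicated server processes (SPFILE change, bounce required)
ALTER SYSTEM SET processes = 600 SCOPE=SPFILE;

-- Option 2: cap connections at a level you can actually sustain (also bounce required)
ALTER SYSTEM SET processes = 150 SCOPE=SPFILE;
-- the "Try again later" half then has to come from the application or its connection pool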

In the meantime you've started to identify some objects that are apparently getting slammed. (Note Greg Rahn's advice about the query you used and some better alternatives.) The next step is to look at the SQL that is doing the slamming. There are various things you can do to reduce query expense if you have no chance of changing the code. What to try depends on the specifics of the query and the data. Whether there is any opportunity for improvement worth pursuing without rewriting the SQL is an experimental question.
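In case it helps, here is a rough sketch of the drill-down I have in mind, using the standard v$/x$ views (x$bh needs a SYS connection, and the substitution variables are just placeholders you fill in from the previous step):

-- 1) which cache buffers chains child latches are sleeping the most
SELECT addr, child#, gets, misses, sleeps
  FROM v$latch_children
 WHERE name = 'cache buffers chains'
 ORDER BY sleeps DESC;

-- 2) which buffers (and therefore objects) hang off a hot child latch
SELECT o.owner, o.object_name, bh.tch touch_count, bh.file#, bh.dbablk
  FROM x$bh bh, dba_objects o
 WHERE bh.hladdr = HEXTORAW('&hot_latch_addr')
   AND o.data_object_id = bh.obj
 ORDER BY bh.tch DESC;

-- 3) which SQL has those objects in its plan
SELECT p.sql_id, s.executions, s.buffer_gets
  FROM v$sql_plan p, v$sql s
 WHERE p.object_owner = '&owner'
   AND p.object_name  = '&object_name'
   AND s.sql_id = p.sql_id
   AND s.child_number = p.child_number
 ORDER BY s.buffer_gets DESC;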

As for the hot block analysis, the details of what is making an object's blocks hot are a required input before embarking on trying to cure it efficiently. It sounds like you think partitioning might be a silver bullet for your particular case, but you don't have it licensed. Before you either try to justify the license or implement some version of "poor man's partitioning" (separating the contents of the table into multiple tables, joined by a UNION ALL view for the code that needs to see it all, while queries that know the value of the would-be partitioning column operate on only the required piece), you would really need to examine what hits that table. A sketch of that approach follows.
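Just to make the "poor man's partitioning" idea concrete, a sketch with entirely hypothetical table and column names (pretend REQUEST_LOG is the hot table and REQUEST_DATE is what you would otherwise partition on):

-- split the single hot table into per-year pieces
CREATE TABLE request_log_2007 AS
  SELECT * FROM request_log WHERE request_date <  DATE '2008-01-01';
CREATE TABLE request_log_2008 AS
  SELECT * FROM request_log WHERE request_date >= DATE '2008-01-01';

-- code that needs everything reads a UNION ALL view
CREATE OR REPLACE VIEW request_log_all AS
  SELECT * FROM request_log_2007
  UNION ALL
  SELECT * FROM request_log_2008;

-- code that knows the "partitioning" value goes straight to one piece
SELECT COUNT(*) FROM request_log_2008 WHERE request_date >= DATE '2008-02-01';

Whether this actually spreads the hot blocks, and whether the application can be pointed at the view and the pieces, is exactly the kind of thing you only know after examining what hits the table.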

Good luck!

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of bkaltofen_at_gmx.de
Sent: Tuesday, February 26, 2008 12:25 PM
To: Oracle-L
Subject: high "latch: cache buffers chains" waits in 10.2.0.3 DB

Hello,

I'm experiencing a problem with a high number of "latch: cache buffers chains" waits in a customer database.

Environment:
Solaris 10 x86
Oracle EE 10.2.0.3
Application Type: web application that opens a session for each request (no comment; we have no influence on the application)

Symptoms:
"latch: cache buffers chains" waits go up number of sessions increases until "max processes" is reached, so no new connections can be established.
Web-application stops responding, as no more sessions are possible I can not reproduce the issue by will and there is no test database at customer side.

Workaround:
- Wait until the latch contention resolves --> the customer is not happy about that; one time it took over 2 hours.
or
- Restart the instance --> a fast workaround, but the buffer cache and library cache are lost.

I think it has something to do with hot blocks.

<snip>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 27 2008 - 00:43:22 CST
