Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> [C.D.O.S][Long...] How a DBA should handle a load-average-spike?
People,
I'm trying to write a simple document on how a DBA should handle a host which is experiencing a load-average-spike (and negatively impacting end-user response time).
Do any of you have some common sense you would like to share?
Here are some ideas which come to my mind:
-Try to characterize the current load using:
-the 'ps' command.
-the Targets -- Hosts -- Performance Page in EM
Load-type-1: Is the load-average-spike caused by a single process or a
small
number of processes which are 'hogging' the CPU?
Load-type-2: Is the load-average-spike caused by Oracle background processes which are 'hogging' the CPU?
Load-type-3: Is the load-average-spike caused by Oracle foreground processes which are 'hogging' the CPU?
Load-type-4: Is the load-average-spike caused by an abnormally large
number of
processes which have recently been added to the process list?
If we are dealing with Load-type-1, we need to trace the processes to a user or business-process which spawned the process(es). Then, we implement procedures to rein-in or disconnect this user. This is a difficult situation because risks of doing the "wrong thing" are high. You don't want to disconnect a user who is conducting a legitimate business-critical task.
If we are dealing with Load-type-2, we need to drill down into the
Oracle
background processes to find out what is keeping it overly busy.
EM can help you with this:
Home -- Targets -- Databases -- TheBusyDatabase -- Active Sessions with
High CPU.
In this screen you can see some clues about what is driving each of the
busy sessions.
When you drill down into Oracle background processes, one of the first
layers you
encounter are sessions.
Your goal is to gather enough information to pinpoint the application(s) which are driving these sessions. Then, you interact with an administrative interface on the application to search for end-users who might be responsible. Or you might then find information about the application's configuration or state which is abnormal and needs to be fixed. For example, sometimes public facing websites can be pushed hard by mal-ware Perl scripts operated by delinquents to generate high volumes of traffic (to crash the site). A poorly constructed application might filter (or magnify) some of this traffic to a host for processing.
One level below each session are individual SQL statements. EM offers functionality to look at each SQL statement in a session. If you are confident that the load on the DB is legitimate, an obvious next place to look for clues are the SQL statements. Tuning SQL statements is a vast subject but EM will quickly help you find SQL statements which consume relatively large amounts of CPU. Also EM contains a 'SQL Tuning Advisor'. It is sophisticated enough to help with a wide variety of poorly constructed SQL statements or schemas. For example, it could tell you if a new index would speed up a query and if so how much that index would slow down a corresponding insert.
If we are dealing with Load-type-3, we need to...[ complete later ].
If we are dealing with Load-type-4, we need to search for
characteristics
of the process set which yield clues about the deluge.
Some simple questions might be a good start:
-Are most of the processes owned by one userid?
-Are most of the processes linked to the same executable?
-Do we have source code for the underlying executable?
-When were most of the processes created?
-Are they all consuming CPU cycles?
-Are they connected to anything:
-reading/writing files? -connected to the DB? -network connected to anything?
The goal here is to find the application or person who started all
these processes.
Another goal is to find out if they are obstructed by anything and thus
prevented
from dying since they cannot finish their jobs.
So, that is my initial take on the high-load-average-spike scenario. Do any of you have any comments which would be useful for the DBA who needs a general set of guidelines for dealing with this situation?
...Peter
--
Peter Smith
GoodJobFastCar_at_gmail.com
http://GoodJobFastCar.com
Received on Fri Dec 15 2006 - 17:05:56 CST