RE: Houston, do I have a problem?
Cary,
Thank you very much for a wonderful response.
I have not yet talked to any users or developers regarding the performance of this database. I just shared with my boss (and with the list) what I found in my brief 'review' of the database. There are no reported performance-related problems that need to be fixed right away. But he was concerned, as we will soon be loading this database to support a couple more applications, one of which will have more OLTP transactions than the other. Currently the database has heavy DML activity only during the batch processes. There is very minimal DML via on-line access (Intranet).
From our auto-scheduler (AutoSys) system, I have found out that there are a *number* of batch jobs that run *concurrently* against this database in the batch window. That would be the next area to 'review' as to what these jobs do and how they do it. For all we know, it could just be a scheduling problem!
I am not paying much attention to the stats at system level as they do not mean much, at this time. I will be watching the batch processes and most probably consider using Statspack and make use of Sparky (http://www.hotsos.com/products/sparky/) to dig deeper.
I am not going to reach any conclusion based solely on what I saw in v$system_event. That was just my first step to see what kinds of waits this database had encountered, just as some check the hit ratio first ;-) But I must say, based on those stats, that the I/O subsystem is being stressed quite a bit.
Regards,
-----Original Message-----
Sent: Monday, August 12, 2002 1:38 AM
To: Multiple recipients of list ORACLE-L
This is an interesting report. I think the responses to it are even more interesting. One response admits confusion (which I think is a completely fair reaction). Another zeroes in knowingly on some specific details. If everyone had time to respond, I would expect a rash of differing opinions about what you should do first to fix this system... This kind of game is a fundamental part of using system-wide performance data. (The various ratio problems are just as relevant for system-wide data collected from the "wait interface" as they are from v$sys<anything-else>.)
Don't lose hope if you look at Kirti's note and wonder, "so what's the point?" You cannot see everything that's wrong with a system from a report like this. I think in fact that you can know only two things from a v$system_event report:
Sure, knowing these two things is worth something, but it leaves lots of good questions unanswered (*essential* questions, actually):
So... Is the HDS disk array a problem? Probably. But, it's possible--*likely*, actually--that an analyst could fix all the problems shown here and still have really slow applications. Why? Because several essential-but-slow programs on this system might not spend significant amounts of their response time waiting on any of the top 10 events in this list. We see it pretty often: people fix their system's "worst performance problems" and then find out that their work really didn't make a noticeable end-user impact. (I'm confident that Kirti won't end up in this trap, but that's because I trust him to exercise intuition and experience far beyond the scope of what can be learned from his v$system_event data.)
The "wait interface" is an important tool, because it "finally" (well, since over ten years ago) allows us to see where a program spends its time. But to use that tool to see how a whole system has spent its time since instance startup has the same limitations as any other method that relies upon system-wide aggregated data.
So, what should you look at to avoid performance improvement project ambiguities? Session-level data. Which session? As I mentioned last week, I believe the analyst should focus first upon sessions whose performance improvement would most significantly improve the business. That, in my opinion, is The Big Secret.
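To make the point about aggregation concrete, here is a minimal Python sketch. The session names and wait times are entirely hypothetical (they are not taken from Kirti's database); the sketch only illustrates how a system-wide total can be dominated by one event while the session that matters to the business spends its time somewhere else.

```python
from collections import Counter

# Hypothetical per-session wait profiles (event -> time waited).
# These numbers are made up purely for illustration.
SESSIONS = {
    "batch_load_1": {"db file sequential read": 900, "log file sync": 20},
    "batch_load_2": {"db file sequential read": 850, "log file sync": 30},
    "order_entry": {"enqueue": 120, "db file sequential read": 10},
}


def system_wide(sessions):
    """Sum every session's waits into one aggregate profile."""
    total = Counter()
    for waits in sessions.values():
        total.update(waits)
    return total


def top_event(waits):
    """Return the event with the largest time waited in a profile."""
    return max(waits, key=waits.get)


if __name__ == "__main__":
    print("system-wide top event:", top_event(system_wide(SESSIONS)))
    print("order_entry top event:", top_event(SESSIONS["order_entry"]))
```

In this made-up data the aggregate points at single-block reads, yet the hypothetical order_entry session waits mostly on enqueues, so fixing the aggregate's top event would barely change that session's response time.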
Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
Upcoming events:
- Hotsos Clinic, Oct 1-3 San Francisco, Oct 15-17 Dallas, Dec 9-11 Honolulu
- 2003 Hotsos Symposium on Oracle® System Performance, Feb 9-12 Dallas
- Next event: NCOAUG Training Day, Aug 16 Chicago
-----Original Message-----
Kirti
Sent: Sunday, August 11, 2002 5:43 PM
To: Multiple recipients of list ORACLE-L
This is not a joke.....!!!
This is from a business-critical production database that I was asked to 'review' this past Friday.
The report is from v$system_event taken at 10:30am, Aug 9, 2002. The server (and database) was bounced on Aug 4, 2002 at 9:20am.
This was the first time I had logged into this database.
SQL> /
EVENT                               TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
----------------------------------- ----------- -------------- ----------- ------------
control file parallel write              143933              0  4080356626   28349.0001
db file scattered read                 12540695              0  1.2254E+10   977.107332
buffer busy waits                      10740450             36  8193235928   762.839167
SQL*Net message from client           180769027              0  9.9561E+10   550.761199
db file sequential read               298968127              0  1.1839E+11    395.99129
enqueue                                   13500           6435     2036785   150.872963
SQL*Net more data from client          52227948              0  4093231165   78.3724294
free buffer waits                            16              4         795      49.6875
log file switch completion                  804             43       16263   20.2276119
log buffer space                            977              0        5409   5.53633572
control file single write                    17              0          51            3
db file parallel write                  1749695              0     2935317   1.67761638
db file parallel read                      8149              0       13484   1.65468156
log file single write                      1024              0         701   .684570313
latch free                              2007034        1616763     1054137   .525221297
log file sync                           1366242            560      526049   .385033545
SQL*Net message from dblink             1514480              0      451351   .298023744
log file sequential read                 405415              0       82877   .204425095
SQL*Net break/reset to dblink                10              0           2           .2
log file parallel write                 2025192              7      293332   .144841576
SQL*Net break/reset to client             28113              0        3221   .114573329
db file single write                        320              0          36        .1125
SQL*Net more data from dblink            447044              0       11375   .025444923
SQL*Net more data to client            11770996              0       75680   .006429362
control file sequential read             554851              0        3261   .005877254
SQL*Net more data to dblink                1076              0           5    .00464684
buffer deadlock                            1045           1029           1   .000956938
SQL*Net message to dblink               1514485              0         456   .000301092
SQL*Net message to client             180769119              0       48736   .000269604
29 rows selected.
SQL>
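The report is ordered by AVERAGE_WAIT, which is simply TIME_WAITED divided by TOTAL_WAITS. The following Python sketch, using a handful of rows copied from the report, recomputes that column and re-ranks the events by total time waited instead. Treating 'SQL*Net message from client' as idle time and excluding it is an assumption of this sketch, not something stated in the report, and the choice of rows is arbitrary.

```python
# (event, total_waits, time_waited) copied from the v$system_event report.
# Values the report prints in E-notation are rounded there, so they are
# rounded here as well.
ROWS = [
    ("control file parallel write", 143933, 4080356626),
    ("db file scattered read", 12540695, 1.2254e10),
    ("buffer busy waits", 10740450, 8193235928),
    ("SQL*Net message from client", 180769027, 9.9561e10),
    ("db file sequential read", 298968127, 1.1839e11),
    ("enqueue", 13500, 2036785),
    ("free buffer waits", 16, 795),
]

# Assumption: this event is the server waiting on the client (idle time).
IDLE_EVENTS = {"SQL*Net message from client"}


def average_wait(total_waits, time_waited):
    """AVERAGE_WAIT as shown in the report: time waited per wait."""
    return time_waited / total_waits


def rank_by_time_waited(rows, idle=IDLE_EVENTS):
    """Return (event, share of non-idle time), largest share first."""
    busy = [r for r in rows if r[0] not in idle]
    total = sum(r[2] for r in busy)
    ranked = sorted(busy, key=lambda r: r[2], reverse=True)
    return [(name, time_waited / total) for name, _, time_waited in ranked]


if __name__ == "__main__":
    for name, share in rank_by_time_waited(ROWS):
        print(f"{name:<30} {share:6.1%}")
```

Ranked this way, 'db file sequential read' moves to the top of these rows, while 'control file parallel write', first by average wait, accounts for a much smaller share of the total. Either ordering is still a system-wide aggregate since instance startup.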
Here is the environment:
1) all the file systems for the database, including dump directories are in a single disk volume group,
2) all redo logs and control files are spread among all the other database files,
3) Hitachi array is in use with nothing but RAID-5 for all files (redo as well),
4) the real hard drives within the array are either shared with other databases on the same server or with other servers,
5) redo logs are of 100MB size and switch 20+ times/hour when some of the batch processes run in the evening,
6) no changes are allowed to any SQL code, Pro*COBOL code that use 'COPYBOOKs' (Remember those?) to interact with tables at single row level (no array processing) using routines with bunch of parameters (call insert... call update... call delete...),
7) the array has 32GB of NV cache and that's the max it can have (the DB is 180GB, there are 3 other similar ones from just this server).
Now the 'icing on the cake':
The server has 3 other critical databases. All 4 running in archive log mode. All share the *same* archive log destination. And all databases are expected to have same amount of batch processing. The archive log destination is 8GB in size on the 2nd VG. The DB in question, generated 1.8GB to 2+GB of logs in less than an hour during batch processing. At times our automated archived log siphoning process encounters some bottlenecks from our single IBM/Tivoli TSM Server where the logs are deposited before those are purged from archive log destination...
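A rough back-of-the-envelope check of these figures, as a Python sketch. It assumes the worst case: all four databases generate redo at the same rate at the same time, and nothing is purged from the archive destination in the meantime. Neither assumption is stated above; the function names are made up for this illustration.

```python
def redo_mb_per_hour(log_size_mb, switches_per_hour):
    """Redo volume implied by the log size and the switch rate."""
    return log_size_mb * switches_per_hour


def hours_until_full(dest_gb, databases, gb_per_hour_each):
    """Time to fill the archive destination if nothing is purged.

    Assumes every database archives at the same steady rate concurrently.
    """
    return dest_gb / (databases * gb_per_hour_each)


if __name__ == "__main__":
    # 100MB logs switching 20 times an hour
    print(redo_mb_per_hour(100, 20), "MB of redo per hour")
    # 8GB destination shared by 4 databases at 1.8 and 2.0 GB/hour each
    for rate in (1.8, 2.0):
        print(f"{rate} GB/hour each: full in "
              f"{hours_until_full(8, 4, rate):.2f} hours")
```

Under those assumptions, 100MB logs switching 20 times an hour give about 2GB of redo per hour, in line with the 1.8GB to 2+GB observed, and four such databases would fill the 8GB destination in roughly an hour if the siphoning process stalled.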
I was also informed that I will not have much chance to bring about any changes in the environment described above. Because, I was told, ...it is the corporate decision to use RAID-5 with HDS array and it is 'the most cost effective way to address our storage needs'.... and a single VG per database helps UNIX support to implement HACMP with much ease... and we can not meet our published deadlines if we made any changes and spent time in testing those unscheduled changes...... yadi yadi yada....
--
Please see the official ORACLE-L FAQ: http://www.orafaq.com
--
Author: Cary Millsap
  INET: cary.millsap_at_hotsos.com
Author: Deshpande, Kirti
  INET: kirti.deshpande_at_verizon.com

Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
San Diego, California -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from). You may
also send the HELP command for other information (like subscribing).

Received on Mon Aug 12 2002 - 17:08:22 CDT