Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Locating a bad disk on Solaris w/Veritas
Hi all,
I posted a few weeks ago about some intermittent/inconsistent performance problems we were having for queries that didn't make sense compared to the reported i/o usage for these queries. I've since learned how to use the extended trace facility (10046/12), and I've learned more about the problem, but I've run up against another wall.
What I've found is that our app is spending an inordinate amount of time fetching just one block from the disk. For example, in some slow running candidates, I've seen lines like this:
PARSING IN CURSOR #4 len=1125 dep=0 uid=24 oct=3 lid=24
tim=678997477414 hv=3589042826 ad='cceb3b00'
SELECT /*+ ORDERED */sc.IdentifierId, sc.Coefficient,
fact.ExposureAnalysisId, DataSet FROM ExposureToolModel etm ,
ExposureTool et , ExposureAnalysis fact, ExposureAnalysisDat
a ead , ScalarCoefficient sc WHERE fact.ExposureToolId = 183 AND
fact.RecipeId = 28959 AND fact.MaskId = 23514 AND fact.LayerId = 4130
AND fact.RouteId = 27327 AND fact.Operat
ionId = 51086 AND fact.LotId IN (SELECT l.LotId FROM dual fact, Lot l
WHERE l.DeviceId = 6332 AND l.TechnologyId = 4049 ) AND
fact.ExposureToolId = et.ExposureToolId (+) AND e
t.ExposureToolModelId = etm.ExposureToolModelId (+) AND
etm.ExposureToolType= 'scanner' AND et.ExposureToolId = 183 AND
etm.VendorMake= 'asml' AND etm.VendorModel= 'AT/750B' AN
D fact.ExposeTime + 1 - 1 >= '2004.08.24 11:17:34' AND fact.ExposeTime
+ 1 - 1 <= '2004.10.23 11:17:34' AND fact.ExposeHash >= 1698 AND
fact.ExposeHash <= 1758 AND fact.Exposur
eAnalysisId = ead.ExposureAnalysisId AND fact.FileSpec= 'Feedback' AND
fact.ScriptCategoryId = 100 AND fact.ProcessStatus = 'D' AND
ead.DataSet IS NULL AND sc.DataSetId = ead.D
ataSetId ORDER BY fact.ExposeTime desc, fact.ExposureAnalysisId desc
END OF STMT
PARSE #4:c=50000,e=46837,p=0,cr=9,cu=0,mis=1,r=0,dep=0,og=4,tim=678997477399
WAIT #4: nam='SQL*Net message to client' ela= 4 p1=1413697536 p2=1
p3=0
WAIT #4: nam='SQL*Net message from client' ela= 1110 p1=1413697536
p2=1 p3=0
EXEC #4:c=0,e=1135,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=678997480425
WAIT #4: nam='SQL*Net message to client' ela= 9 p1=1413697536 p2=1
p3=0
*** 2004-10-23 11:18:10.028
****** LOOK HERE *****
WAIT #4: nam='db file sequential read' ela= 15714497 p1=4 p2=94824
p3=1
****** LOOK HERE *****
WAIT #4: nam='db file sequential read' ela= 9172 p1=3 p2=404516 p3=1 WAIT #4: nam='db file sequential read' ela= 9519 p1=4 p2=198168 p3=1 WAIT #4: nam='db file sequential read' ela= 8957 p1=3 p2=823622 p3=1 WAIT #4: nam='db file sequential read' ela= 3159 p1=4 p2=857461 p3=1 WAIT #4: nam='db file sequential read' ela= 7666 p1=3 p2=415264 p3=1 WAIT #4: nam='db file sequential read' ela= 10128 p1=4 p2=849608 p3=1 WAIT #4: nam='db file sequential read' ela= 1123429 p1=3 p2=1082960p3=1
WAIT #4: nam='db file sequential read' ela= 7294 p1=3 p2=825174 p3=1 WAIT #4: nam='db file sequential read' ela= 13880 p1=4 p2=857419 p3=1 WAIT #4: nam='db file sequential read' ela= 9227 p1=3 p2=415334 p3=1 WAIT #4: nam='db file sequential read' ela= 21533 p1=4 p2=853244 p3=1 WAIT #4: nam='db file sequential read' ela= 7562 p1=3 p2=1084013 p3=1 WAIT #4: nam='db file sequential read' ela= 9635 p1=3 p2=785154 p3=1 WAIT #4: nam='db file sequential read' ela= 17597 p1=4 p2=198519 p3=1 WAIT #4: nam='db file sequential read' ela= 10863 p1=4 p2=857465 p3=1 WAIT #4: nam='db file sequential read' ela= 26288 p1=3 p2=415301 p3=1 WAIT #4: nam='db file sequential read' ela= 21145 p1=4 p2=852599 p3=1 WAIT #4: nam='db file sequential read' ela= 14026 p1=3 p2=1084347 p3=1 WAIT #4: nam='db file sequential read' ela= 9997 p1=3 p2=1185112 p3=1FETCH #4:c=30000,e=17079350,p=20,cr=222,cu=0,mis=0,r=1,dep=0,og=4,tim=679014560511 WAIT #4: nam='SQL*Net message from client' ela= 1838 p1=1413697536 p2=1 p3=0
STAT #4 id=12 cnt=1 pid=11 pos=1 obj=6592 op='TABLE ACCESS BY INDEX ROWID EXPOSURETOOL (cr=2 r=0 w=0 time=52 us)' STAT #4 id=13 cnt=1 pid=12 pos=1 obj=7493 op='INDEX UNIQUE SCANPKEXPOSURETOOL (cr=1 r=0 w=0 time=29 us)' STAT #4 id=14 cnt=24 pid=11 pos=2 obj=0 op='MERGE JOIN CARTESIAN
STAT #4 id=19 cnt=24 pid=18 pos=1 obj=0 op='BITMAP CONVERSION TO ROWIDS (cr=4 r=0 w=0 time=320 us)' STAT #4 id=20 cnt=1 pid=19 pos=1 obj=0 op='BITMAP AND (cr=4 r=0 w=0time=283 us)'
STAT #4 id=21 cnt=1 pid=20 pos=1 obj=0 op='BITMAP CONVERSION FROM ROWIDS (cr=2 r=0 w=0 time=125 us)' STAT #4 id=23 cnt=1 pid=20 pos=2 obj=0 op='BITMAP CONVERSION FROM ROWIDS (cr=2 r=0 w=0 time=120 us)' STAT #4 id=25 cnt=24 pid=17 pos=2 obj=0 op='BUFFER SORT (cr=3 r=0 w=0time=264 us)'
STAT #4 id=29 cnt=1 pid=28 pos=1 obj=7500 op='INDEX UNIQUE SCAN AKEXPOSURETOOLMODEL (cr=1 r=0 w=0 time=15 us)' STAT #4 id=30 cnt=10 pid=4 pos=2 obj=6584 op='TABLE ACCESS BY INDEXROWID EXPOSUREANALYSISDATA (cr=32 r=8 w=0 time=15797744 us)'
STAT #4 id=31 cnt=10 pid=30 pos=1 obj=7471 op='INDEX RANGE SCAN PKEXPOSUREANALYSISDATA (cr=22 r=4 w=0 time=15744043 us)' STAT #4 id=32 cnt=100 pid=3 pos=2 obj=7590 op='INDEX RANGE SCANPKSCALARCOEFFICIENT (cr=32 r=5 w=0 time=81727 us)'
Note that we're paying almost 16 seconds to fetch a single block from the disk.
The problem is, I'm struggling to figure out why. At the time of this call, there is nothing else going on in our application that is accessing the db - certainly nothing that could block the system for over 10 seconds at a time. I need to find what it is that's standing in between Oracle and the bits on the platters of our disks. Is it a bad disk? Some Veritas weirdness? Some job that's starting up that we've overlooked?
We're running Oracle 9.2.0.4.0 on Solaris 8 and our RAID is managed by
Veritas. Is there some sort of low-level tools we can run that would
determine if we have some bum disks or if there is suspicious access
of the disks at certain times? I've used sar, but I don't know how to
get it to give information that is substantially different from iostat
(i.e. just aggregated statistics & averages).
Thanks,
-Steve
Received on Mon Oct 25 2004 - 16:18:53 CDT