Tuning question - low cpu_time, high elapsed_time - what gives? (c.d.o.server)
Hi all,
We're running a 9.2.0.4.0 instance on a 4-processor Solaris box, and I'm up against an optimization mystery. Our application involves a variety of operations - constant insertion and querying - although it's the queries whose performance we're mainly concerned about.
Here's the deal: somewhere between about once a second and 10 times per second, we run a query that joins a main table (Analysis) to 1 or 2 dimension tables, and about a third of the time also joins against a detail table (Coefficient). Analysis has about 100K rows and Coefficient has about 30 million rows. If we join to Coefficient, we typically retrieve 1-10 rows in Coefficient for each row in Analysis. All values being returned are NUMBER columns. The constraints on Analysis can be broad or narrow, but the clients executing the queries never fetch more than the first Analysis rows of the result (we only care about recent data). All queries are done with dynamic SQL due to the nature of the constraints (there is no single query with bind placeholders that would run efficiently for all cases; hints are computed dynamically based on the query; etc.), so we always incur parse time.
Most of the time these queries run efficiently - less than 500 buffer_gets and 0 disk_reads, occasionally peaking at 4,000 gets (5% of the time or less), and maybe 10-15% of the time we get 5-20 disk reads - and the elapsed_time on these queries is under .15 seconds. Maybe a few times a day we have some queries that will ultimately need to be optimized (>20,000 gets). However, occasionally - maybe 3% of all queries run - we get elapsed times that exceed 5 seconds, even though the other statistics on the query look fine.
For example, here are some stats from some of our bad queries, as well as good ones (the last two rows):
BUFFER_GETS DISK_READS ELAPSED_TIME SHARABLE_MEM PERSISTENT_MEM RUNTIME_MEM SORTS FETCHES ROWS_PROCESSED OPTIMIZER_COST CPU_TIME CHILD_LATCH
----------- ---------- ------------ ------------ -------------- ----------- ----- ------- -------------- -------------- -------- -----------
        675          1      6782682        86744           2792      210976     4       2             33           2177   100000           5
        623          1      6734890        87947           2792      210976     4       2             33           2177   110000           1
        577          1      6507345        95338           1472      115304     4       2             33         239352   120000           3
       2094         50      4796384        93897           1288      169480     1      10            288             28   250000           4
        308          5       483892        94505           1472      115304     4       2              6         239352   110000           3
        419         53       456903        91949           1288      169480     1      10            288             28   100000           5
All other stats are the same. As you can see, for the "bad" queries, all the statistics other than elapsed_time fall within the same range as the good ones.
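In case it helps to know where these numbers come from: they're the v$sql columns, pulled with something along these lines (a sketch only - the filter on sql_text is just a stand-in for however we pick out the statements of interest):

-- Sketch: same columns as the listing above, straight out of v$sql (9.2).
-- The LIKE pattern is a placeholder, not our real predicate.
SELECT buffer_gets, disk_reads, elapsed_time,
       sharable_mem, persistent_mem, runtime_mem,
       sorts, fetches, rows_processed,
       optimizer_cost, cpu_time, child_latch
FROM v$sql
WHERE sql_text LIKE 'SELECT /*+ ORDERED */%ScalarCoefficient%'
ORDER BY elapsed_time DESC;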
During the times when these unfortunate queries are running, there are other statements executing on the box, perhaps somewhat more than at other times, but we have yet to correlate any load patterns with the periods of slowness. Additionally, we are using Resource Manager consumer groups, and the resource plan is set to favor these particular queries.
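Even though the plan favors these queries, one thing I still want to rule out is Resource Manager itself throttling CPU during the busy spells. A rough check I have in mind is to watch the consumer group stats while a slow query runs (sketch only):

-- Rough check for Resource Manager CPU throttling (9.2).
-- If cpu_waits / cpu_wait_time climb for our group while the slow
-- queries run, the plan rather than the SQL is the suspect.
SELECT name, active_sessions, cpu_waits, cpu_wait_time, consumed_cpu_time
FROM v$rsrc_consumer_group;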
For anyone's curiosity, here's an example of one of these queries:
SELECT /*+ ORDERED */
sc.IdentifierId
, sc.Coefficient
, fact.ExposureAnalysisId
, DataSet
FROM ExposureToolModel etm
, ExposureTool et
, ExposureAnalysis fact
, ExposureAnalysisData ead
, ScalarCoefficient sc
WHERE fact.ExposureToolId = 301
  AND fact.ScriptCategoryId = 100
  AND fact.RecipeId = 604
  AND fact.MaskId = 40733
  AND fact.LayerId = 230
  AND fact.RouteId = 27233
  AND fact.OperationId = 36086
  AND fact.LotId IN (SELECT l.LotId
                     FROM dual fact, Lot l
                     WHERE l.DeviceId = 6552
                       AND l.TechnologyId = 3845)
  AND fact.ExposureToolId = et.ExposureToolId(+)
  AND et.ExposureToolModelId = etm.ExposureToolModelId(+)
  AND etm.ExposureToolType = 'scanner'
  AND fact.ExposeTime + 1 - 1 >= '2004.08.12 16:36:50'
  AND fact.ExposeTime + 1 - 1 <= '2004.10.11 16:36:50'
  AND fact.ExposeHash >= 1686
  AND fact.ExposeHash <= 1746
  AND fact.FileSpec = 'Feedback'
  AND fact.ProcessStatus = 'D'
, fact.ExposureAnalysisId DESC;
Pretty typical, and I can tell you that this particular query returns very few rows - the subquery in the where clause returns a single row, and all columns are indexed (bitmap indexes, in fact, on all the columns that have numeric constraints and relatively few distinct values).
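Just to show the shape of the indexing (the names here are made up for illustration, not our real ones):

-- Hypothetical names, but this is the style of index on the fact table:
-- bitmaps on the low-cardinality NUMBER columns used in the constraints.
CREATE BITMAP INDEX ea_exposuretoolid_bix ON ExposureAnalysis (ExposureToolId);
CREATE BITMAP INDEX ea_recipeid_bix ON ExposureAnalysis (RecipeId);
CREATE BITMAP INDEX ea_layerid_bix ON ExposureAnalysis (LayerId);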
We have run statspack against the instance as well, but I'm not seeing any smoking guns in the tea leaves. The alert log for the instance doesn't show any unusual activity (log switching, etc.) at the time of these operations, and we are not swapping memory.
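One thing I haven't tried yet is extended SQL trace on a session that hits the slowdown, to see which wait event the missing seconds are charged to. From another session, something like this (assuming access to DBMS_SYSTEM; the sid and serial# below are placeholders):

-- Turn on event 10046 at level 12 (waits + binds) for a suspect session,
-- then run the trace file through tkprof to see where elapsed time goes.
-- 1234 / 5678 are placeholder SID and SERIAL# values.
EXECUTE sys.dbms_system.set_ev(1234, 5678, 10046, 12, '');
-- ...let the slow queries run, then switch the trace off:
EXECUTE sys.dbms_system.set_ev(1234, 5678, 10046, 0, '');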
My only other guess as to what might be going on: since the Analysis table has bitmap indexes on it, it may be that an insert into that table is happening just as we are querying it, causing the bitmap indexes to be updated. Could it be that the table is exclusively locked for an instant during this step and the select queries are blocked by the insert process?
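If that's what is happening, I'd expect to be able to catch a blocker at the moment of slowness with a rough check like this:

-- Rough blocking-lock check: sessions holding a lock (block = 1) that
-- some other session is waiting on (request > 0) for the same resource.
SELECT holder.sid AS blocking_sid,
       waiter.sid AS waiting_sid,
       holder.type, holder.id1, holder.id2
FROM v$lock holder, v$lock waiter
WHERE holder.block = 1
  AND waiter.request > 0
  AND holder.id1 = waiter.id1
  AND holder.id2 = waiter.id2;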
Does anyone out there have any ideas as to where I might start looking to figure out why we seem to be getting this blocking behavior?
Thanks,
-Steve

Received on Mon Oct 11 2004 - 15:56:30 CDT