Home » RDBMS Server » Performance Tuning » Proving OTT undo reads (In post)
Proving OTT undo reads [message #528664] |
Wed, 26 October 2011 05:59 |
Roachcoach
Messages: 1576 Registered: May 2010 Location: UK
|
Senior Member |
|
|
Version:
mkr@PRD> select * from v$version
2 /
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE 9.2.0.8.0 Production
TNS for Solaris: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production
I've had to chop/modify the output a little for linesizes, but the key detail is here, not much I could do with the xplan however.
Xplan:
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31 | 7471 | 766K (1)| | | |
| 1 | SORT GROUP BY | | 31 | 7471 | 766K (1)| 69,08 | P->S | QC (RAND) |
|* 2 | HASH JOIN | | 31 | 7471 | 766K (1)| 69,07 | P->P | HASH |
|* 3 | HASH JOIN | | 31 | 7254 | 766K (1)| 69,07 | PCWP | |
|* 4 | FILTER | | | | | 69,07 | PCWC | |
|* 5 | HASH JOIN OUTER | | | | | 69,05 | P->P | BROADCAST |
|* 6 | HASH JOIN | | 31 | 5208 | 630K (1)| 69,03 | P->P | HASH |
|* 7 | HASH JOIN | | 26 | 2782 | 599K (1)| 69,02 | P->P | BROADCAST |
|* 8 | HASH JOIN | | 83 | 4648 | 520K (1)| 69,01 | P->P | BROADCAST |
|* 9 | TABLE ACCESS FULL| CATALOGUEPRODUCT | 9 | 216 | 12307 (1)| 69,00 | S->P | BROADCAST |
|* 10 | TABLE ACCESS FULL| PORTFOLIOPRODUCT | 8425K| 257M| 507K (1)| 69,01 | PCWP | |
|* 11 | TABLE ACCESS FULL | SUBSCRIPTION | 2285K| 111M| 79287 (1)| 69,02 | PCWP | |
|* 12 | TABLE ACCESS FULL | CLAWBACKITEMS | 3356K| 195M| 30951 (1)| 69,03 | PCWP | |
|* 13 | INDEX FAST FULL SCAN| PK_CLAWBACKEVENTS | 708K| 21M| 17692 (1)| 69,04 | P->P | HASH |
| 14 | TABLE ACCESS FULL | SUBSCRIPTIONAGREEMENTITEM | 65M| 2126M| 117K (1)| 69,07 | PCWP | |
| 15 | TABLE ACCESS FULL | SUBSCRIPTIONAGREEMENT | 31 | 217 | 2 (50)| 69,06 | P->P | BROADCAST |
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SAI"."SUBSCRIPTIONAGREEMENTID"="SA"."ID")
3 - access("SAI"."SUBSCRIPTIONID"="S"."ID")
4 - filter("CE"."EVENTCODE" IS NULL)
5 - access("CE"."CLAWBACKID"(+)="CI"."ID")
6 - access("CI"."SUBSCRIPTIONID"="S"."ID")
7 - access("S"."SERVICEINSTANCEID"="PP"."SERVICEINSTANCEID")
8 - access("PP"."CATALOGUEPRODUCTID"="CP"."ID")
9 - filter("CP"."SUBSCRIPTIONTYPE"='4' AND ("CP"."SERVICETYPECODE"='13' OR "CP"."SERVICETYPECODE"='8'))
10 - filter("PP"."STATUS"='EN')
11 - filter(("S"."SUBSCRIPTIONTYPEID"='1' OR "S"."SUBSCRIPTIONTYPEID"='2') AND ("S"."STATUS"='AB' OR "S"."STATUS"='PA' OR
"S"."STATUS"='PO' OR "S"."STATUS"='SC'))
12 - filter("CI"."EXEMPTIONREASONCODE" IS NULL AND "CI"."EXEMPTIONDATE" IS NULL)
13 - filter("CE"."EVENTCODE"(+)='EXEMPT')
38 rows selected.
Elapsed: 00:00:01.23
Now then, with the Xplan out of the way, the background:
The above usually runs pretty quickly (in the 30-45 minute range) but today folks let me know it was dragging on for nearly 4 hours and has downstream impacts of a slow run so I started digging and found the below:
Checking live waits:
mkr@PRD> select
2 s.sid,
3 --s.ROW_WAIT_OBJ#,
4 sw.event,
5 --s.ROW_WAIT_FILE#,
6 --s.ROW_WAIT_BLOCK#,
7 ts.name tsname,
8 dbo.object_name,
9 dbo.object_type,
10 sw.seconds_in_wait time
11 from v$session s
12 join v$session_wait sw
13 on s.sid=sw.sid
14 join v$datafile df
15 on s.ROW_WAIT_FILE#=df.file#
16 join v$tablespace ts
17 on df.ts#=ts.ts#
18 left join dba_objects dbo
19 on dbo.object_id = s.ROW_WAIT_OBJ#
20 where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
21 /
old 20: where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
new 20: where s.sid in (select sid from v$px_session where qcsid=1170) or s.sid=1170
SID EVENT TSNAME OBJECT_NAME OBJECT_TYP TIME
---------- ------------------------------ ------------------- ---------------------------- ---------- ----------
1170 PX Deq: Execute Reply DATA_0001_01 CATALOGUEPRODUCT TABLE 0
440 buffer busy waits UNDOTBS 0
2644 buffer busy waits UNDOTBS 0
3813 buffer busy waits UNDOTBS 0
6187 buffer busy waits UNDOTBS 0
6240 latch free UNDOTBS 0
6 rows selected.
Elapsed: 00:01:25.62
mkr@PRD> /
old 20: where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
new 20: where s.sid in (select sid from v$px_session where qcsid=1170) or s.sid=1170
SID EVENT TSNAME OBJECT_NAME OBJECT_TYP TIME
---------- ------------------------------ -------------- --------------------------------- ---------- ----------
1170 PX Deq: Execute Reply DATA_0001_01 CATALOGUEPRODUCT TABLE 1
440 buffer busy waits UNDOTBS 0
2644 db file sequential read UNDOTBS 0
3813 buffer busy waits UNDOTBS 0
6187 db file sequential read UNDOTBS 0
6240 buffer busy waits UNDOTBS 0
6 rows selected.
Elapsed: 00:01:31.33
mkr@PRD> /
old 20: where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
new 20: where s.sid in (select sid from v$px_session where qcsid=1170) or s.sid=1170
SID EVENT TSNAME OBJECT_NAME OBJECT_TYP TIME
---------- ------------------------------ ----------------- ------------------------------ ---------- ----------
1170 PX Deq: Execute Reply DATA_0001_01 CATALOGUEPRODUCT TABLE 1
440 db file sequential read UNDOTBS 0
2644 db file sequential read UNDOTBS 0
3813 db file sequential read UNDOTBS 0
6187 db file sequential read UNDOTBS 0
6240 db file sequential read UNDOTBS 0
6 rows selected.
Elapsed: 00:01:43.65
Checking session events:
mkr@PRD> select * from (
select * from v$session_event where sid in (select sid from v$px_session where qcsid=&sid) or sid=&sid
order by time_waited desc)
where rownum <21
/ 2 3 4 5
SID EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT MAX_WAIT
---------- ------------------------------ ----------- -------------- ----------- ------------ ----------
1170 PX Deq: Execute Reply 6816 6336 1276213 187 198
1342 PX Deq: Table Q Normal 82667 4949 1273181 15 197
50 PX Deq: Table Q Normal 82491 4948 1272953 15 197
6159 PX Deq: Table Q Normal 82704 4948 1272878 15 197
1000 PX Deq: Table Q Normal 82478 4948 1272587 15 197
642 PX Deq: Table Q Normal 82468 4948 1272547 15 197
6187 db file sequential read 1768610 0 1034154 1 254
440 db file sequential read 1786564 0 1033784 1 302
3813 db file sequential read 1764778 0 1028883 1 161
6240 db file sequential read 1757146 0 1026620 1 104
2644 db file sequential read 1757390 0 1017182 1 348
2644 buffer busy waits 235996 7 135121 1 98
6240 buffer busy waits 228002 5 128353 1 98
3813 buffer busy waits 219779 1 125131 1 98
440 buffer busy waits 220574 2 122776 1 98
6187 buffer busy waits 216293 1 120654 1 98
2644 direct path read 142094 0 71139 1 159
3813 direct path read 139584 0 70431 1 111
6240 direct path read 129814 0 69788 1 187
6187 direct path read 138785 0 69043 0 184
20 rows selected.
Elapsed: 00:00:03.56
old 2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=&sid) or sid=&sid
new 2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=1170) or sid=1170
SID EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT MAX_WAIT
---------- ------------------------------ ----------- -------------- ----------- ------------ ----------
1170 PX Deq: Execute Reply 7528 7020 1412149 188 198
1342 PX Deq: Table Q Normal 90614 5467 1408716 16 197
6159 PX Deq: Table Q Normal 90652 5467 1408664 16 197
50 PX Deq: Table Q Normal 90429 5466 1408400 16 197
1000 PX Deq: Table Q Normal 90381 5466 1407847 16 197
642 PX Deq: Table Q Normal 90346 5466 1407661 16 197
440 db file sequential read 1930671 0 1116983 1 302
6187 db file sequential read 1903706 0 1108608 1 254
2644 db file sequential read 1904438 0 1100081 1 348
6240 db file sequential read 1877216 0 1097300 1 104
3813 db file sequential read 1878367 0 1094915 1 161
2644 buffer busy waits 306781 7 175494 1 98
6240 buffer busy waits 293760 5 165220 1 98
3813 buffer busy waits 285398 1 162639 1 98
440 buffer busy waits 290514 2 162497 1 98
6187 buffer busy waits 268801 1 149951 1 98
3813 direct path read 148608 0 72280 0 111
2644 direct path read 149286 0 72159 0 159
6240 direct path read 139591 0 71256 1 187
6187 direct path read 143816 0 69602 0 184
20 rows selected.
Elapsed: 00:00:03.35
Now those are samples of typical results, I took more samples than posted and saw no great variations. My theory is that those wait events are not in line with that plan, as such, and show a query spending an unacceptable* amount of time in undo reads.
There are LOTS of other jobs going on in parallel to this but a recent one has been added that starts before and ends after this one starts which does a ton of updates to a table in the query.
My questions are: Are my queries/checks on the job activity solid and if so, does it appear in the information to hand that my theory correct - that this query is spending one hell of a long time reading undo, one block at a time?
I'm relatively confident but before I'd value some input if there is anything else I can do to prove this or indeed if I'm basically just going of in the wrong direction. Is there any easy way to prove this after the fact, or do I really need to be tracing it for any measurable retrospective 'proof'?
I realise the data here may be a bit spartan, but it is all I have at the moment.
*Obviously I realise the notion of 'excessive' undo reads is a false one, but if one delays this job until the big bulk updates are done, then this query can crack on with proper direct path reads and not spend all its damned time to single block reads of undo.
|
|
|
Re: Proving OTT undo reads [message #528674 is a reply to message #528664] |
Wed, 26 October 2011 06:28 |
John Watson
Messages: 8964 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Remember that when you do a parallel full table scan, Oracle uses direct reads. Therefore, the object must be checkpointed first. I'm sure you knew that, but I always like to mention it because a lot of programmers don't realize this. Then the parallel direct read can proceed. But if the checkpoint includes writing uncommitted changes to the datafiles, whenever the scan hits one of those blocks, it must revert to a read via cache so that a past-image can be constructed in the SGA. Then the scan can continue.
A similar effect occurs with delayed block clean-out. With 11g, if a scan hits a block that has been updated and was written to disc before the commit, the scanning process checks the undo segments to see if the transaction has committed, and then "remembers" the XID for all subsequent blocks that need clean-out, so the interruption occurs only once per transaction. I think that back in 9i, this also caused a interruption for every block.
It looks to me as though your scan is being continually interrupted for these reasons. Normally, it wouldn't be too bad because the undo segment should be in memory: I'm surprised at undo having to be read from disc. If this hypothesis is correct, it could result from your new job that is running during this query, particularly if the transaction(s) is so long lasting that the undo blocks are being flushed to disc.
Basically, parallel FTS with concurrent DML is often disastrous. I've come across this with Exadata, where smart scans can be crippled by it.
I may be wrong, but I think your analysis of the situation is probably correct and I've just tried to formalize the issue.
Hope this helps.
edit: typo
[Updated on: Wed, 26 October 2011 06:29] Report message to a moderator
|
|
|
|
Re: Proving OTT undo reads [message #528713 is a reply to message #528679] |
Wed, 26 October 2011 08:50 |
Roachcoach
Messages: 1576 Registered: May 2010 Location: UK
|
Senior Member |
|
|
By means of follow up - the job had been killed and restarted and is looking significantly more like I'd expect:
old 2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=&sid) or sid=&sid
new 2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=4423) or sid=4423
SID EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT MAX_WAIT
---------- ------------------------------ ----------- -------------- ----------- ------------ ----------
4423 PX Deq: Execute Reply 646 121 59270 92 195
2378 PX Deq: Table Q Normal 82871 0 54230 1 79
125 PX Deq: Table Q Normal 82428 0 53309 1 43
1473 PX Deq: Table Q Normal 81328 0 52952 1 77
2882 PX Deq: Table Q Normal 81731 0 52721 1 77
4520 PX Deq: Table Q Normal 81333 0 52709 1 57
383 direct path read 141188 0 43855 0 68
279 direct path read 149906 0 42747 0 112
3856 direct path read 145727 0 42249 0 86
1168 direct path read 149655 0 42178 0 44
2571 direct path read 145821 0 41773 0 58
3856 PX Deq: Table Q Normal 51 24 6340 124 196
2571 PX Deq: Table Q Normal 50 24 6340 127 195
1168 PX Deq: Table Q Normal 50 24 6297 126 195
383 PX Deq: Table Q Normal 47 24 6240 133 196
279 PX Deq: Table Q Normal 48 24 6239 130 196
2378 PX Deq: Execution Msg 30 24 4863 162 195
4520 PX Deq: Execution Msg 29 23 4773 165 195
1473 PX Deq: Execution Msg 29 23 4729 163 195
125 PX Deq: Execution Msg 27 22 4479 166 195
Stats pack also shows undotbs reads are down 50% from the earlier problem window as a whole (600k reads per window down from 1.2m).
Now then...to find the root cause of the transaction(s) pushing it into the undo reads...
|
|
|
Goto Forum:
Current Time: Sat Jan 25 11:11:19 CST 2025
|