Tracing Parallel Execution
If you're looking to tune an SQL statement or a batch job, a common way to find out what happened during the execution of the SQL is to run an extended SQL trace and examine the wait events. But what happens if you are using parallel execution, and all your trace file contains is the parallel execution wait events that are generally considered idle events? Your trace file shows how long your query took to run, and the work involved in controlling the PQ slaves, but the real details of what took up all the execution time are actually to be found in the corresponding PQ slave trace files in the BDUMP directory.
To take an example, I create a copy of the SH.SALES table and enable parallel DML with a degree of parallelism of 2:
SQL*Plus: Release 10.1.0.4.0 - Production on Tue Apr 26 19:49:45 2005 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Production With the Partitioning, OLAP and Data Mining options SQL> create table sales_copy as select * from sales; Table created. SQL> alter table sales_copy parallel 2; Table altered.
I then query the parallel_min_servers parameter to make sure some PQ slaves are available:
SQL> show parameter parallel_min_servers NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ parallel_min_servers integer 4 SQL>
Now, I turn on extended SQL tracing for my session:
SQL> alter session set timed_statistics = true; Session altered. SQL> alter session set max_dump_file_size = unlimited; Session altered. SQL> alter session set tracefile_identifier = 'px_test1'; Session altered. SQL> alter session set events = '10046 trace name context forever, level 8'; Session altered.
Now, I run my SELECT statement against the SALES_COPY table and the
PRODUCTS table;
SQL> select p.prod_name, sum(s.amount_sold) 2 from products p, sales_copy s 3 where p.prod_id = s.prod_id 4 group by p.prod_name; PROD_NAME SUM(S.AMOUNT_SOLD) -------------------------------------------------- ------------------ 128MB Memory Card 571333.75 17' LCD w/built-in HDTV Tuner 7189171.77 18' Flat Panel Graphics Monitor 5498727.81 3 1/2' Bulk diskettes, Box of 100 370204.56 3 1/2' Bulk diskettes, Box of 50 254128.79 Adventures with Numbers 175563.92 CD-R Mini Discs 384553.62 CD-R with Jewel Cases, pACK OF 12 170405.76 CD-RW, High Speed Pack of 5 230233.35 CD-RW, High Speed, Pack of 10 106468.41 DVD-R Discs, 4.7GB, Pack of 5 904537.2 ... Y Box 2082330.3 71 rows selected.
Then turn off extended SQL trace:
SQL> alter session set events '10046 trace name context off'; Session altered.
Now, if I format the trace file using TKPROF
C:\oracle\admin\markr10g\udump>tkprof markr10g_ora_11184_px_test1.trc markr10g_ora_11184_px_test1.txt TKPROF: Release 10.1.0.4.0 - Production on Tue Apr 26 20:09:12 2005 Copyright (c) 1982, 2004, Oracle. All rights reserved.
and then take a look at the part of the trace file that relates to my query, I see the following wait events:
select p.prod_name, sum(s.amount_sold) from products p, sales_copy s where p.prod_id = s.prod_id group by p.prod_name call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.01 1.43 851 87 0 0 Execute 1 0.00 0.07 0 3 0 0 Fetch 6 0.00 3.14 3 4 0 71 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 8 0.01 4.65 854 94 0 71 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: 61 Rows Row Source Operation ------- --------------------------------------------------- 71 PX COORDINATOR (cr=7 pr=3 pw=0 time=3110386 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 HASH JOIN (cr=0 pr=0 pw=0 time=0 us) 0 BUFFER SORT (cr=0 pr=0 pw=0 time=0 us) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 72 TABLE ACCESS FULL PRODUCTS (cr=4 pr=3 pw=0 time=15278 us)(Orphan Entry) 0 PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ PX Deq: Join ACK 3 0.00 0.00 PX Deq: Parse Reply 5 0.02 0.04 SQL*Net message to client 6 0.00 0.00 PX Deq: Execute Reply 61 0.20 2.79 PX qref latch 31 0.02 0.22 db file sequential read 1 0.01 0.01 db file scattered read 1 0.00 0.00 SQL*Net message from client 6 126.37 126.40 PX Deq: Signal ACK 4 0.09 0.10 ********************************************************************************
Lots of PX 'idle' events, and SQL*Net message to/from client, but not much on what the query actually waited on. If I take a look in my BDUMP directory though, I've got another four trace files, 2 for the processes that scanned my SALES_COPY table and two that did the sorting because of the join.
C:\oracle\admin\markr10g\bdump>dir Volume in drive C has no label. Volume Serial Number is D47C-241A Directory of C:\oracle\admin\markr10g\bdump 26/04/2005 20:03 <DIR> . 26/04/2005 20:03 <DIR> .. 26/04/2005 19:42 349,515 alert_markr10g.log 26/04/2005 20:03 2,353 markr10g_p000_11560.trc 26/04/2005 20:03 2,228 markr10g_p001_6892.trc 26/04/2005 20:03 8,857 markr10g_p002_7568.trc 26/04/2005 20:03 8,312 markr10g_p003_4672.trc 5 File(s) 371,265 bytes 2 Dir(s) 28,870,373,376 bytes free
Now if I format the P002 trace file using TKPROF and look at the bit relating to my SELECT statement, I get the following output:
select p.prod_name, sum(s.amount_sold) from products p, sales_copy s where p.prod_id = s.prod_id group by p.prod_name call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.99 3.03 2246 2269 0 0 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 2 0.99 3.03 2246 2269 0 0 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 61 (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 0 PX COORDINATOR (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 HASH JOIN (cr=0 pr=0 pw=0 time=0 us) 0 BUFFER SORT (cr=0 pr=0 pw=0 time=0 us) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL PRODUCTS (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ PX Deq: Execution Msg 28 0.08 0.23 PX Deq Credit: send blkd 3 0.00 0.00 PX Deq: Table Q Normal 3 0.01 0.01 PX qref latch 2 0.00 0.00 direct path read 74 0.13 1.58
Note the 'direct path read' waits. The P003 trace file had the same profile. Also, if I take a look at the P000 and P001 trace files, which presumably are doing the sorting, I get the following profile:
select p.prod_name, sum(s.amount_sold) from products p, sales_copy s where p.prod_id = s.prod_id group by p.prod_name call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 3.06 0 0 0 0 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 2 0.00 3.06 0 0 0 0 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 61 (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 0 PX COORDINATOR (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 HASH JOIN (cr=0 pr=0 pw=0 time=0 us) 0 BUFFER SORT (cr=0 pr=0 pw=0 time=0 us) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL PRODUCTS (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ PX Deq: Execution Msg 2 2.00 2.92 PX Deq: Table Q Normal 3 0.10 0.10 PX Deq Credit: send blkd 2 0.00 0.00 PX qref latch 2 0.00 0.01
So what do I do if I need to get the full picture of all the wait events that have slowed my SQL statement down?
I'm using Oracle 10g, and I remembered from a previous article I'd written that DBMS_MONITOR allows us to set a 'client_id' that can be used, along with a new utility called TRCSESS, to 'stitch together' trace files that are part of one logical session. Using DBMS_MONITOR, I should be able to produce a single trace file that contains all the trace data for my SQL statement.
The first step then is to log on again as SH and this time use DBMS_MONITOR
SQL> exec dbms_session.set_identifier('px_test2'); PL/SQL procedure successfully completed. SQL> alter session set timed_statistics = true; Session altered. SQL> alter session set max_dump_file_size = unlimited; Session altered. SQL> alter session set tracefile_identifier = 'px_test2'; Session altered. SQL> dbms_monitor.client_id_trace_enable(client_id=>'px_test2'); SP2-0734: unknown command beginning 'dbms_monit...' - rest of line ignored. SQL> exec dbms_monitor.client_id_trace_enable(client_id=>'px_test2'); PL/SQL procedure successfully completed.
Note the call to DBMS_SESSION that sets the client_identifier for this session. This client identifier will be used to tag my SQL statements in the SQL trace file. Also, ensure that your user has execute permission on DBMS_MONITOR for this to work.
Now, I run my SELECT statement as before;
SQL> select p.prod_subcategory, sum(s.amount_sold) 2 from products p, sales_copy s 3 where p.prod_id = s.prod_id 4 group by p.prod_subcategory; PROD_SUBCATEGORY SUM(S.AMOUNT_SOLD) -------------------------------------------------- ------------------ CD-ROM 3995633 Camcorders 8314815.4 Camera Batteries 2143810.36 Camera Media 1190971.74 Cameras 6312268.4 Documentation 3721699.61 Game Consoles 2082330.3 Memory 4948442.05 Monitors 12687899.6 Portable PCs 15011642.5 Recordable CDs 2075521.27 PROD_SUBCATEGORY SUM(S.AMOUNT_SOLD) -------------------------------------------------- ------------------ Y Box Games 1550229.35 Accessories 4057301.26 Bulk Pack Diskettes 624333.35 Desktop PCs 5635963.08 Home Audio 10537384.2 Modems/Fax 2185350.59 Operating Systems 3543725.89 Printer Supplies 5244547.32 Recordable DVD Discs 1914390.17 Y Box Accessories 427571.78 21 rows selected.
and then switch off trace:
SQL> exec dbms_monitor.client_id_trace_disable(client_id=>'px_test2'); PL/SQL procedure successfully completed.
If I take a look at the raw trace file for my main session, the one that requested all the PX slaves, I can see the client ID within the trace entries:
Dump file c:\oracle\admin\markr10g\udump\markr10g_ora_11704_px_test2.trc
Tue Apr 26 20:27:11 2005
ORACLE V10.1.0.4.0 - Production vsnsta=0
vsnsql=13 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 1 - type 586
Process Affinity: 0x00000000
Memory (A/P) : PH:561M/2047M, PG:1392M/3945M, VA:1665M/2047M
Instance name: markr10g
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 11704, image: ORACLE.EXE (SHAD)
*** 2005-04-26 20:27:11.455
*** ACTION NAME:() 2005-04-26 20:27:11.445
*** MODULE NAME:(SQL*Plus) 2005-04-26 20:27:11.445
*** SERVICE NAME:(markr10g) 2005-04-26 20:27:11.445
*** CLIENT ID:(px_test2) 2005-04-26 20:27:11.445
*** SESSION ID:(140.5258) 2005-04-26 20:27:11.445
WAIT #2: nam='reliable message' ela= 556 p1=1869466216 p2=1869427312 p3=1870014168
=====================
PARSING IN CURSOR #2 len=72 dep=0 uid=61 oct=47 lid=61 tim=376623839081 hv=4221193623 ad='6db34444'
BEGIN dbms_monitor.client_id_trace_enable(client_id=>'px_test2'); END;
END OF STMT
EXEC #2:c=10014,e=24730,p=5,cr=1,cu=7,mis=1,r=1,dep=0,og=1,tim=376623839073
WAIT #2: nam='log file sync' ela= 660 p1=15 p2=0 p3=0
WAIT #2: nam='SQL*Net message to client' ela= 4 p1=1413697536 p2=1 p3=0
*** 2005-04-26 20:31:13.192
WAIT #2: nam='SQL*Net message from client' ela= 241736360 p1=1413697536 p2=1 p3=0
=====================
...
And I can find the same in the PX slave trace files,
STAT #1 id=12 cnt=0 pid=7 pos=2 obj=0 op='PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)' STAT #1 id=13 cnt=0 pid=12 pos=1 obj=73335 op='TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us)' *** TRACE DUMP CONTINUED FROM FILE c:\oracle\admin\markr10g\bdump\markr10g_p003_4672.trc *** *** 2005-04-26 20:31:14.544 *** SERVICE NAME:(markr10g) 2005-04-26 20:31:14.544 *** CLIENT ID:(px_test2) 2005-04-26 20:31:14.544 *** SESSION ID:(150.6373) 2005-04-26 20:31:14.544 WAIT #0: nam='PX Deq: Msg Fragment' ela= 687 p1=268566527 p2=1 p3=0 ===================== PARSING IN CURSOR #1 len=131 dep=1 uid=61 oct=3 lid=61 tim=376866932282 hv=3562105006 ad='6db3f074' select p.prod_subcategory, sum(s.amount_sold) from products p, sales_copy s where p.prod_id = s.prod_id group by p.prod_subcategory END OF STMT PARSE #1:c=0,e=171,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=376866932270 WAIT #1: nam='PX qref latch' ela= 2777 p1=1 p2=1 p3=1849357532
Note that the SQL statement before the client ID was recorded was for my previous session when I didn't use a client ID.
Now, I can use the TRCSESS utility to create another trace file that contains all the SQL statements carried out by the 'px_test2' client ID. Where there are common SQL statements, such as our SELECT statement above, TKPROF will amalgamate them into a single statement that has multiple executions. TRCSESS is
found in the /bin directory along with TKPROF.
C:\oracle\admin\markr10g\udump>trcsess output = 'markr10g_px_test2_full.trc' clientid = 'px_test2' markr10g_ora_11704_px_test2.trc ../bdump/*.trc
Note here that I've included my main trace file, and all the PX slave trace files in the BDUMP directory, as all of them could contain statements tagged with my client ID.
Now, If I format the 'markr10g_px_test2_full.trc' trace file with TKPROF and take a look at my SELECT statement, I get a view of all of the waits that contributed to my statement execution time.
select p.prod_subcategory, sum(s.amount_sold) from products p, sales_copy s where p.prod_id = s.prod_id group by p.prod_subcategory call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 5 0.00 0.04 0 1 0 0 Execute 5 1.75 22.13 4433 4479 0 0 Fetch 3 0.02 5.58 0 4 0 21 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 13 1.77 27.75 4433 4484 0 21 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: 61 (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 0 PX COORDINATOR (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 HASH JOIN (cr=0 pr=0 pw=0 time=0 us) 0 BUFFER SORT (cr=0 pr=0 pw=0 time=0 us) 0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL PRODUCTS (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry) 0 PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us) 0 TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ PX qref latch 29 0.03 0.23 PX Deq: Execution Msg 54 2.00 11.43 PX Deq: Table Q Normal 6 2.00 2.01 PX Deq Credit: send blkd 12 1.99 3.97 direct path read 145 0.23 4.27 PX Deq: Parse Reply 5 0.12 0.14 SQL*Net message to client 3 0.00 0.00 PX Deq: Execute Reply 55 0.33 3.14 SQL*Net message from client 3 89.12 89.13 PX Deq: Signal ACK 7 0.09 0.20 ********************************************************************************
You will see that the Parse and Execute counts for the statement are now 5, reflecting the fact that the main session then called on 4 PX slaves (2 to scan the SALES_COPY table, 2 to sort the results for the join). There are also a couple of points to note with this technique:
- Firstly, the elapsed time figure is the sum of all the elapsed times for the main session and the PX slaves, i.e. it won't reflect the actual time that the query took to execute. For this figure, you need to use the elapsed time in the main process trace file.
- This is an import one. If you keep executing the same query, with the same client ID, and then use TRCSESS to amalgamate the results, the PX slave trace files will still contain entries for this client ID and this statement from previous executions. This is because the PX slave sessions sit there 'permanently' and wait for parallel queries to run, and will write to the same trace file irrespective of the session ID of the calling session. What this means in reality is that an amalgamated trace file created on say the 10th execution of a statement for a particular client ID will contain all the accumulated waits, execution parse and fetch counts, and elapsed time, for all the previous PX executions of this query. The workaround for this is to ensure that your client ID is unique each execution, perhaps by just adding a number to it each time, either manually or via a sequence.
If you're looking for specific details on the waits that a particular PX slave encountered when executing a query, as opposed to the amalgamated (1 x main) + (4 x PX slaves) details, you can also run TRCSESS on a single PX slave trace file, to extract just those statements that relate to a particular client ID. To do this, run TRCSESS against the particular PX slave trace file, like this:
C:\oracle\admin\markr10g\udump>trcsess output = 'markr10g_p002_7568_full.trc' clientid = 'px_test2' markr10g_p002_7568.trc
I've used this a couple of times now and it seems to work for me. If I've missed anything or misinterpreted a trace feature, drop me a line and I'll incorporate the feedback.
- Mark Rittman's blog
- Log in to post comments
Comments
Vow .. just what i was looking for!! Wonderful! thanks Mark!!