SubQuery - Joins performance compare [message #580070] |
Tue, 19 March 2013 18:17 |
|
isri
Messages: 16 Registered: November 2012
|
Junior Member |
|
|
Hello,
I'm wondering about difference in performance between SubQueries used in Select statement and Joins operations. Let's have a look for a very basic example:
********************************************************************************
SQL ID: 7qf3v64qrrjcg Plan Hash: 3238298962
SELECT D.DEPARTMENT_ID, L.LOCATION_ID
FROM
HR.DEPARTMENTS D LEFT JOIN HR.LOCATIONS L ON (L.LOCATION_ID = D.LOCATION_ID)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 6 0 27
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 6 0 27
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 5
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
27 27 27 NESTED LOOPS OUTER (cr=6 pr=0 pw=0 time=392 us cost=2 size=270 card=27)
27 27 27 TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=112 us cost=2 size=189 card=27)
27 27 27 INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=111 us cost=0 size=3 card=1)(object id 20135)
********************************************************************************
SQL ID: amtfyvm0wv1q5 Plan Hash: 955629947
SELECT D.DEPARTMENT_ID, (SELECT LOCATION_ID
FROM
HR.LOCATIONS L WHERE L.LOCATION_ID = D.LOCATION_ID) FROM HR.DEPARTMENTS D
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 6 0 27
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 6 0 27
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 5
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
7 7 7 INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=37 us cost=0 size=3 card=1)(object id 20135)
27 27 27 TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=58 us cost=2 size=189 card=27)
********************************************************************************
I've thought that writing such subqueries in select statement are rather not good when we are thinking about performance (and of course code readability).
I know example above is quite easy, and it looks like school homework - it is not
Last time I was working on some query that was looking more less like that:
SELECT
A.COL1,
B.COL2,
A.COL3,
(SELECT COL4 FROM TABLEC WHERE ID = A.ID),
(SELECT COL5 FROM TABLEC WHERE ID = A.ID),
(SELECT COL6 FROM TABLED WHERE ID = A.ID),
(SELECT COL1 FROM TABLEE WHERE ID = A.ID),
(SELECT COL2 FROM TABLEF WHERE ID = A.ID),
(SELECT COL3 FROM TABLEF WHERE ID = A.ID),
(SELECT COL8 FROM TABLEG WHERE ID = A.ID),
(SELECT COL3 FROM TABLEG WHERE ID = A.ID),
(SELECT COL12 FROM TABLEG WHERE ID = A.ID),
(SELECT COL13 FROM TABLEH WHERE ID = A.ID),
(SELECT COL14 FROM TABLEH WHERE ID = B.ID),
(SELECT COL10 FROM TABLEA WHERE ID = B.ID),
(SELECT COL11 FROM TABLEA WHERE ID = B.ID),
(SELECT COL12 FROM TABLEB WHERE ID = B.ID),
(SELECT COL14 FROM TABLED WHERE ID = B.ID),
(SELECT COL15 FROM TABLEK WHERE ID = B.ID),
(SELECT COL16 FROM TABLEB WHERE ID = B.ID)
FROM TABA A INNER JOIN TABB B ON (A.ID = B.ID)
UNION ALL
SELECT (...)
UNION ALL
SELECT (...)
Additionally there were a lot of NVL and DECODE within subselects and additionally those subselects were usually made from few tables joined or 'unioned'. It was running quite well but was completely unreadable...
I've spent few hours to rebuild it to more readable way (using Joins instead of subselect) but each time I was removing one of subselect performance was decreasing. Example above, although it is very simple I think it shows very well how it looks like.
I wanted to ask is it normal and if there is a way to make query that is using JOINS and would work at least as fast as second query with subselects?
Many thanks
P.S. I'm using HR_SCHEMA delivered as example schema by Oracle, please see attachment.
-
Attachment: hr_whole.sql
(Size: 74.47KB, Downloaded 1767 times)
|
|
|
|
Re: SubQuery - Joins performance compare [message #580082 is a reply to message #580070] |
Wed, 20 March 2013 01:33 |
michael_bialik
Messages: 621 Registered: July 2006
|
Senior Member |
|
|
As usual - it depends.
In your example
SELECT D.DEPARTMENT_ID, L.LOCATION_ID
FROM
HR.DEPARTMENTS D LEFT JOIN HR.LOCATIONS L ON (L.LOCATION_ID = D.LOCATION_ID)
vs
SELECT D.DEPARTMENT_ID, (SELECT LOCATION_ID
FROM
HR.LOCATIONS L WHERE L.LOCATION_ID = D.LOCATION_ID) FROM HR.DEPARTMENTS D
for both queries execution plan is about the same:
1. Full table scan on DEPARTMENTS
2. Nested loop join to LOCATIONS using unique index (for each row of DEPARTMENTS).
So the performance is about the same as well.
However when looking at "real" query you can see that it performs on average 2 accesses to the same row:
...(SELECT COL4 FROM TABLEC WHERE ID = A.ID),
(SELECT COL5 FROM TABLEC WHERE ID = A.ID),...
accesses the same row of TABLEC twice,
...(SELECT COL8 FROM TABLEG WHERE ID = A.ID),
(SELECT COL3 FROM TABLEG WHERE ID = A.ID),
(SELECT COL12 FROM TABLEG WHERE ID = A.ID),...
causes 3 accesses to the same row of TABLEG and so on...
When using JOIN operation - all these tables will be accessed one time only. That's the main reason (IMHO) to use
JOIN instead of scalar selects.
Another reason is that scalar select simulates NESTED LOOP join. When joining large number of rows - HASH or even SORT-MERGE
may be preferable (performance wise), so there in no reason forcing optimizer to use one specific join type.
If your "JOIN" query performance is worse then query with sub/scalar selects - post tkprof.
HTH
|
|
|
Re: SubQuery - Joins performance compare [message #580087 is a reply to message #580082] |
Wed, 20 March 2013 02:56 |
|
isri
Messages: 16 Registered: November 2012
|
Junior Member |
|
|
Hello,
Many thanks for the answers. Unfortunately I'm not allowed to post here or anywhere the code I was working on, maybe I will be able to build simillar situation at home.
One more thing I was wondering - the 'time'
(...)
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
27 27 27 NESTED LOOPS OUTER (cr=6 pr=0 pw=0 time=392 us cost=2 size=270 card=27)
27 27 27 TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=112 us cost=2 size=189 card=27)
27 27 27 INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=111 us cost=0 size=3 card=1)(object id 20135)
(...)
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
7 7 7 INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=37 us cost=0 size=3 card=1)(object id 20135)
27 27 27 TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=58 us cost=2 size=189 card=27)
(...)
If I understand that correctly it is time that DB spent to perform specified operation. In firtst case (Joins) we've got 392 + 112 + 111 = 615, in second (subqueries) 37 + 58 = 95 - about 6 time faster. And that is surprise for me.
[Updated on: Wed, 20 March 2013 02:57] Report message to a moderator
|
|
|