Home » RDBMS Server » Performance Tuning » Correlated subquery iteration (DB 19.3 (unpatched))
Correlated subquery iteration [message #685617] |
Fri, 18 February 2022 10:20 |
John Watson
Messages: 8960 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
I'm having trouble understanding what Oracle is doing with a simple query: which employees have a salary lower than the average for their department? Like this:
select * from emp e where e.sal < (select avg(f.sal) from emp f join emp e using (deptno));
In principle, Oracle has to run the subquery 14 times, once for each row. However, I was expecting this to be optimized by caching the result for each distinct value of deptno, so that it would be run only 3 times.
What I get is this:
orclz> select * from table(dbms_xplan.display_cursor(format=>'allstats last +cost'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID a2myk4d9fkzj3, child number 1
-------------------------------------
select * from emp e where e.sal < (select avg(f.sal) from emp f join
emp e using (deptno))
Plan hash value: 2865445000
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 9 (100)| 8 |00:00:00.01 | 20 | | | |
|* 1 | TABLE ACCESS FULL | EMP | 1 | 8 | 3 (0)| 8 |00:00:00.01 | 20 | | | |
| 2 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 13 | | | |
|* 3 | HASH JOIN | | 1 | 70 | 6 (0)| 70 |00:00:00.01 | 13 | 2078K| 2078K| 1005K (0)|
| 4 | TABLE ACCESS FULL| EMP | 1 | 14 | 3 (0)| 14 |00:00:00.01 | 6 | | | |
| 5 | TABLE ACCESS FULL| EMP | 1 | 14 | 3 (0)| 14 |00:00:00.01 | 6 | | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."SAL"<)
3 - access("F"."DEPTNO"="E"."DEPTNO")
Note
-----
- statistics feedback used for this statement
orclz>
so somehow Oracle is running the subquery only once. I could understand that if the subquery were materialized and the outer query then re-written as a join. But it isn't doing that. Can anyone explain to me in words what is going on?
Even if I prevent re-writes, it somehow manages to run the subquery only once:
orclz> select /*+ no_query_transformation */ * from emp e where e.sal < (select avg(f.sal) from emp f join emp e using (deptno));
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
--------------- ---------- --------- --------------- ------------------- --------------- --------------- ---------------
7369 SMITH CLERK 7902 1980-12-17:00:00:00 800 20
7499 ALLEN SALESMAN 7698 1981-02-20:00:00:00 1600 300 30
7521 WARD SALESMAN 7698 1981-02-22:00:00:00 1250 500 30
7654 MARTIN SALESMAN 7698 1981-09-28:00:00:00 1250 1400 30
7844 TURNER SALESMAN 7698 1981-09-08:00:00:00 1500 0 30
7876 ADAMS CLERK 7788 1987-05-23:00:00:00 1100 20
7900 JAMES CLERK 7698 1981-12-03:00:00:00 950 30
7934 MILLER CLERK 7782 1982-01-23:00:00:00 1300 10
orclz> select * from table(dbms_xplan.display_cursor(format=>'allstats last +cost'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6842zf033hgmv, child number 0
-------------------------------------
select /*+ no_query_transformation */ * from emp e where e.sal <
(select avg(f.sal) from emp f join emp e using (deptno))
Plan hash value: 3993346602
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 9 (100)| 8 |00:00:00.01 | 20 | | | |
|* 1 | FILTER | | 1 | | | 8 |00:00:00.01 | 20 | | | |
| 2 | TABLE ACCESS FULL | EMP | 1 | 14 | 3 (0)| 14 |00:00:00.01 | 7 | | | |
| 3 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 13 | | | |
| 4 | VIEW | | 1 | 70 | 6 (0)| 70 |00:00:00.01 | 13 | | | |
|* 5 | HASH JOIN | | 1 | 70 | 6 (0)| 70 |00:00:00.01 | 13 | 2078K| 2078K| 1005K (0)|
| 6 | TABLE ACCESS FULL| EMP | 1 | 14 | 3 (0)| 14 |00:00:00.01 | 6 | | | |
| 7 | TABLE ACCESS FULL| EMP | 1 | 14 | 3 (0)| 14 |00:00:00.01 | 6 | | | |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."SAL"<)
5 - access("F"."DEPTNO"="E"."DEPTNO")
orclz> A filter with two child operations is meant to be iterative! So how is it managing to do it without iterating the subquery?
Thank you for any insight.
|
|
|
Re: Correlated subquery iteration [message #685618 is a reply to message #685617] |
Fri, 18 February 2022 11:46 |
|
Michel Cadot
Messages: 68716 Registered: March 2007 Location: Saint-Maur, France, https...
|
Senior Member Account Moderator |
|
|
Your query is not correct: you do not correlate the department of the employee of main part with the average in the subquery.
It should be something like:
SQL> select * from emp e
2 where e.sal < (select avg(f.sal) from emp f where f.deptno = e.deptno)
3 order by empno;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17/12/1980 00:00:00 800 20
7521 WARD SALESMAN 7698 22/02/1981 00:00:00 1250 500 30
7654 MARTIN SALESMAN 7698 28/09/1981 00:00:00 1250 1400 30
7782 CLARK MANAGER 7839 09/06/1981 00:00:00 2450 10
7844 TURNER SALESMAN 7698 08/09/1981 00:00:00 1500 0 30
7876 ADAMS CLERK 7788 23/05/1987 00:00:00 1100 20
7900 JAMES CLERK 7698 03/12/1981 00:00:00 950 30
7934 MILLER CLERK 7782 23/01/1982 00:00:00 1300 10
8 rows selected.
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 1245077725
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 64 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 64 | 7 (15)| 00:00:01 |
| 2 | VIEW | VW_SQ_1 | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 14 | 532 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$C772B8D1
2 - SEL$683B0107 / VW_SQ_1@SEL$7511BFD2
3 - SEL$683B0107
4 - SEL$683B0107 / F@SEL$2
5 - SEL$C772B8D1 / E@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ITEM_1"="E"."DEPTNO")
filter("E"."SAL"<"AVG(F.SAL)")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1; rowset=256) "E"."DEPTNO"[NUMBER,22],
"E"."EMPNO"[NUMBER,22], "E"."ENAME"[VARCHAR2,10],
"E"."JOB"[VARCHAR2,9], "E"."MGR"[NUMBER,22], "E"."HIREDATE"[DATE,7],
"E"."SAL"[NUMBER,22], "E"."COMM"[NUMBER,22]
2 - (rowset=256) "AVG(F.SAL)"[NUMBER,22], "ITEM_1"[NUMBER,22]
3 - (#keys=1; rowset=256) "F"."DEPTNO"[NUMBER,22],
COUNT("F"."SAL")[22], SUM("F"."SAL")[22]
4 - (rowset=256) "F"."SAL"[NUMBER,22], "F"."DEPTNO"[NUMBER,22]
5 - (rowset=256) "E"."EMPNO"[NUMBER,22], "E"."ENAME"[VARCHAR2,10],
"E"."JOB"[VARCHAR2,9], "E"."MGR"[NUMBER,22], "E"."HIREDATE"[DATE,7],
"E"."SAL"[NUMBER,22], "E"."COMM"[NUMBER,22], "E"."DEPTNO"[NUMBER,22]
(I removed the ORDER BY for the plan)
[Updated on: Fri, 18 February 2022 11:47] Report message to a moderator
|
|
|
|
Re: Correlated subquery iteration [message #685626 is a reply to message #685625] |
Sat, 19 February 2022 05:46 |
|
Michel Cadot
Messages: 68716 Registered: March 2007 Location: Saint-Maur, France, https...
|
Senior Member Account Moderator |
|
|
If the number of departments is tiny comparing to the number of employees, it should be better to rewrite the query to something like:
with avgs as (select /*+ materialize */ deptno, avg(sal) avgsal from emp group by deptno)
select e.*
from emp e join avgs a on a.deptno = e.deptno
where e.sal < a.avgsal
/
[Updated on: Sat, 19 February 2022 05:46] Report message to a moderator
|
|
|
Goto Forum:
Current Time: Sat Nov 23 02:20:44 CST 2024
|