Home » RDBMS Server » Performance Tuning » Correlated subquery iteration (DB 19.3 (unpatched))
Correlated subquery iteration [message #685617] Fri, 18 February 2022 10:20 Go to next message
John Watson
Messages: 8960
Registered: January 2010
Location: Global Village
Senior Member
I'm having trouble understanding what Oracle is doing with a simple query: which employees have a salary lower than the average for their department? Like this:
select * from emp e where e.sal < (select avg(f.sal) from emp f join emp e using (deptno));
In principle, Oracle has to run the subquery 14 times, once for each row. However, I was expecting this to be optimized by caching the result for each distinct value of deptno, so that it would be run only 3 times.
What I get is this:
orclz> select * from table(dbms_xplan.display_cursor(format=>'allstats last +cost'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  a2myk4d9fkzj3, child number 1
-------------------------------------
select * from emp e where e.sal < (select avg(f.sal) from emp f join
emp e using (deptno))

Plan hash value: 2865445000

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |     9 (100)|      8 |00:00:00.01 |      20 |       |       |          |
|*  1 |  TABLE ACCESS FULL   | EMP  |      1 |      8 |     3   (0)|      8 |00:00:00.01 |      20 |       |       |          |
|   2 |   SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.01 |      13 |       |       |          |
|*  3 |    HASH JOIN         |      |      1 |     70 |     6   (0)|     70 |00:00:00.01 |      13 |  2078K|  2078K| 1005K (0)|
|   4 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     3   (0)|     14 |00:00:00.01 |       6 |       |       |          |
|   5 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     3   (0)|     14 |00:00:00.01 |       6 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("E"."SAL"<)
   3 - access("F"."DEPTNO"="E"."DEPTNO")

Note
-----
   - statistics feedback used for this statement

orclz>
so somehow Oracle is running the subquery only once. I could understand that if the subquery were materialized and the outer query then re-written as a join. But it isn't doing that. Can anyone explain to me in words what is going on?
Even if I prevent re-writes, it somehow manages to run the subquery only once:
orclz> select /*+ no_query_transformation */ * from emp e where e.sal < (select avg(f.sal) from emp f join emp e using (deptno));

          EMPNO ENAME      JOB                   MGR HIREDATE                        SAL            COMM          DEPTNO
--------------- ---------- --------- --------------- ------------------- --------------- --------------- ---------------
           7369 SMITH      CLERK                7902 1980-12-17:00:00:00             800                              20
           7499 ALLEN      SALESMAN             7698 1981-02-20:00:00:00            1600             300              30
           7521 WARD       SALESMAN             7698 1981-02-22:00:00:00            1250             500              30
           7654 MARTIN     SALESMAN             7698 1981-09-28:00:00:00            1250            1400              30
           7844 TURNER     SALESMAN             7698 1981-09-08:00:00:00            1500               0              30
           7876 ADAMS      CLERK                7788 1987-05-23:00:00:00            1100                              20
           7900 JAMES      CLERK                7698 1981-12-03:00:00:00             950                              30
           7934 MILLER     CLERK                7782 1982-01-23:00:00:00            1300                              10
orclz> select * from table(dbms_xplan.display_cursor(format=>'allstats last +cost'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6842zf033hgmv, child number 0
-------------------------------------
select /*+ no_query_transformation */ * from emp e where e.sal <
(select avg(f.sal) from emp f join emp e using (deptno))

Plan hash value: 3993346602

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |      1 |        |     9 (100)|      8 |00:00:00.01 |      20 |       |       |          |
|*  1 |  FILTER               |      |      1 |        |            |      8 |00:00:00.01 |      20 |       |       |          |
|   2 |   TABLE ACCESS FULL   | EMP  |      1 |     14 |     3   (0)|     14 |00:00:00.01 |       7 |       |       |          |
|   3 |   SORT AGGREGATE      |      |      1 |      1 |            |      1 |00:00:00.01 |      13 |       |       |          |
|   4 |    VIEW               |      |      1 |     70 |     6   (0)|     70 |00:00:00.01 |      13 |       |       |          |
|*  5 |     HASH JOIN         |      |      1 |     70 |     6   (0)|     70 |00:00:00.01 |      13 |  2078K|  2078K| 1005K (0)|
|   6 |      TABLE ACCESS FULL| EMP  |      1 |     14 |     3   (0)|     14 |00:00:00.01 |       6 |       |       |          |
|   7 |      TABLE ACCESS FULL| EMP  |      1 |     14 |     3   (0)|     14 |00:00:00.01 |       6 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("E"."SAL"<)
   5 - access("F"."DEPTNO"="E"."DEPTNO")

orclz>
A filter with two child operations is meant to be iterative! So how is it managing to do it without iterating the subquery?

Thank you for any insight.
Re: Correlated subquery iteration [message #685618 is a reply to message #685617] Fri, 18 February 2022 11:46 Go to previous messageGo to next message
Michel Cadot
Messages: 68716
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator

Your query is not correct: you do not correlate the department of the employee of main part with the average in the subquery.
It should be something like:
SQL> select * from emp e
  2  where e.sal < (select avg(f.sal) from emp f where f.deptno = e.deptno)
  3  order by empno;
     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17/12/1980 00:00:00        800                    20
      7521 WARD       SALESMAN        7698 22/02/1981 00:00:00       1250        500         30
      7654 MARTIN     SALESMAN        7698 28/09/1981 00:00:00       1250       1400         30
      7782 CLARK      MANAGER         7839 09/06/1981 00:00:00       2450                    10
      7844 TURNER     SALESMAN        7698 08/09/1981 00:00:00       1500          0         30
      7876 ADAMS      CLERK           7788 23/05/1987 00:00:00       1100                    20
      7900 JAMES      CLERK           7698 03/12/1981 00:00:00        950                    30
      7934 MILLER     CLERK           7782 23/01/1982 00:00:00       1300                    10

8 rows selected.
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 1245077725

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |     1 |    64 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN           |         |     1 |    64 |     7  (15)| 00:00:01 |
|   2 |   VIEW               | VW_SQ_1 |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |         |     3 |    21 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP     |    14 |    98 |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | EMP     |    14 |   532 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$C772B8D1
   2 - SEL$683B0107 / VW_SQ_1@SEL$7511BFD2
   3 - SEL$683B0107
   4 - SEL$683B0107 / F@SEL$2
   5 - SEL$C772B8D1 / E@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_1"="E"."DEPTNO")
       filter("E"."SAL"<"AVG(F.SAL)")

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1; rowset=256) "E"."DEPTNO"[NUMBER,22],
       "E"."EMPNO"[NUMBER,22], "E"."ENAME"[VARCHAR2,10],
       "E"."JOB"[VARCHAR2,9], "E"."MGR"[NUMBER,22], "E"."HIREDATE"[DATE,7],
       "E"."SAL"[NUMBER,22], "E"."COMM"[NUMBER,22]
   2 - (rowset=256) "AVG(F.SAL)"[NUMBER,22], "ITEM_1"[NUMBER,22]
   3 - (#keys=1; rowset=256) "F"."DEPTNO"[NUMBER,22],
       COUNT("F"."SAL")[22], SUM("F"."SAL")[22]
   4 - (rowset=256) "F"."SAL"[NUMBER,22], "F"."DEPTNO"[NUMBER,22]
   5 - (rowset=256) "E"."EMPNO"[NUMBER,22], "E"."ENAME"[VARCHAR2,10],
       "E"."JOB"[VARCHAR2,9], "E"."MGR"[NUMBER,22], "E"."HIREDATE"[DATE,7],
       "E"."SAL"[NUMBER,22], "E"."COMM"[NUMBER,22], "E"."DEPTNO"[NUMBER,22]
(I removed the ORDER BY for the plan)

[Updated on: Fri, 18 February 2022 11:47]

Report message to a moderator

Re: Correlated subquery iteration [message #685625 is a reply to message #685618] Sat, 19 February 2022 04:48 Go to previous messageGo to next message
John Watson
Messages: 8960
Registered: January 2010
Location: Global Village
Senior Member
Thank you, your query gives the plan I expected: the correlation re-written to a join.
Re: Correlated subquery iteration [message #685626 is a reply to message #685625] Sat, 19 February 2022 05:46 Go to previous message
Michel Cadot
Messages: 68716
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator

If the number of departments is tiny comparing to the number of employees, it should be better to rewrite the query to something like:
with avgs as (select /*+ materialize */ deptno, avg(sal) avgsal from emp group by deptno)
select e.*
from emp e join avgs a on a.deptno = e.deptno
where e.sal < a.avgsal
/

[Updated on: Sat, 19 February 2022 05:46]

Report message to a moderator

Previous Topic: Long running queries
Next Topic: Re-Partitioning
Goto Forum:
  


Current Time: Sat Nov 23 02:20:44 CST 2024