Recursive WITH, part III: IS_LEAF
The CONNECT BY syntax provides a useful pseudocolumn, CONNECT_BY_ISLEAF, which identifies leaf nodes in the data: it’s 1 when a row has no further children, 0 otherwise. In this post, I’ll look at emulating this pseudocolumn using recursive WITH.
Let’s continue with the example from my previous posts about hierarchical data: the skeleton from the old song “Dem Dry Bones”.
UPDATE skeleton SET connected_to_the=NULL WHERE bone='head'; SELECT * FROM skeleton; BONE CONNECTED_TO_THE ---------------------------------------- ---------------------------------------- shoulder neck back shoulder hip back thigh hip knee thigh leg knee foot heel head neck head toe foot arm shoulder wrist arm ankle leg heel ankle finger wrist a rib back b rib back c rib back
With CONNECT BY, we can use the CONNECT_BY_ISLEAF pseudocolumn to identify leaf nodes:
SELECT bone, level, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path FROM skeleton WHERE connect_by_isleaf=1 START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ORDER siblings BY 1; BONE LEVEL PATH --------- ----- ----------------------------------------------------------------------------------------------- finger 6 head -> neck -> shoulder -> arm -> wrist -> finger a rib 5 head -> neck -> shoulder -> back -> a rib b rib 5 head -> neck -> shoulder -> back -> b rib c rib 5 head -> neck -> shoulder -> back -> c rib toe 12 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe
This pseudocolumn takes a little more thought to replicate using recursive WITH than the LEVEL pseudocolumn and the SYS_CONNECT_BY_PATH, which, as we saw in my last post, fall naturally out of the recursion.
We can imitate CONNECT_BY_ISLEAF by searching DEPTH FIRST and using the LEAD function to peek at the next row’s the_level value. If the next row’s level is higher than the current row, then it’s a child of the current row; otherwise, it’s not a child. Since, with DEPTH FIRST, all the children of a row come out before any siblings, if the next row isn’t a child, then the current row is a leaf.
WITH skellarchy (bone, parent, the_level) AS ( SELECT bone, connected_to_the, 0 FROM skeleton WHERE bone = 'head' UNION ALL SELECT s.bone, s.connected_to_the , r.the_level + 1 FROM skeleton s, skellarchy r WHERE r.bone = s.connected_to_the ) SEARCH DEPTH FIRST BY bone SET bone_order CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N' SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree , the_level, lead(the_level) OVER (ORDER BY bone_order) AS next_level, CASE WHEN the_level < lead(the_level) OVER (ORDER BY bone_order) THEN NULL ELSE 'LEAF' END is_leaf FROM skellarchy ORDER BY bone_order; BONE_TREE THE_LEVEL NEXT_LEVEL IS_L --------------------------------------------- ---------- ---------- ---- head 0 1 neck 1 2 shoulder 2 3 arm 3 4 wrist 4 5 finger 5 3 LEAF back 3 4 a rib 4 4 LEAF b rib 4 4 LEAF c rib 4 4 LEAF hip 4 5 thigh 5 6 knee 6 7 leg 7 8 ankle 8 9 heel 9 10 foot 10 11 toe 11 LEAF
Watch out for Cycles
The first point of caution about this solution concerns cycles. In my last post, I had created a cycle by making the ‘head’ node’s parent the ‘toe’ node. If I’d left the cycle in the data, the toe node wouldn’t be a leaf any more, but this query would falsely identify the head as a leaf:
UPDATE skeleton SET connected_to_the='toe' WHERE bone='head'; BONE_TREE THE_LEVEL NEXT_LEVEL IS_L --------------------------------------------- ---------- ---------- ---- head 0 1 neck 1 2 shoulder 2 3 arm 3 4 wrist 4 5 finger 5 3 LEAF back 3 4 a rib 4 4 LEAF b rib 4 4 LEAF c rib 4 4 LEAF hip 4 5 thigh 5 6 knee 6 7 leg 7 8 ankle 8 9 heel 9 10 foot 10 11 toe 11 12 head 12 LEAF 19 rows selected.
This can be corrected for by adding WHERE IS_A_CYCLE=’N’ to the query.
Respect the order of evaluation…
A second point of caution: if I add a WHERE clause to the query that limits the number of levels, the last line of the resultset will always be identified as a leaf.
WITH skellarchy (bone, parent, the_level) AS ( SELECT bone, connected_to_the, 0 FROM skeleton WHERE bone = 'head' UNION ALL SELECT s.bone, s.connected_to_the , r.the_level + 1 FROM skeleton s, skellarchy r WHERE r.bone = s.connected_to_the ) SEARCH DEPTH FIRST BY bone SET bone_order CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N' SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree , the_level, lead(the_level) OVER (ORDER BY bone_order) AS next_level, CASE WHEN the_level < lead(the_level) OVER (ORDER BY bone_order) THEN NULL ELSE 'LEAF' END is_leaf FROM skellarchy WHERE the_level < 8 ORDER BY bone_order; BONE_TREE THE_LEVEL NEXT_LEVEL IS_L ------------------------------------------------------------ ---------- ---------- ---- head 0 1 neck 1 2 shoulder 2 3 arm 3 4 wrist 4 5 finger 5 3 LEAF back 3 4 a rib 4 4 LEAF b rib 4 4 LEAF c rib 4 4 LEAF hip 4 5 thigh 5 6 knee 6 7 leg 7 LEAF <<<=====
The leg is falsely identified as a leaf, and NEXT_LEVEL comes out as NULL, even though the ‘leg’ row has a child row. Why is that? It’s because this solution uses the LEAD analytic function. With analytic functions, WHERE clauses are evaluated before the analytic functions.
Highlighting the relevant bits from the query:
WITH skellarchy AS ...[recursive WITH subquery]... SELECT ... LEAD(the_level) OVER (ORDER BY bone_order) AS next_level ... --analytic function FROM skellarchy WHERE the_level < 8 ... --where clause
To quote the documentation:
Analytic functions compute an aggregate value based on a group of rows…. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row…. Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed.
In the query above, “where the_level < 8" will be evaluated before LEAD(the_level). The EXPLAIN PLAN shows this very clearly:
----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 76 | 8 (25)| 00:00:01 | | 1 | WINDOW BUFFER | | 2 | 76 | 8 (25)| 00:00:01 | <<=== LEAD |* 2 | VIEW | | 2 | 76 | 8 (25)| 00:00:01 | <<=== filter("THE_LEVEL"<8) | 3 | UNION ALL (RECURSIVE WITH) DEPTH FIRST| | | | | | |* 4 | TABLE ACCESS FULL | SKELETON | 1 | 24 | 2 (0)| 00:00:01 | |* 5 | HASH JOIN | | 1 | 49 | 5 (20)| 00:00:01 | | 6 | RECURSIVE WITH PUMP | | | | | | | 7 | TABLE ACCESS FULL | SKELETON | 18 | 432 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("THE_LEVEL"<8) 4 - filter("BONE"='head') 5 - access("R"."BONE"="S"."CONNECTED_TO_THE")
The WINDOW BUFFER (analytic window) is evaluated after the VIEW which filters on “THE_LEVEL”<8. So, "lead(the_level) over (order by bone_order)" will be null where the_level=7, and the 'leg' wrongly identified as a leaf node. What we actually want is for the analytic function LEAD to run over the whole resultset, and only then limit the results to show the levels 0-7. The obvious way to do this is to wrap the query in a second SELECT statement:
SELECT * FROM ( WITH skellarchy (bone, parent, the_level) AS ( SELECT bone, connected_to_the, 0 FROM skeleton WHERE bone = 'head' UNION ALL SELECT s.bone, s.connected_to_the , r.the_level + 1 FROM skeleton s, skellarchy r WHERE r.bone = s.connected_to_the ) SEARCH DEPTH FIRST BY bone SET bone_order CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N' SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree , the_level, lead(the_level) OVER (ORDER BY bone_order) AS next_level, CASE WHEN the_level < lead(the_level) OVER (ORDER BY bone_order) THEN NULL ELSE 'LEAF' END is_leaf FROM skellarchy ORDER BY bone_order ) WHERE the_level < 8; BONE_TREE THE_LEVEL NEXT_LEVEL IS_L ------------------------------------------------------------ ---------- ---------- ---- head 0 1 neck 1 2 shoulder 2 3 arm 3 4 wrist 4 5 finger 5 3 LEAF back 3 4 a rib 4 4 LEAF b rib 4 4 LEAF c rib 4 4 LEAF hip 4 5 thigh 5 6 knee 6 7 leg 7 8
Now, the analytic function in the inner query is evaluated first, before the WHERE clause in the outer query. We can see this in the EXPLAIN PLAN too, of course. Now the WINDOW BUFFER (analytic window) is evaluated before the VIEW with filter(“THE_LEVEL”<8) :
------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 2 | 4068 | 8 (25)| 00:00:01 | |* 1 | VIEW | | 2 | 4068 | 8 (25)| 00:00:01 | <<=== filter("THE_LEVEL"<8) | 2 | WINDOW BUFFER | | 2 | 76 | 8 (25)| 00:00:01 | <<=== LEAD | 3 | VIEW | | 2 | 76 | 8 (25)| 00:00:01 | | 4 | UNION ALL (RECURSIVE WITH) DEPTH FIRST| | | | | | |* 5 | TABLE ACCESS FULL | SKELETON | 1 | 24 | 2 (0)| 00:00:01 | |* 6 | HASH JOIN | | 1 | 49 | 5 (20)| 00:00:01 | | 7 | RECURSIVE WITH PUMP | | | | | | | 8 | TABLE ACCESS FULL | SKELETON | 18 | 432 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("THE_LEVEL"<8) 5 - filter("BONE"='head') 6 - access("R"."BONE"="S"."CONNECTED_TO_THE")
This is one case of the general point that, as Tom Kyte explains in this Ask Tom answer,“select analytic_function from t where CONDITION” is NOT THE SAME AS “select * from (select analytic_function from t) where CONDITION”.
So, to sum up my last few posts, we can do everything that CONNECT BY can do with the 11g recursive WITH syntax. Plus, the recursive WITH syntax makes it easy to express simple recursive algorithms in SQL.
Republished with permission. Original URL: http://rdbms-insight.com/wp/?p=135
- Natalka Roshak's blog
- Log in to post comments