The SQL WITH Clause, (Order Your Thoughts, Reuse Your Code)
So here I am writing even more documentation for my current Gig, and thinking once again, why not post it to OraFAQ and get double duty out of the document. So here is a discussion of the WITH clause that comes with the SELECT statement now. It is easy to use, and handy as all get out, but seems many people have not yet noticed it. Hmm... a SELECT statement that does not start with SELECT.
I like examples as a learning tools, so lets start off with some seemingly silly code.
col dummy format a10 select dummy from dual / DUMMY ---------- X 1 row selected.
So where is this going?. Check this out.
WITH stupid_is_as_stupid_does as ( select dummy from dual ) select * from stupid_is_as_stupid_does / DUMMY ---------- X 1 row selected.
What just happened? Conceptually you can think of it like this: we created a temporary table called STUPID_IS_AS_STUPID_DOES and then selected from it. I say, conceptually you can think of it like this because although Oracle could use temporary storage as a place to keep rows from our “named” SELECT statement, it usually does not do that but instead merges SQL before executing anything. But that just confuses what we want to know here which is, what the WITH clause does for us.
So what we are saying is, conceptually, the WITH clause allows us to give names to predefined SELECT statements inside the context of a larger SELECT statement. We can then reference the NAMED SELECT statements later. Now consider this next sequence of SELECT statements:
with temp_t1 as ( select dummy c1 from dual ) select * from temp_t1 a ,temp_t1 b / C C - - X X 1 row selected. with temp_t1 as ( select dummy c1 from dual ) ,temp_t2 as ( select dummy c1 from dual ) select * from temp_t1 a ,temp_t2 b / C C - - X X 1 row selected. with temp_t1 as ( select dummy c1 from dual ) ,temp_t2 as ( select dummy c1 from dual a,temp_t1 b where b.c1 = a.dummy ) select * from temp_t2 a / C - X 1 row selected. SQL> select * 2 from temp_t2 a 3 / from temp_t2 a * ERROR at line 2: ORA-00942: table or view does not exist SQL>
From these four SELECT statements we learn that the WITH clause lets us do the following:
1) we can reference a named query any number of times
2) we can create any number of named queries
3) named queries can reference other named queries that came before them and even correlate to previous named queries
4) named queries are only good for the SELECT statement that names them, their scope is local to the SELECT in which they are defined, hence no sharing across statements.
So what does the WITH clause really do for us? Well, there are two main benefits.
The most obvious thing one gets from use of the WITH clause is the ability to construct reusable components inside a SELECT. We can give a name to a set of rows (no matter how complicated the SELECT is to build them), and then use the name of the NAMED SELECT, anywhere we would have otherwise had to duplicate the text we named. The examples above show this in action. But as nice as it is to be able to reduce the amount of code one has to write, there is still a much more important reason to use the WITH clause, LOGICAL THINKING.
A better way of using the WITH clause is as a mechanism for exposing the thought process behind a SELECT. These days people are writing some pretty fancy stuff; long queries with many parts. Interestingly enough, it is typical for large queries like this to be constructed by a thought process where in the pieces of the query build upon each other in a stepwise fashion. Unfortunately large SQL written this way can be difficult to parse mentally. WITH clause to the rescue.
To see how the WITH clause can help in this situation consider the following scenario. You are building a data warehouse, and you want to create what is called a DENSE, MONTHLY INCEPTION TO DATE SUMMARY FACT. The data below will show you what this is:
This data from our daily fact table:
CALENDAR_DATE CLAIM_ID AMOUNT_PAID -------------------- ---------- ----------- 08-jan-2007 00:00:00 1 100 11-jan-2007 00:00:00 1 100 23-feb-2007 00:00:00 1 100 14-jun-2007 00:00:00 1 100 4 rows selected.
Becomes this data in a DENSE, MONTHLY INCEPTION TO DATE SUMMARY FACT:
FILLED_CALENDAR_MONT CLAIM_ID ITD_AMOUNT_PAID -------------------- ---------- --------------- 01-jan-2007 00:00:00 1 200 01-feb-2007 00:00:00 1 300 01-mar-2007 00:00:00 1 300 01-apr-2007 00:00:00 1 300 01-may-2007 00:00:00 1 300 01-jun-2007 00:00:00 1 400 6 rows selected.
To get from point A to point B, the data had to be summed, then analytically summed, then filled in with a densification step. Here is the statement that takes the first dataset and produces the second dataset.
with simple_sum as ( select trunc(calendar_date,'mm') calendar_month_date ,claim_id ,sum(amount_paid) amount_paid from temp_claim_history group by trunc(calendar_date,'mm') ,claim_id ) ,end_month as ( select trunc(sysdate,'mm') calendar_month_date from dual ) ,itd_sum as ( Select simple_sum.calendar_month_date ,simple_sum.claim_id ,simple_sum.amount_paid ,sum(simple_sum.amount_paid) over ( partition by simple_sum.claim_id order by simple_sum.calendar_month_date ) itd_amount_paid ,nvl( lead(simple_sum.calendar_month_date) over ( partition by simple_sum.claim_id order by simple_sum.calendar_month_date ) ,add_months(end_month.calendar_month_date,1) ) next_calendar_month_date from simple_sum ,end_month ) ,month_list as ( select add_months( trunc( to_date('2010' ,'rrrr' ) ,'rrrr' ) ,-rownum+1 ) some_month_date from dual connect by level <= 240 ) ,dense_itd_summary as ( select itd_sum.* ,month_list.some_month_date filled_calendar_month_date from itd_sum ,month_list where month_list.some_month_date >= itd_sum.calendar_month_date and month_list.some_month_date < itd_sum.next_calendar_month_date ) select filled_calendar_month_date ,claim_id ,itd_amount_paid from dense_itd_summary order by claim_id ,filled_calendar_month_date /
OK, this might seem a little intimidating at first but remember we said that real power of the WITH clause was in its ability to let you expose the thought process underneath your SELECT. Clearly this SELECT is making heavy use of the WITH clause. The WITH clause achieves this magic by letting you create your SELECT in steps.
We can think of the processing done by this SELECT as happening from top down. Each NAMED SELECT will transform the data that came before it in some way. All we need to do is look at the data output by each step, to see what is happening, and this is easily done by running the pieces of the SELECT one WITH clause at a time in an incremental fashion. You will find this makes it easy to debug your queries. So let us do that. Let us execute WITH clauses in pieces to see what they do.
Starting with this data:
SQL> select * 2 from temp_claim_history 3 / CALENDAR_DATE CLAIM_ID AMOUNT_PAID -------------------- ---------- ----------- 08-jan-2007 00:00:00 1 100 11-jan-2007 00:00:00 1 100 23-feb-2007 00:00:00 1 100 14-jun-2007 00:00:00 1 100 4 rows selected.
We do an ordinary GROUP BY month.
SQL> with 2 simple_sum as ( 3 select 4 trunc(calendar_date,'mm') calendar_month_date 5 ,claim_id 6 ,sum(amount_paid) amount_paid 7 from temp_claim_history 8 group by 9 trunc(calendar_date,'mm') 10 ,claim_id 11 ) 12 select * 13 from simple_sum 14 / CALENDAR_MONTH_DATE CLAIM_ID AMOUNT_PAID -------------------- ---------- ----------- 01-jan-2007 00:00:00 1 200 01-feb-2007 00:00:00 1 100 01-jun-2007 00:00:00 1 100 3 rows selected.
No rocket science there. All we did was group rows from the same month together. This creates a simple sum. We see next a WITH clause that grabs an EPOCH date. In this case the date represents the last month for which we want to create rows in our final DENSE SUMMARY. WE are going to need this in the step that follows.
SQL> with 2 end_month as ( 3 select trunc(sysdate,'mm') calendar_month_date 4 from dual 5 ) 6 select * 7 from end_month 8 / CALENDAR_MONTH_DATE -------------------- 01-jun-2007 00:00:00 1 row selected.
Next we roll up the simple sum into a running sum using Oracle Analytics. Notice we include on these rows a second analytic that looks to the next row in its group to see the next month with real data. We are going to use that in a minute.
SQL> with 2 simple_sum as ( 3 select 4 trunc(calendar_date,'mm') calendar_month_date 5 ,claim_id 6 ,sum(amount_paid) amount_paid 7 from temp_claim_history 8 group by 9 trunc(calendar_date,'mm') 10 ,claim_id 11 ) 12 ,end_month as ( 13 select trunc(sysdate,'mm') calendar_month_date 14 from dual 15 ) 16 ,itd_sum as ( 17 select 18 simple_sum.calendar_month_date 19 ,simple_sum.claim_id 20 ,simple_sum.amount_paid 21 ,sum(simple_sum.amount_paid) 22 over ( 23 partition by simple_sum.claim_id 24 order by simple_sum.calendar_month_date 25 ) itd_amount_paid 26 ,nvl( 27 lead(simple_sum.calendar_month_date) 28 over ( 29 partition by simple_sum.claim_id 30 order by simple_sum.calendar_month_date 31 ) 32 ,add_months(end_month.calendar_month_date,1) 33 ) next_calendar_month_date 34 from simple_sum 35 ,end_month 36 ) 37 select * 38 from itd_sum 39 / CALENDAR_MONTH_DATE CLAIM_ID AMOUNT_PAID ITD_AMOUNT_PAID NEXT_CALENDAR_MONTH_ -------------------- ---------- ----------- --------------- -------------------- 01-jan-2007 00:00:00 1 200 200 01-feb-2007 00:00:00 01-feb-2007 00:00:00 1 100 300 01-jun-2007 00:00:00 01-jun-2007 00:00:00 1 100 400 01-jul-2007 00:00:00 3 rows selected.
Yes, this ITD_AMOUNT_PAID is a running total, and the NEXT_CALENDAR_MONTH_DATE is the next date for which there is real data. Please notice that in the case of the last row, we defaulted the NEXT_CALENDAR_MONTH_DATE to be the end month + 1. Each last row in a group has no next row (doh!) and therefore no NEXT_CALENDAR_MONTH_DATE, so we took this default so that the date math we are going to do next does not deal with nulls.
In the end you recall we wanted to create a dense summary. Notice in the data above, there are no rows for MARCH, APRIL, or MAY. In a dense summary there would be. So we have to create one for each of these months. We can do this by copying the FEB row and changing its date. For example: if someone asked this question “what is the ITD_AMOUNT_PAID for march”, we would have to do a nearest neighbor lookup best fit low lookup (also known as the BOB BARKER LOOKUP) and show them FEB numbers, but with a MAR date. The reason for a DENSE summary is to skip the BOB BARKER LOOKUP by making sure there is always a row for the month requested. Thus we need to duplicate the FEB row for each month that is missing up to the next real row. This is what we call a fill. We are going to fill in missing periods with a copy of the BOB BARKER row of the missing period, but with the date that corresponds to the period we are filling in. In this case this means copying the FEB row to MAR, APR, MAY and tagging each copy with the appropriate date.
But to do this we first need a list of months so lets make a list of months.
SQL> with 2 month_list as ( 3 select 4 add_months(trunc(to_date('2010','rrrr'),'rrrr'),-rownum+1) some_month_date 5 from dual 6 connect by level <= 240 7 ) 8 select * 9 from month_list 10 / SOME_MONTH_DATE -------------------- 01-jan-2010 00:00:00 01-dec-2009 00:00:00 01-nov-2009 00:00:00 … 01-feb-1990 00:00:00 240 rows selected.
You can see we used the connect by trick to generate two hundred and forty rows with which using JAN of 2010 as an anchor, and a little date math, we can create a list of month dates. Now we can join to this list of dates to DENSIFY our data.
SQL> with 2 simple_sum as ( 3 select 4 trunc(calendar_date,'mm') calendar_month_date 5 ,claim_id 6 ,sum(amount_paid) amount_paid 7 from temp_claim_history 8 group by 9 trunc(calendar_date,'mm') 10 ,claim_id 11 ) 12 ,end_month as ( 13 select trunc(sysdate,'mm') calendar_month_date 14 from dual 15 ) 16 ,itd_sum as ( 17 select 18 simple_sum.calendar_month_date 19 ,simple_sum.claim_id 20 ,simple_sum.amount_paid 21 ,sum(simple_sum.amount_paid) 22 over ( 23 partition by simple_sum.claim_id 24 order by simple_sum.calendar_month_date 25 ) itd_amount_paid 26 ,nvl( 27 lead(simple_sum.calendar_month_date) 28 over ( 29 partition by simple_sum.claim_id 30 order by simple_sum.calendar_month_date 31 ) 32 ,add_months(end_month.calendar_month_date,1) 33 ) next_calendar_month_date 34 from simple_sum 35 ,end_month 36 ) 37 ,month_list as ( 38 select 39 add_months(trunc(to_date('2010','rrrr'),'rrrr'),-rownum+1) some_month_date 40 from dual 41 connect by level <= 240 42 ) 43 ,dense_itd_summary as ( 44 select 45 itd_sum.* 46 ,month_list.some_month_date filled_calendar_month_date 47 from itd_sum 48 ,month_list 49 where month_list.some_month_date >= itd_sum.calendar_month_date 50 and month_list.some_month_date < itd_sum.next_calendar_month_date 51 ) 52 select 53 filled_calendar_month_date 54 ,claim_id 55 ,itd_amount_paid 56 from dense_itd_summary 57 order by 58 claim_id 59 ,filled_calendar_month_date 60 / FILLED_CALENDAR_MONT CLAIM_ID ITD_AMOUNT_PAID -------------------- ---------- --------------- 01-jan-2007 00:00:00 1 200 01-feb-2007 00:00:00 1 300 01-mar-2007 00:00:00 1 300 01-apr-2007 00:00:00 1 300 01-may-2007 00:00:00 1 300 01-jun-2007 00:00:00 1 400 6 rows selected.
And there you have it. A somewhat sophisticated process. But using the WITH clause, we can easily break it down into simple steps, each of which is a data oriented transform. How wonderful it is that we can get away from a row by row (or should I say slow by slow) perspective on manipulating data to get a result like this, and instead think of it all in terms of sets of rows, and how we add to or change one row set in order to create the next.
You will find a process like this is in amazingly fast. After a little practice you will relish using the WITH clause, and eventually I hope find the way you write code changing to exploit a data set oriented approach to transforming information. Thanks WITH clause.
Kevin
- Kevin Meade's blog
- Log in to post comments
Comments
Partitioned Outer Joins
Completely off-topic - I know - but another (less intuitive?) way to obtain a dense fact is to use Partitioned Outer Joins
Thanks, Kevin! Now I have a better idea of With Clause!
We have been working with the WITH clause ... but I didn't really understand it, was just following the pattern we had found and developed ... I REALLY REALLY liked your "See Spot! See Spot run!" example ... gave me an "Aha!" moment.
Thanks for the time it took to type it ...
Suzanne
Good one!
Thanks Kevin :-)
Using DML Statements with "WITH" clause
Hi Kevin,
The article was very informative. However, I have a doubt regarding the utility of WITH clause with DML statements like UPDATE or INSERT?
For example, say I create a named query for a huge query that I have written.
WITH namedQuery
AS
(
SELECT row_number,a,b,c............ from table ta,tb,tc,td
)
Now, I have to join the namedQuery with the same dataset
i.e. select * from namedQuery, namedQuery n2 where
namedQuery.row_number=namedQuery.rownumber+1;
And then I want to insert the dataset created above into a table.
INSERT INTO newTable
select * from namedQuery, namedQuery n2 where
namedQuery.row_number=namedQuery.rownumber+1;
Unfortunately, this doesn't work and gives errors. Could you please help me with this? If WITH clause has the restriction, do we have any other way of doing it?
Thanks,
Pratyush
Using WITH in a DML statement.
Pratyush,
You don't specify your entire statement, but I would guess that it is:
WITH ...
INSERT into ...
WITH is an optional prologue to a SELECT, so it can't be used directly with INSERT, UPDATE, DELETE, etc. However, all of these can be written to use a SELECT in place of a table, and wherever a SELECT can appear, a WITH-SELECT can be used. If you get syntax errors, try wrapping it in another SELECT; i.e., SELECT * from (WITH... SELECT ...).
You should be able to write your statement as:
INSERT into newTable
SELECT *
FROM (
WITH namedQuery
AS
(
SELECT row_number,a,b,c............ from table ta,tb,tc,td
)
select * from namedQuery, namedQuery n2 where
namedQuery.row_number=namedQuery.rownumber+1
)
;
Russ
ORA-32035 and With clause
Hi Kevin,
Firstly, my congratulations for your excellent article. Very useful indeed.
We've been working with "with" clause for a long time and our queries are pretty big by now. Some of them go beyond one thousand lines. The problem is when we need to make small adjustments and we get this ugly ORA-32035 error. For example:
SQL> with
2 temp_t1 as (
3 select dummy c1 from dual
4 )
5 ,temp_t2 as (
6 select dummy c1 from dual
7 )
8 select *
9 from temp_t1
10 /
from temp_t1
*
ERROR at line 9:
ORA-32035: unreferenced query name defined in WITH clause
Would you have any piece of magic to debug enormous queries having WITH clause?
Thanks,
Eduardo Morelli
Hi morelli,
Change the statement "select * from temp_t1" to at least "select * from temp_t1,temp_t2".
i.e., You've created 2 queries with the "with cluase" temp_t1 & temp_t2
But you are referencin only the temp_t1 in the select statement not temp_t2.
Basic funda is you should be referencing all the views declared with the "with cluase".
Regards,
Vishwa.
Can this "with cluase" boost performance..?
Hi kevin,
By any chance, Can this "with cluase" boost performance.?
1.
with emp_t as
(select empno from emp)
select x from dual,emp_t
union
select x from dual,emp_t
2.
select x from dual,emp
union
select x from dual,emp
How does performance differs for the above statements?
Regards,
Vishwa
There should be no difference in performance
Oracle almost never excutes the query you give it.
Oracle takes the query you give it and trys every time to re-write. It looks for ways to simplify the query by removing reducnancies, and it looks for ways to add extra information to open up access paths, and it considers how to minimize intermediary rowset sizes as it goes, and lots of other stuff. What that means is this and all three of these queries and the same:
regardless of which variation you give to Oracke, Oracle will re-write the first two versions of the query into this last version.
With this understanding in mind, it should be clear that if Oracle did its re-write as we expect, the query plans and performance should be exaclty the same because in the end, all three queries will be executed the same because the all become that same query.
Hope this helps, Kevin
Can it work like this...?
Hi Kevin,
Example:
with XX_orders as
(select * from order_detail where supplier='XX')
select item_desc from item_master a,XX_orders b
where a.item=b.item
union
select item_desc from item_master_new a,XX_orders b
where a.item=b.item
In the above case,It makes sense to scan the order_detail only once and join the result set (storing it in temp table) with the other tables rather than scanning the table order_detail twice with each query it is referenced.
Doesn't it make sense?
Can with clause do this..? if no is there any way can this be done in oracle(by oracle itself or by developers)?
Many Thanks
Vishwa...!
Performance Boost
Actually, from a practical perspective, we've found that the use of the WITH clause can produce a significant performance boost. From what I've read, it relates to the fact that later versions of Oracle materialize (effectively, run once and then hold in memory) the subqueries identified in the WITH clause. If the queries are nestled securely in the FROM portion of the SQL there appears to be the possibility that they can be run multiple times. If there is any significant return time required for those subqueries, and for many of ours there is, that time can pile up massively over a large dataset. Simply moving the subqueries up to the WITH clause eliminates all of that reduplicative running and treats those subqueries as if they were standing tables. We've seen runtimes reduced on some of our more massive SQLs by orders of magnitude. Now granted, this is mostly anecdotal, as I've not run the updates through tracing or analyzing to examine exactly what Oracle is doing differently, but it is definitely doing something differently, and that something can be loads faster.
Thanks.
David H.