Example of Data Pivots in SQL (rows to columns and columns to rows)
My current client has the crew here doing a good deal of data pivoting in migrating data between several Oracle Systems. I figured to turn some heads in a code review by providing a solution for data pivots which is not based on "locally favored" traditional 3GL coding practices but instead employs what is to me the more natural way of doing it for an Oracle database. I was right, heads did turn. Since I now have to document it, I figured why not post it on OraFAQ once and then tell them its online, come here and read it. So this is an example of pivoting in SQL.
There are several other offerings on OraFAQ that discuss this topic so look around for them if you are interested. ASKTOM Home also has some good stuff so look there too. Additionally, there is a surprising paper hidden somewhere on the internet that offers up a moderately useful "generic" solution to data pivoting in an Oracle environment. It makes an almost brilliant use of certain Oracle features (was some combination of ANYDATA, REFCURSORS, and COLLECTIONS, and maybe ANALYTICS, I forget). But alas I cannot find it, so if anyone stumbles across it, please post a reply with the link so we can all enjoy it.
There are two ways to pivot data:
1) you convert multiple rows into a single row with repeating groups of columns.
2) you convert columns on a single row into multiple rows with a generic column and a rowtype discriminator.
These operations are essentially the inverse of each other.
For clarification I refer you to the following data representations below. Although there are several layouts for the data, the data itself is exactly the same. It is just presented differently as a result of zero, one, or multiple data pivots. We are going to walk through the code that does these pivots.
There are four different presentations of this data shown (2 rows (the starting layout for us), 1 row, 6 rows, 3 rows). The "2 rows" representation aligns with the table describe. The data is the same in all four representations. This is important to realize so I’ll say it again; The data is the same in all four layouts. We move back and forth between these data representations by doing the right set of pivot operations.
Now if you are a modeler (and I am), you might point out that the potential primary key of each layout of data is in fact different and thus there are in fact four different things. To which I might reply, SURE, but then again it is still the same data so maybe it ain’t four different things, but the same thing stuck in four different boxes. Ah well, not really relevant to our discussion, but I had to say it so I wouldn’t be chastised later for ignoring it.
We won’t discuss the wisdom of storing our data in one layout over another. We will just accept for the moment that collectively they make for a great demonstration pivoting in SQL. I will point out however that I have in fact seen this data stored in all four layouts in four different data systems (legacy COBOL system, OLTP system, Warehouse system, Traditional Reporting system). I’ll let you guess as to which system used what layout. The author of each system had to his/her mind a reasoning for selecting the particular layout, based on performance and ease of use. Ease of use was often driven by limitations of reporting tools. So don’t think any of these data layouts is invalid, or that any is inherently superior of inferior to another. They all work at the right time for the right reason.
Our data shows summaries of retail sales events. There are some points of interest to note.
1) These are summary rows by terminal, by day, hence there is a total amount, a total discount and a total tax.
2) This summary contains the same data twice, because we have physically stored it for two different currencies. In this example, the sale was made in US dollars (Original Currency), and a second row was created for Canadian currency (Base Currency) (our company is maybe Headquartered in Canada?).
SQL> desc retail_sale Name Null? Type ----------------------------------- -------- -------------- RETAIL_SALE_ID NOT NULL NUMBER TERMINAL_ID NOT NULL NUMBER SALE_DATE NOT NULL DATE CURRENCY_CATEGORY_CODE VARCHAR2(8) TTL_AMT NUMBER TTL_DIS NUMBER TTL_TAX NUMBER
This shows the data as two rows. Consider this the original format we are given.
SQL> select * 2 from retail_sale 3 / Currency RETAIL_SALE_ID TERMINAL_ID SALE_DATE Category TTL_AMT TTL_DIS TTL_TAX -------------- ----------- --------- -------- ------- ------- ------- 1 1.00 14-JUN-07 ORIGINAL 100.00 20.00 4.80 2 1.00 14-JUN-07 BASE 117.00 23.40 5.62 2 rows selected.
Using the proper Pivot(s), we could change the two rows into one row with multiple columns, letting column names distinguish what the data is.
TERMINAL_ID SALE_DATE BASE_TTL_AMT BASE_TTL_DIS BASE_TTL_TAX ORIGINAL_TTL_AMT ORIGINAL_TTL_DIS ORIGINAL_TTL_TAX ----------- --------- ------------ ------------ ------------ ---------------- ---------------- ---------------- 1.00 14-JUN-07 117.00 23.40 5.62 100.00 20.00 4.80 1 row selected.
Or using different Pivot(s), we could change the two rows into six rows, by employing a generic amount field, and adding a rowtype discriminator (Dollar Name in this example).
Currency Dollar RETAIL_SALE_ID TERMINAL_ID SALE_DATE Category Name AMT -------------- ----------- --------- -------- -------- ------- 1 1.00 14-JUN-07 ORIGINAL TTL_AMT 100.00 1 1.00 14-JUN-07 ORIGINAL TTL_DIS 20.00 1 1.00 14-JUN-07 ORIGINAL TTL_TAX 4.80 2 1.00 14-JUN-07 BASE TTL_AMT 117.00 2 1.00 14-JUN-07 BASE TTL_DIS 23.40 2 1.00 14-JUN-07 BASE TTL_TAX 5.62 6 rows selected.
Or using yet again different Pivot(s), we could turn our two rows into three rows. This one is most interesting as it involves two pivot operations. Hence we are turning rows into columns and then columns into rows (or was it columns into rows and then rows into columns? (or does the order of pivots even matter?)).
Dollar TERMINAL_ID SALE_DATE Name BASE_AMT ORIGINAL_AMT ----------- --------- -------- -------- ------------ 1.00 14-JUN-07 TTL_AMT 117.00 100.00 1.00 14-JUN-07 TTL_DIS 23.40 20.00 1.00 14-JUN-07 TTL_TAX 5.62 4.80 3 rows selected.
Here is the SQL for building the example:
1) turn rows into columns 2) turn columns into rows drop table retail_sale / create table retail_sale ( retail_sale_id number not null ,terminal_id number not null ,sale_date date not null ,currency_category_code varchar2(8) -- base, original ,ttl_amt number ,ttl_dis number ,ttl_tax number ) / create unique index retail_sale_pk on retail_sale (retail_sale_id) / create unique index retail_sale_uk1 on retail_sale (terminal_id,sale_date,currency_category_code) / alter table retail_sale add constraint retail_sale_pk primary key (retail_sale_id) add constraint retail_sale_uk1 unique (terminal_id,sale_date,currency_category_code) / insert into retail_sale values (1,1,trunc(sysdate),'ORIGINAL',100,20,4.80); insert into retail_sale values (2,1,trunc(sysdate),'BASE',100*1.17,20*1.17,round(4.8*1.17,2)); commit / set numformat 990.99 col retail_sale_id format 990 col currency_categorY_code format a8 head 'Currency|Category' col dollar_name format a8 head 'Dollar|Name' select * from retail_sale / select terminal_id ,sale_date ,case when currency_category_code = 'BASE' then ttl_amt end base_ttl_amt ,case when currency_category_code = 'BASE' then ttl_dis end base_ttl_dis ,case when currency_category_code = 'BASE' then ttl_tax end base_ttl_tax ,case when currency_category_code = 'ORIGINAL' then ttl_amt end original_ttl_amt ,case when currency_category_code = 'ORIGINAL' then ttl_dis end original_ttl_dis ,case when currency_category_code = 'ORIGINAL' then ttl_tax end original_ttl_tax from retail_sale / select terminal_id ,sale_date ,sum(case when currency_category_code = 'BASE' then ttl_amt end) base_ttl_amt ,sum(case when currency_category_code = 'BASE' then ttl_dis end) base_ttl_dis ,sum(case when currency_category_code = 'BASE' then ttl_tax end) base_ttl_tax ,sum(case when currency_category_code = 'ORIGINAL' then ttl_amt end) original_ttl_amt ,sum(case when currency_category_code = 'ORIGINAL' then ttl_dis end) original_ttl_dis ,sum(case when currency_category_code = 'ORIGINAL' then ttl_tax end) original_ttl_tax from retail_sale group by terminal_id ,sale_date / select retail_sale_id ,terminal_id ,sale_date ,currency_category_code ,case when dollar_type.rowno = 1 then 'TTL_AMT' when dollar_type.rowno = 2 then 'TTL_DIS' when dollar_type.rowno = 3 then 'TTL_TAX' end dollar_name ,case when dollar_type.rowno = 1 then ttl_amt when dollar_type.rowno = 2 then ttl_dis when dollar_type.rowno = 3 then ttl_tax end amt from retail_sale ,( select rownum rowno from dual connect by level <= 3 ) dollar_type order by 1,2,3,5 / select terminal_id ,sale_date ,case when dollar_type.rowno = 1 then 'TTL_AMT' when dollar_type.rowno = 2 then 'TTL_DIS' when dollar_type.rowno = 3 then 'TTL_TAX' end dollar_name ,case when dollar_type.rowno = 1 then base_ttl_amt when dollar_type.rowno = 2 then base_ttl_dis when dollar_type.rowno = 3 then base_ttl_tax end base_amt ,case when dollar_type.rowno = 1 then original_ttl_amt when dollar_type.rowno = 2 then original_ttl_dis when dollar_type.rowno = 3 then original_ttl_tax end usd_amt from ( select terminal_id ,sale_date ,sum(case when currency_category_code = 'BASE' then ttl_amt end) base_ttl_amt ,sum(case when currency_category_code = 'BASE' then ttl_dis end) base_ttl_dis ,sum(case when currency_category_code = 'BASE' then ttl_tax end) base_ttl_tax ,sum(case when currency_category_code = 'ORIGINAL' then ttl_amt end) original_ttl_amt ,sum(case when currency_category_code = 'ORIGINAL' then ttl_dis end) original_ttl_dis ,sum(case when currency_category_code = 'ORIGINAL' then ttl_tax end) original_ttl_tax from retail_sale group by terminal_id ,sale_date ) ,( select rownum rowno from dual connect by level <= 3 ) dollar_type order by 1,2,3 /
So lets walk the pivots shall we…
From Two rows to One row (a row to column pivot):
Here we convert our data from two rows with three columns and a row type discriminator (Currency Category), to one row with six columns where each column name distinguishes the identity of the data. Note that the Currency Category column is no longer necessary after the pivot and in deed does not even fit our data after the pivot, so we drop it. Note also that we are forced to drop the primary key of our original table because it too no longer maps to our data after the pivot since two rows have been combined into one.
Ah but there is a little hitch. You see our first pivot query generated two rows, with six columns not one row with six columns. Each of the rows had only a portion of the result columns filled in (the columns that came from the original row that generated this one). What we want is a single row that contains values filled in for all columns.
The most straight forward way to get a single row with filled in columns is to group by the correct keys, which is essentially everything else in the table but the pivoted data. In the case of a non-numeric field you can use max or min instead of sum.
SQL> select * 2 from retail_sale 3 / Currency RETAIL_SALE_ID TERMINAL_ID SALE_DATE Category TTL_AMT TTL_DIS TTL_TAX -------------- ----------- --------- -------- ------- ------- ------- 1 1.00 14-JUN-07 ORIGINAL 100.00 20.00 4.80 2 1.00 14-JUN-07 BASE 117.00 23.40 5.62 2 rows selected. SQL> select 2 terminal_id 3 ,sale_date 4 ,case when currency_category_code = 'BASE' then ttl_amt end base_ttl_amt 5 ,case when currency_category_code = 'BASE' then ttl_dis end base_ttl_dis 6 ,case when currency_category_code = 'BASE' then ttl_tax end base_ttl_tax 7 ,case when currency_category_code = 'ORIGINAL' then ttl_amt end original_ttl_amt 8 ,case when currency_category_code = 'ORIGINAL' then ttl_dis end original_ttl_dis 9 ,case when currency_category_code = 'ORIGINAL' then ttl_tax end original_ttl_tax 10 from retail_sale 11 / TERMINAL_ID SALE_DATE BASE_TTL_AMT BASE_TTL_DIS BASE_TTL_TAX ORIGINAL_TTL_AMT ORIGINAL_TTL_DIS ORIGINAL_TTL_TAX ----------- --------- ------------ ------------ ------------ ---------------- ---------------- ---------------- 1.00 14-JUN-07 100.00 20.00 4.80 1.00 14-JUN-07 117.00 23.40 5.62 2 rows selected. Oops, that looks like two rows, lets try again. SQL> select 2 terminal_id 3 ,sale_date 4 ,sum(case when currency_category_code = 'BASE' then ttl_amt end) base_ttl_amt 5 ,sum(case when currency_category_code = 'BASE' then ttl_dis end) base_ttl_dis 6 ,sum(case when currency_category_code = 'BASE' then ttl_tax end) base_ttl_tax 7 ,sum(case when currency_category_code = 'ORIGINAL' then ttl_amt end) original_ttl_amt 8 ,sum(case when currency_category_code = 'ORIGINAL' then ttl_dis end) original_ttl_dis 9 ,sum(case when currency_category_code = 'ORIGINAL' then ttl_tax end) original_ttl_tax 10 from retail_sale 11 group by 12 terminal_id 13 ,sale_date 14 / TERMINAL_ID SALE_DATE BASE_TTL_AMT BASE_TTL_DIS BASE_TTL_TAX ORIGINAL_TTL_AMT ORIGINAL_TTL_DIS ORIGINAL_TTL_TAX ----------- --------- ------------ ------------ ------------ ---------------- ---------------- ---------------- 1.00 14-JUN-07 117.00 23.40 5.62 100.00 20.00 4.80 1 row selected. SQL>
From Two rows to Six rows (a column to row pivot):
Here we do a pivot in the opposite direction as before. We take multiple columns with unique names and stuff them into a single column of a more generic nature and then add a row type discriminator to tell use where each row came from (Dollar Name in this case). I used the simplistic method of making the row type discriminator values be the original column names. If you are following the modeling sub-thread then you recongnize that this row type discriminator become part of our unique key to the results rowset.
You will notice please that we used the generic row generator technique of connect by level against dual (very handy). This I believe is only good for 9i and above. We generate three rows this way, three because we have three columns to pivot and each column will turn into a row after our pivot. Basically this means we copy each row from our source data three times, and then depending upon which version of the row we have (1 or 2 or 3), we pick off a specific set of columns to put into our generic column. In our case we mapped 1 to TTL_AMT, 2 to TTL_DIS and 3 to TTL_TAX. You should always double check your work to make sure the row type discriminator you use is the correct label for the data value you have picked off.
SQL> select * 2 from retail_sale 3 / Currency RETAIL_SALE_ID TERMINAL_ID SALE_DATE Category TTL_AMT TTL_DIS TTL_TAX -------------- ----------- --------- -------- ------- ------- ------- 1 1.00 14-JUN-07 ORIGINAL 100.00 20.00 4.80 2 1.00 14-JUN-07 BASE 117.00 23.40 5.62 2 rows selected. SQL> select retail_sale_id 2 ,terminal_id 3 ,sale_date 4 ,currency_category_code 5 ,case 6 when dollar_type.rowno = 1 then 'TTL_AMT' 7 when dollar_type.rowno = 2 then 'TTL_DIS' 8 when dollar_type.rowno = 3 then 'TTL_TAX' 9 end dollar_name 10 ,case 11 when dollar_type.rowno = 1 then ttl_amt 12 when dollar_type.rowno = 2 then ttl_dis 13 when dollar_type.rowno = 3 then ttl_tax 14 end amt 15 from retail_sale 16 ,( 17 select rownum rowno 18 from dual 19 connect by level <= 3 20 ) dollar_type 21 order by 1,2,3,5 22 / Currency Dollar RETAIL_SALE_ID TERMINAL_ID SALE_DATE Category Name AMT -------------- ----------- --------- -------- -------- ------- 1 1.00 14-JUN-07 ORIGINAL TTL_AMT 100.00 1 1.00 14-JUN-07 ORIGINAL TTL_DIS 20.00 1 1.00 14-JUN-07 ORIGINAL TTL_TAX 4.80 2 1.00 14-JUN-07 BASE TTL_AMT 117.00 2 1.00 14-JUN-07 BASE TTL_DIS 23.40 2 1.00 14-JUN-07 BASE TTL_TAX 5.62 6 rows selected.
Lastly Two rows into Three rows (a double pivot example):
You will notice that there is really nothing new here. We just did both pivot operations. First we coded the sql for one pivot, and then we embedded it into the sql for then next pivot. So, the first pivot turns two rows into one row removing a row type discriminator naturally present in the data. Then the second pivot turns one row into three rows, adding back a new row type discriminator to distinguish rows with. Can you tell me what the unique key to our data is after these pivots?
Some might ask if the order of operations is important. For example, can we do the second pivot shown here first and the first second? Answer is sure you can. Makes no difference. The end result will be the same. I would suggest to you that you write this code as an exercise. It will ensure you understand what you have just read. Depending upon how smart Oracle is though, there might be a performance difference. Maybe someone could look into that for us and give us a test case to work with.
SQL> select * 2 from retail_sale 3 / Currency RETAIL_SALE_ID TERMINAL_ID SALE_DATE Category TTL_AMT TTL_DIS TTL_TAX -------------- ----------- --------- -------- ------- ------- ------- 1 1.00 14-JUN-07 ORIGINAL 100.00 20.00 4.80 2 1.00 14-JUN-07 BASE 117.00 23.40 5.62 2 rows selected. select terminal_id ,sale_date ,case when dollar_type.rowno = 1 then 'TTL_AMT' when dollar_type.rowno = 2 then 'TTL_DIS' when dollar_type.rowno = 3 then 'TTL_TAX' end dollar_name ,case when dollar_type.rowno = 1 then base_ttl_amt when dollar_type.rowno = 2 then base_ttl_dis when dollar_type.rowno = 3 then base_ttl_tax end base_amt ,case when dollar_type.rowno = 1 then original_ttl_amt when dollar_type.rowno = 2 then original_ttl_dis when dollar_type.rowno = 3 then original_ttl_tax end original_amt from ( select terminal_id ,sale_date ,sum(case when currency_category_code = 'BASE' then ttl_amt end) base_ttl_amt ,sum(case when currency_category_code = 'BASE' then ttl_dis end) base_ttl_dis ,sum(case when currency_category_code = 'BASE' then ttl_tax end) base_ttl_tax ,sum(case when currency_category_code = 'ORIGINAL' then ttl_amt end) original_ttl_amt ,sum(case when currency_category_code = 'ORIGINAL' then ttl_dis end) original_ttl_dis ,sum(case when currency_category_code = 'ORIGINAL' then ttl_tax end) original_ttl_tax from retail_sale group by terminal_id ,sale_date ) ,( select rownum rowno from dual connect by level <= 3 ) dollar_type order by 1,2,3 / Dollar TERMINAL_ID SALE_DATE Name BASE_AMT ORIGINAL_AMT ----------- --------- -------- -------- ------------ 1.00 14-JUN-07 TTL_AMT 117.00 100.00 1.00 14-JUN-07 TTL_DIS 23.40 20.00 1.00 14-JUN-07 TTL_TAX 5.62 4.80 3 rows selected.
There you have it. This is the fastest way I have found to do these kinds of pivots inside Oracle. No combination of PL/SQL (slow by slow as Tom of ASKTOM Home fame would say), nor objects is any better, at least not that I have seen. I thought I read somewhere of analytics maybe being another way to do pivots but I have not figured that one out yet. If anyone knows this, please post an example.
So these techniques are pretty generic in nature. You use CASE and GROUP BY to pivot from rows to columns. You use CASE and ROW DUPLICATION to pivot from columns to rows. You subtract or add a rowtype discriminator in the process as needed, and also alter your primary and/or unique keys at the same time.
Kevin
- Kevin Meade's blog
- Log in to post comments
Comments
Interresting!
Very good stuff indeed... Mostly useful for reporting and re-engineering of data. I don't think it's known enough that it is possible to do that kind of processing in SQL. I have done that before using techniques found in the book Expert One-on-One Oracle by Thomas Kyte (I can't take the credit sadly...). He uses the row_count() stuff I think. It can be done faster or slower depending on the flexibility/generality you need. If you process the data in something like java or any other 3GL language, I would guess that it would be faster and more efficient to do it at that point (my two cents).
Java?
Just thought about embedding Java code in Oracle. Never used that though... Anyone?
Yes thanks, Tom's Blog is an interesting place to see
Here is a link to another kind of data pivot Tom talks about. This is a rows to column pivot but it is a data pivot with pivot results all going to one column. Some of you may recall the STRAGG function. It takes all the values of a column for a group of rows and strings them together in a delimlited list. Here is Tom's new way of doing it using the SYS_CONNECT_BY_PATH() that came with 8i.
Essentially, for a group or rows, create a delimited string of all values in a specific column and then outoupt one row for the group with a column that contains the delimited list.
There are of course limits to this (final string size and all), but it is an interesting read non-the-less. It is after all Tom Kyte.
Check it out.
http://tkyte.blogspot.com/2007/03/stringing-them-up.html
Wondering if this was the article you meant ?
I have found this article on creating pivot tables/views using analytics
http://www.akadia.com/services/ora_analytic_functions.html
Best regards
Carl Bruhn
Are these the articles you meant?
I found the following article which looks like the one you referred to - it describes how to pivot without hard coding column values in the query. http://technology.amis.nl/blog/?p=1207
That one in turn refers to http://technology.amis.nl/blog/?p=1197
Regards,
Paul.
Yes Paul, these are the articles, thanks
thanks also to Carl and JF for their comments and links too.
Kevin
Query
how can i achieve vice verse result set
here products can be n number