Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Newbie Query Question
"JES" <nausadge_at_yahoo.com> wrote in message
news:xu1ec.14590$mP1.6039_at_newssvr22.news.prodigy.com...
> I have a table that is "long" and I must transform the table into a "wide"
> table. For example, the long table is DATE, DATAVALUE, DATANAME (where
> there is one record for each measured DATAVALUE and where DATANAME comes
> from pre-defined list of names) and I need to transform this table into
> DATE, DATANAME1, DATANAME2, DATANAME3, ETC, where I get one record for
each
> unique DATE and where DATAVALUE from the original table is assigned to the
> corresponding DATANAME column in the "wide" table.
>
> I have done this two ways, and one way is much slower in Oracle than the
> other. Why? Any suggestions for another method? Both methods seem to
run
> quickly in Access...
>
> Method 1. Something like the following...
>
> SELECT DATE AS DATE,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.DATE = tbl1_tmp.DATE AND
> WHERE DATANAME = 'DATANAME1') AS DATANAME1,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.DATE = tbl1_tmp.DATE AND
> DATANAME = 'DATANAME2') AS DATANAME2,
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.DATE = tbl1_tmp.DATE AND
> WHERE DATANAME = 'DATANAME3') AS DATANAME3
>
> FROM tbl1
>
> Method 2. And something like the following
>
> SELECT DATE AS DATE,
> SELECT tbl1_tmp1.DATAVALUE AS DATANAME1
> SELECT tbl1_tmp2.DATAVALUE AS DATANAME2
> SELECT tbl1_tmp3.DATAVALUE AS DATANAME3
>
> FROM
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.DATE = tbl1_tmp.DATE AND
> WHERE DATANAME = 'DATANAME1') tbl1_tmp1,
>
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.DATE = tbl1_tmp.DATE AND
> DATANAME = 'DATANAME2') tbl1_tmp2,
>
> (SELECT DATAVALUE
> FROM tbl1_tmp
> WHERE tbl1.DATE = tbl1_tmp.DATE AND
> WHERE DATANAME = 'DATANAME3') tbl1_tmp3,
>
> tbl1
>
>
> Thanks,
>
> JES
>
>
>
JES, Method #1 has a couple of syntax errors which were easily fixed. However, method #2 is not valid SQL because you are mixing a correlated subquery with a Cartesian product.
In summary, the biggest improvement for method #1 came from putting a primary key constraint on the TBL1_TMP table with a reduction in elapsed time from 349.26 seconds to 2.46 seconds (a reduction of 99.55%). The best time I could get with the test data was 0.82 seconds using a pivot query on a heap table.
The technique you need to master is that of pivot query. For an explanation, see "Pivot Query" on pp.576 to 582 of "Expert One-on-One Oracle" by Thomas Kyte (A-Press:2003). You could also search http://asktom.oracle.com for "pivot query".
These tests were run under 10.1.0.2 on WinXP Pro.
(1) Creation of Test Data
Let's create some test data:
CREATE TABLE tbl1_tmp AS
SELECT
TRUNC( SYSDATE ) + FLOOR( rownum / 3 ) AS datadate, DECODE( MOD( rownum, 3), 0, 'DATANAME1', 1, 'DATANAME2', 2, 'DATANAME3' ) AS dataname, rownum AS datavalue FROM all_objects
I could not call the first column, 'DATE', because it is a data type. The
table structure is:
SQL> desc tbl1_tmp
Name Null? Type ------------------- -------- ------------------ DATADATE DATE DATANAME VARCHAR2(9) DATAVALUE NUMBER
The date table (TBL1) was created as follows: CREATE TABLE tbl1 AS SELECT DISTINCT datadate FROM tbl1_tmp;
I collected statistics for the tables:
SQL> exec dbms_stats.gather_table_stats( USER, 'TBL1' )
SQL> exec dbms_stats.gather_table_stats( USER, 'TBL1_TMP' )
To get the time taken for a query,
SQL> SET TIMING ON
I turned SQL tracing on with:
ALTER SESSION SET timed_statistics=true;
ALTER SESSION SET sql_trace=true;
(2) Original Query
I then corrected some errors in your first query to produce:
SELECT datadate,
(SELECT DATAVALUE
FROM tbl1_tmp
WHERE tbl1.datadate = tbl1_tmp.datadate AND
DATANAME = 'DATANAME1') AS DATANAME1,
(SELECT DATAVALUE
FROM tbl1_tmp
WHERE tbl1.datadate = tbl1_tmp.datadate AND
DATANAME = 'DATANAME2') AS DATANAME2,
(SELECT DATAVALUE
FROM tbl1_tmp
WHERE tbl1.datadate = tbl1_tmp.datadate AND
DATANAME = 'DATANAME3') AS DATANAME3
FROM tbl1
;
The results were as follows:
13035 rows selected.
Elapsed: 00:05:49.26
The TKPROF output shows:
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ---------- ------
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 73
Rows Row Source Operation
------- --------------------------------------------------- 13034 TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115302886 us)13034 TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115816431 us) 13034 TABLE ACCESS FULL TBL1_TMP (cr=2020425 pr=0 pw=0 time=115808387 us) 13035 TABLE ACCESS FULL TBL1 (cr=894 pr=0 pw=0 time=91381 us)
The main point to note is that we are doing in excess of 6 million logical I/O's to process this query because each correlated subquery requires a table scan.
(3) Original Query with PK
I then added a primary key constraint (TBL1_TMP_PK) for DATADATE and
DATANAME because the nature of the first query assumes that the correlated
subqueries return a single value:
ALTER TABLE tbl1_tmp
ADD CONSTRAINT tbl1_tmp_pk
PRIMARY KEY (datadate, dataname)
;
I collected statistics for the index:
SQL> exec dbms_stats.gather_index_stats( USER, 'TBL1_TMP_PK' )
The query was rerun after statistics were collected. The results were as follows:
13035 rows selected.
Elapsed: 00:00:02.46
The TKPROF output showed:
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ---------- ------
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 73
Rows Row Source Operation
------- ---------------------------------------------------13034 TABLE ACCESS BY INDEX ROWID TBL1_TMP (cr=26939 pr=52 pw=0 time=322452 us)
Now the logical I/O's have dropped from 6 million down to 81,711 and achieved a reduction in elapsed time of 99.6%.
(4) Original Query with IOT
In an attempt to improve things even further, I created an index organised
table (IOT):
CREATE TABLE tbl1_tmp_iot
(
datadate DATE, dataname VARCHAR2(9), datavalue NUMBER, CONSTRAINT tbl1_tmp_iot_pk PRIMARY KEY ( datadate, dataname ))
I populated the table as follows:
INSERT INTO tbl1_tmp_iot SELECT * FROM tbl1_tmp;
COMMIT;
I collected the statistics:
SQL> exec dbms_stats.gather_index_stats( USER, 'TBL1_TMP_IOT_PK' )
SELECT datadate,
(SELECT DATAVALUE
FROM tbl1_tmp_iot
WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
DATANAME = 'DATANAME1') AS DATANAME1,
(SELECT DATAVALUE
FROM tbl1_tmp_iot
WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
DATANAME = 'DATANAME2') AS DATANAME2,
(SELECT DATAVALUE
FROM tbl1_tmp_iot
WHERE tbl1.datadate = tbl1_tmp_iot.datadate AND
DATANAME = 'DATANAME3') AS DATANAME3
FROM tbl1
;
The elapsed time is 00:00:01.29 seconds. The TKPROF output is:
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ---------- ------
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 73
Rows Row Source Operation
------- ---------------------------------------------------13034 INDEX UNIQUE SCAN TBL1_TMP_IOT_PK (cr=13905 pr=0 pw=0 time=167930 us)(object id 54041)
Now we have almost halfed the number of logical I/O's by eliminating the table lookup for the value of the DATAVALUE column.
(5) Pivot Query with PK
I took a modified version of Thomas Kyte's query from p.577 to produce the
following query:
SELECT
datadate, MAX( DECODE( dataname, 'DATANAME1', datavalue ) ) AS dataname1, MAX( DECODE( dataname, 'DATANAME2', datavalue ) ) AS dataname2, MAX( DECODE( dataname, 'DATANAME3', datavalue ) ) AS dataname3 FROM tbl1_tmp GROUP BY datadate
The elapsed time is 00:00:00.82 seconds and the TKPROF output shows:
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ---------- ------
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 73
Rows Row Source Operation
------- ---------------------------------------------------13035 SORT GROUP BY (cr=155 pr=110 pw=110 time=278318 us) 39102 TABLE ACCESS FULL TBL1_TMP (cr=155 pr=0 pw=0 time=78327 us)
(6) Pivot Query with IOT
I used the above pivot query on the IOT: SELECT
datadate, MAX( DECODE( dataname, 'DATANAME1', datavalue ) ) AS dataname1, MAX( DECODE( dataname, 'DATANAME2', datavalue ) ) AS dataname2, MAX( DECODE( dataname, 'DATANAME3', datavalue ) ) AS dataname3 FROM tbl1_tmp_iot GROUP BY datadate
The elapsed time is 00:00:01.62 seconds, and the TKPROF output shows:
call count cpu elapsed disk query current
rows
------- ------ -------- ---------- ---------- ---------- ---------- ------
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 73
Rows Row Source Operation
------- ---------------------------------------------------13035 SORT GROUP BY NOSORT (cr=1013 pr=0 pw=0 time=183040 us) 39102 INDEX FULL SCAN TBL1_TMP_IOT_PK (cr=1013 pr=0 pw=0 time=117988 us)(object id 54041)
(7) Overview of Pivot Query
The key to pivot queries is the use of the comibination of the MAX and DECODE functions with the GROUP BY clause.
To see how the process of pivoting works, lets look at the intermediate step (without the MAX function and GROUP BY clause): SELECT
datadate, DECODE( dataname, 'DATANAME1', datavalue ) AS dataname1, DECODE( dataname, 'DATANAME2', datavalue ) AS dataname2, DECODE( dataname, 'DATANAME3', datavalue ) AS dataname3 FROM tbl1_tmp WHERE rownum <= 5
DATADATE DATANAME1 DATANAME2 DATANAME3
--------- ---------- ---------- ---------- 11-APR-04 1 11-APR-04 2 12-APR-04 3 12-APR-04 4 12-APR-04 5
You will note that rows #3, #4, and #5 have placed the value of the DATAVALUE column into different columns depending upon the value of the DATANAME column. The blank entries are really NULLs.
To condense these groups of three (3) rows into a single row requires the use of the MAX function in conjunction with the GROUP BY clause.
Douglas Hawthorne Received on Mon Apr 12 2004 - 01:38:55 CDT