Home » RDBMS Server » Performance Tuning » Is there a better way to rewrite this query? (Oracle 11g , Linux)
Is there a better way to rewrite this query? [message #474782] |
Wed, 08 September 2010 18:02 |
kvignes1
Messages: 33 Registered: September 2007
|
Member |
|
|
SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
f."TENANT_ID"
FROM w_sample_f f,
(SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
FROM w_sample_f
GROUP BY serial_num, product_wid) g
WHERE f.serial_num = g.serial_num
AND f.obj_lst_num = g.obj_lst_num
AND f.product_wid = g.product_wid
AND f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
ORDER BY f.serial_num;
consider that above FACT table (w_sample_f) is 50GB in size. Its taking too long while querying this.
[Updated on: Wed, 08 September 2010 18:04] Report message to a moderator
|
|
|
Re: Is there a better way to rewrite this query? [message #474783 is a reply to message #474782] |
Wed, 08 September 2010 18:04 |
|
BlackSwan
Messages: 26766 Registered: January 2009 Location: SoCal
|
Senior Member |
|
|
Since NOBODY can optimize SQL just by looking at it, we need a few more details.
http://www.orafaq.com/forum/m/433888/136107/#msg_433888
does SQL below perform any better?
SELECT f."PRODUCT_WID",
f."PLANT_LOC_WID",
f."STORAGE_LOC_WID",
f."MVMNT_TYPE_ID",
f."OBJ_LST_NUM",
f."MATERIAL_DOC_NUM",
f."MAT_DOC_YEAR",
f."ITEM_NUM",
f."DOC_TYPE",
f."DB_CR_IND",
f."XACT_DT_WID",
f."POSTED_ON_DT_WID",
f."SYSTEM_DT_WID",
f."VENDOR_NUM",
f."USER_VZID",
f."TECH_VZID",
f."BATCH_NUM",
f."OBJ_LST_CNTR",
f."EQUIPMENT_NUM",
f."SERIAL_NUM",
f."DATASOURCE_NUM_ID",
f."ETL_PROC_WID",
f."INTEGRATION_ID",
f."TENANT_ID"
FROM w_sample_f f
WHERE f.mvmnt_type_id NOT IN ( '201', '701', '702', '502', '641', '962' )
AND ( f.product_wid, f.serial_num, f.obj_lst_num ) IN
(SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
FROM w_sample_f
GROUP BY serial_num, product_wid)
ORDER BY f.serial_num;
I need to ask this question
Can/should the same NOT IN be used with the second SELECT?
[Updated on: Wed, 08 September 2010 20:58] Report message to a moderator
|
|
|
Re: Is there a better way to rewrite this query? [message #474795 is a reply to message #474782] |
Thu, 09 September 2010 00:55 |
Its_me_ved
Messages: 979 Registered: October 2009 Location: India
|
Senior Member |
|
|
(SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
FROM w_sample_f
GROUP BY serial_num, product_wid) g
Try to rewrite the sql replacing above.
May be you can think of creating a function that will return the max of obj_lst_num based on serial_num, product_wid
And rewrite something like..
where obj_lst_num= your function that returns the max value
A test case will help to solve your problem.
Regards
Ved
|
|
|
Re: Is there a better way to rewrite this query? [message #474831 is a reply to message #474795] |
Thu, 09 September 2010 04:17 |
cookiemonster
Messages: 13958 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
If you mean a custom function that would almost certainly be less performant than what the OP has now.
If you mean using analytics, why didn't you say so:
SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
f."TENANT_ID",
MAX (obj_lst_num) OVER PARTITION BY (serial_num, product_wid) max_obj_lst_num
FROM w_sample_f f
WHERE max_obj_lst_num = obj_lst_num
AND f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
ORDER BY f.serial_num;
That should be pretty fast if you have an index on serial_num, product_wid, obj_lst_num and possibly mvmnt_type_id.
|
|
|
|
Re: Is there a better way to rewrite this query? [message #474836 is a reply to message #474833] |
Thu, 09 September 2010 04:36 |
cookiemonster
Messages: 13958 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
Point.
SELECT <COLUMNS....>
FROM (SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
f."TENANT_ID",
MAX (obj_lst_num) OVER PARTITION BY (serial_num, product_wid) max_obj_lst_num
FROM w_sample_f f
WHERE f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
)
WHERE max_obj_lst_num = obj_lst_num
ORDER BY f.serial_num;
|
|
|
|
Re: Is there a better way to rewrite this query? [message #474885 is a reply to message #474837] |
Thu, 09 September 2010 08:45 |
cookiemonster
Messages: 13958 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
Yes it will throw that error but not where you think. the table alisas on the order by is a problem and I've got a bracket in the wrong place in the analytic. Fixed version:
SELECT <COLUMNS....>
FROM (SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
f."TENANT_ID",
MAX (obj_lst_num) OVER (PARTITION BY serial_num, product_wid) max_obj_lst_num
FROM w_sample_f f
WHERE f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
)
WHERE max_obj_lst_num = obj_lst_num
ORDER BY serial_num;
And if you think there's no noticable difference between a custom function and the analytic approach then you've never compared the difference.
Let's see:
Create a table with sample data and index it:
SQL> CREATE TABLE analytic_test AS (SELECT MOD(ROWNUM, 100) serial_num, MOD(ROWNUM, 100) + 1 product_wid,
2 SYSDATE + (ROWNUM/24) obj_lst_num, ROWNUM some_other_data FROM dual CONNECT BY LEVEL < 1000000);
Table created.
SQL> create index analytic_test1 on analytic_test(serial_num, product_wid, obj_lst_num);
Index created.
Lets see how long it takes using the OPs approach
SQL> set serveroutput on
SQL> DECLARE
2 l_start_time DATE;
3
4 BEGIN
5
6
7 FOR n IN 1..10 LOOP
8
9 l_start_time := SYSDATE;
10
11 FOR rec IN (SELECT f.PRODUCT_WID, f.OBJ_LST_NUM, f.SERIAL_NUM, f.some_other_data
12 FROM analytic_test f,
13 (SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
14 FROM analytic_test
15 GROUP BY serial_num, product_wid) g
16 WHERE f.serial_num = g.serial_num
17 AND f.obj_lst_num = g.obj_lst_num
18 AND f.product_wid = g.product_wid) LOOP
19
20 NULL;
21
22 END LOOP;
23
24 dbms_output.put_line('Run '||n||' time is '||(SYSDATE - l_start_time)*24*60*60||' seconds');
25
26 END LOOP;
27
28 END;
29 /
Run 1 time is 2 seconds
Run 2 time is 2 seconds
Run 3 time is 2 seconds
Run 4 time is 2 seconds
Run 5 time is 2 seconds
Run 6 time is 2 seconds
Run 7 time is 2 seconds
Run 8 time is 2 seconds
Run 9 time is 2 seconds
Run 10 time is 2 seconds
PL/SQL procedure successfully completed.
SQL>
Now let's create a custom function, presume you meant something like this:
SQL> CREATE OR REPLACE FUNCTION get_max(p_product_wid analytic_test.product_wid%TYPE,
2 p_serial_num analytic_test.serial_num%TYPE)
3 RETURN analytic_test.obj_lst_num%TYPE IS
4
5 l_ret analytic_test.obj_lst_num%TYPE;
6
7 BEGIN
8
9 SELECT MAX(obj_lst_num) INTO l_ret
10 FROM analytic_test
11 WHERE product_wid = p_product_wid
12 AND serial_num = p_serial_num;
13
14 RETURN l_ret;
15
16 END get_max;
17 /
Function created.
So let's time it for a query using your function:
SQL> DECLARE
2 l_start_time DATE;
3
4 BEGIN
5
6
7 FOR n IN 1..10 LOOP
8
9 l_start_time := SYSDATE;
10
11 FOR rec IN (SELECT f.PRODUCT_WID, f.OBJ_LST_NUM, f.SERIAL_NUM, f.some_other_data
12 FROM analytic_test f
13 WHERE obj_lst_num = get_max(product_wid, serial_num)) LOOP
14
15 NULL;
16
17 END LOOP;
18
19 dbms_output.put_line('Run '||n||' time is '||(SYSDATE - l_start_time)*24*60*60||' seconds');
20
21 END LOOP;
22
23 END;
24 /
Run 1 time is 117 seconds
Run 2 time is 116 seconds
Run 3 time is 116 seconds
Run 4 time is 116 seconds
Run 5 time is 116 seconds
Run 6 time is 116 seconds
Run 7 time is 116 seconds
Run 8 time is 116 seconds
Run 9 time is 116 seconds
Run 10 time is 115 seconds
PL/SQL procedure successfully completed.
SQL>
Ouch! That murdered the performance.
Let's try it with the analytic:
SQL> DECLARE
2 l_start_time DATE;
3
4 BEGIN
5
6
7 FOR n IN 1..10 LOOP
8
9 l_start_time := SYSDATE;
10
11 FOR rec IN (SELECT PRODUCT_WID, OBJ_LST_NUM, SERIAL_NUM, some_other_data
12 FROM (SELECT PRODUCT_WID, OBJ_LST_NUM, SERIAL_NUM, some_other_data,
13 MAX (obj_lst_num) OVER (PARTITION BY serial_num, product_wid) max_obj_lst_num
14 FROM analytic_test
15 )
16 WHERE max_obj_lst_num = obj_lst_num) LOOP
17
18 NULL;
19
20 END LOOP;
21
22 dbms_output.put_line('Run '||n||' time is '||(SYSDATE - l_start_time)*24*60*60||' seconds');
23
24 END LOOP;
25
26 END;
27 /
Run 1 time is 7 seconds
Run 2 time is 5 seconds
Run 3 time is 5 seconds
Run 4 time is 5 seconds
Run 5 time is 5 seconds
Run 6 time is 5 seconds
Run 7 time is 5 seconds
Run 8 time is 6 seconds
Run 9 time is 4 seconds
Run 10 time is 6 seconds
PL/SQL procedure successfully completed.
SQL>
Turns out it's not quite as good as the OPs orginal approach, at least not with my data, but it's still massively better than custom functions.
Custom functions are a very bad idea in performance terms in 99% of cases.
They hide information from the optimiser and limit its choices and inflict a lot of context switches onto the process.
|
|
|
Re: Is there a better way to rewrite this query? [message #475087 is a reply to message #474782] |
Sun, 12 September 2010 18:13 |
|
Kevin Meade
Messages: 2103 Registered: December 1999 Location: Connecticut USA
|
Senior Member |
|
|
A comment and an idea:
Comment: I wonder if the query is correct:
FROM w_sample_f f,
(SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
FROM w_sample_f
GROUP BY serial_num, product_wid) g
WHERE f.serial_num = g.serial_num
AND f.obj_lst_num = g.obj_lst_num
AND f.product_wid = g.product_wid
AND f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
ORDER BY f.serial_num;
the nested table query looks at all rows in a group in order to compute a max, but the main query excludes various mvmnt_type_id values. I wonder if this NOT IN should also be in the nested table query? Otherwise the MAX is computed using values you are trying to exclude. This may be correct but often is not. Most times you want your MAX to be computed against the same set of rows you are going to return from.
Suggestion: functions called from SQL are very expensive and though quite convenient will almost never be faster than other alternatives. A good idea but as you can see not the right direction. Instead, rely on Oracle's native optimization capabilities.
Assuming the query is correct, I would try creating the following index:
create index on w_sample_f (serial_num, product_wid, obj_lst_num);
with luck, oracle will quickly compute you max values and then use these to identify the necessary rows. In the end, you may still need to scan the 50bg table, who knows but try this index and let us know what happens.
Also, make sure you do the basics like collect statistics, make sure tables have primary, unique, and foreign key constraints defined.
Kevin
|
|
|
Goto Forum:
Current Time: Mon Nov 25 15:04:25 CST 2024
|