Home » RDBMS Server » Performance Tuning » Is there a better way to rewrite this query? (Oracle 11g , Linux)
Is there a better way to rewrite this query? [message #474782] Wed, 08 September 2010 18:02 Go to next message
kvignes1
Messages: 33
Registered: September 2007
Member


   SELECT   f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
            f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
            f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
            f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
            f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
            f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
            f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
            f."TENANT_ID"
       FROM w_sample_f f,
            (SELECT   product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
                 FROM w_sample_f
             GROUP BY serial_num, product_wid) g
      WHERE f.serial_num = g.serial_num
        AND f.obj_lst_num = g.obj_lst_num
        AND f.product_wid = g.product_wid
        AND f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
   ORDER BY f.serial_num;


consider that above FACT table (w_sample_f) is 50GB in size. Its taking too long while querying this.

[Updated on: Wed, 08 September 2010 18:04]

Report message to a moderator

Re: Is there a better way to rewrite this query? [message #474783 is a reply to message #474782] Wed, 08 September 2010 18:04 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Since NOBODY can optimize SQL just by looking at it, we need a few more details.
http://www.orafaq.com/forum/m/433888/136107/#msg_433888

does SQL below perform any better?

SELECT f."PRODUCT_WID",
       f."PLANT_LOC_WID",
       f."STORAGE_LOC_WID",
       f."MVMNT_TYPE_ID",
       f."OBJ_LST_NUM",
       f."MATERIAL_DOC_NUM",
       f."MAT_DOC_YEAR",
       f."ITEM_NUM",
       f."DOC_TYPE",
       f."DB_CR_IND",
       f."XACT_DT_WID",
       f."POSTED_ON_DT_WID",
       f."SYSTEM_DT_WID",
       f."VENDOR_NUM",
       f."USER_VZID",
       f."TECH_VZID",
       f."BATCH_NUM",
       f."OBJ_LST_CNTR",
       f."EQUIPMENT_NUM",
       f."SERIAL_NUM",
       f."DATASOURCE_NUM_ID",
       f."ETL_PROC_WID",
       f."INTEGRATION_ID",
       f."TENANT_ID"
FROM   w_sample_f f
WHERE  f.mvmnt_type_id NOT IN ( '201', '701', '702', '502',     '641', '962' )
       AND ( f.product_wid, f.serial_num, f.obj_lst_num ) IN 
       (SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
        FROM   w_sample_f
        GROUP  BY serial_num, product_wid)
ORDER  BY f.serial_num;  


I need to ask this question
Can/should the same NOT IN be used with the second SELECT?

[Updated on: Wed, 08 September 2010 20:58]

Report message to a moderator

Re: Is there a better way to rewrite this query? [message #474795 is a reply to message #474782] Thu, 09 September 2010 00:55 Go to previous messageGo to next message
Its_me_ved
Messages: 979
Registered: October 2009
Location: India
Senior Member
(SELECT   product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
                 FROM w_sample_f
             GROUP BY serial_num, product_wid) g



Try to rewrite the sql replacing above.

May be you can think of creating a function that will return the max of obj_lst_num based on serial_num, product_wid

And rewrite something like..

where obj_lst_num= your function that returns the max value


A test case will help to solve your problem.

Regards
Ved

Re: Is there a better way to rewrite this query? [message #474831 is a reply to message #474795] Thu, 09 September 2010 04:17 Go to previous messageGo to next message
cookiemonster
Messages: 13958
Registered: September 2008
Location: Rainy Manchester
Senior Member
If you mean a custom function that would almost certainly be less performant than what the OP has now.
If you mean using analytics, why didn't you say so:
SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
       f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
       f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
       f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
       f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
       f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
       f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
       f."TENANT_ID", 
       MAX (obj_lst_num) OVER PARTITION BY (serial_num, product_wid) max_obj_lst_num
FROM w_sample_f f
WHERE max_obj_lst_num = obj_lst_num
AND f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
ORDER BY f.serial_num;

That should be pretty fast if you have an index on serial_num, product_wid, obj_lst_num and possibly mvmnt_type_id.
Re: Is there a better way to rewrite this query? [message #474833 is a reply to message #474831] Thu, 09 September 2010 04:28 Go to previous messageGo to next message
Its_me_ved
Messages: 979
Registered: October 2009
Location: India
Senior Member
WHERE max_obj_lst_num = obj_lst_num

You can not use alias in the same select

Regards
Ved
Re: Is there a better way to rewrite this query? [message #474836 is a reply to message #474833] Thu, 09 September 2010 04:36 Go to previous messageGo to next message
cookiemonster
Messages: 13958
Registered: September 2008
Location: Rainy Manchester
Senior Member
Point.

SELECT <COLUMNS....>
FROM (SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
             f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
             f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
             f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
             f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
             f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
             f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
             f."TENANT_ID", 
       MAX (obj_lst_num) OVER PARTITION BY (serial_num, product_wid) max_obj_lst_num
      FROM w_sample_f f
      WHERE f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
     )
WHERE max_obj_lst_num = obj_lst_num
ORDER BY f.serial_num;
Re: Is there a better way to rewrite this query? [message #474837 is a reply to message #474836] Thu, 09 September 2010 04:46 Go to previous messageGo to next message
Its_me_ved
Messages: 979
Registered: October 2009
Location: India
Senior Member
It wont work. Wink I think you will get an error -invalid identifier!!

obj_lst_num


If there is no big margin of performance difference I will opt to use a custom function to reuse the same.


Regards
Ved

[Updated on: Thu, 09 September 2010 05:00]

Report message to a moderator

Re: Is there a better way to rewrite this query? [message #474885 is a reply to message #474837] Thu, 09 September 2010 08:45 Go to previous messageGo to next message
cookiemonster
Messages: 13958
Registered: September 2008
Location: Rainy Manchester
Senior Member
Yes it will throw that error but not where you think. the table alisas on the order by is a problem and I've got a bracket in the wrong place in the analytic. Fixed version:

SELECT <COLUMNS....>
FROM (SELECT f."PRODUCT_WID", f."PLANT_LOC_WID", f."STORAGE_LOC_WID",
             f."MVMNT_TYPE_ID", f."OBJ_LST_NUM", f."MATERIAL_DOC_NUM",
             f."MAT_DOC_YEAR", f."ITEM_NUM", f."DOC_TYPE", f."DB_CR_IND",
             f."XACT_DT_WID", f."POSTED_ON_DT_WID", f."SYSTEM_DT_WID",
             f."VENDOR_NUM", f."USER_VZID", f."TECH_VZID", f."BATCH_NUM",
             f."OBJ_LST_CNTR", f."EQUIPMENT_NUM", f."SERIAL_NUM",
             f."DATASOURCE_NUM_ID", f."ETL_PROC_WID", f."INTEGRATION_ID",
             f."TENANT_ID", 
       MAX (obj_lst_num) OVER (PARTITION BY serial_num, product_wid) max_obj_lst_num
      FROM w_sample_f f
      WHERE f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
     )
WHERE max_obj_lst_num = obj_lst_num
ORDER BY serial_num;


And if you think there's no noticable difference between a custom function and the analytic approach then you've never compared the difference.
Let's see:
Create a table with sample data and index it:
SQL> CREATE TABLE analytic_test AS (SELECT MOD(ROWNUM, 100) serial_num, MOD(ROWNUM, 100) + 1 product_wid,
  2   SYSDATE + (ROWNUM/24) obj_lst_num, ROWNUM some_other_data FROM dual CONNECT BY LEVEL < 1000000);

Table created.


SQL> create index analytic_test1 on analytic_test(serial_num, product_wid, obj_lst_num);

Index created.


Lets see how long it takes using the OPs approach
SQL> set serveroutput on
SQL> DECLARE
  2  l_start_time DATE;
  3  
  4  BEGIN
  5  
  6  
  7    FOR n IN 1..10 LOOP
  8    
  9      l_start_time := SYSDATE;
 10      
 11      FOR rec IN (SELECT f.PRODUCT_WID, f.OBJ_LST_NUM, f.SERIAL_NUM, f.some_other_data
 12                  FROM analytic_test f,
 13                       (SELECT product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
 14                        FROM analytic_test
 15                        GROUP BY serial_num, product_wid) g
 16                  WHERE f.serial_num = g.serial_num
 17                  AND f.obj_lst_num = g.obj_lst_num
 18                  AND f.product_wid = g.product_wid) LOOP
 19                        
 20        NULL;
 21        
 22      END LOOP;
 23      
 24      dbms_output.put_line('Run '||n||' time is '||(SYSDATE - l_start_time)*24*60*60||' seconds');
 25      
 26    END LOOP;
 27    
 28  END;
 29  /
Run 1 time is 2 seconds
Run 2 time is 2 seconds
Run 3 time is 2 seconds
Run 4 time is 2 seconds
Run 5 time is 2 seconds
Run 6 time is 2 seconds
Run 7 time is 2 seconds
Run 8 time is 2 seconds
Run 9 time is 2 seconds
Run 10 time is 2 seconds

PL/SQL procedure successfully completed.

SQL> 


Now let's create a custom function, presume you meant something like this:
SQL> CREATE OR REPLACE FUNCTION get_max(p_product_wid analytic_test.product_wid%TYPE, 
  2                                     p_serial_num  analytic_test.serial_num%TYPE)
  3  RETURN analytic_test.obj_lst_num%TYPE IS
  4  
  5    l_ret    analytic_test.obj_lst_num%TYPE;
  6    
  7  BEGIN
  8  
  9    SELECT MAX(obj_lst_num) INTO l_ret
 10    FROM analytic_test
 11    WHERE product_wid = p_product_wid
 12    AND serial_num = p_serial_num;
 13    
 14    RETURN l_ret;
 15    
 16  END get_max;
 17  /

Function created.


So let's time it for a query using your function:
SQL> DECLARE
  2  l_start_time DATE;
  3  
  4  BEGIN
  5  
  6  
  7    FOR n IN 1..10 LOOP
  8    
  9      l_start_time := SYSDATE;
 10      
 11      FOR rec IN (SELECT f.PRODUCT_WID, f.OBJ_LST_NUM, f.SERIAL_NUM, f.some_other_data
 12                  FROM analytic_test f
 13                  WHERE obj_lst_num = get_max(product_wid, serial_num)) LOOP
 14                        
 15        NULL;
 16        
 17      END LOOP;
 18      
 19      dbms_output.put_line('Run '||n||' time is '||(SYSDATE - l_start_time)*24*60*60||' seconds');
 20      
 21    END LOOP;
 22    
 23  END;
 24  /
Run 1 time is 117 seconds
Run 2 time is 116 seconds
Run 3 time is 116 seconds
Run 4 time is 116 seconds
Run 5 time is 116 seconds
Run 6 time is 116 seconds
Run 7 time is 116 seconds
Run 8 time is 116 seconds
Run 9 time is 116 seconds
Run 10 time is 115 seconds

PL/SQL procedure successfully completed.

SQL> 

Ouch! That murdered the performance.

Let's try it with the analytic:
SQL> DECLARE
  2  l_start_time DATE;
  3  
  4  BEGIN
  5  
  6  
  7    FOR n IN 1..10 LOOP
  8    
  9      l_start_time := SYSDATE;
 10      
 11      FOR rec IN (SELECT PRODUCT_WID, OBJ_LST_NUM, SERIAL_NUM, some_other_data
 12                  FROM (SELECT PRODUCT_WID, OBJ_LST_NUM, SERIAL_NUM, some_other_data,
 13                               MAX (obj_lst_num) OVER (PARTITION BY serial_num, product_wid) max_obj_lst_num
 14                        FROM analytic_test
 15                       )
 16                  WHERE max_obj_lst_num = obj_lst_num) LOOP
 17                        
 18        NULL;
 19        
 20      END LOOP;
 21      
 22      dbms_output.put_line('Run '||n||' time is '||(SYSDATE - l_start_time)*24*60*60||' seconds');
 23      
 24    END LOOP;
 25    
 26  END;
 27  /
Run 1 time is 7 seconds
Run 2 time is 5 seconds
Run 3 time is 5 seconds
Run 4 time is 5 seconds
Run 5 time is 5 seconds
Run 6 time is 5 seconds
Run 7 time is 5 seconds
Run 8 time is 6 seconds
Run 9 time is 4 seconds
Run 10 time is 6 seconds

PL/SQL procedure successfully completed.

SQL> 

Turns out it's not quite as good as the OPs orginal approach, at least not with my data, but it's still massively better than custom functions.
Custom functions are a very bad idea in performance terms in 99% of cases.
They hide information from the optimiser and limit its choices and inflict a lot of context switches onto the process.
Re: Is there a better way to rewrite this query? [message #475087 is a reply to message #474782] Sun, 12 September 2010 18:13 Go to previous message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
A comment and an idea:

Comment: I wonder if the query is correct:

       FROM w_sample_f f,
            (SELECT   product_wid, serial_num, MAX (obj_lst_num) obj_lst_num
                 FROM w_sample_f
             GROUP BY serial_num, product_wid) g
      WHERE f.serial_num = g.serial_num
        AND f.obj_lst_num = g.obj_lst_num
        AND f.product_wid = g.product_wid
        AND f.mvmnt_type_id NOT IN ('201', '701', '702', '502', '641', '962')
   ORDER BY f.serial_num;


the nested table query looks at all rows in a group in order to compute a max, but the main query excludes various mvmnt_type_id values. I wonder if this NOT IN should also be in the nested table query? Otherwise the MAX is computed using values you are trying to exclude. This may be correct but often is not. Most times you want your MAX to be computed against the same set of rows you are going to return from.

Suggestion: functions called from SQL are very expensive and though quite convenient will almost never be faster than other alternatives. A good idea but as you can see not the right direction. Instead, rely on Oracle's native optimization capabilities.

Assuming the query is correct, I would try creating the following index:

create index on w_sample_f (serial_num, product_wid, obj_lst_num);


with luck, oracle will quickly compute you max values and then use these to identify the necessary rows. In the end, you may still need to scan the 50bg table, who knows but try this index and let us know what happens.

Also, make sure you do the basics like collect statistics, make sure tables have primary, unique, and foreign key constraints defined.

Kevin
Previous Topic: Obtaining 10046 Wait Events for Parallel Slaves
Next Topic: Standard vs Advanced Compression
Goto Forum:
  


Current Time: Mon Nov 25 15:04:25 CST 2024