Tuning analytical function query (oracle 10g)
Tuning analytical function query [message #388873] Thu, 26 February 2009 08:08
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Hi Guys,

I have a query that, for some reason, is taking more than 5 minutes to run, which is not reasonable.

explain plan for
select distinct
       a.line
      ,a.item_id
      ,first_value (b.std) over (partition by b.item_id
                                   order by case when b.r_date <= (select add_months(start_date,-4) from rel_tim where time_id = 5) then b.r_date
                                                 else null end desc nulls last
                                            ,case when b.r_date > (select add_months(start_date,-4) from rel_tim where time_id = 5) then b.r_date
                                                 else null end asc nulls last) val
from 
dm_v a, std_c b
where a.item_id = b.item_id
and b.std <> 0
and b.id = 105
and a.TRX >= '1-Jul-2008'




-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                                |  4584K|   144M|       |   106K  (2)| 00:21:14 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | rel_tim                        |     1 |    12 |       |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN            | rel_tim_PK                     |     1 |       |       |     0   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID  | rel_tim                        |     1 |    12 |       |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN           | rel_tim_PK                     |     1 |       |       |     0   (0)| 00:00:01 |
|   5 |  HASH UNIQUE                   |                                |  4584K|   144M|   421M|   106K  (2)| 00:21:14 |
|   6 |   WINDOW SORT                  |                                |  4584K|   144M|   421M|   106K  (2)| 00:21:14 |
|*  7 |    HASH JOIN                   |                                |  4584K|   144M|  6560K| 23300   (2)| 00:04:40 |
|   8 |     TABLE ACCESS BY INDEX ROWID| dm_v                           |   239K|  3744K|       |  7622   (1)| 00:01:32 |
|*  9 |      INDEX RANGE SCAN          | dm_v_N11                       |   239K|       |       |   649   (2)| 00:00:08 |
|* 10 |     TABLE ACCESS FULL          | std_c                          |   406K|  6745K|       | 14725   (3)| 00:02:57 |
-------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TIME_ID"=5)
   4 - access("TIME_ID"=5)
   7 - access("A"."ITEM_ID"="B"."ITEM_ID")
   9 - access("A"."TRX">=TO_DATE('2008-07-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
  10 - filter("B"."ID"=105 AND "B"."std"<>0)




Note:
1) Both tables dm_v & std_c have an index created on item_id
2) I have analyzed both tables


ANALYZE TABLE dm_v ESTIMATE STATISTICS SAMPLE 30 PERCENT;

ANALYZE TABLE std_c  ESTIMATE STATISTICS SAMPLE 30 PERCENT;
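
(In case it is relevant: I understand DBMS_STATS is the preferred way to gather statistics in 10g, so calls roughly equivalent to the ANALYZE commands above, also covering the indexes via cascade, would be something like this:)

BEGIN
  -- gather table and index statistics together (cascade => TRUE), 30% sample as above
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'DM_V',
                                estimate_percent => 30, cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'STD_C',
                                estimate_percent => 30, cascade => TRUE);
END;
/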


Appreciate suggestions on how I can go about tuning this. I see a full table scan, but am not sure how to get rid of it.
Re: Tuning analytical function query [message #388884 is a reply to message #388873] Thu, 26 February 2009 08:52
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Are there indexes on
where a.item_id = b.item_id
and b.std <> 0
and b.id = 105
and are statistics current on all indexes (not just tables)?
Re: Tuning analytical function query [message #388889 is a reply to message #388884] Thu, 26 February 2009 09:08
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Thanks for the reply.

Noticed that b.std <> 0 has no index, but it's a number field, and I'm not sure it's a good idea to create an index on it.

After analyzing the rest of the indexes using the syntax below, I noticed the optimizer still produces the same plan. Can I tell the optimizer to use an index for this column?

ANALYZE index <name> ESTIMATE STATISTICS SAMPLE 30 PERCENT; 



Is there any guideline on when to analyze an index/table? What sample percentage should be estimated, and do we analyze the table or the index first?
Re: Tuning analytical function query [message #388890 is a reply to message #388873] Thu, 26 February 2009 09:13
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
How many total rows in std_c b?
How many rows a.item_id = b.item_id?
How many rows b.std <> 0?
How many rows b.id = 105?
Re: Tuning analytical function query [message #388893 is a reply to message #388890] Thu, 26 February 2009 09:32
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Did a count; the figures are below. I did not include the filter on trx (and a.TRX >= '1-Jul-2008').


How many total rows in std_c b?
select count(*) from std_c -- = 4848914


How many rows a.item_id = b.item_id?
select count(*) from dm_v a, std_c b where a.item_id = b.item_id -- = 3709761000 



How many rows b.std <> 0?
select count(*) from std_c  where std <> 0 -- 3280043


How many rows b.id = 105?
select count(*) from std_c  where  id = 105 -- 728215
Re: Tuning analytical function query [message #388895 is a reply to message #388873] Thu, 26 February 2009 09:37
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
How does
select count(*) from dm_v a, std_c b where a.item_id = b.item_id -- = 3709761000
return MORE rows than exist in std_c?
select count(*) from std_c -- = 4848914

I give up.
Re: Tuning analytical function query [message #388896 is a reply to message #388895] Thu, 26 February 2009 09:44
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Yes. Please don't give up, I have high hopes in you.

If you give up, I end up getting sacked.

It's a many-to-many join, so the counts multiply: an item_id that appears 3 times in dm_v and 2 times in std_c contributes 6 rows to the join.
Re: Tuning analytical function query [message #388902 is a reply to message #388896] Thu, 26 February 2009 09:53
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Try this - instead of querying rel_tim twice per row returned, query it once per query:

select distinct
       a.line
      ,a.item_id
      ,first_value (b.std) over (partition by b.item_id
                                 order by case when b.r_date <= add_months(t.start_date,-4) 
                                               then b.r_date end desc nulls last
                                         ,case when b.r_date > add_months(t.start_date,-4) 
                                               then b.r_date end asc nulls last) val
from dm_v a
    ,std_c b
    ,rel_tim t
where a.item_id = b.item_id
and b.std <> 0
and b.id = 105
and a.TRX >= to_date('1-Jul-2008','dd-mon-yyyy')
and t.time_id = 5
Re: Tuning analytical function query [message #388906 is a reply to message #388902] Thu, 26 February 2009 10:10
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Thanks for building the query initially and for tuning it; it now seems to be taking 4 min 21 sec. Some improvement there.
The explain plan has one step fewer.

Just curious, is it possible not to use a full table scan on the std_c table (step 10 in the explain plan)? I am not sure whether it would be faster via the index, since an index exists on it.

Are there any specific hints I can specify to get a much faster join?
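
(The kind of thing I had in mind is below, just to illustrate the hint syntax; the index name std_c_item_id_ix is made up, and I am not sure it would actually beat the hash join:)

-- std_c_item_id_ix is a made-up index name, used only to show where a hint would go
select /*+ leading(a) use_nl(a b) index(b std_c_item_id_ix) */ distinct
       a.line
      ,a.item_id
      ,first_value (b.std) over (partition by b.item_id
                                 order by case when b.r_date <= add_months(t.start_date,-4)
                                               then b.r_date end desc nulls last
                                         ,case when b.r_date > add_months(t.start_date,-4)
                                               then b.r_date end asc nulls last) val
from dm_v a
    ,std_c b
    ,rel_tim t
where a.item_id = b.item_id
and b.std <> 0
and b.id = 105
and a.TRX >= to_date('1-Jul-2008','dd-mon-yyyy')
and t.time_id = 5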
Re: Tuning analytical function query [message #388952 is a reply to message #388906] Thu, 26 February 2009 14:13
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Seems to me like the nasty analytic function can be converted into a slightly less nasty aggregate function. If we do this and aggregate down to the join key before the mini-cartesian join to A, we can sort fewer rows and reduce the size of the joins.
SELECT a.line
,      a.item_id
,      max(d.val)
FROM dm_v a
,(
    SELECT b.item_id,
           MAX(b.std) KEEP (DENSE_RANK FIRST ORDER BY
                            CASE
                              WHEN b.r_date <= add_months(t.start_date,-4) 
                              THEN b.r_date 
                            END DESC NULLS LAST
                           ,CASE WHEN b.r_date > add_months(t.start_date,-4) 
                            THEN b.r_date 
                            END ASC NULLS LAST
                           ) AS val
    FROM   std_c b
    ,      rel_tim t
    WHERE  b.std <> 0
    AND    b.id = 105
    AND    t.time_id = 5
    GROUP BY b.item_id
) d
WHERE a.item_id = d.item_id
AND   a.TRX >= to_date('1-Jul-2008','dd-mon-yyyy')
GROUP BY a.line
,      a.item_id


Ross Leishman
Re: Tuning analytical function query [message #388972 is a reply to message #388952] Thu, 26 February 2009 17:11
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Was trying to understand the changes you have made.

DENSE_RANK assigns a rank; if the value is the same as the previous one, it gets the same rank, and the next distinct value gets the next rank.

But can you explain the purpose of the aggregate, and how it works:

 MAX(b.std) before dense_rank


Just trying to understand how the SQL in the 'd' inline view affects the join.


[Updated on: Thu, 26 February 2009 17:14]


Re: Tuning analytical function query [message #389129 is a reply to message #388972] Fri, 27 February 2009 06:11
Alessandro Rossi
Messages: 166
Registered: September 2008
Location: Rome
Senior Member
This one should probably do the same job.

select a.line
      ,a.item_id
      ,d.std as val
from dm_v a
	left outer join (
			select b.item_id,
				max(b.std) keep (
					dense_rank first 
					order by abs(b.r_date - add_months(c.start_date,-4))
				) as std
			from std_c b 
				join rel_tim c on ( c.time_id = 5 )
			where b.r_date is not null
				and b.std <> 0
				and b.id = 105
			group by b.item_id
		) d on (a.item_id = d.item_id)
where a.TRX >= date '2008-07-01'



As Ross Leishman said, it can be converted into a direct aggregate function (instead of using DISTINCT over an analytic).

Here I tried to reduce everything:

1. Instead of using order by .. nulls last I used an outer join against the non-null first values.

2. Instead of aggregating the final (and bigger) data set I tried to use the aggregate function on the smallest one (the table std_c).

3. That order by clause using case can probably be reduced to a mathematical one that is a bit easier to evaluate.


PS.

The MAX before DENSE_RANK is there because, in general, you may have multiple rows ranked first (or last) for a particular sort order. On those (multiple) rows you need an aggregate function, because the result must reside in a single row.
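
For example (made-up values): when two rows tie for first on the sort order, the aggregate decides which std survives on the single output row.

with t as (
  select 1 item_id, date '2009-01-01' r_date, 10 std from dual union all
  select 1,         date '2009-01-01',        20     from dual union all
  select 1,         date '2009-02-01',        30     from dual
)
select item_id,
       max(std) keep (dense_rank first order by r_date) as first_std
from t
group by item_id;
-- Both rows dated 01-Jan rank first on r_date, so MAX picks 20; the 01-Feb row is ignored.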


Bye Alessandro

[Updated on: Fri, 27 February 2009 06:18]


Re: Tuning analytical function query [message #389133 is a reply to message #389129] Fri, 27 February 2009 06:28
ajitpal.s
Messages: 204
Registered: November 2006
Senior Member
Alessandro, thanks a lot for tuning the code, I appreciate it.

I can't imagine how you guys come up with such good SQL scripts in so many different ways.

Thanks all for the contributions.