Home » RDBMS Server » Performance Tuning » Self join removal (Oracle, 9.2.0.4, Windows XP)
Self join removal [message #318471] Tue, 06 May 2008 23:47 Go to next message
srihari.gurram
Messages: 13
Registered: May 2008
Junior Member
The following query is taking more time to execute due to the presense of self join and max function to get the previous max date. The self joined table has almost 40 million records. I have created unique indexes on the columns of the join. Please let me know of any solutions to avoid this self join.

                SELECT  max(tpm2.prc_dt),  
                        tpm2.asset_id ,
                        tpm2.pricing_pt_id ,
                        tpm1.prc_dt
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2,
                        t_pricing_pt_grp_assign ptg,
                        t_pricing_pt pt
                WHERE   tpm1.prc_dt = report_date_ 
                AND     tpm1.asset_id = tpm2.asset_id
                AND     tpm1.pricing_pt_id = tpm2.pricing_pt_id
                AND     tpm2.prc_dt < tpm1.prc_dt
                AND     tpm2.accept_flg = 'Y' 
                AND     tpm1.accept_flg = 'Y' 
                AND     tpm1.pricing_pt_id = ptg.pricing_pt_id
                AND     ptg.pricing_pt_id = pt.pricing_pt_id
                AND     ptg.pricing_pt_grp_id =  pricing_pt_grp_id_
                AND     ( pricing_pt_id_ is null OR
                          pricing_pt_id_ = -1 OR
                        tpm1.pricing_pt_id = pricing_pt_id_ )
                GROUP BY tpm2.asset_id, tpm2.pricing_pt_id, tpm1.prc_dt;


[Moderator: formatted code]

[Updated on: Wed, 07 May 2008 03:22] by Moderator

Report message to a moderator

Re: Self join removal [message #318565 is a reply to message #318471] Wed, 07 May 2008 03:23 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I added [code]...[/code] tags around your code so that we could read it. Please do it yourself next time.
Re: Self join removal [message #318566 is a reply to message #318565] Wed, 07 May 2008 03:24 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Post an Explain Plan. Even better if you can, post the TKPROF output of SQL Trace.

Ross Leishman
Re: Self join removal [message #318569 is a reply to message #318566] Wed, 07 May 2008 03:27 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Also, post the results of this SQL:

SELECT count(*), max(cnt), avg(cnt), stddev(cnt)
FROM (
    SELECT asset_id, pricing_pt_id, COUNT(*) AS cnt
    FROM   t_prc_master
    GROUP BY asset_id, pricing_pt_id
)


Ross Leishman

[Updated on: Wed, 07 May 2008 03:28]

Report message to a moderator

Re: Self join removal [message #318632 is a reply to message #318569] Wed, 07 May 2008 06:28 Go to previous messageGo to next message
srihari.gurram
Messages: 13
Registered: May 2008
Junior Member
Result for the query are:
SELECT count(*), max(cnt), avg(cnt), stddev(cnt)
FROM (
    SELECT asset_id, pricing_pt_id, COUNT(*) AS cnt
    FROM   t_prc_master
    GROUP BY asset_id, pricing_pt_id
)

COUNT(*)
2514016

MAX(CNT)
197

AVG(CNT)
21.6669110300014001501979303234346957219

STDDEV(CNT)
30.26544175678646740435447457146461754385
Re: Self join removal [message #318634 is a reply to message #318566] Wed, 07 May 2008 06:35 Go to previous messageGo to next message
srihari.gurram
Messages: 13
Registered: May 2008
Junior Member
Please find Explain plan attached

[Updated on: Wed, 07 May 2008 06:36]

Report message to a moderator

Re: Self join removal [message #318776 is a reply to message #318634] Wed, 07 May 2008 23:29 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
You have a logic problem. On average there are 21 rows per asset/pricing_pt and a large standard deviation - meaning there are plenty of combos with a lot more than 21.

Consider the worst case of 197 rows on the following:
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2,
                        t_pricing_pt_grp_assign ptg,
                        t_pricing_pt pt
                WHERE   tpm1.prc_dt = report_date_ 
                AND     tpm1.asset_id = tpm2.asset_id
                AND     tpm1.pricing_pt_id = tpm2.pricing_pt_id
                AND     tpm2.prc_dt < tpm1.prc_dt

This means that every row will join to EVERY other row with a lesser prc_dt. Those 197 rows will blow out to 197+196+...+2+1 = 19306 rows.

It's like a mini-cartesian join.

Surely you don't want to join to EVERY row with a lesser date. Since you have MAX(tpm2.prc_dt), I assume you only want the LATEST row earlier than the selected tpm1 row.

I understand that the table is large, but are you dealing with <1% of the rows after applying the report_date_?
WHERE   tpm1.prc_dt = report_date_ 


If so, this will help:
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2,
                        t_pricing_pt_grp_assign ptg,
                        t_pricing_pt pt
                WHERE   tpm1.prc_dt = report_date_ 
                AND     (tpm2.asset_id, tpm2.pricing_pt_id, tpm2.prc_dt) = (
                    SELECT asset_id, pricing_pt_id, MAX(prc_dt)
                    FROM   t_prc_master tpm3
                    WHERE  tpm3.asset_id = tpm1.asset_id
                    AND    tpm3.pricing_pt_id = tpm1.pricing_pt_id
                    AND    tpm3.prc_dt < tpm1.prc_dt
                    AND    tpm3.accept_flg = 'Y'
                    GROUP BY asset_id, pricing_pt_id
               )

And this bit is vitally important, otherwise IT WONT WORK:

Add an index on (asset_id, pricing_pt_id, accept_flg, prc_dt) and gather stats on both the table and index. If you do not have that exact index or you do not gather statistics, the Oracle will choose the wrong plan.

When you run Explain Plan, it MUST contain a step that includes the following:
INDEX RANGE SCAN (MIN/MAX)


If you do not see the MIN/MAX then it hasn't worked - don't bother running it.

Since this SQL will not pick up multiple tpm2 rows, you should be able to eliminate the GROUP BY from the query.

If your report_date_ returns more than 10% of the table then this technique may well be slower. You may need to try an analytic function LAG().

Ross Leishman
Re: Self join removal [message #319504 is a reply to message #318776] Mon, 12 May 2008 00:36 Go to previous messageGo to next message
srihari.gurram
Messages: 13
Registered: May 2008
Junior Member
I have tried to use the mentioned approach, but didnt solve my problem... It takes more time than the original. I have created the indexes and also verified the explain plan.

The data we are dealing with the following condition is less than 10% of the data only (about 1% to 2%), but still it is not working out.

tpm1.prc_dt = report_date_


Can u give me an example on how to use the LAG function for this?

The main thing here (I believe) is to avoid to go for the MAX function.
Re: Self join removal [message #319542 is a reply to message #319504] Mon, 12 May 2008 02:27 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Before we reject the other method, can you show me the new SQL AND the Explain Plan. And please enclose it in CODE tags.

Ross Leishman

[Updated on: Mon, 12 May 2008 02:28]

Report message to a moderator

Re: Self join removal [message #319558 is a reply to message #319542] Mon, 12 May 2008 03:39 Go to previous messageGo to next message
srihari.gurram
Messages: 13
Registered: May 2008
Junior Member
Please find below the SQL used and the Explain plan:
                SELECT  tpm2.prc_dt,  
                        tpm2.asset_id ,
                        tpm2.pricing_pt_id ,
                        tpm1.prc_dt
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2,
                        t_pricing_pt_grp_assign ptg,
                        t_pricing_pt pt
                WHERE   tpm1.prc_dt = report_date_ 
                AND     (tpm2.asset_id, tpm2.pricing_pt_id, tpm2.prc_dt) = (
	                    SELECT asset_id, pricing_pt_id, MAX(prc_dt)
        	            FROM   t_prc_master tpm3
                	    WHERE  tpm3.asset_id = tpm1.asset_id
	                    AND    tpm3.pricing_pt_id = tpm1.pricing_pt_id
        	            AND    tpm3.prc_dt < tpm1.prc_dt
                	    AND    tpm3.accept_flg = 'Y'
	                    GROUP BY asset_id, pricing_pt_id
               		)
                AND     tpm1.accept_flg = 'Y' 
                AND     tpm1.pricing_pt_id = ptg.pricing_pt_id
                AND     ptg.pricing_pt_id = pt.pricing_pt_id
                AND     ptg.pricing_pt_grp_id =  pricing_pt_grp_id_
                AND     ( pricing_pt_id_ is null OR
                          pricing_pt_id_ = -1 OR
                        tpm1.pricing_pt_id = pricing_pt_id_ );


Explain plan:
in attachment
Re: Self join removal [message #319743 is a reply to message #318776] Mon, 12 May 2008 22:05 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
rleishman wrote on Thu, 08 May 2008 14:29
And this bit is vitally important, otherwise IT WONT WORK:

...

When you run Explain Plan, it MUST contain a step that includes the following:
INDEX RANGE SCAN (MIN/MAX)


If you do not see the MIN/MAX then it hasn't worked - don't bother running it.


srihari.gurram wrote on Mon, 12 May 2008 15:36
I have created the indexes and also verified the explain plan.


But the plan did not contain the MIN/MAX step. So why did you lie?

srihari.gurram wrote on Mon, 12 May 2008 18:39
Please find below the SQL used and the Explain plan:
...


The SQL you posted contains 4 tables in the FROM clause, but the plan contains only 2. So why did you lie?

It's almost as though you don't want help because you are deliberately making this harder for us.

But let's press on regardless. Since that SQL did not get the desired plan, we'll try an alternative. Please try the following and post the explain plan. Again, we are looking for a MIN/MAX step in the plan, and you must make sure that you gather statistics on both the table and the indexes.

Rather than posting a dump of the PLAN_TABLE, could you please format the Explain Plan as described here and enclose it in CODE tags.

                SELECT  tpm2.prc_dt,  
                        tpm2.asset_id ,
                        tpm2.pricing_pt_id ,
                        tpm1.prc_dt
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2,
                        t_pricing_pt_grp_assign ptg,
                        t_pricing_pt pt
                WHERE   tpm1.prc_dt = report_date_ 
                AND     tpm2.asset_id = tpm1.asset_id
                AND     tpm2.pricing_pt_id = tpm1.pricing_pt_id
                AND     tpm2.accept_flg = 'Y'
                AND     tpm2.prc_dt = (
	                    SELECT MAX(prc_dt)
        	            FROM   t_prc_master tpm3
                	    WHERE  tpm3.asset_id = tpm1.asset_id
	                    AND    tpm3.pricing_pt_id = tpm1.pricing_pt_id
        	            AND    tpm3.prc_dt < tpm1.prc_dt
                	    AND    tpm3.accept_flg = 'Y'
               		)
                AND     tpm1.accept_flg = 'Y' 
                AND     tpm1.pricing_pt_id = ptg.pricing_pt_id
                AND     ptg.pricing_pt_id = pt.pricing_pt_id
                AND     ptg.pricing_pt_grp_id =  pricing_pt_grp_id_
                AND     ( pricing_pt_id_ is null OR
                          pricing_pt_id_ = -1 OR
                        tpm1.pricing_pt_id = pricing_pt_id_ );


Ross Leishman
Re: Self join removal [message #320383 is a reply to message #319743] Wed, 14 May 2008 23:07 Go to previous messageGo to next message
srihari.gurram
Messages: 13
Registered: May 2008
Junior Member
We have changed a couple of things in the query according to the requirement. The final form of original and modified queries are below:
Original:
                SELECT  max(tpm2.prc_dt),  
                        tpm2.asset_id ,
                        tpm2.pricing_pt_id ,
                        tpm1.prc_dt
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2
                WHERE   tpm1.prc_dt = report_date_ 
                AND     tpm2.asset_id = tpm1.asset_id
                AND     tpm2.pricing_pt_id = tpm1.pricing_pt_id
                AND     tpm2.accept_flg = 'Y'
                AND     tpm2.prc_dt = (
	                    SELECT MAX(prc_dt)
        	            FROM   t_prc_master tpm3
                	    WHERE  tpm3.asset_id = tpm1.asset_id
	                    AND    tpm3.pricing_pt_id = tpm1.pricing_pt_id
        	            AND    tpm3.prc_dt < tpm1.prc_dt
                	    AND    tpm3.accept_flg = 'Y'
               		)
                AND     tpm1.accept_flg = 'Y' 
                AND exists (select 1 from t_temp_prcmov
                            where pca_flg = 'P'
                            and prc_pt_cntry_atyp = tpm1.pricing_pt_id)
                group by tpm2.asset_id, tpm2.pricing_pt_id, tpm1.prc_dt;

Modified: 
                SELECT  tpm2.prc_dt,  
                        tpm2.asset_id ,
                        tpm2.pricing_pt_id ,
                        tpm1.prc_dt
                FROM    t_prc_master tpm1,
                        t_prc_master tpm2
                WHERE   tpm1.prc_dt = report_date_ 
                AND     tpm2.asset_id = tpm1.asset_id
                AND     tpm2.pricing_pt_id = tpm1.pricing_pt_id
                AND     tpm2.accept_flg = 'Y'
                AND     tpm2.prc_dt = (
	                    SELECT MAX(prc_dt)
        	            FROM   t_prc_master tpm3
                	    WHERE  tpm3.asset_id = tpm1.asset_id
	                    AND    tpm3.pricing_pt_id = tpm1.pricing_pt_id
        	            AND    tpm3.prc_dt < tpm1.prc_dt
                	    AND    tpm3.accept_flg = 'Y'
               		)
                AND     tpm1.accept_flg = 'Y' 
                and exists (select 1 from t_temp_prcmov
                            where pca_flg = 'P'
                            and prc_pt_cntry_atyp = tpm1.pricing_pt_id);


This was the reason for having only 2 tables in the explain plan. The two tables which were removed had less than 100 rows. I have tried the explain plan for the above one, WITHOUT the EXISTS condition and had the same explain plan as given in the last reply.
(I didn't get the RANGE SCAN MIN/MAX step in the plan).
The performance got reduced by using the above modified query.

Let me know the usage of LAG function in this Query.
Re: Self join removal [message #320392 is a reply to message #320383] Wed, 14 May 2008 23:47 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
LAG probably won't help given your volumes. The method I have described is your best chance. If it is not working, then something has not been done correctly. Either:
  • Index is not on (asset_id, pricing_pt_id, accept_flg, prc_dt) in that order.
  • Index or table is not analyzed with BMS_STATS.GATHER TABLE STATS()

Since you don't seem to want to sort it out by finding what the problem is, I'll respect your wishes and drop it.

You could of course read the documentation to work out how to use the LAG function, but since you've been so patient, here it is.

SELECT *
FROM (
    SELECT  asset_id ,
            pricing_pt_id ,
            prc_dt,
            LAG(prc_dt) OVER (PARTITION BY asset_id, pricing_pt_id ORDER BY prc_dt) prev_prc_dt
    FROM    t_prc_master
    WHERE   accept_flg = 'Y'
)
WHERE prc_dt = report_date_


It may be an improvement on what you currently have, but it will still be sub-optimal.

Another syntax you could try would be:
SELECT  asset_id ,
        pricing_pt_id ,
        prc_dt,
        (SELECT MAX(prc_dt)
         FROM   t_prc_master tpm3
         WHERE  tpm3.asset_id = tpm1.asset_id
	 AND    tpm3.pricing_pt_id = tpm1.pricing_pt_id
         AND    tpm3.prc_dt < tpm1.prc_dt
         AND    tpm3.accept_flg = 'Y'
        )
FROM    t_prc_master tpm1
WHERE   accept_flg = 'Y'
AND     prc_dt = report_date_

Once again though, this would require the index I mentioned, which would need to be analysed, and the Explain Plan would have to include the MIN/MAX range scan.


Ross Leishman
Re: Self join removal [message #328006 is a reply to message #320392] Wed, 18 June 2008 10:29 Go to previous messageGo to next message
buharisi
Messages: 2
Registered: June 2008
Location: Chicago
Junior Member
Shocked

Hi Ross Leishman,

This is Buhari replying to this message on behalf of Srihari to follow up on this...

I have run the TKPROF for the initial query which Srihari has submitted and as well as for the query which has been given by you..

I am attaching it herewith...

The 'Before Tuning' SQL was taking around 4 Mins...But 'After Tuning' SQL was taking around 1 Min 48 Sec. CPU Time as you can see it from the tkprof attached..This tuned one is very good. But my question is 'Is there anyway to tune it further (Say...Less than a minute after looking at the TKPROF Output)..It would be appreciated...

Since the Attachment is limited on 1 per message..I will be attaching the 'After Tuned - SQL' in the next message..

Thanks...
Buhari
Re: Self join removal [message #328007 is a reply to message #328006] Wed, 18 June 2008 10:30 Go to previous messageGo to next message
buharisi
Messages: 2
Registered: June 2008
Location: Chicago
Junior Member
Hi Ross,

Here is the next one as i promised...

Thanks..
Buhari

Re: Self join removal [message #328049 is a reply to message #318471] Wed, 18 June 2008 12:59 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
Hi,

It looks to me like you are reading quite a lot of the t_prc_master table.

Perhaps something like this might run quicker? Its dependent on a full scan of the table being quicker than multiple index lookups against the same row.
The logic might need slight amendment, as I might have lost something in transaltion.

	select *
	from
		(
		SELECT  tpm2.prc_dt,  
                        tpm2.asset_id ,
                        tpm2.pricing_pt_id ,
                        tpm1.prc_dt
			max( case 
                               when tpm1.accept_flag = 'Y' then 
                                    tpm1.prc_dt 
                               else          
                                    to_date('00010101','YYYYMMDD') 
                             end) 
                        over (partition by assit_id, pricing_pt_id) as max_prc_dt
                FROM    (select * from t_prc_master where tpm1.accept_flg = 'Y') tpm1
                WHERE   tpm1.prc_dt = report_date_ 
                and exists (select 1 from t_temp_prcmov
                            where pca_flg = 'P'
                            and prc_pt_cntry_atyp = tpm1.pricing_pt_id)
		)
          where prc_dt = max_prc_dt;
Re: Self join removal [message #328088 is a reply to message #320392] Wed, 18 June 2008 22:37 Go to previous message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
There's something very fishy going on here. The SQL accesses T_PRC_MASTER 3 times, but the execution plan shows only 2 steps accessing that table: Steps 3 and 5. The plan step for the sub-query is MISSING!

Rows     Row Source Operation
-------  ---------------------------------------------------
 780544  HASH JOIN SEMI (cr=6894392 r=408805 w=7230 time=90095967 us)
 780544   NESTED LOOPS  (cr=6894388 r=401573 w=0 time=83121308 us)
 783459    TABLE ACCESS BY INDEX ROWID T_PRC_MASTER (cr=593790 r=24953 w=0 time=6516069 us)
 784161     INDEX RANGE SCAN PRC_DT_ASSET_ID (cr=7493 r=75 w=0 time=378990 us)(object id 450059)
 780544    INDEX RANGE SCAN ASSET_DT_ACCEPT_FLG (cr=2341687 r=14 w=0 time=3129196 us)(object id 450055)
     55   TABLE ACCESS FULL TT_TEMP_PRCMOV (cr=4 r=2 w=0 time=310 us)


When we look at the re-parsed plan immediately below, we see the sub-query present as a SORT and INDEX access.

This is odd.

This can still run faster. Note that it is NOT performing the INDEX RANGE SCAN (MIN/MAX) step that I suggested it required. Quite simply, until you see this step in the plan you will not get optimal performance. I don't know how I can make that any clearer.

Perhaps you have not created the index correctly. Prove to me that you have an index on (asset_id, pricing_pt_id, accept_flg, prc_dt) IN THAT ORDER, and that statistics have been gathered for both the table and index.

Ross Leishman
Previous Topic: Hit Rate.
Next Topic: slow server
Goto Forum:
  


Current Time: Thu Jan 09 20:11:24 CST 2025