RE: theory of rewriting a query

From: David Kurtz <info_at_go-faster.co.uk>
Date: Tue, 12 May 2009 22:37:04 +0100
Message-ID: <D54D86A4A2C44079B025BDF8FCE1D343_at_GOFASTER4>



While I completely agree with Stephane's points, one of the problems here is the application toolset.

The orignial query was produced by the PS/Query adhoc query tool (I know that because the table aliases are A, B, C, D, etc and on the subqueries they are D_ED, D_ES etc). The effective date and maximum sequence subqueries are also automatically generated.

All of the suggestions below, and in Stephane's e-mail can be implemented in PS/Query, but require that you explicitly code the sub-query. You have to override the automatic sub-query definition that makes the tool far too easy for non-technical business users to use themselves.

Sometimes, it is easier to code the query into a view (where you can specify free format text), and then reference the view in PS/Query.

While on the subject of subqueries, PeopleSoft recommend that _UNNEST_SUBQUERY is set to false - otherwise the optimizer has a tendency to unnest the subqueries into in-line views at the front of the FROM clauses because it underestimates the cardinality of these subqueries.

However, this also means that the sub-query fires for each row in the parent query. This means that it can be very important to consider which tables you reference in the subquery. By default, if PeopleSoft has a subquery on a table the subquery will correlate to that query rather than any other table in the query. This is easier to explain by example.

So lets look at one of those subqueries.

>> FROM ps_gl_transact_det a, ...
>> ps_job d, ...
>> WHERE ...
>> AND a.emplid = d.emplid
>> AND d.empl_rcd = a.empl_rcd ...
>> AND d.effdt = (SELECT max(d_ed.effdt)
>> FROM ps_job d_ed
>> WHERE d.emplid = d_ed.emplid
>> AND d.empl_rcd = d_ed.empl_rcd
>> AND d_ed.effdt <= a.pay_end_dt)
>> AND d.effseq = (SELECT max(d_es.effseq)
>> FROM ps_job d_es
>> WHERE d.emplid = d_es.emplid
>> AND d.empl_rcd = d_es.empl_rcd
>> AND d.effdt = d_es.effdt)

The d_ed subquery is correlated to d on the first two key columns, and there is a reference to a.pay_end_dt for the effdt. But references to d could be replaced with a.

However, that cannot be done for all the references in d_es, because you would still be left with d.effdt.
In which case that subquery has to wait for values in d to be resolved. Thus.

>> FROM ps_gl_transact_det a, ...
>> ps_job d, ...
>> WHERE ...
>> AND a.emplid = d.emplid
>> AND d.empl_rcd = a.empl_rcd ...
>> AND d.effdt = (SELECT max(d_ed.effdt)
>> FROM ps_job d_ed
>> WHERE d_ed.emplid = a.emplid ----was d.
>> AND d_ed.empl_rcd = a.empl_rcd ----was d.
>> AND d_ed.effdt <= a.pay_end_dt)
>> AND d.effseq = (SELECT max(d_es.effseq)
>> FROM ps_job d_es
>> WHERE d.emplid = d_es.emplid
>> AND d.empl_rcd = d_es.empl_rcd
>> AND d.effdt = d_es.effdt)

Now, the d_ed subquery can fire before the d query and provides a value for d.effdt (which is the third key column of 4).

Another Oracle specific trick is this:
- The inline view returns the rows in order, and the rownum condition stops the query when the first matching row is found. The index matches the order by clauses so there is no need for a sort.

>> FROM ps_gl_transact_det a, ...
>> ps_job d, ...
>> WHERE ...
>> AND a.emplid = d.emplid
>> AND d.empl_rcd = a.empl_rcd ...
>> AND (d.effdt,d.effseq) = (

>>			SELECT effdt, effseq
>>			FROM (
>>				SELECT effdt, effseq
>>    	            FROM ps_job 
>>    	            ORDER BY effdt desc, effseq desc
>>				) d_ed

>> WHERE d_ed.emplid = a.emplid
>> AND d_ed.empl_rcd = a.empl_rcd
>> AND d_ed.effdt <= a.pay_end_dt
>> AND rownum <= 1
>> )
This is more efficient that the analytic function because it only gets one row from the d_ed and then stops. The original subqueries fetch all the matching rows, and then determine the
maximum value. Similarly, the analytic functions fetch all the rows from the table and perform a window sort.

Another area where you will have trouble in PS/Query is the multi-column predicates.
This is Oracle specific - you can't do this on SQL server - so PS/Query will not help you to generate this.
You have to use expressions.

>> AND (d.effdt,d.effseq) = (SELECT effdt, effseq...

regards



David Kurtz
Go-Faster Consultancy Ltd.
tel: +44 (0)7771 760660
fax: +44 (0)7092 348865
mailto:david.kurtz_at_go-faster.co.uk
web: www.go-faster.co.uk
Book: PeopleSoft for the Oracle DBA: http://www.psftdba.com DBA Blogs: PeopleSoft: http://blog.psftdba.com, Oracle: http://blog.go-faster.co.uk
PeopleSoft DBA Forum: http://groups.yahoo.com/group/psftdba
>-----Original Message-----
>From: oracle-l-bounce_at_freelists.org 
>[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Stephane Faroult
>Sent: Tuesday, May 12, 2009 9:03 PM
>To: Joel.Patterson_at_crowley.com
>Cc: oracle-l_at_freelists.org
>Subject: Re: theory of rewriting a query
>Importance: High
>
>Joel,
>
>   What really kills in this type of quey is the fact that the 
>subqueries are dependent on each other  - for instance the 
>second subquery depends on d.effdt that is determined by the 
>first subquery.
>You have the same type of unsound relationship between 
>subqueries 3 and 4. In fact, you want, so to speak, a "greater 
>value of greater value "
>(in that case the greatest effseq for the greatest effdt), 
>which usually calls for analytic functions as Ken Naim suggested.
>
>If you take the subqueries in isolation:
>
> (SELECT max(d_ed.effdt)
>               FROM ps_job d_ed
>               WHERE d.emplid = d_ed.emplid
>                 AND d.empl_rcd = d_ed.empl_rcd
>                 AND d_ed.effdt <= a.pay_end_dt)
>
>(SELECT max(d_es.effseq)
>               FROM ps_job d_es
>               WHERE d.emplid = d_es.emplid
>                 AND d.empl_rcd = d_es.empl_rcd
>                 AND d.effdt = d_es.effdt)
>
>(SELECT max(e_ed.effdt)
>               FROM ps_job e_ed
>               WHERE e.emplid = e_ed.emplid
>                 AND e.empl_rcd = e_ed.empl_rcd
>                 AND e_ed.effdt <= a.pay_end_dt)
>
>(SELECT max(e_es.effseq)
>               FROM ps_job e_es
>               WHERE e.emplid = e_es.emplid
>                 AND e.empl_rcd = e_es.empl_rcd
>                 AND e.effdt = e_es.effdt)
>
>You see that they all refer to the same table, that emplid and 
>empl_rcd are the major link to the outer world, but that we 
>are correlated to different entities. You can easily condense 
>two queries into one, for
>instance:
>
> (SELECT max(d_ed.effdt)
>               FROM ps_job d_ed
>               WHERE d.emplid = d_ed.emplid
>                 AND d.empl_rcd = d_ed.empl_rcd
>                 AND d_ed.effdt <= a.pay_end_dt)
>
>(SELECT max(d_es.effseq)
>               FROM ps_job d_es
>               WHERE d.emplid = d_es.emplid
>                 AND d.empl_rcd = d_es.empl_rcd
>                 AND d.effdt = d_es.effdt)
>
>can become
>
>select effdt, effseq
>from (select j.effdt, j.effseq, rank() over (order by j.effdt 
>desc, j.effseq desc) rnk
>         from ps_job j
>         where j.emplid = d.emplid
>             and j.empl_rcd = j.empl_rcd
>             and j.effdt <= a.pay_end_dt) where rnk = 1
>
>(you can also get something similar with an order by inside 
>the inline view and a condition on rownum at the higher level, 
>but an analytic function will probably be better downstream - 
>for instance, perhaps what you'll really want ultimately is
>
>select emplid, empl_rcd, effdt, effseq
>from (select j.emplid, j.empl_rcd,
>                    j.effdt, j.effseq,
>                    rank() over (partition by j.eplid, j.empl_rcd,
>                                      order by j.effdt desc, j.effseq
>desc) rnk
>         from ps_job j
>         where j.effdt <= a.pay_end_dt)
>where rnk = 1
>
>)
>
>
>At this stage you get two subqueries instead of four. Then you 
>should ask yourself whether you couldn't, at the outer level, 
>refer to each table once instead of twice, use an OR, multiple 
>CASE ... END constructs and a GROUP BY to bring everything in 
>a single pass - at which point you can perhaps reduce your two 
>subqueries to one, partition by emplid and empl_rcd and so on ...
>
>A few hours of work if you want to do it properly, but I hope 
>you get the idea.
>
>HTH
>
>Stéphane Faroult
>
>
>
>Joel.Patterson_at_crowley.com wrote:
>> Just off the top of anyones head going more for the theoretical side,
>> can anyone suggest a better way to write the below 
>peoplesoft query.   I
>> believe this uses a lot of temp tablespace, and something 
>tells me the 
>> subqueries are an issue.
>>
>> Maybe not, but would you in general unravel the subqueries and do it
>> another way?   Same type of querie I see a lot uses union all half a
>> dozen times.
>>
>> Just wondering
>>
>>
>>
>> SELECT a.company, a.union_cd, d.location, a.emplid, a.name, to_char(

>> c.orig_hire_dt, 'YYYY-MM-DD'), f.name, a.erncd,
>sum(a.hours_crw)
>> FROM ps_gl_transact_det a, ps_person c, ps_pers_srch_qry c1,
>> ps_job d,
>> ps_emplmt_srch_qry d1, ps_job e, ps_emplmt_srch_qry e1,
>> ps_person_name f
>> WHERE c.emplid = c1.emplid
>> AND c1.rowsecclass = 'HRSALLU'
>> AND d.emplid = d1.emplid
>> AND d.empl_rcd = d1.empl_rcd
>> AND d1.rowsecclass = 'HRSALLU'
>> AND e.emplid = e1.emplid
>> AND e.empl_rcd = e1.empl_rcd
>> AND e1.rowsecclass = 'HRSALLU'
>> AND a.emplid = c.emplid
>> AND a.emplid = d.emplid
>> AND d.empl_rcd = a.empl_rcd
>> AND a.cmc_trans_type = 'C'
>> AND a.debit_credit = 'DB'
>> AND a.union_cd IN ('ADT', 'AD3', 'ADO')
>> AND a.erncd IN ('REG', 'OVR')
>> AND a.gl_acctng_period BETWEEN :1 AND :2
>> AND d.effdt = (SELECT max(d_ed.effdt)
>> FROM ps_job d_ed
>> WHERE d.emplid = d_ed.emplid
>> AND d.empl_rcd = d_ed.empl_rcd
>> AND d_ed.effdt <= a.pay_end_dt)
>> AND d.effseq = (SELECT max(d_es.effseq)
>> FROM ps_job d_es
>> WHERE d.emplid = d_es.emplid
>> AND d.empl_rcd = d_es.empl_rcd
>> AND d.effdt = d_es.effdt)
>> AND d.reports_to = e.position_nbr
>> AND e.effdt = (SELECT max(e_ed.effdt)
>> FROM ps_job e_ed
>> WHERE e.emplid = e_ed.emplid
>> AND e.empl_rcd = e_ed.empl_rcd
>> AND e_ed.effdt <= a.pay_end_dt)
>> AND e.effseq = (SELECT max(e_es.effseq)
>> FROM ps_job e_es
>> WHERE e.emplid = e_es.emplid
>> AND e.empl_rcd = e_es.empl_rcd
>> AND e.effdt = e_es.effdt)
>> AND e.emplid = f.emplid
>> AND e.empl_status = 'A'
>> AND f.name_type = 'PRI'
>> GROUP BY a.company, a.union_cd, d.location, a.emplid, a.name,
>> to_char(
>> c.orig_hire_dt, 'YYYY-MM-DD'), f.name, a.erncd
>> ORDER BY 2, 3, 7, 5
>> >> >> >> -- >> http://www.freelists.org/webpage/oracle-l >> >> >> >> > > >-- >http://www.freelists.org/webpage/oracle-l > > -- http://www.freelists.org/webpage/oracle-l
Received on Tue May 12 2009 - 16:37:04 CDT

Original text of this message