Re: Aggregation Procedure?

From: Thomas Kyte <tkyte_at_oracle.com>
Date: 20 Jun 2002 05:00:04 -0700
Message-ID: <aesg4402s47@drn.newsguy.com>

In article <aeq1o6$ocg$1_at_babylon.agtel.net>, "Vladimir says...
>
>Just curious how long it took you to design this merge that even takes into
>account the fact that it may be ran more than once and still have to produce
>correct result (that is, necessity of where not exists)? Even minding your
>mantra I would resort to PL/SQL for doing this, and it seems that in 8i it's
>the only way, using your SQL and merging in a loop using update...
>if sql%rowcount=0 then insert... :)

I've been told that I speak in SQL sometimes.

It was a simple matter of

listing down the requirements (which came a bit at a time from HJR).
formulating in my head what needed be done (hey, I needed to generate that set anyway, that 'tricky' subquery)
deciding that this was a merge and that 9i does "merge"

and the statement "fell out". No kidding, if you read my query from the inside out -- you can almost "see" the process/procedural logic in there.

You needed that set I merge with regardless. (the set of new records to insert or existing records to update). You might have generated a LARGER set then I did and reduce it at runtime by catching things like "dup val on index" and ignoring them (rather then use the where not exists) or updating a row that didn't need to be updated.

Once you had that set (which you need to develop regardless) -- merge is trivial.

It took the same or less time then writing brute force procedural code that had to go through steps a, b and c anyway. SQL is just a language -- why do you think it would take longer to write a sql query then procedural code -- if you are adept at both??

I cannot recount the number of times I've take 100 or more lines of procedural code, figured out the "solution" it was trying to do, and turning that into a single sql statement for massive increases in performance (and productivity). When I get questions -- they are many times "hey, here is a bunch of code, how can I make it run faster". I ask them to not give me the code, give me the PROBLEM statement -- then I can tune much easier (by bypassing the procedural code). I'll stick with my mantra.

Here is another example:

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:3561132847397

The sql insert I developed -- complex (if you don't understand and know what is available to you -- I'm sure many people will look at my insert and go "huh" but hopefully they will then go out and read all about analytic functions and go "wow"). But soooo much faster, more compact, and if you understand the constructs available -- readable.

>
>--
>Vladimir Zakharychev (bob@dpsp-yes.com) http://www.dpsp-yes.com
>Dynamic PSP(tm) - the first true RAD toolkit for Oracle-based internet
>applications.
>All opinions are mine and do not necessarily go in line with those of my
>employer.
>
>
>"Thomas Kyte" <tkyte_at_oracle.com> wrote in message
>news:aeprsi0m4m_at_drn.newsguy.com...
>> In article <aep5im$dqe$1_at_lust.ihug.co.nz>, "Howard says...
>> >
>> >HI Thomas,
>> >
>> >Appreciate this, since it's small, compact and effective. Trouble is, I
>> >would like to actually insert a new 9999-plot row if appropriate, or update
>> >the existing plot's Y agflag to N if it's a singleton. Not just report on
>> >the same... do real DML. Which is why I though PL/SQL would be required.
>> >
>> >Any further insights?
>> >
>>
>> 9i can do this. We can generate a set that is
>>
>> a) all records to be inserted (primary key doesn't exist)
>> b) all records to be updates (primary key exists and aggflag is 'Y')
>>
>> we do this by modifying our query to include a "having" clause to get just
>> aggregates (having count(*) > 1 -- these are candidates for insertion) or
>>singletons whose aggflag is Y (these are ones that need to be updated). Once we
>> have those, we'll only keep
>>
>> a) records that do not already exist, using a where not exists
>> b) records that we KNOW we'll be updating (count(*) = 1 records)
>>
>>
>> Then, we merge. Here is the entire thing:
>>
>>
>> ops$tkyte_at_ORA9I.WORLD> select * from t;
>>
>> AREA SITE FTR PLOT SQUARE_MTR A
>> ---------- ---------- ----- ---------- ---------- -
>> 1 BRITPK BR1 1 10 Y
>> 1 BRITPK BR1 2 10 Y
>> 1 BRITPK BR1 3 15 N
>> 1 BRITPK BS1 1 5 Y
>> 1 BRITPK BS1 2 7 Y
>> 1 ANDSO GR1 1 100 Y
>> 1 ANDSO GR1 2 120 Y
>> 1 ANDSO BR1 1 150 Y
>>
>> 8 rows selected.
>>
>> that is what we start with, then we merge:
>>
>>
>> ops$tkyte_at_ORA9I.WORLD>
>> ops$tkyte_at_ORA9I.WORLD> merge into t
>> 2 using
>> 3 ( select *
>> 4 from (select area, site, ftr,
>> 5 decode( count(*), 1, max( plot ), '9999' ) plot,
>> 6 sum(square_mtr) square_mtr, 'N' aggflag,
>> 7 count(*) cnt
>> 8 from t
>> 9 group by area, site, ftr, decode( aggflag, 'Y', 9999, plot )
>> 10 having count(*) > 1
>> 11 or ( count(*) = 1 and max(aggflag) = 'Y' )
>> 12 ) a
>> 13 where not exists ( select null
>> 14 from t
>> 15 where t.area = a.area
>> 16 and t.site = a.site
>> 17 and t.ftr = a.ftr
>> 18 and t.plot = a.plot)
>> 19 or cnt = 1
>> 20 ) t2
>> 21 on ( t.area = t2.area and
>> 22 t.site = t2.site and
>> 23 t.ftr = t2.ftr and
>> 24 t.plot = t2.plot )
>> 25 when matched then
>> 26 update set t.aggflag = t2.aggflag
>> 27 when not matched then
>> 28 insert ( area, site, ftr, plot, square_mtr, aggflag )
>> 29 values
>> 30 ( t2.area, t2.site, t2.ftr, t2.plot, t2.square_mtr, t2.aggflag )
>> 31 /
>>
>> 4 rows merged.
>>
>> Note that only 4 rows were merged. We in fact had 3 to insert and one to
>> update.
>>
>> ops$tkyte_at_ORA9I.WORLD> select * from t;
>>
>> AREA SITE FTR PLOT SQUARE_MTR A
>> ---------- ---------- ----- ---------- ---------- -
>> 1 BRITPK BR1 1 10 Y
>> 1 BRITPK BR1 2 10 Y
>> 1 BRITPK BR1 3 15 N
>> 1 BRITPK BS1 1 5 Y
>> 1 BRITPK BS1 2 7 Y
>> 1 ANDSO GR1 1 100 Y
>> 1 ANDSO GR1 2 120 Y
>> 1 ANDSO BR1 1 150 N
>> 1 ANDSO GR1 9999 220 N
>> 1 BRITPK BR1 9999 20 N
>> 1 BRITPK BS1 9999 12 N
>>
>> 11 rows selected.
>>
>>I believe that is the set you wanted to generate -- the three new rows and the
>> one ANDSO row's aggflag updated...
>>
>> If we rerun the merge right now:
>>
>> ops$tkyte_at_ORA9I.WORLD>
>> ops$tkyte_at_ORA9I.WORLD> merge into t
>> 2 using
>> 3 ( select *
>> 4 from (select area, site, ftr,
>> 5 decode( count(*), 1, max( plot ), '9999' ) plot,
>> 6 sum(square_mtr) square_mtr, 'N' aggflag,
>> 7 count(*) cnt
>> 8 from t
>> 9 group by area, site, ftr, decode( aggflag, 'Y', 9999, plot )
>> 10 having count(*) > 1
>> 11 or ( count(*) = 1 and max(aggflag) = 'Y' )
>> 12 ) a
>> 13 where not exists ( select null
>> 14 from t
>> 15 where t.area = a.area
>> 16 and t.site = a.site
>> 17 and t.ftr = a.ftr
>> 18 and t.plot = a.plot)
>> 19 or cnt = 1
>> 20 ) t2
>> 21 on ( t.area = t2.area and
>> 22 t.site = t2.site and
>> 23 t.ftr = t2.ftr and
>> 24 t.plot = t2.plot )
>> 25 when matched then
>> 26 update set t.aggflag = t2.aggflag
>> 27 when not matched then
>> 28 insert ( area, site, ftr, plot, square_mtr, aggflag )
>> 29 values
>> 30 ( t2.area, t2.site, t2.ftr, t2.plot, t2.square_mtr, t2.aggflag )
>> 31 /
>>
>> 0 rows merged.
>>
>>
>> We can see it finds no new rows to insert or existing rows to update...
>>
>>
>> ops$tkyte_at_ORA9I.WORLD> select * from t;
>>
>> AREA SITE FTR PLOT SQUARE_MTR A
>> ---------- ---------- ----- ---------- ---------- -
>> 1 BRITPK BR1 1 10 Y
>> 1 BRITPK BR1 2 10 Y
>> 1 BRITPK BR1 3 15 N
>> 1 BRITPK BS1 1 5 Y
>> 1 BRITPK BS1 2 7 Y
>> 1 ANDSO GR1 1 100 Y
>> 1 ANDSO GR1 2 120 Y
>> 1 ANDSO BR1 1 150 N
>> 1 ANDSO GR1 9999 220 N
>> 1 BRITPK BR1 9999 20 N
>> 1 BRITPK BS1 9999 12 N
>>
>> 11 rows selected.
>>
>>
>> >Regards
>> >HJR
>> >
>> >
>> >"Thomas Kyte" <tkyte_at_oracle.com> wrote in message
>> >news:aeo0sl02qvu_at_drn.newsguy.com...
>> >> In article <aen6t2$kdc$1_at_lust.ihug.co.nz>, "Howard says...
>> >> >
>> >> >Calling all PL/SQL gurus (since I'm not one).
>> >> >
>> >> >Here's my data:
>> >> >
>> >> >AREA SITE FTR Plot SQUARE_MTR AGGFLAG
>> >> >1 BRITPK BR1 1 10.00 Y
>> >> >1 BRITPK BR1 2 10.00 Y
>> >> >1 BRITPK BR1 3 15.00 N
>> >> >1 BRITPK BS1 1 5.00 Y
>> >> >1 BRITPK BS1 2 7.00 Y
>> >> >1 ANDSO GR1 1 100.00 Y
>> >> >1 ANDSO GR1 2 120.00 Y
>> >> >1 ANDSO BR1 1 150.00 Y
>> >> >
>> >> >In other words, sites contain multiple features. The Primary Key is
>> >> >area||site||ftr||plot.
>> >> >
>> >> >I want to write a procedure that aggregates similar features on a site
>> >into
>> >> >a single 'aggregate plot', provided the real features' agflags are set to
>> >Y.
>> >> >Aggregate plots are identified with a '9999' plot number, and an agflag
>> >of
>> >> >"N" (since aggregates can't themselves be aggregated). Plots which are
>> >not
>> >> >to be aggregated (or which can't be because there is nothing to aggregate
>> >> >them with) just have their Aggflag set to "N".
>> >> >
>> >> >Aggregating the above data following that rule should yield the following
>> >> >(missing out the original plots for clarity's sake, unless they are
>> >> >non-aggregateable):
>> >> >
>> >> >AREA SITE FTR Plot SQUARE_MTR AGGFLAG
>> >> >1 BRITPK BR1 3 15.00 N
>> >> >1 BRITPK BR1 9999 20.00 N
>> >> >1 BRITPK BS1 9999 12.00 N
>> >> >1 ANDSO GR1 9999 220.00 N
>> >> >1 ANDSO BR1 1 150.00 N
>> >> >
>> >> >Note Plot 3 on BRITPK escaped aggregation because its aggflag was "N".
>> >The
>> >> >BR1 on ANDSO also escaped, because it was the only plot of that feature
>> >on
>> >> >the site.
>> >> >
>> >> >I wrote something in Visual Basic (!!) years ago to achieve something
>> >> >similar, and it involved searching through the table in AREA+SITE+FTR
>> >order,
>> >> >noting when there was a change, and bunging in the totals where
>> >appropriate.
>> >> >
>> >> >But I'd like to do it in PL/SQL... so if anyone has any ideas, I'd be
>> >> >grateful.
>> >> >
>> >> >Regards
>> >> >HJR
>> >> >
>> >> >
>> >>
>> >> ops$tkyte_at_ORA817DEV.US.ORACLE.COM> select area,
>> >> 2 site,
>> >> 3 ftr,
>> >> 4 decode( count(*), 1, max( plot ), '9999' ) plot,
>> >> 5 sum(square_mtr),
>> >> 6 'N'
>> >> 7 from t
>> >> 8 group by area, site, ftr, decode( aggflag, 'Y', 9999, plot )
>> >> 9 /
>> >>
>> >> AREA SITE FTR PLOT SUM(SQUARE_MTR) '
>> >> ---------- ---------- ----- ---------- --------------- -
>> >> 1 ANDSO BR1 1 150 N
>> >> 1 ANDSO GR1 9999 220 N
>> >> 1 BRITPK BR1 3 15 N
>> >> 1 BRITPK BR1 9999 20 N
>> >> 1 BRITPK BS1 9999 12 N
>> >>
>> >> may not be intuitive ;)
>> >>
>> >> but it works...
>> >>
>> >> we group by area, site, ftr AND IF aggflag=Y -- a constant, else the rest
>> >of the
>> >> primary key. That "folds" all of the records to be aggregatd together.
>> >Then,
>> >> we can use
>> >>
>> >> 4 decode( count(*), 1, max( plot ), '9999' ) plot,
>> >>
>> >>
>> >> to either
>> >>
>> >> o return 9999 if we actually aggregated.
>> >> o max(plot) -- which is just really plot -- if we did not
>> >>
>> >> --
>> >> Thomas Kyte (tkyte@oracle.com) http://asktom.oracle.com/
>> >> Expert one on one Oracle, programming techniques and solutions for Oracle.
>> >> http://www.amazon.com/exec/obidos/ASIN/1861004826/
>> >> Opinions are mine and do not necessarily reflect those of Oracle Corp
>> >>
>> >
>> >
>>
>> --
>> Thomas Kyte (tkyte@oracle.com) http://asktom.oracle.com/
>> Expert one on one Oracle, programming techniques and solutions for Oracle.
>> http://www.amazon.com/exec/obidos/ASIN/1861004826/
>> Opinions are mine and do not necessarily reflect those of Oracle Corp
>>
>

--
Thomas Kyte (tkyte@oracle.com)             http://asktom.oracle.com/ 
Expert one on one Oracle, programming techniques and solutions for Oracle.
http://www.amazon.com/exec/obidos/ASIN/1861004826/  
Opinions are mine and do not necessarily reflect those of Oracle Corp

Received on Thu Jun 20 2002 - 07:00:04 CDT