Re: Tuning Update query

From: Lok P <loknath.73_at_gmail.com>
Date: Wed, 19 May 2021 02:08:46 +0530
Message-ID: <CAKna9VYuD7soBjb7XCMF-q0Tgqo+n82kmp68jU6p-L0p5sEyFw_at_mail.gmail.com>

Thank You Mark.

I am not sure if I was able to communicate the scenario properly, so I am trying that with a sample script here as I can not create the exact tables with data. I created four tables, one GTT and two transaction tables(TXN_TAB1,txn_tab2) and another PRODUCT_TAB which holds data in denormalized fashion to mimic the situation.

In current procedure we have one INSERT(like below one) which populates the GTT_TAB but the columns related to PRODUCT_TAB were not populated in that same INSERT, mostly because it has one to many relationship with the transaction table txn_tab1, txn_tab2. So those columns were UPDATED later using multiple UPDATE queries(i added two of sample queries).

We are seeing performance issues with so many different UPDATE queries for each different product code which endup accessing the PRODUCT_TAB multiple times. So I was thinking if we can populate those product related columns during INSERT itself . One way is as you mentioned, to use SELF joins , but then as we are updating 10+ such columns, thus accessing the product table(which is huge in size) 10 times in the select part of it won't be a good idea. Also as we update millions of rows it will be called once for each row as it will be queried/accessed in the SELECT part.

So I was thinking if i can use that product_tab as a Join and populate all the columns in the existing INSERT itself, but then that will also make the data duplication because we have one to many relationship between transaction table and product_tab. Other ways like combining into one UPDATE statement and also MERGE query both resulting in error as below.

So is it that , creating a separate temp table and putting the UPDATE logic in them and truncating alternately is the only option for us here?

******Current INSERT and UPDATE statement in the procedure is as below

insert into myschema1.gtt_tab (tdid, part_dt,txn_col1,txn_col2)

select t1.tdid, t1.part_dt, t1.col1, t2.col2

from myschema1.txn_tab1 t1, myschema1.txn_tab2 t2

where t1.tdid= t2.tdid and t1.part_dt= t2.part_dt;

UPDATE myschema1.gtt_tab gtt1 set (c3,c4,c5) =

(select item1,item2,item3 from myschema1.product_tab p where p.code= 'AA' and p.tdid= gtt1.tdid and p.part_dt= gtt1.part_dt );

UPDATE myschema1.gtt_tab gtt1 set (c6,c7)= (select item6,item7 from myschema1.product_tab p where p.code= 'FF' and p.tdid= gtt1.tdid and p.part_dt= gtt1.part_dt );

*VS *

Below will result into duplicate rows... so is not equivalent of above existing INSERT + UPDATE in the proc

insert into myschema1.gtt_tab (tdid,
part_dt,txn_col1,txn_col2,c3,c4,c5,c6,c7)

select t1.tdid, t1.part_dt, t1.col1, t2.col2,

case when code='AA' then item1 else null end,

case when code='AA' then item2 else null end,

case when code='AA' then item3 else null end,

case when code='FF' then item6 else null end,

case when code='FF' then item7 else null end

from myschema1.txn_tab1 t1, myschema1.txn_tab2 t2, myschema1.product_tab p

where t1.tdid= t2.tdid and t1.part_dt= t2.part_dt and p.tdid=t1.tdid and p.part_dt= t1.part_dt;

*VS*

Below Update by combining multiple UPDATES using CASE is resulting into ora-01427 i.e. 'single row subquery returns more than one row'

Update gtt_tab gtt1 set (c3,c4,c5,c6,c7) = (select case when code='AA' then item1 else null end,

case when code='AA' then item2 else null end,

case when code='AA' then item3 else null end,

case when code='FF' then item6 else null end,

case when code='FF' then item7 else null end

from product_tab p where p.tdid= gtt1.tdid and p.part_dt= gtt1.part_dt )

*VS*

Using merge query, but it's throwing Ora-30926- unable to get a stable set of rows in the source table.

MERGE INTO gtt_tab gtt1

using (select item1, item2, item3,item6,item7 , tdid, part_dt,code from product_tab )p

ON ( p.tdid= gtt1.tdid and p.part_dt= gtt1.part_dt )

when matched then

update set gtt1.c3=case when p.code='AA' then p.item1 else null end,

gtt1.c4=case when p.code='AA' then item2 else null end,

gtt1.c5=case when p.code='AA' then item3 else null end,

gtt1.c6=case when p.code='FF' then item6 else null end,

gtt1.c7=case when p.code='FF' then item7 else null end;

Below is the sample tables and test data

create global temporary table myschema1.gtt_tab

(TDID varchar2(4000),

PART_DT date,

txn_col1 varchar2(4000),

txn_col2 varchar2(4000),

c3 varchar2(4000),

c4 varchar2(4000),

c5 varchar2(4000),

c6 varchar2(4000),

c7 varchar2(4000),

c8 varchar2(4000) ) on commit preserve rows;

create table myschema1.product_tab

(TDID varchar2(4000),

PART_DT date,

CODE varchar2(4000),

Item1 varchar2(4000),

Item2 varchar2(4000),

Item3 varchar2(4000),

Item4 varchar2(4000),

Item5 varchar2(4000),

Item6 varchar2(4000),

Item7 varchar2(4000),

Item8 varchar2(4000));

insert into myschema1.product_tab
values(1,to_date('1-jan-2021','DD-MON-YYYY'),'AA','AAITEM1','AAITEM2','AAITEM3',null,null,null,null,null);

insert into myschema1.product_tab
values(1,to_date('1-jan-2021','DD-MON-YYYY'),'BB',null, 'BBITEM2',null,null,'BBITEM5',,null,null,null);

insert into myschema1.product_tab
values(1,to_date('1-jan-2021','DD-MON-YYYY'),'CC',null,null,'CCITEM3','CCITEM4',null,null,null,null);

insert into myschema1.product_tab
values(2,to_date('2-jan-2021','DD-MON-YYYY'),'DD','DDITEM1',null,null,null,'DDITEM4',null,null,null);

insert into myschema1.product_tab
values(2,to_date('2-jan-2021','DD-MON-YYYY'),'EE',null,'EEITEM2',null,null,null,'EEITEM5',null,null);

insert into myschema1.product_tab
values(2,to_date('2-jan-2021','DD-MON-YYYY'),'FF',null,null,null,null,null,null,'FFITEM6',null);

create table myschema1.TXN_TAB1

(TDID varchar2(4000),

PART_DT date,

col1 varchar2(4000));

insert into myschema1.txn_tab1
values(1,to_date('1-jan-2021','DD-MON-YYYY'),'ABC');

insert into myschema1.txn_tab1
values(2,to_date('2-jan-2021','DD-MON-YYYY'),'ABC');

insert into myschema1.txn_tab1
values(3,to_date('3-jan-2021','DD-MON-YYYY'),'ABC');

create table myschema1.TXN_TAB2

(TDID varchar2(4000),

PART_DT date,

col2 varchar2(4000));

insert into myschema1.txn_tab2
values(1,to_date('1-jan-2021','DD-MON-YYYY'),'XXX');

insert into myschema1.txn_tab2
values(2,to_date('2-jan-2021','DD-MON-YYYY'),'YYY');

insert into myschema1.txn_tab2
values(3,to_date('3-jan-2021','DD-MON-YYYY'),'ZZZ');

Create table end ******************************

Regards

Lok

On Mon, May 17, 2021 at 9:07 PM Powell, Mark <mark.powell2_at_dxc.com> wrote:

> Pap, If the posted update statement is complete, then I believe that it
> should have a WHERE clause to limit the rows updated by the sub-select: WHERE
> EXISTS ( same select as in SET clause).  If the sub-select can return more
> than one row then if the select list is the same for all returned rows you
> should be able to add, "and rownum = 1" to the select subquery to limit
> only one row being returned to the SET.  If the select expression list
> values vary by returned rows, then you need a condition to determine which
> is the right one to return.
>
> Mark Powell
> Database Administration
> (313) 592-5148
>
>
> ------------------------------
> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> behalf of Lok P <loknath.73_at_gmail.com>
> *Sent:* Saturday, May 15, 2021 2:41 PM
> *To:* Mark W. Farnham <mwf_at_rsiz.com>
> *Cc:* Jonathan Lewis <jlewisoracle_at_gmail.com>; Oracle L <
> oracle-l_at_freelists.org>
> *Subject:* Re: Tuning Update query
>
> So it seems , it's not possible to do it by modifying the same existing
> INSERT query. For e.g if i ignore the one oddity which Jonathan highlighted
> wrt CIND column filter in the UPDATE. And the current update statement
> looks like below. And if i try to incorporate directly this one in the
> existing INSERT query like i posted just now above, it will result in
> 'single row subquery resulting in more than one row" error. As
> becausethe product_tab has a unique key combining all three columns TID,
> PART_DT, CODE. So by ust joning on TID, PART_DT won't help us here when we
> have to deal with multiple CODE like this in the same query. So then the
> only way is to go by is creating a new global temporary(say GTT1) table and
> populate it with equivalent INSERT as its there in the UPDATE statement and
> then we can truncate the existing one (GTT_TAB) and in next step populate
> the (GTT_TAB) with another INSERT equivalent of UPDATE and then truncate
> GTT1, likewise we have to replace all the UPDATE statements. Correct me if
> wrong.
>
> UPDATE GTT_TAB TMP
>    SET (c1,  c2,  c3, c4,c5.. c11) =
>           ( (SELECT product_col1,product_col2,..product_col11
>                FROM PRODUCT_TAB P
>               WHERE     P.TID = TMP.TID
>                      AND P.PART_DT = TMP.PART_DT
>                     AND P.CODE = 'XX'))
>
>
> On Sat, May 15, 2021 at 10:49 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>
> *case when P.code='ZZ' then product_col3 else null end , *
>
> *   case when P.code='ZZ' then product_col4 else null end  *
>
>
>
> I believe you have two disjoint sets with ZZ that you are combining so you
> need to figure out how to get col3 for one and col4 for the other when you
> combine them.
>
>
>
> JL mentioned already one apparent bug and this is another unless you fix
> it.
>
>
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Lok P
> *Sent:* Saturday, May 15, 2021 12:27 PM
> *To:* Mark W. Farnham
> *Cc:* Jonathan Lewis; Oracle L
> *Subject:* Re: Tuning Update query
>
>
>
> Thank You so much Mark.
>
>
>
> I was thinking if we can still able to modify the existing INSERT query
> itself which populates the GTT_TAB which is already using 2-3 other source
> tables in the existing equi join. Below is how the existing INSERT query
> looks like.It's just a sample not exact query though.
>
>
>
> So basically , As I mentioned, the insert which is loading data into the
> GTT_TAB is populating all the columns from 2-3 other tables but not
> populating any data in the columns which were later populated from
> PRODUCT_TAB using UPDATE statement. So let's say we need product_col1,
> product_col2, product_col3, product_col4 from table PRODUCT_TAB to be
> updated in GTT_TAB for a specific TID(transaction ID), PART_DT(partition
> date). But for a specific TID and PART_DT there exists multiple records in
> PRODUCT_TAB. In table product_tab we have unique keys as (TID, PART_DT,
> CODE(which is nothing but product code)).
>
>
>
> So my thought was rather having new separate GTT's created for those
> UPDATE statement, if i  can tweak the same existing INSERT statement(which
> populates GTT_TAB) to include table PRODUCT_TAB as OUTER JOIN and write
> multiple CASE statements to populate product_col1, product_col2,
> product_col3 etc based on Product_code. But it appears that I will
> encounter "single row subquery returning more than one row" error.. And
> also it may result in more rows in the final result set as there exists
> multiple rows for the combination of TID,PART_DT which is used as join
> criteria. The differentiating factor is column CODE in product_tab. So I
> want to understand , Is there a possible way out to make it happen in the
> existing INSERT query or I have to go for the separate INSERT queries using
> new GTT's as Jonathan and you both suggested?
>
>
>
> Insert into GTT_TAB
>
> (......
>
>  )
>
> select .....
>
> from A, B, C
>
> where A.tid=b.tid and b.tid=c.tid and a.part_dt= b.part_dt and
> b.part_dt=c.part_dt;
>
>
>
> *Modified query;-*
>
>
>
> Insert into GTT_TAB
>
> (...... C1, C2, C3,C4
>
>  )
>
> select .....
>
> *case when P.code='XX' then product_col1 else null end,*
>
> *case when P.code='YY' then product_coll2 else null end,*
>
> * case when P.code='ZZ' then product_col3 else null end , *
>
> *   case when P.code='ZZ' then product_col4 else null end  *
>
> from A, B, C,* product_tab P*
>
> where A.tid=b.tid and b.tid=c.tid and a.part_dt= b.part_dt and
> b.part_dt=c.part_dt *and c.tid= P.tid(+) and c.part_dt=P.part_dt(+) *;
>
>
>
>
>
> Regards
>
> Lok
>
>
>
> On Fri, May 14, 2021 at 10:33 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>
> That says so me: do the load population of the GTT_0, and then convert
> each update as an insert into GTT_n, where n increases from 1 to m for each
> of the updates you have. If space is a problem, you can drop GTT_n-1 after
> each insert completes.
>
>
>
> I believe that is essentially the synthesis of what JL suggested with a
> trivial, one statement at a time version of the “scaling to infinity”
> method documented and by popular by Tim Gorman. The results **should be**
> full set operations at direct (aka append) speed. By cascading through GTTs
> you dispense with a ton of recoverability overhead, and then you can
> preserve the final result also inserting into the final destination.
>
>
>
> Depending on size, you might want to produce a GTT_Product_tab that
> includes just the rows that GTT_0 says you might require so you don’t have
> to paw through Product_tab n times.
>
>
>
> I guess that depends mostly on whether it is easier to debug your existing
> updates into correct copy/inserts or a one time build the full insert row
> select. Based on JL seeing the apparent bug in the updates, my guess would
> be a series of cascading inserts would be less time to debug for the team
> building this and it shouldn’t be significantly slower. Besides, it’s not
> clear that P.code=’AA’ even can be done in one step.
>
>
>
> Good luck.
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Lok P
> *Sent:* Friday, May 14, 2021 6:11 AM
> *To:* Mark W. Farnham
> *Cc:* Jonathan Lewis; Oracle L
> *Subject:* Re: Tuning Update query
>
>
>
>   Thank you very much for the guidance here.
>
>
>
> While going through the code , I noticed that none of the UPDATED columns
> was getting populated as part of the INSERT query which is populating data
> in the GTT inside the code. Those are getting populated for the first time
> through the UPDATE statements i.e. post load. Which means I think, we can
> outer join those tables(like PRODUCT_TAB) directly in the INSERT query
> itself and populate the columns rather than doing it after data load.
>
>
>
> But then I see there exist one to many relationship between the sources
> table JOIN(which populates the base data into GTT ) and the PRODUCT_TAB, so
> it means the outer join  is going to break things because the overall
> number of results set will increase which may not be correct. Or else we
> may have to use a DISTINCT clause to get those corrected.
>
>
>
> Regards
>
> Lok
>
>
>
> On Tue, May 4, 2021 at 5:03 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>
> Do you have handy the script or load used to create GTT_TAB?
>
>
>
> It seems passing strange that you want to debug what is essentially a row
> by row case computation instead of creating the disjoint sets of rows to
> operate on as a bulk set.
>
>
>
> Consider GTT_TAB_XX_Y alias XX_Y, which contains only TMP.CIND=’Y’ and
> TMP.PART_DT for which the corresponding TMP.PART_DT = P.PART_DT has
> P.CODE=’XX’.
>
>
>
> Then insert into GTT_TAB
>
>
>
> Select p.<the column list from P>, xx_y.<the column list from XX_Y> where
> p.tid=xx_y.tid and p.part_dt=xx_y.part_id
>
>
>
> All the rows from xx_y are used and should be a full table scan, matching
> against the presumably indexed p.tid, p.partid from P.
>
>
>
> Do this for each of your separate “updates”, appending into GTT_TAB, and
> directly append the disjoint bits that require no update directly into
> GTT_TAB however you do now. The only tricky set is P.CODE=’AA’ for which
> you have two updates. For that, use GTT_TAB_AA_5 gets appended into
> GTT_TAB_AA_6 and GTT_TAB_AA_6 gets appended into GTT_TAB.
>
>
>
> All the other rows (which meet zero of your update predicates) you dump
> directly into GTT_TAB from wherever you are currently plucking them.
>
>
>
> This of course fixes your ‘Y’ problem and operates set wise using bulk
> inserts AND operates only on rows that need attention instead of filtering
> out the rows you’re not interested in for each update.
>
>
>
> Write out the differences as a dataflow diagram to understand why this
> always wins.
>
>
>
> mwf
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Jonathan Lewis
> *Sent:* Tuesday, May 04, 2021 5:07 AM
> *To:* Lok P
> *Cc:* Oracle L
> *Subject:* Re: Tuning Update query
>
>
>
>
>
> a)  I wasn't suggesting that you use an updatable join view, but I can see
> how you could misinterpret what I said. The idea was to see if you could
> write a single subquery that joined several of the source tables (such as
> PRODUCT_TAB) and then still carry out that full tablescan of the GTT but
> using that join as your subquery update rather than having "one tablescan
> and update = one subquery".
>
>
>
> I see you're already doing something of this kind using the MERGE command;
> if you're getting an ORA-01779 when you try to rewrite one of your updates
> as an update through an updatable join view this means that equivalent code
> to do a MERGE could produce a run-time error ORA-30926). The update portion
> of a merge and an updatable join view both require the same uniqueness but
> the view enforces the logical requirement at compile time while the merge
> command allows you to get away with the update so long as you get lucky
> with the data.
>
>
>
> I note that you still haven't moved the "tmp.cind = 'Y'" predicate to the
> correct position - that should have been the zeroth step before worrying
> about anything else. I assume the one update I picked out isn't the only
> one where this error occurs.
>
>
>
>
>
> b) Correct regarding multiple GTTs. Each "insert /*+ append */" would
> probably require it's own target GTT.  Remember, though, that I haven't
> examined your original posting in detail, and you probably haven't told us
> everything a consultant would ask about so if your current code has some
> parts that update disjoint subsets of the data you might find parts of the
> rewrite where you could do multiple inserts into the same GTT.
>
>
>
> One thing to bear in mind - a possible bar to adopting this approach - is
> that you would have to commit after insert otherwise the next insert, or
> the next query against the target GTT would raise the (unexpectedly
> parallel) error:   ORA-12838: cannot read/modify an object after modifying
> it in parallel
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
>
>
> On Mon, 3 May 2021 at 12:58, Lok P <loknath.73_at_gmail.com> wrote:
>
> Thank You Jonathan and Lothar. I was trying to modify one of the sample
> updates(as below) as per your suggested method.
>
>
>
> UPDATE GTT_TAB TMP
>
>   SET (c1, c2, c3, c4,c5.. c11) =
>
>          ( (SELECT col1,col2,..col11
>
>               FROM PRODUCT_TAB P
>
>              WHERE    P.TID = TMP.TID
>
>                    AND TMP.CIND = 'Y'
>
>                    AND P.PART_DT = TMP.PART_DT
>
>                    AND P.CODE = 'XX'))
>
>
>
> Jonathan, let me know if my understanding is correct on the suggested
> points
>
> *" a) See if you can minimise the number of update ste[ps by updating from
> a JOIN of several source tables" *
>
>
>
> So if I get the above point correct then I was trying to modify the UPDATE
> as below , but I am getting  ORA-01779 while running to see the plan. So
> does it mean that the GTT has to have a unique key present in it to have
> this method work?
>
>
>
> UPDATE (SELECT P.COL1...P.col11,TMP.C1.. TMP.C11
>
>               FROM PRODUCT_TAB P,GTT_TAB TMP
>
>              WHERE    P.TID = TMP.TID
>
>                    AND TMP.CIND = 'Y'
>
>                    AND P.PART_DT = TMP.PART_DT
>
>                    AND P.CODE = 'XX'
>
> )      SET    C1=COL1,    ...     C11=COL11;
>
> ERROR at line 10:
>
> ORA-01779: cannot modify a column which maps to a non key-preserved
> table
>
>
>
> Regarding below point ,
>
> *"b) Rewrite the code to step through a series of GTT doing*
>
> *        insert /*+ append */ into next_gtt select from previous_gtt join
> {source tables}"*
>
>
>
> Do you mean i have to replace the UPDATE with INSERT /*+APPEND*/...
> queries but as it cant be done in the same GTT , so i have to create
> multiple GTT's for each UPDATES, so as to replace them with INSERT APPEND
> queries?
>
>
>
>
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l

Received on Tue May 18 2021 - 22:38:46 CEST

This message: [ Message body ]
Next message: Pap: "PGA on higher version"
Previous message: Tim Gorman: "Re: Linux VM Failover"
Maybe in reply to: Lok P: "Tuning Update query"
Next in thread: Lok P: "Re: PGA on higher version"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message