Re: Reading resource busy error system trace

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Mon, 4 Jul 2022 15:22:26 +0100
Message-ID: <CAGtsp8=3cVY2bJn5Y4w+-11mkKYBC8QzaKTa-mH-6qjwtXPS7g_at_mail.gmail.com>



You didn't miss anything.

I read your comments about *systemstate level 266*, but my fingers typed *errorstack
level 1* when I wasn't watching, because that's a good first choice to try when you see a session raising an error.

This is an extract from my test dump (with a lot of lines near the start of the call stack deleted):

calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst1()+95         call     kgdsdst()            7FFC8604A680 000000002
                                                   7FFC860449B0 ? 7FFC86044AC8 ?
...
                                                   000000000 000000082 ?
ktaiam()+1110        call     ksesecl0()           7FC6171F49C0 ? 7FC6170D0050 ?
                                                   000000036 ? 000000036
                                                   09BB833E8 000000000
ktagetg0()+929       call     ktaiam()             00001E6ED 000000004 000000000
                                                   7FFC8604DAB8 ? 000000000 ?
                                                   000000000 ?
ktagetp_internal()+  call     ktagetg0()           00001E6ED ? 000000004 ?
141                                                000000004 ? 7FFC8604DAB8 ?
                                                   000000000 ? 000000000 ?
ktagetg_ddlX()+323   call     ktagetp_internal()   00001E6ED 000000000 000000004
                                                   000000000 ? 000000000
                                                   746DC3D600000000
ktagetg_ddl()+30     call     ktagetg_ddlX()       00001E6ED ? 000000000 ?
                                                   000000004 ? 000000000 ?
                                                   000000000 ? 000000000
kkdllk0()+1551       call     ktagetg_ddl()        00001E6ED ? 000000000 ?
                                                   000000004 ? 000000000 ?
                                                   000000000 ? 000000000 ?

In your systemstate dump you should find a shortstack that includes a section similar to:
...
ktaiam()+1110<-ktagetg0()+929<-ktagetp_internal()+141<-ktagetg_ddlX()+323<-ktagetg_ddl()+30 ...
which gets expanded to the detailed call stack above.
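
For what it's worth, once you've picked the first argument out of the ktaiam() line,
that hex value is just the object_id, so a minimal lookup (assuming you have access
to dba_objects) would be something like:

    select owner, object_name, object_type
    from   dba_objects
    where  object_id = to_number('0001E6ED', 'XXXXXXXX');

(0001E6ED is the value from my test dump; substitute the value from your trace.)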

Regards
Jonathan Lewis

On Mon, 4 Jul 2022 at 13:26, Pap <oracle.developer35_at_gmail.com> wrote:

> Thank You Jonathan.
>
> In the event trace I don't see any 'call stack trace', only a 'Short
> stack dump' like the one below. And in that I don't see such an
> object_id in hex format. Am I missing anything?
>
> Also, I didn't see any DML on the parent table at the same time in ASH, and the
> dev team says they don't perform any DML on the parent objects during these
> heavy child direct-path insert runs.
>
> https://gist.github.com/oracle9999/95307babd0366cc0c25afd447fa0264d
>
>
> On Mon, 4 Jul 2022, 3:44 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> Your dump file seems to be saying that the session is holding a mode 4 on
>> the "child" table, and since the insert /*+ append */ would have been
>> holding a mode 6 the commit must have cleared the lock properly before the
>> following code acquired the mode 4.
>>
>> Even though you're enabling with "novalidate" the enable would try to
>> lock the child and then the parent in mode 4, so the first thing you need
>> to check is that the problem isn't with the parent table being locked in an
>> incompatible fashion.
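>>
>> As an aside: if you can query the instance while the error is happening, a quick
>> check of the TM enqueues on the parent is possible directly - a sketch only, with
>> PARENT_TAB as a made-up name for the parent table:
>>
>>     select s.sid, s.serial#, l.lmode, l.request
>>     from   v$lock l join v$session s on s.sid = l.sid
>>     where  l.type = 'TM'
>>     and    l.id1  = (select object_id from dba_objects
>>                      where object_name = 'PARENT_TAB');
>>
>> Any other session holding the lock in mode 3, 5 or 6 would block the mode 4
>> request made by the enable.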
>>
>> I'm not 100% certain, but if you check the call stack trace (headed as
>> follows)
>>
>> ----- Call Stack Trace -----
>> calling              call     entry                argument values in hex
>> location             type     point                (? means dubious value)
>> -------------------- -------- -------------------- ----------------------------
>>
>> I think you'll find several lines similar to:
>> ktagetg0()+929       call     ktaiam()             00001E6ED 000000004
>>                                                    000000000 7FFFC6AA87B8 ?
>>                                                    000000000 ? 000000000 ?
>> ktagetp_internal()+  call     ktagetg0()           00001E6ED ? 000000004 ?
>> 141                                                000000004 ? 7FFFC6AA87B8 ?
>>                                                    000000000 ? 000000000 ?
>> ktagetg_ddlX()+323   call     ktagetp_internal()   00001E6ED 000000000
>>                                                    000000004 000000000 ?
>>                                                    000000000 4DE9A6BF00000000
>> ktagetg_ddl()+30     call     ktagetg_ddlX()       00001E6ED ? 000000000 ?
>>                                                    000000004 ? 000000000 ?
>>                                                    000000000 ? 000000000
>> kkdllk0()+1551       call     ktagetg_ddl()        00001E6ED ? 000000000 ?
>>                                                    000000004 ? 000000000 ?
>>                                                    000000000 ? 000000000 ?
>>
>> The first parameter will be the object_id of the parent table where the
>> attempt to lock failed. (In my sample 0001E6ED).
>>
>> This may take you one step forward in your search for the source of the
>> problem. Is it possible that you've got Informatica doing "insert lots of
>> parent rows", "insert lots of child rows" nearly concurrently, with the
>> expectation that the parent insert will complete and commit before the
>> child insert starts (or completes)? I have seen a couple of cases where
>> such things happen, and if the parent insert gets delayed the child insert
>> crashes.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>>
>>
>> On Sat, 2 Jul 2022 at 18:57, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Hello All, I followed the blog below and tried debugging the system
>>> trace to find the blocking session/SQL which caused this resource busy
>>> error, ORA-00054. But I am encountering strange behaviour: it appears to be
>>> the same session blocking itself. The session which did the direct path load
>>> and commit seems not to release the exclusive lock for some time after the
>>> commit. Below are the details of the behaviour. Has anybody
>>> encountered such a scenario?
>>>
>>>
>>> https://nenadnoveljic.com/blog/troubleshooting-ora-00054-resource-busy-and-acquire-with-nowait-specified-or-timeout-expired/
>>>
>>> So the code which is encountering the error is below. It disables all
>>> constraints, makes the indexes unusable, and then does a direct path load into
>>> the table; afterwards, while trying to enable the constraints of the same
>>> table using 'ALTER', it fails with ORA-00054. The sample code for the data
>>> load and the ALTER TABLE is below.
>>>
>>> ****Sample code *****
>>> INSERT /*+Append*/ INTO TAB1 (c1, c2, .....)
>>> SELECT (ROWNUM + sqq_key), col1, col2, col3 ..... FROM .... tab2, tab3, tab4 .....;
>>> COMMIT;
>>>
>>> FOR I IN (SELECT TABLE_NAME, CONSTRAINT_NAME FROM ALL_CONSTRAINTS
>>>           WHERE TABLE_NAME = v_table_nm
>>>           AND CONSTRAINT_TYPE = 'R' AND STATUS = 'DISABLED')
>>> LOOP
>>>   EXECUTE IMMEDIATE ('ALTER TABLE ' || v_table_nm
>>>                      || ' ENABLE NOVALIDATE CONSTRAINT ' || I.CONSTRAINT_NAME);
>>> END LOOP I;
>>>
>>> ********
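>>>
>>> For reference, a variant of the enable step that waits briefly for the lock
>>> instead of failing immediately would look something like the sketch below
>>> (fk_tab1_parent is a made-up constraint name, used only for illustration):
>>>
>>> BEGIN
>>>   -- Let DDL in this session wait up to 30 seconds for the TM lock
>>>   -- instead of raising ORA-00054 straight away (ddl_lock_timeout, 11g+).
>>>   EXECUTE IMMEDIATE 'ALTER SESSION SET ddl_lock_timeout = 30';
>>>   EXECUTE IMMEDIATE 'ALTER TABLE TAB1 ENABLE NOVALIDATE CONSTRAINT fk_tab1_parent';
>>> END;
>>>
>>> This wouldn't explain the blocking, it would just make the step tolerant of a
>>> short overlap.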
>>>
>>> When I scan through the trace file, following the blog step by step, I see the below.
>>>
>>> *Step-1:-* I searched for the object_id (101739) of TAB1 and got
>>> the section below, i.e. pointing towards a DML lock on the table. And the
>>> process state object is proc: 0x243efffc0.
>>>
>>> ----------------------------------------
>>> SO: 0x233cd8bc8, type: DML lock (83), map: 0x237d28a10
>>> state: LIVE (0x4532), flags: 0x0
>>> owner: 0x233b9af60, proc: 0x243efffc0
>>> link: 0x233cd8be8[0x28acf1e58, 0x1d793c780]
>>> pg: 0
>>> SOC: 0x237d28a10, type: *DML lock* (83), map: 0x233cd8bc8
>>> state: LIVE (0x99fc), flags: INIT (0x1)
>>> DML LOCK: tab=101739 flg=11 chi=0
>>> his[0]: mod=4 spn=69302
>>> 2022-07-01 05:44:10.662*:ksq.c@10787:ksqdmc(): Enqueue Dump
>>> (enqueue) TM-00018D6B-00000000-00000000-00000000 DID: ksqlkdid:
>>> 0001-00CF-000ACD5C
>>> lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x7
>>> mode: S, lock_flag: 0x20, lock: 0x237d28a38, res: 0x2499f3798
>>> own: 0x2488605f8, sess: 0x2488605f8, *proc: 0x2403a4510*, prv:
>>> 0x2499f37a8
>>> slk: 0x2530967c8
>>> ----------------------------------------
>>>
>>>
>>> *Step-2:-* Then I searched through the trace for the state object
>>> address, i.e. "*SO: 0x243efffc0*", and got the below. It's pointing to pid: 207
>>> as below.
>>>
>>>
>>> *** 2022-07-01T05:44:10.538156-04:00
>>> PROCESS 207:
>>> ----------------------------------------
>>> *SO: 0x243efffc0*, type: process (2), map: 0x2403a4510
>>> state: LIVE (0x4532), flags: 0x1
>>> owner: (nil), proc: 0x243efffc0
>>> link: 0x243efffe0[0x243efffe0, 0x243efffe0]
>>> child list count: 15, link: 0x243f00030[0x243b7f190, 0x1d7903b10]
>>> pg: 0
>>> SOC: 0x2403a4510, type: process (2), map: 0x243efffc0
>>> state: LIVE (0x99fc), flags: INIT (0x1)
>>> (process) *Oracle pid:207,* ser:3445, calls cur/top:
>>> 0x1a0372f08/0x1d2ee6c70
>>> flags : (0x0) - icon_uid:0 logon_pdbid=0
>>> flags2: (0x1000), flags3: (0x510)
>>> call error: 0, sess error: 0, txn error 0
>>> intr queue: empty
>>> (post info) last post received: 0 0 277
>>> last post received-location: kcl2.h LINE:5703 ID:kclcget
>>> last process to post me: 0x24c3b2180 1 6
>>> last post sent: 0 0 33
>>> last post sent-location: ksa2.h LINE:298 ID:ksasnd
>>> last process posted by me: 0x24837eaa0 1 6
>>> waiter on post event: 0
>>> (latch info) hold_bits=0x0 ud_influx=0x4101b
>>> (osp latch info) hold_bits=0x0 ud_influx=0x0
>>> Process Group: DEFAULT, pseudo proc: 0x2484c5d70
>>> O/S info: user: grid, term: UNKNOWN, ospid: 383076
>>> OSD pid info:
>>> Short stack dump:
>>>
>>> ksedsts()+426<-ksdxfstk()+58<-ksdxdocmdmultex()+5961<-ksdxdocmdmult()+55<-ksudmp_proc()+1626<-ksudss_main()+1976<-ksudss_opt()+1544<-dbkdaKsdActDriver()+2484<-dbgdaExecuteAction()+354<-dbgdaRunAction()+770<-dbgdRunActions()+83<-dbgdProcessEventActions()+525<-dbgdChkEventKgErr()+394<-dbkdChkEventRdbmsErr()+65<-dbgePostErrorKGE()+1066<-dbkePostKGE_kgsf()+71<-kgeade()+338<-kgeselv()+89<-ksesecl0()+189<-ktaiam()+1110<-ktagetg0()+929<-ktagetp_internal()+141<-ktagetg_ddlX()+323<-ktagetg_ddl()+30<-kkdllk0()+1551<-kkdcprc()+711<-kkdcmco()+3682<-atbadd()+8770<-atbdrv()+14663<-opiexe()+30061<-opiosq0()+4575<-opipls()+14619<-opiodr()+1202<-rpidrus()+198<-skgmstack()+65<-rpidru()+132<-rpiswu2()+614<-rpidrv()+1494<-psddr0()+467<-psdnal()+646<-pevm_EXIM()+276<-pfrinstr_EXIM()+43<-pfrrun_no_tool()+52<-pfrrun()+902<-plsql_run()+755<-peicnt()+279<-kkxexe()+720<-opiexe()+24762<-kpoal8()+2226<-opiodr()+1202<-ttcpip()+1239<-opitsk()+1897<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
>>>
>>> KGL-UOL (Process state object)
>>> KGX Atomic Operation Log 0x2403a5690
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a56f8
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a5760
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a57c8
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a5830
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a5898
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a5900
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a5968
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> KGX Atomic Operation Log 0x2403a59d0
>>> Mutex (nil)(0, 0) idn 0 oper NONE(0)
>>> FSO mutex uid 262142 efd 0 whr 0 slp 0
>>> ----------------------------------------
>>>
>>>
>>> *Step-3:-* Then, searching for *pid: 207* at the start of the
>>> trace, I found *sid 1072 ser 23*, and it's pointing to the same ALTER
>>> TABLE statement. The OSPID is the same one for which the trace was
>>> generated. And if I look at the DML in this session before this ALTER,
>>> it's the same one which has just finished doing the COMMIT for the
>>> direct path load.
>>>
>>> So I am wondering, is it possible that we are hitting a bug due to
>>> which Oracle is actually not releasing the exclusive table lock immediately
>>> after the commit of the direct path load? And that is why, even though we are
>>> doing the ALTER after the COMMIT in the same session, we are hitting
>>> ORA-00054 because it blocks itself? And should we then put some wait/sleep
>>> statement after the commit, i.e. before the ALTER, to avoid this bug?
>>>
>>> 207: USER ospid 383076 sid 1072 ser 23,
>>> Cmd: ALTER TABLE
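>>>
>>> To test the 'lock not released after commit' theory directly, a quick check from
>>> a second session immediately after the loading session commits (just a sketch)
>>> could be:
>>>
>>> select lo.session_id, lo.oracle_username, lo.locked_mode
>>> from   v$locked_object lo
>>> join   dba_objects o on o.object_id = lo.object_id
>>> where  o.object_name = 'TAB1';
>>>
>>> If the loading session still shows a row for TAB1 after its COMMIT, that would
>>> support the theory.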
>>>
>>> On Mon, Jun 13, 2022 at 4:28 PM Pap <oracle.developer35_at_gmail.com>
>>> wrote:
>>>
>>>> Thank you so much, Nenad.
>>>>
>>>> Unfortunately, while checking with the team I got to know that the other
>>>> trace files have been removed from the location. So I will wait for the next
>>>> occurrence of the error to capture and analyze all the traces for that one
>>>> ospid.
>>>>
>>>>
>>>> On Sun, 12 Jun 2022, 9:24 pm Noveljic Nenad, <
>>>> nenad.noveljic_at_vontobel.com> wrote:
>>>>
>>>>> Your system dump apparently consists of several files.
>>>>>
>>>>>
>>>>>
>>>>> At the very beginning of the system dump there must be a section that
>>>>> identifies the blockers for each process and looks something like this:
>>>>>
>>>>> 157: USER ospid 1385877 sid 786 ser 49681, waiting for 'log file
>>>>> switch (checkpoint incomplete)'
>>>>>
>>>>> Cmd: INSERT
>>>>>
>>>>> Blocked by inst: 1, sid: 266, ser: 13238
>>>>>
>>>>> Final Blocker inst: 1, sid: 266, ser: 13238
>>>>>
>>>>> 159: USER ospid 1295896 sid 796 ser 48963, waiting for 'log file
>>>>> switch (checkpoint incomplete)'
>>>>>
>>>>> Cmd: INSERT
>>>>>
>>>>> Blocked by inst: 1, sid: 266, ser: 13238
>>>>>
>>>>> Final Blocker inst: 1, sid: 266, ser: 13238
>>>>>
>>>>>
>>>>>
>>>>> First, check who was blocking the PowerCenter process with ospid 89092
>>>>> (Blocked by…). Then search **all** system dump trace files (**89092**.trc)
>>>>> for the blocker to look at its state objects. Once you know the sid and
>>>>> serial of the blocker, you can position yourself at its state objects with
>>>>> the following search expression: sid: xxx ser: xxx. One of the state
>>>>> objects will be for the cursor that blocked your DDL on MAIN_TAB.
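>>>>>
>>>>> If you can catch the situation live rather than from the dump, the same blocker
>>>>> chain is also exposed in v$session - a sketch, using the waiting sid 786 from
>>>>> the example above:
>>>>>
>>>>> select sid, serial#, blocking_instance, blocking_session,
>>>>>        final_blocking_instance, final_blocking_session
>>>>> from   v$session
>>>>> where  sid = 786;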
>>>>>
>>>>>
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>>
>>>>> Nenad
>>>>>
>>>>>
>>>>>
>>>>> *From:* Pap <oracle.developer35_at_gmail.com>
>>>>> *Sent:* Sonntag, 12. Juni 2022 14:06
>>>>> *To:* Noveljic Nenad <nenad.noveljic_at_vontobel.com>
>>>>> *Cc:* Oracle L <oracle-l_at_freelists.org>
>>>>> *Subject:* Re: Reading resource busy error system trace
>>>>>
>>>>>
>>>>>
>>>>> Thank You Nenad.
>>>>>
>>>>> So actually here the session/ALTER statement which failed is from
>>>>> the Informatica PowerCenter session itself. So do you mean we should search
>>>>> the body of the original trace file from which this trace continued,
>>>>> i.e. "u01/app/oracle/diag/rdbms/DB1p/DB11/trace/DB11_ora_89092_8.trc",
>>>>> rather than this one? And in that trace the SQL statements near the "final
>>>>> blocker" string will be the cause of this error?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 04 2022 - 16:22:26 CEST
