Re: How to debug resource busy error

From: Laurentiu Oprea <laurentiu.oprea06_at_gmail.com>
Date: Tue, 3 May 2022 09:05:17 +0300
Message-ID: <CA+riqSWRS8gk1dF4aTrQ9=UZ=FQsU=1+fHVq41C1ZZrvymiDOA_at_mail.gmail.com>



Hello Pap,

I`m very sorry I mislead you because I got spoiled by the fact that I had a kind of similar issue some months ago and I seen into your wait chains parallel slaves blocking eachouther and sessions waiting for cursor s pin wait on x blocked by sessions on CPU and I assumed those minutes you have a parsing issue.

I reviewed the details of my issue and in such a scenario you should encounter the error: ora-04021 timeout occurred while waiting to lock object and should rise after a few minutes thus the wait chains should have been observed.

In your scenario if you are not comfortable with tracing there is no problem, setting ddl_lock_timout at session level for a few minutes should help you determine the issue because it will be recorded in ASH. THus assuming you set it and you replicate the issue running something like: _at_dash_wait_chains sql_opname||':'||sql_id 1=1 tstart tend should display a chain like: ALTER TABLE:SQL_ID -> sql_operation:sql_id

THanks.

În mar., 3 mai 2022 la 07:41, Pap <oracle.developer35_at_gmail.com> a scris:

> Thank you so much.
>
> I never thought of this scenario. Though the parsing query has not been
> seen/captured in the dba_hist_active_sess_history and also we were not able
> to see anything in v$locked_object, however I am sure we do have a lot of
> select queries from reporting module running during the same time on that
> object. so it may happen they are not captured as parsing was happening too
> quick and "Alter" was unlucky to clash with one of them.
>
> I was neglecting that fact and was only thinking that a DML/DDL on the
> base object can cause this failure for this simultaneous ALTER statement
> but now as you mentioned a parsing lock from a SELECT query, on the base
> object can well cause failure of the ALTER statement with Ora-0054. And if
> this is the case then ddl_lock_timeout of 2-3minutes should be enough to
> bypass this issue. Want to know, is there any ash/awr view exists, which
> logs/captures the parsing related lock on the base object so that we can
> query it by the object_id?
>
> Regards
> Pap
>
> On Tue, May 3, 2022 at 1:46 AM Laurentiu Oprea <
> laurentiu.oprea06_at_gmail.com> wrote:
>
>> Thanks. Is strange I don't see any query in parsing phase here based on
>> the chains output. The theory in my mind was that you have a parallel query
>> (can be even a select) for which parsing takes significant time and which
>> include the table you want to enable the constraints thus blocking them but
>> looks like is not the case so tracing should be your friend here.
>>
>> Good luck.
>>
>> On Mon, May 2, 2022, 22:29 Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> Thank You.
>>>
>>> _at_Laurentiu, I have executed the dashtop query with the suggested inputs
>>> and below is the output from that. But I can't see any query in the output
>>> which can take such a lock on the base table and can cause this error.
>>>
>>> _at_Stefan, Considering future course of action, i was planning to set the
>>> ddl_lock_timeout in the procedure code through the alter session
>>> command. However as you mentioned that culprit session can well be
>>> captured using trace. So i was wondering , if any of the below tracing
>>> is enough to catch the culprit session/sql details or we should
>>> specifically go for "systemstate" dump i.e the one as you suggested? I
>>> never generated or read the systemstate dump, so trying to understand if
>>> other simple traces as below will also help catching the culprit
>>> sessions/sqls etc?
>>>
>>> alter system set events "00054 trace name systemstate level 266,
>>> lifetime 1";
>>>
>>> Get the trace....
>>>
>>> ALTER system SET EVENTS '00054 TRACE NAME CONTEXT OFF';
>>>
>>> ****OR
>>>
>>> ALTER system SET EVENTS '00054 trace name errorstack level 3';
>>>
>>> Get the trace ...
>>>
>>> ALTER system SET EVENTS '00054 TRACE NAME CONTEXT OFF';
>>>
>>> ****OR tracing the exact session
>>>
>>> begin
>>>
>>> dbms_monitor.session_trace_enable( session_id => &m_sid, serial_num
>>> => &m_serial,
>>>
>>> waits => true, bind => true, plan_stat =>
>>> 'all_executions' );
>>>
>>> end;
>>>
>>> /
>>>
>>> -- allow the Ora-00054 failure to happen for the session
>>>
>>> begin
>>>
>>> dbms_monitor.session_trace_disable(session_id => &m_sid,serial_num =>
>>> &m_serial);
>>>
>>> end;
>>>
>>> /
>>>
>>>
>>> [image: User: "image.png"]
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, May 2, 2022 at 6:39 PM Laurentiu Oprea <
>>> laurentiu.oprea06_at_gmail.com> wrote:
>>>
>>>> Hello Pap,
>>>>
>>>> I do see a probable cause for your error so before doing a guess
>>>> completely on feeling, lets get some data :
>>>>
>>>> check the data of: _at_dashtop
>>>> sql_opname,sql_id,sql_exec_start,in_parse session_type='FOREGROUND'
>>>> starttime endtime
>>>> https://github.com/tanelpoder/tpt-oracle/blob/master/ash/dashtop.sql
>>>> BTW: endtime should be the moment you receive the error, starttime can
>>>> be 15 minutes before
>>>>
>>>> At least I`ll make a guess based on feeling + some data :)
>>>>
>>>> În lun., 2 mai 2022 la 14:16, Pap <oracle.developer35_at_gmail.com> a
>>>> scris:
>>>>
>>>>> Thank you so much Laurentiu and Dominic.
>>>>>
>>>>> I tried checking the dash_wait_chains script for 15minutes window ,
>>>>> just by passing first parameter as session_type='FOREGROUND' and but not
>>>>> seeing much blocking waits as in below output. And also as i don't have the
>>>>> sql_id for the exact ALTER statement captured as it must be very fast and
>>>>> thus not getting captured in ASH/AWR, so was not able to pass the second
>>>>> parameter i.e sql_id filter.
>>>>>
>>>>> To Dominic's point , i think this ALTER sql_id wont be captured as its
>>>>> very fast and failing immediately when its seeing the lock from other
>>>>> session. And also we are struggling to catch the culprit session too. So as
>>>>> suggested, will try to set the ddl_lock_timeout to ~300 i.e 5 minutes so as
>>>>> to capture enough blocking session against the ALTER statement logged in
>>>>> the ASH/AWR to get the details around the culprit session. And considering
>>>>> the data-load itself take ~1 hrs+ so this amount of additional wait time(5
>>>>> mins) should be fine. That should possibly also fix the issue if the
>>>>> blocking lock/culprit session is just holding it for very small time and
>>>>> releasing it immediately after.
>>>>>
>>>>>
>>>>> [image: User: "image.png"]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, 2 May 2022, 12:03 pm Dominic Brooks, <dombrooks_at_hotmail.com>
>>>>> wrote:
>>>>>
>>>>>> Problem is that the error is likely to be immediate.
>>>>>>
>>>>>> If you increase/set the param ddl_lock_timeout in your load code
>>>>>> before you do enable constraints then
>>>>>> a) you might wait sufficient time not to fail, ie your blocker
>>>>>> finishes before the timeout end and/or
>>>>>> b) you should capture some information in ASH about the blocking
>>>>>> scenario.
>>>>>>
>>>>>> Cheers,
>>>>>> Dominic
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On 2 May 2022, at 06:55, Laurentiu Oprea <laurentiu.oprea06_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> 
>>>>>> Hello Pap,
>>>>>>
>>>>>> To diagnose your issue ASH should do the trick.
>>>>>>
>>>>>> Here is an example(script dash_wait_chains) :
>>>>>>
>>>>>> https://tanelpoder.com/2013/11/06/diagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2/
>>>>>> <https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftanelpoder.com%2F2013%2F11%2F06%2Fdiagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2%2F&data=05%7C01%7C%7C97ae9142e13241b6ef5308da2c005496%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637870677178768226%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=RXOJ6ean3%2FHVZiEqVguXpBwUZWt7sQi4OjPAPTmACX8%3D&reserved=0>
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> În lun., 2 mai 2022 la 05:58, Pap <oracle.developer35_at_gmail.com> a
>>>>>> scris:
>>>>>>
>>>>>>> Hi, We are facing Ora-0054 error on specific time of the month for a
>>>>>>> job and from the line number noted in the error log its pointing to the
>>>>>>> alter statement which enables reference constraints after a direct path
>>>>>>> data load and commit. But we tried monitoring it a few times during
>>>>>>> the error period/run time and surprisingly we are not seeing any lock(from
>>>>>>> v$locked_object) on the base table(say e.g. MAIN_TAB in below example) but
>>>>>>> still the job failed and it succeeded after a couple of rerun
>>>>>>> attempts. But it seems like , some process runs and takes lock(for
>>>>>>> may be very small time) and makes this 'ALTER' failure and we were
>>>>>>> unable to find that out.
>>>>>>>
>>>>>>> So I wanted to understand , how to get the culprit session/sql/job
>>>>>>> so that we can have some dependency set so that this failure won't
>>>>>>> occur? Is there any possible trace to set which will help us to debug this
>>>>>>> scenario and get the details around the culprit session? Or somehow
>>>>>>> we can track it from any of the ASH/AWR views for which we have
>>>>>>> enough retention in place?
>>>>>>>
>>>>>>> *****The error log pointing to one of the below statements
>>>>>>> failure inside the procedure *******
>>>>>>>
>>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ENABLE NOVALIDATE CONSTRAINT
>>>>>>> MAIN_TAB_R01;
>>>>>>>
>>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ENABLE NOVALIDATE CONSTRAINT
>>>>>>> MAIN_TAB_R03;
>>>>>>>
>>>>>>> ORA-00054: resource busy and acquire with NOWAIT specified or
>>>>>>> timeout expired
>>>>>>>
>>>>>>> ********Table structure***************
>>>>>>>
>>>>>>> CREATE TABLE SCHEMA1.MAIN_TAB
>>>>>>>
>>>>>>> ( FFKEY NUMBER CONSTRAINT MAIN_TAB_C01 NOT NULL,
>>>>>>>
>>>>>>> SCKEY NUMBER CONSTRAINT MAIN_TAB_C02 NOT NULL,
>>>>>>>
>>>>>>> ACKEY NUMBER,
>>>>>>>
>>>>>>> CKEY NUMBER CONSTRAINT MAIN_TAB_C04 NOT NULL,
>>>>>>>
>>>>>>> CSTKEY NUMBER CONSTRAINT MAIN_TAB_C05 NOT NULL,
>>>>>>>
>>>>>>> EPC_KEY NUMBER CONSTRAINT MAIN_TAB_C06 NOT NULL,
>>>>>>>
>>>>>>> CRRKEY NUMBER CONSTRAINT MAIN_TAB_C07 NOT NULL
>>>>>>>
>>>>>>> );
>>>>>>>
>>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ADD ( CONSTRAINT MAIN_TAB_R01 FOREIGN
>>>>>>> KEY (CKEY) REFERENCES SCHEMA1.REF_TAB1 (CKEY) ENABLE NOVALIDATE,
>>>>>>>
>>>>>>> CONSTRAINT MAIN_TAB_R03 FOREIGN KEY (SCKEY) REFERENCES
>>>>>>> SCHEMA1.REF_TAB2 (SCKEY) ENABLE NOVALIDATE);
>>>>>>>
>>>>>>> ******************
>>>>>>>
>>>>>>> CREATE TABLE SCHEMA1.REF_TAB1( CKEY NUMBER CONSTRAINT REF_TAB1_C01
>>>>>>> NOT NULL );
>>>>>>>
>>>>>>> CREATE UNIQUE INDEX SCHEMA1.REF_TAB1_PK ON SCHEMA1.REF_TAB1(CKEY);
>>>>>>>
>>>>>>> ALTER TABLE SCHEMA1.REF_TAB1 ADD ( CONSTRAINT REF_TAB1_PK PRIMARY
>>>>>>> KEY (CKEY) USING INDEX SCHEMA1.REF_TAB1_PK ENABLE VALIDATE);
>>>>>>>
>>>>>>> *******************
>>>>>>>
>>>>>>> CREATE TABLE SCHEMA1.REF_TAB2( SCKEY NUMBER CONSTRAINT REF_TAB2_C01
>>>>>>> NOT NULL );
>>>>>>>
>>>>>>> CREATE UNIQUE INDEX SCHEMA1.REF_TAB2_PK ON SCHEMA1.REF_TAB2(SCKEY);
>>>>>>>
>>>>>>> ALTER TABLE SCHEMA1.REF_TAB2 ADD ( CONSTRAINT REF_TAB2_PK PRIMARY
>>>>>>> KEY (SCKEY) USING INDEX SCHEMA1.REF_TAB2_PK ENABLE VALIDATE);
>>>>>>>
>>>>>>> ******************
>>>>>>>
>>>>>>>
>>>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue May 03 2022 - 08:05:17 CEST

Original text of this message