Re: How to debug resource busy error

From: Pap <oracle.developer35_at_gmail.com>
Date: Tue, 3 May 2022 13:19:44 +0530
Message-ID: <CAEjw_fj6fYLOFYTJ1m6VY8kifE6eso-M705+PtBXLqJ4y58qUg_at_mail.gmail.com>



No issue. Do you mean, the parsing lock which you were pointing to would be mainly library cache/dictionary cache lock and should persist for longer duration as a blocking session and should then be captured in ASH/AWR and also in that case we would have got Ora-04021 for that ALTER statement rather Ora-0054?

But anyway, as you suggested and also Dominic mentioned earlier, it seems having ddl_lock_timeout set in the code should make us see the blocking session details captured in the ASH/AWR and also in case of blocking persists for a very small period this ALTER statement will also succeed without making the data load process fail.

Regards
Pap

On Tue, May 3, 2022 at 11:35 AM Laurentiu Oprea <laurentiu.oprea06_at_gmail.com> wrote:

> Hello Pap,
>
> I`m very sorry I mislead you because I got spoiled by the fact that I had
> a kind of similar issue some months ago and I seen into your wait chains
> parallel slaves blocking eachouther and sessions waiting for cursor s pin
> wait on x blocked by sessions on CPU and I assumed those minutes you have a
> parsing issue.
>
> I reviewed the details of my issue and in such a scenario you should
> encounter the error: ora-04021 timeout occurred while waiting to lock
> object and should rise after a few minutes thus the wait chains should have
> been observed.
>
> In your scenario if you are not comfortable with tracing there is no
> problem, setting ddl_lock_timout at session level for a few minutes should
> help you determine the issue because it will be recorded in ASH. THus
> assuming you set it and you replicate the issue running something like:
> _at_dash_wait_chains sql_opname||':'||sql_id 1=1 tstart tend
> should display a chain like: ALTER TABLE:SQL_ID -> sql_operation:sql_id
>
> THanks.
>
> În mar., 3 mai 2022 la 07:41, Pap <oracle.developer35_at_gmail.com> a scris:
>
>> Thank you so much.
>>
>> I never thought of this scenario. Though the parsing query has not been
>> seen/captured in the dba_hist_active_sess_history and also we were not able
>> to see anything in v$locked_object, however I am sure we do have a lot of
>> select queries from reporting module running during the same time on that
>> object. so it may happen they are not captured as parsing was happening too
>> quick and "Alter" was unlucky to clash with one of them.
>>
>> I was neglecting that fact and was only thinking that a DML/DDL on the
>> base object can cause this failure for this simultaneous ALTER statement
>> but now as you mentioned a parsing lock from a SELECT query, on the base
>> object can well cause failure of the ALTER statement with Ora-0054. And if
>> this is the case then ddl_lock_timeout of 2-3minutes should be enough to
>> bypass this issue. Want to know, is there any ash/awr view exists, which
>> logs/captures the parsing related lock on the base object so that we can
>> query it by the object_id?
>>
>> Regards
>> Pap
>>
>> On Tue, May 3, 2022 at 1:46 AM Laurentiu Oprea <
>> laurentiu.oprea06_at_gmail.com> wrote:
>>
>>> Thanks. Is strange I don't see any query in parsing phase here based on
>>> the chains output. The theory in my mind was that you have a parallel query
>>> (can be even a select) for which parsing takes significant time and which
>>> include the table you want to enable the constraints thus blocking them but
>>> looks like is not the case so tracing should be your friend here.
>>>
>>> Good luck.
>>>
>>> On Mon, May 2, 2022, 22:29 Pap <oracle.developer35_at_gmail.com> wrote:
>>>
>>>> Thank You.
>>>>
>>>> _at_Laurentiu, I have executed the dashtop query with the suggested inputs
>>>> and below is the output from that. But I can't see any query in the output
>>>> which can take such a lock on the base table and can cause this error.
>>>>
>>>> _at_Stefan, Considering future course of action, i was planning to set the
>>>> ddl_lock_timeout in the procedure code through the alter session
>>>> command. However as you mentioned that culprit session can well be
>>>> captured using trace. So i was wondering , if any of the below tracing
>>>> is enough to catch the culprit session/sql details or we should
>>>> specifically go for "systemstate" dump i.e the one as you suggested? I
>>>> never generated or read the systemstate dump, so trying to understand if
>>>> other simple traces as below will also help catching the culprit
>>>> sessions/sqls etc?
>>>>
>>>> alter system set events "00054 trace name systemstate level 266,
>>>> lifetime 1";
>>>>
>>>> Get the trace....
>>>>
>>>> ALTER system SET EVENTS '00054 TRACE NAME CONTEXT OFF';
>>>>
>>>> ****OR
>>>>
>>>> ALTER system SET EVENTS '00054 trace name errorstack level 3';
>>>>
>>>> Get the trace ...
>>>>
>>>> ALTER system SET EVENTS '00054 TRACE NAME CONTEXT OFF';
>>>>
>>>> ****OR tracing the exact session
>>>>
>>>> begin
>>>>
>>>> dbms_monitor.session_trace_enable( session_id => &m_sid, serial_num
>>>> => &m_serial,
>>>>
>>>> waits => true, bind => true, plan_stat =>
>>>> 'all_executions' );
>>>>
>>>> end;
>>>>
>>>> /
>>>>
>>>> -- allow the Ora-00054 failure to happen for the session
>>>>
>>>> begin
>>>>
>>>> dbms_monitor.session_trace_disable(session_id => &m_sid,serial_num =>
>>>> &m_serial);
>>>>
>>>> end;
>>>>
>>>> /
>>>>
>>>>
>>>> [image: User: "image.png"]
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, May 2, 2022 at 6:39 PM Laurentiu Oprea <
>>>> laurentiu.oprea06_at_gmail.com> wrote:
>>>>
>>>>> Hello Pap,
>>>>>
>>>>> I do see a probable cause for your error so before doing a guess
>>>>> completely on feeling, lets get some data :
>>>>>
>>>>> check the data of: _at_dashtop
>>>>> sql_opname,sql_id,sql_exec_start,in_parse session_type='FOREGROUND'
>>>>> starttime endtime
>>>>> https://github.com/tanelpoder/tpt-oracle/blob/master/ash/dashtop.sql
>>>>> BTW: endtime should be the moment you receive the error, starttime can
>>>>> be 15 minutes before
>>>>>
>>>>> At least I`ll make a guess based on feeling + some data :)
>>>>>
>>>>> În lun., 2 mai 2022 la 14:16, Pap <oracle.developer35_at_gmail.com> a
>>>>> scris:
>>>>>
>>>>>> Thank you so much Laurentiu and Dominic.
>>>>>>
>>>>>> I tried checking the dash_wait_chains script for 15minutes window ,
>>>>>> just by passing first parameter as session_type='FOREGROUND' and but not
>>>>>> seeing much blocking waits as in below output. And also as i don't have the
>>>>>> sql_id for the exact ALTER statement captured as it must be very fast and
>>>>>> thus not getting captured in ASH/AWR, so was not able to pass the second
>>>>>> parameter i.e sql_id filter.
>>>>>>
>>>>>> To Dominic's point , i think this ALTER sql_id wont be captured as
>>>>>> its very fast and failing immediately when its seeing the lock from other
>>>>>> session. And also we are struggling to catch the culprit session too. So as
>>>>>> suggested, will try to set the ddl_lock_timeout to ~300 i.e 5 minutes so as
>>>>>> to capture enough blocking session against the ALTER statement logged in
>>>>>> the ASH/AWR to get the details around the culprit session. And considering
>>>>>> the data-load itself take ~1 hrs+ so this amount of additional wait time(5
>>>>>> mins) should be fine. That should possibly also fix the issue if the
>>>>>> blocking lock/culprit session is just holding it for very small time and
>>>>>> releasing it immediately after.
>>>>>>
>>>>>>
>>>>>> [image: User: "image.png"]
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 2 May 2022, 12:03 pm Dominic Brooks, <dombrooks_at_hotmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Problem is that the error is likely to be immediate.
>>>>>>>
>>>>>>> If you increase/set the param ddl_lock_timeout in your load code
>>>>>>> before you do enable constraints then
>>>>>>> a) you might wait sufficient time not to fail, ie your blocker
>>>>>>> finishes before the timeout end and/or
>>>>>>> b) you should capture some information in ASH about the blocking
>>>>>>> scenario.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dominic
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On 2 May 2022, at 06:55, Laurentiu Oprea <
>>>>>>> laurentiu.oprea06_at_gmail.com> wrote:
>>>>>>>
>>>>>>> 
>>>>>>> Hello Pap,
>>>>>>>
>>>>>>> To diagnose your issue ASH should do the trick.
>>>>>>>
>>>>>>> Here is an example(script dash_wait_chains) :
>>>>>>>
>>>>>>> https://tanelpoder.com/2013/11/06/diagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2/
>>>>>>> <https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftanelpoder.com%2F2013%2F11%2F06%2Fdiagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2%2F&data=05%7C01%7C%7C97ae9142e13241b6ef5308da2c005496%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637870677178768226%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=RXOJ6ean3%2FHVZiEqVguXpBwUZWt7sQi4OjPAPTmACX8%3D&reserved=0>
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> În lun., 2 mai 2022 la 05:58, Pap <oracle.developer35_at_gmail.com> a
>>>>>>> scris:
>>>>>>>
>>>>>>>> Hi, We are facing Ora-0054 error on specific time of the month for
>>>>>>>> a job and from the line number noted in the error log its pointing to the
>>>>>>>> alter statement which enables reference constraints after a direct path
>>>>>>>> data load and commit. But we tried monitoring it a few times
>>>>>>>> during the error period/run time and surprisingly we are not seeing any
>>>>>>>> lock(from v$locked_object) on the base table(say e.g. MAIN_TAB in below
>>>>>>>> example) but still the job failed and it succeeded after a couple
>>>>>>>> of rerun attempts. But it seems like , some process runs and takes
>>>>>>>> lock(for may be very small time) and makes this 'ALTER' failure
>>>>>>>> and we were unable to find that out.
>>>>>>>>
>>>>>>>> So I wanted to understand , how to get the culprit session/sql/job
>>>>>>>> so that we can have some dependency set so that this failure won't
>>>>>>>> occur? Is there any possible trace to set which will help us to debug this
>>>>>>>> scenario and get the details around the culprit session? Or somehow
>>>>>>>> we can track it from any of the ASH/AWR views for which we have
>>>>>>>> enough retention in place?
>>>>>>>>
>>>>>>>> *****The error log pointing to one of the below statements
>>>>>>>> failure inside the procedure *******
>>>>>>>>
>>>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ENABLE NOVALIDATE CONSTRAINT
>>>>>>>> MAIN_TAB_R01;
>>>>>>>>
>>>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ENABLE NOVALIDATE CONSTRAINT
>>>>>>>> MAIN_TAB_R03;
>>>>>>>>
>>>>>>>> ORA-00054: resource busy and acquire with NOWAIT specified or
>>>>>>>> timeout expired
>>>>>>>>
>>>>>>>> ********Table structure***************
>>>>>>>>
>>>>>>>> CREATE TABLE SCHEMA1.MAIN_TAB
>>>>>>>>
>>>>>>>> ( FFKEY NUMBER CONSTRAINT MAIN_TAB_C01 NOT NULL,
>>>>>>>>
>>>>>>>> SCKEY NUMBER CONSTRAINT MAIN_TAB_C02 NOT NULL,
>>>>>>>>
>>>>>>>> ACKEY NUMBER,
>>>>>>>>
>>>>>>>> CKEY NUMBER CONSTRAINT MAIN_TAB_C04 NOT NULL,
>>>>>>>>
>>>>>>>> CSTKEY NUMBER CONSTRAINT MAIN_TAB_C05 NOT NULL,
>>>>>>>>
>>>>>>>> EPC_KEY NUMBER CONSTRAINT MAIN_TAB_C06 NOT NULL,
>>>>>>>>
>>>>>>>> CRRKEY NUMBER CONSTRAINT MAIN_TAB_C07 NOT NULL
>>>>>>>>
>>>>>>>> );
>>>>>>>>
>>>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ADD ( CONSTRAINT MAIN_TAB_R01 FOREIGN
>>>>>>>> KEY (CKEY) REFERENCES SCHEMA1.REF_TAB1 (CKEY) ENABLE NOVALIDATE,
>>>>>>>>
>>>>>>>> CONSTRAINT MAIN_TAB_R03 FOREIGN KEY (SCKEY) REFERENCES
>>>>>>>> SCHEMA1.REF_TAB2 (SCKEY) ENABLE NOVALIDATE);
>>>>>>>>
>>>>>>>> ******************
>>>>>>>>
>>>>>>>> CREATE TABLE SCHEMA1.REF_TAB1( CKEY NUMBER CONSTRAINT
>>>>>>>> REF_TAB1_C01 NOT NULL );
>>>>>>>>
>>>>>>>> CREATE UNIQUE INDEX SCHEMA1.REF_TAB1_PK ON SCHEMA1.REF_TAB1(CKEY);
>>>>>>>>
>>>>>>>> ALTER TABLE SCHEMA1.REF_TAB1 ADD ( CONSTRAINT REF_TAB1_PK PRIMARY
>>>>>>>> KEY (CKEY) USING INDEX SCHEMA1.REF_TAB1_PK ENABLE VALIDATE);
>>>>>>>>
>>>>>>>> *******************
>>>>>>>>
>>>>>>>> CREATE TABLE SCHEMA1.REF_TAB2( SCKEY NUMBER CONSTRAINT
>>>>>>>> REF_TAB2_C01 NOT NULL );
>>>>>>>>
>>>>>>>> CREATE UNIQUE INDEX SCHEMA1.REF_TAB2_PK ON SCHEMA1.REF_TAB2(SCKEY);
>>>>>>>>
>>>>>>>> ALTER TABLE SCHEMA1.REF_TAB2 ADD ( CONSTRAINT REF_TAB2_PK PRIMARY
>>>>>>>> KEY (SCKEY) USING INDEX SCHEMA1.REF_TAB2_PK ENABLE VALIDATE);
>>>>>>>>
>>>>>>>> ******************
>>>>>>>>
>>>>>>>>
>>>>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue May 03 2022 - 09:49:44 CEST

Original text of this message