Re: Undo Tablespace issue
Date: Thu, 6 Jul 2023 09:46:40 +0100
Message-ID: <CAGtsp8=Hb=yNrzd70SoyEG=c6PZA4fk2UowZoXyxvAzqwL_0XQ_at_mail.gmail.com>
On Thu, 6 Jul 2023 at 08:35, Timur Akhmadeev <timur.akhmadeev_at_gmail.com> wrote:
> I think you need to add "a.taddr is not null" condition to the main query
> to report "sessions with transactions not reported in v$transaction".
>
> Is it a CDB database? Is local undo disabled?
> Can you share gv$undostat data around the time when ORA-30036 happened
> (+/- 4h, all nodes).
>
> Not sure why the output from the undo sizing report shows 0 in max no
> space count - either it is not recorded properly in AWR, or something else
> is going on.
> Are you sure the job always runs on the same node? Since your undo setup
> and workload varies between nodes, switching the node where the job runs
> may cause different issues with undo as well.
>
> On Thu, Jul 6, 2023 at 8:24 AM yudhi s <learnerdatabase99_at_gmail.com>
> wrote:
>
>> I checked the dba_undotablespaces , Retention is set as NOGUARANTEE for
>> all the four undo tablespaces for all of those four instances. Also when it
>> had errored out , I was checking the v$session to see if any transaction
>> having logon_time was very old and is responsible for holding up the
>> Unexpired UNDO but didn't find any such session during that time. But yes
>> that time i joined the v$transaction and v$session with equi join
>> condition. Note- Undo retention is set as 10800 i.e. 3hrs.
>>
>> As you mentioned, I tried to see if any sessions in v$session which are
>> not in v$transactions(using below query) but are having very old
>> logon_time(more than ~4 days old). I see below entries,
>>
>> select *--a.logon_time, a.sid, a.username,a.machine, a.program,
>> a.status,a.sql_id
>> from gv$session a
>> where not exists (select 1 from gv$transaction b where b.addr=
>> a.taddr and a.inst_id= b.inst_id )
>> and a.inst_id=1
>> -- and schemaname not in ('SYS','DBSNMP')
>> and logon_time<sysdate-4
>> order by logon_time asc
>>
>> I see ~120 sessions are from SYS and those are background
>> processes(something as below).
>>
>> oracle_at_x073db05.salem.paymentech.com (PSP0)
>> oracle_at_x073db05.salem.paymentech.com (CLMN)
>> oracle_at_x073db05.salem.paymentech.com (PMON)
>> oracle_at_x073db05.salem.paymentech.com (IPC0)
>> oracle_at_x073db05.salem.paymentech.com (PMAN)
>> oracle_at_x073db05.salem.paymentech.com (LMS1)
>> oracle_at_x073db05.salem.paymentech.com (MMAN)
>> oracle_at_x073db05.salem.paymentech.com (LMS0)
>> oracle_at_x073db05.salem.paymentech.com (GEN1)
>> oracle_at_x073db05.salem.paymentech.com (SCMN)
>> oracle_at_x073db05.salem.paymentech.com (RMS0)
>> oracle_at_x073db05.salem.paymentech.com (LMD0)
>>
>> 30 are from schema DBSNMP, the program is "OMS".
>>
>> Only ~10 sessions I see are from application users which are showing
>> INACTIVE status in V$session and the PREV_SQL_ID is pointing to below sql.
>>
>> update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00',
>> to_date(NULL), :2) where user#=:1
>>
>> Do you think any of the above shows any oddity which can be the cause of
>> the Ora-30036?
>>
>> Regards
>> Yudhi
>>
>>
>>
>> On Thu, Jul 6, 2023 at 2:00 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>> You've previously reported "unexpired extents" in this instance's undo
>>> tablespace as 1.1TB. Oracle is capable of "stealing" unexpired extents
>>> unless the undo tablespace has been declared with the "retention guarantee"
>>> clause. Can you check whether or not this is the case for this instance. If
>>> you had an old uncommitted transaction - or other event - that had
>>> allocated a slot in the transaction table header of one of your undo
>>> segments then the extents to the future of that transaction start SCN would
>>> be ACTIVE, not unexpired, so you need to find out you have so much
>>> unexpired undo.
>>>
>>> This may require some unusual poking around, but before you do anything
>>> else you could query v$session and v$transaction to see if v$session
>>> reports any transactions that are (persistnently) not visible in
>>> v$transaction (outer join v$session.taddr with v$transaction.addr).
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>> On Wed, 5 Jul 2023 at 20:27, yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> Thank You Timur and Paul for this feedback.
>>>>
>>>> Below is the output from the blog which Timur pointed to. And the job
>>>> finished after we kept increasing the UNDO and it's now at ~2TB. And the
>>>> job ran for 20hrs+. I ran the query by passing awr_snapshot_count as ~50
>>>> and then ~100 and in both cases I got the similar output as below. But yes
>>>> as in our case we were not hitting Ora-01555 but Ora-30036 which is
>>>> different , so how can we interpret this result and find the culprit
>>>> query/session for Ora-30036?
>>>>
>>>> As mentioned the v$transaction was only showing ~300GB of UNDO used by
>>>> that specific session.
>>>>
>>>> The job which was failing multiple times on UNDO was running on Node-1
>>>> on this four node database.
>>>>
>>>> INST_ID CURRENT_SIZE_MB IS_AUTOEXTENSIBLE UNDO_RETENTION
>>>> UNDO_SIZE_MIN_MB UNDO_SIZE_GUARANTEE_MB LONGEST_SQL LONGEST_SQL_ID
>>>> MAX_ORA1555_CNT MAX_NO_SPACE_CNT
>>>> 1 2356086 YES 3 991027.8125 5836862 42.48055556 8yp4vr3aqhnxa 3 0
>>>> 2 798968 YES 3 39969.25 6178679 50.92833333 8pnuydbnxhbhj 3 0
>>>> 3 322301 YES 3 120706.5625 424632 206.5272222 0uypm2w7jxtvk 2 0
>>>> 4 445988 YES 3 130544.75 1729500 72.07694444 ds172hnn0044p 10 0
>>>>
>>>> On Wed, Jul 5, 2023 at 3:33 PM Timur Akhmadeev <
>>>> timur.akhmadeev_at_gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Have you checked MOS Doc 460481.1
>>>>> Also can you run a script from this post
>>>>> https://timurakhmadeev.wordpress.com/2018/02/05/undo-sizing/ and
>>>>> share its output?
>>>>> It's targeted more for handling ORA-1555 errors yet still can be
>>>>> useful. The input should be a number of AWR snapshots covering target
>>>>> undo_retention.
>>>>>
>>>>> On Tue, Jul 4, 2023 at 5:59 PM yudhi s <learnerdatabase99_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> Its Oracle version 19C(19.15) and using AUTO undo management. We have
>>>>>> a job keep failing suddenly with UNDO error as below, after running for
>>>>>> 6-7hours. It used to run fine in past months. But this time its failing
>>>>>> multiple times even we rerun. I understand its a big transaction without
>>>>>> commit in between. However checking (used_ublk) from gv$transaction showing
>>>>>> the size its consuming is ~200GB only , however the full size of UNDO
>>>>>> tablespace is ~1.5TB. And then checking the details out of
>>>>>> DBA_UNDO_EXTENTS, we see majority of the blocks in the UNDO in "UNEXPIRED"
>>>>>> status as below.
>>>>>>
>>>>>> Does it mean that it may be that one odd transaction/session (or it
>>>>>> may be a SELECT query) is holding all the UNDO and not letting it to mark
>>>>>> as EXPIRED as its active? But then when doing a select on v$session for
>>>>>> that exact node to which this UNDO tablespace is aligned and doing
>>>>>> "logon_time desc" won't show any such long running sessions? Also checked
>>>>>> GV$UNDOSTAT order by tuned_undoretention desc, but not seeing any such
>>>>>> session standing out apart from the currently running one.
>>>>>>
>>>>>> Any other possible way to get hold of the culprit session/user which
>>>>>> we can kill to get back the "Unexpired UNDO blocks" back to the "Expired"
>>>>>> bucket such that, that can be usable?
>>>>>>
>>>>>> ORA-30036: unable to extend segment by 128 in undo tablespace
>>>>>> 'UNDOTBS1'
>>>>>>
>>>>>> SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM
>>>>>> DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;
>>>>>>
>>>>>> STATUS SUM(BYTES)/1024/1024/1024 COUNT(*)
>>>>>> UNEXPIRED 1130.301331 150047
>>>>>> EXPIRED 0.1328125 3
>>>>>> ACTIVE 594.6703491 44210
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Yudhi
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Timur Akhmadeev
>>>>>
>>>>
>
> --
> Regards
> Timur Akhmadeev
>
-- http://www.freelists.org/webpage/oracle-lReceived on Thu Jul 06 2023 - 10:46:40 CEST