Re: Undo Tablespace issue

From: Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
Date: Thu, 6 Jul 2023 10:34:49 +0300
Message-ID: <CACGsLCJ-2+zMCZP9R2D1mZBg2oc5-4NPQEVz9hoHyCnsW7nw7Q_at_mail.gmail.com>



I think you need to add "a.taddr is not null" condition to the main query to report "sessions with transactions not reported in v$transaction".

Is it a CDB database? Is local undo disabled? Can you share gv$undostat data around the time when ORA-30036 happened (+/- 4h, all nodes).

Not sure why the output from the undo sizing report shows 0 in max no space count - either it is not recorded properly in AWR, or something else is going on.
Are you sure the job always runs on the same node? Since your undo setup and workload varies between nodes, switching the node where the job runs may cause different issues with undo as well.

On Thu, Jul 6, 2023 at 8:24 AM yudhi s <learnerdatabase99_at_gmail.com> wrote:

> I checked the dba_undotablespaces , Retention is set as NOGUARANTEE for
> all the four undo tablespaces for all of those four instances. Also when it
> had errored out , I was checking the v$session to see if any transaction
> having logon_time was very old and is responsible for holding up the
> Unexpired UNDO but didn't find any such session during that time. But yes
> that time i joined the v$transaction and v$session with equi join
> condition. Note- Undo retention is set as 10800 i.e. 3hrs.
>
> As you mentioned, I tried to see if any sessions in v$session which are
> not in v$transactions(using below query) but are having very old
> logon_time(more than ~4 days old). I see below entries,
>
> select *--a.logon_time, a.sid, a.username,a.machine, a.program,
> a.status,a.sql_id
> from gv$session a
> where not exists (select 1 from gv$transaction b where b.addr= a.taddr
> and a.inst_id= b.inst_id )
> and a.inst_id=1
> -- and schemaname not in ('SYS','DBSNMP')
> and logon_time<sysdate-4
> order by logon_time asc
>
> I see ~120 sessions are from SYS and those are background
> processes(something as below).
>
> oracle_at_x073db05.salem.paymentech.com (PSP0)
> oracle_at_x073db05.salem.paymentech.com (CLMN)
> oracle_at_x073db05.salem.paymentech.com (PMON)
> oracle_at_x073db05.salem.paymentech.com (IPC0)
> oracle_at_x073db05.salem.paymentech.com (PMAN)
> oracle_at_x073db05.salem.paymentech.com (LMS1)
> oracle_at_x073db05.salem.paymentech.com (MMAN)
> oracle_at_x073db05.salem.paymentech.com (LMS0)
> oracle_at_x073db05.salem.paymentech.com (GEN1)
> oracle_at_x073db05.salem.paymentech.com (SCMN)
> oracle_at_x073db05.salem.paymentech.com (RMS0)
> oracle_at_x073db05.salem.paymentech.com (LMD0)
>
> 30 are from schema DBSNMP, the program is "OMS".
>
> Only ~10 sessions I see are from application users which are showing
> INACTIVE status in V$session and the PREV_SQL_ID is pointing to below sql.
>
> update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00',
> to_date(NULL), :2) where user#=:1
>
> Do you think any of the above shows any oddity which can be the cause of
> the Ora-30036?
>
> Regards
> Yudhi
>
>
>
> On Thu, Jul 6, 2023 at 2:00 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> You've previously reported "unexpired extents" in this instance's undo
>> tablespace as 1.1TB. Oracle is capable of "stealing" unexpired extents
>> unless the undo tablespace has been declared with the "retention guarantee"
>> clause. Can you check whether or not this is the case for this instance. If
>> you had an old uncommitted transaction - or other event - that had
>> allocated a slot in the transaction table header of one of your undo
>> segments then the extents to the future of that transaction start SCN would
>> be ACTIVE, not unexpired, so you need to find out you have so much
>> unexpired undo.
>>
>> This may require some unusual poking around, but before you do anything
>> else you could query v$session and v$transaction to see if v$session
>> reports any transactions that are (persistnently) not visible in
>> v$transaction (outer join v$session.taddr with v$transaction.addr).
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Wed, 5 Jul 2023 at 20:27, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>>
>>> Thank You Timur and Paul for this feedback.
>>>
>>> Below is the output from the blog which Timur pointed to. And the job
>>> finished after we kept increasing the UNDO and it's now at ~2TB. And the
>>> job ran for 20hrs+. I ran the query by passing awr_snapshot_count as ~50
>>> and then ~100 and in both cases I got the similar output as below. But yes
>>> as in our case we were not hitting Ora-01555 but Ora-30036 which is
>>> different , so how can we interpret this result and find the culprit
>>> query/session for Ora-30036?
>>>
>>> As mentioned the v$transaction was only showing ~300GB of UNDO used by
>>> that specific session.
>>>
>>> The job which was failing multiple times on UNDO was running on Node-1
>>> on this four node database.
>>>
>>> INST_ID CURRENT_SIZE_MB IS_AUTOEXTENSIBLE UNDO_RETENTION
>>> UNDO_SIZE_MIN_MB UNDO_SIZE_GUARANTEE_MB LONGEST_SQL LONGEST_SQL_ID
>>> MAX_ORA1555_CNT MAX_NO_SPACE_CNT
>>> 1 2356086 YES 3 991027.8125 5836862 42.48055556 8yp4vr3aqhnxa 3 0
>>> 2 798968 YES 3 39969.25 6178679 50.92833333 8pnuydbnxhbhj 3 0
>>> 3 322301 YES 3 120706.5625 424632 206.5272222 0uypm2w7jxtvk 2 0
>>> 4 445988 YES 3 130544.75 1729500 72.07694444 ds172hnn0044p 10 0
>>>
>>> On Wed, Jul 5, 2023 at 3:33 PM Timur Akhmadeev <
>>> timur.akhmadeev_at_gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Have you checked MOS Doc 460481.1
>>>> Also can you run a script from this post
>>>> https://timurakhmadeev.wordpress.com/2018/02/05/undo-sizing/ and share
>>>> its output?
>>>> It's targeted more for handling ORA-1555 errors yet still can be
>>>> useful. The input should be a number of AWR snapshots covering target
>>>> undo_retention.
>>>>
>>>> On Tue, Jul 4, 2023 at 5:59 PM yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> Its Oracle version 19C(19.15) and using AUTO undo management. We have
>>>>> a job keep failing suddenly with UNDO error as below, after running for
>>>>> 6-7hours. It used to run fine in past months. But this time its failing
>>>>> multiple times even we rerun. I understand its a big transaction without
>>>>> commit in between. However checking (used_ublk) from gv$transaction showing
>>>>> the size its consuming is ~200GB only , however the full size of UNDO
>>>>> tablespace is ~1.5TB. And then checking the details out of
>>>>> DBA_UNDO_EXTENTS, we see majority of the blocks in the UNDO in "UNEXPIRED"
>>>>> status as below.
>>>>>
>>>>> Does it mean that it may be that one odd transaction/session (or it
>>>>> may be a SELECT query) is holding all the UNDO and not letting it to mark
>>>>> as EXPIRED as its active? But then when doing a select on v$session for
>>>>> that exact node to which this UNDO tablespace is aligned and doing
>>>>> "logon_time desc" won't show any such long running sessions? Also checked
>>>>> GV$UNDOSTAT order by tuned_undoretention desc, but not seeing any such
>>>>> session standing out apart from the currently running one.
>>>>>
>>>>> Any other possible way to get hold of the culprit session/user which
>>>>> we can kill to get back the "Unexpired UNDO blocks" back to the "Expired"
>>>>> bucket such that, that can be usable?
>>>>>
>>>>> ORA-30036: unable to extend segment by 128 in undo tablespace
>>>>> 'UNDOTBS1'
>>>>>
>>>>> SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM
>>>>> DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;
>>>>>
>>>>> STATUS SUM(BYTES)/1024/1024/1024 COUNT(*)
>>>>> UNEXPIRED 1130.301331 150047
>>>>> EXPIRED 0.1328125 3
>>>>> ACTIVE 594.6703491 44210
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Yudhi
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Timur Akhmadeev
>>>>
>>>

-- 
Regards
Timur Akhmadeev


--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 06 2023 - 09:34:49 CEST

Original text of this message