Re: Undo Tablespace issue

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Thu, 6 Jul 2023 10:52:57 +0530
Message-ID: <CAEzWdqfVBOcNH+CsYhH7yg0fX-dqqeA+mnqVKbSCKV+bJQ_QFA_at_mail.gmail.com>



I checked the dba_undotablespaces , Retention is set as NOGUARANTEE for all the four undo tablespaces for all of those four instances. Also when it had errored out , I was checking the v$session to see if any transaction having logon_time was very old and is responsible for holding up the Unexpired UNDO but didn't find any such session during that time. But yes that time i joined the v$transaction and v$session with equi join condition. Note- Undo retention is set as 10800 i.e. 3hrs.

As you mentioned, I tried to see if any sessions in v$session which are not in v$transactions(using below query) but are having very old logon_time(more than ~4 days old). I see below entries,

select *--a.logon_time, a.sid, a.username,a.machine, a.program, a.status,a.sql_id
from gv$session a

   where not exists (select 1 from gv$transaction b where b.addr= a.taddr and a.inst_id= b.inst_id )

   and a.inst_id=1

I see ~120 sessions are from SYS and those are background processes(something as below).

oracle_at_x073db05.salem.paymentech.com (PSP0)
oracle_at_x073db05.salem.paymentech.com (CLMN)
oracle_at_x073db05.salem.paymentech.com (PMON)
oracle_at_x073db05.salem.paymentech.com (IPC0)
oracle_at_x073db05.salem.paymentech.com (PMAN)
oracle_at_x073db05.salem.paymentech.com (LMS1)
oracle_at_x073db05.salem.paymentech.com (MMAN)
oracle_at_x073db05.salem.paymentech.com (LMS0)
oracle_at_x073db05.salem.paymentech.com (GEN1)
oracle_at_x073db05.salem.paymentech.com (SCMN)
oracle_at_x073db05.salem.paymentech.com (RMS0)
oracle_at_x073db05.salem.paymentech.com (LMD0)

30 are from schema DBSNMP, the program is "OMS".

Only ~10 sessions I see are from application users which are showing INACTIVE status in V$session and the PREV_SQL_ID is pointing to below sql.

update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :2) where user#=:1

Do you think any of the above shows any oddity which can be the cause of the Ora-30036?

Regards
Yudhi

On Thu, Jul 6, 2023 at 2:00 AM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

> You've previously reported "unexpired extents" in this instance's undo
> tablespace as 1.1TB. Oracle is capable of "stealing" unexpired extents
> unless the undo tablespace has been declared with the "retention guarantee"
> clause. Can you check whether or not this is the case for this instance. If
> you had an old uncommitted transaction - or other event - that had
> allocated a slot in the transaction table header of one of your undo
> segments then the extents to the future of that transaction start SCN would
> be ACTIVE, not unexpired, so you need to find out you have so much
> unexpired undo.
>
> This may require some unusual poking around, but before you do anything
> else you could query v$session and v$transaction to see if v$session
> reports any transactions that are (persistnently) not visible in
> v$transaction (outer join v$session.taddr with v$transaction.addr).
>
> Regards
> Jonathan Lewis
>
>
> On Wed, 5 Jul 2023 at 20:27, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
>> Thank You Timur and Paul for this feedback.
>>
>> Below is the output from the blog which Timur pointed to. And the job
>> finished after we kept increasing the UNDO and it's now at ~2TB. And the
>> job ran for 20hrs+. I ran the query by passing awr_snapshot_count as ~50
>> and then ~100 and in both cases I got the similar output as below. But yes
>> as in our case we were not hitting Ora-01555 but Ora-30036 which is
>> different , so how can we interpret this result and find the culprit
>> query/session for Ora-30036?
>>
>> As mentioned the v$transaction was only showing ~300GB of UNDO used by
>> that specific session.
>>
>> The job which was failing multiple times on UNDO was running on Node-1 on
>> this four node database.
>>
>> INST_ID CURRENT_SIZE_MB IS_AUTOEXTENSIBLE UNDO_RETENTION UNDO_SIZE_MIN_MB
>> UNDO_SIZE_GUARANTEE_MB LONGEST_SQL LONGEST_SQL_ID MAX_ORA1555_CNT
>> MAX_NO_SPACE_CNT
>> 1 2356086 YES 3 991027.8125 5836862 42.48055556 8yp4vr3aqhnxa 3 0
>> 2 798968 YES 3 39969.25 6178679 50.92833333 8pnuydbnxhbhj 3 0
>> 3 322301 YES 3 120706.5625 424632 206.5272222 0uypm2w7jxtvk 2 0
>> 4 445988 YES 3 130544.75 1729500 72.07694444 ds172hnn0044p 10 0
>>
>> On Wed, Jul 5, 2023 at 3:33 PM Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Have you checked MOS Doc 460481.1
>>> Also can you run a script from this post
>>> https://timurakhmadeev.wordpress.com/2018/02/05/undo-sizing/ and share
>>> its output?
>>> It's targeted more for handling ORA-1555 errors yet still can be useful.
>>> The input should be a number of AWR snapshots covering target
>>> undo_retention.
>>>
>>> On Tue, Jul 4, 2023 at 5:59 PM yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>> Its Oracle version 19C(19.15) and using AUTO undo management. We have a
>>>> job keep failing suddenly with UNDO error as below, after running for
>>>> 6-7hours. It used to run fine in past months. But this time its failing
>>>> multiple times even we rerun. I understand its a big transaction without
>>>> commit in between. However checking (used_ublk) from gv$transaction showing
>>>> the size its consuming is ~200GB only , however the full size of UNDO
>>>> tablespace is ~1.5TB. And then checking the details out of
>>>> DBA_UNDO_EXTENTS, we see majority of the blocks in the UNDO in "UNEXPIRED"
>>>> status as below.
>>>>
>>>> Does it mean that it may be that one odd transaction/session (or it may
>>>> be a SELECT query) is holding all the UNDO and not letting it to mark as
>>>> EXPIRED as its active? But then when doing a select on v$session for that
>>>> exact node to which this UNDO tablespace is aligned and doing
>>>> "logon_time desc" won't show any such long running sessions? Also checked
>>>> GV$UNDOSTAT order by tuned_undoretention desc, but not seeing any such
>>>> session standing out apart from the currently running one.
>>>>
>>>> Any other possible way to get hold of the culprit session/user which we
>>>> can kill to get back the "Unexpired UNDO blocks" back to the "Expired"
>>>> bucket such that, that can be usable?
>>>>
>>>> ORA-30036: unable to extend segment by 128 in undo tablespace
>>>> 'UNDOTBS1'
>>>>
>>>> SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM
>>>> DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;
>>>>
>>>> STATUS SUM(BYTES)/1024/1024/1024 COUNT(*)
>>>> UNEXPIRED 1130.301331 150047
>>>> EXPIRED 0.1328125 3
>>>> ACTIVE 594.6703491 44210
>>>>
>>>>
>>>> Regards
>>>>
>>>> Yudhi
>>>>
>>>
>>>
>>> --
>>> Regards
>>> Timur Akhmadeev
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 06 2023 - 07:22:57 CEST

Original text of this message