Re: Undo Tablespace issue

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Thu, 6 Jul 2023 20:29:09 +0530
Message-ID: <CAEzWdqcdoVy56TL0YzaAfvNPm+Mg=ormqkpM7W0r6uX0_PG42g_at_mail.gmail.com>



Thank you Jonathan.

Below is the current stats i see.
Currently the UNDOTBS1 is showing below stats i.e. ~819GB contributed to UNEXPIRED extents(which i believe is still on the higher side) and the count of those type of extents is ~57K in total.

SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;

STATUS SUM(BYTES)/1024/1024/1024 COUNT(*) EXPIRED 182.7180 22642
UNEXPIRED 819.250 57899
ACTIVE 5.658 103 Then checking DBA_ROLLBACK_SEGS, In total there are ~150 segments and out of those ~142 are "online" and 8 are "offline".

Then checking DBA_UNDO_EXTENTS, Seeing ~1689 extents which are part of those OFFLINE segment but are in UNEXPIRED state. But are contributed to ~29Gb only. The majority is pointing to the ONLINE Undo segments "UNEXPIRED" extents.

select count(distinct segment_name), sum(decode(status, 'ONLINE', 1, 0)) from DBA_ROLLBACK_SEGS where tablespace_name='UNDOTBS1' --group by segment_name order by count(*) desc

COUNT(DISTINCTSEGMENT_NAME) SUM(DECODE(STATUS,'ONLINE',1,0)) 150 142

select status, count(*),SUM(BYTES)/1024/1024/1024 from DBA_UNDO_EXTENTS where segment_name in (select segment_name from DBA_ROLLBACK_SEGS where tablespace_name='UNDOTBS1' and status='OFFLINE' ) and tablespace_name='UNDOTBS1'
group by status

STATUS COUNT(*) SUM(BYTES)/1024/1024/1024 EXPIRED 515 7.184
UNEXPIRED 1689 29.420 select status, count(*),SUM(BYTES)/1024/1024/1024 from DBA_UNDO_EXTENTS where segment_name in (select segment_name from DBA_ROLLBACK_SEGS where tablespace_name='UNDOTBS1' and status='ONLINE' ) and tablespace_name='UNDOTBS1'
group by status

STATUS COUNT(*) SUM(BYTES)/1024/1024/1024 EXPIRED 22108 174.637
UNEXPIRED 56440 802.550
ACTIVE 342 17.929 On Thu, 6 Jul, 2023, 8:27 pm yudhi s, <learnerdatabase99_at_gmail.com> wrote:

> Thank you so much.
>
> _at_Timur
>
> Adding "a.taddr is not null" condition to the main query results into zero
> rows.
>
> Its not CDB database. This Job always runs on same Node i.e Node-1.
>
> I was unaware of the "local UNDO" but then i see some blogs its introduced
> in 12.2 and checking this database, i am seeing no entries in
> database_properties for property_name - LOCAL_UNDO_ENABLED, so it means we
> dont have local undo enabled here. Btw, will it make things better if we
> switch to that?
>
> The failure happened at - 7/3/2023 12:44. I have published the details
> from gv$undostat from 3-jul-2023 08:00:00 to 3-jul-2023 16:00:00.
>
> https://gist.github.com/oracle9999/7e27b7564464d3086de37c5e4a6deaba
>
>
>
>
>
>
>
> On Thu, 6 Jul, 2023, 2:17 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> As Timur points out, "sessions that have transactions ... " translates
>> into "v$session.taddr is not null".
>>
>> I would also remove the check on logon_time, a session could have
>> connected weeks ago, started a transaction, and then never revisit the
>> database.
>> The statement you've shown is (IIRC) a standard "capture the last login
>> timestamp". Your comment about 10 sessions showing this statement in
>> orev_sql_id probably means that those sessions have simply connected to the
>> database and done nothing since.
>>
>>
>> Here's a random thought about the problem:
>> How many undo segments do you have in the undo tablespace for that
>> instance?
>> How many of the undo segments are in status OFFLINE
>> How many of the UNEXPIRED extents are in the OFFLINE undo segments?
>> (I think in theory an undo segment should only be able to go OFFLINE if
>> all its extents are EXPIRED - in practice that doesn't seem to be true.)
>>
>> Every undo segment has to hold at least 2 extents, so OFFLINE segments
>> can absorb a lot of space just from those 2 extents; but I'm not sure that
>> Oracle can steal an extent from an offline segment anyway, so there may be
>> a lot of OFFLINE space that is denied to you due to much larger segments
>> with far more than 2 extents being offline. So it's worth refining your
>> queries to break the results down by online and offline segments and doing
>> a quick check on the number of extents per segment.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Thu, 6 Jul 2023 at 08:35, Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
>> wrote:
>>
>>> I think you need to add "a.taddr is not null" condition to the main
>>> query to report "sessions with transactions not reported in v$transaction".
>>>
>>> Is it a CDB database? Is local undo disabled?
>>> Can you share gv$undostat data around the time when ORA-30036 happened
>>> (+/- 4h, all nodes).
>>>
>>> Not sure why the output from the undo sizing report shows 0 in max no
>>> space count - either it is not recorded properly in AWR, or something else
>>> is going on.
>>> Are you sure the job always runs on the same node? Since your undo setup
>>> and workload varies between nodes, switching the node where the job runs
>>> may cause different issues with undo as well.
>>>
>>> On Thu, Jul 6, 2023 at 8:24 AM yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> I checked the dba_undotablespaces , Retention is set as NOGUARANTEE for
>>>> all the four undo tablespaces for all of those four instances. Also when it
>>>> had errored out , I was checking the v$session to see if any transaction
>>>> having logon_time was very old and is responsible for holding up the
>>>> Unexpired UNDO but didn't find any such session during that time. But yes
>>>> that time i joined the v$transaction and v$session with equi join
>>>> condition. Note- Undo retention is set as 10800 i.e. 3hrs.
>>>>
>>>> As you mentioned, I tried to see if any sessions in v$session which are
>>>> not in v$transactions(using below query) but are having very old
>>>> logon_time(more than ~4 days old). I see below entries,
>>>>
>>>> select *--a.logon_time, a.sid, a.username,a.machine, a.program,
>>>> a.status,a.sql_id
>>>> from gv$session a
>>>> where not exists (select 1 from gv$transaction b where b.addr=
>>>> a.taddr and a.inst_id= b.inst_id )
>>>> and a.inst_id=1
>>>> -- and schemaname not in ('SYS','DBSNMP')
>>>> and logon_time<sysdate-4
>>>> order by logon_time asc
>>>>
>>>> I see ~120 sessions are from SYS and those are background
>>>> processes(something as below).
>>>>
>>>> oracle_at_x073db05.salem.paymentech.com (PSP0)
>>>> oracle_at_x073db05.salem.paymentech.com (CLMN)
>>>> oracle_at_x073db05.salem.paymentech.com (PMON)
>>>> oracle_at_x073db05.salem.paymentech.com (IPC0)
>>>> oracle_at_x073db05.salem.paymentech.com (PMAN)
>>>> oracle_at_x073db05.salem.paymentech.com (LMS1)
>>>> oracle_at_x073db05.salem.paymentech.com (MMAN)
>>>> oracle_at_x073db05.salem.paymentech.com (LMS0)
>>>> oracle_at_x073db05.salem.paymentech.com (GEN1)
>>>> oracle_at_x073db05.salem.paymentech.com (SCMN)
>>>> oracle_at_x073db05.salem.paymentech.com (RMS0)
>>>> oracle_at_x073db05.salem.paymentech.com (LMD0)
>>>>
>>>> 30 are from schema DBSNMP, the program is "OMS".
>>>>
>>>> Only ~10 sessions I see are from application users which are showing
>>>> INACTIVE status in V$session and the PREV_SQL_ID is pointing to below sql.
>>>>
>>>> update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00',
>>>> to_date(NULL), :2) where user#=:1
>>>>
>>>> Do you think any of the above shows any oddity which can be the cause
>>>> of the Ora-30036?
>>>>
>>>> Regards
>>>> Yudhi
>>>>
>>>>
>>>>
>>>> On Thu, Jul 6, 2023 at 2:00 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>>> wrote:
>>>>
>>>>> You've previously reported "unexpired extents" in this instance's undo
>>>>> tablespace as 1.1TB. Oracle is capable of "stealing" unexpired extents
>>>>> unless the undo tablespace has been declared with the "retention guarantee"
>>>>> clause. Can you check whether or not this is the case for this instance. If
>>>>> you had an old uncommitted transaction - or other event - that had
>>>>> allocated a slot in the transaction table header of one of your undo
>>>>> segments then the extents to the future of that transaction start SCN would
>>>>> be ACTIVE, not unexpired, so you need to find out you have so much
>>>>> unexpired undo.
>>>>>
>>>>> This may require some unusual poking around, but before you do
>>>>> anything else you could query v$session and v$transaction to see if
>>>>> v$session reports any transactions that are (persistnently) not visible in
>>>>> v$transaction (outer join v$session.taddr with v$transaction.addr).
>>>>>
>>>>> Regards
>>>>> Jonathan Lewis
>>>>>
>>>>>
>>>>> On Wed, 5 Jul 2023 at 20:27, yudhi s <learnerdatabase99_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank You Timur and Paul for this feedback.
>>>>>>
>>>>>> Below is the output from the blog which Timur pointed to. And the job
>>>>>> finished after we kept increasing the UNDO and it's now at ~2TB. And the
>>>>>> job ran for 20hrs+. I ran the query by passing awr_snapshot_count as ~50
>>>>>> and then ~100 and in both cases I got the similar output as below. But yes
>>>>>> as in our case we were not hitting Ora-01555 but Ora-30036 which is
>>>>>> different , so how can we interpret this result and find the culprit
>>>>>> query/session for Ora-30036?
>>>>>>
>>>>>> As mentioned the v$transaction was only showing ~300GB of UNDO used
>>>>>> by that specific session.
>>>>>>
>>>>>> The job which was failing multiple times on UNDO was running on
>>>>>> Node-1 on this four node database.
>>>>>>
>>>>>> INST_ID CURRENT_SIZE_MB IS_AUTOEXTENSIBLE UNDO_RETENTION
>>>>>> UNDO_SIZE_MIN_MB UNDO_SIZE_GUARANTEE_MB LONGEST_SQL LONGEST_SQL_ID
>>>>>> MAX_ORA1555_CNT MAX_NO_SPACE_CNT
>>>>>> 1 2356086 YES 3 991027.8125 5836862 42.48055556 8yp4vr3aqhnxa 3 0
>>>>>> 2 798968 YES 3 39969.25 6178679 50.92833333 8pnuydbnxhbhj 3 0
>>>>>> 3 322301 YES 3 120706.5625 424632 206.5272222 0uypm2w7jxtvk 2 0
>>>>>> 4 445988 YES 3 130544.75 1729500 72.07694444 ds172hnn0044p 10 0
>>>>>>
>>>>>> On Wed, Jul 5, 2023 at 3:33 PM Timur Akhmadeev <
>>>>>> timur.akhmadeev_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Have you checked MOS Doc 460481.1
>>>>>>> Also can you run a script from this post
>>>>>>> https://timurakhmadeev.wordpress.com/2018/02/05/undo-sizing/ and
>>>>>>> share its output?
>>>>>>> It's targeted more for handling ORA-1555 errors yet still can be
>>>>>>> useful. The input should be a number of AWR snapshots covering target
>>>>>>> undo_retention.
>>>>>>>
>>>>>>> On Tue, Jul 4, 2023 at 5:59 PM yudhi s <learnerdatabase99_at_gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> Its Oracle version 19C(19.15) and using AUTO undo management. We
>>>>>>>> have a job keep failing suddenly with UNDO error as below, after running
>>>>>>>> for 6-7hours. It used to run fine in past months. But this time its failing
>>>>>>>> multiple times even we rerun. I understand its a big transaction without
>>>>>>>> commit in between. However checking (used_ublk) from gv$transaction showing
>>>>>>>> the size its consuming is ~200GB only , however the full size of UNDO
>>>>>>>> tablespace is ~1.5TB. And then checking the details out of
>>>>>>>> DBA_UNDO_EXTENTS, we see majority of the blocks in the UNDO in "UNEXPIRED"
>>>>>>>> status as below.
>>>>>>>>
>>>>>>>> Does it mean that it may be that one odd transaction/session (or it
>>>>>>>> may be a SELECT query) is holding all the UNDO and not letting it to mark
>>>>>>>> as EXPIRED as its active? But then when doing a select on v$session for
>>>>>>>> that exact node to which this UNDO tablespace is aligned and doing
>>>>>>>> "logon_time desc" won't show any such long running sessions? Also checked
>>>>>>>> GV$UNDOSTAT order by tuned_undoretention desc, but not seeing any such
>>>>>>>> session standing out apart from the currently running one.
>>>>>>>>
>>>>>>>> Any other possible way to get hold of the culprit session/user
>>>>>>>> which we can kill to get back the "Unexpired UNDO blocks" back to the
>>>>>>>> "Expired" bucket such that, that can be usable?
>>>>>>>>
>>>>>>>> ORA-30036: unable to extend segment by 128 in undo tablespace
>>>>>>>> 'UNDOTBS1'
>>>>>>>>
>>>>>>>> SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM
>>>>>>>> DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;
>>>>>>>>
>>>>>>>> STATUS SUM(BYTES)/1024/1024/1024 COUNT(*)
>>>>>>>> UNEXPIRED 1130.301331 150047
>>>>>>>> EXPIRED 0.1328125 3
>>>>>>>> ACTIVE 594.6703491 44210
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Yudhi
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>> Timur Akhmadeev
>>>>>>>
>>>>>>
>>>
>>> --
>>> Regards
>>> Timur Akhmadeev
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 06 2023 - 16:59:09 CEST

Original text of this message