Re: Undo Tablespace issue

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Thu, 6 Jul 2023 20:27:06 +0530
Message-ID: <CAEzWdqdqXeFarxf35VNW05eAQz-oaQE34YosfcE80qhv8sL4qQ_at_mail.gmail.com>



Thank you so much.

_at_Timur

Adding "a.taddr is not null" condition to the main query results into zero rows.

Its not CDB database. This Job always runs on same Node i.e Node-1.

I was unaware of the "local UNDO" but then i see some blogs its introduced in 12.2 and checking this database, i am seeing no entries in database_properties for property_name - LOCAL_UNDO_ENABLED, so it means we dont have local undo enabled here. Btw, will it make things better if we switch to that?

The failure happened at - 7/3/2023 12:44. I have published the details from gv$undostat from 3-jul-2023 08:00:00 to 3-jul-2023 16:00:00.

https://gist.github.com/oracle9999/7e27b7564464d3086de37c5e4a6deaba

On Thu, 6 Jul, 2023, 2:17 pm Jonathan Lewis, <jlewisoracle_at_gmail.com> wrote:

>
> As Timur points out, "sessions that have transactions ... " translates
> into "v$session.taddr is not null".
>
> I would also remove the check on logon_time, a session could have
> connected weeks ago, started a transaction, and then never revisit the
> database.
> The statement you've shown is (IIRC) a standard "capture the last login
> timestamp". Your comment about 10 sessions showing this statement in
> orev_sql_id probably means that those sessions have simply connected to the
> database and done nothing since.
>
>
> Here's a random thought about the problem:
> How many undo segments do you have in the undo tablespace for that
> instance?
> How many of the undo segments are in status OFFLINE
> How many of the UNEXPIRED extents are in the OFFLINE undo segments?
> (I think in theory an undo segment should only be able to go OFFLINE if
> all its extents are EXPIRED - in practice that doesn't seem to be true.)
>
> Every undo segment has to hold at least 2 extents, so OFFLINE segments can
> absorb a lot of space just from those 2 extents; but I'm not sure that
> Oracle can steal an extent from an offline segment anyway, so there may be
> a lot of OFFLINE space that is denied to you due to much larger segments
> with far more than 2 extents being offline. So it's worth refining your
> queries to break the results down by online and offline segments and doing
> a quick check on the number of extents per segment.
>
> Regards
> Jonathan Lewis
>
>
> On Thu, 6 Jul 2023 at 08:35, Timur Akhmadeev <timur.akhmadeev_at_gmail.com>
> wrote:
>
>> I think you need to add "a.taddr is not null" condition to the main query
>> to report "sessions with transactions not reported in v$transaction".
>>
>> Is it a CDB database? Is local undo disabled?
>> Can you share gv$undostat data around the time when ORA-30036 happened
>> (+/- 4h, all nodes).
>>
>> Not sure why the output from the undo sizing report shows 0 in max no
>> space count - either it is not recorded properly in AWR, or something else
>> is going on.
>> Are you sure the job always runs on the same node? Since your undo setup
>> and workload varies between nodes, switching the node where the job runs
>> may cause different issues with undo as well.
>>
>> On Thu, Jul 6, 2023 at 8:24 AM yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> I checked the dba_undotablespaces , Retention is set as NOGUARANTEE for
>>> all the four undo tablespaces for all of those four instances. Also when it
>>> had errored out , I was checking the v$session to see if any transaction
>>> having logon_time was very old and is responsible for holding up the
>>> Unexpired UNDO but didn't find any such session during that time. But yes
>>> that time i joined the v$transaction and v$session with equi join
>>> condition. Note- Undo retention is set as 10800 i.e. 3hrs.
>>>
>>> As you mentioned, I tried to see if any sessions in v$session which are
>>> not in v$transactions(using below query) but are having very old
>>> logon_time(more than ~4 days old). I see below entries,
>>>
>>> select *--a.logon_time, a.sid, a.username,a.machine, a.program,
>>> a.status,a.sql_id
>>> from gv$session a
>>> where not exists (select 1 from gv$transaction b where b.addr=
>>> a.taddr and a.inst_id= b.inst_id )
>>> and a.inst_id=1
>>> -- and schemaname not in ('SYS','DBSNMP')
>>> and logon_time<sysdate-4
>>> order by logon_time asc
>>>
>>> I see ~120 sessions are from SYS and those are background
>>> processes(something as below).
>>>
>>> oracle_at_x073db05.salem.paymentech.com (PSP0)
>>> oracle_at_x073db05.salem.paymentech.com (CLMN)
>>> oracle_at_x073db05.salem.paymentech.com (PMON)
>>> oracle_at_x073db05.salem.paymentech.com (IPC0)
>>> oracle_at_x073db05.salem.paymentech.com (PMAN)
>>> oracle_at_x073db05.salem.paymentech.com (LMS1)
>>> oracle_at_x073db05.salem.paymentech.com (MMAN)
>>> oracle_at_x073db05.salem.paymentech.com (LMS0)
>>> oracle_at_x073db05.salem.paymentech.com (GEN1)
>>> oracle_at_x073db05.salem.paymentech.com (SCMN)
>>> oracle_at_x073db05.salem.paymentech.com (RMS0)
>>> oracle_at_x073db05.salem.paymentech.com (LMD0)
>>>
>>> 30 are from schema DBSNMP, the program is "OMS".
>>>
>>> Only ~10 sessions I see are from application users which are showing
>>> INACTIVE status in V$session and the PREV_SQL_ID is pointing to below sql.
>>>
>>> update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00',
>>> to_date(NULL), :2) where user#=:1
>>>
>>> Do you think any of the above shows any oddity which can be the cause of
>>> the Ora-30036?
>>>
>>> Regards
>>> Yudhi
>>>
>>>
>>>
>>> On Thu, Jul 6, 2023 at 2:00 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>> You've previously reported "unexpired extents" in this instance's undo
>>>> tablespace as 1.1TB. Oracle is capable of "stealing" unexpired extents
>>>> unless the undo tablespace has been declared with the "retention guarantee"
>>>> clause. Can you check whether or not this is the case for this instance. If
>>>> you had an old uncommitted transaction - or other event - that had
>>>> allocated a slot in the transaction table header of one of your undo
>>>> segments then the extents to the future of that transaction start SCN would
>>>> be ACTIVE, not unexpired, so you need to find out you have so much
>>>> unexpired undo.
>>>>
>>>> This may require some unusual poking around, but before you do anything
>>>> else you could query v$session and v$transaction to see if v$session
>>>> reports any transactions that are (persistnently) not visible in
>>>> v$transaction (outer join v$session.taddr with v$transaction.addr).
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>> On Wed, 5 Jul 2023 at 20:27, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Thank You Timur and Paul for this feedback.
>>>>>
>>>>> Below is the output from the blog which Timur pointed to. And the job
>>>>> finished after we kept increasing the UNDO and it's now at ~2TB. And the
>>>>> job ran for 20hrs+. I ran the query by passing awr_snapshot_count as ~50
>>>>> and then ~100 and in both cases I got the similar output as below. But yes
>>>>> as in our case we were not hitting Ora-01555 but Ora-30036 which is
>>>>> different , so how can we interpret this result and find the culprit
>>>>> query/session for Ora-30036?
>>>>>
>>>>> As mentioned the v$transaction was only showing ~300GB of UNDO used by
>>>>> that specific session.
>>>>>
>>>>> The job which was failing multiple times on UNDO was running on Node-1
>>>>> on this four node database.
>>>>>
>>>>> INST_ID CURRENT_SIZE_MB IS_AUTOEXTENSIBLE UNDO_RETENTION
>>>>> UNDO_SIZE_MIN_MB UNDO_SIZE_GUARANTEE_MB LONGEST_SQL LONGEST_SQL_ID
>>>>> MAX_ORA1555_CNT MAX_NO_SPACE_CNT
>>>>> 1 2356086 YES 3 991027.8125 5836862 42.48055556 8yp4vr3aqhnxa 3 0
>>>>> 2 798968 YES 3 39969.25 6178679 50.92833333 8pnuydbnxhbhj 3 0
>>>>> 3 322301 YES 3 120706.5625 424632 206.5272222 0uypm2w7jxtvk 2 0
>>>>> 4 445988 YES 3 130544.75 1729500 72.07694444 ds172hnn0044p 10 0
>>>>>
>>>>> On Wed, Jul 5, 2023 at 3:33 PM Timur Akhmadeev <
>>>>> timur.akhmadeev_at_gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Have you checked MOS Doc 460481.1
>>>>>> Also can you run a script from this post
>>>>>> https://timurakhmadeev.wordpress.com/2018/02/05/undo-sizing/ and
>>>>>> share its output?
>>>>>> It's targeted more for handling ORA-1555 errors yet still can be
>>>>>> useful. The input should be a number of AWR snapshots covering target
>>>>>> undo_retention.
>>>>>>
>>>>>> On Tue, Jul 4, 2023 at 5:59 PM yudhi s <learnerdatabase99_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> Its Oracle version 19C(19.15) and using AUTO undo management. We
>>>>>>> have a job keep failing suddenly with UNDO error as below, after running
>>>>>>> for 6-7hours. It used to run fine in past months. But this time its failing
>>>>>>> multiple times even we rerun. I understand its a big transaction without
>>>>>>> commit in between. However checking (used_ublk) from gv$transaction showing
>>>>>>> the size its consuming is ~200GB only , however the full size of UNDO
>>>>>>> tablespace is ~1.5TB. And then checking the details out of
>>>>>>> DBA_UNDO_EXTENTS, we see majority of the blocks in the UNDO in "UNEXPIRED"
>>>>>>> status as below.
>>>>>>>
>>>>>>> Does it mean that it may be that one odd transaction/session (or it
>>>>>>> may be a SELECT query) is holding all the UNDO and not letting it to mark
>>>>>>> as EXPIRED as its active? But then when doing a select on v$session for
>>>>>>> that exact node to which this UNDO tablespace is aligned and doing
>>>>>>> "logon_time desc" won't show any such long running sessions? Also checked
>>>>>>> GV$UNDOSTAT order by tuned_undoretention desc, but not seeing any such
>>>>>>> session standing out apart from the currently running one.
>>>>>>>
>>>>>>> Any other possible way to get hold of the culprit session/user which
>>>>>>> we can kill to get back the "Unexpired UNDO blocks" back to the "Expired"
>>>>>>> bucket such that, that can be usable?
>>>>>>>
>>>>>>> ORA-30036: unable to extend segment by 128 in undo tablespace
>>>>>>> 'UNDOTBS1'
>>>>>>>
>>>>>>> SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM
>>>>>>> DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;
>>>>>>>
>>>>>>> STATUS SUM(BYTES)/1024/1024/1024 COUNT(*)
>>>>>>> UNEXPIRED 1130.301331 150047
>>>>>>> EXPIRED 0.1328125 3
>>>>>>> ACTIVE 594.6703491 44210
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Yudhi
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Timur Akhmadeev
>>>>>>
>>>>>
>>
>> --
>> Regards
>> Timur Akhmadeev
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 06 2023 - 16:57:06 CEST

Original text of this message