Re: Question on underlying mechanics of UNDO processing

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Wed, 7 Dec 2022 18:02:58 +0000
Message-ID: <CAGtsp8=_6v6C0=+cJWM3BC-xVkNJ7YnYaWFVQ2hCaytv2iXn0A_at_mail.gmail.com>



That's a fairly entertaining question, which lots of possibilities.

A couple of things you have to remember (before worrying too much about anything else) is that a long-running delete or "insert as select" might spotaneously rollback and restart, and that more or less quadruples the possible combination (or timing or disrtibution) of events that might appear.

The scale of the insert also matters, as does the current pattern of free space, the number of indexes and the impact of RAC, thanks to the way that ASSM tries to minimise the collision between inserts on different nodes by associating different L1 space management blocks with different nodes, and using blocks in the current "newest" extent before checking extents to see what L1 and L2 bitmaps have become free. There's also some variation between versions because of the problem of timing of bitmap updates by processes that are deleting but not committing - I can't remember the details of what changes happened in which versions, but Oracle had to put in some code to try and handle the problem of an inserting process trying to find a data block that really had free space when the bitmap said it did, but the deleting process had updated the bitmap BEFORE the commit.

It would be a lot easier if you described (very carefully) what you're actually seeing and asked for an explanation. That way you might get something close to the right one.

Anyway:
If the insert is trying to insert a row into a TABLE block that the delete wants to delete from it's (probably) because the delete has already deleted a number of rows from that block so that the L1 has been updated to show free space (even though the space can't be used because the delete hasn't committed). This probably won't happen if the delete is using a tablescan to acquire data, but it's quite likely to happen to an index leaf block and you're likely to see lots of "gc current block 2 way" and "gc current block 3 way" (since it's 3-node RAC) as the sessions try to get the current version of the block to do their insert/delete; you'll probably see a lot of activity on the undo segments as well as the sessions try to get a read-consistent version of the block (as at start of insert / delete) to make sure that they aren't (a) inserting a duplicate of a deleted not committed unique key, or (b) deleting a row that didn't actually exist when the delete started. You'll probably see some index ITL waits (TX enqueue type) as index leaf block splits, and some enqueue which will be tablespace waits, segment space waits, block format waits, and control file waits.

I think you'll also see some "gcs log flush sync" waits because of the volume of cross-instance calls for current blocks, and the more indexes you've got the more db file sequential reads you're likely to see on the undo segments. You're probably going to end up with two sessions growing their undo segments like crazy - so you'll probably see waits for "undo segment extension", but this won't go cross-instance since each instance has its own undo tablespace (of course, the undo blocks can cross back and forth because of the updates and undo on (mostly) index blocks by the two sessions).

You may also find that the data (table and index) segments grow much larger than you would expect - there are various buggy side effects with older ASSM implementations that mean Oracle may add a new segment when there's plenty of space still left in the last segment it added. It's possible that that's been fixed by 11.2.0.4, but almost anything that can go wrong will go wrong when you add RAC to the mixture.

Regards
Jonathan Lewis

On Thu, 1 Dec 2022 at 19:17, Hameed, Amir <amir.hameed_at_sleepnumber.com> wrote:

> Hi,
>
> I am trying to understand the internals of UNDO in the following scenario.
> The database is a three-node RAC version 11.2.0.4:
>
>
>
> 1. A DELETE statement is running on node#3. The statement is deleting
> around 99 million records from table T1. The program running this statement
> has no commit baked into the process.
>
> DELETE FROM T1 WHERE TRUNC(TIMESTAMP) < :1 – 1
>
>
>
> 1. An INSERT statement runs on node #1 inserting records into table T1:
>
> INSERT INTO T1 ( LOG_SEQUENCE, …..)
>
>
>
> 1. At his stage, if the INSERT statement (node#1) is trying to insert
> a record into the same data block of T1 where another record is being
> deleted by the DELETE (node #3) statement, how will Oracle process it? What
> kind of waits/contention the session running INSERT might run into?
>
>
>
> 1. The process running the DELETE statement (node #3) is killed
> because it was taking too long to finish, the UNDO segment has millions of
> UNDO blocks that the killed process is now rolling back.
>
>
>
> 1. At his stage, if an INSERT statement (node#1) tries to insert a
> record into a data block of T1 that had a deleted row that is being rolled
> back now, how will Oracle process it? What kind of waits/contention the
> session running INSERT might run into?
>
>
>
> A feedback would be greatly appreciated.
>
>
>
> Thank you,
>
> Amir
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 07 2022 - 19:02:58 CET

Original text of this message