Re: Better Delete method

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Mon, 12 Jul 2021 00:15:09 +0100
Message-ID: <CAGtsp8mMhi0mvTecFRT4ZgZmO+6qg3wx3Jxm5R5wVWqDTCYi3g_at_mail.gmail.com>



Almost certainly true.

The CTAS is the most efficient approach in terms of undo/redo for getting rid of the 50% of rows you don't want; using the exchange partition trick avoids the need to recreate grants and synonyms.
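In outline - with made-up table, column, and filter names, and ignoring the index and constraint handling - the trick looks like this:

create table tab1_x
partition by range (id)
(partition p_all values less than (maxvalue))
as
select * from tab1 where keep_flag = 'Y';

alter table tab1_x exchange partition p_all with table tab1;
drop table tab1_x;

After the exchange the original table - keeping its grants and synonyms - holds only the rows you wanted, and the interim table holds the segment with the data to be discarded.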

There are a couple of traps with the method, though, that the initial script won't address (and, personally, I wouldn't want to use a script like that anyway). The first relates to the primary key: if you do CTAS (create table t2 as select * from t1) and have declared a primary key on t1, then the PK column on t1 will be reported by DESCRIBE as NOT NULL; depending on how you created the primary key, the same may not be true for t2. If it is not true, then adding a primary key to t2 will require two tablescans of t2 - one to check and impose a NOT NULL constraint, and a second to create the required supporting index. See
https://jonathanlewis.wordpress.com/2016/01/27/add-primary-key/
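In outline, the workaround (hypothetical names) is to declare the NOT NULL in the CTAS column list so that the subsequent "add primary key" needs only the one tablescan to build its index:

create table t2 (id not null, col2, col3)
as
select id, col2, col3 from t1;

alter table t2 add constraint t2_pk primary key (id);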

The second trap is that if you have any function-based indexes on the table (which you don't), or any virtual columns, or any columns you have marked unused (but not dropped), then the extract from user_tab_cols will include those columns in the CTAS statement and they will have to be edited out. Worse, though, as a follow-up on unused columns: if you have any unused columns in the table you need to create the copy with the same columns in the right positions and mark them as unused in the copy, otherwise the exchange will raise: ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION. (Oracle 12c allows you to "create table for exchange", which means it will cater for all the odd, invisible things you can't see in the normal use of the table but which exist in the data dictionary.)
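For reference, the syntax (12.2 onwards, as far as I recall) is simply (hypothetical names):

create table t2 for exchange with table t1;

Note that this form cannot be combined with an "as select" clause, so you still populate the copy in a separate step.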

In passing - if you have any unused columns in your table you probably don't want to drop them in order to avoid the mismatch problem, as this will result in Oracle updating every row once for every column dropped, and that will generate a huge amount of undo and redo. https://www.red-gate.com/simple-talk/sql/oracle/dropping-columns/
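For the avoidance of doubt, the cheap operation and the expensive one are (hypothetical column name):

alter table tab1 set unused column old_col;   -- dictionary-only change, effectively instant
alter table tab1 drop unused columns;         -- rewrites every row, generating huge undo/redo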

Regards
Jonathan Lewis

On Sun, 11 Jul 2021 at 20:25, Lok P <loknath.73_at_gmail.com> wrote:

> Thank you.
>
> If I understand correctly, it looks like the CTAS way will be the fastest
> in our case and should be followed if we can afford a couple of hours of
> downtime. And as you mentioned, the ROWID-based delete with bulk collect
> will generate gobs of UNDO/REDO, and we may also need to reorg the
> table/indexes at the end, considering 50% of the data is getting deleted.
>
> And yes, both the primary key constraint/index (for data integrity) and
> the other composite index (for SELECT queries) are needed. But as you
> said, we will try to create these two indexes in parallel from two
> sessions to make this process faster. The doubt I had was: if I create a
> unique index with parallel 32 from one session, and later create the PK
> constraint using that previously created unique index, will that happen
> in seconds? And can the constraint be created in VALIDATE status, or will
> the PK validation happen separately and take time?
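>
> For clarity, the sequence I have in mind (hypothetical names) is roughly:
>
> create unique index tab1_bkp_pk_ix on tab1_bkp (id) parallel 32;
>
> alter table tab1_bkp
>   add constraint tab1_bkp_pk primary key (id)
>   using index tab1_bkp_pk_ix;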
>
>
> On Sun, Jul 11, 2021 at 10:37 PM ahmed.fikri_at_t-online.de <
> ahmed.fikri_at_t-online.de> wrote:
>
>> Given the size of the table you mentioned, I'm assuming that clearing the
>> data via partition exchange will require a maintenance window of less than
>> 60 minutes.
>>
>>
>>
>> Are the PK and the other index critical to the application (you
>> mentioned no FKs are pointing to this table)? Why not create them
>> afterwards in two sessions, each with DOP 32? The whole operation should
>> take less than 60 minutes.
>>
>>
>>
>> If no downtime is tolerated, you can hide your table behind a view and
>> use triggers to implement some sort of redo mechanism, analogous to the
>> one Oracle itself uses.
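>>
>> A rough sketch of what I mean (hypothetical names and columns): an
>> INSTEAD OF trigger on the view writes each change both to the base table
>> and to a change-log table, so DML arriving during the rebuild can be
>> replayed afterwards.
>>
>> create or replace view tab1_v as select * from tab1;
>>
>> create table tab1_changes as select * from tab1 where 1 = 0;
>>
>> create or replace trigger tab1_v_ins
>> instead of insert on tab1_v
>> for each row
>> begin
>>   insert into tab1 (id, col2) values (:new.id, :new.col2);
>>   insert into tab1_changes (id, col2) values (:new.id, :new.col2);
>> end;
>> /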
>>
>>
>>
>> To remove more than 400 GB of data, UNDO/REDO should be avoided, as it
>> only pollutes the entire database; it is just boilerplate. Why do we need
>> to generate all of this data when in some situations we are 300% sure
>> that we don't need it? (Your first method is not efficient anyway: it
>> also requires a single block read for each row, which is far too much.)
>>
>>
>>
>> The problem is that Oracle is not ready to change concepts from the
>> 1970s. So why not introduce (implement) a hint such as
>> /*+ dml_no_consistency_read */? If this hint is used in a delete
>> statement, nothing is written to the UNDO, and if the user chooses to
>> roll back, the appropriate redo records are identified and skipped.
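>>
>> Usage would then look like this (the hint is purely hypothetical, of
>> course):
>>
>> delete /*+ dml_no_consistency_read */ from tab1 where purge_flag = 'Y';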
>>
>>
>>
>> Best regards
>>
>> Ahmed
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>>
>> Subject: Re: Better Delete method
>>
>> Date: 2021-07-11T15:07:14+0200
>>
>> From: "Lok P" <loknath.73_at_gmail.com>
>>
>> To: "ahmed.fikri_at_t-online.de" <ahmed.fikri_at_t-online.de>
>>
>>
>>
>>
>>
>>
>> Thank you Ahmed. So this code does the data purge by creating a temp
>> table of similar structure to the main table (indexes and constraints
>> need to be exactly the same) but partitioned, so as to take advantage of
>> the partition exchange approach. But yes, it seems the DB resource/time
>> consumption of this method is the same as that of method-2, which I
>> mentioned in my initial post. And it does need downtime, because if any
>> DML happens on the base table between the CTAS and the final partition
>> exchange, that data will be missed.
>>
>> And is it correct that either way (CTAS with or without partition
>> exchange), the primary key constraint can be created in VALIDATE state
>> without much time and resource, if we first create the UNIQUE index and
>> then create the PK constraint in VALIDATE state using that same unique
>> index? Please correct me if I'm wrong.
>>
>>
>>
>>
>>
>> On Sun, Jul 11, 2021 at 1:44 AM ahmed.fikri_at_t-online.de <
>> ahmed.fikri_at_t-online.de> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> the attached file contains a method for deleting data from a big table
>>> using partition exchange (you will have to enhance the method to handle
>>> indexes....)
>>>
>>>
>>>
>>> Best regards
>>>
>>> Ahmed
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>>
>>> Subject: Better Delete method
>>>
>>> Date: 2021-07-10T21:47:55+0200
>>>
>>> From: "Lok P" <loknath.73_at_gmail.com>
>>>
>>> To: "Oracle L" <oracle-l_at_freelists.org>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Hello, this database is version 11.2.0.4 on Oracle Exadata. A table
>>> (say TAB1) holds ~900 million rows, is ~222GB in size, and is not
>>> partitioned. It has two indexes: a three-column composite index of
>>> ~98GB, and the primary key on a single column with an index of ~23GB.
>>> As part of a requirement we need to delete/purge 50% of the data from
>>> this table. No referential constraints exist here. So I wanted to
>>> understand which of the two methods below is the better one to opt for,
>>> or whether there is any other, better option?
>>>
>>> I understand method-1 is achievable ONLINE but is slower, while
>>> method-2 will be faster. So if we can afford ~1-2 hrs of downtime, is it
>>> good to go with method-2 as the delete approach? Deleting 50% of the
>>> rows with method-1 may in any case need a table move + index rebuild
>>> (which will again need downtime on 11.2) to lower the high water mark
>>> and bring the indexes back to a compact, normal state. Please advise.
>>>
>>>
>>> Method-1:-
>>>
>>> Step-1:
>>> In a cursor, pick the ~450 million ROWIDs of the rows to be deleted,
>>> based on the filter criteria.
>>> Step-2:
>>> Delete based on those ROWIDs using bulk collect with a LIMIT of 50K
>>> row IDs per batch, committing within the loop.
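>>>
>>> A minimal sketch of this (hypothetical table name and filter):
>>>
>>> declare
>>>   cursor c is
>>>     select rowid from tab1 where purge_flag = 'Y';
>>>   type rid_tab_t is table of rowid index by pls_integer;
>>>   l_rids rid_tab_t;
>>> begin
>>>   open c;
>>>   loop
>>>     fetch c bulk collect into l_rids limit 50000;
>>>     exit when l_rids.count = 0;
>>>     forall i in 1 .. l_rids.count
>>>       delete from tab1 where rowid = l_rids(i);
>>>     commit;  -- commit per batch; note the fetch-across-commit ORA-01555 risk
>>>   end loop;
>>>   close c;
>>> end;
>>> /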
>>>
>>> Method-2:-
>>>
>>> Step-1:
>>> Create a new table TAB1_BKP using CTAS: AS SELECT * FROM TAB1 WHERE
>>> (required filter criteria, which will pick the ~450 million required
>>> rows).
>>> Step-2:
>>> Create the composite index in parallel, to make it as fast as
>>> possible.
>>> Create the unique index on the same column as the PK.
>>> Create the primary key constraint with NOVALIDATE (because creating it
>>> with VALIDATE may take a lot of time to validate existing data) using
>>> the unique index above (this operation should happen in seconds as the
>>> index was already created in the previous step).
>>> Rename TAB1_BKP to TAB1 and TAB1 to TAB1_BKP (which can serve as a
>>> backup for a few days and later be dropped).
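>>>
>>> A sketch of the whole sequence (hypothetical names, filter, and column
>>> lists):
>>>
>>> create table tab1_bkp parallel 32 nologging as
>>>   select * from tab1 where keep_flag = 'Y';
>>>
>>> create index tab1_bkp_ix on tab1_bkp (col1, col2, col3) parallel 32;
>>> create unique index tab1_bkp_pk_ix on tab1_bkp (id) parallel 32;
>>>
>>> alter table tab1_bkp add constraint tab1_bkp_pk
>>>   primary key (id) using index tab1_bkp_pk_ix enable novalidate;
>>>
>>> -- reset the parallel degree left behind by the builds
>>> alter table tab1_bkp noparallel;
>>>
>>> alter table tab1 rename to tab1_old;
>>> alter table tab1_bkp rename to tab1;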
>>>
>>>
>>>
>>>
>> 
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 12 2021 - 01:15:09 CEST
