Re: Inserting billions of rows faster
Date: Sun, 28 Mar 2021 18:36:56 +0200
Message-ID: <5ea4193d-2486-f1b9-4c65-e612c3bc9ab9_at_bluewin.ch>
Of course I mean that you should not delete but rather insert the missing rows.
On 28.03.2021 at 13:18, Lothar Flatz wrote:
> Hi Lok,
>
> First of all, why do you have to delete the data you have already
> loaded? You do not give an explicit reason, but I guess it is because
> you don't know which rows are loaded and which are not.
> I think you would be much better off comparing keys to find out which
> rows are still to be loaded, rather than deleting and reinserting.
> Reading is faster than inserting, which in turn is far faster than
> deleting.
> Thus, essentially your cursor changes to something like:
>
> cursor c is
>   select rp.rowid
>     from dp, rp
>    where .....................
>          load_date >= &last_incomplete_load
>      and not exists (select key from target_table
>                       where load_date >= ...)
>
>
> Basically you can plug this PL/SQL more or less 1:1 into
> dbms_parallel_execute.
> There are only some minor changes to be made.
> In your cursor select you need to include a condition for the start and
> end rowid. Essentially you add "and ROWID BETWEEN :start_id AND :end_id".
> Then you also need a chunking statement, which is essentially your
> cursor select again and would look like:
>
> select min(r) start_id, max(r) end_id from (
>   select ntile(48) over (order by rp.rowid) grp, rp.rowid r
>     from dp, rp
>    where .....................
>          load_date >= &last_incomplete_load
>      and not exists (select key from target_table
>                       where load_date >= ...)
> )
> GROUP BY grp;
>
> Where 48 is the number of chunks I want. (Stew would argue that ntile
> is not precise, but I'd say it is good enough...)
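>
> For reference, a minimal sketch of the surrounding dbms_parallel_execute
> wiring; the task name and parallel degree are made up, and the join
> predicates and column lists stay elided as above. by_rowid => true because
> the chunking query returns rowid ranges, and the per-chunk statement is
> shown as a plain insert ... select, though the PL/SQL block could be
> plugged in instead:
>
> declare
>   l_task varchar2(30) := 'RELOAD_MISSING_ROWS';   -- hypothetical task name
> begin
>   dbms_parallel_execute.create_task(task_name => l_task);
>
>   -- chunk boundaries come from the ntile query above
>   dbms_parallel_execute.create_chunks_by_sql(
>     task_name => l_task,
>     sql_stmt  => q'[select min(r) start_id, max(r) end_id
>                       from (select ntile(48) over (order by rp.rowid) grp,
>                                    rp.rowid r
>                               from dp, rp
>                              where .....................)
>                      group by grp]',
>     by_rowid  => true);
>
>   -- each job binds its own :start_id/:end_id rowid pair
>   dbms_parallel_execute.run_task(
>     task_name      => l_task,
>     sql_stmt       => q'[insert into target_table (...)
>                          select ...
>                            from dp, rp
>                           where .....................
>                             and rp.rowid between :start_id and :end_id]',
>     language_flag  => dbms_sql.native,
>     parallel_level => 8);
>
>   dbms_output.put_line(dbms_parallel_execute.task_status(l_task));
>   dbms_parallel_execute.drop_task(l_task);
> end;
> /
>
> The chunk table keeps per-chunk status, so a failed run can be resumed
> rather than restarted from scratch.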
>
> Regards
>
> Lothar
> On 28.03.2021 at 08:09, Lok P wrote:
>> Thank you so much. I forgot to mention the version: it's 11.2.0.4 on
>> Oracle Exadata X7.
>>
>> Also, I have never used dbms_parallel_execute before, but it seems it
>> submits multiple jobs from different sessions, each operating on a
>> different set of rowids, and it looks to involve a bit of coding, so I
>> will try to explore that. Btw, currently, as a less time-consuming and
>> less risky approach, I am thinking of driving the same bulk
>> insert/delete code by adding a PARALLEL(N) hint to the INSERT/DELETE
>> FORALL, along with the same parallel hint in the cursor query (see the
>> sketch below). Hopefully that will make things faster.
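>>
>> (For reference, a rough sketch of that hinted variant, written
>> set-based here for brevity; the table aliases and degree of 8 are just
>> placeholders, and the select list and join stay elided:)
>>
>> -- parallel DML has to be enabled per session, otherwise only the
>> -- query side of the statement runs in parallel
>> alter session enable parallel dml;
>>
>> insert /*+ append parallel(t, 8) */ into target_table t
>> select /*+ parallel(rp, 8) parallel(dp, 8) */ ...
>>   from dp, rp
>>  where .....................;
>>
>> -- direct-path / parallel DML must be committed before the table
>> -- can be touched again in the same session
>> commit;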
>>
>> Regards
>> Lok
>>
>> On Sun, Mar 28, 2021 at 11:16 AM Laurentiu Oprea
>> <laurentiu.oprea06_at_gmail.com> wrote:
>>
>>
>> dbms_parallel_execute might be the way to go; you can find a
>> lot of examples of how to implement it:
>>
>> https://blogs.oracle.com/warehousebuilder/parallel-processing-with-dbmsparallelexecute
>> https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2
>>
>> On Sun, Mar 28, 2021 at 08:44, Lok P <loknath.73_at_gmail.com> wrote:
>>
>>
>> The delete we are now using to remove the duplicates from the
>> target is shown below, but it's taking a lot of time, so I am
>> wondering if there is a faster way to achieve it?
>>
>> DECLARE
>>     type rowidArray is table of rowid index by binary_integer;
>>     type emRec is record (rowid rowidArray);
>>
>>     l_record     emRec;
>>     l_array_size number := 10000;
>>     l_done       boolean;
>>     l_cnt        number := 1;
>>     l_processed  number := 0;
>>
>>     -- driving query: returns the rowids of the rows to be deleted
>>     cursor c is select rp.rowid
>>                   from dp, rp
>>                  where .....................;
>> BEGIN
>>     dbms_application_info.set_module('Deletes ', 'Starting...');
>>
>>     open c;
>>     loop
>>         dbms_application_info.set_action('Processing '||l_cnt||' thru '
>>             ||(l_cnt + l_array_size - 1)||' ('||l_processed||')');
>>
>>         -- fetch the next batch of rowids
>>         fetch c bulk collect into l_record.rowid limit l_array_size;
>>         l_done := c%notfound;
>>
>>         -- delete the batch with a single bulk bind
>>         forall i in 1 .. l_record.rowid.count
>>             delete from ...........
>>              where rowid = l_record.rowid(i);
>>
>>         l_processed := l_processed + sql%rowcount;
>>         commit;                    -- commit per batch to keep undo small
>>
>>         exit when (l_done);
>>         l_cnt := l_cnt + l_array_size;
>>     end loop;
>>     close c;
>>
>>     dbms_application_info.set_action('Processed '||l_processed
>>         ||' deletes FROM .......');
>> end;
>> /
>>
>>
>>
>> On Sun, Mar 28, 2021 at 11:07 AM Lok P <loknath.73_at_gmail.com> wrote:
>>
>> Hi Listers, we have one process in which we load ~50 million rows
>> daily (using direct-path insert, i.e. INSERT /*+ APPEND */) into a
>> target table which holds ~20 billion rows (~1TB in size) and is
>> list-hash composite partitioned: ~2 list partitions and ~2048 hash
>> subpartitions. It has 4 indexes; one of them is a composite primary
>> key comprising 5 columns. That data load job had been failing for
>> ~15-20 days without notice, so ~1 billion rows accumulated, and the
>> load process is now failing with ORA-01628 (max # extents (32765)
>> reached for rollback segment).
>>
>> So we thought of running it as a bulk collect (with the APPEND_VALUES
>> hint), committing in chunks of ~1 million rows (a sketch of this is
>> below). But during that process we saw that the data load was
>> significantly slower: it was inserting ~1 million rows in ~40 minutes.
>> We were not able to understand why, but the wait events were all
>> "cell single block physical read" and the object was the target load
>> table, which means it was the INSERT that was struggling. So we have
>> now ended up with partial data loaded to the target, i.e. around
>> ~40-50 million rows. And as the table has a primary key, we have to
>> delete those ~40-50 million rows from the target table and then
>> reload them.
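>>
>> (A minimal sketch of that chunked APPEND_VALUES load, with a
>> placeholder source query and assuming the selected columns match the
>> target table exactly:)
>>
>> declare
>>   cursor c is select * from source_stage;   -- placeholder source query
>>   type src_tab is table of c%rowtype index by pls_integer;
>>   l_rows src_tab;
>> begin
>>   open c;
>>   loop
>>     fetch c bulk collect into l_rows limit 1000000;  -- ~1 million per chunk
>>
>>     -- direct-path insert of the whole batch in one bulk bind
>>     forall i in 1 .. l_rows.count
>>       insert /*+ append_values */ into target_table values l_rows(i);
>>
>>     commit;   -- a direct-path insert must be committed before the next one
>>     exit when c%notfound;
>>   end loop;
>>   close c;
>> end;
>> /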
>>
>> I was thinking we could do the delete with the bulk method in chunks
>> of 1 million, but again that will run in a single thread and will be
>> significantly slow. So what is the best way to have those ~40-50
>> million rows deleted from the target table (which holds a total of
>> ~20 billion rows)?
>>
>> I was also thinking we could make the indexes unusable, perform the
>> delete, and then do the data load (which would generate almost zero
>> UNDO in the absence of indexes), but in that case it seems the DELETE
>> will need the PK index to fetch the rowids, so we can't get rid of
>> the PK index. So what is the best approach for the delete and data
>> load here without breaking data consistency in the target table?
>>
>> In short, what is the best/fastest approach for the delete and data
>> load in this situation?
>>
>> Regards
>> Lok
>>
>
-- http://www.freelists.org/webpage/oracle-l

Received on Sun Mar 28 2021 - 18:36:56 CEST