Re: Inserting with billion of rows faster

From: Lothar Flatz <l.flatz_at_bluewin.ch>
Date: Sun, 28 Mar 2021 13:18:05 +0200
Message-ID: <a4f48404-c6dd-0594-438d-217690f17f69_at_bluewin.ch>



Hi Lok,

First of all, why do you have to delete the data you have already loaded? You do not give an explicit reason, but I guess it is because you don't know which rows are loaded and which are not. I think you would be much better off comparing keys and finding out which rows still need to be loaded, rather than deleting and reinserting. Reading is far faster than inserting, which in turn is far faster than deleting. Thus, essentially your cursor is going to change to something like:

     cursor c is select
                        rp.rowid
                        from dp, rp
                        where   .....................
                        load_date >= &last_incomplete_load
                        and not exists (select key from target_table
                                        where load_date >= ...)

Basically you can plug this PL/SQL more or less 1:1 into dbms_parallel_execute.
There are only some minor changes to be made. In your cursor select you need to include a condition for the start and end rowid; essentially you add "and ROWID BETWEEN :start_id AND :end_id". Then you also need a chunking statement, which is essentially your cursor select again and would look like:

select min(r) start_id, max(r) end_id from (
              select
                        ntile(48) over (order by rp.rowid) grp, rp.rowid r
                        from dp, rp
                        where   .....................
                        load_date >= &last_incomplete_load
                        and not exists (select key from target_table
                                        where load_date >= ...)
             )
    GROUP BY grp;

Where 48 is the number of chunks I want. (Stew would argue that ntile is not precise, but I'd say it is good enough...)
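
To make this concrete, below is a minimal sketch of how the pieces could be wired together; the task name, chunk count and parallel level are arbitrary, and the column lists and join conditions are the elided ones from above:

    begin
        dbms_parallel_execute.create_task(task_name => 'LOAD_RP');

        -- chunking statement: returns start/end rowid pairs, hence by_rowid => true
        dbms_parallel_execute.create_chunks_by_sql(
            task_name => 'LOAD_RP',
            sql_stmt  => q'[select min(r) start_id, max(r) end_id from (
                              select ntile(48) over (order by rp.rowid) grp,
                                     rp.rowid r
                              from dp, rp
                              where .....................
                              load_date >= &last_incomplete_load
                              and not exists (select key from target_table
                                              where load_date >= ...))
                            group by grp]',
            by_rowid  => true);

        -- the statement that runs once per chunk; :start_id/:end_id are bound
        -- to each chunk's rowid range and parallel_level jobs work on chunks
        -- concurrently
        dbms_parallel_execute.run_task(
            task_name      => 'LOAD_RP',
            sql_stmt       => q'[insert into target_table (...)
                                 select ...
                                 from dp, rp
                                 where .....................
                                 load_date >= &last_incomplete_load
                                 and not exists (select key from target_table
                                                 where load_date >= ...)
                                 and rp.rowid between :start_id and :end_id]',
            language_flag  => dbms_sql.native,
            parallel_level => 8);

        -- check user_parallel_execute_chunks for failed chunks before dropping
        dbms_parallel_execute.drop_task(task_name => 'LOAD_RP');
    end;
    /

Alternatively, the sql_stmt passed to run_task can be your existing PL/SQL block, with the rowid range condition added to its cursor and :start_id/:end_id left as bind variables; that is the 1:1 plug-in I mean above.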

Regards

Lothar
Am 28.03.2021 um 08:09 schrieb Lok P:
> Thank you so much. I forgot to mention the version: it's Oracle
> 11.2.0.4 on Exadata X7.
>
> Also, I have never used dbms_parallel_execute before, but it seems it
> will submit multiple jobs from different sessions which will operate
> on different sets of rowids, and it looks to involve a bit of coding,
> so I will try to explore that. Btw, currently, as a less
> time-consuming and also less risky approach, I am thinking of just
> driving the same bulk insert/delete code by adding a PARALLEL(N) hint
> while performing the INSERT/DELETE FORALL, along with the same
> parallel hint in the cursor query. Hopefully that will make things
> faster.
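
A PARALLEL hint mainly helps the cursor query here; the FORALL deletes themselves are still single-row operations. For comparison, a single set-based parallel DML delete would look roughly like the sketch below (assuming rp is the table being deleted from, as in the cursor of the quoted block further down, and that the elided conditions identify exactly the rows from the failed loads):

    alter session enable parallel dml;

    delete /*+ parallel(t 8) */ from rp t
     where t.rowid in (select rp.rowid
                         from dp, rp
                        where .....................);

    commit;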
>
> Regards
> Lok
>
> On Sun, Mar 28, 2021 at 11:16 AM Laurentiu Oprea
> <laurentiu.oprea06_at_gmail.com> wrote:
>
>
> dbms_parallel_execute might be the method to go, you can find a
> lot of examples on how you can implement it:
>
> https://blogs.oracle.com/warehousebuilder/parallel-processing-with-dbmsparallelexecute
> https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2
>
> On Sun, Mar 28, 2021 at 08:44, Lok P
> <loknath.73_at_gmail.com> wrote:
>
>
> The delete we are now using is something like the one below, for
> removing the duplicates from the target, but it's taking a lot of
> time. So I am wondering if there is a faster way to achieve it?
>
> DECLARE
>     type rowidArray is table of rowid index by binary_integer;
>
>     type emRec is record
>     (rowid             rowidArray);
>
>     l_record     emRec;
>
>     l_array_size number := 10000;
>     l_done       boolean;
>     l_cnt        number := 1;
>     l_processed  number := 0;
>
>     cursor c is select
>                        rp.rowid
>                        from dp, rp
>                        where   .....................;
> BEGIN
>     dbms_application_info.set_module('Deletes ','Starting...');
>
>     open c;
>     loop
>         dbms_application_info.set_action('Processing '||l_cnt||' thru '||
>             (l_cnt+l_array_size-1)||' ('||l_processed||')');
>
>         fetch c bulk collect into l_record.rowid LIMIT l_array_size;
>         l_done := c%notfound;
>
>         forall i in 1 .. l_record.rowid.count
>             delete from ...........
>              where rowid = l_record.rowid(i);
>
>         l_processed := l_processed + SQL%ROWCOUNT;
>
>         commit;
>
>         exit when (l_done);
>
>         l_cnt := l_cnt + l_array_size;
>     end loop;
>
>     dbms_application_info.set_action('Processed '||l_processed||
>         ' deletes FROM .......');
> end;
> /
>
>
>
> On Sun, Mar 28, 2021 at 11:07 AM Lok P
> <loknath.73_at_gmail.com> wrote:
>
> Hi Listers, we have one process in which we load ~50 million rows
> daily (using direct-path insert, i.e. INSERT with the APPEND hint)
> into a target table which holds ~20 billion rows (~1TB in size) and
> is list-hash composite partitioned: ~2 list partitions and ~2048
> hash subpartitions. It has 4 indexes, one of which is a composite
> primary key comprising 5 columns. That data load job had been
> failing for ~15-20 days without anyone noticing, so around a billion
> rows accumulated, and the load process is now failing with ORA-01628
> (max # extents (32765) reached for rollback segment).
>
> So we thought of running it with a bulk collect approach (using the
> APPEND_VALUES hint) and committing in chunks of ~1 million rows. But
> during that process we ended up seeing the data load go
> significantly slower: it was inserting ~1 million rows in ~40
> minutes. We were not able to understand the reason, but the wait
> events all showed "cell single block physical read" and the object
> was the target load table, meaning it was the INSERT that was
> struggling. So we have now ended up with partial data loaded to the
> target, i.e. around ~40-50 million rows. And as the table has a
> primary key, we have to delete those ~40-50 million rows from the
> target table and then reload them.
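
For reference, the chunked load described above looks roughly like the sketch below (source query, column list and chunk size are placeholders; the elided conditions are the ones from the load):

    declare
        cursor c is select ...                 -- same source query as the load
                      from dp, rp
                     where .....................;
        type t_rows is table of c%rowtype index by pls_integer;
        l_rows  t_rows;
        l_limit pls_integer := 1000000;        -- ~1 million rows per chunk
    begin
        open c;
        loop
            fetch c bulk collect into l_rows limit l_limit;
            exit when l_rows.count = 0;

            forall i in 1 .. l_rows.count
                insert /*+ append_values */ into target_table
                values l_rows(i);

            -- a commit is required before the next direct-path insert
            -- into the same table
            commit;
        end loop;
        close c;
    end;
    /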
>
> I was thinking we could do the delete in a bulk method with a chunk
> size of 1 million, but again that will run in a single thread and
> will be significantly slow. So what is the best way to have those
> ~40-50 million rows deleted from the target table (which holds a
> total of ~20 billion rows)?
>
> And then I was also thinking that we could make the indexes
> unusable, perform the delete, and then do the data load (which would
> happen with almost zero UNDO in the absence of indexes), but in that
> case it seems the DELETE will need the PK index to fetch the rowids,
> so we can't get rid of the PK index. So what is the best approach
> for the delete and data load here without breaking data consistency
> in the target table?
>
> So I wanted to understand what the best/fastest approach is for the
> delete and data load in this situation?
>
> Regards
> Lok
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Mar 28 2021 - 13:18:05 CEST
