Deleting data during ETL process [message #258981] |
Tue, 14 August 2007 00:40  |
adiga
Messages: 6 Registered: August 2007
|
Junior Member |
|
|
Hello All,
I am working on a data migration project where we are loading data in CSV files into an Oracle 10g DB using Business Objects ETL tool called Data Integrator.
Data has to be migrated from the current system to the new system that we are developing which uses Oracle DB. There are over 250 hospitals where the current system is being used.
Our schema in the Oracle DB is in third-normal form with some specific exceptions where we have denormalized for increased performance. Also, we have a lot of referential integrity constraints
The question is:
Say, after migrating data from one set of files, I find that there were some errors and I want to rerun my ETL process with the same set of files. I want to delete the data specific to this run from the DB. Note that the DB already has data loaded from different hospitals.
What is the best way to delete data specific to a hospital from the DB? Almost all the tables have hospital number field that I can use to identify the records. However, since there are a lot of referential integrity constraints, though I have a delete script in the proper order, the deletes will take a huge amount of time. Another approach I have thought about is to take a backup of the DB before running my ETL job, and restore the DB from the backup in case of any issues with the ETL job. But this would need the help of the DBA which would be an overhead for me since I keep running and rerunning my ETL process.
Is there any other approach that I can try?
Thanks in advance,
Pradeep
|
|
|
|
Re: Deleting data during ETL process [message #258990 is a reply to message #258983] |
Tue, 14 August 2007 01:22   |
adiga
Messages: 6 Registered: August 2007
|
Junior Member |
|
|
Each hospital has its own customization due to which, there might be some data issues during ETL. We log all the errors into an error log table. After checking this table for errors, we might fix the issue in the source system and generate a fresh set of source files that we need to load again into our Oracle DB. Hence, however much we test the ETL process in the test db, in production, we might have to rerun the job frequently.
|
|
|
|
|
Re: Deleting data during ETL process [message #259217 is a reply to message #258981] |
Tue, 14 August 2007 12:35   |
adiga
Messages: 6 Registered: August 2007
|
Junior Member |
|
|
Actually, the problem is not in inserting the data which we are doing using an ETL tool. The issue we are facing is while deleting data for one particular hospital. If I delete using delete statements, they are going to take a lot of time due to the foreign key constraints and I would have to analyze the schema after deleting so much data.
|
|
|
Re: Deleting data during ETL process [message #259246 is a reply to message #259217] |
Tue, 14 August 2007 18:34  |
 |
djmartin
Messages: 10181 Registered: March 2005 Location: Surges Bay TAS Australia
|
Senior Member Account Moderator |
|
|
Instead of deleting why not just update those that have changed data. As the key data will not have changed you won't have to worry about the foreign key constraint issues.
David
|
|
|