RE: Question on data movement

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sun, 23 Jun 2024 10:00:14 -0400
Message-ID: <2fbb01dac575$ac7427f0$055c77d0$_at_rsiz.com>



One key piece of information missing is the volume of changes per unit time to the third-party 10GB database. Since the total data size is minuscule by current standards (it would have qualified for VLDB in 1991, if well managed), the number and total data size of changes per unit time is the more important consideration. If you have a way to capture the change transactions, that might be the most economical method, provided the rate of change is manageable.
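
For instance (and this assumes the third-party schema exposes something like a LAST_UPDATED column, which it may well not), a quick query run once a day would give you that change rate. A minimal sketch in Python, with hypothetical table, column, and connection details:

import oracledb

# Placeholders only: user, dsn, table and column names are assumptions, not your actual schema.
conn = oracledb.connect(user="report_ro", password="***", dsn="azure-host/ORCLPDB1")
with conn.cursor() as cur:
    cur.execute("""
        SELECT COUNT(*)
          FROM orders
         WHERE last_updated >= SYSDATE - 1
    """)
    rows_changed, = cur.fetchone()
print(f"~{rows_changed} rows changed in the last 24 hours")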

If the transaction rate is modest, you could implement a result-row "send back" to verify that the changes match the third-party data on a trickle-feed basis, with control totals on the number of rows changed verifying that the send and receipt are correct. Whether you replay the transactions or just send the result row from the "third party" system for each row changed and use the "scaling to infinity" method to evolve a periodic matching result, this has the advantage of positive verification (and potentially a time series of as-was values in case a logical application error requires a complete rollback).
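
A minimal sketch of the control-total idea, assuming the changes travel as batch CSV files (the file layout and names below are placeholders): both sides compute a row count and an order-independent checksum, and the receiver applies the batch only if both match.

import csv, hashlib

def control_totals(path):
    """Row count plus an order-independent checksum over a delimited batch file."""
    count, digest = 0, 0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            count += 1
            # XOR of per-row hashes makes the checksum independent of row order
            digest ^= int.from_bytes(
                hashlib.sha256(",".join(row).encode()).digest()[:8], "big")
    return count, digest

# Sender computes totals before transfer; receiver recomputes on the delivered copy.
sent = control_totals("changes_batch_20240623.csv")
received = control_totals("received/changes_batch_20240623.csv")
assert sent == received, "control totals differ -- do not apply this batch"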

On the other hand, if the transaction rate is massive, you don't want to be doing anything per row like a trickle feed. Keeping a log of rows changed might be a challenge since it is a third-party app, but at 10GB, resending the entire thing to get a time series might be reasonable. That is more dependent on your bandwidth between the time-sharing systems. Of course, never underestimate the bandwidth of a USB stick sent via FedEx. (Some fellow dinosaurs will recognize that as a tech upgrade of the old saw "never underestimate the bandwidth of a box of 9-track tapes on a Greyhound bus," which I first heard from Mike Prince circa 1988, but which I also heard independently from Tim Gorman.) That method depends on three things: 1) your latency requirements, 2) ease of getting a dump onto physical media and sent from Azure, and 3) ease of submitting and loading that physical media to AWS. The bandwidth is very high, but the latency could be hours or days. Only your contracts and so forth affect 2 and 3.
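
Back of the envelope, assuming a 100 Mbit/s effective link between the clouds (substitute your real figure), a full resend at this size is cheap:

# Assumed link speed, not a measured one -- plug in your own number.
size_bits = 10 * 1024**3 * 8          # ~10 GB database or dump, in bits
link_bps = 100 * 1_000_000            # 100 Mbit/s effective throughput
print(f"Full resend: ~{size_bits / link_bps / 60:.0f} minutes")   # roughly 14 minutes
# A courier-shipped USB stick moves the same 10 GB with effectively unlimited
# bandwidth, but its latency is measured in hours or days, not minutes.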

If you keep the sticks, that also gives you another image.  

Good luck,  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Lok P
Sent: Sunday, June 23, 2024 2:10 AM
To: Oracle L
Subject: Question on data movement  

Hi,
We have a third-party application whose backend is an Oracle database hosted on the Azure cloud. It currently processes all of our transactions. Due to some risk and reporting requirements, we want the data to be persisted in our own cloud account, i.e., AWS. This data will be ingested into our own data lake through the existing data pipeline, which comprises Postgres and Snowflake databases. The total size of the third-party Oracle database hosting our data is ~10GB.

For this we are thinking of two methods. 1) Get a full backup or dump file of the Oracle database hosted on the Azure cloud and MFTS it to the AWS S3 bucket, then subsequently import the incremental backups from the Oracle database into AWS. This will ensure that all of the transaction data (meaning all the data elements, with nothing missed) gets persisted in our cloud/data lake environment.
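
As a rough illustration of the transfer step we have in mind (bucket and file names below are only placeholders, and the export on the Azure side is a separate step):

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/staging/thirdparty_full.dmp",   # dump file already exported on the Azure side
    Bucket="my-datalake-landing",               # placeholder bucket name
    Key="oracle/thirdparty_full.dmp",
)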

2) List the selected data elements which our data lake needs to satisfy the reporting requirement and pass that list to the third-party vendor, so that they send the delta/changed data for those elements each day in .csv format. These CSV files will then be MFTS'd to our AWS account's S3 bucket, and we can then ingest them into our data lake system to satisfy our reporting needs.
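
A rough sketch of the daily ingestion step we are imagining (bucket, key, table, and connection details are placeholders, not our actual pipeline):

import boto3
import psycopg2

# Pull the vendor's daily delta CSV from S3...
s3 = boto3.client("s3")
s3.download_file("my-datalake-landing", "vendor/delta_2024-06-23.csv", "/tmp/delta.csv")

# ...and load it into a Postgres staging table via COPY.
conn = psycopg2.connect(dbname="lake", user="etl", host="warehouse-host")  # placeholders
with conn, conn.cursor() as cur, open("/tmp/delta.csv") as f:
    cur.copy_expert("COPY staging.vendor_delta FROM STDIN WITH CSV HEADER", f)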

But here I have a question,
If we get the full Oracle database dump from the Azure cloud, move it to the AWS S3 bucket, and then move the subsequent incremental backup dumps, will those dumps be in a state where they can be imported into AWS RDS Oracle without any issue, so that the database schema along with the data can be viewed inside our AWS account whenever we need? Or do you suggest any other method to handle the above requirement?

Regards

Lok

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Jun 23 2024 - 16:00:14 CEST
