Re: Question on data movement

From: Lok P <loknath.73_at_gmail.com>
Date: Sun, 23 Jun 2024 20:34:21 +0530
Message-ID: <CAKna9VYcUB56dZ9e53LY-zVjJf88ZQHCvvohTc4TnJOn2LcFuw_at_mail.gmail.com>



Thank you. I agree with your point that the rate of change is the key factor that will decide the strategy, and we have very few changes each day, i.e. only a few hundred transactions. However, another consideration is that in the future there is a plan to move away from the third party vendor and build our own system to handle the transactions. In that case, for security reasons, we may be asked to retain all of the data elements captured by the current vendor system, which we won't have if we ingest only selected data elements into our current data lake.

And the key thing I am trying to understand is: if we export the current RMAN full backup and the subsequent incremental backups from the Azure-hosted Oracle database, can they be imported into our AWS RDS Oracle instance without any issue? Or should we instead take a weekly expdp dump and import that into the AWS-hosted RDS Oracle?
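
If we go the expdp route, what I have in mind is roughly the following, assuming the RDS instance has the S3_INTEGRATION option enabled and using placeholder bucket, schema, directory and file names (illustrative only, not our real ones):

  -- 1) On the source (Azure-hosted) database host: take a schema-level export,
  --    then push the dump file to the S3 bucket (e.g. via MFTS or the AWS CLI).
  --    expdp system schemas=APP_SCHEMA directory=DATA_PUMP_DIR dumpfile=app_full.dmp logfile=app_full_exp.log

  -- 2) On the RDS for Oracle target: pull the dump file from S3 into DATA_PUMP_DIR.
  SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3(
           p_bucket_name    => 'my-transfer-bucket',    -- placeholder bucket
           p_s3_prefix      => 'oracle/app_full.dmp',   -- placeholder key
           p_directory_name => 'DATA_PUMP_DIR') AS task_id
  FROM   dual;

  -- 3) Import the dump with DBMS_DATAPUMP.
  DECLARE
    h NUMBER;
  BEGIN
    h := DBMS_DATAPUMP.OPEN(operation => 'IMPORT', job_mode => 'SCHEMA');
    DBMS_DATAPUMP.ADD_FILE(h, 'app_full.dmp', 'DATA_PUMP_DIR',
                           filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
    DBMS_DATAPUMP.ADD_FILE(h, 'app_full_imp.log', 'DATA_PUMP_DIR',
                           filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
    DBMS_DATAPUMP.METADATA_FILTER(h, 'SCHEMA_EXPR', 'IN (''APP_SCHEMA'')');
    DBMS_DATAPUMP.START_JOB(h);
  END;
  /

My understanding is that a Data Pump export is a full logical dump rather than an incremental one, so the weekly expdp approach would mean re-exporting the whole ~10GB each time (which at this size seems tolerable), whereas the RMAN incremental route is the one I am unsure can be restored into RDS at all.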

On Sun, Jun 23, 2024 at 7:30 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:

> One key bit of information lacking is the volume of changes per unit time
> to the third party 10GB database. Since the total data size is minuscule in
> current technology (it would have qualified for VLDB in 1991, if well
> managed), it is the number and total data size of changes per unit time
> that is the more important consideration. If you have a way to capture the
> change transactions, that might be the most economic method if the rate of
> change is manageable.
>
>
>
> If the transaction rate is modest, you could implement result row “send
> back” to verify that the changes match the third party data on a trickle
> feed basis, with control totals on the number of rows changed verifying that
> the send and receipt are correct. Whether you replay the transactions or just
> send the result row on the “third party” system for each row changed and
> use the “scaling to infinity” method to evolve a periodic matching result,
> this has the advantage of positive verification (and potentially a time
> series of as-was values in case a logical application error requires a
> complete rollback).
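
A minimal sketch of the kind of per-batch control total we could compute on both ends and compare (table, column and bind names are made up for illustration):

  -- Run the equivalent query against the vendor extract and against our copy
  -- for the same batch window, then compare the two result rows.
  SELECT COUNT(*)            AS rows_changed,
         SUM(txn_amount)     AS amount_total,
         MIN(last_update_ts) AS first_change,
         MAX(last_update_ts) AS last_change
  FROM   txn_table
  WHERE  last_update_ts >= :batch_start
  AND    last_update_ts <  :batch_end;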
>
>
>
> On the other hand, if the transaction rate is massive, you don’t want to
> be doing anything for each row like a trickle feed. Keeping a log of rows
> changed might be a challenge since it is a third party app, but at 10GB,
> resending the entire thing to get a time series might be reasonable. That
> is more dependent on your bandwidth between the time sharing systems. Of
> course never underestimate the bandwidth of a USB stick sent via FedEx.
> (Some fellow dinosaurs will recognize that as a tech upgrade of the old saw
> “never underestimate the bandwidth of a box of 9-track tapes on a Greyhound
> bus,” which I first heard from Mike Prince circa 1988, but which I also
> heard independently from Tim Gorman.) That method depends on three things:
> 1) Your latency requirements, 2) Ease of getting a dump onto physical media
> and sent from Azure, and 3) Ease of submitting and loading that physical
> media to AWS. The bandwidth is very high, but the latency could be hours or
> days. Only your contracts and so forth affect 2 and 3.
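
On the time-series point, if the copies land in an Oracle (e.g. RDS) database on our side, a very rough sketch of keeping each periodic full resend as its own dated slice (table, column and date values below are made up):

  -- History table partitioned by snapshot date; every full resend of the
  -- ~10GB set is appended under the date it was taken.
  CREATE TABLE txn_history (
    snapshot_date  DATE NOT NULL,
    txn_id         NUMBER,
    txn_amount     NUMBER,
    txn_status     VARCHAR2(30)
  )
  PARTITION BY RANGE (snapshot_date)
    INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
    (PARTITION p0 VALUES LESS THAN (DATE '2024-07-01'));

  -- Load one day's full copy from a staging table holding that resend.
  INSERT /*+ APPEND */ INTO txn_history
  SELECT DATE '2024-07-01', s.txn_id, s.txn_amount, s.txn_status
  FROM   staged_full_copy s;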
>
>
>
> If you keep the sticks, that also gives you another image.
>
>
>
> Good luck,
>
>
>
> mwf
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *Lok P
> *Sent:* Sunday, June 23, 2024 2:10 AM
> *To:* Oracle L
> *Subject:* Question on data movement
>
>
>
> Hi,
> We have a third party application whose backend is an Oracle database
> hosted on the Azure cloud. It currently processes all of our
> transactions. Due to some risk and reporting requirements, we want the data
> to be persisted in our own cloud account, i.e. AWS. The data will be
> ingested into our own data lake through the existing data pipeline, which
> comprises Postgres and Snowflake databases. The total size of the
> third party Oracle database hosting our data is ~10GB.
>
> For this we are thinking of two methods:
> 1) Get a full database backup or dump file from the Oracle database hosted
> on the Azure cloud and transfer it via MFTS to an AWS S3 bucket, then
> subsequently import the incremental backups from the Oracle database into
> AWS. This will ensure that all of the transaction data (i.e. every data
> element, with nothing missed) is persisted in our cloud/data lake
> environment.
>
> 2) List the selected data elements which our data lake needs to satisfy
> the reporting requirement and pass that list to the third party vendor,
> so that they send the delta/changed data for those elements each day in
> .csv file format. These CSV files will then be transferred via MFTS to our
> AWS account's S3 bucket, from where we can ingest them into our data lake
> system to satisfy our reporting needs.
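
As a rough sketch of that ingestion step, the Snowflake side could be as simple as the following, assuming an external stage already points at the S3 bucket (stage, table and path names are placeholders):

  -- Load one day's delta CSV files from the S3 stage into a landing table.
  COPY INTO raw.vendor_txn_delta
  FROM @raw.vendor_s3_stage/daily/2024-06-23/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
  ON_ERROR = 'ABORT_STATEMENT';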
>
> But here I have a question:
> If we get the full Oracle database dump from the Azure cloud, move it to
> an AWS S3 bucket, and then move the subsequent incremental backup dumps,
> will those dumps be in a state such that they can be imported into AWS RDS
> Oracle without any issue, and the database schema along with the data can
> be viewed inside our AWS account whenever we need? Or do you suggest any
> other method to handle the above requirement?
>
> Regards
>
> Lok
>
>
>

--
http://www.freelists.org/webpage/oracle-l