Re: Design related question
Date: Sun, 10 Dec 2023 23:50:58 +0530
Message-ID: <CAEzWdqcWTw=fssOvVW68aQZ1McTEd9OObhFPUAx5ZQifaTesGw_at_mail.gmail.com>
Additionally, I want to understand, as we used to see the data dictionary views (called AWR views) in Oracle to see the current and historical performance statistics like CPU, IO , Memory usage, object level contentions etc. in the oracle database. Do we have such a thing available in Aurora postgre, so as to monitor the performance and get some idea of how well the load test goes and what capacity is available or are we saturating it?
Also can we just keep two databases one for OLTP and other for OLAP, i.e recent data on Aurora postgresql and then all historical data on Snowflake?
On Sun, Dec 10, 2023 at 11:34 PM yudhi s <learnerdatabase99_at_gmail.com> wrote:
> Thank you so much Mark, Pap.
>
> From the OLTP database we are expecting ~15K TPS write and 2K TPS read
> response. The response for the UI queries are expected to be within
> seconds. But yes, as mentioned individual transactions will be batched and
> then will be written to the database, so in that way if we use batch size
> of 1000, that ~15K online TPS would be converted to ~15 batch
> transactions/seconds which would consume significantly less resource and
> also the contentions will be minimized to a great extent. Please correct me
> if I'm wrong.
>
> Currently we are on Oracle Exadata on premise. To test if Aurora postgre
> will be comparable to cater the above needs (in regards to the expected
> performance with nominal cost) ,how should we test it? As we won't be able
> to do everything right away, Should we test basic read and write
> performance and benchmark to have some confidence and go ahead with
> development?
>
> Say for example if one transaction consists of ~8 Inserts we can create a
> sample target table on aurora Postgresql with required indexes/constraints
> and try running those inserts from multiple threads(for concurrency) using
> blazemeter and see/compare the response time, CPU, Memory etc. Similarly to
> see read performance we can run multiple select queries from blazemeter and
> compare the response time.
>
> As Mark rightly mentioned *"the most important bit is that you make sure
> all the pieces local to the data base before the first insert takes place
> and that since your commit point is by batches the entire batch is local to
> the database before the first insert takes place. You do not want even a
> LAN hiccup (or gcc overload in the case of RAC) exposing you extension of
> your concurrency footprint and most especially any uniqueness
> constraints." *
> So does this mean we need to write the app code such that all the values
> for those ~8 inserts queries for ~1000 batches(which would be ~8000
> inserts) have to be prepared collected and stored in the cache by the app
> layer and submitted to the database at one shot so that they can be either
> committed all in one shot or rollback. And that would be truly batch
> processing and would reduce the contentions in the target table making
> write faster.
> or else first Insert all the rows , one by one into a stage table and then
> insert into the main table(which will be having all the indexes/constraints
> etc) by selecting rows from the stage table in batches of ~1000 rows.
>
>
>
>
> On Sun, Dec 10, 2023 at 10:59 PM Pap <oracle.developer35_at_gmail.com> wrote:
>
>> I have seen a similar use case , live data along with recent few months
>> of data eligible for OLTP use case landed in AWS postgresql database and
>> then historical data(from few months to ~2years) in snowflake for the
>> analytics or OLAP use case. And rest of the data( 2years till 10 years)
>> which would be rarely used ,in S3 raw storage. This S3 data can be
>> extracted and pumped into the database and queries when in need.
>>
>> By the way, what is the current TPS your Oracle exadata system is
>> handling and what is expected from this new cloud movement, mainly from
>> OLTP use case perspective?
>>
>>
>>
>>
>> On Sun, Dec 10, 2023 at 7:42 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>>
>>> This is the sort of thing that shooting from the hip would only
>>> coincidentally hit the optimal target.
>>>
>>>
>>>
>>> On two little factors I will comment claiming my comments are probably
>>> general:
>>>
>>>
>>>
>>> 1) On 7-8 inserts per transaction, the most important bit is that
>>> you make sure all the pieces local to the data base before the first insert
>>> takes place and that since your commit point is by batches the entire batch
>>> is local to the database before the first insert takes place. You do not
>>> want even a LAN hiccup (or gcc overload in the case of RAC) exposing you
>>> extension of your concurrency footprint and most especially any uniqueness
>>> constraints.
>>>
>>> 2) Some sort of “Scaling to Infinity” dataflow should be employed
>>> with respect to any partitioning of the transaction database and
>>> information life cycle longer retention. Usually either initial insert time
>>> or transaction completion time is the best archiving boundary, but either
>>> way the initial insert hour, day, week, or month is usually the best
>>> partitioning strategy to facilitate swap empty partition or truncate
>>> partition as the pathway to your “archive” database (whether or not the
>>> data from these archived partitions is then loaded into, say, snowflake, or
>>> some other decision support and analytical support data respository).
>>>
>>>
>>>
>>> All y’all should probably invest in specific outside resources to help
>>> you shape the data flow of your solution having read in on your specific
>>> and detailed requirements for at least: availability, disaster business
>>> continuation, critical queries your business needs answered, and the curve
>>> of rate of loss by time unavailable of each of the services these database
>>> services supply.
>>>
>>>
>>>
>>> Good luck,
>>>
>>>
>>>
>>> mwf
>>>
>>>
>>>
>>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org] *On Behalf Of *yudhi s
>>> *Sent:* Saturday, December 09, 2023 5:26 AM
>>> *To:* Oracle L
>>> *Subject:* Design related question
>>>
>>>
>>>
>>> Hello Listers,
>>> I understand colleagues here having decades of experience in the
>>> database world, We want some guidance, if the below design looks okay for
>>> our customer use case.
>>>
>>> We currently have financial systems transaction data streams to Oracle
>>> exadata(X9) on-premise. This database supports processing of 400million
>>> transactions per day. And again a single transaction for us is a
>>> combination of 7-8 inserts into different transaction tables with Indexes ,
>>> unique constraints etc defined on those and they commit in batches(~1000
>>> batch size) in the database. And this system persists data for ~6 months.
>>> We do have all sorts of OLAP(daily/monthly batch reports running)
>>> applications run on the same database along with some user facing UI
>>> applications showing customer transactions. So it's basically currently
>>> serving a hybrid workload and is one stop solution for all use cases.
>>>
>>> Many of the applications are moving from on premise to AWS cloud as part
>>> of modernization journey and AWS being chosen cloud partner also the
>>> product is expected to expand across more regions and this system is
>>> expected to serve increase in the transaction volume. And also we have a
>>> requirement to persist transaction data for ~10years to have those
>>> available for analytics/data science use cases.
>>>
>>> So the team is thinking of splitting it into two parts
>>> 1)OLTP type use case in which we will persist/write the transaction data
>>> faster and show it to the UI related apps , in near real time/quickest
>>> possible time. and this database will store Max 60-90 days of transaction
>>> data. Not sure if we have an option of Oracle exadata equivalent on AWS, so
>>> team planning of using/experimenting with Aurora postgres. Please correct
>>> me, if there are any other options we should use otherwise?
>>>
>>> 2)Then move the data beyond ~90 days into another database or object
>>> storage S3 which will keep it there for ~10 years and will be queryable
>>> using the necessary API's. That is supposed to cater to Olap/analytics/data
>>> science use cases etc.
>>>
>>> Is the above design is okay? and also in regards to the second point
>>> above i.e. persisting the historical data (that to be in queryable state),
>>> should we go for some database like snowflake or should just keep it on S3
>>> as is and make those queryable through APIs. Please advice?
>>>
>>>
>>>
>>> Regards
>>>
>>> Yudhi
>>>
>>
-- http://www.freelists.org/webpage/oracle-lReceived on Sun Dec 10 2023 - 19:20:58 CET