Re: How to choose a database

From: Ludovico Caldara <ludovico.caldara_at_gmail.com>
Date: Thu, 27 Apr 2023 16:53:03 +0200
Message-ID: <CALSQGrJK_tFvRoMmWgpBF6y8pVaATh1Q2WLzjzjHosrbCfFTwQ_at_mail.gmail.com>



Disclaimer: I work for Oracle.

When I hear "monolithic databases don't scale", my first question is: "What are the scalability goals that you can't meet with 'monolithic' databases?"
The first step for scalability in Oracle, IF YOU REACH THE ACTUAL PLATFORM LIMITS, is Oracle RAC, where you can scale up to 100 nodes (that's the number I remember).
The next thing I hear is "RAC doesn't scale", and the argument is usually related to GC events.
The problem when you have GC events (and, in general, when you have contention on distributed systems) is that the application isn't designed to scale.
This is the real limit, where ALL the platforms, including distributed databases, can't really help.

Having high contention on rows in a distributed system is just worse than on monoliths: data modeling is way more important than the platform scalability itself.

If you carefully design your application, you can push monolithic database limits higher and scale almost linearly on RAC or sharded/distributed systems.
E.g., we've managed to run tests internally at Oracle where we could easily reach 1M TPS on a single Exadata rack. Is that representative of a real workload? Obviously not: the test was created on purpose to avoid any contention. The same goes for every vendor benchmark that claims x KTPS or MTPS.

The difference between in-house benchmarks and the BlueKai use case is that BlueKai shows real production numbers, taken from production measurements/dashboards. They have 52 instances running Oracle Sharding; the theoretical limit is 1000. But they have been incredibly good over the years at creating an application that scales linearly, using Oracle Sharding as an underlying (scalable) platform.
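To make the data-modeling point concrete, here is a minimal sketch (Python, with made-up names; this is not BlueKai's schema or Oracle's routing code) of the idea behind a shard-local design: every transaction is keyed by the sharding key, so each request lands on exactly one shard and there is no cross-shard coordination to fight over.

    import hashlib
    from collections import Counter

    NUM_SHARDS = 52  # BlueKai runs 52 shards; the illustration works for any count

    def shard_for(customer_id: str) -> int:
        # Stable hash of the sharding key -- a stand-in for the consistent-hash
        # placement that a sharded database does for you on the real platform.
        digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SHARDS

    if __name__ == "__main__":
        # If every table touched by a transaction carries the same key, one request
        # maps to exactly one shard: no cross-shard commit, no hot row shared by all.
        spread = Counter(shard_for(f"customer-{i}") for i in range(1_000_000))
        print("busiest shard holds", max(spread.values()), "of 1,000,000 keys")

Run it and the busiest shard should hold only slightly more than an even 1,000,000/52 split, which is the shape of a workload that scales out; a design where every request updates the same counter row would pile up on one shard (or one RAC node) no matter how many you add.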
So yeah, coming to "how to choose a database", scalability is not the first requirement I think about. I'd rather think about robustness/reliability, consistency requirements, development features, ease of integration, and how it fits into the Data Management policies and operations of the company.

(just my 2 cents)

On Wed, Apr 26, 2023 at 05:20 Pap <oracle.developer35_at_gmail.com> wrote:

> Thank you Mladen.
> I saw those figures from Oracle, which somebody posted, of 1M transactions
> per second achieved using shards. But I also saw some blogs stating 20K TPS
> achieved using the distributed database YugabyteDB. Are those not true
> figures?
>
> https://www.yugabyte.com/blog/mindgate-scales-payment-infrastructure/
>
>
> https://blogs.oracle.com/database/post/oracle-bluekai-data-management-platform-scales-to-1-million-transactions-per-second-with-oracle-database-sharding-deployed-in-oracle-cloud-infrastructure
>
> And yes, it's an up-and-running system which is catering to the business.
> But the new system is being written completely from scratch
> (mostly because the existing system's complexity is increasing day by day)
> using modern tech stacks, microservices, etc., so as to cater to future
> growth and provide the required scalability, resiliency, availability, etc.
>
> On Wed, Apr 26, 2023 at 2:14 AM Mladen Gogala <gogala.mladen_at_gmail.com>
> wrote:
>
>> On 4/25/23 15:07, Lok P wrote:
>>
>> " *For now, I am only aware that the database requirement was for a
>> financial services project which would be hosted on AWS cloud and one RDBMS
>> for storing and processing live users transaction data(retention upto
>> ~3months and can go ~80TB+ in size, ~500million transaction/day) and
>> another OLAP database for doing reporting/analytics on those and persisting
>> those for longer periods(many years, can go till petabytes).* "
>>
>> 500 million transactions per day? That is 5787 transactions per second.
>> Only Oracle and DB2 can do that reliably, day after day, with no
>> interruptions. You will also need a very large machine, like HP SuperDome or
>> IBM LinuxOne. To quote a very famous movie, you'll need a bigger boat. I
>> have never heard of anything else in the PB range. You may want to contact
>> Luca Canali or Jeremiah Wilton, who have both worked with monstrous servers.
>>
>> Not only will you need a bigger boat, you will also need a very capable
>> SAN device, preferably something like XTremIO or NetApp Flash Array. With
>> almost 6000 TPS, the average time for the entire transaction is 1/6 of a
>> millisecond. In other words, you need I/O time in microseconds. The usual
>> "log file sync" duration of 2 milliseconds will simply not do. You will
>> need log file sync lasting 200 microseconds or less. Those are the physical
>> prerequisites for such a configuration. You will also need to tune the
>> application well. One full table scan or slow range scan and you can kiss
>> 6000 TPS goodbye.
>>
>> Your description is pretty extreme. 6000 TPS is a lot. That is an extreme
>> requirement which can only be achieved by the combination of specialized
>> hardware and highly skilled application architecting. Fortunately, there is
>> oracle-l, which can help with the timely quotes from Douglas Adams, Arthur
>> C. Clarke and Monty Python. And of course: all your base are belong to us.
>>
>> --
>> Mladen Gogala
>> Database Consultant
>> Tel: (347) 321-1217
>> https://dbwhisperer.wordpress.com
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 27 2023 - 16:53:03 CEST
