RE: Overhead of load-balanced microservices architecture
Date: Thu, 13 Aug 2020 05:02:37 +0000
Message-ID: <MWHPR19MB01410138B8A7F0A48A512B9E9B430_at_MWHPR19MB0141.namprd19.prod.outlook.com>
I’m by no means an expert on either F5 or Exadata hardware, and things have changed in the last 10 years.
That said, what you might run into (and what I DID run into almost 10 years ago with F5s and Oracle in “another life”) is network queuing. At the network and OS level (“below” Oracle), the Oracle listener tells the OS to start listening for connections on a specified port. 7/second is not THAT large, but consider what happens when each connect request is received: several network round trips as TCP negotiates the higher-level connections, and at some point a message “up” to the Oracle listener process telling it to actually set up the database connection. Some of those steps are “single threaded”, so you may start to see queuing for some of those connection requests, and when that happens, it can “cascade” very quickly.
I’ll dig back through my notes, and if I can find something that specifically relates to what happened, I’ll post it.
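To make the queuing point concrete, here is a rough sketch of a single-threaded handler modeled as a simple fluid queue. The service times (100 ms and 200 ms per connect) are made-up illustrative numbers, not measurements of any Oracle listener, and `simulate_queue` is a hypothetical helper, not anything from Oracle or F5:

```python
def simulate_queue(arrivals_per_sec, service_ms, seconds):
    """Return the backlog of waiting connect requests at the end of each second.

    Models one single-threaded handler that needs `service_ms` milliseconds
    per request (an assumed figure, not a measured one).
    """
    capacity_per_sec = 1000 / service_ms  # requests the handler can finish per second
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # Whatever arrives beyond capacity is left waiting for the next second.
        backlog = max(0.0, backlog + arrivals_per_sec - capacity_per_sec)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # 7 connects/sec against a handler that can do 10/sec: no backlog.
    print(simulate_queue(7, 100, 5))
    # Same load, but each connect now takes 200 ms (5/sec): backlog keeps growing.
    print(simulate_queue(7, 200, 5))
```

The takeaway is only qualitative: as long as the per-connect cost keeps capacity above the arrival rate nothing queues, but once it slips below (a slow round trip, a busy host), the backlog grows every second and never drains on its own.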
Clay Jackson
From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> On Behalf Of DOUG KUSHNER
Sent: Wednesday, August 12, 2020 9:34 PM
To: oracle-l_at_freelists.org
Our dev team recently rolled out an application using an F5 load-balanced microservices architecture. There are several microservices, each load balanced across up to 4 servers, and each with a health-check API that hits the database. While this may have looked good on paper, just the overhead of the health checks, with no work being processed, has resulted in roughly 7 connection attempts per second to the database. This results in a version check query running about 40K times per hour. The database is on an Exadata (2-node RAC) with several other production databases.
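As a quick sanity check on those two figures, the arithmetic can be sketched as below. Both inputs are the approximate numbers quoted above; the variable names are just for illustration, and any explanation for the gap between the two rates (rounding, or more than one query per connection) would be an assumption:

```python
SECONDS_PER_HOUR = 3600

connects_per_sec = 7        # "roughly 7 connection attempts per second"
queries_per_hour = 40_000   # "about 40K times per hour"

# Connection attempts per hour implied by the per-second figure.
connects_per_hour = connects_per_sec * SECONDS_PER_HOUR

# Query rate per second implied by the hourly figure.
implied_queries_per_sec = queries_per_hour / SECONDS_PER_HOUR

print(f"connection attempts per hour: {connects_per_hour}")
print(f"version-check queries per second: {implied_queries_per_sec:.1f}")
```

At 7 per second the connection count works out to about 25K per hour, while 40K queries per hour is closer to 11 per second, so the two approximate figures are in the same range but not identical.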
Of course the Exadata has been handling it, so unless you are looking for anomalies (which I always am), this will fly under the radar until it doesn't. :)
I'm wondering if anyone knows how to determine the theoretical max connections/sec that a listener can handle based on the number of cores licensed in the system?
Also wondering if anyone here has encountered this scenario before and how they dealt with it. I'm also looking for a good reference on the subject.
My immediate focus will be on determining why these health check connections do not appear to be utilizing the services' connection pools, while the dev team determines whether they can relax the frequency of these health checks.
Regards,
Doug
Subject: Overhead of load-balanced microservices architecture
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Aug 13 2020 - 07:02:37 CEST