RE: Controlling load
Date: Wed, 22 Feb 2023 11:43:10 -0500
Message-ID: <090001d946dc$c027e610$4077b230$_at_rsiz.com>
Often batch managers are set to spawn n jobs asynchronously, and sometimes there is no load limiting smarts.
IF you are in control and the batch submitter is behaving poorly, often a quick and dirty fix is to break up the queue file that lists the jobs the job spawner is going to process into the number of jobs in parallel that WON’T affect interactive users. Estimating (or measuring) the time it should take the first set of jobs to finish, and set the job emitter to work off the next chunk of jobs at that time plus a little.
There also exist very sophisticated job submitters, but if you’re simply submitting too many in quick order, this may be the simplest change.
Good luck,
mwf
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Andy Sayer
Sent: Wednesday, February 22, 2023 9:52 AM
To: ilmar.kerm_at_gmail.com
Cc: Oracle-L Freelists; yudhi s
Subject: Re: Controlling load
Ok, sounds like cpu *should* be okay. I think we need to go back to performance diagnostics. I would still suggest you look at resource manager (and follow example on oracle-base, supplement with the docs to find out what you can do with just program - although I thought I put that in my first reply). You should also work with whoever owns these processes to configure their workload (ie batch shouldn’t kick off 100 sessions in the middle of the day).
Step 1: see the problem. It’s your non batch program you want to make faster, so those are the sessions you care about. Most of the time performance is fine, so we don’t care about a lot of the time. Where is the time going for your critical program when it is slower than necessary? You can query v$active_session_history manually to figure this out (use the dba_hist version if the event is no longer in memory and sufficiently long enough). Make sure you compare this to what normal is for the same workload (which means more like 100 actions than 5 minutes) to see the difference.
Once you have that information you can go down the next step: solve the problem. If the difference in time is the log file switch wait then you either have a misconfiguration, or you have a period of unsupportable redo generation. If it’s a different wait then it’s a different problem and a different solution.
Thanks,
Andy
On Wed, Feb 22, 2023 at 4:55 AM, Ilmar Kerm <ilmar.kerm_at_gmail.com> wrote:
There are multiple attributes for mapping sessions to consumer groups:
Maybe the attribute you are looking for is CLIENT_OS_USER? This is the "osuser" column in v$session
Also don't forget the mapping priorities:
And privileges, the database users must have privileges to switch to the consumer groups:
On Wed, Feb 22, 2023 at 2:32 PM yudhi s <learnerdatabase99_at_gmail.com> wrote:
Thank you Ilmar.
Going through the example which is in the doc - 848666.1, and its making the consumer group based on the DB username. In our case, these two programs are using exactly same osuser, Username, schemaname, service_name. And the client_id,action is populated as null for both of these. The only difference is program column i.e. for batch it looks something like "batch_prog_at_vm000123 (TNS V1-V3)" and for online "online_prog_at_vm000123 (TNS V1-V3)", and ~40+ different "VM******" machines from which these are coming from. So wondering, would we still be able to create different resource consumer group , just based on this substring of program column?
Additionally, to get the estimate of the CAP/Number of session which we should put for the BATCH programs "resource consumer group". How we should do that? We have total 48*2=96 cores per node. I tried getting the TOTAL AAS aginst those two programs side by side as below. If we eliminate the peak/issue period , the avg active sessions per minute is staying <10 for both the cases. Should we keep it as 40 as the limit for prog_batch?
https://gist.github.com/databasetech0073/dde4804a030814c93fc2111616cd2407
Regards
Yudhi
On Wed, 22 Feb, 2023, 3:12 pm Ilmar Kerm, <ilmar.kerm_at_gmail.com> wrote:
You could try:
active_sess_pool_p1 - Specifies maximum number of sessions that can currently have an active call
queueing_p1 - Specified time (in seconds) after which a call in the inactive session queue (waiting for execution) will time out. Default is NULL, which means unlimited.
Example in: How to Limit max number of concurrently active sessions with Resource Manager (Doc ID 848666.1)
On Wed, Feb 22, 2023 at 10:42 AM yudhi s <learnerdatabase99_at_gmail.com> wrote:
Thank you for the response.
As i mentioned we have 48 core CPU with 2 socket each for a node, so Total ~96 core per node. And the OEM also its showing showing the red line bar on AAS as 96 as Max limit. But during three different issue period we see the below AAS shoot ups above 96 and the wait distribution,
Basically its combination of wait event from classes like concurrency, configuration and others but CPU part is minimum.
But i think you are correct putting CAP on CPU using DB resource manager may not going to help us in above situation as cpu portion is less than 10%. However, i was thinking if any possible way to put cap on the AAS or the number of session those were spawning from the program prog_batch as because they are the one causing the chaos by submitting lot many session simultaneously. So is that possible with the DB resource manager configuration or by tweaking the existing services in anyway?
Regards
Yudhi
On Wed, 22 Feb, 2023, 12:54 pm Ilmar Kerm, <ilmar.kerm_at_gmail.com> wrote:
Resource manager can help you if you are seeing resource shortage (CPU shortage) during these periods and CPU shortage is causing the Concurrency waits.
If you are not seeing CPU shortage, then you need traditional database troubleshooting methods and resource manager will not help you.
For example if you are behind in 19c patching, then for example before 19.15 there was bug 32898678, 33541865 (fixed in 19.17) and multiple others that caused block change tracking (BCT) to freeze the database or just high BCT buffer space waits.
On Wed, Feb 22, 2023 at 6:08 AM yudhi s <learnerdatabase99_at_gmail.com> wrote:
Thank you Andy.
I have never used the DB resource consumer group in our databases though. But going through the doc, it seems it can be set for many number of input parameters and also on many resources. Yet to test it , however In most cases I see it's set on amount of cpu. So in our case will it be good idea to determine the cpu percentage and define the plan or we can do it on number of average active sessions which will be again hardcoded value only.
We need to set it for multiple input program values in our case. If you can pass on reference to some sample doc with its exact code implementation, that would be great.
On Wed, 22 Feb, 2023, 2:11 am Andy Sayer, <andysayer_at_gmail.com> wrote:
You can set up resource manager to map different programs to different resource consumer groups. You’ll need to make sure you order the priority of mapping to make sure program is checked before service name (if that’s not the default). It’s all in dbms_resource_manager
Thanks,
Andy
On Tue, Feb 21, 2023 at 12:19 PM, yudhi s <learnerdatabase99_at_gmail.com> wrote:
--
Ilmar Kerm
--
Ilmar Kerm
-- http://www.freelists.org/webpage/oracle-lReceived on Wed Feb 22 2023 - 17:43:10 CET