Re: Controlling load

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Wed, 22 Feb 2023 18:02:03 +0530
Message-ID: <CAEzWdqck4vBH=M07-GWTK=+KYG08i1jwq6GdyUEqwc=pvoKKkQ_at_mail.gmail.com>



Thank you Ilmar.

Going through the example which is in the doc - 848666.1, and its making the consumer group based on the DB username. In our case, these two programs are using exactly same osuser, Username, schemaname, service_name. And the client_id,action is populated as null for both of these. The only difference is program column i.e. for batch it looks something like "batch_prog_at_vm000123 (TNS V1-V3)" and for online "online_prog_at_vm000123 (TNS V1-V3)", and ~40+ different "VM******" machines from which these are coming from. So wondering, would we still be able to create different resource consumer group , just based on this substring of program column?

Additionally, to get the estimate of the CAP/Number of session which we should put for the BATCH programs "resource consumer group". How we should do that? We have total 48*2=96 cores per node. I tried getting the TOTAL AAS aginst those two programs side by side as below. If we eliminate the peak/issue period , the avg active sessions per minute is staying <10 for both the cases. Should we keep it as 40 as the limit for prog_batch?

https://gist.github.com/databasetech0073/dde4804a030814c93fc2111616cd2407

Regards
Yudhi

On Wed, 22 Feb, 2023, 3:12 pm Ilmar Kerm, <ilmar.kerm_at_gmail.com> wrote:

> You could try:
>
> https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_RESOURCE_MANAGER.html#GUID-8EC6C735-338D-46D4-B346-AD16D0622B30
> active_sess_pool_p1 - Specifies maximum number of sessions that can
> currently have an active call
> queueing_p1 - Specified time (in seconds) after which a call in the
> inactive session queue (waiting for execution) will time out. Default is
> NULL, which means unlimited.
>
> Example in: How to Limit max number of concurrently active sessions with
> Resource Manager (Doc ID 848666.1)
>
> On Wed, Feb 22, 2023 at 10:42 AM yudhi s <learnerdatabase99_at_gmail.com>
> wrote:
>
>> Thank you for the response.
>>
>> As i mentioned we have 48 core CPU with 2 socket each for a node, so
>> Total ~96 core per node. And the OEM also its showing showing the red line
>> bar on AAS as 96 as Max limit. But during three different issue period we
>> see the below AAS shoot ups above 96 and the wait distribution,
>> Basically its combination of wait event from classes like concurrency,
>> configuration and others but CPU part is minimum.
>>
>> 1) AAS shoot up to - 180, out of which 7% CPU and 34% (buffer busy
>> waits), ~20% Log file switch checkpoint incomplete, 16% enq:tx allocate ITL
>> entry, 12% Others.
>>
>> 2) AAS shoot up to - 515, out of which 4% CPU and 46% (library cache
>> lock), ~17% library cache mutex, 12% enq:tx allocate ITL entry entry, 12%
>> Others.
>>
>> 3) AAS shoot up to - 120, out of which 9% CPU and 30% (buffer busy
>> waits), ~13% Log file switch checkpoint incomplete, 31% enq:tx allocate ITL
>> entry entry, 8% BCT buffer space.
>>
>> But i think you are correct putting CAP on CPU using DB resource manager
>> may not going to help us in above situation as cpu portion is less than
>> 10%. However, i was thinking if any possible way to put cap on the AAS or
>> the number of session those were spawning from the program prog_batch as
>> because they are the one causing the chaos by submitting lot many session
>> simultaneously. So is that possible with the DB resource manager
>> configuration or by tweaking the existing services in anyway?
>>
>> Regards
>> Yudhi
>>
>>
>> On Wed, 22 Feb, 2023, 12:54 pm Ilmar Kerm, <ilmar.kerm_at_gmail.com> wrote:
>>
>>> Resource manager can help you if you are seeing resource shortage (CPU
>>> shortage) during these periods and CPU shortage is causing the Concurrency
>>> waits.
>>> If you are not seeing CPU shortage, then you need traditional database
>>> troubleshooting methods and resource manager will not help you.
>>> For example if you are behind in 19c patching, then for example before
>>> 19.15 there was bug 32898678, 33541865 (fixed in 19.17) and multiple others
>>> that caused block change tracking (BCT) to freeze the database or just high
>>> BCT buffer space waits.
>>>
>>> On Wed, Feb 22, 2023 at 6:08 AM yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> Thank you Andy.
>>>>
>>>> I have never used the DB resource consumer group in our databases
>>>> though. But going through the doc, it seems it can be set for many number
>>>> of input parameters and also on many resources. Yet to test it , however In
>>>> most cases I see it's set on amount of cpu. So in our case will it be good
>>>> idea to determine the cpu percentage and define the plan or we can do it on
>>>> number of average active sessions which will be again hardcoded value only.
>>>>
>>>> We need to set it for multiple input program values in our case. If
>>>> you can pass on reference to some sample doc with its exact code
>>>> implementation, that would be great.
>>>>
>>>> On Wed, 22 Feb, 2023, 2:11 am Andy Sayer, <andysayer_at_gmail.com> wrote:
>>>>
>>>>> You can set up resource manager to map different programs to different
>>>>> resource consumer groups. You’ll need to make sure you order the priority
>>>>> of mapping to make sure program is checked before service name (if that’s
>>>>> not the default). It’s all in dbms_resource_manager
>>>>>
>>>>> Thanks,
>>>>> Andy
>>>>>
>>>>> On Tue, Feb 21, 2023 at 12:19 PM, yudhi s <learnerdatabase99_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Listers,
>>>>>> On a normal day, we see sessions/AAS per ~1 minute interval as ~50 in
>>>>>> this database. and they are mainly from two specific 'programs' (say e.g.
>>>>>> prog_batch, prog_online). But during certain periods of the day we saw a
>>>>>> large number of sessions with programs as 'prog_batch' (i.e. with AAS>~500)
>>>>>> flooding the database and we are seeing large concurrency waits, bct buffer
>>>>>> space waits all over the database. And its impacting critical sessions of
>>>>>> other programs i.e. prog_online. So considering 'prog_batch' is meant for
>>>>>> batch kind of processes and it's okay for those sessions to queue-up and
>>>>>> run a few minutes longer, but at the same time we can't afford to impact
>>>>>> the sessions from prog_online, as those are latency sensitive online users.
>>>>>> But we found both of these programs are pointing to the same database
>>>>>> services and running on the same node-1. So , apart from controlling the
>>>>>> number of sessions/connections from the application end, can we do some
>>>>>> easy fixes with regards to the service config level, which will better
>>>>>> control the incoming load? Or say putting some cap on the incoming sessions
>>>>>> from prog_batch?
>>>>>> This is a 19C database with 2 node RAC and its an Exadata machine.
>>>>>> Each node is a 48 core, 2 socket. Both the programs were using the same
>>>>>> service and having preferred nodes as node-1.
>>>>>>
>>>>>> Regards
>>>>>> Yudhi
>>>>>>
>>>>>
>>>
>>> --
>>> Ilmar Kerm
>>>
>>
>
> --
> Ilmar Kerm
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 22 2023 - 13:32:03 CET

Original text of this message