RE: DMON killing RSM0?

From: Herring, Dave
Date: Wed, 17 Jun 2020 19:57:57 +0000
Message-ID: <CH2PR02MB66641AA040D6EBB10C8DFB1CD49A0_at_CH2PR02MB6664.namprd02.prod.outlook.com>



The RSM tracefiles have the following message in them:

krss_req_task_reg: Removing previously registered task BROKER WORKER for process RSM0

For the RSM process that eventually does not get killed, the tracefile also contains, in addition to the above, a series of messages like:

krsk_srl_access: Failed to open LNO:412: err=235

krsk_srl_access: Failed to open LNO:413: err=235

The "LNO" value increments with each line. I know that RSM attempts to have NSV contact the standby, so I checked the NSV tracefile; around the same time it generates messages like:

rfi_chk_ipmsg: Timeout in executing inter-instance message.
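To confirm the incrementing LNO pattern across a whole tracefile rather than by eye, a small script can pull the LNO values out of the krsk_srl_access lines quoted above. This is a minimal Python sketch that assumes the lines look exactly like those two samples; the helper names are hypothetical, and the script says nothing about what LNO or err=235 actually mean.

```python
import re
from typing import List

# Assumed format, copied from the RSM trace lines quoted above, e.g.
# "krsk_srl_access: Failed to open LNO:412: err=235"
SRL_PATTERN = re.compile(r"krsk_srl_access: Failed to open LNO:(\d+): err=(\d+)")


def failed_lnos(trace_lines: List[str]) -> List[int]:
    """Return the LNO values from krsk_srl_access failure lines, in file order."""
    lnos = []
    for line in trace_lines:
        match = SRL_PATTERN.search(line)
        if match:
            lnos.append(int(match.group(1)))
    return lnos


def increments_by_one(values: List[int]) -> bool:
    """True if each value is exactly one more than the value before it."""
    return all(b - a == 1 for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    sample = [
        "krsk_srl_access: Failed to open LNO:412: err=235",
        "krsk_srl_access: Failed to open LNO:413: err=235",
    ]
    lnos = failed_lnos(sample)
    print(lnos, increments_by_one(lnos))  # [412, 413] True
```

Run against a full tracefile (for example, `failed_lnos(open(path).readlines())`), a gap in the sequence would show up as `increments_by_one` returning False.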

ASH shows that the RSM processes created and eventually killed during this time range all wait on "kfk: async disk IO".

Regards,

Dave

From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> On Behalf Of Mladen Gogala
Sent: Saturday, June 13, 2020 9:51 AM
To: oracle-l_at_freelists.org
Subject: Re: DMON killing RSM0?


Hi Dave,

These errors are network timeout errors. RSM processes monitor the standby status. Oracle connects to the primary port, usually 1521, and then the connection is handed off to a dynamic port. Firewall settings sometimes block some of these ports. The default setting from an Oracle installation is something like:

net.ipv4.ip_local_port_range = 9000 65500

Your firewall may be configured to allow dynamic ports only between 32000 and 55000. The result is a situation in which Linux attempts to hand off the primary connection to a dynamic port that is blocked by the firewall. Each killed remote status monitor (RSM) process produces its own trace. Please check the trace; if you see something like "timeout on the port 55831", then you know there is some configuration you need to do. Here is a decent article about dynamic (local) ports:

https://blog.fpmurphy.com/2015/02/ip-dynamic-port-range.html
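The mismatch described above can be checked mechanically: compare the kernel's ephemeral range with the range the firewall allows, and flag any port number mentioned in a trace that falls outside the allowed range. A minimal Python sketch follows. The 32000 to 55000 firewall range is the hypothetical example from the text, the `port NNNNN` regex is only a guess at the trace wording, and reading `/proc/sys/net/ipv4/ip_local_port_range` works on Linux only.

```python
import re
from typing import List, Tuple

PortRange = Tuple[int, int]  # inclusive (low, high)

# Guess at trace wording such as "timeout on the port 55831" (assumption).
PORT_PATTERN = re.compile(r"port\s+(\d+)")


def parse_port_range(text: str) -> PortRange:
    """Parse "9000 65500" (the ip_local_port_range format) into (low, high)."""
    low, high = (int(x) for x in text.split())
    return low, high


def read_local_port_range(path: str = "/proc/sys/net/ipv4/ip_local_port_range") -> PortRange:
    """Read the kernel's ephemeral port range (Linux only)."""
    with open(path) as f:
        return parse_port_range(f.read())


def uncovered_ranges(ephemeral: PortRange, allowed: PortRange) -> List[PortRange]:
    """Parts of the ephemeral range that the firewall-allowed range does not cover."""
    e_low, e_high = ephemeral
    a_low, a_high = allowed
    gaps = []
    if e_low < a_low:  # ephemeral ports below the allowed window
        gaps.append((e_low, min(e_high, a_low - 1)))
    if e_high > a_high:  # ephemeral ports above the allowed window
        gaps.append((max(e_low, a_high + 1), e_high))
    return gaps


def blocked_ports(trace_text: str, allowed: PortRange) -> List[int]:
    """Port numbers mentioned in trace text that lie outside the allowed range."""
    low, high = allowed
    return [int(p) for p in PORT_PATTERN.findall(trace_text) if not low <= int(p) <= high]


if __name__ == "__main__":
    ephemeral = parse_port_range("9000 65500")  # setting quoted above
    firewall = (32000, 55000)                   # hypothetical firewall window
    print(uncovered_ranges(ephemeral, firewall))  # [(9000, 31999), (55001, 65500)]
    print(blocked_ports("timeout on the port 55831", firewall))  # [55831]
```

On a real host, `read_local_port_range()` replaces the hard-coded "9000 65500", and the firewall window would come from the actual firewall rules.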

Fortunately, you don't have to deal with a logical standby. Now, that would be fun for the whole family: in addition to the archive delivery and the status monitoring, there is also the SQL Apply process.

Regards
On 6/12/20 5:39 PM, Herring, Dave (Redacted sender HerringD for DMARC) wrote:

I have a situation where it looks like the DMON process is killing off RSM0 processes every night around the same time and I don't have a good explanation as to why. This is on a 4-node Exadata env running 18c with 6 dbs, all using DG (the standby is also a 4-node Exadata env).

Every night between 20:12 and 21:35 we get a series of ORA-16665 errors from all databases, recorded in the broker's logfile. Checking each db's alert log, I see messages like the following:

Process RSM0, PID = 51310, will be killed
Process termination requested for pid 51310 [source = rdbms], [info = 2] [request issued by pid: 76161, uid: 110]

SPID 76161 is DMON, which means every night DMON kills off RSM0 processes around the same time. This is done for all databases.
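With six databases on four nodes, tallying who issued each termination request is easier to do by script than by hand. The sketch below is a minimal Python example that extracts (killed pid, requesting pid) pairs from alert log lines shaped like the ones quoted above and counts requests per requesting process. The SPID-to-name mapping is a hypothetical input you would build yourself (for example, from the SPID and PNAME columns of V$PROCESS); the regex assumes the exact message wording shown.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed wording, taken from the alert log line quoted above:
# "Process termination requested for pid 51310 [source = rdbms], [info = 2]
#  [request issued by pid: 76161, uid: 110]"
TERMINATION = re.compile(
    r"Process termination requested for pid (\d+) .*?\[request issued by pid: (\d+), uid: \d+\]"
)


def termination_requests(alert_lines: List[str]) -> List[Tuple[int, int]]:
    """Return (killed_pid, requesting_pid) pairs found in alert log lines."""
    pairs = []
    for line in alert_lines:
        match = TERMINATION.search(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def requests_by_process(pairs: List[Tuple[int, int]], spid_to_name: Dict[int, str]) -> Counter:
    """Count kill requests per requesting process; unknown SPIDs stay as numbers."""
    return Counter(spid_to_name.get(requester, str(requester)) for _, requester in pairs)


if __name__ == "__main__":
    lines = [
        "Process RSM0, PID = 51310, will be killed",
        "Process termination requested for pid 51310 [source = rdbms], [info = 2] "
        "[request issued by pid: 76161, uid: 110]",
    ]
    pairs = termination_requests(lines)
    print(pairs)  # [(51310, 76161)]
    print(requests_by_process(pairs, {76161: "DMON"}))  # Counter({'DMON': 1})
```

Run per database over the 20:12 to 21:35 window, a result dominated by DMON would confirm that the broker itself is issuing the kills.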

Is there a DG broker setting that says to wipe out all DGB resource processes and restart them?

Regards,

Dave

--

Mladen Gogala

Database Consultant

Tel: (347) 321-1217

--

http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 17 2020 - 21:57:57 CEST
