11.2.0.4 RAC ora.ons and ora.oc4j resources down
Date: Fri, 1 Nov 2013 09:41:52 -0500
Message-ID: <D95BD5AFADBB0F4E9BB6C53F14D3A05006BBE7B982_at_JRCEXC1V1.research.na.admworld.com>
(second attempt to send...first didn't appear to get to the list)
Oracle Linux 6
11.2.0.4 RAC 2 node (development) and 3 node (production)
I'm very new to RAC so pardon my leaving out relevant details.
2 days ago I received several alerts from EM12c indicating a state change for the ora.ons and ora.oc4j resources in both the production and development environments. Both resources in both environments ran into problems at basically the same time (11:00:04 - 11:01:02).
This hasn't affected the availability of either database. On initial investigation, I noticed many log messages indicating problems with NTP, so we switched to CTSS. I have no reason to believe that was related, but it is something we've changed since the problem occurred. Because this happened in both environments at the same time, I'm thinking it is something external to Oracle, or at least to RAC, that caused the problems, but I have no idea where to start looking. I've been through all sorts of log files, including the ones mentioned below, and nothing jumps out at me as relevant.
Can anyone get me started down some productive troubleshooting? ...or, even better, already know what my problem is?
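Since both environments failed inside the same one-minute window, one way to compare them is to cut each log down to that window. The following is only a minimal sketch: it assumes each log entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp (an assumption about the log format, not something confirmed above), and the function name lines_in_window is made up for illustration.

```python
import re
import sys
from datetime import time

# Assumed line prefix such as "YYYY-MM-DD HH:MM:SS.mmm: [ ...".
# This format is an assumption; adjust the pattern to the actual log.
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} (\d{2}):(\d{2}):(\d{2})")


def lines_in_window(lines, start, end):
    """Yield lines whose time of day falls within [start, end].

    Lines without a timestamp are treated as continuations of the
    most recent timestamped line and are kept or dropped with it.
    """
    keep = False
    for line in lines:
        m = TS_RE.match(line)
        if m:
            t = time(int(m.group(1)), int(m.group(2)), int(m.group(3)))
            keep = start <= t <= end
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python window.py /path/to/some.log
    # The window is padded a little around 11:00:04 - 11:01:02.
    with open(sys.argv[1], errors="replace") as fh:
        for hit in lines_in_window(fh, time(10, 58, 0), time(11, 3, 0)):
            sys.stdout.write(hit)
```

Running it against the same log on each node and comparing the output side by side may show whether the nodes saw a common external event. The filter ignores the date, so it is best applied to a log that covers only the day in question.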
"crsctl stat res -t" shows (dev):
ora.ons
ONLINE UNKNOWN admoract1n1 CHECK TIMED OUT
ONLINE UNKNOWN admoract1n2 CHECK TIMED OUT
ora.oc4j
1 ONLINE OFFLINE
"crsctl start resource -all" results in:
crsctl start resource -all
CRS-5702: Resource 'ora.DATA.dg' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.FRA.dg' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.LISTENER.lsnr' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.LISTENER_SCAN1.lsnr' is already running on 'admoract1n2'
CRS-5702: Resource 'ora.LISTENER_SCAN2.lsnr' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.LISTENER_SCAN3.lsnr' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.asm' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.LISTENER.lsnr' is already running on 'admoract1n1'
CRS-2501: Resource 'ora.gsd' is disabled
CRS-5702: Resource 'ora.admoract1n1.vip' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.asm' is already running on 'admoract1n2'
CRS-5702: Resource 'ora.LISTENER.lsnr' is already running on 'admoract1n2'
CRS-2501: Resource 'ora.gsd' is disabled
CRS-5702: Resource 'ora.admoract1n2.vip' is already running on 'admoract1n2'
CRS-5702: Resource 'ora.asm' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.cvu' is already running on 'admoract1n1'
CRS-2501: Resource 'ora.gsd' is disabled
CRS-5702: Resource 'ora.net1.network' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.oract1db.adm_dba.svc' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.oract1db.db' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.oract1db.grcl_712_dev.svc' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.oract1db.grcl_712_grmt1.svc' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.oract1db.grcl_712_test.svc' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.registry.acfs' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.scan1.vip' is already running on 'admoract1n2'
CRS-5702: Resource 'ora.scan2.vip' is already running on 'admoract1n1'
CRS-5702: Resource 'ora.scan3.vip' is already running on 'admoract1n1'
CRS-2679: Attempting to clean 'ora.ons' on 'admoract1n2'
CRS-2679: Attempting to clean 'ora.ons' on 'admoract1n1'
CRS-2672: Attempting to start 'ora.oc4j' on 'admoract1n2'
CRS-5014: Agent "/u01/app/11.2.0.4/grid/bin/oraagent.bin" timed out starting process "/u01/app/11.2.0.4/grid/opmn/bin/onsctli" for action "clean": details at "(:CLSN00009:)" in "/u01/app/11.2.0.4/grid/log/admoract1n2/agent/crsd/oraagent_grid/oraagent_grid.log"
CRS-5017: The resource action "ora.ons clean" encountered the following error:
(:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00106:)" in "/u01/app/11.2.0.4/grid/log/admoract1n2/agent/crsd/oraagent_grid/oraagent_grid.log".
CRS-5014: Agent "/u01/app/11.2.0.4/grid/bin/oraagent.bin" timed out starting process "/u01/app/11.2.0.4/grid/opmn/bin/onsctli" for action "clean": details at "(:CLSN00009:)" in "/u01/app/11.2.0.4/grid/log/admoract1n1/agent/crsd/oraagent_grid/oraagent_grid.log"
CRS-5017: The resource action "ora.ons clean" encountered the following error:
(:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00106:)" in "/u01/app/11.2.0.4/grid/log/admoract1n1/agent/crsd/oraagent_grid/oraagent_grid.log".
CRS-5014: Agent "/u01/app/11.2.0.4/grid/bin/oraagent.bin" timed out starting process "/u01/app/11.2.0.4/grid/opmn/bin/onsctli" for action "check": details at "(:CLSN00009:)" in "/u01/app/11.2.0.4/grid/log/admoract1n1/agent/crsd/oraagent_grid/oraagent_grid.log"
CRS-5017: The resource action "ora.ons check" encountered the following error:
(:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00109:)" in "/u01/app/11.2.0.4/grid/log/admoract1n1/agent/crsd/oraagent_grid/oraagent_grid.log".
CRS-5014: Agent "/u01/app/11.2.0.4/grid/bin/oraagent.bin" timed out starting process "/u01/app/11.2.0.4/grid/opmn/bin/onsctli" for action "check": details at "(:CLSN00009:)" in "/u01/app/11.2.0.4/grid/log/admoract1n2/agent/crsd/oraagent_grid/oraagent_grid.log"
CRS-5017: The resource action "ora.ons check" encountered the following error:
(:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00109:)" in "/u01/app/11.2.0.4/grid/log/admoract1n2/agent/crsd/oraagent_grid/oraagent_grid.log".
CRS-2680: Clean of 'ora.ons' on 'admoract1n1' failed
CRS-2503: Resource 'ora.ons' is in UNKNOWN state and must be stopped first
CRS-2680: Clean of 'ora.ons' on 'admoract1n2' failed
CRS-2503: Resource 'ora.ons' is in UNKNOWN state and must be stopped first
CRS-2674: Start of 'ora.oc4j' on 'admoract1n2' failed
CRS-2679: Attempting to clean 'ora.oc4j' on 'admoract1n2'
CRS-2681: Clean of 'ora.oc4j' on 'admoract1n2' succeeded
CRS-2563: Attempt to start resource 'ora.oc4j' on 'admoract1n2' has failed. Will re-retry on 'admoract1n1' now.
CRS-2672: Attempting to start 'ora.oc4j' on 'admoract1n1'
CRS-2674: Start of 'ora.oc4j' on 'admoract1n1' failed
CRS-2679: Attempting to clean 'ora.oc4j' on 'admoract1n1'
CRS-2681: Clean of 'ora.oc4j' on 'admoract1n1' succeeded
CRS-2632: There are no more servers to try to place resource 'ora.oc4j' on that would satisfy its placement policy
CRS-4000: Command Start failed, or completed with errors.
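Most of the output above is CRS-5702 ("already running") and CRS-2501 ("is disabled") noise. A rough sketch of a filter that splits pasted crsctl output into individual messages and drops those two codes might look like this. The function name interesting_messages and the choice of which codes count as noise are illustrative assumptions, not anything provided by crsctl.

```python
import re

# Split wherever a new "CRS-nnnn:" message begins, even mid-line.
MSG_SPLIT = re.compile(r"(?=CRS-\d{4,5}:)")
CODE_RE = re.compile(r"CRS-(\d{4,5}):")

# Codes treated as noise here (assumption based on the output above):
# 5702 = resource already running, 2501 = resource is disabled.
NOISE = {"5702", "2501"}


def interesting_messages(text):
    """Return CRS messages from text, one per entry, excluding NOISE codes."""
    kept = []
    for chunk in MSG_SPLIT.split(text):
        chunk = " ".join(chunk.split())  # collapse newlines and extra spaces
        m = CODE_RE.match(chunk)
        if m and m.group(1) not in NOISE:
            kept.append(chunk)
    return kept


if __name__ == "__main__":
    import sys
    # Usage: crsctl start resource -all 2>&1 | python crsfilter.py
    for msg in interesting_messages(sys.stdin.read()):
        print(msg)
```

What remains after filtering is the ora.ons clean/check timeouts in onsctli and the ora.oc4j start failures, which is where the investigation would need to focus.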
--
http://www.freelists.org/webpage/oracle-l
Received on Fri Nov 01 2013 - 15:41:52 CET