RAC on Windows Server 2003 Itanium hangs at reboot

From: Alfonso León <aleon68_at_gmail.com>
Date: Sun, 12 Oct 2008 18:35:47 -0500
Message-ID: <83a585ac0810121635x4cd812beh619316dae07a3016@mail.gmail.com>


Hello:

I have a two-node RAC on Windows Server 2003, with two teamed private NICs and two teamed public NICs. I am also using multipath for storage.

Here is the thing: when I reboot one node, it hangs on EVT with the screen stuck on "Applying computer settings". Event Viewer says:

<Quote>
The following service is taking more than 16 minutes to start and may be hung: OracleEVMService

Contact your system administrator or service vendor for approximate startup times for this service.

If you think this service might be slowing system response or logon time, talk to your system administrator about whether the service should be disabled until the problem is identified. </Quote>

The OSS log says node 2 is unreachable, so the interconnect wasn't up at that time.

<OSS LOG>
CSSD 2008-10-12 11:57:42.085 >USER: Oracle Database 10g CSS Release 10.2.0.2.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
clsdmt Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61180))
CSSD 2008-10-12 11:57:42.085 >USER: CSS daemon log for node sapprddb01, number 1, in cluster crs
CSSD 2008-10-12 11:57:42.095 2172 >TRACE: clssscmain: local-only set to false
CSSD 2008-10-12 11:57:42.105 2172 >TRACE: clssnmReadNodeInfo: added node 1 (sapprddb01) to cluster
CSSD 2008-10-12 11:57:42.105 2172 >TRACE: clssnmReadNodeInfo: added node 2 (sapprddb02) to cluster
CSSD 2008-10-12 11:57:42.115 3772 >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56)
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnmNMInitialize: diskShortTimeout set to (57000)ms
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0/X:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:42.115 3776 >TRACE: clssnmvDPT: spawned for disk 0 (X:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1/W:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:42.115 3780 >TRACE: clssnmvDPT: spawned for disk 1 (W:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:42.115 2172 >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2/V:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:42.115 3784 >TRACE: clssnmvDPT: spawned for disk 2 (V:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:44.121 3776 >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0/X:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:44.121 3780 >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (1/W:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:44.121 3804 >TRACE: clssnmvKillBlockThread: spawned for disk 0 (X:\cdata\crs\votedsk) initial sleep interval (1000)ms
CSSD 2008-10-12 11:57:44.121 3808 >TRACE: clssnmvKillBlockThread: spawned for disk 1 (W:\cdata\crs\votedsk) initial sleep interval (1000)ms
CSSD 2008-10-12 11:57:44.121 3776 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110800) Disk lastSeqNo(14073)
CSSD 2008-10-12 11:57:44.121 3780 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110800) Disk lastSeqNo(14073)
CSSD 2008-10-12 11:57:44.131 3784 >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (2/V:\cdata\crs\votedsk)
CSSD 2008-10-12 11:57:44.131 2172 >TRACE: clssnmFatalInit: fatal mode enabled
CSSD 2008-10-12 11:57:44.131 3812 >TRACE: clssnmvKillBlockThread: spawned for disk 2 (V:\cdata\crs\votedsk) initial sleep interval (1000)ms
CSSD 2008-10-12 11:57:44.131 3820 >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
CSSD 2008-10-12 11:57:44.131 3784 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110810) Disk lastSeqNo(14073)
CSSD 2008-10-12 11:57:44.131 3820 >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=priv_sapprddb01)(PORT=49895))
CSSD 2008-10-12 11:57:44.131 3820 >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
CSSD 2008-10-12 11:57:44.131 3820 >TRACE: clssnmClusterListener: Probing node(2)
CSSD 2008-10-12 11:57:44.141 3828 >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=tcp)(HOST= 127.0.0.1)(PORT=61101))
CSSD 2008-10-12 11:57:44.141 3840 >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=1180)(HOST=192.168.4.1)(PORT=1056))
CSSD 2008-10-12 11:57:45.123 3780 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111800) Disk lastSeqNo(14074)
CSSD 2008-10-12 11:57:45.123 3776 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111800) Disk lastSeqNo(14074)
CSSD 2008-10-12 11:57:45.133 3784 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111810) Disk lastSeqNo(14074)
CSSD 2008-10-12 11:57:46.126 3776 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112800) Disk lastSeqNo(14075)
CSSD 2008-10-12 11:57:46.126 3780 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112800) Disk lastSeqNo(14075)
CSSD 2008-10-12 11:57:46.136 3784 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112810) Disk lastSeqNo(14075)
</OSS LOG>
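
For readers working through a longer copy of this trace, here is a minimal sketch of how the disk heartbeat entries could be pulled out and summarized per node. The regular expression is inferred only from the lines shown in this post and is not an official log format; the function name and the trimmed sample lines are illustrative.

```python
import re

# Pattern inferred from the trace lines in this post; not an official format.
HEARTBEAT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"clssnmReadDskHeartbeat: node\((?P<node>\d+)\) is down\. "
    r"rcfg\((?P<rcfg>\d+)\) wrtcnt\((?P<wrtcnt>\d+)\)"
)

def heartbeat_summary(lines):
    """Return {node: (first_wrtcnt, last_wrtcnt, entries)} for 'is down' lines."""
    summary = {}
    for line in lines:
        m = HEARTBEAT.search(line)
        if not m:
            continue  # not a disk heartbeat entry
        node, wrtcnt = int(m["node"]), int(m["wrtcnt"])
        first, _, count = summary.get(node, (wrtcnt, wrtcnt, 0))
        summary[node] = (first, wrtcnt, count + 1)
    return summary

# Three entries from the log above, trimmed to the part the pattern needs.
sample = [
    "CSSD 2008-10-12 11:57:44.121 3776 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073)",
    "CSSD 2008-10-12 11:57:45.123 3780 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074)",
    "CSSD 2008-10-12 11:57:46.126 3776 >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075)",
]
print(heartbeat_summary(sample))
```

Run over a full log, this shows at a glance whether the write count reported for the other node is advancing while the node is still flagged as down.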

<EVT LOG>
2008-10-12 11:57:47.289: EVMD 3924 EVMD Starting
2008-10-12 11:57:47.309: EVMD 3924 Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2008-10-12 11:57:48.382: COMMCRS 3932 clsc_send_msg: (0000000006525C00) NS err (12571, 12560), transport (533, 57, 0)
</EVT LOG>
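
As a reading aid for the last line of that log, the sketch below extracts the NS error pair and attaches the standard Oracle Net message text for those two codes. Only the two codes that appear here are listed, the transport triple is deliberately left undecoded, and the function name is illustrative.

```python
import re

# Standard Oracle Net message texts for the two codes in the EVMD log line.
# The transport triple (533, 57, 0) is not decoded here.
NS_ERRORS = {
    12571: "TNS:packet writer failure",
    12560: "TNS:protocol adapter error",
}

def decode_ns_err(line):
    """Return [(code, text)] for the 'NS err (a, b)' pair in a log line."""
    m = re.search(r"NS err \((\d+),\s*(\d+)\)", line)
    if not m:
        return []
    return [(int(c), NS_ERRORS.get(int(c), "unknown")) for c in m.groups()]

line = "clsc_send_msg: (0000000006525C00) NS err (12571, 12560), transport (533, 57, 0)"
for code, text in decode_ns_err(line):
    print(f"TNS-{code}: {text}")
```

Both messages point at the network transport rather than at EVMD itself, which fits an interconnect that is not yet usable.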

The reboot was at 11:57 and the Windows event was posted at 12:14, but the node was still waiting at 12:43, when I reset it with the other node down. Then it came up. I have to boot the other node with the OCR services down and then start them manually, and then the cluster is up.
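
The elapsed times behind that timeline can be checked with a few lines of arithmetic. This is only a sketch, assuming all three times fall on the same day; the helper name is illustrative.

```python
from datetime import datetime

def minutes_between(start, end, fmt="%H:%M"):
    """Whole minutes from start to end, both given as HH:MM on the same day."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

reboot, event, reset = "11:57", "12:14", "12:43"
print(minutes_between(reboot, event))  # 17
print(minutes_between(reboot, reset))  # 46
```

The 17 minutes to the event is consistent with the "more than 16 minutes" in the Event Viewer message, and the service was still not started 46 minutes after the reboot.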

The interconnect works when both nodes are up and the OCR processes are started manually, so I'm guessing that the EVT process is starting before the teaming of the networks occurs.
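
If that guess is right, one possible workaround is to leave the service on manual start and launch it from a startup script only once the private address is usable. The sketch below is not a tested fix. It assumes that 192.168.4.1 (taken from the clssgmPeerListener line) is this node's interconnect address, and all function names are hypothetical. Only OracleEVMService is named in this post; any other clusterware services and their start order would have to be added.

```python
import socket
import subprocess
import time

def address_is_bindable(ip):
    """True if a TCP socket can bind to ip, i.e. the address is configured locally."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))  # port 0 lets the OS pick any free port
            return True
        except OSError:
            return False

def wait_for_address(ip, timeout_s=300, interval_s=5, check=address_is_bindable):
    """Poll until check(ip) succeeds or timeout_s elapses; return True on success."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check(ip):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

def start_after_interconnect(ip, service):
    """Start a Windows service with 'net start' once ip is bindable."""
    if not wait_for_address(ip):
        return None  # interconnect never came up; leave the service stopped
    return subprocess.run(["net", "start", service]).returncode

# Example call for a startup task (assumed address, see the note above):
# start_after_interconnect("192.168.4.1", "OracleEVMService")
```

Binding to the address only shows that the teamed interface has its IP configured. It does not prove that the other node is reachable, so a connectivity check against the peer could be added as a second condition.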

If I try to reboot both nodes at the same time with the OCR processes set to automatic, they evict themselves.

The problem occurs only at reboot time.

Any suggestions?

Thanks in advance

Alfonso

-- 
Alfonso Leon

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Oct 12 2008 - 18:35:47 CDT
