RAC on Windows Server 2003 Itanium hangs at reboot
Date: Sun, 12 Oct 2008 18:35:47 -0500
Message-ID: <83a585ac0810121635x4cd812beh619316dae07a3016@mail.gmail.com>
Hello:
I have a two-node RAC on Windows Server 2003, with two private NICs teamed and two public NICs teamed. I'm also using multipath for the storage.
Here is the problem: when I reboot one node, it hangs on EVM with the screen stuck on "Applying computer settings". Event Viewer says:
<Quote>
The following service is taking more than 16 minutes to start and may be hung: OracleEVMService
Contact your system administrator or service vendor for approximate startup times for this service.
If you think this service might be slowing system response or logon time, talk to your system administrator about whether the service should be disabled until the problem is identified. </Quote>
The CSS log says node 2 is down, so the interconnect wasn't up at that time.
<CSS LOG>
[CSSD]2008-10-12 11:57:42.085 >USER: Oracle Database 10g CSS Release 10.2.0.2.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[clsdmt]Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61180))
[CSSD]2008-10-12 11:57:42.085 >USER: CSS daemon log for node sapprddb01, number 1, in cluster crs
[CSSD]2008-10-12 11:57:42.095 [2172] >TRACE: clssscmain: local-only set to false
[CSSD]2008-10-12 11:57:42.105 [2172] >TRACE: clssnmReadNodeInfo: added node 1 (sapprddb01) to cluster
[CSSD]2008-10-12 11:57:42.105 [2172] >TRACE: clssnmReadNodeInfo: added node 2 (sapprddb02) to cluster
[CSSD]2008-10-12 11:57:42.115 [3772] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56)
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmNMInitialize: diskShortTimeout set to (57000)ms
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0/X:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:42.115 [3776] >TRACE: clssnmvDPT: spawned for disk 0 (X:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1/W:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:42.115 [3780] >TRACE: clssnmvDPT: spawned for disk 1 (W:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2/V:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:42.115 [3784] >TRACE: clssnmvDPT: spawned for disk 2 (V:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:44.121 [3776] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0/X:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:44.121 [3780] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (1/W:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:44.121 [3804] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (X:\cdata\crs\votedsk) initial sleep interval (1000)ms
[CSSD]2008-10-12 11:57:44.121 [3808] >TRACE: clssnmvKillBlockThread: spawned for disk 1 (W:\cdata\crs\votedsk) initial sleep interval (1000)ms
[CSSD]2008-10-12 11:57:44.121 [3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110800) Disk lastSeqNo(14073)
[CSSD]2008-10-12 11:57:44.121 [3780] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110800) Disk lastSeqNo(14073)
[CSSD]2008-10-12 11:57:44.131 [3784] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (2/V:\cdata\crs\votedsk)
[CSSD]2008-10-12 11:57:44.131 [2172] >TRACE: clssnmFatalInit: fatal mode enabled
[CSSD]2008-10-12 11:57:44.131 [3812] >TRACE: clssnmvKillBlockThread: spawned for disk 2 (V:\cdata\crs\votedsk) initial sleep interval (1000)ms
[CSSD]2008-10-12 11:57:44.131 [3820] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[CSSD]2008-10-12 11:57:44.131 [3784] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110810) Disk lastSeqNo(14073)
[CSSD]2008-10-12 11:57:44.131 [3820] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=priv_sapprddb01)(PORT=49895))
[CSSD]2008-10-12 11:57:44.131 [3820] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[CSSD]2008-10-12 11:57:44.131 [3820] >TRACE: clssnmClusterListener: Probing node(2)
[CSSD]2008-10-12 11:57:44.141 [3828] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61101))
[CSSD]2008-10-12 11:57:44.141 [3840] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=1180)(HOST=192.168.4.1)(PORT=1056))
[CSSD]2008-10-12 11:57:45.123 [3780] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111800) Disk lastSeqNo(14074)
[CSSD]2008-10-12 11:57:45.123 [3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111800) Disk lastSeqNo(14074)
[CSSD]2008-10-12 11:57:45.133 [3784] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111810) Disk lastSeqNo(14074)
[CSSD]2008-10-12 11:57:46.126 [3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112800) Disk lastSeqNo(14075)
[CSSD]2008-10-12 11:57:46.126 [3780] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112800) Disk lastSeqNo(14075)
[CSSD]2008-10-12 11:57:46.136 [3784] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112810) Disk lastSeqNo(14075)
</CSS LOG>
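To pull the relevant numbers out of a CSS daemon log like this one, a small parser can help. The following is a minimal Python sketch, assuming the 10.2 trace-line format `[CSSD]<timestamp> [<thread>] >TRACE: ...`. It extracts the timeout settings from the `clssnmNMInitialize` lines and the per-thread `clssnmReadDskHeartbeat` records. `REBOOTTIME_S = 3` is an assumption (the commonly cited 10.2 default reboottime), used only to check the relation diskShortTimeout = (misscount - reboottime) x 1000 ms against the logged 57000 ms. The `wrtcnt` and `LATS` fields are extracted, not interpreted; `summarize` and `SAMPLE` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumption: 10.2 default reboottime in seconds; verify on your own cluster.
REBOOTTIME_S = 3

# clssnmNMInitialize lines, e.g. "misscount set to (60)".
TIMEOUT = re.compile(r"(misscount|diskShortTimeout|diskLongTimeout) set to \((\d+)\)")

# clssnmReadDskHeartbeat records, e.g. "[3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. ..."
HEARTBEAT = re.compile(
    r"\[(?P<tid>\d+)\]\s*>TRACE:\s*clssnmReadDskHeartbeat:\s*"
    r"node\((?P<node>\d+)\) is down\.\s*rcfg\((?P<rcfg>\d+)\)\s*"
    r"wrtcnt\((?P<wrtcnt>\d+)\)\s*LATS\((?P<lats>\d+)\)\s*"
    r"Disk lastSeqNo\((?P<seq>\d+)\)"
)

# A few lines copied from the log in this post.
SAMPLE = """
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56)
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmNMInitialize: diskShortTimeout set to (57000)ms
[CSSD]2008-10-12 11:57:42.115 [2172] >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms
[CSSD]2008-10-12 11:57:44.121 [3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14073) LATS(110800) Disk lastSeqNo(14073)
[CSSD]2008-10-12 11:57:45.123 [3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14074) LATS(111800) Disk lastSeqNo(14074)
[CSSD]2008-10-12 11:57:46.126 [3776] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(4) wrtcnt(14075) LATS(112800) Disk lastSeqNo(14075)
"""

def summarize(log_text):
    """Return (timeouts, heartbeats) where heartbeats maps thread id -> [(node, wrtcnt, lats)]."""
    # Collapse all whitespace so records wrapped across lines still match.
    text = " ".join(log_text.split())
    timeouts = {name: int(value) for name, value in TIMEOUT.findall(text)}
    beats = defaultdict(list)
    for m in HEARTBEAT.finditer(text):
        beats[int(m["tid"])].append((int(m["node"]), int(m["wrtcnt"]), int(m["lats"])))
    return timeouts, dict(beats)

if __name__ == "__main__":
    timeouts, beats = summarize(SAMPLE)
    print(timeouts)
    for tid, rows in beats.items():
        print(tid, rows)
    # Check the assumed relation between misscount, reboottime and the short disk timeout.
    print(timeouts["diskShortTimeout"] == (timeouts["misscount"] - REBOOTTIME_S) * 1000)
```

Run against the full log, each of the three voting-disk threads (3776, 3780, 3784) should show one `node(2) is down` record per second.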
<EVM LOG>
2008-10-12 11:57:47.289: [EVMD][3924]EVMD Starting
2008-10-12 11:57:47.309: [EVMD][3924] Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2008-10-12 11:57:48.382: [COMMCRS][3932]clsc_send_msg: (0000000006525C00) NS err (12571, 12560), transport (533, 57, 0)
</EVM LOG>
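The `COMMCRS` line reports a pair of Oracle Net error codes. In a minimal Python sketch, assuming the `NS err (a, b), transport (x, y, z)` layout seen here, the codes can be pulled out and mapped to their standard TNS message texts: 12571 is "TNS:packet writer failure" and 12560 is "TNS:protocol adapter error". The transport triple is adapter/OS specific and is deliberately left undecoded. `decode_ns_err` and `TNS_MESSAGES` are hypothetical names for illustration.

```python
import re

# Standard Oracle Net message texts for the two codes in the COMMCRS line.
TNS_MESSAGES = {
    12571: "TNS:packet writer failure",
    12560: "TNS:protocol adapter error",
}

# Assumed layout: "NS err (<code>, <code>), transport (<a>, <b>, <c>)".
NS_ERR = re.compile(
    r"NS err \((\d+),\s*(\d+)\),\s*transport \((\d+),\s*(\d+),\s*(\d+)\)"
)

def decode_ns_err(line):
    """Return the NS codes with their texts plus the raw transport triple, or None."""
    m = NS_ERR.search(line)
    if m is None:
        return None
    first, second, *transport = (int(g) for g in m.groups())
    return {
        "codes": [(c, TNS_MESSAGES.get(c, "unknown")) for c in (first, second)],
        # Transport values are adapter/OS specific; kept as-is, not interpreted.
        "transport": tuple(transport),
    }

if __name__ == "__main__":
    line = ("clsc_send_msg: (0000000006525C00) NS err (12571, 12560), "
            "transport (533, 57, 0)")
    print(decode_ns_err(line))
```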
The reboot was at 11:57 and the Windows event was posted at 12:14, but the node was still waiting at 12:43, when I reset it with the other node down. Then it came up. The other node I have to boot with the CRS services down, then start them manually, and then the cluster is up.
The interconnect works when both nodes are up and the CRS processes are started manually, so I'm guessing that the EVM process is starting before the NIC teaming is up.
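One way to test the guess that EVM starts before the teamed interconnect is ready is to leave the CRS services on Manual (as on the node already booted with them down) and start them only once the private address exists on a local adapter. Below is a minimal Python sketch under those assumptions. It treats "a socket can bind to the private IP" as a proxy for "the team adapter is up", using 192.168.4.1 from the `clssgmPeerListener` line in the CSS log (presumably node 1's private address). The service names in `CRS_SERVICES`, their start order, and the 300-second timeout are assumptions to check on your own nodes, not values from the post.

```python
import socket
import subprocess
import sys
import time

# Hypothetical settings: node 1's private address as seen in the CSS log, and the
# usual 10.2 CRS service names on Windows (verify names and order on your nodes).
PRIVATE_IP = "192.168.4.1"
CRS_SERVICES = ["OracleCSService", "OracleEVMService", "OracleCRSService"]

def address_is_local(ip):
    """True if a TCP socket can bind to ip, i.e. the address is on a local adapter."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))  # port 0: let the OS pick any free port
        except OSError:
            return False
    return True

def wait_for_address(ip, timeout_s=300.0, interval_s=5.0):
    """Poll until ip is bindable or timeout_s elapses; return whether it came up."""
    deadline = time.monotonic() + timeout_s
    while True:
        if address_is_local(ip):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

def start_commands(services):
    """Build one 'net start <service>' command per service, in the given order."""
    return [["net", "start", name] for name in services]

# Only act on Windows; elsewhere the functions can be imported and tested safely.
if __name__ == "__main__" and sys.platform == "win32":
    if wait_for_address(PRIVATE_IP):
        for cmd in start_commands(CRS_SERVICES):
            subprocess.run(cmd, check=False)
    else:
        print(f"{PRIVATE_IP} never appeared; CRS services not started")
```

Run at startup (for example from a Windows scheduled task), this would delay the CRS services until the private address is present; if the hang goes away, that supports the teaming-order theory.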
If I reboot both nodes at the same time with the CRS processes set to automatic, they evict themselves.
The problem only happens at reboot time.
Any suggestions?
Thanks in advance
Alfonso
--
Alfonso Leon

--
http://www.freelists.org/webpage/oracle-l

Received on Sun Oct 12 2008 - 18:35:47 CDT