OCFS2 1.4.3 single node failure mount freeze [message #477779]
Mon, 04 October 2010 07:44
thrixxxadmin
Messages: 3 Registered: October 2010
Hi there.
I've set up an OCFS2 cluster with 5 nodes; every node is a VM on a VMware ESX server.
The SCSI controller for every VM is set to Physical (disks can be shared between ESX servers).
I've created one LUN on our SAN and used "Raw Mapped Lun" to add the LUN to every VM.
Then I created one partition and formatted it with OCFS2.
For all timeouts I used the default settings:
Specify heartbeat dead threshold = 31
Specify network idle timeout in ms = 30000
Specify network keepalive delay in ms = 2000
Specify network reconnect delay in ms = 2000
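For reference, the defaults above can be converted to seconds with a small sketch. It assumes the rule given in the OCFS2 documentation, where a node writes a disk heartbeat every 2 seconds and the dead threshold counts heartbeat iterations, so the timeout is (threshold - 1) * 2 seconds. The function name and dictionary keys are illustrative, not o2cb identifiers.

```python
# Sketch: convert the O2CB timeout settings listed above into seconds.
# Assumption: one disk heartbeat every 2 s, as described in the OCFS2 docs.
HEARTBEAT_INTERVAL_S = 2


def heartbeat_dead_seconds(threshold: int) -> int:
    """Seconds without a disk heartbeat before a node is declared dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


# Values copied from the post; the key names are made up for this example.
settings = {
    "heartbeat_dead_threshold": 31,
    "network_idle_timeout_ms": 30000,
    "network_keepalive_delay_ms": 2000,
    "network_reconnect_delay_ms": 2000,
}

disk_timeout_s = heartbeat_dead_seconds(settings["heartbeat_dead_threshold"])
net_idle_s = settings["network_idle_timeout_ms"] // 1000

print(f"disk heartbeat timeout: {disk_timeout_s} s")
print(f"network idle timeout:   {net_idle_s} s")
```

With the defaults this gives a 60 s disk heartbeat timeout and a 30 s network idle timeout.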
Everything works OK, except sometimes when I power off one node for testing.
After the power-off I check the other nodes and look at the kernel messages. The node does get kicked out after 30 s, and after an additional 30 s another node jumps in as the recovery node.
However, when I start this node again, it freezes during the OCFS2 mount.
Kernel messages:
[ 207.393896] o2net: connected to node de-db02 (num 2) at 192.168.128.8:7777
[ 209.899312] ocfs2_dlm: Node 2 joins domain 6BECF48BA0524275BEA1E6F32AF9D756
[ 209.899315] ocfs2_dlm: Nodes in domain ("6BECF48BA0524275BEA1E6F32AF9D756"): 1 2 3 4 5
It looks like the node successfully joins the OCFS2 cluster, but the mount doesn't finish...
It hangs forever.
The only thing I can do is power off EVERY node and boot them one after another.
A clean restart is not a problem (with shutdown -r now), because OCFS2 then successfully leaves the cluster.
OCFS2 cluster.conf:
node:
	name = de-db01
	cluster = htdocs01
	number = 1
	ip_address = 192.168.128.7
	ip_port = 7777

node:
	name = de-db02
	cluster = htdocs01
	number = 2
	ip_address = 192.168.128.8
	ip_port = 7777

node:
	name = de-www01
	cluster = htdocs01
	number = 3
	ip_address = 192.168.128.110
	ip_port = 7777

node:
	name = de-www02
	cluster = htdocs01
	number = 4
	ip_address = 192.168.128.111
	ip_port = 7777

node:
	name = mail01
	cluster = htdocs01
	number = 5
	ip_address = 192.168.128.240
	ip_port = 7777

cluster:
	name = htdocs01
	node_count = 5
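
To rule out a plain configuration mistake, the file above can be checked mechanically. This is only a minimal sketch and not part of the OCFS2 tools: the helper names parse_cluster_conf and check_cluster are hypothetical, the parser ignores indentation, and it only verifies that node_count matches the number of node stanzas and that names, numbers, and address/port pairs are unique.

```python
# Sketch: sanity-check an o2cb cluster.conf like the one quoted above.
# parse_cluster_conf and check_cluster are hypothetical helper names.
SAMPLE = """\
node:
    name = de-db01
    cluster = htdocs01
    number = 1
    ip_address = 192.168.128.7
    ip_port = 7777
node:
    name = de-db02
    cluster = htdocs01
    number = 2
    ip_address = 192.168.128.8
    ip_port = 7777
node:
    name = de-www01
    cluster = htdocs01
    number = 3
    ip_address = 192.168.128.110
    ip_port = 7777
node:
    name = de-www02
    cluster = htdocs01
    number = 4
    ip_address = 192.168.128.111
    ip_port = 7777
node:
    name = mail01
    cluster = htdocs01
    number = 5
    ip_address = 192.168.128.240
    ip_port = 7777
cluster:
    name = htdocs01
    node_count = 5
"""


def parse_cluster_conf(text):
    """Return a list of stanzas, each a dict with a '_type' key."""
    stanzas, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):          # "node:" or "cluster:" opens a stanza
            current = {"_type": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def check_cluster(stanzas):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    nodes = [s for s in stanzas if s["_type"] == "node"]
    for cluster in (s for s in stanzas if s["_type"] == "cluster"):
        members = [n for n in nodes if n.get("cluster") == cluster.get("name")]
        declared = int(cluster.get("node_count", -1))
        if declared != len(members):
            problems.append(
                f"node_count = {declared}, but {len(members)} node stanzas found"
            )
    # Each of these fields must be unique across all node stanzas.
    for label, values in (
        ("name", [n.get("name") for n in nodes]),
        ("number", [n.get("number") for n in nodes]),
        ("ip_address/ip_port", [(n.get("ip_address"), n.get("ip_port")) for n in nodes]),
    ):
        if len(set(values)) != len(values):
            problems.append(f"duplicate {label} among node stanzas")
    return problems


problems = check_cluster(parse_cluster_conf(SAMPLE))
print("\n".join(problems) if problems else "no problems found")
```

For the configuration shown here the check reports no problems, so the file itself looks internally consistent.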
Thank you for any advice!
I've no idea what's wrong...