Split-brain among HACMP cluster and Oracle9RAC
Background:
Part of our production environment is based on RS/6000 technology, with
HACMP and Oracle9RAC as products on top. We have 4 p570s (4-way),
running AIX 5.3 ML03, HACMP version 5.2 and Oracle RAC version 9.2.0.7.
These machines are spread across 2 server rooms (about 300 meters
apart). HACMP is configured with concurrent disk access for Oracle
db files on raw devices. We have also configured HACMP with both IP and
NON-IP heartbeat (NON-IP heartbeat over SAN disks). Oracle's
interconnects are configured as part of the HACMP configuration. In total
there are about 20 databases and 80 instances.
My problem:
During a test failover (the network in one server room goes down) I
observed that all Oracle databases went into a "frozen" state. As far
as I know, this is not correct behaviour. I am having trouble finding out
why, but my guess is that Oracle is waiting for a "network down" or
"node down" event from HACMP before it takes any action. That event will
never arrive, because in this situation HACMP is still talking to all 4
nodes over the NON-IP network on the SAN disks. When I shut down the 2
"isolated" machines, all Oracle databases went down (lmon died). I had to
start all databases manually on the 2 "surviving" nodes. After startup I
could access the databases as normal.
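To make the guess above concrete, here is a toy Python model of the two membership views. It is only a sketch: the node names, room names and link rules are made up for illustration, and it assumes that Oracle learns about its peers over the IP interconnect only. It is not how HACMP or Oracle actually compute membership.

```python
# Toy model only: node names, rooms and the link rules are assumptions,
# not taken from HACMP or Oracle.
from itertools import combinations

ROOM_OF = {"node1": "roomA", "node2": "roomA", "node3": "roomB", "node4": "roomB"}


def ip_links(failed_room=None):
    """IP links between every node pair, minus nodes whose room lost its network."""
    up = [n for n, room in ROOM_OF.items() if room != failed_room]
    return set(combinations(sorted(up), 2))


def disk_links():
    """Disk (NON-IP) heartbeat links over the SAN; unaffected by an IP outage."""
    return set(combinations(sorted(ROOM_OF), 2))


def members_seen(node, links):
    """Nodes reachable from `node` over the given undirected links."""
    seen, todo = {node}, [node]
    while todo:
        cur = todo.pop()
        for a, b in links:
            if cur in (a, b):
                other = b if cur == a else a
                if other not in seen:
                    seen.add(other)
                    todo.append(other)
    return seen


def hacmp_view(node, failed_room=None):
    """HACMP heartbeats over IP and over the SAN disks."""
    return members_seen(node, ip_links(failed_room) | disk_links())


def oracle_view(node, failed_room=None):
    """Assumed: Oracle only learns about peers over the IP interconnect."""
    return members_seen(node, ip_links(failed_room))


if __name__ == "__main__":
    # Network down in roomB: compare what each layer sees from every node.
    for n in sorted(ROOM_OF):
        print(n, "HACMP:", sorted(hacmp_view(n, "roomB")),
              "Oracle:", sorted(oracle_view(n, "roomB")))
```

In this model, with the network in one room down, the HACMP view from every node still contains all 4 nodes, while the interconnect view is split. That mismatch is what I suspect leaves Oracle waiting.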
I have been in contact with Oracle Support, and they say: "The configuration is insane. The fix is to configure the clusterware heartbeat and the oracle heartbeat on the same network. HACMP and our clusterware must see the same view of the cluster."
But what about the NON-IP heartbeat? HACMP MUST be configured to heartbeat over both IP and NON-IP networks to avoid a split cluster, and to avoid disk/data corruption.
I don't think we are the only customer running AIX, HACMP, concurrent disk access on raw devices and Oracle9RAC. Therefore I hope that you or somebody else can help me resolve this issue.
I have opened a service request with both Oracle Support and IBM Support, and I hope that somebody can help solve this issue. But each party blames the other's product....
Any ideas? Should I add some custom action in HACMP to disable the NON-IP disk heartbeat network if this happens? Sounds like a lot of shampoo for a bald head... I presume this should work more or less "out of the box", since the product certification matrix says the combination is OK..? (Yes, I know HACMP is not an out-of-the-box product; I think I have pretty good control of my HACMP.)
Any ideas?
Thanks for your time, and thanks in advance!
ArneS
Received on Thu Sep 21 2006 - 11:29:09 CDT