Re: Failover testing with 10g RAC

From: Bradd Piontek <piontekdd_at_gmail.com>
Date: Fri, 30 May 2008 10:48:35 -0500
Message-ID: <e9569ef30805300848l49fc70d9hfaf3324d6624e20@mail.gmail.com>

Jeff,
Are the components you are failing redundant, for example multiple HBAs, switches, etc.? We had some issues in our failover testing with Service Processor failover, which turned out to be caused by a Linux kernel issue and NMI watchdog processes (note that this was on Linux). Without redundancy in the components you mentioned, I would expect CRS to reboot the node. What are you using for the OCR and voting disk?

-- 
Bradd Piontek
Twitter: http://www.twitter.com/piontekdd
Oracle Blog: http://piontekdd.blogspot.com
Linked In: http://www.linkedin.com/in/piontekdd
Last.fm: http://www.last.fm/user/piontekdd/

On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas <jeffthomas24_at_gmail.com> wrote:

> Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.
>
> We've been conducting failover testing -- disabling a HBA port, power
> off a switch, yank an IC link, etc.
>
> In every single case, CRS rebooted the server where the dire deed was
> performed, and when the server came back up, the repair was successful,
> e.g. failed over to the secondary HBA port, or the physical IP for the
> IPMP group floated to the standby NIC and so forth.
>
> The other server stayed up and all Oracle components remained
> available. During the switch power off test, the physical IP for the
> IC actually floated over to the standby NIC with no outage on this
> server.
>
> Is this what is to be expected? CRS will always reboot a server to
> repair itself when an underlying hardware failure is detected?
>
> Thanks,
> Jeff
> --
> http://www.freelists.org/webpage/oracle-l


Received on Fri May 30 2008 - 10:48:35 CDT
