Re: Failover testing with 10g RAC

From: Bradd Piontek <piontekdd_at_gmail.com>
Date: Fri, 30 May 2008 10:48:35 -0500
Message-ID: <e9569ef30805300848l49fc70d9hfaf3324d6624e20@mail.gmail.com>


Jeff,
  Are the components you are failing redundant, for example multiple HBAs, switches, etc.? We had some issues in our fail-over testing related to Service Processor fail-over; they turned out to be caused by a Linux kernel issue involving the NMI watchdog (again, this was on Linux). Without redundancy in the components you mentioned, I would expect CRS to reboot the node. What are you using for OCR and Voting Disk?

-- 
Bradd Piontek
Twitter: http://www.twitter.com/piontekdd
Oracle Blog: http://piontekdd.blogspot.com
Linked In: http://www.linkedin.com/in/piontekdd
Last.fm: http://www.last.fm/user/piontekdd/

On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas <jeffthomas24_at_gmail.com> wrote:


> Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.
>
> We've been conducting failover testing -- disabling a HBA port, power
> off a switch, yank an IC link, etc.
>
> In every single case, CRS rebooted the server where the dire deed was
> performed, and when the server came back up, the repair was successful,
> e.g. failed over to the secondary HBA port, or the physical IP for the
> IPMP group floated to the standby NIC and so forth.
>
> The other server stayed up and all Oracle components remained
> available. During the switch power off test, the physical IP for the IC
> actually floated over to the standby NIC with no outage on this server.
>
> Is this what is to be expected? CRS will always reboot a server to repair
> itself when an underlying hardware failure is detected?
>
> Thanks,
> Jeff
> --
> http://www.freelists.org/webpage/oracle-l
Received on Fri May 30 2008 - 10:48:35 CDT
