Re: RAC unexpected reboot of nodes
alek wrote:
> Hi,
>
> I'm quite new to RAC and I want to know whether the following
> behavior is normal for such a configuration:
>
> A few weeks ago we succeeded in configuring an Oracle 10.2.0.1 cluster.
> The configuration comprised 2 nodes and the underlying OS was
> Redhat AS4. The installation went well, following all the installation
> steps given in the official Oracle documentation. The OCR and the
> voting disks were configured using NFS. At that time we noticed that
> from time to time one of the nodes (not always the same one) was
> unexpectedly rebooted. The system and Oracle logs didn't offer any
> clues, so our conclusion was that NFS might be causing the problems.
> To test this we decided to configure RAC on a single node
> just for testing purposes. The OCR, voting disks and the Oracle
> software were installed on OCFS2 partitions, so no NFS was
> involved. On this node we configured 2 Oracle instances, which worked
> fine for a while, but from time to time, or when the server is stressed
> with intensive SQL, the entire server reboots. After some searching
> on Metalink we found Bug 4741921/4556989 (36) INSTANCE
> RESTARTED AFTER SHUTDOWN ABORT IN RAC ENVIRONMENT, which is fixed in
> the 10.2.0.2 patch. We downloaded and installed the patch, but
> the strange behavior is still there. We did notice that the
> frequency of the server reboots is lower now, but we have no explanation
> for what really causes them.
> Has anyone noticed the same behavior on a 10.2.0.x RAC configuration?
> Are there any workarounds for this?
>
> Many thanks.
I have never seen the node ejection and rebooting behaviour reported by hpuxrac and others, but then I do all of my work on NetApps with the connection string supplied by NetApp.
I would suggest you monitor the network for outages, as the behaviour you describe is expected if, for some reason, the Oracle clusterware believes it can no longer see a resource. That resource might be the public network, the memory interconnect, or the storage device.
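As a rough illustration of that kind of monitoring, here is a minimal Python sketch that polls a few endpoints over TCP and collapses the results into outage intervals you can compare against the reboot times. The host names, addresses, and ports in `TARGETS` are hypothetical placeholders, and a TCP connect is only a stand-in for what the clusterware actually checks (it does not reproduce the clusterware heartbeat), so treat this as a starting point rather than a diagnostic tool.

```python
import socket
import time
from datetime import datetime

# Hypothetical endpoints: replace with your own interconnect peer and filer.
TARGETS = {
    "interconnect-peer": ("10.0.0.2", 22),   # assumed private address, ssh port
    "nfs-filer": ("filer1", 2049),           # assumed host name, NFS port
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage intervals.

    An outage that is still open at the last sample is reported with end=None.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # first failed probe opens an outage
        elif ok and start is not None:
            outages.append((start, ts))     # first good probe closes it
            start = None
    if start is not None:
        outages.append((start, None))
    return outages


def monitor(targets, interval=5.0, rounds=12):
    """Probe every target once per interval and return samples per target."""
    history = {name: [] for name in targets}
    for _ in range(rounds):
        for name, (host, port) in targets.items():
            history[name].append((datetime.now(), is_reachable(host, port)))
        time.sleep(interval)
    return history


if __name__ == "__main__":
    for name, samples in monitor(TARGETS).items():
        for start, end in find_outages(samples):
            print(f"{name}: unreachable from {start} until {end or 'end of run'}")
```

If the outage intervals line up with the reboot times on either node, that points at the network or storage path rather than at the database itself.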
--
Daniel A. Morgan
http://www.psoug.org
damorgan_at_x.washington.edu
(replace x with u to respond)

Received on Sun Mar 12 2006 - 15:26:08 CST