Re: 10.1.0.3 RAC on Solaris 8-9
Hi Daniel!
I don't think the NIC in use matters, because without CRS on our hosts Sun Cluster resynchronizes in a few seconds. The problem doesn't exist even when we use 10.1.0.2 CRS: reconfiguration takes 10-20 seconds. But when we apply patch 10.1.0.3, some weird things happen, even though no parameters have been changed. See our logs:
We are resetting the first node...
After about 30 seconds the second node recognizes that:
2004-09-28 13:21:41.648 [8] >WARNING: clssnmPollingThread: node(0) missed(29) checkin(s)
2004-09-28 13:21:42.658 [8] >WARNING: clssnmPollingThread: node(0) missed(30) checkin(s)
2004-09-28 13:21:43.659 [8] >WARNING: clssnmPollingThread: node(0) missed(31) checkin(s)
2004-09-28 13:21:44.668 [8] >WARNING: clssnmPollingThread: node(0) missed(32) checkin(s)
2004-09-28 13:21:45.678 [8] >WARNING: clssnmPollingThread: Eviction started for node 0, flags 0x0001, state 3, wt4c 0
So far so good: CRS is going to evict the problem node. Now it has to synchronize the cluster:
2004-09-28 13:21:50.729 [8] >TRACE: clssnmDoSyncUpdate: Initiating sync 3
2004-09-28 13:21:50.729 [4] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] seq[10] sync[3]
2004-09-28 13:21:51.208 [1] >USER: NMEVENT_SUSPEND [00][00][00][02]
Here Oracle hangs completely. What is going on during these 5 minutes?
2004-09-28 13:26:55.749 [8] >WARNING: clssnmWaitOnEvictions: Unconfirmed dead node count 1
2004-09-28 13:26:55.750 [4] >USER: clssnmHandleUpdate: SYNC(3) from node(1) completed
2004-09-28 13:26:55.750 [4] >USER: clssnmHandleUpdate: NODE(1) IS ACTIVE MEMBER OF CLUSTER
2004-09-28 13:26:56.330 [14] >USER: NMEVENT_RECONFIG [00][00][00][02]
Oracle continues to work.
2004-09-28 13:26:56.331 [7] >TRACE: clssgmPeerListener: connects done (1/1)
CLSS-3000: reconfiguration successful, incarnation 3 with 1 nodes
CLSS-3001: local node number 1, master node number 1
It looks like a new 300-second timeout was introduced in 10.1.0.3, but where can I change it?
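To put a number on the stall, here is a minimal Python sketch that measures the gaps between the key events in the log excerpts above. It is not part of any Oracle tooling: the event labels and the `gaps` helper are hypothetical names chosen for illustration, and the timestamps are simply copied from the log lines.

```python
from datetime import datetime

# Timestamp layout used in the CSS log lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# (label, timestamp) pairs copied from the log excerpts; labels are shorthand.
events = [
    ("eviction started", "2004-09-28 13:21:45.678"),
    ("sync 3 initiated", "2004-09-28 13:21:50.729"),
    ("NMEVENT_SUSPEND", "2004-09-28 13:21:51.208"),
    ("clssnmWaitOnEvictions warning", "2004-09-28 13:26:55.749"),
    ("NMEVENT_RECONFIG", "2004-09-28 13:26:56.330"),
]


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    parsed = [(label, datetime.strptime(ts, FMT)) for label, ts in events]
    return [
        (a[0], b[0], (b[1] - a[1]).total_seconds())
        for a, b in zip(parsed, parsed[1:])
    ]


for start, end, secs in gaps(events):
    print(f"{start} -> {end}: {secs:.3f} s")
```

The gap between NMEVENT_SUSPEND and the clssnmWaitOnEvictions warning comes out at roughly 304.5 seconds, which is what suggests a fixed wait of about 300 seconds rather than a slow but progressing reconfiguration.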
--
Alexey Sergeyev

Received on Mon Oct 04 2004 - 04:40:30 CDT

"Daniel Morgan" <damorgan_at_x.washington.edu> wrote in message news:1096519035.377922_at_yasure...
> Alexey Sergeyev wrote:
>
> > Hi
> >
> > Has anyone dealt with 10.1.0.3 RAC on Solaris 8 or 9? How long does a
> > cluster take to re-synchronize after a failure of one of the nodes? We
> > got an absolutely unexpected result - about 6 minutes...
> >
>
> Outrageous if properly configured. But let's look at the obvious question
> first. Who selected the NIC cards and are they certified for 10g RAC?
> The reason I ask is the "good" NIC cards have a keep-alive and try to
> reconnect. This is the worst possible thing to do with RAC. With RAC you
> want the cheapest dumbest cards you can find because you want a failure
> to kill the connection instantly. O/S may also be configured with a
> keep-alive so check that too.
>
> I routinely get sub-second fail-overs with RedHat Linux.
> --
> Daniel A. Morgan
> University of Washington
> damorgan_at_x.washington.edu
> (replace 'x' with 'u' to respond)
>