Re: Doubt about timeout between nodes of cluster

From: Waldirio Manhães Pinheiro <waldirio_at_gmail.com>
Date: Thu, 12 Jun 2008 21:22:19 -0300
Message-ID: <7df9f1820806121722l6afcf947o73974cc2a6adfd8b@mail.gmail.com>

Hello Riyaj

I'm reinstalling the operating system on the machines now, and tomorrow I'll reinstall Oracle RAC. This time I'll first check the crsd logs with the default settings and only then try tuning.

Thanks again.

PS: In general, how long should it take from the moment the first node stops until the second node brings up the first node's VIP interface?
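
One rough way to answer that from the logs: the sketch below (Python, an illustration only, not an Oracle utility) computes the gaps between the four crsd.log timestamps Riyaj quoted. The event labels and the helper names `gaps` and `total` are made up for this example; only the timestamps come from the log.

```python
from datetime import datetime

# Timestamps copied from the crsd.log excerpt quoted in this thread.
# The short labels are my own summaries, not Oracle terminology.
EVENTS = [
    ("2008-06-12 14:19:15.781", "CLSC recv failure"),
    ("2008-06-12 14:19:42.464", "CLSC send failure"),
    ("2008-06-12 14:19:46.036", "proath_init: expect a rcfg"),
    ("2008-06-12 14:20:12.283", "new OCR master"),
]

FMT = "%Y-%m-%d %H:%M:%S.%f"


def gaps(events):
    """Return (label, seconds since the previous event) for each event after the first."""
    times = [datetime.strptime(ts, FMT) for ts, _ in events]
    return [
        (events[i][1], (times[i] - times[i - 1]).total_seconds())
        for i in range(1, len(events))
    ]


def total(events):
    """Return the seconds elapsed between the first and the last event."""
    first = datetime.strptime(events[0][0], FMT)
    last = datetime.strptime(events[-1][0], FMT)
    return (last - first).total_seconds()


if __name__ == "__main__":
    for label, secs in gaps(EVENTS):
        print(f"{secs:7.3f}s  until {label}")
    print(f"{total(EVENTS):7.3f}s  total")
```

With those four lines, the span from the first CLSC recv failure to the new OCR master comes to a little under a minute. Whatever remains of the 1 to 2 minutes I observed would still have to be checked against cssd.log and the VIP resource logs.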

Good Night All.
Waldirio

2008/6/12 Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com>:

> Hello Waldirio
> Breaking down crsd.log: approximately 30 seconds were spent on the CLSC
> recv/send failures. The CSS misscount parameter is set to 30 on Unix
> platforms. I would say misscount is controlling this duration, but that needs
> to be validated by enabling further tracing and looking at cssd.log, if you want.
>
> 2008-06-12 14:19:15.781: [ OCRMSG][1484962144]prom_rpc: CLSC recv failure..ret code 7
> 2008-06-12 14:19:42.464: [ OCRMSG][1484962144]prom_rpc: CLSC send failure..ret code 6
>
> Another 26 seconds were spent in the cluster reconfiguration below:
>
> 2008-06-12 14:19:46.036: [ OCRSRV][2541411904]proath_init: Failed to retrieve pubdata. Expect a rcfg
> 2008-06-12 14:20:12.283: [ OCRMAS][1210108256]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 1
>
> Changing these parameters has a profound effect on availability, especially
> if the network architecture is not good enough.
>
> Cheers
> Riyaj Shamsudeen
> The Pythian Group www.pythian.com
> Personal blog: orainternals.wordpress.com
>
> Waldirio Manhães Pinheiro wrote:
>
>> Hello Friend
>> Thank you for the answer; let's check.
>> 2008/6/12, Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com>:
>>
>> Hello Waldirio
>> >> the time to the first machine detect the second machine
>> powered off is very big (between 1 and 2 min),
>> How are you measuring this time? Are you checking alert log or
>> are you using DB connections to check it?
>>
>> I measured from the moment I sent the shutdown to the server until the
>> second VIP interface came up on the second node (the backup node).
>>
>> Can you also send crsd.log?
>>
>> OK, here is a link to it, because of the size:
>> http://rafb.net/p/hqE13995.html
>> When I power off the first node, the second node (crsd log at the link
>> above) logs on line 1 the message "[ COMMCRS][1147169120]clsc_receive:
>> (0xc6d180) Error receiving, ns (12535, 12560), transport (505, 110, 0)" and
>> keeps logging "Connection not active" until line 2045.
>> PS: Now the VIP address of the first node no longer migrates to the second
>> node after a power-off (it may be necessary to reinstall the OS and Oracle
>> Clusterware, because I've changed the system a lot while testing).
>>
>> Further, refer to $CRS_HOME/bin/racgvip; a few parameters there,
>> such as check interval and restart attempts, also control the behavior
>> of VIP failover. I'm not sure they apply when the machine is
>> rebooted, since the heartbeat will fail before the VIP check.
>>
>> Yes, I checked this file too, but didn't change it.
>> Now, looking at the crsd log file, I believe Oracle knows when another
>> node is out, but what is responsible for performing the failover (bringing
>> up the VIP aliases on another machine)? (Script, Daemon, Angel :P )
>> Thank you friends for the help.
>> Waldirio
>>
>> Cheers
>> Riyaj Shamsudeen
>> The Pythian Group www.pythian.com
>> Personal blog: orainternals.wordpress.com
>>
>> Waldirio Manhães Pinheiro wrote:
>>
>> Hello Friends
>> I'd like to ask about Oracle RAC in a Linux environment. I
>> installed two machines with Red Hat AS 4 Update 5 and Oracle 10.2.0.3
>> with Clusterware. The installation finished successfully and the
>> database works fine.
>> I checked the availability of my environment with the test below:
>> Station cambeba UP
>> Station cangua UP
>> # crs_stat -t
>> Name            Type         Target  State   Host
>> ------------------------------------------------------------
>> ora....BA.lsnr  application  ONLINE  ONLINE  cambeba
>> ora....eba.gsd  application  ONLINE  ONLINE  cambeba
>> ora....eba.ons  application  ONLINE  ONLINE  cambeba
>> ora....eba.vip  application  ONLINE  ONLINE  cambeba
>> ora....UA.lsnr  application  ONLINE  ONLINE  cangua
>> ora.cangua.gsd  application  ONLINE  ONLINE  cangua
>> ora.cangua.ons  application  ONLINE  ONLINE  cangua
>> ora.cangua.vip  application  ONLINE  ONLINE  cangua
>> ora.ora10gq.db  application  ONLINE  ONLINE  cangua
>> ora....q1.inst  application  ONLINE  ONLINE  cangua
>> ora....q2.inst  application  ONLINE  ONLINE  cambeba
>> At this point everything is fine, but when I force a power-off of
>> cangua or cambeba (the names of my machines), the time for the
>> first machine to detect that the second machine is powered off is very long
>> (between 1 and 2 min), so a client that was working will lose
>> its query to a timeout.
>> I changed the configuration of the ora.cambeba.vip and
>> ora.cangua.vip resources, but without success.
>> Any idea how to fix this problem (decrease the check interval
>> between the nodes of the cluster)?
>> PS: I searched the list archive, but found nothing about
>> this problem.
>>
>> Thanks in advance.
>> -- ______________
>> Best regards
>> Waldirio
>> msn: wmp_at_sinope.com.br
>> Site: www.waldirio.com.br
>> Blog: blog.waldirio.com.br
>> PGP: www.waldirio.com.br/public.html
>>
>>
>>
>>
>>
>> --
>> ______________
>> Best regards
>> Waldirio
>> msn: wmp_at_sinope.com.br
>> Site: www.waldirio.com.br
>> Blog: blog.waldirio.com.br
>> PGP: www.waldirio.com.br/public.html
>>
>
>

-- 
______________
Best regards
Waldirio
msn: wmp_at_sinope.com.br
Site: www.waldirio.com.br
Blog: blog.waldirio.com.br
PGP: www.waldirio.com.br/public.html

--
http://www.freelists.org/webpage/oracle-l

Received on Thu Jun 12 2008 - 19:22:19 CDT

In reply to: Riyaj Shamsudeen: "Re: Doubt about timeout between nodes of cluster"
Next in thread: Dan Norris: "Re: Doubt about timeout between nodes of cluster"
