Oracle-L: RE: clustering

From: Balakrishnan, Ashok - VSCM <Balakrishnan.Ashok_at_vectorscm.com>
Date: Tue, 29 Jul 2003 14:09:28 -0800
Message-ID: <F001.005C7B28.20030729140928@fatcity.com>

We used to experience problems in our RAC environment whenever there was an interconnect failure. There is a workaround for that problem that worked for us:

Create a directory under $ORACLE_HOME/rdbms called ".aixopt". Create (touch) a file called SUSTAIN_IPC_FAILURE (uppercase - 0 byte file).
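
For anyone who wants to script this step, here is a minimal sketch in Python. It assumes the flag file belongs inside the new .aixopt directory (the most natural reading of the instructions above); the function name enable_sustain_ipc_failure is made up for illustration and is not an Oracle utility.

```python
from pathlib import Path


def enable_sustain_ipc_failure(oracle_home: str) -> Path:
    """Create $ORACLE_HOME/rdbms/.aixopt/SUSTAIN_IPC_FAILURE as an empty file.

    Assumption: the flag file lives inside the .aixopt directory.
    """
    rdbms_dir = Path(oracle_home) / "rdbms"
    if not rdbms_dir.is_dir():
        # Refuse to build a directory tree under a wrong ORACLE_HOME.
        raise FileNotFoundError(f"{rdbms_dir} not found; check ORACLE_HOME")

    flag_dir = rdbms_dir / ".aixopt"
    flag_dir.mkdir(exist_ok=True)  # no error if the directory already exists

    flag_file = flag_dir / "SUSTAIN_IPC_FAILURE"  # name must be uppercase
    flag_file.touch(exist_ok=True)  # creates a 0-byte file if it is missing
    return flag_file
```

Calling it with the value of $ORACLE_HOME on each node returns the path of the flag file. Running it a second time is harmless, since neither the directory nor the file is recreated.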

We're running a two-node 9.2.0.3 RAC on AIX 5L / HACMP 4.4.

Do Sun or Tru64 have similar workarounds, or does RAC work flawlessly there without one? The workaround tells RAC to make sure at least one instance survives in the cluster instead of all instances crashing. Here's the section of the alert log file showing how failures of all 3 interconnects were handled:

Marking down Network with IP 192.168.17.11 Thu Apr 10 23:29:28 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:28 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:28 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:28 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:29 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:29 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:29 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:30 2003
Marking down Network with IP 192.168.18.11 Thu Apr 10 23:29:33 2003
Marking down Network with IP 192.168.19.11
WARNING!!! NO COMMON NETWORKS FOR ALL NODES TO COMMUNICATE SUSTAINING IPC FAILURE
THIS SHOULD BE THE ONLY INSTANCE RUNNING IN THIS CLUSTER

-----Original Message-----
Sent: Tuesday, July 29, 2003 9:29 AM
To: Multiple recipients of list ORACLE-L

Hrrrmm - well, we've never seen the problem you describe, and we've got a pretty big RAC environment here (clusters from two to six nodes, and we combine dev clusters to build bigger ones as we need). What you describe sounds like what happens when there's an interconnect failure: each node independently decides that it's been separated from the rest of the cluster and (effectively) shoots itself in the head, which causes every instance to hang. This is why the crafty RAC Jedi designs well their interconnect architecture.

But yes, if you're willing to take the "completely 2n capacity" cluster route and have two databases, double the Oracle licenses, two storage arrays, two fibre channel networks, etc., that is the highest availability/reliability cluster you can have - although at the highest cost and complexity.

Which clustering solution is right for you? Cheap and inelegant? Expensive and bullet-proof? Well, that's why we get paid the big bucks, right? :)

Thanks,
Matt

--
Matthew Zito
GridApp Systems
Email: mzito_at_gridapp.com
Cell: 646-220-3551
Phone: 212-358-8211 x 359
http://www.gridapp.com

-----Original Message-----
Tanel Poder
Sent: Monday, July 28, 2003 7:05 PM
To: Multiple recipients of list ORACLE-L


However, failed transactions must be handled on the client side. Queries may
migrate to surviving nodes transparently.
Also, RAC currently has many problems, such as all nodes hanging when one node
dies. Completely separate systems are still (and will always be) the most
available solution.
 
Tanel.
 
----- Original Message ----- 

To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
Sent: Monday, July 28, 2003 7:49 PM


Another important difference is that RAC is the best high-availability
solution in case of system or instance failure. With an HP or Veritas cluster,
all of the resources are stopped on the live node of the cluster and then
started on the second node, so users are affected. With RAC, a system or
instance failure results in a seamless transition of the user session.


Indy Johal




	"Ron Rogers" <RROGERS_at_galottery.org> 
Sent by: ml-errors_at_fatcity.com
07/28/03 12:29 PM
Please respond to ORACLE-L

To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
cc:
Subject: Re: clustering



ak,
As I understand it, an HP cluster is 2 boxes that have the capability
to access the same disks and data but only one can have the oracle
instance running and accessing the datafiles(active). Sort of like a
high availability option.
With RAC both boxes can access the instance and datafiles at the same
time.
List, Correct me if I need it.
Ron



>>> oramagic_at_hotmail.com 07/28/03 12:14PM >>>

Hi Guys,
I am new to this clustering concept. Just trying to understand a few
basics. Need your help.

What is the difference between Oracle running on a Sun/HP cluster with 2
nodes and Oracle with RAC running on 2 nodes?

thanks,
-ak
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Ron Rogers
 INET: RROGERS_at_galottery.org

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).






Received on Tue Jul 29 2003 - 17:09:28 CDT