How to create a service that fails over in a 2 node admin-managed RAC [message #668191] |
Mon, 12 February 2018 02:05 |
|
juniordbanewbie
Messages: 250 Registered: April 2014
|
Senior Member |
|
|
Dear all,
How to create a service that fails over in a 2 node admin-managed RAC if the other node is down
srvctl add service -database dwhdb -service dwh_auto_failover -preferred dwhdb1,dwhdb2 -tafpolicy BASIC -failovertype SESSION -verbose
however when i down dwhdb1, the service fail to fail over to dwhdb2, why is this so?
Maybe I was creating the service in a wrong way, so it cannot failover.
many thanks in advance!
|
|
|
|
Re: How to create a service that fails over in a 2 node admin-managed RAC [message #668298 is a reply to message #668193] |
Sat, 17 February 2018 03:51 |
|
juniordbanewbie
Messages: 250 Registered: April 2014
|
Senior Member |
|
|
Dear John,
yes are you right.
this is how I resolve my issues.
create two service which has preferred and available instances opposite of each other
srvctl add service -db dwhdb -service auto_failover -preferred dwhdb1 -available dwhdb2 -tafpolicy BASIC -failovertype SESSION -verbose
srvctl add service -db dwhdb -service auto_failover_pre_2 -preferred dwhdb2 -available dwhdb1 -tafpolicy BASIC -failovertype SESSION -verbose
from the 32 bit client server
in $ORACLE_HOME/network/admin/tnsnames.ora
insert the following network service name
dwhdb =
(DESCRIPTION_LIST=
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover.moonlight.com)
(INSTANCE_NAME=dwhdb1)))
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover.moonlight.com)
(INSTANCE_NAME=dwhdb2)))
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover_pre_2.moonlight.com)
(INSTANCE_NAME=dwhdb2)))
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover.moonlight.com)
(INSTANCE_NAME=dwhdb1)))
)
instance name has to be added due to sqlplus internal error state 2130 ORA-24315:Illegal Attribute Type (Doc ID 1060880.1)
I wonder if thing can be so complicated I were to implement this in a policy manged db rather the admin managed db.
thanks
[Updated on: Sat, 17 February 2018 03:51] Report message to a moderator
|
|
|
|
Re: How to create a service that fails over in a 2 node admin-managed RAC [message #668325 is a reply to message #668299] |
Sun, 18 February 2018 23:04 |
|
juniordbanewbie
Messages: 250 Registered: April 2014
|
Senior Member |
|
|
Dear John,
set linesize 100;
SET TRIMSPOOL ON
col failover_type for a33;
col FAILOVER_METHOD for a20;
col failed_over for a20;
select FAILOVER_TYPE,FAILOVER_METHOD,FAILED_OVER from v$session where sid in (select distinct sid from v$mystat)
union select name,'(DB NAME)',null from v$database
union select instance_name,host_name,null from v$instance
union select machine,to_char(sid),osuser from v$session where sid in (select distinct sid from v$mystat)
union SELECT sys_context('userenv','service_name'), 'service_name', null FROM dual
union SELECT to_char(systimestamp,'YYYY-MM-DD HH24:MI:SS.FF TZR'), 'current_time_stamp', null FROM dual order by 1;
output as below:
FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
--------------------------------- -------------------- --------------------
2018-02-19 03:44:42.459847 +01:00 current_time_stamp
dwhDB (DB NAME)
dwhProdApp1 156 oracle
SESSION BASIC NO
auto_failover.moonlight.com service_name
dwhdb1 dwhProdDB1
6 rows selected.
SYSTEM@dwhprodapp1 auto_failover>host ssh root@dwhProdDB1 systemctl reboot
SYSTEM@dwhprodapp1 auto_failover>/
select FAILOVER_TYPE,FAILOVER_METHOD,FAILED_OVER from v$session where sid in (select distinct sid from v$mystat)
*
ERROR at line 1:
ORA-25408: can not safely replay call
SYSTEM@dwhprodapp1 auto_failover>/
FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
--------------------------------- -------------------- --------------------
2018-02-19 03:46:50.380190 +01:00 current_time_stamp
dwhDB (DB NAME)
dwhProdApp1 547 oracle
SESSION BASIC YES
auto_failover.moonlight.com service_name
dwhdb2 dwhProdDB2
SYSTEM@dwhprodapp1 auto_failover>host ping -c 3 dwhProdDB1
SYSTEM@dwhprodapp1 auto_failover>host ssh root@dwhProdDB2 systemctl reboot
6 rows selected.
SYSTEM@dwhprodapp1 auto_failover>/
select FAILOVER_TYPE,FAILOVER_METHOD,FAILED_OVER from v$session where sid in (select distinct sid from v$mystat)
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 12773
Session ID: 547 Serial number: 8463
SYSTEM@dwhprodapp1 auto_failover>/
ERROR:
ORA-03114: not connected to ORACLE
as you can observe the session cannot failover when to dwhproddb1 when dwhproddb2 is down.
there is no doubt when I did a srvctl status service -db dwhdb -service "auto_failover,auto_failover_pre_2", the services are in dwhproddb1, so do you consider it a bug?
that's why I implement an alternative, which is to make it work
dwhdb =
(DESCRIPTION_LIST=
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover.moonlight.com)
(INSTANCE_NAME=dwhdb1)))
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover.moonlight.com)
(INSTANCE_NAME=dwhdb2)))
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover_pre_2.moonlight.com)
(INSTANCE_NAME=dwhdb2)))
(DESCRIPTION=
(ADDRESS=(PROTOCOL = TCP)(HOST = rh73-12102-scan.moonlight.com)(PORT = 1521))
(CONNECT_DATA=
(SERVICE_NAME=auto_failover_pre_2.moonlight.com)
(INSTANCE_NAME=dwhdb1)))
)
however i discover even my implementation run into problem when I stop listener in dwhproddb1, it has this error:
ORA-12521: TNS:listener does not currently know of instance requested in
connect descriptor
however it can connect successfully when I up listener in dwhproddb1 and down listener dwhproddb2
|
|
|
Re: How to create a service that fails over in a 2 node admin-managed RAC [message #668330 is a reply to message #668325] |
Mon, 19 February 2018 01:00 |
John Watson
Messages: 8962 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Quote:however i discover even my implementation run into problem when I stop listener in dwhproddb1, it has this error:
ORA-12521: TNS:listener does not currently know of instance requested in
connect descriptor
however it can connect successfully when I up listener in dwhproddb1 and down listener dwhproddb2
If you stop the node listener, nothing can connect to a service on that node. THis is the expected behaviour. And I have already told you not specify the instance name.
|
|
|
|
Re: How to create a service that fails over in a 2 node admin-managed RAC [message #668337 is a reply to message #668334] |
Mon, 19 February 2018 02:44 |
John Watson
Messages: 8962 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Why do you think that document has any relevance to your problem? My very professional opinion of that article is that it is a perfect example of all that is wring with MOS. It was written five yeas ago and supposedly updated one year ago. And it is rubbish. The idea that you cannot have more than one entry in a tnsnames.ora file unless you specify an instance name is ridiculous.
So, some facts:
Nowhere in the RAC documentation will you EVER find an instruction to use an instance name.
The whole point of load balancing and fault tolerance is that you use service names only.
If you were to be using policy management, you wouldn't even know what the the instance names are.
How can you possible have failover when you demand a named instance?
If you do not believe me, raise a TAR and Support will tell you the same thing. Assuming that this system is in fact under a support contract, which it would have to be for your use of MOS to be legal.
|
|
|
Re: How to create a service that fails over in a 2 node admin-managed RAC [message #668551 is a reply to message #668337] |
Thu, 01 March 2018 02:10 |
|
juniordbanewbie
Messages: 250 Registered: April 2014
|
Senior Member |
|
|
Hi John, you are right.
Quote:
RAC: SQL*Plus Connection to Service Yields SP2-0642: SQL*Plus Internal Error State 2130
(Doc ID 1669959.1)
Do not include any form of the word FAIL in the creation of a RAC Service.
The use of the word FAIL as part of the SERVICE label caused this failure. In this case, when the service label was
changed and the word FAILOVER eliminated,
the error cleared.
Further testing revealed that the SERVICE_NAME label cannot contain any form of the word FAIL.
but right now i have another issue,
when the private interconnect is disconnected, node eviction occurs=> this one is expected
but when the private interconnects is reconnected back, the node don't seem to be able to join back the cluster.
I have to do the following:
su -c 'crsctl stop crs -f'
su -c 'crsctl start crs'
I have read many online articles, but they never mentioned about manual intervention.
so when the private interconnect is connected back, the evicted nodes is supposed to join back the cluster automatically without intervention right?
thanks
|
|
|