WAS: Question on starting up Oracle with "startup force" IS: Mandatory reading for HA minded folks
> Folks,
> We are upgrading from "Veritas Database Edition 3.5 for Oracle 9iRAC"
> to the "Veritas Foundation Suite 4.1 for Oracle RAC". While testing in
> the lab with the new version, I noticed that the Veritas agent is
> starting up Oracle with the "startup force" command as shown below:
First off, if you want to read about what real, quality HA clusters (failover, rehosting, etc.) look like, here is a brief bit: http://www.polyserve.com/pdf/VMDB_BC.pdf
This is a really long reply to a short question--as it should be. The topic deserves deep consideration. Unfortunately it is true that the people who need to read this response the most don't have time--thus the overwhelming success of over-marketed products (no names mentioned).
I don't think startup force should be a concern. In the case of a failover, the shutdown abort that is built into the startup force command is a no-op. After all, the server crashed, and that is why a failover occurred.
In the event of a switchover (rehosting), the shutdown that Veritas issues is important. If they do a normal or immediate shutdown followed by a startup force on the new node, there again I see it as a no-op. Doing startup force on a cleanly shut down database is a no-op.
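A minimal sketch of what that means at the command line (assuming sqlplus is on the path and the usual ORACLE_SID/ORACLE_HOME environment is set); startup force is just these two steps rolled into one command, and against a dead or cleanly shut down instance the first step does nothing:
# sketch: the abort built into "startup force", followed by a normal startup
sqlplus -s / as sysdba <<EOF
shutdown abort
startup
EOF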
The PolyServe Database Utility for Oracle (a failover-HA product) does shutdown immediate (with a timeout) followed by shutdown abort when the admin is moving an instance from one server to another (we call this dynamic re-hosting because it is a simple GUI drag and drop). When rehosting, the database MUST be shut down on the node being vacated, so the:
if ( not immediate )
then abort
logic is very important.
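In shell terms the idea is roughly this (a sketch only, not the actual PolyServe agent code; the timeout value and the "/ as sysdba" login are assumptions):
# sketch: try a clean shutdown first, abort if it does not finish in time
TIMEOUT=120                    # arbitrary value for illustration
echo "shutdown immediate" | sqlplus -s / as sysdba &
SHUTDOWN_PID=$!
WAITED=0
while kill -0 $SHUTDOWN_PID 2>/dev/null && [ $WAITED -lt $TIMEOUT ]; do
    sleep 5; WAITED=$((WAITED + 5))
done
if kill -0 $SHUTDOWN_PID 2>/dev/null; then
    kill $SHUTDOWN_PID                                 # give up on the clean path
    echo "shutdown abort" | sqlplus -s / as sysdba     # guarantee the node is vacated
fi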
More important than any of that is how to make sure a non-RAC instance is not mistakenly brought up on two servers at the same time. Non-RAC Oracle uses node-local locking on the lk${SID} file in $ORACLE_HOME/dbs. Tell me, if there is a non-shared Oracle Home and a database being failed over from node 1 to, say, node 3, who is to say there aren't still processes with the files open on the vacated node? Especially if your HA kit is a bare-bones setup where the database resides in raw partitions. The locks Oracle relies on only tell it whether or not there is already an instance for that database ON THE CURRENT SERVER. So, if you have a non-RAC database on NAS, a CFS, or raw partitions, you can accidentally boot instances of the same database on more than one server. Of course, if the HA setup is centered around non-shared filesystems this is not an issue. That is, if the database under HA control is in UFS/VxFS/Ext3, then the failover action includes an unmount from the vacated node and a mount on the new node. For that reason I feel that HA solutions based on non-shared storage are complete garbage. A simple re-hosting operation means filesystem mounting operations? Junk.
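To make that concrete: the only cheap check is the local lock file, and it tells you nothing about any other node. A sketch (the hostname and datafile path are illustrative, not from any particular setup):
# sketch: the lk<SID> lock is node-local, so looking at it here proves nothing
# about node 1; you have to go ask node 1 whether anything still holds the
# datafiles open (fuser only reports local opens, hence the ssh)
ls -l $ORACLE_HOME/dbs/lk${ORACLE_SID}
ssh node1 "fuser /u02/oradata/${ORACLE_SID}/*.dbf" 2>/dev/null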
Oracle Disk Manager (ODM) makes non-RAC databases completely safe in all clustered scenarios, since part of the spec is a "cluster key" associated with every Oracle datafile, redo log, etc. PolyServe and Veritas both implement ODM. So, no matter how big the cluster, and no matter how chaotic the association of nodes to databases, ODM will ensure that no more than one instance opens a non-RAC database. The lk${SID} file is a no-op when ODM is in play.
In summary, any "HA solution" that uses shared disk without ODM is ridiculously dangerous. And, yep, there is Open Source junk out there that allows the tinkerers of the world to set up failover HA (with SteelEye, or Cluster Suite, whatever) combined with a "CFS" (e.g., OCFS, which deserves no more mention than that). Anyone that "saves money" that way will be really sorry they didn't get an ODM implementation just as soon as they have two instances mistakenly opening files from a non-RAC database. Like I always try to impress upon people, there is more to this clustering stuff than hooking up cables :-).
Best to talk to people that know clustering.
So, to carry the point even further, a little command line output is helpful. PolyServe's ODM exposes ODM cluster info to admins. The same information is used by the HA engine to make sure our failovers work. The following is output of our mxodmstat tool reporting the association of database instances to servers. Veritas has no such information available; ODM can be implemented in different ways.
To start, there are 5 databases, TESTDB1-TESTDB5, on node tmr6s13. I only pick out DBWR and LGWR for brevity.
$mxodmstat -lv | egrep "Node|Writer"
Node     Database  Instance  Application  Proc Type   Pid    File Activity
tmr6s13  TESTDB1   TESTDB1   --           DB Writer   24308  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB1   TESTDB1   --           Log Writer  24333  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB2   TESTDB2   --           DB Writer   6052   SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB2   TESTDB2   --           Log Writer  6057   SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB3   TESTDB3   --           DB Writer   24567  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB3   TESTDB3   --           Log Writer  24575  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB4   TESTDB4   --           DB Writer   13828  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB4   TESTDB4   --           Log Writer  13834  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB5   TESTDB5   --           DB Writer   3908   SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB5   TESTDB5   --           Log Writer  3912   SmallData,OLG,Other; Read,Write; Sync,Async
...Next I use the HA CLI command to rehost TESTDB3 to its DEFAULT backup server. I then look at the mxodmstat output to see that it is on tmr6s14 (a different server). Note that the mxdb command for re-hosting says it will take a moment. So I immediately run mxodmstat again and see that TESTDB3 is still on tmr6s13, but the next execution of the command shows it is on tmr6s14. Then mxdb is used (-Q) to get more detailed info about it now that it is on tmr6s14 (it has been rehosted).
$mxdb -d TESTDB3 -m DEFAULT
Validating Service Monitor for TESTDB3
Preparing to move the TESTDB3 Service.
Command successfully submitted to the High Availability engine.
Please note, the database will change state asynchronously.
Please allow an additional 15 seconds for the command to be
properly propagated throughout all nodes.
TESTDB3 Service will be moved.
$mxodmstat -lv | grep "TESTDB3.*Writer"
tmr6s13  TESTDB3   TESTDB3   --           DB Writer   24567  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB3   TESTDB3   --           Log Writer  24575  SmallData,OLG,Other; Read,Write; Sync,Async
$mxodmstat -lv | grep "TESTDB3.*Writer"
tmr6s14  TESTDB3   TESTDB3   --           DB Writer   31827  SmallData,Other; Read,Write; Sync,Async
tmr6s14  TESTDB3   TESTDB3   --           Log Writer  31829  SmallData,OLG,Other; Read,Write; Sync,Async
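Since the move is asynchronous, the re-check can be scripted rather than eyeballed; a trivial sketch using the same mxodmstat/grep filter as above (the node name is just the target from this example):
# sketch: wait until TESTDB3's writers show up on the target node
while ! mxodmstat -lv | grep -q "tmr6s14.*TESTDB3.*Writer"; do
    sleep 3
done
echo "TESTDB3 writers are now on tmr6s14"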
$mxdb -d TESTDB3 -Q
Preparing to query the TESTDB3 Service on all nodes.
Service TESTDB3:
Connect through vhost: 10.10.60.173
Primary Node is: 10.10.60.14
Currently active on 10.10.60.14
Backups: BACKUP-1 10.10.60.13
Enabled Nodes: 10.10.60.13 10.10.60.14
Disabled Nodes:
Maintenance Mode: OFF
...And, of course, I connect using a PolyServe Virtual Oracle Service:
$sqlplus scott/tiger_at_TESTDB3.pdx.polyserve.com
SQL*Plus: Release 10.2.0.2.0 - Production on Thu Jul 27 10:29:41 2006
Copyright (c) 1982, 2005, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit
Production
With the Partitioning, OLAP and Data Mining options
SQL>
...then, while still running a shell on tmr6s13, I use mxodmstat to see what I/O TESTDB3 is doing (NOTE--without knowing what node it is running on):
$uname -a;mxodmstat -a op -i3 -D TESTDB3
Linux tmr6s13 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
TESTDB3           Read                             Write
       Sync  Async    KB/s  Ave ms      Sync  Async    KB/s  Ave ms
       4748     14   43933       1         8    951    6891       0
          0      0       0       0         0   0.67      11       0
        102      4    5050       1         9    293   15128      10
        112      8    5053       1         9    511   15503       9
         60      4    2930       1         5    156    8731       6
         60      4    2578       1         5    238    7902       8
         47      4    1544       1         4    247    5647      10
And for spice, I want to see "who" is doing that I/O so I tell mxodmstat to report I/O by process type (again, central monitoring of different cluster activity):
$uname -a;mxodmstat -i3 -N tmr6s14 -s proc
Linux tmr6s13 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
tmr6s14      Background               DB Writer               Log Writer            PQO          Foreground
     6443  1654  238674  1      23  7603  85688  11      982  1643  176746  1     0  0  0  0     278    0  2224  1
       77    59    8653  1       0   253   2981   9       36    49    6038  1     0  0  0  0       0    0     0  0
        4     8     165  0       0    78    941  11       10     3    1848  1     0  0  0  0       0    0     0  0
       78    59    8897  1       0   231   2805  10       38    47    7664  1     0  0  0  0       0    0     0  0
       20    17    2204  0       0    45    683   8       13    13    1998  1     0  0  0  0       0    0     0  0
        3     8     144  0       0   139   1427  12        8  0.67     453  1     0  0  0  0       0    0     0  0
...And finally, cluster keys. Here is mxodmstat output showing control files, data files, and online logs from TESTDB1-TESTDB5 with their paths and internal (kernel mode) cluster file identifiers. When Oracle opens an ODM file, it gives ODM a guaranteed world-wide unique identifier for the file. ODM maintains that FID in its state. On any other open of that file, on any node in the cluster, Oracle throws the key at ODM to see if it is already open, and if it is, well, the file doesn't get opened ...
$mxodmstat -lf | more
FID Type Path