Re: Question about resource's start dependency in Clusterware
Date: Thu, 30 Jan 2014 11:13:10 +0100
Message-ID: <CAC08BHJMLoT6UJmXXpUg=FmKcKbCmCsOqoqrMNKdpSnygUL=qA_at_mail.gmail.com>
Hi,
> *Regardless of the value of the **AUTO_START** resource attribute for a
> resource, the resource can start if another resource has a hard or weak
> start dependency on it or if the resource has a pullup start dependency on
> another resource.*
Thanks, I think that answers my question. Though the note in the documentation is actually under the section dealing with startup of Clusterware, so maybe the dependencies in the initial startup sequence are treated differently than when Clusterware is already operational.
> Which seems to indicate that indeed a hard dependency is enough to start
> the other resource.
> But in the same document, Oracle states also:
> *Oracle recommends that resources with **hard** start dependencies also
> have **pullup** start dependencies.*
> I'm not sure why that is.....
I think the reason for pullup dependencies can be found in the out-of-order
startup sequence described here:
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/crschp.htm#CWADD92086 :
"When two or more resources depend on each other, a failure of one of them
may end up causing the other to fail, as well. In most cases, it is
difficult to control or even predict the order in which these failures are
detected. For example, even if resource A depends on resource B, Oracle
Clusterware may detect the failure of resource B after the failure of
resource A.
This lack of failure order predictability can cause Oracle Clusterware to
attempt to restart dependent resources in parallel, which, ultimately,
leads to the failure to restart some resources, because the resources upon
which they depend are being restarted out of order." And this sentence
explains (in my opinion) why pullup start dependencies are needed: "If the
attempt to restart resource A fails, then as soon as resource B
successfully restarts, Oracle Clusterware reattempts to restart resource
A."
So if we take the explanation above and the example from my first post:
[oracle@london1 ~]$ crsctl status resource ora.prod.db -p
NAME=ora.prod.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
[...]
SPFILE=+DATA/prod/spfileprod.ora
START_DEPENDENCIES=hard(ora.DATA.dg) [...] pullup(ora.DATA.dg)
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg)
My understanding is as follows:
The ora.prod.db resource's hard start dependency on ora.DATA.dg means that
when ora.prod.db is started, ora.DATA.dg must already be running, and if it
is not, it is started automatically (even without the pullup dependency).
The pullup start dependency, on the other hand, means that when ora.DATA.dg
is started, ora.prod.db should be started as well, provided its TARGET is
not OFFLINE (since we don't have the "always" modifier).
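To make that reading concrete, here is a toy Python model of the two dependency types. It is only a sketch of my understanding as described above, not how Clusterware is actually implemented: the Resource and ToyCluster names are invented for illustration, starts never fail in this model, and the only real identifiers are the two resource names taken from the crsctl output.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    hard: list = field(default_factory=list)    # resources that must be online first
    pullup: list = field(default_factory=list)  # resources whose start pulls this one up
    state: str = "OFFLINE"
    target: str = "OFFLINE"


class ToyCluster:
    """Hypothetical model of hard and pullup start dependencies (no 'always' modifier)."""

    def __init__(self, *resources):
        self.res = {r.name: r for r in resources}
        self._starting = set()  # guards against re-entering a start in progress

    def start(self, name):
        r = self.res[name]
        r.target = "ONLINE"
        if r.state == "ONLINE" or name in self._starting:
            return
        self._starting.add(name)
        # hard: anything this resource depends on is started first
        for dep in r.hard:
            self.start(dep)
        r.state = "ONLINE"
        self._starting.discard(name)
        # pullup: resources that name this one are started too,
        # but only if their TARGET is already ONLINE
        for other in self.res.values():
            if (name in other.pullup
                    and other.target == "ONLINE"
                    and other.state != "ONLINE"
                    and other.name not in self._starting):
                self.start(other.name)


def make_cluster():
    return ToyCluster(
        Resource("ora.DATA.dg"),
        Resource("ora.prod.db", hard=["ora.DATA.dg"], pullup=["ora.DATA.dg"]),
    )


def states(cluster):
    return {name: r.state for name, r in cluster.res.items()}


# Case A: starting the database starts the disk group via the hard dependency.
case_a = make_cluster()
case_a.start("ora.prod.db")

# Case B: starting the disk group leaves the database alone (TARGET=OFFLINE).
case_b = make_cluster()
case_b.start("ora.DATA.dg")

# Case C: same as B, but the database's TARGET is ONLINE, so pullup starts it.
case_c = make_cluster()
case_c.res["ora.prod.db"].target = "ONLINE"
case_c.start("ora.DATA.dg")

if __name__ == "__main__":
    for label, c in (("A", case_a), ("B", case_b), ("C", case_c)):
        print(label, states(c))
```

Case B versus case C is the part that hinges on the missing "always" modifier: in this model the pullup only fires when the dependent resource already wants to be online.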
Now, if a failure occurs and Clusterware tries to start those two resources out of order, as the documentation quoted above describes, the pullup dependency is the mechanism that handles the problem automatically.
For example, suppose the ora.DATA.dg resource fails because the ASM instance crashes. Because of the hard stop dependency on ora.asm (which is now in neither the online nor the intermediate state), and ultimately because the database can't run without ASM (assuming, of course, that the database files are in ASM), the ora.prod.db resource fails as well.
If Clusterware then tries, for whatever reason, to start ora.prod.db before ora.DATA.dg, the start of ora.prod.db fails because ora.DATA.dg can't be started yet. However, once the ASM instance starts and ora.DATA.dg is brought online (by the ASM instance dependency mechanism), the pullup(ora.DATA.dg) dependency triggers another attempt to start ora.prod.db, which now succeeds (although I'm not sure what happens without the "always" modifier in this case).
So without the pullup dependency, that second attempt to start ora.prod.db wouldn't happen and the resource would remain offline.
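The scenario can be walked through with a small Python simulation. Again, this is a sketch under my own assumptions rather than Clusterware's real logic: the simulate function and the asm_ready flag are made up for illustration, every resource is assumed to have TARGET=ONLINE because it was running before the failure, and the disk group is started explicitly in step 2 to stand in for the ASM dependency mechanism.

```python
def simulate(with_pullup):
    """Replay the out-of-order restart, with or without pullup(ora.DATA.dg)."""
    state = {"ora.asm": "OFFLINE", "ora.DATA.dg": "OFFLINE", "ora.prod.db": "OFFLINE"}
    hard = {
        "ora.asm": [],
        "ora.DATA.dg": ["ora.asm"],
        "ora.prod.db": ["ora.DATA.dg"],
    }
    pullup = {"ora.prod.db": ["ora.DATA.dg"]} if with_pullup else {}
    env = {"asm_ready": False}  # hypothetical flag: ASM cannot start until it is set
    log = []

    def start(name):
        if state[name] == "ONLINE":
            return True
        if name == "ora.asm" and not env["asm_ready"]:
            log.append("ora.asm: start failed (instance not ready yet)")
            return False
        # hard: try to bring up each dependency first; give up if one fails
        for dep in hard[name]:
            if not start(dep):
                log.append(f"{name}: start failed ({dep} is not online)")
                return False
        state[name] = "ONLINE"
        log.append(f"{name}: online")
        # pullup: reattempt any resource that names this one
        for res, triggers in pullup.items():
            if name in triggers and state[res] != "ONLINE":
                log.append(f"{res}: start reattempted (pullup on {name})")
                start(res)
        return True

    # Step 1: the database restart is attempted first and fails.
    start("ora.prod.db")
    # Step 2: ASM becomes available and the disk group is brought online.
    env["asm_ready"] = True
    start("ora.DATA.dg")
    return state, log


if __name__ == "__main__":
    for flag in (True, False):
        final_state, events = simulate(flag)
        print("with pullup:" if flag else "without pullup:")
        for line in events:
            print("  " + line)
        print("  final:", final_state)
```

In this model the only difference between the two runs is whether anything reattempts the database start after the disk group comes online.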
Maybe my example wasn't the most appropriate, since stopping ASM in 11.2 has other implications if the OCR is stored in it (see http://docs.oracle.com/cd/E11882_01/rac.112/e41960/srvctladmin.htm#RACAD5043 : "You cannot use this command when OCR is stored in Oracle ASM because it will not stop Oracle ASM. To stop Oracle ASM you must shut down Oracle Clusterware."). Still, a similar scenario would probably apply to two other dependent resources, neither of which depends on ASM.
Regards,
Jure
On Wed, Jan 29, 2014 at 9:33 PM, D'Hooge Freek <Freek.DHooge_at_exitas.be> wrote:
>
> Hi,
>
> In the documentation I found following note
>
> *Regardless of the value of the **AUTO_START** resource attribute for a
> resource, the resource can start if another resource has a hard or weak
> start dependency on it or if the resource has a pullup start dependency on
> another resource.*
>
> Which seems to indicate that indeed a hard dependency is enough to start
> the other resource.
> But in the same document, Oracle states also:
>
> *Oracle recommends that resources with **hard** start dependencies also
> have **pullup** start dependencies.*
>
> I'm not sure why that is.....
>
>
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jan 30 2014 - 11:13:10 CET