Re: In Oracle Extended RAC can inaccessibility of quorum failgroup cause node eviction

From: Mikhail Velikikh <mvelikikh_at_gmail.com>
Date: Sun, 16 Jun 2024 14:55:40 +0100
Message-ID: <CALe4Hpk=LeeYqf1wdX3_aBFURefYdSwc4zExB1o77tPkYhxpow_at_mail.gmail.com>



Hi,

That is not what you said in your initial message:

> If you have three voting disks and 2 are in the same array, and the
> firmware upgrade takes more than 200 seconds, then you will have an
> eviction, so place the third quorum disk on an NFS server and you won't
> have problems

Per your message, there is a storage array that hosts 2 voting disks, with the third voting disk on an NFS server. This cluster will go down if you lose that storage array because there will be no voting file majority.
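
As a quick sanity check (an illustrative query only; the v$asm_disk and
v$asm_diskgroup views are standard, but your disk group and path names will
differ), you can see which ASM disks carry voting files and which failure
group they belong to:

    SELECT dg.name AS diskgroup, d.failgroup, d.path, d.voting_file
    FROM   v$asm_disk d
    JOIN   v$asm_diskgroup dg ON dg.group_number = d.group_number;

If two of the rows with voting_file = 'Y' map to LUNs on the same array,
that array is a single point of failure for the cluster.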

> the third on an NFS server helps because in an extended cluster you have
> two storage arrays; normally people put 2 voting disks in one array and
> the third in the second array, so if you move one of the voting disks out
> of the array which hosts 2 voting disks you won't have an eviction if one
> array is turned off or under maintenance

If you have three voting disks and two storage arrays, this configuration will not be able to handle a failure of the array that stores the majority of the voting disks (2 in this case). That is why a third storage location is used for that (it can be an NFS server). Besides, you don't usually know in advance which of your arrays is going to fail, so I am unsure what movement you mean here. It applies only to planned maintenance, and a configuration with two storage arrays will not be reliable anyway because it has a single point of failure: the array with the majority of the voting disks.
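
For illustration only (the disk paths and failure group names here are made
up), a typical extended cluster disk group with a quorum failure group on
the third site looks something like this:

    CREATE DISKGROUP data NORMAL REDUNDANCY
      FAILGROUP site1 DISK '/dev/mapper/site1_lun1'
      FAILGROUP site2 DISK '/dev/mapper/site2_lun1'
      QUORUM FAILGROUP site3_nfs DISK '/voting_nfs/vote_site3'
      ATTRIBUTE 'compatible.asm' = '19.0.0.0.0';

The quorum failure group stores no user data, only the voting file, so a
small NFS-backed device is enough. Each site then holds exactly one voting
file and the cluster survives the loss of any single site.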

Best regards,
Mikhail Velikikh

On Sun, 16 Jun 2024 at 14:45, Ls Cheng <exriscer_at_gmail.com> wrote:

> hi
>
> the third on an NFS server helps because in an extended cluster you have
> two storage arrays; normally people put 2 voting disks in one array and
> the third in the second array, so if you move one of the voting disks out
> of the array which hosts 2 voting disks you won't have an eviction if one
> array is turned off or under maintenance
>
> thanks
>
>
> On Sun, Jun 16, 2024 at 3:32 PM Mikhail Velikikh <mvelikikh_at_gmail.com>
> wrote:
>
>> Hi,
>>
>>> If you have three voting disks and 2 are in the same array, and the
>>> firmware upgrade takes more than 200 seconds, then you will have an
>>> eviction, so place the third quorum disk on an NFS server and you
>>> won't have problems
>>
>>
>> Oracle Clusterware requires that a majority of voting files be
>> available for a node to function. Losing 2 out of 3 voting files results
>> in a node being restarted after 200 seconds (the long disk timeout).
>> Placing the third voting file on an NFS server does not really help here:
>> if it becomes the only voting file available, that is 1 out of the 2
>> required in a configuration with 3 voting files in total.
>>
>>
>> https://docs.oracle.com/en/database/oracle/oracle-database/19/cwlin/storage-considerations-for-oracle-grid-infrastructure-and-oracle-rac.html#GUID-02B594C9-9D80-4E17-B4F6-30F40D9D38E4
>>
>>> Storage must be shared; any node that does not have access to an
>>> absolute majority of voting files (more than half) is restarted.
>>
>>
>> That is why a typical Extended Cluster configuration with 2 data sites
>> requires 3 storage sites in total where the additional storage site is used
>> to store the third voting file and quorum failure groups for ASM disk
>> groups.
>>
>> https://docs.oracle.com/en/database/oracle/oracle-database/19/cwlin/about-oracle-extended-clusters.html#GUID-C12A6024-A46D-48F5-A443-E08C28AFC716
>>
>> Best regards,
>> Mikhail Velikikh
>>
>>
>>
>> On Sun, 16 Jun 2024 at 12:28, Ls Cheng <exriscer_at_gmail.com> wrote:
>>
>>> Hi
>>>
>>> If you have three voting disks and 2 are in the same array, and the
>>> firmware upgrade takes more than 200 seconds, then you will have an
>>> eviction, so place the third quorum disk on an NFS server and you
>>> won't have problems
>>>
>>> Thank you
>>>
>>>
>>> On Fri, May 10, 2024 at 7:48 AM Sourav Biswas <biswas.sourav_at_hotmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have a scheduled storage controller firmware upgrade, which will be
>>>> done in a rolling fashion. The storage team says there shouldn't be
>>>> any LUN availability impact from this array.
>>>>
>>>> We have multiple Extended Oracle RACs running whose quorum disks come
>>>> from this array.
>>>>
>>>> So we would like to know: in case of any adversity where the quorum
>>>> failgroup becomes inaccessible, do we run the risk of a node eviction?
>>>>
>>>> And to mitigate this, can we manually bring down one node before the
>>>> storage upgrade, so that even if the quorum goes offline, the
>>>> application can maintain all its connections?
>>>>
>>>> Please advise.
>>>>
>>>> Regards,
>>>> Sourav Biswas
>>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Jun 16 2024 - 15:55:40 CEST
