Re: Oracle ASM disk corruption
Date: Mon, 27 Jul 2020 21:35:46 -0400
Message-ID: <024aafdb-84b5-58d2-e1d1-686682e598f0_at_gmail.com>
Hi Amir,
You obviously have an inconsistent disk group. You should do the following:
- Shut down all database instances.
- Start one instance in the MOUNT state with the CLUSTER_DATABASE parameter set to FALSE.
- Connect to the ASM instance as SYSASM and drop the offending disk group using something like "DROP DISKGROUP GRID FORCE INCLUDING CONTENTS;". After this, all the data in that disk group will be gone, and you will need to restore and recover the files that were on it.
- Delete all ASM disks using the "oracleasm" command.
- Additionally, clear the device headers, for example: dd if=/dev/zero of=/dev/oracleasm/grid/asmgrid01 bs=1024k count=128
- Re-create the ASM devices, re-create the ASM disk group, and restore and recover the files.
Of course, there will be a problem if the disk group GRID is the voting/OCR disk group. In that case, you will have to restore the entire cluster.
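If you want to sanity-check the header-wipe step before touching a real device, here is a minimal sketch against a scratch file. The path /tmp/fake_asm_disk, the 4 MB size, and the stamped "ORCLDISK" tag are stand-ins for the demo only; against the real device you would use the actual path and count=128 as above.

```shell
# Stand-in for the real ASM device; size shrunk for the demo.
DISK=/tmp/fake_asm_disk

# Create a 4 MB "device" and stamp a fake ASM header tag on it.
dd if=/dev/zero of="$DISK" bs=1024k count=4 2>/dev/null
printf 'ORCLDISK' | dd of="$DISK" conv=notrunc 2>/dev/null

# The wipe: overwrite the start of the device with zeros.
# (Note if=/dev/zero -- reading from /dev/null would write nothing at all.)
dd if=/dev/zero of="$DISK" bs=1024k count=4 conv=notrunc 2>/dev/null

# Verify the header tag is gone: the first 8 bytes are now NUL.
head -c 8 "$DISK" | od -An -c
rm -f "$DISK"
```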
Regards
On 7/27/20 8:01 PM, Hameed, Amir wrote:
>
> The status hasn’t changed.
>
> Grp# Disk# Mount   Header Mode   Disk   OS disk   Space      Space     ASM Disk  Failgroup  Disk path                     Vote
>            Status  Status Status State  Size (MB) Total (MB) Free (MB) Name      Name                                     file
> ---- ----- ------- ------ ------ ------ --------- ---------- --------- --------- ---------- ----------------------------- ----
>    0     0 CLOSED  MEMBER ONLINE NORMAL    20,490          0         0                      /dev/oracleasm/grid/asmgrid01 Y
>    2     0 CACHED  MEMBER ONLINE NORMAL    20,490     20,480     9,987 GRID_0000 GRID_0000  /dev/oracleasm/grid/asmgrid03 Y
>    2     1 CACHED  MEMBER ONLINE NORMAL    20,490     20,480     9,987 GRID_0001 GRID_0001  /dev/oracleasm/grid/asmgrid02 Y
>
> Thanks
>
> *From:*Mark W. Farnham <mwf_at_rsiz.com>
> *Sent:* Monday, July 27, 2020 4:43 PM
> *To:* Hameed, Amir <Amir.Hameed_at_xerox.com>; 'John Chacho'
> <jchacho_at_gmail.com>
> *Cc:* gogala.mladen_at_gmail.com; oracle-l_at_freelists.org
> *Subject:* RE: Oracle ASM disk corruption
>
> Sorry, I missed the punch line. AFTER the rebalance and check, what does
>
> "The following data was captured from V$ASM_DISK but it is consistent
> on all nodes if queried from GV$ASM_DISK:"
>
> now show?
>
> mwf
>
> *From:*oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>
> [mailto:oracle-l-bounce_at_freelists.org] *On Behalf Of *Hameed, Amir
> *Sent:* Monday, July 27, 2020 3:58 PM
> *To:* John Chacho
> *Cc:* Mark W. Farnham; gogala.mladen_at_gmail.com
> <mailto:gogala.mladen_at_gmail.com>; oracle-l_at_freelists.org
> <mailto:oracle-l_at_freelists.org>
> *Subject:* RE: Oracle ASM disk corruption
>
> Thanks John. I have already provided ASM alert logs from all three
> nodes to Oracle after opening the SR. The Oracle engineer is of the
> opinion that the MOUNT_STATUS and HEADER_STATUS values suggest that
> the disk is no longer part of the ASM disk group. The engineer is
> suggesting to format/reinitialize the block device and then add it
> back to the ASM DG.
>
> Thanks
>
> *From:*John Chacho <jchacho_at_gmail.com <mailto:jchacho_at_gmail.com>>
> *Sent:* Monday, July 27, 2020 3:50 PM
> *To:* Hameed, Amir <Amir.Hameed_at_xerox.com <mailto:Amir.Hameed_at_xerox.com>>
> *Cc:* Mark W. Farnham <mwf_at_rsiz.com <mailto:mwf_at_rsiz.com>>;
> gogala.mladen_at_gmail.com <mailto:gogala.mladen_at_gmail.com>;
> oracle-l_at_freelists.org <mailto:oracle-l_at_freelists.org>
> *Subject:* Re: Oracle ASM disk corruption
>
> If you haven't already done so, provide support with the complete
> alert_+ASM*.log and the html output from script 1 in Doc ID 470211.1.
>
> The alert_+ASM*.log should indicate why the disk got dropped. The html
> report from Doc ID 470211.1 will show the current status of almost
> everything ASM related.
>
> Support should be able to assess what went wrong and whether the
> 'alter diskgroup' commands are appropriate with the 'force' option.
> <https://docs.oracle.com/en/database/oracle/oracle-database/19/ostmg/alter-diskgroups.html#GUID-6BB31112-8687-4C1E-AF14-D94FFCDA736F>
>
> On Mon, Jul 27, 2020 at 1:28 PM Hameed, Amir <Amir.Hameed_at_xerox.com
> <mailto:Amir.Hameed_at_xerox.com>> wrote:
>
> Thanks Mark.
>
> Please see the information below. I will follow up with Oracle and
> let the list know with the action plan.
>
> I would think that ALTER DISKGROUP GRID DROP DISK GRID_0002
> **might** fix that.
>
> SQL> ALTER DISKGROUP GRID DROP DISK GRID_0002 ;
>
> ALTER DISKGROUP GRID DROP DISK GRID_0002
>
> *
>
> ERROR at line 1:
>
> ORA-15032: not all alterations performed
>
> ORA-15054: disk "GRID_0002" does not exist in diskgroup "GRID"
>
> Likewise, if it has that disk listed as a member of diskgroup GRID,
> what happens if you do an ALTER DISKGROUP GRID REBALANCE?
>
> SQL> ALTER DISKGROUP GRID REBALANCE ;
>
> Diskgroup altered.
>
> From the ASM alert log file:
>
> SQL> ALTER DISKGROUP GRID REBALANCE
>
> Mon Jul 27 13:16:29 2020
>
> NOTE: GroupBlock outside rolling migration privileged region
>
> NOTE: requesting all-instance membership refresh for group=2
>
> Mon Jul 27 13:16:29 2020
>
> GMON updating for reconfiguration, group 2 at 30 for pid 31, osid
> 25903
>
> NOTE: group GRID: updated PST location: disk 0000 (PST copy 0)
>
> NOTE: group GRID: updated PST location: disk 0001 (PST copy 1)
>
> Mon Jul 27 13:16:29 2020
>
> NOTE: group 2 PST updated.
>
> Mon Jul 27 13:16:29 2020
>
> NOTE: membership refresh pending for group 2/0x88994cfc (GRID)
>
> NOTE: Attempting voting file refresh on diskgroup GRID
>
> NOTE: Refresh completed on diskgroup GRID
>
> . Found 2 voting file(s).
>
> NOTE: Voting file relocation is required in diskgroup GRID
>
> Mon Jul 27 13:16:29 2020
>
> GMON querying group 2 at 31 for pid 22, osid 25543
>
> Mon Jul 27 13:16:29 2020
>
> SUCCESS: refreshed membership for 2/0x88994cfc (GRID)
>
> Mon Jul 27 13:16:29 2020
>
> SUCCESS: ALTER DISKGROUP GRID REBALANCE
>
> ALTER DISKGROUP GRID CHECK
>
> SQL> ALTER DISKGROUP GRID CHECK
>
> SQL> ALTER DISKGROUP GRID CHECK ;
>
> Diskgroup altered.
>
> From the ASM alert log file:
>
> NOTE: starting check of diskgroup GRID
>
> Mon Jul 27 13:19:46 2020
>
> GMON querying group 2 at 37 for pid 31, osid 4062
>
> GMON checking disk 0 for group 2 at 38 for pid 31, osid 4062
>
> GMON querying group 2 at 39 for pid 31, osid 4062
>
> GMON checking disk 1 for group 2 at 40 for pid 31, osid 4062
>
> Mon Jul 27 13:19:46 2020
>
> SUCCESS: check of diskgroup GRID found no errors
>
> Mon Jul 27 13:19:46 2020
>
> SUCCESS: ALTER DISKGROUP GRID CHECK
>
> Thanks
>
> *From:* Mark W. Farnham <mwf_at_rsiz.com <mailto:mwf_at_rsiz.com>>
> *Sent:* Monday, July 27, 2020 9:39 AM
> *To:* Hameed, Amir <Amir.Hameed_at_xerox.com
> <mailto:Amir.Hameed_at_xerox.com>>; gogala.mladen_at_gmail.com
> <mailto:gogala.mladen_at_gmail.com>; oracle-l_at_freelists.org
> <mailto:oracle-l_at_freelists.org>
> *Subject:* RE: Oracle ASM disk corruption
>
> Okay. So it is closed and a member, but ASM has it recorded as
> still belonging to diskgroup “GRID”.
>
> Let’s see: If it is closed and throwing no errors, does that mean
> that a former drop disk had finished rebalancing to drop it but
> somehow was interrupted before some chicklet in ASM was checked?
>
> I would think that ALTER DISKGROUP GRID DROP DISK GRID_0002
> **might** fix that.
>
> Have you sent the error message below along with the SR
> information? I would think this represents an inconsistency in the
> ASM dictionary and therefore is a bug unless you hand edited
> something at the OS level.
>
> Likewise, if it has that disk listed as a member of diskgroup GRID,
> what happens if you do an ALTER DISKGROUP GRID REBALANCE?
>
> Does that either a) work or b) fail to open the disk and give you
> some additional information?
>
> IF a), great, right?
>
> IF b), let us (and the SR folks) know the new information
>
> IF neither a) nor b), I probably fubared the syntax in my
> semi-retired rust.
>
> You might also report the results of
>
> ALTER DISKGROUP GRID CHECK
>
> Good luck, zero of this should be difficult and it should be 100%
> self diagnostic.
>
> PS: I seriously doubt MLADEN is WRONG about the meaning of the
> status information. Anything I’ve written could be wrong and based
> on how I asked them to do it rather than how they did it. Other
> than being a pain to Veritas, ASM was supposed to be easy to use
> and bulletproof. When one of my best friends from Oracle left ASM,
> I think it was.
>
> mwf
>
> *From:*oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>
> [mailto:oracle-l-bounce_at_freelists.org] *On Behalf Of *Hameed, Amir
> *Sent:* Sunday, July 26, 2020 11:04 PM
> *To:* gogala.mladen_at_gmail.com <mailto:gogala.mladen_at_gmail.com>;
> oracle-l_at_freelists.org <mailto:oracle-l_at_freelists.org>
> *Subject:* RE: Oracle ASM disk corruption
>
> Hi Mladen!
>
> Thank you for your input. I already tried that and got the
> following result.
>
> -----
>
> SQL> ALTER DISKGROUP GRID
>
> ADD DISK '/dev/oracleasm/grid/asmgrid01' NAME GRID_0002
>
> /
>
> ALTER DISKGROUP GRID
>
> *
>
> ERROR at line 1:
>
> ORA-15032: not all alterations performed
>
> ORA-15033: disk '/dev/oracleasm/grid/asmgrid01' belongs to
> diskgroup "GRID"
>
> -----
>
> I also opened an SR and the analyst suggested the following action:
>
> /Closed and member status of the disk means that the disk is
> already dropped from asm. The only thing you can do at this point
> is to format that disk and then add it back to asm./
>
> Since it is a block device, I was thinking that overwriting the
> device header would reinitialize it? (I am using UDEV and not
> using ASMLIB. The disk is not partitioned).
>
> Thank you,
>
> Amir
>
> *From:* oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>
> <oracle-l-bounce_at_freelists.org
> <mailto:oracle-l-bounce_at_freelists.org>> *On Behalf Of *Mladen Gogala
> *Sent:* Sunday, July 26, 2020 10:44 PM
> *To:* oracle-l_at_freelists.org <mailto:oracle-l_at_freelists.org>
> *Subject:* Re: Oracle ASM disk corruption
>
> Hi Amir!
>
> The status of CLOSED means that the disk is not being used by the
> ASM instance:
>
> https://docs.oracle.com/en/database/oracle/oracle-database/12.2/refrn/V-ASM_DISK.html#GUID-8E2E5721-6D4E-48C2-8DF3-A0EEBD439606
>
> MOUNT_STATUS  VARCHAR2(7)  Per-instance status of the disk relative to
> group mounts:
>
> - MISSING - Oracle ASM metadata indicates that the disk is known to be
>   part of the Oracle ASM disk group, but no disk in the storage system
>   was found with the indicated name
>
> - CLOSED - Disk is present in the storage system but is not being
>   accessed by Oracle ASM
>
> - OPENED - Disk is present in the storage system and is being accessed
>   by Oracle ASM. This is the normal state for disks in a database
>   instance which are part of a disk group being actively used by the
>   instance.
>
> - CACHED - Disk is present in the storage system and is part of a disk
>   group being accessed by the Oracle ASM instance. This is the normal
>   state for disks in an Oracle ASM instance which are part of a mounted
>   disk group.
>
> - IGNORED - Disk is present in the system but is ignored by Oracle ASM
>   because of one of the following:
>   - The disk is detected by the system library but is ignored because
>     an Oracle ASM library discovered the same disk
>   - Oracle ASM has determined that the membership claimed by the disk
>     header is no longer valid
>
> - CLOSING - Oracle ASM is in the process of closing this disk
>
> So, the disk is there but it's not used by ASM. You can add it to
> one of your disk groups or leave it as a reserve for the rainy
> days, whatever suits you better. No action is necessary, this is
> no error condition.
>
> Regards
>
> On 7/26/20 10:09 PM, Hameed, Amir wrote:
>
> Hi,
>
> I have an Oracle 12.1.0.2 Grid Infrastructure setup with
> three-nodes. There exist multiple ASM disk groups that are
> managed by this setup. One of the disk groups is called GRID
> and it hosts the OCR and voting disks. Recently I have noticed
> that one of the ASM disks in this group has
> MOUNT_STATUS='CLOSED" and HEADER_STATUS='MEMBER' as shown below:
>
> The following data was captured from V$ASM_DISK but it is
> consistent on all nodes if queried from GV$ASM_DISK:
>
>     Grp# Disk# Mount   Header Mode   Disk   OS disk   Space      Space     ASM Disk  Failgroup  Disk path                     Vote
>                Status  Status Status State  Size (MB) Total (MB) Free (MB) Name      Name                                     file
>     ---- ----- ------- ------ ------ ------ --------- ---------- --------- --------- ---------- ----------------------------- ----
>        0     0 CLOSED  MEMBER ONLINE NORMAL    20,490          0         0                      /dev/oracleasm/grid/asmgrid01 Y
>        2     0 CACHED  MEMBER ONLINE NORMAL    20,490     20,480     9,987 GRID_0000 GRID_0000  /dev/oracleasm/grid/asmgrid03 Y
>        2     1 CACHED  MEMBER ONLINE NORMAL    20,490     20,480     9,987 GRID_0001 GRID_0001  /dev/oracleasm/grid/asmgrid02 Y
>
> The disk that is not showing up is GRID_0002 and the block
> device name is /dev/oracleasm/grid/asmgrid01. The only change
> that has been made recently was that the OS on all three nodes
> was upgraded from RHEL6 to RHEL7. I have tried to drop this
> disk from the DG but that didn't work and I got the message
> that this disk is not part of the GRID DG.
>
> What is the best way to resolve this issue? Should I overwrite
> the header of this device using dd so that it becomes a
> candidate disk? Any help will be appreciated.
>
> Thank you,
>
> Amir
>
> --
>
> Mladen Gogala
>
> Database Consultant
>
> Tel: (347) 321-1217
>
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jul 28 2020 - 03:35:46 CEST