
RE: Raid Arrays and Power Loss

From: MacGregor, Ian A. <ian_at_SLAC.Stanford.EDU>
Date: Tue, 16 Sep 2003 11:04:36 -0800
Message-ID: <F001.005D011E.20030916110436@fatcity.com>


The RAID array is a Sun A1000. I'm not sure of the vintage, but the disks are 18 GB. The RAID array did not lose its configuration. The storage is still there. Neither affected file system was ever empty, but a couple of files were lost, one on each file system.

The box is located at one of our interaction regions (IRs). Some additional information [results truncated]

oracle@bbr-oracle $ last reboot

reboot    system boot                   Fri Sep 12 15:32
reboot    system boot                   Mon Aug 25 14:24

When the error

  Fri Sep 12 13:32:01 2003

 ORA-00204: error in reading (block 1, # blocks 1) of controlfile
 ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
 ORA-27091: skgfqio: unable to queue I/O
 SVR4 Error: 6: No such device or address
 Additional information: 1

occurred, the RAID box was off. I had thought that the Unix box had already been rebooted, but that turns out to be false.
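
The ordering of these events can be checked mechanically. The following is a minimal sketch, not something that was run at the time: it parses the alert-log timestamp format and the date columns of the `last reboot` output and compares them. The function names are hypothetical, the year is supplied by hand because `last` does not print it, and an English locale is assumed for the day and month abbreviations.

```python
from datetime import datetime

# Formats are assumptions based on the excerpts in this message.
ALERT_FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Sep 12 13:32:01 2003"
BOOT_FMT = "%a %b %d %H:%M %Y"      # "Fri Sep 12 15:32" plus an appended year


def parse_alert(ts):
    """Parse an alert-log timestamp line."""
    return datetime.strptime(ts.strip(), ALERT_FMT)


def parse_boot(ts, year):
    """Parse the date columns of a `last reboot` line; the caller supplies the year."""
    return datetime.strptime(f"{ts.strip()} {year}", BOOT_FMT)


def logged_before_boot(alert_ts, boot_ts, year):
    """Return True if the alert-log entry was written before the given boot."""
    return parse_alert(alert_ts) < parse_boot(boot_ts, year)


if __name__ == "__main__":
    boot = "Fri Sep 12 15:32"
    for entry in ("Fri Sep 12 13:32:01 2003", "Fri Sep 12 15:33:08 2003"):
        print(entry, "before boot:", logged_before_boot(entry, boot, 2003))
```

With the values above, the 13:32:01 entry falls before the 15:32 boot and the 15:33:08 entry falls after it, which matches the conclusion that the first error was logged before the Unix box was rebooted.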

After the box was rebooted with the RAID array on:

Fri Sep 12 15:33:08 2003
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
> Fri Sep 12 15:33:11 2003

The other files on /u1 were fine. Also, concerning the other error:

Fri Sep 12 16:18:58 2003
> Thread recovery: start rolling forward thread 1
> Fri Sep 12 16:18:58 2003
> Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
> ORA-00313: open failed for members of log group 3 of thread 1
> ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3

The other files on /u2 were fine. The files in question just disappeared. I know this is not normal and RAID boxes do not normally lose files, but it's hard to argue against the empirical evidence here that they can. It may be that either I or the folks down at IR-2 induced the problems. But files were indeed lost on two different LUNs.
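
For anyone wanting to confirm which files named in an alert log are actually absent on disk, here is a minimal sketch. It assumes Python is available on the host; the function names are hypothetical, and the pattern only covers lines shaped like the ORA-00202 and ORA-00312 entries quoted in this thread (an ORA- code followed by a single-quoted absolute path on the same line).

```python
import os
import re

# An ORA- code followed, on the same line, by a single-quoted absolute path.
# This shape is an assumption taken from the excerpts in this thread.
PATH_RE = re.compile(r"ORA-(\d{5}):[^\n']*'(/[^'\n]+)'")


def files_named_in_log(text):
    """Return (ora_code, path) pairs for every quoted path in alert-log text."""
    return [(m.group(1), m.group(2)) for m in PATH_RE.finditer(text)]


def missing_files(paths):
    """Return the sorted, de-duplicated paths that do not exist on this host."""
    return sorted(p for p in set(paths) if not os.path.exists(p))


if __name__ == "__main__":
    sample = (
        "ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'\n"
        "ORA-27037: unable to obtain file status\n"
        "ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'\n"
    )
    named = files_named_in_log(sample)
    for code, path in named:
        print(f"ORA-{code} names {path}")
    print("missing on this host:", missing_files(path for _, path in named))
```

Run against the full alert log, this separates "file system not visible" cases from "file system present but file gone" cases only indirectly: a missing path still has to be checked against whether its mount point is there.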

My current thinking is that the two files were being written when the power was turned off on the RAID array, or that there was not enough power to keep the disks spinning because the UPS had been drained. The battery for the cache was reporting low, but that report is based on the number of hours it has been in operation. Should it not have maintained the cache?

Ian MacGregor
Stanford Linear Accelerator Center
ian_at_SLAC.Stanford.edu     

-----Original Message-----
Sent: Tuesday, September 16, 2003 10:55 AM
To: Multiple recipients of list ORACLE-L

Okay, core questions:

- As someone asked, what's the make/model of storage?
- Has your RAID array lost its config? In other words, is the storage there, just with an empty vtoc/volume table/partition table (insert your particular OS nomenclature)?
- Is the filesystem good, just empty? When you say the file is gone, is the /u1 directory empty, or is the filesystem structure there, just that file is gone?

Okay, I just saw your message that shows it's Solaris 8 + Veritas. Here's what probably happened. The box was powered on without the RAID array powered on, and consequently Veritas doesn't see the disk groups/volumes that are on the RAID array. Have you tried doing (as root):

vxconfigd -km enable

This will cause a rescan of the existing volume groups. Afterwards, what does a vxprint -hrt look like?

In general, power loss to a RAID array will not produce the results you describe. I think it's far more likely that a system->array interaction is preventing proper access to your storage.

Thanks,
Matt

--
Matthew Zito
GridApp Systems
Email: mzito_at_gridapp.com
Cell: 646-220-3551
Phone: 212-358-8211 x 359
http://www.gridapp.com


> -----Original Message-----
> From: ml-errors_at_fatcity.com [mailto:ml-errors_at_fatcity.com] On
> Behalf Of MacGregor, Ian A.
> Sent: Tuesday, September 16, 2003 12:34 AM
> To: Multiple recipients of list ORACLE-L
> Subject: Raid Arrays and Power Loss
>
>
> Last Friday was hot here, and rumor has it our 230 kV power
> line sagged and touched some tree branches. The local power
> company shut it off, leaving our systems to depend on UPS.
> About 30 minutes afterwards one system produced these
> errors. This was just before the system went dead.
>
> Fri Sep 12 12:58:40 2003
> Errors in file /opt/oracle/admin/BBRO/bdump/bbro_ckpt_1420.trc:
> ORA-00206: error in writing (block 3, # blocks 1) of controlfile
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27063: skgfospo: number of bytes read/written is incorrect
> SVR4 Error: 5: I/O error
> Additional information: -1
> Additional information: 8192
> Fri Sep 12 12:58:42 2003
> Errors in file /opt/oracle/admin/BBRO/bdump/bbro_ckpt_1420.trc:
> ORA-00221: error on write to controlfile
> ORA-00206: error in writing (block 3, # blocks 1) of controlfile
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27063: skgfospo: number of bytes read/written is incorrect
> SVR4 Error: 5: I/O error
> Additional information: -1
> Additional information: 8192
> Fri Sep 12 12:58:42 2003
> CKPT: terminating instance due to error 221
> Instance terminated by CKPT, pid = 1420
> --------------------------------------------------------------
> -----------------------------------------------
> Things look pretty shaky here. When things were restarted
> the following error was produced.
> Fri Sep 12 13:32:01 2003
> ORA-00204: error in reading (block 1, # blocks 1) of controlfile
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27091: skgfqio: unable to queue I/O
> SVR4 Error: 6: No such device or address
> Additional information: 1
>
> The raid array had not been powered on
> --------------------------------------------------------------
> -----------------------------------------
> However
> Fri Sep 12 15:33:08 2003
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
> Fri Sep 12 15:33:11 2003
> ORA-205 signalled during: alter database mount...
>
> Now the file system is available, but the file itself has
> disappeared. It was not corrupted, just disappeared. We
> duplex a copy to an internal disk. So recovery was easy.
>
> However once this was fixed
>
> Fri Sep 12 16:18:58 2003
> Thread recovery: start rolling forward thread 1
> Fri Sep 12 16:18:58 2003
> Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
> ORA-00313: open failed for members of log group 3 of thread 1
> ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
> ORA-313 signalled during: ALTER DATABASE OPEN...
> --------------------------------------------------------------
> -----------------------------------------------
> These files are on a RAID 1 LUN. Both copies of the file
> are gone. Again not corrupted but gone. I don't know if
> using duplexing rather than RAID 1 would have mattered here,
> but I am changing things so that one group of redo logs is on
> internal disk and written via the duplexing method.
>
>
>
>
> Ian MacGregor
> Stanford Linear Accelerator Center
> ian_at_SLAC.Stanford.edu
>
>
>
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.net
> --
> Author: MacGregor, Ian A.
> INET: ian_at_SLAC.Stanford.EDU
>
> Fat City Network Services -- 858-538-5051 http://www.fatcity.com
> San Diego, California -- Mailing list and web hosting services
> ---------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru')
> and in the message BODY, include a line containing: UNSUB
> ORACLE-L (or the name of mailing list you want to be removed
> from). You may also send the HELP command for other
> information (like subscribing).
>
Received on Tue Sep 16 2003 - 14:04:36 CDT
