Re: Corrupt redo logs and datafiles

From: Mark Bole <makbo_at_pacbell.net>
Date: Tue, 15 Mar 2005 23:08:31 GMT
Message-ID: <PVJZd.10991$C47.4249@newssvr14.news.prodigy.com>

Frank van Bortel wrote:

> SG wrote:
>

>> Hi all.
>>
>> I am new to Oracle, so I would appreciate any insight into the main
>> causes, in your experience, of corrupt redo log files and datafiles.
>> We had a corrupt sysaux.dbf file and constantly corrupt redo logs
>> that would stop our application: since the database was in
>> archivelog mode, we'd get archiver errors. We tried creating new log
>> groups, but that didn't help. We had to keep clearing unarchived log
>> groups, etc. to get it working. We rebuilt the database and used a
>> dump file from the "suspect" database to import our custom tables
>> into the newly created database and tablespace. All had been fine for
>> a month, but now it's happening again. I looked in the alert logs and
>> see that we now have a corrupt system.dbf file. Has anyone had this
>> type of experience? We are running Oracle 10g on Red Hat ES 3.0. The
>> kernel version on this system is 2.4.21.4. An identical system with
>> no problems, on the same hardware, is running kernel version
>> 2.4.21-15.0.3. Could this point to a hardware issue? Any ideas would
>> be greatly appreciated. TIA.
>>
>> SG

> And the filesystem(s) you use?
> Ext2, ext3, Reiserfs, hardware RAID, software RAID?
> Hardware: SCSI, IDE (tinkered with params?)
> 
> Why are you running unpatched kernels anyway?
> See: http://rhn.redhat.com/errata/RHSA-2005-043.html
> Linux csdb01.cs.nl 2.4.21-27.0.2.EL
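
To see whether a given kernel release is older than the errata kernel mentioned above, a rough check like the following may help. It is only a sketch: `version_key` and `older_than` are hypothetical helpers, and they compare just the numeric fields of the release string, which is not the same as RPM's own version ordering.

```python
import re


def version_key(release):
    """Turn a kernel release string into a tuple of integers for rough ordering.

    Crude by design: non-numeric parts such as "EL" are ignored, so this is
    not RPM's version-comparison algorithm.
    """
    return tuple(int(part) for part in re.findall(r"\d+", release))


def older_than(release, reference):
    """True if `release` sorts before `reference` by its numeric fields."""
    return version_key(release) < version_key(reference)
```

On the machine itself, `older_than(platform.release(), "2.4.21-27.0.2.EL")` would compare the running kernel (as reported by `uname -r`) against that errata release.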

Check the values of the DB_BLOCK_CHECKING and DB_BLOCK_CHECKSUM initialization parameters, and adjust them for testing if desired.
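
On a running instance, SQL*Plus `SHOW PARAMETER db_block_checking` reports the current value. If you only have a text parameter file (pfile) to hand, a small sketch like this one pulls out whether either parameter is set explicitly. It assumes simple `name = value` lines with `#` comments, optionally prefixed with `*.` or `<sid>.` as in pfiles generated from an spfile; `block_check_settings` is a hypothetical helper, and `None` just means "not set in this file" (whatever default your release uses then applies).

```python
# The two parameters named in the post, in lower case for matching.
PARAMS = ("db_block_checking", "db_block_checksum")


def block_check_settings(pfile_text):
    """Return the explicit value of each parameter, or None if not set."""
    settings = {name: None for name in PARAMS}
    for raw in pfile_text.splitlines():
        # Drop comments and surrounding whitespace (assumed pfile layout).
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        # Strip an optional "*." or "<sid>." prefix before the parameter name.
        name = name.lower().split(".")[-1]
        if name in settings:
            settings[name] = value.strip("'\"").upper()
    return settings
```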

In my experience, what you describe is similar to problems I've seen with faulty disk controllers or even bad memory modules. What makes these so difficult to troubleshoot is their intermittent, unpredictable nature: the system runs fine for hours, days, even weeks, and then errors start cropping up.
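
Because the failures come and go, it can help to tally corruption errors in the alert log by day and see whether they cluster. The sketch below assumes the classic 10g text alert log, where each entry starts with a timestamp line such as `Tue Mar 15 23:08:31 2005`. The error codes listed (ORA-01578 data block corrupted, ORA-00353 log corruption, ORA-00354 corrupt redo log block header) are an assumed starting set to extend as needed, and `corruption_days` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed starting set of corruption-related error codes; extend as needed.
CORRUPTION_CODES = ("ORA-01578", "ORA-00353", "ORA-00354")

# Assumed alert log timestamp line, e.g. "Tue Mar 15 23:08:31 2005".
TIMESTAMP_RE = re.compile(
    r"^(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) (\w{3})\s+(\d{1,2}) "
    r"\d{2}:\d{2}:\d{2} (\d{4})$"
)


def corruption_days(alert_log_text):
    """Count corruption-related error lines per day, e.g. 'Mar 15 2005'.

    Errors seen before the first timestamp line are counted under 'unknown'.
    """
    counts = Counter()
    current_day = "unknown"
    for line in alert_log_text.splitlines():
        line = line.strip()
        match = TIMESTAMP_RE.match(line)
        if match:
            month, day, year = match.groups()
            current_day = f"{month} {int(day)} {year}"
            continue
        if any(code in line for code in CORRUPTION_CODES):
            counts[current_day] += 1
    return counts
```

Days with bursts of errors separated by long quiet stretches would fit the intermittent pattern described above, though this alone cannot tell a bad controller from bad memory.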

-Mark Bole

Received on Tue Mar 15 2005 - 17:08:31 CST