ASSM tablespaces, HWM synchronization issues and a plethora of 10g bugs

From: Don Granaman <DonGranaman_at_solutionary.com>
Date: Thu, 5 Feb 2009 11:14:02 -0600
Message-ID: <EF73D0391D16CD4EA683766122FE9DD7759E84_at_OMU-EXCH01.solutionary.com>



Does anyone have any experience with or knowledge of the problems with ASSM tablespaces in 10g - especially those related to HWM synchronization for objects in ASSM tablespaces? The signature of those errors is:  

ksedmp: internal or fatal error

ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []  
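
To track how often this signature appears over time, a small log scan is enough. The following is only a sketch: the function name `count_ora600` and the sample lines are made up for illustration, and it assumes the error text appears on one line exactly as shown above. It takes any iterable of lines, so it makes no assumption about where the alert log or trace files live.

```python
import re
from collections import Counter

# Matches the ORA-00600 line and captures the first bracketed argument,
# which identifies the internal error (here: kdsgrp1).
ORA600 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")


def count_ora600(lines):
    """Count ORA-00600 occurrences, keyed by their first argument."""
    counts = Counter()
    for line in lines:
        match = ORA600.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample input; in practice, pass an open log file instead.
sample = [
    "ksedmp: internal or fatal error",
    "ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []",
    "ORA-01555: snapshot too old",
]

counts = count_ora600(sample)
print(counts["kdsgrp1"])  # prints 2
```

Passing an open file object (for example `count_ora600(open(path))`, with `path` being wherever the log is kept) works the same way, since the function only iterates over lines.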

We have been getting these errors since moving from 9.2.0.4 to 10.2.0.4 in July 2008. (We had long-standing issues with ASSM in 9.2 and were told that they were "fixed in 10g". Well, sort of...more like "bug enhancement". The 9i ASSM bugs were replaced by more pervasive 10g ASSM bugs!)  

It seems that these are usually transient errors - a query that fails with this error once can almost always be immediately run again with success. I filed a service request and spent weeks dealing with support on the issue, but there was never a satisfactory resolution, so I simply gave up. The errors were never show-stoppers and subsided after the first month, but for the last few weeks they have been coming up again. I filed yet another service request, and this time they are going off in another (probably wrong) direction - wanting me to "fix the corruption" by exporting, dropping and recreating the affected objects, then importing the data back in. There are several problems with this pseudo-solution:

  1. In testing, the supposed "corruption" reappears shortly (hours or less) after the rebuild.
  2. Some of the affected objects in production are between 100 GB and 1 TB - and are core to an ultra-critical 24xForever system.
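
Because a query that hits this error almost always succeeds when rerun immediately, a client-side retry can serve as a stopgap where rebuilding objects is impractical. This is a minimal sketch, not something support recommended: `run_with_retry` and `is_kdsgrp1` are hypothetical names, `run_query` stands for whatever callable executes the statement, and it assumes the driver's exception message contains the ORA-00600 text with the kdsgrp1 argument.

```python
import time


def is_kdsgrp1(exc):
    """Return True if the exception text looks like ORA-00600 [kdsgrp1]."""
    text = str(exc)
    return "ORA-00600" in text and "kdsgrp1" in text


def run_with_retry(run_query, attempts=3, delay_seconds=0.0):
    """Call run_query(), retrying only on the transient kdsgrp1 error.

    Any other exception, or a kdsgrp1 failure on the last attempt,
    is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            if not is_kdsgrp1(exc) or attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

This only hides the symptom from the application. It does nothing about the underlying HWM discrepancy, and the errors still get logged on the server.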

There is (supposedly) a patch (6474009) that, when combined with event="43809 trace name context forever, level 1", sort of "masks" the problem, but there are evidently no patches to actually fix the core problem. Support admits this, but says "It's fixed in 11.1.0.7". I've heard that before, when we were on 9i: "It's fixed in 10g".

You can check for the HWM discrepancy with the (undocumented) DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM. After some haranguing, support did publish Doc_ID: 726653.1 - the definition. This was developed to help diagnose and repair the 6474009 bug, but has its own little jewel of a bug - 6493013 (DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM with check_only=0 can corrupt blocks).  

It seems that this is the "groundhog day" bug. No matter what, it seems to start over again with the same symptoms. So far, everything I've tried (in a test system) either only very temporarily "fixes" the problem - or has its own little galaxy of bugs.  

Perhaps the only *real* solution is to move *everything* out of ASSM tablespaces, but that isn't really a great option either as this is a RAC system.  

Don Granaman (nee: OraSaurus)  


--
http://www.freelists.org/webpage/oracle-l
Received on Thu Feb 05 2009 - 11:14:02 CST
