Re: 11g fault diagnosability infrastructure and poor documentation

From: Andre van Winssen <dreveewee_at_gmail.com>
Date: Wed, 3 Oct 2007 14:49:51 +0200
Message-ID: <9b46ac490710030549i7e75497ayc0ad27c410a524f2@mail.gmail.com>

Hi,

The 11g engine is more complex than ever, which leads to 'better bugs' that are more difficult to solve, an enlarged attack surface for hacking, more frequent unexpected side effects of so-called fixes, and more difficulty in finding ways to turn off unwanted features.

As Dilbert said: "if it ain't broke it doesn't have enough features yet". Would 11g have enough features now?

Regards,
Andre

2007/10/3, Jeremiah Wilton <jeremiah_at_ora-600.net>:
>
> Am I the only one who has been unable to do much with this feature due
> to the woefully absent documentation? Three components of "fault
> diagnosability" in particular seem very interesting:
>
> - automatic hang detection
> - automatic reactive "health checks"
> - incident packages as a replacement for RDA
>
> Hang detection seems like a great idea, but there is no information on
> precisely what constitutes a "hang" according to DIAG and DIA0. These
> processes seem never to wake up, even in the most dire of hanging
> situations. I did find that by default in single-instance databases,
> the _hang_resolution, _hm_analysis_output_disk and _hm_log_incidents
> parameters are set to FALSE, which I take to mean the feature is turned
> off. Even turned on, long hangs involving chains of waiters visible in
> hanganalyze output do not trigger any actions that I can discern. This
> is slightly complicated by the fact that two components of "fault
> diagnosability" share the initials HM, and packages, parameters and
> views use HM interchangeably to mean "hang manager" and "health monitor".
>
> As for Health Checks, there is no documentation indicating what kinds of
> events or incidents might result in a "reactive" health check. The
> existence of reactive health checks is repeatedly asserted in the
> documentation, and there is even a parameter called _diag_hm_rc_enabled
> with the description "Parameter to enable/disable Diag HM Reactive
> Checks". Set to FALSE by default, this parameter does nothing in the
> event of a badly degraded and hanging system either. We are left to
> wonder what "reactive" health checks react to!
>
> Finally, the incident packaging service works well enough, but is
> predicated completely upon the notion that any and all problems will be
> associated with a fatal error of some kind. Anything that does not dump
> ORA-600 or another fatal error will not result in an "incident" and thus
> there is nothing to package. There is apparently no provision for
> problems that do not dump on an error. So, an on-demand incident package
> apparently cannot be created. Thus, despite the incident payloads
> having many of the same contents as the horrid RDA of yore, you cannot
> generate one on demand in a supported way. You can shoot a server
> process with a SIGSEGV, but I cannot imagine that is how Oracle intends
> us to get diagnostic data for opening an SR.
>
> You can probably detect that I am frustrated, but I have been playing
> with this feature set for weeks and it is a frustrating morass of
> nonworking undocumented wastes of server memory. Remember, we are all
> now running two extra background processes, DIAG and DIA0, just for this
> feature. They are up and running and using memory on all of our 11g
> systems even if they do nothing and are turned off at the parameter
> level by default.
>
> I am ranting here in hopes that someone else has gotten further than I
> have or knows someone on the inside who can shed some light on these
> concerns.
>
> Thanks,
>
> Jeremiah Wilton
> ORA-600 Consulting
> --
> http://www.freelists.org/webpage/oracle-l

Received on Wed Oct 03 2007 - 07:49:51 CDT