Re: 11g fault diagnosability infrastructure and poor documentation
Hi,
The 11g engine is more complex than ever, which leads to 'better bugs' that are more difficult to solve, an enlarged attack surface for hacking, more frequent unexpected side effects of so-called fixes, and more difficulty in finding ways to turn off unwanted features.
As Dilbert said: "If it ain't broke, it doesn't have enough features yet." Would 11g have enough features now?
Regards,
Andre
2007/10/3, Jeremiah Wilton <jeremiah_at_ora-600.net>:
>
> Am I the only one who has been unable to do much with this feature due
> to the woefully absent documentation? Three components of "fault
> diagnosability" in particular seem very interesting:
>
> - automatic hang detection
> - automatic reactive "health checks"
> - incident packages as a replacement for RDA
>
> Hang detection seems like a great idea, but there is no information on
> precisely what constitutes a "hang" according to DIAG and DIA0. These
> processes seem never to wake up, even in the most dire of hanging
> situations. I did find that by default in single-instance databases,
> the _hang_resolution, _hm_analysis_output_disk and _hm_log_incidents
> parameters are set to FALSE, which I take to mean the feature is turned
> off. Even turned on, long hangs involving chains of waiters visible in
> hanganalyze output do not trigger any actions that I can discern. This
> is slightly complicated by the fact that two components of "fault
> diagnosability" share the initials HM, and packages, parameters and
> views use HM interchangeably to mean "hang manager" and "health monitor".
>
> As for Health Checks, there is no documentation indicating what kinds of
> events or incidents might result in a "reactive" health check. The
> existence of reactive health checks is repeatedly asserted in the
> documentation, and there is even a parameter called _diag_hm_rc_enabled
> with the description "Parameter to enable/disable Diag HM Reactive
> Checks". Set to FALSE by default, this parameter does nothing in the
> event of a badly degraded and hanging system either. We are left to
> wonder what "reactive" health checks react to!
>
> Finally, the incident packaging service works well enough, but is
> predicated completely upon the notion that any and all problems will be
> associated with a fatal error of some kind. Anything that does not dump
> ORA-600 or another fatal error will not result in an "incident" and thus
> there is nothing to package. There is apparently no provision for
> problems that do not dump on an error. So, an on-demand incident package
> apparently cannot be created. Thus, despite the incident payloads
> having many of the same contents as the horrid RDA of yore, you cannot
> generate one on demand in a supported way. You can shoot a server
> process with a SIGSEGV, but I cannot imagine that is how Oracle intends
> us to get diagnostic data for opening an SR.
>
> You can probably detect that I am frustrated but I have been playing
> with this feature set for weeks and it is a frustrating morass of
> nonworking undocumented wastes of server memory. Remember, we are all
> now running two extra background processes, DIAG and DIA0, just for this
> feature. They are up and running and using memory on all of our 11g
> systems even if they do nothing and are turned off at the parameter
> level by default.
>
> I am ranting here in hopes that someone else has gotten further than I
> have or knows someone on the inside who can shed some light on these
> concerns.
>
> Thanks,
>
> Jeremiah Wilton
> ORA-600 Consulting
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
-- http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 03 2007 - 07:49:51 CDT