Re: No one could logon to productio database for a while

From: joel garry <joel-garry_at_home.com>
Date: 25 Sep 2006 14:05:45 -0700
Message-ID: <1159218345.231356.189230@m7g2000cwm.googlegroups.com>

dbaplusplus_at_hotmail.com wrote:
> dbaplusplus_at_hotmail.com wrote:
> > I am using 9.2.0.5 on HP UNIX. I have an after logon trigger which
> > executes quickly - all it does it set some stored outline parameters
> > (trigger is provided below).
> >
> > While this logon trigger is being executed, if someone else tries to
> > connect, will Oracle wait for logon trigger to be completed before they
> > can connect or will Oracle execute another "instance" of logon
> > trigger? There is only one logon trigger in database. For example if I
> > put a sleep in logon trigger for 5 minutes, will no one will be able to
> > logon during this 5 minutes, I do not think so but want o check.
> >
> > Today for some reason, no one could logon to my production database
> > including sys.
> > When sys tried to logon from sqlplus, session was just hung.
> >
> > We have no of process set to 1200 and there were only 383 oracle shadow
> > processes,
> > But then I killed a time-consuming oracle shadow process, I see a
> > message from this trigger in my alert.log:
> >
> > ORA-1013: user requested cancellation of operation, *** Error in login
> > trigger.
> >
> > Not sure whether this error message came from this long running killed
> > process or some other process which was causing execution of logon
> > trigger.
> >
> > Aftre killng a process, people could connect. I logged on to system. My
> > trigger is valid and now everything works. All I did was killed an
> > Oracle shadow process which somehow will not let any one logon. This
> > sounds like some Oracle quirk/bug. I am not sure whether logon trigger
> > is the culprit or something else, but genrally if logon trigger has
> > some issue it can hose everyone (but not sys). I have no clue how to
> > find root cause f the problem. I have already looked in alert.log and
> > trace files
> >
> > Below is trigger. It is a simple trigger, it calls a function and
> > function simply sets use_stored_outlines for some users.
> >
> >
> > CREATE OR REPLACE TRIGGER my_logon AFTER LOGON ON database
> > BEGIN
> > wmuser.my_outlnfun;
> > exception
> > when others then
> > sys.dbms_system.ksdwrt(2, SUBSTR (SQLERRM, 1, 100) || ' *** Error
> > in logon trigger');
> > END;
> > /
> >
> > create or replace procedure my_outlnfun is
> > begin
> >
> > if (user = 'JJPROS' or user = 'READGGPROS' or user = 'GGOSPP'') then
> > execute immediate 'alter session set
> > use_stored_outlines=my_custom';
> >
> > end if;
> > end;

> Listener.log shows that people were connecting to database, i.e.,
> connections were being made, they appear to be hung.

This sounds like you did something to change privilege, so caused the login trigger to be invalidated, but it couldn't recompile because it couldn't get to where privilege for recompilation is evaluated, and the alter session for everyone else couldn't be evaluated because that was locked while privilege was being altered. Or something like that, pehaps originally instigated by some completely unrelated procedure recompiling or user being modified. Then, when you killed the offending session (that is, the first one to be locked by the original one), the privilege could be evaluated by everyone, your procedure could automatically recompile, and the killed procedure left the message in the alert log.

It doesn't matter how quickly a trigger works if it has to wait for something it can't get.

Perhaps you should leave an OEM session open so you can check locks and waits when a problem like this occurs. I'm speculating you would see something locked in a base table for the *priv* views and some obscure latch wait. It is also possible that you are running into memory leaks or other almost-about-to-ORA-4031 issues since you are on 9205, which could intensify recompilation issues. (See bug 3258390 for one example of something where schemaname is messed up - and note that the fix for it in 9206 makes Bug 4480100 Latch contention for "row cache objects" latch - in other words, the first bug or something related to it could cause your problem the second time something is run, and the fix would make everyone have more latch contention. Six broken eggs or half-a-dozen - what a mess).

--
@home.com is bogus.
http://content.ytmnd.com/content/e/0/3/e035a08e616468d1b0500bc12dcc11e9.gif

Received on Mon Sep 25 2006 - 16:05:45 CDT