RAC Cluster - 100% cpu on all nodes
I got called today about one of our RAC clusters (RHEL 4, 2 cpus, 8GB
RAM, 10.2.0.1, 32-bit, ASM 2, EMC clariion cx700 storage, dual qlogic
hbas) that was locked up.
the cpu on both nodes was at 100%. It took several minutes to log in and
I could never get into sqlplus (10-15 minutes waiting).
I also tried to shut it down with srvctl but it didn't respond
either.
IO was near zero - makes sense, the cpu starved all other resources.
no errors in the alert.logs (both nodes) for both asm and the
instances - just a gap in the entries from 10am - 2pm (reboot).
No new trace files.
nothing significant in the ka-zillion logs in the clusterware home.
no errors in /var/log/messages
while doing a ps -ef, I saw 20+ processes of: /opt/app/oracle/product/crs10.2.0/bin/racgmain check
some were owned by root and some by oracle, and each one took about 5%
cpu.
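
For next time, here is a minimal sketch of tallying those processes by owner from saved `ps -ef` output. The function name `count_by_owner` and the sample lines (PIDs, times) are hypothetical and only for illustration; the one thing taken from the real system is the `racgmain check` command line. It assumes the usual 8-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD).

```python
from collections import Counter


def count_by_owner(ps_output, pattern="racgmain check"):
    """Count processes whose command line contains `pattern`, grouped by owner.

    Assumes `ps -ef` style output: UID PID PPID C STIME TTY TIME CMD.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command line stays in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0] == "UID":
            continue  # skip blank/short lines and the header row
        owner, cmd = fields[0], fields[7]
        if pattern in cmd:
            counts[owner] += 1
    return dict(counts)


# Hypothetical sample, not captured from the real cluster.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      4101     1  5 10:02 ?        00:03:10 /opt/app/oracle/product/crs10.2.0/bin/racgmain check
oracle    4102     1  5 10:02 ?        00:03:05 /opt/app/oracle/product/crs10.2.0/bin/racgmain check
oracle    4103     1  0 09:00 ?        00:00:01 ora_pmon_db1
root      4104     1  5 10:03 ?        00:02:58 /opt/app/oracle/product/crs10.2.0/bin/racgmain check
"""

print(count_by_owner(SAMPLE))
```

On a live node the same function could be fed the output of `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`, assuming the box is responsive enough to run it.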
they didn't want to wait for diagnosis so they said to reboot them
both.
it came up fine, but after the reboot there was only one of the
processes mentioned above.
I ran cluvfy and it passed all the tests.
I ran the awr reports afterwards for 10am to 2pm but haven't analyzed
them yet.
Has anyone else experienced this with RAC? Is there a quick hit list of things you check when things go south? I'm pretty methodical and started checking the standard things, but that wasn't fast enough for these folks. What do you check when all nodes of a RAC cluster are locked up like that?
I contacted support, but I don't have much hope based on my recent experience.
Thanks,
Steve
p.s. I forgot to grab the sar data to see what it shows. I'll do that tomorrow.
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jun 12 2006 - 19:11:48 CDT