Runaway pl/sql in the db from orphaned oas connections causes complete memory saturation

From: Charles Schultz <sacrophyte_at_gmail.com>
Date: Thu, 6 Dec 2012 21:08:46 -0600
Message-ID: <CAPZQniVC2DQpYhLn_ZyM_GquJR2cvY090Sbi3kKZgrZse4zxBQ_at_mail.gmail.com>



Good day,
I am hoping that some kind soul who is uber awesome at tracking down pl/sql and PGA issues will be able to help us out on this one. :)

*Basic Environment:*

  • Oracle HTTP Server (Fusion Middleware) 11.1.1.4 running on Red Hat Linux 5.8
  • Oracle OSEE 11.2.0.2 (and 11.1.0.7) running on Solaris 64-bit 10
  • LoadRunner and RationalRose on various Windows desktops

*Basic Problem:*
During testing of a particular application, we noticed that memory on the db server (solaris box) was quickly consumed and eventually caused the server to tailspin (hang). After lots of testing, we have been able to paint a much more detailed picture of what is going on behind the scenes.

*Detailed Problem:*
For reasons yet unknown, a small handful of connection pool processes (between 1 and 5, typically) from the oas will report a timeout to the end-user (simulated via the batch job; LoadRunner/RationalRose). However, the associated database process(es) will suddenly allocate massive amounts of pga and uga memory in a looping fashion until all system memory is completely consumed. At first we thought the application was at fault, but we have since duplicated the issue using a simple "hello world" procedure called by HTP (which is called from the simulated user sessions). Terminating either the oas process or the db session has the desired effects of stopping the memory leak.

*Interesting symptoms and observations:*

  • At first we thought the problem was caused by something in 11.2.0.2 (recently upgraded, hence the suspicion). However we determined the same memory leak actually does happen in 11.1.0.7 yet at a significantly less rate, enough to be almost unnoticeable in normal operations.
  • Via the application, adding an index to speed up various queries actually made the memory leak worse. Slowing down the application (removing the index, enabling event 10046) seemed to slow down the rate of memory leaking.
  • Similarly, turning on plsql debugging (via event 10938) or reducing the number of plsql requests (PlsqlMaxRequestsPerSession) to 1 also slows the processes down so much that memory leaking is not evident.
  • We are using prstat (sorted by rss) and 'top' to determine "leaks"; typically we will see a process on the db server start by grabbing a gigabyte of extra memory over and above the sga allocation, and then growing at about 1gb per minute per session.

We managed to accumulate 3 open SRs on this problem:

  • SR 3-6456891351 : Hemorrhaging Memory
  • SR 3-6496145406 : KV: OHS/mod_plsql Connection Keeps Growing and Causes DB to Become Unresponsive /* this one has the most detail */
  • SR 3-6532544331 : Customer getting ORA-2730x errors
-- 
Charles Schultz


--
http://www.freelists.org/webpage/oracle-l
Received on Fri Dec 07 2012 - 04:08:46 CET

Original text of this message