Re: Understanding "cursor: pin S wait on X"

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Tue, 9 Apr 2013 13:32:33 +0300
Message-ID: <CAMHX9JL+R6-tntZeD7PioqH=QmfZhan5CTzrex5hpx3sxTuhVg_at_mail.gmail.com>



This wait happens when your session wants to examine / execute an existing child cursor - but it's being pinned in X mode by someone else. The usual reasons for this are:
1) The cursor got flushed out for some reason (or it's just the first time anyone is running it) and the other session is currently hard parsing/compiling/loading the contents of this cursor back into the library cache. The longer the parsing/loading takes, the longer you wait. If the parsing session hits some (performance) issues, this pin may be held for a long time.

or

2) The child cursor is already loaded into cache, but someone else is currently flushing it out (shared pool shortage, fragmentation & loading of a big object, SGA target / ASMM reducing the SGA size).
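If you want to see who is actually holding that X pin, the blocker's session ID is encoded in the wait's P2 value. A minimal sketch (assuming a 64-bit platform, where the SID sits in the top 32 bits of P2; the split differs on 32-bit platforms):

```sql
-- Decode the blocking session from P2 of the waiting sessions
-- (64-bit platform assumed: top 32 bits of P2 = blocker's SID):
SELECT sid,
       sql_id,
       p1                     AS mutex_idn,     -- identifies the cursor (hash)
       TRUNC(p2 / 4294967296) AS blocking_sid   -- session holding the pin in X mode
FROM   v$session
WHERE  event = 'cursor: pin S wait on X';
```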

The thing with "auto-extending" connection pools is that there's a chicken-and-egg situation when troubleshooting performance & utilization spikes - what caused what? Sometimes there's just a mild performance hiccup for whatever reason, which nobody would even notice, but a trigger-happy connection pool will greatly amplify it by starting up hundreds of new connections during the problem time.

0.6 logons per second is not a problem if that's your actual number. But if you are talking about 0.6 logons per second *on average* in a 1-hour AWR report, then this could also mean (let's say) 40 new connections per second during your spike time (of 60 seconds, for example) and then no more logins throughout the rest of the hour: 60 x 40 / 3600 = 0.67.

So if your spike lasted for X seconds, then your performance data should also cover those X seconds (or be even more granular), otherwise averages over a long period may hide the truth. You can parse the listener log with a simple grep & awk command to get a more detailed overview of connection rates (and one somewhat closer to the connection pool).
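A minimal sketch of that grep & awk approach (the sample lines below use a hypothetical listener.log layout - the exact format varies by version - but "establish" marks incoming connection requests):

```shell
# Create a hypothetical listener.log excerpt (the line format is an assumption):
cat > /tmp/listener_sample.log <<'EOF'
09-APR-2013 11:59:01 * (CONNECT_DATA=(SERVICE_NAME=orcl)) * establish * orcl * 0
09-APR-2013 11:59:01 * (CONNECT_DATA=(SERVICE_NAME=orcl)) * establish * orcl * 0
09-APR-2013 11:59:02 * (CONNECT_DATA=(SERVICE_NAME=orcl)) * establish * orcl * 0
09-APR-2013 12:00:15 * service_update * orcl * 0
EOF

# Count connection requests per second: keep only "establish" lines,
# print the date + time-of-day fields, then count the duplicates.
grep ' establish ' /tmp/listener_sample.log \
  | awk '{print $1, $2}' \
  | sort | uniq -c | sort -rn
```

Seconds with abnormally high counts pinpoint the spike far better than an hourly average.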

If you want to reproduce "cursor: pin S wait on X" waits, you can use one of my seminar demo scripts (http://blog.tanelpoder.com/files/scripts/aot/demo4.sql). Just run the entire script in one session (it will pin a child cursor and end up hard parsing for a very long time) and then run only the SELECT part of that script in a different session. But my demo focuses very narrowly on causing the wait event itself; I don't think it will reproduce anything like what you saw.

Another option would be to get tens or hundreds of sessions running the same SQL statement (and overload the CPUs) and then purge the cursor with DBMS_SHARED_POOL.PURGE (not ALTER SYSTEM FLUSH SHARED_POOL, which skips pinned chunks - PURGE will wait until it gets the pin).
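A minimal sketch of the purge part (assuming EXECUTE privilege on DBMS_SHARED_POOL; the SQL text filter and the address/hash_value pair are placeholders you would look up first):

```sql
-- Find the cursor's address and hash value (placeholder SQL text):
SELECT address, hash_value
FROM   v$sqlarea
WHERE  sql_text LIKE 'select /* pin_test */%';

-- Purge that cursor ('C' = cursor object). Unlike ALTER SYSTEM FLUSH
-- SHARED_POOL, PURGE waits until it can get the pin on the object:
EXEC DBMS_SHARED_POOL.PURGE('&address,&hash_value', 'C');
```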

-- 
Tanel Poder
Enkitec (The Exadata Experts)
Training <http://blog.tanelpoder.com/seminar/> | Troubleshooting <http://blog.tanelpoder.com/> | Exadata <http://www.amazon.com/Expert-Oracle-Exadata-Apress/dp/1430233923> | Voicee App <http://voic.ee/>



On Tue, Apr 9, 2013 at 11:59 AM, Thomas Kellerer <thomas.kellerer_at_mgm-tp.com> wrote:

> Hello all,
>
> we had a situation on a 2-node RAC system where, for some reason, the system
> more or less stopped working due to excessive "cursor: pin S wait on X"
> waits (89% of the DB time on one node, 36% on the other node).
>
> This seems to be the result of the application servers opening an
> excessively high number of connections to the RAC (the reason for that is
> yet unknown).
>
> We are not done investigating everything that happened on the application
> server yet, but it seems that all those sessions were running the
> same statements and for some reason they were all routed to the same RAC
> node.
>
> The top 5 wait events from the overloaded node show up like this:
>
> Event                    Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
> cursor: pin S wait on X  4,411      429,949  97472          88.68      Concurrency
> library cache lock       246        30,608   124424         6.31       Concurrency
> db file sequential read  1,237,878  8,275    7              1.71       User I/O
> DB CPU                              4,728                   0.98
> db file scattered read   1,014,046  2,738    3              0.56       User I/O
>
> (1 hour AWR report)
>
> In order to better understand the background on when a "cursor: pin S wait
> on X" is requested, I tried to create a test environment to reproduce this,
> but so far without luck. The "best" I can do when simulating the high
> concurrency (on a test environment) is a high number of "latch: shared
> pool" and "latch: row cache objects" waits.
>
> So my question is:
>
> Could somebody explain in more detail at which step(s) in query
> processing a "cursor: pin S wait on X" is requested?
> By understanding the sequence that leads to such a wait event, I hope to
> better understand what was going on on the Oracle side of the problem.
>
> Btw: is 0.6 logons per second considered a "logon storm"?
>
> Thanks in advance
>
> Kind regards
> Thomas
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
Received on Tue Apr 09 2013 - 12:32:33 CEST
