RE: Question on concurrency waits
Date: Tue, 26 Oct 2021 12:01:07 -0400
Message-ID: <417701d7ca82$b029af00$107d0d00$_at_rsiz.com>
A guess that satisfies the symptoms is you hit a calendar expired password cycle and issue a change your password instruction interactively to each of many users who have start of workshift collisions, so they are having concurrency issues with each other.
This is similar to an hilarious problem with some badly written timecard punch the clock applications.
A common solution is to divide such monthly calendar collisions by approximately 30 by using the person’s day of the month as the monthly cycle, but you may need to spread out the days that fall on weekends lest Mondays become a bump.
Or it could be something completely different. Answers that merely fit the symptoms are worth checking if the cost is low to check, but they are not always correct. That is the difference between “consistent with” and “cause of.”
Good luck. And upgrade, as per Andy. You might still have a poor underlying situation, but concurrency issues CAN be ameliorated by having each of the colliders take a smaller time slice so there is an improved chance they take place when another is not in flight.
From: Pap [mailto:oracle.developer35_at_gmail.com]
Sent: Tuesday, October 26, 2021 4:03 AM
To: Andy Sayer
Cc: Mark W. Farnham; Oracle L
Subject: Re: Question on concurrency waits
Thank You Andy.
In ASH the column sql_opname is showing 'alter user' and the module and program is passchng.exe for user_id as the same user. The sql_id is not captured either in gv$sql or dba_hist_sqltext so dont know the exact DDL text. And the irony is even so many failed login attempts were happening and the concurrency issue was happening , but it was self correcting itself after sometime, not sure how. Or is it that only a handful of sessions were logged in with the wrong pwd and thus failed. But this Alter user has been logged in ASH during the start of each issue period which happened 2-3 times. And we see the password last change date(PTIME in User$) for that same user is showing as a month past date, so not sure if it means the password has not successfully altered for that user even attempted.
On Tue, Oct 26, 2021 at 1:13 PM Andy Sayer <andysayer_at_gmail.com> wrote:
To clarify, where exactly are you seeing it is an alter user statement being executed? Do you have a sql_id or is this just using the sql command type column in ASH?
Thanks,
Andy
On Tue, 26 Oct 2021 at 04:49, Pap <oracle.developer35_at_gmail.com> wrote:
Thank You Mark and Andy. Yes it's mostly the invalid login attempts that have exposed the concurrency bug and have caused the impact. But yes the cause must be the place from where exactly those 'alter user..' (which is mostly password change request only) is coming up. We are yet to get it from the application side.
On Tue, Oct 26, 2021 at 3:22 AM Mark W. Farnham <mwf_at_rsiz.com> wrote:
Yes to both of Andy’s remarks, but you still also have the issue with “program 'passchng.exe'” and what is causing someone to issue Alter User commands (which from the program name the suggestion is that the password is being changed).
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Andy Sayer
Sent: Monday, October 25, 2021 4:46 PM
To: Pap
Cc: Mark W. Farnham; Oracle L
Subject: Re: Question on concurrency waits
This sounds like the 11.2 issue with bad concurrency when incorrect password attempts are made, this was fixed in 12.1 (both better concurrency and instrumented as a more obvious wait).. I suggest you check for ORA-1017 in your application and listener logs.
Also, it's about time you upgrade ;)
Thanks,
Andy
On Mon, 25 Oct 2021 at 21:29, Pap <oracle.developer35_at_gmail.com> wrote:
I tried the Dash Top script and below result from that clearly showing the top consumer as 'oauth' session. I am not able to see any rows from v$db_object_cache when passing the P1 as filter to hash_value column, maybe they must have been flushed away. Can this be some kind of buggy behavior?
TotalSeconds
AAS %This
EVENT TOP_LEVEL_CALL_NAME SQL_OPNAME P1TEXT P1
FIRST_SEEN LAST_SEEN 92480
77.1
72%
library cache lock
OAUTH handle address
1.28E+11
24-10-2021 22:25
24-10-2021 22:39
14560
12.1
11%
SQL*Net message from dblink
V8 Bundled Exec
SELECT driver id
6.76E+08
24-10-2021 22:20
24-10-2021 22:39
4390
3.7
3%
V8 Bundled Exec
SELECT driver id
6.76E+08
24-10-2021 22:20
24-10-2021 22:39
820
0.7
1%
OAUTH handle address
1.28E+11
24-10-2021 22:25
24-10-2021 22:39
780
0.6
1%
VERSION2 SELECT cellhash#
1.61E+09
24-10-2021 22:20
24-10-2021 22:26
On Tue, Oct 26, 2021 at 1:18 AM Pap <oracle.developer35_at_gmail.com> wrote:
Thank You Mark. got it now.
Now as we are still trying to find what exact changes were made by the dev team. I tried capturing the Dash Wait chain using tanels scripts. And attached are the results. The top sessions seem to be from 'jdbc thin client' but from SYS , so I'm wondering if these are just from the OEM tool. And this might be the victim but not the culprit. And the application queries which were blocked with event = 'library cache lock' were showing in_parse and in_hardparse column as 'N' , so means they were not getting parsed while experiencing 'library cache lock'.
However, while I was manually trying to travel through the wait chain, I saw its ending at some session with top_level_call_name as 'OAUTH'. and it was executing the below query and then might be some 'alter user' from that session. And because of sampling , it has not captured all statements though. But we see the number of sessions with top_level_call_name as 'OAUTH' has been increased from <10 per hour to ~40K+ in an hour during the issue period. So can it be the cause or is it the victim of some other ? How should we proceed to get to the cause? And if it's a good idea to bump the shard_pool size here?
select exptime, ltime, astatus, lcount from user$ where user#=:1
On Tue, Oct 26, 2021 at 12:19 AM Mark W. Farnham <mwf_at_rsiz.com> wrote:
dc_users is being modified by alter user. any parse needs to check dc_users for what that user is allowed and where to put things like temp. consistency is protected by a latch.
Is that the answer to your question about how it is logically related?
mwf
From: Pap [mailto:oracle.developer35_at_gmail.com]
Sent: Monday, October 25, 2021 12:22 PM
To: Mark W. Farnham
Cc: Oracle L
Subject: Re: Question on concurrency waits
Thank You Mark. So we are trying to reach out to the Dev team because it seems to be fired from the application user as we can see from the user_id column in v$active_session_history. But unfortunately the exact sql_text we are not able to see from the database side.
And regarding the relation of library cache lock+cache cache lock with the alter user, Would you explain a bit , how logically this is related. Is this because each application query must be getting hard parsed , if the underlying user from which it's fired is getting altered or anything else?
On Mon, Oct 25, 2021 at 9:30 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
yes, and the immediate question is to find out why alter user is being issued.
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Pap
Sent: Monday, October 25, 2021 6:31 AM
To: Oracle L
Subject: Re: Question on concurrency waits
My mistake, one correction:- the cache_id for all those 'row cache lock' is pointing to dc_users.
On Mon, Oct 25, 2021 at 3:12 PM Pap <oracle.developer35_at_gmail.com> wrote:
Hi, It's 11.2.0.4 Oracle database. We are suddenly seeing many application sql queries running slow and are showing 'library cache lock'. And checking the ASH for the exact time period when the issue started and the wait event appeared, we found few SYS sessions were doing 'ALTER USER' from program 'passchng.exe' and we are not able to see exact statement from sql_text for this (which may be because of its DDL and for DDL the AWR doesn't capture the text as of 11.2.0.4). So I wanted to understand if the ALTER USER command can cause such locking issues?
And also I see in the initial few minutes this session(Alter user session) was showing 'library cache lock' and I don't know how to get more information from the value of "handle address" but after some time, that session was showing the 'row cache lock' with cache_id pointing to the below cache objects. Are these can cause concurrency / "library cache lock" for other application queries?
kqlsubheap_object
extensible security user and rol
extensible security principal pa
extensible security UID to princ
extensible security principal na
extensible security principal ne
XS security class privilege
qmtmrctp_cache_entries
qmtmrciq_cache_entries
-- http://www.freelists.org/webpage/oracle-lReceived on Tue Oct 26 2021 - 18:01:07 CEST