RE: Global Cache and Enqueue Services statistics
Date: Tue, 31 Jul 2012 14:09:03 -0400
Message-ID: <304F58144267C5439E733532ABC9A3A1159953A0_at_USA0300MS02.na.xerox.net>
Thank you Tim and Gaja for your feedback. It is much appreciated. Here is some more information and the reasoning behind my mentioning of FREELISTS and FREELIST GROUPS. Based on the top waits in the AWR report, I queried DBA_HIST_ACTIVE_SESS_HISTORY and was able to identify the statements that were waiting on these events. A total of 12-14 statements account for around 90-95% of these waits. These statements are SELECT FOR UPDATE, UPDATE, DELETE and INSERT. They include two INSERT statements, one of which is parallelized, with over twenty sessions inserting into the same table concurrently. This was the primary reason for looking into FREELISTS and FREELIST GROUPS for a few of the tables involved in the INSERT statements.

As I mentioned in my previous note, this system was upgraded from 11.0.3 to 11.5.x several years ago and is therefore not configured with ASSM. Additionally, it is my observation that prior to EBS 11i, Oracle did not have a standard for defining FREELIST GROUPS on the base Oracle tables, and most of the tables I have looked at have these values set to 1. Starting with 11i fresh installs (installers prior to 11.5.10), tables started to get created with FREELIST GROUPS of 4; at least this is what I have observed. With the 11.5.10.2 installers, Oracle uses ASSM.
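Something along the following lines is enough to tie those waits back to SQL statements (a sketch only; the event list and the snapshot-range bind variables are placeholders to be adjusted for the AWR window in question):

SELECT sql_id, event, COUNT(*) AS samples
  FROM dba_hist_active_sess_history
 WHERE snap_id BETWEEN :begin_snap AND :end_snap   -- placeholder snapshot range
   AND event IN ('gc buffer busy acquire',
                 'gc current block busy',
                 'enq: TX - row lock contention')
 GROUP BY sql_id, event
 ORDER BY samples DESC;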
I have also noticed that the ping timings reported in AWR are a bit high. The following was taken from node #2, which is why that instance's own ping timing is low.
Target    500B Ping Avg Latency      Stddev  8K Ping Avg Latency      Stddev
Instance      Count    500B msg    500B msg    Count      8K msg      8K msg
--------- --------- ----------- ----------- -------- ----------- -----------
        1     2,106        1.17        4.54    2,106        2.25        6.85
        2     2,106         .32         .08    2,106         .30         .08
        3     2,106         .47        1.17    2,106         .94        3.13
When I use the following commands from all of the RAC nodes to check latency on the interconnect interfaces, I consistently get sub-second response times. So, I am not sure why the ping timings reported in AWR are higher, when AWR also relies on Oracle's ping processes to report them:
ping -sv -i <SOURCE_IP> <DEST_IP> 8196 <PING_COUNT>
traceroute -F -s <SOURCE_IP> <DEST_IP> 8196
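It may also be worth cross-checking which interface each instance has actually registered for the interconnect, since the AWR ping figures are measured over whatever Oracle is using rather than whatever the OS commands happen to route over (a sketch; GV$CLUSTER_INTERCONNECTS is the standard fixed view for this):

SELECT inst_id, name, ip_address, is_public, source
  FROM gv$cluster_interconnects
 ORDER BY inst_id;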
-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman
Sent: Tuesday, July 31, 2012 1:05 PM
To: ORACLE-L
Subject: Re: Global Cache and Enqueue Services statistics
Amir,
FREELIST GROUPS cannot be a problem if the tablespaces involved are
SEGMENT_SPACE_MANAGEMENT = AUTO; FREELISTS and FREELIST GROUPS pertain
only to SEGMENT_SPACE_MANAGEMENT = MANUAL, which is no longer the default.
Even if someone chose the non-default space-management format,
contention on FREELISTS and FREELIST GROUPS affects only INSERT
operations, and the only event in the "Top 5 Timed Events" report
you provided which might result from INSERT activity is "enq: TX -
row lock contention", and that most definitely has nothing to do with
freelists and everything to do with row locking on SELECT ... FOR
UPDATE, UPDATE, DELETE, or MERGE.
Of course, such an event might be occurring as the 6th or 10th most prevalent wait (i.e. not in the top 5), but since the top 5 consume roughly 85% of DB Time during this period, it is unlikely to figure
significantly anyway.
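Incidentally, a quick way to confirm which space-management regime actually applies is to look at the tablespaces' mode alongside the segments' freelist settings; a sketch along these lines (the table names are placeholders):

SELECT t.owner, t.table_name, t.tablespace_name,
       t.freelists, t.freelist_groups,
       ts.segment_space_management
  FROM dba_tables t
  JOIN dba_tablespaces ts
    ON ts.tablespace_name = t.tablespace_name
 WHERE t.table_name IN ('&TABLE1', '&TABLE2');   -- placeholder table names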
Enable SQL trace on the session, then aggregate and analyze those trace files
with TKPROF, TRCANLZR, or (better yet) the Hotsos/Method-R Profiler.
Stop guessing, and work from facts. And don't be deterred by unpalatable
truths.
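For example, tracing one of the offending sessions and summarizing the result might look like this (a sketch; the SID/serial# values and trace-file name are placeholders to be taken from V$SESSION and the trace destination):

-- enable extended SQL trace (with waits) for one session
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 1234, serial_num => 56789, waits => TRUE, binds => FALSE);
-- ... let the workload run, then ...
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 1234, serial_num => 56789);

-- then at the OS level, e.g.:
--   tkprof <instance>_ora_<spid>.trc trace_summary.prf sys=no sort=exeela,fchela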
Hope this helps...
-Tim
On 7/31/2012 10:14 AM, Gaja Krishna Vaidyanatha wrote:
> Hi Amir,
> From the performance data you have shared, it is evident that RAC's
> consumption is approximately 30% of Elapsed Time for your workload. In
> the case of your application, that is the overhead that RAC poses in a
> 3-node cluster configuration. The row-level locking issue can (and will)
> exacerbate the elapsed times of everything that RAC has to do in the
> form of "inter-instance communication". I would definitely focus on
> determining why 14% of DB Time is spent on locking. The application is
> incurring "lock waits" that average 1.3 seconds per occurrence. In
> the realm of a normal transactional locking application, that is an
> eternity.
>
> And to Tim Gorman's point, the data basically shows you what life is
> going to be like in a multi-instance, shared-database architecture.
> This could be one of those cases where an application that performs
> reasonably well in a single-instance configuration will fall apart in
> a multi-instance configuration WITHOUT additional work to deal with
> the "intricacies of RAC". I would definitely give Tim's suggestion of
> setting CLUSTER_DATABASE from one of the instances and re-running your
> tests a try. Also, I am not convinced at this time that "Freelist
> Groups" set at 1 is the cause of your performance bane. I personally
> would rather focus on solving the locking issue first, re-run your
> tests, and then go down the path of Freelist Groups manipulation (if
> the performance data deems it necessary). It may be the case with
> yours that you will have to deal with "the perception of HA" in a
> completely different fashion. Do keep us posted.
>
> Cheers,
>
> Gaja
>
> Gaja Krishna Vaidyanatha,
> CEO & Founder, DBPerfMan LLC
> http://www.dbperfman.com
> http://www.dbcloudman.com
>
> Phone - +1-650-743-6060
> LinkedIn - http://www.linkedin.com/in/gajakrishnavaidyanatha
>
> Co-author: Oracle Insights: Tales of the Oak Table -
> http://www.apress.com/9781590593875
> Primary Author: Oracle Performance Tuning 101 -
> http://www.amzn.com/0072131454
> Enabling Cloud Deployment & Management for Oracle Databases
> ________________________________
> From: "Hameed, Amir" <Amir.Hameed_at_xerox.com>
> To: gajav_at_yahoo.com; oracle-l_at_freelists.org
> Sent: Monday, July 30, 2012 1:47 PM
> Subject: RE: Global Cache and Enqueue Services statistics
>
> Hi Gaja,
> Below are the top-5 wait events which are the same on all nodes:
>
> Top 5 Timed Foreground Events
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                                                            Avg
>                                                           wait   % DB
> Event                                 Waits     Time(s)   (ms)   time Wait Class
> ------------------------------ ------------ ----------- ------ ------ ----------
> db file sequential read          54,365,842      34,374      1   32.3 User I/O
> gc buffer busy acquire              461,651      18,904     41   17.8 Cluster
> enq: TX - row lock contention        11,506      15,269   1327   14.4 Application
> DB CPU                                           11,476          10.8
> gc current block busy               255,945      10,747     42   10.1 Cluster
>
--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jul 31 2012 - 13:09:03 CDT