RE: Global Cache and Enqueue Services statistics

From: Hameed, Amir <Amir.Hameed_at_xerox.com>
Date: Tue, 31 Jul 2012 14:09:03 -0400
Message-ID: <304F58144267C5439E733532ABC9A3A1159953A0_at_USA0300MS02.na.xerox.net>



Thank you Tim and Gaja for your feedback. It is much appreciated. Here is some more information and the reasoning behind my mentioning of FREELISTS and FREELIST GROUPS. Based on the top waits in the AWR report, I queried DBA_HIST_ACTIVE_SESS_HISTORY and was able to identify the statements that were waiting on these events. A total of 12-14 statements account for around 90-95% of these waits. They are SELECT FOR UPDATE, UPDATE, DELETE and INSERT statements, including two INSERTs, one of which is parallelized with over twenty sessions inserting into the same table concurrently. That was the primary reason for looking into FREELISTS and FREELIST GROUPS for a few of the tables involved in the INSERT statements.

As I mentioned in my previous note, this system was upgraded from 11.0.3 to 11.5.x several years ago and is therefore not configured with ASSM. Additionally, it is my observation that prior to EBS 11i, Oracle did not have a standard for defining FREELIST GROUPS on the base Oracle tables, and most of the tables I have looked at have these values set to 1. Starting with 11i fresh installs (installers prior to 11.5.10), tables started to be created with FREELIST GROUPS of 4; at least that is what I have observed. With the 11.5.10.2 installers, Oracle uses ASSM.
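For reference, the drill-down was along these lines (a rough sketch only, not the exact query; the snapshot range and the event list are placeholders to be adjusted to the AWR window in question):

   -- Map the top cluster/locking waits back to SQL_IDs from the AWR-sampled ASH data
   select event, sql_id, count(*) as samples
   from   dba_hist_active_sess_history
   where  snap_id between :begin_snap and :end_snap
   and    event in ('gc buffer busy acquire',
                    'gc current block busy',
                    'enq: TX - row lock contention')
   group  by event, sql_id
   order  by samples desc;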

I have also noticed that the interconnect ping timings reported in AWR are a bit high. The following is taken from node #2, which is why that node's own ping latency is the lowest.

   Target  500B Ping  Avg Latency    Stddev  8K Ping  Avg Latency   Stddev
 Instance      Count     500B msg  500B msg    Count       8K msg   8K msg
---------  ---------  -----------  --------  -------  -----------  -------
        1      2,106         1.17      4.54    2,106         2.25     6.85
        2      2,106          .32       .08    2,106          .30      .08
        3      2,106          .47      1.17    2,106          .94     3.13


When I run the following commands from all of the RAC nodes to check latency on the interconnect interfaces, I consistently get sub-second response times. So I am not sure why the AWR figures are higher, given that they too are reported from Oracle's own ping processes:

ping -sv -i <SOURCE_IP> <DEST_IP> 8196 <PING_COUNT>
traceroute -F -s <SOURCE_IP> <DEST_IP> 8196
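As an additional sanity check (a sketch only; nothing below is specific to this system), the interface being pinged at the OS level can be compared against what each instance has actually registered for the interconnect:

   -- Show the interconnect interface/IP each instance is actually using
   select inst_id, name, ip_address, is_public, source
   from   gv$cluster_interconnects
   order  by inst_id;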

-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman
Sent: Tuesday, July 31, 2012 1:05 PM
To: ORACLE-L
Subject: Re: Global Cache and Enqueue Services statistics

Amir,
FREELIST GROUPS could not be a problem if the tablespaces involved are SEGMENT_SPACE_MANAGEMENT = AUTO; FREELISTS and FREELIST GROUPS pertain only to SEGMENT_SPACE_MANAGEMENT = MANUAL, which is no longer the default. Even if someone chose the non-default space-management format, contention on FREELISTS and FREELIST GROUPS only affects INSERT operations. The only event in the "/Top Five Timed Events/" report you provided which might result from INSERT activity is "/enq: TX - row lock contention/", and that most definitely has nothing to do with freelists and everything to do with row-locking on SELECT ... FOR UPDATE, UPDATE, DELETE, or MERGE.
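A quick way to confirm which of those cases applies is to check the dictionary views directly (a sketch only; the owner and table names below are placeholders):

   -- Which tablespaces use MANUAL (freelist-managed) vs. AUTO segment space management
   select tablespace_name, segment_space_management
   from   dba_tablespaces;

   -- FREELISTS / FREELIST GROUPS settings for the tables of interest
   select owner, table_name, tablespace_name, freelists, freelist_groups
   from   dba_tables
   where  owner = 'SOME_OWNER'
   and    table_name in ('SOME_TABLE_1', 'SOME_TABLE_2');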

Of course, such an event might be occurring as the 6th or 10th most prevalent wait (i.e. not in the top 5), but since the top 5 consume roughly 85% of DB Time during this period, it is unlikely to figure significantly anyway.

Enable SQL trace on the session, then aggregate and analyze those trace files with either TKPROF, TRCANLZR, or (better yet) the Hotsos/Method-R Profiler. Stop guessing, and work from facts. And don't be deterred by unpalatable truths.
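A minimal sketch of that approach, assuming 10g or later so that DBMS_MONITOR is available (the SID and serial# are placeholders):

   -- Enable extended SQL trace (with wait events) for one session, run the workload, then disable it
   exec dbms_monitor.session_trace_enable(session_id => :sid, serial_num => :serial_num, waits => TRUE, binds => FALSE);
   -- ... reproduce the workload in that session ...
   exec dbms_monitor.session_trace_disable(session_id => :sid, serial_num => :serial_num);
   -- Then, at the OS level:  tkprof <trace_file>.trc <report>.prf sort=exeela,fchela sys=no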

Hope this helps...

-Tim

On 7/31/2012 10:14 AM, Gaja Krishna Vaidyanatha wrote:
> Hi Amir,
> From the performance data you have shared, it is evident that RAC's
> consumption is approximately 30% of Elapsed Time for your workload. In
> the case of your application, that is the overhead that RAC poses in a
> 3-node cluster configuration. The row-level locking issue can (and
> will) exacerbate the elapsed times of everything that RAC has to do in
> the form of "inter-instance communication". I would definitely focus on
> determining why 14% of DB Time is spent on locking. The application is
> incurring "lock waits" that average 1.3 seconds per occurrence. In the
> realm of a normal transactional locking application, that is an
> eternity.
>
> And to Tim Gorman's point, the data basically shows you what life is
> going to be like in a multi-instance, shared-database architecture
> environment. This could be one of those cases where an application that
> performs reasonably well in a single-instance configuration will fall
> apart in a multi-instance configuration WITHOUT additional work to deal
> with the "intricacies of RAC". I would definitely give Tim's suggestion
> of setting CLUSTER_DATABASE from one of the instances and re-running
> your tests a try. Also, I am not convinced at this time that "Freelist
> Groups" set at 1 is the cause of your performance bane. I personally
> would rather focus on solving the locking issue first, re-run your
> tests, and then go down the path of Freelist Groups manipulation (if
> the performance data deems it to be necessary). It may be the case with
> yours that you will have to deal with "the perception of HA" in a
> completely different fashion. Do keep us posted.
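One way to chase that locking time down (a sketch only; the snapshot range is a placeholder) is to see who is doing the blocking and on which statements, again from the AWR-sampled ASH data:

   -- Who is blocking the sessions stuck on row lock contention, and on which SQL
   select blocking_session, blocking_session_serial#, sql_id, count(*) as samples
   from   dba_hist_active_sess_history
   where  snap_id between :begin_snap and :end_snap
   and    event = 'enq: TX - row lock contention'
   group  by blocking_session, blocking_session_serial#, sql_id
   order  by samples desc;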

> Cheers,
>
> Gaja
>
> Gaja Krishna Vaidyanatha,
> CEO & Founder, DBPerfMan LLC
> http://www.dbperfman.com
> http://www.dbcloudman.com
>
> Phone - +1-650-743-6060
> LinkedIn - http://www.linkedin.com/in/gajakrishnavaidyanatha
>
> Co-author: Oracle Insights: Tales of the Oak Table - http://www.apress.com/9781590593875
> Primary Author: Oracle Performance Tuning 101 - http://www.amzn.com/0072131454
> Enabling Cloud Deployment & Management for Oracle Databases
>
> ________________________________
> From: "Hameed, Amir" <Amir.Hameed_at_xerox.com>
> To: gajav_at_yahoo.com; oracle-l_at_freelists.org
> Sent: Monday, July 30, 2012 1:47 PM
> Subject: RE: Global Cache and Enqueue Services statistics
>
> Hi Gaja,
> Below are the top-5 wait events which are the same on all nodes:
>

> Top 5 Timed Foreground Events
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                                                            Avg
>                                                           wait   % DB
> Event                                 Waits     Time(s)   (ms)   time Wait Class
> ------------------------------ ------------ ----------- ------ ------ ----------
> db file sequential read          54,365,842      34,374      1   32.3 User I/O
> gc buffer busy acquire              461,651      18,904     41   17.8 Cluster
> enq: TX - row lock contention        11,506      15,269   1327   14.4 Application
> DB CPU                                           11,476          10.8
> gc current block busy               255,945      10,747     42   10.1 Cluster
>


--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jul 31 2012 - 13:09:03 CDT
