Skip navigation.

Bobby Durrett's DBA Blog

Syndicate content
Oracle database performance
Updated: 4 hours 18 min ago

AZORA – Arizona Oracle User Group meeting January 20th

Fri, 2014-12-12 16:25

AZORA is planning a meeting January 20th.  Here is the link to RSVP: url

Hope to see you there. :)

– Bobby

Categories: DBA Blogs

10128 trace to see partition pruning

Tue, 2014-12-09 10:57

I am working on an SR with Oracle support and they asked me to do a 10128 trace to look at how the optimizer is doing partition pruning.  I did some quick research on this trace and wanted to pass it along.

Here are the names of the two Oracle support documents that I found most helpful:

How to see Partition Pruning Occurred? (Doc ID 166118.1)

Partition Pruning Min/Max Optimization Fails when Parallel Query Run in Serial (Doc ID 1941770.1)

The first was the one Oracle support recommended.  But, the SR said to run both a level 2 and a level 7 trace and the first document did not mention level 7.  But, the second document has an example of a level 7 trace and more details on how to set it up.

I also found these two non-Oracle sites or blog posts:

http://cbohl.blogspot.com/2006/10/verify-that-partition-pruning-works.html

http://www.juliandyke.com/Diagnostics/Events/EventReference.html#10128

I do not have time to delve into this further now but if you are trying to understand partition pruning then the 10128 trace may help you understand how it works.

– Bobby


Categories: DBA Blogs

Parsing blocks stats blocks parsing

Mon, 2014-11-24 13:33

I had a five-minute conversation with Oracle development Friday that rocked my world.  I found out that parsing blocks stats which blocks parsing.

We have a system with queries that are taking minutes to parse.  These queries include the main tables on our database, one of which is interval partitioned and has 20,000 sub-partitions.  We have seen a situation where Oracle’s delivered statistics gathering job hangs on a library cache lock waiting for a query to finish parsing.  But, much worse than that, we find most queries on our system hanging on library cache lock waits blocked by the statistics job.

We have an SR open on this situation because it seems to be a bug, but Friday someone on the phone from Oracle development explained that this parse blocks stats blocks parse situation is normal.  Later after I got off the phone I built a simple test case proving that what he said was true.  I took a query that took a long time to parse in production and ran it on our development database and it took 16 seconds to parse there.  I choose the smallest table that the query included and gathered stats on it.  The stats ran in a fraction of a second when run by itself, but if I started the long parsing query in one window and ran the stats in another window the stats hung on a library cache lock wait for 15 seconds.  Then I created a trivial query against the same small table I had gathered stats on.  The query ran instantly by itself.  But, if I ran the long parsing query first, kicked off the stats which hung on the lock, and then kicked off the short query against the table I was gathering stats on the short query hung on a library cache lock also.  This example convinced me that the parse blocks stats blocks parse chain was real.

This morning I built a standalone test case that others can run to prove this out on their databases: zip of testcase.  To run the testcase you need three windows where you can run three sqlplus scripts in rapid succession.  In one window first just run tables.sql to create the test tables.  Then run these three scripts one after the other in each window to create the three link chain: chain1.sql, chain2.sql, chain3.sql.  Chain1.sql has the explain plan of a query that takes a long time to parse.  Chain2.sql gathers stats on one table.  Chain3.sql runs a simple query against the table whose stats are being gathered.  Chain1 spends all of its time on the CPU doing the parse.  Chain2 and 3 spends all of their time on library cache lock waits.

First I created two tables:

create table t1 as select * from dba_tables;
create table t2 as select * from dba_tables;

Next I kicked off the explain plan that takes a long time to run.  It joined 100 tables together:

explain plan into plan_table for 
select 
count(*)
from
     t1,
     t2,
     t2 t3,
...
     t2 t100
where
  t1.owner=t2.owner and
...
  t1.owner=t100.owner
/

This explain plan ran for 26 seconds, almost all of which was CPU:

Elapsed: 00:00:26.90

...

CPU in seconds
--------------
         26.81

Right after I kicked off the explain plan I kicked off this statement which gathered stats on the first table in the from clause:

execute dbms_stats.gather_table_stats(NULL,'T1');

This ran for 25 seconds and almost all of the time was spent on a library cache lock wait:

Elapsed: 00:00:25.77

...

Library cache lock in seconds
-----------------------------
                        25.55

Right after I kicked off the gather table stats command I ran this simple query making sure that it was unique and required a hard parse:

select /* comment to force hard parse */ count(*) from T1;

This ran for 24 seconds and almost all of the time was spent on a library cache lock wait:

Elapsed: 00:00:24.48

...

Library cache lock in seconds
-----------------------------
                        24.48

Evidently when a session parses a query it needs to obtain a shared lock on every table that the query includes.  When you gather statistics on a table you need to obtain an exclusive lock on the table, even if you are gathering statistics on one partition or sub-partition of the table.  While the statistics gathering session waits to acquire an exclusive lock any new parses that include the same table will hang.

Prior to Friday I did not think that there was any non-bug situation where gathering optimizer statistics would lock up sessions.  I thought that the only negative to gathering statistics at the same time as other application processing was that statistics gathering would compete for system resources such as CPU and I/O and possibly slow down application code.  But, now I know that gathering statistics can hang all queries that use the given table if stats gathering gets hung up waiting for a query that takes a long time to parse.

– Bobby

P.S. After reviewing the SR I wanted to understand what this parse blocks stats blocks parse looked like in a state dump.  The Oracle support analyst explained how the locks looked in a state dump that we uploaded but I didn’t get a chance to look at it closely until today.  I found the most important information in lines with the string “LibraryObjectLock” at the front of the line after some spaces or tabs.  There were three types of lines – the holding share lock, the waiting exclusive lock, and the many waiting share locks:

LibraryObjectLock:  Address=... Handle=0x5196c8908 Mode=S ...
LibraryObjectLock:  Address=... Handle=0x5196c8908 RequestMode=X ...
LibraryObjectLock:  Address=... Handle=0x5196c8908 RequestMode=S ...

The “…” indicates places I edited out other details.  The handle 0x5196c8908 identifies the table being locked.  The “Mode=S” string indicates a successful share lock of that table by the session with the long parse time.  The “RequestMode=X” was from the stats job trying to get exclusive access to the table.  The “RequestMode=S” was all the other sessions trying to get shared access to the table waiting for stats to first get exclusive access.  Anyway, I just wanted to translate what Oracle support told me into something that might be useful to others.  Plus I want to remember it myself!




Categories: DBA Blogs

Alerting on plans that change for the worse

Fri, 2014-11-14 16:45

I’ve uploaded a monitoring script that I have worked on: zip.

The script alerts you when the optimizer runs a SQL statement with a potentially new and inefficient plan so you can intervene.  This script improves upon my earlier script which only alerts you to SQL statements running with new plans.  The new script compares the average elapsed time of the current plan with the average of the most often executed plan.  If the new plan averages more than ten times the most often executed plan then the script alerts you to a possible new slow plan.  The elapsed time for the current plan comes from V$SQL and the time for the most often executed plan comes from DBA_HIST_SQLSTAT.

Here is an example output from the first test case:

SQL_ID        PLAN_HASH_VALUE
------------- ---------------
a1fw5xjcwkjqx      1357081020

There are two test cases included in the script.  The first test case has a query which uses an index and the plan changes when I drop the index.

If you look at the output for the first test case you see the good plan:

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 268773832

---------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |    58 (100)|
|*  1 |  INDEX RANGE SCAN| TESTI | 16222 |   221K|    58   (0)|
---------------------------------------------------------------

The good plan uses the index.  Here is the bad plan.  Note how the sql_id and plan_hash_value correspond to the output of the monitor script.

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 1357081020

---------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       | 15708 (100)|
|*  1 |  TABLE ACCESS FULL| TEST |  8111 |   110K| 15708   (1)|
---------------------------------------------------------------

I have this running in a cron job on a development server and I plan to put the script into our production server next week and set it up to alert me with an email when the optimizer runs SQL statements with potentially  new and inefficient plans.

– Bobby

 


Categories: DBA Blogs

Check out the new Oracle help web site

Thu, 2014-11-13 17:52

I found a broken link to an Oracle help document in one of my posts and when I went to the Oracle 12c database documentation to find the new URL to put in my post I found that Oracle had totally revamped their online manuals.

Here is a link to the new high level Oracle help site: url

I’ve only looked at it for a few minutes and can’t say whether I like it or not.  You can still download the manuals in PDF format so that remains familiar.  It looks like the new site integrates documentation across most or all of Oracle’s products in a similar format and that’s pretty cool.

Anyway, I just saw this and thought I would pass it along.

– Bobby


Categories: DBA Blogs

Oracle DBA job in Tempe, Arizona

Tue, 2014-11-04 10:16

We still have a position open on our Oracle database team here in Tempe, Arizona.  Here is the link with an updated job description: url

We have a great team and would love to have a new member to join us.

-Bobby

Categories: DBA Blogs

Useful Carlos Sierra post about queries changing performance

Mon, 2014-11-03 09:26

Saw an interesting post related to work I am doing locking in plans and finding plans that have changed: Carlos Sierra post on queries with changing performance.

Our DBA team uses versions of my sqlstat.sql script to find queries whose elapsed time change and we use a script from Carlos Sierra’s SQLT collection to force the execution plan back to an efficient plan that the optimizer choose in the past.  Carlos Sierra’s new post includes scripts that resemble my sqlstat script but with more functionality including an interesting way of helping people figure out which queries are now running slower.  At this point we just look at the top queries that are running and run sqlstat.sql to see if they have run with less elapsed time in the past.  We eyeball it and usually it is obvious because the bad plan takes 100 times longer than the good one.  But the newly posted Carlos Sierra script seems to use some math to help figure out whether the elapsed time is really bad enough for us to consider it a problem.

I want to set up a monitoring script that pages us when a plan changes for the worse so we can decide whether to intervene and force on older plan to improve performance.  I have a simple script running on one of our databases but, as I expected, it is returning a lot of false positives.  So, I’m hoping to have some time this week to improve that script so that we get fewer false alarms and I’m going to take a close look at Carlos Sierra’s newly posted script for ideas.

– Bobby

 




Categories: DBA Blogs

Script to count and recompile invalid objects

Wed, 2014-10-29 16:48

This is pretty simple, but I thought I would share it since it is helpful to me.  I have been preparing for a large migration which involves table, index, type, function, package, and procedure changes.  When I run a big migration like this I check for invalid objects before and after the migration and attempt to recompile any that are invalid.  By checking before and after the migration I know which objects the migration invalidated.

Here’s the script:

select status,count(*)
from dba_objects
where owner='YOURSCHEMA'
group by status
order by status;

select 'alter '||object_type||' '||owner||'.'||object_name||
       ' compile;'
from dba_objects
where owner='YOURSCHEMA' and
status='INVALID' and
object_type <> 'PACKAGE BODY'
union
select 'alter package '||owner||'.'||object_name||' compile body;'
from dba_objects
where owner='YOURSCHEMA' and
status='INVALID' and
object_type = 'PACKAGE BODY';

Replace “YOURSCHEMA” with the schema that your objects are in.

Output is something like this:

STATUS    COUNT(*)
------- ----------
INVALID          7
VALID        53581

'ALTER'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||'COMPILE;'
--------------------------------------------------------------
alter FUNCTION YOURSCHEMA.YOURFUNCTION compile;
alter package YOURSCHEMA.YOURPACKAGE compile body;

The counts give me a general idea of how many objects are invalid and the alters gives me sql that I can paste into a script and run to attempt to compile the objects and make them valid.

Hope this is helpful to someone else.  It’s helpful to me.

– Bobby



Categories: DBA Blogs

Recap of yesterday’s Arizona Oracle User Group (AZORA) meeting

Fri, 2014-10-24 11:15

Yesterday was the first meeting of the newly restarted Arizona Oracle User Group, AZORA.  It was a good first meeting to kick off what I hope will turn into an active club.

We met in the Tempe Public Library in a large downstairs room with a projector screen and plenty of seating with tables so it was easy to take notes.  We also had snacks.  I got a Paradise Bakery chocolate chip cookie and bottled water so that put me in a good mood for the afternoon meeting. They also had some giveaways and so I picked up an Oracle pen and notebook to use to take notes during the talk.  They also gave away three or four Rich Niemiec books and some gift cards as prizes, but, as usual, I didn’t win anything.

The first talk was about “Big Data”.  I’m skeptical about any buzzword, including big data, but I enjoyed hearing from a technical person who was actually doing this type of work.  I would rather hear an honest perspective from someone in the midst of the battle than get a rosy picture from someone who is trying to sell me something.  Interestingly, the talk was about open source big data software and not about any Oracle product, so it gave me as an Oracle database specialist some insight into something completely outside my experience.

The second talk was about another buzzword – “The Cloud”.  The second talk was as helpful as the first because the speaker had exposure to people from a variety of companies that were actually doing cloud work.  This talk was more directly related to my Oracle database work because you can have Oracle databases and applications based on Oracle databases deployed in the cloud.

It was a good first meeting and I hope to attend and help support future meetings.  Hopefully we can spread the news about the club and get even more people involved and attending so it will be even more useful to all of us.  I appreciate those who put in the effort to kick off this first meeting.

– Bobby





Categories: DBA Blogs

Reminder: first Arizona Oracle User Group meeting tomorrow

Wed, 2014-10-22 14:02

Fellow Phoenicians (citizens of the Phoenix, Arizona area):

This is a reminder that tomorrow is the first meeting of the newly reborn (risen from the ashes) Arizona Oracle User Group.  Here are the meeting details: url

I hope to meet some of my fellow Phoenix area DBAs tomorrow afternoon.

– Bobby

 


Categories: DBA Blogs

Script to compare plan performance

Tue, 2014-10-21 17:55

Here’s a zip of a script I modified today: zip

Here’s an example output:

 QUERY_NUM SQL_ID        PLAN_HASH_VALUE EXECUTIONS AVG_ELAPSED OPTIMIZER_COST AVG_FETCHES  AVG_SORTS AVG_DISK_READS AVG_BUFFER_GETS   AVG_ROWS    AVG_CPU AVG_IOWAIT AVG_DIRECT_WRITES AVG_PHYS_READS AVG_PHYS_WRITES
---------- ------------- --------------- ---------- ----------- -------------- ----------- ---------- -------------- --------------- ---------- ---------- ---------- ----------------- -------------- ---------------
         1 gxk0cj3qxug85      2051250004   39504258  31630.2852             15  4.71993394 .444248288     4.07294381      440.124393 41.3960784 3447.83137 28056.5262                 0     .006406626               0
         1 gxk0cj3qxug85       548353012   24360619  31854.5456             15  4.73696596 .574941507     4.16225047      443.290799 41.5668639 3501.62695 28205.9349                 0     .009019804               0
         2 53b6j9nvd2vwa       376144290    1069593  438746.758     19.6425025  33.7741683          0     58.9193684       3864.5447  332.18592 32952.0388 406548.271                 0     19.2981312               0
         2 53b6j9nvd2vwa       655563694    1008553  414586.506     15.0111675  33.7122908          0     58.6486828      3851.28671 331.575216 32283.0233 382834.453                 0     12.4507269               0
         2 53b6j9nvd2vwa      2504177057     274652  418157.478     19.5541562  32.8918486          0     61.1868838      3726.22908 323.358235 31005.0901 388545.269                 0     23.2050923               0
         3 4usguw7d6kxy4      2543632952    1070303   151648.49     12911.7175  2.30832577 .999997197     5.49557555      6221.80141 16.7674061 130072.596 10153.4778                 0     .388805787               0
         3 4usguw7d6kxy4      3221641997     996033  151860.479      11987.696  2.22684489 .999998996     7.10842914      6073.16306  15.902446  127194.45 13655.9405                 0     .316254582               0
         3 4usguw7d6kxy4      1764817920          2    277287.5          12860           1          1              1            6956          5     260000    10575.5                 0              1               0
         4 buhvbrfw1uyrd      2225737849    2871021  37985.4363     7.37064414  32.5216263          0      5.0695129      439.697767 319.641241 3463.90361 34645.8396                 0     2.01971389               0
         5 bvttgft3kj2hg      3569562598     293543  252814.965     20.6213018  95.7064212 .999996593     12.3908524      11035.1541 951.571562 137023.945 95634.8559                 0     1.94147365               0
         6 084v1t4whvzwd       883418687      70258  940211.781     4875.03099  1.00001423          0     30.6947394      22558.3683 .954980216 880713.798 16997.2457                 0     2.93185118               0
         7 972sx2uhfswn5       632842214  229406586  279.732773             14  1.00002261 .092121309     .003579313      25.6327361 .941600155 229.417607  23.334532                 0     .000786085               0

I’m using this script to help figure out which plans to lock in with SQL profiles.  For example, sql_id gxk0cj3qxug85 executes plan 2051250004 more often than plan 548353012 and they seem similar in performance so maybe it would be safe to just lock in the first plan.

I’ve also been reviewing the plans manually to see if there were major differences or if the plans were essentially the same.

I had a few minutes so I thought I would pass this along.

– Bobby

Categories: DBA Blogs

Tweaked bind variable script

Wed, 2014-10-15 15:17

I modified the bind variable extraction script that I normally use to make it more helpful to me.

Here was my earlier post with the old script: blog post

Here is my updated script:

set termout on 
set echo on
set linesize 32000
set pagesize 1000
set trimspool on

column NAME format a3
column VALUE_STRING format a17

spool bind2.log

select * from 
(select distinct
to_char(sb.LAST_CAPTURED,'YYYY-MM-DD HH24:MI:SS') 
  DATE_TIME,
sb.NAME,
sb.VALUE_STRING 
from 
DBA_HIST_SQLBIND sb
where 
sb.sql_id='gxk0cj3qxug85' and
sb.WAS_CAPTURED='YES')
order by 
DATE_TIME,
NAME;

spool off

Replace gxk0cj3qxug85 with the sql_id of your own query.

The output looks like this (I’ve scrambled the values to obscure production data):

DATE_TIME           NAM VALUE_STRING
------------------- --- -----------------
2014-08-29 11:22:13 :B1 ABC
2014-08-29 11:22:13 :B2 DDDDDD
2014-08-29 11:22:13 :B3 2323
2014-08-29 11:22:13 :B4 555
2014-08-29 11:22:13 :B5 13412341
2014-08-29 11:22:13 :B6 44444
2014-08-29 11:26:47 :B1 gtgadsf
2014-08-29 11:26:47 :B2 adfaf
2014-08-29 11:26:47 :B3 4444
2014-08-29 11:26:47 :B4 5435665
2014-08-29 11:26:47 :B5 4444
2014-08-29 11:26:47 :B6 787

This is better than the original script because it keeps related bind variable values together.

– Bobby


Categories: DBA Blogs

Patch 19183482 resolves ORA-01403 getting plan with baseline

Fri, 2014-09-26 14:51

I was testing SQL plan baselines on a base 11.2.0.3 release of Oracle on a 64 bit Linux virtual machine.  I ran DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE to create a SQL plan baseline for a test query after running that query once to get its plan in the cursor cache.  When I ran the test query after creating the SQL plan baseline and called dbms_xplan.display_cursor to see its new plan I got an ORA-01403 error:

ORCL:SYSTEM>select * from table(dbms_xplan.display_cursor(null,null,'ALL'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  4mu5a860ardpz, child number 1

An uncaught error happened in prepare_sql_statement : ORA-01403: no data found

NOTE: cannot fetch plan for SQL_ID: 4mu5a860ardpz, CHILD_NUMBER: 1
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)

I applied patch 19183482 to my test system and the ORA-01403 error went away:

ORCL:SYSTEM>select * from table(dbms_xplan.display_cursor(null,null,'ALL'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  4mu5a860ardpz, child number 1
-------------------------------------
select sum(blocks) from test

Plan hash value: 1950795681

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    29 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|   2 |   TABLE ACCESS FULL| TEST |  2844 |  8532 |    29   (0)| 00:00:01 |
---------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$1 / TEST@SEL$1

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) SUM("BLOCKS")[22]
   2 - "BLOCKS"[NUMBER,22]

Note
-----
   - SQL plan baseline SQL_PLAN_dscatqjvyk4qy6b581ab9 used for this statement

Here is a zip of the script that I ran to create the ORA-01403 error and the output that it generated with and without the patch applied on my test database: zip

Here is a list of the bugs that patch 19183482’s readme says it will fix:

14512308: SPM BASELINE PLAN CANNOT BE REPRODUCED
15858022: ‘LIBRARY CACHE: MUTEX X’ AND LIBRARY CACHE LOCKS  PURGED_CURSOR
16400122: SPIKES IN LIBRARY CACHE
16625010: SPM BASELINE NOT WORKING FOR SQL CALLED FROM PL/SQL

I haven’t gotten very far into my investigation of SQL plan baselines but it looks like it would be a good idea to apply 19183482 before using SQL plan baselines on 11.2.0.3.  I barely got started using SQL plan baselines and I immediately hit this bug.

– Bobby









Categories: DBA Blogs

Query to show sql_ids related to SQL Profiles

Thu, 2014-09-25 17:58

I have a number 0f SQL Profiles on a database I’m working on and I wanted to know the sql_id of the SQL statements that they relate to.  Here is what I came up with:

select distinct 
p.name sql_profile_name,
s.sql_id
from 
dba_sql_profiles p,
DBA_HIST_SQLSTAT s
where
p.name=s.sql_profile;

Here is the output on my system:

SQL_PROFILE_NAME               SQL_ID
------------------------------ -------------
coe_5up1944u1zb7r_1979920644   5up1944u1zb7r
coe_b9ad7h2rvtxwf_3569562598   b9ad7h2rvtxwf
coe_9f21udzrfcygh_2815756405   9f21udzrfcygh
coe_366brswtxxuct_10194574     366brswtxxuct
coe_2261y37rp45py_2815756405   2261y37rp45py

These are all profiles I created manually with the SQLT scripts so the profile name includes the sql_id but not all sql profiles are like this.  I have more entries in dba_sql_profiles than these five but these are the only rows that matched a row in dba_hist_sqlstat so I guess this won’t work for queries that are not used very often or are so fast that they don’t get pulled into the tables that are behind dba_hist_sqlstat.

– Bobby



Categories: DBA Blogs

Startup upgrade suppresses ORA-00955 on create table WRH$_SQL_PLAN

Tue, 2014-09-23 15:13

Today I was trying to see if upgrading from 11.2.0.2 to 11.2.0.4 would change the SYS.WRH$_SQL_PLAN table.  This table is large on our production system so I wanted to find out if some time-consuming update to this table would occur that would slow down our production upgrade but not be detected on our test systems.  We recently performed this upgrade on our development database and I was looking at the logs to see whether SYS.WRH$_SQL_PLAN was modified.  I found this curious entry (edited for brevity):

create table WRH$_SQL_PLAN
2  (snap_id           number        /* last snap id, used for purging */
3  ,dbid           number       not null
4  ,sql_id           varchar2(13)    not null
...
42   using index tablespace SYSAUX
43  ) tablespace SYSAUX
44  /

Table created.

The “Table created.” message sounds like the database created a new table without any errors.  But, looking at DBA_OBJECTS the table was not new.  So, I guessed that when you are running the catproc.sql script which includes the create table statement for SYS.WRH$_SQL_PLAN it must contain something that suppresses the error that you should get when you try to create a table and the table already exists:

ORA-00955: name is already used by an existing object

So, I opened my 11.2.0.3 test database using STARTUP RESTRICT  and ran @catproc.sql as SYSDBA and to my surprise I got the error just as you normally would:

 42   using index tablespace SYSAUX
 43  ) tablespace SYSAUX
 44  /
create table WRH$_SQL_PLAN
             *
ERROR at line 1:
ORA-00955: name is already used by an existing object

So, I decided to restart this database with STARTUP UPGRADE and rerun catproc.sql and as a result the error disappeared:

 40  ,constraint WRH$_SQL_PLAN_PK primary key
 41      (dbid, sql_id, plan_hash_value, id)
 42   using index tablespace SYSAUX
 43  ) tablespace SYSAUX
 44  /

Table created.

Cue the mysterious Twilight Zone music…

I guess this is a “feature” of the startup upgrade command but the “Table created.” message is kind of confusing.  The table isn’t really created if it exists.  But, I guess the good thing is that it doesn’t report an error.

– Bobby

 



Categories: DBA Blogs

ps and top differences with HugePages

Tue, 2014-09-16 18:09

The Unix utilities ps and top report memory differently with HugePages than without.

Without HugePages ps -eF seems to include the SGA memory under the SZ column:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
... 
oracle    1822     1  0 846155 16232  0 07:19 ?        00:00:00 ora_d000_orcl
oracle    1824     1  0 846155 16228  0 07:19 ?        00:00:00 ora_d001_orcl
oracle    1826     1  0 846155 16236  0 07:19 ?        00:00:00 ora_d002_orcl
oracle    1828     1  0 846155 16232  0 07:19 ?        00:00:00 ora_d003_orcl
oracle    1830     1  0 846155 16224  0 07:19 ?        00:00:00 ora_d004_orcl
oracle    1832     1  0 846155 16232  0 07:19 ?        00:00:00 ora_d005_orcl
oracle    1834     1  0 846155 16236  0 07:19 ?        00:00:00 ora_d006_orcl
oracle    1836     1  0 846155 16228  0 07:19 ?        00:00:00 ora_d007_orcl
oracle    1838     1  0 846155 16224  0 07:19 ?        00:00:00 ora_d008_orcl
oracle    1840     1  0 846155 16232  0 07:19 ?        00:00:00 ora_d009_orcl
oracle    1842     1  0 846155 16240  0 07:19 ?        00:00:00 ora_d010_orcl
oracle    1844     1  0 846155 16228  0 07:19 ?        00:00:00 ora_d011_orcl
...

Here SZ = 846155 kilobytes = 826 megabytes.  If you add up all the SZ values it comes to 81 gigabytes which wont fit in my 4 gig memory and 4 gig swap.  It seems to include the amount of the SGA actually used, not the full 3 gigabyte max sga size, otherwise the total would have been hundreds of gigabytes.

Doing the same exercise with 3 gigabytes of huge pages ps looks like this:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
...
oracle    1809     1  0 59211 15552   0 07:52 ?        00:00:00 ora_d000_orcl
oracle    1811     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d001_orcl
oracle    1813     1  0 59211 15548   0 07:52 ?        00:00:00 ora_d002_orcl
oracle    1815     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d003_orcl
oracle    1817     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d004_orcl
oracle    1819     1  0 59211 15548   0 07:52 ?        00:00:00 ora_d005_orcl
oracle    1821     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d006_orcl
oracle    1823     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d007_orcl
oracle    1825     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d008_orcl
oracle    1827     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d009_orcl
oracle    1829     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d010_orcl
oracle    1831     1  0 59211 15544   0 07:52 ?        00:00:00 ora_d011_orcl
...

SZ = 59211 k= 57 meg.  Total SZ = 5.89 gigabytes.  Still this is bigger than total memory but closer to the 4 gig memory available.  It’s just a guess, but I’m pretty sure that with HugePages this total doesn’t include the amount of memory in use in the SGA in the SZ for each process as it did without HugePages.

The other weird thing is how different top looks with HugePages.  Here is top with the database having just come up without HugePages:

top - 07:20:16 up 3 min,  2 users,  load average: 1.06, 0.33, 0.13
Tasks: 187 total,   1 running, 186 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.6%us,  6.3%sy,  0.0%ni, 77.8%id, 14.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4050836k total,   984444k used,  3066392k free,    14460k buffers
Swap:  4095996k total,        0k used,  4095996k free,   450128k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 2010 oracle    20   0 3310m  51m  44m D  7.6  1.3   0:00.21 oracle             
 1988 oracle    20   0 3307m  50m  45m D  3.8  1.3   0:00.21 oracle             
 1794 oracle    -2   0 3303m  15m  13m S  1.9  0.4   0:01.07 oracle

Notice that we have about 3 gigabytes free – 3066392k and nothing in swap.

Here is the same system  with 3 gig of HugePages:

top - 07:53:21 up 2 min,  2 users,  load average: 0.81, 0.29, 0.11
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.0%us,  8.6%sy,  0.0%ni, 69.2%id, 20.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4050836k total,  3860100k used,   190736k free,    14332k buffers
Swap:  4095996k total,        0k used,  4095996k free,   239104k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 1781 oracle    -2   0 3303m  15m  13m S  3.5  0.4   0:01.02 oracle             
    1 root      20   0 19400 1520 1220 S  0.0  0.0   0:01.43 init               
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd

Now only 190736k is free.  But, note that in both cases top lists the oracle processes with 3300 meg of virtual memory, which is consistent with the 3 gig max sga.

I’ve still got a lot to learn about HugePages but I thought I would pass along these observations.  This article on Oracle’s support site helped me learn about HugePages:

HugePages on Oracle Linux 64-bit (Doc ID 361468.1)

I ended up sizing the HugePages down to 2 gig on my 4 gig test system and reducing sga max size to 2 gig as well.  My system was sluggish with so little free memory when I was using a 3 gig SGA and HugePages.  It was much snappier with only 2 gig tied up in HugePages and dedicated to the SGA, leaving 2 gig for everything else.

This was all done with Oracle’s 64 bit version of Linux and 11.2.0.3 database.

– Bobby

 






Categories: DBA Blogs

Virtual Circuit Wait

Thu, 2014-09-11 15:28

On Monday we had some performance problems on a system that includes a database which uses shared servers.  The top wait was “virtual circuit wait”.  Here are the top 5 events for a 52 minute time frame:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class virtual circuit wait 1,388,199 17,917 13 50.98 Network db file sequential read 1,186,933 9,252 8 26.33 User I/O log file sync 1,185,620 6,429 5 18.29 Commit DB CPU 5,964 16.97 enq: TX – row lock contention 391 586 1499 1.67 Application

From other monitoring tools there was no sign of poor performance from the database end but virtual circuit wait is not normally the top wait during peak times.  Overall for the time period of this AWR report the shared servers didn’t seem busy:

Shared Servers Utilization Total Server Time (s) %Busy %Idle Incoming Net % Outgoing Net % 111,963 38.49 61.51 15.99 0.01

We have seen virtual circuit waits ever since we upgraded to 11g on this system so I wanted to learn more about what causes it.  These two Oracle support documents were the most helpful:

Troubleshooting: Virtual Circuit Waits (Doc ID 1415999.1)

Bug 5689608: INACTIVE SESSION IS NOT RELEASING SHARED SERVER PROCESS (closed as not bug)

Evidently when you return a cursor from a package and the cursor includes a sort step a shared server will be hung up in a virtual circuit wait state from the time the cursor is first fetched until the application closes the cursor.  Our application uses cursors in this way so it stands to reason that our virtual circuit wait times we saw in our AWR report represent the time it took for our web servers to fetch from the cursors and close them, at least for the cursors that included sort steps.  So, if our web servers slow down due to some other issue they could potentially take longer to fetch from and close the affected cursors and this could result in higher virtual circuit wait times.

Here is a zip of a test script I ran and its output: zip

I took the test case documented in bug 5689608 and added queries to v$session_wait to show the current session’s virtual circuit waits.

Here are the first steps of the test case:

CREATE TABLE TEST AS SELECT * FROM DBA_OBJECTS; 
     
create or replace package cursor_package as
cursor mycursor is select * from test order by object_name;
end;
/
       
begin
 open cursor_package.mycursor;
end;
/
 
create or replace procedure test_case is
l_row TEST%rowtype;
begin
if cursor_package.mycursor%isopen then
fetch cursor_package.mycursor into l_row;
end if;
end;
/

These steps do the following:

  1. Create a test table
  2. Create a package with a cursor that includes an order by to force a sort
  3. Open the cursor
  4. Create a procedure to fetch the first row from the cursor

At this point I queried v$session_wait and found no waits:

SQL> select * from v$session_event
  2  where sid=
  3  (SELECT sid from v$session where audsid=USERENV('SESSIONID')) 
     and
  4  event='virtual circuit wait';

no rows selected

The next step of the test case fetched the first row and then I queried and found the first wait:

SQL> exec test_case;

SQL> select * from v$session_event
  2  where sid=
  3  (SELECT sid from v$session where audsid=USERENV('SESSIONID')) 
     and
  4  event='virtual circuit wait';

       SID EVENT                          TIME_WAITED
---------- --------------------------------------------------------
       783 virtual circuit wait           0

Note that time_waited is 0 which means the time was less than one hundredth of a second.  Next I made my sqlplus client sleep for five seconds using a host command and looked at the wait again:

SQL> host sleep 5

SQL> select * from v$session_event
  2  where sid=
  3  (SELECT sid from v$session where audsid=USERENV('SESSIONID')) 
     and
  4  event='virtual circuit wait';

       SID EVENT                             TIME_WAITED
---------- --------------------------------------------------------
       783 virtual circuit wait              507

Total time is now 507 centiseconds = 5 seconds, same as the sleep time.  So, the time for the virtual circuit wait includes the time after the client does the first fetch, even if the client is idle.  Next I closed the cursor and slept another 5 seconds:

SQL> begin
  2   close cursor_package.mycursor;
  3  end;
  4  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL> 
SQL> host sleep 5

SQL> 
SQL> select * from v$session_event
  2  where sid=
  3  (SELECT sid from v$session where audsid=USERENV('SESSIONID')) 
     and
  4  event='virtual circuit wait';

       SID EVENT                                 TIME_WAITED
---------- --------------------------------------------------------
       783 virtual circuit wait                  509

The time waited is still just about 5 seconds so the clock stops on the virtual circuit wait after the sqlplus script closes the cursor.  If the session was still waiting on virtual circuit wait after the close of the cursor the time would have been 10 seconds.

This was all new to me.  Even though we have plenty of shared servers to handle the active sessions we still see virtual circuit waits.  These waits correspond to time on the clients fetching from and closing cursors from called packages.  As a result, these wait times represent time outside of the database and not time spent within the database.  These waits tie up shared servers but as long as they are short enough and you have shared servers free they don’t represent a problem.

– Bobby

p.s. This is on hp-ux 11.31 ia64 Oracle 11.2.0.3

 

 












Categories: DBA Blogs

First AZORA usergroup meeting October 23

Fri, 2014-09-05 17:52

Just got the invitation to the first AZORA (Arizona Oracle user group) meeting on October 23.  Here is the link: url

It’s 2 pm at Oracle’s office, 2355 E Camelback Rd Ste 950, Phoenix, AZ.

I’m looking forward to it!

– Bobby

Categories: DBA Blogs

OS Watcher Paging Example

Thu, 2014-09-04 13:13

I have an example of paging in some Exadata OS Watcher log files.  We got an error in the alert log about a high load but reviewing AWR reports it did not seem like we had an unusual number of active sessions.  Also, the CPU use seemed low but the system load was high.  Reviewing the OS Watcher logs and finding a similar case on Oracle’s support site convinced me that our issue was due to a large amount memory being paged out to the swap file.

Here is the top report from the problematic time:

top - 11:07:58 up 239 days, 14:57,  0 users,  load average: 204.84, 183.48, 106.74
Tasks: 1097 total,  71 running, 1022 sleeping,   0 stopped,   4 zombie
Cpu(s):  1.3%us, 98.0%sy,  0.0%ni,  0.2%id,  0.5%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  74026796k total, 72456380k used,  1570416k free,     2436k buffers
Swap: 16771852k total,  7884656k used,  8887196k free,   597336k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                   
  894 oracle    15   0  324m  26m  10m S 309.1  0.0 129:46.34 /u01/app/11.2.0.2/grid/bin/diskmon.bin -d -f                                             
 1053 root      20  -5     0    0    0 R 100.0  0.0  32:10.62 [kswapd0]                                                                                
 3366 oracle    16   0 1440m  54m  18m S 146.6  0.1   3:03.54 /u01/app/11.2.0.2/grid/bin/oraagent.bin                                                  
23809 oracle    18   0  992m 179m  16m S 128.1  0.2 158:45.67 /u01/app/oracle/Middleware/agent11g/bin/emagent                                          
28878 orarom    18   0  982m 360m  11m S 101.5  0.5  94:58.36 /opt/OracleHomes/agent_home/core/12.1.0.3.0/jdk/bin/java -Xmx154M -XX:MaxPermSize=96M -se
 4501 oracle    19   0 4423m  96m  24m R 89.6  0.1   2435:54 ora_dia0_ourprdfs1                                                                        
18318 oracle    20   0 26.3g 610m  70m R 86.3  0.8   2025:17 ora_p001_ourprd1                                                                          
18226 oracle    18   0 26.3g 595m  61m D 82.3  0.8 212:20.62 ora_p013_ourprd1                                                                          
 7619 oracle    19   0 1727m  16m  14m R 80.5  0.0   0:50.29 oracle+ASM1 (LOCAL=NO)                                                                    
 4389 oracle    25   0 25.9g 158m  27m R 78.5  0.2   2557:06 ora_dia0_ourprd1                                                                          
11008 oracle    18   0 26.3g 561m 133m D 77.9  0.8 373:34.37 ora_p010_ourprd1                                                                          
 8484 root      25   0  238m 6276 2600 R 77.3  0.0   0:46.60 python /usr/local/bin/dcli -s -q -l root -g cell_group exec 2>&1; cellcli -e list quaranti
 7937 oracle    18   0 26.3g 603m  86m R 76.5  0.8   1656:21 ora_p021_ourprd1                                                                          
24118 oracle    25   0 26.3g 602m 103m R 72.7  0.8   2997:08 ora_p007_ourprd1                                                                          
 4491 oracle    16   0 5912m 121m  25m D 72.6  0.2   2407:59 ora_dia0_obrprd1                                                                          
 7919 oracle    25   0 26.4g 680m 247m R 72.2  0.9   4256:57 ora_p011_ourprd1                                                                          
 4384 oracle    18   0 4407m  95m  25m D 70.8  0.1   2344:36 ora_dia0_otrprd1                                                                          
 4397 oracle    23   0 25.8g  44m  23m D 69.5  0.1   1695:21 ora_lmd0_ourprd1                                                                          
 2548 oracle    20   0 1809m 116m  44m R 69.3  0.2   2393:41 asm_dia0_+ASM1

Note the load of 204 and the system cpu % of 98%.  By 11:07 CPU utilization had gone back up so I assume that the swapping I/O had completed but then a bunch of system I/O kicked off.  Also I assume that [kswapd0] has something to do with swapping.  Here is the mpstat report:

             CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:59:00     all    3.49    0.00    2.05   93.81    0.01    0.16    0.00    0.48   4144.00
10:59:05     all    4.94    0.00    2.45   92.21    0.01    0.11    0.00    0.28   2548.00
10:59:16     all    1.42    0.00    1.82   95.64    0.02    0.13    0.00    0.97   3108.66
10:59:21     all    3.84    0.00    3.58   90.68    0.04    0.49    0.00    1.38   3868.40
10:59:26     all    1.99    0.00    1.85   93.56    0.01    0.13    0.00    2.46   2942.48
10:59:31     all    1.38    0.00    1.85   91.37    0.01    0.10    0.00    5.29   2479.04
10:59:36     all    1.75    0.00    2.81   94.19    0.01    0.16    0.00    1.08   2762.60
10:59:41     all    0.89    0.00    1.09   97.41    0.00    0.05    0.00    0.56   2116.80
10:59:46     all    1.19    0.00    1.21   93.91    0.03    0.11    0.00    3.55   2992.40
10:59:51     all    1.79    0.00    1.75   95.88    0.01    0.28    0.00    0.29   3820.00
10:59:56     all    2.09    0.00    1.55   92.85    0.03    0.21    0.00    3.28   4410.80
11:00:01     all    1.85    0.00    1.91   96.00    0.01    0.11    0.00    0.11   3433.47
11:00:06     all    1.50    0.00    1.45   89.07    0.01    0.11    0.00    7.85   3746.00
11:00:11     all    6.41    0.00   10.68   80.29    0.09    0.80    0.00    1.73   9269.46
11:00:16     all   17.14    0.00   19.42   57.12    0.14    0.83    0.00    5.35  17018.64
11:00:21     all    8.58    0.00   80.85    9.01    0.05    0.23    0.00    1.29   6951.70
11:00:27     all    6.21    0.00   91.77    1.12    0.04    0.09    0.00    0.78   5417.11
11:00:32     all    5.65    0.00   93.75    0.53    0.01    0.04    0.00    0.02   3289.82
11:00:38     all    2.72    0.00   96.81    0.35    0.01    0.02    0.00    0.09   2480.87
11:00:44     all    1.73    0.00   97.89    0.28    0.00    0.02    0.00    0.08   1745.89
11:01:00     all    1.85    0.00   97.72    0.16    0.01    0.04    0.00    0.22   2710.93
11:01:05     all    0.79    0.00   98.84    0.25    0.00    0.04    0.00    0.09   2543.39
11:02:23     all    1.00    0.00   98.72    0.15    0.00    0.02    0.00    0.10   1558.00
11:02:33     all    0.95    0.00   98.74    0.09    0.00    0.09    0.00    0.13   1488.49
11:02:39     all    1.64    0.00   98.13    0.08    0.01    0.05    0.00    0.08   1441.73
11:02:44     all    0.46    0.00   99.29    0.10    0.00    0.09    0.00    0.06   1465.73
11:03:20     all    1.54    0.00   97.96    0.25    0.01    0.07    0.00    0.17   2757.95
11:03:25     all    0.40    0.00   99.24    0.21    0.00    0.02    0.00    0.12   1550.60
11:03:30     all    0.39    0.00   99.22    0.23    0.00    0.01    0.00    0.15   1421.20
11:03:35     all    3.29    0.00   96.32    0.25    0.01    0.03    0.00    0.10   1952.51
11:03:40     all    5.90    0.00   93.76    0.28    0.00    0.01    0.00    0.05   1407.00
11:03:45     all    1.14    0.00   98.56    0.20    0.00    0.01    0.00    0.09   1418.00
11:03:50     all    0.19    0.00   99.55    0.21    0.00    0.03    0.00    0.03   1356.69
11:04:23     all    0.36    0.00   99.31    0.18    0.00    0.04    0.00    0.10   1621.67
11:06:13     all    0.58    0.00   99.13    0.14    0.00    0.04    0.00    0.11   1550.73
11:06:18     all    3.32    0.00   96.36    0.23    0.00    0.04    0.00    0.06   2055.30
11:06:24     all    4.39    0.00   95.32    0.24    0.00    0.02    0.00    0.03   1991.53
11:06:31     all    4.14    0.00   95.48    0.28    0.01    0.02    0.00    0.07   1598.53
11:07:21     all    1.38    0.00   98.32    0.19    0.00    0.01    0.00    0.09   1444.49

Notice how it is almost all IO until around 11 am and then it switches to all system cpu%.  I assume the I/O was swapping memory out.  The meminfo report has these numbers:

zzz ***Mon Sep 1 11:00:11 CDT 2014
MemTotal:     74026796 kB
MemFree:        330888 kB
...
SwapTotal:    16771852 kB
SwapFree:      8153352 kB

I take this to mean that we have about 330 meg free out of 74 gig total, but still 8 gig of swap.  Presumably when free memory gets low then idle processes get swapped out.  A member of our Unix team pointed out the swapping on the vmstat output.  Here is a section that shows a lot of pages getting swapped out (so).  Each line represents 5 seconds and I think it starts 10:57:56.

zzz ***Mon Sep 1 10:05:31 CDT 2014 Sample interval 5 seconds
-- deleted 629 lines = 629*5 seconds = 3145 seconds
-- 10:57:56 am
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
86 16 7567956 328800    800 610940   56 5350   157  5381 1626 10013  4 96  0  0  0
44 32 7602804 327956    996 613940  222 10464   554 10510 2177 13818  0 99  0  0  0
 1 109 7632732 330412   2284 619780  290 7986  1738  8130 3382 14002  1 21  7 70  0
35 41 7702076 353744   2504 621392  623 16054  1290 16206 8776 19234  2  3  2 92  0
 1 103 7749296 327480   2692 622004  422 11307   789 11574 3839 16138  3  2  1 94  0
 4 109 7849812 330972   3036 624808 1974 24691  2844 25110 11705 36080  4  3  1 92  0
 0 66 7869628 346228   3108 626444  859 6309  1055  6309 5538 16922  3  1  1 94  0
23 104 7895176 1998796   3300 624864 1032 7582  1283  7784 4032 24443 56  3  1 40  0
 3 90 7918016 328212   4144 632880 4910 5939  6866  6406 13908 47820 40  8  3 49  0
 1 101 7983136 333008   4480 627416 1180 15655  1523 15960 3977 16945  5  3  1 91  0
 1 125 8033148 327376   4712 628312  922 12213  1302 12295 3545 16946  4  3  0 93  0
 1 141 8072048 329060   4856 626976  764 10862   986 11082 3684 21305  2  1  1 96  0
 2 76 8106056 327556   5056 627124  797 8855  1048  8946 3968 15044  2  2  0 96  0
 0 103 8181216 328348   5284 611140 2415 16583  3076 16938 4374 17445  4  5  2 90  0
 4 92 8222500 330044   5408 610708 1470 9698  1874  9898 2842 15110  2  2  2 94  0
 4 94 8285204 328932   5936 614128 1015 14754  1678 15213 3324 18240  2  2  5 91  0
 1 111 8326908 327348   6184 614600 1034 9759  1465 10012 2577 14809  2  2  1 95  0
 1 119 8338520 328352   6332 616076  610 4896   834  4952 2505 15373  1  1  1 97  0
 0 98 8378304 329308   6536 615944  975 9795  1334  9931 3802 15897  2  2  3 93  0
 0 89 8404456 327204   6800 618872 1232 8327  1662  8413 4557 16856  2  2  2 94  0
 0 123 8430092 327272   7092 619016 1004 7379  1393  7672 2852 14689  1  1  2 96  0
 1 92 8453420 327212   7284 622700 1738 7250  2259  7494 4241 17781  2  2  2 94  0
 7 76 8562308 329132   7860 628932 2616 23847  3409 25162 6833 20938  4  7  7 82  0
 5 60 8691628 334124   7012 635948 6066 28059  8119 32201 13584 32154 15 23  1 61  0
63 37 8683600 327404   2812 582200 8138  661 12947  2870 12507 26754 12 44  6 38  0
98 22 8696516 328056   1260 560572 10494 7070 15550  9394 28854 128944  3 96  0  0  0
72 14 8694008 327148   1960 557616 1693    3  3222   161 5014 29030  5 95  0  0  0
125 19 8693816 327172   1020 558488  226    1   569   227 1991 13841  0 99  0  0  0
97 15 8703600 327044   1032 554792 5083 3217 10447  3722 26436 202860  1 99  0  0  0
102 20 8733556 327272   1432 556224 2573 6713  5399  7194 16118 66262  1 98  0  0  0
84  9 8732984 327164   1100 555640  470    2  1090   117 1585 11922  0 99  0  0  0
85 13 8732940 327304    880 554024   60    0   317    50 1470 11247  0 99  0  0  0
149 24 8732808 327428    988 553988  133    0   814   334 1962 12802  4 95  0  0  0
153 22 8732640 327284    592 553576  187    0   875   548 4619 36573  2 98  0  0  0
159 15 8732584 327104    604 552956   50    0    96   118 2294 11239  0 99  0  0  0
142 14 8732432 328644   1524 554196  174    0  1706   260 6035 45843  0 99  0  0  0
158 12 8755436 327460   2704 555568 1130 4927  2564  5486 14539 103001  0 99  0  0  0
125 14 8754920 327980    748 554368  406    0   662     6 1583 10952  0 100  0  0  0
115 16 8752204 327472   1724 553740 2453    5  6014   532 11366 82957  1 99  0  0  0
124 19 8752276 327076   1092 554200   78   28   113    81 1767 11712  1 99  0  0  0
82  8 8757568 346356   2384 556132 1316 3202  4373  3837 6633 42986  1 99  0  0  0
151 57 8756348 329524   3632 559164  828    0  3718   429 2076 11613  3 96  0  0  0
159 49 8755120 328488   3248 557720 1033    9  4298   212 2248 12981  4 95  0  0  0
107 36 8771748 332068   1016 557272 2765 3977  9481  4370 14773 103588  2 98  0  0  0
117 16 8771592 327180   1168 558788  156    0   979    58 1466 9988  1 99  0  0  0
 3 101 7875688 1448712   5300 599300 29898 73754 52723 83528 62497 360936  2 94  0  4  0
 3 96 7874004 1260800   8856 624552 7470    0 14911   569 3814 17649  5  1 33 61  0
 1 92 7859256 558968  11228 649948 5590    0 11118  2798 10027 41964 20  5 12 63  0
 0 96 7915604 327600  14060 652636 4674 11822  7818 12173 9053 22067  8  7 20 65  0
 8 98 7963060 327240  11800 635164 1994 11097  3294 12031 6762 16549  3  7  1 90  0
 1 104 7969976 340320   8812 579564 9600 1558 12002  2998 8588 22984 17 34 10 39  0
 4 89 7642788 666372  10244 603740 7154    0 12148  2375 9681 23420  7  5 22 67  0
 1 94 7642548 327912  12556 625672 7011    3 12173  1704 7865 21465  6  5 18 71  0
 7 95 7630564 327572   3396 557488 5970   33 10358  1827 4756 15284  5 67  3 25  0
113 45 7630124 330244   1900 545524 3819    5  7540  1314 3781 13454  2 88  0 10  0
 0 88 6897380 1321012   3560 573532 3635    1  9076  2802 3414 11193  3 64  7 26  0
 1 94 6889436 981560   6188 624532 8463    0 16870  4122 4949 17243  2  1 31 65  0
 0 79 6889308 890800   8444 666940 7332    0 13390  3077 3337 15930  5  1 33 60  0
 1 74 6878180 797212  10720 715612 6776    0 14430  4653 3424 13069 11  6 37 47  0
 0 141 6876812 406704  12856 739628 7202    0 11406  2342 4737 23064  6  5 24 64  0
 2 66 6874848 336480  13428 656312 6296    1 10126  1848 4375 16755  7  9 32 52  0
 2 69 6874192 338016   8288 577196 7297    0 14230  1208 5771 16410 10  5 38 47  0
 1 72 6873696 342908   3796 576568 6161    1 12562  1149 5506 13421 11  5 34 50  0
 2 78 6873524 336716   3992 589684 5718    0 12693   280 5268 11889  9  2 31 58  0
97 48 6873180 331280   2884 559732 5110    4  7616   934 6994 14502  7 38 12 43  0
120 41 6872636 327192   1504 549972 4309   32  7965  1497 7029 14159  5 92  0  3  0
101 26 6870204 327620   1992 548548 4122   19  8110  1191 9559 20993  4 96  0  0  0
82 16 6886048 327184    984 547140 2050 3186  3662  3385 4978 16982  2 98  0  0  0
83  6 6886384 327096   1376 545516  563   69  1284   184 1946 7812  1 99  0  0  0
93 11 6886364 328100    836 545120  798    0   911    18 5495 23655  0 100  0  0  0
27 42 7008372 328376   3216 550820 7050 26176  9220 26390 9919 42056  0 100  0  0  0
 3 112 7044188 328628   4552 554876 2424 8493  3526  8570 2402 7432  2 78  1 19  0
 3 113 7108916 348172   5532 560420 3186 14382  4325 14512 9715 15529  4  6  4 87  0
 0 125 7136504 366528   7040 567140 4077 10618  5413 10906 15231 26777  4  1 12 83  0
 1 111 7149172 387580   7320 568044  391 4344   719  4394 2460 7519  1  1  0 99  0
 1 94 7178000 328136   9984 580152 5768 7927  8617  8990 20026 30064  8  3  9 80  0
 0 86 7204696 327100  10268 583912 1972 6991  2825  7515 5943 12124  3  2  8 87  0
 0 103 7218656 347300  10564 585360  457 4985   733  5144 1936 8120  0  1  6 93  0
 5 98 7236196 345460  10784 585224 1298 5318  1467  5477 7973 13715  3  2  6 89  0
 1 111 7243488 328920  10932 586500  488 3097   839  3136 1843 7555  1  1  6 92  0
 0 103 7265100 328016  11132 591452 1439 6088  2170  6241 3984 9707  2  1  2 95  0
 2 91 7274576 335116  11272 591240  709 3526   789  3750 2162 7734  3  1  2 95  0
 1 124 7303804 329200  11408 593516 1170 7348  1587  7614 4230 9955  3  1  6 90  0
 4 61 7403224 343940  11724 601684 3143 20878  4191 22518 7000 14053  4  4  3 89  0
11 62 7589208 349448  12504 607764 4350 38374  5784 38847 13038 21696  6  6  9 79  0
 3 56 7777608 338348  13328 614952 6506 38412  7919 38711 11993 18575  6  6 16 71  0
 2 49 8025300 331604  13568 618408 5642 50410  6727 50490 14740 19779  7  7 35 51  0
 1 69 8159548 977560  14304 624024 4725 27423  6727 28257 13122 22037 10  7 12 72  0
 5 50 8159156 869324  18188 647964 9421    0 13542  1034 14117 30070 16  4 24 55  0
 2 46 8158644 768600  20324 663440 10323    0 13395  2023 18356 27259 15  3 52 30  0
 3 43 8158308 551160  22344 678548 12241    0 16135   623 14080 23349 19  3 42 36  0

We got this message in the alert log:

Mon Sep 01 11:09:28 2014
 GES: System Load is HIGH.
 GES: Current load is 204.89 and high load threshold is 80.00

After this the instance crashed:

Instance terminated by LMS0, pid = 4532

Bug 17316830, “ROOT CAUSE ALL DATABASE INSTANCES CRASHED ON ONE OF OUR NODES”, relates high paging and low free memory to the same GES load messages and to instances crashing as ours did.  So, it seems that when you have a lot of paging it can cause an instance to crash.  Is this a bug or a feature? :)

It was interesting to look at the OS Watcher files to see information about the nature of the system when our issues occurred.  Looks like lots of paging.  The only question is what caused the paging?  We are trying to figure out if we just had some abnormally high memory use from our application or if some bug caused some system processes to eat up the normally free memory.  But, I thought I would share the investigation we did using the OS Watcher logs because it was interesting to see an example with paging, even though we have not gotten to the root of the problem.

– Bobby

 

Categories: DBA Blogs