Skip navigation.

Bobby Durrett's DBA Blog

Syndicate content
Oracle database performance
Updated: 17 hours 24 min ago

Speaking at Delphix User Group Webex March 11

Tue, 2015-02-24 17:42

On March 11 at 10 am California time I will be speaking in a Delphix User Group Webex session.

Here is the sign up url: WebEx sign up.

Adam Leventhal, the Delphix CTO, will also be on the call previewing the new Delphix 4.2 features.

I will describe our experience with Delphix and the lessons we have learned.  It is a technical talk so it should have enough details to have value to a technical audience.  Hopefully I have put enough effort into the talk to make it useful to other people who have or are considering getting Delphix.

There will time for questions and answers in addition to our talks.

I really enjoy doing user group presentations.  I had submitted an abstract for this talk to the Collaborate 2015 Oracle user group conference but it was not accepted so I wont get a chance to give it there.  But, this WebEx event gives me a chance to present the same material, so I’m happy to have this opportunity.

If you have an interest in hearing about Delphix join the call.  It is free and there will be some good technical content.

– Bobby

P.S. If this talk interests you I also have some earlier blog posts that relate to some of the material I will be covering:

Delphix first month

Delphix direct i/o and direct path reads

Delphix table recovery

Also, I plan to post the slides after the talk.

Categories: DBA Blogs

A new index on a small table makes a big difference

Mon, 2015-02-16 16:12

A few weeks back on the weekend just before I went on call we got a complaint about slowness on an important set of reports.  I worried that the slowness of these reports would continue during my support week so I tried to figure out why they were slow.  I reviewed an AWR report for the 24 hours when the reports were running and found a simple query against a tiny table at the top of the “SQL ordered by Elapsed Time” report:

   SQL Id         Elapsed (s)        Execs
-------------   ------------- ------------
77hcmt4kkr4b6      307,516.33 3.416388E+09

I edited the AWR report to show just elapsed seconds and number of executions.  3.4 billion executions totaling 307,000 seconds of elapsed time.  This was about 90 microseconds per execution.

The previous weekend the same query looked like this:

   SQL Id         Elapsed (s)        Execs
-------------   ------------- ------------
77hcmt4kkr4b6      133,143.65 3.496291E+09

So, about the same number of executions but less than half of the elapsed time.  This was about 38 microseconds per execution.  I never fully explained the change from week to week, but I found a way to improve the query performance by adding a new index.

The plan was the same both weekends so the increase in average execution time was not due to a plan change.  Here is the plan:

SQL_ID 77hcmt4kkr4b6
--------------------
SELECT DIV_NBR FROM DIV_RPT_GEN_CTL WHERE RPT_NM = :B1 AND
GEN_STAT = 1

Plan hash value: 1430621991

------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |       |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIV_RPT_GEN_CTL    |     1 |
|   2 |   INDEX RANGE SCAN          | DIV_RPT_GEN_CTL_U1 |     1 |
------------------------------------------------------------------

I found that the table only had 369 rows and 65 blocks so it was tiny.

The table’s only index was on columns RPT_NM and RPT_ID but only RPT_NM was in the query.  For the given value of RPT_NM the index would look up all rows in the table with that value until it found those with GEN_STAT=1.  I suspect that on the weekend of the slowdown that the number of rows being scanned for a given RPT_NM value had increased, but I can not prove it.

I did a count grouping by the column GEN_STAT and found that only 1 of the 300 or so rows had GEN_STAT=1.

SELECT GEN_STAT,count(*)
FROM DIV_RPT_GEN_CTL
group by GEN_STAT;

  GEN_STAT   COUNT(*)
---------- ----------
         1          1
         2        339
         0         29

So, even though this table is tiny it made sense to add an index which included the selective column GEN_STAT.  Also, since the reports execute the query billions of times per day it made sense to include the one column in the select clause as well, DIV_NBR.  By including DIV_NBR in the index the query could get DIV_NBR from the index and not touch the table.  The new index was on the columns RPT_NM, GEN_STAT, and DIV_NBR in that order.

Here is the new plan:

SQL_ID 77hcmt4kkr4b6
--------------------
SELECT DIV_NBR FROM DIV_RPT_GEN_CTL WHERE RPT_NM = :B1 AND
GEN_STAT = 1

Plan hash value: 2395994055

-------------------------------------------------------
| Id  | Operation        | Name               | Rows  |
-------------------------------------------------------
|   0 | SELECT STATEMENT |                    |       |
|   1 |  INDEX RANGE SCAN| DIV_RPT_GEN_CTL_U2 |     1 |
-------------------------------------------------------

Note that it uses the new index and does not access the table.  Here is the part of the AWR report for the problem query for last weekend:

   SQL Id         Elapsed (s)        Execs
-------------   ------------- ------------
77hcmt4kkr4b6       84,303.02 4.837909E+09

4.8 billion executions and only 84,000 seconds elapsed.  That is about 17.4 microseconds per execution.  That is less than half of what the average execution time was the weekend before the problem started.

The first Monday after we put the index in we found that one of the slow reports had its run time reduced from 70 minutes to 50 minutes.  It was great that we could improve the run time so much with such a simple fix.

It was a simple query to tune.  Add an index using the columns in the where clause and the one column in the select clause.  It was a tiny table that normally would not even need an index.  But, any query that an application executes billions of times in a day needs to execute in the most efficient way possible so it made sense to add the best possible index.

– Bobby

Categories: DBA Blogs

Long parse time with interval partitioning and subpartitions

Mon, 2015-02-09 13:52

In an earlier post I described how some queries with long parse times were causing long library cache lock waits.  Friday I applied a patch from Oracle that resolved the long parse times.  Here are the conditions which may relate to this bug:

  1. Interval partitioned table
  2. Partitioned by range
  3. Sub-partitioned by list
  4. Open ended range on partitioning column in where clause
  5. Tens of thousands of sub-partitions

Prior to applying the patch I did an explain plan on two versions of the problematic query.  One version specified an open-ended range on the partitioning column and the other closed off the range.

Slow version:

...
T392658.CLNDR_DT >= TO_DATE('2014-11-17' , 'YYYY-MM-DD') and 
...
Explained.

Elapsed: 00:00:46.20

Fast version:

...
T392658.CLNDR_DT >= TO_DATE('2014-11-17' , 'YYYY-MM-DD') and
T392658.CLNDR_DT <= TO_DATE('2014-11-26' , 'YYYY-MM-DD') and
...
Explained.

Elapsed: 00:00:00.09

The queries are the same except for the extra date condition which closes off the date range.  Note that the explain plan took 46 seconds with the open-ended range and less than a tenth of a second with the closed off range.

With the patch the slow version is just as fast as the fast one.

This is bug 20061582 in Oracle’s system.

From my conversations with Oracle it sounds like in certain cases the optimizer is reviewing information for many or all of the sub-partitions and since we have 20,000 for this table in production it can take minutes to parse.  I also messed with the dates on the open-ended condition and found that if I made the date early enough the parse time issue went away.  So, it seems that there is some set of conditions, which I don’t know how to easily reproduce, which lead the optimizer to look at sub-partition information and slows parse time.

This is on 11.2.0.4 on an Exadata system.  I was able to reproduce the long parse times on non-Exadata Linux 11.2.0.4 and 12.1.0.2 systems so it does not occur only on Exadata.

This issue surprised me because I thought that the optimizer would not look at partition or sub-partition statistics in a query that has a range condition which spanned more than one partition.  In the past I have always seen the global or table level stats used in these situations.  But, now I know that there are some cases where the optimizer will dig into the sub-partition stats even though the query covers more than one partition.

Looking at the plan of my slow parsing query before and after the patch I get a clue that the optimizer is looking at sub-partition stats:

Partition range without the patch:

KEY(SQ)|KEY(SQ)

Partition range with the patch:

     1 |   136

Evidently KEY(SQ) relates to some sort of filtering of the sub-partitions which cause the long parse time.  The manuals describe KEY(SQ) in a section titled “Dynamic Pruning with Subqueries” so maybe the problem queries have some sub-query that the optimizer was using to choose which sub-partitions that the query needed.

If you have an interval partitioned table with tens of thousands of sub-partitions and parse times in the minutes with open-ended ranges specified on the partitioning column your system could be hitting this same issue.

– Bobby

Categories: DBA Blogs

Arizona User Group Meeting tomorrow

Mon, 2015-01-19 14:07

Here is the link: url

It looks like our meeting tomorrow will be on Oracle 12c new features.  I’m looking forward to participating. :)

– Bobby

Categories: DBA Blogs

Cary Millsap

Wed, 2015-01-14 14:31

This is my third of four posts about people who have made a major impact on my Oracle database performance tuning journey.  This post is about Cary Millsap.  The previous two were about Craig Shallahamer and Don Burleson.

I am working through these four people in chronological order.  The biggest impact Cary Millsap had on me was through the book Optimizing Oracle Performance which he co-authored with Jeff Holt.  I have also heard Cary speak at conferences and we had him in for a product demo one time where I work.

I have delayed writing this post because I struggle to put into words why Cary’s book was so useful to me without repeating a long explanation of the book’s contents.  Just before reading the book I had worked on a system with high CPU usage and queuing for the CPU.  I had just read the paper “Microstate Response-time Performance Profiling” by Danisment Gazi Unal which talked about why CPU measurements in Oracle do not include time spent queued for the CPU.  Then I read Cary Millsap’s book and it was very enlightening.  For one thing, the book was very well written and written in a convincing way.  But the key concept was Cary Millsap’s idea of looking at the waits and CPU time that Oracle reports at a session level and comparing that to the real elapsed time.  This performance profile with waits, CPU, and elapsed time formed the basis of my first conference talk which I gave at Collaborate 06: PowerPoint, Word, Zip

Here is an example of a session profile from my presentation:

TIMESOURCE                  ELAPSED_SECONDS
--------------------------- ---------------
REALELAPSED                             141
CPU                                   44.81
SQL*Net message from client            9.27
db file sequential read                 .16

This is a profile of a session that spent roughly two-thirds of its time queued for the CPU.

Since reading Optimizing Oracle Performance I have resolved many performance problems by creatively applying the concepts in the book.  The book focuses on using traces to build profiles.  I have made my own scripts against V$ views and I have also used Precise.  I have used traces as the book suggests but only with TKPROF.  I have not had a chance to use the tool that the book describes, the Method R Profiler.

However I do it the focus is on waits, CPU as reported by Oracle, and real elapsed time all for a single session.  It is a powerful way to approach performance tuning and the main thing I learned from Cary Millsap.  I highly recommend Cary Millsap and Jeff Holt’s book to anyone who wants to learn more about Oracle database performance tuning because it made such a profound impact on my career.

– Bobby



Categories: DBA Blogs

Career Day – Oracle Database Administrator

Tue, 2015-01-06 16:10

I will be talking at my daughter’s high school for career day on Monday, explaining my job as an Oracle Database Administrator.  Wish me luck!

The funny thing is that no one understands what Oracle DBAs do, unless they are one or work closely with one.  I have a feeling that my talk is going to fall flat, but if it helps one of the students in any way it will be worth it.

To me the best thing about being an Oracle DBA is that you can do a pretty interesting and technically challenging job and companies that are not technology centric will still hire you to do it.  I’ve always been interested in computer technology but have worked in non-technical companies my entire career – mainly a non-profit ministry and a food distribution company.  Neither companies make computers or sell software!

My other thought is how available computer technology is to students today.  Oracle, in one of the company’s more brilliant moves, made all of its software available for download so students can try out the very expensive software for free.  Plus all the manuals are available online.  What is it like to grow up as a student interested in computer technology in the age of the internet?  I can’t begin to compare it to my days in the 1980s when I was in high school and college.  Did we even have email?  I guess we must have but I can’t remember using it much.  Today a student who owns a laptop and has an internet connection has a world of technology at their fingertips far beyond what I had at their age.

Hopefully I wont bore the students to tears talking about being an Oracle DBA.  They probably still won’t know what it really is after I’m done.  But at least they will know that such a job exists, and maybe that will be helpful to them.

– Bobby

P.S.  There were over 100 students there.  They were pretty polite with only a little talking.  Here is a picture of myself on the left, my daughter in the center, and a coworker who also spoke at the career day on the right.

careerday





Categories: DBA Blogs

AZORA – Arizona Oracle User Group meeting January 20th

Fri, 2014-12-12 16:25

AZORA is planning a meeting January 20th.  Here is the link to RSVP: url

Hope to see you there. :)

– Bobby

Categories: DBA Blogs

10128 trace to see partition pruning

Tue, 2014-12-09 10:57

I am working on an SR with Oracle support and they asked me to do a 10128 trace to look at how the optimizer is doing partition pruning.  I did some quick research on this trace and wanted to pass it along.

Here are the names of the two Oracle support documents that I found most helpful:

How to see Partition Pruning Occurred? (Doc ID 166118.1)

Partition Pruning Min/Max Optimization Fails when Parallel Query Run in Serial (Doc ID 1941770.1)

The first was the one Oracle support recommended.  But, the SR said to run both a level 2 and a level 7 trace and the first document did not mention level 7.  But, the second document has an example of a level 7 trace and more details on how to set it up.

I also found these two non-Oracle sites or blog posts:

http://cbohl.blogspot.com/2006/10/verify-that-partition-pruning-works.html

http://www.juliandyke.com/Diagnostics/Events/EventReference.html#10128

I do not have time to delve into this further now but if you are trying to understand partition pruning then the 10128 trace may help you understand how it works.

– Bobby


Categories: DBA Blogs

Parsing blocks stats blocks parsing

Mon, 2014-11-24 13:33

I had a five-minute conversation with Oracle development Friday that rocked my world.  I found out that parsing blocks stats which blocks parsing.

We have a system with queries that are taking minutes to parse.  These queries include the main tables on our database, one of which is interval partitioned and has 20,000 sub-partitions.  We have seen a situation where Oracle’s delivered statistics gathering job hangs on a library cache lock waiting for a query to finish parsing.  But, much worse than that, we find most queries on our system hanging on library cache lock waits blocked by the statistics job.

We have an SR open on this situation because it seems to be a bug, but Friday someone on the phone from Oracle development explained that this parse blocks stats blocks parse situation is normal.  Later after I got off the phone I built a simple test case proving that what he said was true.  I took a query that took a long time to parse in production and ran it on our development database and it took 16 seconds to parse there.  I choose the smallest table that the query included and gathered stats on it.  The stats ran in a fraction of a second when run by itself, but if I started the long parsing query in one window and ran the stats in another window the stats hung on a library cache lock wait for 15 seconds.  Then I created a trivial query against the same small table I had gathered stats on.  The query ran instantly by itself.  But, if I ran the long parsing query first, kicked off the stats which hung on the lock, and then kicked off the short query against the table I was gathering stats on the short query hung on a library cache lock also.  This example convinced me that the parse blocks stats blocks parse chain was real.

This morning I built a standalone test case that others can run to prove this out on their databases: zip of testcase.  To run the testcase you need three windows where you can run three sqlplus scripts in rapid succession.  In one window first just run tables.sql to create the test tables.  Then run these three scripts one after the other in each window to create the three link chain: chain1.sql, chain2.sql, chain3.sql.  Chain1.sql has the explain plan of a query that takes a long time to parse.  Chain2.sql gathers stats on one table.  Chain3.sql runs a simple query against the table whose stats are being gathered.  Chain1 spends all of its time on the CPU doing the parse.  Chain2 and 3 spends all of their time on library cache lock waits.

First I created two tables:

create table t1 as select * from dba_tables;
create table t2 as select * from dba_tables;

Next I kicked off the explain plan that takes a long time to run.  It joined 100 tables together:

explain plan into plan_table for 
select 
count(*)
from
     t1,
     t2,
     t2 t3,
...
     t2 t100
where
  t1.owner=t2.owner and
...
  t1.owner=t100.owner
/

This explain plan ran for 26 seconds, almost all of which was CPU:

Elapsed: 00:00:26.90

...

CPU in seconds
--------------
         26.81

Right after I kicked off the explain plan I kicked off this statement which gathered stats on the first table in the from clause:

execute dbms_stats.gather_table_stats(NULL,'T1');

This ran for 25 seconds and almost all of the time was spent on a library cache lock wait:

Elapsed: 00:00:25.77

...

Library cache lock in seconds
-----------------------------
                        25.55

Right after I kicked off the gather table stats command I ran this simple query making sure that it was unique and required a hard parse:

select /* comment to force hard parse */ count(*) from T1;

This ran for 24 seconds and almost all of the time was spent on a library cache lock wait:

Elapsed: 00:00:24.48

...

Library cache lock in seconds
-----------------------------
                        24.48

Evidently when a session parses a query it needs to obtain a shared lock on every table that the query includes.  When you gather statistics on a table you need to obtain an exclusive lock on the table, even if you are gathering statistics on one partition or sub-partition of the table.  While the statistics gathering session waits to acquire an exclusive lock any new parses that include the same table will hang.

Prior to Friday I did not think that there was any non-bug situation where gathering optimizer statistics would lock up sessions.  I thought that the only negative to gathering statistics at the same time as other application processing was that statistics gathering would compete for system resources such as CPU and I/O and possibly slow down application code.  But, now I know that gathering statistics can hang all queries that use the given table if stats gathering gets hung up waiting for a query that takes a long time to parse.

– Bobby

P.S. After reviewing the SR I wanted to understand what this parse blocks stats blocks parse looked like in a state dump.  The Oracle support analyst explained how the locks looked in a state dump that we uploaded but I didn’t get a chance to look at it closely until today.  I found the most important information in lines with the string “LibraryObjectLock” at the front of the line after some spaces or tabs.  There were three types of lines – the holding share lock, the waiting exclusive lock, and the many waiting share locks:

LibraryObjectLock:  Address=... Handle=0x5196c8908 Mode=S ...
LibraryObjectLock:  Address=... Handle=0x5196c8908 RequestMode=X ...
LibraryObjectLock:  Address=... Handle=0x5196c8908 RequestMode=S ...

The “…” indicates places I edited out other details.  The handle 0x5196c8908 identifies the table being locked.  The “Mode=S” string indicates a successful share lock of that table by the session with the long parse time.  The “RequestMode=X” was from the stats job trying to get exclusive access to the table.  The “RequestMode=S” was all the other sessions trying to get shared access to the table waiting for stats to first get exclusive access.  Anyway, I just wanted to translate what Oracle support told me into something that might be useful to others.  Plus I want to remember it myself!




Categories: DBA Blogs

Alerting on plans that change for the worse

Fri, 2014-11-14 16:45

I’ve uploaded a monitoring script that I have worked on: zip.

The script alerts you when the optimizer runs a SQL statement with a potentially new and inefficient plan so you can intervene.  This script improves upon my earlier script which only alerts you to SQL statements running with new plans.  The new script compares the average elapsed time of the current plan with the average of the most often executed plan.  If the new plan averages more than ten times the most often executed plan then the script alerts you to a possible new slow plan.  The elapsed time for the current plan comes from V$SQL and the time for the most often executed plan comes from DBA_HIST_SQLSTAT.

Here is an example output from the first test case:

SQL_ID        PLAN_HASH_VALUE
------------- ---------------
a1fw5xjcwkjqx      1357081020

There are two test cases included in the script.  The first test case has a query which uses an index and the plan changes when I drop the index.

If you look at the output for the first test case you see the good plan:

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 268773832

---------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |    58 (100)|
|*  1 |  INDEX RANGE SCAN| TESTI | 16222 |   221K|    58   (0)|
---------------------------------------------------------------

The good plan uses the index.  Here is the bad plan.  Note how the sql_id and plan_hash_value correspond to the output of the monitor script.

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 1357081020

---------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       | 15708 (100)|
|*  1 |  TABLE ACCESS FULL| TEST |  8111 |   110K| 15708   (1)|
---------------------------------------------------------------

I have this running in a cron job on a development server and I plan to put the script into our production server next week and set it up to alert me with an email when the optimizer runs SQL statements with potentially  new and inefficient plans.

– Bobby

 


Categories: DBA Blogs

Check out the new Oracle help web site

Thu, 2014-11-13 17:52

I found a broken link to an Oracle help document in one of my posts and when I went to the Oracle 12c database documentation to find the new URL to put in my post I found that Oracle had totally revamped their online manuals.

Here is a link to the new high level Oracle help site: url

I’ve only looked at it for a few minutes and can’t say whether I like it or not.  You can still download the manuals in PDF format so that remains familiar.  It looks like the new site integrates documentation across most or all of Oracle’s products in a similar format and that’s pretty cool.

Anyway, I just saw this and thought I would pass it along.

– Bobby


Categories: DBA Blogs