
Doug Burns


Recurring Conversations: AWR Intervals (Part 2)

6 hours 18 min ago

(Reminder, just in case we still need it, that the use of the features in this post requires a Diagnostics Pack licence.)

Damn me for taking so long to write blog posts these days. By the time I get around to them, certain very knowledgeable people have commented on part 1 and given the game away! ;-)

I finished the last part by suggesting that a narrow AWR interval makes less sense in a post-10g Diagnostics Pack landscape than it used to when we used Statspack.

Why do people argue for a Statspack/AWR interval of 15 or 30 minutes on important systems? Because when they encounter a performance problem that is happening right now or didn't last for very long in the past, they can drill into a narrower period of time in an attempt to improve the quality of the data available to them and any analysis based on it. (As an aside, I'm sure most of us have generated additional Statspack/AWR snapshots manually to *really* reduce the time scope to what is happening right now on the system, although this is not very smart if you're using AWR and Adaptive Thresholds!)
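For reference, taking a manual snapshot is a one-liner (a minimal sketch using the standard DBMS_WORKLOAD_REPOSITORY package):

    -- Take an ad-hoc AWR snapshot right now, outside the regular schedule.
    EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;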

However, there are better tools for the job these days.

If I have a user complaining about system performance then I would ideally want to narrow down the scope of the performance metrics to that user’s activity over the period of time they’re experiencing a slow-down. That can be a little difficult on modern systems that use complex connection pools, though. Which session should I trace? How do I capture what has already happened as well as what’s happening right now? Fortunately, if I’ve already paid for Diagnostics Pack then I have *Active Session History* at my disposal, constantly recording snapshots of information for all active sessions. In which case, why not look at

- The session or sessions of interest (which could also be *all* active sessions if I suspect a system-wide issue)
- For the short period of time I’m interested in
- To see what they’re actually doing

Rather than running a system-wide report for a 15-minute interval that aggregates the data I'm interested in with other irrelevant data? (To say nothing of having to wait for the next AWR snapshot or take a manual one and screwing up the regular AWR intervals ...)
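To make that concrete, here's a minimal ASH sketch; the session ID and time window are hypothetical values you'd substitute for the problem at hand:

    -- Summarise what one session has spent its time on over the last
    -- 30 minutes, sampled from the in-memory ASH buffer.
    -- (session_id 123 is a hypothetical example value.)
    SELECT NVL(event, 'ON CPU') AS activity,
           COUNT(*)             AS samples
    FROM   v$active_session_history
    WHERE  session_id = 123
    AND    sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE
    GROUP BY NVL(event, 'ON CPU')
    ORDER BY samples DESC;

Drop the session_id predicate and you're looking at the whole system; swap in sql_id, module or action predicates instead to narrow the scope along a different dimension.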

When analysing system performance, it's important to use the most appropriate tool for the job and, in particular, focus your data collection on what is *relevant to the problem under investigation*. The beauty of ASH is that if I'm not sure what *is* relevant yet, I can start with a wide scope of all sessions to help me find the session or sessions of interest and gradually narrow my focus. It has the history that AWR has, but with finer granularity of scope (whether that be sessions, SQL statements, modules, actions or one of the many other ASH dimensions). Better still, if the issue turns out to be one long-running SQL statement, then a SQL Monitoring Active Report probably blows all the other tools out of the water!
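Generating that active report doesn't need Enterprise Manager, either. A sketch, with a hypothetical sql_id; spool the output to an .html file and open it in a browser:

    -- Generate a SQL Monitoring Active Report for a single statement.
    -- ('abc123xyz0001' is a hypothetical sql_id.)
    SET LONG 1000000 LONGCHUNKSIZE 1000000 PAGESIZE 0 TRIMSPOOL ON
    SPOOL sqlmon_active.html
    SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
             sql_id => 'abc123xyz0001',
             type   => 'ACTIVE')
    FROM   dual;
    SPOOL OFF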

With all that capability, why are experienced people still so obsessed with the Top 5 Timed Events section of an AWR report as one of their first points of reference? Is it just because they've become attached to it over the years of using Statspack? AWR has its uses (see JB's comments for some thoughts on that and I've blogged about it extensively in the past) but analysing specific performance issues on Production databases is not its strength. In fact, if we're going to use AWR, why not just use ADDM and let the software automatically perform the same type of analysis most DBAs would do anyway (and, in many cases, wouldn't do as well!)?
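Running ADDM against a snapshot pair is straightforward from SQL*Plus, too. A sketch using the DBMS_ADDM package (11g onwards); the snapshot IDs 100 and 101 are hypothetical:

    -- Analyse the interval between two AWR snapshots with ADDM.
    VARIABLE tname VARCHAR2(60)
    BEGIN
      :tname := 'ADDM_100_101';
      DBMS_ADDM.ANALYZE_DB(:tname, 100, 101);
    END;
    /
    -- Print the findings (SET LONG first so the CLOB isn't truncated).
    SET LONG 1000000
    SELECT DBMS_ADDM.GET_REPORT(:tname) FROM dual;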

Remember, there's a reason behind these Recurring Conversations posts. If I didn't keep finding myself debating these issues with experienced Oracle techies, I wouldn't harbour doubts about what seem to be common approaches. In this case, I still think there are far too many people using AWR where ASH or SQL Monitoring are far more appropriate tools. I also think that if we stick with a one-hour interval rather than a 15-minute interval, we can retain four times as much *history* in the same space! When it comes to AWR – give me long retention over a shorter interval every time!
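In configuration terms, that's something like this sketch (both parameters are specified in minutes, and the 35 days is just the illustrative minimum mentioned in Part 1):

    BEGIN
      DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
        retention => 50400,   -- 35 days, expressed in minutes
        interval  => 60);     -- keep the default one-hour interval
    END;
    /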

P.S. As well as thanking JB for his usual insightful comments, I also want to thank Martin Paul Nash. When I was giving an AWR/ASH presentation at this spring's OUGN conference, he noticed the bullet point I had on the slide suggesting that we *shouldn't* change the AWR interval and asked why. Rather than going into it at the time, I asked him to remind me at the end of the presentation and then, because I had no time to answer, I promised I'd be blogging about it that weekend. That was almost 4 months ago! Sigh. But at least I got there in the end! ;-)

Recurring Conversations: AWR Intervals (Part 1)

Mon, 2014-07-07 07:36
I've seen plenty of blog posts and discussions over the years about the need to increase the AWR retention period beyond its default value of 8 days. Experienced Oracle folk understand how useful it is to have a longer history of performance metrics to cover an entire workload period so that we can, for example, compare the acceptable performance of the last month-end batch processes to the living hell of the current month end. You'll often hear a suggested minimum of 35-42 days and I could make good arguments for even more history for trending and capacity management.

That subject has been covered well enough, in my opinion. (To pick one example, this post and its comments are around 5 years old.) Diagnostics Pack customers should almost always increase the default AWR retention period for important systems, even allowing for any additional space required in the SYSAUX tablespace.
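Checking what a given system is currently using is a one-liner (the defaults come back as an 8-day retention and a one-hour snapshot interval):

    -- Current AWR snapshot interval and retention for this database.
    SELECT snap_interval, retention
    FROM   dba_hist_wr_control;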

However, I've found myself talking about the best default AWR snapshot *interval* several times over recent months and years, realising each time that I'm slightly out of step with the prevailing wisdom on the subject, so let's talk about intervals.

I'll kick off by saying that I think people should stick to the default one-hour interval, rather than the 15- or 30-minute intervals that most of my peers seem to want. Let me explain why.

Initially I was influenced by some of the performance guys working in Oracle and I remember being surprised by their insistence that one hour is a good interval, which is why they picked it. Hold on, though - doesn't everyone know that a one-hour AWR report smooths out detail too much?

Then I got into some discussions about Adaptive Thresholds and it started to make more sense. If you want to compare performance metrics over time and trigger alerts automatically based on apparently unusual performance events or workload profiles, then comparing specific hours today to specific hours a month ago makes more sense than getting down to 15-minute intervals, which would be far too sensitive to subtle changes. Adaptive Thresholds would become barking mad if the interval granularity was too fine. But when it turned out that hardly anybody used Adaptive Thresholds, even though they seemed like a good idea (sorry, JB ;-)), this argument started to make less sense to me.

However, I still think there are very solid reasons to stick to one hour, and they make more sense when you understand all of the metrics and analysis tools at your disposal and treat them as a toolbox, with different tools appropriate to different problems.

Let's go back to why people think that a one-hour interval is too long. The problem with AWR, Statspack and bstat/estat is that they are system-wide reporting tools that capture the differences (or deltas) between the values of various metrics over a given interval. At least a couple of problems with that approach come to mind.

1) Although this is a bit of a simplification, almost all of the metrics are system-wide, which makes them a poor data source for analysing an individual user's performance experience or an individual batch job, because systems generally have a mixture of different activities running concurrently. (Benchmarks and load tests are notable exceptions.)

2) Problem 1 becomes worse when you are looking at *all* of the activity that occurred over a given period of time (the AWR interval), condensed into a single data set or report. The longer the AWR period you report on, the more useless the data becomes. What use is an AWR report covering a one-week period? So much has happened during that time and we might only be interested in what was happening at 2:13 am this morning.

In other words, AWR reports combine a wide activity scope (everything on the system) with a wide time scope (hours or days if generated without thought). Intelligent performance folks reduce the impact of the latter problem by narrowing the time scope and reducing the snapshot interval so that if a problem has just happened or is happening right now, they can focus on the right 15 minutes of activity¹.

Which makes complete sense in the Statspack world they grew up in, but makes a lot less sense since Oracle 10g was released in 2004! These days there are probably better tools for what you're trying to achieve.

But, as this post is already getting pretty long, I'll leave that for Part 2.

¹ The natural endpoint to this narrowing of time scope is when people use tools like Swingbench for load testing and select the option to generate AWR snapshots immediately before and after the test they're running. Any AWR report of that interval will only contain the relevant information if the test is the only thing running on the system. At last year's Openworld, Graham Wood and I also covered the narrowing of the activity scope by, for example, running the AWR SQL report (awrsqrpt.sql) to limit the report to a single SQL statement of interest. It's easy for people to forget - it's a *suite* of tools and worth knowing the full range so that you pick the appropriate one for the problem at hand.
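That single-statement narrowing can be scripted too. A sketch via the pipelined report function; all of the IDs are hypothetical values you'd look up first (dbid from v$database, instance_number from v$instance, snapshot IDs from dba_hist_snapshot):

    -- Text AWR SQL report for one statement over one snapshot pair.
    SELECT output
    FROM   TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_SQL_REPORT_TEXT(
             1234567890,         -- hypothetical dbid
             1,                  -- instance number
             100,                -- begin snapshot ID (hypothetical)
             101,                -- end snapshot ID (hypothetical)
             'abc123xyz0001'));  -- the sql_id of interest (hypothetical)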

OUG Scotland

Wed, 2014-05-28 03:41
Wow, it's that time of year again already and the largest free independent Oracle conference is approaching. Yep, it costs absolutely nothing to attend OUG Scotland and you don't need to be a member of UKOUG. Just bring your brain along with you and enjoy what looks like a stonkingly good agenda, filled with some of the best speakers out there. I had a great time last year and would like to have presented this year but ... erm ... I have a few things going on at the moment.

As always, kudos to Thomas Presslie and all the good people in the UKOUG office who work so hard putting this together.