Re: Performance issue - high Free buff wait
Date: Sun, 17 Jan 2021 16:00:55 +0000
Message-ID: <CAGtsp8kUe+bD-4X0pmgDqbx3H_WiEY2ea+DpJxf9sm5X-qUVVw_at_mail.gmail.com>
- No.
- No.
Going back to first principles - typical Unix systems time-slice CPUs 1/100
of a second at a time, and OLTP systems do tiny amounts of work that are
likely to take far less than 1/100 of a second if they're written
efficiently. So a well-written OLTP system can probably support a few tens
of user sessions per (real) CPU, because most of the sessions will spend
most of their time not doing anything.
If, however, you have 64 CPUs and 64 sessions which are trying to model
protein folding images from a chemical formula then you've allocated too
many sessions because every single one of them will want 100% of a CPU 100%
of the time and you've got no CPU left for anything else. More
realistically, if you have a couple of sessions which are trying to do
something resource intensive like running a query over the last 2 years of
sales data to produce a one page summary of customer trends then they are
probably taking out one CPU per session (or N CPUs each if you run the
queries at parallel degree N). So rule 1 is: "know your application".
Now take a look at the two bits of AWR you first posted.
1) The average single block read time in BOTH time periods is around 1ms -
which means it's coming from cache or flash; so it's likely to be more CPU
than I/O time.
2) The number of db block changes in the heavy period is about 60,000 per
SECOND, compared to about 11,000 in the other period. (Halve these figures,
since roughly half will be for undo records going to pinned blocks.) So in
the busy time period you are potentially making 30,000 blocks dirty per
SECOND and will need to write them out, which is about 240MB per second,
and your SGA is only 24GB, so you may be trying to recycle your buffer
cache incredibly quickly.
3) As a side effect of the db block changes you are generating 12.7MB of
redo per second compared to 2.8MB per second. In a production system not
only does that have to be written to at least 2 files, it has to be re-read
and copied to an archived log destination.
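The arithmetic behind point 2, assuming the common 8KB database block size (the block size isn't actually stated in the thread):

```python
# Write bandwidth implied by the busy-period AWR figures.
# Assumes an 8KB database block size (not confirmed in this thread).

block_changes_per_sec = 60_000                      # from the AWR report
dirty_blocks_per_sec = block_changes_per_sec // 2   # half are undo changes
block_size = 8192                                   # bytes (assumed)

bytes_per_sec = dirty_blocks_per_sec * block_size
print(f"~{dirty_blocks_per_sec:,} dirty blocks/s "
      f"=> ~{bytes_per_sec / 1024**2:.0f} MB/s")    # the "about 240MB" above
```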
Combining 2 and 3, it's not surprising that the rate of data change means
you have a huge amount of conflict in (a) writing data, (b) finding free
buffers to read more data, and (c) waiting for other users to read the
data you want to read.
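Putting rough numbers on that combined load (assuming exactly two online log members and a single archive destination - the thread only says "at least 2 files"):

```python
# Total I/O rate implied by points 2 and 3 together.
# Assumes two online redo log members and one archive destination.

data_write_mb = 240.0      # dirty data blocks per second (point 2)
redo_mb = 12.7             # redo generated per second (point 3)

online_redo_writes = redo_mb * 2          # one write per log member
archive_traffic = redo_mb + redo_mb       # re-read + write to archive dest

total_mb = data_write_mb + online_redo_writes + archive_traffic
print(f"~{total_mb:.1f} MB/s of I/O just to keep up")   # ~290.8 MB/s
```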
It's not easy to determine cause and effect after this - but the rate at
which you work in that 15 minutes is simply overloading the system. So the
first thing I'd do is try to find out what that "64 CPUs / 64 Cores" really
means. HP-UX is one of the less well-known operating systems, so any
comment you hear about "unix systems" might be very misleading - but to me
it looks as if you may not really have 64 CPUs. Then I'd ask about the I/O
configuration - I know HP-UX is a bit unusual here, but what are the
settings to avoid double caching? You have 512GB of RAM and assign 24GB to
this SGA - are you "wasting" a lot of memory (and CPU) managing a
file-system cache? If not, how hard are the other applications / databases
on this machine working during this period?
The other thing that stands out as a separate anomaly is the parsing -
which ends up with lots of "library cache: mutex X". The counts are high
in both cases and worth looking into, but very expensive in the worst
case. How many of your "parse count" turn into "parse count (hard)", and
what does the time model show as the distribution of parse time? At a guess
I'd say that in this interval you kick off several processes (maybe around
10) that are all trying to optimize the same statements at the same time -
which leads to the "mutex X" waits (the rough idea is that 9 out of 10
would be waiting while the 10th does the optimising) - and maybe there are
a lot of statements which need a lot of parsing in an environment that is
already generating a lot of CPU conflict. Note that your EXECUTE count
jumps from 6,400 per second to 13,200 per second - you mentioned single-row
processing; the numbers suggest that might be the case, but also that a lot
of the statements use literal values, and several sessions may be trying to
parse and execute the same statement at the same time.
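A toy illustration of why literal values inflate hard parsing - every distinct SQL text needs its own optimisation, while a bind variable collapses them all into one shared statement (pure Python, with a made-up table and column; no Oracle involved):

```python
# 1,000 single-row lookups: literals produce 1,000 distinct SQL texts
# (each a candidate for its own hard parse); a bind variable produces one.
# The table/column names here are hypothetical.

ids = range(1000)

with_literals = {f"SELECT val FROM t WHERE id = {i}" for i in ids}
with_binds = {"SELECT val FROM t WHERE id = :id" for _ in ids}

print(len(with_literals))   # 1000 distinct cursors to optimise
print(len(with_binds))      # 1 shared cursor
```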
Summary:
Check your system and how it's configured.
Understand what the application is doing in these 15 minutes.
Then worry about getting the same job done with less work and
(particularly) less conflict.

Regards
Jonathan Lewis

On Sun, 17 Jan 2021 at 11:18, Pap <oracle.developer35_at_gmail.com> wrote:
> Basically I have two doubts here,
>
> As it seems for 15minutes period we are pushing the System beyond its
> capacity. So considering we have 64 core CPU here , should we only allow
> max 64 active session at any given point from application? And then
> considering we have 4 app server we need to set max session limit to 16
> from each app server?
>
> Another doubt I had was if the dB writer response time of 100ms during
> that issue period is the cause or the result of slowness? If it's the cause
> can asynch IO setup for data file help improve situation here? Or by just
> making row by row operations converted to batch will be enough ?
>
> Regards
> Pap
>
>
--
http://www.freelists.org/webpage/oracle-l
Received on Sun Jan 17 2021 - 17:00:55 CET