Re: AWR showing IO imbalance
Date: Tue, 7 Sep 2021 00:23:21 +0530
Message-ID: <CAEjw_fgYEwjMiG7=+g_ha3+OxM5msCeBDqTSWYmxpfwjGKT5gw_at_mail.gmail.com>
I don't see any screenshot of the IO-stats discrepancy you described, or of the section where the disk reaches its maximum capacity.
On Mon, Sep 6, 2021 at 11:22 PM Lok P <loknath.73_at_gmail.com> wrote:
> It got bounced back, so resending...
>
> Hello Listers, this is an X5 half-rack Exadata machine. A few times we
> have noticed a job running longer even though the execution plans of the
> underlying SQL statements did not change. For all of those statements the
> top wait event is 'cell single block physical read'. Looking into the
> Exadata section of the AWR report, we found the following.
>
> The performance summary section of the Exadata statistics shows ~90% flash
> cache hits. We expected this, because a 'cell single block physical read'
> should be served from flash cache when the block is not found in the
> buffer cache. The "top IO reason by request" part of the Exadata
> statistics also shows 'buffer cache reads' as the second-highest consumer
> of IO on each cell node, which also looks expected.
>
> But the "Exadata OS IO statistics Top cells" section seems to contradict
> the "Exadata OS IO statistics Top disks" section. Top cells shows no
> saturation of either the flash or the hard disks on any cell node. We do
> see about four times as many cell single block physical read requests as
> at normal times, so we would expect them to be served with similar
> response times as long as none of the flash or hard disks ran out of
> capacity. However, the "Exadata OS IO statistics Top disks" section shows
> one hard disk on a cell node serving significantly more requests than the
> others and exceeding its capacity, as flagged by the "*" symbol and red
> colour in the AWR report (see the attached screenshots), while the other
> hard disks on the cell nodes were not very busy during the same period.
> Is this normal behaviour, or are we hitting a problem balancing IO across
> the hard disks, perhaps because of a wrong cell disk configuration or a
> hardware issue?
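One way to put a number on the skew the "Top disks" section shows is to compare each disk's request count against the median for its cell. The sketch below assumes you have copied per-disk request counts out of the AWR report by hand; the function name `flag_hot_disks`, the 3x threshold and the disk names and counts are all hypothetical and only illustrate the idea. This is a simple heuristic, not an Oracle or AWR rule.

```python
from statistics import median

def flag_hot_disks(requests_per_disk, ratio=3.0):
    """Return the disks whose request count exceeds `ratio` times the
    median request count of all disks on the same cell.

    Hypothetical skew heuristic for illustration; not an AWR rule.
    """
    typical = median(requests_per_disk.values())
    return sorted(
        disk
        for disk, reqs in requests_per_disk.items()
        if typical > 0 and reqs > ratio * typical
    )

# Hypothetical per-disk request counts for one cell over one AWR interval.
cell_requests = {
    "CD_00": 1200, "CD_01": 1150, "CD_02": 9800,
    "CD_03": 1300, "CD_04": 1250, "CD_05": 1180,
}

print(flag_hot_disks(cell_requests))  # ['CD_02']
```

Using the median rather than the mean keeps the hot disk itself from inflating the baseline it is compared against.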
>
> And in such a scenario, once single block reads start being served from
> hard disk, can they be as slow as the slowest hard disk's response?
> Please correct me if I'm wrong.
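A rough way to reason about this question is to weight each disk's average latency by its share of the requests. The sketch assumes that each single block read is served by exactly one disk, the one holding that block, and it ignores ASM mirroring and preferred-read behaviour. The request counts and latencies are made-up numbers, not values from this AWR report, and `mean_read_latency_ms` is a hypothetical helper.

```python
def mean_read_latency_ms(disks):
    """Request-weighted mean latency across disks.

    `disks` maps a disk name to (requests, avg_latency_ms). Assumes each
    read is served by a single disk and therefore sees that disk's
    average latency.
    """
    total = sum(reqs for reqs, _ in disks.values())
    return sum(reqs * lat for reqs, lat in disks.values()) / total

# Hypothetical: five lightly loaded disks plus one saturated disk.
skewed = {f"CD_0{i}": (1000, 10.0) for i in range(5)}
skewed["CD_05"] = (5000, 50.0)

# The same 10,000 requests spread evenly at the unloaded latency.
balanced = {f"CD_0{i}": (10000 / 6, 10.0) for i in range(6)}

print(round(mean_read_latency_ms(skewed), 1))    # 30.0
print(round(mean_read_latency_ms(balanced), 1))  # 10.0
```

Under these assumptions, a read that lands on the saturated disk waits as long as that disk takes, and the job-level average sits between the fast disks and the slowest one, closer to the slowest the larger its share of requests.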
>
> Another question: since we are seeing higher 'buffer cache reads', my
> understanding is that if a block is not available in the buffer cache,
> the storage layer is searched, where the read is first served from flash
> and, if the block is not found there, from hard disk. So is it correct to
> say that buffer cache reads are faster than flash cache reads, which in
> turn are faster than hard disk reads?
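The lookup order described above can be written as an expected-latency formula: a block request is satisfied from the buffer cache with probability `bc_hit`; otherwise it goes to storage, where flash serves a fraction `fc_hit` and hard disk serves the rest. In the sketch below, only the ~90% flash hit ratio comes from the report described above. The 0.5 ms flash and 8 ms hard disk latencies are assumed placeholder values, and the relative speed of the tiers is an input to the formula, not something the code establishes.

```python
def expected_read_latency_ms(bc_hit, fc_hit, t_bc, t_fc, t_hd):
    """Expected latency per block request through a three-tier path:
    buffer cache first, then flash cache, then hard disk.

    All hit ratios are in [0, 1]; all latencies are in milliseconds and
    are assumed inputs, not measured Exadata figures.
    """
    storage = fc_hit * t_fc + (1.0 - fc_hit) * t_hd
    return bc_hit * t_bc + (1.0 - bc_hit) * storage

# 'cell single block physical read' waits are buffer cache misses, so
# bc_hit = 0 here. Flash/disk latencies are hypothetical placeholders.
at_90 = expected_read_latency_ms(0.0, 0.90, 0.0, 0.5, 8.0)
at_60 = expected_read_latency_ms(0.0, 0.60, 0.0, 0.5, 8.0)
print(round(at_90, 2))  # 1.25
print(round(at_60, 2))  # 3.5
```

With these assumed latencies, the 10% of storage reads that miss flash contribute more to the average than the 90% that hit it, which is why the behaviour of the hard disks still matters at a high flash hit ratio.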
>
> Attached are the relevant parts of the AWR report, and screenshots of the
> section showing one of the hard disks exceeding its maximum capacity.
-- 
http://www.freelists.org/webpage/oracle-l
Received on Mon Sep 06 2021 - 20:53:21 CEST