AWR showing IO imbalance
Date: Mon, 6 Sep 2021 23:22:16 +0530
Message-ID: <CAKna9VZg+xT0SWKkXb1FffcAaT=CRW3xen4CekvU_fopiAqZQQ_at_mail.gmail.com>
It got bounced back, so resending...
Hello Listers, This is an X5 half-rack Exadata machine. A few times we have noticed a job running longer than usual even though the execution path of the underlying SQLs has not changed. The top wait event for all of those SQLs is 'cell single block physical read'. Looking at the Exadata section of the AWR report, we find the following.
The performance summary section of the Exadata stats shows ~90% flash cache hits, which we expect, because a 'cell single block physical read' should be served from flash cache when the block is not found in the buffer cache. Also, "top IO reason by request" in the Exadata stats shows 'buffer cache reads' as the second-highest consumer of IO on each cell node, which also looks as expected.
But the "Exadata OS IO statistics Top cells" section appears to contradict "Exadata OS IO statistics Top disks". The Top cells section shows no saturation of either the flash or the hard disks on any cell node. We do see that the overall number of 'cell single block physical read' requests is four times higher than at normal times, so we would expect them to be served with a similar response time as long as none of the flash/hard disks is running out of capacity. However, in the Top disks section, one hard disk in a cell node is serving significantly more requests than the others and is running beyond its capacity, as flagged by the "*" symbol and red colour in the AWR report (see the attachment), while during the same period the other hard disks on the cell nodes appear lightly loaded. So I would like to understand whether this is normal behaviour, or whether we are hitting a problem with balancing IO across all the hard disks, perhaps because of a wrong cell disk configuration or a hardware issue?
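To make the imbalance check concrete, here is a minimal Python sketch of how I would compare each hard disk against its peers on the same cell. The disk names and IOPS figures are made up for illustration (they are not taken from our AWR), and the 2x threshold is an arbitrary assumption, not an Oracle-defined limit.

```python
from statistics import mean

# Hypothetical per-disk IOPS for the 12 hard disks of one cell.
# Replace with the figures from the "Top disks" section; these are invented.
disk_iops = {f"disk_{i:02d}": 50.0 for i in range(12)}
disk_iops["disk_07"] = 200.0  # the one disk that looks overloaded


def find_hot_disks(iops_by_disk, ratio_threshold=2.0):
    """Return {disk: ratio} for disks whose IOPS are at least
    ratio_threshold times the mean IOPS of the other disks."""
    hot = {}
    for name, iops in iops_by_disk.items():
        others = [v for k, v in iops_by_disk.items() if k != name]
        if not others:
            continue  # a single disk has nothing to be compared against
        baseline = mean(others)
        if baseline > 0 and iops / baseline >= ratio_threshold:
            hot[name] = round(iops / baseline, 2)
    return hot


if __name__ == "__main__":
    for disk, ratio in find_hot_disks(disk_iops).items():
        print(f"{disk}: {ratio}x the average of the other disks")
```

With evenly balanced IO no disk should stand out by a large factor, so a single disk showing several times the load of its peers is the pattern I am asking about.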
And in such a scenario, if single block reads start being served from hard disk, they can be as slow as the response time of the slowest hard disk. Please correct me if I am wrong.
Another question: since we are simply seeing more 'buffer cache reads', my understanding is that when a block is not available in the buffer cache, the request goes to the storage layer, where it is first looked up in flash cache and, if not found there, served from hard disk. So is it correct to interpret that buffer cache reads are faster than flash cache reads, which in turn are faster than hard disk reads?
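A small Python sketch of the reasoning behind this question: the average single block read time is a weighted mix of flash and hard disk latency. The ~90% flash hit ratio is from our AWR; the latency values of 0.5 ms for flash and 8 ms or 40 ms for hard disk are purely illustrative assumptions, not measurements from our system.

```python
def avg_single_block_read_ms(flash_hit_ratio, flash_ms, disk_ms):
    """Weighted average latency of reads that missed the buffer cache:
    a fraction is served from flash cache, the rest from hard disk."""
    if not 0.0 <= flash_hit_ratio <= 1.0:
        raise ValueError("flash_hit_ratio must be between 0 and 1")
    return flash_hit_ratio * flash_ms + (1.0 - flash_hit_ratio) * disk_ms


if __name__ == "__main__":
    # Assumed latencies (illustrative only): flash 0.5 ms, healthy disk 8 ms,
    # overloaded disk 40 ms because requests queue up behind each other.
    healthy = avg_single_block_read_ms(0.90, flash_ms=0.5, disk_ms=8.0)
    overloaded = avg_single_block_read_ms(0.90, flash_ms=0.5, disk_ms=40.0)
    print(f"healthy disk:    {healthy:.2f} ms average")
    print(f"overloaded disk: {overloaded:.2f} ms average")
```

Under these assumed numbers the average moves from about 1.25 ms to about 4.45 ms although the flash hit ratio is unchanged, which is why even the 10% of reads landing on one overloaded hard disk could explain the longer job runtime.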
Attached are the relevant parts of the AWR report, along with screenshots of the section showing one of the hard disks running beyond its maximum capacity.
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet attachment: Selective_Exastats_Stats_Section.xlsx