RE: System stats

From: Stefan Koehler <contact_at_soocs.de>
Date: Mon, 15 Apr 2019 11:29:51 +0200 (CEST)
Message-ID: <1424937505.175578.1555320591702_at_ox.hosteurope.de>


Hello Jay,
Well, it depends on the kind of storage you have (SCSI, iSCSI, NFS, etc.).

However, you can do a lot of analysis at the OS level to prove that the problem is not database, OS or host related. A good starting point is good old iostat ( https://bartsjerps.com/2011/03/04/io-bottleneck-linux/ ) - be aware, however, that svctm may not be reliable depending on your storage.
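To illustrate what iostat reports, here is a minimal Python sketch that derives an await-style average latency and a utilization figure from two samples of /proc/diskstats, the kernel counters iostat reads on Linux. The function names (parse_diskstats, await_and_util, sample) and the 5-second interval are made up for this example; it is a rough sketch under those assumptions, not a replacement for iostat.

```python
import time


def parse_diskstats(text):
    """Parse /proc/diskstats content into {device: (ios, io_ms, busy_ms)}.

    Field layout per line: major minor name reads reads_merged sectors_read
    ms_reading writes writes_merged sectors_written ms_writing
    ios_in_progress ms_doing_io weighted_ms_doing_io [...]
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        busy_ms = int(fields[12])
        stats[fields[2]] = (reads + writes, read_ms + write_ms, busy_ms)
    return stats


def await_and_util(before, after, interval_ms):
    """Return {device: (avg_wait_ms, util_percent)} between two samples."""
    result = {}
    for dev, (ios1, ms1, busy1) in after.items():
        if dev not in before:
            continue
        ios0, ms0, busy0 = before[dev]
        completed = ios1 - ios0
        # Average time per completed request, including time spent queued
        # in the block layer (this is why it can exceed what the array sees).
        avg_wait = (ms1 - ms0) / completed if completed else 0.0
        util = 100.0 * (busy1 - busy0) / interval_ms
        result[dev] = (avg_wait, util)
    return result


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two samples interval_s apart and compute the per-device figures."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval_s)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return await_and_util(before, after, interval_s * 1000.0)
```

Calling sample() on a Linux host returns one (average wait in ms, utilization in %) pair per block device. Logging these values periodically would give a host-side latency history to compare against what the storage team measures.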

Further analysis can be done with blktrace ( e.g. https://www.duo.uio.no/bitstream/handle/10852/10099/CarlHenrikLunde.pdf?sequence=1&isAllowed=y ) and so on.

Please be aware of a common pattern: Oracle DBAs claim that the storage is slow (because they notice high I/O latency in AWR, extended SQL trace, etc.), while the storage guys really don't see any high latency on their subsystem, because the problem sits in between the two layers (e.g. I/O scheduler, I/O device queues, etc.). You can figure this out with the previously mentioned tools.
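As a first look at that in-between layer, the following Python sketch reads the active I/O scheduler and the request queue size of a block device from sysfs. It assumes the standard Linux layout /sys/block/<device>/queue/; the function names (active_scheduler, queue_settings) are hypothetical and chosen only for this illustration.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the active scheduler from a sysfs 'scheduler' line.

    The kernel marks the active one with brackets, e.g. 'noop deadline [cfq]'.
    If nothing is bracketed, the stripped text is returned as-is (or None
    when the line is empty).
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip() or None


def queue_settings(device, sysfs_root="/sys/block"):
    """Read scheduler and nr_requests for a device such as 'sda'."""
    queue = Path(sysfs_root) / device / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "nr_requests": int((queue / "nr_requests").read_text()),
    }
```

For example, queue_settings("sda") returns a small dictionary with both values for that device. Comparing these settings across the affected hosts is a cheap check to run before digging into blktrace.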

P.S.: It is kind of amusing that NetApp is praised as the holy grail in this mail thread, because it was NetApp that withdrew their performance analysis tool (if I remember correctly it was/is called LaTX or so) from public availability. You can still capture the perf data, but you need to upload it to NetApp afterwards for analysis.

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Twitter: _at_OracleSK

> dmarc-noreply_at_freelists.org wrote on 12 April 2019 at 16:23:
>
> Can you recommend any sort of monitoring to identify when a SAN is getting overloaded? In our case it only became apparent when an app started experiencing latency at the same time for 5-10 minutes every day and we tracked it down to a batch job which was running on an entirely different cluster but which shared the same storage unit. Storage denied it was their problem right up until the point we proved it was.
>
> It would have been nice to have known that before the problems started showing up. Getting a new storage unit is a slow process.
>
> Jay Miller
> Sr. Oracle DBA
> 201.369.8355

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 15 2019 - 11:29:51 CEST
