Re: Oracle 12.1.0.2, RHEL 7 and XFS issue

From: Stefan Koehler <contact_at_soocs.de>
Date: Thu, 30 Jul 2015 22:29:13 +0200 (CEST)
Message-ID: <1709291619.128220.1438288153314.JavaMail.open-xchange_at_app03.ox.hosteurope.de>



Hi Uwe,

> Do you know of any issues with XFS on Linux 7 with direct I/O?

I am not aware of anything special about XFS and OEL 7 in this regard. FYI, I recently did some benchmarks with XFS on OEL 6.6 (with the help of SLOB) at a client site, and the performance was on par with ASM raw devices.
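
If you want a quick cross-check outside of Oracle, a minimal sketch in Python (my own illustration, not SLOB) can time single-block reads with O_DIRECT. Note that O_DIRECT requires a page-aligned buffer, hence the anonymous mmap, and the sketch falls back to a buffered read on file systems that reject the flag:

```python
import mmap
import os
import time

BLOCK = 8192  # match db_block_size (8k assumed here)

def timed_read(path, block=BLOCK):
    """Time one block-sized read of 'path'.

    Tries O_DIRECT first and falls back to a buffered read if the
    file system refuses it. Returns (bytes_read, latency_us, direct_used).
    """
    buf = mmap.mmap(-1, block)  # page-aligned buffer, as O_DIRECT requires
    for flags, direct in ((os.O_RDONLY | os.O_DIRECT, True),
                          (os.O_RDONLY, False)):
        try:
            fd = os.open(path, flags)
            with os.fdopen(fd, "rb", buffering=0) as f:
                t0 = time.perf_counter()
                n = f.readinto(buf)
                latency_us = (time.perf_counter() - t0) * 1_000_000
            return n, latency_us, direct
        except OSError:
            continue  # e.g. EINVAL: file system does not support O_DIRECT
```

Running this in a loop against a data file on XFS versus another file system would give you raw single-block read latencies to compare with the statspack numbers.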

> Do you have any suggestions how to further track down the issue? E.g., how could I prove there's something wrong with the O_DIRECT calls?

  1. You can track down the I/O histogram to microsecond level with Oracle 12.1.0.2. Luca Canali already created a script for this ( http://db-blog.web.cern.ch/blog/luca-canali/2015-06-event-histogram-metric-and-oracle-12c ).
  2. I cannot see the I/O reference values from your old 11.1 environment. What was the response time there?
  3. Did you use the same amount of LUNs (keyword disk queues) in both envs?
  4. You currently don't know where the "time" is spent, so you have to drill down through the I/O stack (System Call Interface -> Virtual File System -> Block Layer -> SCSI layer -> Device driver). You can use blktrace to dig into the block layer, compare the response times and check whether the time difference originates above, below, or inside the block layer. Frits Hoogland has written a blog post about this ( https://fritshoogland.wordpress.com/2014/11/28/physical-io-on-linux/ ). If it is in the VFS layer, you can use the XFS stats for further information (https://www.kernel.org/doc/Documentation/filesystems/xfs.txt - /proc/fs/xfs/stat).
  5. Compare the response times with the storage capabilities, e.g. what response time should be expected for an 8k request (assuming an 8k block size) from a storage perspective.
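
For points 1 and 2, it helps to bucket whatever raw latencies you collect (from blktrace, strace, or the 12.1.0.2 microsecond histograms) into the same bands as the statspack report, so the old and new environments can be compared side by side. A small sketch of that bucketing (my own illustration, band boundaries taken from the statspack histogram headings):

```python
from bisect import bisect_right

# Upper bounds in microseconds for the statspack-style bands:
# <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
BOUNDS = [1000, 2000, 4000, 8000, 16000, 32000, 1_000_000]
LABELS = ["<1ms", "<2ms", "<4ms", "<8ms", "<16ms", "<32ms", "<=1s", ">1s"]

def histogram(latencies_us):
    """Return {band_label: percentage of waits} for latencies in microseconds."""
    counts = [0] * len(LABELS)
    for us in latencies_us:
        counts[bisect_right(BOUNDS, us)] += 1
    total = len(latencies_us) or 1
    return {lbl: round(100.0 * c / total, 1)
            for lbl, c in zip(LABELS, counts)}
```

Feeding both runs through the same bucketing makes a shift like your 80-90% vs. ~47% "< 1ms" share immediately visible.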

Hope this helps as a start.

P.S.: You mentioned a "new VM" - what kind of VM do you use? There can be nasty side effects there as well if it is not configured and tested correctly.

Best Regards
Stefan Koehler

Freelance Oracle performance consultant and researcher
Homepage: http://www.soocs.de
Twitter: _at_OracleSK

> Uwe Küchler <uwe_at_kuechler.org> wrote on 30 July 2015 at 21:14:
>
>
> Dear fellows of the Oracle,
>
> From Red Hat and Oracle Linux 7 onwards, XFS is the default file system of
> the OS.
> At a customer site, XFS was already the preferred file system, so the
> customer chose to stick to it for a new VM with OL7 and Oracle 12.1.0.2.
>
> But, while testing the migrated database against the old one, most of the
> batch jobs slowed down to at least twice the run time of the 11.1
> environment.
>
> Both statspack reports showed clearly that "db file sequential read" was
> by far the main wait event.
> Top SQL and their explain plans did not differ between the environments.
>
> Research took a while, but to get to the point: It boiled down to the I/O
> response times, as shown in the wait event histogram excerpts below:
>
> With
> - 24 GiB RAM
> - 5 GiB sga_target
> - Buffer Cache: 4,592M
> - "filesystemio_options=NONE":
> Event                     Total Waits  ---------------- % of Waits ----------------
>                                         <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
> ------------------------- -----------  ----- ----- ----- ----- ----- ----- ----- -----
> db file scattered read            836   89.7   1.6    .8   1.8   3.1   1.9   1.1
> db file sequential read          121K   83.8    .7    .8   3.9   6.1   2.8   1.9    .0
>
> 80-90% of those waits < 1ms?
> This can most certainly be attributed to file system caching (no Flash
> Cache, SSD or other smart stuff in place here).
>
> - 24 GiB RAM
> - 8 GiB sga_target
> - Buffer Cache: 5,856M
> - With "filesystemio_options=SETALL":
> Event                     Total Waits  ---------------- % of Waits ----------------
>                                         <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
> ------------------------- -----------  ----- ----- ----- ----- ----- ----- ----- -----
> db file scattered read            63K   42.6   1.9   3.2  19.9  26.0   4.0   2.4
> db file sequential read          208K   47.1   1.6   3.7  19.0  21.9   4.5   2.3    .0
>
> In other batch job runs, the share of waits < 1 ms was even lower (around
> 30%).
> As you can see, I made the SGA / the buffer cache bigger in the 12c
> environment, to allow for more buffering within the SGA.
>
> Of course, I checked MOS for any known issues with direct I/O on XFS in
> this configuration, but haven't found anything so far. Just the usual
> recommendations to avoid double buffering, plus confirmation that XFS is
> capable of doing direct I/O.
>
> And now for my
> QUESTION(s):
> ============
> Do you know of any issues with XFS on Linux 7 with direct I/O?
> Do you have any suggestions how to further track down the issue? E.g., how
> could I prove there's something wrong with the O_DIRECT calls?
>
> Thanks for your time.
> Uwe
>
>
> P.S.: On (a hundred-and-) second thought I could try to enlarge the buffer
> cache even more, as there's enough RAM left. At least for the tests.
>
> P.P.S.: In case Kevin Closson reads this: I am eagerly awaiting your
> upcoming blog article on XFS!
>
> ---
> http://oraculix.com

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 30 2015 - 22:29:13 CEST