
Re: Moving db to linux

From: Nuno Souto <dbvision_at_optusnet.com.au>
Date: Sun, 29 Feb 2004 17:48:10 +1100
Message-ID: <006601c3fe90$03113170$9b00a8c0@dcs001>

> I get EINVAL for the next I/O operation. Exactly the same thing happens
> to Oracle 10 when I set the DIRECTIO flag. I did strace of CKPT
> process, and that was precisely what I saw. I chose CKPT because

Bugger!

> > The problem is with your test case. The buffer passed to read and write
> > must be aligned to the file system's block size. From the open(2) man
> > page:
> >
> > "Transfer sizes, and the alignment of user buffer and file offset must
> > all be multiples of the logical block size of the file system."
> >
> > Attached is a modified version of the test case which succeeds on the
> > 2.4.25 kernel.
>
> The modified version is the one that you saw earlier.

Very much so. Thanks for sharing that. I'll bet someone at Oracle forgot about the alignment requirement when building the Linux port. Alignment can be arranged either at build time via a compiler option or at run time through the allocator (for example memalign instead of plain malloc). Either they forgot, or they used a fixed alignment that you'll have to guess. Try 1K, 2K, 4K and 8K for your f/s block size and Oracle DIO might (just) work fine with one of them...
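
The rule quoted from the open(2) man page can be written down as a small check. This is only a sketch in Python, not anything Oracle or the kernel actually runs: the names dio_request_ok and usable_block_sizes, and the sample address, offset and size, are made up for illustration. It tries the 1/2/4/8K candidates against one request.

```python
# Sketch of the open(2) rule for direct I/O: buffer address, file offset
# and transfer size must all be multiples of the block size.
# dio_request_ok, usable_block_sizes and the sample numbers are hypothetical.

CANDIDATE_BLOCK_SIZES = (1024, 2048, 4096, 8192)


def dio_request_ok(buf_addr, file_offset, length, block_size):
    """Return True if all three quantities are multiples of block_size."""
    return (
        buf_addr % block_size == 0
        and file_offset % block_size == 0
        and length % block_size == 0
    )


def usable_block_sizes(buf_addr, file_offset, length):
    """List the candidate block sizes this request would satisfy."""
    return [
        size
        for size in CANDIDATE_BLOCK_SIZES
        if dio_request_ok(buf_addr, file_offset, length, size)
    ]


# A buffer at a 2K boundary (but not a 4K one), reading 8K at offset 16K.
sample = usable_block_sizes(0x10800, 16384, 8192)
```

Here sample comes out as [1024, 2048]: by this rule the same request passes on a 1K or 2K file system and is rejected on a 4K or 8K one, purely because of where the buffer starts.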

Why does DIO need buffers aligned in memory to the f/s block size? Because with DIO the disk controller's DMA transfers data DIRECTLY to and from the buffer cache, not via an interim fixed buffer that then gets copied somewhere else.

And I/O controllers do not have the same number of address lines as normal memory. For example, they NEVER need to address anything smaller than 512 bytes (2**9) because no disk currently exists that can read or write less than that! And their size counter works in increments of 512 for the same reason.
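
To make the 512-byte arithmetic concrete, here is a small sketch. It assumes the classic 512-byte sector (512 is 2**9); the names SECTOR_SHIFT, is_sector_aligned and to_sectors are mine, not taken from any controller's documentation.

```python
# 512-byte sector arithmetic. 512 == 2**9, so a sector-aligned value has
# its low 9 bits clear, and a byte count becomes a sector count by
# shifting right 9 bits. All names here are hypothetical.

SECTOR_SHIFT = 9
SECTOR_SIZE = 1 << SECTOR_SHIFT  # 512 bytes


def is_sector_aligned(value):
    """True when the low 9 bits of value are all zero."""
    return value & (SECTOR_SIZE - 1) == 0


def to_sectors(nbytes):
    """Convert a byte count to whole sectors; reject partial sectors."""
    if not is_sector_aligned(nbytes):
        raise ValueError("transfer size is not a multiple of 512 bytes")
    return nbytes >> SECTOR_SHIFT


sectors_in_4k = to_sectors(4096)
```

A 4K transfer is 8 sectors, and a counter that only counts sectors simply has no way to express "4K plus 100 bytes".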

So, when a disk controller wants to write directly to, say, a 4K buffer in memory, it doesn't want a 4K buffer starting ANYWHERE in memory. It wants a 4K buffer STARTING at a 4K memory boundary. It can't address anything less precise than that for that size of I/O. There is more to this than I can fit here. The internals info on Seagate's site and on a few controller makers' sites is very educational, for those who want to bother with this level of detail.
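
For what "STARTING at a boundary" looks like from a program, here is a rough sketch. It leans on two assumptions: that an anonymous mmap region starts on a page boundary (it does on Linux), and that peeking at the address through ctypes is acceptable for a demo. align_up and buffer_address are names made up for this example.

```python
import ctypes
import mmap


def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return (addr + boundary - 1) & ~(boundary - 1)


def buffer_address(buf):
    """Return the memory address where a writable buffer starts."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


# An anonymous mapping of one page; the kernel hands it out page-aligned.
page_buf = mmap.mmap(-1, mmap.PAGESIZE)
page_addr = buffer_address(page_buf)
starts_on_boundary = page_addr % mmap.PAGESIZE == 0

# An arbitrary address rounded up to the next 4K boundary.
rounded = align_up(0x12345, 4096)
```

The align_up step is the usual trick with plain malloc: over-allocate by one boundary, then start the I/O buffer at the rounded-up address inside the allocation.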

DIO is low-level I/O, and that means a compromise with the hardware characteristics. It won't always work like a simple s/w option.

Cheers
Nuno Souto
in sunny Sydney, Australia
dbvision_at_optusnet.com.au



Received on Sun Feb 29 2004 - 00:45:11 CST
