Oracle-L mailing list: Re: Moving db to linux
Nuno, here's an excerpt from the IBM JFS manual:
conditions)
· Truncating regular file
You are right about logging being for metadata only, but not quite right about direct I/O. Most file systems simply ignore a request to open a file with O_DIRECT. XFS on Linux reports an error (a subsequent read or write fails with EINVAL), but works as advertised on Irix. Below is a little program that I used to test direct I/O:
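#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

#define BUFFSIZE 65536
#define ALIGN 4096

int main(void) {
    void *buff;
    ssize_t stat1 = 0, stat2 = 0;
    int stat3 = 0;
    int fd1 = 0, fd2 = 0;

    /* Direct I/O needs an aligned buffer; posix_memalign returns the error code. */
    if ((stat3 = posix_memalign(&buff, ALIGN, BUFFSIZE)) != 0) {
        fprintf(stderr, "ALIGN ERR:%s\n", strerror(stat3));
        exit(1);
    }
    fd1 = open("xxx", O_RDONLY | O_DIRECT);
    fd2 = open("yyy", O_CREAT | O_WRONLY | O_DIRECT, S_IRWXU);
    if (fd1 < 0 || fd2 < 0) {
        fprintf(stderr, "OPEN ERR:%s\n", strerror(errno));
        exit(1);
    }
    /* Copy xxx to yyy; check the return values, since errno is only valid after a failure. */
    while ((stat1 = read(fd1, buff, BUFFSIZE)) != 0) {
        if (stat1 < 0) { fprintf(stderr, "READ ERR:%s\n", strerror(errno)); exit(1); }
        stat2 = write(fd2, buff, (size_t) stat1);
        if (stat2 < 0) { fprintf(stderr, "WRITE ERR:%s\n", strerror(errno)); exit(1); }
    }
    close(fd1);
    close(fd2);
    free(buff);
    return 0;
}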
On 02/28/2004 10:13:12 AM, Nuno Souto wrote:
> ----- Original Message -----
> From: "Mladen Gogala" <mgogala_at_adelphia.net>
>
> > Journalling for files is a concept similar to redo in the world
> > of oracle.
>
> No, it MOST DEFINITELY is not. Journalled file systems are similar
> to redo ONLY for file system metadata. NOT for the data itself!
>
> > With JFS, you get the process called jfsCommit running,
> > which "commits" buffer operations. Each filehandle operation like
> > "flush" or "close" is a "commit".
>
> So it is in a non-journalled file system. "flush" has existed in
> normal file systems since the year dot and does exactly and precisely that.
> There is also a background process in non-JFS file systems that flushes
> every 30 seconds or so: it's called "sync".
>
> > Basically, journalled FS guarantees
> > that the data written down synchronously will really written down
> > to the disk device(s).
>
> ANY file system guarantees that data written synchronously
> is really written to the disk device.
> Synchronous access is NOT a synonym for journalling.
>
> > If you can do DIO, your data is a little bit
> > safer.
>
> Most file systems can do DIO. It's got nothing to do with
> journalling itself.
>
> > What a journalling FS protects you against is a huge data loss
> > of blocks that were in the buffer cache.
>
> NO WAY! If you do NOT write synchronously in a JFS, you WILL
> lose ANY data blocks in the cache!
>
> And to write synchronously you have to use synchronous I/O,
> DIO or frequent "flushes". Which you can equally do in ANY file
> system, be it journalled or not.
>
> I repeat: Synchronous writing has NOTHING to do with journalling.
>
> What a JFS really does is to automatically (like it or not) write
> - synchronously - to a journal file, ANY changes to file system METADATA.
> IOW, any changes that involve creation/delete files, allocation of
> disk space or freeing of disk space.
>
> Those and ONLY those are recovered after a system crash, by simply
> reading from the journal file. Instead of inspecting the ENTIRE file
> system looking for broken metadata. Which is what fsck does in a
> non-journalled file system.
>
> With the result (in a JFS) that you do not lose large chunks of a file.
> This is the problem that fsck has with non-journaled file systems:
> sometimes it cannot recover the metadata and it loses track of an entire
> space allocation for a file. Which can be a substantial part of the file.
> This happens mostly when files are very volatile or constantly changing in
> allocation.
>
> Which is NOT the case for Oracle datafiles. They are pre-allocated
> and do not often change in size.
>
> It's high time this myth of journalled file systems "protecting"
> data is exposed. A run-of-the-mill JFS does NOT protect data blocks inside
> files, it protects ONLY the file system's own meta data! That is certainly
> the case of ext3, JFS, NTFS and many other journalled f/s. Veritas
> is the only JFS I know of that can ALSO protect the data but that is
> an add-on, not a characteristic of JFS.
>
> Historical note:
> This f/s metadata thing is the major factor why I never lost a benchmark
> against Ingres: journalled file systems were unknown back then and Ingres
> did not use the concept of pre-allocated datafiles like Oracle. Their tables
> were stored one table per file, with dynamic space management done by the
> file system itself. With the result that if you specified a benchmark where
> tables were dropped/re-created and inserted/deleted from and you pulled the
> plug half way through, you'd have a very high probability fsck would NOT
> recover the file system where the Ingres database was.
>
> While Oracle would quietly just rollback the last transaction and keep
> going. After the fsck was finished, of course. Remember: no JFS back then!
> Not once did I have to use the redo log. Datafiles were pre-allocated and
> the f/s metadata never changed, no matter how busy the system was.
>
> As well, not ONCE did Ingres survive this little "technique"!
> Cheers
> Nuno Souto
> in sunny Sydney, Australia
> dbvision_at_optusnet.com.au
>
> ----------------------------------------------------------------
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> ----------------------------------------------------------------
> To unsubscribe send email to: oracle-l-request_at_freelists.org
> put 'unsubscribe' in the subject line.
> --
> Archives are at http://www.freelists.org/archives/oracle-l/
> FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
> -----------------------------------------------------------------
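
The point quoted above, that synchronous writing is something the application asks for and has nothing to do with journalling, can be illustrated with a minimal sketch. It uses only the POSIX calls open(), write(), fsync() and close(), plus the O_SYNC flag. The helper names write_all, write_then_fsync and write_osync are hypothetical, made up for this illustration, and nothing here is specific to JFS, ext3 or any other file system. How much the underlying device honours the request is outside the scope of the sketch.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: loop until all len bytes are handed to write(). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;              /* errno is set by write() */
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical helper: buffered write followed by an explicit "flush".
 * fsync() asks the kernel to push the file's dirty blocks to the device. */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: synchronous I/O requested at open time.
 * With O_SYNC each write() is meant to return only after the data
 * has been transferred to the underlying storage. */
int write_osync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Either helper can be called the same way on a journalled or a non-journalled file system, for example write_then_fsync("datafile.tmp", block, sizeof block), where the file name and buffer are placeholders. Both return 0 on success and -1 on failure.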
--
Mladen Gogala
Oracle DBA

Received on Sat Feb 28 2004 - 12:35:19 CST