RE: 64 node Oracle RAC Cluster (The reality of...)
>Kevin,
>I don't know for the others... but I'd like to keep reading
>this thread and how it is evolving.
>The discussion is interesting.
>
>Fabrizio
This thread gets more traction in this forum than on suse-oracle, as Fabrizio and I can attest. It seems over there that any platform software, regardless of quality, is considered best as long as it is free and open source... which I find particularly odd when choosing a platform to host the most expensive (and most feature-rich) closed source software out there (Oracle). Hmmm...
So the thread is a technical comparison of cluster filesystem architectures. Or at least a tip-toe through the tulips on horseback.
One camp is the central locking and metadata approach of IBM GPFS, Sistina GFS, and Veritas CFS; the other is the fully symmetric, distributed approach implemented by PolyServe on Linux and Windows.
The central approach is the easiest approach. Period. That does not make those filesystems useless. On the contrary, they are extremely good (better than PolyServe) at HPC workloads. When you compare more commercial-style workloads, like email, the distributed, symmetric approach bears fruit. Workloads like email are great for making the point of which CFS is general purpose and which isn't. See the following URLs for an independent test of an email system for hundreds of thousands of users comparing the various CFS technologies out there (for Linux):
http://www.polyserve.com/pdf/Caspur_CS.pdf
http://www.linuxelectrons.com/article.php/20050126205113614
Mladen asked about intricacies such as versioning. There is no such concept on the table. A CFS is responsible for keeping filesystem metadata coherent; applications are responsible for keeping file content coherent. Now, having said that, PolyServe supports positional locking and we also maintain page cache coherency at per-file granularity. So, if two processes in the cluster use a non-cluster-aware program, like vi, and set out to edit the same file in the CFS, the result will be that the last process to write the file will be the winner. This is how vi works on a non-CFS, so this should be expected.
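To make that last-writer-wins behavior concrete, here is a minimal sketch. It assumes the same mxserv node names used in the tests below and a hypothetical file /u01/notes.txt sitting in the CFS; it is illustrative only, nothing PolyServe-specific:
$ rsh mxserv1 'echo "edit from node 1" > /u01/notes.txt'
$ rsh mxserv2 'echo "edit from node 2" > /u01/notes.txt'
# Page cache coherency means every node now sees the last write -
# both of the following should print "edit from node 2", just as two
# vi sessions on a single-host filesystem would end up with the
# content of whichever one saved last.
$ rsh mxserv1 'cat /u01/notes.txt'
$ rsh mxserv2 'cat /u01/notes.txt'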
Oracle file access characteristics are an entirely different story. Here, the application is cluster-aware, so we've implemented a mount option for direct IO (akin to the forcedirectio mount option in Solaris). Here, the IO requests are DMAed directly from the address space of the process to disk - without serialization or inode updates like [ma]time. The value add that we implemented, however, is what sets this approach apart.
In the same filesystem, mounted with the direct IO option, you can have one process performing properly aligned IOs going through the direct IO path (e.g., lgwr) while another process is doing unaligned buffered IO. This comes in handy, for instance, when you have a process like ARCH spooling the archived redo logs (direct IO) followed by compress/gzip compressing down the file. Tools like compress nearly always produce an output file that is not a multiple of 512 bytes, so for that reason alone they cannot use direct IO on any SCSI-based system. Lots of stuff to consider in making a comprehensive cluster platform for databases...
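As a concrete illustration of the ARCH-then-gzip mix above, here is a quick sketch. The path /u01/arch is hypothetical, and dd's oflag=direct (per-file O_DIRECT) stands in for the direct IO path, since dd issues fixed-size, sector-aligned writes:
# Aligned direct IO: 1 MiB requests, every one a multiple of 512 bytes
$ dd if=/dev/zero of=/u01/arch/redo_0001.arc bs=1M count=64 oflag=direct
# Unaligned buffered IO in the same filesystem: gzip writes whatever
# block sizes it likes, and its output is almost never a 512-byte
# multiple, so it takes the buffered path
$ gzip /u01/arch/redo_0001.arc
$ ls -l /u01/arch/redo_0001.arc.gz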
Whether a good CFS can handle text-mapping (executing binaries mapped from shared storage) is not an issue. The following example uses a small 10-node PolyServe Matrix (cluster). The test consists of comparing 1000 executions of the Pro*C executable on the CFS against the same workload on a non-CFS (reiserfs in this case).
First, prove that the test binary (proc in this case) is the same inode in the CFS on all 10 nodes:
$ for i in 1 2 3 4 5 6 7 8 9 10; do rsh mxserv$i "ls -i $ORACLE_HOME/bin/proc"; done
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
2241437 /u01/app/oracle10/product/10.1.0/db_1/bin/proc
Next, copy the proc executable to /tmp to get baseline non-CFS (reiserfs) to PolyServe CFS comparison:
$ cp $ORACLE_HOME/bin/proc /tmp
$ md5sum $ORACLE_HOME/bin/proc /tmp/proc
af42f080f2ddba7fe90530d15ac1880a  /u01/app/oracle10/product/10.1.0/db_1/bin/proc
af42f080f2ddba7fe90530d15ac1880a  /tmp/proc
$
Next, a quick script to fire off 1000 concurrent invocations of the binary pointed to by arg1:
$ cat t_proc
#!/bin/bash
binary=$1
getenv=$2
[[ ! -z "$2" ]] && cd ~oracle && . ./.bash_profile
cnt=0
until [ $cnt -eq 1000 ]
do
(( cnt = $cnt + 1 ))
( $binary sqlcheck=FULL foo.pc > /dev/null 2>&1 ) &
done
###End script
Next, execute the script under time(1) to get a count of minor faults and the execution time. When executed against /tmp/proc, the cost is 1020884 minor faults and 11.6 seconds of total completion time.
$ /usr/bin/time ./t_proc /tmp/proc
11.60user 10.42system 0:11.72elapsed 187%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+1020884minor)pagefaults 0swaps
Next, execute the script pointing to the Shared Oracle Home copy of the proc executable:
$ echo $ORACLE_HOME
/u01/app/oracle10/product/10.1.0/db_1
$ /usr/bin/time ./t_proc $ORACLE_HOME/bin/proc
11.43user 10.52system 0:11.08elapsed 198%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+1016753minor)pagefaults 0swaps
So, 1000 invocations parallelized as much as a dual proc system can muster yields the same execution performance on non-CFS as CFS.
Next, execute the script on 2, 4, and then 10 nodes in parallel. Note, the timing granularity is seconds, using the $SECONDS builtin variable.
$ cat para_t_proc
for i in 1 2
do
rsh mxserv$i "/u01/t_proc $ORACLE_HOME/bin/proc GETENV" &
done
wait
echo $SECONDS
for i in 1 2 3 4
do
rsh mxserv$i "/u01/t_proc $ORACLE_HOME/bin/proc GETENV" &
done
wait
echo $SECONDS
for i in 1 2 3 4 5 6 7 8 9 10
do
rsh mxserv$i "/u01/t_proc $ORACLE_HOME/bin/proc GETENV" &
done
wait
echo $SECONDS
$ sh ./para_t_proc
11
22
34
So, parallel and cluster-concurrent execution of bits is 100% linearly scalable... as it should be. Otherwise, as I've ranted before, you would not be able to call it a CFS, or an FS at all for that matter :-)
--
http://www.freelists.org/webpage/oracle-l

Received on Wed Jun 22 2005 - 16:48:11 CDT