RE: Oracle & pagefaults (solaris)
Date: Mon, 13 Aug 2012 10:44:58 -0400
Message-ID: <C95D75DD2E01DD4D81124D104D317ACA1C7517B8DB_at_JAXMSG01.crowley.com>
To throw some suggestions at the issue while you are waiting for some responses: Are you using ZFS? If so, have you looked at any of the following? ZFS for Databases:
http://www.solarisinternals.com/wiki/index.php/ZFS_for_Databases Limitations -- Full table scans, etc.
Variable performance of sequential reads - Since ZFS does block-level copy-on-write for all writes, the physical layout of a file will tend to become random over time for files that are updated randomly. Normal RDBMS operations such as the updating of tables and indexes characteristically result in such random writes. As a consequence, the performance of database queries that do sustained large sequential reads (including "full table scans", "full index scans", or CREATE INDEX operations) can deteriorate over time as the database is updated. Simple ad-hoc or utility queries such as SELECT COUNT (*) from a non-indexed table commonly cause such sustained sequential I/O demand. The worst case impact varies inversely with the ZFS record size, thus posing a tradeoff between the performance of random reads and the performance of sequential reads in relation to the degree of potential fragmentation.
http://developers.sun.com/solaris/docs/wp-oraclezfsconfig-0510_ds_ac2.pdf
Backups:
Copy operations, like backups, are similar to a full scan, long logical sequential reads and
are also subject to IOPS inflation as compared to traditional file system storage. The
elapsed time for a database backup can be due to the copy-on-write evolution
of the on-disk format for data files. Copying the data files, although cumbersome, might
help to relocate the blocks into a more continuous physical layout, particularly if the
pool is left with a minimum of free disk blocks.
This Guy, Constantin Gonzalez like ZFS and supports the defaults for most situations: http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance
Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
Have you changed any of your SAN hardware or software; any other changes?
Joel Patterson
Database Administrator
904 727-2546
-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of przemolicc_at_poczta.fm
Sent: Monday, August 13, 2012 10:27 AM
To: oracle-l_at_freelists.org
Subject: Oracle & pagefaults (solaris)
Hi all,
we have been facing performance problems on our Solaris 10 Sparc based server with several Oracle databases running on it. It started 2 months ago and gradually is increasing. There is always high level of syscalls when we face these problems:
09:20:03 25 8 0 68 09:30:02 22 7 0 70 09:40:03 23 8 0 70 09:50:03 23 7 0 69 10:00:03 22 9 0 69 10:10:03 25 34 0 41 10:20:03 28 28 0 44 10:30:03 27 24 0 49 10:40:02 24 8 0 68 10:50:03 27 19 0 54 11:00:02 31 27 0 41 11:10:21 29 48 0 23 11:20:03 36 56 0 9 11:30:05 32 51 0 18 11:40:03 37 43 0 20 11:50:06 28 59 0 14 12:00:06 32 61 0 7 12:10:03 38 26 0 36 12:20:02 34 9 0 57 12:30:02 35 8 0 58 12:40:03 35 9 0 56
Normal level of syscalls is about 7-10%. But when something is happening it increases to over 50%. Using DTrace I can find that kernel is working in 'pagefault' function:
......
unix`page_freelist_coalesce+0x894 [***] 0x1fe0000 unix`page_get_mnode_freelist+0x398 unix`page_get_freelist+0x428 unix`page_alloc_pages+0x110 genunix`anon_map_getpages+0x348 genunix`segvn_fault_anonpages+0x32c genunix`segvn_fault+0x530 genunix`as_fault+0x4c8 unix`pagefault+0x68 unix`trap+0xd50 unix`utl0+0x4c 29062 unix`cpu_halt+0x10c unix`cpu_halt+0x104 unix`idle+0x128 unix`thread_start+0x4 719133
(this is just part of the image - there are much more pagefaults then listed above). So natural is to check on behalf of which application kernel is working in pagefaults:
[ High syscalls ]
CPU ID FUNCTION:NAME 96 85656 :tick-5sec nscd 1 top 1 dtrace 5 perl 122 emagent 124 tnslsnr 1569 oracle 5538
[ Low (normal) syscalls ]
CPU ID FUNCTION:NAME 20 85656 :tick-5sec nscd 1 dtrace 5 emagent 41 sh 91 perl 93 emdctl 243 tnslsnr 320 oracle 2466
When I watch which Oracle's PID are generating these high syscalls using 'ps' most of them don't exists (!). Are they short lived processes ? Our DBA claims that this is not a problem of Oracle but DTrace shows that system is working on behalf of Oracle. Can you shed some light where our DBAs should look for any solution ? When I analyzed number of connections in listener log they don't differ much between low and high system calls.
Can you help me please ? I don't feel comfortable because my feeling is that I am doing DBAs homework but they claim that AWR report is the same as before so there is nothing they could do.
Best regards
Przemek
-- http://www.freelists.org/webpage/oracle-l -- http://www.freelists.org/webpage/oracle-lReceived on Mon Aug 13 2012 - 09:44:58 CDT