Excessive cpu usage on 10.2.0.3 linux 64bit
Date: Wed, 25 Jun 2008 13:19:20 +0200
Message-ID: <23e0d1170806250419p4c22c3f8s2a54ef34f8229cfe@mail.gmail.com>
Hi all,
I'm facing a problem on Oracle 10.2.0.3 linux 64bit (Centos 5)
I'm issuing a simple delete from command, the table has 4M rows and is
referenced by some fk (but there is and index for every fk on the
other tables).
the command takes ages to complete and I cannot figure why since two
days ago there were no problems.
Since it'is a development box I hadn't collected any performance data
before the issue.
This is what I've done in order to collect some information.
I issued a query on v$session_longops that shows a very slow progress
in getting a single block from the table
I enabled sql_trace with binds and here is an excerpt
WAIT #2: nam='free buffer waits' ela= 10728 file#=2 block#=59589
set-id#=5 obj#=46865 tim=1185928593541132
WAIT #2: nam='free buffer waits' ela= 10726 file#=2 block#=59589
set-id#=5 obj#=46865 tim=1185928593551874
WAIT #2: nam='free buffer waits' ela= 10729 file#=2 block#=59589
set-id#=5 obj#=46865 tim=1185928593562617
*** 2008-06-25 12:47:59.815
WAIT #2: nam='free buffer waits' ela= 6959 file#=2 block#=59589
set-id#=5 obj#=46865 tim=1185928593569595
WAIT #2: nam='free buffer waits' ela= 221 file#=2 block#=59589
set-id#=5 obj#=46865 tim=1185928593569854
WAIT #2: nam='db file scattered read' ela= 105346 file#=7 block#=106821 blocks=16 obj#=46828 tim=1185928594430683 WAIT #2: nam='db file sequential read' ela= 6993 file#=8 block#=934757blocks=1 obj#=46865 tim=1185928595079089 WAIT #2: nam='db file sequential read' ela= 4705 file#=8 block#=131725 blocks=1 obj#=46865 tim=1185928597295791 WAIT #2: nam='db file sequential read' ela= 6222 file#=8 block#=118799 blocks=1 obj#=46864 tim=1185928597498068 WAIT #2: nam='db file sequential read' ela= 9507 file#=8 block#=131805 blocks=1 obj#=46865 tim=1185928599659415 WAIT #2: nam='db file sequential read' ela= 9257 file#=8 block#=118559 blocks=1 obj#=46864 tim=1185928599885609 WAIT #2: nam='db file sequential read' ela= 15414 file#=8 block#=764020 blocks=1 obj#=46864 tim=1185928602481449
*** 2008-06-25 12:48:12.280
WAIT #2: nam='latch free' ela= 36 address=4318780640 number=127 tries=1 obj#=-1 tim=1185928605742858 WAIT #2: nam='db file sequential read' ela= 38090 file#=8block#=118736 blocks=1 obj#=46864 tim=1185928605981000
WAIT #2: nam='latch free' ela= 23 address=4318780640 number=127 tries=1 obj#=-1 tim=1185928609090528 WAIT #2: nam='latch free' ela= 44 address=4318780640 number=127 tries=1 obj#=-1 tim=1185928610275061 WAIT #2: nam='db file sequential read' ela= 3839 file#=8 block#=761139blocks=1 obj#=46864 tim=1185928612233425 WAIT #2: nam='db file sequential read' ela= 24 file#=8 block#=934722 blocks=1 obj#=46865 tim=1185928612798930
*** 2008-06-25 12:48:23.585
WAIT #2: nam='db file sequential read' ela= 17800 file#=8 block#=763028 blocks=1 obj#=46864 tim=1185928616782421
I noticed the 10 second pauses and i tried to do some other investigations
select * from v$session_wait where sid=145 usually shows db file sequential read event
select * from v$session_event where sid=145 shows the i/o events as the most time consuming (high values in the time_waited_micro)
select * from v$sess_time_model where sid=145 shows
145 3649082374 DB time 2410134327 145 2748282437 DB CPU 2328552497
......
So far it seems that the process is consuming a lot of cpu
the ps -fe confirmed that the unix process corresponing to the command
executed is eating 94% of the CPU
I did a strace of the unix process and this is an excerpt:
poll([{fd=13, events=POLLIN|POLLRDNORM}], 1, 0) = 0
times(NULL) = 498061066 times(NULL) = 498061066 times(NULL) = 498061066 times(NULL) = 498061066 times(NULL) = 498061066 times(NULL) = 498061066 times(NULL) = 498061068 times(NULL) = 498061068 times(NULL) = 498061068 times(NULL) = 498061068 times(NULL) = 498061068 times(NULL) = 498061068 times(NULL) = 498061069 times(NULL) = 498061069 times(NULL) = 498061069 times(NULL) = 498061069 times(NULL) = 498061069 times(NULL) = 498061069 poll([{fd=13, events=POLLIN|POLLRDNORM}], 1, 0) = 0 times(NULL) = 498061071 times(NULL) = 498061071 times(NULL) = 498061071 times(NULL) = 498061071 times(NULL) = 498061071 times(NULL) = 498061071 times(NULL) = 498061072 times(NULL) = 498061072pread(12, "\6\242\0\0\215\320\1\2\6\361)#\0\0\f\4B\335\0\0\2\0.\0\20\267\0\0\25\26\35#"..., 8192, 974233600) = 8192
times(NULL) = 498061073 times(NULL) = 498061073 times(NULL) = 498061073 times(NULL) = 498061073 times(NULL) = 498061075 times(NULL) = 498061075 times(NULL) = 498061075 times(NULL) = 498061075 times(NULL) = 498061075 times(NULL) = 498061075poll([{fd=13, events=POLLIN|POLLRDNORM}], 1, 0) = 0
I'm not linux an expert but it seems a polling process. the command iostat 5 -k shows very low rates (some days ago I saw values 10 times higher while executing the same sql command)
Can anyone tell me if there is some other investigation I can do? does anyopne has any advice?
Thanks a lot,
Giovanni
-- -------------------------------------------------------------------- "You don't know the power of dark side" - Darth Vader -- http://www.freelists.org/webpage/oracle-lReceived on Wed Jun 25 2008 - 06:19:20 CDT