RE: SOS Alert
CASE: In a bank, interest calculation batch processing is unable to go beyond 500 A/cs per minute.

ACTION: Moved the database to another IBM machine (4 CPU, 4 GB RAM) with HARDWARE mirroring & a 64K stripe size.

NOTE - On the under-performing DB server (S80 model of IBM) there exist software mirroring (by the AIX O.S.) & a stripe size of 4K.

Firing off the interest calculation runs from 2 APP servers gives an overall total of 3000 A/cs per minute (1500 from each).

Qs. Which of the 2, i.e. hardware mirroring OR 64K striping, do you think is the cause of the increase in performance, as we put BOTH features together at the SAME time? Or do both contribute equally?
Thanks to Vivek for everything
-----Original Message-----
From: VIVEK_SHARMA
Sent: Thursday, December 07, 2000 6:35 PM
Subject: RE: SOS Alert

ACTION: Identified the 2 indexes having the waits of "db file sequential read". Created 2 NEW tablespaces, each containing 1 index (sketched below). Also specified the parameter log_small_entry_max_size = 0.

NOTE - ORA 7.3.4.5

RESULT: Performance throughput increased to 1500 A/cs per minute, from 1000 A/cs per minute.
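[Ed.] For illustration, a minimal sketch of the tablespace split described above. The tablespace, datafile and index names are hypothetical, not from the original post:

    REM One dedicated tablespace per hot index, each on its own datafile/disk
    CREATE TABLESPACE idx_ts1 DATAFILE '/u05/oradata/idx_ts1_01.dbf' SIZE 500M;
    CREATE TABLESPACE idx_ts2 DATAFILE '/u06/oradata/idx_ts2_01.dbf' SIZE 500M;

    REM On 7.3 an index can be rebuilt straight into the new tablespace;
    REM on older releases it would have to be dropped & re-created there
    ALTER INDEX int_calc_idx1 REBUILD TABLESPACE idx_ts1;
    ALTER INDEX int_calc_idx2 REBUILD TABLESPACE idx_ts2;

    REM init.ora change from the same step:
    REM log_small_entry_max_size = 0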
Allow me to begin:- There were 2 routes between the APP & DB servers. One had a bandwidth of 10 MBPS while the other had a bandwidth of 100 MBPS. The 10 MBPS route was being used all this while. On enabling the 100 MBPS route (by specifying the I.P. address instead of the hostname in the tnsnames.ora & listener.ora), the performance has jumped to 1100 records per minute (from 500 records per minute), i.e. almost doubled. 100 MBPS is the MAX possible bandwidth as per the SA.

Qs. Your views on this please. Additional responses follow your views BELOW:-
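[Ed.] As a sketch of the route change described, assuming SQL*Net 2.x syntax of that era; the alias, SID and IP address below are made up. The point is that HOST names the interface on the 100 MBPS route directly instead of the DNS hostname:

    # tnsnames.ora on the APP servers (hypothetical entry)
    BANKDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.1.5)(PORT = 1521))
        (CONNECT_DATA = (SID = BANK))
      )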
-----Original Message-----
From: Vivek [SMTP:Vivek_at_1800FLOWERS.com]
Sent: Tuesday, November 28, 2000 6:30 PM
To: 'VIVEK_SHARMA'
Cc: Arun Chakrapani
Subject: RE: SOS Alert

Vivek: Sorry for not replying to you yesterday itself. A busy day. Took a look at your report.txt. They were having 2 different profiles. Since you had mentioned that it also stalled in the test DB, I concentrated more on the test report.txt. One thing which stood out on the other one (live), on opening your report.txt, is that the SQL_AREA GET_HIT_RATIO is very poor. I don't know whether you have already pinned your SYS objects and other objects that are executed often in the shared pool. Please use DBMS_SHARED_POOL.KEEP to pin packages, procedures, functions, sequences, cursors and triggers that are used often.
Ans. OK, will do so (a pinning sketch follows below).
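[Ed.] A minimal sketch of the pinning suggested above. The DBMS_SHARED_POOL package is created by $ORACLE_HOME/rdbms/admin/dbmspool.sql if not already installed; object names beyond SYS.STANDARD are examples only, and the flag values supported for triggers/sequences on a 7.3 install should be checked in the dbmspool.sql header:

    REM Pin heavily used packages into the shared pool after each startup,
    REM before fragmentation sets in; the default flag 'P' covers
    REM packages, procedures and functions
    EXECUTE DBMS_SHARED_POOL.KEEP('SYS.STANDARD');
    EXECUTE DBMS_SHARED_POOL.KEEP('SYS.DBMS_STANDARD');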
So now I will concentrate on the Test DB.
Ans. OK. Allow me to say that the shared_pool_size has, by experience, been kept at the bare minimum required, for we did experience performance problems with a shared_pool_size of 300 MB (though on another machine, on SUN OS). It is recommended to keep shared_pool_size at rock bottom, just above the limit where ORA-4031 (out of shared pool segments) occurs.

2. On seeing your system wide wait events, the following stand out:

   Event                      Count   Total Time   Avg Time
   1. log file sync            2863        83922      29.31
   2. write complete waits      905        71401      78.90
   3. free buffer waits          60         4718      78.63

The first event, log file sync, mostly comes when there is a disk I/O bottleneck for the online redo logs. This happens when the processes have to wait for the LGWR to sync a particular redo log. The p1 parameter in a log file sync is the corresponding log buffer block. The write complete waits come when the user process is waiting for the LGWR to complete the writing of the redo information. So please take a look at the placing of the online redo logs. On which disk have you placed them? Also, even if they have been placed in their own file system, it doesn't mean that they are not going to a busy disk. Since you have 0+1, the slices may belong to a HOT disk. What you need to do is ask your SA to monitor the disk I/O usage and see what's happening to the disk containing the redo log files. Also you can use iostat -xPnce 5 5 (I think it's available in AIX). The output will look something like:

   us sy wt id
   23  7 49 21
                  extended device statistics               ---- errors ---
   r/s  w/s  kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  s/w h/w trn tot  device
   0.0  0.0   0.0   0.0   0.0   0.0     0.0    13.4   0   0    0   0   0   0  c1t10d0s0
   0.0  0.0   0.0   0.0   0.0   0.0     0.0     2.4   0   0    0   0   0   0  c1t10d0s1
   0.0  0.0   0.0   0.0   0.0   0.0     0.0     0.0   0   0    0   0   0   0  c1t10d0s2
   0.0  1.3   0.3  10.1   0.0   0.1     4.6   105.8   0   1    0   0   0   0  c1t10d0s3

Watch out for the %b column. If it's somewhere around 60 then you may be experiencing contention on those slices. Ask your SA, by giving him/her the device name, and they will be able to tell you what the FS is (an Oracle-side view of the same question is sketched below, after this message).

NOTE - 4 datafiles containing a particular table & its corresponding indexes (to which INSERTS happen) were moved out of the RAID onto a local disk, whereafter performance jumped from 850 per minute to 1100 per minute.

Ans. OK. Will issue the command.

Also, I see free buffer waits. This is caused because there are no free buffers when they are requested - pointing directly to the inefficiency of the DBWR. Again, the DBWR may be slow because you have an I/O bottleneck for the DB files, and indirectly because of the write complete waits.

Qs. Would you recommend setting multiple db_writer processes even though async_io is set to true?

"Thus a Volume Group Contains 8 Disks with 4 Disks > Stripe Size = 4K"

[Vivek] Please confirm to me that the stripe size is 4K. Usually you will have a stripe size of 32K, 64K, 128K, 256K.... If you are having a stripe size of 4K and your db_block_size is 8K, it is going to take 2 reads to satisfy one request for a db file sequential read (which usually requests 1 Oracle block). This doesn't sound good if that's true.

NOTE - The "test" Database is a 4K DB_BLOCK_SIZE while the Live Database is an 8K DB_BLOCK_SIZE, but the SAME performance limits were observed on both.
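[Ed.] For the disk placement questions above, a minimal companion sketch from the Oracle side (standard v$ views, present in 7.3; run from a DBA account) showing which datafiles take the most physical I/O, and hence which disks are worth cross-checking with iostat:

    REM Physical I/O per datafile - the busiest files point at the
    REM disks/slices worth giving to the SA for monitoring
    SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
    FROM   v$filestat f, v$datafile d
    WHERE  f.file# = d.file#
    ORDER  BY f.phyrds + f.phywrts DESC;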
So if possible could you please do the following.
set lines 132
select sid, substr(event,1,40), p1, p2, p3
from v$session_wait
where event in ('db file scattered read','db file sequential read',
'latch free','buffer busy waits','log file sync','enqueue');

and give me the output.

Ans. OK - will action & revert to you.

5. If many waits are showing as "latch free" in the event column of v$session_wait, please query v$latch for LATCH#, NAME, LEVEL#, GETS, MISSES, SLEEPS, IMMEDIATE_GETS, WAITS_HOLDING_LATCH, SPIN_GETS, SLEEP1 (a sketch follows below).

Ans. OK

6. Preserve the output for v$latch_children.

Ans. OK

Qs. Are there any IBM OS specific parameters which are considered beneficial for performance?
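[Ed.] A sketch of the v$latch query asked for in point 5, using the column names listed there:

    REM Latches with misses/sleeps reveal which latch is contended;
    REM capture v$latch_children the same way for per-child detail
    SELECT latch#, name, level#, gets, misses, sleeps,
           immediate_gets, waits_holding_latch, spin_gets, sleep1
    FROM   v$latch
    WHERE  misses > 0
    ORDER  BY sleeps DESC;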
I will take a look at both the report.txt in more detail today and get back
to you.
Thanks again so much. Where are you located / Phone No.? if you please. We are currently located in Bangalore & addressing the problem at Mumbai over a WAN. Phone 91 80 6588668

HTH
Vivek

> -----Original Message-----
> From: VIVEK_SHARMA [SMTP:VIVEK_SHARMA_at_infy.com]
> Sent: Monday, November 27, 2000 11:16 AM
> To: 'Vivek'
> Cc: Arun Chakrapani
> Subject: RE: SOS Alert
>
> THANKS so much for the association in this issue
>
> Attached 2 report.txt, each taken for 1 hour approx. Is it sufficient?
>
> Attached report.txt of the problem on the "Live" Database
>
> <<report.txt_Live>>
> NOTE - A "test" Database was created on the DB Server itself, in the same
> partitions as the "Live" Database.
> Runs done on this "test" Database also fail to go beyond 500 A/cs per
> minute.
>
> <<report.txt_TEST>>
>
> NOTE - Interest runs were done on the "test" Database during down-time
> periods of the "Live" Database.
>
> You wrote:-
> Since there are some write complete waits, please take a look at the
> place the online redo logs are and what is the read, write on these
> disks.
>
> Ans. Online redo logs are ALSO present on the RAID 0+1 disks & are multiplexed.
> Size of online redo logfile on the "Live" Database = 30 MB
>
> NOTE - On the test Database we increased the size of the online redo
> logfile to 100 MB, but with NO benefit.
>
> You wrote:-
> Please give me the DB Version as well
>
> Ans. Database Oracle version = 7.3.4.5.0
>
> You wrote:-
> I see some waits on More data from client.
> Qs. What is to be done about the same?
>
> NOTE - Will run the 2 SQL scripts suggested by you & revert back
>
> Thanks Again
>
> > -----Original Message-----
> > From: Vivek [SMTP:Vivek_at_1800FLOWERS.com]
> > Sent: Monday, November 27, 2000 7:44 PM
> > To: 'VIVEK_SHARMA_at_infy.com'
> > Cc: Arun Chakrapani
> > Subject: SOS Alert
> >
> > Hi Vivek:
> >
> > Will it be possible to send the whole report.txt to me? Since you have not
> > given for how long this was taken (time from utlbstat and utlestat) it is
> > more difficult to say anything. Based on the report.txt snippet you have
> > given, there seem to be some issues with the enqueues and latch free
> > waits. Please run the following SQL as SYS at 2 different times (say 1/2
> > hour apart) and send me both the values (it will be great if you could
> > run this when the batch processing is running).
> >
> > select KSQSTTYP, KSQSTGET, KSQSTWAT from x$ksqst
> > where KSQSTGET > 0 and KSQSTWAT > 0;
> >
> > The delta bet' the 2 results will give you information on what is the
> > enqueue that we see more waits on. Also please run the SQL given below in
> > a cron every minute or so and output this to a file:
> >
> > SELECT p1 "File", p2 "Block", p3 "Reason" FROM v$session_wait
> > WHERE event like 'buffer busy waits';
> >
> > This will give you an idea about the buffer busy waits. Since there are
> > some write complete waits, please take a look at the place the online
> > redo logs are and what is the read, write on these disks. Also please
> > give me information on the redo log file size. Please give me the DB
> > Version as well. I see some waits on More data from client. Again, since
> > I don't know for how long the stats were run, I can't say whether they
> > are significant or not.
> >
> > HTH
> >
> > Vivek
> >
> > From: VIVEK_SHARMA <VIVEK_SHARMA_at_infy.com>
> > Date: Mon, 27 Nov 2000 12:11:59 +0530
> > Subject: SOS Alert
> >
> > CASE: In a bank, interest calculation batch processing is unable to go
> > beyond 500 A/cs per minute.
> >
> > CAUSE of the problem is UNKNOWN
> >
> > ORACLE 7.3.4.5 on AIX 4.3.3
> > DB Server - IBM S80 Model - 12 CPUs, 3 GB RAM
> > APP Server 1 - IBM S80 Model - 6 CPUs, 2 GB RAM
> > APP Servers 2 & 3 - IBM Thin Nodes model - 4 CPUs, 1 GB RAM
> >
> > Storage Box:-
> > ===========
> > SS Class Storage
> > RAID 0+1 - (first striped & then mirrored)
> > NOTE - 2 storage boxes exist, one being the mirror of the other.
> > Striping exists across a set of 4 disks (in one box), with another 4
> > being its mirror (in another box).
> > Thus a Volume Group contains 8 disks, with 4 disks striped & another 4
> > mirroring them.
> > Stripe Size = 4K
> >
> > NOTE - Runs tried from BOTH the APP Servers 1 OR 2, but with the SAME
> > MAX of 500 A/cs processed per minute.
> >
> > CPU utilization on BOTH APP & DB Server = 40 %
> > wio% on BOTH APP & DB Servers = 35 %
> > No paging happening on both APP & DB Servers
> >
> > - Oracle contention values seem small to us, as shown below:-
> >
> > SVRMGR> Rem System wide wait events for non-background processes (PMON,
> > SVRMGR> Rem SMON, etc). Times are in hundredths of seconds. Each one of
> > SVRMGR> Rem these is a context switch which costs CPU time. By looking at
> > SVRMGR> Rem the Total Time you can often determine what is the bottleneck
> > SVRMGR> Rem that processes are waiting for. This shows the total time spent
> > SVRMGR> Rem waiting for a specific event and the average time per wait on
> > SVRMGR> Rem that event.
> > SVRMGR> select n1.event "Event Name",
> >      2> n1.event_count "Count",
> >      3> n1.time_waited "Total Time",
> >      4> round(n1.time_waited/n1.event_count, 2) "Avg Time"
> >      5> from stats$event n1
> >      6> where n1.event_count > 0
> >      7> order by n1.time_waited desc;
> >
> > Event Name                          Count    Total Time    Avg Time
> > -------------------------------- ---------- ------------ -----------
> > SQL*Net message from client        10856276    31977110        2.95
> > enqueue                                1295      374980      289.56
> > db file sequential read             3614044      303848         .08
> > write complete waits                   5812      295937       50.92
> > latch free                          5045060      242170         .05
> > SQL*Net more data from client         13939      165275       11.86
> > log file sync                         12794      146409       11.44
> > buffer busy waits                    100443       92477         .92
> >
> > - ALL literal SQLs (dynamic) were converted to using bind variables
> > (static queries)
> > - ALL tables running on indexes, without any FULL scans happening
> >
> > - event="10181 trace name context forever, level 1000"
> > NOTE - Set nevertheless; none of the SQLs is taking excessive time to
> > parse though
> >
> > - NO statistics are Analyzed

Received on Wed Dec 20 2000 - 01:24:35 CST