| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
|  |  | |||
Home -> Community -> Mailing Lists -> Oracle-L -> RE: SOS Alert
This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.
------_=_NextPart_001_01C06C3D.239F2110
Content-Type: text/plain;
charset="iso-8859-1"
Interesting question! My stab in the dark, shot from the hip, wild-ass guess is as follows:
Since modern OSes can cache code segments via shared libraries, there is a *very high* read to write ratio (i.e. it is read once from disk into OS cache, and re-used from there by many folks ) and so the "read from disk" is not a significant performance issue.
Where disk config *would* become significant would be in case of failure: What is faster? Recover by backup? Software re-install? And, what is more likely to fail? Software striped across ten disks _without_ a mirror is more likely to become unavailable than software located on one disk, just because of the inverse additive effect of individual disk MTBFs in a stripe set. So, if you must stripe app binaries, mirror them.
There are other comments about code re-use in the OS, but if i start on those, this won't be a "shot from the hip" anymore.
Maybe Eric Pierce can Google us up some pithy and provocative URLs to keep us thinking on this topic!
hth
Ross
-----Original Message-----
From: VIVEK_SHARMA [mailto:VIVEK_SHARMA_at_infy.com]
Sent: Friday, December 22, 2000 1:26 AM
To: 'ORACLE-L_at_fatcity.com'; 'MohanR_at_STARS-SMI.com'; 'turner_at_tellme.com';
'oracledba_at_lazydba.com'
Subject: RE: SOS Alert 
A related Basic Qs.
Can the Application Software ( Banking Application of pro*c Executables in this Case) be put on Striped Disks belonging to the APP Server ?
Does it Have any Certain Negative performance impacts ? OR is Performance the Same whether the Application Lies on Striped Disks or NON-Striped Disks ?
> -----Original Message-----
> From:	Mohan, Ross [SMTP:MohanR_at_STARS-SMI.com]
> Sent:	Thursday, December 21, 2000 1:51 AM
> To:	Multiple recipients of list ORACLE-L
> Subject:	RE: SOS Alert 
> 
> 4K striping is a major Major MAJOR mistake. 
> 
> Doing it in software makes it at least 3x as 
> bad. 
> 
> Either fix alone would have provided at least a 
> 2-4x improvement. 
> 
> Given a choice, I'd fix the stripe size first, then 
> convert the mirroring to hardware. 
> 
> hth 
> 
> imho 
> 
> ymmv 
> 
> afaik, 
> 
> etc. 
> 
> - Ross Mohan 
> 
> -----Original Message----- 
> From: VIVEK_SHARMA [ <mailto:VIVEK_SHARMA_at_infy.com>] 
> Sent: Wednesday, December 20, 2000 11:13 AM 
> To: Multiple recipients of list ORACLE-L 
> Subject: RE: SOS Alert 
> 
> 
> CASE In a Bank , Interest Calculation Batch Processing Unable to go beyond
> 
> 500 A/cs per minute 
> 
> ACTION Moved the Database to Another IBM Machine ( 4 CPU , 4 GB RAM ) with
> 
> HARDWARE Mirroring & 64 K Stripe Size 
> 
> NOTE - On the Under-performing DB Server S80 Model of IBM) there exists 
> Software Mirroring (By the AIX O.S.) & a Stripe Size of 4K exist 
> 
> Firing Off the Interest Calculation Runs from 2 APP Servers Giving an 
> Overall Total  Processing of Number of 3000 A/cs per minute ( 1500 from
> each 
> ) 
> 
> Qs. Which of the 2 i.e. Hardware Mirroring OR 64 K Striping , do you think
> 
> is the Cause of the Increase in performance as we put BOTH features
> Together 
> at the SAME Time OR do Both Contribute Equally ? 
> 
> Thanks to Vivek for everything 
> 
> 
>         -----Original Message----- 
>         From:   VIVEK_SHARMA 
>         Sent:   Thursday, December 07, 2000 6:35 PM 
>         Subject:        RE: SOS Alert 
> 
> 
> 
>         ACTION Identified the 2 indexes having the waits of "db file 
> sequential read" 
>         Created 2 NEW Tablespaces Each containing 1 Index 
>         Also Specified parameter log_small_entry_max_size = 0 
>         NOTE - ORA 7.3.4.5 
> 
>         RESULT Performance thruput increased to 1500 A/cs per minute from 
> 1000 A/cs per minute 
> 
> 
>         ============================================== 
>         Allow me to begin :- 
>         There were 2 routes in between the APP & DB Servers . 
>         One had a Bandwidth of 10 MBPS while Another had a Bandwidth of
> 100 
> MBPS 
>         The 10 MBPS route was being used all this while . On Enabling the 
> 100 MBPS route 
>         (by Specifying the I.P. Address Instead of Hostname in the 
> tnsnames.ora & listener.ora) 
>         the performance has jumped to 1100 records per minute (from 500 
> records per minute) 
>         i.e. Almost Doubled . 
>         100 MBPS is the MAX Possible Bandwidth Possible as per the SA 
> 
>         Qs. Your Views on this please 
> 
>         Additional responses Follow your Views BELOW :- 
> 
>         -----Original Message----- 
>         From:   Vivek [SMTP:Vivek_at_1800FLOWERS.com] 
>         Sent:   Tuesday, November 28, 2000 6:30 PM 
>         To:     'VIVEK_SHARMA' 
>         Cc:     Arun Chakrapani 
>         Subject:        RE: SOS Alert 
> 
>         Vivek: 
> 
>           Sorry for not replying to you yesterday itself. A Busy day. Took
> a 
> look at 
>         you report.txt. They were having 2 different profiles. Since You
> had 
>         mentioned that it also stalled in  Test DB I concentrated more on 
> the test 
>         report.txt. One thing which stood out on the other one (live) on 
> opening you 
>         report.txt is the SQL _AREA GET_HIT_RATIO is very poor. I don't
> know 
> whether 
>         you have already pinned your SYS objects and other objects that
> are 
> executed 
>         often in shared pool. Please use DBMS_SHARED_POOL.keep to pin 
>         package,procedure,function,sequence,cursors, triggers that are
> used 
> often. 
> 
>         Ans OK will do so 
> 
>         So now I will concentrate on the Test DB. 
> 
>         1. Even though the SQL_AREA GET_HIT_RATIO is higer(82%) when 
> compared to 
>         live this is not good enough. It should be close to 100%(1 some 
> where like 
>         .98 alteast). Please pin the SYS objects to start with. 
> 
>         OK 
>         Allow me to say that the shared_pool_size has by experience been 
> kept at bare minimum required 
>         for we did experience performance poblems with a shred_pool_size
> of 
> 300 MB 
>         ( Though on Another machine on SUN OS ) 
>         It is recommended to keep shared_pool_size at rock bottom just
> above 
> the Limit 
>         where ORA-4031 out of shared pool segments occurs 
> 
> 
>         2. On seeing you system wide wait events the following stands out.
> 
> 
>             1.log file sync                    2863        83922
> 29.31 
>             2.write complete waits             905         71401      78.9
> 
>             3.free buffer waits                60          4718
> 78.63 
> 
>             The first event log file sync event mostly comes when there is
> a 
> disk 
>         i/o bottleneck for the online redo logs.This happens when the 
> processes have 
>         to wait for the LGWR to sync a particular REDO log. The p1
> parameter 
> in a 
>         Log sync in the corresponding LOg Buffer. 
> 
> 
>         The Write complete waits comes 
>         when the user process is waiting for the LGWR to complete the 
> writing of the 
>         redo information.  So Please take a look at the placing of the 
> online redo 
>         log. In which disk have you placed this. Also if it has been place
> 
> in its 
>         own file system it doesn't mean that it is not going to a busy
> disk. 
> Since 
>         you have 0+1, The slices may belong to a HOT disk. What you need
> to 
> do is 
>         ask your SA to monitor the disk I/O usage and see whats happening
> to 
> the 
>         disk containing the redo log file. Also you can use iostat -xPnce
> 5 
> 5 (I 
>         think its available in AIX). the output will look something like 
> 
>         us sy wt id 
>          23  7 49 21 
>                                       extended device statistics ----
> errors 
> --- 
>           r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w
> trn 
> tot 
>         device 
>           0.0  0.0    0.0    0.0  0.0  0.0    0.0   13.4   0   0   0   0
> 0 
> 0 
>         c1t10d0s0 
>           0.0  0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0   0   0
> 0 
> 0 
>         c1t10d0s1 
>           0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0
> 0 
> 0 
>         c1t10d0s2 
>           0.0  1.3    0.3   10.1  0.0  0.1    4.6  105.8   0   1   0   0
> 0 
> 0 
>         c1t10d0s3 
>         Watch out for %b column. If its somewhere around 60 then you may
> be 
>         experiencing contention on those slices. Ask you SA by giving 
> him/her the 
>         device name and they will be able to give you what the FS is. 
> 
>         NOTE - 4 Datafiles Containing a Particular Table & it's 
> Corresponding indexes (to Which INSERTS 
>         happen ) were Moved out of the RAID onto a Local Disk whereafter 
> performance Jumped from 
>         850 per minute to 1100  per minute 
> 
>         Ans - OK . Will Issue the Command 
> 
>         Also I see 
>         Free Buffer waits. This is caused because there are no free
> buffers 
> when 
>         they are requested - pointing directly to the inefficiency of the 
> DBWR. 
>         Again the DBWR may be slow because you have a I/O bottleneck for
> the 
> DB 
>         files and indirectly because of the Write complete waits. 
> 
>         Qs. Would you recommend Setting Multiple db_writer processes 
>         even though async_io is set to true ? 
> 
> 
>                 " Thus a Volume Group Contains 8 Disks with 4 Disks 
>                 > Stripe Size = 4K " 
> 
> 
>         [Vivek]  Please confirm me that the Stripe size is 4K. Usually you
> 
> will have 
>         a stripe size of 32K,64K,128K,256K .... If you are having a stripe
> 
> size of 
>         4K and your db_block_size as 8K, It is going to take 2 reads to 
> satisfy one 
>         request for a db_file_sequential_read (Usually requests 1 Oracle 
> block) 
>         [Vivek]  . This doesn't sound good if thats true. 
> 
>         NOTE - The "test" Database is a 4K DB_BLOCK_SIZE while the Live 
> Database is 
>         an 8K DB_BLOCK_SIZE , But the SAME performance Limits were
> Observed 
> on Both 
> 
>         So if possible could you please do the following. 
> 
>         1. Please shutdown the Test db and start fresh. 
>         2. Run the same process again and also run the iostat -xPnce 60 
> 10000 for 
>         the duration of the run (output this to a file) 
>         3. Run vmstat and see whats the CPU usage for USER and SYSTEM 
>         4. When the 500 A/c per sec reaches - Please run 
> 
>         set lines 132 
>         select sid, substr(event,1,40),p1,p2,p3 from v$session_wait 
>         where event in ('db file scattered read','db file sequential 
> read','latch 
>         free','buffer busy waits','log file sync','db file scattered 
>         read','enqueue'); 
> 
>         and give me the output.  
> 
>         OK - Will Action & revert to you 
> 
>         5. IF may waits are showing as "latch free" in the event column of
> 
>         v$session_wait, Please query v$latch for 
>         latch#,name,level,GETS,MISSES,SLEEPS IMMEDIATE_GETS 
>         ,WAITS_HOLDING_LATCH, SPIN_GETS, SLEEP1. 
> 
>         Ans OK 
> 
>         6. Preserve the output for v$latch_children. 
> 
>         Ans OK 
> 
>         Qs. Are their any IBM OS Specific Parameters which are considered 
> benefitial for performance ? 
> 
> 
>         I will take a look at both the report.txt in more detail today and
> 
> get back 
>         to you. 
> 
>         Thanks Again so much 
>         Where are you Located  / Phone No ? if you please 
>         We are Currently Located in Bangalore & Addressing The Problem at 
> Mumbai over a WAN 
>         Phone 91 80 6588668 
> 
>         HTH 
> 
>         Vivek 
> 
>         > -----Original Message----- 
>         > From: VIVEK_SHARMA [SMTP:VIVEK_SHARMA_at_infy.com] 
>         > Sent: Monday, November 27, 2000 11:16 AM 
>         > To:   'Vivek' 
>         > Cc:   Arun Chakrapani 
>         > Subject:      RE: SOS Alert 
>         > 
>         > THANKS so much for the association in this issue 
>         > 
>         > Attached 2 report.txt each taken for 1 Hour approx . Is it 
> Sufficient ? 
>         > 
>         > Attached  report.txt of  Problem on the "Live" Database 
>         > 
>         >  <<report.txt_Live>> 
>         > NOTE - A "test" Database  Created on the DB Server Itself  in
> the 
> Same 
>         > Partitions as the "Live" Database 
>         > Runs Done on this "test" Database Also failing to go beyond 500 
> A/cs per 
>         > minute 
>         > 
>         > 
>         >  <<report.txt_TEST>> 
>         > 
>         > NOTE - Interest Runs were Done on the "Test" Database During 
> Down-Time 
>         > Periods of the "Live" Database 
>         > 
>         > You Wrote :- 
>         >       Since there is some write complete waits Please take a
> look 
> at the 
>         > place the on line redo logs are and what is the read, write on 
> these 
>         > disks. 
>         > 
>         > Ans Online Redo logs ALSO present on the RAID 0+1 Disks & are 
> Multiplexed 
>         > Size of Online Redo logfile on the "Live" Database = 30 MB 
>         > 
>         > NOTE - On the test Database we increased the Size of the Online 
> Redo 
>         > logfile 
>         > to 100MB but with NO benefit 
>         > 
>         > You wrote :- 
>         > Please give me the DB Version as well 
>         > 
>         > Ans Database Oracle  version = 7.3.4.5.0 
>         > 
>         > You wrote :- 
>         >       I see some waits on More data from client. 
>         > Qs. What is to be done about the Same ? 
>         > 
>         > NOTE - Will Run the 2 SQL Scripts Suggested by you & revert back
> 
>         > 
>         > Thanks Again 
>         > 
>         > 
>         > > -----Original Message----- 
>         > > From:       Vivek [SMTP:Vivek_at_1800FLOWERS.com] 
>         > > Sent:       Monday, November 27, 2000 7:44 PM 
>         > > To: 'VIVEK_SHARMA_at_infy.com' 
>         > > 
>         > > Cc: Arun Chakrapani 
>         > > Subject:    SOS Alert 
>         > > 
>         > > Hi Vivek: 
>         > > 
>         > > Will it be possible to send the whole report.txt to me. Since 
> you have 
>         > not 
>         > > given for how long this was taken (time from utlbstat and 
> utlestat) it 
>         > is 
>         > > more difficult to say anything.  Based on the report.txt
> snippet 
> you 
>         > have 
>         > > given there seems to me there are some issues with the
> enqueues 
> and 
>         > latch 
>         > > free waits. Please run the following SQL as SYS so bet' 2 
> different 
>         > times 
>         > > ( 
>         > > say 1/2 apart) and send me both the values(It will be great if
> 
> you could 
>         > > run 
>         > > this when the batch processing is running). 
>         > > 
>         > > select KSQSTTYP,KSQSTGET,KSQSTWAT from x$ksqst where
> KSQSTGET>0 
> and 
>         > > KSQSTWAT 
>         > > >0; 
>         > > 
>         > > The delta bet' the 2 results will give you information on what
> 
> is the 
>         > > enqueue that we see more waits on. Also Please run this sql 
> given below 
>         > in 
>         > > a 
>         > > cron every minute or so and output this to a file 
>         > > 
>         > > SELECT p1 "File", p2 "Block", p3 "Reason"  FROM v$session_wait
> 
> WHERE 
>         > event 
>         > > like  'buffer busy waits'  ; 
>         > > 
>         > > This will give you and idea about the buffer busy waits. Since
> 
> there is 
>         > > some 
>         > > write complete waits Please take a look at the place the on
> line 
> redo 
>         > logs 
>         > > are and what is the read, write on these disks. Also Please
> give 
> me 
>         > > infomation on the relog file size. Please give me the DB
> Version 
> as 
>         > well. 
>         > > I 
>         > > see some waits on More data from client. Again since I don't 
> know for 
>         > how 
>         > > long the stats were run can't say whether they are significant
> 
> or not. 
>         > > 
>         > > HTH 
>         > > 
>         > > Vivek 
>         > > 
>         > > 
>         > > 
>         > > From: VIVEK_SHARMA <VIVEK_SHARMA_at_infy.com> 
>         > >  Date: Mon, 27 Nov 2000 12:11:59 +0530 
>         > >  Subject: SOS Alert 
>         > > 
>         > > CASE In a Bank , Interest Calculation Batch Processing Unable
> to 
> go 
>         > beyond 
>         > > 500 A/cs per minute 
>         > > 
>         > > CAUSE of the Problem is UNKNOWN 
>         > > 
>         > > ORACLE 7.3.4.5 on AIX 4.3.3 
>         > > DB Server - IBM S80 Model - 12 CPUs , 3 GB RAM 
>         > > APP Server 1 - IBM S80 Model - 6 CPUs , 2 GB RAM 
>         > > APP Servers 2 & 3 - IBM Thin Nodes model - 4 CPUs , 1 GB RAM 
>         > > 
>         > > Storage Box :- 
>         > > =========== 
>         > > SS Class Storage 
>         > > RAID 0+1 - (First Striped & then Mirrored) 
>         > > NOTE - 2 Storage Boxes Exist , one being the Mirror of the
> Other 
>         > > Striping exists across a set 4 Disks (in one Box) with another
> 4 
> being 
>         > > it's 
>         > > mirror 
>         > > (in another Box). 
>         > > Thus a Volume Group Contains 8 Disks with 4 Disks 
>         > > Stripe Size = 4K 
>         > >  
>         > > NOTE - Runs Tried from BOTH the APP Servers 1 OR 2 ,But with
> the 
> SAME 
>         > MAX 
>         > > of 
>         > > 500 A/cs 
>         > > processed per minute 
>         > > 
>         > > CPU Utilizations on BOTH APP & DB Server = 40 % 
>         > > wio% on BOTH APP & DB Servers = 35 % 
>         > > No paging happening on Both APP & DB Servers 
>         > > 
>         > > - Oracle Contention Values Seem Small to us as shown Below or
> so 
> they 
>         > seem 
>         > > to us :- 
>         > > 
>         > > SVRMGR> Rem System wide wait events for non-background
> processes 
> (PMON, 
>         > > SVRMGR> Rem SMON, etc).  Times are in hundreths of seconds. 
> Each one of 
>         > 
>         > > SVRMGR> Rem these is a context switch which costs CPU time.
> By 
> looking 
>         > at 
>         > > SVRMGR> Rem the Total Time you can often determine what is the
> 
>         > bottleneck 
>         > > SVRMGR> Rem that processes are waiting for.  This shows the 
> total time 
>         > > spent 
>         > > SVRMGR> Rem waiting for a specific event and the average time 
> per wait 
>         > on 
>         > > SVRMGR> Rem that event. 
>         > > SVRMGR> select      n1.event "Event Name", 
>         > >      2>             n1.event_count "Count", 
>         > >      3>     n1.time_waited "Total Time", 
>         > >      4>     round(n1.time_waited/n1.event_count, 2) "Avg Time"
> 
>         > >      5>    from stats$event n1 
>         > >      6>    where n1.event_count > 0 
>         > >      7>    order by n1.time_waited desc; 
>         > > Event Name                       Count         Total Time
> Avg 
> Time 
>         > 
>         > > -------------------------------- ------------- ------------- 
>         > ------------- 
>         > > SQL*Net message from client           10856276      31977110 
>         > 2.95 
>         > > enqueue                                   1295        374980 
>         > 289.56 
>         > > db file sequential read                3614044        303848 
>         > .08 
>         > > write complete waits                      5812        295937 
>         > 50.92 
>         > > latch free                             5045060        242170 
>         > .05 
>         > > SQL*Net more data from client            13939        165275 
>         > 11.86 
>         > > log file sync                            12794        146409 
>         > 11.44 
>         > > buffer busy waits                       100443         92477 
>         > .92 
>         > > 
>         > > 
>         > > 
>         > > - ALL Literal SQLs were Converted to using Bind variables 
>         > > - ALL Tables Running on Indexes Without Any FULL Scans
> happening 
> . 
>         > > 
>         > > - All the Literal SQLs (Dynamic) Converted to using Bind 
> variables 
>         > (Static 
>         > > Queries) 
>         > > 
>         > > - event="10181 trace name context forever, level 1000" 
>         > > NOTE - Set nevertheless , None of the SQLs taking Excessive
> time 
> to 
>         > parse 
>         > > though 
>         > > 
>         > > - NO statistics are Analyzed 
>         > > - 7 tables involved in the interest Calc. 
>         > > inserts to 2 Tables 
>         > > selects , updates in 5 tables 
>         > > 
>         > > Qs. Are there Any BASIC O.S./Hardware Features to Enhance 
> Performance on 
>         > > IBM 
>         > > AIX Systems ? 
>         > > 
>         > > Qs. Is Any Other portion of the report.txt required for
> perusal 
> ? 
>         > > 
>         > > Please Revert to me for any further info / Clarifications ? 
>         > > 
>         > > 
>         > > 
>         > >  << File: report.txt_Live >>  << File: report.txt_TEST >> 
> 
> -- 
> Please see the official ORACLE-L FAQ: <http://www.orafaq.com> 
> -- 
> Author: VIVEK_SHARMA 
>   INET: VIVEK_SHARMA_at_infy.com 
> 
> Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051 
> San Diego, California        -- Public Internet access / Mailing Lists 
> -------------------------------------------------------------------- 
> To REMOVE yourself from this mailing list, send an E-Mail message 
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in 
> the message BODY, include a line containing: UNSUB ORACLE-L 
> (or the name of mailing list you want to be removed from).  You may 
> also send the HELP command for other information (like subscribing). 
> 
------_=_NextPart_001_01C06C3D.239F2110
Content-Type: text/html;
charset="iso-8859-1"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <HTML> <HEAD> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2653.12"> <TITLE>RE: SOS Alert </TITLE> </HEAD> <BODY>
<P><FONT SIZE=2>Interesting question! My stab in the dark, shot from the hip, wild-ass guess is as follows:</FONT> </P>
<P><FONT SIZE=2>Since modern OSes can cache code segments via shared </FONT> <BR><FONT SIZE=2>libraries, there is a *very high* read to write ratio (i.e. </FONT> <BR><FONT SIZE=2>it is read once from disk into OS cache, and re-used from </FONT> <BR><FONT SIZE=2>there by many folks ) and so the "read from disk" is not a </FONT> <BR><FONT SIZE=2>significant performance issue.</FONT> </P> <P><FONT SIZE=2>Where disk config *would* become significant would be in case </FONT> <BR><FONT SIZE=2>of failure: What is faster? Recover by backup? Software </FONT> <BR><FONT SIZE=2>re-install? And, what is more likely to fail?  Software </FONT> <BR><FONT SIZE=2>striped across ten disks _without_ a mirror is more likely to </FONT> <BR><FONT SIZE=2>become unavailable than software located on one disk, just </FONT> <BR><FONT SIZE=2>because of the inverse additive effect of individual disk </FONT> <BR><FONT SIZE=2>MTBFs in a stripe set.  So, if you must stripe app binaries, </FONT><BR><FONT SIZE=2>mirror them. </FONT>
<P><FONT SIZE=2>There are other comments about code re-use in the OS, but </FONT> <BR><FONT SIZE=2>if i start on those, this won't be a "shot from the hip" </FONT> <BR><FONT SIZE=2>anymore. </FONT> </P> <P><FONT SIZE=2>Maybe Eric Pierce can Google us up some pithy and provocative </FONT><BR><FONT SIZE=2>URLs to keep us thinking on this topic!</FONT> </P>
<P><FONT SIZE=2>hth</FONT>
</P>
<P><FONT SIZE=2>Ross</FONT>
</P>
<P><FONT SIZE=2>-----Original Message-----</FONT> <BR><FONT SIZE=2>From: VIVEK_SHARMA [<A HREF="mailto:VIVEK_SHARMA_at_infy.com">mailto:VIVEK_SHARMA_at_infy.com</A>]</FONT> <BR><FONT SIZE=2>Sent: Friday, December 22, 2000 1:26 AM</FONT> <BR><FONT SIZE=2>To: 'ORACLE-L_at_fatcity.com'; 'MohanR_at_STARS-SMI.com'; 'turner_at_tellme.com';</FONT> <BR><FONT SIZE=2>'oracledba_at_lazydba.com'</FONT> <BR><FONT SIZE=2>Subject: RE: SOS Alert </FONT> </P> <BR>
<P><FONT SIZE=2>A related Basic Qs.</FONT> </P>
<P><FONT SIZE=2>Can the Application Software (  Banking Application of pro*c Executables  in</FONT> <BR><FONT SIZE=2>this Case) be put on Striped Disks belonging to the APP Server ? </FONT> </P> <P><FONT SIZE=2>Does it Have any Certain Negative performance impacts ?</FONT> <BR><FONT SIZE=2>OR is Performance the Same whether the  Application Lies on Striped Disks or</FONT> <BR><FONT SIZE=2>NON-Striped Disks ?</FONT></P>
<P><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Mohan, Ross [SMTP:MohanR_at_STARS-SMI.com]</FONT>
<BR><FONT SIZE=2>> Sent: Thursday, December 21, 2000 1:51 AM</FONT>
<BR><FONT SIZE=2>> To:   Multiple recipients of list ORACLE-L</FONT>
<BR><FONT SIZE=2>> Subject:      RE: SOS Alert </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> 4K striping is a major Major MAJOR mistake. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Doing it in software makes it at least 3x as </FONT>
<BR><FONT SIZE=2>> bad. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Either fix alone would have provided at least a </FONT>
<BR><FONT SIZE=2>> 2-4x improvement. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Given a choice, I'd fix the stripe size first, then </FONT>
<BR><FONT SIZE=2>> convert the mirroring to hardware. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> hth </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> imho </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> ymmv </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> afaik, </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> etc. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> - Ross Mohan </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> -----Original Message----- </FONT>
<BR><FONT SIZE=2>> From: VIVEK_SHARMA [ <<A HREF="mailto:VIVEK_SHARMA_at_infy.com">mailto:VIVEK_SHARMA_at_infy.com</A>>] </FONT>
<BR><FONT SIZE=2>> Sent: Wednesday, December 20, 2000 11:13 AM </FONT>
<BR><FONT SIZE=2>> To: Multiple recipients of list ORACLE-L </FONT>
<BR><FONT SIZE=2>> Subject: RE: SOS Alert </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> CASE In a Bank , Interest Calculation Batch Processing Unable to go beyond</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> 500 A/cs per minute </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> ACTION Moved the Database to Another IBM Machine ( 4 CPU , 4 GB RAM ) with</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> HARDWARE Mirroring & 64 K Stripe Size </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> NOTE - On the Under-performing DB Server S80 Model of IBM) there exists </FONT>
<BR><FONT SIZE=2>> Software Mirroring (By the AIX O.S.) & a Stripe Size of 4K exist </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Firing Off the Interest Calculation Runs from 2 APP Servers Giving an </FONT>
<BR><FONT SIZE=2>> Overall Total  Processing of Number of 3000 A/cs per minute ( 1500 from</FONT>
<BR><FONT SIZE=2>> each </FONT>
<BR><FONT SIZE=2>> ) </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Qs. Which of the 2 i.e. Hardware Mirroring OR 64 K Striping , do you think</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> is the Cause of the Increase in performance as we put BOTH features</FONT>
<BR><FONT SIZE=2>> Together </FONT>
<BR><FONT SIZE=2>> at the SAME Time OR do Both Contribute Equally ? </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Thanks to Vivek for everything </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         -----Original Message----- </FONT>
<BR><FONT SIZE=2>>         From:   VIVEK_SHARMA </FONT>
<BR><FONT SIZE=2>>         Sent:   Thursday, December 07, 2000 6:35 PM </FONT>
<BR><FONT SIZE=2>>         Subject:        RE: SOS Alert </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         ACTION Identified the 2 indexes having the waits of "db file </FONT>
<BR><FONT SIZE=2>> sequential read" </FONT>
<BR><FONT SIZE=2>>         Created 2 NEW Tablespaces Each containing 1 Index </FONT>
<BR><FONT SIZE=2>>         Also Specified parameter log_small_entry_max_size = 0 </FONT>
<BR><FONT SIZE=2>>         NOTE - ORA 7.3.4.5 </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         RESULT Performance thruput increased to 1500 A/cs per minute from </FONT>
<BR><FONT SIZE=2>> 1000 A/cs per minute </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         ============================================== </FONT>
<BR><FONT SIZE=2>>         Allow me to begin :- </FONT>
<BR><FONT SIZE=2>>         There were 2 routes in between the APP & DB Servers . </FONT>
<BR><FONT SIZE=2>>         One had a Bandwidth of 10 MBPS while Another had a Bandwidth of</FONT>
<BR><FONT SIZE=2>> 100 </FONT>
<BR><FONT SIZE=2>> MBPS </FONT>
<BR><FONT SIZE=2>>         The 10 MBPS route was being used all this while . On Enabling the </FONT>
<BR><FONT SIZE=2>> 100 MBPS route </FONT>
<BR><FONT SIZE=2>>         (by Specifying the I.P. Address Instead of Hostname in the </FONT>
<BR><FONT SIZE=2>> tnsnames.ora & listener.ora) </FONT>
<BR><FONT SIZE=2>>         the performance has jumped to 1100 records per minute (from 500 </FONT>
<BR><FONT SIZE=2>> records per minute) </FONT>
<BR><FONT SIZE=2>>         i.e. Almost Doubled . </FONT>
<BR><FONT SIZE=2>>         100 MBPS is the MAX Possible Bandwidth Possible as per the SA </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Qs. Your Views on this please </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Additional responses Follow your Views BELOW :- </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         -----Original Message----- </FONT>
<BR><FONT SIZE=2>>         From:   Vivek [SMTP:Vivek_at_1800FLOWERS.com] </FONT>
<BR><FONT SIZE=2>>         Sent:   Tuesday, November 28, 2000 6:30 PM </FONT>
<BR><FONT SIZE=2>>         To:     'VIVEK_SHARMA' </FONT>
<BR><FONT SIZE=2>>         Cc:     Arun Chakrapani </FONT>
<BR><FONT SIZE=2>>         Subject:        RE: SOS Alert </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Vivek: </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>           Sorry for not replying to you yesterday itself. A Busy day. Took</FONT>
<BR><FONT SIZE=2>> a </FONT>
<BR><FONT SIZE=2>> look at </FONT>
<BR><FONT SIZE=2>>         you report.txt. They were having 2 different profiles. Since You</FONT>
<BR><FONT SIZE=2>> had </FONT>
<BR><FONT SIZE=2>>         mentioned that it also stalled in  Test DB I concentrated more on </FONT>
<BR><FONT SIZE=2>> the test </FONT>
<BR><FONT SIZE=2>>         report.txt. One thing which stood out on the other one (live) on </FONT>
<BR><FONT SIZE=2>> opening you </FONT>
<BR><FONT SIZE=2>>         report.txt is the SQL _AREA GET_HIT_RATIO is very poor. I don't</FONT>
<BR><FONT SIZE=2>> know </FONT>
<BR><FONT SIZE=2>> whether </FONT>
<BR><FONT SIZE=2>>         you have already pinned your SYS objects and other objects that</FONT>
<BR><FONT SIZE=2>> are </FONT>
<BR><FONT SIZE=2>> executed </FONT>
<BR><FONT SIZE=2>>         often in shared pool. Please use DBMS_SHARED_POOL.keep to pin </FONT>
<BR><FONT SIZE=2>>         package,procedure,function,sequence,cursors, triggers that are</FONT>
<BR><FONT SIZE=2>> used </FONT>
<BR><FONT SIZE=2>> often. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Ans OK will do so </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         So now I will concentrate on the Test DB. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         1. Even though the SQL_AREA GET_HIT_RATIO is higer(82%) when </FONT>
<BR><FONT SIZE=2>> compared to </FONT>
<BR><FONT SIZE=2>>         live this is not good enough. It should be close to 100%(1 some </FONT>
<BR><FONT SIZE=2>> where like </FONT>
<BR><FONT SIZE=2>>         .98 alteast). Please pin the SYS objects to start with. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         OK </FONT>
<BR><FONT SIZE=2>>         Allow me to say that the shared_pool_size has by experience been </FONT>
<BR><FONT SIZE=2>> kept at bare minimum required </FONT>
<BR><FONT SIZE=2>>         for we did experience performance poblems with a shred_pool_size</FONT>
<BR><FONT SIZE=2>> of </FONT>
<BR><FONT SIZE=2>> 300 MB </FONT>
<BR><FONT SIZE=2>>         ( Though on Another machine on SUN OS ) </FONT>
<BR><FONT SIZE=2>>         It is recommended to keep shared_pool_size at rock bottom just</FONT>
<BR><FONT SIZE=2>> above </FONT>
<BR><FONT SIZE=2>> the Limit </FONT>
<BR><FONT SIZE=2>>         where ORA-4031 out of shared pool segments occurs </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         2. On seeing you system wide wait events the following stands out.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>             1.log file sync                    2863        83922</FONT>
<BR><FONT SIZE=2>> 29.31 </FONT>
<BR><FONT SIZE=2>>             2.write complete waits             905         71401      78.9</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>             3.free buffer waits                60          4718</FONT>
<BR><FONT SIZE=2>> 78.63 </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>             The first event log file sync event mostly comes when there is</FONT>
<BR><FONT SIZE=2>> a </FONT>
<BR><FONT SIZE=2>> disk </FONT>
<BR><FONT SIZE=2>>         i/o bottleneck for the online redo logs.This happens when the </FONT>
<BR><FONT SIZE=2>> processes have </FONT>
<BR><FONT SIZE=2>>         to wait for the LGWR to sync a particular REDO log. The p1</FONT>
<BR><FONT SIZE=2>> parameter </FONT>
<BR><FONT SIZE=2>> in a </FONT>
<BR><FONT SIZE=2>>         Log sync in the corresponding LOg Buffer. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         The Write complete waits comes </FONT>
<BR><FONT SIZE=2>>         when the user process is waiting for the LGWR to complete the </FONT>
<BR><FONT SIZE=2>> writing of the </FONT>
<BR><FONT SIZE=2>>         redo information.  So Please take a look at the placing of the </FONT>
<BR><FONT SIZE=2>> online redo </FONT>
<BR><FONT SIZE=2>>         log. In which disk have you placed this. Also if it has been place</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> in its </FONT>
<BR><FONT SIZE=2>>         own file system it doesn't mean that it is not going to a busy</FONT>
<BR><FONT SIZE=2>> disk. </FONT>
<BR><FONT SIZE=2>> Since </FONT>
<BR><FONT SIZE=2>>         you have 0+1, The slices may belong to a HOT disk. What you need</FONT>
<BR><FONT SIZE=2>> to </FONT>
<BR><FONT SIZE=2>> do is </FONT>
<BR><FONT SIZE=2>>         ask your SA to monitor the disk I/O usage and see whats happening</FONT>
<BR><FONT SIZE=2>> to </FONT>
<BR><FONT SIZE=2>> the </FONT>
<BR><FONT SIZE=2>>         disk containing the redo log file. Also you can use iostat -xPnce</FONT>
<BR><FONT SIZE=2>> 5 </FONT>
<BR><FONT SIZE=2>> 5 (I </FONT>
<BR><FONT SIZE=2>>         think its available in AIX). the output will look something like </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         us sy wt id </FONT>
<BR><FONT SIZE=2>>          23  7 49 21 </FONT>
<BR><FONT SIZE=2>>                                       extended device statistics ----</FONT>
<BR><FONT SIZE=2>> errors </FONT>
<BR><FONT SIZE=2>> --- </FONT>
<BR><FONT SIZE=2>>           r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w</FONT>
<BR><FONT SIZE=2>> trn </FONT>
<BR><FONT SIZE=2>> tot </FONT>
<BR><FONT SIZE=2>>         device </FONT>
<BR><FONT SIZE=2>>           0.0  0.0    0.0    0.0  0.0  0.0    0.0   13.4   0   0   0   0</FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>>         c1t10d0s0 </FONT>
<BR><FONT SIZE=2>>           0.0  0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0   0   0</FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>>         c1t10d0s1 </FONT>
<BR><FONT SIZE=2>>           0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0</FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>>         c1t10d0s2 </FONT>
<BR><FONT SIZE=2>>           0.0  1.3    0.3   10.1  0.0  0.1    4.6  105.8   0   1   0   0</FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>> 0 </FONT>
<BR><FONT SIZE=2>>         c1t10d0s3 </FONT>
<BR><FONT SIZE=2>>         Watch out for %b column. If its somewhere around 60 then you may</FONT>
<BR><FONT SIZE=2>> be </FONT>
<BR><FONT SIZE=2>>         experiencing contention on those slices. Ask you SA by giving </FONT>
<BR><FONT SIZE=2>> him/her the </FONT>
<BR><FONT SIZE=2>>         device name and they will be able to give you what the FS is. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         NOTE - 4 Datafiles Containing a Particular Table & it's </FONT>
<BR><FONT SIZE=2>> Corresponding indexes (to Which INSERTS </FONT>
<BR><FONT SIZE=2>>         happen ) were Moved out of the RAID onto a Local Disk whereafter </FONT>
<BR><FONT SIZE=2>> performance Jumped from </FONT>
<BR><FONT SIZE=2>>         850 per minute to 1100  per minute </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Ans - OK . Will Issue the Command </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Also I see </FONT>
<BR><FONT SIZE=2>>         Free Buffer waits. This is caused because there are no free</FONT>
<BR><FONT SIZE=2>> buffers </FONT>
<BR><FONT SIZE=2>> when </FONT>
<BR><FONT SIZE=2>>         they are requested - pointing directly to the inefficiency of the </FONT>
<BR><FONT SIZE=2>> DBWR. </FONT>
<BR><FONT SIZE=2>>         Again the DBWR may be slow because you have a I/O bottleneck for</FONT>
<BR><FONT SIZE=2>> the </FONT>
<BR><FONT SIZE=2>> DB </FONT>
<BR><FONT SIZE=2>>         files and indirectly because of the Write complete waits. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Qs. Would you recommend Setting Multiple db_writer processes </FONT>
<BR><FONT SIZE=2>>         even though async_io is set to true ? </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>                 " Thus a Volume Group Contains 8 Disks with 4 Disks </FONT>
<BR><FONT SIZE=2>>                 > Stripe Size = 4K " </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         [Vivek]  Please confirm me that the Stripe size is 4K. Usually you</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> will have </FONT>
<BR><FONT SIZE=2>>         a stripe size of 32K,64K,128K,256K .... If you are having a stripe</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> size of </FONT>
<BR><FONT SIZE=2>>         4K and your db_block_size as 8K, It is going to take 2 reads to </FONT>
<BR><FONT SIZE=2>> satisfy one </FONT>
<BR><FONT SIZE=2>>         request for a db_file_sequential_read (Usually requests 1 Oracle </FONT>
<BR><FONT SIZE=2>> block) </FONT>
<BR><FONT SIZE=2>>         [Vivek]  . This doesn't sound good if thats true. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         NOTE - The "test" Database is a 4K DB_BLOCK_SIZE while the Live </FONT>
<BR><FONT SIZE=2>> Database is </FONT>
<BR><FONT SIZE=2>>         an 8K DB_BLOCK_SIZE , But the SAME performance Limits were</FONT>
<BR><FONT SIZE=2>> Observed </FONT>
<BR><FONT SIZE=2>> on Both </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         So if possible could you please do the following. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         1. Please shutdown the Test db and start fresh. </FONT>
<BR><FONT SIZE=2>>         2. Run the same process again and also run the iostat -xPnce 60 </FONT>
<BR><FONT SIZE=2>> 10000 for </FONT>
<BR><FONT SIZE=2>>         the duration of the run (output this to a file) </FONT>
<BR><FONT SIZE=2>>         3. Run vmstat and see whats the CPU usage for USER and SYSTEM </FONT>
<BR><FONT SIZE=2>>         4. When the 500 A/c per sec reaches - Please run </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         set lines 132 </FONT>
<BR><FONT SIZE=2>>         select sid, substr(event,1,40),p1,p2,p3 from v$session_wait </FONT>
<BR><FONT SIZE=2>>         where event in ('db file scattered read','db file sequential </FONT>
<BR><FONT SIZE=2>> read','latch </FONT>
<BR><FONT SIZE=2>>         free','buffer busy waits','log file sync','db file scattered </FONT>
<BR><FONT SIZE=2>>         read','enqueue'); </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         and give me the output.  </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         OK - Will Action & revert to you </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         5. IF may waits are showing as "latch free" in the event column of</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         v$session_wait, Please query v$latch for </FONT>
<BR><FONT SIZE=2>>         latch#,name,level,GETS,MISSES,SLEEPS IMMEDIATE_GETS </FONT>
<BR><FONT SIZE=2>>         ,WAITS_HOLDING_LATCH, SPIN_GETS, SLEEP1. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Ans OK </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         6. Preserve the output for v$latch_children. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Ans OK </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Qs. Are their any IBM OS Specific Parameters which are considered </FONT>
<BR><FONT SIZE=2>> benefitial for performance ? </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         I will take a look at both the report.txt in more detail today and</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> get back </FONT>
<BR><FONT SIZE=2>>         to you. </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Thanks Again so much </FONT>
<BR><FONT SIZE=2>>         Where are you Located  / Phone No ? if you please </FONT>
<BR><FONT SIZE=2>>         We are Currently Located in Bangalore & Addressing The Problem at </FONT>
<BR><FONT SIZE=2>> Mumbai over a WAN </FONT>
<BR><FONT SIZE=2>>         Phone 91 80 6588668 </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         HTH </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         Vivek </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         > -----Original Message----- </FONT>
<BR><FONT SIZE=2>>         > From: VIVEK_SHARMA [SMTP:VIVEK_SHARMA_at_infy.com] </FONT>
<BR><FONT SIZE=2>>         > Sent: Monday, November 27, 2000 11:16 AM </FONT>
<BR><FONT SIZE=2>>         > To:   'Vivek' </FONT>
<BR><FONT SIZE=2>>         > Cc:   Arun Chakrapani </FONT>
<BR><FONT SIZE=2>>         > Subject:      RE: SOS Alert </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > THANKS so much for the association in this issue </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > Attached 2 report.txt each taken for 1 Hour approx . Is it </FONT>
<BR><FONT SIZE=2>> Sufficient ? </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > Attached  report.txt of  Problem on the "Live" Database </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         >  <<report.txt_Live>> </FONT>
<BR><FONT SIZE=2>>         > NOTE - A "test" Database  Created on the DB Server Itself  in</FONT>
<BR><FONT SIZE=2>> the </FONT>
<BR><FONT SIZE=2>> Same </FONT>
<BR><FONT SIZE=2>>         > Partitions as the "Live" Database </FONT>
<BR><FONT SIZE=2>>         > Runs Done on this "test" Database Also failing to go beyond 500 </FONT>
<BR><FONT SIZE=2>> A/cs per </FONT>
<BR><FONT SIZE=2>>         > minute </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         >  <<report.txt_TEST>> </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > NOTE - Interest Runs were Done on the "Test" Database During </FONT>
<BR><FONT SIZE=2>> Down-Time </FONT>
<BR><FONT SIZE=2>>         > Periods of the "Live" Database </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > You Wrote :- </FONT>
<BR><FONT SIZE=2>>         >       Since there is some write complete waits Please take a</FONT>
<BR><FONT SIZE=2>> look </FONT>
<BR><FONT SIZE=2>> at the </FONT>
<BR><FONT SIZE=2>>         > place the on line redo logs are and what is the read, write on </FONT>
<BR><FONT SIZE=2>> these </FONT>
<BR><FONT SIZE=2>>         > disks. </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > Ans Online Redo logs ALSO present on the RAID 0+1 Disks & are </FONT>
<BR><FONT SIZE=2>> Multiplexed </FONT>
<BR><FONT SIZE=2>>         > Size of Online Redo logfile on the "Live" Database = 30 MB </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > NOTE - On the test Database we increased the Size of the Online </FONT>
<BR><FONT SIZE=2>> Redo </FONT>
<BR><FONT SIZE=2>>         > logfile </FONT>
<BR><FONT SIZE=2>>         > to 100MB but with NO benefit </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > You wrote :- </FONT>
<BR><FONT SIZE=2>>         > Please give me the DB Version as well </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > Ans Database Oracle  version = 7.3.4.5.0 </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > You wrote :- </FONT>
<BR><FONT SIZE=2>>         >       I see some waits on More data from client. </FONT>
<BR><FONT SIZE=2>>         > Qs. What is to be done about the Same ? </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > NOTE - Will Run the 2 SQL Scripts Suggested by you & revert back</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > Thanks Again </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > > -----Original Message----- </FONT>
<BR><FONT SIZE=2>>         > > From:       Vivek [SMTP:Vivek_at_1800FLOWERS.com] </FONT>
<BR><FONT SIZE=2>>         > > Sent:       Monday, November 27, 2000 7:44 PM </FONT>
<BR><FONT SIZE=2>>         > > To: 'VIVEK_SHARMA_at_infy.com' </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Cc: Arun Chakrapani </FONT>
<BR><FONT SIZE=2>>         > > Subject:    SOS Alert </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Hi Vivek: </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Will it be possible to send the whole report.txt to me. Since </FONT>
<BR><FONT SIZE=2>> you have </FONT>
<BR><FONT SIZE=2>>         > not </FONT>
<BR><FONT SIZE=2>>         > > given for how long this was taken (time from utlbstat and </FONT>
<BR><FONT SIZE=2>> utlestat) it </FONT>
<BR><FONT SIZE=2>>         > is </FONT>
<BR><FONT SIZE=2>>         > > more difficult to say anything.  Based on the report.txt</FONT>
<BR><FONT SIZE=2>> snippet </FONT>
<BR><FONT SIZE=2>> you </FONT>
<BR><FONT SIZE=2>>         > have </FONT>
<BR><FONT SIZE=2>>         > > given there seems to me there are some issues with the</FONT>
<BR><FONT SIZE=2>> enqueues </FONT>
<BR><FONT SIZE=2>> and </FONT>
<BR><FONT SIZE=2>>         > latch </FONT>
<BR><FONT SIZE=2>>         > > free waits. Please run the following SQL as SYS so bet' 2 </FONT>
<BR><FONT SIZE=2>> different </FONT>
<BR><FONT SIZE=2>>         > times </FONT>
<BR><FONT SIZE=2>>         > > ( </FONT>
<BR><FONT SIZE=2>>         > > say 1/2 apart) and send me both the values(It will be great if</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> you could </FONT>
<BR><FONT SIZE=2>>         > > run </FONT>
<BR><FONT SIZE=2>>         > > this when the batch processing is running). </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > select KSQSTTYP,KSQSTGET,KSQSTWAT from x$ksqst where</FONT>
<BR><FONT SIZE=2>> KSQSTGET>0 </FONT>
<BR><FONT SIZE=2>> and </FONT>
<BR><FONT SIZE=2>>         > > KSQSTWAT </FONT>
<BR><FONT SIZE=2>>         > > >0; </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > The delta bet' the 2 results will give you information on what</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> is the </FONT>
<BR><FONT SIZE=2>>         > > enqueue that we see more waits on. Also Please run this sql </FONT>
<BR><FONT SIZE=2>> given below </FONT>
<BR><FONT SIZE=2>>         > in </FONT>
<BR><FONT SIZE=2>>         > > a </FONT>
<BR><FONT SIZE=2>>         > > cron every minute or so and output this to a file </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > SELECT p1 "File", p2 "Block", p3 "Reason"  FROM v$session_wait</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> WHERE </FONT>
<BR><FONT SIZE=2>>         > event </FONT>
<BR><FONT SIZE=2>>         > > like  'buffer busy waits'  ; </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > This will give you and idea about the buffer busy waits. Since</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> there is </FONT>
<BR><FONT SIZE=2>>         > > some </FONT>
<BR><FONT SIZE=2>>         > > write complete waits Please take a look at the place the on</FONT>
<BR><FONT SIZE=2>> line </FONT>
<BR><FONT SIZE=2>> redo </FONT>
<BR><FONT SIZE=2>>         > logs </FONT>
<BR><FONT SIZE=2>>         > > are and what is the read, write on these disks. Also Please</FONT>
<BR><FONT SIZE=2>> give </FONT>
<BR><FONT SIZE=2>> me </FONT>
<BR><FONT SIZE=2>>         > > infomation on the relog file size. Please give me the DB</FONT>
<BR><FONT SIZE=2>> Version </FONT>
<BR><FONT SIZE=2>> as </FONT>
<BR><FONT SIZE=2>>         > well. </FONT>
<BR><FONT SIZE=2>>         > > I </FONT>
<BR><FONT SIZE=2>>         > > see some waits on More data from client. Again since I don't </FONT>
<BR><FONT SIZE=2>> know for </FONT>
<BR><FONT SIZE=2>>         > how </FONT>
<BR><FONT SIZE=2>>         > > long the stats were run can't say whether they are significant</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> or not. </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > HTH </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Vivek </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > From: VIVEK_SHARMA <VIVEK_SHARMA_at_infy.com> </FONT>
<BR><FONT SIZE=2>>         > >  Date: Mon, 27 Nov 2000 12:11:59 +0530 </FONT>
<BR><FONT SIZE=2>>         > >  Subject: SOS Alert </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > CASE In a Bank , Interest Calculation Batch Processing Unable</FONT>
<BR><FONT SIZE=2>> to </FONT>
<BR><FONT SIZE=2>> go </FONT>
<BR><FONT SIZE=2>>         > beyond </FONT>
<BR><FONT SIZE=2>>         > > 500 A/cs per minute </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > CAUSE of the Problem is UNKNOWN </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > ORACLE 7.3.4.5 on AIX 4.3.3 </FONT>
<BR><FONT SIZE=2>>         > > DB Server - IBM S80 Model - 12 CPUs , 3 GB RAM </FONT>
<BR><FONT SIZE=2>>         > > APP Server 1 - IBM S80 Model - 6 CPUs , 2 GB RAM </FONT>
<BR><FONT SIZE=2>>         > > APP Servers 2 & 3 - IBM Thin Nodes model - 4 CPUs , 1 GB RAM </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Storage Box :- </FONT>
<BR><FONT SIZE=2>>         > > =========== </FONT>
<BR><FONT SIZE=2>>         > > SS Class Storage </FONT>
<BR><FONT SIZE=2>>         > > RAID 0+1 - (First Striped & then Mirrored) </FONT>
<BR><FONT SIZE=2>>         > > NOTE - 2 Storage Boxes Exist , one being the Mirror of the</FONT>
<BR><FONT SIZE=2>> Other </FONT>
<BR><FONT SIZE=2>>         > > Striping exists across a set 4 Disks (in one Box) with another</FONT>
<BR><FONT SIZE=2>> 4 </FONT>
<BR><FONT SIZE=2>> being </FONT>
<BR><FONT SIZE=2>>         > > it's </FONT>
<BR><FONT SIZE=2>>         > > mirror </FONT>
<BR><FONT SIZE=2>>         > > (in another Box). </FONT>
<BR><FONT SIZE=2>>         > > Thus a Volume Group Contains 8 Disks with 4 Disks </FONT>
<BR><FONT SIZE=2>>         > > Stripe Size = 4K </FONT>
<BR><FONT SIZE=2>>         > >  </FONT>
<BR><FONT SIZE=2>>         > > NOTE - Runs Tried from BOTH the APP Servers 1 OR 2 ,But with</FONT>
<BR><FONT SIZE=2>> the </FONT>
<BR><FONT SIZE=2>> SAME </FONT>
<BR><FONT SIZE=2>>         > MAX </FONT>
<BR><FONT SIZE=2>>         > > of </FONT>
<BR><FONT SIZE=2>>         > > 500 A/cs </FONT>
<BR><FONT SIZE=2>>         > > processed per minute </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > CPU Utilizations on BOTH APP & DB Server = 40 % </FONT>
<BR><FONT SIZE=2>>         > > wio% on BOTH APP & DB Servers = 35 % </FONT>
<BR><FONT SIZE=2>>         > > No paging happening on Both APP & DB Servers </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > - Oracle Contention Values Seem Small to us as shown Below or</FONT>
<BR><FONT SIZE=2>> so </FONT>
<BR><FONT SIZE=2>> they </FONT>
<BR><FONT SIZE=2>>         > seem </FONT>
<BR><FONT SIZE=2>>         > > to us :- </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem System wide wait events for non-background</FONT>
<BR><FONT SIZE=2>> processes </FONT>
<BR><FONT SIZE=2>> (PMON, </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem SMON, etc).  Times are in hundreths of seconds. </FONT>
<BR><FONT SIZE=2>> Each one of </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem these is a context switch which costs CPU time.</FONT>
<BR><FONT SIZE=2>> By </FONT>
<BR><FONT SIZE=2>> looking </FONT>
<BR><FONT SIZE=2>>         > at </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem the Total Time you can often determine what is the</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         > bottleneck </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem that processes are waiting for.  This shows the </FONT>
<BR><FONT SIZE=2>> total time </FONT>
<BR><FONT SIZE=2>>         > > spent </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem waiting for a specific event and the average time </FONT>
<BR><FONT SIZE=2>> per wait </FONT>
<BR><FONT SIZE=2>>         > on </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> Rem that event. </FONT>
<BR><FONT SIZE=2>>         > > SVRMGR> select      n1.event "Event Name", </FONT>
<BR><FONT SIZE=2>>         > >      2>             n1.event_count "Count", </FONT>
<BR><FONT SIZE=2>>         > >      3>     n1.time_waited "Total Time", </FONT>
<BR><FONT SIZE=2>>         > >      4>     round(n1.time_waited/n1.event_count, 2) "Avg Time"</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>>         > >      5>    from stats$event n1 </FONT>
<BR><FONT SIZE=2>>         > >      6>    where n1.event_count > 0 </FONT>
<BR><FONT SIZE=2>>         > >      7>    order by n1.time_waited desc; </FONT>
<BR><FONT SIZE=2>>         > > Event Name                       Count         Total Time</FONT>
<BR><FONT SIZE=2>> Avg </FONT>
<BR><FONT SIZE=2>> Time </FONT>
<BR><FONT SIZE=2>>         > </FONT>
<BR><FONT SIZE=2>>         > > -------------------------------- ------------- ------------- </FONT>
<BR><FONT SIZE=2>>         > ------------- </FONT>
<BR><FONT SIZE=2>>         > > SQL*Net message from client           10856276      31977110 </FONT>
<BR><FONT SIZE=2>>         > 2.95 </FONT>
<BR><FONT SIZE=2>>         > > enqueue                                   1295        374980 </FONT>
<BR><FONT SIZE=2>>         > 289.56 </FONT>
<BR><FONT SIZE=2>>         > > db file sequential read                3614044        303848 </FONT>
<BR><FONT SIZE=2>>         > .08 </FONT>
<BR><FONT SIZE=2>>         > > write complete waits                      5812        295937 </FONT>
<BR><FONT SIZE=2>>         > 50.92 </FONT>
<BR><FONT SIZE=2>>         > > latch free                             5045060        242170 </FONT>
<BR><FONT SIZE=2>>         > .05 </FONT>
<BR><FONT SIZE=2>>         > > SQL*Net more data from client            13939        165275 </FONT>
<BR><FONT SIZE=2>>         > 11.86 </FONT>
<BR><FONT SIZE=2>>         > > log file sync                            12794        146409 </FONT>
<BR><FONT SIZE=2>>         > 11.44 </FONT>
<BR><FONT SIZE=2>>         > > buffer busy waits                       100443         92477 </FONT>
<BR><FONT SIZE=2>>         > .92 </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > - ALL Literal SQLs were Converted to using Bind variables </FONT>
<BR><FONT SIZE=2>>         > > - ALL Tables Running on Indexes Without Any FULL Scans</FONT>
<BR><FONT SIZE=2>> happening </FONT>
<BR><FONT SIZE=2>> . </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > - All the Literal SQLs (Dynamic) Converted to using Bind </FONT>
<BR><FONT SIZE=2>> variables </FONT>
<BR><FONT SIZE=2>>         > (Static </FONT>
<BR><FONT SIZE=2>>         > > Queries) </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > - event="10181 trace name context forever, level 1000" </FONT>
<BR><FONT SIZE=2>>         > > NOTE - Set nevertheless , None of the SQLs taking Excessive</FONT>
<BR><FONT SIZE=2>> time </FONT>
<BR><FONT SIZE=2>> to </FONT>
<BR><FONT SIZE=2>>         > parse </FONT>
<BR><FONT SIZE=2>>         > > though </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > - NO statistics are Analyzed </FONT>
<BR><FONT SIZE=2>>         > > - 7 tables involved in the interest Calc. </FONT>
<BR><FONT SIZE=2>>         > > inserts to 2 Tables </FONT>
<BR><FONT SIZE=2>>         > > selects , updates in 5 tables </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Qs. Are there Any BASIC O.S./Hardware Features to Enhance </FONT>
<BR><FONT SIZE=2>> Performance on </FONT>
<BR><FONT SIZE=2>>         > > IBM </FONT>
<BR><FONT SIZE=2>>         > > AIX Systems ? </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Qs. Is Any Other portion of the report.txt required for</FONT>
<BR><FONT SIZE=2>> perusal </FONT>
<BR><FONT SIZE=2>> ? </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > Please Revert to me for any further info / Clarifications ? </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > > </FONT>
<BR><FONT SIZE=2>>         > >  << File: report.txt_Live >>  << File: report.txt_TEST >> </FONT>
Received on Fri Dec 22 2000 - 11:32:21 CST
|  |  |