Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Mailing Lists -> Oracle-L -> RE: SOS Alert
This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.
------_=_NextPart_001_01C06C3D.239F2110
Content-Type: text/plain;
charset="iso-8859-1"
Interesting question! My stab in the dark, shot from the hip, wild-ass guess is as follows:
Since modern OSes can cache code segments via shared libraries, there is a *very high* read to write ratio (i.e. it is read once from disk into OS cache, and re-used from there by many folks ) and so the "read from disk" is not a significant performance issue.
Where disk config *would* become significant would be in case of failure: What is faster? Recover by backup? Software re-install? And, what is more likely to fail? Software striped across ten disks _without_ a mirror is more likely to become unavailable than software located on one disk, just because of the inverse additive effect of individual disk MTBFs in a stripe set. So, if you must stripe app binaries, mirror them.
There are other comments about code re-use in the OS, but if i start on those, this won't be a "shot from the hip" anymore.
Maybe Eric Pierce can Google us up some pithy and provocative URLs to keep us thinking on this topic!
hth
Ross
-----Original Message-----
From: VIVEK_SHARMA [mailto:VIVEK_SHARMA_at_infy.com]
Sent: Friday, December 22, 2000 1:26 AM
To: 'ORACLE-L_at_fatcity.com'; 'MohanR_at_STARS-SMI.com'; 'turner_at_tellme.com';
'oracledba_at_lazydba.com'
Subject: RE: SOS Alert
A related Basic Qs.
Can the Application Software ( Banking Application of pro*c Executables in this Case) be put on Striped Disks belonging to the APP Server ?
Does it Have any Certain Negative performance impacts ? OR is Performance the Same whether the Application Lies on Striped Disks or NON-Striped Disks ?
> -----Original Message-----
> From: Mohan, Ross [SMTP:MohanR_at_STARS-SMI.com]
> Sent: Thursday, December 21, 2000 1:51 AM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: SOS Alert
>
> 4K striping is a major Major MAJOR mistake.
>
> Doing it in software makes it at least 3x as
> bad.
>
> Either fix alone would have provided at least a
> 2-4x improvement.
>
> Given a choice, I'd fix the stripe size first, then
> convert the mirroring to hardware.
>
> hth
>
> imho
>
> ymmv
>
> afaik,
>
> etc.
>
> - Ross Mohan
>
> -----Original Message-----
> From: VIVEK_SHARMA [ <mailto:VIVEK_SHARMA_at_infy.com>]
> Sent: Wednesday, December 20, 2000 11:13 AM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: SOS Alert
>
>
> CASE In a Bank , Interest Calculation Batch Processing Unable to go beyond
>
> 500 A/cs per minute
>
> ACTION Moved the Database to Another IBM Machine ( 4 CPU , 4 GB RAM ) with
>
> HARDWARE Mirroring & 64 K Stripe Size
>
> NOTE - On the Under-performing DB Server S80 Model of IBM) there exists
> Software Mirroring (By the AIX O.S.) & a Stripe Size of 4K exist
>
> Firing Off the Interest Calculation Runs from 2 APP Servers Giving an
> Overall Total Processing of Number of 3000 A/cs per minute ( 1500 from
> each
> )
>
> Qs. Which of the 2 i.e. Hardware Mirroring OR 64 K Striping , do you think
>
> is the Cause of the Increase in performance as we put BOTH features
> Together
> at the SAME Time OR do Both Contribute Equally ?
>
> Thanks to Vivek for everything
>
>
> -----Original Message-----
> From: VIVEK_SHARMA
> Sent: Thursday, December 07, 2000 6:35 PM
> Subject: RE: SOS Alert
>
>
>
> ACTION Identified the 2 indexes having the waits of "db file
> sequential read"
> Created 2 NEW Tablespaces Each containing 1 Index
> Also Specified parameter log_small_entry_max_size = 0
> NOTE - ORA 7.3.4.5
>
> RESULT Performance thruput increased to 1500 A/cs per minute from
> 1000 A/cs per minute
>
>
> ==============================================
> Allow me to begin :-
> There were 2 routes in between the APP & DB Servers .
> One had a Bandwidth of 10 MBPS while Another had a Bandwidth of
> 100
> MBPS
> The 10 MBPS route was being used all this while . On Enabling the
> 100 MBPS route
> (by Specifying the I.P. Address Instead of Hostname in the
> tnsnames.ora & listener.ora)
> the performance has jumped to 1100 records per minute (from 500
> records per minute)
> i.e. Almost Doubled .
> 100 MBPS is the MAX Possible Bandwidth Possible as per the SA
>
> Qs. Your Views on this please
>
> Additional responses Follow your Views BELOW :-
>
> -----Original Message-----
> From: Vivek [SMTP:Vivek_at_1800FLOWERS.com]
> Sent: Tuesday, November 28, 2000 6:30 PM
> To: 'VIVEK_SHARMA'
> Cc: Arun Chakrapani
> Subject: RE: SOS Alert
>
> Vivek:
>
> Sorry for not replying to you yesterday itself. A Busy day. Took
> a
> look at
> you report.txt. They were having 2 different profiles. Since You
> had
> mentioned that it also stalled in Test DB I concentrated more on
> the test
> report.txt. One thing which stood out on the other one (live) on
> opening you
> report.txt is the SQL _AREA GET_HIT_RATIO is very poor. I don't
> know
> whether
> you have already pinned your SYS objects and other objects that
> are
> executed
> often in shared pool. Please use DBMS_SHARED_POOL.keep to pin
> package,procedure,function,sequence,cursors, triggers that are
> used
> often.
>
> Ans OK will do so
>
> So now I will concentrate on the Test DB.
>
> 1. Even though the SQL_AREA GET_HIT_RATIO is higer(82%) when
> compared to
> live this is not good enough. It should be close to 100%(1 some
> where like
> .98 alteast). Please pin the SYS objects to start with.
>
> OK
> Allow me to say that the shared_pool_size has by experience been
> kept at bare minimum required
> for we did experience performance poblems with a shred_pool_size
> of
> 300 MB
> ( Though on Another machine on SUN OS )
> It is recommended to keep shared_pool_size at rock bottom just
> above
> the Limit
> where ORA-4031 out of shared pool segments occurs
>
>
> 2. On seeing you system wide wait events the following stands out.
>
>
> 1.log file sync 2863 83922
> 29.31
> 2.write complete waits 905 71401 78.9
>
> 3.free buffer waits 60 4718
> 78.63
>
> The first event log file sync event mostly comes when there is
> a
> disk
> i/o bottleneck for the online redo logs.This happens when the
> processes have
> to wait for the LGWR to sync a particular REDO log. The p1
> parameter
> in a
> Log sync in the corresponding LOg Buffer.
>
>
> The Write complete waits comes
> when the user process is waiting for the LGWR to complete the
> writing of the
> redo information. So Please take a look at the placing of the
> online redo
> log. In which disk have you placed this. Also if it has been place
>
> in its
> own file system it doesn't mean that it is not going to a busy
> disk.
> Since
> you have 0+1, The slices may belong to a HOT disk. What you need
> to
> do is
> ask your SA to monitor the disk I/O usage and see whats happening
> to
> the
> disk containing the redo log file. Also you can use iostat -xPnce
> 5
> 5 (I
> think its available in AIX). the output will look something like
>
> us sy wt id
> 23 7 49 21
> extended device statistics ----
> errors
> ---
> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w
> trn
> tot
> device
> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 13.4 0 0 0 0
> 0
> 0
> c1t10d0s0
> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 0 0
> 0
> 0
> c1t10d0s1
> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0
> 0
> 0
> c1t10d0s2
> 0.0 1.3 0.3 10.1 0.0 0.1 4.6 105.8 0 1 0 0
> 0
> 0
> c1t10d0s3
> Watch out for %b column. If its somewhere around 60 then you may
> be
> experiencing contention on those slices. Ask you SA by giving
> him/her the
> device name and they will be able to give you what the FS is.
>
> NOTE - 4 Datafiles Containing a Particular Table & it's
> Corresponding indexes (to Which INSERTS
> happen ) were Moved out of the RAID onto a Local Disk whereafter
> performance Jumped from
> 850 per minute to 1100 per minute
>
> Ans - OK . Will Issue the Command
>
> Also I see
> Free Buffer waits. This is caused because there are no free
> buffers
> when
> they are requested - pointing directly to the inefficiency of the
> DBWR.
> Again the DBWR may be slow because you have a I/O bottleneck for
> the
> DB
> files and indirectly because of the Write complete waits.
>
> Qs. Would you recommend Setting Multiple db_writer processes
> even though async_io is set to true ?
>
>
> " Thus a Volume Group Contains 8 Disks with 4 Disks
> > Stripe Size = 4K "
>
>
> [Vivek] Please confirm me that the Stripe size is 4K. Usually you
>
> will have
> a stripe size of 32K,64K,128K,256K .... If you are having a stripe
>
> size of
> 4K and your db_block_size as 8K, It is going to take 2 reads to
> satisfy one
> request for a db_file_sequential_read (Usually requests 1 Oracle
> block)
> [Vivek] . This doesn't sound good if thats true.
>
> NOTE - The "test" Database is a 4K DB_BLOCK_SIZE while the Live
> Database is
> an 8K DB_BLOCK_SIZE , But the SAME performance Limits were
> Observed
> on Both
>
> So if possible could you please do the following.
>
> 1. Please shutdown the Test db and start fresh.
> 2. Run the same process again and also run the iostat -xPnce 60
> 10000 for
> the duration of the run (output this to a file)
> 3. Run vmstat and see whats the CPU usage for USER and SYSTEM
> 4. When the 500 A/c per sec reaches - Please run
>
> set lines 132
> select sid, substr(event,1,40),p1,p2,p3 from v$session_wait
> where event in ('db file scattered read','db file sequential
> read','latch
> free','buffer busy waits','log file sync','db file scattered
> read','enqueue');
>
> and give me the output.
>
> OK - Will Action & revert to you
>
> 5. IF may waits are showing as "latch free" in the event column of
>
> v$session_wait, Please query v$latch for
> latch#,name,level,GETS,MISSES,SLEEPS IMMEDIATE_GETS
> ,WAITS_HOLDING_LATCH, SPIN_GETS, SLEEP1.
>
> Ans OK
>
> 6. Preserve the output for v$latch_children.
>
> Ans OK
>
> Qs. Are their any IBM OS Specific Parameters which are considered
> benefitial for performance ?
>
>
> I will take a look at both the report.txt in more detail today and
>
> get back
> to you.
>
> Thanks Again so much
> Where are you Located / Phone No ? if you please
> We are Currently Located in Bangalore & Addressing The Problem at
> Mumbai over a WAN
> Phone 91 80 6588668
>
> HTH
>
> Vivek
>
> > -----Original Message-----
> > From: VIVEK_SHARMA [SMTP:VIVEK_SHARMA_at_infy.com]
> > Sent: Monday, November 27, 2000 11:16 AM
> > To: 'Vivek'
> > Cc: Arun Chakrapani
> > Subject: RE: SOS Alert
> >
> > THANKS so much for the association in this issue
> >
> > Attached 2 report.txt each taken for 1 Hour approx . Is it
> Sufficient ?
> >
> > Attached report.txt of Problem on the "Live" Database
> >
> > <<report.txt_Live>>
> > NOTE - A "test" Database Created on the DB Server Itself in
> the
> Same
> > Partitions as the "Live" Database
> > Runs Done on this "test" Database Also failing to go beyond 500
> A/cs per
> > minute
> >
> >
> > <<report.txt_TEST>>
> >
> > NOTE - Interest Runs were Done on the "Test" Database During
> Down-Time
> > Periods of the "Live" Database
> >
> > You Wrote :-
> > Since there is some write complete waits Please take a
> look
> at the
> > place the on line redo logs are and what is the read, write on
> these
> > disks.
> >
> > Ans Online Redo logs ALSO present on the RAID 0+1 Disks & are
> Multiplexed
> > Size of Online Redo logfile on the "Live" Database = 30 MB
> >
> > NOTE - On the test Database we increased the Size of the Online
> Redo
> > logfile
> > to 100MB but with NO benefit
> >
> > You wrote :-
> > Please give me the DB Version as well
> >
> > Ans Database Oracle version = 7.3.4.5.0
> >
> > You wrote :-
> > I see some waits on More data from client.
> > Qs. What is to be done about the Same ?
> >
> > NOTE - Will Run the 2 SQL Scripts Suggested by you & revert back
>
> >
> > Thanks Again
> >
> >
> > > -----Original Message-----
> > > From: Vivek [SMTP:Vivek_at_1800FLOWERS.com]
> > > Sent: Monday, November 27, 2000 7:44 PM
> > > To: 'VIVEK_SHARMA_at_infy.com'
> > >
> > > Cc: Arun Chakrapani
> > > Subject: SOS Alert
> > >
> > > Hi Vivek:
> > >
> > > Will it be possible to send the whole report.txt to me. Since
> you have
> > not
> > > given for how long this was taken (time from utlbstat and
> utlestat) it
> > is
> > > more difficult to say anything. Based on the report.txt
> snippet
> you
> > have
> > > given there seems to me there are some issues with the
> enqueues
> and
> > latch
> > > free waits. Please run the following SQL as SYS so bet' 2
> different
> > times
> > > (
> > > say 1/2 apart) and send me both the values(It will be great if
>
> you could
> > > run
> > > this when the batch processing is running).
> > >
> > > select KSQSTTYP,KSQSTGET,KSQSTWAT from x$ksqst where
> KSQSTGET>0
> and
> > > KSQSTWAT
> > > >0;
> > >
> > > The delta bet' the 2 results will give you information on what
>
> is the
> > > enqueue that we see more waits on. Also Please run this sql
> given below
> > in
> > > a
> > > cron every minute or so and output this to a file
> > >
> > > SELECT p1 "File", p2 "Block", p3 "Reason" FROM v$session_wait
>
> WHERE
> > event
> > > like 'buffer busy waits' ;
> > >
> > > This will give you and idea about the buffer busy waits. Since
>
> there is
> > > some
> > > write complete waits Please take a look at the place the on
> line
> redo
> > logs
> > > are and what is the read, write on these disks. Also Please
> give
> me
> > > infomation on the relog file size. Please give me the DB
> Version
> as
> > well.
> > > I
> > > see some waits on More data from client. Again since I don't
> know for
> > how
> > > long the stats were run can't say whether they are significant
>
> or not.
> > >
> > > HTH
> > >
> > > Vivek
> > >
> > >
> > >
> > > From: VIVEK_SHARMA <VIVEK_SHARMA_at_infy.com>
> > > Date: Mon, 27 Nov 2000 12:11:59 +0530
> > > Subject: SOS Alert
> > >
> > > CASE In a Bank , Interest Calculation Batch Processing Unable
> to
> go
> > beyond
> > > 500 A/cs per minute
> > >
> > > CAUSE of the Problem is UNKNOWN
> > >
> > > ORACLE 7.3.4.5 on AIX 4.3.3
> > > DB Server - IBM S80 Model - 12 CPUs , 3 GB RAM
> > > APP Server 1 - IBM S80 Model - 6 CPUs , 2 GB RAM
> > > APP Servers 2 & 3 - IBM Thin Nodes model - 4 CPUs , 1 GB RAM
> > >
> > > Storage Box :-
> > > ===========
> > > SS Class Storage
> > > RAID 0+1 - (First Striped & then Mirrored)
> > > NOTE - 2 Storage Boxes Exist , one being the Mirror of the
> Other
> > > Striping exists across a set 4 Disks (in one Box) with another
> 4
> being
> > > it's
> > > mirror
> > > (in another Box).
> > > Thus a Volume Group Contains 8 Disks with 4 Disks
> > > Stripe Size = 4K
> > >
> > > NOTE - Runs Tried from BOTH the APP Servers 1 OR 2 ,But with
> the
> SAME
> > MAX
> > > of
> > > 500 A/cs
> > > processed per minute
> > >
> > > CPU Utilizations on BOTH APP & DB Server = 40 %
> > > wio% on BOTH APP & DB Servers = 35 %
> > > No paging happening on Both APP & DB Servers
> > >
> > > - Oracle Contention Values Seem Small to us as shown Below or
> so
> they
> > seem
> > > to us :-
> > >
> > > SVRMGR> Rem System wide wait events for non-background
> processes
> (PMON,
> > > SVRMGR> Rem SMON, etc). Times are in hundreths of seconds.
> Each one of
> >
> > > SVRMGR> Rem these is a context switch which costs CPU time.
> By
> looking
> > at
> > > SVRMGR> Rem the Total Time you can often determine what is the
>
> > bottleneck
> > > SVRMGR> Rem that processes are waiting for. This shows the
> total time
> > > spent
> > > SVRMGR> Rem waiting for a specific event and the average time
> per wait
> > on
> > > SVRMGR> Rem that event.
> > > SVRMGR> select n1.event "Event Name",
> > > 2> n1.event_count "Count",
> > > 3> n1.time_waited "Total Time",
> > > 4> round(n1.time_waited/n1.event_count, 2) "Avg Time"
>
> > > 5> from stats$event n1
> > > 6> where n1.event_count > 0
> > > 7> order by n1.time_waited desc;
> > > Event Name Count Total Time
> Avg
> Time
> >
> > > -------------------------------- ------------- -------------
> > -------------
> > > SQL*Net message from client 10856276 31977110
> > 2.95
> > > enqueue 1295 374980
> > 289.56
> > > db file sequential read 3614044 303848
> > .08
> > > write complete waits 5812 295937
> > 50.92
> > > latch free 5045060 242170
> > .05
> > > SQL*Net more data from client 13939 165275
> > 11.86
> > > log file sync 12794 146409
> > 11.44
> > > buffer busy waits 100443 92477
> > .92
> > >
> > >
> > >
> > > - ALL Literal SQLs were Converted to using Bind variables
> > > - ALL Tables Running on Indexes Without Any FULL Scans
> happening
> .
> > >
> > > - All the Literal SQLs (Dynamic) Converted to using Bind
> variables
> > (Static
> > > Queries)
> > >
> > > - event="10181 trace name context forever, level 1000"
> > > NOTE - Set nevertheless , None of the SQLs taking Excessive
> time
> to
> > parse
> > > though
> > >
> > > - NO statistics are Analyzed
> > > - 7 tables involved in the interest Calc.
> > > inserts to 2 Tables
> > > selects , updates in 5 tables
> > >
> > > Qs. Are there Any BASIC O.S./Hardware Features to Enhance
> Performance on
> > > IBM
> > > AIX Systems ?
> > >
> > > Qs. Is Any Other portion of the report.txt required for
> perusal
> ?
> > >
> > > Please Revert to me for any further info / Clarifications ?
> > >
> > >
> > >
> > > << File: report.txt_Live >> << File: report.txt_TEST >>
>
> --
> Please see the official ORACLE-L FAQ: <http://www.orafaq.com>
> --
> Author: VIVEK_SHARMA
> INET: VIVEK_SHARMA_at_infy.com
>
> Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
> San Diego, California -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from). You may
> also send the HELP command for other information (like subscribing).
>
------_=_NextPart_001_01C06C3D.239F2110
Content-Type: text/html;
charset="iso-8859-1"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <HTML> <HEAD> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2653.12"> <TITLE>RE: SOS Alert </TITLE> </HEAD> <BODY>
<P><FONT SIZE=2>Interesting question! My stab in the dark, shot from the hip, wild-ass guess is as follows:</FONT> </P>
<P><FONT SIZE=2>Since modern OSes can cache code segments via shared </FONT> <BR><FONT SIZE=2>libraries, there is a *very high* read to write ratio (i.e. </FONT> <BR><FONT SIZE=2>it is read once from disk into OS cache, and re-used from </FONT> <BR><FONT SIZE=2>there by many folks ) and so the "read from disk" is not a </FONT> <BR><FONT SIZE=2>significant performance issue.</FONT> </P> <P><FONT SIZE=2>Where disk config *would* become significant would be in case </FONT> <BR><FONT SIZE=2>of failure: What is faster? Recover by backup? Software </FONT> <BR><FONT SIZE=2>re-install? And, what is more likely to fail? Software </FONT> <BR><FONT SIZE=2>striped across ten disks _without_ a mirror is more likely to </FONT> <BR><FONT SIZE=2>become unavailable than software located on one disk, just </FONT> <BR><FONT SIZE=2>because of the inverse additive effect of individual disk </FONT> <BR><FONT SIZE=2>MTBFs in a stripe set. So, if you must stripe app binaries, </FONT><BR><FONT SIZE=2>mirror them. </FONT>
<P><FONT SIZE=2>There are other comments about code re-use in the OS, but </FONT> <BR><FONT SIZE=2>if i start on those, this won't be a "shot from the hip" </FONT> <BR><FONT SIZE=2>anymore. </FONT> </P> <P><FONT SIZE=2>Maybe Eric Pierce can Google us up some pithy and provocative </FONT><BR><FONT SIZE=2>URLs to keep us thinking on this topic!</FONT> </P>
<P><FONT SIZE=2>hth</FONT>
</P>
<P><FONT SIZE=2>Ross</FONT>
</P>
<P><FONT SIZE=2>-----Original Message-----</FONT> <BR><FONT SIZE=2>From: VIVEK_SHARMA [<A HREF="mailto:VIVEK_SHARMA_at_infy.com">mailto:VIVEK_SHARMA_at_infy.com</A>]</FONT> <BR><FONT SIZE=2>Sent: Friday, December 22, 2000 1:26 AM</FONT> <BR><FONT SIZE=2>To: 'ORACLE-L_at_fatcity.com'; 'MohanR_at_STARS-SMI.com'; 'turner_at_tellme.com';</FONT> <BR><FONT SIZE=2>'oracledba_at_lazydba.com'</FONT> <BR><FONT SIZE=2>Subject: RE: SOS Alert </FONT> </P> <BR>
<P><FONT SIZE=2>A related Basic Qs.</FONT> </P>
<P><FONT SIZE=2>Can the Application Software ( Banking Application of pro*c Executables in</FONT> <BR><FONT SIZE=2>this Case) be put on Striped Disks belonging to the APP Server ? </FONT> </P> <P><FONT SIZE=2>Does it Have any Certain Negative performance impacts ?</FONT> <BR><FONT SIZE=2>OR is Performance the Same whether the Application Lies on Striped Disks or</FONT> <BR><FONT SIZE=2>NON-Striped Disks ?</FONT></P>
<P><FONT SIZE=2>> -----Original Message-----</FONT> <BR><FONT SIZE=2>> From: Mohan, Ross [SMTP:MohanR_at_STARS-SMI.com]</FONT> <BR><FONT SIZE=2>> Sent: Thursday, December 21, 2000 1:51 AM</FONT> <BR><FONT SIZE=2>> To: Multiple recipients of list ORACLE-L</FONT> <BR><FONT SIZE=2>> Subject: RE: SOS Alert </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 4K striping is a major Major MAJOR mistake. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Doing it in software makes it at least 3x as </FONT> <BR><FONT SIZE=2>> bad. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Either fix alone would have provided at least a </FONT> <BR><FONT SIZE=2>> 2-4x improvement. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Given a choice, I'd fix the stripe size first, then </FONT> <BR><FONT SIZE=2>> convert the mirroring to hardware. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> hth </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> imho </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> ymmv </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> afaik, </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> etc. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> - Ross Mohan </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> -----Original Message----- </FONT> <BR><FONT SIZE=2>> From: VIVEK_SHARMA [ <<A HREF="mailto:VIVEK_SHARMA_at_infy.com">mailto:VIVEK_SHARMA_at_infy.com</A>>] </FONT> <BR><FONT SIZE=2>> Sent: Wednesday, December 20, 2000 11:13 AM </FONT> <BR><FONT SIZE=2>> To: Multiple recipients of list ORACLE-L </FONT> <BR><FONT SIZE=2>> Subject: RE: SOS Alert </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> CASE In a Bank , Interest Calculation Batch Processing Unable to go beyond</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 500 A/cs per minute </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> ACTION Moved the Database to Another IBM Machine ( 4 CPU , 4 GB RAM ) with</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> HARDWARE Mirroring & 64 K Stripe Size </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> NOTE - On the Under-performing DB Server S80 Model of IBM) there exists </FONT> <BR><FONT SIZE=2>> Software Mirroring (By the AIX O.S.) & a Stripe Size of 4K exist </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Firing Off the Interest Calculation Runs from 2 APP Servers Giving an </FONT> <BR><FONT SIZE=2>> Overall Total Processing of Number of 3000 A/cs per minute ( 1500 from</FONT> <BR><FONT SIZE=2>> each </FONT> <BR><FONT SIZE=2>> ) </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Qs. Which of the 2 i.e. Hardware Mirroring OR 64 K Striping , do you think</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> is the Cause of the Increase in performance as we put BOTH features</FONT> <BR><FONT SIZE=2>> Together </FONT> <BR><FONT SIZE=2>> at the SAME Time OR do Both Contribute Equally ? </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Thanks to Vivek for everything </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> -----Original Message----- </FONT> <BR><FONT SIZE=2>> From: VIVEK_SHARMA </FONT> <BR><FONT SIZE=2>> Sent: Thursday, December 07, 2000 6:35 PM </FONT> <BR><FONT SIZE=2>> Subject: RE: SOS Alert </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> ACTION Identified the 2 indexes having the waits of "db file </FONT> <BR><FONT SIZE=2>> sequential read" </FONT> <BR><FONT SIZE=2>> Created 2 NEW Tablespaces Each containing 1 Index </FONT> <BR><FONT SIZE=2>> Also Specified parameter log_small_entry_max_size = 0 </FONT> <BR><FONT SIZE=2>> NOTE - ORA 7.3.4.5 </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> RESULT Performance thruput increased to 1500 A/cs per minute from </FONT> <BR><FONT SIZE=2>> 1000 A/cs per minute </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> ============================================== </FONT> <BR><FONT SIZE=2>> Allow me to begin :- </FONT> <BR><FONT SIZE=2>> There were 2 routes in between the APP & DB Servers . </FONT> <BR><FONT SIZE=2>> One had a Bandwidth of 10 MBPS while Another had a Bandwidth of</FONT> <BR><FONT SIZE=2>> 100 </FONT> <BR><FONT SIZE=2>> MBPS </FONT> <BR><FONT SIZE=2>> The 10 MBPS route was being used all this while . On Enabling the </FONT> <BR><FONT SIZE=2>> 100 MBPS route </FONT> <BR><FONT SIZE=2>> (by Specifying the I.P. Address Instead of Hostname in the </FONT> <BR><FONT SIZE=2>> tnsnames.ora & listener.ora) </FONT> <BR><FONT SIZE=2>> the performance has jumped to 1100 records per minute (from 500 </FONT> <BR><FONT SIZE=2>> records per minute) </FONT> <BR><FONT SIZE=2>> i.e. Almost Doubled . </FONT> <BR><FONT SIZE=2>> 100 MBPS is the MAX Possible Bandwidth Possible as per the SA </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Qs. Your Views on this please </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Additional responses Follow your Views BELOW :- </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> -----Original Message----- </FONT> <BR><FONT SIZE=2>> From: Vivek [SMTP:Vivek_at_1800FLOWERS.com] </FONT> <BR><FONT SIZE=2>> Sent: Tuesday, November 28, 2000 6:30 PM </FONT> <BR><FONT SIZE=2>> To: 'VIVEK_SHARMA' </FONT> <BR><FONT SIZE=2>> Cc: Arun Chakrapani </FONT> <BR><FONT SIZE=2>> Subject: RE: SOS Alert </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Vivek: </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Sorry for not replying to you yesterday itself. A Busy day. Took</FONT> <BR><FONT SIZE=2>> a </FONT> <BR><FONT SIZE=2>> look at </FONT> <BR><FONT SIZE=2>> you report.txt. They were having 2 different profiles. Since You</FONT> <BR><FONT SIZE=2>> had </FONT> <BR><FONT SIZE=2>> mentioned that it also stalled in Test DB I concentrated more on </FONT> <BR><FONT SIZE=2>> the test </FONT> <BR><FONT SIZE=2>> report.txt. One thing which stood out on the other one (live) on </FONT> <BR><FONT SIZE=2>> opening you </FONT> <BR><FONT SIZE=2>> report.txt is the SQL _AREA GET_HIT_RATIO is very poor. I don't</FONT> <BR><FONT SIZE=2>> know </FONT> <BR><FONT SIZE=2>> whether </FONT> <BR><FONT SIZE=2>> you have already pinned your SYS objects and other objects that</FONT> <BR><FONT SIZE=2>> are </FONT> <BR><FONT SIZE=2>> executed </FONT> <BR><FONT SIZE=2>> often in shared pool. Please use DBMS_SHARED_POOL.keep to pin </FONT> <BR><FONT SIZE=2>> package,procedure,function,sequence,cursors, triggers that are</FONT> <BR><FONT SIZE=2>> used </FONT> <BR><FONT SIZE=2>> often. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Ans OK will do so </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> So now I will concentrate on the Test DB. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 1. Even though the SQL_AREA GET_HIT_RATIO is higer(82%) when </FONT> <BR><FONT SIZE=2>> compared to </FONT> <BR><FONT SIZE=2>> live this is not good enough. It should be close to 100%(1 some </FONT> <BR><FONT SIZE=2>> where like </FONT> <BR><FONT SIZE=2>> .98 alteast). Please pin the SYS objects to start with. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> OK </FONT> <BR><FONT SIZE=2>> Allow me to say that the shared_pool_size has by experience been </FONT> <BR><FONT SIZE=2>> kept at bare minimum required </FONT> <BR><FONT SIZE=2>> for we did experience performance poblems with a shred_pool_size</FONT> <BR><FONT SIZE=2>> of </FONT> <BR><FONT SIZE=2>> 300 MB </FONT> <BR><FONT SIZE=2>> ( Though on Another machine on SUN OS ) </FONT> <BR><FONT SIZE=2>> It is recommended to keep shared_pool_size at rock bottom just</FONT> <BR><FONT SIZE=2>> above </FONT> <BR><FONT SIZE=2>> the Limit </FONT> <BR><FONT SIZE=2>> where ORA-4031 out of shared pool segments occurs </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 2. On seeing you system wide wait events the following stands out.</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 1.log file sync 2863 83922</FONT> <BR><FONT SIZE=2>> 29.31 </FONT> <BR><FONT SIZE=2>> 2.write complete waits 905 71401 78.9</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 3.free buffer waits 60 4718</FONT> <BR><FONT SIZE=2>> 78.63 </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> The first event log file sync event mostly comes when there is</FONT> <BR><FONT SIZE=2>> a </FONT> <BR><FONT SIZE=2>> disk </FONT> <BR><FONT SIZE=2>> i/o bottleneck for the online redo logs.This happens when the </FONT> <BR><FONT SIZE=2>> processes have </FONT> <BR><FONT SIZE=2>> to wait for the LGWR to sync a particular REDO log. The p1</FONT> <BR><FONT SIZE=2>> parameter </FONT> <BR><FONT SIZE=2>> in a </FONT> <BR><FONT SIZE=2>> Log sync in the corresponding LOg Buffer. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> The Write complete waits comes </FONT> <BR><FONT SIZE=2>> when the user process is waiting for the LGWR to complete the </FONT> <BR><FONT SIZE=2>> writing of the </FONT> <BR><FONT SIZE=2>> redo information. So Please take a look at the placing of the </FONT> <BR><FONT SIZE=2>> online redo </FONT> <BR><FONT SIZE=2>> log. In which disk have you placed this. Also if it has been place</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> in its </FONT> <BR><FONT SIZE=2>> own file system it doesn't mean that it is not going to a busy</FONT> <BR><FONT SIZE=2>> disk. </FONT> <BR><FONT SIZE=2>> Since </FONT> <BR><FONT SIZE=2>> you have 0+1, The slices may belong to a HOT disk. What you need</FONT> <BR><FONT SIZE=2>> to </FONT> <BR><FONT SIZE=2>> do is </FONT> <BR><FONT SIZE=2>> ask your SA to monitor the disk I/O usage and see whats happening</FONT> <BR><FONT SIZE=2>> to </FONT> <BR><FONT SIZE=2>> the </FONT> <BR><FONT SIZE=2>> disk containing the redo log file. Also you can use iostat -xPnce</FONT> <BR><FONT SIZE=2>> 5 </FONT> <BR><FONT SIZE=2>> 5 (I </FONT> <BR><FONT SIZE=2>> think its available in AIX). the output will look something like </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> us sy wt id </FONT> <BR><FONT SIZE=2>> 23 7 49 21 </FONT> <BR><FONT SIZE=2>> extended device statistics ----</FONT> <BR><FONT SIZE=2>> errors </FONT> <BR><FONT SIZE=2>> --- </FONT> <BR><FONT SIZE=2>> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w</FONT> <BR><FONT SIZE=2>> trn </FONT> <BR><FONT SIZE=2>> tot </FONT> <BR><FONT SIZE=2>> device </FONT> <BR><FONT SIZE=2>> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 13.4 0 0 0 0</FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> c1t10d0s0 </FONT> <BR><FONT SIZE=2>> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 0 0</FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> c1t10d0s1 </FONT> <BR><FONT SIZE=2>> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0</FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> c1t10d0s2 </FONT> <BR><FONT SIZE=2>> 0.0 1.3 0.3 10.1 0.0 0.1 4.6 105.8 0 1 0 0</FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> 0 </FONT> <BR><FONT SIZE=2>> c1t10d0s3 </FONT> <BR><FONT SIZE=2>> Watch out for %b column. If its somewhere around 60 then you may</FONT> <BR><FONT SIZE=2>> be </FONT> <BR><FONT SIZE=2>> experiencing contention on those slices. Ask you SA by giving </FONT> <BR><FONT SIZE=2>> him/her the </FONT> <BR><FONT SIZE=2>> device name and they will be able to give you what the FS is. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> NOTE - 4 Datafiles Containing a Particular Table & it's </FONT> <BR><FONT SIZE=2>> Corresponding indexes (to Which INSERTS </FONT> <BR><FONT SIZE=2>> happen ) were Moved out of the RAID onto a Local Disk whereafter </FONT> <BR><FONT SIZE=2>> performance Jumped from </FONT> <BR><FONT SIZE=2>> 850 per minute to 1100 per minute </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Ans - OK . Will Issue the Command </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Also I see </FONT> <BR><FONT SIZE=2>> Free Buffer waits. This is caused because there are no free</FONT> <BR><FONT SIZE=2>> buffers </FONT> <BR><FONT SIZE=2>> when </FONT> <BR><FONT SIZE=2>> they are requested - pointing directly to the inefficiency of the </FONT> <BR><FONT SIZE=2>> DBWR. </FONT> <BR><FONT SIZE=2>> Again the DBWR may be slow because you have a I/O bottleneck for</FONT> <BR><FONT SIZE=2>> the </FONT> <BR><FONT SIZE=2>> DB </FONT> <BR><FONT SIZE=2>> files and indirectly because of the Write complete waits. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Qs. Would you recommend Setting Multiple db_writer processes </FONT> <BR><FONT SIZE=2>> even though async_io is set to true ? </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> " Thus a Volume Group Contains 8 Disks with 4 Disks </FONT> <BR><FONT SIZE=2>> > Stripe Size = 4K " </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> [Vivek] Please confirm me that the Stripe size is 4K. Usually you</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> will have </FONT> <BR><FONT SIZE=2>> a stripe size of 32K,64K,128K,256K .... If you are having a stripe</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> size of </FONT> <BR><FONT SIZE=2>> 4K and your db_block_size as 8K, It is going to take 2 reads to </FONT> <BR><FONT SIZE=2>> satisfy one </FONT> <BR><FONT SIZE=2>> request for a db_file_sequential_read (Usually requests 1 Oracle </FONT> <BR><FONT SIZE=2>> block) </FONT> <BR><FONT SIZE=2>> [Vivek] . This doesn't sound good if thats true. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> NOTE - The "test" Database is a 4K DB_BLOCK_SIZE while the Live </FONT> <BR><FONT SIZE=2>> Database is </FONT> <BR><FONT SIZE=2>> an 8K DB_BLOCK_SIZE , But the SAME performance Limits were</FONT> <BR><FONT SIZE=2>> Observed </FONT> <BR><FONT SIZE=2>> on Both </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> So if possible could you please do the following. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 1. Please shutdown the Test db and start fresh. </FONT> <BR><FONT SIZE=2>> 2. Run the same process again and also run the iostat -xPnce 60 </FONT> <BR><FONT SIZE=2>> 10000 for </FONT> <BR><FONT SIZE=2>> the duration of the run (output this to a file) </FONT> <BR><FONT SIZE=2>> 3. Run vmstat and see whats the CPU usage for USER and SYSTEM </FONT> <BR><FONT SIZE=2>> 4. When the 500 A/c per sec reaches - Please run </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> set lines 132 </FONT> <BR><FONT SIZE=2>> select sid, substr(event,1,40),p1,p2,p3 from v$session_wait </FONT> <BR><FONT SIZE=2>> where event in ('db file scattered read','db file sequential </FONT> <BR><FONT SIZE=2>> read','latch </FONT> <BR><FONT SIZE=2>> free','buffer busy waits','log file sync','db file scattered </FONT> <BR><FONT SIZE=2>> read','enqueue'); </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> and give me the output. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> OK - Will Action & revert to you </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 5. IF may waits are showing as "latch free" in the event column of</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> v$session_wait, Please query v$latch for </FONT> <BR><FONT SIZE=2>> latch#,name,level,GETS,MISSES,SLEEPS IMMEDIATE_GETS </FONT> <BR><FONT SIZE=2>> ,WAITS_HOLDING_LATCH, SPIN_GETS, SLEEP1. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Ans OK </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> 6. Preserve the output for v$latch_children. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Ans OK </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Qs. Are their any IBM OS Specific Parameters which are considered </FONT> <BR><FONT SIZE=2>> benefitial for performance ? </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> I will take a look at both the report.txt in more detail today and</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> get back </FONT> <BR><FONT SIZE=2>> to you. </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Thanks Again so much </FONT> <BR><FONT SIZE=2>> Where are you Located / Phone No ? if you please </FONT> <BR><FONT SIZE=2>> We are Currently Located in Bangalore & Addressing The Problem at </FONT> <BR><FONT SIZE=2>> Mumbai over a WAN </FONT> <BR><FONT SIZE=2>> Phone 91 80 6588668 </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> HTH </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> Vivek </FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> > -----Original Message----- </FONT> <BR><FONT SIZE=2>> > From: VIVEK_SHARMA [SMTP:VIVEK_SHARMA_at_infy.com] </FONT> <BR><FONT SIZE=2>> > Sent: Monday, November 27, 2000 11:16 AM </FONT> <BR><FONT SIZE=2>> > To: 'Vivek' </FONT> <BR><FONT SIZE=2>> > Cc: Arun Chakrapani </FONT> <BR><FONT SIZE=2>> > Subject: RE: SOS Alert </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > THANKS so much for the association in this issue </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > Attached 2 report.txt each taken for 1 Hour approx . Is it </FONT> <BR><FONT SIZE=2>> Sufficient ? </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > Attached report.txt of Problem on the "Live" Database </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > <<report.txt_Live>> </FONT> <BR><FONT SIZE=2>> > NOTE - A "test" Database Created on the DB Server Itself in</FONT> <BR><FONT SIZE=2>> the </FONT> <BR><FONT SIZE=2>> Same </FONT> <BR><FONT SIZE=2>> > Partitions as the "Live" Database </FONT> <BR><FONT SIZE=2>> > Runs Done on this "test" Database Also failing to go beyond 500 </FONT> <BR><FONT SIZE=2>> A/cs per </FONT> <BR><FONT SIZE=2>> > minute </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > <<report.txt_TEST>> </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > NOTE - Interest Runs were Done on the "Test" Database During </FONT> <BR><FONT SIZE=2>> Down-Time </FONT> <BR><FONT SIZE=2>> > Periods of the "Live" Database </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > You Wrote :- </FONT> <BR><FONT SIZE=2>> > Since there is some write complete waits Please take a</FONT> <BR><FONT SIZE=2>> look </FONT> <BR><FONT SIZE=2>> at the </FONT> <BR><FONT SIZE=2>> > place the on line redo logs are and what is the read, write on </FONT> <BR><FONT SIZE=2>> these </FONT> <BR><FONT SIZE=2>> > disks. </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > Ans Online Redo logs ALSO present on the RAID 0+1 Disks & are </FONT> <BR><FONT SIZE=2>> Multiplexed </FONT> <BR><FONT SIZE=2>> > Size of Online Redo logfile on the "Live" Database = 30 MB </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > NOTE - On the test Database we increased the Size of the Online </FONT> <BR><FONT SIZE=2>> Redo </FONT> <BR><FONT SIZE=2>> > logfile </FONT> <BR><FONT SIZE=2>> > to 100MB but with NO benefit </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > You wrote :- </FONT> <BR><FONT SIZE=2>> > Please give me the DB Version as well </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > Ans Database Oracle version = 7.3.4.5.0 </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > You wrote :- </FONT> <BR><FONT SIZE=2>> > I see some waits on More data from client. </FONT> <BR><FONT SIZE=2>> > Qs. What is to be done about the Same ? </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > NOTE - Will Run the 2 SQL Scripts Suggested by you & revert back</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > Thanks Again </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > > -----Original Message----- </FONT> <BR><FONT SIZE=2>> > > From: Vivek [SMTP:Vivek_at_1800FLOWERS.com] </FONT> <BR><FONT SIZE=2>> > > Sent: Monday, November 27, 2000 7:44 PM </FONT> <BR><FONT SIZE=2>> > > To: 'VIVEK_SHARMA_at_infy.com' </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Cc: Arun Chakrapani </FONT> <BR><FONT SIZE=2>> > > Subject: SOS Alert </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Hi Vivek: </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Will it be possible to send the whole report.txt to me. Since </FONT> <BR><FONT SIZE=2>> you have </FONT> <BR><FONT SIZE=2>> > not </FONT> <BR><FONT SIZE=2>> > > given for how long this was taken (time from utlbstat and </FONT> <BR><FONT SIZE=2>> utlestat) it </FONT> <BR><FONT SIZE=2>> > is </FONT> <BR><FONT SIZE=2>> > > more difficult to say anything. Based on the report.txt</FONT> <BR><FONT SIZE=2>> snippet </FONT> <BR><FONT SIZE=2>> you </FONT> <BR><FONT SIZE=2>> > have </FONT> <BR><FONT SIZE=2>> > > given there seems to me there are some issues with the</FONT> <BR><FONT SIZE=2>> enqueues </FONT> <BR><FONT SIZE=2>> and </FONT> <BR><FONT SIZE=2>> > latch </FONT> <BR><FONT SIZE=2>> > > free waits. Please run the following SQL as SYS so bet' 2 </FONT> <BR><FONT SIZE=2>> different </FONT> <BR><FONT SIZE=2>> > times </FONT> <BR><FONT SIZE=2>> > > ( </FONT> <BR><FONT SIZE=2>> > > say 1/2 apart) and send me both the values(It will be great if</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> you could </FONT> <BR><FONT SIZE=2>> > > run </FONT> <BR><FONT SIZE=2>> > > this when the batch processing is running). </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > select KSQSTTYP,KSQSTGET,KSQSTWAT from x$ksqst where</FONT> <BR><FONT SIZE=2>> KSQSTGET>0 </FONT> <BR><FONT SIZE=2>> and </FONT> <BR><FONT SIZE=2>> > > KSQSTWAT </FONT> <BR><FONT SIZE=2>> > > >0; </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > The delta bet' the 2 results will give you information on what</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> is the </FONT> <BR><FONT SIZE=2>> > > enqueue that we see more waits on. Also Please run this sql </FONT> <BR><FONT SIZE=2>> given below </FONT> <BR><FONT SIZE=2>> > in </FONT> <BR><FONT SIZE=2>> > > a </FONT> <BR><FONT SIZE=2>> > > cron every minute or so and output this to a file </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > SELECT p1 "File", p2 "Block", p3 "Reason" FROM v$session_wait</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> WHERE </FONT> <BR><FONT SIZE=2>> > event </FONT> <BR><FONT SIZE=2>> > > like 'buffer busy waits' ; </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > This will give you and idea about the buffer busy waits. Since</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> there is </FONT> <BR><FONT SIZE=2>> > > some </FONT> <BR><FONT SIZE=2>> > > write complete waits Please take a look at the place the on</FONT> <BR><FONT SIZE=2>> line </FONT> <BR><FONT SIZE=2>> redo </FONT> <BR><FONT SIZE=2>> > logs </FONT> <BR><FONT SIZE=2>> > > are and what is the read, write on these disks. Also Please</FONT> <BR><FONT SIZE=2>> give </FONT> <BR><FONT SIZE=2>> me </FONT> <BR><FONT SIZE=2>> > > infomation on the relog file size. Please give me the DB</FONT> <BR><FONT SIZE=2>> Version </FONT> <BR><FONT SIZE=2>> as </FONT> <BR><FONT SIZE=2>> > well. </FONT> <BR><FONT SIZE=2>> > > I </FONT> <BR><FONT SIZE=2>> > > see some waits on More data from client. Again since I don't </FONT> <BR><FONT SIZE=2>> know for </FONT> <BR><FONT SIZE=2>> > how </FONT> <BR><FONT SIZE=2>> > > long the stats were run can't say whether they are significant</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> or not. </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > HTH </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Vivek </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > From: VIVEK_SHARMA <VIVEK_SHARMA_at_infy.com> </FONT> <BR><FONT SIZE=2>> > > Date: Mon, 27 Nov 2000 12:11:59 +0530 </FONT> <BR><FONT SIZE=2>> > > Subject: SOS Alert </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > CASE In a Bank , Interest Calculation Batch Processing Unable</FONT> <BR><FONT SIZE=2>> to </FONT> <BR><FONT SIZE=2>> go </FONT> <BR><FONT SIZE=2>> > beyond </FONT> <BR><FONT SIZE=2>> > > 500 A/cs per minute </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > CAUSE of the Problem is UNKNOWN </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > ORACLE 7.3.4.5 on AIX 4.3.3 </FONT> <BR><FONT SIZE=2>> > > DB Server - IBM S80 Model - 12 CPUs , 3 GB RAM </FONT> <BR><FONT SIZE=2>> > > APP Server 1 - IBM S80 Model - 6 CPUs , 2 GB RAM </FONT> <BR><FONT SIZE=2>> > > APP Servers 2 & 3 - IBM Thin Nodes model - 4 CPUs , 1 GB RAM </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Storage Box :- </FONT> <BR><FONT SIZE=2>> > > =========== </FONT> <BR><FONT SIZE=2>> > > SS Class Storage </FONT> <BR><FONT SIZE=2>> > > RAID 0+1 - (First Striped & then Mirrored) </FONT> <BR><FONT SIZE=2>> > > NOTE - 2 Storage Boxes Exist , one being the Mirror of the</FONT> <BR><FONT SIZE=2>> Other </FONT> <BR><FONT SIZE=2>> > > Striping exists across a set 4 Disks (in one Box) with another</FONT> <BR><FONT SIZE=2>> 4 </FONT> <BR><FONT SIZE=2>> being </FONT> <BR><FONT SIZE=2>> > > it's </FONT> <BR><FONT SIZE=2>> > > mirror </FONT> <BR><FONT SIZE=2>> > > (in another Box). </FONT> <BR><FONT SIZE=2>> > > Thus a Volume Group Contains 8 Disks with 4 Disks </FONT> <BR><FONT SIZE=2>> > > Stripe Size = 4K </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > NOTE - Runs Tried from BOTH the APP Servers 1 OR 2 ,But with</FONT> <BR><FONT SIZE=2>> the </FONT> <BR><FONT SIZE=2>> SAME </FONT> <BR><FONT SIZE=2>> > MAX </FONT> <BR><FONT SIZE=2>> > > of </FONT> <BR><FONT SIZE=2>> > > 500 A/cs </FONT> <BR><FONT SIZE=2>> > > processed per minute </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > CPU Utilizations on BOTH APP & DB Server = 40 % </FONT> <BR><FONT SIZE=2>> > > wio% on BOTH APP & DB Servers = 35 % </FONT> <BR><FONT SIZE=2>> > > No paging happening on Both APP & DB Servers </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > - Oracle Contention Values Seem Small to us as shown Below or</FONT> <BR><FONT SIZE=2>> so </FONT> <BR><FONT SIZE=2>> they </FONT> <BR><FONT SIZE=2>> > seem </FONT> <BR><FONT SIZE=2>> > > to us :- </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem System wide wait events for non-background</FONT> <BR><FONT SIZE=2>> processes </FONT> <BR><FONT SIZE=2>> (PMON, </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem SMON, etc). Times are in hundreths of seconds. </FONT> <BR><FONT SIZE=2>> Each one of </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem these is a context switch which costs CPU time.</FONT> <BR><FONT SIZE=2>> By </FONT> <BR><FONT SIZE=2>> looking </FONT> <BR><FONT SIZE=2>> > at </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem the Total Time you can often determine what is the</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> > bottleneck </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem that processes are waiting for. This shows the </FONT> <BR><FONT SIZE=2>> total time </FONT> <BR><FONT SIZE=2>> > > spent </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem waiting for a specific event and the average time </FONT> <BR><FONT SIZE=2>> per wait </FONT> <BR><FONT SIZE=2>> > on </FONT> <BR><FONT SIZE=2>> > > SVRMGR> Rem that event. </FONT> <BR><FONT SIZE=2>> > > SVRMGR> select n1.event "Event Name", </FONT> <BR><FONT SIZE=2>> > > 2> n1.event_count "Count", </FONT> <BR><FONT SIZE=2>> > > 3> n1.time_waited "Total Time", </FONT> <BR><FONT SIZE=2>> > > 4> round(n1.time_waited/n1.event_count, 2) "Avg Time"</FONT> <BR><FONT SIZE=2>> </FONT> <BR><FONT SIZE=2>> > > 5> from stats$event n1 </FONT> <BR><FONT SIZE=2>> > > 6> where n1.event_count > 0 </FONT> <BR><FONT SIZE=2>> > > 7> order by n1.time_waited desc; </FONT> <BR><FONT SIZE=2>> > > Event Name Count Total Time</FONT> <BR><FONT SIZE=2>> Avg </FONT> <BR><FONT SIZE=2>> Time </FONT> <BR><FONT SIZE=2>> > </FONT> <BR><FONT SIZE=2>> > > -------------------------------- ------------- ------------- </FONT> <BR><FONT SIZE=2>> > ------------- </FONT> <BR><FONT SIZE=2>> > > SQL*Net message from client 10856276 31977110 </FONT> <BR><FONT SIZE=2>> > 2.95 </FONT> <BR><FONT SIZE=2>> > > enqueue 1295 374980 </FONT> <BR><FONT SIZE=2>> > 289.56 </FONT> <BR><FONT SIZE=2>> > > db file sequential read 3614044 303848 </FONT> <BR><FONT SIZE=2>> > .08 </FONT> <BR><FONT SIZE=2>> > > write complete waits 5812 295937 </FONT> <BR><FONT SIZE=2>> > 50.92 </FONT> <BR><FONT SIZE=2>> > > latch free 5045060 242170 </FONT> <BR><FONT SIZE=2>> > .05 </FONT> <BR><FONT SIZE=2>> > > SQL*Net more data from client 13939 165275 </FONT> <BR><FONT SIZE=2>> > 11.86 </FONT> <BR><FONT SIZE=2>> > > log file sync 12794 146409 </FONT> <BR><FONT SIZE=2>> > 11.44 </FONT> <BR><FONT SIZE=2>> > > buffer busy waits 100443 92477 </FONT> <BR><FONT SIZE=2>> > .92 </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > - ALL Literal SQLs were Converted to using Bind variables </FONT> <BR><FONT SIZE=2>> > > - ALL Tables Running on Indexes Without Any FULL Scans</FONT> <BR><FONT SIZE=2>> happening </FONT> <BR><FONT SIZE=2>> . </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > - All the Literal SQLs (Dynamic) Converted to using Bind </FONT> <BR><FONT SIZE=2>> variables </FONT> <BR><FONT SIZE=2>> > (Static </FONT> <BR><FONT SIZE=2>> > > Queries) </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > - event="10181 trace name context forever, level 1000" </FONT> <BR><FONT SIZE=2>> > > NOTE - Set nevertheless , None of the SQLs taking Excessive</FONT> <BR><FONT SIZE=2>> time </FONT> <BR><FONT SIZE=2>> to </FONT> <BR><FONT SIZE=2>> > parse </FONT> <BR><FONT SIZE=2>> > > though </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > - NO statistics are Analyzed </FONT> <BR><FONT SIZE=2>> > > - 7 tables involved in the interest Calc. </FONT> <BR><FONT SIZE=2>> > > inserts to 2 Tables </FONT> <BR><FONT SIZE=2>> > > selects , updates in 5 tables </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Qs. Are there Any BASIC O.S./Hardware Features to Enhance </FONT> <BR><FONT SIZE=2>> Performance on </FONT> <BR><FONT SIZE=2>> > > IBM </FONT> <BR><FONT SIZE=2>> > > AIX Systems ? </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Qs. Is Any Other portion of the report.txt required for</FONT> <BR><FONT SIZE=2>> perusal </FONT> <BR><FONT SIZE=2>> ? </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > Please Revert to me for any further info / Clarifications ? </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > </FONT> <BR><FONT SIZE=2>> > > << File: report.txt_Live >> << File: report.txt_TEST >> </FONT>Received on Fri Dec 22 2000 - 11:32:21 CST
![]() |
![]() |