Re: hanging shutdowns (addressing the requirement for a UNIX reboot)

From: LiShan Cheng <exriscer_at_gmail.com>
Date: Tue, 28 Feb 2006 14:12:00 +0100
Message-ID: <6e9345580602280512s6647dc78p810eacf74348b79b@mail.gmail.com>

A couple of years ago I worked at a customer site whose backup script did something like this:

retry = 0
loop
   if retry == 3 then shutdown abort and exit loop
   shutdown immediate
   wait 60 seconds
   if success then exit loop else retry = retry + 1
end loop

This script has been in use from 1998 until today, and shutdown abort has been triggered from time to time with no errors.
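
For anyone who wants to script the same idea, here is a minimal Python sketch of that loop. It is an illustration, not the customer's actual script: the function name and the three callables (start_immediate, is_down, abort) are hypothetical hooks that you would wire up to sqlplus or whatever your site uses, and the 3 retries and 60 seconds are simply the values from the pseudocode above.

```python
import time
from typing import Callable


def shutdown_with_retries(
    start_immediate: Callable[[], None],
    is_down: Callable[[], bool],
    abort: Callable[[], None],
    max_retries: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try 'shutdown immediate' up to max_retries times, then fall back to abort.

    All three callables are hypothetical hooks (assumptions, not taken from
    the original script); they would wrap the real shutdown commands.
    Returns "immediate" or "abort" to say which path ended the loop.
    """
    retry = 0
    while True:
        if retry == max_retries:
            abort()              # give up on a clean shutdown
            return "abort"
        start_immediate()        # kick off 'shutdown immediate'
        sleep(wait_seconds)      # give it time to finish
        if is_down():            # database closed cleanly
            return "immediate"
        retry += 1
```

The sleep parameter is only there so the loop can be exercised without actually waiting 60 seconds per attempt.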

On 2/28/06, oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> wrote:
>
> All,
>
> I am with Jeremiah on this: a shutdown abort DOES NOT harm a database
> (at least not in the five years I used it on a set of active databases
> a few years ago). The ONLY time a DB had a problem after a shutdown
> abort was during an 8i-to-9i upgrade. There was a bug a while ago,
> related to the change of format in the redo log to support LSB, which
> manifested itself when a shutdown abort was issued partway through the
> upgrade. I don't remember the specifics, but it manifested only during
> the upgrade.
>
> As to the requirement to reboot the Solaris server, was this because the
> database did not restart and complained of 'Unable to create Shared Mem
> segment' (or a similar message)? I believe this could have been because
> you killed the background processes after a 'shutdown immediate' "hang".
> Once you initiate a 'shutdown immediate' and Ctrl-C out of it, you will
> never be able to log in, since any new attaches will complain that a
> shutdown is in progress, and the only way out is to kill the backend
> processes. In this case, the shared memory segment is never released,
> and you get the error at database restart because the SHM start address
> is calculated to be the same value as that of the existing, still-open
> segment (everything else being equal). You can very easily get out of
> this using the example in the following real-life event:
>
> In this case, I had three databases (the 1st and 2nd surviving DBs and
> the third, whose backend had to be killed). Use 'ipcs -am' to determine
> the memory segments, calculate the SGA size of the surviving databases
> and map the segment IDs using the LPIDs as shown below. Then use
> 'ipcrm -m <ID>' to remove the *right* segments (the ones with LPID
> 23175 in this case, i.e. IDs 147840 and 276867), which will then allow
> you to restart the database. (Take it from me, I have done it many
> times before.) In addition, the NATTCH column showing 0 attaches is
> another giveaway!
>
> $ ipcs -am | head -2; ipcs -am | grep oracle
> IPC status from <running system> as of Thu Dec 8 13:47:57 BST 2005
> T     ID KEY        MODE        OWNER  GROUP CREATOR CGROUP NATTCH     SEGSZ  CPID  LPID ATIME    DTIME    CTIME
> m 147840 0          --rw-r----- oracle dba   oracle  dba         0 655441920  8931 23175 13:47:22 13:47:22 11:42:07
> m      2 0xdd27ed28 --rw-r----- oracle dba   oracle  dba        16 371458048  6548 22193 13:45:01 13:45:01 14:35:12
> m 276867 0xfa9fd35c --rw-r----- oracle dba   oracle  dba         0 502874112  8931 23175 13:47:22 13:47:22 11:42:11
> m 787590 0          --rw-r----- oracle dba   oracle  dba       139 655441920 11593 23223 13:47:46 13:47:47  6:06:10
> m 716359 0xe315db0c --rw-r----- oracle dba   oracle  dba       139 502874112 11593 23223 13:47:46 13:47:47  6:06:15
>
> 1st surviving DB: SQL> show sga
>
> Total System Global Area 1157681312 bytes <== (LPID 23223, 139 attaches)
> Fixed Size 73888 bytes
> Variable Size 501182464 bytes
> Database Buffers 655360000 bytes
> Redo Buffers 1064960 bytes
>
> 1158316032 = 655441920 + 502874112 (LPID 23223 - 2 segments)
>
> 2nd surviving DB: SQL> show sga
>
> Total System Global Area 370548720 bytes <== (LPID 22193)
> Fixed Size 69616 bytes
> Variable Size 328454144 bytes
> Database Buffers 40960000 bytes
> Redo Buffers 1064960 bytes
>
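
The mapping described above can also be done mechanically. The following is a rough Python sketch, assuming each 'ipcs -am' row has been joined onto a single line with the 15 columns shown in the header of the listing; the function names are made up for illustration, and the sample rows are copied from the listing in this message. It sums SEGSZ per LPID and lists the segments with NATTCH 0, which are the candidates for ipcrm.

```python
from collections import defaultdict

# Column order of Solaris 'ipcs -am' as shown in the listing:
# T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
ID, NATTCH, SEGSZ, LPID = 1, 8, 9, 11

# Rows copied from the listing, one segment per line.
SAMPLE = """\
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
m 147840 0 --rw-r----- oracle dba oracle dba 0 655441920 8931 23175 13:47:22 13:47:22 11:42:07
m 2 0xdd27ed28 --rw-r----- oracle dba oracle dba 16 371458048 6548 22193 13:45:01 13:45:01 14:35:12
m 276867 0xfa9fd35c --rw-r----- oracle dba oracle dba 0 502874112 8931 23175 13:47:22 13:47:22 11:42:11
m 787590 0 --rw-r----- oracle dba oracle dba 139 655441920 11593 23223 13:47:46 13:47:47 6:06:10
m 716359 0xe315db0c --rw-r----- oracle dba oracle dba 139 502874112 11593 23223 13:47:46 13:47:47 6:06:15
"""


def parse_rows(text):
    """Return the whitespace-split fields of every shared memory ('m') row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a full 15-column 'm' row.
        if len(fields) == 15 and fields[0] == "m":
            rows.append(fields)
    return rows


def bytes_by_lpid(rows):
    """Sum SEGSZ per LPID, to compare against each instance's 'show sga' total."""
    totals = defaultdict(int)
    for fields in rows:
        totals[int(fields[LPID])] += int(fields[SEGSZ])
    return dict(totals)


def unattached_ids(rows):
    """Segment IDs with NATTCH 0, i.e. no process attached."""
    return [int(fields[ID]) for fields in rows if int(fields[NATTCH]) == 0]


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print(bytes_by_lpid(rows))
    print(unattached_ids(rows))
```

For the listing above this gives 1158316032 bytes for LPID 23223, the same sum worked out by hand, and flags IDs 147840 and 276867 as having no attaches. Treat NATTCH 0 as a hint only and cross-check against the SGA sizes of the surviving databases before removing anything.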
> John Kanagaraj <><
> DB Soft Inc
> Phone: 408-970-7002 (W)
>
> Co-Author: Oracle Database 10g Insider Solutions
> http://www.amazon.com/exec/obidos/tg/detail/-/0672327910/
>
> ** The opinions and facts contained in this message are entirely mine
> and do not reflect those of my employer or customers **
>
>
>
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org
> [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Roger Xu
> Sent: Monday, February 27, 2006 3:24 PM
> To: Oracle-L_at_Freelists
> Subject: RE: hanging shutdowns
>
> What should I do if "shutdown immediate" hangs?
> Last time, I had to reboot the Solaris Server.
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org
> [mailto:oracle-l-bounce_at_freelists.org]On Behalf Of Edgar Chupit
> Sent: Monday, February 27, 2006 2:12 PM
> To: Oracle-L_at_Freelists
> Subject: Re: hanging shutdowns
>
>
> Dear Jeremiah,
>
> First of all, I would like to mention that I don't like to shut down
> a database without a practical reason (such as hardware/OS
> maintenance, upgrades, etc.).
>
> And still I would like to argue that under normal circumstances startup
> force restrict + shutdown immediate (or shutdown abort, startup force,
> shutdown immediate) will run almost as fast and is as dangerous as a
> single shutdown immediate.
>
> After a shutdown abort, in order to perform a cold backup you still
> need to start up the database and close it in consistent mode. Database
> startup is not a very fast process in itself, because Oracle not only
> needs to recover the database into a consistent state (roll back
> uncommitted transactions), but also has to allocate memory structures
> and prepare itself for normal work. And to shut down the database in a
> consistent state you still need to issue shutdown immediate.
>
> One of the popular reasons why shutdown immediate can take a long time
> to proceed is that Oracle waits for the SNP processes to wake up
> (Note: 1018421.102), but this can also happen when shutdown immediate
> is called the second time (after startup force), so even checkpointing
> and using startup force restrict can leave the database hanging in
> shutdown immediate.
>
> Also, there is Note: 46001.1, which suggests minimizing the use of
> shutdown abort on Windows systems, because it can cause "allocation
> problems when Oracle is next started"; Note: 161234.1, which describes
> a situation in which shutdown abort can hang; and Note: 222553.1, which
> states that startup force can be safer than shutdown abort. There are
> plenty of other notes that describe different problems that can occur
> during database shutdown.
>
> And surely there are many bugs that can occur after shutdown abort (but
> under normal circumstances shutdown abort is very safe).
>
> Having said all this, I would like to return to the thread subject and
> suggest that the original poster try to convince management to switch
> to hot backups and forget about shutting down the databases for backups
> at all.
>
> On 2/27/06, Jeremiah Wilton <jeremiah_at_ora-600.net> wrote:
> > If you 'alter system checkpoint' before the 'shutdown abort' then it
> > should be a lot faster for the user with a hanging or prolonged
> > 'shutdown immediate'.
>
> > Jeremiah Wilton
> > ORA-600 Consulting
> > Recoveries - Seminars - Hiring
> > http://www.ora-600.net
>
>
> --
> Best regards,
> Edgar Chupit
> callto://edgar.chupit
> --
> http://www.freelists.org/webpage/oracle-l

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Feb 28 2006 - 07:12:00 CST