Newsgroup: comp.databases.oracle.server
Subject: Re: [C.D.O.S][Long] How to deal with full file system? Best Practice?
Peter Smith [gjfc] wrote:
> People,
>
> I'm trying to write a simple document on how a DBA should handle an
> alert that a file system is full.
>
> Do any of you have some common sense you would like to share?
>
> Here are some ideas which come to my mind:
>
> The first reaction should be to ask a set of questions.
>
> The questions should explore two degrees of freedom.
>
> The first degree of freedom is related to the purpose or use of the
> file system.
> The second degree of freedom is related to the pattern of growth.
>
>
> Questions about the purpose or use of the file system:
>
> - Does this file system have an obvious purpose?
> - Is it supposed to hold a certain type of data or files?
> - Does this file system collect Archived Redo Logs?
> - If yes, is it the sole destination for Archived Redo Logs?
> - Does this file system hold data files for a database?
> - If yes, are these data files configured to autoextend?
> - If yes, what is the historical behavior of these data files?
> - If yes, have they been growing?
> - Is this file system supposed to be a container of online backups?
> - Is this file system used to hold a Flash Recovery Area for a 10g
> database?
> - Is this file system the root "/" file system?
> - Is this file system the /tmp file system?
> - Is this file system the /var file system?
> - Which UNIX user IDs have permission to write to this file system?
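Several of the purpose questions can be answered from inside the database. The following is a minimal sketch for 9i/10g, assuming a DBA account that can read the DBA_ and V$ views; the mount point '/u05' is a hypothetical placeholder for the file system that raised the alert. The second query lists the archive, flash recovery and dump destinations so they can be compared with that mount point (DB_RECOVERY_FILE_DEST exists only in 10g, so on 9i it simply matches nothing).

```sql
-- Data files and temp files on the full file system, and whether they
-- can autoextend. '/u05/%' is a hypothetical mount point.
SELECT tablespace_name,
       file_name,
       ROUND(bytes / 1024 / 1024)    AS size_mb,
       autoextensible,
       ROUND(maxbytes / 1024 / 1024) AS max_mb
FROM   dba_data_files
WHERE  file_name LIKE '/u05/%'
UNION ALL
SELECT tablespace_name,
       file_name,
       ROUND(bytes / 1024 / 1024),
       autoextensible,
       ROUND(maxbytes / 1024 / 1024)
FROM   dba_temp_files
WHERE  file_name LIKE '/u05/%'
ORDER  BY tablespace_name, file_name;

-- Where the instance writes archived logs, recovery files and traces.
SELECT name, value
FROM   v$parameter
WHERE  (name LIKE 'log_archive_dest%'
        OR name IN ('db_recovery_file_dest',
                    'background_dump_dest',
                    'user_dump_dest',
                    'core_dump_dest',
                    'audit_file_dest'))
AND    value IS NOT NULL
ORDER  BY name;
```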
>
> Questions about growth patterns:
>
> - What is the historical growth behavior of this file system?
> - How long has it been full?
> - Did it recently fill up via a growth spurt or has the growth been
> gradual?
> - Do any growth patterns correlate with any business processes such as
> month end close, shopping season, large news events, or recent
> marketing campaigns?
> - Is a process currently trying to write to the filesystem?
> - Is the file system more than 100% full? (This means a root-owned
> process has written into the space reserved for root.)
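For a file system that collects Archived Redo Logs, the control file already holds a growth history. A sketch, assuming DEST_ID = 1 is the destination on the full file system (filtering on one destination avoids double-counting when logs are archived to several places); how far back it reaches depends on CONTROL_FILE_RECORD_KEEP_TIME:

```sql
-- Archived redo generated per day, oldest first. A sudden jump points
-- to a growth spurt; a steady climb points to gradual growth.
SELECT TRUNC(completion_time)                        AS log_day,
       COUNT(*)                                      AS logs,
       ROUND(SUM(blocks * block_size) / 1024 / 1024) AS mb
FROM   v$archived_log
WHERE  dest_id = 1
GROUP  BY TRUNC(completion_time)
ORDER  BY log_day;
```

Comparing the busiest days against the business calendar (month-end close, campaigns) helps answer the correlation question above.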
>
> The first set contains 14 questions and the second set contains 6.
> This means we have 14 x 6 = 84 combinations of purpose and growth
> questions to consider.
>
> Next, once the DBA has answers to the above questions (or to other
> obvious ones), a common-sense reaction should emerge.
>
> Some scenarios which might warrant discussion are listed below:
>
> Scenario 1:
> - A file system that is the sole collector of Archived Redo Logs has
> just filled up, and the database has stopped processing transactions;
> it appears 'frozen'.
>
> Scenario 2:
> - A file system that contains all the data files for a particular
> 'hot' tablespace has just filled up. The database still functions, but
> the application that depends on the 'hot' tablespace is malfunctioning.
>
> Scenario 3:
> - A file system that contains data files for a variety of tablespaces
> has been growing slowly over the past few months and has just reached
> the limit of its size.
>
> Scenario 4:
> - A file system that contains data files for a variety of tablespaces
> had no growth over the past several months; then, in the space of
> several hours, it experienced a growth spurt and is now full.
>
> Scenario 5:
> - A file system that contains 'online backups' has been growing slowly
> over the past few months and has just reached the limit of its size.
>
> Scenario 6:
> - The root file system, which is quite small compared to all the
> others, has just filled up.
>
> Scenario 7:
> - The /tmp file system, which is quite small compared to all the
> others, has just filled up.
>
> Scenario 8:
> - The /var file system, which is quite small compared to all the
> others, has just filled up.
>
>
> Scenario 1:
> Both the probability of occurrence and the severity of service
> disruption put this scenario at the top of the discussion list. Any
> database in ARCHIVELOG mode is at risk. The DBA's reaction can usually
> be dictated by common sense, but a DBA operating in a complex and
> unfamiliar environment faces a significant risk of doing the wrong
> thing.
> One classic response is this:
> - Find another file system that has free space.
> - Move some Archived Redo Logs out of the full file system into the
>   other file system.
> - Check the behavior of the database; it should resume processing
>   transactions by itself with no human intervention. Hopefully the
>   dependent applications are just as resilient.
> - At this point you should have some breathing room, but the risk is
>   still high that the file system will fill and the database will
>   freeze again.
> - So, note the time and start a hot backup; hopefully it will finish
>   quickly.
> - Once that backup has completed successfully, and assuming it started
>   at 01:00, you can delete all Archived Redo Logs created before 01:00,
>   which will free a large amount of space in the file system.
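A sketch of two pieces of this response in SQL*Plus. It assumes the instance archives through LOG_ARCHIVE_DEST_1 rather than the older LOG_ARCHIVE_DEST parameter (the two cannot be mixed); the directory '/u06/arch' and the 01:00 date literal are hypothetical placeholders. Redirecting the archive destination is an alternative to moving files by hand; the query lists the deletion candidates once the hot backup is known to be good.

```sql
-- Point archiving at a file system with free space.
-- SCOPE = MEMORY changes only the running instance; the spfile/pfile
-- value comes back at the next restart.
ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u06/arch' SCOPE = MEMORY;

-- Archiving should now resume by itself; forcing a log switch and
-- archive confirms that the archiver is working again.
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- After the hot backup started at 01:00 has completed successfully,
-- list the logs that finished archiving before it began.
SELECT name, sequence#, completion_time
FROM   v$archived_log
WHERE  dest_id = 1
AND    completion_time < TO_DATE('2006-12-18 01:00', 'YYYY-MM-DD HH24:MI')
ORDER  BY sequence#;
```

V$ARCHIVED_LOG keeps the original file names, so logs moved at the OS level will still show their old paths. If RMAN manages the backups, running CROSSCHECK ARCHIVELOG ALL in RMAN afterwards brings its records in line with what is actually on disk.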
>
>
> Scenario 2:
>
> The DBA will encounter this scenario if data files in the full file
> system have been configured to 'autoextend'. This option simply means
> that a data file can grow, so database growth can cause a file system
> to fill up. The DBA has a short-term task and a longer-term task. In
> the short term, he can let the affected tablespace grow in another file
> system that has free space by adding a data file there. Another option
> is to locate unneeded data in the tablespace resident within the full
> file system. Once located, the DBA can remove the unneeded data, which
> frees space within the tablespace (but not within the file system).
> Another option is to locate an inactive data file within the file
> system and move the file to another file system (the idea behind this
> is simpler than the SQL required to implement it).
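The three short-term options can be sketched in SQL*Plus as follows. All names are hypothetical: tablespaces HOT_TS and COLD_TS, the full file system /u02, and /u06 as the file system with free space. HOST is a SQL*Plus command that runs an OS command; the tablespace being relocated is unavailable while it is offline.

```sql
-- Option 1: stop the existing file from growing on the full file system
-- and let the tablespace grow elsewhere.
ALTER DATABASE DATAFILE '/u02/oradata/PROD/hot_ts_01.dbf' AUTOEXTEND OFF;

ALTER TABLESPACE hot_ts
  ADD DATAFILE '/u06/oradata/PROD/hot_ts_02.dbf'
  SIZE 500M AUTOEXTEND ON NEXT 100M MAXSIZE 4000M;

-- Option 2: after unneeded data has been removed, space goes back to the
-- file system only if the data file is shrunk. This fails with ORA-03297
-- if used extents remain beyond the requested size.
ALTER DATABASE DATAFILE '/u02/oradata/PROD/hot_ts_01.dbf' RESIZE 1000M;

-- Option 3: relocate an inactive data file.
ALTER TABLESPACE cold_ts OFFLINE NORMAL;
HOST mv /u02/oradata/PROD/cold_ts_01.dbf /u06/oradata/PROD/cold_ts_01.dbf
ALTER TABLESPACE cold_ts
  RENAME DATAFILE '/u02/oradata/PROD/cold_ts_01.dbf'
               TO '/u06/oradata/PROD/cold_ts_01.dbf';
ALTER TABLESPACE cold_ts ONLINE;
```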
>
>
> In the long term, the DBA needs to implement capacity planning.
> Obviously, if database growth caused a file system to fill up and a
> dependent application to malfunction, the original capacity planning
> was inadequate. If the DBA has the luxury of being proactive, one 10g
> feature that might help is Automatic Storage Management (ASM). ASM
> lets the DBA treat available disk space as a large pool that requires
> less management. One advantage of ASM is that it automatically spreads
> data across all the disks in a disk group to balance I/O. Before ASM,
> the DBA had to balance I/O by hand, and that need sometimes led to full
> file systems, because placing data for I/O purposes could conflict with
> the goal of finding enough room for it. Put another way, when a
> database spans several file systems, the probability that one of them
> fills up is higher than the probability that a single large pool
> becomes full.
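A minimal 10g ASM sketch of the 'large pool' idea. The raw device paths, the disk group name DATA, the failure group names and the tablespace name are hypothetical; the CREATE/ALTER DISKGROUP statements run in the ASM instance and CREATE TABLESPACE runs in the database instance.

```sql
-- In the ASM instance: one pool built from four disks, mirrored across
-- two controllers (failure groups).
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP controller1 DISK '/dev/rdsk/c1t1d0s4', '/dev/rdsk/c1t2d0s4'
  FAILGROUP controller2 DISK '/dev/rdsk/c2t1d0s4', '/dev/rdsk/c2t2d0s4';

-- In the database instance: ASM names and places the file and spreads
-- its extents across every disk in the group.
CREATE TABLESPACE hot_ts
  DATAFILE '+DATA' SIZE 1G AUTOEXTEND ON NEXT 100M MAXSIZE 8G;

-- Capacity is watched per pool rather than per file system.
SELECT name, total_mb, free_mb
FROM   v$asm_diskgroup;

-- In the ASM instance: growing the pool means adding a disk; ASM then
-- rebalances existing data onto it.
ALTER DISKGROUP data ADD FAILGROUP controller1 DISK '/dev/rdsk/c1t3d0s4';
```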
>
> Another degree of freedom the DBA has to explore is data compression.
> Oracle offers a number of ways to pack data more tightly. The oldest
> method is to cram as many table rows into each data block as possible
> by setting the storage parameter PCTFREE to a low value. This method
> works best with a large block size such as 16K or 32K (32K is the
> largest block size Oracle supports). A low PCTFREE and a large block
> size would probably come with a performance cost; for example, rows
> that grow when updated may no longer fit in their block and must be
> migrated. The DBA can also compress table data within the database.
> This usually comes with a CPU cost, since the database kernel is tasked
> with extra compression and decompression chores, although reading
> fewer blocks may partly offset it.
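A sketch combining a 32K tablespace, a low PCTFREE and table compression. It assumes a hypothetical table SALES_HISTORY with a primary key index SALES_HISTORY_PK that is loaded once and rarely updated, a database whose default block size is not 32K, and a platform that allows 32K blocks; the path and sizes are placeholders. In 9i/10g, table compression applies to direct-path operations such as MOVE, so rows added later by conventional inserts are stored uncompressed.

```sql
-- A non-default block size needs its own buffer cache first.
ALTER SYSTEM SET db_32k_cache_size = 64M;

CREATE TABLESPACE archive_32k
  DATAFILE '/u06/oradata/PROD/archive_32k_01.dbf' SIZE 2G
  BLOCKSIZE 32K;

-- Rebuild the table into the new tablespace, packed and compressed.
ALTER TABLE sales_history MOVE TABLESPACE archive_32k PCTFREE 0 COMPRESS;

-- MOVE changes every ROWID, so the table's indexes become UNUSABLE.
SELECT index_name, status
FROM   user_indexes
WHERE  table_name = 'SALES_HISTORY';

ALTER INDEX sales_history_pk REBUILD;
```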
>
> Scenario 4:
> This scenario is troublesome. Based on the description, it's probable
> that some malfunction or unforeseen business process has recently
> pumped a large amount of data into the database. The DBA is then tasked
> with becoming a data detective who needs to gain a quick understanding
> of this new data. He may be faced with a difficult decision to either
> keep the new data or jettison it. The scenario is too general to say
> more, except that either decision is probably full of risks. It's easy
> to suggest that the DBA become familiar with the data in his database.
> Of course, if the business processes that depend on the database are
> undergoing chaotic or exponential growth, the DBA can only stay ahead
> of the curve if he is closely connected with the inner workings of both
> the business and the applications that are filling the database with
> data.
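A starting point for the data detective work, sketched for 9i/10g. The 24-hour window and the top-20 cut-off are arbitrary choices. DBA_TAB_MODIFICATIONS is populated by table monitoring, which 10g enables by default (STATISTICS_LEVEL = TYPICAL) but 9i requires per table; its counts cover activity since statistics were last gathered.

```sql
-- The twenty largest segments in the database.
SELECT *
FROM  (SELECT owner, segment_name, segment_type, tablespace_name,
              ROUND(bytes / 1024 / 1024) AS mb
       FROM   dba_segments
       ORDER  BY bytes DESC)
WHERE  ROWNUM <= 20;

-- Objects created or changed by DDL in the last 24 hours, which can
-- point at a new load job or application release.
SELECT owner, object_name, object_type, created, last_ddl_time
FROM   dba_objects
WHERE  created > SYSDATE - 1
OR     last_ddl_time > SYSDATE - 1
ORDER  BY last_ddl_time DESC;

-- Tables with the most inserted rows; flush the in-memory counters first.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
SELECT table_owner, table_name, inserts, updates, deletes, timestamp
FROM   dba_tab_modifications
ORDER  BY inserts DESC;
```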
>
>
> Scenario 6:
> - The root file system, which is quite small compared to all the
> others, has just filled up.
>
> This scenario suggests a malfunction in the operating system or a
> human error caused by the hand of a person with the root password. One
> obvious cause of this scenario is a two-step error condition:
> 1. A file system becomes unmounted (perhaps due to a disk error).
> 2. A process dependent on that file system tries to write a large
>    amount of data. Since the file system is gone, the data lands in the
>    mount-point directory on the root file system, which quickly fills.
>
> Scenario 7:
> - The /tmp file system, which is quite small compared to all the
> others, has just filled up.
>
> The cause of this scenario is often similar to Scenario 6. The
> probability of /tmp filling to 100% is higher, since it is
> world-writable. Removing large amounts of data from /tmp (to remedy the
> situation) is probably safer than removing large amounts of data from
> /. On Solaris, /tmp is usually a tmpfs file system backed by virtual
> memory (RAM and swap) rather than by disk, so filling /tmp on a Solaris
> machine is probably more harmful to availability than it would be on
> other UNIX variants. When dealing with Oracle software, consider it a
> best practice to set the appropriate environment variables (TMP and
> TMPDIR) to keep Oracle from writing files into /tmp when possible.
>
>
> Scenario 8:
> - The /var file system, which is quite small compared to all the
> others, has just filled up.
>
> The purpose of the /var file system is to collect 'variable' data
> generated by a wide variety of applications such as web servers and
> e-mail agents. It's considered best to let this data accumulate in
> /var rather than in / or /usr, so the probability that /var will fill
> up is higher than for / or /usr. When /var does fill up, however, the
> availability of some applications will suffer. Fortunately, most
> applications that can fill /var also provide utilities to remove
> unneeded files. A good DBA/SysAdmin is aware of these utilities and
> how to use them.
>
> So, these are some general ideas and thoughts concerning how a DBA
> should react when he receives an alert that a file system is full. I
> realize that a lot could be written about this topic.
>
> Are there any obvious or large ideas that I've missed that the
> proactive DBA should be thinking about?
Assuming 9i or 10g, include:
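CREATE OR REPLACE TRIGGER logon_trigger
AFTER logon
ON DATABASE
DECLARE
  -- ORA-01031: the RESUMABLE system privilege is not available.
  insufficient_privileges EXCEPTION;
  PRAGMA EXCEPTION_INIT(insufficient_privileges, -1031);
BEGIN
  execute immediate 'alter session enable resumable';
  dbms_resumable.set_timeout(1800); -- or some other reasonable value
EXCEPTION
  -- An unhandled error in a logon trigger blocks the login, so skip
  -- resumable for sessions that are not allowed to use it.
  WHEN insufficient_privileges THEN
    NULL;
END logon_trigger;
/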
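While resumable space allocation is enabled, a statement that runs out of space is suspended instead of failing, which gives the DBA the timeout window to free space. A sketch of how to see what is waiting; it also shows the 10g RESUMABLE_TIMEOUT initialization parameter, which enables resumable for all sessions instance-wide as an alternative to the trigger (value in seconds, 1800 chosen to match the trigger).

```sql
-- Statements currently suspended, and the error that suspended them.
SELECT user_id, session_id, name, suspend_time, timeout, error_msg
FROM   dba_resumable
WHERE  status = 'SUSPENDED';

-- 10g only: enable resumable space allocation for every session.
ALTER SYSTEM SET resumable_timeout = 1800;
```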
Then
-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

Received on Mon Dec 18 2006 - 12:13:59 CST