[C.D.O.S][Long] How to deal with full file system? Best Practice?
People,
I'm trying to write a simple document on how a DBA should handle an alert that a file system is full.
Do any of you have some common sense you would like to share?
Here are some ideas which come to my mind:
The first reaction should be to ask a set of questions.
The questions should explore two degrees of freedom.
The first degree of freedom is the purpose or use of the file system.
The second degree of freedom is the pattern of growth.
Questions about the purpose or use of the file system:
Questions about growth patterns:
The first set has 14 questions and the second set has 6. This means we have 14 x 6 = 84 combinations of one question from each set.
Next, once the DBA has some answers to the above questions or some other obvious questions, a common sense reaction should emerge.
Some scenarios which might warrant discussion are listed below:
Scenario 1:
-A file system, which is the sole destination for Archived Redo Logs, has just filled up and the database has stopped processing transactions; it appears 'frozen'.
Scenario 2:
-A file system, which contains all the data files for a particular 'hot' tablespace, has just filled up. The database still functions but the application which depends on the 'hot' tablespace is malfunctioning.
Scenario 3:
-A file system, which contains data files for a variety of tablespaces, has been slowly growing over the past few months and has just reached the limit of its size.
Scenario 4:
-A file system, which contains data files for a variety of tablespaces, has had no growth over the past several months and then in the space of several hours it experienced a growth spurt and is now full.
Scenario 5:
-A file system, which contains 'online backups', has been slowly growing over the past few months and has just reached the limit of its size.
Scenario 6:
-The root file system, which is quite small compared to all the others, has just filled up.
Scenario 7:
-The /tmp file system, which is quite small compared to all the others, has just filled up.
Scenario 8:
-The /var file system, which is quite small compared to all the others, has just filled up.
Scenario 1:
Both the probability of occurrence and the severity of service disruption put this scenario at the top of the discussion list. Any database in Archivelog mode is at risk.
The DBA's reaction to this situation can usually be dictated by common sense, but a DBA operating in a complex and unfamiliar environment faces a significant risk of doing the wrong thing.
One classic response is this:
- Find another file system which has free space.
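The "find another file system" step can be sketched in a few lines of Python. This is only a rough illustration, not a tested procedure: the function name `pick_destination` and the candidate paths are hypothetical, and the DBA would still need to decide what to do with the chosen location.

```python
import shutil

def pick_destination(candidates, needed_bytes):
    """Return the candidate path with the most free space, or None
    if no candidate has at least needed_bytes free."""
    best_path, best_free = None, -1
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            continue  # path is missing or not accessible; skip it
        if free >= needed_bytes and free > best_free:
            best_path, best_free = path, free
    return best_path

if __name__ == "__main__":
    # Hypothetical mount points; substitute the ones on your server.
    print(pick_destination(["/u02", "/u03", "/u04"], 10 * 1024**3))
```

Passing the amount of space needed up front avoids picking a file system that will itself fill up within minutes.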
Scenario 2:
The DBA will encounter this scenario if data files in the full file system had been configured to 'autoextend'. This configuration option simply means that a data file can grow. So, database growth can cause a file system to fill up. The DBA has a short term task and a longer term task. In the short term, he can let the affected tablespace grow in another file system which has free space, for example by adding a data file there. Another option is to locate unneeded data in the tablespace resident within the full file system. Once located, the DBA can remove the unneeded data, which will free up space within the tablespace (but not the file system). Another option is to locate an inactive data file within the file system and move the file to another file system (the idea behind this is simpler than the SQL required to implement it).
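As a rough sketch of the first short term option, the Python helper below builds an ALTER TABLESPACE ... ADD DATAFILE statement that places a new autoextending data file in a different file system. The helper name, the tablespace name, the path and the sizes are all made up for illustration; the name check is deliberately conservative and rejects some names Oracle would accept.

```python
def add_datafile_sql(tablespace, datafile_path, size_mb, next_mb, max_mb):
    """Build an ALTER TABLESPACE statement that adds an autoextending
    data file. Sizes are in megabytes."""
    # Conservative check: letters, digits and underscores only.
    if not tablespace.replace("_", "").isalnum():
        raise ValueError("unexpected tablespace name: %r" % tablespace)
    if "'" in datafile_path:
        raise ValueError("data file path must not contain a quote")
    return (
        f"ALTER TABLESPACE {tablespace} "
        f"ADD DATAFILE '{datafile_path}' SIZE {size_mb}M "
        f"AUTOEXTEND ON NEXT {next_mb}M MAXSIZE {max_mb}M"
    )

if __name__ == "__main__":
    # Hypothetical tablespace and path on a file system with free space.
    print(add_datafile_sql("APP_DATA", "/u05/oradata/app_data02.dbf",
                           512, 64, 4096))
```

Setting MAXSIZE explicitly keeps the new file from filling the second file system the same way the first one filled.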
In the long term the DBA needs to implement capacity planning. Obviously if database growth caused a file system to fill up and a dependent application to malfunction, the original capacity planning was inadequate. If the DBA has the luxury to be proactive, one feature that exists in 10g which might help is ASM. This feature allows the DBA to treat available disk space as a large pool which requires less management from the DBA. One advantage of ASM is that it automates the placement of data across disks for the purpose of balancing I/O. Before ASM, the DBA needed to balance I/O by hand. Sometimes this need to balance I/O would lead to full file systems because placement of data (for I/O purposes) would sometimes contend with the goal of finding enough room for the data. Another way to say this is that a single large pool of storage is less likely to fill up than any one of several file systems when the database spans several file systems.
Another degree of freedom the DBA has to explore is the compression of data. Oracle offers a number of ways to pack data more tightly. The oldest method is to cram as many table rows into each data block as possible through the use of a storage parameter named PCTFREE. This method works best if the block size is a large value such as 16k or 32k (32k is the largest block size Oracle supports on most platforms). Setting PCTFREE to a small value and block size to a large value would probably come with a performance cost. Also, the DBA can compress data within the database. This obviously comes with a performance cost since the kernel will be tasked with extra compression and decompression chores.
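To get a feel for how much PCTFREE and block size matter, here is a back-of-the-envelope estimate in Python. It is only an approximation under stated assumptions: the 100 bytes of per-block overhead and the 200 byte average row are guesses, and the real figures depend on the table and the Oracle version.

```python
def rows_per_block(block_size, pctfree, avg_row_bytes, overhead_bytes=100):
    """Estimate how many rows fit in a block before inserts stop.

    overhead_bytes is an assumed figure for block header overhead,
    not a number taken from Oracle documentation.
    """
    usable = block_size - overhead_bytes
    # PCTFREE percent of the block is held back for later row updates.
    available_for_inserts = usable * (100 - pctfree) // 100
    return available_for_inserts // avg_row_bytes

if __name__ == "__main__":
    print(rows_per_block(8192, 10, 200))   # 8k block, PCTFREE 10
    print(rows_per_block(8192, 1, 200))    # 8k block, PCTFREE 1
    print(rows_per_block(32768, 1, 200))   # 32k block, PCTFREE 1
```

With these assumed numbers an 8k block holds about 36 rows at PCTFREE 10 and about 40 at PCTFREE 1, so the saving is real but modest, and rows that grow later have less room to do so.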
Scenario 4:
This scenario is troublesome. Based on the description, it's probable that some malfunction or unforeseen business process has recently pumped a large amount of data into the database. The DBA is then tasked with becoming a data detective who needs to gain a quick understanding of this new data. He may be faced with a difficult decision to either keep the new data or jettison it. The scenario is too general to say more except that either decision is probably full of risks. It's easy to suggest that the DBA work to become familiar with the data in his database. Of course if the business processes which depend on the database are undergoing chaotic or exponential growth, the DBA can only stay ahead of the curve if he is closely connected with the inner workings of both the business and the applications which are filling the database with data.
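One way to start the detective work is to compare two size snapshots and see what grew. The Python sketch below assumes the DBA already has such snapshots as simple name-to-bytes mappings, gathered for instance from periodic queries against DBA_SEGMENTS; the function name and the segment names are invented for illustration.

```python
def growth_report(before, after):
    """Return [(name, grown_bytes)] for entries that grew between two
    snapshots, largest growth first. Entries absent from the earlier
    snapshot are treated as having grown from zero."""
    report = []
    for name, size_now in after.items():
        grown = size_now - before.get(name, 0)
        if grown > 0:
            report.append((name, grown))
    # Largest growth first; ties are broken by name for stable output.
    report.sort(key=lambda item: (-item[1], item[0]))
    return report

if __name__ == "__main__":
    # Hypothetical segment sizes in megabytes, yesterday and today.
    yesterday = {"ORDERS": 100, "AUDIT_LOG": 50}
    today = {"ORDERS": 110, "AUDIT_LOG": 900, "NEW_T": 20}
    for name, grown in growth_report(yesterday, today):
        print(name, grown)
```

This only works if snapshots were being collected before the growth spurt, which is one more argument for routine monitoring.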
Scenario 6:
-The root file system, which is quite small compared to all the others, has just filled up.
This scenario suggests a malfunction in the operating system or a human error by someone with the root password. One obvious cause of this scenario is a two-step error condition:
1. A file system becomes unmounted (due to disk error perhaps).
2. A process dependent on that file system tries to write a large amount of data. Since the file system is gone, its mount point is just a directory on the root file system, so the root file system receives the data and quickly fills.
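A script that writes large files can guard against this two-step failure by checking that its target is really a mounted file system before writing. The Python sketch below uses os.path.ismount for that; the function name and the paths in the example are hypothetical.

```python
import os

def write_guarded(mount_point, relative_name, data):
    """Write data under mount_point only if it is a mounted file system.

    If the file system has been unmounted, mount_point is just an empty
    directory on the parent file system, so we refuse to write.
    """
    if not os.path.ismount(mount_point):
        raise RuntimeError("%s is not mounted; refusing to write" % mount_point)
    path = os.path.join(mount_point, relative_name)
    with open(path, "wb") as f:
        f.write(data)
    return path

if __name__ == "__main__":
    # Hypothetical backup mount point and file name.
    try:
        write_guarded("/backup", "export.dmp", b"...")
    except (RuntimeError, OSError) as exc:
        print("not written:", exc)
```

Failing loudly here is better than quietly filling the root file system.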
Scenario 7:
-The /tmp file system, which is quite small compared to all the others, has just filled up.
The cause of this scenario is often similar to Scenario 6. The probability of /tmp filling to 100% is higher since it is world-writable. Removing large amounts of data from /tmp (to remedy the situation) is probably safer than removing large amounts of data from /. On Solaris, /tmp is typically a tmpfs file system backed by memory and swap rather than by an ordinary disk file system. So filling /tmp on a Solaris machine is probably more harmful to availability than it would be on other UNIX variants. When dealing with Oracle software, consider it a best practice to make use of appropriate environment variables (TMP and TMPDIR) to prevent Oracle from writing files into /tmp when possible.
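For jobs launched from a script, the TMP and TMPDIR advice can be applied by building the child process environment explicitly. This is a small Python sketch; the function name and the directory are made up, and whether a given Oracle tool honors these variables should be checked for that tool.

```python
import os

def env_with_tmpdir(tmp_dir, base_env=None):
    """Return a copy of the environment with TMP and TMPDIR pointing
    at tmp_dir. The original mapping is left unchanged."""
    env = dict(os.environ if base_env is None else base_env)
    env["TMP"] = tmp_dir
    env["TMPDIR"] = tmp_dir
    return env

if __name__ == "__main__":
    # Hypothetical scratch directory on a roomy file system.
    env = env_with_tmpdir("/u01/app/oracle/tmp")
    print(env["TMP"], env["TMPDIR"])
    # The result can be passed as env=... to subprocess.run.
```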
Scenario 8:
-The /var file system, which is quite small compared to all the others, has just filled up.
The purpose of the /var file system is to collect 'variable' data generated by a wide variety of applications such as web servers and e-mail agents. It's considered best to allow this data to accumulate in /var rather than / or /usr. So, obviously /var is more likely to fill up than / or /usr. When /var does fill up, however, availability of some applications will suffer. Fortunately, most applications which have the ability to fill /var also provide utilities to remove unneeded files. A good DBA/SysAdmin is aware of these utilities and how to use them.
So, these are some general ideas and thoughts concerning how a DBA should react when he receives an alert that a file system is full. I realize that a lot could be written about this topic.
Are there any obvious or large ideas that I've missed that the proactive DBA should be thinking about?
--
Peter Smith
GoodJobFastCar_at_gmail.com
http://GoodJobFastCar.com
Received on Mon Dec 18 2006 - 01:15:29 CST