Recovery
Preparing to recover
The following steps should be followed in the event of a media or system failure. In the event of a disaster (the physical system or computer room is destroyed), these steps should not be used and the Database Administrator should refer to the Business Continuity manual for the Organization.
- Remain calm and do not panic.
- Determine the severity of the failure.
- Assess the recovery strategies. If necessary, refer to an Oracle Backup and Recovery Manual or call Oracle World Wide Support for assistance (See the organization's Site Guide for account information).
- If the disk containing the online backup is accessible, use that data to restore the database, otherwise work with the UNIX or Windows administrators to restore necessary files from tape.
- Recover the database.
- If an incomplete recovery was necessary, get a cold backup of the database immediately to protect the company against another failure in the near future.
Why is Recovery Essential
To improve performance, Oracle keeps many changes in memory, even after they are committed. It may also write data to the datafiles to free up memory, even though the changes have not been committed. At the time of a failure, all data in memory is lost. In order to ensure no committed changes are lost, Oracle records all operations in an online redo logfile. The information in the log file allows Oracle to redo any operations that may be lost in a failure. Writing to the logfile does not hurt performance, because these writes are sequential and very fast. Writing to datafiles on the other hand is random and can be very slow because the disk block to be modified on disk must be located, and the disk head properly positioned for every write.
To Understand Recovery more clearly, we need to Understand two "Buzz Words", Roll Forward and Rollback.
Roll forward
During the roll forward phase, Oracle replays changes recorded in the redo log starting from the checkpoint position. The checkpoint position is the place in the redo log where changes associated with previous redo entries had been saved to the datafiles before the failure. (Each datafile, in its header, has a checkpoint structure that gets updated every time LGWR issues a checkpoint to the DBWR.)
The online redo log contains all (committed and uncommitted) changes made to the database buffers. Before the roll forward phase, datafiles may not contain all committed changes. It is possible that data changed and committed in the database buffers are recorded in the redo log files, but not yet written to the datafiles by DBWR. It is also possible that datafiles contain uncommitted changes, because DBWR may have written uncommitted changes to the datafiles during high transaction activities to free up space in the database buffer cache for new transactions.
As Oracle replays the redo operations, it applies both committed and uncommitted changes to the datafiles. At the conclusion of the roll forward phase, the datafiles contain all committed changes as well as new uncommitted changes (applied during roll forward) and possibly old uncommitted changes (saved to the datafiles prior to the failure).
Rollback
During the rollback phase, Oracle searches out changes associated with dead transactions that had not committed before the failure occurred. It then uses transaction tables in rollback segment (already recovered during the roll forward phase) to rollback the change to its previous value. At the completion of this phase, the data returns to a transactionally consistent state.
Oracle can be open during the rollback phase. Any new transaction that tries to modify a row locked by a dead transaction will need to rollback the transaction blocking its path to release the lock.
How Oracle determines if recovery is needed
When the database is opened, a start SCN is recorded in the control file for every data file associated in the database, and a stop SCN is set to Infinity.
During normal database operation's, the SCN and the checkpoint counter, information in the data file header, is incremented every time a checkpoint is done.
When the database is shutdown with the normal or immediate option, an end SCN is recorded in the data file header. This information is also recorded in the control files, i.e. end SCN of the datafile is equal to the stop SCN of the control files.
When database is opened the next time, Oracle makes two checks:
- if end SCN in the data file header matches its corresponding stop SCN in the control file,
- if checkpoint in the data file header matches its corresponding checkpoint counter in the control file.
If you shut down the database in ABORT mode, then no checkpoint is performed, and the STOP SCN in the control file is left at infinity (the same state when you started or opened your data files). For example, the end SCN in the datafile header is "1000" and stop SCN in the control file is "infinity". In this case Oracle performs crash recovery: it reads the on-line redo log files and applies the changes to the database as a part of the roll forward and reads the rollback segment's transaction table to perform transaction recovery (roll backward).
You can often use the table V$RECOVER_FILE to determine which files to recover. This table lists all files that need to be recovered, and explains why they need to be recovered.
Note: This view is not useful if the control file currently in use is a restored backup or a new control file was created since the media failure had occurred. A restored or re-created control file does not contain the information Oracle needs to fill V$RECOVER_FILE accurately.
The following query displays the file ID numbers of data files that require recovery:
SELECT file#, online, error FROM v$recover_file;
Recovery Scenarios
- Complete Media Recovery
- Incomplete Media Recovery
- Datafile belonging to UNDO tablespace
- Archivelog mode and datafile recovery (non-SYSTEM tablespace)
- Archivelog mode and redolog recovery
- Control file recovery
- One or more datafiles damaged
- Datafile damaged and backup not available
- Recovering from user errors