To understand instance recovery, you need to understand the functioning of certain background processes.
Checkpoint (CKPT) Process
Every three seconds (or more frequently), the CKPT process stores data in
the control file to document the modified data blocks that DBWn has
written from the SGA to disk. This is called a “checkpoint.” The purpose of a
checkpoint is to identify that place in the online redo log file where instance
recovery is to begin (which is called the “checkpoint position”).
In the event of a log switch, the CKPT process also writes this
checkpoint information to the headers of data files.
Checkpoints exist for the following reasons:
•To ensure that modified data blocks in memory are written to the disk
regularly so that data is not lost in case of a system or database failure
•To reduce the time required for instance recovery (Only the online redo
log file entries following the last checkpoint need to be processed for
recovery.)
•To ensure that all committed data has been written to data files during
shutdown
The checkpoint information written by the CKPT process includes
checkpoint position, system change number (SCN), location in the online redo
log file to begin recovery, information about logs, and so on.
Note: The CKPT process does not write data blocks to the disk or
redo blocks to the online redo log files.
Background
Processes and Recovery: Redo Log Files and LogWriter
Redo log
files record changes to the database as a result of transactions and internal
Oracle server actions. (A transaction is a logical unit of work consisting of
one or more SQL statements run by a user.) Redo log files protect the database
from the loss of integrity because of system failures caused by power outages,
disk failures, and so on. Redo log files must be multiplexed to ensure that the
information stored in them is not lost in the event of a disk failure.
The redo
log consists of groups of redo log files. A group consists of a redo log file
and its multiplexed copies. Each identical copy is said to be a member of that group,
and each group is identified by a number. The LogWriter (LGWR) process writes
redo records from the redo log buffer to all members of a redo log group until
the files are filled or a log switch
operation is requested.
It then switches and writes to the files in the next group. Redo log groups are
used in a circular fashion.
Best
practice: If
possible, multiplexed redo log files should reside on different disks.
The
Archiver (ARCn) Process
ARCn
is an optional background process. However, it is crucial to the recovery of a database
after the loss of a disk. When an online redo log group gets filled, the Oracle
instance begins writing to the next online redo log group. The process of
switching from one online redo log group to another is called a log switch.
The ARCn process initiates archiving of the filled log group at every
log switch. It automatically archives the online redo log group before the log
group can be reused so that all the changes made to the database are preserved.
This enables recovery of the database to the point of failure even if a disk
drive is damaged.
One of the
important decisions that a DBA must make is whether to configure the database
to operate in ARCHIVELOG mode or in NOARCHIVELOG mode.
•
In
NOARCHIVELOG mode, the online redo log files are overwritten each time a log
switch occurs.
•
In
ARCHIVELOG mode, inactive groups of filled online redo log files must be
archived before they can be used again.
Note: ARCHIVELOG mode is essential for
most backup strategies (and is very easy to configure).
Instance
Recovery
The Oracle
database automatically recovers from instance failure. All that the DBA needs
to do is start the instance normally. The instance mounts the control files and
then attempts to open the data files. When it discovers that the data files
have not been synchronized during shutdown, the instance uses information
contained in the redo log groups to roll the data files forward to the time of
shutdown. Then the database is opened and (because the undo tablespace is also
rolled forward) any uncommitted transactions are rolled back.
Phases
of Instance Recovery
For an
instance to open a data file, the system change number (SCN) contained in the
data file’s header must match the current SCN that is stored in the database’s
control files.
If the
numbers do not match, the instance applies redo data from the online redo logs,
sequentially “redoing” transactions until the data files are up-to-date. After
all data files have been synchronized with the control files, the database is
opened and users can log in.
When redo
logs are applied, all transactions are applied to bring the database up
to the state as of the time of failure. This usually includes transactions that
are in progress but have not yet been committed. After the database has been
opened, those uncommitted transactions are rolled back.
At the end of the rollback phase of instance recovery, the data files contain only committed data.
At the end of the rollback phase of instance recovery, the data files contain only committed data.