My Basic Knowledge: Recovering from the Loss of an Oracle Redo Log Group

To deal with the loss of redo log files, it is important to understand the possible states of redo log groups. Redo log groups cycle through three different states as part of the normal running of the Oracle database. They are, in order of the cycle:

• CURRENT: This state means that the redo log group is being written to by LGWR to record redo data for any transactions going on in the database. The log group remains in this state until there is a switch to another log group.

• ACTIVE: The redo log group is no longer being written to, but it still contains redo data that is required for instance recovery. A group stays in this state until a checkpoint has written to the data files all of the changes recorded in that redo log group.

• INACTIVE: The checkpoint discussed above has indeed executed, meaning that the redo log group is no longer needed for instance recovery, and is free to become the next CURRENT log group.
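
To watch a group move through these states, something like the following can be run from SQL*Plus on a test database. This is only a sketch: it assumes a suitably privileged session (for example, one connected AS SYSDBA), and the exact statuses you see depend on timing and on how many groups exist.

```sql
-- Show the state of every redo log group before the switch.
SELECT GROUP#, SEQUENCE#, STATUS FROM V$LOG ORDER BY GROUP#;

-- Force a log switch: the CURRENT group stops being written to
-- and LGWR moves on to the next group.
ALTER SYSTEM SWITCH LOGFILE;

-- The previously CURRENT group normally shows ACTIVE now, and
-- becomes INACTIVE once its checkpoint has completed.
SELECT GROUP#, SEQUENCE#, STATUS FROM V$LOG ORDER BY GROUP#;
```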

If you have lost an entire redo log group, then all copies of the log files for that group are unusable or gone. The simplest case is where the redo log group is in the inactive state. That means it is not currently being written to, and it is no longer needed for instance recovery. If the problem is temporary, or you are able to fix the media, then the database continues to run normally, and the group is reused when enough log switch events occur. Otherwise, if the media cannot be fixed, you can clear the log file. When you clear a log file, you are indicating that it can be reused.
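
Before deciding that a whole group is lost, it may help to list each group together with its member files. The query below is a minimal sketch that joins V$LOG to V$LOGFILE; the column aliases are chosen here only for readability.

```sql
-- One row per member file; a group is lost only when every
-- one of its members is unusable or missing.
SELECT l.GROUP#,
       l.STATUS  AS GROUP_STATUS,
       l.MEMBERS,
       f.STATUS  AS MEMBER_STATUS,
       f.MEMBER
FROM   V$LOG l
       JOIN V$LOGFILE f ON f.GROUP# = l.GROUP#
ORDER  BY l.GROUP#, f.MEMBER;
```

A member status of INVALID usually means the file is inaccessible (it can also appear for a member that was just added), so it is worth confirming against the alert log before acting.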

If the redo log group in question is active, then, even though it is not currently being written to, it is still needed for instance recovery. If you are able to perform a checkpoint, then the log file group is no longer needed for instance recovery, and you can proceed as if the group were in the inactive state.
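
A checkpoint can be requested manually, as sketched below. Group 3 is used purely as an illustration; substitute the number of the damaged group.

```sql
-- Ask the database to write all changed buffers to the data files.
ALTER SYSTEM CHECKPOINT;

-- If the checkpoint succeeded, the group should now be INACTIVE.
SELECT GROUP#, STATUS, ARCHIVED FROM V$LOG WHERE GROUP# = 3;
```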

If the log group is in the current state, then it is, or was, being actively written to at the time of the loss. You may even see the LGWR process fail in this case. If this happens, the instance crashes. Your only option at this point is to restore from backup, perform cancel-based incomplete recovery, and then open the database with the RESETLOGS option.
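
The recovery sequence for this case might look roughly like the following in SQL*Plus. It is a sketch under stated assumptions: the data files have already been restored from a backup taken before the loss, the current control file is still intact, and the archived logs up to the lost group are available. Note that STARTUP and RECOVER are SQL*Plus commands rather than SQL statements.

```sql
-- Assumes the data files have already been restored from backup
-- (for example with RMAN or operating-system copies).
STARTUP MOUNT

-- Apply archived redo; type CANCEL at the prompt once the last
-- available archived log has been applied.
RECOVER DATABASE UNTIL CANCEL

-- Open the database and reset the online redo logs.
ALTER DATABASE OPEN RESETLOGS;
```

Any changes that existed only in the lost current group are not recovered, which is why this is an incomplete recovery.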

Clearing a Log File

Clear a log file using this command:

ALTER DATABASE CLEAR [UNARCHIVED] LOGFILE GROUP <n> [UNRECOVERABLE DATAFILE]

When you clear a log file, you are indicating that it can be reused. If the log file has already been archived, the simplest form of the command can be used. Use the following query to determine which log groups have been archived:

SQL> SELECT GROUP#, STATUS, ARCHIVED FROM V$LOG;

For example, the following command clears redo log group 3, which has already been archived:

SQL> ALTER DATABASE CLEAR LOGFILE GROUP 3;

If the redo log group has not been archived, then you must specify the UNARCHIVED keyword. This forces you to acknowledge that it is possible that there are backups that rely on that redo log for recovery, and you have decided to forgo that recovery opportunity. This may be satisfactory for you, especially if you take another backup right after you correct the redo log group problem; you then no longer need that redo log file.
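
As an illustration, clearing an unarchived group could look like this. Group 3 is again just an assumed example number.

```sql
-- Confirm that the group has not been archived (ARCHIVED = 'NO').
SELECT GROUP#, STATUS, ARCHIVED FROM V$LOG WHERE GROUP# = 3;

-- Clear it anyway, accepting that its redo is lost for recovery.
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3;
```

Because earlier backups can no longer be rolled forward past the cleared log, taking a fresh backup immediately afterwards is the sensible next step.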

It is possible that the redo log is required to recover a data file that is currently offline.
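
The UNRECOVERABLE DATAFILE clause shown in the command syntax covers this situation. The following is a hedged sketch, assuming group 3 is the unarchived group and that you have accepted that the offline data file cannot be recovered once the log is cleared.

```sql
-- List data files that are offline or flagged as needing recovery.
SELECT FILE#, STATUS, NAME
FROM   V$DATAFILE
WHERE  STATUS IN ('OFFLINE', 'RECOVER');

-- Clear the unarchived group even though an offline data file
-- needs its redo; that file can then no longer be recovered.
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3 UNRECOVERABLE DATAFILE;
```

After this, the affected data file and its tablespace would typically have to be dropped, so the clause should be used only as a last resort.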