Wednesday, August 6, 2008

So your standby is out of sync?

When using dataguard, there are various scenarios where physical standby goes out of sync with primary. Refer http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/scenarios.htm#CIHIAADC for scenarios where standby needs to be rolled forward and the steps to follow to bring it in sync with primary.

Before doing anything we need to verify that why standby is not in sync with primary. In this paticular note, i am covering the scenrio where a log is missing from the standby and the associated problems. Verify from v$archived_log that there is a gap in Sequence number. All the logs upto that gap should have APPLIED=YES.

SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

There are basically couple of steps to be performed when the standby is not in sync with Primary and is lagging in terms of redo logs. These are:
1. Take a incremental backup of primary from the SCN where standby is lagging behind and apply on the standby server
2. Then re-create the controlfile of standby database from the primary
******************************************************************************
STEP#1
1. on STANDBY database query the V$DATABASE view and record the current SCN of the standby database:
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
1.3945E+10

SQL> SELECT to_char(CURRENT_SCN) FROM V$DATABASE;
TO_CHAR(CURRENT_SCN)
----------------------------------------
13945141914

2. Stop Redo Apply on the standby database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL*ERROR at line 1:ORA-16136: Managed Standby Recovery not active
If you see this above error, it means Managed Recovery is already off
You can also confirm from the view v$managed_standby to see if the MRP is running or not
SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

3. Connect to the primary database as the RMAN target and create an incremental backup from the current SCN of the standby database that was recorded in step 1:
BACKUP INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' tag 'FOR STANDBY'

4. Do a recovery of STANDBY database using the incremental backup of primary taken above
--> On the Standby server, Without connecting to recovery catalog, catalog the backupset of the incermental backup
$ rman nocatalog target /
RMAN> CATALOG BACKUPPIECE '/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1';
--> Now in the same session, start the recovery
RMAN> RECOVER DATABASE NOREDO;
you should see something like follwing at the end:
channel ORA_DISK_1: reading from backup piece /dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1channel ORA_DISK_1: restored backup piece 1piece handle=/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1 tag=FOR STANDBYchannel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2008-07-25 05:20:3
--> Delete the backup set from standby
RMAN> DELETE BACKUP TAG 'FOR STANDBY';
using channel ORA_DISK_1
List of Backup PiecesBP Key BS Key Pc# Cp# Status Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
17713 17713 1 1 AVAILABLE DISK /dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1
Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piecebackup piece handle=/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1 recid=17713 stamp=660972421Deleted 1 objects

5. Try to start the managed recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
--> If you get an error here, you need to goto STEP#2 for bringing standby in sync
--> If no error, then using the view v$managed_standby, verify that MRP process is started.

6. After this check whether the logs are being applied on the standby or not.
SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
*******************************************************************************************
STEP #2: Since Managed recovery failed after applying the incremental backup, we need to re-create the controlfile of standby. The reason for re-creating the controlfile is that State of the database was same because the database_scn was not updated in the control file after applying the incremental backup while the scn for datafiles were updated. Due to this standby database was still looking for the old file to apply.
To recreate the standby controlfile:
--> Take the backup of controlfile from primary
rman target sys/oracle@boston catalog rman/cat@emrepbackup current controlfile for standby;
--> Copy the controlfile backup to the standby system (or if it is on the common NFS mount, no need to transfer or copy)
--> Shutdown all instances (If standby is RAC) of the standby.
sqlplus / as sysdbashutdown immediateexit
--> Startup nomount, one instance.
sqlplus / as sysdbastartup nomountexit
--> Restore the standby control file.
rman nocatalog target /restore standby controlfile from '/tmp/o1_mf_TAG20070220T151030_.bkp';exit
--> Startup the standby with the new control file.
sqlplus / as sysdbashutdown immediatestartup mountexit
--> Restart managed recovery in one instance (if standby is RAC) of the standby database:
sqlplus / as sysdba
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT

The abobe statement will succeed without errors but still MRP process is not started. The reason is that since the controlfile has been restored from the primary, it is looking for datafiles at the same location as are in primary instead of standby. For example, if the priamry datafiles are located at '+DATA/prod_db/DATAFILE' and standby datafiles are at '+DATA/standby_db/DATAFILE', the new controlfile has the datafiles location as '+DATA/prod_db/DATAFILE'. This can be verified from the query "select name from v$datafile" on the standby instance. We need to rename all the datafiles to reflect the correct location.
To rename the datafiles, there are 2 ways:
1. Without RMAN
--> Change the parameter standby_file_management=manual--> ALTER DATABASE RENAME FILE '+DATA/prod_db/datafile/users.310.620229743' TO '+DATA/standby_db/datafile/USERS.1216.648429765';
2. Using RMAN
--> rman nocatalog target /
--> Catalog the files, the string specified should refer to the diskgroup/filesystem destination of the standby data files.
RMAN> catalog start with '+diskgroup//datafile/';
e.g.:
RMAN> catalog start with '+DATA/ipwp_sac1/datafiles/';
This will give the user a list of files and ask if they should all be catalog. The user should review and say YES if all the datafiles are properly listed.
--> Once that is done, then commit the changes to the controlfile
RMAN> switch database to copy;
--> Now if you start the managed recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT
and check for processes in the view V$MANAGED_STANBY, MRP process should be there. It will also start applying all the archived logs that were missing since last applied log (this might take hours)
*******************************************************************************************
After re-crreating the controfile and renaming the datafiles (before starting the managed recovery), we observed (and it is possible) that there is a datafile in production which is not present on the standby. This was verified by checking the names of datafiles from v$datafile view. It showed that there is one datafile whose location is still as of production and renaming effort also failed because the datafile is not at all present in the standby. So in such a case we need to backup that single datafile from production and restore it at standby.
--> On Production:
RMAN> run{ Allocate channel c1 type disk; Backup datafile '+DATA/ipwp_rwc1/datafile/ipw_invli_is.1643.660041401' format '/dump/ipwp/rman_backup/ipw_invli_is'; }

--> On standby:
RMAN> catalog backuppiece '/dump/db/rman_backup/ipw_invli_is;
cataloged backuppiecebackup piece handle=/dump/ipwp/rman_backup/ipw_invli_is recid=26806 stamp=661232892
RMAN> RESTORE DATAFILE '+DATA/prod_db/datafile/ipw_invli_is.1643.660041401';
Starting restore at 2008-07-28 03:58:18allocated channel: ORA_DISK_1channel ORA_DISK_1: sid=321 devtype=DISK
channel ORA_DISK_1: starting datafile backupset restorechannel ORA_DISK_1: specifying datafile(s) to restore from backup setrestoring datafile 00045 to +DATA/ipwp_rwc1/datafile/ipw_invli_is.1643.660041401channel ORA_DISK_1: reading from backup piece /dump/ipwp/rman_backup/ipw_invli_ischannel ORA_DISK_1: restored backup piece 1piece handle=/dump/ipwp/rman_backup/ipw_invli_is tag=TAG20080728T010613channel ORA_DISK_1: restore complete, elapsed time: 00:01:26Finished restore at 2008-07-28 03:59:46
RMAN> delete backuppiece '/dump/ipwp/rman_backup/ipw_invli_is';
allocated channel: ORA_DISK_1channel ORA_DISK_1: sid=310 devtype=DISK
List of Backup PiecesBP Key BS Key Pc# Cp# Status Device Type Piece Name------- ------- --- --- ----------- ----------- ----------26806 26805 1 1 AVAILABLE DISK /dump/ipwp/rman_backup/ipw_invli_is
Do you really want to delete the above objects (enter YES or NO)? YESdeleted backup piecebackup piece handle=/dump/ipwp/rman_backup/ipw_invli_is recid=26806 stamp=661232892Deleted 1 objects
--> After this re-confirm the location of all datafiles and then start the managed recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
*******************************************************************************

Hope the note helps. This has been my real-life experience and hence found worth sharing. Any thoughts and suggestions, always welcome!!!
Also refer http://www.dba-oracle.com/t_physical_standby_missing_log_scenario.htm
-Rohit

2 comments:

Asif Jafri said...

Why not first try to copy the missing log file (archived) and register it on the standby. Once this is ruled out, we can proceed to step #1 and 2 as described in your article.

Thanks,

Anshul said...

thanks rohit sir;

please give us some more such type of blog....

Anshul Shukla
Dy Mgr(IT)