Thursday, October 16, 2008

Physical Standby out of sync - Missing Datafiles Scenario

Environment:

1. Primary has 200 datafiles and standby has only 166 datafiles

2. Primary is a 3 node cluster and Standby is a 2 node cluster

3. The DB name is mydb

Problem and Symptoms:

1. When I tried to start the MRP on standby, it reported the following error in alert log:

**************************************************************
Errors in file /u01/app/oracle/admin/mydb/bdump/mydb1_mrp0_21189.trc:ORA-01111: name for data file 167 is unknown - rename to correct fileORA-01110: data file 167: '/u01/app/oracle/product/9.2.0/dbs/UNNAMED00167'ORA-01157: cannot identify/lock data file 167 - see DBWR trace fileORA-01111: name for data file 167 is unknown - rename to correct fileORA-01110: data file 167: '/u01/app/oracle/product/9.2.0/dbs/UNNAMED00167'
*************************************************************

2. On further investigation, standby’s alert log also contains following errors:

************************************************************************
Tue Sep 9 04:05:03 2008Media Recovery Log /u03/oradata/mydb/arc_backup/mydb_2_2173.arcMedia Recovery Log /u03/oradata/mydb/arc_backup/mydb_1_1896.arcWARNING: File being created with same name as in PrimaryExisting file may be overwrittenFile #167 added to control file as 'UNNAMED00167'. Originally created as:'/u07/oradata/mydb/myfile_1.dbf'Recovery was unable to create the file as:'/u07/oradata/mydb/myfile_1.dbf'MRP0: Background Media Recovery terminated with error 1274Tue Sep 9 04:05:06 2008Errors in file /u01/app/oracle/admin/mydb/bdump/mydb1_mrp0_7175.trc:ORA-01274: cannot add datafile '/u07/oradata/mydb/myfile-1.dbf' - file could not be createdORA-01119: error in creating database file '/u07/oradata/mydb/myfile_1.dbf'ORA-27054: Message 27054 not found; product=RDBMS; facility=ORALinux-x86_64 Error: 13: Permission denied
**************************************************************************

3. On checking the view v$archived_log, there were lot of log sequence# which were APPLIED=NO

4. There is no gap in the sequence#

Reason:

Parameter db_file_name_convert was not set at standby database. So as long as the files were created on /u02 and /u03 on primary, there was no problem on the standby because standby had /u02 and /u03. But when file#167 was added at /u07 on primary (on Sep 9 04:05:03 2008), it could not map to a /u07 mount point on standby because /u07 does not exists on standby and db_file_name_convert was also not set. As indicated by the alert log, the file#167 was registered in the standby’s control file as “UNANMED00167” at the default location of $ORACLE_HOME/dbs but the file was not created physically on standby database.

Action Plan:

1. At the standby:Please set the db_file_name_convert parameter at the Standby for the /u07 folder at the Primary to the corresponding folder at the Standby.
Since this parameter is a Static parameter, you need to bounce the Standby DB.

******************************************************************************
As step#1, you can do following instead of the above step:

At the standby:
Create /u07 soft link for /u02, to eliminate the bounce of standby db due to the addition of db_file_name_convert init.ora parameter

*********************************************************************************************

2. At the standby :SQL> alter system set standby_file_management=manual;

3. At the Primary for the datafile 167 :

SQL> alter tablespace <> begin backup ;
Copy the Datafile from the Primary to Standby to the correct location.
SQL> Alter tablespace end backup;

4. At the Standby:

SQL> alter database rename file '.......UNNAMED00167' to '<>';

******************************************************************************
You can skip steps#3 and #4 and instead do following step after #2:

At the Standby:
SQL> ALTER DATABASE CREATE DATAFILE '< ....UNNAMED00167>' as '<>';

******************************************************************************

5. To create the remaining datafiles at the Standby automatically:

SQL> alter system set standby_file_management=auto;

6. Start the MRP at the Standby
SQL> alter database recover managed standby database;

At standby database ensure the MRP is running as expected:
SQL>select process, status , sequence# from v$managed_standby;

When Primary and Standby are RAC databases:

1. On Standby: You can see multiple copies of some or all logs transported and applied on standby when you check the view v$archived_log.

2. On Standby: All sequence# should have APPLIED=YES in v$archived_log for all threads. This ensures that all logs from all threads were transported and applied on standby and hence keeps standby in sync with primary.

3. On Standby: In the view v$archived_log you may not see same number of multiple copies of all logs. For example, if the primary is a 3 node cluster, you may or may not have 3 copies of each log i.e. you may not have the same sequence# log on standby for all 3 threads. Of course the reason is that number of logs generated on all 3 nodes of primary will differ. The current sequence# transported from a node of primary RAC database can be seen by querying v$archived_log on standby:

SQL> select max(sequence#) from v$archived_log where thread#=1;

As explained above, the output will differ for all 3 threads.

Wednesday, August 6, 2008

Tips for Oracle DBAs

I have a blog where i stream some useful Oracle tips.
http://rohitguptaoracletips.blogspot.com/
These are very few tips out of some challenging tasks i have been doing and do have lot more to add

-Rohit

So your standby is out of sync?

When using dataguard, there are various scenarios where physical standby goes out of sync with primary. Refer http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/scenarios.htm#CIHIAADC for scenarios where standby needs to be rolled forward and the steps to follow to bring it in sync with primary.

Before doing anything we need to verify that why standby is not in sync with primary. In this paticular note, i am covering the scenrio where a log is missing from the standby and the associated problems. Verify from v$archived_log that there is a gap in Sequence number. All the logs upto that gap should have APPLIED=YES.

SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

There are basically couple of steps to be performed when the standby is not in sync with Primary and is lagging in terms of redo logs. These are:
1. Take a incremental backup of primary from the SCN where standby is lagging behind and apply on the standby server
2. Then re-create the controlfile of standby database from the primary
******************************************************************************
STEP#1
1. on STANDBY database query the V$DATABASE view and record the current SCN of the standby database:
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
1.3945E+10

SQL> SELECT to_char(CURRENT_SCN) FROM V$DATABASE;
TO_CHAR(CURRENT_SCN)
----------------------------------------
13945141914

2. Stop Redo Apply on the standby database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL*ERROR at line 1:ORA-16136: Managed Standby Recovery not active
If you see this above error, it means Managed Recovery is already off
You can also confirm from the view v$managed_standby to see if the MRP is running or not
SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

3. Connect to the primary database as the RMAN target and create an incremental backup from the current SCN of the standby database that was recorded in step 1:
BACKUP INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' tag 'FOR STANDBY'

4. Do a recovery of STANDBY database using the incremental backup of primary taken above
--> On the Standby server, Without connecting to recovery catalog, catalog the backupset of the incermental backup
$ rman nocatalog target /
RMAN> CATALOG BACKUPPIECE '/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1';
--> Now in the same session, start the recovery
RMAN> RECOVER DATABASE NOREDO;
you should see something like follwing at the end:
channel ORA_DISK_1: reading from backup piece /dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1channel ORA_DISK_1: restored backup piece 1piece handle=/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1 tag=FOR STANDBYchannel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2008-07-25 05:20:3
--> Delete the backup set from standby
RMAN> DELETE BACKUP TAG 'FOR STANDBY';
using channel ORA_DISK_1
List of Backup PiecesBP Key BS Key Pc# Cp# Status Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
17713 17713 1 1 AVAILABLE DISK /dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1
Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piecebackup piece handle=/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1 recid=17713 stamp=660972421Deleted 1 objects

5. Try to start the managed recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
--> If you get an error here, you need to goto STEP#2 for bringing standby in sync
--> If no error, then using the view v$managed_standby, verify that MRP process is started.

6. After this check whether the logs are being applied on the standby or not.
SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
*******************************************************************************************
STEP #2: Since Managed recovery failed after applying the incremental backup, we need to re-create the controlfile of standby. The reason for re-creating the controlfile is that State of the database was same because the database_scn was not updated in the control file after applying the incremental backup while the scn for datafiles were updated. Due to this standby database was still looking for the old file to apply.
To recreate the standby controlfile:
--> Take the backup of controlfile from primary
rman target sys/oracle@boston catalog rman/cat@emrepbackup current controlfile for standby;
--> Copy the controlfile backup to the standby system (or if it is on the common NFS mount, no need to transfer or copy)
--> Shutdown all instances (If standby is RAC) of the standby.
sqlplus / as sysdbashutdown immediateexit
--> Startup nomount, one instance.
sqlplus / as sysdbastartup nomountexit
--> Restore the standby control file.
rman nocatalog target /restore standby controlfile from '/tmp/o1_mf_TAG20070220T151030_.bkp';exit
--> Startup the standby with the new control file.
sqlplus / as sysdbashutdown immediatestartup mountexit
--> Restart managed recovery in one instance (if standby is RAC) of the standby database:
sqlplus / as sysdba
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT

The abobe statement will succeed without errors but still MRP process is not started. The reason is that since the controlfile has been restored from the primary, it is looking for datafiles at the same location as are in primary instead of standby. For example, if the priamry datafiles are located at '+DATA/prod_db/DATAFILE' and standby datafiles are at '+DATA/standby_db/DATAFILE', the new controlfile has the datafiles location as '+DATA/prod_db/DATAFILE'. This can be verified from the query "select name from v$datafile" on the standby instance. We need to rename all the datafiles to reflect the correct location.
To rename the datafiles, there are 2 ways:
1. Without RMAN
--> Change the parameter standby_file_management=manual--> ALTER DATABASE RENAME FILE '+DATA/prod_db/datafile/users.310.620229743' TO '+DATA/standby_db/datafile/USERS.1216.648429765';
2. Using RMAN
--> rman nocatalog target /
--> Catalog the files, the string specified should refer to the diskgroup/filesystem destination of the standby data files.
RMAN> catalog start with '+diskgroup//datafile/';
e.g.:
RMAN> catalog start with '+DATA/ipwp_sac1/datafiles/';
This will give the user a list of files and ask if they should all be catalog. The user should review and say YES if all the datafiles are properly listed.
--> Once that is done, then commit the changes to the controlfile
RMAN> switch database to copy;
--> Now if you start the managed recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT
and check for processes in the view V$MANAGED_STANBY, MRP process should be there. It will also start applying all the archived logs that were missing since last applied log (this might take hours)
*******************************************************************************************
After re-crreating the controfile and renaming the datafiles (before starting the managed recovery), we observed (and it is possible) that there is a datafile in production which is not present on the standby. This was verified by checking the names of datafiles from v$datafile view. It showed that there is one datafile whose location is still as of production and renaming effort also failed because the datafile is not at all present in the standby. So in such a case we need to backup that single datafile from production and restore it at standby.
--> On Production:
RMAN> run{ Allocate channel c1 type disk; Backup datafile '+DATA/ipwp_rwc1/datafile/ipw_invli_is.1643.660041401' format '/dump/ipwp/rman_backup/ipw_invli_is'; }

--> On standby:
RMAN> catalog backuppiece '/dump/db/rman_backup/ipw_invli_is;
cataloged backuppiecebackup piece handle=/dump/ipwp/rman_backup/ipw_invli_is recid=26806 stamp=661232892
RMAN> RESTORE DATAFILE '+DATA/prod_db/datafile/ipw_invli_is.1643.660041401';
Starting restore at 2008-07-28 03:58:18allocated channel: ORA_DISK_1channel ORA_DISK_1: sid=321 devtype=DISK
channel ORA_DISK_1: starting datafile backupset restorechannel ORA_DISK_1: specifying datafile(s) to restore from backup setrestoring datafile 00045 to +DATA/ipwp_rwc1/datafile/ipw_invli_is.1643.660041401channel ORA_DISK_1: reading from backup piece /dump/ipwp/rman_backup/ipw_invli_ischannel ORA_DISK_1: restored backup piece 1piece handle=/dump/ipwp/rman_backup/ipw_invli_is tag=TAG20080728T010613channel ORA_DISK_1: restore complete, elapsed time: 00:01:26Finished restore at 2008-07-28 03:59:46
RMAN> delete backuppiece '/dump/ipwp/rman_backup/ipw_invli_is';
allocated channel: ORA_DISK_1channel ORA_DISK_1: sid=310 devtype=DISK
List of Backup PiecesBP Key BS Key Pc# Cp# Status Device Type Piece Name------- ------- --- --- ----------- ----------- ----------26806 26805 1 1 AVAILABLE DISK /dump/ipwp/rman_backup/ipw_invli_is
Do you really want to delete the above objects (enter YES or NO)? YESdeleted backup piecebackup piece handle=/dump/ipwp/rman_backup/ipw_invli_is recid=26806 stamp=661232892Deleted 1 objects
--> After this re-confirm the location of all datafiles and then start the managed recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
*******************************************************************************

Hope the note helps. This has been my real-life experience and hence found worth sharing. Any thoughts and suggestions, always welcome!!!
Also refer http://www.dba-oracle.com/t_physical_standby_missing_log_scenario.htm
-Rohit