Saturday, November 21, 2009


Resolve Physical Standby gaps

Scenario #1 - Version 10g – ASM is used as storage and primary’s backups not available on standby

Assumptions:

1. The primary’s backup location can not be mounted onto standby server i.e it the NFS mount where primary’s backup are taken is not available to standby server
2. ASM is the storage on primary and standby

To resolve the gap, we need to transfer the standby from primary to standby.

There are 2 cases:

1. The missing log is available in the primary database and has not been deleted
2. The missing log is not available in the primary database and is available in the archive log backup of primary.


First check if the log is available or not, run the following query on primary database:


select status, deleted from v$archived_log where sequence# = ;

In both the cases, the steps are almost similar. Wherever the steps are exclusive to a case, a note is provided to specify the same.

Step#1 – Restore the missing log from primary’s archived log backup.

This step is only required when the log has been deleted from the primary database and is available only in the archived log backups.

RMAN> connect catalog username/password@catalog

RMAN> connect target /

RMAN> restore archivelog sequence 157682;

==> If there is a gap of more than 1 log, we need to use the statement like:

RMAN> restore archivelog from sequence until sequence ;

Note that this will restore the archive log to the default archival destination of the database specified by the parameter log_archive_dest_1

==> You can use the following command to know the backup piece which contains the backup of that missing log

RMAN> list backup of archivelog sequence 157682;

Step#2 Copy the archive log from ASM diskgroup to the normal OCFS file system

If the missing archive log is available in the primary database, start from this step.

Use the following RMAN command to achieve this:

RMAN> copy archivelog '+DATA/IPWP_RWC1/ARCHIVELOG/2009_11_13/thread_1_seq_157682.1341.702785107' to '/tmp/thread_1_seq_157682.1341.702785107';

Step#3 “scp” the log from primary server to standby server’s file system

scp /tmp/thread_1_seq_157682.1341.702785107 oracle@ipw-db-sac1: /tmp/thread_1_seq_157682.1341.702785107

Step#4 Manually recover the standby database using the just shipped archived log

To do this first cancel the managed recovery:

Now, we need to perform the manual recovery to apply the archive log which is missing and has been transferred from primary:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

SQL> recover automatic standby database; <== This is for manual recovery ORA-00279: change 34121233951 generated at 11/12/2009 07:54:10 needed for
thread 1
ORA-00289: suggestion : +DATA
ORA-00280: change 34121233951 for thread 1 is in sequence #157682
ORA-00278: log file '+DATA' no longer needed for this recovery
ORA-00308: cannot open archived log '+DATA'
ORA-17503: ksfdopn:2 Failed to open file +DATA
ORA-15045: ASM file name '+DATA' is not in reference form


Specify log: {=suggested filename AUTO CANCEL}
/tmp/thread_1_seq_157682.1341.702785107 <== At this prompt, provide the name of the log which has been copied over from primary

ORA-00279: change 34121306612 generated at 11/12/2009 07:58:19 needed for
thread 1
ORA-00289: suggestion : +DATA
ORA-00280: change 34121306612 for thread 1 is in sequence #157683
ORA-00278: log file '/tmp/thread_1_seq_157682.1341.702785107' no longer needed
for this recovery


Specify log: {=suggested filename AUTO CANCEL}
CANCEL <== At this prompt now, enter CANCEL to tell oracle to stop recovery because we have applied the one missing log
Media recovery cancelled.

Now, restart managed recovery

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

Now by querying the view v$archived_log, we can see that the missing log has been bypassed and MRP process has proceeded the log application process.

Scenario # 2 - Version 10g – Primary’s backups are available on the standby

Assumptions:

1. The primary’s backup location can be mounted onto standby server i.e it the NFS mount where primary’s backup are taken is available to standby server
2. OCFS is being used as storage

If the archive logs have been deleted from the database, we need to restore them from the archive log backups as demonstrated. This scenario is in fact independent of ASM or OCFS because backup pieces are available on the standby.

STEP#1: First, know the primary’s backup piece(s) which contain the missing archive log(s), for example:

RMAN> list backup of archivelog from sequence 154472 until sequence 154474;
This will give the name of all backup pieces which contain the required logs.

STEP#2: Since the backup pieces are available on the standby server, catalog all the backup pieces on the standby. For example,

RMAN > catalog backuppiece '/location/piece_name';
You do not need to connect to recovery catalog, but connect to standby as the target. Note that “catalog” is only available in version 10g

STEP#3: Once the pieces are cataloged, stay connected to the standby database as target and restore the archive logs as follows:

RMAN> restore archivelog from sequence 154472 until sequence 154474;

RMAN will restore all the archivelogs to the correct diskgroup (archival location) on the standby

Since we have now restored the missing log(s) onto the standby, there is no need of recovery. The restored logs will now be available in standby and shold be applied by MRP automatically which has been so far waiting for the gaps to be filled.

Scenario # 3 - Version 9i – Primary’s backups are not available on standby server

Assumptions:

1. The archive logs have been deleted from the primary database
2. Primary’s backups are not available on standby server.

In this scenario, the steps are almost similar to the first scenario.

Note that, if ASM is not the storage, then we just need to restore the archive logs and move them to the standby server. Which means step#2 can be skipped when compared to the first scenario.

Important notes on resolving physical standby gaps


==> If the archive log has not been deleted from the database and ASM is not the storage, then simply copy the archive log from primary to standby’s archival location. MRP will automatically pick the archive logs and start applying them. This is true for 10g as well as 9i.

==> In all the scenarios, there are 2 fundamental ways to fill the gap on standby – Either by restoring the missing logs at standby’s archival location where MRP can pick it up automatically
OR
If the missing log(s) can not be restored at the default location, then perform an incomplete recovery of standby using the restored archived log which has been restored at the non-default location. This is basically applicable in cases where ASM is being used

Thursday, October 16, 2008

Physical Standby out of sync - Missing Datafiles Scenario

Environment:

1. Primary has 200 datafiles and standby has only 166 datafiles

2. Primary is a 3 node cluster and Standby is a 2 node cluster

3. The DB name is mydb

Problem and Symptoms:

1. When I tried to start the MRP on standby, it reported the following error in alert log:

**************************************************************
Errors in file /u01/app/oracle/admin/mydb/bdump/mydb1_mrp0_21189.trc:ORA-01111: name for data file 167 is unknown - rename to correct fileORA-01110: data file 167: '/u01/app/oracle/product/9.2.0/dbs/UNNAMED00167'ORA-01157: cannot identify/lock data file 167 - see DBWR trace fileORA-01111: name for data file 167 is unknown - rename to correct fileORA-01110: data file 167: '/u01/app/oracle/product/9.2.0/dbs/UNNAMED00167'
*************************************************************

2. On further investigation, standby’s alert log also contains following errors:

************************************************************************
Tue Sep 9 04:05:03 2008Media Recovery Log /u03/oradata/mydb/arc_backup/mydb_2_2173.arcMedia Recovery Log /u03/oradata/mydb/arc_backup/mydb_1_1896.arcWARNING: File being created with same name as in PrimaryExisting file may be overwrittenFile #167 added to control file as 'UNNAMED00167'. Originally created as:'/u07/oradata/mydb/myfile_1.dbf'Recovery was unable to create the file as:'/u07/oradata/mydb/myfile_1.dbf'MRP0: Background Media Recovery terminated with error 1274Tue Sep 9 04:05:06 2008Errors in file /u01/app/oracle/admin/mydb/bdump/mydb1_mrp0_7175.trc:ORA-01274: cannot add datafile '/u07/oradata/mydb/myfile-1.dbf' - file could not be createdORA-01119: error in creating database file '/u07/oradata/mydb/myfile_1.dbf'ORA-27054: Message 27054 not found; product=RDBMS; facility=ORALinux-x86_64 Error: 13: Permission denied
**************************************************************************

3. On checking the view v$archived_log, there were lot of log sequence# which were APPLIED=NO

4. There is no gap in the sequence#

Reason:

Parameter db_file_name_convert was not set at standby database. So as long as the files were created on /u02 and /u03 on primary, there was no problem on the standby because standby had /u02 and /u03. But when file#167 was added at /u07 on primary (on Sep 9 04:05:03 2008), it could not map to a /u07 mount point on standby because /u07 does not exists on standby and db_file_name_convert was also not set. As indicated by the alert log, the file#167 was registered in the standby’s control file as “UNANMED00167” at the default location of $ORACLE_HOME/dbs but the file was not created physically on standby database.

Action Plan:

1. At the standby:Please set the db_file_name_convert parameter at the Standby for the /u07 folder at the Primary to the corresponding folder at the Standby.
Since this parameter is a Static parameter, you need to bounce the Standby DB.

******************************************************************************
As step#1, you can do following instead of the above step:

At the standby:
Create /u07 soft link for /u02, to eliminate the bounce of standby db due to the addition of db_file_name_convert init.ora parameter

*********************************************************************************************

2. At the standby :SQL> alter system set standby_file_management=manual;

3. At the Primary for the datafile 167 :

SQL> alter tablespace <> begin backup ;
Copy the Datafile from the Primary to Standby to the correct location.
SQL> Alter tablespace end backup;

4. At the Standby:

SQL> alter database rename file '.......UNNAMED00167' to '<>';

******************************************************************************
You can skip steps#3 and #4 and instead do following step after #2:

At the Standby:
SQL> ALTER DATABASE CREATE DATAFILE '< ....UNNAMED00167>' as '<>';

******************************************************************************

5. To create the remaining datafiles at the Standby automatically:

SQL> alter system set standby_file_management=auto;

6. Start the MRP at the Standby
SQL> alter database recover managed standby database;

At standby database ensure the MRP is running as expected:
SQL>select process, status , sequence# from v$managed_standby;

When Primary and Standby are RAC databases:

1. On Standby: You can see multiple copies of some or all logs transported and applied on standby when you check the view v$archived_log.

2. On Standby: All sequence# should have APPLIED=YES in v$archived_log for all threads. This ensures that all logs from all threads were transported and applied on standby and hence keeps standby in sync with primary.

3. On Standby: In the view v$archived_log you may not see same number of multiple copies of all logs. For example, if the primary is a 3 node cluster, you may or may not have 3 copies of each log i.e. you may not have the same sequence# log on standby for all 3 threads. Of course the reason is that number of logs generated on all 3 nodes of primary will differ. The current sequence# transported from a node of primary RAC database can be seen by querying v$archived_log on standby:

SQL> select max(sequence#) from v$archived_log where thread#=1;

As explained above, the output will differ for all 3 threads.

Wednesday, August 6, 2008

Tips for Oracle DBAs

I have a blog where i stream some useful Oracle tips.
http://rohitguptaoracletips.blogspot.com/
These are very few tips out of some challenging tasks i have been doing and do have lot more to add

-Rohit

So your standby is out of sync?

When using dataguard, there are various scenarios where physical standby goes out of sync with primary. Refer http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/scenarios.htm#CIHIAADC for scenarios where standby needs to be rolled forward and the steps to follow to bring it in sync with primary.

Before doing anything we need to verify that why standby is not in sync with primary. In this paticular note, i am covering the scenrio where a log is missing from the standby and the associated problems. Verify from v$archived_log that there is a gap in Sequence number. All the logs upto that gap should have APPLIED=YES.

SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

There are basically couple of steps to be performed when the standby is not in sync with Primary and is lagging in terms of redo logs. These are:
1. Take a incremental backup of primary from the SCN where standby is lagging behind and apply on the standby server
2. Then re-create the controlfile of standby database from the primary
******************************************************************************
STEP#1
1. on STANDBY database query the V$DATABASE view and record the current SCN of the standby database:
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
1.3945E+10

SQL> SELECT to_char(CURRENT_SCN) FROM V$DATABASE;
TO_CHAR(CURRENT_SCN)
----------------------------------------
13945141914

2. Stop Redo Apply on the standby database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL*ERROR at line 1:ORA-16136: Managed Standby Recovery not active
If you see this above error, it means Managed Recovery is already off
You can also confirm from the view v$managed_standby to see if the MRP is running or not
SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

3. Connect to the primary database as the RMAN target and create an incremental backup from the current SCN of the standby database that was recorded in step 1:
BACKUP INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' tag 'FOR STANDBY'

4. Do a recovery of STANDBY database using the incremental backup of primary taken above
--> On the Standby server, Without connecting to recovery catalog, catalog the backupset of the incermental backup
$ rman nocatalog target /
RMAN> CATALOG BACKUPPIECE '/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1';
--> Now in the same session, start the recovery
RMAN> RECOVER DATABASE NOREDO;
you should see something like follwing at the end:
channel ORA_DISK_1: reading from backup piece /dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1channel ORA_DISK_1: restored backup piece 1piece handle=/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1 tag=FOR STANDBYchannel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2008-07-25 05:20:3
--> Delete the backup set from standby
RMAN> DELETE BACKUP TAG 'FOR STANDBY';
using channel ORA_DISK_1
List of Backup PiecesBP Key BS Key Pc# Cp# Status Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
17713 17713 1 1 AVAILABLE DISK /dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1
Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piecebackup piece handle=/dump/ipwp/inc_bkup/ForStandby_1qjm8jn2_1_1 recid=17713 stamp=660972421Deleted 1 objects

5. Try to start the managed recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
--> If you get an error here, you need to goto STEP#2 for bringing standby in sync
--> If no error, then using the view v$managed_standby, verify that MRP process is started.

6. After this check whether the logs are being applied on the standby or not.
SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
*******************************************************************************************
STEP #2: Since Managed recovery failed after applying the incremental backup, we need to re-create the controlfile of standby. The reason for re-creating the controlfile is that State of the database was same because the database_scn was not updated in the control file after applying the incremental backup while the scn for datafiles were updated. Due to this standby database was still looking for the old file to apply.
To recreate the standby controlfile:
--> Take the backup of controlfile from primary
rman target sys/oracle@boston catalog rman/cat@emrepbackup current controlfile for standby;
--> Copy the controlfile backup to the standby system (or if it is on the common NFS mount, no need to transfer or copy)
--> Shutdown all instances (If standby is RAC) of the standby.
sqlplus / as sysdbashutdown immediateexit
--> Startup nomount, one instance.
sqlplus / as sysdbastartup nomountexit
--> Restore the standby control file.
rman nocatalog target /restore standby controlfile from '/tmp/o1_mf_TAG20070220T151030_.bkp';exit
--> Startup the standby with the new control file.
sqlplus / as sysdbashutdown immediatestartup mountexit
--> Restart managed recovery in one instance (if standby is RAC) of the standby database:
sqlplus / as sysdba
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT

The abobe statement will succeed without errors but still MRP process is not started. The reason is that since the controlfile has been restored from the primary, it is looking for datafiles at the same location as are in primary instead of standby. For example, if the priamry datafiles are located at '+DATA/prod_db/DATAFILE' and standby datafiles are at '+DATA/standby_db/DATAFILE', the new controlfile has the datafiles location as '+DATA/prod_db/DATAFILE'. This can be verified from the query "select name from v$datafile" on the standby instance. We need to rename all the datafiles to reflect the correct location.
To rename the datafiles, there are 2 ways:
1. Without RMAN
--> Change the parameter standby_file_management=manual--> ALTER DATABASE RENAME FILE '+DATA/prod_db/datafile/users.310.620229743' TO '+DATA/standby_db/datafile/USERS.1216.648429765';
2. Using RMAN
--> rman nocatalog target /
--> Catalog the files, the string specified should refer to the diskgroup/filesystem destination of the standby data files.
RMAN> catalog start with '+diskgroup//datafile/';
e.g.:
RMAN> catalog start with '+DATA/ipwp_sac1/datafiles/';
This will give the user a list of files and ask if they should all be catalog. The user should review and say YES if all the datafiles are properly listed.
--> Once that is done, then commit the changes to the controlfile
RMAN> switch database to copy;
--> Now if you start the managed recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT
and check for processes in the view V$MANAGED_STANBY, MRP process should be there. It will also start applying all the archived logs that were missing since last applied log (this might take hours)
*******************************************************************************************
After re-crreating the controfile and renaming the datafiles (before starting the managed recovery), we observed (and it is possible) that there is a datafile in production which is not present on the standby. This was verified by checking the names of datafiles from v$datafile view. It showed that there is one datafile whose location is still as of production and renaming effort also failed because the datafile is not at all present in the standby. So in such a case we need to backup that single datafile from production and restore it at standby.
--> On Production:
RMAN> run{ Allocate channel c1 type disk; Backup datafile '+DATA/ipwp_rwc1/datafile/ipw_invli_is.1643.660041401' format '/dump/ipwp/rman_backup/ipw_invli_is'; }

--> On standby:
RMAN> catalog backuppiece '/dump/db/rman_backup/ipw_invli_is;
cataloged backuppiecebackup piece handle=/dump/ipwp/rman_backup/ipw_invli_is recid=26806 stamp=661232892
RMAN> RESTORE DATAFILE '+DATA/prod_db/datafile/ipw_invli_is.1643.660041401';
Starting restore at 2008-07-28 03:58:18allocated channel: ORA_DISK_1channel ORA_DISK_1: sid=321 devtype=DISK
channel ORA_DISK_1: starting datafile backupset restorechannel ORA_DISK_1: specifying datafile(s) to restore from backup setrestoring datafile 00045 to +DATA/ipwp_rwc1/datafile/ipw_invli_is.1643.660041401channel ORA_DISK_1: reading from backup piece /dump/ipwp/rman_backup/ipw_invli_ischannel ORA_DISK_1: restored backup piece 1piece handle=/dump/ipwp/rman_backup/ipw_invli_is tag=TAG20080728T010613channel ORA_DISK_1: restore complete, elapsed time: 00:01:26Finished restore at 2008-07-28 03:59:46
RMAN> delete backuppiece '/dump/ipwp/rman_backup/ipw_invli_is';
allocated channel: ORA_DISK_1channel ORA_DISK_1: sid=310 devtype=DISK
List of Backup PiecesBP Key BS Key Pc# Cp# Status Device Type Piece Name------- ------- --- --- ----------- ----------- ----------26806 26805 1 1 AVAILABLE DISK /dump/ipwp/rman_backup/ipw_invli_is
Do you really want to delete the above objects (enter YES or NO)? YESdeleted backup piecebackup piece handle=/dump/ipwp/rman_backup/ipw_invli_is recid=26806 stamp=661232892Deleted 1 objects
--> After this re-confirm the location of all datafiles and then start the managed recovery.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
*******************************************************************************

Hope the note helps. This has been my real-life experience and hence found worth sharing. Any thoughts and suggestions, always welcome!!!
Also refer http://www.dba-oracle.com/t_physical_standby_missing_log_scenario.htm
-Rohit