RMAN — ORA-19909: datafile 1 belongs to an orphan incarnation in a Standby Database.

We had an interesting scenario where we did a full rman refresh of a standby database becuase the archivelogs went missing and there was a gap. After doing the restore we got the below error in the RMAN recovery

rman nocatalog target sys/******@mw auxiliary /
connected to target database: MW (DBID=000000000)
using target database control file instead of recovery catalog
connected to auxiliary database: MW (not mounted)
RMAN> DUPLICATE TARGET DATABASE FOR STANDBY NOFILENAMECHECK DORECOVER;
The restore was succesful but the recovery gave below error

Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: ‘+DATAC1/mwdr/datafile/system.572.867661777’
After doing a lot of googling and searching support.oracle.com it turns out it has something to do with the
incarnation of databases. We checked the incarnation for both the Primary and Standby database.

Output for Primary Database
RMAN> list incarnation of database;
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 1 MW 1312370394 PARENT 1 24-AUG-13
2 2 MW 1312370394 CURRENT 925702 10-JUL-14
Output for Standby Database

RMAN> list incarnation of database;

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 1 MW 1312370394 PARENT 1 24-AUG-13
2 2 MW 1312370394 ORPHAN 925702 10-JUL-14
4 4 MW 1312370394 ORPHAN 20799166 23-OCT-14
3 3 MW 1312370394 CURRENT 22477563 30-OCT-14


Solution:

As we can see clearly from the output the primary database is currenly on a different incarnation from the standby database. The way to address it is to reset the incarnation on the standby database to match the primary incarnation.
RMAN> reset database to incarnation 2;

RMAN> list incarnation of database;
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 1 MW 1312370394 PARENT 1 24-AUG-13
2 2 MW 1312370394 CURRENT 925702 10-JUL-14
4 4 MW 1312370394 ORPHAN 20799166 23-OCT-14
3 3 MW 1312370394 ORPHAN 22477563 30-OCT-14
As you can now the incarnations match. Now do the recovery for the database again and the logs will apply and start shipping like normal again.

Guaranteed Restore Points Oracle 11g

Guaranteed Restore Points are a life-saver when it comes to Application changes.It can ensure that you can rewind the database to a time without tradional point in time recovery. Guaranteed restore points are basically alias’es for SCN’s

A normal restore point assigns a restore point name to an SCN or specific point in time.The control file can maintain a record of thousands of normal restore points with no significant effect on database performance. Normal restore points eventually age out of the control file if not manually deleted, so they require no ongoing maintenance.

SQL> CREATE RESTORE POINT before_upgrade;

Creating a normal restore point eliminates manually recording an SCN in advance or determine the correct SCN after the fact by using features such as Flashback Query.

Like a normal restore point, a guaranteed restore point serves as an alias for an SCN in recovery operations. A principal difference is that guaranteed restore points never age out of the control file and must be explicitly dropped. In general, you can use a guaranteed restore point as an alias for an SCN with any command that works with a normal restore point. Except as noted, the information about where and how to use normal restore points applies to guaranteed restore points as well.

A guaranteed restore point ensures that you can use Flashback Database to rewind a database to its state at the restore point SCN, even if the generation of flashback logs is disabled. If flashback logging is enabled, then a guaranteed restore point enforces the retention of flashback logs required for Flashback Database to any SCN after the earliest guaranteed restore point.

This example shows how to create a guaranteed restore point:

SQL> CREATE RESTORE POINT before_upgrade GUARANTEE FLASHBACK DATABASE;

Listing Restore Points
______________________

You can use the LIST command to list either a specific restore point or all restore points known to the RMAN repository. The variations of the command are as follows:

LIST RESTORE POINT restore_point_name;
LIST RESTORE POINT ALL;

RMAN indicates the SCN and time of the restore point, the type of restore point, and the name of the restore point. The following example shows sample output:

RMAN> LIST RESTORE POINT ALL;

using target database control file instead of recovery catalog
SCN              RSP Time  Type       Time      Name
—————- ——— ———- ——— —-
341859           28-JUL-06            28-JUL-06 NORMAL_RS
343690           28-JUL-06 GUARANTEED 28-JUL-06 GUARANTEED_RS

To see a list of all currently defined restore points (normal and guaranteed), use the V$RESTORE_POINT control file view, by means of the following query:

SQL> SELECT NAME, SCN, TIME, DATABASE_INCARNATION#,
GUARANTEE_FLASHBACK_DATABASE,STORAGE_SIZE
FROM V$RESTORE_POINT

Dropping Restore Points
_______________________

When you are satisfied that you do not need an existing restore point, or when you want to create a new restore point with the name of an existing restore point, you can drop the restore point, using the DROP RESTORE POINT SQL*Plus statement. For example:

SQL> DROP RESTORE POINT before_app_upgrade;
Restore point dropped.

The same statement is used to drop both normal and guaranteed restore points.

Note:
Normal restore points eventually age out of the control file, even if not explicitly dropped.
Guaranteed restore points never age out of the control file. They remain until they are explicitly dropped.

Flashing back database till Restore Point
_________________________________________

Login to RMAN and bring the database to mount mode.

FLASHBACK DATABASE TO RESTORE POINT ‘before_upgrade’;

FLASHBACK DATABASE TO SCN 202381;

Open the database.

I normally create guaranteed retore points before doing a Dataguard switchover. It helps you assure you can go back to the state before you started (in case anything goes wrong).

Flashback Table to Timestamp 11gR2

Flashback Technology provided by Oracle is one of the most useful in a production environment. It is a life saver. You can even rollback a query to a point in time without requiring traditional database restore. I will demonstrate below how to do flashback a table to a point in time. But first you have to enable flashback.

From 11gR2 onwards you don’t have to shutdown the database to enable Flashback. You can do it online. To enable flashback in 11gR2 do the below steps.

1.  alter system set db_recovery_file_dest=’+FRA’ scope=both sid=’*’;

2. alter system set db_recovery_file_dest_size=’400G’ scope=both sid=’*’;

3.  select flashback_on from v$database;

FLASHBACK_ON
——————
NO

4. alter database flashback on;

select flashback_on from v$database;

FLASHBACK_ON
——————
YES

_______________________________________________________

— For a table to be flash-backed you need to have row movement enabled on it. We have a table here called Employees in Schema HR

ALTER TABLE HR.EMPLOYEES ENABLE ROW MOVEMENT;

 

— Check current SCN and Database Time

SELECT current_scn, TO_CHAR(SYSTIMESTAMP, ‘YYYY-MM-DD HH24:MI:SS’) FROM v$database;

4784136875    2012-04-30 8:30:00

 

— Check the current total no. of records in the table

SELECT COUNT(*) FROM HR.EMPLOYEES

Count(*)

126

 

— Check the current total no. of records in the table as it existed half an hour back at 8:00 AM

SELECT COUNT(*) FROM HR.EMPLOYEES AS OF TIMESTAMP TO_TIMESTAMP(‘2012-04-30 8:00:00’, ‘YYYY-MM-DD HH24:MI:SS’)

Count(*)

127

This means there is one record was deleted. We need to flashback the table as it existsed at 8:00AM

 

FLASHBACK TABLE HR.EMPLOYEES TO TIMESTAMP TO_TIMESTAMP(‘2012-04-30 8:00:00’, ‘YYYY-MM-DD HH24:MI:SS’);

 

SELECT COUNT(*) FROM HR.EMPLOYEES

Count(*)

127

 

Now we can see that the record are displaying as 127. Which means table is back to the state before the record was deleted.

RMAN Point-In-Time Recovery

RMAN Point-In-Time Recovery Example

There are many ways to restore a database using an RMAN backup – this example assumes you are running RMAN without a Catalog and are performing a Restore & Point-In-Time Recovery of all data back to a particular date/time in the past.

If you are running in Archive log mode and recover without specifying a date/time then RMAN will apply all Archived logs it can find, ofter recovering the database right back to the time when you started the restore operation!

If you are running in Archive log mode (and you should be), point-in time is probably the most common recovery scenario.

You will need the following information:

Database SID: ________

Database SYS password: ________

The Date and Time to restore to : ________

There are 5 steps to recover the database:

1) Restore backup files from tape
2) Mount the instance
3) Restore the datafiles
4) Recover the database
5) Reset the logs

Restore backup files from tape

If you are looking to restore the database to a time of (say 09:00) you will need the most recent RMAN backup files prior to the date (say 23:00 from the previous day) plus all the archive logs from the backup time until the restore time, in this case from 23:00 until 09:00.

If any of these files have been moved (e.g. archived to tape) restore them to the default locations on the oracle database server.

Mount the instance

C:\> Set ORACLE_SID=Live
C:\> rman TARGET SYS/Password NOCATALOG

RMAN:> shutdown immediate;
RMAN:> startup mount;

Restore and recover the datafiles

RMAN> run
{
allocate channel dev1 type disk;
set until time “to_date(‘2010-30-12:00:00:00’, ‘yyyy-dd-mm:hh24:mi:ss’)”;
restore database;
recover database; }

For a large database it can take a long time to restore each tablespace – for better performance during a restore place the RMAN backup files on a separate disk to the Oracle datafiles to reduce disk contention.

Open the database and reset logs

RMAN> alter database open resetlogs;

This will update all current datafiles and online redo logs and all subsequent archived redo logs with a new RESETLOGS SCN and time stamp.

As soon as you have done a resetlogs run a full backup, this is important as should you suffer a second failure you will not be able to perform a second recovery because after resetting the logs the SCN numbers will no longer match any older backup files.

original article : http://ss64.com/ora/rman_pitr_example.html

 

Block Media Recovery using RMAN ( Oracle 10g)

ORA-01578: ORACLE data block corrupted (file # 19, block # 461509)
ORA-01110: data file 19: ‘E:\ORACLE\ORADATA\PLACID\CARA_NS.ORA’

You are getting the above error in Application. This means a block has been corrupted on the disk. In this scenario we had last nights RMAn backup. IF you do not have RMAN configured. Please make sure RMAN backups are properly configured. RMAN is one of the most powerful backup and recovery tool for ORacle database. And every DBA should be familiar with RMAN.

First we will check for the objects which has been corrupted using the below quiery. In our case it was an Index.

LOgin to SQL*PLUS as sysdba

sqlplus sys/****** as sysdba

+ Identify the object holding the corrupted block using the following query:

SQL> SELECT tablespace_name, segment_type, owner,
    segment_name
    FROM  dba_extents
    WHERE file_id = 19
    and 461509 between block_id AND block_id +  blocks – 1;

+ If the block belonged to an index segment, drop and recreate the index.

+ If the block belonged to a data segment either use Rman Blockrecover or restore and recover the complete datafile to fix the corruption.

TABLESPACE_NAME                SEGMENT_TYPE       OWNER
—————————— —————— ——————————

SEGMENT_NAME
——————————————————————————–

CAIND_TS                       INDEX              PLACID
CAOUT_L_ITEM_I

SQL>SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;

no rows selected.

So now we will logon to RMAN on the server itself( No recovery catalog is used. The information is stored in control file itself) Another silver bullet is that always tag your RMAN backups. It is very helpful in recovery.
eg : – backup incremental level 0 TAG ‘DAILY_FULL_CARA’ database filesperset 4;

rman target sys/****

RMAN> BLOCKRECOVER DATAFILE 19 BLOCK 461509;

Starting blockrecover at 16-AUG-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=59 devtype=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at 08/16/2011 09:40:54
RMAN-06026: some targets not found – aborting restore
RMAN-06023: no backup or copy of datafile 19 found to restore

RMAN is not able to find the backup. So we will use the tag name and blockrecover for the entire corrupted blocks. To find out the tag name you can use list backup summary command;

RMAN> list backup summary;

RMAN> blockrecover corruption list from tag “DAILY_FULL_CARA”;

Starting blockrecover at 16-AUG-11
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed ti

Finished blockrecover at 16-AUG-11

RMAN> exit;

/*
We need to flush the buffer_cache because if the block 461509 is
in the buffer_cache already it will not be read from the data file.
*/

 SQL> alter system flush buffer_cache;

NOw restart the application. You would have succesfully restored the corrupted block and everything should be ok.

Restore ControlFile with RMAN / Without RMAN

Restoring a controlfile without rman
_____________________________________

suppose you lost a controlfile say control01.ctl. The controlfile got deleted but you have the controlfile
control02.ctl control03.ctl

Now when you login to database, you get below error

SQL> shutdown immediate
ORA-00210: cannot open the specified control file
/u2/oracle/product/10.1.0/db/oradata/SWX/control01.ctl’
ORA-27041: unable to open file
SVR4 Error: 2: No such file or directory
Additional information: 3

The best thing to do in such scenario quickly is to copy any of the remaining controlfile and rename it to controlfile “control01.ctl”

You can do this using OS command like below:

bash-3.00$ cp -p control02.ctl control01.ctl

Now log back to database and Voila everything should work perfectly

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Restoring a controlfile from rman
_____________________________________

We are goign to assume you have controlfile autobackup on and you have latest full backup in RMAN

When you connect rman to restore control file it gives below error:

RMAN TARGET /
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-00210: cannot open the specified control file
ORA-00202: control file: ‘/u2/oracle/product/10.1.0/db/oradata/SWX/control03.ctl’
ORA-27041: unable to open file

Dont panic just login to sqlplus as sysdba and do a shutdown abort (make sure your ORACLR_SID is set properly)

Now log back to RMAN

RMAN TARGET /
connected to target database (not started)

RMAN> restore controlfile from autobackup;

Starting restore at 23-JAN-11
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/23/2011 11:46:36
RMAN-12010: automatic channel allocation initialization failed
RMAN-06403: could not obtain a fully authorized session
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory

Again it fails. The correct step is as belows:

RMAN> startup nomount

RMAN> set dbid=4217664994

executing command: SET DBID

RMAN>  restore controlfile from ‘/u3/rmanbkp/c-4217664994-20110124-00′;

Starting restore at 24-JAN-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=541 devtype=DISK

RMAN> mount database;

database mounted
released channel: ORA_DISK_1

RMAN> restore database;

Starting restore at 24-JAN-11
Starting implicit crosscheck backup at 24-JAN-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=541 devtype=DISK
.
.
.
.
.

RMAN> recover database;

Starting recover at 24-JAN-11
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3
using channel ORA_DISK_4

RMAN> alter database open resetlogs;

database opened

Several points are worth emphasizing.

1. Recovery using a backup controlfile should be done only if a current control file is unavailable.
2. All datafiles must be restored from backup. This means the database will need to be recovered using archived and online redo logs. These MUST be available for recovery until the time of failure.
3. As with any database recovery involving RESETLOGS, take a fresh backup immediately.
4. Technically the above is an example of complete recovery – since all committed transactions were recovered. However, some references consider this to be incomplete recovery because the database log sequence had to be reset.

After recovery using a backup controlfile, all temporary files associated with locally-managed tablespaces are no longer available. You can check that this is so by querying the view V$TEMPFILE – no rows will be returned. Therefore tempfiles must be added (or recreated) before the database is made available for general use. In the case at hand, the tempfile already exists so we merely add it to the temporary tablespace. This can be done using SQLPlus or any tool of your choice:

SQL> alter tablespace temp add tempfile
‘D:\oracle_data\datafiles\ORCL\TEMP01.DBF’;

Tablespace altered.

 

A Ballad For a DBA

Dedicated to Gwen Shapira

It’s a tale of woe and malice, a tale of a DBA
That the users learned to fear and to get out of his way
He would drop production tables when complaints had grown to loud
He would smile and feign surprise and then suggested “use the cloud”

He would backup every hour but would not deign to restore
“If it’s gone, just be more careful, it’s just me the backup’s for”
He would kill off all the queries that had gone on way too long
When the users would complain he would just say that they are wrong

When the database expanded
And the system was too slow
He just shrugged, “go talk to network –
Go tell them your tales of woe”

When the backups all are running and the database is gone
He won’t care about the users, just let an evil yawn
When the disks will come down crashing, if it’s way too late at night
He’ll just sets his cell to “silent”, waits until the morning light

When the database expanded
And the network was too slow
He just shrugged, “go talk to network –
Go tell them your tales of woe”

When the database expanded
I/O rates became too high
He just shrugged, “go talk to storage –
Or just go crawl off and die”

And the coders told him – “Buddy,
All your indices are wrong”
So he told them, “It’s your queries,
and your fault they take too long”

I've lost an archived/online REDO LOG file, can I get my DB back?

The following INIT.ORA/SPFILE parameter can be used if your current redologs are corrupted or blown away. It may also be handy if you do database recovery and one of the archived log files are missing and cannot be restored.

NOTE: Caution is advised when enabling this parameter as you might end-up losing your entire database. Please contact Oracle Support before using it.

_allow_resetlogs_corruption = true

This should allow you to open the database. However, after using this parameter your database will be inconsistent (some committed transactions may be lost or partially applied).

Steps:

  • Do a “SHUTDOWN NORMAL” of the database
  • Set the above parameter
  • Do a “STARTUP MOUNT” and “ALTER DATABASE OPEN RESETLOGS;”
  • If the database asks for recovery, use an UNTIL CANCEL type recovery and apply all available archive and on-line redo logs, then issue CANCEL and reissue the “ALTER DATABASE OPEN RESETLOGS;” command.
  • Wait a couple of minutes for Oracle to sort itself out
  • Do a “SHUTDOWN NORMAL”
  • Remove the above parameter!
  • Do a database “STARTUP” and check your ALERT.LOG file for errors.

Extract the data and rebuild the entire database.

Oracle Database HA Architecture

By Uwe Hesse

Link to original post: http://uhesse.wordpress.com/oracle-database-ha-architecture/

A very popular topic in many of my courses is Oracle Database Architecture regarding High Availability (HA). This page is supposed to address this topic with a high level overview, covering “Ordinary” Single Instance Databases, Data Guard, Real Application Clusters (RAC) and Extended RAC (sometime called “Stretched Cluster”). The combination of RAC and Data Guard is advertised by Oracle Corp. under the label Maximum Availability Architecture (MAA). In addition to these Oracle HA Solutions, I will briefly cover also one Third Party HA Solution: Remote Mirroring. I don’t intend to dive deep into technical details of all these solutions but instead just want to differentiate them and talk briefly about the various advantages and maybe drawbacks of each of them.

At first, we look at the still most common Oracle Database Architecture: Single Instance. An Oracle RDBMS consists always of one Database – made up by Datafiles, Online Redo Logfiles and Controlfile(s) and at least one Instance – made up by Memory Structures (like a Database Buffer Cache) and Background Processes (like a Database Writer). If we have one Database and multiple Instances accessing it – that’s a RAC. If only one Instance accesses the Database – that’s Single Instance. Small Installations have stored all the components inside one Server like this:

Also common these days is the placement of the Database on a Storage Area Network (SAN) like this:

From a HA perspective this Architecture is vulnerable: Server A and Server B are Single Point of Failures (SPOFs) as well as Database A and Database B are. Also, the Site where these Servers are placed is a SPOF. Should one of the SPOFs fail, the whole Database is unavailable. An “ordinary” RAC addresses the Server SPOFs like this:

Should one of the two Servers fail, the Database C is still available. HA is not the only reason to use RAC, of course. Amongst others, a further valid reason to use RAC is Scalability: If our requirements increase in the future, we can add another Server (Node) to the cluster. Also, we have options like Service-Management and Load Balancing with RAC. In short words: RAC is not just for HA, but it is out of the scope of this article to address the other reasons in detail. Drawback from a HA perspective of the above architecture is: The Database C resp. the Site C is a SPOF. Should i.e a fire destroy Site C, the Database C is unavailable. Therefore, we have the option to stretch the Database across two Sites, which is usually called Extended RAC:

The Sites are no longer SPOFs here. The Database D is mirrored across the two sites. Drawback of this architecture is the cost of the Network Connection between the two Sites, if long distances are desired. That is critical, because large Data Volume has to be mirrored. In effect, this leads to distances that usually do not extend a couple of kilometers – which may conflict with the goal to get Disaster Protection. Here, Data Guard kicks in: We can reach long distances for Disaster Protection with Data Guard easier, because we do not transmit the whole Data Volume but instead just the (relatively small) Redo Protocoll. In the following picture, the Servers hold  Single Instances like Server A and Server B above:

The Redo Protocol from the Primary Database is used to actualize the Standby Database. Should the Primary fail, we can failover to the Standby and continue to work productively. This failover can be done automatically by an Observer (which is called Fast-Start Failover). The distance between the two Servers can reach thousands of kilometers – depending on the kind of redo transmission and the protection level. If we combine RAC and Data Guard, we get MAA. Obviously, MAA is an expensive solution, but it also combines the advantages of RAC and Data Guard.

A popular Third Party HA Solution is Remote Mirroring. On a high level, it looks like that:

The Site is no SPOF here also, like with Extended RAC. Drawbacks of this solution: The distance is usually very limited for the same reason as with Extended RAC. Also, the Secondary Site cannot be used productively while the mirroring is in progress – contrary to the above Oracle HA solutions. With RAC all Servers resp. Sites are in use productively. Even with Data Guard, the Standby is not merely waiting for the Primary to fail. It can also be used for Read Only Access – effectively reducing the load on the Primary:

Above illustrates the 11g New Feature Real-Time Query for Physical Standby Databases. The Standby is accessed Read Only while the actualization continues. Additionally, it is possible to offload Backups to the Physical Standby (also before 11g).

Backup Controlfile to Trace

How to backup the Oracle Control File?

There are two approaches: you either generate a binary image of the Control File, or you generate a text file script which will re-generate a Control File when run as a SQL script.

To create the binary image, issue the command ALTER DATABASE BACKUP CONTROLFILE TO ‘C:\SOMEWHERE\CONTROL01.BKP’; (obviously pick a destination and file name that are suitable for your needs).

To create the text script version, issue the command ALTER DATABASE BACKUP CONTROLFILE TO TRACE. That causes a file to be written to wherever your init.ora parameter USER_DUMP_DEST is pointing. The file name will be something like ‘ora_<some numbers>.trc’. You’ll have to track it down using the O/S date and timestamp (or you can take advantage of the fact that the “some numbers” bit is your Server Process ID Number -which you can determine from v$session and v$process).
The binary image is ready to work from the word go (with one hugely important proviso, of which more below).

The trace file is, however, a bit of a mess, and needs to be knocked into shape if it is to be of any use in a recovery situation. Firstly, it contains a whole heap of junk at the top (referencing your O/S, your Oracle version, dates and times and process or thread numbers). That’s got to go -which means deleting all of it, until the first line reads STARTUP NOMOUNT. Immediately before that line, you need a connect string, since you can’t ever issue the ‘startup’ command until you’ve connected as a Privileged User. Insert something appropriate, therefore (such as CONNECT / AS SYSDBA if using O/S authentication, or CONNECT SYS/ORACLE AS SYSDBA if using Password File Authentication). You may possibly also need to qualify the ‘STARTUP NOMOUNT’ line itself so that it references an appropriate init.ora (for example, STARTUP NOMOUNT PFILE=/SOMEWHERE/NON-DEFAULT/INIT.ORA). Other than that, the file is fine, and useable.

Under no circumstances should you ever need a backup of your Control File, of either type. You’d only use one if ALL copies of the Control File had become corrupt or had been destroyed (and the whole purpose of multiplexing your Control Files is to minimise those possibilities). But if that fateful day ever arrives, then you simply need to fire up Server Manager (or SQL Plus if using 8i or above), and issuing the command to run the trace file script (i.e., typing @name_of_script.ext). There’s no need to connect first -remember, that was the first line we added to the script earlier. If for some reason the Instance is already running, the script will fail -it needs a completely shut down system before it can do its stuff.

The trouble starts when you attempt to restore the binary version of the Control File backup. Because it was an exact, binary copy of a Control File, its SCN number will not agree with the SCN in the headers of all the data files -basically, the Master Clock is out of whack. You therefore have to issue the command “RECOVER DATABASE USING BACKUP CONTROLFILE;” to tell the system not to pay too much attention to the SCN of the Control File. Unfortunately, after you issue that command (and following any recovery that it might cause to take place), you must open the database with the command “ALTER DATABASE OPEN RESETLOGS;”.

That’s unfortunate, because the ‘resetlogs’ command basically forces the entire database to re-synchronise at time zero, which is fine for getting the database open, but is not exactly useful if you ever need to restore from your prior backups (taken when the database was at a time of, say, 10894329), or if you ever expect to be able to apply redo from priot archive logs (which were also taken when the database was at time 10894329 and earlier). Basically, a ‘resetlogs’ renders your database entirely vulnerable: there are no effective backups, and no effective archives. You are therefore supposed to immediately shut the newly-recovered database down, and perform a whole, closed backup of the entire database (which is not exactly a cheap option).

You might wonder why the use of the trace file does not cause this HUGE problem.  The answer is simple: it cheats. Contained within the trace file are locations of every data file in the system. When the file is run, it uses those locations to read the headers of all the data files, whereupon it picks the highest SCN it comes across as the one to write into the header of the Control File it is about to create. That means the re-constructed Control File is already in synchronisation with the data files, and no forced synchronisation to time Zero is therefore required.

So, what’s the best way of backing up the Control File? Answer: multiplex your Control Files so that a recovery is never needed in the first place. But if, despite that, you need some insurance, the Trace File is definitely the better way to go.  It doesn’t render your entire database exposed to failure, it doesn’t effectively trash all prior backups and archives, it works quickly, and well.

There’s only one proviso to this whole discussion: whatever backup method you choose, you need to keep it up-to-date.  Since the Control File contains pointers to the physical location of all data files and redo logs, any backup of that file needs to make sure that those pointers are accurate. Making a physical change to your database (for example, adding a new data file or tablespace, dropping a tablespace, moving or renaming a data file) will instantly render all prior Control File backups out-of-date.  Slightly unnervingly, changing a tablespace from read-write to read-only (or vice versa) also counts as a physical change to your database (because the Control File must always accurately identify any read-only data files).  After any of those operations, therefore, you need to take a fresh backup of the Control File.

It is always conceivable that you could edit a Trace File backup before using it to take account of any physical changes, but the syntax is not easy, and I don’t rate your chances of pulling it off. As for editing the binary copy -forget it!  Net result: keep taking the backups on a regular basis.  I usually recommend a chron job (for our NT friends, that’s an AT job!) which issues the ‘backup to trace’ command every night. It means you need a bit of house-keeping to avoid complete mayhem (and a million trace files) in the user_dump_dest, but it will guarantee a file which, at worst, can be used with the mere addition of a line or two to reference any data files created between the time the trace file was created and the time all Control Files went awol.