By Uwe Hesse
Link to original post: http://uhesse.wordpress.com/oracle-database-ha-architecture/
A very popular topic in many of my courses is Oracle Database Architecture regarding High Availability (HA). This page is supposed to address this topic with a high level overview, covering “Ordinary” Single Instance Databases, Data Guard, Real Application Clusters (RAC) and Extended RAC (sometime called “Stretched Cluster”). The combination of RAC and Data Guard is advertised by Oracle Corp. under the label Maximum Availability Architecture (MAA). In addition to these Oracle HA Solutions, I will briefly cover also one Third Party HA Solution: Remote Mirroring. I don’t intend to dive deep into technical details of all these solutions but instead just want to differentiate them and talk briefly about the various advantages and maybe drawbacks of each of them.
At first, we look at the still most common Oracle Database Architecture: Single Instance. An Oracle RDBMS consists always of one Database – made up by Datafiles, Online Redo Logfiles and Controlfile(s) and at least one Instance – made up by Memory Structures (like a Database Buffer Cache) and Background Processes (like a Database Writer). If we have one Database and multiple Instances accessing it – that’s a RAC. If only one Instance accesses the Database – that’s Single Instance. Small Installations have stored all the components inside one Server like this:
Also common these days is the placement of the Database on a Storage Area Network (SAN) like this:
From a HA perspective this Architecture is vulnerable: Server A and Server B are Single Point of Failures (SPOFs) as well as Database A and Database B are. Also, the Site where these Servers are placed is a SPOF. Should one of the SPOFs fail, the whole Database is unavailable. An “ordinary” RAC addresses the Server SPOFs like this:
Should one of the two Servers fail, the Database C is still available. HA is not the only reason to use RAC, of course. Amongst others, a further valid reason to use RAC is Scalability: If our requirements increase in the future, we can add another Server (Node) to the cluster. Also, we have options like Service-Management and Load Balancing with RAC. In short words: RAC is not just for HA, but it is out of the scope of this article to address the other reasons in detail. Drawback from a HA perspective of the above architecture is: The Database C resp. the Site C is a SPOF. Should i.e a fire destroy Site C, the Database C is unavailable. Therefore, we have the option to stretch the Database across two Sites, which is usually called Extended RAC:
The Sites are no longer SPOFs here. The Database D is mirrored across the two sites. Drawback of this architecture is the cost of the Network Connection between the two Sites, if long distances are desired. That is critical, because large Data Volume has to be mirrored. In effect, this leads to distances that usually do not extend a couple of kilometers – which may conflict with the goal to get Disaster Protection. Here, Data Guard kicks in: We can reach long distances for Disaster Protection with Data Guard easier, because we do not transmit the whole Data Volume but instead just the (relatively small) Redo Protocoll. In the following picture, the Servers hold Single Instances like Server A and Server B above:
The Redo Protocol from the Primary Database is used to actualize the Standby Database. Should the Primary fail, we can failover to the Standby and continue to work productively. This failover can be done automatically by an Observer (which is called Fast-Start Failover). The distance between the two Servers can reach thousands of kilometers – depending on the kind of redo transmission and the protection level. If we combine RAC and Data Guard, we get MAA. Obviously, MAA is an expensive solution, but it also combines the advantages of RAC and Data Guard.
A popular Third Party HA Solution is Remote Mirroring. On a high level, it looks like that:
The Site is no SPOF here also, like with Extended RAC. Drawbacks of this solution: The distance is usually very limited for the same reason as with Extended RAC. Also, the Secondary Site cannot be used productively while the mirroring is in progress – contrary to the above Oracle HA solutions. With RAC all Servers resp. Sites are in use productively. Even with Data Guard, the Standby is not merely waiting for the Primary to fail. It can also be used for Read Only Access – effectively reducing the load on the Primary:
Above illustrates the 11g New Feature Real-Time Query for Physical Standby Databases. The Standby is accessed Read Only while the actualization continues. Additionally, it is possible to offload Backups to the Physical Standby (also before 11g).
At first, we look at the still most common Oracle Database Architecture: Single Instance. An Oracle RDBMS consists always of one Database – made up by Datafiles, Online Redo Logfiles and Controlfile(s) and at least one Instance – made up by Memory Structures (like a Database Buffer Cache) and Background Processes (like a Database Writer). If we have one Database and multiple Instances accessing it – that’s a RAC. If only one Instance accesses the Database – that’s Single Instance. Small Installations have stored all the components inside one Server like this:
Also common these days is the placement of the Database on a Storage Area Network (SAN) like this:
From a HA perspective this Architecture is vulnerable: Server A and Server B are Single Point of Failures (SPOFs) as well as Database A and Database B are. Also, the Site where these Servers are placed is a SPOF. Should one of the SPOFs fail, the whole Database is unavailable. An “ordinary” RAC addresses the Server SPOFs like this:
Should one of the two Servers fail, the Database C is still available. HA is not the only reason to use RAC, of course. Amongst others, a further valid reason to use RAC is Scalability: If our requirements increase in the future, we can add another Server (Node) to the cluster. Also, we have options like Service-Management and Load Balancing with RAC. In short words: RAC is not just for HA, but it is out of the scope of this article to address the other reasons in detail. Drawback from a HA perspective of the above architecture is: The Database C resp. the Site C is a SPOF. Should i.e a fire destroy Site C, the Database C is unavailable. Therefore, we have the option to stretch the Database across two Sites, which is usually called Extended RAC:
The Sites are no longer SPOFs here. The Database D is mirrored across the two sites. Drawback of this architecture is the cost of the Network Connection between the two Sites, if long distances are desired. That is critical, because large Data Volume has to be mirrored. In effect, this leads to distances that usually do not extend a couple of kilometers – which may conflict with the goal to get Disaster Protection. Here, Data Guard kicks in: We can reach long distances for Disaster Protection with Data Guard easier, because we do not transmit the whole Data Volume but instead just the (relatively small) Redo Protocoll. In the following picture, the Servers hold Single Instances like Server A and Server B above:
The Redo Protocol from the Primary Database is used to actualize the Standby Database. Should the Primary fail, we can failover to the Standby and continue to work productively. This failover can be done automatically by an Observer (which is called Fast-Start Failover). The distance between the two Servers can reach thousands of kilometers – depending on the kind of redo transmission and the protection level. If we combine RAC and Data Guard, we get MAA. Obviously, MAA is an expensive solution, but it also combines the advantages of RAC and Data Guard.
A popular Third Party HA Solution is Remote Mirroring. On a high level, it looks like that:
The Site is no SPOF here also, like with Extended RAC. Drawbacks of this solution: The distance is usually very limited for the same reason as with Extended RAC. Also, the Secondary Site cannot be used productively while the mirroring is in progress – contrary to the above Oracle HA solutions. With RAC all Servers resp. Sites are in use productively. Even with Data Guard, the Standby is not merely waiting for the Primary to fail. It can also be used for Read Only Access – effectively reducing the load on the Primary:
Above illustrates the 11g New Feature Real-Time Query for Physical Standby Databases. The Standby is accessed Read Only while the actualization continues. Additionally, it is possible to offload Backups to the Physical Standby (also before 11g).