VIP fails over to other node : Oracle RAC 11gR2

IF crsctl stat res -t gives output like below, means that  the VIP failed over to node 1 from node 2

ora.swx.swxscan.svc
1        ONLINE  ONLINE       swxracnode1
2        ONLINE  OFFLINE
ora.swxracnode1.vip
1        ONLINE  ONLINE       swxracnode1
ora.swxracnode2.vip
1        ONLINE  INTERMEDIATE swxracnode1              FAILED OVER

Logon to Node2 and start the listener with grid user and start the listener

$ lsnrctl start

Transparent Application Failover (TAF) Service in Oracle RAC 11gR2

ADDING TAF SERVICE FOR SCAN
_____________________________

####### As “Oracle” (owner of database) user

swx – name of database
swx1- instance on node1
swx2- instance on node2
swxscan- name for the new taf service

srvctl add service -d swx -s swxscan -r “swx1,swx2″ -P BASIC

srvctl start service -d swx -s swxscan

srvctl config service -d swx

SQL> select name,service_id from dba_services where name = ‘swxscan’;

SQL> select name, failover_method, failover_type, failover_retries,goal,
clb_goal,aq_ha_notifications  from dba_services where service_id = 3;

SQL> execute dbms_service.modify_service (service_name => ‘swxscan’ -
, aq_ha_notifications => true -
, failover_method => dbms_service.failover_method_basic -
, failover_type => dbms_service.failover_type_select -
, failover_retries => 180 -
, failover_delay => 5 -
, clb_goal => dbms_service.clb_goal_long);

lsnrctl status
lsnrctl status LISTENER_SCAN1
lsnrctl status LISTENER_SCAN2
lsnrctl status LISTENER_SCAN3

————————————————————

TESTING THE TAF USING SCAN

In your client tnsnames.ora file add below entry after creating TAF service above :

SWXSCAN =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = swxscan.qiibonline.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = swxscan)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic))))

sqlplus system/system@swxrac

You can verify the client connections to 11gR2 database for TAF using -

SQL> SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION
GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;

MACHINE    AILOVER_TYPE FAILOVER_M FAI COUNT(*)
——— ————- ——— —– —
mymachine    SELECT        BASIC    NO   1

SQL> select count(*) from table;

# You can continously run this query and mean while go to the cluter node and down the service on node2 and the instance also on node 2

LOGON TO CLUSTER NODE1 as “grid” user:

srvctl stop service -d swx -n swxracnode2

srvctl stop instance -d swx -n swxracnode2

crsctl stat res -t

And when you go back to your client connection. You can see that the query is still executing without the connection being lost.

SQL> SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION
GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;

Don;t forget to start the service again :) )

srvctl start instance -d swx -n swxracnode2

srvctl start service -d swx -n swxracnode2

_______________________________________________

###########SRVCTL COMMAND TO ADD SERVICE FOR TAF

srvctl add service -d swx -s swxscan -r “swx1,swx2″ -P BASIC -m BASIC -z 180 -w 5 -j LONG -q TRUE -e SESSION

-d database unique name

-s name of the service to be created

-r preferred instances where the service would run

-P Basic or Preconnect method of connection, PRECONNECT establishes a backup connection on another ndoe, whereas BASIC does a session failover

-z failover retries

-w failover delay

-j session type. either a long session or short session, LONG or SHORT

-q Send Oracle Advanced Queuing (AQ) HA notifications. For standalone servers, applicable in Oracle Data Guard environments only

-e Session Select or None. Use sess

### Command to check serrvice status ####

srvctl config service -d swx -s swxscan -a

Warning:-a option has been deprecated and will be ignored.
Service name: swxscan
Service is enabled
Server pool: swx_swxscan
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Failover type: SESSION
Failover method: NONE
TAF failover retries: 180
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: swx1,swx2
Available instances:

Setting up user profiles for Grid and Oracle user : Oracle RAC 11gR2 on Solaris 10

Below is the Solaris user profiles for “grid” and “oracle” user. If you have followed Oracles official documentation for installation ORacle Rac 11gR2. It is best practice to install Oracle clusterware under grid user and oracle database with Oracle user. It takes a little time to get used to this setup specially for RAC users coming from 10g. But once you get a hang of it. It will work like a charm.

FYI the below user profiles have been set on Solaris 10\

___________________________________________

GRID USER PROFILE
___________________________________________

grid:x:100:100::/export/home/grid:/usr/bin/bash
oracle:x:101:100::/export/home/oracle:/usr/bin/bash

cd /export/home/grid

root@swxracnode1 # more .bash_profile

## If doesnt exist create it and give permission as 775 and owner as grid:oinstall

ORACLE_HOME=/u02/11.2.0/grid
GRID_HOME=/u02/11.2.0/grid
ORACLE_SID=+ASM1
export ORACLE_HOME ORACLE_SID GRID_HOME
PATH=/u02/11.2.0/grid/bin:$PATH

. .bash_profile

___________________________________________

ORACLE USER PROFILE
___________________________________________

grid:x:100:100::/export/home/grid:/usr/bin/bash
oracle:x:101:100::/export/home/oracle:/usr/bin/bash

cd /export/home/oracle

root@swxracnode1 # more .bash_profile

## If doesnt exist create it and give permission as 775 and owner as oracle:oinstall

ORACLE_HOME=’/u01/app/oracle/product/11.2.0/db_1′
ORACLE_SID=’swx1′
export ORACLE_HOME ORACLE_SID
PATH=$PATH:$ORACLE_HOME/bin
export PATH
ORACLE_UNQNAME=’swx’
export ORACLE_UNQNAME

Change SCAN ip’s Oracle RAC11gR2

SCAN (Single Client Access Name) is a new concept introduced in since 11gR2. It eliminated to have VIP )Virutal ip) of nodes in the tnsnames.ora file. Prior to release 11gR2 the TNSNAMES.ora file required to have all nodes entered in it for load balancing and failover. This created unnecessarily changes everytime a node was added or deleted. With 11gR2 Oracle created a listener runnin on top of the node listeners. This listener needs 3 ips to be configured on the DNS. And it creates a virtual hostname which can be used as a signle entry in the tnsnames. Its a very useful feature. But sometimes if you change the scan ips in your organization. The SCAN ip’s first need to be changed on your DNS by your network administrator, then the clusterware needs to be told of the changes. Heres how you do it.

Steps:

1. Check the current status SCN IP address in the DNS server.

$nslookup <scan-name>

$nslookup testrac-scan.abc.com

Server:         160.34.11.20
Address:        160.34.11.21#40

Name:   testrac-scan.abc.com
Address: 160.34.11.88
Name:   testrac-scan.abc.com
Address: 160.34.11.89
Name:   testrac-scan.abc.com
Address: 160.34.11.90

2. Check the current status SCAN-VIP in the resource file

# $GRID_HOME/bin/srvctl config scan
SCAN name: testrac-scan, Network: 1/10.101.10.0/255.255.255.0/eth4
SCAN VIP name: scan1, IP: /testrac-scan.abc.com/160.34.11.88
SCAN VIP name: scan2, IP: /testrac-scan.abc.com/160.34.11.89
SCAN VIP name: scan3, IP: /testrac-scan.abc.com/160.34.11.90

3. Request network admin to update NEW SCAN IP address in the DNS server.

Example:

Old SCAN IP:

160.34.11.88
160.34.11.89
160.34.11.90

new SCAN IP:

170.35.12.60
170.35.12.61
170.35.12.62

$nslookup testrac-scan.abc.com

Server:         170.35.12.20

Address:        170.35.12.21#40

Name:   testrac-scan.abc.com
Address: 170.35.12.60
Name:   testrac-scan.abc.com
Address: 170.35.12.61
Name:   testrac-scan.abc.com
Address: 170.35.12.62

4.CRS to update the SCAN VIP resources:

Note: current SCAN-VIP resource should be stopped before modifying the CRS resoruce file.

# $GRID_HOME/bin/srvctl stop scan_listener
# $GRID_HOME/bin/srvctl stop scan
# $GRID_HOME/bin/srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running
SCAN VIP scan2 is enabled
SCAN VIP scan2 is not running
SCAN VIP scan3 is enabled
SCAN VIP scan3 is not running
# $GRID_HOME/bin/srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is not running
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is not running

# $GRID_HOME/bin/srvctl modify scan -n testrac-scan.abc.com

5. To verify that the change was successful

# $GRID_HOME/bin/srvctl config scan
SCAN name: sales-scan, Network: 1/1/10.101.10.0/255.255.255.0/eth4
SCAN VIP name: scan1, IP: /testrac-scan.abc.com/170.35.12.60
SCAN VIP name: scan2, IP: /testrac-scan.abc.com/170.35.12.61
SCAN VIP name: scan3, IP: /testrac-scan.abc.com/170.35.12.62

6. Start SCAN and the SCAN listener

# $GRID_HOME/bin/srvctl start scan
# $GRID_HOME/bin/srvctl start scan_listener

Installing 11gR2 Real Application Clusters on Oracle Enterprise Linux 4 x86-64 (64 bit)

Installing Oracle Grid Infrastrcture

This document provides a step by step guide to, installing Oracle 11gR2 Real Application clusters on Oracle Enterprise Linux 4, x86-64 (64 Bit).
Keep in mind, this is a test installation and hence does not follow some of the best practices for installing oracle real application clusters (eg: use a separate o/s user for oracle grid infrastructure, have at least 3 voting disks etc…).

Important Documents

Grid infrastructure installation guide for Linux
Real Application Clusters Installation guide for Linux and Unix
Clusterware Administration and Deployment Guide
Real Application Clusters Administration and Deployment Guide

Some new concepts in 11gR2 Rac


Oracle clusterware and ASM now are installed into the Same Oracle Home, and is now called the grid infrastructure install.

Raw devices are no longer supported for use for anything (Read oracle cluster registry, voting disk, asm disks), for new installs.

OCR and Voting disk can now be stored in ASM, or a certified cluster file system.

The redundancy level of your ASM diskgroup (That you choose to place voting disk on) determines the number of voting disks you can have.
You can place

  • Only One voting disk on an ASM diskgroup configured as external redundancy
  • Only Three voting disks on an ASM diskgroup configured as normal redundancy
  • Only Five voting disks on an ASM diskgroup configured as high redundancy


The contents of the voting disks are automatically backed up into the OCR

ACFS (Asm cluster file system) is only supported on Oracle Enterprise Linux 5 (And RHEL5), not on OEL4.
There is a new service called cluster time synchronization service that can keep the clocks on all the servers in the cluster synchronized (In case you dont have network time protocol (ntp) configured)

Single Client Access Name (SCAN), is a hostname in the DNS server that will resolve to 3 (or at least one) ip addresses in your public network. This hostname is to be used by client applications to connect to the database (As opposed to the vip hostnames you were using in 10g and 11gr1). SCAN provides location independence to the client connections connecting to the database. SCAN makes node additions and removals transparent to the client application (meaning you dont have to edit your tnsnames.ora entries every time you add or remove a node from the cluster).

Oracle Grid Naming Service (GNS), provides a mechanism to make the allocation and removal of VIP addresses a dynamic process (Using dynamic Ip addresses).

Intelligent Platform Management Interface (IPMI) integration, provides a new mechanism to fence server’s in the cluster, when the server is not responding.

The installer can now check the O/S requirements, report on the requirements that are not met, and give you fixup scripts to fix some of them (like setting kernel parameters).

The installer can also help you setup SSH between the cluster nodes.

There is anew deinstall utility that cleans up a existing or failed install.

There is a new Instantaneous problem detection OS tool. This tool is the nextgen oswatcher.

And the list goes on and on.

Listed below are some of the top hardware and software requirements for the install. Please refer to the grid infrastructure install guide for all the pre-requisites.

Hardware requirements

Please refer to the install guide for all the pre-requisites.

One or more servers
Shared Disk storage (SAN, NAS)
GigE or higher network switch (For the private interconnect)
Atleast One GigE or higher Network interface card for the private interconnect
Atleast One Network interface card for the public interconnect
IP address requirements

  • One SCAN name (That resolves to 3 ip addresses) for the cluster
  • For each Node
    • 1 public IP address
    • 1 private IP address
    • 1 VIP IP address
  • The SCAN, Public IP addresses and the VIP should be in the same subnet.


Operating System requirements

Amoung other requirements (Please see the install documents for all the requirements)

RHEL4 (or OEL4), update 7 or higher
RHEL5 (or OEL5), update 2 or higher

Software requirements list for x86-64 linux platforms
Kernel parameters

If you are part of the Redhat Linux Network, or Oracle Unbreakable linux network, then you can get the oracle-validated rpm. This rpm sets up all the (or most of it in my case) required rpm’s and kernel parameters for you. You can use up2date to install this package. This really simplifies the setup steps that you need to perform.

Setup SSH between the cluster nodes.

Setup ntp deamon for clock synchronization on all the nodes. Remember to use the -x switch, or else cluvfy will always call this out and declare failure.

Configure ASMLIB

  • Install the asm packages oracleasm-support oracleasmlib oracleasm
  • Run oracleasm configure on each node to configure asmlib
  • Run oracleasm createdisk to create the needed ASM disks from one node
  • Run oracleasm scandisks to mount the disks on all the other nodes.

Download Software for Installation

Download Oracle Grid Infrastructure Software
Download Oracle Database  Software – Part 1
Download Oracle Database  Software – Part 2

Grid Infrastructure Installation

Unzip the grid infrastructure software into a staging directory on one of the server’s.
login as the oracle user.
cd <software stage home>/grid
./runInstaller


In the screen above, choose to “Install and configure grid infrastructure for a cluster”, click Next.


Choose a “Typical” installation, Click Next.

The above screen will be displayed with a default scan name, and just the details regarding the node from which you are running the installer.
Click on the Add button and add the hostname and virtual ip name for the rest of the nodes in the cluster.
The SCAN Name has to be a valid hostname that is setup in DNS which resolves to 3 ip addresses (atleast 1) in a round robin fashion.
Replace the default shown in the screen above with a valid SCAN you will be using.

If you have not configured ssh connectivity, you can click on “SSH connectivity” and configure ssh.

You can click on “Identify network interfaces” to check and/or change, the interfaces being used for the public and private interfaces.


Click Ok to return to the Cluster Configuration screen.

Click Next

In the screen above, choose the location for Oracle Base, the Home directory for the Grid infrastructure software, the passwords for the ASM instance and the group to be used as the OSASM group. Click Next to continue.


In the screen above, choose the disk group name to use for the first ASM diskgroup created, and which disks to use. The ASM instance needs to be created so that the voting disk and cluster registry can be placed on ASM. In my case i had configured the disks using ASMLIB before i started the grid infrastructure install. In my case the disks are created as Raid 10 on the array, so i chose external redundancy. This means that there will be only one voting disk created. Click Next to continue.

In the “Prerequisite checks” screen above, the installer runs through and checks all the pre-requisites (kernel parameters, ssh, ntp, packages). It displays the pre-requisites that are not being met. Some of these it is able to help us fix (like kernel parameters). For the one’s it can fix it will give you a script to run from /tmp as root.

In my case although i had used the oracle-validated package, there were some packages missing (i386 versions of lib-aio, unixODBC and unixODBC-devel), and some kernel parameters to be fixed.


I fixed them and ran the Check again, which resulted in the above two remaining. Since I know that i have enough swap space and I had not used the -x flag for the ntp setup, i choose to Ignore All and continue.


In the above screen, click finish to continue with the install.


Finally it will prompt you with the screen above to run the scripts as root.
Make sure that you run them one at a time, one node at a time.
Once the scripts are run to completion successfully, click on the Ok button.

In my case the cluster verification utility failed, because of not using the -x switch in ntp. I choose to skip it. I’ll fix it later.


In the screen above, click Close to exit the installer.

Your Oracle 11gR2 grid infrastructure installation is now complete.

Screen output from root.sh

First Node





Second Node


Oracle 11g RAC Administration Commands

11g RAC Administration and Maintenance Tasks and Utilities:

Task List:

Checking CRS Status
Viewing Name Of the Cluster
Viewing Nodes Configuration
Checking Votedisk Information
Checking OCR Disk information
Timeout Settings in Cluster
ADD/Remove OCR files
ADD/Remove Votedisk
Backing Up OCR
Backing Up Votedisk
Restoring OCR Devices
Restoring Voting Disk Devices
Changing Public IPs as well as Virtual IPs

Checking CRS Status:

The below two commands are generally used to check the status of CRS. The first command lists the status of CRS
on the local node where as the other command shows the CRS status across all the nodes in Cluster.

crsctl check crs <<– for the local node
crsctl check cluster <<– for remote nodes in the cluster

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Checking Viability of CSS across nodes:

crsctl check cluster

For this command to run, CSS needs to be running on the local node. The “ONLINE” status for remote node says that CSS is running on that node.
When CSS is down on the remote node, the status of “OFFLINE” is displayed for that node.

[root@node1-pub ~]# crsctl check cluster
node1-pub ONLINE
node2-pub ONLINE

Viewing Cluster name:

I use below command to get the name of Cluster. You can also dump the ocr and view the name from the dump file.

ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk ‘{print $3}’

[root@node1-pub ~]# ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk ‘{print $3}’
test-crs
[root@node1-pub ~]#

OR

ocrconfig -export /tmp/ocr_exp.dat -s online
for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done

[root@node1-pub ~]# ocrconfig -export /tmp/ocr_exp.dat -s online
[root@node1-pub ~]# for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done
test-crs
[root@node1-pub ~]#

OR

Oracle creates a directory with the same name as Cluster under the $ORA_CRS_HOME/cdata. you can get the cluster name from this directory as well.

[root@node1-pub ~]# ls /u01/app/crs/cdata
localhost test-crs

Viewing No. Of Nodes configured in Cluster:

The below command can be used to find out the number of nodes registered into the cluster.
It also displays the node’s Public name, Private name and Virtual name along with their numbers.

olsnodes -n -p -i

[root@node1-pub ~]# olsnodes -n -p -i
node1-pub 1 node1-prv node1-vip
node2-pub 2 node2-prv node2-vip

Viewing Votedisk Information:

The below command is used to view the no. of Votedisks configured in the Cluster.

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Viewing OCR Disk Information:

The below command is used to view the no. of OCR files configured in the Cluster. It also displays the version of OCR
as well as storage space information. You can only have 2 OCR files at max.

ocrcheck

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3848
Available space (kbytes) : 258272
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

Various Timeout Settings in Cluster:

Disktimeout:
Disk Latencies in seconds from node-to-Votedisk. Default Value is 200. (Disk IO)
Misscount:
Network Latencies in second from node-to-node (Interconnect). Default Value is 60 Sec (Linux) and 30 Sec in Unix platform. (Network IO)
Misscount Disktimeout) OR (Network IO time > Misscount)
THEN
REBOOT NODE
ELSE
DO NOT REBOOT
END IF;

crsctl get css disktimeout
crsctl get css misscount
crsctl get css reboottime

[root@node1-pub ~]# crsctl get css disktimeout
200

[root@node1-pub ~]# crsctl get css misscount
Configuration parameter misscount is not defined. <<<<< This message indicates that the Misscount is not set maually and it is set to it’s
Default Value On Linux, it is default to 60 Second. If you want to chang it, you can do that as below. (Not recommended)

[root@node1-pub ~]# crsctl set css misscount 100
Configuration parameter misscount is now set to 100.
[root@node1-pub ~]# crsctl get css misscount
100

The below command sets the value of misscount back to its Default values:

crsctl unset css misscount

[root@node1-pub ~]# crsctl unset css misscount

[root@node1-pub ~]# crsctl get css reboottime
3

Add/Remove OCR file in Cluster:

Removing OCR File

(1) Get the Existing OCR file information by running ocrcheck utility.

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0 <– OCR
Device/File integrity check succeeded
Device/File Name : /u02/ocfs2/ocr/OCRfile_1 <– OCR Mirror
Device/File integrity check succeeded

Cluster registry integrity check succeeded

(2) The First command removes the OCR mirror (/u02/ocfs2/ocr/OCRfile_1). If you want to remove the OCR
file (/u02/ocfs2/ocr/OCRfile_1) run the next command.

ocrconfig -replace ocrmirror
ocrconfig -replace ocr

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0 <<– OCR File
Device/File integrity check succeeded

Device/File not configured <– OCR Mirror not existed any more

Cluster registry integrity check succeeded

Adding OCR

You need to add OCR or OCR Mirror file in a case where you want to move the existing OCR file location to the different Devices.
The below command add ths OCR mirror file if OCR file alread exists.

(1) Get the Current status of OCR:

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0 <<– OCR File
Device/File integrity check succeeded

Device/File not configured <– OCR Mirror does not exist

Cluster registry integrity check succeeded

As You can see, I only have one OCR file but not the second file which is OCR Mirror.
So, I can add second OCR (OCR Mirror) as below command.

ocrconfig -replace ocrmirror

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

You can have at most 2 OCR devices (OCR itself and its single Mirror) in a cluster. Adding extra Mirror gives you below error message

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2
PROT-21: Invalid parameter
[root@node1-pub ~]#

Add/Remove Votedisk file in Cluster:

Adding Votedisk:

Get the existing Vote Disks associated into the cluster. To be safe, Bring crs cluster stack down on all the nodes
but one on which you are going to add votedisk from.

(1) Stop CRS on all the nodes in cluster but one.

[root@node2-pub ~]# crsctl stop crs

(2) Get the list of Existing Vote Disks

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).

(3) Backup the VoteDisk file

Backup the existing votedisks as below as oracle:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@node1-pub ~]# su – oracle
[oracle@node1-pub ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@node1-pub ~]$

(4) Add an Extra Votedisk into the Cluster:

If it is a OCFS, then touch the file as oracle. On raw devices, initialize the raw devices using “dd” command

touch /u02/ocfs2/vote/VDFile_3 <<– as oracle
crsctl add css votedisk /u02/ocfs2/vote/VDFile_3 <<– as oracle
crsctl query css votedisks

[root@node1-pub ~]# su – oracle
[oracle@node1-pub ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@node1-pub ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

(5) Confirm that the file has been added successfully:

[root@node1-pub ~]# ls -l /u02/ocfs2/vote/VDFile_3
-rw-r—– 1 oracle oinstall 21004288 Oct 6 16:31 /u02/ocfs2/vote/VDFile_3
[root@node1-pub ~]# crsctl query css votedisks
Unknown parameter: votedisks
[root@node1-pub ~]# crsctl query css votedisk
0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
3. 0 /u02/ocfs2/vote/VDFile_3
Located 4 voting disk(s).
[root@node1-pub ~]#

Removing Votedisk:

Removing Votedisk from the cluster is very simple. Tthe below command removes the given votedisk from cluster configuration.

crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3

[root@node1-pub ~]# crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3
Successful deletion of voting disk /u02/ocfs2/vote/VDFile_3.
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl query css votedisk
0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

Backing Up OCR

Oracle performs physical backup of OCR devices every 4 hours under the default backup direcory $ORA_CRS_HOME/cdata/
and then it rolls that forward to Daily, weekly and monthly backup. You can get the backup information by executing below command.

ocrconfig -showbackup

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub 2007/09/03 17:46:47 /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub 2007/09/03 13:46:45 /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub 2007/09/03 09:46:44 /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub 2007/09/03 01:46:39 /u01/app/crs/cdata/test-crs/day.ocr

node2-pub 2007/09/03 01:46:39 /u01/app/crs/cdata/test-crs/week.ocr
[root@node1-pub ~]#

Manually backing up the OCR

ocrconfig -manualbackup <<–Physical Backup of OCR

The above command backs up OCR under the default Backup directory. You can export the contents of the OCR using below command (Logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <<– Logical Backup of OCR

Restoring OCR

The below command is used to restore the OCR from the physical backup. Shutdown CRS on all nodes.

ocrconfig -restore

Locate the avialable Backups

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub 2007/09/03 17:46:47 /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub 2007/09/03 13:46:45 /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub 2007/09/03 09:46:44 /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub 2007/09/03 01:46:39 /u01/app/crs/cdata/test-crs/day.ocr

node2-pub 2007/09/03 01:46:39 /u01/app/crs/cdata/test-crs/week.ocr

node1-pub 2007/10/07 13:50:41 /u01/app/crs/cdata/test-crs/backup_20071007_135041.ocr

Perform Restore from previous Backup

[root@node2-pub ~]# ocrconfig -restore /u01/app/crs/cdata/test-crs/week.ocr

The above command restore the OCR from week old backup.
If you have logical backup of OCR (taken using export option), then You can import it with the below command.

ocrconfig -import /tmp/ocr_exp.dat

Restoring Votedisks

Shutdown CRS on all the nodes in Cluster.
Locate the current location of the Votedisks
Restore each of the votedisks using “dd” command from the previous good backup of Votedisk taken using the same “dd” command.
Start CRS on all the nodes.

crsctl stop crs
crsctl query css votedisk
dd if= of= <<– do this for all the votedisks
crsctl start crs

Changing Public and Virtual IP Address:

Current Config Changed to

Node 1:

Public IP: 216.160.37.154 192.168.10.11
VIP: 216.160.37.153 192.168.10.111
subnet: 216.160.37.159 192.168.10.0
Netmask: 255.255.255.248 255.255.255.0
Interface used: eth0 eth0
Hostname: node1-pub.test.net node1-pub.test.net

Node 2:

Public IP: 216.160.37.156 192.168.10.22
VIP: 216.160.37.157 192.168.10.222
subnet: 216.160.37.159 192.168.10.0
Netmask: 255.255.255.248 255.255.255.0
Interface used: eth0 eth0
Hostname: node1-pub.test.net node2-pub.test.net

=======================================================================
(A)

Take the Services, Database, ASM Instances and nodeapps down on both the Nodes in Cluster. Also disable the nodeapps, asm and database instances to prevent them from restarting in case if this node gets rebooted during this process.

srvctl stop service -d test
srvctl stop database -d test
srvctl stop asm -n node1-pub
srvctl stop asm -n node2-pub
srvctl stop nodeapps -n node1-pub,node1-pub2
srvctl disable instance -d test -i test1,test2
srvctl disable asm -n node1-pub
srvctl disable asm -n node2-pub
srvctl disable nodeapps -n node1-pub
srvctl disable nodeapps -n node2-pub

(B)
Modify the /etc/hosts and/or DNS, ifcfg-eth0 (local node) with the new IP values
on All the Nodes

(C)
Restart the specific network interface in order to use the new IP.

ifconfig eth0 down
ifconfig eth0 up

Or, you can restart the network.
CAUTION: on NAS, restarting entire network may cause the node to be rebooted.

(D)
Update the OCR with the New Public IP.
In case of public IP, you have to delete the interface first and then add it back with the new IP address.

As oracle user, Issue the below command:

oifcfg delif -global eth0
oifcfg setif -global eth0/192.168.10.0:public

(E)
Update the OCR with the New Virtual IP.
Virtual IP is part of the nodeapps and so you can modify the nodeapps to update the Virtual IP information.

As privileged user (root), Issue the below commands:

srvctl modify nodeapps -n node1-pub -A 192.168.10.111/255.255.255.0/eth0 <– for Node 1
srvctl modify nodeapps -n node1-pub -A 192.168.10.222/255.255.255.0/eth0 <– for Node 2

(F)
Enable the nodeapps, ASM, database Instances for all the Nodes.

srvctl enable instance -d test -i test1,test2
srvctl enable asm -n node1-pub
srvctl enable asm -n node2-pub
srvctl enable nodeapps -n node1-pub
srvctl enable nodeapps -n node2-pub

(G)
Update the listener.ora file on each nodes with the correct IP addresses in case if it uses the IP address instead of the hostname.

(H)
Restart the Nodeapps, ASM and Database instance

srvctl start nodeapps -n node1-pub
srvctl start nodeapps -n node2-pub
srvctl start asm -n node1-pub
srvctl start asm -n node2-pub
srvctl start database -d test

=======================================================================

Transparent Application Failover (TAF) in Oracle

Transparent Application Failover (TAF) is a client-side feature that allows for clients to reconnect to surviving nodes in the event of a failure of an instance. The reconnect happens automatically from within the OCI (Oracle Call Interface) library. Any uncommitted transactions are rolled back and server side program variables and session properties will be lost. In some case the select statements automatically re-executed on the new connection with the cursor positioned on the row on which it was positioned prior to the failover.

For high availability and scalability, Oracle provides the Transparent Application Failover feature part of Oracle Real Application Clusters (RAC).

The failover is configured in tnsnames.ora file, the TAF settings are placed in CONNECT_DATA section of the tnsnames.ora using FAILOVER_MODES parameters.

FAILOVER_MODE contains the subparameters
———————————————————-
BACKUP: Specify a different net service name for backup instance connections. A backup should be specified when using PRECONNECT to pre-establish connections.

A sample configuration would look like

TESTDB1 =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = testdb.easyoradba.com)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = PRECONNECT)(BACKUP=TESTDB2))
)
)

TYPE: TAF supports three types of failover types

1.SESSION failover – If a user’s connection is lost, SESSION failover establishes a new session automatically created for the user on the backup node. This type of failover does not attempt to recover selects. This failover is ideal for OLTP (online transaction processing) systems, where transactions are small.

2.SELECT failover – If the connection is lost, Oracle Net establishes a connection to another node and re-executes the SELECT statements with cursor positioned on the row on which it was positioned prior to the failover. This mode involves overhead on the client side and Oracle NET keeps track of SELECT statements. This approach is best for data warehouse systems, where the transactions are big and complex

3.NONE: This setting is the default and failover functionality is provided. Use this setting to prevent failover.

A sample configuration would look like

TESTDB1 =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = testdb.easyoradba.com)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 10)(DELAY = 5))
)
)

METHOD: This parameters determines how failover occurs from the primary node to the backup node
BASIC: Use this mode to establish connections at failover time, no work on the backup server until failover time.
PRECONNECT: Use this mode to pre-established connections. This PRECONNECT mode provides faster failover but requires that the backup instance be capable of supporting all connections from every supported instance.

RETRIES: Use this parameter to specify number of times to attempt to connect after a failover. If DELAY is specified but RETRIES is not specified, RETRIES default to five retry attempts.

DELAY: Use this parameter to Specify the amount of time in seconds to wait between connect attempts. If RETRIES is specified but DELAY is not specified, DELAY default to one second.

A sample configuration would look like

TESTDB1 =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = testdb.easyoradba.com)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 10)(DELAY = 5))
)
)

Please note that you can pre-establish a connection to reduce the failover time using METHOD=PRECONNECT option.

To verify that TAF is correctly configured, you query FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns in the V$SESSION view.
SQL> SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*) FROM V$SESSION

Oracle Database HA Architecture

By Uwe Hesse

Link to original post: http://uhesse.wordpress.com/oracle-database-ha-architecture/

A very popular topic in many of my courses is Oracle Database Architecture regarding High Availability (HA). This page is supposed to address this topic with a high level overview, covering “Ordinary” Single Instance Databases, Data Guard, Real Application Clusters (RAC) and Extended RAC (sometime called “Stretched Cluster”). The combination of RAC and Data Guard is advertised by Oracle Corp. under the label Maximum Availability Architecture (MAA). In addition to these Oracle HA Solutions, I will briefly cover also one Third Party HA Solution: Remote Mirroring. I don’t intend to dive deep into technical details of all these solutions but instead just want to differentiate them and talk briefly about the various advantages and maybe drawbacks of each of them.

At first, we look at the still most common Oracle Database Architecture: Single Instance. An Oracle RDBMS consists always of one Database – made up by Datafiles, Online Redo Logfiles and Controlfile(s) and at least one Instance – made up by Memory Structures (like a Database Buffer Cache) and Background Processes (like a Database Writer). If we have one Database and multiple Instances accessing it – that’s a RAC. If only one Instance accesses the Database – that’s Single Instance. Small Installations have stored all the components inside one Server like this:

Also common these days is the placement of the Database on a Storage Area Network (SAN) like this:

From a HA perspective this Architecture is vulnerable: Server A and Server B are Single Point of Failures (SPOFs) as well as Database A and Database B are. Also, the Site where these Servers are placed is a SPOF. Should one of the SPOFs fail, the whole Database is unavailable. An “ordinary” RAC addresses the Server SPOFs like this:

Should one of the two Servers fail, the Database C is still available. HA is not the only reason to use RAC, of course. Amongst others, a further valid reason to use RAC is Scalability: If our requirements increase in the future, we can add another Server (Node) to the cluster. Also, we have options like Service-Management and Load Balancing with RAC. In short words: RAC is not just for HA, but it is out of the scope of this article to address the other reasons in detail. Drawback from a HA perspective of the above architecture is: The Database C resp. the Site C is a SPOF. Should i.e a fire destroy Site C, the Database C is unavailable. Therefore, we have the option to stretch the Database across two Sites, which is usually called Extended RAC:

The Sites are no longer SPOFs here. The Database D is mirrored across the two sites. Drawback of this architecture is the cost of the Network Connection between the two Sites, if long distances are desired. That is critical, because large Data Volume has to be mirrored. In effect, this leads to distances that usually do not extend a couple of kilometers – which may conflict with the goal to get Disaster Protection. Here, Data Guard kicks in: We can reach long distances for Disaster Protection with Data Guard easier, because we do not transmit the whole Data Volume but instead just the (relatively small) Redo Protocoll. In the following picture, the Servers hold  Single Instances like Server A and Server B above:

The Redo Protocol from the Primary Database is used to actualize the Standby Database. Should the Primary fail, we can failover to the Standby and continue to work productively. This failover can be done automatically by an Observer (which is called Fast-Start Failover). The distance between the two Servers can reach thousands of kilometers – depending on the kind of redo transmission and the protection level. If we combine RAC and Data Guard, we get MAA. Obviously, MAA is an expensive solution, but it also combines the advantages of RAC and Data Guard.

A popular Third Party HA Solution is Remote Mirroring. On a high level, it looks like that:

The Site is no SPOF here also, like with Extended RAC. Drawbacks of this solution: The distance is usually very limited for the same reason as with Extended RAC. Also, the Secondary Site cannot be used productively while the mirroring is in progress – contrary to the above Oracle HA solutions. With RAC all Servers resp. Sites are in use productively. Even with Data Guard, the Standby is not merely waiting for the Primary to fail. It can also be used for Read Only Access – effectively reducing the load on the Primary:

Above illustrates the 11g New Feature Real-Time Query for Physical Standby Databases. The Standby is accessed Read Only while the actualization continues. Additionally, it is possible to offload Backups to the Physical Standby (also before 11g).