IRIS FailSafe 2.1.x is not a new release of the IRIS FailSafe 1.2 product but, instead, is a new set of files and scripts that provides many additional possibilities for the size and complexity of a highly available system. If you wish to migrate a FailSafe 1.2 system to a FailSafe 2.1.x system to take advantage of these features, you must upgrade your system configuration. There is no upgrade installation option to automatically upgrade FailSafe 1.2 to FailSafe 2.1.x.
This appendix describes the procedures you perform to upgrade a system from FailSafe 1.2 to FailSafe 2.1.x. It includes the following sections:
No hardware changes are required when you upgrade a system to FailSafe 2.1.x. In FailSafe 2.1.x terms, a FailSafe 1.2 system is a two-node configuration with dual-hosted storage and a reset ring.
With FailSafe 2.1.x, you can test the hardware configuration with FailSafe diagnostic commands. See Chapter 8, “Testing the Configuration”, for instructions on using FailSafe to test the connections. These diagnostics are not run automatically when you start FailSafe 2.1.x; you must run them manually.
You can also use the admin ping command to test the serial reset line in FailSafe 2.1.x. This command replaces the ha_spng command you used with FailSafe 1.2.
FailSafe 1.2 command to test serial reset lines:
# /usr/etc/ha_spng -i 1 -d msc -f /dev/ttyd2
# echo $status
FailSafe 2.1.x cmgr command to test serial reset lines:
cmgr> admin ping dev_name /dev/ttyd2 of dev_type tty with sysctrl_type msc
See Chapter 4, “Administration Tools”, for information on using cmgr commands.
FailSafe 2.1.x consists of a different set of files than FailSafe 1.2. FailSafe 1.2 and FailSafe 2.1.x software can exist on the same node, but you cannot run both versions of FailSafe at the same time.
FailSafe 1.2 contains a configuration file, ha.conf. In FailSafe 2.1.x, configuration information is contained in a cluster database at /var/cluster/cdb/cdb.db. You create the cluster database using the cmgr command or the GUI.
The FailSafe 2.1.x cluster database is automatically copied to all nodes in the pool, so every node in the pool holds the FailSafe 2.1.x configuration.
You must reconfigure your FailSafe 1.2 system as a FailSafe 2.1.x system by using the FailSafe 2.1.x GUI or the cmgr command. For information on using these administration tools, see Chapter 4, “Administration Tools”.
To update a FailSafe 1.2 configuration, consider how the FailSafe 1.2 configuration maps onto the concept of resource groups:
Each resource group contains all the applications that were primary on each node and backed up by the other node.
When you configure a FailSafe 2.1.x system, you perform the following steps:
Add nodes to the pool
Create cluster
Add nodes to the cluster
Set HA parameters (FailSafe 2.1.x can be started at this point, if desired)
Create resources
Create failover policy
Create resource groups
Add resources to resource groups
Put resource groups online
The guided configuration task sets in the GUI lead you through these configuration steps.
For a configuration example that compares FailSafe 1.2 configuration to FailSafe 2.1.x configuration, see “Upgrade Examples”.
All FailSafe 1.2 scripts must be rewritten for FailSafe 2.1.x. The IRIS FailSafe Version 2 Programmer's Guide provides detailed information on FailSafe 2.1.x scripts as well as detailed instructions for migrating FailSafe 1.2 scripts to their FailSafe 2.1.x functional equivalents.
In FailSafe 1.2, the unit of failover is the node. In FailSafe 2.1.x, the unit of failover is the resource group. Because of this, the concepts of node failover, node failback, and even node state no longer apply to FailSafe 2.1.x. In addition, all FailSafe scripts differ between the two releases.
The following table summarizes the differences between the releases.
Table B-1. Differences between IRIS FailSafe 1.2 and 2.1.x
To upgrade a FailSafe 1.2 system to a FailSafe 2.1.x system, you must examine your ha.conf file to determine how to define the equivalent parameters in the FailSafe 2.1.x cluster database.
The following sections show upgrade examples for the following tasks:
Defining a Node
Defining a Cluster
Setting HA Parameters
Defining a Resource: XLV Volume
Defining a Resource: XFS Filesystem
Defining a Resource: IP Address
For upgrade examples of the following tasks, see the IRIS FailSafe Version 2 Programmer's Guide, where customized resources and scripts are described.
Defining a Resource Type
Defining a Failover Policy
Writing FailSafe Scripts
The following example shows node definition in the FailSafe 1.2 ha.conf file. Parameters that you must use when configuring a FailSafe 2.1.x system are indicated in bold.
Node node1
{
interface node1-fxd
{
name = rns0
ip-address = 54.3.252.6
netmask = 255.255.255.0
broadcast-addr = 54.3.252.6
}
heartbeat
{
hb-private-ipname = 192.0.2.3
hb-public-ipname = 54.3.252.6
hb-probe-time = 6
hb-timeout = 6
hb-lost-count = 4
}
reset-tty = /dev/ttyd2
sys-ctlr-type = MSC
}
In this configuration example, you will use the following values when you define the same node in FailSafe 2.1.x:
Node name: node1
Primary network interface: node1
Type of system controller: msc
System control device name: /dev/ttyd2
Control networks: 192.0.2.3, 54.3.252.6
Use the following cmgr command to define a node in FailSafe 2.1.x with these values. Note that there are additional parameters you must specify when you define this node.
cmgr> define node node1
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional]? node1
Is this a FailSafe node <true|false> ? true
Is this a CXFS node <true|false> ? false
Node ID ? 10
Reset type <powerCycle> ? (powerCycle)
Do you wish to define system controller info[y/n]:y
Sysctrl Type <msc|mmsc>? (msc) msc
Sysctrl Password [optional]? ( )
Sysctrl Status <enabled|disabled>? enabled
Sysctrl Owner? node2
Sysctrl Device? /dev/ttyd2
Sysctrl Owner Type <tty> [tty]?
Number of Network interfaces [2]? 2
NIC 1 - IP Address? 192.0.2.3
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false>? true
NIC 1 - (use network for control messages) <true|false>? true
NIC 1 - Priority <1,2,...>? 1
...
As this ha.conf node definition shows, when you defined a node in FailSafe 1.2 you also set the parameters that determined how often monitoring messages were sent and how long a period without a response indicated a failure. For information on setting monitoring values in FailSafe 2.1.x, see “Setting HA Parameters”.
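The mapping from the ha.conf node block to the values listed for FailSafe 2.1.x can be sketched in a few lines of Python. This is only an illustration, not a FailSafe tool: the helper names read_settings and node_values are hypothetical, and the parser assumes that each setting appears on its own line in the form key = value, as in the example above.

```python
import re

# Node block copied from the FailSafe 1.2 ha.conf example in this section.
HA_CONF_NODE = """
Node node1
{
interface node1-fxd
{
name = rns0
ip-address = 54.3.252.6
netmask = 255.255.255.0
broadcast-addr = 54.3.252.6
}
heartbeat
{
hb-private-ipname = 192.0.2.3
hb-public-ipname = 54.3.252.6
hb-probe-time = 6
hb-timeout = 6
hb-lost-count = 4
}
reset-tty = /dev/ttyd2
sys-ctlr-type = MSC
}
"""


def read_settings(text):
    """Collect every `key = value` line into a dict (hypothetical helper)."""
    pairs = re.findall(r"^\s*([\w-]+)\s*=\s*(\S+)\s*$", text, flags=re.MULTILINE)
    return dict(pairs)


def node_values(text):
    """Pick out the values that carry over to a FailSafe 2.1.x node definition."""
    settings = read_settings(text)
    name = re.search(r"^\s*Node\s+(\S+)", text, flags=re.MULTILINE).group(1)
    return {
        "node_name": name,
        "sysctrl_type": settings["sys-ctlr-type"].lower(),
        "sysctrl_device": settings["reset-tty"],
        "control_networks": [
            settings["hb-private-ipname"],
            settings["hb-public-ipname"],
        ],
    }


if __name__ == "__main__":
    for key, value in node_values(HA_CONF_NODE).items():
        print(f"{key}: {value}")
```

The remaining prompts in the cmgr session, such as the node ID and the system controller owner, have no counterpart in ha.conf and still have to be supplied by hand.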
Although FailSafe 1.2 does not require the definition of clusters, you specify a parameter in the ha.conf file that FailSafe 2.1.x uses in its cluster definition: the e-mail address to use to notify the system administrator when problems occur in the cluster.
The ha.conf file includes the following:
system-configuration
{
mail-dest-addr = root@localhost
...
}
When you define a cluster in FailSafe 2.1.x, you can use this as the e-mail address to use for problem notification.
When you define a FailSafe 2.1.x cluster, you must provide other information in addition to this parameter, such as the e-mail program to use for this notification and the nodes to include in the cluster. Use the following cmgr command to define a cluster:
cmgr> define cluster apache-cluster
Enter commands, you may enter "done" or "cancel" at any time to exit
cluster apache-cluster? set notify_addr to root@localhost
cluster apache-cluster? done
Use the following cmgr command to add nodes to the cluster:
cmgr> modify cluster apache-cluster
Enter commands, you may enter "done" or "cancel" at any time to exit
cluster apache-cluster? add node node1
cluster apache-cluster? done
The following example shows the sections of a FailSafe 1.2 ha.conf file that are used to set monitoring and timeout values. Parameters that you must use when configuring a FailSafe 2.1.x system are indicated in bold.
system-configuration
{
pwrfail = true
...
}
Node node1
{
...
heartbeat
{
hb-private-ipname = 192.0.2.3
hb-public-ipname = 54.3.252.6
hb-probe-time = 6
hb-timeout = 6
hb-lost-count = 4
}
...
}
As this ha.conf node definition shows, in FailSafe 1.2 you defined the hb-probe-time, hb-timeout, and hb-lost-count parameters to set how often monitoring messages were sent and how long a period without a response indicated a failure. FailSafe 2.1.x monitors the nodes in a cluster differently: each node sends continuous messages to the other nodes in the cluster and, in turn, continuously monitors the messages that the other nodes send.
Because the two systems use different monitoring methods, there is no one-to-one correspondence between the values you set in the ha.conf file and the timeout and heartbeat intervals you set in FailSafe 2.1.x when you set FailSafe HA parameters. However, if you wish to keep approximately the same interval before your system determines that a failure has occurred, you can use the following formula to determine the node timeout interval:
node_timeout = (probetime + timeout) * lostcount
This formula should account for the same total node-to-node communication time.
All FailSafe 2.1.x timeouts are in milliseconds and can be changed while FailSafe 2.1.x is running. Timeouts can be specified for the cluster or for a specific node in the cluster.
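As a worked example, the node timeout formula can be evaluated with the heartbeat values from the ha.conf example (hb-probe-time = 6, hb-timeout = 6, hb-lost-count = 4). The sketch below is illustrative only. The function names are hypothetical, and the conversion to milliseconds assumes that the ha.conf values are in seconds, which this appendix does not state.

```python
def node_timeout(probe_time, timeout, lost_count):
    """node_timeout = (probetime + timeout) * lostcount, in the unit of the inputs."""
    return (probe_time + timeout) * lost_count


def seconds_to_ms(seconds):
    """Convert seconds to milliseconds, the unit FailSafe 2.1.x uses for timeouts."""
    return seconds * 1000


if __name__ == "__main__":
    interval = node_timeout(probe_time=6, timeout=6, lost_count=4)
    print(interval)                 # 48, in the unit of the ha.conf values
    print(seconds_to_ms(interval))  # 48000, only if the ha.conf values are seconds
```

The cmgr session in this section sets node_timeout to 24000 for the same ha.conf values, so treat the computed figure as an approximation to review rather than a required setting.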
There is no long-timeout value in FailSafe 2.1.x. The long-timeout value equivalent is set with the resource type start and stop action monitor timeouts. The resource type start, monitor, and stop action timeouts can be changed using the GUI or cmgr.
Use the following cmgr command to modify the HA parameters for node1 in FailSafe 2.1.x:
cmgr> modify ha_parameters on node node1 in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"
node1 ? set node_timeout to 24000
node1 ? set heartbeat to 6000
node1 ? set run_pwrfail to true
node1 ? done
The following example shows a volume definition in the FailSafe 1.2 ha.conf file. Parameters that you must use when configuring the same volume as a volume resource in a FailSafe 2.1.x system are indicated in bold.
volume apache-vol
{
server-node = node1
backup-node = node2
devname = apache-vol
devname-owner = root
devname-group = sys
devname-mode = 600
}
In this configuration example, you will use the following values when you define the same volume in FailSafe 2.1.x:
Volume name: apache-vol
User name of device file owner: root
Group name of device file: sys
Device file permissions: 600
cmgr> define resource apache-vol of resource_type volume in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"
resource apache-vol? set devname-owner to root
resource apache-vol? set devname-group to sys
resource apache-vol? set devname-mode to 600
resource apache-vol? done
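If you have many volumes to carry over, it can help to generate the text you will type at the cmgr prompts from the ha.conf values. The following Python sketch does that for the volume above. It is an illustration only: the function name cmgr_resource_session is hypothetical, it produces plain text and does not run cmgr, and it uses only the define resource, set, and done forms shown in this section.

```python
def cmgr_resource_session(name, resource_type, cluster, attributes):
    """Return the lines to enter in cmgr to define one resource (hypothetical helper)."""
    lines = [
        f"define resource {name} of resource_type {resource_type} in cluster {cluster}"
    ]
    # One "set <attribute> to <value>" line per attribute, in the order given.
    lines += [f"set {key} to {value}" for key, value in attributes.items()]
    lines.append("done")
    return lines


if __name__ == "__main__":
    # Values taken from the apache-vol block of the ha.conf example.
    volume_attributes = {
        "devname-owner": "root",
        "devname-group": "sys",
        "devname-mode": "600",
    }
    session = cmgr_resource_session(
        "apache-vol", "volume", "apache-cluster", volume_attributes
    )
    for line in session:
        print(line)
```

Review the generated lines against the prompts that cmgr actually displays before you enter them.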
The following example shows an XFS filesystem definition in the FailSafe 1.2 ha.conf file. Parameters that you must use when configuring the same filesystem as a filesystem resource in a FailSafe 2.1.x system are indicated in bold.
filesystem apache-fs
{
mount-point = /apache-fs
mount-info
{
fs-type = xfs
volume-name = apache-vol
mode = rw, noauto
}
}
In this configuration example, you will use the following values when you define the same filesystem in FailSafe 2.1.x:
Resource name (mount point): /apache-fs
XLV volume: apache-vol
Mount options: rw, noauto
To create a filesystem resource, use the following cmgr commands:
cmgr> define resource /apache-fs of resource_type filesystem in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"
resource /apache-fs? set volume-name to apache-vol
resource /apache-fs? set mount-options to "rw,noauto"
resource /apache-fs? done
The following example shows an IP address definition in the FailSafe 1.2 ha.conf file. Parameters that you must use when configuring the same IP address as a highly available resource in a FailSafe 2.1.x system are indicated in bold.
interface-pair FDDI_1
{
primary-interface = node1-fxd
secondary-interface = node2-fxd
re-mac = false
netmask = 0xffffff00
broadcast-addr = 54.3.252.255
ip-aliases = ( 54.3.252.7 )
}
In this configuration example, you will use the following values when you define the same IP address in FailSafe 2.1.x:
Resource name: 54.3.252.7
Broadcast address: 54.3.252.255
Network mask: 0xffffff00
To create an IP address resource, use the following cmgr commands:
cmgr> define resource 54.3.252.7 of resource_type IP_address in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"
resource 54.3.252.7? set interfaces to rns0
resource 54.3.252.7? set NetworkMask to 0xffffff00
resource 54.3.252.7? set BroadcastAddress to 54.3.252.255
resource 54.3.252.7? done
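The examples in this appendix write network masks both in hexadecimal (0xffffff00) and in dotted-decimal form (255.255.255.0). Before you enter the values, you may want to confirm that the mask and broadcast address agree with the IP address. The following Python sketch, which uses only the standard ipaddress module, shows one way to check this. The function names are hypothetical and the check is not part of FailSafe.

```python
import ipaddress


def hex_to_dotted(mask_hex):
    """Convert a hexadecimal netmask such as 0xffffff00 to dotted-decimal form."""
    return str(ipaddress.IPv4Address(int(mask_hex, 16)))


def broadcast_for(ip, mask_hex):
    """Return the broadcast address of the network that contains ip."""
    network = ipaddress.IPv4Network(f"{ip}/{hex_to_dotted(mask_hex)}", strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    # Values from the interface-pair block of the ha.conf example.
    print(hex_to_dotted("0xffffff00"))                # 255.255.255.0
    print(broadcast_for("54.3.252.7", "0xffffff00"))  # 54.3.252.255
```

For the IP address resource above, the computed broadcast address matches the BroadcastAddress value entered in cmgr.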
After you have defined your nodes, clusters, and resources, you define your resource groups, a task which has no equivalent in FailSafe 1.2. When you define a resource group, you specify the resources that will be included in the resource group and the failover policy that determines which node will take over the services of the resource group on failure.
For information on defining resource groups, see “Define a Resource Group with the GUI” in Chapter 5.
After you have configured your system, you can start FailSafe services, as described in “Start FailSafe HA Services” in Chapter 5.
In FailSafe 1.2, you produced a display of the system status with the ha_admin -a command. In FailSafe 2.1.x, you can display the system status in the following ways:
You can keep continuous watch on the state of a cluster using the GUI.
You can query the status of an individual resource group, node, or cluster using either the GUI or cmgr.
You can use the /var/cluster/cmgr-scripts/haStatus script provided with cmgr to see the status of all clusters, nodes, resources, and resource groups in the configuration.
For information on performing these tasks, see “System Status” in Chapter 7.