Appendix B. Updating from IRIS FailSafe 1.2 to IRIS FailSafe 2.1.x

IRIS FailSafe 2.1.x is not a new release of the IRIS FailSafe 1.2 product but, instead, is a new set of files and scripts that provides many additional possibilities for the size and complexity of a highly available system. If you wish to migrate a FailSafe 1.2 system to a FailSafe 2.1.x system to take advantage of these features, you must upgrade your system configuration. There is no upgrade installation option to automatically upgrade FailSafe 1.2 to FailSafe 2.1.x.

This appendix describes the procedures you perform to upgrade a system from FailSafe 1.2 to FailSafe 2.1.x. It covers hardware, software, configuration, and script changes, compares how the two releases operate, and provides upgrade examples.

Hardware Changes

No hardware changes are required when you upgrade a system to FailSafe 2.1.x. In FailSafe 2.1.x terms, a FailSafe 1.2 system is a two-node configuration with dual-hosted storage and a reset ring.

With FailSafe 2.1.x, you can test the hardware configuration with FailSafe diagnostic commands. See Chapter 8, “Testing the Configuration”, for instructions on using FailSafe to test the connections. These diagnostics are not run automatically when you start FailSafe 2.1.x; you must run them manually.

You can also use the admin ping command to test the serial reset line in FailSafe 2.1.x. This command replaces the ha_spng command you used with FailSafe 1.2.

FailSafe 1.2 command to test serial reset lines:

# /usr/etc/ha_spng -i 1 -d msc -f /dev/ttyd2
# echo $status

FailSafe 2.1.x cmgr command to test serial reset lines:

cmgr> admin ping dev_name /dev/ttyd2 of dev_type tty with sysctrl_type msc

See Chapter 4, “Administration Tools”, for information on using cmgr commands.

Software Changes

FailSafe 2.1.x consists of a different set of files than FailSafe 1.2. FailSafe 1.2 and FailSafe 2.1.x software can exist on the same node, but you cannot run both versions of FailSafe at the same time.

FailSafe 1.2 stores its configuration in the ha.conf file. In FailSafe 2.1.x, configuration information is contained in a cluster database at /var/cluster/cdb/cdb.db that is kept on all nodes in the pool. You create the cluster database using the cmgr command or the GUI.

The FailSafe 2.1.x cluster database is copied automatically to all nodes in the pool, so every node holds the same FailSafe 2.1.x configuration.

Configuration Changes

You must reconfigure your FailSafe 1.2 system as a FailSafe 2.1.x system by using the FailSafe 2.1.x GUI or the FailSafe 2.1.x cmgr command. For information on using these administration tools, see Chapter 4, “Administration Tools”.

To update a FailSafe 1.2 configuration, consider how the FailSafe 1.2 configuration maps onto the concept of resource groups:

  • A dual-active FailSafe 1.2 configuration contains two resource groups, one for each node.

  • An active/standby FailSafe 1.2 configuration contains one resource group, consisting of an entire node (the active node).

Each resource group contains all the applications that were primary on one node and backed up by the other node.
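The mapping rule above amounts to one resource group per FailSafe 1.2 node that had primary applications. The following Python sketch is illustrative only: plan_resource_groups, the "-rg" group names, and the application names in the test data are hypothetical.

```python
def plan_resource_groups(primary_apps):
    """Sketch: one FailSafe 2.1.x resource group per FailSafe 1.2 node
    that had primary applications (hypothetical helper and names).

    primary_apps maps each node name to the applications that were
    primary on that node in FailSafe 1.2. A dual-active pair yields two
    groups; an active/standby pair (the standby node has no primary
    applications) yields one group for the active node.
    """
    return {f"{node}-rg": list(apps)
            for node, apps in primary_apps.items() if apps}
```

For a dual-active pair such as {"node1": ["web"], "node2": ["db"]}, this returns two groups; for an active/standby pair such as {"node1": ["web", "db"], "node2": []}, it returns a single group for node1.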

When you configure a FailSafe 2.1.x system, you perform the following steps:

  1. Add nodes to the pool

  2. Create cluster

  3. Add nodes to the cluster

  4. Set HA parameters (FailSafe 2.1.x can be started at this point, if desired)

  5. Create resources

  6. Create failover policy

  7. Create resource groups

  8. Add resources to resource groups

  9. Put resource groups online

The guided configuration task sets in the GUI lead you through these steps.

For a configuration example that compares FailSafe 1.2 configuration to FailSafe 2.1.x configuration, see “Upgrade Examples”.

Scripts

All FailSafe 1.2 scripts must be rewritten for FailSafe 2.1.x. The IRIS FailSafe Version 2 Programmer's Guide provides detailed information on FailSafe 2.1.x scripts as well as detailed instructions for migrating FailSafe 1.2 scripts to their FailSafe 2.1.x functional equivalents.

Operational Comparison

In FailSafe 1.2, the unit of failover is the node. In FailSafe 2.1.x, the unit of failover is the resource group. Because of this, the concepts of node failover, node failback, and even node state no longer apply in FailSafe 2.1.x. In addition, all FailSafe scripts differ between the two releases.

The following table summarizes the differences between the releases.

Table B-1. Differences between IRIS FailSafe 1.2 and 2.1.x

  • FailSafe 1.2: ha.conf configuration file.
    FailSafe 2.1.x: Cluster database at /var/cluster/cdb/cdb.db. The database is automatically copied to all nodes in the pool. Much of the data contained in the 1.2 ha.conf file is used in the 2.1.x database, but the format is completely different. You configure the database using the Cluster Manager graphical user interface or the cmgr command.

  • FailSafe 1.2: Node states (standby, normal, degraded, booting, or up).
    FailSafe 2.1.x: Resource group states (online, offline, pending, maintenance, error).

  • FailSafe 1.2: Scripts: giveaway, giveback; takeover, takeback; check; (no equivalent).
    FailSafe 2.1.x: Scripts: stop; start; monitor; exclusive, probe, restart. Failover script. Failover attributes.

  • FailSafe 1.2: All common functions and variables are kept in the /var/ha/actions/common.vars file.
    FailSafe 2.1.x: All common functions and variables are kept in the /var/cluster/ha/common_scripts/scriptlib file.

  • FailSafe 1.2: Configuration information is read using the ha_cfginfo command.
    FailSafe 2.1.x: Configuration information is read using the ha_get_info() and ha_get_field() shell functions.

  • FailSafe 1.2: Software links specify application ordering.
    FailSafe 2.1.x: Software links are not used for ordering.

  • FailSafe 1.2: Scripts use /sbin/sh.
    FailSafe 2.1.x: Scripts use /sbin/ksh.

  • FailSafe 1.2: Scripts require configuration checksum verification.
    FailSafe 2.1.x: There is no configuration checksum verification in the scripts.

  • FailSafe 1.2: Scripts require resource ownership.
    FailSafe 2.1.x: Action scripts have no notion of resource ownership.

  • FailSafe 1.2: Scripts do not run in parallel.
    FailSafe 2.1.x: Multiple instances of action scripts can run at the same time.

  • FailSafe 1.2: Each service has its own log in /var/ha/logs.
    FailSafe 2.1.x: Action scripts use cluster logging, and all scripts log to the same file using the ha_cilog command.

  • FailSafe 1.2: There are two units of failover, one for each node in the cluster.
    FailSafe 2.1.x: There is a unit of failover (a resource group) for each highly available service.


Upgrade Examples

To upgrade a FailSafe 1.2 system to a FailSafe 2.1.x system, you must examine your ha.conf file to determine how to define the equivalent parameters in the FailSafe 2.1.x cluster database.
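Because each of the following examples reads values out of ha.conf blocks, a small parser can help you collect those values before you enter them in cmgr. The following Python sketch assumes only the layout visible in this appendix's excerpts (a header line, an opening brace on its own line or at the end of the header, key = value lines, and a closing brace); it is not based on an official ha.conf grammar, and parse_ha_conf is a hypothetical helper.

```python
def parse_ha_conf(text):
    """Parse ha.conf-style blocks into nested dicts (hypothetical helper).

    Block headers such as "Node node1" or "heartbeat" become dict keys;
    "key = value" lines become string entries in the innermost block.
    Blank lines and "..." elision lines are skipped. A repeated key in
    the same block overwrites the earlier value.
    """
    root = {}
    stack = [root]   # innermost open block is stack[-1]
    pending = None   # header seen on a previous line, awaiting "{"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        if line.endswith("{"):
            # The header may share the line with the brace ("heartbeat {").
            header = line[:-1].strip() or pending
            if header is None:
                raise ValueError("block opened without a header")
            block = {}
            stack[-1][header] = block
            stack.append(block)
            pending = None
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        elif "=" in line:
            key, value = line.split("=", 1)
            stack[-1][key.strip()] = value.strip()
        else:
            pending = line
    if len(stack) != 1:
        raise ValueError("unclosed block")
    return root
```

Pass the contents of your ha.conf file to parse_ha_conf; for the node definition in “Defining a Node”, the result has a "Node node1" entry whose "heartbeat" sub-dictionary holds hb-probe-time, hb-timeout, and hb-lost-count as strings.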

The following sections show upgrade examples for the following tasks:

  • Defining a Node

  • Defining a Cluster

  • Setting HA Parameters

  • Defining a Resource: XLV Volume

  • Defining a Resource: XFS Filesystem

  • Defining a Resource: IP Address

For upgrade examples of the following tasks, see the IRIS FailSafe Version 2 Programmer's Guide, where customized resources and scripts are described.

  • Defining a Resource Type

  • Defining a Failover Policy

  • Writing FailSafe Scripts

Defining a Node

The following example shows a node definition in the FailSafe 1.2 ha.conf file. The values you will use when configuring a FailSafe 2.1.x system are listed after the example.

Node node1
{
    interface node1-fxd
    {
        name = rns0
        ip-address = 54.3.252.6
        netmask = 255.255.255.0
        broadcast-addr = 54.3.252.6
    }
    heartbeat
    {
        hb-private-ipname = 192.0.2.3
        hb-public-ipname = 54.3.252.6
        hb-probe-time = 6
        hb-timeout = 6
        hb-lost-count = 4
    }
    reset-tty = /dev/ttyd2
    sys-ctlr-type = MSC
}

In this configuration example, you will use the following values when you define the same node in FailSafe 2.1.x:

  • Node name: node1

  • Primary network interface: node1

  • Type of system controller: msc

  • System control device name: /dev/ttyd2

  • Control networks: 192.0.2.3, 54.3.252.6
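The list above can also be written as a small mapping function. This Python sketch assumes the Node block has already been read into a dictionary keyed by the ha.conf parameter names; node_values and its output labels are hypothetical and are not cmgr attribute names.

```python
def node_values(name, node):
    """Map a FailSafe 1.2 Node block (as a dict) to the values used when
    defining the node in FailSafe 2.1.x (hypothetical helper and labels)."""
    hb = node["heartbeat"]
    return {
        "node_name": name,
        # In this example the primary network interface is the hostname,
        # which matches the node name.
        "primary_interface": name,
        # ha.conf writes MSC; cmgr offers <msc|mmsc> in lowercase.
        "sysctrl_type": node["sys-ctlr-type"].lower(),
        "sysctrl_device": node["reset-tty"],
        # Both heartbeat networks become control networks.
        "control_networks": [hb["hb-private-ipname"], hb["hb-public-ipname"]],
    }
```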

Use the following cmgr command to define the node in FailSafe 2.1.x with these values. Note that you must also specify additional parameters when you define this node.

cmgr> define node node1
Enter commands, you may enter "done" or "cancel" at any time to exit

Hostname[optional]? node1
Is this a FailSafe node <true|false> ? true
Is this a CXFS node <true|false> ? false
Node ID ? 10
Reset type <powerCycle> ? (powerCycle)
Do you wish to define system controller info[y/n]:y
Sysctrl Type <msc|mmsc>? (msc) msc
Sysctrl Password [optional]? ( )
Sysctrl Status <enabled|disabled>? enabled
Sysctrl Owner? node2
Sysctrl Device? /dev/ttyd2
Sysctrl Owner Type <tty> [tty]? 
Number of Network interfaces [2]? 2
NIC 1 - IP Address? 192.0.2.3
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false>? true
NIC 1 - (use network for control messages) <true|false>? true
NIC 1 - Priority <1,2,...>? 1
...

As this ha.conf node definition shows, a FailSafe 1.2 node definition also set how often monitoring messages were sent and how long a node could go without responding before a failure was declared. For information on setting monitoring values in FailSafe 2.1.x, see “Setting HA Parameters”.

Defining a Cluster

Although FailSafe 1.2 has no cluster definition, the ha.conf file contains one parameter that FailSafe 2.1.x uses in its cluster definition: the e-mail address used to notify the system administrator when problems occur in the cluster.

The ha.conf file includes the following:

system configuration
{
mail-dest-addr = root@localhost
...
}

When you define a cluster in FailSafe 2.1.x, you can use this address for problem notification.

You must also provide other information when you define a FailSafe 2.1.x cluster, such as the e-mail program to use for this notification and, of course, the nodes to include in the cluster. Use the following cmgr command to define a cluster:

cmgr> define cluster apache-cluster
Enter commands, you may enter "done" or "cancel" at any time to exit

cluster apache-cluster? set notify_addr to root@localhost
cluster apache-cluster? done

Use the following cmgr command to add nodes to the cluster:

cmgr> modify cluster apache-cluster
Enter commands, you may enter "done" or "cancel" at any time to exit

cluster apache-cluster? add node node1
cluster apache-cluster? done

Setting HA Parameters

The following example shows the sections of a FailSafe 1.2 ha.conf file that are used to set monitoring and timeout values. The parameters you will use when configuring a FailSafe 2.1.x system are pwrfail, hb-probe-time, hb-timeout, and hb-lost-count.

system-configuration
{
      pwrfail = true
      ...
}


Node node1
{
...
heartbeat
{
       hb-private-ipname = 192.0.2.3
       hb-public-ipname = 54.3.252.6
       hb-probe-time = 6
       hb-timeout = 6
       hb-lost-count = 4
}
...
}

As this ha.conf node definition shows, in FailSafe 1.2 you defined the hb-probe-time, hb-timeout, and hb-lost-count parameters to set how often monitoring messages were sent and how long a node could go without responding before a failure was declared. FailSafe 2.1.x monitors the nodes in a cluster differently: each node continuously sends messages to the other nodes in the cluster and, in turn, continuously monitors the messages the other nodes are sending.

Because the two releases use different monitoring methods, there is no one-to-one correspondence between the values in the ha.conf file and the timeout and heartbeat intervals you set as FailSafe HA parameters in FailSafe 2.1.x. However, if you want your system to take approximately the same time to determine that a failure has occurred, you can use the following formula to set your node timeout interval:

node_timeout = (hb-probe-time + hb-timeout) * hb-lost-count

This formula should yield approximately the same total node-to-node communication time before a failure is declared.

All FailSafe 2.1.x timeouts are in milliseconds and can be changed while FailSafe 2.1.x is running. Timeouts can be specified for the cluster or for a specific node in the cluster.
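The formula can be checked with a few lines of Python. This sketch assumes the FailSafe 1.2 heartbeat values are in seconds (the cmgr example in this section sets heartbeat to 6000 for an hb-probe-time of 6) and converts the result to the milliseconds FailSafe 2.1.x expects; node_timeout_ms is a hypothetical helper.

```python
def node_timeout_ms(probe_time_s, timeout_s, lost_count):
    """Approximate a FailSafe 2.1.x node_timeout from FailSafe 1.2 values.

    Applies node_timeout = (hb-probe-time + hb-timeout) * hb-lost-count,
    assuming the ha.conf values are in seconds, and returns milliseconds.
    """
    return (probe_time_s + timeout_s) * lost_count * 1000
```

With the values from the ha.conf excerpt (hb-probe-time 6, hb-timeout 6, hb-lost-count 4), node_timeout_ms(6, 6, 4) returns 48000. The cmgr session in this section sets node_timeout to 24000, a shorter interval, so compare your own value against the formula if you want failure detection to take about as long as it did under FailSafe 1.2.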

There is no long-timeout value in FailSafe 2.1.x. Its equivalent is set with the start, stop, and monitor action timeouts of the resource type, which you can change using the GUI or cmgr.

Use the following cmgr command to modify the HA parameters for node1 in FailSafe 2.1.x:

cmgr> modify ha_parameters on node node1 in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"

node1 ? set node_timeout to 24000
node1 ? set heartbeat to 6000
node1 ? set run_pwrfail to true
node1 ? done

Defining a Resource: XLV Volume

The following example shows a volume definition in the FailSafe 1.2 ha.conf file. The values you will use when configuring the same volume as a volume resource in a FailSafe 2.1.x system are listed after the example.

volume apache-vol
{
   server-node = node1
   backup-node = node2
   devname = apache-vol
   devname-owner = root
   devname-group = sys
   devname-mode = 600
}

In this configuration example, you will use the following values when you define the same volume in FailSafe 2.1.x:

  • Volume name: apache-vol

  • User name of device file owner: root

  • Group name of device file: sys

  • Device file permissions: 600
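The following Python sketch builds the lines you would type at the cmgr prompts for this volume from a volume block that has already been read into a dictionary. volume_cmgr_input is a hypothetical helper. It deliberately ignores server-node and backup-node; the assumption here is that node placement in FailSafe 2.1.x comes from the resource group's failover policy rather than from a volume attribute.

```python
def volume_cmgr_input(vol, cluster):
    """Return the cmgr input lines that define a volume resource from a
    FailSafe 1.2 volume block (hypothetical helper).

    server-node and backup-node are not carried over; which node serves
    the volume is determined by the resource group's failover policy.
    """
    lines = [f"define resource {vol['devname']} of resource_type volume "
             f"in cluster {cluster}"]
    # Device file ownership and permissions map one-to-one.
    for key in ("devname-owner", "devname-group", "devname-mode"):
        lines.append(f"set {key} to {vol[key]}")
    lines.append("done")
    return lines
```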

To create an XLV volume resource, use the following cmgr commands:

cmgr> define resource apache-vol of resource_type volume in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"

resource apache-vol? set devname-owner to root
resource apache-vol? set devname-group to sys
resource apache-vol? set devname-mode to 600
resource apache-vol? done

Defining a Resource: XFS Filesystem

The following example shows an XFS filesystem definition in the FailSafe 1.2 ha.conf file. The values you will use when configuring the same filesystem as a filesystem resource in a FailSafe 2.1.x system are listed after the example.

filesystem apache-fs
{
   mount-point = /apache-fs
   mount-info
   {
      fs-type = xfs
      volume-name = apache-vol
      mode = rw, noauto
   }
}

In this configuration example, you will use the following values when you define the same filesystem in FailSafe 2.1.x:

  • Resource name (mount point): /apache-fs

  • XLV volume: apache-vol

  • Mount options: rw, noauto
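The mapping above can be sketched in Python as follows. The helper name and output labels are hypothetical; the one transformation it performs, dropping the spaces after commas in the mount options, follows the cmgr example in this section, which passes "rw,noauto" where the ha.conf excerpt writes "rw, noauto".

```python
def filesystem_values(fs_block):
    """Map a FailSafe 1.2 filesystem block (as a dict) to FailSafe 2.1.x
    filesystem resource values (hypothetical helper and labels)."""
    info = fs_block["mount-info"]
    # The ha.conf excerpt writes "rw, noauto"; the cmgr example passes
    # "rw,noauto", so strip the whitespace around each option.
    options = ",".join(opt.strip() for opt in info["mode"].split(","))
    return {
        "resource_name": fs_block["mount-point"],  # the mount point names the resource
        "volume-name": info["volume-name"],
        "mount-options": options,
    }
```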

To create a filesystem resource, use the following cmgr commands:

cmgr> define resource /apache-fs of resource_type filesystem in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"

resource /apache-fs? set volume-name to apache-vol
resource /apache-fs? set mount-options to "rw,noauto"
resource /apache-fs? done

Defining a Resource: IP Address

The following example shows an IP address definition in the FailSafe 1.2 ha.conf file. The values you will use when configuring the same IP address as a highly available resource in a FailSafe 2.1.x system are listed after the example.

interface-pair FDDI_1
{
   primary-interface = node1-fxd
   secondary-interface = node2-fxd
   re-mac = false
   netmask = 0xffffff00
   broadcast-addr = 54.3.252.255

   ip-aliases = ( 54.3.252.7 )
}

In this configuration example, you will use the following values when you define the same IP address in FailSafe 2.1.x:

  • Resource name: 54.3.252.7

  • Broadcast address: 54.3.252.255

  • Network mask: 0xffffff00

  • Interface: rns0 (the name of interface node1-fxd in the node definition)
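Before entering these values, you may want to confirm that the netmask and broadcast address agree with the alias address. The following Python sketch uses the standard ipaddress module; check_ip_resource is a hypothetical helper, and it makes no assumption about whether cmgr prefers the hex or dotted mask form (the cmgr example in this section passes the hex form).

```python
import ipaddress


def check_ip_resource(address, netmask_hex, broadcast):
    """Convert an ha.conf hex netmask to dotted form and check that the
    broadcast address matches the address's subnet (hypothetical helper).

    Returns (dotted_netmask, broadcast_matches).
    """
    # int() with base 16 accepts the "0x" prefix used in ha.conf.
    mask = ipaddress.IPv4Address(int(netmask_hex, 16))
    # strict=False lets the host address itself define the network.
    network = ipaddress.IPv4Network(f"{address}/{mask}", strict=False)
    matches = network.broadcast_address == ipaddress.IPv4Address(broadcast)
    return str(mask), matches
```

For the interface pair above, check_ip_resource("54.3.252.7", "0xffffff00", "54.3.252.255") returns ("255.255.255.0", True).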

To create an IP address resource, use the following cmgr commands:

cmgr> define resource 54.3.252.7 of resource_type IP_address in cluster apache-cluster
Enter commands, when finished enter either "done" or "cancel"

resource 54.3.252.7? set interfaces to rns0
resource 54.3.252.7? set NetworkMask to 0xffffff00
resource 54.3.252.7? set BroadcastAddress to 54.3.252.255
resource 54.3.252.7? done

Additional FailSafe 2.1.x Tasks

After you have defined your nodes, clusters, and resources, you define your resource groups, a task that has no equivalent in FailSafe 1.2. When you define a resource group, you specify the resources that will be included in the resource group and the failover policy that determines which node will take over the services of the resource group on failure.

For information on defining resource groups, see “Define a Resource Group with the GUI” in Chapter 5.

After you have configured your system, you can start FailSafe services, as described in “Start FailSafe HA Services” in Chapter 5.

Status

In FailSafe 1.2, you produced a display of the system status with the ha_admin -a command. In FailSafe 2.1.x, you can display the system status in the following ways:

  • You can keep continuous watch on the state of a cluster using the GUI.

  • You can query the status of an individual resource group, node, or cluster using either the GUI or cmgr.

  • You can use the /var/cluster/cmgr-scripts/haStatus script provided with cmgr to see the status of all clusters, nodes, resources, and resource groups in the configuration.

For information on performing these tasks, see “System Status” in Chapter 7.