This chapter provides an overview of the components and operation of the IRIS FailSafe system.
If your IRIS FailSafe system is running the 1.0 release of IRIS FailSafe and you plan to upgrade it to the 1.1 release, you can skip to the last major section of this chapter, “Overview of IRIS FailSafe 1.1 for IRIS FailSafe 1.0 Users,” for information about the new and changed features in 1.1 and how to upgrade the system to 1.1.
In the world of mission critical computing, the availability of information and computing resources is extremely important. The availability of a system is affected by how long it is unavailable after a failure in any of its components. Different degrees of availability are provided by different types of systems:
Fault-tolerant systems (continuous availability). These systems use redundant components and specialized logic to ensure continuous operation and to provide complete data integrity. On these systems the degree of availability is extremely high. Some of these systems can also tolerate outages due to hardware or software upgrades (continuous availability). This solution is very expensive and requires specialized hardware and software.
Highly available systems. These systems survive single points of failure by using redundant off-the-shelf components and specialized software. They provide a lower degree of availability than the fault-tolerant systems, but at much lower cost. Typically these systems provide high availability only for client/server applications, and base their redundancy on cluster architectures with shared resources.
SGI's high availability solution, IRIS FailSafe, is based on a two node cluster. This provides redundancy of processors and I/O controllers. The redundancy of storage is obtained through the use of dual-hosted RAID devices and plexed (mirrored) disks.
If one of the nodes in the cluster or one of the nodes' components fails, the second node restarts the highly available services of the failed node. In the client/server paradigm, the client does not care which of the two nodes in the cluster is providing the service. The clients see only a brief interruption of the service.
Highly available services are monitored by the IRIS FailSafe software. During normal operation, if a failure is detected on any of these components, a failover process is initiated on the surviving node. This process consists of isolating the failed node (to ensure data consistency), doing any recovery required by the failed over services, and quickly restarting the services on the surviving node.
In a high availability system, each node serves as backup for the other node. Unlike the backup resources in a fault-tolerant system, which serve purely as redundant hardware for backup in case of failure, the resources of each node in a high availability system can be used during normal operation.
The Silicon Graphics® IRIS FailSafe product provides a general facility for providing highly available services. These services fall into two groups: highly available resources and highly available applications. Highly available resources are network interfaces, XLV logical volumes, and XFS™ filesystems that have been configured for IRIS FailSafe. Optional IRIS FailSafe products are available for these applications: NFS™, the Netscape™ Communications Server™, the Netscape Commerce Server, Sybase, INFORMIX, and Oracle.
The Silicon Graphics IRIS FailSafe system consists of two CHALLENGE, POWER CHALLENGE, or Onyx servers that provide highly available services. The servers need not be the same model; for example, an IRIS FailSafe cluster can consist of a CHALLENGE L server and a CHALLENGE S server. Disks are shared by physically attaching them to both nodes in the system. The two servers (called nodes throughout this guide) and disks, along with IRIS FailSafe software, form an IRIS FailSafe cluster.
While running highly available services, the nodes can run other applications that are not highly available. All highly available services are owned and accessed by one node at a time.
The IRIS FailSafe system supports fast failover: if a highly available service—interface, disk, application, or the node itself—fails, the service quickly (although not instantaneously) resumes because the second node in the system shuts down the failed node and takes over all of its services. To clients, the services on the second node are indistinguishable from the original services before failure occurred. It appears as if the original node has crashed and rebooted quickly. The clients notice a brief interruption in the highly available service.
Two configurations are possible:
All highly available services run on one node. The other node is the backup node. After failover, the services run on the backup node. In this case, the backup node is a hot standby for failover purposes only. The backup node can run other applications that are not highly available.
Highly available services run concurrently on both nodes. For each service the other node serves as a backup node. For example, both nodes can be exporting different NFS filesystems. If a failover occurs, one node then exports all of the NFS filesystems.
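The two configurations differ only in how services are assigned to primary and backup nodes. The following is a minimal Python sketch of that idea, not part of the IRIS FailSafe software; the `Service` class, the node names, and the service labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A highly available service with its primary and backup node (hypothetical model)."""
    name: str
    primary: str
    backup: str


def owners_after_failover(services, failed_node):
    """Return {service name: node running it} once failed_node is out of service."""
    owners = {}
    for svc in services:
        # A service moves only if its primary node is the one that failed.
        owners[svc.name] = svc.backup if svc.primary == failed_node else svc.primary
    return owners


# Second configuration: each node exports a different NFS filesystem.
dual_active = [
    Service("nfs:/export/a", primary="node1", backup="node2"),
    Service("nfs:/export/b", primary="node2", backup="node1"),
]
owners = owners_after_failover(dual_active, failed_node="node1")
```

In the first configuration every service would list the same primary node, so a failure of the backup node leaves all owners unchanged.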
The base software for the IRIS FailSafe system consists of IRIX™ 5.3 with XFS or IRIX 6.2, IRIX patches, IRIS FailSafe software, FDDI software (if FDDI networking is used), and software for the optional CHALLENGE RAID storage system.
There are IRIS FailSafe software options for some highly available applications. Optional software includes:
IRIS FailSafe NFS
IRIS FailSafe Web (for Netscape servers)
IRIS FailSafe INFORMIX
IRIS FailSafe Sybase
IRIS FailSafe Oracle
IRIS FailSafe enhances the Silicon Graphics Oracle Parallel Server™ (OPS™) by providing IP failover in an OPS hardware configuration. However, the two products are not merged administratively, so different tools are required to maintain a combined system.
IRIS FailSafe provides a framework for making applications into highly available resources. If you want to add highly available applications on an IRIS FailSafe cluster, you must write scripts to handle monitoring and failover functions. In addition, you must add the new highly available applications to the IRIS FailSafe configuration file /var/ha/ha.conf to register the application and scripts with the IRIS FailSafe software. Developing these scripts and making additions to the configuration file are described in the IRIS FailSafe Programmer's Guide.
Figure 1-1 shows the IRIS FailSafe hardware components.
The hardware components of the IRIS FailSafe system are as follows:
two CHALLENGE nodes—S, DM, L, XL, POWER CHALLENGE, or Onyx—in any combination
one or more interfaces on each node to one or more public networks (Ethernet or FDDI)
These public interfaces attached to each node connect the node to one or more public networks, which link the cluster to clients. Each public interface has an IP address called a fixed IP address that doesn't move to the other node in the cluster during failover. Each public interface can have additional IP addresses, called high availability IP addresses, that are transferred to an interface on the surviving node in case of failover.
a serial line from a serial port on each node to the other's Remote System Control port (to the Silicon Graphics remote power control unit on CHALLENGE S nodes)
A surviving node uses this line to reboot the failed node during takeover. This procedure ensures that the failed node is not using the shared disks when the surviving node takes them over.
one interface on each node for the private network (Ethernet or FDDI)
One Ethernet or FDDI interface on each node is required for the private heartbeat connection, by which each node monitors the state of the other node. The IRIS FailSafe software also uses this connection to pass control messages between nodes. These interfaces, called private interfaces, have distinct IP addresses that are kept private for security reasons.
disk storage and SCSI bus shared by the nodes in the cluster
The nodes in the IRIS FailSafe system share dual-hosted disk storage over a shared fast and wide SCSI bus. The bus is shared so that either node can take over the disks in case of failure. The hardware required for the disk storage is either:
a CHALLENGE Vault peripheral enclosure with SCSI disks
CHALLENGE RAID deskside or rackmount storage system; each chassis assembly has two storage-control processors (SPs) and at least five disk modules with caching enabled
Note: The IRIS FailSafe system is designed to survive a single point of failure. Therefore, when a system component fails, it must be restarted, repaired, or replaced as soon as possible to avoid the possibility of two or more failed components.
IRIS FailSafe software includes a set of processes running on each node and communicating with each other. These processes use scripts for failover and recovery operations and for monitoring the highly available services on each node. These processes and scripts read IRIS FailSafe configuration information from a configuration file, /var/ha/ha.conf.
The IRIS FailSafe daemons and scripts are shown in Figure 1-2. For each daemon or set of scripts, the diagram shows other daemons and scripts it communicates with and the communication path it uses.
These IRIS FailSafe daemons and scripts are described in the following subsections.
The heartbeat daemon ha_hbeat runs in each node and is the first IRIS FailSafe process to start. It is polled by the application monitor on the other node. These heartbeat messages enable each node to determine the liveliness of the other node. Heartbeat messages are passed on the private network. If there is a failure of the private network, heartbeat messages can be passed on the public network.
The node controller process ha_nc determines each node's current state. The node states are described in the section “Node States” in this chapter.
The node controllers in the cluster pass messages to each other over the private network. If there is a failure of the private network, the node controllers don't use the public network.
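Heartbeat messages and node controller messages therefore behave differently when the private network fails. A minimal Python sketch of that rule follows; the function name and the `"heartbeat"` and `"control"` labels are invented for this illustration and do not correspond to any IRIS FailSafe interface.

```python
def message_path(kind, private_ok, public_ok=True):
    """Pick the network for a message, or None if it cannot be sent.

    kind is "heartbeat" or "control" (labels invented for this sketch).
    """
    if kind not in ("heartbeat", "control"):
        raise ValueError(f"unknown message kind: {kind!r}")
    if private_ok:
        return "private"
    # Only heartbeat messages may fall back to the public network;
    # node controller messages are never sent there.
    if kind == "heartbeat" and public_ok:
        return "public"
    return None
```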
On each node, the application monitor process ha_appmon monitors all services on both nodes and reports any failures to the node controller.
The application monitor polls the heartbeat daemon on the other node to determine its liveliness. It also executes the failover scripts during state transitions.
Because the application monitor ha_appmon is a multi-threaded process, you may see several instances of ha_appmon running on a node simultaneously when you look at the output of the ps command.
The kill daemon ha_killd on each node monitors the serial connection to the other node and provides the power-cycling capability.
The interface agent monitors all local interfaces to determine if they are still functioning.
The interface agent uses the number of input packets as the criterion for determining whether a network interface is working. If the interface agent finds that the number of input packets on an interface is not increasing, it injects packets into the public network. This prevents false failovers in networks that do not have any I/O activity.
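The input-packet check can be sketched in Python as a decision over two successive counter samples. This is an illustration only, not the interface agent's actual logic: the function name and return labels are hypothetical, and the rule that a failure is reported when the counter stays flat after a probe is an assumption that the text does not spell out.

```python
def interface_action(prev_count, curr_count, probe_already_sent):
    """Decide what to do for one interface from two input-packet samples."""
    if curr_count > prev_count:
        # Traffic is arriving, so the interface is considered healthy.
        return "ok"
    if not probe_already_sent:
        # A quiet network is not a failure: generate traffic and sample again.
        return "inject_probe"
    # Assumption: still no input after probing means the interface has failed.
    return "report_failure"
```

Probing before reporting is what keeps an idle but healthy network from triggering a failover.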
This section discusses the highly available resources that are provided on an IRIS FailSafe system.
If a node crashes or hangs (for example, due to a parity error or bus error), it will not respond to the heartbeat message sent by the application monitor on the other node. The other (good) node takes over the failed node's services after resetting the failed node.
If a node fails, the interfaces, access to storage, and services also become unavailable. See the succeeding sections for descriptions of how the IRIS FailSafe system handles or eliminates these points of failure.
Clients access the highly available services provided by the IRIS FailSafe cluster using IP addresses. Each highly available service can use multiple IP addresses. The IP addresses are not tied to a particular highly available service; they can be shared by all the highly available services in the cluster.
IRIS FailSafe uses the IP aliasing mechanism to support multiple IP addresses on a single network interface. Clients can use a highly available service using multiple IP addresses even when there is only one network interface in the server node.
The IP aliasing mechanism allows an IRIS FailSafe configuration that has a node with multiple network interfaces backed up by a node with a single network interface. IP addresses configured on multiple network interfaces are moved to the single interface on the other node in case of a failure.
IRIS FailSafe requires that each network interface in a cluster have an IP address that does not fail over. These IP addresses, called fixed IP addresses, are used to monitor network interfaces. Each fixed IP address must be configured on a network interface at system boot time. All other IP addresses in the cluster are configured as high availability IP addresses. They are included in the IRIS FailSafe configuration file /var/ha/ha.conf.
High availability IP addresses are configured on a network interface and moved to another network interface in the other node by IRIS FailSafe during a failover or recovery process. IRIS FailSafe uses the ifconfig command to configure an IP address on a network interface and to move IP addresses from one interface to another.
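The distinction between fixed and high availability IP addresses can be illustrated with a small Python model of a failover. This is a sketch under stated assumptions, not the IRIS FailSafe implementation: the data layout, the interface names `if0` and `if1`, and the addresses (taken from the documentation range 192.0.2.0/24) are all hypothetical, and the real system performs the move with ifconfig.

```python
import copy


def move_ha_addresses(cluster, failed_node, target):
    """Return a new cluster map with the failed node's HA addresses on target.

    cluster maps (node, interface) -> {"fixed": str, "ha": [str, ...]}.
    target is the (node, interface) key on the surviving node.
    """
    moved = copy.deepcopy(cluster)
    for (node, ifname), addrs in cluster.items():
        if node == failed_node:
            # HA addresses become aliases on the surviving interface;
            # the fixed address stays where it was configured at boot.
            moved[target]["ha"].extend(addrs["ha"])
            moved[(node, ifname)]["ha"] = []
    return moved


cluster = {
    ("node1", "if0"): {"fixed": "192.0.2.1", "ha": ["192.0.2.11"]},
    ("node1", "if1"): {"fixed": "192.0.2.3", "ha": ["192.0.2.13"]},
    ("node2", "if0"): {"fixed": "192.0.2.2", "ha": ["192.0.2.12"]},
}
after_failover = move_ha_addresses(cluster, "node1", ("node2", "if0"))
```

The example also shows the case where a node with two interfaces is backed up by a node with one: both high availability addresses end up as aliases on the single surviving interface.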
In some networking implementations, it is not sufficient to move IP addresses from one interface to another using the ifconfig command. IRIS FailSafe uses MAC address impersonation (also called re-mac'ing) to support these networking implementations. MAC address impersonation moves the physical network address of a network interface to another interface. If an interface is configured for MAC address impersonation in the IRIS FailSafe configuration file, IRIS FailSafe moves the IP address using the ifconfig command and moves the physical address using the macconfig command.
Since the physical address of an interface is moved in MAC address impersonation, each network interface has to be backed up by a dedicated backup interface (each backup interface backs up only one network interface) if MAC address impersonation is required. CISCO routers and PC/NFS clients require MAC address impersonation.
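The dedicated-backup rule can be checked mechanically when planning a configuration. The Python sketch below is a planning aid only; the function name and the `node:interface` labels are hypothetical and are not part of the IRIS FailSafe configuration file format.

```python
from collections import Counter


def shared_backups(backup_of):
    """Return backup interfaces assigned to more than one re-MAC'ed interface.

    backup_of maps each interface that needs MAC address impersonation
    to the interface that backs it up.
    """
    counts = Counter(backup_of.values())
    return sorted(b for b, n in counts.items() if n > 1)


# Invalid plan: two interfaces share one backup interface.
bad_plan = {"node1:if0": "node2:if0", "node1:if1": "node2:if0"}
# Valid plan: each interface has its own dedicated backup.
good_plan = {"node1:if0": "node2:if0", "node1:if1": "node2:if1"}
```

An empty result means every interface that requires MAC address impersonation has its own backup interface.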
See the section “Planning Network Interface and IP Address Configuration” in Chapter 2 for more information about determining if MAC address impersonation is required.
The IRIS FailSafe system includes shared SCSI-based storage in the form of one or more CHALLENGE RAID storage systems or CHALLENGE Vaults with plexed disks. All data for highly available applications must be stored in XLV logical volumes on shared disks. If highly available applications use filesystems, XFS filesystems must be used.
For CHALLENGE RAID storage systems, if a disk or disk controller fails, the RAID storage system is equipped to keep services available through its own capabilities.
With plexed XLV logical volumes on the disks in a CHALLENGE Vault, the XLV system provides redundancy. No participation of the IRIS FailSafe system software is required for a disk failure. If a disk controller fails, the IRIS FailSafe system software initiates the failover process.
Figure 1-3 shows disk storage takeover. The surviving node takes over the shared disks and recovers the logical volumes and filesystems on the disks. This process is expedited by the XFS filesystem, which supports fast recovery because it uses journaling technology that does not require the use of the fsck command for filesystem consistency checking.
Each application has a primary node and a backup node. The primary node is the node on which the application runs when IRIS FailSafe is in normal state. When the IRIS FailSafe software detects a failure of any highly available resource or highly available application, all highly available resources on the failed node are failed over to the other node and the highly available applications on the failed node are stopped. When these operations are complete, the highly available applications are started on the backup node.
All information about highly available applications, including the primary node and backup node for the application and monitoring scripts, is specified in the IRIS FailSafe configuration file. Monitoring scripts detect the failure of a highly available application.
IRIS FailSafe option products provide monitoring scripts and failover scripts that make NFS, Web, Oracle, INFORMIX, and Sybase applications highly available.
The IRIS FailSafe software provides a framework for making applications highly available. By writing scripts and modifying the IRIS FailSafe configuration file, you can turn client/server applications into highly available applications. For information, see the IRIS FailSafe Programmer's Guide.
When a failure is detected on one node (the node has crashed, hung, or been shut down, or a highly available service is no longer operating), the other node performs a failover of the highly available services that are being provided on the node with the failure (called the failed node). Failover makes all of the highly available services, previously provided by both nodes in a cluster, available on the surviving node in the cluster. This is called degraded state. (Node states are more fully described in the next section, “Node States.”)
A failure in a highly available service can be detected by IRIS FailSafe processes running on either node. The sequence of actions that follows the failure depends on which node detects it.
If the failure is detected by the IRIS FailSafe software running on the same node, the failed node performs these operations:
Stops all highly available applications running on the node
Moves all highly available resources (IP addresses and shared disks) to the other node
Sends a message to the other node (surviving node) to start providing all highly available resources and applications previously provided by the failed node
Moves to a state called standby state
When it receives the message, the surviving node performs these operations:
Transfers ownership of all the highly available resources from the failed node to itself
Starts offering the highly available resources of the failed node and the applications that were running on the failed node
Moves to degraded state
If the failure is detected by FailSafe software running on the other node, the node detecting the failure (the surviving node) performs these operations:
Using the serial connection between the nodes, reboots the failed node to prevent corruption of data
Transfers ownership of all the highly available resources from the failed node to itself
Starts offering the highly available resources of the failed node and the applications that were running on the failed node
Moves to degraded state
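The two failover sequences can be summarized side by side as data. The Python sketch below only restates the steps listed in this section; the step strings and names are paraphrases made up for this illustration and are not IRIS FailSafe script or command names.

```python
# Steps when the failure is detected on the failed node itself.
LOCAL_DETECTION = {
    "failed node": [
        "stop highly available applications",
        "move highly available resources to the other node",
        "notify the surviving node",
        "enter standby state",
    ],
    "surviving node": [
        "take ownership of highly available resources",
        "start resources and applications of the failed node",
        "enter degraded state",
    ],
}

# Steps when the failure is detected by the other node.
REMOTE_DETECTION = {
    "failed node": [],  # it takes no orderly action; the survivor reboots it
    "surviving node": [
        "reboot failed node over the serial connection",
        "take ownership of highly available resources",
        "start resources and applications of the failed node",
        "enter degraded state",
    ],
}


def failover_plan(detected_on_failed_node):
    """Return the ordered steps for each node for the given detection case."""
    return LOCAL_DETECTION if detected_on_failed_node else REMOTE_DETECTION
```

In both cases the surviving node ends in degraded state; the difference is whether the failed node hands over its resources in an orderly way or is reset first to protect the shared disks.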
When a failed node is coming back up (called a recovering node), it determines if the other node is running. There are three possible scenarios:
If IRIS FailSafe is not running on the other node or if the private interfaces or private network are not functioning, the recovering node does not begin providing highly available services. It goes into standby state.
If IRIS FailSafe is running on the other node and controlled failback is configured on for the node, the recovering node doesn't begin providing highly available services; it goes to a state called controlled failback state.
If IRIS FailSafe is running on the other node and controlled failback is configured off for the node, the surviving node shuts down the highly available services for which it is the backup node, and the recovering node begins providing the highly available services for which it is the primary node.
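The recovery scenarios reduce to a small decision function. The Python sketch below is illustrative only; the function and parameter names are hypothetical, and it assumes that the third scenario is the case in which controlled failback is off and that a node providing its own services is in normal state.

```python
def recovering_node_state(peer_running, private_network_ok, controlled_failback):
    """Return the state a recovering node moves to (sketch of the three scenarios)."""
    if not peer_running or not private_network_ok:
        # The other node cannot be reached, so do not offer services.
        return "standby"
    if controlled_failback:
        # Monitor the other node but wait for an administrator to fail back.
        return "controlled failback"
    # Assumption: with controlled failback off, the node takes back its services.
    return "normal"
```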
Normally, a node that experiences a failure automatically reboots and resumes providing highly available services. This scenario works well for transient errors (as well as for planned outages for equipment and software upgrades). However, if there are persistent errors, automatic reboot can cause recovery and an immediate failover again. To prevent this, the IRIS FailSafe software checks how long the rebooted node has been up since the last time it was started. If the interval is less than five minutes (by default), the IRIS FailSafe software automatically does a chkconfig failsafe off on the failed node and does not start up the IRIS FailSafe software on this node. It also writes error messages to the console and /var/adm/SYSLOG.
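The protection against repeated reboot and failover cycles is a threshold test on the time since the node was last started. A minimal Python sketch follows; the names are hypothetical, and only the five-minute default comes from the text.

```python
MIN_UPTIME_SECONDS = 5 * 60  # default interval described in the text


def should_start_failsafe(seconds_since_last_start, threshold=MIN_UPTIME_SECONDS):
    """Return False when the node rebooted too soon after its previous start.

    In that case the text says the software turns itself off with
    chkconfig and logs an error instead of starting.
    """
    return seconds_since_last_start >= threshold
```

For example, a node that has rebooted 90 seconds after its previous start is treated as having a persistent error and stays out of the cluster until an administrator intervenes.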
Each node that is running IRIS FailSafe software is in one of the six states described in Table 1-1.
| Node State | Definition |
|---|---|
| joining | The node is coming up and joining the cluster. The node should never remain in this state for more than two or three minutes. |
| normal | The node is actively providing its own highly available services. |
| degraded | The node is providing all highly available services for the cluster; the other node is unavailable. |
| standby | This node has stopped monitoring the other node in the cluster and is no longer providing highly available services because a local failure has been detected or an administrative command has moved the node to this state. Also, if a node cannot move to normal state during the joining phase, it moves to this state. |
| controlled failback | This node is no longer providing highly available services, but it is monitoring the other node in the cluster and the services it is providing. |
| error | An unrecoverable failure has occurred. |
Figure 1-4 diagrams the node states and the events that govern them.
Table 1-2 shows the possible combinations of states for two nodes. When the state is listed as (none), it means that IRIS FailSafe software is not running or the node is shut down.
Table 1-2. Possible Combinations of Node States
| State of One Node | State of the Other Node | Situation |
|---|---|---|
| joining | joining | These are transient state combinations that occur immediately after one or both nodes have been rebooted. |
| normal | normal | Both nodes are operating normally, providing the highly available services for which they are the primary node. |
| degraded | standby | The node in degraded state is providing all highly available services. The node in standby state is not providing any highly available services and is not performing monitoring. |
| degraded | controlled failback | The node in degraded state is providing all highly available services. The node in controlled failback state is not providing any highly available services, but is performing monitoring. |
| degraded | (none) | The node in degraded state is providing all highly available services while the other node isn't running IRIS FailSafe software or is shut down. |
| standby | (none) | The node in standby state is running IRIS FailSafe software, but is not providing any highly available services. The other node isn't running IRIS FailSafe software or is shut down. |
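The combinations in Table 1-2 can be encoded as a lookup, which is convenient when checking a pair of observed node states. The Python sketch below covers only the rows listed in the table; the names are hypothetical and the check is not part of the IRIS FailSafe software.

```python
NONE = "(none)"  # IRIS FailSafe software not running, or node shut down

# Each pair is stored in sorted order so that lookup ignores node order.
LISTED = {
    tuple(sorted(pair))
    for pair in [
        ("joining", "joining"),
        ("normal", "normal"),
        ("degraded", "standby"),
        ("degraded", "controlled failback"),
        ("degraded", NONE),
        ("standby", NONE),
    ]
}


def is_listed_combination(state_a, state_b):
    """Return True if the two node states appear together in Table 1-2."""
    return tuple(sorted((state_a, state_b))) in LISTED
```

A pair such as normal and degraded is not listed, because a node in degraded state is already providing all of the cluster's highly available services.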
After the IRIS FailSafe cluster hardware has been installed, follow this general procedure to configure and test the IRIS FailSafe system:
Become familiar with IRIS FailSafe terms by reviewing this chapter and the “Glossary” at the end of this guide.
Plan the configuration of highly available applications and services on the cluster using Chapter 2, “Planning IRIS FailSafe Configuration.”
Perform various administrative tasks, including the installation of prerequisite software, that are required by IRIS FailSafe. The instructions are in Chapter 3, “Configuring Nodes for IRIS FailSafe.”
Prepare the IRIS FailSafe configuration file as explained in Chapter 4, “Creating the IRIS FailSafe Configuration File.”
Test the IRIS FailSafe system in three phases: test individual components prior to starting IRIS FailSafe software, test normal operation of the IRIS FailSafe system, and simulate failures to test the operation of the system after a failure occurs. The instructions are in Chapter 5, “Testing IRIS FailSafe Configuration.”
The subsections below explain the differences between Release 1.0 of IRIS FailSafe and Release 1.1 of IRIS FailSafe. The final subsection describes the procedure for upgrading an IRIS FailSafe system from 1.0 to 1.1.
IRIS FailSafe 1.1 uses a substantially different model for IP address failover. The changes and new features are as follows:
IRIS FailSafe 1.1 supports the failover of multiple network interfaces in a node, not just a designated primary interface.
There are no designated primary and secondary network interfaces or primary and secondary IP addresses in a node.
Any network interface in a node can act as primary interface and as a secondary interface for an IP address (if MAC address impersonation is not required).
A dual-active cluster can be configured even if each node has just one public network interface. (If MAC address impersonation is required, two public interfaces are required on each node.)
IRIS FailSafe 1.1 can fail over multiple IP addresses configured on a single network interface using the IP aliasing mechanism. These IP addresses are called high availability IP addresses.
The IP addresses configured to several network interfaces on one node can be failed over to a single network interface on the other node.
Each network interface in a cluster has an IP address that is configured at boot time and is not failed over. These IP addresses are called fixed IP addresses.
The IRIS FailSafe 1.0 restriction that the IP name used for the private network between the nodes be the hostname of the node has been removed.
IRIS FailSafe 1.1 has an optional new feature called controlled failback that gives administrators more control over the behavior of an IRIS FailSafe cluster after a failover.
In IRIS FailSafe 1.0, when a remote monitor failure or a heartbeat failure is detected, the node with the failure is rebooted and the other node takes over all the highly available services. The failed node boots up and takes back its services. Thus, the services provided by the node with the failure have moved twice, and the clients using the services see two interruptions in the service.
In IRIS FailSafe 1.1, if the controlled failback feature is enabled for the node, the second interruption in service can be prevented. The rebooted node comes up and does not take back its services. It is now in a state called controlled failback. It monitors the highly available services running in the other node, providing a backup for the highly available services. Using an administrative command, the node in the controlled failback state can be moved to normal state and thus resume providing the highly available services for which it is the primary node.
A new process called an interface agent (/usr/etc/ha_ifa) has been added to IRIS FailSafe 1.1. It monitors all the network interfaces that have high availability IP addresses in the cluster. The interface agent makes the monitoring of the highly available services on the other node in the cluster unnecessary. So, remote monitoring scripts are no longer used in general. (An exception is IRIS FailSafe 1.1 systems using IRIS FailSafe 1.0 configuration files and the IRIS FailSafe NFS or IRIS FailSafe Web options. They continue to use remote monitoring scripts.)
New IRIS FailSafe option products support highly available Oracle, INFORMIX and Sybase database servers. All the database servers are monitored using database agents (/usr/etc/ha_sybs for Sybase databases, /usr/etc/ha_ifmx for INFORMIX databases, and /usr/etc/ha_orcl for Oracle databases).
IRIS FailSafe 1.1 uses the new process /usr/etc/ha_hbeat to receive heartbeat messages sent by the application monitor on the other node. In IRIS FailSafe 1.0, heartbeat messages were sent and received by the application monitor (/usr/etc/ha_appmon).
In IRIS FailSafe 1.1, all configurations can use the public network to send and receive heartbeat messages in case the private network fails. This feature was available only in dual-active configurations in IRIS FailSafe 1.0.
The format of the IRIS FailSafe configuration file, the file that contains configuration and monitoring frequency information, has changed in IRIS FailSafe 1.1. The blocks required in the configuration file and their contents are different.
IRIS FailSafe 1.1 can work with IRIS FailSafe 1.0 configuration files. However, using IRIS FailSafe 1.1 with a 1.0 configuration file provides only bug fixes and an improvement in heartbeat failure detection time. It does not provide the new features of IRIS FailSafe 1.1. To use the new features of IRIS FailSafe 1.1, you must replace the 1.0 configuration file with one in the IRIS FailSafe 1.1 format. You cannot simply add the new 1.1 configuration file blocks to a 1.0 configuration file. Upgrading 1.0 configuration files to the IRIS FailSafe 1.1 format is recommended. Chapter 4, “Creating the IRIS FailSafe Configuration File,” explains how to create a 1.1 configuration file.
Follow this general procedure when you are upgrading an IRIS FailSafe cluster from Release 1.0 to Release 1.1:
Review the IRIS FailSafe concepts and the information about differences between IRIS FailSafe 1.0 and 1.1 presented in this chapter.
Review Chapter 2, “Planning IRIS FailSafe Configuration,” and perform any planning activities required, such as interface configuration, because of the changes from IRIS FailSafe 1.0 to 1.1.
Review Chapter 3, “Configuring Nodes for IRIS FailSafe,” and perform any necessary tasks, such as installing the IRIS FailSafe 1.1 software.
If you decide to switch to a Release 1.1 configuration file, prepare one as described in Chapter 4, “Creating the IRIS FailSafe Configuration File.”
Perform the upgrade procedure in Chapter 7, “Upgrading an IRIS FailSafe Cluster,” to start IRIS FailSafe 1.1 on both nodes in the cluster.