This chapter provides an overview of the cluster configuration tools and the basic configuration process.
SGI Cluster Manager supports two tools to configure the cluster: a graphical user interface (GUI) and a command-line interface (CLI), sgicm-config-cluster-cmd.
At any given time, you must use only one of these tools to perform configuration tasks. The GUI and the CLI supply similar functionality, although there are a few exceptions.
Note: If you are going to access the GUI remotely, SGI recommends that you use a virtual X server method such as Virtual Network Computing (VNC) for better performance. For more information, see the following RealVNC website:
http://www.realvnc.com/download.html
The GUI displays the current status of the cluster. To display more details about an item, select the item and click Properties. Figure 4-1 shows an example of the GUI.
In the CLI, enter an argument with =value to assign a value or alone (without a =value) to display the current setting. For example, the following displays the name of the cluster and the number of times the configuration has been changed:
# sgicm-config-cluster-cmd --cluster
cluster:
    name = SGI High Availability cluster
    config_viewnumber = 14
This section discusses the configuration steps:
The names of the device files for filesystems to store quorum information must be the same on all cluster members. You must do the following:
Ensure that the members have their disks attached identically.
Create two unlabeled volumes of at least 10 MB in size on different physical devices. For example, you could use TPSSM on an SGI TP9500 RAID. For more information about TPSSM, see the SGI TPSSM Administration Guide.
Run the parted(8) command to create two partitions of at least 10 MB in size with a 0x83 device type on the chosen volume. For more information, see the parted(8) man page.
Create symbolic links (symlinks) so that the names of shared quorum partitions are the same on all cluster members. You must re-create the symlinks every time the machine reboots because /dev files are re-created. Therefore, you should modify the /usr/lib/clumanager/create_device_links script to add the device symlinks that are required.
For example:
#!/bin/sh
#
# Create device links for shared quorum partitions if it does not exist
# if [ ! -h <device link> ]; then
#     ln -s /dev/.... /dev/shared ....
# fi
#
#
# Create device links for shared disks if the disks are not in the
# same I/O slot in all cluster members if it does not exist
#
if [ ! -h /dev/shared1 ]; then
    ln -s /dev/xscsi/pci02.02.0/node20000050cc00857a/port1/lun0/part1 /dev/shared1
fi
if [ ! -h /dev/shared2 ]; then
    ln -s /dev/xscsi/pci02.02.1/node20000050cc00857a/port4/lun1/part2 /dev/shared2
fi
SGI recommends that the two shared quorum partitions be on different Fibre Channel controllers; ideally, they should be on separate Fibre Channel controllers at the front end, on separate HBAs on the Altix, and on separate RAID logical units (LUNs) or RAID arrays if possible. They should be at least 10 MB in size, and the partition type must be Linux. For more information, see Appendix B, “Setting the Partition Type to Linux”.
You can use the lspci command to see the Fibre Channel cards and you can use the hwinfo --disk command to see information about the shared disks. For example:
# lspci
...
0000:05:02.0 Fibre Channel: QLogic Corp. QLA2300 64-bit Fibre Channel Adapter (rev 01)
...
# hwinfo --disk --short
disk:
  /dev/sda  SGI ST373307LC
  /dev/sdb  SGI TP9100 FFX2
  /dev/sdc  SGI TP9100 FFX2
  /dev/sdd  SGI TP9100 FFX2
  /dev/sde  SGI TP9100 FFX2
  /dev/sdf  SGI TP9100 FFX2
  /dev/sdg  SGI TP9100 FFX2
  /dev/sdh  SGI TP9100 FFX2
For more information, see the hwinfo(8) man page.
Partition 1 from the disk at Fibre Channel target 2 and partition 1 from the disk at Fibre Channel target 3 will be used for storing shared state. To ensure that the devices have the same name on both cluster members, you could create symlinks to the block device files:
# ln -s /dev/sdb1 /dev/shared1
# ln -s /dev/sdc1 /dev/shared2
You should add these symlink commands to the /usr/lib/clumanager/create_device_links script on the cluster member. For more information, see the ln(1) man page.
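The test-then-link pattern used in create_device_links can be factored into a small helper so that each link is created only when it is missing. The following is a minimal sketch, not the script shipped with SGI Cluster Manager; the function name ensure_link is hypothetical, and the paths in the usage comment are the example devices from above.

```shell
#!/bin/sh
# ensure_link TARGET LINK
# Create the symlink LINK pointing at TARGET unless LINK already exists
# as a symlink. (Hypothetical helper; not part of SGI Cluster Manager.)
ensure_link() {
    if [ ! -h "$2" ]; then
        ln -s "$1" "$2"
    fi
}

# Example usage, with the block devices from the example above:
#   ensure_link /dev/sdb1 /dev/shared1
#   ensure_link /dev/sdc1 /dev/shared2
```

Because the helper does nothing when the link already exists, it is safe to run on every boot.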
The shared quorum partitions will then be referred to as /dev/shared1 and /dev/shared2. These partitions will be the primary and shadow, respectively, in the following examples.
To define the shared quorum partitions in the GUI, select the following in the Cluster Configuration window:
Cluster -> Shared State
In the CLI:
sgicm-config-cluster-cmd --sharedstate \
--type=raw \
--rawprimary=path1 \
--rawshadow=path2
For example:
# sgicm-config-cluster-cmd --sharedstate --type=raw \
    --rawprimary=/dev/shared1 --rawshadow=/dev/shared2
You should perform this step before defining the cluster (“Step 2: Create the Cluster”).
To create the cluster in the GUI, type the cluster name in the Cluster Name field in the Cluster Configuration window. The default cluster name is SGI High Availability cluster.
In the CLI:
sgicm-config-cluster-cmd --cluster --name "clustername"
To define a member in the GUI, do the following in the cluster configuration window:
Click the Members tab.
Click New.
Enter the hostname of the new member. SGI recommends that member hostnames and addresses be present in /etc/hosts so that communication between cluster daemons does not rely on DNS or NIS being available.
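Before adding members, it can be useful to confirm that each member hostname really is listed in /etc/hosts. The following is a rough sketch of such a check; the function name host_in_file is hypothetical, and member1 and member2 in the usage comment are placeholder hostnames.

```shell
# host_in_file NAME [FILE]
# Succeed if NAME appears as a hostname or alias on a non-comment line
# of FILE (default: /etc/hosts). Hypothetical helper for illustration.
host_in_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Example usage:
#   for m in member1 member2; do
#       host_in_file "$m" || echo "$m is missing from /etc/hosts" >&2
#   done
```

The check looks only at the file itself, so it does not depend on DNS or NIS being available.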
In the CLI:
sgicm-config-cluster-cmd --add_member --name=membername
For each member, you must provide information about its power controller. The SGI Cluster Manager supports the L2 system controller using either serial cables or Ethernet cables.
Note: The Serial and Network power controllers shown in the GUI refer to third-party products that are not supported by SGI Cluster Manager. (Although the other machine in the cluster will appear in the Owner field automatically, this value is not used.) Do not confuse network-based power controllers with the l2network L2 Ethernet connection.
In the GUI Cluster Configuration window, select the member and click Add Child. The fields for SGI controllers are as follows:
Type: the power controller type of the member being defined (the local member):
l2 for using L2 serial cables.
l2network for using the L2 Ethernet connection. (This is the default in the GUI.)
Note: If you have a system with an emulated L2 controller (such as an Altix 3700 Bx2), or if you run CXFS with SGI Cluster Manager, you must use the l2network connection type. See “l2network Ethernet Connection” in Chapter 2.
This is the type field in the CLI, where there is no default value.
Peer's TTY device file name: the tty device filename on the peer member to which the local system controller is connected.
This is the device field in the CLI.
Altix partition: the local member's system partition ID. If there are no partitions, partition ID is 0. The default value in the GUI is 0.
This is the partition field in the CLI.
Figure 4-2 shows an example for an L2 using the Ethernet network and Figure 4-3 shows an example in the GUI for an L2 using serial cables.
Note: You can use a hostname in place of the IP address when configuring l2network reset provided that name resolution is in place; that is, the name is in /etc/hosts on all of the servers or is otherwise available via gethostbyname(3).
In the CLI:
sgicm-config-cluster-cmd --member=membername \
--add_powercontroller \
--type=l2|l2network \
required only for l2network:
--ipaddress=L2_IPaddress_or_hostname \
--password=L2_password_(if_defined) \
--partition=Altix_partition_ID
required only for l2:
--device=/dev/ttyIOCx \
--partition=n
You can optionally set a password for the L2 to prevent unauthorized access to L2 functions via Ethernet. If you choose to use this security feature, SGI Cluster Manager must know the password in order to access L2 functionality. For more information, see SGI L1 and L2 Controller Software User's Guide.
For example, the following defines an L2 using the Ethernet method (therefore there is no --device argument):
# sgicm-config-cluster-cmd --member=member1 --add_powercontroller \
    --type=l2network --ipaddress=192.168.0.100 --partition=3 --password=foo
For example, the following defines an L2 using the serial cable method:
# sgicm-config-cluster-cmd --member=member1 --add_powercontroller \
    --type=l2 --device=/dev/ttyIOC0
For hardware information, see “Power Control” in Chapter 2.
You can modify the time it takes to detect a member failure, known as the failover speed.
Note: The default failover speed differs depending upon which tool (GUI or CLI) you use to define the cluster. You cannot change the value for failover speed while the cluster daemons are running.
In the GUI, you can supply the failover speed directly:
In the Cluster Configuration window, select the following:
Cluster -> Daemon properties
Select the clumembd tab
Use the sliding bar to adjust failover speed, as shown in Figure 4-4. The GUI provides 15 seconds as the default failover speed value.
You can choose to enable either broadcast heartbeating or multicast heartbeating.
The clumembd daemon lets you specify the failover speed indirectly by defining the heartbeat interval and the timeout, from which the failover speed is automatically calculated:
interval specifies the heartbeat interval, which is the number of microseconds before a heartbeat is sent to all other members in the cluster. The default value is 500000 (0.5 seconds).
tko_count specifies the heartbeat timeout, which is the number of heartbeats missed before a member is declared as failed. The default value is 20.
Note: The GUI does not let you display or set the heartbeat interval or the heartbeat timeout individually.
The failover speed is calculated as follows:
interval_value * tko_count_value = failover_speed
Therefore, the default member failure detection time is 10 seconds (0.5 * 20 = 10).
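The calculation can be sketched as a small shell function that converts the interval from microseconds to seconds before multiplying. This is an illustration only; the function name failover_speed is hypothetical and is not part of SGI Cluster Manager.

```shell
# failover_speed INTERVAL_USEC TKO_COUNT
# Print the member failure detection time in seconds:
#   interval (microseconds) * tko_count / 1000000
failover_speed() {
    awk -v i="$1" -v t="$2" 'BEGIN { printf "%.2f\n", i * t / 1000000 }'
}

failover_speed 500000 20    # default values; prints 10.00
failover_speed 750000 20    # prints 15.00
```

Working through candidate values this way before changing interval and tko_count helps confirm that the resulting detection time is the one intended.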
Table 4-1 shows the failure detection times and parameter values that are supported.
Table 4-1. Supported Failure Detection Times and Parameter Values
| Failover Speed (in seconds) | interval (in microseconds) | tko_count |
|---|---|---|
| 30 | 1000000 | 30 |
| 25 | 1000000 | 25 |
| 20 | 1000000 | 20 |
| 15 | 750000 | 20 |
| 10 | 500000 | 20 |
| 5 | 330000 | 15 |
For example, the following command displays the heartbeat interval and tko_count values:
# sgicm-config-cluster-cmd --clumembd
clumembd:
    loglevel = 5
    interval = 500000
    tko_count = 20
    thread = yes
    broadcast = no
    multicast = yes
    multicast_ipaddress = 225.0.0.11
The failover speed is therefore 10 seconds. The following command changes the failover speed to 15 seconds:
# sgicm-config-cluster-cmd --clumembd --interval=750000 --tko_count=20
Note: You cannot change the values for interval and tko_count while the cluster daemons are running.
For more information about using the command-line interface, see the sgicm-config-cluster-cmd(8) man page.
There are two types of tiebreakers:
Network tiebreaker is used to avoid a split-brain scenario, in which both members attempt to form individual clusters. The network tiebreaker ensures that only the member that can contact the tiebreaker IP address is able to form a cluster. The network tiebreaker is the IP address of a machine or a router that does not participate in the cluster. Usually, it is the IP address of a network router that connects the members to the external world (clients).
Note: You must verify that the network tiebreaker can be accessed by the ping(1) command. (Some sites like to disable internet control message protocols at routers so the router or machines more than one hop away do not answer; such a router or machine could not be used as a tiebreaker.)
Disk tiebreaker: If two members cannot talk to each other, they look at the status on the shared quorum partition disk to decide which member should survive and be part of the cluster membership. If the disk cannot be accessed or membership on the disk does not include a given machine, all SGI Cluster Manager processes on the machine exit. You can specify the number of seconds between the updates to the on-disk status. In the GUI, the default is 2 seconds.
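The ping check required for a network tiebreaker can be scripted so that it is easy to repeat on each member. The following is a minimal sketch; the function name tiebreaker_reachable and the PING variable are hypothetical, and the address in the usage comment is a placeholder.

```shell
# tiebreaker_reachable IPADDRESS
# Succeed if IPADDRESS answers a single ICMP echo request.
# PING may be overridden, for example to supply a full path to ping.
PING=${PING:-ping}
tiebreaker_reachable() {
    "$PING" -c 1 "$1" >/dev/null 2>&1
}

# Example usage:
#   tiebreaker_reachable 192.0.2.245 ||
#       echo "tiebreaker does not answer ping" >&2
```

An address that fails this check from either member is not a suitable network tiebreaker.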
In the GUI:
Select the following in the Cluster Configuration window:
Cluster -> Daemon properties
Select the cluquorumd tab.
Specify the desired values for the tiebreakers.
Figure 4-5 shows an example of the cluquorumd window.
In the CLI:
sgicm-config-cluster-cmd --cluquorumd \
--tiebreaker_ip=IPaddress \
--pinginterval=seconds
The failover domain is optional; if a failover domain is not defined, the service will be started on any member. For more information, see “Failover Domains” in Chapter 1.
In the GUI Cluster Configuration window:
Select the Failover Domains tab.
Click New.
Enter the domain name and choose the desired failover and failback options.
Click OK to create the domain.
For information about the failover and failback options, see “Failover Domains” in Chapter 1.
Figure 4-6 shows an example.
In the CLI:
sgicm-config-cluster-cmd --add_failoverdomain \
--name=domainname \
--restricted=yes|no \
--ordered=yes|no \
--controlled=yes|no
sgicm-config-cluster-cmd --failoverdomain=domainname \
--add_failoverdomainnode \
--name=membername
The default for --restricted, --ordered, and --controlled is no.
You can specify the following for a service (the service must be disabled in order to configure it):
Service name.
Note: If you are using the GUI, you cannot include white space within a service name.
Failover domain name (see “Step 7: Create the Failover Domain”).
Monitor interval (in seconds).
Service timeout (in seconds), which is common for all actions (start, stop, and status check) that apply to the service. A service timeout of 0 means that there is no timeout (the service action will never time out).
Note: You cannot specify individual timeouts for each resource within the service or for each action (stop/start/monitor).
Monitor level (for NFS and Samba only):
Check for processes
NFS checks for nfsd processes.
Samba checks for smb and nmb processes.
Check as client
NFS sends null RPCs to the NFS server.
Samba sends smb and nmb queries to the Samba server.
Restart count limit, which is the number of local restarts allowed for a service. When the limit is exceeded, the service is failed over to the other member. If there are no monitor failures for a day, the number of restart failures is reinitialized to 0. The maximum is 500.
User application script or directory, if applicable. (If you are configuring NFS or Samba services, it is not necessary to put anything in this field.)
In this field, you can specify an individual script or a directory containing scripts. A script contains functions to implement service failover. The directory or script is specified as a service parameter.
Each function will be called with two parameters:
An action: one of start, stop, or status
A service ID
If successful, the function must return 0; if it fails, it must return a non-zero value.
For an example script, see “Sample User Application Script” in Chapter 6.
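As a rough illustration of the calling convention described above (an action and a service ID in, 0 or non-zero out), a skeleton might look like the following. This is a sketch only and is not the sample from Chapter 6: the function name app_action, the STATE_DIR variable, and the marker file that stands in for a real application are all hypothetical, and the exact way SGI Cluster Manager invokes the script is assumed.

```shell
#!/bin/sh
# Skeleton of a user application script (illustrative sketch only).
# Assumption: it is invoked with two arguments, ACTION and SERVICE_ID.
# A marker file stands in for a real application.
STATE_DIR=${STATE_DIR:-/var/run}

app_action() {
    action=$1
    service_id=$2
    state_file="$STATE_DIR/myapp.$service_id"   # hypothetical marker file

    case "$action" in
        start)  touch "$state_file" ;;      # replace with the real start command
        stop)   rm -f "$state_file" ;;      # replace with the real stop command
        status) [ -e "$state_file" ] ;;     # 0 if running, non-zero otherwise
        *)      echo "usage: {start|stop|status} service_id" >&2
                return 2 ;;
    esac
}

# Run only when called with arguments, so the file can also be sourced.
if [ $# -gt 0 ]; then
    app_action "$@"
fi
```

Each branch passes the exit status of its last command back to the caller, which satisfies the requirement to return 0 on success and a non-zero value on failure.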
In the GUI Cluster Configuration window:
Select the Services tab.
Click New.
Enter the desired values.
Click OK to create the service.
Figure 4-7 shows an example of configuring an NFS high-availability service.
In the CLI:
sgicm-config-cluster-cmd --add_service \
--name=servicename \
--failoverdomain=domainname \
--checkinterval=seconds \
--servicetimeout=seconds \
--monitorlevel="level" \
--restartcount=N \
--userscript=pathname
Note: The monitoring-level string values are case-sensitive and should be either of the following:
Check for processes
Check as client
In the GUI Cluster Configuration window:
Select the Services tab.
Select the service name.
Click Add Child.
Choose Add service IP address and click OK.
Enter the IP address and optional netmask and broadcast address.
Click OK.
In the CLI:
sgicm-config-cluster-cmd --service=servicename \
--add_service_ipaddress \
--ipaddress=IPaddress \
--netmask=netaddress \
--broadcast=broadcastaddress
In the GUI Cluster Configuration window:
Select the Services tab.
Select the service name.
Click Add Child.
Choose Add Device and click OK.
Enter information for the following, as appropriate:
Device special filename.
Samba share name.
Local XVM physical volumes (physvols). This must be a comma-separated list.
Mount point. If you are configuring a filesystem that requires the dmi mount option and are using local XVM, you must specify the mount point as follows:
mtpt=mountpoint
For example, if the mount point is /dmfs:
mtpt=/dmfs
When using CXFS, the mount point options are specified using CXFS tools.
Filesystem type: xfs or cxfs (if using the CXFS plug-in). The default is xfs.
Note: When configuring local XVM in the GUI, you must own the physvol for the XVM volume so that you can see the block device file for the volume in /dev/lxvm/.
Mount options.
Enable Force Unmount.
Click OK.
In the CLI:
sgicm-config-cluster-cmd --service=servicename \
--add_device \
--name=path
sgicm-config-cluster-cmd --service=servicename \
--device=path \
--mount \
--mountpoint=mountpoint \
--fstype=xfs|cxfs \
--options=mountoptions \
--forceunmount=yes
Samba share names must be unique within the cluster.
The Samba Druid is a configuration guide that lets you create a new Samba service or add Samba to an existing service. In the GUI Cluster Configuration window, start the Samba Druid by selecting the following:
Add Exports -> Samba
For more information, see “Samba Druid Example”.
In the CLI:
sgicm-config-cluster-cmd --service=servicename \
--device=path \
--sharename=sharename
Define the NFS export point and NFS client information.
The NFS Druid is a configuration guide that lets you create a new NFS service or add NFS export points to an existing service. In the GUI Cluster Configuration window, start the NFS Druid by selecting the following:
Add Exports -> NFS
For more information, see “NFS Druid Example”.
In the CLI:
sgicm-config-cluster-cmd --service=servicename \
--device=path \
--add_nfsexport \
--name=exportdirectory
sgicm-config-cluster-cmd --service=servicename \
--device=path \
--nfsexport=exportpath \
--add_client \
--name=\* \
--options=options
The value * for the NFS client name means “all NFS clients.” For better security, supply a list of NFS client systems instead of the * character. For more information, see the exports(5) man page.
Note: In general, you must use the fsid option to set the fsid value for the export. Any number in the range 1 through 65535 will work. See the exports(5) man page for further details on the fsid option.
If you are using the GUI, you must explicitly save the configuration information as noted in “Cluster Configuration Tools”. Select the following from the Cluster Configuration window:
File -> Save
During the initial configuration, you must manually copy the /etc/cluster.xml file to the other member in the cluster whether you use the GUI or the CLI.
Each member has an /etc/cluster.xml file that contains cluster configuration information. If you make a change to this file on one member, you must copy the file to the other member using a command such as scp(1).
After making configuration changes, you must verify that the configuration files across the cluster are synchronized. To do this, run the following command on each member and compare the config_viewnumber values, which give the version number of each configuration file:
sgicm-config-cluster-cmd --cluster
The config_viewnumber value is updated each time a change is made to the configuration file.
For example, the following output from Machine1 and Machine2 shows that the configuration files are in synchronization for test-cluster because they both have the same config_viewnumber value (10):
Machine1:
Machine1# sgicm-config-cluster-cmd --cluster
cluster:
    name = test-cluster
    config_viewnumber = 10
Machine2:
Machine2# sgicm-config-cluster-cmd --cluster
cluster:
    name = test-cluster
    config_viewnumber = 10
In another example, the following output from Machine1 and Machine2 shows that the configuration files are out of synchronization because they have different config_viewnumber values (10 and 11):
Machine1:
Machine1# sgicm-config-cluster-cmd --cluster
cluster:
    name = test-cluster
    config_viewnumber = 10
Machine2:
Machine2# sgicm-config-cluster-cmd --cluster
cluster:
    name = test-cluster
    config_viewnumber = 11
If the config_viewnumber values are different, then the configuration files are different. You should copy the configuration file with the higher config_viewnumber value (which indicates the more recent configuration file) to the other member. In this case, you would copy the configuration file from Machine2 (which has the higher value of 11) to Machine1 (which has the lower value of 10).
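The comparison can also be scripted. The following is a minimal sketch that extracts config_viewnumber from the command output and reports which copy is more recent. The function names viewnumber and newer_config are hypothetical, the output layout is assumed from the examples above, and the use of ssh in the usage comment assumes remote root access between members.

```shell
# viewnumber
# Read `sgicm-config-cluster-cmd --cluster` output on standard input and
# print the config_viewnumber value. (Hypothetical helper; the output
# layout is assumed from the examples in this section.)
viewnumber() {
    sed -n 's/.*config_viewnumber *= *\([0-9][0-9]*\).*/\1/p'
}

# newer_config LOCAL_NUMBER REMOTE_NUMBER
# Report which copy of /etc/cluster.xml is the more recent one.
newer_config() {
    if [ "$1" -eq "$2" ]; then
        echo "in sync"
    elif [ "$1" -gt "$2" ]; then
        echo "local is newer"
    else
        echo "remote is newer"
    fi
}

# Example usage (assumes ssh access to the other member):
#   mine=$(sgicm-config-cluster-cmd --cluster | viewnumber)
#   theirs=$(ssh root@member2 sgicm-config-cluster-cmd --cluster | viewnumber)
#   newer_config "$mine" "$theirs"
```

With the values from the second example, running newer_config 10 11 on Machine1 reports that the remote copy is newer, which matches the direction of the copy described above.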
To automatically restart the SGI Cluster Manager daemons after a reboot, do the following in the CLI:
Enter the following command:
# chkconfig clumanager on
Start local cluster daemons on each member in the cluster by doing either of the following:
Enter /etc/init.d/clumanager start
In the GUI, select the following in the Cluster Status window:
Cluster -> Start Local Cluster Daemons
For more information, see Chapter 5, “Administration”.
Figure 4-8 shows the Samba Druid initial window. Click Forward to configure the Samba service.
You can choose to add Samba to an existing service or to create a new Samba service. In Figure 4-9, a new Samba service named samba with service IP address 192.168.0.3 is being created.
You can choose the device and mount point of an existing service or add a new device and mount point. In Figure 4-10, a new device (/dev/sdd) and mount point (/samba) are being added. To change mount options, you must double-click on the device in the Cluster Configuration window after completing the Samba Druid configuration.
In Figure 4-11, the name of the share is specified as mysamba. Only one share is configured at a time.
Click Apply to complete the configuration of the Samba service. You must copy the /etc/samba/smb.conf.mysamba configuration file to the other member in the cluster. See Chapter 7, “Samba Plug-In” for information about the newly created Samba configuration file.
Figure 4-13 shows the NFS Druid initial window. Click Forward to proceed.
Figure 4-14 shows the window that lets you enter the name of the export directory and its export options. You can add only one export directory at a time.
You are given the choice of adding the export directory to an existing service or creating a new service for the export directory. In Figure 4-15, a new service nfs is being created with service IP address 192.168.0.4.
You can add devices (filesystems) to the service. If you had chosen an existing service in the Select Service for Export window (Figure 4-15), you could choose an existing device mount point in the Select Device for Export window. In Figure 4-16, a new device (/dev/shared1) and mount point (/lun2) are specified. To add filesystem mount options, you must double-click the device entry in the Cluster Configuration window after completing the NFS service configuration.
Click Apply to complete the NFS configuration. If you want to modify service parameters, you must double-click the service in the Cluster Configuration window.
The following example uses sgicm-config-cluster-cmd commands to create a two-member cluster with a service providing Samba shares and NFS service:
member1 is an Altix 350 system with no partitions that is connected to an L2 power controller
member2 is partition 3 of an Altix 3700 system that is connected to an L2 power controller using Ethernet, where 192.168.9.2 is the IP address of the L2 connected to member2
The network tiebreaker is the IP address of a network router or another machine that determines which member should have connectivity to the public network
service1 is the IP address that will be used by clients to access the Samba share and NFS export point
The service is allowed to restart four times within one day before a failover occurs
Note: Commands that modify the configuration file do not print anything if they are successful. The command exit status is 0 when successful.
Do the following:
# sgicm-config-cluster-cmd --sharedstate --type=raw --rawprimary=/dev/shared_1 \
    --rawshadow=/dev/shared_2
# sgicm-config-cluster-cmd --cluster --name "test-cluster"
# sgicm-config-cluster-cmd --add_member --name=member1 --watchdog=no
# sgicm-config-cluster-cmd --add_member --name=member2 --watchdog=no
Add power controller information for the members (192.168.9.2 is the IP address of the L2):
# sgicm-config-cluster-cmd --member=member1 --add_powercontroller --type=l2 \
    --device=/dev/ttyIOC0 --partition=0
# sgicm-config-cluster-cmd --member=member2 --add_powercontroller --type=l2network \
    --ipaddress=192.168.9.2 --partition=3
Set the heartbeat interval to 1 second and the heartbeat timeout to 20 missed heartbeats, resulting in a failover speed of 20 seconds:
# sgicm-config-cluster-cmd --clumembd --interval=1000000 --tko_count=20
Set up a network tiebreaker for the cluster:
# sgicm-config-cluster-cmd --cluquorumd --tiebreaker_ip=192.0.2.245
Create a failover domain with an ordered failover policy where the primary member is member1 and the backup member is member2:
# sgicm-config-cluster-cmd --add_failoverdomain --name=domain1 \
    --restricted=yes --ordered=yes
# sgicm-config-cluster-cmd --failoverdomain=domain1 --add_failoverdomainnode \
    --name=member1
# sgicm-config-cluster-cmd --failoverdomain=domain1 --add_failoverdomainnode \
    --name=member2
Create the service definition:
# sgicm-config-cluster-cmd --add_service --name=service1 --checkinterval=60 \
    --servicetimeout=40 --monitorlevel="Check as client" \
    --failoverdomain=domain1 --restartcount=4
# sgicm-config-cluster-cmd --service=service1 --add_service_ipaddress \
    --ipaddress=192.168.1.2 --netmask=255.255.255.0 \
    --broadcast=192.168.1.255
Add the shared quorum partition and filesystem information to service1:
# sgicm-config-cluster-cmd --service=service1 --add_device --name=/dev/shared1
# sgicm-config-cluster-cmd --service=service1 --device=/dev/shared1 --mount \
    --mountpoint=/mnt1 --fstype=xfs --options=rw,sync \
    --forceunmount=yes
# sgicm-config-cluster-cmd --service=service1 --device=/dev/shared1 \
    --sharename=share1
Define the NFS export point and NFS client information. The directory is exported to all clients with read-only access:
# sgicm-config-cluster-cmd --service=service1 --device=/dev/shared1 \
    --add_nfsexport --name=/shared1/export_dir
# sgicm-config-cluster-cmd --service=service1 --device=/dev/shared1 \
    --nfsexport=/shared1/export_dir --add_client \
    --name=\* --options=ro
Note: The value of * for the NFS client name means “all NFS clients.” For better security, supply a list of NFS client systems instead of the * character. For more information, see the exports(5) man page.
(If you were using the GUI, you would have to save the configuration at this point.)
Synchronize the configuration changes. For example:
# scp /etc/cluster.xml root@member2:/etc/cluster.xml
root@member2's password:ENTER_ROOT_PASSWORD
cluster.xml                100% 3297    57.1MB/s   00:00
Verify that the changes are synchronized by running the following command on each member:
# sgicm-config-cluster-cmd --cluster
Start the SGI Cluster Manager daemons:
Enter the following command:
# chkconfig clumanager on
Start local cluster daemons on each member in the cluster by doing either of the following:
# service clumanager start
or
# /etc/init.d/clumanager start
For more information and additional examples, see the sgicm-config-cluster-cmd(8) man page.