This chapter discusses the following:
For more information about storage configuration, see the following:
TPM Installation Instructions and User's Guide for SGI TP9100
SGI InfiniteStorage TP9300 and TP9300S RAID User's Guide
SGI® InfiniteStorage TP9400 and SGI® InfiniteStorage TP9500 and TP9500S RAID User's Guide
SGI InfiniteStorage TP9500 and TP9700 RAID User's Guide
SGI TPSSM Administration Guide
SGI Cluster Manager for Linux requires two 10-MB disk partitions to keep membership quorum: the primary partition and the shadow partition (used for backup purposes). You should use the block device to access these partitions.
To provide maximum redundancy, the primary partition and the shadow partition should be in different storage devices connected to the members using different Fibre Channel (FC) cards. The two partitions should have independent I/O paths.
SGI Cluster Manager works on supported SGI RAID configurations. Each member in the cluster should be connected to storage using multiple paths so that service failovers are minimized. Ideally, the two shared quorum partitions should be on separate FC controllers at the front end, separate HBAs on the Altix, and on separate RAID logical units (LUNs) or RAID arrays if possible. They should be at least 10 MB in size and the partition type must be linux.
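As a quick sanity check of the size requirement above, you can compare a partition's size in bytes against the 10-MB minimum. The following is a minimal sketch, not part of SGI Cluster Manager: the function name quorum_size_ok is hypothetical, 10 MB is assumed to mean 10 x 1024 x 1024 bytes, and the usage comment assumes the util-linux blockdev command is installed.

```shell
# Hypothetical helper: succeed if a size in bytes meets the 10-MB minimum.
# Assumption: 10 MB is taken as 10 * 1024 * 1024 bytes.
quorum_size_ok() {
    [ "$1" -ge $((10 * 1024 * 1024)) ]
}

# Example use against a real partition (run as root; replace the
# placeholder with the block device of your own quorum partition):
#   quorum_size_ok "$(blockdev --getsize64 /dev/YOUR_QUORUM_PARTITION)" \
#       && echo "size OK" || echo "too small"
```

Run the check on every member against both the primary and the shadow partition.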
The device names for the shared quorum partitions must be identical on all cluster members. Use the /usr/lib/clumanager/create_device_links script to create the same device name on each member.
For more information, see the books listed at the beginning of this chapter.
SGI Cluster Manager uses hostnames for sending heartbeat and control messages to indicate that a member is up and running and to request operations or distribute information. Ethernet cables are provided that will allow the members to be connected directly or connected using a network hub.
You can use 10/100baseT or 1-Gb ports in the system for heartbeat communication. For more information, see SGI Altix Systems Dual-Port Gigabit Ethernet Board User's Guide.
Heartbeats are either broadcast on all networks or multicast on the network interface on which the hostname is configured.
Use firewall software such as SuSEfirewall with care, because it can block the broadcast and multicast traffic that heartbeats rely on.
You must use SGI L2 system controllers for power control. An L2 controller is standard with each Altix 3700 rack. On some platforms, including the Altix 350, an L2 controller is a separate optional product that must be purchased in order to use SGI Cluster Manager.
You may connect to the L2 over an Ethernet network or (depending on your particular hardware) you may connect directly to the L2 through a serial port. The network connection method (l2network) is preferred because it is easier to set up and provides for greater flexibility while configuring a cluster.
| Note: If you have a system with an emulated L2 controller (such as an Altix 330 or Altix 3700 Bx2), or if you run CXFS with SGI Cluster Manager, you must use the l2network connection type. See “l2network Ethernet Connection”. |
For information about configuring the power controller, see “Step 4: Add Power Controller Configuration” in Chapter 4.
You can use one of the following methods:
The l2network Ethernet connection is the preferred L2 connection method. It requires the following:
An Ethernet port on each member.
All members in the cluster and the L2 must be connected to the same network. SGI recommends using a private network for greater reliability. (If a private network is used, a PCI Ethernet card is required for each member.)
The IP address of each member's L2 controller must be entered as the address of that member's power controller. For example, to specify the power controller for cluster member Machine-A, enter the IP address of the L2 of Machine-A.
| Note: Multiple members within a partitioned system may share a single L2 as long as the system serial number on each L1 is the same. |
Figure 2-1 shows the L2 Ethernet connection for an Altix 3700.
Figure 2-2 shows the L1 USB port on an Altix 3700 Bx2 CR brick. Use a USB cable to connect the L1 USB port to the USB/network adapter mounted on the rack. The USB/network adapter should be connected to the network and must be accessible from the other SGI Cluster Manager member via the network.
| Note: The l2network Ethernet connection is preferred. |
A serial connection requires the following:
Altix 350: serial ports on an Altix 350 with IO10 and an IO10 CBL-SATA-SERIAL multiport serial adapter cable. You must also order the LS-BASE-IO serial ATA (SATA) drive option.
Figure 2-3 shows the rear panel for an Altix 350. For information about using an Altix 350 with an IO9 PCI card, see “Hardware Requirements” in Chapter 1.
Serial cables should use the remote modem port on the L2 system controller. Connect the serial cable to the remote modem port on one end and the tty port on the other end.
The l2 designation in the cluster configuration
Figure 2-4 and Figure 2-5 show the serial connections.
You must turn on the apwr automatic power variable on the L2.
To show the current status, use the following command on the L2:
L2> apwr
To turn on automatic power, set the apwr value to on on the L2. For example:
L2> apwr on
Use the appropriate testing method:
To determine an L2's IP address or to configure an IP address for an L2, connect to the L2 using the serial port and use the L2 ip command.
For example, to show the current IP setting:
l2-foo-001-L2>ip
addr: 192.0.2.70 netmask: 255.255.255.0 broadcast addr: 192.0.2.255
Note: If you are using DHCP to assign the IP addresses of the L2s
dynamically, you will see the following message:
This is not an error; it indicates only that static IP addresses are not in use. To determine the IP address, use the cfg command from the L2 to show each brick and each L2 in the configuration. |
To change the IP setting to 63.154.16.7:
l2-foo-001-L2> ip 63.154.16.7 255.255.255.0 63.154.16.255
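Judging from the output of the ip command, the three arguments appear to be the address, the netmask, and the broadcast address. Before entering them at the L2 prompt, you can cross-check the broadcast address by deriving it from the other two values. The following is a minimal sketch in POSIX shell to run on any workstation; it is not an L2 command, and the function name broadcast_addr is hypothetical.

```shell
# Hypothetical helper: derive the broadcast address from an IPv4 address
# and a dotted-quad netmask. The body runs in a subshell so that the
# IFS change does not leak into the calling shell.
broadcast_addr() (
    IFS=.
    # Split both dotted quads into eight positional octets:
    # $1..$4 are the address, $5..$8 are the netmask.
    set -- $1 $2
    # Each broadcast octet is the address octet with all host bits set.
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)

broadcast_addr 63.154.16.7 255.255.255.0    # prints 63.154.16.255
```

If the printed value differs from the third argument you planned to enter, recheck the address and netmask first.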
You can use the ping command to test connectivity to an L2. You can also use the L2 l2find command to find other L2s in the same subnet. For example:
[root@altix root]$ telnet l2-server.acme.com
Trying 192.0.1.98...
Connected to l2-server.acme.com.
Escape character is '^]'.
Linux 2.4.7-sgil2 (192.0.1.98) (ttyp2)
SGI SN1 L2 Controller
INFO: connection established to localhost, to quit enter <ctrl-]> <>
server-001-L2>help l2find
l2find
print list of L2's on the same subnet as this one
server-001-L2>l2find
6 L2's discovered:
IP              SSN      NAME             RACK FIRMWARE
--------------- -------- ---------------- ---- ------------
[ L2's with different System Serial Numbers ]
192.0.1.67      R2000016                  000  L3 controlle
192.0.1.132     L1000487                  001  1.3.61
192.0.1.96      N0000005 bar              002  1.24.2
192.0.1.100     N0000005 bar2             003  1.24.2
192.0.1.94      N0000005 bar3             004  1.24.2
192.0.1.105     N0000005 bar_l2_2         001  1.22.0
server-001-L2>
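If you save the l2find output to a file, you can extract the IP column and test each address with ping from a cluster member. The following is a minimal sketch, assuming the output format shown in the example above; the function name l2find_ips and the file name l2find.txt are hypothetical.

```shell
# Hypothetical helper: read saved l2find output on stdin and print the
# first field of every line whose first field looks like an IPv4 address.
l2find_ips() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1 }'
}

# Example: ping each discovered L2 once and report the result.
#   l2find_ips < l2find.txt | while read -r ip; do
#       if ping -c 1 "$ip" >/dev/null 2>&1; then
#           echo "$ip reachable"
#       else
#           echo "$ip unreachable"
#       fi
#   done
```

Header, separator, and prompt lines are skipped because their first field is not a dotted-quad address.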
You can use the cu(1) command to test the serial reset lines if you have installed the uucp RPM.
The cu command requires that the device files be readable and writable by the user uucp. The command also requires that the /var/lock directory be writable by group uucp.
Perform the following steps:
Ensure that the ioc4_serial module is loaded:
# lsmod | grep ioc4_serial
If you do not see ioc4_serial in the output from the lsmod command, load the module with modprobe:
# modprobe ioc4_serial
If you intend to use the L2 serial connection permanently, the ioc4_serial module must be loaded automatically when the system boots. Typically, this is done by editing /etc/sysconfig/kernel. See Chapter 3, “Software Installation”.
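The check and the load can be combined so that modprobe runs only when the module is missing. The following is a minimal sketch; the function name module_listed is hypothetical, and it matches the module name exactly against the first column of lsmod output rather than relying on a substring match.

```shell
# Hypothetical helper: succeed if the module named in $1 appears in the
# first column of lsmod output supplied on stdin.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example (run as root): load ioc4_serial only if it is not already loaded.
#   lsmod | module_listed ioc4_serial || modprobe ioc4_serial
```

An exact match avoids a false positive when ioc4_serial appears only in the "Used by" column of another module.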
Change ownership of the serial devices so that they are in group uucp and owned by user uucp.
| Note: The ownership change may not be persistent across reboots. |
For example, suppose you have the following TTY devices on the IO10:
# ls -l /dev/ttyIOC*
crw-rw---- 1 root uucp 204, 50 Sep 15 16:20 /dev/ttyIOC0
crw-rw---- 1 root uucp 204, 51 Sep 15 16:20 /dev/ttyIOC1
crw-rw---- 1 root uucp 204, 52 Sep 15 16:20 /dev/ttyIOC2
crw-rw---- 1 root uucp 204, 53 Sep 15 16:20 /dev/ttyIOC3
To change ownership of them to uucp, you would enter the following:
# chown uucp.uucp /dev/ttyIOC*
Determine if group uucp can write to the /var/lock directory and change permissions if necessary.
For example, the following shows that group uucp cannot write to the directory:
# ls -ld /var/lock
drwxr-xr-t 5 root uucp 88 Sep 19 08:21 /var/lock
The following adds write permission for group uucp:
# chmod g+w /var/lock
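You can also test the group write bit in a script instead of reading the ls output by eye. The following is a minimal sketch; the function name group_can_write is hypothetical, and the usage comment assumes GNU stat, whose %A format prints the same mode string that ls shows.

```shell
# Hypothetical helper: succeed if an ls-style mode string (for example
# drwxr-xr-t) grants write permission to the group. The group write bit
# is the sixth character of the string.
group_can_write() {
    case "$1" in
        ?????w*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example (assumes GNU stat): add group write to /var/lock only if needed.
#   group_can_write "$(stat -c %A /var/lock)" || chmod g+w /var/lock
```

For the directory shown in the example above, the mode string drwxr-xr-t fails the check, so chmod would run.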
Join the uucp group temporarily, if necessary, and use cu to test the line.
For example:
# newgrp uucp
# cu -l /dev/ttyIOC0 -s 38400
Connected
nodeA-001-L2>cfg
L2 192.168.0.1: - 001 (LOCAL)
L1 192.0.1.133:0:0 - 001c04.1
L1 192.0.1.133:0:1 - 001i13.1
L1 192.0.1.133:0:5 - 001c07.2
L1 192.0.1.133:0:6 - 001i02.2
For more information, see the cu(1) man page and the documentation that comes with the uucp RPM.
Other tools that may be useful when testing connectivity are minicom(1) and kermit(1).
After you have configured the cluster software (see Chapter 4, “Configuration”), you can also use the clufence(8) command to test serial connectivity. See the clufence(8) man page for more information.