This chapter describes how to use the SGI Tempo systems management software to discover, install, and configure your Altix ICE system.
The configure_cluster command launches a cluster configuration tool. It allows you to perform the following:
Change the subnet numbers for the various cluster networks
Change and configure the domain of the cluster (which is likely different from the domain of eth0 on the system admin controller itself)
Supply the SLES10 SP1 media when prompted and create the image repositories that you can use to customize your software image
Run a set of commands that set up the cluster
For information on using this tool, see the procedure in “Installing Software on the System Admin Controller”.
This section describes how to install software on the system admin controller (admin node). The system admin controller contains software for provisioning, administering, and operating the SGI Altix ICE 8200 system. The SGI Admin Node Autoinstallation DVD contains RPMs for the system admin controller and the software images for the rack leader controllers (leader nodes), service nodes, and compute nodes.
To install software images on the system admin controller, perform the following steps:
Turn on, reset, or reboot the system admin controller. The power on button is on the right of the system admin controller, as shown in Figure 2-1.
Insert the SGI Admin Node Autoinstallation DVD in the DVD drive on the left of the system admin controller as shown in Figure 2-1.
An autoinstall message appears on your console, as follows:
SGI Admin Node Autoinstallation DVD
This is the SGI Admin Node autoinstall DVD. If you proceed, the entire
system will be erased and re-installed.
You may install from the vga screen or from the serial console. Whichever
you choose, the system will be set up to use the serial console. Therefore,
it is important that you connect to the serial console the first time you
boot the machine after installation. The first boot after installation, you
will be prompted for system setup questions on the serial console.
Experts: You may choose to use the "auto" label (auto reboot and skip
firstboot questions). You may also append the "netinst" option with an nfs
path (hostname:/mntpoint/file.iso) to nfs mount the ISO.
Press ENTER to send autoinstallation output to the vga screen.
Type "serial" at the boot prompt to send autoinstallation output to the
serial console. |
| Note: If you want to use the serial console, enter serial at the boot: prompt, otherwise, output for the install procedure goes to VGA screen. |
You can press ENTER or just wait, and the system installation process starts automatically. The boot initrd.image executes, the hard drive is partitioned to create a swap area and a root file system, the Linux operating system and the cluster manager software are installed, and a repository is set up for the rack leader controller, service node, and compute node software RPMs.
| Note: This step takes several minutes. When the installation is complete, the system admin controller DVD drive automatically ejects the DVD. |
Once installation of software on the system admin controller is complete, remove the DVD from the DVD drive.
Once the system has been installed, enter the reboot command to reboot your system. The system comes up with console output going to the serial console.
You will see messages about the system admin controller booting the kernel. You can ignore any messages about a few services that may fail to start.
| Note: You must connect to the serial console for this boot to answer the firstboot questions, starting with the Welcome screen, as shown in Figure 2-2. If you connect to the serial console too late, you can press Ctrl-L to redraw the Welcome screen. |
After the reboot completes, the YaST first boot installation tool starts and a Welcome screen appears, as shown in Figure 2-2. Click on the Next button to proceed.
| Note: The YaST Installation Tool has a main menu with sub-menus. You will be redirected back to the main menu, at various times, as you follow the steps in this procedure. |
The YaST firstboot installer prompts you to enter your system details, including the root password, network configuration, time zone, and so on.
From the Hostname and Name Server Configuration screen, as shown in Figure 2-3, enter the hostname and domain name of your system in the appropriate fields. Make sure that Change Hostname via DHCP is unselected (no x should appear in the box). Click on the Next button to continue.
| Note: You can press Ctrl-L to refresh the YaST screen as necessary. |
The Network Card Configuration Interfaces screen shows the suggested configuration, as shown in Figure 2-4. Click Next to continue.
From the Network Card Configuration Overview screen, configure the first card under Name to establish the public network (sometimes called the house network) connection to your SGI Altix ICE 8200 system.
| Note: Do NOT configure the second interface at this time. A script will do this for you in a later step. |
From the Network Address Setup screen, choose dynamic address setup via DHCP or enter the IP address for the system admin controller. This is your public/house network information. Click on the Next button to continue.
From the Hostname and Name Server Configuration screen, enter the name and DNS domain name as shown in Figure 2-7. Note that the hostname was entered in step 7.
From the Routing Configuration screen, enter the appropriate gateway address and netmask. Click on the Next button to continue.
From the Clock and Time Zone screen, select the appropriate region and time zone. Click on the Next button to continue.
From the Password for the System Administrator “root” screen, set the root password.
Select the authentication method to use for the users on your system. Click on the Next button to continue.
Enter the user's full name, username, and user password in the New Local User screen. Click on the Next button to continue.
From the Hardware Configuration screen, select Use Following Configuration. Click on the Next button to continue.
An Installation Completed screen appears, as shown in Figure 2-8. Click on the Finish button.
After you have completed the YaST first boot installation instructions, log in to the system admin controller. You can use YaST to confirm or correct any configuration settings.
| Note: It is important that you make sure that your network settings are correct before proceeding with cluster configuration. |
To start cluster configuration, enter the following command:
% /opt/sgi/sbin/configure_cluster |
The Cluster Configuration Tool: Initial Configuration Check screen appears, as shown in Figure 2-9. This tool provides instructions on the steps you need to take to configure your cluster. Click OK to continue.
The Cluster Configuration Tool: Initial Cluster Setup screen appears, as shown in Figure 2-10. Read the notice and then click OK to continue.
| Note: The Cluster Configuration Tool has a main menu with sub-menus. You will be redirected back to the main menu, at various times, as you follow the steps in this procedure. |
Copy the RPMs from your local SLES media.
% mount /dev/dvd /mnt |
The first of three Copy RPMS screens appears, as shown in Figure 2-12. Click Yes to continue.
Select Network Settings from the Install Cluster Setup Tools menu...."
Enter the /mnt directory to browse its contents, as shown in Figure 2-13. Make sure the RPMs have been successfully copied. Click OK to continue.
The Copy of RPMS from media complete message appears, as shown in Figure 2-14. Click OK to continue.
The Cluster Network Setup screen appears, as shown in Figure 2-15.
The subnet addresses option allows you to change the cluster internal network addresses. SGI recommends that you do NOT change these. Click OK to continue to adjust subnets. Otherwise, select Domain Name: Configure Cluster Domain Name and then skip to step 30. A warning screen appears, as shown in Figure 2-16.
The Update Subnet Addresses screen appears, as shown in Figure 2-17.
The default IP address of the system admin controller, which is the Head Network for the Altix ICE system, is shown. SGI recommends that you do NOT change the IP address of the system admin controller (admin node) or rack leader controllers (leader nodes) if at all possible. You can adjust the IP addresses of the InfiniBand network (ib0 and ib1) to match the IP requirements of the house network. Click OK to continue.
Enter the domain name for your Altix ICE system, as shown in Figure 2-18. Click OK to continue (this will be a subdomain to your house network, by default).
The next step in this procedure changes your NTP configuration file. Click on Yes to continue. This sets the system admin controller to serve time to the Altix ICE system and allows you to add time servers on your house network, which you may optionally use.
Configure NTP time service as shown in Figure 2-20. Click Next to continue.
A new ntp.conf configuration file is created. Click on OK to continue.
Optionally, configure the house domain name service (DNS) resolvers, as shown in Figure 2-22. After entering the IP addresses, click OK to enable house DNS resolution, click Disable House DNS to stop using house DNS resolution, or click Back to leave house DNS resolution as it was when you started (disabled at installation).
The Admin Infrastructure One Time Setup screen appears, as shown in Figure 2-18. A series of scripts will now run to configure the system admin controller of the Altix ICE system. Click OK to continue.
Once the scripts have completed configuring the system admin controller, a completion message appears, as shown in Figure 2-24. Click OK to continue.
| Note: The main menu contains a reset the database function that allows you to start software installation over without having to reinstall the system admin controller. |
Proceed to “Installing Software on the Rack Leader Controllers and Service Nodes”. It describes the discovery process for the rack leader controllers in your system and how to install software on the rack leader controllers.
The discover command discovers rack leader controllers (leader nodes) and service nodes, including their associated BMCs, and compute nodes in an entire system or in a set of one or more racks that you select. Rack numbers generally start at one. Service node numbers generally start at zero. When you use the discover command to perform the discovery operation on your Altix ICE system, you will be prompted with instructions on how to proceed (see “Installing Software on the Rack Leader Controllers and Service Nodes”).
The discover command syntax is as follows:
/opt/sgi/sbin/discover --rack <#>[,<hw-type>]
/opt/sgi/sbin/discover --rackset <start-number>,<count>[,<hw-type>]
/opt/sgi/sbin/discover --service <#>[,<hw-type>] |
The discover command accepts the following options:
| Option | Description |
| --rack | Discovers a specific rack or set of racks |
| --rackset | Discovers count racks starting at start-number |
| --service | Discovers the specified service node |
| --force | Use --force to avoid sanity checks that require input. |
| --delrack | Deletes racks and associated leaders and blades |
| --delservice | Deletes a service node |
| --help | Usage and help text |
The hw-type parameter is a hardware model that affects how the discover command proceeds. If hw-type is not specified, a default value is used. Use the other hardware type for a service node that you supply and manage yourself. This mode allocates IP addresses for you and prints them to the screen. A service node of the other type is not managed by the Tempo systems management software.
Valid hardware type specifiers are as follows:
ice-csn (default type)
xe210
xe240
xe310
altix450 (NAS cube)
altix4000
altix4700
other
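The list above can be turned into a quick sanity check before calling discover from a script. The following is a minimal sketch, not part of the Tempo software; the function name is_valid_hw_type is hypothetical, and the list of types is copied from this section and may differ in other releases.

```shell
#!/bin/sh
# Hardware type specifiers copied from the list above (assumption: the
# list is complete for your release).
VALID_HW_TYPES="ice-csn xe210 xe240 xe310 altix450 altix4000 altix4700 other"

# is_valid_hw_type TYPE
# Hypothetical helper: succeeds only if TYPE appears in VALID_HW_TYPES.
is_valid_hw_type() {
    for t in $VALID_HW_TYPES; do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}
```

For example, `is_valid_hw_type altix450 && echo ok` prints ok, while an unknown type such as xe999 makes the function return a non-zero status.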
To rediscover an existing service node or rack, run the discover command in the same manner you normally would. To purge a rack or service node entirely (never to be seen again), use the --delrack or --delservice option.
EXAMPLES
Example 2-1. discover Command Examples
The following examples walk you through some typical discover command operations.
To discover rack 1 and service node 0, perform the following:
# /opt/sgi/sbin/discover --rack 1 --service 0,xe210 |
In this example, service node 0 is an Altix XE210 system.
To discover racks 1 and 4, service node 1, and ignore MAC address 00:04:23:d6:03:1c, perform the following:
# /opt/sgi/sbin/discover --ignoremac 00:04:23:d6:03:1c --rack 1 --rack 4 --service 1 |
To discover racks 1 through 5 and service nodes 0 through 2, perform the following:
# /opt/sgi/sbin/discover --rackset 1,5 --service 0 --service 1,altix450 --service 2,other |
In this example, service node 1 is an Altix 450 system. Service node 2 is of the other hardware type.
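When the same discovery is repeated across sites, it can help to assemble the command line from parameters. The sketch below only builds and prints the command string using the --rackset and --service syntax described above; it does not run discover. The function name build_discover_cmd is hypothetical.

```shell
#!/bin/sh
DISCOVER=/opt/sgi/sbin/discover

# build_discover_cmd START COUNT [SERVICE_SPEC...]
# Hypothetical helper: prints a discover command line covering COUNT racks
# beginning at rack START, plus one --service option per SERVICE_SPEC
# (for example "0" or "1,altix450"). Nothing is executed.
build_discover_cmd() {
    start=$1
    count=$2
    shift 2
    cmd="$DISCOVER --rackset $start,$count"
    for spec in "$@"; do
        cmd="$cmd --service $spec"
    done
    printf '%s\n' "$cmd"
}
```

Calling `build_discover_cmd 1 5 0 1,altix450 2,other` prints the same command line as the last example above, which you can review before running it as root.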
The discover command, described in “discover Command”, sets up the leader and managed service nodes for installation and discovery. This section describes the discovery process you use to determine the Media Access Control (MAC) address, that is, the unique hardware address, of each rack leader controller (leader nodes) and then how to install software on the rack leader controllers.
To install software on the rack leader controllers, perform the following steps:
Use the discover command from the command line, as follows:
# /opt/sgi/sbin/discover --rack 1 |
| Note: You can discover multiple racks at a time, and you can discover service nodes in the same run by using the --service option. |
The discover script executes. When prompted, turn the power on to the node being discovered and only that node.
| Note: Make sure that you apply power only to the node being discovered and to nothing else in the system. Do not power up the node itself; the discover script instructs the BMC to do that. |
When the node has electrical power, the BMC starts up even though the system is not powered on. The BMC does a network DHCP request that the discover script intercepts and then configures the cluster database and DHCP with the MAC address for the BMC. The BMC then retrieves its IP address. Next, this script instructs the BMC to power up the node. The node performs a DHCP request that the script intercepts and then configures the cluster database and DHCP with the MAC address for the node. The rack leader controller installs itself using the systemimager software and then boots itself.
The discover script will turn on the chassis identify light for 2 minutes. Output similar to the following appears on the console:
Discover of rack1 / leader node r1lead complete
r1lead has been set up to install itself using systemimager
The chassis identify light has been turned on for 2 minutes |
The blue chassis identify light is your cue to power on the next rack leader controller and start the process all over.
Using this method, you can configure all the rack leader controllers and service nodes in the cluster without having to go back and forth to and from your workstation between each discovery operation.
You can use the ssh command to verify that the r1lead node is available, as follows:
# ssh r1lead hostname
r1lead |
If your discover process does not find the appropriate BMC after a few minutes, the following message appears:
==============================================================================
Warning: Trouble discovering the BMC!
==============================================================================
3 minutes have passed and we still can't find the BMC we're looking for.
We're going to keep looking until/if you hit ctrl-c.
Here are some ideas for what might cause this:
- Ensure the system is really plugged in and is connected to the network.
- This can happen if you start discover AFTER plugging in the system.
Discover works by watching for the DHCP request that the BMC on the system
makes when power is applied. Only nodes that have already been discovered
should be plugged in. You should only plug in service and leader nodes
when instructed.
- Ensure the CMC is operational and passing network traffic.
- Ensure the CMC firwmare up to date and that it's configured to do VLANs.
- Ensure the BMC is properly configured to use dhcp when plugged in to power.
- Ensure the BMC, frusdr, and bios firmware up to date on the node.
- Ensure the node is connected to the correct CMC port.
Still Waiting. Hit ctrl-c to abort this process. That will abort discovery
at this problem point -- previously discovered components will not be affected.
============================================================================== |
If your discover process finds the appropriate BMC, but cannot find the leader or service node that is powered up after a few minutes, the following message appears:
==============================================================================
Warning: Trouble discovering the NODE!
==============================================================================
4 minutes have passed and we still can't find the node.
We're going to keep looking until/if you hit ctrl-c.
If you got this far, it means we did detect the BMC earlier,
but we never saw the node itself perform a DHCP request.
Here are some ideas for what might cause this:
- Ensure the BIOS boot order is configured to boot from the network first
- Ensure the BIOS / frusdr / bmc firmware are up to date.
- Is the node failing to power up properly? (possible hardware problem?)
Consider manually pressing the front-panel power button on this node just
in case the ipmitool command this script issued failed.
- Try connecting a vga screen/keyboard to the node to see where it's at.
- Is there a fault on the node? Record the error state of the 4 LEDs on the
back and contact SGI support. Consider moving to the next rack in the mean
time, skippnig this rack (hit ctrl-c and re-run discover for the other
racks and service nodes).
Still Waiting. Hit ctrl-c to abort this process. That will abort discovery
at this problem point -- previously discovered components will not be affected.
============================================================================== |
You are now ready to discover and install software on the compute blades in the rack. For instructions, see “Discovering Compute Nodes”.
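Because a rack leader controller installs and reboots itself after discovery, the ssh check described in this procedure may need to be repeated until the node answers. The following is a minimal sketch of such a retry loop. The names wait_for_node, probe_ssh, and PROBE are hypothetical, the retry counts are arbitrary defaults, and the sketch assumes that hostname on the leader prints the short name (for example r1lead), as in the ssh example in this procedure.

```shell
#!/bin/sh
# PROBE names the command used to test a node; it receives the node name.
# The default repeats the "ssh <node> hostname" check from this procedure.
PROBE=${PROBE:-probe_ssh}

probe_ssh() {
    # BatchMode and ConnectTimeout are standard OpenSSH client options that
    # keep the check from hanging on a password prompt or a dead host.
    [ "$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" hostname 2>/dev/null)" = "$1" ]
}

# wait_for_node NODE [TRIES] [DELAY_SECONDS]
# Retries the probe until it succeeds or TRIES attempts have been made.
wait_for_node() {
    node=$1
    tries=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$PROBE" "$node"; then
            echo "$node is up"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "$node did not answer after $tries attempts" >&2
    return 1
}
```

For example, `wait_for_node r1lead 60 10` polls r1lead every 10 seconds for up to 10 minutes and returns a non-zero status if the node never answers.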
| Note: Before you run the discover-rack command, make sure the rack leader controllers (leader nodes) have booted and are up. |
In addition, racks can be rediscovered by running the discover-rack command again on a previously discovered rack. Because the discover-rack command turns off the power for all blades, you need to power on the rack again. When powering up the rack it is not necessary to power up the rack leader controller because it is already on. SGI recommends that you avoid power cycling the rack leader controller.
EXAMPLES
Example 2-2. discover-rack Command Examples
To discover rack 1, perform the following:
# /opt/sgi/sbin/discover-rack --rack 1 |
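For a system with several racks, the same command can be repeated in a loop. The sketch below is one way to do that under stated assumptions: the function name discover_racks and the DRY_RUN variable are hypothetical, and by default the loop only prints the commands so that you can review them first.

```shell
#!/bin/sh
DISCOVER_RACK=/opt/sgi/sbin/discover-rack
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands (default), 0 = run them

# discover_racks RACK...
# Runs discover-rack once per rack number and stops at the first failure,
# so that a problem rack is not silently skipped.
discover_racks() {
    for rack in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "$DISCOVER_RACK --rack $rack"
        else
            "$DISCOVER_RACK" --rack "$rack" || return 1
        fi
    done
}
```

With the default setting, `discover_racks 1 2 3` prints three discover-rack command lines. Setting DRY_RUN=0 runs them in order on the system admin controller.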
This section describes how to discover compute nodes in your Altix ICE system.
To discover compute nodes (blades) in your Altix ICE system, complete the steps in “Installing Software on the Rack Leader Controllers and Service Nodes”. Then perform this procedure for each rack in your system:
| Note: Some of the output shown in this example has been modified to fit the format of this manual. |
Run the discover-rack command for each rack in your system from the system admin controller, as follows:
system-admin:~ # /opt/sgi/sbin/discover-rack --rack 1
/opt/sgi/sbin/discover-rack: Running [ssh r1lead discover-blades|cat > /tmp/slot_file_1]
to discover the IRU in rack 1
/opt/sgi/sbin/discover-rack: Running [populate-db-rack --rack 1] to populate the DB with rack 1
/opt/sgi/sbin/discover-rack: Running [generate-leader-hostfile --rack 1] to generate the hosts file
for the leader for rack 1
/opt/sgi/sbin/discover-rack: Running [generate-leader-dhcpfile --rack 1] to generate the dhcpd.conf file
for the leader for rack 1
Shutting down DHCP server ..done
Starting DHCP server [chroot]..done
/opt/sgi/sbin/discover-rack: Running [generate-admin-c3-file] to generate
the c3 configuration file for the admin
/opt/sgi/sbin/discover-rack: Running [generate-leader-c3-file --rack 1]
to generate the c3 configuration file
for the leader for rack 1
/opt/sgi/sbin/discover-rack: Running [generate-service-c3-file] to generate
the c3 configuration file
for all service nodes
/opt/sgi/sbin/discover-rack: Running [generate-leader-ganglia-file --rack 1] to generate
the Ganglia configuration file
for the leader for rack 1
Shutting down gmond..done
Starting gmond..done
/opt/sgi/sbin/discover-rack: Running [generate-admin-ganglia-files] to generate
the Ganglia configuration files for the admin node
Use of $# is deprecated at /opt/sgi/sbin/generate-admin-ganglia-files line 344.
Use of uninitialized value in concatenation (.) or string
at /opt/sgi/sbin/generate-admin-ganglia-files line 344.
Shutting down gmond done
Starting gmond done
Shutting down gmetadsaving /dev/shm/rrds to /var/lib/ganglia/snaps/snap.tar.gz
done
Starting gmetadrrd directory already exists in /dev/shm
snaps directory already exists in /var/lib/ganglia
restoring /dev/shm/rrds from /var/lib/ganglia/snaps/snap.tar.gz
done
/opt/sgi/sbin/discover-rack: Running [generate-admin-dns-zonefile]
to generate DNS zone files for the admin node
Shutting down name server BIND waiting for named to shut down (28s) done
Starting name server BIND done
/opt/sgi/sbin/discover-rack: Running [generate-conserver-files]
to generate conserver.cf files for admin and leader node
Reloading conserver: ..done
Reloading conserver: done
/opt/sgi/sbin/discover-rack: Running [push-and-set-default-compute-image --rack 1]
to push default compute node image to rack 1 and set new blades to boot it. |
At this point, the compute nodes (blades) are ready to be powered up for the first time. They are configured to use the default compute node software image. For information on how to customize the compute node software images for your site, see “Customizing Compute Node Software” in Chapter 3.
For instructions on how to configure, start, verify, or stop the InfiniBand Fabric management software on your Altix ICE system, see Chapter 4, “System Fabric Management”.
| Note: The InfiniBand fabric does not automatically configure itself. For information on how to configure and start up the InfiniBand fabric, see Chapter 4, “System Fabric Management”. |
This section describes how to configure a service node and covers the following topics:
You may want to reach network services outside of your SGI Altix ICE 8200 system. For this type of access, SGI recommends using Network Address Translation (NAT), also known as IP Masquerading or Network Masquerading. Depending on the amount of network traffic and your site needs, you may want to have multiple service nodes providing NAT services.
To enable NAT on your service node, perform the following steps:
Use the configuration tools provided on your service node to turn on IP forwarding and enable NAT/IP MASQUERADE.
Specific instructions should be available in the third-party documentation provided for your storage node system. For a service node running SUSE Linux Enterprise Server (SLES), see the documentation at /opt/sgi/docs/setting-up-NAT/README. This document describes how to get NAT working for both IB interfaces.
| Note: This file is only on the service node. To read it, log in to the service node (# ssh service0) and then change to the documentation directory (# cd /opt/sgi/docs/setting-up-NAT). |
Update all of the compute node images with a default route configured for NAT.
SGI recommends using the script on the system admin controller at /opt/sgi/share/per_host_customization/global/sgi-static-routes, which can customize the routes based upon the rack, IRU, and slot of the compute blade. Some examples are available in that script.
Use the cimage --add-rack command to propagate the changes to the proper location for compute nodes to boot. For more information on using the cimage command, see “cimage Command” in Chapter 3 and “Customizing Compute Node Software” in Chapter 3.
Use the cimage --set command to select the image.
Reboot or reset the compute nodes using the desired image.
Once the service node(s) has NAT enabled, is attached to an operational house network, and the compute nodes are booted from an image which sets their routing to point at the service node, test the NAT operation by using the ping(8) command to ping known IP addresses on the house network from an interactive session on the compute blade.
See the troubleshooting discussion that follows.
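As a rough illustration of the first step of this procedure, the sketch below prints generic Linux commands that turn on IP forwarding and masquerade traffic leaving the public interface. It is not taken from the README mentioned above, which remains the reference for SLES service nodes. The function name nat_commands is hypothetical, eth1 is assumed to be the house-network interface, and settings made this way do not persist across a reboot.

```shell
#!/bin/sh
# nat_commands [PUBLIC_IF]
# Hypothetical helper: prints the commands that would enable IP forwarding
# and masquerade traffic leaving PUBLIC_IF (default eth1, an assumption).
# Nothing is executed by this function.
nat_commands() {
    pub=${1:-eth1}
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "iptables -t nat -A POSTROUTING -o $pub -j MASQUERADE"
}
```

Review the output of `nat_commands eth1` first. If it suits your site, the printed commands can be run as root on the service node.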
Troubleshooting can become very complex. The first steps are to determine that the service node(s) are correctly configured for the house network and can ping the house IP addresses. Good choices are house name servers possibly found in /etc/resolv.conf or /etc/named.conf files. Additionally, the default gateway addresses for the service node may be a good choice. You can use the netstat -rn command for this information, as follows:
system-1:/ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
128.162.244.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.0.0      0.0.0.0         255.255.0.0     U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 eth1
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         128.162.244.1   0.0.0.0         UG        0 0          0 eth0 |
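To pick the default gateway out of this output without reading the table by eye, the routing table can be filtered with awk. This is a minimal sketch; the function name default_gateway is hypothetical, and it assumes the eight-column netstat -rn layout shown above.

```shell
#!/bin/sh
# default_gateway
# Reads "netstat -rn" output on stdin and prints the gateway address and
# interface of the default route (destination 0.0.0.0 with the G flag set).
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2, $8 }'
}
```

On the service node, `netstat -rn | default_gateway` prints the gateway and interface, which gives you an address to ping on the house network.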
If the ping command executed from the service node to the selected IP address gets responses, use network monitoring tools such as tcpdump(1) to continue. On the service node, monitor the eth1 interface and, simultaneously in a separate session, monitor the ib[01] interface. Make the capture filter specific enough to exclude unrelated traffic, and then execute a ping command from the compute node.
Example 2-3. tcpdump Command Examples
tcpdump -i eth1 ip proto ICMP   # Dump ping packets on the public side of service node.
tcpdump -i ib1 ip proto ICMP    # Dump ping packets on the IB fabric side of service node.
tcpdump -i eth1 port nfs        # Dump NFS traffic on the eth1 side of service node.
tcpdump -i ib1 port nfs         # Dump NFS traffic on the ib1 side of service node. |
If packets do not reach the service node's respective IB interface, perform the following:
Check the system admin controller's compute image configuration of the default route
Verify that this image has been pushed to the compute nodes
Verify that the compute nodes have booted with this image
If the packets reach the service node's IB interface, but do not exit the eth1 interface, verify the NAT configuration on the service node.
If the packets exit the eth1 interface, but replies do not return, verify the house network configuration and that IP masquerading is properly configured so that the packets exiting the interface appear to be originating from the service node and not the compute node.
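The checks above form a simple decision sequence based on where the ping packets were last seen. The sketch below encodes that sequence so that it can be kept alongside site notes. The function name nat_next_check and its yes/no arguments are hypothetical; the advice strings paraphrase this section.

```shell
#!/bin/sh
# nat_next_check IB_SEEN ETH1_SEEN REPLY_SEEN
# Each argument is "yes" or "no": were the compute node's ping packets seen
# on the service node IB interface, seen leaving eth1, and answered?
# Prints the next thing to verify, following the order used in this section.
nat_next_check() {
    if [ "$1" != yes ]; then
        echo "check the default route in the compute image and that the nodes booted that image"
    elif [ "$2" != yes ]; then
        echo "check the NAT configuration on the service node"
    elif [ "$3" != yes ]; then
        echo "check the house network configuration and IP masquerading"
    else
        echo "no NAT problem observed"
    fi
}
```

For example, `nat_next_check yes no no` points at the NAT configuration on the service node.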
You may choose to connect your compute nodes using routable addresses on the house network. This requires planning before the installation: you must reserve a large block of routable IP addresses on the house network and take the correct steps early in the installation.
| Note: Placing a fabric on the house network does make it more susceptible to bandwidth and latency fluctuations due to undesired or unexpected network traffic. |
To connect your compute nodes using routable addresses on the house network, perform the following steps:
When you enter IP values into the configure-cluster script, make sure to assign IP addresses in the routable range to the IB fabric(s) that you want to make routable.
You can make either ib0, ib1, or both routable on the house network. Careful planning is required.
After house network addresses are assigned, you need to use the service node(s) operating system tools to enable IP forwarding and configure the house routers or network infrastructure to route addresses for the desired fabrics through the desired service nodes.
All of these steps are extremely site specific. Therefore, you need to rely on your network administrators to set up this type of configuration.
For information on setting up DNS, see “Installing Software on the System Admin Controller”.
Assuming the installation has either NAT or Gateway operations configured on one or more service nodes, the compute nodes can directly mount the house NFS server's exports (see the exports(5) man page).
To allow the compute nodes to directly mount the house NFS server's exports, perform the following steps:
Edit the system admin controller's /opt/sgi/share/per_host_customization/global/sgi-fstab file or, alternatively, an image-specific script. An example of the sgi-fstab file is as follows:
system-1-admin:/opt/sgi/share/per-host-customization/global # cat sgi-fstab
#!/bin/sh
#
# Set up the compute node's /etc/fstab file.
#
# Modify per your site's requirements.
#
# This script is executed once per-host as part of the install-image operation
# run on the leader nodes. The full path to the per-host iru+slot directory is
# passed in as $1, e.g. /var/lib/sgi/per-host//i2n11.
#
iruslot=$1
cat <<EOF >${iruslot}/etc/fstab
# tmpfs /tmp tmpfs defaults 0 0
EOF |
Add the mount point, push the image, and reset the node.
The server's export should get mounted. If it is not, use the technique for troubleshooting outlined in “Troubleshooting Service Node Configuration for NAT”.
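To make the "add the mount point" step concrete, the sketch below shows what an sgi-fstab style script might look like with one NFS entry added. It is an illustration only: the server name house-nfs, the export /export/home, the mount point /home, and the function name write_fstab are placeholders and not values from this manual. The mkdir call is included only so that the sketch can be tried against a scratch directory.

```shell
#!/bin/sh
# write_fstab PER_HOST_DIR
# Sketch of the body of an sgi-fstab style script with one NFS mount added.
# "house-nfs:/export/home" and "/home" are placeholders for your site.
write_fstab() {
    iruslot=$1
    mkdir -p "${iruslot}/etc"    # only needed when trying this in a scratch dir
    cat <<EOF >"${iruslot}/etc/fstab"
# tmpfs /tmp tmpfs defaults 0 0
house-nfs:/export/home /home nfs defaults 0 0
EOF
}
```

Running `write_fstab /tmp/scratch/i2n11` and inspecting /tmp/scratch/i2n11/etc/fstab shows the file that a compute node would receive. The mount point directory must also exist in the compute image.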
This section describes two different ways to configure a service node for NIS, as follows:
NIS with the compute nodes directly accessing the house NIS infrastructure
NIS with a service node as a NIS slave server to the house NIS master
Assuming the installation has either Network Address Translation (NAT) or Gateway operations configured on one or more service nodes, the compute nodes can directly access the house NIS servers. Broadcast operations for discovering NIS servers do not typically work. Therefore, you need to configure the compute images with the IP address of the NIS server to which you want them to connect.
Procedure 2-7. Service Node Configuration for NIS with the Compute Nodes Directly Accessing the House NIS Infrastructure
To configure NIS on a compute node, perform the following steps:
Clone a compute image which you would like to extend to use NIS (see “cimage Command” in Chapter 3 and “Customizing Compute Node Software” in Chapter 3).
| Note: The default installation does not contain the ypbind package. You need to install it for use in your cloned image. |
Install the ypbind package using the operating system package manager.
Use the operating system configuration tools to configure the ypbind software. See your operating system documentation for instructions on configuring ypbind for NIS operations and the ypbind(8) man page.
Push this new image out to the compute nodes and reboot the system to test the configuration.
If the compute blades fail to connect to the NIS server, use the technique for troubleshooting outlined in “Troubleshooting Service Node Configuration for NAT”.
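Because broadcast discovery of NIS servers typically does not work from the compute nodes, ypbind has to be pointed at a named server. In /etc/yp.conf this is normally done with a line of the form "domain NISDOMAIN server ADDRESS". The sketch below generates such a line; the function name yp_conf_line, the domain example.nis, and the address 192.0.2.10 are placeholders, and your operating system's configuration tools may write the file differently.

```shell
#!/bin/sh
# yp_conf_line NIS_DOMAIN SERVER
# Prints a yp.conf entry that binds to one named server instead of relying
# on broadcast. Both arguments are site specific.
yp_conf_line() {
    printf 'domain %s server %s\n' "$1" "$2"
}
```

For example, `yp_conf_line example.nis 192.0.2.10` prints a line that could be appended to etc/yp.conf inside the cloned compute image before the image is pushed.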
To configure NIS with a service node as a NIS slave server to the house NIS master, perform the following steps:
Make sure your network administrator has authorized the service node to act as a slave server.
Use the service node operating system tools to configure the NIS slave server on the service node.
Use the ypwhich(1) command to verify that it shows localhost as the current server, and use ypcat(1) passwd to verify that the output looks consistent with what you expect.
| Note: You may have some issues with configuration tools, such as removing parts of the host name or IP address for the server. This can be solved by creating an /etc/hosts record. |
Install the ypbind package using the operating system package manager.
Use the operating system configuration tools to configure the ypbind software. See your operating system documentation for instructions on configuring ypbind for NIS operations and the ypbind(8) man page.
Push this new image out to the compute nodes and reboot the system to test the configuration.
If the compute blades fail to connect to the NIS server, use the technique for troubleshooting outlined in “Troubleshooting Service Node Configuration for NAT”.
| Note: Multiple service nodes can be used as NIS slave servers. |
This section describes how to make a service node an NFS home directory server for the compute nodes.
| Note: Having a single, small server provide filesystems to the whole Altix ICE system could create network bottlenecks that the hierarchical design of Altix ICE is meant to avoid, especially if large files are stored there. Consider putting your home filesystems on a NAS file server. For instructions on how to do this, see “Service Node Configuration for NFS”. |
The instructions in this section assume you are using the service node image provided with the Tempo software. If you are using your own installation procedures or a different operating system, the instructions will not be exact but the approach is still appropriate.
When you are choosing a disk, please consider the following:
The Tempo installation procedure overwrites data at /dev/sda. Keep /dev/sda exclusively for use by the system.
Most administrators use /dev/sdb to house the home directories. Depending on your hardware, these devices may be single disks or RAIDs.
This example uses the name sdb, but disk ordering can sometimes be complicated. Make sure sdb matches the disk you really want to use, and then use filesystem LABELs to ensure that the correct filesystem is mounted regardless of the device letter it is assigned.
| Note: Steps 1 through 7 of this procedure are performed on the service node. Steps 8 and 9 are performed from the system admin controller (admin node). |
Use the parted(8) utility or some other partition tool to create a partition on /dev/sdb. The following example makes one filesystem out of the disk. You can use the parted utility interactively or in a command-line driven manner.
Make a new msdos label, as follows:
# parted /dev/sdb mklabel msdos |
Find the size of the disk, as follows:
# parted /dev/sdb print
Disk geometry for /dev/sdb: 0kB - 500GB
Disk label type: msdos
Number  Start  End  Size  Type  File system  Flags
Information: Don't forget to update /etc/fstab, if necessary. |
Create a partition that spans the disk, as follows:
# parted /dev/sdb mkpart primary ext2 0 500GB |
Create a filesystem on the disk. You can choose the filesystem type.
| Note: The mkfs.ext3 command takes more than 10 minutes to create a single 500GB filesystem using default mkfs.ext3 options. If you do not need the number of inodes created by default, use the -N option to mkfs.ext3 or other options that reduce the number of inodes. The following example creates 20 million inodes. XFS filesystems can be created in much shorter time. |
# mkfs.ext3 -L mylabel -N 20000000 /dev/sdb1 |
# mkfs.xfs -L mylabel /dev/sdb1 |
| Note: SGI suggests using a label in this step with the -L parameter, as shown here in both examples. |
Issue the following command to cause the /dev/disk/by-label device to be ready for use immediately and avoid rebooting after creating your home filesystem:
# udevtrigger |
Add the newly created filesystem to the server's fstab file and mount it. Ensure that the new filesystem is exported and that the NFS service is running, as follows:
Append the following line to your /etc/fstab file.
| Note: If you are using XFS, replace ext3 with xfs. Note that LABEL= is used here rather than an actual device name. |
LABEL=mylabel /home ext3 defaults 1 2 |
Add the /home filesystem to /etc/exports.
| Note: You may wish to use a more secure export than shown here. See the exports(5) man page for information. |
/home *(rw,sync,mountpoint=/home,no_subtree_check) |
Make sure the NFS server service is enabled, as follows:
# chkconfig nfsserver on
# rcnfsserver restart |
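If you script the /etc/fstab and /etc/exports edits above, rerunning the script would append the same line twice. The following is a minimal sketch of one way to avoid that. The append_once function is a hypothetical helper written for this illustration; it is not part of the Tempo software.

```shell
# append_once FILE LINE
# Hypothetical helper (not part of Tempo): append LINE to FILE only if
# an identical line is not already present.
append_once() {
    file=$1
    line=$2
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use on the service node, as root (lines taken from the steps above):
# append_once /etc/fstab   'LABEL=mylabel /home ext3 defaults 1 2'
# append_once /etc/exports '/home *(rw,sync,mountpoint=/home,no_subtree_check)'
```

The helper only guards against exact duplicate lines. It does not detect a different entry for the same mount point, so review the files after running it.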
The following steps describe how to mount the home filesystem on the compute nodes:
| Note: SGI recommends that you always work on clones of the SGI-supplied compute image so that you always have a base copy to fall back to if necessary. For information on cloning a compute node image, see “Creating a Simple Compute Node Image Clone” in Chapter 3. |
Make a mount point in the blade image. In the following example, /home already is a mount point. If you used a different mount point, you need to do something similar to the following on the system admin controller. Note that the rest of the examples will resume using /home.
# mkdir /var/lib/systemimager/images/compute-sles10sp1-clone/my-mount-point |
Add the /home filesystem to the compute nodes. SGI supplies an example script for managing this. You only need to add your new mount point to the sgi-fstab per-host-customization script.
Use a text editor to edit the following file:
/opt/sgi/share/per-host-customization/global/sgi-fstab |
Insert the following line just before the "EOF" line in sgi-fstab file:
service-ib1:/home /home nfs hard 0 0 |
| Note: In order to maximize performance, SGI advises that the ib0 fabric be used for all MPI traffic. The ib1 fabric is reserved for storage related traffic. |
Use the cimage command to push the update to the rack leader controllers serving each compute node, as follows:
# cimage --add-rack compute-sles10sp1-clone "r*" |
Using --add-rack on an image that is already on the rack leader controllers simply updates them with the change you made above. For more information on using the cimage command, see “cimage Command” in Chapter 3.
When you reboot the compute nodes, they will mount your new home filesystem.
For information on centrally managed user accounts, see “Setting Up a NIS Server for Your Altix ICE System”. It describes NIS master set up. In this design, the master server residing on the service node provides the filesystem and the NIS slaves reside on the rack leader controllers. If you have more than one home server, you need to export all home filesystems on all home servers to the server acting as the NIS master. You also need to export the filesystems to the NIS master using the no_root_squash exports flag.
If you want to use a NAS server for scratch storage or make home filesystems available on NAS, you can follow the instructions in “Setting Up an NFS Home Server on a Service Node for Your Altix ICE System”. In that procedure, replace service-ib1 with the ib1 InfiniBand host name for the NAS server. You also need to know where on the NAS server the home filesystem is mounted to craft the sgi-fstab script properly.
This section describes how to set up a network information service (NIS) server running SLES10 for your Altix ICE system. If you would like to use an existing house network NIS server, see “Service Node Configuration for NIS for the House Network”. This section covers the following topics:
In the procedures that follow in this section, here are some of the tasks you need to perform and system features you need to consider:
Make a service node the NIS master
Make the rack leader controllers (leader nodes) the NIS slave servers
Do not make the system admin controller the NIS master, because it may not be able to mount all of the storage types. Having the storage mounted on the NIS master server makes it far less complicated to add new accounts using NIS.
If multiple service nodes provide home filesystems, the NIS master should mount all remote home filesystems. They should be exported to the NIS master service node with the no_root_squash export option. The example in the following section assumes a single service node with storage and that same node is the NIS master.
NIS synchronization traffic between the NIS master and the slave servers (leader nodes) goes over InfiniBand connections when NIS maps are adjusted and pushed out.
Service node NIS (besides NIS master) traffic goes over InfiniBand because of how host name resolution works.
Compute node NIS traffic goes over Ethernet, not InfiniBand, because the yp.conf file uses the lead-eth server name. This design feature prevents NIS traffic from affecting the InfiniBand traffic between the compute nodes.
This section describes how to set up a service node as a NIS master. This section only applies to service nodes running SLES10.
To set up a service node as a NIS master, perform the following steps:
| Note: These instructions use the text-based version of YaST. The graphical version of YaST may be slightly different. |
Start up YaST, as follows:
# yast nis_server |
Choose Create NIS Master Server and click on Next to continue.
Choose an NIS domain name and place it in the NIS Domain Name window. This example uses ice.
Select This host is also a NIS client.
Select Active Slave NIS server exists.
Select Fast Map distribution.
Select Allow changes to passwords.
Click on Next to continue.
Set up the NIS master server slaves.
| Note: You are now in the NIS Master Server Slaves Setup. At this point, you can enter the rack leader controllers (leader nodes) that are already defined. If you add more leader nodes or re-discover leader nodes, you will need to change this list. For more information, see “Tasks You Should Perform After Changing a Rack Leader Controller”. |
Select Add and enter r1lead-ib1 in the Edit Slave window. Enter any other rack leader controllers you may have just like above. Click on Next to continue.
| Note: This example uses r1lead-ib1 because r1lead would not resolve to anything on a service node. |
You are now in NIS Server Maps Setup. The default selected maps are okay. Avoid using the hosts map (not selected by default) because it can interfere with Altix ICE system operations. Click on Next to continue.
You are now in NIS Server Query Hosts Setup. Use the default settings here. However, you may want to adjust settings for security purposes. Click on Next to continue.
At this point, the NIS master is configured. Assuming you checked the This host is also a NIS client box, the service node is configured as a NIS client to itself and ypbind is started for you.
This section describes how to use YaST to set up your other service nodes to be broadcast binding NIS clients. This section only applies to service nodes running SLES10.
| Note: You do not do this on the NIS Master service node that you already configured as a client in “Setting Up a Service Node as a NIS Master”. |
To set up a service node as a NIS client, perform the following steps:
Enable ypbind, as follows:
# chkconfig ypbind on |
Set the default domain (already set on the NIS master). Change ice (or whatever domain name you chose above) to be the NIS domain for your Altix ICE system, as follows:
# echo "ice" > /etc/defaultdomain |
Set up the service node to broadcast bind by creating this simple yp.conf file, as follows:
# echo "broadcast" > /etc/yp.conf |
Start the ypbind service, as follows:
# rcypbind start |
The service node is now bound.
Add the NIS include statement to the end of the password and group files, as follows:
# echo "+:::" >> /etc/group
# echo "+::::::" >> /etc/passwd
# echo "+" >> /etc/shadow |
This section provides two sets of instructions for setting up rack leader controllers (leader nodes) as NIS slave servers. One set of instructions uses YaST, the other uses a set of commands that could be scripted if you so choose. It is possible to make all these adjustments to the leader image in /var/lib/systemimager/images. Currently, SGI does not recommend using this approach.
| Note: Be sure the InfiniBand interfaces are up and running before proceeding because the rack leader controller gets its updates from the NIS Master over the InfiniBand network. If you get a "can't enumerate maps from service0" error, check to be sure the InfiniBand network is operational. |
Use the following set of commands to set up a rack leader controller (leader node) as a NIS slave server and client.
| Note: Replace ice with your NIS domain name and service0 with the service node you set up as the master server. |
# cexec --head chkconfig ypserv on
# cexec --head chkconfig ypbind on
# cexec --head chkconfig portmap on
# cexec --head chkconfig nscd on
# cexec --head rcportmap start
# cexec --head "echo ice > /etc/defaultdomain"
# cexec --head "ypdomainname ice"
# cexec --head "echo ypserver 127.0.0.1 > /etc/yp.conf"
# cexec --head "echo +::: >> /etc/group"
# cexec --head "echo +:::::: >> /etc/passwd"
# cexec --head "echo + >> /etc/shadow"
# cexec --head /usr/lib/yp/ypinit -s service0
# cexec --head rcportmap start
# cexec --head rcypserv start
# cexec --head rcypbind start
# cexec --head rcnscd start |
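Because the NIS domain and the master server name appear in several of the commands above, a scripted version can take them as parameters. The following is a minimal sketch under that assumption. The leader_nis_commands function is a hypothetical helper for illustration and only prints the same command sequence, so you can review it before running anything.

```shell
# leader_nis_commands DOMAIN MASTER
# Hypothetical helper (not part of Tempo): print the cexec command sequence
# shown above, with the NIS domain and master server substituted.
leader_nis_commands() {
    domain=$1
    master=$2
    for svc in ypserv ypbind portmap nscd; do
        echo "cexec --head chkconfig $svc on"
    done
    cat <<EOF
cexec --head rcportmap start
cexec --head "echo $domain > /etc/defaultdomain"
cexec --head "ypdomainname $domain"
cexec --head "echo ypserver 127.0.0.1 > /etc/yp.conf"
cexec --head "echo +::: >> /etc/group"
cexec --head "echo +:::::: >> /etc/passwd"
cexec --head "echo + >> /etc/shadow"
cexec --head /usr/lib/yp/ypinit -s $master
cexec --head rcportmap start
cexec --head rcypserv start
cexec --head rcypbind start
cexec --head rcnscd start
EOF
}

# Review the generated commands first, then pipe them to sh on the admin node:
# leader_nis_commands ice service0
# leader_nis_commands ice service0 | sh
```

Printing the commands instead of executing them directly keeps the sketch safe to try and makes it easy to compare the output with the list above.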
This section describes how to set up the compute nodes to be NIS clients. You can configure NIS on the clients to use a server list that contains only their rack leader controller (leader node). All operations are performed from the system admin controller (admin node).
To set up the compute nodes to be NIS clients, perform the following steps:
Create a compute node image clone. SGI recommends that you always work with a clone of the compute node images. For information on how to clone the compute node image, see “Creating a Simple Compute Node Image Clone” in Chapter 3.
Change the compute nodes to use the cloned image/kernel pair, as follows:
# cimage --set compute-sles10sp1-clone 2.6.16.46-0.12-smp "r*i*n*" |
Set up the NIS domain, as follows (ice in this example):
# echo "ice" > /var/lib/systemimager/images/compute-sles10sp1-clone/etc/defaultdomain |
Set up compute nodes to get their NIS service from their rack leader controller (fix the domain name as appropriate), as follows:
# echo "ypserver lead-eth" > /var/lib/systemimager/images/compute-sles10sp1-clone/etc/yp.conf |
Enable the ypbind service, using the chroot command, as follows:
# chroot /var/lib/systemimager/images/compute-sles10sp1-clone chkconfig ypbind on |
Set up the password, shadow, and group files with NIS includes, as follows:
# echo "+:::" >> /var/lib/systemimager/images/compute-sles10sp1-clone/etc/group
# echo "+::::::" >> /var/lib/systemimager/images/compute-sles10sp1-clone/etc/passwd
# echo "+" >> /var/lib/systemimager/images/compute-sles10sp1-clone/etc/shadow |
Push out the updates using the cimage command, as follows:
# cimage --add-rack compute-sles10sp1-clone "r*" |
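The file edits in the steps above all target the same image directory, so they can be grouped into one function that takes the image path and NIS domain as arguments. The following is a minimal sketch. The setup_nis_client_image function is a hypothetical name used for illustration, and the chroot and cimage steps are left as separate commands in the comments.

```shell
# setup_nis_client_image IMG DOMAIN
# Hypothetical helper (not part of Tempo): apply the NIS client file edits
# from the steps above to the compute image tree rooted at IMG.
setup_nis_client_image() {
    img=$1
    domain=$2
    echo "$domain" > "$img/etc/defaultdomain"       # NIS domain
    echo "ypserver lead-eth" > "$img/etc/yp.conf"   # bind to the rack leader over Ethernet
    echo "+:::" >> "$img/etc/group"                 # NIS include entries
    echo "+::::::" >> "$img/etc/passwd"
    echo "+" >> "$img/etc/shadow"
}

# Example on the admin node, as root:
# IMG=/var/lib/systemimager/images/compute-sles10sp1-clone
# setup_nis_client_image "$IMG" ice
# chroot "$IMG" chkconfig ypbind on
# cimage --add-rack compute-sles10sp1-clone "r*"
```

Run the function only once per image clone, because each run appends another set of include entries to the group, passwd, and shadow files.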
The NAS cube needs to get configured with each InfiniBand fabric interface in a separate subnet. These fabrics will be separated from each other logically, but attached to the same physical network. For simplicity, this guide assumes that the -ib1 network for the compute nodes has addresses assigned in the 10.149.0.0/16 network. This guide also assumes the lowest address the cluster management software has used is 10.149.0.1 and the highest is 10.149.1.3 (already assigned to the NAS cube).
For the NAS cube, you need to configure the large physical network into four smaller subnets, each of which is capable of containing all the nodes and service nodes. The subnets are 10.149.0.0/18, 10.149.64.0/18, 10.149.128.0/18, and 10.149.192.0/18.
After the storage node has been discovered, SGI personnel need to log onto the NAS box, change the network settings to use the smaller subnets, and then define the other three adapters with the same offset within each subnet. For example, suppose the initial configuration of the storage node set the ib0 fabric's IP address to 10.149.1.3 with netmask 255.255.0.0. After the addresses are changed, ib0=10.149.1.3:255.255.192.0, ib1=10.149.65.3:255.255.192.0, ib2=10.149.129.3:255.255.192.0, and ib3=10.149.193.3:255.255.192.0. The NAS cube should now have all four adapter connections connected to the fabric with IP addresses that can be pinged from the service node.
| Note: The service nodes and the rack leads will remain in the 10.149.0.0/16 subnet. |
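The adapter addresses above follow from simple arithmetic: each /18 subnet of a /16 network spans 64 values of the third octet, so adapter number n keeps the same offset and adds n * 64 to the third octet. The following is a minimal sketch of that calculation. The nas_adapter_ip function is a hypothetical name used for illustration only.

```shell
# nas_adapter_ip BASE N
# Hypothetical helper: print the address of NAS adapter N (0 to 3), given the
# ib0 address BASE. Each /18 subnet covers 64 values of the third octet.
nas_adapter_ip() {
    base=$1                 # for example 10.149.1.3
    n=$2                    # adapter index, 0 to 3
    oct12=${base%.*.*}      # first two octets, for example 10.149
    rest=${base#*.*.}       # last two octets, for example 1.3
    oct3=${rest%.*}
    oct4=${rest#*.}
    echo "$oct12.$(( $oct3 + $n * 64 )).$oct4"
}

# List the four adapter settings for the example address used in this guide
for n in 0 1 2 3; do
    echo "ib$n=$(nas_adapter_ip 10.149.1.3 $n):255.255.192.0"
done
```

The netmask 255.255.192.0 corresponds to the /18 prefix, because 192 sets the top two bits of the third octet.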
For the compute blades, log into the admin node and modify the /opt/sgi/share/per_host_customizations/global/sgi-setup-ib-configs file. Following the line iruslot=$1, insert:
# Compute NAS interface to use
IRU_NODE=`basename ${iruslot}`
RACK=`cminfo --rack`
RACK=$(( ${RACK} - 1 ))
IRU=`echo ${IRU_NODE} | sed -e s/i// -e s/n.*//`
NODE=`echo ${IRU_NODE} | sed -e s/.*n//`
POSITION=$(( ${IRU} * 16 + ${NODE} ))
POSITION=$(( ${RACK} * 64 + ${POSITION} ))
NAS_IF=$(( ${POSITION} % 4 ))
NAS_IPS[0]="10.149.1.3"
NAS_IPS[1]="10.149.65.3"
NAS_IPS[2]="10.149.129.3"
NAS_IPS[3]="10.149.193.3" |
Then following the line . $iruslot/etc/opt/sgi/cminfo add:
IB_1_OCT12=`echo ${IB_1_IP} | awk -F "." '{ print $1 "." $2 }'`
IB_1_OCT3=`echo ${IB_1_IP} | awk -F "." '{ print $3 }'`
IB_1_OCT4=`echo ${IB_1_IP} | awk -F "." '{ print $4 }'`
IB_1_OCT3=$(( ${IB_1_OCT3} + ${NAS_IF} * 64 ))
IB_1_NAS_IP="${IB_1_OCT12}.${IB_1_OCT3}.${IB_1_OCT4}" |
Then change the IPADDR='${IB_1_IP}' and NETMASK='${IB_1_NETMASK}' lines to the following:
IPADDR='${IB_1_NAS_IP}'
NETMASK='255.255.192.0' |
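To see how the fragments above distribute blades across the four NAS interfaces, it can help to work through the arithmetic for a single blade. The following is a minimal sketch that repeats the same calculation as standalone functions. The names nas_if_for and nas_client_ip are hypothetical, and the address 10.149.0.20 in the usage comments is an assumed example, not a value taken from a real system.

```shell
# nas_if_for RACK IRU_NODE
# Hypothetical helper: repeat the POSITION and NAS_IF arithmetic shown above.
# RACK is the 1-based rack number; IRU_NODE is a blade name such as i0n5.
nas_if_for() {
    rack=$1
    iru_node=$2
    iru=$(echo "$iru_node" | sed -e 's/i//' -e 's/n.*//')
    node=$(echo "$iru_node" | sed -e 's/.*n//')
    position=$(( ($rack - 1) * 64 + $iru * 16 + $node ))
    echo $(( $position % 4 ))
}

# nas_client_ip IB1_IP NAS_IF
# Hypothetical helper: shift the third octet of the blade's ib1 address
# by NAS_IF * 64, as the IB_1_NAS_IP lines above do.
nas_client_ip() {
    echo "$1" | awk -F "." -v k="$2" '{ print $1 "." $2 "." ($3 + k * 64) "." $4 }'
}

# Blade i0n5 in rack 1 has position 5, so it selects interface 5 % 4 = 1:
# nas_if_for 1 i0n5             prints 1
# nas_client_ip 10.149.0.20 1   prints 10.149.64.20
```

Consecutive blade positions rotate through interfaces 0 to 3, which spreads the NFS traffic evenly across the four NAS adapters.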
Then add the following to the end of the file:
# ib-1-vlan config
cat << EOF > $iruslot/etc/sysconfig/network/ifcfg-vlan1
# ifcfg config file for vlan ib1
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='${IB_1_IP}'
MTU=''
NETMASK='255.255.192.0'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ETHERDEVICE='ib1'
EOF
if [ $NAS_IF -eq 0 ]; then
rm $iruslot/etc/sysconfig/network/ifcfg-vlan1
fi |
To update the fstab for the compute blades, edit the /opt/sgi/share/per-host-customization/global/sgi-fstab file. Perform the same steps as above to add the # Compute NAS interface to use section to this file. Then, to specify mount points, add lines similar to the following example:
# SGI NAS Server Mounts
${NAS_IPS[${NAS_IF}]}:/mnt/data/scratch /scratch nfs defaults 0 0 |
If you add or remove a rack leader controller (leader node), for example, if you use the discover command to discover a new rack of equipment, you need to configure the new rack leader controller to be an NIS slave server as described in “Setting Up a Service Node as a NIS Client”.
In addition, you need to add or remove the leader from the /var/yp/ypservers file on the NIS Master service node. Remember to use the -ib1 name for the leader, as service nodes cannot resolve r2lead style names. For example, use r2lead-ib1. Then rebuild the NIS maps, as follows:
# cd /var/yp && make |
The example used in this section assumes that the home directory is mounted on the NIS Master service node and that the NIS master is able to create directories and files on it as root. The following example uses command-line commands. You could also create accounts using YaST.
To create user accounts on the NIS server, perform the following steps:
Log in to the NIS Master service node as root.
Issue a useradd command similar to the following:
# useradd -c "Joe User" -m -d /home/juser juser |
Provide the user a password, as follows:
# passwd juser |
Push the new account to the NIS servers, as follows:
# cd /var/yp && make |