Chapter 2. Getting Started with Scali Manage

This section describes how to install, configure, discover, and operate your SGI Altix ICE system using the Scali Manage software management tool.


Note: SGI Altix ICE systems running Scali Manage software are shipped pre-installed. Instructions in this section for defining and discovering nodes can be used if you are expanding the initially delivered cluster or reinstalling your software. They are NOT for configuring the initially delivered cluster.


Installing or Updating Software

Scali Manage offers a mechanism to upload and install software across the SGI Altix ICE system. This upload and installation process requires that the software installation be in RPM format. Tarball software distributions can be installed across a cluster.

Instructions for installing software options or uploading additional software for your system using the Scali GUI are covered in Chapter 3 of the Scali Manage User's Guide.

Customers with support contracts who need BIOS or firmware updates should check the SGI Supportfolio web page at https://support.sgi.com/login

Administrative Tips

This section describes some useful administrative tips.

System Password Information

Root password and administrative information includes:

  • Root password = sgisgi (system admin controller (admin node) and compute nodes)

  • ipmitool user/password information: User = ADMIN Password = ADMIN

Power on or Power off System Components or Obtain Status

To power on or power off system components, use the Scali Manage power command. To get a system console, use the Scali Manage console command. See "The Power Interface" and "The Console Interface" in Chapter 9, "Scali Manage Command Line Interfaces", of the Scali Manage User's Guide.

From the admin node, to power on compute nodes r01i01n01 and r01i01n02, perform the following:

system-admin: # scash -p -n r01lead /opt/scali/sbin/power r01i01n0[1,2] on

To check status for compute nodes r01i01n01 and r01i01n02, perform the following:

system-admin: # scash -p -n r01lead /opt/scali/sbin/power r01i01n0[1,2] status
r01lead  : r01i01n01: ON
r01lead  : r01i01n02: ON
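
When the status output covers many nodes, it can be convenient to filter it for nodes that are not powered on. The following is a minimal sketch, not part of Scali Manage: the function name nodes_not_on is hypothetical, the sample input is illustrative, and it assumes that status lines keep the "leader : node: STATE" layout shown above.

```shell
# nodes_not_on: read power status lines on stdin and print the names of
# nodes whose reported state is anything other than ON.
# Assumption: each line looks like "r01lead  : r01i01n01: ON".
nodes_not_on() {
    awk -F':' 'NF >= 3 {
        node = $2; state = $3
        gsub(/[ \t]/, "", node)     # strip padding around the node name
        gsub(/[ \t]/, "", state)    # strip padding around the state
        if (state != "ON") print node
    }'
}

# Illustrative sample input (not captured from a real system).
nodes_not_on <<'EOF'
r01lead  : r01i01n01: ON
r01lead  : r01i01n02: OFF
EOF
# prints: r01i01n02
```

On a live system, the scash status command shown above could be piped into nodes_not_on in place of the sample input.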

To get a console for a service node, perform the following:

system-admin: # console service1
[Enter `^Ec?' for help]


Welcome to SUSE Linux Enterprise Server 10 SP1 (x86_64) - Kernel 2.6.16.46-0.12-smp (console).


service1 login:


Note: For this release, you need to log on to the appropriate leader node to run the power commands.


You can also use the Scali Manage GUI to execute power commands. The Scali Manage GUI supports a clean shutdown, which is required for the service and leader nodes because they have local disks. Power cycling an Altix ICE compute node causes the node to be completely re-imaged.

For information on Scali Manage networking conventions used with the power commands, see “Network Interface Naming Conventions” in Chapter 1.

Scali Manage Installer Directory

The Scali Manage installer directory (/usr/local/Scali###) is the location of the code used to install the Scali cluster management software.

The Factory-Install directory is located on the admin node server at /usr/local/Factory-Install. The /Factory-Install directory contains software files that support the cluster integration and many files and scripts under /usr/local/ that may be helpful, including:

/Factory-Install/Apps 

Scali, ibhost, Intel compilers, MPI runtime libraries, ipmitool, and so on

/Factory-Install/ISO 

CD ISO images of the base OS for installing Scali Cluster Manage software

/Factory-Install/Docs 

Cluster documentation manuals (Scali, PBS Professional, Voltaire, SMC, SGI)

/Factory-Install/Firmware 

Voltaire HCA and Voltaire switch firmware files, and so on

/Factory-Install/CFG 

Cluster configuration files

/Factory-Install/Scripts 

Miscellaneous utility scripts

Scali Manage Command CLI Help

You can get a help statement for the Scali Manage command line interface (CLI) as shown in the following example:

system-1:~ # scalimanage-cli help SGI
----  SGI Altix ICE commands ----
List of commands:
definealtixiceblade - Define Altix ICE Blades(s) 
definealtixicecmc - Define Altix ICE CMCs(s) 
definealtixiceleadnode - Define Altix ICE Lead node(s) 
definealtixicerack - Define Altix ICE Rack(s) 
definealtixiceservicenode - Define Altix ICE Service node(s) 
discoveraltixicecmc - Discover CMC and Blade MAC addresses 
discoveraltixiceservicenode - Find BMC MAC addresses for systems
initaltixicesms - Initate Scali Manage Server for Altix ICE 
poweraltixiceiru - Control power to an Altix ICE IRU 
restartaltixiceopensm - Restart the OpenSM subnetmanagers on the leadnodes 
Type "help" followed by command name for full documentation.
Command name abbreviations are allowed.

To get partial help information, perform the following:

[system-1 ~]# cli help all | grep -i remotefs
addremotefs <systemnames> <fstype> <src> <mntpoint> [options]
listremotefs <systemnames>
removeremotefs <systemnames> <mntpoint>

You can also get help on specific commands, as follows:

system-1:~ # scalimanage-cli help definealtixiceblade

definealtixiceblade <racks> [irus=[1-4]] [slots=[1-16]]
    Define Altix ICE Blades(s)
  
    racks       - Rack number for the blade(s) [..]
    irus        - IRU number for the blade(s) [..]
    slots       - Blade slots for the blade(s) [..]

    Options:
        --irus=IRUS
        --slots=SLOTS

Configuring the Scali Manage Server

After installing the operating system and Scali Manage on the system admin node (the Scali Manage server), run the following command to automatically configure the Scali Manage server according to the Altix ICE network topology:

scalimanage-cli initaltixicesms ProPack_path

This will perform the following actions on your system:

Defining New Racks or Service Nodes

To add one or more racks of compute nodes, perform the following:

scalimanage-cli definealtixicerack <racknumbers>

To add multiple racks in one action, use a bracket expression for the racknumber, such as the following:

scalimanage-cli definealtixicerack [1-16]

Running the command above updates the Scali Manage configuration database and defines the following:

  • One rack leader controller (leader node) per rack, which gives sixteen leader nodes for the example above

  • A rack subnet and a rack BMC subnet per rack

  • Four chassis management controllers (CMCs) per rack

  • 16 compute blades per CMC per rack

Optionally, the number of CMCs per rack and the number of blades per CMC can be specified to define partial Altix ICE rack configurations. The command is a shortcut for definealtixiceleadnode, definealtixicecmc, and definealtixiceblade. For finer-grained control, for example, to add only a partly full rack or to add more compute nodes to an existing rack, use the definealtixiceleadnode command.
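
To sanity-check how many objects a rack definition creates, the per-rack figures listed above can be multiplied out. The following is a small illustrative sketch only: rack_totals is a hypothetical helper, not a Scali Manage command, and its defaults (four CMCs per rack, 16 blades per CMC) are taken from the full-rack figures in the list above.

```shell
# rack_totals RACKS [CMCS_PER_RACK] [BLADES_PER_CMC]
# Print how many leader nodes, CMCs, and compute blades a rack definition
# would create. One leader node is defined per rack.
rack_totals() {
    racks=$1
    cmcs=${2:-4}       # default: four CMCs per rack
    blades=${3:-16}    # default: 16 blades per CMC
    echo "leaders=$racks cmcs=$((racks * cmcs)) blades=$((racks * cmcs * blades))"
}

rack_totals 16       # sixteen full racks; prints: leaders=16 cmcs=64 blades=1024
rack_totals 1 2 2    # one partial rack;   prints: leaders=1 cmcs=2 blades=4
```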

To define a service node, use the definealtixiceservicenode command.

After defining new hardware in the database, the BMCs of the service nodes and leader nodes must be discovered and configured (see “Discovering Service and Leader Nodes”).

Discovering Service and Leader Nodes

Before new service or leader nodes can be installed, the MAC addresses of the associated BMCs must be discovered and IP addresses must be assigned. To do this, perform the following:

scalimanage-cli discoveraltixiceservicenode [systemnames]

This will perform the following actions on your system:

  • Ask the operator to plug in one rack/service node at a time

  • Discover the MAC addresses of the associated BMCs

  • Assign IP addresses to the BMCs via dynamic host configuration protocol (DHCP)

Installing Service and Leader Nodes

To install service nodes or leader nodes, perform the following:

scalimanage-cli install <systemnames> 

If the MAC address of the system is unknown, it will be automatically determined via DHCP discovery.

Discovering CMCs and Compute Nodes

Before compute nodes can be booted, the CMCs must have IP addresses assigned, and the MAC addresses of the BMCs and blades must be discovered and their IP addresses assigned. To do this, perform the following:

scalimanage-cli discoveraltixicecmc [cmcnames]

This will perform the following actions on your system:

  • Discover the MAC addresses of the CMCs

  • Assign IP addresses to the CMCs

  • Power on the IRUs

  • Discover MAC addresses of BMCs through CMCs

  • Assign IP addresses to the BMCs

  • Power on nodes

  • Discover MAC addresses of blades through CMCs

Installing Compute Nodes

As described in Chapter 1, “SGI Altix ICE 8200 System Overview”, on SGI Altix ICE systems the InfiniBand network ib1 is used for storage traffic, the InfiniBand network ib0 is used for MPI traffic, and the Ethernet network is used only for system administration.

The node from which the compute node installation image is created must not have a service-ib1:/home NFS mount entry in the /etc/fstab file. It is very likely that you will need to manually delete this entry before creating an installation image of the node.

Use the scalimanage-cli addremotefs command to create the mounts. Scali Manage runs the NFS mount command later in the boot sequence than the mounting of NFS file systems listed in /etc/fstab.

The mounts created with scalimanage-cli addremotefs are persistent across reboots and reinstalls. If new nodes are added to the system, it is necessary to run scalimanage-cli addremotefs for the new nodes.
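
One way to spot NFS entries that still need to be removed before an image is captured is to list them from the fstab file. The following is a minimal sketch under stated assumptions: list_nfs_entries is a hypothetical helper, the sample file contents are illustrative, and the check relies only on the standard fstab layout in which the third field is the file system type.

```shell
# list_nfs_entries FSTAB_FILE
# Print the source and mount point of every NFS entry in an fstab-format
# file. Comment lines and lines with fewer than three fields are skipped.
list_nfs_entries() {
    awk '$1 !~ /^#/ && NF >= 3 && $3 ~ /^nfs/ { print $1, $2 }' "$1"
}

# Example against a temporary sample file (entries are illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sda2            /      ext3  defaults  1 1
service-ib1:/home    /home  nfs   _netdev   0 0
# service-ib1:/data  /data  nfs   _netdev   0 0
EOF
list_nfs_entries "$sample"    # prints: service-ib1:/home /home
rm -f "$sample"
```

On the node to be imaged, the same function could be pointed at /etc/fstab; an empty result suggests that no NFS entries remain.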

To install a compute node, perform the following steps:

  1. The installation of the compute node can either be a direct installation using packages, or can be an installation using an installation image created from another node, such as a service node. If the former method is used, it is possible to use a compute node specific installation template.

  2. Confirm that there is no service-ib1:/home NFS mount configured.

    Confirm that there is no entry for service-ib1:/home in the /etc/fstab file of the system from which the image is going to be created.

    If this is the initial installation, then no /home entry is expected in the /etc/fstab of the compute node that is about to be imaged. However, if you are not doing an initial installation, then this entry will exist in /etc/fstab. Delete the service-ib1:/home entry from the node's /etc/fstab file. There may be other NFS entries in the /etc/fstab file that also need deleting, such as a service-ib1:/data entry, or entries for off-cluster NFS servers.

  3. Create an installation image from this node.

    From the Altix ICE admin node, run the scalimanage-cli captureimage command. You can get a usage statement for this command, as follows:

    scalimanage-cli help captureimage
    captureimage <systemname> <imagename> [description] [excludes..]
         Capture image from system
         Arguments
             systemname  - name of system
             imagename   - name of image
             description - Description of image; Default none
             excludes    - list of files or directories to be excluded (space separated)
    
         Options:
             --description=DESCRIPTION

    To capture an image from node r01i01n02 and save that image as r01i01n02-image1, perform the following:

    scalimanage-cli captureimage r01i01n02 r01i01n02-image1

  4. Configure the compute nodes to use this image.

    scalimanage-cli help setdiskless
    setdiskless <systemnames> <imagename>
         This method sets systems(s) diskless with software image
         Arguments:
             systemnames - system(s) {[..]}
             imagename   - os image to set

    To set nodes r01i01n02...r01i01n06 to use the image created from r01i01n02, perform the following:

    scalimanage-cli setdiskless r01i01n0[2-6] r01i01n02-image1

    Run scalimanage-cli reconfigure all to propagate the Scali Manage changes.

  5. Configure Scali Manage to manage the service-ib1:/home NFS mount.

    With the image saved, use the scalimanage-cli addremotefs command to add the NFS mount to nodes. You can get a usage statement for this command, as follows:

    scalimanage-cli help addremotefs
    addremotefs <systemnames> <fstype> <src> <mntpoint> [options=_netdev]
         Add mounting for remote filesystem on system(s)
         Arguments:
             systemnames - name of system(s) {[..]}
             fstype      - type of filesystem, legal values: "nfs" "lustre"
             src         - source
             mntpoint    - mountpoint
             options     - options to mount command to be given as -o options to mount.
                           By default options="_netdev".
                           Options values should be comma seperated 
                           for e.g "_netdev,tcp,hard,rsize=64K,wsize=64K,intr"
    
         Options:
             --options=OPTIONS
    

    To configure Scali Manage with the NFS mount, perform the following:

    scalimanage-cli addremotefs r01i01n0[2-6] nfs service1-ib1:/home /home

    Propagate the changes with the scalimanage-cli reconfigure all command. Errors may occur for nodes that have not been installed.

    You can confirm that Scali Manage knows about the mounts, as follows:

    scalimanage-cli listremotefs r01i01n0[2-6]
    

  6. Power off or confirm that the compute nodes are powered off.

    Currently, powering nodes on or off from the Scali Manage GUI does not work correctly from the admin node.

    You can log on to the rack leader controller (leader node) and power off the compute nodes from there using one of the methods described below.

    From the leader node, highlight all nodes in the GUI, right-click, and select Node On/Off > Power Off.

  7. From the rack leader node, use the power command, as follows:

    r01lead:~ # power r01i01n0[2-6] off
    r01i01n02: SUCCESS
    r01i01n03: SUCCESS
    r01i01n04: SUCCESS
    r01i01n05: SUCCESS
    r01i01n06: SUCCESS
    
    r01lead# power r01i01n0[2-6] status
    r01i01n02: OFF
    r01i01n03: OFF
    r01i01n04: OFF
    r01i01n05: OFF
    r01i01n06: OFF

  8. From the rack leader node, you can also use ipmitool, as follows:

    r01lead# /usr/bin/ipmitool -I lanplus -o supermicro -U ADMIN -P ADMIN -H 192.168.1.14 power off
    Chassis Power Control: Down/Off
    
    r01lead# /usr/bin/ipmitool -I lanplus -o supermicro -U ADMIN -P ADMIN -H 192.168.1.14 power status
    Chassis Power is off

  9. Install the compute nodes.

    Starting with the nodes powered off, install the compute nodes from the SGI Altix ICE admin node, as follows:

    scalimanage-cli install r01i01n0[2-6]

    DHCP requests can be followed on the rack leader nodes, as follows:

    r01lead# tail -f /var/log/messages

    You can follow the installation and boot of the nodes (or of representative nodes) using either the Scali Manage consoles (from the GUI) or the ipmitool SOL console interface.

  10. From the admin node, verify that /home is mounted as expected, as follows:

    # scashdc -p mount | grep home | sort
    r01i01n02-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1) 
    r01i01n03-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1) 
    r01i01n04-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1) 
    r01i01n05-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1)
    r01i01n06-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1)
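
The check in step 10 can be automated by comparing the mount output against the list of nodes that should have /home mounted. The following is a minimal sketch, not part of Scali Manage: expand_range and nodes_missing_home are hypothetical helpers, expand_range handles only a single numeric LO-HI bracket group (not comma lists), and the parsing assumes the output layout shown in step 10. The sample input deliberately omits r01i01n06 for illustration.

```shell
# expand_range 'PREFIX[LO-HI]SUFFIX'
# Expand one numeric bracket range into one name per line, zero-padding
# each number to the width of LO. Other input is printed unchanged.
expand_range() {
    echo "$1" | awk '{
        if (match($0, /\[[0-9]+-[0-9]+\]/)) {
            prefix = substr($0, 1, RSTART - 1)
            suffix = substr($0, RSTART + RLENGTH)
            split(substr($0, RSTART + 1, RLENGTH - 2), b, "-")
            fmt = "%s%0" length(b[1]) "d%s\n"
            for (i = b[1] + 0; i <= b[2] + 0; i++)
                printf fmt, prefix, i, suffix
        } else print
    }'
}

# nodes_missing_home NODE...
# Read mount output on stdin and print each named node that shows no NFS
# mount on /home. Assumes "node-eth0 : src on /home type nfs (...)" lines.
nodes_missing_home() {
    awk -v want="$*" '
        $2 == ":" && $4 == "on" && $5 == "/home" && $7 == "nfs" {
            name = $1
            sub(/-eth0$/, "", name)
            seen[name] = 1
        }
        END {
            n = split(want, nodes, " ")
            for (i = 1; i <= n; i++)
                if (!(nodes[i] in seen)) print nodes[i]
        }'
}

# Illustrative input: four of the five nodes report the mount.
nodes_missing_home $(expand_range 'r01i01n0[2-6]') <<'EOF'
r01i01n02-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1)
r01i01n03-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1)
r01i01n04-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1)
r01i01n05-eth0 : service1-ib1:/home on /home type nfs (rw,_netdev,addr=10.1.0.1)
EOF
# prints: r01i01n06
```

On a live system, the output of the scashdc command from step 10 could be piped into nodes_missing_home; no output means every listed node shows the mount.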

Configuration Session Example

This section shows a complete SGI Altix ICE configuration example, as follows:

scalimanage-cli initaltixicesms /tmp/ofed.stout5sp2.rpms.tgz
scalimanage-cli definealtixicerack 1 [1-2] [1-2]
scalimanage-cli definealtixiceservicenode service1
/etc/init.d/scance restart
scalimanage-cli discoveraltixiceservicenode
scalimanage-cli install "service1 r01lead"
scalimanage-cli discoveraltixicecmc
scalimanage-cli captureimage service1 image1
scalimanage-cli setdiskless r01i[01-02]n[01-02] image1
scalimanage-cli install r01i[01-02]n[01-02]

Using the Scali Manage GUI

This section provides general administrative information and describes how to start and use the Scali Manage GUI in a Scali-managed cluster. For information on using the Scali Manage command line interface, refer to the Scali Manage User's Guide.

Log in to the Scali Manage interface as root, using your system name, as shown in Figure 2-1. The factory password is sgisgi.

Figure 2-1. Example Starting Screen for the Scali Manage GUI


Displaying Cluster Components

Cluster components are shown in Figure 2-2. r01 is rack 01 and r02 is rack 02, i01 is IRU 1, and n01 and n02 are nodes 1 and 2. r01lead and r02lead are the rack leader controllers (leader nodes) for the cluster. service1 is the service node for the cluster. System naming conventions when using Scali Manage are described in “Network Interface Naming Conventions” in Chapter 1.
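
The compute node naming pattern described above can be split into its parts mechanically. The following is a minimal sketch for illustration only: parse_node_name is a hypothetical helper, and it assumes compute node names always take the form r, rack number, i, IRU number, n, node number, as in r01i01n02.

```shell
# parse_node_name NAME
# Split a compute node name such as r01i01n02 into rack, IRU, and node
# numbers. Prints nothing and returns non-zero for names that do not
# match the pattern (for example, leader or service node names).
parse_node_name() {
    echo "$1" |
        sed -n 's/^r\([0-9][0-9]*\)i\([0-9][0-9]*\)n\([0-9][0-9]*\)$/rack=\1 iru=\2 node=\3/p' |
        grep .
}

parse_node_name r01i01n02    # prints: rack=01 iru=01 node=02
```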

Figure 2-2. Cluster Components Selection Screen Example


Scali Manage Troubleshooting Tips

This section describes some general guidelines as well as emergency procedures.

Whenever a Scali cluster parameter is changed, it is necessary to apply the configuration. This can be done either through the graphical user interface (GUI) by selecting Provisioning > Apply All Configuration Changes or via the command line interface (CLI), as follows: scalimanage-cli reconfigure all. Changes can be made in batches and then applied all at once.

There are situations when the GUI does not reflect the cluster configuration properly. Restarting the GUI may solve this problem.

In rare cases, the Scali product enters an inconsistent state in which it behaves abnormally and refuses to take any input. If this happens, try to reinitialize the admin node with /etc/init.d/scance restart.

This command must be run on the admin node. If this does not change Scali's state, then you should reboot the admin node. This should ensure that Scali will be in a consistent state.

Array services has configuration files /etc/array/arrayd.{auth,conf} with links to /usr/lib/arrayd.{conf,auth}. When you update your system configuration and later reboot the compute node(s), your configuration will be lost because the compute nodes are stateless. You need to capture another image after changing configuration files.

Compute Node RPMs

This section lists the packages that are installed on the compute nodes.

Compute Node RPMs on SLES

The following RPMs reside on the compute node when you run Scali Manage on top of SUSE Linux Enterprise Server 10 (SLES10):

cpuset-utils
dapl
dapl-devel
dapl-utils
ibutils
intel-cluster-runtime
ipoibtools
kernel-ib-ice
libbitmask
libcpuset
libibcm
libibcommon
libibmad
libibumad
libibverbs
libibverbs-devel
libibverbs-utils
libmthca
libopensm
libosmcomp
libosmvendor
librdmacm
librdmacm-utils
lkSGI
mpitests_mpt
msr-tool
mstflint
numatools
ofed-docs
ofed-scripts
openib-diags

Compute Node RPMs on RHEL

The following RPMs reside on the compute node when you run Scali Manage on top of Red Hat Enterprise Linux 5 (RHEL5):

cpuset-utils
dapl
dapl-devel
dapl-utils
environment-modules
ibutils
intel-cluster-runtime
ipoibtools
kernel-ib-ice
kmod-numatools
kmod-ofa_kernel
kmod-xpmem
libbitmask
libcpuset
libibcm
libibcommon
libibmad
libibumad
libibverbs
libibverbs-devel
libibverbs-utils
libmthca
libopensm
libosmcomp
libosmvendor
librdmacm
librdmacm-utils
lkSGI
mpitests_mpt
msr-tool
mstflint
numatools
ofed-docs
ofed-scripts
openib-diags
pcp-open
perftest
rds-tools
sgi-arraysvcs
sgi-mpt
sgi-procset
sgi-release
sgi-support-tools
tvflash
xpmem