Chapter 5. Multiboard Feature for GSN and HiPPI 800 Networks

The SGI implementation of MPI automatically detects multiple high-speed interconnect networks, including GSN and HiPPI 800 network adapters, and attempts to use the fastest interconnect by default. When multiple interface adapters are detected on a host, MPI attempts to use as many of them as possible when sending messages among hosts. During initialization of the MPI job, each detected adapter is tested to determine which hosts it can reach. The adapter is then added to the list of adapters available for messages to those reachable hosts.
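The initialization step described above can be pictured as building a per-host list of usable adapters. The following Python sketch is only an illustration of that idea, not SGI MPI code: the function name `build_adapter_table`, the `can_reach` callback, and the host and adapter names are all hypothetical.

```python
from collections import defaultdict


def build_adapter_table(adapters, hosts, can_reach):
    """Map each remote host to the adapters that can reach it.

    can_reach(adapter, host) is a hypothetical probe standing in for
    whatever reachability test the MPI library performs at startup.
    """
    table = defaultdict(list)
    for adapter in adapters:
        for host in hosts:
            if can_reach(adapter, host):
                table[host].append(adapter)
    return dict(table)


# Hypothetical topology: hip0 reaches both remote hosts, hip1 only hostB.
links = {"hip0": {"hostB", "hostC"}, "hip1": {"hostB"}}
table = build_adapter_table(
    ["hip0", "hip1"],
    ["hostB", "hostC"],
    lambda adapter, host: host in links[adapter],
)
print(table)
```

In this made-up topology, messages to hostB could use either adapter, while messages to hostC could use only hip0.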

The multiboard feature is enabled by default. This implementation relaxes the requirement of earlier SGI MPI releases that the HiPPI interface adapters be located in the same board slot and have the same interface number, such as hip0.

When a high-speed interconnect network is detected, SGI MPI uses an OS Bypass protocol if the interconnect network is available for use on every host for the given MPI job. If the detected high-speed interconnect network is not available for use on every host in a multihost MPI job, SGI MPI tries the next slower network until it finds one that meets this connection requirement. The order of attempted usage is GSN, HiPPI 800, and finally TCP/IP over the standard Ethernet interface. You can redirect the TCP/IP traffic over an alternate interface, such as TCP/IP-over-GSN or Gigabit Ethernet, by specifying the appropriate host name or IP address on the mpirun line.
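The fallback order described above amounts to picking the fastest network that every host in the job can use. A minimal Python sketch of that rule follows; the function `select_network` and the host names are hypothetical, and the real library performs this detection internally.

```python
# Order of attempted usage, fastest first, as stated in the text.
NETWORK_ORDER = ["GSN", "HiPPI 800", "TCP/IP"]


def select_network(host_networks):
    """Return the fastest network usable on every host.

    host_networks maps a host name to the set of network types
    available on that host (a hypothetical input format).
    """
    for network in NETWORK_ORDER:
        if all(network in nets for nets in host_networks.values()):
            return network
    raise RuntimeError("no network is common to all hosts")


# Hypothetical job: hostB has no GSN adapter, so GSN is skipped.
job = {
    "hostA": {"GSN", "HiPPI 800", "TCP/IP"},
    "hostB": {"HiPPI 800", "TCP/IP"},
}
print(select_network(job))  # prints "HiPPI 800"
```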

When using the high-speed OS Bypass protocol, the multiboard feature does not require that every host exist on a given subnet. This allows for network switch topologies that may be limited to 8 or 32 hosts per switch or per subnet. SGI MPI automatically detects the network topology during the startup phase. If every host is on at least one GSN or HiPPI 800 interconnect network, the OS Bypass protocol for that homogeneous network, either GSN or HiPPI 800, is used.

When the OS Bypass protocol is in use and multiple high-speed GSN or HiPPI 800 connections are available between any two hosts, the MPI message traffic is sent over all of the available adapters. Note that SGI MPI automatically splits messages whose lengths exceed 16,384 bytes into smaller, 16,384-byte chunks. Individual chunks can be sent over different adapters, which balances the load across the adapters.
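The chunking behavior can be illustrated with a short Python sketch. It assumes that the final chunk simply carries the remaining bytes and that chunks are handed to adapters in rotation; the text states neither detail, and the function and adapter names are hypothetical.

```python
CHUNK_SIZE = 16_384  # chunk size in bytes, as stated in the text


def split_message(payload, chunk_size=CHUNK_SIZE):
    """Split a message into chunks of at most chunk_size bytes."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def assign_chunks(chunks, adapters):
    """Pair each chunk with an adapter, rotating through the adapters.

    The rotation is an assumption made for illustration only.
    """
    return [(adapters[i % len(adapters)], chunk) for i, chunk in enumerate(chunks)]


message = bytes(40_000)  # a 40,000-byte message of zero bytes
chunks = split_message(message)
plan = assign_chunks(chunks, ["hip0", "hip1"])
for adapter, chunk in plan:
    print(adapter, len(chunk))
```

With two adapters, the 40,000-byte message becomes three chunks of 16,384, 16,384, and 7,232 bytes, alternating between hip0 and hip1.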

The multiboard feature allows users of the HiPPI 800 adapters to select the manner in which multiple adapters are assigned to MPI processes and the algorithm for distributing the MPI job message traffic. To change the method for selecting a HiPPI 800 adapter, set the MPI_BYPASS_DEV_SELECTION environment variable. Table 5-1 describes the algorithms for the various settings.

Note that the GSN interconnect uses only the round robin selection method and is not affected by the MPI_BYPASS_DEV_SELECTION variable settings.
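As a rough illustration of how the variable might be prepared for a job launched from a script, the Python sketch below builds an environment dictionary with MPI_BYPASS_DEV_SELECTION set to one of the values from Table 5-1. The helper `bypass_env` is hypothetical and the launch command itself is deliberately left out, since it depends on the site and the job.

```python
import os

# Settings from Table 5-1; the labels are shorthand for this sketch.
VALID_SETTINGS = {0: "static", 1: "dynamic (default)", 2: "round robin"}


def bypass_env(setting, base_env=None):
    """Return a copy of the environment with the selection method set.

    Raises ValueError for a value that Table 5-1 does not list.
    """
    if setting not in VALID_SETTINGS:
        raise ValueError(f"unsupported MPI_BYPASS_DEV_SELECTION value: {setting!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["MPI_BYPASS_DEV_SELECTION"] = str(setting)
    return env


env = bypass_env(0)  # request static device selection
print(env["MPI_BYPASS_DEV_SELECTION"])
# The resulting dictionary could be passed as env= to subprocess.run()
# together with whatever mpirun command line the job normally uses.
```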

Table 5-1. Algorithms for assigning multiple HiPPI adapters to MPI processes

Setting        Description

0              Static device selection. Each process is assigned one HiPPI device to use for communication with processes on another host, and it uses only that device to communicate with that host. This algorithm has been observed to be effective when interhost communication patterns are dominated by large messages (significantly more than 16,384 bytes).

1 (default)    Dynamic device selection. A process can select from any of the devices available for communication between a given pair of hosts. The first device that is not being used by another process is selected. This algorithm has been found to work best for applications in which multiple processes are trying to send medium-sized messages (16,384 bytes or fewer) between processes on different hosts. Large messages (more than 16,384 bytes) are split into chunks of 16,384 bytes, and different chunks can be sent over different HiPPI devices.

2              Round robin device selection. Each process sends successive messages over a different HiPPI 800 device.
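The three selection methods in Table 5-1 can be contrasted with a small Python sketch. This is a simplified model, not the SGI MPI implementation: the class names, the rank-based assignment in the static case, and the handling of the case in which every device is busy are all assumptions made for illustration.

```python
import itertools


class StaticSelector:
    """Setting 0: one fixed device per process."""

    def __init__(self, devices, rank):
        # Assumption: the fixed device is derived from the process rank.
        self.device = devices[rank % len(devices)]

    def pick(self, busy=()):
        return self.device


class DynamicSelector:
    """Setting 1: the first device not in use by another process."""

    def __init__(self, devices):
        self.devices = list(devices)

    def pick(self, busy=()):
        for device in self.devices:
            if device not in busy:
                return device
        return None  # assumption: the caller waits and retries


class RoundRobinSelector:
    """Setting 2: successive messages use successive devices."""

    def __init__(self, devices):
        self._cycle = itertools.cycle(devices)

    def pick(self, busy=()):
        return next(self._cycle)


devices = ["hip0", "hip1"]
static = StaticSelector(devices, rank=1)
dynamic = DynamicSelector(devices)
rotating = RoundRobinSelector(devices)

print([static.pick() for _ in range(3)])    # always the same device
print(dynamic.pick(busy={"hip0"}))          # skips the busy device
print([rotating.pick() for _ in range(3)])  # alternates between devices
```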