Chapter 4. System Fabric Management

The InfiniBand network on SGI Altix ICE systems uses Open Fabrics Enterprise Distribution (OFED) software. This section describes the InfiniBand fabric and how to manage it. For background information on OFED, see http://www.openfabrics.org.

InfiniBand Fabric Management

This section describes the InfiniBand fabric and the tools used to configure, administer, and verify it.

InfiniBand Fabric Overview

InfiniBand fabric management on SGI Altix ICE systems is done using the OFED OpenSM software package and the sgifmcli tool (see “Fabric Component sgifmcli Command”). The InfiniBand fabric connects the service nodes, rack leader controllers (leader nodes), and the compute nodes. It does not connect to the system admin controller (admin node) or the chassis management control (CMC) blades. SGI Altix ICE systems usually have two separate InfiniBand fabrics, which this manual refers to as "ib0" and "ib1" (see “InfiniBand Fabric” in Chapter 1).


Note: The LX series has only one InfiniBand fabric, "ib0". Any references to "ib1" in this manual do not apply to LX systems.

Each InfiniBand fabric (also sometimes called an InfiniBand subnet) has its own subnet manager (SM), which runs on a rack leader controller (leader node). For a system with two or more racks, the SM for each fabric is usually configured to run on different leader nodes. In a single rack system, both SMs will run on the single leader node. Each SM may also be paired with a standby SM which can take over in the event of the failure of the primary SM. For more information, see “InfiniBand Fabric Failover Mechanism”.

Rack leader controllers associate an SM instance with a particular port on the leader node. Usually, ib0 is mapped to port 1 of the InfiniBand host channel adapter (HCA) on the SM node, and ib1 is mapped to port 2 of the HCA on the SM node (see Figure 1-9). The SM for each fabric is configured using the corresponding /etc/ofa/opensm-ib[01].conf file.


Note: After a system reboot, the opensm daemons start running automatically.


SGI supports the following topologies: hypercube, enhanced hypercube, and fat tree.

The InfiniBand Management Tool Graphical User Interface

You can use the InfiniBand management tool graphical user interface (GUI) to configure, administer, or verify the InfiniBand fabric on your SGI Altix ICE system. It lets you configure, start, stop, restart, or clean up the fabric, or get its status.

From the system admin controller (admin node), enter the following command:

admin:~ # tempo-configure-fabric

The InfiniBand Management Tool GUI appears, as shown in Figure 4-1.

You can also access this command from the configure-cluster GUI main menu Configure Infiniband Fabric option (see “configure-cluster Command Cluster Configuration Tool” in Chapter 2). For more information, see Figure 4-1.

Figure 4-1. InfiniBand Management Tool Screen

Use the Select button to select the action you want to perform; a submenu appears. Use the Quit button to return to the previous screen. Use the Help button to get online help for each of the GUI actions.

If the tempo-configure-fabric command fails in a configuration or administrative operation, it suggests that you use the sgifmcli(8) command (described in “Fabric Component sgifmcli Command”) to debug the problem. Alternatively, you can use the Reset and Init Fabric Database option from the InfiniBand Management Tool main menu (see Figure 4-1) to start over and completely reconfigure the InfiniBand fabrics.

From the Configure InfiniBand screen, make sure you select the Configure Topology option to set the topology as shown in Figure 4-2. For more information, see “Network Topology”.

Figure 4-2. Configure Topology Screen

Use the online help available with this tool to guide you through the InfiniBand configuration. After configuring and bringing up the InfiniBand network, select the Administer InfiniBand ib0 option or the Administer InfiniBand ib1 option. The Administer InfiniBand screen appears, as shown in Figure 4-3. You can use this screen to start, stop, restart, or refresh a fabric.

Figure 4-3. Administer InfiniBand Tool Screen

You can verify the status via the Status option, as shown in Figure 4-4.

Figure 4-4. Administer InfiniBand Status Option

The Refresh Enhanced Hypercube Config and Restart option applies only to the Enhanced Hypercube topology. You must refresh the fabric configuration whenever you add, remove, or move one or more compute blades or service nodes. The refresh action updates the GUID routing order file, which is used to balance InfiniBand traffic for the Enhanced Hypercube topology. This action also automatically restarts the master subnet manager (SM) and the optional standby SM for the specified fabric (see “InfiniBand Fabric Failover Mechanism”).

Ideally, the refresh action for a fabric should be taken when there are no jobs running in the system. Restarting the subnet manager can have an adverse impact on the running jobs in the system.

Fabric Component sgifmcli Command

For the most common fabric management operations, the tempo-configure-fabric command (described in “The InfiniBand Management Tool Graphical User Interface”) is entirely sufficient, and recommended. The sgifmcli(8) command can be used for more advanced fabric management tasks.

The most common uses of sgifmcli are as follows:

  • Initializing and configuring external InfiniBand switches

  • Verifying the integrity of the InfiniBand fabric(s)

For more information, see the sgifmcli(8) man page.

Currently, the following switches are supported:

Switch Type          Description
voltaire-isr-9024    Voltaire ISR 9024
voltaire-isr-2004    Voltaire ISR 2004
voltaire-isr-2012    Voltaire ISR 2012
voltaire-isr-9096    Voltaire ISR 9096
voltaire-isr-9288    Voltaire ISR 9288

To configure an external InfiniBand switch, cluster-wide InfiniBand connectivity is not required. The only necessity is that the supplied switch host name is resolvable and a working networking connection to the external InfiniBand switch exists. See the sgifmcli(8) man page for more information about adding external InfiniBand switches to your cluster's fabric.

Verifying the integrity of an InfiniBand fabric requires that the InfiniBand network first be configured properly. This is most easily done using tempo-configure-fabric (see “The InfiniBand Management Tool Graphical User Interface”). See the sgifmcli(8) man page for details about the fabric verification operation.

sgifmcli SGI Fabric Component Command

The sgifmcli(8) command is, as follows:

sgifmcli [type action [options]] | [options]


Note: You can use shortened versions of the following sgifmcli options as long as the option is unambiguous. For example, sgifmcli --vers for sgifmcli --version.


It accepts the following general options:

General Option 

Description

-h, --help 

Displays a help message and then exits

-V, --version 

Shows the version number of the program

-v, --verbose [DEBUG | INFO | ERROR] 

Selects the verbosity level (default: ERROR). Most of the messages from sgifmcli are written to a log file named /var/log/sgifmcli.log. The default level reports error messages only. INFO provides details about the operation of sgifmcli in addition to error messages. DEBUG produces output intended to help developers fix bugs and also includes INFO and ERROR messages.

It accepts the following detailed options:

Detailed Option 

Description

type 

The type option is one of the following:

  • --mastersm - Master subnet manager

  • --standby - Standby subnet manager

  • --ibswitch - InfiniBand switch

  • --ibfabric - InfiniBand fabric

action 

The action option is one of the following:

  • --init - Initializes the switch or fabric

  • --start - Starts a subnet manager

  • --stop - Stops a subnet manager

  • --status - Prints the status of a subnet manager

  • --verify - Verifies the fabric

  • --refresh - Updates an InfiniBand fabric (for Enhanced Hypercube)

  • --set - Sets specific SM configuration parameters (see --arglist)

  • --add - Adds a subcomponent to its container, for example, add a switch to a fabric

  • --delete - Deletes a subcomponent from its container, for example, deletes a switch from a fabric

  • --remove - Removes an entity, for example, a switch or fabric

  • --showconfig - Prints fabric configuration

  • --switchlist - Lists switches in a fabric

  • --create-node-name-map - Creates a node name map for internal SGI Altix ICE switches

options 

The options argument is one or more of the following, with no duplicates. For example, the --fabric option must be either ib0 or ib1, not both:

  • --id - Unique identifier, for example, host name

  • --hostname - Name of the node on which to run OpenSM

  • --switchtype - Type of switch (leaf or spine)

  • --model - Switch model (voltaire-isr-9024, voltaire-isr-2004, voltaire-isr-2012, voltaire-isr-9096, or voltaire-isr-9288)

  • --fabric - Fabric, either ib0 or ib1

  • --topology - InfiniBand topology, either hypercube, enhanced-hypercube, or ftree

  • --arglist - List of Subnet Manager configuration parameters: param_1=val_1, param_2=val_2, ...

EXIT CODES

To facilitate the use of the sgifmcli(8) command in shell scripts, an exit code is returned to indicate the outcome of a given operation.

The exit codes returned by sgifmcli are, as follows:

0 

Successful termination.

255 

Abnormal termination.
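
As a minimal sketch of how a shell script might act on these exit codes, the following wrapper runs any command and reports its outcome. The run_checked function name is hypothetical and is not part of sgifmcli. The sketch treats any non-zero status, including 255, as a failure.

```shell
# Run a command and report success or failure based on its exit status.
# sgifmcli returns 0 on success and 255 on abnormal termination; this
# hypothetical helper treats every non-zero status as a failure.
run_checked() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAILED (exit $rc): $*" >&2
    fi
    return "$rc"
}

# Example use on the admin node (not executed here):
#   run_checked sgifmcli --status --id master_ib0 || exit 1
```

Because the helper returns the original status, a calling script can stop at the first failed step with a plain || exit 1.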

For a detailed man page, perform the following command from the admin node:

admin:~ # man sgifmcli

The sgifmcli(8) fabric administration utilities man page appears.

sgifmdb Fabric Management Database Command

The fabric component maintains a database (DB) of the objects it manages (managed objects). The database version is automatically set during cluster install. You do not need to set it. Most likely, this database will change over time. To manage multiple database versions and also to aid in field support, SGI has added another command line tool that currently reports the managed objects database version.

The sgifmdb command is, as follows:

sgifmdb [--get|-g] [--dump|-d] [-v|--version] [-r|--reset] [--help|-h]

It accepts the following general options:

General Option 

Description

-g, --get 

Reads the database version object from the database

-d, --dump 

Dumps the database. This option allows you to see what fabric objects are currently stored in the fabric database.

-v, --version 

Prints version

-r, --reset 

Resets the database and starts clean

-h, --help 

Shows the usage statement

Example 4-1. Getting sgifmdb(8) Command Help

For an sgifmdb command usage statement, perform the following from the admin node:

admin:~ # sgifmdb -h
SGI Fabric Component DB tool
Usage: db_version [--get|-g] [--dump|-d] [-v|--version] [-r|--reset] [--help|-h]

        -g, --get       Read DB version object from DB
        -d, --dump      Dump the DB
        -v, --version   Print version
        -r, --reset     Reset the database and start clean
        -h, --help      Show this text


InfiniBand Fabric Management Configuration and Operation Overview

Each subnet manager (SM) performs a light sweep of the fabric it manages every 10 seconds by default. To change the interval, set the sweep_interval variable in the /opt/sgi/var/sgifmcli/opensm-ib0.conf.templ file and then perform a Commit operation in the tempo-configure-fabric GUI. Alternatively, use the --arglist option of the sgifmcli command, which sets various subnet manager configuration parameters, including the sweep interval.


Note: If your cluster is larger than 256 nodes, SGI highly recommends increasing this variable to 90 seconds or more.
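
The following sketch shows one way the guidance above could be turned into a script. The function names are hypothetical. The exact combination of --set, --id, and --arglist is an assumption inferred from the option list in this chapter, so the sketch only prints the command line; confirm the syntax in the sgifmcli(8) man page before running it.

```shell
# Suggest a light-sweep interval (in seconds) from the node count:
# 10 seconds by default, 90 seconds for clusters larger than 256 nodes.
suggest_sweep_interval() {
    if [ "$1" -gt 256 ]; then
        echo 90
    else
        echo 10
    fi
}

# Print (do not run) an sgifmcli command line that would apply the value.
# ASSUMPTION: the --set/--id/--arglist combination is inferred from the
# option descriptions, not taken from a documented example.
print_sweep_command() {
    # $1 = SM identifier, $2 = node count
    echo "sgifmcli --set --id $1 --arglist sweep_interval=$(suggest_sweep_interval "$2")"
}

print_sweep_command master_ib0 512
```

A larger interval reduces how often the SM sweeps a big fabric, at the cost of noticing changes later.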


If an SM detects a change in the fabric during a light sweep, such as the addition or removal of a node, it performs a heavy sweep. The heavy sweep changes the fabric configuration to reflect the current state of the system. For more information, see the opensm(8) man page on the leader node.

The opensm-ibx.conf configuration files are located in the /opt/sgi/var/sgifmcli directory on the admin node.

Each opensm instance (one for each fabric) associates itself with a particular globally unique identifier (GUID) for a port on the node where opensm runs (see Figure 4-5). This association is configured with the "guid" entry in the corresponding opensm-ib[01].conf file.

Figure 4-5. Two InfiniBand Fabrics in a System with Two IRUs

Network Topology

For SGI Altix ICE systems with a hypercube topology, SGI uses the dimension order routing (DOR) algorithm.

The dimension order routing algorithm is based on the min hop algorithm and so uses shortest paths. Instead of spreading traffic out across different paths with the same shortest distance, it chooses among the available shortest paths based on an ordering of dimensions.

For SGI Altix ICE systems with a fat-tree topology, SGI uses updn as the default routing algorithm. The updn unicast routing algorithm is also based on the minimum number of hops to each node, but it is constrained by ranking rules.

For more information on routing variables, see the opensm(8) man page.

Hypercube network topology is well suited for smaller node count MPI jobs or jobs that have communication patterns that are not sensitive to bisection bandwidth. Fat-tree network topology is well suited for large node count MPI jobs that are sensitive to bisection bandwidth.

As stated above, there are two opensm daemons, one for each fabric: opensmd-ib0 and opensmd-ib1. They are controlled by init.d scripts. Each init.d script has its own configuration file, opensm-ib0 and opensm-ib1, respectively.

You can use the sminfo command to show the GUID of the SM master.
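
As an illustration, the sketch below queries the master SM for one fabric from a leader node. It assumes the usual port mapping described in the overview (ib0 on HCA port 1, ib1 on HCA port 2), which may differ on your system. The function names are hypothetical; -P is the sminfo option that selects the local HCA port.

```shell
# Map a fabric name to the HCA port it usually uses on the SM node
# (ib0 -> port 1, ib1 -> port 2). ASSUMPTION: the default mapping applies.
fabric_port() {
    case "$1" in
        ib0) echo 1 ;;
        ib1) echo 2 ;;
        *)   echo "unknown fabric: $1" >&2; return 1 ;;
    esac
}

# Show the master SM as seen from the port that serves the given fabric.
show_master_sm() {
    port=$(fabric_port "$1") || return 1
    sminfo -P "$port"
}

# Example use on a leader node (not executed here):
#   show_master_sm ib0
```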

Configuring the InfiniBand Fabric

This section describes how to configure and administer the InfiniBand fabric using the sgifmcli(8) command.


Note: SGI highly recommends that you use the tempo-configure-fabric GUI to configure and administer the fabric (see “The InfiniBand Management Tool Graphical User Interface”).


Procedure 4-1. Configure the Master Subnet Manager

    When configuring the SM master, the following rules apply:

    • Each InfiniBand fabric needs to have a subnet manager (SM) master.

    • There can be at most one SM master per InfiniBand fabric.

    • Fabric configuration and administration can only be done via the SM master.

    • Fabric configuration becomes active after (re)starting the SM master.

    • Deleting an SM master automatically deletes its standby, if it exists.

    The syntax to configure an SM master is, as follows:

    sgifmcli --mastersm --init --id identifier --hostname hostname --fabric fabric --topology topology

    This command creates a master with the name provided by the --id option. The identifier can be any arbitrary string. The --hostname option determines the host on which the SM master is launched. The --fabric option associates the SM master with either ib0 or ib1. The --topology option specifies the InfiniBand topology, which can be hypercube, enhanced-hypercube, or ftree (fat tree).

    To configure a master for the fabric ib0 on a hypercube cluster, perform the following steps:

    1. To configure an SM master, run the following from the admin node:

      # sgifmcli --mastersm --init --id master_ib0 --hostname r1lead --fabric ib0 --topology hypercube

      This creates an SM master for ib0. The underlying topology is a hypercube and thus the routing algorithm dor will be used. This SM master, named master_ib0, is configured to run on the host r1lead.

    2. The syntax to start an SM master is, as follows:

      sgifmcli --start --id identifier

      To start the master_ib0 SM master, perform the following:

      # sgifmcli --start --id master_ib0

      At this point, a master for the fabric ib0 is running on r1lead, and the fabric ib0 is available for compute jobs. If a standby has been defined, it is launched automatically in addition to the master.

    3. The syntax to stop an SM master is, as follows:

      sgifmcli --stop --id identifier

      To stop the master_ib0 SM master, perform the following:
      # sgifmcli --stop --id master_ib0

      The SM master master_ib0 running on host r1lead is stopped. If a standby has been defined then it will be stopped automatically, in addition to the master.

    4. The syntax to check the status of an SM master is, as follows:

      sgifmcli --status --id identifier

      To check the status of the master_ib0 SM master, perform the following:
      # sgifmcli --status --id master_ib0
      Master SM
      Host = r1lead
      Guid = 0x0002c902002838f5
      Fabric = ib0
      Topology = hypercube
      Routing Engine = dor
      OpenSM = running

      The status of the SM master master_ib0 running on host r1lead is reported. If a standby has been defined, its status is reported in addition to the master.

    5. The syntax to remove an SM master is, as follows:

      sgifmcli --remove --id identifier

      To remove the master_ib0 SM master, first stop it and then use the --remove action, as follows:

      # sgifmcli --stop --id master_ib0
      
      # sgifmcli --remove --id master_ib0

      The SM master is removed from the entity list. If a standby has been defined, it is removed, in addition to the master.

    6. To find the ID of the master SM in the database, perform the following:

      # sgifmcli --dump --id ib0 | grep MASTER

    7. To print the fabric configuration, run the following:

      # sgifmcli --showconfig
      
      --------------
      NAME = ib1
      TYPE = ibfabric
      MASTER = 
      STANDBY = 
      SWITCH_LIST = 
      --------------
      NAME = ib0
      TYPE = ibfabric
      MASTER = 
      STANDBY = 
      SWITCH_LIST = 
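
The define, start, and status steps of this procedure can be sketched as a small script. The run and bring_up_master names are hypothetical helpers, not part of sgifmcli. With DRY_RUN=1 the script only prints the commands, which is a cautious way to review them before running anything on the admin node.

```shell
# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Define, start, and check an SM master, stopping at the first failure.
bring_up_master() {
    # $1 = identifier, $2 = leader node host name, $3 = fabric, $4 = topology
    run sgifmcli --mastersm --init --id "$1" --hostname "$2" --fabric "$3" --topology "$4" &&
    run sgifmcli --start --id "$1" &&
    run sgifmcli --status --id "$1"
}

# Dry run: print the three commands for the ib0 hypercube example.
DRY_RUN=1
bring_up_master master_ib0 r1lead ib0 hypercube
```

Chaining the steps with && means a failed --init never reaches --start.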

    InfiniBand Fabric Failover Mechanism

    Each subnet manager (SM) has a failover mechanism. If the master SM fails, the standby SM takes over operation of the fabric. This failover operation is performed automatically by the opensm software. Typically, the rack 1 leader node runs the master SM for the ib0 fabric and the rack 2 leader node runs the master SM for the ib1 fabric, as shown in Figure 4-6.

    Figure 4-6. opensm Software Failover

    The following procedure describes how to set up the failover mechanism.

    Procedure 4-2. Enabling the InfiniBand Failover Mechanism

      When enabling the InfiniBand failover mechanism, the following rules apply:

      • Each InfiniBand fabric can optionally have exactly one standby.

      • A standby SM can only be created for a particular fabric when a master already exists.

      • When adding a standby after a master has already been defined and started, the master needs to be stopped before the standby is defined via the --init option. After defining the standby via --init, restart the master.

      • An SM master and an SM standby for a particular fabric cannot coexist on the same node.

      SGI highly recommends that you use the tempo-configure-fabric GUI to configure the failover mechanism. If it is necessary to use sgifmcli(8) to enable the InfiniBand failover mechanism, perform the following steps:

      1. If an SM master is defined and running, stop it, as follows:

        # sgifmcli --stop --id master_ib0

        If the SM master has not been defined, define it, as follows:
        # sgifmcli --mastersm --init --id master_ib0 --hostname r1lead --fabric ib0 --topology hypercube

      2. Define the SM standby, as follows:

        # sgifmcli --standbysm --init --id standby_ib0 --hostname r2lead --fabric ib0

      3. Start the SM master, as follows:

        # sgifmcli --start --id master_ib0

        This automatically starts the SM master and the SM standby for ib0.

      4. Now check the status for the subnet manager of ib0, as follows:

        # sgifmcli --status --id master_ib0
        
        Master SM
        Host = r1lead
        Guid = 0x0008f10403987da9
        Fabric = ib0
        Topology = hypercube
        Routing Engine = dor
        OpenSM = running
        Standby SM
        Host = r2lead
        Guid = 0x0008f10403987d25
        Fabric = ib0
        OpenSM = running

      5. To remove the standby_ib0 SM standby, first stop its master and then use the --remove action, as follows:

        # sgifmcli --stop --id master_ib0
        # sgifmcli --remove --id standby_ib0

        The SM standby is removed from the entity list. The SM master is not removed; because it was stopped first, start it again to resume management of the fabric.
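
The rules and steps for adding a standby can be sketched as a planning function that checks the same-node rule before printing the command sequence. The plan_standby name is hypothetical. The sketch prints the commands rather than running them and uses the --standbysm spelling shown in step 2.

```shell
# Print the commands needed to add a standby SM to an existing master.
# Refuses the plan if master and standby would run on the same node.
plan_standby() {
    # $1 = master id, $2 = master host, $3 = standby id,
    # $4 = standby host, $5 = fabric
    if [ "$2" = "$4" ]; then
        echo "error: master and standby cannot share node $2" >&2
        return 1
    fi
    echo "sgifmcli --stop --id $1"
    echo "sgifmcli --standbysm --init --id $3 --hostname $4 --fabric $5"
    echo "sgifmcli --start --id $1"
}

plan_standby master_ib0 r1lead standby_ib0 r2lead ib0
```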

      Configuring the InfiniBand Fat-tree Network Topology

      This section describes how to configure InfiniBand fat-tree network topology. The fat-tree topology involves external InfiniBand switches. For the list of supported external switches, see “Fabric Component sgifmcli Command”.

      InfiniBand switches are generally classified as being of two types: edge switches and core or spine switches. Edge switches are used to connect to compute nodes. Core or spine switches are used to connect edge switches together. The integrated InfiniBand switches in SGI Altix ICE systems are considered to be edge switches and external InfiniBand switches used to connect these edge switches together in a fat-tree topology are considered to be spine switches.

      SGI recommends that you use the Tempo discover command (see “discover Command” in Chapter 2) to discover external IB switches. After discovery is completed, an external switch can also be initialized and added to the InfiniBand system using the sgifmcli command.

      Procedure 4-3. Configuring InfiniBand Fat-tree Network Topology

        To configure the InfiniBand fat-tree network topology on an SGI Altix ICE system, perform the following steps:

        1. Make sure that your switch is properly connected to the InfiniBand network. Also, make sure that the admin port of the switch is properly connected to the Ethernet network.

        2. Power on the switch. See the switch manual for operation information.

        3. From the admin node, initialize the switch. The syntax to initialize the switch is, as follows:

          sgifmcli --init --ibswitch --model <model> --id <hostname> --switchtype [leaf | spine]

          An example command is, as follows:

          # sgifmcli --init --ibswitch --model voltaire-isr-2004  --id isr2004 --switchtype spine

          This configures a Voltaire ISR 2004 switch with the host name isr2004 as a spine switch. The name isr2004 refers to the admin port of the switch, which must already be configured to allow switch access. The switch is now initialized and the root GUIDs from the spine switches have been downloaded.

        4. From the admin node, add the switch to the fabric. The syntax to add the switch is, as follows:

          sgifmcli --add --id <fabric> --switch <hostname>

          An example command is, as follows:

          # sgifmcli --add --id ib0 --switch isr2004

          In this example, ISR2004 is connected to the ib0 fabric.

        5. For the new switch to be activated, the SM master and the optional SM standby need to be (re)started.

          # sgifmcli --start --id master_ib0

          If the SM master was running while the switch was added, you first need to stop and then start the master, as follows:

          # sgifmcli --stop --id master_ib0
          # sgifmcli --start --id master_ib0

          If a standby has been defined, then in case of an SM master failure, the SM standby automatically takes over and assumes control of the switch.

        6. The switches related to a particular fabric can be listed, as follows:

          # sgifmcli --switchlist --id <fabric>
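
As a sketch of steps 3 and 4, the following function validates the switch model and type against the values listed in this chapter and then prints the init and add commands. The plan_add_switch name and the SUPPORTED_MODELS variable are hypothetical. The commands are printed, not executed.

```shell
# Switch models listed as supported earlier in this chapter.
SUPPORTED_MODELS="voltaire-isr-9024 voltaire-isr-2004 voltaire-isr-2012 voltaire-isr-9096 voltaire-isr-9288"

# Validate the arguments, then print the commands that initialize an
# external switch and add it to a fabric.
plan_add_switch() {
    # $1 = model, $2 = switch host name, $3 = leaf|spine, $4 = fabric
    case " $SUPPORTED_MODELS " in
        *" $1 "*) ;;
        *) echo "error: unsupported switch model: $1" >&2; return 1 ;;
    esac
    case "$3" in
        leaf|spine) ;;
        *) echo "error: switch type must be leaf or spine: $3" >&2; return 1 ;;
    esac
    echo "sgifmcli --init --ibswitch --model $1 --id $2 --switchtype $3"
    echo "sgifmcli --add --id $4 --switch $2"
}

plan_add_switch voltaire-isr-2004 isr2004 spine ib0
```

After the switch is added, the SM master still has to be restarted as described in step 5.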

        Verifying the InfiniBand Network

        After your InfiniBand fabric has been configured and started, you can use the sgifmcli(8) command to verify the health of the fabric.

        Procedure 4-4. Verifying the InfiniBand Network

          The fabric can be either ib0 or ib1. This version of the InfiniBand verifier runs the recommended OFED test suite. In addition, the SGI Tempo cluster view is compared with the InfiniBand cluster view and potential differences are reported.

          To verify a fabric, run the following command, replacing <fabric> with ib0 or ib1:

          # sgifmcli --verify --id <fabric>

          For more information, see the sgifmcli(8) man page.
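
A minimal sketch of verifying several fabrics in one pass follows. It relies only on the documented exit codes (0 for success, non-zero for failure). The verify_fabrics function and the SGIFMCLI variable are hypothetical; the variable exists so that the command can be swapped out when testing the script.

```shell
# Command to run; override SGIFMCLI to test the script without a fabric.
SGIFMCLI=${SGIFMCLI:-sgifmcli}

# Verify each named fabric and report the result per fabric.
# Returns non-zero if any verification failed.
verify_fabrics() {
    failed=""
    for fabric in "$@"; do
        if "$SGIFMCLI" --verify --id "$fabric"; then
            echo "$fabric: verified"
        else
            echo "$fabric: verification FAILED"
            failed="$failed $fabric"
        fi
    done
    [ -z "$failed" ]
}

# Example use on the admin node (not executed here):
#   verify_fabrics ib0 ib1      # on LX systems, pass ib0 only
```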

          Useful Utilities and Diagnostics

          The infiniband-diags-pp package contains useful tools and diagnostic software for Open Fabrics Enterprise Distribution (OFED). This section describes some of these tools. These tools reside on the rack leader controller (leader node) in the /usr/sbin directory. To see a full list of diagnostics, from the leader node, use the following command:

          # rpm -ql infiniband-diags-pp | grep "/usr/sbin"

          This section covers the ibstat, ibstatus, perfquery, ibnetdiscover, and ibdiagnet commands.

          ibstat and ibstatus Commands

          You can use the ibstat command to see the current status of the host channel adapters (HCA) in your InfiniBand fabric including the HCAs on rack leader controllers. The following view is prior to starting the fabric management:

          r1lead:/usr/bin # ibstat
          CA 'mthca0'
                  CA type: MT25208 (MT23108 compat mode)
                  Number of ports: 2
                  Firmware version: 4.7.600
                  Hardware version: a0
                  Node GUID: 0x0008f104039881a8
                  System image GUID: 0x0008f104039881ab
                  Port 1:
                          State: Initializing
                          Physical state: LinkUp
                          Rate: 20
                          Base lid: 0
                          LMC: 0
                          SM lid: 0
                          Capability mask: 0x02510a68
                          Port GUID: 0x0008f104039881a9
                  Port 2:
                          State: Initializing
                          Physical state: LinkUp
                          Rate: 20
                          Base lid: 0
                          LMC: 0
                          SM lid: 0
                          Capability mask: 0x02510a68
                          Port GUID: 0x0008f104039881aa

          The following shows output from the ibstat command after the fabric management software has been started:

          r1lead:/opt/sgi/sbin # ibstat
          CA 'mthca0'
                  CA type: MT25208 (MT23108 compat mode)
                  Number of ports: 2
                  Firmware version: 4.7.600
                  Hardware version: a0
                  Node GUID: 0x0008f104039881a8
                  System image GUID: 0x0008f104039881ab
                  Port 1:
                          State: Active
                          Physical state: LinkUp
                          Rate: 20
                          Base lid: 1
                          LMC: 0
                          SM lid: 1
                          Capability mask: 0x02510a6a
                          Port GUID: 0x0008f104039881a9
                  Port 2:
                          State: Active
                          Physical state: LinkUp
                          Rate: 20
                          Base lid: 1
                          LMC: 0
                          SM lid: 1
                          Capability mask: 0x02510a6a
                          Port GUID: 0x0008f104039881aa

          You can use the ibstatus command (less verbose than ibstat) to show the link rate, as follows:

          r1lead:/opt/sgi/sbin # ibstatus
          Infiniband device 'mthca0' port 1 status:
                  default gid:     fe80:0000:0000:0000:0008:f104:0398:81a9
                  base lid:        0x1
                  sm lid:          0x1
                  state:           4: ACTIVE
                  phys state:      5: LinkUp
                  rate:            20 Gb/sec (4X DDR)
          
          Infiniband device 'mthca0' port 2 status:
                  default gid:     fe80:0000:0000:0000:0008:f104:0398:81aa
                  base lid:        0x1
                  sm lid:          0x1
                  state:           4: ACTIVE
                  phys state:      5: LinkUp
                  rate:            20 Gb/sec (4X DDR)


          Note: If the link rate is not 20 Gb/sec (4X DDR) and you have a DDR-capable HCA, there is a physical link problem with your system.
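
The check described in the note can be sketched as a filter over ibstatus output. The check_link_rate function is hypothetical and assumes the output layout shown above. The expected rate defaults to 20 Gb/sec and can be passed as an argument for HCAs with a different rated speed.

```shell
# Read ibstatus output on stdin and print every port whose rate differs
# from the expected value (default: "20 Gb/sec").
# Returns non-zero if any port is below or above the expected rate.
check_link_rate() {
    awk -v want="${1:-20 Gb/sec}" '
        # Header line: remember the device name and port number.
        $1 == "Infiniband" && $2 == "device" { dev = $3; port = $5 }
        # Rate line: compare the numeric rate and unit with the expectation.
        $1 == "rate:" {
            rate = $2 " " $3
            if (rate != want) {
                print dev " port " port ": " rate
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Example use on a leader node (not executed here):
#   ibstatus | check_link_rate || echo "check the physical links"
```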


          perfquery Command

          The perfquery command is useful for finding errors on one or more HCA and switch ports. You can also use perfquery to reset HCA and switch port counters.

          To see a usage statement for the perfquery command, perform the following:

          r1lead:/opt/sgi/sbin # perfquery --help
          Usage: perfquery [-d(ebug) -G(uid) -a(ll_ports) -r(eset_after_read) -C ca_name -P ca_port -R(eset_only)
           -t(imeout) timeout_ms -V(ersion) -h(elp)] [<lid|guid> [[port] [reset_mask]]]
                  Examples:
                          perfquery               # read local port's performance counters
                          perfquery 32 1          # read performance counters from lid 32, port 1
                          perfquery -e 32 1       # read extended performance counters from lid 32, port 1
                          perfquery -a 32         # read performance counters from lid 32, all ports
                          perfquery -r 32 1       # read performance counters and reset
                          perfquery -e -r 32 1    # read extended performance counters and reset
                          perfquery -R 0x20 1     # reset performance counters of port 1 only
                          perfquery -e -R 0x20 1  # reset extended performance counters of port 1 only
                          perfquery -R -a 32      # reset performance counters of all ports
                          perfquery -R 32 2 0x0fff        # reset only error counters of port 2
                          perfquery -R 32 2 0xf000        # reset only non-error counters of port 2

          Some sample output from the perfquery command is, as follows:
          r1lead:/opt/sgi/sbin # perfquery
          # Port counters: Lid 1 port 1
          PortSelect:......................1
          CounterSelect:...................0x0000
          SymbolErrors:....................0
          LinkRecovers:....................0
          LinkDowned:......................0
          RcvErrors:.......................0
          RcvRemotePhysErrors:.............0
          RcvSwRelayErrors:................0
          XmtDiscards:.....................0
          XmtConstraintErrors:.............0
          RcvConstraintErrors:.............0
          LinkIntegrityErrors:.............0
          ExcBufOverrunErrors:.............0
          VL15Dropped:.....................0
          XmtData:.........................0
          RcvData:.........................0
          XmtPkts:.........................0
          RcvPkts:.........................0
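
To pick out problems from output like the sample above, a script can list only the counters that are not zero. The nonzero_errors function below is a hypothetical sketch. It assumes the Name:....value layout shown above and treats every counter other than the selector fields and the data and packet counters as an error counter.

```shell
# Read perfquery output on stdin and print each non-zero error counter
# as name=value. Returns non-zero if any such counter was found.
nonzero_errors() {
    awk -F'[:.]+' '
        # Strip leading indentation so the counter name is the first field.
        { sub(/^[ \t]+/, "") }
        # Skip selector fields and the traffic (data and packet) counters.
        $1 ~ /(Select|Data|Pkts)$/ { next }
        # Report counters whose value is a positive decimal number.
        NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 {
            print $1 "=" $2
            found = 1
        }
        END { exit found + 0 }
    '
}

# Example use on a leader node (not executed here):
#   perfquery 32 1 | nonzero_errors
```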

          ibnetdiscover Command

          The ibnetdiscover command allows you to discover the IB fabric.

          To see a usage statement for the ibnetdiscover command, perform the following:

          r1lead:/opt/sgi/sbin # ibnetdiscover --help
          Usage: ibnetdiscover [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) 
          -g(rouping) -H(ca_list) -S(witch_list) 
          -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms 
          --switch-map switch-map] [<topology-file>]
          --switch-map <switch-map>  specify a switch-map file


          Note: Only abbreviated output is shown in this example.


          Sample output from the ibnetdiscover command is as follows:
          r1lead:/opt/sgi/sbin # ibnetdiscover
          #
          # Topology file: generated on Tue Jul 17 14:05:20 2007
          #
          # Max of 3 hops discovered
          # Initiated from node 0008f104039881a8 port 0008f104039881a9
          
          vendid=0x2c9
          devid=0xb924
          sysimgguid=0x8006900000000dd
          
          ...
          
          Switch   : 0x08006900000000dc ports 24 devid 0xb924 vendid 0x2c9 
          "MT47396 Infiniscale-III Mellanox Technologies"
          Switch   : 0x08006900000000a4 ports 24 devid 0xb924 vendid 0x2c9 
          "MT47396 Infiniscale-III Mellanox Technologies"
          
          To list only the host channel adapters (HCAs), use the -H option:
          r1lead:/opt/sgi/sbin # ibnetdiscover -H
          Ca       : 0x0030487aa7940000 ports 1 devid 0x6274 vendid 0x2c9 "MT25204 InfiniHostLx Mellanox Technologies"
          Ca       : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n8-ib0 HCA-1"
          Ca       : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
          Ca       : 0x0030487aa7840000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n1-ib0 HCA-1"
          Ca       : 0x0030487aa79c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n0-ib0 HCA-1"
          Ca       : 0x0030487aa7900000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n8-ib0 HCA-1"
          Ca       : 0x0030487aa7980000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n1-ib0 HCA-1"
          Ca       : 0x0008f104039881a8 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
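The HCA list can be reduced to one short line per adapter for use in scripts. The following is a minimal sketch that assumes the 'Ca : <guid> ports <n> ... "<description>"' layout shown in the sample output. The function name list_hcas is hypothetical.

```shell
#!/bin/sh
# list_hcas (hypothetical helper): read ibnetdiscover-style output on stdin
# and print "<guid> ports=<n> <description>" for each "Ca" line.
list_hcas() {
    awk -F'"' '
        /^[ \t]*Ca[ \t]*:/ {
            head = $1
            sub(/^[ \t]+/, "", head)           # drop indentation
            split(head, w, /[ \t]+/)           # w[3] = GUID, w[5] = port count
            desc = $2
            sub(/^[ \t]+/, "", desc)           # some descriptions start with a space
            printf "%s ports=%s %s\n", w[3], w[5], desc
        }'
}

# Lines copied from the sample output in this section.
hca_sample='Ca       : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n8-ib0 HCA-1"
Ca       : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
Switch   : 0x08006900000000dc ports 24 devid 0xb924 vendid 0x2c9'

printf '%s\n' "$hca_sample" | list_hcas
```

On a leader node, the filter could be fed from the command itself, for example: ibnetdiscover -H | list_hcas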
          

          ibdiagnet Command

          The ibdiagnet command scans the fabric using directed route packets, reports on its connectivity and devices, and identifies suspected bad links.

          To see a usage statement for the ibdiagnet command, perform the following:

          r1lead:/opt/sgi/sbin # ibdiagnet --help
          Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
          NAME
            ibdiagnet
          SYNOPSYS
            ibdiagnet [-c <count>] [-v] [-r] [-o <out-dir>]
               [-t <topo-file>] [-s <sys-name>] [-i <dev-index>] [-p <port-num>]
               [-pm] [-pc] [-P <>]
               [-lw <1x|4x|12x>] [-ls <2.5|5|10>]
              
          
          DESCRIPTION
            ibdiagnet scans the fabric using directed route packets and extracts all the 
            available information regarding its connectivity and devices.
            It then produces the following files in the output directory defined by the
            -o option (see below): 
              ibdiagnet.lst    - List of all the nodes, ports and links in the fabric
              ibdiagnet.fdbs   - A dump of the unicast forwarding tables of the fabric
                                 switches
              ibdiagnet.mcfdbs - A dump of the multicast forwarding tables of the fabric
                                 switches
              ibdiagnet.masks  - In case of duplicate port/node Guids, these file include
                                 the map between masked Guid and real Guids 
              ibdiagnet.sm     - A dump of all the SM (state and priority) in the fabric
              ibdiagnet.pm     - In case -pm option was provided, this file contain a dump
                                 of all the nodes PM counters
            In addition to generating the files above, the discovery phase also checks for
            duplicate node/port GUIDs in the IB fabric. If such an error is detected, it 
            is displayed on the standard output.
            After the discovery phase is completed, directed route packets are sent
            multiple times (according to the -c option) to detect possible problematic 
            paths on which packets may be lost. Such paths are explored, and a report of
            the suspected bad links is displayed on the standard output.
            After scanning the fabric, if the -r option is provided, a full report of the
            fabric qualities is displayed.
            This report includes: 
              SM report
              Number of nodes and systems
              Hop-count information: 
                   maximal hop-count, an example path, and a hop-count histogram
              All CA-to-CA paths traced 
              Credit loop report
              mgid-mlid-HCAs matching table
            Note: In case the IB fabric includes only one CA, then CA-to-CA paths are not
            reported.
            Furthermore, if a topology file is provided, ibdiagnet uses the names defined
            in it for the output reports.
                
          OPTIONS
            -c <count>                     : The minimal number of packets to be sent
                                             across each link (default = 10)
            -v                             : Instructs the tool to run in verbose mode
            -r                             : Provides a report of the fabric qualities
            -o <out-dir>                   : Specifies the directory where the output
                                             files will be placed (default = /tmp)
            -t <topo-file>                 : Specifies the topology file name
            -s <sys-name>                  : Specifies the local system name. Meaningful
                                             only if a topology file is specified
            -i <dev-index>                 : Specifies the index of the device of the port
                                             used to connect to the IB fabric (in case of
                                             multiple devices on the local system)
            -p <port-num>                  : Specifies the local device's port number used
                                             to connect to the IB fabric
            -pm                            : Dumps all pmCounters values into ibdiagnet.pm
            -pc                            : reset all the fabric links pmCounters
            -P <>: If any of the provided pm is greater then its
                                             provided value, print it to screen
            -lw <1x|4x|12x>                : Specifies the expected link width
            -ls <2.5|5|10>                 : Specifies the expected link speed
                                               
            -h|--help                      : Prints this help information
            -V|--version                   : Prints the version of the tool
               --vars                      : Prints the tool's environment variables and
                                             their values
          
          ERROR CODES
            1 - Failed to fully discover the fabric
            2 - Failed to parse command line options
            3 - Failed to interact with IB fabric
            4 - Failed to use local device or local port
            5 - Failed to use Topology File
            6 - Failed to load required Package
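A wrapper script can translate these codes into messages. The following is a minimal sketch under two assumptions that the help text does not state: that the listed error codes are returned as the exit status of the ibdiagnet process, and that an exit status of 0 means no error. The function name ibdiagnet_status_message is hypothetical.

```shell
#!/bin/sh
# ibdiagnet_status_message (hypothetical helper): map a status code to the
# text from the ERROR CODES list. Treating 0 as success is an assumption.
ibdiagnet_status_message() {
    case "$1" in
        0) echo "No error reported" ;;
        1) echo "Failed to fully discover the fabric" ;;
        2) echo "Failed to parse command line options" ;;
        3) echo "Failed to interact with IB fabric" ;;
        4) echo "Failed to use local device or local port" ;;
        5) echo "Failed to use Topology File" ;;
        6) echo "Failed to load required Package" ;;
        *) echo "Unknown status: $1" ;;
    esac
}

# Print the message for every code in the list.
for code in 1 2 3 4 5 6; do
    printf '%s - %s\n' "$code" "$(ibdiagnet_status_message "$code")"
done
```

In a script, the function could be called with the status of the last command, for example: ibdiagnet; ibdiagnet_status_message $?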
          

          Output that reports no bad GUIDs, no bad links, and no illegal PM counter values indicates that the fabric is operating correctly:

          r1lead:/opt/sgi/sbin # ibdiagnet
          Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
          Loading IBDM from: /usr/lib64/ibdm1.2
          -W- Topology file is not specified.
              Reports regarding cluster links will use direct routes.
          -W- A few ports of local device are up.
              Since port-num was not specified (-p option), port 1 of device 1 will be
              used as the local port.
          -I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
          
          
          -I---------------------------------------------------
          -I- Bad Guids Info
          -I---------------------------------------------------
          -I- No bad Guids were found
          
          -I---------------------------------------------------
          -I- Links With Logical State = INIT
          -I---------------------------------------------------
          -I- No bad Links (with logical state = INIT) were found
          
          -I---------------------------------------------------
          -I- PM Counters Info
          -I---------------------------------------------------
          -I- No illegal PM counters values were found
          
          -I---------------------------------------------------
          -I- Bad Links Info
          -I---------------------------------------------------
          -I- No bad link were found
           
          -I- Done. Run time was 0 seconds.
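A script can confirm that a run was clean by looking for the four "No ... found" lines. The following is a minimal sketch that matches the message strings exactly as they appear in the sample output above. Other ibdiagnet versions may word these lines differently, so treat the strings as assumptions. The function name ibdiagnet_clean is hypothetical.

```shell
#!/bin/sh
# ibdiagnet_clean (hypothetical helper): read ibdiagnet output on stdin.
# Print "clean" and return 0 if all four "no problem" messages are present;
# otherwise print the first missing message and return 1.
ibdiagnet_clean() {
    out=$(cat)
    for msg in \
        'No bad Guids were found' \
        'No bad Links (with logical state = INIT) were found' \
        'No illegal PM counters values were found' \
        'No bad link were found'
    do
        if ! printf '%s\n' "$out" | grep -qF -- "$msg"; then
            echo "missing: $msg"
            return 1
        fi
    done
    echo "clean"
}

# Message lines copied from the sample output in this section.
clean_sample='-I- No bad Guids were found
-I- No bad Links (with logical state = INIT) were found
-I- No illegal PM counters values were found
-I- No bad link were found'

printf '%s\n' "$clean_sample" | ibdiagnet_clean
```

On a leader node, the check could be fed from the command itself, for example: ibdiagnet | ibdiagnet_clean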
          

          To test the fabric under load, use the -c option to increase the number of packets sent across each link, as follows:

          r1lead:/opt/sgi/sbin # ibdiagnet -c 5000
          Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
          Loading IBDM from: /usr/lib64/ibdm1.2
          -W- Topology file is not specified.
              Reports regarding cluster links will use direct routes.
          -W- A few ports of local device are up.
              Since port-num was not specified (-p option), port 1 of device 1 will be
              used as the local port.
          -I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
          
          
          -I---------------------------------------------------
          -I- Bad Guids Info
          -I---------------------------------------------------
          -I- No bad Guids were found
          
          -I---------------------------------------------------
          -I- Links With Logical State = INIT
          -I---------------------------------------------------
          -I- No bad Links (with logical state = INIT) were found
          
          -I---------------------------------------------------
          -I- PM Counters Info
          -I---------------------------------------------------
          -I- No illegal PM counters values were found
          
          -I---------------------------------------------------
          -I- Bad Links Info
          -I---------------------------------------------------
          -I- No bad link were found
           
          -I- Done. Run time was 8 seconds.