Chapter 4. System Fabric Management

The InfiniBand network on SGI Altix ICE 8200 series systems uses Open Fabrics Enterprise Distribution (OFED) software. This section describes the InfiniBand fabric and how to manage it. For background information on OFED, see http://www.openfabrics.org.

InfiniBand Fabric Management

This section describes the InfiniBand fabric and covers the following topics:

InfiniBand Fabric Overview

Fabric management on SGI Altix ICE 8200 series systems uses the OFED OpenSM software package. The InfiniBand fabric connects the service nodes, rack leader controllers (leader nodes), and the compute nodes. It does not connect to the system admin controller (admin node) or the chassis management control (CMC) blades. The InfiniBand network has two separate network fabrics, ib0 and ib1 (see “InfiniBand Fabric” in Chapter 1) with the following characteristics:

  • Each network fabric has its own subnet manager (SM).

  • For a system with two or more racks, one rack leader controller (leader node) runs an instance of SM to manage the ib0 fabric and a second leader node runs an instance of SM to manage the ib1 fabric. A database on the admin node keeps a record of which rack leader nodes are running the fabric management software for ib0 and ib1, respectively. The smadmin command has the logic to place opensm on the appropriate rack leader controller. If one of the rack leader controllers becomes unavailable, management of that fabric can be assigned to another available rack leader node in the system.


    Note: The LX series has only one InfiniBand fabric; therefore, run the smconfig and smadmin commands only on ib0 (see “InfiniBand Fabric Administrative Tools”).


  • On a system with a single rack, both instances of opensm run on the same rack leader node.

  • Each instance of SM on the rack leader controller is controlled by the /etc/ofa/opensm-ib[01].conf configuration file. For more information, see “smconfig Automatic Fabric Configuration Tool”.

  • Rack leader controllers run the opensm daemon for each fabric over separate HCA ports (see Figure 1-9).


    Note: With this release, after a system reboot, the opensm daemons start running automatically on the InfiniBand fabric.


  • Each fabric is addressed by a global unique identifier (GUID) and unique HCA port.

    The GUID and HCA port are set in the configuration file.


Note: Currently, the InfiniBand fabric ib0 is reserved for MPI and the InfiniBand fabric ib1 is reserved for storage.


InfiniBand Fabric Administrative Tools

This section describes how to configure and administer your InfiniBand fabric and covers these topics:


Note: The LX series has only one InfiniBand fabric; therefore, run the smconfig and smadmin commands described in this section only on the ib0 fabric.


InfiniBand Fabric Management Tool

You can use the InfiniBand management tool to configure or administer the InfiniBand fabric on your SGI Altix ICE system. With it, you can configure, start, stop, restart, clean up, or get the status of the InfiniBand fabric.

From the system admin controller (admin node), enter the following command:

sys-admin:~ # tempo-configure-fabric

The InfiniBand Management Tool GUI appears, as shown in Figure 4-1.

You can also access this tool from the configure-cluster main menu. For more information, see “configure-cluster Command” in Chapter 2.

Figure 4-1. InfiniBand Management Tool

Use the Select button to select the action you want to perform. A submenu will appear. Use the Quit button to return to the previous screen. Unless you are an expert user, SGI recommends that you use the InfiniBand Management GUI to manage your InfiniBand fabric rather than the smconfig and smadmin CLI commands. You can use the Help button to get online help for each of the GUI actions.

smconfig Automatic Fabric Configuration Tool

SGI Tempo also provides a command-line smconfig tool that automatically configures the fabric for you. “Configuring and Initializing the InfiniBand Fabric Manually” describes how to manually configure a fabric and provides more detailed information on how fabric configuration works.

The syntax of the smconfig command is as follows:

smconfig -f [ib0 | ib1] -e [dor | updn] OPTION 

It accepts the following options:
Option                        Description

-f [ib0 | ib1]                Selects fabric ib0 or ib1. (Required)

-e [dor | updn]               Selects the routing engine. Use dor for hypercube topology; use updn for fat-tree topology. (Required) For more information, see “Network Topology”.

OPTION                        Any combination of the following:


Note: The following options are optional, and you can specify any combination of them.


-a FILE                       Specifies an external switch file

-o "172.23.0.2 172.23.0.3"    Specifies the OSM hosts list


Note: For the SGI Tempo v1.5 release, the -l and -r options have been deprecated.


Procedure 4-1. Using the smconfig Command to Automatically Configure the InfiniBand Fabric

    To automatically configure the ib0 and ib1 InfiniBand fabrics on your system, perform the following:

    1. From the system admin controller (admin node), run the following command:

      admin:~ # smconfig -f ib0 -e dor
      Configuring r1lead
      Configuring r2lead
      Configuring r3lead
      Configuring r4lead

    2. Repeat the command for the ib1 fabric, as follows:

      admin:~ # smconfig -f ib1 -e dor
      Configuring r1lead
      Configuring r2lead
      Configuring r3lead
      Configuring r4lead
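
The two steps above can be sketched as a single loop. This is only a minimal sketch, not an SGI-supplied script: the function name configure_fabrics and the SMCONFIG override variable are hypothetical, introduced here so the sequence can be dry-run, and only the -f and -e options documented above are used.

```shell
#!/bin/sh
# Hypothetical helper: run smconfig once per fabric with the same routing engine.
# SMCONFIG is an assumed override (not an SGI variable) used for dry runs.
configure_fabrics() {
    engine="$1"
    shift
    for fabric in "$@"; do
        # Stop at the first fabric that fails to configure.
        "${SMCONFIG:-smconfig}" -f "$fabric" -e "$engine" || return 1
    done
}

# Example (run on the admin node):
#   configure_fabrics dor ib0 ib1
# Dry run that only prints the arguments that would be passed:
#   SMCONFIG=echo configure_fabrics dor ib0 ib1
```

On an LX series system, which has a single fabric, the same function would be called with ib0 only.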

    smadmin InfiniBand Fabric Administration Tool

    SGI Tempo provides the smadmin tool that allows you to start up or stop the ib0 and ib1 InfiniBand fabrics. You can also use this tool to restart a fabric or get the status of a fabric. Use this command after your Altix ICE system has been discovered and is powered up (see “smconfig Automatic Fabric Configuration Tool”).

    The syntax of the smadmin command is as follows:

     smadmin -f [ib0 | ib1] OPTION

    It accepts the following options:

    Option    Description

    -f        Fabric ib0 or fabric ib1 (Required)

    OPTION    One of the following:

    -u        Start fabric management

    -d        Stop fabric management

    -r        Restart fabric management

    -c        Attempt a fabric cleanup

    -s        Get opensmd status (see “InfiniBand Fabric Management Configuration and Operation Overview”)

    -m        Find opensmd MASTER node


    Note: For the SGI Tempo v1.5 release, the -e option has been deprecated.


    Procedure 4-2. Using the smadmin Command to Administer the InfiniBand Fabric

      The opensm instance for each fabric runs on a different rack leader node. For example, the first rack leader controller discovered runs opensm for ib0, and the second rack leader controller discovered runs opensm for ib1. The smadmin command has the logic to place opensm on the appropriate rack leader controller.

      1. From the system admin controller (admin node), to start fabric management on the ib0 fabric, perform the following:

        admin:~ # smadmin -f ib0 -u
        Running start of ib0
        opensm is stopped
        Starting opensm on r1lead
        opensm start [ OK ]
        smagent-rack: opensm configuration r1lead: opensmd started on fabric ib0
        Started opensm for fabric ib0 on r1lead
        

      2. From the admin node, to start fabric management on the ib1 fabric, perform the following:

        admin:~ # smadmin -f ib1 -u
        Running start of ib1
        Another fabric has opensm (pid 1253) running...
        smadmin notice : Another opensm is already running on r1lead
        Proceeding to next rack lead
        opensm is stopped
        Starting opensm on r2lead
        opensm start [ OK ]
        smagent-rack: opensm configuration r2lead: opensmd started on fabric ib1
        Started opensm for fabric ib1 on r2lead
        


        Note: The output for the command looks somewhat different because fabric ib0 is already running and the fabric management software detects this.


        If a fabric fails to start, you will see output similar to the following:

        Running start on r1lead
        smadmin: smadmin error : Invalid configuration on r1lead
         - Re run /opt/sgi/sbin/smconfig for r1le

        To fix this, run commands similar to the following:

        # smadmin -f ib0 -c       (clear the fabric)
        # smconfig -f ib0 -e dor         (reconfigure ib0, for instance)
        # smadmin -f ib0 -u              (start up the fabric ib0, for instance)

      3. If both fabric managers started successfully, you should be able to ping various -ib0 and -ib1 host names in your system (use the ifconfig(8) command to get the IP address). From one of the rack leader controllers, ping the service0 ib0 interface, as follows:

        r1lead# ping -c 1 10.148.0.67
        PING 10.148.0.67 (10.148.0.67) 56(84) bytes of data.
        64 bytes from 10.148.0.67: icmp_seq=1 ttl=64 time=0.013 ms
        
        --- 10.148.0.67 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.013/0.013/0.013/0.000 ms

        If you are not able to ping a system node at this point, it is most likely a cabling issue.

      4. To stop the fabric management software on a fabric, perform the following:

        admin:~ # smadmin -f ib0 -d
        Running stop of ib0
        opensm is running with pid of 1253...
        ......
        opensm shutdown [ OK ]
        smagent-rack: opensm configuration r1lead: opensmd stopped on fabric ib0
        

      5. The fabric manager for each fabric runs on a different rack leader controller node. There is one MASTER node and no standby. From the admin node, to find the MASTER node, perform the following:

        admin:~ # smadmin -f ib0 -m
        smagent-rack: opensm configuration (from r1lead): opensmd master for ib0 is r1lead

      6. To determine the status of the fabric management software running on your system, perform the following:

        admin:~ # smadmin -f ib0 -s
        Running status of ib0 on r1lead
        opensm is running with pid of 30761...
        Running status of ib0 on r2lead
        Another fabric has opensm (pid 30263) running...
        Running status of ib0 on r3lead
        opensm is stopped
        Running status of ib0 on r4lead
        opensm is stopped.
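
Because the -s output lists every rack leader, it can help to reduce it to the node that actually runs opensm for the queried fabric. The following is a minimal sketch that assumes the output format shown in step 6; the function name running_leaders is hypothetical, and the parsing may need adjusting if the output differs on your release.

```shell
#!/bin/sh
# Hypothetical helper: read "smadmin -f <fabric> -s" output on stdin and print
# the rack leader(s) on which opensm is running for that fabric.
running_leaders() {
    awk '
        # "Running status of ib0 on r1lead": the last field is the node name.
        $1 == "Running" && $2 == "status" { node = $NF }
        # Only this fabric matches; "Another fabric has opensm ..." is ignored.
        /opensm is running with pid/ { print node }
    '
}

# Example (run on the admin node):
#   smadmin -f ib0 -s | running_leaders
```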
        

      Procedure 4-3. Troubleshooting the InfiniBand Fabric

        If the fabric management software dies or exits incorrectly, a state may exist that prevents it from being restarted on that fabric until a cleanup of the fabric management database is performed, as follows:

        1. Perform this set of commands from the system admin controller (admin node):

          admin:~ # /opt/sgi/sbin/smadmin -f ib0 -d
          admin:~ # /opt/sgi/sbin/smadmin -f ib0 -c
          admin:~ # /opt/sgi/sbin/smadmin -f ib0 -u

        2. Repeat for ib1 fabric if necessary.
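
The stop, cleanup, and start sequence above can be wrapped in a small function so that it is applied the same way to either fabric. This is a sketch under assumptions: the function name reset_fabric and the SMADMIN override variable are hypothetical, and the exit status of smadmin is not documented here, so the sketch does not abort if the stop or cleanup step reports a failure.

```shell
#!/bin/sh
# Hypothetical helper: stop, clean up, and restart fabric management for one
# fabric. SMADMIN is an assumed override (not an SGI variable) for dry runs.
reset_fabric() {
    fabric="$1"
    smadmin_cmd="${SMADMIN:-/opt/sgi/sbin/smadmin}"
    "$smadmin_cmd" -f "$fabric" -d    # stop fabric management
    "$smadmin_cmd" -f "$fabric" -c    # attempt a fabric cleanup
    "$smadmin_cmd" -f "$fabric" -u    # start; this exit status is returned
}

# Example (run on the admin node):
#   reset_fabric ib0
#   reset_fabric ib1    # only if ib1 needs it as well
```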

        For hypercube topology, the OpenSM software is resilient in the following situations:

        • compute blade failures / reboots

        • IB switch cable pull-outs / reseating

        • IB switch failures / reboots

        InfiniBand Fabric Management Configuration and Operation Overview

        Each subnet manager (SM) performs a light sweep of the fabric it is managing every 10 seconds by default. The time interval is set by the SWEEP variable in the opensm-ib0.conf and opensm-ib1.conf configuration files located in the /etc/ofa directory on the rack leader node.


        Note: SGI highly recommends that you do NOT change this variable.


        If an SM detects a change in the fabric during a light sweep, such as the addition or deletion of a node, it performs a heavy sweep. The heavy sweep actually changes the fabric configuration to reflect the current state of the system.

        A sample opensm-ibx.conf configuration file is as follows:

        Example 4-1. opensm-ib0.conf and opensm-ib1.conf Configuration Files

        # DEBUG mode
        #  This option specifies a debug option.
        #  These options are not normally needed.
        #  The number following -d selects the debug
        #  option to enable as follows:
        #  OPT   Description
        #  ---    -----------------
        #  0  - Ignore other SM nodes.
        #  1  - Force single threaded dispatching.
        #  2  - Force log flushing after each log message.
        #  3  - Disable multicast support.
        #  4  - Put OpenSM in memory tracking mode.
        #  10.. Put OpenSM in testability mode.
        #  none, no debug options are enabled.
        DEBUG=none
        
        # LMC          
        #  This option specifies the subnet's LMC value.
        #  The number of LIDs assigned to each port is 2^LMC.
        #  The LMC value must be in the range 0-7.
        #  LMC values > 0 allow multiple paths between ports.
        #  LMC values > 0 should only be used if the subnet
        #  topology actually provides multiple paths between
        #  ports, i.e. multiple interconnects between switches.
        #  OpenSM defaults to LMC = 0, which allows
        #  one path between any two ports.
        LMC=0
        
        # MAXSMPS
        #  This option specifies the number of VL15 SMP MADs
        #  allowed on the wire at any one time.
        #  Specifying -maxsmps 0 allows unlimited outstanding SMPs.
        #  Without -maxsmps, OpenSM defaults to a maximum of
        #  one outstanding SMP.
        MAXSMPS=0
        
        # REASSIGN_LIDS
        #  This option causes OpenSM to reassign LIDs to all
        #  end nodes. Specifying "REASSIGN_LIDS=yes" on a running subnet
        #  may disrupt subnet traffic.
        #  With "REASSIGN_LIDS=no", OpenSM attempts to preserve existing
        #  LID assignments resolving multiple use of same LID.
        REASSIGN_LIDS="no"
        
        # SWEEP
        #  This option specifies the number of seconds between
        #  subnet sweeps.  Specifying SWEEP=0 disables sweeping.
        #  OpenSM defaults to a sweep interval of 10 seconds.
        SWEEP=10
        
        # TIMEOUT
        #  This option specifies the time in milliseconds
        #  used for transaction timeouts.
        #  Specifying -t 0 disables timeouts.
        #  Without -t, OpenSM defaults to a timeout value of
        #  200 milliseconds.
        TIMEOUT=200
        
        # OSM_LOG
        #  This option defines the log to be the given file.
        #  By default the log goes to /tmp/osm.log.
        #  For the log to go to standard output use OSM_LOG=stdout.
        OSM_LOG=/var/log/osm-ib0.log                                                                         
        
        # VERBOSE
        #  This option increases the log verbosity level.
        #  The "-v" option may be specified multiple times
        #  to further increase the verbosity level.
        #   "-V" option sets the maximum verbosity level and
        #   forces log flushing.
        #   The "-V" is equivalent to "-vf 0xFF -d 2".
        VERBOSE="none"
        
        # ROUTING_ENGINE
        #  This option chooses the routing engine instead of 
        #  the Min Hop algorithm which is default.
        #  Valid routing engines are :-
        #         Min Hop, dor, updn, file, ftree
        #  To switch to different routing engine set the engine
        #  name in ROUTING_ENGINE (i.e.  ROUTING_ENGINE=dor).
        #  For Min Hop use ROUTING_ENGINE="none" or ROUTING_ENGINE=
        ROUTING_ENGINE="dor"
        
        # GUID_FILE
        #  This option only allowed when UPDN algorithm is activated
        #  It specifies the guid list file from which to fetch the guid list
        #  The file contain in each line only one valid guid
        GUID_FILE="none"
        
        #  This option specifies the local port GUID value
        #  with which OpenSM should bind.  OpenSM may be
        #  bound to 1 port at a time.
        #  If GUID given is 0, opensmd use PORT_NUM parameter.
        #  Without -g (GUID="none"), OpenSM trys to use the default port.
        #  example GUID="0x0005ad00000517c9"
        GUID="none"
        
        # OSM_HOSTS
        #  The list of all SM's IP addresses in InfiniBand subnet
        #  Used to handover mechanism
        #  example OSM_HOSTS="128.162.246.221 128.162.246.42"
        OSM_HOSTS="none"
        
        # OSM_CACHE_DIR
        OSM_CACHE_DIR="/var/cache/osm/ib0"
        
        # CACHE_OPTIONS
        #  Cache the given command line options into the file
        #  /var/cache/osm/opensm-ib0.opts for use next invocation
        #  The cache directory can be changed by the environment
        #  variable OSM_CACHE_DIR
        #  Set to '--cache-options' or '-c' in order to enable
        CACHE_OPTIONS="-c"
        
        # HONORE_GUID2LID 
        #  This option forces OpenSM to honor the guid2lid file,
        #  when it comes out of Standby state, if such file exists
        #  under OSM_CACHE_DIR, and is valid.
        #  Set to '--honor_guid2lid' or '-x' to enable.
        #  By default this is FALSE. Will be set automatically to '--honor_guid2lid'
        #  if OSM_HOSTS includes list of more then one IP addresses.
        HONORE_GUID2LID="none"
        
        # RCP
        #  This option osed by SLDD daemon for handover mechanism
        #  to copy local cache file to remote computer
        RCP=/usr/bin/scp
        
        # RSH
        #  This option osed by SLDD daemon for handover mechanism
        #  to execute commands on remote computer
        RSH=/usr/bin/ssh
        
        # RESCAN_TIME
        #  This option osed by SLDD daemon for handover mechanism
        #  Time between sweep of sldd daemon in seconds
        RESCAN_TIME=60
        
        # PORT_NUM
        #  This option defines HCA's port number which OpenSM should bind
        PORT_NUM=1
        
        # ONBOOT
        #  To start OpenSM automatically set ONBOOT=yes
        ONBOOT=yes
        
        # MULTI_FABRIC
        # Allow multiple fabrics (and copies of OpenSM) on the same SM host
        MULTI_FABRIC=yes
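
Because the file consists of shell-style NAME=value assignments, individual settings can be read back without opening an editor, for example to confirm the sweep interval or routing engine on a rack leader node. The following is a minimal sketch; the function name conf_get is hypothetical, and it assumes that each setting appears on a line of its own, as in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to one variable in an
# opensm-ib0.conf or opensm-ib1.conf style file.
#   $1 = configuration file, $2 = variable name
conf_get() {
    # Comment lines start with "#", so they never match the anchored pattern.
    sed -n "s/^[[:space:]]*$2=//p" "$1" |
        tail -n 1 |                     # the last assignment wins
        tr -d '"' |                     # drop optional double quotes
        sed 's/[[:space:]]*$//'         # drop trailing whitespace
}

# Example (run on a rack leader node):
#   conf_get /etc/ofa/opensm-ib0.conf SWEEP
#   conf_get /etc/ofa/opensm-ib0.conf ROUTING_ENGINE
```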
        

        Each fabric is addressed by a global unique identifier (GUID) and unique HCA port (see Figure 4-2). Each fabric has a unique GUID set in its respective configuration file.

        Figure 4-2. Two InfiniBand Fabrics in a System with Two IRUs

        Network Topology

        For SGI Altix ICE systems with a hypercube topology, SGI requires the ROUTING_ENGINE="dor" setting, which selects the dimension order routing algorithm.

        The dimension order routing algorithm is based on the min hop algorithm and so uses shortest paths. Instead of spreading traffic out across different paths with the same shortest distance, it chooses among the available shortest paths based on an ordering of dimensions.

        For SGI Altix ICE systems with a fat-tree topology, SGI requires the ROUTING_ENGINE="updn" setting. The Up/Down unicast routing algorithm (updn) is also based on the minimum hops to each node, but it is constrained by ranking rules.

        For more information on routing variables, see the opensm(8) man page.
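
The topology-to-engine rule above can be captured in a small lookup so that scripts do not hard-code the engine name. This is a minimal sketch; the function name routing_engine_for and the topology keywords hypercube and fat-tree are illustrative choices, not names used by the SGI tools.

```shell
#!/bin/sh
# Hypothetical helper: map a topology keyword to the routing engine value
# that SGI requires for it (dor for hypercube, updn for fat-tree).
routing_engine_for() {
    case "$1" in
        hypercube) echo dor ;;
        fat-tree)  echo updn ;;
        *)
            echo "unknown topology: $1" >&2
            return 1
            ;;
    esac
}

# Example (run on the admin node):
#   smconfig -f ib0 -e "$(routing_engine_for hypercube)"
```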

        Hypercube network topology is well suited for smaller node count MPI jobs or jobs that have communication patterns that are not sensitive to bisection bandwidth. Fat-tree network topology is well suited for large node count MPI jobs that are sensitive to bisection bandwidth.

        As stated above, there are two opensm daemons, one for each fabric: opensmd-ib0 and opensmd-ib1. They are controlled by init.d scripts. Each init.d script has a separate configuration file for its fabric, opensm-ib0 and opensm-ib1, respectively.

        You can use the sminfo command to show the GUID of the SM master.

        Configuring the InfiniBand Fat-tree Network Topology

        This section describes how to configure InfiniBand fat-tree network topology.

        Procedure 4-4. Configuring InfiniBand Fat-tree Network Topology

          To configure the InfiniBand fat-tree network topology on an SGI Altix ICE 8200 series system, perform the following steps:

          1. Create the following files on the system admin controller (admin node):

            /opt/sgi/var/smadmin/ext_switch-ib0
            /opt/sgi/var/smadmin/ext_switch-ib1

          2. With your favorite text editor, specify the external switches on the ib0 and ib1 fabrics and their model names in colon-separated lists. For example, for the /opt/sgi/var/smadmin/ext_switch-ib0 file add two entries, one for each external switch and its model name, as follows:

            service52020:ISR2012
            
            service51020:ISR2012

            Perform this step for the /opt/sgi/var/smadmin/ext_switch-ib1 file.

          3. From the admin node, run the following command:

            # smconfig -f ib0 -e updn -a /opt/sgi/var/smadmin/ext_switch-ib0

          4. From the admin node, run the following command:

            # smadmin -f ib0 -u

            Repeat these two steps for the ib1 fabric, as follows.

          5. From the admin node, run the following command:

            # smconfig -f ib1 -e updn -a /opt/sgi/var/smadmin/ext_switch-ib1

          6. From the admin node, run the following command:

            # smadmin -f ib1 -u
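
Before passing an external switch file to smconfig with -a, it may be worth checking that every entry has the name:model form described in step 2. The following is a sketch under assumptions: the function name check_ext_switch_file is hypothetical, blank lines are assumed to be harmless, and the check covers only the shape of each line, not whether the switch or model name is valid.

```shell
#!/bin/sh
# Hypothetical helper: verify that each non-blank line of an external switch
# file has exactly two non-empty, colon-separated fields (name:model).
#   $1 = path to the file, for example /opt/sgi/var/smadmin/ext_switch-ib0
check_ext_switch_file() {
    awk -F: '
        /^[[:space:]]*$/ { next }              # assumption: blank lines are allowed
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected name:model, got \"%s\"\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }                   # nonzero if any line was rejected
    ' "$1"
}

# Example (run on the admin node):
#   check_ext_switch_file /opt/sgi/var/smadmin/ext_switch-ib0 &&
#       smconfig -f ib0 -e updn -a /opt/sgi/var/smadmin/ext_switch-ib0
```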

          Configuring and Initializing the InfiniBand Fabric Manually

          This section describes the changes you need to make to the /etc/opensm-ib0.conf or /etc/opensm-ib1.conf configuration file to configure the opensm software, how to start the opensmd-ib0 and opensmd-ib1 daemons, and how to verify that the fabric is operating. For an overview of fabric configuration and management, see “InfiniBand Fabric Management Configuration and Operation Overview”.

          Procedure 4-5. Configuring and Initializing the InfiniBand Fabric Manually

            To configure, initialize, and verify the InfiniBand fabric, perform the following steps:

            1. From the admin node, connect to the leader node of rack 1, as follows:

              # ssh r1lead


              Note: Before attempting to initialize the InfiniBand fabric, make sure all compute nodes are booted and operational.


            2. From the admin node, determine and record the IP addresses of the leader nodes, as follows:

              # ping -c 1 r1lead
              PING r1lead.ice.americas.sgi.com (172.16.0.2) 56(84) bytes of data.
              64 bytes from r1lead.ice.americas.sgi.com (172.16.0.2): icmp_seq=1 ttl=64 time=0.127 ms
              
              --- r1lead.ice.americas.sgi.com ping statistics ---
              1 packets transmitted, 1 received, 0% packet loss, time 0ms
              rtt min/avg/max/mdev = 0.127/0.127/0.127/0.000 ms
              # ping -c 1 r2lead
              PING r2lead.ice.americas.sgi.com (172.16.0.3) 56(84) bytes of data.
              64 bytes from r2lead.ice.americas.sgi.com (172.16.0.3): icmp_seq=1 ttl=64 time=0.089 ms
              
              --- r2lead.ice.americas.sgi.com ping statistics ---
              1 packets transmitted, 1 received, 0% packet loss, time 0ms
              rtt min/avg/max/mdev = 0.089/0.089/0.089/0.000 ms
              # ping -c 1 r3lead
              PING r3lead.ice.americas.sgi.com (172.16.0.4) 56(84) bytes of data.
              64 bytes from r3lead.ice.americas.sgi.com (172.16.0.4): icmp_seq=1 ttl=64 time=0.129 ms
              
              --- r3lead.ice.americas.sgi.com ping statistics ---
              1 packets transmitted, 1 received, 0% packet loss, time 0ms
              rtt min/avg/max/mdev = 0.129/0.129/0.129/0.000 ms
              # ping -c 1 r4lead
              PING r4lead.ice.americas.sgi.com (172.16.0.5) 56(84) bytes of data.
              64 bytes from r4lead.ice.americas.sgi.com (172.16.0.5): icmp_seq=1 ttl=64 time=0.136 ms
              
              --- r4lead.ice.americas.sgi.com ping statistics ---
              1 packets transmitted, 1 received, 0% packet loss, time 0ms
              rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms

            3. From the leader node, issue an ibstat command to determine the Port GUID values, as follows:

              r1lead:/ # ibstat
              CA 'mthca0'
                      CA type: MT23108
                      Number of ports: 2
                      Firmware version: 3.3.3
                      Hardware version: a1
                      Node GUID: 0x0008f1040397b03c
                      System image GUID: 0x0008f1040397b03f
                      Port 1:
                              State: Active
                              Physical state: LinkUp
                              Rate: 10
                              Base lid: 1
                              LMC: 0
                              SM lid: 1
                              Capability mask: 0x02510a6a
                              Port GUID: 0x0008f1040397b03d <---<< goes into opensm-ib0.conf
                      Port 2:
                              State: Initializing
                              Physical state: LinkUp
                              Rate: 10
                              Base lid: 0
                              LMC: 0
                              SM lid: 0
                              Capability mask: 0x02510a68
                              Port GUID: 0x0008f1040397b03e <---<< goes into opensm-ib1.conf


              Note: Get usage information on the ibstat command, as follows:
              r1lead:/ # ibstat --help
              Usage: ibstat [-d(ebug) -l(ist_of_cas) -s(hort) -p(ort_list) -V(ersion)]  [portnum]
                      Examples:
                              ibstat -l         # list all IB devices
                              ibstat mthca0 2 # stat port 2 of 'mthca0'



            4. From the leader node, change to the /etc directory, as follows:

              r1lead:/ # cd /etc

            5. Using your favorite editor, open the opensm-ib0.conf file and enter the Port GUID: value, in this example, 0x0008f1040397b03d, as follows:

              GUID="0x0008f1040397b03d"

            6. Using your favorite editor, open the opensm-ib1.conf file and enter the Port GUID: value, in this example, 0x0008f1040397b03e, as follows:

              GUID="0x0008f1040397b03e"

            7. There are two routing options available based on fabric topology, as follows:

              • For fat-tree network topology, use updn.

              • For hypercube network topology, use dor.

              For more information, see “Network Topology”.

              For hypercube network topology, set the ROUTING_ENGINE variable in both configuration files to dor (dimension order routing), as follows:

              ROUTING_ENGINE="dor"

              For fat-tree network topology, set the ROUTING_ENGINE variable in both configuration files to updn (Up/Down unicast routing algorithm), as follows:

              ROUTING_ENGINE="updn"

            8. To initialize the ib0 fabric, start the opensmd-ib0 daemon, as follows:

              # /etc/init.d/opensmd-ib0 start

            9. To initialize the ib1 fabric, start the opensmd-ib1 daemon, as follows:

              # /etc/init.d/opensmd-ib1 start

            10. Use the ibnetdiscover command to verify the fabric, as follows:

              r1lead:/ # ibnetdiscover -l
              Switch   : 0x08006900000000dc ports 24 devid 0xb924 vendid 0x2c9 "MT47396 Infiniscale-III Mellanox Technologies"
              Switch   : 0x08006900000000a4 ports 24 devid 0xb924 vendid 0x2c9 "MT47396 Infiniscale-III Mellanox Technologies"
              Ca       : 0x0030487aa7940000 ports 1 devid 0x6274 vendid 0x2c9 " HCA-1"
              Ca       : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 " HCA-1"
              Ca       : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 "service0-ib0 HCA-1"
              Ca       : 0x0030487aa7840000 ports 1 devid 0x6274 vendid 0x2c9 " HCA-1"
              Ca       : 0x0030487aa79c0000 ports 1 devid 0x6274 vendid 0x2c9 " HCA-1"
              Ca       : 0x0030487aa7900000 ports 1 devid 0x6274 vendid 0x2c9 " HCA-1"
              Ca       : 0x0030487aa7980000 ports 1 devid 0x6274 vendid 0x2c9 " HCA-1"
              Ca       : 0x0008f104039881a8 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"


              Note: Get usage information on the ibnetdiscover command, as follows:
              r1lead:/ # ibnetdiscover --help
              Usage: ibnetdiscover [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list)
               -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms --switch-map switch-map] []
              --switch-map  specify a switch-map file



            11. Exit the rack leader controller (leader node) and return to the system admin controller (admin node). The InfiniBand fabric should now be configured and operational.
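
Steps 3, 5, and 6 above copy a Port GUID from the ibstat output into each configuration file by hand. The lookup part can be sketched as a filter, shown below. The function name port_guid is hypothetical, and the parsing assumes the ibstat layout shown in step 3 with a single CA; with more than one CA, the first matching port would be reported.

```shell
#!/bin/sh
# Hypothetical helper: read ibstat output on stdin and print the Port GUID of
# the requested port number.
#   $1 = port number (1 for opensm-ib0.conf, 2 for opensm-ib1.conf)
port_guid() {
    awk -v want="$1" '
        # "Port 1:" opens the section for port 1.
        $1 == "Port" && $2 == (want ":") { in_port = 1; next }
        # Any other "Port N:" header closes the section.
        $1 == "Port" && $2 != "GUID:"    { in_port = 0 }
        # "Port GUID: 0x..." inside the wanted section: print the value only.
        in_port && $1 == "Port" && $2 == "GUID:" { print $3; exit }
    '
}

# Example (run on the leader node):
#   ibstat | port_guid 1    # value for GUID= in opensm-ib0.conf
#   ibstat | port_guid 2    # value for GUID= in opensm-ib1.conf
```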

            Useful Utilities and Diagnostics

            The openib-diags package contains useful tools and diagnostic software for Open Fabrics Enterprise Distribution (OFED). This section describes some of these tools. These tools reside on the rack leader controller (leader node) in the /usr/bin directory, as follows:

            r1lead:~ # cd /usr/bin
            r1lead:/usr/bin # ls ib*
            ibaddr            ibcheckstate     ibdiscover.pl        ibnetdiscover     ib_rdma_bw   ibstatus        ...
            ibcheckerrors     ibcheckwidth     ibdmchk              ibnlparse         ib_rdma_lat  ibswitches      ...
            ibcheckerrs       ibclearcounters  ibdmsh               ibnodes           ib_read_bw   ibsysstat       ...
            ibchecknet        ibclearerrors    ibdmtr               ibping            ib_read_lat  ibtopodiff      ...
            ibchecknode       ib_clock_test    ibfindnodesusing.pl  ibportstate       ibroute      ibtracert       ...
            ibcheckport       ibdiagnet        ibhosts              ibprintca.pl      ib_send_bw   ibv_asyncwatch  ...
            ibcheckportstate  ibdiagpath       ibis                 ibprintswitch.pl  ib_send_lat  ibv_devices     ...
            ibcheckportwidth  ibdiagui         iblinkinfo.pl        ibqueryerrors.pl  ibstat       ibv_devinfo

            This section covers the following topics:

            ibstat and ibstatus Commands

            You can use the ibstat command to see the current status of the host channel adapters (HCAs) in your InfiniBand fabric, including the HCAs on rack leader controllers. The following view is prior to starting the fabric management:

            r1lead:/usr/bin # ibstat
            CA 'mthca0'
                    CA type: MT25208 (MT23108 compat mode)
                    Number of ports: 2
                    Firmware version: 4.7.600
                    Hardware version: a0
                    Node GUID: 0x0008f104039881a8
                    System image GUID: 0x0008f104039881ab
                    Port 1:
                            State: Initializing
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 0
                            LMC: 0
                            SM lid: 0
                            Capability mask: 0x02510a68
                            Port GUID: 0x0008f104039881a9
                    Port 2:
                            State: Initializing
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 0
                            LMC: 0
                            SM lid: 0
                            Capability mask: 0x02510a68
                            Port GUID: 0x0008f104039881aa

            The following shows output from the ibstat command after the fabric management software has been started:

            r1lead:/opt/sgi/sbin # ibstat
            CA 'mthca0'
                    CA type: MT25208 (MT23108 compat mode)
                    Number of ports: 2
                    Firmware version: 4.7.600
                    Hardware version: a0
                    Node GUID: 0x0008f104039881a8
                    System image GUID: 0x0008f104039881ab
                    Port 1:
                            State: Active
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 1
                            LMC: 0
                            SM lid: 1
                            Capability mask: 0x02510a6a
                            Port GUID: 0x0008f104039881a9
                    Port 2:
                            State: Active
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 1
                            LMC: 0
                            SM lid: 1
                            Capability mask: 0x02510a6a
                            Port GUID: 0x0008f104039881aa
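
The difference between the two views above is the port State field, which changes from Initializing to Active once the fabric management software is running. A minimal Python sketch of how you might check this automatically is shown here. The function names are hypothetical, and the sample text is an abbreviated copy of the output above in which port 2 has deliberately been left in the Initializing state for illustration.

```python
import re

# Abbreviated ibstat output. Port 2 is set to Initializing on purpose
# (hypothetical mixed state) so that the check has something to report.
IBSTAT_OUTPUT = """\
CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Port 1:
                State: Active
                Physical state: LinkUp
                Base lid: 1
                SM lid: 1
                Port GUID: 0x0008f104039881a9
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Base lid: 0
                SM lid: 0
                Port GUID: 0x0008f104039881aa
"""


def parse_ibstat(text):
    """Return {(ca_name, port_number): {field: value}} from ibstat text."""
    ports = {}
    ca = None
    port = None
    for line in text.splitlines():
        stripped = line.strip()
        ca_match = re.match(r"CA '(.+)'$", stripped)
        port_match = re.match(r"Port (\d+):$", stripped)
        if ca_match:
            # A new adapter starts; no port is selected yet.
            ca, port = ca_match.group(1), None
        elif port_match:
            port = int(port_match.group(1))
            ports[(ca, port)] = {}
        elif port is not None and ":" in stripped:
            # Per-port "key: value" line.
            key, value = stripped.split(":", 1)
            ports[(ca, port)][key.strip()] = value.strip()
    return ports


def ports_not_active(text):
    """List (ca, port, state) for every port whose State is not Active."""
    return [
        (ca, port, fields.get("State"))
        for (ca, port), fields in sorted(parse_ibstat(text).items())
        if fields.get("State") != "Active"
    ]


if __name__ == "__main__":
    for ca, port, state in ports_not_active(IBSTAT_OUTPUT):
        print(f"{ca} port {port}: {state}")
```

In practice you would pass the captured output of ibstat to ports_not_active() instead of the embedded sample.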

            You can use the ibstatus command (less verbose than ibstat) to show the link rate, as follows:

            r1lead:/opt/sgi/sbin # ibstatus
            Infiniband device 'mthca0' port 1 status:
                    default gid:     fe80:0000:0000:0000:0008:f104:0398:81a9
                    base lid:        0x1
                    sm lid:          0x1
                    state:           4: ACTIVE
                    phys state:      5: LinkUp
                    rate:            20 Gb/sec (4X DDR)
            
            Infiniband device 'mthca0' port 2 status:
                    default gid:     fe80:0000:0000:0000:0008:f104:0398:81aa
                    base lid:        0x1
                    sm lid:          0x1
                    state:           4: ACTIVE
                    phys state:      5: LinkUp
                    rate:            20 Gb/sec (4X DDR)


            Note: If the link rate is not 20 Gb/sec (4X DDR), there is a physical link problem with your system.
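
The rate check described in the note can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the expected rate string is taken from the sample ibstatus output above.

```python
import re

# Sample ibstatus output copied from the example above.
IBSTATUS_OUTPUT = """\
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0008:f104:0398:81a9
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)

Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0008:f104:0398:81aa
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
"""

# Rate string that the note above treats as healthy.
EXPECTED_RATE = "20 Gb/sec (4X DDR)"


def link_rates(text):
    """Return {(device, port): rate string} from ibstatus text."""
    rates = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"Infiniband device '(.+)' port (\d+) status:", stripped)
        if header:
            current = (header.group(1), int(header.group(2)))
        elif current and stripped.startswith("rate:"):
            rates[current] = stripped.split(":", 1)[1].strip()
    return rates


def degraded_links(text, expected=EXPECTED_RATE):
    """Return only the ports whose rate differs from the expected rate."""
    return {port: rate for port, rate in link_rates(text).items() if rate != expected}


if __name__ == "__main__":
    print(degraded_links(IBSTATUS_OUTPUT) or "all links at expected rate")
```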


            perfquery Command

            The perfquery command is useful for finding errors on one or more HCA and switch ports. You can also use perfquery to reset HCA and switch port counters.

            To see a usage statement for the perfquery command, perform the following:

            r1lead:/opt/sgi/sbin # perfquery --help
            Usage: perfquery [-d(ebug) -G(uid) -a(ll_ports) -r(eset_after_read) -C ca_name -P ca_port -R(eset_only)
             -t(imeout) timeout_ms -V(ersion) -h(elp)] [<lid|guid> [[port] [reset_mask]]]
                    Examples:
                            perfquery               # read local port's performance counters
                            perfquery 32 1          # read performance counters from lid 32, port 1
                            perfquery -e 32 1       # read extended performance counters from lid 32, port 1
                            perfquery -a 32         # read performance counters from lid 32, all ports
                            perfquery -r 32 1       # read performance counters and reset
                            perfquery -e -r 32 1    # read extended performance counters and reset
                            perfquery -R 0x20 1     # reset performance counters of port 1 only
                            perfquery -e -R 0x20 1  # reset extended performance counters of port 1 only
                            perfquery -R -a 32      # reset performance counters of all ports
                            perfquery -R 32 2 0x0fff        # reset only error counters of port 2
                            perfquery -R 32 2 0xf000        # reset only non-error counters of port 2
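
The last two examples in the usage statement pass a reset_mask. The values 0x0fff (error counters) and 0xf000 (non-error counters) are consistent with a 16-bit mask that has one bit per counter, in the order in which perfquery prints the counters. The following Python sketch builds such masks under that assumption. The bit ordering is inferred from the two example values and is not stated in the usage text, so verify it before relying on a custom mask. The function names are hypothetical.

```python
# Counter names in the order perfquery prints them.
# ASSUMPTION: bit i of the reset mask selects the i-th counter in this list.
COUNTERS = [
    "SymbolErrors", "LinkRecovers", "LinkDowned", "RcvErrors",
    "RcvRemotePhysErrors", "RcvSwRelayErrors", "XmtDiscards",
    "XmtConstraintErrors", "RcvConstraintErrors", "LinkIntegrityErrors",
    "ExcBufOverrunErrors", "VL15Dropped",
    "XmtData", "RcvData", "XmtPkts", "RcvPkts",
]


def reset_mask(names):
    """Build a reset mask that selects the named counters."""
    mask = 0
    for name in names:
        mask |= 1 << COUNTERS.index(name)
    return mask


def counters_in_mask(mask):
    """List the counters that a given reset mask would select."""
    return [name for bit, name in enumerate(COUNTERS) if mask & (1 << bit)]


if __name__ == "__main__":
    print(hex(reset_mask(COUNTERS[:12])))   # the twelve error counters
    print(counters_in_mask(0xF000))         # the four data and packet counters
```

Under this assumption, the first twelve counters produce the mask 0x0fff and the last four produce 0xf000, which matches the two examples in the usage statement.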

            Sample output from the perfquery command is as follows:
            r1lead:/opt/sgi/sbin # perfquery
            # Port counters: Lid 1 port 1
            PortSelect:......................1
            CounterSelect:...................0x0000
            SymbolErrors:....................0
            LinkRecovers:....................0
            LinkDowned:......................0
            RcvErrors:.......................0
            RcvRemotePhysErrors:.............0
            RcvSwRelayErrors:................0
            XmtDiscards:.....................0
            XmtConstraintErrors:.............0
            RcvConstraintErrors:.............0
            LinkIntegrityErrors:.............0
            ExcBufOverrunErrors:.............0
            VL15Dropped:.....................0
            XmtData:.........................0
            RcvData:.........................0
            XmtPkts:.........................0
            RcvPkts:.........................0
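
Because perfquery prints one counter per line, the output is easy to scan for nonzero error counters. The following Python sketch shows one way to do this. The function names are hypothetical, and the set of counters treated as non-error counters is an assumption based on the 0x0fff and 0xf000 reset masks in the usage statement.

```python
import re

# Sample perfquery output copied from the example above.
PERFQUERY_OUTPUT = """\
# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrors:....................0
LinkRecovers:....................0
LinkDowned:......................0
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................0
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................0
RcvData:.........................0
XmtPkts:.........................0
RcvPkts:.........................0
"""

# ASSUMPTION: these fields are selectors or traffic counters, not errors.
NON_ERROR = {"PortSelect", "CounterSelect", "XmtData", "RcvData", "XmtPkts", "RcvPkts"}


def parse_perfquery(text):
    """Return {counter name: integer value} from perfquery text."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\.+(\S+)$", line.strip())
        if match:
            # Base 0 accepts both decimal ("1") and hexadecimal ("0x0000").
            counters[match.group(1)] = int(match.group(2), 0)
    return counters


def nonzero_errors(text):
    """Return only the error counters that are not zero."""
    return {
        name: value
        for name, value in parse_perfquery(text).items()
        if name not in NON_ERROR and value != 0
    }


if __name__ == "__main__":
    print(nonzero_errors(PERFQUERY_OUTPUT) or "no error counters set")
```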

            ibnetdiscover Command

            The ibnetdiscover command allows you to discover the InfiniBand (IB) fabric.

            To see a usage statement for the ibnetdiscover command, perform the following:

            r1lead:/opt/sgi/sbin # ibnetdiscover --help
            Usage: ibnetdiscover [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) 
            -g(rouping) -H(ca_list) -S(witch_list) 
            -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms 
            --switch-map switch-map] [<topology-file>]
            --switch-map <switch-map>  specify a switch-map file


            Note: Only abbreviated output is shown in this example.


            Sample output from the ibnetdiscover command is as follows:
            r1lead:/opt/sgi/sbin # ibnetdiscover
            #
            # Topology file: generated on Tue Jul 17 14:05:20 2007
            #
            # Max of 3 hops discovered
            # Initiated from node 0008f104039881a8 port 0008f104039881a9
            
            vendid=0x2c9
            devid=0xb924
            sysimgguid=0x8006900000000dd
            
            ...
            
            Switch   : 0x08006900000000dc ports 24 devid 0xb924 vendid 0x2c9 
            "MT47396 Infiniscale-III Mellanox Technologies"
            Switch   : 0x08006900000000a4 ports 24 devid 0xb924 vendid 0x2c9 
            "MT47396 Infiniscale-III Mellanox Technologies"
            
            To list only the HCAs, use the -H option:

            r1lead:/opt/sgi/sbin # ibnetdiscover -H
            Ca       : 0x0030487aa7940000 ports 1 devid 0x6274 vendid 0x2c9 "MT25204 InfiniHostLx Mellanox Technologies"
            Ca       : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n8-ib0 HCA-1"
            Ca       : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
            Ca       : 0x0030487aa7840000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n1-ib0 HCA-1"
            Ca       : 0x0030487aa79c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n0-ib0 HCA-1"
            Ca       : 0x0030487aa7900000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n8-ib0 HCA-1"
            Ca       : 0x0030487aa7980000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n1-ib0 HCA-1"
            Ca       : 0x0008f104039881a8 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
            
            ======================================================================================================
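
The HCA list has a regular, one-node-per-line layout, so it can be turned into records for further processing. The following Python sketch parses lines in the form shown above. The function name and the record keys are hypothetical. The sketch handles only entries that fit on one line, so Switch entries whose description wraps onto a second line, as in the earlier listing, are skipped.

```python
import re

# Three lines copied from the ibnetdiscover -H output above.
HCA_LIST = """\
Ca       : 0x0030487aa7940000 ports 1 devid 0x6274 vendid 0x2c9 "MT25204 InfiniHostLx Mellanox Technologies"
Ca       : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n8-ib0 HCA-1"
Ca       : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
"""

NODE_LINE = re.compile(
    r'^(Ca|Switch)\s*:\s*(0x[0-9a-fA-F]+) ports (\d+) '
    r'devid (0x[0-9a-fA-F]+) vendid (0x[0-9a-fA-F]+) "(.*)"$'
)


def parse_nodes(text):
    """Return one dict per single-line Ca or Switch entry."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if not match:
            continue  # wrapped or unrelated line
        kind, guid, ports, devid, vendid, description = match.groups()
        nodes.append({
            "type": kind,
            "guid": guid,
            "ports": int(ports),
            "devid": devid,
            "vendid": vendid,
            "description": description.strip(),
        })
    return nodes


if __name__ == "__main__":
    for node in parse_nodes(HCA_LIST):
        print(node["guid"], node["ports"], node["description"])
```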

            ibdiagnet Command

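            The ibdiagnet command is a diagnostic tool that scans the fabric using directed route packets, reports on its connectivity and devices, and identifies suspected bad links.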

            To see a usage statement for the ibdiagnet command, perform the following:

            r1lead:/opt/sgi/sbin # ibdiagnet --help
            Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
            NAME
              ibdiagnet
            SYNOPSYS
              ibdiagnet [-c ] [-v] [-r] [-o ]
                 [-t ] [-s ] [-i ] [-p ]
                 [-pm] [-pc] [-P <>]
                 [-lw <1x|4x|12x>] [-ls <2.5|5|10>]
                
            
            DESCRIPTION
              ibdiagnet scans the fabric using directed route packets and extracts all the 
              available information regarding its connectivity and devices.
              It then produces the following files in the output directory defined by the
              -o option (see below): 
                ibdiagnet.lst    - List of all the nodes, ports and links in the fabric
                ibdiagnet.fdbs   - A dump of the unicast forwarding tables of the fabric
                                   switches
                ibdiagnet.mcfdbs - A dump of the multicast forwarding tables of the fabric
                                   switches
                ibdiagnet.masks  - In case of duplicate port/node Guids, these file include
                                   the map between masked Guid and real Guids 
                ibdiagnet.sm     - A dump of all the SM (state and priority) in the fabric
                ibdiagnet.pm     - In case -pm option was provided, this file contain a dump
                                   of all the nodes PM counters
              In addition to generating the files above, the discovery phase also checks for
              duplicate node/port GUIDs in the IB fabric. If such an error is detected, it 
              is displayed on the standard output.
              After the discovery phase is completed, directed route packets are sent
              multiple times (according to the -c option) to detect possible problematic 
              paths on which packets may be lost. Such paths are explored, and a report of
              the suspected bad links is displayed on the standard output.
              After scanning the fabric, if the -r option is provided, a full report of the
              fabric qualities is displayed.
              This report includes: 
                SM report
                Number of nodes and systems
                Hop-count information: 
                     maximal hop-count, an example path, and a hop-count histogram
                All CA-to-CA paths traced 
                Credit loop report
                mgid-mlid-HCAs matching table
              Note: In case the IB fabric includes only one CA, then CA-to-CA paths are not
              reported.
              Furthermore, if a topology file is provided, ibdiagnet uses the names defined
              in it for the output reports.
                  
            OPTIONS
              -c                      : The minimal number of packets to be sent
                                               across each link (default = 10)
              -v                             : Instructs the tool to run in verbose mode
              -r                             : Provides a report of the fabric qualities
              -o                    : Specifies the directory where the output
                                               files will be placed (default = /tmp)
              -t                  : Specifies the topology file name
              -s                   : Specifies the local system name. Meaningful
                                               only if a topology file is specified
              -i                  : Specifies the index of the device of the port
                                               used to connect to the IB fabric (in case of
                                               multiple devices on the local system)
              -p                   : Specifies the local device's port number used
                                               to connect to the IB fabric
              -pm                            : Dumps all pmCounters values into ibdiagnet.pm
              -pc                            : reset all the fabric links pmCounters
              -P <>: If any of the provided pm is greater then its
                                               provided value, print it to screen
              -lw <1x|4x|12x>                : Specifies the expected link width
              -ls <2.5|5|10>                 : Specifies the expected link speed
                                                 
              -h|--help                      : Prints this help information
              -V|--version                   : Prints the version of the tool
                 --vars                      : Prints the tool's environment variables and
                                               their values
            
            ERROR CODES
              1 - Failed to fully discover the fabric
              2 - Failed to parse command line options
              3 - Failed to interact with IB fabric
              4 - Failed to use local device or local port
              5 - Failed to use Topology File
              6 - Failed to load required Package
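
If you call ibdiagnet from a script, the error codes listed above can be translated back into their descriptions. The following Python sketch assumes that these codes are returned as the exit status of the process and that 0 means success. Neither assumption is stated in the help text. The function names are hypothetical.

```python
import subprocess

# Error codes and descriptions copied from the ibdiagnet help text.
IBDIAGNET_ERRORS = {
    1: "Failed to fully discover the fabric",
    2: "Failed to parse command line options",
    3: "Failed to interact with IB fabric",
    4: "Failed to use local device or local port",
    5: "Failed to use Topology File",
    6: "Failed to load required Package",
}


def describe_exit(code):
    """Translate an ibdiagnet exit status into a readable message."""
    if code == 0:
        # ASSUMPTION: exit status 0 indicates success.
        return "success"
    return IBDIAGNET_ERRORS.get(code, f"unknown exit code {code}")


def run_ibdiagnet(args=()):
    """Run ibdiagnet with the given arguments and describe its result.

    ASSUMPTION: ibdiagnet is on the PATH and reports the error codes
    above through its process exit status.
    """
    result = subprocess.run(["ibdiagnet", *args])
    return describe_exit(result.returncode)
```

For example, run_ibdiagnet(["-c", "5000"]) would run the tool with a packet count of 5000 and return the description that matches its exit status.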
            

            Output that shows no errors indicates that the fabric is operating correctly:

            r1lead:/opt/sgi/sbin # ibdiagnet
            Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
            Loading IBDM from: /usr/lib64/ibdm1.2
            -W- Topology file is not specified.
                Reports regarding cluster links will use direct routes.
            -W- A few ports of local device are up.
                Since port-num was not specified (-p option), port 1 of device 1 will be
                used as the local port.
            -I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
            
            
            -I---------------------------------------------------
            -I- Bad Guids Info
            -I---------------------------------------------------
            -I- No bad Guids were found
            
            -I---------------------------------------------------
            -I- Links With Logical State = INIT
            -I---------------------------------------------------
            -I- No bad Links (with logical state = INIT) were found
            
            -I---------------------------------------------------
            -I- PM Counters Info
            -I---------------------------------------------------
            -I- No illegal PM counters values were found
            
            -I---------------------------------------------------
            -I- Bad Links Info
            -I---------------------------------------------------
            -I- No bad link were found
             
            -I- Done. Run time was 0 seconds.
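
The report above is divided into titled sections, each followed by its findings. The following Python sketch splits the report into sections and extracts the discovery counts. The function names are hypothetical. Treating any finding that does not begin with "-I- No " as needing attention is an assumption based only on the clean output shown here, not on documented ibdiagnet behavior.

```python
import re

# Abbreviated copy of the ibdiagnet output above.
IBDIAGNET_OUTPUT = """\
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
-I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.

-I---------------------------------------------------
-I- Bad Guids Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Bad Links Info
-I---------------------------------------------------
-I- No bad link were found

-I- Done. Run time was 0 seconds.
"""

RULE = re.compile(r"^-I-{3,}$")  # the dashed line that frames a section title


def discovered(text):
    """Return (nodes, switches, CAs) from the discovery line, or None."""
    match = re.search(r"(\d+) nodes \((\d+) Switches & (\d+) CA-s\) discovered", text)
    return tuple(int(group) for group in match.groups()) if match else None


def section_findings(text):
    """Return {section title: [finding lines]} in report order."""
    sections = {}
    lines = [line.strip() for line in text.splitlines()]
    title = None
    i = 0
    while i < len(lines):
        line = lines[i]
        # A title is the line between two dashed rule lines.
        if RULE.match(line) and i + 2 < len(lines) and RULE.match(lines[i + 2]):
            title = lines[i + 1][len("-I- "):]
            sections[title] = []
            i += 3
            continue
        if line.startswith("-I- Done."):
            title = None
        elif title is not None and line:
            sections[title].append(line)
        i += 1
    return sections


def sections_needing_attention(text):
    """ASSUMPTION: a clean finding always starts with '-I- No '."""
    return [
        title
        for title, findings in section_findings(text).items()
        if any(not finding.startswith("-I- No ") for finding in findings)
    ]


if __name__ == "__main__":
    print(discovered(IBDIAGNET_OUTPUT))
    print(sections_needing_attention(IBDIAGNET_OUTPUT) or "no sections flagged")
```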
            

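            You can use ibdiagnet to load the fabric in order to test it. The -c option sets the minimum number of packets sent across each link (the default is 10), so a large value such as 5000 places more traffic on every link, as follows: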

            r1lead:/opt/sgi/sbin # ibdiagnet -c 5000
            Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
            Loading IBDM from: /usr/lib64/ibdm1.2
            -W- Topology file is not specified.
                Reports regarding cluster links will use direct routes.
            -W- A few ports of local device are up.
                Since port-num was not specified (-p option), port 1 of device 1 will be
                used as the local port.
            -I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
            
            
            -I---------------------------------------------------
            -I- Bad Guids Info
            -I---------------------------------------------------
            -I- No bad Guids were found
            
            -I---------------------------------------------------
            -I- Links With Logical State = INIT
            -I---------------------------------------------------
            -I- No bad Links (with logical state = INIT) were found
            
            -I---------------------------------------------------
            -I- PM Counters Info
            -I---------------------------------------------------
            -I- No illegal PM counters values were found
            
            -I---------------------------------------------------
            -I- Bad Links Info
            -I---------------------------------------------------
            -I- No bad link were found
             
            -I- Done. Run time was 8 seconds.