This chapter describes system monitoring and covers the following topics:
This section describes some common maintenance procedures, as follows:
This section describes how to temporarily take a node offline for maintenance.
To temporarily take a node offline for maintenance, perform the following steps:
Disable the node in the batch scheduler (depends on your batch scheduler).
Power off the node, as follows:
# cpower --down r1i0n0 |
Mark the node offline, as follows:
# cadmin --set-admin-status --node r1i0n0 offline |
Perform any required maintenance on the blade.
Mark the node online, as follows:
# cadmin --set-admin-status --node r1i0n0 online |
Power up the node, as follows:
# cpower --boot r1i0n0 |
Enable the node in the batch scheduler (depends on your batch scheduler).
| Note: See your SGI field support person for the physical removal and replacement of SGI Altix ICE compute nodes (blades). |
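The two halves of this procedure can be wrapped in small shell functions, as in the following sketch. The function names node_offline and node_online and the RUN variable are hypothetical and are not part of SGI Tempo; only the cpower and cadmin invocations are taken from the steps above. By default, the functions only print the commands. The batch scheduler steps are left out because they depend on your batch scheduler.

```shell
#!/bin/sh
# Sketch only: node_offline, node_online, and RUN are hypothetical helpers.
# RUN defaults to "echo", so the commands are printed rather than executed.
RUN=${RUN-echo}

# Power off a node, then mark it offline.
node_offline() {
    node=$1
    $RUN cpower --down "$node" &&
    $RUN cadmin --set-admin-status --node "$node" offline
}

# Mark a node online, then power it up.
node_online() {
    node=$1
    $RUN cadmin --set-admin-status --node "$node" online &&
    $RUN cpower --boot "$node"
}

node_offline r1i0n0
node_online r1i0n0
```

To execute the commands instead of printing them, set RUN to an empty string before calling the functions as root on the admin node.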
This section describes how to permanently replace a failed blade.
To permanently replace a failed blade (compute node), perform the following steps:
Disable the node in the batch scheduler (depends on your batch scheduler).
Power off the node, as follows:
# cpower --down r1i0n0 |
Mark the node offline, as follows:
# cadmin --set-admin-status --node r1i0n0 offline |
Physically remove and replace the failed blade.
It is not necessary to run discover-rack when a blade is replaced. This is handled by the blademond daemon. See “Discovering Compute Nodes” in Chapter 2 for more information.
Set the node to boot your desired compute image (see cimage --list-images and “cimage Command” in Chapter 3 for your options), as follows:
# cimage --set mycomputeimage mykernel r1i0n0 |
Power up the node, as follows:
# cpower --boot r1i0n0 |
Enable the node in the batch scheduler (depends on your batch scheduler).
This section describes how to permanently remove a blade from your Altix ICE system.
To permanently remove a blade from your system, perform the following steps:
Disable the node in the batch scheduler (depends on your batch scheduler).
Power off the node, as follows:
# cpower --down r1i0n0 |
Mark the node offline, as follows:
# cadmin --set-admin-status --node r1i0n0 offline |
Physically remove the blade.
It is not necessary to run discover-rack when a blade is removed. This is handled by the blademond daemon. See “Discovering Compute Nodes” in Chapter 2 for more information.
This section describes how to add a new blade to an Altix ICE system.
To add a new blade to your system, perform the following steps:
Physically insert the new blade.
It is not necessary to run discover-rack when a blade is added. This is handled by the blademond daemon. See “Discovering Compute Nodes” in Chapter 2 for more information.
Set the node to boot your desired compute image (see cimage --list-images and “cimage Command” in Chapter 3 for your options), as follows:
# cimage --set mycomputeimage mykernel r1i0n0 |
Power up the node, as follows:
# cpower --boot r1i0n0 |
Enable the node in the batch scheduler (depends on your batch scheduler).
This section describes how to install and configure a spare admin, leader, or managed service node. The cold spare can be a shelf spare or a factory-installed cold spare that ships with your system. For more information on cold spare requirements and the tools needed for this procedure, see “Cold Spare Admin or Leader Node Availability”.
It covers the following topics:
| Note: When ordering shelf spare systems from SGI, it is important to order spare nodes appropriate to or in conjunction with your SGI Altix ICE system. This is because the Altix ICE serial number is programmed into the admin node itself. If you try to migrate the admin node to a shelf spare system that does not have the correct Altix ICE system serial number programmed into it, parts of Tempo software may not work correctly. In particular, the Embedded Support Partner (ESP) software will fail to start if the system serial number does not match the number that was previously in use. |
Depending on the system ordered, your SGI Altix ICE system should be mounted in an SGI rack or racks. The system admin controller (admin node) and rack leader controller (leader node) are generally installed within (or in some cases on top of) the system rack. For an example, see Figure 1-1. The replacement of a failed admin node or leader node is accomplished in four basic steps:
Identify the failed unit and disconnect system and power cables.
Transfer the disk drives from the failed server into the cold spare unit.
Connect the applicable cables to the cold spare server.
Power up the new server and restart the ICE system.
For detailed procedures on installing a cold spare, see the sections “Identify the Failed Unit and Unplug all Cables”, “Transfer Disks from Existing Server to the Cold Spare”, “Migrating to a Cold Spare: Importing the Disk Volumes”, and “Migrating to a Cold Spare: Booting for the First Time on the Migrated Node”.
| Note: If you are using multiple root slots (making use of cascading dual-boot as described in “Cascading Dual-Boot” in Chapter 2) the procedures described in this section will have to be repeated for each slot. |
A cold spare node is like an existing admin or leader node, but it sits on a shelf or is a factory preinstalled node to be used in an emergency.
If the admin or leader node should fail, the cold spare can be swapped into position to take over the duties of the failed node.
If you wish to make use of cold spare nodes, SGI suggests that you have both an admin node and a leader node on the shelf as available spares. Some of the reasons to have two separate nodes instead of one are as follows (this is not an exhaustive list):
The BIOS settings of an admin node and a leader node are different. For example, an admin node does not PXE boot by default. However, a leader node must PXE boot on every boot. This means the boot order is different for each type.
The BMC of a leader node is set up to use DHCP by default. An admin node may not be set up this way.
Given the examples cited above, if you try to use a shelf-spare admin node as a leader, the leader will not be properly discovered.
Currently, the hardware replacement procedure described in this section only supports Altix ice-csn nodes, that is, admin controller and rack leader controller nodes and managed service nodes.
You will need a Video Graphics Array (VGA) screen and a keyboard to perform this procedure. This is because you need to interact with the LSI BIOS tool to import the root volumes. You cannot do this from an Intelligent Platform Management Interface (IPMI) serial console session because of the following:
For leader nodes, the cluster does not know the MAC addresses of the replacement BMC so there is no way for the cluster to connect to it until the migration script is run.
The LSI BIOS tool requires the use of Alt characters which often do not transfer through the serial console properly.
If you have already identified the failed admin node or leader node, proceed with disconnecting the cables from the failed unit. The front panel lights on the server can indicate whether the unit has failed and why; see Figure 5-1.
The universal information LED (left side of the panel) shows two types of failure that can bring the server down. This multi-color LED blinks red quickly to indicate a fan failure and blinks red slowly for a power failure. A continuous solid red LED indicates a CPU is overheating.
If the unit's power supply has failed or been disconnected, the power LED (far right) will be dark. Check both ends of the power cable for a firm connection prior to switching over to the cold spare.
If you find that an admin node or leader node has failed and you need to replace it with a cold spare system, this section describes what to do in terms of the physical hardware.
Admin nodes are the only node type that store the system-wide serial number. Therefore, if you use a shelf spare leader node as an admin node, ESP will fail to start properly due to the system serial number mismatch and much of the logging and monitoring infrastructure will fail to function. The admin node shelf spares must be ordered from the factory as an admin node shelf spare so that the proper serial number can be stored within.
To replace an admin node or leader node that has failed, perform the following steps:
Power down the failed node (if possible).
Disconnect both power cables; see Figure 5-2 for server connection locations.
Remove the two system disks from the failed node and set them aside for later reinstallation.
Unplug the Ethernet cable used for system management. Be sure to note the plug number and label the cables to avoid confusing them; it is important that they stay in the same jacks in the new node. See the example drawing in Figure 1-4 on page 6. This connection is vital to proper system management and communication. The Ethernet cable must be connected to the same plug on the cold spare unit.
If the unit has a system console attached, remove the keyboard, mouse, and video cables.
Remove the system from the rack.
Install the shelf spare system into the rack.
Install the system disks you set aside in step 3 (from the system you are replacing).
Connect the Ethernet cables in the same way they were connected to the replaced node.
Connect AC power.
Connect a keyboard and VGA monitor (and mouse if you like).
Do NOT power up the system just yet. Proceed to “Migrating to a Cold Spare: Importing the Disk Volumes”.
| Note: The factory-installed cold spare does NOT ship with disks so you need to transfer existing disks and PCI cards from the existing server to the cold spare before mounting the spare rack. |
This section describes how to import the disk volumes into the new node installed in “Identify the Failed Unit and Unplug all Cables”.
| Note: This section does not apply to SGI Altix XE250 systems with MegaRAID SAS/SATA storage hardware. |
To import the disk volumes into the new node, perform the following steps:
At this time, you can power up the system using the power button.
Watch the VGA screen output.
When you see the LSI BIOS tool come up, press Ctrl-C. This instructs the LSI BIOS tool to enter the configuration utility.
A screen appears listing the LSI controllers in the system. Normally, there is just one. Hit the Enter key to proceed.
Choose RAID Properties.
It is important to note that the controller supports only two RAIDs at a time. Therefore, if the system had two volumes at a time in the past, one or more volumes may appear empty now. It is important to use the utility to delete these empty volumes representing disks that are no longer installed before proceeding. Otherwise, if the tool sees more than one volume, activating volumes will not work.
Press Alt-N to browse the list of volumes. Delete the empty ones as described in the previous step. Eventually, you will encounter an inactive volume. This inactive volume represents the disks you migrated from the failed node to this node.
With the inactive volume selected, choose Manage Array.
Choose Activate and answer y to the activate and exit this menu choice.
At this point, especially if the node has more than one volume, it is important to select the migrated system disk volume as the boot volume. To select the boot volume, choose SAS Topology.
In SAS Topology, you can expand a volume to see the disks within it by pressing Enter on the volume.
Choose the volume that represents your newly imported volume. Highlight it, then enter Alt-B.
You should see that the volume now has a Boot flag associated with it.
| Note: If, after you exit the tool, the system does not appear to boot from the disk, you may have selected the wrong volume from which to boot. In that case, reset, re-enter the LSI BIOS tool, and choose a different volume to be the boot volume. |
Escape out of the LSI tool and exit.
Keep watching the VGA screen! You will have to hit a key at the correct moment in the next section. Go to “Migrating to a Cold Spare: Booting for the First Time on the Migrated Node”.
This section provides details on booting the system for the first time on the replacement node. These instructions include some special steps with the GRUB boot loader to ensure, for this boot only, that the console output goes to the VGA screen. This is important because on leader nodes, there is no way to connect to the BMC with IPMI at this moment to use the IPMI serial console. The console command will not work for the leader node until the system is configured as described in this section. The network will not be properly configured until the end of this procedure either.
To boot for the first time on a migrated node, perform the following steps:
Ensure that the VGA console is powered on.
At this moment, the node is in the process of resetting because you exited the LSI BIOS tool at the end of the procedure, above (see “Migrating to a Cold Spare: Importing the Disk Volumes”).
| Note: After rebooting, drive 1 will resync with drive 0, automatically. Drive 1 will have the RED LED on during this time. This process takes from eight to 48 hours depending on the drive size. During that period, the RAID redundancy is not available but the system will function normally. |
| Note: If it is not booting from disk, the wrong volume may be selected as the boot disk in the LSI BIOS tool. See “Migrating to a Cold Spare: Importing the Disk Volumes”. |
Arrow down once so that the line starting kernel is highlighted.
Enter e to edit the kernel parameters.
Now you need to add console=tty0 as the final parameter in the list. This ensures that console output goes to the VGA screen for this boot. Enter the space character followed by console=tty0. The line should look similar to the following after adding the console parameter (characters wrapped in the front):
<hkernel=128M@16M rootflags=prjquota,logbsize=256k console=tty0 |
Press the Enter key.
Enter b to boot the system.
The system will now boot with console output going to the VGA screen.
Networking will fail to start and some error messages will appear.
It is normal to see that the Ethernet devices were renumbered. This will be fixed below.
Eventually the login prompt will appear.
Log in as root.
Run the following script, which fixes the network settings and updates the SGI Tempo database for the new network interfaces:
# migrate-to-shelf-spare-node |
| Note: If you have Ethernet cards installed, in addition to the ones that come with the system itself, the script could possibly guess the integrated Ethernet devices incorrectly. This may mean you have to manually configure networking including the ifcfg-eth-id-* files in /etc/sysconfig/network and the /etc/udev/rules.d/30-net_persistent_names.rules file (to number them how you want and ensure integrated Ethernet is eth0 and eth1). |
At this time, networking should be operational.
Reboot the node and let it boot normally.
You can use the SGI Tempo inventory verification tool to query, take snapshots, analyze and compare the node and network inventory of a cluster. Various hardware, network and operating system configuration properties are available and are presented in user-specified formats.
| Note: If you are reinstalling the system admin controller (admin node), you may want to make a backup of the cluster configuration snapshot that comes with your system so that you can recover it later. You can find it in the /opt/sgi/var/ivt directory on the admin node; it is the earliest snapshot taken. You can use this information with the interconnect verification tool (IVT) to verify that the current system shows the same hardware configuration as when it was shipped. For more information, see “Installing Software on the System Admin Controller” in Chapter 2. |
To make an inventory snapshot of an Altix ICE system, use the following command from the system admin controller (admin node).
admin:~ # ivt -M
Making a cluster inventory snapshot. Takes a couple of minutes... |
Each snapshot is assigned a unique number and marked with the date and time it was taken. Use the ivt -L command to list active snapshot information, as follows:
admin:~ # ivt -L
1 2007-07-13.11:42:47 |
You can query (-Q option), compare (-C option), and analyze (-S option) existing snapshots. A variety of system hardware and configuration properties can be displayed. You can compare two snapshots to see what has changed, or analyze a system snapshot to find failed nodes or to see network fabric links.
You can use the ivt -S command to show general information about your system (note that only a portion of the output of this command is shown below), as follows:
admin:~ # ivt -S
Your system has 6 compute blades.
All 6 blades have the following characteristics:
bios_date: 05/29/2007
cpu_core_count: 8
cpu_model: Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
kernel: 2.6.16.46-0.12-smp
memsize: 2059264
os_product: SLES
os_vendor: SUSE
os_version: 10.1
The following characteristics have different values for some blades.
ib0_phys_state (State of InfiniBand ib0 physical link):
4 blades have ib0_phys_state == LinkUp (r1i0n0, r1i1n0, r1i0n8, ...)
2 blades have ib0_phys_state == unknown (r1i0n1, r1i1n1)
Query the value for all blades with:
ivt -Q -w blades -f 'blade $blade has ib0_phys_state $ib0_phys_state'
ib0_rate (Rate of InfiniBand ib0 link - Gb/sec):
2 blades have ib0_rate == unknown (r1i0n1, r1i1n1)
4 blades have ib0_rate == 20 (r1i0n0, r1i1n0, r1i0n8, ...)
Query the value for all blades with:
ivt -Q -w blades -f 'blade $blade has ib0_rate $ib0_rate'
...
ib_bios_rev (Revision of InfiniBand BIOS on blade):
2 blades have ib_bios_rev == unknown (r1i0n1, r1i1n1)
4 blades have ib_bios_rev == 1.2.0 (r1i0n0, r1i1n0, r1i0n8, ...)
Query the value for all blades with:
ivt -Q -w blades -f 'blade $blade has ib_bios_rev $ib_bios_rev'
image (image provisioned on blade):
5 blades have image == compute-sles10sp1 (r1i0n1, r1i1n1, r1i1n0, ...)
1 blades have image == erikj-blade-mksiimage (r1i0n0)
Query the value for all blades with:
ivt -Q -w blades -f 'blade $blade has image $image'
rack_blade_count (number of booted blades in this blades rack):
2 blades have rack_blade_count == 5 (r1i0n1, r1i1n1)
4 blades have rack_blade_count == 4 (r1i0n0, r1i1n0, r1i0n8, ...)
Query the value for all blades with:
ivt -Q -w blades -f 'blade $blade has rack_blade_count $rack_blade_count'
InfiniBand GUID check:
Do fabric (ibnetdiscover) and blades (ib stat) have same GUIDs?
ib0 plane: unmatched GUIDs
GUIDs seen on blade ports, missing on fabric: unknown 0030487aa7940000
GUIDs see on fabric, missing on blade ports: 0030487aa7840000 0030487aa7980000
ib1 plane: unmatched GUIDs
GUIDs seen on blade ports, missing on fabric: unknown 0030487aa7950000
GUIDs see on fabric, missing on blade ports: 0030487aa7850000 0030487aa7990000
InfiniBand Link state check:
Are any IB ports not ACTIVE, not 20 Gb/sec rate or not Up?
...
|
You can use the ivt -c cpu command to show an inventory of the system compute blades and the number of CPUs each blade contains, as follows:
admin:~ # ivt -c cpu
r1i0n0 has 8 CPUs
r1i0n1 has 8 CPUs
r1i0n8 has 8 CPUs
r1i1n0 has 8 CPUs
r1i1n1 has 8 CPUs
r1i1n8 has 8 CPUs |
You can use the ivt tool to determine which compute nodes (blades) are up or down, as follows:
admin:~ # ivt -Q -w blades -f '$blade $sshstate'
r1i0n0 up
r1i0n1 down
r1i0n8 up
r1i1n0 up
r1i1n1 down
r1i1n8 up |
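The up/down query output is easy to filter with standard tools. The following sketch assumes that the output consists of one "blade state" pair per line; list_down_blades is a hypothetical helper name, and the sample data is copied from the output above.

```shell
#!/bin/sh
# Sketch only: list_down_blades is a hypothetical helper, not an ivt feature.
# It reads "<blade> <state>" lines on stdin and prints the blades that are down.
list_down_blades() {
    awk '$2 == "down" { print $1 }'
}

# Assumed usage on the admin node:
#   ivt -Q -w blades -f '$blade $sshstate' | list_down_blades
# Here the sample output shown above is fed in directly.
list_down_blades <<'EOF'
r1i0n0 up
r1i0n1 down
r1i0n8 up
r1i1n0 up
r1i1n1 down
r1i1n8 up
EOF
```

With the sample data, this prints r1i0n1 and r1i1n1, each on its own line.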
You can use the ivt tool to determine the GigE Ethernet address for each compute node (blade), as follows:
admin:~ # ivt -Q -w blades -f '$blade $gige_ip_addr'
r1i0n0 192.168.159.10
r1i0n1 192.168.159.11
r1i0n8 192.168.159.18
r1i1n0 192.168.159.26
r1i1n1 192.168.159.27
r1i1n8 192.168.159.34 |
For detailed information on how to use the ivt tool, see the ivt(8) man page or the usage statement printed by ivt -h (or --help).
Ganglia is a scalable, distributed monitoring system for high-performance computing systems, such as the SGI Altix ICE system. It displays web browser-based, real-time (on demand) histograms of system metrics, as shown in Figure 5-4.
Detailed information about the Ganglia monitoring system is available at: http://ganglia.info/.
SGI Tempo has devised a Ganglia model for the Altix ICE system that makes maximum use of Ganglia's highly scalable architecture: each compute node (blade) presents a single monitoring source sending its statistics to the rack leader controller. Therefore, the rack leader controller receives, at most, data from 64 blades. After collecting the data, the rack leader controller forwards aggregated rack statistics to the system admin controller (admin node). The rack leader controller also sends its own statistics to the system admin controller. The system admin controller presents the meta-aggregator for the entire Altix ICE system. It collects data from all rack leaders and presents the cluster-wide metrics. This model enables SGI to scale-out Ganglia to very large cluster deployments.
The Node View, as shown in Figure 5-5, can aid in system troubleshooting. For every blade in the system, the Location field of the Node View shows the exact physical location of the blade. This is extremely useful when trying to locate a blade that is down.
This section describes the operation of the Ganglia system monitor and covers the following topics:
To access the Ganglia system monitor, point your browser to the following location: http://admin_pub_name/ganglia
By default, Ganglia monitors standard operating system metrics such as CPU load and memory usage. The Grid Report view shows an overview of your system, such as the number of CPUs, the number of hosts (compute nodes) that are up or down, service node information, memory usage information, and so on.
The Last pull-down menu allows you to view performance data on an hourly, daily, weekly, or yearly basis. The Sorted pull-down menu provides an ascending, descending, or by-host view of performance data. The Grid pull-down menu allows you to see performance data for a particular rack or service node. The Get Fresh Data button allows you to see current performance data.
The system admin controller, rack leader controllers, the service nodes, the chassis management controllers (CMCs) and all the compute nodes (blades) are equipped with a specialized controller, called the Board Management Controller (BMC). This unit provides a broad set of functions as described in the IPMI 2.0 standard. SGI Tempo software uses the BMCs predominantly for remote power management, remote system configuration, and for gathering critical hardware events.
Currently, critical hardware events are gathered for the following nodes: rack leader controllers (leader nodes), CMCs and compute nodes (blades). These events are logged in the following locations:
/var/log/messages via syslog
/var/log/sel/sel.log
Embedded Support Partner (ESP)
Whenever a critical hardware event occurs, information about the event is forwarded to all three locations. You can observe a critical hardware event via syslog, via sel.log, or using ESP. Furthermore, administrator-defined actions can be triggered via ESP, for instance sending an e-mail notification to the system administrator. For more information on ESP, see the esp(5) man page and the SGI Embedded Support Partner User Guide.
All critical hardware events are summarized under the BMC_CMC event type. One particular event holds the following useful information:
MSG ::= <syslog-prefix> TEMPO:<node> EVENT:<event> APP:<app> Date:<date> VERSION:<version> TEXT <text> |
The following fields are all of the type string:
| <node> | node name, for example, r1i0n5 |
| <event> | BMC_CMC |
| <app> | SEL-LOGGER |
| <date> | date / time of the event |
| <version> | 1.0 |
| <text> | Exact copy of the hardware event description from the BMC |
After reading the events from the BMCs, the BMC event logs are cleared on the controller to avoid duplicate events.
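Because the event message has a fixed field layout, it can be split into its fields with a single sed expression. The following sketch is an illustration only: parse_tempo_event is a hypothetical helper, and the sample log line (syslog prefix, date format, and event text) is made up to match the MSG layout above rather than copied from a real system.

```shell
#!/bin/sh
# Sketch only: parse_tempo_event is a hypothetical helper.
# It reads log lines on stdin and prints node, event, app, and text for each
# line that matches the MSG layout; lines that do not match are dropped.
parse_tempo_event() {
    sed -n 's/^.*TEMPO:\([^ ]*\) EVENT:\([^ ]*\) APP:\([^ ]*\) Date:\(.*\) VERSION:\([^ ]*\) TEXT \(.*\)$/node=\1 event=\2 app=\3 text=\6/p'
}

# Assumed usage on a node that logs these events:
#   grep 'EVENT:BMC_CMC' /var/log/messages | parse_tempo_event
# The line below is a made-up sample, not real log output.
parse_tempo_event <<'EOF'
Jul 13 11:42:47 admin logger: TEMPO:r1i0n5 EVENT:BMC_CMC APP:SEL-LOGGER Date:2007-07-13 11:42:40 VERSION:1.0 TEXT Fan 1 lower critical going low
EOF
```

The date field may contain spaces, so the expression matches it up to the VERSION: keyword rather than up to the next space.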
The availability of each node in an SGI Altix ICE system is monitored by a lightweight daemon called tempohbc. Each managed service node, rack leader controller (leader node), and compute node runs this daemon and reports its status to the server which monitors it. The server daemon, which runs on the admin node and leader nodes, reports if the client is down after approximately 120 seconds. In this event, a HEARTBEAT Embedded Support Partner (ESP) event is generated. You can observe this event via syslog or using ESP. Furthermore, administrator-defined actions can be triggered, for instance sending an e-mail notification to the system administrator. For more information on ESP, see the esp(5) man page and the SGI Embedded Support Partner User Guide.
The HEARTBEAT event contains the following useful information:
MSG ::= <syslog-prefix> TEMPO:<node> EVENT:HEARTBEAT APP:TEMPOHBD Date:<date> VERSION:1.0 TEXT <text> |
The HEARTBEAT event is created when nodes fail or recover, described by the TEXT field.
The following fields are all of the type string:
| <node> | node name, for example, r1i0n5 | ||
| <date> | date / time of the event | ||
| <text> | Description of event:
|
A wealth of system metrics are also available through the Performance Co-Pilot (see Performance Co-Pilot Linux User's and Administrator's Guide). The Performance Co-Pilot collection daemon (PMCD) runs on the admin node, managed service nodes, and rack leader nodes. A performance metrics domain agent (PMDA) is running on the rack leader nodes, which collects metrics from the compute nodes.
The new cluster metrics domain contains metrics that were previously available in other PMDAs. The method in which they are collected is different in a Tempo system, in order to minimize load on the compute nodes. The following metrics are available for each compute node in a system by querying the PMCD on their rack leader node:
admin:~ # pminfo -h r1lead cluster
cluster.control.suspend_monitoring
cluster.kernel.percpu.cpu.user
cluster.kernel.percpu.cpu.sys
cluster.kernel.percpu.cpu.idle
cluster.kernel.percpu.cpu.intr
cluster.kernel.percpu.cpu.wait.total
cluster.mem.util.free
cluster.mem.util.bufmem
cluster.mem.util.dirty
cluster.mem.util.writeback
cluster.mem.util.mapped
cluster.mem.util.slab
cluster.mem.util.cache_clean
cluster.mem.util.anonpages
cluster.network.interface.in.bytes
cluster.network.interface.in.errors
cluster.network.interface.in.drops
cluster.network.interface.out.bytes
cluster.network.interface.out.errors
cluster.network.interface.out.drops
cluster.network.ib.in.bytes
cluster.network.ib.in.errors.drop
cluster.network.ib.in.errors.filter
cluster.network.ib.in.errors.local
cluster.network.ib.in.errors.remote
cluster.network.ib.out.bytes
cluster.network.ib.out.errors.drop
cluster.network.ib.out.errors.filter
cluster.network.ib.total.errors.link
cluster.network.ib.total.errors.recover
cluster.network.ib.total.errors.integrity
cluster.network.ib.total.errors.vl15
cluster.network.ib.total.errors.overrun
cluster.network.ib.total.errors.symbol |
The list of metrics that are monitored by the compute node and pushed to the PMCD on the leader node is configurable. In some cases, it may even be desirable to disable metric collection entirely, as follows:
# cexec --head --all pmstore cluster.control.suspend_monitoring 1
pmstore -h r1lead cluster.control.suspend_monitoring 1 |
The default list of metrics that are collected by each compute node contains 41 metrics. There are dozens more available in the cluster.* namespace. The default list is stored on each leader node in the /var/lib/pcp/pmdas/cluster/config file. Changing this file allows you to modify the default metric list with rack granularity. To change the list on a single node, store a newline-delimited list of metrics to the node's instance of the cluster.control.metrics metric.
To see the current metric list for a compute node, perform the following:
# pmval -h r1lead -s 1 -i 'r1i1n0' cluster.control.metrics
metric: cluster.control.metrics
host: r1lead
semantics: discrete instantaneous value
units: none
samples: 1
r1i1n0
"cluster.kernel.percpu.cpu.user
cluster.kernel.percpu.cpu.nice
cluster.kernel.percpu.cpu.sys
cluster.kernel.percpu.cpu.idle
cluster.kernel.percpu.cpu.intr
cluster.kernel.percpu.cpu.wait.total
cluster.mem.util.free
cluster.mem.util.bufmem
cluster.mem.util.dirty
cluster.mem.util.writeback
cluster.mem.util.mapped
cluster.mem.util.slab
cluster.mem.util.cache_clean
cluster.mem.util.anonpages
cluster.infiniband.port.rate
cluster.infiniband.port.in.bytes
cluster.infiniband.port.in.packets
cluster.infiniband.port.in.errors.drop
cluster.infiniband.port.in.errors.filter
cluster.infiniband.port.in.errors.local
cluster.infiniband.port.in.errors.remote
cluster.infiniband.port.out.bytes
cluster.infiniband.port.out.packets
cluster.infiniband.port.out.errors.drop
cluster.infiniband.port.out.errors.filter
cluster.infiniband.port.total.bytes
cluster.infiniband.port.total.packets
cluster.infiniband.port.total.errors.drop
cluster.infiniband.port.total.errors.filter
cluster.infiniband.port.total.errors.link
cluster.infiniband.port.total.errors.recover
cluster.infiniband.port.total.errors.integrity
cluster.infiniband.port.total.errors.vl15
cluster.infiniband.port.total.errors.overrun
cluster.infiniband.port.total.errors.symbol
cluster.network.interface.in.bytes
cluster.network.interface.in.errors
cluster.network.interface.in.drops
cluster.network.interface.out.bytes
cluster.network.interface.out.errors
cluster.network.interface.out.drops
" |
The following example changes the metric list for r1i1n0 to include only the CPU metrics:
# pmstore -h r1lead -i 'r1i1n0' cluster.control.metrics 'cluster.kernel.percpu.cpu.user
cluster.kernel.percpu.cpu.nice
cluster.kernel.percpu.cpu.sys
cluster.kernel.percpu.cpu.idle
cluster.kernel.percpu.cpu.intr
cluster.kernel.percpu.cpu.wait.total' |
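The newline-delimited value can also be assembled in a shell variable before it is passed to pmstore, which avoids typing a multi-line quoted string by hand. The following sketch uses a hypothetical helper named metric_list; the pmstore arguments are the same as in the example above, and the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: metric_list is a hypothetical helper.
# It prints each argument on its own line, giving a newline-delimited list.
metric_list() {
    printf '%s\n' "$@"
}

cpu_metrics=$(metric_list \
    cluster.kernel.percpu.cpu.user \
    cluster.kernel.percpu.cpu.nice \
    cluster.kernel.percpu.cpu.sys \
    cluster.kernel.percpu.cpu.idle \
    cluster.kernel.percpu.cpu.intr \
    cluster.kernel.percpu.cpu.wait.total)

# Print the command for review; remove "echo" to run it on a real system.
echo pmstore -h r1lead -i r1i1n0 cluster.control.metrics "$cpu_metrics"
```

Quoting "$cpu_metrics" matters: without the double quotes, the shell would split the list into separate arguments instead of passing one multi-line value.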
The sensor data repository (SDR) metrics are available through Performance Co-Pilot (see Performance Co-Pilot Linux User's and Administrator's Guide). The SDR provides temperature, voltage, and fan speed information for all service nodes, leader nodes, compute nodes, and CMCs. This information is collected from service and compute nodes through their BMC interface, so it is out-of-band and does not impact the performance of the node.
The following metrics are available through the PMCD:
admin:~ # pminfo -h r1lead sensor
sensor.value.fan
sensor.value.voltage
sensor.value.temperature |
<nodeName>:<nodeType>:<metricName>
nodeName ::= Tempo node names (rXlead, rXiYc, rXiYnZ)
nodeType ::= "service", "cmc", "blade", "leader" |
For example, to view voltages for the rack leader node, perform the following:
admin:~ # pminfo -h r1lead -f sensor.value.voltage | grep -E '(^$|^sensor|r1lead)'
sensor.value.voltage
inst [0 or "r1lead:leader:CPU1_Vcore"] value 1.3
inst [1 or "r1lead:leader:CPU2_Vcore"] value 1.3
inst [2 or "r1lead:leader:3.3V"] value 3.26
inst [3 or "r1lead:leader:5V"] value 4.9
inst [4 or "r1lead:leader:12V"] value 11.71
inst [5 or "r1lead:leader:-12V"] value -12.3
inst [6 or "r1lead:leader:1.5V"] value 1.47
inst [7 or "r1lead:leader:5VSB"] value 4.9
inst [8 or "r1lead:leader:VBAT"] value 3.31 |
For additional examples on how to retrieve values using pmval(1) and for using this data in trend analysis using pmie(1), see the appropriate man page and the Performance Co-Pilot Linux User's and Administrator's Guide.
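The instance names in the voltage listing above follow the nodeName:nodeType:metricName layout, so the listing can be reduced to a plain four-column table for scripting. The following sketch assumes the pminfo -f output format shown above; sensor_table is a hypothetical helper name, and the sample lines are copied from that output.

```shell
#!/bin/sh
# Sketch only: sensor_table is a hypothetical helper.
# It reads "pminfo -f" style output on stdin and prints
# "<node> <type> <sensor> <value>" for each instance line.
sensor_table() {
    awk -F'"' '/^ *inst \[/ {
        split($2, id, ":")          # nodeName:nodeType:metricName
        n = split($3, rest, " ")    # "] value 1.3" -> "]", "value", "1.3"
        print id[1], id[2], id[3], rest[n]
    }'
}

# Assumed usage on the admin node:
#   pminfo -h r1lead -f sensor.value.voltage | sensor_table
# Here two sample lines from the output above are fed in directly.
sensor_table <<'EOF'
sensor.value.voltage
    inst [0 or "r1lead:leader:CPU1_Vcore"] value 1.3
    inst [5 or "r1lead:leader:-12V"] value -12.3
EOF
```

With the sample data, this prints "r1lead leader CPU1_Vcore 1.3" and "r1lead leader -12V -12.3".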
Currently, temperature.pmie contains rules that monitor temperatures and shut down components if the temperature is too high. This feature is enabled by default as a safety mechanism. The procedure below describes how to turn it off.
To turn off the temperature.pmie feature, perform the following steps:
Edit the /var/lib/pcp/config/pmie/control file to comment out or remove the line that calls /opt/sgi/lib/temperature.pmie.
Run the /etc/init.d/pmie restart command.
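The edit in the first step can be previewed with sed before you touch the real control file. The following sketch assumes that a leading # marks a comment line in the control file; comment_out_temperature_rule is a hypothetical helper, and the control file entry used in the demonstration is a made-up stand-in, not the real contents of /var/lib/pcp/config/pmie/control.

```shell
#!/bin/sh
# Sketch only: comment_out_temperature_rule is a hypothetical helper.
# It prints the given file with a "#" placed in front of every line that
# mentions temperature.pmie and is not already commented out.
# The input file itself is not modified.
comment_out_temperature_rule() {
    sed '/temperature\.pmie/ s/^[^#]/#&/' "$1"
}

# Demonstration on a scratch file with a made-up entry.
tmp=$(mktemp)
printf '%s\n' 'example-entry /opt/sgi/lib/temperature.pmie' > "$tmp"
comment_out_temperature_rule "$tmp"
rm -f "$tmp"
```

On a real system, you could review the output for /var/lib/pcp/config/pmie/control, save it over the original once it looks right, and then restart pmie as described in the second step.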
You can use the Cluster Performance Monitor to monitor your Altix ICE system. Log in to the admin node using the ssh -X command and run the pmice command. The pmice - Cluster Performance Monitor window appears.
To see a usage statement for the pmice command, perform the following:
admin:~ # pmice --h
/usr/bin/pmice: illegal option -- -
Info: Usage: pmice [options] [pmgadgets options]
options:
  -K list                Show these CPUs. Comma-separated list
  -N list                Show these nodes. Comma-separated list
  -R list                Show these racks. Comma-separated list
  -V                     Verbose/diagnostic output
pmgadgets(1) options:
  -C                     check configuration file and exit
  -h host                metrics source is PMCD on host
  -n pmnsfile            use an alternative PMNS
  -t interval            sample interval [default 2.0 seconds]
  -z                     set reporting timezone to local time of metrics source
  -Z timezone            set reporting timezone
  -zoom factor           make the gadgets bigger by a factor of 1, 2, 3 or 4
  -infofont fontname     use fontname for text in info dialogs
  -defaultfont fontname  use fontname for label gadgets
  -display display-string
  -geometry geometry-string
  -name name-string
  -title title-string
  -xrm resource |
The Embedded Support Partner (ESP) is a software suite that monitors events, sets up proactive notification, and generates reports on SGI Altix systems. This section describes how to set it up on an SGI Altix ICE system. For detailed information about ESP, see the Embedded Support Partner User Guide.
To set up ESP on an SGI Altix ICE system, perform the following steps:
From the admin node, use the chkconfig command to make sure that the state of ESP is on, as follows:
admin:~ # chkconfig --list | grep esp
esp 0:on 1:on 2:on 3:on 4:on 5:on 6:on
sgi-esphttp: on
sgi_espd: on |
ESP should already be running if its chkconfig flag is on. You can interact with ESP using a web interface or the command line (see Chapter 4, “Setting Up the ESP Environment” in the Embedded Support Partner User Guide).
From the admin node, create the default ESP user account, as follows:
admin:~ # espconfig -createadmin |
Enable the hosts that will be allowed to access ESP with the following commands:
admin:~ # espconfig -enable ipaddr 127.0.0.0
admin:~ # espconfig -enable ipaddr 127.0.0.1
admin:~ # espconfig -enable ipaddr IP_address_of_client |
From your laptop or PC system, point your browser to http://mymachine__-admin:5554 and log into ESP.
When the ESP login screen appears, log in as administrator with the password partner. After you log in, the System Information screen appears (see Chapter 2, “Accessing ESP” in the Embedded Support Partner User Guide).
Now enter the Customer Profile information, as follows:
Select ESP Administration from the menu.
Click on Customer Profile (if not selected by default).
Fill in the form and then click Add.
Click Commit (or Update if the form was already filled out).
Use ESP to examine inventory, as follows:
Select Reports > Hardware > Generate Report.
Select Reports > Software > Generate Report.
You can search for individual packages by entering the name in the search box (below the system host name) and then selecting GO on the right-hand side of the screen. You can also use the down arrow to select a package in this search box.
Use ESP to enable or disable Performance Monitoring, as follows:
Select Configuration (from the top level menu) and then select Performance Monitoring.
Enable PMIE.
Disable the PMIE rule cpu.util.
Select Commit.
Select Configuration > System Monitoring and enable the pmcd service.
Select Update and Commit (this may take a few minutes).
Use ESP to examine error logs, as follows:
From the top-level menu, select Report > Events.
Then select Last 30 days and All Classes before clicking on Generate Report.
Use ESP to enable or disable notification.
Notification of events is handled by espnotify. A notification can be sent by e-mail, to the system console, or to the graphics console. Notifications are enabled or disabled through specific actions, so after you configure a notification action you can enable or disable it, as follows:
Select Configuration > Actions and click Continue.
Decide on the notification format, check it, and then select Continue and Commit.
Select Enable/Disable from the third-level menu, and click to enable the notification you set up.
Click Commit.
This section describes some troubleshooting tools and covers these topics:
You can run the dbdump script to see an inventory of the Altix ICE database.
The dbdump command syntax is as follows:
/opt/sgi/sbin/dbdump --admin
/opt/sgi/sbin/dbdump --leader
/opt/sgi/sbin/dbdump --rack [--rack ]
/opt/sgi/sbin/dbdump |
Use the --admin argument to dump the system admin controller (admin node).
Use the --leader argument to dump all rack leader controllers (leader nodes).
Use the --rack argument to dump a specific rack.
Use the dbdump command without any argument to dump the entire Altix ICE system.
EXAMPLES
Example 5-1. dbdump Command Examples
To dump the entire database, perform the following:
admin:~ # dbdump
0 is { cluster=oscar ifname=service0-bmc dev=bmc0 ip=172.24.0.3 net=head-bmc node=service0
nodetype=oscar_service mac=00:30:48:8e:
1 is { cluster=oscar ifname=service0 dev=eth0 ip=172.23.0.3 net=head node=service0
nodetype=oscar_service mac=00:30:48:33:53:2e }
2 is { cluster=oscar ifname=service0-ib0 dev=ib0 ip=10.148.0.2 net=ib-0 node=service0
nodetype=oscar_service }
3 is { cluster=oscar ifname=service0-ib1 dev=ib1 ip=10.149.0.2 net=ib-1 node=service0
nodetype=oscar_service }
4 is { cluster=oscar dev=eth0 ip=128.162.244.86 net=public node=oscar_server
nodetype=oscar_server mac=00:30:48:34:2B:E0 }
... |
| Note: Some of the sample output in this section has been modified to fit the format of this manual. |
To dump just the rack leader controller, perform the following:
admin:~ # /opt/sgi/sbin/dbdump --leader
0 is { cluster=rack1 ifname=r1lead-bmc dev=bmc0 ip=172.24.0.2 net=head-bmc node=r1lead
nodetype=oscar_leader mac=00:30:48:8a:a4:c2 }
1 is { cluster=rack1 ifname=lead-bmc dev=eth0 ip=192.168.160.1 net=bmc node=r1lead
nodetype=oscar_leader mac=00:30:48:33:54:9e }
2 is { cluster=rack1 ifname=lead-eth dev=eth0 ip=192.168.159.1 net=gbe node=r1lead
nodetype=oscar_leader mac=00:30:48:33:54:9e }
3 is { cluster=rack1 ifname=r1lead dev=eth0 ip=172.23.0.2 net=head node=r1lead
nodetype=oscar_leader mac=00:30:48:33:54:9e }
4 is { cluster=rack1 ifname=r1lead-ib0 dev=ib0 ip=10.148.0.1 net=ib-0 node=r1lead
nodetype=oscar_leader }
5 is { cluster=rack1 ifname=r1lead-ib1 dev=ib1 ip=10.149.0.1 net=ib-1 node=r1lead
nodetype=oscar_leader } |
To dump a specific rack, for example rack 1, perform the following:
admin:~ # /opt/sgi/sbin/dbdump --rack 1
0 is { cluster=rack1 ifname=i0n0-bmc dev=bmc0 ip=192.168.160.10 net=bmc node=r1i0n0
nodetype=oscar_clients mac=00:30:48:7a:a7:96 }
1 is { cluster=rack1 ifname=i0n0-eth dev=eth0 ip=192.168.159.10 net=gbe node=r1i0n0
nodetype=oscar_clients mac=00:30:48:7a:a7:94 }
2 is { cluster=rack1 ifname=r1i0n0-ib0 dev=ib0 ip=10.148.0.3 net=ib-0 node=r1i0n0
nodetype=oscar_clients }
3 is { cluster=rack1 ifname=r1i0n0-ib1 dev=ib1 ip=10.149.0.3 net=ib-1 node=r1i0n0
nodetype=oscar_clients }
4 is { cluster=rack1 ifname=i0n1-bmc dev=bmc0 ip=192.168.160.11 net=bmc node=r1i0n1
nodetype=oscar_clients mac=00:30:48:7a:a7:86 slot=1 }
5 is { cluster=rack1 ifname=i0n1-eth dev=eth0 ip=192.168.159.11 net=gbe node=r1i0n1
nodetype=oscar_clients mac=00:30:48:7a:a7:84 slot=1 }
6 is { cluster=rack1 ifname=r1i0n1-ib0 dev=ib0 ip=10.148.0.4 net=ib-0 node=r1i0n1
nodetype=oscar_clients slot=1 }
7 is { cluster=rack1 ifname=r1i0n1-ib1 dev=ib1 ip=10.149.0.4 net=ib-1 node=r1i0n1
nodetype=oscar_clients slot=1 }
8 is { cluster=rack1 ifname=i0n10-bmc dev=bmc0 ip=192.168.160.20 net=bmc node=r1i0n10
nodetype=oscar_clients slot=10 }
9 is { cluster=rack1 ifname=i0n10-eth dev=eth0 ip=192.168.159.20 net=gbe node=r1i0n10
nodetype=oscar_clients slot=10 }
10 is { cluster=rack1 ifname=r1i0n10-ib0 dev=ib0 ip=10.148.0.13 net=ib-0 node=r1i0n10
nodetype=oscar_clients slot=10 }
... |
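Because each dbdump record is a brace-delimited list of key=value pairs, the output is straightforward to load into a script for filtering. The following is a minimal Python sketch, assuming every record is closed with "}" as in the examples above; the function name parse_dbdump is hypothetical and SAMPLE is copied from the rack 1 output.

```python
import re

# One record looks like:  0 is { key=value key=value ... }
# A record may wrap over several lines, so match up to the closing brace.
RECORD_RE = re.compile(r'(\d+) is \{([^}]*)\}')


def parse_dbdump(text):
    """Return one dict of key/value strings per dbdump record (hypothetical helper)."""
    records = []
    for _index, body in RECORD_RE.findall(text):
        fields = dict(token.split("=", 1) for token in body.split() if "=" in token)
        records.append(fields)
    return records


SAMPLE = '''0 is { cluster=rack1 ifname=i0n0-bmc dev=bmc0 ip=192.168.160.10 net=bmc node=r1i0n0
nodetype=oscar_clients mac=00:30:48:7a:a7:96 }
1 is { cluster=rack1 ifname=i0n0-eth dev=eth0 ip=192.168.159.10 net=gbe node=r1i0n0
nodetype=oscar_clients mac=00:30:48:7a:a7:94 }
2 is { cluster=rack1 ifname=r1i0n0-ib0 dev=ib0 ip=10.148.0.3 net=ib-0 node=r1i0n0
nodetype=oscar_clients }
'''

for record in parse_dbdump(SAMPLE):
    print(record["node"], record["net"], record["ip"], record.get("mac", "-"))
```

Not every record carries every key (the InfiniBand entries have no mac field, for example), so look up optional keys with get(). A record whose closing brace is missing would be merged with the record that follows it, so check the input if the record count looks wrong.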
The tempo-info-gather command enables you to collect vital system data, which is especially useful when troubleshooting problems. It collects information about the following:
Digital media dminfo files, syslogs, Dynamic Host Configuration Protocol (DHCP), network file system (NFS)
MySQL cluster database dump
Network service configuration files, for example, C3, Ganglia, DHCP, domain name service (DNS) configuration files
A list of installed system images
Log files in /var/log/messages
Chassis management control (CMC) slot table for each rack
Basic input-output system (BIOS), Baseboard Management Controller (BMC), CMC, and InfiniBand fabric software versions from all Altix ICE nodes
To see a usage statement for the tempo-info-gather command, perform the following:
admin:/opt/sgi/sbin # tempo-info-gather -h
usage: tempo-info-gather [-h] [-P path] [-o file]
tempo-info-gather -h # Print this usage page
tempo-info-gather -o file # Tar and gzip the directories
into file (imply -n)
tempo-info-gather -p path # Directory to write the data
(default /var/tmp/tempo)
|
The cminfo command is used internally by many of the SGI Tempo scripts that are used to discover, configure, and manage an SGI Altix ICE system.
In a troubleshooting situation, you can use it to gather information about your system. To see a usage statement from a rack leader controller, perform the following:
r1lead:~ # cminfo --help
Usage: cminfo [--bmc_base_ip|--bmc_ifname|--bmc_iftype|--bmc_ip|--bmc_mac|--bmc_netmask|--bmc_nic|
--dns_domain|--gbe_base_ip|--gbe_ifname|--gbe_iftype|--gbe_ip|--gbe_mac|--gbe_netmask|--gbe_nic|
--head_base_ip|--head_bmc_base_ip|--head_bmc_ifname|--head_bmc_iftype|--head_bmc_ip|--head_bmc_mac|
--head_bmc_netmask|--head_bmc_nic|--head_ifname|--head_iftype|--head_ip|--head_mac|--head_netmask|
--head_nic|--ib_0_base_ip|--ib_0_ifname|--ib_0_iftype|--ib_0_ip|--ib_0_mac|--ib_0_netmask|--ib_0_nic|
--ib_1_base_ip|--ib_1_ifname|--ib_1_iftype|--ib_1_ip|--ib_1_mac|--ib_1_netmask|--ib_1_nic|--name|--rack]
r1lead:~ # cminfo --bmc_base_ip |
EXAMPLES
Example 5-2. cminfo Command Examples
To see the BMC base IP address for the rack leader node, perform the following:
r1lead:~ # cminfo --bmc_base_ip
192.168.160.0 |
To see the rack leader DNS domain, perform the following:
r1lead:~ # cminfo --dns_domain
ice.domain_name.mycompany.com |
To see the BMC NIC, perform the following:
r1lead:~ # cminfo --bmc_nic
eth0 |
To see the base IP address of the ib1 InfiniBand fabric, perform the following:
r1lead:~ # cminfo --ib_1_base_ip
10.149.0.0 |
The kdump utility is a kexec-based crash dumping mechanism for the Linux operating system. You can download debuginfo kernel RPMs, for use with crash and any kernel dumps, from the following location: http://support.novell.com/linux/psdb/byproduct.html.
To get a traceback or system dump, perform the following from the system console:
console r1i0n0
^e c l 1 8
^e c l 1 t #traceback
^e c l 1 c #dump |
| Note: This example shows the letter “c”, a lowercase L “l”, and the number one “1” in all three lines. |
On the admin node, go to /net/r1lead/var/log/consoles for the traceback and /net/r1lead/var/log/dumps/r1i0n0 for the system dump.
You can dump a compute node, a rack leader node (such as r1lead), or a service node (such as service0).
| Note: Your SGI Altix ICE system comes preinstalled with the appropriate firmware. See your SGI field support person for any BMC, BIOS, and CMC firmware updates. |
| sgi-ice-blade-bmc-1.43.5-1.x86_64.rpm | Blade BMC firmware and update tool |
| sgi-ice-blade-bios-2007.08.10-1.x86_64.rpm | Blade BIOS image and update tool |
| sgi-ice-cmc-0.0.11-2.x86_64.rpm | CMC firmware and update tool |
To identify the BIOS, you need both the version and the release date. You can get these using the dmidecode command. Log in to the node whose BIOS level you want to check and perform the following:
# dmidecode -s bios-version; dmidecode -s bios-release-date |
The BMC firmware revision can be retrieved using the ipmiwrapper command. For example, from the admin node, the following command gets the BMC firmware revision for r1i0n0:
# ipmiwrapper r1i0n0 bmc info | grep 'Firmware Revision' |
The CMC firmware version can be retrieved by issuing the version command to the CMC. For example, if you are logged in to the r1lead rack leader controller, the following command gets the CMC firmware version:
# ssh root@r1i0-cmc version |
The ibstat command retrieves information for the InfiniBand links including the firmware version. The following command gets the InfiniBand firmware version:
# ibstat | grep Firmware |
The firmware_revs script on the system admin controller (admin node) collects the firmware information for all nodes in the SGI Altix ICE system, as follows:
admin:~ # firmware_revs
BIOS versions:
--------------
admin: 6.00
r1lead: 6.00
service0: 6.00
r1i0n0: 6.00
r1i0n1: 6.00
r1i0n8: 6.00
r1i1n0: 6.00
r1i1n1: 6.00
r1i1n8: 6.00
BIOS release dates:
-------------------
admin: 05/10/2007
r1lead: 05/10/2007
service0: 05/10/2007
r1i0n0: 05/29/2007
r1i0n1: 05/29/2007
r1i0n8: 05/29/2007
r1i1n0: 05/29/2007
r1i1n1: 05/29/2007
r1i1n8: 05/29/2007
BMC versions:
-------------
admin: 1.31
r1lead: 1.31
service0: 1.31
r1i0n0: 1.29
r1i0n1: 1.29
r1i0n8: 1.29
r1i1n0: 1.29
r1i1n1: 1.29
r1i1n8: 1.29
CMC versions:
-------------
r1i0c: 0.0.9pre10
r1i1c: 0.0.9pre10
Infiniband versions:
--------------------
r1lead: 4.7.600
service0: 4.7.600
r1i0n0: 1.2.0
r1i0n0: 1.2.0
r1i0n1: 1.2.0
r1i0n1: 1.2.0
r1i0n8: 1.2.0
r1i0n8: 1.2.0
r1i1n0: 1.2.0
r1i1n0: 1.2.0
r1i1n1: 1.2.0
r1i1n1: 1.2.0
r1i1n8: 1.2.0
r1i1n8: 1.2.0 |
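On a larger system it can help to summarize this report by version, so that nodes running an unexpected firmware level stand out. The following is a minimal Python sketch, assuming the firmware_revs output is laid out with one "node: version" entry per line under a section heading that ends in a colon, followed by a dashed underline. The function names parse_firmware_revs and group_by_version are hypothetical, and SAMPLE is abbreviated from the output above.

```python
from collections import defaultdict


def parse_firmware_revs(text):
    """Return {section: [(node, version), ...]} from firmware_revs-style text."""
    sections = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed underlines
        name, _, value = line.partition(":")
        if not value.strip():
            current = name.strip()  # a heading such as "BIOS versions:"
        elif current is not None:
            sections[current].append((name.strip(), value.strip()))
    return dict(sections)


def group_by_version(entries):
    """Return {version: [node, ...]}, preserving the order in the report."""
    groups = defaultdict(list)
    for node, version in entries:
        groups[version].append(node)
    return dict(groups)


SAMPLE = '''BIOS versions:
--------------
admin: 6.00
r1i0n0: 6.00
BMC versions:
-------------
admin: 1.31
r1lead: 1.31
r1i0n0: 1.29
r1i0n1: 1.29
'''

for section, entries in parse_firmware_revs(SAMPLE).items():
    for version, nodes in group_by_version(entries).items():
        print(f"{section}: {version} on {', '.join(nodes)}")
```

Differences between node types (for example, between the admin node and the blades) are not necessarily errors, so the sketch only groups the nodes and leaves the interpretation to you.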