Chapter 2. System Control

This chapter describes the functions of the system controllers and how they interact.

The L1/L2 control system for the SGI Altix 4700 series servers manages power control and sequencing, provides environmental control and monitoring, initiates system resets, stores identification and configuration information, and provides a console/diagnostic and scan interface.

Each IRU enclosure has a system controller that can communicate with other IRU system controllers when they are NUMAlinked together under a single system image. Each system controller constantly shares its information with all controllers in the system. Note that optional mass storage enclosures do not have a system controller.

Figure 2-1 shows an example system control network using an optional and separate (remote) workstation to monitor a single-rack Altix 4700 system.

Figure 2-1. SGI Altix 4700 L1/L2 System Control Network (Example)


Levels of System Control

The system control network configuration of your server depends on the size of the system and the control options selected. Typically, an Ethernet connection to the system controller network is used. This connection is made either from an IRU in smaller systems or from a Dense router in larger systems.

The system controller is designed into all IRUs. The level one (L1) system control hardware component is the basic administrative interface for each IRU. When an Ethernet interface is connected to the system control interface, the system controller spawns a level two (L2) application function. An Ethernet connection directly from the IRU or Dense router to a local private or public Ethernet allows the system to be administered directly from a local or remote console. Note that there is no interconnected system controller function in the optional storage modules.

The L1/L2 controllers within the system report and share status information via the NUMAlink cables, thus maintaining controller configuration and topology information between all controllers in the system.


Note: Mass storage option enclosures are not specifically monitored by the system controller network. Most optional mass storage enclosures have their own internal microcontrollers for monitoring and controlling all elements of the disk array. See the owner's guide for your mass storage option for more information on this topic.

For information on attaching older, network-connected SGI systems, see the SGIconsole Hardware Connectivity Guide (P/N 007-4340-00x).

System Controller Interaction

In all Altix 4700 servers, the system controllers communicate with each other in the following ways:

  • All enclosures within an Altix 4700 system communicate with each other through their NUMAlink connections using low voltage differential signaling (LVDS).

  • When an Ethernet connection is made to the L1/L2 host connector on the system control board of an IRU or on a Dense router, the system controller spawns an L2 application that provides L2 functionality.

L1 Controller

All IRUs and Dense routers have L1 controllers. The following subsections describe the basic features of all L1 controllers:

L1 Controller Functions

The following list summarizes the control and monitoring functions that the L1 controller performs. Many of the L1 controller functions are common across both IRU and Dense routers; however, some functions are specific to the type of enclosure.

  • Controls voltage margining within the IRU or Dense router

  • Controls and monitors IRU and Dense router fan speeds

  • Reads system identification (ID) PROMs

  • Monitors voltage levels and reports failures

  • Monitors and controls warning LEDs on the enclosure

  • Monitors the On/Off power switch

  • Monitors the reset switch and the nonmaskable interrupt (NMI) switch

  • Reports the population of the PCI cards and the power levels of the PCI slots in installed I/O blades

  • Powers on the PCI slots and their associated LEDs

L1 Front Panel Display

Figure 2-2 shows the L1 controller front panel on the IRU.

Figure 2-2. L1 Front Panel


The front panel display contains the following items:

  • 2 x 12 character liquid crystal display (LCD). The display identifies the IRU, shows system status, warns of required service, and identifies a failed component.

  • Power (On/Off) button (insert paper clip to actuate) and power on LED.

  • Service required LED.

  • Failure LED.

  • Reset switch (insert paper clip to actuate).

  • Non-maskable interrupt (NMI) switch (insert paper clip to actuate).


    Note: The reset and NMI switch functions are available on the IRUs only.


Ethernet Switch

Use of an Ethernet switch is the preferred method of interconnecting large systems with multiple L2s and remote support hardware to multiple systems. An optional Ethernet switch provides multiple Ethernet connectors. Figure 2-3 shows example connections between the Ethernet switch, a network-connected workstation console, an IRU and other local SGI systems.

Figure 2-3. Ethernet Switch System Controller Block Diagram (Example)


Console Hardware Requirements

The console option you choose determines the console type and how the console connects to the Altix 4700 server.

If you have an Altix 4700 server and wish to use a “dumb terminal”, you can connect the terminal via a serial cable to the (DB-9) console port connector on the system control board of the IRU.

The terminal should be set to the following functional modes:

  • Baud rate of 38,400

  • 8 data bits

  • One stop bit, no parity

  • Hardware flow control (RTS/CTS)

Note that the serial console is generally connected to the first (bottom) IRU in any single rack configuration.
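The terminal settings listed above can be collected in one place when a Linux workstation stands in for the dumb terminal. The following is a minimal sketch, not part of the SGI tooling: it assumes GNU stty (which takes the device with -F), and the device path /dev/ttyS0 is a hypothetical example.

```python
from typing import List


def stty_args(device: str) -> List[str]:
    """Build a GNU stty command line for the console port settings."""
    return [
        "stty", "-F", device,
        "38400",    # baud rate of 38,400
        "cs8",      # 8 data bits
        "-cstopb",  # one stop bit
        "-parenb",  # no parity
        "crtscts",  # hardware flow control (RTS/CTS)
    ]


if __name__ == "__main__":
    # /dev/ttyS0 is an assumed device name; substitute your serial port.
    print(" ".join(stty_args("/dev/ttyS0")))
```

The list can be passed to subprocess.run() unchanged once the device name has been confirmed.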

You can also connect to the RJ45 Ethernet connector on the system control board of the IRU and place it on a network. This starts an L2 on the IRU's system controller. You can use the telnet command to connect.

To verify that the L2 is running, enter the following at the L1 prompt:

001c01-L1> l2

L2 Controller is running

001c01-L1>

If the L2 is not running, enter:

001c01-L1> ! init 4

By default, the L2 uses DHCP to obtain an IP address. To set a static IP address, access the L1 via the serial port and use the following commands:

001c01-L1> l2 ip <ip address> <netmask> <broadcast> 

001c01-L1> reboot_l1

These console connections enable you to view the status and error messages generated by the system controllers on your system. You can also use the console to input commands to manage and monitor your system(s). For more information on the L2, see “L2 Operation”.

For more information on connecting a console to an Altix 4700 series server, see “System Controller Network” in Chapter 1. For more information on monitoring your server, see “Monitoring Your Server” in Chapter 1.

Operating the L1

Each IRU and Dense router in the Altix 4700 system has an updated and enhanced system control implementation. This updated system controller provides both L1 and L2 functionality. The system controller utilizes an embedded version of the Linux operating system. L1 functionality is provided by an application that is always running on the system controller. When the enclosure is connected to a LAN via the L2 host connector, the system controller spawns an application that provides L2 functionality.

The L1 operates in one of these two modes, which are discussed in the sections that follow:

L1 Mode 

The L1 prompt is visible and all input is directed to the L1 command processor. The Altix 4700 server L1 system control performs the following functions:

  • Manages power control and sequencing

  • Provides environmental monitoring and control

  • Initiates system resets

  • Provides read/write storage for identification and configuration information

  • Provides a console/diagnostic and scan interface

The L1 controller in each of the enclosures is a complete and fully functional system controller. All the blades are interconnected by NUMAlink and each shares its system control information with all other system controllers.

Console Mode from L1 

Output from the system is visible and all input is directed to the system console. 


Note: Console mode from L1 is supported only if the system console L1 port is connected directly to the console system (laptop, PC, and so on).


L1 Mode

If you see a prompt of the following form, the L1 is ready to accept commands.

001c01-L1> 

Common operations are discussed in the following sections:

Viewing System Configuration (from an IRU's Perspective)

An L1 has limited knowledge of the system topology, depending on the system's configuration. Typically, an L1 has information only about L1s that are directly NUMAlink connected.

In large configurations with more than one L1, the L1 may have knowledge of only a portion of the L1s in the system. These configurations require the use of the L2, see “L2 Operation” for further details.

You can view an IRU's configuration information with the config command, as in the following example:

001c01-L1> config
:0  001c01 LOC
001c01-L1>

This example shows a system with one IRU. The number that follows the colon (0 in this example) identifies the local port through which the IRU is connected or accessed. The local (LOC) IRU is the IRU that is processing the command.

On all IRUs, :0 is the local IRU; other values refer to other ports. The port description follows the IRU's rack/type/slot field (for example, LOC, U-F, or U-G):

021c01-L1> config 
:0 021c01 L0C
:2 021c11 L1H
:8 021r41 L0G
:5 022r41 R3G
021c01-L1>
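When config output is captured from a console session, each entry can be split into its port number, IRU identifier, and port description. The following sketch assumes the three-field line layout shown in the example above; the ConfigEntry type and the parse_l1_config function are hypothetical names introduced only for illustration.

```python
import re
from typing import List, NamedTuple


class ConfigEntry(NamedTuple):
    port: int        # number after the colon; 0 is the local IRU
    iru: str         # rack/type/slot field, for example 021c01
    port_desc: str   # port description that follows the IRU field


# One entry per line: ":<port> <iru> <description>"
_ENTRY = re.compile(r"^:(\d+)\s+(\S+)\s+(\S+)$")


def parse_l1_config(text: str) -> List[ConfigEntry]:
    """Extract config entries; prompt lines and blanks are skipped."""
    entries = []
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            port, iru, desc = match.groups()
            entries.append(ConfigEntry(int(port), iru, desc))
    return entries


SAMPLE = """\
021c01-L1> config
:0 021c01 L0C
:2 021c11 L1H
:8 021r41 L0G
:5 022r41 R3G
021c01-L1>
"""

if __name__ == "__main__":
    for entry in parse_l1_config(SAMPLE):
        role = "local" if entry.port == 0 else "remote"
        print(entry.iru, entry.port_desc, role)
```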

Command Targeting

By default, a command affects only the local IRU. To target a command to all IRUs (including the local IRU), prefix the command with an asterisk (*):

001c01-L1> * version 
001c01:
L1 0.7.37 (Image A), Built 11/24/2005 14:59:42 [2MB image]
001c11:
L1 0.7.37 (Image A), Built 11/24/2005 14:59:42 [2MB image]
001c21:
L1 0.7.37 (Image A), Built 11/24/2005 14:59:42 [2MB image]
001c01-L1>

Commands can be targeted to other L1s by preceding the command with a rack and slot:

001c01-L1> 1.11 version 

The command above issues a version command to the IRU in rack 001, U position 11.

Some commands can be targeted to a specific blade within an IRU. Precede the command with the blade designator:

001c01-L1> b1 power down

The command above issues a power down command to the blade in blade slot 1 of the IRU in rack 001, U position 01.

Console Mode from L1

In console mode, output from the system boot process or OS is visible and all input is directed to the system. To enter console mode, press Ctrl+D at the L1 prompt:

001c01-L1> Ctrl+D 
entering console mode 001c01 console, <CTRL-T> to escape to L1
.
<system output appears here> 
.

To return to L1 mode, press Ctrl+T:

Ctrl+T 
escaping to L1 system controller
001c01-L1>

While in L1 mode, you can enter any L1 command. Once the command is executed, the L1 returns to console mode:

re-entering console mode 001c01 console, <CTRL-T> to escape to L1

To permanently engage the L1 mode, press Ctrl+T and then enter the l1 command:

Ctrl+T 
escaping to L1 system controller
001c01-L1> l1 
L1 command processor engaged, <CTRL-D> for console mode.
001c01-L1>

L1 Console Selection

If the system contains more than one IRU and a serial connection is utilized for the console, the serial cable must be connected to the IRU that is located in the lowest rack and slot position.

The select command shows the current console mode settings:

001c01-L1> select 
console input: 001c01 console0
console output: not filtered

The following are common subchannels associated with console communications:

  • Subchannel 0A specifies Blade 0, CPU A.

  • Subchannel 0C specifies Blade 0, CPU C.

  • Subchannel 1A specifies Blade 1, CPU A.

  • Subchannel 1C specifies Blade 1, CPU C.

  • Subchannel 2A specifies Blade 2, CPU A.

  • Subchannel 2C specifies Blade 2, CPU C.

  • Subchannel 3A specifies Blade 3, CPU A.

  • Subchannel 3C specifies Blade 3, CPU C.

  • Subchannel console0 specifies the Blade 0 console subchannel.

  • Subchannel console1 specifies the Blade 1 console subchannel.

The output from the select command:
console input: 001c01 console0
shows that the system controller will send input to IRU 001c01 blade 0 and the subchannel to be used is the console subchannel.
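The subchannel names follow a regular pattern: a blade digit followed by a CPU letter, or the word console followed by a blade digit. A minimal sketch of decoding that pattern follows. The decode_subchannel function is a hypothetical helper, and accepting any single blade digit (rather than only the values listed above) is an assumption.

```python
import re
from typing import Optional, Tuple


def decode_subchannel(name: str) -> Tuple[int, Optional[str]]:
    """Return (blade, cpu); cpu is None for a console subchannel."""
    match = re.fullmatch(r"console(\d)", name)
    if match:
        return int(match.group(1)), None
    match = re.fullmatch(r"(\d)([AC])", name)
    if match:
        return int(match.group(1)), match.group(2)
    raise ValueError(f"unrecognized subchannel: {name!r}")


if __name__ == "__main__":
    print(decode_subchannel("1A"))        # (1, 'A')
    print(decode_subchannel("console0"))  # (0, None)
```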

During the boot process, there is a window of time in which all processors may be producing output. This can produce somewhat jumbled output at the L1.

However, you can filter the console output so that the L1 shows output from only the processor chosen to receive console input. You can turn filtering on and off with the select filter command.

If you attempt to communicate with an IRU that is not responding, a time-out condition results:

001c01-L1> 

entering console mode 001c01 console, <CTRL-T> to escape to L1
no response from 001c01 junk bus console UART:UART_TIMEOUT

When this time-out condition occurs, either the IRU is hung or the subchannel is incorrect. An IRU is identified by its rack, type, and slot (001c01).

Viewing Information, Warnings, and Error Messages

All information, warnings, and error messages generated by any of the system controllers are in the following form:

001c01 ERROR: invalid arguments for `ver' command, try “help ver”

The general format includes an IRU identification and the type of message, followed by the message. A message may be the result of an invalid command, as shown in the example, or the result of tasks running on the L1, such as the environmental monitor.

Each L1 has a log of local events. Use the L1 log command to view events on any of the L1s.
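The message format described above (IRU identification, message type, then the message text) is simple to split apart when scanning captured console output. The sketch below assumes the message type is a single uppercase word followed by a colon; only ERROR appears in this chapter, so any other type label is an assumption. ControllerMessage and parse_message are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ControllerMessage(NamedTuple):
    source: str  # IRU identification (rack, type, slot), for example 001c01
    kind: str    # message type, for example ERROR
    text: str    # message body


_MESSAGE = re.compile(r"^(\d{3}[a-z]\d{2})\s+([A-Z]+):\s*(.*)$")


def parse_message(line: str) -> Optional[ControllerMessage]:
    """Return the parsed message, or None if the line is not a message."""
    match = _MESSAGE.match(line.strip())
    return ControllerMessage(*match.groups()) if match else None


if __name__ == "__main__":
    sample = "001c01 ERROR: invalid arguments for `ver' command, try \"help ver\""
    print(parse_message(sample))
```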

L2 Operation

As mentioned in “System Controller Interaction”, the system controller in an Altix 4700 system can provide L2 functionality.

An Ethernet cable can be plugged into the RJ45 connector on the IRU enclosure or on the Dense router. Connecting the IRU or Dense router to an active LAN via the L2 host connector will cause the system controller to spawn an L2. This connection provides network access to the system controller through the L2.

Configuring an L2's IP Address

This section refers to setting the IP address on the IRU enclosure when using an Ethernet connection.

Set the IP address of the L2 on each target IRU before connecting the IRUs to the network, as follows:

Connect a serial cable to the serial console port on the target IRU and get the L1 prompt.

Check whether the L2 is running (it will be if the LAN cable is plugged in and connected to an active LAN). At the L1 prompt, type:

001c01-L1> l2
L2 Controller is running.
001c01-L1>

If the L2 is not running, type:

001c01-L1> ! init 4

This switches the system controller to run level 4 and forces the L2 to be started whether or not the LAN is plugged in. Note the space between the “!” and “init”.

Verify again that the L2 is running, as described above.

To set the IP address on the L2, type:

001c01-L1> l2 ip a.b.c.d 255.255.255.0 a.b.c.255

Verify that the system serial number is set on the L2:

001c01-L1> l2 serial 

To set the L2 system serial number:

001c01-L1> l2 serial set <serial number>

Verify that msys is enabled (this allows multiple L2s in a system to coexist with L2s from another system on the same subnet):

001c01-L1> l2 msys

If msys is off, turn it on:

001c01-L1> l2 msys on

Reboot the system controller to make the IP address change take effect.

001c01-L1> reboot_l1

Once this is done for all target IRUs and Dense routers, connect them to the network (using an optional Ethernet switch if necessary).

The rack ID on the L2 cannot be set with the L2 “rackid” command; instead, it is inherited from the local L1. For example, the L2 running on the system controller in 1r41 has a rack ID of 141 (rack * 100 + slot of the local L1).
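The arithmetic behind the inherited rack ID can be written out directly. This is only an illustration of the formula quoted above; the l2_rack_id function name and the 0 to 99 slot check are assumptions.

```python
def l2_rack_id(rack: int, slot: int) -> int:
    """Rack ID inherited by the L2: rack * 100 + slot of the local L1."""
    if not 0 <= slot <= 99:
        # Assumed limit: the slot field is two digits wide.
        raise ValueError("slot must fit in two digits")
    return rack * 100 + slot


if __name__ == "__main__":
    print(l2_rack_id(1, 41))  # 141, matching the 1r41 example
```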

Once the L2 is running, you can telnet to the L2 using an optional LAN connected console system.

After the connection to the L2 controller is established, a prompt similar to the following appears, indicating that the L2 is ready to accept commands:

olympic-101-L2> 

Common operations are discussed in the subsections that follow.

Viewing System Configuration

You can use the L2's config command to view the current system configuration from an IRU level:

olympic-101-L2> config
L2 127.0.0.1: - 001 (LOCAL)
L1 127.0.0.1:0:0 	- 001c31
L1 127.0.0.1:0:1 	- 001c21
L1 127.0.0.1:0:2 	- 001c11
L1 127.0.0.1:0:3 	- 001c01
L2>

As shown above, config produces a list of IRUs and their locations in the system and the system controller address of each IRU and Dense router. This is similar to the output from using the config command on the L1 with the addition of the L2 IP address, L1 connection, and L1 index.

The structure of the IRU and Dense router address is as follows:

a.b.c.d:x:y

where:

a.b.c.d  

is the IP address of the L2. (In the example above, the IP address is 127.0.0.1.)

x  

is the connection number, which is always 0 for Altix 4700.

y 

is the L1 index, as follows:

0 is the local IRU

A number greater than 0 indicates an IRU that is attached directly or indirectly to the local IRU.

An IRU is identified by its rack, type, and slot (001c01). The structure of the IRU location is as follows:

rrrbss.p

where:

rrr 

is the rack number.

b 

is the enclosure type.

ss 

is the slot location of the enclosure.

p 

is the partition of the enclosure (not present if the system is not partitioned).

In the example shown above, 001c01 is an IRU in rack 001 and slot position 01.
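Both structures described above, the a.b.c.d:x:y controller address and the rrrbss.p location, can be split into their fields mechanically. The following sketch assumes a single lowercase letter for the enclosure type and a numeric partition; Location, ControllerAddress, and the two parse functions are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    rack: int
    enclosure_type: str
    slot: int
    partition: Optional[int]  # None if the system is not partitioned


class ControllerAddress(NamedTuple):
    l2_ip: str
    connection: int  # x field
    l1_index: int    # y field; 0 is the local IRU


def parse_location(text: str) -> Location:
    """Parse rrrbss or rrrbss.p, for example 001c01 or 001c01.2."""
    match = re.fullmatch(r"(\d{3})([a-z])(\d{2})(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"not an IRU location: {text!r}")
    rack, kind, slot, part = match.groups()
    return Location(int(rack), kind, int(slot),
                    int(part) if part is not None else None)


def parse_address(text: str) -> ControllerAddress:
    """Parse a.b.c.d:x:y, for example 127.0.0.1:0:3."""
    ip, connection, index = text.rsplit(":", 2)
    return ControllerAddress(ip, int(connection), int(index))


if __name__ == "__main__":
    print(parse_location("001c01"))
    print(parse_address("127.0.0.1:0:3"))
```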

Setting Command Targeting

If a command is not understood by the L2 system controller, in general it is passed to the L1 system controllers. The destination determines which L1s receive the command. A destination, specified by the following, is a range of racks and slots:

rack <rack list> slot <slot list> 

The <rack list> specifies a list of racks. This can be a list delimited by commas, such that 2,4,7 specifies racks 2, 4, and 7. You can use a dash to specify a range of racks, such that 2-4 specifies racks 2, 3, and 4. Both nomenclatures can be combined, such that 2-4,7 specifies racks 2, 3, 4, and 7.
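The list nomenclature described above (comma-separated values, dash-separated ranges, or both) can be expanded as in the following sketch. It illustrates the notation only and is not the L2's own parser; expand_list is a hypothetical name, and the all and * aliases are not handled.

```python
from typing import List


def expand_list(spec: str) -> List[int]:
    """Expand a list such as '2-4,7' into [2, 3, 4, 7]."""
    values = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            values.update(range(int(low), int(high) + 1))
        else:
            values.add(int(part))
    return sorted(values)


if __name__ == "__main__":
    print(expand_list("2,4,7"))  # [2, 4, 7]
    print(expand_list("2-4"))    # [2, 3, 4]
    print(expand_list("2-4,7"))  # [2, 3, 4, 7]
```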

You can specify the <slot list> using the same nomenclature. The slot number, sometimes referred to as a bay number, is the unit position number marked on the rack, slightly above where the bottom of the IRU sits. Each rack unit position number is located toward the top of the two lines that mark the unit position that the number represents. For example, the rack numbering for an IRU located in slot 10 would appear on the left front side of the rack.

The slot <slot list> is optional; if not given, then all slots in the specified rack(s) are implied. You should avoid specifying a rack list and a slot list that includes multiple racks and slots, such as rack 2-4,7 slot 1-8,11,13. Generally, you specify a rack and slot together to specify an individual IRU or Dense router.

You can use the aliases r and s to specify rack and slot, respectively. You can use the alias all or * in both the <rack list> and the <slot list>, or by themselves, to specify all racks and all slots.

To send a command to all IRUs in a partition, enter the following:

partition <partition> <cmd>

Individual IRUs and Dense routers can also be targeted with a short <rack>.<slot> prefix, as in 1.11 <command>.

To target individual blades in an IRU use the following syntax:

olympic-101-L2> 1.11 b1 power down

Executing the above command will power down the blade in blade slot 1 of the IRU in rack 001 U position 11.
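A script that drives the L2 over a console session might assemble the short <rack>.<slot> prefix and the optional blade designator as shown below. This is a sketch of the string format only; targeted_command is a hypothetical helper, and it does not send anything to a controller.

```python
from typing import Optional


def targeted_command(command: str, rack: int, slot: int,
                     blade: Optional[int] = None) -> str:
    """Prefix a command with <rack>.<slot> and, optionally, b<blade>."""
    prefix = f"{rack}.{slot}"
    if blade is not None:
        prefix += f" b{blade}"
    return f"{prefix} {command}"


if __name__ == "__main__":
    print(targeted_command("version", 1, 11))              # 1.11 version
    print(targeted_command("power down", 1, 11, blade=1))  # 1.11 b1 power down
```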

Default Destination 

When the L2 starts, the default destination is set to all racks and all slots. You can determine the default destination by using the destination command:

L2> destination 
all racks, all slots
L2>

The following command sets the default destination to racks 2 and 3, all slots:

L2> r 2,3 destination 
2 default destination(s) set
L2>

The following example shows what IRUs are found in the default destination. If you enter a command not understood by the L2, the command is sent to these IRUs.


Note: In the current implementation, if you add an IRU to either rack 2 or 3, it would not be automatically included in the default destination. You would need to reset the default destination.


L2> destination 
001c01 (127.0.0.1:0:2)
001c01 (127.0.0.1:0:0)
L2>

The following command resets the default destination to all racks and all slots:

L2> destination reset 
default destination reset to all racks and slots
L2>

Current Destination 

The current destination is a range of racks and slots for a given command. For example, the following command sends the command <L1 command> to all IRUs in racks 2, 3, 4, and 7:

L2> r 2-4,7 <L1 command> 

This is a one-time destination.

Command Interpretation

Some L2 commands are the same as the L1 commands. In many cases, this is intentional because the L2 provides sequencing that is necessary for a command to function correctly.

When L1 and L2 commands are similar, you can ensure that the L1 version of a command is sent to the IRUs in the current destination by preceding <L1 command> with the l1 command:

L2> r 2-4,7 l1 <L1 command> 

This is a one-time destination.

Viewing Information, Warnings, and Error Messages

All information, warnings, and error messages generated by any of the system controllers are in the following form:

001c01 ERROR: invalid arguments for `ver' command, try “help ver”

The general format includes an IRU identification and the type of message, followed by the message. A message may be the result of an invalid command, as shown in the example, or the result of tasks running on the L1, such as the environmental monitor.

Each L1 has a log of local events. Use the L1 log command to view events on any of the L1s.

Powering On, Powering Off, and Resetting the System From the L2

You can power on and power off the system with the power command. This command is interpreted by the L2, because the IRUs must be powered on in a specific order.

L2> power up 
L2>

The power command may require several seconds to several minutes to complete. In the example above, all racks and slots in the default destination are affected. Any errors or warnings are reported as described above in “Viewing Information, Warnings, and Error Messages”.

To power on or power off a specific IRU, specify a current destination:

L2> r 2 s 5 power up 
L2>

To power on or power off all IRUs in a partition, enter the following:

L2> partition <partition number> <power up or power down>

To reset the system, enter the following:

L2> reset
L2>

This command restarts the system by resetting all registers to their default settings and rebooting the system controllers. Resetting a running system will cause the operating system to reboot and all data in memory will be lost.

Console Mode from the L2 

In console mode, all output from the system is visible and all input is directed to the system.

To enter console mode from L2, press Ctrl+D at the L2 prompt and observe the response:

L2> Ctrl+D 
entering system console mode (001c01 console0),
<CTRL_T> to escape to L2
.
<system output appears here>
.

To return to L2 mode from console mode, press Ctrl+T:

Ctrl+T 
escaping to L2 system controller 
L2>

At this point, you can enter any L2 or L1 command. When the command completes, the L2 returns to console mode:

Re-entering system console mode (001c01 console0),
<CTRL_T> to escape to L2

To permanently engage the L2 mode, press Ctrl+T and then enter the l2 command:

Ctrl+T 
escaping to L2 system controller
L2> l2 
L2 command processor engaged, <CTRL_D> for console mode.
L2>

Console Selection

When in console mode, the L2 communicates with the IRU set with the select command to be the system console or global master. All input from the console is directed to that IRU. You can set and view the system console with the select command.

The L2 chooses an IRU as the default console in the following order of priority:

  • The IRU in the lowest numbered rack and slot, which has previously produced console output.

  • The IRU in the lowest numbered rack and slot.
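The priority order above can be expressed as a small selection function. The sketch below assumes that "lowest numbered rack and slot" means ordering by rack first and then by slot; the Iru type and choose_default_console are hypothetical names, not part of the L2 software.

```python
from typing import Iterable, NamedTuple, Optional


class Iru(NamedTuple):
    rack: int
    slot: int
    has_console_output: bool  # True if it has previously produced output


def choose_default_console(irus: Iterable[Iru]) -> Optional[Iru]:
    """Pick the default console IRU using the two-step priority order."""
    ordered = sorted(irus, key=lambda iru: (iru.rack, iru.slot))
    if not ordered:
        return None
    # First priority: lowest rack/slot that has produced console output.
    for iru in ordered:
        if iru.has_console_output:
            return iru
    # Otherwise: lowest rack/slot overall.
    return ordered[0]


if __name__ == "__main__":
    system = [Iru(1, 11, False), Iru(2, 1, True), Iru(1, 1, False)]
    print(choose_default_console(system))
```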

The select command by itself shows the current console mode settings:

L2> select 
known system consoles (non-partitioned)

	001c01-L2 detected

current system console

console input: 001c01 CPU 0A
console output: not filtered

The following are ten common subchannels associated with console communications:

  • Subchannel 0A specifies Blade 0, CPU A.

  • Subchannel 0C specifies Blade 0, CPU C.

  • Subchannel 1A specifies Blade 1, CPU A.

  • Subchannel 1C specifies Blade 1, CPU C.

  • Subchannel 2A specifies Blade 2, CPU A.

  • Subchannel 2C specifies Blade 2, CPU C.

  • Subchannel 3A specifies Blade 3, CPU A.

  • Subchannel 3C specifies Blade 3, CPU C.

  • Subchannel console0 specifies the Blade 0 console subchannel.

  • Subchannel console1 specifies the Blade 1 console subchannel.

The select command output: “console input: 001c01 console0” shows that the L2 will send console input to IRU 001c01 blade 0 and the console subchannel will be used.

To change the IRU that will be the system console, use the select <rack>.<slot> command, where <rack> is the rack and <slot> is the slot where the IRU is located:

L2> select 1.1 
console input: 001c01 console
console output: not filtered
console detection: L2 detected

To change the subchannel used on the selected IRU, use the select subchannel <0A|0C|1A|1C> command. (Use the select subchannel console command to select the current console as the subchannel of the IRU to be the system console.) For example, to select blade 1, CPU A as the subchannel of the IRU to be the system console, enter the following:

L2> select subchannel 1A
console input: 001c01 console CPU1A
console output: not filtered

During the boot process on a system with multiple CPUs, there is a window of time in which the CPUs are all producing output. This can result in a somewhat jumbled output from the L2. However, you can filter console output so that the L2 will show output from only the IRU chosen to receive console input. You can turn on filtering with the select filter on command and turn off filtering with the select filter off command.

If the IRU chosen to receive console input is not responding when you attempt to communicate with it, a time-out condition results:

L2> Ctrl+D 
entering console mode 001c01 CPU1A, <CTRL_T> to escape to L2

no response from 001c01 Junk bus CPU1A system not responding
no response from 001c01 Junk bus CPU1A system not responding

When this time-out condition occurs, either the IRU is hung or the subchannel is not correct.

L1 Mode From L2

In L1 mode, the prompt from a single L1 is visible, and all input is directed to that L1 command processor.

To enter L1 mode, enter the rack and a slot followed by l1:

L2> r 2 s 1 l1 

An alternate method is:

L2> 2.1 l1 
entering L1 mode 001c01, <CTRL-T> to escape to L2
001c01-L1>

To return to L2 mode, press Ctrl+T:

001c01-L1> Ctrl+T 
escaping to L2 system controller, <CTRL-T> to send escape to L1
L2>

At this point, you can enter any L2 command. Once the command is executed, the L2 returns to L1 mode:

re-entering L1 mode 001c01, <CTRL-T> to escape to L2
001c01-L1>

To permanently engage the L2 mode, press Ctrl+T and enter the l2 command:

001c01-L1> Ctrl+T 
escaping to L2 system controller, <CTRL-T> to send escape to L1
L2> l2 
L2 command processor engaged, <CTRL-T> for console mode.

L2>

Upgrading L1 Firmware

The L1 firmware is currently distributed as part of the snxsc_firmware package. To determine which version of the package is installed on your system console, enter the following command:

$> rpm -q snxsc_firmware 

If the package is installed, the full package name (including the revision) is returned:

snxsc_firmware-1.18.3-1 

The L1 firmware binary and the utilities used to update it are stored in /usr/cpu/firmware/sysco.

Note that an Ethernet connection (LAN console option) is required to execute the commands described in this section. See “Console Hardware Requirements” and Figure 2-1 for descriptions of the hardware connections.

The L1 firmware consists of three parts:

  • Boot image

  • A image

  • B image

At boot time, the boot image validates the A and B images and, unless instructed otherwise, executes the newer of the two. Because the L1 is running one of the two images, the image not in use is the one that is overwritten when the firmware is upgraded. After any L1 firmware update, you must reboot the L1, either by power-cycling the IRU or by using the L1 command reboot_l1.
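The A/B image scheme can be modeled in a few lines to show why an upgrade never overwrites the running image. This is a conceptual sketch, not the boot image's actual logic: the Image type, the comparable build stamp used to decide which image is newer, and the handling of an invalid image are all assumptions.

```python
from typing import NamedTuple, Optional, Tuple


class Image(NamedTuple):
    name: str                    # "A" or "B"
    valid: bool                  # result of the boot-time validation
    build: Tuple[int, int, int]  # assumed build stamp (year, month, day)


def image_to_run(a: Image, b: Image) -> Optional[Image]:
    """Return the newer of the valid images, or None if neither is valid."""
    valid = [image for image in (a, b) if image.valid]
    if not valid:
        return None
    return max(valid, key=lambda image: image.build)


def image_to_overwrite(a: Image, b: Image) -> Optional[Image]:
    """Return the image not in use, which an upgrade would replace."""
    running = image_to_run(a, b)
    if running is None:
        return None
    return b if running.name == a.name else a


if __name__ == "__main__":
    image_a = Image("A", True, (2005, 11, 24))
    image_b = Image("B", True, (2005, 6, 1))
    print(image_to_run(image_a, image_b).name)        # A
    print(image_to_overwrite(image_a, image_b).name)  # B
```

After the upgrade and reboot, the freshly written image is the newer one, so the roles of A and B swap for the next update.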

Typically, you will upgrade the firmware through the network connection from the LAN console workstation to the L1:

$> /usr/cpu/firmware/sysco/flashsc --l2 10.1.1.1 -p /usr/cpu/firmware/sysco/l1.bin all 

This updates all the IRUs in the system. The -p option requests that the PROMs be flashed in parallel.

You can update individual IRUs by replacing all with a rack and slot number:

$> /usr/cpu/firmware/sysco/flashsc --l2 10.1.1.1 /usr/cpu/firmware/sysco/l1.bin 1.19 

This updates only the IRU in rack 1, slot 19.