This chapter explains how to operate your new system in the following sections:
Before operating your system, familiarize yourself with the safety information in the following sections:
| Caution: Observe all ESD precautions. Failure to do so can result in damage to the equipment. |
Wear an SGI-approved wrist strap when you handle an ESD-sensitive device to eliminate possible ESD damage to equipment. Connect the wrist strap cord directly to earth ground.
| Warning: Before operating or servicing any part of this product, read the “Safety Information” in Appendix B. |
| Warning: Keep fingers and conductive tools away from high-voltage areas. Failure to follow these precautions will result in serious injury or death. The high-voltage areas of the system are indicated with high-voltage warning labels. |
| Caution: Power off the system only after the system software has been shut down in an orderly manner. If you power off the system before you halt the operating system, data may be corrupted. |
All Altix 4700 enclosures contain an embedded microprocessor board and display assembly known as the system controller. This microprocessor runs an embedded version of the Linux operating system. The system controller runs off standby power and is running as long as the enclosure is connected to an active power source.
Two primary applications run on the system controller. The L1, or Level 1 system controller, is an application that provides control and monitoring functionality for each individual rack unit (IRU) enclosure, and communication to other L1s in adjacent enclosures connected via NUMAlink 4 cables. The L1 is always resident.
The Level 2 (L2) system controller is an application that provides control over multiple L1s and communication to other L2s. The L2 is resident when the enclosure is connected by an Ethernet connection to a Local Area Network (LAN).
The system controller network provides the following functionality:
Powering the entire system on and off.
Powering individual IRUs and Dense routers on and off.
Monitoring the environmental state of the system.
Viewing the system's status and error message information generated and displayed by the SGI system's L1 controller.
Entering L1 controller commands to monitor or change particular system functions. You can, for example, monitor the speed of fans for a particular individual rack unit (IRU) enclosure. See the SGI L1 and L2 Controller Software User's Guide for a complete list of commands.
Accessing the system OS console, which allows you to run diagnostics and boot the system.
A console is defined as a connection to the system that provides access to the system controller network. A console can be a LAN-attached personal computer, laptop or workstation (Ethernet connection) or a dumb terminal (serial connection).
The Altix 4700 series supports two types of console connections:
An Ethernet connection on the system control board of an IRU or on a Dense router: an RJ45 connection typically labeled “L2 host”
A serial connection to the serial console port on the system control board of an IRU or the serial console port on a Dense router - a DB9 connector typically labeled “Console”
The Ethernet connection is the preferred method of accessing the system console. Depending on the size of the system, one or more Ethernet connections are used.
When an enclosure is connected to the LAN, the system controller starts the Ethernet interface with either the pre-assigned IP address or an address acquired via DHCP, and then spawns the L2 application. See the section on setting up an IP address for further details. Once the system controller is connected to the LAN, it can be accessed via a simple telnet session.
The serial connection is used to communicate directly with the L1 system controller. This connection is typically used for service purposes or for system controller and system console access in small systems where an Ethernet connection is not used or available.
Once a connection to the console is established, the user is presented with either an L2 prompt (Ethernet connection), known as L2 mode, or an L1 prompt (serial connection), known as L1 mode. From either of these prompts, various system controller commands can be entered.
001c01-L1> <l1 command>
olympic-101-L2> <l2 command>
To access the system console, known as console mode, a control-d is entered at either the L1 prompt or the L2 prompt. To return to or escape back to the L1 or L2 mode, a control-t is entered. This escape is only temporary and you will be returned to console mode once the “Return/Enter” key is pressed. To re-engage L1 or L2 mode, enter control-t followed by either “l1” or “l2” depending on the original mode. System control commands are always entered in lower case unless otherwise specified.
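The mode switching described above can be pictured as a small state machine. The following Python sketch is only an illustration of the documented key sequences; the class and method names are hypothetical and are not part of any SGI software.

```python
from enum import Enum


class Mode(Enum):
    L1 = "l1"
    L2 = "l2"
    CONSOLE = "console"
    ESCAPED = "escaped"  # temporary escape from console mode


class ConsoleSession:
    """Toy model of the mode switching described above (not SGI software)."""

    def __init__(self, controller: Mode):
        if controller not in (Mode.L1, Mode.L2):
            raise ValueError("session must start in L1 or L2 mode")
        self.controller = controller  # the original controller mode
        self.mode = controller

    def ctrl_d(self) -> None:
        # control-d at the L1 or L2 prompt enters console mode.
        if self.mode in (Mode.L1, Mode.L2):
            self.mode = Mode.CONSOLE

    def ctrl_t(self) -> None:
        # control-t in console mode escapes temporarily to the controller.
        if self.mode is Mode.CONSOLE:
            self.mode = Mode.ESCAPED

    def enter(self, text: str = "") -> None:
        # During a temporary escape, typing "l1" or "l2" (matching the
        # original mode) re-engages that mode; any other line returns the
        # session to console mode once Enter is pressed.
        if self.mode is Mode.ESCAPED:
            if text.strip() == self.controller.value:
                self.mode = self.controller
            else:
                self.mode = Mode.CONSOLE
```

For example, a session that starts at an L2 prompt moves to console mode on control-d, and control-t followed by "l2" brings it back to L2 mode.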
This section explains how to power on and power off individual rack units, Dense routers, or your entire Altix 4700 system, as follows:
For servers with a system console, you can power on and power off individual IRUs and Dense routers or the entire system at the system console.
If you are using an SGIconsole, you can monitor and manage your server from a remote location. You may also monitor and manage your server with tools such as VACM, Console Manager, and PCP. For details, see the documentation for the particular tool.
The Embedded Support Partner (ESP) program enables you and your SGI system support engineer (SSE) to monitor your server remotely and resolve issues before they become problems. For details on this program, see “Using Embedded Support Partner (ESP)”.
To prepare to power on your system, follow these steps:
Check to ensure that the cabling between the rack's power distribution unit (PDU) and the wall power-plug receptacle is secure.
For each individual IRU that you want to power on, make sure that the power cables are plugged into all the IRU power supplies correctly, as shown in Figure 1-1. Setting the circuit breakers on the PDUs to the “On” position will apply power to the IRU and will start the system controller(s) in the IRUs. Note that the system controller in each IRU and Dense router stays powered on as long as there is power coming into the unit.
If you plan to power on a server that includes optional mass storage enclosures, make sure that the power switch on the rear of each PSU/cooling module (one or two per enclosure) is in the 1 (on) position.
Make sure that all PDU circuit breaker switches (see the examples in Figure 1-2 and Figure 1-3) are turned on to provide power to the server system when the system is powered on.
Figure 1-3 shows an example of the three-phase PDU.
The power-on and power-off procedure at a console varies with your server setup, as follows:
If you have a console connected to a server with a serial interface, you can toggle between L1 and console mode. This enables you to power on your server with L1 commands and view the activity by changing to the console mode.
For detailed instructions on using a system console using the L1 mode, see “Operating the L1” in Chapter 2.
If you have a system console connected to a server with an Ethernet interface, you can toggle between L2 and console mode. This enables you to power on your server with L2 controller commands and monitor the power-on activity by changing to the console mode.
See the section “Console Hardware Requirements” in Chapter 2 for additional information on optional consoles.
Commands issued at the L1 prompt typically only affect the local enclosure. The following sections describe how to power on and power off your system in L1 mode.
The L1 controller display, located on the front of each IRU, should display L1 running once the power-on procedure starts (storage modules do not have L1s). The prompt on your console screen shows the rack and slot number of the IRU to which you have connected your console.
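As an illustration of how the prompt encodes the rack and slot, the Python sketch below splits a prompt such as 001c01-L1> into its parts. The function name and field names are hypothetical, and the assumption that the letter between the rack and slot numbers identifies the enclosure type is based only on the sample prompts in this chapter.

```python
import re
from typing import NamedTuple


class PromptInfo(NamedTuple):
    rack: int        # leading three digits of the prompt
    unit_type: str   # letter between rack and slot (assumed enclosure type)
    slot: int        # two digits after the letter


# Matches prompts of the form shown in this chapter, e.g. "001c01-L1>".
PROMPT_RE = re.compile(r"^(\d{3})([a-z])(\d{2})-L1>")


def parse_l1_prompt(prompt: str) -> PromptInfo:
    """Extract rack, type letter, and slot from an L1 prompt string."""
    match = PROMPT_RE.match(prompt.strip())
    if match is None:
        raise ValueError(f"unrecognized L1 prompt: {prompt!r}")
    rack, unit_type, slot = match.groups()
    return PromptInfo(int(rack), unit_type, int(slot))
```

Calling parse_l1_prompt("001c01-L1> power up") yields rack 1 and slot 1 for the example IRU used in this section.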
| Note: If you have a problem while powering on and an error message appears on your console display, see “L1 Controller Error Messages” in Chapter 7 to learn what the error message indicates and how to resolve the problem. |
If you want to power on the IRU (001c01 in our example) indicated in the prompt, enter the following command.
001c01-L1> power up
To power off:
001c01-L1> power down
If you are attempting to power on or power off a system with multiple enclosures from the L1 prompt, you will need to prepend the command with an asterisk as follows:
001c01-L1> * power up
or
001c01-L1> * power down
(* indicates all)
| Note: If you are accessing a large system via the serial connection, the entire system may not be accessible from this point and as a result only a portion of the system may be affected. |
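If you script console input, the rule above (prepend an asterisk to address all enclosures) can be captured in a small helper. This is a minimal sketch; the function name is hypothetical, and it only builds the command text shown in this section without sending it anywhere.

```python
def l1_power_command(action: str, all_enclosures: bool = False) -> str:
    """Build the L1 power command text described in this section."""
    if action not in ("up", "down"):
        raise ValueError("action must be 'up' or 'down'")
    command = f"power {action}"
    # A leading asterisk applies the command to all enclosures.
    return f"* {command}" if all_enclosures else command
```

The commands are produced in lower case, as the system controller expects.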
From the L1 prompt, display the system configuration information by entering the following command:
001c01-L1> config
:0 001c01 L0C
:2 002r01 L1H
:8 002r03 R0H
:4 002r05 L3H
:6 002r07 R2H
001c01-L1>
In L1 mode, you can obtain only limited information about the system configuration. An IRU has information about its internal blades and, if other IRUs are attached to it through NUMAlink, about those IRUs as well.
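If you capture the config output in a log, a short script can turn it into structured records. The Python sketch below assumes one entry per line in the form ":<number> <location> <tag>", as in the example above. This chapter does not define the meaning of the columns, so the field names used here are hypothetical.

```python
from typing import List, NamedTuple


class ConfigEntry(NamedTuple):
    index: int      # number after the leading colon (meaning not defined here)
    location: str   # rack/slot identifier such as "001c01"
    tag: str        # trailing three-character field such as "L0C"


def parse_l1_config(output: str) -> List[ConfigEntry]:
    """Parse captured 'config' output into a list of entries."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Entry lines start with ":<number>"; prompts and blanks are skipped.
        if (len(fields) == 3 and fields[0].startswith(":")
                and fields[0][1:].isdigit()):
            entries.append(
                ConfigEntry(int(fields[0][1:]), fields[1], fields[2]))
    return entries


SAMPLE = """001c01-L1> config
:0 001c01 L0C
:2 002r01 L1H
:8 002r03 R0H
:4 002r05 L3H
:6 002r07 R2H
001c01-L1>"""
```

Running parse_l1_config(SAMPLE) returns five entries, one for the local IRU and four for the attached units listed in the example.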
| Note: Verify that the power LED on the L1 displays turns on and lights green and that your controllers display that the system is powered on for each segment of the procedure, which indicates that the power-on procedure is proceeding properly. If you have a problem while powering on and an error message appears on the L1 controller, or the system console, see your online log files and the information in “L1 Controller Error Messages” in Chapter 7 to learn what the error message indicates and how to best resolve the problem. |
Embedded Support Partner (ESP) automatically detects system conditions that indicate potential future problems and then notifies the appropriate personnel. This enables you and SGI system support engineers (SSEs) to proactively support systems and resolve issues before they develop into actual failures.
ESP enables users to monitor one or more systems at a site from a local or remote connection. ESP can perform the following functions:
Monitor the system configuration, events, performance, and availability.
Notify SSEs when specific events occur.
Generate reports.
ESP also supports the following:
Remote support and on-site troubleshooting.
System group management, which enables you to manage an entire group of systems from a single system.
For additional information on this and other available monitoring services, see the section “SGI Electronic Support” in Chapter 7.
You can monitor your Altix 4700 server from the following sources:
On the L1 controller's display at the front of each IRU as shown in Figure 1-4, you can monitor system operational status. For example, you can monitor error messages that warn of power or temperature values that are out of tolerance.
A DVI or VGA monitor with USB keyboard/mouse can be connected for basic monitoring and administration of the Altix system. See the section “2D Graphics Video Interface” for more information. SLES 10 is required for this card. RHEL 5 is not supported.
You can connect an optional console via an Ethernet port adapter. You will need to connect either a local or remote workstation/PC to monitor the servers via Ethernet.
These console connections enable you to view the status and error messages generated by the L1/L2 controllers in your Altix 4700 rack. You can also use these consoles to input L1/L2 commands to manage and monitor your system. See the section “Console Hardware Requirements” in Chapter 2, for additional information on the L1/L2 console.
If your system was ordered in January 2007 or later, it may come equipped with an optional 2D graphics board interface (not supported with SLES9 or RHEL5 systems). This low-profile PCI interface card is installed in the IA2 (base I/O) blade in the Altix system. One 2D card is supported per system or partition; no audio function is supported. The 2D video card has the following features:
64 MB DDR graphics memory
DMS-59 display connector for two display support (no synchronization between monitors)
Simultaneous display to DVI and VGA monitors using an adapter
Analog resolution up to 2048 x 1536 per display
DVI resolution up to 1600 x 1200 per display
| Note: A single USB keyboard/mouse is supported by the IA2 (base I/O) blade. |
The 2D video interface card can be used for all basic interaction with your Altix system. You should note that it does not provide an interface to the L1/L2 controller and L1/L2 administrative commands cannot be issued through the 2D video interface.
Besides adding a network-connected system console or basic VGA monitor, you can add or replace the following hardware items on your Altix 4700 series server:
Peripheral component interface (PCI) cards into your system I/O blades.
Disk drives in your IA blade (base I/O).
The sections that follow discuss these activities in more detail.
| Warning: You can add or replace only the items listed in this section. For your safety and for the protection of your server system, contact your SGI system support engineer (SSE) to install any hardware items not listed in this section. |
| Warning: Before installing, operating, or servicing any part of this product, read the “Safety Information” in Appendix B. |
The PCI, PCI-X, and PCIe based I/O subsystems are industry-standard interfaces for connecting peripherals, storage, and graphics to a processor blade. These are the primary configurable I/O system interfaces for the Altix 4700 series systems. They include:
The IA blade (base I/O) used in the Altix 4700 system includes two half-height PCI/PCI-X slots. The two option cards install adjacent to the system disk(s) and DVD drive that are also resident in the IA blade. Note that the newer version of the IA blade that supports RAID 1 and DVD-R/W became available for order in January 2007. This enhanced version of the base I/O is generally referred to as the IA2.
The optional three-slot PCI/PCI-X double-wide blade holds three sled-mounted PCI/PCI-X cards for easy insertion and extraction from the system.
The optional single-wide PCI express (PCIe) blade supports two optional PCI express cards.
The optional double-wide PCI express (PCIe) and PCI-X blade supports two optional PCI express slots and two PCI/PCI-X slots. This blade is commonly used for installing a pair of SGI 3D PCIe option cards. Note that the width of the 3D cards precludes use of the PCI-X slots.
Other optional I/O blades provide additional PCI and PCI-X support for the Altix 4700 server system. Not all blades may be available with your system configuration.
Check with your SGI sales or service representative for availability. See Chapter 6, “Maintenance and Upgrade Procedures” for detailed instructions on installing or removing PCI cards in the blades. Information on installing or removing optional 3D graphics cards is included in the user's guide shipped with the 3D graphics option.
The IA-blade (base I/O blade) within the IRU supports the system boot functions and contains one or two low-profile disk drives. See “Installing or Replacing a Disk Drive in the IA/IA2 Blade” in Chapter 6 for detailed instructions on installing or removing disk drives.
System disk drive(s) ordered starting in March 2007 may come configured as either a non-RAID (JBOD) or a RAID 1 (disk mirror) system. Note that while RAID 0 (striping) is supported by the hardware, SGI recommends using either JBOD or RAID 1 to ensure maximum reliability and data retention.
The type of disk configuration is ordered by the customer and configured at the SGI factory. If a requirement exists to reconfigure the system disk(s), use the following information to make the changes needed. The lsiutil configuration software was the only supported method for modifying the disks at the time this document was published. Use of other software interfaces has not been verified and cannot be supported.
| Important: The example process that follows presumes there are two disks identical in size and speed installed in your Altix system IA2 (base I/O) blade. Also, note that this example of re-imaging a disk is done via network access. |
The EFI version of the lsiutil file resides in /boot/efi.
Launch the configuration utility with the command:
fs0:\> lsiutil64.efi
You should see an interactive interface similar to the following:
LSI Logic MPT Configuration Utility, Version 1.xx, November 7, 200X
1 MPT Port found
Port Name Chip Vendor/Type/Rev MPT Rev Firmware Rev IOC
1. 01/00/01/0 LSI Logic SAS1068 B0 105 01100000 0
The SAS1068 is the chip that controls the device that the lsiutil utility was loaded from.
Select a device: [1-1 or 0 to quit] 1
1. Identify firmware, BIOS, and/or FCode
2. Download firmware (update the FLASH)
4. Download/erase BIOS and/or FCode (update the FLASH)
8. Scan for devices
10. Change IOC settings (interrupt coalescing)
13. Change SAS IO Unit settings
16. Display attached devices
18. Change WWID
20. Diagnostics
21. RAID actions
22. Reset bus
23. Reset target
24. Clear ACA
39. Force firmware download boot
45. Concatenate SAS firmware and NVDATA files
60. Show non-default settings
61. Restore default settings
97. Reset SAS phy
98. Reset SAS link
99. Reset port
e Enable expert mode in menus
p Enable paged mode in menus
w Enable logging
Main menu, select an option: [1-99 or e/p/w or 0 to quit] 8
| Important: After this command is entered, the load device cannot be accessed again to open files until lsiutil has been exited and restarted. |
SAS1068's links are 3.0 G, 3.0 G, down, down, down, down, down, down
B___T___L Type Vendor Product Rev SASAddress PhyNum
0 0 0 Disk SGI ST3146854SS X422 5000c5000002cb45 0
0 1 0 Disk SGI ST3146854SS X422 5000c500000121cd 1
Main menu, select an option: [1-99 or e/p/w or 0 to quit] 0
Port Name Chip Vendor/Type/Rev MPT Rev Firmware Rev IOC
1. 01/00/01/0 LSI Logic SAS1068 B0 105 01100000 0
Select a device: [1-1 or 0 to quit] 1
1. Identify firmware, BIOS, and/or FCode
2. Download firmware (update the FLASH)
4. Download/erase BIOS and/or FCode (update the FLASH)
8. Scan for devices
10. Change IOC settings (interrupt coalescing)
13. Change SAS IO Unit settings
16. Display attached devices
18. Change WWID
20. Diagnostics
21. RAID actions
22. Reset bus
23. Reset target
24. Clear ACA
39. Force firmware download boot
45. Concatenate SAS firmware and NVDATA files
60. Show non-default settings
61. Restore default settings
97. Reset SAS phy
98. Reset SAS link
99. Reset port
e Enable expert mode in menus
p Enable paged mode in menus
w Enable logging
Main menu, select an option: [1-99 or e/p/w or 0 to quit] 21
1. Show volumes
2. Show physical disks
3. Get volume state
23. Replace physical disk
30. Create volume
31. Delete volume
32. Change volume settings
50. Create hot spare
99. Reset port
e Enable expert mode in menus
p Enable paged mode in menus
w Enable logging
RAID actions menu, select an option: [1-99 or e/p/w or 0 to quit] 1
0 volumes are active, 0 physical disks are active
RAID actions menu, select an option: [1-99 or e/p/w or 0 to quit] 30
B___T___L Type Vendor Product Rev Disk Blocks Disk MB
1. 0 0 0 Disk SGI ST3146854SS X422 286749488 140014
2. 0 1 0 Disk SGI ST3146854SS X422 286749488 140014
To create a volume, select two or more of the available targets
Select a target: [1-2 or RETURN to quit] 1
Select a target: [1-2 or RETURN to quit] 2
2 physical disks were created
Select volume type: [0=Mirroring, 1=Striping, default is 0]
Select volume size: [1 to 139898 MB, default is 139898]
Enable write caching: [Yes or No, default is No]
Zero the first and last blocks of the volume? [Yes or No, default is No]
Volume was created
RAID actions menu, select an option: [1-99 or e/p/w or 0 to quit] 1
1 volume is active, 2 physical disks are active
Volume 0 is Bus 0 Target 0, Type IM (Integrated Mirroring)
Volume State: degraded, enabled , resync in progress
Volume Settings: write caching disabled , auto configure
Volume Size 139898 MB, Stripe Size 0 KB, 2 Members
Primary is PhysDisk 0 (Bus 0 Target 56)
Secondary is PhysDisk 1 (Bus 0 Target 1)
RAID actions menu, select an option: [1-99 or e/p/w or 0 to quit] 99
Resetting port...
RAID actions menu, select an option: [1-99 or e/p/w or 0 to quit] 0
Main menu, select an option: [1-99 or e/p/w or 0 to quit] 0
Port Name Chip Vendor/Type/Rev MPT Rev Firmware Rev IOC
1. 01/00/01/0 LSI Logic SAS1068 B0 105 01100000 0
Select a device: [1-1 or 0 to quit] 0
Sas(Pun0,Lun0) LSILOGICLogical Volume 3000
fs0:\> reset
.
.
.
EFI Boot Manager ver 1.10 [14.62]
Partition 0: Enabled Disabled
CBlades 2 Nodes 3 0
RBlades 0 CPUs 4 0
IOBlades 1 Mem(GB) 6 0
Please select a boot option
EFI Shell
>>> netboot
Boot option maintenance menu
Loading: netboot
Running LoadFile()
CLIENT MAC ADDR: 08 00 69 14 E3 0C
CLIENT IP: 137.38.82.91 MASK: 255.255.255.0 DHCP IP: 137.38.228.4
GATEWAY IP: 137.38.82.252 137.38.82.254
TSize.Running LoadFile()
Starting: netboot
ELILO
.
.
.
.
Quietly populating /dev/sda2 and /dev/sda1 with ia64-propack51-sles10-Jan19
# reboot....
| Note: The initial resynchronization of the mirrored disk may take 45 minutes or longer, depending on the size of the disk. |
RAID actions menu, select an option: [1-99 or e/p/w or 0 to quit] 1
1 volume is active, 2 physical disks are active
Volume 0 is Bus 0 Target 0, Type IM (Integrated Mirroring)
Volume State: optimal, enabled
Volume Settings: write caching disabled, auto configure
Volume draws from Hot Spare Pools: 0
Volume Size 139898 MB, Stripe Size 0 KB, 2 Members
Primary is PhysDisk 1 (Bus 0 Target 56)
Secondary is PhysDisk 0 (Bus 0 Target 1)
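If you save the "Show volumes" output to a file, you can check whether the mirror has finished resynchronizing by reading the "Volume State" line. The Python sketch below is an illustration only. It assumes the output format shown in the examples in this section, and the function names are hypothetical rather than part of lsiutil.

```python
from typing import List


def volume_state_flags(output: str) -> List[str]:
    """Return the comma-separated flags from the first 'Volume State:' line."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Volume State:"):
            flags = line.split(":", 1)[1].split(",")
            return [flag.strip() for flag in flags]
    return []


def mirror_is_optimal(output: str) -> bool:
    """True when the volume reports 'optimal' and no resync is running."""
    flags = volume_state_flags(output)
    return "optimal" in flags and "resync in progress" not in flags


# The two volume states that appear in this section's examples.
RESYNCING = "Volume State: degraded, enabled , resync in progress"
FINISHED = "Volume State: optimal, enabled"
```

With these samples, mirror_is_optimal returns False for the state reported right after volume creation and True for the state reported after resynchronization completes.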
For information on the two RAID 1 disks, use the lsscsi command as shown in the following example:
# lsscsi
[1:0:0:0] disk SGI ST3146854SS X422 - /dev/sg2
[1:0:1:0] disk SGI ST3146854SS X422 - /dev/sg3
[1:1:0:0] disk LSILOGIC Logical Volume 3000 /dev/sdc /dev/sg4
This is an example of what you'll see on SLES10 or RHEL5 Linux operating systems. Disk [1:0:0:0] and [1:0:1:0] are the individual drives in the RAID 1, but you cannot access them through the disk driver. You access [1:1:0:0], which is /dev/sdc in this case. Caution must be exercised when entering Linux SCSI generic (sg) commands via the sg devices.
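To pick out the device that is actually usable through the disk driver, you can filter the listing for entries that have a block device. The Python sketch below assumes the column layout of the sample above, where the last two columns are the block device (or "-" when there is none) and the sg device. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (SCSI address, block device or None, sg device)
Device = Tuple[str, Optional[str], str]


def parse_lsscsi(output: str) -> List[Device]:
    """Parse lsscsi lines that end with a block device and an sg device."""
    devices = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("["):
            continue  # skip blank or unrelated lines
        address = fields[0].strip("[]")
        block_dev, sg_dev = fields[-2], fields[-1]
        # A "-" in the block device column means no disk-driver access.
        devices.append(
            (address, None if block_dev == "-" else block_dev, sg_dev))
    return devices


def block_devices(output: str) -> List[str]:
    """Return only the devices that are accessible through the disk driver."""
    return [dev for _, dev, _ in parse_lsscsi(output) if dev is not None]


SAMPLE = """[1:0:0:0] disk SGI ST3146854SS X422 - /dev/sg2
[1:0:1:0] disk SGI ST3146854SS X422 - /dev/sg3
[1:1:0:0] disk LSILOGIC Logical Volume 3000 /dev/sdc /dev/sg4"""
```

For the sample listing, block_devices(SAMPLE) returns only /dev/sdc, the logical RAID 1 volume; the two member drives appear with sg devices only.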