This chapter provides an overview of the physical and architectural aspects of your SGI Altix 450 system. The major components of the Altix 450 series systems are described and illustrated.
The Altix 450 series is a family of multiprocessor distributed shared memory (DSM) computer systems that initially scale from 2 to 76 Intel 64-bit processor cores as a cache-coherent single system image (SSI). Contact your SGI sales or service representative for the most current information on this topic.
In a DSM system, each processor board contains memory that it shares with the other processors in the system. Because the DSM system is modular, it combines the advantages of low entry-level cost with global scalability in processors, memory, and I/O. You can install and operate the Altix 450 series system in your lab or server room. Each 20U SGI rack holds from one to four 5U-high enclosures that support up to five compute/memory and I/O submodules known as “blades.” These blades are single printed circuit boards (PCBs) with ASICs, processors, memory components, and I/O chipsets mounted on a mechanical carrier. The blades slide directly into and out of the Altix 450 IRU enclosures.
This chapter consists of the following sections:
Figure 3-1 shows the front view of a 20U rack used to house the Altix 450 system.
The basic enclosure within the Altix 450 system is the 5U-high (8.68 inches or 22 cm) “individual rack unit” (IRU). The IRU enclosure houses a maximum of four single-wide blades and one double-wide blade. Each IRU comes with two built-in high-speed routers. The routers connect to the installed blades via a backplane. Each router has two ports that are brought out to external NUMAlink 4 connectors. The 20U or 42U rack for this server houses all IRU enclosures, option modules, and other components, and supports up to a 76-core configuration (38 processor sockets) in a single rack. The Altix 450 server system needs a minimum of one IA/IA2 blade (base I/O).
The rack system requires a minimum of one 20U-high rack with one single-phase power distribution unit (PDU) installed in the rack. Each single-phase PDU has 5 outlets (to support two IRUs).
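The configuration figures above can be turned into a small planning calculation. The following is a minimal sketch, not an SGI tool: the function names are hypothetical, the two-cores-per-socket figure is inferred from the quoted 76 cores across 38 sockets, and the PDU count considers only IRUs (any storage or option modules would need their own allowance).

```python
import math

# Figures taken from the text above; the names are illustrative only.
CORES_PER_SOCKET = 2   # inferred: 76 cores across 38 processor sockets
IRUS_PER_PDU = 2       # each single-phase PDU supports two IRUs
MAX_IRUS_20U = 4       # a 20U rack holds from one to four IRUs


def pdus_needed(iru_count: int) -> int:
    """Return the number of single-phase PDUs needed for the given IRUs."""
    if iru_count < 1:
        raise ValueError("a system needs at least one IRU")
    return math.ceil(iru_count / IRUS_PER_PDU)


def core_count(sockets: int) -> int:
    """Return the processor core count for a number of populated sockets."""
    if sockets < 0:
        raise ValueError("socket count cannot be negative")
    return sockets * CORES_PER_SOCKET


if __name__ == "__main__":
    print(pdus_needed(MAX_IRUS_20U))  # PDUs for a full 20U rack
    print(core_count(38))             # cores in the largest single-rack system
```

Under these assumptions, a fully populated 20U rack with four IRUs needs two PDUs.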
Figure 3-2 shows an example configuration of a 42U Altix 450 system “tall rack.”
You can also add additional PCI expansion blades or RAID and non-RAID disk storage to your server system.
Figure 3-3 shows an individual blade, IRU, and rack.
The Altix 450 computer system is based on a distributed shared memory (DSM) architecture. The system uses a global-address-space, cache-coherent multiprocessor that scales up to 76 Intel 64-bit processor cores in a single rack. Because it is modular, the DSM combines the advantages of lower entry cost with the ability to scale processors, memory, and I/O independently.
The system architecture for the Altix 450 system is a fourth-generation NUMAflex DSM architecture known as NUMAlink 4. In the NUMAlink 4 architecture, all processors and memory are tied together into a single logical system with special crossbar switches (routers). This combination of processors, memory, and crossbar switches constitutes the interconnect fabric called NUMAlink. There are two internal router switches in each 5U IRU enclosure.
The basic expansion building block for the NUMAlink interconnect is the processor node; each processor node consists of a Super-Hub (SHub) ASIC and one or two 64-bit processors with three levels of on-chip secondary caches. The Intel 64-bit processors are connected to the SHub ASIC via a single high-speed front side bus.
The SHub ASIC is the heart of the processor and memory node blade technology. This specialized ASIC acts as a crossbar between the processors, local SDRAM memory, and the network interface. The SHub ASIC memory interface enables any processor in the system to access the memory of all processors in the system.
Another component of the NUMAlink 4 architecture is the router ASIC. The router ASIC is a custom designed 8-port crossbar ASIC. Using the router ASICs with a highly specialized backplane and NUMAlink 4 cables provides a high-bandwidth, extremely low-latency interconnect between all processor, I/O, and other option blades within the system.
Figure 3-4 shows a functional block diagram of the Altix 450 series IRU components.
The main features of the Altix 450 series server systems are introduced in the following sections:
The Altix 450 series systems are modular systems. The components are primarily housed in building blocks referred to as individual rack units (IRUs). Additional optional mass storage may be added to the rack along with additional IRUs. You can add different types of blade options to a system IRU to achieve the desired system configuration. You can easily configure systems around processing capability, I/O capability, memory size, or storage capacity. You place individual blades that create the basic functionality (compute/memory, I/O, and power) into IRUs. The air-cooled IRU enclosure system has redundant, hot-swap fans and redundant, hot-swap power supplies at the IRU level.
In the Altix 450 system, memory is physically distributed both within and among the IRU enclosures (compute/memory/I/O blades); however, it is accessible to and shared by all NUMAlinked devices within the single-system image. That is, all NUMAlinked components running under a single Linux operating system share the memory “fabric” of the system.
Note the following sub-types of memory within a system:
If a processor accesses memory that is connected to the same SHub ASIC on a compute node blade, the memory is referred to as the node's local memory.
If processors access memory located in other blade nodes within the IRU (or in other NUMAlinked IRUs), the memory is referred to as remote memory.
The total memory within the NUMAlinked system is referred to as global memory.
Memory latency is the amount of time required for a processor to retrieve data from memory. Memory latency is lowest when a processor accesses local memory.
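The distinction between local, remote, and global memory can be illustrated with a short model. This is a minimal sketch under stated assumptions: the `Node` type, its `iru` and `blade` fields, and both function names are hypothetical and do not correspond to any SGI software interface. A node here stands for one SHub ASIC with its attached memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A compute/memory node: one SHub ASIC and its attached memory."""
    iru: int    # hypothetical index of the IRU enclosure
    blade: int  # hypothetical blade position within that IRU


def classify_access(cpu_node: Node, memory_node: Node) -> str:
    """Return 'local' if the memory hangs off the processor's own SHub,
    otherwise 'remote' (another blade in the same IRU or another IRU)."""
    return "local" if cpu_node == memory_node else "remote"


def global_memory_gb(memory_per_node_gb: dict) -> int:
    """Global memory is the total across every NUMAlinked node."""
    return sum(memory_per_node_gb.values())


if __name__ == "__main__":
    a, b, c = Node(0, 0), Node(0, 1), Node(1, 0)
    print(classify_access(a, a))  # same SHub
    print(classify_access(a, b))  # other blade, same IRU
    print(classify_access(a, c))  # other IRU
    print(global_memory_gb({a: 8, b: 8, c: 16}))
```

The model captures only the classification; it says nothing about actual latency values, which the text does not quantify.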
As with memory in the DSM architecture, I/O devices are distributed among the blade nodes within the IRUs (each base I/O blade node has two NUMAlink ports) and are accessible by all compute nodes within the SSI through the NUMAlink interconnect fabric.
As the name implies, the cache-coherent non-uniform memory access (ccNUMA) architecture has two parts: cache coherency and non-uniform memory access. These are discussed in the sections that follow.
The Altix 450 systems use caches to reduce memory latency. Although data exists in local or remote memory, copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent.
To keep the copies consistent, the ccNUMA architecture uses a directory-based coherence protocol. In a directory-based coherence protocol, each block of memory (128 bytes) has an entry in a table that is referred to as a directory. Like the blocks of memory that they represent, the directories are distributed among the compute/memory blade nodes. A block of memory is also referred to as a cache line.
Each directory entry indicates the state of the memory block that it represents. For example, when the block is not cached, it is in an unowned state. When only one processor has a copy of the memory block, it is in an exclusive state. And when more than one processor has a copy of the block, it is in a shared state; a bit vector indicates which caches contain a copy.
When a processor modifies a block of data, the processors that have the same block of data in their caches must be notified of the modification. The Altix 450 server series uses an invalidation method to maintain cache coherence. The invalidation method purges all unmodified copies of the block of data, and the processor that wants to modify the block receives exclusive ownership of the block.
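The directory states and the invalidation method described above can be sketched in a few lines. This is an illustrative model only, assuming one directory entry per 128-byte block and a sharer bit vector stored as an integer; the class and method names are hypothetical, and the actual SHub protocol has details that this chapter does not describe.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UNOWNED = "unowned"      # no cache holds the block
    EXCLUSIVE = "exclusive"  # exactly one cache holds the block
    SHARED = "shared"        # more than one cache holds the block


@dataclass
class DirectoryEntry:
    """Directory entry for one memory block (cache line)."""
    sharers: int = 0  # bit vector: bit i set means cache i holds a copy

    @property
    def state(self) -> State:
        holders = bin(self.sharers).count("1")
        if holders == 0:
            return State.UNOWNED
        if holders == 1:
            return State.EXCLUSIVE
        return State.SHARED

    def read(self, cache_id: int) -> None:
        """Record that cache_id now holds a copy of the block."""
        self.sharers |= 1 << cache_id

    def write(self, cache_id: int) -> list:
        """Invalidate every other copy and grant exclusive ownership.

        Returns the ids of the caches whose copies were purged.
        """
        others = self.sharers & ~(1 << cache_id)
        purged = [i for i in range(others.bit_length()) if (others >> i) & 1]
        self.sharers = 1 << cache_id
        return purged


if __name__ == "__main__":
    entry = DirectoryEntry()
    entry.read(0)
    entry.read(3)
    print(entry.state)     # two caches hold the block
    print(entry.write(3))  # cache 3 writes; other copies are purged
    print(entry.state)     # cache 3 is now the only holder
```

Keeping the sharers in a bit vector means a write needs to notify only the caches whose bits are set, rather than broadcasting to every processor in the system.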
The Altix 450 server series components have the following features to increase the reliability, availability, and serviceability (RAS) of the systems.
Power and cooling:
IRU power supplies are redundant and can be hot-swapped under most circumstances. Note that this might not be possible in a “fully loaded” system. If all the blade positions are filled, be sure to consult with a service technician before removing a power supply while the system is running.
IRUs have overcurrent protection at the blade and power supply level.
Fans are redundant and can be hot-swapped.
Fans run at multiple speeds in the IRUs. Speed increases automatically when temperature increases or when a single fan fails.
System monitoring:
System controllers monitor the internal power and temperature of the IRUs, and can automatically shut down an enclosure to prevent overheating.
Memory, L2 cache, L3 cache, and all external bus transfers are protected by single-bit error correction and double-bit error detection (SECDED).
The NUMAlink interconnect network is protected by cyclic redundancy check (CRC).
The L1 primary cache is protected by parity.
Each IRU and each blade/node installed has failure LEDs that indicate the failed part; LEDs are readable at the front of the IRU or via the system controllers.
Systems support the optional Embedded Support Partner (ESP), a tool that monitors the system; when a condition occurs that may cause a failure, ESP notifies the appropriate SGI personnel.
Systems support remote console and maintenance activities.
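The SECDED protection mentioned in the list above can be illustrated with a toy code. The sketch below uses an extended Hamming (8,4) code, which protects a 4-bit value with three Hamming parity bits plus one overall parity bit. It is an assumption-laden teaching example: the function names are hypothetical, and the code width and bit layout used by the Altix 450 memory and buses are not described in this chapter and are certainly wider than this.

```python
def encode(nibble: int) -> list:
    """Encode a 4-bit value as 8 bits: index 0 is the overall parity bit,
    indexes 1, 2, and 4 are Hamming parity bits, the rest carry data."""
    if not 0 <= nibble <= 15:
        raise ValueError("value must fit in 4 bits")
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code


def decode(received: list) -> tuple:
    """Return (value, status). Corrects any single-bit error and reports
    any double-bit error as uncorrectable (value is then None)."""
    code = list(received)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    parity_error = sum(code) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Odd number of flipped bits, assumed to be one: the syndrome is
        # its position (0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity but a nonzero syndrome: two bits flipped.
        return None, "double-bit error detected"
    value = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return value, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1             # inject a single-bit error
    print(decode(word))
    word[2] ^= 1             # inject a second error
    print(decode(word))
```

A plain parity bit, as used on the L1 cache, corresponds to keeping only the overall parity bit: it detects a single-bit error but cannot locate or correct it.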
Power-on and boot:
Automatic testing occurs after you power on the system. (These power-on self-tests, or POSTs, are also referred to as power-on diagnostics, or PODs.)
Processors and memory are automatically de-allocated when a self-test failure occurs.
Boot times are minimized.
Further RAS features:
Optional RAID 1 in base I/O (IA2 blade). Check with your SGI sales or service representative for availability. Note that RAID 0 (striping), while supported by the IA2 hardware, is not recommended as a RAS option.
Systems have a local field-replaceable unit (FRU) analyzer.
All system faults are logged in files.
Memory can be scrubbed using error-correcting code (ECC) when a single-bit error occurs.
The Altix 450 series system features the following major components:
20U rack. The “short” rack is a custom rack used with the Altix 450 system. It holds up to 4 IRUs or a combination of IRUs and option modules (such as mass storage).
42U rack. The “tall” rack is a custom rack used for both the compute and I/O rack in the Altix 450 system. Up to 8 IRUs can be installed in each rack. There is also a 2U space reserved at the top for an option module.
Individual Rack Unit (IRU). This enclosure contains the compute/memory blades, IA blade (base I/O), standard routers and optional I/O blades for the Altix 450 series systems. The enclosure is 5U high.
Compute/Memory blade. Holds up to two IA-64 processor sockets and 4, 8 or 12 memory DIMMs.
Memory-only blade. This blade acts as a memory expansion node with no processor compute circuitry included on the blade. This blade holds 4, 8, or 12 memory DIMMs.
Single-wide PCI-X expansion blade. This two-slot PCI/PCI-X option blade supplies an individual PCI bus for each option card.
Double-wide PCI-X expansion blade. Supports three PCI/PCI-X 133 MHz 64-bit option cards. This three-slot blade features card carriers that allow you to slide PCI/PCI-X boards directly into and out of the unit.
Single-wide PCIe expansion blade. The single-wide PCIe blade supports one or two PCI Express option cards. Note that when the blade is used with a 3D PCIe graphics card, only one card can be inserted due to space constraints.
Double-wide PCIe/PCI-X expansion blade. This blade supports two PCI Express option cards and two PCI/PCI-X option cards. Note that when this blade is used with two 3D PCIe graphics cards the PCI-X slots cannot be used due to space constraints.
IA/IA2 blade (Base I/O blade). Double-wide I/O blade that supports all base system I/O functions including one or two disk drives, a DVD drive, two low-profile PCI-X card slots, two Ethernet ports, one SAS/SATAII port, and four USB ports. Optional RAID 1 functionality and DVD-R/W are available with the IA2 version of the base I/O blade only.
Figure 3-5 shows the Altix 450 IRU system components.
Bays in the racks are numbered using standard units. A standard unit (SU) or unit (U) is equal to 1.75 inches (4.445 cm). Because IRUs occupy multiple standard units, IRU locations within a rack are identified by the bottom unit (U) in which the IRU resides. For example, in a 42U rack, an IRU positioned in U01 through U05 is identified as U01.
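The bay-numbering rule can be expressed as a short calculation. The following is a minimal sketch with hypothetical function names; it assumes a 5U IRU and treats the rack height as a parameter. The height it computes is the nominal bay height, which is slightly more than the 8.68 inches quoted for the IRU enclosure itself.

```python
INCHES_PER_U = 1.75  # one standard unit, as defined in the text
IRU_HEIGHT_U = 5     # an IRU occupies five standard units


def iru_label(bottom_unit: int) -> str:
    """An IRU is identified by the lowest unit it occupies, e.g. 'U01'."""
    return f"U{bottom_unit:02d}"


def occupied_units(bottom_unit: int, height_u: int = IRU_HEIGHT_U,
                   rack_height_u: int = 42) -> range:
    """Return the units an enclosure occupies, checking that it fits."""
    top_unit = bottom_unit + height_u - 1
    if bottom_unit < 1 or top_unit > rack_height_u:
        raise ValueError("enclosure does not fit in the rack")
    return range(bottom_unit, top_unit + 1)


def units_to_inches(units: int) -> float:
    """Convert a height in standard units to inches."""
    return units * INCHES_PER_U


if __name__ == "__main__":
    print(iru_label(1))                # label for an IRU in U01 through U05
    print(list(occupied_units(1)))     # units that IRU occupies
    print(units_to_inches(IRU_HEIGHT_U))
```

With this rule, a second IRU stacked directly above the first would occupy U06 through U10 and be identified as U06.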
Each rack is numbered with a three-digit number sequentially beginning with 001. A rack contains IRU enclosures, optional mass storage enclosures, and potentially other options. In a single compute rack system, the rack number is always 001.
Availability of optional components for the SGI Altix 450 systems may vary based on new product introductions or end-of-life components. Some options are listed in this manual; others may be introduced after this document goes to production status. Check with your SGI sales or support representative for the most current information on available product options not discussed in this manual.