This chapter provides an overview of the physical and architectural aspects of your SGI Altix 4700 series system. The major components of the Altix 4700 series systems are described and illustrated.
The Altix 4700 series is a family of multiprocessor distributed shared memory (DSM) computer systems that initially scale from 16 to 1,024 Intel 64-bit processor cores (8 to 512 sockets) as a cache-coherent single-system image (SSI). Future releases will scale to larger processor counts for SSI applications. Contact your SGI sales or service representative for the most current information on this topic.
In a DSM system, each processor board contains memory that it shares with the other processors in the system. Because the DSM system is modular, it combines the advantages of low entry-level cost with global scalability in processors, memory, and I/O. You can install and operate the Altix 4700 series system in your lab or server room. Each 42U SGI rack holds from one to four 10U-high enclosures, each of which supports up to ten compute/memory and I/O submodules known as “blades.” These blades are single printed circuit boards (PCBs) with ASICs, processors, memory components, and I/O chipsets mounted on a mechanical carrier. The blades slide directly in and out of the Altix 4700 IRU enclosures.
This chapter consists of the following sections:
Figure 3-1 shows the front views of a multiple-rack Altix 4700 system.
The basic enclosure within the Altix 4700 system is the 10U-high (17.5 inch or 44.45 cm) “individual rack unit” (IRU). The IRU enclosure contains up to ten single-wide blades, or two double-wide blades and eight single-wide blades. Each IRU comes with four high-speed routers. The routers connect to the installed blades via a backplane. Each router also has two ports that are brought out to external NUMAlink 4 connectors. The 42U rack for this server houses all IRU enclosures, option modules, and other components, with up to 64 processor sockets (128 processor cores) in a single rack. The Altix 4700 server system can expand up to 1,024 Intel 64-bit processor cores; a minimum of one IA blade (base I/O) is required for every 1,024 processor cores.
Figure 3-2 shows an example configuration of a single-rack Altix 4700 server.
The system requires a minimum of one 42U-tall rack with one single-phase power distribution unit (PDU) per IRU installed in the rack. Each single-phase PDU has five outlets (four of which are required to support the four power supplies in each IRU).
The three-phase PDU has 18 outlets (16 connections are required to support the four IRUs that may be installed in the rack).
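The outlet arithmetic above can be checked with a short sketch. The constants come from the text; the function names are hypothetical and exist only for illustration.

```python
# Outlet counts taken from the text above; helper names are illustrative only.
POWER_SUPPLIES_PER_IRU = 4   # four power supplies in each IRU
SINGLE_PHASE_OUTLETS = 5     # outlets on one single-phase PDU
THREE_PHASE_OUTLETS = 18     # outlets on one three-phase PDU
MAX_IRUS_PER_RACK = 4        # up to four IRUs per 42U rack


def outlets_required(irus: int) -> int:
    """Number of PDU outlets needed to power the given number of IRUs."""
    if not 1 <= irus <= MAX_IRUS_PER_RACK:
        raise ValueError("a rack holds from one to four IRUs")
    return irus * POWER_SUPPLIES_PER_IRU


def single_phase_pdus(irus: int) -> int:
    """One single-phase PDU is required per IRU installed in the rack."""
    outlets_required(irus)  # reuse the range check
    return irus


def three_phase_spare_outlets(irus: int) -> int:
    """Outlets left unused on one three-phase PDU."""
    return THREE_PHASE_OUTLETS - outlets_required(irus)
```

For a fully populated rack, `outlets_required(4)` gives the 16 connections quoted above, which leaves two spare outlets on the three-phase PDU.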
You can also add PCI expansion blades or RAID and non-RAID disk storage to your server system.
Figure 3-3 shows an individual blade, IRU, and rack.
The Altix 4700 computer system is based on a distributed shared memory (DSM) architecture. The system uses a global-address-space, cache-coherent multiprocessor that scales up to 64 Intel 64-bit processor sockets (128 processor cores) in a single rack. Because it is modular, the DSM combines the advantages of lower entry cost with the ability to scale processors, memory, and I/O independently to a maximum of 512 processor sockets (1,024 processor cores) on a single-system image (SSI). Larger SSI configurations may be offered in the future; contact your SGI sales or service representative for information.
The system architecture for the Altix 4700 system is a fourth-generation NUMAflex DSM architecture known as NUMAlink 4. In the NUMAlink 4 architecture, all processors and memory are tied together into a single logical system with special crossbar switches (routers). This combination of processors, memory, and crossbar switches constitutes the interconnect fabric called NUMAlink. There are four router switches in each 10U IRU enclosure.
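As a rough sizing aid, the scaling limits quoted above can be expressed as a small calculation. This is a minimal sketch: the two-cores-per-socket ratio is inferred from the 512-socket/1,024-core figures in the text, the function names are hypothetical, and the rack count is a lower bound that ignores I/O racks and other options.

```python
import math

CORES_PER_SOCKET = 2        # inferred from 512 sockets = 1,024 cores
MAX_SOCKETS_PER_RACK = 64   # per-rack limit quoted in the text
MAX_SSI_SOCKETS = 512       # single-system-image limit quoted in the text


def cores(sockets: int) -> int:
    """Processor cores provided by the given number of sockets."""
    return sockets * CORES_PER_SOCKET


def min_compute_racks(sockets: int) -> int:
    """Smallest number of racks that can hold the given socket count."""
    if not 1 <= sockets <= MAX_SSI_SOCKETS:
        raise ValueError("an SSI scales to at most 512 sockets")
    return math.ceil(sockets / MAX_SOCKETS_PER_RACK)
```

Under these assumptions, a maximum 512-socket SSI needs at least `min_compute_racks(512)` = 8 racks.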
The basic expansion building block for the NUMAlink interconnect is the processor node; each processor node consists of a Super-Hub (SHub) ASIC and one or two 64-bit processors with three levels of on-chip secondary caches. The Intel 64-bit processors are connected to the SHub ASIC via a single high-speed front side bus.
The SHub ASIC is the heart of the processor and memory node blade technology. This specialized ASIC acts as a crossbar between the processors, local SDRAM memory, and the network interface. The SHub ASIC memory interface enables any processor in the system to access the memory of all processors in the system.
Another component of the NUMAlink 4 architecture is the router ASIC. The router ASIC is a custom-designed 8-port crossbar ASIC. Using the router ASICs with a highly specialized backplane or NUMAlink 4 cables provides a high-bandwidth, extremely low-latency interconnect between all processor, I/O, and other option blades within the system.
Figure 3-4 shows a functional block diagram of the Altix 4700 series system IRU processor blades, NUMAlink interface, and other major components.
The main features of the Altix 4700 series server systems are introduced in the following sections:
The Altix 4700 series systems are modular systems. The components are primarily housed in building blocks referred to as individual rack units (IRUs). Additional option mass storage may be added to the rack along with additional IRUs. You can add different types of blade options to a system IRU to achieve the desired system configuration. You can easily configure systems around processing capability, I/O capability, memory size, or storage capacity. You place individual blades that create the basic functionality (compute/memory, I/O, and power) into custom 19-inch racks. The air-cooled IRU enclosure system has redundant, hot-swap fans and redundant, hot-swap power supplies at the IRU level.
In the Altix 4700 series server, memory is physically distributed both within and among the IRU enclosures (compute/memory/I/O blades); however, it is accessible to and shared by all NUMAlinked devices within the single-system image. That is, all NUMAlinked components that share a single Linux operating system also operate on and share the memory “fabric” of the system.
Note the following sub-types of memory within a system:
If a processor accesses memory that is connected to the same SHub ASIC on a compute node blade, the memory is referred to as the node's local memory.
If processors access memory located in other blade nodes within the IRU (or in other NUMAlinked IRUs), the memory is referred to as remote memory.
The total memory within the NUMAlinked system is referred to as global memory.
Memory latency is the amount of time required for a processor to retrieve data from memory. Memory latency is lowest when a processor accesses local memory.
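The memory sub-types described above can be sketched as a small classification. This is an illustration only: it assumes one SHub ASIC per blade node, and the `NodeAddress` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class NodeAddress:
    """Hypothetical identifier for a blade node: IRU number plus blade slot."""
    iru: int
    blade: int


def classify_access(cpu_node: NodeAddress, memory_node: NodeAddress) -> str:
    """Local if the memory hangs off the same SHub (same node), else remote."""
    return "local" if cpu_node == memory_node else "remote"


def global_memory_gb(memory_per_node_gb: Mapping[NodeAddress, int]) -> int:
    """Global memory is the total across every NUMAlinked node."""
    return sum(memory_per_node_gb.values())
```

An access classified as "local" is the lowest-latency case; an access to another blade in the same IRU or to another IRU is "remote" in both cases.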
Like memory in the DSM architecture, I/O devices are distributed among the blade nodes within the IRUs (each base I/O blade node has two NUMAlink ports) and are accessible by all compute nodes within the SSI through the NUMAlink interconnect fabric.
As the name implies, the cache-coherent non-uniform memory access (ccNUMA) architecture has two parts, cache coherency and non-uniform memory access, which are discussed in the sections that follow.
The Altix 4700 server series uses caches to reduce memory latency. Although data exists in local or remote memory, copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent.
To keep the copies consistent, the ccNUMA architecture uses a directory-based coherence protocol. In a directory-based coherence protocol, each block of memory (128 bytes) has an entry in a table that is referred to as a directory. Like the blocks of memory that they represent, the directories are distributed among the compute/memory blade nodes. A block of memory is also referred to as a cache line.
Each directory entry indicates the state of the memory block that it represents. For example, when the block is not cached, it is in an unowned state. When only one processor has a copy of the memory block, it is in an exclusive state. And when more than one processor has a copy of the block, it is in a shared state; a bit vector indicates which caches contain a copy.
When a processor modifies a block of data, the processors that have the same block of data in their caches must be notified of the modification. The Altix 4700 server series uses an invalidation method to maintain cache coherence. The invalidation method purges all unmodified copies of the block of data, and the processor that wants to modify the block receives exclusive ownership of the block.
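The directory states and the invalidation method described above can be illustrated with a deliberately simplified model. This is a sketch, not the SHub implementation: a Python set stands in for the bit vector, the class and method names are hypothetical, and write-back of modified data and any transient protocol states are ignored.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DirectoryEntry:
    """Directory entry for one 128-byte memory block (cache line)."""
    holders: Set[int] = field(default_factory=set)  # stands in for the bit vector

    @property
    def state(self) -> str:
        if not self.holders:
            return "unowned"        # block is not cached anywhere
        if len(self.holders) == 1:
            return "exclusive"      # exactly one processor has a copy
        return "shared"             # more than one processor has a copy

    def read(self, cpu: int) -> None:
        """A processor caches a copy of the block."""
        self.holders.add(cpu)

    def write(self, cpu: int) -> Set[int]:
        """Invalidate all other copies and give the writer exclusive ownership.

        Returns the processors whose cached copies were purged.
        """
        invalidated = self.holders - {cpu}
        self.holders = {cpu}
        return invalidated
```

For example, after processors 0 and 3 read a block, the entry is shared; when processor 3 then writes, the copy held by processor 0 is invalidated and the entry returns to the exclusive state.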
The Altix 4700 server series components have the following features to increase the reliability, availability, and serviceability (RAS) of the systems.
Power and cooling:
IRU power supplies are redundant and can be hot-swapped under most circumstances. Note that this might not be possible in a “fully loaded” system. If all the blade positions are filled, be sure to consult with a service technician before removing a power supply while the system is running.
IRUs have overcurrent protection at the blade and power supply level.
Fans are redundant and can be hot-swapped.
Fans run at multiple speeds in the IRUs. Speed increases automatically when temperature increases or when a single fan fails.
System monitoring:
System controllers monitor the internal power and temperature of the IRUs, and can automatically shut down an enclosure to prevent overheating.
Memory, L2 cache, L3 cache, and all external bus transfers are protected by single-bit error correction and double-bit error detection (SECDED).
The NUMAlink interconnect network is protected by cyclic redundancy check (CRC).
The L1 primary cache is protected by parity.
Each IRU and each blade/node installed has failure LEDs that indicate the failed part; LEDs are readable at the front of the IRU or via the system controllers.
Systems support the optional Embedded Support Partner (ESP), a tool that monitors the system; when a condition occurs that may cause a failure, ESP notifies the appropriate SGI personnel.
Systems support remote console and maintenance activities.
Power-on and boot:
Automatic testing occurs after you power on the system. (These power-on self-tests, or POSTs, are also referred to as power-on diagnostics, or PODs.)
Processors and memory are automatically de-allocated when a self-test failure occurs.
Boot times are minimized.
Further RAS features:
Systems have a local field-replaceable unit (FRU) analyzer.
All system faults are logged in files.
Memory can be scrubbed using error-correcting code (ECC) when a single-bit error occurs.
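The SECDED protection mentioned under system monitoring can be illustrated with a toy code. The sketch below uses an extended Hamming (8,4) code on a 4-bit value purely as an assumption for illustration; this manual does not specify which code the hardware uses, and real memory words use much wider codes. All names are hypothetical.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)   # Hamming positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # Hamming positions that carry check bits
# Index 0 of the code word holds an overall parity bit.


def _group_parity(code: List[int], p: int) -> int:
    """Parity of every position 1..7 whose index has bit p set."""
    return sum(code[i] for i in range(1, 8) if i & p) % 2


def encode(nibble: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit SECDED code word."""
    code = [0] * 8
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in PARITY_POSITIONS:
        code[p] = _group_parity(code, p)   # code[p] is still 0 here
    code[0] = sum(code[1:]) % 2            # overall parity over positions 1..7
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, value); status is ok, corrected, or uncorrectable."""
    syndrome = sum(p for p in PARITY_POSITIONS if _group_parity(code, p))
    overall_ok = sum(code) % 2 == 0
    if syndrome and overall_ok:
        return "uncorrectable", None       # double-bit error: detect only
    fixed = list(code)
    status = "ok"
    if not overall_ok:
        fixed[syndrome] ^= 1               # single-bit error: flip it back
        status = "corrected"
    nibble = sum(fixed[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))
    return status, nibble
```

In this model, scrubbing corresponds to writing `encode(value)` back to memory whenever `decode` reports "corrected", so that a later second error in the same word does not become uncorrectable.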
The Altix 4700 series system features the following major components:
42U rack. This is a custom rack used for both the compute and I/O rack in the Altix 4700 system. Up to 4 IRUs can be installed in each rack. There is also a 2U space reserved at the top for an optional router.
Individual Rack Unit (IRU). This enclosure contains the compute/memory blades, IA2 blade (base I/O), standard routers and optional I/O blades for the Altix 4700. The enclosure is 10U high. Figure 3-5 shows the Altix 4700 IRU system components.
Compute/Memory blade. Holds up to two IA-64 processor sockets and 4, 8, or 12 memory DIMMs.
Memory-only blade. This blade acts as a memory expansion node with no processor compute circuitry included on the blade. This blade holds 4, 8, or 12 memory DIMMs.
Double-wide PCI-X expansion blade. Supports three PCI/PCI-X 133 MHz 64-bit option cards. This three-slot blade features card carriers that allow you to slide PCI/PCI-X boards directly in and out of the unit.
PCIe expansion blade. This single-wide PCIe blade supports two PCI Express option cards.
Double-wide PCIe/PCI-X expansion blade. This blade supports two PCI Express option cards and two PCI/PCI-X option cards.
Note: PCIe options may be limited; check with your SGI sales or support representative.
IA2 blade (Base I/O blade). Double-wide I/O blade that supports all base system I/O functions including one or two disk drives with optional RAID 0 or 1, a DVD-R/W drive, two low-profile PCI card slots, two Ethernet connectors, one SAS/SATAII port, and four USB ports. The enhanced IA blade (also known as the IA2 blade) began shipping as original equipment in most systems ordered after January 2007. You must have the IA2 version of the blade and SLES 10 or later to use RAID 1 mirroring on your system disk pair.
Note: While the system I/O blade is capable of RAID 0 support, SGI does not recommend that the end user configure it in this way. RAID 0 offers no fault tolerance for the system disks and decreases overall system reliability. In a RAID 0 configuration, failure of either system disk results in data being lost on both disks, which causes system failure. The Altix 4700 ships with RAID 1 functionality (disk mirroring) configured if the option is ordered.
Bays in the racks are numbered using standard units. A standard unit (SU) or unit (U) is equal to 1.75 inches (4.445 cm). Because IRUs occupy multiple standard units, IRU locations within a rack are identified by the bottom unit (U) in which the IRU resides. For example, in a 42U rack, an IRU positioned in U01 through U10 is identified as U01.
Each rack is numbered with a three-digit number sequentially beginning with 001. A rack contains IRU enclosures, optional mass storage enclosures, optional router bricks and potentially other options. In a single compute rack system, the rack number is always 001.
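The numbering conventions above can be sketched as follows. The constants come from the text; the helper names are hypothetical, and the sketch does not model the 2U space reserved at the top of the rack for an optional router.

```python
U_INCHES = 1.75       # one standard unit (U) in inches
IRU_HEIGHT_U = 10     # an IRU is 10U high
RACK_HEIGHT_U = 42    # a standard rack is 42U tall


def iru_location(bottom_u: int) -> str:
    """Label an IRU by the bottom unit it occupies, for example U01."""
    top_u = bottom_u + IRU_HEIGHT_U - 1
    if bottom_u < 1 or top_u > RACK_HEIGHT_U:
        raise ValueError("IRU does not fit in a 42U rack at that position")
    return f"U{bottom_u:02d}"


def rack_label(rack_number: int) -> str:
    """Three-digit sequential rack number beginning with 001."""
    if not 1 <= rack_number <= 999:
        raise ValueError("rack numbers run from 001 to 999")
    return f"{rack_number:03d}"


def height_inches(units: int) -> float:
    """Convert a height in standard units to inches."""
    return units * U_INCHES
```

An IRU occupying U01 through U10 in the first rack is therefore identified as `iru_location(1)` in rack `rack_label(1)`, and `height_inches(10)` reproduces the 17.5-inch IRU height.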
Availability of optional components for the SGI Altix 4700 systems may vary based on new product introductions or end-of-life components. Some options are listed in this manual; others may be introduced after this document goes to production status. Check with your SGI sales or support representative for the most current information on available product options not discussed in this manual.