AlphaServer 4000/4100
Service Manual

Order Number: EK–4100A–SV. B01

This manual is for anyone who services an AlphaServer 4000/4100 pedestal or cabinet system. It includes troubleshooting information, configuration rules, and instructions for removal and replacement of field-replaceable units (FRUs).

Digital Equipment Corporation
Maynard, Massachusetts
Contents

Preface ........................................................................................................................................... xi

Chapter 1      System Overview

1.1  AlphaServer 4100 System Drawer (BA30A).........................................................1-2
1.2  AlphaServer 4000 System Drawer (BA30C)...........................................................1-4
1.3  AlphaServer 4000 System Drawer (BA30B)...........................................................1-6
1.4  Cabinet System .........................................................................................................1-8
1.5  Pedestal System .........................................................................................................1-10
1.6  Control Panel and Drives .........................................................................................1-12
1.7  System Consoles ........................................................................................................1-14
1.8  System Architecture ..................................................................................................1-16
1.9  System Motherboard ................................................................................................1-18
1.10  CPU Types ..............................................................................................................1-20
1.11  Memory Modules .....................................................................................................1-22
1.12  Memory Addressing ...............................................................................................1-24
1.13  System Bus ................................................................................................................1-26
1.14  System Bus to PCI Bus Bridge Module ...................................................................1-28
1.15  PCI I/O Subsystem ..................................................................................................1-30
1.16  Server Control Module ............................................................................................1-32
1.17  Power Control Module ............................................................................................1-34
1.18  Power Supply ..........................................................................................................1-36

Chapter 2      Power-Up

2.1  Control Panel .............................................................................................................2-2
2.2  Power-Up Sequence ..................................................................................................2-4
2.3  SROM Power-Up Test Flow ....................................................................................2-8
2.4  SROM Errors Reported ..........................................................................................2-11
2.5  XSROM Power-Up Test Flow ................................................................................2-12
2.6  XSROM Errors Reported .......................................................................................2-15
2.7  Console Power-Up Tests ........................................................................................2-16
5.4.1 System Bus ECC Error ................................................................. 5-39
5.4.2 System Bus Nonexistent Address Error ..................................... 5-40
5.4.3 System Bus Address Parity Error ............................................... 5-41
5.4.4 PIO Buffer Overflow Error (PIO_OVFL) ................................... 5-42
5.4.5 Page Table Entry Invalid Error .................................................. 5-43
5.4.6 PCI Master Abort ....................................................................... 5-43
5.4.7 PCI System Error ...................................................................... 5-43
5.4.8 PCI Parity Error ........................................................................ 5-43
5.4.9 Broken Memory ....................................................................... 5-44
5.4.10 Command Codes ...................................................................... 5-46
5.4.11 Node IDs .................................................................................. 5-47
5.5 Double Error Halts and Machine Checks While in PAL Mode .......... 5-48
5.5.1 PALcode Overview .................................................................... 5-48
5.5.2 Double Error Halt ...................................................................... 5-49
5.5.3 Machine Checks While in PAL ................................................... 5-49

Chapter 6 Error Registers

6.1 External Interface Status Register - EI_STAT .................................. 6-2
6.1.1 External Interface Address Register - EI_ADDR .......................... 6-6
6.1.2 MC Error Information Register 0 ................................................ 6-8
6.1.3 MC Error Information Register 1 ................................................. 6-9
6.1.4 CAP Error Register .................................................................. 6-11
6.1.5 PCI Error Status Register 1 .......................................................... 6-14

Chapter 7 Removal and Replacement

7.1 System Safety .................................................................................. 7-1
7.2 FRU List .......................................................................................... 7-2
7.3 4100 Power System FRUs ................................................................. 7-8
7.4 4000 Power System FRUs ................................................................. 7-10
7.5 System Drawer Exposure (Cabinet) ................................................... 7-12
7.5.1 Cabinet Drawer Exposure (H9A10-EB & -EC) ............................. 7-12
7.5.2 Cabinet Drawer Exposure (H9A10-EL & -EM) ............................. 7-14
7.6 System Drawer Exposure (Pedestal) ................................................ 7-16
7.7 CPU Removal and Replacement ..................................................... 7-18
7.8 CPU Fan Removal and Replacement .............................................. 7-20
7.9 Memory Removal and Replacement ................................................ 7-22
7.10 Power Control Module Removal and Replacement ....................... 7-24
7.11 System Bus to PCI Bus Bridge (B3040-AA) Removal and Replacement 7-26
7.12 System Bus to PCI Bus Bridge (B3040-AB) Removal and Replacement 7-28
7.13 System Motherboard (4100 & early 4000) Removal and Replacement 7-30
7.14 System Motherboard (Later 4000) Removal and Replacement ........ 7-32
7.15 PCI/EISA Motherboard (B3050) Removal and Replacement .......... 7-34
7.16 PCI Motherboard (B3051) Removal and Replacement.........................7-36
7.17 Server Control Module Removal and Replacement............................7-38
7.18 PCI/EISA Option Removal and Replacement .....................................7-40
7.19 Power Supply Removal and Replacement ........................................7-42
7.20 Power Harness (4100 & early 4000) Removal and Replacement ..........7-44
7.21 Power Harness (Later 4000) Removal and Replacement ....................7-46
7.22 System Drawer Fan Removal and Replacement ...................................7-48
7.23 Cover Interlock (4100 & early 4000) Removal and Replacement .........7-50
7.24 Cover Interlock (Later 4000) Removal and Replacement ....................7-52
7.25 Operator Control Panel Removal and Replacement (Cabinet) ...............7-54
7.26 Operator Control Panel Removal and Replacement (Pedestal) ..............7-56
7.27 Floppy Removal and Replacement .....................................................7-58
7.28 CD-ROM Removal and Replacement ................................................7-60
7.29 Cabinet Fan Tray Removal and Replacement ......................................7-62
7.30 Cabinet Fan Tray Power Supply Removal and Replacement .................7-64
7.31 Cabinet Fan Tray Fan Removal and Replacement ................................7-66
7.32 Cabinet Fan Tray Fan Fail Detect Module Removal and Replacement ....7-68
7.33 StorageWorks Shelf Removal and Replacement ..................................7-70

Appendix A  Running Utilities

A.1 Running Utilities from a Graphics Monitor ...........................................A-2
A.2 Running Utilities from a Serial Terminal .............................................A-3
A.3 Running ECU ....................................................................................A-4
A.4 Running RAID Standalone Configuration Utility ....................................A-5
A.5 Updating Firmware with LFU ..............................................................A-6
A.5.1 Updating Firmware from the Internal CD-ROM ................................A-8
A.5.2 Updating Firmware from the Internal Floppy Disk — Creating the Diskettes .................................................................A-12
A.5.3 Updating Firmware from the Internal Floppy Disk — Performing the Update .................................................................A-14
A.5.4 Updating Firmware from a Network Device ..................................A-18
A.5.5 LFU Commands ..........................................................................A-22
A.6 Updating Firmware from AlphaBIOS ..................................................A-25
A.7 Upgrading AlphaBIOS ......................................................................A-26

Appendix B  SRM Console Commands and Environment Variables

B.1 Summary of SRM Console Commands ................................................B-2
B.2 Summary of SRM Environment Variables .......................................B-4
B.3 Recording Environment Variables ....................................................B-6

Appendix C  Operating the System Remotely

C.1 RCM Console Overview ............................................................. C-1
C.1.1 Modem Usage ....................................................................... C-2
C.1.2 Entering and Leaving Command Mode ................................. C-5
C.1.3 RCM Commands ................................................................. C-6
C.1.4 Dial-Out Alerts ................................................................. C-15
C.1.5 Resetting the RCM to Factory Defaults ............................... C-18
C.1.6 Troubleshooting Guide ....................................................... C-19
C.1.7 Modem Dialog Details ...................................................... C-22

Index

Examples

2-1  SROM Errors Reported at Power-Up ........................................ 2-11
2-2  XSROM Errors Reported at Power-Up .................................... 2-15
2-3  Power-Up Display .................................................................... 2-20
3-1  Test Command Syntax ........................................................... 3-12
3-2  Sample Test Command .......................................................... 3-13
3-3  Sample Test Memory Command ............................................. 3-15
3-4  Sample Test Command for PCI ............................................. 3-17
5-1  MCHK 670 ........................................................................... 5-12
5-2  MCHK 670 CPU and IOD-Detected Failure ......................... 5-17
5-3  MCHK 670 Read Dirty Failure ............................................. 5-23
5-4  MCHK 660 IOD Detected Failure .......................................... 5-29
5-5  MCHK 630 Correctable CPU Error ...................................... 5-34
5-6  MCHK 620 Correctable Error ............................................. 5-36
5-7  INFO 3 Command .............................................................. 5-50
5-8  INFO 5 Command .............................................................. 5-52
5-9  INFO 8 Command .............................................................. 5-54
A-1 Starting LFU from the SRM Console ................................. A-6
A-2 Updating Firmware from the Internal CD-ROM ................. A-8
A-3 Creating Update Diskettes on an OpenVMS System .......... A-13
A-4 Updating Firmware from the Internal Floppy Disk ............ A-14
A-5 Selecting AS4X00FW to Update Firmware from the Internal Floppy Disk A-17
A-6 Updating Firmware from a Network Device .................... A-18
C-1 Sample Remote Dial-In Dialog ........................................... C-4
C-2 Entering and Leaving RCM Command Mode ..................... C-5
C-3 Configuring the Modem for Dial-Out Alerts ..................... C-15
C-4 Typical RCM Dial-Out Command ................................. C-15

Figures

1-1 Components of the BA30A System Drawer ......................................................1-2
1-2 Cover Interlock Circuit (BA30A) .....................................................................1-3
1-3 Components of the BA30C System Drawer ..................................................1-4
1-4 Cover Interlock Circuit (BA30C) .....................................................................1-5
1-5 Components of the BA30B System Drawer ..................................................1-6
1-6 Cover Interlock Circuit (BA30B) .....................................................................1-7
1-7 AlphaServer 4000/4100 Cabinet System ..................................................................1-8
1-8 Cabinet Fan Tray ............................................................................................1-9
1-9 Pedestal System Front .....................................................................................1-10
1-10 Pedestal System Rear ....................................................................................1-11
1-11 Control Panel Assembly ...............................................................................1-12
1-12 Architecture Diagram ..................................................................................1-16
1-13 System Motherboard Module Locations ....................................................1-18
1-14 CPU Module Layout and Placement ............................................................1-20
1-15 Memory Module Layout and Placement ......................................................1-22
1-16 How Memory Addressing Is Calculated ......................................................1-24
1-17 System Bus Block Diagram and Slot Designation .......................................1-26
1-18 Bridge Module ..............................................................................................1-28
1-19 PCI Block Diagram .......................................................................................1-30
1-20 Server Control Module ...............................................................................1-32
1-21 Power Control Module ...............................................................................1-34
1-22 Location of Power Supply ..........................................................................1-36
2-1 Control Panel and LCD Display ......................................................................2-2
2-2 Power-Up Flow ...............................................................................................2-4
2-3 Contents of FEPROMs ..................................................................................2-5
2-4 Console Code Critical Path ............................................................................2-6
2-5 SROM Power-Up Test Flow ...........................................................................2-8
2-6 XSROM Power-Up Flowchart .........................................................................2-12
2-7 Console Device Determination Flowchart ....................................................2-18
3-1 CPU and Bridge Module LEDs ......................................................................3-2
3-2 Cabinet Power and Fan LEDs ........................................................................3-4
3-4 PCM LEDs .....................................................................................................3-8
3-5 I²C Bus Block Diagram ..................................................................................3-10
4-1 Power Supply Outputs ....................................................................................4-2
4-2 Power Control Module ...................................................................................4-4
4-3 Power Circuit Diagram ...................................................................................4-6
4-4 Power Up/Down Sequence Flowchart ...........................................................4-8
4-5 Simple -EB & -EC Cabinet Power Configuration ...........................................4-10
4-6 Worst-Case -EB & -EC Cabinet Power Configuration ......................................4-11
4-7 -EL & -EM Single Drawer Cabinet Power Configuration ................................4-12
4-8 -EL & -EM Three Drawer Cabinet Power Configuration ................................4-13
4-9 Pedestal Power Distribution (N.A. and Japan) ...............................................4-14
Pedestal Power Distribution (Europe and AP) ......................................... 4-15
Error Detector Placement ........................................................................ 5-2
System Drawer FRU Locations ............................................................. 7-2
Location of 4100 Power System FRUs ............................................... 7-8
Location of 4000 Power System FRUs .................................................. 7-10
Exposing System Drawer (H9A10-EB & -EC Cabinet) ......................... 7-12
Exposing System Drawer (H9A10-EL & -EM Cabinet) ......................... 7-14
Exposing System Drawer (Pedestal) ..................................................... 7-16
Removing CPU Module ........................................................................ 7-18
Removing CPU Fan ................................................................................ 7-20
Removing Memory Module ................................................................. 7-22
Removing Power Control Module ......................................................... 7-24
Removing System Bus to PCI/EISA Bus Bridge Module (B3040-AA) .... 7-26
Removing System Bus to PCI Bus Bridge Module (B3040-AB) ............... 7-28
Removing System Motherboard (4100 & early 4000) ......................... 7-30
Removing System Motherboard (Later 4000) ....................................... 7-32
Replacing PCI/EISA Motherboard ...................................................... 7-34
Replacing PCI Motherboard ............................................................... 7-36
Removing Server Control Module ....................................................... 7-38
Removing PCI/EISA Option ............................................................... 7-40
Removing Power Supply ...................................................................... 7-42
Removing Power Harness (4100 & early 4000) .................................... 7-44
Removing Power Harness (Later 4000) ................................................ 7-46
Removing System Drawer Fan ............................................................. 7-48
Removing Cover Interlocks (4100 & early 4000) .................................. 7-50
Removing Cover Interlocks (Later 4000) ............................................. 7-52
Removing OCP (Cabinet) .................................................................... 7-54
Removing OCP (Pedestal) .................................................................... 7-56
Removing Floppy Drive ....................................................................... 7-58
Removing CD-ROM ............................................................................ 7-60
Removing Cabinet Fan Tray ............................................................... 7-62
Removing Cabinet Fan Tray Power Supply ........................................ 7-64
Removing Cabinet Fan Tray Fan ....................................................... 7-66
Removing Fan Tray Fan Fail Detect Module ....................................... 7-68
Removing StorageWorks Shelf ............................................................. 7-70
Running a Utility from a Graphics Monitor .......................................... A-2
Starting LFU from the AlphaBIOS Console ......................................... A-6
AlphaBIOS Setup Screen .................................................................... A-25
RCM Connections ............................................................................... C-2

Tables

1–1 PCI Motherboard Slot Numbering .................................................. 1-31
2–1 Control Panel Display .................................................................... 2-3
**Intended Audience**
This manual is written for the customer service engineer.

**Document Structure**
This manual uses a structured documentation design. Topics are organized into small sections for efficient online and printed reference. Each topic begins with an abstract, followed by an illustration or example, and ends with descriptive text.

This manual has seven chapters and three appendixes, as follows:

- **Chapter 1, System Overview**, introduces the DIGITAL AlphaServer 4000/4100 pedestal and cabinet systems and gives an overview of the system bus modules.
- **Chapter 2, Power-Up**, provides information on how to interpret the power-up display on the operator control panel, the console screen, and system LEDs. It also describes how hardware diagnostics execute when the system is initialized.
- **Chapter 3, Troubleshooting**, describes troubleshooting during power-up and booting, as well as the `test` command.
- **Chapter 4, Power System**, describes the AlphaServer 4000/4100 power system.
- **Chapter 5, Error Logs**, explains how to interpret error logs and how to use DECevent.
- **Chapter 6, Error Registers**, describes the error registers used to hold error information.
- **Chapter 7, Removal and Replacement**, describes removal and replacement procedures for field-replaceable units (FRUs).
- **Appendix A, Running Utilities**, explains how to run utilities such as the EISA Configuration Utility and RAID Standalone Configuration Utility.
- **Appendix B, SRM Console Commands and Environment Variables**, summarizes the commands used to examine and alter the system configuration.
- **Appendix C, Operating the System Remotely**, describes how to use the remote console monitor (RCM) to monitor and control the system remotely.

**Documentation Titles**

Table 1 lists titles related to AlphaServer 4000/4100 systems.

<table>
<thead>
<tr>
<th>Title</th>
<th>Order Number</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>AlphaServer 4100 User and Configuration</strong></td>
<td></td>
</tr>
<tr>
<td>Documentation Kit</td>
<td>QZ–00VAA–GZ</td>
</tr>
<tr>
<td>Configuration and Installation Guide</td>
<td>EK–4100A–CG</td>
</tr>
<tr>
<td><strong>AlphaServer 4000 User and Configuration</strong></td>
<td></td>
</tr>
<tr>
<td>Documentation Kit</td>
<td>QZ–00VAB–GZ</td>
</tr>
<tr>
<td>System Drawer User’s Guide</td>
<td>EK–4000A–UG</td>
</tr>
<tr>
<td>Configuration and Installation Guide</td>
<td>EK–4100A–CG</td>
</tr>
<tr>
<td>Service Manual (hard copy)</td>
<td>EK–4100A–SV</td>
</tr>
<tr>
<td>Service Manual (diskette)</td>
<td>AK–QXBJB–CA</td>
</tr>
<tr>
<td>System Drawer Upgrades</td>
<td>EK–4041A–UI</td>
</tr>
<tr>
<td>PCI Upgrade</td>
<td>EK–4000A–UI</td>
</tr>
<tr>
<td>KN30n CPU Installation Card</td>
<td>EK–KN300–IN</td>
</tr>
<tr>
<td>MS3n0 Memory Installation Card</td>
<td>EK–MS300–IN</td>
</tr>
<tr>
<td>H7291 Power Supply Installation Card</td>
<td>EK–H7291–IN</td>
</tr>
<tr>
<td><strong>ServerWORKS Manager Administrator User’s Guide</strong></td>
<td></td>
</tr>
<tr>
<td></td>
<td>ER–4QXAA–UA</td>
</tr>
</tbody>
</table>
Information on the Internet

Using a Web browser you can access the AlphaServer InfoCenter at:

Access the latest system firmware either with a Web browser or via FTP as follows:

Interim firmware released since the last firmware CD is located at:
This chapter introduces the DIGITAL AlphaServer 4000 and the DIGITAL AlphaServer 4100 systems. These systems are available in cabinets or pedestals.

There are three system drawers; two, the BA30B and the BA30C, are used in the AlphaServer 4000, and the third, the BA30A, is used in the AlphaServer 4100.

The pedestal system has one system drawer and up to three StorageWorks shelves. The cabinet system can have a combination of system drawers and StorageWorks shelves that occupy the five sections of the cabinet.

Topics in this chapter include the following:

- AlphaServer 4100 System Drawer (BA30A)
- AlphaServer 4000 System Drawer (BA30C)
- AlphaServer 4000 System Drawer (BA30B)
- Cabinet System
- Pedestal System
- Control Panel and Drives
- System Consoles
- System Architecture
- System Motherboard
- CPU Types
- Memory Modules
- Memory Addressing
- System Bus
- System Bus to PCI Bus Bridge Module
- PCI I/O Subsystem
- Server Control Module
- Power Control Module
- Power Supply
1.1 AlphaServer 4100 System Drawer (BA30A)

Components in the BA30A system drawer are located in the system bus card cage, the PCI card cage, the control panel assembly, and the power and cooling section. The drawer measures 30 cm x 45 cm (11.8 in. x 17.7 in.) and fully configured weighs approximately 45.5 kg (~100 lbs).

Figure 1-1 Components of the BA30A System Drawer

When the system drawer is in a pedestal, the control panel assembly is mounted in a tray at the top of the drawer.

The numbered callouts in Figure 1-1 refer to components of the system drawer.
1. System card cage, which holds the system motherboard and the CPU, memory, bridge, and power control modules. (The difference between the BA30A and the BA30C is the system motherboard.)
2. PCI/EISA card cage, which holds the PCI motherboard, option cards, and server control module.
3. Server control module, which holds the I/O connectors and remote console monitor.
4. Control panel assembly, which includes the control panel, a floppy drive, and a CD-ROM drive.
5. Power and cooling section, which contains one to three power supplies and fans.

Cover Interlocks
The system drawer has three cover interlocks: one for the system bus card cage, one for the PCI card cage, and one for the power and system fan area.

Figure 1-2 Cover Interlock Circuit

NOTE: The cover interlocks must be engaged to enable power-up.

To override the cover interlocks, find a suitable object to close the interlock circuit.
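
The rule in the note above can be modeled as a check that every interlock is engaged. This is a minimal sketch, not the circuit shown in Figure 1-2: the function name `power_up_enabled` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def power_up_enabled(interlocks):
    """Return True only if every cover interlock is engaged.

    `interlocks` maps an illustrative label to True (engaged) or
    False (open). One open interlock is enough to block power-up.
    """
    return all(interlocks.values())


# Hypothetical labels for the three BA30A interlocks described above.
ba30a_interlocks = {
    "system_bus_card_cage": True,
    "pci_card_cage": True,
    "power_and_system_fan_area": True,
}
```

With all three covers in place, `power_up_enabled(ba30a_interlocks)` is true. Setting any one entry to `False` makes it false, which corresponds to a drawer that will not power up until that cover is closed or the interlock is overridden.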
1.2 AlphaServer 4000 System Drawer (BA30C)

Components in the BA30C system drawer are located in the system bus card cage, PCI card cage, control panel assembly, and power and cooling section. The drawer measures 30 cm x 45 cm (11.8 in. x 17.7 in.) and fully configured weighs approximately 45.5 kg (~100 lbs).

Figure 1-3 Components of the BA30C System Drawer

When the system drawer is in a pedestal, the control panel assembly is mounted in a tray at the top of the drawer.

The numbered callouts in Figure 1-3 refer to components of the system drawer.
1. System card cage, which holds the system motherboard and the CPU, memory, bridge, and power control modules. (The difference between the BA30A and the BA30C is the system motherboard.)
2. PCI/EISA card cage, which holds the PCI motherboard, option cards, and server control module.
3. Server control module, which holds the I/O connectors and remote console monitor.
4. Control panel assembly, which includes the control panel, a floppy drive, and a CD-ROM drive.
5. Power and cooling section, which contains one to three power supplies and fans.

Cover Interlocks
The system drawer has three cover interlocks: one for the system bus card cage, one for the PCI card cage, and one for the power and system fan area.

Figure 1-4 Cover Interlock Circuit

NOTE: The cover interlocks must be engaged to enable power-up.

To override the cover interlocks, find a suitable object to close the interlock circuit.
1.3 AlphaServer 4000 System Drawer (BA30B)

Components in the BA30B system drawer are located in the system bus card cage, two PCI card cages, the control panel assembly, and the power and cooling section. The drawer measures 30 cm x 45 cm (11.8 in. x 17.7 in.) and fully configured weighs approximately 45.5 kg (~100 lbs).

Figure 1-5 Components of the BA30B System Drawer

When the system drawer is in a pedestal, the control panel assembly is mounted in a tray at the top of the drawer.

The numbered callouts in Figure 1-5 refer to components of the system drawer.
1. System card cage holds the system motherboard, the CPU, memory, bridge, and power control modules.
2. PCI/EISA card cage holds the PCI/EISA motherboard for PCI/EISA 0 and PCI 1, option cards, and server control module.
3. Server control module holds the I/O connectors and remote console monitor.
4. Control panel assembly holds the control panel, a floppy drive, and a CD-ROM drive.
5. Power and cooling section contains one to three power supplies and three fans.
6. PCI card cage holds the PCI motherboard for PCI 2 and PCI 3.

**Cover Interlocks**
The system drawer has four cover interlocks: one for each section of the drawer.

**Figure 1-6 Cover Interlock Circuit**
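
The differences among the three system drawers described in Sections 1.1 through 1.3 can be collected in a small lookup table. This is a sketch for quick reference only: the values are transcribed from the text above, while the field names and the helper `drawers_for` are illustrative and not part of any DIGITAL tool.

```python
# Values transcribed from Sections 1.1 to 1.3; field names are illustrative.
DRAWERS = {
    "BA30A": {"system": "AlphaServer 4100", "pci_card_cages": 1, "cover_interlocks": 3},
    "BA30C": {"system": "AlphaServer 4000", "pci_card_cages": 1, "cover_interlocks": 3},
    "BA30B": {"system": "AlphaServer 4000", "pci_card_cages": 2, "cover_interlocks": 4},
}


def drawers_for(system):
    """Return the drawer models used in the given system, sorted by name."""
    return sorted(name for name, d in DRAWERS.items() if d["system"] == system)
```

For example, `drawers_for("AlphaServer 4000")` lists the two drawers used in that system, and `DRAWERS["BA30B"]["cover_interlocks"]` gives the number of interlocks to check before powering up a BA30B.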
1.4 Cabinet System

The AlphaServer 4000/4100 cabinet system can accommodate multiple systems in a single cabinet. There are four cabinet variations that can hold different system configurations. Differences are in power distribution and drawer mounting; from the outside the cabinets look almost identical.

Figure 1-7 AlphaServer 4000/4100 Cabinet System
Cabinet Differences

<table>
<thead>
<tr>
<th>Cabinet</th>
<th>Power</th>
<th>Mounting</th>
<th>Destination</th>
</tr>
</thead>
<tbody>
<tr>
<td>H9A10-EB</td>
<td>AC input box power strips</td>
<td>C channel (max drawers: 4)</td>
<td>North America, Asia Pacific</td>
</tr>
<tr>
<td>H9A10-EC</td>
<td>AC input box power strips</td>
<td>C channel (max drawers: 4)</td>
<td>Europe</td>
</tr>
<tr>
<td>H9A10-EL</td>
<td>Two 120 volt H7600-AA power controllers</td>
<td>Pull-out tray (max drawers: 3)</td>
<td>North America, Asia Pacific</td>
</tr>
<tr>
<td>H9A10-EM</td>
<td>Two 240 volt H7600-DB power controllers</td>
<td>Pull-out tray (max drawers: 3)</td>
<td>Europe</td>
</tr>
</tbody>
</table>
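
The cabinet table can also be expressed as a lookup for checking a planned configuration. This is a minimal sketch: the values are transcribed from the table above (reading "Asia Pacific" as a second destination for the -EB and -EL cabinets), and the names `Cabinet`, `CABINETS`, and `drawers_fit` are hypothetical. It checks only the drawer maximum. It does not account for StorageWorks shelves sharing the five cabinet sections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cabinet:
    power: str
    mounting: str
    max_drawers: int
    destinations: tuple


# Transcribed from the Cabinet Differences table; names are illustrative.
CABINETS = {
    "H9A10-EB": Cabinet("AC input box power strips", "C channel", 4,
                        ("North America", "Asia Pacific")),
    "H9A10-EC": Cabinet("AC input box power strips", "C channel", 4,
                        ("Europe",)),
    "H9A10-EL": Cabinet("Two 120 volt H7600-AA power controllers", "Pull-out tray", 3,
                        ("North America", "Asia Pacific")),
    "H9A10-EM": Cabinet("Two 240 volt H7600-DB power controllers", "Pull-out tray", 3,
                        ("Europe",)),
}


def drawers_fit(cabinet, drawers):
    """Return True if the drawer count is within the cabinet's listed maximum."""
    return 0 <= drawers <= CABINETS[cabinet].max_drawers
```

For instance, `drawers_fit("H9A10-EL", 4)` is false because the pull-out tray cabinets are listed with a maximum of three drawers, while the same count fits an H9A10-EB.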

Cabinet System Fan Tray

At the top of cabinet systems is a fan tray containing three exhaust fans, a small 12-volt power supply, and a module that distributes power to the server control module in each drawer.

Figure 1-8 Cabinet Fan Tray
1.5 Pedestal System

The pedestal system contains one system drawer with a control panel, a CD-ROM drive, and a floppy drive. In the pedestal control panel area there is space for an optional tape or disk drive. Three StorageWorks shelves provide up to 90 Gbytes of in-cabinet storage.

Figure 1-9 Pedestal System Front

In the pedestal system, the control panel is located at the top left in a tray. See Figure 1-11. There is space for an optional device beside it.
Figure 1-10  Pedestal System Rear
### 1.6 Control Panel and Drives

The control panel includes the On/Off, Halt, and Reset buttons and a display. In a pedestal system the control panel is located in a tray at the top of the system drawer. In a cabinet system it is at the bottom of the system drawer with the CD-ROM drive and the floppy drive.

Figure 1-11 Control Panel Assembly

1. On/Off button. Powers the system drawer on or off. When the LED at the top of the button is lit, the power is on. The On/Off button is connected to the power supplies and the system interlocks.
NOTE: The LEDs on some modules are on when the line cord is plugged in, regardless of the position of the On/Off button.

2 Halt button. Pressing this button in (so the LED at the top of the button is on) does the following:

If DIGITAL UNIX or OpenVMS is running, halts the operating system and returns to the SRM console. The Halt button has no effect on Windows NT.

If the Halt button is in when the system is reset or powered up, the system halts in the SRM console, regardless of the operating system. DIGITAL UNIX and OpenVMS systems that are configured for autoboot will not boot if the Halt button is in. Windows NT systems halt in the SRM console; AlphaBIOS is not loaded and started.

If you press the Halt button in (LED on) and do not issue commands that disturb the system state, entering the continue command returns the system to the operating system it was running. To return to console mode again, press the Halt button (LED off) and then press it again (LED on).

If the system is hung, pressing the Halt button (LED on) usually brings up the SRM console. Enter the crash command to do a crash dump. If pressing the Halt button does not bring up the SRM console, there is probably a hardware fault that is not allowing the halt signal to pass from the XBUS to the CPU.

3 Reset button. Initializes the system drawer. If the Halt button is pressed (LED on) when the system is reset, the SRM console is loaded and remains in the system regardless of any other conditions.

4 Control panel display. Indicates status during power-up and self-test. The OCP display is a 16-character LCD. Its controller is on the XBUS on the PCI motherboard.

While the operating system is running, the display shows the system type by default. This message can be changed by the user.

CD-ROM drive. The CD-ROM drive is used to load software, firmware, and updates. Its controller is on PCI1 on the PCI motherboard.

Floppy disk drive. The floppy drive is used to load software and firmware updates. The floppy controller is on the XBUS on the PCI motherboard.
1.7 System Consoles

There are two console programs: the SRM console and the AlphaBIOS console.

SRM Console Prompt

On systems running the DIGITAL UNIX or OpenVMS operating system, the following console prompt is displayed after system startup messages are displayed, or whenever the SRM console is invoked:

P00>>>

NOTE: The console prompt displays only after the entire power-up sequence is complete. This can take up to several minutes if the memory is very large.

AlphaBIOS Boot Menu

On systems running the Windows NT operating system, the Boot menu is displayed when the AlphaBIOS console is invoked:

AlphaBIOS Version 5.12

Please select the operating system to start:

Windows NT Server 3.51

Use ↓ and ↑ to move the highlight to your choice.
Press Enter to choose.
SRM Console

The SRM console is a command-line interface that is used to boot the DIGITAL UNIX and OpenVMS operating systems. It also provides support for examining and modifying the system state and configuring and testing the system. The SRM console can be run from a serial terminal or a graphics monitor.

AlphaBIOS Console

The AlphaBIOS console is a menu-based interface that supports the Microsoft Windows NT operating system. AlphaBIOS is used to set up operating system selections, boot Windows NT, and display information about the system configuration. The EISA Configuration Utility and the RAID Standalone Configuration Utility are run from the AlphaBIOS console. AlphaBIOS runs on either a serial or graphics terminal, but Windows NT requires a graphics monitor.

Environment Variables

Environment variables are software parameters that define, among other things, the system configuration. They are used to pass information to different pieces of software running in the system at various times. The `os_type` environment variable, which can be set to VMS, UNIX, or NT, determines which of the two consoles is to be used. The SRM console is always brought into memory; AlphaBIOS is loaded only if `os_type` is set to NT and the Halt button is out (not lit).
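The console selection described above can be sketched as a small decision function. This is only an illustration of the rule as stated in this section; the function name `console_to_run` and its parameters are hypothetical and do not correspond to any firmware interface.

```python
def console_to_run(os_type: str, halt_button_in: bool) -> str:
    """Return the console that ends up running after power-up or reset.

    Hypothetical helper: models only the rule stated in the text.
    """
    # The SRM console is always brought into memory first. AlphaBIOS is
    # loaded afterwards only when os_type is NT and the Halt button is out.
    if os_type.strip().upper() == "NT" and not halt_button_in:
        return "AlphaBIOS"
    return "SRM"


if __name__ == "__main__":
    for os_type in ("VMS", "UNIX", "NT"):
        for halt_in in (False, True):
            state = "in" if halt_in else "out"
            print(f"os_type={os_type:<4} Halt {state:<3} -> {console_to_run(os_type, halt_in)}")
```

Note that with the Halt button in, the result is the SRM console for every `os_type` value, which matches the Halt button description in Section 1.6.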

Refer to Appendix B of this guide for a list of the environment variables used to configure AlphaServer 4000 and 4100 systems.

Refer to the AlphaServer 4x00 System Drawer User’s Guide for information on setting environment variables.

It is recommended that you keep a record of the environment variables for each system that you service. Some environment variable settings are lost when a module is swapped and must be restored after the new module is installed. Refer to Appendix B for a convenient worksheet for recording environment variable settings.
1.8 System Architecture

Alpha microprocessor chips are used in these systems. The CPU, memory, and the I/O bridge module(s) are connected to the system bus motherboard.
AlphaServer 4000/4100 systems use the Alpha chip for the CPU. The CPU modules, memory modules, and I/O bridge modules are connected to the system bus motherboard. One bridge module connects to the PCI/EISA I/O buses; a second bridge module (4000 only) connects to another pair of PCI buses. A fourth type of module, the power control module, also plugs into the system motherboard.

A fully configured 4100 system drawer can have up to four CPUs, four memory pairs, and a total of eight I/O options. A fully configured 4000 system drawer can have up to two CPUs, two memory options, and a total of sixteen I/O options. In both systems the I/O options can be all PCI options or a combination of PCI options and EISA options, but there can be no more than three EISA options.

The system bus has a 128-bit data bus protected by 16 bits of ECC and a 40-bit command/address bus protected by parity. The bus speed depends on the speed of the CPU in slot 0, which provides the clock for the buses. The 40-bit address bus can create one terabyte of addresses (that’s a million million). The bus connects CPUs, memory, and the system bus to PCI bus bridge(s).
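The one-terabyte figure follows directly from the 40-bit address width. The short calculation below is a sketch of that arithmetic only; the helper names are illustrative.

```python
ADDRESS_BITS = 40  # width of the command/address bus, from the text


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given bit count."""
    return 1 << address_bits


def in_binary_terabytes(n_bytes: int) -> float:
    """Express a byte count in units of 2**40 bytes."""
    return n_bytes / (1 << 40)


if __name__ == "__main__":
    total = address_space_bytes(ADDRESS_BITS)
    print(f"{total:,} bytes = {in_binary_terabytes(total):g} Tbyte")
```

In decimal terms, 2^40 is about 1.1 × 10^12 bytes.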

The CPU modules are available with and without an external cache. The Alpha chip has an 8-Kbyte instruction cache (I-cache), an 8-Kbyte write-through data cache (D-cache), and a 96-Kbyte, write-back secondary data cache (S-cache). Some variants of the CPU module include an onboard cache. The cache system is write-back. The system drawer supports up to four CPUs.

The memory modules are placed on the system motherboard in pairs. Each module drives half of the system bus, along with the associated ECC bits. Memory pairs consist of two modules that are the same size and type. Two types are available: synchronous and asynchronous (EDO) memory.

The system bus to PCI bus bridge module translates system bus commands and data addressed to I/O space to PCI commands and data. It also translates PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. The PCI bus is a 64-bit wide bus used for I/O. Both the 4100 and the 4000 have one PCI/EISA card cage, and the 4000 may contain a second PCI card cage.

The power control module, which is on the system motherboard, monitors power and the system environment.
1.9 System Motherboard

The system motherboard is on the floor of the system card cage. It has slots for the CPU, memory, power control, and bridge modules.

Figure 1-13 System Motherboard Module Locations

4100 Motherboard (54-23803-01)

4000 Motherboard (54-23803-02)

4000 Motherboard (54-23805-01)
The system motherboard has the logic for the system bus. It is the backplane that holds the CPU, memory, bridge, and power control modules. Figure 1-13 shows diagrams of the three motherboards used in AlphaServer 4000/4100 systems. The module locations are designated by the callouts.

1. CPU module
2. Memory module
3. Bridge module
4. Power control module
1.10 CPU Types

AlphaServer 4000 and 4100 systems can be configured with one of several CPU variants. Variants are differentiated by CPU speeds and the presence or absence of a backup data cache external to the Alpha microprocessor chip.

Figure 1-14 CPU Module Layout

Alpha Chip Composition
The Alpha chip is made using state-of-the-art chip technology, has a transistor count of 9.3 million, consumes 50 watts of power, and is air cooled (a fan is on the chip). The default cache system is write-back and when the module has an external cache, it is write-back.
Chip Description

<table>
<thead>
<tr>
<th>Unit</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction</td>
<td>8-Kbyte cache, 4-way issue</td>
</tr>
<tr>
<td>Execution</td>
<td>4-way execution; 2 integer units, 1 floating-point adder, 1 floating-point multiplier</td>
</tr>
<tr>
<td>Memory</td>
<td>Merge logic, 8-Kbyte write-through first-level data cache, 96-Kbyte write-back second-level data cache, bus interface unit</td>
</tr>
</tbody>
</table>

CPU Variants

<table>
<thead>
<tr>
<th>Module Variant</th>
<th>Clock Frequency</th>
<th>Onboard Cache</th>
</tr>
</thead>
<tbody>
<tr>
<td>B3001-CA</td>
<td>300 MHz</td>
<td>None</td>
</tr>
<tr>
<td>B3002-AB</td>
<td>300 MHz</td>
<td>2 Mbytes</td>
</tr>
<tr>
<td>B3004-BA</td>
<td>300 MHz</td>
<td>2 Mbytes</td>
</tr>
<tr>
<td>B3004-AA</td>
<td>400 MHz</td>
<td>4 Mbytes</td>
</tr>
<tr>
<td>B3004-DA</td>
<td>466 MHz</td>
<td>4 Mbytes</td>
</tr>
</tbody>
</table>

CPU Configuration Rules

- The first CPU must be in CPU slot 0 to provide the system clock.
- Additional CPU modules should be installed in ascending order by slot number.
- All CPUs must have the same Alpha chip clock speed. The system bus will hang without an error message if the oscillators clocking the CPUs are different.
- Mixing of cached and uncached CPUs is not supported.
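The configuration rules above lend themselves to a simple checklist. The following sketch assumes a hypothetical `CpuModule` record and a mapping of slot numbers to modules; it interprets "ascending order by slot number" as "no gaps between occupied slots", which is an assumption rather than something the manual states outright.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CpuModule:
    variant: str       # option number, for example "B3004-AA"
    clock_mhz: int     # Alpha chip clock frequency
    cache_mbytes: int  # onboard cache size; 0 means uncached


def check_cpu_config(slots: Dict[int, CpuModule]) -> List[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    if not slots:
        return ["no CPU modules installed"]
    problems = []
    if 0 not in slots:
        problems.append("CPU slot 0 is empty; it must hold the first CPU (system clock source)")
    # Assumed reading of the ascending-order rule: slots 0..n-1 are occupied.
    if sorted(slots) != list(range(len(slots))):
        problems.append("CPU modules are not installed in ascending slot order")
    if len({m.clock_mhz for m in slots.values()}) > 1:
        problems.append("CPUs have different clock speeds; the system bus may hang")
    if len({m.cache_mbytes > 0 for m in slots.values()}) > 1:
        problems.append("cached and uncached CPUs are mixed (not supported)")
    return problems


if __name__ == "__main__":
    good = {0: CpuModule("B3004-AA", 400, 4), 1: CpuModule("B3004-AA", 400, 4)}
    bad = {0: CpuModule("B3001-CA", 300, 0), 2: CpuModule("B3004-AA", 400, 4)}
    print(check_cpu_config(good))
    for line in check_cpu_config(bad):
        print("-", line)
```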

Color Codes

The top edge of the CPU module variant is color coded for easy identification.

<table>
<thead>
<tr>
<th>Color</th>
<th>Option Number</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Dark Blue</td>
<td>B3001-CA</td>
<td>300 MHz, uncached</td>
</tr>
<tr>
<td>Green</td>
<td>B3002-AB</td>
<td>300 MHz, 2MB cached</td>
</tr>
<tr>
<td>Green</td>
<td>B3004-BA</td>
<td>300 MHz, 2MB cached</td>
</tr>
<tr>
<td>Orange</td>
<td>B3004-AA</td>
<td>400 MHz, 4MB cached</td>
</tr>
<tr>
<td>Red</td>
<td>B3004-DA</td>
<td>466 MHz, 4MB cached</td>
</tr>
</tbody>
</table>
1.11 Memory Modules

Memory modules are used only in pairs — two modules of the same size and type. Each module provides either the low half or the high half of the memory space. The 4100 system drawer can hold up to four memory module pairs. The 4000 system drawer can hold up to two memory module pairs.

Figure 1-15 Memory Module Layout

Typical Synchronous Memory

Typical EDO Memory
Memory Variants

Each memory option consists of two identical modules. Each 4100 drawer supports up to four memory options, for a total of 4 Gbytes of memory; 4000 drawers support half that. Memory modules are used only in pairs and are available in 128 Mbyte, 512 Mbyte, and 1 Gbyte sizes. The 128-Mbyte option is synchronous memory, while the larger sizes are asynchronous memory (EDO).

<table>
<thead>
<tr>
<th>Option</th>
<th>Option Size</th>
<th>Module</th>
<th>DRAM Type</th>
<th>DRAM Number</th>
<th>DRAM Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>MS320-CA</td>
<td>128 MB</td>
<td>B3020-CA</td>
<td>Synch.</td>
<td>36</td>
<td>4 MB x 4</td>
</tr>
<tr>
<td>MS330-EA</td>
<td>512 MB</td>
<td>B3030-EA</td>
<td>Asynch. (EDO)</td>
<td>144</td>
<td>4 MB x 4</td>
</tr>
<tr>
<td>MS330-FA</td>
<td>1 GB</td>
<td>B3030-FA</td>
<td>Asynch. (EDO)</td>
<td>72</td>
<td>16 MB x 4</td>
</tr>
<tr>
<td>MS330-GA</td>
<td>2 GB</td>
<td>B3030-GA</td>
<td>Asynch. (EDO)</td>
<td>144</td>
<td>16 MB x 4</td>
</tr>
</tbody>
</table>

Memory Operation

Memory modules are used only in pairs; each module provides half the data, or 64 bits plus 8 ECC bits, of the octaword (16 byte) transferred on the system bus. Modules are placed in slots designated MEMxL and MEMxH.

NOTE: Modules in slots MEMxL do not drive the lower 8 bytes, and modules in slots MEMxH do not drive the higher 8 bytes of the 16 byte transfer.

Unless otherwise programmed, memory drives the system bus in bursts. Upon each memory fetch, data is transferred in 4 consecutive cycles transferring 64 bytes. There are situations, however, when memories made with EDO DRAMs cannot provide data fast enough to complete the system bus transactions. When these situations arise, EDO type memories assert a signal that causes the system bus to stall for one (occasionally more) clock tick. When memory completes such an operation, it releases the system bus.

Memory Configuration Rules

In a system, memories of different sizes and types are permitted, but:

- Memory modules are installed and used in pairs. Both modules in a memory pair must be of the same size and type.
- The largest memory pair must be in slots MEM 0L and MEM 0H.
- Other memory pairs must be the same size or smaller than the first memory pair.
- Memory pairs must be installed in consecutive slots.
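As an illustration, the rules above can be expressed as a configuration check. The data layout here (a list of pair slots, each holding a low and a high module described by size and type) is a hypothetical representation chosen for the sketch, and "consecutive slots" is read as "consecutive starting at MEM0".

```python
from typing import List, Optional, Tuple

Module = Tuple[int, str]      # (size in Mbytes, "sync" or "edo")
Pair = Tuple[Module, Module]  # (module in MEMxL, module in MEMxH)


def check_memory_config(slots: List[Optional[Pair]]) -> List[str]:
    """Return rule violations for pair slots MEM0, MEM1, ...; empty list if none."""
    occupied = [i for i, pair in enumerate(slots) if pair is not None]
    if not occupied:
        return ["no memory installed"]
    problems = []
    # Pairs must sit in consecutive slots, with the largest pair in MEM0.
    if occupied != list(range(len(occupied))):
        problems.append("memory pairs are not in consecutive slots starting at MEM0")
    for i in occupied:
        low, high = slots[i]
        if low != high:
            problems.append(f"MEM{i}: L and H modules differ in size or type")
    sizes = {i: slots[i][0][0] + slots[i][1][0] for i in occupied}
    if 0 in sizes and any(size > sizes[0] for size in sizes.values()):
        problems.append("a memory pair is larger than the pair in MEM0")
    return problems


if __name__ == "__main__":
    ok = [((256, "sync"), (256, "sync")), ((128, "sync"), (128, "sync")), None, None]
    wrong = [((128, "sync"), (128, "sync")), ((256, "sync"), (256, "sync")), None, None]
    print(check_memory_config(ok))
    print(check_memory_config(wrong))
```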
1.12 Memory Addressing

Alpha system memory addressing is unusual: the memory address space is determined not by the amount of physical memory installed but as a multiple of the size of the memory pair in slot MEM0.

Figure 1-16 How Memory Addressing Is Calculated

- Fourth pair address space
  - 512 Mbyte space empty

- Third pair address space
  - 512 Mbyte 1/2 occupied
  - (2 B3020-DA - 128 Mbyte/mod)

- Second pair address space
  - 512 Mbyte 1/2 occupied
  - (2 B3020-DA - 128 Mbyte/mod)

- First pair defines total address space always fully occupied
  - (2 B3020-EA 256 Mbyte/mod)

The rules for addressing memory are as follows:

1. Address space is determined by the memory pair in slot MEM0.
2. Memory pairs need not be the same size.
3. The memory pair in slot MEM0 must be the largest of all memory pairs. Other memory pairs may be as large but none may be larger.
4. The starting address of each memory pair is N times the size of the memory pair in slot MEM0. N=0,1,2,3.
5. Memory addresses are contiguous within each module pair.
6. If memory pairs are of different sizes, memory “holes” can occur in the physical address space. See Figure 1-16.
7. Software creates contiguous virtual memory even though physical memory may not be contiguous.
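The addressing rules can be worked through numerically. The sketch below takes the pair sizes in slot order and computes each pair's starting address as N times the size of the pair in MEM0; the function name and the dictionary keys are illustrative only. The sample values mirror Figure 1-16: a first pair of two 256-Mbyte modules (512 Mbytes) followed by two pairs of two 128-Mbyte modules (256 Mbytes each).

```python
from typing import Dict, List


def memory_map(pair_sizes_mb: List[int]) -> List[Dict[str, int]]:
    """Lay out memory pairs (sizes in Mbytes, slot order MEM0, MEM1, ...)."""
    if not pair_sizes_mb:
        raise ValueError("at least one memory pair is required")
    window = pair_sizes_mb[0]  # the pair in MEM0 defines each pair's address space
    regions = []
    for n, size in enumerate(pair_sizes_mb):
        if size > window:
            raise ValueError(f"pair {n} is larger than the pair in MEM0")
        start = n * window  # rule 4: starting address is N times the MEM0 pair size
        regions.append({
            "pair": n,
            "start_mb": start,
            "end_mb": start + size,      # rule 5: contiguous within the pair
            "unused_mb": window - size,  # rule 6: the "hole" left in this pair's space
        })
    return regions


if __name__ == "__main__":
    for region in memory_map([512, 256, 256]):
        print(region)
```

With these inputs the second pair starts at 512 Mbytes and the third at 1024 Mbytes, and each of the two smaller pairs leaves a 256-Mbyte hole after it.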
1.13 System Bus

The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals and clocks.

**Figure 1-17 System Bus Block Diagram**
The system bus motherboard consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals, clocks, and a bus arbiter. The bus requires that all CPUs have the same high-speed oscillator providing the clock to the Alpha chip.

The AlphaServer 4100 system bus connects up to four CPUs, four pairs of memory modules, and a single I/O bus bridge module. Note that the I/O bus bridges may be designated as IODn, where n is the number of the PCI bus. The first bridge is designated IOD0 and IOD1.

The AlphaServer 4000 system bus connects up to two CPUs, two pairs of memory modules, and two I/O bus bridge modules. The second bridge on the 4000 system bus is designated IOD2 and IOD3.

The system bus clock is provided by an oscillator on the CPU in slot CPU0. This oscillator has a 1:5 ratio to the Alpha chip. With 300 MHz CPUs, for example, the system bus operates at 60 MHz.

The system bus motherboard initiates memory refresh transactions. The motherboard sits at the bottom of the system drawer, and in addition to CPUs, memory, and I/O bridges, holds a power control module.

5 volt and 3.43 volt power is provided directly to the motherboard from the power supplies.
1.14 System Bus to PCI Bus Bridge Module

The bridge module is the physical interconnect between the system motherboard and any PCI motherboard in the system.

Figure 1-18  Bridge Module
The system bus to PCI bus bridge module converts system bus commands and data addressed to I/O space to PCI commands and data; and converts PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. An AlphaServer 4100 system has one bridge module; an AlphaServer 4000 system can have a second bridge module.

The bridge has two major components:

- Command/address processor (CAP) chip
- Two data path chips (MDPA and MDPB)

There are two sets of these three chips, one set on each side of the module. Each set bridges to one of the PCI buses on the PCI motherboard.

The interface on the system bus side of the bridge responds to system bus commands addressed to the upper 64 Gbytes of I/O space. I/O space is addressed whenever bit <39> on the system bus address lines is set. The space so defined is 512 Gbytes in size. The first 448 Gbytes are reserved and the last 64 Gbytes, when bits <38:36> are set, are mapped to the PCI I/O buses.
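The address decoding described above can be illustrated with a few bit tests. This is a sketch of the stated rule only (bit <39> selects I/O space, and bits <38:36> all set select the 64 Gbytes mapped to the PCI buses); the function name and the region labels are illustrative.

```python
ADDRESS_BITS = 40


def classify_address(addr: int) -> str:
    """Classify a 40-bit system bus address according to the rule in the text."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 40 bits")
    if not (addr >> 39) & 1:
        return "memory space"        # bit <39> clear
    if (addr >> 36) & 0b111 == 0b111:
        return "PCI I/O space"       # bits <38:36> all set: top 64 Gbytes
    return "reserved I/O space"      # remaining 448 Gbytes of I/O space


if __name__ == "__main__":
    gbyte = 1 << 30
    io_space = (1 << 39) // gbyte                  # size of all I/O space
    pci_space = ((1 << 40) - (0xF << 36)) // gbyte  # size of the PCI-mapped part
    print(f"I/O space: {io_space} Gbytes, PCI-mapped: {pci_space} Gbytes, "
          f"reserved: {io_space - pci_space} Gbytes")
    for addr in (0x0, 1 << 39, 0xF << 36, (1 << 40) - 1):
        print(f"{addr:#012x} -> {classify_address(addr)}")
```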

The interface on the PCI side of the bridge responds to commands addressed to CPUs and memory on the system bus. On the PCI side, the bridge provides the interface to the PCI buses. Each PCI bus is addressed separately. The bridge does not respond to devices communicating with each other on the same PCI bus. However, should a device on one PCI bus address a device on the other PCI bus, commands, addresses, and data run through the bridge out onto the system bus and back through the bridge to the other PCI bus.

In addition to its bridge function, the system bus to PCI bus bridge module monitors every transaction on the system bus for errors. It monitors the data lines for ECC errors and the command/address lines for parity errors.

NOTE: When errors are logged, the two bridge modules on the AlphaServer 4000 are differentiated in the error log by their engineering code names, the left hand horse and the right hand horse. The left hand horse is the B3040-AA module; it is in the leftmost slot on the system bus motherboard when seen from the rear of the drawer. The right hand horse is the B3040-AB module; it is in the rightmost slot on the system bus motherboard when seen from the rear of the drawer.
1.15 PCI I/O Subsystem

The I/O subsystem is PCI. Both the 4100 and the 4000 have two four-slot PCI buses that hold up to eight I/O options. One of these buses supports both PCI and EISA options; it holds no more than four options, up to three of which may be EISA. The 4000 can have an additional two four-slot PCI buses, allowing a total of sixteen I/O options.

Figure 1-19 PCI Block Diagram
Table 1-1 PCI Motherboard Slot Numbering

<table>
<thead>
<tr>
<th>Slot</th>
<th>PCI0</th>
<th>PCI1</th>
<th>PCI2 (4000 only)</th>
<th>PCI3 (4000 only)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Reserved</td>
<td>Reserved</td>
<td>Reserved</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>PCI to EISA bridge</td>
<td>Internal CD-ROM controller</td>
<td>Reserved</td>
<td>Reserved</td>
</tr>
<tr>
<td>2</td>
<td>PCI or EISA slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
</tr>
<tr>
<td>3</td>
<td>PCI or EISA slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
</tr>
<tr>
<td>4</td>
<td>PCI or EISA slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
</tr>
<tr>
<td>5</td>
<td>PCI slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
<td>PCI slot</td>
</tr>
</tbody>
</table>

The logic for two PCI buses is on each PCI motherboard.

- **PCI0** is a 64-bit bus with a built-in PCI to EISA bus bridge. PCI0 has one dedicated PCI slot and three slots that can be either PCI or EISA. Each of these three slots has both an EISA connector and a PCI connector (six connectors in all), only one of which may be used at a time. PCI0 is powered by 5V.

- **PCI1** is a 64-bit bus with a built-in CD-ROM controller and four PCI slots. PCI1 is powered by 5V.

- **PCI2 (4000 only)** is a 64-bit four-slot PCI bus powered by both 3V and 5V.

- **PCI3 (4000 only)** is a 64-bit four-slot PCI bus powered by both 3V and 5V.
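The slot limits described in this section can be summarized as a quick check of a planned I/O configuration. The sketch below is a simplification that looks only at option counts (eight options on a 4100, sixteen on a 4000 with both PCI motherboards, and at most three EISA options); the function name is hypothetical and per-slot placement is not modeled.

```python
from typing import List

MAX_IO_OPTIONS = {"4100": 8, "4000": 16}  # totals stated in this chapter
MAX_EISA_OPTIONS = 3                       # PCI0 slots 2 through 4 accept PCI or EISA


def check_io_options(model: str, pci_options: int, eisa_options: int) -> List[str]:
    """Return rule violations for a planned set of I/O options."""
    if model not in MAX_IO_OPTIONS:
        raise ValueError(f"unknown model: {model}")
    problems = []
    if eisa_options > MAX_EISA_OPTIONS:
        problems.append(f"{eisa_options} EISA options requested; at most {MAX_EISA_OPTIONS} fit")
    total = pci_options + eisa_options
    if total > MAX_IO_OPTIONS[model]:
        problems.append(f"{total} I/O options requested; a {model} holds at most {MAX_IO_OPTIONS[model]}")
    return problems


if __name__ == "__main__":
    print(check_io_options("4100", pci_options=5, eisa_options=3))
    print(check_io_options("4100", pci_options=6, eisa_options=4))
```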

The B3050-AA PCI motherboard has cable connections to remote I/O (mouse, keyboard, serial port, and parallel port), an internal floppy drive, an internal CD-ROM drive, the control panel, and 5V power. Also on this module are the chips for the PCI to EISA bridge and the internal CD-ROM controller. This module is the motherboard for the PCI card cage on the left side of the system drawer.

An 8-bit XBUS is connected to the EISA bus. On this bus there is an interface to the system I²C bus; mouse and keyboard support; an I/O combo controller supporting two serial ports, the floppy controller, and a parallel port; a real-time clock; two 1-Mbyte flash ROMs containing system firmware; and an 8-Kbyte NVRAM.

The B3050-AB PCI motherboard, used only in the AlphaServer 4000, contains two four-slot 64-bit PCI buses.
1.16 Server Control Module

The server control module enables remote console connections to the system drawer. The module passes the signals for COM ports 1 and 2, the keyboard, and the mouse through to the standard I/O connectors.

Figure 1-20 Server Control Module
The server control module has two sections: the remote console monitor (RCM) and the standard I/O. See Appendix C for information on controlling the system remotely.

The remote console monitor connects to a modem through the modem port on the bulkhead. The RCM requires a 12V power connection.

The standard I/O ports (keyboard, mouse, COM1 and COM2 serial, and parallel ports) are on the same bulkhead.
1.17 Power Control Module

The power control module controls power sequencing and monitors power supply voltage, temperature, and fans.

Figure 1-21 Power Control Module
The power control module performs these functions:

- Controls power sequencing.
- Monitors the combined output of power supplies and shuts down power if it is not in range.
- Monitors system temperature and shuts off power if it is out of range.
- Monitors the fans in the system drawer and on the CPU modules and shuts down power if a fan fails.
- Provides visual indication of faults through LEDs.
1.18 Power Supply

The system drawer power supplies provide power only to components in the drawer. One or two power supplies are required, depending on the number of CPU modules and PCI card cages; a second or third can be added for redundancy. The power system is described in detail in Chapter 4.

Figure 1-22 Location of Power Supply
Description

One to three power supplies provide power to components in the system drawer. (They supply power only for the drawer in which they are located.) Three power supplies provide redundant power in fully loaded AlphaServer 4000/4100 systems. These power supplies share the load, and redundant configurations are supported. They autoselect line voltage (120V to 240V). Each has 450 W output and supplies up to 75A of 3.43V, 50A of 5.0V, 11A of 12V, and small amounts of –5V, –12V, and auxiliary voltage (Vaux).

NOTE: The LEDs on some modules are on when the line cord is plugged in, regardless of the position of the On/Off button.

Configuration

- An AlphaServer 4100 system with one or two CPUs requires one power supply (two for redundancy).
- An AlphaServer 4100 system with three or four CPUs requires two power supplies (three for redundancy).
- An AlphaServer 4000 system with one or two CPUs and one PCI card cage requires one power supply (two for redundancy).
- An AlphaServer 4000 system with one or two CPUs and two PCI card cages requires two power supplies (three for redundancy).
- Power supply 0 is installed first, power supply 2 second, and power supply 1 third. See Figure 1-22. (The power supply numbering shown here corresponds to the numbering displayed by the SRM console’s show power command.)
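The configuration rules above reduce to a small lookup. The following sketch encodes them as stated; the function names are hypothetical, and the redundant count is simply the required count plus one, as the rules indicate.

```python
from typing import Tuple

INSTALL_ORDER = (0, 2, 1)  # power supply 0 first, then 2, then 1


def required_power_supplies(model: str, cpus: int, pci_card_cages: int = 1) -> int:
    """Minimum number of power supplies, without redundancy."""
    if model == "4100":
        if not 1 <= cpus <= 4:
            raise ValueError("a 4100 drawer holds one to four CPUs")
        return 1 if cpus <= 2 else 2
    if model == "4000":
        if not 1 <= cpus <= 2:
            raise ValueError("a 4000 drawer holds one or two CPUs")
        if pci_card_cages not in (1, 2):
            raise ValueError("a 4000 drawer has one or two PCI card cages")
        return 1 if pci_card_cages == 1 else 2
    raise ValueError(f"unknown model: {model}")


def supply_positions(count: int) -> Tuple[int, ...]:
    """Positions to populate, in installation order, for the given supply count."""
    if not 1 <= count <= len(INSTALL_ORDER):
        raise ValueError("a drawer holds one to three power supplies")
    return INSTALL_ORDER[:count]


if __name__ == "__main__":
    needed = required_power_supplies("4100", cpus=3)
    print("required:", needed, "positions:", supply_positions(needed))
    print("redundant:", needed + 1, "positions:", supply_positions(needed + 1))
```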
This chapter describes system power-up testing and explains the power-up displays. The following topics are covered:

- Control Panel
- Power-Up Sequence
- SROM Power-Up Test Flow
- SROM Errors Reported
- XSROM Power-Up Test Flow
- XSROM Errors Reported
- Console Power-Up Tests
- Console Device Determination
- Console Power-Up Display
- Fail-Safe Loader
2.1 Control Panel

The control panel display indicates the likely device when testing fails.

Figure 2-1 Control Panel and LCD Display

- When the On/Off button LED is on, power is applied and the system is running. When it is off, the system is not running, but power may or may not be present. If power is present, the PCM or the power LED on the system bus to PCI bus bridge module should be flashing. Otherwise, there is a power problem.
- When the Halt button LED is lit and the On/Off button is on, the system should be running either the SRM console or Windows NT. If the Halt button is in, but the LED is off, the OCP, its cables, or the PCM is likely to be broken.
<table>
<thead>
<tr>
<th>Field</th>
<th>Content</th>
<th>Display</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>➊</td>
<td>CPU number</td>
<td>P0–P3</td>
<td>CPU reporting status</td>
</tr>
<tr>
<td>➋</td>
<td>Status</td>
<td>TEST</td>
<td>Tests are executing</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>Failure has been detected</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>Machine check has occurred</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>Error interrupt has occurred</td>
</tr>
<tr>
<td>➌</td>
<td>Test number</td>
<td>CPU0–3</td>
<td>CPU module number</td>
</tr>
<tr>
<td></td>
<td>Suspected device</td>
<td>MEM0–3 and L, H, or *</td>
<td>Memory module number and low module, high module, or either</td>
</tr>
<tr>
<td></td>
<td></td>
<td>IOD0</td>
<td>Bridge to PCI bus 0</td>
</tr>
<tr>
<td></td>
<td></td>
<td>IOD1</td>
<td>Bridge to PCI bus 1</td>
</tr>
<tr>
<td></td>
<td></td>
<td>IOD2</td>
<td>Bridge to PCI bus 2</td>
</tr>
<tr>
<td></td>
<td></td>
<td>IOD3</td>
<td>Bridge to PCI bus 3</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FROM0</td>
<td>Flash ROM</td>
</tr>
<tr>
<td></td>
<td></td>
<td>COMBO</td>
<td>COM controller</td>
</tr>
<tr>
<td></td>
<td></td>
<td>PCEB</td>
<td>PCI-to-EISA bridge</td>
</tr>
<tr>
<td></td>
<td></td>
<td>ESC</td>
<td>EISA system controller</td>
</tr>
<tr>
<td></td>
<td></td>
<td>NVRAM</td>
<td>Nonvolatile RAM</td>
</tr>
<tr>
<td></td>
<td></td>
<td>TOY</td>
<td>Real-time clock</td>
</tr>
<tr>
<td></td>
<td></td>
<td>I8242</td>
<td>Keyboard and mouse controller</td>
</tr>
</tbody>
</table>

The potentiometer, accessible through the access hole just above the Reset button, controls the intensity of the LCD. Use a small Phillips head screwdriver to adjust it.

---

1 CPU module
2 Memory module
3 Bridge module (B3040-AA)
4 Bridge module (B3040-AB)
5 EISA/PCI motherboard
2.2 Power-Up Sequence

Console and most power-up tests reside on the I/O subsystem, not on the CPU or on any other module on the system bus.

Figure 2-2 Power-Up Flow

Definitions

SROM. The SROM is a 128-Kbit ROM on each CPU module. The ROM contains minimal diagnostics that test the Alpha chip and the path to the XSROM. Once the path is verified, it loads XSROM code into the Alpha chip and jumps to it.

XSROM. The XSROM, or extended SROM, contains back-up cache and memory tests, and a fail-safe loader. The XSROM code resides in sector 0 of FEPROM 0 on the XBUS. Sector 2 of FEPROM 0 contains a duplicate copy of the code and is used if sector 0 is bad.

**FEPROM**. Two 1-Mbyte programmable ROMs are on the XBUS on PCI0. FEPROM 0 contains two copies of the XSROM, the OpenVMS and DIGITAL UNIX PALcode, and the SRM console and decompression code. FEPROM 1 contains the AlphaBIOS and NT HALcode. See Figure 2-3. These two FEPROMs can be flash updated. Refer to Appendix A.

**Figure 2-3 Contents of FEPROMs**

<table>
<thead>
<tr>
<th>Sector</th>
<th>FEPROM 0</th>
<th>FEPROM 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>XSROM and Fail Safe ldr</td>
<td>64Kb</td>
</tr>
<tr>
<td>1</td>
<td>Pal Code</td>
<td>64Kb</td>
</tr>
<tr>
<td>2</td>
<td>XSROM and Fail Safe ldr</td>
<td>64Kb</td>
</tr>
<tr>
<td>3</td>
<td>Decompress</td>
<td>AlphaBIOS Code</td>
</tr>
<tr>
<td>31</td>
<td>SRM Console Code</td>
<td>1 Mbyte</td>
</tr>
</tbody>
</table>

For the console to run, the path from the CPU to the XSROM must be functional. The XSROM resides in FEPROM0 on the XBUS, off the EISA bus, off PCI 0, off IOD 0. See Figure 2-4. This path is minimally tested by SROM.

**Figure 2-4 Console Code Critical Path (4100 Block Diagram)**

The SROM contents are loaded into each CPU’s I-cache and executed on power-up/reset. After testing the caches on each processor chip, it tests the path to the XSROM. Once this path is tested and deemed reliable, layers of the XSROM are loaded sequentially into the processor chip on each CPU. None of the SROM or XSROM power-up tests are run from memory—all run from the caches in the CPU chip, thus providing excellent diagnostic isolation. Later power-up tests, run under the console, are used to complete testing of the I/O subsystem.

There are two console programs: the SRM console and the AlphaBIOS console, as detailed in the *AlphaServer 4100 System Drawer User’s Guide* (EK–4100A–UG) and the *AlphaServer 4000 System Drawer User’s Guide* (EK–4000A–UG). By default, the SRM console is always loaded and I/O system tests are run under it before the system loads AlphaBIOS. To load AlphaBIOS, the `os_type` environment variable must be set to `NT` and the Halt button should be out (LED not lit). Otherwise, the SRM console continues to run.
2.3 SROM Power-Up Test Flow

The SROM tests the CPU chip and the path to the XSROM.

Figure 2-5  SROM Power-Up Test Flow

- For each CPU chip
  - Initialize CPU chip
  - Turn off CPU LED
- Light CPU LED
- Determine Primary
  - Size IOD
  - Loopback on each IOD
  - Check integrity of XSROM
  - Load first 8K of XSROM into S-cache
  - Jump to XSROM overlay in S-cache
- Initialize PCI-EISA bridge chip
- Read TOY NVRAM
- Initialize Combo Chip on XBUS for access to COM port 1
- Initialize OCP port on XBUS for access to OCP display
- Print to console device and OCP
- Initialize all S-cache banks
- Decision: Duplicate Tag or Fill errors? (Yes/No)
- Decision: All 3 S-cache banks pass? (Yes/No)
- Decision: D-cache errors? (Yes/No)
- HANG
- Light IOD LEDs
The Alpha chip built-in self-test tests the I-cache at power-up and upon reset.

Each CPU chip loads its SROM code into its I-cache and starts executing it. If the chip is partially functional, the SROM code continues to execute. However, if the chip cannot perform most of its functions, that CPU hangs and that CPU pass/fail LED remains off.

If the system has more than one CPU and at least one passes both the SROM and XSROM power-up tests, the system will bring up the console. The console checks the FW_SCRATCH register where evidence of the power-up failure is left. Upon finding the error, the console sends these messages to COM1 and the OCP:

- COM1 (or VGA): Power-up tests have detected a problem with your system
- OCP: Power-up failure
Table 2-2 lists the tests performed by the SROM.

Table 2-2  SROM Tests

<table>
<thead>
<tr>
<th>Test Name</th>
<th>Logic Tested</th>
</tr>
</thead>
<tbody>
<tr>
<td>D-cache RAM March test</td>
<td>D-cache access, D-cache data, D-cache address logic</td>
</tr>
<tr>
<td>D-cache Tag RAM March test</td>
<td>D-cache tag store RAM, D-cache bank address logic</td>
</tr>
<tr>
<td>S-cache Data March test</td>
<td>S-cache RAM cells, S-cache data path, S-cache address path</td>
</tr>
<tr>
<td>S-cache Tag RAM March test</td>
<td>S-cache tag store RAM, S-cache bank address logic</td>
</tr>
<tr>
<td>I-cache Parity Error test</td>
<td>I-cache parity error detection, ISCR register and error forcing logic, IC_PERR_STAT register and reporting logic</td>
</tr>
<tr>
<td>D-cache Parity Error test</td>
<td>D-cache parity error detection, DC_MODE register and parity error forcing logic, DC_PERR_STAT register and reporting logic</td>
</tr>
<tr>
<td>S-cache Parity Error test</td>
<td>S-cache parity error detection, AC_CTL register and parity error forcing logic, SC_STAT register and reporting logic</td>
</tr>
<tr>
<td>IOD Access test</td>
<td>Access to IOD CSRs, data path through CAP chip and MDP0 on each IOD, PCI0 A/D lines &lt;31:0&gt;</td>
</tr>
</tbody>
</table>
2.4 SROM Errors Reported

The SROM reports machine checks, pending interrupt/exception errors, and errors related to corruption of FEPROM 0. If SROM errors are fatal, the particular CPU will hang and only the CPU self-test pass LEDs and/or the LEDs on the system bus to PCI bus bridge module will indicate the failure.

Example 2-1  SROM Errors Reported at Power-Up

Unexpected Machine Check (CPU Error)

UNEX_MCHK on CPU 0
EXCADR 42a9
EISTAT ffffffff004fffffff
EADDR ffffffff00000801f
SCSTAT 0
SCADDR FFFFFFFFFF000005F2F

Pending Interrupt/Exception (CPU Error)

INT-EXC on CPU0
ISR 400000
EISTAT ffffffff007fffffff
EADDR ffffffff000000fddf
FILSYN 631B
BCTGADR ffffffff000000afff

FEPROM Failures (PCI Motherboard Error)

Sctr 0 -XSROM header PTRN fail
Sctr 0 -XSROM head CHKS fail
Sctr 0 -XSROM code CHKS fail
Sctr 2 -XSROM head PTRN fail
Sctr 2 -XSROM head CHKS fail
Sctr 2 -XSROM code CHKS fail
2.5 XSROM Power-Up Test Flow

Once the SROM has completed its tests and verified the path to the FEPROM containing the XSROM code, it loads the first 8 Kbytes of XSROM into the primary CPU’s S-cache and jumps to it.

Figure 2-6  XSROM Power-Up Flowchart

Note: The XSROM can only print to the console device if the environment variable console = serial. It always sends output to the OCP.

XSROM tests are described in Table 2-3. Failure indicates a CPU failure.

After jumping to the primary CPU’s S-cache, the code then intentionally I-caches itself and is completely register based (no D-stream for stack or data storage is used). The only D-stream accesses are writes/reads during testing.

Each FEPROM has sixteen 64-Kbyte sectors. The first sector contains B-cache tests, memory tests, and a fail-safe loader. The second sector contains PALcode. The third sector contains a copy of the first sector. The remaining thirteen sectors contain the SRM console and decompression code.
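
The sector layout just described can be pictured with a short sketch. The Python below is illustrative only: the names `SECTOR_SIZE`, `sector_contents`, and `sector_range` are hypothetical, and the sketch assumes that sectors are numbered from 0 (as in the `Sctr 0` through `Sctr 3` messages) and laid out contiguously from offset 0.

```python
SECTOR_SIZE = 64 * 1024   # each FEPROM sector is 64 Kbytes
SECTOR_COUNT = 16         # sixteen sectors per FEPROM


def sector_contents(sector):
    """Return what the manual says is stored in a given FEPROM sector."""
    if not 0 <= sector < SECTOR_COUNT:
        raise ValueError("sector must be in the range 0-15")
    if sector == 0:
        return "B-cache tests, memory tests, fail-safe loader"
    if sector == 1:
        return "PALcode"
    if sector == 2:
        return "copy of sector 0"
    return "SRM console and decompression code"


def sector_range(sector):
    """Return (first, last) byte offsets, assuming contiguous sectors from 0."""
    if not 0 <= sector < SECTOR_COUNT:
        raise ValueError("sector must be in the range 0-15")
    start = sector * SECTOR_SIZE
    return start, start + SECTOR_SIZE - 1


if __name__ == "__main__":
    for n in range(SECTOR_COUNT):
        first, last = sector_range(n)
        print(f"sector {n:2d}  {first:#08x}-{last:#08x}  {sector_contents(n)}")
```

Sixteen 64-Kbyte sectors give 1 Mbyte per FEPROM, and thirteen of the sixteen sectors hold console and decompression code.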

**NOTE:** Memory tests are run during power-up and reset (see Table 2-4). They are also affected by the state of the memory_test environment variable, which can have the following values:

- **FULL**
  - Test all memory
- **PARTIAL**
  - Test up to the first 256 Mbytes
- **NONE**
  - Test 32 Mbytes
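
As a rough illustration of the three settings, the following Python sketch computes how much memory would be tested for a given amount of installed memory. The function name `bytes_tested` is hypothetical, and capping the NONE case at the installed size is an assumption that the manual does not state.

```python
MB = 1024 * 1024


def bytes_tested(memory_test, installed_bytes):
    """Return the number of bytes tested for a memory_test setting."""
    setting = memory_test.upper()
    if setting == "FULL":
        return installed_bytes                  # test all memory
    if setting == "PARTIAL":
        return min(installed_bytes, 256 * MB)   # up to the first 256 Mbytes
    if setting == "NONE":
        # Assumption: never more than what is installed.
        return min(installed_bytes, 32 * MB)    # 32 Mbytes
    raise ValueError("memory_test must be FULL, PARTIAL, or NONE")


if __name__ == "__main__":
    for setting in ("FULL", "PARTIAL", "NONE"):
        print(setting, bytes_tested(setting, 512 * MB) // MB, "Mbytes")
```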

### Table 2-3  XSROM Tests

<table>
<thead>
<tr>
<th>Test</th>
<th>Test Name</th>
<th>Logic Tested</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>B-cache Tag Data Line test</td>
<td>Access to B-cache tags, shorts between tag data and its status and parity bits</td>
</tr>
<tr>
<td>12</td>
<td>B-cache Tag March test</td>
<td>B-cache tag store RAMs, B-cache STAT store RAMs</td>
</tr>
<tr>
<td>13</td>
<td>B-cache Data Line test</td>
<td>B-cache data lines to B-cache data RAMs, B-cache read/write logic</td>
</tr>
<tr>
<td>14</td>
<td>B-cache Data March test</td>
<td>B-cache data RAMs, CPU chip B-cache control, CPU chip B-cache address decode, INDEX_H&lt;2x:6&gt; (address bus)</td>
</tr>
<tr>
<td>15</td>
<td>B-cache ECC Data Line test</td>
<td>CPU chip ECC generation and checking logic, ECC lines from CPU chip to B-cache, B-cache ECC RAMs</td>
</tr>
<tr>
<td>16</td>
<td>B-cache Data ECC March test</td>
<td>Portion of B-cache data RAMs used for ECC</td>
</tr>
<tr>
<td>17</td>
<td>CPU chip ECC Single/Double bit Error test</td>
<td>CPU chip ECC single-bit error detection and correction, ECC double-bit error detection, ECC error reporting</td>
</tr>
<tr>
<td>18</td>
<td>B-cache Tag Store Parity Error test</td>
<td>B-cache tag array, CPU parity detection, EI_ADDR and EI_STAT register operation</td>
</tr>
<tr>
<td>19</td>
<td>B-cache STAT Store Parity Error test</td>
<td>B-cache STAT array, CPU chip B-cache STAT parity generation/detection</td>
</tr>
<tr>
<td>20</td>
<td>Memory Data test</td>
<td>Data path to and from memory<br>Data path on memory and RAMs</td>
</tr>
<tr>
<td>21</td>
<td>Memory Address test</td>
<td>Address path to and from memory<br>Address path on memory and RAMs</td>
</tr>
<tr>
<td>23*</td>
<td>Memory Bitmap Building</td>
<td>No new logic</td>
</tr>
<tr>
<td>24</td>
<td>Memory March test</td>
<td>No new logic</td>
</tr>
</tbody>
</table>

\* There is no test 22.

The XSROM reports B-cache test errors and memory test errors. It also reports a warning if memory is illegally configured.

Example 2-2  XSROM Errors Reported at Power-Up

B-cache Error (CPU Error)

TEST ERR on cpu0 #CPU running the test
FRU: cpu0
err# 2
tst# 11
exp: 5555555555555555 #Expected data
rcv: aaaaaaaaaaaaaaa #Received data
adr: ffff8 #B-cache location where error occurred

Memory Error (Memory Module Indicated)

20..21..
TEST ERR on cpu0 #CPU running test
FRU: MEM1L #Low member of memory pair 1
err# c
tst# 21
22..23..24..Memory testing complete on cpu0

Memory Configuration Error (Operator Error)

ERR! mem_pair0 misconfigured
ERR! mem_pair1 card size mismatch
ERR! mem_pair1 card type mismatch
ERR! mem_pair1 EMPTY

FEPROM Failures (PCI Motherboard Error)

Sctr 1 -PAL headr PTTRN fail
Sctr 1 -PAL headr CHKSM fail
Sctr 1 -PAL code CHKSM fail
Sctr 3 -CONSOLE headr PTTRN fail
Sctr 3 -CONSOLE headr CHKSM fail
Sctr 3 -CONSOLE code CHKSM fail
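
When a B-cache or memory error report like those in Example 2-2 is captured from COM1, the fields can be picked apart mechanically. The following Python sketch is one possible way to do so and is not a DIGITAL tool: the names `parse_report` and `failing_bits` are hypothetical, the test-name table is copied from Table 2-3, and the sample report uses made-up data values.

```python
import re

# Test numbers and names from Table 2-3.
XSROM_TESTS = {
    11: "B-cache Tag Data Line test",
    12: "B-cache Tag March test",
    13: "B-cache Data Line test",
    14: "B-cache Data March test",
    15: "B-cache ECC Data Line test",
    16: "B-cache Data ECC March test",
    17: "CPU chip ECC Single/Double bit Error test",
    18: "B-cache Tag Store Parity Error test",
    19: "B-cache STAT Store Parity Error test",
    20: "Memory Data test",
    21: "Memory Address test",
    23: "Memory Bitmap Building",
    24: "Memory March test",
}

FIELD = re.compile(r"^(FRU:|err#|tst#|exp:|rcv:|adr:)\s*(\S+)")


def parse_report(text):
    """Collect the labeled fields of one XSROM error report into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1).rstrip(":#")] = match.group(2)
    return fields


def failing_bits(exp_hex, rcv_hex):
    """Return the bit positions where expected and received data differ."""
    diff = int(exp_hex, 16) ^ int(rcv_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Hypothetical report: the layout follows Example 2-2, the data is invented.
SAMPLE = """TEST ERR on cpu0
FRU: cpu0
err# 2
tst# 11
exp: 5555555555555555
rcv: 5555555555555557
adr: ffff8
"""

if __name__ == "__main__":
    report = parse_report(SAMPLE)
    print(report["FRU"], XSROM_TESTS[int(report["tst"])])
    print("differing bits:", failing_bits(report["exp"], report["rcv"]))
```

Comparing `exp` with `rcv` bit by bit is only a convenience for reading the report; the FRU to replace is the one named on the `FRU:` line.
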
2.7 Console Power-Up Tests

Once the SRM console is loaded, it does further testing of each IOD. Table 2-5 describes the IOD power-up tests, and Table 2-6 describes the PCI motherboard power-up tests.

### Table 2-5 IOD Tests

<table>
<thead>
<tr>
<th>Test Number</th>
<th>Test Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>IOD CSR Access test</td>
<td>Read and write all CSRs in each IOD.</td>
</tr>
<tr>
<td>2</td>
<td>Loopback test</td>
<td>Writes to each IOD’s PCI dense space to check the integrity of the ECC lines on the IODs.</td>
</tr>
<tr>
<td>3</td>
<td>ECC test</td>
<td>Loopback tests similar to test 2 but with a varying pattern to create an ECC of 0s. Single- and double-bit errors are checked.</td>
</tr>
<tr>
<td>4</td>
<td>Parity Error and Fill Error tests</td>
<td>Parity errors are forced on the address and data lines on system bus and PCI buses. A fill error transaction is forced on the system bus.</td>
</tr>
<tr>
<td>5</td>
<td>Translation Error test</td>
<td>A loopback test using scatter/gather address translation logic on each IOD.</td>
</tr>
<tr>
<td>6</td>
<td>Write Pending test</td>
<td>Runs test 2 with the write-pending bit set and clear in the CAP chip control register.</td>
</tr>
<tr>
<td>7</td>
<td>PCI Loopback test</td>
<td>Loops data through each PCI on each IOD, testing the mask field of the system bus.</td>
</tr>
<tr>
<td>8</td>
<td>PCI Peer-to-Peer Byte Mask test</td>
<td>Tests that devices on the same PCI and on different PCIs can communicate.</td>
</tr>
</tbody>
</table>
### Table 2-6  PCI Motherboard Tests (B3050 only)

<table>
<thead>
<tr>
<th>Test Number</th>
<th>Test Name</th>
<th>Diagnostic Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>PCEB</td>
<td>pceb_diag</td>
<td>Tests the PCI to EISA bridge chip</td>
</tr>
<tr>
<td>2</td>
<td>ESC</td>
<td>esc_diag</td>
<td>Tests the EISA system controller</td>
</tr>
<tr>
<td>3</td>
<td>8K NVRAM</td>
<td>nvramp_diag</td>
<td>Tests the NVRAM</td>
</tr>
<tr>
<td>4</td>
<td>Real-Time Clock</td>
<td>ds1287_diag</td>
<td>Tests the real-time clock chip</td>
</tr>
<tr>
<td>5</td>
<td>Keyboard and Mouse</td>
<td>i8242_diag</td>
<td>Tests the keyboard/mouse chip</td>
</tr>
<tr>
<td>6</td>
<td>Flash ROM</td>
<td>flash_diag</td>
<td>Dumps contents of flash ROM</td>
</tr>
<tr>
<td>7</td>
<td>Serial and Parallel Ports and Floppy</td>
<td>combo_diag</td>
<td>Tests COM ports 1 and 2, the parallel port, and the floppy</td>
</tr>
<tr>
<td>8</td>
<td>CD-ROM</td>
<td>ncr810_diag</td>
<td>Tests the CD-ROM controller</td>
</tr>
</tbody>
</table>

For both the IOD tests and the PCI 0 and PCI 1 tests, trace and failure status is sent to the OCP. If any of these tests fails, a warning is sent to the SRM console device after the console prompt (or AlphaBIOS pop-up box). The LEDs on the system bus to PCI bus bridge module are controlled by the diagnostics. If an LED is off, a failure occurred.
2.8 Console Device Determination

After the SROM and XSROM have completed their tasks, the SRM console program, as it starts, determines where to send its power-up messages.

Figure 2-7  Console Device Determination Flowchart

[Flowchart. Entry point: power-up, reset, or `init` at the P00>>> prompt. Decisions: console envar = serial; console envar = graphics; VGA adapter on PCI0. Outcomes: enable COM port 1 and send messages as the system is powering up; enable COM port 1 and send messages as the system is powering up, with a warning message sent if a VGA adapter is seen on PCI 1; VGA becomes the console device.]

Console Device Options

The console device can be either a serial terminal or a graphics monitor. Specifically:

- A serial terminal connected to COM1 off the server control module. The terminal connected to COM1 must be set to 9600 baud. This baud rate cannot be changed.

- A graphics monitor off an adapter on PCI0.

Systems running Windows NT must have a graphics monitor as the console device and run AlphaBIOS as the console program.

During power-up, the SROM and the XSROM always send progress and error messages to the OCP. They also send them to the COM1 serial port if the SRM `console` environment variable (set with the `set console` command) is set to `serial`. If the `console` environment variable is set to `graphics`, no messages are sent to COM1.

If the console device is connected to COM1, the SROM, XSROM, and console power-up messages are sent to it once it has been initialized. If the console device is a graphics device, console power-up messages are sent to it, but SROM and XSROM power-up messages are lost. Regardless of the `console` environment variable setting, each of the three programs sends messages to the control panel display.

<table>
<thead>
<tr>
<th>Messages Sent By</th>
<th>Console Set to Serial</th>
<th>Console Set to Graphics</th>
</tr>
</thead>
<tbody>
<tr>
<td>SROM</td>
<td>COM1</td>
<td>Lost</td>
</tr>
<tr>
<td>XSROM</td>
<td>COM1</td>
<td>Lost</td>
</tr>
<tr>
<td>SRM console</td>
<td>COM1</td>
<td>VGA</td>
</tr>
</tbody>
</table>
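
The routing rules above can be restated as a small lookup. The Python sketch below is only a summary of this section, not firmware code; the function name `message_destinations` and the short labels `"OCP"`, `"COM1"`, and `"VGA"` are chosen here for illustration.

```python
def message_destinations(program, console):
    """Return where power-up messages from a program are displayed.

    program: "SROM", "XSROM", or "SRM"
    console: value of the console environment variable
    """
    if program not in ("SROM", "XSROM", "SRM"):
        raise ValueError("unknown program: " + program)
    # All three programs always write to the control panel display (OCP).
    destinations = ["OCP"]
    if console == "serial":
        destinations.append("COM1")
    elif console == "graphics":
        # SROM and XSROM messages are lost; only the SRM console reaches VGA.
        if program == "SRM":
            destinations.append("VGA")
    else:
        raise ValueError("console must be serial or graphics")
    return destinations


if __name__ == "__main__":
    for program in ("SROM", "XSROM", "SRM"):
        for console in ("serial", "graphics"):
            print(program, console, message_destinations(program, console))
```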

Changing Where Console Output Is Displayed

You can change where console output is displayed, assuming the SRM console has fully powered up and the `os_type` environment variable is set to `openvms` or `unix`. (The following does not work if `os_type` is set to `nt`.)

If the `console` environment variable is set to `serial` and no serial terminal is attached to COM1, pressing a carriage return on a graphics monitor attached to the system makes it the console device, and the console prompt is sent to it. If the `console` environment variable is set to `graphics` and no graphics monitor is attached to the adapter, pressing a carriage return on a serial terminal attached to COM1 makes it the console device, and the console prompt is sent to it.
2.9 Console Power-Up Display

The entire power-up display prints to a serial terminal (if the `console` environment variable is set to `serial`), and parts of it print to the control panel display. The last several lines print to either a serial terminal or a graphics monitor.

Example 2-3   Power-Up Display

```
SROM V1.0 on cpu0
SROM V1.0 on cpu1
SROM V1.0 on cpu2
SROM V1.0 on cpu3
XSROM V1.0 on cpu2
XSROM V1.0 on cpu1
XSROM V1.0 on cpu3
XSROM V1.0 on cpu0
BCache testing complete on cpu2
BCache testing complete on cpu0
BCache testing complete on cpu3
BCache testing complete on cpu1
mem_pair0 - 128 MB
mem_pair1 - 128 MB
20..20..21..20..21..20..21..21..23..24..24..24..24..
Memory testing complete on cpu0
Memory testing complete on cpu1
Memory testing complete on cpu3
Memory testing complete on cpu2
```
At power-up or reset, the SROM code on each CPU module is loaded into that module’s I-cache and tests the module. If all tests pass, the processor’s LED lights. If any test fails, the LED remains off and power-up testing terminates on that CPU.

The first determination of the primary processor is made, and the primary processor executes a loopback test to each PCI bridge. If this test passes, the bridge LED lights. If it fails, the LED remains off and power-up continues. The EISA system controller, PCI-to-EISA bridge, COM1 port, and control panel port are all initialized thereafter.

Each CPU prints an SROM banner to the device attached to the COM1 port and to the control panel display. (The banner prints to the COM1 port if the console environment variable is set to serial. If it is set to graphics, nothing prints to the console terminal, only to the control panel display, until ⑥).

Each processor's S-cache is initialized, and the XSROM code in the FEPROM on the PCI 0 is unloaded into them. (If the unload is not successful, a copy is unloaded from a different FEPROM sector. If the second try fails, the CPU hangs.)

Each processor jumps to the XSROM code and sends an XSROM banner to the COM1 port and to the control panel display.

The three S-cache banks on each processor are enabled, and then the B-cache is tested. If a failure occurs, a message is sent to the COM1 port and to the control panel display.

Each CPU sends a B-cache completion message to COM1.

The primary CPU is again determined, and it sizes memory by reading memory registers on the I²C bus.

The information on memory pairs is sent to COM1. If an illegal memory configuration is detected, a warning message is sent to COM1 and the control panel display.

Memory is initialized and tested, and the test trace is sent to COM1 and the control panel display. Each CPU participates in the memory testing. The numbers for tests 20 and 21 might appear interspersed, as in Example 2-3. This is normal behavior. Test 24 can take several minutes if the memory is very large. The message “P0 TEST 24 MEM**” is displayed on the control panel display; the second asterisk rotates to indicate that testing is continuing. If a failure occurs, a message is sent to the COM1 port and to the control panel display.

Each CPU sends a test completion message to COM1.

Example 2-3  Power-Up Display (Continued)

starting console on CPU 0  
sizing memory  
  0  128 MB SYNC  
  1  128 MB SYNC  
starting console on CPU 1  
starting console on CPU 2  
starting console on CPU 3  
probing IOD1 hose 1  
  bus 0 slot 1 - NCR 53C810  
  bus 0 slot 2 - DECchip 21041-AA  
  bus 0 slot 3 - NCR 53C810  
  bus 0 slot 4 - DECchip 21040-AA  
probing IOD0 hose 0  
  bus 0 slot 1 - PCEB  
Configuring I/O adapters...  
AlphaServer 4100 Console V1.0, 13-MAR-1996 18:18:26

P00>>>
The final primary CPU determination is made. The primary CPU unloads PALcode and decompression code from the FEPROM on the PCI 0 to its B-cache. The primary CPU then jumps to the PALcode to start the SRM console.

The primary CPU prints a message indicating that it is running the console. Starting with this message, the power-up display is printed to the default console terminal, regardless of the state of the `console` environment variable. (If `console` is set to `graphics`, the display from here to the end is saved in a memory buffer and printed to the graphics monitor after the PCI buses are sized and the graphics device is initialized.)

The size and type of each memory pair is determined.

The console is started on each of the secondary CPUs. A status message prints for each CPU.

The PCI bridges (indicated as IODn) are probed and the devices are reported. I/O adapters are configured.

The SRM console banner and prompt are printed. (The SRM prompt is shown in this manual as P00>>>. It can, however, be P01>>>, P02>>>, or P03>>>. The number indicates the primary processor.) If the `auto_action` environment variable is set to `boot` or `restart` and the `os_type` environment variable is set to `unix` or `openvms`, the DIGITAL UNIX or OpenVMS operating system boots.

If the system is running the Windows NT operating system (the `os_type` environment variable is set to `nt`), the SRM console loads and starts the AlphaBIOS console and does not print the SRM banner or prompt.
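
The end of the power-up sequence depends on two environment variables, which the following Python sketch summarizes. It is a reading aid under stated assumptions: the function name `after_console_start` is hypothetical, and the fallback of remaining at the SRM prompt for any other combination (for example, `auto_action` set to `halt`) is an assumption rather than something this section states.

```python
def after_console_start(os_type, auto_action):
    """Describe what happens once the SRM console has started."""
    if os_type == "nt":
        # The SRM console loads AlphaBIOS; no SRM banner or prompt is printed.
        return "load and start AlphaBIOS"
    if os_type in ("unix", "openvms") and auto_action in ("boot", "restart"):
        name = "DIGITAL UNIX" if os_type == "unix" else "OpenVMS"
        return "boot " + name
    # Assumption: otherwise the system stays at the SRM prompt.
    return "stay at SRM prompt"


if __name__ == "__main__":
    for os_type in ("unix", "openvms", "nt"):
        for auto_action in ("boot", "restart", "halt"):
            print(os_type, auto_action, "->", after_console_start(os_type, auto_action))
```
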
2.10 Fail-Safe Loader

The fail-safe loader is a software routine that loads the SRM console image from floppy. Once the console is running, run LFU to update FEPROM 0 with a new image.

NOTE: FEPROM 0 contains images of the SROM, XSROM, PAL, decompression, and SRM console code.

If the fail-safe loader loads, the following conditions exist on the machine:

- The SROM has passed its tests and successfully unloaded the XSROM. If the SROM fails to unload both copies of XSROM, it reports the failure to the control panel display and COM1 if possible, and the system hangs.
- The XSROM has completed its B-cache and memory tests but has failed to unload the PAL code in FEPROM 0 sector 1 or the SRM console code.
- The XSROM reports the errors encountered and loads the fail-safe loader.

This chapter describes troubleshooting during power-up and booting, as well as diagnostics for AlphaServer 4000/4100 systems. The following topics are covered:

- Troubleshooting with LEDs
- Troubleshooting Power Problems
- Running Diagnostics—Test Command
3.1 Troubleshooting with LEDs

During power-up, reset, initialization, or testing, diagnostics are run on CPUs, memories, bridge modules, PCI motherboards, and sometimes options. The following sections describe possible problems that can be identified by checking LEDs.

**Figure 3-1 CPU and Bridge Module LEDs**

<table>
<thead>
<tr>
<th>Bridge Module LEDs (IOD 0 &amp; 1)</th>
<th>CPU LEDs</th>
</tr>
</thead>
<tbody>
<tr>
<td>IOD0 Self-Test Pass</td>
<td>DC_OK</td>
</tr>
<tr>
<td>IOD1 Self-Test Pass</td>
<td>SROM Oscillator</td>
</tr>
<tr>
<td>POWER_FAN_OK</td>
<td>CPU Self-Test Pass</td>
</tr>
<tr>
<td>TEMP_OK</td>
<td>Regulator OK (EV56)</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Bridge Module LEDs (IOD 2 &amp; 3)</th>
</tr>
</thead>
<tbody>
<tr>
<td>IOD2 Self-Test Pass</td>
</tr>
<tr>
<td>IOD3 Self-Test Pass</td>
</tr>
</tbody>
</table>

CPU LEDs

- If the CPU self-test pass (STP) LED on any CPU module is lit, that CPU chip is functioning properly. If the operating system is Windows NT and the CPU STP LED is off, that CPU may or may not be functioning.

  You can use the Halt button on the OCP to prevent the AlphaBIOS console (which turns off the CPU STP LED) from booting, thus assuring the validity of the CPU STP LED. If the LED is off, replace the CPU. If the LED is lit, you can use the SRM console command `alphabios` to load and run the AlphaBIOS console.

- The top LED on a CPU module is the DC OK LED. It is driven by the power control module (PCM). If it is not lit, there are probably power problems.

- The second LED from the top on a CPU module lights only when the SROM on the CPU is loaded.

- On modules with EV56 processors, a fourth LED is present at the bottom of the column. This LED is normally on, indicating that the power regulator on the module is working properly. If the LED is off, replace the module.

System Bus to PCI Bus Bridge Module LEDs (B3040-AA)

There are four LEDs on the B3040-AA system bus to PCI bus bridge module:

- The top two LEDs indicate the condition of the bridge module. If either is off, the module should be replaced.

- The bottom two LEDs are passed from the PCM. Both should be on during normal operation. If either is off while the system is on, the LEDs on the PCM should indicate what failed. If they do not, either the PCM is broken or the bridge module is not passing the signals to the LEDs.

  NOTE: If AC power is applied, the system is off, and a power supply is operating, the power LED (the upper of the bottom two LEDs) flashes to indicate the presence of Vaux (auxiliary voltage).

System Bus to PCI Bus Bridge Module LEDs (B3040-AB)

There are two LEDs on the B3040-AB system bus to PCI bus bridge module:

- The two LEDs indicate the condition of the bridge module. If either is off, the module should be replaced.
3.1.1 Cabinet Power and Fan LEDs

Figure 3-2  Cabinet Power and Fan LEDs

A cabinet system has three exhaust fans at the top of the cabinet. They are powered from a small power supply in the fan tray. This power supply also powers the server control module at the bottom of the PCI card cage to allow remote access to the system. A failure of the power supply is indicated only by the LEDs. No messages are displayed.

There are two LEDs on the top panel: a fan LED and a power LED.

- When the fan LED (amber) is flashing, a cabinet fan needs replacing. Check which fan appears broken: either it is not spinning at all or it is turning more slowly than the others.
- When the power LED (green) is off, either the power supply in the fan tray is broken or there is a power problem.
3.2 Troubleshooting Power Problems

Power problems can occur before the system is up or while the system is running. If a system stops running, make a habit of checking the PCM.

Power Problem List

The system will halt for any of the following reasons:

1. A CPU fan failure
2. A system fan failure
3. An overtemperature condition
4. Power supplied out of tolerance
5. Circuit breaker(s) tripped
6. AC problem
7. Interlock switch activation or failure
8. PCM failure
9. Environmental electrical failure or unrecoverable system fault with the auto_action environment variable set to halt or boot
10. Operator error: failure to unplug all power supplies and let Vaux drain (10-second delay) before restarting
11. Cable failure
12. Module failure - System motherboard, PCI motherboard, or system bus to PCI bus bridge
13. SCM breaking the interlock circuit

Indications of failure:

1. Power control module LEDs indicate CPU fan, system fan, overtemperature, and power supply failures
2. Circuit breaker(s) tripped

The power system gives no obvious indication of failures 7 through 13.

If Halt Is Caused by Power, Fan, or Overtemperature

If a system is stopped because of a power, fan, or overtemperature problem, use the PCM LEDs to diagnose the problem. See Section 3.2.1.
If Power Problem Occurs at Power-Up

If the system has a power problem on a cold start, the PCM LEDs are not valid until after DCOK_SENSE has been asserted. The cause is one of the following:

- Broken system fan
- Broken CPU fan
- Power supplied to the system is out of tolerance (a power supply could be broken and the system could still power up)
- PCM failure
- Interlock failure
- Wire problems
- Temperature problem (unlikely)

Recommended Order for Troubleshooting Failure at Power-Up

1. Check whether any CPU fan or system fan is not spinning. A fan can fail by not spinning, by not producing the tachometer output that the PCM comparator needs to check the fan, or both. (See steps 4 and 5.) Replace any broken fan.
2. Replace the PCM.
3. Remove the CPUs one at a time, trying to power up after each removal. If the system powers up, the CPU you removed last had a fan failure.
4. Check the output of the power supplies. See Section 4.1 for the locations of the +5 and +3.43 volt output pins. If the output is above or below the threshold, replace the faulty power supply.
5. Check the output of each system fan with a voltmeter. Probe the middle of the fan’s three outputs with the positive lead of the meter and ground the other lead. The meter should read between 2.5 and 3 volts. If a fan’s output is outside this range, replace the fan.

NOTE: You will have to disable the interlocks to check the voltages in step 5. You will have only 10 seconds to measure them. There is a 10-second delay before the PCM turns off the power.
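
If you record the step 5 readings, the 2.5 to 3 volt criterion can be applied mechanically. The Python sketch below is only a convenience for checking noted values: the function name `fans_to_replace` and the sample readings are hypothetical.

```python
FAN_OUTPUT_MIN_V = 2.5   # lower limit from step 5
FAN_OUTPUT_MAX_V = 3.0   # upper limit from step 5


def fans_to_replace(readings):
    """Return the fans whose measured output is outside 2.5 to 3 volts.

    readings maps a fan label to the voltage read on its middle output.
    """
    return [
        fan
        for fan, volts in readings.items()
        if not FAN_OUTPUT_MIN_V <= volts <= FAN_OUTPUT_MAX_V
    ]


if __name__ == "__main__":
    # Invented readings, for illustration only.
    measured = {"system fan 0": 2.8, "system fan 1": 1.2, "system fan 2": 3.0}
    print(fans_to_replace(measured))
```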

The PCM must sense a change in Vaux (auxiliary voltage) to start the power supplies. Pressing the On button has no effect if the machine halted because of a failure in the power system. The power supplies must be unplugged and plugged back in for the On button to work.
3.2.1 Power Control Module LEDs

The PCM has 11 LEDs visible through the system card cage. The LED display shows the relative placement of the LEDs.

Figure 3-3  PCM LEDs

- DCOK_SENSE
- PS0_OK
- PS1_OK
- PS2_OK
- TEMP_OK
- CPUFAN_OK
- SYSFAN_OK
- CS_FAN0
- CS_FAN1
- CS_FAN2
- C_FAN3

- Normally On
- Tested at one-second intervals
- Off if power supply not present or broken
### Table 3-1  Power Control Module LED States

<table>
<thead>
<tr>
<th>LED</th>
<th>State</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCOK_SENSE</td>
<td>On</td>
<td>Both +5.0V and +3.43V are present and within limits.</td>
</tr>
<tr>
<td>PS0_OK</td>
<td>On</td>
<td>Power supply 0 is present and has asserted POK_H.</td>
</tr>
<tr>
<td>PS1_OK</td>
<td>Off</td>
<td>Power supply 1 not present.</td>
</tr>
<tr>
<td></td>
<td>On</td>
<td>Power supply 1 is present and has asserted POK_H.</td>
</tr>
<tr>
<td>PS2_OK</td>
<td>On</td>
<td>Power supply 2 is present and has asserted POK_H.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>Power supply 2 not present.</td>
</tr>
<tr>
<td>TEMP_OK</td>
<td>On</td>
<td>The system temperature is below 55° C.</td>
</tr>
<tr>
<td>CPUFAN_OK</td>
<td>On</td>
<td>All CPU fans are OK.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>A CPU fan has failed. The specific fan is identified by the CS_FANx or C_FAN3 LED that remains lit.</td>
</tr>
<tr>
<td>SYSFAN_OK</td>
<td>On</td>
<td>All system fans are OK.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>A system fan has failed. The specific fan is identified by the CS_FANx that remains lit.</td>
</tr>
<tr>
<td>CS_FAN0</td>
<td>On</td>
<td>CPU fan 0 and system fan 0 are being sampled or one of them has failed as indicated by CPUFAN_OK and SYSFAN_OK.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>CPU fan 0 and system fan 0 are not being sampled and are functioning properly.</td>
</tr>
<tr>
<td>CS_FAN1</td>
<td>On</td>
<td>CPU fan 1 and system fan 1 are being sampled or one of them has failed as indicated by CPUFAN_OK and SYSFAN_OK.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>CPU fan 1 and system fan 1 are not being sampled and are functioning properly.</td>
</tr>
<tr>
<td>CS_FAN2</td>
<td>On</td>
<td>CPU fan 2 and system fan 2 are being sampled or one of them has failed as indicated by CPUFAN_OK and SYSFAN_OK.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>CPU fan 2 and system fan 2 are not being sampled and are functioning properly.</td>
</tr>
<tr>
<td>C_FAN3</td>
<td>On</td>
<td>CPU fan 3 is being sampled or has failed as indicated by CPUFAN_OK.</td>
</tr>
<tr>
<td></td>
<td>Off</td>
<td>CPU fan 3 is not being sampled and is functioning properly.</td>
</tr>
</tbody>
</table>
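
Table 3-1 can be read as a small decision procedure. The following Python sketch shows one way to turn a set of lit PCM LEDs into a list of findings. It is an illustration, not a DIGITAL diagnostic: the function name `diagnose` is hypothetical, it assumes the LEDs are observed after a fault has stopped the system (so a lit CS_FANx or C_FAN3 LED marks a failed fan rather than a fan being sampled), and it treats an unlit TEMP_OK as an overtemperature condition.

```python
def diagnose(lit_leds):
    """Interpret the PCM LEDs that are lit after a power-system fault."""
    lit = set(lit_leds)
    findings = []
    if "DCOK_SENSE" not in lit:
        findings.append("+5.0V or +3.43V missing or out of limits")
    for n in range(3):
        if f"PS{n}_OK" not in lit:
            findings.append(f"power supply {n} not present or broken")
    if "TEMP_OK" not in lit:
        findings.append("overtemperature")
    if "CPUFAN_OK" not in lit:
        # The failed CPU fan is named by the CS_FANx or C_FAN3 LED still lit.
        for led, fan in (("CS_FAN0", 0), ("CS_FAN1", 1), ("CS_FAN2", 2), ("C_FAN3", 3)):
            if led in lit:
                findings.append(f"CPU fan {fan} failed")
    if "SYSFAN_OK" not in lit:
        # The failed system fan is named by the CS_FANx LED still lit.
        for n in range(3):
            if f"CS_FAN{n}" in lit:
                findings.append(f"system fan {n} failed")
    return findings


if __name__ == "__main__":
    observed = {"DCOK_SENSE", "PS0_OK", "PS1_OK", "PS2_OK",
                "TEMP_OK", "SYSFAN_OK", "CS_FAN1"}
    print(diagnose(observed))
```

A system with fewer than three power supplies will always report the empty positions, so compare the result with the installed configuration.
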
3.3 Maintenance Bus (I²C Bus)

The I²C bus (referred to as the “I squared C bus”) is a small internal maintenance bus used to monitor system conditions scanned by the power control module, write the fault display, store error state, and track configuration information in the system. Although all system modules (not I/O modules) sit on the maintenance bus, only the I²C controller accesses it. Everything written or read on the I²C bus is done by the controller. The block diagram below notes differences between the AlphaServer 4000 and 4100 with respect to the I²C bus.

Figure 3-4  I²C Bus Block Diagram
Monitor
The I²C bus monitors the state of system conditions scanned by the PCM. There are two registers on the PCM:

- One records the state of the fans and power supplies and is latched when there is a fault.
- The other causes an interrupt on the I²C bus when a CPU or system fan fails, an overtemperature condition exists, or power supplied to the system is out of tolerance.

The interrupt received by the I²C bus controller on PCI 0 alerts the system of imminent power shutdown. The controller has 30 seconds to read the two registers and store the information in the EEPROM on the PCM. The SRM console command `show power` reads these registers.

Fault Display
The OCP display is written through the I²C bus.

Error State
Error state is written and read for power conditions. The state of the Halt button (in/out) is read on the I²C bus.

Configuration Tracking
Each CPU, PCI bridge, PCI motherboard, and system motherboard has an EEPROM that contains information about the module that can be written and read over the I²C bus. All modules contain the following information:

- Module type
- Module serial number
- Hardware revision
- Firmware revision
- Memory size (only required for memory modules)
3.4 Running Diagnostics — Test Command

The test command runs diagnostics on the entire system, CPU devices, memory devices, and the PCI I/O subsystem. The test command runs only from the SRM console. Ctrl/C stops the test.

Example 3-1   Test Command Syntax

P00>>> help test
FUNCTION

SYNOPSIS
   test [-q] [-t <time>] [option]
   where option is:
      cpu<n>
      mem<n>
      pci<n>

   and n can be one of 0, 1, 2, 3, or *.

   The entire system is tested by default if no option specified.

NOTE: If you are running the Microsoft Windows NT operating system, switch from AlphaBIOS to the SRM console in order to enter the test command. From the AlphaBIOS console, press in the Halt button (the LED will light) and reset the system, or select DIGITAL UNIX (SRM) or OpenVMS (SRM) from the Advanced CMOS Setup screen and reset the system.

   test [-t time] [-q] [option]

   -t time     Specifies the run time in seconds. The default for system test is 600 seconds (10 minutes).

   -q          Disables the display of status messages as exerciser processes are started and stopped during testing.

   option      Either cpun, memn, or pcin, where n is 0, 1, 2, 3, or *. If nothing is specified, the entire system is tested.
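
When test commands are prepared in advance (for example, for a scripted terminal session), the syntax above can be checked before the command is typed at the console. The Python sketch below only assembles and validates the command string; the function name `build_test_command` is hypothetical, and it assumes that n is appended directly to the option name (for example, `cpu0` or `mem*`).

```python
import re

# Assumption: the option is cpu, mem, or pci followed directly by n.
OPTION = re.compile(r"(cpu|mem|pci)[0-3*]")


def build_test_command(option=None, seconds=None, quiet=False):
    """Assemble an SRM console test command line from its parts."""
    parts = ["test"]
    if seconds is not None:
        if seconds <= 0:
            raise ValueError("run time must be a positive number of seconds")
        parts += ["-t", str(seconds)]
    if quiet:
        parts.append("-q")            # suppress exerciser status messages
    if option is not None:
        if not OPTION.fullmatch(option):
            raise ValueError("option must be cpun, memn, or pcin")
        parts.append(option)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_test_command())                       # whole system, default run time
    print(build_test_command("mem*", seconds=120, quiet=True))
```
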
3.5 Testing an Entire System

A test command with no modifiers runs all exercisers for subsystems and devices on the system. I/O devices tested are supported boot devices. The test runs for 10 minutes.

Example 3-2  Sample Test Command

P00>>> test
Console is in diagnostic mode
System test, runtime 600 seconds

Type ^C to stop testing

Configuring system..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1  SCSI Bus ID 7
dka500.5.0.1.1  DKA500  ROM 1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1  SCSI Bus ID 7
dkb200.2.0.3.1  DKB200  0007
dkb400.4.0.3.1  DKB400  0007
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0  DVA0  RM23
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1: 08-00-2B-E5-B4-1A

Testing EM0 network device

Testing VGA (alphanumeric mode only)

Starting background memory test, affinity to all CPUs..
Starting processor/cache thrasher on each CPU..
Starting processor/cache thrasher on each CPU..
Starting processor/cache thrasher on each CPU..

Testing SCSI disks (read-only)
No CD-ROM present, skipping embedded SCSI test
Testing other SCSI devices (read-only)..

Testing floppy drive (dva0, read-only)
<table>
<thead>
<tr>
<th>ID</th>
<th>Program</th>
<th>Device</th>
<th>Pass</th>
<th>Hard/Soft</th>
<th>Bytes Written</th>
<th>Bytes Read</th>
</tr>
</thead>
<tbody>
<tr>
<td>00003047</td>
<td>memtest</td>
<td>memory</td>
<td>1</td>
<td>0</td>
<td>134217728</td>
<td>134217728</td>
</tr>
<tr>
<td>00003050</td>
<td>memtest</td>
<td>memory</td>
<td>205</td>
<td>0</td>
<td>213883392</td>
<td>213883392</td>
</tr>
<tr>
<td>00003059</td>
<td>memtest</td>
<td>memory</td>
<td>192</td>
<td>0</td>
<td>200253568</td>
<td>200253568</td>
</tr>
<tr>
<td>00003062</td>
<td>memtest</td>
<td>memory</td>
<td>192</td>
<td>0</td>
<td>200253568</td>
<td>200253568</td>
</tr>
<tr>
<td>00003084</td>
<td>memtest</td>
<td>memory</td>
<td>80</td>
<td>0</td>
<td>82827392</td>
<td>82827392</td>
</tr>
<tr>
<td>000030d8</td>
<td>exer_kid</td>
<td>dkb200.2.0.3</td>
<td>26</td>
<td>0</td>
<td>13690880</td>
<td>13690880</td>
</tr>
<tr>
<td>000030d9</td>
<td>exer_kid</td>
<td>dkb400.4.0.3</td>
<td>26</td>
<td>0</td>
<td>13674496</td>
<td>13674496</td>
</tr>
<tr>
<td>0000310d</td>
<td>exer_kid</td>
<td>dva0.0.0.100</td>
<td>0</td>
<td>0</td>
<td>327680</td>
<td>327680</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>00003047</td>
<td>memtest</td>
<td>memory</td>
<td>1</td>
<td>0</td>
<td>432013312</td>
<td>432013312</td>
</tr>
<tr>
<td>00003050</td>
<td>memtest</td>
<td>memory</td>
<td>635</td>
<td>0</td>
<td>664716032</td>
<td>664716032</td>
</tr>
<tr>
<td>00003059</td>
<td>memtest</td>
<td>memory</td>
<td>619</td>
<td>0</td>
<td>647940864</td>
<td>647940864</td>
</tr>
<tr>
<td>00003062</td>
<td>memtest</td>
<td>memory</td>
<td>620</td>
<td>0</td>
<td>648989312</td>
<td>648989312</td>
</tr>
<tr>
<td>00003084</td>
<td>memtest</td>
<td>memory</td>
<td>263</td>
<td>0</td>
<td>274693376</td>
<td>274693376</td>
</tr>
<tr>
<td>000030d8</td>
<td>exer_kid</td>
<td>dkb200.2.0.3</td>
<td>90</td>
<td>0</td>
<td>47572992</td>
<td>47572992</td>
</tr>
<tr>
<td>000030d9</td>
<td>exer_kid</td>
<td>dkb400.4.0.3</td>
<td>90</td>
<td>0</td>
<td>47523840</td>
<td>47523840</td>
</tr>
<tr>
<td>0000310d</td>
<td>exer_kid</td>
<td>dva0.0.0.100</td>
<td>0</td>
<td>0</td>
<td>327680</td>
<td>327680</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>00003047</td>
<td>memtest</td>
<td>memory</td>
<td>1</td>
<td>0</td>
<td>727711744</td>
<td>727711744</td>
</tr>
<tr>
<td>00003050</td>
<td>memtest</td>
<td>memory</td>
<td>1054</td>
<td>0</td>
<td>1104015744</td>
<td>1104015744</td>
</tr>
<tr>
<td>00003059</td>
<td>memtest</td>
<td>memory</td>
<td>1039</td>
<td>0</td>
<td>1088289024</td>
<td>1088289024</td>
</tr>
<tr>
<td>00003062</td>
<td>memtest</td>
<td>memory</td>
<td>1041</td>
<td>0</td>
<td>1090385920</td>
<td>1090385920</td>
</tr>
<tr>
<td>00003084</td>
<td>memtest</td>
<td>memory</td>
<td>447</td>
<td>0</td>
<td>467607808</td>
<td>467607808</td>
</tr>
<tr>
<td>000030d8</td>
<td>exer_kid</td>
<td>dkb200.2.0.3</td>
<td>155</td>
<td>0</td>
<td>81488896</td>
<td>81488896</td>
</tr>
<tr>
<td>000030d9</td>
<td>exer_kid</td>
<td>dkb400.4.0.3</td>
<td>155</td>
<td>0</td>
<td>81472512</td>
<td>81472512</td>
</tr>
<tr>
<td>0000310d</td>
<td>exer_kid</td>
<td>dva0.0.0.100</td>
<td>1</td>
<td>0</td>
<td>607232</td>
<td>607232</td>
</tr>
</tbody>
</table>

Testing aborted. Shutting down tests.
Please wait..

System test complete

^C
P00>>>
3.5.1 Testing Memory

The test mem command tests individual memory devices or all memory. The test shown in Example 3-3 runs for 2 minutes.

Example 3-3  Sample Test Memory Command

P00>>> test memory
Console is in diagnostic mode
System test, runtime 120 seconds

Type ^C to stop testing

Starting background memory test, affinity to all CPUs.
Starting memory thrasher on each CPU.
Starting memory thrasher on each CPU.
Starting memory thrasher on each CPU.

ID       Program      Device       Pass  Hard/Soft Bytes Written  Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7      memtest memory            1    0    0     48234496     48234496
000046e0      memtest memory          122    0    0    126862208    126862208
000046e9      memtest memory          111    0    0    115329280    115329280
000046f2      memtest memory          109    0    0     113232384    113232384
000046fb      memtest memory           41    0    0      41937920     41937920

ID       Program      Device       Pass  Hard/Soft Bytes Written  Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7      memtest memory            1    0    0     226492416    226492416
000046e0      memtest memory         566    0    0     592373120    592373120
000046e9      memtest memory         555    0    0     580840192    580840192
000046f2      memtest memory         554    0    0     579791744    579791744
000046fb      memtest memory         211    0    0     220174080    220174080

ID       Program      Device       Pass  Hard/Soft Bytes Written  Bytes Read
-------- ------------ ------------ ------ --------- ------------- ------------
000046d7      memtest memory            1    0    0     404750336    404750336
000046e0      memtest memory         1011    0    0    1058932480   1058932480
000046e9      memtest memory         1000    0    0     1047399552   1047399552
000046f2      memtest memory         999    0    0     1046351104   1046351104
000046fb      memtest memory          381    0    0     398410240    398410240

<table>
<thead>
<tr>
<th>ID</th>
<th>Program</th>
<th>Device</th>
<th>Pass</th>
<th>Hard/Soft</th>
<th>Bytes Written</th>
<th>Bytes Read</th>
</tr>
</thead>
<tbody>
<tr>
<td>000046d7</td>
<td>memtest</td>
<td>memory</td>
<td>1</td>
<td>0</td>
<td>583008256</td>
<td>583008256</td>
</tr>
<tr>
<td>000046e0</td>
<td>memtest</td>
<td>memory</td>
<td>1456</td>
<td>0</td>
<td>1525491840</td>
<td>1525491840</td>
</tr>
<tr>
<td>000046e9</td>
<td>memtest</td>
<td>memory</td>
<td>1446</td>
<td>0</td>
<td>1515007360</td>
<td>1515007360</td>
</tr>
<tr>
<td>000046f2</td>
<td>memtest</td>
<td>memory</td>
<td>1444</td>
<td>0</td>
<td>1512910464</td>
<td>1512910464</td>
</tr>
<tr>
<td>000046fb</td>
<td>memtest</td>
<td>memory</td>
<td>1444</td>
<td>0</td>
<td>1512910464</td>
<td>1512910464</td>
</tr>
</tbody>
</table>

Memory test complete

Test time has expired...

P00>>>
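
When reviewing captured console output, it can help to check the exerciser status lines mechanically. The following is a minimal sketch in Python, not part of the console firmware. It assumes the plain-text layout shown in Example 3-3, with the Hard/Soft column holding two counts (hard, then soft); the function names parse_status_line and has_errors are hypothetical.

```python
def parse_status_line(line):
    """Parse one exerciser status line.

    Returns None for column headers, dashed rules, and any other text.
    Assumes eight whitespace-separated fields: ID, program, device, pass
    count, hard errors, soft errors, bytes written, bytes read.
    """
    fields = line.split()
    if len(fields) != 8:
        return None
    try:
        passes, hard, soft, written, read = (int(f) for f in fields[3:])
    except ValueError:
        return None
    return {
        "id": fields[0],
        "program": fields[1],
        "device": fields[2],
        "passes": passes,
        "hard": hard,
        "soft": soft,
        "bytes_written": written,
        "bytes_read": read,
    }


def has_errors(record):
    """True if the record reports any hard or soft errors."""
    return record["hard"] > 0 or record["soft"] > 0


# One status line copied from Example 3-3.
sample = "000046e0      memtest memory          122    0    0    126862208    126862208"
record = parse_status_line(sample)
print(record["passes"], has_errors(record))
```

A nonzero hard or soft count on any line is the cue to look at the device named in that line.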

3.5.2 Testing PCI

The test pci command tests PCI buses and devices. The test runs for 2 minutes.

Example 3-4 Sample Test Command for PCI

P00>>> test pci
Console is in diagnostic mode
System test, runtime 120 seconds

Type ^C to stop testing

Configuring all PCI buses..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1  DKa500  RRD45  1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1 SCSI Bus ID 7
dkb200.2.0.3.1  DKb200  RZ29B  0007
dkb400.4.0.3.1  DKb400  RZ29B  0007
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1: 08-00-2B-E5-B4-1A
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0  DVA0  RX23

Testing all PCI buses..
Testing EWA0 network device
Testing VGA (alphanumeric mode only)
Testing SCSI disks (read-only)
Testing floppy (dva0, read-only)

<table>
<thead>
<tr>
<th>ID</th>
<th>Program</th>
<th>Device</th>
<th>Pass</th>
<th>Hard/Soft</th>
<th>Bytes Written</th>
<th>Bytes Read</th>
</tr>
</thead>
<tbody>
<tr>
<td>00002c29</td>
<td>exer_kid</td>
<td>dkb200.2.0.3</td>
<td>27</td>
<td>0</td>
<td>0</td>
<td>14642176</td>
</tr>
<tr>
<td>00002c2a</td>
<td>exer_kid</td>
<td>dkb400.4.0.3</td>
<td>27</td>
<td>0</td>
<td>0</td>
<td>14642176</td>
</tr>
<tr>
<td>00002c5e</td>
<td>exer_kid</td>
<td>dva0.0.0.100</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>ID</td>
<td>Program</td>
<td>Device</td>
<td>Pass</td>
<td>Hard/Soft</td>
<td>Bytes Written</td>
<td>Bytes Read</td>
</tr>
<tr>
<td>--------</td>
<td>----------</td>
<td>--------------</td>
<td>------</td>
<td>-----------</td>
<td>--------------</td>
<td>------------</td>
</tr>
<tr>
<td>00002c29</td>
<td>exer_kid</td>
<td>dkb200.2.0.3</td>
<td>92</td>
<td>0</td>
<td>0</td>
<td>48689152</td>
</tr>
<tr>
<td>00002c2a</td>
<td>exer_kid</td>
<td>dkb400.4.0.3</td>
<td>92</td>
<td>0</td>
<td>0</td>
<td>48689152</td>
</tr>
<tr>
<td>00002c5e</td>
<td>exer_kid</td>
<td>dva0.0.0.100</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>286720</td>
</tr>
</tbody>
</table>

Testing aborted. Shutting down tests.
Please wait..

Testing complete

^C

P00>>>
Chapter 4

Power System

This chapter describes the AlphaServer 4000/4100 power system:

- Power Supply
- Power Control Module Features
- Power Circuit and Cover Interlocks
- Power-Up/Down Sequencing
- Cabinet Power Configuration Rules
- Pedestal Power Configuration Rules (North America and Japan)
- Pedestal Power Configuration Rules (Europe and Asia Pacific)
4.1 Power Supply

Power supply outputs are shown in Figure 4-1.

Figure 4-1 Power Supply Outputs

[Figure: power supply outputs labeled Current share, +5V/Return, +3.4V/Return (two), +12V/Return, and Misc. Signal]

Power Supply Features

- 90–264 Vrms input
- 450 watts output. Output voltages are as follows:

<table>
<thead>
<tr>
<th>Output Voltage (V)</th>
<th>Min. Voltage (V)</th>
<th>Max. Voltage (V)</th>
<th>Max. Current (A)</th>
</tr>
</thead>
<tbody>
<tr>
<td>+5.0</td>
<td>4.85</td>
<td>5.25</td>
<td>50</td>
</tr>
<tr>
<td>+3.43</td>
<td>3.400</td>
<td>3.465</td>
<td>75</td>
</tr>
<tr>
<td>+12</td>
<td>11.5</td>
<td>12.6</td>
<td>11</td>
</tr>
<tr>
<td>–12</td>
<td>–10.9</td>
<td>–13.2</td>
<td>0.2</td>
</tr>
<tr>
<td>–5.0</td>
<td>–4.6</td>
<td>–5.5</td>
<td>0.2</td>
</tr>
<tr>
<td>Vaux</td>
<td>8.5</td>
<td>9.5</td>
<td>0.05</td>
</tr>
</tbody>
</table>

- Remote sense on +5.0V and +3.43V
  
  +5.0V is sensed on all CPUs in the system, the system bus motherboard, and the PCI bus motherboard(s).

  +3.43V is sensed on all CPUs in the system and the system bus motherboard.

- Current share on +5.0V, +3.43V, and +12V.
- 1% regulation on +3.43V.
- Fault protection (latched). If a fault is detected by the power supply, it will shut down. The faults detected are:
  
  Overvoltage
  Overcurrent
  Power overload

- DC_ENABLE_L input signal starts the DC outputs.
- POK_H output signal indicates that the power supply is operating properly.
4.2 Power Control Module Features

The power control module (54-24117-01) is located behind the B3040-AA module, the system bus to PCI bus bridge module.

Figure 4-2 Power Control Module
The power control module performs the following functions:

- Controls the power-up/down sequencing.
- Monitors the combined output of the power supplies for VDD (3.43V) and VCC (5.0V). It asserts DCOK_SENSE if both voltages are within range; if either is out of range, it asserts POWER_FAULT_L, causing an immediate power shutdown.
- Monitors system temperature and asserts TEMP_FAIL if the temperature exceeds 55° C.
- Monitors the CPU and system drawer fans, checking each fan at 1-second intervals. It asserts CPUFAN_OK if all CPU fans are functioning properly and SYSTEM_FAN_OK if the drawer cooling fans are functioning properly; otherwise it asserts FAN_FAULT_L.
- Powers down the system 30 seconds after detecting TEMP_FAIL, or the absence of CPUFAN_OK, or the absence of SYSTEM_FAN_OK by asserting POWER_FAULT_L.
- Provides visual indication of faults through LEDs.
- Has two registers, one that generates interrupts when bits change, and one that latches errors but does not generate interrupts.
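
The monitoring rules above can be sketched in Python as a single evaluation step. This is an illustration only, not the PCM's actual logic. The voltage windows are an assumption borrowed from the power supply output table in Section 4.1 (the PCM's own thresholds are not stated here), and the names PcmInputs and evaluate are hypothetical. In the result, True means "asserted", regardless of whether the signal is active low.

```python
from dataclasses import dataclass

# Assumed acceptance windows, taken from the power supply output table;
# the PCM's real thresholds are not given in this manual.
VDD_RANGE = (3.400, 3.465)   # +3.43 V rail
VCC_RANGE = (4.85, 5.25)     # +5.0 V rail
TEMP_LIMIT_C = 55.0          # TEMP_FAIL threshold from the text
SHUTDOWN_DELAY_S = 30        # delay before shutdown on temperature or fan faults


@dataclass
class PcmInputs:
    vdd: float
    vcc: float
    temp_c: float
    cpu_fans_ok: bool
    system_fans_ok: bool


def evaluate(inputs, seconds_in_fault=0.0):
    """Return the signal states described in the text for one sample.

    seconds_in_fault is how long a temperature or fan fault has persisted.
    """
    in_range = (VDD_RANGE[0] <= inputs.vdd <= VDD_RANGE[1]
                and VCC_RANGE[0] <= inputs.vcc <= VCC_RANGE[1])
    temp_fail = inputs.temp_c > TEMP_LIMIT_C
    fan_fault = not (inputs.cpu_fans_ok and inputs.system_fans_ok)
    delayed_shutdown = ((temp_fail or fan_fault)
                        and seconds_in_fault >= SHUTDOWN_DELAY_S)
    return {
        "DCOK_SENSE": in_range,
        "TEMP_FAIL": temp_fail,
        "CPUFAN_OK": inputs.cpu_fans_ok,
        "SYSTEM_FAN_OK": inputs.system_fans_ok,
        "FAN_FAULT_L": fan_fault,
        # Voltage faults shut down immediately; others after the delay.
        "POWER_FAULT_L": (not in_range) or delayed_shutdown,
    }


healthy = evaluate(PcmInputs(3.43, 5.0, 40.0, True, True))
overtemp = evaluate(PcmInputs(3.43, 5.0, 60.0, True, True), seconds_in_fault=30)
print(healthy["POWER_FAULT_L"], overtemp["POWER_FAULT_L"])
```

The sketch separates the immediate shutdown path (voltage out of range) from the delayed path (temperature or fan fault), which is the distinction to keep in mind when reading the fault LEDs.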
4.3 Power Circuit and Cover Interlocks

Figure 4-3  Power Circuit Diagram
Figure 4-3 shows the distribution of power throughout the system drawer. DC power to the system is interrupted by an opening in the circuit, by the PCM signal POWER_FAULT_L, or by the SCM signal RSM_DC_EN_L. Openings can be caused by the On/Off button or the cover interlocks. The PCM asserts POWER_FAULT_L if it detects a fault; RSM_DC_EN_L is controlled remotely.

A failure anywhere in the circuit will result in the removal of DC power. A potential failure is the relay used on the SCM modules to control the RSM_DC_EN_L signal.

The 4100 and early 4000 system drawers have three cover interlocks: one for the system bus card cage, one for the PCI card cage, and one for the power and system fan area. Later 4000 system drawers have four cover interlocks; the fourth switch is for the second PCI card cage.

To override the cover interlocks, find a suitable object to close the interlock circuit at the location identified in Figure 4-3. The switch assembly that contains single switches for all three covers is located where all three covers meet.
4.4 Power-Up/Down Sequence

The On/Off button can be controlled manually or remotely. The button is on the OCP. Remote power control is provided through the remote I/O port connected to the PCI. The power-up/down sequence flow is shown below.

Figure 4-4  Power Up/Down Sequence Flowchart
When AC is applied to the system, Vaux (auxiliary voltage) is asserted and is sensed by the PCM. The PCM asserts DC_ENABLE_L starting the power supplies. If there is a hard fault on power-up, the power supplies shut down immediately; otherwise, the power system powers up and remains up until the system is shut off or the PCM senses a fault. If a power fault is sensed, the power system attempts to restore power and will do so if the fault is not sensed a second time. If the fault is still present, the power system shuts down.

Since Vaux is independent of the power supply start, the AC plugs at the front of the supplies must be removed to reset Vaux, allowing capacitors to drain voltage. All power failures require this procedure since the PCM must sense a change in Vaux to start the power supplies.
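
The power-up flow described above can be traced with a small Python model. It is a sketch under simplifying assumptions: faults are represented as a list of booleans (one per PCM observation after power-up), and the state names and the function power_sequence are hypothetical, not signal names from the hardware.

```python
def power_sequence(hard_fault_at_start, fault_samples):
    """Return the list of states visited, following the text's description.

    hard_fault_at_start: True if a hard fault is present on power-up.
    fault_samples: booleans, one per later PCM observation (True = fault).
    """
    trace = ["AC_APPLIED", "VAUX_SENSED", "DC_ENABLE_L_ASSERTED"]
    if hard_fault_at_start:
        # A hard fault on power-up shuts the supplies down immediately.
        trace.append("SHUTDOWN")
        return trace
    trace.append("POWERED_UP")
    samples = iter(fault_samples)
    for fault in samples:
        if not fault:
            continue
        trace.append("FAULT_SENSED")
        trace.append("RESTORE_ATTEMPT")
        # The next observation decides whether the restore succeeds.
        if next(samples, False):
            trace.append("SHUTDOWN")
            return trace
        trace.append("POWERED_UP")
    return trace


print(power_sequence(False, [False, True, False]))
print(power_sequence(False, [True, True]))
```

Once the model reaches SHUTDOWN there is no path back to POWERED_UP, which mirrors the requirement to remove the AC plugs so that Vaux resets.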
4.5 Cabinet Power Configuration Rules

There are four cabinets with different power delivery systems. See page 1-9 for a description of differences. A barcode label designating the cabinet variation is located inside the back door in the upper left corner of the bezel holding the door. The four variations are: H9A10-EB, -EC, -EL, -EM.

Figure 4-5 Simple -EB & -EC Cabinet Power Configuration
Figure 4-6  Worst-Case -EB & -EC Cabinet Power Configuration

<table>
<thead>
<tr>
<th>Total Power Available</th>
<th>4800 VA</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single Drawer</td>
<td>1100 VA</td>
</tr>
<tr>
<td>Single StorageWorks Shelf</td>
<td>150 VA</td>
</tr>
<tr>
<td>System Fan Tray</td>
<td>100 VA</td>
</tr>
<tr>
<td>Outlets</td>
<td>18 IEC 320 max. (3 power strips)</td>
</tr>
<tr>
<td>Site Grounding</td>
<td>Leakage current exceeds 3.5 mArms.</td>
</tr>
<tr>
<td>Power Strip</td>
<td>One system drawer per power strip. In four-system drawer configuration, fourth drawer should have its three power cords distributed among the three power strips.</td>
</tr>
</tbody>
</table>
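
As a quick check of a planned -EB or -EC cabinet configuration against the figures in the table above, the load can be summed in Python. This is a minimal sketch: it assumes that simply adding the listed VA values and comparing against the 4800 VA total is an adequate first check, and it ignores outlet counts and power strip assignment. The function names are hypothetical.

```python
# Values from the cabinet power table above.
TOTAL_VA = 4800
DRAWER_VA = 1100
SHELF_VA = 150
FAN_TRAY_VA = 100


def cabinet_load_va(drawers, shelves, fan_trays=1):
    """Sum the listed load for a cabinet configuration, in VA."""
    return drawers * DRAWER_VA + shelves * SHELF_VA + fan_trays * FAN_TRAY_VA


def fits(drawers, shelves, fan_trays=1):
    """True if the summed load does not exceed the total power available."""
    return cabinet_load_va(drawers, shelves, fan_trays) <= TOTAL_VA


print(cabinet_load_va(3, 4), fits(3, 4))
print(cabinet_load_va(4, 3), fits(4, 3))
```

Three drawers with four StorageWorks shelves and a fan tray come to 4000 VA, within the limit; four drawers with three shelves come to 4950 VA and exceed it.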
Figure 4-7  -EL & -EM Single Drawer Cabinet Power Configuration
(Single drawer -EM shown with H7600-DB controller)

[Figure: 2 power controllers (240 V, 16 A controller with 12 IEC C13 outlets, Europe & A.P.; 240 Vrms input) feeding the system drawer, the fan tray, and the StorageWorks shelves. Current labels in the figure: 0.38 Arms, 1.83 Arms, and 0.5 A.]

Figure 4-8  -EL Three Drawer Cabinet Power Configuration
(Three drawer -EL shown with H7600-AA controller)

[Figure: 2 power controllers (120 V, 14 A controller with 10 NEMA 5-15 outlets, N.A. & A.P.; 120 Vrms input) feeding three system drawers, the fan tray, and the StorageWorks shelves.]

4.6 Pedestal Power Configuration Rules (North America and Japan)

Figure 4-9 Pedestal Power Distribution (N.A. and Japan)

<table>
<thead>
<tr>
<th>Total Power Available (Assuming a 15 A branch)</th>
</tr>
</thead>
<tbody>
<tr>
<td>N. America: 1800 VA per branch circuit and 1400 VA per line cord</td>
</tr>
<tr>
<td>Japan: 1500 VA per branch circuit and 1200 VA per line cord</td>
</tr>
<tr>
<td>Single Drawer</td>
</tr>
<tr>
<td>Single StorageWorks Shelf</td>
</tr>
<tr>
<td>Outlets</td>
</tr>
<tr>
<td>Power Strip</td>
</tr>
</tbody>
</table>
4.7 Pedestal Power Configuration Rules (Europe and Asia Pacific)

Figure 4-10 Pedestal Power Distribution (Europe and AP)

[Figure: power strips (10A) on 200 - 240 Vrms input; StorageWorks shelves labeled 0.34 Arms; system drawer labeled 1.67 Arms; input currents labeled 5.0 Arms and 3.0 Arms]

<table>
<thead>
<tr>
<th>Total Power Available</th>
<th>2200 VA per power strip</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single Drawer</td>
<td>1100 VA</td>
</tr>
<tr>
<td>Single StorageWorks Shelf</td>
<td>150 VA</td>
</tr>
<tr>
<td>Outlets</td>
<td>10 IEC 320 receptacles max. One receptacle is blocked on each power strip to control leakage.</td>
</tr>
<tr>
<td>Power Strip</td>
<td>Single AC power strip supports one system drawer and three StorageWorks shelves.</td>
</tr>
</tbody>
</table>
Chapter 5

Error Logs

This chapter provides information on troubleshooting with error logs. The following topics are covered:

- Using Error Logs
- Using DECevent
- Error Log Examples and Analysis
- Troubleshooting IOD-Detected Errors
- Double Error Halts and Machine Checks While in PAL Mode

Error registers are described in Chapter 6.
5.1 Using Error Logs

Error detection is performed by CPUs, the IOD, and the EISA to PCI bus bridge. (The IOD is the acronym used by software to refer to the system bus to PCI bus bridge.)

Figure 5-1 Error Detector Placement
As shown in Figure 5-1 and the accompanying table, the CPU chip is isolated by transceivers (XVER) from the data and command/address lines on the module. This allows the CPU chip access to the duplicate tag and B-cache while the system bus is in use. The CPU detects errors only when it is the consumer of the data. The IOD detects errors on each system bus cycle regardless of whether it is involved in the transaction.

System bus errors detected by the CPU may also be detected by the IOD. It is necessary to check the IOD for errors any time there is a CPU machine check.

- If the CPU sees bad data and the IOD does not, the CPU is at fault.
- If both the CPU and the IOD see bad data on the system bus, either memory or a secondary CPU is the cause. In such a case, check the Dirty bit, bit <20>, in the IOD MC_ERR1 Register. If the Dirty bit is set, the data came from one CPU's cache and was destined for a different CPU. If the Dirty bit is not set, memory caused the bad data on the bus. In this case, multiple error log entries occur and must be analyzed together to determine the cause of the error.
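
The isolation rules in the two bullets above can be written as a short decision function. This Python sketch is an illustration, not a diagnostic tool. The function name isolate_fault and the returned strings are hypothetical, and the register values in the usage lines are made up for the example; only the Dirty bit position, bit <20> of MC_ERR1, comes from the text.

```python
DIRTY_BIT = 1 << 20  # Dirty bit, bit <20>, of the IOD MC_ERR1 Register


def isolate_fault(cpu_saw_error, iod_saw_error, mc_err1):
    """Suggest the likely source of bad data after a CPU machine check."""
    if cpu_saw_error and not iod_saw_error:
        # The data never appeared bad on the system bus.
        return "reporting CPU"
    if cpu_saw_error and iod_saw_error:
        if mc_err1 & DIRTY_BIT:
            # Dirty data is supplied from another CPU's cache.
            return "another CPU's cache"
        return "memory"
    return "undetermined"


# Hypothetical register values, chosen only to exercise each branch.
print(isolate_fault(True, False, 0x00000000))
print(isolate_fault(True, True, 0x00100000))
print(isolate_fault(True, True, 0x00000000))
```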
5.1.1 Hard Errors

There are two categories of hard errors:

- System-independent errors detected by the CPU. These errors are processor machine checks handled as MCHK 670 interrupts and are:
  - Internal EV5 or EV56 cache errors
  - CPU B-cache module errors

- System-dependent errors detected by both the CPU and IOD. These errors are system machine checks handled as MCHK 660 interrupts and are:
  - CPU-detected external reference errors
  - IOD hard error interrupts

The IOD can detect hard errors on either side of the bridge.

5.1.2 Soft Errors

There are two categories of soft errors:

- System-independent errors detected and corrected by the CPU. These errors are CPU module correctable errors handled as MCHK 630 interrupts.

- System-dependent errors that are correctable single-bit errors on the system bus and are handled as MCHK 620 interrupts.
5.1.3 Error Log Events

Several different events are logged by OpenVMS and DIGITAL UNIX. Windows NT does not log errors in this fashion.

<table>
<thead>
<tr>
<th>Error Log Event</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>MCHK 670</td>
<td>Processor machine checks. These are synchronous errors that inform precisely what happened at the time the error occurred. They are detected inside the CPU chip and are fatal errors.</td>
</tr>
<tr>
<td>MCHK 660</td>
<td>System machine checks. These are asynchronous errors that are recorded after the error has occurred. Data on exactly what was going on in the machine at the time of the error may not be known. They are fatal errors.</td>
</tr>
<tr>
<td>MCHK 630</td>
<td>Processor correctable errors</td>
</tr>
<tr>
<td>MCHK 620</td>
<td>System correctable errors</td>
</tr>
<tr>
<td>Last fail</td>
<td>Used to collect system bus registers prior to crashing</td>
</tr>
<tr>
<td>I/O error interrupt</td>
<td>IOD error interrupts</td>
</tr>
<tr>
<td>System environment</td>
<td>Used to provide status on power, fans, and temperature</td>
</tr>
<tr>
<td>Configuration</td>
<td>Used to provide system configuration information</td>
</tr>
</tbody>
</table>
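
For scripts that sort error log entries, the four machine check codes can be held in a small lookup that combines the table above with the hard and soft categories from Sections 5.1.1 and 5.1.2. This is a sketch in Python; the dictionary layout and the function name describe_mchk are hypothetical.

```python
# code: (description, category, scope), as described in Sections 5.1.1-5.1.3.
MCHK_EVENTS = {
    670: ("Processor machine check", "hard", "system-independent"),
    660: ("System machine check", "hard", "system-dependent"),
    630: ("Processor correctable error", "soft", "system-independent"),
    620: ("System correctable error", "soft", "system-dependent"),
}


def describe_mchk(code):
    """Return a one-line summary for a machine check code."""
    if code not in MCHK_EVENTS:
        return f"MCHK {code}: not a machine check code listed in this chapter"
    description, category, scope = MCHK_EVENTS[code]
    fatal = "fatal" if category == "hard" else "corrected"
    return f"MCHK {code}: {description} ({category}, {scope}, {fatal})"


for code in (670, 660, 630, 620):
    print(describe_mchk(code))
```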
5.2 Using DECevent

DECevent produces bit-to-text ASCII reports derived from system event entries or user-supplied event logs. The format of the reports is determined by commands, qualifiers, parameters, and keywords appended to the command. The maximum command line length is 255 characters.

DECevent allows you to do the following:

- Translate event log files into readable reports
- Select alternate input and output files
- Filter input events
- Select alternative reports
- Translate events as they occur
- Maintain and customize your environment with the interactive shell commands

To access on-line help:

OpenVMS
$ HELP DIAGNOSE or
$ DIA /INTERACTIVE
DIA> HELP

DIGITAL UNIX
> man dia or
> dia hlp

Privileges necessary to use DECevent:

- SYSPRV for the utility
- DIAGNOSE to use the /CONTINUOUS qualifier
5.2.1 Translating Event Files

To produce a translated event report using the default event log file, SYS$ERRORLOG:ERRLOG.SYS, enter the following command:

OpenVMS
$ DIAGNOSE

DIGITAL UNIX
> dia -a

The DIAGNOSE command allows DECevent to use built-in defaults. This command produces a full report, directed to the terminal screen, from the input event file, SYS$ERRORLOG:ERRLOG.SYS. The /TRANSLATE qualifier is understood on the command line.

To select an alternate input file

OpenVMS
$ DIAGNOSE ERRORLOG.OLD

DIGITAL UNIX
> dia -a -f syserr-old.hostname

These commands select an alternate input file (ERRORLOG.OLD or syserr-old) as the event log to translate. The file name can contain the directory or path, if needed. Wildcard characters can be used.

To send reports to an output file

OpenVMS
$ DIAGNOSE/OUTPUT=ERRLOG_OLD.TXT

DIGITAL UNIX
> dia -a > syserr-old.txt

These commands direct the output of DECevent to ERRLOG_OLD.TXT or syserr-old.txt.
To reverse the order of the input events

OpenVMS
$ DIAGNOSE/TRANSLATE/REVERSE

DIGITAL UNIX
> dia -R

These commands reverse the order in which events are displayed. The default order is forward chronologically.

5.2.2 Filtering Events

/INCLUDE and /EXCLUDE qualifiers allow you to filter input event log files.

The /INCLUDE qualifier is used to create output for devices named in the command.

OpenVMS
$ DIAGNOSE/TRANSLATE/INCLUDE=(DISK=RZ,DISK=RA92,CPU)

DIGITAL UNIX
> dia -i disk=rz disk=ra92 cpu

The commands shown here create output using only the entries for RZ disks, RA92 disks, and CPUs.

The /EXCLUDE qualifier is used to create output for all devices except those named in the command.

OpenVMS
$ DIAGNOSE/TRANSLATE/EXCLUDE=(MEMORY)

DIGITAL UNIX
> dia -x mem
Use the /BEFORE and /SINCE qualifiers to select events before or after a certain date and time.

OpenVMS
$ DIAGNOSE/TRANSLATE/BEFORE=15-JAN-1996:10:30:00
or
$ DIAGNOSE/TRANSLATE/SINCE=15-JAN-1996:10:30:00

DIGITAL UNIX
> dia -t s:15-jan-1996 e:20-jan-1996

If no time is specified, the default time is 00:00:00, and all events for that day are selected.

The /BEFORE and /SINCE qualifiers can be combined to select a certain period of time.

OpenVMS

If no value is supplied with the /SINCE or /BEFORE qualifiers, DECevent defaults to TODAY.
5.2.3 Selecting Alternative Reports

Table 5-2 describes the DECevent report formats. Report formats are mutually exclusive. No combinations are allowed. The default format is /Full.

<table>
<thead>
<tr>
<th>Format</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>/Full</td>
<td>Translates all available information for each event</td>
</tr>
<tr>
<td>/Brief</td>
<td>Translates key information for each event</td>
</tr>
<tr>
<td>/Terse</td>
<td>Provides binary event information and displays register values and other ASCII messages in a condensed format</td>
</tr>
<tr>
<td>/Summary</td>
<td>Produces a statistical summary of the events in the log</td>
</tr>
<tr>
<td>/Fsterr</td>
<td>Produces a one-line-per-entry report for disk and tape devices</td>
</tr>
</tbody>
</table>

The syntax is:

OpenVMS
$ DIAGNOSE/TRANSLATE/<format>

DIGITAL UNIX
> dia -o <format>
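
When the same DIGITAL UNIX reports are run repeatedly, the dia command line can be assembled by a small helper. The Python sketch below uses only the options shown in Section 5.2 (-a, -f, -R, -i, -x, -t, -o). Whether these options may all be combined on one command line is an assumption, and the helper name build_dia_command and its parameter names are hypothetical.

```python
def build_dia_command(dash_a=False, input_file=None, include=(), exclude=(),
                      reverse=False, since=None, before=None, report=None):
    """Assemble a dia command line from the options shown in Section 5.2.

    dash_a adds -a, which appears in the manual's translate examples.
    since and before are dates such as "15-jan-1996".
    """
    args = ["dia"]
    if dash_a:
        args.append("-a")
    if input_file:
        args += ["-f", input_file]
    if reverse:
        args.append("-R")
    if include:
        args += ["-i", *include]
    if exclude:
        args += ["-x", *exclude]
    if since or before:
        args.append("-t")
        if since:
            args.append(f"s:{since}")
        if before:
            args.append(f"e:{before}")
    if report:
        args += ["-o", report]
    return " ".join(args)


print(build_dia_command(dash_a=True, input_file="syserr-old.hostname"))
print(build_dia_command(include=["disk=rz", "disk=ra92", "cpu"]))
print(build_dia_command(since="15-jan-1996", before="20-jan-1996"))
```

The three calls reproduce command lines that appear earlier in this section; check any other combination against the on-line help before relying on it.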
5.3 Error Log Examples and Analysis

The following sections provide examples and analysis of error logs.

5.3.1 MCHK 670 CPU-Detected Failure

The error log in Example 5-1 shows the following:

1. CPU1 logged the error in a system with two CPUs.
2. During a D-ref fill, the External Interface Status Register logged an uncorrectable ECC error. (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache. It performs a “D-ref fill.”) Bit <30> is clear, indicating that the source of the error is the B-cache.

The error was detected by a CPU and the data was not on the system bus. Otherwise, the IODs would have seen the error. Therefore, CPU1 is broken.

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The “Horse” module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is the PCI motherboard, the B3050 module. The “MC” bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.
Example 5-1  MCHK 670

Logging OS                        2. DIGITAL UNIX
System Architecture               2. Alpha
Event sequence number             4.
Timestamp of occurrence            04-APR-1996 17:20:04
Host name                            whip16

System type register    x00000016  AlphaStation 4x00
Number of CPUs (mpnum)  x00000002
CPU logging event (mperr) x00000001

Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                        100. CPU Machine Check Errors
CPU Minor class                   1. Machine check (670 entry)

Software Flags                   x0000000030000000
                          IOD 1 Register Subpkt Pres
                          IOD 2 Register Subpkt Pres
Active CPUs                      x00000003
Hardware Rev                     x00000000
System Serial Number             C1563
Module Serial Number             x0000
System Revision                  x00000000

* MCHK 670 Regs *
Flags:                            x00000000
PCI Mask                          x0000
Machine Check Reason              x0098
PAL SHADOW REG 0                  x00000000
PAL SHADOW REG 1                  x00000000

PAL SHADOW REG 6                  x00000000
PAL SHADOW REG 7                  x00000000
PALTEMP0                          x000000000087C7A58
PALTEMP1                          xFFFFFFFE8F658000
PALTEMP2                          xFFFFFFFC00003C9F40

PALTEMP22                         xFFFFFFFC00004F9D60
PALTEMP23                         x00000000008709A58
Exception Address Reg             xFFFFFFFC000003BFB88
Exception Summary Reg             x00000000
Exception Mask Reg                x00000000
PAL BASE                          x0000000000020000
Base addr for palcode = x000000000008
Interrupt Summary Reg             x00000000
IBOX Ctrl and Status Reg          x00000000C16000000000
                   AST requests 3 - 0 x00000000
                   Timeout Bit Not Set
                   PAL Shadow Registers Enabled
                   Correctable Err Intrpts Enabled
                   ICACHE BIST Successful

TEST_STATUS_H Pin Asserted

Icache Par Err Stat Reg  x00000000
Dcache Par Err Stat Reg  x00000000
Virtual Address Reg     xFFFFFFFF63BD38
Memory Mgmt Flt Sts Reg x00000001661D

Ref which caused err was a write
Ref resulted in DTB miss
RA Field  x00000000000001B
Opcode Field  x00000000002C

Scache Address Reg  xFFFFFFF00000254BF
Scache Status Reg   x00000000
Bcache Tag Address Reg  xFFFFFFF80E98F7FFF

External cache hit
Parity for ds and v bits
Cache block dirty
Cache block valid
Ext cache tag addr parity bit
Tag addr x38:20 is

x00000000000E98
Ext Interface Address Reg  xFFFFFFF00E984DBCF
Fill Syndrome Reg         x0000000000002B

Ext Interface Status Reg  xFFFFFFFF104FFFFF
Uncorrectable ECC error
Error occurred during D-ref fill

LD LOCK                    xFFFFFFF003797340F

** IOD SUBPACKET -> **

IOD 0 Register Subpacket
WHOAMI  x000000BB
Device ID  x0000003B
Bcache Size = 2MB
VCTY ASIC Rev = 0
Module Revision 0.

Base Address of Bridge x000000F9E0000000
PCI Revision 0x06008021
CAP Chip Revision x00000001
Horse Module Revision x00000002
Saddle Module Revision x00000000
Saddle Module Type - Left Hand
EISA Present
PCI Class Code x00000600

MC-PCI Command Register 0x06480FF1
Selftest passed
Delayed read enabled
Bridge PCI trans enabled
Req 64 bit data trans enabled
Accept 64 bit data trans enabled
Check PCI Addr Parity enabled
Check MC bus CMS/Addr Parity enabled
Check MC bus NXM enabled
Check all transaction enabled
16 byte aligned block write enabled
Write Pend Number Thresho x00000008
RD_TYPE  Short
RL_TYPE  Medium
RM_TYPE  Long
ARB_MODE  MC-PCI Bridge Priority Mode

Memory Host Addr Exten x00000000
IO Host Addr Extension x00000000
Interrupt Control x00000003
MC-PCI Intr Enabled
Device intr info enabled if en_int=1

Interrupt Request 0x00000000
Interrupt Mask Register 0 x00C50010
Interrupt Mask Register 1 x00000000
MC Error Info Register 0 x0E000000
MC Error Info Register 1 x000E88FD
MC bus trans addr <31:4> x0E000000
MC bus trans addr <39:32>x0000000F
I/O Subpacket ->

** WHOAMI **

Device ID x0000003B
Bcache Size = 2MB
VCTY ASIC Rev = 0
Module Revision 0.

** Base Address of Bridge **
x000000FBE000000

** PCI Revision **
x06000021
Horse Module Revision x00000002
Saddle Module Revision x00000000
PCI Class Code x00000000

** MC-PCI Command Register **
x06480FF1
Self test passed
Delayed read enabled
Bridge PCI trans enabled
Req 64 bit data trans enabled
Accept 64 bit data trans enabled
Check PCI Addr Parity enabled
Check MC bus CMS/Addr Parity enabled
Check MC bus NXM enabled
Check all transaction enabled
16 byte aligned block write enabled
Write Pend Number Thresh x00000000
RD_TYPE Short
RL_TYPE Medium
RM_TYPE Long
ARB_MODE MC-PCI Bridge Priority Mode

** Memory Host Addr Exten **
x00000000

** IO Host Addr Extension **
x00000000

** Interrupt Control **
x00000003
MC-PCI Intr Enabled
Device intr info enabled if en_int = 1

** Interrupt Request **
x00000000
Interrupts asserted x00000000

** Interrupt Mask Register 0 **
x00C50001

** Interrupt Mask Register 1 **
x00000000

** MC Error Info Register 0 **
x00000000
MC bus trans addr <31:4> x0E000000

** MC Error Info Register 1 **
x0000E88FD
MC bus trans addr <39:32> x00000000
MC Command x00000008
Device Id x0000003A

** CAP Error Register **
x00000000
(no error seen)

** PCI Bus Trans Error Addr **
x00000000
MDPA Status Register x00000000
MDPA Error Syndrome Reg x00000000
MDPB Status Register x00000000
MDPB Error Syndrome Reg x00000000
MDPA Chip Revision x00000000
Cycle 0 ECC Syndrome x00000000
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000
MDPB Chip Revision x00000000
Cycle 0 ECC Syndrome x00000000
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000

** Device Id **
x00000003A

** CAP Error Register **
x00000000
(no error seen)

** PCI Bus Trans Error Addr **
x000018B48
MDPA Status Register x00000000
MDPA Error Syndrome Reg x00000000
MDPB Status Register x00000000
MDPB Error Syndrome Reg x00000000
MDPA Chip Revision x00000000
Cycle 0 ECC Syndrome x00000000
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000
MDPB Chip Revision x00000000
Cycle 0 ECC Syndrome x00000000
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000

PALcode Revision: Palcode Rev: 1.21-3
5.3.2 MCHK 670 CPU and IOD-Detected Failure

The error log in Example 5-2 shows the following:

1. CPU3 logged the error in a system with four CPUs.
2. The External Interface Status Register logged an uncorrectable ECC error during a D-ref fill. (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache. It performs a “D-ref fill.”) Bit <30> is set, indicating that the source of the error is memory or the system. Bits <32> and <35> are set, indicating an uncorrectable ECC error and a second external interface hard error, respectively.
4. The command at the time of the error was a read.
5. The bus master at the time of the error was CPU3.
6. The Dirty bit, bit <20> in the MC_ERR1 Register, is clear, indicating the data is clean and comes from memory.

The error was detected by a CPU, and the data was on the system bus and is clean. Therefore, a memory module provided the wrong data. (If the Dirty bit had been set, the data would have come from the cache of another CPU.) To determine which memory module is at fault, see Section 5.4.

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The “Horse” module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is the PCI motherboard, the B3050 module. The “MC” bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.
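The MC Error Info registers can also be split into their fields by hand or with a short script. The following Python sketch is illustrative only: the Dirty bit position (bit <20> of MC_ERR1) comes from the text above, while the other field positions are assumptions inferred from the register values and decoded fields printed in the examples in this section. The names `McErrInfo` and `decode_mc_err` are hypothetical. Tables 5-9 and 5-10 remain the reference for interpreting the command and node ID values.

```python
from typing import NamedTuple


class McErrInfo(NamedTuple):
    address: int    # MC bus transaction address, bits <39:4> (low 4 bits zero)
    command: int    # MC_Command field
    device_id: int  # Device Id of the bus master
    dirty: bool     # bit <20> of MC_ERR1 (stated in the text)
    valid: bool     # "MC error info valid" (assumed to be bit <31>)


def decode_mc_err(mc_err0: int, mc_err1: int) -> McErrInfo:
    """Split MC_ERR0/MC_ERR1 into fields (layout inferred from the logs)."""
    addr_31_4 = mc_err0 >> 4      # logged as "MC bus trans addr <31:4>"
    addr_39_32 = mc_err1 & 0xFF   # logged as "MC bus trans addr <39:32>"
    return McErrInfo(
        address=(addr_39_32 << 32) | (addr_31_4 << 4),
        command=(mc_err1 >> 8) & 0x3F,     # assumed bits <13:8>
        device_id=(mc_err1 >> 14) & 0x3F,  # assumed bits <19:14>
        dirty=bool(mc_err1 & (1 << 20)),
        valid=bool(mc_err1 & (1 << 31)),
    )


# Register values taken from Example 5-2.
info = decode_mc_err(0x28681A80, 0x800FD800)
print(f"addr=x{info.address:010X} cmd=x{info.command:02X} "
      f"dev=x{info.device_id:02X} dirty={info.dirty} valid={info.valid}")
# prints: addr=x0028681A80 cmd=x18 dev=x3F dirty=False valid=True
```

The decoded command (x18) and Device Id (x3F) match the values printed in Example 5-2, and the clear Dirty bit matches item 6 above.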
Example 5-2  MCHK 670 CPU and IOD-Detected Failure

Logging OS 2. DIGITAL UNIX
System Architecture 2. Alpha
Event sequence number 6.
Timestamp of occurrence 08-APR-1996 11:27:55
Host name whip16
System type register x00000016  AlphaStation 4x00
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000003
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 1. Machine check (670 entry)
Software Flags x0000000030000000 IOD 1 Register Subpkt Prs
IOD 2 Register Subpkt Prs
Active CPUs x0000000F
Hardware Rev x00000000
System Serial Number C1563
Module Serial Number x0000
Module Type x00000000
System Revision x00000000

* MCHK 670 Regs *
Flags: x00000000
PCI Mask x0000
Machine Check Reason x0098
PAL SHADOW REG 0 x00000000
PAL SHADOW REG 1 x00000000
.
.
PAL SHADOW REG 6 x00000000
PAL SHADOW REG 7 x00000000
PALTEMP0 x0000000001401A7A90
PALTEMP1 x00000000000021
.
.
PALTEMP23 x000000000ECE77A58
Exception Address Reg x0000000012005A8B4
Native-mode instruction
Exception PC x0000000048016A2D
Exception Summary Reg x00000000
Exception Mask Reg x00000000
PAL BASE x000000000020000
Base addr for palcode = x00000000008
Interrupt Summary Reg x00000000
AST requests 3 - 0 x00000000
IBOX Ctrl and Status Reg x0000000C16400000
Timeout Bit Not Set
Floating Point Instr. may be issued
PAL Shadow Registers Enabled
Correctable Err Intrpts Enabled
ICACHE BIST Successful
TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg x00000000
Dcache Par Err Stat Reg x00000000
Virtual Address Reg x00000001407D6000
Memory Mgmt Flt Stats Reg x00000000011A1A0
  Ref resulted in DTB miss
  RA Field x0000000008
  Opcode Field x00000000000023
Scache Address Reg xFFFFFFF00000254BF
Scache Status Reg x00000000
Bcache Tag Address Reg xFFFFFFF80286FF7FF
  External cache hit
  Parity for ds and v bits
  Cache block dirty
  Cache block valid
  Ext cache tag addr parity bit
  Tag address<38:20> is
x00000000000286
Ext Interface Address Reg xFFFFFFF0028681A8F
Fill Syndrome Reg x00000000004B00
  Uncorrectable ECC error
  Error occurred during D-ref fill
  Second external interface hard error
LD LOCK xFFFFFFF0000020040F

** IOD SUB PACKET -> **
WHOAMI x000000BF
  Device ID x0000003F
  VCTY ASIC Rev = 0
  Module Revision 0.
Base Address of Bridge x000000F9E0000000
PCI Revision x06008021
  CAP Chip Revision 00000001
  Horse Module Revision 00000002
  Saddle Module Revision 00000000
  Saddle Module Type Left Hand
  EISA Present
  PCI Class Code x00000600
MC-PCI Command Register x06460FF1
  Selftest passed
  Delayed read enabled
  Bridge PCI trans enabled
  Req 64 bit data trans enabled
  Accept 64 bit data trans enabled
  Check PCI Addr Parity enabled
  Check MC bus CMS/Addr Parity enabled
  Check MC bus NXM enabled
  Check all transaction enabled
  16 byte aligned block write enabled
  Write Pend Number Thresho 0x00000006
  RD_TYPE Short
  RL_TYPE Medium
  RM_TYPE Long
  ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Extension x00000000
IO Host Addr Extension x00000000
Interrupt Control x00000003
  MC-PCI Intr Enabled
  Device intr info enabled if en_int = 1
Interrupt Request x00810000
  Interrupts asserted x00010000
  Hard Error
Interrupt Mask Register 0 x00C50010
Interrupt Mask Register 1 x00000000
MC Error Info Register 0 x28681A80  MC bus trans addr <31:4> x028681A8
MC Error Info Register 1 x800FD800  MC bus trans addr <39:32> x00000000

MC_Command x00000018
Device Id x0000003F
MC error info valid

CAP Error Register xC0000000 Uncorrectable ECC err det by MDPB

MC error info latched

PCI Bus Trans Error Addr x000003FD
MDPA Status Register x00000000 MDPA Chip Revision x00000000
MDPA Error Syndrome Reg x00000000 Cycle 0 ECC Syndrome x00000000
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000

MDPB Status Register x80000000 MDPB Chip Revision x00000000
MDPB Error Syndrome of uncorrectable read error

MDPB Error Syndrome Reg x0000004B Cycle 0 ECC Syndrome
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000

** IOD SUBPACKET -> **

WHOAMI x000000BF Device ID x0000003F
Bcache Size = 2MB
VCTY ASIC Rev = 0
Module Revision 0.

Base Address of Bridge x000000FBED000000
PCI Revision x06000021 CAP Chip Revision x00000001
Horse Module Revision x00000002
Saddle Module Revision x00000000
Saddle Module Type Left Hand
PCI Class Code x00000600

MC-PCI Command Register x06460FF1 Selftest passed
Delayed read enabled
Bridge PCI trans enabled
Req 64 bit data trans enabled
Accept 64 bit data trans enabled
Check PCI Addr Parity enabled
Check MC bus CMS/Addr Parity enabled
Check MC bus NXM enabled
Check all transaction enabled
16 byte aligned block write enabled
Write Pend Number Thresh x00000006
RD_TYPE Short
RL_TYPE Medium
RM_TYPE Long
ARB_MODE MC-PCI Bridge Priority Mode

Memory Host Addr Ext x00000000
IO Host Addr Extension x00000000
Interrupt Control x00000003 MC-PCI Intr Enabled
Device intr info enabled if en_int = 1

Interrupt Request x00800000 Interrupts asserted x00000000

Hard Error

Interrupt Mask Register 0 x00C50001
Interrupt Mask Register 1 x00000000

MC Error Info Register 0 x28681A80  MC bus trans addr <31:4> x028681A8
MC Error Info Register 1 x800FD800  MC bus trans addr <39:32> x00000000
MC_Command x00000018

Device Id x0000003F

MC error info valid

MC error info latched

CAP Error Register xC0000000

PCI Bus Trans Error Adr x00000000
MDPA Status Register x00000000
MDPA Error Syndrome Reg x00000000
MDPB Status Register x80000000
MDPB Error Syndrome Reg x0000004B

MDPA Chip Revision x00000000
Cycle 0 ECC Syndrome x00000000
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000

MDPB Chip Revision x00000000
MDPB Error Syndrome of uncorrectable read error
Cycle 0 ECC Syndrome
Cycle 1 ECC Syndrome x00000000
Cycle 2 ECC Syndrome x00000000
Cycle 3 ECC Syndrome x00000000

PALcode Revision Palcode Rev: 1.21-3
5.3.3 MCHK 670 Read Dirty CPU-Detected Failure

The error log in Example 5-3 shows the following:

1. CPU0 logged the error in a system with two CPUs.
2. The External Interface Status Register records an uncorrectable ECC error from the system (bit <30> set).
4. The MC Error Info Registers 0 and 1 have captured the error information.
5. The commander at the time of the error was CPU0 (known from MC_ERR1).
6. The command on the bus at the time was a read memory command.
7. The address read was a memory address, not an I/O address.
8. The data associated with the read was dirty.

From this information you know that CPU0 requested data that was dirty; therefore, neither memory nor an I/O device provided it. Only another CPU could have provided the data from its cache. There is only one other CPU in this system, and it is faulty. Had there been more than two CPUs, you could not have isolated the error to a particular CPU. See Section 5.4 for a procedure designed to help with IOD-detected errors.

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The “Horse” module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is the PCI motherboard, the B3050 module. The “MC” bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.
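The reasoning used in Sections 5.3.2 and 5.3.3 for a CPU-detected uncorrectable ECC error on a memory read can be written as a small decision function. This is a minimal sketch, not a diagnostic tool: the function name `suspect_source` and its arguments are hypothetical, and it encodes only the rules stated in the text (clean data comes from memory; dirty data comes from the cache of another CPU).

```python
from typing import Sequence


def suspect_source(dirty: bool, requester: int, active_cpus: Sequence[int]) -> str:
    """Name the likely source of bad read data seen by the requesting CPU."""
    if not dirty:
        # Clean data on the system bus was supplied by a memory module.
        return "memory"
    # Dirty data was supplied from the cache of a CPU other than the requester.
    others = [cpu for cpu in active_cpus if cpu != requester]
    if not others:
        return "undetermined"
    if len(others) == 1:
        return f"CPU{others[0]}"
    return "another CPU (cannot be isolated from this log alone)"


# Section 5.3.2: four CPUs, CPU3 requested the data, Dirty bit clear.
print(suspect_source(False, 3, [0, 1, 2, 3]))  # prints: memory
# Section 5.3.3: two CPUs, CPU0 requested the data, Dirty bit set.
print(suspect_source(True, 0, [0, 1]))         # prints: CPU1
```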
Example 5-3  MCHK 670 Read Dirty Failure

Logging OS 2. DIGITAL UNIX
System Architecture 2. Alpha
Event sequence number 4.
Timestamp of occurrence 08-APR-1996 10:20:37
Host name sect06

System type register x00000016 AlphaStation 4x00
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000000

Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors

CPU Minor class 1. Machine check (670 entry)

Software Flags x0000000300000000
Active CPUs x00000003
Hardware Rev x00000000
System Serial Number C1563
Module Serial Number x0000
System Revision x00000000

* MCHK 670 Regs *
Flags: x00000000
PCI Mask x0000
Machine Check Reason x0098 Fatal Alpha Chip Detected HardError
PAL SHADOW REG 0 x0000000000000000
PAL SHADOW REG 1 x0000000000000000
PAL SHADOW REG 2 x0000000000000000
PAL SHADOW REG 3 x0000000000000000
PAL SHADOW REG 4 x0000000000000000
PAL SHADOW REG 5 x0000000000000000
PAL SHADOW REG 6 x0000000000000000
PAL SHADOW REG 7 x0000000000000000
PALTEMP0 x0000000000000000
PALTEMP1 x0000000000000000
PALTEMP2 x0000000000000000
PALTEMP3 x0000000000000000
PALTEMP4 x0000000000000000
PALTEMP5 x0000000000000000
PALTEMP6 x0000000000000000
PALTEMP7 x0000000000000000
PALTEMP8 x0000000000000000
PALTEMP9 x0000000000000000
PALTEMP10 x0000000000000000
PALTEMP11 x0000000000000000
PALTEMP12 x0000000000000000
PALTEMP13 x0000000000000000
PALTEMP14 x0000000000000000
PALTEMP15 x0000000000000000
PALTEMP16 x0000000000000000
PALTEMP17 x0000000000000000
PALTEMP18 x0000000000000000
PALTEMP19 x0000000000000000
PALTEMP20 x0000000000000000
PALTEMP21 x0000000000000000
PALTEMP22 x0000000000000000
PALTEMP23 x0000000000000000
Exception Address Reg x0000000000000000
Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
PAL Base Address Reg x0000000000000000
Interrupt Summary Reg x0000000000000000
IBOX Ctrl and Status Reg x0000000000000000

IBOX  Timeout Counter Bit Clear.
IBOX Timeout Counter Enabled.
Floating Point Instructions will cause FEN Exceptions.
PAL Shadow Registers Enabled.
Correctable Error Interrupts Enabled.
ICACHE BIST (Self Test) Was Successful.
TEST_STATUS_H Pin Asserted

Icache Par Err Stat Reg  x0000000000000000
Dcache Par Err Stat Reg  x0000000000000000
Virtual Address Reg      x0000000000044000
Memory Mgmt Flt Sts Reg  x0000000000005D10

If Err, Reference Resulted in DTB Miss
Fault Inst RA Field:
x0000000000000014

Fault Inst Opcode:
x000000000000000B
Scache Address Reg       xFFFFFF00000254BF
Scache Status Reg         x0000000000000000
Bcache Tag Address Reg    xFFFFFF8007EE2FFF

Last Bcache Access Resulted in a Miss.
Value of Parity Bit for Tag Control Status
   Bits Dirty, Shared & Valid is Set.
Value of Tag Control Dirty Bit is Clear.
Value of Tag Control Shared Bit is Clear.
Value of Tag Control Valid Bit is Clear.
Value of Parity Bit Covering Tag Store address Bits is Set.
Tag Address<38:20> Is:
x000000000000007E

Ext Interface Address Reg xFFFFFFF00007BF0BF
Fill Syndrome Reg         x000000000000D189
Ext Interface Status Reg  xFFFFFFF944FDDDD

Error Source is Memory or System
UNCORRECTABLE ECC ERROR
Error Occurred During D-ref Fill

LD LOCK                   xFFFFFFF0007FBF00F
** IOD SUBPACKET -> **               IOD 0 Register Subpacket
WHOAMI                    x000000BA  Module Revision  0.
VCTY ASIC Rev = 0
Bcache Size = 2MB
MID  2.
GID  7.

Base Address of Bridge    x000000F9E0000000
Dev Type & Rev Register   x06008021  CAP Chip Revision:  x00000001
HORSE Module Revision:  x00000002
SADDLE Module Revision:  x00000000
SADDLE Module Type:     LeftHand
PCI-EISA Bus Bridge Present on PCI Segment
PCI Class Code           x00000600
MC-PCI Command Register  x06480FF1  Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:  Enabled

Bridge to PCI Transactions:  Enabled
Bridge REQUESTS 64 Bit Data Transactions
Bridge ACCEPTS 64 Bit Data Transactions
PCI Address Parity Check:  Enabled
MC Bus CMD/Addr Parity Check:  Enabled
MC Bus NXM Check:  Enabled
Check ALL Transactions for Errors
Use MC_BMSK for 16 Byte Align Blk Mem Wrt
Wrt PEND_NUM Threshold:  8.
RD_TYPE Memory Prefetch Algorithm:  Short
RL_TYPE Mem Rd Line Prefetch Type:  Medium
RM_TYPE Mem Rd Multiple Cmd Type:  Long
ARB_MODE Arbitration:  MC-PCI Priority Mode

Mem Host Address Ext Reg  x00000000
IO Host Addr Ext Register x00000000
Interrupt Ctrl Register   x00000003
Interrupt Request         x00800000
IO Interrupt Control Register  x00000000
Interrupt Mask0 Register  x00C50010
Interrupt Mask1 Register  x00000000
MC Error Info Register 0  x07FBF080
MC Bus Trans Addr<31:4>:  7FBF080
MC Command is Read0-Mem
Device ID 2  x00000002
MC bus error assoc w read/dirty

CAP Error Register
x00000000  Uncorrectable ECC err det by MDPA
x00000000  Uncorrectable ECC err det by MDPB
x00000000  MC error info valid
x00000000  MC error info latched

Sys Environmental Regs
x00000000  MDPA Status Register Data Not Valid
x00000000  MDPA Syndrome Register Data Not Valid
x00000000  MDPB Status Register Data Not Valid
x00000000  MDPB Syndrome Register Data Not Valid

WHOAMI                    x000000BA  Module Revision  0.
VCTY ASIC Rev = 0
Bcache Size = 2MB
MID  2.
GID  7.

Base Address of Bridge    x000000FBE0000000
Dev Type & Rev Register   x06000021  CAP Chip Revision:  x00000001
HORSE Module Revision:  x00000002
SADDLE Module Revision:  x00000002
SADDLE Module Type:     LeftHand
Internal CAP Chip Arbiter:  Enabled
PCI Class Code: 00000600

Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol: Enabled
Bridge to PCI Transactions: Enabled
Bridge REQUESTS 64 Bit Data Transactions
Bridge ACCEPTS 64 Bit Data Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check: Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use MC_BMSK for 16 Byte Align Blk Mem Wrt

Wrt PEND_NUM Threshold: 8
RD_TYPE Memory Prefetch Algorithm: Short
RL_TYPE Mem Rd Line Prefetch Type: Medium
RM_TYPE Mem Rd Multiple Cmd Type: Long
ARB_MODE Arbitration: MC-PCI Priority Mode

Mem Host Address Ext Reg: 00000000
HAE Sparse Mem Addr<31:27>: 00000000
IO Host Adr Ext Register: 00000000
PCI Upper Adr Bits<31:25>: 00000000
Interrupt Ctrl Register: 00000000
Struct:Enabled
Interrupt Request: 00080000
Interrupts asserted 00000000

Interrupt Mask0 Register: 00000000
Interrupt Mask1 Register: 00000000

MC Error Info Register 0: 00000000
MC Bus Trans Addr<31:4>: 7FBF080
MC bus trans addr <39:32>: 00000000
MC Command is Read0-Mem
Device ID 2: 00000002
MC bus error assoc w read/dirty
MC error info valid

MC Error Info Register 1: 00000000
Uncorrectable ECC err det by MDPA
Uncorrectable ECC err det by MDPB
MC error info latched

CAP Error Register: 00000000
Sys Environmental Regs: 00000000
PCI Bus Trans Error Addr: 00000000
MDPA Status Register: 00000000
MDPA Error Syndrome Reg Valid: 00000000
MDPB Status Register: 00000000
MDPB Error Syndrome Reg Valid: 00000000
Palcode Revision: 1.21-3
5.3.4 MCHK 660 IOD-Detected Failure (System Bus Error)

The error log in Example 5-4 shows the following:

1. CPU0 logged the error in a system with two CPUs.
2. The External Interface Status Register does not record an error.
4. The MC Error Info Registers 0 and 1 captured the error information.
5. The commander at the time of the error was CPU1 (known from MC_ERR1).
6. The command on the bus at the time was a write-back memory command.

Since this is an MCHK 660, the IOD detected the error on the bus, and CPU0 is logging the error. CPU0 registers are not important in this case since it is servicing the IOD interrupt. Three kinds of devices can put data on the system bus: CPUs, memory, and IODs. From MC_ERR Register 1 we know that at the time of the error CPU1 put bad data on the bus while writing to memory. See Section 5.4 for a procedure designed to help with IOD-detected errors.

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The “Horse” module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is the PCI motherboard, the B3050 module. The “MC” bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.
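For IOD-detected errors, the CAP Error Register of each IOD subpacket is the first register to check. The Python sketch below shows one way to turn its value into the messages seen in the logs. The bit assignments are assumptions inferred from only two logged values in this section (xC0000000 decoded as an MDPB error, xA0000000 decoded as an MDPA error), and the names `CAP_ERR_BITS` and `decode_cap_error` are hypothetical; the register description remains the authoritative definition.

```python
# Bit positions inferred from the example logs, not from a register specification.
CAP_ERR_BITS = {
    31: "MC error info latched",
    30: "Uncorrectable ECC err det by MDPB",
    29: "Uncorrectable ECC err det by MDPA",
}


def decode_cap_error(value: int) -> list:
    """Return the log messages for the bits set in a CAP Error Register value."""
    if value == 0:
        return ["(no error seen)"]
    messages = [text for bit, text in CAP_ERR_BITS.items() if value & (1 << bit)]
    known = sum(1 << bit for bit in CAP_ERR_BITS)
    leftover = value & ~known
    if leftover:
        # Bits this sketch does not know about are reported, not guessed at.
        messages.append(f"undecoded bits x{leftover:08X}")
    return messages


print(decode_cap_error(0xA0000000))
# prints: ['MC error info latched', 'Uncorrectable ECC err det by MDPA']
```

When more than one IOD subpacket is present, decoding each CAP Error Register this way shows which bridge latched the error information.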
Example 5-4  MCHK 660 IOD-Detected Failure (System Bus Error)

Logging OS 2. DIGITAL UNIX  
System Architecture 2. Alpha  
Event sequence number 6.  
Timestamp of occurrence 04-APR-1996 17:20:04  
Host name whip16  
System type register x00000016 AlphaStation 4x00  
Number of CPUs (mpnum) x00000002  
CPU logging event (mperr) x00000000  
Event validity 1. O/S claims event is valid  
Event severity 1. Severe Priority  
Entry type 100. CPU Machine Check Errors  

CPU Minor class 2. 660 Entry  
Software Flags x000000000300000000 IOD 1 Register Subpkt Pres  
Active CPUs x00000003 IOD 2 Register Subpkt Pres  
Hardware Rev x00000000  
System Serial Number C1563  
Module Serial Number x0000  
Module Type x0000  
System Revision x00000000  
* MCHK 660 Regs *  
Flags: x00000000  
PCI Mask x0000  
PAL Check Reason x0202  
PAL SHADOW REG 0 x00000000  
PAL SHADOW REG 7 x00000000  
PALSHADOW REG 7 x0000000007  
PALTEMP23 x00000000047FDA58  
Exception Address Reg xFFFFFFFC0000000000 Native-mode instruction  
Exception Summary Reg x00000000  
Exception Mask Reg x00000000  
PAL BASE x00000000020000  
Interrupt Summary Reg x00000000020000 EXT. HW interrupt at IPL21  
IBOX Ctrl and Status Reg x00000000C16000000 Timeout Bit Not Set  
Icache Par Err Stat Reg x00000000  
Dcache Par Err Stat Reg x00000000  
Virtual Address Reg xFFFFFFF800130  
Memory Mgmt Flt Sts Reg x00000000014990 Ref resulted in DTB miss  

RA Field  x0000000006
Opcode Field  x0000000000029
Scache Address Reg  xFFFFFFF000002A4AF
Scache Status Reg  x00000000
Bcache Tag Address Reg  xFFFFFFF80FFED6FFF
Parity for ds and v bits
Cache block dirty
Cache block valid
Tag address<38:20> is
x0000000000000FFE
Ext Interface Address Reg  xFFFFFFF00FC00000F
Fill Syndrome Reg  x0000000000C5D2
Ext Interface Status Reg  xFFFFFFF004FFFFFF
Error occurred during D-ref fill ②
LD LOCK  xFFFFFFF0000020665F
** IOD SUBPACKET -> **
WHOAMI  x000000BA  Device ID  x0000003A
Bcache Size = 2MB
VCTY ASIC Rev = 0
Module Revision  0.
Base Address of Bridge  x000000F9E0000000
PCI Revision  x06008021
CAP Chip Revision  x00000001
Horse Module Revision  x00000002
Saddle Module Revision  x00000000
Saddle Module Type  Left Hand
EISA Present
PCI Class Code  x00000600
MC-PCI Command Register  x064800F1
Selftest passed
Delayed read enabled
Bridge PCI trans enabled
Req 64 bit data trans enabled
Accept 64 bit data trans enabled
Check PCI Addr Parity enabled
Check MC bus CMS/Addr Parity enabled
Check MC bus NXM enabled
Check all transaction enabled
16 byte aligned block write enabled
Write Pend Number Thresho x00000008
RD_TYPE  Short
RL_TYPE  Medium
RM_TYPE  Long
ARB_MODE  MC-PCI Bridge Priority Mode
Memory Host Addr Extension  x00000000
IO Host Addr Extension  x00000000
Interrupt Control  x00000003
MC-PCI Intr Enabled
Device intr info enabled if en_int = 1
Interrupt Request  x00000000
Interrupts asserted  x00000000
Hard Error
Interrupt Mask Register 0  x00000000
Interrupt Mask Register 1  x00000000
MC Error Info Register 0  x4A26DBF0
MC bus trans addr <31:4>  x04A26DBF
MC Error Info Register 1  x800ED600
MC bus trans addr <39:32>  x00000000
MC_Command  x00000016  ⑥
Device Id  x0000003B  ⑤
MC error info valid
CAP Error Register  xA0000000
Uncorrectable ECC err det by MDPA  ③
MC error info latched  ④

PCI Bus Trans Error Addr  x00000000
MDPA Status Register  x80000000  MDPA Chip Revision  x00000000
MDPA Error Syndrome of uncorrectable read error

MDPA Error Syndrome Reg  x1E00001E  Cycle 0 ECC Syndrome
  Cycle 1 ECC Syndrome  x00000000
  Cycle 2 ECC Syndrome  x00000000
  Cycle 3 ECC Syndrome

MDPB Status Register  x00000000  MDPB Chip Revision  x00000000
MDPB Error Syndrome Reg  x00000000  Cycle 0 ECC Syndrome  x00000000
  Cycle 1 ECC Syndrome  x00000000
  Cycle 2 ECC Syndrome  x00000000
  Cycle 3 ECC Syndrome  x00000000

** IOD SUBPACKET -> **

IOD 1 Register Subpacket
WHOAMI  x000000BA  Device ID  x0000003A
  Bcache Size = 2MB
  VCTY ASIC Rev = 0
  Module Revision 0.

Base Address of Bridge  x0000000FBE000000
PCI Revision  x06000021  CAP Chip Revision  x00000001
  Horse Module Revision  x00000002
  Saddle Module Revision  x00000000
  Saddle Module Type Left Hand
  PCI Class Code  x00000600
  MC-PCI Command Register  x06480FF1
    Selftest passed
    Delayed read enabled
    Bridge PCI trans enabled
    Req 64 bit data trans enabled
    Accept 64 bit data trans enabled
    Check PCI Addr Parity enabled
    Check MC bus CMS/Addr Parity enabled
    Check MC bus NXM enabled
    Check all transaction enabled
    16 byte aligned block write enabled
    Write Pend Number Thresho x00000008
    RD_TYPE Short
    RL_TYPE Medium
    RM_TYPE Long
    ARB_MODE MC-PCI Bridge Priority Mode

Memory Host Addr Extension  x00000000
IO Host Addr Extension  x00000000
Interrupt Control  x00000003  MC-PCI Intr Enabled
    Device Intr info enabled if en_int = 1

Interrupt Request  x00800000  Interrupts asserted  x00000000
Hard Error

Interrupt Mask Register 0  x00C50001
Interrupt Mask Register 1  x00000000
MC Error Info Register 0  x4A26DBF0
MC bus trans addr <31:4>  x04A26DBF
MC Error Info Register 1  x800ED600
MC bus trans addr <39:32>  x00000000

MC_Command  x00000016
Device Id  x0000003B
MC error info valid

CAP Error Register  xA0000000
Uncorrectable ECC err det by MDPA
MC error info latched

PCI Bus Trans Error Addr  x00000000
MDPA Status Register  x80000000  MDPA Chip Revision  x00000000
MDPA Error Syndrome Reg  x1E00001E  Cycle 0 ECC Syndrome  x00000000
  Cycle 1 ECC Syndrome  x00000000
  Cycle 2 ECC Syndrome  x00000000
  Cycle 3 ECC Syndrome  x00000000

MDPB Status Register  x00000000  MDPB Chip Revision  x00000000
MDPB Error Syndrome Reg  x00000000  Cycle 0 ECC Syndrome  x00000000
  Cycle 1 ECC Syndrome  x00000000
  Cycle 2 ECC Syndrome  x00000000
  Cycle 3 ECC Syndrome  x00000000

PALcode Revision: Palcode Rev: 1.21-3
5.3.5 MCHK 660 IOD-Detected Failure (PCI Error)

The error log in Example 5-5 shows the following:

1. CPU0 logged the error in a system with three CPUs.
2. The External Interface Status register records that the error occurred during a D-ref Fill but does not indicate what the error is.
3. The CAP Error register for IOD0 did not see an error.
4. The CAP Error register for IOD1, however, records a serious error.
5. The MC Error Info registers 0 and 1 captured the error information.
6. The commander at the time of the error was CPU0 and the command was a Read-IO (known from MC_ERR1).
7. There is a PCI Subpacket from PCI1 with five nodes on it. Three devices on the PCI bus did not see an error; however, two did: the Mylex DAC960 and the DEC_KZPSA. Either device could have caused the parity error.

Since this is an MCHK 660, the IOD detected the error on the bus, and CPU0 is logging the error. CPU0 registers are not important in this case since it is servicing the IOD interrupt. Three kinds of devices can put data on the system bus: CPUs, memory, and IODs. The CAP Error register for IOD1 saw a serious error, and the MC Error Info register captured error information. See Section 5.4 for a procedure designed to help with IOD-detected errors.

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The "Horse" module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The "Saddle" module is the PCI motherboard, the B3050 module. The "MC" bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.
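The first item in each analysis (which CPU logged the error, and how many CPUs the system has) comes from three header fields: Number of CPUs (mpnum), CPU logging event (mperr), and Active CPUs. The following Python sketch summarizes them. It assumes that bit n of the Active CPUs mask corresponds to CPU n, which is consistent with the headers in this section (x0F with four CPUs, x03 with two, x07 with three) but is not stated in the text; the function names are hypothetical.

```python
def active_cpus(mask: int) -> list:
    """Return the CPU numbers whose bits are set in the Active CPUs mask."""
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]


def summarize_header(mpnum: int, mperr: int, active_mask: int) -> str:
    """Describe which CPU logged the event and which CPUs are active."""
    cpus = active_cpus(active_mask)
    names = ", ".join(f"CPU{n}" for n in cpus)
    # Flag a header whose CPU count and active mask do not agree.
    note = "" if len(cpus) == mpnum else " (mask disagrees with mpnum)"
    return f"CPU{mperr} logged the event; {mpnum} CPUs reported, active: {names}{note}"


# Header values from Example 5-5.
print(summarize_header(mpnum=3, mperr=0, active_mask=0x07))
# prints: CPU0 logged the event; 3 CPUs reported, active: CPU0, CPU1, CPU2
```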
Example 5-5  MCHK 660 IOD-Detected Failure (PCI Error)

<table>
<thead>
<tr>
<th>Description</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Logging OS</td>
<td>2. DIGITAL UNIX</td>
</tr>
<tr>
<td>System Architecture</td>
<td>2. Alpha</td>
</tr>
<tr>
<td>Event sequence number</td>
<td>2.</td>
</tr>
<tr>
<td>Timestamp of occurrence</td>
<td>27-AUG-1996 08:15:41</td>
</tr>
<tr>
<td>Host name</td>
<td>mason3</td>
</tr>
<tr>
<td>System type register</td>
<td>x00000016</td>
</tr>
<tr>
<td>Number of CPUs (mpnum)</td>
<td>x00000003</td>
</tr>
<tr>
<td>CPU logging event (mperr)</td>
<td>x00000000</td>
</tr>
<tr>
<td>Event validity</td>
<td>1. O/S claims event is valid</td>
</tr>
<tr>
<td>Event severity</td>
<td>1. Severe Priority</td>
</tr>
<tr>
<td>Entry type</td>
<td>100. CPU Machine Check Errors</td>
</tr>
<tr>
<td>CPU Minor class</td>
<td>2. 660 Entry</td>
</tr>
<tr>
<td>Software Flags</td>
<td>x000000023000000000</td>
</tr>
<tr>
<td></td>
<td>IOD 0 Register Subpkt Pres</td>
</tr>
<tr>
<td></td>
<td>IOD 1 Register Subpkt Pres</td>
</tr>
<tr>
<td></td>
<td>PCI 1 Bus Snapshot Present</td>
</tr>
<tr>
<td>Active CPUs</td>
<td>x00000007</td>
</tr>
<tr>
<td>Hardware Rev</td>
<td>x00000000</td>
</tr>
<tr>
<td>System Serial Number</td>
<td>NI62503MWE</td>
</tr>
<tr>
<td>Module Serial Number</td>
<td>x0000</td>
</tr>
<tr>
<td>System Revision</td>
<td>x00000000</td>
</tr>
</tbody>
</table>

* MCHK 660 Regs *

<table>
<thead>
<tr>
<th>Description</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Flags:</td>
<td>x00000000</td>
</tr>
<tr>
<td>PCI Mask</td>
<td>x0002</td>
</tr>
<tr>
<td>Machine Check Reason</td>
<td>x0202</td>
</tr>
<tr>
<td></td>
<td>IOD-Detected Hard Error -OR- DTag Parity Error (If Cached CPU)</td>
</tr>
<tr>
<td>PAL SHADOW REG 0</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 1</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 2</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 3</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 4</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 5</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 6</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PAL SHADOW REG 7</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP0</td>
<td>xFFFFFFFBBB93C0000</td>
</tr>
<tr>
<td>PALTEMP1</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP2</td>
<td>xFFFFFFCC00043CD00</td>
</tr>
<tr>
<td>PALTEMP3</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP4</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP5</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP6</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP7</td>
<td>xFFFFFFC00043C820</td>
</tr>
<tr>
<td>PALTEMP8</td>
<td>x1F1E71515020100</td>
</tr>
<tr>
<td>PALTEMP9</td>
<td>xFFFFFFC00043C810</td>
</tr>
<tr>
<td>PALTEMP10</td>
<td>xFFFFFFC000433E0C</td>
</tr>
<tr>
<td>PALTEMP11</td>
<td>xFFFFFFC00043C970</td>
</tr>
<tr>
<td>PALTEMP12</td>
<td>xFFFFFFC00043CD10</td>
</tr>
<tr>
<td>PALTEMP13</td>
<td>x0000000000026E80</td>
</tr>
<tr>
<td>PALTEMP14</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP15</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP16</td>
<td>x0000020366000001</td>
</tr>
<tr>
<td>PALTEMP17</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP18</td>
<td>x000000000000000000</td>
</tr>
<tr>
<td>PALTEMP19</td>
<td>xFFFFFFFBBB93F958</td>
</tr>
<tr>
<td>PALTEMP20</td>
<td>x00000000009D2000</td>
</tr>
</tbody>
</table>
PALTEMP21 xFFFFFFC0000043CD40
PALTEMP22 xFFFFFFC000058D540
PALTEMP23 x0000000007C7A58
Exception Address Reg xFFFFFFC000043EEC

Native-mode Instruction
Exception PC x3FFFFF000010CF83

Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
PAL Base Address Reg x0000000000020000

Base Addr for PALcode:
x0000000000000008
Interrupt Summary Reg x0000000000020000
External HW Interrupt at IPL21
AST Requests 3-0:
x0000000000000000
IBOX Ctrl and Status Reg x0000000C16000000
Timeout Counter Bit Clear.
IBOX Timeout Counter Enabled.
Floating Point Instructions will Cause
FEN Exceptions.
PAL Shadow Registers Enabled.
Correctable Error Interrupts Enabled.
ICACHE BIST (Self Test) Was Successful.
TEST_STATUS_H Pin Asserted

Icache Par Err Stat Reg x0000000000000000
Dcache Par Err Stat Reg x0000000000000000
Virtual Address Reg xFFFFFFFB6ED3D38
Memory Mgmt Flt Sts Reg x0000000000016211
If Error, Reference Which Caused Was Write
If Err, Reference Resulted in DTB Miss
Fault Inst RA Field: x0000000000000008
Fault Inst Opcode: x000000000000000C

Scache Address Reg xFFFFFFFF0000002502F
Scache Status Reg x0000000000000000
Bcache Tag Address Reg xFFFFFFF8077AFAFFF
Last Bcache Access Resulted in a Miss.
Value of Parity Bit for Tag Control Status
Bits Dirty, Shared & Valid is Set.
Value of Tag Control Dirty Bit is Clear.
Value of Tag Control Shared Bit is Set.
Value of Tag Control Valid Bit is Set.
Value of Parity Bit Covering Tag Store
Address Bits is Set.

Tag Address<38:20> Is: x0000000000000077A
Ext Interface Address Reg xFFFFFFF00E7E0000F
Fill Syndrome Reg x000000000000DE08

Ext Interface Status Reg xFFFFFFF004EFFFFF
Error Occurred During D-ref Fill
LD LOCK xFFFFFFF0076750B8F

** IOD SUBPACKET -> **

IOD 0 Register Subpacket
WHOAMI x000008BA Module Revision 2.
VCTY ASIC Rev = 0
Bcache Size = 2MB
MID 2.
GID 7.

Base Address of Bridge x000000F9E0000000
Dev Type & Rev Register x06008231 CAP Chip Revision: x00000001
HORSE Module Revision: x00000003
SADDLE Module Revision: x00000002
SADDLE Module Type: Left Hand
PCI-EISA Bus Bridge Present on PCI Segment
PCI Class Code x00000600
MC-PCI Command Register x06470FB1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol: Enabled
Bridge WILL NOT REQUEST 64 Bit Data Trans
Bridge ACCEPTS 64 Bit Data Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check: Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use MC_BMSK for 16 Byte Align Blk Mem Wrt
Wrt PEND_NUM Threshold: 7
RD_TYPE Memory Prefetch Algorithm: Short
RL_TYPE Mem Rd Line Prefetch Type: Medium
RM_TYPE Mem Rd Multiple Cmd Type: Long
ARB_MODE Arbitration: MC-PCI Priority Mode

Mem Host Address Ext Reg x00000000 HAE Sparse Mem Addr<31:27> x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25> x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Interrupt Request x00000000 Interrupts asserted x00000000
Interrupt Mask0 Register x00000000
Interrupt Mask1 Register x00000000
MC Error Info Register 0 xE0000000 MC Bus Trans Addr<31:4>: E0000000
MC Error Info Register 1 x000E89FD MC bus trans addr<39:32>: x000000FD
MC Command is Read1-IO
CPU0 Master at Time of Error
Device ID 2: x00000002

CAP Error Register x00000000
Sys Environmental Regs x00000000
PCI Bus Trans Error Addr x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid
MDPB Status Register x00000000 MDPB Status Register Data Not Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid

WHOAMI x000008BA Module Revision 2.
VCTY ASIC Rev = 0
Bcache Size = 2MB
MID 2.
GID 7.

Base Address of Bridge x00000F8E0000000
Dev Type & Rev Register x06000231 CAP Chip Revision: x00000001
HORSE Module Revision: x00000003
SADDLE Module Revision: x00000002
SADDLE Module Type: Left Hand
Internal CAP Chip Arbiter: Enabled
PCI Class Code x00000600
MC-PCI Command Register x06470FB1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol: Enabled
Bridge to PCI Transactions: Enabled
Bridge WILL NOT REQUEST 64 Bit Data Trans
Bridge ACCEPTS 64 Bit Data Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check: Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use MC_BMSK for 16 Byte Align Blk Mem Wrt
Wrt PEND_NUM Threshold: 7
RD_TYPE Memory Prefetch Algorithm: Short
RL_TYPE Mem Rd Line Prefetch Type: Medium
RM_TYPE Mem Rd Multiple Cmd Type: Long
ARB_MODE Arbitration: MC-PCI Priority Mode
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Addr<31:27> x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25> x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled Interrupt Request x00800000 Interrupts asserted x00000000
Interrupt Mask0 Register x00C51111 Hard Error
Interrupt Mask1 Register x00000000
MC Error Info Register 0 xE0000000
MC Error Info Register 1 x000E89FD 6 MC Bus Trans Addr<31:4>: E0000000 7
MC Command is Read1-I/O 6 MC bus trans addr <39:32> x000000FD 5
CAP Error Register x00000012 4 PCI error address reg locked
Sys Environmental Regs x00000000
PCI Bus Trans Error Adr x00BB6900
MDPA Status Register x00000000  MDPA Status Register Data Not Valid
MDPA Error Syndrome Reg x00000000  MDPA Syndrome Reg. Data Not Valid
MDPB Status Register x00000000  MDPB Status Register Data Not Valid
MDPB Error Syndrome Reg x00000000  MDPB Syndrome Reg. Data Not Valid
PALcode Revision Palcode Rev: 1.21-3
** PCI SUBPACKET -> ** PCI 1 Subpacket
Node Qty 5.
CONFIG Address x000000FBC0000800
Device and Vendor ID x00011000 NCR 53C810 NCR_810 SCSI Narrow SingleEnded
Vendor ID: x1000 (NCR)
Command Register x0147 I/O Space Accesses Response: Enabled
x00 Memory Space Accesses Response: Enabled
PCI Bus Master Capability: Enabled
Monitor for Special Cycle Ops: DISABLED
Generate Mem Wrt/Invalidate Cmds: DISABLED
Parity Error Detection Response: Normal
Wait Cycle Address/Data Stepping: DISABLED
SERR# Sys Err Driver Capability: Enabled
Fast Back-to-Back to Many Target: DISABLED
x0200 Device is 33 Mhz Capable.
Status Register No Support for User Defineable Features.
Fast Back-to-Back to Different Targets,
Is Not Supported in Target Device.
Device Select Timing: Medium.
Revision ID x02
Device Class Code x010000 Mass Storage: SCSI Bus Controller
Cache Line S x00
Latency T. xFF
Header Type x00 Single Function Device
Bist x00
Base Address Register 1 x00101300
Base Address Register 2 x01119300
Base Address Register 3 x00000000
Base Address Register 4 x00000000
Base Address Register 5 x00000000

Base Address Register 6  x00000000
Expansion Rom Base Address x00000000
Interrupt P1   x04
Interrupt P2   x01
Min Gnt       x00
Max Lat       x00

CONFIG Address  x000000FBC0001000
Device and Vendor ID x00011069  Mylex DAC960 KEPSC RAID Controller
Vendor ID:  x1069 (Mylex)
Device ID:  x00000001
Command Register  x0147  I/O Space Accesses Response:  Enabled
Memory Space Accesses Response:  Enabled
PCI Bus Master Capability:  Enabled
Monitor for Special Cycle Ops:  DISABLED
Generate Mem Wrt/Invalidate Cmds:  DISABLED
Parity Error Detection Response:  Normal
Wait Cycle Address/Data Stepping:  DISABLED
SERR# Sys Err Driver Capability:  Enabled
Fast Back-to-Back to Many Target:  DISABLED
xC200  Device is 33 Mhz Capable.
Status Register
No Support for User Defineable Features.
Fast Back-to-Back to Different Targets, Is Not Supported in Target Device.
Device Select Timing:  Medium.
SIGNALED SYSTEM ERROR:  This Device has Set A System Error on SERR# Line.
DETECTED PARITY ERROR:  This Device Detected
Revision ID x02
Device Class Code  x010400
Cache Line S  x10
Latency T.  xFF
Header Type  x00  Single Function Device
Bist  x00
Base Address Register 1  x00101200
Base Address Register 2  x01119200
Base Address Register 3  x00000000
Base Address Register 4  x00000000
Base Address Register 5  x00000000
Base Address Register 6  x00000000
Expansion Rom Base Address  x01110000
Interrupt P1  x08
Interrupt P2  x01
Min Gnt  x04
Max Lat  x00

CONFIG Address  x000000FBC0001800
Device and Vendor ID x00011000  NCR 53C810 NCR_810 SCSI Narrow SingleEnded
Vendor ID:  x1000 (NCR)
Device ID:  x00000001
Command Register  x0147  I/O Space Accesses Response:  Enabled
Memory Space Accesses Response:  Enabled
PCI Bus Master Capability:  Enabled
Monitor for Special Cycle Ops:  DISABLED
Generate Mem Wrt/Invalidate Cmds:  DISABLED
Parity Error Detection Response:  Normal
Wait Cycle Address/Data Stepping:  DISABLED
SERR# Sys Err Driver Capability:  Enabled
Fast Back-to-Back to Many Target:  DISABLED
xC200  Device is 33 Mhz Capable.
Status Register
No Support for User Defineable Features.
Fast Back-to-Back to Different Targets, Is Not Supported in Target Device.
Device Select Timing: Medium.

Revision ID: x02
Device Class Code: x010000 Mass Storage: SCSI Bus Controller
Cache Line S: x00
Latency T.: xFF
Header Type: x00 Single Function Device
Bist: x00
Base Address Register 1: x00101100
Base Address Register 2: x01119100
Base Address Register 3: x00000000
Base Address Register 4: x00000000
Base Address Register 5: x00000000
Base Address Register 6: x00000000
Expansion Rom Base Address: x00000000
Interrupt P1: x0C
Interrupt P2: x01
Min Gnt: x00
Max Lat: x00

CONFIG Address: x000000FBC0002000
Device and Vendor ID: x00091011 DECchip 21140 10/100Mhz TULIP Ethernet
Vendor ID: x1011 (Digital Equip Corp)
Device ID: x00000009
Command Register: x0147 I/O Space Accesses Response: Enabled
Memory Space Accesses Response: Enabled
PCI Bus Master Capability: Enabled
Monitor for Special Cycle Ops: DISABLED
Generate Mem Wrt/Invalidate Cmds: DISABLED
Parity Error Detection Response: Normal
Wait Cycle Address/Data Stepping: DISABLED
SERR# Sys Err Driver Capability: Enabled
Fast Back-to-Back to Many Target: DISABLED
x0280 Device is 33 Mhz Capable.

Status Register
Revision ID: x12
Device Class Code: x020000 Network Controller: Ethernet Controller
Cache Line S: x00
Latency T.: xFF
Header Type: x00 Single Function Device
Bist: x00
Base Address Register 1: x00101000
Base Address Register 2: x01119000
Base Address Register 3: x00000000
Base Address Register 4: x00000000
Base Address Register 5: x00000000
Base Address Register 6: x00000000
Expansion Rom Base Address: x00000000
Interrupt P1: x10
Interrupt P2: x01
Min Gnt: x00
Max Lat: x00

CONFIG Address: x000000FBC0002800
Device and Vendor ID: x00081011 DEC_KZPSA Fast-Wide-Differential SCSI
Vendor ID: x1011 (Digital Equip Corp)
Device ID: x00000008

Command Register: x0147 I/O Space Accesses Response: Enabled
Memory Space Accesses Response: Enabled
PCI Bus Master Capability: Enabled
Generate Mem Wrt/Invalidate Cmds: DISABLED
Parity Error Detection Response: Normal
Wait Cycle Address/Data Stepping: DISABLED
SERR# Sys Err Driver Capability: Enabled
Fast Back-to-Back to Many Target: DISABLED
Status Register: x82C0 Device is 33 Mhz Capable.
Device Supports User Defineable Features.
Fast Back-to-Back to Different Targets, Is Supported in Target Device.
Device Select Timing: Medium.
RECEIVED MASTER-ABORT: Master Sets When Its Transaction Terminated by MasterAbort.
SIGNALED SYSTEM ERROR: This Device has Set A System Error on SERR# Line.
DETECTED PARITY ERROR: This Device Detected
Revision ID: x00
Device Class Code: x010000 Mass Storage: SCSI Bus Controller
Cache Line S: x10
Latency T.: xFF
Header Type: x00 Single Function Device
Bist: x80
Base Address Register 1: x01118000
Base Address Register 2: x00000000
Base Address Register 3: x00100000
Base Address Register 4: x01000000
Base Address Register 5: x00000000
Base Address Register 6: x00000000
Expansion Rom Base Address: x01100000
Interrupt P1: x14
Interrupt P2: x01
Min Gnt: x08
Max Lat: x7F

5.3.6 MCHK 630 Correctable CPU Error

The error log in Example 5-6 shows the following:

1. CPU0 logged the error in a system with two CPUs.
2. During a D-ref fill, the External Interface Status Register shows no error but states that the “data source is b-cache.” (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache. It performs a D-ref fill.)
3. Both IOD CAP Error Registers logged no error.
4. The FIL Syndrome Register has a valid ECC code for the lower half of the data.

Machine check 630s are detected by CPUs either when they take data off the system bus or when they access their own B-cache. In this case, the data did not come from the system bus; otherwise, bit <30> would be set in the External Interface Status Register. CPU0 had a single-bit, ECC-correctable error.
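
The bit <30> test described above can be sketched as follows. This is a minimal illustration, not code from the console or the operating system: the function name is hypothetical, and the sample register values in the usage note are made up to exercise both cases.

```python
# Bit <30> of the External Interface Status Register (EI_STAT), as described
# in the text: set when the fill data came from the system bus.
EI_STAT_SYSTEM_BUS_BIT = 1 << 30


def fill_data_source(ei_stat: int) -> str:
    """Return where the fill data came from, based only on EI_STAT bit <30>."""
    if ei_stat & EI_STAT_SYSTEM_BUS_BIT:
        return "system bus"
    return "B-cache"
```

For example, `fill_data_source(0x40000000)` reports the system bus, and `fill_data_source(0x0)` reports the B-cache. Both input values are hypothetical, not taken from an error log.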

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The “Horse” module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is the PCI motherboard, the B3050 module. The “MC” bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.
Example 5-6  MCHK 630 Correctable CPU Error

Logging OS 2. DIGITAL UNIX
System Architecture 2. Alpha
Event sequence number 415.
Timestamp of occurrence 09-MAY-1996 14:56:30
Host name whipl6
System type register x00000016 AlphaStation 4x00
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000000

Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 100. CPU Machine Check Errors

CPU Minor class 3. Bcache error (630 entry)

Software Flags x00000000
Active CPUs x00000003
Hardware Rev x00000000
System Serial Number C1563
Module Serial Number x0000
System Revision x00000000

Machine Check Reason x0086 Alpha Chip Detected ECC Err, From B-Cache

EI STAT xFFFFFFF004FFFFF
DATA SOURCE IS BCACHE
D-ref fill EV5 Chip Rev 4
EI ADDRESS xFFFFFFF00138D8EF

FIL SYNDROME x00000000000800
ISR x0000000100200000
WHOAMI x00000000 Module Revision 0.
MID 0.
GID 0.

Sys Environmental Regs x00000000
Base Addr of Bridge x00000000
Dev Type & Rev Register x00000000

CAP Chip Revision x00000000
Horse Module Revision: x00000000
Saddle Module Revision: x00000000
Saddle Module Type: LeftHand
Internal CAP Chip Arbiter: Enabled
PCI Class Code x00000000

MC Error Info Register 0 x00000000
MC Bus Trans Addr<31:4>: 0
MC bus trans addr <39:32> x00000000
MC Command is Illegal
Illegal Device ID 2 x00000000

MC Error Info Register 1 x00000000

CAP Error Register x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid
MDPB Status Register x00000000 MDPB Status Register Data Not Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid

PALcode Revision x00000000
Palcode Rev: 1.21-3
5.3.7 MCHK 620 Correctable Error

The MCHK 620 error is a correctable error detected by the IOD.

The error log in Example 5-7 shows the following:

1. CPU0 logged the error in a system with two CPUs.
2. The External Interface Status Register is not valid.
3. The MC Error Info Registers 0 and 1 captured the error information.
4. The commander at the time of the error was CPU0.
5. The command at the time of the error was a write-back memory command.

The IOD detected a recoverable error on the system bus. The MC command at the time of the error is a memory write-back command (command code x00000006 in MC_CMD<3:0>; see Table 5-9). The system bus commander at the time of the error is CPU0. Since this is a write, the defective FRU is CPU0.
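
As an aid to reading MC Error Info Register 1 by hand, the sketch below splits a register value into the fields used in this analysis. The bit positions are an assumption inferred from the data patterns in Table 5-4 and from the decoded log entries in this chapter (MID in bits <16:14>, command in bits <13:8>, MC bus address <39:32> in bits <7:0>); they are not quoted from a register specification, and the function name is hypothetical.

```python
def decode_mc_err1(value: int) -> dict:
    """Split an MC_ERR1 value into fields (assumed layout, see lead-in).

    bits <16:14>  MID of the commander (low three bits of the node ID)
    bits <13:8>   six-bit MC command
    bits <7:0>    MC bus transaction address <39:32>
    """
    addr_39_32 = value & 0xFF
    return {
        "mid": (value >> 14) & 0x7,
        "command": (value >> 8) & 0x3F,
        "addr_39_32": addr_39_32,
        # Address bit <39> distinguishes I/O space (1) from memory space (0).
        "io_space": bool(addr_39_32 & 0x80),
    }
```

Under these assumptions, the value x800E9600 from Example 5-7 yields MID 2 (CPU0), command x16, and a memory-space address; the value x000E89FD yields MID 2, command x09, and an I/O-space address with bits <39:32> equal to xFD.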

NOTE: The error log example has been edited to decrease its size; registers of interest are in bold type. The “Horse” module referred to in the error log is the system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is the PCI motherboard, the B3050 module. The “MC” bus is the system bus.

Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 for information on node IDs.

Example 5-7 MCHK 620 Correctable Error

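<table>
<thead>
<tr>
<th>Field</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Logging OS</td>
<td>2. DIGITAL UNIX</td>
</tr>
<tr>
<td>System Architecture</td>
<td>2. Alpha</td>
</tr>
<tr>
<td>Event sequence number</td>
<td>32</td>
</tr>
<tr>
<td>Timestamp of occurrence</td>
<td>28-JUN-1996 19:45:42</td>
</tr>
<tr>
<td>Host name</td>
<td>sect06</td>
</tr>
<tr>
<td>System type register</td>
<td>x00000016 AlphaStation 4x00</td>
</tr>
<tr>
<td>Number of CPUs (mpnum)</td>
<td>x00000002</td>
</tr>
<tr>
<td>CPU logging event (mperr)</td>
<td>x00000000</td>
</tr>
<tr>
<td>Software Flags</td>
<td>x0000000000000000</td>
</tr>
<tr>
<td>Active CPUs</td>
<td>x00000003</td>
</tr>
<tr>
<td>Hardware Rev</td>
<td>x00000000</td>
</tr>
<tr>
<td>System Serial Number</td>
<td>C1563</td>
</tr>
<tr>
<td>Module Serial Number</td>
<td></td>
</tr>
<tr>
<td>Module Type</td>
<td>x0000</td>
</tr>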
<tr>
<td>System Revision</td>
<td>x00000000</td>
</tr>
<tr>
<td>Machine Check Reason</td>
<td>x0204 IOD Detected Soft Error</td>
</tr>
<tr>
<td><strong>Ext Interface Status Reg</strong></td>
<td><strong>x0000000000000000</strong></td>
</tr>
<tr>
<td><strong>Ext Interface Address Reg</strong></td>
<td><strong>x0000000000000000</strong></td>
</tr>
<tr>
<td><strong>Fill Syndrome Reg</strong></td>
<td><strong>x0000000000000000</strong></td>
</tr>
<tr>
<td><strong>Interrupt Summary Reg</strong></td>
<td><strong>x0000000000000000</strong></td>
</tr>
<tr>
<td>WHOAMI</td>
<td>x00000000 Module Revision 0. MID 0. GID 0.</td>
</tr>
<tr>
<td>Sys Environmental Regs</td>
<td>x00000000</td>
</tr>
<tr>
<td>Base Addr of Bridge</td>
<td>x000000FBE0000000</td>
</tr>
<tr>
<td>Dev Type &amp; Rev Register</td>
<td>x06000012 CAP Chip Revision: x00000002 Horse Module Revision: x00000003</td>
</tr>
<tr>
<td></td>
<td>Saddle Module Revision: x00000000 Saddle Module Type: LeftHand</td>
</tr>
<tr>
<td></td>
<td>Internal CAP Chip Arbiter: Enabled</td>
</tr>
<tr>
<td></td>
<td>PCI Class Code: x00000600</td>
</tr>
<tr>
<td>MC Error Info Register 0</td>
<td>x122D5640</td>
</tr>
<tr>
<td>MC Error Info Register 1</td>
<td>x800E9600</td>
</tr>
<tr>
<td>CAP Error Register</td>
<td>x89000000</td>
</tr>
<tr>
<td>MDPA Status Register</td>
<td>x00000000 MDPA Status Register Data Not Valid</td>
</tr>
<tr>
<td>MDPA Error Syndrome Reg</td>
<td>x00000000 MDPA Syndrome Register Data Not Valid</td>
</tr>
<tr>
<td>MDPB Status Register</td>
<td>x00000000 MDPB Status Register Data Not Valid</td>
</tr>
<tr>
<td>MDPB Error Syndrome Reg</td>
<td>x00000000 MDPB Syndrome Register Data Not Valid</td>
</tr>
<tr>
<td>PALcode Revision</td>
<td>x00000000 MDPB Palcode Rev: 0.0-1</td>
</tr>
</tbody>
</table>
5.4 Troubleshooting IOD-Detected Errors

Step 1
Read the CAP Error Registers on both PCI bridges (F9E0000880 and FBE0000880). If either register shows an error, match the register contents with the data pattern in Table 5-3 and perform the action indicated.

Table 5-3  CAP Error Register Data Pattern

<table>
<thead>
<tr>
<th>Data Pattern</th>
<th>Most Likely Cause</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>110x x00x x000 0000 0000 0000 000x xxxx</td>
<td>RDSB - Uncorrectable ECC error detected on upper QW of MC bus (D&lt;127:64&gt;)</td>
<td>Go to Step 2</td>
</tr>
<tr>
<td>101x x00x x000 0000 0000 0000 000x xxxx</td>
<td>RDSA - Uncorrectable ECC error detected on lower QW of MC bus (D&lt;63:0&gt;)</td>
<td>Go to Step 2</td>
</tr>
<tr>
<td>111x x00x x000 0000 0000 0000 000x xxxx</td>
<td>RDS detected in both QWs</td>
<td>Go to Step 2</td>
</tr>
<tr>
<td>1001 000x x000 0000 0000 0000 000x xxxx</td>
<td>CRDB - Correctable ECC error detected on upper QW of MC bus (D&lt;127:64&gt;)</td>
<td>Go to Step 2</td>
</tr>
<tr>
<td>1000 100x x000 0000 0000 0000 000x xxxx</td>
<td>CRDA - Correctable ECC error detected on lower QW of MC bus (D&lt;63:0&gt;)</td>
<td>Go to Step 2</td>
</tr>
<tr>
<td>1001 100x x000 0000 0000 0000 000x xxxx</td>
<td>CRD detected in both QWs</td>
<td>Go to Step 2</td>
</tr>
<tr>
<td>100x x10x x000 0000 0000 0000 000x xxxx</td>
<td>NXM - Nonexistent MC bus address</td>
<td>Go to Step 3</td>
</tr>
<tr>
<td>100x x01x x000 0000 0000 0000 000x xxxx</td>
<td>MC_ADR_PERR - MC bus address parity error</td>
<td>Go to Step 4</td>
</tr>
<tr>
<td>100x x00x 1000 0000 0000 0000 000x xxxx</td>
<td>PIO_OVFL - PIO buffer overflow</td>
<td>Go to Step 5</td>
</tr>
<tr>
<td>0000 0000 0000 0000 0000 0000 0001 1xxx</td>
<td>PTE_INV - Page table entry is invalid</td>
<td>Go to Step 6</td>
</tr>
<tr>
<td>0000 0000 0000 0000 0000 0000 0001 x1xx</td>
<td>MAB - Master abort</td>
<td>Go to Step 7</td>
</tr>
<tr>
<td>0000 0000 0000 0000 0000 0000 0001 xx1x</td>
<td>SERR - PCI system error</td>
<td>Go to Step 8</td>
</tr>
<tr>
<td>0000 0000 0000 0000 0000 0000 0001 xxx1</td>
<td>PERR - PCI parity error</td>
<td>Go to Step 9</td>
</tr>
</tbody>
</table>
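
The matching in Table 5-3 can be sketched as a bit test on the CAP Error Register value. The sketch below is an illustration under stated assumptions, not a Digital-supplied tool: the bit positions are read off the data patterns in the table (RDS in bits <30:29>, CRD in bits <28:27>, NXM in bit <26>, MC_ADR_PERR in bit <25>, PIO_OVFL in bit <23>, and the PCI-side errors in bits <3:0>), the function name is hypothetical, and the qualifier bits (bit <31> for system bus errors, bit <4> for PCI errors) are not checked.

```python
from typing import List, Tuple

# (mask, cause, troubleshooting step), in the order the rows appear in Table 5-3.
# Bit positions are inferred from the table's data patterns (an assumption).
_CAP_ERR_CAUSES = [
    (0x60000000, "RDS", 2),          # bits <30:29>: uncorrectable ECC, upper/lower QW
    (0x18000000, "CRD", 2),          # bits <28:27>: correctable ECC, upper/lower QW
    (0x04000000, "NXM", 3),          # bit <26>
    (0x02000000, "MC_ADR_PERR", 4),  # bit <25>
    (0x00800000, "PIO_OVFL", 5),     # bit <23>
    (0x00000008, "PTE_INV", 6),      # bit <3>
    (0x00000004, "MAB", 7),          # bit <2>
    (0x00000002, "SERR", 8),         # bit <1>
    (0x00000001, "PERR", 9),         # bit <0>
]


def cap_error_causes(cap_err: int) -> List[Tuple[str, int]]:
    """List every (cause, step) whose error bit is set in the register value."""
    return [(name, step) for mask, name, step in _CAP_ERR_CAUSES if cap_err & mask]
```

With these assumptions, the value x00000012 logged in the earlier PCI example decodes to SERR (Step 8), and the value x89000000 from Example 5-7 decodes to a CRD (Step 2).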
5.4.1 System Bus ECC Error

Step 2
Read the MC_ERR1 register and match the contents with the data pattern. Perform the action indicated.

Table 5-4 System Bus ECC Error Data Pattern

<table>
<thead>
<tr>
<th>MC_ERR1 Data Pattern</th>
<th>Most Likely Cause</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>for Memory Read</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>1000 0000 0000 xxxx xxxx 10xx 0xxx xxxx</td>
<td>Bad nondirty data from memory (bad memory)</td>
<td>Go to Step 10</td>
</tr>
<tr>
<td>1000 0000 0000 xxxx xxxx 111x 0xxx xxxx</td>
<td>Bad nondirty data from memory (bad memory)</td>
<td>Go to Step 10</td>
</tr>
<tr>
<td>1000 0000 0001 xxxx xxxx 10xx 0xxx xxxx</td>
<td>Bad dirty data from a CPU</td>
<td>Replace CPU(s)</td>
</tr>
<tr>
<td>1000 0000 0001 xxxx xxxx 111x 0xxx xxxx</td>
<td>Bad dirty data from a CPU</td>
<td>Replace CPU(s)</td>
</tr>
<tr>
<td><strong>for Memory or I/O Write</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>1000 0000 000x xxx0 10xx 011x xxxx xxxx</td>
<td>Bad data from MID = 2</td>
<td>Replace CPU0</td>
</tr>
<tr>
<td>1000 0000 000x xxx0 11xx 011x xxxx xxxx</td>
<td>Bad data from MID = 3</td>
<td>Replace CPU1</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 00xx 011x xxxx xxxx</td>
<td>Bad data from MID = 4</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 01xx 011x xxxx xxxx</td>
<td>Bad data from MID = 5</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 10xx 011x xxxx xxxx</td>
<td>Bad data from MID = 6</td>
<td>Replace CPU2 or IOD1</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 11xx 011x xxxx xxxx</td>
<td>Bad data from MID = 7</td>
<td>Replace CPU3 or IOD1</td>
</tr>
<tr>
<td><strong>for Memory Fill Transactions</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>1000 0000 000x xxx1 00xx 110x xxxx xxxx</td>
<td>Bad data from MID = 4</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 01xx 110x xxxx xxxx</td>
<td>Bad data from MID = 5</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 10xx 110x xxxx xxxx</td>
<td>Bad data from MID = 6</td>
<td>Replace IOD1</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 11xx 110x xxxx xxxx</td>
<td>Bad data from MID = 7</td>
<td>Replace IOD1</td>
</tr>
</tbody>
</table>

NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module.
5.4.2 System Bus Nonexistent Address Error

Step 3
Determine which node (if any) should have responded to the command/address identified in MC_ERR1. Perform the action indicated.

Table 5-5 System Bus Nonexistent Address Error Troubleshooting

<table>
<thead>
<tr>
<th>MC_ERR1 Data Pattern</th>
<th>Most Likely Cause</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>1000 0000 0000 xxxx xxxx xxxx 0xxx xxxx</td>
<td>Software generated an MC ADDR &gt; TOP_OF_MEM reg</td>
<td>Fix software</td>
</tr>
<tr>
<td>1000 0000 0000 xxxx xxxx xxxx 1xxx 100x</td>
<td>PCI0 bridge did not respond</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 0000 xxxx xxxx xxxx 1xxx 101x</td>
<td>PCI1 bridge did not respond</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 0000 xxxx xxxx xxxx 1xxx 110x</td>
<td>PCI2 bridge did not respond</td>
<td>Replace IOD1</td>
</tr>
<tr>
<td>1000 0000 0000 xxxx xxxx xxxx 1xxx 111x</td>
<td>PCI3 bridge did not respond</td>
<td>Replace IOD1</td>
</tr>
</tbody>
</table>

NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module.
5.4.3 System Bus Address Parity Error

Step 4

Determine which node put the bad command/address on the system bus identified in MC_ERR1. Perform the action indicated.

Table 5-6  Address Parity Error Troubleshooting

<table>
<thead>
<tr>
<th>MC_ERR1 Data Pattern</th>
<th>Most Likely Cause</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>1000 0000 000x xxx0 10xx xxxx xxxx xxxx</td>
<td>Data sourced by MID = 2</td>
<td>Replace CPU0</td>
</tr>
<tr>
<td>1000 0000 000x xxx0 11xx xxxx xxxx xxxx</td>
<td>Data sourced by MID = 3</td>
<td>Replace CPU1</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 00xx xxxx xxxx xxxx</td>
<td>Data sourced by MID = 4</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 01xx xxxx xxxx xxxx</td>
<td>Data sourced by MID = 5</td>
<td>Replace IOD0</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 10xx xxxx xxxx xxxx</td>
<td>Data sourced by MID = 6</td>
<td>Replace CPU2 or IOD1</td>
</tr>
<tr>
<td>1000 0000 000x xxx1 11xx xxxx xxxx xxxx</td>
<td>Data sourced by MID = 7</td>
<td>Replace CPU3 or IOD1</td>
</tr>
</tbody>
</table>

NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module.
5.4.4 PIO Buffer Overflow Error (PIO_OVFL)

Step 5
Read the value of the CAP_CTRL register bits<19:16> (Actual_PEND_NUM) and calculate Expected_PEND_NUM with the following formula. Compare the two values as indicated in Table 5-7 to determine the most likely cause of the error. When an IOD is implicated in the analysis of the error, replace the one that captured the error in its CAP Error Register.

\[
\text{Expected\_PEND\_NUM} = 12 - \bigl((2 \times (X - 1)) + Y\bigr)
\]

Where:
- \(X\) = Number of PCIs
- \(Y\) = Number of CPUs

Table 5-7 Cause of PIO_OVFL Error

<table>
<thead>
<tr>
<th>Comparison</th>
<th>Most Likely Cause</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>Actual_PEND_NUM = Expected_PEND_NUM</td>
<td>Broken hardware on IOD</td>
<td>Replace IOD</td>
</tr>
<tr>
<td>Actual_PEND_NUM &lt; Expected_PEND_NUM</td>
<td>Broken hardware on IOD</td>
<td>Replace IOD</td>
</tr>
<tr>
<td>Actual_PEND_NUM &gt; Expected_PEND_NUM</td>
<td>PEND_NUM setup incorrect</td>
<td>Fix the software</td>
</tr>
</tbody>
</table>

NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module.
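
The Step 5 arithmetic and the Table 5-7 comparison can be sketched as follows. The function names are hypothetical, and the CAP_CTRL values in the usage note are made-up samples chosen only to exercise each outcome.

```python
def expected_pend_num(num_pcis: int, num_cpus: int) -> int:
    """Expected_PEND_NUM = 12 - ((2 * (X - 1)) + Y), X = PCIs, Y = CPUs."""
    return 12 - ((2 * (num_pcis - 1)) + num_cpus)


def actual_pend_num(cap_ctrl: int) -> int:
    """Actual_PEND_NUM is the four-bit field CAP_CTRL<19:16>."""
    return (cap_ctrl >> 16) & 0xF


def pio_ovfl_cause(cap_ctrl: int, num_pcis: int, num_cpus: int) -> str:
    """Apply the Table 5-7 comparison and return cause and action."""
    if actual_pend_num(cap_ctrl) > expected_pend_num(num_pcis, num_cpus):
        return "PEND_NUM setup incorrect: fix the software"
    # Equal or lower than expected points at the IOD hardware.
    return "Broken hardware on IOD: replace IOD"
```

For a system with two PCIs and two CPUs, the expected value is 12 - ((2 x 1) + 2) = 8. A hypothetical CAP_CTRL value of x00070000 (Actual_PEND_NUM = 7) then points to the IOD, while x000A0000 (Actual_PEND_NUM = 10) points to the software setup.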
5.4.5 Page Table Entry Invalid Error

Step 6
This error is almost always a software problem. However, if the software is known to be good and the hardware is suspected, swap the IOD.

5.4.6 PCI Master Abort

Step 7
Master aborts normally occur when the operating system is sizing the PCI bus. However, if the master abort occurs after the system is booted, read PCI_ERR1 and determine which PCI device should have responded to this PCI address. Replace this device.

5.4.7 PCI System Error

Step 8
This error occurs when a PCI device asserts SERR. Read the error registers in all the PCI devices to determine which device asserted it. The PCI device that set SERR should have information logged in its error registers that indicates the failing device.

5.4.8 PCI Parity Error

Step 9
Read PCI_ERR1 and determine which PCI device normally uses that PCI address space. Replace that device. Also, read the error registers in all the PCI devices to determine which device was driving the PCI bus when the parity error occurred.
5.4.9 Broken Memory

Step 10
Refer to the following sections.

For a Read Data Substitute Error (uncorrectable ECC error)
When a read data substitute (RDS) error occurs, determine which memory module pair caused the error as follows:
1. Run the memory diagnostic to see if it catches the bad memory. If so, replace the memory module that it reports as bad.
2. At the SRM console prompt, enter the `show mem` command.
   `P00>>> show mem`
   This command displays the base address and size of the memory module pair for each slot.
   OR
   Read the configuration packet, found in the error log, to retrieve the base address and size of the memory module pair.
3. Compare this address to the failing address from the MC_ERR1 and MC_ERR0 Registers to determine which memory slot is failing.
4. Replace both memory modules (high and low) for that slot. For an RDS error, there is no way to know which memory module (high or low) is bad.
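
The address comparison in step 3 can be sketched as follows. This is an illustration under assumptions: the failing address is taken to be MC_ERR1 bits <7:0> (address bits <39:32>) combined with MC_ERR0 bits <31:4> (address bits <31:4>, in place), which matches how the error log examples label those fields. The slot table is hypothetical; in practice, take the base addresses and sizes from the `show mem` display or from the configuration packet.

```python
from typing import Dict, Optional, Tuple


def failing_address(mc_err0: int, mc_err1: int) -> int:
    """Combine address bits <39:32> (MC_ERR1) with bits <31:4> (MC_ERR0)."""
    return ((mc_err1 & 0xFF) << 32) | (mc_err0 & 0xFFFFFFF0)


def find_slot(address: int, slots: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the slot whose [base, base + size) range contains the address."""
    for name, (base, size) in slots.items():
        if base <= address < base + size:
            return name
    return None


# Hypothetical configuration: slot name -> (base address, size in bytes).
EXAMPLE_SLOTS = {
    "slot 0": (0x00000000, 0x20000000),  # 512 MB pair
    "slot 1": (0x20000000, 0x10000000),  # 256 MB pair
}
```

With the register values from Example 5-7 (MC_ERR0 = x122D5640, MC_ERR1 = x800E9600), `failing_address` gives x122D5640, which falls in "slot 0" of the hypothetical configuration above.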

For a Corrected Read Data Error (CRD)
When a CRD error occurs, determine which memory module pair caused the error as follows:
1. At the SRM console prompt, enter the `show mem` command. This command displays the base address and size of the memory module pair for each slot.
   `P00>>> show mem`
2. Compare this address to the failing address from the MC_ERR1 and MC_ERR0 Registers to determine which memory slot is failing.
3. When you have isolated the failing memory pair, determine which of the two modules is bad. (You cannot do this if the operating system is Windows NT.) Read the CPU FIL SYNDROME Register. If this register is non-zero, use the ECC syndrome bits in Table 5-8 to determine which module had the single-bit error.

Table 5-8  ECC Syndrome Bits Table

<table>
<thead>
<tr>
<th>Memory Module</th>
<th>MDP Syndrome Values (Hex)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Low-order memory</td>
<td>01 02 04 08 10 20 40 80 CE CB D3 D5 D6 D9 DA DC 23 25 26 29 2C 31 13 19 4F 4A 52 54 57 58 5B 5D A2 A4 A8 B0</td>
</tr>
<tr>
<td>High-order memory</td>
<td>2A 0E 0B 15 16 1A 1C E3 E5 E6 E9 EA EC F1 F4 A7 AB AD B5 8F 8A 92 94 97 98 9B 9D 62 64 67 68 6B 6D 70 75</td>
</tr>
</tbody>
</table>
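
The Table 5-8 lookup can be sketched as a set membership test. The syndrome values below are copied from the table as printed here; the function name is hypothetical, and the caller is assumed to have already extracted the eight-bit syndrome from the CPU FIL SYNDROME Register (this sketch does not define how).

```python
from typing import Optional

# Syndrome values from Table 5-8 (hex), grouped by memory module of the pair.
LOW_ORDER_SYNDROMES = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0xCE,
    0xCB, 0xD3, 0xD5, 0xD6, 0xD9, 0xDA, 0xDC, 0x23, 0x25,
    0x26, 0x29, 0x2C, 0x31, 0x13, 0x19, 0x4F, 0x4A, 0x52,
    0x54, 0x57, 0x58, 0x5B, 0x5D, 0xA2, 0xA4, 0xA8, 0xB0,
}
HIGH_ORDER_SYNDROMES = {
    0x2A, 0x0E, 0x0B, 0x15, 0x16, 0x1A, 0x1C, 0xE3,
    0xE5, 0xE6, 0xE9, 0xEA, 0xEC, 0xF1, 0xF4, 0xA7, 0xAB,
    0xAD, 0xB5, 0x8F, 0x8A, 0x92, 0x94, 0x97, 0x98, 0x9B,
    0x9D, 0x62, 0x64, 0x67, 0x68, 0x6B, 0x6D, 0x70, 0x75,
}


def module_for_syndrome(syndrome: int) -> Optional[str]:
    """Return which module of the pair had the single-bit error, if listed."""
    if syndrome in LOW_ORDER_SYNDROMES:
        return "low-order"
    if syndrome in HIGH_ORDER_SYNDROMES:
        return "high-order"
    return None  # zero or unlisted: the table does not identify a module
```

For example, a syndrome of x08 maps to the low-order module and x2A to the high-order module; a syndrome of x00 returns no result, since the register must be non-zero for the table to apply.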
5.4.10 Command Codes

Table 5-9 shows the codes for transactions on the system bus and how they are affected by the commander in charge of the bus during the transaction. The command is a six-bit field in the command address (bits<5:0>). Bit-to-text translations give six-bit data (although the top two bits may or may not be relevant). Note that address bit<39> defines the command as being either a system space or an I/O command.

Table 5-9 Decoding Commands

<table>
<thead>
<tr>
<th>MC_CMD&lt;5:4&gt;</th>
<th>MC_CMD&lt;3:0&gt;</th>
<th>CMD in Hex, MC_ADR&lt;39&gt;</th>
<th>Description</th>
<th>No B-Cache CPU</th>
<th>B-Cache CPU</th>
<th>IOD</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>x x</td>
<td>0 0 0 0</td>
<td>X 0 1</td>
<td>Mem Idle</td>
<td>Y</td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>0 0</td>
<td>0 0 1 0</td>
<td>0 2 1</td>
<td>Write Pend Ack</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>0 0 1 1</td>
<td>X 3 1</td>
<td>Mem Refresh</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>0 1 0 0</td>
<td>X 4 0</td>
<td>Set Dirty</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x 0</td>
<td>0 1 1 0</td>
<td>0/2 6 0</td>
<td>Write Thru - Mem</td>
<td>Y</td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x 0</td>
<td>0 1 1 0</td>
<td>0/2 6 1</td>
<td>Write Thru - I/O</td>
<td>Y</td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x 1</td>
<td>0 1 1 0</td>
<td>3/1 6 0</td>
<td>Write Back - Mem</td>
<td>Y</td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x 1</td>
<td>0 1 1 0</td>
<td>3/1 6 1</td>
<td>Write Intr - I/O</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>0 0</td>
<td>0 1 1 1</td>
<td>0 7 0</td>
<td>Write Full - Mem</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>1 0</td>
<td>0 1 1 1</td>
<td>2 7 0</td>
<td>Write Part - Mem (B-cache CPU only)</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x 0</td>
<td>0 1 1 1</td>
<td>0/2 7 1</td>
<td>Write Mask - I/O</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x 0</td>
<td>0 1 1 1</td>
<td>0/2 7 0</td>
<td>Write Merge - Mem</td>
<td></td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>1 0 0 0</td>
<td>X 8 0</td>
<td>Read0 - Mem</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>1 0 0 0</td>
<td>X 8 1</td>
<td>Read0 - I/O</td>
<td>Y</td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>1 0 0 1</td>
<td>X 9 0</td>
<td>Read1 - Mem</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>1 0 0 1</td>
<td>X 9 1</td>
<td>Read1 - I/O</td>
<td>Y</td>
<td>Y</td>
<td></td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>1 0 1 0</td>
<td>X A 0</td>
<td>Read Mod0 - Mem</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
<td></td>
</tr>
</tbody>
</table>
Table 5-9 Decoding Commands (continued)

<table>
<thead>
<tr>
<th>MC_CMD&lt;5:4&gt;</th>
<th>MC_CMD&lt;3:0&gt;</th>
<th>CMD in Hex</th>
<th>MC_ADR &lt;39&gt;</th>
<th>Description</th>
<th>No B-Cache CPU</th>
<th>B-Cache CPU</th>
<th>IOD</th>
</tr>
</thead>
<tbody>
<tr>
<td>x x</td>
<td>1 0 1 0</td>
<td>X A 0</td>
<td></td>
<td>Read Mod0 - Mem</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>x x</td>
<td>1 0 1 0</td>
<td>X A 1</td>
<td></td>
<td>Read Peer0 - I/O</td>
<td></td>
<td></td>
<td>Y</td>
</tr>
<tr>
<td>x x</td>
<td>1 0 1 1</td>
<td>X B 0</td>
<td></td>
<td>Read Mod1 - Mem</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>x x</td>
<td>1 0 1 1</td>
<td>X B 1</td>
<td></td>
<td>Read Peer1 - I/O</td>
<td></td>
<td></td>
<td>Y</td>
</tr>
<tr>
<td>1 0</td>
<td>1 1 0 0</td>
<td>2 C 1</td>
<td></td>
<td>FILL0 (due to Read0/Peer0)</td>
<td></td>
<td></td>
<td>Y</td>
</tr>
<tr>
<td>1 0</td>
<td>1 1 0 1</td>
<td>2 D 1</td>
<td></td>
<td>FILL0 (due to Read1/Peer1)</td>
<td></td>
<td></td>
<td>Y</td>
</tr>
<tr>
<td>x x</td>
<td>1 1 1 0</td>
<td>X E 0</td>
<td></td>
<td>Read0 - Mem</td>
<td>Y</td>
<td>Y</td>
<td></td>
</tr>
<tr>
<td>x x</td>
<td>1 1 1 1</td>
<td>X F 0</td>
<td></td>
<td>Read1 - Mem</td>
<td>Y</td>
<td>Y</td>
<td></td>
</tr>
</tbody>
</table>

5.4.11 Node IDs

The node ID is a six-bit field in the command address (bits<38:33>). The high-order three bits are always set, and the last three indicate the node. Bit-to-text translations give six-bit data, although only the last three bits define the node.

Table 5-10 Node IDs

<table>
<thead>
<tr>
<th>Node ID &lt;2:0&gt;</th>
<th>Six Bit (Hex)</th>
<th>Node (4000)</th>
<th>Node (4100)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 0</td>
<td>38</td>
<td></td>
<td></td>
</tr>
<tr>
<td>0 0 1</td>
<td>39</td>
<td>Memory</td>
<td>Memory</td>
</tr>
<tr>
<td>0 1 0</td>
<td>3A</td>
<td>CPU0</td>
<td>CPU0</td>
</tr>
<tr>
<td>0 1 1</td>
<td>3B</td>
<td>CPU1</td>
<td>CPU1</td>
</tr>
<tr>
<td>1 0 0</td>
<td>3C</td>
<td>IOD0</td>
<td>IOD0</td>
</tr>
<tr>
<td>1 0 1</td>
<td>3D</td>
<td>IOD1</td>
<td>IOD1</td>
</tr>
<tr>
<td>1 1 0</td>
<td>3E</td>
<td>IOD2</td>
<td>CPU2</td>
</tr>
<tr>
<td>1 1 1</td>
<td>3F</td>
<td>IOD3</td>
<td>CPU3</td>
</tr>
</table>
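
The translation in Table 5-10 can be sketched in a few lines of Python. This is only an illustration of the table, not a Digital-supplied tool; the function names `decode_node_id` and `node_id_from_address` are hypothetical.

```python
# Node names from Table 5-10, indexed by node ID <2:0>.
# None marks the 1 1 1 entry, which the table leaves blank.
NODES_4000 = ["Memory", "CPU0", "CPU1", "IOD0", "IOD1", "IOD2", "IOD3", None]
NODES_4100 = ["Memory", "CPU0", "CPU1", "IOD0", "IOD1", "CPU2", "CPU3", None]


def node_id_from_address(command_address):
    """Extract the six-bit node ID field, bits <38:33>, of a command address."""
    return (command_address >> 33) & 0x3F


def decode_node_id(six_bit, system="4100"):
    """Translate a six-bit node ID (for example 0x3A) into a node name."""
    if not 0 <= six_bit <= 0x3F:
        raise ValueError("node ID must fit in six bits")
    if six_bit >> 3 != 0b111:
        raise ValueError("the high-order three bits of a node ID are always set")
    table = NODES_4000 if system == "4000" else NODES_4100
    return table[six_bit & 0b111]
```

For example, `decode_node_id(0x3D)` gives `CPU2` on a 4100, while `decode_node_id(0x3D, "4000")` gives `IOD2`.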
5.5 Double Error Halts and Machine Checks While in PAL Mode

Two error cases require special attention. Neither double error halts nor machine checks that occur while the machine is in PAL mode result in error log entries. Nevertheless, information is available that can help determine what error occurred.

5.5.1 PALcode Overview

PALcode, privileged architecture library code, is used to implement a number of functions at the machine level without the use of microcode. This allows operating systems to make common calls to PALcode routines without knowing the hardware specifics of each system the operating system is running on. PALcode routines handle:

- Instructions that require complex sequencing, such as atomic operations
- Instructions that require VAX-style interlocked memory access
- Privileged instructions
- Memory management
- Context swapping
- Interrupt and exception dispatching
- Power-up initialization and booting
- Console functions
- Emulation of instructions with no hardware support
5.5.2 Double Error Halt

A double error halt occurs under the following conditions:

- A machine check occurs.
- PAL completes its tasks and returns control of the system to the operating system.
- A second machine check occurs before the operating system completes its tasks.

The machine returns to the console and displays the following message:

```
halt code = 6
double error halt
PC = 20000004
Your system has halted due to an irrecoverable error. Record the error halt code and PC and contact your Digital Services representative. In addition, type INFO 5 and INFO 8 at the console and record the results.
```

The `info 5` command (Example 5-9) causes the SRM console to read the PAL-built logout area that contains all the data used by the operating system to create the error entry.

The `info 8` command (Example 5-10) causes the SRM console to read the IOD 0 and IOD 1 registers.

5.5.3 Machine Checks While in PAL

If a machine check occurs while the system is running PALcode, PALcode returns to the SRM console, not to the operating system. The SRM console writes the following message:

```
halt code = 7
machine check while in PAL mode
PC = 20000004
Your system has halted due to an irrecoverable error. Record the error halt code and PC and contact your Digital Services representative. In addition, type INFO 3 and INFO 8 at the console and record the results.
```

The `info 3` command (Example 5-8) causes the SRM console to read the “impure area,” which contains the state of the CPU before it entered PAL.

**Example 5-8  INFO 3 Command**

P00>>> info 3

per_cpu impure area 00004400

cns$flag 00000001 : 0000
cns$flag+4 00000000 : 0004
cns$shlt 00000000 : 0008
cns$shlt+4 00000000 : 000c

cns$mchkflag 00000228 : 0210
cns$mchkflag+4 00000000 : 0214

cns$exc_addr 20000004 : 0318
cns$exc_addr+4 00000000 : 031c

cns$pal_base 00000000 : 0320
cns$pal_base+4 00000000 : 0324

cns$mm_stat 0000da10 : 0338
cns$mm_stat+4 00000000 : 033c

cns$va 00080000 : 0340
cns$va+4 00000000 : 0344

cns$icsr 00000000 : 0350
cns$icsr+4 00000000 : 0354

cns$ps 00000000 : 0360
cns$ps+4 00000000 : 0364

cns$itb_asn 00000000 : 0370
cns$itb_asn+4 00000000 : 0374

cns$aster 00000000 : 0378
cns$aster+4 00000000 : 037c

cns$sr 00400000 : 0380
cns$sr+4 00000000 : 0384

cns$ivptbr 00000000 : 0390
cns$ivptbr+4 00000000 : 0394

cns$mcsr 00000000 : 0398
cns$mcsr+4 00000000 : 039c

cns$dc_mode 00000000 : 03a0
cns$dc_mode+4 00000000 : 03a4

cns$maf_mode 00000000 : 03a8
cns$maf_mode+4 00000000 : 03ac

cns$sc_stat 00000000 : 03b0
cns$sc_stat+4 00000000 : 03b4

cns$sc_addr 000047cf : 03c0
cns$sc_addr+4 ffffff00 : 03c4

cns$sc_ctl 0000f000 : 03cc
cns$sc_ctl+4 00000000 : 03d0

cns$ei_stat 00000000 : 03d4
cns$ei_stat+4 00000000 : 03d8

cns$exc_sum 00000000 : 03e0
cns$exc_sum+4 00000000 : 03e4

cns$exc_mask 00000000 : 03e8
cns$exc_mask+4 00000000 : 03ec

cns$intid 00000000 : 03f0
cns$intid+4 00000000 : 03f4

cns$bc_tag_addr ff7fefff : 03f8
cns$bc_tag_addr+4 ffffffff : 03fc

cns$ei_stat+4 04000000 : 0404
cns$fill_syn 000000a7 : 0410
cns$fill_syn+4 00000000 : 0414
cns$ld_lock 0004eef : 0418
cns$ld_lock+4 ffffff00 : 041c

**Example 5-9  INFO 5 Command**

P00>>> info 5

cpu00

per_cpu logout area 00004838
mchk$crd_flag 00000320 : 0000
mchk$crd_flag+4 00000000 : 0004
mchk$crd_offsets 00000118 : 0008
mchk$crd_offsets+4 00001328 : 000c
mchk$crd_mchk_code 00980000 : 0014
mchk$crd_mchk_code+4 00000000 : 0010
mchk$crd_ei_stat eba00003 : 0018
mchk$crd_ei_stat+4 4143040a : 001c
mchk$crd_ei_addr d1200067 : 0020
mchk$crd_ei_addr+4 47f90416 : 0024
mchk$crd_fill_syn eba00003 : 0028
mchk$crd_fill_syn+4 d1200068 : 002c
mchk$crd_isr 7ec38000 : 0030
mchk$crd_isr+4 63ff4000 : 0034
mchk$flag 00000320 : 0000
mchk$flag+4 00000000 : 0004
mchk$isr 00000000 : 0138
mchk$isr+4 00000000 : 013c
mchk$icsr 60000000 : 0140
mchk$icsr+4 000000c1 : 0144
mchk$ic_perr_stat 00000000 : 0148
mchk$ic_perr_stat+4 00000000 : 014c
mchk$dc_perr_stat 00000000 : 0150
mchk$dc_perr_stat+4 00000000 : 0154
mchk$va ff8000a0 : 0158
mchk$va+4 ffffffff : 015c
mchk$mm_stat 00149d0 : 0160
mchk$mm_stat+4 00000000 : 0164
mchk$sc_addr 001904f : 0168
mchk$sc_addr+4 ffffffff : 016c
mchk$sc_stat 00000000 : 0170
mchk$sc_stat+4 00000000 : 0174
mchk$bc_tag_addr ff7fefff : 0178
mchk$bc_tag_addr+4 ffffffff : 017c
mchk$ei_addr 066bc3ef : 0180
mchk$ei_addr+4 ffffffff : 0184
mchk$fill_syn 00000a7 : 0188
mchk$fill_syn+4 00000000 : 018c
mchk$ei_stat 04ffffff : 0190
mchk$ei_stat+4 fffffff0 : 0194
mchk$ld_lock 00005b6f : 0198
mchk$ld_lock+4 ffffffff0 : 019c

IOD: 0 base address: f9e0000000

WHOAMI: 0000003a PCI_REV: 06008221
CAP_CTL: 02490fb1 HAE_MEM: 00000000 HAE_IO: 00000000
INT_CTL: 00000003 INT_REQ: 00800000 INT_MASK0: 00010000
INT_MASK1: 00000000 MC_ERR0: e0000000 MC_ERR1: 800e88fd
CAP_ERR: 84000000 PCI_ERR: 00000000 MDPA_STAT: 00000000
MDPA_SYN: 00000000 MDPB_STAT: 00000000 MDPB_SYN: 00000000

IOD: 1 base address: fbe0000000

WHOAMI: 0000003a PCI_REV: 06000221
CAP_CTL: 02490fb1 HAE_MEM: 00000000 HAE_IO: 00000000
INT_CTL: 00000003 INT_REQ: 00800000 INT_MASK0: 00010000
INT_MASK1: 00000000 MC_ERR0: e0000000 MC_ERR1: 800e88fd
CAP_ERR: 84000000 PCI_ERR: 00000000 MDPA_STAT: 00000000
MDPA_SYN: 00000000 MDPB_STAT: 00000000 MDPB_SYN: 00000000

**Example 5-10  INFO 8 Command**

P00>>> info 8

IOD 0

WHOAMI: 0000003a  PCI_REV: 06008221
CAP_CTL: 02490fb1  HAE_MEM: 00000000  HAE_IO: 00000000
INT_CTL: 00000003  INT_REQ: 00000000  INT_MASK0: 00210000
INT_MASK1: 00000000  MC_ERR0: e0000000  MC_ERR1: 00000000
CAP_ERR: 00000000  PCI_ERR: 00000000  MDPA_STAT: 00000000
MDPA_SYN: 00000000  MDPB_STAT: 00000000  MDPB_SYN: 00000000
INT_TARG: 0000003a  INT_ADR: 00006000  INT_ADR_EXT: 00000000
PERF_MON: 00406ebf  PERF_CONT: 00000000
DIAG_CHK: 10000000  SCRATCH: 21011131
W0_BASE: 00100001  T0_BASE: 00001000
W1_BASE: 00800001  T1_BASE: 00008000
W2_BASE: 80000001  T2_BASE: 00000000
W3_BASE: 00000000  T3_BASE: 0000b800
W_DAC: 00000000  HGBASE: 00000000

IOD 1

WHOAMI: 0000003a  PCI_REV: 06000221
CAP_CTL: 02490fb1  HAE_MEM: 00000000  HAE_IO: 00000000
INT_CTL: 00000003  INT_REQ: 00000000  INT_MASK0: 00000000
INT_MASK1: 00000000  MC_ERR0: e0000000  MC_ERR1: 00000000
CAP_ERR: 00000000  PCI_ERR: 00000000  MDPA_STAT: 00000000
MDPA_SYN: 00000000  MDPB_STAT: 00000000  MDPB_SYN: 00000000
INT_TARG: 0000003a  INT_ADR: 00006000  INT_ADR_EXT: 00000000
PERF_MON: 004e31a6  PERF_CONT: 00000000
DIAG_CHK: 10000000  SCRATCH: 00000000
W0_BASE: 00100001  T0_BASE: 00001000
W1_BASE: 00800001  T1_BASE: 00008000
W2_BASE: 80000001  T2_BASE: 00000000
W3_BASE: 00000000  T3_BASE: 0000a000
W_DAC: 00000000  HGBASE: 00000000
This chapter describes the registers used to hold error information. These registers include:

- External Interface Status Register
- External Interface Address Register
- MC Error Information Register 0
- MC Error Information Register 1
- CAP Error Register
- PCI Error Status Register 1

6.1 External Interface Status Register - EI_STAT

The EI_STAT register is a read-only register that is unlocked and cleared by any PALcode read. A read of this register also unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers, subject to some restrictions. The EI_STAT register is not unlocked or cleared by reset.

Address: FF FFF0 0168
Type: R

```
 63        36  35            34        33           32
+------------+-------------+---------+------------+-------------+
|   All 1s   | SEO_HRD_ERR | FIL_IRD | EI_PAR_ERR | UNC_ECC_ERR |
+------------+-------------+---------+------------+-------------+

  31            30      29           28         27           24 23     0
+-------------+-------+------------+----------+---------------+--------+
| COR_ECC_ERR | EI_ES | BC_TC_PERR | BC_TPERR | CHIP_ID <3:0> | All 1s |
+-------------+-------+------------+----------+---------------+--------+
```

Fill data from B-cache or main memory could have correctable or uncorrectable errors in ECC mode. In parity mode, fill data parity errors are treated as uncorrectable hard errors. System address/command parity errors are always treated as uncorrectable hard errors, irrespective of the mode. The sequence for reading, unlocking, and clearing EI_STAT, EI_ADDR, BC_TAG_ADDR, and FILL_SYN is as follows:

1. Read the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers in any order. This step does not unlock or clear any register.

2. Read the EI_STAT register. This operation unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers. It also unlocks the EI_STAT register subject to conditions given in Table 6-2, which defines the loading and locking rules for external interface registers.

**NOTE:** If the first error is correctable, the registers are loaded but not locked. On the second correctable error, the registers are neither loaded nor locked.

Registers are locked on the first uncorrectable error except the second hard error bit. This bit is set only for an uncorrectable error that follows an uncorrectable error. A correctable error that follows an uncorrectable error is not logged as a second error. B-cache tag parity errors are uncorrectable in this context.

Table 6-1  External Interface Status Register

<table>
<thead>
<tr>
<th>Name</th>
<th>Bits</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>COR_ECC_ERR</td>
<td>&lt;31&gt;</td>
<td>R</td>
<td><strong>Correctable ECC Error.</strong> Indicates that fill data received from outside the CPU contained a correctable ECC error.</td>
</tr>
<tr>
<td>EI_ES</td>
<td>&lt;30&gt;</td>
<td>R</td>
<td><strong>External Interface Error Source.</strong> When set, indicates that the error source is fill data from main memory or a system address/command parity error. When clear, the error source is fill data from the B-cache. This bit is only meaningful when &lt;COR_ECC_ERR&gt;, &lt;UNC_ECC_ERR&gt;, or &lt;EI_PAR_ERR&gt; is set in this register. This bit is not defined for a B-cache tag error (BC_TPERR) or a B-cache tag control parity error (BC_TC_PERR).</td>
</tr>
<tr>
<td>BC_TC_PERR</td>
<td>&lt;29&gt;</td>
<td>R</td>
<td><strong>B-Cache Tag Control Parity Error.</strong> Indicates that a B-cache read transaction encountered bad parity in the tag control RAM.</td>
</tr>
<tr>
<td>BC_TPERR</td>
<td>&lt;28&gt;</td>
<td>R</td>
<td><strong>B-Cache Tag Address Parity Error.</strong> Indicates that a B-cache read transaction encountered bad parity in the tag address RAM.</td>
</tr>
<tr>
<td>CHIP_ID</td>
<td>&lt;27:24&gt;</td>
<td>R</td>
<td><strong>Chip Identification.</strong> Read as “4.” Future revisions of the chip will return new unique values.</td>
</tr>
<tr>
<td></td>
<td>&lt;23:0&gt;</td>
<td></td>
<td>All ones.</td>
</tr>
</tbody>
</table>
Table 6-1  External Interface Status Register (continued)

<table>
<thead>
<tr>
<th>Name</th>
<th>Bits</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>&lt;63:36&gt;</td>
<td></td>
<td>All ones.</td>
</tr>
<tr>
<td>SEO_HRD_ERR</td>
<td>&lt;35&gt;</td>
<td>R</td>
<td><strong>Second External Interface Hard Error.</strong> Indicates that a fill from B-cache or main memory, or a system address/command received by the CPU, has a hard error while one of the hard error bits in the EI_STAT register is already set.</td>
</tr>
<tr>
<td>FIL_IRD</td>
<td>&lt;34&gt;</td>
<td>R</td>
<td><strong>Fill I-Ref D-Ref.</strong> When set, indicates that the error occurred during an I-ref fill. When clear, indicates that the error occurred during a D-ref fill. This bit has meaning only when one of the ECC or parity error bits is set. This bit is not defined for a B-cache tag parity error (BC_TPERR) or a B-cache tag control parity error (BC_TC_PERR).</td>
</tr>
<tr>
<td>EI_PAR_ERR</td>
<td>&lt;33&gt;</td>
<td>R</td>
<td><strong>External Interface Command/Address Parity Error.</strong> Indicates that an address and command received by the CPU has a parity error.</td>
</tr>
<tr>
<td>UNC_ECC_ERR</td>
<td>&lt;32&gt;</td>
<td>R</td>
<td><strong>Uncorrectable ECC Error.</strong> Indicates that fill data received from outside the CPU contained an uncorrectable ECC error. In parity mode, this bit indicates a data parity error.</td>
</tr>
</tbody>
</table>
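
As an aid to reading Table 6-1, the following Python sketch splits an EI_STAT value into its named bits. It assumes the two 32-bit halves are taken from console output such as the `mchk$ei_stat` and `mchk$ei_stat+4` lines in Example 5-9; the function name `decode_ei_stat` and the returned dictionary layout are illustrative only.

```python
# Bit positions from Table 6-1.
EI_STAT_FLAGS = {
    35: "SEO_HRD_ERR",
    34: "FIL_IRD",
    33: "EI_PAR_ERR",
    32: "UNC_ECC_ERR",
    31: "COR_ECC_ERR",
    30: "EI_ES",
    29: "BC_TC_PERR",
    28: "BC_TPERR",
}


def decode_ei_stat(low, high=0):
    """Decode EI_STAT from its low and high 32-bit halves.

    Returns the CHIP_ID field <27:24> and the names of all flag bits that are set.
    """
    value = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    flags = [name for bit, name in EI_STAT_FLAGS.items() if (value >> bit) & 1]
    return {"chip_id": (value >> 24) & 0xF, "flags": flags}
```

With the values shown in Example 5-9 (`04ffffff` and `fffffff0`), the sketch reports a chip ID of 4 and no error bits set. Remember that EI_ES and FIL_IRD are meaningful only when one of the ECC or parity error bits is also set.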
6.1.1 External Interface Address Register - EI_ADDR

The EI_ADDR register contains the physical address associated with errors reported by the EI_STAT register. It is unlocked by a read of the EI_STAT Register. This register is meaningful only when one of the error bits is set.

<table>
<thead>
<tr>
<th>Address</th>
<th>FF FFF0 0148</th>
</tr>
</thead>
<tbody>
<tr>
<td>Access</td>
<td>R</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Bits</th>
<th>Contents</th>
</tr>
</thead>
<tbody>
<tr>
<td>&lt;63:40&gt;</td>
<td>All 1s</td>
</tr>
<tr>
<td>&lt;39:32&gt;</td>
<td>EI_ADDR &lt;39:32&gt;</td>
</tr>
<tr>
<td>&lt;31:4&gt;</td>
<td>EI_ADDR &lt;31:4&gt;</td>
</tr>
<tr>
<td>&lt;3:0&gt;</td>
<td>All 1s</td>
</tr>
</tbody>
</table>

Table 6-2  Loading and Locking Rules for External Interface Registers

<table>
<thead>
<tr>
<th>Correctable Error</th>
<th>Uncorrectable Error</th>
<th>Second Hard Error</th>
<th>Load Register</th>
<th>Lock Register</th>
<th>Action When EI_STAT is Read</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>Not possible</td>
<td>No</td>
<td>No</td>
<td>Clears and unlocks all registers</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>Not possible</td>
<td>Yes</td>
<td>No</td>
<td>Clears and unlocks all registers</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Yes</td>
<td>Yes</td>
<td>Clears and unlocks all registers</td>
</tr>
<tr>
<td>1<sup>1</sup></td>
<td>1</td>
<td>0</td>
<td>Yes</td>
<td>Yes</td>
<td>Clears the correctable error bit; does not unlock. Transitions to “0,1,0” state.</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>No</td>
<td>Already locked</td>
<td>Clears and unlocks all registers</td>
</tr>
<tr>
<td>1<sup>1</sup></td>
<td>1</td>
<td>1</td>
<td>No</td>
<td>Already locked</td>
<td>Clears the correctable error bit; does not unlock. Transitions to “0,1,1” state.</td>
</tr>
</tbody>
</table>

<sup>1</sup> These are special cases. It is possible that when EI_ADDR is read, only the correctable error bit is set and the registers are not locked. By the time EI_STAT is read, an uncorrectable error has been detected and the registers have been loaded again and locked. The value of EI_ADDR read earlier is no longer valid. Therefore, for the “1,1,x” case, when EI_STAT is read, the correctable error bit is cleared and the registers are not unlocked or cleared. Software must reexecute the IPR read sequence. On the second read operation, the error bits are in the “0,1,x” state, all the related IPRs are unlocked, and EI_STAT is cleared.
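
The read sequence and the “1,1,x” special case can be sketched as follows. This is a model of the rules in Table 6-2, not PALcode. `read_ipr` is a hypothetical callable that returns the value of a named register, and the set of bits treated as uncorrectable (UNC_ECC_ERR, EI_PAR_ERR, BC_TPERR, BC_TC_PERR) is an assumption based on the descriptions in Table 6-1.

```python
COR_ECC_ERR = 1 << 31
# Assumption: the "uncorrectable" column covers UNC_ECC_ERR <32>, EI_PAR_ERR <33>,
# BC_TPERR <28>, and BC_TC_PERR <29>.
UNCORRECTABLE_BITS = (1 << 32) | (1 << 33) | (1 << 28) | (1 << 29)


def read_error_iprs(read_ipr, max_attempts=2):
    """Read the external interface error registers in the documented order.

    read_ipr is a hypothetical callable: read_ipr("EI_ADDR") returns that register.
    """
    for _ in range(max_attempts):
        # Step 1: read the address, tag, and syndrome registers (any order).
        # These reads do not unlock or clear anything.
        snapshot = {name: read_ipr(name)
                    for name in ("EI_ADDR", "BC_TAG_ADDR", "FILL_SYN")}
        # Step 2: read EI_STAT last; this is the read that unlocks.
        ei_stat = read_ipr("EI_STAT")
        snapshot["EI_STAT"] = ei_stat
        correctable = bool(ei_stat & COR_ECC_ERR)
        uncorrectable = bool(ei_stat & UNCORRECTABLE_BITS)
        if not (correctable and uncorrectable):
            return snapshot
        # "1,1,x" case: the values from step 1 may be stale, so repeat the sequence.
    raise RuntimeError("EI_STAT still reports the 1,1,x state")
```

The loop runs at most twice by default because, per the footnote, the second pass should find the error bits in the “0,1,x” state.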
6.1.2 MC Error Information Register 0
(MC_ERR0 - Offset = 800)

The low-order MC bus (system bus) address bits are latched into this register when the system bus to PCI bus bridge detects an error event. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this register. When the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are undefined.

Table 6-3 MC Error Information Register 0

<table>
<thead>
<tr>
<th>Name</th>
<th>Bits</th>
<th>Type</th>
<th>Initial State</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDR&lt;31:4&gt;</td>
<td>&lt;31:4&gt;</td>
<td>RO</td>
<td>0</td>
<td>Contains the address of the transaction on the system bus when an error is detected.</td>
</tr>
<tr>
<td>Reserved</td>
<td>&lt;3:0&gt;</td>
<td>RO</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>
6.1.3 MC Error Information Register 1  
(MC_ERR1 - Offset = 840)

The high-order MC bus (system bus) address bits and error symptoms are latched into this register when the system bus to PCI bus bridge detects an error. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this register. When the valid bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are undefined.

Table 6-4  MC Error Information Register 1

<table>
<thead>
<tr>
<th>Name</th>
<th>Bits</th>
<th>Type</th>
<th>Initial State</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>VALID</td>
<td>&lt;31&gt;</td>
<td>RO</td>
<td>0</td>
<td>Logical OR of bits &lt;30:23&gt; in the CAP_ERR Register. Set if MC_ERR0 and MC_ERR1 contain a valid address.</td>
</tr>
<tr>
<td>Reserved</td>
<td>&lt;30:21&gt;</td>
<td>RO</td>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>Dirty</td>
<td>&lt;20&gt;</td>
<td>RO</td>
<td>0</td>
<td>Set if the system bus error was associated with a Read/Dirty transaction. When set, the device ID field &lt;19:14&gt; does not indicate the source of the data.</td>
</tr>
<tr>
<td>Reserved</td>
<td>&lt;19:17&gt;</td>
<td></td>
<td></td>
<td>All ones.</td>
</tr>
<tr>
<td>DEVICE_ID</td>
<td>&lt;16:14&gt;</td>
<td>RO</td>
<td>0</td>
<td>Slot number of bus master at the time of the error.</td>
</tr>
<tr>
<td>MC_CMD&lt;5:0&gt;</td>
<td>&lt;13:8&gt;</td>
<td>RO</td>
<td>0</td>
<td>Active command at the time the error was detected.</td>
</tr>
<tr>
<td>ADDR&lt;39:32&gt;</td>
<td>&lt;7:0&gt;</td>
<td>RO</td>
<td>0</td>
<td>Address bits &lt;39:32&gt; of the transaction on the system bus when an error is detected.</td>
</tr>
</tbody>
</table>
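
The field layout of MC_ERR0 and MC_ERR1 can be applied to console output with a short Python sketch. The function name `decode_mc_err` and the dictionary keys are illustrative; the input values in the usage note are the MC_ERR0 and MC_ERR1 values shown for IOD 0 in Example 5-9.

```python
def decode_mc_err(mc_err0, mc_err1):
    """Split MC_ERR0 and MC_ERR1 into the fields of Tables 6-3 and 6-4."""
    return {
        "valid": bool((mc_err1 >> 31) & 1),       # VALID <31>
        "dirty": bool((mc_err1 >> 20) & 1),       # Dirty <20>
        "device_id": (mc_err1 >> 14) & 0x7,       # DEVICE_ID <16:14>
        "mc_cmd": (mc_err1 >> 8) & 0x3F,          # MC_CMD<5:0> in <13:8>
        # ADDR<39:32> from MC_ERR1 <7:0>, ADDR<31:4> from MC_ERR0 <31:4>.
        "address": ((mc_err1 & 0xFF) << 32) | (mc_err0 & 0xFFFFFFF0),
    }
```

For `MC_ERR0 = e0000000` and `MC_ERR1 = 800e88fd`, the sketch yields a valid entry with device ID 2 (six-bit node ID 3A, CPU1 in Table 5-10), command 08, and address `fde0000000`. When `valid` is false, the other fields are undefined and should be ignored.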
6.1.4 CAP Error Register  
(CAP_ERR - Offset = 880)

CAP_ERR is used to log information pertaining to an error detected by the CAP or MDP ASIC. If the error is a hard error, the register is locked. All bits, except the LOST_MC_ERR bit, are locked on hard errors. CAP_ERR remains locked until software writes to CAP_ERR to clear each individual error bit.
Table 6-5  CAP Error Register

<table>
<thead>
<tr>
<th>Name</th>
<th>Bits</th>
<th>Type</th>
<th>Initial State</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>MC_ERR_VALID</td>
<td>&lt;31&gt;</td>
<td>RO</td>
<td>0</td>
<td>Logical OR of bits &lt;30:23&gt; in this register. When set, MC_ERR0 and MC_ERR1 are latched.</td>
</tr>
<tr>
<td>RDSB</td>
<td>&lt;30&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Uncorrectable ECC error detected by MDPB. Clear state in MDPB before clearing this bit.</td>
</tr>
<tr>
<td>RDSA</td>
<td>&lt;29&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Uncorrectable ECC error detected by MDPA. Clear state in MDPA before clearing this bit.</td>
</tr>
<tr>
<td>CRDB</td>
<td>&lt;28&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Correctable ECC error detected by MDPB. Clear state in MDPB_STAT before clearing this bit.</td>
</tr>
<tr>
<td>CRDA</td>
<td>&lt;27&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Correctable ECC error detected by MDPA. Clear state in MDPA_STAT before clearing this bit.</td>
</tr>
<tr>
<td>NXM</td>
<td>&lt;26&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Nonexistent memory (NXM) status for a system bus master transaction: a read with address bit &lt;39&gt; set that was not pended, or a transaction whose target is above the top of memory register. On reads, the CPU also gets a fill error.</td>
</tr>
<tr>
<td>MC_ADR_PERR</td>
<td>&lt;25&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Set when a system bus command/address parity error is detected.</td>
</tr>
<tr>
<td>LOST_MC_ERR</td>
<td>&lt;24&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Set when an error is detected but not logged because the associated symptom fields and registers are locked with the state of an earlier error.</td>
</tr>
<tr>
<td>PIO_OVFL</td>
<td>&lt;23&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Set when a transaction that targets this system bus to PCI bus bridge is not serviced because the buffers are full. This is a symptom of setting the PEND_NUM field in CAP_CNTL to an incorrect value.</td>
</tr>
<tr>
<td>Reserved</td>
<td>&lt;22:5&gt;</td>
<td>RO</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>PCI_ERR_VALID</td>
<td>&lt;4&gt;</td>
<td>RO</td>
<td>0</td>
<td>Logical OR of bits &lt;3:0&gt; of this register. When set, the PCI error address register is locked.</td>
</tr>
<tr>
<td>PTE_INV</td>
<td>&lt;3&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>Invalid page table entry on scatter/gather access.</td>
</tr>
<tr>
<td>MAB</td>
<td>&lt;2&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>PCI master state machine detected PCI Target Abort (likely cause: NXM) (except Special Cycle). On reads fill error is also returned.</td>
</tr>
<tr>
<td>SERR</td>
<td>&lt;1&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>PCI target state machine observed SERR#. CAP asserts SERR when it is master and detects target abort.</td>
</tr>
<tr>
<td>PERR</td>
<td>&lt;0&gt;</td>
<td>RW1C</td>
<td>0</td>
<td>PCI master state machine observed PERR#.</td>
</tr>
</tbody>
</table>
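
A CAP_ERR value from `info 5` or `info 8` output can be translated into symptom names with a sketch like the one below. The bit assignments come from Table 6-5; the names `decode_cap_err` and `rw1c_bits_set` are hypothetical helpers, not console commands.

```python
# Bit positions from Table 6-5.
CAP_ERR_BITS = {
    31: "MC_ERR_VALID", 30: "RDSB", 29: "RDSA", 28: "CRDB", 27: "CRDA",
    26: "NXM", 25: "MC_ADR_PERR", 24: "LOST_MC_ERR", 23: "PIO_OVFL",
    4: "PCI_ERR_VALID", 3: "PTE_INV", 2: "MAB", 1: "SERR", 0: "PERR",
}

# The write-one-to-clear (RW1C) symptom bits are <30:23> and <3:0>.
RW1C_MASK = 0x7F80000F


def decode_cap_err(value):
    """Return the names of all CAP_ERR bits that are set, high bit first."""
    return [name for bit, name in CAP_ERR_BITS.items() if (value >> bit) & 1]


def rw1c_bits_set(value):
    """Return the set RW1C bits, that is, the value a clearing write would carry."""
    return value & RW1C_MASK
```

For the value `84000000` shown in Example 5-9, `decode_cap_err` returns `MC_ERR_VALID` and `NXM`. Note that for RDSA, RDSB, CRDA, and CRDB the table calls for clearing state in the MDP first, which this sketch does not model.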
6.1.5 PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040)

PCI_ERR1 is used by the system bus to PCI bus bridge to log bus address <31:0> pertaining to an error condition logged in CAP_ERR. This register always captures PCI address <31:0>, even for a PCI DAC cycle. When the PCI_ERR_VALID bit in CAP_ERR is clear, the contents are undefined.

Table 6-6 PCI Error Status Register 1

<table>
<thead>
<tr>
<th>Name</th>
<th>Bits</th>
<th>Type</th>
<th>Initial State</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDR&lt;31:0&gt;</td>
<td>&lt;31:0&gt;</td>
<td>RO</td>
<td>0</td>
<td>Contains address bits &lt;31:0&gt; of the transaction on the PCI bus when an error is detected.</td>
</tr>
</tbody>
</table>
Chapter 7

Removal and Replacement

This chapter describes removal and replacement procedures for field-replaceable units (FRUs).

7.1 System Safety

Observe the safety guidelines in this section to prevent personal injury.

CAUTION: Wear an antistatic wrist strap whenever you work on a system. The AlphaServer cabinet system has a wrist strap connected to the frame at the front and rear. The pedestal system does not have an attached strap, so you will have to take one to the site.

WARNING: When the system interlocks are disabled and the system is still powered on, voltages are low in the system drawer, but current is high. Observe the following guidelines to prevent personal injury.

1. Remove any jewelry that may conduct electricity before working on the system.
2. Do not insert your hands between the fan and the power supply.
3. If you need to access the system card cage, power down the system and wait 2 minutes to allow components in that area to cool.
7.2 FRU List

Figure 7-1 shows the locations of FRUs in the system drawer, and Table 7-1 lists the part numbers of all field-replaceable units.

**Figure 7-1  System Drawer FRU Locations**

- Top Cover Rear
- Memory Modules
- CPU Modules
- Fan Tray
- Optional and N+1 Power Supplies
- Power Supply
- Top Cover Front
- PCI/EISA Options
- Power Cable
- OCP, Floppy, and CD-ROM

Table 7-1  Field-Replaceable Unit Part Numbers

<table>
<thead>
<tr>
<th>CPU Modules</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>B3001-CA</td>
<td>300 MHz CPU, uncached</td>
</tr>
<tr>
<td>B3002-AB</td>
<td>300 MHz CPU, 2 Mbyte cache</td>
</tr>
<tr>
<td>B3004-BA</td>
<td>300 MHz CPU, 2 Mbyte cache</td>
</tr>
<tr>
<td>B3004-AA</td>
<td>400 MHz, 4 Mbyte cache</td>
</tr>
<tr>
<td>B3004-DA</td>
<td>466 MHz, 4 Mbyte cache</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Memory Modules</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>B3020-CA</td>
<td>64 Mbyte synch</td>
</tr>
<tr>
<td>B3030-EA</td>
<td>256 Mbyte asynch (EDO)</td>
</tr>
<tr>
<td>B3030-FA</td>
<td>512 Mbyte asynch (EDO)</td>
</tr>
<tr>
<td>B3030-GA</td>
<td>2 Gbyte asynch (EDO)</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Required System Drawer Modules and Display</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>54-23803-01</td>
<td>System motherboard (4100)</td>
</tr>
<tr>
<td>54-23803-02</td>
<td>System motherboard (early 4000)</td>
</tr>
<tr>
<td>54-23805-01</td>
<td>System motherboard (4000)</td>
</tr>
<tr>
<td>B3040-AA</td>
<td>System bus to PCI bus bridge module (both systems)</td>
</tr>
<tr>
<td>B3040-AB</td>
<td>System bus to PCI bus bridge module (later 4000 only)</td>
</tr>
<tr>
<td>54-24117-01</td>
<td>Power control module</td>
</tr>
<tr>
<td>B3050-AA</td>
<td>PCI motherboard (both systems)</td>
</tr>
<tr>
<td>B3051-AA</td>
<td>PCI motherboard (later 4000 only)</td>
</tr>
<tr>
<td>54-24364-01</td>
<td>OCP logic module</td>
</tr>
<tr>
<td>54-24366-01</td>
<td>OCP switch module</td>
</tr>
<tr>
<td>54-24674-01</td>
<td>Server control module</td>
</tr>
<tr>
<td>54-24691-01</td>
<td>Fan fail detect module (cabinet only)</td>
</tr>
<tr>
<td>30-43049-01</td>
<td>OCP display</td>
</tr>
</tbody>
</table>

Table 7-1  Field-Replaceable Unit Part Numbers (continued)

<table>
<thead>
<tr>
<th>Fans</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>12-23609-21</td>
<td>4.5-inch fan</td>
</tr>
<tr>
<td>12-24701-34</td>
<td>CPU fan</td>
</tr>
</tbody>
</table>

#### Power System Components

<table>
<thead>
<tr>
<th>Part Number</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>30-44712-01</td>
<td>Power supply (H7291-AA)</td>
</tr>
<tr>
<td>30-45353-01</td>
<td>Techniq AC Box (NA/Japan, H9A10-EB cabinet)</td>
</tr>
<tr>
<td>30-45353-02</td>
<td>Techniq AC Box (Europe/AP, H9A10-EC cabinet)</td>
</tr>
<tr>
<td>30-46788-01</td>
<td>Internal power source 40W/12V fan tray power (cabinet)</td>
</tr>
<tr>
<td>H7600-AA</td>
<td>Power controller (NA/Japan, H9A10-EL cabinet)</td>
</tr>
<tr>
<td>H7600-DB</td>
<td>Power controller (Europe/AP, H9A10-EM cabinet)</td>
</tr>
<tr>
<td>12-23501-01</td>
<td>NEMA power strip (N.A./Japan, pedestal)</td>
</tr>
<tr>
<td>12-45334-02</td>
<td>IEC power strip (Europe/AP, pedestal, and all cabinet systems)</td>
</tr>
</tbody>
</table>

#### Internal Power Cords

<table>
<thead>
<tr>
<th>Part Number</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>17-04285-01</td>
<td>.5 meter IEC to IEC</td>
</tr>
<tr>
<td>17-00606-02</td>
<td>6 foot NEMA to IEC (N.A./Japan, pedestal)</td>
</tr>
<tr>
<td>17-04285-02</td>
<td>2 meter IEC to IEC (Europe/AP, pedestal, and all cabinet systems.)</td>
</tr>
<tr>
<td>17-04285-03</td>
<td>IEC to IEC StorageWorks shelf</td>
</tr>
</tbody>
</table>

#### Fan Tray Cables (Cabinet Only)

<table>
<thead>
<tr>
<th>Part Number</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>17-04324-01</td>
<td>Elec fan power harness</td>
</tr>
<tr>
<td>17-04325-01</td>
<td>12V power for SCM</td>
</tr>
<tr>
<td>17-04338-01</td>
<td>Power ground cable</td>
</tr>
<tr>
<td>17-04339-01</td>
<td>AC cable power</td>
</tr>
</tbody>
</table>

Table 7-1  Field-Replaceable Unit Part Numbers (continued)

<table>
<thead>
<tr>
<th>Server Control Module Power (Pedestal Only)</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>30-46485-01</td>
<td>110V North America</td>
</tr>
<tr>
<td>30-46485-02</td>
<td>220V Europe</td>
</tr>
<tr>
<td>30-46485-03</td>
<td>Australia/N.Z.</td>
</tr>
<tr>
<td>30-46485-04</td>
<td>220V U.K.</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>System Drawer Cables and Jumpers</th>
<th>From</th>
<th>To</th>
</tr>
</thead>
<tbody>
<tr>
<td>17-04196-01 Server control module signal cable (60 pin)</td>
<td>Remote I/O signal conn on PCI mbrd</td>
<td>SCM signal conn</td>
</tr>
<tr>
<td>17-04199-01 Current share cable</td>
<td>Current share conn on PS0</td>
<td>Current share conn on PS1 and PS2</td>
</tr>
<tr>
<td>17-04200-01 Floppy signal cable (36 pin)</td>
<td>Floppy conn on PCI mbrd</td>
<td>Floppy</td>
</tr>
<tr>
<td>17-04201-01 OCP signal</td>
<td>OCP conn on PCI mbrd</td>
<td>OCP signal (system drawer only)</td>
</tr>
<tr>
<td>17-04201-02 OCP signal jumper</td>
<td>OCP</td>
<td>OCP</td>
</tr>
<tr>
<td>17-04217-01 Power harness (4100 &amp; early 4000)</td>
<td>Power supply(s)</td>
<td>7 conn. sys mbrd, sys fans 0, 1, 5V conn on PCI mbrd, CD-ROM drv pwr, Floppy pwr, 1 OCP DC enable pwr conn or pwr conn on ped tray pwr drive cable (17-04293-01)</td>
</tr>
<tr>
<td>Power supply(s)</td>
<td>3 conns. sys mbrd sys fans 0, 1 5V conn on PCI/EISA mbrd 5V &amp; 3V conn on PCI (right) mbrd CD-ROM drv pwr Floppy pwr 1 OCP DC enable pwr conn or pwr conn on ped tray pwr drive cable (17-04293-01)</td>
<td></td>
</tr>
<tr>
<td>Power harness (later 4000 only)</td>
<td>CD-ROM sig conn</td>
<td></td>
</tr>
<tr>
<td>CD-ROM conn on PCI mbrd</td>
<td>Other OCP DC enable pwr conn or pwr conn on ped tray pwr drive cable (17-04293-01)</td>
<td></td>
</tr>
<tr>
<td>Interlock switch assy</td>
<td>Other OCP DC enable pwr conn or pwr conn on ped tray pwr drive cable (17-04293-01)</td>
<td></td>
</tr>
<tr>
<td>Interlock switch assy</td>
<td>12 V DC enable conn on SCM</td>
<td></td>
</tr>
<tr>
<td>Interlock conn on PCI mbrd</td>
<td>SCM</td>
<td></td>
</tr>
<tr>
<td>SCM 34-position jumper</td>
<td>SCM</td>
<td></td>
</tr>
<tr>
<td>SCM 12V interlock jumper</td>
<td>Sys fan 2 and SCM internal 12V conn</td>
<td></td>
</tr>
<tr>
<td>SCM 12V power jumper (4100 &amp; early 4000 only)</td>
<td>16 pos conn on SCM</td>
<td></td>
</tr>
<tr>
<td>SCM 16-position jumper</td>
<td>SCM sig conn on PCI mbrd</td>
<td></td>
</tr>
<tr>
<td>Pedestal Cables</td>
<td>From</td>
<td>To</td>
</tr>
<tr>
<td>17-04293-01</td>
<td>Elec harness power cable +5/+12</td>
<td>Power harness (17-04217-01)</td>
</tr>
<tr>
<td>17-04302-01</td>
<td>OCP signal cable</td>
<td>OCP sig conn on PCI mbrd</td>
</tr>
<tr>
<td>17-04305-01</td>
<td>Harness power cable +5/+12</td>
<td>Power conn on ped tray bulkhd (tray side)</td>
</tr>
<tr>
<td>17-04306-01</td>
<td>SCSI signal cable (narrow)</td>
<td>SCSI sig conn on ped tray bulkhd (tray side)</td>
</tr>
<tr>
<td>17-04380-01</td>
<td>OCP signal cable</td>
<td>OCP sig conn on ped tray bulkhd</td>
</tr>
</tbody>
</table>
7.3 4100 Power System FRUs

Figure 7-2 Location of 4100 Power System FRUs

Notes: Only power cables are shown. Systems have only one OCP located in either the cabinet tray or the pedestal tray. Thicker lines indicate cables present in both cabinets. Thinner lines are cables in the pedestal only.

<table>
<thead>
<tr>
<th>Part Number</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 30-45353-01, 30-45353-02</td>
<td>AC input box; only in cabinet systems: The 01 variant is for N. America/Japan and has a NEMA L6-30P power cord; the 02 variant is for Europe and AP and has an IEC 309 power cord.</td>
</tr>
<tr>
<td>2 12-23501-01, 12-45334-02</td>
<td>AC power strip: The 12-23501-01 is used on pedestals in N. America/Japan only and has six NEMA outlets and a 15 ft. cord to the wall outlet; the 12-45334-02 is used on pedestals in Eur./AP and on cabinet systems worldwide and has six IEC320 outlets. In pedestal systems, cords match country-specific wall outlets.</td>
</tr>
<tr>
<td>2a 17-04285-01</td>
<td>Power cord from AC input box to power strip: a 0.5 m IEC320 to IEC320 cord used in cabinet systems only. In pedestal systems, cords match country-specific wall outlets.</td>
</tr>
<tr>
<td>1, 2, 2a H7600-AA</td>
<td>Power controller used in place of 30-45353-01, 12-45334-02, and 17-04285-02 in the H9A10-EL cabinet in N. America/Japan</td>
</tr>
<tr>
<td>1, 2, 2a H7600-DB</td>
<td>Power controller used in place of 30-45353-02, 12-45334-02, and 17-04285-02 in the H9A10-EM cabinet in Europe/AP</td>
</tr>
<tr>
<td>3 17-00606-02, 17-04285-02</td>
<td>Power cord from power strip to power supply: The 17-00606-02 is a 2 m NEMA to IEC320 AC jumper used with the 12-23501-01 power strip in N. Amer./Japan pedestals. The 17-04285-02 is a 2 m IEC320 to IEC320 AC jumper used with the 12-45334-02 power strip, which is used on pedestals in Eur./AP and on cabinet systems worldwide and has six IEC320 outlets. In pedestal systems, cords match country-specific wall outlets.</td>
</tr>
<tr>
<td>4 30-44712-01</td>
<td>Power supply; 92 to 264 VAC input; one to three in a system drawer.</td>
</tr>
<tr>
<td>5 17-04199-01</td>
<td>Cable connecting power supplies</td>
</tr>
<tr>
<td>6 17-04217-01</td>
<td>Power distribution harness (4100 and early 4000)</td>
</tr>
<tr>
<td>7 17-04201-01</td>
<td>Cable from OCP to PCI motherboard (cabinet system)</td>
</tr>
<tr>
<td>8 70-32016-01</td>
<td>Interlock switches and cable to OCP (4100 and early 4000)</td>
</tr>
<tr>
<td>9 17-04351-01</td>
<td>Power from power harness between harness and Fan 2 to SCM</td>
</tr>
<tr>
<td>10 17-04293-01</td>
<td>Cable from power harness to interconnect cable and pedestal tray connector (pedestal system)</td>
</tr>
<tr>
<td>11 17-04302-01</td>
<td>Cable from pedestal tray connector to PCI motherboard (pedestal system)</td>
</tr>
<tr>
<td>12 17-04201-01</td>
<td>Cable from pedestal tray connector to OCP (pedestal system)</td>
</tr>
<tr>
<td>13 17-04305-01</td>
<td>Cable from pedestal tray connector to OCP and SCSI devices (pedestal system)</td>
</tr>
<tr>
<td>14 17-04339-01</td>
<td>Power cord from power strip to cabinet fan tray (cabinet only)</td>
</tr>
</tbody>
</table>
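
For quick lookup during a service call, the single-part callouts in the table above can be held in a small mapping. The Python sketch below is illustrative only: the names `FRUS_4100` and `callouts_for_part` are assumptions, the descriptions are abbreviated, and callouts 1 through 3 (which cover several part variants) are left out.

```python
# Callout -> (part number, short description) for the 4100 power system,
# transcribed from the table above. The names used here are hypothetical.
FRUS_4100 = {
    4: ("30-44712-01", "Power supply"),
    5: ("17-04199-01", "Cable connecting power supplies"),
    6: ("17-04217-01", "Power distribution harness (4100 and early 4000)"),
    7: ("17-04201-01", "Cable from OCP to PCI motherboard (cabinet system)"),
    8: ("70-32016-01", "Interlock switches and cable to OCP"),
    9: ("17-04351-01", "Power cable to SCM"),
    10: ("17-04293-01", "Power harness to pedestal tray connector (pedestal)"),
    11: ("17-04302-01", "Pedestal tray connector to PCI motherboard (pedestal)"),
    12: ("17-04201-01", "Pedestal tray connector to OCP (pedestal)"),
    13: ("17-04305-01", "Pedestal tray connector to OCP and SCSI devices"),
    14: ("17-04339-01", "Power strip to cabinet fan tray (cabinet only)"),
}


def callouts_for_part(part_number):
    """Return every figure callout that lists the given part number."""
    return sorted(c for c, (part, _) in FRUS_4100.items() if part == part_number)
```

A lookup such as `callouts_for_part("17-04201-01")` shows that the same cable part number is listed under two callouts, one for cabinet systems and one for pedestal systems.
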
7.4 4000 Power System FRUs

Figure 7-3 Location of 4000 Power System FRUs

Note: Only power cables are shown. Systems have only one OCP located in either the cabinet tray or the pedestal tray. Thicker lines indicate cables present in both cabinets. Thinner lines are cables in the pedestal only. Callout 9 is intentionally missing.

<table>
<thead>
<tr>
<th>Part Number</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 30-45353-01, 30-45353-02</td>
<td>AC input box; only in cabinet systems: The 01 variant is for N. Amer./Japan and has a NEMA L6-30P power cord; the 02 variant is for Europe and AP and has an IEC 309 power cord.</td>
</tr>
<tr>
<td>2 12-23501-01, 12-45334-02</td>
<td>AC power strip: The 12-23501-01 is used on pedestals in N. Amer./Japan only and has six NEMA outlets and a 15 ft. cord to the wall outlet; the 12-45334-02 is used on pedestals in Eur./AP and on cabinet systems worldwide and has six IEC320 outlets. In pedestal systems, cords match country-specific wall outlets.</td>
</tr>
<tr>
<td>2a 17-04285-01</td>
<td>Power cord from AC input box to power strip: a 0.5 m IEC320 to IEC320 cord used in cabinet systems only. In pedestal systems, cords match country-specific wall outlets.</td>
</tr>
<tr>
<td>1, 2, 2a H7600-AA</td>
<td>Power controller used in place of 30-45353-01, 12-45334-02, and 17-04285-02 in the H9A10-EL cabinet in N. America/Japan</td>
</tr>
<tr>
<td>1, 2, 2a H7600-DB</td>
<td>Power controller used in place of 30-45353-02, 12-45334-02, and 17-04285-02 in the H9A10-EM cabinet in Europe/AP</td>
</tr>
<tr>
<td>3 17-00606-02, 17-04285-02</td>
<td>Power cord from power strip to power supply: The 17-00606-02 is a 2 m NEMA to IEC320 AC jumper used with the 12-23501-01 power strip in N. Amer./Japan pedestals. The 17-04285-02 is a 2 m IEC320 to IEC320 AC jumper used with the 12-45334-02 power strip, which is used on pedestals in Eur./AP and on cabinet systems worldwide and has six IEC320 outlets. In pedestal systems, cords match country-specific wall outlets.</td>
</tr>
<tr>
<td>4 30-44712-01</td>
<td>Power supply; 92 to 264 VAC input; one to three in a system drawer.</td>
</tr>
<tr>
<td>5 17-04199-01</td>
<td>Cable connecting power supplies</td>
</tr>
<tr>
<td>6 17-04385-01</td>
<td>Power distribution harness (later 4000 only)</td>
</tr>
<tr>
<td>7 17-04201-01</td>
<td>Cable from OCP to PCI motherboard (cabinet system)</td>
</tr>
<tr>
<td>8 70-33002-01</td>
<td>Interlock switches and cable to OCP (later 4000 only)</td>
</tr>
<tr>
<td>10 17-04293-01</td>
<td>Cable from power harness to interconnect cable and pedestal tray connector (pedestal system)</td>
</tr>
<tr>
<td>11 17-04302-01</td>
<td>Cable from pedestal tray connector to PCI motherboard (pedestal system)</td>
</tr>
<tr>
<td>12 17-04201-01</td>
<td>Cable from pedestal tray connector to OCP (pedestal system)</td>
</tr>
<tr>
<td>13 17-04305-01</td>
<td>Cable from pedestal tray connector to OCP and SCSI devices (pedestal system)</td>
</tr>
<tr>
<td>14 17-04339-01</td>
<td>Power cord from power strip to cabinet fan tray (cabinet only)</td>
</tr>
</tbody>
</table>
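
The 4100 and 4000 tables differ mainly in the power distribution harness (callout 6) and the interlock switch assembly (callout 8). The sketch below collects those differences so that the correct part can be picked by system variant. It is a minimal illustration; the names `VARIANT_PARTS` and `part_for` and the variant labels are assumptions, while the part numbers come from the two tables.

```python
# Parts that differ by system variant, taken from the tables in 7.3 and 7.4.
# Dictionary keys and function names are hypothetical labels for this sketch.
VARIANT_PARTS = {
    "4100": {"harness": "17-04217-01", "interlock": "70-32016-01"},
    "early 4000": {"harness": "17-04217-01", "interlock": "70-32016-01"},
    "later 4000": {"harness": "17-04385-01", "interlock": "70-33002-01"},
}


def part_for(variant, item):
    """Return the part number of 'harness' or 'interlock' for a variant."""
    try:
        return VARIANT_PARTS[variant][item]
    except KeyError:
        raise ValueError(f"no entry for {item!r} on {variant!r}") from None
```
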
7.5 System Drawer Exposure (Cabinet)

There are two cabinet types for these systems: the H9A10-EB/-EC cabinet and the H9A10-EL/-EM cabinet. The procedure for exposing the system drawer depends on the cabinet type.

7.5.1 Cabinet Drawer Exposure (H9A10-EB & -EC)

Open both doors, disconnect cables that obstruct movement of the drawer, remove the shipping brackets, and slide the drawer out from the cabinet.

Figure 7-4 Exposing System Drawer (H9A10-EB & -EC Cabinet)
Exposing the System Bus or PCI Bus Card Cages
1. Open the front and rear doors of the cabinet.
2. At the front of the cabinet, unplug the drawer’s power supplies.
3. At the rear, remove the two Phillips screws holding the shipping bracket on the right rail so that the drawer can be pulled out.
4. Using a flathead screwdriver, disengage the lock mechanism at the lower left hand corner of the drawer.
5. Pull the drawer out part way and release the lock mechanism by removing the screwdriver. If you wish to remove the whole drawer for some reason, leave the screwdriver in place.
6. Once the lock mechanism has been released, slide the drawer out until it locks.
7. Remove the system bus card cage cover. Unscrew the two Phillips head screws holding the cover in place and slide it off the drawer.
8. Remove the PCI bus card cage cover. Unscrew the three Phillips head screws holding the cover to the side of the drawer and slide it off the drawer.

Exposing the Power System or System Fans
1. Open the front and rear doors of the cabinet.
2. At the rear of the cabinet, remove any cables from PCI options that may interfere with pulling the drawer forward.
3. At the front, remove the shipping brackets on the right and left rails that hold the drawer.
4. Pull out the drawer until it locks.
5. Remove the power section cover. Unscrew the two Phillips head screws and slide the cover off the drawer.
7.5.2 Cabinet Drawer Exposure (H9A10-EL & -EM)

In the H9A10-EL and -EM Cabinet, the system drawer sits on a tray that slides out of the front of the cabinet. A stabilizer bar must be pulled out from the bottom to prevent the cabinet from tipping over.

Figure 7-5 Exposing System Drawer (H9A10-EL & -EM Cabinet)
CAUTION: The cabinet could tip over if a system drawer is pulled out while the stabilizer bar is not fully extended with its leveler foot on the floor.

Exposing any section of the system drawer in an H9A10-EL or -EM Cabinet.
1. Open the front door of the cabinet.
2. Pull the stabilizer bar at the bottom of the cabinet out until it stops.
3. Extend the leveler foot at the end of the stabilizer bar to the floor.
4. Unplug the drawer’s power supplies.
5. Remove the Phillips screws holding the shipping bracket to the rails so that the drawer can be pulled out.
6. Pull the drawer all the way out until it locks.
7. To access the system bus card cage cover, unscrew the two Phillips head screws holding the cover in place and slide it off.
8. To access the PCI/EISA bus card cage, unscrew the three Phillips head screws holding the cover to the right side of the drawer and slide it off.
9. To access the PCI bus card cage, unscrew the three Phillips head screws holding the cover to the left side of the drawer and slide it off.
10. To access the power or fan section, unscrew the two Phillips head screws holding the cover in place and slide it off.
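
Steps 1 through 5 above are preconditions for pulling the drawer out in step 6, and the stabilizer bar and leveler foot are the ones the CAUTION concerns. A checklist of this kind could be sketched as follows; the names `PRECONDITIONS` and `missing_before_pull` are hypothetical and the wording of each entry is paraphrased from the steps.

```python
# Preconditions for pulling the drawer out of an H9A10-EL or -EM cabinet,
# paraphrased from steps 1-5 above. Names are illustrative assumptions.
PRECONDITIONS = (
    "front door open",
    "stabilizer bar pulled out until it stops",
    "leveler foot extended to the floor",
    "power supplies unplugged",
    "shipping bracket screws removed",
)


def missing_before_pull(done):
    """Return the preconditions, in procedure order, not yet completed."""
    return [step for step in PRECONDITIONS if step not in done]
```

An empty result means every listed precondition has been marked done; anything else is what still has to happen before the drawer is moved.
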
7.6 System Drawer Exposure (Pedestal)

Figure 7-5   Exposing System Drawer (Pedestal)
Exposing the System Drawer

1. Open the front door and remove it by lifting and pulling it away from the system.
2. Remove the top cover. Unscrew the two Phillips head screws midway up on each side of the pedestal, tilt the cover up, and lift it away from the frame.
3. Remove the system bus card cage cover at the back of the pedestal if you are replacing any of the following: CPU, memory, power control module, system bus to PCI bus module, system motherboard, cables that attach to the system motherboard, or a system fan. To remove the cover, unscrew the two Phillips head screws and slide the cover off the drawer.
4. Remove the PCI bus card cage cover at the back of the pedestal if you are replacing any of the following: PCI or EISA option, server control module, PCI motherboard, cables attached to the PCI motherboard. To remove the cover, unscrew the three Phillips head screws holding the cover to the side of the drawer and slide the cover off the drawer.
5. Remove the pedestal tray as described below if you are replacing any of the following: system fan, power supply, power cables.
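
Steps 3 through 5 amount to a lookup from the FRU being replaced to the cover or tray that has to come off. The following sketch expresses that lookup; it assumes the FRU names are spelled as in the steps above, and `ACCESS` and `items_to_remove` are hypothetical names.

```python
# Which cover or tray gives access to which FRU in a pedestal system,
# transcribed from steps 3-5 above. Names are illustrative assumptions.
ACCESS = {
    "system bus card cage cover": {
        "cpu", "memory", "power control module",
        "system bus to pci bus module", "system motherboard",
        "system motherboard cables", "system fan",
    },
    "pci bus card cage cover": {
        "pci or eisa option", "server control module",
        "pci motherboard", "pci motherboard cables",
    },
    "pedestal tray": {"system fan", "power supply", "power cables"},
}


def items_to_remove(fru):
    """Return the covers/trays to remove, in step order, for one FRU."""
    key = fru.lower()
    return [item for item, frus in ACCESS.items() if key in frus]
```

Note that a system fan appears in both step 3 and step 5, so the lookup returns two entries for it.
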

Removing the Pedestal Tray

1. Remove the tray cover by loosening the screws at the back of the tray.
2. Disconnect the cables from the OCP and any optional SCSI device from the bulkhead connector in the rear right corner of the tray.
3. Unscrew the Phillips head screw holding the bulkhead to the tray.
4. Unscrew the two Phillips head retaining screws and slide the tray off the drawer.
7.7 CPU Removal and Replacement

CAUTION: Several different CPU modules work in these systems. Unless you are upgrading, be sure you are replacing the broken module with the same variant. B3001 and B3002 can only be used in AlphaServer 4100 systems.
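
The one restriction stated in this CAUTION can be written as a simple check. The sketch below encodes only that rule; it says nothing about which other CPU variants are valid in a given system, and the function and constant names are assumptions.

```python
# Encodes only the rule in the CAUTION above: B3001 and B3002 can be
# used only in AlphaServer 4100 systems. Names are hypothetical.
RESTRICTED_TO_4100 = {"B3001", "B3002"}


def violates_caution(module, system):
    """True if the CAUTION rules out this CPU module in this system."""
    return module in RESTRICTED_TO_4100 and system != "4100"
```
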

Figure 7-6 Removing CPU Module

WARNING: CPU modules and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before touching any module.
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage. Remove the two Phillips head screws holding the cover in place and slide it off the drawer.
4. Identify and remove the faulty CPU. A label to the left of the system bus card cage identifies which slot contains CPU0, CPU1, CPU2, or CPU3. The CPU is held in place with levers at both ends; simultaneously raise the levers and lift the CPU from the cage.

Replacement
Reverse the steps in the Removal procedure.

Verification — DIGITAL UNIX and OpenVMS Systems
1. Bring the system up to the SRM console by pressing the Halt button, if necessary.
2. Issue the `show cpu` command to display the status of the new module.

Verification — Windows NT Systems
1. Start AlphaBIOS Setup, select Display System Configuration, and press Enter.
2. Using the arrow keys, select MC Bus Configuration to display the status of the new module.
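
The two verification paths differ only in which console is used. As a rough summary, the sketch below maps each operating system to the console and the command or menu path given above; the names `VERIFY_CPU` and `cpu_verification` are assumptions made for illustration.

```python
# Operating system -> (console, command or menu path) for verifying a
# replaced CPU, as described above. Names are illustrative assumptions.
VERIFY_CPU = {
    "DIGITAL UNIX": ("SRM console", ["show cpu"]),
    "OpenVMS": ("SRM console", ["show cpu"]),
    "Windows NT": (
        "AlphaBIOS Setup",
        ["Display System Configuration", "MC Bus Configuration"],
    ),
}


def cpu_verification(os_name):
    """Return (console, steps) for the given operating system."""
    return VERIFY_CPU[os_name]
```
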
7.8 CPU Fan Removal and Replacement

Figure 7-7 Removing CPU Fan
Removal

1. Follow the CPU Removal and Replacement procedure.
2. Unplug the fan from the module.
3. Remove the four Phillips head screws holding the fan to the Alpha chip’s heatsink.

Replacement

Reverse the above procedure.

Verification

If the system powers up, the CPU fan is working.
7.9 Memory Removal and Replacement

CAUTION: Several different memory modules work in these systems. Be sure you are replacing the broken module with the same variant.

Figure 7-8 Removing Memory Module

WARNING: CPU modules and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before touching any module.
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage. Remove the two Phillips head screws holding the cover in place and slide it off the drawer.
4. Identify and remove the faulty module. A label to the left of the system card cage identifies which slot contains the high or low halves of memory banks. The memory module is held in place by a flathead captive screw attached to the top brace of the module. Loosen the screw and lift the module from the cage.

Replacement
Reverse the steps in the Removal procedure.

NOTE: Memory modules must be installed in pairs. When you replace a bad module, be sure the second module in the pair is in place.
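
The pairing rule can be checked mechanically once the slot label has been read. The sketch below assumes each bank is described by the set of halves ("low", "high") currently installed, which is a representation chosen for this example and not something the system reports in this form.

```python
# Flags memory banks that have only one of their two modules installed.
# The input format (bank number -> set of installed halves) is an
# assumption made for this sketch.
def incomplete_banks(installed):
    """Return the banks that hold exactly one half of a module pair."""
    full_pair = {"low", "high"}
    return sorted(
        bank for bank, halves in installed.items()
        if halves and set(halves) != full_pair
    )
```

For example, a layout with bank 0 fully populated, bank 1 holding only its low half, and bank 2 empty would flag bank 1 alone.
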

Verification — DIGITAL UNIX and OpenVMS Systems
1. Bring the system up to the SRM console by pressing the Halt button, if necessary.
2. Issue the `show memory` command to display the status of the new memory.
3. Verify the functioning of the new memory by issuing the command `test memn`, where `n` is 0, 1, 2, 3, or `*`.
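
Because the memory test command takes only the selectors listed in step 3, a helper that builds the command string can reject anything else. This is a minimal sketch; `memory_test_command` is a hypothetical helper and does not talk to the console.

```python
# Builds the SRM memory test command named in step 3. Only the selectors
# listed there (0, 1, 2, 3, or *) are accepted. The helper is hypothetical.
VALID_SELECTORS = ("0", "1", "2", "3", "*")


def memory_test_command(selector):
    """Return 'test mem<selector>' or raise ValueError for a bad selector."""
    text = str(selector)
    if text not in VALID_SELECTORS:
        raise ValueError(f"selector must be one of {VALID_SELECTORS}")
    return "test mem" + text
```
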

Verification — Windows NT Systems
1. Start AlphaBIOS Setup, select Display System Configuration, and press Enter.
2. Using the arrow keys, select Memory Configuration to display the status of the new memory.
3. Switch to the SRM console (press the Halt button in so that the LED on the button lights, and reset the system). Verify the functioning of the new memory by issuing the command `test memn`, where `n` is 0, 1, 2, 3, or `*`.
7.10 Power Control Module Removal and Replacement

Figure 7-9 Removing Power Control Module
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage. Remove the two Phillips head screws holding the cover in place and slide it off the drawer.
4. Remove the faulty PCM. The PCM is located in the back left corner of the system bus card cage. A captive flathead screw and the rear card guide hold the PCM in place. Unscrew the screw and lift the module from the cage.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the PCM is faulty or not seated properly, the system will not come up.
7.11 System Bus to PCI Bus Bridge (B3040-AA) Module Removal and Replacement

Figure 7-10 Removing System Bus to PCI/EISA Bus Bridge Module (B3040-AA)
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage. Remove the two Phillips head screws holding the cover in place and slide it off the drawer.
4. Expose the PCI bus card cage. Remove three Phillips head screws holding the cover in place and slide it off the drawer.
5. Remove all the PCI/EISA options.
6. Remove the server control module.
7. Remove the PCI motherboard.
8. Remove the two Phillips head screws holding the system bus to PCI bus bridge module to the sheet metal between the system bus card cage and the PCI bus card cage.
9. Remove enough CPU and memory modules to the right of the bridge module to allow a flathead screwdriver to be inserted in the slot in the middle of the module’s top bracket.
10. Place a flathead screwdriver into the slot in the middle of the module’s top bracket and into the corresponding slot in the sheet metal between the two card cages. Use the screwdriver as a lever to disconnect the bridge module from the connector on the system motherboard.
11. Remove the bridge module from the system bus card cage.

Replacement

Reverse the steps in the Removal procedure.
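
"Reverse the steps" means the last item removed is the first item reinstalled. The sketch below lists the items taken out in the Removal procedure for the B3040-AA bridge module and derives the reinstallation order from it; the list wording is paraphrased and the names are assumptions.

```python
# Items removed, in order, when taking out the B3040-AA bridge module
# (paraphrased from the Removal steps above). Names are hypothetical.
REMOVAL_ORDER = [
    "system bus card cage cover",
    "PCI bus card cage cover",
    "PCI/EISA options",
    "server control module",
    "PCI motherboard",
    "bridge module retaining screws",
    "CPU and memory modules next to the bridge module",
    "bridge module",
]


def replacement_order():
    """Return the reinstallation order: the removal order reversed."""
    return list(reversed(REMOVAL_ORDER))
```
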

Verification

Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the `show device` command at the console prompt to verify that the system sees all system options and peripherals.
7.12 System Bus to PCI Bus Bridge (B3040-AB) Module Removal and Replacement

Figure 7-11 Removing System Bus to PCI Bus Bridge Module (B3040-AB)
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage. Remove the two Phillips head screws holding the cover in place and slide it off the drawer.
4. Expose the PCI bus card cage on the right side of the drawer. Remove three Phillips head screws holding the cover in place and slide it off the drawer.
5. Remove all the PCI options.
6. Remove the PCI motherboard.
7. Remove the two Phillips head screws holding the system bus to PCI bus bridge module to the sheet metal between the system bus card cage and the PCI bus card cage.
8. Remove enough CPU and memory modules to the left of the bridge module to allow a flathead screwdriver to be inserted in the slot in the middle of the module’s top bracket.
9. Place a flathead screwdriver into the slot in the middle of the module’s top bracket and into the corresponding slot in the sheet metal between the two card cages. Use the screwdriver as a lever to disconnect the bridge module from the connector on the system motherboard.
10. Remove the bridge module from the system bus card cage.

Replacement

Reverse the steps in the Removal procedure.

Verification

Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the `show device` command at the console prompt to verify that the system sees all system options and peripherals.
7.13 System Motherboard (4100 & Early 4000) Removal and Replacement

The system motherboard contains an NVRAM that holds the system serial number. Be sure to record this number before replacing the module. The serial number is on a barcode on the side of the system drawer or on the system bus card cage. The part number for the 4100 is 54-23803-01 and for the early 4000 is 54-23803-02.

Figure 7-12 Removing System Motherboard

Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage by removing the two Phillips head screws holding it in place and sliding the cover off the drawer.
4. Remove all CPUs, memory modules, and the PCM from the system motherboard.

5. Expose the PCI bus card cage. Remove three Phillips head screws holding the cover in place and slide it off the drawer.

6. Remove all the PCI/EISA options.

7. Remove the server control module.

8. Remove the PCI motherboard.

9. Remove the system bus to PCI bus bridge module from the system motherboard.

10. Remove the bracket holding the power cables in place as they pass from the system bus section to the power section of the drawer.

11. Disconnect all cables to the system motherboard and lay them back over the power supply section of the system drawer.

   CAUTION: Secure the power harness connectors in the system card cage to ensure that they cannot damage the pins in the CPU connectors.

12. Remove both the front and back module card guides. Unscrew the two screws that hold the guides in place.

13. Remove the system motherboard from the card cage by removing the 15 Phillips head screws holding it in place. Record the system serial number. (The serial number is on a barcode on the side of the system drawer or on the system bus card cage.)

Replacement

Reverse the above procedure. To align the motherboard in the cage, start replacing the screws in the corners next to the system bus to PCI bus bridge module and then the PCM module. Subsequent screws should align properly.

Verification

1. Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the **show device** command at the console prompt to verify that all system options are seen.

2. Restore the system serial number by issuing the **set sys_serial_num** command at the SRM console prompt.
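
Because the serial number is lost with the old motherboard, it helps to treat the recorded value as a required input to the restore step. The sketch below builds the console command from a recorded serial number and refuses to proceed without one. It assumes the usual SRM form `set <variable> <value>`; the helper name is hypothetical.

```python
# Builds the SRM command that restores the system serial number.
# Assumes the form 'set sys_serial_num <value>'; the helper name is
# hypothetical and the serial number must be the one recorded beforehand.
def restore_serial_command(recorded_serial):
    """Return the restore command, or raise if no serial was recorded."""
    serial = (recorded_serial or "").strip()
    if not serial:
        raise ValueError(
            "record the system serial number before replacing the motherboard"
        )
    return f"set sys_serial_num {serial}"
```
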
7.14 System Motherboard (4000) Removal and Replacement

The system motherboard contains an NVRAM that holds the system serial number. Be sure to record this number before replacing the module. The serial number is on a barcode on the side of the system drawer or on the system bus card cage. The part number for the later 4000 is 54-23805-01.

Figure 7-13 Removing System Motherboard

Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the system bus card cage by removing the two Phillips head screws holding it in place and sliding the cover off the drawer.
4. Remove all CPUs, memory modules, and the PCM from the system motherboard.
5. Expose both PCI bus card cages. Remove three Phillips head screws holding each cover in place and slide them off the drawer.
6. Remove all the PCI/EISA options.
7. Remove the server control module.
8. Remove the PCI motherboards.
9. Remove both bridge modules from the system motherboard.
10. Remove the bracket holding the power cables in place as they pass from the system bus section to the power section of the drawer.
11. Disconnect all cables to the system motherboard and lay them back over the power supply section of the system drawer.
   
   CAUTION: Secure the power harness connectors in the system card cage to ensure that they cannot damage the pins in the CPU connectors.

12. Remove both the front and back module card guides. Unscrew the two screws that hold the guides in place.
13. Remove the system motherboard from the card cage by removing the 15 Phillips head screws holding it in place. Record the system serial number. (The serial number is on a barcode on the side of the system drawer or on the system bus card cage.)

Replacement
Reverse the above procedure. To align the motherboard in the cage, replace screws in adjacent corners of the module. Subsequent screws should align properly.

Verification
1. Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the show device command at the console prompt to verify that all system options are seen.
2. Restore the system serial number by issuing the set sys_serial_num command at the SRM console prompt.
7.15 PCI/EISA Motherboard (B3050) Removal and Replacement

Figure 7-14 Replacing PCI/EISA Motherboard

Removal

The PCI motherboard contains an NVRAM with ECU data and customized console environment variables. Therefore, if the console runs, execute a `show *` command at the console prompt and, if you have not done so earlier, record the settings for the `sys_model_num` and `sys_type` environment variables. These environment variables are used to display the system model number and type and to compute certain information passed to the operating system. When you replace the PCI motherboard, these environment variables are lost and must be restored after the module swap.

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the PCI bus card cage. Remove three Phillips head screws holding the cover in place and slide it off the drawer.
4. Remove all PCI and EISA options.
5. Disconnect all cables connected to the PCI motherboard.
6. Remove the server control module.
7. Unscrew the two screws holding the system bus to PCI bus bridge module in the system bus card cage to the PCI motherboard.
8. Remove the nine Phillips head screws that hold the motherboard in place. To reach the screws on the bottom of the board, thread your screwdriver through the three holes in the sheet metal.
9. Carefully pry the motherboard loose from the system bus to PCI bus bridge module on the other side of the sheet metal separating the system bus card cage from the PCI card cage.
10. Remove the motherboard from the card cage.

**Replacement**
Reverse the steps in the Removal procedure.

**Verification**

1. Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the `show device` command at the console prompt to verify that the system sees all options.
2. Restore the `sys_model_num`, `sys_type`, and other customized environment variables to their previous settings. Run the ECU to restore EISA configuration data. This must be done regardless of whether there is an EISA option in the EISA slot on PCI 0.
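
The restore in step 2 depends on the values recorded before the swap. As a sketch, the recorded settings can be kept as name/value pairs and turned into console commands afterward. The command form `set <variable> <value>` is assumed here, the function name is hypothetical, and the values must come from the system being serviced.

```python
# Turns settings recorded before the swap into SRM 'set' commands.
# The 'set <name> <value>' form is an assumption of this sketch.
REQUIRED = ("sys_model_num", "sys_type")


def restore_commands(recorded):
    """Return set commands, required variables first, then the rest."""
    missing = [name for name in REQUIRED if name not in recorded]
    if missing:
        raise ValueError(f"settings not recorded: {', '.join(missing)}")
    others = sorted(name for name in recorded if name not in REQUIRED)
    return [f"set {name} {recorded[name]}" for name in list(REQUIRED) + others]
```

Running the ECU to restore EISA configuration data remains a separate step and is not covered by this sketch.
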
7.16 PCI Motherboard (B3051) Removal and Replacement

Figure 7-15 Replacing PCI Motherboard
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the PCI bus card cage on the right when viewing the drawer from the rear. Remove three Phillips head screws holding the cover in place, and slide it off the drawer.
4. Remove all PCI options.
5. Disconnect all cables connected to the PCI motherboard.
6. Unscrew the two screws holding the system bus to PCI bus bridge module in the system bus card cage to the PCI motherboard.
7. Remove the nine Phillips head screws that hold the motherboard in place. To reach the screws on the bottom of the board, thread your screwdriver through the three holes in the sheet metal.
8. Carefully pry the motherboard loose from the system bus to PCI bus bridge module on the other side of the sheet metal separating the system bus card cage from the PCI card cage.
9. Remove the motherboard from the card cage.

Replacement

Reverse the steps in the Removal procedure.

Verification

1. Power up the system (press the Halt button if necessary to bring up the SRM console) and issue the **show device** command at the console prompt to verify that the system sees all options.
7.17 Server Control Module Removal and Replacement

Figure 7-16 Removing Server Control Module
**Removal**

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the PCI bus card cage. Remove three Phillips head screws holding the cover in place and slide it off the drawer.
4. Disconnect the cables connected at the bulkhead to the server control module.
5. If necessary, remove several PCI and EISA options from the bottom of the PCI card cage up until you can access the server control module.
6. Disconnect the two cables connected to the PCI motherboard at the server control module end.
7. Disconnect the twisted pair power cable from the module.
8. Place a credit card or a piece of cardboard between the edge of the SCM module and the B3050 module to protect the delicate pins of the ASICs located in the lower right corner of the B3050 module.
9. The server control module is held in place by four stud snaps. Using a flathead screwdriver gently pry the SCM module off the snaps and remove it. *Make sure you do not hit the pins of the ASICs on the B3050.*

**Replacement**

Reverse the steps in the Removal procedure.

**Verification**

Verify console output on COM1.
7.18 PCI/EISA Option Removal and Replacement

Figure 7-17 Removing PCI/EISA Option

WARNING: To prevent fire, use only modules with current limited outputs. See National Electrical Code NFPA 70 or Safety of Information Technology Equipment, Including Electrical Business Equipment EN 60 950.
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the PCI bus card cage. Remove three Phillips head screws holding the cover in place and slide it off the drawer.
4. Remove the faulty option. Disconnect cables connected to the option. Unscrew the small Phillips head screw securing the option to the card cage. Slide the option from the card cage.

Replacement
Reverse the steps in the Removal procedure.

Verification — DIGITAL UNIX and OpenVMS Systems
1. Power up the system (press the Halt button if necessary to bring up the SRM console) and run the ECU to restore EISA configuration data.
2. Issue the show config command or show device command at the console prompt to verify that the system sees the option you replaced.
3. Run any diagnostic appropriate for the option you replaced.

Verification — Windows NT Systems
1. Start AlphaBIOS Setup, select Display System Configuration, and press Enter.
2. Using the arrow keys, select PCI Configuration or EISA Configuration to determine that the new option is listed.
7.19 Power Supply Removal and Replacement

Figure 7-18 Removing Power Supply

Figure labels: Jumper (17-04199-01); Cable Harness (17-04217-01 or 17-04358-01)
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Remove the cover to the power section of the drawer. Remove the two Phillips head screws holding the cover in place and slide it off the drawer.
4. Release the power supply tray by removing the two Phillips head screws on the side of the drawer. See ④.
5. Lift the power supply tray to release it from the sheet metal and slide it out from the drawer until it locks (about 4 inches).
6. Tilt the tray to allow easier access to the back of the power supplies.
7. Unplug the connectors at the rear of the supply that is being replaced.
8. Unscrew the four Phillips head screws at the front of the tray that hold the power supply in place. Also unscrew the two screws at the back of the power supply. See ⑤.
9. Remove the power supply.

Replacement

Reverse the steps in the Removal procedure.

Verification

Power up the system. If the system has redundant power, it will power up regardless of whether the replaced power supply is faulty; in this case, look at the PCM LEDs to determine whether the power supply is functioning properly. If the system does not have redundant power and the replaced supply is faulty, the system will not power up.
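
The verification logic above branches on whether the system has redundant power. The sketch below restates it as a small decision function. Whether a system is redundant and whether the PCM LEDs look correct are inputs supplied by the technician; the function name and result strings are assumptions.

```python
# Restates the power supply verification logic described above.
# All inputs are observations made by the technician; names are hypothetical.
def interpret_power_up(redundant, powered_up, pcm_leds_ok=None):
    """Return a short conclusion about the replaced power supply."""
    if not powered_up:
        return "fault: system did not power up"
    if redundant:
        # Power-up alone proves nothing when another supply can carry the load.
        if pcm_leds_ok is None:
            return "check PCM LEDs"
        return "supply ok" if pcm_leds_ok else "supply suspect"
    # Without redundancy, a successful power-up implies a working supply.
    return "supply ok"
```
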
7.20 Power Harness (4100 and early 4000) Removal and Replacement

Figure 7-19 Removing Power Harness
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the power, system card cage, and PCI/EISA sections of the drawer by removing all covers. Unscrew the Phillips head screws holding each cover in place and slide the covers off the drawer.
4. Release the power supply tray by removing the two Phillips head screws on the side of the drawer.
5. Lift the power supply tray to release it from the sheet metal and slide it out from the drawer until it locks.
6. Tilt the tray to allow easier access to the fans.
7. Remove the bracket holding the power harness as it passes from the power section to the system card cage section of the drawer. Remove the three Phillips head screws holding the bracket in place.
8. Disconnect the power harness from the system motherboard and fold the harness back over the power supplies.
   
   **CAUTION:** Secure the power harness connectors in the system card cage to ensure that they cannot damage the pins in the CPU connectors.
9. Disconnect the two power connectors from the PCI/EISA motherboard. Push the power cable through the hole from the PCI/EISA section into the power section.
10. Disconnect the fan power cables from the power harness.
11. Remove the four Phillips head screws holding the OCP tray to the drawer.
12. Slide the tray from the drawer far enough to disconnect the power cables attached to the OCP (cabinet only), the floppy, and the CD-ROM.
13. As you remove the tray from the system, push the power cables through the hole at the back of the tray into the power section of the drawer.
14. Disconnect the power harness from the power supplies. Remove the harness from the system.

Replacement

Reverse the steps in the Removal procedure.
7.21 Power Harness (4000) Removal and Replacement

Figure 7-20 Removing Power Harness
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the power and system card cage sections of the drawer by removing the two covers. Unscrew the two Phillips head screws holding each cover in place and slide the covers off the drawer.
4. If you want more space to work on the fans, do this step and the next two; otherwise skip to step 7. Release the power supply tray by removing the two Phillips head screws on the side of the drawer.
5. Lift the power supply tray to release it from the sheet metal and slide it out from the drawer until it locks.
6. Tilt the tray to allow easier access to the fans.
7. Remove the bracket holding the power harness as it passes from the power section to the system card cage section of the drawer. Remove the three Phillips head screws holding the bracket in place.
8. Disconnect the power harness from the system motherboard and fold the harness back over the power supplies.
   
   **CAUTION:** Secure the power harness connectors in the system card cage to ensure that they cannot damage the pins in the CPU connectors.

9. Disconnect the three power connectors from the PCI/EISA motherboard. Push the power cable through the hole from the PCI/EISA section into the power section.
10. Disconnect the three power connectors from the other PCI motherboard. Push the power cable through the hole from the PCI section into the power section.
11. Disconnect the fan power cables from the power harness.
12. Remove the four Phillips head screws holding the OCP tray to the drawer.
13. Slide the tray from the drawer far enough to disconnect the power cables attached to the OCP (cabinet only), the floppy, and the CD-ROM.
14. As you remove the tray from the system, push the power cables through the hole at the back of the tray into the power section of the drawer.
15. Disconnect the power harness from the power supplies. Remove the harness from the system.

Replacement

Reverse the steps in the Removal procedure.
7.22 System Drawer Fan Removal and Replacement

Figure 7-21 Removing System Drawer Fan

Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Expose the power system, the system card cage, and the PCI card cage sections of the drawer by removing all three covers. Unscrew the two Phillips head screws holding each cover on top of the drawer in place and slide them off the drawer. Release the two lever latches holding the PCI card cage cover in place and slide it off.
4. Release the power supply tray by removing the two Phillips head screws on the side of the drawer.

5. Lift the power supply tray to release it from the sheet metal and slide it out from the drawer.

6. Tilt the tray to allow easier access to the fans.

7. Remove the bracket holding the power harness as it passes from the power section to the system card cage section of the drawer. Remove the three Phillips head screws holding the bracket in place.

8. Disconnect the power harness from the system motherboard and fold the harness back over the power supplies. Remove any modules that prevent you from disconnecting the harness from the system motherboard.

   **CAUTION:** Secure the power harness connectors in the system card cage to ensure that they cannot damage the pins in the CPU connectors.

9. Disconnect the three power connectors from the PCI motherboard and pass them through the hole from the PCI card cage to the power section of the drawer.

10. Disconnect the fan power cables from the power harness.

11. Remove the four Phillips head screws holding the OCP tray to the system drawer. Slide the tray out of the system drawer far enough to disconnect power cables attached to the OCP, the floppy, and the CD-ROM drive.

12. Remove the tray from the system.

13. Release the three lever latches on the bracket holding all three fans in place.

14. Disconnect the broken fan’s power cable from the power harness and lift the fan from the drawer.

**Replacement**

Reverse the steps in the Removal procedure.

**Verification**

Power up the system. If the fan you installed is faulty, the system will not power up. Look at the PCM LEDs to determine that the fan you replaced is functioning properly.
7.23 Cover Interlock (4100 and early 4000) Removal and Replacement

Figure 7-22 Removing Cover Interlocks
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Remove all three section covers to expose the interlock switch assembly.
4. Remove the two screws holding the interlock in place.
5. Push the interlock toward the opposite side of the system drawer (be sure not to twist it) and tilt it so that the switches affected by the power and system card cage covers clear the openings in the side of the drawer. Slide it toward the front of the drawer and remove it, letting it hang loosely over the side of the drawer.
6. If you are working on a pedestal system, disconnect the switch connection from the tray bulkhead and remove the interlock switch assembly.
7. If you are working on a system drawer in a cabinet, unscrew the four screws holding the OCP tray assembly in place beneath the drawer in front.
8. Slide the tray out and remove it from the system.
9. Pull the interlock switch connection to the OCP back through the access hole and remove the entire switch assembly.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the switch you installed is faulty, the system will not power up.
7.24 Cover Interlock (later 4000) Removal and Replacement

Figure 7-23 Removing Cover Interlocks
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Remove all three section covers to expose the interlock switch assemblies.
4. Remove the two screws holding the interlocks in place.
5. Push the interlock toward the opposite side of the system drawer (be sure not to twist it) and tilt it so that the switches affected by the power and system card cage covers clear the openings in the side of the drawer. Slide it toward the front of the drawer and remove it, letting it hang loosely over the side of the drawer.
6. If you are working on a pedestal system, disconnect the switch connection from the tray bulkhead and remove the interlock switch assembly.
7. If you are working on a system drawer in a cabinet, unscrew the four screws holding the OCP tray assembly in place beneath the drawer in front.
8. Slide the tray out and remove it from the system.
9. Pull the interlock switch connection to the OCP back through the access hole and remove the entire switch assembly.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the switch you installed is faulty, the system will not power up.
7.25 Operator Control Panel Removal and Replacement (Cabinet)

Figure 7-24 Removing OCP (Cabinet)
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. While you need not remove the tray containing the OCP, you do need to slide it forward to access the OCP retaining screws under the tray. The tray is attached to the power system section cover. To slide the tray forward:
   a. Remove the tray cover by loosening the retaining screws at the back of the tray and sliding it toward the back of the system.
   b. Disconnect the cables for the OCP and for any optional SCSI device in the tray from the bulkhead at the rear right of the tray.
   c. Unscrew the Phillips head retaining screw holding the bulkhead to the tray.
   d. Unscrew the two Phillips head retaining screws at the front of the system drawer and slide the tray forward.
4. Remove the white power interconnect wire and the signal ribbon cable from the OCP.
5. Remove the two Phillips head screws holding the OCP in place and remove it from the tray.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. If the OCP you installed is faulty, the system will not power up.
7.26 Operator Control Panel Removal and Replacement (Pedestal)

Figure 7-25 Removing OCP (Pedestal)
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Remove the four Phillips head screws holding the OCP tray to the system drawer.
4. Slide the tray out of the system drawer far enough to disconnect cables attached to the OCP, the floppy, and the CD-ROM drive.
5. Remove the tray from the system.
6. Move the tray to a convenient work surface. Hold the tray vertically and, from the bottom of the tray, remove the two Phillips head screws that hold the OCP in place. Then remove the OCP assembly from the tray.

Replacement

Reverse the steps in the Removal procedure. As you replace the tray in the drawer, be sure that the slides on the sides of the tray are placed on the rails in the drawer.

Verification

Power up the system. If the OCP you installed is faulty, the system will not power up or you will not see messages on the OCP display.
7.27 Floppy Removal and Replacement

Figure 7-26 Removing Floppy Drive
Removal

1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Remove the four Phillips head screws holding the OCP tray to the system drawer.
4. Slide the tray out of the system drawer and disconnect cables attached to the OCP (unnecessary on a pedestal system), the floppy, and the CD-ROM drive. (In the pedestal system the OCP is in the tray above the power supplies.)
5. Move the tray to a convenient work surface. Hold the tray vertically and, from the bottom of the tray, remove the four Phillips head screws that hold the floppy drive in place. Then remove the drive from the tray.

Replacement

Reverse the steps in the Removal procedure. As you replace the tray in the drawer, be sure that the slides on the sides of the tray are placed on the rails in the drawer.

Verification

Power up the system. Use the following SRM console commands to test the floppy:

P00>>> show dev floppy
P00>>> HD buf/dva0
7.28 CD-ROM Removal and Replacement

Figure 7-27 Removing CD-ROM
Removal
1. Shut down the operating system and power down the system.
2. Expose the system drawer.
3. Remove the four Phillips head screws holding the OCP tray to the system drawer.
4. Slide the tray out of the system drawer and disconnect cables attached to the OCP (unnecessary on a pedestal system), the floppy, and the CD-ROM drive. (In the pedestal system the OCP is in the pedestal tray above the power supplies.)
5. Move the tray to a convenient work surface. Hold the tray vertically and, from the bottom of the tray, remove the four Phillips head screws that hold the CD-ROM drive in place. Then remove the drive from the tray.

Replacement
Reverse the steps in the Removal procedure. As you replace the tray in the drawer, be sure that the slides on the sides of the tray are placed on the rails in the drawer.

Verification
Power up the system (press the Halt button if necessary to bring up the SRM console). Use the following SRM console commands to test the CD-ROM:

```
P00>>> show dev ncr0
P00>>> HD buf/dkannn
```
where `nnn` is the device number; for example, dka500.
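
The device names used in these verification commands follow a regular pattern. The following Python sketch splits a name such as dka500 into its parts and builds the test command in the form used above. It is an illustration only: the function names are hypothetical, and the layout assumed here (a two-letter driver code, a controller letter, and a unit number, written without spaces) is inferred from names that appear in this manual, such as dva0 and dka500.

```python
import re

# Assumed layout: two-letter driver code, controller letter, unit number.
_DEVICE_RE = re.compile(r"^([a-z]{2})([a-z])(\d+)$")

def split_device_name(name):
    """Split 'dka500' into ('dk', 'a', 500); raise ValueError otherwise."""
    match = _DEVICE_RE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    driver, controller, unit = match.groups()
    return driver, controller, int(unit)

def hd_command(name):
    """Build the test command in the form used in this section."""
    split_device_name(name)  # validate the name before using it
    return f"HD buf/{name.strip().lower()}"

print(split_device_name("dka500"))
print(hd_command("dva0"))
```

Validating the name first catches a mistyped device (for example, a missing unit number) before the command is entered at the console.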
7.29 Cabinet Fan Tray Removal and Replacement

Figure 7-28 Removing Cabinet Fan Tray
Removal

1. Shut down the operating system and power down the system. Unplug the AC power cable from the cabinet tray power supply.
2. If present, unplug any power cables going to the server control modules at the back of system drawers.
3. Unscrew the four Phillips head screws securing the fan tray to the top of the cabinet.
4. Loosen the four hexnuts that hold the tray to the top of the cabinet.
5. Holding the bottom of the tray, slide it out so that the holes in the tray frame slip over the loosened hexnuts.
6. Move the tray to a work surface to remove whatever component is being replaced.

Replacement

Reverse the steps in the Removal procedure.

Verification

Power up the system. If the green power LED comes on, and the fan LED is off, the cabinet fan tray is verified.
7.30 Cabinet Fan Tray Power Supply Removal and Replacement

Figure 7-29 Removing Cabinet Fan Tray Power Supply

Figure callouts: to fan fail detect board, ground, load, not used, neutral, offsets, to fans, power supply cover, power supply.

Removal

1. Remove the cabinet fan tray.
2. Disconnect the power harness from the fan fail detect module and each fan.
3. Remove the power supply cover. It is held in place by two screws that go through the AC bulkhead spot welded to the tray weldment.
4. Remove the power harness from the tray by disconnecting it from the power supply.
5. Disconnect the neutral and load leads from the power supply.
6. Remove the four screws holding the power supply to the tray. Keep track of the standoffs that provide space between the power supply and weldment. You will need them during replacement.

Replacement

1. Reverse the steps in the Removal procedure.
2. Place the fan tray back in the cabinet.

Verification

Power up the system. If the green power LED comes on, and the fan LED is off, the cabinet fan tray power supply is verified.
7.31 Cabinet Fan Tray Fan Removal and Replacement

Figure 7-30 Removing Cabinet Fan Tray Fan
Removal
1. Remove the cabinet fan tray.
2. Disconnect the power harness from the fan you wish to replace.
3. Remove the fan finger guard.
4. Remove the two remaining screws holding the fan to the tray and remove the fan.
5. If the new fan does not have clip nuts, remove them from the old fan for reuse on the new fan.

Replacement
1. Reverse the steps in the Removal procedure, taking care to orient the fan so that its connection to the power harness can be routed neatly.
2. Place the fan tray back in the cabinet.

Verification
Power up the system. If the green power LED comes on, and the fan LED is off, the cabinet fan tray fan is verified.
7.32 Cabinet Fan Tray Fan Fail Detect Module Removal and Replacement

Figure 7-31 Removing Fan Tray Fan Fail Detect Module
Removal
1. Remove the cabinet fan tray.
2. Disconnect the power harness from the fan fail detect module.
3. Remove the fan fail detect module. In early systems, the module is held in place by three screws that go through the weldment, through three standoffs, through the module to nuts. In later systems, the module snaps in place.

Replacement
1. Reverse the steps in the Removal procedure.
2. Place the fan tray back in the cabinet.

Verification
Power up the system. If the green power LED comes on, and the fan LED is off, the cabinet fan fail detect module is verified.
7.33 StorageWorks Shelf Removal and Replacement

Figure 7-32 Removing StorageWorks Shelf

Figure labels: cabinet and pedestal views; StorageWorks shelf mounting rails (H910A-EC and H910A-EB).

Removal
1. Shut down the operating system and power down the system.
2. Remove the power cord and signal cord(s) from the StorageWorks shelf.
3. Remove the two retaining brackets holding the shelf in the mounting rail by removing the Phillips head screws holding the brackets in place.
4. Slide the shelf out of the system.

Replacement
Reverse the steps in the Removal procedure.

Verification
Power up the system. Use the `show device` console command to verify that the StorageWorks shelf is configured into the system.
Appendix A

Running Utilities

This appendix provides a brief overview of how to load and run utilities. The following topics are covered:

- Running Utilities from a Graphics Monitor
- Running Utilities from a Serial Terminal
- Running ECU
- Running RAID Standalone Configuration Utility
- Updating Firmware with LFU
- Updating Firmware from AlphaBIOS
- Upgrading AlphaBIOS
A.1 Running Utilities from a Graphics Monitor

Start AlphaBIOS and select Utilities from the menu. The next selection depends on the utility to be run. For example, to run ECU, select Run ECU from floppy. To run RCU, select Run Maintenance Program.

Figure A-1 Running a Utility from a Graphics Monitor
A.2 Running Utilities from a Serial Terminal

Utilities are run from a serial terminal in the same way as from a graphics monitor. The menus are the same, but some keys are different.

<table>
<thead>
<tr>
<th>AlphaBIOS Key</th>
<th>VTxxx Key</th>
</tr>
</thead>
<tbody>
<tr>
<td>F1</td>
<td>Ctrl/A</td>
</tr>
<tr>
<td>F2</td>
<td>Ctrl/B</td>
</tr>
<tr>
<td>F3</td>
<td>Ctrl/C</td>
</tr>
<tr>
<td>F4</td>
<td>Ctrl/D</td>
</tr>
<tr>
<td>F5</td>
<td>Ctrl/E</td>
</tr>
<tr>
<td>F6</td>
<td>Ctrl/F</td>
</tr>
<tr>
<td>F7</td>
<td>Ctrl/P</td>
</tr>
<tr>
<td>F8</td>
<td>Ctrl/R</td>
</tr>
<tr>
<td>F9</td>
<td>Ctrl/T</td>
</tr>
<tr>
<td>F10</td>
<td>Ctrl/U</td>
</tr>
<tr>
<td>Insert</td>
<td>Ctrl/V</td>
</tr>
<tr>
<td>Delete</td>
<td>Ctrl/W</td>
</tr>
<tr>
<td>Backspace</td>
<td>Ctrl/H</td>
</tr>
<tr>
<td>Escape</td>
<td>Ctrl/[</td>
</tr>
</tbody>
</table>
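
On a serial terminal, each substitute keystroke in the table is an ordinary ASCII control character. The following Python sketch records the table as a dictionary and computes the byte that a terminal sends for each key. The names `ALPHABIOS_TO_VT` and `control_byte` are hypothetical; the byte values follow the standard ASCII convention that Ctrl/x sends the code of x with the upper three bits cleared.

```python
# AlphaBIOS key -> character pressed together with Ctrl on a VTxxx terminal
ALPHABIOS_TO_VT = {
    "F1": "A", "F2": "B", "F3": "C", "F4": "D", "F5": "E", "F6": "F",
    "F7": "P", "F8": "R", "F9": "T", "F10": "U",
    "Insert": "V", "Delete": "W", "Backspace": "H", "Escape": "[",
}

def control_byte(key):
    """Return the single byte a terminal sends for the given AlphaBIOS key."""
    char = ALPHABIOS_TO_VT[key]
    # Ctrl/<char> sends the ASCII code of <char> masked to its low 5 bits.
    return bytes([ord(char) & 0x1F])

for key in ALPHABIOS_TO_VT:
    print(f"{key:10} Ctrl/{ALPHABIOS_TO_VT[key]}  0x{control_byte(key)[0]:02X}")
```

A table like this can be useful when scripting a terminal emulator or checking that an emulator passes the control characters through unchanged.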
A.3 Running ECU

The EISA Configuration Utility (ECU) is used to configure EISA options on AlphaServer systems. The ECU can be run either from a graphics monitor or a serial terminal.

1. Start AlphaBIOS Setup. If the system is in the SRM console, issue the command `alphabios`. (If the system has a graphics monitor, you can set the SRM `console` environment variable to `graphics`.)

2. From AlphaBIOS Setup, select Utilities, then select Run ECU from floppy… from the submenu that displays, and press Enter.

   NOTE: The EISA Configuration Utility is supplied on diskettes shipped with the system. There is a diskette for Microsoft Windows NT and a diskette for DIGITAL UNIX and OpenVMS.

3. Insert the correct ECU diskette for the operating system and press Enter to run it.

The ECU main menu displays the following options:

EISA Configuration Utility
Steps in configuring your computer
STEP 1: Important EISA configuration information
STEP 2: Add or remove boards
STEP 3: View or edit details
STEP 4: Examine required details
STEP 5: Save and exit

NOTE: Step 1 of the ECU provides online help. It is recommended that you select this step and become familiar with the utility before proceeding.
A.4  Running RAID Standalone Configuration Utility

The RAID Standalone Configuration Utility is used to set up RAID disk drives and logical units. The Standalone Utility is run from the AlphaBIOS Utility menu.

The AlphaServer 4100 system supports the KZPSC-xx PCI RAID controller (SWXCR). The KZPSC-xx kit includes the controller, RAID Array 230 Subsystems software, and documentation.

1. Start AlphaBIOS Setup. If the system is in the SRM console, issue the command alphabios. (If the system has a graphics monitor, you can set the SRM console environment variable to graphics.)

2. At the Utilities screen, select Run Maintenance Program. Press Enter.

3. In the Run Maintenance Program dialog box, type swxcmgr in the Program Name: field.

4. Press Enter to execute the program. The Main menu displays the following options:

   [01.View/Update Configuration]
   02.Automatic Configuration
   03.New Configuration
   04.Initialize Logical Drive
   05.Parity Check
   06.Rebuild
   07.Tools
   08.Select SWXCR
   09.Controller Setup
   10.Diagnostics

Refer to the RAID Array Subsystems documentation for information on using the Standalone Configuration Utility to set up RAID drives.
A.5 Updating Firmware with LFU

Start the Loadable Firmware Update (LFU) utility by issuing the lfu command at the SRM console prompt or by selecting Upgrade AlphaBIOS in the AlphaBIOS Setup screen. LFU is part of the SRM console.

Example A-1 Starting LFU from the SRM Console

P00>>> lfu

***** Loadable Firmware Update Utility *****

Select firmware load device (cda0, dva0, ewa0), or Press <return> to bypass loading and proceed to LFU: cda0
.
.
UPD>

Figure A-2 Starting LFU from the AlphaBIOS Console

Press ENTER to upgrade your AlphaBIOS from floppy or CD-ROM.
Use the Loadable Firmware Update (LFU) utility to update system firmware. You can start LFU from either the SRM console or the AlphaBIOS console.

- From the SRM console, start LFU by issuing the **lfu** command.
- From the AlphaBIOS console, select **Upgrade AlphaBIOS** from the **AlphaBIOS Setup** screen (see Figure A-2).

A typical update procedure is:

1. Start LFU.
2. Use the LFU **list** command to show the revisions of modules that LFU can update and the revisions of update firmware.
3. Use the LFU **update** command to write the new firmware.
4. Use the LFU **exit** command to exit back to the console.

The sections that follow show examples of updating firmware from the local CD-ROM, the local floppy, and a network device. Following the examples is an LFU command reference.
A.5.1 Updating Firmware from the Internal CD-ROM

Insert the update CD-ROM, start LFU, and select cda0 as the load device.

Example A-2 Updating Firmware from the Internal CD-ROM

***** Loadable Firmware Update Utility *****

Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: cda0 ➊

Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00FW]: AS4X00CP ➋

Copying AS4X00CP from DKA500.5.0.1.1.
Copying [as4x00]RHREADME from DKA500.5.0.1.1.
Copying [as4x00]RHSRMROM from DKA500.5.0.1.1 .....................
Copying [as4x00]RHARCROM from DKA500.5.0.1.1 .............

Function    Description ➌

Display     Displays the system’s configuration table.
Exit        Done exit LFU (reset).
List        Lists the device, revision, firmware name, and update revision.
Lfu         Restarts LFU.
Readme      Lists important release information.
Update      Replaces current firmware with loadable data image.
Verify      Compares loadable and hardware images.
? or Help   Scrolls this function table.

UPD> list ➍

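<table>
<thead>
<tr>
<th>Device</th>
<th>Current Revision</th>
<th>Filename</th>
<th>Update Revision</th>
</tr>
</thead>
<tbody>
<tr>
<td>AlphaBIOS</td>
<td>V5.12-2</td>
<td>arcrom</td>
<td>V6.40-1</td>
</tr>
<tr>
<td>srmflash</td>
<td>V1.0-9</td>
<td>srmrom</td>
<td>V2.0-3</td>
</tr>
</tbody>
</table>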

1. Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, the internal CD-ROM is selected.

2. Select the file that has the firmware update, or press Enter to select the default file. The file options are:
   - AS4X00FW: SRM console, AlphaBIOS console, and I/O adapter firmware (default)
   - AS4X00CP: SRM console and AlphaBIOS console firmware only
   - AS4X00IO: I/O adapter firmware only

   In this example, the file for console firmware (AlphaBIOS and SRM) is selected.

3. The LFU function table and prompt (UPD>) display.

4. Use the LFU list command to determine the revision of firmware in a device and the most recent revision of that firmware available in the selected file. In this example, the resident firmware for each console (SRM and AlphaBIOS) is at an earlier revision than the firmware in the update file.
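
The comparison described in step 4 can be sketched in Python. The sketch assumes that the output of the list command has been captured as plain text with whitespace-separated columns. The function names are hypothetical. Because the manual does not define how revision strings are ordered, the sketch flags any device whose update revision differs from its current revision rather than trying to decide which is newer.

```python
def parse_list_output(text):
    """Return (device, current, filename, update) tuples from list output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Device":
            continue  # skip blank lines and the column header
        if len(fields) >= 4:
            device, current, filename = fields[:3]
            update = " ".join(fields[3:])  # keeps "Missing file" intact
            rows.append((device, current, filename, update))
        # Shorter rows list firmware files with no matching device.
    return rows

def devices_to_update(rows):
    """Devices whose update file is present and differs from what is installed."""
    return [device for device, current, _name, update in rows
            if update != "Missing file" and update != current]

sample = """Device Current Revision Filename Update Revision
AlphaBIOS V5.12-2 arcrom V6.40-1
srmflash V1.0-9 srmrom V2.0-3
"""
print(devices_to_update(parse_list_output(sample)))
```

With the revisions from Example A-2, both AlphaBIOS and srmflash are reported as candidates for update.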

Example A-2 Updating Firmware from the Internal CD-ROM (Continued)

UPD> update *  ➎
WARNING: updates may take several minutes to complete for each device.

Confirm update on: AlphaBIOS  [Y/(N)] y  ➏
DO NOT ABORT!
AlphaBIOS       Updating to V6.40-1... Verifying V6.40-1... PASSED.

Confirm update on: srmflash  [Y/(N)] y
DO NOT ABORT!
srmflash        Updating to V2.0-3...  Verifying V2.0-3...  PASSED.

UPD> exit  ➐

5. The **update** command updates the device specified or all devices. In this example, the wildcard indicates that all devices supported by the selected update file will be updated.

6. For each device, you are asked to confirm that you want to update the firmware. The default is no. Once the update begins, do not abort the operation. Doing so will corrupt the firmware on the module.

7. The **exit** command returns you to the console from which you entered LFU (either SRM or AlphaBIOS).
A.5.2 Updating Firmware from the Internal Floppy Disk — Creating the Diskettes

Create the update diskettes before starting LFU. See Section A.5.3 for an example of the update procedure.

Table A-2 File Locations for Creating Update Diskettes on a PC

<table>
<thead>
<tr>
<th>Console Update Diskette</th>
<th>I/O Update Diskette</th>
</tr>
</thead>
<tbody>
<tr>
<td>AS4X00FW.TXT</td>
<td>AS4X00IO.TXT</td>
</tr>
<tr>
<td>AS4X00CP.TXT</td>
<td>RHREADME.SYS</td>
</tr>
<tr>
<td>RHREADME.SYS</td>
<td>CIPCA214.SYS</td>
</tr>
<tr>
<td>RHSRMROM.SYS</td>
<td>DFPAA246.SYS</td>
</tr>
<tr>
<td>RHARCROM.SYS</td>
<td>KZPSAA10.SYS</td>
</tr>
</tbody>
</table>

To update system firmware from floppy disk, you first must create the firmware update diskettes. You will need to create two diskettes: one for console updates, and one for I/O.

1. Download the update files from the Internet (see the Preface of this book).
2. On a PC, copy files onto two FAT-formatted diskettes.
   From an OpenVMS system, copy files onto two ODS2-formatted diskettes as shown in Example A-3.
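
Before copying, it can help to confirm that every file intended for a diskette has been downloaded and that the set fits on one diskette. The following Python sketch shows one way to make both checks. The function names are hypothetical and the file sizes in the example are invented for illustration. The capacity constant is the raw size of a 1.44 MB diskette (2880 sectors of 512 bytes); the usable space is somewhat less once the file system is written.

```python
DISKETTE_BYTES = 2880 * 512  # 1,474,560 bytes, raw capacity of a 1.44 MB diskette

# Console update diskette contents, as listed in Table A-2.
CONSOLE_DISKETTE = [
    "AS4X00FW.TXT", "AS4X00CP.TXT", "RHREADME.SYS",
    "RHSRMROM.SYS", "RHARCROM.SYS",
]

def missing_files(required, available):
    """Return the required file names that are absent (case-insensitive)."""
    have = {name.upper() for name in available}
    return [name for name in required if name.upper() not in have]

def fits_on_diskette(sizes, capacity=DISKETTE_BYTES):
    """sizes maps each file name to its size in bytes."""
    return sum(sizes.values()) <= capacity

# Invented sizes, for illustration only.
example_sizes = {"RHSRMROM.SYS": 700_000, "RHARCROM.SYS": 500_000}
print(missing_files(CONSOLE_DISKETTE, example_sizes))
print(fits_on_diskette(example_sizes))
```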
Example A-3  Creating Update Diskettes on an OpenVMS System

Console Update Diskette

$ inquire ignore "Insert blank HD floppy in DVA0, then continue"
$ set verify
$ set proc/priv=all
$ init /density=hd/index=begin dva0: rhods2cp
$ mount dva0: rhods2cp
$ create /directory dva0:[as4x00]
$ copy as4x00fw.sys dva0:[as4x00]as4x00fw.sys
$ copy as4x00cp.sys dva0:[as4x00]as4x00cp.sys
$ copy rhreadme.sys dva0:[as4x00]rhreadme.sys
$ copy as4x00fw.txt dva0:[as4x00]as4x00fw.txt
$ copy as4x00cp.txt dva0:[as4x00]as4x00cp.txt
$ copy rhsrmrom.sys dva0:[as4x00]rhsrmrom.sys
$ copy rharcrom.sys dva0:[as4x00]rharcrom.sys
$ dismount dva0:
$ set noverify
$ exit

I/O Update Diskette

$ inquire ignore "Insert blank HD floppy in DVA0, then continue"
$ set verify
$ set proc/priv=all
$ init /density=hd/index=begin dva0: rhods2io
$ mount dva0: rhods2io
$ create /directory dva0:[as4x00]
$ create /directory dva0:[options]
$ copy as4x00fw.sys dva0:[as4x00]as4x00fw.sys
$ copy as4x00io.sys dva0:[as4x00]as4x00io.sys
$ copy rhreadme.sys dva0:[as4x00]rhreadme.sys
$ copy as4x00fw.txt dva0:[as4x00]as4x00fw.txt
$ copy as4x00io.txt dva0:[as4x00]as4x00io.txt
$ copy cipca214.sys dva0:[options]cipca214.sys
$ copy dfpaa246.sys dva0:[options]dfpaa246.sys
$ copy kzpsaA10.sys dva0:[options]kzpsaA10.sys
$ dismount dva0:
$ set noverify
$ exit
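
Both command procedures in Example A-3 have the same shape: initialize and mount the diskette, create the directories, copy the files, and dismount. The following Python sketch generates such a procedure from a list of files. It is an illustration only; the function name and the manifest layout are hypothetical. The DCL lines it emits mirror those in Example A-3, and the file names follow Table A-2.

```python
def build_procedure(volume_label, files_by_directory, device="dva0:"):
    """Return the DCL lines that create one update diskette.

    files_by_directory maps a directory name (without brackets)
    to the list of files to copy into it.
    """
    drive = device.rstrip(":").upper()
    lines = [
        f'$ inquire ignore "Insert blank HD floppy in {drive}, then continue"',
        "$ set verify",
        "$ set proc/priv=all",
        f"$ init /density=hd/index=begin {device} {volume_label}",
        f"$ mount {device} {volume_label}",
    ]
    # Create every directory first, then copy the files into them.
    for directory in files_by_directory:
        lines.append(f"$ create /directory {device}[{directory}]")
    for directory, files in files_by_directory.items():
        for name in files:
            lines.append(f"$ copy {name} {device}[{directory}]{name}")
    lines += [f"$ dismount {device}", "$ set noverify", "$ exit"]
    return lines

console_manifest = {
    "as4x00": [
        "as4x00fw.sys", "as4x00cp.sys", "rhreadme.sys",
        "as4x00fw.txt", "as4x00cp.txt", "rhsrmrom.sys", "rharcrom.sys",
    ],
}
print("\n".join(build_procedure("rhods2cp", console_manifest)))
```

Generating the procedure from a manifest keeps the source and destination file names in step, which avoids a mistyped name in one half of a copy command.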
A.5.3 Updating Firmware from the Internal Floppy Disk — Performing the Update

Insert an update diskette (see Section A.5.2) into the internal floppy drive. Start LFU and select dva0 as the load device.

Example A-4 Updating Firmware from the Internal Floppy Disk

***** Loadable Firmware Update Utility *****

Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0

Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00IO, (AS4X00CP)]: AS4X00IO

Copying AS4X00IO from DVA0.
Copying RHREADME from DVA0.
Copying CIPCA214 from DVA0.
Copying DFPAA252 from DVA0 ...
Copying KZPSA11 from DVA0 ...

. (The function table displays, followed by the UPD> prompt, as shown in Example A-2.)

UPD> list

<table>
<thead>
<tr>
<th>Device</th>
<th>Current Revision</th>
<th>Filename</th>
<th>Update Revision</th>
</tr>
</thead>
<tbody>
<tr>
<td>AlphaBIOS</td>
<td>V5.12-3</td>
<td>arcrom</td>
<td>Missing file</td>
</tr>
<tr>
<td>pfi0</td>
<td>2.46</td>
<td>dfpaa_fw</td>
<td>2.52</td>
</tr>
<tr>
<td>srmflash</td>
<td>T3.2-21</td>
<td>srmrom</td>
<td>Missing file</td>
</tr>
<tr>
<td></td>
<td></td>
<td>cipca_fw</td>
<td>A214</td>
</tr>
<tr>
<td></td>
<td></td>
<td>kzpsa_fw</td>
<td>A11</td>
</tr>
</tbody>
</table>

1. Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, the internal floppy disk is selected.

2. Select the file that has the firmware update, or press Enter to select the default file. When the internal floppy disk is the load device, the file options are:

   - AS4X00CP (default) SRM console and AlphaBIOS console firmware only
   - AS4X00IO   I/O adapter firmware only

   The default option in Example A-2 (AS4X00FW) is not available, since the file is too large to fit on a 1.44 MB diskette. This means that when a floppy disk is the load device, you can update either console firmware or I/O adapter firmware, but not both in the same LFU session. If you need to update both, after finishing the first update, restart LFU with the **lfu** command and insert the floppy disk with the other file.

   In this example the file for I/O adapter firmware is selected.

3. Use the LFU **list** command to determine the revision of firmware in a device and the most recent revision of that firmware available in the selected file. In this example, the update revision for console firmware displays as “Missing file” because only the I/O firmware files are available on the floppy disk.
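
The file choices described in step 2 can be sketched as a small planning function. The Python below is an illustration only: the names `FIRMWARE_FILES` and `plan_sessions` are hypothetical. It returns the firmware files list to select in each LFU session, given the load device and the kinds of firmware to update. It does not model the alternative of naming AS4X00FW on a floppy and swapping diskettes when prompted.

```python
# What each firmware files list covers, as described in this appendix.
FIRMWARE_FILES = {
    "AS4X00FW": {"console", "io"},
    "AS4X00CP": {"console"},
    "AS4X00IO": {"io"},
}

def plan_sessions(load_device, targets):
    """Return one files list name per LFU session needed."""
    wanted = set(targets)
    candidates = dict(FIRMWARE_FILES)
    if load_device == "dva0":
        # The combined file does not fit on a single diskette.
        del candidates["AS4X00FW"]
    # Prefer a single file that covers exactly what is wanted.
    for name, covers in candidates.items():
        if covers == wanted:
            return [name]
    # Otherwise use one single-purpose file (one session) per target.
    return [name for name, covers in candidates.items()
            if len(covers) == 1 and covers <= wanted]

print(plan_sessions("cda0", ["console", "io"]))
print(plan_sessions("dva0", ["console", "io"]))
```

From the CD-ROM or the network, one session with AS4X00FW covers both kinds of firmware. From the floppy, the same request needs two sessions, one with AS4X00CP and one with AS4X00IO.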

Example A-4  Updating Firmware from the Internal Floppy Disk (Continued)

UPD> update pfi0  
WARNING: updates may take several minutes to complete for each device.

Confirm update on: pfi0  [Y/(N)] y

DO NOT ABORT!
pfi0 Updating to 2.52... Verifying to 2.52... PASSED.

UPD> lfu

***** Loadable Firmware Update Utility *****

Select firmware load device (cda0, dva0, ewa0), or Press <return> to bypass loading and proceed to LFU: dva0

Please enter the name of the options firmware files list, or Press <return> to use the default filename [AS4X00IO,(AS4X00CP)]:

. (The function table displays, followed by the UPD> prompt.
. Console firmware can now be updated.)

UPD> exit
The **update** command updates the device specified or all devices.

For each device, you are asked to confirm that you want to update the firmware. The default is no. Once the update begins, do not abort the operation. Doing so will corrupt the firmware on the module.

The **lfu** command restarts the utility so that console firmware can be updated. (Another method is shown in Example A-5, where the user specifies the file AS4X00FW and is prompted to insert the second diskette.)

The default update file, AS4X00CP, is selected. The console firmware can now be updated, using the same procedure as for the I/O firmware.

The **exit** command returns you to the console from which you entered LFU (either SRM or AlphaBIOS).

**Example A-5 Selecting AS4X00FW to Update Firmware from the Internal Floppy Disk**

```
P00>>> lfu

***** Loadable Firmware Update Utility *****

Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0

Please enter the name of the firmware files list, or
Press <return> to use the default filename [AS4X00IO,(AS4X00CP)]: as4x00fw

Copying AS4X00FW from DVA0 .
Copying RHREADME from DVA0 .
Copying RHSRMROM from DVA0 ..................
Copying RHARCROM from DVA0 ...............
Copying CIPCA214 from DVA0
Please insert next floppy containing the firmware,
Press <return> when ready. Or type DONE to abort.
Copying CIPCA214 from DVA0 .
Copying DFPAA246 from DVA0 ...
Copying KZPSA10 from DVA0 ...

```

A.5.4 Updating Firmware from a Network Device

Copy files to the local MOP server’s MOP load area, start LFU, and select ewa0 as the load device.

Example A-6 Updating Firmware from a Network Device

***** Loadable Firmware Update Utility *****

Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: ewa0

Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00FW]:

Copying AS4X00FW from EWA0 .
Copying RHREADME from EWA0 .
Copying RHSRMROM from EWA0 ......................
Copying RHARCROM from EWA0 ............
Copying CIPCA214 from EWA0 .
Copying DFPAA246 from EWA0 ...
Copying KZPSA11 from EWA0 ...

[The function table displays, followed by the UPD> prompt, as shown in Example A-2.]

UPD> list

<table>
<thead>
<tr>
<th>Device</th>
<th>Current Revision</th>
<th>Filename</th>
<th>Update Revision</th>
</tr>
</thead>
<tbody>
<tr>
<td>AlphaBIOS</td>
<td>V5.12-2</td>
<td>arcrom</td>
<td>V6.40-1</td>
</tr>
<tr>
<td>kzpsa0</td>
<td>A10</td>
<td>kzpsa_fw</td>
<td>A11</td>
</tr>
<tr>
<td>kzpsa1</td>
<td>A10</td>
<td>kzpsa_fw</td>
<td>A11</td>
</tr>
<tr>
<td>srmflash</td>
<td>V1.0-9</td>
<td>srmrom</td>
<td>V2.0-3</td>
</tr>
<tr>
<td></td>
<td></td>
<td>cipca_fw</td>
<td>A214</td>
</tr>
<tr>
<td></td>
<td></td>
<td>dfpaa_fw</td>
<td>2.46</td>
</tr>
</tbody>
</table>

Before starting LFU, download the update files from the Internet (see Preface). You will need the files with the extension .SYS. Copy these files to your local MOP server’s MOP load area.

1. Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, a network device is selected.

2. Select the file that has the firmware update, or press Enter to select the default file. The file options are:
   - AS4X00FW (default)  SRM console, AlphaBIOS console, and I/O adapter firmware
   - AS4X00CP  SRM console and AlphaBIOS console firmware only
   - AS4X00IO  I/O adapter firmware only
   In this example, the default file, which has both console firmware (AlphaBIOS and SRM) and I/O adapter firmware, is selected.

3. Use the LFU list command to determine the revision of firmware in a device and the most recent revision of that firmware available in the selected file. In this example, the resident firmware for each console (SRM and AlphaBIOS) and I/O adapter is at an earlier revision than the firmware in the update file.
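
The comparison described in step 3 can also be made offline from a captured list display. The following Python sketch is illustrative only and is not part of LFU. The row layout and the function name devices_needing_update are assumptions made for this example; the revisions are copied from Example A-6.

```python
# Illustrative only: rows are modeled on the list display in Example A-6.
LIST_ROWS = [
    # (device, current revision, filename, update revision)
    ("AlphaBIOS", "V5.12-2", "arcrom", "V6.40-1"),
    ("kzpsa0", "A10", "kzpsa_fw", "A11"),
    ("kzpsa1", "A10", "kzpsa_fw", "A11"),
    ("srmflash", "V1.0-9", "srmrom", "V2.0-3"),
    ("", "", "cipca_fw", "A214"),   # no device is listed for this image
    ("", "", "dfpaa_fw", "2.46"),   # no device is listed for this image
]


def devices_needing_update(rows):
    """Return devices whose resident revision differs from the update revision."""
    return [dev for dev, current, _file, update in rows
            if dev and current != update]


print(devices_needing_update(LIST_ROWS))
```

The sketch only tests whether the two revision strings differ. It does not try to decide which revision is newer, because revision formats vary by device.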

Example A-6 Updating Firmware from a Network Device (Continued)

UPD> update * -all

WARNING: updates may take several minutes to complete for each device.

DO NOT ABORT!
AlphaBIOS Upgrading to V6.40-1... Verifying V6.40-1... PASSED.
DO NOT ABORT!
kzpsa0 Upgrading to A11 ... Verifying A11... PASSED.
DO NOT ABORT!
kzpsa1 Upgrading to A11 ... Verifying A11... PASSED.
DO NOT ABORT!
srmflash Upgrading to V2.0-3... Verifying V2.0-3... PASSED.

UPD> exit
The **update** command updates the device specified or all devices. In this example, the wildcard indicates that all devices supported by the selected update file will be updated. Typically, LFU requests confirmation before updating each console’s or device’s firmware. The **-all** option removes the update confirmation requests.

The **exit** command returns you to the console from which you entered LFU (either SRM or AlphaBIOS).
A.5.5 LFU Commands

The commands summarized in Table A-3 are used to update system firmware.

Table A-3 LFU Command Summary

<table>
<thead>
<tr>
<th>Command</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>display</td>
<td>Shows the system physical configuration.</td>
</tr>
<tr>
<td>exit</td>
<td>Terminates the LFU program.</td>
</tr>
<tr>
<td>help</td>
<td>Displays the LFU command list.</td>
</tr>
<tr>
<td>lfu</td>
<td>Restarts the LFU program.</td>
</tr>
<tr>
<td>list</td>
<td>Displays the inventory of update firmware on the selected device.</td>
</tr>
<tr>
<td>readme</td>
<td>Lists release notes for the LFU program.</td>
</tr>
<tr>
<td>update</td>
<td>Writes new firmware to the module.</td>
</tr>
<tr>
<td>verify</td>
<td>Reads the firmware from the module into memory and compares it with the update firmware.</td>
</tr>
</tbody>
</table>

These commands are described in the following pages.
display

The display command shows the system physical configuration. Display is equivalent to issuing the SRM console command show configuration. Because it shows the slot for each module, display can help you identify the location of a device.

exit

The exit command terminates the LFU program, causes system initialization and testing, and returns the system to the console from which LFU was called.

help

The help (or ?) command displays the LFU command list, shown below.

<table>
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Display</td>
<td>Displays the system's configuration table.</td>
</tr>
<tr>
<td>Exit</td>
<td>Done exit LFU (reset).</td>
</tr>
<tr>
<td>List</td>
<td>Lists the device, revision, firmware name, and update revision.</td>
</tr>
<tr>
<td>Lfu</td>
<td>Restarts LFU.</td>
</tr>
<tr>
<td>Readme</td>
<td>Lists important release information.</td>
</tr>
<tr>
<td>Update</td>
<td>Replaces current firmware with loadable data image.</td>
</tr>
<tr>
<td>Verify</td>
<td>Compares loadable and hardware images.</td>
</tr>
<tr>
<td>? or Help</td>
<td>Scrolls this function table.</td>
</tr>
</tbody>
</table>

lfu

The lfu command restarts the LFU program. This command is used when the update files are on a floppy disk. The files for updating both console firmware and I/O firmware are too large to fit on a 1.44 MB disk, so only one type of firmware can be updated at a time. Restarting LFU enables you to specify another update file.
list
The list command displays the inventory of update firmware on the CD-ROM, network, or floppy. Only the devices listed at your terminal are supported for firmware updates.

The list command shows three pieces of information for each device:

- Current Revision — The revision of the device’s current firmware
- Filename — The name of the file used to update that firmware
- Update revision — The revision of the firmware update image

readme
The readme command lists release notes for the LFU program.

update
The update command writes new firmware to the module. Then LFU automatically verifies the update by reading the new firmware image from the module into memory and comparing it with the source image.

To update more than one device, you may use a wildcard but not a list. For example, update k* updates all devices with names beginning with k, and update * updates all devices. When you do not specify a device name, LFU tries to update all devices; it lists the selected devices to update and prompts before devices are updated. (The default is no.) The -all option removes the update confirmation requests, enabling the update to proceed without operator intervention.

CAUTION: Never abort an update operation. Aborting corrupts the firmware on the module.
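
The wildcard selection described above for the update command can be pictured with the following Python sketch. It assumes that the wildcard behaves like a shell-style pattern; the manual documents only the forms k* and *. The function name select_devices is hypothetical, and the device names are taken from Example A-6.

```python
from fnmatch import fnmatchcase

# Device names taken from the list display in Example A-6.
DEVICES = ["AlphaBIOS", "kzpsa0", "kzpsa1", "srmflash"]


def select_devices(pattern, devices=DEVICES):
    """Return the devices whose names match a shell-style wildcard pattern."""
    return [name for name in devices if fnmatchcase(name, pattern)]


print(select_devices("k*"))   # names beginning with k
print(select_devices("*"))    # every device
```

A comma-separated list is deliberately not handled, matching the rule that a wildcard may be used but a list may not.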

verify
The verify command reads the firmware from the module into memory and compares it with the update firmware. If a module already verified successfully when you updated it, but later failed tests, you can use verify to tell whether the firmware has become corrupted.
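
The read-back-and-compare idea behind verify can be sketched in Python as follows. This is a minimal illustration, not the LFU implementation; the function name verify_image and the byte-for-byte comparison are assumptions.

```python
def verify_image(module_image: bytes, update_image: bytes):
    """Compare two firmware images byte for byte.

    Returns (True, None) when they are identical, otherwise
    (False, offset) where offset is the first position that differs.
    """
    for offset, (a, b) in enumerate(zip(module_image, update_image)):
        if a != b:
            return False, offset
    if len(module_image) != len(update_image):
        # One image is a strict prefix of the other.
        return False, min(len(module_image), len(update_image))
    return True, None
```

A result of (False, offset) corresponds to the case in which firmware that once verified successfully has since become corrupted.
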
A.6 Updating Firmware from AlphaBIOS

Insert the CD-ROM or diskette with the updated firmware and select Upgrade AlphaBIOS from the main AlphaBIOS Setup screen. Use the Loadable Firmware Update (LFU) utility to perform the update. The LFU exit command causes a system reset.

Figure A-3 AlphaBIOS Setup Screen

Press ENTER to upgrade your AlphaBIOS from floppy or CD-ROM.
Upgrading AlphaBIOS

As new versions of Windows NT are released, it might be necessary to upgrade AlphaBIOS to the latest version. Additionally, as improvements are made to AlphaBIOS, it might be desirable to upgrade to take advantage of new AlphaBIOS features.

Use this procedure to upgrade from an earlier version of AlphaBIOS:

1. Insert the diskette or CD-ROM containing the AlphaBIOS upgrade.

2. If you are not already running AlphaBIOS Setup, start it by restarting your system and pressing F2 when the Boot screen is displayed.

3. In the main AlphaBIOS Setup screen, select Upgrade AlphaBIOS and press Enter.

   The system is reset and the Loadable Firmware Update (LFU) utility is started. See Section A.5.5 for LFU commands.

4. When the upgrade is complete, issue the LFU exit command. The system is reset and you are returned to AlphaBIOS.

   If you press the Reset button instead of issuing the LFU exit command, the system is reset and you are returned to LFU.
This appendix provides a summary of the SRM console commands and environment variables. The test command is described in Chapter 3 of this document. For complete reference information on the other SRM commands and environment variables, see the AlphaServer 4000/4100 System Drawer User’s Guide.

NOTE: It is recommended that you keep a list of the environment variable settings for systems that you service, because you will need to restore certain environment variable settings after swapping modules. Refer to Table B-3 for a convenient worksheet.
B.1 Summary of SRM Console Commands

The SRM console commands are used to examine or modify the system state.

Table B-1 Summary of SRM Console Commands

<table>
<thead>
<tr>
<th>Command</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>alphabios</td>
<td>Loads and starts the AlphaBIOS console.</td>
</tr>
<tr>
<td>boot</td>
<td>Loads and starts the operating system.</td>
</tr>
<tr>
<td>clear envar</td>
<td>Resets an environment variable to its default value.</td>
</tr>
<tr>
<td>continue</td>
<td>Resumes program execution.</td>
</tr>
<tr>
<td>crash</td>
<td>Forces a crash dump at the operating system level.</td>
</tr>
<tr>
<td>deposit</td>
<td>Writes data to the specified address.</td>
</tr>
<tr>
<td>edit</td>
<td>Invokes the console line editor on a RAM file or on the nvram file (power-up script).</td>
</tr>
<tr>
<td>examine</td>
<td>Displays the contents of a memory location, register, or device.</td>
</tr>
<tr>
<td>halt</td>
<td>Halts the specified processor. (Same as stop.)</td>
</tr>
<tr>
<td>help</td>
<td>Displays information about the specified console command.</td>
</tr>
<tr>
<td>info num</td>
<td>Displays various types of information about the system:</td>
</tr>
<tr>
<td></td>
<td>Info shows a list describing the num qualifier.</td>
</tr>
<tr>
<td></td>
<td>Info 3 reads the impure area that contains the state of the CPU before it entered PAL mode.</td>
</tr>
<tr>
<td></td>
<td>Info 5 reads the PAL-built logout area that contains the data used by the operating system to create the error entry.</td>
</tr>
<tr>
<td></td>
<td>Info 8 reads the IOD and IOD1 registers.</td>
</tr>
<tr>
<td>initialize</td>
<td>Resets the system.</td>
</tr>
<tr>
<td>lfu</td>
<td>Runs the Loadable Firmware Update Utility.</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Command</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>man</td>
<td>Displays information about the specified console command.</td>
</tr>
<tr>
<td>more</td>
<td>Displays a file one screen at a time.</td>
</tr>
<tr>
<td>prcache</td>
<td>Initializes and displays status of the PCI NVRAM.</td>
</tr>
<tr>
<td>set envar</td>
<td>Sets or modifies the value of an environment variable.</td>
</tr>
<tr>
<td>set host</td>
<td>Connects to an MSCP DUP server on a DSSI device.</td>
</tr>
<tr>
<td>set rcm_dialout</td>
<td>Sets a modem dialout string.</td>
</tr>
<tr>
<td>show envar</td>
<td>Displays the state of the specified environment variable.</td>
</tr>
<tr>
<td>show config</td>
<td>Displays the configuration at the last system initialization.</td>
</tr>
<tr>
<td>show cpu</td>
<td>Displays the state of each processor in the system.</td>
</tr>
<tr>
<td>show device</td>
<td>Displays a list of controllers and their devices in the system.</td>
</tr>
<tr>
<td>show fru</td>
<td>Displays the serial number and revision level of system bus options.</td>
</tr>
<tr>
<td>show memory</td>
<td>Displays memory module information.</td>
</tr>
<tr>
<td>show network</td>
<td>Displays the state of network devices in the system.</td>
</tr>
<tr>
<td>show pal</td>
<td>Displays the version of the privileged architecture library code (PALcode).</td>
</tr>
<tr>
<td>show power</td>
<td>Displays information about the power supplies, system fans, CPU fans, and temperature.</td>
</tr>
<tr>
<td>show rcm_dialout</td>
<td>Displays the modem dialout string.</td>
</tr>
<tr>
<td>show version</td>
<td>Displays the version of the console program.</td>
</tr>
<tr>
<td>start</td>
<td>Starts a program that was previously loaded on the processor specified.</td>
</tr>
<tr>
<td>stop</td>
<td>Halts the specified processor. (Same as halt.)</td>
</tr>
<tr>
<td>test</td>
<td>Runs firmware diagnostics for the system.</td>
</tr>
</tbody>
</table>
B.2 Summary of SRM Environment Variables

Environment variables pass configuration information between the console and the operating system. Their settings determine how the system powers up, boots the operating system, and operates. Environment variables are set or changed with the set envar command and returned to their default values with the clear envar command. Their values are viewed with the show envar command. The SRM environment variables are specific to the SRM console.

<table>
<thead>
<tr>
<th>Environment Variable</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>auto_action</td>
<td>Specifies the console’s action at power-up, a failure, or a reset.</td>
</tr>
<tr>
<td>bootdef_dev</td>
<td>Specifies the default boot device string.</td>
</tr>
<tr>
<td>boot_osflags</td>
<td>Specifies the default operating system boot flags.</td>
</tr>
<tr>
<td>com2_baud</td>
<td>Changes the default baud rate of the COM2 serial port.</td>
</tr>
<tr>
<td>console</td>
<td>Specifies the device on which power-up output is displayed (serial terminal or graphics monitor).</td>
</tr>
<tr>
<td>cpu_enabled</td>
<td>Enables or disables a specific secondary CPU.</td>
</tr>
<tr>
<td>ew*0_mode</td>
<td>Specifies the connection type of the default Ethernet controller.</td>
</tr>
<tr>
<td>ew*0_protocols</td>
<td>Specifies network protocols for booting over the Ethernet controller.</td>
</tr>
<tr>
<td>kbd_hardware_type</td>
<td>Specifies the default console keyboard type.</td>
</tr>
<tr>
<td>kzpsa*_host_id</td>
<td>Specifies the default value for the KZPSA host SCSI bus node ID.</td>
</tr>
<tr>
<td>language</td>
<td>Specifies the console keyboard layout.</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Environment Variable</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>memory_test</td>
<td>Specifies the extent to which memory will be tested. For DIGITAL UNIX systems only.</td>
</tr>
<tr>
<td>ocp_text</td>
<td>Overrides the default OCP display text with specified text.</td>
</tr>
<tr>
<td>os_type</td>
<td>Specifies the operating system and sets the appropriate console interface.</td>
</tr>
<tr>
<td>pci_parity</td>
<td>Disables or enables parity checking on the PCI bus.</td>
</tr>
<tr>
<td>pk*0_fast</td>
<td>Enables fast SCSI mode.</td>
</tr>
<tr>
<td>pk*0_host_id</td>
<td>Specifies the default value for a controller host bus node ID.</td>
</tr>
<tr>
<td>pk*0_soft_term</td>
<td>Enables or disables SCSI terminators on systems that use the QLogic ISP1020 SCSI controller.</td>
</tr>
<tr>
<td>sys_model_num</td>
<td>Displays the system model number and computes certain information passed to the operating system. Must be restored after a PCI motherboard is replaced.</td>
</tr>
<tr>
<td>sys_serial_num</td>
<td>Restores the system serial number. Must be set if the system motherboard is replaced.</td>
</tr>
<tr>
<td>sys_type</td>
<td>Displays the system type and computes certain information passed to the operating system. Must be restored after a PCI motherboard is replaced.</td>
</tr>
<tr>
<td>tga_sync_green</td>
<td>Specifies the location of the SYNC signal generated by the DIGITAL ZLXp-E PCI graphics accelerator option.</td>
</tr>
<tr>
<td>tt_allow_login</td>
<td>Enables or disables login to the SRM console firmware on other console ports.</td>
</tr>
</tbody>
</table>
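
The relationship between set, show, and clear described at the start of this section can be pictured with a small Python model. It is a sketch only: the class name EnvStore is hypothetical, and the default values shown are placeholders chosen for the example, not the console's actual defaults.

```python
class EnvStore:
    """Toy model of set, show, and clear for environment variables."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._values = dict(defaults)

    def set(self, name, value):
        self._values[name] = value

    def show(self, name):
        return self._values[name]

    def clear(self, name):
        # clear returns the variable to its default value
        self._values[name] = self._defaults[name]


# The default values below are placeholders chosen for the example.
env = EnvStore({"auto_action": "halt", "pci_parity": "on"})
env.set("auto_action", "boot")
print(env.show("auto_action"))
env.clear("auto_action")
print(env.show("auto_action"))
```
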
B.3 Recording Environment Variables

You can make copies of the table below to record environment variable settings for specific systems. Write the system name in the column provided. Enter the show * command to list the system settings.

Table B-3   Environment Variables Worksheet

<table>
<thead>
<tr>
<th>Environment Variable</th>
<th>System Name</th>
<th>System Name</th>
<th>System Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>auto_action</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>bootdef_dev</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>boot_osflags</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>com2_baud</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>console</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>cpu_enabled</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>ew*0_mode</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>ew*0_protocols</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>kbd_hardware_type</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>kzpsa*_host_id</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>language</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>memory_test</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>ocp_text</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>os_type</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>pci_parity</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>pk*0_fast</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>pk*0_host_id</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>pk*0_soft_term</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
Table B-3  Environment Variables Worksheet (Continued)

<table>
<thead>
<tr>
<th>Environment Variable</th>
<th>System Name</th>
<th>System Name</th>
<th>System Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>sys_model_num</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>sys_serial_num</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>sys_type</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>tga_sync_green</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>tt_allow_login</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
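
If the worksheet values are also kept in electronic form, a comparison after a module swap can show which settings need to be restored. The Python sketch below is one way to do this. The function name settings_to_restore and all of the sample values are hypothetical.

```python
def settings_to_restore(recorded, current):
    """Return {name: recorded_value} for variables whose value has changed."""
    return {name: value for name, value in recorded.items()
            if current.get(name) != value}


# Hypothetical worksheet entries and post-replacement values.
recorded = {"bootdef_dev": "dka0", "os_type": "unix", "sys_serial_num": "AB12345"}
current = {"bootdef_dev": "dka0", "os_type": "unix", "sys_serial_num": ""}

print(settings_to_restore(recorded, current))
```

Each name returned is a candidate for a set envar command using the recorded value.
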
Appendix C

Operating the System Remotely

This appendix describes how to use the remote console monitor (RCM) to monitor and control the system remotely.

C.1 RCM Console Overview

The remote console monitor (RCM) is used to monitor and control the system remotely. The RCM resides on the server control module and allows the system administrator to connect remotely to a managed system through a modem, using a serial terminal or terminal emulator.

The RCM has special console firmware that is used to remotely control an AlphaServer system. The RCM firmware resides on an independent microprocessor. It is not part of the SRM console that resides in the flash ROM. The RCM firmware has its own command interface that allows the user to perform the tasks that can usually be done from the system’s serial console terminal. RCM console commands are used to reset, halt, and power the system on or off, regardless of the operating system or hardware state. The RCM console commands are also used to monitor the power supplies, temperature, and fans.

The user can enter the RCM console either remotely or through the local serial console terminal. Once in command mode, the user can enter commands to control and monitor the system.

- To enter the RCM console remotely, the user dials in through a modem, enters a password, and then types a special escape sequence that invokes RCM command mode. You must set up the modem before you can dial in remotely. See Section C.1.1.
- To enter the RCM console locally, the user types the escape sequence at the SRM console prompt on the local serial console terminal.

The RCM also provides an autonomous dial-out capability when it detects a power failure within the system. When triggered, the RCM dials a paging service at 30-minute intervals until the administrator clears the alert within the RCM.
C.1.1 Modem Usage

To use the RCM to monitor a system remotely, first make the connections to the server control module, as shown below. Then configure the modem port for dial-in.

Figure C-1  RCM Connections
Modem Selection

The RCM requires a Hayes-compatible modem. The controls that the RCM sends to the modem have been selected to be acceptable to a wide selection of modems. The modems that have been tested and qualified include:

- Motorola LifeStyle Series 28.8
- AT&T DATAPORT 14.4/FAX
- Zoom Model 360

The U.S. Robotics Sportster DATA/FAX MODEM is also supported, but requires some modification of the modem initialization and answer strings. See Section C.1.7.

Modem Configuration Procedure

1. Connect a Hayes-compatible modem to the RCM as shown in Figure C-1, and power up the modem.
2. From the local serial console terminal, enter the RCM firmware console by typing the following escape sequence:

   ^]^]rcm

   The sequence “^]” is entered by holding down the Ctrl key and pressing the ] key (right square bracket). The firmware prompt, RCM>, should now be displayed.
3. Enter a modem password with the setpass command. See Section C.1.3.14.
4. Enable the modem port with the enable command. See Section C.1.3.5.
5. Enter the quit command to leave the RCM console.
6. You are now ready to dial in remotely.
Dialing In to the RCM Modem Port

1. Dial the modem connected to the server control module. The RCM answers the call and after a few seconds prompts for a password with a “#” character.

2. Enter the password that was loaded using the `setpass` command. The user has three tries to correctly enter the password. On the third unsuccessful attempt, the connection is terminated, and as a security precaution, the modem is not answered again for 5 minutes.

On successful entry of the password, the RCM banner message “RCM V1.0” is displayed, and the user is connected to the system COM1 port. At this point the local terminal keyboard is disabled except for entering the RCM console firmware. The local terminal displays all the terminal traffic going out to the modem.

3. To connect to the RCM firmware console, type the RCM escape sequence.

Refer to Example C-1 for an example of the modem dial-in procedure.

Example C-1  Sample Remote Dial-In Dialog

```
ATQ0V1E1S0=0  When modem dial-in connection is made, a screen display
OK            similar to this appears.
ATDT30167
CONNECT 9600
#             Enter password at this prompt.
RCM V1.0      RCM banner is displayed.
^]^]rcm       Enter the escape sequence after the banner is displayed.
              The escape sequence is not echoed on the terminal.
RCM>          RCM prompt is displayed. Commands to control and
              monitor the system can be entered.
```
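
The password rule in step 2 (three tries, then a 5-minute period in which the modem is not answered) can be modeled with the following Python sketch. It is illustrative only and does not represent the RCM firmware. The function name dial_in is hypothetical, and the case in which the caller hangs up before the third try is assumed to cause no lockout, because the manual does not describe it.

```python
MAX_TRIES = 3             # attempts allowed per call
LOCKOUT_SECONDS = 5 * 60  # modem is not answered for 5 minutes after a failure


def dial_in(attempts, password):
    """Model one call. Return (connected, lockout_seconds)."""
    for guess in attempts[:MAX_TRIES]:
        if guess == password:
            return True, 0
    if len(attempts) >= MAX_TRIES:
        # Third unsuccessful attempt: drop the call and start the lockout.
        return False, LOCKOUT_SECONDS
    # Caller gave up before using all tries; no lockout is modeled here.
    return False, 0
```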

Terminating a Modem Session

Terminate the modem session by executing a `hangup` command from the RCM console firmware. This will cleanly terminate the modem connection.

If the modem connection is terminated without using the `hangup` command, or if the line is dropped due to phone line problems, the RCM will detect carrier loss and initiate an internal `hangup` command. This process can take a minute or more, and the local terminal will be locked out until the auto hangup process completes.

If the modem link is idle for more than 20 minutes, the RCM initiates an auto hangup.
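
The two automatic hangup conditions described above (carrier loss and more than 20 minutes of idle time) can be expressed as a single check. The Python sketch below is an illustration under those stated rules; the function name should_auto_hangup and its parameters are hypothetical.

```python
IDLE_LIMIT_SECONDS = 20 * 60  # idle time after which the RCM hangs up


def should_auto_hangup(last_activity, now, carrier_present=True):
    """Return True when the session should be ended automatically.

    Times are in seconds on any monotonic scale.
    """
    if not carrier_present:
        return True  # line dropped: treated like an internal hangup
    return (now - last_activity) > IDLE_LIMIT_SECONDS
```
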
C.1.2 Entering and Leaving Command Mode

Use the default escape sequence to enter RCM command mode for the first time. You can enter RCM command mode from the SRM console level, the operating system level, or an application. The RCM quit command reconnects the terminal to the system console port.

Example C-2  Entering and Leaving RCM Command Mode

^]^]rcm  \(^1\)
RCM>

RCM> quit  \(^2\)
Focus returned to COM port

Entering the RCM Firmware Console

To enter the RCM firmware console, enter the RCM escape sequence. See \(^1\) in Example C-2 for the default sequence.

The escape sequence is not echoed on the terminal or sent to the system. Once in the RCM firmware console, the user is in RCM command mode and can enter RCM console commands.
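
Because the escape sequence is neither echoed nor sent to the system, whatever watches the serial stream has to hold back characters that might be the start of the sequence. The Python sketch below shows one way such a filter could work. It is not the RCM firmware: the class name EscapeFilter is hypothetical, and case handling of the letters rcm is not modeled.

```python
class EscapeFilter:
    """Pass characters through unless they complete the escape sequence."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.pending = ""      # characters held back as a possible match
        self.command_mode = False

    def feed(self, ch):
        """Consume one character; return the text to forward to the system."""
        self.pending += ch
        out = ""
        # Release held characters until what remains is a prefix of the sequence.
        while self.pending and not self.sequence.startswith(self.pending):
            out += self.pending[0]
            self.pending = self.pending[1:]
        if self.pending == self.sequence:
            self.pending = ""
            self.command_mode = True   # sequence swallowed, nothing forwarded
        return out


ESC = "\x1d\x1drcm"   # Ctrl/] is ASCII 0x1D; this is the default ^]^]rcm
f = EscapeFilter(ESC)
forwarded = "".join(f.feed(c) for c in "ls\r" + ESC)
print(repr(forwarded), f.command_mode)
```

Ordinary typing is forwarded unchanged, and a partial match that turns out not to be the sequence is released as soon as the mismatch is seen.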

Leaving Command Mode

To leave RCM command mode and reconnect to the system console port, enter the quit command, then press Return to get a prompt from the operating system or system console. (See \(^2\) in Example C-2.)
C.1.3 RCM Commands

The RCM commands summarized below are used to control and monitor a system remotely.

Table C-1  RCM Command Summary

<table>
<thead>
<tr>
<th>Command</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>alert_clr</td>
<td>Clears alert flag, stopping dial-out alert cycle</td>
</tr>
<tr>
<td>alert_dis</td>
<td>Disables the dial-out alert function</td>
</tr>
<tr>
<td>alert_ena</td>
<td>Enables the dial-out alert function</td>
</tr>
<tr>
<td>disable</td>
<td>Disables remote access to the modem port</td>
</tr>
<tr>
<td>enable</td>
<td>Enables remote access to the modem port</td>
</tr>
<tr>
<td>hangup</td>
<td>Terminates the modem connection</td>
</tr>
<tr>
<td>halt</td>
<td>Halts server</td>
</tr>
<tr>
<td>help</td>
<td>Displays the list of commands</td>
</tr>
<tr>
<td>poweroff</td>
<td>Turns off power to server</td>
</tr>
<tr>
<td>poweron</td>
<td>Turns on power to server</td>
</tr>
<tr>
<td>quit</td>
<td>Exits console mode and returns to system console port</td>
</tr>
<tr>
<td>reset</td>
<td>Resets the server</td>
</tr>
<tr>
<td>setesc</td>
<td>Changes the escape sequence for entering command mode</td>
</tr>
<tr>
<td>setpass</td>
<td>Changes the modem access password</td>
</tr>
<tr>
<td>status</td>
<td>Displays server’s status and sensors</td>
</tr>
</tbody>
</table>
Command Conventions

- The commands are not case sensitive.
- A command must be entered in full.
- If a command is entered that is not valid, the command fails with the message:

  *** ERROR - unknown command ***

  Enter a valid command.

The RCM commands are described on the following pages.
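
The command conventions above can be summarized in a short Python sketch. It is illustrative only; the function name parse_command is hypothetical. The command names come from Table C-1, with ? included as the alternative form of help.

```python
RCM_COMMANDS = {
    "alert_clr", "alert_dis", "alert_ena", "disable", "enable", "hangup",
    "halt", "help", "?", "poweroff", "poweron", "quit", "reset", "setesc",
    "setpass", "status",
}

UNKNOWN = "*** ERROR - unknown command ***"


def parse_command(line):
    """Return the canonical command name, or the error text if it is not valid."""
    word = line.strip().lower()      # commands are not case sensitive
    if word in RCM_COMMANDS:         # abbreviations are rejected: full name only
        return word
    return UNKNOWN
```
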
C.1.3.1 alert_clr

The alert_clr command clears an alert condition within the RCM. The alert enable condition remains active, and the RCM will again enter the alert condition when it detects a system power failure.

RCM>alert_clr

C.1.3.2 alert_dis

The alert_dis command disables RCM dial-out capability. It also clears any outstanding alerts. The alert disable state is nonvolatile. Dial-out capability remains disabled until the alert_ena command is issued.

RCM>alert_dis

C.1.3.3 alert_ena

The alert_ena command enables the RCM to automatically dial out when it detects a power failure within the system. The RCM repeats the dial-out alert at 30-minute intervals until the alert is cleared. The alert enable state is nonvolatile. Dial-out capability remains enabled until the alert_dis command is issued.

RCM>alert_ena

For the alert_ena command to work, two conditions must be met:

- A modem dial-out string must be entered from the system console (see the set rcm_dialout command in Table B-1).
- Remote access to the RCM modem port must be enabled with the enable command.

If the alert_ena command is entered when remote access is disabled, the following message is returned:

*** error ***
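
The interaction of the three alert commands can be pictured as a small state model. The Python sketch below is an illustration, not the RCM firmware. The class name AlertState is hypothetical, and treating a missing dial-out string as a failure of alert_ena is an assumption; the manual gives an error message only for the case in which remote access is disabled.

```python
class AlertState:
    """Toy model of the alert_ena, alert_dis, and alert_clr commands."""

    def __init__(self, dialout_string="", remote_access=False):
        self.dialout_string = dialout_string
        self.remote_access = remote_access
        self.enabled = False
        self.pending = False

    def alert_ena(self):
        # Both preconditions from the text must hold.
        if not (self.dialout_string and self.remote_access):
            return False
        self.enabled = True
        return True

    def alert_dis(self):
        self.enabled = False
        self.pending = False      # disabling also clears outstanding alerts

    def alert_clr(self):
        self.pending = False      # enable state is left unchanged

    def power_failure(self):
        if self.enabled:
            self.pending = True   # dial-out repeats every 30 minutes while pending
```
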
C.1.3.4 disable

The disable command disables remote access to the RCM modem port.

RCM>disable

The module’s remote access default state is DISABLED. The modem enable state is nonvolatile. When the modem is disabled, it remains disabled until the enable command is issued. If a modem connection is in progress, entering the disable command terminates it.

C.1.3.5 enable

The enable command enables remote access to the RCM modem port. It can take up to 10 seconds for the enable command to be executed.

RCM>enable

The module’s remote access default state is DISABLED.

The modem enable state is nonvolatile. When the modem is enabled, it remains enabled until the disable command is issued.

The enable command can fail for two reasons:

- There is no modem access password configured.
- The modem is not connected or is not working properly.

If the enable command fails, the following message is displayed:

*** ERROR - enable failed ***

C.1.3.6 hangup

The hangup command terminates the modem session. When this command is issued, the remote user is disconnected from the server. This command can be issued from either the local or remote console.

RCM>hangup
C.1.3.7 halt

The halt command attempts to halt the managed system. It is functionally equivalent to pressing the Halt button on the system operator control panel to the “in” position and then releasing it to the “out” position. The RCM console firmware exits command mode and reconnects the user’s terminal to the server’s COM1 serial port.

RCM>halt
Focus returned to COM port

NOTE: Pressing the Halt button has no effect on systems running Windows NT.

C.1.3.8 help or ?

The help or ? command displays the RCM firmware command set.

C.1.3.9 poweroff

The poweroff command requests the RCM module to power off the system. It is functionally equivalent to turning off the system power from the operator control panel.

RCM>poweroff

If the system is already powered off, this command has no effect.

The external power to the RCM must be connected in order to power off the system from the RCM firmware console. If the external power supply is not connected, the command does not power the system down and displays the message:

*** ERROR ***
C.1.3.10 poweron

The `poweron` command requests the RCM module to power on the system. For the system power to come on, the following conditions must be met:

- AC power must be present at the power supply inputs.
- The DC On/Off button must be in the “on” position.
- All system interlocks must be set correctly.

The RCM firmware console exits command mode and reconnects the user’s terminal to the system console port.

```
RCM>poweron
Focus returned to COM port
```

*NOTE: If the system is powered off with the DC On/Off button, the system will not power up. The RCM will not override the “off” state of the DC On/Off button. If the system is already powered on, the `poweron` command has no effect.*
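
The three conditions for poweron can be checked together, as in the following Python sketch. It is a troubleshooting aid in outline form only; the function name poweron_blockers and its parameters are hypothetical.

```python
def poweron_blockers(ac_present, dc_button_on, interlocks_ok):
    """Return the list of unmet conditions; an empty list means power can come on."""
    blockers = []
    if not ac_present:
        blockers.append("AC power not present at the power supply inputs")
    if not dc_button_on:
        blockers.append("DC On/Off button is in the off position")
    if not interlocks_ok:
        blockers.append("a system interlock is not set correctly")
    return blockers
```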

C.1.3.11 quit

The `quit` command exits the user from command mode and reconnects the user’s terminal to the system console port. The following message is displayed:

```
Focus returned to COM port
```

The next display depends on what the system was doing when the RCM was invoked. For example, if the RCM was invoked from the SRM console prompt, the console prompt will be displayed when you enter a carriage return. Or, if the RCM was invoked from the operating system prompt, the operating system prompt will be displayed when you enter a carriage return.
C.1.3.12 reset

The reset command requests the RCM module to perform a hardware reset. It is functionally equivalent to pressing the Reset button on the system operator control panel.

RCM>reset
Focus returned to COM port

The following events occur when the reset command is executed:

- The system restarts and the system console firmware reinitializes.
- The console exits RCM command mode and reconnects the user’s terminal to the server’s COM1 serial port.
- The power-up messages are displayed, and then the console prompt is displayed or the operating system boot messages are displayed, depending on the state of the Halt button.

C.1.3.13 setesc

The setesc command allows the user to reset the default escape sequence for entering console mode. The escape sequence can be any character string. A typical sequence consists of 2 or more characters, to a maximum of 15 characters. The escape sequence is stored in the module’s on-board NVRAM.

NOTE: If you change the escape sequence, be sure to record the new sequence. Although the module factory defaults can be restored if the user has forgotten the escape sequence, this involves accessing the server control module and moving a jumper.

The following sample escape sequence consists of five iterations of the Ctrl key and the letter “o”.

RCM>setesc
^o^o^o^o^o
RCM>
If the escape sequence entered exceeds 15 characters, the command fails with the message:

*** ERROR ***

When changing the default escape sequence, avoid using special characters that are used by the system’s terminal emulator or applications.

Control characters are not echoed when entering the escape sequence. To verify the complete escape sequence, use the status command.

C.1.3.14 setpass

The setpass command allows the user to change the modem access password that is prompted for at the beginning of a modem session. The password is stored in the module’s on-board NVRAM.

RCM>setpass
new pass>********
RCM>

The maximum password length is 15 characters. If the password entered exceeds 15 characters, the command fails with the message:

*** ERROR ***

The minimum password length is one character, followed by a carriage return. If only a carriage return is entered, the command fails with the message:

*** ERROR - illegal password ***

If the user has forgotten the password, a new password can be entered.
C.1.3.15 status

The status command displays the current state of the server’s sensors, as well as the current escape sequence and alarm information.

RCM>status

Firmware Rev: V1.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE/DISABLE
Alerts: ENABLE/DISABLE
Alert Pending: YES/NO
Temp (C): 26.0
RCM Power Control: ON/OFF
External Power: ON
Server Power: OFF

RCM>

The status fields are explained in Table C-2.

Table C-2  RCM Status Command Fields

<table>
<thead>
<tr>
<th>Item</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Firmware Rev:</td>
<td>Revision of RCM firmware.</td>
</tr>
<tr>
<td>Escape Sequence:</td>
<td>Current escape sequence to enter RCM firmware console.</td>
</tr>
<tr>
<td>Remote Access:</td>
<td>Modem remote access state. (ENABLE/DISABLE)</td>
</tr>
<tr>
<td>Alerts:</td>
<td>Alert dial-out state. (ENABLE/DISABLE)</td>
</tr>
<tr>
<td>Alert Pending:</td>
<td>Alert condition triggered. (YES/NO)</td>
</tr>
<tr>
<td>Temp (C):</td>
<td>Current system temperature in degrees Celsius.</td>
</tr>
<tr>
<td>RCM Power Control:</td>
<td>Current state of RCM system power control. (ON/OFF)</td>
</tr>
<tr>
<td>External Power:</td>
<td>Current state of power from external power supply to server control module. (ON/OFF)</td>
</tr>
<tr>
<td>Server Power:</td>
<td>Current state of system power. (ON/OFF)</td>
</tr>
</tbody>
</table>
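
When a remote session is logged, the status display can be split into the fields of Table C-2 for checking by a script. The following is a minimal sketch in Python, not part of the RCM firmware. It assumes the display has been captured as plain text with one `Field: value` pair per line; the function name `parse_status` and the sample values are hypothetical.

```python
# Illustrative capture of an RCM status display (values are examples only).
SAMPLE = """\
Firmware Rev: V1.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE
Alerts: DISABLE
Alert Pending: NO
Temp (C): 26.0
RCM Power Control: ON
External Power: ON
Server Power: OFF
"""

def parse_status(text):
    """Split each 'Field: value' line of captured status output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so values may contain any characters.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields

status = parse_status(SAMPLE)
temperature = float(status["Temp (C)"])          # degrees Celsius
server_is_on = status["Server Power"] == "ON"
```

Lines without a colon, such as blank lines or the `RCM>` prompt, are skipped.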
C.1.4 Dial-Out Alerts

The RCM can be configured to automatically dial out through the modem (usually to a paging service) when it detects a power failure within the system. When a dial-out alert is triggered, the RCM initializes the modem for dial-out, sends the dial-out string, hangs up the modem, and reconfigures the modem for dial-in. The RCM and modem must continue to be powered, and the phone line must remain active, for the dial-out alert function to operate.

Example C-3 Configuring the Modem for Dial-Out Alerts

P00>>> set rcm_dialout "ATDTstring#;" ➊

RCM>enable
RCM>status
.
Remote Access: ENABLE ➋
.
RCM>alert_ena ➌

Example C-4 Typical RCM Dial-Out Command

P00>>> set rcm_dialout "ATXDT9,15085553333,,,,,,5085553332#;"

Use the show command to verify the RCM dial-out string:

P00>>> show rcm_dialout
rcm_dialout       ATXDT9,15085553333,,,,,,5085553332#;
**Enabling the Dial-Out Alert Function:**

1. Enter the `set rcm_dialout` command, followed by a dial-out alert string, from the SRM console (see ➊ in Example C-3).

   The string is a modem dial-out character string, not to exceed 47 characters, that is used by the RCM when dialing out through the modem. See the next topic for details on composing the modem dial-out string.

2. Enter the RCM firmware console and enter the `enable` command to enable remote access dial-in. The RCM firmware `status` command should display “Remote Access: ENABLE.” (See ➋)

3. Enter the RCM firmware `alert_ena` command to enable outgoing alerts. (See ➌)
Composing a Modem Dial-Out String

The modem dial-out string emulates a user dialing an automatic paging service. Typically, the user dials the pager phone number, waits for a tone, and then enters a series of numbers.

The RCM dial-out string (Example C-4) has the following requirements:

- The entire string following the `set rcm_dialout` command must be enclosed by quotation marks.
- The characters ATDT must be entered after the opening quotation marks. Do not mix case. Enter the characters either in all uppercase or all lowercase.
- Enter the character X if the line to be used also carries voice mail. Refer to the example that follows.
- The valid characters for the dial-out string are the characters on a phone keypad: 0–9, *, and #. In addition, a comma (,) requests that the modem pause for 2 seconds, and a semicolon (;) is required to terminate the string.

Elements of the Dial-Out String

<table>
<thead>
<tr>
<th>Element</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>AT</td>
<td>Attention</td>
</tr>
<tr>
<td>X</td>
<td>Forces the modem to dial “blindly” (not look for a dial tone). Enter this character if the dial-out line modifies its dial tone when used for services such as voice mail.</td>
</tr>
<tr>
<td>D</td>
<td>Dial</td>
</tr>
<tr>
<td>T</td>
<td>Tone (for touch-tone)</td>
</tr>
<tr>
<td>,</td>
<td>Pause for 2 seconds.</td>
</tr>
<tr>
<td>9,</td>
<td>In the example, “9” gets an outside line. Enter the number for an outside line if your system requires it.</td>
</tr>
<tr>
<td>15085553333</td>
<td>Dial the paging service.</td>
</tr>
<tr>
<td>,,,,,,</td>
<td>Pause for 12 seconds for paging service to answer.</td>
</tr>
<tr>
<td>5085553332#</td>
<td>“Message,” usually a call-back number for the paging service.</td>
</tr>
<tr>
<td>;</td>
<td>Return to console command mode. Must be entered at end of string.</td>
</tr>
</tbody>
</table>
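
The requirements above can be checked before the string is entered at the SRM console. The following Python sketch assembles a string in the layout of Example C-4 and applies the stated rules. It is an illustration only: the function name `build_dialout` and its parameters are hypothetical, and it assumes the 47-character limit counts the characters between the quotation marks.

```python
MAX_DIALOUT_LENGTH = 47      # limit stated for the rcm_dialout string
PAUSE_SECONDS = 2            # each comma pauses the modem for 2 seconds
KEYPAD = set("0123456789*#")

def build_dialout(outside_line, pager_number, message, wait_seconds, blind=False):
    """Assemble a dial-out string in the layout of Example C-4."""
    # Round the wait up to a whole number of 2-second pauses.
    commas = "," * -(-wait_seconds // PAUSE_SECONDS)
    # X requests blind dialing (no wait for a dial tone).
    prefix = "ATXDT" if blind else "ATDT"
    body = ""
    if outside_line:
        body += outside_line + ","
    body += pager_number + commas + message + "#"
    bad = set(body) - KEYPAD - {","}
    if bad:
        raise ValueError(f"invalid characters: {sorted(bad)}")
    dialout = prefix + body + ";"            # semicolon terminates the string
    if len(dialout) > MAX_DIALOUT_LENGTH:
        raise ValueError("dial-out string exceeds 47 characters")
    return dialout

example = build_dialout("9", "15085553333", "5085553332", 12, blind=True)
```

With the values from Example C-4, `example` is `ATXDT9,15085553333,,,,,,5085553332#;`, in which the six commas give the 12-second pause.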
C.1.5 Resetting the RCM to Factory Defaults

If the escape sequence has been forgotten, you can reset the controller to factory settings.

Reset Procedure

1. Power down the AlphaServer system and access the server control module, as follows:
   
   Expose the PCI bus card cage. Remove three Phillips head screws holding the cover in place and slide it off the drawer. If necessary, remove several PCI and EISA options from the bottom of the PCI card cage until you have enough space to access the server control module.

2. Unplug the external power supply to the server control module.
   
   Locate the password and option reset jumper. The jumper number, which is etched on the board, depends on the revision of the server control module.
   
   NOTE: If the RCM section of the server control module does not have an orange relay, the jumper number is J6. If the RCM section of the server control module has an orange relay, the jumper number is J7.

3. Move the jumper so that it is sitting on both pins.

4. Replace any panels or covers as necessary so you can power up the system. Press the Halt button and then power up the system to the SRM console prompt. Powering up with the password and option reset jumper in place resets the escape sequence, password, and modem enable states to the factory default.

5. When the console prompt is displayed, power down the system and move the password and option reset jumper back onto the single pin.

6. Replace any PCI or EISA modules you removed and replace the PCI bus card cage cover.

7. Power up the system to the SRM console prompt and type the default escape sequence to enter RCM command mode:

   ^] ^] RCM

8. Configure the module as desired. You must reset the password and modem enable states in order to enable remote access.
C.1.6 Troubleshooting Guide

Table C-3 lists a number of possible causes and suggested solutions for symptoms you might see.

<table>
<thead>
<tr>
<th>Symptom</th>
<th>Possible Cause</th>
<th>Suggested Solution</th>
</tr>
</thead>
<tbody>
<tr>
<td>The local terminal will not communicate with the system or the RCM.</td>
<td>System and terminal baud rate set incorrectly.</td>
<td>Set the system and terminal baud rates to 9600 baud.</td>
</tr>
<tr>
<td></td>
<td>Cables not correctly installed.</td>
<td>Review external cable installation.</td>
</tr>
<tr>
<td>RCM will not answer when the modem is called.</td>
<td>Modem cables may be incorrectly installed.</td>
<td>Check modem phone lines and connections.</td>
</tr>
<tr>
<td></td>
<td>RCM remote access is disabled.</td>
<td>Enable remote access.</td>
</tr>
<tr>
<td></td>
<td>RCM does not have a valid password set.</td>
<td>Set password and enable remote access.</td>
</tr>
<tr>
<td></td>
<td>The local terminal is currently in the RCM console firmware.</td>
<td>Issue a <strong>quit</strong> command on the local terminal.</td>
</tr>
<tr>
<td></td>
<td>On power-up, the RCM defers initializing the modem for 30 seconds to allow the modem to complete its internal diagnostics and initialization.</td>
<td>Wait 30 seconds after powering up the system and RCM before attempting to dial in.</td>
</tr>
<tr>
<td></td>
<td>Modem may have had power cycled since last being initialized or modem is not set up correctly.</td>
<td>Enter <strong>enable</strong> command from RCM console.</td>
</tr>
<tr>
<td>After the system and RCM are powered up, the COM port seems to hang and then starts working after a few seconds.</td>
<td>This delay is normal behavior.</td>
<td>Wait a few seconds for the COM port to start working.</td>
</tr>
<tr>
<td>RCM installation is complete, but system will not power up.</td>
<td>RCM Power Control: is set to DISABLE.</td>
<td>Enter RCM console and issue the <code>poweron</code> command.</td>
</tr>
<tr>
<td>New password, escape sequence, and modem enable state are forgotten when system and RCM module are powered down.</td>
<td>The password and option reset jumper is still installed. If the RCM section of the server control module does not have an orange relay, the jumper number is J6. If it does have an orange relay, the number is J7.</td>
<td>After resetting RCM to factory defaults, move the jumper so that it is sitting on only one pin.</td>
</tr>
<tr>
<td>The remote user sees a “+++” string on the screen.</td>
<td>The modem is confirming whether the modem has really lost carrier. This occurs when the modem sees an idle time, followed by a “3,” followed by a carriage return, with no subsequent traffic. If the modem is still connected, it will remain so.</td>
<td>This is normal behavior.</td>
</tr>
<tr>
<td>The message “unknown command” is displayed when the user enters a carriage return by itself.</td>
<td>The terminal or terminal emulator is including a linefeed character with the carriage return.</td>
<td>Change the terminal or terminal emulator setting so that “new line” is not selected.</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Symptom</th>
<th>Possible Cause</th>
<th>Suggested Solution</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cannot enable modem or modem will not answer.</td>
<td>The modem is not configured correctly to work with the RCM.</td>
<td>Modify the modem initialization and/or answer string.</td>
</tr>
</tbody>
</table>
C.1.7 Modem Dialog Details

This section provides further details on the dialog between the RCM and the modem and is intended to help you reprogram your modem if necessary.

Phases of Modem Operation

The RCM is programmed to expect specific responses from the modem during four phases of operation:

- Initialization
- Ring detection
- Answer
- Hang-up

The initialization and answer command strings are stored in the RCM NVRAM. The factory default strings are:

Initialization string: `AT&F0EVS0=0S12=50<cr>`

Answer string: `ATXA<cr>`

*NOTE:* All modem commands must be terminated with a `<cr>` character (0x0d hex).

**Initialization**

The RCM initializes the modem to the following configuration:

- Factory defaults (&F0)
- No Echo (E)
- Numeric response codes (V)
- No Auto Answer (S0=0)
- Guard-band = 1 second (S12=50)
- Fixed modem-to-RCM baud rate
- Connect at highest possible reliability and speed

The RCM expects to receive a “0<cr>” (OK) in response to the initialization string. If it does not, the `enable` command will fail.
This default initialization string works on a wide variety of modems. If your modem does not configure itself to these parameters, the initialization string will need to be modified. See the topic in this section entitled Modifying Initialization and Answer Strings.
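
When comparing the default string with a modem's own documentation, it can help to split the command line into its individual commands. The following Python sketch does this with a simplified pattern that covers only the command forms used in this section (a letter, an optional `&` prefix, an optional number, and an optional `=value`). The function names are hypothetical, and the conversion of S12 assumes the conventional unit of one-fiftieth of a second, which agrees with the 1-second guard band listed above.

```python
import re

# Simplified pattern: optional '&', one letter, optional digits, optional '=digits'.
TOKEN = re.compile(r"&?[A-Z]\d*(?:=\d+)?")

def split_at_commands(command):
    """Split an AT command line into its individual commands."""
    if not command.upper().startswith("AT"):
        raise ValueError("command line must start with AT")
    return TOKEN.findall(command[2:].upper())

def s_registers(command):
    """Return the S-register assignments in the command line as a dict."""
    registers = {}
    for token in split_at_commands(command):
        if token.startswith("S") and "=" in token:
            number, value = token[1:].split("=")
            registers[int(number)] = int(value)
    return registers

tokens = split_at_commands("AT&F0EVS0=0S12=50")
registers = s_registers("AT&F0EVS0=0S12=50")
guard_seconds = registers[12] / 50   # S12 counts in fiftieths of a second
```

For the default initialization string, `tokens` is `['&F0', 'E', 'V', 'S0=0', 'S12=50']` and `guard_seconds` is `1.0`.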

**Ring Detection**

The RCM expects to be informed of an in-bound call by the modem signaling the RCM with the string, “2<cr>” (RING).

**Answer**

When the RCM receives the ring message from the modem, it responds with the answer string. The “X” command modifier used in the default answer string forces the modem to report simple connect, rather than connect at xxxx. The RCM expects a simple connect message, “1<cr>” (CONNECTED). If the modem responds with anything else, the RCM forces a hang-up and initializes the modem.

The default answer string is formatted to request the modem to provide only basic status. If your modem does not provide the basic response, the answer string, and/or initialization string will need to be modified. See the topic in this section entitled Modifying Initialization and Answer Strings.

After receiving the “connect” status, the RCM waits for 6 seconds and then prompts the user for a password.

**Hangup**

When the RCM is requested to hang up the modem, it forces the modem into command mode and issues the hang-up command. To do this, the RCM pauses for at least the guard time and then sends the modem “+++”. When the modem responds with “0<cr>” (OK), the hang-up command string is sent. The modem should respond with “3<cr>” (NO CARRIER). After this interchange, the modem is reinitialized in preparation for the next dial-in session.
**RCM/Modem Interchange Overview**

Table C-4 summarizes the actions between the RCM and the modem from initialization to hangup.

**Table C-4  RCM/Modem Interchange Summary**

<table>
<thead>
<tr>
<th>Action</th>
<th>Data to Modem</th>
<th>Data from Modem</th>
</tr>
</thead>
<tbody>
<tr>
<td>Initialization command</td>
<td>AT&amp;F0EVS0=0S12=50&lt;cr&gt;</td>
<td></td>
</tr>
<tr>
<td>Initialization successful</td>
<td></td>
<td>0&lt;cr&gt;</td>
</tr>
<tr>
<td>Phone line ringing</td>
<td></td>
<td>2&lt;cr&gt;</td>
</tr>
<tr>
<td>RCM answering</td>
<td>ATXA&lt;cr&gt;</td>
<td></td>
</tr>
<tr>
<td>Modem successfully connected</td>
<td></td>
<td>1&lt;cr&gt;</td>
</tr>
<tr>
<td>Force modem into command mode</td>
<td>&lt;guard_band&gt;+++</td>
<td></td>
</tr>
<tr>
<td>Modem in command mode</td>
<td></td>
<td>0&lt;cr&gt;</td>
</tr>
<tr>
<td>Hangup</td>
<td>ATH&lt;cr&gt;</td>
<td></td>
</tr>
<tr>
<td>Successful hangup</td>
<td></td>
<td>3&lt;cr&gt;</td>
</tr>
</tbody>
</table>
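
The interchange can also be pictured as a small state machine driven by the modem's numeric response codes. The following Python sketch is a toy model of the RCM side, written from the descriptions in this section rather than from the firmware itself. The class name, the state names, and the `sent` list are hypothetical; the guard-time pause and the failure of the `enable` command on a bad initialization response are not modeled.

```python
INIT = "AT&F0EVS0=0S12=50\r"
ANSWER = "ATXA\r"
HANGUP = "ATH\r"

# Numeric response codes expected from the modem.
OK, CONNECT, RING, NO_CARRIER = "0\r", "1\r", "2\r", "3\r"

class RcmModemDialog:
    """Toy model of the RCM side of the exchange in Table C-4."""

    def __init__(self):
        self.state = "initializing"
        self.sent = [INIT]               # everything sent to the modem so far

    def _send(self, data, next_state):
        self.sent.append(data)
        self.state = next_state

    def receive(self, response):
        """Advance the dialog on one numeric response from the modem."""
        if self.state == "initializing" and response == OK:
            self.state = "idle"
        elif self.state == "idle" and response == RING:
            self._send(ANSWER, "answering")
        elif self.state == "answering":
            if response == CONNECT:
                self.state = "connected"
            else:
                # Any other reply forces a hang-up, which ends in reinitialization.
                self.request_hangup()
        elif self.state == "escaping" and response == OK:
            self._send(HANGUP, "hanging_up")
        elif self.state == "hanging_up" and response == NO_CARRIER:
            self._send(INIT, "initializing")

    def request_hangup(self):
        """Force the modem into command mode (guard-time pause not modeled)."""
        self._send("+++", "escaping")

# One complete dial-in session followed by a hang-up.
dialog = RcmModemDialog()
for response in (OK, RING, CONNECT):
    dialog.receive(response)
dialog.request_hangup()
for response in (OK, NO_CARRIER, OK):
    dialog.receive(response)
```

After this sequence the model is back in the idle state, having sent the initialization string, the answer string, “+++”, the hang-up command, and the initialization string again.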

**Modifying Initialization and Answer Strings**

The initialization and answer strings are stored in the RCM’s NVRAM. They come pre-programmed to support a wide selection of modems. If the default strings do not set the modem into the desired mode, use the following SRM set and show commands to define and examine the initialization and answer strings.

**To replace the initialization string:**

```
P00>>> set rcm_init "new_init_string"
```

**To replace the answer string:**

```
P00>>> set rcm_answer "new_answer_string"
```
**To display all the RCM user-settable strings:**

P00>>> show rcm*
rcm_answer ATXA
rcm_dialout
rcm_init AT&F0EVS0=0S12=50
P00>>>  

Initialization and Answer String Substitutions

The RCM default initialization and answer strings are as follows:

Initialization String: “AT&F0EVS0=0S12=50”
Answer String: “ATXA”

The following modem requires a modified answer string.

<table>
<thead>
<tr>
<th>Modem</th>
<th>Initialization String</th>
<th>Answer String</th>
</tr>
</thead>
<tbody>
<tr>
<td>USRobotics Sportster 28,800 Data/Fax Modem</td>
<td>RCM default</td>
<td>“ATX0&amp;B1&amp;A0A”</td>
</tr>
</tbody>
</table>
Index

? command, RCM, C-10
4000 system drawer, 1-4, 1-6
4100 system drawer, 1-2

A
Alpha 21164 microprocessor, 1-16
Alpha chip composition, 1-20
AlphaBIOS console, 1-15
  loading, 2-7
  upgrading, A-28
Architecture, system, 1-16
auto_action environment variable, SRM, 2-23
Auxiliary voltage (vaux), 4-9

B
B3002-AA CPU module, 1-21
B3002-AB CPU module, 1-21
B3002-BA CPU module, 1-21
B3004-AA CPU module, 1-21
B3004-DA CPU module, 1-21
B3020-CA memory module, 1-23, 7-3
B3030-EA memory module, 1-23, 7-3
B3030-FA memory module, 1-23, 7-3
B3030-GA memory module, 1-23
B3040-AA bridge module, 1-28, 7-3
B3040-AB bridge module, 7-3
B3050-AA PCI motherboard, 1-30, 7-3
B3051-AA PCI motherboard, 7-3
BA30A system drawer, 1-2
BA30B system drawer, 1-6
BA30C system drawer, 1-4
B-cache, 2-21, 2-23
Bridge module (B3040-AA)
  removal and replacement, 7-26
Bridge module (B3040-AB)
  removal and replacement, 7-28
Bridge module LEDs, 3-3

C
Cabinet differences, 1-9
Cabinet fan tray
  fan removal and replacement, 7-66
  fan tray fan fail detect module removal and replacement, 7-68
  power supply removal and replacement, 7-64
  removal and replacement, 7-62
Cabinet system, 1-8
  power and fan LEDs, 3-4
  power supply for remote access, 3-5
Cables and jumpers, system drawer, 7-5, 7-6
Cables, pedestal, 7-7
CAP Error Register, 6-11
CAP Error Register Data Pattern, 5-46
CAP_ERR Register, 6-11
CD-ROM
  removal and replacement, 7-60
COM1 port, 2-19
Command codes, 5-54
Command summary (SRM), B-2
Components
  housed in system drawer, 1-2, 1-4, 1-6
Console
  SRM, 2-23
Console device determination, 2-18
Console device options, 2-19
Console device, changing, 2-19
console environment variable, SRM, 2-21, 2-23
Console power-up tests, 2-16
Control panel, 1-12, 2-2
  display, 2-21
  Halt button, 1-13
  LCD potentiometer, 2-2
  messages in display, 2-3
Controls
  Halt button, 1-13
Cover interlocks, 1-3, 1-5, 1-7, 4-7
  overriding, 4-7
  removal and replacement, 7-50, 7-52
CPU and bridge module LEDs, 3-2
CPU LEDs, 3-3
CPU module, 1-20
  configuration rules, 1-21
  removal and replacement, 7-18
  variants, 1-21
CPU modules, 1-17, 7-3

D
DECevent, 5-6
  report formats, 5-10
DIAGNOSE command, 5-7
Diagnostics, test command, 3-12
disable command, RCM, C-9
display command (LFU), A-24, A-25
Double error halt, 5-57
Drives, CD-ROM and floppy, 1-12

E
ECC syndrome bits, 5-53
ECU, running, A-4
EL_ADDR Register, 6-6
EL_STAT Register, 6-2
enable command, RCM, C-9
Environment variables
  SRM console, B-4
Environment variables, SRM, 1-15
  auto_action, 2-23
  console, 2-21, 2-23
  os_type, 2-23
Error detector placement, 5-2
Error log events, 5-5
Error registers, 6-1
Event files, translating, 5-7
Events, filtering, 5-8
External Interface Address Register, 6-6
External Interface Registers
  loading and locking rules, 6-7
External Interface Status Register, 6-2

F
Fail-safe loader, 2-24
Fan removal and replacement, 7-48
Fan tray cables (cabinet), 7-4
Fan tray, cabinet system, 1-9
Fan tray, LEDs, 3-5
Fans, 7-3
Fans, top of cabinet, 3-5
Fatal errors, 5-5
FEPROM
  and XSROM test flow, 2-13
  defined, 2-5
Firmware
  RCM, C-6
  updating, A-8
  updating from AlphaBIOS, A-27
  updating from CD-ROM, A-9
  updating from floppy disk, A-14, A-16
  updating from network device, A-20
  updating, AlphaBIOS selection, A-6
  updating, SRM command, A-6
Floppy
  removal and replacement, 7-58
FRU list, 7-2
  4000 power system, 7-10
  4100 power system, 7-8
FRU part numbers, 7-3

G
Graphics monitor, VGA, 2-19

H
H7600-AA power controller, 1-9
H7600-DB power controller, 1-9
halt command, RCM, C-10
Halt
  caused by power problem, 3-6
hangup command, RCM, C-9
Hard errors, categories of, 5-4
help command (LFU), A-24, A-25
help command, RCM, C-10

I
I squared C bus, 3-10
INFO 3 command, 5-58
INFO 5 command, 5-60
INFO 8 command, 5-62
Initialization and answer strings
  modifying for modem, C-24
  substitutions, C-25
Interlock switches, 7-50, 7-52
IOD, 2-23
IOD detected failure
  PCI error, 5-32
  System bus error, 5-27
IOD error interrupts, 5-5
IOD, defined, 5-2

L
LCD, 2-2
LEDs
  troubleshooting with, 3-2
LEDs, fan and power in cabinet, 3-5
LFU
  exit command, A-25
  starting, A-6, A-8
  starting the utility, A-6
  typical update procedure, A-8
  update command, A-26
  updating firmware from CD-ROM, A-9
  updating firmware from floppy disk, A-14, A-16
  updating firmware from network device, A-20
lfu command (LFU), A-17, A-19, A-24, A-25
LFU commands
  display, A-24, A-25
  help, A-24, A-25
  readme, A-24, A-26
  summary, A-24
  verify, A-24, A-26
list command (LFU), A-11, A-17, A-21, A-24, A-26
Loadable Firmware Update utility. See LFU

M
Machine checks in PAL mode, 5-57
Maintenance bus, 3-10
Maintenance bus controller, 3-10
MC Error Information Register 0, 6-8
MC Error Information Register 1, 6-9
MC_ERR0 Register, 6-8
MC_ERR1 Register, 6-9
MCHK 620 correctable error, 5-44
MCHK 630 correctable CPU error, 5-41
MCHK 660 IOD detected failure, 5-27, 5-32
MCHK 670 CPU and IOD detected failure, 5-16
MCHK 670 CPU-detected failure, 5-11
MCHK 670 read dirty failure, 5-21
Memory addressing, 1-24
Memory errors
  corrected read data error, 5-52
  read data substitute error, 5-52
Memory modules, 1-17, 1-22, 7-3
  removal and replacement, 7-22
  variants, 1-23
Memory operation, 1-23
Memory option
  configuration rules, 1-23
Memory pairs, 1-23
Memory tests, 2-14, 2-21
Memory, broken, 5-52
Modem, C-2
  answer, C-23
  dial-in procedure, C-4
  hangup, C-23
  phases of operation, C-22
  ring detection, C-23

N
Node IDs, 5-55
NVRAM, 2-3, 2-8, 7-34

O
Operator control panel removal and replacement
  cabinet system, 7-54
  pedestal system, 7-56
os_type environment variable, SRM, 2-7, 2-23

P
Page table entry invalid error, 5-51
PALcode, 2-23
PALcode, described, 5-56
PCI Error Status Register 1, 6-14
PCI I/O subsystem, 1-30
PCI master abort, 5-51
PCI motherboard, 1-31
PCI motherboard (B3050)
  removal and replacement, 7-34
PCI motherboard (B3051)
  removal and replacement, 7-36
PCI parity error, 5-51
PCI system error, 5-51
PCI/EISA option
  removal and replacement, 7-40
PCI_ERR Register, 6-14
Pedestal system, 1-10
PIO buffer overflow error (PIO_OVFL), 5-50
Potentiometer, 2-2
Power circuit
  and cover interlocks, 4-6
  diagram, 4-6
  failures, 4-7
Power configuration rules
  cabinet system, 4-10
  pedestal system, 4-14, 4-15
  redundancy, 1-37
Power control module, 1-17, 1-34
  LED states, 3-9
  removal and replacement, 7-24
Power control module features, 4-4
Power control module LEDs, 3-8
Power cords, internal, 7-4
Power faults, 4-9
Power harness
  removal and replacement, 7-44, 7-46
Power problems
  at power-up, 3-7
Power supply, 1-36
  fault protection, 4-3
  outputs, 4-2
  removal and replacement, 7-42
  voltages, 4-3
Power system components, 7-4
poweroff command, RCM, C-10
poweron command, RCM, C-11
Power-up
  SROM and XSROM messages during, 2-19
Power-up display, 2-20
Power-up sequence, 4-8
Processor
  determining primary, 2-21
Processor correctable error, 5-5
Processor machine checks, 5-5

Q
quit command, RCM, C-11

R
RAID Standalone Configuration Utility, running, A-5
RCM, C-1
  command summary, C-6
  dial-out alerts, C-15
  entering and leaving command mode, C-5
  modem usage, C-2
  resetting to factory defaults, C-18
  troubleshooting, C-19
  typical dialout command, C-15
RCM commands
  ?, C-10
  alert_clr, C-8
  alert_dis, C-8
  alert_ena, C-8
  disable, C-9
  enable, C-9
  halt, C-10
  hangup, C-9
  help, C-10
  poweroff, C-10
  poweron, C-11
  quit, C-11
  reset, C-12
  setesc, C-12
  setpass, C-13
  status, C-14
rcm_dialout command, C-15
readme command (LFU), A-24, A-26
Redundant power, 1-37
Registers, 6-1
Remote console monitor. See RCM
Remote console monitor module, 1-32
reset command, RCM, C-12

S
Safety guidelines, 7-1
Serial number, system, 7-30, 7-32
  restoring with set sys_serial_num, 7-31, 7-33
Serial ports, 1-31
Serial terminal, 2-19
Server control module, 1-32
  removal and replacement, 7-38
Server control module power, 7-5
set sys_serial_num command, 7-31, 7-33
setesc command, RCM, C-12
setpass command, RCM, C-13
show power command (SRM), 1-37
Soft errors, categories of, 5-4
SRM commands
  show power, 1-37
SRM console, 1-15, 2-23
SROM, 2-21
  defined, 2-4
  errors, 2-11
  power-up test flow, 2-8
  tests, 2-10
Standard I/O, 1-32
status command, RCM, C-14
StorageWorks shelf removal and replacement, 7-70
sys_model_number environment variable, 7-34
sys_type environment variable, 7-34
System bus, 1-17, 1-26
System bus address parity error, 5-49
System bus ECC error, 5-47
System bus nonexistent address error, 5-48
System bus to PCI bus bridge module, 1-17, 1-28
System bus to PCI/EISA bus bridge module, 1-17
System consoles, 1-14
System correctable errors, 5-5
System drawer
  components of, 1-2, 1-4, 1-6
  FRU locations, 7-2
  fully configured, 1-17
  remote operation, C-1
System drawer exposure
  original cabinet, 7-12
  pedestal, 7-16
System drawer modules, 7-3
System machine checks, 5-5
System model number, displaying, 7-34
System motherboard, 1-18
System motherboard (4000) removal and replacement, 7-32
System motherboard (4100 & early 4000) removal and replacement, 7-30

T
Test command
  for entire system, 3-13
Test mem command, 3-15
Test pci command, 3-17
Troubleshooting
  failures at power-up, 3-7
  IOD detected errors, 5-46
  power problems, 3-6
  using error logs, 5-2

U
Updating firmware
  AlphaBIOS console, A-27
  from AlphaBIOS console, A-6
  from SRM console, A-6
Utility programs
  running from graphics monitor, A-2
  running from serial terminal, A-3

V
verify command (LFU), A-24, A-26

X
XBUS, 1-31
XSROM
  defined, 2-4
  effects, 2-15
  power-up test flow, 2-12
  tests, 2-13