This Blitz shows how to decode the EEPROM halt frame.
[TD 2262] AlphaServer 8200/8400, System Hang/Checklist - BLITZ
******************** CAUTION: FOR INTERNAL USE ONLY *********************
* *
* THIS INFORMATION IS FOR USE BY DIGITAL EQUIPMENT CORP. AND ITS *
* EMPLOYEES ONLY. PLEASE USE EXTREME CARE IF YOU MUST DISCUSS ANY *
* PART OF THIS INFORMATION WITH ANYONE WHO IS NOT A DIGITAL EMPLOYEE. *
* *
******************************************************************************
Copyright (c) Digital Equipment Corporation 1997. All rights reserved.
+---------------------------+TM
| | | | | | | |
| d | i | g | i | t | a | l | TIME DEPENDENT BLITZ
| | | | | | | |
+---------------------------+
BLITZ TITLE: AlphaServer 8200/8400, System Hang/Checklist
PRIORITY LEVEL: 1
DATE: 12 Mar 97
TD #: 2262
AUTHOR: Wayne Sylvia
DTN: 223-6325
EMAIL: Proxy::Sylvia
DEPARTMENT: Revenue Systems Engineering
=======================================================================
PRODUCT NAME(S): AlphaServer 8200/8400
PRODUCT FAMILY(IES):
Storage ___
Systems/OS _x_
Networks ___
PC/Peripherals ___
Software Apps. ___
BLITZ TYPE:
Maintenance Tip _x_
Service Action Requested ___
IF SERVICE ACTION IS REQUESTED:
Labor Support Required ___
Material Support Required ___
Estimated time to complete activity (in hours):
Will this require a change in the field's inventory: Yes ___ No _x_
Will an FCO be associated with this advisory? Yes ___ No _x_
***********************************************************************
PROBLEM:
AlphaServer 8200/8400 console version V4.8-6 incorporates a fix for
a console multiprocessor synchronization bug encountered after a
Machine Check occurs while in PAL Mode Halt; the bug surfaces as a
system hang. (This fix has also been included in the interim V4.3
console release.)
RESOLUTION/WORKAROUND:
It is strongly recommended that, wherever possible, the AlphaServer
8200/8400 console firmware be upgraded to V4.8-6 or later.
Version V4.8-6 is contained on the Firmware Update CD V3.9, order
number AI-ROUFC-BE.
ADDITIONAL INFORMATION:
In the event of an AlphaServer 8200/8400 system hang, the following
checklist is intended as an aid in collecting hardware state
information and isolating the failure to its root cause.
1. Check the status of all LEDs, including those on the system
control panel, cabinet control logic module, power subsystem(s),
TLSB modules, and all I/O buses, adapters and devices. (For the
locations and descriptions of most LEDs, refer to the AlphaServer
8200/8400 Service Manual, EK-T8030-SV.)
2. Type "Control-P" to enter console I/O mode.
If no entry into the system can be gained, it will not be
possible to acquire the minimum hardware state information
required to isolate the failure to its root cause. In this
situation, it will be necessary to:
a. Type "Control-T" (ref: step 3 below).
b. Initialize the system and check the system self-test display.
c. Perform the console "SHOW EEPROM HALT" and "SHOW EEPROM
SYMPTOM" commands on each CPU module (ref: step 4 below).
d. Test MS7CC* to verify memory functionality.
Wherever possible, upgrade the console firmware to the latest
revision; recent firmware revisions are more robust in
identifying and handling memory failures.
e. Boot the operating system.
3. Type "Control-T" to display the status of all running console
processes.
4. Perform the console "SHOW EEPROM HALT" and "SHOW EEPROM SYMPTOM"
commands on each CPU module (specify or set the CPU as
appropriate). On dual-CPU modules, the SHOW EEPROM HALT/SYMPTOM
commands display all frames logged on that CPU module and identify
each frame as logged by CPU 0 or CPU 1.
The TurboLaser console provides a non-volatile area in each
processor's EEPROM (flash) for storing halt and symptom frames.
A halt frame is built when a CPU double-error halt occurs, or when
a machine check occurs while in PAL mode halt; it consists
essentially of the machine check logout frame and the TLSB node
registers (ref: EEPROM Halt Frame Description).
Similarly, the EEPROM symptom area stores OS error log
information. The contents of each entry depend on the event
type: 620 System Correctable Error, 630 CPU Correctable Error,
660 System Machine Check, or 670 CPU Machine Check.
5. Using the console "INFO" command, extract at least the following
information. (The console INFO command lists the available
options.)
    1. Bitmap
    5. TLSB Registers
    7. LOGOUT Area
    16. PCIA Registers
6. Force a system crash using the console "CRASH" command.
7. Analyze the system crash dump file and the console and system
error logs.
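When several CPU modules have logged symptom entries, it can help to group them by the event types listed in step 4. The Python sketch below is a minimal illustration; the table and function names are hypothetical, and treating the codes as hexadecimal (0x620 and so on, the Alpha SCB vector offsets for these events) is an assumption this Blitz does not state.

```python
# Hypothetical lookup table for the EEPROM symptom event types named in
# step 4. ASSUMPTION: the codes are hexadecimal (Alpha SCB vector offsets).
SYMPTOM_EVENT_TYPES = {
    0x620: "System Correctable Error",
    0x630: "CPU Correctable Error",
    0x660: "System Machine Check",
    0x670: "CPU Machine Check",
}


def classify_event(event_type):
    """Return (source, severity) for a known symptom event type."""
    description = SYMPTOM_EVENT_TYPES.get(event_type)
    if description is None:
        raise ValueError(f"unknown symptom event type {event_type:#x}")
    source = "CPU" if description.startswith("CPU") else "System"
    severity = "correctable" if "Correctable" in description else "machine check"
    return source, severity


if __name__ == "__main__":
    for code in sorted(SYMPTOM_EVENT_TYPES):
        print(f"{code:X}", SYMPTOM_EVENT_TYPES[code], classify_event(code))
```

The split into source (CPU or system) and severity (correctable or machine check) simply follows the wording of the four event names.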
EEPROM Halt Frame Description:
The following layout is provided as an aid in decoding the EEPROM
Halt Frame (ref: EEPROM Halt Frame Example) and applies to console
versions prior to V4.8-6. Console versions V4.8-6 and later display
a description of each register/longword.
Offset (hex)  Longword/Quadword Description
------------  ----------------------------------------
Machine Check Logout Frame:
0 Frame Size
4 R,S,D,C
8 CPU Area Offset
C System Area Offset
10 Machine Check Reason Mask
14 Machine Check Frame Revision
18 PAL shadow Register 0
20 PAL shadow Register 1
28 PAL shadow Register 2
30 PAL shadow Register 3
38 PAL shadow Register 4
40 PAL shadow Register 5
48 PAL shadow Register 6
50 PAL shadow Register 7
58 PAL Temp Register 0
60 PAL Temp Register 1
68 PAL Temp Register 2
70 PAL Temp Register 3
78 PAL Temp Register 4
80 PAL Temp Register 5
88 PAL Temp Register 6
90 PAL Temp Register 7
98 PAL Temp Register 8
A0 PAL Temp Register 9
A8 PAL Temp Register 10
B0 PAL Temp Register 11
B8 PAL Temp Register 12
C0 PAL Temp Register 13
C8 PAL Temp Register 14
D0 PAL Temp Register 15
D8 PAL Temp Register 16
E0 PAL Temp Register 17
E8 PAL Temp Register 18
F0 PAL Temp Register 19
F8 PAL Temp Register 20
100 PAL Temp Register 21
108 PAL Temp Register 22
110 PAL Temp Register 23
118 EXC_Addr
120 EXC_Sum
128 EXC_Mask
130 PAL_Base
138 ISR
140 ICSR
148 IC_PERR_Stat
150 DC_PERR_Stat
158 VA
160 MM_Stat
168 SC_Addr
170 SC_Stat
178 BC_Tag_Addr
180 EI_Addr
188 Fil_Syn
190 EI_Stat
198 LD_Lock
1A0 rsvd | MISCR | rsvd | Whami
1A4 reserved
1A8 TLDEV
1AC TLBER
1B0 TLCNR
1B4 TLVID
1B8 TLESR0
1BC TLESR1
1C0 TLESR2
1C4 TLESR3
1C8 TLEPAERR
1CC TLMODCONFIG
1D0 TLEPMERR
1D4 TLEPDERR
1D8 TLINTRMASK
1DC TLINTRSUM
1E0 TLEP_VMG
1E4 spare
1E8 spare
1EC TL56WERR0 (KN7CE only)
1F0 TL56WERR1 (KN7CE only)
1F4 TL56WERR2 (KN7CE only)
1F8 TL56WERR3 (KN7CE only)
1FC spare
TLSB Node Registers:
200 TLDEV, TLSB Node 0
204 TLBER, TLSB Node 0
208 TLDEV, TLSB Node 1
20C TLBER, TLSB Node 1
210 TLDEV, TLSB Node 2
214 TLBER, TLSB Node 2
218 TLDEV, TLSB Node 3
21C TLBER, TLSB Node 3
220 TLDEV, TLSB Node 4
224 TLBER, TLSB Node 4
228 ICCNSE or BB+2040, TLSB Node 4
22C ICCWTR or BB+2100, TLSB Node 4
230 IDPNSE0 or BB+2A40, TLSB Node 4
234 IDPNSE1 or BB+2140, TLSB Node 4
238 IDPNSE2 or BB+2240, TLSB Node 4
23C IDPNSE3 or BB+2340, TLSB Node 4
240 TLDEV, TLSB Node 5
244 TLBER, TLSB Node 5
248 ICCNSE or BB+2040, TLSB Node 5
24C ICCWTR or BB+2100, TLSB Node 5
250 IDPNSE0 or BB+2A40, TLSB Node 5
254 IDPNSE1 or BB+2140, TLSB Node 5
258 IDPNSE2 or BB+2240, TLSB Node 5
25C IDPNSE3 or BB+2340, TLSB Node 5
260 TLDEV, TLSB Node 6
264 TLBER, TLSB Node 6
268 ICCNSE or BB+2040, TLSB Node 6
26C ICCWTR or BB+2100, TLSB Node 6
270 IDPNSE0 or BB+2A40, TLSB Node 6
274 IDPNSE1 or BB+2140, TLSB Node 6
278 IDPNSE2 or BB+2240, TLSB Node 6
27C IDPNSE3 or BB+2340, TLSB Node 6
280 TLDEV, TLSB Node 7
284 TLBER, TLSB Node 7
288 ICCNSE or BB+2040, TLSB Node 7
28C ICCWTR or BB+2100, TLSB Node 7
290 IDPNSE0 or BB+2A40, TLSB Node 7
294 IDPNSE1 or BB+2140, TLSB Node 7
298 IDPNSE2 or BB+2240, TLSB Node 7
29C IDPNSE3 or BB+2340, TLSB Node 7
2A0 TLDEV, TLSB Node 8
2A4 TLBER, TLSB Node 8
2A8 ICCNSE, TLSB Node 8
2AC ICCWTR, TLSB Node 8
2B0 IDPNSE0, TLSB Node 8
2B4 IDPNSE1, TLSB Node 8
2B8 IDPNSE2, TLSB Node 8
2BC IDPNSE3, TLSB Node 8
Timestamp:
2C0 WATCH$: DD | HH | MM | SS
2C4 WATCH$: YY | MM
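For scripted decoding, the layout above can be captured as an offset-to-name table. The Python sketch below rebuilds it from the register runs in the table (header longwords, quadword register blocks, longword system area, per-node TLSB registers, timestamp); the function name is hypothetical, and the alternate "BB+xxxx" names listed for nodes 4-7 are omitted for brevity.

```python
# Hypothetical helper: map byte offsets (hex) in the EEPROM halt frame to the
# field names in the layout above (console versions prior to V4.8-6).
# The "or BB+xxxx" alternates listed for TLSB nodes 4-7 are omitted.

def build_halt_frame_layout():
    layout = {}

    # Logout frame header: six longwords starting at offset 0.
    header = ["Frame Size", "R,S,D,C", "CPU Area Offset",
              "System Area Offset", "Machine Check Reason Mask",
              "Machine Check Frame Revision"]
    for i, name in enumerate(header):
        layout[4 * i] = name

    # PAL shadow registers 0-7 and PAL temp registers 0-23 (quadwords).
    for i in range(8):
        layout[0x18 + 8 * i] = f"PAL shadow Register {i}"
    for i in range(24):
        layout[0x58 + 8 * i] = f"PAL Temp Register {i}"

    # CPU-specific area: quadwords from 0x118 (EXC_Addr) to 0x198 (LD_Lock).
    cpu_regs = ["EXC_Addr", "EXC_Sum", "EXC_Mask", "PAL_Base", "ISR", "ICSR",
                "IC_PERR_Stat", "DC_PERR_Stat", "VA", "MM_Stat", "SC_Addr",
                "SC_Stat", "BC_Tag_Addr", "EI_Addr", "Fil_Syn", "EI_Stat",
                "LD_Lock"]
    for i, name in enumerate(cpu_regs):
        layout[0x118 + 8 * i] = name

    # System area: longwords from 0x1A0 to 0x1FC.
    sys_regs = ["rsvd | MISCR | rsvd | Whami", "reserved", "TLDEV", "TLBER",
                "TLCNR", "TLVID", "TLESR0", "TLESR1", "TLESR2", "TLESR3",
                "TLEPAERR", "TLMODCONFIG", "TLEPMERR", "TLEPDERR",
                "TLINTRMASK", "TLINTRSUM", "TLEP_VMG", "spare", "spare",
                "TL56WERR0 (KN7CE only)", "TL56WERR1 (KN7CE only)",
                "TL56WERR2 (KN7CE only)", "TL56WERR3 (KN7CE only)", "spare"]
    for i, name in enumerate(sys_regs):
        layout[0x1A0 + 4 * i] = name

    # TLSB node registers: nodes 0-3 have two longwords each,
    # nodes 4-8 have eight longwords each.
    for node in range(4):
        layout[0x200 + 8 * node] = f"TLDEV, TLSB Node {node}"
        layout[0x204 + 8 * node] = f"TLBER, TLSB Node {node}"
    io_regs = ["TLDEV", "TLBER", "ICCNSE", "ICCWTR",
               "IDPNSE0", "IDPNSE1", "IDPNSE2", "IDPNSE3"]
    for node in range(4, 9):
        base = 0x220 + 0x20 * (node - 4)
        for i, name in enumerate(io_regs):
            layout[base + 4 * i] = f"{name}, TLSB Node {node}"

    # Timestamp longwords.
    layout[0x2C0] = "WATCH$: DD | HH | MM | SS"
    layout[0x2C4] = "WATCH$: YY | MM"
    return layout


if __name__ == "__main__":
    layout = build_halt_frame_layout()
    for offset in (0x0, 0x110, 0x1A8, 0x2BC):
        print(f"{offset:>3X}  {layout[offset]}")
```

Generating the repeated runs from a base offset and stride makes it easy to confirm that the table is self-consistent, for example that PAL Temp Register 23 lands at 110 and that node 8's IDPNSE3 lands at 2BC.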
EEPROM Halt Frame Example (as displayed by the console):
CPU 0 Fatal Error Halt 1: PALcode Machine Check
00000000 00000000 00000001 0000fffa 000001a0 00000118 00000000 00000200 0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 20
00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 40
fffffc00 005d3eb0 00000000 00005200 fffffc00 00466c10 fffffccf 8113a200 60
1f1e1615 14020100 fffffc00 00466660 fffffc00 b8f0fa40 fffffc00 b8f0fa40 80
fffffc00 00466b80 fffffc00 004667e0 00000000 0001c515 fffffc00 00466980 a0
00000098 06700001 00000002 040585d9 00000000 00000000 00000055 55400000 c0
00000000 00af6000 fffffffe 8fbb7508 00000000 00000000 00000000 00000000 e0
00000000 0001c515 00000000 e9d6fa38 fffffc00 005c3eb0 fffffc00 00466bb0 100
00000000 00d00000 00000000 00018000 00000000 00000000 00000000 00000000 120
fffffffe 007c8018 00000000 00000000 00000000 00002000 00000061 60020000 140
ffffff80 e84d6fff 00000000 00000000 ffffff00 0001d24f 00000000 00014910 160
ffffff00 e76fc5cf fffffff0 01ffffff 00000000 00009000 ffffff00 0011d69f 180
00400c0c 00800303 00000010 00000200 00800000 73008014 00000000 00550000 1a0
000000fe 000001ff 00000000 00000000 00e08a84 00600800 00409090 00406060 1c0
00000000 00041313 00000498 0004f811 0003a201 00000000 00000000 04000852 1e0
00000000 00000000 00800000 73008014 00800000 73008014 00800000 73008014 200
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 220
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 240
00000000 00000000 00000000 00000000 00000000 00000000 00100000 00005000 260
00000000 00000000 00000000 00000000 00000000 00000000 00800000 00005000 280
00000007 00000006 00000006 00000006 00000008 80000000 00000000 00002000 2a0
00002c0b 19152316 2c0
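The example is easier to read once the display order is known. Judging from the sample itself, each row appears to list eight longwords from the highest address on the left to the lowest on the right, with the offset of the rightmost longword at the end of the row: the rightmost longword of the first row, 00000200, matches a 200-byte logout frame (the TLSB node registers begin at 200), and the longwords at offsets 8 and C, 00000118 and 000001A0, match the CPU Area Offset (EXC_Addr at 118) and System Area Offset (1A0) in the layout. The Python sketch below is built on that inferred ordering, not on a documented console format; the function names are hypothetical, and SAMPLE is an excerpt of rows copied from the example above.

```python
# Sketch: parse the console's EEPROM halt frame display into a map of
# byte offset -> 32-bit longword, then pull out a few fields.
# ASSUMPTION (inferred from the sample, not stated in this Blitz): each row
# lists its longwords from the highest address (left) to the lowest (right),
# and the trailing hex number is the offset of the rightmost longword.

def parse_halt_dump(text):
    """Return {offset: longword} for every data row in the display."""
    longwords = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        *words, base = fields
        # Data rows consist of 8-digit hex longwords followed by an offset.
        if any(len(w) != 8 for w in words):
            continue  # e.g. the "CPU 0 Fatal Error Halt ..." title line
        try:
            base_off = int(base, 16)
            values = [int(w, 16) for w in words]
        except ValueError:
            continue
        for i, value in enumerate(reversed(values)):
            longwords[base_off + 4 * i] = value
    return longwords


def quadword(longwords, offset):
    """Alpha is little-endian: the low longword is at the lower address."""
    return (longwords[offset + 4] << 32) | longwords[offset]


def decode_summary(longwords):
    """Decode a handful of fields using the offsets in the layout above."""
    lw_1a0 = longwords[0x1A0]  # rsvd | MISCR | rsvd | Whami
    t0, t1 = longwords[0x2C0], longwords[0x2C4]
    # TLDEV slot for each TLSB node (nodes 0-3 use 8 bytes, 4-8 use 32).
    tldev_offsets = ([0x200 + 8 * n for n in range(4)] +
                     [0x220 + 0x20 * (n - 4) for n in range(4, 9)])
    return {
        "frame_size": longwords[0x0],
        "cpu_area_offset": longwords[0x8],
        "system_area_offset": longwords[0xC],
        "exc_addr": quadword(longwords, 0x118),
        "pal_base": quadword(longwords, 0x130),
        "whami": lw_1a0 & 0xFF,
        "miscr": (lw_1a0 >> 16) & 0xFF,
        # Non-zero TLDEV values, keyed by TLSB node number.
        "tldev": {n: longwords.get(off, 0)
                  for n, off in enumerate(tldev_offsets)
                  if longwords.get(off, 0)},
        # Raw WATCH$ bytes; BCD vs binary and the year base are not stated.
        "watch_raw": {"DD": t0 >> 24, "HH": (t0 >> 16) & 0xFF,
                      "min": (t0 >> 8) & 0xFF, "SS": t0 & 0xFF,
                      "YY": (t1 >> 8) & 0xFF, "month": t1 & 0xFF},
    }


SAMPLE = """\
CPU 0 Fatal Error Halt 1: PALcode Machine Check
00000000 00000000 00000001 0000fffa 000001a0 00000118 00000000 00000200 0
00000000 0001c515 00000000 e9d6fa38 fffffc00 005c3eb0 fffffc00 00466bb0 100
00000000 00d00000 00000000 00018000 00000000 00000000 00000000 00000000 120
00400c0c 00800303 00000010 00000200 00800000 73008014 00000000 00550000 1a0
00000000 00000000 00800000 73008014 00800000 73008014 00800000 73008014 200
00000000 00000000 00000000 00000000 00000000 00000000 00100000 00005000 260
00000000 00000000 00000000 00000000 00000000 00000000 00800000 00005000 280
00000007 00000006 00000006 00000006 00000008 80000000 00000000 00002000 2a0
00002c0b 19152316 2c0
"""

if __name__ == "__main__":
    for key, value in decode_summary(parse_halt_dump(SAMPLE)).items():
        print(key, value)
```

With this excerpt, Whami decodes to 0 (matching "CPU 0" in the title line), EXC_Addr decodes to 1C515, above the PAL_Base value of 18000, and non-zero TLDEV values appear for nodes 0-2, 6, 7 and 8. The WATCH$ bytes are returned raw because the Blitz does not say whether they are BCD or binary, or what base the year uses.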
*** DIGITAL INTERNAL USE ONLY ***