T.R | Title | User | Personal Name | Date | Lines |
---|
578.1 | errorlog? | POBOXB::STEINMAN | | Mon Apr 28 1997 19:32 | 4 |
|
Can you get the binary errorlog from the system?
mo
|
578.2 | Here is the binary errorlog. | PANTER::AUBERT | | Tue Apr 29 1997 05:58 | 753 |
| You will find the binary errorlog on ruxack.geo.dec.com (ftp account),
file /pub/binary.errlog.shd52
You will find below the "dia -R -f binary.errlog.shd52" output.
Thanks for any diagnosis.
Thierry Aubert/DEC at CERN
% /usr/sbin/dia -R -f binary.errlog.shd52 | more
DECevent V2.3
******************************** ENTRY 1
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 1.
Timestamp of occurrence 25-APR-1997 18:38:26
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 300. Start-Up ASCII Message Type
SWI Minor class 9. ASCII Message
SWI Minor sub class 3. Startup
ASCII Message
Alpha boot: available memory from 0xb18000 to 0xfffe000
Digital UNIX V4.0B (Rev. 564); Wed Apr 16 15:15:14 MET DST 1997
physical memory = 256.00 megabytes.
available memory = 244.89 megabytes.
using 975 buffers containing 7.61 megabytes of memory
Master cpu at slot 0.
Firmware revision: 3.0
PALcode: Digital-UNIX/OSF version 1.21
AlphaServer 4100 5/300 0MB
pci1 at mcbus0 slot 5
psiop0 at pci1 slot 1
Loading SIOP: script c0000c00, reg 4444000, data c000cb70
scsi0 at psiop0 slot 0
rz5 at scsi0 target 5 lun 0 (LID=0) (DEC RRD45 (C) DEC 0436)
pza0 at pci1 slot 2
pza0 firmware version: DEC P01 A10
scsi1 at pza0 slot 0
tz8 at scsi1 target 0 lun 0 (LID=1) (STK SD-3 011E)
(Wide16)
pza1 at pci1 slot 3
pza1 firmware version: DEC P01 A10
scsi2 at pza1 slot 0
tz16 at scsi2 target 0 lun 0 (LID=2) (STK SD-3
011E)
(Wide16)
pza2 at pci1 slot 4
pza2 firmware version: DEC P01 A10
scsi3 at pza2 slot 0
tz24 at scsi3 target 0 lun 0 (LID=3) (STK SD-3
011E)
(Wide16)
tz26 at scsi3 target 2 lun 0 (LID=4) (QUANTUM DLT7000
101A)
(Wide16)
pza3 at pci1 slot 5
pza3 firmware version: DEC P01 A10
scsi4 at pza3 slot 0
tz32 at scsi4 target 0 lun 0 (LID=5) (STK SD-3
011E)
(Wide16)
changer at scsi4 target 1 lun 0 (LID=6) (STK 9714
1300)
tz34 at scsi4 target 2 lun 0 (LID=7) (QUANTUM DLT7000
101A)
(Wide16)
gpc0 at eisa0
pci0 at mcbus0 slot 4
eisa0 at pci0
ace0 at eisa0
ace1 at eisa0
lp0 at eisa0
fdi0 at eisa0
fd0 at fdi0 unit 0
pci2000 at pci0 slot 2
isp0 at pci2000 slot 0
isp0: QLOGIC ISP1020A
isp0: Firmware revision 2.10 (loaded by console)
scsi5 at isp0 slot 0
rz40 at scsi5 target 0 lun 0 (LID=8) (DEC RZ29B (C) DEC
0016)
(Wide16)
rz41 at scsi5 target 1 lun 0 (LID=9) (DEC RZ29B (C) DEC
0016)
(Wide16)
rz42 at scsi5 target 2 lun 0 (LID=10) (DEC RZ29B (C) DEC
0016)
(Wide16)
rz43 at scsi5 target 3 lun 0 (LID=11) (DEC RZ29B (C) DEC
0016)
(Wide16)
rz44 at scsi5 target 4 lun 0 (LID=12) (DEC RZ29B (C) DEC
0016)
(Wide16)
rz45 at scsi5 target 5 lun 0 (LID=13) (DEC RZ29B (C) DEC
0016)
(Wide16)
rz46 at scsi5 target 6 lun 0 (LID=14) (DEC RZ29B (C) DEC
0016)
(Wide16)
tu0: DECchip 21140-AA: Revision: 1.2
tu0 at pci0 slot 3
tu0: DEC Fast Ethernet Interface, hardware address:
00-00-F8-31-11-6D
tu0: console mode: selecting 10BaseT (UTP) port: half duplex
hip0: Roadrunner version 2 (20000900)
hip0 at pci0 slot 4
hip0 slot 4: PCI/HIPPI interface 0-a0-88-1-0-88
fta0 DEC DEFPA FDDI Module, Hardware Revision 1
fta0 at pci0 slot 5
fta0: DMA Available.
fta0: DEC DEFPA (PDQ) FDDI Interface, Hardware address:
08-00-2B-B4-15-75
fta0: Firmware rev: 2.46
Created FRU table configuration binary errorlog packet
kernel console: ace0
dli: configured
******************************** ENTRY 2
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 0.
Timestamp of occurrence 25-APR-1997 18:38:26
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 110. Generalized Machine State Type
SWI Minor class 3. System configuration
********** The Following Revision 4.0 FRU Table **********
********** is NOT supported at this time **********
**** FRU Table Header ***
Checksum of config pkt x7981B0585077D8F9
FRU Table length x00002C4F
FRU Table Revision x00000004
System Serial Number AY65200674
******************************** ENTRY 3
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 4.
Timestamp of occurrence 25-APR-1997 18:10:17
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 2. 660 Entry
Software Flags x0000000300000000
IOD 0 Register Subpkt Pres
IOD 1 Register Subpkt Pres
Active CPUs x0000000F
Hardware Rev x00000000
System Serial Number AY65200674
Module Serial Number
Module Type x0000
System Revision x00000000
* MCHK 660 Regs *
Flags: x00000000
PCI Mask x0000
Machine Check Reason x0202 IOD-Detected Hard Error -OR-
DTag Parity Error (If Cached CPU)
PAL SHADOW REG 0 x0000000000000000
PAL SHADOW REG 1 x0000000000000000
PAL SHADOW REG 2 x0000000000000000
PAL SHADOW REG 3 x0000000000000000
PAL SHADOW REG 4 x0000000000000000
PAL SHADOW REG 5 x0000000000000000
PAL SHADOW REG 6 x0000000000000000
PAL SHADOW REG 7 x0000000000000000
PALTEMP0 x0000000000000007
PALTEMP1 xFFFFFC00005E2EB0
PALTEMP2 xFFFFFC0000464C80
PALTEMP3 x0000000000004400
PALTEMP4 x00000000000F40C2
PALTEMP5 x0000F980000003F8
PALTEMP6 x0000000000000000
PALTEMP7 xFFFFFC00004646C0
PALTEMP8 x1F1E171515020100
PALTEMP9 xFFFFFC00004649F0
PALTEMP10 xFFFFFC000046E444
PALTEMP11 xFFFFFC0000464850
PALTEMP12 xFFFFFC0000464BF0
PALTEMP13 x0000000000006E80
PALTEMP14 x0000000000000000
PALTEMP15 x00000000000F0000
PALTEMP16 x0000020306600001
PALTEMP17 x0000000000000000
PALTEMP18 x000000011FFFE0C0
PALTEMP19 xFFFFFFFF90EAB7D0
PALTEMP20 x000000000A074000
PALTEMP21 xFFFFFC0000464C20
PALTEMP22 xFFFFFC00005E4530
PALTEMP23 x00000000014D9A38
Exception Address Reg xFFFFFC000046E444
Native-mode Instruction
Exception PC x3FFFFF000011B911
Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
PAL Base Address Reg x0000000000014000
Base Addr for PALcode:
x0000000000000005
Interrupt Summary Reg x0000000000200000
External HW Interrupt at IPL21
AST Requests 3-0:
x0000000000000000
IBOX Ctrl and Status Reg x000000C164000000
Timeout Counter Bit Clear.
IBOX Timeout Counter Enabled.
Floating Point Instr's May be
Issued.
PAL Shadow Registers Enabled.
Correctable Error Interrupts
Enabled.
ICACHE BIST (Self Test) Was
Successful.
TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg x0000000000000000
Dcache Par Err Stat Reg x0000000000000000
Virtual Address Reg xFFFFFFFFFF8000A0
Memory Mgmt Flt Sts Reg x0000000000014890
If Err, Reference Resulted in DTB
Miss
Fault Inst RA Field:
x0000000000000002
Fault Inst Opcode:
x0000000000000029
Scache Address Reg xFFFFFF000001904F
Scache Status Reg x0000000000000000
Bcache Tag Address Reg xFFFFFFFFFFFFFFFF
Last Bcache Access Resulted in a
Hit.
Value of Parity Bit for Tag
Control Status
Bits Dirty, Shared & Valid is
Set.
Value of Tag Control Dirty Bit is
Set.
Value of Tag Control Shared Bit is
Set.
Value of Tag Control Valid Bit is
Set.
Value of Parity Bit Covering Tag
Store
Address Bits is Set.
Tag Address<38:20> Is:
x000000000007FFFF
Ext Interface Address Reg xFFFFFF000802A0BF
Fill Syndrome Reg x0000000000006900
Ext Interface Status Reg xFFFFFFF004FFFFFF
Error Occurred During D-ref Fill
LD LOCK xFFFFFF00005EDEDF
** IOD SUBPACKET -> ** IOD 0 Register Subpacket
WHOAMI x0000023A Module Revision 1.
CPU = 0
Base Address of Bridge x000000F9E0000000
Dev Type & Rev Register x06008032 CAP Chip Revision:
x00000002
HORSE Module Revision:
x00000003
SADDLE Module Revision:
x00000000
SADDLE Module Type: Left
Hand
PCI-EISA Bus Bridge Present on PCI
Segment
PCI Class Code
x00000600
MC-PCI Command Register x42460FF1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:
Enabled
Bridge to PCI Transactions:
Enabled
Bridge REQUESTS 64 Bit Data
Transactions
Bridge ACCEPTS 64 Bit Data
Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check:
Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use RD/MOD/WRT for <64 Byte Block
Mem Wrt
Wrt PEND_NUM Threshold: 6.
RD_TYPE Memory Prefetch Algorithm:
Short
RL_TYPE Mem Rd Line Prefetch Type:
Medium
RM_TYPE Mem Rd Multiple Cmd Type:
Long
ARB_MODE PCI Arbitration: Round
Robin
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>
x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25>
x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled
Interrupt Request x00811011 Interrupts asserted x00011011
Hard Error
Interrupt Mask0 Register x00C51111
Interrupt Mask1 Register x00000000
MC Error Info Register 0 x00006E80
MC Bus Trans Addr<31:4>: 6E80
MC Error Info Register 1 x800E8A04 MC bus trans addr <39:32>
x00000004
MC Command is ReadMod0-Mem
CPU0 Master at Time of Error
Device ID: x00000002
MC error info valid
CAP Error Register x85000000 Error Detected but Not Logged
Non-existant memory
MC error info latched
PCI Bus Trans Error Adr x000003FE
MDPA Status Register x00000000 MDPA Status Register Data Not
Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not
Valid
MDPB Status Register x00000000 MDPB Status Register Data Not
Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not
Valid
** IOD SUBPACKET -> ** IOD 1 Register Subpacket
WHOAMI x0000023A Module Revision 1.
CPU = 0
Base Address of Bridge x000000FBE0000000
Dev Type & Rev Register x06000032 CAP Chip Revision:
x00000002
HORSE Module Revision:
x00000003
SADDLE Module Revision:
x00000000
SADDLE Module Type: Left
Hand
Internal CAP Chip Arbiter: Enabled
PCI Class Code
x00000600
MC-PCI Command Register x42460FF1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:
Enabled
Bridge to PCI Transactions:
Enabled
Bridge REQUESTS 64 Bit Data
Transactions
Bridge ACCEPTS 64 Bit Data
Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check:
Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use RD/MOD/WRT for <64 Byte Block
Mem Wrt
Wrt PEND_NUM Threshold: 6.
RD_TYPE Memory Prefetch Algorithm:
Short
RL_TYPE Mem Rd Line Prefetch Type:
Medium
RM_TYPE Mem Rd Multiple Cmd Type:
Long
ARB_MODE PCI Arbitration: Round
Robin
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>
x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25>
x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled
Interrupt Request x00800000 Interrupts asserted x00000000
Hard Error
Interrupt Mask0 Register x00C51111
Interrupt Mask1 Register x00000000
MC Error Info Register 0 x00006E80
MC Bus Trans Addr<31:4>: 6E80
MC Error Info Register 1 x800E8A04 MC bus trans addr <39:32>
x00000004
MC Command is ReadMod0-Mem
CPU0 Master at Time of Error
Device ID: x00000002
MC error info valid
CAP Error Register x85000000 Error Detected but Not Logged
Non-existant memory
MC error info latched
PCI Bus Trans Error Adr x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not
Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not
Valid
MDPB Status Register x00000000 MDPB Status Register Data Not
Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not
Valid
PALcode Revision Palcode Rev: 1.21-3
******************************** ENTRY 4
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 3.
Timestamp of occurrence 25-APR-1997 18:10:17
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 302. ASCII Panic Message Type
SWI Minor class 9. ASCII Message
SWI Minor sub class 1. Panic
ASCII Message panic (cpu 0): System
Uncorrectable
Machine Check
******************************** ENTRY 5
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 2.
Timestamp of occurrence 25-APR-1997 18:10:13
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 2. 660 Entry
Software Flags x0000000300000000
IOD 0 Register Subpkt Pres
IOD 1 Register Subpkt Pres
Active CPUs x0000000F
Hardware Rev x00000000
System Serial Number AY65200674
Module Serial Number
Module Type x0000
System Revision x00000000
* MCHK 660 Regs *
Flags: x00000000
PCI Mask x0000
Machine Check Reason x0202 IOD-Detected Hard Error -OR-
DTag Parity Error (If Cached CPU)
PAL SHADOW REG 0 x0000000000000000
PAL SHADOW REG 1 x0000000000000000
PAL SHADOW REG 2 x0000000000000000
PAL SHADOW REG 3 x0000000000000000
PAL SHADOW REG 4 x0000000000000000
PAL SHADOW REG 5 x0000000000000000
PAL SHADOW REG 6 x0000000000000000
PAL SHADOW REG 7 x0000000000000000
PALTEMP0 x00000001400ECB98
PALTEMP1 x000000011FFFE110
PALTEMP2 xFFFFFC0000464C80
PALTEMP3 x0000000000004400
PALTEMP4 x000000014024FC68
PALTEMP5 x000000014024FC68
PALTEMP6 x0000000000000000
PALTEMP7 xFFFFFC00004646C0
PALTEMP8 x1F1E171515020100
PALTEMP9 xFFFFFC00004649F0
PALTEMP10 x00000001202C3E4C
PALTEMP11 xFFFFFC0000464850
PALTEMP12 xFFFFFC0000464BF0
PALTEMP13 x0000000000006E80
PALTEMP14 x0000000000000000
PALTEMP15 x00000000000F0000
PALTEMP16 x0000020306600001
PALTEMP17 x0000000000000000
PALTEMP18 x000000011FFFE0C0
PALTEMP19 xFFFFFFFF90EABA38
PALTEMP20 x000000000A074000
PALTEMP21 xFFFFFC0000464C20
PALTEMP22 xFFFFFC00005E4530
PALTEMP23 x00000000014D9A38
Exception Address Reg x00000001202C3E4C
Native-mode Instruction
Exception PC x00000000480B0F93
Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
PAL Base Address Reg x0000000000014000
Base Addr for PALcode:
x0000000000000005
Interrupt Summary Reg x0000000000200000
External HW Interrupt at IPL21
AST Requests 3-0:
x0000000000000000
IBOX Ctrl and Status Reg x000000C164000000
Timeout Counter Bit Clear.
IBOX Timeout Counter Enabled.
Floating Point Instr's May be
Issued.
PAL Shadow Registers Enabled.
Correctable Error Interrupts
Enabled.
ICACHE BIST (Self Test) Was
Successful.
TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg x0000000000000000
Dcache Par Err Stat Reg x0000000000000000
Virtual Address Reg xFFFFFFFF90EABA08
Memory Mgmt Flt Sts Reg x0000000000016AD1
If Error, Reference Which Caused
Was Write
If Err, Reference Resulted in DTB
Miss
Fault Inst RA Field:
x000000000000000B
Fault Inst Opcode:
x000000000000002D
Scache Address Reg xFFFFFF000001902F
Scache Status Reg x0000000000000000
Bcache Tag Address Reg xFFFFFFFFFFFFFFFF
Last Bcache Access Resulted in a
Hit.
Value of Parity Bit for Tag
Control Status
Bits Dirty, Shared & Valid is
Set.
Value of Tag Control Dirty Bit is
Set.
Value of Tag Control Shared Bit is
Set.
Value of Tag Control Valid Bit is
Set.
Value of Parity Bit Covering Tag
Store
Address Bits is Set.
Tag Address<38:20> Is:
x000000000007FFFF
Ext Interface Address Reg xFFFFFF000802A0BF
Fill Syndrome Reg x0000000000006900
Ext Interface Status Reg xFFFFFFF004FFFFFF
Error Occurred During D-ref Fill
LD LOCK xFFFFFF0000200A0F
** IOD SUBPACKET -> ** IOD 0 Register Subpacket
WHOAMI x0000023A Module Revision 1.
CPU = 0
Base Address of Bridge x000000F9E0000000
Dev Type & Rev Register x06008032 CAP Chip Revision:
x00000002
HORSE Module Revision:
x00000003
SADDLE Module Revision:
x00000000
SADDLE Module Type: Left
Hand
PCI-EISA Bus Bridge Present on PCI
Segment
PCI Class Code
x00000600
MC-PCI Command Register x42460FF1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:
Enabled
Bridge to PCI Transactions:
Enabled
Bridge REQUESTS 64 Bit Data
Transactions
Bridge ACCEPTS 64 Bit Data
Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check:
Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use RD/MOD/WRT for <64 Byte Block
Mem Wrt
Wrt PEND_NUM Threshold: 6.
RD_TYPE Memory Prefetch Algorithm:
Short
RL_TYPE Mem Rd Line Prefetch Type:
Medium
RM_TYPE Mem Rd Multiple Cmd Type:
Long
ARB_MODE PCI Arbitration: Round
Robin
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>
x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25>
x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled
Interrupt Request x00800000 Interrupts asserted x00000000
Hard Error
Interrupt Mask0 Register x00C51111
Interrupt Mask1 Register x00000000
MC Error Info Register 0 x0D124500
MC Bus Trans Addr<31:4>: D124500
MC Error Info Register 1 x800FC800 MC bus trans addr <39:32>
x00000000
MC Command is Read0-Mem
CPU3 OR IOD3 Master at Time of
Error
Device ID: x00000007
MC error info valid
CAP Error Register xC0000000 Uncorrectable ECC err det by MDPB
MC error info latched
PCI Bus Trans Error Adr x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not
Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not
Valid
MDPB Status Register x00000000 MDPB Status Register Data Not
Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not
Valid
** IOD SUBPACKET -> ** IOD 1 Register Subpacket
WHOAMI x0000023A Module Revision 1.
CPU = 0
Base Address of Bridge x000000FBE0000000
Dev Type & Rev Register x06000032 CAP Chip Revision:
x00000002
HORSE Module Revision:
x00000003
SADDLE Module Revision:
x00000000
SADDLE Module Type: Left
Hand
Internal CAP Chip Arbiter: Enabled
PCI Class Code
x00000600
MC-PCI Command Register x42460FF1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:
Enabled
Bridge to PCI Transactions:
Enabled
Bridge REQUESTS 64 Bit Data
Transactions
Bridge ACCEPTS 64 Bit Data
Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check:
Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use RD/MOD/WRT for <64 Byte Block
Mem Wrt
Wrt PEND_NUM Threshold: 6.
RD_TYPE Memory Prefetch Algorithm:
Short
RL_TYPE Mem Rd Line Prefetch Type:
Medium
RM_TYPE Mem Rd Multiple Cmd Type:
Long
ARB_MODE PCI Arbitration: Round
Robin
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>
x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25>
x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled
Interrupt Request x00800000 Interrupts asserted x00000000
Hard Error
Interrupt Mask0 Register x00C51111
Interrupt Mask1 Register x00000000
MC Error Info Register 0 x0D124500
MC Bus Trans Addr<31:4>: D124500
MC Error Info Register 1 x800FC800 MC bus trans addr <39:32>
x00000000
MC Command is Read0-Mem
CPU3 OR IOD3 Master at Time of
Error
Device ID: x00000007
MC error info valid
CAP Error Register xC0000000 Uncorrectable ECC err det by MDPB
MC error info latched
PCI Bus Trans Error Adr x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not
Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not
Valid
MDPB Status Register x00000000 MDPB Status Register Data Not
Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not
Valid
PALcode Revision Palcode Rev: 1.21-3
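
For readers who want a quick overview of a long report like the one above, the entry headers can be pulled out with a short script. This is only a sketch: it assumes the text layout shown in this note (an "ENTRY n" banner followed by "label value" lines), and the function name `summarize` and the `SAMPLE` text are illustrative, not part of DECevent.

```python
import re

# Field labels as they appear in the DECevent text report in this note.
FIELDS = ("Event sequence number", "Timestamp of occurrence",
          "Event severity", "Entry type")

def summarize(report):
    """Split a DECevent text report into entries and extract a few header fields."""
    # Each entry starts with a banner such as '******** ENTRY 3'.
    # re.split with one capture group yields [preamble, num, body, num, body, ...].
    chunks = re.split(r"\*+ ENTRY\s+(\d+)", report)
    entries = []
    for number, body in zip(chunks[1::2], chunks[2::2]):
        entry = {"entry": int(number)}
        for label in FIELDS:
            match = re.search(re.escape(label) + r"\s+(.*)", body)
            if match:
                entry[label] = match.group(1).strip()
        entries.append(entry)
    return entries

# Shortened copy of two entry headers from the report above.
SAMPLE = """\
******************************** ENTRY 4
********************************
Logging OS 2. Digital UNIX
Event sequence number 3.
Timestamp of occurrence 25-APR-1997 18:10:17
Event severity 1. Severe Priority
Entry type 302. ASCII Panic Message Type
******************************** ENTRY 5
********************************
Event sequence number 2.
Timestamp of occurrence 25-APR-1997 18:10:13
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
"""

for e in summarize(SAMPLE):
    print(e["entry"], e["Timestamp of occurrence"], e["Entry type"])
```

Sorting the result by timestamp makes it easier to see that the 660 machine check at 18:10:13 preceded the panic and the reboot.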
|
578.3 | Install and use DECevent | POBOXA::SHEPARD | | Tue Apr 29 1997 08:59 | 3 |
| You can install and run DECevent and post the results here.
Gary
|
578.4 | DECevent result in .2 | PANTER::AUBERT | | Tue Apr 29 1997 10:09 | 3 |
| I have already run DECevent and the result is posted in .2
Thierry
|
578.5 | | HARMNY::CUMMINS | | Tue Apr 29 1997 11:58 | 8 |
| From the DECevent output, this would appear to be the same problem as
that described in 385.* and the Blitz in 93.26. Can you confirm? I
couldn't tell for sure from the UNIX start-up audit trail whether each
KZPSA in your machine had a disk attached.
P.S. I have spoken with my UNIX counterpart, and he tells me there is
a fix in place for this problem and that he was going to post a note in
this conference once the patch became available.
|
578.6 | CPU #3 defect ? | PANTER::AUBERT | | Tue Apr 29 1997 12:09 | 313 |
| The AlphaServer 4100 5/300 crashed again... I will copy the new
DECevent result below. I have asked people from the field to replace
CPU #3. Could somebody confirm my diagnosis?
Thierry
******************************** ENTRY 4
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 4.
Timestamp of occurrence 29-APR-1997 12:24:57
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000003
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 302. ASCII Panic Message Type
SWI Minor class 9. ASCII Message
SWI Minor sub class 1. Panic
ASCII Message panic (cpu 3): Processor Machine
Check
******************************** ENTRY 5
********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 3.
Timestamp of occurrence 29-APR-1997 12:24:57
Host name shd52
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000003
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 1. Machine check (670 entry)
Software Flags x0000000300000000
IOD 0 Register Subpkt Pres
IOD 1 Register Subpkt Pres
Active CPUs x0000000F
Hardware Rev x00000000
System Serial Number AY65200674
Module Serial Number
Module Type x0000
System Revision x00000000
* MCHK 670 Regs *
Flags: x00000000
PCI Mask x0000
Machine Check Reason x0098 Fatal Alpha Chip Detected Hard
Error
PAL SHADOW REG 0 x0000000000000000
PAL SHADOW REG 1 x0000000000000000
PAL SHADOW REG 2 x0000000000000000
PAL SHADOW REG 3 x0000000000000000
PAL SHADOW REG 4 x0000000000000000
PAL SHADOW REG 5 x0000000000000000
PAL SHADOW REG 6 x0000000000000000
PAL SHADOW REG 7 x0000000000000000
PALTEMP0 x0000000140EFC1F8
PALTEMP1 x0000000140EFC1F8
PALTEMP2 xFFFFFC0000464C80
PALTEMP3 x0000000000005588
PALTEMP4 x0000000140D91680
PALTEMP5 x0000000140EFAD68
PALTEMP6 x0000000140D91680
PALTEMP7 xFFFFFC00004646C0
PALTEMP8 x1F1E161514020100
PALTEMP9 xFFFFFC00004649F0
PALTEMP10 x00000001201954F4
PALTEMP11 xFFFFFC0000464850
PALTEMP12 xFFFFFC0000464BF0
PALTEMP13 x0000000000006FC0
PALTEMP14 x0000000000000000
PALTEMP15 x0000000000004978
PALTEMP16 x0000009806700301
PALTEMP17 x0000000000000000
PALTEMP18 x000000011FFFE280
PALTEMP19 xFFFFFFFF90EA3A38
PALTEMP20 x000000000FCBE000
PALTEMP21 xFFFFFC0000464C20
PALTEMP22 xFFFFFC00005E4530
PALTEMP23 x0000000003BF7A38
Exception Address Reg x00000001201954F4
Native-mode Instruction
Exception PC x000000004806553D
Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
PAL Base Address Reg x0000000000014000
Base Addr for PALcode:
x0000000000000005
Interrupt Summary Reg x0000000000000000
AST Requests 3-0:
x0000000000000000
IBOX Ctrl and Status Reg x000000C164000000
Timeout Counter Bit Clear.
IBOX Timeout Counter Enabled.
Floating Point Instr's May be
Issued.
PAL Shadow Registers Enabled.
Correctable Error Interrupts
Enabled.
ICACHE BIST (Self Test) Was
Successful.
TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg x0000000000000000
Dcache Par Err Stat Reg x0000000000000000
Virtual Address Reg x0000000140EFC250
Memory Mgmt Flt Sts Reg x0000000000011B10
If Err, Reference Resulted in DTB
Miss
Fault Inst RA Field:
x000000000000000C
Fault Inst Opcode:
x0000000000000023
Scache Address Reg xFFFFFF000001960F
Scache Status Reg x0000000000000000
Bcache Tag Address Reg xFFFFFFFFFF7FFFFF
Last Bcache Access Resulted in a
Hit.
Value of Parity Bit for Tag
Control Status
Bits Dirty, Shared & Valid is
Set.
Value of Tag Control Dirty Bit is
Set.
Value of Tag Control Shared Bit is
Set.
Value of Tag Control Valid Bit is
Set.
Value of Parity Bit Covering Tag
Store
Address Bits is Set.
Tag Address<38:20> Is:
x000000000007FFF7
Ext Interface Address Reg xFFFFFF00080C0D0F
Fill Syndrome Reg x000000000000491B
Ext Interface Status Reg xFFFFFFF904FFFFFF
UNCORRECTABLE ECC ERROR
Error Occurred During D-ref Fill
Second External Interface Hard
Error
LD LOCK xFFFFFF00024462CF
** IOD SUBPACKET -> ** IOD 0 Register Subpacket
WHOAMI x0000023F Module Revision 1.
CPU = 3
Base Address of Bridge x000000F9E0000000
Dev Type & Rev Register x06008032 CAP Chip Revision:
x00000002
HORSE Module Revision:
x00000003
SADDLE Module Revision:
x00000000
SADDLE Module Type: Left
Hand
PCI-EISA Bus Bridge Present on PCI
Segment
PCI Class Code
x00000600
MC-PCI Command Register x42460FF1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:
Enabled
Bridge to PCI Transactions:
Enabled
Bridge REQUESTS 64 Bit Data
Transactions
Bridge ACCEPTS 64 Bit Data
Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check:
Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use RD/MOD/WRT for <64 Byte Block
Mem Wrt
Wrt PEND_NUM Threshold: 6.
RD_TYPE Memory Prefetch Algorithm:
Short
RL_TYPE Mem Rd Line Prefetch Type:
Medium
RM_TYPE Mem Rd Multiple Cmd Type:
Long
ARB_MODE PCI Arbitration: Round
Robin
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>
x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25>
x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled
Interrupt Request x00800000 Interrupts asserted x00000000
Hard Error
Interrupt Mask0 Register x00C51111
Interrupt Mask1 Register x00000000
MC Error Info Register 0 x080C0D00
MC Bus Trans Addr<31:4>: 80C0D00
MC Error Info Register 1 x800FD800 MC bus trans addr <39:32>
x00000000
MC Command is Read0-Mem
CPU3 OR IOD3 Master at Time of
Error
Device ID: x00000007
MC error info valid
CAP Error Register xE0000000 Uncorrectable ECC err det by MDPA
Uncorrectable ECC err det by MDPB
MC error info latched
PCI Bus Trans Error Adr x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not
Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not
Valid
MDPB Status Register x00000000 MDPB Status Register Data Not
Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not
Valid
** IOD SUBPACKET -> ** IOD 1 Register Subpacket
WHOAMI x0000023F Module Revision 1.
CPU = 3
Base Address of Bridge x000000FBE0000000
Dev Type & Rev Register x06000032 CAP Chip Revision:
x00000002
HORSE Module Revision:
x00000003
SADDLE Module Revision:
x00000000
SADDLE Module Type: Left
Hand
Internal CAP Chip Arbiter: Enabled
PCI Class Code
x00000600
MC-PCI Command Register x42460FF1 Module SelfTest Passed LED on
Delayed PCI Bus Reads Protocol:
Enabled
Bridge to PCI Transactions:
Enabled
Bridge REQUESTS 64 Bit Data
Transactions
Bridge ACCEPTS 64 Bit Data
Transactions
PCI Address Parity Check: Enabled
MC Bus CMD/Addr Parity Check:
Enabled
MC Bus NXM Check: Enabled
Check ALL Transactions for Errors
Use RD/MOD/WRT for <64 Byte Block
Mem Wrt
Wrt PEND_NUM Threshold: 6.
RD_TYPE Memory Prefetch Algorithm:
Short
RL_TYPE Mem Rd Line Prefetch Type:
Medium
RM_TYPE Mem Rd Multiple Cmd Type:
Long
ARB_MODE PCI Arbitration: Round
Robin
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>
x00000000
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25>
x00000000
Interrupt Ctrl Register x00000003 Write Device Interrupt Info
Struct:Enabled
Interrupt Request x00800000 Interrupts asserted x00000000
Hard Error
Interrupt Mask0 Register x00C51111
Interrupt Mask1 Register x00000000
MC Error Info Register 0 x080C0D00
MC Bus Trans Addr<31:4>: 80C0D00
MC Error Info Register 1 x800FD800 MC bus trans addr <39:32>
x00000000
MC Command is Read0-Mem
CPU3 OR IOD3 Master at Time of
Error
Device ID: x00000007
MC error info valid
CAP Error Register xE0000000 Uncorrectable ECC err det by MDPA
Uncorrectable ECC err det by MDPB
MC error info latched
PCI Bus Trans Error Adr x00000000
MDPA Status Register x00000000 MDPA Status Register Data Not
Valid
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not
Valid
MDPB Status Register x00000000 MDPB Status Register Data Not
Valid
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not
Valid
PALcode Revision Palcode Rev: 1.21-3
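
The CAP Error Register values seen in this thread (x85000000, xC0000000, xE0000000) can be decoded by hand with a small bit table. The bit positions below are an assumption: they are inferred only from how DECevent translated those three values in the logs above, not taken from the chip documentation, so treat this as a reading aid rather than a reference.

```python
# Bit assignments INFERRED from the three decoded values in this thread;
# they are an assumption, not copied from a hardware manual.
# Message texts are spelled as DECevent prints them.
CAP_ERR_BITS = [
    (24, "Error Detected but Not Logged"),
    (26, "Non-existant memory"),
    (29, "Uncorrectable ECC err det by MDPA"),
    (30, "Uncorrectable ECC err det by MDPB"),
    (31, "MC error info latched"),
]

def decode_cap_err(value):
    """Return the messages for every known bit that is set in the register value."""
    return [text for bit, text in CAP_ERR_BITS if value & (1 << bit)]

for reg in (0x85000000, 0xC0000000, 0xE0000000):
    print(hex(reg), decode_cap_err(reg))
```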
|
578.7 | | HARMNY::CUMMINS | | Tue Apr 29 1997 12:37 | 16 |
| The DECevent output in .2 shows what appears to be a diskless KZPSA
UNIX panic (see note 93.26, etc.). Until you posted .6 I hadn't looked
far enough in the DECevent log posted in .2 to see the CPU3 error
(which is also shown in the log in .6). Could you post a reply
indicating whether you have any diskless (/tapeless) KZPSAs in said
machine?
Also, can you reply with info about the system's memory config? I see
from .2 that you have 256MBs of memory. This is two 128MB SYNC options,
yes? Or do you have a proto (or internal order) with a single 256MB
option installed in it? If SYNC memory, is this DIGITAL memory or
third-party memory?
Finally, I was looking back through other notes in this conference and
it would appear that the log in .2 may be from the same machine as that
discussed in 543.*? Is this true?
|
578.8 | Don't suspect CPU#3 | POBOXB::STEINMAN | | Tue Apr 29 1997 12:38 | 21 |
|
I don't believe CPU#3 is at fault. I suspect memory or the unterminated
KZPSA referred to in .-1, since the system
bus saw an uncorrectable ECC at the same address as CPU3 detected it:
CPU3:
Ext Interface Address Reg xFFFFFF00080C0D0F
Fill Syndrome Reg x000000000000491B
Ext Interface Status Reg xFFFFFFF904FFFFFF
IOD:
MC Error Info Register 0 x080C0D00
MC Bus Trans Addr<31:4>: 80C0D00
MC Error Info Register 1 x800FD800
MC bus trans addr <39:32> x00000000
MC Command is Read0-Mem
CPU3 OR IOD3 Master at Time of Error
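
The address comparison above can be reproduced mechanically. The sketch below assumes that only bits <39:4> of the CPU's Ext Interface Address Reg carry the address (the remaining bits read back as ones in these logs), and that the low byte of MC Error Info Register 1 holds address bits <39:32>, as the DECevent translations in .2 and .6 suggest. The function names are illustrative.

```python
ADDR_MASK_39_4 = 0xFFFFFFFFF0  # keeps address bits <39:4> only

def mc_bus_address(info0, info1):
    """Combine the two IOD MC Error Info registers into one bus address."""
    low = info0 & 0xFFFFFFF0   # MC Bus Trans Addr<31:4>
    high = info1 & 0xFF        # MC bus trans addr <39:32> (assumed: low byte)
    return (high << 32) | low

def ei_address(ei_addr):
    """Strip the bits of the CPU's Ext Interface Address Reg that are not address."""
    return ei_addr & ADDR_MASK_39_4

# Values from the crash posted in .6
cpu = ei_address(0xFFFFFF00080C0D0F)
iod = mc_bus_address(0x080C0D00, 0x800FD800)
print(hex(cpu), hex(iod), cpu == iod)
```

Both values reduce to the same 16-byte-aligned address, which is why the error is visible on the system bus and not only inside CPU3.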
/mo
|
578.9 | All KZPSAs have devices connected... | PANTER::AUBERT | | Wed Apr 30 1997 05:23 | 36 |
| >> The DECevent output in .2 shows what appears to be a diskless KZPSA
>> UNIX panic.
>> Could you post a reply indicating whether you have any diskless
>> (/tapeless) KZPSAs in said machine?
I have no diskless (/tapeless) KZPSAs in this machine.
>> Also, can you reply with info about the system's memory config?
FRU Location 0. Slot Name: MEM0L and MEM0H
Self Test Status x00000001 FRU passed Self-Test
Total Memory Size 128. Mega Bytes (2 Modules)
Module Size 64. Mega Bytes (per Module)
Memory Base Addr x0000000000000000
Memory Module Type x0000000000000003
Syncronous DRAM
FRU Location 1. Slot Name: MEM1L and MEM1H
Self Test Status x00000001 FRU passed Self-Test
Total Memory Size 128. Mega Bytes (2 Modules)
Module Size 64. Mega Bytes (per Module)
Memory Base Addr x0000000008000000
Memory Module Type x0000000000000003
Syncronous DRAM
It means that we have 4 X 64MB modules (4 X B3020-CA).
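
As a rough cross-check, the bus addresses from the machine check logs can be compared against the two memory base addresses listed above. This sketch assumes a simple linear mapping (each pair covers base to base + 128MB with no interleaving between pairs), which the FRU table does not confirm; the helper name `bank_for` is hypothetical. A hit only says where the address lives, not which component is defective.

```python
MB = 1024 * 1024

# (slot name, base address, size) as reported in the FRU table above.
BANKS = [
    ("MEM0L and MEM0H", 0x0000000000000000, 128 * MB),
    ("MEM1L and MEM1H", 0x0000000008000000, 128 * MB),
]

def bank_for(addr):
    """Return the slot name whose address range contains addr, or None."""
    for name, base, size in BANKS:
        if base <= addr < base + size:
            return name
    return None

# MC bus addresses taken from the logs in .2 and .6
for addr in (0x0D124500, 0x080C0D00, 0x400006E80):
    print(hex(addr), bank_for(addr))
```

Under that assumption, both uncorrectable ECC addresses fall in the second pair, while the address logged with the non-existent memory error lies outside the installed 256MB.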
>> Finally, I was looking back through other notes in this conference
>> and it would appear that the log in .2 may be from the same machine
>> as that discussed in 543.*? Is this true?
It is not the same machine but a machine with the same configuration.
Thanks for taking the time to help diagnose this urgent problem.
Thierry Aubert/DEC at CERN
|
578.10 | | PROXY::ALFORD | | Wed Apr 30 1997 11:14 | 24 |
| It's possible that the motherboard could be at fault, if it is a rev
B06 54-23803-01. I have seen one customer problem with a similar
configuration that we fixed by swapping in a rev B07 54-23803-01 module.
These should be available through the P1 process.
Since pulling the motherboard is not a trivial task, I strongly suggest
first installing (2) sets of EDO (B3030-EA) memory (if possible).
If these sets work, then it appears you may have the same problem as
that other customer. If they don't work, then you have a different
problem.
History:
There was a change to a PAL (vendor/code change) on the 54-23803-02 module.
This change added timing margin for sync memory configurations. Even though
all lab test results indicated the old PAL met system specification, it
was decided to change the PAL anyway. As already mentioned, we have seen
one customer system with 4 CPUs and 4 B3020-CAs fail with 660 MCHKs. EDO
memories worked fine.
FYI... there is a rev B08 available too; this is electrically the same
as a rev B07. The difference is, the B08 has new mounting holes for a
power resistor support bracket.
bruce
|
578.11 | motherboard rev B07 solved my problem | PANTER::AUBERT | | Fri May 09 1997 05:27 | 10 |
| I have replaced the motherboard with a revision B07 one. For 2 days now
the system has been working fine (no more machine checks). I did not
check with EDO memory beforehand, but I will do so on another system
which has the same problem.
I would like to thank you for your advice, since it solved my problem.
Regards,
Thierry Aubert/DEC at CERN
|