Title: | SABLE SYSTEM PUBLIC DISCUSSION |
Moderator: | COSMIC::PETERSON |
Created: | Mon Jan 11 1993 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2614 |
Total number of notes: | 10244 |
Has anyone seen this? Output from decevent:
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 77.
Timestamp of occurrence 24-JAN-1997 08:12:51
Host name atlas
System type register x00000009 AlphaServer 2x00
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 3. Bcache error (630 entry)
-- ENTRY FRAME FOLLOWS --
Frame ID x00000023 Proc BCache Corr Error Frame
CPU Number Logging Event 0.
Flags: x80000000 Retryable Error
Mchk Error Code x00000086 B-Cache Correctable
EI Address <39:4> xFFFFFF0000544D9F
Fill Syndrome Reg x00000000000000A8
Ext Interface Status Reg xFFFFFFF484FFFFFF
Correctable ECC error
Error occurred during I-ref fill
Interrupt Summary Reg x0000000100000000
Correctable ECC errors (IPL31)
AST requests 3 - 0 x0000000000000000
Configuration Reg (R0) x380003F238000002
LOW LONGWORD Slice Follows
RATTLER Gate Array: Revision #2
Bit 12 Clr: Cmd/Data NOACK are Errors
Bit 24 Clr: IDLEBC Assert in Last Cycle 4
Bit 25 Clr: IDLEBC Assert During Cycle 4
Bit 27 Set: ACK Set_Dirty & Set_Lock Cmds
CACHE Size Field: 4 MB Cache
HIGH LONGWORD Slice Follows
RATTLER Gate Array: Revision #2
Bit 36 Set: Rx IPL31 on CBus CERR Assert
Bit 37 Set: Rx HALT on CBus SYS_EVENT
Bit 38 Set: Rx HALT on IIRR CSR24 HALT Req
Bit 39 Set: Rx INTERPROC INT on Write to
IIRR CSR24 INTERPROC INT Req
Bit 40 Set: Enable CIRQ<0> INT From T2
Bit 41 Set: Enable CIRQ<1> INT From XIO
Bit 44 Clr: Cmd/Data NOACK are Errors
Bit 56 Clr: IDLEBC Assert in Last Cycle 4
Bit 57 Clr: IDLEBC Assert During Cycle 4
Bit 59 Set: ACK Set_Dirty & Set_Lock Cmds
CACHE Size Field: 4 MB Cache
Error Summary Reg (R1) x0000000000000000
EVB Control Register (R2) x0000006100000061
LOW LONGWORD Slice Follows
Bit 0 Set: Enable Addr-Cmd Parity Checking
Bit 5 Set: Enable Bcache ECC Corr QW0/QW2
Bit 6 Set: Enable ECC Check - QW0/QW2 Data
HIGH LONGWORD Slice Follows
Bit 32 Set: Enable Addr-Cmd Parity Check
Bit 37 Set: Enable Bcache ECC Corr QW1/QW3
Bit 38 Set: Enable ECC Check-QW1/QW3 Data
Correctable Err Reg (R4) x0000000000000000
LOW LONGWORD Slice Follows
QW0 ECC Syndrome: No Syndrome Bits Set
QW2 ECC Syndrome: No Syndrome Bits Set
HIGH LONGWORD Slice Follows
QW1 ECC Syndrome: No Syndrome Bits Set
QW3 ECC Syndrome: No Syndrome Bits Set
Correctable Err Addr (R5) xB800000AB800000A
LOW LONGWORD Slice Follows
Bit 32 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Corr Err Adr x000000005800000A
HIGH LONGWORD Slice Follows
Bit 63 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Corr Err Adr x000000005800000A
-- ENTRY FRAME FOLLOWS --
Frame ID x00000000 End Frame
Entry# (record in file) 0.
Canonical buff size 5416.
Canonical event size 4392.
Canonical Event-Buffer:
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000004 00000000 00000000 00000063 *c...............*
0010: 00000202 4E454720 33317646 534F0001 *..OSFv13 GEN....*
0020: 00000000 00000000 00000000 00000000 *................*
0030: 004D0000 00000000 00000000 00000000 *..............M.*
0040: 30303135 32313830 34323130 37393931 *1997012408125100*
0050: 00000000 00000000 00000020 20202020 * ...........*
0060: 00000000 00000000 0073616C 74610000 *..atlas.........*
0070: 00000000 00000000 00000000 00000000 *................*
0080: 33317646 534F0001 00000000 00000000 *..........OSFv13*
0090: 000000FF 00000009 00000000 55504320 * CPU............*
00A0: 00000000 00000000 00000000 00000004 *................*
00B0: 00000000 00000000 00000000 00000000 *................*
00C0: 00000000 00000000 00000000 00000000 *................*
00D0: 00000000 00000000 00000000 00000000 *................*
00E0: 00000000 00000000 00000000 00000000 *................*
00F0: 00000000 00000000 00000000 00000500 *................*
0100: 00640101 54564520 33317646 534F0001 *..OSFv13 EVT..d.*
0110: 00000000 00000000 00000000 00000000 *................*
0120: 00000000 00000000 00000000 00000000 *................*
0130: 00000023 00000078 00000000 00030001 *........x...#...*
0140: 00000018 80000000 00000060 00000060 *`...`...........*
0150: 00544D9F 00000000 00000086 00000038 *8............MT.*
0160: 84FFFFFF 00000000 000000A8 FFFFFF00 *................*
0170: 38000002 00000001 00000000 FFFFFFF4 *...............8*
0180: 00000061 00000000 00000000 380003F2 *...8........a...*
0190: B800000A 00000000 00000000 00000061 *a...............*
01A0: 00000000 00000000 00000000 B800000A *................*
01B0: 00000000 00000000 00000000 5E3C7E25 *%~<^............*
01C0: 00000000 00000000 00000000 00000000 *................*
01D0: 00000000 00000000 00000000 00000000 *................*
01E0: 00000000 00000000 00000000 00000000 *................*
01F0: 00000000 00000000 00000000 00000000 *................*
0200: 00000000 00000000 00000000 00000000 *................*
0210: 00000000 00000000 00000000 00000000 *................*
0220: 00000000 00000000 00000000 00000000 *................*
0230: 00000000 00000000 00000000 00000000 *................*
0240: 00000000 00000000 00000000 00000000 *................*
0250: 00000000 00000000 00000000 00000000 *................*
0260: 00000000 00000000 00000000 00000000 *................*
0270: 00000000 00000000 00000000 00000000 *................*
0280: 00000000 00000000 00000000 00000000 *................*
the rest of this entry is all zeroes...
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 76.
Timestamp of occurrence 24-JAN-1997 08:12:46
Host name atlas
System type register x00000009 AlphaServer 2x00
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 3. Bcache error (630 entry)
-- ENTRY FRAME FOLLOWS --
Frame ID x00000023 Proc BCache Corr Error Frame
CPU Number Logging Event 0.
Flags: x80000000 Retryable Error
Mchk Error Code x00000086 B-Cache Correctable
EI Address <39:4> xFFFFFF0000544D9F
Fill Syndrome Reg x00000000000000A8
Ext Interface Status Reg xFFFFFFF484FFFFFF
Correctable ECC error
Error occurred during I-ref fill
Interrupt Summary Reg x0000000100000000
Correctable ECC errors (IPL31)
AST requests 3 - 0 x0000000000000000
Configuration Reg (R0) x380003F238000002
LOW LONGWORD Slice Follows
RATTLER Gate Array: Revision #2
Bit 36 Set: Rx IPL31 on CBus CERR Assert
Bit 37 Set: Rx HALT on CBus SYS_EVENT
Bit 38 Set: Rx HALT on IIRR CSR24 HALT Req
Bit 39 Set: Rx INTERPROC INT on Write to
IIRR CSR24 INTERPROC INT Req
Bit 40 Set: Enable CIRQ<0> INT From T2
Bit 41 Set: Enable CIRQ<1> INT From XIO
Bit 44 Clr: Cmd/Data NOACK are Errors
Bit 56 Clr: IDLEBC Assert in Last Cycle 4
Bit 57 Clr: IDLEBC Assert During Cycle 4
Bit 59 Set: ACK Set_Dirty & Set_Lock Cmds
CACHE Size Field: 4 MB Cache
Error Summary Reg (R1) x0000000000000000
EVB Control Register (R2) x0000006100000061
LOW LONGWORD Slice Follows
Bit 0 Set: Enable Addr-Cmd Parity Checking
Bit 5 Set: Enable Bcache ECC Corr QW0/QW2
Bit 6 Set: Enable ECC Check - QW0/QW2 Data
HIGH LONGWORD Slice Follows
Bit 32 Set: Enable Addr-Cmd Parity Check
Bit 37 Set: Enable Bcache ECC Corr QW1/QW3
Bit 38 Set: Enable ECC Check-QW1/QW3 Data
Correctable Err Reg (R4) x0000000000000000
LOW LONGWORD Slice Follows
QW0 ECC Syndrome: No Syndrome Bits Set
QW2 ECC Syndrome: No Syndrome Bits Set
HIGH LONGWORD Slice Follows
QW1 ECC Syndrome: No Syndrome Bits Set
QW3 ECC Syndrome: No Syndrome Bits Set
Correctable Err Addr (R5) xB800000AB800000A
LOW LONGWORD Slice Follows
Bit 32 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Corr Err Adr x000000005800000A
HIGH LONGWORD Slice Follows
Bit 63 Set: EV-Bus Bit 39, IO Bit, Set
EVB<34:4> Corr Err Adr x000000005800000A
-- ENTRY FRAME FOLLOWS --
Frame ID x00000000 End Frame
Entry# (record in file) 0.
Canonical buff size 5416.
Canonical event size 4392.
Canonical Event-Buffer:
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000005 00000000 00000000 00000063 *c...............*
0010: 00000202 4E454720 33317646 534F0001 *..OSFv13 GEN....*
0020: 00000000 00000000 00000000 00000000 *................*
0030: 004C0000 00000000 00000000 00000000 *..............L.*
0040: 30303634 32313830 34323130 37393931 *1997012408124600*
0050: 00000000 00000000 00000020 20202020 * ...........*
0060: 00000000 00000000 0073616C 74610000 *..atlas.........*
0070: 00000000 00000000 00000000 00000000 *................*
0080: 33317646 534F0001 00000000 00000000 *..........OSFv13*
0090: 000000FF 00000009 00000000 55504320 * CPU............*
00A0: 00000000 00000000 00000000 00000004 *................*
00B0: 00000000 00000000 00000000 00000000 *................*
00C0: 00000000 00000000 00000000 00000000 *................*
00D0: 00000000 00000000 00000000 00000000 *................*
00E0: 00000000 00000000 00000000 00000000 *................*
00F0: 00000000 00000000 00000000 00000500 *................*
0100: 00640101 54564520 33317646 534F0001 *..OSFv13 EVT..d.*
0110: 00000000 00000000 00000000 00000000 *................*
0120: 00000000 00000000 00000000 00000000 *................*
0130: 00000023 00000078 00000000 00030001 *........x...#...*
0140: 00000018 80000000 00000060 00000060 *`...`...........*
0150: 00544D9F 00000000 00000086 00000038 *8............MT.*
0160: 84FFFFFF 00000000 000000A8 FFFFFF00 *................*
0170: 38000002 00000001 00000000 FFFFFFF4 *...............8*
0180: 00000061 00000000 00000000 380003F2 *...8........a...*
0190: B800000A 00000000 00000000 00000061 *a...............*
01A0: 00000000 00000000 00000000 B800000A *................*
01B0: 00000000 00000000 00000000 5E3C7E25 *%~<^............*
01C0: 00000000 00000000 00000000 00000000 *................*
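The Canonical Event-Buffer dumps above print each row with the highest-addressed longword on the left (see the ":Byte Order" header), so the ASCII column reads in the opposite direction to the hex. As a rough illustration only, here is a minimal Python sketch that turns one dump row back into text in increasing address order. The function name `decode_dump_row` is made up for this example and is not part of decevent.

```python
def decode_dump_row(row):
    """Return the printable text of one dump row, lowest address first."""
    # Split off the "0040:" offset; the first four tokens after it are the
    # hex longwords, and anything later is the ASCII column.
    _offset, rest = row.split(":", 1)
    longwords = rest.split()[:4]
    # The leftmost longword holds bytes 15..12, so walk the longwords right
    # to left and reverse the bytes inside each one (little-endian order).
    data = b"".join(bytes.fromhex(lw)[::-1] for lw in reversed(longwords))
    # Show non-printable bytes as dots, like the dump's own ASCII column.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


row = "0040: 30303135 32313830 34323130 37393931 *1997012408125100*"
print(decode_dump_row(row))  # 1997012408125100
```

The decoded string lines up with the first entry's "Timestamp of occurrence" (24-JAN-1997 08:12:51), which is a quick way to confirm which dump belongs to which event.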
We are getting about 20 of these a day. The CPUs have been changed, the
backplane has been changed, and one memory module has been replaced; the
memory modules have also been rotated, all to no avail.
Any ideas?
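The "Bit N Set" lines in the register decodes above can be cross-checked by hand against the raw 64-bit values. The following is only a sketch of that check in Python; the helper names `split_longwords` and `set_bits` are invented for the example, and it makes no attempt to interpret what each bit means beyond what the log already states.

```python
def split_longwords(value):
    """Split a 64-bit register value into (low, high) 32-bit longwords."""
    return value & 0xFFFFFFFF, value >> 32


def set_bits(value, width=64):
    """List the bit positions (0 = least significant) that are set."""
    return [bit for bit in range(width) if (value >> bit) & 1]


# Values copied from the log above.
evb_control = 0x0000006100000061   # EVB Control Register (R2)
config_reg = 0x380003F238000002    # Configuration Reg (R0)

low, high = split_longwords(evb_control)
print(hex(low), hex(high))         # 0x61 0x61
print(set_bits(evb_control))       # [0, 5, 6, 32, 37, 38]

# Bits the log reports as set or clear in the Configuration Reg.
config_bits = set_bits(config_reg)
print(all(b in config_bits for b in (27, 36, 37, 38, 39, 40, 41, 59)))    # True
print(any(b in config_bits for b in (12, 24, 25, 44, 56, 57)))            # False
```

Both registers agree with the decoded text, so the decode itself looks consistent with the raw values in the dump.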
T.R | Title | User | Personal Name | Date | Lines |
---|
2503.1 | isolated to CPU0 | DANGER::PAWLOWSKI | Chet Pawlowski | Mon Jan 27 1997 11:31 | 27 |
| The two error frames you posted both point to the same fault:
CPU0 is getting a single-bit error (corrected by HW) from its B-cache.
Since the External_Interface bit was not set, the error was isolated to
the CPU.
The failing address and fill syndrome are identical in the two frames:
EI Address <39:4> xFFFFFF0000544D9F
Fill Syndrome Reg x00000000000000A8 maps to data bit 43
Do diagnostics find this error?
>>>show error cpu0
If you can post the SROM setup values, we could sanity check them:
>>>e -b -n a iic_cpu0:f5
Since you say that the CPU(s) have been replaced, is it possible that
you have a failed power supply and you're running off of one rather
than two supplies?
/Chet
|
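With around 20 of these a day, comparing the failing address and fill syndrome across entries by eye gets tedious. Below is a minimal Python sketch, assuming the decevent text is available as plain strings, that pulls those two fields out of each entry and checks whether they match. The `extract` helper and the regular expression are illustrative assumptions, not a decevent feature, and only the two field labels quoted in the reply are handled.

```python
import re

# Matches the two fields quoted in the reply, followed by an x-prefixed hex value.
FIELD = re.compile(r"^(EI Address <39:4>|Fill Syndrome Reg)\s+x([0-9A-Fa-f]+)")


def extract(entry_text):
    """Return {field label: integer value} for the fields of interest."""
    fields = {}
    for line in entry_text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


# Lines copied from the two entries posted in the base note.
entry_77 = """\
Event sequence number 77.
EI Address <39:4> xFFFFFF0000544D9F
Fill Syndrome Reg x00000000000000A8
"""
entry_76 = """\
Event sequence number 76.
EI Address <39:4> xFFFFFF0000544D9F
Fill Syndrome Reg x00000000000000A8
"""

print(extract(entry_77) == extract(entry_76))  # True
```

If later entries ever show a different address or syndrome, that would be worth posting, since the isolation above rests on the two frames being identical.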