
Conference mvblab::sable

Title:SABLE SYSTEM PUBLIC DISCUSSION
Moderator:COSMIC::PETERSON
Created:Mon Jan 11 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2614
Total number of notes:10244

2503.0. "mchk 630's" by WHTAIL::HUTCHINS () Mon Jan 27 1997 08:54

Has anyone seen this? Output from DECevent follows.

Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number            77.
Timestamp of occurrence              24-JAN-1997 08:12:51
Host name                            atlas

System type register      x00000009  AlphaServer 2x00
Number of CPUs (mpnum)    x00000004
CPU logging event (mperr) x00000000

Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                      100. CPU Machine Check Errors

CPU Minor class                   3. Bcache error (630 entry)

-- ENTRY FRAME FOLLOWS --
Frame ID                  x00000023  Proc BCache Corr Error Frame

CPU Number Logging Event          0.
Flags:                    x80000000  Retryable Error
Mchk Error Code           x00000086  B-Cache Correctable
EI Address <39:4>         xFFFFFF0000544D9F
Fill Syndrome Reg         x00000000000000A8
Ext Interface Status Reg  xFFFFFFF484FFFFFF
                                     Correctable ECC error
                                     Error occurred during I-ref fill
Interrupt Summary Reg     x0000000100000000
                                     Correctable ECC errors (IPL31)
                                     AST requests 3 - 0  x0000000000000000
Configuration Reg    (R0) x380003F238000002
                                             LOW LONGWORD Slice Follows
                                     RATTLER Gate Array:  Revision #2
                                     Bit 12 Clr: Cmd/Data NOACK are Errors
                                     Bit 24 Clr: IDLEBC Assert in Last Cycle 4
                                     Bit 25 Clr: IDLEBC Assert During Cycle 4
                                     Bit 27 Set: ACK Set_Dirty & Set_Lock Cmds
                                     CACHE Size Field:  4 MB Cache
                                             HIGH LONGWORD Slice Follows
                                     RATTLER Gate Array:  Revision #2
                                     Bit 36 Set: Rx IPL31 on CBus CERR Assert
                                     Bit 37 Set: Rx HALT on CBus SYS_EVENT
                                     Bit 38 Set: Rx HALT on IIRR CSR24 HALT Req
                                     Bit 39 Set: Rx INTERPROC INT on Write to
                                                 IIRR CSR24 INTERPROC INT Req
                                     Bit 40 Set: Enable CIRQ<0> INT From T2
                                     Bit 41 Set: Enable CIRQ<1> INT From XIO
                                     Bit 44 Clr: Cmd/Data NOACK are Errors
                                     Bit 56 Clr: IDLEBC Assert in Last Cycle 4
                                     Bit 57 Clr: IDLEBC Assert During Cycle 4
                                     Bit 59 Set: ACK Set_Dirty & Set_Lock Cmds
                                     CACHE Size Field:  4 MB Cache
Error Summary Reg    (R1) x0000000000000000
EVB Control Register (R2) x0000006100000061
                                             LOW LONGWORD Slice Follows
                                     Bit 0 Set: Enable Addr-Cmd Parity Checking
                                     Bit 5 Set: Enable Bcache ECC Corr QW0/QW2
                                     Bit 6 Set: Enable ECC Check - QW0/QW2 Data
                                             HIGH LONGWORD Slice Follows
                                     Bit 32 Set: Enable Addr-Cmd Parity Check
                                     Bit 37 Set: Enable Bcache ECC Corr QW1/QW3
                                     Bit 38 Set: Enable ECC Check-QW1/QW3 Data
Correctable Err Reg  (R4) x0000000000000000
                                             LOW LONGWORD Slice Follows
                                     QW0 ECC Syndrome:  No Syndrome Bits Set
                                     QW2 ECC Syndrome:  No Syndrome Bits Set
                                             HIGH LONGWORD Slice Follows
                                     QW1 ECC Syndrome:  No Syndrome Bits Set
                                     QW3 ECC Syndrome:  No Syndrome Bits Set
Correctable Err Addr (R5) xB800000AB800000A
                                             LOW LONGWORD Slice Follows
                                     Bit 32 Set: EV-Bus Bit 39, IO Bit, Set
                                     EVB<34:4> Corr Err Adr  x000000005800000A

                                             HIGH LONGWORD Slice Follows
                                     Bit 63 Set: EV-Bus Bit 39, IO Bit, Set
                                     EVB<34:4> Corr Err Adr  x000000005800000A

-- ENTRY FRAME FOLLOWS --
Frame ID                  x00000000  End Frame

Entry# (record in file)           0.
Canonical buff size            5416.
Canonical event size           4392.
Canonical Event-Buffer:

          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order
 0000:    00000004  00000000  00000000  00000063   *c...............*
 0010:    00000202  4E454720  33317646  534F0001   *..OSFv13 GEN....*
 0020:    00000000  00000000  00000000  00000000   *................*
 0030:    004D0000  00000000  00000000  00000000   *..............M.*
 0040:    30303135  32313830  34323130  37393931   *1997012408125100*
 0050:    00000000  00000000  00000020  20202020   *     ...........*
 0060:    00000000  00000000  0073616C  74610000   *..atlas.........*
 0070:    00000000  00000000  00000000  00000000   *................*
 0080:    33317646  534F0001  00000000  00000000   *..........OSFv13*
 0090:    000000FF  00000009  00000000  55504320   * CPU............*
 00A0:    00000000  00000000  00000000  00000004   *................*
 00B0:    00000000  00000000  00000000  00000000   *................*
 00C0:    00000000  00000000  00000000  00000000   *................*
 00D0:    00000000  00000000  00000000  00000000   *................*
 00E0:    00000000  00000000  00000000  00000000   *................*
 00F0:    00000000  00000000  00000000  00000500   *................*
 0100:    00640101  54564520  33317646  534F0001   *..OSFv13 EVT..d.*
 0110:    00000000  00000000  00000000  00000000   *................*
 0120:    00000000  00000000  00000000  00000000   *................*
 0130:    00000023  00000078  00000000  00030001   *........x...#...*
 0140:    00000018  80000000  00000060  00000060   *`...`...........*
 0150:    00544D9F  00000000  00000086  00000038   *8............MT.*
 0160:    84FFFFFF  00000000  000000A8  FFFFFF00   *................*
 0170:    38000002  00000001  00000000  FFFFFFF4   *...............8*
 0180:    00000061  00000000  00000000  380003F2   *...8........a...*
 0190:    B800000A  00000000  00000000  00000061   *a...............*
 01A0:    00000000  00000000  00000000  B800000A   *................*
 01B0:    00000000  00000000  00000000  5E3C7E25   *%~<^............*
 01C0:    00000000  00000000  00000000  00000000   *................*
 01D0:    00000000  00000000  00000000  00000000   *................*
 01E0:    00000000  00000000  00000000  00000000   *................*
 01F0:    00000000  00000000  00000000  00000000   *................*
 0200:    00000000  00000000  00000000  00000000   *................*
 0210:    00000000  00000000  00000000  00000000   *................*
 0220:    00000000  00000000  00000000  00000000   *................*
 0230:    00000000  00000000  00000000  00000000   *................*
 0240:    00000000  00000000  00000000  00000000   *................*
 0250:    00000000  00000000  00000000  00000000   *................*
 0260:    00000000  00000000  00000000  00000000   *................*
 0270:    00000000  00000000  00000000  00000000   *................*
 0280:    00000000  00000000  00000000  00000000   *................*

the rest of this entry is all zeroes...
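
For anyone reading the canonical buffer by hand: each row lists four longwords with the highest byte offset on the left (see the 15--<-12 ... 03--<-00 header), so the hex reads right to left while the ASCII column reads left to right. Below is a minimal Python sketch for turning one row back into bytes in memory order. It is not part of DECevent; the function name decode_dump_line is made up for illustration, and it assumes each longword is little-endian, which agrees with the ASCII column in the rows above.

```python
def decode_dump_line(line):
    """Return (offset, data) for one row of the canonical event-buffer dump."""
    offset_text, rest = line.split(":", 1)
    # Everything before the first '*' is the four hex longwords.
    words = rest.split("*")[0].split()
    # Rows are printed bytes 15..12 first, so reverse the longwords and
    # unpack each one little-endian to get bytes in ascending offset order.
    data = b"".join(int(w, 16).to_bytes(4, "little") for w in reversed(words))
    return int(offset_text, 16), data


row = " 0040:    30303135  32313830  34323130  37393931   *1997012408125100*"
offset, data = decode_dump_line(row)
print(hex(offset), data.decode("ascii"))
```

Run against the row at offset 0040, this gives back the timestamp string shown in the ASCII column.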



Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number            76.
Timestamp of occurrence              24-JAN-1997 08:12:46
Host name                            atlas

System type register      x00000009  AlphaServer 2x00
Number of CPUs (mpnum)    x00000004
CPU logging event (mperr) x00000000

Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                      100. CPU Machine Check Errors

CPU Minor class                   3. Bcache error (630 entry)

-- ENTRY FRAME FOLLOWS --
Frame ID                  x00000023  Proc BCache Corr Error Frame

CPU Number Logging Event          0.
Flags:                    x80000000  Retryable Error
Mchk Error Code           x00000086  B-Cache Correctable
EI Address <39:4>         xFFFFFF0000544D9F
Fill Syndrome Reg         x00000000000000A8
Ext Interface Status Reg  xFFFFFFF484FFFFFF
                                     Correctable ECC error
                                     Error occurred during I-ref fill
Interrupt Summary Reg     x0000000100000000
                                     Correctable ECC errors (IPL31)
                                     AST requests 3 - 0  x0000000000000000
Configuration Reg    (R0) x380003F238000002
                                             LOW LONGWORD Slice Follows
                                     RATTLER Gate Array:  Revision #2
                                     Bit 36 Set: Rx IPL31 on CBus CERR Assert
                                     Bit 37 Set: Rx HALT on CBus SYS_EVENT
                                     Bit 38 Set: Rx HALT on IIRR CSR24 HALT Req
                                     Bit 39 Set: Rx INTERPROC INT on Write to
                                                 IIRR CSR24 INTERPROC INT Req
                                     Bit 40 Set: Enable CIRQ<0> INT From T2
                                     Bit 41 Set: Enable CIRQ<1> INT From XIO
                                     Bit 44 Clr: Cmd/Data NOACK are Errors
                                     Bit 56 Clr: IDLEBC Assert in Last Cycle 4
                                     Bit 57 Clr: IDLEBC Assert During Cycle 4
                                     Bit 59 Set: ACK Set_Dirty & Set_Lock Cmds
                                     CACHE Size Field:  4 MB Cache
Error Summary Reg    (R1) x0000000000000000
EVB Control Register (R2) x0000006100000061
                                             LOW LONGWORD Slice Follows
                                     Bit 0 Set: Enable Addr-Cmd Parity Checking
                                     Bit 5 Set: Enable Bcache ECC Corr QW0/QW2
                                     Bit 6 Set: Enable ECC Check - QW0/QW2 Data
                                             HIGH LONGWORD Slice Follows
                                     Bit 32 Set: Enable Addr-Cmd Parity Check
                                     Bit 37 Set: Enable Bcache ECC Corr QW1/QW3
                                     Bit 38 Set: Enable ECC Check-QW1/QW3 Data
Correctable Err Reg  (R4) x0000000000000000
                                             LOW LONGWORD Slice Follows
                                     QW0 ECC Syndrome:  No Syndrome Bits Set
                                     QW2 ECC Syndrome:  No Syndrome Bits Set
                                             HIGH LONGWORD Slice Follows
                                     QW1 ECC Syndrome:  No Syndrome Bits Set
                                     QW3 ECC Syndrome:  No Syndrome Bits Set
Correctable Err Addr (R5) xB800000AB800000A
                                             LOW LONGWORD Slice Follows
                                     Bit 32 Set: EV-Bus Bit 39, IO Bit, Set
                                     EVB<34:4> Corr Err Adr  x000000005800000A

                                             HIGH LONGWORD Slice Follows
                                     Bit 63 Set: EV-Bus Bit 39, IO Bit, Set
                                     EVB<34:4> Corr Err Adr  x000000005800000A

-- ENTRY FRAME FOLLOWS --
Frame ID                  x00000000  End Frame


Entry# (record in file)           0.
Canonical buff size            5416.
Canonical event size           4392.
Canonical Event-Buffer:

          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order
 0000:    00000005  00000000  00000000  00000063   *c...............*
 0010:    00000202  4E454720  33317646  534F0001   *..OSFv13 GEN....*
 0020:    00000000  00000000  00000000  00000000   *................*
 0030:    004C0000  00000000  00000000  00000000   *..............L.*
 0040:    30303634  32313830  34323130  37393931   *1997012408124600*
 0050:    00000000  00000000  00000020  20202020   *     ...........*
 0060:    00000000  00000000  0073616C  74610000   *..atlas.........*
 0070:    00000000  00000000  00000000  00000000   *................*
 0080:    33317646  534F0001  00000000  00000000   *..........OSFv13*
 0090:    000000FF  00000009  00000000  55504320   * CPU............*
 00A0:    00000000  00000000  00000000  00000004   *................*
 00B0:    00000000  00000000  00000000  00000000   *................*
 00C0:    00000000  00000000  00000000  00000000   *................*
 00D0:    00000000  00000000  00000000  00000000   *................*
 00E0:    00000000  00000000  00000000  00000000   *................*
 00F0:    00000000  00000000  00000000  00000500   *................*
 0100:    00640101  54564520  33317646  534F0001   *..OSFv13 EVT..d.*
 0110:    00000000  00000000  00000000  00000000   *................*
 0120:    00000000  00000000  00000000  00000000   *................*
 0130:    00000023  00000078  00000000  00030001   *........x...#...*
 0140:    00000018  80000000  00000060  00000060   *`...`...........*
 0150:    00544D9F  00000000  00000086  00000038   *8............MT.*
 0160:    84FFFFFF  00000000  000000A8  FFFFFF00   *................*
 0170:    38000002  00000001  00000000  FFFFFFF4   *...............8*
 0180:    00000061  00000000  00000000  380003F2   *...8........a...*
 0190:    B800000A  00000000  00000000  00000061   *a...............*
 01A0:    00000000  00000000  00000000  B800000A   *................*
 01B0:    00000000  00000000  00000000  5E3C7E25   *%~<^............*
 01C0:    00000000  00000000  00000000  00000000   *................*

We are getting about 20 of these a day. The CPUs, the backplane, and one
memory module have been changed, and the memory modules have been rotated,
all to no avail.


Any ideas?
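
A side note on the LOW LONGWORD / HIGH LONGWORD slices in the entries above: the 64-bit register values are reported as two 32-bit halves, with the high-half bits numbered 32 to 63. Below is a minimal Python sketch of that split, using the EVB Control Register value from the frames. The helper names split_longwords and set_bits are hypothetical, and no meaning is assigned to the bits here; that comes from the DECevent text.

```python
def split_longwords(value):
    """Split a 64-bit register value into (low, high) 32-bit longwords."""
    low = value & 0xFFFFFFFF
    high = (value >> 32) & 0xFFFFFFFF
    return low, high


def set_bits(value, width=64):
    """List the bit positions that are set, numbered from bit 0."""
    return [bit for bit in range(width) if (value >> bit) & 1]


evb_control = 0x0000006100000061  # EVB Control Register (R2) from the frames
low, high = split_longwords(evb_control)
print(hex(low), hex(high))
print(set_bits(evb_control))  # bits 0, 5, 6 in the low half; 32, 37, 38 in the high half
```

The set bits line up with the "Bit N Set" lines DECevent prints under each slice for R2.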

2503.1. "isolated to CPU0" by DANGER::PAWLOWSKI (Chet Pawlowski) Mon Jan 27 1997 11:31

    The two error frames you posted both point to the same fault:
    
    CPU0 is getting a single-bit error (corrected by hardware) from its
    B-cache. Since the External_Interface bit was not set, the error is
    isolated to the CPU.
    
    
    The failing address and fill syndrome are identical in the two frames:
    
    EI Address <39:4>         xFFFFFF0000544D9F
    Fill Syndrome Reg         x00000000000000A8  (maps to data bit 43)
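
With about 20 of these a day, comparing frames by eye gets tedious. Here is a rough Python sketch for checking whether the register values in two frames match. It assumes the DECevent text layout shown in this note (a label, some spaces, then an x-prefixed hex value); the names parse_registers and differing are invented for illustration, and the sample strings are trimmed copies of the two frames.

```python
import re

# A label, whitespace, then an x-prefixed hex value of 8 to 16 digits.
REG_LINE = re.compile(r"^\s*(.+?)\s+x([0-9A-Fa-f]{8,16})\b")


def parse_registers(text):
    """Map each register label in a frame to its integer value."""
    regs = {}
    for line in text.splitlines():
        m = REG_LINE.match(line)
        if m:
            label = " ".join(m.group(1).split())  # collapse runs of spaces
            regs[label] = int(m.group(2), 16)
    return regs


def differing(a, b):
    """Return the labels whose values differ between two parsed frames."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


frame_77 = """
EI Address <39:4>         xFFFFFF0000544D9F
Fill Syndrome Reg         x00000000000000A8
Ext Interface Status Reg  xFFFFFFF484FFFFFF
"""
frame_76 = """
EI Address <39:4>         xFFFFFF0000544D9F
Fill Syndrome Reg         x00000000000000A8
Ext Interface Status Reg  xFFFFFFF484FFFFFF
"""

print(differing(parse_registers(frame_77), parse_registers(frame_76)))
```

An empty list means every register that was parsed carries the same value in both frames, which is the case for the two entries posted here.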
    
    
    Do diagnostics find this error?
    
    >>>show error cpu0
    
    If you can post the SROM setup values, we could sanity check them:
    
    >>>e -b -n a iic_cpu0:f5
    
    
    Since you say that the CPU(s) have been replaced, is it possible that
    you have a failed power supply and you're running off of one rather
    than two supplies?
    
    /Chet