
Conference wrksys::alphastation

Title: Alpha Workstation Conference
Notice: See note 1.* for conference notices
Moderator: WRKSYS::HOUSE
Created: Wed Sep 07 1994
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1996
Total number of notes: 9122

1966.0. "alphastation 255 machine check entry" by KAOT01::B_CORBIN (dtn 640-7420 ) Mon May 12 1997 18:22

I've attached an error log from an AlphaStation 255.
    
    This is a machine check that I am trying to decode. The
    machine check decoder indicates a possible main memory
    SIMM at fault. Can I break this down any further?
    According to the AlphaStation 255 service guide (Windows help file),
    the SIMM can be found with the aid of the
    Syndrome register, but it is all zeros.
    
    Any help appreciated.
    Brian Corbin



******************************* ENTRY    1 ********************************


Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number             0.
Timestamp of occurrence              12-MAY-1997 11:21:57
Host name                            cides

System type register      x0000000D  AlphaStation 400 or 2xx
Number of CPUs (mpnum)    x00000001
CPU logging event (mperr) x00000000

Event validity                    1. O/S claims event is valid
Event severity                    5. Low Priority
Entry type                      300. Start-Up ASCII Message Type

SWI Minor class                   9. ASCII Message
SWI Minor sub class               3. Startup

ASCII Message
    Alpha boot: available memory from 0x10e8000 to 0x9ffe000
    Digital UNIX V4.0B  (Rev. 564); Thu Apr 24 16:11:29 PDT 1997
    physical memory = 160.00 megabytes.
    available memory = 143.10 megabytes.
    using 606 buffers containing 4.73 megabytes of memory
    AlphaStation 255/233 system
    DECchip 21071
    82378IB (SIO) PCI/ISA Bridge
    Firmware revision: 6.4
    PALcode: OSF version 1.46
    pci0 at nexus
    psiop0 at pci0 slot 6
    Loading SIOP: script 801000, reg 82910000, data 406c8fb0
    scsi0 at psiop0 slot 0
    rz0 at scsi0 target 0 lun 0 (LID=0) (DEC     RZ26F    (C) DEC 630J)
    rz4 at scsi0 target 4 lun 0 (LID=1) (DEC     RRD45   (C) DEC  0436)
    isa0 at pci0
    gpc0 at isa0
    ace0 at isa0
    ace1 at isa0
    lp0 at isa0
    fdi0 at isa0
    fd0 at fdi0 unit 0
    pci1000 at pci0 slot 12
    isp0 at pci1000 slot 0
    isp0: QLOGIC ISP1020A
    isp0: Firmware revision 5.1 (loaded by console)
    scsi1 at isp0 slot 0
    rz8 at scsi1 target 0 lun 0 (LID=2) (DEC     RZ28D    (C) DEC 0008)
    (Wide16)
    rz9 at scsi1 target 1 lun 0 (LID=3) (DEC     RZ28D    (C) DEC 0008)
    (Wide16)
    rz10 at scsi1 target 2 lun 0 (LID=4) (DEC     RZ28D    (C) DEC 0008)
    (Wide16)
    tz11 at scsi1 target 3 lun 0 (LID=5) (DEC     TLZ09     (C)DEC 0167)
    trio0 at pci0 slot 13
    trio0: S3 Trio64 (SVGA) - Plug N' Play - 2.0 Mb
    tu0: DECchip 21040-AA: Revision: 2.4
    tu0 at pci0 slot 14
    tu0: DEC TULIP Ethernet Interface, hardware address: 00-00-F8-23-3C-AD
    tu0: console mode: selecting 10BaseT (UTP) port: half duplex
    lvm0: configured.
    lvm1: configured.
    kernel console: trio0
    dli: configured
    ATM Subsystem configured with 1 restart threads
    ATM UNI 3.x signalling: configured
    ATM IP interface: configured


                                                         

Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number             2.
Timestamp of occurrence              12-MAY-1997 11:06:00
Host name                            cides

System type register      x0000000D  AlphaStation 400 or 2xx
Number of CPUs (mpnum)    x00000001
CPU logging event (mperr) x00000000

Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                      302. ASCII Panic Message Type

SWI Minor class                   9. ASCII Message
SWI Minor sub class               1. Panic

ASCII Message                        panic (cpu 0): Machine check - Hardware
                                     error
System Architecture               2. Alpha
Event sequence number             1.
Timestamp of occurrence              12-MAY-1997 11:05:57
Host name                            cides

System type register      x0000000D  AlphaStation 400 or 2xx
Number of CPUs (mpnum)    x00000001
CPU logging event (mperr) x00000000

Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                      100. CPU Machine Check Errors

CPU Minor class                   2. 660 Entry

Byte Count                    x02E8
Processor Specific Offset x00000110
System Specific Offset    x000001A0
PAL Error Type Code       x00000207
PAL Frame Revision        x00000001
- ALPHA CHIP REGISTERS -
PALTEMP1                  x0000000000000000
PALTEMP2                  x000002F800000004
PALTEMP3                  x0000000000000003
PALTEMP4                  xFFFFFC0009F68000
PALTEMP5                  x0000000000000000
PALTEMP6                  xFFFFFC00002A85D0
PALTEMP7                  x0000000000004200
PALTEMP8                  x0000000000000400
PALTEMP9                  x0000000000000000
PALTEMP10                 xFFFFFC00004FE840
PALTEMP11                 x0000000000000000
PALTEMP12                 xFFFFFC00004FEBE0
PALTEMP13                 xFFFFFC00004FEC10
PALTEMP14                 xFFFFFC00004FEC70
PALTEMP15                 xFFFFFC00004FE9E0
PALTEMP16                 xFFFFFC00004FE6B0
PALTEMP17                 xFFFFFFFF89828000
PALTEMP18                 x0000000000000000
PALTEMP19                 xFFFFFFFF89A07A38
PALTEMP20                 xFFFFFC0000699770
PALTEMP21                 x0000000000000000
PALTEMP22                 x00505070727A7A7A
PALTEMP23                 x0000000000000000
PALTEMP24                 x0000000000000000
PALTEMP25                 x0000000000010000
PALTEMP26                 x0000000000000000
PALTEMP27                 x0000000000000000
PALTEMP28                 x0000000000E8A000
PALTEMP29                 xFFFFFFFC00000000
PALTEMP30                 x0000000000000001
PALTEMP31                 x0000000009E2BA38
Exception Address Reg     xFFFFFC00002AA1CC
                                     Exception Address Reg Provides Information
                                        About The Most Recent Exception.
                                     Address Points to Native-Mode Instruction
                                     If Machine Check or Math Trap Exception,
                                        On Return Subtract 4 from Exception PC.
                                     Last Exception Addr PC:  x3FFFFF00000AA873
Exception Summary Reg     x0000000000000000
Exception Mask Reg        x0000000000000000
Icache Ctrl & Status Reg  x000002F800000004
                                     Performance Counters Disabled
                                     Empty Wrt Buffer Before Issuing Next Inst
                                     Branch Prediction Selection: Not Taken
                                     JSR Stack is Disabled
                                     Instructions Can Only Single Issue
                                     If Not in PALmode, Executing Reserved Inst
                                        Opcode Will Result in OPCDEC Exception.
                                     Super Page Istream Memory Mapping Disabled
                                     Float Point Inst Will Cause FEN Exception
                                     Icache Addr Space Numb:  x0000000000000000
PALcode Base Address Reg  x0000000000014000
                                     PALcode Base Address:  x0000000000000005
Hardware Int Enable Reg   x00000000000014F0
                                     CRD Error Interrupts Enabled
                                     CPU Hrdw Interrupts Enabled Irq_h Pins 0,2
                                     CPU Hrdw Interrupts Enbld Irq_h Pins 3,4,5
                                     Performance Cntr 0 & 1 Interrupts Disabled
                                     Serial Line Interrupts Disabled
                                     NO AST Interrupts Enabled In Any Mode
Hardware Int Request Reg  x0000000000001402
                                     Any Hrdw Int Req With Companion Enable Set
                                     NO Softw Int Req With Companion Enable Set
                                     NO AST Int Req With Companion Enable Set
                                     CPU Hrdw Interrupt Request on Irq_h Pin 0
                                     CPU Hrdw Interrupt Request on Irq_h Pin 2
Memory Management CSR     x0000000000003640
                                     MMCSR Valid Only on Mem Mgt Err, DTB Miss,
                                        D-Stream Fault, Dcache Parity Error.
                                     Last Faulting Instruction RA Field: R4
                                     Last Faulting Instruction Opcode Follows:
                                        x1B - Reserved for PALcode
(Data) Cache Status Reg   x0000000000000003
                                     This is EV45 Cache Status Register(C_STAT)
                                     EV45 Chip is Production Version of 21064A
                                     Last Load or Store Missed Dcache
Cache Address Reg         x00000007FFFFFFFF
Abox Control Reg          x000000000000942E
                                     Machine Checks Enabled for Uncorr Errors
                                     CRD Interrupts Enabled
                                     Single Entry Icache Stream Buffer Enabled
                                     Enable Super Page Dstream Virtual Addr Map
                                        VA<33:13> to PA<33:13>, if VA<42:41>=2.
                                     Lock Operation Conforms to Alpha Architect
                                     Dcache Enabled
                                     16K Byte Dcache Selected
                                     Double Invalidate: Both EV45 Dcache Blocks
                                        Addressed By iAdr_h<12:5> Invalidated.
Bus Interface Status Reg  x0000000000000050
Bus Interface Address Reg x00000000000060E0
                                     Address Only Valid if Bus Interface Status
                                        Register Error Bit 0,1,2, or 3 is Set.
                                     BIU Addr adr_h<33:5>:  x0000000000000307
Bus Interface Control Reg x0000000810002225
                                     External Cache (Bcache) Enabled
                                     PARITY MODE: External Cache Parity Enabled
                                     Cache Rams are Output Enable Controlled
                                     Ext Cache Rd Access Time: 3 CPU Cycles
                                     Ext Cache Wrt Cycle Time: 3 CPU Cycles
                                     Size of External Cache:  256 Kbyte
                                     Ext Cache For Phys Addr Quad 3 Disabled
                                     Ext Cache Rd Time Controlling Bcache Reads
                                     Ext Cache Wrt En Ctrl:  x0000000000000001
Fill Syndrome Reg         x0000000000000000
                                     No Error in Low Long Word of Quad Word
                                     No Error in Upper Long Word of Quad Word
Fill Address Reg          x0000000000006100
                                     Addr Only Valid if Bus Interface Stat Reg
                                        ECC(Bit 8) or PARITY(Bit 10) Error Set.
                                     Cache Blk Phy Adr<33:5>  x0000000000000308
Virtual Address Reg       x0000000000006170
                                     Dstream FLT/DTB Miss VA  x0000000000006170
Bcache Tag Reg            xA028003C24484850
                                     Last Bcache Access Resulted in a Miss
                                     Parity Bit for Bcache Tag Status Bits Clr
                                     Bcache Tag  Dirty Bit  Clear
                                     Bcache Tag  Shared Bit  Clear
                                     Bcache Tag  Valid Bit  Set
                                     Bcache Tag Addrress  Parity Bit  Asserted
                                     Tag Being Probed:  x0000000000004242

coma_gcr                  x000000007FB200B4
                                     DMA Priority
                                     128 bit wide MEM
                                     Bcache enabled
                                     Bcache long writes
coma_edsr                 x000000007FB221B0
coma_ter                  x000000006FB13FF0
                                     sysTag<21:17> =   x0000000000001FF8
coma_elar                 x000000006FB1FFFF
                                     sysBus<20:5> at time of e x000000000000FFFF

coma_ehar                 x000000006FB11FFB
                                     sysBus<33:21> at time of  x0000000000001FFB

coma_ldlr                 x000000006FB1F937
                                     sysBus<20:5> last locked  x000000000000F937

coma_ldhr                 x000000006FB10000
                                     sysBus<31:21> last locked x0000000000000000

coma_base0                x000000006FB10200
                                     Reg Base Adr <33:23> =  x0000000000000100
coma_base1                x000000006FB10000
                                     Reg Base Adr <33:23> =  x0000000000000000
coma_base2                x0000000047FF0000
                                     Reg Base Adr <33:23> =  x0000000000000000
coma_cnfg0                x0000000047FF00EB
                                     Bank Valid
                                     Bank Size =  32 MB
                                     Column Adr Selection  x0000000000000003
coma_cnfg1                x0000000047FF0067
                                     Bank Valid
                                     Bank Size =  128 MB
                                     Column Adr Selection  x0000000000000001
coma_cnfg2                x0000000047FF0000
                                     Bank Size =  1024 MB
                                     Column Adr Selection  x0000000000000000

epic_dcsr                 xFFFFFFFF800A201D
                                     Translation buffer enabled
                                     Prefetch enabled
                                     Disable correctable error
                                     Uncorrectable Memory Read
                                     Pass 2 Chip
                                     Partial Bypass
                                     PCI Cycle Type =   IO Read
epic_pear                 x0000000000822000
                                     PCI error address  x0000000000822000
epic_sear                 x00000000015ED310
                                     DMA Address =   x000000000015ED31
epic_tbr1                 x0000000000876000
                                     Translation Base Adr =   x00000000000043B0
epic_tbr2                 x0000000000000000
                                     Translation Base Adr =   x0000000000000000
epic_pbr1                 x00000000008C0000
                                     Scatter/Gather Enabled
                                     Window Enabled
                                     PCI Base Adr  x0000000000000008
epic_pbr2                 x0000000040080000
                                     Scatter/Gather Disabled
                                     Window Enabled
                                     PCI Base Adr  x0000000000000400
epic_pmr1                 x0000000000700000
                                     PCI Mask  x0000000000000007
epic_pmr2                 x000000003FF00000
                                     PCI Mask  x00000000000003FF
epic_harx1                xFFFFFFFF80000000
                                     PCI_ad - memory space =  x0000000000000010
epic_harx2                x0000000000000000
                                     PCI_ad - memory space =  x0000000000000000
epic_pmlt                 x00000000000000FF
                                     Master Latency Timer =   255.
epic_tag0                 x0000000000806000
                                     pci_page  x0000000000000101
epic_tag1                 x0000000000802000
                                     pci_page  x0000000000000101
epic_tag2                 x0000000000803000
                                     Entry Valid
                                     pci_page  x0000000000000101
epic_tag3                 x0000000000801000
                                     Entry Valid
                                     pci_page  x0000000000000100
epic_tag4                 x0000000000807000
                                     Entry Valid
                                     pci_page  x0000000000000101
epic_tag5                 x000000000081F000
                                     Entry Valid
                                     pci_page  x0000000000000103
epic_tag6                 x0000000000821000
                                     Entry Valid
                                     pci_page  x0000000000000104
epic_tag7                 x0000000000823000
                                     Entry Valid
                                     pci_page  x0000000000000105
epic_data0                x00000000000006C6
                                     cpu_page  x00000000000001B1
epic_data1                x00000000000006C2
                                     cpu_page  x00000000000001B0
epic_data2                x00000000000006C2
                                     cpu_page  x00000000000001B0
epic_data3                x00000000000006C0
                                     cpu_page  x00000000000001B0
epic_data4                x00000000000006C6
                                     cpu_page  x00000000000001B1
epic_data5                x0000000000002494
                                     cpu_page  x0000000000000925
epic_data6                x00000000000025AE
                                     cpu_page  x000000000000096B
epic_data7                x00000000000057B4
                                     cpu_page  x00000000000015ED


******************************** ENTRY    4 **************************

--------------------------------------------------------------------------------
BRIAN CORBIN               <machine check decoder>             12-MAY-1997 16:49
--------------------------------------------------------------------------------

Enter the Contents of the BIU_STAT register: 50
BIU_STAT Register = 50

                                   The EV4 requested an External Cycle
                                   The Cycle being performed =  Write Block.
                                   Bit 11 is clear - DCache Fill Reference
                                   The failing Quadword is = 0

Enter the contents of EPIC_DCSR register. 800A201D
Enter the contents of COMA_EDSR register. 7FB221B0

If you are running OpenVMS Enter the PAL ERROR CODE:
If you are running OSF/1 Enter the mchk_code
Enter CODE: 207

The Error code entered above has the following meaning

Uncorrectable Memory Error, An Uncorrectable error was
encountered by the EPIC in the data read from the DMA read
buffer in the DECADE chip during a DMA read or scatter
gather read.
Most likley Broken = MEMORY, CPU CARD

The EPIC_SEAR Register contain the failing Address. Which
you can match to the failing memory bank
                                                               
 Type C to continue. c
No Error Bits are set in the BIU_STAT Register.
                                              
Checking for Multiple Errors in the registers.

Bit 13 is set in the EPIC_DCSR register.
Uncorrectable Memory Error, An Uncorrectable error was
encountered by the EPIC in the data read from the DMA read
buffer in the DECADE chip during a DMA read or scatter
gather read.
Most likley Broken = Memory, CPU CARD

The EPIC_SEAR Register contain the failing Address. Which
 you can match to the failing memory bank

Type C to Continue.


          epic_dcsr                 xFFFFFFFF800A201D
                                     Translation buffer enabled
                                     Prefetch enabled
                                     Disable correctable error
                                     Uncorrectable Memory Read<------------
                                     Pass 2 Chip
                                     Partial Bypass
                                     PCI Cycle Type =   IO Read
epic_pear                 x0000000000822000
                                     PCI error address  x0000000000822000
epic_sear                 x00000000015ED310
                                     DMA Address =   x000000000015ED31<-------

1966.1. "alphastation 255 machine check code=207" by CSC32::HUTMACHER Tue May 13 1997 10:26 (52 lines)

    Hi Brian
    
    getting machine check code=207
    
    5.0.7   0x207  UNCORRECTABLE MEMORY ERROR
    
            EPIC_DCSR<13> = 1  - uMRD - Uncorrectable Memory Error
                            An Uncorrectable error was encountered by
                            the EPIC in the data read from the DMA Read
                            Buffer in the DECADE chip during a DMA Read
                            or Scatter Gather Read.
    
            RECOVERY:       Not recoverable
                            Clear error by writing to EPIC_DCSR at address
                            1 A000 0000, bit<13> with a one.
    
            ANALYSIS:       A MEMORY DMA READ DATA ERROR OCCURRED AT SEAR
    
                            EPIC_SEAR<31:4> - to determine value of
                            sysAdr<33:6>
                            when error was logged.
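
As a rough illustration of the register fields quoted above, the following Python sketch tests EPIC_DCSR bit 13 and extracts EPIC_SEAR<31:4>, using the register values from the error log in .0. The helper names are made up for this example. The shift by 6 simply follows the quoted statement that SEAR<31:4> holds sysAdr<33:6>; it is an assumption here, not something checked against the chip specification.

```python
# Register values copied from the error log in note 1966.0.
EPIC_DCSR = 0xFFFFFFFF800A201D
EPIC_SEAR = 0x00000000015ED310

UMRD_BIT = 13  # EPIC_DCSR<13>, "Uncorrectable Memory Read" per the quoted text


def bit_is_set(value, bit):
    """Return True if the given bit number is set in value."""
    return (value >> bit) & 1 == 1


def sear_field(sear):
    """Extract bits <31:4> of EPIC_SEAR (a 28-bit field)."""
    return (sear >> 4) & 0x0FFFFFFF


def sys_address(sear):
    """Assumption: SEAR<31:4> holds sysAdr<33:6>, so shift the field left by 6."""
    return sear_field(sear) << 6


print("uMRD set:", bit_is_set(EPIC_DCSR, UMRD_BIT))
print("SEAR<31:4>: 0x%X" % sear_field(EPIC_SEAR))
print("sysAdr (assumed): 0x%X" % sys_address(EPIC_SEAR))
```

The extracted field matches the "DMA Address" value that the log decoder prints for epic_sear.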
    
    
      The EPIC detected the memory error during I/O DMA, so the fill address
      and syndrome are not captured. The only thing you can go by is the
      address in use at the time, EPIC_SEAR<31:4>.
    
      in this case 
    
    epic_sear                 x00000000015ED310
                                         DMA Address =   x000000000015ED31
    
    15ED31 is in the 1.4 to 1.5 meg region of memory.
    
    Your system has 32meg SIMMs in bank1, so one of them is suspect. There
    is no way I know of to narrow this down further.
    
    
    bank1 32meg simms has address range 0-128meg
    bank0 8meg simms has address range 128-160meg
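
A minimal sketch of the address-to-bank lookup described here, in Python. The bank ranges are the ones stated in this note; the function and variable names are hypothetical. Two readings of the SEAR field are tried: the raw value 0x15ED31 taken as a byte address, and the same value shifted left by 6 on the assumption that SEAR<31:4> holds sysAdr<33:6>. Both readings appear to land in the same bank, so the conclusion about the suspect SIMMs does not depend on which one is right.

```python
MB = 1 << 20

# Bank address ranges as stated in this note (start inclusive, end exclusive).
BANK_RANGES = [
    ("bank1 (32 MB SIMMs)", 0 * MB, 128 * MB),
    ("bank0 (8 MB SIMMs)", 128 * MB, 160 * MB),
]


def bank_for_address(addr):
    """Return the name of the bank whose range contains addr, or None."""
    for name, start, end in BANK_RANGES:
        if start <= addr < end:
            return name
    return None


field = 0x15ED31  # EPIC_SEAR<31:4> from the error log

readings = [
    ("field taken as byte address", field),
    ("field << 6 (assumed sysAdr<33:6>)", field << 6),
]

for label, addr in readings:
    print("%s: 0x%X (%.1f MB) -> %s"
          % (label, addr, addr / MB, bank_for_address(addr)))
```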
    
    
    The only time I have seen 207 machine check codes, the problem was with
    third-party SIMMs, and we ended up putting in DEC (oops, Digital) SIMMs
    to stop it.
    
    Also, it is recommended that the larger SIMMs be in the lowest-numbered
    banks, i.e. 32meg SIMMs in bank0 and 8meg SIMMs in bank1.
    
    good luck 
    
    jim hutmacher mvhs colorado csc 800-354-9000 ext 25561
   
1966.2. "memory bank layout" by KAOT01::B_CORBIN (dtn 640-7420) Wed May 14 1997 09:39 (16 lines)

    Jim
    Thanks for your analysis. I talked to the branch engineer, and
    it appears that the bank 1 SIMMs are from Dataram (4*32meg).
    
    Your reply pointed out an interesting fact about the AlphaStation 255.
    
    The service guide states in the memory configuration section
    that the initialization code will set the base address of the largest
    bank to the lowest address. In troubleshooting memory, a good
    rule to follow would be to look at a >>>show memory printout
    beforehand.
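
The rule from the service guide (largest bank gets the lowest base address) can be sketched in a few lines of Python. This is only an illustration of the ordering, not the actual firmware logic; the function name and the dictionary layout are invented for the example, and the bank sizes are the ones discussed in this thread.

```python
def assign_base_addresses(bank_sizes_mb):
    """Return {bank: (start_mb, end_mb)}, placing the largest bank lowest.

    bank_sizes_mb maps a physical bank number to its size in MB.
    Empty banks (size 0) are skipped.
    """
    layout = {}
    next_base = 0
    # Largest bank first; banks of equal size keep their original order.
    for bank in sorted(bank_sizes_mb, key=bank_sizes_mb.get, reverse=True):
        size = bank_sizes_mb[bank]
        if size == 0:
            continue
        layout[bank] = (next_base, next_base + size)
        next_base += size
    return layout


# Bank 0 holds 32 MB, bank 1 holds 128 MB, as on the system in this thread.
layout = assign_base_addresses({0: 32, 1: 128})
for bank, (start, end) in layout.items():
    print("bank%d: %d-%d MB" % (bank, start, end))
```

With these sizes, the sketch puts bank 1 at 0-128 MB and bank 0 at 128-160 MB, which is the layout given in .1. It also shows why a physical slot number alone does not tell you which address range a SIMM covers.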
    
    
    
    
    
1966.3. by WRKSYS::DONALD (So Long, And Thanks For All The Fish) Wed May 14 1997 11:14 (9 lines)

    Hi,
    
    Re: .1
    
    Unless the firmware has changed, it doesn't matter where the largest
    SIMMs are placed; the firmware will automagically locate them at bank 0.
    
    Cheers,
    Terry