Title: | Alpha Workstation Conference |
Notice: | See note 1.* for conference notices |
Moderator: | WRKSYS::HOUSE |
Created: | Wed Sep 07 1994 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1996 |
Total number of notes: | 9122 |
1966.0. "alphastation 255 machine check entry" by KAOT01::B_CORBIN (dtn 640-7420 ) Mon May 12 1997 18:22
I've attached an error log from an AlphaStation 255.
This is a machine check that I am trying to decode. The
machine check decoder indicates a possible main memory
SIMM at fault. Can I break this down any further?
According to the Windows help file version of the AlphaStation 255
service guide, the SIMM can be found with the aid of the
Syndrome register, but it is all zeros.
Any help appreciated.
Brian Corbin
******************************* ENTRY 1 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 0.
Timestamp of occurrence 12-MAY-1997 11:21:57
Host name cides
System type register x0000000D AlphaStation 400 or 2xx
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 300. Start-Up ASCII Message Type
SWI Minor class 9. ASCII Message
SWI Minor sub class 3. Startup
ASCII Message
Alpha boot: available memory from 0x10e8000 to 0x9ffe000
Digital UNIX V4.0B (Rev. 564); Thu Apr 24 16:11:29 PDT 1997
physical memory = 160.00 megabytes.
available memory = 143.10 megabytes.
using 606 buffers containing 4.73 megabytes of memory
AlphaStation 255/233 system
DECchip 21071
82378IB (SIO) PCI/ISA Bridge
Firmware revision: 6.4
PALcode: OSF version 1.46
pci0 at nexus
psiop0 at pci0 slot 6
Loading SIOP: script 801000, reg 82910000, data 406c8fb0
scsi0 at psiop0 slot 0
rz0 at scsi0 target 0 lun 0 (LID=0) (DEC RZ26F (C) DEC 630J)
rz4 at scsi0 target 4 lun 0 (LID=1) (DEC RRD45 (C) DEC 0436)
isa0 at pci0
gpc0 at isa0
ace0 at isa0
ace1 at isa0
lp0 at isa0
fdi0 at isa0
fd0 at fdi0 unit 0
pci1000 at pci0 slot 12
isp0 at pci1000 slot 0
isp0: QLOGIC ISP1020A
isp0: Firmware revision 5.1 (loaded by console)
scsi1 at isp0 slot 0
rz8 at scsi1 target 0 lun 0 (LID=2) (DEC RZ28D (C) DEC 0008)
(Wide16)
rz9 at scsi1 target 1 lun 0 (LID=3) (DEC RZ28D (C) DEC 0008)
(Wide16)
rz10 at scsi1 target 2 lun 0 (LID=4) (DEC RZ28D (C) DEC 0008)
(Wide16)
tz11 at scsi1 target 3 lun 0 (LID=5) (DEC TLZ09 (C)DEC 0167)
trio0 at pci0 slot 13
trio0: S3 Trio64 (SVGA) - Plug N' Play - 2.0 Mb
tu0: DECchip 21040-AA: Revision: 2.4
tu0 at pci0 slot 14
tu0: DEC TULIP Ethernet Interface, hardware address: 00-00-F8-23-3C-AD
tu0: console mode: selecting 10BaseT (UTP) port: half duplex
lvm0: configured.
lvm1: configured.
kernel console: trio0
dli: configured
ATM Subsystem configured with 1 restart threads
ATM UNI 3.x signalling: configured
ATM IP interface: configured
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 2.
Timestamp of occurrence 12-MAY-1997 11:06:00
Host name cides
System type register x0000000D AlphaStation 400 or 2xx
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 302. ASCII Panic Message Type
SWI Minor class 9. ASCII Message
SWI Minor sub class 1. Panic
ASCII Message panic (cpu 0): Machine check - Hardware
error
System Architecture 2. Alpha
Event sequence number 1.
Timestamp of occurrence 12-MAY-1997 11:05:57
Host name cides
System type register x0000000D AlphaStation 400 or 2xx
Number of CPUs (mpnum) x00000001
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 2. 660 Entry
Byte Count x02E8
Processor Specific Offset x00000110
System Specific Offset x000001A0
PAL Error Type Code x00000207
PAL Frame Revision x00000001
- ALPHA CHIP REGISTERS -
PALTEMP1 x0000000000000000
PALTEMP2 x000002F800000004
PALTEMP3 x0000000000000003
PALTEMP4 xFFFFFC0009F68000
PALTEMP5 x0000000000000000
PALTEMP6 xFFFFFC00002A85D0
PALTEMP7 x0000000000004200
PALTEMP8 x0000000000000400
PALTEMP9 x0000000000000000
PALTEMP10 xFFFFFC00004FE840
PALTEMP11 x0000000000000000
PALTEMP12 xFFFFFC00004FEBE0
PALTEMP13 xFFFFFC00004FEC10
PALTEMP14 xFFFFFC00004FEC70
PALTEMP15 xFFFFFC00004FE9E0
PALTEMP16 xFFFFFC00004FE6B0
PALTEMP17 xFFFFFFFF89828000
PALTEMP18 x0000000000000000
PALTEMP19 xFFFFFFFF89A07A38
PALTEMP20 xFFFFFC0000699770
PALTEMP21 x0000000000000000
PALTEMP22 x00505070727A7A7A
PALTEMP23 x0000000000000000
PALTEMP24 x0000000000000000
PALTEMP25 x0000000000010000
PALTEMP26 x0000000000000000
PALTEMP27 x0000000000000000
PALTEMP28 x0000000000E8A000
PALTEMP29 xFFFFFFFC00000000
PALTEMP30 x0000000000000001
PALTEMP31 x0000000009E2BA38
Exception Address Reg xFFFFFC00002AA1CC
Exception Address Reg Provides Information
About The Most Recent Exception.
Address Points to Native-Mode Instruction
If Machine Check or Math Trap Exception,
On Return Subtract 4 from Exception PC.
Last Exception Addr PC: x3FFFFF00000AA873
Exception Summary Reg x0000000000000000
Exception Mask Reg x0000000000000000
Icache Ctrl & Status Reg x000002F800000004
Performance Counters Disabled
Empty Wrt Buffer Before Issuing Next Inst
Branch Prediction Selection: Not Taken
JSR Stack is Disabled
Instructions Can Only Single Issue
If Not in PALmode, Executing Reserved Inst
Opcode Will Result in OPCDEC Exception.
Super Page Istream Memory Mapping Disabled
Float Point Inst Will Cause FEN Exception
Icache Addr Space Numb: x0000000000000000
PALcode Base Address Reg x0000000000014000
PALcode Base Address: x0000000000000005
Hardware Int Enable Reg x00000000000014F0
CRD Error Interrupts Enabled
CPU Hrdw Interrupts Enabled Irq_h Pins 0,2
CPU Hrdw Interrupts Enbld Irq_h Pins 3,4,5
Performance Cntr 0 & 1 Interrupts Disabled
Serial Line Interrupts Disabled
NO AST Interrupts Enabled In Any Mode
Hardware Int Request Reg x0000000000001402
Any Hrdw Int Req With Companion Enable Set
NO Softw Int Req With Companion Enable Set
NO AST Int Req With Companion Enable Set
CPU Hrdw Interrupt Request on Irq_h Pin 0
CPU Hrdw Interrupt Request on Irq_h Pin 2
Memory Management CSR x0000000000003640
MMCSR Valid Only on Mem Mgt Err, DTB Miss,
D-Stream Fault, Dcache Parity Error.
Last Faulting Instruction RA Field: R4
Last Faulting Instruction Opcode Follows:
x1B - Reserved for PALcode
(Data) Cache Status Reg x0000000000000003
This is EV45 Cache Status Register(C_STAT)
EV45 Chip is Production Version of 21064A
Last Load or Store Missed Dcache
Cache Address Reg x00000007FFFFFFFF
Abox Control Reg x000000000000942E
Machine Checks Enabled for Uncorr Errors
CRD Interrupts Enabled
Single Entry Icache Stream Buffer Enabled
Enable Super Page Dstream Virtual Addr Map
VA<33:13> to PA<33:13>, if VA<42:41>=2.
Lock Operation Conforms to Alpha Architect
Dcache Enabled
16K Byte Dcache Selected
Double Invalidate: Both EV45 Dcache Blocks
Addressed By iAdr_h<12:5> Invalidated.
Bus Interface Status Reg x0000000000000050
Bus Interface Address Reg x00000000000060E0
Address Only Valid if Bus Interface Status
Register Error Bit 0,1,2, or 3 is Set.
BIU Addr adr_h<33:5>: x0000000000000307
Bus Interface Control Reg x0000000810002225
External Cache (Bcache) Enabled
PARITY MODE: External Cache Parity Enabled
Cache Rams are Output Enable Controlled
Ext Cache Rd Access Time: 3 CPU Cycles
Ext Cache Wrt Cycle Time: 3 CPU Cycles
Size of External Cache: 256 Kbyte
Ext Cache For Phys Addr Quad 3 Disabled
Ext Cache Rd Time Controlling Bcache Reads
Ext Cache Wrt En Ctrl: x0000000000000001
Fill Syndrome Reg x0000000000000000
No Error in Low Long Word of Quad Word
No Error in Upper Long Word of Quad Word
Fill Address Reg x0000000000006100
Addr Only Valid if Bus Interface Stat Reg
ECC(Bit 8) or PARITY(Bit 10) Error Set.
Cache Blk Phy Adr<33:5> x0000000000000308
Virtual Address Reg x0000000000006170
Dstream FLT/DTB Miss VA x0000000000006170
Bcache Tag Reg xA028003C24484850
Last Bcache Access Resulted in a Miss
Parity Bit for Bcache Tag Status Bits Clr
Bcache Tag Dirty Bit Clear
Bcache Tag Shared Bit Clear
Bcache Tag Valid Bit Set
Bcache Tag Addrress Parity Bit Asserted
Tag Being Probed: x0000000000004242
coma_gcr x000000007FB200B4
DMA Priority
128 bit wide MEM
Bcache enabled
Bcache long writes
coma_edsr x000000007FB221B0
coma_ter x000000006FB13FF0
sysTag<21:17> = x0000000000001FF8
coma_elar x000000006FB1FFFF
sysBus<20:5> at time of e x000000000000FFFF
coma_ehar x000000006FB11FFB
sysBus<33:21> at time of x0000000000001FFB
coma_ldlr x000000006FB1F937
sysBus<20:5> last locked x000000000000F937
coma_ldhr x000000006FB10000
sysBus<31:21> last locked x0000000000000000
coma_base0 x000000006FB10200
Reg Base Adr <33:23> = x0000000000000100
coma_base1 x000000006FB10000
Reg Base Adr <33:23> = x0000000000000000
coma_base2 x0000000047FF0000
Reg Base Adr <33:23> = x0000000000000000
coma_cnfg0 x0000000047FF00EB
Bank Valid
Bank Size = 32 MB
Column Adr Selection x0000000000000003
coma_cnfg1 x0000000047FF0067
Bank Valid
Bank Size = 128 MB
Column Adr Selection x0000000000000001
coma_cnfg2 x0000000047FF0000
Bank Size = 1024 MB
Column Adr Selection x0000000000000000
epic_dcsr xFFFFFFFF800A201D
Translation buffer enabled
Prefetch enabled
Disable correctable error
Uncorrectable Memory Read
Pass 2 Chip
Partial Bypass
PCI Cycle Type = IO Read
epic_pear x0000000000822000
PCI error address x0000000000822000
epic_sear x00000000015ED310
DMA Address = x000000000015ED31
epic_tbr1 x0000000000876000
Translation Base Adr = x00000000000043B0
epic_tbr2 x0000000000000000
Translation Base Adr = x0000000000000000
epic_pbr1 x00000000008C0000
Scatter/Gather Enabled
Window Enabled
PCI Base Adr x0000000000000008
epic_pbr2 x0000000040080000
Scatter/Gather Disabled
Window Enabled
PCI Base Adr x0000000000000400
epic_pmr1 x0000000000700000
PCI Mask x0000000000000007
epic_pmr2 x000000003FF00000
PCI Mask x00000000000003FF
epic_harx1 xFFFFFFFF80000000
PCI_ad - memory space = x0000000000000010
epic_harx2 x0000000000000000
PCI_ad - memory space = x0000000000000000
epic_pmlt x00000000000000FF
Master Latency Timer = 255.
epic_tag0 x0000000000806000
pci_page x0000000000000101
epic_tag1 x0000000000802000
pci_page x0000000000000101
epic_tag2 x0000000000803000
Entry Valid
pci_page x0000000000000101
epic_tag3 x0000000000801000
Entry Valid
pci_page x0000000000000100
epic_tag4 x0000000000807000
Entry Valid
pci_page x0000000000000101
epic_tag5 x000000000081F000
Entry Valid
pci_page x0000000000000103
epic_tag6 x0000000000821000
Entry Valid
pci_page x0000000000000104
epic_tag7 x0000000000823000
Entry Valid
pci_page x0000000000000105
epic_data0 x00000000000006C6
cpu_page x00000000000001B1
epic_data1 x00000000000006C2
cpu_page x00000000000001B0
epic_data2 x00000000000006C2
cpu_page x00000000000001B0
epic_data3 x00000000000006C0
cpu_page x00000000000001B0
epic_data4 x00000000000006C6
cpu_page x00000000000001B1
epic_data5 x0000000000002494
cpu_page x0000000000000925
epic_data6 x00000000000025AE
cpu_page x000000000000096B
epic_data7 x00000000000057B4
cpu_page x00000000000015ED
******************************** ENTRY 4 **************************
--------------------------------------------------------------------------------
BRIAN CORBIN <machine check decoder> 12-MAY-1997 16:49
--------------------------------------------------------------------------------
Enter the Contents of the BIU_STAT register: 50
BIU_STAT Register = 50
The EV4 requested an External Cycle
The Cycle being performed = Write Block.
Bit 11 is clear - DCache Fill Reference
The failing Quadword is = 0
Enter the contents of EPIC_DCSR register. 800A201D
Enter the contents of COMA_EDSR register. 7FB221B0
If you are running OpenVMS Enter the PAL ERROR CODE:
If you are running OSF/1 Enter the mchk_code
Enter CODE: 207
The Error code entered above has the following meaning
Uncorrectable Memory Error, An Uncorrectable error was
encountered by the EPIC in the data read from the DMA read
buffer in the DECADE chip during a DMA read or scatter
gather read.
Most likley Broken = MEMORY, CPU CARD
The EPIC_SEAR Register contain the failing Address. Which
you can match to the failing memory bank
Type C to continue. c
No Error Bits are set in the BIU_STAT Register.
Checking for Multiple Errors in the registers.
Bit 13 is set in the EPIC_DCSR register.
Uncorrectable Memory Error, An Uncorrectable error was
encountered by the EPIC in the data read from the DMA read
buffer in the DECADE chip during a DMA read or scatter
gather read.
Most likley Broken = Memory, CPU CARD
The EPIC_SEAR Register contain the failing Address. Which
you can match to the failing memory bank
Type C to Continue.
epic_dcsr xFFFFFFFF800A201D
Translation buffer enabled
Prefetch enabled
Disable correctable error
Uncorrectable Memory Read<------------
Pass 2 Chip
Partial Bypass
PCI Cycle Type = IO Read
epic_pear x0000000000822000
PCI error address x0000000000822000
epic_sear x00000000015ED310
DMA Address = x000000000015ED31<-------
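The bit tests the decoder reports can be checked by hand. The following is a minimal Python sketch, not the decoder's own code: it uses only the register values from the log above, the statement that bit 13 of EPIC_DCSR flags the uncorrectable memory read, and the log's note that BIU_STAT bits 0 to 3 are the error bits. The helper names `bit_is_set` and `set_bits` are made up for illustration.

```python
# Register values copied from the error log above.
EPIC_DCSR = 0x800A201D
BIU_STAT = 0x50


def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit position is 1 in value."""
    return (value >> bit) & 1 == 1


def set_bits(value: int) -> list:
    """List every bit position that is 1 in value, lowest first."""
    return [b for b in range(value.bit_length()) if bit_is_set(value, b)]


# Bit 13 of EPIC_DCSR is the one the decoder calls out.
dcsr_umrd = bit_is_set(EPIC_DCSR, 13)

# The log says the BIU address is only valid if error bit 0, 1, 2 or 3 is set.
biu_error_bits = [b for b in range(4) if bit_is_set(BIU_STAT, b)]

print("EPIC_DCSR bit 13 set:", dcsr_umrd)
print("BIU_STAT bits set:", set_bits(BIU_STAT))
print("BIU_STAT error bits (0-3) set:", biu_error_bits)
```

With these values bit 13 of EPIC_DCSR is set and none of BIU_STAT bits 0 to 3 are, which agrees with the decoder's "No Error Bits are set in the BIU_STAT Register" and explains why the CPU-side fill address and syndrome registers carry nothing useful here.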
T.R | Title | User | Personal Name | Date | Lines |
---|
1966.1 | alphastation 255 machine check code=207 | CSC32::HUTMACHER | | Tue May 13 1997 10:26 | 52 |
| Hi Brian
getting machine check code=207
5.0.7 0x207 UNCORRECTABLE MEMORY ERROR
EPIC_DCSR<13> = 1 - uMRD - Uncorrectable Memory Error
An Uncorrectable error was encountered by
the EPIC in the data read from the DMA Read
Buffer in the DECADE chip during a DMA Read
or Scatter Gather Read.
RECOVERY: Not recoverable
Clear error by writing to EPIC_DCSR at address
1 A000 0000, bit<13> with a one.
ANALYSIS: A MEMORY DMA READ DATA ERROR OCCURRED AT SEAR
EPIC_SEAR<31:4> - to determine value of
sysAdr<33:6>
when error was logged.
the epic detected the memory error during io dma, so fill_addr and syndrome
are not captured. the only thing you can go by is the address being used
at the time, EPIC_SEAR<31:4>
in this case
epic_sear x00000000015ED310
DMA Address = x000000000015ED31
15ED31 is in the 1.4 to 1.5 meg region of memory
your system has bank1 of 32meg simms, so one of them is suspect. there
is no way i know of to narrow this down further.
bank1 32meg simms has address range 0-128meg
bank0 8meg simms has address range 128-160meg
the only time i have seen a 207 machine check code, the problem was with 3rd party
simms, and we ended up putting in dec (oops, digital) simms to stop it
also it is recommended that larger simms be in the lowest bank #'s,
aka 32meg simms in bank0 and 8meg simms in bank1
jim hutmacher mvhs colorado csc 800-354-9000 ext 25561
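
The address-to-bank step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bank ranges are the ones quoted in this reply, the field extraction follows the "EPIC_SEAR<31:4>" wording, and the left shift by 6 assumes the field really holds sysAdr<33:6> as the analysis text says. The names `sear_field` and `bank_for` are hypothetical helpers, not part of any DEC tool.

```python
MB = 1024 * 1024

# Value copied from the error log.
EPIC_SEAR = 0x015ED310

# Bank address ranges as quoted in this reply (start inclusive, end exclusive).
BANKS = {
    "bank1 (32meg simms)": (0 * MB, 128 * MB),
    "bank0 (8meg simms)": (128 * MB, 160 * MB),
}


def sear_field(sear: int) -> int:
    """Extract bits <31:4> of EPIC_SEAR (a 28-bit field)."""
    return (sear >> 4) & 0x0FFFFFFF


def bank_for(addr: int) -> str:
    """Return the name of the bank whose range contains addr."""
    for name, (start, end) in BANKS.items():
        if start <= addr < end:
            return name
    return "no bank"


field = sear_field(EPIC_SEAR)

# Assumption: the field is sysAdr<33:6>, so the byte address is field << 6.
shifted_addr = field << 6

print(f"field        = {field:#x}")
print(f"as-is        = {field / MB:.2f} MB -> {bank_for(field)}")
print(f"shifted by 6 = {shifted_addr / MB:.2f} MB -> {bank_for(shifted_addr)}")
```

Whether the field is read as-is or shifted, the address falls below 128meg, so with the layout quoted above the suspect SIMMs are the 32meg ones in bank1 either way.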
|
1966.2 | memory bank layout | KAOT01::B_CORBIN | dtn 640-7420 | Wed May 14 1997 09:39 | 16 |
| Jim
Thanks for your analysis. I talked to the branch engineer and
it appears that the bank 1 SIMMs are from Dataram (4*32meg).
Your reply pointed out an interesting fact about the AlphaStation 255.
The memory configuration section of the service guide states
that the initialization code will set the base address of the largest
bank to the lowest address. When troubleshooting memory, a good
rule to follow would be to look at a >>>show memory printout
beforehand.
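
A small Python sketch of that rule follows, assuming the firmware simply orders banks by size, largest first, and stacks them from address 0. The bank sizes are the ones shown in the log (coma_cnfg0 = 32 MB, coma_cnfg1 = 128 MB); the function name `assign_bases` is made up, and the handling of equal-sized banks is a guess, not something the service guide text quoted here specifies.

```python
MB = 1024 * 1024


def assign_bases(bank_sizes: dict) -> dict:
    """Place the largest bank at address 0 and stack the rest above it.

    Returns {bank name: (start, end)} with end exclusive.
    """
    layout = {}
    base = 0
    # Largest bank first; ties keep their original order (an assumption).
    for name, size in sorted(bank_sizes.items(), key=lambda kv: kv[1], reverse=True):
        layout[name] = (base, base + size)
        base += size
    return layout


# Bank sizes as reported in the coma_cnfg0 / coma_cnfg1 entries of the log.
layout = assign_bases({"bank0": 32 * MB, "bank1": 128 * MB})

for name, (start, end) in layout.items():
    print(f"{name}: {start // MB}-{end // MB} meg")
```

For this system the sketch puts bank1 at 0-128 meg and bank0 at 128-160 meg, which matches the ranges given in .1, so the physical bank number alone does not tell you which SIMMs hold a low address.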
|
1966.3 | | WRKSYS::DONALD | So Long, And Thanks For All The Fish | Wed May 14 1997 11:14 | 9 |
| Hi,
Re: .1
Unless the firmware has changed, it doesn't matter where the largest
SIMMs are placed; the firmware will automagically locate them at bank 0.
Cheers,
Terry
|