| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 589.1 |  | MAY21::CUMMINS |  | Tue May 06 1997 14:06 | 30 | 
|  |     Nothing stands out.. The 84000000 in CAP_ERR from the INFO 5 output is
    from the last MCHK taken (VMS takes NXM when probing the system bus
    (empty slot)). Not a real error. Just a stale sizing error. There's
    also soft error environment data in the frame, but that's from the 
    Power, Fan, Temp Status Normal message one gets across a reset/boot.
    
    Some thoughts/comments:
    
      1. Have you tried taking the newly-added memory pair and swapping it
         for the original pair (and running with only 1GB) to see whether
         the problem is software versus hardware? Problem could possibly be
         the motherboard's second memory slot pair, though unlikely, so the
         results of said experiment would not be foolproof / 100% obvious.
    
      2. Next time you boot, do the following:
    
          P00>>> b -h <device,flag,file_list>
          .
          .
          P00>>> info 1
    
          Are there any bad pages marked out of the bitmap passed to VMS?
    
       3. Nothing in the VMS system error log? No recoverable errors, etc.?
          Have you tried running V2.4 or V2.3 (with the latest KNL updates)
          DECevent on this system?
    
    Other than the above, I'm fresh out of ideas.. Without more data..
    
    BC
 | 
| 589.2 | may be graphics card | WRKSYS::RICHARDSON |  | Tue May 06 1997 14:10 | 5 | 
|  |     What graphics card is in this system?  Several of them won't work with
    >1G memory (not sure which are even supported on this particular system
    anyhow).
    
    /Charlotte
 | 
| 589.3 | could PCI0 problems cause this ? | GIDDAY::FLAWN |  | Tue May 06 1997 14:39 | 36 | 
|  |     Thanks,
    I'm waiting for detailed config information from the system - it looks
    like there's no graphics card in it from what's shown, though if there
    was it would be a reasonable idea to go back to the old S3 TRIO. 
                                                   
    My inclination is to look at the hardware revs on parts and see if
    anything shows.
    I'm not sure if DECevent is on the system (it should be !) but because
    very recent errors are going to be still held in the in memory errorlog
    buffers we may luck out there but it's worth making sure it's been
    checked - thanks.
    Something I'm not fully clear on .... with the system not even
    resetting would that imply that either :
    - we're hung at hardware IPL
    - PCI bus 0 is potentially having a problem
    I'm not sure what happens with a reset if we're at hardware IPL here.
    The other thing I was thinking of was to change the bus 0 configuration
    by moving something like the FDDI and ethernet adapters away (assuming 
    both machines have the same - which is why I need the info). Or
    removing the FDDI adapter altogether if it's not connected (the
    link_unavail makes it look maybe unconnected). This would more or less
    presuppose some kind of weirdo problem with the configuration, but it's
    probably not all that common (to include CIPCA and multiple of what seem
    to be KZPDA's etc).
    If this is a path not worth pursuing please let us know !
      
    Regards and thanks,
    Dave Flawn
    CSC Sydney
 | 
| 589.4 |  | MAY21::CUMMINS |  | Tue May 06 1997 15:16 | 15 | 
|  |     The only problem I'm aware of is that certain older rev DEFPAs
    were causing problems when in the same PCI segment as certain
    other devices. And I can't remember any more details than this..
    I believe the symptom was hangs, though. Will try to get more
    details and report back..
    
    You are correct that halts come in over PCI bus zero logic. Halts
    are unmaskable. Unless IOD0 is wedged, the halt should occur. It's
    possible PAL/console got the halt interrupt (IPL 31), but was unable
    to restart - though this would be an extremely unlikely scenario.
    
    More likely, one of the PCI options is wedging the bus. The DEFPA
    would be a good first candidate for removal.
    
    BC
 | 
| 589.5 | becoming clearer now.... | GIDDAY::FLAWN |  | Thu May 08 1997 10:17 | 47 | 
|  | Hi,
Happened again, but more info now.
What I was thinking was not correct. I was working off the same info as in .0
which I took to mean the FDDI card was in PCI0 - it was in PCI1 (it seems the
PCI-PCI bridges make it really weird, though I've yet to physically see this).
So I don't think I understand the slot layout when the KZPDAs are there.
Anyway, the more important information I now have is that the system actually 
loops with :
halted CPU 0
 halt code = 2
 kernel stack not valid halt
 PC = ffffffff8004d290
 CPU 0 restarting
 halted CPU 0
 halt code = 2
 kernel stack not valid halt
 PC = ffffffff8004bde0
 CPU 0 restarting
etc.
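
When the serial console is being captured, a loop like the one above can be reduced to a list of halt codes and PCs, which makes it easier to see whether the PC repeats. This is only a minimal sketch: the regular expression is written against the exact message layout shown above, and the `parse_halts` helper is a hypothetical name, not part of any console or VMS tool.

```python
import re
from collections import Counter

# Console text copied from the loop shown above.
CONSOLE_LOG = """\
halted CPU 0
 halt code = 2
 kernel stack not valid halt
 PC = ffffffff8004d290
 CPU 0 restarting
halted CPU 0
 halt code = 2
 kernel stack not valid halt
 PC = ffffffff8004bde0
 CPU 0 restarting
"""

# One record = "halted CPU n", "halt code = n", a reason line, "PC = hex".
HALT_RE = re.compile(
    r"halted CPU (?P<cpu>\d+)\s+"
    r"halt code = (?P<code>\d+)\s+"
    r"(?P<reason>[^\n]+)\n\s*"
    r"PC = (?P<pc>[0-9a-fA-F]+)"
)


def parse_halts(text):
    """Return one dict per halt record found in captured console text."""
    return [
        {
            "cpu": int(m["cpu"]),
            "code": int(m["code"]),
            "reason": m["reason"].strip(),
            "pc": int(m["pc"], 16),
        }
        for m in HALT_RE.finditer(text)
    ]


halts = parse_halts(CONSOLE_LOG)
for h in halts:
    print(f"CPU {h['cpu']} halt code {h['code']} ({h['reason']}) PC={h['pc']:016x}")

# Count how often each PC shows up across the loop iterations.
print(Counter(f"{h['pc']:016x}" for h in halts))
```

In the two iterations captured here the PCs differ, so a longer capture would be needed before reading anything into them.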
So my take on this is that what I thought about PCI0 being stuck is wrong since
the console output (serial console) is still working. In any case, the unused
VGA card has now also been removed.
It looks like this is actually OpenVMS taking a kernel stack not valid crash,
probably because of a software problem. What I don't understand is why it's
looping like that, rather than taking a crash. My understanding is that the
console has control right at that point and outputs the kernel stack not valid
halt message.... but it should produce a dump.... In order to fix what now 
looks like a software problem we need to get a crash dump out of it so I'm 
open to any ideas on this, particularly as I may be missing something obvious
here.
AUTO_ACTION is set to RESTART.
Regards and thanks,
Dave.
 | 
| 589.6 |  | HARMNY::CUMMINS |  | Thu May 08 1997 11:04 | 30 | 
|  |     Set auto_action halt and disable VMS bugcheck reboots.
    
    Then use the console crash command if/when you get another KSNV. You
    might want to type INFO 4, INFO 5, and INFO 8 before forcing the crash
    (in case the crash doesn't work for some reason). The INFO output will
    give you all GPR, FPR, IPR, and CSR state that you'd find in the crash
    dump file. I.e. except the system memory dump..
    
    I looked back at .0 and the FDDI was indeed in PCI0.
    
    The 4100 has two separate PCI buses/hoses (0 and 1). The KZPDA is a
    bridged QLOGIC option; i.e. it has a QLOGIC ISP1040 behind a PCI-PCI
    bridge. The aforementioned hoses 0 and 1 are the top-level PCI buses.
    The PCI-PCI bridge (PPB) spawns a secondary PCI bus, upon which the 
    QLOGIC device sits. Console (and VMS/UNIX) always reserve secondary
    bus 1's for EISA, regardless of whether a given PCI hose spawns an
    EISA bus. Therefore, console assigns the secondary buses associated
    with the KZPDAs as bus 2 off their respective primary PCI buses.
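
The numbering rule described above can be sketched as follows. This is an illustration only, not console code: the option names are hypothetical, and the idea that a second bridge on the same hose would get bus 3 is an assumption that the text above does not state.

```python
# Per the explanation above: secondary bus 1 is always reserved for EISA,
# whether or not the hose actually has an EISA bridge.
EISA_RESERVED_BUS = 1


def assign_secondary_buses(bridges_per_hose):
    """Map each hose to the secondary bus number given to each PCI-PCI bridge.

    Assumption (not from the text): additional bridges on the same hose
    are numbered sequentially after the first one.
    """
    layout = {}
    for hose, bridges in bridges_per_hose.items():
        next_bus = EISA_RESERVED_BUS + 1  # first bridged option lands on bus 2
        layout[hose] = {}
        for name in bridges:
            layout[hose][name] = next_bus
            next_bus += 1
    return layout


# Hypothetical configuration: one KZPDA on each top-level hose.
layout = assign_secondary_buses({0: ["kzpda_a"], 1: ["kzpda_b"]})
for hose, buses in layout.items():
    for name, bus in buses.items():
        print(f"hose {hose}: {name} sits on secondary bus {bus}")
```

With one KZPDA per hose, both end up on bus 2 of their respective hose, which is why the same bus number can appear twice in a device listing.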
    
    Finally, there is a VMS issue with crash dumps when a VGA is in the
    system. See note 423.12 for details (inability to crash dump on VMS
    when VGA present). VMS will be changing how/when it clears MCES in
    VMS 7.2. Console V4.8 and beyond includes a hack of sorts to help
    resolve the inability to crash problem in most cases. You should
    therefore update the console to V4.8-5 (V3.9 CD) at some point for
    this customer - assuming he/she wants to put back the VGA card at
    some point.
    
    Let us know how things turn out.
    BC
 | 
| 589.7 | thanks, will go take a look myself shortly ... | GIDDAY::FLAWN |  | Thu May 08 1997 16:40 | 43 | 
|  | Thanks,
And for explaining the bridge bus numbering. I'm told that the FDDI adapter was
in PCI bus 1 before .... (even though it looks like we both see it as being on
bus 0).
I'm travelling to the site today and will update here with the results. At this
stage I plan on (in only approximate order) :
1. Write a quick program to mungle a process kernel stack pointer and try 
   to reproduce. If reproducible try a different type of crash to see if
   it's just these or all crash types.
2. Rerun ECU in case that does some good
3. Fix something I've seen now on console output showing 4 billion (looks like 
   MAXINT of some reasonably sized bitfield) environmental events - first 
   clear the events, if that doesn't work do the neat save/clear/restore NVRAM 
   thing (console is 4.8-6). If still no work try replacing the XICOR NVRAM
   and finally saddle.
 
4. Increase sysgen parameter KSTACKPAGES to 6.
  
5. Sort out the device naming issue - move parts around till it makes sense
   or indicates a problem.
6. Review software configuration and if nothing else has worked to nail the 
   problem down and the few software ECOs for known Digital caused kernel 
   stack invalid VMS crashes are relevant then apply as appropriate.
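
On item 3, the "4 billion" figure is what a 32-bit counter reads when every bit is set. A quick check, assuming (this is a guess, not something confirmed here) that the event count is a 32-bit field and that the storage behind it holds all ones:

```python
def all_ones(bits):
    """Largest unsigned value that fits in a field of the given width."""
    return (1 << bits) - 1


# Compare a few plausible field widths against the "4 billion" reading.
for width in (16, 24, 32):
    print(f"{width:2d}-bit field, all bits set: {all_ones(width):,}")
```

Only the 32-bit case comes out at roughly four billion, which fits the MAXINT guess.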
If I can reproduce this it should be possible to fix, even if we had to try
different console versions or a 300Mhz CPU to make it dump.
Thanks for taking the interest in this, and to Rawhide engineering in general
for their assistance thru notes. Problems such as this, while formally
warranting escalation due to severity, really need to be narrowed down or
resolved in the field. 
Sorry about not having all the info on this earlier!
Thanks,
Dave.
 | 
| 589.8 | Comments on last reply | MAY21::CUMMINS |  | Thu May 08 1997 16:59 | 62 | 
|  | Feedback on your most recent reply..
    
And for explaining the bridge bus numbering. I'm told that the FDDI adapter was
in PCI bus 1 before .... (even though it looks like we both see it as being on
bus 0).
    BC> Yes, log from base note definitely shows it hanging off PCI hose 0.
    
1. Write a quick program to mungle a process kernel stack pointer and try 
   to reproduce. If reproducible try a different type of crash to see if
   it's just these or all crash types.
2. Rerun ECU in case that does some good
    BC> I'm 99% sure you'll be wasting your time here.
    
3. Fix something I've seen now on console output showing 4 billion (looks like 
   MAXINT of some reasonably sized bitfield) environmental events - first 
   clear the events, if that doesn't work do the neat save/clear/restore NVRAM 
   thing (console is 4.8-6). If still no work try replacing the XICOR NVRAM
   and finally saddle.
 
    BC> This is a known problem that was caused when some systems slipped
    BC> thru MFG without having their RCM NVRAMs properly initialized. PAL
    BC> stores environmental events in the RCM NVRAM. You could have broken
    BC> HW, but it's more likely you have one of the uninit'd machines.
    BC>
    BC> To check for HW presence/okay, type the following:
    BC>
    BC>   P00>>>ls iic_rcm*
    BC>   iic_rcm_nvram0  iic_rcm_nvram1  iic_rcm_nvram2  iic_rcm_nvram3  iic_rcm_nvram4
    BC>   iic_rcm_nvram5  iic_rcm_nvram6  iic_rcm_nvram7  iic_rcm_temp
    BC>
    BC> You should see all of the above devices. If not, you're either missing
    BC> all or part of the COMBO/RCM logic.
    BC>
    BC> Most likely you simply need to init the NVRAM. Type the following:
    BC> 
    BC>   P00>>> d iic_rcm_nvram6:4 -q 20000010057
    BC>
    BC> Does this make the SHOW POWER problem go away?
    
4. Increase sysgen parameter KSTACKPAGES to 6.
  
5. Sort out the device naming issue - move parts around till it makes sense
   or indicates a problem.
    BC> No problem with what I saw from the base note.. Other than someone
    BC> apparently telling you DEFPA was in PCI1 (hose 1).
    
6. Review software configuration and if nothing else has worked to nail the 
   problem down and the few software ECOs for known Digital caused kernel 
   stack invalid VMS crashes are relevant then apply as appropriate.
If I can reproduce this it should be possible to fix, even if we had to try
different console versions or a 300Mhz CPU to make it dump.
    BC> Don't understand the 300 MHz CPU comment. Latest V4.8 console does
    BC> work around the VMS MCES and crash dumping issue (when VGA present).
    BC> So, if VGA present, you should update to latest console. Not a bad
    BC> idea to do so anyway, since provides various new features/fixes.
    BC> See LFU release notes for details..
 | 
| 589.9 | looks like hw or environment | GIDDAY::FLAWN |  | Mon May 26 1997 07:58 | 1249 | 
|  | Hi,
It turns out this should be a hardware problem. We got a similar failure
after returning to the 2GB configuration but this time, with ERLBUFFERPAGES
high enough for good error logging and DECevent installed we got some 
information and the system started to dump but then hit what looks like the
VMS problem where, with MCES, it falls into XDELTA. While the customer is running
latest console it looks like we may have hit an instance where that can still
happen (removing the VGA may help get more info). 
Unfortunately I hadn't given them instructions on dumping the mchk
logout area as I hadn't expected this one....
It looks to me that this new behaviour (getting some error info and starting
to dump, rather than doing the kernel stack not valid halt loop) may be
because the machine check handler was overflowing the kernel stack before...
so we didn't get the machine check crash but instead got the KSTACKNV. Just 
bad luck I suppose....
The error log info makes this look like a memory problem (the IOD and CPU 
were seeing errors at this time) but given the number of memory options 
attempted we can probably rule it out unless we've been unfortunate with
the spares.
Instead, and given that these failures seem to occur at about the same time
each day, but not on weekends, I think we're either seeing environmental
problems or a motherboard problem.... showing up when the higher physical
addresses are hit (though the customer says this is happening a bit after 
their heaviest load time). This happens on two machines (we've not
tried 2GB with the higher KSTACKPAGES in the other machine which was doing
the KSTACKNV, so I can't be sure it's the same but it appears likely).
The parts in these systems are all fairly early FRS but I can't see any
known issues that would match these symptoms, so I'll try to reproduce 
this with DECVET and/or swap the motherboard, but am open to any suggestions.
We'll also set up a Dranetz with RI monitoring to see if it shows anything.
Regards and thanks,
Dave.
******************************** ENTRY 4555 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5790. 
Timestamp of occurrence              26-MAY-1997 11:09:01   
Time since reboot                    2 Day(s) 15:00:19 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0086  Alpha Chip Detected ECC Error, From Memory 
Ext Interface Status Reg  xFFFFFFF0C1FFFFFF 
                                     DATA SOURCE IS MEMORY OR SYSTEM 
                                     CORRECTABLE ECC ERROR 
                                     D-ref fill 
Ext Interface Address Reg xFFFFFF0066C181CF 
Fill Syndrome Reg         x000000000000D900 
Interrupt Summary Reg     x0000000100000000 
                                     Correctable ECC Errors (IPL31) 
                                     AST Requests 3-0:  x0000000000000000 
                                       
WHOAMI                    x00000000  CPU0 Detected This Error 
                                       
--IOD REGISTERS FOLLOW--               
Base Addr of Bridge       x0000000000000000 
                                     Register Contents Not Valid For This Error 
Dev Type & Rev Register   x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 0  x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 1  x00000000  Register Contents Not Valid For This Error 
CAP Error Register        x00000000  Register Contents Not Valid For This Error 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4556 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5791. 
Timestamp of occurrence              26-MAY-1997 11:09:01   
Time since reboot                    2 Day(s) 15:00:19 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000F9E0000000 
                                     IOD# 0 
Dev Type & Rev Register   x06008221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     PCI-EISA Bus Bridge Present on PCI Segment 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C181C0 
                                     MC Bus Trans Addr<31:4>: 66C181C0 
MC Error Info Register 1  x800E8900  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read1-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4557 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5792. 
Timestamp of occurrence              26-MAY-1997 11:09:01   
Time since reboot                    2 Day(s) 15:00:19 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000FBE0000000 
                                     IOD# 1 
Dev Type & Rev Register   x06000221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     Internal CAP Chip Arbiter: Enabled 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C181C0 
                                     MC Bus Trans Addr<31:4>: 66C181C0 
MC Error Info Register 1  x800E8900  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read1-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
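
Entries 4555 to 4557 look like three views of one event: the CPU's Ext Interface Address Reg and the MC Error Info registers of both IODs. The sketch below compares the two address forms and shows where the address falls. It assumes that only bits <39:4> of the Ext Interface Address Reg carry the address (the remaining bits appear to read as ones in these dumps), and it says nothing about which memory module backs the address, since that depends on how the pairs are mapped.

```python
# Values copied from entry 4555 (CPU view) and entry 4556 (IOD 0 view).
EI_ADDR = 0xFFFFFF0066C181CF  # Ext Interface Address Reg
MC_ERR0 = 0x66C181C0          # MC Error Info Register 0: MC bus addr <31:4>
MC_ERR1_ADDR_HI = 0x00        # MC bus trans addr <39:32> as decoded in the log

# Assumption: only bits <39:4> are address; everything else is masked off.
PA_MASK_39_4 = ((1 << 40) - 1) & ~0xF

cpu_pa = EI_ADDR & PA_MASK_39_4
iod_pa = (MC_ERR1_ADDR_HI << 32) | (MC_ERR0 & ~0xF)

GIB = 1 << 30
print(f"CPU-reported PA: {cpu_pa:#012x}")
print(f"IOD-reported PA: {iod_pa:#012x}")
print("same address:", cpu_pa == iod_pa)
print("GiB index (0 = first GiB):", cpu_pa // GIB)
print(f"offset in GiB: {cpu_pa / GIB:.2f}")
```

Under these assumptions both views name the same address, and it lies above the 1GB mark, which would fit the observation that the failures only show up in the 2GB configuration.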
******************************** ENTRY 4558 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5793. 
Timestamp of occurrence              26-MAY-1997 11:09:02   
Time since reboot                    2 Day(s) 15:00:20 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0086  Alpha Chip Detected ECC Error, From Memory 
Ext Interface Status Reg  xFFFFFFF0C1FFFFFF 
                                     DATA SOURCE IS MEMORY OR SYSTEM 
                                     CORRECTABLE ECC ERROR 
                                     D-ref fill 
Ext Interface Address Reg xFFFFFF0066C9A1CF 
Fill Syndrome Reg         x000000000000D600 
Interrupt Summary Reg     x0000000100000000 
                                     Correctable ECC Errors (IPL31) 
                                     AST Requests 3-0:  x0000000000000000 
                                       
WHOAMI                    x00000000  CPU0 Detected This Error 
                                       
--IOD REGISTERS FOLLOW--               
Base Addr of Bridge       x0000000000000000 
                                     Register Contents Not Valid For This Error 
Dev Type & Rev Register   x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 0  x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 1  x00000000  Register Contents Not Valid For This Error 
CAP Error Register        x00000000  Register Contents Not Valid For This Error 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4559 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5794. 
Timestamp of occurrence              26-MAY-1997 11:09:02   
Time since reboot                    2 Day(s) 15:00:20 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000F9E0000000 
                                     IOD# 0 
Dev Type & Rev Register   x06008221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     PCI-EISA Bus Bridge Present on PCI Segment 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C9A1D0 
                                     MC Bus Trans Addr<31:4>: 66C9A1D0 
MC Error Info Register 1  x800E9800  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read0-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4560 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5795. 
Timestamp of occurrence              26-MAY-1997 11:09:02   
Time since reboot                    2 Day(s) 15:00:20 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000FBE0000000 
                                     IOD# 1 
Dev Type & Rev Register   x06000221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     Internal CAP Chip Arbiter: Enabled 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C9A1D0 
                                     MC Bus Trans Addr<31:4>: 66C9A1D0 
MC Error Info Register 1  x800E9800  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read0-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4561 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5796. 
Timestamp of occurrence              26-MAY-1997 11:09:02   
Time since reboot                    2 Day(s) 15:00:20 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0086  Alpha Chip Detected ECC Error, From Memory 
Ext Interface Status Reg  xFFFFFFF0C1FFFFFF 
                                     DATA SOURCE IS MEMORY OR SYSTEM 
                                     CORRECTABLE ECC ERROR 
                                     D-ref fill 
Ext Interface Address Reg xFFFFFF0066C9A1CF 
Fill Syndrome Reg         x000000000000D600 
Interrupt Summary Reg     x0000000100000000 
                                     Correctable ECC Errors (IPL31) 
                                     AST Requests 3-0:  x0000000000000000 
                                       
WHOAMI                    x00000000  CPU0 Detected This Error 
                                       
--IOD REGISTERS FOLLOW--               
Base Addr of Bridge       x0000000000000000 
                                     Register Contents Not Valid For This Error 
Dev Type & Rev Register   x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 0  x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 1  x00000000  Register Contents Not Valid For This Error 
CAP Error Register        x00000000  Register Contents Not Valid For This Error 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4562 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5797. 
Timestamp of occurrence              26-MAY-1997 11:09:02   
Time since reboot                    2 Day(s) 15:00:20 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000F9E0000000 
                                     IOD# 0 
Dev Type & Rev Register   x06008221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     PCI-EISA Bus Bridge Present on PCI Segment 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C9A1D0 
                                     MC Bus Trans Addr<31:4>: 66C9A1D0 
MC Error Info Register 1  x800E9900  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read1-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4563 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5798. 
Timestamp of occurrence              26-MAY-1997 11:09:02   
Time since reboot                    2 Day(s) 15:00:20 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000FBE0000000 
                                     IOD# 1 
Dev Type & Rev Register   x06000221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     Internal CAP Chip Arbiter: Enabled 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C9A1D0 
                                     MC Bus Trans Addr<31:4>: 66C9A1D0 
MC Error Info Register 1  x800E9900  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read1-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4564 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5799. 
Timestamp of occurrence              26-MAY-1997 11:09:15   
Time since reboot                    2 Day(s) 15:00:33 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0086  Alpha Chip Detected ECC Error, From Memory 
Ext Interface Status Reg  xFFFFFFF0C1FFFFFF 
                                     DATA SOURCE IS MEMORY OR SYSTEM 
                                     CORRECTABLE ECC ERROR 
                                     D-ref fill 
Ext Interface Address Reg xFFFFFF0066C9A1CF 
Fill Syndrome Reg         x000000000000D600 
Interrupt Summary Reg     x0000000100000000 
                                     Correctable ECC Errors (IPL31) 
                                     AST Requests 3-0:  x0000000000000000 
                                       
WHOAMI                    x00000000  CPU0 Detected This Error 
                                       
--IOD REGISTERS FOLLOW--               
Base Addr of Bridge       x0000000000000000 
                                     Register Contents Not Valid For This Error 
Dev Type & Rev Register   x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 0  x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 1  x00000000  Register Contents Not Valid For This Error 
CAP Error Register        x00000000  Register Contents Not Valid For This Error 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4565 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5800. 
Timestamp of occurrence              26-MAY-1997 11:09:15   
Time since reboot                    2 Day(s) 15:00:33 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000F9E0000000 
                                     IOD# 0 
Dev Type & Rev Register   x06008221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     PCI-EISA Bus Bridge Present on PCI Segment 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C9A1D0 
                                     MC Bus Trans Addr<31:4>: 66C9A1D0 
MC Error Info Register 1  x800E9800  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read0-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4566 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5801. 
Timestamp of occurrence              26-MAY-1997 11:09:15   
Time since reboot                    2 Day(s) 15:00:33 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000FBE0000000 
                                     IOD# 1 
Dev Type & Rev Register   x06000221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     Internal CAP Chip Arbiter: Enabled 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x66C9A1D0 
                                     MC Bus Trans Addr<31:4>: 66C9A1D0 
MC Error Info Register 1  x800E9800  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read0-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4567 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5802. 
Timestamp of occurrence              26-MAY-1997 11:09:31   
Time since reboot                    2 Day(s) 15:00:49 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                       38. Time Stamp Entry 
SWI Minor class                   7. Timestamp 
******************************** ENTRY 4568 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5803. 
Timestamp of occurrence              26-MAY-1997 11:11:27   
Time since reboot                    2 Day(s) 15:02:45 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0086  Alpha Chip Detected ECC Error, From Memory 
Ext Interface Status Reg  xFFFFFFF4C1FFFFFF 
                                     DATA SOURCE IS MEMORY OR SYSTEM 
                                     CORRECTABLE ECC ERROR 
                                     I-ref fill 
Ext Interface Address Reg xFFFFFF006703B71F 
Fill Syndrome Reg         x000000000000DC00 
Interrupt Summary Reg     x0000000100000000 
                                     Correctable ECC Errors (IPL31) 
                                     AST Requests 3-0:  x0000000000000000 
                                       
WHOAMI                    x00000000  CPU0 Detected This Error 
                                       
--IOD REGISTERS FOLLOW--               
Base Addr of Bridge       x0000000000000000 
                                     Register Contents Not Valid For This Error 
Dev Type & Rev Register   x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 0  x00000000  Register Contents Not Valid For This Error 
MC Error Info Register 1  x00000000  Register Contents Not Valid For This Error 
CAP Error Register        x00000000  Register Contents Not Valid For This Error 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4569 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5804. 
Timestamp of occurrence              26-MAY-1997 11:11:27   
Time since reboot                    2 Day(s) 15:02:45 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000F9E0000000 
                                     IOD# 0 
Dev Type & Rev Register   x06008221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     PCI-EISA Bus Bridge Present on PCI Segment 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x6703B700 
                                     MC Bus Trans Addr<31:4>: 6703B700 
MC Error Info Register 1  x800E8800  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read0-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4570 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5805. 
Timestamp of occurrence              26-MAY-1997 11:11:27   
Time since reboot                    2 Day(s) 15:02:45 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        6. Soft ECC Error 
Memory Minor class                1. Soft ECC error 
Software Flags            x0000000000000000 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
Machine Check Reason          x0204  IOD Detected Soft Error 
Ext Interface Status Reg  x0000000000000000 
                                     Register Contents Not Valid For This Error 
Ext Interface Address Reg x0000000000000000 
                                     Register Contents Not Valid For This Error 
Fill Syndrome Reg         x0000000000000000 
                                     Register Contents Not Valid For This Error 
Interrupt Summary Reg     x0000000000000000 
                                     Register Contents Not Valid For This Error 
WHOAMI                    x00000000  Register Contents Not Valid For This Error 
                                       
--IOD REGISTERS FOLLOW--               
This Bus Bridge Phy Addr  x000000FBE0000000 
                                     IOD# 1 
Dev Type & Rev Register   x06000221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     Internal CAP Chip Arbiter: Enabled 
                                     Device Class: Host Bus to PCI Bridge 
MC Error Info Register 0  x6703B700 
                                     MC Bus Trans Addr<31:4>: 6703B700 
MC Error Info Register 1  x800E8800  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read0-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        x90000000  Correctable ECC err det by MDPB 
                                     MC error info latched 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4571 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5806. 
Timestamp of occurrence              26-MAY-1997 11:14:58   
Time since reboot                    2 Day(s) 15:06:16 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                        2. Machine Check  
CPU Minor class                   1. Machine check (670 entry) 
Software Flags            x0000000300000000 
                                     IOD 0 Register Subpkt Pres 
                                     IOD 1 Register Subpkt Pres 
Active CPUs               x00000001 
Hardware Rev              x00000000 
System Serial Number                   
Module Serial Number                   
Module Type                   x0000 
System Revision           x00000000 
* MCHK 670 Regs *                      
Flags:                    x00000000 
PCI Mask                      x0000 
Machine Check Reason          x0098  Fatal Alpha Chip Detected Hard Error 
PAL SHADOW REG 0          x0000000000000000 
PAL SHADOW REG 1          x0000000000000000 
PAL SHADOW REG 2          x0000000000000000 
PAL SHADOW REG 3          x0000000000000000 
PAL SHADOW REG 4          x0000000000000000 
PAL SHADOW REG 5          x000000FBE0000000 
PAL SHADOW REG 6          x6703B70006000221 
PAL SHADOW REG 7          x90000000800E8800 
PALTEMP0                  x0000000000000001 
PALTEMP1                  x0000000000000004 
PALTEMP2                  xFFFFFFFF92850918 
PALTEMP3                  x0000000000004400 
PALTEMP4                  x00000000098BE130 
PALTEMP5                  x0000000000000180 
PALTEMP6                  x0000000000000004 
PALTEMP7                  x0000000000000016 
PALTEMP8                  x0000000000000004 
PALTEMP9                  x0000000000000003 
PALTEMP10                 x000000000012A04C 
PALTEMP11                 x0000000000000000 
PALTEMP12                 xFFFFFFFF83A25C80 
PALTEMP13                 x0000000000006E80 
PALTEMP14                 x0000000000000000 
PALTEMP15                 x00000000000F0000 
PALTEMP16                 x0000009806700001 
PALTEMP17                 x0000529323B0E1EE 
PALTEMP18                 xFFFFFFFF81C20000 
PALTEMP19                 x000000007FF92000 
PALTEMP20                 x0000000042FA2000 
PALTEMP21                 x0000000200000000 
PALTEMP22                 x0000000000CF0000 
PALTEMP23                 x0000000045308080 
Exception Address Reg     x000000000012A04C 
                                     Native-mode Instruction 
                                     Exception PC  x000000000004A813 
Exception Summary Reg     x0000000000000000 
Exception Mask Reg        x0000000000000000 
PAL Base Address Reg      x0000000000008000 
                                     Base Addr for PALcode:  x0000000000000002 
Interrupt Summary Reg     x0000000000200000 
                                     External HW Interrupt at IPL21 
                                     AST Requests 3-0:  x0000000000000000 
IBOX Ctrl and Status Reg  x000000C144020000 
                                     Timeout Counter Bit Clear. 
                                     IBOX Timeout Counter Enabled. 
                                     Floating Point Instr's May be Issued. 
                                     PAL Shadow Registers Enabled. 
                                     Correctable Error Interrupts Enabled. 
                                     ICACHE BIST (Self Test) Was Successful. 
                                     TEST_STATUS_H Pin Asserted 
Icache Par Err Stat Reg   x0000000000000000 
Dcache Par Err Stat Reg   x0000000000000000 
Virtual Address Reg       x00000000009DF9C0 
Memory Mgmt Flt Sts Reg   x0000000000014310 
                                     If Err, Reference Resulted in DTB Miss 
                                     Fault Inst RA Field:  x000000000000000C 
                                     Fault Inst Opcode:  x0000000000000028 
Scache Address Reg        xFFFFFF000000F3AF 
Scache Status Reg         x0000000000000000 
Bcache Tag Address Reg    xFFFFFF80290D0FFF 
                                     Last Bcache Access Resulted in a Miss. 
                                     Value of Parity Bit for Tag Control Status 
                                        Bits Dirty, Shared & Valid is Clear. 
                                     Value of Tag Control Dirty Bit is Clear. 
                                     Value of Tag Control Shared Bit is Clear. 
                                     Value of Tag Control Valid Bit is Set. 
                                     Value of Parity Bit Covering Tag Store 
                                        Address Bits is Clear. 
                                     Tag Address<38:20> Is:  x0000000000000290 
Ext Interface Address Reg xFFFFFF00669221CF 
Fill Syndrome Reg         x0000000000001800 
Ext Interface Status Reg  xFFFFFFF141FFFFFF 
                                     Error Source is Memory or System 
                                     UNCORRECTABLE ECC ERROR 
                                     Error Occurred During D-ref Fill 
LD LOCK                   xFFFFFF0043C5E7CF 
** IOD SUBPACKET -> **               IOD 0 Register Subpacket 
WHOAMI                    x000004FA  Module Revision  1. 
                                     VCTY ASIC Rev = 0 
                                     Bcache Size = 4MB 
                                     CPU = 0 
This Bus Bridge Phy Addr  x000000F9E0000000 
                                     IOD# 0 
Dev Type & Rev Register   x06008221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     PCI-EISA Bus Bridge Present on PCI Segment 
                                     Device Class: Host Bus to PCI Bridge 
MC-PCI Command Register   x46490FB1  Module Self-Test Passed LED On. 
                                     Delayed PCI Bus Reads Protocol: Enabled 
                                     Bridge to PCI Transactions:     Enabled 
                                     Bridge WILL NOT REQUEST 64 Bit Data Trans 
                                     Bridge ACCEPTS 64 Bit Data Transactions 
                                     PCI Address Parity Check:       Enabled 
                                     MC Bus CMD/Addr Parity Check:   Enabled 
                                     MC Bus NXM Check:               Enabled 
                                     Check ALL Transactions for Errors 
                                     Use MC_BMSK for 16 Byte Align Blk Mem Wrt 
                                     Wrt PEND_NUM Threshold:  9. 
                                     RD_TYPE Memory Prefetch Algorithm: Short 
                                     RL_TYPE Mem Rd Line Prefetch Type: Medium 
                                     RM_TYPE Mem Rd Multiple Cmd Type:  Long 
                                     ARB_MODE PCI Arbitration: Round Robin 
Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27> x00000000 
IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25> x00000000 
Interrupt Ctrl Register   x00000003  Write Device Interrupt Info Struct:Enabled 
Interrupt Request         x00800000  Interrupts asserted  x00000000 
                                     Hard Error 
Interrupt Mask0 Register  x00E51000 
Interrupt Mask1 Register  x00000000 
MC Error Info Register 0  x669221D0 
                                     MC Bus Trans Addr<31:4>: 669221D0 
MC Error Info Register 1  x800E8900  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read1-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        xC0000000  Uncorrectable ECC err det by MDPB 
                                     MC error info latched 
PCI Bus Trans Error Adr   x00000000 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
** IOD SUBPACKET -> **               IOD 1 Register Subpacket 
WHOAMI                    x000004FA  Module Revision  1. 
                                     VCTY ASIC Rev = 0 
                                     Bcache Size = 4MB 
                                     CPU = 0 
This Bus Bridge Phy Addr  x000000FBE0000000 
                                     IOD# 1 
Dev Type & Rev Register   x06000221  CAP Chip Revision:        x00000001 
                                     B3040 Module Revision:    x00000002 
                                     B3050 Module Revision:    x00000002 
                                     B3050 Module Type:       Left Hand 
                                     Internal CAP Chip Arbiter: Enabled 
                                     Device Class: Host Bus to PCI Bridge 
MC-PCI Command Register   x46490FB1  Module Self-Test Passed LED On. 
                                     Delayed PCI Bus Reads Protocol: Enabled 
                                     Bridge to PCI Transactions:     Enabled 
                                     Bridge WILL NOT REQUEST 64 Bit Data Trans 
                                     Bridge ACCEPTS 64 Bit Data Transactions 
                                     PCI Address Parity Check:       Enabled 
                                     MC Bus CMD/Addr Parity Check:   Enabled 
                                     MC Bus NXM Check:               Enabled 
                                     Check ALL Transactions for Errors 
                                     Use MC_BMSK for 16 Byte Align Blk Mem Wrt 
                                     Wrt PEND_NUM Threshold:  9. 
                                     RD_TYPE Memory Prefetch Algorithm: Short 
                                     RL_TYPE Mem Rd Line Prefetch Type: Medium 
                                     RM_TYPE Mem Rd Multiple Cmd Type:  Long 
                                     ARB_MODE PCI Arbitration: Round Robin 
Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27> x00000000 
IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25> x00000000 
Interrupt Ctrl Register   x00000003  Write Device Interrupt Info Struct:Enabled 
Interrupt Request         x00800000  Interrupts asserted  x00000000 
                                     Hard Error 
Interrupt Mask0 Register  x00C11111 
Interrupt Mask1 Register  x00000000 
MC Error Info Register 0  x669221D0 
                                     MC Bus Trans Addr<31:4>: 669221D0 
MC Error Info Register 1  x800E8900  MC bus trans addr <39:32> x00000000 
                                     MC Command is Read1-Mem 
                                     CPU0 Master at Time of Error 
                                     Device ID:   x00000002 
                                     MC error info valid 
CAP Error Register        xC0000000  Uncorrectable ECC err det by MDPB 
                                     MC error info latched 
PCI Bus Trans Error Adr   x00000000 
MDPA Status Register      x00000000  MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid 
MDPB Status Register      x00000000  MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid 
                                       
PALcode Revision                     Palcode Rev: 1.19-2 
******************************** ENTRY 4572 ******************************** 
Logging OS                        1. OpenVMS 
System Architecture               2. Alpha 
OS version                           V6.2-1H3 
Event sequence number          5807. 
Timestamp of occurrence              26-MAY-1997 11:14:58   
Time since reboot                    2 Day(s) 15:06:16 
Host name                            AXP2     
System Model                         AlphaServer 4100 5/400 4MB 
Entry type                       37. Crash Re-Start 
Bugcheck Minor class              1. Crash Re-start 
Bugcheck Msg                         MACHINECHK, Machine check while in kernel 
                                     mode 
Process ID                x000600D7 
Process Name                           
KSP                       x000000007FF91EC0 
ESP                       x000000007FF96000 
SSP                       x000000007FF9C100 
USP                       x000000007ED12E70 
R0                        x0000000000000000 
R1                        x000000007FF91EE0 
R2                        xFFFFFFFF927AE2B8 
R3                        xFFFFFFFF927AE810 
R4                        x0000000000000000 
R5                        x0000000000000180 
R6                        x0000000000000004 
R7                        x000000000861C100 
R8                        x0000000000000006 
R9                        x0000000000000000 
R10                       x0000000000000001 
R11                       x0000000000000000 
R12                       x0000000000000000 
R13                       x0000000000000000 
R14                       x0000000000000000 
R15                       x00000000009DF9B0 
R16                       x0000000000000215 
R17                       x0000000000000001 
R18                       x0000000000000001 
R19                       xFFFFFFFF81C1DF18 
R20                       x0000000000000008 
R21                       xFFFFFFFF81C1DF18 
R22                       x0000000000000100 
R23                       x0000000000000180 
R24                       xFFFFFFFF81C1DC00 
R25                       x0000000000000003 
R26                       x0000000000000210 
R27                       xFFFFFFFF927B6560 
R28                       xFFFFFFFF8003F0EC 
FP                        x000000007FF91EC0 
SP                        x000000007FF91EC0 
PC                        xFFFFFFFF8004E610 
PS                        x0000000000001F00 
PTBR                      x00000000000217D1 
Process Ctl Block Base Re x0000000045308080 
PRBR                      xFFFFFFFF81C20000 
VPTB                      x0000000200000000 
System Ctl Block Base Reg x0000000000000678 
Software Interrupt Summar x0000000000000000 
ASN                       x0000000000000045 
ASTSR ASTEN               x000000000000000F 
FEN                       x0000000000000001 
ASN                       x0000000000000045 
IPL                       x000000000000001F 
MCES                      x0000000000000001 
 | 
| 589.10 | Try swapping high/low members of upper 1GB memory option? | HARMNY::CUMMINS |  | Tue May 27 1997 10:29 | 26 | 
|  |     Several different error addresses in this log. All in upper 1GB of
    memory. Correctables early on with uncorrectable eventually..
    
    Correctables:
    EI_ADDR: xFFFFFF0066C181CF    FILL_SYN: D900  -->  data bit 05
    EI_ADDR: xFFFFFF0066C9A1CF    FILL_SYN: D600  -->  data bit 04
    EI_ADDR: xFFFFFF0066C9A1CF    FILL_SYN: D600  -->  data bit 04
    EI_ADDR: xFFFFFF0066C9A1CF    FILL_SYN: D600  -->  data bit 04
    EI_ADDR: xFFFFFF006703B71F    FILL_SYN: DC00  -->  data bit 07
    
    Uncorrectable:
    EI_ADDR: xFFFFFF00669221CF
    
    IOD's MDPB chip saw the same errors..
    
    All errors originated in/from memory since no DIRTY bit set.
    
    Re: bad spares.. This is quite possible since from what I have seen the
    quality of the 4100/4000 spares, esp. memory, is just plain awful at best.
    
    Since the data points to MDPB always detecting the fault and the syndrome
    register always points to the high half of the transaction, if you're still
    in experimentation mode, you could try swapping the low/high halves of the
    upper 1GB pair to see if the problem follows the card or not. This would
    give you a good idea whether you are chasing a faulty memory spare versus a
    motherboard or some other systemic problem.
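
    A minimal sketch of the "all in upper 1GB" check, using the EI_ADDR values
    listed above. It assumes that EI_ADDR carries the physical address in bits
    <39:4> with every other bit reading back as ones (consistent with the
    leading FFFFFF and trailing F in each value), and that the 2GB system maps
    the added pair contiguously at 1GB..2GB. The names are illustrative only.

```python
# Assumption: EI_ADDR holds the physical address in bits <39:4>;
# bits <63:40> and <3:0> read back as ones and carry no information.
EI_ADDR_MASK = 0x000000FFFFFFFFF0
GB = 1 << 30

# EI_ADDR values quoted in this reply (correctables first, uncorrectable last).
EI_ADDRS = [
    0xFFFFFF0066C181CF,
    0xFFFFFF0066C9A1CF,
    0xFFFFFF006703B71F,
    0xFFFFFF00669221CF,
]


def phys_addr(ei_addr: int) -> int:
    """Strip the always-ones bits and return the physical address."""
    return ei_addr & EI_ADDR_MASK


def in_upper_gb(ei_addr: int) -> bool:
    """True if the address falls in 1GB..2GB (assumed contiguous mapping)."""
    return GB <= phys_addr(ei_addr) < 2 * GB


for addr in EI_ADDRS:
    print(f"{addr:016X} -> {phys_addr(addr):010X}  upper 1GB: {in_upper_gb(addr)}")
```

    The masked uncorrectable address lands in the same 64-byte block as the MC
    Bus Trans Addr latched by the IODs (669221D0).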
 | 
| 589.11 |  | HARMNY::CUMMINS |  | Wed May 28 1997 13:07 | 9 | 
|  | Note that the data bit callouts in reply -.1 are as described in the EV5 HW
spec. Since the upper byte of the syndrome is involved in each of the CRDs,
one actually needs to add 64 to these numbers..
 
    EI_ADDR: xFFFFFF0066C181CF    FILL_SYN: D900  -->  data bit 69
    EI_ADDR: xFFFFFF0066C9A1CF    FILL_SYN: D600  -->  data bit 68
    EI_ADDR: xFFFFFF0066C9A1CF    FILL_SYN: D600  -->  data bit 68
    EI_ADDR: xFFFFFF0066C9A1CF    FILL_SYN: D600  -->  data bit 68
    EI_ADDR: xFFFFFF006703B71F    FILL_SYN: DC00  -->  data bit 71
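
The correction above can be sketched as follows. This is only an illustration: the lookup table holds just the three syndrome values quoted in this thread, not the full EV5 table, and it assumes the low byte of FILL_SYN covers data bits 63:0 and the high byte covers bits 127:64.

```python
# Syndrome byte -> data bit within one quadword, as quoted in this thread.
# Only these three entries are known here; this is NOT the full EV5 table.
KNOWN_SYNDROMES = {0xD9: 5, 0xD6: 4, 0xDC: 7}


def data_bit(fill_syn: int):
    """Return the failing data bit (0..127) for a single-bit CRD, or None."""
    lo = fill_syn & 0xFF          # assumed: syndrome for data bits 63:0
    hi = (fill_syn >> 8) & 0xFF   # assumed: syndrome for data bits 127:64
    if hi and not lo:
        bit = KNOWN_SYNDROMES.get(hi)
        return None if bit is None else bit + 64  # upper quadword: add 64
    if lo and not hi:
        return KNOWN_SYNDROMES.get(lo)
    return None  # no error, errors in both halves, or syndrome not in table


for syn in (0xD900, 0xD600, 0xDC00):
    print(f"FILL_SYN {syn:04X} --> data bit {data_bit(syn)}")
```

A syndrome that is not in the table, such as the 1800 logged with the uncorrectable error, returns None rather than a guessed bit.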
 | 
| 589.12 | looking like memory all along (sigh....) | GIDDAY::FLAWN |  | Fri May 30 1997 09:07 | 29 | 
|  | 
Thanks, this does now look like bad memory. The earlier failure to get
information was due to ERLBUFFERPAGES being too low and, I think, the machine
check handler blowing the kernel stack with SYSGEN param KSTACKPAGES at 1 (I
set it way up to 6; 2 would probably have done).
The customer has shifted to using a loan 2100 so we could run diags. It turns
out that DECVET wasn't necessary: the console diags pull consistent soft errors
at a reasonably regular rate, which is what we'll chase.
Moving MEM1H down to MEM0L shifts the errors down to the low 1GB with the
syndrome bits indicting the low card. (Initially MEM1L and MEM1H were swapped
and this showed the low card faulty in MEM1, so we have consistency.) We'll
now proceed to weed out the others.
This is a pretty rough rate of failure on new boards. Do you think
manufacturing is aware of these instances already?
I don't quite understand the distinction between looking at the syndrome info
and looking at which MDP ASIC saw the error. My understanding is that without
syndrome information I can't pick the card (MDPB is the one seeing the errors
even on the low card). At one stage I thought that an error in the high 64 bits
(seen by MDPB) meant the high card, but that's clearly not the case. What I'm
missing is why, i.e. which card the bits of a given physical address are on.
I've looked at the SPM and the system spec (I don't have a HW spec). Maybe this
should be obvious anyway....
Thanks for the help with this mess,
Dave.
 | 
| 589.13 | Not sliced on QW boundaries | POBOXB::STEINMAN |  | Fri May 30 1997 09:18 | 6 | 
|  |     
    The system bus data bits (127:0) are not perfectly sliced between the
    HIGH and LOW modules of a given memory pair.  Due to routing, timing,
    layout, etc. the bits are somewhat scattered. 
    
    mo
 | 
| 589.14 | Can only isolate to mem pair member on EV5 CRDs | HARMNY::CUMMINS |  | Mon Jun 02 1997 09:30 | 32 | 
|  |     Another issue is that all versions of the MDP chips have a bug in them
    which can result in data corruption if software accesses registers on the
    MDP chips that require involvement by the MDPB. There are four such CSRs:
    
      MDPA_STAT,
      MDPB_STAT,
      MDPA_SYNDROME, and
      MDPB_SYNDROME.
    
    Unfortunately, these are the registers you need to use to figure out which
    half of a given memory option is at fault in the case of IOD-detected CRDs.
    Accordingly, we changed PALcode early on to not collect state from these
    registers during PALcode CRD handling.
    
    At the time we made the SRM PALcode changes, we implemented a call to PAL
    (CSERVE) that could be used to enable reading of these registers on CRDs.
    The SRM console-based TEST command turns on this feature so that IOD stat
    and syndrome info is collected and displayed during error handling. TEST
    does no writes to media (in FIELD mode), so the risk of data corruption is
    essentially nil. Writes are performed in manufacturing mode TEST, but data
    corruption is recoverable in this environment, should it occur.
    
    Finally, UNIX/VMS PAL always scrubs CRDs, and so the thinking was that the
    MDP bug did not need to be fixed since there should be an EV5-detected CRD
    generated during the read of the faulty memory location, provided the error
    was not a transient. PAL collects syndrome information on EV5-detected CRD
    errors, and this data can be used to isolate to the correct half of the MEM
    pair. [I'm not sure whether HAL scrubs memory on CRDs in an NT environment,
    but will find out and post a reply here with the answer..]
    
    I'd be interested to know if you are seeing IOD-detected CRDs with no
    accompanying EV5-detected CRDs.
 |