Conference wrksys::alphastation

Title:Alpha Workstation Conference
Notice:See note 1.* for conference notices
Moderator:WRKSYS::HOUSE
Created:Wed Sep 07 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1996
Total number of notes:9122

1963.0. "AS600 5/333 CPU Exceptions" by LOTIMA::P_MORRIS () Fri May 09 1997 12:17

    Can anybody shed some light on the following:-
    
    We have an AlphaStation 600 5/333 running Digital UNIX V4.0 rev 386 that is
    getting intermittent CPU exceptions, with the following info from DECevent:-
    
    DECevent entry info:
    ====================
    Entry type 		100 	CPU machine Check Error
    CPU Minor Class    	3	Bcache error (630 entry)
    Mach Check Err Code 086	EV5 Detected Correctable ECC error
    
    EI ADDR		xFFFFFF000F7948EF
    FILL SYNDROME	x000000000000E500
    EI STATUS		xFFFFFFF0C4FFFFFF
    				error occurred during Dref fill
    ISR			x0000000100000000
    				correctable ECC errors (IPL31)
    
    All other registers are shown as being 0000 by the DECevent output.
    This is a soft error: the machine doesn't crash and there is no
    noticeable impact on performance.
    
    Parts replaced so far:
    ======================
    We have tried replacing the following parts, which have now all been
    taken back out of the machine & returned to logistics.
    
    Bcache Simms	Replaced 2 in 1 go & then refitted originals &
    			replaced the remaining simm
    
    Memory Simms	Removed banks 1 + 2 (leaving bank 0 in)
    			Then removed bank 0 & refitted just bank 1 in place
    			of the original bank 0
    
    Mem motherboard	Replaced one at a time
    
    System Board	Replaced
    
    None of the above had any effect on the fault.
    We can force the machine to log several of these soft errors by running
    DECvet with 5 CPU exers & 5 MEM exers for approx 10 minutes.
    
    In normal use these errors crop up randomly & infrequently: typically
    about 1 a day, but sometimes up to 20 a day.
    
    System Config info:
    ===================
    Below are outputs of show config & show *
    
    >>>show config
    
    Firmware
    SRM Console:    V6.3-11
    ARC Console:    4.49
    PALcode:        VMS PALcode V1.18-0, OSF PALcode V1.21-0
    SROM Version:   V1.2
    
    Processor
    DECchip (tm) 21164-5    Pass 4  333 MHz  96 KBytes SCache
    4 MB BCache
    CIA ASIC Pass 2
    
    MEMORY
    Memory Size = 448Mb
    Bank      Size/Sets   Base Addr     Speed
    ------    ----------  ---------     -----
    00        064Mb/2     018000000     Slow
    01        256Mb/2     000000000     Fast
    02        128Mb/1     010000000     Fast
    
    BCache Size = 4Mb
    
    Tested Memory =  448Mbytes
    
    PCI Bus
         Bus 00  Slot 07: Digital TGA2 Graphics Controller
    
         Bus 00  Slot 08: Digital PCI to PCI Bridge Chip
    
         Bus 01  Slot 00: DECchip 21040 Network Controller
                                       ewa0.0.0.1000.0      00-00-F8-20-E6-95
    
         Bus 01  Slot 01: ISP1020 Scsi Controller
                                       pka0.7.0.1001.0       SCSI Bus ID 7
                                       dka0.0.0.1001.0        RZ28D
                                       dka300.3.0.1001.0      DSP5200S
                                       dka500.5.0.1001.0      RRD45
                                       jka401.4.0.1001.0      TZ
                                       mka400.4.0.1001.0      TZ887
    
         Bus 01  Slot 02: ISP1020 Scsi Controller
                                       pkb0.7.0.1002.0       SCSI Bus ID 7
    
         Bus 00  Slot 09: DAC960  Scsi Raid Controller
                                       dra.0.0.9.0
                                       dra0.0.0.9.0           1 Member JBOD
                                       dra1.0.0.9.0           1 Member JBOD
                                       dra2.0.0.9.0           1 Member JBOD
                                       dra3.0.0.9.0           1 Member JBOD
                                       dra4.0.0.9.0           1 Member JBOD
                                       dra5.0.0.9.0           1 Member JBOD
                                       dra6.0.0.9.0           1 Member JBOD
                                       dra7.0.0.9.0           1 Member JBOD
    
         Bus 00  Slot 10: Intel   8275EB PCI to Eisa Bridge
    
    
         Bus 00  Slot 12: DAC960  Scsi Raid Controller
                                       drb.0.0.12.0
                                       drb0.0.0.12.0          2 Member RAID0
                                       drb1.0.0.12.0          2 Member RAID0
                                       drb2.0.0.12.0          2 Member RAID0
                                       drb3.0.0.12.0          2 Member RAID0
    
    EISA Bus Modules (installed)
    
    EISA/ISA NVR configuration
            SLOT    Module
    
    
    >>>show *
    auto_action             HALT
    boot_dev                dka0.0.0.1001.0
    boot_file
    boot_osflags            A
    boot_reset              OFF
    bootdef_dev             dka0.0.0.1001.0
    booted_dev              dka0.0.0.1001.0
    booted_file
    booted_osflags          A
    bus_probe_algorithm     new
    char_set                0
    console                 graphics
    controlp                on
    dump_dev
    enable_audit            ON
    ewa0_arp_tries          3
    ewa0_bootp_file
    ewa0_bootp_server
    ewa0_bootp_tries        3
    ewa0_def_ginetaddr      0.0.0.0
    ewa0_def_inetaddr       0.0.0.0
    ewa0_def_inetfile
    ewa0_def_sinetaddr      0.0.0.0
    ewa0_def_subnetmask     0.0.0.0
    ewa0_ginetaddr          0.0.0.0
    ewa0_inet_init          bootp
    ewa0_inetaddr           0.0.0.0
    ewa0_inetfile
    ewa0_loop_count         3e8
    ewa0_loop_inc           a
    ewa0_loop_patt          ffffffff
    ewa0_loop_size          2e
    ewa0_lp_msg_node        1
    ewa0_mode               AUI
    ewa0_protocols          BOOTP
    ewa0_sinetaddr          0.0.0.0
    ewa0_tftp_tries         3
    kbd_hardware_type       PCXAL
    language                38
    language_name           English(British/Irish)
    license                 MU
    mopv3_boot              OFF
    os_type                 UNIX
    pal                     VMS PALcode V1.18-0, OSF PALcode V1.21-0
    pci_parity              off
    pka0_host_id            7
    pka0_soft_term          on
    pkb0_host_id            7
    pkb0_soft_term          on
    quick_start             OFF
    scsi_poll               ON
    sys_serial_num          AY63709052
    tga_sync_green          0
    tt_allow_login          1
    tty_dev                 0
    version                 V6.3-11 Nov 20 1996 16:01:42
    
    
    Any ideas???
    
    
    Phil Morris
1963.1. "AS600 MMB=0 and failing SIMM=7" by CSC32::HUTMACHER () Fri May 09 1997 16:34

    Hi Phil
    
    From the latest registers posted in note .0, this looks like a memory problem on
    
    MMB=0 and failing SIMM=7 on the BANK1 Simms - 32meg simm = 54-21277-BA
    ______________________________________________________________________
    
          +-------------------- Top of MMB-----------------------------+
          |           SIMM           #             SIMM           #     |
          |    ====================  16      ===================  14    |
          |    ====================  12      ===================  10    |
          |    ====================  08      ===================  06    |
          |    ====================  04      ===================  02    |
          |                                                             |
          |                                                             |
          |    ====================  15      ===================  13    |
          |    ====================  11      ===================  09    |
    this one>> ====================  07      ===================  05    |
    is bad|    ====================  03      ===================  01    |
          |                                                             |
          +--------------------------PINS-------------------------------+
          MMB in error = 0     SIMM in error = 07
    
    MMB0 is the lower MMB, located in the middle of the system backplane.
    
    I got to this callout by using note 1658 in this notes file.
    
    EI ADDR             xFFFFFF000F7948EF
                                       |_______bit 4=0 EI ADDR            
                                               so use EVEN MMB column in chart
    
    FILL SYNDROME       x000000000000E500
                                     \/
                                   syndrome data is "E5" <15:8> so high
                                                               Quadword
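
    The two register reads above can be sketched in Python. This is only an
    illustration, not how DECevent itself decodes the entry: the function
    name and dictionary keys are made up for the example, and treating
    syndrome byte <7:0> as the low quadword is an assumption (the note only
    states that <15:8> is the high quadword).

```python
def decode_fill_error(ei_addr, fill_syndrome):
    """Pull out the fields used in the callout (hypothetical helper)."""
    return {
        # Bit 4 of EI ADDR picks the even (0) or odd (1) column of the chart.
        "ei_addr_bit4": (ei_addr >> 4) & 1,
        # Syndrome byte <7:0>: assumed here to belong to the low quadword.
        "syndrome_low_qw": fill_syndrome & 0xFF,
        # Syndrome byte <15:8>: the high quadword, as stated in the note.
        "syndrome_high_qw": (fill_syndrome >> 8) & 0xFF,
    }


# Register values as posted in note .0
fields = decode_fill_error(0xFFFFFF000F7948EF, 0x000000000000E500)
print(fields["ei_addr_bit4"], hex(fields["syndrome_high_qw"]))  # 0 0xe5
```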
    
    
    Find which bank the EI Address=FFFFFF000F7948EF resides in by
    using  >>>show memory                              
          
        Memory Size = 448Mb
        Bank      Size/Sets   Base Addr     Speed
        ------    ----------  ---------     -----
        00        064Mb/2     018000000     Slow       EI Address
        01        256Mb/2     000000000     Fast < --- falls into bank 1
        02        128Mb/1     010000000     Fast                      
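
    The bank lookup can be sketched the same way. The bank table below is
    copied from the show memory output; the mask is an assumption, namely
    that only bits <39:4> of EI ADDR carry the physical address and the
    remaining bits simply read as ones (which matches the posted value).
    The names BANKS, EI_ADDR_MASK and find_bank are hypothetical.

```python
MB = 1 << 20

# (bank, size in MB, base address) from the show memory output above
BANKS = [
    (0, 64, 0x018000000),
    (1, 256, 0x000000000),
    (2, 128, 0x010000000),
]

# Assumption: only bits <39:4> of EI ADDR hold the physical address.
EI_ADDR_MASK = 0xFFFFFFFFF0


def find_bank(ei_addr):
    """Return the bank whose address range contains the error address."""
    phys = ei_addr & EI_ADDR_MASK
    for bank, size_mb, base in BANKS:
        if base <= phys < base + size_mb * MB:
            return bank
    return None  # address is outside all configured banks


print(find_bank(0xFFFFFF000F7948EF))  # 1
```

    With the mask applied the address is 0x0F7948E0, which is below the
    256 MB limit of bank 1 (base 0, end 0x10000000).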
    
    
    Look in the table in note 1658 and find the following line.
    
        Syndrome   Data bit   Even MMB   SIMM   Odd MMB   SIMM
        --------   --------   --------   ----   -------   ----
        0xE5       25         0          3      0         4

    Bit 4 of EI ADDR is 0, so use the even columns: failing MMB=0 and
    failing SIMM=3, 7, 11 or 15, depending on which bank the EI address
    falls into. In this case bank=1, so the bad SIMM=7.
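
    As a rough sketch, the final step of the callout looks like this in
    Python. Only the 0xE5 row quoted from note 1658 is included; the rest of
    that table is not reproduced here. Two things are assumptions rather
    than facts from the note: that the odd columns apply when bit 4 is 1,
    and that the SIMM number always steps by 4 per bank (inferred from the
    3, 7, 11, 15 sequence). The names are hypothetical.

```python
# syndrome -> (data bit, even MMB, even SIMM, odd MMB, odd SIMM)
# Only the row quoted from note 1658; other syndromes are not listed here.
SYNDROME_ROWS = {0xE5: (25, 0, 3, 0, 4)}

# Assumption: SIMM numbers step by 4 from one bank to the next (3, 7, 11, 15).
SIMM_STEP_PER_BANK = 4


def failing_simm(syndrome, ei_addr_bit4, bank):
    """Return (MMB, SIMM) for a syndrome, EI ADDR bit 4 and bank number."""
    _data_bit, even_mmb, even_simm, odd_mmb, odd_simm = SYNDROME_ROWS[syndrome]
    if ei_addr_bit4 == 0:
        mmb, simm = even_mmb, even_simm
    else:
        # Assumption: the odd columns are used when bit 4 is set.
        mmb, simm = odd_mmb, odd_simm
    return mmb, simm + SIMM_STEP_PER_BANK * bank


print(failing_simm(0xE5, 0, 1))  # (0, 7)
```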
    
        There are 2 MMBs (Memory Mother Boards) in an AS600 system:
        MMB0 on the bottom (center of machine) and MMB1 on top (above).
        Each MMB holds half of a complete memory bank (4 SIMMs), so a
        full bank is 8 SIMMs.
    
    
    jim hutmacher mvhs colorado csc 800-354-9000 ext 25561