Title: | Alpha Workstation Conference |
Notice: | See note 1.* for conference notices |
Moderator: | WRKSYS::HOUSE |
Created: | Wed Sep 07 1994 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1996 |
Total number of notes: | 9122 |
Can anybody shed some light on the following?

We have an AlphaStation 600 5/333 running Digital Unix V4.0 rev 386 that is getting intermittent CPU exceptions, with the following info from DECevent:

DECevent entry info:
====================
Entry type            100  CPU Machine Check Error
CPU Minor Class       3    Bcache error (630 entry)
Mach Check Err Code   086  EV5 Detected Correctable ECC error
EI ADDR               xFFFFFF000F7948EF
FILL SYNDROME         x000000000000E500
EI STATUS             xFFFFFFF0C4FFFFFF  error occurred during Dref fill
ISR                   x0000000100000000  correctable ECC errors (IPL31)

All other registers are shown as being 0000 by the DECevent output.

This is a soft error: the machine doesn't crash and there is no noticeable impact on performance etc.

Parts replaced so far:
======================
We have tried replacing the following parts, which have now all been taken back out of the machine and returned to logistics.

Bcache SIMMs      Replaced 2 in 1 go, then refitted the originals and replaced the remaining SIMM
Memory SIMMs      Removed banks 1 + 2 (leaving bank 0 in), then removed bank 0 and refitted just bank 1 in place of the original bank 0
Mem motherboard   Replaced one at a time
System Board      Replaced

None of the above had any effect on the fault.

We can force the machine to log several of these soft errors by running DECvet with 5 CPU exers and 5 MEM exers for approx 10 minutes.

In normal use these errors crop up randomly and infrequently, maybe 1 a day, but can be up to 20 a day.
System Config info:
===================
Below are outputs of show config & show *

>>>show config

Firmware
SRM Console:    V6.3-11
ARC Console:    4.49
PALcode:        VMS PALcode V1.18-0, OSF PALcode V1.21-0
SROM Version:   V1.2

Processor
DECchip (tm) 21164-5 Pass 4 333 MHz 96 KBytes SCache
4 MB BCache
CIA ASIC Pass 2

MEMORY
Memory Size = 448Mb
Bank    Size/Sets   Base Addr   Speed
------  ----------  ---------   -----
00      064Mb/2     018000000   Slow
01      256Mb/2     000000000   Fast
02      128Mb/1     010000000   Fast
BCache Size = 4Mb
Tested Memory = 448Mbytes

PCI Bus
Bus 00 Slot 07: Digital TGA2 Graphics Controller
Bus 00 Slot 08: Digital PCI to PCI Bridge Chip
Bus 01 Slot 00: DECchip 21040 Network Controller
                  ewa0.0.0.1000.0      00-00-F8-20-E6-95
Bus 01 Slot 01: ISP1020 Scsi Controller
                  pka0.7.0.1001.0      SCSI Bus ID 7
                  dka0.0.0.1001.0      RZ28D
                  dka300.3.0.1001.0    DSP5200S
                  dka500.5.0.1001.0    RRD45
                  jka401.4.0.1001.0    TZ
                  mka400.4.0.1001.0    TZ887
Bus 01 Slot 02: ISP1020 Scsi Controller
                  pkb0.7.0.1002.0      SCSI Bus ID 7
Bus 00 Slot 09: DAC960 Scsi Raid Controller
                  dra.0.0.9.0
                  dra0.0.0.9.0         1 Member JBOD
                  dra1.0.0.9.0         1 Member JBOD
                  dra2.0.0.9.0         1 Member JBOD
                  dra3.0.0.9.0         1 Member JBOD
                  dra4.0.0.9.0         1 Member JBOD
                  dra5.0.0.9.0         1 Member JBOD
                  dra6.0.0.9.0         1 Member JBOD
                  dra7.0.0.9.0         1 Member JBOD
Bus 00 Slot 10: Intel 8275EB PCI to Eisa Bridge
Bus 00 Slot 12: DAC960 Scsi Raid Controller
                  drb.0.0.12.0
                  drb0.0.0.12.0        2 Member RAID0
                  drb1.0.0.12.0        2 Member RAID0
                  drb2.0.0.12.0        2 Member RAID0
                  drb3.0.0.12.0        2 Member RAID0

EISA Bus Modules (installed)
EISA/ISA NVR configuration
SLOT Module

>>>show *
auto_action             HALT
boot_dev                dka0.0.0.1001.0
boot_file
boot_osflags            A
boot_reset              OFF
bootdef_dev             dka0.0.0.1001.0
booted_dev              dka0.0.0.1001.0
booted_file
booted_osflags          A
bus_probe_algorithm     new
char_set                0
console                 graphics
controlp                on
dump_dev
enable_audit            ON
ewa0_arp_tries          3
ewa0_bootp_file
ewa0_bootp_server
ewa0_bootp_tries        3
ewa0_def_ginetaddr      0.0.0.0
ewa0_def_inetaddr       0.0.0.0
ewa0_def_inetfile
ewa0_def_sinetaddr      0.0.0.0
ewa0_def_subnetmask     0.0.0.0
ewa0_ginetaddr          0.0.0.0
ewa0_inet_init          bootp
ewa0_inetaddr           0.0.0.0
ewa0_inetfile
ewa0_loop_count         3e8
ewa0_loop_inc           a
ewa0_loop_patt          ffffffff
ewa0_loop_size          2e
ewa0_lp_msg_node        1
ewa0_mode               AUI
ewa0_protocols          BOOTP
ewa0_sinetaddr          0.0.0.0
ewa0_tftp_tries         3
kbd_hardware_type       PCXAL
language                38
language_name           English(British/Irish)
license                 MU
mopv3_boot              OFF
os_type                 UNIX
pal                     VMS PALcode V1.18-0, OSF PALcode V1.21-0
pci_parity              off
pka0_host_id            7
pka0_soft_term          on
pkb0_host_id            7
pkb0_soft_term          on
quick_start             OFF
scsi_poll               ON
sys_serial_num          AY63709052
tga_sync_green          0
tt_allow_login          1
tty_dev                 0
version                 V6.3-11 Nov 20 1996 16:01:42

Any ideas???

Phil Morris
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1963.1 | AS600 MMB=0 and failing SIMM=7 | CSC32::HUTMACHER | Fri May 09 1997 16:34 | 79 | |
Hi Phil,

From the latest registers posted in note .0, this looks like a memory problem on MMB=0 with failing SIMM=7, one of the BANK1 SIMMs (32meg SIMM = 54-21277-BA).

______________________________________________________________________

            +------------------ Top of MMB ------------------+
            |               SIMM #               SIMM #      |
            | ==================== 16 =================== 14 |
            | ==================== 12 =================== 10 |
            | ==================== 08 =================== 06 |
            | ==================== 04 =================== 02 |
            |                                                |
            |                                                |
            | ==================== 15 =================== 13 |
            | ==================== 11 =================== 09 |
this one>>  | ==================== 07 =================== 05 |
is bad      | ==================== 03 =================== 01 |
            |                                                |
            +--------------------- PINS ---------------------+

MMB in error  = 0
SIMM in error = 07

MMB0 is the lower MMB, located in the middle of the system backplane.

I got to this callout by using note 1658 in this notes file:

EI ADDR xFFFFFF000F7948EF
                       |_______ bit 4 = 0 in EI ADDR, so use the EVEN MMB column in the chart

FILL SYNDROME x000000000000E500
                           \/   syndrome data is "E5" in <15:8>, so high quadword

Find which bank the EI address (FFFFFF000F7948EF) resides in by using:

>>>show memory
Memory Size = 448Mb
Bank    Size/Sets   Base Addr   Speed
------  ----------  ---------   -----
00      064Mb/2     018000000   Slow
01      256Mb/2     000000000   Fast    <--- EI address falls into bank 1
02      128Mb/1     010000000   Fast

Look in the table in note 1658 and find the following line:

            Data    Even*           Odd*
Syndrome    Bit     MMB     SIMM    MMB     SIMM
0xE5,       25,     0,      3,      0,      4,

The first column is the syndrome. Reading the even columns gives failing MMB=0 and failing SIMM=3, 7, 11 or 15, depending on which bank ei_addr falls into. Since in this case bank=1, the bad SIMM=7.

There are 2 MMBs (Memory Mother Boards) in an AS600 system: MMB0 on the bottom (center of machine) and MMB1 on top (above). Each MMB holds half of a complete memory bank (4 SIMMs); a full bank is 8 SIMMs.
jim hutmacher mvhs colorado csc 800-354-9000 ext 25561 |
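The decode steps in the reply above can be sketched in a few lines of Python. This is only an illustration, not the procedure from note 1658 itself. The bank table is copied from the `show memory` output in note .0, and the syndrome table holds just the one row quoted in the reply. Three things are assumptions: that the leading FFFFFF in EI ADDR are unimplemented bits that read as ones (so only the low 40 bits are kept), that the SIMM number advances by 4 per bank (inferred from "3,7,11,15 depending on which bank"), and that a low-quadword syndrome would sit in bits <7:0>. The names `BANKS`, `SYNDROME_TABLE`, `find_bank` and `decode` are made up for this sketch.

```python
# Bank table from the console `show memory` output in note .0:
# (bank number, size in MB, base address)
BANKS = [
    (0, 64, 0x018000000),
    (1, 256, 0x000000000),
    (2, 128, 0x010000000),
]

# The single row of the note 1658 table quoted in the reply:
# syndrome -> (data bit, even MMB, even SIMM, odd MMB, odd SIMM)
SYNDROME_TABLE = {0xE5: (25, 0, 3, 0, 4)}

# Assumption: only the low 40 bits of EI ADDR carry the physical address.
ADDR_MASK = (1 << 40) - 1


def find_bank(ei_addr):
    """Return the bank whose address range contains ei_addr, or None."""
    addr = ei_addr & ADDR_MASK
    for bank, size_mb, base in BANKS:
        if base <= addr < base + size_mb * 1024 * 1024:
            return bank
    return None


def decode(ei_addr, fill_syndrome):
    """Return (bank, quadword, MMB, SIMM) for a correctable ECC error."""
    bank = find_bank(ei_addr)
    if bank is None:
        raise ValueError("EI ADDR does not fall into any configured bank")

    # Bit 4 of EI ADDR selects the even (0) or odd (1) columns of the table.
    use_even = ((ei_addr >> 4) & 1) == 0

    # A non-zero syndrome in bits <15:8> means the high quadword;
    # treating <7:0> as the low-quadword syndrome is an assumption.
    high = (fill_syndrome >> 8) & 0xFF
    low = fill_syndrome & 0xFF
    quadword, syndrome = ("high", high) if high else ("low", low)

    _data_bit, even_mmb, even_simm, odd_mmb, odd_simm = SYNDROME_TABLE[syndrome]
    mmb, simm = (even_mmb, even_simm) if use_even else (odd_mmb, odd_simm)

    # Assumption: the table gives the bank 0 SIMM; each bank adds 4.
    return bank, quadword, mmb, simm + 4 * bank


result = decode(0xFFFFFF000F7948EF, 0x000000000000E500)
print(result)
```

With the register values from note .0 this reproduces the callout in the reply: bank 1, high quadword, MMB 0, SIMM 7. Any other syndrome raises a `KeyError`, because only the one quoted table row is available here.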