| Title: | Alpha Workstation Conference |
| Notice: | See note 1.* for conference notices |
| Moderator: | WRKSYS::HOUSE |
| Created: | Wed Sep 07 1994 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1996 |
| Total number of notes: | 9122 |
Can anybody shed some light on the following?
We have an AlphaStation 600 5/333 running Digital UNIX V4.0 (rev 386) that is
getting intermittent CPU exceptions, with the following info from DECevent:
DECevent entry info:
====================
Entry type 100 CPU machine Check Error
CPU Minor Class 3 Bcache error (630 entry)
Mach Check Err Code 086 EV5 Detected Correctable ECC error
EI ADDR xFFFFFF000F7948EF
FILL SYNDROME x000000000000E500
EI STATUS xFFFFFFF0C4FFFFFF
error occurred during Dref fill
ISR x0000000100000000
correctable ECC errors (IPL31)
All other registers are shown as being 0000 by the DECevent output.
This is a soft error: the machine doesn't crash, and there is no
noticeable impact on performance.
Parts replaced so far:
======================
We have tried replacing the following parts, all of which have now been
taken back out of the machine and returned to logistics.
Bcache SIMMs      Replaced 2 in one go, then refitted the originals and
                  replaced the remaining SIMM
Memory SIMMs      Removed banks 1 + 2 (leaving bank 0 in),
                  then removed bank 0 and refitted just bank 1 in place
                  of the original bank 0
Mem motherboards  Replaced one at a time
System board      Replaced
None of the above had any effect on the fault.
We can force the machine to log several of these soft errors by running
DECvet with 5 CPU exers & 5 MEM exers for approx 10 minutes.
In normal use these errors crop up randomly and infrequently: typically
about 1 a day, but sometimes up to 20 a day.
System Config info:
===================
Below are outputs of show config & show *
>>>show config
Firmware
SRM Console: V6.3-11
ARC Console: 4.49
PALcode: VMS PALcode V1.18-0, OSF PALcode V1.21-0
SROM Version: V1.2
Processor
DECchip (tm) 21164-5 Pass 4 333 MHz 96 KBytes SCache
4 MB BCache
CIA ASIC Pass 2
MEMORY
Memory Size = 448Mb
Bank Size/Sets Base Addr Speed
------ ---------- --------- -----
00 064Mb/2 018000000 Slow
01 256Mb/2 000000000 Fast
02 128Mb/1 010000000 Fast
BCache Size = 4Mb
Tested Memory = 448Mbytes
PCI Bus
Bus 00 Slot 07: Digital TGA2 Graphics Controller
Bus 00 Slot 08: Digital PCI to PCI Bridge Chip
Bus 01 Slot 00: DECchip 21040 Network Controller
ewa0.0.0.1000.0 00-00-F8-20-E6-95
Bus 01 Slot 01: ISP1020 Scsi Controller
pka0.7.0.1001.0 SCSI Bus ID 7
dka0.0.0.1001.0 RZ28D
dka300.3.0.1001.0 DSP5200S
dka500.5.0.1001.0 RRD45
jka401.4.0.1001.0 TZ
mka400.4.0.1001.0 TZ887
Bus 01 Slot 02: ISP1020 Scsi Controller
pkb0.7.0.1002.0 SCSI Bus ID 7
Bus 00 Slot 09: DAC960 Scsi Raid Controller
dra.0.0.9.0
dra0.0.0.9.0 1 Member JBOD
dra1.0.0.9.0 1 Member JBOD
dra2.0.0.9.0 1 Member JBOD
dra3.0.0.9.0 1 Member JBOD
dra4.0.0.9.0 1 Member JBOD
dra5.0.0.9.0 1 Member JBOD
dra6.0.0.9.0 1 Member JBOD
dra7.0.0.9.0 1 Member JBOD
Bus 00 Slot 10: Intel 8275EB PCI to Eisa Bridge
Bus 00 Slot 12: DAC960 Scsi Raid Controller
drb.0.0.12.0
drb0.0.0.12.0 2 Member RAID0
drb1.0.0.12.0 2 Member RAID0
drb2.0.0.12.0 2 Member RAID0
drb3.0.0.12.0 2 Member RAID0
EISA Bus Modules (installed)
EISA/ISA NVR configuration
SLOT Module
>>>show *
auto_action HALT
boot_dev dka0.0.0.1001.0
boot_file
boot_osflags A
boot_reset OFF
bootdef_dev dka0.0.0.1001.0
booted_dev dka0.0.0.1001.0
booted_file
booted_osflags A
bus_probe_algorithm new
char_set 0
console graphics
controlp on
dump_dev
enable_audit ON
ewa0_arp_tries 3
ewa0_bootp_file
ewa0_bootp_server
ewa0_bootp_tries 3
ewa0_def_ginetaddr 0.0.0.0
ewa0_def_inetaddr 0.0.0.0
ewa0_def_inetfile
ewa0_def_sinetaddr 0.0.0.0
ewa0_def_subnetmask 0.0.0.0
ewa0_ginetaddr 0.0.0.0
ewa0_inet_init bootp
ewa0_inetaddr 0.0.0.0
ewa0_inetfile
ewa0_loop_count 3e8
ewa0_loop_inc a
ewa0_loop_patt ffffffff
ewa0_loop_size 2e
ewa0_lp_msg_node 1
ewa0_mode AUI
ewa0_protocols BOOTP
ewa0_sinetaddr 0.0.0.0
ewa0_tftp_tries 3
kbd_hardware_type PCXAL
language 38
language_name English(British/Irish)
license MU
mopv3_boot OFF
os_type UNIX
pal VMS PALcode V1.18-0, OSF PALcode V1.21-0
pci_parity off
pka0_host_id 7
pka0_soft_term on
pkb0_host_id 7
pkb0_soft_term on
quick_start OFF
scsi_poll ON
sys_serial_num AY63709052
tga_sync_green 0
tt_allow_login 1
tty_dev 0
version V6.3-11 Nov 20 1996 16:01:42
Any ideas???
Phil Morris
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1963.1 | AS600 MMB=0 and failing SIMM=7 | CSC32::HUTMACHER | | Fri May 09 1997 15:34 | 79 |
Hi Phil
From the latest registers posted in note .0, this looks like a memory problem
on MMB=0 with failing SIMM=7, one of the bank 1 SIMMs (32 MB SIMM = 54-21277-BA).
______________________________________________________________________
           +-------------------- Top of MMB ---------------------------+
           |        SIMM #                        SIMM #               |
           | ==================== 16   =================== 14          |
           | ==================== 12   =================== 10          |
           | ==================== 08   =================== 06          |
           | ==================== 04   =================== 02          |
           |                                                           |
           |                                                           |
           | ==================== 15   =================== 13          |
           | ==================== 11   =================== 09          |
this one>> | ==================== 07   =================== 05          |
is bad     | ==================== 03   =================== 01          |
           |                                                           |
           +-------------------------- PINS ---------------------------+
MMB in error = 0   SIMM in error = 07
MMB0 is the lower MMB, located in the middle of the system backplane.
I got to this callout by using note 1658 in this notes file.
EI ADDR        xFFFFFF000F7948EF
                              |_______ bit 4 = 0 in EI ADDR,
                                       so use the EVEN MMB column in the chart
FILL SYNDROME  x000000000000E500
                            \/
                            syndrome data is "E5" in bits <15:8>,
                            so high quadword
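The two register checks above can be sketched in a few lines of Python. This is only an illustration of the decoding described in this reply, not a DECevent tool. The function names are made up for the example. Treating bit 4 = 1 as "odd" and bits <7:0> as the low-quadword syndrome are assumptions; the reply only shows the bit 4 = 0 and <15:8> cases.

```python
# Register values copied from the DECevent entry in note .0
EI_ADDR = 0xFFFFFF000F7948EF
FILL_SYNDROME = 0x000000000000E500


def mmb_column(ei_addr):
    """Pick the chart column from EI ADDR bit 4 (0 = even, per the reply).

    Assumption: bit 4 = 1 selects the odd column.
    """
    return "odd" if (ei_addr >> 4) & 1 else "even"


def syndrome_bytes(fill_syndrome):
    """Return the non-zero syndrome bytes, keyed by quadword.

    Bits <15:8> are the high-quadword syndrome (per the reply).
    Assumption: bits <7:0> are the low-quadword syndrome.
    """
    found = {}
    low = fill_syndrome & 0xFF
    high = (fill_syndrome >> 8) & 0xFF
    if low:
        found["low"] = low
    if high:
        found["high"] = high
    return found


if __name__ == "__main__":
    print("MMB column:", mmb_column(EI_ADDR))
    for quadword, value in syndrome_bytes(FILL_SYNDROME).items():
        print("%s quadword syndrome: 0x%02X" % (quadword, value))
```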
Find which bank the EI Address = FFFFFF000F7948EF resides in by
using >>>show memory
Memory Size = 448Mb
Bank   Size/Sets   Base Addr   Speed
------ ----------  ---------   -----
00     064Mb/2     018000000   Slow
01     256Mb/2     000000000   Fast   <--- EI Address falls into bank 1
02     128Mb/1     010000000   Fast
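The bank lookup can be checked mechanically. The sketch below uses the base addresses and sizes from the show memory output above. It assumes the physical address sits in bits <39:4> of EI ADDR, with the remaining bits reading as ones; that matches the FFFFFF prefix and trailing F in the logged value, but it is an assumption here, not something stated in this thread. The names BANKS, physical_address and find_bank are made up for the example.

```python
MB = 1 << 20

# (bank number, base address, size in MB) from the show memory output
BANKS = [
    (0, 0x018000000, 64),
    (1, 0x000000000, 256),
    (2, 0x010000000, 128),
]


def physical_address(ei_addr):
    """Strip the bits assumed not to be part of the address.

    Assumption: only bits <39:4> of EI ADDR carry the physical address.
    """
    return ei_addr & 0xFFFFFFFFF0


def find_bank(pa, banks=BANKS):
    """Return the bank whose [base, base + size) range contains pa, or None."""
    for bank, base, size_mb in banks:
        if base <= pa < base + size_mb * MB:
            return bank
    return None


if __name__ == "__main__":
    pa = physical_address(0xFFFFFF000F7948EF)
    print("physical address: 0x%09X" % pa)
    print("bank:", find_bank(pa))
```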
Look in the table in note 1658 and find the following line.

 Syndrome  Data Bit  *even MMB  SIMM  *odd MMB  SIMM
 0xE5,     25,       0,         3,    0,        4,
  |                  |          |
  syndrome           even MMB and SIMM columns (EI ADDR bit 4 = 0)

So the failing MMB=0 and the failing SIMM=3, 7, 11, or 15, depending on which
bank ei_addr falls into. Since in this case bank=1, the bad SIMM=7.
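The last step can be written out as a small lookup. The only table row included is the 0xE5 line quoted above; the full table lives in note 1658 and is not reproduced here. The rule "SIMM = table SIMM + 4 x bank" is inferred from the 3, 7, 11, 15 list in this reply, so treat it as an assumption, and the names are made up for the example.

```python
# Single row quoted from note 1658: syndrome -> per-column (MMB, SIMM)
SYNDROME_TABLE = {
    0xE5: {"data_bit": 25, "even": (0, 3), "odd": (0, 4)},
}


def failing_simm(syndrome, column, bank):
    """Return (MMB, SIMM) for a syndrome, chart column and memory bank.

    Assumption: each bank adds 4 to the SIMM number in the table row,
    which reproduces the 3, 7, 11, 15 sequence for banks 0 to 3.
    """
    if bank not in range(4):
        raise ValueError("bank must be 0..3")
    mmb, base_simm = SYNDROME_TABLE[syndrome][column]
    return mmb, base_simm + 4 * bank


if __name__ == "__main__":
    mmb, simm = failing_simm(0xE5, "even", bank=1)
    print("MMB in error = %d, SIMM in error = %02d" % (mmb, simm))
```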
There are 2 MMBs (Memory Mother Boards) in an AS600 system:
MMB0 on the bottom (center of machine) and MMB1 on top (above it).
Each MMB holds half of a complete memory bank (4 SIMMs);
a full bank is 8 SIMMs.
jim hutmacher mvhs colorado csc 800-354-9000 ext 25561