
Conference mvblab::alphaserver_4100

Title: AlphaServer 4100
Moderator: MOVMON::DAVISS
Created: Tue Apr 16 1996
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 648
Total number of notes: 3158

547.0. "AS4100/test cpu3 fails" by LOTIMA::P_MORRIS () Wed Mar 26 1997 07:23

    Hi
    
    Has anybody had similar problems to the following:-
    
    Config is:-
    
    Alphaserver 4100 Rackmount
    =========== ==== =========
    4 x 300MHz CPU, part no: B3002-AB
    4 x 128MB  MEM, Dataram (DEC equivalent = B3020-CA)
    
    SRM version is 3.0-10 (V3.8 CD)
    System is running Digital Unix V4.0A
    
    in this config power test is OK
    test cpu0...OK   test mem0...OK
    test cpu1...OK   test mem1...OK
    test cpu2...OK   test mem2...OK
    test cpu3...FAIL test mem3...OK
    
    If CPU3 is removed everything is OK.
    
    Alternatively, if all 4 CPUs are fitted and MEM banks 2 & 3 are
    removed (MEM0 & 1 installed), ALL OK.
    
    If all 4 CPUs are fitted and MEM0, 1 & 2 are fitted,
    the machine fails test cpu3.
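    
    The four configurations above can be tabulated to make the pattern
    easier to see. This is only a sketch in Python: the tuple layout and
    the function names are made up for illustration, and the results are
    the ones reported in this note, nothing more.

```python
# Observations from this note, as (CPUs fitted, memory banks fitted, result).
# The layout and names are illustrative; they are not console output.
observations = [
    (4, 4, "FAIL"),  # full config: test cpu3 fails
    (3, 4, "OK"),    # CPU3 removed
    (4, 2, "OK"),    # MEM0 and MEM1 only
    (4, 3, "FAIL"),  # MEM0, MEM1 and MEM2
]


def failing_configs(obs):
    """Return the (cpus, banks) pairs whose test reported FAIL."""
    return [(cpus, banks) for cpus, banks, result in obs if result == "FAIL"]


def min_failing_banks(obs, cpus=4):
    """Smallest number of memory banks at which the given CPU count fails."""
    banks = [b for c, b, r in obs if c == cpus and r == "FAIL"]
    return min(banks) if banks else None


print(failing_configs(observations))    # [(4, 4), (4, 3)]
print(min_failing_banks(observations))  # 3
```

    On these four data points the failure needs all 4 CPUs and at least
    3 memory banks fitted at the same time.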
    
    The CPUs have been cycled around & the CPU/MEM motherboard
    (54-23803-01) has been replaced; neither made any difference.
    
    The memory_test variable is set to full.
    
    We are currently trying to source 8 x B3020-CA to make the machine
    "totally DEC" instead of using the Dataram memory, but I'm not sure that
    this is going to make any difference.
    
    Has anybody got any ideas ???
    All help gratefully received
    
    Regards
               
    Phil Morris 
    
    
T.R  Title  User  Personal Name  Date  Lines
547.1. by MAY21::CUMMINS () Wed Mar 26 1997 12:13, 14 lines
    You can do the following to get more info about the failure:
    
     1. Observe the OCP display. What does it freeze on when it reports
        "FAIL"? The power-up code freezes failure displays for 3-5
        seconds. Is it failing a CPU or a memory test?
    
     2. Using a serial console (CONSOLE=SERIAL), do the following:
    
        a. Bring up the machine to the SRM prompt (remove CPU3 if necessary).
        b. Type the following at the SRM prompt:  P00>>> d toy:22 22 -b
           This enables additional error reporting (MFG mode).
        c. Recreate the failure situation and reset the machine. 
        d. Does any additional error info get printed?
        e. Post the entire power-up display log as a reply to this note.
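    
    Once the log has been captured from the serial console, a short script
    can pick out the lines of interest before posting. A minimal sketch in
    Python, assuming the capture was saved as plain text: the pattern list
    and the function name are hypothetical, and the marker strings are
    simply the messages quoted elsewhere in this topic, not a complete list
    of what the console can print.

```python
import re

# Marker strings taken from console output quoted in this topic.
# This list is an assumption, not an exhaustive set of console messages.
FAILURE_PATTERNS = [
    re.compile(r"Unexpected Machine Check"),
    re.compile(r"soft error detected"),
    re.compile(r"Console Crash"),
    re.compile(r"\bFAIL\b"),
]


def find_failures(log_text):
    """Return (line_number, stripped_line) for every line matching a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append((number, line.strip()))
    return hits


sample = """\
Starting background cache/memory test on CPU3..
***CPU 00: Unexpected Machine Check through vector 00000660
Process memtest CPU2: soft error detected, vector 00630
"""

for number, line in find_failures(sample):
    print(number, line)
```

    The line numbers make it easy to quote just the relevant part of a
    long log in a reply.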
547.2. by MAY21::CUMMINS () Wed Mar 26 1997 12:14, 1 line
    This system has two or three power supplies, yes?
547.3. "Console outputs of crash" by LOTIMA::P_MORRIS () Fri Apr 04 1997 13:02, 726 lines
    Sorry for the delay in supplying this info.
    And also for the length of this note; you may want to extract this reply
    first.
    
    Below are the power-up display and the output of show config, show *
    etc., plus captures of the crashes while running test cpu from the
    console.
    
    This machine has 2 PSUs, and in the following example the Dataram
    memory has been swapped out for DEC memory (4 banks of 128MB sync).
    
      THE CPUS ARE ALL REV D07
      THE CPU/MEM MOTHERBOARD IS REV B06
      THE DIGITAL MEM MODULES ARE A MIX OF REV D04, D05 AND D06
      NONE OF THE CPUS HAVE BEEN REPLACED BUT THEY HAVE BEEN MOVED AROUND.
      THE PROBLEM ONLY OCCURS WHEN ALL 4 CPUS ARE INSTALLED, AS STATED
      IN .0 OF THIS NOTE
    
    
      CONSOLE OUTPUTS:
      ================
    INIT
    Initializing...    
     SROM V1.1 on cpu0
     SROM V1.1 on cpu1
     SROM V1.1 on cpu2
     SROM V1.1 on cpu3
    XSROM V3.0 on cpu1
    XSROM V3.0 on cpu2
    XSROM V3.0 on cpu3
    XSROM V3.0 on cpu0
    BCache testing complete on cpu1
    BCache testing complete on cpu0
    BCache testing complete on cpu3
    BCache testing complete on cpu2
    mem_pair0 - 128 MB 
    mem_pair1 - 128 MB 
    mem_pair2 - 128 MB 
    mem_pair3 - 128 MB 
    20..20..20..21..21..21..20..21..23..
    please wait 10 seconds for T24 to complete
    24..24..24..24..
    Memory testing complete on cpu0
    Memory testing complete on cpu2
    Memory testing complete on cpu3
    Memory testing complete on cpu1
    starting console on CPU 0
    sizing memory
      0    128 MB SYNC
      1    128 MB SYNC
      2    128 MB SYNC
      3    128 MB SYNC
    starting console on CPU 1
    starting console on CPU 2
    starting console on CPU 3
    probing IOD1 hose 1 
      bus 0 slot 1 - NCR 53C810
      bus 0 slot 2 - DECchip 21040-AA
      bus 0 slot 4 - NCR 53C810
      bus 0 slot 5 - DEC PCI FDDI
    probing IOD0 hose 0 
      bus 0 slot 1 - PCEB
        probing EISA Bridge, bus 1
      bus 0 slot 2 - S3 Trio64/Trio32
      bus 0 slot 4 - Mylex DAC960
    ** keyboard error **
    configuring I/O adapters...
      ncr0, hose 1, bus 0, slot 1
      tulip0, hose 1, bus 0, slot 2
      ncr1, hose 1, bus 0, slot 4
      pfi0, hose 1, bus 0, slot 5
      floppy0, hose 0, bus 1, slot 0
      dac0, hose 0, bus 0, slot 4
    System temperature is 25 degrees C
    AlphaServer 4100 Console V3.0-10, 19-NOV-1996 13:57:07
    
    Halt Button is IN, BOOT NOT POSSIBLE
    
    
    P00>>>sho config
                               Digital Equipment Corporation
                                     AlphaServer 4100
    
     Console V3.0-10  OpenVMS PALcode V1.19-2, Digital UNIX PALcode V1.21-14
    
     Module                          Type     Rev    Name  
     System Motherboard              0        0000   mthrbrd0
     Memory  128 MB SYNC             0        0000   mem0     
     Memory  128 MB SYNC             0        0000   mem1     
     Memory  128 MB SYNC             0        0000   mem2     
     Memory  128 MB SYNC             0        0000   mem3     
     CPU (2MB Cache)                 2        0004   cpu0     
     CPU (2MB Cache)                 2        0004   cpu1     
     Bridge (IOD0/IOD1)              600      0032   iod0/iod1
     PCI Motherboard                 8        0000   saddle0  
     CPU (2MB Cache)                 2        0004   cpu2     
     CPU (2MB Cache)                 2        0004   cpu3     
    
     Bus 0  iod0 (PCI0)
     Slot   Option Name              Type     Rev    Name
     1      PCEB                     4828086  0015   pceb0    
     2      S3 Trio64/Trio32         88115333 0000   vga0     
     4      Mylex DAC960             11069    0002   dac0     
    
     Bus 1  pceb0 (EISA Bridge connected to iod0, slot 1)
     Slot   Option Name              Type     Rev    Name
    
     Bus 0  iod1 (PCI1)
     Slot   Option Name              Type     Rev    Name
     1      NCR 53C810               11000    0002   ncr0     
     2      DECchip 21040-AA         21011    0024   tulip0   
     4      NCR 53C810               11000    0002   ncr1     
     5      DEC PCI FDDI             f1011    0000   pfi0     
    
    
    
    P00>>>sho dev
    polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1   SCSI Bus ID 7
    dka500.5.0.1.1     DKa500                   RRD45  0436
    polling ncr1 (NCR 53C810) slot 4, bus 0 PCI, hose 1   SCSI Bus ID 7
    dkb0.0.0.4.1       DKb0                     RZ29B  0016
    dkb100.1.0.4.1     DKb100                   RZ29B  0016
    dkb200.2.0.4.1     DKb200                   RZ29B  0016
    dkb300.3.0.4.1     DKb300                   RZ29B  0016
    dkb400.4.0.4.1     DKb400                   RZ29B  0016
    dkb500.5.0.4.1     DKb500                   RZ28D  0010
    polling floppy0 (FLOPPY) PCEB - XBUS hose 0   
    dva0.0.0.1000.0    DVA0                      RX23
    polling dac0 (Mylex DAC960) slot 4, bus 0 PCI, hose 0 
    dra0.0.0.4.0       DRA0             1 Member JBOD       
    dra1.0.0.4.0       DRA1             1 Member JBOD       
    dra2.0.0.4.0       DRA2             1 Member JBOD       
    dra3.0.0.4.0       DRA3             1 Member JBOD       
    dra4.0.0.4.0       DRA4             1 Member JBOD       
    dra5.0.0.4.0       DRA5             1 Member JBOD       
    dra6.0.0.4.0       DRA6             1 Member JBOD       
    dra7.0.0.4.0       DRA7             1 Member JBOD       
    polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1 
    ewa0.0.0.2.1       00-00-F8-23-12-8F
    polling pfi0 (DEC PCI FDDI) slot 5, bus 0 PCI, hose 1 
    fwa0.0.0.5.1: 00-00-F8-40-0A-BA
    
    
    P00>>>SHO *
    arc_enable          	OFF             
    auto_action         	BOOT            
    boot_dev            	dkb0.0.0.4.1    
    boot_file           	                
    boot_osflags        	A               
    boot_reset          	OFF             
    bootdef_dev         	dkb0.0.0.4.1    
    booted_dev          	                
    booted_file         	                
    booted_osflags      	                
    cda0                	dka500.5.0.1.1  
    char_set            	0
    com1_baud           	9600            
    com1_flow           	SOFTWARE        
    com1_modem          	OFF             
    com2_baud           	9600            
    com2_flow           	SOFTWARE        
    com2_modem          	OFF             
    console             	serial          
    cpu_enabled         	f
    d_group             	field           
    d_omit              	                
    d_passes            	1
    d_runtime           	0
    d_verbose           	0
    dump_dev            	                
    enable_audit        	ON              
    ewa0_loop_count     	3e8
    ewa0_loop_inc       	a
    ewa0_loop_patt      	ffffffff
    ewa0_loop_size      	2e
    ewa0_lp_msg_node    	1
    ewa0_mode           	Twisted-Pair    
    exdep_data          	5555555555555555
    exdep_location      	0               
    exdep_size          	0
    exdep_space         	pmem            
    exdep_type          	3
    fru_table           	ON              
    full_powerup_diags  	ON              
    fwa0_loop_count     	3e8
    fwa0_loop_inc       	a
    fwa0_loop_patt      	ffffffff
    fwa0_loop_size      	2e
    fwa0_lp_msg_node    	1
    graphics_background 	4               
    graphics_foreground 	7               
    graphics_page       	0               
    graphics_switch     	-1              
    graphics_sync       	0               
    graphics_type       	VIDEO           
    kbd_hardware_type   	PCXAL           
    language            	36
    language_name       	English (American)
    license             	MU              
    memory_test         	full            
    ocp_text            	                
    os_type             	UNIX            
    pal                 	OpenVMS PALcode V1.19-2, Digital UNIX PALcode V1.21-14
    pci_arbmode         	Round-Robin     
    pci_parity          	ON              
    pci_req64           	                
    pka0_disconnect     	1
    pka0_fast           	1
    pka0_host_id        	7
    pkb0_disconnect     	1
    pkb0_fast           	1
    pkb0_host_id        	7
    prompt              	>>>             
    rcm_answer          	
    rcm_dialout         	
    rcm_init            	
    reset_boot_arg0     	0               
    reset_boot_arg1     	                
    reset_boot_arg2     	                
    sys_model_num       	4100            
    sys_serial_num      	AY65013525      
    sys_type            	RACKMOUNT       
    tga_sync_green      	8
    tt_allow_login      	1
    tta0_page           	0               
    tta0_type           	VIDEO           
    tty_dev             	0               
    version             	V3.0-10, 19-NOV-1996 13:57:07
    P00>>>
    P00>>>
    P00>>>
    P00>>>SHO DYNAMIC
    zone     zone       used    used       free    free       utili-  high
    address  size       blocks  bytes      blocks  bytes      zation  water
    -------- ---------- ------- ---------- ------- ---------- ------- ----------
    0001A4E0 618496     28      394112     9       224416      63 %   553312
    0001A120 982976     380     164512     22      818496      16 %   268800
    0011B8A0 520093696  1       32         1       520093696    0 %   144928
    P00>>>
    
    P00>>>d toy:22 22 -b
    P00>>>test cpu
    Console is in diagnostic mode
    System test, runtime 300 seconds
    
    Type ^C if you wish to abort testing once it has started
    
    Starting background cache/memory test on CPU0..
    Starting background cache/memory test on CPU1..
    Starting background cache/memory test on CPU2..
    Starting background cache/memory test on CPU3..
    
    ***CPU 00: Unexpected Machine Check through vector 00000660
    
    EV5 IPRs:
      exc_addr:  00000000 000e2208  exc_sum:     00000000 00000000
      exc_mask:  00000000 00000000  isr:         00000000 00200000
      icsr:      000000c1 44000000  icpe_stat:   00000000 00000000
      dcpe_stat: 00000000 00000000  va:          00000000 0ab50000
      mm_stat:   00000000 00016c11  sc_addr:     ffffff00 0000f40f
      sc_stat:   00000000 00000000  bc_tag_addr: ffffff80 1dcf6fff
      ei_addr:   ffffff00 0ab5000f  ei_stat:     fffffff9 04ffffff
      fill_syn:  00000000 000002fe
    
    IOD: 0 base address: f9e0000000
      WHOAMI:     000008ba PCI_REV:    06008032 ENVIRON:    00000000
      CAP_CTL:    46460ff1 HAE_MEM:    00000000 HAE_IO:     00000000
      INT_CTL:    00000003 INT_REQ:    00c00000 INT_MASK0:  00e70000
      INT_MASK1:  00000000 MC_ERR0:    0ab50000 MC_ERR1:    800e9b00
      CAP_ERR:    b0000000 PCI_ERR:    00000000 MDPA_STAT:  80000000
      MDPA_SYN:   00000000 MDPB_STAT:  40000000 MDPB_SYN:   00000002
    
    IOD: 1 base address: fbe0000000
      WHOAMI:     000008ba PCI_REV:    06000032 ENVIRON:    00000000
      CAP_CTL:    46460ff1 HAE_MEM:    00000000 HAE_IO:     00000000
      INT_CTL:    00000003 INT_REQ:    00c00000 INT_MASK0:  00c00000
      INT_MASK1:  00000000 MC_ERR0:    0ab50000 MC_ERR1:    800e9b00
      CAP_ERR:    b0000000 PCI_ERR:    00000000 MDPA_STAT:  80000000
      MDPA_SYN:   00000000 MDPB_STAT:  40000000 MDPB_SYN:   00000000
    
    Console Crash... Type ;P to view stack contents
    breakpoint at PC 61340 desired, XDELTA not loaded
    
    Process memtest, pcb = 0083C0E0
     pc: 00000000 000E2208  ps: 20000000 00000004
     r2: 00000000 00061DA0  r5: 00000000 0AA00000
     r3: 00000000 001049B0  r6: FFFFFFFF FFFFFFFF
     r4: 00000000 00000000  r7: 00000000 0083B5C0
    
    exception context saved starting at 0083CB40
    
    GPRs:
      0: 00000000 00020000  16: 00000000 0001F81B
      1: 00000000 00015013  17: FFFFFFFF FFFE07E5
      2: 00000000 000E3660  18: 00000000 0AB50128
      3: 00000000 0083B6C0  19: 00000000 00000000
      4: 00000000 00020000  20: 00000000 FFFFFFFD
      5: 00000000 0AA00000  21: 00000000 05C00000
      6: FFFFFFFF FFFFFFFF  22: 00000000 0083CCB0
      7: 00000000 0083B5C0  23: FFFFFFFF FFFFFFFF
      8: 00000000 0083B6C0  24: FFFFFFFF FFFFFFFF
      9: 00000000 01200000  25: 00000000 0AB50120
     10: 00000000 00000100  26: 00000000 00000018
     11: 00000000 0000004C  27: 00000000 00000001
     12: 00000000 00000000  28: FFFFFFFF FFFFFFF2
     13: 00000000 00000008  29: 00000000 0083CCA0
     14: 00000000 00000001  30: 00000000 0083CCA0
     15: 00000000 00000073
    
    dump of active call frames:
    
    PC  =  000E2204
    PD  =  000E3660 (MEMTEST_GRAY)
    FP  =  0083CCA0
    SP  =  0083CCA0
    
    R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R29 saved starting at 0083CCC0
    
    R2  =  000E3380
    R3  =  0083CF08
    R4  =  00000000
    R5  =  0011B8A0
    R6  =  00060480
    R7  =  0083B5C0
    R8  =  0083B6C0
    R9  =  01200000
    R10 =  00000100
    R11 =  0000004C
    R29 =  0083CD20
    
    PC  =  000E1CE8
    PD  =  000E3380 (RAND_BLOCK)
    FP  =  0083CD20
    SP  =  0083CD20
    
    R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R29 saved starting at 0083CD28
    
    R2  =  000E34C0
    R3  =  1EFFFFE0
    R4  =  000E35A0
    R5  =  00000000
    R6  =  0083C0E0
    R7  =  000E3138
    R8  =  00060480
    R9  =  0083B6C0
    R10 =  00000000
    R11 =  0011B8A0
    R12 =  00000000
    R13 =  00000000
    R14 =  00000000
    R15 =  00000000
    R29 =  0083CDB0
    
    PC  =  000E17DC
    PD  =  000E34C0 (MEMTEST)
    FP  =  0083CDB0
    SP  =  0083CDB0
    
    R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R29 saved starting at 0083D220
    
    R2  =  0005A470
    R3  =  0083C0E0
    R4  =  0083C2B0
    R5  =  00000000
    R6  =  00000000
    R7  =  00000000
    R8  =  00000000
    R9  =  00000000
    R10 =  00000000
    R11 =  00000000
    R29 =  0083D280
    
    PC  =  00034288
    PD  =  0005A470 (KRN$_PROCESS)
    FP  =  0083D280
    SP  =  0083D280
    
    R2 R3 R4 R5 R29 saved starting at 0083D288
    
    R2  =  00000000
    R3  =  00000000
    R4  =  00000000
    R5  =  00000000
    R29 =  00000000
    
    breakpoint at PC 61340 desired, XDELTA not loaded
    
    
    Process memtest CPU2: soft error detected, vector 00630
    
      mchk_code:    00000000 00860000
    
    EV5 IPRs:
      ei_addr:   ffffff00 0802400f  ei_stat:     fffffff9 04ffffff
      fill_syn:  00000000 00000086  isr:         00000001 00000000
    
    IOD base address: 0000000000 WHOAMI: 00000002
      PCI_REV:    00000000 MC_ERR0:    00000000 MC_ERR1    00000000
      CAP_ERR:    00000000 MDPA_STAT:  00000000 MDPA_SYN:  00000000
      MDPB_STAT:  00000000 MDPB_SYN:   00000000
    
    
    Process memtest CPU3: soft error detected, vector 00630
    
      mchk_code:    00000000 00860000
    
    EV5 IPRs:
      ei_addr:   ffffff00 0a02400f  ei_stat:     fffffff9 04ffffff
      fill_syn:  00000000 00000012  isr:         00000001 00000000
    
    IOD base address: 0000000000 WHOAMI: 00000003
      PCI_REV:    00000000 MC_ERR0:    00000000 MC_ERR1    00000000
      CAP_ERR:    00000000 MDPA_STAT:  00000000 MDPA_SYN:  00000000
      MDPB_STAT:  00000000 MDPB_SYN:   00000000
    
    
    Process memtest CPU1: soft error detected, vector 00620
    
      mchk_code:    00000000 00860000
    
    EV5 IPRs:
      ei_addr:   ffffff00 0a5e062f  ei_stat:     fffffff0 c4ffffff
      fill_syn:  00000000 00006800  isr:         00000001 00000000
    
    IOD base address: 0000000000 WHOAMI: 00000001
      PCI_REV:    00000000 MC_ERR0:    00000000 MC_ERR1    00000000
      CAP_ERR:    00000000 MDPA_STAT:  00000000 MDPA_SYN:  00000000
      MDPB_STAT:  00000000 MDPB_SYN:   00000000
    
    ***CPU 00: Unexpected Machine Check through vector 00000660
    
    EV5 IPRs:
      exc_addr:  00000000 00061704  exc_sum:     00000000 00000000
      exc_mask:  00000000 00000000  isr:         00000000 00200000
      icsr:      000000c1 44000000  icpe_stat:   00000000 00000000
      dcpe_stat: 00000000 00000000  va:          00000000 000b2590
      mm_stat:   00000000 00005c50  sc_addr:     ffffff00 0000f42f
      sc_stat:   00000000 00000000  bc_tag_addr: ffffff80 0aaf6fff
      ei_addr:   ffffff00 0ab5000f  ei_stat:     fffffff0 04ffffff
      fill_syn:  00000000 000002fe
    
    IOD: 0 base address: f9e0000000
      WHOAMI:     000008ba PCI_REV:    06008032 ENVIRON:    00000000
      CAP_CTL:    46460ff1 HAE_MEM:    00000000 HAE_IO:     00000000
      INT_CTL:    00000003 INT_REQ:    00800000 INT_MASK0:  00e70000
      INT_MASK1:  00000000 MC_ERR0:    08024000 MC_ERR1:    800f9b00
      CAP_ERR:    a1000000 PCI_ERR:    00000000 MDPA_STAT:  c0000000
      MDPA_SYN:   00000000 MDPB_STAT:  40000000 MDPB_SYN:   00000000
    
    IOD: 1 base address: fbe0000000
      WHOAMI:     000008ba PCI_REV:    06000032 ENVIRON:    00000000
      CAP_CTL:    46460ff1 HAE_MEM:    00000000 HAE_IO:     00000000
      INT_CTL:    00000003 INT_REQ:    00800000 INT_MASK0:  00c00000
      INT_MASK1:  00000000 MC_ERR0:    08024000 MC_ERR1:    800f9b00
      CAP_ERR:    a1000000 PCI_ERR:    00000000 MDPA_STAT:  c0000000
      MDPA_SYN:   00000000 MDPB_STAT:  40000000 MDPB_SYN:   00680000
    
    Console Crash... Type ;P to view stack contents
    breakpoint at PC 61340 desired, XDELTA not loaded
    
    Process timer, pcb = 0010AB00
     pc: 00000000 00061704  ps: 30000000 00000004
     r2: 00000000 00061DA0  r5: 00000000 001098E0
     r3: 00000000 001049B0  r6: 00000000 0005B008
     r4: 00000000 00000000  r7: 00000000 00000004
    
    exception context saved starting at 0010B9C0
    
    GPRs:
      0: 00000000 0000001F  16: 00000000 00000000
      1: 00000000 0010AB08  17: 00000000 0001A3F0
      2: 00000000 00058430  18: 00000000 0001A3F0
      3: 00000000 00000000  19: 00000000 001108F0
      4: 00000000 00000000  20: 00000000 0010AB00
      5: 00000000 001098E0  21: 00000000 00000000
      6: 00000000 0005B008  22: 00000000 0004FE90
      7: 00000000 00000004  23: 00000000 0000001F
      8: 00000000 001098E0  24: 00000000 00D1D41A
      9: 00000000 00000001  25: 00000000 00000001
     10: 00000000 00000000  26: 00000000 0002A7E0
     11: 00000000 00000000  27: 00000000 00062330
     12: 00000000 00000000  28: FFFFFFFF FFFFFFF2
     13: 00000000 00000000  29: 00000000 0010BB30
     14: 00000000 00000000  30: 00000000 0010BB30
     15: 00000000 00000000
    
    dump of active call frames:
    
    PC  =  00061700
    PD  =  00058430 (SCHEDULE)
    FP  =  0010BB30
    SP  =  0010BB30
    
    R2 R3 R4 R5 R6 R7 R8 R9 R10 R29 saved starting at 0010BB48
    
    R2  =  0005B008
    R3  =  0001C9E8
    R4  =  0004FE90
    R5  =  000CB735
    R6  =  0001CA50
    R7  =  00000001
    R8  =  0001C9C8
    R9  =  00000000
    R10 =  00000000
    R29 =  0010BBA0
    
    PC  =  00038ACC
    PD  =  0005B008 (KRN$_WAIT)
    FP  =  0010BBA0
    SP  =  0010BBA0
    
    R2 R3 R4 R5 R29 saved starting at 0010BBA8
    
    R2  =  0005BA58
    R3  =  00000001
    R4  =  000F4240
    R5  =  000CB735
    R29 =  0010BC40
    
    PC  =  0003C0B8
    PD  =  0005BA58 (KRN$_TIMER)
    FP  =  0010BC40
    SP  =  0010BBE0
    
    R2 R3 R4 R5 R6 R7 R8 R9 R29 saved starting at 0010BC48
    
    R2  =  0005A470
    R3  =  0010AB00
    R4  =  0010ACD0
    R5  =  00000000
    R6  =  00000000
    R7  =  00000000
    R8  =  00000000
    R9  =  00000000
    R29 =  0010BCA0
    
    PC  =  00034288
    PD  =  0005A470 (KRN$_PROCESS)
    FP  =  0010BCA0
    SP  =  0010BC40
    
    R2 R3 R4 R5 R29 saved starting at 0010BCA8
    
    R2  =  00000000
    R3  =  00000000
    R4  =  00000000
    R5  =  00000000
    R29 =  00000000
    
    breakpoint at PC 61340 desired, XDELTA not loaded
    
    
    Process memtest CPU2: soft error detected, vector 00630
    
      mchk_code:    00000000 00860000
    
    EV5 IPRs:
      ei_addr:   ffffff00 0802400f  ei_stat:     fffffff9 04ffffff
      fill_syn:  00000000 00000086  isr:         00000001 00400000
    
    IOD base address: 0000000000 WHOAMI: 00000002
      PCI_REV:    00000000 MC_ERR0:    00000000 MC_ERR1    00000000
      CAP_ERR:    00000000 MDPA_STAT:  00000000 MDPA_SYN:  00000000
      MDPB_STAT:  00000000 MDPB_SYN:   00000000
    
    
    Process memtest CPU3: soft error detected, vector 00630
    
      mchk_code:    00000000 00860000
    
    EV5 IPRs:
      ei_addr:   ffffff00 0a02400f  ei_stat:     fffffff9 04ffffff
      fill_syn:  00000000 00000012  isr:         00000001 00000000
    
    IOD base address: 0000000000 WHOAMI: 00000003
      PCI_REV:    00000000 MC_ERR0:    00000000 MC_ERR1    00000000
      CAP_ERR:    00000000 MDPA_STAT:  00000000 MDPA_SYN:  00000000
      MDPB_STAT:  00000000 MDPB_SYN:   00000000
    
    
    Process memtest CPU1: soft error detected, vector 00620
    
      mchk_code:    00000000 00860000
    
    EV5 IPRs:
      ei_addr:   ffffff00 0a5e0e2f  ei_stat:     fffffff0 c4ffffff
      fill_syn:  00000000 00006800  isr:         00000001 00000000
    
    IOD base address: 0000000000 WHOAMI: 00000001
      PCI_REV:    00000000 MC_ERR0:    00000000 MC_ERR1    00000000
      CAP_ERR:    00000000 MDPA_STAT:  00000000 MDPA_SYN:  00000000
      MDPB_STAT:  00000000 MDPB_SYN:   00000000
    
    ***CPU 00: Unexpected Machine Check through vector 00000660
    
    EV5 IPRs:
      exc_addr:  00000000 00061704  exc_sum:     00000000 00000000
      exc_mask:  00000000 00000000  isr:         00000000 00200000
      icsr:      000000c1 44000000  icpe_stat:   00000000 00000000
      dcpe_stat: 00000000 00000000  va:          00000000 000b2590
      mm_stat:   00000000 00005c50  sc_addr:     ffffff00 0000f42f
      sc_stat:   00000000 00000000  bc_tag_addr: ffffff80 000f1fff
      ei_addr:   ffffff00 0ab5000f  ei_stat:     fffffff0 04ffffff
      fill_syn:  00000000 000002fe
    
    IOD: 0 base address: f9e0000000
      WHOAMI:     000008ba PCI_REV:    06008032 ENVIRON:    00000000
      CAP_CTL:    46460ff1 HAE_MEM:    00000000 HAE_IO:     00000000
      INT_CTL:    00000003 INT_REQ:    00800000 INT_MASK0:  00e70000
      INT_MASK1:  00000000 MC_ERR0:    0a024180 MC_ERR1:    800fda00
      CAP_ERR:    a1000000 PCI_ERR:    000003fd MDPA_STAT:  c0000000
      MDPA_SYN:   00000000 MDPB_STAT:  40000000 MDPB_SYN:   00680000
    
    IOD: 1 base address: fbe0000000
      WHOAMI:     000008ba PCI_REV:    06000032 ENVIRON:    00000000
      CAP_CTL:    46460ff1 HAE_MEM:    00000000 HAE_IO:     00000000
      INT_CTL:    00000003 INT_REQ:    00800000 INT_MASK0:  00c00000
      INT_MASK1:  00000000 MC_ERR0:    0a024180 MC_ERR1:    800fda00
      CAP_ERR:    a1000000 PCI_ERR:    00000000 MDPA_STAT:  c0000000
      MDPA_SYN:   86988612 MDPB_STAT:  40000000 MDPB_SYN:   00680000
    
    Console Crash... Type ;P to view stack contents
    breakpoint at PC 61340 desired, XDELTA not loaded
    
    Process timer, pcb = 0010AB00
     pc: 00000000 00061704  ps: 30000000 00000004
     r2: 00000000 00061DA0  r5: 00000000 001098E0
     r3: 00000000 001049B0  r6: 00000000 0005B008
     r4: 00000000 00000000  r7: 00000000 00000004
    
    exception context saved starting at 0010B9C0
    
    GPRs:
      0: 00000000 0000001F  16: 00000000 00000000
      1: 00000000 0010AB08  17: 00000000 0001A3F0
      2: 00000000 00058430  18: 00000000 0001A3F0
      3: 00000000 00000000  19: 00000000 001108F0
      4: 00000000 00000000  20: 00000000 0010AB00
      5: 00000000 001098E0  21: 00000000 00000000
      6: 00000000 0005B008  22: 00000000 0004FE90
      7: 00000000 00000004  23: 00000000 0000001F
      8: 00000000 001098E0  24: 00000000 00D1D41A
      9: 00000000 00000001  25: 00000000 00000001
     10: 00000000 00000000  26: 00000000 0002A7E0
     11: 00000000 00000000  27: 00000000 00062330
     12: 00000000 00000000  28: FFFFFFFF FFFFFFF2
     13: 00000000 00000000  29: 00000000 0010BB30
     14: 00000000 00000000  30: 00000000 0010BB30
     15: 00000000 00000000
    
    dump of active call frames:
    
    PC  =  00061700
    PD  =  00058430 (SCHEDULE)
    FP  =  0010BB30
    SP  =  0010BB30
    
    R2 R3 R4 R5 R6 R7 R8 R9 R10 R29 saved starting at 0010BB48
    
    R2  =  0005B008
    R3  =  0001C9E8
    R4  =  0004FE90
    R5  =  000CB735
    R6  =  0001CA50
    R7  =  00000001
    R8  =  0001C9C8
    R9  =  00000000
    R10 =  00000000
    R29 =  0010BBA0
    
    PC  =  00038ACC
    PD  =  0005B008 (KRN$_WAIT)
    FP  =  0010BBA0
    SP  =  0010BBA0
    
    R2 R3 R4 R5 R29 saved starting at 0010BBA8
    
    R2  =  0005BA58
    R3  =  00000001
    R4  =  000F4240
    R5  =  000CB735
    R29 =  0010BC40
    
    PC  =  0003C0B8
    PD  =  0005BA58 (KRN$_TIMER)
    FP  =  0010BC40
    SP  =  0010BBE0
    
    R2 R3 R4 R5 R6 R7 R8 R9 R29 saved starting at 0010BC48
    
    R2  =  0005A470
    R3  =  0010AB00
    R4  =  0010ACD0
    R5  =  00000000
    R6  =  00000000
    R7  =  00000000
    R8  =  00000000
    R9  =  00000000
    R29 =  0010BCA0
    
    PC  =  00034288
    PD  =  0005A470 (KRN$_PROCESS)
    FP  =  0010BCA0
    SP  =  0010BC40
    
    R2 R3 R4 R5 R29 saved starting at 0010BCA8
    
    R2  =  00000000
    R3  =  00000000
    R4  =  00000000
    R5  =  00000000
    R29 =  00000000
    
    breakpoint at PC 61340 desired, XDELTA not loaded
    
    
547.4. by PROXY::ALFORD () Thu Apr 10 1997 09:25, 17 lines
    Phil,
    
    The only recommendation I have is to replace the mother bd
    (54-23803-01) with a rev B07 or B08. There were some changes
    made to improve MC_BUS timing margins. We never saw any failures in
    the lab; we just noticed that the margins were tight when we sped up the
    MC_BUS, so we modified one of the programmable parts. The difference
    between the B07 and B08 is only etch: the B08 has new holes added so
    a power resistor support bracket could be permanently mounted.
    If this does not fix your problem I would then open an IPMT case.
    
    If you have problems getting a rev B07/8 mother board please let me
    know and I will see what I can do.
    
    Regards,
    Bruce
                                                              
547.5. "512Mb EDO mem works" by LOTIMA::P_MORRIS () Fri Apr 11 1997 12:20, 36 lines
    Hi Bruce 
    
    I have some more info which may or may not shed some more light on the
    problem.
    We tried the following:-
    Fitted a 3rd PSU & ran off one old PSU + new PSU, then ran from the other
    old PSU + new PSU, then all 3 PSUs.........still failed
    
    Tried a new CPU in each slot in turn,
    ie: replacing one CPU at a time............still failed
    
    Removed all the PCI options from the machine (clutching at
    straws, I know).......................still failed
    
    Tried V2.0 SRM from V3.7 CD, V3.0 SRM from V3.8 CD & V4.8-3 SRM.
    All 3 versions of SRM fail in the same way.
    
    Removed all Synch memory from the system & fitted 512Mb EDO mem in
    bank 0.........THIS WORKS
    
    So it would seem we have a problem just with the Synch memory...
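    
    For the record, the elimination so far can be written down as a simple
    list of swaps and outcomes. This is a Python sketch only: the entries
    paraphrase what has been reported in .0, .3 and this reply, and the
    structure and function name are made up for illustration.

```python
# Each entry: (what was changed, whether test cpu still failed afterwards).
# Entries paraphrase .0, .3 and this reply; the layout is illustrative.
swaps = [
    ("CPUs cycled between slots", True),
    ("CPU/MEM motherboard replaced", True),
    ("Dataram memory swapped for DEC sync memory", True),
    ("3rd PSU fitted, run in each PSU combination", True),
    ("New CPU tried in each slot in turn", True),
    ("All PCI options removed", True),
    ("SRM V2.0, V3.0 and V4.8-3 tried", True),
    ("Sync memory replaced by 512Mb EDO in bank 0", False),
]


def changes_that_fixed(log):
    """Return the changes after which the test no longer failed."""
    return [change for change, still_failed in log if not still_failed]


print(changes_that_fixed(swaps))
```

    Only the change of memory type altered the result, which is what
    points at the sync memory path rather than at any one module.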
    
    I believe that I am right to say that the AlphaServer 4100 supports
    this config:
    	4 x B3002-AB CPUs (2MB cache) + 4 x 128MB banks of sync memory
    I can't find anywhere that says this isn't supported.
    
    Also, if it is supported, is it still worth trying the rev B07/B08
    mother board?
    Or is it IPMT time?
    
    Regards
    
    Phil
    
    
547.6. by PROXY::ALFORD () Fri Apr 11 1997 13:03, 9 lines
    Phil,
    	I would still recommend swapping out the mother bd (unless the
    	customer wants to keep the EDO's). If you can do this and it fixes
    	the customer problem I will then publish a blitz. Let me know if
    	you decide to change the mother board.
    
    thanks,
    bruce
    
547.7. "only rev B06 available" by LOTIMA::P_MORRIS () Tue Apr 15 1997 11:31, 9 lines
    Bruce
    
    I have checked with our logistics in the UK & found that the only
    rev of 54-23803-01 system motherboard available is rev B06.
    Are you able to help us with obtaining a rev B07/B08 module ??
    
    Regards
    
    Phil
547.8. by PROXY::ALFORD () Wed Apr 16 1997 11:39, 11 lines
    Phil,
    	Sorry I did not look into the notes file yesterday. If you can't get
    	one through the P1 process, then (one time only) I would be willing
    	to upgrade a F.S. spare for you. Keep in mind, this could take
    	several weeks between shipping time and my time. If you want me to
    	do this, then send me a mail on Proxy::alford. We can handle this
    	outside this notes file.
    
    regards,
    bruce
         
547.9. "we'll try a P1" by LOTIMA::P_MORRIS () Thu Apr 17 1997 10:56, 5 lines
    Bruce
    
    OK. we'll try a P1.
    
    Phil
547.10. "Fix is rev B07 motherboard" by LOTIMA::P_MORRIS () Fri May 02 1997 08:46, 11 lines
    We have just managed to get a rev B07 motherboard from Ayr
    Manufacturing. This HAS fixed the problem.
    
    So what are the differences between rev B06 & B07/08 ?
    
    Also this customer is buying in quite a few 4100s now, 5 in the last
    month. They all have synchronous memory in them.
    Are there any plans to make a larger capacity synchronous memory module
    than the 64MB module we currently have?
    
    Phil Morris
547.11. by PROXY::ALFORD () Fri May 02 1997 14:34, 13 lines
    Phil, 
    
    	That's good news. The difference is a new vendor/pal code for the
    	two SEL pals. 
    
    	Also, there are not any plans for any larger sync memories.
    
    	RSE will post a Blitz soon to notify the rest of the field of this
    	potential problem.
    
    bruce