
Conference wrksys::mikasa

Title: AlphaServer 1000 (aka Mikasa)
Moderator: WRKSYS::HESCHG
Created: Mon Nov 14 1994
Last Modified: Thu Jun 05 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 917
Total number of notes: 3293

889.0. "AS1000A 5/400 Freeze/crash on boot" by KAOFS::D_ORMAECHEA (Denis Ormaechea... Montreal MCS) Wed Apr 23 1997 22:59

	I have the following problems with a newly installed AS1000A 5/400
running OpenVMS 6.2-1H3. They happen about 7 times out of 10. I first noticed
that the system was frequently freezing after power-up test .e6., and then
found the other failures shown below. My first move was to upgrade the firmware
with CD3.9 and run ECU 1.10, but nothing changed. I also removed all the PCI
options, with no change. Before I start swapping parts, could somebody take a
look at this?

Thanks in advance!

Denis

***************************************************************************


Problem # 1:

Booting the system :

bus 0, slot 11 -- ewa -- DECchip 21140-AA
ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V4.8-74, built on Feb  5 1997 at 13:39:39
Memory Testing and Configuration Status
     128 Meg of System Memory
     Bank 0 = 128 Mbytes(32 MB Per SIMM) Starting at 0x00000000
     Bank 1 = No Memory Detected 
     Bank 2 = No Memory Detected 
     Bank 3 = No Memory Detected 

Testing the System
Testing the Disks (read only)
Testing the Network
Change mode to Internal loopback.
Change to Normal Operating Mode.
>>>sho conf
                        Digital Equipment Corporation
                           AlphaServer 1000A 5/400

Firmware
SRM Console:	V4.8-74
ARC Console:	v5.28
PALcode:	VMS PALcode V1.19-4, OSF PALcode V1.21-6
Serial Rom:	V1.0

Processor
DECchip (tm) 21164A-2	400MHz

Memory
     128 Meg of System Memory
     Bank 0 = 128 Mbytes(32 MB Per SIMM) Starting at 0x00000000
     Bank 1 = No Memory Detected 
     Bank 2 = No Memory Detected 
     Bank 3 = No Memory Detected 


 Slot	Option			Hose 0, Bus 0, PCI
   7	Intel 82375EB       	                    	Bridge to Bus 1, EISA
   8	DECchip 21050-AA    	                    	Bridge to Bus 2, PCI
  11	DECchip 21140-AA    	ewa0.0.0.11.0       	00-00-F8-04-43-3D

 Slot	Option			Hose 0, Bus 1, EISA

 Slot	Option			Hose 0, Bus 2, PCI
   0	QLogic ISP1020      	pka0.7.0.2000.0     	SCSI Bus ID 7
				dka0.0.0.2000.0     	RZ28M
				dka100.1.0.2000.0   	RZ29B
				dka200.2.0.2000.0   	RZ29B
				dka300.3.0.2000.0   	RZ29B
				dka400.4.0.2000.0   	RRD45
				dka500.5.0.2000.0   	RZ29B
				dka600.6.0.2000.0   	RZ29B
   1	NCR 53C810          	pkb0.7.0.2001.0     	SCSI Bus ID 7
				dkb0.0.0.2001.0     	RZ29B
				dkb100.1.0.2001.0   	RZ29B
				dkb200.2.0.2001.0   	RZ29B
				dkb300.3.0.2001.0   	RZ29B
				dkb400.4.0.2001.0   	SEAGATE ST15150N
>>>sho
auto_action         	HALT            
boot_dev            	dka0.0.0.2000.0 
boot_file           	                
boot_osflags        	0,0             
boot_reset          	OFF             
bootdef_dev         	dka0.0.0.2000.0 
booted_dev          	                
booted_file         	                
booted_osflags      	                
bus_probe_algorithm 	new             
char_set            	0
com1_baud           	9600            
com1_flow           	SOFTWARE        
com1_modem          	OFF             
com2_baud           	9600            
com2_flow           	SOFTWARE        
com2_modem          	OFF             
console             	serial          
controlp            	ON              
d_bell              	off             
d_cleanup           	on              
d_complete          	off             
d_eop               	off             
d_group             	field           
d_harderr           	halt            
d_loghard           	on              
d_logsoft           	off             
d_oper              	on              
d_passes            	1
d_quick             	off             
d_report            	full            
d_runtime           	0
d_softerr           	halt            
d_startup           	off             
d_status            	off             
d_trace             	off             
dump_dev            	                
enable_audit        	ON              
ewa0_arp_tries      	3
ewa0_bootp_file     	                
ewa0_bootp_server   	                
ewa0_bootp_tries    	3
ewa0_def_ginetaddr  	0.0.0.0         
ewa0_def_inetaddr   	0.0.0.0         
ewa0_def_inetfile   	                
ewa0_def_sinetaddr  	0.0.0.0         
ewa0_def_subnetmask 	0.0.0.0         
ewa0_ginetaddr      	0.0.0.0         
ewa0_inet_init      	bootp           
ewa0_inetaddr       	0.0.0.0         
ewa0_inetfile       	                
ewa0_loop_count     	2
ewa0_loop_inc       	d0
ewa0_loop_patt      	ffffffff
ewa0_loop_size      	100
ewa0_lp_msg_node    	8
ewa0_mode           	Twisted-Pair    
ewa0_protocols      	MOP             
ewa0_sinetaddr      	0.0.0.0         
ewa0_tftp_tries     	3
fru_table           	ON              
full_powerup_diags  	ON              
i                   	g               
j                   	u               
kbd_hardware_type   	PCXAL           
language            	36
language_name       	English (American)
license             	MU              
ocp_text            	                
os_type             	OpenVMS         
pal                 	VMS PALcode V1.19-4, OSF PALcode V1.21-6
pci_parity          	off             
pka0_host_id        	7
pka0_soft_term      	on              
pkb0_disconnect     	1
pkb0_fast           	1
pkb0_host_id        	7
rcm_answer          	                
rcm_dialout         	                
rcm_init            	                
reset_boot_arg0     	0               
reset_boot_arg1     	                
reset_boot_arg2     	                
scsi_poll           	ON              
sys_serial_num      	NI711008ZF      
tga_sync_green      	ff
tt_allow_login      	1
tty_dev             	0               
version             	V4.8-74 Feb  5 1997 13:39:39

>>>boot
(boot dka0.0.0.2000.0 -flags 0,0)
Building FRU table
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 1004 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1cc000, image_start = 0, image_bytes = 7d800
initializing HWRPB at 2000
initializing page table at 7ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code





    OpenVMS (TM) Alpha Operating System, Version V6.2-1H3





ff.halt code = 6
double error halt
PC = 145c4           
impure area for CPU 0 (at 5200)
5200: 0000000000000001
5208: 0000000000000006
5210: 0000000000000000
5218: 0000000000000000
.
.    and more
.
.
fe.fd.fc.fb.fa.f9.f8.breakpoint at PC 128b80 desired, XDELTA not loaded
f7.f6.f5.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 2, slot  0 -- pka -- QLogic ISP1020
bus 2, slot  1 -- pkb -- NCR 53C810
bus 0, slot 11 -- ewa -- DECchip 21140-AA
ed.ec.eb.....ea.e9.e8.e7.e6.
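
A rough sketch of how the hang point could be picked out of a captured console log, in case it helps when comparing runs. It assumes only what the two captures above show: power-up codes are two hex digits followed by a dot, a complete pass ends at e0, and the hung pass stops at e6. The name `last_powerup_code` is made up for this example; it is not a console command.

```python
import re

# Two hex digits followed by a dot, e.g. "ed." or "e6."
# (format taken from the console captures above).
_CODE = re.compile(r"\b([0-9a-f]{2})\.")

def last_powerup_code(line: str):
    """Return the last power-up code printed on a console line, or None."""
    codes = _CODE.findall(line.lower())
    return codes[-1] if codes else None

good = "ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0."
hung = "ed.ec.eb.....ea.e9.e8.e7.e6."

for label, line in (("good", good), ("hung", hung)):
    code = last_powerup_code(line)
    # In the captures above a complete pass ends at e0;
    # any other final code means the countdown stopped early.
    print(label, code, "complete" if code == "e0" else "stopped early")
```

Logging the final code for each of the failing boots would show whether the freeze always lands on the same test.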



****************************************************************************

Problem # 2:

Shutdown & reboot of system.


Auditable event:          Audit server shutting down
Event time:               24-APR-2000 02:13:57.58
PID:                      20200111        
Username:                 SYSTEM          

%SHUTDOWN-I-REMOVE, all installed images will now be removed
%SHUTDOWN-I-DISMOUNT, all volumes will now be dismounted
%%%%%%%%%%%  OPCOM  24-APR-2000 02:13:59.45  %%%%%%%%%%%
Message from user SYSTEM on CSL01
_CSL01$OPA0:, CSL01 shutdown was requested by the operator.

%%%%%%%%%%%  OPCOM  24-APR-2000 02:13:59.46  %%%%%%%%%%%
Logfile was closed by operator _CSL01$OPA0:
Logfile was CSL01::SYS$SYSROOT:[SYSMGR]OPERATOR.LOG;16

%%%%%%%%%%%  OPCOM  24-APR-2000 02:13:59.50  %%%%%%%%%%%
Operator _CSL01$OPA0: has been disabled, username SYSTEM

halted CPU 0

halt code = 5
HALT instruction executed
PC = ffffffff8004e1dc

CPU 0 booting

(boot dka0.0.0.2000.0 -flags 0,0)
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 1004 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1cc000, image_start = 0, image_bytes = 7d800
initializing HWRPB at 2000
initializing page table at 7ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

Timeout Reset Error or CFAIL_H/no CACK_H occured. 

Processor Machine Check Through Vector 00000670
logout frame address 0x6060 code 0x98

IPRs:
EXC_ADDR:000000000012913C  EXC_SUM: 0000000000000000  EXC_MASK:000000000012913C
ISR:     0000000000400000  ICSR:    000000414C020000  IC_PERR: 0000000000002000
DC_PERR: 0000000000000000  VA:      00000000000EF144  MM_STAT: 0000000000005C50
SC_ADDR: FFFFFF00000145EF  SC_STAT: 0000000000000000  BCTAG_AD:FFFFFF80002D6FFF
EI_ADDR: FFFFFFDFFBFFFFFF  FILL_SYN:00000000000024FC  EI_STAT: FFFFFFF005FFFFFF
LD_LOCK: FFFFFF00000ED8FF

CPU CSRs:
CIA_REV:  00000102  PCI_LAT:  0000FF00  CIA_CTRL: 2104EC37  CIA_CNFG: 00000021
HAE_MEM:  8000A000  HAE_IO:   02000000  CFG:      00000001  EIR0:     00226100
EIR1:     00200800  CIA_ERR:  00000000  ERR_STAT: 0000011A  ERR_MASK: 00000F93
ECC_SYN:  00000000  MEM_ST0:  FFE00000  MEM_ST1:  10100000
PCI_ERR0: 02010006  PCI_ERR1: 40008840  PCI_ERR2: 000003FD

MEMORY BASE ADDRESS CSRs:
MCR:  20010000
MBA0: 00000033  MBA2: 00000000  MBA4: 00000000  MBA6: 00000000
MBA8: 00040033  MBAA: 00000000  MBAC: 00000000  MBAE: 00000000
TMG0: 6038C140  TMG1: 6038C140  TMG2: 6038C140

Process poll, pcb = 0002A260
 pc: 000000000012913C  ps: 0000000000001F00
 r2: 0000000000129550  r5: 0000000000000001
 r3: 00000000000269C8  r6: 0000000000000000
 r4: 0000000000000060  r7: 0000000000141F1C

exception context saved starting at 0002B1C0

GPRs:
  0: 00000000 0000001F  16: 00000000 0000001F
  1: 00000000 00000001  17: 00000085 80000C20
  2: 00000000 0010BB70  18: 00000000 03FFFFFF
  3: 00000000 00000000  19: 00000000 00000000
  4: 00000000 00000000  20: 00000000 00000001
  5: 00000000 00000001  21: 00000000 000381C4
  6: 00000000 00000000  22: 00000085 80007FA0
  7: 00000000 00141F1C  23: FFFFFFFF FFB060FF
  8: 00000000 00000000  24: 00000000 000383D8
  9: 00000000 00000000  25: 00000000 00000001
 10: 00000000 00000000  26: 00000000 0009A0B4
 11: 00000000 00000000  27: 00000000 00129EC0
 12: 00000000 00000000  28: 00000000 FFB0FFFF
 13: 00000000 00000000  29: 00000000 0002B300
 14: 00000000 00000000  30: 00000000 0002B300
 15: 00000000 00000000

dump of active call frames:

PC  =  00129138
PD  =  0010BB70
FP  =  0002B300
SP  =  0002B300

R2 R3 R29 saved starting at 0002B308

R2  =  000003FD
R3  =  000FF648
R29 =  0002B330

PC  =  0009A128
PD  =  0010BC38
FP  =  0002B330
SP  =  0002B330

R2 R29 saved starting at 0002B338

R2  =  00110ED0
R29 =  0002B350

PC  =  000A45F0
PD  =  00110ED0
FP  =  0002B350
SP  =  0002B350

R2 R3 R29 saved starting at 0002B358

R2  =  00102C30
R3  =  00141E80
R29 =  0002B380

PC  =  000682E4
PD  =  00102C30
FP  =  0002B380
SP  =  0002B380

R2 R3 R4 R5 R6 R29 saved starting at 0002B388

R2  =  00104058
R3  =  0002A260
R4  =  00000006
R5  =  00000002
R6  =  000385A8
R29 =  0002B3C0

PC  =  0005DDAC
PD  =  00104058
FP  =  0002B3C0
SP  =  0002B3C0

R2 R3 R4 R5 R6 R7 R29 saved starting at 0002B3C8

R2  =  00104380
R3  =  0002A260
R4  =  0002A430
R5  =  00000000
R6  =  00000000
R7  =  00000000
R29 =  0002B410

PC  =  0005971C
PD  =  00104380
FP  =  0002B410
SP  =  0002B410

R2 R3 R4 R29 saved starting at 0002B418

R2  =  00000000
R3  =  00000000
R4  =  00000000
R29 =  00000000






    OpenVMS (TM) Alpha Operating System, Version V6.2-1H3














**** OpenVMS (TM) Alpha Operating System V6.2-1H3 - BUGCHECK ****


** Code=00000215: MACHINECHK, Machine check while in kernel mode


** Crash CPU: 00    Primary CPU: 00    Active CPUs: 00000001


** Current Process = SWAPPER


** Image Name = 


**** Starting Memory Dump...


..............................


**** Global page table not in memory - no global pages dumped


...Complete ****



halted CPU 0

halt code = 5
HALT instruction executed
PC = ffffffff8004e1dc

CPU 0 booting

(boot dka0.0.0.2000.0 -flags 0,0)
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 1004 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1cc000, image_start = 0, image_bytes = 7d800
initializing HWRPB at 2000
initializing page table at 7ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
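
For anyone matching the drives in the `sho conf` listing in this note to SCSI IDs and slots, here is a small sketch that splits an SRM device name into its fields. The field order (unit, node, channel, logical slot, hose) is assumed from the usual SRM naming convention, and the split of the logical slot number into bus*1000 + slot is inferred only from this listing (2000 = bus 2 slot 0, 2001 = bus 2 slot 1, 11 = bus 0 slot 11). Treat both as assumptions. `SrmDevice` and `parse_srm_device` are names invented for this example.

```python
import re
from typing import NamedTuple

class SrmDevice(NamedTuple):
    driver: str    # e.g. "dk" (SCSI disk), "pk" (SCSI adapter), "ew" (Ethernet)
    adapter: str   # adapter letter: a, b, ...
    unit: int      # unit number, e.g. 500
    node: int      # bus node number (the SCSI ID for dk devices)
    channel: int
    bus: int       # ASSUMPTION: logical slot // 1000, inferred from the listing
    slot: int      # ASSUMPTION: logical slot % 1000, inferred from the listing
    hose: int

_NAME = re.compile(r"^([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)$")

def parse_srm_device(name: str) -> SrmDevice:
    """Split a name such as 'dka500.5.0.2000.0' into its fields."""
    m = _NAME.match(name.strip().lower())
    if m is None:
        raise ValueError(f"not an SRM-style device name: {name!r}")
    driver, adapter, unit, node, channel, lslot, hose = m.groups()
    bus, slot = divmod(int(lslot), 1000)
    return SrmDevice(driver, adapter, int(unit), int(node),
                     int(channel), bus, slot, int(hose))

for name in ("dka500.5.0.2000.0", "dkb400.4.0.2001.0", "ewa0.0.0.11.0"):
    print(name, "->", parse_srm_device(name))
```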

889.1. "Best Bet CPU Module" by WRKSYS::DENNING Tue Apr 29 1997 11:44

    Looks like a problem with the CPU module.
    
    Don

889.2. "Final fix !" by KAOFS::ras022p01.mqo.dec.com::D_ORMAECHEA Wed Apr 30 1997 22:34

	The problem was still there after replacing the CPU module and the
motherboard. I found that the system started to work when I disconnected the
16-bit SCSI cable from the motherboard. I removed all 6 drives from the
StorageWorks backplane and reconnected the cable, then added the drives back
one at a time. When I got to drive #6, the problem came back!

	The final fix was to replace a marginal power supply. The load of the
RZ29Bs was probably pulling it down far enough that the +3 V for the Alpha
chip went out of spec.

	That one was a crazy one !!!


Denis
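
With a fault that showed up about 7 boots out of 10 (see .0), one clean boot after a part swap does not prove much. Below is a quick sketch of the arithmetic, assuming each boot fails independently with the same probability. That is a simplification, since a load-dependent fault like this one may not behave that way. `clean_boots_needed` is a made-up helper name.

```python
def clean_boots_needed(p_fail: float, max_chance: float) -> int:
    """Smallest n such that n clean boots in a row would happen by luck,
    on a system that is still faulty, with probability <= max_chance.

    ASSUMPTION: boots are independent and each fails with probability p_fail.
    """
    if not 0.0 < p_fail <= 1.0:
        raise ValueError("p_fail must be in (0, 1]")
    if not 0.0 < max_chance < 1.0:
        raise ValueError("max_chance must be in (0, 1)")
    n, chance = 0, 1.0
    while chance > max_chance:
        n += 1
        chance *= 1.0 - p_fail   # chance of yet another lucky clean boot
    return n

# 7 failures out of 10 boots, and we accept a 1% risk of being fooled.
print(clean_boots_needed(0.7, 0.01))   # -> 4
```

With a 70% failure rate, four clean boots in a row would happen by luck less than 1% of the time (0.3^4 = 0.0081), so a handful of consecutive good boots is a reasonable check after each swap.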

889.3. "reformatted to 80 columns" by UTOPIE::OETTL (hide bug until worst time) Fri May 02 1997 07:37

         <<< Note 889.2 by KAOFS::ras022p01.mqo.dec.com::D_ORMAECHEA >>>
                                -< Final fix ! >-


The problem was still there after replacing the CPU module and the motherboard.
I found that the system started to work when I disconnected the 16-bit SCSI
cable from the motherboard. I removed all 6 drives from the StorageWorks
backplane and reconnected the cable, then added the drives back one at a time.
When I got to drive #6, the problem came back!

The final fix was to replace a marginal power supply. The load of the RZ29Bs
was probably pulling it down far enough that the +3 V for the Alpha chip went
out of spec.

	That one was a crazy one !!!


Denis