Title: | AlphaServer 1000 (aka Mikasa) |
Moderator: | WRKSYS::HESCH G |
Created: | Mon Nov 14 1994 |
Last Modified: | Thu Jun 05 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 917 |
Total number of notes: | 3293 |
889.0. "AS1000A 5/400 Freeze/crash on boot" by KAOFS::D_ORMAECHEA (Denis Ormaechea... Montreal MCS) Wed Apr 23 1997 22:59
I have the following problems with a newly installed AS1000A 5/400
running OpenVMS 6.2-1H3. They happen 7 times out of 10. I first noticed
that the system was frequently freezing after power-up test .e6., and then
found the others shown below. My first move was to upgrade the firmware with
CD3.9 and run ECU 1.10, but nothing changed. I also removed all PCI options,
without any change. Before I start swapping parts, could somebody take a look
at this?
Thanks in advance !
Denis
***************************************************************************
Problem # 1:
Booting the system :
bus 0, slot 11 -- ewa -- DECchip 21140-AA
ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V4.8-74, built on Feb 5 1997 at 13:39:39
Memory Testing and Configuration Status
128 Meg of System Memory
Bank 0 = 128 Mbytes(32 MB Per SIMM) Starting at 0x00000000
Bank 1 = No Memory Detected
Bank 2 = No Memory Detected
Bank 3 = No Memory Detected
Testing the System
Testing the Disks (read only)
Testing the Network
Change mode to Internal loopback.
Change to Normal Operating Mode.
>>>sho conf
Digital Equipment Corporation
AlphaServer 1000A 5/400
Firmware
SRM Console: V4.8-74
ARC Console: v5.28
PALcode: VMS PALcode V1.19-4, OSF PALcode V1.21-6
Serial Rom: V1.0
Processor
DECchip (tm) 21164A-2 400MHz
Memory
128 Meg of System Memory
Bank 0 = 128 Mbytes(32 MB Per SIMM) Starting at 0x00000000
Bank 1 = No Memory Detected
Bank 2 = No Memory Detected
Bank 3 = No Memory Detected
Slot Option Hose 0, Bus 0, PCI
7 Intel 82375EB Bridge to Bus 1, EISA
8 DECchip 21050-AA Bridge to Bus 2, PCI
11 DECchip 21140-AA ewa0.0.0.11.0 00-00-F8-04-43-3D
Slot Option Hose 0, Bus 1, EISA
Slot Option Hose 0, Bus 2, PCI
0 QLogic ISP1020 pka0.7.0.2000.0 SCSI Bus ID 7
dka0.0.0.2000.0 RZ28M
dka100.1.0.2000.0 RZ29B
dka200.2.0.2000.0 RZ29B
dka300.3.0.2000.0 RZ29B
dka400.4.0.2000.0 RRD45
dka500.5.0.2000.0 RZ29B
dka600.6.0.2000.0 RZ29B
1 NCR 53C810 pkb0.7.0.2001.0 SCSI Bus ID 7
dkb0.0.0.2001.0 RZ29B
dkb100.1.0.2001.0 RZ29B
dkb200.2.0.2001.0 RZ29B
dkb300.3.0.2001.0 RZ29B
dkb400.4.0.2001.0 SEAGATE ST15150N
>>>sho
auto_action HALT
boot_dev dka0.0.0.2000.0
boot_file
boot_osflags 0,0
boot_reset OFF
bootdef_dev dka0.0.0.2000.0
booted_dev
booted_file
booted_osflags
bus_probe_algorithm new
char_set 0
com1_baud 9600
com1_flow SOFTWARE
com1_modem OFF
com2_baud 9600
com2_flow SOFTWARE
com2_modem OFF
console serial
controlp ON
d_bell off
d_cleanup on
d_complete off
d_eop off
d_group field
d_harderr halt
d_loghard on
d_logsoft off
d_oper on
d_passes 1
d_quick off
d_report full
d_runtime 0
d_softerr halt
d_startup off
d_status off
d_trace off
dump_dev
enable_audit ON
ewa0_arp_tries 3
ewa0_bootp_file
ewa0_bootp_server
ewa0_bootp_tries 3
ewa0_def_ginetaddr 0.0.0.0
ewa0_def_inetaddr 0.0.0.0
ewa0_def_inetfile
ewa0_def_sinetaddr 0.0.0.0
ewa0_def_subnetmask 0.0.0.0
ewa0_ginetaddr 0.0.0.0
ewa0_inet_init bootp
ewa0_inetaddr 0.0.0.0
ewa0_inetfile
ewa0_loop_count 2
ewa0_loop_inc d0
ewa0_loop_patt ffffffff
ewa0_loop_size 100
ewa0_lp_msg_node 8
ewa0_mode Twisted-Pair
ewa0_protocols MOP
ewa0_sinetaddr 0.0.0.0
ewa0_tftp_tries 3
fru_table ON
full_powerup_diags ON
i g
j u
kbd_hardware_type PCXAL
language 36
language_name English (American)
license MU
ocp_text
os_type OpenVMS
pal VMS PALcode V1.19-4, OSF PALcode V1.21-6
pci_parity off
pka0_host_id 7
pka0_soft_term on
pkb0_disconnect 1
pkb0_fast 1
pkb0_host_id 7
rcm_answer
rcm_dialout
rcm_init
reset_boot_arg0 0
reset_boot_arg1
reset_boot_arg2
scsi_poll ON
sys_serial_num NI711008ZF
tga_sync_green ff
tt_allow_login 1
tty_dev 0
version V4.8-74 Feb 5 1997 13:39:39
>>>boot
(boot dka0.0.0.2000.0 -flags 0,0)
Building FRU table
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 1004 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1cc000, image_start = 0, image_bytes = 7d800
initializing HWRPB at 2000
initializing page table at 7ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
OpenVMS (TM) Alpha Operating System, Version V6.2-1H3
ff.halt code = 6
double error halt
PC = 145c4
impure area for CPU 0 (at 5200)
5200: 0000000000000001
5208: 0000000000000006
5210: 0000000000000000
5218: 0000000000000000
.
. and more
.
.
fe.fd.fc.fb.fa.f9.f8.breakpoint at PC 128b80 desired, XDELTA not loaded
f7.f6.f5.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 2, slot 0 -- pka -- QLogic ISP1020
bus 2, slot 1 -- pkb -- NCR 53C810
bus 0, slot 11 -- ewa -- DECchip 21140-AA
ed.ec.eb.....ea.e9.e8.e7.e6.
****************************************************************************
Problem # 2:
Shutdown & reboot of system.
Auditable event: Audit server shutting down
Event time: 24-APR-2000 02:13:57.58
PID: 20200111
Username: SYSTEM
%SHUTDOWN-I-REMOVE, all installed images will now be removed
%SHUTDOWN-I-DISMOUNT, all volumes will now be dismounted
%%%%%%%%%%% OPCOM 24-APR-2000 02:13:59.45 %%%%%%%%%%%
Message from user SYSTEM on CSL01
_CSL01$OPA0:, CSL01 shutdown was requested by the operator.
%%%%%%%%%%% OPCOM 24-APR-2000 02:13:59.46 %%%%%%%%%%%
Logfile was closed by operator _CSL01$OPA0:
Logfile was CSL01::SYS$SYSROOT:[SYSMGR]OPERATOR.LOG;16
%%%%%%%%%%% OPCOM 24-APR-2000 02:13:59.50 %%%%%%%%%%%
Operator _CSL01$OPA0: has been disabled, username SYSTEM
halted CPU 0
halt code = 5
HALT instruction executed
PC = ffffffff8004e1dc
CPU 0 booting
(boot dka0.0.0.2000.0 -flags 0,0)
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 1004 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1cc000, image_start = 0, image_bytes = 7d800
initializing HWRPB at 2000
initializing page table at 7ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
Timeout Reset Error or CFAIL_H/no CACK_H occured.
Processor Machine Check Through Vector 00000670
logout frame address 0x6060 code 0x98
IPRs:
EXC_ADDR:000000000012913C EXC_SUM: 0000000000000000 EXC_MASK:000000000012913C
ISR: 0000000000400000 ICSR: 000000414C020000 IC_PERR: 0000000000002000
DC_PERR: 0000000000000000 VA: 00000000000EF144 MM_STAT: 0000000000005C50
SC_ADDR: FFFFFF00000145EF SC_STAT: 0000000000000000 BCTAG_AD:FFFFFF80002D6FFF
EI_ADDR: FFFFFFDFFBFFFFFF FILL_SYN:00000000000024FC EI_STAT: FFFFFFF005FFFFFF
LD_LOCK: FFFFFF00000ED8FF
CPU CSRs:
CIA_REV: 00000102 PCI_LAT: 0000FF00 CIA_CTRL: 2104EC37 CIA_CNFG: 00000021
HAE_MEM: 8000A000 HAE_IO: 02000000 CFG: 00000001 EIR0: 00226100
EIR1: 00200800 CIA_ERR: 00000000 ERR_STAT: 0000011A ERR_MASK: 00000F93
ECC_SYN: 00000000 MEM_ST0: FFE00000 MEM_ST1: 10100000
PCI_ERR0: 02010006 PCI_ERR1: 40008840 PCI_ERR2: 000003FD
MEMORY BASE ADDRESS CSRs:
MCR: 20010000
MBA0: 00000033 MBA2: 00000000 MBA4: 00000000 MBA6: 00000000
MBA8: 00040033 MBAA: 00000000 MBAC: 00000000 MBAE: 00000000
TMG0: 6038C140 TMG1: 6038C140 TMG2: 6038C140
Process poll, pcb = 0002A260
pc: 000000000012913C ps: 0000000000001F00
r2: 0000000000129550 r5: 0000000000000001
r3: 00000000000269C8 r6: 0000000000000000
r4: 0000000000000060 r7: 0000000000141F1C
exception context saved starting at 0002B1C0
GPRs:
0: 00000000 0000001F 16: 00000000 0000001F
1: 00000000 00000001 17: 00000085 80000C20
2: 00000000 0010BB70 18: 00000000 03FFFFFF
3: 00000000 00000000 19: 00000000 00000000
4: 00000000 00000000 20: 00000000 00000001
5: 00000000 00000001 21: 00000000 000381C4
6: 00000000 00000000 22: 00000085 80007FA0
7: 00000000 00141F1C 23: FFFFFFFF FFB060FF
8: 00000000 00000000 24: 00000000 000383D8
9: 00000000 00000000 25: 00000000 00000001
10: 00000000 00000000 26: 00000000 0009A0B4
11: 00000000 00000000 27: 00000000 00129EC0
12: 00000000 00000000 28: 00000000 FFB0FFFF
13: 00000000 00000000 29: 00000000 0002B300
14: 00000000 00000000 30: 00000000 0002B300
15: 00000000 00000000
dump of active call frames:
PC = 00129138
PD = 0010BB70
FP = 0002B300
SP = 0002B300
R2 R3 R29 saved starting at 0002B308
R2 = 000003FD
R3 = 000FF648
R29 = 0002B330
PC = 0009A128
PD = 0010BC38
FP = 0002B330
SP = 0002B330
R2 R29 saved starting at 0002B338
R2 = 00110ED0
R29 = 0002B350
PC = 000A45F0
PD = 00110ED0
FP = 0002B350
SP = 0002B350
R2 R3 R29 saved starting at 0002B358
R2 = 00102C30
R3 = 00141E80
R29 = 0002B380
PC = 000682E4
PD = 00102C30
FP = 0002B380
SP = 0002B380
R2 R3 R4 R5 R6 R29 saved starting at 0002B388
R2 = 00104058
R3 = 0002A260
R4 = 00000006
R5 = 00000002
R6 = 000385A8
R29 = 0002B3C0
PC = 0005DDAC
PD = 00104058
FP = 0002B3C0
SP = 0002B3C0
R2 R3 R4 R5 R6 R7 R29 saved starting at 0002B3C8
R2 = 00104380
R3 = 0002A260
R4 = 0002A430
R5 = 00000000
R6 = 00000000
R7 = 00000000
R29 = 0002B410
PC = 0005971C
PD = 00104380
FP = 0002B410
SP = 0002B410
R2 R3 R4 R29 saved starting at 0002B418
R2 = 00000000
R3 = 00000000
R4 = 00000000
R29 = 00000000
OpenVMS (TM) Alpha Operating System, Version V6.2-1H3
**** OpenVMS (TM) Alpha Operating System V6.2-1H3 - BUGCHECK ****
** Code=00000215: MACHINECHK, Machine check while in kernel mode
** Crash CPU: 00 Primary CPU: 00 Active CPUs: 00000001
** Current Process = SWAPPER
** Image Name =
**** Starting Memory Dump...
..............................
**** Global page table not in memory - no global pages dumped
...Complete ****
halted CPU 0
halt code = 5
HALT instruction executed
PC = ffffffff8004e1dc
CPU 0 booting
(boot dka0.0.0.2000.0 -flags 0,0)
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 1004 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1cc000, image_start = 0, image_bytes = 7d800
initializing HWRPB at 2000
initializing page table at 7ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
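
When comparing several captures like the ones above, it can help to pull out just the halt records. The following is a minimal sketch in Python, assuming the console output has been saved as plain text; the function name `find_halts` and the sample excerpt layout are illustrative and not part of any DEC tool. It only recognizes the "halt code = N" form seen in this note, followed by a reason line and a "PC = ..." line.

```python
import re

# Excerpt copied from the console output in this note.
CONSOLE_LOG = """\
OpenVMS (TM) Alpha Operating System, Version V6.2-1H3
ff.halt code = 6
double error halt
PC = 145c4
halted CPU 0
halt code = 5
HALT instruction executed
PC = ffffffff8004e1dc
"""

HALT_RE = re.compile(r"halt code = (\d+)")
PC_RE = re.compile(r"^PC = ([0-9a-fA-F]+)$")


def find_halts(log_text):
    """Return a list of (code, reason, pc) tuples found in a console log.

    Assumes the reason text is on the line after "halt code = N" and the
    PC is on the line after that, as in the captures in this note.
    """
    lines = [line.strip() for line in log_text.splitlines()]
    halts = []
    for i, line in enumerate(lines):
        match = HALT_RE.search(line)
        if not match:
            continue
        code = int(match.group(1))
        reason = lines[i + 1] if i + 1 < len(lines) else ""
        pc = None
        if i + 2 < len(lines):
            pc_match = PC_RE.match(lines[i + 2])
            if pc_match:
                pc = int(pc_match.group(1), 16)
        halts.append((code, reason, pc))
    return halts


halts = find_halts(CONSOLE_LOG)
for code, reason, pc in halts:
    print(code, reason, hex(pc) if pc is not None else "?")
```

Run against the excerpt, this separates the double error halt during boot (code 6) from the ordinary HALT at the end of shutdown (code 5), which makes it easier to count how often each one shows up across repeated power cycles.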
T.R | Title | User | Personal Name | Date | Lines |
---|
889.1 | Best Bet CPU Module | WRKSYS::DENNING | | Tue Apr 29 1997 11:44 | 3 |
| Looks like a problem with the CPU module.
Don
|
889.2 | Final fix ! | KAOFS::ras022p01.mqo.dec.com::D_ORMAECHEA | | Wed Apr 30 1997 22:34 | 14 |
|
The problem was still there after replacing the CPU module and the motherboard. I
found that the system started to work when I disconnected the 16-bit SCSI cable from the
motherboard. I removed all 6 drives from the StorageWorks backplane and reconnected the
cable. Then I added drives one by one; when I got to drive #6, the problem came back!
.
The final fix was to replace a marginal power supply. The load of the RZ29Bs was probably
causing the +3 V supply for the Alpha chip to go out of spec.
That one was a crazy one!!!
Denis
|
889.3 | reformatted to 80 columns | UTOPIE::OETTL | hide bug until worst time | Fri May 02 1997 07:37 | 20 |
| <<< Note 889.2 by KAOFS::ras022p01.mqo.dec.com::D_ORMAECHEA >>>
-< Final fix ! >-
The problem was still there after replacing the CPU module and the motherboard.
I found that the system started to work when I disconnected the 16-bit SCSI
cable from the motherboard. I removed all 6 drives from the StorageWorks
backplane and reconnected the cable. Then I added drives one by one; when I
got to drive #6, the problem came back!
.
The final fix was to replace a marginal power supply. The load of the RZ29Bs
was probably causing the +3 V supply for the Alpha chip to go out of spec.
That one was a crazy one!!!
Denis
|
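
The isolation procedure described in 889.2 (empty the backplane, then add drives one at a time until the fault returns) can be sketched as a small loop. This is only an illustration in Python: `first_failing_count` and `simulated_boot` are hypothetical names, and the load units and supply limit in the simulation are invented, not measurements from this system. On real hardware, `boots_ok` would be a person power-cycling the box and watching the console.

```python
def first_failing_count(drives, boots_ok):
    """Add drives one at a time and report when the fault first appears.

    Returns the number of drives installed when boots_ok() first returned
    False, or None if every configuration booted.
    """
    installed = []
    for drive in drives:
        installed.append(drive)
        if not boots_ok(installed):
            return len(installed)
    return None


# Hypothetical stand-in for "power up and see whether the system boots".
# The per-drive load and the supply limit are made-up illustration values.
def simulated_boot(installed, load_per_drive=1.0, supply_limit=5.5):
    return len(installed) * load_per_drive <= supply_limit


drives = ["drive1", "drive2", "drive3", "drive4", "drive5", "drive6"]

forward = first_failing_count(drives, simulated_boot)
backward = first_failing_count(list(reversed(drives)), simulated_boot)
print(forward, backward)
```

In this simulation the fault appears at the sixth drive whichever order the drives go in. On real hardware, repeating the test in a different order is a cheap way to tell a single bad drive (the failure follows that drive) from a shared resource such as the power supply (the failure follows the count).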