| Title: | Alpha Workstation Conference |
| Notice: | See note 1.* for conference notices |
| Moderator: | WRKSYS::HOUSE |
| Created: | Wed Sep 07 1994 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1996 |
| Total number of notes: | 9122 |
Hello,
I have a customer who recently upgraded his AlphaStation 400 to SRM console
V6.3-4, as shipped on the V3.8 firmware CD. When he tried to reboot the system, the
AlphaStation hung after a machine check. He reinitialized the system and it
rebooted OK. From processing the machine check, it appears to be a PCI/ISA
response problem. The following STARS article describes a very similar problem:
{Elev} Boot/Shutdown issues VMS Alpha 7.0,7.1 with 3.8,3.7 CD firmware
Can anyone add anything, and will downgrading the firmware fix the problem?
Regards,
Norm Pettet
Customer trouble statement follows.
This box has recently been upgraded to OpenVMS V7.1 and Firmware Update V3.8.
We had the following scenario:
Controlled shutdown
Cycled power (power was off for ~5 minutes)
Booted system - crashed and hung during reboot.
Reset machine from front panel
Booted again from >>> was OK this time.
No valid crash dump written to dump file, or anything to error log.
Console log (via LAT port) follows:
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.
ef.df.ee.ed.ec.f4.eb.....ea.e9.e8.e7.e6.e5.
V6.3-4, built on Nov 20 1996 at 09:41:23
>>>b
(boot dka300.3.0.6.0 -flags 0,0)
block 0 of dka300.3.0.6.0 is a valid boot block
reading 904 blocks from dka300.3.0.6.0
bootstrap code read in
base = 1f2000, image_start = 0, image_bytes = 71000
initializing HWRPB at 2000
initializing page table at 1e4000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
OpenVMS (TM) Alpha Operating System, Version V7.1
access violation fault
PCB = 000FA340 (idle)
PC = 0006C0F0
VA = 20000008304D
exception context saved starting at 000FB240
GPRs:
0: 00002000 00083089 16: 00000000 000FB390
1: 00000000 000001F4 17: 00000000 00006000
2: 00000000 000D6048 18: 00000000 00038480
3: 00000000 0000000A 19: 00000000 000C27C0
4: 00000000 00038B08 20: 00000000 000C27C0
5: 00000000 00038B08 21: 00000000 00000000
6: 00000000 00004E04 22: 00000000 000FB388
7: 00000000 0000000A 23: 00000000 00000001
8: 00000000 000F5160 24: 00657174 00000000
9: 00000000 000F5168 25: 00000000 00000000
10: 00000000 000BD6C4 26: 00000000 0006C0D0
11: 00000000 00000001 27: 00000000 0006C748
12: 00000000 000FA340 28: 00002000 00083089
13: 00000000 00000010 29: 00000000 000FB380
14: 00000000 00000011 30: 00000000 000FB380
15: 00000000 00000000
dump of active call frames:
PC = 0006C0F0
PD = 000D6048
FP = 000FB380
SP = 000FB380
bad PD; KIND = 0
breakpoint at PC f53d0 desired, XDELTA not loaded
Avanti System Machine Check Through Vector 00000660
logout frame address 0x6048 code 0x100000205
IPRs:
EXC_ADD:000000000004BB52 ICCSR: 000044F800000004 HIER: 0000000000000080
HIRR: 0000000000000040 MM_CSR: 0000000000005140 DC_STAT:0000000000000003
DC_ADDR:00000007FFFFFFFF BIU_STAT:0000000000002041 BIU_ADD:00000003F4C00000
FILL_SY:0000000000000000 FILL_ADD:0000000000011250 VA: 00000000000061D0
EXC_SUM:4672041345C01C8E BC_TAG: 428020010A403050
EDSR (Comanche): 6d8d2140-->
DCSR ( Epic): 8018081d--> ,nDEV
SEAR ( SysAddr): 00041a10
PEAR ( PciAddr): f4c00000
access violation fault
PCB = 000FA340 (idle)
PC = A43D001040E70570
VA = A43D001040E70570
unexpected exception/interrupt through vector 420
process idle, pcb = 000FA340
pc: 00000000 0006C084 ps: 20000000 00001F00
r2: 00000000 000F5D90 r5: 00000000 00001F00
r3: 00000000 00022650 r6: 00000000 000BD5EC
r4: 00000000 00000060 r7: A43D0010 40E70570
System hung at this point.
System Configuration:
---------------------
System Information:
System Type AlphaStation 400 4/233 Primary CPU ID 00
Cycle Time 4.2 nsec (233 MHz) Pagesize 8192 Byte
Memory Configuration:
Cluster PFN Start PFN Count Range (MByte) Usage
#03 0 249 0.0 MB - 1.9 MB Console
#04 249 7942 1.9 MB - 63.9 MB System
#05 8191 1 63.9 MB - 64.0 MB Console
Per-CPU Slot Processor Information:
CPU ID 00 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU Type EV45 Halt PC 00000000.20000000
PAL Code 5.56 Halt PS 00000000.00001F00
CPU Revision .... Halt Code 00000000.00000000
Serial Number .......... "Bootstrap or Powerfail"
Console Vers V6.3-4
Adapter Configuration:
----------------------
TR Adapter ADP Hose Bus BusArrayEntry Node Device Name / HW-Id
-- ----------- -------- ---- -------------------- ---- -------------------------
1 KA0D02 80C460C0 0 BUSLESS_SYSTEM
2 PCI 80C462C0 0 PCI
80C465E8 PKA: 6 NCR 53C810 SCSI
80C46620 7 SATURN
80C46700 EWA: 11 NI (Tulip)
3 ISA 80C46AC0 0 ISA
80C46C98 0 EISA_SYSTEM_BOARD
4 XBUS 80C47040 0 XBUS
80C47218 0 MOUS
80C47250 1 KBD
80C47288 2 COM1
80C472C0 TTA: 3 Serial Port
80C472F8 LRA: 4 Line Printer (parallel port)
80C47330 DVA: 5 Floppy
SDA>
-----------------------------------------------------------------------
>>>sho config
Firmware
SRM Console: V6.3-4
ARC Console: 4.49
PALcode: VMS PALcode V5.56-2, OSF PALcode X1.46-2
Serial Rom: V4.6
Diag Rom: V1.7
Processor
DECchip (tm) 21064A-2 233Mhz 512KB Cache
MEMORY
64 Meg of System Memory
Bank 0 = 32 Mbytes(16 MB Per Simm) Starting at 0x0
Bank 1 = 32 Mbytes(16 MB Per Simm) Starting at 0x2000000
Bank 2 = No Memory Detected
Flash ROM0 Mfr - AMD
Flash ROM1 Mfr - AMD
Flash ROM2 Mfr - AMD
Flash ROM3 Mfr - AMD
PCI Bus
Bus 00 Slot 06: NCR 810 Scsi Controller
pka0.7.0.6.0 SCSI Bus ID 7
dka200.2.0.6.0 RZ26N
dka300.3.0.6.0 RZ26N
dka600.6.0.6.0 RRD45
mka0.0.0.6.0 TLZ09
Bus 00 Slot 07: Intel SIO 82378
Bus 00 Slot 11: DECchip 21040 Network Controller
ewa0.0.0.11.0 00-00-F8-22-31-61
ISA
Slot Device Name Type Enabled BaseAddr IRQ DMA
0
0 MOUSE Embedded Yes 60 12
1 KBD Embedded Yes 60 1
2 COM1 Embedded Yes 3f8 4
3 COM2 Embedded Yes 2f8 3
4 LPT1 Embedded Yes 3bc 7
5 FLOPPY Embedded Yes 3f0 6 2
>>>show
auto_action HALT
boot_dev dka300.3.0.6.0
boot_file
boot_osflags 0,0
boot_reset OFF
bootdef_dev dka300.3.0.6.0
booted_dev dka300.3.0.6.0
booted_file
booted_osflags 0,0
bus_probe_algorithm old
char_set 0
console serial
control_disfd enable
control_idema enable
control_irq11 scsi
control_irq12 mouse
control_scsi_term external
controlp on
dump_dev
enable_audit ON
ewa0_arp_tries 3
ewa0_bootp_file
ewa0_bootp_server
ewa0_bootp_tries 3
ewa0_def_ginetaddr 0.0.0.0
ewa0_def_inetaddr 0.0.0.0
ewa0_def_inetfile
ewa0_def_sinetaddr 0.0.0.0
ewa0_def_subnetmask 0.0.0.0
ewa0_ginetaddr 0.0.0.0
ewa0_inet_init bootp
ewa0_inetaddr 0.0.0.0
ewa0_inetfile
ewa0_loop_count 3e8
ewa0_loop_inc a
ewa0_loop_patt ffffffff
ewa0_loop_size 2e
ewa0_lp_msg_node 1
ewa0_mode AUI
ewa0_protocols MOP
ewa0_sinetaddr 0.0.0.0
ewa0_tftp_tries 3
kbd_hardware_type PCXAL
language 36
language_name English(American)
license MU
mopv3_boot OFF
os_type OpenVMS
pal VMS PALcode V5.56-2, OSF PALcode X1.46-2
pci_parity off
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
quick_start OFF
scsi_poll ON
sys_serial_num
tga_sync_green 0
timer_tps 1
tt_allow_login 1
tty_dev 0
version V6.3-4 Nov 20 1996 09:41:23
>>>
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1863.1 | console machine check | STAR::jacobi.zko.dec.com::jacobi | Paul A. Jacobi - OpenVMS Systems Group | Fri Feb 21 1997 13:15 | 7 |
Note that the machine check information appears to be produced by the SRM console, and not by OpenVMS machine check handler. -Paul | |||||
| 1863.2 | any news? | KERNEL::PETTET | Norm Pettet CSC Basingstoke | Tue Feb 25 1997 05:38 | 6 |
Can anyone confirm that it's an ISA/PCI problem, as I need to update the customer today? Regards, Norm | |||||
| 1863.3 | had same problem | CALDEC::BOSKLOPPER | Mon Mar 03 1997 20:45 | 9 | |
Had almost the same problem. MC code 203 (I/O parity error).
Code lower than V5.0 works fine, but V5.x gives a machine check during boot.
Under UNIX there are perhaps 50 machine checks, but UNIX comes up and runs fine.
I replaced the CPU board, which fixed my problem. Old board 54-23262-02 rev
B01. New board was at rev B03. Upgraded with CD-ROM V3.8 and all is
OK. You might try another CPU board and check for rev B03.
Ben
Palo Alto, CA
| |||||
| 1863.4 | more to add | CALDEC::BOSKLOPPER | Tue Mar 04 1997 12:33 | 10 | |
More to add to .3: the story about the machine checks is true, but it has
nothing to do with the CPU rev. I did some additional investigation and
compared the environment settings. The new CPU board had PCI parity off.
My old CPU board had it set on. By enabling parity I get the same machine checks.
So now we have another question: must parity be on or off?
For now I am leaving it off, and my old CPU board is running fine.
Btw, firmware V4.2-2 runs fine with parity on.
Ben
Palo Alto, CA
| |||||
| 1863.5 | machine check code 203 != 205 | WRKSYS::HOUSE | Kenny House, Workstations Engineering | Tue Mar 04 1997 13:26 | 18 |
There are two problem reports mixed up here.
.0-.2 describe a machine check with code 205h, which is a PCI device
timeout. It appears to have been a memory read of address F4C0.0000.
This looks like a problem with the firmware's reinitialization of the
hardware after the operating system has touched it. I don't know
whether there's a fix in a later rev of firmware, but a real reset
seems to take care of the problem, at least for now.
.3 & .4 describe an entirely different machine check, with a code of
203h, which is a PCI parity error. At the time the AlphaStation 400
came out, there were few, if any, Intel-based PCs that detected parity
errors. That meant that most PCI card vendors did little or no testing
of PCI parity, with the expected result that many of them didn't do it
right. For the cards we had to ship, the data are OK, just the parity
is wrong. Disable PCI parity checking.
-- Kenny House
| |||||
| 1863.6 | Avanti Machine Check Codes | WRKSYS::HOUSE | Kenny House, Workstations Engineering | Tue Mar 04 1997 13:29 | 30 |
Here's the list of machine check codes for the AlphaStation 200 &
400.
-- Kenny House
Code Description SCB Recovery Action
80 Tag Parity Error 660 Fatal
82 Tag Control Error 660 Fatal
83 Hardware Error 670 Fatal
8A Unknown Error 670 Fatal
8C CackSoft Error 670 Fatal
8E BugCheck 670 Fatal
90 OS Bugcheck 670 Fatal
92 Dcache Parity Error 670 Fatal
94 Icache Parity Error 670 Fatal
96 c3 Tag Parity Error 670 Fatal
201 I/O Read/Write Retry Timeout 660 Fatal
202 DMA Data Parity Error 660 Fatal
203 I/O Data Parity Error 660 Fatal
204 Slave Abort PCI Transaction 660 Fatal
205 DEVSEL Not Asserted 660 Fatal
207 Uncorrectable Read Error 660 Fatal
208 Invalid Page Table Lookup 660 Fatal
209 Memory Cycle Error 660 Fatal
20A Bcache Tag Address Parity Error 660 Fatal
20B Bcache Tag Control Parity Error 660 Fatal
20C Non-Existent Memory Error 660 Fatal
20D SIO Check Condition 660 Fatal
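
For anyone matching a console log against this list, here is a minimal Python sketch of the lookup. The table entries are transcribed from the list above. The function name `describe_mchk` is hypothetical, and masking the logged value down to its low 16 bits (so that the `code 0x100000205` printed in .0 maps to 205h) is an assumption about the logout frame layout, not something confirmed in this thread.

```python
# Machine check codes for the AlphaStation 200/400, transcribed from the
# list above: code -> (description, SCB vector). All are listed as Fatal.
MCHK_CODES = {
    0x80: ("Tag Parity Error", 0x660),
    0x82: ("Tag Control Error", 0x660),
    0x83: ("Hardware Error", 0x670),
    0x8A: ("Unknown Error", 0x670),
    0x8C: ("CackSoft Error", 0x670),
    0x8E: ("BugCheck", 0x670),
    0x90: ("OS Bugcheck", 0x670),
    0x92: ("Dcache Parity Error", 0x670),
    0x94: ("Icache Parity Error", 0x670),
    0x96: ("c3 Tag Parity Error", 0x670),
    0x201: ("I/O Read/Write Retry Timeout", 0x660),
    0x202: ("DMA Data Parity Error", 0x660),
    0x203: ("I/O Data Parity Error", 0x660),
    0x204: ("Slave Abort PCI Transaction", 0x660),
    0x205: ("DEVSEL Not Asserted", 0x660),
    0x207: ("Uncorrectable Read Error", 0x660),
    0x208: ("Invalid Page Table Lookup", 0x660),
    0x209: ("Memory Cycle Error", 0x660),
    0x20A: ("Bcache Tag Address Parity Error", 0x660),
    0x20B: ("Bcache Tag Control Parity Error", 0x660),
    0x20C: ("Non-Existent Memory Error", 0x660),
    0x20D: ("SIO Check Condition", 0x660),
}


def describe_mchk(logged_code: int) -> str:
    """Return a one-line description for a logged machine check code."""
    # ASSUMPTION: the table code is carried in the low 16 bits of the
    # value the console prints; any higher bits are ignored here.
    code = logged_code & 0xFFFF
    entry = MCHK_CODES.get(code)
    if entry is None:
        return f"{code:X}h: not in the table"
    description, scb = entry
    return f"{code:X}h: {description} (SCB {scb:X}, Fatal)"


if __name__ == "__main__":
    print(describe_mchk(0x100000205))  # value logged in .0
    print(describe_mchk(0x203))        # code reported in .3
```

Under that masking assumption, the value from .0 resolves to the 205h entry with SCB 660, which agrees with the "Machine Check Through Vector 00000660" line in the same console log.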
| |||||
| 1863.7 | This will be useful... | STAR::DEAN_G | Wed Mar 05 1997 17:37 | 5 | |
Kenny, Thanks for the list. Dean Gagne | |||||
| 1863.8 | thanks | KERNEL::PETTET | Norm Pettet CSC Basingstoke | Thu Mar 06 1997 05:47 | 9 |
Kenny, Many thanks - I'll update the customer Regards, Norm | |||||