Title: | Alpha Workstation Conference |
Notice: | See note 1.* for conference notices |
Moderator: | WRKSYS::HOUSE |
Created: | Wed Sep 07 1994 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1996 |
Total number of notes: | 9122 |
Hello,

I have a customer who recently upgraded his AlphaStation 400 to firmware V6.3-4, contained on the V3.8 firmware CD. When he tried to reboot the system, the AlphaStation hung after a machine check. He reinitialized the system and it rebooted OK. From processing the machine check, it appears to be a PCI/ISA response problem. The following STARS article describes a very similar problem:

{Elev} Boot/Shutdown issues VMS Alpha 7.0,7.1 with 3.8,3.7 CD firmware

Can anyone add any value, and will downgrading the firmware fix the problem?

Regards,
Norm Pettet

Customer trouble statement follows.

This box has recently been upgraded to OpenVMS V7.1 and Firmware Update V3.8. We had the following scenario:

Controlled shutdown
Cycled power (power was off for ~5 minutes)
Booted system - crashed and hung during reboot.
Reset machine from front panel
Booted again from >>> was OK this time.

No valid crash dump written to dump file, or anything to error log.

Console log (via LAT port) follows:

ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.
ef.df.ee.ed.ec.f4.eb.....ea.e9.e8.e7.e6.e5.
V6.3-4, built on Nov 20 1996 at 09:41:23
>>>b
(boot dka300.3.0.6.0 -flags 0,0)
block 0 of dka300.3.0.6.0 is a valid boot block
reading 904 blocks from dka300.3.0.6.0
bootstrap code read in
base = 1f2000, image_start = 0, image_bytes = 71000
initializing HWRPB at 2000
initializing page table at 1e4000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

OpenVMS (TM) Alpha Operating System, Version V7.1

access violation fault
PCB = 000FA340 (idle)
PC = 0006C0F0
VA = 20000008304D
exception context saved starting at 000FB240
GPRs:
  0: 00002000 00083089   16: 00000000 000FB390
  1: 00000000 000001F4   17: 00000000 00006000
  2: 00000000 000D6048   18: 00000000 00038480
  3: 00000000 0000000A   19: 00000000 000C27C0
  4: 00000000 00038B08   20: 00000000 000C27C0
  5: 00000000 00038B08   21: 00000000 00000000
  6: 00000000 00004E04   22: 00000000 000FB388
  7: 00000000 0000000A   23: 00000000 00000001
  8: 00000000 000F5160   24: 00657174 00000000
  9: 00000000 000F5168   25: 00000000 00000000
 10: 00000000 000BD6C4   26: 00000000 0006C0D0
 11: 00000000 00000001   27: 00000000 0006C748
 12: 00000000 000FA340   28: 00002000 00083089
 13: 00000000 00000010   29: 00000000 000FB380
 14: 00000000 00000011   30: 00000000 000FB380
 15: 00000000 00000000
dump of active call frames:
PC = 0006C0F0
PD = 000D6048
FP = 000FB380
SP = 000FB380
bad PD; KIND = 0
breakpoint at PC f53d0 desired, XDELTA not loaded

Avanti System Machine Check Through Vector 00000660
logout frame address 0x6048 code 0x100000205
IPRs:
EXC_ADD:000000000004BB52
ICCSR: 000044F800000004
HIER: 0000000000000080
HIRR: 0000000000000040
MM_CSR: 0000000000005140
DC_STAT:0000000000000003
DC_ADDR:00000007FFFFFFFF
BIU_STAT:0000000000002041
BIU_ADD:00000003F4C00000
FILL_SY:0000000000000000
FILL_ADD:0000000000011250
VA: 00000000000061D0
EXC_SUM:4672041345C01C8E
BC_TAG: 428020010A403050
EDSR (Comanche): 6d8d2140-->
DCSR ( Epic): 8018081d--> ,nDEV
SEAR ( SysAddr): 00041a10
PEAR ( PciAddr): f4c00000

access violation fault
PCB = 000FA340 (idle)
PC = A43D001040E70570
VA = A43D001040E70570
unexpected exception/interrupt through vector 420
process idle, pcb = 000FA340
pc: 00000000 0006C084   ps: 20000000 00001F00
r2: 00000000 000F5D90   r5: 00000000 00001F00
r3: 00000000 00022650   r6: 00000000 000BD5EC
r4: 00000000 00000060   r7: A43D0010 40E70570

System hung at this point.

System Configuration:
---------------------

System Information:
System Type AlphaStation 400 4/233
Primary CPU ID 00
Cycle Time 4.2 nsec (233 MHz)
Pagesize 8192 Byte

Memory Configuration:
Cluster  PFN Start  PFN Count  Range (MByte)       Usage
#03      0          249        0.0 MB - 1.9 MB     Console
#04      249        7942       1.9 MB - 63.9 MB    System
#05      8191       1          63.9 MB - 64.0 MB   Console

Per-CPU Slot Processor Information:
CPU ID 00                   CPU State rc,pa,pp,cv,pv,pmv,pl
CPU Type EV45               Halt PC 00000000.20000000
PAL Code 5.56               Halt PS 00000000.00001F00
CPU Revision ....           Halt Code 00000000.00000000
Serial Number ..........    "Bootstrap or Powerfail"
Console Vers V6.3-4

Adapter Configuration:
----------------------
TR Adapter ADP Hose Bus BusArrayEntry Node Device Name / HW-Id
-- ----------- -------- ---- -------------------- ---- -------------------------
1 KA0D02 80C460C0 0 BUSLESS_SYSTEM
2 PCI 80C462C0 0 PCI
    80C465E8 PKA: 6 NCR 53C810 SCSI
    80C46620 7 SATURN
    80C46700 EWA: 11 NI (Tulip)
3 ISA 80C46AC0 0 ISA
    80C46C98 0 EISA_SYSTEM_BOARD
4 XBUS 80C47040 0 XBUS
    80C47218 0 MOUS
    80C47250 1 KBD
    80C47288 2 COM1
    80C472C0 TTA: 3 Serial Port
    80C472F8 LRA: 4 Line Printer (parallel port)
    80C47330 DVA: 5 Floppy
SDA>
-----------------------------------------------------------------------
>>>sho config
Firmware
SRM Console: V6.3-4
ARC Console: 4.49
PALcode: VMS PALcode V5.56-2, OSF PALcode X1.46-2
Serial Rom: V4.6
Diag Rom: V1.7

Processor
DECchip (tm) 21064A-2 233Mhz 512KB Cache

MEMORY
64 Meg of System Memory
Bank 0 = 32 Mbytes(16 MB Per Simm) Starting at 0x0
Bank 1 = 32 Mbytes(16 MB Per Simm) Starting at 0x2000000
Bank 2 = No Memory Detected
Flash ROM0 Mfr - AMD
Flash ROM1 Mfr - AMD
Flash ROM2 Mfr - AMD
Flash ROM3 Mfr - AMD

PCI Bus
Bus 00 Slot 06: NCR 810 Scsi Controller
    pka0.7.0.6.0 SCSI Bus ID 7
    dka200.2.0.6.0 RZ26N
    dka300.3.0.6.0 RZ26N
    dka600.6.0.6.0 RRD45
    mka0.0.0.6.0 TLZ09
Bus 00 Slot 07: Intel SIO 82378
Bus 00 Slot 11: DECchip 21040 Network Controller
    ewa0.0.0.11.0 00-00-F8-22-31-61

ISA
Slot  Device  Name    Type      Enabled  BaseAddr  IRQ  DMA
0
      0       MOUSE   Embedded  Yes      60        12
      1       KBD     Embedded  Yes      60        1
      2       COM1    Embedded  Yes      3f8       4
      3       COM2    Embedded  Yes      2f8       3
      4       LPT1    Embedded  Yes      3bc       7
      5       FLOPPY  Embedded  Yes      3f0       6    2

>>>show
auto_action HALT
boot_dev dka300.3.0.6.0
boot_file
boot_osflags 0,0
boot_reset OFF
bootdef_dev dka300.3.0.6.0
booted_dev dka300.3.0.6.0
booted_file
booted_osflags 0,0
bus_probe_algorithm old
char_set 0
console serial
control_disfd enable
control_idema enable
control_irq11 scsi
control_irq12 mouse
control_scsi_term external
controlp on
dump_dev
enable_audit ON
ewa0_arp_tries 3
ewa0_bootp_file
ewa0_bootp_server
ewa0_bootp_tries 3
ewa0_def_ginetaddr 0.0.0.0
ewa0_def_inetaddr 0.0.0.0
ewa0_def_inetfile
ewa0_def_sinetaddr 0.0.0.0
ewa0_def_subnetmask 0.0.0.0
ewa0_ginetaddr 0.0.0.0
ewa0_inet_init bootp
ewa0_inetaddr 0.0.0.0
ewa0_inetfile
ewa0_loop_count 3e8
ewa0_loop_inc a
ewa0_loop_patt ffffffff
ewa0_loop_size 2e
ewa0_lp_msg_node 1
ewa0_mode AUI
ewa0_protocols MOP
ewa0_sinetaddr 0.0.0.0
ewa0_tftp_tries 3
kbd_hardware_type PCXAL
language 36
language_name English(American)
license MU
mopv3_boot OFF
os_type OpenVMS
pal VMS PALcode V5.56-2, OSF PALcode X1.46-2
pci_parity off
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
quick_start OFF
scsi_poll ON
sys_serial_num
tga_sync_green 0
timer_tps 1
tt_allow_login 1
tty_dev 0
version V6.3-4 Nov 20 1996 09:41:23
>>>
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1863.1 | console machine check | STAR::jacobi.zko.dec.com::jacobi | Paul A. Jacobi - OpenVMS Systems Group | Fri Feb 21 1997 13:15 | 7 |
Note that the machine check information appears to be produced by the SRM console, and not by the OpenVMS machine check handler. -Paul | |||||
1863.2 | any news? | KERNEL::PETTET | Norm Pettet CSC Basingstoke | Tue Feb 25 1997 05:38 | 6 |
Can anyone confirm that it's an ISA/PCI problem? I need to update the customer today. Regards, Norm | |||||
1863.3 | had same problem | CALDEC::BOSKLOPPER | | Mon Mar 03 1997 20:45 | 9 |
Had almost the same problem: machine check code 203 (I/O parity error). Firmware code lower than V5.0 works fine, but V5.x gives a machine check during boot. Under UNIX there are perhaps 50 machine checks, but UNIX comes up and runs fine. I replaced the CPU board, which fixed my problem. The old board was 54-23262-02 rev B01; the new board was at rev B03. Upgraded with CD-ROM V3.8 and all is OK. You might try another CPU board and check for rev B03. Ben Palo Alto, CA | |||||
1863.4 | more to add | CALDEC::BOSKLOPPER | | Tue Mar 04 1997 12:33 | 10 |
More to add to .3: the story about the machine checks is true, but it has nothing to do with the CPU rev. I did some additional investigation and compared the environment settings. The new CPU board had PCI parity off; my old CPU board had it set on. By enabling parity I get the same machine checks. So now we have another question: must parity be on or off? For now I have left it off, and my old CPU board is running fine. By the way, firmware V4.2-2 runs fine with parity on. Ben Palo Alto, CA | |||||
1863.5 | machine check code 203 != 205 | WRKSYS::HOUSE | Kenny House, Workstations Engineering | Tue Mar 04 1997 13:26 | 18 |
There are two problem reports mixed up here. Notes .0-.2 describe a machine check with code 205h, which is a PCI device timeout. It appears to have been a memory read of address F4C0.0000. This looks like a problem with the firmware's reinitialization of the hardware after the operating system has touched it. I don't know whether there's a fix in a later rev of firmware, but a real reset seems to take care of the problem, at least for now. Notes .3 & .4 describe an entirely different machine check, with a code of 203h, which is a PCI parity error. At the time the AlphaStation 400 came out, there were few, if any, Intel-based PCs that detected parity errors. That meant that most PCI card vendors did little or no testing of PCI parity, with the expected result that many of them didn't do it right. For the cards we had to ship, the data are OK; just the parity is wrong. Disable PCI parity checking. -- Kenny House | |||||
1863.6 | Avanti Machine Check Codes | WRKSYS::HOUSE | Kenny House, Workstations Engineering | Tue Mar 04 1997 13:29 | 30 |
Here's the list of machine check codes for the AlphaStation 200 & 400. -- Kenny House Columns: Code, Description, SCB, Recovery Action. 80, Tag Parity Error, 660, Fatal; 82, Tag Control Error, 660, Fatal; 83, Hardware Error, 670, Fatal; 8A, Unknown Error, 670, Fatal; 8C, CackSoft Error, 670, Fatal; 8E, BugCheck, 670, Fatal; 90, OS Bugcheck, 670, Fatal; 92, Dcache Parity Error, 670, Fatal; 94, Icache Parity Error, 670, Fatal; 96, c3 Tag Parity Error, 670, Fatal; 201, I/O Read/Write Retry Timeout, 660, Fatal; 202, DMA Data Parity Error, 660, Fatal; 203, I/O Data Parity Error, 660, Fatal; 204, Slave Abort PCI Transaction, 660, Fatal; 205, DEVSEL Not Asserted, 660, Fatal; 207, Uncorrectable Read Error, 660, Fatal; 208, Invalid Page Table Lookup, 660, Fatal; 209, Memory Cycle Error, 660, Fatal; 20A, Bcache Tag Address Parity Error, 660, Fatal; 20B, Bcache Tag Control Parity Error, 660, Fatal; 20C, Non-Existent Memory Error, 660, Fatal; 20D, SIO Check Condition, 660, Fatal | |||||
1863.7 | This will be useful... | STAR::DEAN_G | | Wed Mar 05 1997 17:37 | 5 |
Kenny, Thanks for the list. Dean Gagne | |||||
1863.8 | thanks | KERNEL::PETTET | Norm Pettet CSC Basingstoke | Thu Mar 06 1997 05:47 | 9 |
Kenny, Many thanks - I'll update the customer Regards, Norm |
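For anyone who wants to look up a code from a console log against the list in 1863.6, here is a minimal Python sketch. The table entries are transcribed from that note. The function name `decode_mchk`, the regular expression, and the 16-bit mask are illustrative assumptions, not anything the firmware documents: the thread only shows that the console printed `code 0x100000205` and that Kenny read it as 205h, so the sketch assumes the table code sits in the low-order bits.

```python
import re

# Machine check codes for the AlphaStation 200 & 400, transcribed from note 1863.6.
# Maps code -> (description, SCB vector). Every entry in the note is listed as Fatal.
AVANTI_MCHK_CODES = {
    0x80: ("Tag Parity Error", 0x660),
    0x82: ("Tag Control Error", 0x660),
    0x83: ("Hardware Error", 0x670),
    0x8A: ("Unknown Error", 0x670),
    0x8C: ("CackSoft Error", 0x670),
    0x8E: ("BugCheck", 0x670),
    0x90: ("OS Bugcheck", 0x670),
    0x92: ("Dcache Parity Error", 0x670),
    0x94: ("Icache Parity Error", 0x670),
    0x96: ("c3 Tag Parity Error", 0x670),
    0x201: ("I/O Read/Write Retry Timeout", 0x660),
    0x202: ("DMA Data Parity Error", 0x660),
    0x203: ("I/O Data Parity Error", 0x660),
    0x204: ("Slave Abort PCI Transaction", 0x660),
    0x205: ("DEVSEL Not Asserted", 0x660),
    0x207: ("Uncorrectable Read Error", 0x660),
    0x208: ("Invalid Page Table Lookup", 0x660),
    0x209: ("Memory Cycle Error", 0x660),
    0x20A: ("Bcache Tag Address Parity Error", 0x660),
    0x20B: ("Bcache Tag Control Parity Error", 0x660),
    0x20C: ("Non-Existent Memory Error", 0x660),
    0x20D: ("SIO Check Condition", 0x660),
}

# Assumption: the table code is carried in the low 16 bits of the logged value
# (0x100000205 in note .0 was read as 205h in note .5).
CODE_MASK = 0xFFFF


def decode_mchk(console_line):
    """Return (code, description, scb) for a 'code 0x...' console line, or None."""
    match = re.search(r"code\s+0x([0-9a-fA-F]+)", console_line)
    if match is None:
        return None
    code = int(match.group(1), 16) & CODE_MASK
    description, scb = AVANTI_MCHK_CODES.get(code, ("not in the 1863.6 list", None))
    return code, description, scb


if __name__ == "__main__":
    line = "logout frame address 0x6048 code 0x100000205"
    code, description, scb = decode_mchk(line)
    print(f"code {code:X}h: {description}")
```

Fed the line from the base note, this picks out 205h (DEVSEL Not Asserted), which matches the PCI device timeout reading in 1863.5; a 203h log would come back as I/O Data Parity Error, the case where disabling PCI parity checking was the advice.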