
Conference wrksys::alphastation

Title:Alpha Workstation Conference
Notice:See note 1.* for conference notices
Moderator:WRKSYS::HOUSE
Created:Wed Sep 07 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1996
Total number of notes:9122

1863.0. "Machine check on Alphastation 400 4/233 after firmware upgrade to V6.3-4" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Fri Feb 21 1997 11:54

Hello,


	I have a customer who recently upgraded his AlphaStation 400 to firmware
V6.3-4, contained on the V3.8 firmware CD. When he tried to reboot the system, the
AlphaStation hung after a machine check. He reinitialized the system and it
rebooted OK. From processing the machine check, it appears to be a PCI/ISA
response problem. The following STARS article describes a very similar problem.

{Elev} Boot/Shutdown issues VMS Alpha 7.0,7.1 with 3.8,3.7 CD firmware

	Can anyone add any value, and will downgrading the firmware fix the problem?

	Regards,
		
		Norm Pettet

Customer trouble statement follows.


This box has recently been upgraded to OpenVMS V7.1 and Firmware Update V3.8.

We had the following scenario:

Controlled shutdown
Cycled power (power was off for ~5 minutes)
Booted system - crashed and hung during reboot.

Reset machine from front panel
Booted again from >>> was OK this time.

No valid crash dump written to dump file, or anything to error log.

Console log (via LAT port) follows:


ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.
ef.df.ee.ed.ec.f4.eb.....ea.e9.e8.e7.e6.e5.
V6.3-4, built on Nov 20 1996 at 09:41:23
>>>b
(boot dka300.3.0.6.0 -flags 0,0)
block 0 of dka300.3.0.6.0 is a valid boot block
reading 904 blocks from dka300.3.0.6.0
bootstrap code read in
base = 1f2000, image_start = 0, image_bytes = 71000
initializing HWRPB at 2000
initializing page table at 1e4000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code


    OpenVMS (TM) Alpha Operating System, Version V7.1


access violation fault
    PCB =  000FA340 (idle)
    PC  =  0006C0F0
    VA  =  20000008304D

exception context saved starting at 000FB240

GPRs:
  0: 00002000 00083089  16: 00000000 000FB390
  1: 00000000 000001F4  17: 00000000 00006000
  2: 00000000 000D6048  18: 00000000 00038480
  3: 00000000 0000000A  19: 00000000 000C27C0
  4: 00000000 00038B08  20: 00000000 000C27C0
  5: 00000000 00038B08  21: 00000000 00000000
  6: 00000000 00004E04  22: 00000000 000FB388
  7: 00000000 0000000A  23: 00000000 00000001
  8: 00000000 000F5160  24: 00657174 00000000
  9: 00000000 000F5168  25: 00000000 00000000
 10: 00000000 000BD6C4  26: 00000000 0006C0D0
 11: 00000000 00000001  27: 00000000 0006C748
 12: 00000000 000FA340  28: 00002000 00083089
 13: 00000000 00000010  29: 00000000 000FB380
 14: 00000000 00000011  30: 00000000 000FB380
 15: 00000000 00000000

dump of active call frames:

PC  =  0006C0F0
PD  =  000D6048
FP  =  000FB380
SP  =  000FB380
bad PD; KIND =  0
breakpoint at PC f53d0 desired, XDELTA not loaded

Avanti System Machine Check Through Vector 00000660
logout frame address 0x6048 code 0x100000205

IPRs:
EXC_ADD:000000000004BB52  ICCSR:   000044F800000004  HIER:   0000000000000080
HIRR:   0000000000000040  MM_CSR:  0000000000005140  DC_STAT:0000000000000003
DC_ADDR:00000007FFFFFFFF  BIU_STAT:0000000000002041  BIU_ADD:00000003F4C00000
FILL_SY:0000000000000000  FILL_ADD:0000000000011250  VA:     00000000000061D0
EXC_SUM:4672041345C01C8E  BC_TAG:  428020010A403050
  EDSR (Comanche): 6d8d2140-->
  DCSR (    Epic): 8018081d--> ,nDEV
  SEAR ( SysAddr): 00041a10
  PEAR ( PciAddr): f4c00000

access violation fault
    PCB =  000FA340 (idle)
    PC  =  A43D001040E70570
    VA  =  A43D001040E70570
unexpected exception/interrupt through vector 420
process idle, pcb = 000FA340

 pc: 00000000 0006C084  ps: 20000000 00001F00
 r2: 00000000 000F5D90  r5: 00000000 00001F00
 r3: 00000000 00022650  r6: 00000000 000BD5EC
 r4: 00000000 00000060  r7: A43D0010 40E70570

 System hung at this point.



System Configuration:
---------------------
System Information:
System Type    AlphaStation 400 4/233                 Primary CPU ID 00
Cycle Time     4.2 nsec (233 MHz)                     Pagesize       8192 Byte

Memory Configuration:
Cluster    PFN Start    PFN Count         Range (MByte)        Usage
 #03             0          249         0.0 MB -     1.9 MB    Console
 #04           249         7942         1.9 MB -    63.9 MB    System
 #05          8191            1        63.9 MB -    64.0 MB    Console

Per-CPU Slot Processor Information:
CPU ID         00                        CPU State    rc,pa,pp,cv,pv,pmv,pl
CPU Type       EV45                      Halt PC      00000000.20000000
PAL Code       5.56                      Halt PS      00000000.00001F00
CPU Revision   ....                      Halt Code    00000000.00000000
Serial Number  ..........                "Bootstrap or Powerfail"
Console Vers   V6.3-4

Adapter Configuration:
----------------------
TR Adapter     ADP      Hose Bus   BusArrayEntry  Node Device Name / HW-Id
-- ----------- -------- ---- -------------------- ---- -------------------------
 1 KA0D02      80C460C0    0 BUSLESS_SYSTEM
 2 PCI         80C462C0    0 PCI
                                   80C465E8  PKA:    6 NCR 53C810 SCSI
                                   80C46620          7 SATURN
                                   80C46700  EWA:   11 NI (Tulip)
 3 ISA         80C46AC0    0 ISA
                                   80C46C98          0 EISA_SYSTEM_BOARD
 4 XBUS        80C47040    0 XBUS
                                   80C47218          0 MOUS
                                   80C47250          1 KBD
                                   80C47288          2 COM1
                                   80C472C0  TTA:    3 Serial Port
                                   80C472F8  LRA:    4 Line Printer (parallel port)
                                   80C47330  DVA:    5 Floppy
SDA>

-----------------------------------------------------------------------
>>>sho config

Firmware
SRM Console:    V6.3-4
ARC Console:    4.49
PALcode:        VMS PALcode V5.56-2, OSF PALcode X1.46-2
Serial Rom:     V4.6
Diag Rom:       V1.7

Processor
DECchip (tm) 21064A-2   233Mhz 512KB Cache

MEMORY
     64 Meg of System Memory
     Bank 0 = 32 Mbytes(16 MB Per Simm) Starting at 0x0
     Bank 1 = 32 Mbytes(16 MB Per Simm) Starting at 0x2000000
     Bank 2 = No Memory Detected
     Flash ROM0  Mfr - AMD
     Flash ROM1  Mfr - AMD
     Flash ROM2  Mfr - AMD
     Flash ROM3  Mfr - AMD

PCI Bus
     Bus 00  Slot 06: NCR     810 Scsi Controller
                                   pka0.7.0.6.0          SCSI Bus ID 7
                                   dka200.2.0.6.0         RZ26N
                                   dka300.3.0.6.0         RZ26N
                                   dka600.6.0.6.0         RRD45
                                   mka0.0.0.6.0           TLZ09

     Bus 00  Slot 07: Intel SIO 82378


     Bus 00  Slot 11: DECchip 21040 Network Controller
                                   ewa0.0.0.11.0         00-00-F8-22-31-61

ISA
Slot    Device  Name            Type         Enabled  BaseAddr  IRQ     DMA
0
        0       MOUSE           Embedded        Yes     60      12
        1       KBD             Embedded        Yes     60      1
        2       COM1            Embedded        Yes     3f8     4
        3       COM2            Embedded        Yes     2f8     3
        4       LPT1            Embedded        Yes     3bc     7
        5       FLOPPY          Embedded        Yes     3f0     6       2
>>>show
auto_action             HALT
boot_dev                dka300.3.0.6.0
boot_file
boot_osflags            0,0
boot_reset              OFF
bootdef_dev             dka300.3.0.6.0
booted_dev              dka300.3.0.6.0
booted_file
booted_osflags          0,0
bus_probe_algorithm     old
char_set                0
console                 serial
control_disfd           enable
control_idema           enable
control_irq11           scsi
control_irq12           mouse
control_scsi_term       external
controlp                on
dump_dev
enable_audit            ON
ewa0_arp_tries          3
ewa0_bootp_file
ewa0_bootp_server
ewa0_bootp_tries        3
ewa0_def_ginetaddr      0.0.0.0
ewa0_def_inetaddr       0.0.0.0
ewa0_def_inetfile
ewa0_def_sinetaddr      0.0.0.0
ewa0_def_subnetmask     0.0.0.0
ewa0_ginetaddr          0.0.0.0
ewa0_inet_init          bootp
ewa0_inetaddr           0.0.0.0
ewa0_inetfile
ewa0_loop_count         3e8
ewa0_loop_inc           a
ewa0_loop_patt          ffffffff
ewa0_loop_size          2e
ewa0_lp_msg_node        1
ewa0_mode               AUI
ewa0_protocols          MOP
ewa0_sinetaddr          0.0.0.0
ewa0_tftp_tries         3
kbd_hardware_type       PCXAL
language                36
language_name           English(American)
license                 MU
mopv3_boot              OFF
os_type                 OpenVMS
pal                     VMS PALcode V5.56-2, OSF PALcode X1.46-2
pci_parity              off
pka0_disconnect         1
pka0_fast               1
pka0_host_id            7
quick_start             OFF
scsi_poll               ON
sys_serial_num
tga_sync_green          0
timer_tps               1
tt_allow_login          1
tty_dev                 0
version                 V6.3-4 Nov 20 1996 09:41:23
>>>
1863.1. "console machine check" by STAR::jacobi.zko.dec.com::jacobi (Paul A. Jacobi - OpenVMS Systems Group) Fri Feb 21 1997 13:15 (7 lines)
Note that the machine check information appears to be produced by the SRM 
console, and not by OpenVMS machine check handler.


							-Paul

1863.2. "any news?" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Tue Feb 25 1997 05:38 (6 lines)
Can anyone confirm that it's an ISA/PCI problem? I need to update the customer
today.

		Regards,

			Norm
1863.3. "had same problem" by CALDEC::BOSKLOPPER Mon Mar 03 1997 20:45 (9 lines)
    Had almost the same problem: MC code 203 (I/O parity error).
    Firmware code lower than V5.0 works fine, but V5.X gives a machine check
    during boot. Under UNIX there are perhaps 50 machine checks, but UNIX
    comes up and runs fine.
    I replaced the CPU board, which fixed my problem. The old board was
    54-23262-02 rev B01; the new board is at rev B03. Upgraded with CD-ROM
    V3.8 and all is OK. You might try another CPU board and check for rev B03.
                 Ben
             Palo Alto, CA
    
1863.4. "more to add" by CALDEC::BOSKLOPPER Tue Mar 04 1997 12:33 (10 lines)
    More to add to .3: the story about the machine checks is true, but it has
    nothing to do with the CPU rev. I did some additional investigation and
    compared the environment settings. The new CPU board had PCI parity off;
    my old CPU board had it set on. When I enable parity, I get the same
    machine checks.
    So now we have another question: must parity be on or off?
    For now I leave it off, and my old CPU board is running fine.
    Btw, firmware V4.2-2 runs fine with parity on.
          Ben
         Palo Alto, CA
    
1863.5. "machine check code 203 != 205" by WRKSYS::HOUSE (Kenny House, Workstations Engineering) Tue Mar 04 1997 13:26 (18 lines)
    There are two problem reports mixed up here.
    
    .0-.2 describe a machine check with code 205h, which is a PCI device
    timeout.  It appears to have been a memory read of address F4C0.0000. 
    This looks like a problem with the firmware's reinitialization of the
    hardware after the operating system has touched it.  I don't know
    whether there's a fix in a later rev of firmware, but a real reset
    seems to take care of the problem, at least for now.
    
    .3 & .4 describe an entirely different machine check, with a code of
    203h, which is a PCI parity error.  At the time the AlphaStation 400
    came out, there were few, if any, Intel-based PCs that detected parity
    errors.  That meant that most PCI card vendors did little or no testing
    of PCI parity, with the expected result that many of them didn't do it
    right.  For the cards we had to ship, the data are OK, just the parity
    is wrong.  Disable PCI parity checking.
    
    -- Kenny House
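
    For anyone who wants to check a captured console log for the parity
    setting discussed in .4 and .5, here is a minimal Python sketch. It
    assumes the log contains `>>>show` output in the name/value layout seen
    in .0; the function name pci_parity_setting is made up for illustration
    and is not part of any DEC tool.

```python
def pci_parity_setting(console_dump):
    """Return the pci_parity value from captured SRM `show` output.

    Assumes one "name value" pair per line, as in the dump in .0.
    Returns the value in lower case, or None if the variable is absent
    or has no value on its line.
    """
    for line in console_dump.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "pci_parity":
            return fields[1].lower()
    return None


if __name__ == "__main__":
    # Three lines copied from the `>>>show` output in .0
    sample = (
        "pal                     VMS PALcode V5.56-2, OSF PALcode X1.46-2\n"
        "pci_parity              off\n"
        "pka0_disconnect         1\n"
    )
    print(pci_parity_setting(sample))
```

    A result of "on" would match the configuration that produced the 203h
    machine checks in .4; "off" matches the setting recommended in .5.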
1863.6. "Avanti Machine Check Codes" by WRKSYS::HOUSE (Kenny House, Workstations Engineering) Tue Mar 04 1997 13:29 (30 lines)
    Here's the list of machine check codes for the AlphaStation 200 &
    400.
    
    -- Kenny House
    
    Code    Description                     SCB     Recovery Action
    
    80      Tag Parity Error                660     Fatal
    82      Tag Control Error               660     Fatal
    83      Hardware Error                  670     Fatal
    8A      Unknown Error                   670     Fatal
    8C      CackSoft Error                  670     Fatal
    8E      BugCheck                        670     Fatal
    90      OS Bugcheck                     670     Fatal
    92      Dcache Parity Error             670     Fatal
    94      Icache Parity Error             670     Fatal
    96      c3 Tag Parity Error             670     Fatal
    
    201     I/O Read/Write Retry Timeout    660     Fatal
    202     DMA Data Parity Error           660     Fatal
    203     I/O Data Parity Error           660     Fatal
    204     Slave Abort PCI Transaction     660     Fatal
    205     DEVSEL Not Asserted             660     Fatal
    207     Uncorrectable Read Error        660     Fatal
    208     Invalid Page Table Lookup       660     Fatal
    209     Memory Cycle Error              660     Fatal
    20A     Bcache Tag Address Parity Error 660     Fatal
    20B     Bcache Tag Control Parity Error 660     Fatal
    20C     Non-Existent Memory Error       660     Fatal
    20D     SIO Check Condition             660     Fatal
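
    The table above can be turned into a small lookup for console logs.
    The Python sketch below is only an illustration: it assumes the code in
    the table is carried in the low 32 bits of the value printed on the
    "logout frame address ... code ..." line (consistent with .5 reading
    0x100000205 as 205h), and it does not interpret the upper bits. The
    names AVANTI_MCHECK_CODES and decode_mcheck are hypothetical.

```python
import re

# Copied from the table above: code -> (description, SCB vector)
AVANTI_MCHECK_CODES = {
    0x80: ("Tag Parity Error", 0x660),
    0x82: ("Tag Control Error", 0x660),
    0x83: ("Hardware Error", 0x670),
    0x8A: ("Unknown Error", 0x670),
    0x8C: ("CackSoft Error", 0x670),
    0x8E: ("BugCheck", 0x670),
    0x90: ("OS Bugcheck", 0x670),
    0x92: ("Dcache Parity Error", 0x670),
    0x94: ("Icache Parity Error", 0x670),
    0x96: ("c3 Tag Parity Error", 0x670),
    0x201: ("I/O Read/Write Retry Timeout", 0x660),
    0x202: ("DMA Data Parity Error", 0x660),
    0x203: ("I/O Data Parity Error", 0x660),
    0x204: ("Slave Abort PCI Transaction", 0x660),
    0x205: ("DEVSEL Not Asserted", 0x660),
    0x207: ("Uncorrectable Read Error", 0x660),
    0x208: ("Invalid Page Table Lookup", 0x660),
    0x209: ("Memory Cycle Error", 0x660),
    0x20A: ("Bcache Tag Address Parity Error", 0x660),
    0x20B: ("Bcache Tag Control Parity Error", 0x660),
    0x20C: ("Non-Existent Memory Error", 0x660),
    0x20D: ("SIO Check Condition", 0x660),
}

LOGOUT_RE = re.compile(
    r"logout frame address 0x([0-9a-fA-F]+) code 0x([0-9a-fA-F]+)"
)


def decode_mcheck(line):
    """Return (code, description, scb) for a console logout-frame line.

    Returns None if the line does not match. Description and SCB are None
    when the code is not in the table.
    """
    match = LOGOUT_RE.search(line)
    if match is None:
        return None
    raw = int(match.group(2), 16)
    code = raw & 0xFFFFFFFF  # assumption: table code is in the low 32 bits
    description, scb = AVANTI_MCHECK_CODES.get(code, (None, None))
    return code, description, scb


if __name__ == "__main__":
    # Line copied from the console log in .0
    result = decode_mcheck("logout frame address 0x6048 code 0x100000205")
    code, description, scb = result
    print(f"{code:X}h: {description} (SCB {scb:X})")
```

    For the line in .0 this looks up entry 205, whose SCB vector of 660
    agrees with the "Machine Check Through Vector 00000660" message in the
    same log.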
1863.7. "This will be useful..." by STAR::DEAN_G Wed Mar 05 1997 17:37 (5 lines)
Kenny,

Thanks for the list.

Dean Gagne
1863.8. "thanks" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Thu Mar 06 1997 05:47 (9 lines)
Kenny,


	Many thanks - I'll update the customer


	Regards,

		Norm