
Conference mvblab::alphaserver_4100

Title:AlphaServer 4100
Moderator:MOVMON::DAVISS
Created:Tue Apr 16 1996
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:648
Total number of notes:3158

604.0. "0xFACEFEED error/crash AS4000 5/400" by ACISS2::SDATZMAN () Wed May 14 1997 16:20

    I have an AlphaServer 4000 5/400 NT4.0 Cluster running
    in a critical, production mode at Cummins Engine. The
    environment was upgraded from NT3.51 and Clusters V1.0
    to NT4.0/SP2 and Clusters V1.1 about a month ago, and has
    been running OK for a couple of weeks.
    
    Over the last 2 weeks, however, we have begun to get
    intermittent machine crashes on one of the servers. Currently, 
    we cannot boot the failed machine, nor can we successfully
    reinstall NT. When the NT install sequence transitions
    from the character-cell portion to the GUI portion we get
    the "0xFACEFEED" error referenced in Field Blitz 93.27 of
    this conference. Running TEST from the SRM console, we also
    see a "CPU0 Soft Error Vector 00620".
    
    Config Data and actions taken:
    ------------------------------
    AS4000 5/400 (1 CPU, 512 MB EDO RAM)
    KZPDA  (2 RZ29Bs, TZ88N)
    KZPSA  (to RA410)
    DEFPA (2)
    DE500-XA (2)
    Standard S3 video
    I/O options installed by CSS, bus positions have not been changed
    All 47 Field Blitzes have been read and applied where applicable
    We do not have DECevent (for NT); where can I find it ?
    NT 4.0 plus SP2
    NT HAL 4.0b
    Hotfix (Microsoft Knowledgebase #156410) NOT applied (1 CPU only)
    
    Some module numbers/revs follow:
      B3004-AA      Rev B04             CPU
      30-44712-01   Rev B03             Power
      B3030-ET      Rev A06             RAM (2 modules)
      54-23803-02   Rev A02             "Dodge" Mother Board ?
      54-24117-01   Rev F03             Power
      B3050-AA      Rev H03             "Saddle" ?
      B3040-AA      Rev D07             "Horse" ?
    
    We have invoked Field Service, and are currently trying to solve this
    problem and get NT re-installed. They have replaced the CPU module
    with a new one (because of the 620 Soft Error), but this has not solved
    the problem.
    
    Any ideas ?
    
    Thanks,
    
    Steve
    
    
       
    
604.1. "Set MEMORY_TEST=PARTIAL" by ACISS2::SDATZMAN () Wed May 14 1997 18:07

    After checking SRM EV "MEMORY_TEST" a second time, I found that it
    was set to "PARTIAL". Consistent with Field Blitz 93.38, I set this
    back to "FULL", and we seem to be in business.
    
    Steve

604.2. "" by MAY21::CUMMINS () Wed May 14 1997 18:24

    My guess is that there are one or more bad pages of memory in said
    system. Partial test mode is not supported, as you know, and passes
    bitmaps/descriptors to the operating system that say all untested
    memory is "good"/usable.
    
    What may have happened is that enabling full test has led to these
    pages being marked bad such that they no longer get used by the OS.
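
    The effect described above can be illustrated with a toy model. This
    is only a sketch, not the actual firmware logic: the 8-page memory,
    the fault positions, the `tested_pages` cutoff, and all names are
    hypothetical, chosen to show how an untested bad page can still be
    handed to the OS as usable.

```python
from enum import Enum


class TestMode(Enum):
    FULL = "FULL"
    PARTIAL = "PARTIAL"


def build_good_page_bitmap(page_is_faulty, mode, tested_pages=0):
    """Return one bool per page: True means 'reported usable to the OS'.

    Assumption (hypothetical): in PARTIAL mode only the first
    `tested_pages` pages are actually tested; the rest are reported
    good without being checked.
    """
    bitmap = []
    for page, faulty in enumerate(page_is_faulty):
        tested = mode is TestMode.FULL or page < tested_pages
        # A tested page is reported according to its real state;
        # an untested page is reported good regardless.
        bitmap.append((not faulty) if tested else True)
    return bitmap


# Hypothetical 8-page memory with faults in pages 5 and 6.
faulty = [False] * 5 + [True, True, False]

partial = build_good_page_bitmap(faulty, TestMode.PARTIAL, tested_pages=4)
full = build_good_page_bitmap(faulty, TestMode.FULL)

# Pages that are really bad but were still reported usable in PARTIAL mode.
bad_but_usable = [
    page for page, (ok, bad) in enumerate(zip(partial, faulty)) if ok and bad
]
```

    In this model the partial test reports every page good, while the
    full test marks pages 5 and 6 bad so the OS never touches them.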
    
    Under SRM, one can do:
    
      P00>>> b pmem:0 -h
      .
      .
      P00>>> info 1
    
    to read bad page info.
    
    One can also do:
    
      P00>>> d toy:22 22 -b
    
    to enable "MFG mode", which causes failure info that doesn't normally
    get printed to be reported to COM1. All failures would be reported in
    this case, whereas power-up tests normally simply mark bad pages in the
    memory bitmaps/descriptors.
    
    
    One would probably not want to leave TOY:22 <-- 22 at a customer site,
    but this may be a valuable on-site debug tool. To disable, write 0 to
    this TOY NVRAM location.
    
    BC

604.3. "Thanks, Will Check" by ACISS2::SDATZMAN () Thu May 15 1997 12:48

    Thanks. I'll check this out, as we seem to still be having some
    random problems.
    
    Steve