
Conference mvblab::alphaserver_4100

Title: AlphaServer 4100
Moderator: MOVMON::DAVISS
Created: Tue Apr 16 1996
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 648
Total number of notes: 3158

556.0. "dual power supply problem" by MAASUP::SIERAKOWSKI () Tue Apr 01 1997 13:38

Hello, I'm having a problem with two 4100s in a single rack. The customer
    ordered the systems with dual power supplies. Each system has two 400MHz
    CPUs with two gigs of memory. The system is running 3.2G on a
    KZPSC-XB. Firmware was loaded from the 3.9 CD.
    
    The problem is that when we remove power from one of the supplies (from
    the front of the box), the system crashes with a kernel memory fault.
    
    Things I've tried:
    
            *   replaced both power supplies
            *   replaced the PCM module
            *   loaded 4.0B on a disk on a KZPAA
            *   connected power directly to the wall, not through the power control
    
    
    Has anyone else had this problem?
    
T.R  Title  User  Personal Name  Date  Lines
556.1  POBOXB::BAK  Wed Apr 02 1997 11:04  1
Did you check to make sure the current share cable is installed?
556.2  SHARE CABLE  MAASUP::SIERAKOWSKI  Wed Apr 02 1997 11:10  1
    YES THE CURRENT SHARE CABLE IS INSTALLED 
556.3  tried but not reproduced here  AFW4::SAVAGE  Wed Apr 02 1997 19:16  14
    Hi,
      I checked out one of our systems here (PKO) to see if I could reproduce
    the failure you have seen.  No problems were seen using console version
    V4.8-6 and testing with both Digital UNIX V3.2G and V4.0B.   Pulling
    the plug from either power supply would record a CPU exception error in
    the error log.  Checking the show power command under console yielded the
    expected results as well: power supply failures when power removed and
    then normal with power restored.  The configuration differed from your
    customer's system (KZPSC-XA, 466MHz CPUs, less memory), but I was told
    this would not matter with respect to the power.
      Perhaps the error log information can give you more clues on the failure.
    Posting the error log may help others give you more input on this
    problem.
    	-Phyllis
556.4  MAY30::CUMMINS  Mon Apr 07 1997 13:27  6
    For other readers out there:
    
    The base noter neglected to mention that VMS doesn't crash when N+1
    power is removed, and that UNIX crashes only after 10 or more attempts.
    The intermittent crashing under UNIX when N+1 power is removed is being
    investigated.
556.5  problem isolated to PALcode  MAY21::SAVAGE  Tue Apr 22 1997 20:09  6
    
    The problem has been isolated to a failure in the PALcode to properly
    restore register R1 after an environmental failure (i.e. power supply,
    fan or temp. failure).  A fixed version of the console will be built and
    posted to the WWW when available.
    	-Phyllis
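
    For readers who want a picture of this class of bug, here is a toy
    sketch in Python.  It is NOT the actual PALcode: the register file, the
    handlers, and the "kernel routine" are all hypothetical, and r1 here is
    just a dictionary entry.  It only illustrates how a handler that uses a
    register as scratch and does not restore it can fault the interrupted
    code, and why the fault may show up only when the event lands while the
    register is in use.

```python
# Toy model only. All names here are hypothetical, not real PALcode.
SCRATCH = 0xDEAD  # arbitrary value the handler leaves in r1


def buggy_handler(regs):
    """Uses r1 as scratch and returns without restoring it."""
    regs["r1"] = SCRATCH


def fixed_handler(regs):
    """Saves r1, uses it as scratch, then restores it before returning."""
    saved = regs["r1"]
    regs["r1"] = SCRATCH
    regs["r1"] = saved


def kernel_routine(regs, memory, interrupt_at, handler):
    """Sum memory using r1 as the index; take one 'interrupt' at a given step."""
    regs["r1"] = 0
    total = 0
    step = 0
    while regs["r1"] < len(memory):
        if step == interrupt_at:
            handler(regs)  # environmental event arrives mid-loop
        total += memory[regs["r1"]]  # IndexError stands in for a memory fault
        regs["r1"] += 1
        step += 1
    return total


def run(handler, interrupt_at):
    """Return the sum, or the string 'fault' if the routine faulted."""
    try:
        return kernel_routine({"r1": 0}, [1, 2, 3, 4], interrupt_at, handler)
    except IndexError:
        return "fault"


if __name__ == "__main__":
    print("fixed handler, event at step 2:", run(fixed_handler, 2))
    print("buggy handler, event at step 2:", run(buggy_handler, 2))
    print("buggy handler, no event in loop:", run(buggy_handler, None))
```

    In this model the buggy handler only does damage when the event arrives
    while r1 holds a live value, which is one possible reason a crash like
    this could take many attempts to reproduce.  That reading is an
    assumption, not something stated in the notes above.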
556.6  PALcode fix V4.8-7 console  MAY21::SAVAGE  Fri May 16 1997 11:05  54
    
    The AlphaServer 4100 console image files, which include corrected PALcode
    to properly restore register R1 after an environmental failure (power
    supply, fan, or temperature failure), are available from the Alpha firmware
    interim World Wide Web site.
    
    The files may be accessed via the following URL:

	http://ftp.digital.com/pub/Digital/Alpha/firmware/interim/as4x00

    Attached below is the 00readme.txt file found at that location.  Please
    note the modified instructions for creating an update floppy diskette.

	-Phyllis
    
 This /as4x00 interim directory contains the SRM console V4.8-7 files.  These
 V4.8-7 files supersede the previous interim release (V4.8-6) and the V4.8-5
 files found on the Alpha Systems Firmware Update V3.9 CD and also located in
 the /v3.9/as4x00 Web directory.

 V4.8-7 is an interim release, prior to the next Alpha Systems Firmware
 Update CD-ROM, to correct a PALcode problem exhibited under environmental
 failures: power supply, fan and temperature failures.  The V4.8-7 code also
 includes the previous interim release console (V4.8-6) changes.

 Caution: 
  The Release Notes instructions in Chapter 2, Updating Using a FAT-Formatted
  Floppy Diskette, list copying the as4x00 .txt files (as4x00cp, as4x00fw,
  as4x00io) to floppy diskettes.  Floppy diskettes created from .txt files
  that were saved/copied via a Web browser will NOT work.  To avoid this
  problem, use the .sys files when copying files from the Web site.  The
  console release notes instructions (example below) will be changed to
  reflect this update information.

        Updating Using a FAT-Formatted Floppy Diskette 

        o  Insert a 1.44MB FAT-formatted diskette in your floppy drive.
        o  Copy the following files to diskette #1:

                as4x00cp.sys
                as4x00fw.sys
                rhreadme.sys
                rhsrmrom.sys
                rharcrom.sys

        o  Copy the following files to diskette #2 if you'll be updating I/O
           option firmware:

                as4x00io.sys
                rhreadme.sys
                cipca315.sys
                dfxaa310.sys
                kzpsaa10.sys
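
    The file lists above can be checked before writing the diskettes.  The
    following is a minimal Python sketch, not part of the readme: the
    function name check_diskette and the "downloads" directory are
    hypothetical, and the usable capacity figure (1,457,664 bytes with
    512-byte sectors for an empty 1.44MB FAT diskette) is an assumption
    that should be confirmed against the actual formatted diskette.

```python
from pathlib import Path

# File lists copied from the readme above (.sys files, not .txt).
DISKETTE_1 = ["as4x00cp.sys", "as4x00fw.sys", "rhreadme.sys",
              "rhsrmrom.sys", "rharcrom.sys"]
DISKETTE_2 = ["as4x00io.sys", "rhreadme.sys", "cipca315.sys",
              "dfxaa310.sys", "kzpsaa10.sys"]

SECTOR_BYTES = 512                 # assumed allocation unit
FLOPPY_CAPACITY_BYTES = 1_457_664  # assumed usable space on an empty diskette


def check_diskette(download_dir, filenames, capacity=FLOPPY_CAPACITY_BYTES):
    """Return (missing, total_bytes, fits) for one diskette's file list.

    missing     -- names not found in download_dir
    total_bytes -- size of the files found, each rounded up to whole sectors
    fits        -- True only if nothing is missing and the total fits
    """
    directory = Path(download_dir)
    missing = []
    total = 0
    for name in filenames:
        path = directory / name
        if not path.is_file():
            missing.append(name)
            continue
        size = path.stat().st_size
        # Round each file up to a whole number of sectors.
        total += ((size + SECTOR_BYTES - 1) // SECTOR_BYTES) * SECTOR_BYTES
    fits = (not missing) and total <= capacity
    return missing, total, fits


if __name__ == "__main__":
    # "downloads" is a placeholder for wherever the files were saved.
    for label, names in (("diskette #1", DISKETTE_1),
                         ("diskette #2", DISKETTE_2)):
        missing, total, fits = check_diskette("downloads", names)
        print(label, "missing:", missing, "bytes:", total, "fits:", fits)
```

    The check only confirms that the named files are present and small
    enough to fit; it does not verify that the images themselves are intact.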
    
556.7  Dunix PALcode fix V4.8-7 console  CSC32::HUTMACHER  Mon May 19 1997 12:06  19
    Thanks Phyllis                                        
    
    We pulled the new SRM console code V4.8-7 from
    
    http://ftp.digital.com/pub/Digital/Alpha/firmware/interim/as4x00
    
    and installed it on the CSC lab machine, which would do a kernel memory
    fault panic on repetitive power failures (3-20 times) in a dual/tri
    redundant power mode.
    
    With the new SRM console installed, Dunix no longer panicked after more
    than 100 repetitive power failures.
    I do not recommend this kind of testing on your 4100/4000 machine, but
    we had a site that required this survival testing to work.
    Very obscure problem; thanks to engineering for finding and correcting
    the problem.
    Now Dunix does run as advertised in dual/tri redundant power modes.
    
    jim hutmacher mvhs colorado csc 800-354-9000 ext 25561