
Conference mvblab::alphaserver_4100

Title:AlphaServer 4100
Moderator:MOVMON::DAVISS
Created:Tue Apr 16 1996
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:648
Total number of notes:3158

539.0. "Hangs loading AlphaBIOS on graphics" by BRUNEL::KIRBY () Thu Mar 20 1997 13:19

Two nasty faults escalated to me in one week .... first one resolved (note 536)
with help from here, but not the second yet.


This system is hanging when attempting to load AlphaBIOS on the "graphics" 
console. It hangs when the screen clears (after saying "loading AlphaBIOS 
firmware" and changing the OCP display to "Model 5/300").


The next thing I would expect is the Rawhide picture, so where is this loaded
from? (I am assuming the Flash, as part of the AlphaBIOS image that I have 
updated -- see later -- but am I correct?).


If I use the serial console then AlphaBIOS comes up OK, but obviously it 
doesn't attempt to draw the picture. (Is the picture the clue?)



Configuration is with the Trio64 S3 graphics in the bottom PCI slot, and 
that's all. 4100, Single CPU, 256MB memory. (Ethernet and KZPSC removed at 
present while testing.)

Other symptoms that may be relevant are:- with a KZPSC (Rev D) in the system, 
if I type >>>Show then I get two lines of variables then it says "string 
too long". 

SRM is V3.0-10, AlphaBIOS V5.24 from the V3.8 firmware CD. I have tried the 
V3.7 CD also (SRM V2.0-something, ARC V5.21) but just the same.

Items replaced to date are PCI motherboard (saddle), the bridge module (horse),
the graphics module, and the CPU (desperate there!) without changing the 
symptom. I'm now looking for more knowledge on the load-up sequence relating 
to the "Rawhide picture", and perhaps a little inspiration.



				Steve. 
T.R  Title  User  Personal Name  Date  Lines
539.1  Some things to try  HARMNY::CUMMINS  Thu Mar 20 1997 13:51  65 lines
    Looks like there are two independent things going on with this system..
    
      1. AlphaBIOS hang.
      2. SRM SHOW command returns "string too long".
    
    My initial impression is that you haven't had, nor do you now have any
    HW issues here and that you're running into two FW issues. Swapping the
    Saddle has possibly complicated the picture (at least for me) since I'm
    not sure from your mail whether the "string too long" problem occurred
    after or before you swapped the Saddle..
    
    The "string too long" problem is described in note 93.32. This is an SRM
    console firmware "feature" that was changed for the better in the V4.8
    console release (V3.9 FW CD has V4.8-5 and Web pages' /interim/as4x00
    directory has V4.8-6 required for 466 MHz CPUs). If you don't wish to
    update to V4.8-5 or -6, clearing out the ESC NVRAM used to store the
    SRM-defined, non-default environment variable (EV) values should fix this
    particular problem. V4.8 console will also fix it. And V4.8 console adds a
    CLEAR_SRM_NVRAM script for clearing out the SRM-defined NVRAM area as
    well. See below for how to clear out this area when running versions of
    console earlier than V4.8.
    
    How the SRM EVs got to this state is presumably due to one of two things:
    
      1. BOOTDEF_DEV was set to point to a boot device and then the boot
         device was moved around within the system without clearing or
         otherwise resetting the EV first.
      2. The spare Saddle that was swapped had a non-default BOOTDEF_DEV
         EV setting which happened to map to a non-boot device in your
         machine.
    
    As far as the AlphaBIOS hang is concerned; my first guess was the Fast
    NVRAM part AlphaBIOS problem described in note 93.8. However, you said
    you have tried different versions of AlphaBIOS (including V5.24 which
    fixed this problem), so apparently this is not the issue. Still, it
    would be interesting to note whether this machine's new Saddle has the
    fast or the slow NVRAM part on it - see note 93.8 for details. I would
    recommend clearing the ARC NVRAM and starting over..
    
    Clearing NVRAM regions if you don't have V4.8 console (CLEAR_SRM_NVRAM
    and CLEAR_ARC_NVRAM scripts).. The 8KB ESC NVRAM is divided by SW into
    three unique regions: 
     
      1. AlphaBIOS EVs; AlphaBIOS disk partition data; etc. (3KB in size)
      2. ECU data (3KB in size)
      3. SRM console EVs; SRM power-up NVRAM script; etc. (2KB in size)
    
    To clear these respective regions:
    
      P00>>> d eerom:0    -b -n bff 0       # ARC region (#1)
      P00>>> d eerom:c00  -b -n bff 0       # ECU region (#2)
      P00>>> d eerom:1800 -b -n 7ff 0       # SRM region (#3)
    
    Or to clear all three at once:
    
      P00>>> d eerom:0 -b -n 1fff 0         # All three regions
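
    The offsets and counts in the commands above follow directly from the
    region sizes. As a rough check, here is a small Python sketch that
    rebuilds them. It assumes, as the values above imply, that -n gives the
    number of locations after the first one (so a 3KB region uses bff, not
    c00). The names REGIONS and deposit_commands are made up for this
    illustration and are not part of the console.

```python
# Region sizes as listed in the note: ARC 3KB, ECU 3KB, SRM 2KB (8KB total).
REGIONS = [("ARC", 3 * 1024), ("ECU", 3 * 1024), ("SRM", 2 * 1024)]


def deposit_commands(regions):
    """Return (name, start, count, command) for each region, in NVRAM order."""
    out = []
    start = 0
    for name, size in regions:
        # Assumption inferred from the note's own values: -n is the number
        # of additional byte locations after the starting address.
        count = size - 1
        cmd = f"d eerom:{start:x} -b -n {count:x} 0"
        out.append((name, start, count, cmd))
        start += size
    return out


for name, start, count, cmd in deposit_commands(REGIONS):
    print(f"{name}: {cmd}")
```

    Passing a single 8KB region, e.g. deposit_commands([("ALL", 8 * 1024)]),
    gives the same start and count as the all-three-regions command.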
    
    It is advisable that you record data you want to restore (if possible to
    do so) prior to clearing these regions. V4.8 SRM console also adds save
    and restore NVRAM scripts should you want to save your NVRAM contents to a
    floppy diskette after getting things set up the way you like - especially
    useful if you need to swap a PCI motherboard (Saddle) for some reason..
    
    Let us know what you find out.
    BC
539.2  Still hangs, "string too long" fixed.  BRUNEL::KIRBY  Fri Mar 21 1997 10:04  20 lines
Bill,

	Thanks for your input. Yes I would (now) agree I have 2 problems, I 
just hadn't connected the "string too long" message with the line that had 
yet to be output on the screen! Message now fixed as recommended by clearing 
the ESC NVRAM.


As for the hang..........

	I had checked the physical NVRAMs early on, both saddles have 
200NS (-20 part number) chips so no issues there.

	I have cleared all the NVRAM regions as you suggested, but no change.
Still hangs at the point I expect the picture to appear.

	Any further thoughts?


			Steve.
539.3  MAY30::CUMMINS  Fri Mar 21 1997 10:12  38 lines
    SROM checksum tests the system flash ROM's XSROM sectors.
    
    XSROM in turn checksum tests the SRM console + PAL sectors.
    
    However, the only time AlphaBIOS gets checksum tested is when running
    LFU or using the SHOW FLASH command at console. It's possible, though
    seemingly highly unlikely, that AlphaBIOS image in system flash ROM is
    partially corrupted. To discern whether this is true, use the following
    SRM console command:
    
      P00>>> show flash
    
    The very last several lines will tell whether the AlphaBIOS image is
    okay or not..
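
    To make the gap in the power-up checking explicit, here is a toy Python
    sketch of the chain described above. It only restates who checksums
    whom. The names POWER_UP_CHECKS, FLASH_IMAGES and unchecked_at_power_up
    are invented for this illustration, and nothing here reads real flash
    contents.

```python
# Who checksum-tests what at power-up, per the description in this reply.
POWER_UP_CHECKS = {
    "SROM": ["XSROM"],
    "XSROM": ["SRM console", "PAL"],
}

# Images in system flash ROM that are mentioned in this thread.
FLASH_IMAGES = ["XSROM", "SRM console", "PAL", "AlphaBIOS"]


def unchecked_at_power_up(images, checks):
    """Return the images that no power-up stage checksum-tests."""
    covered = {target for targets in checks.values() for target in targets}
    return [image for image in images if image not in covered]


print(unchecked_at_power_up(FLASH_IMAGES, POWER_UP_CHECKS))
```

    The only image left over is AlphaBIOS, which is why a corrupted copy
    would go unnoticed unless LFU or SHOW FLASH is run.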
    
    I say that it's highly unlikely because you've swapped the PCI
    motherboard already and this didn't fix the problem. Is said machine
    a UNIX or VMS machine or is it an NT machine? If UNIX/VMS, is it able
    to boot/run the operating system?
    
    Another thing to check is whether memory is bad. We've seen this
    before. To check, do one or both of the following:
    
      P00>>> b -h pmem:0
      .
      .
      P00>>> info 1
    
    Does the INFO command say there are any bad pages? If so, how many?
    
    Invoke SRM console's system TEST command.
    
      P00>>> test
      .
      .
    
    Let us know what you find out.
    BC
539.4  Still no good!  BRUNEL::KIRBY  Fri Mar 21 1997 12:31  20 lines
Bill,

	Already checked the flash, OK on both original and replacement hardware.


	Memory is also good, both methods. (Had run "test" for a good while 
initially, and have even tried another pair of memory modules).



	This is an NT system, but has been brought into my office to fix and 
the Customer would not let me have the disks (data security). Thus I have not 
tried any O.S., but may be able to try Unix distribution CDs next week.


		Any more ideas?



				Steve.
539.5  HARMNY::CUMMINS  Fri Mar 21 1997 13:05  10 lines
    Was the system ever working? It sounds as if at one point it was
    working (since you say the customer is an NT site and there are NT
    disks that were on the machine).
    
    Assuming that is correct, do you know when AlphaBIOS started
    hanging? Was there a particular event that triggered it? E.g. a
    firmware upgrade; a reboot after some AlphaBIOS setup/init, etc.?
    
    Running out of ideas..
    BC
539.6  HARMNY::CUMMINS  Fri Mar 21 1997 13:06  11 lines
    Have you tried both serial and VGA modes? Which mode are you using now?
    If only one or the other, I would try connecting to the other.
    
      P00>>> set os_type nt
      P00>>> set console serial (or graphics)
      P00>>> init
    
    Halt button pressed in gets you back to SRM if hung at AlphaBIOS, but
    then you already knew this I presume..
    
    BC
539.7  Worked for 4 months!  BRUNEL::KIRBY  Fri Mar 21 1997 13:36  21 lines
Bill,

	The system worked for 4 months fine. As far as I can ascertain it was 
just switched off for some reason, and would not work when switched on again.


	I can run AlphaBIOS from the serial port no problems, but obviously 
this is no use to the Customer.


	I have just booted a Unix distribution CD, and that comes up fine into 
XWindows, but as I have no disks I can go no further!


	(In desperation) I have ordered a complete set of parts for Monday......
System motherboard, CPU, saddle, horse ..... to swap together in case I have 
another "blow-up" situation. At least that should eliminate the hardware 
rapidly.


			Steve.
539.8  MAY30::CUMMINS  Fri Mar 21 1997 16:11  1 line
    Are you sure the VGA card is in PCI0? It must hang off of IOD0.
539.9  The "expletive" solution!!!  BRUNEL::KIRBY  Thu Apr 03 1997 13:27  29 lines
Well ....... hmmmmm .......... it's fixed now.

The actual hardware fault was the VRC15 monitor. Honest! Everything else back
to the original.

Not a lot I can say really.

I guess the "picture" puts the display into a different mode, and the faulty 
monitor would not switch to the mode required. I subsequently tried it on my
laptop and it would not work on that either.




Having got the system back to site and put the Customer's disks back in and
reset the RAID controller's parameters correctly so I could see his 3 
partitions and reset the boot definitions, NT wouldn't boot! Re-installed NT 
and still it wouldn't boot. "Hit any key to continue" was the continual 
message!

Notes 58 and 464 describe that problem (which required me to downgrade the 
AlphaBIOS temporarily to resolve), but I won't dwell on that here.
I guess this last problem occurred because of clearing the NVRAM!

		Thanks for your time, ideas and suggestions .... 


				Steve.