
Conference wrksys::alphastation

Title: Alpha Workstation Conference
Notice: See note 1.* for conference notices
Moderator: WRKSYS::HOUSE
Created: Wed Sep 07 1994
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1996
Total number of notes: 9122

1919.0. "2100A RM, crash/halt, no dump, no errlog" by OHFSS1::FULLER (Never confuse a memo with reality) Thu Apr 10 1997 11:36

    My customer has a 2100A RM system 5/300 with cpus and 1GB of memory.
    On the PCI bus, there are:
    
    	1 PB2GA-JB	S3TRIO 64 VGA video
    	1 DE435		Ethernet
    	1 KZPAA		SCSI, with TZ87 at SCSI target 2
    	1 KZPDA		SCSI (FWSE), with 2 CDROM drives (RRD45)
    	1 KZPDA		SCSI (FWSE), with several disks (RZ28, RZ29)
    
    The system is located about 20 miles from the system manager's desk, so
    he likes to do as much remote system management as possible, which
    includes an occasional reboot (shutdown -r).
    
    Every now and then, when he attempts to reboot the system, it fails to
    come back, so he drives the 20 miles to the system to find out what
    happened, and to reboot the system.  When he gets to the system, he
    finds that it's sitting at the >>> prompt.  So, he types BOOT and away
    it goes...sometimes.
    
    When it fails to reboot, we've noted the following:
    
    Part way through the boot process, the system appears to crash, or at
    least try to, then it halts.  There is NO crash information on the
    screen; it just halts to the >>> prompt.
    
    Now, bear with me while I point out what we see on the screen during a
    boot:
    
    	. Type >>> BOOT
    	. Digital Unix (V3.2F) loads, showing text/data/bss sizes
    	. The screen font changes (take note; this is important)
    	. Unix displays hardware inventory
    	. Unix starts the init process, which boots up everything else
    
    What we're seeing is that at some time between the hardware inventory
    display and the rest of the booting, the screen font changes back to
    the font used by the console, and the screen *contents* change back to
    that which was there when the screen font changed from the console font
    to the Unix font.  Then, it just halts to the >>> prompt.
    
    Since the screen contents revert to what was shown before the hardware
    inventory, there is no information on the screen to provide a hint as
    to the source of the crash.  Since the error logger process had not yet
    started, there is no error log information.  And, there is no crash
    dump.
    
    I spent a day looking at the hardware configuration, and after a long
    series of "try this" and "try that", I found that if I move the tape
    drive from the KZPAA SCSI controller to one of the KZPDA SCSI
    controllers, this crashing/halting problem no longer occurs.  However,
    putting the tape on the same SCSI channel as the RZ28/RZ29 disks
    creates another problem having to do with backups, which I won't get
    into at this time.
    
    Any takers for this problem?  Thanks!
    
    	Stu
T.R  Title  User  Personal Name  Date  Lines

1919.1. "Only fails with console=graphics" by OHFSS1::FULLER (Never confuse a memo with reality) Thu Apr 10 1997 11:38 (5 lines)

    Oh, one more thing.  If we run the system with a serial port as the
    console, we don't have the problem.  Unfortunately, this is not an
    option for us.
    
    	Stu

1919.2. "try MVBLAB::SABLE" by WRKSYS::HOUSE (Kenny House, Workstations Engineering) Thu Apr 10 1997 12:07 (6 lines)

    This .. is .. NOT .. an .. AlphaServer .. conference.
    
    Try MVBLAB::SABLE for AlphaServer 2100A questions (press KP7 to add
    this entry to your notebook).
    
    -- Kenny House

1919.3. "" by UTOPIE::OETTL (hide bug until worst time) Thu Apr 10 1997 14:38 (8 lines)

You have an S3TRIO. Is it on the secondary PCI bus? If so, move it to the
primary PCI bus. I had some problems with S3TRIOs causing crashes when sitting
on the secondary PCI bus of Lynxes.

Hope this helps,
Ötzi

1919.4. "" by OHFSS1::FULLER (Never confuse a memo with reality) Thu Apr 10 1997 15:23 (5 lines)

>    This .. is .. NOT .. an .. AlphaServer .. conference.
    
    Ooops!  1000 apologies...
    
    	Stu