
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9726.0. "AS255's that crash overnight" by CHEFS::16.42.2.67::TPJSmith (At The Sharp End) Tue May 06 1997 11:26

Hi

After a brief scan, no one seems to be noting similar problems.

I have a customer with 40 or so Alphastation 255s running Unix V4.0A and 
V4.0B. Many, but not all, crash during the night with CPU panics.
CSC diagnosis of the logged calls usually suspects memory corruption or a 
failure on the PCI bus. The hardware configuration uses 2 PBXGB-AA graphics 
cards, which may be slightly unusual, although supported as I understand it. 
These systems NEVER fail during the daytime, only at night. NB! Due to 
unrelated problems the Open3D product set has been installed.

The only thing that seems to be running at the time is the screensaver. 
Various screensavers are in use and no particular one seems to crash more 
than the others. However, if the blank screen is chosen rather than an active 
screensaver, the crashes are virtually eliminated (we had one in 10 days 
instead of 2-3 a night).

Are there any known bugs whereby the screensaver could crash an Alphastation 
255 (233 MHz) with 2 PBXGB-AAs?
  
Any help greatly appreciated

Tim S 
T.R  Title  User  Personal Name  Date  Lines
9726.1  please try CANASTA !  HAN::HALLE  Volker Halle MCS @HAO DTN 863-5216  Tue May 06 1997 12:59  19
    Tim,
    
    please obtain the various crash-data files and send them to CANASTA.
    Once you have sent a 'couple of them', you should be able to recognize
    the 'crash footprint' - assuming there is a common reason and footprint
    from all those crashes.
    
    CANASTA might have seen other crashes with the SAME footprint and might
    even point you to a patch or escalated call.
    
    For details on using CANASTA, please read note 8919.
    
    You could even set up AutoCLUE on all those nodes and automatically send
    the crash-data files to a 'central system' to collect the footprints or
    - if the customer has internet access - you can send the crash-data
    files directly to the CANASTA Mail Server (please see note
    CSC32::AUTOCLUE 72 and send mail to me, BEFORE you start this !).
    
    Volker.
9726.2  I'll try CANASTA  CHEFS::rasmodem2.reo.dec.com::TPJSmith  At The Sharp End  Wed May 07 1997 09:55  12
Thanks for the quick response

I actually did a search on the stars database and noted something very similar 
that seemed to be an IMPT case. Unfortunately it was never solved, because the 
customer swapped his hardware rather than tolerate the crashes. Engineering 
therefore closed the case.

I will study the CANASTA route as suggested.

Thanks

Tim S
9726.3  a thought  CSC32::I_WALDO  Wed May 07 1997 11:52  10
    
    Try the following, just to see.
    
    /sbin/sysconfig -r pwrmgr default_pwrmgr_state=0
    (not permanent)
    
    This helped one customer who had installed the power management
    software on a UNIX 4.0 Alphastation 255 box.
    
    
9726.4  Could be a winner!!  CHEFS::oloras9.olo.dec.com::TPJSmith  At The Sharp End  Fri May 09 1997 13:04  14
For info

I have looked into the powersaver routine as suggested. It had been installed 
and was set to 1. We changed the running setting to 0, and also set the boot 
default to 0. This was because the area in use is a development lab where 
systems may be rebooted quite often, and it would be difficult to police 
whether users had remembered to reinstate state=0.
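
For anyone wanting to do the same, here is a rough sketch of one way the 
boot default might be made permanent. It assumes the boot-time value is held 
in a pwrmgr stanza in /etc/sysconfigtab and merged in with sysconfigdb, which 
is not stated anywhere in this topic. The stanza file path and the helper 
function name are hypothetical. Check the sysconfigdb and sysconfig man pages 
on your own version before relying on it, and run it as root.

```shell
#!/bin/sh
# Sketch only: keep default_pwrmgr_state=0 across reboots.
# Assumption: the boot default is taken from a "pwrmgr:" stanza in
# /etc/sysconfigtab. STANZA_FILE and make_pwrmgr_stanza are
# illustrative names, not part of any product.

make_pwrmgr_stanza() {
    # $1 = value wanted for default_pwrmgr_state (0 = power management off)
    printf 'pwrmgr:\n    default_pwrmgr_state = %s\n' "$1"
}

STANZA_FILE=${STANZA_FILE:-/tmp/pwrmgr.stanza}
make_pwrmgr_stanza 0 > "$STANZA_FILE"

# Only touch the system if the configuration tools are actually present.
if command -v sysconfigdb >/dev/null 2>&1; then
    # Merge the stanza into the boot-time database (assumed to be
    # /etc/sysconfigtab), so the value is picked up at the next boot.
    sysconfigdb -m -f "$STANZA_FILE" pwrmgr
    # Set the running kernel as well, as suggested in .3.
    /sbin/sysconfig -r pwrmgr default_pwrmgr_state=0
    # Query the running value to confirm the change.
    /sbin/sysconfig -q pwrmgr default_pwrmgr_state
fi
```

On a machine without sysconfigdb the script only writes the stanza file, so 
the stanza can be inspected before anything is applied.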

We intend to monitor this over the coming weeks. I suppose the acid test would 
be to start using the normal screensavers again.

Thanks

Tim S