Title: SABLE SYSTEM PUBLIC DISCUSSION
Moderator: COSMIC::PETERSON
Created: Mon Jan 11 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2614
Total number of notes: 10244
2576.0. "2100A RM, crash/halt, no dump, no errlog" by OHFSS1::FULLER (Never confuse a memo with reality) Thu Apr 10 1997 15:40
[Crossposted, when I was reminded that ALPHASTATION is not SABLE...]
<<< WRKSYS::SYS_TOOLS:[NOTES$LIBRARY]ALPHASTATION.NOTE;1 >>>
-< Alpha Workstation Conference >-
================================================================================
Note 1919.0 2100A RM, crash/halt, no dump, no errlog 4 replies
OHFSS1::FULLER "Never confuse a memo with reality" 57 lines 10-APR-1997 10:36
--------------------------------------------------------------------------------
My customer has a 2100A RM system 5/300 with cpus and 1GB of memory.
On the PCI bus, there are:
1 PB2GA-JB S3TRIO 64 VGA video
1 DE435 Ethernet
1 KZPAA SCSI, with TZ87 at SCSI target 2
1 KZPDA SCSI (FWSE), with 2 CDROM drives (RRD45)
1 KZPDA SCSI (FWSE), with several disks (RZ28, RZ29)
The system is located about 20 miles from the system manager's desk, so
he likes to do as much remote system management as possible, which
includes an occasional reboot (shutdown -r).
Every now and then, when he attempts to reboot the system, it fails to
come back, so he drives the 20 miles to the system to find out what
happened, and to reboot the system. When he gets to the system, he
finds that it's sitting at the >>> prompt. So, he types BOOT and away
it goes...sometimes.
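The check below automates the "did it come back?" question, so the 20-mile drive only happens when it is actually needed. It is a minimal sketch that assumes the host accepts TCP connections (for example telnet on port 23) once it is fully up. The function names, host name, and timings are all hypothetical. The check first waits for the host to stop answering, which means the reboot has started. It then waits for the host to answer again. If the host does not answer again before the deadline, it is probably sitting at the >>> prompt.

```python
import socket
import time
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (assumed 'up' signal)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_reboot(
    probe: Callable[[], bool],
    deadline_s: float,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once `probe` has gone False (host down) and then True again.

    Returns False if that down-then-up sequence is not seen within
    `deadline_s` seconds, e.g. because the system halted to >>>.
    `sleep` and `clock` are injectable so the logic can be tested offline.
    """
    start = clock()
    went_down = False
    while clock() - start < deadline_s:
        up = probe()
        if not up:
            went_down = True          # the reboot has started
        elif went_down:
            return True               # was down, now answering again
        sleep(interval_s)
    return False
```

Usage would be something like `wait_for_reboot(lambda: port_is_open("sable1", 23), deadline_s=1800)` right after issuing `shutdown -r`. Here `sable1` is a made-up host name. Requiring the down-then-up sequence avoids a false "success" from the few seconds before the shutdown actually takes the host off the network.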
When it fails to reboot, we've noted the following:
Partway through the boot process, the system appears to crash, or at
least try to, and then it halts. There is NO crash information on the
screen; it just halts to the >>> prompt.
Now, bear with me while I point out what we see on the screen during a
boot:
. Type >>> BOOT
. Digital Unix (V3.2F) loads, showing text/data/bss sizes
. The screen font changes (take note; this is important)
. Unix displays hardware inventory
. Unix starts the init process, which boots up everything else
What we're seeing is that at some point between the hardware inventory
display and the rest of the boot, the screen font changes back to the
font used by the console. The screen *contents* also change back to
what was displayed when the font switched from the console font to the
Unix font. Then it just halts to the >>> prompt.
Since the screen contents revert to what was shown before the hardware
inventory, there is nothing on the screen to hint at the source of the
crash. Since the error logger process had not yet started, there is no
error log information. And there is no crash dump.
I spent a day looking at the hardware configuration, and after a long
series of "try this" and "try that", I found that if I move the tape
drive from the KZPAA SCSI controller to one of the KZPDA SCSI
controllers, this crashing/halting problem no longer occurs. However,
with the tape on the same SCSI channel as the RZ28/RZ29 disks, this
creates another problem having to do with backups, which I won't get
into at this time.
Any takers for this problem? Thanks!
Stu
================================================================================
Note 1919.1 2100A RM, crash/halt, no dump, no errlog 1 of 4
OHFSS1::FULLER "Never confuse a memo with reality" 5 lines 10-APR-1997 10:38
-< Only fails with console=graphics >-
--------------------------------------------------------------------------------
Oh, one more thing. If we run the system with a serial port as the
console, we don't have the problem. Unfortunately, this is not an
option for us.
Stu
================================================================================
Note 1919.2* 2100A RM, crash/halt, no dump, no errlog 2 of 4
WRKSYS::HOUSE "Kenny House, Workstations Engineering" 6 lines 10-APR-1997 11:07
-< try MVBLAB::SABLE >-
--------------------------------------------------------------------------------
This .. is .. NOT .. an .. AlphaServer .. conference.
Try MVBLAB::SABLE for AlphaServer 2100A questions (press KP7 to add
this entry to your notebook).
-- Kenny House
================================================================================
Note 1919.3 2100A RM, crash/halt, no dump, no errlog 3 of 4
UTOPIE::OETTL "hide bug until worst time" 8 lines 10-APR-1997 13:38
--------------------------------------------------------------------------------
You have an S3TRIO; is it in the secondary PCI? If so, move it to the
primary PCI bus. I have had problems with S3TRIOs causing crashes when sitting
in the secondary PCI of Lynxes.
Hope this helps,
Ötzi
=================================================================
Oh, and the S3TRIO is on the primary PCI bus.
Thanks!
Stu
2576.1. by AFW3::MAZUR Fri Apr 11 1997 09:03
Aside from the underlying problem your customer is experiencing, do you
think the customer would benefit from an RCM (Remote Console Management)
card to save him some of those 20-mile drives?
2576.2. "Bad Video Card (not just placement)" by SOLVIT::BAZARNICK (Contemplating Buoyancy) Fri Apr 11 1997 18:02
The response from support engineering:
"I have seen a malfunctioning video card cause this sort of problem.
Give that a try first."
2576.3. "Already tried a new video card" by OHFSS1::FULLER (Never confuse a memo with reality) Fri Apr 11 1997 18:13
> "I have seen a malfunctioning video card cause this sort of problem.
> Give that a try first."
Well, great minds think alike and all: "been there, done that".
Thanks for the suggestion, though.
Stu
2576.4. "Resolution found!" by OHFSS1::FULLER (Never confuse a memo with reality) Thu May 15 1997 16:51
For the benefit of those interested...
The original PCI configuration of the machine was:
After the PCI bridge on the I/O mother board:
   DE450
   KZPAA -> TZ87, (2) RRD45
   KZPDA -> (6) RZ28M-VW
   KZPDA -> (3) RZ28M-VW
[PCI bridge on the I/O mother board]
Before the PCI bridge:
   [empty]
   [empty]
   [empty]
   #9 SVGA card
In this configuration, at some point between Digital Unix (V3.2F)
displaying the hardware inventory and the single-user mode prompt, the
screen would revert to the console font (used when the boot first
started) and to the original screen contents (from just before the
loaded kernel was started). Occasionally, depending on how closely we
looked, we would see "PCI LOCAL BUS FAULT" and/or message(s)
indicating that a dump was attempted and failed (giving up).
As a workaround, we found that by moving the TZ87 from the KZPAA to one
of the KZPDA channels, this problem stopped. However, with the tape
drive on the same channel with the disks, we would get intermittent
(once a week, on average) device timeouts. When the SCSI bus was reset
as a result of the timeout, the tape would rewind, DecNSR would
complain, etc., etc...
Putting the tape drive back on the KZPAA channel brought back the LOCAL
BUS FAULTs, however.
We ended up moving all the SCSI controllers to the slots before the PCI
bridge on the I/O mother board. With the TZ87 on the KZPAA, we have been
running for 1.5 weeks without a problem. I was going to experiment
with moving just the KZPAA before the bridge, leaving the KZPDAs after
the bridge, but the customer was more interested in going home. Maybe
I'll be able to perform the KZPDA-after-the-bridge experiment some
other time. If so, I'll post the results here if anyone's interested.
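The final rearrangement is essentially a slot shuffle, and it can be modeled that way. The sketch below is a hypothetical model of the layout as drawn in this note; the names `ORIGINAL`, `scsi_after_bridge`, and `move_scsi_before_bridge` are made up. It knows nothing about real slot numbering or PCI electrical rules. It only shows which controllers started out after the bridge, and it checks that the three SCSI controllers fit into the three empty slots before the bridge.

```python
from typing import Dict, List

# Hypothetical model of the I/O slots as drawn in this note.
# "" marks an empty slot; card names are copied from the note.
Layout = Dict[str, List[str]]

ORIGINAL: Layout = {
    "after_bridge": ["DE450", "KZPAA", "KZPDA", "KZPDA"],
    "before_bridge": ["", "", "", "#9 SVGA card"],
}

SCSI_CARDS = {"KZPAA", "KZPDA"}


def scsi_after_bridge(layout: Layout) -> List[str]:
    """Return the SCSI controllers located after the I/O mother board bridge."""
    return [card for card in layout["after_bridge"] if card in SCSI_CARDS]


def move_scsi_before_bridge(layout: Layout) -> Layout:
    """Return a new layout with every SCSI controller moved before the bridge.

    Each controller takes the first empty before-bridge slot; ValueError is
    raised if there are not enough empty slots. The input is not modified.
    """
    after = list(layout["after_bridge"])
    before = list(layout["before_bridge"])
    for i, card in enumerate(after):
        if card in SCSI_CARDS:
            try:
                j = before.index("")
            except ValueError:
                raise ValueError(f"no empty slot before the bridge for {card}") from None
            before[j] = card
            after[i] = ""
    return {"after_bridge": after, "before_bridge": before}
```

With this model, `scsi_after_bridge(ORIGINAL)` lists the KZPAA and both KZPDAs. After `move_scsi_before_bridge(ORIGINAL)`, no SCSI controller remains after the bridge, and only the DE450 is left there. The untried KZPDA-after-the-bridge experiment would be the variant that moves only the KZPAA.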
Stu