Conference mvblab::alphaserver_4100

Title: AlphaServer 4100
Moderator: MOVMON::DAVISS
Created: Tue Apr 16 1996
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 648
Total number of notes: 3158

591.0. "Fatal system hw error" by NNTPD::"[email protected]" (Pedro Torres) Wed May 07 1997 13:28

I'm servicing an AlphaServer 4100 with the following configuration:
-4 x CPU, 300MHz EV5 w/MC Bus Interface          B3001-CA
-256 MB of RAM ( 2 X Memory, 128MB Sync SIMM Kit (2xB3020-CA)    MS320-CA)
-S3 trio 2MB in PCI0
-de500-xa in PCI1
-KZPSC in PCI1
-BA356 in dual bus mode with 5 RZ29B-VW
-SRM 4.8-6
-Alphabios 5.28

The machine has NT 4 + SP2 + hotfix, Internet Information Server,
Microsoft Index Server.

When creating an index file of 5 GB of documents, the customer got:
Fatal System Hardware Error
Machine Check in PAL mode

Right after that, while rebooting, the following error appeared in AlphaBIOS:
memory error at address 66e580

After power cycling, it ran OK for 15 min. and then froze (system hang; it
didn't respond to mouse, keyboard or ping).

I have some hints, but I'd like to hear your wise opinions.

Thank you very much

Pedro Torres
[Posted by WWW Notes gateway]
T.R  Title  User  Personal Name  Date  Lines
591.1. by POBOXB::BAK Wed May 07 1997 19:25 (3 lines)
Try EDO RAM (if possible)...

If not, check the rev of the motherboard.
591.2. "Dodge rev B06" by NNTPD::"[email protected]" (Pedro Torres) Thu May 08 1997 09:06 (60 lines)
-Motherboard (dodge)     rev. B06
-mem (4 x B3020-CT)      rev. D04
-CPU (4 x B3001-CA)      rev. D02
-Power Contrl mod        rev. F03
-Bridge mod (horse)      rev. D07
-PCI/EISA Bckpl (saddle) rev. H03 


I'm sorry for giving so much stuff, but this way I have a better
chance of taking more advantage of your knowledge.

I have more information on yesterday's error:

Fatal System hardware error: Mch Chck in PAL mode
ICSR:4144000300   ICPERR_STAT:0
MM_STAT:146d0     DCPERR_STAT:0
...
ECC ERROR: Dcache Fill
PA:813640
Multiple external/tag errors detected


Right after this, while rebooting, AlphaBios reported:

Memory error at address: 66e580
Expected:0    Actual:8


Today I went on site to check HW revisions and the customer
informed me of an NT 4 crash that had just happened.
Here is some data:

ECC ERROR: Dcache Fill
...
Bug CheckPC:80651ef4   PSR:2

           NTOSKRNL.EXE
           HAL.DLL
IRQL:7, DPC ACTIVE:FALSE


This was logged by NT's Event Viewer as:

The computer has rebooted from a bugcheck.  
The bugcheck was: 0x0000002e (0x00000000, 0x00000000, 0x00000000, 0x810edba8).
Microsoft Windows NT [v15.1381]. A dump was saved in: C:\WINNT\MEMORY.DMP.
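
For anyone collecting several of these Event Viewer entries, the stop code and its four parameters can be pulled out of the message text with a few lines of Python. This is only a sketch: the function name `parse_bugcheck` and the regular expression are illustrative assumptions, not part of any Digital or Microsoft tool. In Microsoft's bug check listing, 0x2E is named DATA_BUS_ERROR, which would be consistent with the ECC errors above, though the log text alone does not identify the failing component.

```python
import re

# Hypothetical pattern for the Event Viewer wording quoted in this note.
BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")

def parse_bugcheck(message):
    """Return (stop_code, [parameters]) as integers, or None if no match."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params

# Message text copied from the Event Viewer entry in this note.
event_text = (
    "The computer has rebooted from a bugcheck.  "
    "The bugcheck was: 0x0000002e "
    "(0x00000000, 0x00000000, 0x00000000, 0x810edba8)."
)

code, params = parse_bugcheck(event_text)
print(f"stop code {code:#x}, parameters {[hex(p) for p in params]}")
```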


Right after this one, while rebooting, Alphabios reported:

Memory error at address: 10d3c60
Expected: fff     Actual:800000007
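
One way to look at the two AlphaBIOS memory reports is to XOR the expected and actual values and list the bit positions that differ. The sketch below does that arithmetic for the two value pairs quoted in this note. The helper name `flipped_bits` is hypothetical, and treating the values as plain integers compared bit by bit is an assumption about how AlphaBIOS reports the mismatch, not something the screen output confirms.

```python
def flipped_bits(expected, actual):
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]

# (address, expected, actual) copied from the two AlphaBIOS reports in this note.
reports = [
    (0x66E580, 0x0, 0x8),
    (0x10D3C60, 0xFFF, 0x800000007),
]

for address, expected, actual in reports:
    bits = flipped_bits(expected, actual)
    print(f"address {address:#x}: differing bit positions {bits}")
```

The first report differs in a single bit, while the second differs in several, so the two failures do not obviously share one stuck data line.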


I don't mean to overload you!!

Thank you very much for your help

Pedro Torres
[Posted by WWW Notes gateway]
591.3. "Mayday Mayday Mayday" by NNTPD::"[email protected]" (Pedro Torres) Fri May 09 1997 10:54 (16 lines)
I'm sorry for pressing, but this problem is getting hot and
I'm not sure I can blame it on NT 4!!

I should try to get a dodge (system board) with a revision of B07
or B08, correct?

Anything else I should concentrate on?


I'm sorry for insisting


Thank you very much

Pedro Torres 
[Posted by WWW Notes gateway]
591.4. by MAY21::CUMMINS Fri May 09 1997 11:25 (14 lines)
    There have been several noters in the past few days/weeks that have
    reported problems with three/four CPUs, SYNC memory, and B06
    motherboards. A B07 replacement motherboard has solved their problems.
    Without more crash info, this is the best advice we can give you. The
    error output you provided in .2 doesn't help. Other error registers
    are needed, such as the IOD's CAP_ERR, MC_ERR1 and MC_ERR0, PCI_ERR,
    EV5's EI_STAT, etc.
    
    Have you attempted to install/run DECevent on this system? DECevent
    supports Windows NT V4.0 on 4100/4000.
    
      http://www.service.digital.com/decevent/
    
    BC
591.5. "Unable to get DecEvent" by NNTPD::"[email protected]" (Pedro Torres) Fri May 09 1997 11:40 (15 lines)
Thank you very much for your reply.

In the last few days I have been trying to download
DECevent for NT from the URL you referred to, but I have been getting
an FTP error (it seems there's nothing there).

Do you have any other pointer?

Is there any other way I can keep the crash info?

Thank you very much 

Pedro Torres

[Posted by WWW Notes gateway]
591.6. by MAY21::CUMMINS Fri May 09 1997 12:22 (3 lines)
    I've asked the 4100/4000 DECevent developer if he can help out. I get
    the same error when trying to access the DECeventNT on Alpha systems
    page.
591.7. by HARMNY::CUMMINS Fri May 09 1997 12:29 (38 lines)
From:	POBOXA::SHEPARD      "GARY DTN 223-2499"  9-MAY-1997 11:16:32.94
To:	HARMNY::CUMMINS
CC:	SPECXN::SCADDEN,TINCUP::KOLBE,SHEPARD
Subj:	RE: DECevent for Windows NT V4.0 on Rawhide

Bill,

Contact Liesl Kolbe at 592-5681 she should be able to help.

Gary

===========================================
From:	HARMNY::CUMMINS      "Bill Cummins, PKO3-2/Q21, 223-4641"  9-MAY-1997 11:09:12.52
To:	SPECXN::SCADDEN,POBOXA::SHEPARD
CC:	CUMMINS
Subj:	DECevent for Windows NT V4.0 on Rawhide

Hello Glenn,

A Rawhide customer (service person) is trying to download DECevent for
Windows NT V4.0 and is having trouble doing so. I get the same errors.
The notes file entry is MVBLAB::ALPHASERVER_4100 note 591.5.

The page that gives me the error is:

ftp://ftp.service.digital.com/private/decevent/windowsnt/deceventaxpv10.exe

The hyperlink to said page is:

  DECeventNT for Alpha systems
  ----------------------------

at the http://www.service.digital.com/decevent/decevent-ntkit.htm page.

Can you help?

Thanks,
BC
591.8. "CPU rev?" by NNTPD::"[email protected]" (Pedro Torres) Tue May 13 1997 07:36 (14 lines)
I have DECeventNT now. Thank you.

Does it capture the register contents?
If not, is there another way to do it (besides writing
down the MCH CHK screen by hand)?

Note 232.1 mentioned rev D04 for the 5/300 CPU.
Is that for the Bcacheless one (B3001-CA)?


Thank you very much

Pedro Torres
[Posted by WWW Notes gateway]
591.9. by MAY21::CUMMINS Wed May 14 1997 09:40 (12 lines)
    I'm not sure I understand your first question. The purpose of DECevent is
    to interpret an on-disk system error log file. Hopefully, the crash that
    the system in question took produced a dump file entry. This entry, if
    present, should contain the error/status register info necessary to make
    a manual diagnosis. DECeventNT currently only supports bit-to-text
    translation. Fault analysis capability will be added in the future. [In
    the very near future for UNIX/VMS DECevent.]
    
    I just posted note 93.47 which describes in more detail the problem with
    SYNC memories, B06 or earlier motherboards, and systems with 3 or 4 CPUs.
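
The combination described here (SYNC memory, a B06 or earlier motherboard, and 3 or 4 CPUs) can be written down as a quick check for screening other systems. This is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes revisions order by letter first and then by number (so B06 comes before B07), which this thread implies but does not spell out. See note 93.47 for the authoritative description.

```python
import re

def parse_rev(rev):
    """Split a revision such as 'B06' into ('B', 6) so revisions can be compared."""
    match = re.fullmatch(r"([A-Z])(\d+)", rev.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised revision: {rev!r}")
    return match.group(1), int(match.group(2))

def matches_reported_pattern(board_rev, cpu_count, memory_type):
    """True when the configuration matches the combination described in this note."""
    return (
        parse_rev(board_rev) <= parse_rev("B06")   # B06 or earlier (assumed ordering)
        and cpu_count in (3, 4)                    # systems with 3 or 4 CPUs
        and memory_type.strip().upper() == "SYNC"  # synchronous memory
    )

# The system in this topic: B06 board, 4 CPUs, first with SYNC and later with EDO.
print(matches_reported_pattern("B06", 4, "SYNC"))
print(matches_reported_pattern("B06", 4, "EDO"))
print(matches_reported_pattern("B07", 4, "SYNC"))
```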
    
    Perhaps someone else can answer Pedro's question re: minimum CPU revs.
591.10. by PROXY::ALFORD Thu May 15 1997 10:00 (2 lines)
    The min. supported rev for B3002-CA is D01. According to your note
    591.2 you had rev D02, so you should be OK.
591.11. "OK for 48 hrs" by NNTPD::"[email protected]" (Pedro Torres) Fri May 16 1997 08:47 (14 lines)
The system has been running OK for the last 48 hrs.
I'm using EDO RAM now, with a B06 dodge (system board).
The client is undecided between EDO and SYNC.
If he goes for SYNC, I'll swap dodges.
When this tango is over I'll tell you this machine's
story from the beginning.
Maybe it will be of some use to you guys.


Thank you very much


Pedro Torres
[Posted by WWW Notes gateway]