Title: AlphaServer 4100
Moderator: MOVMON::DAVIS S
Created: Tue Apr 16 1996
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 648
Total number of notes: 3158
604.0. "0xFACEFEED error/crash AS4000 5/400" by ACISS2::SDATZMAN () Wed May 14 1997 16:20
I have an AlphaServer 4000 5/400 NT4.0 Cluster running
in a critical, production mode at Cummins Engine. The
environment was upgraded from NT3.51 and Clusters V1.0
to NT4.0/SP2 and Clusters V1.1 about a month ago, and has
been running OK for a couple of weeks.
Over the last 2 weeks, however, we have begun to get
intermittent machine crashes on one of the servers. Currently,
we cannot boot the failed machine, nor can we successfully
reinstall NT. When the NT install sequence transitions
from the character-cell portion to the GUI portion, we get
the "0xFACEFEED" error referenced in Field Blitz 93.27 of
this conference. Running TEST from the SRM console, we also
see a "CPU0 Soft Error Vector 00620".
Config Data and actions taken:
------------------------------
AS4000 5/400 (1 CPU, 512 MB EDO RAM)
KZPDA (2 RZ29Bs, TZ88N)
KZPSA (to RA410)
DEFPA (2)
DE500-XA (2)
Standard S3 video
I/O options installed by CSS; bus positions have not been changed
All 47 Field Blitzes have been read and applied where applicable
We do not have DECevent (for NT); where can I find it?
NT 4.0 plus SP2
NT HAL 4.0b
Hotfix (Microsoft Knowledgebase #156410) NOT applied (1 CPU only)
Some module numbers/revs follow:
B3004-AA Rev B04 CPU
30-44712-01 Rev B03 Power
B3030-ET Rev A06 RAM (2 modules)
54-23803-02 Rev A02 "Dodge" Mother Board ?
54-24117-01 Rev F03 Power
B3050-AA Rev H03 "Saddle" ?
B3040-AA Rev D07 "Horse" ?
We have called in Field Service and are currently trying to solve this
problem and get NT reinstalled. They replaced the CPU module
with a new one (because of the 620 soft error), but this did not solve
the problem.
Any ideas?
Thanks,
Steve
604.1. "Set MEMORY_TEST=PARTIAL" by ACISS2::SDATZMAN () Wed May 14 1997 18:07
After checking the SRM environment variable (EV) "MEMORY_TEST" a second
time, I found that it was set to "PARTIAL". Consistent with Field Blitz
93.38, I set it back to "FULL", and we seem to be in business.
Steve
604.2. by MAY21::CUMMINS () Wed May 14 1997 18:24
My guess is that there are one or more bad pages of memory in this
system. Partial test mode is, as you know, not supported: it passes
bitmaps/descriptors to the operating system that say all untested
memory is "good"/usable.
What may have happened is that enabling full test has caused these
pages to be marked bad, so that they are no longer used by the OS.
Under SRM, one can do:
P00>>> b pmem:0 -h
.
.
P00>>> info 1
to read bad page info.
One can also do:
P00>>> d toy:22 22 -b
to enable "MFG mode", which causes failure information that is not
normally printed to be reported to COM1. In this mode all failures are
reported, whereas power-up tests normally just mark bad pages in the
memory bitmaps/descriptors.
You would probably not want to leave TOY:22 set to 22 at a customer
site, but this may be a valuable on-site debug tool. To disable it,
write 0 to this TOY NVRAM location.
BC
604.3. "Thanks, Will Check" by ACISS2::SDATZMAN () Thu May 15 1997 12:48
Thanks. I'll check this out, as we still seem to be having some
random problems.
Steve