| From: NICES::ROBB 9-DEC-1994 13:15:24.37
To: KERNEL::ADAMS
CC:
Subj: memory parity template
Dear Customer,
Your system has experienced a single bit memory error at the following address
(extracted from your errorlog)
PHYSICAL ADDR
This is normal on your system and, provided the error does not return at
the same address, is nothing to worry about. If it does return, please log
another call and we will investigate further.
I have included a full explanation below.
Regards,
All DRAM memory systems (as used throughout the computer
industry) are subject to "soft" errors. A "soft" error
occurs when a stored bit within a DRAM cell flips its
polarity; it is corrected by rewriting the erroneous bit,
after which accesses to that DRAM cell return the correct
stored data.
Because this system uses parity protected memory, it can detect
but not correct the occasional DRAM "soft" error. In the VMS
operating system, depending on IPL and mode, the occurrence of a
detected memory parity error may result in image abort, process
deletion or a system crash. Normal system operation is restored
by rebooting the machine.
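
The "detect but not correct" behaviour can be illustrated with a minimal
sketch. It assumes one even-parity bit per 8-bit byte; the actual parity
layout of the memory system is not stated here, and the function names are
illustrative only.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit written with it."""
    return parity_bit(byte) == stored_parity


stored = 0b10110010
p = parity_bit(stored)          # parity written alongside the data
flipped = stored ^ (1 << 3)     # a "soft" error flips one bit

# The single-bit flip is detected...
single_flip_detected = not parity_ok(flipped, p)

# ...but flipping ANY one of the 8 bits back would satisfy parity again,
# so the check cannot tell which bit to repair.
candidates = [b for b in range(8) if parity_ok(flipped ^ (1 << b), p)]

# Two flipped bits cancel out and go unnoticed.
double_flip_detected = not parity_ok(stored ^ 0b11, p)
```

Every bit position is an equally valid repair candidate, which is why a
parity error can only be reported, not fixed.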
Memory system DRAM "soft" failure rates increase with the number
of DRAM chips used. For a given size DRAM chip, a 32MB memory
system will experience a higher "soft" failure rate than a 4MB
memory system. Engineering has calculated acceptable "soft"
error rates which could range (worst case) from one every 72
days for a 4MB system to one every 45 days for a 32MB system.
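
As a rough illustration, the worst-case intervals quoted above can be
converted to errors per year. This is plain arithmetic on the two figures
given; the 365-day year is the only assumption.

```python
DAYS_PER_YEAR = 365  # assumption: non-leap year

# Worst-case intervals between "soft" errors, as quoted above.
worst_case_interval_days = {"4MB": 72, "32MB": 45}

errors_per_year = {
    size: DAYS_PER_YEAR / days
    for size, days in worst_case_interval_days.items()
}

for size, rate in errors_per_year.items():
    print(f"{size}: about {rate:.1f} soft errors per year (worst case)")
```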
The strategy for determining "hard" memory errors requires
iterative analysis of errorlog entries and/or crashdumps
produced by parity error machine checks. Should analysis produce
a consistent failure footprint (consistent failing address after
system reboots), then that error should be deemed "hard".
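
The "consistent failure footprint" test might be sketched as follows. The
record layout (a boot number paired with a failing physical address) and the
sample addresses are made up for illustration; real errorlog entries would
have to be extracted into this form first.

```python
from collections import defaultdict


def classify_addresses(entries):
    """Label each failing address "hard" if it recurs across reboots, else "soft".

    entries: iterable of (boot_id, physical_address) pairs (hypothetical format).
    """
    boots_by_addr = defaultdict(set)
    for boot_id, addr in entries:
        boots_by_addr[addr].add(boot_id)
    return {
        addr: "hard" if len(boots) > 1 else "soft"
        for addr, boots in boots_by_addr.items()
    }


# Made-up sample: one address fails in three separate boots, one fails once.
sample = [
    (1, 0x01A2F3C0),
    (1, 0x00FF1000),
    (2, 0x01A2F3C0),
    (3, 0x01A2F3C0),
]
result = classify_addresses(sample)
for addr, verdict in result.items():
    print(f"{addr:08X}: {verdict}")
```

Repeats of an address within a single boot are deliberately not counted,
since the criterion above is recurrence after a reboot.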
|
| Ken, I think your template is a bit long. How about:-
Dear Customer,
A call has been automatically logged with the Customer Support Centre for
SYSTEM ......, NODE ........ Analysis shows that Memory Array #..
has been logging CORRECTABLE READ DATA errors (CRD's) for Bit #...
The system is designed to expect occasional single bit errors, and the memory
controller will correct the data before allowing it to be used.
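
How a controller can correct a single-bit error, rather than only detect it,
can be illustrated with a small Hamming(7,4) sketch. This is a stand-in
chosen for brevity: the actual ECC code used by the memory controller is not
stated here, and real controllers typically protect much wider words.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    c = [0] * 8                      # index 0 unused
    c[3], c[5], c[6], c[7] = data    # data bits sit at non-power-of-two positions
    c[1] = c[3] ^ c[5] ^ c[7]        # check bits cover overlapping groups
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def correct(word):
    """Return (corrected codeword, position of the flipped bit or 0 if none)."""
    c = [0] + list(word)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4  # the syndrome is the failing position
    if syndrome:
        c[syndrome] ^= 1             # flip the bad bit back
    return c[1:], syndrome


good = encode([1, 0, 1, 1])
bad = list(good)
bad[4] ^= 1                          # single-bit error at position 5
fixed, position = correct(bad)
```

Unlike a lone parity bit, the overlapping check bits identify which position
failed, so the data can be repaired before it is used.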
The recommended action is to reboot the system when this is convenient. Should
the same errors occur after the reboot (i.e. CRD's with the same Array #/Bit #),
further action may be required by DIGITAL on your behalf.
Regards,
|
|
Dear 'Customer',
Your 'sys-type', NODE 'nodename' reports that Memory Array #..
has been logging CORRECTABLE READ DATA errors (CRD's) for Bit #...
Memory systems are designed to expect single bit errors, and the memory
controller will correct the data before allowing it to be used. This is the
normal, expected behaviour.
The recommended action is to reboot the system when convenient.
Should the same errors occur after the reboot (i.e. CRD's with the same
Array #/Bit #), further action may be required by DIGITAL on your behalf.
As no action is required from DIGITAL for this call, it will now be closed.
Regards
|