
Conference kernel::csguk_systems

Title:CSGUK_SYSTEMS
Notice:No restrictions on keyword creation
Moderator:KERNEL::ADAMS
Created:Wed Mar 01 1989
Last Modified:Thu Nov 28 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:242
Total number of notes:1855

195.0. "CRD Memory error template" by KERNEL::ADAMS (Brian Adams CSC-Viables '833-3026) Thu Dec 08 1994 02:29

    
    Topic for discussion on a new CRD memory error template for group use
195.1. "From Ken Robb" by KERNEL::ADAMS (Brian Adams CSC-Viables '833-3026) Fri Dec 09 1994 23:02
From:	NICES::ROBB          9-DEC-1994 13:15:24.37
To:	KERNEL::ADAMS
CC:	
Subj:	memory parity template

Dear Customer,

Your system has experienced a single-bit memory error at the following address
(extracted from your errorlog):
 
      PHYSICAL ADDR  

This is normal on your system and, provided the error does not return at
the same address, is nothing to worry about. If it does return, please log
another call and we will investigate further.

I have included a full explanation below.


Regards,


        All DRAM-based memory systems (as used throughout the computer
        industry) are subject to "soft" errors. A "soft" error occurs
        when a stored bit within a DRAM cell flips from its previously
        stored value. The cell itself is undamaged: once the erroneous
        bit is rewritten, subsequent accesses to that DRAM cell return
        the correct stored data.
        
        Because this system uses parity-protected memory, it can detect
        but not correct the occasional DRAM "soft" error. Under the VMS
        operating system, depending on IPL and access mode, a detected
        memory parity error may result in an image abort, process
        deletion or a system crash. Rebooting the machine restores
        normal system operation.

        Memory system DRAM "soft" failure rates increase with the number
        of DRAM chips used. For a given DRAM chip size, a 32MB memory
        system will experience a higher "soft" failure rate than a 4MB
        memory system. Engineering has calculated acceptable worst-case
        "soft" error rates ranging from one error every 72 days for a
        4MB system to one every 45 days for a 32MB system.
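To put the worst-case figures quoted above on a common footing, the sketch below converts each "one error every N days" interval into an expected number of soft errors per year. It assumes a 365-day year and treats each quoted interval as a mean rate; the names are illustrative only and the only inputs are the two figures from this note.

```python
# Convert the worst-case "acceptable" soft-error intervals quoted above
# into an expected number of soft errors per year.
# Assumption: a 365-day year, with each interval treated as a mean rate.
DAYS_PER_YEAR = 365

# Worst-case intervals (days between soft errors) as quoted in the note.
WORST_CASE_INTERVAL_DAYS = {
    "4MB": 72,
    "32MB": 45,
}


def errors_per_year(interval_days: float) -> float:
    """Expected soft errors per year for a mean interval of `interval_days`."""
    return DAYS_PER_YEAR / interval_days


for size, days in WORST_CASE_INTERVAL_DAYS.items():
    print(f"{size}: one every {days} days -> {errors_per_year(days):.1f} per year")
```

On these figures a 4MB system would see roughly 5 soft errors a year in the worst case and a 32MB system roughly 8.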


        Identifying "hard" memory errors requires iterative analysis of
        errorlog entries and/or crash dumps produced by parity error
        machine checks. If the analysis shows a consistent failure
        footprint (the same failing address recurring after system
        reboots), the error should be deemed "hard".
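The "consistent failing address after system reboots" rule can be sketched as follows. This is not a parser for real VMS errorlog or crash dump output: the `ParityError` record (boot session number plus failing physical address) is a hypothetical layout, the addresses in the usage example are made up, and the threshold of "seen in two or more boot sessions" is an assumed reading of "consistent".

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class ParityError(NamedTuple):
    """One parity-error entry: the boot session it occurred in and the
    failing physical address. Hypothetical layout, not the VMS errorlog format."""
    boot: int
    physical_address: int


def classify(entries: Iterable[ParityError]) -> dict[int, str]:
    """Label each failing address "hard" if it recurs in two or more boot
    sessions (a consistent footprint after reboots), otherwise "soft"."""
    boots_by_address: dict[int, set[int]] = defaultdict(set)
    for entry in entries:
        boots_by_address[entry.physical_address].add(entry.boot)
    return {
        address: "hard" if len(boots) >= 2 else "soft"
        for address, boots in boots_by_address.items()
    }


# Hypothetical example data: one address fails again after a reboot.
log = [
    ParityError(boot=1, physical_address=0x00A3F2C0),
    ParityError(boot=2, physical_address=0x01400008),
    ParityError(boot=3, physical_address=0x00A3F2C0),
]
for address, verdict in sorted(classify(log).items()):
    print(f"{address:#010x}: {verdict}")
```

Counting distinct boot sessions rather than raw entries keeps a burst of repeats within a single session from being mistaken for a "hard" failure.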


195.2. "alternatively...." by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Sun Dec 18 1994 15:50
    Ken, I think your template is a bit long. How about:
    

Dear Customer,


A call has been automatically logged with the Customer Support Centre for
SYSTEM ......, NODE ........ Analysis shows that Memory Array #..
has been logging CORRECTABLE READ DATA errors (CRDs) for Bit #...

The system is designed to expect occasional single-bit errors, and the memory
controller corrects the data before allowing it to be used.

The recommended action is to reboot the system when convenient. Should the
same errors recur after the reboot (i.e. CRDs with the same Array #/Bit #),
further action may be required by DIGITAL on your behalf.
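The decision in this template compares the CRDs logged before and after the reboot, keyed on Array #/Bit #. The sketch below assumes CRD entries have already been reduced to hypothetical `CRD(array, bit)` records; it does not read any real errorlog, and the array and bit values in the example are made up.

```python
from typing import Iterable, NamedTuple, Set


class CRD(NamedTuple):
    """A correctable read data (CRD) error, identified by memory array and bit.
    Hypothetical record layout for illustration only."""
    array: int
    bit: int


def recurring_after_reboot(before: Iterable[CRD], after: Iterable[CRD]) -> Set[CRD]:
    """Return the (array, bit) pairs logged both before and after the reboot.
    A non-empty result means the same errors occurred after the reboot."""
    return set(before) & set(after)


# Hypothetical example data.
before_reboot = [CRD(array=2, bit=17), CRD(array=2, bit=17)]
after_reboot = [CRD(array=2, bit=17)]

repeats = recurring_after_reboot(before_reboot, after_reboot)
if repeats:
    print("Same Array #/Bit # after reboot; further action may be required:", sorted(repeats))
else:
    print("No repeat after reboot; no action required")
```

Using a set intersection means repeated CRDs on the same array and bit before the reboot count once, matching the template's focus on *which* Array #/Bit # recurs rather than how often.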

       Regards,


 
    
195.3. "And then again...." by KERNEL::ADAMS (Brian Adams CSC-Viables '833-3026) Sun Dec 18 1994 16:00
Dear 'Customer',


Your 'sys-type', NODE 'nodename', reports that Memory Array #.. has been
logging CORRECTABLE READ DATA errors (CRDs) for Bit #...

Memory systems are designed to expect occasional single-bit errors, and the
memory controller corrects the data before allowing it to be used. This is
normal, expected behaviour.

The recommended action is to reboot the system when convenient. Should the
same errors recur after the reboot (i.e. CRDs with the same Array #/Bit #),
further action may be required by DIGITAL on your behalf.

As no action is currently required from Digital, this call will now be closed.

Regards,