
Conference turris::digital_unix

Title:	DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:	Welcome to the Digital UNIX Conference
Moderator:	SMURF::DENHAM

Created:	Thu Mar 16 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	10068
Total number of notes:	35879

9526.0. **"Blitz on new-wire-method V4.0*"** by KITCHE::schott (Eric R. Schott USG Product Management) Wed Apr 16 1997 16:20

+---------------------------+TM
|   |   |   |   |   |   |   |
| d | i | g | i | t | a | l |      TIME   DEPENDENT   BLITZ
|   |   |   |   |   |   |   |
+---------------------------+


      BLITZ TITLE: DIGITAL UNIX DATA CORRUPTION WITH SHARED MEMORY 
	  	  

                                                DATE: 16 April 1997
      AUTHOR: John Donovan			TD #:
      DTN:    381-1344   
      ENET: guru::donovan                       CROSS REFERENCE #'s:
      DEPARTMENT: UNIX Support Engineering      (PRISM/TIME/CLD#'s)

      INTENDED AUDIENCE: All                    PRIORITY LEVEL: 1
      (U.S./EUROPE/GIA)                         (1=TIME CRITICAL,
                                                 2=NON-TIME CRITICAL)

=====================================================================

      INTRODUCTION: 

	During prerelease hardware testing with Digital UNIX Versions 4.0
	and later, the Digital UNIX Engineering Group discovered a user
	application data corruption that was not detected by the operating
	system software.

      PROBLEM:  

	A data corruption problem can occur when the vm subsystem parameter
	new-wire-method is turned on. The new-wire-method parameter is
	available only in V4.0 and later releases, and all of those releases
	ship with new-wire-method enabled by default.

      RESOLUTION/WORKAROUND:

	The problem can be eliminated by turning off new-wire-method:

	1) Become the root user.

	2) Create a new file named /tmp/nwm and insert the following lines:

		  vm:
		     new-wire-method=0

	3) Execute the sysconfigdb command as follows:

		  # /sbin/sysconfigdb -f /tmp/nwm -m vm

	4) Reboot the system.

	The new-wire-method option is now disabled.
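	Steps 2 and 3 can also be scripted. The following is a minimal
	sketch, not an official Digital procedure: it assumes a
	Bourne-compatible /bin/sh on Digital UNIX V4.0 or later and a root
	shell, and the function names write_nwm_stanza and
	apply_nwm_workaround are hypothetical. It writes the same stanza to
	/tmp/nwm and runs sysconfigdb exactly as step 3 shows; the reboot in
	step 4 is left to the operator.

```shell
#!/bin/sh
# Minimal sketch of steps 2 and 3 of the new-wire-method workaround.
# Assumes Digital UNIX V4.0 or later and a root shell (step 1).
# The function names are hypothetical, not part of any Digital tool.

# Step 2: write the vm stanza that sets new-wire-method to 0.
# The stanza path defaults to /tmp/nwm, as in the blitz.
write_nwm_stanza() {
    stanza=${1:-/tmp/nwm}
    printf 'vm:\n\tnew-wire-method=0\n' > "$stanza"
}

# Step 3: write the stanza, then run sysconfigdb with the same
# -f and -m vm options shown in the blitz. Stops on the first failure.
apply_nwm_workaround() {
    stanza=${1:-/tmp/nwm}
    write_nwm_stanza "$stanza" || return 1
    /sbin/sysconfigdb -f "$stanza" -m vm || return 1
    echo "sysconfigdb completed; reboot the system (step 4) to apply the change."
}
```

	From a root shell, source the file, run apply_nwm_workaround, and
	then reboot. Generating the stanza with printf rather than an editor
	keeps the file identical on every system where it is applied.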

	Turning off new-wire-method should cause little or no performance
	degradation.

        Digital UNIX Engineering strongly recommends that this workaround
        be implemented on all systems running Digital UNIX V4.0 and later.
        Failure to do so can result in undetected data corruption.

      ADDITIONAL COMMENTS:

        Digital UNIX Engineering is working at the highest priority on a
        solution that will not require the above workaround. When the
        resultant fix is ready, an advisory blitz will announce its
        availability.

                     *** DIGITAL INTERNAL USE ONLY ***

\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT

T.R	Title	User	Personal Name	Date	Lines
9526.1	What are the circumstances??	DYOSW5::WILDER	Does virtual reality get swapped?	`Thu Apr 17 1997 09:58`	11
	Any information on what circumstances can cause this data corruption? Any particular applications? The reason I ask is that I have a customer running 4.0a and TRC1.4. When taking down one node, it hung and they got some data corruption. Now, this could be caused by other factors, but it would be nice to see what engineering knows about this problem and how it can happen. Thanks, /jim
9526.2		KITCHE::schott	Eric R. Schott USG Product Management	`Thu Apr 17 1997 15:20`	15
	> Any information on what circumstances can cause this data corruption?
	> Any particular applications? The reason I ask is that I have a customer
	> running 4.0a and TRC1.4. When taking down one node, it hung and they
	> got some data corruption. Now, this could be caused by other factors,
	> but it would be nice to see what engineering knows about this problem
	> and how it can happen.

	This happens doing raw I/O; otherwise it is hard to describe the
	circumstances. Make sure your system, firmware, and storage are up to
	date on patches to avoid possible corruptions.
9526.3	Here is a little more...	SMURF::KNIGHT	Fred Knight	`Tue Apr 22 1997 13:46`	10
	It requires both raw I/O and swapping/paging. If no swapping or paging is happening, then no corruption will occur. The title is also inaccurate, since the problem has nothing to do with shared memory. Any time you do a raw read (into any memory, shared or not) and the process is swapped or paged, the contents of the raw I/O buffer are at risk. Fred
9526.4		LEXS01::GINGER	Ron Ginger	`Wed Apr 23 1997 08:53`	5
	Thanks, Fred, for a simple answer. Why don't we say this in a blitz instead of trying to hide the details of the problem? Then customers can make accurate assessments of their risk and of the urgency of taking this action. It is not always easy to apply changes; even a reboot at my customer's site must be scheduled as much as 3 weeks in advance.
