| Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
| Notice: | Welcome to the Digital UNIX Conference |
| Moderator: | SMURF::DENHAM |
| Created: | Thu Mar 16 1995 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 10068 |
| Total number of notes: | 35879 |
+---------------------------+TM
| | | | | | | |
| d | i | g | i | t | a | l | TIME DEPENDENT BLITZ
| | | | | | | |
+---------------------------+
BLITZ TITLE: DIGITAL UNIX DATA CORRUPTION WITH SHARED MEMORY
DATE: 16 April 1997
AUTHOR: John Donovan
TD #:
DTN: 381-1344
ENET: guru::donovan
CROSS REFERENCE #'s (PRISM/TIME/CLD#'s):
DEPARTMENT: UNIX Support Engineering
INTENDED AUDIENCE: All (U.S./EUROPE/GIA)
PRIORITY LEVEL: 1 (1=TIME CRITICAL, 2=NON-TIME CRITICAL)
=====================================================================
INTRODUCTION:
While testing prerelease hardware with Digital UNIX Versions 4.0 and
later, the Digital UNIX Engineering Group discovered a user application
data corruption problem that is not detected by the operating system
software.
PROBLEM:
A data corruption problem can occur when the parameter new-wire-method
is turned on. The new-wire-method parameter is available only in V4.0
and later releases, and all of those releases ship with new-wire-method
enabled by default.
RESOLUTION/WORKAROUND:
The problem can be eliminated by turning off new-wire-method, as
follows:
1) Become the root user.
2) Create a new file named /tmp/nwm and insert the following lines:
vm:
new-wire-method=0
3) Execute the sysconfigdb command as follows:
# /sbin/sysconfigdb -f /tmp/nwm -m vm
4) Reboot the system.
The new-wire-method option is now disabled.
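
Steps 2 and 3 above can be sketched as a small script. This is only an illustration under stated assumptions: NWM_FILE is a hypothetical variable name, the leading tab before the attribute follows the usual stanza-file layout (the text above shows no indentation), and the sysconfigdb command is run only when the script is executed as root on a system that actually has /sbin/sysconfigdb; otherwise the command is just printed.

```shell
#!/bin/sh
# Sketch of workaround steps 2 and 3. NWM_FILE is a hypothetical name;
# /tmp/nwm is the path given in the blitz.
NWM_FILE="${NWM_FILE:-/tmp/nwm}"

# Step 2: write the stanza: subsystem name, then the attribute override.
# The leading tab is an assumption about stanza-file layout.
printf 'vm:\n\tnew-wire-method=0\n' > "$NWM_FILE"

# Step 3: merge the stanza into the configuration database, using the
# exact command from the blitz. Only attempted as root where the tool exists.
if [ -x /sbin/sysconfigdb ] && [ "$(id -u)" -eq 0 ]; then
    /sbin/sysconfigdb -f "$NWM_FILE" -m vm
else
    echo "would run: /sbin/sysconfigdb -f $NWM_FILE -m vm"
fi

# Step 4 (reboot) is deliberately left to the administrator.
```

After the reboot, a query such as /sbin/sysconfig -q vm new-wire-method should report the running value, assuming the attribute can be queried on your release.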
Please note that turning off the new-wire-method should cause
little or no performance degradation.
It is the Strong Recommendation of Digital UNIX Engineering that
this workaround be implemented on all systems running Digital UNIX
V4.0 and above. Failure to do so can result in undetected data
corruption.
ADDITIONAL COMMENTS:
Digital UNIX Engineering is working at the highest priority on a
solution that will not require the above workaround. When the
resultant fix is ready, an advisory blitz will announce its
availability.
*** DIGITAL INTERNAL USE ONLY ***
\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 9526.1 | What are the circumstances?? | DYOSW5::WILDER | Does virtual reality get swapped? | Thu Apr 17 1997 09:58 | 11 |
Any information on what circumstances can cause this data corruption?
Any particular applications? The reason I ask is that I have a customer
running 4.0a and TRC1.4. When taking down one node, it hung and they
got some data corruption. Now, this could be caused by other factors,
but it would be nice to see what engineering knows about this problem
and how it can happen.
Thanks,
/jim
| 9526.2 | | KITCHE::schott | Eric R. Schott USG Product Management | Thu Apr 17 1997 15:20 | 15 |
> Any information on what circumstances can cause this data corruption?
> Any particular applications? The reason I ask is that I have a customer
> running 4.0a and TRC1.4. When taking down one node, it hung and they
> got some data corruption. Now, this could be caused by other factors,
> but it would be nice to see what engineering knows about this problem
> and how it can happen.

This happens when doing raw I/O; otherwise it is hard to describe the
circumstances.

You should ensure your system, firmware, and storage are up to date with
patches to avoid possible corruptions.
| 9526.3 | Here is a little more... | SMURF::KNIGHT | Fred Knight | Tue Apr 22 1997 13:46 | 10 |
It requires both raw I/O and swapping/paging. If no swapping or paging
is happening, then no corruption will occur.

The title is also inaccurate, since the problem has nothing to do with
shared memory. Any time you do a raw read (into any memory, shared or
not) and the process is swapped or paged, the contents of the raw I/O
buffer are at risk.

Fred
| 9526.4 | | LEXS01::GINGER | Ron Ginger | Wed Apr 23 1997 08:53 | 5 |
Thanks, Fred, for a simple answer. Why don't we do this in a blitz,
instead of trying to hide the details of the problem? Then customers
can make accurate assessments of their risk and the urgency of taking
this action. It is not always easy to apply changes; even a reboot at
my customer must be scheduled as much as 3 weeks in advance.