
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

8083.0. "media error in a RAID5 array (HSZ40) corrupts ADVFS" by MXOC00::MJUAREZ () Tue Dec 03 1996 22:26

T.R | Title | User | Personal Name | Date | Lines
8083.1 | | LEXS01::GINGER | Ron Ginger | Wed Dec 04 1996 11:53 | 6
8083.2 | Don't shoot the messenger ! | RUSURE::KATZ | | Wed Dec 04 1996 12:48 | 11
8083.3 | Does this happen with RAID1 as well? | VIRGIN::SUTTER | Who are you ??? - I'm BATMAN !!! | Fri Dec 06 1996 04:04 | 16
8083.4 | | KITCHE::schott | Eric R. Schott USG Product Management | Fri Dec 06 1996 06:38 | 25
8083.5 | HSZ40 is not responsible for my case... | EPS::NGUYEN | Without fools there would be no wisdom. | Fri Feb 21 1997 13:35 | 35
>>
>>As I understand there is the possibility that a RAID5 logical unit on a 
>>HSZ40 fails temporarily when a media device goes bad in that set and thus 
>>corrupts the data in it: 
>>
>>	- Is this unique to the HSZ40 or would this happen on other 
>>	  controllers as well (SWXCR, HSZ10, other vendor's RAID Controller)? 
>
>this is a bug in the hsz40 firmware...I think later versions fix
>it but you would have to ask the HSZ folks.

I have one customer who doesn't use an HSZ40 at all.  He has some 20GB 
internal disks set up as RAID 5.  He hit a similar problem when the 
third-party software reported a media problem.  He is using DU 3.2D.
When he restores the backup on another equivalent system without RAID 5, 
the system seems to be OK.  Is RAID 5 that unstable?

>>
>>	- If this is unique to the HSZ40, is it HSOF dependent?
>>
>>	- Would a RAID1 set be more secure here (i.e. would a RAID1 set
>>	  NEVER fail as long there is still enough device redundancy available? 
>
>In the presence of bugs like this, it is hard to say what is
>more secure...if you are really paranoid, use LSM mirroring.

LSM would give the option of high availability but doesn't offer 
fault tolerance, which is what RAID 5 is supposed to provide.
That leads me to two questions: Is OS 4.x better to use with RAID 5?
For OS V3.2D, are there any patches that fix this problem 
(which ones, and where and how can I find them)?

Many thanks.
Gina
8083.6 | | NABETH::alan | Dr. File System's Home for Wayward Inodes. | Fri Feb 21 1997 15:48 | 31
	In a perfect world you wouldn't need any form of RAID.  This
	being an imperfect world where devices fail, blocks go bad and
	tradeoffs have to be made, the various RAID levels offer differing
	levels of protection against failures with tradeoffs in cost
	and performance.

	With RAID-5 in the normal state, it should be able to correct
	errors on single devices or compensate for the loss of a single
	device.  If a particular RAID-5 implementation can't handle
	this case without letting an I/O error get through, it is
	a rather poor one.  If the array has lost one member, then
	any I/O failure on the others will be passed back up as an
	error, because all the other members contribute to the data
	regeneration.
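
To make the regeneration point concrete, here is a minimal sketch of the usual XOR-parity scheme, assuming one parity block per stripe. The names `xor_blocks` and `read_block` and the in-memory `stripe` list are made up for illustration; this is not how the HSZ40 or any particular controller is implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def read_block(stripe, index):
    """Return the block at `index`, regenerating it if that member is lost.

    `stripe` holds the data blocks plus one parity block; None marks a
    member that is missing or returned an I/O error (an assumption of
    this sketch).
    """
    if stripe[index] is not None:
        return stripe[index]
    survivors = [b for i, b in enumerate(stripe) if i != index]
    if any(b is None for b in survivors):
        # Second failure: every surviving member is needed for regeneration,
        # so the error has to be passed back up.
        raise IOError("cannot regenerate: more than one member unavailable")
    return xor_blocks(survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]   # last member holds the parity
```

With one member set to None, `read_block` still returns the original contents; with two members gone, a read of either lost block raises the error instead.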

	Some implementations keep track of I/O failures and the state of
	the data used for regeneration.  If bad data would go into the
	regeneration on an error, they will treat that as an error instead
	of returning the wrong data.
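
A rough sketch of that bookkeeping, assuming the array keeps a per-block validity flag. The `Block` class, its `valid` field and the `regenerate` function are hypothetical names for illustration only, not any real controller's data structures.

```python
from dataclasses import dataclass

@dataclass
class Block:
    data: bytes
    valid: bool = True   # False once the array knows the contents are suspect

def regenerate(survivors):
    """Rebuild a missing block by XORing the surviving blocks of its stripe."""
    if not all(b.valid for b in survivors):
        # Refuse to build an answer out of data already known to be bad.
        raise IOError("regeneration would use known-bad data")
    out = bytearray(len(survivors[0].data))
    for b in survivors:
        for i, byte in enumerate(b.data):
            out[i] ^= byte
    return bytes(out)
```

The design choice is simply that a visible I/O error is preferable to silently returning a block that was computed from bad inputs.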

	It isn't very informative just to say that a RAID-5 let an I/O
	error get through, since there are many different causes.
	Hopefully the array or software will keep track of the
	detailed failure and offer some way to determine the cause
	in case it is preventable.

	re: LSM, fault tolerance and availability.

	This doesn't make any sense.  LSM offers mirroring, which is
	superior to RAID-5 in every way but price.  A properly configured
	mirror can survive any fault in the I/O path, except for one
	in the base system.  A RAID-5 won't do any better in that case.