|
>>
>>As I understand there is the possibility that a RAID5 logical unit on a
>>HSZ40 fails temporarily when a media device goes bad in that set and thus
>>corrupts the data in it:
>>
>> - Is this unique to the HSZ40 or would this happen on other
>> controllers as well (SWXCR, HSZ10, other vendor's RAID Controller)?
>
>this is a bug in the hsz40 firmware...I think later versions fix
>it but you would have to ask the HSZ folks.
I have one customer who doesn't use an HSZ40 at all. He has some 20GB
internal disks in a RAID 5 setup. He hit a similar problem when the
third-party software indicated a media problem. He is using DU 3.2D.
When he restores the backup on another equivalent system without a RAID 5
setup, the system seems to be OK. Is RAID 5 that unstable?
>>
>> - If this is unique to the HSZ40, is it HSOF dependent?
>>
>> - Would a RAID1 set be more secure here (i.e. would a RAID1 set
>> NEVER fail as long as there is still enough device redundancy available)?
>
>In the presence of bugs like this, it is hard to say what is
>more secure...if you are really paranoid, use LSM mirroring.
LSM would give the option of high availability but doesn't offer
fault tolerance, which is what RAID 5 is supposed to provide.
That leads me to some questions: Is OS 4.x better to use with RAID5?
In the case of OS V3.2D, do we have any patches to fix this problem
(what are they, and where and how can I find them)?
Many thanks.
Gina
|
| In a perfect world you wouldn't need any form of RAID. This
being an imperfect world where devices fail, blocks go bad and
tradeoffs have to be made, the various RAID levels offer differing
levels of protection against failures with tradeoffs in cost
and performance.
With RAID-5 in the normal state, it should be able to correct
errors on single devices or compensate for the loss of a single
device. If a particular RAID-5 implementation can't handle
this case without letting an I/O error get through, it is
a rather poor one. If the array has lost one member, then
any I/O failure on the others will be passed back up as an
error, because all the other members contribute to the data regeneration.
Some implementations keep track of I/O failures and the state of
the data used for regeneration. If bad data would go into the
regeneration on an error, they will treat that as an error instead
of returning the wrong data.
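
As a rough illustration of the regeneration described above, here is a
minimal Python sketch of XOR parity over one stripe. It is not how the
HSZ40 or any particular controller does it; the names (make_stripe,
read_block, RegenerationError) are made up for the example, and an
unreadable member is simply modelled as None. It shows why a single
failure can be covered, and why a second failure in the same stripe has
to come back as an error rather than as wrong data.

```python
from functools import reduce


class RegenerationError(IOError):
    """Hypothetical error: a block can be neither read nor regenerated."""


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_block(stripe, index):
    """Read member `index` of a stripe; None marks an unreadable member."""
    if stripe[index] is not None:
        return stripe[index]
    # Degraded read: every other member contributes to the regeneration.
    survivors = [b for i, b in enumerate(stripe) if i != index]
    if any(b is None for b in survivors):
        # A second failure means the XOR would be built from missing data,
        # so report an error instead of returning something wrong.
        raise RegenerationError("second failure in stripe; cannot regenerate")
    return xor_blocks(survivors)
```

With three data blocks and one parity block, losing any one member
still lets read_block() return the original contents; losing two makes
it raise instead.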
It isn't very informative just to say that a RAID-5 let an I/O
error get through, since there are many different causes.
Hopefully the array or software will keep track of the
detailed failure and offer some way to determine the cause
in case it is preventable.
re: LSM, fault tolerance and availability.
This doesn't make any sense. LSM offers mirroring, which is
superior to RAID-5 in every way but price. A properly configured
mirror can survive any fault in the I/O path, except for one
in the base system. A RAID-5 won't do any better in that case.
|