
Conference ssag::ask_ssag

Title:Ask the Storage Architecture Group
Notice:Check out our web page at http://www-starch.shr.dec.com
Moderator:SSAG::TERZAN
Created:Wed Oct 15 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6756
Total number of notes:25276

6713.0. "swxcr rebuild failed because of soft errors" by UTOPIE::OETTL (hide bug until worst time) Thu May 22 1997 06:26

Hello,

I have a problem with a RAID5 on an SWXCR controller.

This is the controller log:

# cat 12-May-18:02_XCR1.log
13-May-1997 12:27:43 The hard disk at channel 1, target 0 had 2 soft errors.
18-May-1997 23:49:47 Logical RAID drive 0 is degraded.
18-May-1997 23:49:47 The hard disk at channel 2, target 1 is failed.
18-May-1997 23:49:47 The hard disk at channel 2, target 1 had 2 soft errors.
21-May-1997 14:15:44 A new hard disk has been inserted at channel 2, target 1.
21-May-1997 14:20:59 The hard disk at channel 2, target 1 is write only.
21-May-1997 14:21:00 The hard disk at channel 2, target 1 is being rebuilt. This affects the status of logical drive 0.
21-May-1997 14:21:00 Logical RAID drive 0 is rebuilding.
21-May-1997 15:11:22 The hard disk at channel 1, target 0 had 2 soft errors.
21-May-1997 15:26:36 The rebuild (on hard disk at channel 2, target 1) failed because of bad blocks in the source drive(s).
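Tallying the soft errors per disk helps show which source drive the rebuild most likely tripped over. The sketch below is a minimal, hypothetical Python helper, not part of any SWXCR tooling; its regular expression assumes the exact message wording shown in the log above, with wrapped lines joined back together.

```python
import re
from collections import Counter

# Matches log lines such as:
# "13-May-1997 12:27:43 The hard disk at channel 1, target 0 had 2 soft errors."
SOFT_ERR = re.compile(
    r"^(\S+ \S+) The hard disk at channel (\d+), target (\d+) had (\d+) soft errors?\."
)

def soft_errors_per_disk(log_text):
    """Sum the reported soft-error counts for each (channel, target) pair."""
    totals = Counter()
    for line in log_text.splitlines():
        m = SOFT_ERR.match(line.strip())
        if m:
            channel, target, count = int(m.group(2)), int(m.group(3)), int(m.group(4))
            totals[(channel, target)] += count
    return totals

# The soft-error lines from the controller log above.
SAMPLE = """\
13-May-1997 12:27:43 The hard disk at channel 1, target 0 had 2 soft errors.
18-May-1997 23:49:47 The hard disk at channel 2, target 1 had 2 soft errors.
21-May-1997 15:11:22 The hard disk at channel 1, target 0 had 2 soft errors.
"""

if __name__ == "__main__":
    for (ch, tgt), n in sorted(soft_errors_per_disk(SAMPLE).items()):
        print(f"channel {ch}, target {tgt}: {n} soft errors")
```

On this excerpt it attributes 4 soft errors to channel 1, target 0 and 2 to channel 2, target 1. The channel 2, target 1 errors belong to the disk that failed and was replaced, and the 15:11 errors on channel 1, target 0 fall inside the rebuild window, which suggests that disk is the source drive with the bad blocks.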

I read the section "Troubleshooting a failed rebuild" in the SWXCR User's Guide.

According to the manual, if I now enter the RCU, view the Rebuild BBT (bad block
table), and then exit the utility, any rebuild BBT entries will be cleared.

If I then start the rebuild again and it fails with bad blocks a second time,
I have to restore from backup media.

If the rebuild succeeds, can I be sure that I have not lost any data?

Why can a soft error cause a rebuild to fail, when the same soft error is not
severe enough to drop the disk from the raidset?
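One generic explanation, assuming the controller uses ordinary XOR parity (this is not a description of the SWXCR firmware): in RAID 5, each missing block is the XOR of the corresponding blocks on all surviving members. While the array is redundant, a block that cannot be read from one disk can itself be recomputed from the others, so the error is survivable. During a rebuild one member is already gone, so an unreadable block on any surviving disk puts a second unknown into that stripe, and XOR can solve for only one. The names `xor_blocks` and `rebuild_block` are illustrative.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_block(surviving):
    """Recompute the one missing block of a RAID 5 stripe.

    `surviving` holds the corresponding block from every remaining member
    (data and parity alike); None marks a block that could not be read.
    """
    if any(b is None for b in surviving):
        # A second unknown in the same stripe: XOR can solve for only one.
        raise OSError("unreadable source block; stripe cannot be rebuilt")
    return xor_blocks(surviving)

if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
    parity = xor_blocks([d0, d1, d2])

    # Redundant array: a bad block on d0 is recoverable from the others.
    print(rebuild_block([d1, d2, parity]) == d0)   # True
    # Degraded array (d1 has failed): d1 is rebuilt from d0, d2 and parity...
    print(rebuild_block([d0, d2, parity]) == d1)   # True
    # ...unless d0 also has an unreadable block in this stripe.
    try:
        rebuild_block([None, d2, parity])
    except OSError as err:
        print("rebuild failed:", err)
```

Under this model the same read error is harmless before the failure and fatal for that stripe during the rebuild, which matches the behaviour in the log.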

This raidset, together with several others, is part of a large (approx.
150 GB) file domain.

I want to "rmvol" this raidset, initialize it, and "addvol" it again.
If I remove this volume from the domain, will any defective files show up
during the rmvol?

Are there any other ways to check its integrity?
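One generic, read-only integrity check is a surface scan: read the device from start to end and record every region that returns an I/O error. This sketch assumes a host that can run Python 3 and that `path` names a raw device or file whose size `lseek` can report (true for regular files; for block devices it depends on the OS). `scan_unreadable` is a hypothetical helper; it finds unreadable regions only and cannot tell whether data that reads back cleanly is correct.

```python
import os
import sys

def scan_unreadable(path, chunk_size=64 * 1024):
    """Read `path` from start to end; return offsets of chunks that fail to read."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device or file size in bytes
        offset = 0
        while offset < size:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                os.read(fd, min(chunk_size, size - offset))
            except OSError:
                # Record the failing chunk and carry on with the next one.
                bad_offsets.append(offset)
            offset += chunk_size
    finally:
        os.close(fd)
    return bad_offsets

if __name__ == "__main__" and len(sys.argv) > 1:
    bad = scan_unreadable(sys.argv[1])
    print(f"{len(bad)} unreadable chunk(s)", bad[:10])
```

Reading in fixed-size chunks and continuing after an error keeps one bad region from hiding others; a smaller `chunk_size` narrows down where each bad block lies at the cost of more read calls.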

Thank you for any help, Ötzi

6713.1 "I have the same problem - what to do?" by NNTPD::"[email protected]" (Mitch Kulberg) Tue May 27 1997 13:10
I am having exactly the same problem as .0.

We have a failed drive, but the rebuild onto the spare
failed because of soft errors on one of the disks.
The system is now running degraded, and I can't add the
new drive and get it through a rebuild.

I tried to run a parity check, but it can't be run while
the array is degraded.
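The refusal to check parity on a degraded array follows from the arithmetic, assuming ordinary XOR parity: a stripe is consistent when the XOR of all its members, parity included, is zero. With one member missing, there is always some value for that member that makes the XOR zero, so there is nothing left to check against. `stripe_is_consistent` is a hypothetical helper illustrating this, not the controller's implementation.

```python
from functools import reduce

def xor_all(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def stripe_is_consistent(members):
    """Check one RAID 5 stripe: data blocks plus parity must XOR to zero.

    `members` holds one block per disk; None marks a missing disk.
    """
    if any(m is None for m in members):
        # Degraded: any value for the missing block could be chosen to make
        # the XOR zero, so the check would prove nothing.
        raise ValueError("stripe is degraded; parity cannot be verified")
    return not any(xor_all(members))

if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
    parity = xor_all([d0, d1, d2])
    print(stripe_is_consistent([d0, d1, d2, parity]))           # True
    print(stripe_is_consistent([d0, b"\x11\x20", d2, parity]))  # False
```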

This is a fairly serious problem:

1) It looks like I have to scrap the whole RAID volume, initialize a new
one, and restore the data.

2) As mentioned in the previous note, a soft error should not compromise
the system's ability to rebuild itself. If it does, we should not call
it a soft error.


Thanks for all your help
Sincerely,
Mitch Kulberg DTN 462-6062
[email protected]
[Posted by WWW Notes gateway]

6713.2 "Is there anyone with an idea?" by UTOPIE::OETTL (hide bug until worst time) Mon Jun 02 1997 09:29