
Conference ssdevo::hsz40_product

Title:HSZ40 Product Conference
Moderator:SSDEVO::EDMONDS
Created:Mon Apr 11 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:902
Total number of notes:3319

858.0. "RA410 to RA450 Lost_data?" by TOPTEN::AVERBACH () Mon Apr 28 1997 14:37

    
    	I just did a couple of Raid Array 410 to 450 upgrades, and they all
    went pretty smoothly. I had one problem: on the first upgrade, when I went
    to add the units, it complained that they had lost_data. I had to do a
    CLEAR LOST_DATA on the units, and then the raidsets all rebuilt. This
    only happened on one of the 2 upgrades.
    	In both cases the Sun box was shut down, the cache had no unflushed
    data, and I shut down both HSZ40's prior to powering them down to install
    the HSZ50's. What exactly would cause this to occur, and why only on one
    of the pairs?
    							Thanks,
    								Joe
T.R  Title  User  Personal Name  Date  Lines
858.1. by SSDEVO::T_GONZALES () Mon Apr 28 1997 15:16 (4 lines)
    Hi Joe,  How exactly did you do the upgrade? Did you delete the
    units and any raid sets before you upgraded and then
    re-add them after the upgrade? Also, what version of HSOF were the
    raid arrays running before the upgrade?
858.2. by MSE1::PCOTE (press one now for personal name) Mon Apr 28 1997 16:32 (3 lines)

  Where is the procedure for performing the upgrade?
858.3. "EK-SM410-UP" by TOPTEN::AVERBACH () Mon Apr 28 1997 18:03 (16 lines)
    
    	The procedure is documented in a manual with P/N EK-SM410-UP. It
    contains Raid Array 410 to Raid Array 450 upgrade procedures for
    all platforms. I found it on the Web in the learning utility browser.
    
    	The customer was running HSOF V2.7z-2 prior to the upgrade. I did
    not delete any of the units or raidsets. I shut down both controllers in
    the dual redundant configuration, powered the box down, replaced
    the HSZ40's with HSZ50's running V5.0z-2, ran config, added the
    raidsets, and then went to add the units. It was at that point that I got
    the message about the LOST_DATA and had to CLEAR LOST_DATA on the units.
    	I did this procedure on 2 pairs of dual redundant controllers, both
    running on the Sun Solaris platform, but only encountered the lost data
    messages on one pair. The customer's data appeared to be intact.
    
    							Joe
858.4. by SSDEVO::T_GONZALES () Thu May 01 1997 10:30 (9 lines)
    It appears that there might be a "glitch" in the upgrade procedures.
    I will look into it. It's probably a good idea to delete the
    units and raid sets before upgrading and then add everything back in.
    
    However, don't do any INITIALIZE commands. They rebuild the metadata,
    which you do not want to do.
    
    tom
    
858.5. by TOPTEN::AVERBACH () Thu May 01 1997 12:56 (7 lines)
    
    	I knew enough to definitely not do an INIT. But what exactly did the
    CLEAR LOST_DATA command do, and why did it cause the raidsets to go
    through a rebuild? Also, I assume that the rebuild was a rebuild of the
    metadata. The customer's data seemed to remain, but I was kind of nervous
    when I saw the rebuild going on.
    								Joe
858.6. "Why the rebuild:" by SSDEVO::JACKSON (Jim Jackson, HSx RAID team) Thu May 01 1997 14:16 (18 lines)
The writeback cache is used to close the "write hole" inherent in RAID5.  If
the writeback cache is lost, it is possible that it was lost in the middle
of a physical RAID5 write.  This can cause the corruption of a parity block,
which can cause data corruption in the future when a member of the RAID5 set
is lost.
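
The write hole can be illustrated with a small model. This is only a sketch
and not the HSZ firmware's logic: the three-data-plus-parity stripe, the block
values, and the names xor_blocks and reconstruct are all assumptions made for
the example. It shows how a stale parity block corrupts a member that was
never written, once that member has to be reconstructed.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_data, parity):
    """Rebuild one missing data block from the surviving blocks plus parity."""
    return xor_blocks(surviving_data + [parity])


# Hypothetical stripe: three data members and one parity member.
data = [b"\x11\x11", b"\x22\x22", b"\x44\x44"]
parity = xor_blocks(data)

# With consistent parity, a lost member 1 is recovered exactly.
good_recovery = reconstruct([data[0], data[2]], parity)

# Interrupted RAID5 write: new data reaches member 0, but the matching
# parity update never reaches the disk, so the on-disk parity is stale.
data[0] = b"\x88\x88"
stale_parity = parity

# Later, member 1 fails. Reconstructing it from stale parity gives the
# wrong contents, even though member 1 itself was never written.
bad_recovery = reconstruct([data[0], data[2]], stale_parity)

print(good_recovery.hex())  # 2222
print(bad_recovery.hex())   # bbbb
```

Nothing in the model reports an error at the time of the interrupted write;
the damage only shows up when a member is lost and has to be reconstructed.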

The controller believes that the writeback cache was lost (thus the LOST
DATA indication).  Thus, when a CLEAR LOST_DATA is issued, the controller
destroys all redundancy by marking the parity blocks invalid.  This
eliminates the possibility of bad parity causing future data corruption.

The rebuild is to reconstruct the parity information.  It is the same sort
of reconstruct done after a raidset is first initialized.
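
As a rough model of the sequence described above (a sketch, not the
controller's actual code; the Stripe class and the function names are
hypothetical), clearing the lost-data condition marks every parity block
invalid, and the rebuild then recomputes parity from the data blocks. The
data blocks themselves are only read, never modified.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass
class Stripe:
    data: List[bytes]          # data blocks, one per data member
    parity: bytes              # parity block for this stripe
    parity_valid: bool = True  # False means parity must not be trusted


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def clear_lost_data(stripes):
    """Model of the clear step: distrust all parity, touch no data."""
    for stripe in stripes:
        stripe.parity_valid = False


def rebuild_parity(stripes):
    """Model of the rebuild: recompute parity from the data blocks."""
    for stripe in stripes:
        if not stripe.parity_valid:
            stripe.parity = xor_blocks(stripe.data)
            stripe.parity_valid = True


stripes = [
    Stripe(data=[b"\x01", b"\x02", b"\x04"], parity=b"\x07"),
    Stripe(data=[b"\x10", b"\x20", b"\x40"], parity=b"\x00"),  # stale parity
]
data_before = [list(stripe.data) for stripe in stripes]

clear_lost_data(stripes)
rebuild_parity(stripes)

data_after = [list(stripe.data) for stripe in stripes]
```

In this model the second stripe starts with bad parity; after the rebuild its
parity matches its data again, and data_before equals data_after.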

I have not seen the shutdown procedure documentation, so I can't say whether
or not it's correct.  However, I can say that if a controller is powered off
without doing a SHUTDOWN command on *each* controller, you will see LOST
DATA on each RAID5 set you have after replacing controller/cache modules.