
Conference cookie::raid_software_for_openvms

Title:RAID Software for OpenVMS
Notice:READ IMPORTANT NOTE IN 3.15, V2.4 SSB Kit in 3.176
Moderator:COOKIE::FROEHLIN
Created:Fri Dec 03 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:341
Total number of notes:1378

324.0. "RAID5 config with problems?" by TAGEIN::GRUENWALD () Fri Feb 28 1997 03:11

    Hi,
    
    I have a few questions about this configuration
    
     -------------                --------------               -----------
    |         VAX |   cluster-   |        ALPHA | local SCSI  |           |
    | vax 4500    |--------------| 8400         |-------------| RAID sets |
    | VMS v6.2    | interconnect | VMS v6.2-1H3 |  (KZMSA)    |           |
    | swRAID v2.3 | (DSSI, Eth)  | swRAID v2.3  |             |           |
     -------------                --------------               -----------
    
    . All firmware was updated to the latest version (FW update v3.8).
    . The RAID sets were bound from the ALPHA after booting both systems,
      and the sets were successfully used from both systems.
    . While the ALPHA was stopped and rebooted, the other node (VAX) remained
      blocked (this is normal - quorum lost) and resumed its activity
      afterwards.
    
    In the configuration above, if cluster member ALPHA fails (simulated with
    Ctrl/P and INIT) and comes back, the first members of the RAID sets are
    replaced by disks from the spareset. The reconstruct procedure completes
    without problems. There were NO writes to the sets during this operation!
    
    Without the cluster (ALPHA as a single VMS system with its SCSI storage) the
    same action (Ctrl/P / INIT / boot) doesn't cause such a failover in the RAID set.
           
    The customer doesn't accept this behaviour. (He states that on his other
    site - with the same configuration - the data were corrupted when he
    removed and brought back one of the nodes and this reconstruct took place.
    He had run AUTOGEN with the REBOOT option on the VAX when the data
    corruption occurred.)
    
    I asked for a copy of RAID$DIAGNOSTICS_*_NODE.LOG, RAID5_BIND.COM,
    RAID5_INIT.COM, ERRLOG.SYS and OPERATOR.LOG from both systems.
    
    Is the above configuration for raid software a supported one?
    
    What is the expected behaviour for a RAID set used from both nodes if one
    node crashes or just leaves the cluster? Does a special action have to be
    taken before bringing the failed node back?
    
    regards and thanx in advance
    
    Michael
324.1. by COOKIE::FROEHLIN (Let's RAID the Internet!) Fri Feb 28 1997 09:24
    Michael,
    
    are you sure the VAX is running V2.3 of RAID? Any cluster configuration
    is a legal RAID Software configuration (see the SPD for details). In this
    scenario none of the members should have been removed with V2.3 of RAID.
    
    I'll run a test here.
    
    Guenther
324.2. "RAID5 config with problems?" by TAGEIN::GRUENWALD () Mon Mar 03 1997 06:01
    Guenther,
    
    > are you sure the VAX is running V2.3 of RAID?
    
    Yes, we are. That customer site is located in Hungary. I just spoke to
    the colleague there. He will get the data I asked for.
    
    There weren't any ECOs installed!
    
    Regards
    
    Michael
324.3. by COOKIE::FROEHLIN (Let's RAID the Internet!) Mon Mar 03 1997 16:54
    Michael,
    
    I could, to my surprise, reproduce this behavior here. Have to find out
    where RAID lost its mind and does this.
    
    Guenther
324.4. by COOKIE::FROEHLIN (Let's RAID the Internet!) Tue Mar 04 1997 11:37
    There's a quick solution to that. Disable the timeout action on all
    RAID 5 arrays with:
    
    	$ RAID MODIFY/NOTIMEOUT array-ID
    
    /TIMEOUT is the default and allows the driver to make a decision to
    remove a member device if an I/O to the disk driver did not return
    within timeout seconds (default is 30 sec.).
    
    The disadvantage of using /NOTIMEOUT is that a RAID 5 array may hang
    unnecessarily. Assume member devices are connected to different
    controllers and just one member device becomes unavailable. With
    /NOTIMEOUT the RAID driver waits for the device to become accessible
    again, which may take minutes or hours. Until then the whole array is
    inaccessible. With /TIMEOUT=n in place the array would be reduced after
    n seconds and be accessible again. This is a tradeoff the users have
    to make.
    
    Instead of choosing /NOTIMEOUT they can use /TIMEOUT=n with n large
    enough so that a serving node can reboot.
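
    For illustration, something like this should do (array-ID is only a
    placeholder for the real array name, and 300 seconds is only an example
    value, not a recommendation):

    	$! Turn the timeout action off completely - no automatic member removal:
    	$ RAID MODIFY/NOTIMEOUT array-ID
    	$!
    	$! Or keep the timeout action but make it long enough for a serving
    	$! node to reboot, e.g. 300 seconds instead of the default 30:
    	$ RAID MODIFY/TIMEOUT=300 array-ID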
    
    Guenther
324.5. "SCSI patch solved the problem" by BPSTGA::TORONY (Gábor Tornyossy) Wed Mar 05 1997 07:18
Guenther,

I assume that the SCSI patch (ALPSCSI02_070) solved the problem. 

Installing the patches suggested by Michael step by step, when we arrived at that
patch this "member_0_reconstructing" behaviour disappeared (there were
writes to that RAID set while breaking the ALPHA, ...).

  Some additional things: at the office I set up a test environment where
  (without any patches) I couldn't reproduce the problem.

  My environment: a DEC 3000/300 with a TURBOchannel SCSI adapter (PMAZB) and a
  VAX 3100/78 in an NI cluster (the software versions were the same as at the
  customer). The differences between this and the customer site are the machines,
  the PCI/SCSI hardware (KZPSA), and the DSSI as an additional cluster interconnect.

It taught me a lesson: we should somehow be able to declare which patches belong
where before a problem arises (i.e. which are the MUP-like patches). We know (or
believe we know) that the patches exist, and there are summaries for the different
operating system versions, but there are too many to read through them all...
How do you solve this? Can you help?

Thanks,
Gábor
(the engineer involved in Hungary)
324.6. by nabsco::FROEHLIN (Let's RAID the Internet!) Wed Mar 05 1997 09:33
    Gábor,
    
    the problem mentioned by Michael in .0 has nothing to do with hardware
    or patches. Let me explain:
    
    RAID$DPRIVER does RAID member management. If an I/O to the underlying
    driver (DS/DU/DKdriver) returns with an error and the RAID 5 array is
    in normal state, the member will be removed. When a disk server
    disappears with outstanding I/Os, the I/Os are not returned while
    the disk has entered mount verification. 
    
    But RAID$DPRIVER can time out the I/Os if told to do so (RAID
    MODIFY/TIMEOUT=n). An error status returned for a member disk I/O starts
    a removal request for an array. The driver function goes through all
    members of an array with such a request and checks members 0 to n. If
    all disks of this array have been served by a disappearing disk server
    and there were active I/Os, it is likely that more than one member has
    a removal request.
    
    The driver function starts with the first member and, if the array is
    in normal state, removes it. Then the driver checks the next member.
    But since we are reduced now, no further member can be removed and
    the DPA device enters mount verification.
    
    Starting with V2.3 the driver now waits for 2 seconds before working on
    a removal request. If more than 1 member needs to be removed the driver
    skips any removal request. But if after 2 seconds there's exactly one
    member with a removal request it will be removed. If there have been no
    I/Os at all during the time the disk server reboots, no member will be
    removed.
    
    Hope this helps!
    
    Guenther
324.7. by BPSTGA::TORONY (Gábor Tornyossy) Thu Mar 06 1997 10:23
Guenther,


thank you for the info. It's like a "technical liberal education" on the topic
of software RAID. This is why one reads the notes. I'm serious.

But it holds the problem (and the customer) in a state of uncertainty:
  As of .1 - it should work without reconstructing,
 .2 - no (= it is a feature),
 .4 - use /NOTIMEOUT. ---> Changing it dynamically? Too strange to leave it to
      the customer without giving any suggestions.
 .6 - it's obvious that it removes the first members... ---> How is it to be
      used in a cluster environment then anyhow?

Still not clear: is it raining or the sun is shining? We smile, o.k. but in a
swimming dress or under an umbrella?

.What are the expectations in a cluster environment? Is it normal that when one
cluster member comes back (for example after an AUTOGEN/reboot) while the other
node keeps working, the RAID set will be reconstructing? Yes or no, and how to handle it?

.There were tests with and without that patch (or patches) causing changes in
the behaviour. Right - nothing to do with that, then why. And let's say to the
customer that although the problem seems to have disappeared nothing has been
solved - so don't use it? What's the answer to his next obvious question? 


The customer has decided to use it (after the successful test, as we said it's
okay). What will happen during the next reboot? 

This is all the more important since he is about to decide whether to leave or
keep Digital customer services and, what is more, the platform!


Regards,
Gábor
324.8. by COOKIE::FROEHLIN (Let's RAID the Internet!) Thu Mar 06 1997 11:58
    Gábor,
    
>Still not clear: is it raining or the sun is shining? We smile, o.k. but in a
>swimming dress or under an umbrella?
    
    If it's raining reconstructs, use the /NOTIMEOUT umbrella to bring out
    your smile ;-).

>.What are the expectations in a cluster environment? Is it normal that when one
>cluster member comes back (for example after an AUTOGEN/reboot) while the other
>node keeps working, the RAID set will be reconstructing? Yes or no, and how to handle it?
    
    The point is not a general cluster member but a disk server. Some
    cluster nodes might serve their local disks to other nodes and are
    therefore disk servers as well. Rebooting such a "disk server" with
    active I/Os to the disks will cause the disks (I'm talking physical disks
    like the RAID set members) to enter mount verification. RAID software
    has a special feature built-in for RAID 5 arrays which can tolerate the
    loss of one member. This feature is a timeout of disk I/Os which is by
    default turned on. The idea behind this is to remove a hindering member 
    quickly and continue with the remaining members instead of stalling
    the whole RAID set. Most customers might not need/like this feature
    and therefore can turn it off dynamically at any time for specific arrays
    using the RAID MODIFY command.
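
    As an illustration only (the array names ARRAY_A and ARRAY_B are made up,
    and 30 seconds is simply the default mentioned above), turning the feature
    off for one array while keeping it on for another could look like this:

    	$! Array whose members are served by another cluster node:
    	$! never remove a member on an I/O timeout
    	$ RAID MODIFY/NOTIMEOUT ARRAY_A
    	$! Array on local controllers only: keep the 30-second timeout action
    	$ RAID MODIFY/TIMEOUT=30 ARRAY_B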

>.There were tests with and without that patch (or patches) causing changes in
>the behaviour. Right - nothing to do with that, then why. And let's say to the
    
    Then why WHAT?
    
>customer that although the problem seems to have disappeared nothing has been
>solved - so don't use it? What's the answer to his next obvious question? 
    
    Patches are typically early point fixes to problems. If a system has 
    a severe impact caused by such a problem then the patch should be
    installed. Otherwise wait until the patch has been incorporated into the
    next release of the product (e.g. OpenVMS) and has hence passed a full 
    qualification test.

>The customer has decided to use it (after the successful test, as we said it's
>okay). What will happen during the next reboot? 
    
    What is expected?

>This is all the more important since he is about to decide whether to leave or
>keep Digital customer services and, what is more, the platform!
    
    Because of the RAID reconstruct issue?
    
    Or did they have so many little flames started in the past and now it's
    a wildfire the customer thinks we, Digital, cannot extinguish?
    
    Guenther
324.9. by BPSTGA::TORONY (Gábor Tornyossy) Fri Mar 07 1997 06:57
Guenther,

I enjoy that you joined in. It's unfortunate that because of geographical
reasons it cannot be deepened beside a mug of beer.

The customer wants this software raid not to recognise server shutdown/reboot
(not to rebuild the set) moreover not to make any failure in the filesystem on
it. We can either recommend the software RAID that was sold or advise against it.
To be more precise, we have to give them a procedure for using it safely while
meeting their expectations.

Therefore let me summarize, if I understood your reply correctly (a minimal
sketch of such a procedure in DCL follows below):
. If you have a cluster (like in .0) and you have to shutdown/reboot one
  member (the server of the disks in question) while the other should keep
  running (the process doing the I/O to that disk will hang - no problem):
  - if you don't want to allow the removal of the first member (even if you
    have a spare disk), then switch both drivers (on both involved nodes) not
    to react to such events (/NOTIMEOUT) and switch them back again once the
    disks are online again, or
  - you may live with the state change in your RAID set (use of a spare, ...).
. If there is a crash-like event, removal of the first member will take
  place (the actions are the same as in the line above).
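
A minimal DCL sketch of such a procedure, run on both involved nodes (the array
name USER_ARRAY and the 30-second value are placeholders only; the qualifiers
are the ones from .4):

	$! Step 1 - before shutting down / rebooting the disk-serving node:
	$ RAID MODIFY/NOTIMEOUT USER_ARRAY
	$!
	$! Step 2 - shutdown/reboot of the serving node takes place here
	$!
	$! Step 3 - once the member disks are online again, restore the default:
	$ RAID MODIFY/TIMEOUT=30 USER_ARRAY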

If this is the case, the behaviour with or without the patch is only of technical
interest: what changes in the SCSI-related driver code make the RAID software
react differently.

Thanks a lot,
Gábor

(True, the last sentence - about the customer's decision to leave Digital
entirely - turned out too theatrical. The story is grotesque, but
Digital's role (local and general) and conscience are quite clear... I just wanted to
underline the importance of the problem.)
324.10. by COOKIE::FROEHLIN (Let's RAID the Internet!) Fri Mar 07 1997 09:25
    Gábor,
    
>reasons it cannot be deepened beside a mug of beer.
    
    Czech mana...ah!

>To be more precise, we have to give them a procedure for using it safely while
>meeting their expectations.
    
    I assume the /NOTIMEOUT does it. Your summary is correct. Use
    /NOTIMEOUT and no reconstructs happen when a disk server reboots, whether
    caused by a crash or by a shutdown/reboot.

>(not to rebuild the set) moreover not to make any failure in the filesystem on
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    You didn't mention this before. Any more details?
    
    
>interest: what changes in the SCSI-related driver code make the RAID software
>react differently.
    
    Nothing I could think of. The RAID driver just fabricates one or more I/O
    request packets for the real disk driver (DU/DS/DKdriver) and queues them
    to its I/O queue. It's up to the real disk driver to perform the I/O. The
    RAID driver just orchestrates that.

>Digital's role (local and general) and conscience are quite clear... I just wanted to
>underline the importance of the problem.)
    
    I want to get this customer back on the road with his computing
    equipment...Digital Equipment.
    
    Guenther