
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5223.0. "Jukebox devices MVTIMEOUT" by HGOVC::CSCHAN () Tue Feb 04 1997 01:55

	This note has been cross-posted in the OPTICAL notes conference.

	There is a 3-node CI/NI cluster: an Alpha 4000, a DEC7730, and a
	DEC7750 with an RW534 directly connected. After a node leaves and
	rejoins the cluster, many jukebox devices go into the mount
	verification timeout state. The customer had to reboot the whole
	cluster to regain access to those jukebox devices.

	According to the information from the customer, one such event
	happened in the following sequence:
	- The customer shut down and rebooted the Alpha 4000 system.
	- Many disks went into the mount verification state.
	- Some time later, many of the mounted jukebox devices (13 out of
	  the 20 mounted devices) were found in mount verification timeout
	  on the Alpha 4000.
	- The same thing happened on the DEC7750, which has the RW534
	  directly connected.
	- The DEC7730 system was different: all of its mounted jukebox
	  devices went into the mount verification timeout state. This is
	  the busiest system in the cluster.
	- The customer tried to dismount the timed-out devices, but the
	  process hung.
	- The customer had to reboot the whole cluster.


	OpenVMS V6.2, OSMS 3.3-1

	SYSGEN parameters: MVTIMEOUT:      3600
	                   MSCP_LOAD:      1
	                   MSCP_SERVE_ALL: 1
	
	All the jukebox devices are served cluster-wide. There are 88
	cartridges (176 logical units) in the optical disk library, and
	usually only 20 are mounted.
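
	These settings can be confirmed with SYSGEN; a read-only sketch
	(SHOW changes nothing):

	$ RUN SYS$SYSTEM:SYSGEN
	SYSGEN> USE ACTIVE
	SYSGEN> SHOW MVTIMEOUT
	SYSGEN> SHOW MSCP_LOAD
	SYSGEN> SHOW MSCP_SERVE_ALL
	SYSGEN> EXIT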
 

	Questions:

	1. Is this normal behavior for a cluster with jukebox devices?

	2. How can we prevent this problem from happening?

	I plan to increase the MVTIMEOUT value, but I am not sure whether
	that will fix the problem or whether there are any side effects.
	A sketch of the change follows.
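
	If we do raise it, I assume the change would be along these lines;
	MVTIMEOUT is a dynamic parameter, and 7200 is only an example
	value, not a recommendation:

	$ RUN SYS$SYSTEM:SYSGEN
	SYSGEN> USE ACTIVE
	SYSGEN> SET MVTIMEOUT 7200     ! example value only
	SYSGEN> WRITE ACTIVE           ! dynamic, takes effect at once
	SYSGEN> EXIT
	$ ! To keep the change across AUTOGEN runs, also add the line
	$ ! MVTIMEOUT = 7200 to SYS$SYSTEM:MODPARAMS.DAT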

T.R     Title                           User            Personal Name                     Date                   Lines
5223.1  --                              TAPE::SENEKER   "Head banging causes brain mush"  Tue Feb 04 1997 10:12  2
    From the OSMS standpoint I have replied in OPTICAL, note 770.
    Rob
5223.2  mount/cluster without DNS/DFS?  HGOVC::CSCHAN   --                                Wed Feb 05 1997 02:53  53
	I found an article in TIMA that quotes the "[RW5XX] Document...Use
	of Clusters with Optical Software". I think it is a good reference,
	but I have some queries:


          5.1.D  VMS MOUNT and DISMOUNT operations for each optical volume
          will consume time equivalent to the traditional magnetic disk
          volume VMS MOUNT and DISMOUNT, plus a per-volume swap time of
          fifteen seconds plus the specified MINSWAP delay.

>	Based on this information, should I set the SYSGEN parameter so
>	that:
>
>	MVTIMEOUT > (15 + MINSWAP) * number of mounted jukebox devices ?
>
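>	As a worked example (my own arithmetic, not from the document):
>	with the minimum MINSWAP of 5 seconds mentioned in 5.2.C and our
>	usual 20 mounted volumes, that gives (15 + 5) * 20 = 400 seconds,
>	well below the current MVTIMEOUT of 3600.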

 
          5.2  Use of OSMS and OSDS without decDFS and decDNS

          5.2.B.  Platters must not be mounted /SERVED nor /CLUSTER nor
          /SYSTEM (which implies /SERVED).  Served platters trigger
          OpenVMS MSCP services which currently are not compatible with
          removable storage.

>	The cluster does not use decDFS and decDNS. Could those platters
>	still be mounted /CLUSTER or /SYSTEM?
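>
>	If serving is the issue, I assume a plain local mount on the node
>	with the autochanger would look like this; the device name and
>	volume label are made up:
>
>	$ MOUNT/NOWRITE $1$JBA1: OPTVOL01 OPTVOL01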

          5.2.C.  MOUNT and DISMOUNT operations may consume several hours
          on larger autochangers with as many as 288 volumes.  This time
          may be reduced by mounting only as many volumes as are
          absolutely required, by mounting volumes as /NOWRITE (readonly)
          volumes where possible, by using MCR JBUTIL SET PARAMETER
          /MINSWAP=5 to set the platter hold time to a minimum, and by
          keeping as few files open as is necessary.
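
>	To make 5.2.C concrete, I assume the tuning would look something
>	like this (the volume label is an example):
>
>	$ MCR JBUTIL SET PARAMETER/MINSWAP=5   ! minimum platter hold time
>	$ MOUNT/NOWRITE $1$JBA2: ARCHIVE02     ! read-only where possible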

          5.2.D.  Cluster transitions must be avoided.  Processors joining
          an active cluster do not respect the "special" nature of optical
          disks.  This limitation is particularly true when the node
          rejoining the cluster contains the interface adapter to the
          autochanger and drives.  Any cluster transition will cause all
          disks with outstanding I/O (optical as well as magnetic)
>         clusterwide to begin a mount-verification, which may not
>         complete before an operation timeout occurs which causes another
>         mount verification to begin, ad infinitum.


>	The customer often has to shut down and reboot one of the systems.
>	In that case, how can we keep the "operation timeout" situation
>	from happening?
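>
>	Given 5.2.D, my working assumption is that before any planned
>	shutdown we should quiesce optical I/O first, e.g. dismount the
>	mounted platters on that node (the device name is an example):
>
>	$ DISMOUNT/NOUNLOAD $1$JBA1:   ! repeat for each mounted platter
>	$ @SYS$SYSTEM:SHUTDOWN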