
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5297.0. "how do you cluster 2 nodes thru fddi?" by MANM01::NOELGESMUNDO () Fri Apr 25 1997 08:37

    We have a customer with an AS4100 (VMS 6.2-1H3) and a DEC 4000
    (VMS 6.1). These systems are supposed to be about 500 meters away
    from each other and will be connected and clustered through FDDI. I
    have seen brochures about multiple-site clustering, but I have not
    read how this is done, especially when there are only 2 nodes. We
    also have another client in the same situation, but with an 8400 and
    a 4100. On the 8400 and 4100 I have set VOTES=2 and 1 respectively,
    and EXPECTED_VOTES=2, so that the 8400 will still work even if the
    4100 goes down.
    
    What is the best way to configure a cluster of only 2 nodes over
    FDDI? Is it possible at all?  Since there seems to be no common disk
    between the 2 nodes, how do we set it up so that when one node
    fails, the remaining node continues to work?
    
    Any suggestions and tips will be greatly appreciated.
    
    Thanks.
    
    Noel Gesmundo
    MCS/Digital Eqpt Filipinas
5297.1. by BSS::JILSON (WFH in the Chemung River Valley) Fri Apr 25 1997 09:42
The easiest thing to do is add a quorum system.
5297.2. "Please Learn About Quorum Scheme..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Fri Apr 25 1997 10:37
: I have seen brochures about multiple-site clustering, but I have not
: read how this is done, especially when there are only 2 nodes.

   Please take the time to read through the _Guidelines for VMScluster
   Configurations_ manual.

   Do not attempt to bypass the quorum scheme -- if you have three
   votes configured, set EXPECTED_VOTES to three.  The quorum
   scheme is there to prevent user and system data corruption;
   it is not something that was implemented just to be bypassed.
   *Severe* user and system data corruption can arise -- and though
   the quorum scheme is designed to be difficult to bypass, there
   are a few configurations where folks have corrupted their disks.

   You will want to configure the nodes:

	peer:
		equal nodes.  Failure of either will cause
		the other to enter a "user data integrity
		interlock" (aka: quorum hang).  This is the
		classic "two node VMScluster", and this tends
		to be a configuration most folks avoid.

	primary-secondary:
		the AlphaServer 8400 has all the votes, and
		the AlphaServer 4100 can boot or fail without
		affecting the AlphaServer 8400.  Failure of
		the AlphaServer 8400 will cause the AlphaServer
		4100 to enter a "user data integrity interlock"
		(aka: quorum hang).

	with a third (voting) node:
		The best solution.  Failure of any one node
		will not affect the other two nodes.  Each
		node has VOTES=1 and each has EXPECTED_VOTES=3.
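
   As a rough sketch of these schemes (the MODPARAMS.DAT fragments
   below are illustrative -- vote counts as described above, plus the
   usual AUTOGEN step to apply them -- not values taken from the
   customer's actual systems):

	! SYS$SYSTEM:MODPARAMS.DAT on the AlphaServer 8400
	! (primary-secondary scheme: the 8400 holds the only vote)
	VOTES = 1
	EXPECTED_VOTES = 1

	! SYS$SYSTEM:MODPARAMS.DAT on the AlphaServer 4100
	VOTES = 0
	EXPECTED_VOTES = 1

	! Or, with a third voting node, on *each* of the three nodes:
	!	VOTES = 1
	!	EXPECTED_VOTES = 3

	! Validate and apply the changes with AUTOGEN:
	$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS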

    If a third node is not available, systems are often configured with
    a non-served shared quorum disk on a shared interconnect -- but with
    the distance you are considering, there are no shared storage
    interconnects available.
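
   For reference, where a shared interconnect does exist, the quorum
   disk is named by the DISK_QUORUM and QDSKVOTES system parameters.
   A sketch, with a made-up device name:

	! MODPARAMS.DAT on both nodes
	DISK_QUORUM = "$1$DUA10"	! shared quorum disk (placeholder)
	QDSKVOTES = 1			! votes the quorum disk contributes
	! EXPECTED_VOTES must count the disk's vote as well,
	! e.g. two one-vote nodes plus the disk:
	EXPECTED_VOTES = 3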
5297.3. "Check into the BRS products" by CSC32::B_HIBBERT (When in doubt, PANIC) Fri Apr 25 1997 12:00
    Just to reinforce Steve's emphasis on using the quorum scheme
    correctly, I will give an example of WHY you don't want each node
    to be able to continue independently in a 2-node multisite cluster.
    
    Consider a cluster that has 1 node at each of 2 sites.  Each site
    holds 1 member of a dual-member shadow set for data protection.  The
    cluster has its voting scheme set incorrectly, so that each node can
    run independently of the other.  A failure occurs in the
    interconnect (a backhoe cuts the fiber, a bridge dies, etc.).  Both
    nodes continue running, the shadow set splits, and data is being
    entered by users at both sites.  Sounds good, right?  WRONG!!!  Each
    copy of the database is being updated independently of the other.
    When the interconnect problem is repaired, you will have 2
    completely different copies of the database.  If you reform the
    shadow set, any modifications made from 1 of the systems will be
    lost when the shadow copy is done.  Data is LOST!!!
    You will be better off if you set the cluster up to hang when quorum
    is lost, or give 1 node all the votes so that it can continue while
    the other one hangs.  Or, better yet, add a 3rd node as a quorum
    vote server.
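
    (For the curious: the dual-member shadow set in this example comes
    from host-based volume shadowing.  A sketch with hypothetical device
    names, one member at each site; applications see only the virtual
    unit DSA1.)

	$ ! $1$DUA10: and $2$DUA10: are the physical members
	$ MOUNT/SYSTEM DSA1: /SHADOW=($1$DUA10:, $2$DUA10:) DATA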
    
    Please consider the BRS services for a multisite data center.  This
    product set is intended to discover issues like the one above and to
    provide solutions for them.  The product may seem expensive on the
    surface, but it is well worth it if your customer's data is
    important.
    
    Brian Hibbert
    
5297.4. "EXPECTED_VOTES should be called TOTAL_VOTES" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Fri Apr 25 1997 15:26
>    On the 8400 and 4100 I have set VOTES=2 and 1 respectively, and
>    EXPECTED_VOTES=2, so that the 8400 will still work even if the
>    4100 goes down.

There seems to be a common misconception about what EXPECTED_VOTES means.
Many people think it is the number of votes required for the cluster to
keep running, so they set it to just over 1/2 the total number of votes
available.  This is wrong.

EXPECTED_VOTES is supposed to be the *total* number of votes of all nodes
that might ever participate in the cluster.  VMS automatically derives a
quorum from it (a value just over 1/2 of EXPECTED_VOTES) that prevents you
from running a partitioned cluster, as explained in .3.

So in the example quoted at the top of the note, EXPECTED_VOTES should be
set to 2+1=3, since the two nodes have a total of 3 votes.
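
Concretely, the connection manager computes quorum as
(EXPECTED_VOTES + 2) / 2, truncated to an integer.  A quick DCL sketch
of the arithmetic for this example:

	$ EXPECTED_VOTES = 3
	$ QUORUM = (EXPECTED_VOTES + 2) / 2	! DCL integer division
	$ SHOW SYMBOL QUORUM			!   QUORUM = 2
	$ ! With quorum 2, the 8400 alone (2 votes) can keep running, while
	$ ! the 4100 alone (1 vote) hangs -- the two nodes can never run as
	$ ! independent partitions.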
5297.5. "Thanks. will check BRS." by MANM01::NOELGESMUNDO () Sat Apr 26 1997 03:44
    Thanks for all the replies.
    
    We have sold this setup to the customer and are expecting it to work
    without an additional node (the best solution!). I have already
    informed one of the customers about the quorum scheme, and that if
    the node with the higher number of votes fails, the whole cluster
    fails as well, and we may have to reboot the remaining good system
    with modified parameters.
    
    I have come across BRS in the Guidelines for OpenVMS Cluster
    Configurations and will have to do some research in this area, with
    the hope that it will help in this kind of situation.
    
    Regards.
    
    Noel
5297.6. "also check IPC mechanism" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Sat Apr 26 1997 07:14
    Noel,
    
    if the node with the 'higher' votes fails, the other node will
    just 'hang'. You don't have to reboot it -- just use the IPC
    interrupt to recalculate quorum. You can do this from the console
    with the following commands:
    
    	<CTRL/P>		! halt the hung node into console mode
    	>>> D SIRR C		! deposit C into the Software Interrupt
    				!  Request Register (requests an IPL C
    				!  software interrupt)
    	>>> C			! continue -- the IPC facility gets control
    	IPC> Q			! recalculate quorum
    	IPC> <CTRL/Z>		! exit IPC and resume normal operation
    
    Check the documentation. It works!  As a general warning: the
    customer should be well aware of the cluster mechanisms, and the
    system manager needs to be well trained. Running a cluster like this
    without 'cluster knowledge' and well-tested 'emergency' procedures
    puts your customer's data at risk!  That's what BRS is for...
    
    Volker.
5297.7. "EXPECTED_VOTES is a 'blade guard'..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Apr 28 1997 14:22
   And if EXPECTED_VOTES is mis-set, it will be corrected automatically
   as soon as the VMScluster connection manager notices the discrepancy.
   If the incorrect setting is not detected during bootstrap (due to
   some connectivity problem -- exactly the sort of situation this
   EXPECTED_VOTES setting is designed to detect), severe user and system
   disk data corruption can be expected, and has occurred.
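
   To see what the connection manager is actually using, the standard
   F$GETSYI item codes (and SHOW CLUSTER) report the live values -- a
   generic sketch, not specific to the systems above:

	$ WRITE SYS$OUTPUT F$GETSYI("CLUSTER_VOTES")	! votes currently present
	$ WRITE SYS$OUTPUT F$GETSYI("CLUSTER_QUORUM")	! quorum now in effect
	$ SHOW CLUSTER/CONTINUOUS			! watch values cluster-wide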