
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5297.0. "how do you cluster 2 nodes thru fddi?" by MANM01::NOELGESMUNDO () Fri Apr 25 1997 08:37

    We have a customer with an AS4100 (VMS 6.2-1H3) and a DEC 4000
    (VMS 6.1). These systems are supposed to be about 500 meters away
    from each other and will be connected and clustered through FDDI. I
    have seen brochures about multiple-site clustering, but I have not
    read how this is done, especially when there are only 2 nodes. We
    also have another client in the same situation, but with an 8400 and
    a 4100. On the 8400 and 4100 I have set VOTES=2 and 1 respectively,
    and EXPECTED_VOTES=2, so that the 8400 will still work even if the
    4100 goes down.
    
    What is the best way to configure a cluster of only 2 nodes over
    FDDI? Is it possible at all?  Since there seems to be no common disk
    between the 2 nodes, how do we set it up so that when one node
    fails, the remaining node continues to work?
    
    Any suggestions and tips will be greatly appreciated.
    
    Thanks.
    
    Noel Gesmundo
    MCS/Digital Eqpt Filipinas
5297.1. by BSS::JILSON (WFH in the Chemung River Valley) Fri Apr 25 1997 09:42
The easiest thing to do is add a quorum system.
5297.2. "Please Learn About Quorum Scheme..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Fri Apr 25 1997 10:37
: I have seen brochures about multiple-site clustering, but I have not
: read how this is done, especially when there are only 2 nodes.

   Please take the time to read through the _Guidelines for VMScluster
   Configurations_ manual.

   Do not attempt to bypass the quorum scheme -- if you have three
   votes configured, set EXPECTED_VOTES to three.  The quorum
   scheme is there to prevent user and system data corruption;
   it is not something that was implemented just to be bypassed.
   *Severe* user and system data corruption can arise -- and though
   the quorum scheme is designed to be difficult to bypass, there
   are a few configurations where folks have corrupted their disks.

   You will want to configure the nodes:

	peer:
		equal nodes.  Failure of either will cause
		the other to enter a "user data integrity
		interlock" (aka: quorum hang).  This is the
		classic "two node VMScluster", and this tends
		to be a configuration most folks avoid.

	primary-secondary:
		the AlphaServer 8400 has all the votes, and
		the AlphaServer 4100 can boot or fail without
		affecting the AlphaServer 8400.  Failure of
		the AlphaServer 8400 will cause the AlphaServer
		4100 to enter a "user data integrity interlock"
		(aka: quorum hang).

	with a third (voting) node:
		The best solution.  Failure of any one node
		will not affect the other two nodes.  Each
		node has VOTES=1 and each has EXPECTED_VOTES=3.
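
   As a rough sketch of these schemes (the MODPARAMS.DAT fragments
   below are illustrative -- vote counts as described above, plus the
   usual AUTOGEN step to apply them -- not values taken from the
   customer's actual systems):

	! SYS$SYSTEM:MODPARAMS.DAT on the AlphaServer 8400
	! (primary-secondary scheme: the 8400 holds the only vote)
	VOTES = 1
	EXPECTED_VOTES = 1

	! SYS$SYSTEM:MODPARAMS.DAT on the AlphaServer 4100
	VOTES = 0
	EXPECTED_VOTES = 1

	! Or, with a third voting node, on *each* of the three nodes:
	!	VOTES = 1
	!	EXPECTED_VOTES = 3

	! Validate and apply the changes with AUTOGEN:
	$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS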

    If a third node is not available, systems are often configured with
    a non-served shared quorum disk on a shared interconnect -- but with
    the distance you are considering, there are no shared storage
    interconnects available.
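
   For reference, where a shared interconnect does exist, the quorum
   disk is named by the DISK_QUORUM and QDSKVOTES system parameters.
   A sketch, with a made-up device name:

	! MODPARAMS.DAT on both nodes
	DISK_QUORUM = "$1$DUA10"	! shared quorum disk (placeholder)
	QDSKVOTES = 1			! votes the quorum disk contributes
	! EXPECTED_VOTES must count the disk's vote as well,
	! e.g. two one-vote nodes plus the disk:
	EXPECTED_VOTES = 3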
5297.3. "Check into the BRS products" by CSC32::B_HIBBERT (When in doubt, PANIC) Fri Apr 25 1997 12:00
    Just to reinforce Steve's emphasis on using the quorum scheme
    correctly, I will give an example of WHY you don't want each node
    to be able to continue independently in a 2-node multisite cluster.
    
    Consider a cluster that has 1 node at each of 2 sites.  Each site
    holds 1 member of a dual-member shadow set for data protection.  The
    cluster has its voting scheme set incorrectly, so that each node can
    run independently of the other.  A failure occurs in the
    interconnect (a backhoe cuts the fiber, a bridge dies, etc.).  Both
    nodes continue running, the shadow set splits, and data is being
    entered by users at both sites.  Sounds good, right?  WRONG!!!  Each
    copy of the database is being updated independently of the other.
    When the interconnect problem is repaired, you will have 2
    completely different copies of the database.  If you reform the
    shadow set, any modifications made from 1 of the systems will be
    lost when the shadow copy is done.  Data is LOST!!!
    You will be better off if you set the cluster up to hang when quorum
    is lost, or give 1 node all the votes so that it can continue while
    the other one hangs.  Or, better yet, add a 3rd node as a quorum
    vote server.
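
    (For the curious: the dual-member shadow set in this example comes
    from host-based volume shadowing.  A sketch with hypothetical device
    names, one member at each site; applications see only the virtual
    unit DSA1.)

	$ ! $1$DUA10: and $2$DUA10: are the physical members
	$ MOUNT/SYSTEM DSA1: /SHADOW=($1$DUA10:, $2$DUA10:) DATA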
    
    Please consider the BRS services for a multisite data center.  This
    product set is intended to discover issues like the one above and to
    provide solutions for them.  The product may seem expensive on the
    surface, but it is well worth it if your customer's data is
    important.
    
    Brian Hibbert
    
5297.4. "EXPECTED_VOTES should be called TOTAL_VOTES" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Fri Apr 25 1997 15:26
>    On the 8400 and 4100 I have set VOTES=2 and 1 respectively, and
>    EXPECTED_VOTES=2, so that the 8400 will still work even if the
>    4100 goes down.

There seems to be a common misconception about what EXPECTED_VOTES means.
Many people think it is the number of votes required for the cluster to
keep running, so they set it to just over 1/2 the total number of votes
available.  This is wrong.

EXPECTED_VOTES is supposed to be the *total* number of votes of all nodes
that might ever participate in the cluster.  VMS automatically derives a
quorum from it (a value just over 1/2 of EXPECTED_VOTES) that prevents you
from running a partitioned cluster, as explained in .3.

So in the example quoted at the top of the note, EXPECTED_VOTES should be
set to 2+1=3, since the two nodes have a total of 3 votes.
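
Concretely, the connection manager computes quorum as
(EXPECTED_VOTES + 2) / 2, truncated to an integer.  A quick DCL sketch
of the arithmetic for this example:

	$ EXPECTED_VOTES = 3
	$ QUORUM = (EXPECTED_VOTES + 2) / 2	! DCL integer division
	$ SHOW SYMBOL QUORUM			!   QUORUM = 2
	$ ! With quorum 2, the 8400 alone (2 votes) can keep running, while
	$ ! the 4100 alone (1 vote) hangs -- the two nodes can never run as
	$ ! independent partitions.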
5297.5. "Thanks. will check BRS." by MANM01::NOELGESMUNDO () Sat Apr 26 1997 03:44
    Thanks for all the replies.
    
    We have sold this setup to the customer and are expecting it to work
    without an additional node (the best solution!). I have already
    informed one of the customers about the quorum scheme, and that if
    the node with the higher number of votes fails, the whole cluster
    fails as well, and we may have to reboot the remaining good system
    with modified parameters.
    
    I have come across BRS in the Guidelines for OpenVMS Cluster
    Configurations and will have to do some research in this area, with
    the hope that it will help in this kind of situation.
    
    Regards.
    
    Noel
5297.6. "also check IPC mechanism" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Sat Apr 26 1997 07:14
    Noel,
    
    if the node with the 'higher' votes fails, the other node will
    just 'hang'. You don't have to reboot it -- just use the IPC
    interrupt to recalculate quorum. You can do this from the console
    with the following commands:
    
    	<CTRL/P>		! halt the hung node into console mode
    	>>> D SIRR C		! deposit C into the Software Interrupt
    				!  Request Register (requests an IPL C
    				!  software interrupt)
    	>>> C			! continue -- the IPC facility gets control
    	IPC> Q			! recalculate quorum
    	IPC> <CTRL/Z>		! exit IPC and resume normal operation
    
    Check the documentation. It works!  As a general warning: the
    customer should be well aware of the cluster mechanisms, and the
    system manager needs to be well trained. Running a cluster like this
    without 'cluster knowledge' and well-tested 'emergency' procedures
    puts your customer's data at risk!  That's what BRS is for...
    
    Volker.
5297.7. "EXPECTED_VOTES is a 'blade guard'..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Apr 28 1997 14:22
   And if EXPECTED_VOTES is mis-set, it will be corrected automatically
   as soon as the VMScluster connection manager notices the discrepancy.
   If the incorrect setting is not detected during bootstrap (due to
   some connectivity problem -- exactly the sort of situation this
   EXPECTED_VOTES setting is designed to detect), severe user and system
   disk data corruption can be expected, and has occurred.
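
   To see what the connection manager is actually using, the standard
   F$GETSYI item codes (and SHOW CLUSTER) report the live values -- a
   generic sketch, not specific to the systems above:

	$ WRITE SYS$OUTPUT F$GETSYI("CLUSTER_VOTES")	! votes currently present
	$ WRITE SYS$OUTPUT F$GETSYI("CLUSTER_QUORUM")	! quorum now in effect
	$ SHOW CLUSTER/CONTINUOUS			! watch values cluster-wide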