T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
5297.1 | | BSS::JILSON | WFH in the Chemung River Valley | Fri Apr 25 1997 09:42 | 1 |
| The easiest thing to do is add a quorum system.
|
5297.2 | Please Learn About Quorum Scheme... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri Apr 25 1997 10:37 | 40 |
| : I have been seeing brochures of multiple site clustering but I have
:not read on how this is done, especially if there are only 2 nodes.
Please take the time to read through the _Guidelines for VMScluster
Configurations_ manual.
Do not attempt to bypass the quorum scheme -- if you have three
votes configured, set the EXPECTED_VOTES to three. The quorum
scheme is present to prevent user and system data corruption;
it is not something that was implemented just to be bypassed.
*Severe* user and system data corruptions can arise -- though
the quorum scheme is designed to be difficult to bypass, there
are a few configurations where folks have corrupted their disks.
You will want to configure the nodes:

peer:
        equal nodes. Failure of either will cause the other to
        enter a "user data integrity interlock" (aka: quorum
        hang). This is the classic "two-node VMScluster", and
        this tends to be a configuration most folks avoid.

primary-secondary:
        the AlphaServer 8400 has all the votes, and the
        AlphaServer 4100 can boot or fail without affecting the
        AlphaServer 8400. Failure of the AlphaServer 8400 will
        cause the AlphaServer 4100 to enter a "user data
        integrity interlock" (aka: quorum hang).

with a third (voting) node:
        the best solution. Failure of any one node will not
        affect the other two nodes. Each node has VOTES=1 and
        each has EXPECTED_VOTES=3.
If a third node is not available, systems are often configured with
a non-served shared quorum disk on a shared interconnect -- but with
the distance you are considering, there are no shared storage
interconnects available.
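The arithmetic behind the three configurations above can be sketched as follows. This is an illustration only (OpenVMS computes this internally in the connection manager); it assumes the documented derivation QUORUM = (EXPECTED_VOTES + 2) / 2 with integer division:

```python
def quorum(expected_votes):
    # OpenVMS derives quorum as "just over half" the expected votes:
    # QUORUM = (EXPECTED_VOTES + 2) / 2, using integer division.
    return (expected_votes + 2) // 2

def can_run(surviving_votes, expected_votes):
    # A (partition of a) cluster may continue only while its votes meet quorum.
    return surviving_votes >= quorum(expected_votes)

# peer: VOTES=1 each, EXPECTED_VOTES=2 -> quorum 2;
# either failure hangs the survivor.
print(can_run(1, 2))   # False

# primary-secondary: 8400 has all the votes, EXPECTED_VOTES=1 -> quorum 1.
print(can_run(1, 1))   # True  (8400 alone keeps running)
print(can_run(0, 1))   # False (4100 alone hangs)

# third voting node: VOTES=1 each, EXPECTED_VOTES=3 -> quorum 2.
print(can_run(2, 3))   # True  (any two nodes survive)
print(can_run(1, 3))   # False (a lone node hangs rather than partition)
```

Note that in every configuration a lone minority node hangs instead of running independently; that is the point of the scheme.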
|
5297.3 | Check into the BRS products | CSC32::B_HIBBERT | When in doubt, PANIC | Fri Apr 25 1997 12:00 | 28 |
| Just to reinforce Steve's emphasis on using the quorum scheme
correctly, I will give an example of WHY you don't want each node
to be able to continue independently in a 2 node multisite cluster.
Consider a cluster that has 1 node at each of 2 sites. Each site holds 1
member of a dual-member shadow set for data protection. The cluster
has its voting scheme set incorrectly, so that each node can run
independently of the other. A failure occurs in the interconnect
(backhoe cuts the fiber, bridge dies, etc.). Both nodes continue
running, the shadow sets split, data is being entered by users at both
sites. Sounds good right? WRONG!!! Each copy of the database is being
updated independently of the other. When the interconnect problem is
repaired you will have 2 completely different copies of the database.
If you reform the shadow sets, any modifications made from 1 of the
systems will be lost when the shadow copy is done. Data is LOST!!!
If you set the cluster up to hang when quorum is lost, or set 1 node to
have all the votes so that it can continue while the other one hangs,
you will be better off. Or better yet, add a 3rd node as a quorum
vote server.
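The partitioning failure above can be sketched numerically. A minimal Python illustration (again assuming QUORUM = (EXPECTED_VOTES + 2) // 2; the vote values are hypothetical):

```python
def quorum(expected_votes):
    # QUORUM = (EXPECTED_VOTES + 2) // 2 -- "just over half" the votes.
    return (expected_votes + 2) // 2

# Misconfigured: each node has VOTES=1 but EXPECTED_VOTES=1, so after the
# fiber is cut each one-node partition still believes it has quorum.
both_sites_run = all(votes >= quorum(1) for votes in (1, 1))
print(both_sites_run)  # True -- split brain: both shadow-set members diverge

# Correct: EXPECTED_VOTES=2 (the total of all votes). Quorum is 2, so
# neither isolated node can continue; both hang instead of diverging.
any_site_runs = any(votes >= quorum(2) for votes in (1, 1))
print(any_site_runs)   # False -- no partition runs alone; data stays consistent
```

With the correct setting you lose availability during the outage, but you never end up with two different copies of the database.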
Please consider the BRS services for a multisite data center. This
product set is intended to discover issues like the above and to
provide solutions for the problems. The product may seem expensive on
the surface, but it is well worth it if your customer's data is
important.
Brian Hibbert
|
5297.4 | EXPECTED_VOTES should be called TOTAL_VOTES | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Fri Apr 25 1997 15:26 | 17 |
| > On the 8400 and
> 4100 I have set votes=2 and 1 for 8400 and 4100 respectively and
> expected votes=2 so that 8400 will still work even if the 4100 goes
> down.
There seems to be a common misconception about what EXPECTED_VOTES means.
Many people think it's the number of votes required for the cluster to
keep running, so they set it to just over 1/2 the total number of votes
that are available. This is wrong.
EXPECTED_VOTES is supposed to be the *total* number of votes in all nodes
that might ever participate in the cluster. VMS automatically figures out
a quorum (a value just over 1/2 of EXPECTED_VOTES) that prevents you from
running a partitioned cluster, as explained in .3.
So in the example quoted at the top of the note, EXPECTED_VOTES should be
set to 2+1=3, since the two nodes have a total of 3 votes.
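The arithmetic for the quoted example, as an illustrative sketch (OpenVMS does this itself; the formula assumed is QUORUM = (EXPECTED_VOTES + 2) // 2):

```python
def quorum(expected_votes):
    # OpenVMS derives QUORUM = (EXPECTED_VOTES + 2) // 2 -- just over half.
    return (expected_votes + 2) // 2

# EXPECTED_VOTES is the TOTAL votes of all possible members:
# 2 (the 8400) + 1 (the 4100) = 3.
print(quorum(3))  # 2 -> the 8400 (2 votes) can run alone;
                  #      the 4100 (1 vote) hangs rather than partition
```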
|
5297.5 | Thanks. will check BRS. | MANM01::NOELGESMUNDO | | Sat Apr 26 1997 03:44 | 15 |
| Thanks for all the replies.
We have sold this setup to this customer and are expecting it to work
without an additional node (the best solution!). I have already informed
the customer about the quorum scheme, and that if the node with the
higher number of votes fails, the whole cluster fails as well and we may
have to reboot the remaining good system with modified parameters.
I have come across BRS in the Guidelines for OpenVMS Cluster
Configurations and will have to do some research in this area in the
hope that it will help in this kind of situation.
Regards.
Noel
|
5297.6 | also check IPC mechanism | HAN::HALLE | Volker Halle MCS @HAO DTN 863-5216 | Sat Apr 26 1997 07:14 | 20 |
| Noel,
in case the node with the 'higher' votes fails, the other node will
just 'hang'. You don't have to reboot it, just use the IPC interrupt
and recalculate quorum. You can do this from the console with the
following commands:
<CTRL/P>
>>> D SIRR C
>>> C
IPC> Q
IPC> <CTRL/Z>
Check the documentation. It works! As a general warning: the customer
should be well aware of the cluster mechanisms, and the system manager
needs to be well trained. Running a cluster like this without 'cluster
knowledge' and well-tested 'emergency' procedures puts your customer's
data at risk! That's what BRS is for...
Volker.
|
5297.7 | EXPECTED_VOTES is a "blade guard"... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Apr 28 1997 14:22 | 8 |
|
And if EXPECTED_VOTES is mis-set, it will be corrected automatically
as soon as it is noticed by the VMScluster connection manager. But if
the incorrect setting is not detected during bootstrap (due to some
connectivity problem -- the very situation this EXPECTED_VOTES setting
is designed and intended to detect), severe user and system disk
data corruptions can be expected, and have occurred.
|