Title: | + OpenVMS Clusters - The best clusters in the world! + |
Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
Moderator: | PROXY::MOORE |
|
Created: | Fri Aug 26 1988 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 5320 |
Total number of notes: | 23384 |
5312.0. "Optimum Subcluster Selection" by OSOV03::KAGEYAMA (I Got Rhythm!) Sun May 18 1997 02:42
One of my customers experienced cluster exit bugchecks on all of his
satellites at the same time. This was caused by disconnecting the Ethernet
cable of one of his CI members.
His MI-cluster consists of five CI nodes (one vote each) and three
NI satellites (no votes). When the Ethernet cable of one CI member was
disconnected, the cluster could re-form in one of two ways: as the five CI
nodes, as seen by the CI member that lost its NI connection, or as the
three satellites plus four CI members, as seen by the satellites, which
could no longer see that CI member. The connection manager calculated the
merit figure (256 x votes + number of subcluster members) of the former as
1285 (256 x 5 + 5) and of the latter as 1031 (256 x 4 + 7). It chose the
former as the optimal subcluster, so the satellites had to leave the
cluster.
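The arithmetic above can be checked with a minimal sketch. It assumes only the merit formula as quoted in this note (256 x votes + number of members); the function and variable names are hypothetical and are not part of any OpenVMS interface. Evaluating the formula gives 1285 for the CI-only subcluster and 1031 for the subcluster that includes the satellites.

```python
def merit(votes: int, members: int) -> int:
    """Merit figure as quoted in the note: 256 x votes + number of members."""
    return 256 * votes + members

# Subcluster seen by the CI member that lost its NI connection:
# five CI nodes with one vote each.
ci_only = merit(votes=5, members=5)

# Subcluster seen by the satellites: four CI nodes (one vote each)
# plus three satellites (no votes).
with_satellites = merit(votes=4, members=7)

print(ci_only, with_satellites)  # 1285 1031
```

Because votes are weighted by 256, the extra vote on the CI-only side outweighs the two extra members on the other side.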
The algorithm is clear, but he would rather have one CI node crash than
all of the satellites. We are considering one of the following:
1. give each satellite one vote and also give one vote to the quorum disk
EXPECTED_VOTES = 9
VOTES = 1 (for all members)
QDSKVOTES = 1
2. give each satellite one vote and each CI node two votes, thus
avoiding a quorum disk.
EXPECTED_VOTES = 13
VOTES = 2 (for CI members), VOTES = 1 (for NI satellites)
QDSKVOTES = 0
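As a rough check of both proposals against the same failure, the sketch below recomputes the two merit figures using the formula quoted in this note. Some assumptions are mine: the quorum disk vote in proposal 1 is left out of both sides, since the note does not say which subcluster would count it, and quorum is taken as (EXPECTED_VOTES + 2) / 2 truncated. All names in the code are hypothetical.

```python
def merit(votes: int, members: int) -> int:
    """Merit figure as quoted in the note: 256 x votes + number of members."""
    return 256 * votes + members

def quorum(expected_votes: int) -> int:
    """Assumed quorum rule: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2

# name -> (votes per CI node, votes per satellite, EXPECTED_VOTES)
proposals = {
    "proposal 1": (1, 1, 9),   # quorum disk vote not counted on either side (assumption)
    "proposal 2": (2, 1, 13),
}

results = {}
for name, (ci_votes, sat_votes, expected) in proposals.items():
    # Five CI nodes, as seen by the CI member that lost its NI connection.
    ci_only = merit(5 * ci_votes, 5)
    # Four CI nodes plus three satellites, as seen by the satellites.
    with_satellites = merit(4 * ci_votes + 3 * sat_votes, 7)
    results[name] = (ci_only, with_satellites, quorum(expected))
    print(name, results[name])
```

Under these assumptions the subcluster containing the satellites has the higher merit figure in both proposals, and its vote total (7 and 11 respectively) is at or above the computed quorum.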
Any opinions would be appreciated. Regards.
- Kazunori
Detailed information
Two AlphaServer 8400s, one DEC7730, one DEC7630, one VAX6610 (the
one whose Ethernet cable was disconnected), and three VAXstation 4000s.
The operating systems are OpenVMS Alpha V6.2-1H3 and OpenVMS VAX V6.2.
T.R | Title | User | Personal Name | Date | Lines |
---|
5312.1 | | ALEPPO::mse_notbuk.mse.tay.dec.com::bowker | [email protected] | Mon May 19 1997 09:19 | 9 |
| The behaviour you saw is expected and normal. Either of the proposals you
specified would prevent the cluexit that you experienced. Of the two, I would
prefer the second, as it gives more weight to the CI nodes and avoids a quorum
disk.
A third choice would be to give one satellite a vote and all the other satellites
zero votes. This would prevent the cluexit in the case specified.
Joe
|
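The third choice from reply .1 can be checked the same way. This is a sketch that assumes the merit formula quoted in the base note (256 x votes + number of members) and that the CI nodes keep one vote each; the names are hypothetical.

```python
def merit(votes: int, members: int) -> int:
    """Merit figure as quoted in the base note: 256 x votes + number of members."""
    return 256 * votes + members

# Five CI nodes with one vote each.
ci_only = merit(votes=5, members=5)

# Four CI nodes (one vote each) plus three satellites,
# exactly one of which holds a vote.
with_satellites = merit(votes=4 + 1, members=7)

print(ci_only, with_satellites)  # 1285 1287
```

With the vote totals tied at five, the member count decides the outcome, so the subcluster containing the satellites wins by a margin of two.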