
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5312.0. "Optimum Subcluster Selection" by OSOV03::KAGEYAMA (I Got Rhythm!) Sun May 18 1997 02:42

One of my customers had all of his satellites take cluster exit 
(CLUEXIT) bugchecks at the same time. This was caused by disconnecting 
the Ethernet cable of one of his CI members.

His MI cluster consists of five CI nodes (one vote each) and three NI 
satellites (no votes). Disconnecting the Ethernet cable of one CI member 
left two candidate subclusters: all five CI nodes, as seen by the CI 
member that lost its NI connection, or the three satellites plus the 
other four CI members, as seen by the satellites, which could no longer 
see that CI member. The connection manager calculated the merit figure 
(256 x votes + number of subcluster members) of the former as 1285 
(256 x 5 + 5) and of the latter as 1032 (256 x 4 + 7). It chose the 
former as the optimal subcluster, so the satellites had to leave the 
cluster.

The algorithm is clear, but he would prefer one CI node crashing to all 
of the satellites crashing. We're considering two options:

1. Give each satellite one vote, and also give one vote to the quorum disk.

	EXPECTED_VOTES = 9
	VOTES = 1 (for all members)
	QDSKVOTES = 1

2. Give each satellite one vote and each CI node two votes, thus
  avoiding a quorum disk.

	EXPECTED_VOTES = 13
	VOTES = 2 (for CI members), VOTES = 1 (for NI satellites)
	QDSKVOTES = 0
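
To see how the two options would change the outcome of the same failure, the sketch below re-runs the merit comparison with the proposed vote settings. Two things here are assumptions rather than statements from this note: the quorum disk vote is credited to the CI-only side as a worst case, and quorum is taken as (EXPECTED_VOTES + 2) / 2 rounded down, which is the usual documented formula. All names are hypothetical.

```python
# Merit figure as described in this note: 256 x votes + number of members.
def merit(votes, members):
    return 256 * votes + members

# Assumed quorum formula: (EXPECTED_VOTES + 2) / 2, rounded down.
def quorum(expected_votes):
    return (expected_votes + 2) // 2

def compare(ci_votes, sat_votes, qdsk_votes=0):
    # CI-only side: 5 CI nodes, credited with the quorum disk (worst case, assumed).
    ci_only_votes = 5 * ci_votes + qdsk_votes
    # Satellite side: 4 CI nodes plus 3 satellites.
    with_sats_votes = 4 * ci_votes + 3 * sat_votes
    return merit(ci_only_votes, 5), merit(with_sats_votes, 7), with_sats_votes

# Option 1: everyone has 1 vote, quorum disk has 1 vote, EXPECTED_VOTES = 9.
opt1_ci, opt1_sats, opt1_sat_votes = compare(ci_votes=1, sat_votes=1, qdsk_votes=1)
# Option 2: CI nodes 2 votes, satellites 1 vote, no quorum disk, EXPECTED_VOTES = 13.
opt2_ci, opt2_sats, opt2_sat_votes = compare(ci_votes=2, sat_votes=1)

print("option 1:", opt1_ci, opt1_sats, "quorum", quorum(9))
print("option 2:", opt2_ci, opt2_sats, "quorum", quorum(13))
```

Under these assumptions the subcluster containing the satellites has the higher merit figure in both options and still holds quorum, so the isolated CI node would be the one to leave.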

Any opinions would be appreciated. Regards.

- Kazunori


Detailed information

	Two AlphaServer 8400s, one DEC7730, one DEC7630, one VAX6610 (the
	one whose Ethernet cable was disconnected), three VAXstation 4000s.
	OpenVMS Alpha V6.2-1H3 and OpenVMS VAX V6.2 as the operating systems.
5312.1. by ALEPPO::mse_notbuk.mse.tay.dec.com::bowker Mon May 19 1997 09:19

The behaviour you saw is expected and normal. Either one of the proposals you 
specified would prevent the cluexit that you experienced. Of the two choices I would 
prefer the second, as it would give more weight to the CI nodes and avoid a quorum 
disk.

A third choice would be to give one satellite a vote and all the other satellites 
zero votes. This would prevent the cluexit for the case specified.

Joe
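
For the third choice, the same merit figure from the base note (256 x votes + number of members) can be checked with a short Python sketch. The names are hypothetical and the calculation only covers the failure case described in .0.

```python
# Merit figure as described in the base note: 256 x votes + number of members.
def merit(votes, members):
    return 256 * votes + members

# CI-only side: five CI nodes, one vote each.
third_ci_only = merit(votes=5, members=5)

# Satellite side: four CI nodes (one vote each) plus three satellites,
# only one of which has a vote.
third_with_satellites = merit(votes=4 + 1, members=7)

print(third_ci_only, third_with_satellites)
```

Both sides end up with five votes, so the member count decides it in favor of the side that includes the satellites.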