
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5234.0. "Quorum Disks in SCSI Clusters" by STAR::NCARR (Talk dates & features - but never together....) Fri Feb 21 1997 14:45

We've recently realised that the documentation for Quorum Disks in SCSI
Clusters is somewhat lacking. This will get fixed in the next release.

In the meantime note that in most cases QDSKINTERVAL for a Quorum Disk in a
SCSI Cluster should be set to one (1). This is because when a node boots it
issues SCSI bus resets. These resets cause nodes that are already running to
Mount Verify the disks and re-validate the Quorum file. In many configurations
this will lead to the loss of Quorum for 40 seconds (four times the default
QDSKINTERVAL). This does not happen with CI/DSSI clusters.
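
The arithmetic behind the 40-second figure can be sketched as follows. This
is only an illustration: the function name is made up, and it assumes the
"four times QDSKINTERVAL" relationship stated above also holds for values
other than the default.

```python
def quorum_loss_window(qdskinterval_seconds: int) -> int:
    """Approximate seconds without quorum after a SCSI bus reset.

    Assumption: the loss lasts four QDSKINTERVAL periods, as the note
    states for the default value of 10 seconds.
    """
    if qdskinterval_seconds < 1:
        raise ValueError("QDSKINTERVAL must be at least 1 second")
    return 4 * qdskinterval_seconds


# Compare the current default (10) with the values discussed in this note.
for interval in (10, 3, 1):
    window = quorum_loss_window(interval)
    print(f"QDSKINTERVAL={interval:>2}: about {window} s without quorum")
```

Under that assumption, dropping the value from 10 to 1 shortens the window
from about 40 seconds to about 4.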

Setting QDSKINTERVAL to 1 will result in two I/Os per second (one read and one
write) to the Quorum Disk. Note that only one node in a cluster ever performs
I/O to the Quorum file. It does this on behalf of all other nodes in the
cluster (regardless of their DISK_QUORUM value).
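
As a rough estimate of the resulting load, the sketch below computes the
I/O rate for a given QDSKINTERVAL. It assumes one read and one write per
interval for any value (the note only states this for a value of 1), and
the function name is hypothetical.

```python
def quorum_disk_io_rate(qdskinterval_seconds: int) -> float:
    """I/Os per second to the quorum file for the whole cluster.

    Assumption: one read plus one write every QDSKINTERVAL seconds.
    The rate is cluster-wide rather than per node, because only one
    node performs I/O to the quorum file.
    """
    if qdskinterval_seconds < 1:
        raise ValueError("QDSKINTERVAL must be at least 1 second")
    return 2 / qdskinterval_seconds


for interval in (10, 3, 1):
    rate = quorum_disk_io_rate(interval)
    print(f"QDSKINTERVAL={interval:>2}: {rate:.2f} I/Os per second")
```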

(In other words, even when there are multiple Quorum Disk Watchers, only one of
them performs I/Os to the quorum file. If the cluster partitions, one node in
each partition will perform I/Os to the Quorum file, and the multiple writers
will be detected.)

It is essential that QDSKINTERVAL values do not vary by more than a
factor of four across all the nodes in the cluster. Failure to do this could
result in cluster partitions not being detected correctly. (For example, do not
have nodes with QDSKINTERVAL values of 2 and 9 in the same cluster; values of 2
and 8 would be OK.)  It is strongly recommended that QDSKINTERVAL be the same
value on all nodes.  You may set different values temporarily (within the
factor of four limitation) in order to transition all the nodes in a
cluster to a new value.
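
The factor-of-four rule and a staged transition can be sketched like this.
The helper names are hypothetical, and the planner is only one possible way
to choose intermediate values; it assumes all nodes are moved to each
intermediate value before the next step starts.

```python
import math

MAX_RATIO = 4  # largest allowed ratio between any two QDSKINTERVAL values


def within_factor_of_four(values):
    """True if no two values differ by more than a factor of four."""
    if not values or min(values) < 1:
        raise ValueError("need at least one value, all >= 1")
    return max(values) <= MAX_RATIO * min(values)


def transition_steps(current, target):
    """Values to pass through so each step stays within the limit.

    While nodes are being moved from one step to the next, the cluster
    holds only those two values, so their ratio never exceeds four.
    """
    if current < 1 or target < 1:
        raise ValueError("QDSKINTERVAL values must be >= 1")
    steps = [current]
    while current != target:
        if target < current:
            current = max(target, math.ceil(current / MAX_RATIO))
        else:
            current = min(target, current * MAX_RATIO)
        steps.append(current)
    return steps


print(within_factor_of_four([2, 9]))   # False: 9 is more than 4 * 2
print(within_factor_of_four([2, 8]))   # True: exactly a factor of four
print(transition_steps(10, 1))         # [10, 3, 1]
```

Going straight from 10 to 1 would leave nodes a factor of ten apart during
the change, so the sketch inserts an intermediate value.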

At the next release of OpenVMS we will reduce the default value of QDSKINTERVAL
from 10 seconds to 3. In the release after that we'll probably reduce it to 1.


5234.1. "connection lost... Oh No!" by CSC32::T_SULLIVAN Fri Feb 21 1997 16:54

   I understand why you want to change the default value for QDSKINTERVAL;
   however, this will most likely precipitate a flood of calls into the CSC.
   Most system managers haven't a clue what the following messages mean and
   tend to get very excited when they see them on their console.

	%CNXMAN, Timed-out I/O operation to quorum disk
	%CNXMAN, Lost "connection" to quorum disk

   We presently receive quite a few calls concerning this (usually induced
   by backup) and lowering QDSKINTERVAL will cause the event to be triggered
   more frequently.  Perhaps these messages should be disabled :-)

5234.2. by CPEEDY::CONWAY Sun Feb 23 1997 12:41

    Is it true that the quorum disk in a SCSI cluster must be the system
disk?  I have a cluster where that seems to be true: I could not get quorum
when the quorum disk was some other disk on PKB.

Steve

5234.3. by COVERT::COVERT (John R. Covert) Sun Feb 23 1997 20:13

The quorum disk may be any shared disk.  It does not have to be the system
disk.

/john

5234.4. "Remember: Quorum needed to make QUORUM.DAT" by OSLAGE::AGE_P (Aage Ronning, Oslo, Norway, (DTN 872-8464)) Mon Feb 24 1997 02:00

>> .2 I could not get quorum when the quorum disk was some other disk on PKB.

Remember that you need enough votes to get quorum before QUORUM.DAT is made.
The quorum disk does NOT count before the file is made the first time.
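
A small sketch of why this bites on a first boot. It uses the quorum formula
from the OpenVMS cluster documentation, (EXPECTED_VOTES + 2) / 2 rounded
down, which is not quoted in this thread. The function names and the
two-node configuration are made up for illustration, and the model ignores
everything except vote counting.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, rounded down."""
    return (expected_votes + 2) // 2


def has_quorum(node_votes, qdskvotes, expected_votes, quorum_file_exists):
    """Check whether the booted nodes (plus quorum disk) reach quorum.

    Assumption: the quorum disk contributes QDSKVOTES only once
    QUORUM.DAT has been created, as the note above describes.
    """
    total = sum(node_votes)
    if quorum_file_exists:
        total += qdskvotes
    return total >= quorum(expected_votes)


# Hypothetical cluster: two nodes with VOTES=1, QDSKVOTES=1, EXPECTED_VOTES=3.
print(quorum(3))                          # 2
print(has_quorum([1], 1, 3, False))       # False: one node, no QUORUM.DAT yet
print(has_quorum([1, 1], 1, 3, False))    # True: both nodes up, file can be made
print(has_quorum([1], 1, 3, True))        # True: one node plus the quorum disk
```

In this model, both nodes must be up the first time so that QUORUM.DAT can
be created; after that, a single node plus the quorum disk is enough.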

\Åge

5234.5. "Non-Shadowed Multi-Path Disk..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Feb 24 1997 10:51

:The quorum disk may be any shared disk.  It does not have to be the system
:disk.

   Any shared (multiple-path-accessible) disk (that is not shadowed).

   If it's shadowed, or if there is only one host that can directly
   access the disk, don't bother configuring the disk as a quorum disk.

   To avoid a glitch in the quorum access code, keep the disk mounted.
   (This is not strictly necessary, but suppresses some error messages
   generated on some OpenVMS versions in some situations...)

5234.6. by EEMELI::MOSER (Orienteers do it in the bush...) Mon Feb 24 1997 12:43

    To clarify the term 'shadowed' disk: a host-based
    shadowed disk *cannot* be used as a quorum disk, but a controller-based
    shadowed (mirrored) disk *can* be used as a quorum disk.
    
    /cmos

5234.7. "Single Quorum Disk Watcher?" by CSC32::K_JOHNSON Wed Mar 05 1997 13:47

Nick,

In your base note, you state that:

> even when there are multiple Quorum Disk Watchers, only one of them 
> performs I/Os to the quorum file. If the cluster partitions, one node in
> each partition will perform I/Os to the Quorum file, and the multiple writers
> will be detected.

Is this a recent development? My understanding has always been that all
Quorum Disk Watchers (systems with local access to the Quorum disk and
DISK_QUORUM defined) perform read/write I/O to the Quorum disk. Would you
happen to know if/where this is documented? Should I be looking in the
Connection manager code?

On the same subject- do you know if any thought has been given to 
processing Quorum Disk read/write sequences at a higher IPL? It just
seems to us that this would allow for more timely verification and 
reduce the number of "lost connection" events.

Thanks for any thoughts you have on this,

Kevin Johnson
OpenVMS Internals/Drivers
Colorado CSC

5234.8. by 60326::KAGEYAMA (Trust, but Verify) Wed Mar 05 1997 20:56

re>                        -< Single Quorum Disk Watcher? >-

See "VAXcluster Principles" written by Roy G. Davis. Especially the 
middle of page 7-16.

  ... Given that there are multiple Watchers, a system in remote access 
  mode will select one Watcher at random, and then send to that Watcher 
  all of its inquiries regarding the trustworthiness of the quorum disk. 
  If that Watcher leaves the cluster, the system in remote access mode 
  will randomly choose a different Watcher to which it will send its 
  inquiries.

There can be multiple Watchers, but in a "stable" state only one Watcher
updates the quorum file.
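
The behaviour quoted from the book can be modelled roughly as below. This is
a toy sketch, not the connection manager's code: the class, method, and node
names are all invented, and it covers only the random choice and re-choice
of a Watcher.

```python
import random


class RemoteAccessNode:
    """Toy model of a system in remote access mode."""

    def __init__(self, watchers, rng=None):
        # An injectable RNG keeps the example reproducible in tests.
        self._rng = rng if rng is not None else random.Random()
        self.watchers = set(watchers)
        self.current = self._pick()

    def _pick(self):
        """Choose one of the remaining Watchers at random, if any."""
        if not self.watchers:
            return None
        return self._rng.choice(sorted(self.watchers))

    def watcher_left(self, name):
        """Handle a Watcher leaving the cluster."""
        self.watchers.discard(name)
        if name == self.current:
            # Only re-choose when our own Watcher has gone away.
            self.current = self._pick()


node = RemoteAccessNode(["ALPHA", "BRAVO", "CHARLY"], random.Random(1997))
print("inquiries go to", node.current)
node.watcher_left(node.current)
print("after it leaves, inquiries go to", node.current)
```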

- Kazunori

5234.9. by STAR::CROLL Thu Mar 06 1997 09:43

>On the same subject- do you know if any thought has been given to 
>processing Quorum Disk read/write sequences at a higher IPL? It just
>seems to us that this would allow for more timely verification and 
>reduce the number of "lost connection" events.

Quorum disk I/O is already performed at IPL 8 (SPL$C_SCS) -- you can't do it
much higher than that -- since the routine eventually calls EXE$INSIOQ, which
acquires the driver's fork IPL spinlock.  You can't do this from a higher IPL
without causing other problems.

Besides, once you do the quorum I/O, you have to do other things (like execute
the quorum algorithm, or execute the connection manager state transition
algorithm, or other cluster-related stuff) and all these things happen at IPL 8.
Doing quorum I/O at a higher IPL would introduce more overhead and
synchronization issues.

The fundamental problem here is that everybody and his brother has to do
something at IPL 8, generally all at the same time, and they (mostly) all do it
under the same two spinlocks (SCS and IOLOCK8), and, nearly all the time, all on
the same CPU (the primary).  There can be significant performance gains in
breaking all this processing up, as shown by the success of the Fast Path work,
but it's also very hard to a) quantify the benefits (to sell the project) and b)
design and implement it and c) test to make sure you haven't screwed something
else up.

John

5234.10. by STAR::NCARR (Talk dates & features - but never together....) Mon Mar 17 1997 16:09

RE:
>Is this a recent development? My understanding has always been that all
>Quorum Disk Watchers (systems with local access to the Quorum disk and
>DISK_QUORUM defined) perform read/write I/O to the Quorum disk. Would you
>happen to know if/where this is documented? Should I be looking in the
>Connection manager code?

No, this isn't a recent development. Apparently it has always worked like this. I
imagine almost everybody thought it worked as you describe. It was only when
John Covert described the other day how it really worked that I discovered it myself!

If you watch the little green light on a quorum disk you can see it for yourself.

You live and learn (well, sometimes). 

(BTW: .8 correctly describes what non-watchers do.)

5234.11. ".10 edited for 80 columns" by BSS::JILSON (WFH in the Chemung River Valley) Sun Mar 23 1997 11:10

<<< Note 5234.10 by STAR::NCARR "Talk dates & features - but never together...." >>>

RE:
>Is this a recent development? My understanding has always been that all
>Quorum Disk Watchers (systems with local access to the Quorum disk and
>DISK_QUORUM defined) perform read/write I/O to the Quorum disk. Would you
>happen to know if/where this is documented? Should I be looking in the
>Connection manager code?

No, this isn't a recent development. Apparently it has always worked like 
this. I imagine almost everybody thought it worked as you describe. It was 
only when John Covert described the other day how it really worked that I 
discovered it myself!

If you watch the little green light on a quorum disk you can see it for 
yourself.

You live and learn (well, sometimes). 

(BTW: .8 correctly describes what non-watchers do.)