
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5306.0. "QDSKINTERVAL and "lost connection", the real story?" by COMICS::MILLSS ("Jump! Jump now!" ...Kosh) Thu May 08 1997 12:40

OK. I understand why when doing backups of a quorum disk you may see numerous
"timed out - lost connection to quorum disk" messages, i.e. i/o load on quorum
disk so high that the q disk watcher times out its i/o when it's trying to
'touch' QUORUM.DAT. What I can't find is what this timeout value is! 

I have a customer experiencing this problem. I told him that increasing
QDSKINTERVAL would alleviate the problem, i.e. QUORUM.DAT not being touched so
often, therefore not so many timeouts, but he *won't* increase it unless I can
justify the action to him and I just can't find the words!

Can anyone out there help me? Please?

Many thanks,

Simon R. Mills
OpenVMS Group
UK CSC
T.R  Title  User  Personal Name  Date  Lines
5306.1  Seems straightforward to me  GIDDAY::GILLINGS  a crucible of informative mistakes  Fri May 09 1997 01:48  14
  Simon,

>unless I can
>justify the action to him and I just can't find the words

    How about: 

  If you want to stop the "lost connection to quorum disk" messages then
  you should adjust QDSKINTERVAL upwards. Otherwise just ignore the messages
  as they are self correcting and not harming anything.

  Even better, change the cluster configuration so you don't need a quorum
  disk at all!
						John Gillings, Sydney CSC
5306.2  COMICS::MILLSS  "Jump! Jump now!" ...Kosh  Fri May 09 1997 11:39  13
Re .1

Thanks, John. I've tried that approach but the customer was having none of it.
He won't change anything unless we can state specifics, i.e. "change this
parameter here to prevent that timeout there from exceeding this limit, etc"

I think if I were just to tell him that an i/o from the q disk watcher will
time out after 0.1 seconds then that might satisfy him. It might not. It's worth
a try, though!

Thanks,

Simon.
5306.3  _VAXcluster Principles_  XDELTA::HOFFMAN  Steve, OpenVMS Engineering  Fri May 09 1997 14:42  14
   Get the customer a copy of Roy Davis' VAXcluster Principles manual.

   If the quorum disk cannot be reached in 2*QDSKINTERVAL, then the
   VMScluster will begin recalculating the quorum.  All VMScluster
   nodes should (normally) have the same values for QDSKINTERVAL.

   I would also determine if there is a need for a quorum disk in
   this configuration, or if there is a better configuration that
   may be created.  (There are a number of folks that have a quorum
   disk when they do not need one, and there are a number of folks
   that have a mis-set EXPECTED_VOTES value, etc.  A comparison of
   the SYSGEN> SHOW/SCS output for each node will catch most of these
   common errors.)
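
To put numbers on the 2*QDSKINTERVAL rule in this reply, here is a minimal
Python sketch. It assumes only what the reply states: quorum recalculation
begins once the quorum disk has been unreachable for 2*QDSKINTERVAL seconds,
and all nodes should normally share one QDSKINTERVAL value. The function
names, node names and values are hypothetical; this is an illustration, not
OpenVMS code.

```python
from typing import Dict, List


def quorum_disk_timeout(qdskinterval_s: int) -> int:
    """Seconds without quorum disk contact before quorum is recalculated.

    Assumption taken from the reply above: the window is 2 * QDSKINTERVAL.
    """
    if qdskinterval_s <= 0:
        raise ValueError("QDSKINTERVAL must be a positive number of seconds")
    return 2 * qdskinterval_s


def group_by_interval(settings: Dict[str, int]) -> Dict[int, List[str]]:
    """Group node names by their QDSKINTERVAL value.

    More than one key in the result means the nodes disagree, which the
    reply above says should not normally be the case.
    """
    groups: Dict[int, List[str]] = {}
    for node in sorted(settings):
        groups.setdefault(settings[node], []).append(node)
    return groups
```

With hypothetical values, quorum_disk_timeout(10) gives a 20 second window,
and group_by_interval({"NODEA": 3, "NODEB": 10}) returns two groups, which
flags a mismatch between the nodes.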
5306.4  COMICS::MILLSS  "Jump! Jump now!" ...Kosh  Mon May 12 1997 08:17  19
Re .3

Thanks for the info, Steve. What I'm after is the time that the quorum disk
watcher waits before deciding it can't complete its read and/or write to the
QUORUM.DAT file. Is it really 2*QDSKINTERVAL? My customer is saying that he
does have a lot of outstanding i/o's at the time, but none of them take anywhere
near 1*QDSKINTERVAL let alone 2*QDSKINTERVAL to complete.

QDSKINTERVAL is not the issue here. It's the time that the touch of QUORUM.DAT
takes that the customer wants clarifying.

As for configuration, it's a two-node cluster. I suggested a quorum VAX but he
doesn't want to do that.

Thanks for any input and please correct me if I'm misunderstanding anything :^)

Regards,

Simon.
5306.5  2*QDSKINTERVAL; Need Info...  XDELTA::HOFFMAN  Steve, OpenVMS Engineering  Mon May 12 1997 11:20  23
  Please acquire the manual, and pass it along to the customer.  Also
  pass along an internals and data structures manual.  This combination
  usually either overwhelms the customer, or it educates them.  :-)  And
  in either case, we all benefit.

  The interval before the quorum watcher "gets upset" is twice QDSKINTERVAL.

:As for configuration, it's a two-node cluster. I suggested a quorum VAX but he
:doesn't want to do that.

  The quorum host makes for faster quorum transitions.

  I need to know the hardware configuration of the quorum disk.  It is
  distinctly possible that the quorum disk is unnecessary, depending on
  the particular disk interconnect configuration.  (That you have a
  two-node VMScluster is useful; it's the most common configuration found
  with a quorum disk...  But is this quorum disk on a SCSI bus, on a DSSI
  bus, or behind a CI storage controller, and what are the two host types?)

  And as requested, the SHOW/SCS settings are also of interest here.
  (I've seen more than a few customers with errors on their VOTES values,
  their EXPECTED_VOTES values, or other VMScluster SYSGEN settings...)
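
As an aid to checking the VOTES and EXPECTED_VOTES values mentioned above,
here is a minimal Python sketch of the quorum arithmetic. It assumes the
documented OpenVMS Cluster formula, quorum = (EXPECTED_VOTES + 2) / 2 with
the fraction dropped, and it deliberately ignores the dynamic adjustments a
running cluster makes. The function names, node names and vote values are
hypothetical.

```python
from typing import Dict, Iterable


def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES (integer division drops the fraction)."""
    return (expected_votes + 2) // 2


def votes_consistent(node_votes: Dict[str, int], qdskvotes: int,
                     expected_votes: int) -> bool:
    """True if EXPECTED_VOTES equals the sum of all node VOTES plus QDSKVOTES."""
    return sum(node_votes.values()) + qdskvotes == expected_votes


def keeps_quorum(node_votes: Dict[str, int], qdskvotes: int,
                 expected_votes: int, lost: Iterable[str]) -> bool:
    """True if the members that remain still hold at least quorum votes.

    `lost` names the members that are gone; the hypothetical name "QDISK"
    stands for the quorum disk.
    """
    gone = set(lost)
    present = sum(v for node, v in node_votes.items() if node not in gone)
    if "QDISK" not in gone:
        present += qdskvotes
    return present >= quorum(expected_votes)
```

For a hypothetical two-node cluster with VOTES=1 on each node, QDSKVOTES=1
and EXPECTED_VOTES=3, quorum works out to 2. Losing only the quorum disk, or
only one node, leaves 2 votes and the cluster continues; losing a node and
the disk together leaves 1 vote and it does not. The same two nodes with no
quorum disk and EXPECTED_VOTES=2 also need 2 votes, so losing either node
loses quorum, which is the usual reason a two-node cluster has a quorum disk.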
5306.6  See note 4491.*  CSC32::T_SULLIVAN  Mon May 12 1997 13:52  2
    
    Check out note 4491.*