Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

307.0. "SCSIcluster can't get quorum disk" by HTSC12::MICKWIDLAM (Water addict, water man) Tue Mar 11 1997 23:24

Hi,

I have a customer with two AS4100s running VMS V6.2-1H3. They used KZPSAs and
a BA356 to form a SCSI cluster. On the shared bus they have several disks,
including a system disk and a quorum disk. Under normal conditions, the cluster
works fine.

As this is a new installation, we tested several things and found a problem.
If either system is brought down ungracefully (crash, halt, or shutdown without
REMOVE_NODE), the other host hangs and cannot get the quorum disk. Later we
found that if we define the system disk as the quorum disk, the remaining system
is able to get the quorum disk and continue to run.

I've looked through the V6.2 NF, but it doesn't mention whether this is a
restriction or not. Has anyone seen a similar case before?

Regards,
Mickwid.
307.1. by EEMELI::MOSER (Orienteers do it in the bush...) Wed Mar 12 1997 08:26
    Make sure to set QDSKINTERVAL=1 to shorten the cluster state transition.
    You definitely can have a quorum disk different from the system disk.
    BTW, what is your RECNXINTERVAL set to?
    
    /cmos
307.2. "any effect if QDSKINTERVAL is 10?" by HTSC12::MICKWIDLAM (Water addict, water man) Thu Mar 13 1997 22:21
re .1

What is the effect if QDSKINTERVAL is left at 10, the default value?
RECNXINTERVAL is also at its default of 20.

Regards,
Mickwid.
307.3. by EEMELI::MOSER (Orienteers do it in the bush...) Fri Mar 14 1997 02:55
    QDSKINTERVAL=10 will cause a cluster state transition of over 1 minute,
    which you might see as a hang. I believe that if you take a coffee break
    after shutting down or halting one node and then come back, things will
    have been resolved correctly by then. Hence the recommendation to lower
    the interval to 1 second.
    
    /cmos
307.4. "qdskinterval=1 but not work" by HTSC12::MICKWIDLAM (Water addict, water man) Tue Apr 01 1997 05:15
re .3

I tried setting QDSKINTERVAL to 1, but the problem is still not solved. However,
when we moved the quorum disk to the system disk, the remaining node could get
quorum. Currently the customer is using the system disk as the quorum disk. But
as you know, that means they cannot use system disk shadowing.

Any ideas?

Regards,
Mickwid.
307.5. by EEMELI::MOSER (Orienteers do it in the bush...) Tue Apr 01 1997 08:44
    just to verify a few things:
    
    on both nodes you have
    	VOTES = 1
    	EXPECTED_VOTES = 3
    	QDSKVOTES = 1
    	QDSKINTERVAL = 1
    	DISK_QUORUM = "$nnn$DKzzz"	! where nnn is the allocation class
    					! and zzz the unit number
    	ALLOCLASS = aaa			! the same non-zero value
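
    For reference, the vote arithmetic behind these settings can be sketched
    as below. This is only an illustrative Python sketch, not anything that
    runs on VMS. It assumes the usual rule that quorum is
    (EXPECTED_VOTES + 2) / 2 rounded down, and the function names are made up
    for the example.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES (assumed rule: (EV + 2) / 2, rounded down)."""
    return (expected_votes + 2) // 2


def has_quorum(member_votes: list[int], qdsk_votes: int,
               qdsk_reachable: bool, expected_votes: int) -> bool:
    """True if the surviving members (plus the quorum disk, if reachable) hold quorum.

    member_votes   -- VOTES of each node still in the cluster (hypothetical input)
    qdsk_votes     -- QDSKVOTES
    qdsk_reachable -- whether a survivor can validate the quorum disk
    """
    total = sum(member_votes) + (qdsk_votes if qdsk_reachable else 0)
    return total >= quorum(expected_votes)


# Settings from the list above: VOTES=1 per node, QDSKVOTES=1, EXPECTED_VOTES=3.
print(quorum(3))                          # quorum value for EXPECTED_VOTES=3
print(has_quorum([1, 1], 1, True, 3))     # both nodes up, quorum disk seen
print(has_quorum([1], 1, True, 3))        # one node lost, quorum disk seen
print(has_quorum([1], 1, False, 3))       # one node lost, quorum disk NOT seen
```

    With these values a single surviving node only keeps quorum if it can
    still count the quorum disk's vote, which is why a hang after losing one
    node points at the survivor's access to the quorum disk.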
    
    Now, when both systems are up and running and you halt one system, or
    crash it or whatever, you're telling us that the other system hangs
    forever due to lost quorum?
    
    This can't be right. I've tried and tested this myself on a SCSI
    cluster, so there must be something else wrong on your system
    (SCSI bus termination etc.)
    
    /cmos