
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5233.0. "Help, VMScluster-E-NOT_SERVED" by CPEEDY::CONWAY () Wed Feb 19 1997 12:34

    I am having some trouble adding a satellite to my two-node SCSI cluster.
I shut down one node just to take it out of the picture, reducing the problem
to just one boot node, one satellite, and one system disk with two roots. The
LAN can be Ethernet or FDDI; it makes no difference. I am using Port Allocation
Classes on the boot node. This is the SSB version of OpenVMS V7.1 /W OSI.

    While the boot node is in its 'waiting for <satellite> to boot' loop,
the following is seen on the satellite:

>>>b -fl 0,1
(boot fwa0.0.0.12.0 -flags 0,1)

Trying MOP boot.
.........

Network load complete.
Host name: MOLD
Host address: aa-00-04-00-63-30

bootstrap code read in
base = 1f2000, image_start = 0, image_bytes = 71077
initializing HWRPB at 2000
initializing page table at 1e4000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
%VMScluster-I-MOPSERVER, MOP server for downline was node MOLD
%VMScluster-I-SYSDISK, Satellite system disk is _$1$DKA100:     
%VMScluster-I-SYSROOT, Satellite system root is <SYS10.>
%VMScluster-I-BUSONLINE, LAN adapter is now running 08-00-2B-B4-18-80
%VMScluster-I-VOLUNTEER, System disk service volunteered by node MOLD
     AA-00-04-00-63-30
%VMScluster-I-CREATECH, Creating channel to node MOLD     
08-00-2B-B4-18-80   00-00-F8-4A-A0-10
%VMScluster-I-OPENVC, Opening virtual circuit to node MOLD    
%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,
 node MOLD    
%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
longer served by node MOLD     FF-7F-00-00-83-00
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server

   And back to "%VMScluster-I-MSCPCONN", an endless loop.
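
A loop like this is easier to compare between sites once the console capture is reduced to its message idents. The following is only a rough Python sketch, assuming the capture has been saved as plain text lines; the helper name idents() is made up for illustration and is not part of any OpenVMS tool. It relies on the usual %FACILITY-S-IDENT, text layout of OpenVMS messages, where S is one of the severity letters S, I, W, E or F.

```python
import re

# OpenVMS messages look like "%FACILITY-S-IDENT, text",
# where S is a one-letter severity code (S, I, W, E or F).
MESSAGE = re.compile(
    r"^%(?P<facility>[A-Za-z0-9$_]+)-(?P<severity>[SIWEF])-(?P<ident>[A-Z0-9_]+),"
)

def idents(console_lines):
    """Return (severity, ident) pairs in the order they appear.

    Continuation lines (wrapped message text) do not start with '%'
    and are skipped.
    """
    found = []
    for line in console_lines:
        match = MESSAGE.match(line.strip())
        if match:
            found.append((match.group("severity"), match.group("ident")))
    return found

# Excerpt from the satellite console shown above.
console = [
    "%VMScluster-I-OPENVC, Opening virtual circuit to node MOLD",
    "%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,",
    " node MOLD",
    "%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no",
    "longer served by node MOLD     FF-7F-00-00-83-00",
    "%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server",
]

sequence = idents(console)
print(sequence)
```

The only E-severity message in the excerpt is NOT_SERVED; everything around it is informational, which matches the "connect, lose the disk, wait, reconnect" behaviour described above.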

Relevant params on boot node:

Parameters in use: Active
Parameter Name           Current    Default     Min.      Max.     Unit  Dynamic
--------------           -------    -------    -------   -------   ----  -------
VAXCLUSTER                      2          1         0          2 Coded-valu 
EXPECTED_VOTES                  3          1         1        127 Votes      
VOTES                           1          1         0        127 Votes      
RECNXINTERVAL                  20         20         1      32767 Seconds    D
DISK_QUORUM     "$1$DKA100       "    "    "    "    "     "ZZZZ" Ascii       
QDSKVOTES                       1          1         0        127 Votes      
QDSKINTERVAL                   10         10         1      32767 Seconds    
ALLOCLASS                       1          0         0        255 Pure-numbe 
LOCKDIRWT                       0          0         0        255 Pure-numbe 
CLUSTER_CREDITS                10         10        10        128 Credits    
NISCS_CONV_BOOT                 0          0         0          1 Boolean    
NISCS_LOAD_PEA0                 1          0         0          1 Boolean    
NISCS_PORT_SERV                 0          0         0          3 Bitmask    
MSCP_LOAD                       1          0         0      16384 Coded-valu 
TMSCP_LOAD                      0          0         0          3 Coded-valu 
MSCP_SERVE_ALL                  1          0         0          2 Coded-valu 
TMSCP_SERVE_ALL                 0          0         0          3 Coded-valu 
MSCP_BUFFER                   128        128        16         -1 Coded-valu 
MSCP_CREDITS                    8          8         2        128 Coded-valu 
MSCP_CMD_TMO                  600        600         0 2147483647 CNTLRTMOs  D
TAPE_ALLOCLASS                  0          0         0        255 Pure-numbe 
NISCS_MAX_PKTSZ              1498       1498      1080       8192 Bytes      
NISCS_LAN_OVRHD                18         18         0        256 Bytes      
CWCREPRC_ENABLE                 1          1         0          1 Bitmask    D
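
As a sanity check on the voting parameters above, here is a small Python sketch of the quorum arithmetic. It uses the standard OpenVMS Cluster rule quorum = (EXPECTED_VOTES + 2) / 2, rounded down. It assumes the quorum disk is reachable and contributing its QDSKVOTES, and that the satellite has no votes; neither of those is stated in the listing, so treat them as assumptions.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2

# Values taken from the parameter listing above.
EXPECTED_VOTES = 3
VOTES = 1        # the boot node's own votes
QDSKVOTES = 1    # quorum disk votes (assumed reachable and counted)

required = quorum(EXPECTED_VOTES)
present = VOTES + QDSKVOTES      # second SCSI node is shut down, satellite assumed voteless
has_quorum = present >= required

print(f"quorum required: {required}, votes present: {present}, quorum met: {has_quorum}")
```

Under those assumptions the single boot node plus the quorum disk meets quorum, so the voting setup by itself does not obviously explain the NOT_SERVED loop.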

System Disk params:
 
Disk $1$DKA100: (MOLD), device type DEC RZ28M, is online, mounted, file-oriented
    device, shareable, served to cluster via MSCP Server, error logging is
    enabled.

    Error count                    3    Operations completed              31065
    Owner process                 ""    Owner UIC                      [SYSTEM]
    Owner process ID        00000000    Dev Prot            S:RWPL,O:RWPL,G:R,W
    Reference count              283    Default buffer size                 512
    Total blocks             4110480    Sectors per track                    86
    Total cylinders             2988    Tracks per cylinder                  16
    Allocation class               1

    Volume label           "GROUT71"    Relative volume number                0
    Cluster size                   4    Transaction count                   381
    Free blocks              1701652    Maximum files allowed            411048
    Extend quantity                5    Mount count                           1
    Mount status              System    Cache name        "_$1$DKA100:XQPCACHE"
    Extent cache size             64    Maximum blocks in extent cache   170165
    File ID cache size            64    Blocks currently in extent cache  19572
    Quota cache size               0    Maximum buffers in FCP cache        754
    Volume owner UIC           [1,1]    Vol Prot    S:RWCD,O:RWCD,G:RWCD,W:RWCD

  Volume Status:  subject to mount verification, protected subsystems enabled,
      file high-water marking, write-through caching enabled.

    So, what am I doing wrong here?

Steve
T.R  Title  User  Personal Name  Date  Lines
5233.1. by BSS::JILSON (WFH in the Chemung River Valley) Wed Feb 19 1997 16:40, 1 line
What is SCSCONNCNT?  Increase it to 50 or 60 and try again.
5233.2. "will try" by CPEEDY::CONWAY () Thu Feb 20 1997 08:49, 5 lines
    Thanks, I will try that as soon as I can pry the system away from my
"customer".

Steve

5233.3. by CPEEDY::CONWAY () Fri Feb 21 1997 11:43, 13 lines
    In addition to the symptoms in .1 I also see, during "$show clus/con"
(add connections), two things:

	LOCAL_PROC_NAME		CON_STA
	---------------		-------
	SCS$DIR_LOOKUP		CON_SEN		! This keeps coming and going
	MSCP$DISK		OPEN		! This is steady

    I still haven't gotten a chance to try changing SCSCONNCNT yet.

Steve


5233.4. by STAR::PITCHER (Steve Pitcher/Pathworks for OpenVMS) Fri Feb 21 1997 13:14, 8 lines
    I'm working on this same cluster.  I tried setting SCSCONNCNT to 60...
    It didn't help.
    
    Any other thoughts?
    
    Thanks.
    
    -	stp
5233.5. "Questions, Guesses, No Answers..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Fri Feb 21 1997 16:05, 4 lines
   What (non-zero) disk allocation classes are in use, and on which nodes?
   Does a bootstrap over the NI controller complete?  Are there any errors
   logged?  Is there a version of CLUSTER_AUTHORIZE in SYS$SPECIFIC:[SYSEXE]?
5233.6. "Same problem here" by PRSSOS::MENICACCI () Mon Mar 24 1997 11:29, 96 lines
      .0, Steve, did you find a solution?

Is this a known problem? If an IPMT is needed, what other information would be
useful?

Regards, 

Maria. 


Configuration :
-------------


AlphaServer 1000A 5/400 and 2000 4/233, SCSI cluster running OpenVMS V7.1.


 ------
 |    | Station alpha 4/233
 |    |   root SYS10
 |    |     (METEOR)
 |    |
 ------
   |
   |                            ETHERNET
  ------------------------------------------------------------------------
   |                                                |
   |                                                |
   |                                                |
 ----------                                     _____________  Alphaserver
 |         | Alphaserver 1000a 5/400            |            | 2000 4/233
 |         |   root SYS1 (MATEMA)               |            |  root SYS0
 |         |                                    |            |  (METIS)
 |         | SCSI Port allo class 200           |            |
 |         |        ____                        |            |SCSI Port Allo
 |         |--------|  |$200$DKA0               |            |    Class 300
 |         |  SCSI  ----                        |            |   ____
 |         |                                    |            |___|  |$300$DKA0
 | KZPSA   |                                    |KZPSA|KZPSA |   ----
 ----------                                     -------------
   |SCSI Port                              SCSI Port |   |SCSI Port
   |allo class                             allo class|   |allo class
   |301                                    301       |   |302 _____
   |                                                 |   |----|TAPE|$302$MKA400
   |        SCSI                        SCSI         |   |    -----
   |__________________|------|_______________________|   |    _____
                      |      | -$301$DKA100(SYSTEM DISK) |----|TAPE|$302$MKA500
                      |      | _$301$DKA200                   -----
                      |      | _$301$DKA300
                      |      | _$301$DKA400
                      -------
                                              



! VMS$DEVICES.DAT
CLUSTER_CONFIG created 17-mar-1997 14:25:50

[Port MATEMA$PKA]
allocation class = 301

[Port MATEMA$PKB]
allocation class = 200
!
!
[Port METIS$PKA]
allocation class = 300

[Port METIS$PKB]
allocation class = 301

[Port METIS$PKC]
allocation class = 302
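
To double-check which ports are meant to sit on the shared SCSI bus, the port allocation class entries can be grouped by class. The Python sketch below parses only the two line shapes visible in the excerpt above ("[Port NODE$PKx]" and "allocation class = N"); the function names are hypothetical, and the real VMS$DEVICES.DAT may contain other entry types that this ignores.

```python
import re
from collections import defaultdict

PORT = re.compile(r"^\[Port\s+(?P<port>[A-Z0-9$_]+)\]$", re.IGNORECASE)
CLASS = re.compile(r"^allocation class\s*=\s*(?P<value>\d+)$", re.IGNORECASE)

def port_classes(text):
    """Map each [Port ...] entry to its allocation class; '!' lines are comments."""
    classes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        port = PORT.match(line)
        if port:
            current = port.group("port")
            continue
        value = CLASS.match(line)
        if value and current is not None:
            classes[current] = int(value.group("value"))
    return classes

def shared_classes(classes):
    """Return only the allocation classes used by more than one port."""
    by_class = defaultdict(list)
    for port, value in classes.items():
        by_class[value].append(port)
    return {value: sorted(ports) for value, ports in by_class.items() if len(ports) > 1}

# Entries copied from the excerpt above.
devices_dat = """
[Port MATEMA$PKA]
allocation class = 301
[Port MATEMA$PKB]
allocation class = 200
!
[Port METIS$PKA]
allocation class = 300
[Port METIS$PKB]
allocation class = 301
[Port METIS$PKC]
allocation class = 302
"""

classes = port_classes(devices_dat)
shared = shared_classes(classes)
print(shared)
```

Only class 301 is shared (MATEMA$PKA and METIS$PKB), which agrees with the diagram: the system disk $301$DKA100 is on the bus both servers connect to.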


>>> B fl 0,0 ewa0

...

%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
longer served by node METIS
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server


....


On the two boot nodes, $ show dev/service ==> all disks are available.
                                                                           
No errors logged.

Only one cluster_authorize.dat in sys$common:[sysexe]

Boot was also done with only one boot node and scsconncnt = 60.
5233.7. "Start the IPMT..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Mar 24 1997 13:57, 6 lines
:Is it a known problem ? If an IPMT is needed, what else information would be
:useful ?

   Please start the IPMT, with the information here...  If additional
   information is needed, it will be asked for.
5233.8. "Lots Of Ethernet Addresses Here..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Mar 24 1997 14:02, 22 lines
:Network load complete.
:Host name: MOLD
:Host address: aa-00-04-00-63-30
:%VMScluster-I-BUSONLINE, LAN adapter is now running 08-00-2B-B4-18-80
:%VMScluster-I-VOLUNTEER, System disk service volunteered by node MOLD
:     AA-00-04-00-63-30
:%VMScluster-I-CREATECH, Creating channel to node MOLD     
:08-00-2B-B4-18-80   00-00-F8-4A-A0-10
:%VMScluster-I-OPENVC, Opening virtual circuit to node MOLD    
:%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,
: node MOLD    
:%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
:longer served by node MOLD     FF-7F-00-00-83-00
:%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server

   I have no idea if this is significant, but MOLD appears to have
   more than a few Ethernet addresses here...

   Can you bring the satellite and the boot host onto the same LAN
   segment (a private LAN segment is even better), and eliminate any
   weird network hardware that might be lurking?
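
   One way to sort out the addresses in the capture: a DECnet Phase IV
   node runs its LAN adapter at AA-00-04-00-xx-yy, where the last two
   octets are the 16-bit DECnet address (area * 1024 + node), low byte
   first. The Python sketch below decodes that form and returns None for
   anything else; the function name is made up for illustration.

```python
def decnet_phase_iv(mac: str):
    """Decode a DECnet Phase IV LAN address into (area, node).

    Phase IV addresses have the form AA-00-04-00-xx-yy, where xx is the
    low byte and yy the high byte of area * 1024 + node. Returns None if
    the address does not carry the AA-00-04-00 prefix.
    """
    octets = [int(part, 16) for part in mac.split("-")]
    if len(octets) != 6 or octets[:4] != [0xAA, 0x00, 0x04, 0x00]:
        return None
    address = octets[4] | (octets[5] << 8)
    return address >> 10, address & 0x3FF

# Addresses that appear in the satellite's console output.
for mac in ("aa-00-04-00-63-30", "08-00-2B-B4-18-80",
            "00-00-F8-4A-A0-10", "FF-7F-00-00-83-00"):
    print(mac, decnet_phase_iv(mac))
```

   AA-00-04-00-63-30 decodes to DECnet address 12.99, so it is MOLD's
   Phase IV address rather than a separate piece of hardware. The other
   three do not carry the Phase IV prefix, and this sketch says nothing
   further about them.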

5233.9. by CPEEDY::CONWAY () Wed Mar 26 1997 08:39, 20 lines
re .8 HOFFMAN:

>   I have no idea if this is significant

    Don't underestimate yourself. It turns out we have an "invalid connection"
(PHY Status LED is flashing amber) on one of our DECswitch 900 EFs
that is almost certainly causing my problem (not yet verified).

re .6 MENICACCI

>%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
>%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
>%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server

    I do not get a steady stream of REINIT_WAITs. I get a stream of
MSCPCONN => NOT_SERVED => REINIT_WAIT, so it does not sound like the
same problem to me. It looks to me like my satellite is repeatedly
discovering the boot node and then losing it.
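
The difference between the two symptoms can be written down as a rough Python sketch that works on a list of message idents taken from the console. The labels and the thresholds (two full cycles, three consecutive waits) are arbitrary choices for illustration, not anything OpenVMS defines.

```python
from itertools import groupby

CYCLE = ("MSCPCONN", "NOT_SERVED", "REINIT_WAIT")

def classify(idents):
    """Label a sequence of message idents as one of the two symptoms.

    'connect-and-drop loop': the MSCPCONN => NOT_SERVED => REINIT_WAIT
    cycle repeats at least twice.
    'steady wait': at least three REINIT_WAIT messages in a row.
    Anything else is 'unclassified'.
    """
    cycles = 0
    i = 0
    while i + len(CYCLE) <= len(idents):
        if tuple(idents[i:i + len(CYCLE)]) == CYCLE:
            cycles += 1
            i += len(CYCLE)
        else:
            i += 1
    if cycles >= 2:
        return "connect-and-drop loop"

    # Longest run of consecutive REINIT_WAIT messages.
    longest_wait = max(
        (len(list(run)) for ident, run in groupby(idents) if ident == "REINIT_WAIT"),
        default=0,
    )
    if longest_wait >= 3:
        return "steady wait"
    return "unclassified"

mine = ["MSCPCONN", "NOT_SERVED", "REINIT_WAIT"] * 3
reported_in_6 = ["NOT_SERVED", "REINIT_WAIT", "REINIT_WAIT", "REINIT_WAIT"]

print(classify(mine))
print(classify(reported_in_6))
```

The first pattern means the satellite keeps reaching the MSCP server and losing it again; the second means it never gets back to the server at all.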

Steve
5233.10. "DECnet MOP and LANCP MOP" by PRSSOS::MENICACCI () Thu Apr 10 1997 09:52, 8 lines
	
Engineering has found the solution.

On this cluster, DECnet MOP and LANCP MOP were both up and running.

The solution was to stop DECnet MOP and use LANCP MOP only.

Satellite booted OK.