
Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

575.0. "OpenVMS Alpha V7.1 MSCP serving problem" by BSS::JILSON (WFH in the Chemung River Valley) Thu May 08 1997 16:56

A customer has a very strange MSCP disk serving problem in OpenVMS Alpha 
V7.1.  While we wait for them to force crashes of the MSCP server and the 
non-local system for an IPMT, I thought I'd post it here.  This problem 
happens with random disks on random nodes in this cluster and has only 
started happening under V7.1.  There is a VAX system in the cluster, but the 
problem happens even if that system is left powered off.  The problem is 
that disks that are supposed to be MSCP served are not being seen by the 
non-local nodes.

Node ALONSO, an AlphaServer 2100 4/233, has $7$DKA100, which is supposed to 
be served.  Node LEAR, an AlphaServer 2100 4/200, does not see it, and yet it 
sees other $7$ disks from ALONSO.  SCSCONNCNT is sufficient, as is NPAGEDYN, 
on both nodes.  I have killed and restarted the CONFIGURE process on LEAR, 
but $7$DKA100 still doesn't show up.  On ALONSO, SHOW DEVICE/SERVED shows 
$7$DKA100 as being served.  I also verified that there is an MSCP$DISK SYSAP 
open to both ALONSO and LEAR.  From the uptime we see that LEAR was booted 
first, so this may be some new synchronization problem when ALONSO booted; 
the customer claims that they could reboot LEAR and $7$DKA100 would then 
show up on LEAR.

Has anyone seen anything similar?

Jilly


alonso$ show dev dka

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$1$DKA0:      (DUNCAN)  Mounted              0  INSTR_DRV1     1309208     1   7
$1$DKA100:    (DUNCAN)  Mounted              0  INSTR_DRV2     2572076     1   7
$1$DKA400:    (OBERON)  Mounted              0  USER_DRV1      1069668     1   7
$1$DKA600:    (OBERON)  Online               0
$5$DKA600:      (LEAR)  Online               0
$7$DKA0:      (ALONSO)  Mounted              1  AXPSYS2        2536644   585   5
$7$DKA100:    (ALONSO)  Mounted              0  PWRK_DRV2       271596   118   2
$7$DKA600:    (ALONSO)  Online wrtlck        0
$54$DKA0:       (JUNO)  Mounted              0  JUNO_1078       765843     1   5
$54$DKA400:     (JUNO)  Mounted              0  AXPSYS          598683     1   5
$56$DKA0:     (MENTOR)  Mounted              0  MENTOR_1080     176811     1   5

alonso$ show dev/served
       MSCP-Served Devices on ALONSO  8-MAY-1997 15:30:36.09

                                             Queue Requests
Device:           Status      Total Size     Current    Max     Hosts
    7$DKA0        Online         4110480           0      0         4
    7$DKA100      Online         4110480           0      0         1
    7$DKA600       Avail               0           0      0         0

alonso$ mcr sysgen show mscp
Parameter Name           Current    Default     Min.      Max.     Unit  Dynamic
--------------           -------    -------    -------   -------   ----  -------
MSCP_LOAD                       1          0         0      16384 Coded-valu
MSCP_SERVE_ALL                  1          0         0          2 Coded-valu
MSCP_BUFFER                   128        128        16         -1 Coded-valu
MSCP_CREDITS                    8          8         2        128 Coded-valu
MSCP_CMD_TMO                  600        600         0 2147483647 CNTLRTMOs  D

alonso$ show cpu

ALONSO, a AlphaServer 2100 4/233

alonso$ show system/noproc
OpenVMS V7.1  on node ALONSO   8-MAY-1997 15:31:58.37  Uptime  2 07:22:53

I/O data structures
-------------------
$7$DKA100 [ALONSO$DKA100]                      DEC RZ28M           UCB: 810A5500

Device status:   08021810 online,valid,unload,lcl_valid,exfunc_supp
Characteristics: 1C4D4008 dir,fod,shr,avl,mnt,elg,idv,odv,rnd
                 21010281 clu,srv,nnm,nlt,scsi,dtn

Owner UIC [000001,000004]   Operation count     369895   ORB address    8108BDC0
      PID        00000000   Error count              0   DDB address    8105B080
Alloc. lock ID   43000126   Reference count        114   DDT address    87DFDFA0
Alloc. class            7   Online count             2   VCB address    81174780
Class/Type          01/36   Retry cnt/max        16/16   CRB address    8105B100
Def. buf. size        512   BOFF              00000C00   I/O wait queue 810A556C
DEVDEPEND        0BAC1056   Byte count        00000400
DEVDEPND2        00000000   SVAPTE            8152C608
DEVDEPND3        01000001   DEVSTS            00000004
FLCK index             3A
DLCK address     8105B180

                --- Primary Class Driver Data Block (CDDB) 81070680 ---

Status:              00000000
Controller Flags:    0000

Allocation class       7    CDRP Queue      00000000    DDB address     8105B080
System ID       00000000    Restart Queue   00000000    CRB address     8105B100
                00000000    DAP Count              0    CDDB link       00000000
Contrl. ID      00000000    Contr. timeout         0    PDT address     00000000
                00000000    Reinit Count           0    Original UCB    00000000
Response ID     00000000    Wait UCB Count         0    UCB chain       00000000
MSCP Cmd status 00000000

        *** PORT I/O queue is empty ***

        *** DEVICE I/O queue is empty ***

        *** I/O request queue is empty ***

                --- Volume Control Block (VCB) 81174780 ---

Volume: PWRK_DRV2        Lock name: PWRK_DRV2
Status:  A0 extfid,system
Status2: 14 mountver,nohighwater
Status3: 00000000

Mount count            1    Rel. volume            0    AQB address     810AFFC0
Transactions         122    Max. files        411048    RVT address     810A5500
Free blocks       271416    Rsvd. files            9    FCB queue       81177500
Window size            7    Cluster size           4    Cache blk.      81144400
Vol. lock ID    1700028C    Def. extend sz.        5
Block. lock ID  0200029F    Record size            0

                    --- ACP Queue Block (AQB) 810AFFC0 ---

ACP requests are serviced by the eXtended Qio Processor (XQP)

Status: 14 defsys,xqioproc

Mount count           25    ACP type           f11v2    Request queue   00000000
                            ACP class              0

        *** ACP request queue is empty ***
                  --- CDT Summary Page ---

CDT Address   Local Process     Connection ID     State       Remote Node
-----------   -------------     -------------     -----       -----------

 8102A0A0     SCS$DIRECTORY       626F0000        listen
 8102A230     MSCP$TAPE           626F0001        listen
 8102A3C0     VMS$VAXcluster      626F0002        listen
 8102A550     MSCP$DISK           626F0003        listen
 8102A6E0     VMS$SDA_AXP         626F0004        listen
 8102A870     SCA$TRANSPORT       626F0005        listen
 8102AA00     SCA$TRANSPORT       62870006        open         OBERON
 8102AB90     VMS$VAXcluster      626F0007        open         JUNO
 8102AD20     MSCP$DISK           626F0008        open         OBERON
 8102AEB0     VMS$VAXcluster      626F0009        open         OBERON
 8102B040     MSCP$DISK           626F000A        open         JUNO
 8102B1D0     VMS$VAXcluster      626F000B        open         DUNCAN
 8102B360     VMS$VAXcluster      626F000C        open         LEAR
 8102B4F0     MSCP$DISK           626F000D        open         DUNCAN
 8102B680     MSCP$DISK           626F000E        open         LEAR
 8102B810     VMS$DISK_CL_DRVR    626F000F        open         OBERON
 8102B9A0     VMS$DISK_CL_DRVR    626F0010        open         DUNCAN
 8102BB30     VMS$DISK_CL_DRVR    626F0011        open         LEAR
 8102BCC0     VMS$DISK_CL_DRVR    626F0012        open         JUNO
 8102BE50     VMS$TAPE_CL_DRVR    626F0013        open         OBERON
 8102BFE0     MSCP$DISK           62710014        open         CERES
 8102C170     VMS$DISK_CL_DRVR    626F0015        open         CERES
 8102C300     VMS$VAXcluster      62700016        open         CERES
 8102C490     PATHWORKScluster    62700017        listen
 8102C620     PATHWORKScluster    62710018        open         LEAR
 8102C7B0     PATHWORKScluster    62740019        open         LEAR
 8102C940     VMS$VAXcluster      626F001A        open         MENTOR
 8102CAD0     VMS$DISK_CL_DRVR    626F001B        open         MENTOR
 8102CC60     MSCP$DISK           626F001C        open         MENTOR

Number of free CDT's:  13

alonso$ show mem/pool/full

              System Memory Resources on  8-MAY-1997 15:33:42.19

Nonpaged Dynamic Memory      (Lists + Variable)
    Current Size (bytes)      24289280    Current Size (pagelets)      47440
    Initial Size              24289280    Initial Size (pagelets)      47440
    Maximum Size             114688000    Maximum Size (pagelets)     224000
    Free Space (bytes)        14376320    Space in Use (bytes)       9912960
    Largest Variable Block    12643072    Smallest Variable Block         64
    Number of Free Blocks         6296    Free Blocks LEQU 64 Bytes      657
    Free Blocks on Lookasides     1016    Lookaside Space (bytes)     470912


lear$ show system/noproc
OpenVMS V7.1  on node LEAR   8-MAY-1997 15:36:06.00  Uptime  2 07:26:55

lear$ show cpu

LEAR, a AlphaServer 2100 4/200

lear$ show dev dka

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$1$DKA0:      (DUNCAN)  Mounted              0  INSTR_DRV1     1308928     1   7
$1$DKA100:    (DUNCAN)  Mounted              0  INSTR_DRV2     2572040     1   7
$1$DKA400:    (OBERON)  Mounted              0  USER_DRV1      1069760     2   7
$1$DKA600:    (DUNCAN)  Online               0
$5$DKA600:      (LEAR)  Online wrtlck        0
$7$DKA0:      (ALONSO)  Mounted              0  AXPSYS2        2536644     1   5
$7$DKA600:    (ALONSO)  Online               0
$54$DKA0:       (JUNO)  Mounted              0  JUNO_1078       765843     1   5
$54$DKA400:     (JUNO)  Mounted              0  AXPSYS          598683     1   5
$56$DKA0:     (MENTOR)  Mounted              0  MENTOR_1080     176811     1   5
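
The difference between the two SHOW DEVICE listings can be checked mechanically rather than by eye. The following is only a rough sketch in Python: it assumes the listings have been captured as text, and the helper names `parse_devices` and `missing_on` are invented for this illustration (they are not part of any VMS tool). The sample data is the name, host and status columns copied from the two listings above.

```python
import re

# Matches the device name and host at the start of a SHOW DEVICE line,
# e.g. "$7$DKA100:    (ALONSO)  Mounted".
DEVICE_LINE = re.compile(r"^(\$\d+\$[A-Z]+\d+):\s+\((\w+)\)")


def parse_devices(listing):
    """Map device name to host for each device line in a SHOW DEVICE listing."""
    devices = {}
    for line in listing.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            devices[match.group(1)] = match.group(2)
    return devices


def missing_on(reference_listing, other_listing):
    """Return devices that appear in reference_listing but not in other_listing."""
    reference = parse_devices(reference_listing)
    other = parse_devices(other_listing)
    return sorted(set(reference) - set(other))


# Name, host and status columns copied from the two listings in this note.
ALONSO_DKA = """
$1$DKA0:      (DUNCAN)  Mounted
$1$DKA100:    (DUNCAN)  Mounted
$1$DKA400:    (OBERON)  Mounted
$1$DKA600:    (OBERON)  Online
$5$DKA600:      (LEAR)  Online
$7$DKA0:      (ALONSO)  Mounted
$7$DKA100:    (ALONSO)  Mounted
$7$DKA600:    (ALONSO)  Online wrtlck
$54$DKA0:       (JUNO)  Mounted
$54$DKA400:     (JUNO)  Mounted
$56$DKA0:     (MENTOR)  Mounted
"""

LEAR_DKA = """
$1$DKA0:      (DUNCAN)  Mounted
$1$DKA100:    (DUNCAN)  Mounted
$1$DKA400:    (OBERON)  Mounted
$1$DKA600:    (DUNCAN)  Online
$5$DKA600:      (LEAR)  Online wrtlck
$7$DKA0:      (ALONSO)  Mounted
$7$DKA600:    (ALONSO)  Online
$54$DKA0:       (JUNO)  Mounted
$54$DKA400:     (JUNO)  Mounted
$56$DKA0:     (MENTOR)  Mounted
"""

print(missing_on(ALONSO_DKA, LEAR_DKA))  # ['$7$DKA100']
```

With this data the only device ALONSO sees that LEAR does not is $7$DKA100, which matches the symptom described at the top of the note.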

lear$ show mem/pool/full

              System Memory Resources on  8-MAY-1997 15:36:59.42

Nonpaged Dynamic Memory      (Lists + Variable)
    Current Size (bytes)      17227776    Current Size (pagelets)      33648
    Initial Size              17227776    Initial Size (pagelets)      33648
    Maximum Size              71958528    Maximum Size (pagelets)     140544
    Free Space (bytes)         7364672    Space in Use (bytes)       9863104
    Largest Variable Block     5655296    Smallest Variable Block         64
    Number of Free Blocks         5001    Free Blocks LEQU 64 Bytes      561
    Free Blocks on Lookasides     1370    Lookaside Space (bytes)     603200
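
As a rough cross-check of the statement that NPAGEDYN is sufficient, the pool figures from the two SHOW MEMORY/POOL/FULL displays can be reduced to a free fraction per node. This is only an arithmetic sketch in Python using the byte counts shown above; the function name `free_fraction` is made up for the example, and what counts as "sufficient" remains a judgment call.

```python
def free_fraction(free_bytes, current_bytes):
    """Fraction of the current nonpaged pool size that is free."""
    return free_bytes / current_bytes


# Byte counts copied from the SHOW MEMORY/POOL/FULL output of each node.
POOL = {
    "ALONSO": {"initial": 24289280, "current": 24289280,
               "free": 14376320, "in_use": 9912960},
    "LEAR": {"initial": 17227776, "current": 17227776,
             "free": 7364672, "in_use": 9863104},
}

for node, pool in POOL.items():
    # Sanity check: free space plus space in use equals the current size.
    assert pool["free"] + pool["in_use"] == pool["current"]
    # The pool has expanded only if the current size exceeds the initial size.
    expanded = pool["current"] > pool["initial"]
    fraction = free_fraction(pool["free"], pool["current"])
    print(f"{node}: {fraction:.1%} free, expanded={expanded}")
```

On these numbers roughly 59% of nonpaged pool is free on ALONSO and roughly 43% on LEAR, and neither pool has grown beyond its initial size, which is consistent with pool not being the limiting resource here.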

lear$ mcr sysgen show mscp
Parameter Name           Current    Default     Min.      Max.     Unit  Dynamic
--------------           -------    -------    -------   -------   ----  -------
MSCP_LOAD                       1          0         0      16384 Coded-valu
MSCP_SERVE_ALL                  2          0         0          2 Coded-valu
MSCP_BUFFER                   128        128        16         -1 Coded-valu
MSCP_CREDITS                    8          8         2        128 Coded-valu
MSCP_CMD_TMO                  600        600         0 2147483647 CNTLRTMOs  D
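
Since both nodes' MSCP parameters are now listed, they can be compared directly. The sketch below is illustrative Python, not a VMS utility: `current_values` and `differing` are hypothetical helper names, and the input is the SYSGEN output copied from this note.

```python
def current_values(sysgen_output, prefix="MSCP_"):
    """Map parameter name to its Current value for lines starting with prefix."""
    values = {}
    for line in sysgen_output.splitlines():
        fields = line.split()
        # Parameter lines look like: NAME  current  default  min  max  unit ...
        if len(fields) >= 2 and fields[0].startswith(prefix) and fields[1].isdigit():
            values[fields[0]] = int(fields[1])
    return values


def differing(output_a, output_b):
    """Return {name: (value_a, value_b)} for parameters whose values differ."""
    a = current_values(output_a)
    b = current_values(output_b)
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}


# SYSGEN output copied from the ALONSO and LEAR transcripts in this note.
ALONSO_MSCP = """
MSCP_LOAD                       1          0         0      16384 Coded-valu
MSCP_SERVE_ALL                  1          0         0          2 Coded-valu
MSCP_BUFFER                   128        128        16         -1 Coded-valu
MSCP_CREDITS                    8          8         2        128 Coded-valu
MSCP_CMD_TMO                  600        600         0 2147483647 CNTLRTMOs  D
"""

LEAR_MSCP = """
MSCP_LOAD                       1          0         0      16384 Coded-valu
MSCP_SERVE_ALL                  2          0         0          2 Coded-valu
MSCP_BUFFER                   128        128        16         -1 Coded-valu
MSCP_CREDITS                    8          8         2        128 Coded-valu
MSCP_CMD_TMO                  600        600         0 2147483647 CNTLRTMOs  D
"""

print(differing(ALONSO_MSCP, LEAR_MSCP))  # {'MSCP_SERVE_ALL': (1, 2)}
```

The only parameter that differs between the two nodes is MSCP_SERVE_ALL (1 on ALONSO, 2 on LEAR). Whether that difference has any bearing on the problem is not established in this note.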

                  --- CDT Summary Page ---

CDT Address   Local Process     Connection ID     State       Remote Node
-----------   -------------     -------------     -----       -----------

 80C2A020     SCS$DIRECTORY       87120000        listen
 80C2A1B0     MSCP$TAPE           87120001        listen
 80C2A340     VMS$VAXcluster      87120002        listen
 80C2A4D0     MSCP$DISK           87120003        listen
 80C2A660     VMS$SDA_AXP         87120004        listen
 80C2A7F0     SCA$TRANSPORT       87120005        listen
 80C2A980     SCA$TRANSPORT       872C0006        open         OBERON
 80C2AB10     VMS$VAXcluster      87120007        open         DUNCAN
 80C2ACA0     MSCP$DISK           87120008        open         DUNCAN
 80C2AE30     VMS$VAXcluster      87120009        open         OBERON
 80C2AFC0     MSCP$DISK           8712000A        open         OBERON
 80C2B150     MSCP$DISK           8712000B        open         JUNO
 80C2B2E0     VMS$VAXcluster      8712000C        open         JUNO
 80C2B470     VMS$VAXcluster      8712000D        open         ALONSO
 80C2B600     VMS$DISK_CL_DRVR    8712000E        open         ALONSO
 80C2B790     VMS$DISK_CL_DRVR    8712000F        open         DUNCAN
 80C2B920     VMS$DISK_CL_DRVR    87120010        open         OBERON
 80C2BAB0     VMS$DISK_CL_DRVR    87120011        open         JUNO
 80C2BC40     VMS$TAPE_CL_DRVR    87120012        open         OBERON
 80C2BDD0     MSCP$DISK           87120013        open         ALONSO
 80C2BF60     MSCP$DISK           87140014        open         CERES
 80C2C0F0     VMS$DISK_CL_DRVR    87120015        open         CERES
 80C2C280     VMS$VAXcluster      87130016        open         CERES
 80C2C410     PATHWORKScluster    87130017        listen
 80C2C5A0     PATHWORKScluster    87140018        open         ALONSO
 80C2C730     PATHWORKScluster    87160019        open         ALONSO
 80C2C8C0     VMS$DISK_CL_DRVR    8712001A        open         MENTOR
 80C2CA50     VMS$VAXcluster      8712001B        open         MENTOR
 80C2CBE0     MSCP$DISK           8712001C        open         MENTOR

Number of free CDT's:  13
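
The check that the disk-related SYSAPs have open connections to a given node can also be scripted against a captured CDT summary. This is a minimal sketch in Python under that assumption; `open_connections` is a name invented here, and the sample is an excerpt (not the full table) of LEAR's CDT summary above.

```python
def open_connections(cdt_summary, remote_node):
    """List local process names with a connection in state 'open' to remote_node."""
    names = []
    for line in cdt_summary.splitlines():
        fields = line.split()
        # Open entries have five columns: address, process, id, state, node.
        # Entries in 'listen' state have no remote node and are skipped.
        if len(fields) == 5 and fields[3] == "open" and fields[4] == remote_node:
            names.append(fields[1])
    return names


# Excerpt of the CDT summary from LEAR, copied from this note.
LEAR_CDT = """
 80C2A4D0     MSCP$DISK           87120003        listen
 80C2B470     VMS$VAXcluster      8712000D        open         ALONSO
 80C2B600     VMS$DISK_CL_DRVR    8712000E        open         ALONSO
 80C2B790     VMS$DISK_CL_DRVR    8712000F        open         DUNCAN
 80C2BDD0     MSCP$DISK           87120013        open         ALONSO
 80C2C5A0     PATHWORKScluster    87140018        open         ALONSO
 80C2C730     PATHWORKScluster    87160019        open         ALONSO
"""

to_alonso = open_connections(LEAR_CDT, "ALONSO")
print(to_alonso)
print("MSCP$DISK" in to_alonso and "VMS$DISK_CL_DRVR" in to_alonso)  # True
```

On LEAR both MSCP$DISK and VMS$DISK_CL_DRVR show an open connection to ALONSO, so the missing disk does not appear to be explained by a missing connection between the two nodes.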

575.1. by EEMELI::MOSER (Orienteers do it in the bush...) Fri May 09 1997 08:52 (3 lines)
    Are you using PAC (port allocation classes), i.e., is DEVICE_NAMING = 1?
    
    /cmos
575.2. by BSS::JILSON (WFH in the Chemung River Valley) Fri May 09 1997 10:36 (3 lines)
Nope DEVICE_NAMING is 0 on all nodes.  Hadn't thought about that one.

Jilly
575.3. by SOS6::BERNARD (Bernard Ourghanlian, Alpha Resource Center) Wed May 14 1997 06:27 (14 lines)
    I had the exact same problem to troubleshoot here. 
    
    After some time, it appeared that the problem was linked to the way an 
    ongoing tagged IO is detected. It turned out to be a phase timing problem 
    affecting Fast SCSI drives that do not support Tagged Command Queuing 
    (TCQ). When the non-TCQ drive is under a high IO load while a slower 
    device such as a CD-ROM drive is also performing a high number of IOs, 
    the non-TCQ device periodically suffers a phase timing problem that 
    results in a bus reset, which sends the device into mount verification.
    
    I fixed the problem by installing new (not yet released) SCSI
    port and class drivers.
    
    But I don't know if this is your problem...
575.4. by BSS::JILSON (WFH in the Chemung River Valley) Wed May 14 1997 10:56 (4 lines)
Thanks.  I have forced crashes now for these 2 systems and will be IPMT'ng 
this case.

Jilly
575.5. "FYI" by BSS::JILSON (WFH in the Chemung River Valley) Wed May 14 1997 16:07 (1 line)
IPMT case is HPAQ50PA9.
575.6. "SCSI driver issue was separate issue" by VMSSPT::DIFABIO (MOVL #OPINION,EXE$GL_BLAKHOLE) Thu May 15 1997 15:38 (5 lines)
    The fix was within SYS$PKEDRIVER and would not affect devices being
    served to a node (since that uses SYS$DUDRIVER) or the serving of a
    device (MSCP). 
    
              Mark d.
575.7. by SOS6::BERNARD (Bernard Ourghanlian, Alpha Resource Center) Thu May 22 1997 13:09 (2 lines)
    I do not agree with this analysis. I did fix this problem using the new
    SYS$PKEDRIVER.