
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5259.0. "Problems MOUNTing shadowsets - V7.1/V6.2-redhawk" by TIMABS::FREPPEL (Mosquito ergo summm...) Mon Mar 17 1997 13:37

    
    
    
    This note was first entered as reply .5 in note 5243, but I thought it
    deserved more visibility (5243.5 has been deleted).
    
    This is a variant of the problem reported in 5243.
    The VMScluster consists of:
    2 TLaser V6.2 redhawk
    2 TLaser V7.1 ALPMOUN02_071
    1 VAX4000-100A V6.2 redhawk
    1 VAX3100 V6.2 redhawk
    
    Device DSAxxxx is mounted on 3 systems (1 VAX, 2 Alphas, all V6.2 redhawk);
    it currently has 2 members.

    During bootstrap of the V7.1 system (TLaser), the mount attempt fails
    with the DEVBUSY message.
    
    An attempt to mount the device interactively on the V7.1 system also gives
    the DEVBUSY message, and one of the physical disks is seen as "online
    allocated", owned by the process that issued the MOUNT. The systems that
    have the shadow set mounted see the very same physical disk as
    "ShadowSetMember".
    
    To prevent data inconsistencies caused by the disk being seen differently
    by different systems, the member with the inconsistent status was
    DISMOUNTed from the shadow set.
    The shadow set now has 1 member and is still mounted on 3 systems.
    
    (To add a little extra excitement, one TLaser (V6.2) silently
     crashed at this time:
     **** Boot driver initialization routine returned failure = 00000000
     but this may well be another story; I don't know...)

    An attempt to MOUNT this one-member shadow set on a fourth system (whether
    V7.1 or V6.2 redhawk does not matter) also returns the DEVBUSY message.

    Moreover, when I remove the V7.1 system from the cluster and try to
    mount the shadow set from a V6.2 system I get the same message back. 

    We then added a second member to the shadow set by issuing the command
    MOUNT/SYSTEM DSAxxxx: /shadow=$z$duayyyy label logical
    on the Alpha that has DSAxxxx mounted. The device was added to the
    shadow set with a copy operation.

    After the copy operation completed, the MOUNT succeeded on all
    systems. The problem is gone, and I don't know why...
    
    Could anybody shed some light on what's going on here, please?
    Any hints?
    
    Thanks
    Raymond

    The images involved are:
    - on the V6.2 systems (Alpha)
                    image name: "MOUNTSHR"
                    image file identification: "ALPHA X6AF-Z2A"
                    image file build identification: "X61Q-SSB-NG00"
                    link date/time: 19-FEB-1997 10:05:43.15
                    linker identification: "A11-12"
    - on the V6.2 system (VAX)
                    image name: "MOUNTSHR"
                    image file identification: "X-1"
                    link date/time: 18-FEB-1997 16:18:52.68
                    linker identification: "05-13"
    - on the V7.1 system (Alpha)
                    image name: "MOUNTSHR"
                    image file identification: "X-3"
                    image file build identification: "X6C7-SSB-1193"
                    link date/time: 19-FEB-1997 10:16:26.43
                    linker identification: "A11-39"
    
    
5259.1. "If you want this looked at, Please use IPMT" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Mar 17 1997 13:47
    I'd recommend an IPMT if you are unable to locate any previous reports
    via COMET or at the patch area -- there exists a CLUSIO patch that
    includes all of Redhawk, and there is a MOUNT patch presently in the
    works (it may already be available)...

5259.2. "CLUSIO01_062 installed - which ALPMOUNxx_071?" by TIMABS::FREPPEL (Mosquito ergo summm...) Mon Mar 17 1997 14:14
    Thanks Steve, but
>I'd recommend an IPMT if you are unable to locate any previous reports
>via COMET or at the patch area -- 
    I just searched through COMET, but to no avail (VMSNOTES 273 comes close).

>there exists a CLUSIO patch that includes all of Redhawk, 
    {VAX/ALP}CLUSIO01_062 has been applied on all V6.2 systems. (Did I
    forget to mention that?)

>and there is a MOUNT patch presently in the works (it may already be 
>available)...
    I cannot see any patch other than ALPMOUN02_071; which one are you referring to?

    Raymond.
5259.3. "The Folks Working On This Don't Usually Read Notes" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Mon Mar 17 1997 14:43
:>and there is a MOUNT patch presently in the works (it may already be 
:>available)...
:    I cannot see any patch other than ALPMOUN02_071; which one are you referring to?

   Log the IPMT -- ALPMOUN02_071 was the one I was referring to.

5259.4. "IPMT: Case ZUO101135 (CFS.49819)" by TIMABS::FREPPEL (Mosquito ergo summm...) Wed Mar 19 1997 12:26
    
5259.5. "workaround until new DUDRIVER is available" by TIMABS::FREPPEL (Mosquito ergo summm...) Wed Apr 02 1997 03:38
    From the IPMT answer:
    
    >This appears to be a known problem that was introduced in the SCSI
    >device naming that was added in V7.1 and CLUSIO.  There are timing windows
    >where the system may lose track of which connection/device it is working on
    >and the system will incorrectly append the wrong allocation class to the
    >unit's data structure.
    (...)
    >A fix is already in the works from the DUDRIVER development team. 
    
    Workaround:
    >In the meantime, we believe the only plausible workarounds for this
    >customer are to either revert back to V6.2 (without CLUSIO or COMPAT kits)
    >or to renumber all of the units on one of the lobes.
    
    We chose renumbering and no longer have the problem.
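    To make the renumbering workaround concrete, here is a hypothetical Python
    model. It is not DUDRIVER code; the function names, the two-lobe layout and
    the unit numbers are all made up. It assumes that all disks on a lobe share
    one allocation class and that disks are named $<allocation class>$DUA<unit>:.
    If a disk is tagged with the wrong allocation class, it can end up with the
    same name as a different physical disk on the other lobe, but only when
    both lobes use the same unit numbers. Once the units on one lobe are
    renumbered, the names stay unique even when the wrong class is applied.

```python
# Hypothetical model, not DUDRIVER code: each disk is named
# $<allocation class>$DUA<unit>:, and a "lobe" is assumed to be a group of
# disks that share one allocation class.

def device_name(alloc_class, unit):
    """Build an allocation-class device name such as $1$DUA10:."""
    return f"${alloc_class}$DUA{unit}:"


def names_for(lobes, wrong_class=None):
    """Map each (lobe, unit) disk to its device name.

    lobes maps a lobe label to (allocation class, list of unit numbers).
    If wrong_class is given, every disk is named with that class instead,
    mimicking the "wrong allocation class" failure the IPMT describes.
    """
    names = {}
    for lobe, (alloc_class, units) in lobes.items():
        cls = alloc_class if wrong_class is None else wrong_class
        for unit in units:
            names[(lobe, unit)] = device_name(cls, unit)
    return names


def collisions(names):
    """Return each device name that more than one physical disk claims."""
    claimed = {}
    for disk, name in names.items():
        claimed.setdefault(name, []).append(disk)
    return {name: disks for name, disks in claimed.items() if len(disks) > 1}


# Same unit numbers on both lobes, versus lobe B renumbered.
overlapping = {"A": (1, [10, 20]), "B": (2, [10, 20])}
renumbered = {"A": (1, [10, 20]), "B": (2, [110, 120])}

print(collisions(names_for(overlapping)))                 # {}
print(collisions(names_for(overlapping, wrong_class=1)))  # two clashes
print(collisions(names_for(renumbered, wrong_class=1)))   # {}
```

    The model only captures name uniqueness; it says nothing about how the
    real timing window in the driver arises.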
    
    Raymond.