Title: | + OpenVMS Clusters - The best clusters in the world! + |
Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
Moderator: | PROXY::MOORE |
Created: | Fri Aug 26 1988 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 5320 |
Total number of notes: | 23384 |
This note was first entered as reply .5 in note 5243, but I think it deserves more visibility (5243.5 deleted).

This is a variant of the problem reported in 5243:

The VMScluster consists of
  2 TLaser       V6.2 redhawk
  2 TLaser       V7.1 ALPMOUN02_071
  1 VAX4000-100A V6.2 redhawk
  1 VAX3100      V6.2 redhawk

Device DSAxxxx is mounted on 3 systems (1 VAX, 2 Alphas, all V6.2 redhawk); it currently has 2 members.

During bootstrap of the V7.1 system (TLaser), the mount attempt fails with the DEVBUSY message. An attempt to mount the device interactively on the V7.1 system gives the DEVBUSY message, and one of the physical disks is seen as "online allocated"; the owner is the process which issued the MOUNT. The systems that have the shadow set mounted see the very same physical disk as "ShadowSetMember".

To prevent data inconsistencies due to the different ways the disk was seen, the member with the inconsistent status was DISMOUNTed from the shadow set. The shadow set now has 1 member and is still mounted on 3 systems.

(To add some thrill, one TLaser (V6.2) silently crashed at this time:
  **** Boot driver initialization routine returned failure = 00000000
but this may well be another story, I don't know...)

An attempt to MOUNT this one-member shadow set on a fourth system (V7.1 or V6.2 redhawk, it does not matter) gives back the DEVBUSY message. Moreover, when I remove the V7.1 system from the cluster and try to mount the shadow set from a V6.2 system, I get the same message back.

We then added a second member to the shadow set by issuing the command
  MOUNT/SYSTEM DSAxxxx: /shadow=$z$duayyyy label logical
on the Alpha which has DSAxxxx mounted. The device was added to the shadow set with a copy operation. After completion of the copy operation, the MOUNT succeeded on all systems.

The problem has gone and I don't know why...

Could anybody shed some light on what's going on here, please? Any hints?

Thanks,
Raymond

The images involved are:

- on the V6.2 systems (Alpha)
    image name: "MOUNTSHR"
    image file identification: "ALPHA X6AF-Z2A"
    image file build identification: "X61Q-SSB-NG00"
    link date/time: 19-FEB-1997 10:05:43.15
    linker identification: "A11-12"

- on the V6.2 system (VAX)
    image name: "MOUNTSHR"
    image file identification: "X-1"
    link date/time: 18-FEB-1997 16:18:52.68
    linker identification: "05-13"

- on the V7.1 system (Alpha)
    image name: "MOUNTSHR"
    image file identification: "X-3"
    image file build identification: "X6C7-SSB-1193"
    link date/time: 19-FEB-1997 10:16:26.43
    linker identification: "A11-39"
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
5259.1 | If you want this looked at, Please use IPMT | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Mar 17 1997 13:47 | 6 |
I'd recommend an IPMT if you are unable to locate any previous reports via COMET or at the patch area -- there exists a CLUSIO patch that includes all of Redhawk, and there is a MOUNT patch presently in the works (it may already be available)... | |||||
5259.2 | CLUSIO01_062 installed - which ALPMOUNxx_071? | TIMABS::FREPPEL | Mosquito ergo summm... | Mon Mar 17 1997 14:14 | 14 |
Thanks Steve, but >I'd recommend an IPMT if you are unable to locate any previous reports >via COMET or at the patch area -- I just searched through COMET, but to no avail (VMSNOTES 273 comes close). >there exists a CLUSIO patch that includes all of Redhawk, {VAX/ALP}CLUSIO01_062 has been applied on all V6.2 systems. (Did I forget to mention that?) >and there is a MOUNT patch presently in the works (it may already be >available)... I cannot see any patch other than ALPMOUN02_071; which one do you refer to? Raymond. | |||||
5259.3 | The Folks Working On This Don't Usually Read Notes | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Mar 17 1997 14:43 | 6 |
:>and there is a MOUNT patch presently in the works (it may already be :>available)... : Cannot see any different from ALPMOUN02_071, which one do you refer to? Log the IPMT -- ALPMOUN02_071 was the one I was referring to. | |||||
5259.4 | IPMT: Case ZUO101135 (CFS.49819) | TIMABS::FREPPEL | Mosquito ergo summm... | Wed Mar 19 1997 12:26 | 1 |
5259.5 | workaround until new DUDRIVER is available | TIMABS::FREPPEL | Mosquito ergo summm... | Wed Apr 02 1997 03:38 | 18 |
From the IPMT answer: >This appears to be a known problem that was introduced in the SCSI >device naming that was added in V7.1 and CLUSIO. There are timing windows >where the system may lose track of which connection/device it is working on >and the system will incorrectly append the wrong allocation class to the >unit's data structure. (...) >A fix is already in the works from the DUDRIVER development team. Workaround: >In the meantime, we believe the only plausible workarounds for this >customer are to either revert back to V6.2 (without CLUSIO or COMPAT kits) >or to renumber all of the units on one of the lobes. We chose renumbering and do not have the problem anymore. Raymond. |