
Conference kernel::csguk_systems

Title:	CSGUK_SYSTEMS
Notice:	No restrictions on keyword creation
Moderator:	KERNEL::ADAMS

Created:	Wed Mar 01 1989
Last Modified:	Thu Nov 28 1996
Last Successful Update:	Fri Jun 06 1997
Number of topics:	242
Total number of notes:	1855

198.0. " xqperr @ glaxo " by COMICS::GLEDHILL () Wed Dec 28 1994 13:31

*****  THIS IS A READ ONLY COPY FROM NICE   -   please handle accordingly  *****
********************************************************************************

Log No            51017.00-49Q-1UVO          Queue                     
Log D/T            3-NOV-1994 15:29          Owner                               
LSDT D/T          28-NOV-1994 13:00          Loc/Phone           
Status as at      28-DEC-1994 13:09 is CLOSED                                  
EXT REQ Stat Code             Escalation Indicator  Y
Hold Indicator    N           Planned Indicator     Y

---------------------------------Customer---------------------------------------
Company          ENG-GLAXO                                                      
Department                                               
Street                                                   
City                                           
Postal Code                                  PO No                          

Caller           HAILSTONE                   Title      BRIAN          
Phone            081 966 4069                Extension  D/L   
Service Wish     LOOKIN:: $44$DUA34:[GLAXO]*.*                                  
--------------------------------------------------------------------------------


                                                                                
THIS SYSTEM WAS REPORTING A CRASH WITH A CORRECTABLE MEMORY ERROR.
ACTUALLY THE MEMORY ERROR HAD NOTHING TO DO WITH THE CRASH; IT WAS JUST
REPORTED AT THE TIME OF THE CRASH.

ACTUAL CRASH WAS XQPERR BUGCHECK @ F11BXQP+40F3

NOTHING KNOWN, SO REQUESTED THAT THE TAPE BE SENT IN.

No CA description on this call, so I can't include it.
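Finding which XQP module F11BXQP+40F3 falls in is plain hex arithmetic. The sketch below uses numbers quoted in reply 198.1: PC = 7FF2A6F3, the SDA EVALUATE result 000040F3, and the F11BXQP.MAP line for CREATE (00003AEC, 000044F0, 00000A05). It assumes those three map columns are the module's image-relative start, inclusive end and length, and, as the analyst warns, that the map matches the customer's patched image. The helper name `offset_in_module` is hypothetical. Under those assumptions the value EVALUATE subtracted is 7FF26600 and the crash PC lands at CREATE+607, which is further on than the CREATE+003FD written in 198.1.

```python
# Values transcribed from the crash dump and F11BXQP.MAP quoted in 198.1.
PC = 0x7FF2A6F3           # PC from the general registers
XQP_OFFSET = 0x40F3       # SDA> EVALUATE @PC-@(@CTL$GL_F11BXQP + 10)

# Assumed meaning of "CREATE 00003AEC 000044F0 00000A05":
# image-relative start, inclusive end, and length of the module.
CREATE_START, CREATE_END, CREATE_LENGTH = 0x3AEC, 0x44F0, 0xA05


def offset_in_module(image_offset, start, end, length):
    """Return image_offset relative to a module's start (hypothetical helper)."""
    if end - start + 1 != length:
        raise ValueError("start/end/length do not agree")
    if not start <= image_offset <= end:
        raise ValueError("offset is outside this module")
    return image_offset - start


xqp_base = PC - XQP_OFFSET
in_create = offset_in_module(XQP_OFFSET, CREATE_START, CREATE_END, CREATE_LENGTH)

print(f"base subtracted by EVALUATE: {xqp_base:08X}")      # prints: ... 7FF26600
print(f"F11BXQP+{XQP_OFFSET:X} = CREATE+{in_create:03X}")  # prints: F11BXQP+40F3 = CREATE+607
```

The length check (44F0 - 3AEC + 1 = A05) is a cheap sanity test that the columns have been read the right way round; it says nothing about whether the customer's patched image has the same layout.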

198.1. "original TS" by COMICS::GLEDHILL Wed Dec 28 1994 13:35 (210 lines)
--------------------------------------------------------------------------------
Log No 51017.00-49Q-1UVO    Desc type TS    Sequence no 01
Authr badge no 064234       Creation D/T 28-NOV-1994 11:57
--------------------------------------------------------------------------------

System crash information
------------------------
Time of system crash: 3-NOV-1994 14:05:46.96
Version of system: VAX/VMS VERSION V5.5-2
System Version Major ID/Minor ID: 1/0
VAXcluster node: UKKV01, a VAX 4000-300
Crash CPU ID/Primary CPU ID: 00/00
Bitmask of CPUs active/available: 00000001/00000001
CPU bugcheck codes:
        CPU 00 -- XQPERR, Error detected by file system XQP
CPU 00 reason for Bugcheck: XQPERR, Error detected by file system XQP
Process currently executing on this CPU: NML_16397
Current image file: $1$DIA0:[SYS0.SYSCOMMON.][SYSEXE]LOGINOUT.EXE
Current IPL: 0 (decimal)
CPU database address: 840C8000

General registers:

R0  = 00000000  R1  = 000009B8  R2  = 00000000  R3  = 80AC746E
R4  = 00000000  R5  = 00000000  R6  = 80DDDC6C  R7  = 80AC7400
R8  = 7FF25D20  R9  = 80AC7450  R10 = 7FF25AC4  R11 = 80CF1500
AP  = 7FF2597C  FP  = 7FF25944  SP  = 7FF2588C  PC  = 7FF2A6F3
PSL = 00000000

ISP = 840C9200  KSP = 7FF2588C  ESP = 7FFE9600  SSP = 7FFED800
USP = 0002DF84

Current operating stack (KERNEL):

            7FF2586C  80AC7450
            7FF25870  7FF25AC4
            7FF25874  80CF1500
            7FF25878  7FF2597C
            7FF2587C  7FF25944
            7FF25880  7FF25884
            7FF25884  7FF2A6F3
            7FF25888  00000000
    SP =>   7FF2588C  000000F3
            7FF25890  80DDDC2C
            7FF25894  81E59970
            7FF25898  7FF25D68
            7FF2589C  7FF25D1A
            7FF258A0  7FF25CC4
            7FF258A4  7FF25AFC
            7FF258A8  7FF25AD8
            7FF258AC  7FF25AD4
            7FF258B0  7FF25AD0
            7FF258B4  7FF25AC8
            7FF258B8  7FF25A68
            7FF258BC  7FF25A64
            7FF258C0  7FF25A60
            7FF258C4  80CE5BB0
            7FF258C8  00000001
            7FF258CC  00000000
            7FF258D0  00000013
            7FF258D4  7FF25A6F
            7FF258D8  7FF25A3C
            7FF258DC  00000000
            7FF258E0  00000000
            7FF258E4  7FF25B3C
            7FF258E8  7FF25AE0
            7FF258EC  5354454E   ! Text that says
            7FF258F0  45565245   !
            7FF258F4  4F4C2E52   !   NETSERVER.LOG;24
            7FF258F8  34323B47   !
            7FF258FC  00393739
            7FF25900  7FF25AC4
    * Thanks - that came in useful later *
            7FF25904  7FF25A44
            7FF25908  00000004
            7FF2590C  00000000
            7FF25910  203C0000
            7FF25914  7FF2597C
            7FF25918  7FF25944
            7FF2591C  7FF2B063
                .
                .
                .

* Good idea to check this in case some problem with another node is involved.

VAXcluster data structures
--------------------------
                      --- VAXcluster Summary ---

        Quorum   Votes   Quorum Disk Votes   Status Summary
        ------   -----   -----------------   --------------
           1       1            N/A          quorum

                           --- CSB list ---

Address   Node    CSID      Votes  State  Status
-------   ----    ----      -----  -----  ------
80CA99F0  UKKV16  000100AF    0    open   member,qf_noaccess
80CD2FD0  UKKV17  00010073    0    open   member,qf_noaccess
80CD4E00  UKKV18  00010087    0    open   member,qf_noaccess
80CD5300  UKKV19  00010075    0    open   member,qf_noaccess
80CAE960  UKKV20  000100B2    0    open   member,qf_noaccess
80CD3570  UKKV21  000100B3    0    open   member,qf_noaccess
80CD69C0  UKKV22  0001006C    0    open   member,qf_noaccess
80CD7420  UKKV23  00010008    0    open   member,qf_noaccess
80CD6240  UKKV09  00010014    0    open   member,qf_noaccess
80CD4B20  UKKV10  00010088    0    open   member,qf_noaccess
80CA8E30  UKKV11  00010098    0    open   member,qf_noaccess
80C83020  UKKV12  000100AE    0    open   member,qf_noaccess
80C86640  UKKV24  00010097    0    open   member,qf_noaccess
80C42D10  UKKV01  00010089    1    local  member,qf_same,qf_noaccess  *
80CD9F70  UKKV02  0001006B    0    open   member,qf_noaccess
80CDCD80  UKKV03  0001008F    0    open   member,qf_noaccess
80CDA030  UKKV04  00010085    0    open   member,qf_noaccess
80CD7360  UKKV05  0001008B    0    open   member,qf_noaccess
80CD7A20  UKKV06  0001008C    0    open   member,qf_noaccess
80CD4840  UKKV07  00010076    0    open   member,qf_noaccess
80CD53C0  UKKV08  00010072    0    open   member,qf_noaccess
80CAAE50  UKKV13  000100B4    0    open   member,qf_noaccess
80CDA330  UKKV15  0001000E    0    open   member,qf_noaccess

The node marked * is the only one with a vote; this must be the boot server in
a LAVC.

SDA> EVALUATE @PC-@(@CTL$GL_F11BXQP + 10)
Hex = 000040F3   Decimal = 16627   UCB$M_SHD_WLGSTA_CHA+000F3

From Listings
=============
F11BXQP.MAP
CREATE    00003AEC  000044F0  00000A05      CREATE+003FD
    ! This is not the correct offset; the code appears further on. I searched
    ! for 3FD, which it found, and then searched twice for 'BLBS' from there.

* Probably in a patch; get an online listing from the customer (dir/dat/out
* of sys$system and sys$loadable_images) onto the tape so we can check easily.
* Try to match up the code and see which bit of the listing it might be in. As
* long as you make it clear you are not sure, you can always double-check
* later, especially if things don't seem to fit.
* I am trying to get in the habit of putting any assumptions I make in my
* updates in case I forget later on!

Call Frame Information
----------------------
        Call Frame Generated by CALLS Instruction

        Condition Handler       7FF25944  00000000
        SP Align Bits = 00      7FF25948  2BFC0000
        Saved AP                7FF2594C  7FF259FC
        Saved FP                7FF25950  7FF259C4
        Return PC               7FF25954  7FF2C712
        R2                      7FF25958  00000000
        R3                      7FF2595C  7FF2601E
        R4                      7FF25960  00000000
        R5                      7FF25964  00000000
        R6                      7FF25968  7FF369D7
        R7                      7FF2596C  7FF25D20
        R8                      7FF25970  7FF368FC
        R9                      7FF25974  00000033
        R11                     7FF25978  7FF25A44
        Align Stack by 0 Bytes =>
        Argument List           7FF2597C  00000000

        Call Frame Generated by CALLS Instruction

        Condition Handler       7FF259C4  7FF2C86D
        SP Align Bits = 00      7FF259C8  2BFC0000
        Saved AP                7FF259CC  7FFE77E8  CTL$GL_KSTKBAS+005E8
        Saved FP                7FF259D0  7FFE77AC  CTL$GL_KSTKBAS+005AC
        Return PC               7FF259D4  7FF26698
        R2                      7FF259D8  00000002
        R3                      7FF259DC  81E59970
        R4                      7FF259E0  80D5EA00
        R5                      7FF259E4  81E599D0
        R6                      7FF259E8  7FFA6560
        R7                      7FF259EC  00000033
        R8                      7FF259F0  80F35184  DUDRIVER+00154
        R9                      7FF259F4  00000000
        R11                     7FF259F8  7FFE6038  CTL$GL_KRP+00038
        Align Stack by 0 Bytes =>
        Argument List           7FF259FC  00000000

        Call Frame Generated by CALLG Instruction

        Condition Handler       7FFE77AC  00000000
        SP Align Bits = 00      7FFE77B0  0FFC0000
        Saved AP                7FFE77B4  7FFE9600
        Saved FP                7FFE77B8  7FFE9608
        Return PC               7FFE77BC  80651755  EXE$ASTDEL+00003
        R2                      7FFE77C0  80004BA0  SCH$GQ_LEFWQ
        R3                      7FFE77C4  80D5EA50
        R4                      7FFE77C8  80D5EA00
        R5                      7FFE77CC  00000000
        R6                      7FFE77D0  7FF41738
        R7                      7FFE77D4  806BA3B0  SCS$GA_LOCALSB+001D0
        R8                      7FFE77D8  000004D4  BUG$_ERRCACHFUL+00004
        R9                      7FFE77DC  7FF41408
        R10                     7FFE77E0  7FF41600
        R11                     7FFE77E4  7FFDFE48  PIO$GW_PIOIMPA
        Align Stack by 0 Bytes =>
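The "SP Align Bits" longword in each call frame above (2BFC0000 or 0FFC0000) is, as I understand the VAX call frame layout, the mask/PSW longword that CALLS or CALLG saves at FP+4: bits 0-15 hold the saved PSW, bits 16-27 the register save mask, bit 29 is set for CALLS, and bits 30-31 give the stack alignment. A minimal Python sketch that decodes it; the function name `decode_mask_psw` is illustrative only. Run on the two values from this dump, it reproduces what SDA printed.

```python
# Decode the mask/PSW longword a VAX CALLS or CALLG saves at FP+4
# (shown by SDA as "SP Align Bits = 00  <address>  <value>").
# Assumed layout: bits 0-15 saved PSW, bits 16-27 register save mask
# (bit 16+n set => Rn saved), bit 28 zero, bit 29 set => CALLS,
# bits 30-31 stack pointer alignment (SPA).


def decode_mask_psw(longword):
    mask = (longword >> 16) & 0xFFF
    return {
        "instruction": "CALLS" if (longword >> 29) & 1 else "CALLG",
        "spa": (longword >> 30) & 0x3,
        "saved_registers": [f"R{n}" for n in range(12) if mask & (1 << n)],
        "psw": longword & 0xFFFF,
    }


for value in (0x2BFC0000, 0x0FFC0000):   # values from the frames above
    info = decode_mask_psw(value)
    print(f"{value:08X}: {info['instruction']}, SPA={info['spa']}, "
          f"saves {','.join(info['saved_registers'])}")
# 2BFC0000: CALLS, SPA=0, saves R2,R3,R4,R5,R6,R7,R8,R9,R11
# 0FFC0000: CALLG, SPA=0, saves R2,R3,R4,R5,R6,R7,R8,R9,R10,R11
```

So the missing R10 in the two CALLS frames is not a gap in the dump: mask bit 10 (longword bit 26) is clear in 2BFC0000, so R10 was never saved there.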
198.2. "My stuff" by COMICS::GLEDHILL Wed Dec 28 1994 13:37 (383 lines)
--------------------------------------------------------------------------------
Log No 51017.00-49Q-1UVO    Desc type TS    Sequence no 02
Authr badge no 231847       Creation D/T 9-DEC-1994 11:49
--------------------------------------------------------------------------------

ASSUMPTIONS - the code I am reading from here is essentially the same as that on
the customer's system. This may not be true, as they are on a patch. To save
time I work on that assumption and then check later.

Note: my comments start with *; anything else is from the listings.

The crash is when it calls arbitrate_access in the main create module. If the
return status (r0) indicates failure (low bit clear), it bugchecks with "how can
we fail to access a new file?". R0 has 0 in it, so this counts as an error
condition (blbs and all that). But why 0?

So we were here...

    IF .FUNCTION[IO$V_ACCESS]
    THEN
        BEGIN
        IF NOT ARBITRATE_ACCESS (.FIB [FIB$L_ACCTL], .FCB)
        THEN BUG_CHECK (XQPERR, 'how can we fail to access a new file?');
        CURRENT_WINDOW = CREATE_WINDOW (.FIB[FIB$L_ACCTL] OR FIB$M_CONTROL,
                                        .FIB[FIB$B_WSIZE], .HEADER,
                                        .PACKET[IRP$L_PID], .FCB);
        IF .CURRENT_WINDOW EQL 0
        THEN
            BEGIN

...and it crashed after calling this:

GLOBAL ROUTINE ARBITRATE_ACCESS (ACCTL, FCB) : L_JSB_2ARGS =
!++
!
! Determine if access to this file is allowed.
!
!--
    BEGIN

    MAP
        ACCTL           : BBLOCK,
        FCB             : REF BBLOCK;

    BIND_COMMON;

    EXTERNAL
        CLU$GL_CLUB     : ADDRESSING_MODE (GENERAL);

    LOCAL
        LCKMODE;

    IF .FCB [FCB$W_SEGN] NEQ 0
    THEN
        RETURN SS$_ACCONFLICT;

    IF NOT (.ACCTL [FIB$V_NOLOCK] AND .CLEANUP_FLAGS [CLF_SYSPRV])
    THEN
        IF .FCB [FCB$V_EXCL]
            OR .ACCTL [FIB$V_NOREAD] AND (.FCB [FCB$W_ACNT] NEQ 0)
            OR .ACCTL [FIB$V_NOWRITE] AND (.FCB [FCB$W_WCNT] NEQ 0)
            OR .ACCTL [FIB$V_WRITE] AND (.FCB [FCB$W_LCNT] NEQ 0)
        THEN
            RETURN SS$_ACCONFLICT;

    IF NOT .BBLOCK [CURRENT_UCB [UCB$L_DEVCHAR2], DEV$V_CLU]
        OR .CLU$GL_CLUB EQL 0
    THEN
        RETURN 1;

    LCKMODE = LOCK_MODE (.ACCTL);

    IF .FCB [FCB$L_ACCLKID] EQL 0
    THEN
        NEW_ACCESS_LOCK (.LCKMODE, .FCB)
    ELSE
        IF .LCKMODE<0,8> GTRU .FCB [FCB$B_ACCLKMODE]
        THEN
            CONV_ACCLOCK (.LCKMODE, .FCB)
        ELSE
            SS$_NORMAL

    END;                                ! of routine ARBITRATE_ACCESS

* This routine returns directly with a non-zero value; the only way 0 could be
* returned is if it called one of the lock routines. Checking the fcb (in r11),
* we have a null acclkid, so it will have to have called new_access_lock.

ROUTINE NEW_ACCESS_LOCK (LCKMODE, FCB) : L_NORM =

!++
!
! FUNCTIONAL DESCRIPTION:
!
!       This routine takes out a lock based on the given lock mode and file id,
!       using the appropriate qualifiers in the resource name.
!
! CALLING SEQUENCE:
!       See routine header above.
!
! INPUT PARAMETERS:
!
! IMPLICIT INPUTS:
!
! OUTPUT PARAMETERS:
!       NONE
!
! IMPLICIT OUTPUTS:
!       FCB$L_ACCLCKID - Lock id of granted lock. 0 if no lock granted.
!
! ROUTINE VALUE:
!       1 if access allowed
!       0 if access not allowed.                        <- *
!
! SIDE EFFECTS:
!
!--

    BEGIN

    BIND_COMMON;

    MAP
        FCB             : REF BBLOCK;

    LOCAL
        LOCK_BLOCK      : BBLOCK [24],
        RESNAM          : VECTOR [24, BYTE],
        RESNAM_D        : VECTOR [2] INITIAL (LONG (22), LONG (RESNAM));

    BIND
        LOCK_VAL        = LOCK_BLOCK + 8 : BBLOCK FIELD (AV);

    EXTERNAL
        EXE$AR_EWDATA   : REF BBLOCK ADDRESSING_MODE (ABSOLUTE);
                                        ! pointer to PMS data area

    EXTERNAL ROUTINE
        XQP$FCBSTALE    : ADDRESSING_MODE (ABSOLUTE);   ! blocking routine

    ! Generate the resource name to identify the file in the cluster.
    ! Prefix the entire lock with the facility code for the file system.
    !

    (RESNAM [0]) = 'F11B';
    (RESNAM [4])<0,16> = '$a';

    CH$MOVE (12,
             IF .CURRENT_VCB [VCB$W_RVN] EQL 0
             THEN CURRENT_VCB [VCB$T_VOLCKNAM]
             ELSE CURRENT_RVT [RVT$T_VLSLCKNAM],
             RESNAM [6]);

    (RESNAM [18]) = .FCB [FCB$L_LOCKBASIS];

    EXE$AR_EWDATA [EW_PMS$GL_ACCLCK] = .EXE$AR_EWDATA [EW_PMS$GL_ACCLCK] + 1;

    !
    ! Attempt to acquire the lock. If granted then access is allowed.
    !

        BEGIN
        LOCAL
            LOCAL_FLAGS,
            STATUS;
                                        * note noqueue flag *
        LOCAL_FLAGS = LCK$M_NOQUEUE + LCK$M_SYNCSTS + LCK$M_SYSTEM
                    + LCK$M_VALBLK + LCK$M_NOQUOTA + LCK$M_CVTSYS;

        IF .LCKMODE EQL LCK$K_NLMODE
        THEN
            LOCAL_FLAGS = .LOCAL_FLAGS + LCK$M_EXPEDITE;

        STATUS = $ENQ ( EFN     = EFN,
                        LKMODE  = .LCKMODE,
                        FLAGS   = .LOCAL_FLAGS,
                        BLKAST  = XQP$FCBSTALE,
                        ASTPRM  = .FCB,
                        LKSB    = LOCK_BLOCK,
                        RESNAM  = RESNAM_D);

        IF NOT .STATUS
        THEN
            IF (.STATUS EQL SS$_NOTQUEUED)              <-
            THEN
                RETURN 0                                <- **
            ELSE
                IF (.STATUS NEQ SS$_VALNOTVALID)
                THEN
                    ERR_EXIT (.STATUS);

        FCB [FCB$B_ACCLKMODE] = .LCKMODE;
        FCB [FCB$L_ACCLKID] = .LOCK_BLOCK [LCK_ID];

        FCB [FCB$V_DELAYTRNC] = 0;

        ! If the value block read above was invalid, do not use it.

        IF .LOCK_VAL [AV_DELAYTRNC] AND (.STATUS NEQ SS$_VALNOTVALID)
        THEN
            FCB [FCB$V_DELAYTRNC] = 1;
        FCB [FCB$L_TRUNCVBN] = .LOCK_VAL [AV_TRUNCVBN];

        END;                            ! of block defining STATUS.

    SS$_NORMAL

    END;                                ! End of routine NEW_ACCESS_LOCK

Checking the dump, R1 contains status notqueued (000009B8). It seems reasonable
that this is the status that was returned from the $enq, i.e. that the lock was
held by something else at the time and we elected not to wait (i.e. queue), as
we specified the noqueue flag in the call to $enq. So it looks like the lock was
held by something else (probably on a different node) trying to access the same
file at the same time. As this is a netserver.log file (see the stack in
Norman's description), this is quite likely, as these netserver logs are created
quite often.

--------------------------------------------------------------------------------

Looking at the VMS 6.1 sources (this system is 5.5-2), it seems this problem is
known about, from the create.lis sources:

RLRFLK          Robert L. Rappaport     24-Mar-1993
                Workaround fix to lock manager problem that sometimes causes
                a crash when we call ARBITRATE_ACCESS for a new file and we
                fail to obtain the access lock. The problem is that a previous
                instance of the lock may persist for some short period of time
                after it was last dequeued. The workaround is to retry some
                number of times before causing a bugcheck. If retrying succeeds
                we simply proceed as if nothing out of the ordinary had
                occurred. See LOCKERS audit trail RLRFLK for a long explanation
                of the lock manager problem.

Here is the bit of code equivalent to the bit we crashed in; note that here we
try 100 times before crashing rather than just once.

    IF .FUNCTION[IO$V_ACCESS]
    THEN
        BEGIN
        IF NOT ARBITRATE_ACCESS (.FIB [FIB$L_ACCTL], .FCB)
        THEN

            ! The following is sort of a kludge. The failure to successfully acquire
            ! the access lock of the new file is sometimes due to a problem in the lock
            ! manager where knowledge of a previous incarnation of an access lock with
            ! the same resource name as the one we are trying to create may persist for
            ! a small window of time after this previous lock was dequeued. To deal
            ! with this problem for now, before we do redesign work on the lock manager,
            ! we introduce a retry loop below to retry the ARBITRATE_ACCESS for up to some
            ! number (100) of times. If we succeed in one of these times we simply
            ! continue. If we do not succeed after this number of tries, we revert to
            ! the bugcheck, because the failure is probably due to some other cause.

            BEGIN
            LOCAL
                ARB_STATUS;

            ARB_STATUS = 0;             ! Initialize temporary status variable.
            DECR J FROM 100 TO 1 DO
                IF (ARB_STATUS = ARBITRATE_ACCESS (.FIB [FIB$L_ACCTL], .FCB))
                THEN EXITLOOP;
            IF NOT .ARB_STATUS
            THEN BUG_CHECK (XQPERR, 'how can we fail to access a new file?');
            END;

* It would seem a good idea to upgrade to 6.1 or install the latest patch,
* as this seems to have this fix in as well as 6.1. This is from cscpat_1162,
* dated 25-apr-1994.

Problems addressed in VAXF11X02_U2055:

  o There is a basic lock manager design problem that manifests itself
    in two different file system problems.

    1. Occasionally access to a cluster-shared file is incorrectly
       denied to an accessor. An application will attempt to access a
       shared file in what looks to be a compatible way and will
       receive an SS$_ACCONFLICT status.

    2. Occasionally an application will create a new file      <- ***
       and the system will crash in CREATE with an XQP bugcheck whose
       text says "how can we fail to access a new file."

* Note this fix also; I have seen a number of crashes in other bits of the
* xqp where it crashes due to notqueued, in addition to this one.

Problems addressed in VAXF11X04_U2055:

  o There are several places in the XQP code where the system service
    $ENQ, using the flag LCK$M_NOQUEUE, is used to acquire a lock on a
    resource. In rare instances, with the use of the LCK$M_NOQUEUE flag,
    the XQP receives a status back of SS$_NOTQUEUED which indicates that
    the lock was not acquired. Since the XQP expects that it will never
    have to wait on a lock, a fatal bugcheck occurs and the system
    crashes.

--------------------------------------------------------------------------------

* Below are the gory details from the LOCKERS sources, if you are interested.

! X-21    RLRFLK          Robert L. Rappaport     22-Mar-1993
!         Fix FLK problem. The problem is in the interaction between
!         the file system and the lock manager and it arises due to
!         the way that the file system controls acquiring and
!         converting the access lock of a file.
!
!         The mode that we create an access lock is a function of the
!         way we open the file. If we open the file for read, write,
!         etc., if we are willing to share with other readers, writers,
!         etc. When we attempt to acquire an access lock we announce
!         how we are going to access the file by trying to acquire the
!         lock in a given mode. If that mode is compatible with other
!         current accessors of the file we are successful. If it is
!         not compatible, we are unsuccessful and the access fails.
!         This works only if the union of all the access locks on a
!         file always represents the union of the current accessors
!         and never visibly contains any additional state. The
!         underlying reason that the access FLK problem arose is that
!         due to a file system/lock manager interaction, an intermediate
!         state became visible.
!
!         The intermediate state mentioned above has to do with the
!         fact that we use the access lock blocking AST mechanism for
!         marking FCB's as stale across the cluster. To do this we
!         momentarily attempt to convert the access lock to exclusive
!         in order to trigger the blocking AST's, however since in order
!         to attempt to acquire or convert the access lock of a file, one
!         must first have the serial lock for the file, one would think
!         that the momentary conversion (or attempt) to exclusive would
!         be invisible, since in order to see it one would have to hold
!         the serial lock and the one who holds the serial lock is the
!         one converting to exclusive. However, if one thinks that way
!         one would be wrong.
!
!         The scenario in which this fails is that we call QEX_N_CANCEL
!         to convert to exclusive and trigger the blocking AST's. At
!         the time of the call the access lock is mastered on another
!         node and the conversion attempt is queued on this master node.
!         Upon return from queuing this conversion request, we immediately
!         do a $DEQ with cancel specified to revoke the conversion to
!         exclusive, and we immediately get back confirmation that the
!         conversion was canceled. However, this confirmation came
!         from the local node, and it may be that the conversion is still
!         queued in the master. QEX_N_CANCEL then returns to its caller
!         which calls CONV_ACCLOCK, which ordinarily does a
!         conversion to current mode which causes the lock manager to
!         delay until the previously queued conversion to exclusive
!         truly is canceled in the master node. However, if this call
!         to CONV_ACCLOCK occurs when the FCB$W_REFCNT is zero, i.e.
!         at deaccess time, then CONV_ACCLOCK does not ask for a
!         conversion at all but rather simply dequeues the access lock.
!         In the failing scenario this $DEQ completes immediately in the
!         local node, while the original convert to exclusive is still
!         pending in the master. At this point we continue on,
!         release the serial lock which is immediately taken by someone
!         else who then attempts to acquire a legitimate mode access lock
!         but this acquisition fails due to the still pending conversion
!         to exclusive.
!
!         The fix is to ensure that the conversion to exclusive is truly
!         canceled before continuing on. This can be done by forcing
!         a true conversion to current lock mode whenever returning from
!         QEX_N_CANCEL after triggering a blocking AST on an access lock.
!         This will be done by making two changes; introducing a new
!         routine, QEX_N_CANCEL_ACCLOCK, and introducing a new optional
!         argument to CONV_ACCLOCK. The new routine a) calls QEX_N_CANCEL
!         to trigger the blocking AST's on the access lock, and then b)
!         calls CONV_ACCLOCK with the new optional argument to force a
!         lock conversion from the current mode to the current mode.
!         This seemingly unnecessary action forces the lock manager to
!         send a round trip message to the mastering node for the access
!         lock and guarantees that the conversion to exclusive is no longer
!         visible anywhere in the cluster. In addition to these changes,
!         all file system callers to QEX_N_CANCEL for the access lock must
!         be modified to use the new QEX_N_CANCEL_ACCLOCK. There are
!         just two callers of QEX_N_CANCEL for the access lock; MAKE_FCB_STALE,
!         here in LOCKERS.B32 and MARKDEL_FCB in DELETE.B32.
!
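The V5.5-2 and V6.1 CREATE fragments quoted in this note differ only in how many times ARBITRATE_ACCESS is tried before the XQPERR bugcheck. A toy Python model of that control flow, under loud assumptions: `FakeAccessLock`, `access_new_file`, `vms_success` and `XqpBugcheck` are hypothetical names, and the lock manager race from the LOCKERS audit trail is not modelled at all, only a lock that looks "busy" (NEW_ACCESS_LOCK returning 0 after SS$_NOTQUEUED) for a fixed number of attempts. Success is judged by the low bit of the status, as the BLISS `IF NOT` test does.

```python
# Toy model of the CREATE control flow around ARBITRATE_ACCESS (not VMS code).
SS_NORMAL = 1        # SS$_NORMAL
MAX_RETRIES = 100    # DECR J FROM 100 TO 1 in the V6.1 source


class XqpBugcheck(Exception):
    """Hypothetical stand-in for BUG_CHECK (XQPERR, ...)."""


def vms_success(status):
    """VMS condition values signal success in the low bit (the BLBS test)."""
    return (status & 1) == 1


class FakeAccessLock:
    """Hypothetical ARBITRATE_ACCESS whose first `busy_calls` calls return 0,
    as NEW_ACCESS_LOCK does when $ENQ with LCK$M_NOQUEUE gets SS$_NOTQUEUED."""

    def __init__(self, busy_calls):
        self.busy_calls = busy_calls
        self.calls = 0

    def arbitrate_access(self):
        self.calls += 1
        return 0 if self.calls <= self.busy_calls else SS_NORMAL


def access_new_file(arbitrate_access, retries=MAX_RETRIES):
    """retries=0 behaves like V5.5-2; retries=100 like the V6.1 workaround."""
    status = arbitrate_access()          # IF NOT ARBITRATE_ACCESS (...)
    if vms_success(status):
        return status
    arb_status = 0                       # ARB_STATUS = 0
    for _ in range(retries):             # DECR J FROM 100 TO 1 DO
        arb_status = arbitrate_access()
        if vms_success(arb_status):
            break                        # EXITLOOP
    if not vms_success(arb_status):
        raise XqpBugcheck("how can we fail to access a new file?")
    return arb_status


# A lock that stays "busy" for three attempts is ridden out by the retries...
lock = FakeAccessLock(busy_calls=3)
print(access_new_file(lock.arbitrate_access), lock.calls)   # prints: 1 4

# ...but the single attempt made by V5.5-2 bugchecks.
try:
    access_new_file(FakeAccessLock(busy_calls=3).arbitrate_access, retries=0)
except XqpBugcheck as exc:
    print("XQPERR:", exc)   # prints: XQPERR: how can we fail to access a new file?
```

Because the V6.1 code makes one initial call and then up to 100 more, a lock that never frees up is tried 101 times before the bugcheck. The R1 value from this dump, 000009B8, has its low bit clear, so `vms_success(0x9B8)` is False, matching the analysis that the $ENQ failed.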