
Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

479.0. "ALLIN1 process in MUTEX state" by CHOWDA::GLICKMAN (writing from Newport,RI) Tue Apr 15 1997 17:25

The ALL-IN-1 Administrator at this customer site logged into the ALLIN1
account yesterday and sent mail to all subscribers in ALL-IN-1 (3.0A) on
OpenVMS 6.1 VAX.  The customer's logging software shows that the Forwarding
Message Header screen was the last thing he used.  He filled in the TO: and
the SUBJECT: fields.  A message came back that there was new ALL-IN-1 mail
for POSTMASTER from MANAGER.  Then I see an interrupt.

This process is currently in MUTEX state.  I have included the SHOW PROCESS
output from ANALYZE/SYSTEM and the ALLIN1 account record from SYSUAF.DAT.

Can someone comment on what the problem is, and whether a reboot is the only
way to recover from it?

Note that I have cross-posted this note in both the ALL-IN-1 and
VMSNOTES notes conferences.

Appreciating any responses.

Process index: 0085   Name: ALLIN1   Extended PID: 2021DA85

Status : 02040023 res,delpen,respen,phdres,inter
Status2: 00000001 quantum_resched
PCB address              95A07FC0    JIB address              951F34C0
PHD address              C3CF4000    Swapfile disk address    00000000
Master internal PID      00ED0085    Subprocess count                0
Internal PID             00ED0085    Creator internal PID     00000000
Extended PID             2021DA85    Creator extended PID     00000000
State                       MUTEX    Termination mailbox          0000
Current priority                9    AST's enabled                ESU=20
Base priority                   4    AST's active                 NONE
UIC                [00001,000222]    AST's remaining               156
Mutex count                     0    Buffered I/O count/limit      149/150
Waiting EF cluster              0    Direct I/O count/limit        150/150
Starting wait time       1B001B1A    BUFIO byte count/limit        0/27392
Event flag wait mask     951F34C0    # open files allowed left     266
Local EF cluster 0       60000001    Timer entries allowed left     50
Local EF cluster 1       D0000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        4781
Global cluster 3 pointer 00000000    Global WS page count         2606


Username: ALLIN1                           Owner:  SYSTEM
Account:  SYSTEM                           UIC:    [1,222] ([ALLIN1])
CLI:      DCL                              Tables: DCLTABLES
Default:  APPL$DISK3:[ALLIN1]
LGICMD:   
Flags:  DisCtlY DefCLI
Primary days:   Mon Tue Wed Thu Fri Sat Sun
Secondary days:                            
No access restrictions
Expiration:            (none)    Pwdminimum:  6   Login Fails:     0
Pwdlifetime:         30 00:00    Pwdchange:   8-APR-1997 09:43 
Last Login: 15-APR-1997 13:32 (interactive), 15-APR-1997 15:37 (non-interactive)
Maxjobs:         0  Fillm:       100  Bytlm:        36000
Maxacctjobs:     0  Shrfillm:      0  Pbytlm:           0
Maxdetach:       0  BIOlm:        50  JTquota:       2048
Prclm:          10  DIOlm:        50  WSdef:          600
Prio:            4  ASTlm:       100  WSquo:         1500
Queprio:         0  TQElm:        50  WSextent:      3000
CPU:        (none)  Enqlm:       350  Pgflquo:     100000
Authorized Privileges: 
  CMKRNL    DETACH    EXQUOTA   GRPNAM    NETMBX    OPER      PHY_IO    PRMGBL
  PRMMBX    READALL   SYSGBL    SYSLCK    SYSNAM    SYSPRV    TMPMBX    VOLPRO
  WORLD
Default Privileges: 
  CMKRNL    DETACH    EXQUOTA   GRPNAM    NETMBX    OPER      PHY_IO    PRMGBL
  PRMMBX    READALL   SYSGBL    SYSLCK    SYSNAM    SYSPRV    TMPMBX    VOLPRO
  WORLD
Identifier                         Value           Attributes
  OA$PRVAPP                        %X800100B1      RESOURCE 
  OA$ADMIN                         %X80010057      
  OA$MANAPP                        %X8001017A      RESOURCE 
  OA$MANAGER                       %X8001017B      
  OA$USER_QM                       %X8001017D      
  OAFC$SYSMAN                      %X8001017E      
  A1_DEV_TEAM                      %X800100B4      
%UAF-I-NOMODS, no modifications made to system authorization file
%UAF-I-NAFNOMODS, no modifications made to network proxy database
%UAF-I-RDBNOMODS, no modifications made to rights database

479.1. by CSC64::BLAYLOCK (If at first you doubt,doubt again.) Tue Apr 15 1997 18:16, 13 lines

Starting wait time       1B001B1A    BUFIO byte count/limit        0/27392

The process is out of BUFIO (Bytlm: 36000) and will remain in
MUTEX until it is returned (maybe by a subprocess completing/terminating).

You might check the HACKERS notes file for some utility 
to bump up the quota 'on-line' or if this process has subprocesses
that are not stuck also, delete one of them and see if that
returns enough quota to clear the parent.

The quotas for the account look pretty low, so you might want to
bump them all up before the process(es) run out of one of the
others (BIOlm, DIOlm, ASTlm, TQElm, Enqlm) as well.
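
As a rough cross-check of the reading above, the count/limit pairs in the
SHOW PROCESS display from .0 can be scanned for a quota that has run dry.
This is only a sketch: the three lines are copied from .0, the helper name
exhausted_quotas is hypothetical, and it assumes the first number of each
pair is the amount still available.

```python
import re

# Count/limit lines copied from the SHOW PROCESS display in note .0.
SDA_LINES = """\
Buffered I/O count/limit      149/150
Direct I/O count/limit        150/150
BUFIO byte count/limit        0/27392
"""

def exhausted_quotas(text):
    """Return the names of count/limit fields whose remaining count is zero.

    Assumption: the first number of each pair is what is still available.
    """
    hits = []
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s+count/limit\s+(\d+)/(\d+)\s*$", line)
        if m and int(m.group(2)) == 0:
            hits.append(m.group(1))
    return hits

print(exhausted_quotas(SDA_LINES))   # prints ['BUFIO byte']
```

Only the BUFIO byte count shows zero remaining, which matches the diagnosis
that the job has run out of BYTLM rather than BIOlm or DIOlm.
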

479.2. "how to cut your JIB" by GIDDAY::GILLINGS (a crucible of informative mistakes) Tue Apr 15 1997 20:51, 152 lines

.1>The process is out of BUFIO (Bytlm: 36000) and will remain in
.1>MUTEX until it is returned (maybe by a subprocess completing/terminating).

   Unlikely since there are no other processes in the job tree:

.0>Master internal PID      00ED0085    Subprocess count                0
.0>Internal PID             00ED0085    Creator internal PID     00000000
.0>Extended PID             2021DA85    Creator extended PID     00000000
								^^^^^^^^^^

    The process can be unstuck, but it's not 100% safe to do so. Here is an
  old note with instructions on how to do it. Use at your own risk!

  The subsequent note explains how to use DELTA to do the same thing, however,
  it doesn't always work since the process may have been swapped out. The
  program method has always worked for me.

						John Gillings, Sydney CSC


            <<< GIDDAY::DISK$NOTES:[NOTES$LIBRARY]TSCNOTES.NOTE;1 >>>
                      -< Sydney Telephone Support Group >-
================================================================================
Note 233.0           "MUTEX" state, symptoms, cause and cure             1 reply
GIDDAY::GILLINGSNP "a crucible of informative mista" 70 lines   2-AUG-1990 15:30
--------------------------------------------------------------------------------
    You will sometimes see processes in MUTEX state. To diagnose the cause use
  SDA to look at the process. If the "Event flag wait mask" has the same value
  as the JIB address then the problem is due to depletion of a pooled resource,
  one of BYTLM or TQELM. The process is waiting for another process in the same
  job tree to return some quota to the job pool. Which quota is usually obvious
  from the "BUFIO byte count/limit" or "Timer entries allowed left" fields, but
  you can make certain by formatting the JIB and examining offset JIB$B_FLAGS. 
  A value of 01 indicates BYTLM, 02 indicates TQELM. In theory, you may see 03 -
  this would seem to imply that 2 processes in the same job tree are in MUTEX 
  state, one waiting for BYTLM, the other for TQELM. Seems pretty unlikely but
  stranger things have happened. For all the gory details of diagnosis, see
  a STARS article titled: "Discussion Of Unusual MUTEX Wait State In VMS V5.n"
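
  The rule above can be written down as a small sketch. The function name
  diagnose_mutex and the returned strings are hypothetical; the flag values
  01 and 02 are the ones given in the text. The sample call uses the event
  flag wait mask and JIB address from the SHOW PROCESS display at the top of
  this topic.

```python
# JIB$B_FLAGS values as described in the note: 01 = BYTLM, 02 = TQELM.
JIB_FLAG_BYTLM = 0x01
JIB_FLAG_TQELM = 0x02

def diagnose_mutex(ef_wait_mask, jib_address, jib_flags=None):
    """Classify a MUTEX wait from values read off the SDA display."""
    if ef_wait_mask != jib_address:
        return "not a job quota wait"
    generic = "job quota wait (BYTLM or TQELM)"
    if jib_flags is None:
        return generic
    waiting = [name
               for bit, name in ((JIB_FLAG_BYTLM, "BYTLM"),
                                 (JIB_FLAG_TQELM, "TQELM"))
               if jib_flags & bit]
    if not waiting:
        return generic
    return "job quota wait: " + " and ".join(waiting)

# Wait mask and JIB address are both 951F34C0 in the display.
print(diagnose_mutex(0x951F34C0, 0x951F34C0))
```

  Passing the JIB$B_FLAGS byte as the third argument narrows the answer to
  BYTLM, TQELM, or both.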

    It may be that a process in this state has no subprocess or creator
  (ie: subprocess count is 0 and creator PID is 0) in which case it may as
  well be waiting for lemon scented paper napkins instead of BYTLM or TQELM
  because there isn't anyone around to return quota to it. As with other
  MWAIT states, a STOP/ID won't do anything. However, there is a way, albeit
  a rather dangerous hack, to unwedge such a process. Using SDA, format the 
  JIB and note the address (left hand column) of JIB$L_BYTCNT. eg:

  SDA> format 803D26C0
803D26C0   JIB$L_MTLFL             803D26C0
803D26C4   JIB$L_MTLBL             803D26C0
803D26C8   JIB$W_SIZE                  00A0
803D26CA   JIB$B_TYPE                2F
803D26CB   JIB$B_DAYTYPES          60
803D26CC   JIB$T_USERNAME
803D26CC                           4C4C4947
803D26D0                           53474E49
803D26D4                           20202020
803D26D8   JIB$T_ACCOUNT
803D26D8                           4C4C4947
803D26DC                           53474E49
803D26E0   JIB$L_BYTCNT            00000024	**** Address of this
803D26E4   JIB$L_BYTLM             000081C0
803D26E8   JIB$L_PBYTCNT           00000000
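
   Before poking anything it is worth confirming that the formatted block
   really is the JIB you expect. The sketch below is an illustration only:
   it decodes the JIB$T_USERNAME longwords from the dump above as ASCII in
   VAX (little-endian) byte order and works out the offset of JIB$L_BYTCNT
   from the JIB base. The helper name decode_longwords is hypothetical, and
   the offset holds only for this dump; always take the address from FORMAT
   on the system in question.

```python
import struct

# Values copied from the SDA FORMAT output above.
JIB_BASE = 0x803D26C0
BYTCNT_ADDR = 0x803D26E0
USERNAME_LONGWORDS = [0x4C4C4947, 0x53474E49, 0x20202020]

def decode_longwords(longwords):
    """Pack each longword little-endian (VAX byte order), decode as ASCII."""
    raw = b"".join(struct.pack("<L", lw) for lw in longwords)
    return raw.decode("ascii")

print(repr(decode_longwords(USERNAME_LONGWORDS)))  # prints 'GILLINGS    '
print(hex(BYTCNT_ADDR - JIB_BASE))                 # prints 0x20
```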

   In this case the magic number is 803D26E0. Next, type in the following MACRO
   program:

        .title fixbytlm
        .entry start,^m<>
        pushl  #0
        pushab poke
        calls  #2,g^sys$cmkrnl
        RET
        .entry poke,^m<>        ; WARNING - UNSYNCHRONIZED K-MODE CODE
        MOVL   #^X803D26E0,R0   ; replace the address with your BYTCNT address
        ADDL2  #100,(R0)+       ; replace "#100" with the amount of BYTLM
        ADDL2  #100,(R0)        ; you wish to add on both these lines
        MOVL   #1,R0
        RET
        .END Start

   Now enable CMKRNL privilege, assemble, link and run the program (put fingers
   in ears before RUN - if you got the address or instructions wrong, the system
   will be blown away from underneath you!)

   If you survived, the MUTEX process should have received a boost in BYTLM and
   should now be runnable, but you'll have to give it a little kick to get it
   going. Anything which delivers a special kernel AST will suffice. An SDA
   SHOW PROCESS or $GETJPI <anything from the PCB> for example. If a STOP/ID
   command had been issued on the process before running this program, the
   process will just vanish. It is probably best to delete the process ASAP 
   since running with a hacked JIB is not exactly supported. Also, remember to
   delete the program and source after use. A program which adds an arbitrary
   number to some system addresses isn't very safe to leave lying around.
   
						John Gillings


            <<< GIDDAY::DISK$NOTES:[NOTES$LIBRARY]TSCNOTES.NOTE;1 >>>
                      -< Sydney Telephone Support Group >-
================================================================================
Note 233.1           "MUTEX" state, symptoms, cause and cure              1 of 1
GIDDAY::BRODRIBB "Scornful dogs eat dirty puddings"  45 lines   6-AUG-1992 11:50
                            -< Another solution... >-
--------------------------------------------------------------------------------
    ... which has the same risks (CMK hackery) but saves the customer a
    whole lot of typing:
    
    1. Find the process IPID..
    
    Process index: 0008   Name: BRODRIBB   Extended PID: 00000028
    -------------------------------------------------------------
    Process status:  00140001   RES,PHDRES,LOGIN
    
    PCB address              802E8210    JIB address              8075FF00
    PHD address              808AC200    Swapfile disk address    00000000
    Master internal PID      00010008    Subprocess count                0
    Internal PID             00010008    Creator internal PID     00000000
    			        ^
    				|
    		****------------+
    
    
    2. Format the JIB as before and find the addresses of JIB$L_BYTCNT and
       JIB$L_BYTLM:
    			8075FF20   JIB$L_BYTCNT            00017660
    			8075FF24   JIB$L_BYTLM             00017660
    
    3. Exit SDA and use DELTA to increment BYTCNT and BYTLM
    
    	$ run sys$share:delta
    	1;M					<--- you enter
    	00000001				<--- DELTA returns. All
    						     processes and system
    						     space are now writeable
    	<CR>					
        ipid: address/current_value(1) new_value(2)     <--- generic command to
                                                     change value at address.
                                                     (1) DELTA enters this in
                                                     response to /
                                                     (2) you enter this
    
    	In the real world example here:
    						     
    	00010008: 8075ff24/00017660 00020000	<--- must enter BYTLM first 
    						     otherwise CRASH !!!
    	00010008: 8075ff20/00017660 00019000
    
    	exit
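
    The arithmetic behind the two deposits can be sketched as follows. The
    helper name plan_deposits is hypothetical. The check that the new BYTCNT
    does not exceed the new BYTLM is an assumption about why the limit has
    to be raised first; the note itself only says that the other order
    crashes.

```python
def plan_deposits(bytcnt, bytlm, new_bytcnt, new_bytlm):
    """Return (field, old, new) deposits, limit first, as the note instructs."""
    # Assumption: the count must never be larger than the limit.
    if new_bytcnt > new_bytlm:
        raise ValueError("BYTCNT must not exceed BYTLM")
    return [("JIB$L_BYTLM", bytlm, new_bytlm),
            ("JIB$L_BYTCNT", bytcnt, new_bytcnt)]

# Old and new values taken from the DELTA session above.
for field, old, new in plan_deposits(0x17660, 0x17660, 0x19000, 0x20000):
    print(f"{field}: {old:08X} -> {new:08X} (+{new - old})")
```

    For the session above this shows BYTLM growing by 35232 bytes and BYTCNT
    by 6560 bytes.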