| Starting wait time 1B001B1A BUFIO byte count/limit 0/27392
The process is out of BUFIO (Bytlm: 36000) and will remain in
MUTEX until it is returned (maybe by a subprocess completing/terminating).
You might check the HACKERS notes file for some utility
to bump up the quota 'on-line' or if this process has subprocesses
that are not stuck also, delete one of them and see if that
returns enough quota to clear the parent.
The quotas for the account look pretty low, so you might want to
bump them all up before the process(es) run out of one of the
others (BIOlm, DIOlm, ASTlm, TQElm, Enqlm) as well.
|
|
.1>The process is out of BUFIO (Bytlm: 36000) and will remain in
.1>MUTEX until it is returned (maybe by a subprocess completing/terminating).
Unlikely since there are no other processes in the job tree:
.0>Master internal PID 00ED0085 Subprocess count 0
.0>Internal PID 00ED0085 Creator internal PID 00000000
.0>Extended PID 2021DA85 Creator extended PID 00000000
^^^^^^^^^^
The process can be unstuck, but it's not 100% safe to do so. Here is an
old note with instructions on how to do it. Use at your own risk!
The subsequent note explains how to use DELTA to do the same thing; however,
that doesn't always work, since the process may have been swapped out. The
program method has always worked for me.
John Gillings, Sydney CSC
<<< GIDDAY::DISK$NOTES:[NOTES$LIBRARY]TSCNOTES.NOTE;1 >>>
-< Sydney Telephone Support Group >-
================================================================================
Note 233.0 "MUTEX" state, symptoms, cause and cure 1 reply
GIDDAY::GILLINGSNP "a crucible of informative mista" 70 lines 2-AUG-1990 15:30
--------------------------------------------------------------------------------
You will sometimes see processes in MUTEX state. To diagnose the cause use
SDA to look at the process. If the "Event flag wait mask" has the same value
as the JIB address then the problem is due to depletion of a pooled resource,
one of BYTLM or TQELM. The process is waiting for another process in the same
job tree to return some quota to the job pool. Which quota is usually obvious
from the "BUFIO byte count/limit" or "Timer entries allowed left" fields, but
you can make certain by formatting the JIB and examining offset JIB$B_FLAGS.
A value of 01 indicates BYTLM, 02 indicates TQELM. In theory, you may see 03 -
this would seem to imply that 2 processes in the same job tree are in MUTEX
state, one waiting for BYTLM, the other for TQELM. Seems pretty unlikely but
stranger things have happened. For all the gory details of diagnosis, see
a STARS article titled: "Discussion Of Unusual MUTEX Wait State In VMS V5.n"
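As a quick offline aid, the JIB$B_FLAGS values described above can be decoded with a few lines of Python. This is only a sketch: the bit meanings (01 = BYTLM, 02 = TQELM) are taken from this note, and the function name is made up for illustration.

```python
# Bit values as stated in the note: 01 = BYTLM, 02 = TQELM.
FLAG_BITS = {0x01: "BYTLM", 0x02: "TQELM"}

def decode_jib_flags(value):
    """Return the pooled quotas that a JIB$B_FLAGS value says are being waited on."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if value & bit]

# Show the three values discussed in the note.
for v in (0x01, 0x02, 0x03):
    print(f"{v:02X} -> {', '.join(decode_jib_flags(v))}")
```

A value of 03 simply has both bits set, which matches the "two waiters in one job tree" reading given above.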
It may be that a process in this state has no subprocess or creator
(i.e. subprocess count is 0 and creator PID is 0), in which case it may as
well be waiting for lemon scented paper napkins instead of BYTLM or TQELM
because there isn't anyone around to return quota to it. As with other
MWAIT states, a STOP/ID won't do anything. However, there is a way, albeit
a rather dangerous hack, to unwedge such a process. Using SDA, format the
JIB and note the address (left hand column) of JIB$L_BYTCNT, e.g.:
SDA> format 803D26C0
803D26C0 JIB$L_MTLFL 803D26C0
803D26C4 JIB$L_MTLBL 803D26C0
803D26C8 JIB$W_SIZE 00A0
803D26CA JIB$B_TYPE 2F
803D26CB JIB$B_DAYTYPES 60
803D26CC JIB$T_USERNAME
803D26CC 4C4C4947
803D26D0 53474E49
803D26D4 20202020
803D26D8 JIB$T_ACCOUNT
803D26D8 4C4C4947
803D26DC 53474E49
803D26E0 JIB$L_BYTCNT 00000024 **** Address of this
803D26E4 JIB$L_BYTLM 000081C0
803D26E8 JIB$L_PBYTCNT 00000000
In this case the magic number is 803D26E0. Next, type in the following MACRO
program:
.title fixbytlm
.entry start,^m<>
pushl #0
pushab poke
calls #2,g^sys$cmkrnl
RET
.entry poke,^m<> ; WARNING - UNSYNCHRONIZED K-MODE CODE
MOVL #^X803D26E0,R0 ; replace the address with your BYTCNT address
ADDL2 #100,(R0)+ ; replace "#100" with the amount of BYTLM
ADDL2 #100,(R0) ; you wish to add on both these lines
MOVL #1,R0
RET
.END Start
Now enable CMKRNL privilege, assemble, link and run the program (put fingers
in ears before RUN - if you got the address or instructions wrong, the system
will be blown away from underneath you!)
If you survived, the MUTEX process should have received a boost in BYTLM and
should now be runnable, but you'll have to give it a little kick to get it
going. Anything which delivers a special kernel AST will suffice. An SDA
SHOW PROCESS or $GETJPI <anything from the PCB> for example. If a STOP/ID
command had been issued on the process before running this program, the
process will just vanish. It is probably best to delete the process ASAP
since running with a hacked JIB is not exactly supported. Also, remember to
delete the program and source after use. A program which adds an arbitrary
number to some system addresses isn't very safe to leave lying around.
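To sanity-check the numbers before running anything in kernel mode, the effect of the poke routine in the MACRO program above can be modelled offline. The sketch below is plain Python operating on a dictionary, not on a live system. It uses the BYTCNT and BYTLM values from the SDA listing, and it assumes that "#100" is decimal (the assembler's default radix) and that JIB$L_BYTLM sits one longword after JIB$L_BYTCNT, as the listing shows.

```python
# Offline model only: addresses and values copied from the SDA listing in this note.
BYTCNT_ADDR = 0x803D26E0

memory = {
    BYTCNT_ADDR: 0x00000024,      # JIB$L_BYTCNT
    BYTCNT_ADDR + 4: 0x000081C0,  # JIB$L_BYTLM, the next longword
}

def poke(mem, addr, amount):
    """Mimic ADDL2 #amount,(R0)+ followed by ADDL2 #amount,(R0)."""
    r0 = addr
    mem[r0] = (mem[r0] + amount) & 0xFFFFFFFF  # add to BYTCNT
    r0 += 4                                    # autoincrement steps one longword
    mem[r0] = (mem[r0] + amount) & 0xFFFFFFFF  # add to BYTLM
    return mem

poke(memory, BYTCNT_ADDR, 100)
for address, value in sorted(memory.items()):
    print(f"{address:08X} {value:08X}")
```

With these inputs BYTCNT goes from 00000024 to 00000088 and BYTLM from 000081C0 to 00008224, so both fields rise by the same amount.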
John Gillings
<<< GIDDAY::DISK$NOTES:[NOTES$LIBRARY]TSCNOTES.NOTE;1 >>>
-< Sydney Telephone Support Group >-
================================================================================
Note 233.1 "MUTEX" state, symptoms, cause and cure 1 of 1
GIDDAY::BRODRIBB "Scornful dogs eat dirty puddings" 45 lines 6-AUG-1992 11:50
-< Another solution... >-
--------------------------------------------------------------------------------
... which has the same risks (CMK hackery) but saves the customer a
whole lot of typing:
1. Find the process IPID:
Process index: 0008 Name: BRODRIBB Extended PID: 00000028
-------------------------------------------------------------
Process status: 00140001 RES,PHDRES,LOGIN
PCB address 802E8210 JIB address 8075FF00
PHD address 808AC200 Swapfile disk address 00000000
Master internal PID 00010008 Subprocess count 0
Internal PID 00010008 Creator internal PID 00000000
^
|
****------------+
2. Format the JIB as before and find the addresses of JIB$L_BYTCNT and
JIB$L_BYTLM:
8075FF20 JIB$L_BYTCNT 00017660
8075FF24 JIB$L_BYTLM 00017660
3. Exit SDA and use DELTA to increment BYTCNT and BYTLM:
$ run sys$share:delta
1;M <--- you enter
00000001 <--- DELTA returns. All
processes and system
space are now writeable
<CR>
ipid: address/current_value(1) new_value(2) <--- generic command to
change value at address.
(1) DELTA enters this in
response to /
(2) you enter this
In the real world example here:
00010008: 8075ff24/00017660 00020000 <--- must enter BYTLM first
otherwise CRASH !!!
00010008: 8075ff20/00017660 00019000
exit
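The note does not say why BYTLM must be entered first. One plausible reading, and it is only an assumption here, is that the byte count should never exceed the limit, even for a moment. The Python sketch below replays the two deposits from the example in both orders and reports whether BYTCNT <= BYTLM holds after each step. The function name and the consistency rule are illustrative, not taken from VMS documentation.

```python
# Values copied from the DELTA example in this note.
OLD_VALUE = 0x00017660   # both BYTCNT and BYTLM before the change
NEW_BYTLM = 0x00020000
NEW_BYTCNT = 0x00019000

def deposit_sequence(order):
    """Apply the deposits in the given order.

    Returns a list of (field, bytcnt, bytlm, consistent) tuples, where
    consistent means BYTCNT <= BYTLM (an assumed invariant).
    """
    state = {"BYTCNT": OLD_VALUE, "BYTLM": OLD_VALUE}
    new_values = {"BYTCNT": NEW_BYTCNT, "BYTLM": NEW_BYTLM}
    steps = []
    for field in order:
        state[field] = new_values[field]
        ok = state["BYTCNT"] <= state["BYTLM"]
        steps.append((field, state["BYTCNT"], state["BYTLM"], ok))
    return steps

for order in (("BYTLM", "BYTCNT"), ("BYTCNT", "BYTLM")):
    print(" then ".join(order))
    for field, bytcnt, bytlm, ok in deposit_sequence(order):
        print(f"  set {field}: BYTCNT={bytcnt:08X} BYTLM={bytlm:08X} consistent={ok}")
```

In this example the limit rises by 0x89A0 bytes and the count by 0x19A0. Depositing BYTLM first keeps the count at or below the limit throughout, while the reverse order leaves the count above the limit after the first deposit.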
|