[OPENVMS] How to Troubleshoot a Process in MUTEX State
Any party granted access to the following copyrighted information
(protected under Federal Copyright Laws), pursuant to a duly executed
Digital Service Agreement may, under the terms of such agreement copy
all or selected portions of this information for internal use and
distribution only. No other copying or distribution for any other
purpose is authorized.
Copyright (c) Digital Equipment Corporation, 1994, 1995. All rights reserved.
PRODUCT: OpenVMS Alpha, All Versions
OpenVMS VAX, All Versions
COMPONENT: Scheduler
SOURCE: Digital Equipment Corporation
OVERVIEW:
This is a general troubleshooting article for processes hung in the
MUTEX wait state. See the RELATED ARTICLES section for articles that
cover more specific problems involving the MUTEX wait state.
QUESTION:
The DCL command SHOW SYSTEM shows one or more processes hung in the
MUTEX wait state. How do you determine what the processes are waiting
for, and why they are waiting?
$ SHOW SYSTEM
VAX/VMS V6.1 on node COORS 10-AUG-1994....
  Pid    Process Name    State  Pri
20E00401 SWAPPER         HIB     16
20E03402 wahkaw::Write   LEF      5
20E02C03 DECW$TE_2C03    LEF      6
20E00C05 SOFTBALL MANIAC LEF      7
20E00406 CONFIGURE       HIB     10
20E07008 J_HASSENPFEFF   LEF      5
20E00E46 Marty           LEF      4
20E08647 BOO_BOO         MUTEX    4  <--- Process hung in MUTEX
20E00E48 Harv            LEF     16
20E04E4A Dave            LEF      4
ANSWER:
The operating system uses MUTEXes (Mutual Exclusion Semaphores) as a
synchronization technique for shared data structures that do not
require the process to be operating at elevated IPL (Interrupt
Priority Level).
A MUTEX is a data structure consisting of a longword for OpenVMS VAX
systems, and longwords or quadwords for OpenVMS Alpha systems.
Longword Format:
 31            16              0
+--------------+-+--------------+
|    Status    | | Owner Count  |
+--------------+-+--------------+
                ^
                |
                +--- Bit 16 = Write-Pending or Write-in-Progress flag
Quadword Format:
 31                           0
+----------------------------+-+
|           Status           | | <--- Bit 0 = Write-Pending or
+----------------------------+-+      Write-in-Progress flag
|         Owner Count          |
+------------------------------+
NOTE:
The "Status" field of a MUTEX is undefined and reserved to DIGITAL.
The "Owner Count" field is initialized to negative 1 (i.e., all "F"s),
so that a value of 0 indicates that there is 1 owner.
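The field layout described above can be illustrated with a short Python
sketch. It is only an illustration of the diagrams in this article, not
operating system code: the function names are hypothetical, the longword
decoder assumes that the owner count occupies bits 0 through 15, and the
quadword decoder takes the status and owner count longwords as two
separate values.

```python
def decode_longword_mutex(value):
    """Decode a longword mutex: owner count in bits 0-15, write flag in bit 16."""
    count = value & 0xFFFF
    if count & 0x8000:              # sign-extend the 16-bit owner count field
        count -= 0x10000
    # The field starts at -1, so the number of owners is the field value plus 1.
    return {"owners": count + 1, "write_flag": bool(value & (1 << 16))}


def decode_quadword_mutex(status, owner_count):
    """Decode a quadword mutex given as its two longwords (write flag = status bit 0)."""
    if owner_count & 0x80000000:    # sign-extend the 32-bit owner count field
        owner_count -= 0x100000000
    return {"owners": owner_count + 1, "write_flag": bool(status & 1)}


print(decode_longword_mutex(0x00010000))   # one owner, write flag set
print(decode_longword_mutex(0x0000FFFF))   # no owners, write flag clear
```

The first value printed corresponds to the mutex contents 00010000 shown
later in EXAMPLE #1.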
A process is placed in the MUTEX state when it is unable to gain read or
write access to a specified MUTEX. The inability to gain access will be
due to the write-pending or write-in-progress flag being equal to 1.
To determine what MUTEX a process is waiting for, examine the value for the
"Event flag wait mask" field from the SDA command SHOW PROCESS.
EXAMPLE #1:
Finding the Mutex
-----------------
The following steps show a simple approach for troubleshooting a single
process in MUTEX when a single process is blocking the acquisition of the
mutex. For troubleshooting techniques involving multiple processes, see
EXAMPLE #3.
1. Invoke the System Dump Analyzer Utility (SDA) to examine the running
system:
$ ANALYZE/SYSTEM
2. Read in the system definitions for SDA so that any MUTEX address
can be interpreted.
For OpenVMS Alpha
-----------------
SDA> READ SYS$LOADABLE_IMAGES:SYSDEF
For OpenVMS VAX
---------------
SDA> READ SYS$SYSTEM:SYSDEF
3. View the processes on the system, noting those processes in the
MUTEX state.
SDA> SHOW SUMMARY
Current process summary
Extended  Indx Process name    Username    State   Pri
-- PID -- ---- --------------- ----------- ------- ---
20E00401  0001 SWAPPER         SYSTEM      HIB      16
20E03402  0002 Write_Crmp      KING        LEF       5
20E02C03  0003 DECW$TE_2C03    SYSTEM      LEF       6
20E00C05  0005 BASEBALL        ROCKIE      LEF       7
20E00406  0006 CONFIGURE       SYSTEM      HIB      10
20E07008  0008 HASSENDOODOO    BAYWTCH     LEF       5
20E00E46  0246 Marty           MARTY       LEF       4
20E08647  0247 BOO_BOO         HUNTER      MUTEX     4  <--- Indx 0247 is used in step 4
20E00E48  0248 Harv            HOGGIE      LEF      16
20E04E4A  024A Dave            STUCKIE     LEF       4
4. View the process hung in MUTEX.
SDA> SHOW PROCESS/INDEX=247
Process index: 0247 Name: BOO_BOO Extended PID: 20E08647
------------------------------------------------------------
Status : 02040001 res,phdres,inter
Status2: 00000001 quantum_resched
PCB address 840BE140 JIB address 83D58DC0
PHD address 9CD08E00 Swapfile disk address 00000000
Master internal PID 00210247 Subprocess count 0
Internal PID 00210247 Creator internal PID 00000000
Extended PID 20E08647 Creator extended PID 00000000
State MUTEX Termination mailbox 0000
Current priority 7 AST's enabled KESU
Base priority 4 AST's active NONE
UIC [00022,000050] AST's remaining 197
Mutex count 0 Buff I/O cnt/limt 100/100
Waiting EF cluster 1 Direct I/O cnt/limt 100/100
Starting wait time 1B001B1B BIO byte cnt/limt 65344/65344
Event flag wait mask 80004360 # open files allowed left 99
5. Translate the Event flag wait mask (80004360 in the display above):
SDA> EXAMINE 80004360
LNM$AL_MUTEX:  00010000
                  ^
                  bit 16, "write" flag
The process is waiting on the "Shared Logical Names Data Structure"
MUTEX, LNM$AL_MUTEX (see the list at the end of this article for
other data structures protected by a MUTEX). The MUTEX has a single
owner (i.e., Owner Count=0) who has write access to the structure (i.e.,
bit 16=1).
NOTE:
If the "Event flag wait mask" for the process is the same as the
"JIB address", see another database article titled:
[OpenVMS] Discussion Of Unusual MUTEX Wait State
6. To see approximately how many seconds the process has been in the
wait state, issue the following SDA command. The value may not be
exact, because other areas of the operating system also affect
PCB$L_WAITIME.
SDA> EVAL (@EXE$GL_ABSTIM_TICS-@(PCB+PCB$L_WAITIME))/64
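The arithmetic behind this command can be sketched in Python. The sketch
rests on two assumptions that this article does not state: that SDA
reads the literal 64 in hexadecimal (decimal 100), and that both
counters are 32-bit values counted in 10-millisecond ticks. The function
name and the sample values are hypothetical.

```python
TICKS_PER_SECOND = 0x64   # assumption: the "64" in the EVAL command is hexadecimal (100)


def wait_seconds(abstim_tics, waitime):
    """Approximate seconds in the wait state from two 32-bit tick counters."""
    elapsed_ticks = (abstim_tics - waitime) & 0xFFFFFFFF   # tolerate counter wraparound
    return elapsed_ticks // TICKS_PER_SECOND


# Hypothetical values: the wait began 6000 ticks before the current time.
print(wait_seconds(0x00100000 + 6000, 0x00100000))   # 60
```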
Determining why the process is blocked from gaining access to the
mutex requires that you determine which process owns the mutex.
Determining the owner is difficult because the mutex has no field
that identifies its owner.
When a process gains access to a mutex, its priority is raised to 16
to shorten the time it holds the resource. The "Mutex count" field
for the process is also incremented. Use the SDA command SHOW SUMMARY
to determine which processes are at priority 16; those processes are
possibly blocking access to the mutex (ignore the SWAPPER process,
which is always at priority 16).
Narrow this list further by issuing the SDA command SHOW PROCESS for
each suspected process and checking whether its "Mutex count" field
is non-zero.
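When the summary is long, the priority 16 check described above can be
applied to saved output with a short script. The Python sketch below
assumes the six-column layout shown in step 3 of this example (the
display on a real system may contain additional columns), and its
function names are hypothetical.

```python
def parse_summary_line(line):
    """Split one row of the six-column summary shown in this article."""
    fields = line.split()
    if len(fields) < 6 or not fields[-1].isdigit():
        return None                       # heading, separator, or blank line
    return {
        "pid": fields[0],
        "index": fields[1],
        "name": " ".join(fields[2:-3]),   # process names may contain spaces
        "username": fields[-3],
        "state": fields[-2],
        "pri": int(fields[-1]),
    }


def priority_16_suspects(lines):
    """Return rows at priority 16, ignoring the SWAPPER process."""
    rows = (parse_summary_line(line) for line in lines)
    return [r for r in rows if r and r["pri"] == 16 and r["name"] != "SWAPPER"]


SUMMARY = """\
20E00401 0001 SWAPPER SYSTEM HIB 16
20E08647 0247 BOO_BOO HUNTER MUTEX 4
20E00E48 0248 Harv HOGGIE LEF 16
20E04E4A 024A Dave STUCKIE LEF 4
"""

for row in priority_16_suspects(SUMMARY.splitlines()):
    print(row["index"], row["name"])      # candidates for SHOW PROCESS/INDEX=
```

Each process listed is only a candidate; its "Mutex count" field must
still be checked with SHOW PROCESS.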
EXAMPLE #2:
Examining the Suspect Process
-----------------------------
(For this example we'll use the displays from the first three commands
in the previous example.)
Notice in Step 3 of EXAMPLE #1 that process "Harv" has a priority of
16 (the SWAPPER process is ignored as its priority is always 16).
1. Look at process Harv in detail and check the "Mutex count":
SDA> SHOW PROCESS/INDEX=248
Process index: 0248 Name: Harv Extended PID: 20E00E48
---------------------------------------------------------
Status : 02040001 res,phdres,inter
Status2: 00040001 quantum_resched
PCB address 840D2E00 JIB address 83ECC880
PHD address A7F54600 Swapfile disk address 00000000
Master internal PID 00030248 Subprocess count 0
Internal PID 00030248 Creator internal PID 00000000
Extended PID 20E00E48 Creator extended PID 00000000
State LEF Termination mailbox 0000
Current priority 16 AST's enabled KESU
Base priority 4 AST's active NONE
UIC [00060,000044] AST's remaining 197
Mutex count 1 Buff I/O cnt/limt 99/100
Waiting EF cluster 0 Direct I/O cnt/limt 100/100
Starting wait time 1B001B1B BIO byte cnt/limt 65088/65344
Event flag wait mask DFFFFFFF # open files allowed left 99
The non-zero "Mutex count" indicates that Harv owns a MUTEX, so this
process is the most likely suspect to be blocking the process hung in
MUTEX from gaining access to the data structure.
From this point you need to determine why this process is not releasing
the mutex. However, this determination is beyond the scope of this
article. The process is probably hung. To continue troubleshooting
this problem, see another article titled:
[OpenVMS] How To Troubleshoot a Hung Process
EXAMPLE #3:
Investigating Multiple MUTEX Processes
--------------------------------------
Typically, when a mutex problem occurs, it affects more than a
single process. There may also be more than one mutex that processes
are waiting on. A single process may be blocking all of the processes
hung in the MUTEX state (i.e., its "Mutex count" field is greater than
1), or multiple hung processes may own a single mutex.
If multiple processes are hung in MUTEX and/or multiple processes have
a priority of 16 or higher, use SDA to produce a text file that can be
searched, instead of issuing individual SDA commands for each process.
Use the following commands in SDA to produce a text file for your
search:
SDA> SET OUTPUT <filename>
SDA> SHOW SUMMARY
SDA> SHOW PROCESS ALL
SDA> SET OUTPUT TT:
You may now search the text file for the "Event flag wait mask" of
all processes hung in MUTEX, and/or for those processes with a
non-zero "Mutex count" field.
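The search described above can also be scripted. The following Python
sketch assumes that the saved file contains SHOW PROCESS displays with
the same field labels shown in EXAMPLE #1 and EXAMPLE #2; the function
names are hypothetical and the sample text is abbreviated from those
examples.

```python
import re
from collections import defaultdict

# Patterns for the field labels shown in the SHOW PROCESS displays above.
HEADER = re.compile(r"Process index:\s*([0-9A-Fa-f]+)\s+Name:\s*(.*?)\s+Extended PID:")
STATE = re.compile(r"^\s*State\s+(\S+)", re.MULTILINE)
MUTEX_COUNT = re.compile(r"Mutex count\s+(\d+)")
WAIT_MASK = re.compile(r"Event flag wait mask\s+([0-9A-Fa-f]+)")


def parse_processes(text):
    """Return one dictionary per process found in the saved SDA output."""
    headers = list(HEADER.finditer(text))
    processes = []
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        state = STATE.search(body)
        count = MUTEX_COUNT.search(body)
        mask = WAIT_MASK.search(body)
        processes.append({
            "index": header.group(1),
            "name": header.group(2),
            "state": state.group(1) if state else None,
            "mutex_count": int(count.group(1)) if count else 0,
            "wait_mask": mask.group(1).upper() if mask else None,
        })
    return processes


def summarize(processes):
    """Group MUTEX waiters by wait mask and list processes that own a mutex."""
    waiters = defaultdict(list)
    holders = []
    for proc in processes:
        if proc["state"] == "MUTEX" and proc["wait_mask"]:
            waiters[proc["wait_mask"]].append(proc["name"])
        if proc["mutex_count"] > 0:
            holders.append(proc["name"])
    return dict(waiters), holders


SAMPLE = """\
Process index: 0247 Name: BOO_BOO Extended PID: 20E08647
State MUTEX Termination mailbox 0000
Mutex count 0 Buff I/O cnt/limt 100/100
Event flag wait mask 80004360 # open files allowed left 99
Process index: 0248 Name: Harv Extended PID: 20E00E48
State LEF Termination mailbox 0000
Mutex count 1 Buff I/O cnt/limt 99/100
Event flag wait mask DFFFFFFF # open files allowed left 99
"""

waiters, holders = summarize(parse_processes(SAMPLE))
print(waiters)   # {'80004360': ['BOO_BOO']}
print(holders)   # ['Harv']
```

To use the sketch on a real file, read the file written by SET OUTPUT
into a string and pass it to parse_processes in place of SAMPLE. Each
wait mask reported can then be translated with the SDA command EXAMINE,
as in step 5 of EXAMPLE #1.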
List of Data Structures Protected by Mutexes:
For both OpenVMS VAX and Alpha
------------------------------
+--------------------+-------------------------------------------+
| SYMBOL             | MUTEX TYPE                                |
+--------------------+-------------------------------------------+
| EXE$GL_CEBMTX      | Common Event Block List                   |
| EXE$GL_PGDYNMTX    | Paged Dynamic Memory                      |
| EXE$GL_GSDMTX      | Global Section Descriptor List            |
| UCB$L_LP_MUTEX     | Line Printer Control Block                |
| ORB$L_ACL_MUTEX    | Object Rights Block Access Control List   |
| CHANGE_MODE_MUTEX  | System Service Database                   |
| TFF$L_VEC_MUTEX    | Terminal Fallback Database                |
| CIA$GL_MUTEX       | System Intruder List                      |
+--------------------+-------------------------------------------+
For OpenVMS VAX
---------------
+--------------------+-------------------------------------------+
| SYMBOL             | MUTEX TYPE                                |
+--------------------+-------------------------------------------+
| LNM$AL_MUTEX       | Shared Logical Name Data Structures       |
| IOC$GL_MUTEX       | I/O Database                              |
| EXE$GL_SHMGSMTX    | Shared Memory Global Section Descriptor   |
| EXE$GL_SHMMBMTX    | Shared Memory Mailbox Descriptor          |
| EXE$GL_BASIMGMTX   | Loadable Executive Image Data Structures  |
+--------------------+-------------------------------------------+
For OpenVMS Alpha
-----------------
+--------------------+-------------------------------------------+
| SYMBOL             | MUTEX TYPE                                |
+--------------------+-------------------------------------------+
| LNM$AQ_MUTEX       | Shared Logical Name Data Structures       |
| IOC$GQ_MUTEX       | I/O Database                              |
| UCB$L_SO_MUTEX     | Audio Device Unit Control Block           |
| EXE$GQ_BASIMGMTX   | Loadable Executive Image Data Structures  |
+--------------------+-------------------------------------------+
RELATED ARTICLES:
Other articles in the OPSYS database describe specific problems
with processes in the MUTEX wait state. These articles can be found
using the following search strings:
SHADOW_SERVER PROCESS MUTEX
MUTEX HANG PATHWORKS 4.0
SESSION MANAGER HANGS MUTEX CREATING APPLICATION
DISCUSSION UNUSUAL MUTEX STATE
REFERENCES:
"VAX/VMS Internals and Data Structures, Version 5.2", 1991,
(EY-C171E-DP)
"OpenVMS AXP Internals and Data Structures, Version 1.5", 1994,
(EY-Q770E-DP)
"VMS System Dump Analyzer Utility Manual", April 1988, (AA-LA87A-TE),
page(s) SDA-72