Conference vmsdev::vmstuning

Title: VMSTUNING
Notice: Welcome to VMSTUNING
Moderator: EVMS::HALLYB
Created: Sat Feb 15 1986
Last Modified: Wed May 14 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1539
Total number of notes: 7984

1538.0. "MUTEX Wait State - How to avoid it ?" by TLAV01::IAN () Wed May 07 1997 06:29

Hi,

My customer performed a "tuning exercise" one night a few weeks back. 
The next day his 750 users across 3 x 8400's in a VMS cluster started 
going into MUTEX Wait States until his Cluster hung. He crashed his
cluster, reversed out of his "tuning exercise" and rebooted.

He's now a little nervous about performing "tuning exercises". He asked
me to tell him more about MUTEX wait states. I've provided various 
documents and searched some notes conferences, but he is still not 
satisfied with my answers.

Could someone please have a shot at answering these two questions:

1. What "brings-on" VMS User processes to go into an excessive MUTEX
   Wait State ?

2. If the above is due to VMS parameters being "too tight", then please 
   advise which parameters, if "squeezed", are most likely to do this ?

Any chance of a pointer to a "Rules of Thumb" document?

Thanks,
Ian.

Note1: From my looking at his previous MODPARAMS.DAT, I suspect he might
       have made PQL_DWSEXTENT too small.

Note2: I've seen note 1074 (Oct-91 for VAX VMS V5.4-2). However, .0 there
       is intentionally trying to get into a MUTEX wait state. My customer
       is trying never to get his computers into a MUTEX wait state
       again. (Also, he's on an Alpha at VMS V6.2-1H3.)

1538.1. "MUTEX => (usually) Insufficient process BYTLM" by GIDDAY::GILLINGS (a crucible of informative mistakes) Thu May 08 1997 00:04

  Ian,
    "Genuine" MUTEX waits are very rare. They are easily identified from
  SDA - the process is at priority 16/16. Most observed MUTEX wait states
  are due to resource depletion. From SDA you will see the "Event flag wait
  mask" set to the "JIB address". You can then examine JOB$B_FLAGS to determine
  which resource you've run out of. A value of 01 indicates BYTLM, 02 indicates
  TQELM. In theory, you may see 03; this would seem to imply that 2 processes 
  in the same job tree are in MUTEX state, one waiting for BYTLM, the other
  for TQELM.
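
A minimal sketch of that decoding, assuming only the bit meanings given
above (01 = BYTLM, 02 = TQELM). The function and table names are
hypothetical, chosen for illustration; they are not VMS symbols.

```python
# Bit meanings as stated in this note; the dictionary name is illustrative only.
RESOURCE_BITS = {0x01: "BYTLM", 0x02: "TQELM"}


def waiting_resources(flags):
    """Return the job quotas that the flags byte says someone is waiting for."""
    return [name for bit, name in sorted(RESOURCE_BITS.items()) if flags & bit]


for value in (0x01, 0x02, 0x03):
    print(f"{value:02X}: {waiting_resources(value)}")
```

A value of 03 simply has both bits set, which matches the reading that two
waits are outstanding in the same job tree.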

    Perhaps the customer had reduced PQL_MBYTLM in this "tuning"?
					
						John Gillings, Sydney CSC
1538.2. ".1 - Thanks" by TLAV02::IAN () Sun May 11 1997 01:38

    Hi John,
    
    Thanks for the .1 info, it should help to put the pressure back onto
    the customer and make him a little more careful.
    
    Ian.
    
1538.3. "wait mask not = jib" by CSC32::BUCKLEY (ski fast,take chances,die young) Mon May 12 1997 09:37

For a process in MUTEX, the wait mask is not always equal to the JIB address. From STARS:


[OPENVMS] How to Troubleshoot a Process in MUTEX State

     Any party granted access to the following copyrighted information
     (protected under Federal Copyright Laws), pursuant to a duly executed
     Digital Service Agreement may, under the terms of such agreement copy
     all or selected portions of this information for internal use and
     distribution only. No other copying or distribution for any other
     purpose is authorized.
Copyright (c) Digital Equipment Corporation, 1994, 1995. All rights reserved.

PRODUCT:    OpenVMS Alpha, All Versions                                         
            OpenVMS VAX, All Versions

COMPONENT:  Scheduler

SOURCE:     Digital Equipment Corporation


OVERVIEW:

This is a general troubleshooting article for processes hung in the
MUTEX wait state.  See the RELATED ARTICLE section for specific
troubleshooting steps on more unique issues relating to the MUTEX
wait state.


QUESTION:

The DCL command SHOW SYSTEM shows one or more processes hung in the
MUTEX wait state.  How do you determine what the processes are waiting
for, and why they are waiting?

   $ SHOW SYSTEM

    VAX/VMS V6.1  on node COORS  10-AUG-1994....
     Pid    Process Name    State  Pri
   20E00401 SWAPPER         HIB     16
   20E03402 wahkaw::Write   LEF      5
   20E02C03 DECW$TE_2C03    LEF      6
   20E00C05 SOFTBALL MANIAC LEF      7
   20E00406 CONFIGURE       HIB     10
   20E07008 J_HASSENPFEFF   LEF      5
   20E00E46 Marty           LEF      4
   20E08647 BOO_BOO         MUTEX    4 <--- Process hung in MUTEX
   20E00E48 Harv            LEF     16
   20E04E4A Dave            LEF      4


ANSWER:

The operating system uses MUTEXes (Mutual Exclusion Semaphores) as a
synchronization technique for shared data structures that do not
require the process to be operating at elevated IPL (Interrupt
Priority Level).

A MUTEX is a data structure consisting of a longword for OpenVMS VAX
systems, and longwords or quadwords for OpenVMS Alpha systems.

   Longword Format:

    31              16              0
     +-------------+-+--------------+
     |   Status    | |  Owner Count |
     +-------------+-+--------------+
                    ^
                    |
                    +-------------------------------------------+
                                                                |
       Bit 0 or 16 = Write-Pending or Write-in-Progress flag ---+
                                                                |
   Quadword Format:                                             |
                                                                |
    31                             0                            |
     +----------------------------+-+                           |
     |         Status             | | <-------------------------+
     +----------------------------+-+
     |       Owner Count            |
     +------------------------------+

   NOTE:

     The "Status" field of a MUTEX is undefined and reserved to DIGITAL.

     The "Owner Count" field is initialized to negative 1, i.e., all "F"s,
     so that a value of 0 indicates that there is 1 owner.

A process is placed in the MUTEX state when it is unable to gain read or 
write access to a specified MUTEX.  The inability to gain access will be 
due to the write-pending or write-in-progress flag being equal to 1.
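
The two layouts above can be decoded with a few lines of arithmetic. The
following is a sketch only, assuming the field positions shown in the
diagrams (16-bit owner count and bit 16 flag for the longword form; flag in
bit 0 of the status longword and a 32-bit owner count for the quadword
form). The function names are hypothetical.

```python
def decode_longword_mutex(value):
    """Split a 32-bit mutex longword into owner count and write flag."""
    raw_count = value & 0xFFFF  # low 16 bits hold the owner count
    # The count is treated as a signed 16-bit field that starts at -1 (FFFF).
    count = raw_count - 0x10000 if raw_count & 0x8000 else raw_count
    return {
        "owners": count + 1,  # a count of 0 means one owner
        "write_flag": bool(value & (1 << 16)),
    }


def decode_quadword_mutex(status, owner_count):
    """Same idea for the two-longword layout (flag is bit 0 of status)."""
    if owner_count & 0x80000000:  # interpret as signed 32-bit
        owner_count -= 0x100000000
    return {"owners": owner_count + 1, "write_flag": bool(status & 1)}


print(decode_longword_mutex(0x00010000))  # one owner, write flag set
print(decode_longword_mutex(0x0000FFFF))  # no owners, flag clear
```

The first value, 00010000, is the one that appears in EXAMPLE #1.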

To determine what MUTEX a process is waiting for, examine the value for the 
"Event flag wait mask" field from the SDA command SHOW PROCESS.


EXAMPLE #1:  
                    Finding the Mutex
                    -----------------

The following information is a simple approach for troubleshooting a single 
process in MUTEX, with a single process blocking the acquisition of the 
mutex. For troubleshooting techniques involving multiple processes, see 
EXAMPLE #3.

1.  Invoke the System Dump Analyzer Utility (SDA) to examine the running
    system:

    $ ANALYZE/SYSTEM

2.  Read in the system definitions for SDA so that any MUTEX address
    can be interpreted.
    
    For OpenVMS Alpha
    -----------------

         SDA> READ SYS$LOADABLE_IMAGES:SYSDEF

    For OpenVMS VAX 
    ---------------

         SDA> READ SYS$SYSTEM:SYSDEF

3.  View the processes on the system, noting those processes in the 
    MUTEX state.

    SDA> SHOW SUMMARY

    Current process summary
    Extended Indx Process name    Username    State   Pri
    -- PID -- ---- --------------- ----------- ------- ---
    20E00401 0001 SWAPPER         SYSTEM       HIB     16
    20E03402 0002 Write_Crmp      KING         LEF      5
    20E02C03 0003 DECW$TE_2C03    SYSTEM       LEF      6
    20E00C05 0005 BASEBALL        ROCKIE       LEF      7
    20E00406 0006 CONFIGURE       SYSTEM       HIB     10
    20E07008 0008 HASSENDOODOO    BAYWTCH      LEF      5
    20E00E46 0246 Marty           MARTY        LEF      4
    20E08647 0247 BOO_BOO --+     HUNTER       MUTEX    4
    20E00E48 0248 Harv      |     HOGGIE       LEF     16
    20E04E4A 024A Dave      |     STUCKIE      LEF      4
                            +--------+
                                     |
4.  View the process hung in MUTEX.  |
                                     |
   SDA> SHOW PROCESS/INDEX=247 <-----+
  Process index: 0247   Name: BOO_BOO   Extended PID: 20E08647
  ------------------------------------------------------------
  Status : 02040001 res,phdres,inter
  Status2: 00000001 quantum_resched
  PCB address              840BE140    JIB address           83D58DC0
  PHD address              9CD08E00    Swapfile disk address 00000000
  Master internal PID      00210247    Subprocess count             0
  Internal PID             00210247    Creator internal PID  00000000
  Extended PID             20E08647    Creator extended PID  00000000
  State                       MUTEX    Termination mailbox       0000
  Current priority                7    AST's enabled             KESU
  Base priority                   4    AST's active              NONE
  UIC                [00022,000050]    AST's remaining            197
  Mutex count                     0    Buff I/O cnt/limt      100/100
  Waiting EF cluster              1    Direct I/O cnt/limt    100/100
  Starting wait time       1B001B1B    BIO byte cnt/limt  65344/65344
  Event flag wait mask     80004360    # open files allowed left   99
                              |
                              +----------+
5.  Translate the Event flag wait mask:  |
                                         |
  SDA> EXAMINE 80004360 <----------------+
  LNM$AL_MUTEX:  00010000
                    ^
                bit 16, "write" flag

  The process is waiting on the "Shared Logical Names Data Structure"
  MUTEX, LNM$AL_MUTEX, (see the list at the end of this article for
  other data structures protected by a MUTEX).  The MUTEX has a single 
  owner, i.e., Owner Count=0, who has write access to the structure, i.e.,
  bit 16=1.

  NOTE:
    If the "Event flag wait mask" for the process is the same as the
    "JIB address", see another database article titled:

        [OpenVMS] Discussion Of Unusual MUTEX Wait State

6.  To see approximately how many seconds the process has been in the
    wait state, issue the following SDA command. The value you see may
    not be 100% correct due to other areas of the operating system that
    affect PCB$L_WAITIME.                                         

    SDA> EVAL (@EXE$GL_ABSTIM_TICS-@(PCB+PCB$L_WAITIME))/64
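
    The arithmetic behind that command can be sketched as follows. SDA
    reads numbers as hexadecimal unless told otherwise, so the divisor 64
    is decimal 100. That fits a tick length of 10 milliseconds, which is an
    assumption here and not something this article states. The function
    name and the sample tick counter value are hypothetical.

```python
def wait_seconds(abstim_tics, waitime, ticks_per_second=0x64):
    """Approximate seconds in the wait state from two 32-bit tick counters."""
    elapsed = (abstim_tics - waitime) & 0xFFFFFFFF  # tolerate 32-bit wraparound
    return elapsed // ticks_per_second


# 1B001B1B is the "Starting wait time" from the display above;
# 1B0022EB is an invented current tick count used only for illustration.
print(wait_seconds(0x1B0022EB, 0x1B001B1B))
```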


Determining why the process is blocked from gaining access to the
mutex requires that you determine which process owns the mutex.
Determining the owner is difficult because there is no owner field
defining this information.

When a process gains access to a mutex, its priority is raised to 16
to decrease the amount of time it has the resource.  The "Mutex
count" field for the process will also be incremented.  Use the SDA
command "SHOW SUMMARY" to determine which processes are at priority 16;
those processes are possibly blocking access to the mutex (ignore the
SWAPPER process, which is always at priority 16).

Isolate this list further by using the "Show Process" command, in SDA,
for those suspected processes and checking to see if their "Mutex count"
field is non-zero.
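
When the summary is long, the first filter (priority 16 or higher, SWAPPER
ignored) can be applied to saved output with a short script. This is a
sketch that assumes the abridged column layout shown in this article (PID,
index, process name, username, state, priority); real SHOW SUMMARY output
may carry more columns, and the function name is hypothetical.

```python
import re

# PID, index, process name (may contain spaces), username, state, priority.
ROW = re.compile(
    r"^\s*([0-9A-F]{8})\s+([0-9A-F]{4})\s+(.+?)\s+(\S+)\s+(\S+)\s+(\d+)\s*$"
)


def priority_suspects(summary_text, threshold=16):
    """List (index, name, priority) for rows at or above the threshold."""
    suspects = []
    for line in summary_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # heading, ruler, or blank line
        _pid, index, name, _user, _state, pri = match.groups()
        if int(pri) >= threshold and name != "SWAPPER":
            suspects.append((index, name, int(pri)))
    return suspects


SAMPLE = """\
20E00401 0001 SWAPPER         SYSTEM       HIB     16
20E08647 0247 BOO_BOO         HUNTER       MUTEX    4
20E00E48 0248 Harv            HOGGIE       LEF     16
"""
print(priority_suspects(SAMPLE))
```

Each index it returns is a candidate for SHOW PROCESS/INDEX, where the
"Mutex count" field confirms or rules out ownership.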


EXAMPLE #2:  
                     Examining the Suspect Process
                     -----------------------------

  (For this example we'll use the displays from the first three commands
   in the previous example.)

  Notice in Step 3 of EXAMPLE #1 that process "Harv" has a priority of
  16 (the SWAPPER process is ignored as its priority is always 16).

1.  Look at process Harv in detail, check the "Mutex count":

  SDA> SHOW PROCESS/INDEX=248

  Process index: 0248   Name: Harv   Extended PID: 20E00E48
  ---------------------------------------------------------
  Status : 02040001 res,phdres,inter
  Status2: 00040001 quantum_resched
  PCB address              840D2E00    JIB address           83ECC880
  PHD address              A7F54600    Swapfile disk address 00000000
  Master internal PID      00030248    Subprocess count             0
  Internal PID             00030248    Creator internal PID  00000000
  Extended PID             20E00E48    Creator extended PID  00000000
  State                       LEF      Termination mailbox       0000
  Current priority               16    AST's enabled             KESU
  Base priority                   4    AST's active              NONE
  UIC                [00060,000044]    AST's remaining            197
  Mutex count                     1    Buff I/O cnt/limt       99/100
  Waiting EF cluster              0    Direct I/O cnt/limt    100/100
  Starting wait time       1B001B1B    BIO byte cnt/limt  65088/65344
  Event flag wait mask     DFFFFFFF    # open files allowed left   99

  The non-zero "Mutex count" indicates that Harv owns a MUTEX, so this
  process is the most likely suspect to be blocking the process hung in
  MUTEX from gaining access to the data structure.

From this point you need to determine why this process is not releasing
the mutex. However, that determination is beyond the scope of this
article. The process is probably hung.  To continue troubleshooting
this problem see another article titled:

    [OpenVMS] How To Troubleshoot a Hung Process


EXAMPLE #3:  
              Investigating Multiple MUTEX Processes
              --------------------------------------

Typically, when a mutex problem occurs, it will affect more than a
single process. There may also be more than one mutex that processes
are waiting on. A single process may be blocking those processes hung
in the MUTEX state, i.e., its "Mutex count" field is greater than 1, or
multiple processes may be hung and own a single mutex.

If multiple processes are hung in MUTEX and/or multiple processes have
a priority of 16 or higher, use SDA to produce a text file that can be
searched, as opposed to using single SDA commands for each process.

Use the following commands in SDA to produce a text file for your
search:

  SDA> SET OUTPUT <filename>
  SDA> SHOW SUMMARY
  SDA> SHOW PROCESS ALL
  SDA> SET OUTPUT TT:

You may now search the text file for the "Event flag wait mask" of
all processes hung in MUTEX, and/or for those processes with a
non-zero "Mutex count" field.
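
That search can also be scripted. The sketch below assumes the text file
uses the field labels shown in the SHOW PROCESS displays in this article
("Process index:", "State", "Mutex count", "Event flag wait mask", "JIB
address"). The function name and the abridged sample text are
illustrative only.

```python
import re

HEADER = re.compile(r"Process index:\s*[0-9A-F]+\s+Name:\s*(.+?)\s+Extended PID:")


def scan_processes(text):
    """Return (MUTEX waiters, processes with a non-zero Mutex count)."""
    waiters, owners = [], []
    for chunk in re.split(r"(?=Process index:)", text):
        head = HEADER.search(chunk)
        if not head:
            continue
        name = head.group(1)
        state = re.search(r"^\s*State\s+(\S+)", chunk, re.M)
        count = re.search(r"Mutex count\s+(\d+)", chunk)
        mask = re.search(r"Event flag wait mask\s+([0-9A-F]{8})", chunk)
        jib = re.search(r"JIB address\s+([0-9A-F]{8})", chunk)
        if state and state.group(1) == "MUTEX" and mask:
            # A wait mask equal to the JIB address is the job quota case.
            same = jib is not None and jib.group(1) == mask.group(1)
            waiters.append((name, mask.group(1), "job quota" if same else "mutex"))
        if count and int(count.group(1)) > 0:
            owners.append((name, int(count.group(1))))
    return waiters, owners


SAMPLE = """\
Process index: 0247   Name: BOO_BOO   Extended PID: 20E08647
PCB address              840BE140    JIB address           83D58DC0
State                       MUTEX    Termination mailbox       0000
Mutex count                     0    Buff I/O cnt/limt      100/100
Event flag wait mask     80004360    # open files allowed left   99
Process index: 0248   Name: Harv   Extended PID: 20E00E48
PCB address              840D2E00    JIB address           83ECC880
State                       LEF      Termination mailbox       0000
Mutex count                     1    Buff I/O cnt/limt       99/100
Event flag wait mask     DFFFFFFF    # open files allowed left   99
"""
print(scan_processes(SAMPLE))
```

Each wait mask reported for a "mutex" waiter can then be passed to the SDA
EXAMINE command, as in step 5 of EXAMPLE #1.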


List of Data Structures Protected by Mutexes:

For both OpenVMS VAX and Alpha
------------------------------

  +--------------------+-------------------------------------------+
  | SYMBOL             |        MUTEX TYPE                         |
  +--------------------+-------------------------------------------+
  | EXE$GL_CEBMTX      |  Common Event Block List                  |
  | EXE$GL_PGDYNMTX    |  Paged Dynamic Memory                     |
  | EXE$GL_GSDMTX      |  Global Section Descriptor List           |
  | UCB$L_LP_MUTEX     |  Line Printer Control Block               |
  | ORB$L_ACL_MUTEX    |  Object Rights Block Access Control List  |
  | CHANGE_MODE_MUTEX  |  System Service Database                  |
  | TFF$L_VEC_MUTEX    |  Terminal Fallback Database               |
  | CIA$GL_MUTEX       |  System Intruder List                     |
  +--------------------+-------------------------------------------+

For OpenVMS VAX
---------------

  +--------------------+-------------------------------------------+
  | SYMBOL             |        MUTEX TYPE                         |
  +--------------------+-------------------------------------------+
  | LNM$AL_MUTEX       |  Shared Logical Name Data Structures      |
  | IOC$GL_MUTEX       |  IO Database                              |
  | EXE$GL_SHMGSMTX    |  Shared Memory Global Section Descriptor  |
  | EXE$GL_SHMMBMTX    |  Shared Memory Mailbox Descriptor         |
  | EXE$GL_BASIMGMTX   |  Loadable Executive Image Data Structures |
  +--------------------+-------------------------------------------+
                                                                                
For OpenVMS Alpha
-----------------

  +--------------------+-------------------------------------------+
  | SYMBOL             |        MUTEX TYPE                         |
  +--------------------+-------------------------------------------+
  | LNM$AQ_MUTEX       |  Shared logical name data structures      |
  | IOC$GQ_MUTEX       |  I/O Database                             |
  | UCB$L_SO_MUTEX     |  Audio Device Unit Control Block          |
  | EXE$GQ_BASIMGMTX   |  Loadable Executive Image Data Structures |
  +--------------------+-------------------------------------------+ 
 

RELATED ARTICLES:

Other articles in the OPSYS database describe some specific problems
with processes in the MUTEX wait state.  These articles can be found 
using a search string of:           
                                                                              
        SHADOW_SERVER PROCESS MUTEX
        MUTEX HANG PATHWORKS 4.0
        SESSION MANAGER HANGS MUTEX CREATING APPLICATION
        DISCUSSION UNUSUAL MUTEX STATE



REFERENCES:

"VAX/VMS Internals and Data Structures, Version 5.2", 1991,
 (EY-C171E-DP)
"OpenVMS AXP Internals and Data Structures, Version 1.5", 1994,
 (EY-Q770E-DP)
"VMS System Dump Analyzer Utility Manual", April 1988, (AA-LA87A-TE),
 page(s) SDA-72