[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference vmszoo::vms_posix

Title:POSIX for OpenVMS - Public forum
Notice:Internal users of V2.0 please register in note 909
Moderator:STAR::PARKE
Created:Fri Jan 04 1991
Last Modified:Mon Jun 02 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1370
Total number of notes:5127

1363.0. "Process hangs in MUTEX wait" by EWBV05::KUNIYOSHI () Wed Feb 19 1997 07:53

	Hello,

	Some posix processes hung in "MUTEX" wait.
	Event flag wait mask of PCB is "80AB51C0" which is in nonpaged pool.
	The contents of this MUTEX is "00010000". It means that 
	the state of resource is "write-in-progress" or "write-pending".
	It seems that this MUTEX is defined by POSIX because locking MUTEX
	is called from POSIX$KERNEL.And also it seems that it is used for
	synchronizing the list of queue header after MUTEX longword.

	
	OpenVMS/Alpha V6.1
	posix V2.0 ECO #3

	What is this MUTEX ?
	What happen on this system ?

	Thanks,
	Makito	/CSC Tokyo


 Extended Indx Process name    Username    State   Pri   PCB      PHD    Wkset
-- PID -- ---- --------------- ----------- ------- --- -------- -------- ------
 000001B2 00B2 posix$000001B2  SYOPER       MUTEX    8 80ADD600 81BD4000    408
 000001B3 00B3 posix$000001B3  SYOPER       MUTEX    7 80AFA600 81C04000    126
 000001B4 00B4 posix$000001B4  SYOPER       MUTEX    8 80B02840 81C34000     50
 000001B5 00B5 posix$000001B5  SYOPER       MUTEX    8 80B16C00 81C64000    132
 000001B7 00B7 posix$000001B7  SYOPER       MUTEX    7 80B23EC0 81CC4000     50
 000001B8 00B8 posix$000001B8  SYOPER       MUTEX    8 80B2E340 81CF4000    134
 000001B9 00B9 posix$000001B9  SYOPER       MUTEX    8 80B34A40 81D24000    183
 000001BA 00BA posix$000001BA  SYOPER       MUTEX    7 80B3B000 81D54000    134
 000001BB 00BB posix$000001BB  SYOPER       MUTEX    8 80B40840 81D84000    258
 000001BC 00BC posix$000001BC  SYOPER       MUTEX    7 80B457C0 81DB4000     81
 000001BD 00BD posix$000001BD  SYOPER       MUTEX    7 80B4B080 81DE4000    257
 000001BE 00BE posix$000001BE  SYOPER       MUTEX    8 80B508C0 81E14000    139
 000001BF 00BF posix$000001BF  SYOPER       MUTEX    8 80B60E40 81E44000    269
 000001C0 00C0 posix$000001C0  SYOPER       MUTEX    8 80B6F9C0 81E74000    133
 000001C4 00C4 posix$000001C4  SYOPER       MUTEX    8 80B94F00 81F34000    261
 000001C6 00C6 posix$000001C6  SYOPER       MUTEX    8 80BA04C0 81F94000    139
 000001C7 00C7 posix$000001C7  SYOPER       MUTEX    8 80BB8300 81FC4000    267


    Press RETURN for more.


SDA> SHO PROC
Process index: 0036   Name: posix$00004836   Extended PID: 00004836
-------------------------------------------------------------------
Process status:        000C8001  RES,NOACNT,PHDRES,HIBER
Required capabilities: 0000000C  QUORUM,RUN

PCB address              80E8E340    JIB address              80D00680
PHD address              81754000    Swapfile disk address    00000000
Master internal PID      00480036    Subprocess count                0
Internal PID             00480036    Creator internal PID     00000000
Extended PID             00004836    Creator extended PID     00000000
State                       MUTEX    Termination mailbox          0000
Previous CPU Id          00000001    Current CPU Id           00000001
Previous ASNSEQ  0000000000441496    Previous ASN     0000000000000007
Current priority                8    # of threads     0000000000000000
Initial process priority        4    Delete pending count         0
Base priority                   6    AST's active                 NONE
UIC                [00012,000001]    AST's remaining               511
Mutex count                     0    Buffered I/O count/limit      500/500
Waiting EF cluster              0    Direct I/O count/limit        500/500
Abs time of last event   0662B214    BUFIO byte count/limit     576512/576512
Event flag wait mask     80AB51C0    # open files allowed left     464


    Press RETURN for more.
SDA> EXA 80AB51C0;100
80AB51E0 00001040 02040000 00010000  ........@...�Q�.     80AB51C0
00000000 00000000 00000000 00000000  ................     80AB51D0
80AB51E8 80AB51E8 80AB51E0 80AB51E0  �Q�.�Q�.�Q�.�Q�.     80AB51E0
80AB51F8 80AB51F8 80AB51F0 80AB51F0    Q�.Q�.�Q�.�Q�.     80AB51F0
80AB5208 80AB5208 80AB5200 80AB5200  .R�..R�..R�..R�.     80AB5200
80AB5218 80AB5218 80AB5210 80AB5210  .R�..R�..R�..R�.     80AB5210
80AB5228 80AB5228 80AB5220 80AB5220   R�. R�.(R�.(R�.     80AB5220
	.	
	.
80AB53D8 80AB53D8 80969080 80969080  ........             80AB53D0
	.
	.

SDA> exa 80969080;c0
0004EC7F 0F8E3039 80AB53D0 80AB53D0  S�.S�.90...�..       80969080
00019000 00000200 000000C8 00000001  ....�...........     80969090
000000C0 00000003 00C00000 00000000  ......�.....�...     809690A0
80AF12C0 80C7EA40 00000000 00000000  ........@��.�..      809690B0
809690C8 809690C8 809690C0 809690C0  �...�...�...�...     809690C0
00000026 00000000 0000002D 00000000  ....-.......&...     809690D0
00010A01 000001C4 FFFFFFFF 36305953  SY06....�.......     809690E0
000141B6 0001000A 00010A01 0001000A  ............�A..     809690F0
36D48B78 32EAECCC 000001C4 000001DE  ...�...���2x.�6      80969100
12F81828 32FB3595 2BB8312C 32FB35CC  �5�2,1+.5�2(.�.      80969110
00000020 20205359 53534D56 50584102  .AXPVMSSYS   ...     80969120
0004EC7F 00000000 00000000 00030F8E  .............�..     80969130

SDA> sho call
Call Frame Information
----------------------
	Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 8001FED0		EXE$KERNEL_WAIT_PS_C
        Return address on stack = FFFFFFFF 80020970	SCH_STD$LOCKR_QUAD_C+000C0

Registers saved on stack
------------------------
7FF91D88  FFFFFFFF 80AB51C0  Saved R0     
7FF91D90  00000000 7FFD0ED8  Saved R1     PSX$GA_FS_DATAPTR
7FF91D98  FFFFFFFF 80E8E340  Saved R4     PCB
7FF91DA0  FFFFFFFF 81523A20  Saved R13    SCH_STD$LOCKR_QUAD+00020
7FF91DA8  00000000 7FF91DF0  Saved R29    
SDA> sho call/n
Call Frame Information
----------------------
	Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 80199E20		POSIX$KERNEL_NPRO+35E20
        Return address on stack = FFFFFFFF 84AD5A00	POSIX$KERNEL_PRO+3BA00

Registers saved on stack
------------------------
7FF91E08  FFFFFFFF 80D07780  Saved R4     
7FF91E10  FFFFFFFF 81579248  Saved R13    POSIX$KERNEL_NPRW+1DE48
7FF91E18  00000000 7FF91E20  Saved R29    
Result: 32-bit sign extended integer returned in register R0
Argument List:

SDA> sho call/n
Call Frame Information
----------------------
	Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 84AD59A0		POSIX$KERNEL_PRO+3B9A0
        Return address on stack = FFFFFFFF 84AD23F4	POSIX$KERNEL_PRO+383F4

Registers saved on stack
------------------------
7FF91E30  FFFFFFFF 81562990  Saved R2     POSIX$KERNEL_NPRW+07590
7FF91E38  00000000 7F9B6AA8  Saved R3     
7FF91E40  FFFFFFFF 80D07780  Saved R4     
7FF91E48  00000000 00000130  Saved R15    
7FF91E50  00000000 7FF91E60  Saved R29    
SDA> sho call/n
Call Frame Information
----------------------
	Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 84AD22C0		POSIX$KERNEL_PRO+382C0
        Return address on stack = FFFFFFFF 80195ED4	POSIX$KERNEL_NPRO+31ED4

Registers saved on stack
------------------------
7FF91E78  00000000 7FF76950  Saved R2     
7FF91E80  00000000 7F9B6AA8  Saved R3     
7FF91E88  FFFFFFFF 80E8E340  Saved R4     
7FF91E90  00000000 00000005  Saved R5     
7FF91E98  00000000 00050007  Saved R6     
7FF91EA0  FFFFFFFF 81562990  Saved R7     POSIX$KERNEL_NPRW+07590
7FF91EA8  00000000 00000000  Saved R8     
7FF91EB0  FFFFFFFF 8497B268  Saved R9     EXE$CATCH_ALL
7FF91EB8  00000000 7FFBF800  Saved R10    MMG$IMGHDRBUF
7FF91EC0  00000000 00000000  Saved R11    
7FF91EC8  00000000 7FF91EE0  Saved R29    
SDA> sho call/n
Call Frame Information
----------------------
	Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 80195BE0		POSIX$KERNEL_NPRO+31BE0
        Return address on stack = FFFFFFFF 80055124	PUT_MSG_C+0079C

Registers saved on stack
------------------------
7FF91F70  00000000 00000003  Saved R2     
7FF91F78  FFFFFFFF 81531580  Saved R3     BUG$TABLE+00C38
7FF91F80  FFFFFFFF 80E8E340  Saved R4     
7FF91F88  00000000 00000000  Saved R5     
7FF91F90  00000000 000B7DFC  Saved R6     
7FF91F98  00000000 7FF91FC0  Saved R7     
7FF91FA0  00000000 00011DD0  Saved R13    
7FF91FA8  00000000 7F9B6AA0  Saved R29    
Result: 32-bit sign extended integer returned in register R0
Argument List:




    Press RETURN for more.
Call Frame Information
----------------------




	FFFFFFFF 815235C8	in memory (32-bit signed integer)
	89FFFFFF FF815235	in memory (32-bit signed integer)
	3089FFFF FFFF8152	in memory (64-bit integer)
SDA> sho call/n
Call Frame Information
----------------------
	Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: 00000000 000B7D80		
        Return address on stack = 00000000 00045988	

Registers saved on stack
------------------------
7F9B6AB8  00000000 000118C0  Saved R2     
7F9B6AC0  00000000 00000000  Saved R3     
7F9B6AC8  00000000 00000000  Saved R4     
7F9B6AD0  00000000 0015B754  Saved R5     
7F9B6AD8  00000000 00000025  Saved R6     
7F9B6AE0  00000000 7F9B9508  Saved R7     
7F9B6AE8  00000000 00000000  Saved R8     
7F9B6AF0  FFFFFFFF 8497B268  Saved R9     EXE$CATCH_ALL
7F9B6AF8  00000000 7F9B6B00  Saved R29    
SDA> set nolog
T.RTitleUserPersonal
Name
DateLines
1363.1some IDSMUCTEC::BECKERHartmut B., VMS & Languages, MunichWed Feb 19 1997 13:2473
> SDA> EXA 80AB51C0;100
> 80AB51E0 00001040 02040000 00010000  ........@...�Q�.     80AB51C0
> 00000000 00000000 00000000 00000000  ................     80AB51D0
> 80AB51E8 80AB51E8 80AB51E0 80AB51E0  �Q�.�Q�.�Q�.�Q�.     80AB51E0
> 80AB51F8 80AB51F8 80AB51F0 80AB51F0    Q�.Q�.�Q�.�Q�.     80AB51F0
> 80AB5208 80AB5208 80AB5200 80AB5200  .R�..R�..R�..R�.     80AB5200
> 80AB5218 80AB5218 80AB5210 80AB5210  .R�..R�..R�..R�.     80AB5210
> 80AB5228 80AB5228 80AB5220 80AB5220   R�. R�.(R�.(R�.     80AB5220
> 	.	
> 	.
> 80AB53D8 80AB53D8 80969080 80969080  ........             80AB53D0
> 	.
> 	.

This looks like the resource hash table. For any resource, for example a 
message queue, there is a name which can be looked up in the hash table.
The first longword is used as mutex. Then there are other data which describe
this table (length, etc.). Then there is a pointer to the first resource
header: 80AB51E0. The other resource headers follow. Each resource header
has a flink and a blink. Hence all the resource headers with identical
entries pointing to itself are unused slots in the hash table. From your output
it seems there is only 1 resource used, header 80969080. But the length of the
table is 0x1040.

> SDA> exa 80969080;c0
> 0004EC7F 0F8E3039 80AB53D0 80AB53D0  S�.S�.90...�..       80969080
> 00019000 00000200 000000C8 00000001  ....�...........     80969090
> 000000C0 00000003 00C00000 00000000  ......�.....�...     809690A0
> 80AF12C0 80C7EA40 00000000 00000000  ........@��.�..      809690B0
> 809690C8 809690C8 809690C0 809690C0  �...�...�...�...     809690C0
> 00000026 00000000 0000002D 00000000  ....-.......&...     809690D0
> 00010A01 000001C4 FFFFFFFF 36305953  SY06....�.......     809690E0
> 000141B6 0001000A 00010A01 0001000A  ............�A..     809690F0
> 36D48B78 32EAECCC 000001C4 000001DE  ...�...���2x.�6      80969100
> 12F81828 32FB3595 2BB8312C 32FB35CC  �5�2,1+.5�2(.�.      80969110
> 00000020 20205359 53534D56 50584102  .AXPVMSSYS   ...     80969120
> 0004EC7F 00000000 00000000 00030F8E  .............�..     80969130

This is the resource. This record again has a flink and a blink, pointing back:
only one resource in this hash slot. It looks like a resource describing a
message queue. Do they use message queues? There are 3 messages in the message
queue, a total of 0xc0 bytes. The messages are in a linked list, header is at
809690B8. Nothing pending, no send, no receive. Pids and UIDs follow, and time
stamps (in posix format). I can't see anything unusual in these structures.

Back to the problem. It seems that these processes are waiting for the hash
table mutex. There should be one process, which currently holds that mutex.
The mutex is aquired for searching/updating the hash table and/or updating
resource structures.

This process may have identified itself as last sender or receiver in this or
another message queue (or another resource structure). At offset 0x80 and 0x84
there are the pids of the last sender and last receiver. In this structure the
last operation should be a send. The sender's id is 000001DE, the send time is
32FB35CC 2BB8312C (posix sec and nanosec format). This is obviously bigger than
the creation time and the last receive time (32EAECCC 36D48B78 and 32FB3595
12F81828). The last send was done 0xcc-0x95 seconds after the last receive.

If this is the only resource, then 1de was the last one successfully sending a
message to this queue. If it successfully did that it should have released the
mutex. What happened to this process, is it still around?

On the other hand, if the culprit locked the mutex and failed to identify
itself in this structure, then you have to look at all other posix processes,
not waiting for this mutex. Are there many around? What's their status?

And going some more steps back. Did it work before? Is there anything new onthe
system, new version of VMS, the application? How often does this happen, is
there anything unusual in the application/system going on at the same time?

I'm not sure if this really helps, but I hope it's a starter,
Hartmut
    
1363.2EWBV05::KUNIYOSHIThu Feb 20 1997 05:47116
	Thank you for quick response.

	The problem occurred just one time.The customer has been using
	this system since last October.The application uses the message queue.

	I have a forced crash dump and analizing it but I don't have a 
	knowledge of posix internal, so it is difficult to analize it.

	According to previous reply, owner of MUTEX is 0001DE. It can be 
	confirmed because the process's prioprity is 16.
	(VMS raises the priority of process who holding MUTEX to 16.)

	It seems that this process is looping in POSIX$KERNEL without 
	releaseing MUTEX.So other posix processes hang.And also it seems 
	that R3 points to the resource.This resource has C7 messages.
	But,I don't understand the relation between these data structure and
	process loop.

	Any comments are appreciated.

	Regards,
	Makito Kuniyoshi/CSC Tokyo


Process index: 00DE   Name: posix$000001DE   Extended PID: 000001DE
-------------------------------------------------------------------
Process status:        000C8001  RES,NOACNT,PHDRES,HIBER
Required capabilities: 0000000C  QUORUM,RUN

PCB address              80A98A80    JIB address              80CBE480
PHD address              823E4000    Swapfile disk address    00000000
Master internal PID      000100DE    Subprocess count                0
Internal PID             000100DE    Creator internal PID     00000000
Extended PID             000001DE    Creator extended PID     00000000
State                       CUR      Termination mailbox          0000
Previous CPU Id          00000001    Current CPU Id           00000000
Previous ASNSEQ  00000000004414C2    Previous ASN     000000000000003D
Current priority               16    # of threads     0000000000000001
Initial process priority        4    Delete pending count         0
Base priority                  16    AST's active                 NONE
UIC                [00012,000001]    AST's remaining               511
Mutex count                     1    Buffered I/O count/limit      500/500
Waiting EF cluster              0    Direct I/O count/limit        500/500
Abs time of last event   0662AE25    BUFIO byte count/limit     595328/595328
Event flag wait mask     F7FFFFFF    # open files allowed left     465

Process index: 00DE   Name: posix$000001DE   Extended PID: 000001DE
-------------------------------------------------------------------
Active registers for current process at bugcheck time
-----------------------------------------------------
R0   = FFFFFFFF 80B69908  R1   = 00000000 0011BA48  R2   = FFFFFFFF 81565CD8
R3   = FFFFFFFF 80B698C0  R4   = FFFFFFFF 80B698F8  R5   = FFFFFFFF 80AB51C0
R6   = FFFFFFFF 8155C090  R7   = 00010031 00000000  R8   = 00000000 00000000
R9   = 00050031 09295064  R10  = 00018C28 00000000  R11  = 80B1C300 7F9C0EF8
R12  = 00000000 7F9C3A88  R13  = FFFFFFFF 81579248  R14  = 00000000 00000000
R15  = 00000000 00000130  R16  = 00000000 0000064C  R17  = 00000000 32EAED0F
R18  = FFFFFFFF 80A72340  R19  = 00000000 00000006  R20  = 00000000 00000020
R21  = 80B98E00 80AF3780  R22  = 80B98E00 80AF3780  R23  = 00000000 00050031
R24  = 00000000 00000000  R25  = FFFFFFFF 80B69908  R26  = FFFFFFFF 815352FC
R27  = FFFFFFFF 815352C0  R28  = FFFFFFFF 84AC318C  R29  = 00000000 7FF91DB0
PC   = FFFFFFFF 8006EF34  PS   = 38000000 00001F00  KSP  = FFFFFFFF 848F1FF8
ESP  = 00000000 7FF96000  SSP  = 00000000 7FF9E000  USP  = 00000000 7F9C0EE0
PTBR = 00000000 000000F4  AST{SR/EN}    = 0000002E  ASN  = 00000000 0000002F

SDA> exa @r3;c0
000534C9 0F8E3039 80AB56A0 80AB56A0  .V�.V�90..�..        80B698C0
00019000 00000200 000000C8 00000001  ....�..........      80B698D0
00018C28 000000C7 00C00000 00000000  ......��..(...       80B698E0
80C1A480 809357C0 00000000 0000000A  ........��...�.      80B698F0
80AA8F40 80AF01C0 80B69900 80B69900  ..�..��@.�           80B69900
00000000 00000000 00000000 00000000  ................     80B69910
00010A01 000001DE FFFFFFFF 32355953  SY52..........       80B69920
000141B6 0001000A 00010A01 0001000A  ............��..     80B69930
06419000 32EAED0F 000001DE 00005A84  .Z.....��2..A.       80B69940
2BF3CAFC 32FB35CC 12746744 32FB357F  .5��Dgt.̵�����     80B69950
00000020 20205359 53534D56 50584102  .AXPVMSSYS   ...     80B69960
000534C9 00000000 00000000 00030F8E  ............�..      80B69970
	


Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 84AE5A90              POSIX$KERNEL_PRO+4BA90
        Return address on stack = FFFFFFFF 84AE65BC     POSIX$KERNEL_PRO+4C5BC

Registers saved on stack
------------------------
7FF91DC8  FFFFFFFF 81565CF8  Saved R2     POSIX$KERNEL_NPRW+0A8F8
7FF91DD0  00000000 7FF76964  Saved R3
7FF91DD8  FFFFFFFF 80B1D3C0  Saved R4
7FF91DE0  FFFFFFFF 80AB51C0  Saved R5
7FF91DE8  00000000 7FF91DF0  Saved R29
SDA>

Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF 84AE5DE0              POSIX$KERNEL_PRO+4BDE0
        Return address on stack = FFFFFFFF 84AD2894     POSIX$KERNEL_PRO+38894

Registers saved on stack
------------------------
7FF91E10  FFFFFFFF 81562990  Saved R2     POSIX$KERNEL_NPRW+07590
7FF91E18  FFFFFFFF 8155BFD0  Saved R3     POSIX$KERNEL_NPRW+00BD0
7FF91E20  7FF76964 00000000  Saved R4
7FF91E28  FFFFFFFF 80B1D3C0  Saved R5
7FF91E30  00000000 7FF76970  Saved R6
7FF91E38  00000000 00000000  Saved R7
7FF91E40  00000000 00050031  Saved R8
7FF91E48  FFFFFFFF 8155C090  Saved R9     POSIX$KERNEL_NPRW+00C90
7FF91E50  7FF76964 00000000  Saved R10
7FF91E58  00000000 7FF91E60  Saved R29
1363.3fifo? locks?MUCTEC::BECKERHartmut B., VMS & Languages, MunichFri Feb 21 1997 18:1712
I'm not sure if I found the pc's and the calls in the maps and listings.
Are you sure about V2.0 and ECO 3? It looks like the last call was from a
fifo routine. Do they use fifo's? The called routine should (build a
resource name and) aquire a VMS lock, but I want to double check that. Does
it make sense with what you can see in the dump?

Anyway, I'm not with posix engineering, if you want this problem solved you
should QAR, SPR or IPMT this case (whatever you and your customer think is
apropriate). It may be a simple resource problem on the customer's site it
may also be a problem within posix.

Hartmut