| Title: | POSIX for OpenVMS - Public forum |
| Notice: | Internal users of V2.0 please register in note 909 |
| Moderator: | STAR::PARKE |
| Created: | Fri Jan 04 1991 |
| Last Modified: | Mon Jun 02 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1370 |
| Total number of notes: | 5127 |
Hello,
Some posix processes hung in "MUTEX" wait.
Event flag wait mask of PCB is "80AB51C0" which is in nonpaged pool.
The contents of this MUTEX is "00010000". It means that
the state of resource is "write-in-progress" or "write-pending".
It seems that this MUTEX is defined by POSIX because locking MUTEX
is called from POSIX$KERNEL.And also it seems that it is used for
synchronizing the list of queue header after MUTEX longword.
OpenVMS/Alpha V6.1
posix V2.0 ECO #3
What is this MUTEX ?
What happen on this system ?
Thanks,
Makito /CSC Tokyo
Extended Indx Process name Username State Pri PCB PHD Wkset
-- PID -- ---- --------------- ----------- ------- --- -------- -------- ------
000001B2 00B2 posix$000001B2 SYOPER MUTEX 8 80ADD600 81BD4000 408
000001B3 00B3 posix$000001B3 SYOPER MUTEX 7 80AFA600 81C04000 126
000001B4 00B4 posix$000001B4 SYOPER MUTEX 8 80B02840 81C34000 50
000001B5 00B5 posix$000001B5 SYOPER MUTEX 8 80B16C00 81C64000 132
000001B7 00B7 posix$000001B7 SYOPER MUTEX 7 80B23EC0 81CC4000 50
000001B8 00B8 posix$000001B8 SYOPER MUTEX 8 80B2E340 81CF4000 134
000001B9 00B9 posix$000001B9 SYOPER MUTEX 8 80B34A40 81D24000 183
000001BA 00BA posix$000001BA SYOPER MUTEX 7 80B3B000 81D54000 134
000001BB 00BB posix$000001BB SYOPER MUTEX 8 80B40840 81D84000 258
000001BC 00BC posix$000001BC SYOPER MUTEX 7 80B457C0 81DB4000 81
000001BD 00BD posix$000001BD SYOPER MUTEX 7 80B4B080 81DE4000 257
000001BE 00BE posix$000001BE SYOPER MUTEX 8 80B508C0 81E14000 139
000001BF 00BF posix$000001BF SYOPER MUTEX 8 80B60E40 81E44000 269
000001C0 00C0 posix$000001C0 SYOPER MUTEX 8 80B6F9C0 81E74000 133
000001C4 00C4 posix$000001C4 SYOPER MUTEX 8 80B94F00 81F34000 261
000001C6 00C6 posix$000001C6 SYOPER MUTEX 8 80BA04C0 81F94000 139
000001C7 00C7 posix$000001C7 SYOPER MUTEX 8 80BB8300 81FC4000 267
Press RETURN for more.
SDA> SHO PROC
Process index: 0036 Name: posix$00004836 Extended PID: 00004836
-------------------------------------------------------------------
Process status: 000C8001 RES,NOACNT,PHDRES,HIBER
Required capabilities: 0000000C QUORUM,RUN
PCB address 80E8E340 JIB address 80D00680
PHD address 81754000 Swapfile disk address 00000000
Master internal PID 00480036 Subprocess count 0
Internal PID 00480036 Creator internal PID 00000000
Extended PID 00004836 Creator extended PID 00000000
State MUTEX Termination mailbox 0000
Previous CPU Id 00000001 Current CPU Id 00000001
Previous ASNSEQ 0000000000441496 Previous ASN 0000000000000007
Current priority 8 # of threads 0000000000000000
Initial process priority 4 Delete pending count 0
Base priority 6 AST's active NONE
UIC [00012,000001] AST's remaining 511
Mutex count 0 Buffered I/O count/limit 500/500
Waiting EF cluster 0 Direct I/O count/limit 500/500
Abs time of last event 0662B214 BUFIO byte count/limit 576512/576512
Event flag wait mask 80AB51C0 # open files allowed left 464
Press RETURN for more.
SDA> EXA 80AB51C0;100
80AB51E0 00001040 02040000 00010000 ........@...�Q�. 80AB51C0
00000000 00000000 00000000 00000000 ................ 80AB51D0
80AB51E8 80AB51E8 80AB51E0 80AB51E0 �Q�.�Q�.�Q�.�Q�. 80AB51E0
80AB51F8 80AB51F8 80AB51F0 80AB51F0 Q�.Q�.�Q�.�Q�. 80AB51F0
80AB5208 80AB5208 80AB5200 80AB5200 .R�..R�..R�..R�. 80AB5200
80AB5218 80AB5218 80AB5210 80AB5210 .R�..R�..R�..R�. 80AB5210
80AB5228 80AB5228 80AB5220 80AB5220 R�. R�.(R�.(R�. 80AB5220
.
.
80AB53D8 80AB53D8 80969080 80969080 ........ 80AB53D0
.
.
SDA> exa 80969080;c0
0004EC7F 0F8E3039 80AB53D0 80AB53D0 S�.S�.90...�.. 80969080
00019000 00000200 000000C8 00000001 ....�........... 80969090
000000C0 00000003 00C00000 00000000 ......�.....�... 809690A0
80AF12C0 80C7EA40 00000000 00000000 ........@��.�.. 809690B0
809690C8 809690C8 809690C0 809690C0 �...�...�...�... 809690C0
00000026 00000000 0000002D 00000000 ....-.......&... 809690D0
00010A01 000001C4 FFFFFFFF 36305953 SY06....�....... 809690E0
000141B6 0001000A 00010A01 0001000A ............�A.. 809690F0
36D48B78 32EAECCC 000001C4 000001DE ...�...���2x.�6 80969100
12F81828 32FB3595 2BB8312C 32FB35CC �5�2,1+.5�2(.�. 80969110
00000020 20205359 53534D56 50584102 .AXPVMSSYS ... 80969120
0004EC7F 00000000 00000000 00030F8E .............�.. 80969130
SDA> sho call
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 8001FED0 EXE$KERNEL_WAIT_PS_C
Return address on stack = FFFFFFFF 80020970 SCH_STD$LOCKR_QUAD_C+000C0
Registers saved on stack
------------------------
7FF91D88 FFFFFFFF 80AB51C0 Saved R0
7FF91D90 00000000 7FFD0ED8 Saved R1 PSX$GA_FS_DATAPTR
7FF91D98 FFFFFFFF 80E8E340 Saved R4 PCB
7FF91DA0 FFFFFFFF 81523A20 Saved R13 SCH_STD$LOCKR_QUAD+00020
7FF91DA8 00000000 7FF91DF0 Saved R29
SDA> sho call/n
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 80199E20 POSIX$KERNEL_NPRO+35E20
Return address on stack = FFFFFFFF 84AD5A00 POSIX$KERNEL_PRO+3BA00
Registers saved on stack
------------------------
7FF91E08 FFFFFFFF 80D07780 Saved R4
7FF91E10 FFFFFFFF 81579248 Saved R13 POSIX$KERNEL_NPRW+1DE48
7FF91E18 00000000 7FF91E20 Saved R29
Result: 32-bit sign extended integer returned in register R0
Argument List:
SDA> sho call/n
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 84AD59A0 POSIX$KERNEL_PRO+3B9A0
Return address on stack = FFFFFFFF 84AD23F4 POSIX$KERNEL_PRO+383F4
Registers saved on stack
------------------------
7FF91E30 FFFFFFFF 81562990 Saved R2 POSIX$KERNEL_NPRW+07590
7FF91E38 00000000 7F9B6AA8 Saved R3
7FF91E40 FFFFFFFF 80D07780 Saved R4
7FF91E48 00000000 00000130 Saved R15
7FF91E50 00000000 7FF91E60 Saved R29
SDA> sho call/n
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 84AD22C0 POSIX$KERNEL_PRO+382C0
Return address on stack = FFFFFFFF 80195ED4 POSIX$KERNEL_NPRO+31ED4
Registers saved on stack
------------------------
7FF91E78 00000000 7FF76950 Saved R2
7FF91E80 00000000 7F9B6AA8 Saved R3
7FF91E88 FFFFFFFF 80E8E340 Saved R4
7FF91E90 00000000 00000005 Saved R5
7FF91E98 00000000 00050007 Saved R6
7FF91EA0 FFFFFFFF 81562990 Saved R7 POSIX$KERNEL_NPRW+07590
7FF91EA8 00000000 00000000 Saved R8
7FF91EB0 FFFFFFFF 8497B268 Saved R9 EXE$CATCH_ALL
7FF91EB8 00000000 7FFBF800 Saved R10 MMG$IMGHDRBUF
7FF91EC0 00000000 00000000 Saved R11
7FF91EC8 00000000 7FF91EE0 Saved R29
SDA> sho call/n
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 80195BE0 POSIX$KERNEL_NPRO+31BE0
Return address on stack = FFFFFFFF 80055124 PUT_MSG_C+0079C
Registers saved on stack
------------------------
7FF91F70 00000000 00000003 Saved R2
7FF91F78 FFFFFFFF 81531580 Saved R3 BUG$TABLE+00C38
7FF91F80 FFFFFFFF 80E8E340 Saved R4
7FF91F88 00000000 00000000 Saved R5
7FF91F90 00000000 000B7DFC Saved R6
7FF91F98 00000000 7FF91FC0 Saved R7
7FF91FA0 00000000 00011DD0 Saved R13
7FF91FA8 00000000 7F9B6AA0 Saved R29
Result: 32-bit sign extended integer returned in register R0
Argument List:
Press RETURN for more.
Call Frame Information
----------------------
FFFFFFFF 815235C8 in memory (32-bit signed integer)
89FFFFFF FF815235 in memory (32-bit signed integer)
3089FFFF FFFF8152 in memory (64-bit integer)
SDA> sho call/n
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: 00000000 000B7D80
Return address on stack = 00000000 00045988
Registers saved on stack
------------------------
7F9B6AB8 00000000 000118C0 Saved R2
7F9B6AC0 00000000 00000000 Saved R3
7F9B6AC8 00000000 00000000 Saved R4
7F9B6AD0 00000000 0015B754 Saved R5
7F9B6AD8 00000000 00000025 Saved R6
7F9B6AE0 00000000 7F9B9508 Saved R7
7F9B6AE8 00000000 00000000 Saved R8
7F9B6AF0 FFFFFFFF 8497B268 Saved R9 EXE$CATCH_ALL
7F9B6AF8 00000000 7F9B6B00 Saved R29
SDA> set nolog
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1363.1 | some IDS | MUCTEC::BECKER | Hartmut B., VMS & Languages, Munich | Wed Feb 19 1997 13:24 | 73 |
> SDA> EXA 80AB51C0;100
> 80AB51E0 00001040 02040000 00010000 ........@...�Q�. 80AB51C0
> 00000000 00000000 00000000 00000000 ................ 80AB51D0
> 80AB51E8 80AB51E8 80AB51E0 80AB51E0 �Q�.�Q�.�Q�.�Q�. 80AB51E0
> 80AB51F8 80AB51F8 80AB51F0 80AB51F0 Q�.Q�.�Q�.�Q�. 80AB51F0
> 80AB5208 80AB5208 80AB5200 80AB5200 .R�..R�..R�..R�. 80AB5200
> 80AB5218 80AB5218 80AB5210 80AB5210 .R�..R�..R�..R�. 80AB5210
> 80AB5228 80AB5228 80AB5220 80AB5220 R�. R�.(R�.(R�. 80AB5220
> .
> .
> 80AB53D8 80AB53D8 80969080 80969080 ........ 80AB53D0
> .
> .
This looks like the resource hash table. For any resource, for example a
message queue, there is a name which can be looked up in the hash table.
The first longword is used as mutex. Then there are other data which describe
this table (length, etc.). Then there is a pointer to the first resource
header: 80AB51E0. The other resource headers follow. Each resource header
has a flink and a blink. Hence all the resource headers with identical
entries pointing to itself are unused slots in the hash table. From your output
it seems there is only 1 resource used, header 80969080. But the length of the
table is 0x1040.
> SDA> exa 80969080;c0
> 0004EC7F 0F8E3039 80AB53D0 80AB53D0 S�.S�.90...�.. 80969080
> 00019000 00000200 000000C8 00000001 ....�........... 80969090
> 000000C0 00000003 00C00000 00000000 ......�.....�... 809690A0
> 80AF12C0 80C7EA40 00000000 00000000 ........@��.�.. 809690B0
> 809690C8 809690C8 809690C0 809690C0 �...�...�...�... 809690C0
> 00000026 00000000 0000002D 00000000 ....-.......&... 809690D0
> 00010A01 000001C4 FFFFFFFF 36305953 SY06....�....... 809690E0
> 000141B6 0001000A 00010A01 0001000A ............�A.. 809690F0
> 36D48B78 32EAECCC 000001C4 000001DE ...�...���2x.�6 80969100
> 12F81828 32FB3595 2BB8312C 32FB35CC �5�2,1+.5�2(.�. 80969110
> 00000020 20205359 53534D56 50584102 .AXPVMSSYS ... 80969120
> 0004EC7F 00000000 00000000 00030F8E .............�.. 80969130
This is the resource. This record again has a flink and a blink, pointing back:
only one resource in this hash slot. It looks like a resource describing a
message queue. Do they use message queues? There are 3 messages in the message
queue, a total of 0xc0 bytes. The messages are in a linked list, header is at
809690B8. Nothing pending, no send, no receive. Pids and UIDs follow, and time
stamps (in posix format). I can't see anything unusual in these structures.
Back to the problem. It seems that these processes are waiting for the hash
table mutex. There should be one process, which currently holds that mutex.
The mutex is aquired for searching/updating the hash table and/or updating
resource structures.
This process may have identified itself as last sender or receiver in this or
another message queue (or another resource structure). At offset 0x80 and 0x84
there are the pids of the last sender and last receiver. In this structure the
last operation should be a send. The sender's id is 000001DE, the send time is
32FB35CC 2BB8312C (posix sec and nanosec format). This is obviously bigger than
the creation time and the last receive time (32EAECCC 36D48B78 and 32FB3595
12F81828). The last send was done 0xcc-0x95 seconds after the last receive.
If this is the only resource, then 1de was the last one successfully sending a
message to this queue. If it successfully did that it should have released the
mutex. What happened to this process, is it still around?
On the other hand, if the culprit locked the mutex and failed to identify
itself in this structure, then you have to look at all other posix processes,
not waiting for this mutex. Are there many around? What's their status?
And going some more steps back. Did it work before? Is there anything new onthe
system, new version of VMS, the application? How often does this happen, is
there anything unusual in the application/system going on at the same time?
I'm not sure if this really helps, but I hope it's a starter,
Hartmut
| |||||
| 1363.2 | EWBV05::KUNIYOSHI | Thu Feb 20 1997 05:47 | 116 | ||
Thank you for quick response.
The problem occurred just one time.The customer has been using
this system since last October.The application uses the message queue.
I have a forced crash dump and analizing it but I don't have a
knowledge of posix internal, so it is difficult to analize it.
According to previous reply, owner of MUTEX is 0001DE. It can be
confirmed because the process's prioprity is 16.
(VMS raises the priority of process who holding MUTEX to 16.)
It seems that this process is looping in POSIX$KERNEL without
releaseing MUTEX.So other posix processes hang.And also it seems
that R3 points to the resource.This resource has C7 messages.
But,I don't understand the relation between these data structure and
process loop.
Any comments are appreciated.
Regards,
Makito Kuniyoshi/CSC Tokyo
Process index: 00DE Name: posix$000001DE Extended PID: 000001DE
-------------------------------------------------------------------
Process status: 000C8001 RES,NOACNT,PHDRES,HIBER
Required capabilities: 0000000C QUORUM,RUN
PCB address 80A98A80 JIB address 80CBE480
PHD address 823E4000 Swapfile disk address 00000000
Master internal PID 000100DE Subprocess count 0
Internal PID 000100DE Creator internal PID 00000000
Extended PID 000001DE Creator extended PID 00000000
State CUR Termination mailbox 0000
Previous CPU Id 00000001 Current CPU Id 00000000
Previous ASNSEQ 00000000004414C2 Previous ASN 000000000000003D
Current priority 16 # of threads 0000000000000001
Initial process priority 4 Delete pending count 0
Base priority 16 AST's active NONE
UIC [00012,000001] AST's remaining 511
Mutex count 1 Buffered I/O count/limit 500/500
Waiting EF cluster 0 Direct I/O count/limit 500/500
Abs time of last event 0662AE25 BUFIO byte count/limit 595328/595328
Event flag wait mask F7FFFFFF # open files allowed left 465
Process index: 00DE Name: posix$000001DE Extended PID: 000001DE
-------------------------------------------------------------------
Active registers for current process at bugcheck time
-----------------------------------------------------
R0 = FFFFFFFF 80B69908 R1 = 00000000 0011BA48 R2 = FFFFFFFF 81565CD8
R3 = FFFFFFFF 80B698C0 R4 = FFFFFFFF 80B698F8 R5 = FFFFFFFF 80AB51C0
R6 = FFFFFFFF 8155C090 R7 = 00010031 00000000 R8 = 00000000 00000000
R9 = 00050031 09295064 R10 = 00018C28 00000000 R11 = 80B1C300 7F9C0EF8
R12 = 00000000 7F9C3A88 R13 = FFFFFFFF 81579248 R14 = 00000000 00000000
R15 = 00000000 00000130 R16 = 00000000 0000064C R17 = 00000000 32EAED0F
R18 = FFFFFFFF 80A72340 R19 = 00000000 00000006 R20 = 00000000 00000020
R21 = 80B98E00 80AF3780 R22 = 80B98E00 80AF3780 R23 = 00000000 00050031
R24 = 00000000 00000000 R25 = FFFFFFFF 80B69908 R26 = FFFFFFFF 815352FC
R27 = FFFFFFFF 815352C0 R28 = FFFFFFFF 84AC318C R29 = 00000000 7FF91DB0
PC = FFFFFFFF 8006EF34 PS = 38000000 00001F00 KSP = FFFFFFFF 848F1FF8
ESP = 00000000 7FF96000 SSP = 00000000 7FF9E000 USP = 00000000 7F9C0EE0
PTBR = 00000000 000000F4 AST{SR/EN} = 0000002E ASN = 00000000 0000002F
SDA> exa @r3;c0
000534C9 0F8E3039 80AB56A0 80AB56A0 .V�.V�90..�.. 80B698C0
00019000 00000200 000000C8 00000001 ....�.......... 80B698D0
00018C28 000000C7 00C00000 00000000 ......��..(... 80B698E0
80C1A480 809357C0 00000000 0000000A ........��...�. 80B698F0
80AA8F40 80AF01C0 80B69900 80B69900 ..�..��@.� 80B69900
00000000 00000000 00000000 00000000 ................ 80B69910
00010A01 000001DE FFFFFFFF 32355953 SY52.......... 80B69920
000141B6 0001000A 00010A01 0001000A ............��.. 80B69930
06419000 32EAED0F 000001DE 00005A84 .Z.....��2..A. 80B69940
2BF3CAFC 32FB35CC 12746744 32FB357F .5��Dgt.̵����� 80B69950
00000020 20205359 53534D56 50584102 .AXPVMSSYS ... 80B69960
000534C9 00000000 00000000 00030F8E ............�.. 80B69970
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 84AE5A90 POSIX$KERNEL_PRO+4BA90
Return address on stack = FFFFFFFF 84AE65BC POSIX$KERNEL_PRO+4C5BC
Registers saved on stack
------------------------
7FF91DC8 FFFFFFFF 81565CF8 Saved R2 POSIX$KERNEL_NPRW+0A8F8
7FF91DD0 00000000 7FF76964 Saved R3
7FF91DD8 FFFFFFFF 80B1D3C0 Saved R4
7FF91DE0 FFFFFFFF 80AB51C0 Saved R5
7FF91DE8 00000000 7FF91DF0 Saved R29
SDA>
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 84AE5DE0 POSIX$KERNEL_PRO+4BDE0
Return address on stack = FFFFFFFF 84AD2894 POSIX$KERNEL_PRO+38894
Registers saved on stack
------------------------
7FF91E10 FFFFFFFF 81562990 Saved R2 POSIX$KERNEL_NPRW+07590
7FF91E18 FFFFFFFF 8155BFD0 Saved R3 POSIX$KERNEL_NPRW+00BD0
7FF91E20 7FF76964 00000000 Saved R4
7FF91E28 FFFFFFFF 80B1D3C0 Saved R5
7FF91E30 00000000 7FF76970 Saved R6
7FF91E38 00000000 00000000 Saved R7
7FF91E40 00000000 00050031 Saved R8
7FF91E48 FFFFFFFF 8155C090 Saved R9 POSIX$KERNEL_NPRW+00C90
7FF91E50 7FF76964 00000000 Saved R10
7FF91E58 00000000 7FF91E60 Saved R29
| |||||
| 1363.3 | fifo? locks? | MUCTEC::BECKER | Hartmut B., VMS & Languages, Munich | Fri Feb 21 1997 18:17 | 12 |
I'm not sure if I found the pc's and the calls in the maps and listings. Are you sure about V2.0 and ECO 3? It looks like the last call was from a fifo routine. Do they use fifo's? The called routine should (build a resource name and) aquire a VMS lock, but I want to double check that. Does it make sense with what you can see in the dump? Anyway, I'm not with posix engineering, if you want this problem solved you should QAR, SPR or IPMT this case (whatever you and your customer think is apropriate). It may be a simple resource problem on the customer's site it may also be a problem within posix. Hartmut | |||||