T.R | Title | User | Personal Name | Date | Lines |
---|
787.1 | A vital piece of info missing | IOSG::TALLETT | Arranging bits for a living... | Wed Jun 03 1992 09:24 | 11 |
| Hi there!
You forgot to mention the lock name that the process was actually
waiting for (the one it already had). This is important, as it
will give us a clue as to which bit of code has the problem.
I take it HITMAN is some sort of idle process killer (never heard
of it before).
Regards,
Paul
|
787.2 | Lock names involved | KAOFS::F_GUIBORD | | Wed Jun 03 1992 20:52 | 16 |
| Hi Paul,
The lock it has granted and is also waiting for is:
April 22 RMS$N_SHADISK_21
May 4 RMS$N_SHADISK_31
The affected files were:
April 22 OA$DAF_A.DAT
May 4 OA$DAF_D.DAT
DUS21 and DUS31 hold the ALL-IN-1 shared area files.
Thanks for the quick response.
Francine
|
787.3 | Moved by moderator | IOSG::TALLETT | Arranging bits for a living... | Wed Jun 03 1992 21:13 | 8 |
|
I moved your note, you probably meant REPLY but typed WRITE.
Hmm. The lock names don't give many clues. I don't know what
is going on, so I will have to defer to a filcab expert...
Regards,
Paul
|
787.4 | Hangs 1/month | KAOFS::F_GUIBORD | | Thu Jun 04 1992 15:35 | 12 |
| You're right! I did mean REPLY but I typed WRITE.
Any ideas would be much appreciated. We get these hangs about once a
month and the last two, which we looked at closely, were the same.
This seems to be the one common factor in both hangs: a process with a
granted lock blocking tons of other processes, including itself.
Any ideas on other ways of logging out/deleting these idle processes?
Thanks,
Fran
|
787.5 | Try VMSNOTES | IOSG::TALLETT | Arranging bits for a living... | Thu Jun 04 1992 17:35 | 13 |
|
I was hoping someone else would step in, but....
I would suggest you take your lock names off to the VMSNOTES
conference and find out what they are used for in VMS and then
come back here and point us at the VMSNOTES notes stream and
we'll try and carry on from there. Sorry but I don't have time
right now to take your questions to VMSNOTES for you.
Without knowing what the locks mean, I'm a bit stuck.
Regards,
Paul
|
787.6 | VMSNOTES 1533 | KAOFS::F_GUIBORD | | Tue Jun 09 1992 20:36 | 4 |
| I have cross posted in VMSNOTES note 1533 but have yet to hear
anything. I will post any feedback as soon as I get it.
Fran
|
787.7 | Hitman idle process killer hits RMS design flaw. | STAR::VANDENHEUVEL | Nice numbers are mostly wrong | Fri Jun 19 1992 23:44 | 27 |
| [similar reply to one I posted in VMSnotes.]
RMS has a CLD from BARCLAYs BANK for this problem.
I have also received a report from Singapore on
the very same problem. We have all come to the same
conclusion: DELPRC following FORCEX with a very short delay.
Still, there is a (design!) bug in RMS with the process
deletion coming down in kernel mode, interrupting normal
exec mode rundown (waiting for a global buffer lock).
RMS intends to address that problem but I hope you'll
all downplay that a little to buy them time.
The RMS code has NOT changed in this area. This old problem
has only now become visible with this particular idle process
killer implementation. Let's blame it on them :^).
As much as I _HATE_ Idle process killers (even more so now :-)
I have to acknowledge that there is a need for them. Customers
_are_ buying these hacks, like it or not. I believe ALL-IN-1
systems to be a particularly important target for them.
Surely ALL-IN-1 engineering has been aware of that for about a decade.
What have they done about it? Is there a Digital solution?
If not, why not? Missed opportunity!
Just an opinion, not an Engineering position,
Hein.
|
787.8 | Possible workround - to be confirmed | AIMTEC::VOLLER_I | Gordon (T) Gopher for President | Fri Jun 19 1992 23:47 | 319 |
|
Fran,
Here's a copy of the STARs article I intend to issue.
Carl Nadrowski and I analysed the dump and our findings have been
communicated to RMS Engineering who were aware of the problem and
are currently investigating possible solutions etc. More details
can be found in VMSNOTES 1533.
Note. The analysis below has not yet been fully confirmed and
therefore I wouldn't communicate it as is to a customer. However,
the workround included shouldn't cause any problems and is worth
trying should you come across a similar problem before a final
solution is available.
Cheers,
Iain.
***************************************************************************
Problem.
Customer reports that ALL-IN-1 user processes are hanging when
attempting to access Electronic Mail.
Symptoms.
Using $ANALYZE/SYSTEM to analyze the running system shows that
the processes are waiting to be granted an EXclusive lock on
an RMS lock associated with one of the ALL-IN-1 files (normally
the ALL-IN-1 System DAF). The process that currently owns the
lock is also waiting for it hence creating a deadlock situation.
Analysis.
The SDA information (below) indicates that the problem is due
to the fact that RMS rundown procedures are being called to
flush Global Buffers from 2 streams (EXEC and KERNEL) at the
same time.
A process 'killer' program was being used to remove idle
processes from the system. The process killer issued a $FORCEX
system service followed by a $DELPRC. It appears that the
$FORCEX had not completed when the $DELPRC was issued.
An ALL-IN-1 process owns the lock in EX mode. The EXEC stack for
the process indicates that a call to $DEQ (dequeue lock) had been
made to dequeue the lock on the SDAF after flushing its
Global Buffers.
The KERNEL (current operating) stack indicates a request to $ENQ
(enqueue lock) the same lock that is being $DEQueued from
EXEC mode. $DELPRC at this stage is running the RMS last chance
handler code to flush the buffers.
The EXEC mode lock does specify a blocking AST, but this will
never be delivered while the process is in KERNEL mode.
The $FORCEX ($EXIT) was expecting to return to complete the
rundown but was interrupted by the delivery of the $DELPRC.
As the RMS rundown is now in an unknown state, $DELPRC will restart
the rundown and request the lock already owned in EXEC mode.
Both the $FORCEX and the $DELPRC are trying to achieve the same
result but don't synchronise activities. Deadlock detection
is effectively turned off by the KERNEL mode $ENQ request from
$DELPRC.
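The sequence described above can be sketched as a toy model. This is
not the VMS lock manager. It assumes only what the two SDA lock
displays suggest: each request is a separate lock with its own lock
id, and the PID of the current owner is not consulted when a new
request arrives. The class and function names are hypothetical.

```python
from collections import deque


class ToyLockManager:
    """Toy model of one resource taking EX-mode requests (illustration only)."""

    def __init__(self):
        self.granted = None      # (lock_id, pid, access_mode) holding EX
        self.waiting = deque()   # queued requests, in arrival order

    def enq(self, lock_id, pid, access_mode):
        """Request the resource at EX; True if granted, False if queued."""
        request = (lock_id, pid, access_mode)
        if self.granted is None:
            self.granted = request
            return True
        # Assumption: the owner's PID is ignored, so a second lock from
        # the same process queues behind the first like any other.
        self.waiting.append(request)
        return False

    def deq(self, lock_id):
        """Release the granted lock and grant the next waiter, if any."""
        if self.granted is None or self.granted[0] != lock_id:
            raise ValueError("lock is not granted")
        self.granted = self.waiting.popleft() if self.waiting else None

    def self_blocked(self):
        """PIDs that hold the resource and are also waiting for it."""
        if self.granted is None:
            return set()
        owner = self.granted[1]
        return {pid for _, pid, _ in self.waiting if pid == owner}


def simulate():
    mgr = ToyLockManager()
    pid = 0x007F009C
    # Exec-mode rundown holds the lock and is about to dequeue it.
    first = mgr.enq(0x630039D4, pid, "EXEC")
    # The process deletion arrives in kernel mode before the dequeue
    # runs, restarts the rundown and asks for the same resource again.
    second = mgr.enq(0x120019DA, pid, "KERNEL")
    # Any other user of the file now queues behind the stuck owner.
    other = mgr.enq(0x00000001, 0x007F00AA, "EXEC")
    return first, second, other, mgr.self_blocked()
```

In this model the exec-mode dequeue never runs because the process is
stuck in the kernel-mode wait, so the owner stays in self_blocked()
and every later request stays queued behind it.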
Environment.
This problem has been observed on different platforms running
VMS V5.5 (may occur on other versions of VMS). It is felt that
large, heavily utilized systems are most likely to experience
the problem.
Currently we have seen the problem on systems running HITMAN
software but any process killer program could highlight the
problem. (Note. HITMAN is not at fault in this scenario).
Workround.
Increase the wait time between issuing the $FORCEX and $DELPRC.
For HITMAN use /FORCE_WAIT qualifier to specify a longer time
interval, eg. HITMAN/FORCE_WAIT=10
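
For anyone writing their own idle process killer, the same idea can
be sketched as below. This is not how HITMAN works internally, and
polling for the process to disappear is an addition of this sketch,
not part of the workround above. The three callables are hypothetical
stand-ins for whatever issues the $FORCEX, checks for the process and
issues the $DELPRC.

```python
import time


def remove_idle_process(force_exit, is_gone, delete_process,
                        force_wait=10.0, poll=0.5,
                        sleep=time.sleep, clock=time.monotonic):
    """Force an exit, wait up to force_wait seconds, delete only if needed.

    force_exit, is_gone and delete_process are hypothetical callables
    supplied by the caller. Returns "exited" if the process went away
    on its own, or "deleted" if it had to be deleted.
    """
    force_exit()
    deadline = clock() + force_wait
    while clock() < deadline:
        if is_gone():
            # Rundown finished by itself; no deletion is issued.
            return "exited"
        sleep(poll)
    if is_gone():
        return "exited"
    # Still there after the full wait: fall back to deletion.
    delete_process()
    return "deleted"
```

The point of the design is simply that the deletion is never issued
while the forced exit may still be running its rundown, within the
limit set by force_wait.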
Solution.
RMS Engineering are aware of this problem and are currently
investigating possible solutions etc.
SDA information.
1) Currently owned lock
Lock data:
Lock id: 630039D4 PID: 007F009C Flags: VALBLK CONVERT SYNCSTS
Par. id: 01000000 SUBLCKs: 0 SYSTEM NODLCKW NODLCKB
LKB: 83619F00 BLKAST: 802F512A
PRIORTY: 0000
Granted at EX 00000000-FFFFFFFF
Resource: 001B0527 24534D52 RMS$'... Status: DBLKAST ASYNC BLASTQD
Length 26 44524553 55020000 ...USERD
Exec. mode 00202039 314B5349 ISK19 . ! RMS lock on OA$SHARE:OA$DAF_E
System 00000000 00000000 ........
Local copy
2) New lock request that is waiting
Lock data:
Lock id: 120019DA PID: 007F009C Flags: VALBLK SYNCSTS SYSTEM
Par. id: 01000000 SUBLCKs: 0 NOQUOTA
LKB: 834CDC80 BLKAST: 00000000
PRIORTY: 0000 RQSEQNM: 2FE5
Waiting for EX 00000000-FFFFFFFF
Resource: 001B0527 24534D52 RMS$'... Status: ASYNC NOQUOTA
Length 26 44524553 55020000 ...USERD
Exec. mode 00202039 314B5349 ISK19 . ! Same lock as above
System 00000000 00000000 ........
Local copy
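
A note on reading the Resource field in the two lock displays above.
Each row shows two hex longwords followed by the same bytes as text,
and the text runs from the right-hand longword to the left-hand one,
low byte first. The small sketch below reproduces the text column
from the hex columns. The helper names are hypothetical, and reading
the printable part as a volume name is a guess rather than something
stated in the dump.

```python
def decode_sda_row(left_hex, right_hex):
    """Return the 8 bytes of one dump row in memory order.

    The right-hand longword is taken as the lower address, and each
    longword is stored low byte first.
    """
    low = int(right_hex, 16).to_bytes(4, "little")
    high = int(left_hex, 16).to_bytes(4, "little")
    return low + high


def printable(data):
    """Show printable ASCII as-is and everything else as a dot."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# The three non-zero rows of the Resource field, as displayed.
ROWS = [
    ("001B0527", "24534D52"),
    ("44524553", "55020000"),
    ("00202039", "314B5349"),
]

name = b"".join(decode_sda_row(left, right) for left, right in ROWS)
text = printable(name)
```

Here text starts with RMS$ and contains USERDISK19, which matches the
text column in the dump. Both lock displays carry the same resource
bytes, which is how one can tell the waiting request is for the lock
the process already owns.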
SDA> show process ! NOTE DELPEN flag set
Process index: 009C Name: FERNANDEZM Extended PID: 20A0FE9C
---------------------------------------------------------------
Process status: 02040023 RES,DELPEN,RESPEN,PHDRES
PCB address 8143CC70 JIB address 82407F00
PHD address A6784E00 Swapfile disk address 00000000
Master internal PID 007F009C Subprocess count 0
Internal PID 007F009C Creator internal PID 00000000
Extended PID 20A0FE9C Creator extended PID 00000000
State LEF Termination mailbox 0000
Current priority 9 AST's enabled KESU
Base priority 4 AST's active E
UIC [00046,002534] AST's remaining 297
Mutex count 0 Buffered I/O count/limit 90/90
Waiting EF cluster 0 Direct I/O count/limit 300/300
Starting wait time 1B001B1A BUFIO byte count/limit 43368/43368
Event flag wait mask 7FFFFFFF # open files allowed left 183
Local EF cluster 0 60000003 Timer entries allowed left 50
Local EF cluster 1 D0000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 1454
Global cluster 3 pointer 00000000 Global WS page count 904
Current operating stack (KERNEL):
7FFE7634 80000000 EXE$QIOW
7FFE7638 00000000
7FFE763C 00000000
7FFE7640 7FFE7694 CTL$GL_KSTKBAS+00494
7FFE7644 7FFE7658 CTL$GL_KSTKBAS+00458
7FFE7648 8039C420 EXCEPTION+00420
7FFE764C 7FFEE44C SYS$SYNCH+0000C
7FFE7650 00000004
SP => 7FFE7654 00E9A8EC
7FFE7658 00000000
7FFE765C 2FFC0000
7FFE7660 00000002
7FFE7664 7FFE76E0 CTL$GL_KSTKBAS+004E0
7FFE7668 802EF7C1 RMS+07BC1
7FFE766C 7FFE76C0 CTL$GL_KSTKBAS+004C0
7FFE7670 7FFE2682 IAC$AW_VECSET+0000A
7FFE7674 00E9A8B0
7FFE7678 0045BE3C
7FFE767C 00ED4A00
7FFE7680 00000000
7FFE7684 00000002
7FFE7688 008F1608
7FFE768C 00E99608
7FFE7690 7FFDFE70 PIO$GW_IIOIMPA
7FFE7694 0000000D
7FFE7698 0000001F
7FFE769C 00000005
7FFE76A0 00E9A8EC
7FFE76A4 00000039
7FFE76A8 00E9A8B0
7FFE76AC 01000000 SYS$K_VERSION_BASE_IMAGE
7FFE76B0 00000000
7FFE76B4 00000000
7FFE76B8 00000000
7FFE76BC 00000000
7FFE76C0 00000000
7FFE76C4 00000000
7FFE76C8 00000000
7FFE76CC 802EF53C RMS+0793C
7FFE76D0 0045BEC0
7FFE76D4 802EF4F9 RMS+078F9
7FFE76D8 7FFE7738 CTL$GL_KSTKBAS+00538
7FFE76DC 802EF4D6 RMS+078D6
7FFE76E0 00000000
7FFE76E4 00000000
7FFE76E8 7FFE7738 CTL$GL_KSTKBAS+00538
7FFE76EC 7FFE76FC CTL$GL_KSTKBAS+004FC
7FFE76F0 8039C420 EXCEPTION+00420
7FFE76F4 7FFEE26E SYS$RMSRUNDWN+00006
7FFE76F8 00000000
7FFE76FC 00000000
7FFE7700 2FFC0000
7FFE7704 7FFE77E8 CTL$GL_KSTKBAS+005E8
7FFE7708 7FFE77CC CTL$GL_KSTKBAS+005CC
7FFE770C 803F8B42 PROCESS_MANAGEMENT+05942
7FFE7710 00000000
7FFE7714 7FFE2682 IAC$AW_VECSET+0000A
7FFE7718 8143CC70 PCB
7FFE771C 00000000
7FFE7720 7FFE5C00 CTL$A_DISPVEC+00400
7FFE7724 00000000
7FFE7728 7FFE9790
7FFE772C 008F1608
7FFE7730 008F1608
7FFE7734 7FFDFE70 PIO$GW_IIOIMPA
7FFE7738 00000002
7FFE773C 7FFE7744 CTL$GL_KSTKBAS+00544
7FFE7740 00000002
7FFE7744 00000080
7FFE7748 7FFE774C CTL$GL_KSTKBAS+0054C
7FFE774C 7FFE9790
7FFE7750 008F1608
7FFE7754 008F1608
7FFE7758 7FFDFE70 PIO$GW_IIOIMPA
7FFE775C 00000000
7FFE7760 00000000
7FFE7764 7FFE7794 CTL$GL_KSTKBAS+00594
7FFE7768 7FFE7778 CTL$GL_KSTKBAS+00578
7FFE776C 8039C420 EXCEPTION+00420
7FFE7770 7FFEDE96 SYS$CMKRNL+00006
7FFE7774 00000000
7FFE7778 00000000
7FFE777C 20140000
7FFE7780 7FFE77B4 CTL$GL_KSTKBAS+005B4
7FFE7784 803AFB6A PAGE_MANAGEMENT+0016A
7FFE7788 803B0D18 PAGE_MANAGEMENT+01318
7FFE778C 803B0DC7 PAGE_MANAGEMENT+013C7
7FFE7790 803B080A PAGE_MANAGEMENT+00E0A
7FFE7794 A69C22B0
7FFE7798 7FFE5800 CTL$A_DISPVEC
7FFE779C 803B0839 PAGE_MANAGEMENT+00E39
7FFE77A0 00000004
7FFE77A4 803E80DD IMAGE_MANAGEMENT+02ADD
7FFE77A8 00000004
7FFE77AC 7FFE267A IAC$AW_VECSET+00002
7FFE77B0 8143CC70 PCB
7FFE77B4 00000000
7FFE77B8 00000004
7FFE77BC 7FFE5800 CTL$A_DISPVEC
7FFE77C0 803E7C2D IMAGE_MANAGEMENT+0262D
7FFE77C4 00000000
7FFE77C8 803F8B01 PROCESS_MANAGEMENT+05901
7FFE77CC 00000000
7FFE77D0 00300000
7FFE77D4 7FFE965C
7FFE77D8 7FFE9620
7FFE77DC 803E8D55 ! CALLG ASTDEL for $DELPRC
7FFE77E0 008F28B0 ! R4
7FFE77E4 00000400 ! R5
7FFE77E8 00000005 ! AST ARG block
7FFE77EC 82469430 ! AST PARAM
7FFE77F0 00B8A000 ! R0
7FFE77F4 00DE7BFF ! R1
7FFE77F8 7FFEE3CA !PC SYS$DEQ+00002
7FFE77FC 01400000 !PSL
Process stacks
--------------
EXECUTIVE stack:
SP => 7FFE9620 00000000 ! Call frame for DEQ of lock
7FFE9624 2FFC0000 !
7FFE9628 7FFE96EC ! AP
7FFE962C 7FFE96B0 ! FP
7FFE9630 802F50BC ! CALLS SYS$DEQ
7FFE9634 7FFE967C ! r2
7FFE9638 008F1600 ! r3
7FFE963C 008F28B0 ! r4
7FFE9640 00000400 ! r5
7FFE9644 008F16C4 ! r6
7FFE9648 00000001 ! r7
7FFE964C 7FFE9790 ! r8
7FFE9650 008F1608 ! r9
7FFE9654 008F1608 ! r10
7FFE9658 7FFDFE70 ! r11
7FFE965C 00000004 ! arg block
7FFE9660 630039D4 ! lock ID
7FFE9664 008F28F4 ! lock value block
7FFE9668 00000001 ! mode
7FFE966C 00000000 ! flags
7FFE9670 00DE7BFF
7FFE9674 7FFE967C
7FFE9678 008F1600
7FFE967C 008F16C4
7FFE9680 00000400 IRP$M_MBXIO
7FFE9684 008F1608
7FFE9688 802EE233 RMS$CLOSE+00133
|