
Conference iosg::all-in-1_v30

Title:*OLD* ALL-IN-1 (tm) Support Conference
Notice:Closed - See Note 4331.l to move to IOSG::ALL-IN-1
Moderator:IOSG::PYE
Created:Thu Jan 30 1992
Last Modified:Tue Jan 23 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:4343
Total number of notes:18308

787.0. "Process rundown hangs???" by KAOFS::R_OBAS () Tue Jun 02 1992 21:52

    Can anybody shed some light?
    
    I have a system running MR3.2, ALL-IN-1 2.4, VMS 5.5.
    
    Every once in a while user processes get hung up and the system has to
    be rebooted.  Analysis during the hang shows the processes are hung in
    LEF wait for a lock.  A look at the lock database shows the lock is
    granted EX to a process which is also in LEF, also waiting for an EX
    on this same lock.  Looks like the process forgot it had the lock in
    the first place.  
    
    What we have been able to determine from a few previous hangs is that
    this process was "hit" by HITMAN just before this whole mess started.
    A look at this process's other locks shows it has a lock on OA$SHUTDWN
    and also a lock on OA$REEN (unsure if this is the exact resource name).  I
    am not familiar with the exact locking resources for ALL-IN-1, but is this
    telling me that the process was being rundown, then it tried to
    re-enter and 'forgot' it had the resource locked already?
    
    We have since turned off HITMAN but the customer is requesting
    suggestions on how else he can get rid of these idle processes; either
    through modifications to HITMAN or other means.
    
    We checked HITMAN and it issues a $FORCEX to kill the process.  It
    loops waiting for it to stop.  If it doesn't stop within that
    pre-defined delay, it then tries to delete the process.
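    
    For reference, that sequence looks roughly like the following (a
    minimal C sketch against the VMS system services; this is NOT
    HITMAN's actual code, and the PID, delay and item-list layout are
    invented for the example):
    
	/* Sketch only: force the image to exit, give the process a
	   pre-defined delay to go away, then delete it outright. */
	#include <starlet.h>    /* sys$forcex, sys$getjpiw, sys$delprc */
	#include <ssdef.h>      /* SS$_NORMAL, SS$_NONEXPR */
	#include <jpidef.h>     /* JPI$_PID */
	#include <unistd.h>     /* sleep() from the C RTL */

	struct itm { unsigned short buflen, code; void *bufadr; unsigned short *retlen; };

	int main(void)
	{
	    unsigned int pid = 0x007F009C;      /* placeholder: PID of the idle process */
	    unsigned int out_pid;
	    unsigned short retlen;
	    struct itm itmlst[2];
	    int status, waited;

	    itmlst[0].buflen = sizeof (out_pid);    /* ask $GETJPI for the PID, */
	    itmlst[0].code   = JPI$_PID;            /* just to see if it still exists */
	    itmlst[0].bufadr = &out_pid;
	    itmlst[0].retlen = &retlen;
	    itmlst[1].buflen = itmlst[1].code = 0;  /* terminate the item list */
	    itmlst[1].bufadr = 0;
	    itmlst[1].retlen = 0;

	    status = sys$forcex(&pid, 0, SS$_NORMAL);   /* ask the image to exit */
	    if (!(status & 1))
	        return status;

	    /* Loop waiting for the process to stop... */
	    for (waited = 0; waited < 5; waited++)      /* "pre-defined delay" of 5 seconds */
	    {
	        if (sys$getjpiw(0, &pid, 0, itmlst, 0, 0, 0) == SS$_NONEXPR)
	            return SS$_NORMAL;                  /* it has gone - nothing more to do */
	        sleep(1);
	    }

	    /* ...and only then kill it outright. */
	    return sys$delprc(&pid, 0);
	}

    The question, presumably, is how long that loop has to run before
    the $DELPRC is safe.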
    
    Any suggestions/workarounds/explanations would be much appreciated.
    
    Thanks,
    
    Francine Guibord
    Ricardo Obas

787.1. "A vital piece of info missing" by IOSG::TALLETT (Arranging bits for a living...) Wed Jun 03 1992 09:24 (11 lines)

    Hi there!
    
    	You forgot to mention the lock name that the process was actually
    	waiting for (the one it already had). This is important, as it
    	will give us a clue as to which bit of code has the problem.
    
    	I take it HITMAN is some sort of idle process killer (never heard
    	of it before).
    
    Regards,
    Paul

787.2. "Lock names involved" by KAOFS::F_GUIBORD () Wed Jun 03 1992 20:52 (16 lines)

    Hi Paul,
    
    The lock it has granted and is also waiting for is:
    
    	April 22	RMS$N_SHADISK_21
    	May 4		RMS$N_SHADISK_31
    
    The affected files were:
    	April 22	OA$DAF_A.DAT
    	May 4		OA$DAF_D.DAT
    
    DUS21 and DUS31 hold the ALL-IN-1 shared area files.
    
    Thanks for the quick response.
    
    Francine

787.3. "Moved by moderator" by IOSG::TALLETT (Arranging bits for a living...) Wed Jun 03 1992 21:13 (8 lines)

    
    	I moved your note, you probably meant REPLY but typed WRITE.
    
    	Hmm. The lock names don't give many clues; I don't know what
    	is going on, so I will have to defer to a filcab expert...
    
    Regards,
    Paul

787.4. "Hangs 1/month" by KAOFS::F_GUIBORD () Thu Jun 04 1992 15:35 (12 lines)

    You're right! I did mean REPLY but I typed WRITE.
    
    Any ideas would be much appreciated.  We get these hangs about once a
    month and the last two, which we looked at closely, were the same. 
    This seems to be the one common factor in both hangs.  A process with a
    granted lock blocking tons of other processes, including itself.
    
    Any ideas on other ways of logging out/deleting these idle processes?
    
    Thanks, 
    
    Fran

787.5. "Try VMSNOTES" by IOSG::TALLETT (Arranging bits for a living...) Thu Jun 04 1992 17:35 (13 lines)

    
    	I was hoping someone else would step in, but....
    
    	I would suggest you take your lock names off to the VMSNOTES
    	conference and find out what they are used for in VMS and then
    	come back here and point us at the VMSNOTES notes stream and
    	we'll try and carry on from there. Sorry but I don't have time
    	right now to take your questions to VMSNOTES for you.
    
    	Without knowing what the locks mean, I'm a bit stuck.
    
    Regards,
    Paul

787.6. "VMSNOTES 1533" by KAOFS::F_GUIBORD () Tue Jun 09 1992 20:36 (4 lines)

    I have cross posted in VMSNOTES note 1533 but have yet to hear
    anything.  I will post any feedback as soon as I get it.
    
    Fran

787.7. "Hitman idle process killer hits RMS design flaw." by STAR::VANDENHEUVEL (Nice numbers are mostly wrong) Fri Jun 19 1992 23:44 (27 lines)

    [similar reply to one I posted in VMSnotes.]
    
    RMS has a CLD from BARCLAYs BANK for this problem.
    I have also received a report from Singapore on 
    the very same problem. We have all come to the same 
    conclusion: DELPRC following FORCEX with a very short delay.
    
    Still, there is a (design!) bug in RMS with the process
    deletion coming down in kernel mode, interrupting normal
    exec mode rundown (waiting for a global buffer lock).
    RMS intends to address that problem but I hope you'll
    all downplay that a little to buy them time.
    
    The RMS code has NOT changed in this area. This old problem
    has only now become visible with this particular idle process
    killer implementation. Let's blame it on them :^).
    
    As much as I _HATE_ idle process killers (even more so now :-)
    I have to acknowledge that there is a need for them. Customers
    _are_ buying these hacks, like it or not. I believe ALL-IN-1
    systems to be a particularly important target for them.
    Surely ALL-IN-1 engineering has been aware of that for about a decade.
    What have they done about it? Is there a digital solution?
    If not, why not? Missed opportunity!
    
    Just an opinion, not an Engineering position,
    						Hein.

787.8. "Possible workround - to be confirmed" by AIMTEC::VOLLER_I (Gordon (T) Gopher for President) Fri Jun 19 1992 23:47 (319 lines)

    
    Fran,
    	
    	Here's a copy of the STARS article I intend to issue.
    
    	Carl Nadrowski and I analysed the dump and our findings have been
    	communicated to RMS Engineering who were aware of the problem and
    	are currently investigating possible solutions etc. More details
    	can be found in VMSNOTES 1533.
    
    	Note. The analysis below has not yet been fully confirmed and 
    	therefore I wouldn't communicate it as is to a customer. However,
    	the workround included shouldn't cause any problems and is worth
    	trying should you come across a similar problem before a final
    	solution is available.
    
    Cheers,
    
    Iain.
    
    	
    
    ***************************************************************************
    
    Problem.

	Customer reports that ALL-IN-1 user processes are hanging when
	attempting to access Electronic Mail.
    
    Symptoms.
    
	Using $ANALYZE/SYSTEM to analyze the running system shows that
	the processes are waiting to be granted, in EXclusive mode, an
	RMS lock associated with one of the ALL-IN-1 files (normally
	the ALL-IN-1 System DAF). The process that currently owns the
	lock is also waiting for it, hence creating a deadlock situation.

    Analysis.

	The SDA information (below) indicates that the problem is due
	to the fact that RMS rundown procedures are being called to
	flush Global Buffers from 2 streams (EXEC and KERNEL) at the
	same time.

	A process 'killer' program was being used to remove idle
	processes from the system. The process killer issued a $FORCEX
	system service followed by a $DELPRC. It appears that the 
	$FORCEX had not completed when the $DELPRC was issued.

	An ALL-IN-1 process owns the lock in EX mode. The EXEC stack for
	the process indicates that a call to $DEQ (dequeue lock) had been
	made to dequeue the lock on the SDAF after flushing its
	Global Buffers.

	The KERNEL (current operating) stack indicates a request to $ENQ 
	(enqueue lock) the same lock that is being $DEQueued from 
	EXEC mode. $DELPRC at this stage is running the RMS last chance
	handler code to flush the buffers.

	The EXEC mode lock does specify a blocking AST, but this will
	never be delivered while the process is in KERNEL mode.

	The $FORCEX ($EXIT) was expecting to return to complete the 
	rundown but was interrupted by the delivery of the $DELPRC.
	As the RMS rundown is now in an unknown state, $DELPRC will restart
	the rundown and request the lock already owned in EXEC mode.

	Both the $FORCEX and the $DELPRC are trying to achieve the same
	result but don't synchronise activities. Deadlock detection 
	is effectively turned off by the KERNEL mode $ENQ request from
	$DELPRC.
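
	For illustration, the EX lock held in EXEC mode is the kind of
	lock obtained by an $ENQ call that names a blocking AST routine.
	A minimal user-mode sketch follows (in C; the resource name, AST
	routine and AST parameter are invented for the example - this is
	not the RMS code, just the same general arrangement):

	    /* Take a lock in EX mode with a blocking AST, then release it.
	       The blocking AST would normally be delivered when another
	       request for the resource is blocked by this lock - but it
	       cannot be delivered while the process is executing in
	       KERNEL mode, which is the situation described above. */
	    #include <starlet.h>        /* sys$enqw, sys$deq */
	    #include <lckdef.h>         /* LCK$K_EXMODE */
	    #include <ssdef.h>
	    #include <descrip.h>
	    #include <stdio.h>

	    struct lksb { unsigned short status, reserved; unsigned int lkid; };

	    static void blkast(int astprm)
	    {
	        printf("blocking AST delivered, parameter %d\n", astprm);
	    }

	    int main(void)
	    {
	        struct lksb lksb;
	        int status;
	        $DESCRIPTOR(resnam, "ILLUSTRATION_ONLY");   /* invented name */

	        /* efn, lkmode, lksb, flags, resnam, parid,
	           astadr, astprm, blkast, acmode, nullarg */
	        status = sys$enqw(0, LCK$K_EXMODE, &lksb, 0, &resnam,
	                          0, 0, 1, blkast, 0, 0);
	        if (!(status & 1))
	            return status;
	        if (!(lksb.status & 1))
	            return lksb.status;

	        /* ... work under the lock; a blocked request for the same
	           resource would cause blkast() to be queued ... */

	        return sys$deq(lksb.lkid, 0, 0, 0);
	    }

	The granted lock in the SDA output below carries a non-zero
	BLKAST address (802F512A), and, as noted above, that blocking
	AST cannot be delivered while the process remains in KERNEL mode.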

    Environment.

	This problem has been observed on different platforms running
	VMS V5.5 (may occur on other versions of VMS). It is felt that 
	large, heavily utilized systems are most likely to experience 
	the problem.

	Currently we have seen the problem on systems running HITMAN
	software but any process killer program could highlight the
	problem. (Note. HITMAN is not at fault in this scenario).

    Workround.

	Increase the wait time between issuing the $FORCEX and $DELPRC.

	For HITMAN, use the /FORCE_WAIT qualifier to specify a longer
	time interval, e.g. HITMAN/FORCE_WAIT=10

    Solution.

	RMS Engineering are aware of this problem and are currently
	investigating possible solutions etc.

    SDA information.

1) Currently owned lock

Lock data:

Lock id:  630039D4   PID:     007F009C   Flags:   VALBLK  CONVERT SYNCSTS
Par. id:  01000000   SUBLCKs:        0            SYSTEM  NODLCKW NODLCKB
LKB:      83619F00   BLKAST:  802F512A
PRIORTY:      0000

Granted at      EX   00000000-FFFFFFFF

Resource:      001B0527 24534D52    RMS$'...  Status:  DBLKAST ASYNC   BLASTQD
 Length   26   44524553 55020000    ...USERD
 Exec. mode    00202039 314B5349    ISK19  .  ! RMS lock on OA$SHARE:OA$DAF_E
 System        00000000 00000000    ........  

Local copy


2) New lock request that is waiting

Lock data:

Lock id:  120019DA   PID:     007F009C   Flags:   VALBLK  SYNCSTS SYSTEM
Par. id:  01000000   SUBLCKs:        0            NOQUOTA
LKB:      834CDC80   BLKAST:  00000000
PRIORTY:      0000   RQSEQNM:     2FE5

Waiting for     EX   00000000-FFFFFFFF

Resource:      001B0527 24534D52    RMS$'...  Status:  ASYNC   NOQUOTA
 Length   26   44524553 55020000    ...USERD
 Exec. mode    00202039 314B5349    ISK19  .  ! Same lock as above
 System        00000000 00000000    ........

Local copy




SDA> show process 	! NOTE DELPEN flag set 

Process index: 009C   Name: FERNANDEZM   Extended PID: 20A0FE9C
---------------------------------------------------------------
Process status:  02040023   RES,DELPEN,RESPEN,PHDRES

PCB address              8143CC70    JIB address              82407F00
PHD address              A6784E00    Swapfile disk address    00000000
Master internal PID      007F009C    Subprocess count                0
Internal PID             007F009C    Creator internal PID     00000000
Extended PID             20A0FE9C    Creator extended PID     00000000
State                       LEF      Termination mailbox          0000
Current priority                9    AST's enabled                KESU
Base priority                   4    AST's active                 E
UIC                [00046,002534]    AST's remaining               297
Mutex count                     0    Buffered I/O count/limit       90/90
Waiting EF cluster              0    Direct I/O count/limit        300/300
Starting wait time       1B001B1A    BUFIO byte count/limit      43368/43368
Event flag wait mask     7FFFFFFF    # open files allowed left     183
Local EF cluster 0       60000003    Timer entries allowed left     50
Local EF cluster 1       D0000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        1454
Global cluster 3 pointer 00000000    Global WS page count          904

Current operating stack (KERNEL):

		7FFE7634  80000000	EXE$QIOW
		7FFE7638  00000000
		7FFE763C  00000000
		7FFE7640  7FFE7694	CTL$GL_KSTKBAS+00494
		7FFE7644  7FFE7658	CTL$GL_KSTKBAS+00458
		7FFE7648  8039C420	EXCEPTION+00420
		7FFE764C  7FFEE44C	SYS$SYNCH+0000C
		7FFE7650  00000004

	 SP =>  7FFE7654  00E9A8EC
		7FFE7658  00000000
		7FFE765C  2FFC0000
		7FFE7660  00000002
		7FFE7664  7FFE76E0	CTL$GL_KSTKBAS+004E0
		7FFE7668  802EF7C1	RMS+07BC1
		7FFE766C  7FFE76C0	CTL$GL_KSTKBAS+004C0
		7FFE7670  7FFE2682	IAC$AW_VECSET+0000A
		7FFE7674  00E9A8B0
		7FFE7678  0045BE3C
		7FFE767C  00ED4A00
		7FFE7680  00000000
		7FFE7684  00000002
		7FFE7688  008F1608
		7FFE768C  00E99608
		7FFE7690  7FFDFE70	PIO$GW_IIOIMPA
		7FFE7694  0000000D
		7FFE7698  0000001F
		7FFE769C  00000005
		7FFE76A0  00E9A8EC
		7FFE76A4  00000039
		7FFE76A8  00E9A8B0
		7FFE76AC  01000000	SYS$K_VERSION_BASE_IMAGE
		7FFE76B0  00000000
		7FFE76B4  00000000
		7FFE76B8  00000000
		7FFE76BC  00000000
		7FFE76C0  00000000
		7FFE76C4  00000000
		7FFE76C8  00000000
		7FFE76CC  802EF53C	RMS+0793C
		7FFE76D0  0045BEC0
		7FFE76D4  802EF4F9	RMS+078F9
		7FFE76D8  7FFE7738	CTL$GL_KSTKBAS+00538
		7FFE76DC  802EF4D6	RMS+078D6
		7FFE76E0  00000000
		7FFE76E4  00000000
		7FFE76E8  7FFE7738	CTL$GL_KSTKBAS+00538
		7FFE76EC  7FFE76FC	CTL$GL_KSTKBAS+004FC
		7FFE76F0  8039C420	EXCEPTION+00420
		7FFE76F4  7FFEE26E	SYS$RMSRUNDWN+00006
		7FFE76F8  00000000
		7FFE76FC  00000000
		7FFE7700  2FFC0000
		7FFE7704  7FFE77E8	CTL$GL_KSTKBAS+005E8
		7FFE7708  7FFE77CC	CTL$GL_KSTKBAS+005CC
		7FFE770C  803F8B42	PROCESS_MANAGEMENT+05942
		7FFE7710  00000000
		7FFE7714  7FFE2682	IAC$AW_VECSET+0000A
		7FFE7718  8143CC70	PCB
		7FFE771C  00000000
		7FFE7720  7FFE5C00	CTL$A_DISPVEC+00400
		7FFE7724  00000000
		7FFE7728  7FFE9790
		7FFE772C  008F1608
		7FFE7730  008F1608
		7FFE7734  7FFDFE70	PIO$GW_IIOIMPA
		7FFE7738  00000002
		7FFE773C  7FFE7744	CTL$GL_KSTKBAS+00544
		7FFE7740  00000002
		7FFE7744  00000080
		7FFE7748  7FFE774C	CTL$GL_KSTKBAS+0054C
		7FFE774C  7FFE9790
		7FFE7750  008F1608
		7FFE7754  008F1608
		7FFE7758  7FFDFE70	PIO$GW_IIOIMPA
		7FFE775C  00000000
		7FFE7760  00000000
		7FFE7764  7FFE7794	CTL$GL_KSTKBAS+00594
		7FFE7768  7FFE7778	CTL$GL_KSTKBAS+00578
		7FFE776C  8039C420	EXCEPTION+00420
		7FFE7770  7FFEDE96	SYS$CMKRNL+00006
		7FFE7774  00000000
		7FFE7778  00000000
		7FFE777C  20140000
		7FFE7780  7FFE77B4	CTL$GL_KSTKBAS+005B4
		7FFE7784  803AFB6A	PAGE_MANAGEMENT+0016A
		7FFE7788  803B0D18	PAGE_MANAGEMENT+01318
		7FFE778C  803B0DC7	PAGE_MANAGEMENT+013C7
		7FFE7790  803B080A	PAGE_MANAGEMENT+00E0A
		7FFE7794  A69C22B0
		7FFE7798  7FFE5800	CTL$A_DISPVEC
		7FFE779C  803B0839	PAGE_MANAGEMENT+00E39
		7FFE77A0  00000004
		7FFE77A4  803E80DD	IMAGE_MANAGEMENT+02ADD
		7FFE77A8  00000004
		7FFE77AC  7FFE267A	IAC$AW_VECSET+00002
		7FFE77B0  8143CC70	PCB
		7FFE77B4  00000000
		7FFE77B8  00000004
		7FFE77BC  7FFE5800	CTL$A_DISPVEC
		7FFE77C0  803E7C2D	IMAGE_MANAGEMENT+0262D
		7FFE77C4  00000000
		7FFE77C8  803F8B01	PROCESS_MANAGEMENT+05901
		7FFE77CC  00000000
		7FFE77D0  00300000
		7FFE77D4  7FFE965C
		7FFE77D8  7FFE9620

		7FFE77DC  803E8D55	! CALLG ASTDEL for $DELPRC
		7FFE77E0  008F28B0	! R4
		7FFE77E4  00000400	! R5
		7FFE77E8  00000005	! AST AGR block
		7FFE77EC  82469430	! AST PARAM
		7FFE77F0  00B8A000	! R0
		7FFE77F4  00DE7BFF	! R1
		7FFE77F8  7FFEE3CA	!PC SYS$DEQ+00002
		7FFE77FC  01400000	!PSL


Process stacks
--------------
EXECUTIVE stack:

	 SP =>  7FFE9620  00000000	! Call frame for DEQ of lock
		7FFE9624  2FFC0000	!
		7FFE9628  7FFE96EC	! AP
		7FFE962C  7FFE96B0	! FP
		7FFE9630  802F50BC	! CALS SYS$DEQ
		7FFE9634  7FFE967C	! r2
		7FFE9638  008F1600	! r3
		7FFE963C  008F28B0	! r4
		7FFE9640  00000400	! r5
		7FFE9644  008F16C4	! r6
		7FFE9648  00000001	! r7
		7FFE964C  7FFE9790	! r8
		7FFE9650  008F1608	! r9
		7FFE9654  008F1608	! r10
		7FFE9658  7FFDFE70	! r11
		7FFE965C  00000004      ! arg block
		7FFE9660  630039D4      ! lock ID
		7FFE9664  008F28F4      ! lock value block
		7FFE9668  00000001      ! mode
		7FFE966C  00000000      ! flags
		7FFE9670  00DE7BFF
		7FFE9674  7FFE967C
		7FFE9678  008F1600
		7FFE967C  008F16C4
		7FFE9680  00000400	IRP$M_MBXIO
		7FFE9684  008F1608
		7FFE9688  802EE233	RMS$CLOSE+00133