
Conference clusta::acms

Title:	ACMS comments and questions
Notice:	This is not an official software support channel. Kits 5.*
Moderator:	CLUSTA::HALLAN

Created:	Mon Feb 17 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	4179
Total number of notes:	15091

4177.0. "Rogue DECnet object ?" by SIOG::HANLEY () Tue Jun 03 1997 07:38

Hi, 

VMS VAX V6.2
DECnet Phase IV
ACMS V4.1
DECforms V2.1b

I have a situation where I can see from DCL failure routines that the following 
DECnet object has remained on this Front End node for the past 6 days 
although ACMS is taken down nightly at this customer site. This rogue object 
does not seem to be causing any problems for ACMS, but it is patently incorrect.

The object ACM$00010221 has what looks like a valid process id 21400439...

Object = ACM$00010221

Number                   = 0
Process id               = 21400439                                  <<<<<<<<<


$  sh proc /cont/id=21400439                                         <<<<<<<<<
   %SYSTEM-W-NONEXPR, nonexistent process


$ ANALYZE/SYSTEM


SDA> set process/index=21400439                                      <<<<<<<<<
SDA>
SDA> show process

Process index: 0039   Name: ACMS01CP057000   Extended PID: 21408A39  <<<<<<<<<
               ^^^^         ^^^^^^^^^^^^^^                 ^^^^^^^^
-------------------------------------------------------------------
Status : 00040001 res,phdres
Status2: 00000001 quantum_resched
PCB address              B4DE6900    JIB address              B4901D40
PHD address              EE351000    Swapfile disk address    00000000
Master internal PID      00450039    Subprocess count                0
Internal PID             00450039    Creator internal PID     00000000
Extended PID             21408A39    Creator extended PID     00000000
State                       HIB      Termination mailbox          21A2
Current priority                6    AST's enabled                KESU
Base priority                   4    AST's active                 NONE
UIC                [00001,000004]    AST's remaining              2951
Mutex count                     0    Buffered I/O count/limit     2198/2220
Waiting EF cluster              1    Direct I/O count/limit       2220/2220
Starting wait time       1B001B18    BUFIO byte count/limit     ******/1921098
Event flag wait mask     0000000C    # open files allowed left     158
Local EF cluster 0       F0000080    Timer entries allowed left     98
Local EF cluster 1       64000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        9217
Global cluster 3 pointer 00000000    Global WS page count         1469
SDA>


NCP>

Object = ACM$00010254

Number                   = 0
Process id               = 21408A39                               <<<<<<<<<<<

I am unclear whether the process with PID 21408A39 in fact has two ACM$ DECnet 
objects associated with it.

Am I correct in thinking that SDA is just returning whatever currently occupies 
index 0039, and that this output is therefore erroneous for the purpose of 
troubleshooting the rogue DECnet object after the event?

Also, I would welcome a discussion as to whether it is mandatory to stop and 
restart ACMS and DECnet if ACMS fails to shut down cleanly (for example, when 
rogue DECnet objects remain).

I have seen these objects remain a few times in the past month, and on 
occasions I get SRVNOTFOUND errors reported in the Back End audit log 
the following day. I think these are due to the rogue DECnet objects.

Regards and Thanks,
P.J.Hanley.

T.R	Title	User	Personal Name	Date	Lines
4177.1		OHMARY::HALL	Bill Hall - ACMS Engineering - ZKO2-2	`Tue Jun 03 1997 09:11`	8
	I think the SDA command is being misinterpreted. There is a
	SET PROCESS/INDEX= and a SET PROCESS/ID=. In your example, you said
	'set process/index=21400439' when it should have been
	'set proc/id=21400439'.

	Do a SHOW SUMMARY next time and see if the process is still there.

	Bill
4177.2		SIOG::HANLEY		`Tue Jun 03 1997 09:49`	43
	Bill,

	Seems to work the same for /id or /index ...

	I did a SDA> show summary and the process pid 21400439 was not present.

	Then ....

	SDA>
	SDA> set proc/id=21400439
	SDA>
	SDA> sh process

	Process index: 0039   Name: ACMS01CP057000   Extended PID: 21408A39
	-------------------------------------------------------------------
	Status : 00040001 res,phdres
	Status2: 00000001 quantum_resched
	PCB address              B4DE6900    JIB address              B4901D40
	PHD address              EE351000    Swapfile disk address    00000000
	Master internal PID      00450039    Subprocess count                0
	Internal PID             00450039    Creator internal PID     00000000
	Extended PID             21408A39    Creator extended PID     00000000
	State                       HIB      Termination mailbox          21A2
	Current priority                5    AST's enabled                KESU
	UIC                [00001,000004]    AST's remaining              2943
	Mutex count                     0    Buffered I/O count/limit     2196/2220
	Waiting EF cluster              1    Direct I/O count/limit       2220/2220
	Starting wait time       1B001B18    BUFIO byte count/limit     ******/1920650
	Event flag wait mask     0000000C    # open files allowed left     151
	Local EF cluster 0       F0000080    Timer entries allowed left     98
	Local EF cluster 1       6C000000    Active page table count         0
	Global cluster 2 pointer 00000000    Process WS page count       15251
	Global cluster 3 pointer 00000000    Global WS page count         1475
	SDA>
4177.3		OHMARY::HALL	Bill Hall - ACMS Engineering - ZKO2-2	`Tue Jun 03 1997 12:43`	12
	Looks like the original process is gone but the PID remains tied to the
	DECnet object. Another CP is created that just happens to use the same
	PID slot. PIDs are unique while VMS is running but the slots are re-used.

	We've already been in contact with DECnet engineering and supplied them
	with our code that deals with DECnet objects. They have a case already
	open for another customer.

	Bill
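
The slot re-use described in .3 can be illustrated with the two PIDs quoted in this topic. The following is only a rough sketch, not anything taken from VMS or ACMS code: it assumes that the low 9 bits of an extended PID select the process slot. That width is inferred purely from the SDA output above (with this split, the bits just above the slot field of 21408A39 end in 45 hex, which matches the internal PID 00450039), and the real width may differ on a differently configured system. The names INDEX_BITS and split_epid are made up for the illustration.

```python
# Assumption: the low INDEX_BITS bits of an extended PID select the process
# slot. The width of 9 is inferred from the dumps in this topic only and may
# differ on another system.
INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_epid(epid):
    """Return (slot_index, upper_bits) for an extended PID (hypothetical helper)."""
    return epid & INDEX_MASK, epid >> INDEX_BITS


stale = 0x21400439    # PID still recorded against object ACM$00010221
current = 0x21408A39  # PID of the CP that SDA shows in the slot today

stale_slot, stale_upper = split_epid(stale)
current_slot, current_upper = split_epid(current)

for label, pid, slot, upper in (
    ("stale", stale, stale_slot, stale_upper),
    ("current", current, current_slot, current_upper),
):
    print(f"{label:8s} PID {pid:08X}: slot {slot:04X}, upper bits {upper:X}")

# Same slot, different upper bits: a different process now occupies the slot.
same_slot = stale_slot == current_slot
same_process = stale == current
print("same slot:", same_slot, "/ same process:", same_process)
```

Under that assumption both PIDs land in slot 0039 while differing in the upper bits, which is consistent with SDA showing the current occupant of the slot rather than the process that originally declared the object.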
4177.4	Continued.	KERNEL::PULLEY	Come! while living waters flow	`Fri Jun 06 1997 04:34`	22
	On the day this object was first spotted attached to a nonexistent
	process:
	  the machine had only been up one day;
	  we couldn't find any record of it in accounting;
	  there was a system service exception in the error log from a
	  development user that morning.

	I've asked them to try simply clearing the object to see if that hurts
	ACMS, and to find out what that user was doing and what they
	experienced. They haven't got back to me yet.

	They are using acms/enter/noreturn. Does that briefly run up a user
	process before handing them on to the CP, and then terminate the user
	process?

	There does seem to be some correlation between orphaned ACM$* network
	objects and a development user's system service exception, but there
	doesn't appear to be any match between the PID of that exception
	process and the PID attached to the network object.

	Does anyone just happen to know in what mode the exit handlers
	(presumably a DECnet one) that should clear up any objects are run?