
Conference clusta::acms

Title:ACMS comments and questions
Notice:This is not an official software support channel. Kits 5.*
Moderator:CLUSTA::HALLAN
Created:Mon Feb 17 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:4179
Total number of notes:15091

4177.0. "Rogue DECnet object ?" by SIOG::HANLEY () Tue Jun 03 1997 08:38

Hi, 

VMS VAX V6.2
DECnet Phase IV
ACMS V4.1
DECforms V2.1b

I have a situation where I can see from DCL failure routines that the following 
DECnet object has remained on this Front End node for the past 6 days 
although ACMS is taken down nightly at this customer site. This rogue object 
does not seem to be causing any problems for ACMS, but it is patently incorrect.

The object ACM$00010221 has what looks like a valid process id 21400439...

Object = ACM$00010221

Number                   = 0
Process id               = 21400439                                  <<<<<<<<<


$  sh proc /cont/id=21400439                                         <<<<<<<<<
   %SYSTEM-W-NONEXPR, nonexistant process


$ ANALYZE/SYSTEM


SDA> set process/index=21400439                                      <<<<<<<<<
SDA>
SDA> show process

Process index: 0039   Name: ACMS01CP057000   Extended PID: 21408A39  <<<<<<<<<
               ^^^^         ^^^^^^^^^^^^^^                 ^^^^^^^^
-------------------------------------------------------------------
Status : 00040001 res,phdres
Status2: 00000001 quantum_resched
PCB address              B4DE6900    JIB address              B4901D40
PHD address              EE351000    Swapfile disk address    00000000
Master internal PID      00450039    Subprocess count                0
Internal PID             00450039    Creator internal PID     00000000
Extended PID             21408A39    Creator extended PID     00000000
State                       HIB      Termination mailbox          21A2
Current priority                6    AST's enabled                KESU
Base priority                   4    AST's active                 NONE
UIC                [00001,000004]    AST's remaining              2951
Mutex count                     0    Buffered I/O count/limit     2198/2220
Waiting EF cluster              1    Direct I/O count/limit       2220/2220
Starting wait time       1B001B18    BUFIO byte count/limit     ******/1921098
Event flag wait mask     0000000C    # open files allowed left     158
Local EF cluster 0       F0000080    Timer entries allowed left     98
Local EF cluster 1       64000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        9217
Global cluster 3 pointer 00000000    Global WS page count         1469
SDA>


NCP>

Object = ACM$00010254

Number                   = 0
Process id               = 21408A39                               <<<<<<<<<<<

I am unclear whether the process with PID 21408A39 in fact has two ACM$ DECnet 
objects associated with it.

Am I correct in thinking that SDA is just returning whatever process currently 
occupies index 0039, and that its output is therefore erroneous for 
troubleshooting the rogue DECnet object after the event?

Also, I would like to know whether it is mandatory to stop and restart 
ACMS and DECnet if ACMS fails to shut down cleanly (for example, when rogue 
DECnet objects remain).

I have seen these objects remain a few times in the past month, and on 
occasions I get SRVNOTFOUND errors reported in the Back End audit log 
the following day. I think these are due to the rogue DECnet objects.
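
The manual check above (compare each object's process id against the processes that actually exist) can be sketched as a small script. This is only an illustration: it assumes the NCP object listing and the list of live PIDs have already been captured as text by some other means, and the function names parse_objects and orphaned are hypothetical, not part of ACMS, DECnet or VMS.

```python
import re

def parse_objects(text):
    """Map object name -> process id from an NCP-style listing.

    Assumes the "Object = ..." / "Process id = ..." layout shown in this note.
    """
    objects = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*Object\s*=\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"\s*Process id\s*=\s*([0-9A-Fa-f]{8})", line)
        if m and current:
            objects[current] = m.group(1).upper()
    return objects

def orphaned(objects, live_pids):
    """Return ACM$ objects whose process id is not among the live PIDs."""
    live = {p.upper() for p in live_pids}
    return sorted(name for name, pid in objects.items()
                  if name.startswith("ACM$") and pid not in live)

# Listing copied from this note; the live PID set is what SHOW SUMMARY implied.
listing = """
Object = ACM$00010221
Number                   = 0
Process id               = 21400439

Object = ACM$00010254
Number                   = 0
Process id               = 21408A39
"""
orphans = orphaned(parse_objects(listing), {"21408A39"})
print(orphans)
```

With the two objects quoted in this note, only ACM$00010221 is reported, since 21400439 is not a live process.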

Regards and Thanks,
P.J.Hanley.


4177.1. by OHMARY::HALL (Bill Hall - ACMS Engineering - ZKO2-2) Tue Jun 03 1997 10:11 (8 lines)
    
    	I think the SDA command is being misinterpreted.  There is a
    	SET PROCESS/INDEX= and a SET PROCESS/ID=. In your example,
    	you used 'set process/index=21400439' when it should
    	have been 'set proc/id=21400439'.  Do a SHOW SUMMARY
    	next time and see if the process is still there.
    
    	Bill

4177.2. by SIOG::HANLEY () Tue Jun 03 1997 10:49 (43 lines)
    
    
    Bill,
    
    Seems to work the same for /id or /index ...
    
    I did a SDA> show summary 
    and the process pid 21400439 was not present.
    
    Then ....
    
    SDA>
    SDA> set proc/id=21400439
    SDA>
    SDA> sh process
    
    Process index: 0039   Name: ACMS01CP057000   Extended PID: 21408A39
    -------------------------------------------------------------------
    Status : 00040001 res,phdres
    Status2: 00000001 quantum_resched
    PCB address              B4DE6900    JIB address              B4901D40
    PHD address              EE351000    Swapfile disk address    00000000
    Master internal PID      00450039    Subprocess count                0
    Internal PID             00450039    Creator internal PID     00000000
    Extended PID             21408A39    Creator extended PID     00000000
    State                       HIB      Termination mailbox          21A2
    Current priority                5    AST's enabled                KESU
    UIC                [00001,000004]    AST's remaining              2943
    Mutex count                     0    Buffered I/O count/limit     2196/2220
    Waiting EF cluster              1    Direct I/O count/limit       2220/2220
    Starting wait time       1B001B18    BUFIO byte count/limit     ******/1920650
    Event flag wait mask     0000000C    # open files allowed left     151
    Local EF cluster 0       F0000080    Timer entries allowed left     98
    Local EF cluster 1       6C000000    Active page table count         0
    Global cluster 2 pointer 00000000    Process WS page count       15251
    Global cluster 3 pointer 00000000    Global WS page count         1475
    SDA>
    
    
    
4177.3. by OHMARY::HALL (Bill Hall - ACMS Engineering - ZKO2-2) Tue Jun 03 1997 13:43 (12 lines)
    
    	Looks like the original process is gone but its PID remains tied
    	to the DECnet object.  Another CP was then created that just happens
    	to use the same PID slot. PIDs are unique while VMS is running,
    	but the slots are reused.
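
The slot reuse can be made visible by splitting the two PIDs from this thread into fields. The sketch below is an assumption, not a documented interface: it takes the low 21 bits of the extended PID as process index plus sequence number, and uses a 9-bit index width because that is the width that reproduces the internal PID 00450039 (sequence 0045, index 0039) shown by SDA for 21408A39. The width on another system may differ, and split_epid is a hypothetical helper name.

```python
def split_epid(epid, index_bits=9):
    """Split an extended PID into fields (layout assumed, see lead-in)."""
    local = epid & 0x1FFFFF                # assumed: index + sequence in low 21 bits
    return {
        "node_sequence": (epid >> 29) & 0x3,
        "node_index": (epid >> 21) & 0xFF,
        "sequence": local >> index_bits,   # changes each time the slot is reused
        "index": local & ((1 << index_bits) - 1),
    }

stale = split_epid(0x21400439)   # PID still recorded on ACM$00010221
live = split_epid(0x21408A39)    # PID of the current ACMS01CP057000

for name, fields in (("stale", stale), ("live", live)):
    print(name, {k: hex(v) for k, v in fields.items()})
```

Under these assumptions both PIDs decode to index 0x39, but the sequence numbers differ (0x2 for the stale PID, 0x45 for the live one), which fits the picture of a new CP occupying the slot of a process that no longer exists.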
    
    	We've already been in contact with DECnet engineering and supplied
    	them with our code that deals with DECnet objects.  They have a
    	case already open for another customer.
    
    	Bill
    
4177.4. "Continued." by KERNEL::PULLEY (Come! while living waters flow) Fri Jun 06 1997 05:34 (22 lines)
    On the day this object was first spotted attached to a
    nonexistent process, the machine had only been up one day, we
    couldn't find any record of the process in accounting, and there was a
    system service exception in the error log from a development user that
    morning.
    I've asked them to try simply clearing the object to see if that hurts
    ACMS, and to find out what that user was doing and what they
    experienced.
    They haven't got back to me yet.
    
    They are using acms/enter/noreturn.
    Does that briefly run up a user process before handing them on to the
    CP, then terminate the user process?
    
    There does seem to be some correlation between orphaned ACM$* network
    objects and a development user's system service exception, but there
    doesn't appear to be any match between the PID of that exception process
    and the PID attached to the network object.
    
    Does anyone just happen to know in what mode the exit handlers are run
    (presumably a DECnet one) that should clear up any objects?