
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

1149.0. "EVENTS problem" by SNOC02::MISNETWORK (Take a byte) Mon Jun 17 1991 00:46

    Haven't been able to find a similar problem so here goes. 
    
    I set up all the appropriate event logging and the MCC_DNA4_EVL task
    object so that my DECnet alarms would work. They worked fine, but all
    of a sudden things seem to have died. Lots of things look strange, and
    I am now at a loss.
    
    I tried the old reboot trick, but I had no success with a GETEVENT
    command. Below is a section of my MCC_DNA4_EVL log -
    
    $ manage/enter/presen=mcc_dna4_evl
    Network object MCC_DNA4_EVL is declared, Status = 52854793
    Waiting for the event message from EVL....
    
    but nothing happens, even though I can see the events reaching my
    system with REPL/ENA=NET. I disabled/enabled my local sink monitor
    twice before it started to work again. This is causing some pain, as I
    try to keep a log of all DECnet outages with the following command
    file, but it is not reliable because something gets locked up and
    events get lost/misplaced/unrecorded -
    
    $show time
    $today=f$cvtime("today","absolute","date")
    $hour_till_midnight=23-f$cvtime("''f$time()'","absolute","hour")
    $minutes_till_midnight=59-f$cvtime("''f$time()'","absolute","minute")
    $duration="''hour_till_midnight'"+":"+"''minutes_till_midnight'"+":00.00"
    $todays_event_file="disk$userdisk:[tassone.mcc]''today'.events"
    $todays_event_com="disk$userdisk:[tassone.mcc]''today'.com"
    $!
    $open/write command_file 'todays_event_com'
    $write command_file "$mana/enter"
    $write command_file "getevent node4 * circ * Any Events, -"
    $write command_file "for dur ''duration', to file=''todays_event_file'"
    $write command_file "exit"
    $close command_file
    $!
    $show time
    $@'todays_event_com'
    $show time
    $!
    $submit/after=tomorrow/keep/noprint/queue=mcc$batch/-
    log=disk$userdisk:[tassone]decnet_events.log -
    disk$userdisk:[tassone.mcc]decnet_events.com
    $delete 'todays_event_com';*
    $mail/sub="DECnet events for ''today'" 'todays_event_file' -
    snoc01::misnetwork
    $purg/keep=3 disk$userdisk:[tassone]decnet_events.log 
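
As a cross-check of the duration arithmetic in the command file above, here is a minimal C sketch of the same calculation. The function name `duration_till_midnight` is hypothetical and is not part of DCL or DECmcc. It assumes, as the DCL appears to, that the interval should end at 23:59 and that the fields are not zero-padded.

```c
#include <assert.h>
#include <stdio.h>

/* Format the interval from hour:minute up to 23:59 as "h:m:00.00",
   mirroring the DCL arithmetic (23 - hour, 59 - minute).
   Returns the number of characters written, as snprintf does. */
int duration_till_midnight(int hour, int minute, char *buf, size_t len)
{
    assert(hour >= 0 && hour <= 23);
    assert(minute >= 0 && minute <= 59);
    return snprintf(buf, len, "%d:%d:00.00", 23 - hour, 59 - minute);
}
```

For a run that starts at 13:55 this gives 10:4:00.00, so the GETEVENT window would close at 23:59. If that reading is right, events arriving in the last minute of the day, and until the resubmitted job starts, would not be recorded by this procedure.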
    
    Previous MCC_DNA4_EVL logs showed the following illness -
    
    Waiting for the event message from EVL.....
    The connection with EVL is established.
    ** Unable to connect to NMCC  **
    Ready to read the next event message...
    Failed to send event = 409 to MCC event manager, INSEVTPOOLMEM
    Ready to read the next event message...
    Failed to send event = 407 to MCC event manager, INSEVTPOOLMEM
    Ready to read the next event message...
    Failed to send event = 410 to MCC event manager, INSEVTPOOLMEM
    Ready to read the next event message...
    Failed to send event = 410 to MCC event manager, INSEVTPOOLMEM
    Ready to read the next event message...
    Failed to send event = 407 to MCC event manager, INSEVTPOOLMEM
    Ready to read the next event message...
    Failed to receive an event from EVL, status = 8420
    %SYSTEM-F-LINKABORT, network partner aborted logical link
      TASSONE      job terminated at 17-JUN-1991 13:55:53.01
    
    Help !
    Louis

T.R	Title	User	Personal Name	Date	Lines
1149.1	More info - more confusion	SNOC02::MISNETWORK	Take a byte	`Mon Jun 17 1991 23:01`	86
    More info.

    I know that last night my alarms worked when an event happened on one of
    my circuits, but again, today it is very much broken. The MCC_DNA4_EVL
    log showed the following -

    $ set proc/priv=(all,nobypass)
    $ manage/enter/presen=mcc_dna4_evl
    Network object MCC_DNA4_EVL is declared, Status = 52854793
    Waiting for the event message from EVL.....
    The connection with EVL is established.
    ** Unable to connect to NMCC **
    Ready to read the next event message...
    Ready to read the next event message...
    Ready to read the next event message...
    .
    .
    .
    Ready to read the next event message...
    Ready to read the next event message...
    Failed to receive an event from EVL, status = 8420
    %SYSTEM-F-LINKABORT, network partner aborted logical link
      TASSONE      job terminated at 18-JUN-1991 11:46:34.63

    I tried the DISABLE/ENABLE trick with the local sink monitor without any
    success. Again, the log is as follows -

    $ manage/enter/presen=mcc_dna4_evl
    Network object MCC_DNA4_EVL is declared, Status = 52854793
    Waiting for the event message from EVL.....

    I tried the DISABLE/ENABLE trick a second time with the following
    results -

    MCC> disab node4 sprnet local sink monitor

    Node4 59.1 Local Sink Monitor
    AT 18-JUN-1991 11:51:31

    Disable completed successfully.

    MCC> enabl node4 sprnet local sink monitor

    Node4 59.1 Local Sink Monitor
    AT 18-JUN-1991 11:51:34

    Internal error in DECnet Phase IV AM.
    VMS Error = %SYSTEM-F-DUPLNAM, duplicate name

    MCC> enabl node4 sprnet local sink monitor

    Node4 59.1 Local Sink Monitor
    AT 18-JUN-1991 11:59:59

    Enable completed successfully.

    Tried zeroing my counters with the following results -

    MCC> getevent node4 * any event

    %%%%%%%%%%% OPCOM 18-JUN-1991 12:01:01.00 %%%%%%%%%%%
    Message from user DECNET on SPRNET
    DECnet event 0.9, counters zeroed
    From node 59.1 (SPRNET), 18-JUN-1991 12:01:00.02
    Node 59.1 (SPRNET)

    %%%%%%%%%%% OPCOM 18-JUN-1991 12:01:01.79 %%%%%%%%%%%
    Message from user AUDIT$SERVER on SPRNET
    Security alarm (SECURITY) and security audit (SECURITY) on SPRNET,
    system id: 65534
    Auditable event: Network login failure
    Event time: 18-JUN-1991 12:01:01.77
    PID: 00000164
    Username: ILLEGAL
    Remote nodename: SPRNET
    Remote node id: 60417
    Remote username: TASSONE
    Status: %LOGIN-F-NOSUCHUSER, no such user

    NCP showed the following -

    MCC_DNA4_EVL 0 00000163 TASK 0 ILLEGAL

    HELP!!! What is happening here? My once beloved, uncomplaining, fully
    operational DECmcc is sick!

    Cheers,
    Louis
1149.2		TOOK::JEAN_LEE		`Tue Jun 18 1991 13:55`	110
    Hi Louis,

    Thanks for entering these reports. Let me answer them sequentially.

    1.
    > $ manage/enter/presen=mcc_dna4_evl
    > Network object MCC_DNA4_EVL is declared, Status = 52854793
    > Waiting for the event message from EVL....
    > but nothing happens, I see the events reaching my system with
    > REPL/ENA=NET. I disabled/enabled my local sink monitor twice before it
    > started to work again. This is causing some pain as I try to keep a log
    > of all DECnet outages with the following command file, but it is not
    > reliable .....

    We have also experienced this. Toggling the state of the sink usually
    clears the problem. We will investigate further whether this is an
    expected behaviour of EVL or not.

    2.
    > Waiting for the event message from EVL.....
    > The connection with EVL is established.
    > ** Unable to connect to NMCC **
    > Ready to read the next event message...
    > Failed to send event = 409 to MCC event manager, INSEVTPOOLMEM
    > Ready to read the next event message...
    > Failed to send event = 407 to MCC event manager, INSEVTPOOLMEM
    > Ready to read the next event message...
    > Failed to send event = 410 to MCC event manager, INSEVTPOOLMEM
    > Ready to read the next event message...
    > Failed to send event = 410 to MCC event manager, INSEVTPOOLMEM
    > Ready to read the next event message...
    > Failed to send event = 407 to MCC event manager, INSEVTPOOLMEM
    > Ready to read the next event message...
    > Failed to receive an event from EVL, status = 8420
    > %SYSTEM-F-LINKABORT, network partner aborted logical link

    This means that the MCC event manager is running out of its virtual
    memory. This problem needs further investigation. I will report the
    findings in a future note.

    3.
    ================================================================================
    Note 1149.1 EVENTS problem 1 of 1
    SNOC02::MISNETWORK "Take a byte" 86 lines 17-JUN-1991 23:01
    -< More info - more confusion >-
    --------------------------------------------------------------------------------
    > Ready to read the next event message...
    > Ready to read the next event message...
    > Failed to receive an event from EVL, status = 8420
    > %SYSTEM-F-LINKABORT, network partner aborted logical link

    A broken logical link between EVL and the event sink can have many
    causes (node reachability change, circuit state change, line problem,
    etc.), just like any connectivity between two nodes. When this happens,
    I would check the system EVL.LOG right away (not the mcc_dna4_evl.log)
    to find out the cause. Depending on the cause, restarting the sink or
    EVL immediately may not always be the right answer.

    MCC does not control the connectivity between EVL and the MCC sink,
    except by using ENABLE or DISABLE to start or abort the sink process.
    If the latter is the case, the log will tell you so.

    4.
    > I tried the DISABLE/ENABLE trick with the local sink monitor without any
    > success, again the log as follows -
    > $ manage/enter/presen=mcc_dna4_evl
    > Network object MCC_DNA4_EVL is declared, Status = 52854793
    > Waiting for the event message from EVL.....
    > I tried the DISABLE/ENABLE trick a second time with the following results -
    > MCC> enable node4 sprnet local sink monitor
    > Node4 59.1 Local Sink Monitor
    > AT 18-JUN-1991 11:51:34
    > Internal error in DECnet Phase IV AM.
    > VMS Error = %SYSTEM-F-DUPLNAM, duplicate name

    This means the sink monitor process is not completely gone yet.
    Sometimes it takes a while for VMS to kill a process. I would make sure
    the process mcc_dna4_evl is actually gone before I enable it.

    5.
    > Tried zeroing my counters with the following results -
    > MCC> getevent node4 * any event
    > %%%%%%%%%%% OPCOM 18-JUN-1991 12:01:01.00 %%%%%%%%%%%
    > Message from user DECNET on SPRNET
    > DECnet event 0.9, counters zeroed
    > From node 59.1 (SPRNET), 18-JUN-1991 12:01:00.02
    > Node 59.1 (SPRNET)

    In the above OPCOM message, the event occurred on sprnet and is from
    node sprnet. In MCC's model, this event is considered an event of
    node4 sprnet remote node sprnet. Thus, you need to use this command to
    get the event:

    MCC> getevent node4 sprnet remote node sprnet any event
1149.3	Thanks for the info	SNOC02::MISNETWORK	Take a byte	`Tue Jun 18 1991 19:54`	22
    Thanks for the thorough reply. Good to see there are answers to some of
    my problems, if not total solutions.

    I checked my EVL.LOG and found only 2 versions. One was fine, but the
    latest version showed the following -

    $ RUN SYS$SYSTEM:EVL
    %EVL-E-OPENMON, error creating logical link to monitor process SPRNET::"TASK=mcc_dna4_evl"
    -SYSTEM-F-INVLOGIN, login information invalid at remote node
    %EVL-E-WRITEMON, error writing event record to monitor process mcc_dna4_evl
    -SYSTEM-F-FILNOTACC, file not accessed on channel

    Must have been when I was turning the lights on and off. The log times
    do not correspond to the MCC_DNA4_EVL log, so I will have to remember
    next time to check the EVL.LOG when I get the network abort message.

    Looking forward to your findings,

    Cheers,
    Louis
1149.4	Need more info for INSEVTPOOLMEM	TOOK::T_HUPPER	The rest, as they say, is history.	`Tue Jun 25 1991 13:06`	33
    The inquiry into the INSEVTPOOLMEM error needs further input from you.
    Are you receiving MCC_S_EVENT_LOST in your com file log when the sink
    is reporting INSEVTPOOLMEM? This should be the case. If not, then
    something is either not being reported, or the event pool is so full
    that lost events cannot be delivered.

    A good way to create a big problem in the current type of event pool is
    to "stop" (exit handlers don't run) a DECmcc process that is receiving
    events while other DECmcc processes are still running. The event pool
    will still contain the abandoned mcc_event_get request structures.
    These abandoned requests will still receive all matching events, but
    nothing will read the events out of the event pool and free their
    memory. If this is the case, the only way to free the memory is to exit
    from all DECmcc processes on the system and restart them. There must be
    a point in time when there are NO DECmcc processes running. Then the
    next DECmcc process to perform an event operation will cause the event
    pool to be recreated in its empty state. Are you stopping any DECmcc
    processes on the system (any users) while leaving others running?

    I would assume that a reboot of the system would also clean out the
    event pool nicely.

    How long after the reboot did the sink report that the pool had
    INSEVTPOOLMEM? How many events are correctly received before lost
    events or no events are received? I would assume that events are
    correctly received for a while, then lost events are received, then no
    events are received. This would be the case if the events were simply
    arriving too fast to be processed by the DECmcc system. The event pool
    happens to be the most limited queue in the events subsystem, so that
    is where the problem is reported. What is the arrival rate of events in
    the event sink that are to be processed by DECmcc? Also, what type of
    machine are you using, so we can get an estimate of reasonable event
    throughput?

    Ted Hupper
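
The failure mode described in this reply can be illustrated with a toy model in C. This is only a sketch under stated assumptions, not DECmcc code: `pool_request`, `pool_put`, `pool_get`, `pool_cancel`, the slot count and the status names are all hypothetical, and the real event pool is a shared global section rather than a local struct. The model only shows why one request that is never read and never cancelled eventually makes every put fail.

```c
#include <assert.h>
#include <stdio.h>

#define POOL_SLOTS   8   /* hypothetical pool capacity, in queued events */
#define MAX_REQUESTS 4   /* hypothetical limit on outstanding requests   */

enum put_status { PUT_OK, PUT_NO_REQUEST, PUT_POOL_FULL };

struct request {
    int active;   /* request is registered in the pool               */
    int queued;   /* events delivered to this request, not yet read  */
};

struct event_pool {
    struct request req[MAX_REQUESTS];
    int used;     /* pool slots currently holding unread events */
};

/* Getter registers interest; returns a request index, or -1 if full. */
int pool_request(struct event_pool *p)
{
    int i;
    for (i = 0; i < MAX_REQUESTS; i++) {
        if (!p->req[i].active) {
            p->req[i].active = 1;
            p->req[i].queued = 0;
            return i;
        }
    }
    return -1;
}

/* Putter (the sink) queues one copy of the event per registered request. */
enum put_status pool_put(struct event_pool *p)
{
    int i, wanted = 0;
    for (i = 0; i < MAX_REQUESTS; i++)
        wanted += p->req[i].active;
    if (wanted == 0)
        return PUT_NO_REQUEST;
    if (p->used + wanted > POOL_SLOTS)
        return PUT_POOL_FULL;        /* the toy analogue of INSEVTPOOLMEM */
    for (i = 0; i < MAX_REQUESTS; i++)
        if (p->req[i].active)
            p->req[i].queued++;
    p->used += wanted;
    return PUT_OK;
}

/* Getter reads one queued event and frees its slot; returns 1 if read. */
int pool_get(struct event_pool *p, int id)
{
    assert(id >= 0 && id < MAX_REQUESTS && p->req[id].active);
    if (p->req[id].queued == 0)
        return 0;
    p->req[id].queued--;
    p->used--;
    return 1;
}

/* Clean cancel (what an exit handler would do): release everything held. */
void pool_cancel(struct event_pool *p, int id)
{
    assert(id >= 0 && id < MAX_REQUESTS && p->req[id].active);
    p->used -= p->req[id].queued;
    p->req[id].queued = 0;
    p->req[id].active = 0;
}
```

In this model, with eight slots, one live getter that reads every event and one abandoned request that never reads, seven puts succeed and the eighth returns `PUT_POOL_FULL`. Calling `pool_cancel` on the abandoned request frees the slots again, which is the step a stopped process never gets to perform.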
1149.5	INSEVTPOOLMEM error gone	SNOC01::MISNETWORK	They call me LAT	`Sun Jun 30 1991 23:00`	8
	The INSEVTPOOLMEM error seems to have gone away, so I will not pursue it at this stage. Things have been working pretty well, but I haven't had a chance to check all the logs, so I will start doing that again. Thanks for the advice, Cheers, Louis
1149.6	still a prob	JETSAM::WOODCOCK		`Mon Jul 01 1991 09:29`	62
    If it's ok, I'd like to pick up and follow through on this problem. I
    see this INSEVTPOOLMEM almost daily, with MCC_DNA4_EVL going south after
    a dozen or two. This is hampering my confidence in using EVENTS.

    > The inquiry into the INSEVTPOOLMEM error needs further input from you.
    > Are you receiving MCC_S_EVENT_LOST in your com file log when the sink
    > is reporting INSEVTPOOLMEM? This should be the case. If not, then
    > something is either not being reported, or the event pool is so full
    > that lost events cannot be delivered.

    I'm sure I'm not starting the process exactly like the base note, but I
    don't believe I've ever seen a MCC_S_EVENT_LOST error.

    > A good way to create a big problem in the current type of event pool is
    > to "stop" (exit handlers don't run) a DECmcc process that is receiving
    > events while other DECmcc processes are still running. The event pool
    > will still contain the abandoned mcc_event_get request structures.
    > These abandoned requests will still receive all matching events, but
    > will not read the event out of the event pool and free its memory. If
    > this is the case, the only way to free the memory is to exit from all
    > DECmcc processes on the system and restart them. There must be a point
    > in time when there are NO DECmcc processes running. Then the next
    > DECmcc process to perform an event operation will cause the event pool
    > to be recreated in its empty state. Are you stopping any DECmcc
    > processes on the system (any users) while leaving others running?

    Usually the only reason we stop processes is because they don't work.
    Once the INSEVTPOOLMEM kills MCC_DNA4_EVL we of course have to restart
    it. This is typically in the morning when we check for proper
    processes. As for other MCC processes running, there probably are some.
    It is unrealistic to stop ALL MCC processes when we restart
    MCC_DNA4_EVL and the associated alarms. There will ALWAYS be other
    alarm processes, recording, and exporting taking place. We can't be
    restarting all MCC processes in the future when this occurs.

    > I would assume that a reboot of the system would also clean out the
    > event pool nicely.

    It seems to, yes.

    > How long after the reboot did the sink report that the pool had
    > INSEVTPOOLMEM? How many events are correctly received before lost
    > events or no events are received? I would assume that events are
    > correctly received for a while, then lost events are received, then no
    > events are received. This would be the case if the events were simply
    > arriving too fast to be processed by the DECmcc system. The event pool
    > happens to be the most limited queue in the events subsystem, so that
    > is where the problem is reported. What is the arrival rate of events
    > in the event sink that are to be processed by DECmcc? Also, what type
    > of machine are you using, so we can get an estimate of reasonable
    > event throughput?

    I'm not sure how long after a reboot. Restarting MCC_DNA4_EVL seems to
    work for several hours though. I haven't seen anything in the present
    logs which indicates lost events. Events coming in can be anything from
    1 an hour to 5-10 per second. It depends on what is happening on the
    net. I'm now running on an 8810 w/384M (it feels good to breathe again,
    the 3520 now only does the display work). To say the least, I should
    have enough fire power, and I'm gonna let all sorts of MCC stuff RIP
    and bring us to the levels we should have been at months ago.

    best regards,
    brad...
1149.7	Please be careful how you kill background MCC processes	TOOK::GUERTIN	I do this for a living -- really	`Mon Jul 01 1991 11:52`	40
    If you are seeing INSEVTPOOLMEM when you look at the DNA4 EVL log file,
    then I can understand it. It should (I assume) have some text around
    it, like "The DNA4 Event Monitor just got an INSEVTPOOLMEM from the MCC
    Event Manager!". On the other hand, if you are seeing this signalled
    as a VMS message, then something doesn't make sense. That CVR should
    always be trapped by the caller of the mcc_event_put() MCC kernel
    routine.

    In order to clean up a request for an event, the Requestor of an MCC
    event must cancel the request. However, the Requestor cannot always
    cancel. For example, if the user hits Control-Y, the Requestor may not
    get control. We therefore have an Exit Handler in the Event Manager
    to capture any remaining outstanding requests. On image exit, the
    Event Manager cleans up whatever the Event Requestors could not. But
    if someone does a $ STOP on an MCC process which is requesting events,
    even the Exit Handlers do not get called. There is little we can do at
    this point (being a user-mode event system). The Event Sinks generally
    only PUT events, so stopping them (with a $ STOP) rarely (if ever)
    would cause outstanding Requests to be left in the Event Pool.
    Ideally, Event Sinks should be stopped by issuing some sort of
    MCC> DISABLE <whatever> SINK command, which will cause a clean rundown
    of the Event Sink. Check the Documentation for the exact command
    syntax for the Sink you want to stop.

    There are some MCC processes which run in the background (no user
    interface), but also do GETEVENTs. These need to be aborted WITHOUT
    stopping them (i.e., DO NOT use the DCL $ STOP command). An example
    might be MCC Alarms running in batch. If you DO stop a background MCC
    Alarms process that way, it would almost always cause garbage (mostly
    invalid request information) to be left in the Event Pool. The Putters
    (e.g., DNA4 Event Sinks) would see these as valid requests for events,
    and post events to the Event Manager. After a while, the Events will
    flood the Event Pool, and you have to take fairly drastic measures
    (killing all processes using MCC) to clean things up.

    Do you have to kill background MCC Alarms processes? If so, how do
    you kill them?

    -Matt.
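
The difference between a clean image exit and a hard stop can be demonstrated with a rough POSIX analogy in C. This is an assumption-laden sketch, not VMS or DECmcc code: `atexit()` stands in for the user-mode Exit Handler, `_exit()` stands in for a $ STOP that bypasses it, and `child_cleaned_up` and `cancel_requests` are hypothetical names. The child reports through a pipe whether its handler ran.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int notify_fd = -1;   /* write end of the pipe, set in the child */

/* Stand-in for the Event Manager exit handler: "cancel" the requests
   by telling the parent that cleanup happened. */
static void cancel_requests(void)
{
    if (write(notify_fd, "C", 1) != 1) {
        /* nothing useful can be done at exit time */
    }
}

/* Run a child that registers the handler and then terminates.
   hard_stop == 0: normal exit(), handlers run.
   hard_stop != 0: _exit(), handlers are skipped (the $ STOP analogue).
   Returns 1 if the handler ran, 0 if not, -1 on setup failure. */
int child_cleaned_up(int hard_stop)
{
    int fds[2];
    char c;
    pid_t pid;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                 /* avoid duplicated stdio output */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        notify_fd = fds[1];
        atexit(cancel_requests);
        if (hard_stop)
            _exit(0);             /* terminate without running handlers */
        exit(0);                  /* clean exit: handlers run */
    }
    close(fds[1]);
    n = read(fds[0], &c, 1);      /* 1 byte if cleaned up, EOF otherwise */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

Calling `child_cleaned_up(0)` should report that the handler ran, while `child_cleaned_up(1)` should report that it did not. In the situation described above, the second case is the one that leaves stale requests behind in the shared pool.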
1149.8	more info/questions	JETSAM::WOODCOCK		`Mon Jul 01 1991 14:07`	54
    > If you are seeing INSEVTPOOLMEM when you look at the DNA4 EVL log file,
    > then I can understand it. It should (I assume) have some text around
    > it, like "The DNA4 Event Monitor just got an INSEVTPOOLMEM from the MCC
    > Event Manager!". On the other hand, if you are seeing this signalled
    > as a VMS message, then something doesn't make sense. That CVR should
    > always be trapped by the caller of the mcc_event_put() MCC kernel
    > routine.

    The INSEVTPOOLMEM error is indeed seen in the MCC_DNA4_EVL.LOG.

    > In order to clean up a request of an event, the Requestor of an MCC
    > event must cancel the request. However, the Requestor cannot always
    > cancel, for example, if the user hits Control-Y, the Requestor may not
    > get control. We therefore have an Exit Handler in the Event Manager
    > to capture any remaining outstanding requests. On image exit, the
    > Event Manager cleans up whatever the Event Requestors could not. But
    > if someone does a $ STOP on an MCC process which is requesting events,
    > even the Exit Handlers do not get called. There is little we can do at
    > this point (being a user-mode event system). The Event Sinks generally
    > only PUT events, so stopping them (with a $ STOP) rarely (if ever)
    > would cause outstanding Requests to be left in the Event Pool.
    > Ideally, Event Sinks should be stopped by issuing some sort of
    > MCC> DISABLE <whatever> SINK command, which will cause a clean rundown
    > of the Event Sink. Check the Documentation for the exact command
    > syntax for the Sink you want to stop.

    Actually, I think MCC_STARTUP_DNA4_EVL does this as a first step. In any
    event, the errors and subsequent failure of MCC_DNA4_EVL don't come
    when someone has STOPped a process. It is usually in the middle of the
    night sometime. Could a STOPped process cause problems later?

    > There are some MCC processes which run in the background (no user
    > interface), but also do GETEVENTs. These need to be aborted WITHOUT
    > stopping them (e.g., DO NOT use the DCL $ STOP command). An example
    > might be MCC Alarms running in batch. If you DO abort a background MCC
    > Alarms process, would almost always cause garbage (mostly invalid
    > request information) to be left in the Event Pool. The Putters (e.g.,
    > DNA4 Event Sinks) would see these as valid requests for events, and post
    > events to the Event Manager. After a while, the Events will flood the
    > Event Pool, and you have to take fairly drastic measures (killing all
    > processes using MCC) to clean things up.

    > Do you have to kill background MCC Alarms processes? If so, how do
    > you kill them?

    The only time we STOP MCC ALARMS processes is when they don't work.
    Sorry, I'm a bit puzzled. If we are running ALARMS in batch, what
    options other than STOP do we have to initiate a restart of the alarms?
    Or should the order of things go: DISABLE SINK, STOP alarms process,
    ENABLE SINK, START alarms process?

    thanks,
    brad...
1149.9	There are no easy answers for this problem	TOOK::GUERTIN	I do this for a living -- really	`Mon Jul 01 1991 16:07`	40
    > Actually, MCC_STARTUP_DNA4_EVL I think does this as a first step. In any
    > event, the errors and subsequent failure of MCC_DNA4_EVL doesn't come when
    > someone has STOPped a process. It is usually in the middle of the night
    > sometime. Could a STOP process cause problems later?

    Yes. Once you STOP a process which is doing GETEVENTs, you have left
    behind a stale request, which could eventually clog up the Event Pool.
    It may take minutes, hours, or days, depending on how often the events
    (which never get picked up) come into the Event Pool.

    > The only time we STOP MCC ALARMS processes is when they don't work. Sorry, I'm
    > a bit puzzled, if we are running ALARMS in batch what other options other than
    > STOP do we have to initiate a restart of the alarms? Or should the order of
    > things go, DISABLE SINK, STOP alarms process, ENABLE SINK, START alarms
    > process?

    I'm sorrier than you are! There is no elegant solution to this problem.
    The fact of the matter is that in the release notes, we state (for
    users of the MCC Kernel routines) that the MCC processes should not be
    STOPped. End users are now realizing that it is useful to have Alarms
    running in batch, but don't know of a clean way to stop the batch
    process. Hence, shooting it in the head seems to do the trick.

    There are two possibilities for this awkward situation. I recommend
    running Alarms from a window (you can iconize it). If you want to kill
    Alarms, then just Control-Y out. Everything should clean up correctly.
    The other possibility is to do a "Forced Exit" of the Alarms process.
    This is more difficult, because there is no way at DCL level to do
    this; you need to write your own program (I have one that I can post as
    a reply if you want it). Also, since it causes the process to
    essentially call the Exit routine in the middle of execution, you may
    cause the process to go into resource waits (for example, if the
    process was in a Disable Control-Y window of execution, and you Forced
    an Exit).

    If Alarms is not working, then we need to figure out why BEFORE killing
    the Alarms process. If we find the originator of the problems, you
    should never need to stop the Alarms process. I think by solving one of
    your problems, you are creating bigger problems.

    -Matt.
1149.10	If not Batch, then what?	NSSG::R_SPENCE	Nets don't fail me now...	`Tue Jul 02 1991 10:04`	11
    DECmcc engineering recommends running alarms in batch.

    No one is going to run production alarms in a window. Can't reboot the
    workstation... can't even log out to let someone else use it...

    Sounds like the re-engineering of alarms to a detached process
    controlled from DECmcc is a priority.

    What do we tell customers?

    s/rob
1149.11	managable batch alarms soon??	JETSAM::WOODCOCK		`Tue Jul 02 1991 11:29`	16
    I have to agree. Alarms from a window is not viable, for the reasons Rob
    mentioned and also because alarms run 24 hours a day. I'm uncomfortable
    with leaving sys logged in all day/night, especially considering I've
    set host to the main system and this link could potentially drop
    occasionally, creating the same problem we're trying to avoid.
    Manageable alarms within batch has LONG been stated as an area needing
    change. Are there any updates as to when this may change?

    As far as killing processes goes, I'll try to walk more lightly, but
    what can I say. Stopping all MCC processes or rebooting a
    multi-application clustered 8810 aren't pretty options. Also, I'm not
    convinced this is the root of all the evil, but only an irritant
    worsening the situation.

    FYI, this problem with the pool is probably more widespread among EVL
    users than known, because others have indicated they have seen it also.
    Considering how many are actually using EVL for monitoring, it may be a
    high percentage seeing the error.

    cheers,
    brad...
1149.12	We said THAT!?!?!	TOOK::GUERTIN	I do this for a living -- really	`Tue Jul 02 1991 11:47`	24
    Rob,

    As a member of DECmcc engineering, I'm amazed and disappointed that
    this fell through the cracks. There is no patch that I can think of. I
    talked to Anil Navkal (Alarms PL) just yesterday, and thought he told
    me that they did NOT explicitly state that the user should run Alarms
    in batch.

    The problem is that Alarms does not have ANY detached process support.
    If it did, then we would not be in this predicament. (This is not a
    complaint about the Alarms-FM. The MCC-Kernel needs to provide generic
    detached process management routines.) Other MMs have implemented
    their own private detached process support.

    The fact of the matter remains that you cannot kill the Alarms process
    by doing a DCL STOP on the process while Alarms is requesting Events.
    I don't know what a DELETE/ENTRY does to a process, if it is the same
    as a STOP, then you MUST NOT do that either.

    Is it possible to have a command procedure disable all the Alarm Event
    rules running in batch?

    -Matt.
1149.13	No can do ...	TOOK::ORENSTEIN		`Tue Jul 02 1991 13:42`	13
    I too have been thinking about this problem, and I agree that ALARMS
    will be better off when it is detached.

    Matt, unfortunately rules are enabled within a process. So a user on
    DCL cannot see that rules are being run in batch. And that user on DCL
    cannot disable the rules that are running in batch.

    In fact, ALARMS is designed so that once the rule is enabled, another
    process could delete the rule from the MIR, and it would keep running
    in the first process as if nothing happened.

    aud...
1149.14	using DELETE not STOP	JETSAM::WOODCOCK		`Tue Jul 02 1991 14:33`	7
	Hi Matt, For clarity, I always DELETE/ENTRY to stop the process. I never use STOP PROCESS/ID=... I too, don't know if there is a difference. But I always use DELETE because it's usually easier to type :-). brad...
1149.15	Try this instead...	TOOK::GUERTIN	I do this for a living -- really	`Tue Jul 02 1991 15:10`	102
    The following is a VAX C program which will attempt to send a Force Exit
    to another process. You need privileges to send a Force Exit to a
    process that you do not own. If you need to abort an MCC process and
    cannot do it interactively, then please try using "FORCEX" before
    attempting to use the STOP or DELETE/ENTRY commands. (At least until we
    find a better solution.)

    -Matt.

    This program is not supported by NME, MCC, or DEC in general. No one is
    liable or responsible for this program in any way, shape or form. Use
    at your own risk. Etc,etc. <insert usual caveats here>

--------------------------CUT HERE---------------------------------
/* FORCEX.C -- Force Another Process to Exit (by calling the $FORCEX
   system routine).

   $ CC FORCEX.C
   $ LINK FORCEX.OBJ, SYS$INPUT:/OPT    ! Type in image lib interactively.
   SYS$SHARE:VAXCRTL.EXE/SHARE
   ^Z                                   ! Control-Z out of input mode.
   $ COPY FORCEX.EXE                    ! Copy it to where you want it.

   from a privileged account, define it as a Foreign command:

   $ FORCEX:==$SYS$DISK:[]FORCEX.EXE    ! Use actual disk location.
   $ FORCEX <pid1> [<pid2> ... <pidn>]  ! Use PID or Process name (quoted).
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <descrip.h>
#include <ssdef.h>

int remove_quotes( p_string )           /* Remove double quotes */
char *p_string;
{
    int i;

    /* Shift the string left by one to drop the leading quote. */
    for (i=0;*(p_string+i) != '\0';i++)
        *(p_string+i) = *(p_string+i+1);
    /* Drop the trailing quote, if there is one. */
    if ((i > 1) && (*(p_string+i-2) == '"'))
        *(p_string+i-2) = '\0';
    return (strlen( p_string ));
}

main( argc, argv )
int argc;
char *argv[];
{
    int exit_code = SS$_FORCEDEXIT;
    int sstat;
    int pid;
    char *procnam_str;
    struct dsc$descriptor procnam_dsc = {0, DSC$K_DTYPE_T, DSC$K_CLASS_S, 0};
    int arg_count = 0;
    int quotes = 0;     /* boolean flag: 1 = quotes specified, 0 = no quotes */
    unsigned short msg_len = 0;    /* $GETMSG returns the length as a word */
    char msg_txt[256];
    struct dsc$descriptor msg_dsc = {256, DSC$K_DTYPE_T, DSC$K_CLASS_S, msg_txt};

    procnam_str = malloc( 256 );
    do  {
        arg_count++;
        if (argc < 2)
            {
            /* No arguments given: prompt once for a single target. */
            printf("Enter a PID in hex (or a Process Name) : ");
            scanf("%255s",procnam_str );
            argc = 1;
            }
        else
            procnam_str = argv[arg_count];
        procnam_dsc.dsc$w_length = strlen( procnam_str );
        procnam_dsc.dsc$a_pointer = procnam_str;
        /* Quoted strings are always treated as Names */
        quotes = (*procnam_str == '"');
        /* $FORCEX takes the completion code by value, not by reference. */
        if (!quotes && (ots$cvt_tz_l(&procnam_dsc, &pid, 4, 0) == SS$_NORMAL))
            sstat = sys$forcex( &pid, 0, exit_code );
        else
            {
            if ((quotes) && (procnam_dsc.dsc$w_length > 1))
                procnam_dsc.dsc$w_length = remove_quotes( procnam_str );
            sstat = sys$forcex( 0, &procnam_dsc, exit_code );
            }
        if (sstat == SS$_NORMAL)
            printf("\nForced Exit successfully requested for %s\n",
                   procnam_str );
        else
            {
            printf("\nForced Exit request failed for %s\n", procnam_str);
            sys$getmsg( sstat, &msg_len, &msg_dsc, 1, 0 );
            msg_txt[msg_len] ='\0';
            printf("Reason: %s\n",msg_txt);
            }
        } while (arg_count < argc-1);
}
1149.16	SET MODE=HACK	WAKEME::ANIL		`Wed Jul 03 1991 09:21`	37
    Thanks Matt. Will everyone out there give a good round of applause to
    Matt for writing the real code! :-)

    While you guys are busy compiling Matt's program, you may want to try
    the following to get you out of the
    "how-to-stop-MCC-that-is-running-in-the-background" problem. The
    command procedure has all the comments. My first thought was to make it
    a lot more fancy and have it driven by some rule firing that will stop
    the batch job. But for now I prefer it to be very simple. A little
    effort on the user's part will solve the problem. In V1.2 we may try to
    be a little more user friendly :-), no promises though!!

    $ manage/enter
    ! Enable mcc 0 Alarms rule foo_1, in domain blaha
    ! :
    ! Enable all your rules here
    ! :
    ! Enable mcc 0 Alarms rule foo_n, in domain blaha
    !
    ! The following command will wait for whatever delta-time you specify.
    ! If you want to stop the background process, check the PID of the
    ! spawned process. The name of the process is <username>_1.
    ! The PID of this process is generally 1 more than the batch job's PID,
    ! say it's x. Now to stop the background MCC, do your favorite stop/id
    ! for the PID x. The spawned process will be killed. The parent process
    ! will now resume with the next mcc command, which just happens to be a
    ! graceful exit. You may want to do SHOW MCC 0 Alarms RULE * all att
    ! before the exit command.
    !
    spawn wait 22:00:00
    exit
1149.17	works good	JETSAM::WOODCOCK		`Wed Jul 03 1991 15:41`	11
	Hi Matt, Thanks for the program. I've got it compiled and tested. It seems to do the trick and hopefully it helps and/or resolves this problem. best regards, brad... PS. Anil, nice creative hack as Option B :-)
1149.18	For a future version...	MARVIN::COBB	Graham R. Cobb (Wide Area Comms.), REO2-G/H9, 830-3917	`Fri Jul 05 1991 08:31`	26
	Processes will always get stopped for many reasons. You shouldn't ever rely on user-mode exit handlers or ^Y interception to clean up a shared resource. There are two fairly obvious fixes I can think of for a future version:

	1) Use a kernel mode exit handler. Of course this requires writing privileged, inner mode code and using things like protected sharable images.

	2) Take stock and tidy up frequently. For example every time a process connects to the global section have it look around and tidy up the mess caused by a process going away unexpectedly. Or do it from a timer.

	The main problem here is working out who is still attached. Fortunately there is an easy solution to that using locks. You can get as complex as you like using locks but a simple solution should work: every process that uses the global section writes its PID somewhere in the section where everyone else can find it. It also takes out an exclusive lock called MCC$<pid>. If another process needs to know whether the first process is still around (and, more importantly, still using the global section!) it tries to acquire lock MCC$<pid>. If it succeeds the process has stopped using the section and its mess should be tidied away.

	Either of those solutions could work. Or, of course, something much more specific to the alarms module. Whichever way it is done I think this needs to be a high priority to fix for V1.2.

	Graham
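	The lock idea in this note can be sketched in portable C. This is only an illustration, not DECmcc code: it uses BSD-style flock() on a per-PID lock file as a stand-in for a VMS lock named MCC$<pid>, and the names claim_presence, owner_is_gone and the MCC_<pid>.lock file naming are all hypothetical. Like the simple scheme described above, it does nothing about a PID being reused.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Build the (hypothetical) lock-file path that stands in for lock MCC$<pid>. */
static void lock_path(char *buf, size_t len, const char *dir, long pid)
{
    snprintf(buf, len, "%s/MCC_%ld.lock", dir, pid);
}

/* Called once by each user of the shared pool. The returned descriptor must
 * stay open for the life of the process: the system drops the lock when the
 * process exits or is killed. Returns -1 on failure. */
int claim_presence(const char *dir, long pid)
{
    char path[256];
    int fd;

    assert(dir != NULL);
    lock_path(path, sizeof path, dir, pid);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the lock for pid can be taken (owner gone, tidy up its mess),
 * 0 if the owner still holds it, -1 on any other error. */
int owner_is_gone(const char *dir, long pid)
{
    char path[256];
    int fd, result;

    assert(dir != NULL);
    lock_path(path, sizeof path, dir, pid);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);      /* we only wanted to probe */
        result = 1;
    } else {
        result = (errno == EWOULDBLOCK) ? 0 : -1;
    }
    close(fd);
    return result;
}
```

	A process would call claim_presence() when it attaches to the section, and any other process can later call owner_is_gone() for each PID it finds recorded there.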
1149.19	The future is ... "Portability"!	TOOK::GUERTIN	I do this for a living -- really	`Mon Jul 08 1991 08:50`	55
	RE:.18

	Graham,

	Yes, the solutions you suggest are doable. The problems are:

	1) Using a kernel mode exit handler. This is analogous to cracking open a peanut with a thermonuclear device. Yes, it will work, yes, it is overkill, yes, there are simpler (and more portable) solutions which stay in user-mode.

	2) Various garbage collection schemes. Counting on things such as PIDs to identify a process will work until the same PID gets re-used. If you look at the N-process to N-process communication behavior of MCC events (for example, Sinks are generally very long running processes which mainly do Puts, while foreground MCC processes tend not to run very long, and do Gets), then you will notice that it may be several hours or days between when the process goes away and another process needs to check its PID. I do not believe there is a guarantee in the VMS architecture that PIDs will not be reused, or at which intervals they could be re-used. If you know of any statements (such as, "PIDs are always unique and never reused between reboots"), then please let me know. Also, remember that we are not just talking about processes, we are also talking about threads. For example, if a thread issues a Get, and then is destroyed, or hangs, the event request remains in the event pool.

	Instead, there appears to be a handful of creative, yet simple solutions, which provide the same end result. Some examples:

	1) Implement a "sweeper". Sweepers are threads which run in any process which calls the MCC Event Manager. They are started up on Event Manager initialization, and periodically scan the Event Pool for garbage. Unfortunately, this is an "active" as opposed to a "passive" solution, and requires the system to do more work based upon the load.

	2) When the Putter puts an Event, and notices that the Getter hasn't picked up events in a timely fashion, he issues a "challenge". If the Getter accepts the challenge, then the Request is validated.

	3) Each Getter has a quota of the number of events it can have queued up in the event pool. If the quota is reached, the events are "lost"; after a period of no Getter activity, the request itself becomes invalid.

	We have several others, including various combinations of the above schemes. I appreciate your interest, and your taking the time to propose plausible solutions. However, the real issue is not the lack of solutions, but the lack of time and people resources to implement them. The solution we have finally come up with requires a minimum of both, but it still must be worked into the schedule and traded against other tasks (which means some other piece of functionality or some other bug fix will NOT get into the product in the next release). For V1.1, we reluctantly settled for exit handler cleanup -- although that solution isn't very portable either :-).
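	To make the "sweeper" idea concrete, here is a minimal sketch of a single sweep pass in C. It is a toy under stated assumptions, not the Event Manager's design: the pool is modelled as a plain array, and the names pool_entry, last_activity and sweep_pool are hypothetical. A real sweeper thread would run a pass like this periodically and would need to synchronize with Putters and Getters.

```c
#include <assert.h>
#include <stddef.h>

/* One slot of a toy event pool (hypothetical layout). */
struct pool_entry {
    int  in_use;         /* 1 if the slot holds an outstanding request */
    long last_activity;  /* time in seconds when the Getter last touched it */
};

/* One sweep pass: release every in-use slot that has been idle for at
 * least max_idle seconds. Returns the number of slots released. */
size_t sweep_pool(struct pool_entry *pool, size_t n, long now, long max_idle)
{
    size_t freed = 0;
    size_t i;

    assert(pool != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (pool[i].in_use && now - pool[i].last_activity >= max_idle) {
            pool[i].in_use = 0;   /* treat the request as garbage */
            freed++;
        }
    }
    return freed;
}
```

	The cost noted above is visible here: every pass touches every slot, whether or not anything is actually garbage.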
1149.20	Help is coming - "Real Soon"	TOOK::T_HUPPER	The rest, as they say, is history.	`Mon Jul 08 1991 11:46`	13
	RE:.18, .19 Just so everybody can feel better about "the Event Manager that can't clean up after itself", we have time allocated for the V1.2 release to implement some/all of the functionality that Matt outlined in .19. The internal Event Pool cleanup mechanism has always been an integral part of the Event Manager, but until now, there has been NO time to implement it. The tradeoffs we've had to make in many areas of DECmcc in order to get ANY product out the door have been severe. We are allocating more time now to filling in some of the areas previously traded off. Ted
1149.21		MARVIN::COBB	Graham R. Cobb (Wide Area Comms.), REO2-G/H9, 830-3917	`Mon Jul 08 1991 11:49`	11
	You are right that there are many possible solutions (by the way, the "lock" approach can be made immune to re-using the same PID but it rapidly becomes complex). Personally I would probably use the kernel mode exit handler approach, but then I have been writing VMS inner mode code for almost 10 years! I take your point that any solution will cost some other feature but I wanted to add my voice to the outcry that a user mode exit handler is not an adequate solution for V1.2. Graham
1149.22	INSEVTPOOLMEM is back	JETSAM::WOODCOCK		`Thu Jul 18 1991 11:59`	18
	I have come back to the original problem, INSEVTPOOLMEM. I have once again received this error today. I have been extremely careful to use 'FORCEX' but the error has reappeared. Usually I can simply restart MCC_DNA4_EVL and all works well for a while, but not today. Restarting it brought back the same error within minutes. Should I always reset all MCC processes when I receive this error? Please say no; that is a painful workaround.

	As a side note I have been working on making MCC and EVL more robust. As a consequence I forced EVL to go away many times yesterday, which produced a fatal link abort error in MCC_DNA4_EVL. Could this have been the prelude to this error coming on again? It shouldn't be, because EVL often goes away on its own and that can't be avoided in normal operations. Would it help to restart only MCC_DNA4_EVL each work day?

	BTW, I think I have a hack to keep MCC_DNA4_EVL running even when EVL drops out. I'll be looking for opinions on it but I'll post it in the appropriate note.

	regards,
	brad...
1149.23	Not processing sinked events	AUNTB::BRILEY	Are you a rock or leaf in the wind	`Wed Jul 24 1991 09:42`	7
	Did anyone ever find the cause of the initial problem that Louis reported, that is, MCC_DNA4_EVL not receiving/processing sinked events?

	Thanks,
	Rob
1149.24	Event Mgr cleanup for killed processes?	TAEC::MCDONALD		`Mon Feb 17 1992 05:16`	21
	re .20

	>Just so everybody can feel better about "the Event Manager that can't
	>clean up after itself", we have time allocated for the V1.2 release
	>to implement some/all of the functionality that Matt outlined in .19.

	I am using mcc Component Version = T1.2.4 on Ultrix. Has the functionality discussed in notes .19 & .20 been implemented in the newer Event Manager?

	I have a background process which does an mcc_event_get for infinity. If this process gets killed (kill on Ultrix), then other processes doing mcc_event_puts still receive a status of Normal (as if another process has received the event, when in fact there are no other processes waiting for the event). If the background process does an mcc_event_get cancel before exiting then this does not happen.

	Is there a way to correct this (so that the mcc_event_put receives MCC_S_NOEVENTREQ when the process is no longer there)?

	thanks,
	Carol
1149.25	Use mcc_kill rather than kill	TOOK::MINTZ	Erik Mintz, DECmcc Development, dtn 226-5033	`Mon Feb 17 1992 08:40`	6
	This does appear to be a problem (and I have seen the relevant QAR). However, we DO NOT recommend killing DECmcc processes on ULTRIX using "kill". That is why we provide mcc_kill to terminate them. -- Erik
1149.26	what's the difference?	TAEC::MCDONALD		`Mon Feb 17 1992 10:59`	3
	what does mcc_kill do differently from "kill"? Anyway a process might exit for other reasons before doing a cancel.
1149.27	mcc_kill allows a clean shut down	TOOK::MINTZ	Erik Mintz, DECmcc Development, dtn 226-5033	`Mon Feb 17 1992 11:16`	7
	> what does mcc_kill do differently from "kill"? It sends an MCC event that allows a process to shut itself down. There are known clean-up problems when a process is abruptly terminated.
1149.28	Event manager cleanup has been implemented in V1.2	TOOK::T_HUPPER	The rest, as they say, is history.	`Tue Feb 18 1992 11:08`	61
	RE .24:

	New functionality for V1.2: The event manager DOES clean up when processes die. It does NOT do so immediately. The purpose is to avoid filling up the event memory pool with events for GETs of processes that have been killed/stopped. The purpose is not to assure the PUT that a GET actually processed the event. That is impossible for the (low-level) event manager to do. It has no control over what happens to an event after it leaves the event manager.

	The cleanup that is done when a process doing mcc_event_get calls dies is based on a timer and on the queue of the mcc_event_get filling up. The algorithm is as follows:

	If the event queue for the GET fills up (the limit is settable with the MCC_EVENT_EDQ_SIZE_LIMIT environment variable, default is 200), then after a timeout (settable with the environment variable MCC_EVENT_EDQ_TIME_LIMIT, default is 60 seconds) AND another event being PUT to this queue, the entire contents of the queue are converted to lost events.

	If another event is PUT to this queue after another timeout (settable with the environment variable MCC_EVENT_LOST_TIME_LIMIT, default is 600 seconds) expires, the GET structures are removed from the event manager. No further PUTs will see this deleted GET (they will now receive MCC_S_NOEVENTREQ).

	If the event pool has filled to a threshold level (not settable), it is not necessary to have any PUTs enqueued for the dead GET to have the above sequence take place. All GETs in the event pool are checked against the timeouts. Any GETs past the timeouts are deleted along with their posted events.

	The purpose of the above sweeping operation is to prevent the event manager pool from being put out of commission by dead GETs. Note that because of the timeouts and/or the requirement to reach a threshold of fullness, we cannot give instantaneous accuracy on whether or not the event actually went to a GET process.

	After a process with outstanding GETs dies, and before the GET structures are removed from the event pool, PUTs that match those GETs will return MCC_S_NORMAL. After the cleanup, they will receive MCC_S_NOEVENTREQ. The difference in these CVRs is whether or not the event was queued to a GET, not whether the event was acted upon by a real process.

	If you need to know whether an event was acted upon, then you need a transaction processing model. As the event manager is only providing a one-way distribution of data, a single event posting cannot provide this capability. An end-to-end receipt is required. A return event could provide that receipt, but the model is becoming complicated.

	If knowing as quickly as possible whether a GET process has died (perhaps so that an automatic restart of the GET process can be done (but why did it die?)) is really important, we would have to test the existence of the GET process for each matching GET for each PUT of an event. Given that the event manager cannot guarantee action on an event and needs to have high performance, we did not implement this test.

	Ted
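	The per-queue part of the algorithm described in this note can be modelled as a small state machine. The sketch below is a toy model for reasoning about what a PUT sees over time, not the Event Manager's code. All names (get_request, limits, model_put and the enum values) are hypothetical stand-ins. Three details are assumptions where the description leaves room: each timeout is measured from the moment the previous state was entered, an event PUT to a full queue before the first timeout is counted as lost, and the PUT that triggers removal already receives the "no request" status. The pool-wide threshold sweep is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for MCC_S_NORMAL and MCC_S_NOEVENTREQ. */
enum put_status { PUT_NORMAL, PUT_NOEVENTREQ };

enum get_state { GET_ACTIVE, GET_FULL, GET_LOST, GET_REMOVED };

struct get_request {
    enum get_state state;
    int  queued;   /* events waiting for the getter */
    int  lost;     /* events counted as lost */
    long since;    /* time (seconds) the current state was entered */
};

struct limits {
    int  edq_size;   /* cf. MCC_EVENT_EDQ_SIZE_LIMIT,  default 200 */
    long edq_time;   /* cf. MCC_EVENT_EDQ_TIME_LIMIT,  default 60  */
    long lost_time;  /* cf. MCC_EVENT_LOST_TIME_LIMIT, default 600 */
};

void get_init(struct get_request *g, long now)
{
    assert(g != NULL);
    g->state = GET_ACTIVE;
    g->queued = 0;
    g->lost = 0;
    g->since = now;
}

/* Model one PUT that matches a GET whose owner never drains the queue. */
enum put_status model_put(struct get_request *g, const struct limits *lim, long now)
{
    assert(g != NULL && lim != NULL);
    switch (g->state) {
    case GET_ACTIVE:
        g->queued++;
        if (g->queued >= lim->edq_size) {       /* queue just filled up */
            g->state = GET_FULL;
            g->since = now;
        }
        return PUT_NORMAL;
    case GET_FULL:
        if (now - g->since >= lim->edq_time) {  /* first timeout expired */
            g->lost += g->queued + 1;           /* whole queue becomes lost */
            g->queued = 0;
            g->state = GET_LOST;
            g->since = now;
        } else {
            g->lost++;                          /* assumption: no room left */
        }
        return PUT_NORMAL;
    case GET_LOST:
        if (now - g->since >= lim->lost_time) { /* second timeout expired */
            g->state = GET_REMOVED;
            return PUT_NOEVENTREQ;              /* assumption: seen at once */
        }
        g->lost++;
        return PUT_NORMAL;
    case GET_REMOVED:
        break;
    }
    return PUT_NOEVENTREQ;
}
```

	With the default limits, a dead GET that is receiving steady traffic keeps returning the normal status for at least 660 seconds after its queue fills, which is why a PUT status alone cannot tell you that the Getter is alive.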