| The MCC V1.1 Event Manager does use VMS locks to lock the MCC Event
Pool global section. VMS interlocked instructions are used at a low
level to reduce demand on VMS lock manager resources. I run with an
ENQLM of 512 and have never seen an exceeded-ENQLM error message, and
I have done extensive testing with MCC and the MCC Event Manager. I
have also not seen a SYSTEM-F-DEADLOCK problem with the production
V1.1 code (we did have a window of potential deadlock on the Event_Get
side before shipping, but that was fixed).
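For reference, the lock manager traffic here is the classic $ENQ/$DEQ
pattern. A minimal sketch of taking and releasing an exclusive lock on
a named resource; this is not the Event Manager's actual code, and the
resource name is made up for illustration:

    #include <descrip.h>
    #include <lckdef.h>
    #include <ssdef.h>
    #include <starlet.h>
    #include <stdio.h>

    /* Lock status block: completion status plus the lock id. */
    struct lksb
    {
        unsigned short status;
        unsigned short reserved;
        unsigned int   lock_id;    /* needed later for sys$deq() */
    };

    int main(void)
    {
        $DESCRIPTOR(resnam, "MCC_EVENT_POOL_DEMO");  /* made-up name */
        struct lksb lksb = {0};
        unsigned int status;

        /* Queue, and wait for, an exclusive-mode lock. */
        status = sys$enqw(0, LCK$K_EXMODE, &lksb, 0, &resnam,
                          0, 0, 0, 0, 0, 0);
        if (!(status & 1) || !(lksb.status & 1))
        {
            fprintf(stderr, "enq failed: %u\n", status);
            return status;
        }

        /* ... touch the shared global section here ... */

        /* Release the lock.  Every lock held counts against ENQLM. */
        return sys$deq(lksb.lock_id, 0, 0, 0);
    }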
The one-second timer seems like overkill to me; we have "maxed" out
the putter without seeing this problem. The size of the event pool is
too small, we know that now (at the time, we had no idea that using
the Event Manager would be so popular). There is a patch around to
increase the event pool size; if it isn't in the patch note (1267.*,
I think), then it should be. How is the timer called? It isn't
blocking the entire process, is it?
-Matt
|
| RE .0:
The size of your event pool does not seem to be a concern; you would
be receiving MCC_S_INSEVTPOOLMEM errors from mcc_event_put() if it
were. The load imposed by your events should not be a problem either,
and an ENQLM of 2000 should also be adequate.
The event pool for V1.1 is fairly complex, and a full explanation here
would probably not be appropriate. To keep it simple: the locking and
unlocking in the V1.1 event manager have been greatly simplified for
V1.2. The V1.1 version has several levels of locks, with the
higher-level locks held for concurrent access during an
mcc_event_put() call. This opens up many areas for concurrency
conflicts, and it cost us a great deal of excess execution overhead
when we moved from our original framework routines to CMA (DECthreads)
for our framework. The V1.2 event manager has a single lock, which is
always held for exclusive access; it is simpler and much more robust.
Were we having problems with the V1.1 event manager? Yes, but not
much in the area of deadlocks.
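To make the structural difference concrete, here is a purely
illustrative sketch, in POSIX-threads terms rather than the CMA calls
or the real Event Manager data structures:

    #include <pthread.h>

    /* V1.1-style: two lock levels.  The pool-level lock is held for
     * concurrent (shared) access while a putter also takes an
     * exclusive partition-level lock, so every caller must follow
     * the same ordering or risk deadlock. */
    pthread_rwlock_t pool_lock = PTHREAD_RWLOCK_INITIALIZER;
    pthread_mutex_t  part_lock[2] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
    };

    void put_v11_style(int part)
    {
        pthread_rwlock_rdlock(&pool_lock);     /* level 1: shared    */
        pthread_mutex_lock(&part_lock[part]);  /* level 2: exclusive */
        /* ... copy the event into the pool partition ... */
        pthread_mutex_unlock(&part_lock[part]);
        pthread_rwlock_unlock(&pool_lock);
    }

    /* V1.2-style: one lock, always exclusive.  Less concurrency on
     * paper, but no ordering rules and far fewer chances to
     * deadlock. */
    pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

    void put_v12_style(void)
    {
        pthread_mutex_lock(&event_lock);
        /* ... copy the event into the pool ... */
        pthread_mutex_unlock(&event_lock);
    }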
One of the major problems we have with V1.1 is the deletion of
processes that are still using the event manager: we end up with hung
locks, and the event pool fills up with dead get structures. Ensure
that when DECmcc starts up on a system, any leftover event manager
users have been terminated; this permits a clean startup of the event
manager. For V1.2 this will be less of a concern, as the event pool
will be largely self-policing.
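As an illustration of the kind of startup check I mean, a hedged
sketch using the documented $GETJPI wildcard scan; the "MCC_EVT"
process-name pattern is invented, and real names will differ:

    #include <jpidef.h>
    #include <ssdef.h>
    #include <starlet.h>
    #include <stdio.h>
    #include <string.h>

    /* Item-list entry for $GETJPI. */
    struct itm { unsigned short buflen, code;
                 void *buf; unsigned short *retlen; };

    int main(void)
    {
        unsigned int ctx = 0xFFFFFFFF;  /* -1 starts a wildcard scan */
        unsigned int pid = 0;
        char name[16];
        unsigned short namelen = 0;
        struct itm items[] = {
            { sizeof pid,      JPI$_PID,    &pid, 0 },
            { sizeof name - 1, JPI$_PRCNAM, name, &namelen },
            { 0, 0, 0, 0 }              /* terminator */
        };
        unsigned int status;

        for (;;)
        {
            status = sys$getjpiw(0, &ctx, 0, items, 0, 0, 0);
            if (status == SS$_NOMOREPROC) break;
            if (!(status & 1)) continue;   /* e.g. SS$_NOPRIV: skip */
            name[namelen] = '\0';
            if (strncmp(name, "MCC_EVT", 7) == 0)
                printf("leftover event user: pid %08X  %s\n",
                       pid, name);
        }
        return SS$_NORMAL;
    }

Stopping any processes such a scan reports (STOP/ID=pid, or SYS$DELPRC
from a program) before starting DECmcc releases their locks and get
structures.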
How often are you seeing the deadlock? When it occurs, are there many
events being put into the event manager? Just a few? One event? What
is your event sink process? Is it using other DECmcc or VMS locks?
How many threads is the event sink itself using? How many are
attempting to use mcc_event_put() at the same time?
Ted
|
| Thank you for your quick replies.
RE .1: We added the 1-second timer (using the MCC timer routine)
because we discovered that it fixed a strange problem we were seeing:
when we generated an event for one entity, occasionally an alarm rule
for a *different* entity would fire. This usually happened when we
generated a lot of events quickly (we used a test driver that looped,
constantly generating events), although it was not reproducible at
will. (We had only two weeks to test our prototype software, so we
did not have time to look into problems thoroughly; since the timer
fixed it, we assumed it was some sort of timing problem and accepted
it.)
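In outline, the workaround amounts to the loop below; put_one_event()
is a hypothetical wrapper (the real code calls mcc_event_put(), whose
argument list isn't shown here), and sleep() merely stands in for the
MCC timer routine we actually used:

    #include <unistd.h>

    /* Hypothetical wrapper around our real mcc_event_put() call. */
    extern int put_one_event(void);

    /* Test driver, with the 1-second pause that made the
     * cross-entity alarm firing disappear. */
    void event_driver(int count)
    {
        int i;
        for (i = 0; i < count; i++)
        {
            (void) put_one_event();
            sleep(1);   /* stand-in for the MCC timer routine */
        }
    }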
We do have the patch to enlarge the event pool because the customer
has seen INSEVTPOOLMEM a few times.
RE .2: It sounds like V1.2 may solve our problems. Fortunately, the
customer has not encountered the DEADLOCK error; we saw it only during
our testing here and, from your reply, it may have occurred because of
processes that were killed and did not free their locks. The customer
has a habit of rebooting the workstation whenever he needs to restart
the software, so that may be goodness.
To answer some of your other questions...
>>> How often are you seeing the deadlock? <<<
We saw it a few times (less than 10) during our 2 weeks of testing.
>>> When it occurs, are there many events being put into the event
manager? Just a few? One event? <<<
It varied: sometimes it would occur in the middle of a burst of events
(when we used the test driver mentioned in the first paragraph), and
sometimes in the middle of a slow series of events. I don't remember
it ever occurring on the first event after everything was freshly
started.
>>> What is your event sink process? Is it using other DECmcc or
VMS locks? How many threads is the event sink itself using? How many
are attempting to use mcc_event_put() at the same time? <<<
The event sink is a detached process using 2 threads; only 1 thread
calls mcc_event_put(). It also uses 2 other locks, acquired through
the DECmcc locking routines.
Since it's not a reproducible problem, we don't want to waste your
time. We were thinking that if we understood how the event pool
locking worked, we might be able to understand how a deadlock error
could occur. It seems likely that the problem was caused by old locks
that were hanging around.
We have added more debugging statements in hopes of trapping exactly
what is happening with the ENQLM error; that isn't reproducible,
either. If we find out more, we'll let you know.
Thanks again,
Cathy
|