
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

1724.0. "Events and deadlocks?" by NWACES::SPAZZ::TRULL () Fri Oct 25 1991 17:11

    We are, on occasion, receiving a SYSTEM-F-DEADLOCK error from
    the mcc_event_put routine.
    
    Some background info:
    We are using DECmcc V1.1 on VMS 5.4.
    
    We have an Event Sink process which generates the events using
    mcc_event_put.  We have around 200 different events and have
    about 60 alarms enabled.  We are using the iconic map but have
    the alarm rules enabled via a batch process (as discussed in a
    previous note).  We are probably generating 20 events per hour
    although we have a 1 second timer in the code so that we don't 
    flood DECmcc with too many events at a time.
    
    Could someone explain how the event pool (global section) works;
    how the mcc_event_put and mcc_event_get calls lock and unlock it?
    Could the size of the event pool be a factor?  Is the event pool
    a factor?  We also (once) received an exceeded ENQUEUE limit error,
    but the process has an ENQUEUE limit of 2000 - is this sufficient?
    
    
    Thanks for any suggestions,
    
    Bruce and Cathy

T.R  Title  User  Personal Name  Date  Lines
1724.1. "" by TOOK::GUERTIN (Don't fight fire with flames) Mon Oct 28 1991 14:50, 17 lines
    The MCC V1.1 Event Manager does use VMS locks to lock the MCC Event
    Pool Global Section.  VMS interlock instructions are used at a low
    level to reduce VMS lock manager resources.  I run with an ENQLM of
    512 and have never seen an exceeded enqlm error message.  And I have
    done extensive testing with MCC and the MCC Event Manager.  I have
    also not seen a SYSTEM-F-DEADLOCK problem with the production V1.1
    code (we did have a window of potential deadlock on the Event_Get
    side before shipping but that was fixed).  The one second timer
    seems to be overkill to me.  We have "maxed" out the putter without
    seeing this problem.  The size of the event pool is too small; we
    know that now (at the time, we had no idea using the event manager
    would be so popular).  There is a patch around to increase the event
    pool size.  If it isn't in the patch note (1267.* -- I think) then
    it should be.  How is the timer called?  It isn't blocking the
    entire process is it?
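
    To illustrate the concern about the timer blocking the whole process,
    here is a minimal sketch (in Python, not DECmcc code) of pacing events
    from a dedicated worker thread, so that only that thread ever waits.
    The class name PacedPutter and the put_fn callback are hypothetical;
    put_fn stands in for whatever actually calls mcc_event_put.

```python
import queue
import threading
import time


class PacedPutter:
    """Forward events to put_fn from a worker thread, at most one per interval."""

    def __init__(self, put_fn, min_interval=1.0):
        self._put_fn = put_fn                # hypothetical stand-in for the real put call
        self._min_interval = min_interval    # seconds to wait after each put
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event):
        """Queue an event and return immediately; the caller never sleeps."""
        self._queue.put(event)

    def close(self):
        """Let the worker drain everything queued so far, then stop it."""
        self._queue.put(None)                # sentinel marks end of input
        self._worker.join()

    def _run(self):
        while True:
            event = self._queue.get()
            if event is None:
                return
            self._put_fn(event)
            time.sleep(self._min_interval)   # only this worker thread waits
```

    The thread that detects events calls submit() and carries on; the
    pacing delay is confined to the worker.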
    
    -Matt
1724.2. "V1.1 locking/unlocking is complex" by TOOK::T_HUPPER (The rest, as they say, is history.) Mon Oct 28 1991 15:27, 35 lines
    RE .0:
    
    The size of your event pool does not seem to be a concern.  You would
    be receiving MCC_S_INSEVTPOOLMEM errors from the mcc_event_put if it
    were.  The load imposed by your events should not be a problem either.
    The Enqlm of 2000 should also be adequate.
    
    The event pool for V1.1 is fairly complex, and a full explanation here
    would probably not be appropriate.   Just to make it simple, the
    locking/unlocking of the V1.1 event manager has been greatly simplified
    for V1.2.  The V1.1 version has several levels of locks, with higher
    levels being held for concurrent access during an mcc_event_put() call. 
    This opens up many areas for concurrency conflicts and caused us a
    great deal of excess execution overhead when we moved from our original
    framework routines to the use of CMA (DECthreads) for our framework. 
    The V1.2 event manager has a single lock, which is always held for
    exclusive access.  It is simple and much more robust.  Were we having
    problems with the V1.1 event manager?  Yes, but not much in the area of
    deadlocks.
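
    As general background on why several levels of locks open the door to
    deadlock while a single exclusive lock does not, here is a small
    sketch in Python.  It is not the event manager's algorithm; the lock
    and owner names are made up.  It builds a "who is blocked by whom"
    graph and looks for a cycle, which is the condition a lock manager
    would report as a deadlock.

```python
def find_deadlock(holds, waits_for):
    """Return a list of owners that form a wait cycle, or None.

    holds:     dict mapping lock name -> owner currently holding it
    waits_for: dict mapping owner -> the one lock that owner is blocked on
    """
    # Edge A -> B means "A is blocked on a lock that B holds".
    blocked_by = {}
    for owner, lock in waits_for.items():
        holder = holds.get(lock)
        if holder is not None and holder != owner:
            blocked_by[owner] = holder

    # Follow the chain from each blocked owner; a return to the start is a cycle.
    for start in blocked_by:
        path = [start]
        current = blocked_by[start]
        while current in blocked_by and current not in path:
            path.append(current)
            current = blocked_by[current]
        if current == start:
            return path
    return None


# Two lock levels: each owner holds one lock and waits for the other.
two_level = find_deadlock(
    holds={"pool": "putter", "queue": "getter"},
    waits_for={"putter": "queue", "getter": "pool"},
)

# One exclusive lock: the holder never waits while holding, so no cycle forms.
single_lock = find_deadlock(
    holds={"pool": "putter"},
    waits_for={"getter": "pool"},
)
```

    In the first case both owners appear in the cycle; in the second the
    getter simply waits until the putter releases the lock.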
    
    One of the major problems we have with V1.1 is the deletion of
    processes that are using the event manager.  We end up with hung locks
    and also the filling of the event pool because of dead get structures.  
    Ensure that when DECmcc starts up on a system, leftover event manager
    users have been terminated.  This permits a clean startup of the event
    manager.  For V1.2, this will not be so much of a concern, as the event
    pool will be self-policing to a large extent.
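
    To make the "dead get structures" point concrete, here is a toy model
    in Python of a pool that reclaims slots left behind by processes that
    no longer exist.  It is only a sketch of the idea; the EventPool class,
    its methods and the is_alive callback are hypothetical and do not
    reflect the real event pool layout or the V1.2 design.

```python
class EventPool:
    """Toy fixed-capacity pool of per-getter slots."""

    def __init__(self, capacity, is_alive):
        self._capacity = capacity
        self._is_alive = is_alive   # callable: pid -> bool, supplied by the caller
        self._getters = {}          # pid -> list of events queued for that getter

    def register_getter(self, pid):
        """Give pid a slot, reclaiming dead getters' slots first if the pool is full."""
        if pid not in self._getters and len(self._getters) >= self._capacity:
            self.purge_dead()
        if pid not in self._getters and len(self._getters) >= self._capacity:
            raise MemoryError("event pool full")
        self._getters.setdefault(pid, [])

    def purge_dead(self):
        """Drop slots whose owning process no longer exists; return their pids."""
        dead = [pid for pid in self._getters if not self._is_alive(pid)]
        for pid in dead:
            del self._getters[pid]
        return dead

    def getter_pids(self):
        return sorted(self._getters)
```

    Without the purge step, a getter that was deleted without cleaning up
    would hold its slot until the pool was rebuilt.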
    
    How often are you seeing the deadlock?  When it occurs, are there many
    events being put into the event manager?  Just a few?  One event?  What
    is your event sink process?  Is it using other DECmcc or VMS locks? 
    How many threads is the event sink itself using?  How many are
    attempting to use mcc_event_put() at the same time?
    
    Ted
1724.3. "Some answers..." by NWACES::KIMMEL () Tue Oct 29 1991 11:44, 58 lines
Thank you for your quick replies.

Re .1: We added the 1-second timer (we used the MCC timer routine) because
       we discovered that it fixed a weird problem we were seeing - when
       we generated an event for an entity, occasionally an alarm rule
       for a *different* entity would fire.  This would usually happen
       when we generated a lot of events quickly (we used a test driver
       that looped, constantly generating events), although it was not
       reproducible at will.  (We only had 2 weeks to test our prototype
       software, so we did not have time to thoroughly look into problems...
       since the timer fixed it, we just assumed it was some sort of
       timing problem and accepted it.)

       We do have the patch to enlarge the event pool because the customer
       has seen INSEVTPOOLMEM a few times.  

RE .2: It sounds like 1.2 may solve our problems.  Fortunately, the
       customer has not encountered the DEADLOCK error; we saw it during
       our testing here only and, from your reply, it may have occurred
       because of processes that were killed and did not free locks.
       The customer has a habit of rebooting the workstation whenever
       he needs to restart the software, so that may be goodness.
    
       To answer some of your other questions...
       >>> How often are you seeing the deadlock? <<<  

       We saw it a few times (less than 10) during our 2 weeks of testing.

       >>> When it occurs, are there many events being put into the event 
       manager?  Just a few?  One event? <<<  

       It varied; sometimes it would occur in the middle of a burst of 
       events (when we used our test driver mentioned in the 1st paragraph);
       sometimes, it would occur in the middle of a slow series of events.

       I don't remember it ever occurring on the first event after
       everything was freshly started.

       >>> What is your event sink process?  Is it using other DECmcc or 
       VMS locks? How many threads is the event sink itself using?  How many 
       are attempting to use mcc_event_put() at the same time? <<<

       The event sink process is a detached process using 2 threads; only
       1 thread calls mcc_event_put.  It uses 2 other locks using the
       DECmcc locking routines.

Since it's not a reproducible problem, we don't want to waste your time.
We were thinking that if we understood how the event pool locking worked,
we might be able to understand how a deadlock error could occur.
It seems likely that the problem was caused by old locks that were 
hanging around.

We added more debugging statements in hopes of trapping exactly what is
happening with the ENQLM error.  That isn't reproducible, either.  If
we find out more info, we'll let you know.

Thanks again,
Cathy