| The MCC V1.1 Event Manager does use VMS locks to lock the MCC Event
Pool global section. VMS interlocked instructions are used at a low
level to reduce demand on VMS lock manager resources. I run with an
ENQLM of 512 and have never seen an exceeded-ENQLM error message, and
I have done extensive testing with MCC and the MCC Event Manager. I
have also not seen a SYSTEM-F-DEADLOCK problem with the production
V1.1 code (we did have a window of potential deadlock on the Event_Get
side before shipping, but that was fixed).
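For reference, the lock manager traffic here is the classic $ENQ/$DEQ
pattern. A minimal sketch of taking and releasing an exclusive lock on
a named resource; this is not the Event Manager's actual code, and the
resource name is made up for illustration:

    #include <descrip.h>
    #include <lckdef.h>
    #include <ssdef.h>
    #include <starlet.h>
    #include <stdio.h>

    /* Lock status block: completion status plus the lock id. */
    struct lksb
    {
        unsigned short status;
        unsigned short reserved;
        unsigned int   lock_id;    /* needed later for sys$deq() */
    };

    int main(void)
    {
        $DESCRIPTOR(resnam, "MCC_EVENT_POOL_DEMO");  /* made-up name */
        struct lksb lksb = {0};
        unsigned int status;

        /* Queue, and wait for, an exclusive-mode lock. */
        status = sys$enqw(0, LCK$K_EXMODE, &lksb, 0, &resnam,
                          0, 0, 0, 0, 0, 0);
        if (!(status & 1) || !(lksb.status & 1))
        {
            fprintf(stderr, "enq failed: %u\n", status);
            return status;
        }

        /* ... touch the shared global section here ... */

        /* Release the lock.  Every lock held counts against ENQLM. */
        return sys$deq(lksb.lock_id, 0, 0, 0);
    }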
The one-second timer seems like overkill to me; we have "maxed" out
the putter without seeing this problem. The size of the event pool is
too small, we know that now (at the time, we had no idea that using
the Event Manager would be so popular). There is a patch around to
increase the event pool size; if it isn't in the patch note (1267.*,
I think), then it should be. How is the timer called? It isn't
blocking the entire process, is it?
-Matt
|
| RE .0:
The size of your event pool does not seem to be a concern; you would
be receiving MCC_S_INSEVTPOOLMEM errors from mcc_event_put() if it
were. The load imposed by your events should not be a problem either,
and an ENQLM of 2000 should also be adequate.
The event pool for V1.1 is fairly complex, and a full explanation here
would probably not be appropriate. To keep it simple: the locking and
unlocking in the V1.1 event manager have been greatly simplified for
V1.2. The V1.1 version has several levels of locks, with the
higher-level locks held for concurrent access during an
mcc_event_put() call. This opens up many areas for concurrency
conflicts, and it cost us a great deal of excess execution overhead
when we moved from our original framework routines to CMA (DECthreads)
for our framework. The V1.2 event manager has a single lock, which is
always held for exclusive access; it is simpler and much more robust.
Were we having problems with the V1.1 event manager? Yes, but not
much in the area of deadlocks.
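To make the structural difference concrete, here is a purely
illustrative sketch, in POSIX-threads terms rather than the CMA calls
or the real Event Manager data structures:

    #include <pthread.h>

    /* V1.1-style: two lock levels.  The pool-level lock is held for
     * concurrent (shared) access while a putter also takes an
     * exclusive partition-level lock, so every caller must follow
     * the same ordering or risk deadlock. */
    pthread_rwlock_t pool_lock = PTHREAD_RWLOCK_INITIALIZER;
    pthread_mutex_t  part_lock[2] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
    };

    void put_v11_style(int part)
    {
        pthread_rwlock_rdlock(&pool_lock);     /* level 1: shared    */
        pthread_mutex_lock(&part_lock[part]);  /* level 2: exclusive */
        /* ... copy the event into the pool partition ... */
        pthread_mutex_unlock(&part_lock[part]);
        pthread_rwlock_unlock(&pool_lock);
    }

    /* V1.2-style: one lock, always exclusive.  Less concurrency on
     * paper, but no ordering rules and far fewer chances to
     * deadlock. */
    pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

    void put_v12_style(void)
    {
        pthread_mutex_lock(&event_lock);
        /* ... copy the event into the pool ... */
        pthread_mutex_unlock(&event_lock);
    }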
One of the major problems we have with V1.1 is the deletion of
processes that are still using the event manager: we end up with hung
locks, and the event pool fills up with dead get structures. Ensure
that when DECmcc starts up on a system, any leftover event manager
users have been terminated; this permits a clean startup of the event
manager. For V1.2 this will be less of a concern, as the event pool
will be largely self-policing.
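As an illustration of the kind of startup check I mean, a hedged
sketch using the documented $GETJPI wildcard scan; the "MCC_EVT"
process-name pattern is invented, and real names will differ:

    #include <jpidef.h>
    #include <ssdef.h>
    #include <starlet.h>
    #include <stdio.h>
    #include <string.h>

    /* Item-list entry for $GETJPI. */
    struct itm { unsigned short buflen, code;
                 void *buf; unsigned short *retlen; };

    int main(void)
    {
        unsigned int ctx = 0xFFFFFFFF;  /* -1 starts a wildcard scan */
        unsigned int pid = 0;
        char name[16];
        unsigned short namelen = 0;
        struct itm items[] = {
            { sizeof pid,      JPI$_PID,    &pid, 0 },
            { sizeof name - 1, JPI$_PRCNAM, name, &namelen },
            { 0, 0, 0, 0 }              /* terminator */
        };
        unsigned int status;

        for (;;)
        {
            status = sys$getjpiw(0, &ctx, 0, items, 0, 0, 0);
            if (status == SS$_NOMOREPROC) break;
            if (!(status & 1)) continue;   /* e.g. SS$_NOPRIV: skip */
            name[namelen] = '\0';
            if (strncmp(name, "MCC_EVT", 7) == 0)
                printf("leftover event user: pid %08X  %s\n",
                       pid, name);
        }
        return SS$_NORMAL;
    }

Stopping any processes such a scan reports (STOP/ID=pid, or SYS$DELPRC
from a program) before starting DECmcc releases their locks and get
structures.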
How often are you seeing the deadlock? When it occurs, are there many
events being put into the event manager? Just a few? One event? What
is your event sink process? Is it using other DECmcc or VMS locks?
How many threads is the event sink itself using? How many are
attempting to use mcc_event_put() at the same time?
Ted
|
| Thank you for your quick replies.
RE .1: We added the 1-second timer (using the MCC timer routine)
because we discovered that it fixed a strange problem we were seeing:
when we generated an event for one entity, occasionally an alarm rule
for a *different* entity would fire. This usually happened when we
generated a lot of events quickly (we used a test driver that looped,
constantly generating events), although it was not reproducible at
will. (We had only two weeks to test our prototype software, so we
did not have time to look into problems thoroughly; since the timer
fixed it, we assumed it was some sort of timing problem and accepted
it.)
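In outline, the workaround amounts to the loop below; put_one_event()
is a hypothetical wrapper (the real code calls mcc_event_put(), whose
argument list isn't shown here), and sleep() merely stands in for the
MCC timer routine we actually used:

    #include <unistd.h>

    /* Hypothetical wrapper around our real mcc_event_put() call. */
    extern int put_one_event(void);

    /* Test driver, with the 1-second pause that made the
     * cross-entity alarm firing disappear. */
    void event_driver(int count)
    {
        int i;
        for (i = 0; i < count; i++)
        {
            (void) put_one_event();
            sleep(1);   /* stand-in for the MCC timer routine */
        }
    }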
We do have the patch to enlarge the event pool because the customer
has seen INSEVTPOOLMEM a few times.
RE .2: It sounds like V1.2 may solve our problems. Fortunately, the
customer has not encountered the DEADLOCK error; we saw it only during
our testing here and, from your reply, it may have occurred because of
processes that were killed and did not free their locks. The customer
has a habit of rebooting the workstation whenever he needs to restart
the software, so that may be goodness.
To answer some of your other questions...
>>> How often are you seeing the deadlock? <<<
We saw it a few times (less than 10) during our 2 weeks of testing.
>>> When it occurs, are there many events being put into the event
manager? Just a few? One event? <<<
It varied: sometimes it would occur in the middle of a burst of events
(when we used the test driver mentioned in the first paragraph), and
sometimes in the middle of a slow series of events. I don't remember
it ever occurring on the first event after everything was freshly
started.
>>> What is your event sink process? Is it using other DECmcc or
VMS locks? How many threads is the event sink itself using? How many
are attempting to use mcc_event_put() at the same time? <<<
The event sink is a detached process using 2 threads; only 1 thread
calls mcc_event_put(). It also uses 2 other locks, acquired through
the DECmcc locking routines.
Since it's not a reproducible problem, we don't want to waste your
time. We were thinking that if we understood how the event pool
locking worked, we might be able to understand how a deadlock error
could occur. It seems likely that the problem was caused by old locks
that were hanging around.
We have added more debugging statements in hopes of trapping exactly
what is happening with the ENQLM error; that isn't reproducible,
either. If we find out more, we'll let you know.
Thanks again,
Cathy
|