
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

131.0. "QUESTION - ALARMS polling interval" by DSTEG1::MCCANN () Tue May 15 1990 16:41

I'm running the MCC EFT release (UT1.0.0).

I created a rule that was to be evaluated once per second (i.e. at
every ::1), and enabled the rule for 15 minutes.  I hoped to see
15 * 60 = 900 polls occur, but I saw only 310.  There was no
MCC$ALARMS_date_ERROR.LOG, and the rule was not disabled.

How does Alarms handle the situation when it cannot evaluate a rule
at its scheduled interval?

It looks as though the situation is ignored (at least there are no user
visible indications that a poll was missed), and polling continues at the
next scheduled polling time in the future.  Is this a correct conclusion?

Thanks,

Jack

131.1. "We make up to ten attempts before giving up" by WAKEME::ANIL () Tue May 15 1990 22:27 (36 lines)

>> How does Alarms handle the situation when it cannot evaluate a rule at
>> its scheduled interval?
>>
>> It looks as though the situation is ignored (at least there are no user
>> visible indications that a poll was missed), and polling continues at
>> the next scheduled polling time in the future.  Is this a correct
>> conclusion?
>>

   The algorithm used by Alarms is as follows:

   The rule is enabled for its scheduled time. If the IM (working as a
   scheduler in the MCC Kernel) indicates that the scheduled time has
   already passed, we reschedule the call. This loop runs no more than ten
   times. If after ten attempts we fail to schedule the call, the rule is
   disabled.

   In plain English, all this translates into "do your best to schedule the
   call. If you miss some (i.e. up to ten consecutive) attempts, that's OK!
   But do give up after ten attempts".
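
   A minimal sketch of the loop described above, in Python. It is not the
   Alarms source: the function name `schedule_next`, the constant
   `MAX_ATTEMPTS`, and the use of plain numbers for times are illustrative
   assumptions. It steps forward one period per attempt and gives up after
   ten slots that have already passed.

```python
MAX_ATTEMPTS = 10  # limit quoted above; the constant name is hypothetical


def schedule_next(last_slot, period, now, max_attempts=MAX_ATTEMPTS):
    """Return the next slot to run at, or None if the rule should be disabled.

    last_slot: time of the slot that was just evaluated
    period:    requested interval between evaluations
    now:       current time when rescheduling is attempted
    """
    slot = last_slot
    for _ in range(max_attempts):
        slot += period
        if slot > now:
            # Slot is still in the future, so it can be scheduled.
            return slot
        # Otherwise the scheduler would report "time already passed";
        # fall through and try the following slot.
    return None  # every attempt failed: the rule gets disabled
```

   With a 1-second period and rescheduling attempted at time 2.5, the
   sketch skips slots 1 and 2 and returns 3. If rescheduling is attempted
   at time 10.5 it returns None, because slots 1 through 10 have all passed.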

   Now, is this the best approach? Not really. Some day we would like to
   make the number of attempts a MGMT parameter of each rule, with 10
   as its default value.

   Now the question is: should Alarms bump the error count if a "time
   already passed" status comes back? The answer is yes! We may not be able
   to get this done before the EFT update, but we would like to try it for
   V1.0.


   Does this sound okay to you? Let us know if we are on the right track!

   Thanks,

   - Anil

131.2. "So, if I understand this correctly..." by DSTEG1::MCCANN () Wed May 16 1990 10:34 (41 lines)

Ok.  Let me see if I've got this right.

I have a rule whose expression I wish to have evaluated once every second.
However, say it takes 2.5 seconds to evaluate the expression (due to the
time it takes to go to the network and obtain the info required).

Would the flow of events look something like this:

TIME	ACTION
----    --------------------------------------------
 0      Rule begins scheduled evaluation

 1      Rule is still being evaluated, so we miss this scheduled evaluation.

 2      Rule is still being evaluated, so we miss this scheduled evaluation.

 2.5    Rule ends evaluation
        Try to schedule rule for evaluation at time 1, but time has passed.
        Try to schedule rule for evaluation at time 2, but time has passed.
        Schedule rule for evaluation at time 3 succeeds, time 3 has not passed.

 3      Rule begins scheduled evaluation.


Is the following statement true:

 "If a rule is being evaluated, the same rule will not begin
  its next evaluation until the first evaluation has completed."

This is as opposed to having multiple concurrent evaluations of the same rule.
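
The timeline above can be modelled in a few lines of Python. This is only a
sketch under stated assumptions: evaluations never overlap, every evaluation
takes the same time, missed slots are skipped, and a slot that coincides
exactly with the finish time counts as already passed. The name
`count_evaluations` is illustrative.

```python
import math


def count_evaluations(period, eval_time, window):
    """Count evaluations started in [0, window) when missed slots are skipped."""
    count = 0
    start = 0.0
    while start < window:
        count += 1
        finish = start + eval_time
        # Next slot on the period grid that lies strictly after the finish time.
        start = (math.floor(finish / period) + 1) * period
    return count
```

For a 1-second period and a 2.5-second evaluation, evaluations start at
0, 3, 6, ... so `count_evaluations(1, 2.5, 900)` gives 300 over 15 minutes
rather than 900. The real evaluation time behind the 310 polls reported
in 131.0 was not measured, so this is a model, not an explanation.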

>   Now the question is should alarm bump the error count if "time already
>   passed" status comes back? The answer is yes! We may not be able to  get
>   this done before EFT update, but we would like to try it for V1.0.

Bumping a counter (or logging an error) sounds like a good idea.  Do you
plan to support a COUNTERS attribute group for the alarms rule entity?

Thanks again,

Jack

131.3. "Yes we have COUNTERS and STATUS too for Rule subentity" by WAKEME::ANIL () Wed May 16 1990 19:59 (50 lines)

>     Is the following statement true:
>
>      "If a rule is being evaluated, the same rule will not begin
>       its next evaluation until the first evaluation has completed."
>
>This is as opposed to having multiple concurrent evaluations of the same rule.

  Yes, it is true that a rule will not be rescheduled until the previous
  evaluation is completed. Note that the time to evaluate is less than
  .1 seconds. I have not been able to schedule anything faster than
  .5 seconds.

>
>Bumping a counter (or logging an error) sounds like a good idea.  Do you
>plan to support a COUNTERS attribute group for the alarms rule entity?
>
>
  Yes, we do have counters! Enclosed is a sample output. The counter
  and status attributes will be available in the EFT update kit.


MCC 0 ALARMS RULE DTM_101
ALL ATTRIBUTES
AT 16-MAY-1990 18:49:45


                                   NAME = DTM_101
      +----------------------------------------------------------------+
      |       Result of Last Evaluation = True                         |
NEW   |                           State = Enabled                      |  Status
      |                        Substate = Running                      |
      |         Time of Last Evaluation = 16-MAY-1990 18:49:34.79      |
      +----------------------------------------------------------------+
      +----------------------------------------------------------------+
NEW   |                Evaluation Error = 0                            |
      |                Evaluation False = 0                            | Counter
      |                 Evaluation True = 2                            |
      +----------------------------------------------------------------+

                               Category = "LOG file notification"
                      Exception Handler = SYS$COMMON:[MCC]MCC$ALARMS_LOG_EXCEPTI
                                          ON.COM;3
                             Expression = (bridge 08-00-2b-07-b9-f3 Bad Hello Li
                                          mit > 10, AT every 00:00:30)
                              Parameter = "BINDU$DUA0:[ANIL.LOG]DTM_101.LOG"
                              Procedure = SYS$COMMON:[MCC]MCC$ALARMS_LOG_ALARM.C
                                          OM;3



131.4. "More on Evaluation Time" by WAKEME::ANIL () Wed May 16 1990 20:30 (38 lines)

RE: <<< Note 131.2 by DSTEG1::MCCANN >>>


+-------------------------------------------------------------------------------+
|Would the flow of events look something like this:                             |
|                                                                               |
|TIME    ACTION                                                                 |
|----    --------------------------------------------                           |
| 0      Rule begins scheduled evaluation                                       |
|                                                                               |
| 1      Rule is still being evaluated, so we miss this scheduled evaluation.   |
|                                                                               |
| 2      Rule is still being evaluated, so we miss this scheduled evaluation.   |
|                                                                               |
| 2.5    Rule ends evaluation                                                   |
|        Try to schedule rule for evaluation at time 1, but time has passed.    |
|        Try to schedule rule for evaluation at time 2, but time has passed.    |
|        Schedule rule for evaluation at time 3 succeeds, time 3 has not passed.|
|                                                                               |
| 3      Rule begins scheduled evaluation.                                      |
+-------------------------------------------------------------------------------+
The table you have made is essentially accurate. I would just like to
define the evaluation time as the processing time plus the time to fetch
the data, i.e.

	Time to evaluate = (Time to fetch the data)  +
			   (Time to process the data)

	Say Te = Tf + Tp

        Performance calculations done so far indicate that Tp < .1 seconds,
        whereas Tf
                for DECnet Phase IV is ~ 15-20 seconds
                for Bridge is ~ 1-2 seconds

        so you can see that most of the time is spent in network I/O.
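
As a rough worked example of Te = Tf + Tp, the Python sketch below computes
the share of Te spent fetching data. It takes Tp = 0.1 s as an upper bound
(the figures above say Tp < .1) and uses the low end of each quoted Tf range;
the function names are illustrative.

```python
def evaluation_time(tf, tp=0.1):
    """Te = Tf + Tp, all in seconds."""
    return tf + tp


def io_share(tf, tp=0.1):
    """Fraction of Te spent fetching data (network I/O)."""
    return tf / evaluation_time(tf, tp)


# Low end of each Tf range quoted above, with Tp at its upper bound.
bridge_share = io_share(1.0)    # Bridge, Tf ~ 1 s
decnet_share = io_share(15.0)   # DECnet Phase IV, Tf ~ 15 s
```

Even in the least favourable case (Bridge with Tf = 1 s) the fetch accounts
for more than 90% of Te; for DECnet Phase IV it is over 99%.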


131.5. "I agree - Te = Tf + Tp" by DSTEG1::MCCANN () Thu May 17 1990 15:37 (29 lines)

Yes.  When I used the term "evaluate a rule", I meant it to include both
fetching the data and checking the rule expression.

I like Te = Tf + Tp.  It's a good definition.

Since you've been so helpful, here's a couple more questions. :-)

Assume we have a rule that fetches a circuit characteristic from a remote
node4.  When the rule begins evaluation, it issues a request to the
DECnet IV AM.  The DECnet IV AM fetches the data (in a separate thread?
while the rule thread waits?).

In the act of fetching that data, a DECnet session is established
between MCC on the local node and NML on the remote node.  The data is
requested and returned using NICE, then the session terminates.  Is any of
the returned data processed by MCC before the session terminates?  Or
is the session terminated, then the data processed?

Perhaps I should define what I think I mean by "process the data".  I mean
the DECnet IV AM parses the NICE message, extracts the desired circuit
characteristic, and returns that characteristic to the alarms FM.  In turn,
the alarms FM checks the rule expression using the data it just received.

In this example, could we use the duration of the session as Tf?

Thanks again,

Jack

131.6. "More on Tf" by WAKEME::ANIL () Thu May 17 1990 21:59 (28 lines)

>    Perhaps I should define what I think I mean by "process the data".  I mean
>    the DECnet IV AM parses the NICE message, extracts the desired circuit
>    characteristic, and returns that characteristic to the alarms FM.  In turn,
>    the alarms FM checks the rule expression using the data it just received.
>
>    In this example, could we use the duration of the session as Tf?


From the Alarms perspective, Tf is the time between making the call and the
data being returned to Alarms. If we were to draw a generic time line for any
AM returning data as a result of a show request, it would look as follows:


                   |<-------------------- Tf ----------------------->|
                   |
        -----------+---+---------------------------------------------+--
       <- Alarms-->| ^ |<-- Validate the call  --------------------->|
          code     ^ |      Set up the context
                   | `      Lookup translation tables
                  call`     Get the data
                  to   `    translate back to MCC call parameters
                  AM    `
                         `
                         IM/Dispatcher

As you can see from the above picture, if the DECnet Phase IV AM has to incur
the overhead of establishing a session, that would still be part of Tf from
the Alarms viewpoint.
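
Measured this way, Tf is simply the wall-clock time around the call, whatever
happens inside it. A minimal Python sketch of that idea follows; `timed_call`
and `fake_fetch` are hypothetical names, and `fake_fetch` merely sleeps to
stand in for an AM request (it does not talk to MCC or DECnet).

```python
import time


def timed_call(fetch, *args, **kwargs):
    """Call fetch and return (result, elapsed seconds) as seen by the caller."""
    start = time.perf_counter()
    result = fetch(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_fetch(delay):
    """Stand-in for an AM request; sleeps to imitate session setup and I/O."""
    time.sleep(delay)
    return "data"


result, tf = timed_call(fake_fetch, 0.05)
```

Everything between the call and the return, including any session setup and
teardown inside the callee, is counted in `tf`.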

131.7. "That approach isn't very self limiting" by CAPN::SYLOR (Architect = Buzzword Generator) Mon May 21 1990 18:31 (31 lines)

In an earlier approach to this problem we used a slightly different algorithm 
for computing the "next" poll/evaluation. We took the "period" that the user
asked for (say 60 sec) and added it onto the time that we finished an
evaluation/poll and used that to determine the time of the start of the next
evaluation/poll. Example:

	evaluation period 60 sec.
	evaluation starts at 12:01:00
	it takes 10 secs to gather data, evaluate it, etc. finishing at 12:01:10
	start of next evaluation 12:02:10

Now in this example we actually take 70 seconds between evaluations.
The advantage of this approach was that we never completely "hog" the system.
In fact, the algorithm is self-stabilizing in that added load "slows the rate
of evaluation/polling", and so lowers the workload causing the overload.
We never have to worry about "missed evaluations"; in fact, all you need do is
keep a counter of how many evaluations have been done. Over a long interval,
it is possible to compute the "actual" rate at which the evaluations are done.
If you *really* wanted to get fancy, you could compute that rate, compare
it against the "desired rate", and if the actual rate were more than X% slower,
raise an alarm!
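
A small Python sketch of this scheme, with times expressed in seconds. The
function names and the percentage-based tolerance are assumptions made for
illustration, not a description of any shipped code.

```python
def next_start(finish_time, period):
    """Next evaluation starts one full period after the previous one finished."""
    return finish_time + period


def is_too_slow(completed, elapsed, period, tolerance_pct):
    """True if the measured rate is more than tolerance_pct below the desired rate.

    completed: number of evaluations counted over the interval
    elapsed:   length of the interval in seconds
    period:    requested period in seconds (desired rate is 1/period)
    """
    desired_rate = 1.0 / period
    actual_rate = completed / elapsed
    shortfall_pct = 100.0 * (desired_rate - actual_rate) / desired_rate
    return shortfall_pct > tolerance_pct


# Example from the text: start 12:01:00, finish 12:01:10, period 60 s.
finish = 12 * 3600 + 1 * 60 + 10
following = next_start(finish, 60)   # 12:02:10, as seconds since midnight
```

With a 60-second period and 10-second evaluations, 60 evaluations take
4200 seconds, so the actual rate is about 14% below the desired rate:
`is_too_slow(60, 4200, 60, 10)` is true, while a 20% tolerance would not
raise an alarm.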

I'm generally concerned about the "10 times then give up" rule. If the period
between evaluations is large (relative to the average length of a computation)
then it never detects a problem (even if a computation takes 10 times the
expected period). When you shorten the period (say for stress testing by a
group like DSTEG, or by a customer), you'll suddenly see the rules start
disabling themselves as, one by one, they give up. That won't look good on
a test.

					Mark