
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

131.0. "QUESTION - ALARMS polling interval" by DSTEG1::MCCANN () Tue May 15 1990 15:41

I'm running the MCC EFT release (UT1.0.0).

I created a rule that was to be evaluated once per second (i.e. at
every ::1), and enabled the rule for 15 minutes.  I hoped to see
15 * 60 = 900 polls occur, but I saw only 310.  There was no
MCC$ALARMS_date_ERROR.LOG, and the rule was not disabled.

How does Alarms handle the situation when it cannot evaluate a rule
at its scheduled interval?

It looks as though the situation is ignored (at least there are no user
visible indications that a poll was missed), and polling continues at the
next scheduled polling time in the future.  Is this a correct conclusion?

Thanks,

Jack

T.R	Title	User	Personal Name	Date	Lines
131.1	We make up to ten attempts before giving up	WAKEME::ANIL		`Tue May 15 1990 21:27`	36

>> How does Alarms handle the situation when it cannot evaluate a rule at
>> its scheduled interval?
>>
>> It looks as though the situation is ignored (at least there are no user
>> visible indications that a poll was missed), and polling continues at
>> the next scheduled polling time in the future.  Is this a correct
>> conclusion?
>>

The algorithm used by Alarms is as follows:

The rule is enabled for its scheduled time.  If the IM (working as a
scheduler in the MCC Kernel) indicates that the scheduled time has already
passed, we reschedule the call.  This loop runs no more than ten times.
If after ten attempts we fail to schedule the call, the rule gets disabled.

In plain English all this translates into "do your best to schedule the
call.  If you miss some (i.e. up to ten consecutive) attempts that's OK!
But do give up after ten attempts".

Now is this the best approach?  Not really.  Some day we would like to
make the number of attempts a MGMT parameter of each rule, with 10 as
its default value.

Now the question is: should Alarms bump the error count if a "time already
passed" status comes back?  The answer is yes!  We may not be able to get
this done before the EFT update, but we would like to try it for V1.0.

Does this sound okay to you?  Let us know if we are on the right track!

Thanks,

- Anil

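The loop Anil describes can be sketched in a few lines of Python.  This is
only an illustration, not the Alarms code: the `Rule` class, the `missed`
counter, and the `schedule_next` function are hypothetical names, and it
assumes a request fails whenever the requested time is not later than the
current time.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 10  # limit quoted in 131.1; not a per-rule parameter there


@dataclass
class Rule:
    name: str
    period: float         # seconds between scheduled evaluations
    enabled: bool = True
    missed: int = 0       # hypothetical count of "time already passed" results


def schedule_next(rule, next_time, now):
    """Return the accepted evaluation time, or None if the rule is disabled.

    `next_time` is the scheduled time being requested and `now` is the
    current time.  Assumption: a request is rejected when its time has
    already passed (next_time <= now).
    """
    for _ in range(MAX_ATTEMPTS):
        if next_time > now:
            return next_time       # the scheduler accepts this time
        rule.missed += 1           # "time already passed": count it, retry
        next_time += rule.period   # ask for the following scheduled time
    rule.enabled = False           # ten consecutive failures: give up
    return None
```

With a one-second period, a request for time 1 made at time 2.5 skips
times 1 and 2 and settles on time 3; the same request made at time 20
uses up all ten attempts and disables the rule.
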
131.2	So, if I understand this correctly...	DSTEG1::MCCANN		`Wed May 16 1990 09:34`	41

Ok.  Let me see if I've got this right.

I have a rule whose expression I wish to have evaluated once every second.
However, say it takes 2.5 seconds to evaluate the expression (due to the
time it takes to go to the network and obtain the info required).

Would the flow of events look something like this:

TIME  ACTION
----  --------------------------------------------
0     Rule begins scheduled evaluation

1     Rule is still being evaluated, so we miss this scheduled evaluation.

2     Rule is still being evaluated, so we miss this scheduled evaluation.

2.5   Rule ends evaluation
      Try to schedule rule for evaluation at time 1, but time has passed.
      Try to schedule rule for evaluation at time 2, but time has passed.
      Schedule rule for evaluation at time 3 succeeds, time 3 has not passed.

3     Rule begins scheduled evaluation.

Is the following statement true:

    "If a rule is being evaluated, the same rule will not begin
     its next evaluation until the first evaluation has completed."

This is as opposed to having multiple concurrent evaluations of the same rule.

> Now the question is should alarm bump the error count if "time already
> passed" status comes back? The answer is yes! We may not be able to get
> this done before EFT update, but we would like to try it for V1.0.

Bumping a counter (or logging an error) sounds like a good idea.  Do you
plan to support a COUNTERS attribute group for the alarms rule entity?

Thanks again,

Jack

131.3	Yes we have COUNTERS and STATUS too for Rule subentity	WAKEME::ANIL		`Wed May 16 1990 18:59`	50

> Is the following statement true:
>
>     "If a rule is being evaluated, the same rule will not begin
>      its next evaluation until the first evaluation has completed."
>
> This is as opposed to having multiple concurrent evaluations of the same rule.

Yes, it is true that a rule will not be rescheduled until the previous
evaluation is completed.  Note that the time to evaluate is less than .1
seconds.  I have not been able to schedule anything faster than .5 seconds.

>
> Bumping a counter (or logging an error) sounds like a good idea.  Do you
> plan to support a COUNTERS attribute group for the alarms rule entity?
>

Yes, we do have counters!  Enclosed is a sample output.  The counter and
status attributes will be available in the EFT update kit.

MCC 0 ALARMS RULE DTM_101
ALL ATTRIBUTES
AT 16-MAY-1990 18:49:45

NAME = DTM_101

         +----------------------------------------------------------------+
         | Result of Last Evaluation = True                               |
 NEW     | State = Enabled                                                |
 Status  | Substate = Running                                             |
         | Time of Last Evaluation = 16-MAY-1990 18:49:34.79              |
         +----------------------------------------------------------------+
         +----------------------------------------------------------------+
 NEW     | Evaluation Error = 0                                           |
         | Evaluation False = 0                                           |
 Counter | Evaluation True = 2                                            |
         +----------------------------------------------------------------+
           Category = "LOG file notification"
           Exception Handler = SYS$COMMON:[MCC]MCC$ALARMS_LOG_EXCEPTION.COM;3
           Expression = (bridge 08-00-2b-07-b9-f3 Bad Hello Limit > 10,
                         AT every 00:00:30)
           Parameter = "BINDU$DUA0:[ANIL.LOG]DTM_101.LOG"
           Procedure = SYS$COMMON:[MCC]MCC$ALARMS_LOG_ALARM.COM;3

131.4	More on Evaluation Time	WAKEME::ANIL		`Wed May 16 1990 19:30`	38

RE: <<< Note 131.2 by DSTEG1::MCCANN >>>

+-------------------------------------------------------------------------------+
|Would the flow of events look something like this:                             |
|                                                                               |
|TIME  ACTION                                                                   |
|----  --------------------------------------------                            |
| 0    Rule begins scheduled evaluation                                         |
|                                                                               |
| 1    Rule is still being evaluated, so we miss this scheduled evaluation.     |
|                                                                               |
| 2    Rule is still being evaluated, so we miss this scheduled evaluation.     |
|                                                                               |
| 2.5  Rule ends evaluation                                                     |
|      Try to schedule rule for evaluation at time 1, but time has passed.      |
|      Try to schedule rule for evaluation at time 2, but time has passed.      |
|      Schedule rule for evaluation at time 3 succeeds, time 3 has not passed.  |
|                                                                               |
| 3    Rule begins scheduled evaluation.                                        |
+-------------------------------------------------------------------------------+

The table you have made is essentially accurate.  I would just like to
define the evaluation time as the processing time plus the time to fetch
the data, i.e.

    Time to evaluate = (Time to fetch the data) + (Time to process the data)

Say

    Te = Tf + Tp

Performance calculations done so far indicate that

    Tp < .1 seconds

whereas Tf

    for DECNET Phase4 is ~ 15-20 seconds
    for Bridge        is ~ 1-2 seconds

so you can see that most of the time is spent in the network I/O.

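To see what the flow in the table implies for a 15-minute run, here is a
small Python sketch that counts evaluations for a given period and Te.
It is an illustration under one assumption that the thread does not spell
out: the next evaluation starts at the first multiple of the period that
falls strictly after the previous evaluation finished.  The function name
`count_evaluations` is made up for this example.

```python
import math


def count_evaluations(period, te, window):
    """Count evaluations started in [0, window) when each takes `te` seconds.

    Assumption: the next evaluation starts at the first multiple of
    `period` strictly after the previous evaluation finished, as in the
    table in 131.2 (finish at 2.5, next start at 3).
    """
    count = 0
    start = 0.0
    while start < window:
        count += 1
        finish = start + te  # te = tf + tp, fetch time plus processing time
        # first scheduled tick strictly after `finish`
        start = (math.floor(finish / period) + 1) * period
    return count
```

Under that assumption, count_evaluations(1.0, 2.5, 900.0) gives 300 and
count_evaluations(1.0, 0.05, 900.0) gives 900.  For comparison, the 310
polls reported in 131.0 work out to an average of roughly 2.9 seconds
between polls.
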
131.5	I agree - Te = Tf + Tp	DSTEG1::MCCANN		`Thu May 17 1990 14:37`	29

Yes.  When I used the term "evaluate a rule", I meant it to include both
fetching the data and checking the rule expression.

I like Te = Tf + Tp.  It's a good definition.

Since you've been so helpful, here's a couple more questions. :-)

Assume we have a rule that fetches a circuit characteristic from a remote
node4.  When the rule begins evaluation, it issues a request to the
DECnet IV AM.  The DECnet IV AM fetches the data (in a separate thread?
while the rule thread waits?).  In the act of fetching that data, a DECnet
session is established between MCC on the local node and NML on the remote
node.  The data is requested and returned using NICE, then the session
terminates.

Is any of the returned data processed by MCC before the session terminates?
Or is the session terminated, then the data processed?

Perhaps I should define what I think I mean by "process the data".  I mean
the DECnet IV AM parses the NICE message, extracts the desired circuit
characteristic, and returns that characteristic to the alarms FM.  In turn,
the alarms FM checks the rule expression using the data it just received.

In this example, could we use the duration of the session as Tf?

Thanks again,

Jack

131.6	More on Tf	WAKEME::ANIL		`Thu May 17 1990 20:59`	28

> Perhaps I should define what I think I mean be "process the data".  I mean
> the DECnet IV AM parses the NICE message, extracts the desired circuit
> characteristic, and returns that characteristic to the alarms FM.  In turn,
> the alarms FM checks the rule expression using the data it just received.
>
> In this example, could we use the duration of the session as Tf?

From the Alarms perspective, Tf is the time taken between making the call
and the data being returned to Alarms.  If we were to draw a generic time
line for any AM returning data as a result of a show request, it would look
as follows:

            |<--------------------- Tf ---------------------->|
            |
 -----------+---+---------------------------------------------+--
 <- Alarms->|   ^ |<-- Validate the call -------------------->|
    code    ^   |      Set up the context
            |   `      Lookup translation tables
           call `      Get the data
            to  `      translate back to MCC call parameters
            AM  `
                `
          IM/Dispatcher

As you can see from the above picture, if the DECNET Phase4 AM has to incur
the overhead of establishing a session, it would still be part of Tf from
the Alarms viewpoint.

131.7	That approach isn't very self limiting	CAPN::SYLOR	Architect = Buzzword Generator	`Mon May 21 1990 17:31`	31

In an earlier approach to this problem we used a slightly different
algorithm for computing the "next" poll/evaluation.  We took the "period"
that the user asked for (say 60 sec), added it onto the time that we
finished an evaluation/poll, and used that to determine the time of the
start of the next evaluation/poll.

Example:

    evaluation period 60 sec.
    evaluation starts at 12:01:00
    it takes 10 secs to gather data, evaluate it, etc., finishing at 12:01:10
    start of next evaluation 12:02:10

Now in this example we actually take 70 seconds between evaluations.

The advantage of this approach was we never completely "hog" the system.
In fact, the algorithm is self-stabilizing in that added load "slows the
rate of evaluation/polling", and lowers the workload causing the overload.

We never have to worry about "missed evaluations"; in fact all you need do
is keep a counter of how many evaluations have been done.  Over a long
interval, it is possible to compute the "actual" rate at which the
evaluations are done.  If you really wanted to get fancy, you could compute
that rate, compare it against the "desired rate", and if it was more than
X% slower, raise an alarm!

I'm generally concerned about the "10 times then give up" rule.  If the
period between evaluations is large (relative to the average length of a
computation) then it never detects a problem (even if a computation takes
10 times the expected period).  When you shorten the period (say for stress
testing by a group like DSTEG, or by a customer), you'll suddenly see the
rules start disabling themselves as they one by one give up.  That won't
look good on a test.

Mark

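Mark's scheme and his "compare actual rate against desired rate" check can
be sketched in Python as below.  This is a rough illustration only: the
function names `fixed_delay_starts` and `too_slow` and the percentage
parameter are invented here, and times are plain second offsets rather
than clock times.

```python
def fixed_delay_starts(period, te, window):
    """Start times in [0, window) when next start = previous finish + period."""
    starts = []
    start = 0.0
    while start < window:
        starts.append(start)
        start = start + te + period  # finish time plus the requested period
    return starts


def too_slow(count, window, period, tolerance_pct):
    """True when the achieved count is more than tolerance_pct below desired.

    `desired` is the number of evaluations the user asked for in `window`
    seconds; the threshold plays the role of the "X%" in 131.7.
    """
    desired = window / period
    return count < desired * (1 - tolerance_pct / 100)
```

With the numbers from the example (period 60, 10 seconds per evaluation),
the starts fall at 0, 70, 140, ... seconds, which is 52 evaluations in an
hour against a desired 60.  A 10% threshold would raise the alarm and a
20% threshold would not.
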