Title: | DECmcc user notes file. Does not replace IPMT. |
Notice: | Use IPMT for problems. Newsletter location in note 6187 |
Moderator: | TAEC::BEROUD |
Created: | Mon Aug 21 1989 |
Last Modified: | Wed Jun 04 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 6497 |
Total number of notes: | 27359 |
Hi,

I am trying to reduce the number of mail messages being sent to my customer. He is NOT interested if a circuit goes down for a couple of minutes, since the map changes colour anyway, but if a circuit is off the air for longer than that, he would like to be mailed.

I thought a few lines of DCL and a couple of extra alarm rules would fix his problem, but life isn't easy.

A simple rule would be:

    <If a circuit up rule fires within 2 minutes then delete the batch job which sends the mail message>

It would be nice if I could set up an alarm rule

    <if no circuit up rule fires within 2 minutes then send a mail message>

but life is a pain.

The "Occurs N Time" rule is the rule for the job! N needs to be set to a value of 1, but this causes:

    Software logic error detected,
    %MCC-E-INVALIDTIME, invalid time specified

"Invalid time" is a strange message, as the problem is the value of N. What values can count N be? 0 ------> 255

        MCC 0 ALARMS RULE waiting_for_hazard_up
        AT 27-OCT-1992 02:47:02 All Attributes

        NAME                      = waiting_for_hazard_up
        State                     = Enabled
        Substate                  = Running
        Time of Last Evaluation   = 27-OCT-1992 02:45:48.25
        Result of Last Evaluation = True
        Current Severity          = Major
        Creation Timestamp        = 27-OCT-1992 02:45:17.33
        Evaluation Error          = 0
        Evaluation True           = 1
        Evaluation False          = 1
        Expression                = (OCCURS (Domain SOUTHWATER Rule hazard_up
                                    OSI rule fired,2, 00:02:00))   ----> count n is > 1
        Procedure                 = DKB200:[MALLOY]MCC.COM;1
        Parameter                 = "$ delete/entry=442"
        Perceived Severity        = Major
        Probable Cause            = Unknown

        MCC 0 ALARMS RULE waiting_for_hazard_up
        AT 27-OCT-1992 02:49:45 All Attributes

        NAME                      = waiting_for_hazard_up
        State                     = Enabled
        Substate                  = Running
     -> Error Condition           = "Software logic error detected,
     ->                             %MCC-E-INVALIDTIME, invalid time specified "
        Time of Last Evaluation   = 27-OCT-1992 02:48:21.10
        Result of Last Evaluation = Error
        Current Severity          = Indeterminate
        Error Entity              = MCC 0 ALARMS RULE waiting_for_hazard_up
        Creation Timestamp        = 27-OCT-1992 02:47:59.74
------> Evaluation Error          = 1
        Evaluation True           = 0
        Evaluation False          = 0
        Expression                = (OCCURS (Domain SOUTHWATER Rule hazard_up
                                    OSI rule fired,1, 00:02:00))   --> count n = 1 (should n be > 1)
        Procedure                 = DKB200:[MALLOY]MCC.COM;1
        Parameter                 = "$ delete/entry=442"
        Perceived Severity        = Major
        Probable Cause            = Unknown

Gary
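One way to picture what the post expects from an "occurs N times within a time window" test is a sliding-window count, where N = 1 is simply the case that fires on the first event. The following is a minimal Python sketch under that assumption. It is not DECmcc's implementation, and the class name `OccursRule` and its `record` method are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class OccursRule:
    """Hypothetical stand-in for an 'occurs N times within a window' test."""

    def __init__(self, count: int, window: timedelta):
        if count < 1:
            raise ValueError("count must be at least 1")
        self.count = count
        self.window = window
        self._times = deque()  # timestamps of events still inside the window

    def record(self, when: datetime) -> bool:
        """Record one event; return True if `count` events lie in the window."""
        self._times.append(when)
        # Drop events that are older than the window ending at `when`.
        while when - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times) >= self.count


# Usage: with count=1 the very first event satisfies the rule.
t0 = datetime(1992, 10, 27, 2, 45, 0)
once = OccursRule(count=1, window=timedelta(minutes=2))
twice = OccursRule(count=2, window=timedelta(minutes=2))
first_once = once.record(t0)
first_twice = twice.record(t0)
second_twice = twice.record(t0 + timedelta(minutes=1))
late_twice = twice.record(t0 + timedelta(minutes=5))
```

Under these assumed semantics there is nothing special about N = 1, which matches the poster's point that the "invalid time" message does not describe the real problem.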
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
3970.1 | QAR'd in mcc012_ext | MCC1::DITMARS | Pete | Wed Oct 28 1992 14:16 | 1 |
QAR 516, entered against alarms.
3970.2 | options, options... | CTHQ::WOODCOCK | | Sun Nov 01 1992 01:09 | 34 |
Hi Gary,

I have done a little of this in the past, and this is my experience, which may (or may not) apply to you with the current version.

For circuit events (DECnet) I don't fire ANY alarm procedures. I simply use the raw event to update the map. I took this approach because our environment watches dozens of circuits, including multiple vendors and multiple physical media (i.e. fiber, satellite). There have been many instances when a circuit begins bouncing at a rate of 1-2 times a second. I suspect it is even faster, but the router can't get the events in quickly enough. If I were using alarms with batch, these scenarios would drop (and have dropped) our system off the map.

What I do instead is incorporate a polling interval as a backup: poll every 2, 5, or even 30 minutes, just to ensure we haven't missed anything from events. This also keeps the map updated periodically. If a poll says it's down, then it is most likely really down and worthy of mail. During the day someone will already have caught the color change from the event, so the mail isn't actually needed in real time. At night we only log in once or twice anyway, so we aren't real time ourselves and the mail doesn't need to be either.

As a second option, I also have another solution that uses strictly events. Because events are the only means of determining an outage on a different network, a little DCL was used to buffer the mail. While I'm not doing circuit management there, the same method could apply. Have two command procedures, one for downs and one for ups (DOWN.COM and UP.COM). When the down event comes in, DOWN.COM sets a logical (or creates a file) and then goes into a two-minute wait. If the circuit comes back up, UP.COM searches for the logical and deletes it. When DOWN.COM comes out of the wait, it checks for the logical: if it's still there, send mail; if not, exit. This works very effectively so long as you don't get the streaming event problem described above.

best regards,
brad...
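The DOWN.COM / UP.COM buffering described in .2 can be sketched as follows. This is a Python illustration of the logic only, not the DCL procedures themselves: the class `MailBuffer`, its method names, and the `send_mail` callback are hypothetical, a Python set stands in for the logical name (or flag file), and the two-minute wait is represented by the caller invoking `after_wait` once the wait has expired. Clearing the flag after mailing is an added assumption, not something .2 states.

```python
class MailBuffer:
    """Flag-based buffering of 'circuit down' mail (hypothetical names)."""

    def __init__(self, send_mail):
        self.send_mail = send_mail  # callable taking the circuit name
        self.down_flags = set()     # stands in for logical names / flag files

    def on_down(self, circuit):
        # DOWN.COM, first half: set the flag, then go into the wait.
        self.down_flags.add(circuit)

    def on_up(self, circuit):
        # UP.COM: look for the flag and delete it if present.
        self.down_flags.discard(circuit)

    def after_wait(self, circuit):
        # DOWN.COM, second half: the wait is over; mail only if the flag survived.
        if circuit in self.down_flags:
            self.down_flags.discard(circuit)  # assumption: clear after mailing
            self.send_mail(circuit)
            return True
        return False


# Usage: one circuit recovers during the wait, the other stays down.
sent = []
buf = MailBuffer(sent.append)
buf.on_down("CIRCUIT-A")
buf.on_up("CIRCUIT-A")          # came back before the wait expired
buf.on_down("CIRCUIT-B")        # never comes back
mailed_a = buf.after_wait("CIRCUIT-A")
mailed_b = buf.after_wait("CIRCUIT-B")
```

A single flag per circuit cannot tell overlapping down events apart, which is one reason the approach depends on not hitting the streaming event problem mentioned in .2.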
3970.3 | workable VMS approach | CSOADM::ROTH | Kick out the jams! | Tue Nov 10 1992 23:51 | 7 |
I did a similar thing to .2. I create a 'flag' file for each down or up, with the name of the entity incorporated in the file name. Before deciding to act (page, in this case), I do a

    delete/bef="-:10" filename.ext

and then use the f$search lexical to see if a file is still there. If a file is found, I skip the notification: it is too soon since the last one.

Lee
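The flag-file check in .3 amounts to rate-limiting notifications per entity. Below is a minimal Python sketch of that idea, with several assumptions: the function name `should_notify` is hypothetical, the `"-:10"` in the DELETE command is read as "older than ten minutes" (hence the 600-second default), and creating a fresh flag file when a notification goes out is inferred from the description rather than stated in it.

```python
import os
import tempfile
import time


def should_notify(flag_path, min_gap_seconds=600, now=None):
    """Return True if no recent flag file exists, creating one as a side effect."""
    now = time.time() if now is None else now
    # Counterpart of the DELETE/BEFORE step: remove the flag if it is too old.
    if os.path.exists(flag_path) and now - os.path.getmtime(flag_path) > min_gap_seconds:
        os.remove(flag_path)
    # Counterpart of the f$search step: a surviving flag means "too soon".
    if os.path.exists(flag_path):
        return False
    # Assumption: leave a new flag to mark this notification.
    with open(flag_path, "w"):
        pass
    return True


# Usage: the second call within the gap is suppressed, a later one is allowed.
with tempfile.TemporaryDirectory() as tmp:
    flag = os.path.join(tmp, "circuit_a.flag")
    first = should_notify(flag)
    second = should_notify(flag)
    later = should_notify(flag, now=time.time() + 700)
```

Keeping the state in the file's modification time means no separate bookkeeping is needed, which is the appeal of the approach in .3.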
3970.4 | The problem in .0 has been corrected | TRM::KWAK | | Wed Dec 16 1992 17:58 | 9 |
RE: .0 and .1

The problem with the alarms "OCCURS N Time" rule when N is 1 has been fixed. The next release will include the changes.

Thanks.
William