
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

5597.0. "Need help writing an alarm rule" by ZPOVC::VENKAT () Thu Sep 16 1993 23:15

    Hello,
    
    I want to write an Alarm rule which should fire whenever a alarm
    rule is 'CLEAR'ed. What I mean is for example whenever a previously 
    down circuit becomes up MCC fires a General polling alarm and clears 
    the previously fired rule. I want to fire another alarm whenever 
    this happens.
    
    How can I do this ?
    
    Venkat,
    Asia Pacific Network Operations Center,
    Singapore.

T.R	Title	User	Personal Name	Date	Lines
5597.1	Got a solution, not sure if the problem fits...	BIKINI::KRAUSE	European NewProductEngineer for MCC	`Mon Sep 20 1993 04:43`	12
	I'm not sure I understood your problem. What do you mean by 'Clear'ing a rule? If you mean Disable or Delete, you could write a rule that picks up rule events, e.g. (occurs(domain abc rule xyz, rule disabled)) (occurs(domain abc rule xyz, rule deleted)). You might use the Data Collector to send an event directly from the action procedure of your first rule. If you just want to notify and not take further action, this would save you writing a second rule. *Robert
5597.2	Hint, hint...	BIKINI::KRAUSE	European NewProductEngineer for MCC	`Mon Sep 20 1993 04:48`	7
	And - please be more specific in your notes titles. 'Help!' is skipped by most noters. You can change your title to something more meaningful with e.g. SET NOTE 5597.0/title="Need help writing an alarm rule" Thanks, *Robert
5597.3	Reply to .1	ZPOVC::VENKAT		`Mon Sep 20 1993 05:27`	21
	Reply to .1 OK.. Let me be more specific! My requirement is that I want MCC to keep track of circuit availability. What I intend to do is write an entry into a log file whenever a circuit goes down, and this can be done by the command procedure that is invoked when the associated alarm rule fires. As you probably know, when the circuit comes up again, a general alarm rule is fired and the iconic map changes color to green. However, no command procedure is invoked. I would like to have a command procedure invoked when this happens, so that I can write an entry into a log file and, by this process, keep track of circuit availability. I hope this is clear! Thanks and Regards, Venkat.
5597.4	To keep the story together...	BIKINI::KRAUSE	European NewProductEngineer for MCC	`Tue Sep 21 1993 03:52`	50
	From: ZPOVC::VENKAT "Venkat Narayanan @ZPO" 20-SEP-1993 21:05:37.83 To: BIKINI::KRAUSE CC: VENKAT Subj: U: Need your help please .. Hi, Thanks for your reply to my note #5597 in the TOOK::MCC conference. I need your help if possible: by using MCC, I want to keep track of all circuit outages. What I intend to do is create a log file and put entries into it whenever a circuit goes down and comes back up. I have written alarm rules which fire when the circuit is down, the alarm expression being: (NODE4 * CIRCUIT * SUBSTATE <> NONE, AT EVERY 00:10:00) When this alarm rule fires, the Iconic Map changes color to RED. A command procedure is invoked, and this DCL script performs a set of actions like writing an entry into a log file, sending an E-Mail etc. However, when the circuit comes up again, the rule that fired previously is cleared and the map changes color to green, but no command procedure is invoked. So my question is: "Is it possible to have a DCL procedure invoked when this happens (that is, when a rule is cleared)?" If you have any suggestions, please let me know. Thanks and Regards, Venkat. ------------------------------------------------------------------------------- Venkat, Now I know what you want to do. This functionality (detecting that a rule no longer fires) has been on the wishlist for a long time. As an alternative you could use a pair of change_of rules with appropriate action routines, e.g. (change_of(node4 * circuit * substate, , none), at every 00:10:00) (change_of(node4 * circuit * substate, none, *), at every 00:10:00) Got the idea? Robert
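To make the change_of pair concrete, here is a minimal Python sketch of the edge detection those two rules perform on successive 10-minute polls. It is not MCC code: it reads the empty argument in the first rule as "any value", treats substate `none` as a running circuit (as in Venkat's `SUBSTATE <> NONE` expression), and uses made-up substate values and timestamps. It only shows which poll pairs would trigger which rule, and therefore when an action routine would write a DOWN or UP log entry.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Substate of a circuit that is running normally (assumption: matches the
# "SUBSTATE <> NONE" alarm expression used in this thread).
NORMAL = "none"


def circuit_transitions(
    polls: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (poll_time, "DOWN" | "UP") roughly like the pair of change_of rules.

    substate leaves NONE        -> "DOWN" (rule with from=none, to=*)
    substate returns to NONE    -> "UP"   (rule with from=<any>, to=none)
    The first poll only establishes a baseline, so it never yields anything.
    """
    previous: Optional[str] = None
    for poll_time, substate in polls:
        if previous is not None:
            if previous == NORMAL and substate != NORMAL:
                yield (poll_time, "DOWN")
            elif previous != NORMAL and substate == NORMAL:
                yield (poll_time, "UP")
        previous = substate


if __name__ == "__main__":
    # Hypothetical 10-minute samples for one circuit.
    samples: List[Tuple[str, str]] = [
        ("10:00", "none"),
        ("10:10", "synchronizing"),
        ("10:20", "starting"),  # still down: no second DOWN entry
        ("10:30", "none"),
        ("10:40", "none"),
    ]
    for entry in circuit_transitions(samples):
        print(*entry)
```

Only two log entries result (10:10 DOWN, 10:30 UP), whereas the `SUBSTATE <> NONE` rule fires at every poll during the outage. The next reply points out the catch: with two separate rules, the map colors may not correlate.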
5597.5	good but color probs	CTHQ::WOODCOCK		`Tue Sep 21 1993 08:58`	18
	> (change_of(node4 * circuit * substate, , none), at every 00:10:00)
> (change_of(node4 * circuit * substate, none, *), at every 00:10:00)
This is a very good idea, but it does come with restrictions. By using 2 separate rules the colors won't correlate (I don't think). In other words, when the circuit goes down the link shows 'red'; when it comes back up the clear rule fires but the link still shows 'red'. It would work with propagation equal to LATEST, but this setting brings on a whole other set of worries with not seeing errors on the map, and I wouldn't recommend it. If memory serves me, there is still another potential solution for getting the colors right using the above rules, but I don't know if it ever worked: using MCC internal events for notification instead of the general alarm notify. I don't recall the syntax. best regards, brad...
5597.6	One possible solution	TOOK::NAVKAL		`Tue Sep 21 1993 12:56`	19
	Okay, I will byte. When an Alarms rule fires, the severity of the rule is whatever the user indicated when s/he created the rule. However, when the rule detects that the alarming condition no longer exists, it fires with severity "Clear". Now what we need to do is use this fact in a very creative way (what is otherwise known as a hack!). Let's say rule A is the one looking for the alarming condition. When rule A fires, let it invoke a command procedure that checks the severity. Severity is passed as a parameter to the command procedure. If the severity is anything other than what was indicated when the rule was created, you have basically detected the "clear" condition. Did I answer the question? Anil Navkal
5597.7	no fire on clear	CTHQ::WOODCOCK		`Tue Sep 21 1993 15:54`	110
	> Okay, I will byte. When an Alarms rule fires, the severity of the rule is whatever the user indicated when s/he created the rule. However, when the rule detects that the alarming condition no longer exists, it fires with severity "Clear". Now what we need to do is use this fact in a very creative way (what is otherwise known as a hack!). Let's say rule A is the one looking for the alarming condition. When rule A fires, let it invoke a command procedure that checks the severity. Severity is passed as a parameter to the command procedure. If the severity is anything other than what was indicated when the rule was created, you have basically detected the "clear" condition. Did I answer the question?
Nope. Rules do not fire the procedure when they CLEAR. When a rule transitions from some_severity to clear, an internal event is generated. This event clears the map, but no external procedures are run. Your idea can be used, but you must write a rule on the internal events of the rules and then parse for the CLEAR severity. Something like: (occurs(domain x rule *, OSI RULE FIRED)) Now look for a CLEAR severity and hack at it. Note that even this solution can get messy if there are -many,many- domains. You could wildcard the domain also, but you get into trouble if MCC has multiple uses with rules firing for different business disciplines. So what would I do (or did do) to solve this? AVAILABILITY metrics are a trend and for us are only computed once a month. Therefore we only need something statistically close. We poll everything every 10 minutes with the basic general rule. When this rule fires (and it fires EVERY 10 minutes during an outage), an entry is placed into a monthly log file. Therefore, statistically speaking, if a circuit is down, every entry in the log equates to a 10 minute outage. No entry, no outage.
After the end of the month a procedure counts the circuits and nodes being monitored via a configuration file, figures out how many days there were in the previous month, then determines the total polls for all ckts/nodes for that month. This is done for every CLASS entity (i.e. node4, snmp, etc.). It then searches the log, assumes every entry equals a 10 minute outage, counts them, and derives an availability percentage. One can argue the accuracy of assuming every entry equals a ten minute outage, so let's spend a minute on this also. Regardless of what POLLING technique you use, this assumption is the only one which can be made. Even if you use the MCC internal CLEAR event, the accuracy is still determined by the original poll rate of the first alarm. The ONLY way to increase accuracy is to decrease the polling interval or go to rules on entity events if your system can handle it (ours cannot). You would think that counting the whole polling interval as outage would tend to lower the AVAILABILITY metric, but also note that some outages are missed completely because they fall between polls. I contend it all comes out clean, statistically, in the wash (see ESC metrics below). So...if you need AVAILABILITY metrics on a monthly basis, why not just do as described above?? If you want the outage duration announced in real time by the alarms, then you'll have to pursue the other methods, but otherwise this all seems like a lot more work than actually required. cheers, brad...
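Brad's "comes out in the wash" argument can be checked with a little arithmetic. The Python sketch below assumes polls at exact multiples of a fixed interval, whole-minute outage start times, and a start time that is equally likely to fall at any minute offset within a polling interval (an assumption, not something stated in the thread). Each poll that lands inside an outage is logged as one full interval. Averaged over all offsets, the logged outage time then equals the true duration: short outages are sometimes missed entirely and sometimes counted as a whole interval, and the two effects cancel on average.

```python
from fractions import Fraction


def polls_inside(start: int, duration: int, interval: int) -> int:
    """Number of poll instants k*interval (k >= 0) in [start, start + duration).

    Uses ceiling division, which is valid here for non-negative integers.
    """
    first = -(-start // interval)                  # first poll index at or after start
    past_end = -(-(start + duration) // interval)  # first poll index at or after the end
    return past_end - first


def mean_recorded_minutes(duration: int, interval: int) -> Fraction:
    """Average logged outage (polls_inside * interval) over every whole-minute
    offset of the outage start within one polling interval."""
    offsets = range(interval)
    logged = sum(polls_inside(s, duration, interval) * interval for s in offsets)
    return Fraction(logged, len(offsets))


if __name__ == "__main__":
    for minutes in (4, 10, 25, 61):
        missed = sum(1 for s in range(10) if polls_inside(s, minutes, 10) == 0)
        print(minutes, mean_recorded_minutes(minutes, 10), f"missed at {missed}/10 offsets")
```

A 4-minute outage, for example, is missed at 6 of the 10 offsets and logged as 10 minutes at the other 4, which averages to 4 minutes. As Brad says, the per-outage error is bounded by the polling interval; only the average is unbiased.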
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
USAMTS>ty DISK$MCC:[MCC.MENU]MCC_AVL_AUG.DAT;3

MCC_AVL_AUG
Availability Metrics
----------------------------------

DECnet
   (67) DECnet routers with (75) circuits were polled every ten minutes
   (2411) 10 minute circuit outages were recorded
   (60) 10 minute DECnet router outages were recorded
   TOTAL DECnet ROUTER AVAILABILITY = 99.98%
   TOTAL DECnet NETWORK/CIRCUIT AVAILABILITY = 99.28%
   TOTAL DECnet ROUTER DPMO = 200 (OFD=1)
   TOTAL DECnet ROUTER DPMO = 28 (OFD=7)
............................................................................
TCP/IP
   (10) TCP/IP routers with (14) circuits were polled every ten minutes
   (219) 10 minute circuit outages were recorded
   (379) 10 minute TCP/IP router outages were recorded
   TOTAL TCP/IP ROUTER AVAILABILITY = 99.16%
   TOTAL TCP/IP NETWORK/CIRCUIT AVAILABILITY = 99.65%
   TOTAL TCP/IP ROUTER DPMO = 8490 (OFD=1)
   TOTAL TCP/IP ROUTER DPMO = 1212 (OFD=7)
............................................................................
WATN
   (81) WATN hosts
   (207) WATN host outages were recorded
   TOTAL WATN HOST UNAVAILABILITY = 5:4:50:32 (d:h:m:s)
   AVERAGE WATN HOST OUTAGE = 36 minutes
   TOTAL WATN HOST AVAILABILITY = 99.80%
   TOTAL WATN HOST DPMO = -304 (OFD=1 computed by minute)
   TOTAL WATN HOST DPMO = -43 (OFD=7 computed by minute)
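The month-end computation Brad describes can be sketched in Python as below. It assumes availability = 1 - outages / total polls, and DPMO = outages / (polls x OFD) x 10^6 truncated to an integer, with OFD read from the report as a simple divisor. These formulas are inferred, not taken from Brad's procedure, but for August (31 days) they reproduce the DECnet router and circuit figures in the report above. Entity counts come from the report; counting the log entries, which depends on the log format, is left out.

```python
import calendar
import datetime
from typing import Tuple

POLL_MINUTES = 10  # polling interval used in this thread


def previous_month(today: datetime.date) -> Tuple[int, int]:
    """(year, month) of the month before `today`."""
    last_day_prev = today.replace(day=1) - datetime.timedelta(days=1)
    return last_day_prev.year, last_day_prev.month


def polls_in_month(year: int, month: int, entities: int,
                   interval: int = POLL_MINUTES) -> int:
    """Total polls of `entities` objects, each polled every `interval` minutes."""
    days = calendar.monthrange(year, month)[1]
    return entities * days * 24 * 60 // interval


def availability_pct(outages: int, polls: int) -> float:
    """Percentage of polls that found the entity up (assumed formula)."""
    return round(100.0 * (1 - outages / polls), 2)


def dpmo(outages: int, polls: int, ofd: int = 1) -> int:
    """Defects per million opportunities, truncated (assumed formula)."""
    return outages * 1_000_000 // (polls * ofd)


if __name__ == "__main__":
    year, month = previous_month(datetime.date(1993, 9, 1))  # August 1993
    router_polls = polls_in_month(year, month, entities=67)
    circuit_polls = polls_in_month(year, month, entities=75)
    print("DECnet router availability", availability_pct(60, router_polls))
    print("DECnet circuit availability", availability_pct(2411, circuit_polls))
    print("DECnet router DPMO", dpmo(60, router_polls), dpmo(60, router_polls, ofd=7))
```

The same formula gives 99.15% rather than the reported 99.16% for the TCP/IP routers, so the real procedure presumably rounds or counts slightly differently.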
5597.8	Brad is right!	TOOK::NAVKAL		`Tue Sep 21 1993 20:09`	16
	You got me, Brad! Tells me how stale my Alarms knowledge is now. We went back and forth on the rule Clear event so many times that I just didn't know the last "state" of the rule Clear event. Thanks for a complete answer, Brad. Now I also remember the justification for not executing the rule fire procedure: as most of these procedures are written to handle the alarming situation, it just was not appropriate to execute the same procedure when the alarming situation was actually cleared. And yes, I can also see why things can get really messy with global wildcarding. Anyway, Venkat, I hope you got what you were looking for. Anil Navkal
5597.9	Thank you !	ZPOVC::VENKAT		`Wed Sep 22 1993 01:10`	15
	Hello, Thanks to everyone for their comments and views. I have decided to go with what Brad suggested. However, I still wish the user could decide whether a procedure should be fired when an alarm rule is 'CLEAR'ed. This would help to take a set of actions when an outage or alarming event returns to normal. Thanks again, Venkat.
5597.10	It can be solved!	BERN01::GMUER		`Thu Sep 23 1993 04:59`	28
	Hello, The problem can be solved, but it needs some coding. MCC_REACH is one example of this; see note 24 in conference MCC-TOOLS. Basic idea:
1) Put the rule in a separate domain X, where notification for alarm rules is off.
2) Create a rule for OSI rule fired and OSI rule exception events in domain X with an alarm trigger procedure OSI_RULE_FIRED.COM.
3) Parse the input parameters in OSI_RULE_FIRED.COM and check the severity.
4) Send a data collector event with the actual severity to the target entity in domain Y on the iconic map. You need a collector in domain Y and a notify request for collector events in the root domain of Y. MCC_REACH maintains a list of domain memberships to find the correct domain Y.
Important: the event title in the alarm event and in the clear event must be the same to get the desired alarm correlation behaviour. We have implemented this alarming for a customer here in Switzerland based on MCC_REACH. I have improved the original OSI_RULE_FIRED.COM, fixing some bugs and implementing an escalation mechanism. We also use the MCC_REACH logfile to make reports about host and server availability. Edgar
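Steps 3 and 4 and the "same event title" rule can be illustrated with a small Python sketch. It assumes the parameters passed to OSI_RULE_FIRED.COM have already been split into rule name, severity and target entity (the real parameter layout is not given in this thread), and it models the data collector event as a plain record instead of calling MCC. `RuleFired`, `CollectorEvent`, `to_collector_event` and the domain map are hypothetical names standing in for what MCC_REACH does. The point is that the title is built only from the rule and the entity, never from the severity, so an alarm and its later clear carry identical titles.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RuleFired:
    """Hypothetical, already-parsed content of an OSI rule fired event."""
    rule: str      # name of the alarm rule in domain X
    severity: str  # e.g. "major" when alarming, "clear" when the condition ends
    entity: str    # target entity shown on the iconic map


@dataclass(frozen=True)
class CollectorEvent:
    """Hypothetical record of the data collector event to send (step 4)."""
    domain: str
    entity: str
    title: str
    severity: str


def to_collector_event(ev: RuleFired,
                       domain_of: Dict[str, str]) -> Optional[CollectorEvent]:
    """Map a rule fired event to a collector event for domain Y, or None if the
    entity is not in any known map domain (MCC_REACH keeps such a membership list)."""
    domain = domain_of.get(ev.entity)
    if domain is None:
        return None
    # Title depends only on rule + entity, so the clear correlates with the alarm.
    title = f"{ev.rule} on {ev.entity}"
    return CollectorEvent(domain, ev.entity, title, ev.severity.upper())


if __name__ == "__main__":
    domains = {"NODE4 ROUTER1 CIRCUIT SYN-0": "Y"}  # hypothetical membership list
    down = to_collector_event(RuleFired("CIRCUIT_DOWN", "major", "NODE4 ROUTER1 CIRCUIT SYN-0"), domains)
    up = to_collector_event(RuleFired("CIRCUIT_DOWN", "clear", "NODE4 ROUTER1 CIRCUIT SYN-0"), domains)
    print(down)
    print(up)
```

The same handler is also a natural place to append DOWN/UP lines to a log file, which is how Edgar's customer gets availability reports from the MCC_REACH logfile.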
5597.11	MCC_REACH lives on	CTHQ::WOODCOCK		`Thu Sep 23 1993 09:17`	17
	> The problem can be solved, but it needs some coding. MCC_REACH is one example for this, see note 24 in conference MCC-TOOLS.
Hi Edgar, You've made my day!!! MCC_REACH was semi-written while I was on a customer site. When I came back I cleaned it up a bit for any future opportunities I might go on. I posted it because it looked like it would solve some problems others might be interested in. Glad to see you found it useful, and IMPROVED it! I always get a kick when someone mentions they use it. Why? Because since I posted the last version I've never had any opportunity to use and test it in a production environment myself :). cheers, brad...