
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

5597.0. "Need help writing an alarm rule" by ZPOVC::VENKAT () Fri Sep 17 1993 00:15

    Hello,
    
    I want to write an alarm rule that fires whenever another alarm
    rule is 'CLEAR'ed. For example, when a previously down circuit
    comes back up, MCC fires a General polling alarm and clears
    the previously fired rule. I want to fire another alarm whenever
    this happens.
    
    How can I do this ?
    
    Venkat,
    Asia Pacific Network Operations Center,
    Singapore.
T.R  Title  User  Personal Name  Date  Lines

5597.1. "Got a solution, not sure if the problem fits..." by BIKINI::KRAUSE (European NewProductEngineer for MCC) Mon Sep 20 1993 05:43 (12 lines)

I'm not sure I understood your problem. What do you mean by
'Clear'ing a rule? If you mean Disable or Delete, you could write a rule 
that picks up rule events, e.g.

	(occurs(domain abc rule xyz, rule disabled))
	(occurs(domain abc rule xyz, rule deleted))

You might use the Data Collector to send an event directly from the 
action procedure of your first rule. If you just want to notify and not 
take further action, this would save you writing a second rule.

*Robert

5597.2. "Hint, hint..." by BIKINI::KRAUSE (European NewProductEngineer for MCC) Mon Sep 20 1993 05:48 (7 lines)

And - please be more specific in your notes titles. 'Help!' is skipped 
by most noters. You can change your title to something more meaningful 
with e.g. SET NOTE 5597.0/title="Need help writing an alarm rule"

Thanks,

*Robert

5597.3. "Reply to .1" by ZPOVC::VENKAT () Mon Sep 20 1993 06:27 (21 lines)
    
    Reply to .1
    
    OK.. Let me be more specific !
    
    My requirement is that MCC should keep track of the availability of
    circuits. What I intend to do is write an entry into a log file
    whenever a circuit is down. This can be done by the command procedure
    that is invoked when the associated alarm rule fires.
    
    As you know, when the circuit comes up again, a general alarm rule
    is fired and the iconic map changes color to green. However, no
    command procedure is invoked. I would like to have a command procedure
    invoked when this happens, so that I can write an entry into the log
    file and in this way keep track of the availability of circuits.
    
    I hope this is clear !
    
    Thanks and Regards,
    
    Venkat.

5597.4. "To keep the story together..." by BIKINI::KRAUSE (European NewProductEngineer for MCC) Tue Sep 21 1993 04:52 (50 lines)

From:	ZPOVC::VENKAT       "Venkat Narayanan @ZPO" 20-SEP-1993 21:05:37.83
To:	BIKINI::KRAUSE
CC:	VENKAT
Subj:	U: Need your help please ..

Hi,

Thanks for your reply to my note #5597 in TOOK::MCC conference. I need your help
if possible :

I want to use MCC to keep track of all circuit outages. What I intend to do
is create a log file and write an entry to it whenever a circuit goes down
and whenever it comes back up.

I have written Alarm rules which will fire when the circuit is down, the
alarm expression being :

	(NODE4 * CIRCUIT * SUBSTATE <> NONE, AT EVERY 00:10:00)

When this alarm rule fires, the Iconic Map changes color to RED. A command procedure
is invoked and this DCL script will perform a set of actions like writing an 
entry into a log file, sending an E-Mail etc..

However when the circuit comes up again, the rule that was fired previously 
will be cleared and the map changes color to Green. However when this happens, 
no command procedure is fired. So my question is "Is it possible to have a 
DCL procedure invoked when this happens (that is a rule is cleared ) ?".
If you have any suggestions, please let me know.


Thanks and Regards,

Venkat.

-------------------------------------------------------------------------------

Venkat,

Now I know what you want to do. This functionality (detecting that a rule 
no longer fires) has been on the wishlist for a long time.

As an alternative you could use a pair of change_of rules with appropriate 
action routines, e.g.

	(change_of(node4 * circuit * substate, *, none), at every 00:10:00)
	(change_of(node4 * circuit * substate, none, *), at every 00:10:00)
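
For readers less familiar with change_of, the two rules amount to classifying
each substate transition seen between polls. A minimal sketch of that logic in
Python (not MCC syntax); the function name classify_transition and the sample
substate value are illustrative assumptions, not part of MCC:

```python
def classify_transition(previous, current):
    """Classify a circuit substate change seen between two polls.

    Mirrors the pair of change_of rules: any substate -> none means the
    circuit came up, none -> any substate means it went down.
    """
    was_up = previous.strip().lower() == "none"
    is_up = current.strip().lower() == "none"
    if was_up and not is_up:
        return "down"   # second rule: change_of(..., none, *)
    if not was_up and is_up:
        return "up"     # first rule: change_of(..., *, none)
    return None         # no change that either rule would fire on


# "synchronizing" is only a sample substate value for illustration.
print(classify_transition("none", "synchronizing"))
print(classify_transition("synchronizing", "none"))
```

Each rule would then call its own action routine, one writing a "down" entry
and the other an "up" entry to the log.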

Got the idea?

*Robert

5597.5. "good but color probs" by CTHQ::WOODCOCK () Tue Sep 21 1993 09:58 (18 lines)

>	(change_of(node4 * circuit * substate, *, none), at every 00:10:00)
>	(change_of(node4 * circuit * substate, none, *), at every 00:10:00)

This is a very good idea but it does come with restrictions. By using 2 
separate rules the colors won't correlate (I don't think). In other words,
when the circuit goes down the link shows 'red', and when it comes back up the
clear rule fires but the link still shows 'red'. It would work with propagation
equal to LATEST, but this setting brings on a whole other set of worries about
not seeing errors on the map and I wouldn't recommend it.

If memory serves me, there is still another potential solution for getting the
colors right using the above rules, but I don't know if it ever worked: using
mcc internal events for notification instead of the general alarm notify. I
don't recall the syntax, though.

best regards,
brad...

5597.6. "One possible solution" by TOOK::NAVKAL () Tue Sep 21 1993 13:56 (19 lines)

	Okay I will byte. When an Alarms rule fires, the severity of the 
	rule is whatever the user indicated when s/he created the 
	rule. However, when the rule detects that the alarming 
	condition no longer exists, it fires with a severity of
	"Clear". Now what we need to do is use this fact in 
	a very creative way. (What is otherwise known as a hack!)

	Let's say rule A is the one looking for the alarming condition.
	When rule A fires, let it invoke a command procedure that 
	checks the severity. Severity is passed as a parameter to the 
	command procedure. If the severity is anything other than
	what was indicated when the rule was created, you have 
	basically detected the "clear" condition.

	Did I answer the question?

	Anil Navkal
	

5597.7. "no fire on clear" by CTHQ::WOODCOCK () Tue Sep 21 1993 16:54 (110 lines)

>	Okay I will byte. When an Alarms rule fires, the severity of the 
>	rule is whatever the user indicated when s/he created the 
>	rule. However, when the rule detects that the alarming 
>	condition no longer exists, it fires with a severity of
>	"Clear". Now what we need to do is use this fact in 
>	a very creative way. (What is otherwise known as a hack!)

>	Let's say rule A is the one looking for the alarming condition.
>	When rule A fires, let it invoke a command procedure that 
>	checks the severity. Severity is passed as a parameter to the 
>	command procedure. If the severity is anything other than
>	what was indicated when the rule was created, you have 
>	basically detected the "clear" condition.

>	Did I answer the question?

Nope. Rules do not fire the procedure when they CLEAR. When a rule transitions
from some_severity to clear an internal event is generated. This event clears
the map but no external procedures are run. Your idea can be used but you
must write a rule on the internal events of the rules and then parse for
the CLEAR severity.

Something like:

(occurs(domain x rule *, OSI rule fired))

Now look for a CLEAR severity and hack at it. Note that even this solution can
actually get messy if there are -many,many- domains. You could wildcard the
domain also but you get into trouble if mcc has multiple uses with rules
firing for different business disciplines.

So what would I do (or did do) to solve this? AVAILABILITY metrics are a trend
and for us are only computed once a month. Therefore we only need something 
statistically close. We poll everything every 10 minutes with the basic general
rule. When this rule fires (and it fires EVERY 10 minutes during an outage) an
entry is placed into a monthly log file. Statistically speaking, then, every
entry in the log equates to a 10 minute outage. No entry, no outage. After the
end of the month a procedure counts the circuits and nodes being monitored via
a configuration file, figures out how many days there were in the previous
month, then determines the total polls for all ckts/nodes for that month. This
is done for every CLASS entity (ie. node4, snmp, etc..). It then searches the
log, assumes every entry equals a 10 minute outage, counts the entries, and
derives an availability percentage.
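
The arithmetic described above can be sketched in a few lines of Python. This
is only an illustration of the calculation, not the actual procedure (which
runs on the MCC system and reads a configuration file and log); the function
names are made up here, and the year 1993 is assumed for the August report
shown later in this note:

```python
import calendar


def polls_per_entity(year, month, poll_minutes=10):
    """Number of polls one entity receives in the given month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60 // poll_minutes


def availability_pct(entities, outage_entries, year, month, poll_minutes=10):
    """Availability, treating every log entry as one full polling interval."""
    total_polls = entities * polls_per_entity(year, month, poll_minutes)
    return 100.0 * (1 - outage_entries / total_polls)


# DECnet circuit figures from the August report: 75 circuits, 2411 entries.
print(f"{availability_pct(75, 2411, 1993, 8):.2f}%")
```

With 31 days and a 10 minute poll, each entity gets 4464 polls, so 75 circuits
give 334800 polls in total and 2411 entries work out to about 99.28%, which
agrees with the circuit availability in the report.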

One can argue the accuracy of assuming every entry equals a ten minute outage
so let's spend a minute on this also. Regardless of what POLLING technique you
use this assumption is the only one which can be made. Even if you use the
mcc internal CLEAR event the accuracy is still determined by the original poll
rate of the first alarm. The ONLY way to increase accuracy is to decrease the
polling interval or go to rules on entity events if your system can handle 
it (ours cannot). You would think that counting the whole polling interval
as outage would tend to decrease the AVAILABILITY metric, but also note
that some outages are missed completely because they fall between polls.
I contend it all comes out in the wash, statistically (see ESC metrics
below).

So...if you need AVAILABILITY metrics on a monthly basis why not just do as
described above??  If you want the outage duration announced in real time by 
the alarms then you'll have to pursue the other methods, but otherwise this 
all seems like a lot more work than actually required.

cheers,
brad...

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
	
USAMTS>ty DISK$MCC:[MCC.MENU]MCC_AVL_AUG.DAT;3

                     MCC_AVL_AUG Availability Metrics
                    ----------------------------------

                                    DECnet


  (67) DECnet routers with (75) circuits were polled every ten minutes
  (2411) 10 minute circuit outages were recorded
  (60) 10 minute DECnet router outages were recorded

  TOTAL DECnet ROUTER AVAILABILITY = 99.98%
  TOTAL DECnet NETWORK/CIRCUIT AVAILABILITY = 99.28%
  TOTAL DECnet ROUTER DPMO = 200 (OFD=1)
  TOTAL DECnet ROUTER DPMO = 28 (OFD=7)
............................................................................

                                    TCP/IP


  (10) TCP/IP routers with (14) circuits were polled every ten minutes
  (219) 10 minute circuit outages were recorded
  (379) 10 minute TCP/IP router outages were recorded

  TOTAL TCP/IP ROUTER AVAILABILITY = 99.16%
  TOTAL TCP/IP NETWORK/CIRCUIT AVAILABILITY = 99.65%
  TOTAL TCP/IP ROUTER DPMO = 8490 (OFD=1)
  TOTAL TCP/IP ROUTER DPMO = 1212 (OFD=7)
............................................................................

                                     WATN


  (81) WATN hosts
  (207) WATN host outages were recorded

  TOTAL WATN HOST UNAVAILABILITY = 5:4:50:32 (d:h:m:s)
  AVERAGE WATN HOST OUTAGE = 36 minutes
  TOTAL WATN HOST AVAILABILITY = 99.80%
  TOTAL WATN HOST DPMO = -304 (OFD=1 computed by minute)
  TOTAL WATN HOST DPMO = -43 (OFD=7 computed by minute)
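
The DPMO lines are not explained in the report. One reading that happens to
reproduce the DECnet and TCP/IP router figures is "defects per million
opportunities", counting each poll of each router as one opportunity, dividing
by OFD, and truncating. This is an inference from the numbers and not a
description of the actual procedure; the function name and the meaning given
to OFD are assumptions:

```python
def dpmo(outage_entries, entities, days=31, poll_minutes=10, ofd=1):
    """Defects per million opportunities, truncated to a whole number.

    Assumes one opportunity per entity per poll, scaled by `ofd`
    (taken here to mean opportunities for defect per poll).
    """
    polls = entities * days * 24 * 60 // poll_minutes
    return outage_entries * 1_000_000 // (polls * ofd)


# Router figures from the report: 60 entries / 67 DECnet routers,
# 379 entries / 10 TCP/IP routers, 31 days, 10 minute polls.
for label, entries, routers in (("DECnet", 60, 67), ("TCP/IP", 379, 10)):
    print(label, dpmo(entries, routers, ofd=1), dpmo(entries, routers, ofd=7))
```

The WATN figures are computed by minute rather than by poll and come out
negative in the report, so this sketch does not attempt to reproduce them.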


5597.8. "Brad is right!" by TOOK::NAVKAL () Tue Sep 21 1993 21:09 (16 lines)

You got me Brad! Tells me how stale my Alarms knowledge is now. We went
back and forth on the rule Clear event so many times that I just don't know 
the last "state" of the rule Clear event. Thanks for a complete answer Brad. 

Now I also remember the justification for not executing the rule fire procedure.
As most of these procedures are written to handle the alarming
situation, it just was not appropriate to go ahead and execute the same 
procedure when the alarming situation had actually cleared. 

And yes I can also see why with global wildcarding things can get real
messy. 

Anyway I hope Venkat you got what you were looking for.

Anil Navkal


5597.9. "Thank you !" by ZPOVC::VENKAT () Wed Sep 22 1993 02:10 (15 lines)
    
    Hello,
    
    Thanks to everyone for their comments and views.
    
    I have decided to go by what Brad suggested. 
    
    However, I still wish that the user could decide whether a
    procedure should be fired when an alarm rule is 'CLEAR'ed. This would
    make it possible to take a set of actions when an outage or an
    alarming condition returns to its normal state.
    
    Thanks again,
    
    Venkat.

5597.10. "It can be solved!" by BERN01::GMUER () Thu Sep 23 1993 05:59 (28 lines)

Hello

The problem can be solved, but it needs some coding. MCC_REACH is one example
of this, see note 24 in conference MCC-TOOLS.

Basic Idea:

1) Put the rule in a separate domain X, where the notification for alarm 
rules is off.

2) Create a rule for OSI rule fired and OSI rule exception events in domain X
with an alarm trigger procedure OSI_RULE_FIRED.COM.

3) Parse the input parameters in OSI_RULE_FIRED.COM and check the severity.

4) Send a data collector event with the actual severity to the target entity 
in the domain Y on the iconic map. You need a collector in the domain Y and
a notify request for collector events in the root domain of Y. MCC_REACH 
maintains a list of domain memberships to find the correct domain Y.
Important: The event title in the alarm event and in the clear event must be
the same to get your desired alarm correlation behaviour.
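
A rough sketch of the correlation logic behind steps 3 and 4, written in
Python rather than DCL and independent of MCC_REACH itself. The class name,
argument names, and log format are hypothetical; only the idea of matching the
alarm and the clear on the same entity and event title comes from the
description above:

```python
from datetime import datetime


class OutageTracker:
    """Pairs alarm events with their clear events to log outage durations."""

    def __init__(self):
        self.open_alarms = {}   # (entity, title) -> time the alarm was first seen
        self.log = []           # lines that a real procedure would append to a file

    def handle(self, entity, title, severity, when):
        """Process one rule-fired event; return the outage duration on clear."""
        key = (entity, title)   # title must match between alarm and clear
        if severity.strip().lower() == "clear":
            started = self.open_alarms.pop(key, None)
            if started is None:
                return None     # clear without a known alarm, nothing to pair
            duration = when - started
            self.log.append(f"{when:%Y-%m-%d %H:%M} UP   {entity} after {duration}")
            return duration
        if key not in self.open_alarms:   # ignore repeated firings of the same alarm
            self.open_alarms[key] = when
            self.log.append(f"{when:%Y-%m-%d %H:%M} DOWN {entity} ({severity})")
        return None


tracker = OutageTracker()
tracker.handle("node4 abc circuit syn-0", "Circuit down", "Critical",
               datetime(1993, 9, 23, 6, 0))
tracker.handle("node4 abc circuit syn-0", "Circuit down", "Clear",
               datetime(1993, 9, 23, 6, 30))
print("\n".join(tracker.log))
```

The entity name and times in the usage lines are sample values only. In a real
setup the DOWN and UP branches are where the data collector event with the
actual severity would be sent to the target entity in domain Y.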

We have implemented this alarming for a customer here in Switzerland based
on MCC_REACH. I have improved the original OSI_RULE_FIRED.COM, fixing some
bugs and implementing an escalation mechanism. We also use the MCC_REACH 
logfile to make reports about host and server availability.

Edgar

5597.11. "MCC_REACH lives on" by CTHQ::WOODCOCK () Thu Sep 23 1993 10:17 (17 lines)

>The problem can be solved, but it needs some coding. MCC_REACH is one example
>for this, see note 24 in conference MCC-TOOLS.

Hi Edgar,

You've made my day!!! MCC_REACH was semi-written while I was on a customer
site. When I came back I cleaned it up a bit for any future engagements I
might go on. I posted it because it looked like it solved some problems 
others might be interested in. Glad to see you found it useful, and IMPROVED 
it! I always get a kick when someone mentions they use it. Why? Because since
I posted the last version I've never had an opportunity to use and test it
in a production environment myself :).

cheers,
brad...