T.R | Title | User | Personal Name | Date | Lines |
---|
5597.1 | Got a solution, not sure if the problem fits... | BIKINI::KRAUSE | European NewProductEngineer for MCC | Mon Sep 20 1993 05:43 | 12 |
| I'm not sure if I understood your problem. What do you mean by
'Clear'ing a rule? If you mean Disable or Delete, you could write a rule
that picks up rule events, e.g.
(occurs(domain abc rule xyz, rule disabled))
(occurs(domain abc rule xyz, rule deleted))
You might use the Data Collector to send an event directly from the
action procedure of your first rule. If you just want to notify and not
take further action, this would save you writing a second rule.
*Robert
|
5597.2 | Hint, hint... | BIKINI::KRAUSE | European NewProductEngineer for MCC | Mon Sep 20 1993 05:48 | 7 |
| And - please be more specific in your notes titles. 'Help!' is skipped
by most noters. You can change your title to something more meaningful
with e.g. SET NOTE 5597.0/title="Need help writing an alarm rule"
Thanks,
*Robert
|
5597.3 | Reply to .1 | ZPOVC::VENKAT | | Mon Sep 20 1993 06:27 | 21 |
|
Reply to .1
OK.. Let me be more specific !
My requirement is that I want MCC to keep track of availability of
circuits. So what I intend to do is whenever a circuit is down, write
an entry into a log file and this can be done by the command procedure
that is invoked when the associated alarm rule fires.
As you probably know, when the circuit comes up again, a general
alarm rule is fired and the iconic map changes color to green. However,
no command procedure is invoked. I would like to have a command procedure
invoked when this happens, so that I can write an entry into a log file
and in this way keep track of the availability of circuits.
I hope this is clear !
Thanks and Regards,
Venkat.
|
5597.4 | To keep the story together... | BIKINI::KRAUSE | European NewProductEngineer for MCC | Tue Sep 21 1993 04:52 | 50 |
| From: ZPOVC::VENKAT "Venkat Narayanan @ZPO" 20-SEP-1993 21:05:37.83
To: BIKINI::KRAUSE
CC: VENKAT
Subj: U: Need your help please ..
Hi,
Thanks for your reply to my note #5597 in TOOK::MCC conference. I need your help
if possible :
What I want to do is by using MCC, I want to keep track of all circuit outages.
What I intend to do is create a log file and put entries to the log file
whenever the circuit goes down and becomes up.
I have written Alarm rules which will fire when the circuit is down, the
alarm expression being :
(NODE4 * CIRCUIT * SUBSTATE <> NONE, AT EVERY 00:10:00)
When this alarm rule fires, the Iconic Map changes color to RED. A command procedure
is invoked and this DCL script will perform a set of actions like writing an
entry into a log file, sending an E-Mail etc..
However when the circuit comes up again, the rule that was fired previously
will be cleared and the map changes color to Green. However when this happens,
no command procedure is fired. So my question is "Is it possible to have a
DCL procedure invoked when this happens (that is a rule is cleared ) ?".
If you have any suggestions, please let me know.
Thanks and Regards,
Venkat.
-------------------------------------------------------------------------------
Venkat,
Now I know what you want to do. This functionality (detecting that a rule no
longer fires) has been on the wishlist for a long time.
As an alternative you could use a pair of change_of rules with appropriate
action routines, e.g.
(change_of(node4 * circuit * substate, *, none), at every 00:10:00)
(change_of(node4 * circuit * substate, none, *), at every 00:10:00)
Got the idea?
*Robert
|
5597.5 | good but color probs | CTHQ::WOODCOCK | | Tue Sep 21 1993 09:58 | 18 |
|
> (change_of(node4 * circuit * substate, *, none), at every 00:10:00)
> (change_of(node4 * circuit * substate, none, *), at every 00:10:00)
This is a very good idea but it does come with restrictions. By using 2
separate rules the colors won't correlate (I don't think). In other words,
when the circuit goes down the link shows 'red', when it comes back up the
clear rule fires but the link still shows 'red'. It would work with propagation
equal to LATEST but this setting brings on another whole set of worries with
not seeing errors on the map and I wouldn't recommend it.
If memory serves me, there is still another potential solution for getting the
colors right using the above rules, but I don't know if it ever worked: using
mcc internal events for notification instead of the general alarm notify. I
don't recall the syntax, though.
best regards,
brad...
|
5597.6 | One possible solution | TOOK::NAVKAL | | Tue Sep 21 1993 13:56 | 19 |
|
Okay I will byte. When an Alarms rule fires, the severity of the
rule is whatever the user indicated when s/he created the
rule. However, when the rule detects that the alarming
condition no longer exists, it fires with a severity of
"Clear". Now what we need to do is use this fact in
a very creative way. (What is otherwise known as a hack!)
Let's say rule A is the one looking for the Alarming condition.
When Rule A fires, let it invoke a command procedure that
checks the severity. Severity is passed as a parameter to the
command procedure. If the severity is anything other than
what was indicated when the rule was created, you have
basically detected the "clear" condition.
Did I answer the question?
Anil Navkal
|
5597.7 | no fire on clear | CTHQ::WOODCOCK | | Tue Sep 21 1993 16:54 | 110 |
| > Okay I will byte. When an Alarms rule fires, the severity of the
> rule is whatever the user indicated when s/he created the
> rule. However, when the rule detects that the alarming
> condition no longer exists, it fires with a severity of
> "Clear". Now what we need to do is use this fact in
> a very creative way. (What is otherwise known as a hack!)
> Let's say rule A is the one looking for the Alarming condition.
> When Rule A fires, let it invoke a command procedure that
> checks the severity. Severity is passed as a parameter to the
> command procedure. If the severity is anything other than
> what was indicated when the rule was created, you have
> basically detected the "clear" condition.
> Did I answer the question?
Nope. Rules do not fire the procedure when they CLEAR. When rules transition
from some_severity to clear an internal event is generated. This event clears
the map but no external procedures are run. Your idea can be used but you
must write a rule on the internal events of the rules and then parse for
the CLEAR severity.
Something like:
(occurs(domain x rule *, OSI rule fired))
Now look for a CLEAR severity and hack at it. Note that even this solution can
actually get messy if there are -many,many- domains. You could wildcard the
domain also but you get into trouble if mcc has multiple uses with rules
firing for different business disciplines.
So what would I do (or did do) to solve this? AVAILABILITY metrics are a trend
and for us are only computed once a month. Therefore we only need something
statistically close. We poll everything every 10 minutes with the basic general
rule. When this rule fires (and it fires EVERY 10 minutes during an outage) an
entry is placed into a monthly log file. Therefore, statistically speaking, if
a circuit is down this equates to a 10 minute outage for every entry in the
log. No entry, no outage. After the end of the month a procedure counts
the circuits and nodes being monitored via a configuration file, figures
out how many days there were in the previous month, then determines the total polls
for all ckts/nodes for that month. This is done for every CLASS entity
(ie. node4, snmp, etc..). It then searches the log and assumes every
entry equals a 10 minute outage, counts them, then derives an availability
percentage.
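The arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: the function names are made up here, and the entity and outage counts are taken from the DECnet lines of the August report further down in this note.

```python
import calendar

POLL_MINUTES = 10  # polling interval of the basic general rule


def polls_in_month(year, month, poll_minutes=POLL_MINUTES):
    """Number of polls made against one monitored entity in a month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60 // poll_minutes


def availability_percent(entities, outage_entries, year, month):
    """Treat every log entry as one failed poll out of all polls made."""
    total_polls = entities * polls_in_month(year, month)
    return 100.0 * (1.0 - outage_entries / total_polls)


# 75 DECnet circuits with 2411 logged outage entries in August 1993
print(f"{availability_percent(75, 2411, 1993, 8):.2f}%")
```

With 31 days in August there are 4464 polls per entity, and the circuit figures above come out at the 99.28% shown in the report.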
One can argue the accuracy of assuming every entry equals a ten minute outage
so let's spend a minute on this also. Regardless of what POLLING technique you
use this assumption is the only one which can be made. Even if you use the
mcc internal CLEAR event the accuracy is still determined by the original poll
rate of the first alarm. The ONLY way to increase accuracy is to decrease the
polling interval or go to rules on entity events if your system can handle
it (ours cannot). You would think the assumption of the whole polling interval
as the outage would tend to decrease the AVAILABILITY metric, but also note
that some outages are missed completely because they fall between polls.
I contend it all comes out in the wash, statistically (see ESC metrics
below).
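To make the over- and under-counting concrete, here is a small Python sketch. The outage times are invented for illustration and the function names are hypothetical; it only shows how counting one full interval per failed poll compares with the true downtime.

```python
def polled_outage_minutes(outages, poll_minutes=10, horizon_minutes=60):
    """Estimate downtime the way the log does: every poll that lands
    inside an outage counts as one full polling interval."""
    hits = 0
    for t in range(0, horizon_minutes, poll_minutes):
        if any(start <= t < end for start, end in outages):
            hits += 1
    return hits * poll_minutes


def true_outage_minutes(outages):
    """Actual downtime, given (start, end) minutes for each outage."""
    return sum(end - start for start, end in outages)


# Hypothetical outages, in minutes from the start of the hour
examples = {
    "long outage": [(3, 27)],          # polls at 10 and 20 see it
    "missed between polls": [(12, 18)],  # no poll falls inside it
    "short but caught": [(9, 11)],     # poll at 10 sees it
}
for name, outages in examples.items():
    print(name, true_outage_minutes(outages), polled_outage_minutes(outages))
```

In these three cases the true downtime adds up to 32 minutes and the polled estimate to 30, with one outage overstated, one understated and one missed entirely.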
So...if you need AVAILABILITY metrics on a monthly basis why not just do as
described above?? If you want the outage duration announced real time by
the alarms then you'll have to pursue the other methods but otherwise this
all seems like a lot more work than actually required.
cheers,
brad...
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
USAMTS>ty DISK$MCC:[MCC.MENU]MCC_AVL_AUG.DAT;3
MCC_AVL_AUG Availability Metrics
----------------------------------
DECnet
(67) DECnet routers with (75) circuits were polled every ten minutes
(2411) 10 minute circuit outages were recorded
(60) 10 minute DECnet router outages were recorded
TOTAL DECnet ROUTER AVAILABILITY = 99.98%
TOTAL DECnet NETWORK/CIRCUIT AVAILABILITY = 99.28%
TOTAL DECnet ROUTER DPMO = 200 (OFD=1)
TOTAL DECnet ROUTER DPMO = 28 (OFD=7)
............................................................................
TCP/IP
(10) TCP/IP routers with (14) circuits were polled every ten minutes
(219) 10 minute circuit outages were recorded
(379) 10 minute TCP/IP router outages were recorded
TOTAL TCP/IP ROUTER AVAILABILITY = 99.16%
TOTAL TCP/IP NETWORK/CIRCUIT AVAILABILITY = 99.65%
TOTAL TCP/IP ROUTER DPMO = 8490 (OFD=1)
TOTAL TCP/IP ROUTER DPMO = 1212 (OFD=7)
............................................................................
WATN
(81) WATN hosts
(207) WATN host outages were recorded
TOTAL WATN HOST UNAVAILABILITY = 5:4:50:32 (d:h:m:s)
AVERAGE WATN HOST OUTAGE = 36 minutes
TOTAL WATN HOST AVAILABILITY = 99.80%
TOTAL WATN HOST DPMO = -304 (OFD=1 computed by minute)
TOTAL WATN HOST DPMO = -43 (OFD=7 computed by minute)
|
5597.8 | Brad is right! | TOOK::NAVKAL | | Tue Sep 21 1993 21:09 | 16 |
| You got me Brad! Tells me how stale my Alarms knowledge is now. We went
back and forth on the rule Clear event so many times that I just don't know
the last "state" of Rule Clear event. Thanks for a complete answer Brad.
Now I also remember the justification for not executing rule fire procedure.
As most of these procedures are written to handle the case of Alarming
situation, it just was not appropriate to go ahead and execute the same
procedure when the Alarming situation was actually cleared.
And yes I can also see why with global wildcarding things can get real
messy.
Anyway I hope Venkat you got what you were looking for.
Anil Navkal
|
5597.9 | Thank you ! | ZPOVC::VENKAT | | Wed Sep 22 1993 02:10 | 15 |
|
Hello,
Thanks to everyone for their comments and views.
I have decided to go by what Brad suggested.
However I still wish that the user should be able to decide whether a
procedure should be fired when an alarm rule is 'CLEAR'ed. This will
help to take a set of actions when an outage or an alarming event is
back to normal state.
Thanks again,
Venkat.
|
5597.10 | It can be solved! | BERN01::GMUER | | Thu Sep 23 1993 05:59 | 28 |
| Hello
The problem can be solved, but it needs some coding. MCC_REACH is one example
for this, see note 24 in conference MCC-TOOLS.
Basic Idea:
1) Put the rule in a separate domain X, where the notification for alarm
rules is off.
2) Create a rule for OSI rule fired and OSI rule exception events in domain X
with an alarm trigger procedure OSI_RULE_FIRED.COM.
3) Parse the input parameters in OSI_RULE_FIRED.COM and check the severity.
4) Send a data collector event with the actual severity to the target entity
in the domain Y on the iconic map. You need a collector in the domain Y and
a notify request for collector events in the root domain of Y. MCC_REACH
maintains a list of domain memberships to find the correct domain Y.
Important: The event title in the alarm event and in the clear event must be
the same to get your desired alarm correlation behaviour.
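Once the severity has been parsed out (step 3), pairing each alarm with the following clear gives the outage duration for a log file. The Python sketch below only illustrates that pairing logic; the real procedure would be DCL, and the tuple layout, entity names and severity strings here are assumptions, not the parameters MCC actually passes.

```python
from datetime import datetime


def outage_log(events):
    """Pair each alarm with the next CLEAR for the same entity.

    `events` is a time-ordered iterable of (timestamp, entity, severity)
    tuples; this layout is hypothetical.
    """
    open_since = {}
    log = []
    for when, entity, severity in events:
        if severity.upper() == "CLEAR":
            started = open_since.pop(entity, None)
            if started is not None:
                log.append((entity, started, when, when - started))
        else:
            # Keep the first alarm time; repeated firings of the rule
            # belong to the same outage.
            open_since.setdefault(entity, when)
    return log


# Hypothetical events: the rule fires twice, then clears
events = [
    (datetime(1993, 9, 20, 6, 0), "circuit-A", "CRITICAL"),
    (datetime(1993, 9, 20, 6, 10), "circuit-A", "CRITICAL"),
    (datetime(1993, 9, 20, 6, 20), "circuit-A", "CLEAR"),
]
for entity, down, up, duration in outage_log(events):
    print(entity, down, up, duration)
```

A clear with no preceding alarm is ignored, so starting the log in the middle of an outage does not produce a bogus entry.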
We have implemented this alarming for a customer here in Switzerland based
on MCC_REACH. I have improved the original OSI_RULE_FIRED.COM, fixing some
bugs and implementing an escalation mechanism. We also use the MCC_REACH
logfile to make reports about host and server availability.
Edgar
|
5597.11 | MCC_REACH lives on | CTHQ::WOODCOCK | | Thu Sep 23 1993 10:17 | 17 |
|
>The problem can be solved, but it needs some coding. MCC_REACH is one example
>for this, see note 24 in conference MCC-TOOLS.
Hi Edgar,
You've made my day!!! MCC_REACH was semi-written while I was on a customer
site. When I came back I cleaned it up a bit for any future opportunities I
potentially might go on. I posted it because it looked to solve some problems
others might be interested in. Glad to see you found it useful, and IMPROVED
it! I always get a kick when someone mentions they use it. Why? Because since
I posted the last version I've never had any opportunities to use and test it
in a production environment myself :).
cheers,
brad...
|