T.R | Title | User | Personal Name | Date | Lines |
---|
2619.1 | Some answers | TOOK::MINTZ | Erik Mintz, DECmcc Development, dtn 226-5033 | Tue Mar 24 1992 11:21 | 19 |
| >1. When I submit alarms that should run every 15 minutes, and an exception
>occurs, then the next exception occurs 5 minutes later not 15 when the
>error has not cleared thereby causing a second exception. Is this correct?
>and is this "exception timer" changeable.
Don't know off hand.
>2. How is the Severity of an exception determined? Is there a way to set
>this so that it is not always Indeterminate? Why does it not take on
>the Severity of the alarm that was set?
It is always Indeterminate. We are considering changing it to take on
the severity of the corresponding alarm.
>3. Why is an MCC alarm tied to a certain revision of the
>Alarms_mail_exceptions.com file in mcc_common? Is there a way to change
This is a security feature. There is no way to change it.
|
2619.2 | More Needed | BEAGLE::ANDRADE | The sentinel (.)(.) | Fri Mar 27 1992 04:29 | 40 |
| Some more on alarms:
Clients want DECmcc to change icon colors for every event. Also not to
receive too many messages about the same thing.
As an example: Using polling to detect when nodes become unreachable
and then reachable again.
In order to do this it would help if the alarm color for the exception
and for the normal alarm firing could be set to two different colors.
And that DECmcc would always set the Icon to the last color alarmed.
Also in a CHANGE_OF expression, I find it very strange that the
EXCEPTION doesn't count in setting the known attribute value. I think
the attribute value should be changed in this case to "UNKNOWN".
Thus with an alarm of the type:
(CHANGE_OF(NODE4 X state *,*)) , normal firing = clear = blue
, exception firing = critical = red
I could make the ICON go red (and mail sent, etc) when the node becomes
unreachable. (State changes from "on" to "UNKNOWN")
Then while the node is unreachable, the alarm exception doesn't fire
again because the state remains "UNKNOWN".
But when the node comes up again, the state changes from "UNKNOWN" to
"on" and the alarm fires again normally changing the ICON to blue
(and sends mail, etc).
And lastly while the node remains reachable, the alarm doesn't fire
again either because the state remains "on" all the while.
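The behavior asked for above can be sketched roughly as follows. This is only an illustration of the proposal, not how DECmcc alarms work: the `ChangeOfRule` class, the `poll` method and the `UNKNOWN` placeholder are all hypothetical names, and a failed poll is modelled as `None`.

```python
from typing import Optional

UNKNOWN = "UNKNOWN"  # hypothetical placeholder value, not an actual DECmcc state


class ChangeOfRule:
    """Fires only when the observed value differs from the last one seen."""

    def __init__(self) -> None:
        self.last: Optional[str] = None  # nothing observed yet

    def poll(self, value: Optional[str]) -> Optional[str]:
        """value is None when the poll failed (node unreachable)."""
        current = UNKNOWN if value is None else value
        previous, self.last = self.last, current
        if previous is None or previous == current:
            return None  # first poll or no change: stay quiet
        # Proposed behavior: a change into UNKNOWN is the "exception" firing.
        return "exception" if current == UNKNOWN else "normal"


rule = ChangeOfRule()
polls = ["on", "on", None, None, None, "on", "on"]
firings = [rule.poll(p) for p in polls]
print(firings)
```

With this sequence of polls the rule fires exactly twice: once as an exception when the node goes unreachable, and once normally when it comes back.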
Hopeful, Gil
P.S. Several concerned people have mentioned something similar to
me (how with DECmcc you can never see the node state = off, etc.,
because then the node is unreachable and it's the exception that fires).
|
2619.3 | Try IpReachability | TOOK::MINTZ | Erik Mintz, DECmcc Development, dtn 226-5033 | Fri Mar 27 1992 07:49 | 9 |
| For SNMP nodes, I would suggest using the IpReachability attribute,
which should return up or down rather than the exception.
Unfortunately, it is too late in the development cycle for us to
change functionality in alarms for V1.2. But I will file your
note as a suggestion QAR so it can be considered later.
-- Erik
|
2619.4 | Occurs N Times + 'generic' reachability? | TOOK::MCPHERSON | Save a tree: kill an ISO working group. | Fri Mar 27 1992 08:19 | 22 |
| > Clients want DECmcc to change icon colors for every event. Also not to
> receive too many messages about the same thing.
Maybe I'm reading this incorrectly, but this statement seems to
contradict itself.
Would an "OCCURS N TIMES" rule format do anything to help you here?
Re: your questions about reachability et al:
For objects that have 'reachability' defined as an attribute (e.g.
IPreachability for SNMP entities) then you can do what you want. For
all the rest, you will still need more fancy processing (which is not
in 1.2)...
Maybe one could make a case for a 'generic' attribute called
"Reachability" that could be 'inherited' (sorta like reference
attributes) ? Might that be a more reasonable/predictable way to solve
the general problem instead of trying to infer from alarm rule
exceptions firing?
/doug
|
2619.5 | Re. 3 & .4 | BEAGLE::ANDRADE | The sentinel (.)(.) | Mon Mar 30 1992 05:29 | 52 |
| Re.3 Thanks Erik
Too bad it can't be done for v1.2, I guess I should have mentioned it
sooner. But it seemed so obvious that something like this was needed...
Re.4 Doug
A generic Reachability attribute that is inherited certainly seems like
something worth doing. A CHANGE_OF rule would fire when the node first
becomes reachable or unreachable. However, would the ICON colors change
as well with the node state? This would require being able to set ICON
colors by the "REACHABILITY" attribute state.
The "UNKNOWN" value for attributes I asked for, is in fact the same
as your "REACHABILITY" attribute. If you implement it for everything.
Not just the NODE reachability but every attribute's reachability.
Also people still don't want to be swamped by exception alarms...
Exceptions should fire only when the reason for the exception changes.
I also suggest that it would be very useful to have the exception
and the normal alarm firing turn the ICONs to different colors. This
is desirable no matter what else is done.
When DECmcc fails to get some data then all involved attribute values
should be set to "UNKNOWN" or "UNREACHABLE". For Status attributes,
this just means an extra valid state. For String Attributes this is just
another string. For Numeric attributes it may be more difficult, it means
associating a state variable to every one of them.
One use of such information is to get node reachability state, using
CHANGE_OF alarm rules. Another use is to notify you ONCE (both with the
ICON color and your chosen method) when any attribute you are watching
for becomes unreachable, and to notify you ONCE again when it becomes
reachable, for whatever reasons.
For example: You have an alarm to inform you when a line's utilization
goes over 80%. DECmcc would fire the rule normally as requested, but it
also warns you when the UTILIZATION data is not available for whatever
reason. (No need for you to create extra rules, and no fear of being
swamped by exception alarms)
Regards, Gil
*** Another thing that would be very useful would be to let us, the
DECmcc users, set the ICON color from the ALARM user command procedure,
using a command like: "MCC> set entity X color Y, in domain Z"
We could then choose the ICON color based on whatever data item we
choose and according to our specific environment.
It would also be useful for Demos, and the like.
|
2619.6 | smart availability | SKIBUM::GASSMAN | | Mon Mar 30 1992 09:04 | 11 |
| As long as you are talking about reachability - one of the items that
MSU had over HP (as of hot staging for last October's Interop) was the
attribute it looks at for the reachability map. MSU digs down into the
interfaces and checks their status - so during the 'frequent' poll, it
not only checks the host, but also its circuit information. MSU is
actually quite good at finding and displaying particular circuits down
due to this definition of reachability, and is one of MSU's competitive
features MCC needs to emulate as the LAN/internet market is sought
after.
bill
|
2619.7 | DECmcc does have exception color | TOOK::R_SPENCE | Nets don't fail me now... | Mon Mar 30 1992 13:27 | 4 |
| By the way, DECmcc DOES have a separate color for exceptions, it
is the color associated with the severity INDETERMINATE.
s/rob
|
2619.8 | more discussion | MCDOUG::MCPHERSON | Save a tree: kill an ISO working group. | Mon Mar 30 1992 15:18 | 56 |
| > A generic Reachability attribute that is inherited certainly seems like
> something worth doing. A CHANGE_OF rule would fire when the node first
> becomes reachable or unreachable. However, would the ICON colors change
> as well with the node state? This would require being able to set ICON
> colors by the "REACHABILITY" attribute state.
The behavior of the product is this: Icon color is associated with an event or
alarm rule severity. If there is an event or alarm associated with this
REACHABILITY attribute, THEN the object's icon will change color.
> Also people still don't want to be swamped by exception alarms...
> Exceptions should fire only when the reason for the exception changes.
Sorry. Your interpretation for EXCEPTIONs firing is in conflict with DECmcc's
prescribed behavior. Exceptions fire when an alarm rule CANNOT be evaluated.
Period. That is the prescribed behavior.
> I also suggest that it would be very useful to have the exception
> and the normal alarm firing turn the ICONs to different colors. This
> is desirable no matter what else is done.
There is a default notification severity (hence color) for EXCEPTION: the one
associated with severity = Indeterminate.
> When DECmcc fails to get some data then all involved attribute values
> should be set to "UNKNOWN" or "UNREACHABLE". For Status attributes,
> this just means an extra valid state. For String Attributes this is just
> another string. For Numeric attributes it may be more difficult, it means
> associating a state variable to every one of them.
This may be true in your particular requirements, but certainly not
universally. Might it be that what you're hinting at is some indicator in an
entity's description that describes the 'relationship atomicity' (for want of
a better description) of a group of attributes. I.e. if any member of a given
attribute partition is not returned, then set a flag that indicates "data is
suspect".
Also remember: Alarm rules gotta do what alarm rules gotta do. They look for
the data needed to satisfy rules in a BOOLEAN fashion. I.e. there is no
"Maybe. Come back later" state for an alarm rule; they are either TRUE, FALSE
or INDETERMINATE. INDETERMINATE means that it couldn't evaluate either TRUE or
FALSE and *that* is considered an EXCEPTION (and not the rule.... ;^) )
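To make the three outcomes concrete, here is a minimal sketch of a rule evaluation that can come out TRUE, FALSE or INDETERMINATE. It is an illustration only, not the Alarms FM code: `RuleResult`, `evaluate` and `over_80` are made-up names, and "could not get the data" is modelled as `None`.

```python
from enum import Enum
from typing import Callable, Optional


class RuleResult(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    INDETERMINATE = "INDETERMINATE"


def evaluate(predicate: Callable[[float], bool],
             value: Optional[float]) -> RuleResult:
    """value is None when the attribute could not be read."""
    if value is None:
        # The rule cannot be evaluated at all: this is the EXCEPTION case.
        return RuleResult.INDETERMINATE
    return RuleResult.TRUE if predicate(value) else RuleResult.FALSE


def over_80(utilization: float) -> bool:
    """Hypothetical rule expression: utilization above 80%."""
    return utilization > 80.0


for sample in (91.5, 12.0, None):
    print(sample, evaluate(over_80, sample).value)
```

There is no fourth "come back later" outcome in this sketch: anything that is neither TRUE nor FALSE lands in INDETERMINATE.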
>Another thing that would be very useful would be to let us, the
> DECmcc users, set the ICON color from the ALARM user command procedure,
> using a command like: "MCC> set entity X color Y, in domain Z"
> We could then choose the ICON color based on whatever data item we
> choose and according to our specific environment.
>
> It would also be useful for Demos, and the like.
Use the Data Collector AM & sample code & specify a target entity in the event.
That'll do what you want fairly simply.
|
2619.9 | More | MAYDAY::ANDRADE | The sentinel (.)(.) | Tue Mar 31 1992 12:33 | 80 |
| Re.6 (Bill)
I agree DECmcc alarms need more functionality. Alarms should be more
than a simple YES/NO check of an entity's attribute. Having DECmcc
alarms do one thing per alarm means that users have to create a lot
of alarms that should not have been needed.
And this is what I am suggesting: making the alarms keep track of
the availability of the data they check as well as the data itself,
then passing the availability data in an effective manner to the user.
Reducing the number of alarm rules, as well as the number of firings
those rules do is one of my major goals. What I want is an ICON map
that mirrors DECmcc's knowledge of the CURRENT state of the network.
(This includes an indication of alarmed entities that are unreachable)
With ONE mail sent to the appropriate people when something serious
happens. A node becoming unreachable is serious, I just don't want to
hear about it thousands of times.
Re.7 (Rob)
> By the way, DECmcc DOES have a separate color for exceptions, it
> is the color associated with the severity INDETERMINATE.
This is news to me, it must be v1.2 functionality. My v1.1 never did
this.
Re.8 (MCPHERSON)
>The behavior of the product is this: Icon color is associated with an event or
>alarm rule severity. If there is an event or alarm associated with this
>REACHABILITY attribute, THEN the object's icon will change color.
I know that each alarm rule has ONE color associated with it. What I
suggested is that it may be useful to have MANY colors associated with
a single rule. Avoiding the need to create many rules to do the same
thing. Something like
"STATE= (ON=green, OFF=red, UNREACHABLE=orange)"
or
"UTILIZATION= ([<80]=green, [>=80]=yellow, UNREACHABLE=red)"
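As a rough sketch of what such a many-colors-per-rule mapping would amount to, assuming a failed poll is modelled as `None`. The function names and the lookup table are hypothetical; nothing like this exists in DECmcc as far as this thread says.

```python
from typing import Optional

# Hypothetical per-rule color table, following the notation proposed above.
STATE_COLORS = {"ON": "green", "OFF": "red"}
UNREACHABLE_STATE_COLOR = "orange"


def state_color(state: Optional[str]) -> str:
    """state is None when the attribute could not be read."""
    if state is None:
        return UNREACHABLE_STATE_COLOR
    return STATE_COLORS[state]  # KeyError for states the table does not list


def utilization_color(percent: Optional[float]) -> str:
    """percent is None when the utilization data is not available."""
    if percent is None:
        return "red"
    return "green" if percent < 80 else "yellow"


print(state_color("ON"), state_color("OFF"), state_color(None))
print(utilization_color(42.0), utilization_color(80.0), utilization_color(None))
```

One rule then covers what would otherwise take a separate rule per color.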
>Sorry. Your interpretation for EXCEPTIONs firing is in conflict with DECmcc's
>prescribed behavior. Exceptions fire when an alarm rule CANNOT be evaluated.
>Period. That is the prescribed behavior.
I know that this is how they work. But consider this, you are polling
10 systems (with 3 alarms each) in an Ethernet segment every 10 minutes.
This means if that segment becomes unreachable for a day (has happened)
you will receive over FOUR THOUSAND alarm exceptions in a single day
informing you that those systems are unreachable. (and if you requested
mails...)
That is why I suggest, to reduce the workload of the users (and DECmcc),
that those alarm exceptions inform the user ONCE per alarm that
the system is unreachable, and ONCE again when it becomes reachable. It
would be a lot better: 60 alarm exceptions as opposed to over 4,000.
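The two figures can be checked with a few lines of arithmetic, assuming the outage lasts a full 24 hours and every poll during it raises an exception:

```python
MINUTES_PER_DAY = 24 * 60

systems = 10
alarms_per_system = 3
poll_interval_minutes = 10

rules = systems * alarms_per_system                        # 30 alarm rules
polls_per_day = MINUTES_PER_DAY // poll_interval_minutes   # 144 polls per rule

# Current behavior: every failed poll of every rule raises an exception.
exceptions_every_poll = rules * polls_per_day

# Proposed behavior: one exception when a rule becomes indeterminate,
# one more when it becomes determinate again.
exceptions_on_change = rules * 2

print(exceptions_every_poll)  # 4320
print(exceptions_on_change)   # 60
```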
>Also remember: Alarm rules gotta do what alarm rules gotta do. They look for
>the data needed to satisfy rules in a BOOLEAN fashion. I.e. there is no
>"Maybe. Come back later" state for an alarm rule; they are either TRUE, FALSE
>or INDETERMINATE. INDETERMINATE means that it couldn't evaluate either TRUE or
>FALSE and *that* is considered an EXCEPTION (and not the rule.... ;^) )
I agree a rule gotta do what it's gotta do, all I am saying is that
exceptions should fire TWICE ONLY. Firing ONCE when the rule becomes
INDETERMINATE and ONCE AGAIN when the rule becomes DETERMINATE again.
If, as indicated, v1.2 sets the ICON to the INDETERMINATE color "WHILE"
the rule evaluation is INDETERMINATE, then these two things together
would provide what I requested in my original reply.
>Use the Data Collector AM & sample code & specify a target entity in the event.
>That'll do what you want fairly simply.
Thanks for the information, I will look into it.
|
2619.10 | Wish->QAR->Engineering's 'to do' list... | TOOK::MCPHERSON | Save a tree: kill an ISO working group. | Tue Mar 31 1992 13:21 | 19 |
| I urge you to file a QAR so that your suggestions about changes to the
post 1.2 product will get in the queue. Otherwise, they're likely to
remain 'nice ideas'.
BTW: I understand your requirements and they would be nifty
enhancements; you just 'cahn't get there from heah' right now.
/doug
P.S. If you're getting tons of mail on your EXCEPTIONS, then you need
to change the exception procedure to NOT send mail. At least you
wouldn't get flooded with mail when a segment goes unreachable...
P.P.S
You can also create "rules on rules" that look at the counters
associated (i.e. if it's unable to evaluate >10 times, then open the
pod bay doors... blah blah blah).
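The "rules on rules" idea, very roughly: one rule keeps a count of failed evaluations, and a second rule watches that count. The sketch below assumes a cumulative counter and uses made-up names (`RuleCounters`, `escalation_rule`); the real counter attributes are not named in this note.

```python
class RuleCounters:
    """Hypothetical stand-in for the counters kept on an alarm rule."""

    def __init__(self) -> None:
        self.evaluation_failures = 0

    def record(self, evaluated_ok: bool) -> None:
        if not evaluated_ok:
            self.evaluation_failures += 1


def escalation_rule(counters: RuleCounters, threshold: int = 10) -> bool:
    """Second-level rule: true once the first rule failed > threshold times."""
    return counters.evaluation_failures > threshold


counters = RuleCounters()
results = []
for _ in range(12):          # twelve polls in a row that cannot be evaluated
    counters.record(evaluated_ok=False)
    results.append(escalation_rule(counters))

print(results.index(True) + 1)  # number of the first poll that trips the rule
```

Whether the counter should be cumulative or reset on a successful evaluation is a design choice this sketch does not settle.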
/doug
|
2619.11 | ok | MAYDAY::ANDRADE | The sentinel (.)(.) | Thu Apr 09 1992 06:53 | 8 |
| Re.10 /doug
Certainly, I will open a QAR if it will help, but Erik Mintz (re.3)
has already done so for me.
These last replies are just to iron the functionality details out.
Gil
|
2619.12 | correlation of events | TOOK::CALLANDER | MCC = My Constant Companion | Tue Apr 21 1992 11:45 | 18 |
| To make things a bit clearer, what you are asking for has already been
defined by the standards on alarming, they call this function event
correlation. There exists in the profile (the document that explains
how to implement a standard) a description of the how to's and when
to's for event correlation. The idea behind correlation is so that
an event is only reported once, and another report doesn't occur until
the condition has been changed (better or worse). In DECmcc we have
implemented correlation at the PM level and not the FM level for V1.2
(though it needs to be handled in the FM; but not until we figure out
how to handle notification of these events when user processes come and
go between the leading edge and trailing edge of an event). If you
select an entity and ask to see its list of notifications (display
notifications) you will see only the correlated list, not EVERY event
that has been reported. We hope this helps, but are aware of the
limitation that the implementation in the PM has on the user.
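As a loose illustration of the correlation idea (report once, then stay quiet until the condition changes for better or worse), and not a description of the PM implementation or of the standard's profile: the `correlate` function and the (entity, severity) tuples are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (entity, severity), both hypothetical labels


def correlate(events: List[Event]) -> List[Event]:
    """Keep an event only when the entity's severity differs from the
    last severity reported for that entity."""
    last_reported: Dict[str, str] = {}
    reported: List[Event] = []
    for entity, severity in events:
        if last_reported.get(entity) != severity:
            reported.append((entity, severity))
            last_reported[entity] = severity
    return reported


raw = [
    ("node A", "indeterminate"),
    ("node A", "indeterminate"),
    ("node B", "critical"),
    ("node A", "indeterminate"),
    ("node A", "clear"),
    ("node B", "critical"),
]
print(correlate(raw))
```

Six raw events collapse to three reports here: two for node A (going indeterminate, then clear) and one for node B.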
jill
|
2619.13 | We hope to see this functionality also!!!!! | COL01::LUNT | | Tue Apr 28 1992 06:01 | 7 |
| Hi,
This correlation of alarms is really needed. We also hope to see it in
the follow on version to 1.2.
Julie Ann
|