T.R | Title | User | Personal Name | Date | Lines |
---|
2405.1 | Lets get this straight ... | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Mon Feb 24 1992 17:02 | 21 |
| RE: .0
Let me see if I can understand what's going on here. You have created a
Rule via the FCL. I guess your syntax was something like:
create domain pko-24 rule <rule-name> -
expression = (node4 bbpk04 circuit syn-0 substate <> none, at every 00:01:00)
When this Rule is enabled you see the 'evaluation' counters increment
by more than 1 per evaluation (i.e., per minute)... Right?
Too strange. Could you try the same test without the IMPM .. just by
using the FCL and Notification FM ... type:
notify domain pko-24
before enabling the rule. Let it run for a bit and Show the Rule Counters
every few evaluations. Then post the results here.
thanks,
Keith
|
2405.2 | results of test | ICS::WOODCOCK | | Tue Feb 25 1992 09:30 | 186 |
| Hi Keith,
Thanks for your quick reply. Here are the results you asked for. When the
rule was enabled the circuit was up so it did not fire for the first couple
of minutes. Right at the point when I spawned to show the time I made the
circuit go 'synchronizing' causing it to fire.
You'll note that 17 seconds after the rule is enabled I did the first SHO COUNT
and there are 3 Evaluation False already. I then showed the char. of the rule
for your scrutiny. Two notifies came each firing (true) and this also does
not match the counters.
best regards,
brad...
ps. anything on the color update problems (my real concern)??
notify domain .pko-24
!%MCC-S-NOTIFSTART, Notify request 2 started
!
enable domain .pko-24 rule bbpk04_0
!
!Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!AT 25-FEB-1992 08:59:56
!
!Normal operation has begun.
!
show domain .pko-24 rule bbpk04_0 all count
!
!Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!AT 25-FEB-1992 09:00:16 Counters
!
!Examination of attributes shows:
! Creation Timestamp = 25-FEB-1992 08:59:56.64
! Evaluation True = 0
! Evaluation False = 3
! Evaluation Error = 0
!
show domain .pko-24 rule bbpk04_0 all char
!
!Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!AT 25-FEB-1992 09:02:09 Characteristics
!
!Examination of attributes shows:
! Expression = (node4 bbpk04 circuit syn-* substate
! <> none,at every 0:1:0)
! Severity = Critical
! Probable Cause = Unknown
!
!
show domain .pko-24 rule bbpk04_0 all count
!
!Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!AT 25-FEB-1992 09:02:43 Counters
!
!Examination of attributes shows:
! Creation Timestamp = 25-FEB-1992 08:59:56.64
! Evaluation True = 0
! Evaluation False = 9
! Evaluation Error = 0
!
sp sho time ! the circuit was brought down here and the rule began to fire
! each minute.
!
!!!!!!!!!!!!!!! Alarm, 25-FEB-1992 09:03:59 !!!!!!!!!!!!!! [2]
!Domain: NOCMAN_NS:.pko-24 Severity: Critical
!Notification Entity: Node4 NOCMAN_NS:.BBPK04 Circuit SYN-0
!Event Source: Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!Event: OSI Rule Fired
!
! Event Type = QualityofServiceAlarm
! Event Time = 25-FEB-1992 09:03:57.25
! Probable Cause = Unknown
! Additional Info = { (
! significance = True,
! information = "Rule fired: Node4 24.545 Circuit
! SYN-0 Substate = Synchronizing
! 25-FEB-1992 09:03:57.18" ),
! (
! significance = True,
! information = "(node4 bbpk04 circuit syn-*
! substate <> none,at every 0:1:0)
! " ) }
! Managed Object = Node4 24.545 Circuit SYN-0
! Perceived Severity = Critical
!
!
!
!!!!!!!!!!!!!!! Alarm, 25-FEB-1992 09:04:03 !!!!!!!!!!!!!! [2]
!Domain: NOCMAN_NS:.pko-24 Severity: Clear
!Notification Entity: Node4 NOCMAN_NS:.BBPK04 Circuit SYN-1
!Event Source: Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!Event: OSI Rule Fired
!
! Event Type = QualityofServiceAlarm
! Event Time = 25-FEB-1992 09:03:58.22
! Probable Cause = Unknown
! Additional Info = { (
! significance = True,
! information = "Rule cleared: Node4 24.545
! Circuit SYN-1 Substate = None
! 25-FEB-1992 09:03:57.20" ),
! (
! significance = True,
! information = "(node4 bbpk04 circuit syn-*
! substate <> none,at every 0:1:0)
! " ) }
! Managed Object = Node4 24.545 Circuit SYN-1
! Perceived Severity = Clear
!
!
!
show domain .pko-24 rule bbpk04_0 all count
!
!Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!AT 25-FEB-1992 09:04:35 Counters
!
!Examination of attributes shows:
! Creation Timestamp = 25-FEB-1992 08:59:56.64
! Evaluation True = 1
! Evaluation False = 14
! Evaluation Error = 0
!
!!!!!!!!!!!!!!! Alarm, 25-FEB-1992 09:04:58 !!!!!!!!!!!!!! [2]
!Domain: NOCMAN_NS:.pko-24 Severity: Critical
!Notification Entity: Node4 NOCMAN_NS:.BBPK04 Circuit SYN-0
!Event Source: Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!Event: OSI Rule Fired
!
! Event Type = QualityofServiceAlarm
! Event Time = 25-FEB-1992 09:04:57.16
! Probable Cause = Unknown
! Additional Info = { (
! significance = True,
! information = "Rule fired: Node4 24.545 Circuit
! SYN-0 Substate = Synchronizing
! 25-FEB-1992 09:04:57.09" ),
! (
! significance = True,
! information = "(node4 bbpk04 circuit syn-*
! substate <> none,at every 0:1:0)
! " ) }
! Managed Object = Node4 24.545 Circuit SYN-0
! Perceived Severity = Critical
!
!
!
!!!!!!!!!!!!!!! Alarm, 25-FEB-1992 09:05:01 !!!!!!!!!!!!!! [2]
!Domain: NOCMAN_NS:.pko-24 Severity: Clear
!Notification Entity: Node4 NOCMAN_NS:.BBPK04 Circuit SYN-1
!Event Source: Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!Event: OSI Rule Fired
!
! Event Type = QualityofServiceAlarm
! Event Time = 25-FEB-1992 09:04:57.61
! Probable Cause = Unknown
! Additional Info = { (
! significance = True,
! information = "Rule cleared: Node4 24.545
! Circuit SYN-1 Substate = None
! 25-FEB-1992 09:04:57.11" ),
! (
! significance = True,
! information = "(node4 bbpk04 circuit syn-*
! substate <> none,at every 0:1:0)
! " ) }
! Managed Object = Node4 24.545 Circuit SYN-1
! Perceived Severity = Clear
!
!
!
show domain .pko-24 rule bbpk04_0 all count
!
!Domain NOCMAN_NS:.pko-24 Rule bbpk04_0
!AT 25-FEB-1992 09:05:11 Counters
!
!Examination of attributes shows:
! Creation Timestamp = 25-FEB-1992 08:59:56.64
! Evaluation True = 2
! Evaluation False = 16
! Evaluation Error = 0
!
use log off
!
|
2405.3 | You are using Child Wildcards in your Rule Expression | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Tue Feb 25 1992 14:17 | 30 |
| RE: .2
You are using Child Wildcards in your Rule Expression. I bet you
have 3 SYN circuits on node BBPK04 -- which explains why the counter
increments by 3.
I am currently implementing Global Wildcarding for Alarms. One of the
sub-tasks is to fix the Rule Transition logic; that is ...
Previous    Current     Event type
False       False       n/a
False       True        Rule Fired
False       Error       Rule Exception
True        False       Rule Cleared
True        True        Rule Fired
True        Error       Rule Exception
Error       False       Rule Cleared
Error       True        Rule Fired
Error       Error       Rule Exception
This existing logic worked fine if there were no wildcards. The new
logic adds another column: the rule target entity.
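The table plus the extra "rule target entity" column can be sketched as a lookup keyed by entity. This is only an illustration of the idea, not the DECmcc code: the `RuleState` class, its method names, and the assumption that an entity not yet evaluated starts as False are all made up for the example.

```python
# Illustrative sketch only; these names are not DECmcc internals.
# (previous result, current result) -> event type, copied from the table above.
TRANSITIONS = {
    ("False", "False"): None,            # n/a: no event
    ("False", "True"):  "Rule Fired",
    ("False", "Error"): "Rule Exception",
    ("True",  "False"): "Rule Cleared",
    ("True",  "True"):  "Rule Fired",
    ("True",  "Error"): "Rule Exception",
    ("Error", "False"): "Rule Cleared",
    ("Error", "True"):  "Rule Fired",
    ("Error", "Error"): "Rule Exception",
}


class RuleState:
    """Remembers the previous evaluation result per target entity."""

    def __init__(self):
        # Assumption: an entity that has not been evaluated yet counts as "False".
        self.previous = {}

    def evaluate(self, entity, current):
        """Record one evaluation result and return the event type, or None."""
        previous = self.previous.get(entity, "False")
        self.previous[entity] = current
        return TRANSITIONS[(previous, current)]


state = RuleState()
for circuit, result in [("syn-0", "True"), ("syn-1", "False"), ("syn-2", "False")]:
    print(circuit, state.evaluate(circuit, result))
```

With the state keyed by entity, the False results for syn-1 and syn-2 produce no event and do not clear the event raised for syn-0.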
Could the Icon color problems be due to rules which contained child
wildcards ?
/keith
|
2405.4 | that makes sense to me | ICS::WOODCOCK | | Tue Feb 25 1992 15:33 | 20 |
|
> You are using Child Wildcards in your Rule Expression. I bet you
> have 3 SYN circuits on node BBPK04 -- which explains why the counter
> increments by 3.
You are most correct and that definitely makes sense for the counters.
> Could the Icon color problems be due to rules which contained child
> wildcards ?
> /keith
I suspect you are also right about this but something doesn't feel
quite right yet. I'll have to play a little more now that I have a better
idea of what is going on and get back to you.
thanks,
brad...
|
2405.5 | Architecture .nes. Solution | ICS::WOODCOCK | | Wed Feb 26 1992 11:35 | 63 |
| Keith,
Again, thanks for pointing out my misinterpretation of what is happening. I
wish I understood what was going on when I first saw this a few weeks back.
There is a "M A J O R" implementation issue at hand and I can only hope your
indication that you are still working in this area will fix this. There seem
to be a couple of topics to talk about. I have gone back and retested and here
is what I've got.
Viewing the top domain. Enable the wildcarded rule. Counters increment by
three appropriately with TRUE or FALSE increments depending on the circuit
states as should be. Circuit syn-0 is down and circuits syn-1 & 2 are up.
The rule fires twice each poll period of the alarm. First it goes critical
because syn-0 is down, it then polls syn-1 and turns clear because the circuit
is up, it then polls syn-2 and does nothing because the state of the rule
hasn't changed and the circuit is up. If I view the lower domain while this
is happening the node4 global entity reacts the same; it goes critical then
clear. The line drawn as syn-0 goes critical and stays that way as expected.
All appears to be operating as architected with one glitch which I'll go into
in a paragraph or two. The problem is that the "architecture" doesn't provide
the solution to the problem.
PROBLEM: Build a REAL TIME MONITOR with multiple levels (domain hierarchy)
which shows the current status of the network at any given level.
We already know that viewing most severe alarms doesn't produce this monitor.
We now know that viewing all alarms last fire doesn't work because as indicated
above if I have a problem in a domain I don't see it at the top level. There
appear to be a couple of options:
1. Allow flexibility for individual objects to be set up for LAST or MOST
SEVERE (this was just recently suggested in another note). To work this
has to be defined all the way thru the entity structure with children
having the ability to be defined separately. I don't believe I support
this because it is simply too much work to set up. Unless you could set
it up globally. Example, Domain *=Most Severe, Node4 *=Most Severe,
Node4 * Circuit *=LAST.
2. A direct hybrid solution may be more appropriate. Dynamic domains and global
entities are MOST SEVERE if alarms are detected 'beneath' within the
children. Children of the global entities are LAST. I suppose there could
be problems with this if a global entity=MOST for the monitoring of its
children and another alarm set is used to monitor something at the global
level which needs to be LAST. There is a trade off and a decision for the
user.
The second topic appears to be a bug. While viewing the lower domain the
rule executes and the line goes critical and the node4 toggles from critical
back to clear. Before the next alarm fires, I look into the node4 and see
the circuit critical, I then look back up and the line has gone clear and
the node4 is now critical (???). The same thing happens if I look up. In the
original example the top level domain toggles from critical to clear. If
I look up from the lower domain the top level now shows the lower level
critical instead. Another quick look down again indicates the line as clear
and the node4 as critical (???).
best regards,
brad...
|
2405.6 | A partial answer | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Wed Feb 26 1992 16:14 | 24 |
| RE: .5
Brad,
I certainly see your point about the Real-time-monitoring capabilities of
DECmcc. There needs to be some more work in this area.
As far as:
>> The second topic appears to be a bug. While viewing the lower domain the
>> rule executes and the line goes critical and the node4 toggles from critical
>> back to clear. Before the next alarm fires, I look into the node4 and see
>> the circuit critical, I then look back up and the line has gone clear and
>> the node4 is now critical (???). The same thing happens if I look up. In the
>> original example the top level domain toggles from critical to clear. If
>> I look up from the lower domain the top level now shows the lower level
>> critical instead. Another quick look down again indicates the line as clear
>> and the node4 as critical (???).
I hope that the new code which implements the Rule Transition logic for
multiple entities fixes your problem. That is, fixes Rule Fired/Cleared
events when Global or Child Wildcards are present in the Rule Expression.
/keith
|
2405.7 | another possibility | ICS::WOODCOCK | | Wed Feb 26 1992 18:02 | 48 |
| > I certainly see your point about the Real-time-monitoring capabilities of
> DECmcc. There needs to be some more work in this area.
My question now becomes will this work be done for V1.2 (MCC managers are
encouraged to answer)? The use or **NON-USE** of this product as our monitor
may very well depend on this functionality being available. Politically for
marketing reasons we may use it, but my technical recommendation to my current
management would be "seek alternative methods until MCC supplies this
functionality, at least for our monitoring needs". The alternative is already
running. If V1.2 cannot accomplish this task (which my dollar says EVERY
customer will demand anyway) then the last two years and three months spent
helping to test this product has been a poor investment.
The ESC has two basic needs, a real time monitor as described (without manual
clearing of alarms), and network performance reporting for analysis. V1.2
is outstanding in the area of performance mngmt for each of the protocols
PA supports, and the monitoring capabilities appear to be equally impressive
except for this last issue in the graphics area which is a cliff hanger. It's
got to be done to be the market leader.
Hi Keith,
> I hope that the new code which implements the Rule Transition logic for
> multiple entities fixes your problem. That is, fixes Rule Fired/Cleared
> events when Global or Child Wildcards are present in the Rule Expression.
When you think the fix is done reply here and I'll make a note to test it in
the next version.
Also to follow up on what I suggested a couple of notes back for the monitor,
I don't think that will work. If domains are set to MOST SEVERE we are back
to what V1.1 offers. I guess what is really needed is to have MCC keep
track of all alarms fired within each dynamic domain and display the "highest
current severity" it knows of. Also note it has to handle wildcards and
therefore a current severity for each instance matching the wildcard. The
global entities could do the same, or potentially display a different
severity (warning maybe) if there are only alarms on its children. It would
then be up to the user to draw the child as a separate line entity to view
the actual status of the child. This would allow for global entity alarms to
be displayed if encountered or needed.
sincerely,
brad...
|
2405.8 | X1.2.15 issues: Alarms clear & Domains not in cache | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Wed Apr 01 1992 02:15 | 21 |
| I'm concerned by what is described in the previous few notes. I am in
desperate need of the alarms clear and domain caching capabilities.
I am currently running X1.2.15 on VMS V5.5, Motif T1.1. Once alarms
fire and change icon colors, those colors remain even when the alarms
clear in the notification window. This is unacceptable. For my
customer, this is a MAJOR product weakness.
The same is true of domain caching. DECmcc domain/map navigation under
VMS (on VAXstation 3100s) is already slower than grandma with a hernia.
To force another map file read is frustrating. I can manually get what
I need via NCP, UCX, Remote Console, or UNIX prompt command lines
BEFORE MCC can even complete a "Look Into". Can we please have the old
way back?
Please don't think I'm insensitive. I understand you're all under the
gun and that you're also short-handed, but if the above are not fixed
for V1.2, DECmcc will not be looked upon favorably by my customer.
Thanks,
Dan
|
2405.9 | domain caching will be in v1.2 | POLE::LEMMON | | Wed Apr 01 1992 14:44 | 11 |
|
> The same is true of domain caching. DECmcc domain/map navigation under
> VMS (on VAXstation 3100s) is already slower than grandma with a hernia.
> To force another map file read is frustrating. I can manually get what
> I need via NCP, UCX, Remote Console, or UNIX prompt command lines
> BEFORE MCC can even complete a "Look Into". Can we please have the old
> way back?
Domain caching was put back in x1.2.16 and will be in the product.
It was accidentally pulled when the caching for subentities was
pulled.
|
2405.10 | | POLE::LEMMON | | Wed Apr 01 1992 14:47 | 11 |
|
> I am currently running X1.2.15 on VMS V5.5, Motif T1.1. Once alarms
> fire and change icon colors, those colors remain even when the alarms
> clear in the notification window. This is unacceptable. For my
> customer, this is a MAJOR product weakness.
When you say clear, do you mean the alarm fires with clear severity?
If so, are you running with highest or latest propagation?
/Jim
|
2405.11 | re: .5, hang on a minute notification wasn't completed in X1.2.15!!!!! | DADA::DITMARS | Pete | Thu Apr 02 1992 14:47 | 39 |
| Brad,
>hasn't changed and the circuit is up. If I view the lower domain while this
>is happening the node4 global entity reacts the same; it goes critical then
>clear. The line drawn as syn-0 goes critical and stays that way as expected.
>All appears to be operating as architected with one glitch which I'll go into
>in a paragraph or two. The problem is that the "architecture" doesn't provide
>the solution to the problem.
>
>PROBLEM: Build a REAL TIME MONITOR with multiple levels (domain hierarchy)
> which shows the current status of the network at any given level.
>
>We already know that viewing most severe alarms doesn't produce this monitor.
You are working with X1.2.15 of the IMPM. X1.2.15 >>>>*does not*<<<< handle
rule-cleared conditions correctly. Rule-cleared conditions were not taken into
account in the V1.1 design. The work was not straightforward, and it was not
completed until late in the development process.
To be precise, the CLEAR severity that was coming in on the later circuit
*INCORRECTLY* caused the parent NODE4 entity to go CLEAR. Using HIGHEST
policy, the NODE4 entity should (and does, in versions later than X1.2.16)
remain the highest severity of any of its children.
Also, the IMPM is able to distinguish among multiple conditions (e.g. different
rules firing) on the same entity, thus if rule 1 is a WARNING and rule 2 is
a MAJOR, the entity will have the MAJOR color. If rule 2 then CLEARS, the
entity will go to WARNING. If rule 1 then CLEARS the entity will go to CLEAR.
Clear? :^)
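The behavior described here (per-rule conditions on one entity, and a parent that shows the highest severity of its children) can be sketched as below. This is a guess at the shape of the logic, not the IMPM implementation: the `EntityStatus` class is hypothetical, and the ordering clear < warning < major < critical is assumed from the severities named in this topic.

```python
# Illustrative sketch only; not the IMPM implementation.
# Assumed ordering of the severities mentioned in this topic.
RANK = {"clear": 0, "warning": 1, "major": 2, "critical": 3}


class EntityStatus:
    """Tracks the current severity of each rule (condition) on one entity."""

    def __init__(self, name):
        self.name = name
        self.conditions = {}   # rule name -> current non-clear severity
        self.children = []     # child EntityStatus objects

    def report(self, rule, severity):
        """Record the latest severity for one rule on this entity."""
        if severity == "clear":
            self.conditions.pop(rule, None)   # that condition no longer holds
        else:
            self.conditions[rule] = severity

    def displayed(self):
        """Highest severity among current conditions here and on children."""
        candidates = list(self.conditions.values())
        candidates += [child.displayed() for child in self.children]
        return max(candidates, key=RANK.__getitem__, default="clear")


node = EntityStatus("node4 bbpk04")
node.report("rule 1", "warning")
node.report("rule 2", "major")
print(node.displayed())
node.report("rule 2", "clear")
print(node.displayed())
```

The point of keeping one entry per rule is that a CLEAR only removes the condition it belongs to, so the entity falls back to whatever is still outstanding.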
Latest/highest on a per-entity basis will be a pain to implement and to
use, but if it's a real requirement it should go onto the wish-list (and into
Phase 0 for the next version) and be prioritized along with all the other future
work.
PLEASE, before you go bashing the product in any wider circles, let's understand
that you haven't really tested the final V1.2 functionality in this area!!!!
Thanks for your past and future feedback!
|
2405.12 | re: .8 more details... | DADA::DITMARS | Pete | Thu Apr 02 1992 15:00 | 15 |
| >Once alarms
> fire and change icon colors, those colors remain even when the alarms
> clear in the notification window. This is unacceptable. For my
> customer, this is a MAJOR product weakness.
See reply .11.
Explain what you mean by "clear in the notification window". Do you mean that
an alarm rule fires with severity CLEAR, or that the user DELETES the
notification, or merely that the notification scrolls out of the viewport
in the notification window?
If you're talking about rules on a global entity, and are using x1.2.15 or
earlier, then you're not going to see correct coloring behavior in the IMPM
when a rule fires and then clears.
|
2405.13 | great news | ICS::WOODCOCK | | Thu Apr 02 1992 17:45 | 49 |
| Pete,
>You are working with X1.2.15 of the IMPM. X1.2.15 >>>>*does not*<<<< handle
>rule-cleared conditions correctly. Rule-cleared conditions were not taken into
>account in the V1.1 design. The work was not straightforward, and it was not
>completed until late in the development process.
Glad to hear I wasn't seeing finished code.
>To be precise, the CLEAR severity that was coming in on the later circuit
>*INCORRECTLY* caused the parent NODE4 entity to go CLEAR. Using HIGHEST
>policy, the NODE4 entity should (and does, in version later than X1.2.16)
>remain the highest severity of any of its children.
You just made my day (3 months worth)... Is this also going to be true for
domains?
>Also, the IMPM is able to distinguish among multiple conditions (e.g. different
>rules firing) on the same entity, thus if rule 1 is a WARNING and rule 2 is
>a MAJOR, the entity will have the MAJOR color. If rule 2 then CLEARS, the
>entity will go to WARNING. If rule 1 then CLEARS the entity will go to CLEAR.
>Clear? :^)
Clear as the sky is blue.
>Latest/highest on a per-entity basis will be a pain to implement and to
>use, but if it's a real requirement it should go onto the wish-list (and into
>Phase 0 for the next version) and be prioritized along with all the other future
>work.
From your description above it seems that HIGHEST is what the doctor ordered.
It may even be worth renaming to CURRENT HIGHEST as this looks quite different
than what users of V1.1 see today for HIGHEST.
>PLEASE, before you go bashing the product in any wider circles, let's understand
>that you haven't really tested the final V1.2 functionality in this area!!!!
Fair enough. The previous notes were written before the new schedule was out
with an understanding the release date was about to be upon us. It looked as
if some serious attention needed to be afforded in this critical area in a
very short period or the boat would be missed for another year. The next bash
should be held in a local establishment when V1.2 ships :-).
>Thanks for your past and future feedback!
Bring on the next kit.
best regards,
brad...
|
2405.14 | DW did it. | DADA::DITMARS | Pete | Thu Apr 02 1992 20:11 | 10 |
| >You just made my day (3 months worth)...
Actually David Wong made your day. I just told you it was made. :^)
>Is this also going to be true for domains?
Yes.
Again, thanks for your testing efforts, your feedback, your concern, and your
patience. We really appreciate it!
|
2405.15 | still not right | ICS::WOODCOCK | | Tue Apr 21 1992 16:19 | 34 |
| Pete/all,
I just got my EVL sink going and did some testing and this is what I found.
>To be precise, the CLEAR severity that was coming in on the later circuit
>*INCORRECTLY* caused the parent NODE4 entity to go CLEAR. Using HIGHEST
>policy, the NODE4 entity should (and does, in version later than X1.2.16)
>remain the highest severity of any of its children.
The problem of a later circuit causing a 'clear' is no longer there as you
state above.
>Also, the IMPM is able to distinguish among multiple conditions (e.g. different
>rules firing) on the same entity, thus if rule 1 is a WARNING and rule 2 is
>a MAJOR, the entity will have the MAJOR color. If rule 2 then CLEARS, the
>entity will go to WARNING. If rule 1 then CLEARS the entity will go to CLEAR.
>Clear? :^)
With multiple children I don't seem to be able to get this to work as
described. When set to HIGHEST I create a 'critical' event (node goes red), I
then create a 'clear' event. The 'clear' notify comes in but the node stays
red (events were on the same child). This implies MCC is showing HIGHEST of
all events rather than CURRENT HIGHEST OF ALL CHILDREN making this method not
suitable for a real time hands off monitor.
If I set notifies to LATEST and create a critical event for two of the children,
the node goes red. If I then create a single 'clear' event for only one of the
children, the node goes 'clear' while the other child is still 'critical'. This
also happens at the domain level.
Question: Is development of this functionality still under way??
best regards,
brad...
|
2405.16 | using t1.2.7 | ICS::WOODCOCK | | Tue Apr 21 1992 16:20 | 1 |
| ps. previous notes' testing was under t1.2.7.
|
2405.17 | event correlation: a modest proposal | DADA::DITMARS | Pete | Thu Apr 23 1992 11:41 | 40 |
| OK, the problem is that when correlation on events is done, it's done by
default on the event ID. So event A (circuit down) is never going to correlate
to event B (circuit up) because their IDs are different. Therefore, given the
current set of notification services there's no way to make an icon go "red" and
then "green" automatically based on events.
Presently, there's a way to override the default event correlation behavior so
that correlation can be done on the text of an event instead of the ID of the
event (via the mcc_ns.replyTextMatchEnts resource in the
mcc_notification_resource.dat file). This extends us in one direction to allow
the status of multiple conditions to be tracked that are reported via the same
event (a la data collector).
It would appear that failing some more sophisticated mechanism for informing
notification services of how events should be correlated (e.g. a table with
circuit up and circuit down identified as being related), a method similar to
the replyTextMatchEnts resource would provide behavior that is much more useful
than the present implementation.
I'm proposing the following change to event correlation for the V1.2 product:
1) event correlation BY DEFAULT changes to lump all events together,
thus circuit up and circuit down would correlate to one another
2) a resource is added to the mcc_notification_resource.dat file
mcc_ns.eventIdMatchEnts, which is a list of global entity classes
for which events should be correlated based on ID
(i.e. the present behavior)
What this will give us is the ability to use events, notify requests and
targetting (to assign severities to events) to more correctly indicate the
status of phase4 circuits, etc..
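To make the proposal concrete, here is a rough sketch of the two correlation modes. It is not the notification services code: the `EventCorrelator` class is hypothetical, `event_id_match_classes` merely stands in for the proposed mcc_ns.eventIdMatchEnts list, and taking the highest severity across correlation groups is an assumption of the sketch.

```python
# Illustrative sketch only; not the notification services implementation.
RANK = {"clear": 0, "warning": 1, "major": 2, "critical": 3}  # assumed ordering


class EventCorrelator:
    def __init__(self, event_id_match_classes=()):
        # Stand-in for the proposed mcc_ns.eventIdMatchEnts resource:
        # entity classes whose events still correlate on event ID.
        self.event_id_match_classes = set(event_id_match_classes)
        self.latest = {}   # correlation key -> latest severity

    def _key(self, entity_class, entity, event_id):
        if entity_class in self.event_id_match_classes:
            return (entity, event_id)   # present behavior: one group per event ID
        return (entity, None)           # proposed default: all events lumped together

    def receive(self, entity_class, entity, event_id, severity):
        """A later event replaces an earlier one only within the same group."""
        self.latest[self._key(entity_class, entity, event_id)] = severity

    def severity(self, entity):
        """Assumption: the entity shows the highest severity of its groups."""
        current = [sev for (ent, _), sev in self.latest.items() if ent == entity]
        return max(current, key=RANK.__getitem__, default="clear")


lumped = EventCorrelator()
lumped.receive("node4", "circuit syn-0", "circuit down", "critical")
lumped.receive("node4", "circuit syn-0", "circuit up", "clear")
print(lumped.severity("circuit syn-0"))
```

Constructing the correlator with `event_id_match_classes={"node4"}` and replaying the same two events leaves the circuit critical, which is the present behavior the proposal is trying to get away from.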
Correlation of alarms would not be affected in any way by this change.
I know the above isn't a perfect solution, but we're talking about an acceptable
risk and impact to the existing product schedule that gets us a few steps
farther in the right direction of solving real customer problems and producing
a more saleable product.
Feedback is welcome.
|
2405.18 | proposal comments | ICS::WOODCOCK | | Thu Apr 23 1992 12:48 | 74 |
| Hi Pete,
As a follow on to our conversation I'd like to get some thoughts in writing.
First is the importance of this functionality. Anyone using polling to manage
their circuits should be ok generally speaking. But, anyone with their nose
to the grindstone doing network mngmt will want to use events and have proper
colors showing status of the net. I don't have to look far for a clear example.
We have both MCC and MSU running. There have been many situations (recent)
where the network has had critical problems where the backbone has 10-15
circuits bouncing at once. MCC X1.2.15 is right there in the middle of it with
bells, mail, log files, notify window, and color even to the point where we saw
our other polling alarms fail across the net with exceptions (an ugly sight but
a key indicator how bad things were).
And MSU, barely a blink!! MCC and events hands down [do success stories help
lobbying efforts :-)].
>It would appear that failing some more sophisticated mechanism for informing
>notification services of how events should be correlated (e.g. a table with
>circuit up and circuit down identified as being related), a method similar to
>the replyTextMatchEnts resource would provide behavior that is much more useful
>than the present implementation.
When you do get to correlation of events here are some examples:
Correlate: Circuit Down, Circuit Fault (4.7) to Circuit Up (4.10)
           Adjacency Down (4.18) to Adjacency Up (4.15)
           Node Reachability (needs correlation via the text in the event)
           Area Reachability (needs correlation via the text in the event)
>I'm proposing the following change to event correlation for the V1.2 product:
> 1) event correlation BY DEFAULT changes to lump all events together,
> thus circuit up and circuit down would correlate to one another
> 2) a resource is added to the mcc_notification_resource.dat file
> mcc_ns.eventIdMatchEnts, which is a list of global entity classes
> for which events should be correlated based on ID
> (i.e. the present behavior)
>What this will give us is the ability to use events, notify requests and
>targetting (to assign severities to events) to more correctly indicate the
>status of phase4 circuits, etc..
This sounds as if it is workable for the v1.2 time frame as a solution (I'll
take anything that comes close at this point). There is one limitation which
must be brought to light. This will work well if the user is only doing
very basic mngmt tasks like circuit monitoring. If the user does this task AND
a notify of some other event something may get missed at some point because the
colors will strictly show the LAST severity. I would contend that for most
implementations job #1 is to track circuit and node outages with the monitor
therefore it should be ok. Adding the above functionality can only help one
way or the other.
>Correlation of alarms would not be affected in any way by this change.
This should be a future consideration because some users won't use NOTIFY for
the events but instead an alarm on the event to trigger other activity. They
will have to do both with the above solution. A NOTIFY command to handle color
changes in the domain and an ALARM with no associated domain to handle other
activity while maintaining the proper colors.
>I know the above isn't a perfect solution, but we're talking about an acceptable
>risk and impact to the existing product schedule that gets us a few steps
>farther in the right direction of solving real customer problems and producing
>a more saleable product.
Admittedly the solution isn't perfect, but at this late stage in the v1.2 game
any changes moving forward are appreciated.
any other comments out there...
best regards,
brad...
|
2405.19 | refinement to proposal | DADA::DITMARS | Pete | Thu Apr 23 1992 14:22 | 18 |
| a suggested refinement is that instead of
1) event correlation BY DEFAULT changes to lump all events together,
thus circuit up and circuit down would correlate to one another
we do
1) event correlation BY DEFAULT changes to lump together all events that
arrive in response to the SAME NOTIFY REQUEST. thus you could have
one notify request looking for circuit up and down and another
notify request looking for other events that shouldn't interfere
with the circuit up/circuit down event correlation (or another
pair of events that correlate to one another like circuit up/down).
This is a better step toward the "real" solution of having a table of
events that correlate to one another. You instead have a table of notify
requests, each of which is looking for a list of events that correlate to one
another.
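A small sketch of the refinement, with correlation keyed on the notify request rather than on the entity alone. Nothing here is real MCC code: the `NotifyCorrelator` class, the request numbers, and the "other event" name are placeholders for illustration.

```python
# Illustrative sketch only; request ids and event names are placeholders.
class NotifyCorrelator:
    def __init__(self):
        # (notify request id, entity) -> (event name, severity) of the latest event
        self.latest = {}

    def receive(self, request_id, entity, event_name, severity):
        """Events arriving for the same notify request replace one another."""
        self.latest[(request_id, entity)] = (event_name, severity)

    def outstanding(self, entity):
        """Non-clear severities still standing for an entity, per request."""
        return {
            request_id: severity
            for (request_id, ent), (_, severity) in self.latest.items()
            if ent == entity and severity != "clear"
        }


correlator = NotifyCorrelator()
correlator.receive(1, "circuit syn-0", "circuit down", "critical")
correlator.receive(2, "circuit syn-0", "other event", "warning")
correlator.receive(1, "circuit syn-0", "circuit up", "clear")
print(correlator.outstanding("circuit syn-0"))
```

Request 1 watches the circuit up/down pair and request 2 watches something unrelated, so the circuit up event cancels only the circuit down event and leaves the other notification standing.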
|
2405.20 | agreed, it's better | ICS::WOODCOCK | | Thu Apr 23 1992 22:02 | 26 |
|
1) event correlation BY DEFAULT changes to lump together all events that
arrive in response to the SAME NOTIFY REQUEST. thus you could have
one notify request looking for circuit up and down and another
notify request looking for other events that shouldn't interfere
with the circuit up/circuit down event correlation (or another
pair of events that correlate to one another like circuit up/down).
This is a better step toward the "real" solution of having a table of
events that correlate to one another. You instead have a table of notify
requests, each of which is looking for a list of events that correlate to one
another.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I like it, for the most part. The only complexity comes from having to set up
TARGETTING for every dynamic domain. The documentation will have to be precise
on how to pull this off or you'll lose the users in the smoke.
This is definitely a better approach with the 'real' solution providing a
mechanism for severity of each event so TARGETTING isn't so intense.
This brings up a question: Why doesn't TARGETTING propagate down thru domains
like NOTIFYs??? It seems to be an inconsistency from the users view.
Again, good idea...[is the code ready for testing yet :-)]
brad...
|
2405.21 | copying targetting from domain to domain to ... | DADA::DITMARS | Pete | Mon Apr 27 1992 13:36 | 51 |
| Howdy,
>This brings up a question: Why doesn't TARGETING propagate down thru domains
>like NOTIFYs??? It seems to be an inconsistency from the user's view.
The "Expand" argument of the Notify directive controls the "propagate down"
behavior. Assign Target doesn't have such an argument. Sounds like a good
suggestion.
In the meantime, once you plan your use of events and targeting, you could
propagate them to another set of domains in one of three ways:
1) use FCL to assign the targets to all domains, e.g.
assign target domain * -
event source = "node4 * circuit *", -
event name = "...", -
(etc.)
2) use your favorite editor to create an FCL script that specifies
the domain as a symbol, e.g.
assign target domain domain_instance -
event source = "node4 *", -
event name = "...", -
(etc.)
then create a master script that invokes that script, defining the
domain instance symbol appropriately e.g.
define domain_instance FOO
@setup_targets
define domain_instance FOO2
@setup_targets
(etc.)
3) using the IMPM, you can copy targets from one domain to another
domain using the targeting clipboard:
a. get targets the way you want them in source domain
b. from the Notification Window, bring up two target
directory windows by clicking
"Targeting->Directory of Targets.." and
"Targeting->Directory of Targets in new window..."
c. put source domain's name in one target directory window's
domain field and press "Update Display"
d. press Edit->Select All in that window, then Edit->Copy
e. put destination domain's name in other target directory
window, and press Edit->Paste in that window
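For option 2, the master script can be generated rather than typed by hand. The Python sketch below only emits the define/invoke pairs shown above. The function name and its defaults are hypothetical, and the FCL lines are copied from the example in this note rather than checked against a running DECmcc.

```python
def master_script(domains, symbol="domain_instance", script="setup_targets"):
    """Return a master FCL script with one define/invoke pair per domain."""
    lines = []
    for domain in domains:
        # Point the symbol at the next domain, then run the shared script.
        lines.append(f"define {symbol} {domain}")
        lines.append(f"@{script}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(master_script(["FOO", "FOO2"]), end="")
```

Feeding it the full list of dynamic domains gives one file to invoke, and the per-domain target definitions stay in a single place (the setup_targets script).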
|
2405.22 | better alarm correlation will probably have to wait | DADA::DITMARS | Pete | Mon Apr 27 1992 13:56 | 17 |
| re: .18
>>Correlation of alarms would not be affected in any way by this change.
>
>This should be a future consideration because some users won't use NOTIFY for
>the events but instead an alarm on the event to trigger other activity. They
>will have to do both with the above solution. A NOTIFY command to handle color
>changes in the domain and an ALARM with no associated domain to handle other
>activity while maintaining the proper colors.
Good point (that notify requests can't presently associate an activity with
an event and alarms can). Good work-around too (use Notify to get the color
right and null-domain rule to take action).
Are there cases where this really primitive correlation approach would be useful
for alarms? (it'd be more work and more risk and we probably can't consider
it for V1.2, unless it's an even bigger win than this proposal).
|
2405.23 | one happy camper | ICS::WOODCOCK | | Thu Apr 30 1992 13:19 | 40 |
| Hi Pete et al.,
I think this note has served its purpose and looks to be winding down. I
wanted to make sure it was left on a HIGH note. The exe you provided has
worked very well for our needs. So well, actually, that I put the 30k block debug
beast into production services while we wait for the next kit [and of course
we are looking for bugs :-)]. The functionality added here is a **MAJOR**
win for DECmcc and those involved should know the effort should be worth it
and is appreciated greatly internally.
>Are there cases where this really primitive correlation approach would be useful
>for alarms? (it'd be more work and more risk and we probably can't consider
>it for V1.2, unless it's an even bigger win than this proposal).
Actually I think I **PREFER** this solution over setting up a stiff correlation
table for specific events. I think this should be extended to alarms and also
to a MIX of events/rules within a NOTIFY command. For example, if a user has
two methods of monitoring an entity (events and polling) they won't correlate
today.
Scenario: Notify for events today, and poll every half hour as a backup and to
fire procedures (this is in use now). A "DOWN" event comes in at 12:00 and the
entity turns RED. The polling rule comes in at 12:05 and also goes critical.
The circuit comes back "UP" at 12:10 and the event clears the down event. But,
the entity stays RED until 12:35 rolls around and the polling rule clears. If
both the rule and the events were correlated together the entity would have
turned 'clear' at 12:10. Would this be an even BIGGER win??
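The scenario can be replayed in a few lines of Python. This is only a sketch of the two behaviours being compared, not of how DECmcc is implemented. It assumes the entity is RED while any correlation group still holds a critical state, and that the latest report in a group replaces the earlier one. The function name and the "shared" key are made up for illustration.

```python
from datetime import time

# Timeline taken from the scenario: (when, source, severity)
TIMELINE = [
    (time(12, 0), "event", "critical"),   # "DOWN" event arrives
    (time(12, 5), "rule", "critical"),    # polling rule goes critical
    (time(12, 10), "event", "clear"),     # "UP" event arrives
    (time(12, 35), "rule", "clear"),      # polling rule clears
]


def first_clear_time(timeline, correlate_mixed):
    """Return the time the entity first turns clear after being critical."""
    latest = {}            # correlation group -> latest severity
    went_critical = False
    for when, source, severity in timeline:
        # Mixed correlation puts events and rules in one group;
        # otherwise each source keeps its own outstanding state.
        key = "shared" if correlate_mixed else source
        latest[key] = severity
        if "critical" in latest.values():
            went_critical = True
        elif went_critical:
            return when
    return None


print(first_clear_time(TIMELINE, correlate_mixed=False))  # 12:35:00
print(first_clear_time(TIMELINE, correlate_mixed=True))   # 12:10:00
```

The difference between the two results is the 25 minutes the entity stays RED today after the circuit is already back up.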
So what you have begun today would be "BETTER" than what you were going to
provide as a solution later. Just extend this method into RULES and a MIX of
rules/events and you will have more flexibility than you were going to
provide with your future plans!!
All this and you could probably pull it off within V1.2 ;-).
My only warning is that the DOCUMENTATION must be PRECISE and THOROUGH on
how to use these services.
kind regards,
brad...
|