T.R | Title | User | Personal Name | Date | Lines |
---|
839.1 | | BEAGLE::WLODEK | Network pathologist. | Mon Mar 25 1991 15:18 | 11 |
|
Karl,
I can't comment on the MCC command, but it is better to use the Reachability
Change event to change the state of the node rather than the adjacency
event. An adjacency event means only that the routing initialization completed,
not that one can actually communicate with the remote system.
In many cases one can lose an adjacency but the remote node will still be
reachable because there are alternative paths.
wlodek
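A minimal sketch of the point above, assuming a made-up three-node topology (the names R1, R2 and N are hypothetical, not taken from any real network): a router can lose its adjacency to a node while the node stays reachable over another path.

```python
from collections import deque

def reachable(links, source, target):
    """Breadth-first search over an undirected set of links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical topology: routers R1 and R2 each have a circuit to node N.
links = {("R1", "N"), ("R2", "N"), ("R1", "R2")}
after_adjacency_down = links - {("R1", "N")}  # R1 loses its adjacency to N
print(reachable(after_adjacency_down, "R1", "N"))  # True: N is still reachable via R2
```

An "adjacency down" from R1 is therefore not proof that N is down; only the loss of every path makes N unreachable.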
|
839.2 | events generated on children apply to their parents | TOOK::DITMARS | Pete | Mon Mar 25 1991 16:13 | 14 |
| Hi,
This is the way things are supposed to work from an architecture point of
view, i.e. the alarm rule was defined on a particular child entity of a global
entity, so when the rule fires the iconic map color change is associated with
that child entity of the global entity.
It just so happens that the child entity in question is also a global entity,
and that to most humans it makes much more sense for this particular alarm
firing event to be "posted" against the associated global entity, and not
the child entity.
We are working on a way to "re-target" events/alarms from one entity to another,
mostly to solve this particular issue.
|
839.3 | if there's a will there's a way | JETSAM::WOODCOCK | | Mon Mar 25 1991 17:41 | 8 |
| It seems there is a real need for highlighting the entity of concern. In
the next reply you will find a "hack" of the midnight variety for those
interested. It may suit the needs of some until the "real" solution is
available. The logic is simple and I put as much description as I could
to clarify. It's not pretty but it should work for the interim. You could
also adapt it for Node Reachability events easily enough I suspect.
brad...
|
839.4 | target_entity.com | JETSAM::WOODCOCK | | Mon Mar 25 1991 17:48 | 110 |
| $! TARGET_ENTITY.COM
$! written by Brad Woodcock
$! Last revision 3/25/91
$!
$! This procedure is used with MCC alarms to highlight the proper NODE4
$! entity when an ADJACENCY DOWN (4.18) event is received. The following
$! rule characteristics describe the specific alarm which uses this procedure:
$!
$! create mcc 0 alarms rule zko_nodes -
$! Expression = (occurs(node4 bbzk01 circuit ethernet adjacent node * -
$! adjacency down)),-
$! Procedure = [ALARMS.COM]TARGET_ENTITY.COM;1, Perceived Severity = Clear,-
$! in domain .zko-2
$!
$! This alarm signals an icon color change to the entity (bbzk01 in this case)
$! where the event originated. Because this is not the desired effect the
$! user should customize the ALARM SEVERITY "clear" to be identical to that
$! of the ENTITY ICON color.
$!
$! TARGET_ENTITY.COM then parses for the actual node number which is down from
$! the alarm data. A procedure is then built and run which creates a polling
$! alarm to the entity which is down and uses exception handling to highlight
$! the proper entity.
$!
$! This procedure was specifically built for ESC uses and must be customized
$! for others to use. The following potentially need modifying:
$!
$! - Only those nodes which are located in ESC domains are registered
$! in the namespace. To eliminate some overhead a search is done on
$! a file to ensure the node which is down is of interest before
$! proceeding. This file was created with the following command:
$!
$! MCC>DIR NODE4 *, TO FILE=[ALARMS.COM]DNS_NODES.DAT
$!
$! - When this procedure was written domain information was not passed
$! with alarms. Therefore there are a series of IF statements
$! which are specific to the ESC and need to be modified for others
$! to use.
$!
$! - Directory and file references need to be modified throughout the
$! procedure including the CLEANUP section.
$!
$! START...
$!
$! Parse the node number and create a node symbol without the "." to be
$! used for temp file names.
$!
$ data := 'p6'
$ node1 = f$element(6," ",data)
$ area = f$element(0,".",node1)
$ num = f$element(1,".",node1)
$ node = area + num
$ NODE:
$!
$!
$! Search the dns file to ensure this is a node of interest.
$!
$ search [alarms.com]dns_nodes.dat " ''node1'"/output='node'.tmp
$ open input 'node'.tmp
$ read/end_of_file=endit input line
$ goto close
$ endit:
$ line :="0"
$ close:
$ close input
$ sho sym line
$ if line .eqs. "0" then goto cleanup
$ search_node :="''f$extract(42,7,line)'"
$ if search_node .nes. node1 then goto cleanup
$!
$!
$! Determine the domain using the area number of the node.
$!
$ if area .eq. "24" then domain :=".pko-24"
$ if area .eq. "2" then domain :=".zko-2"
$ if area .eq. "3" then domain :=".mko-3"
$ if area .eq. "55" then domain :=".lkg-55"
$ if area .eq. "7" then domain :=".mro-7"
$ if area .eq. "8" then domain :=".cxo-8"
$ if area .eq. "28" then domain :=".alf-28"
$ if area .eq. "5" then domain :=".mlo-5"
$ if area .eq. "20" then domain :=".ako-20"
$ if area .eq. "37" then domain :=".hlo-37"
$!
$!
$! Create and run a procedure which polls the node down and fires an alarm.
$!
$ open/write demon 'node'_poll.com
$ write demon "manage/enterprise"
$ write demon "create mcc 0 alarms rule ''node'_poll -"
$ write demon "expression=(node4 ''node1' circuit ethernet state=on),-"
$ write demon "exception handler=[alarms.com]node_mail.com,-"
$ write demon "perceived severity=critical,-"
$ write demon "in domain ''domain'"
$ write demon "enable mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "show node4 bbpk99 circuit ethernet state"
$ write demon "show mcc 0 alarms rule ''node'_poll all status,in domain ''domain'"
$ write demon "show mcc 0 alarms rule ''node'_poll all counters,in domain ''domain'"
$ write demon "disable mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "delete mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "exit"
$ close demon
$ @'node'_poll.com
$!
$!
$ CLEANUP:
$ delete [decmcc]'node'.tmp;*
$ delete 'node'_poll.com;*
$ purge/keep=3 [decmcc]target_entity.log
$ exit
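The parsing and rule-building steps of the procedure above can be sketched in Python for readers who do not know DCL. This is an illustration only, not a replacement for the procedure. The sample alarm string is an assumption (the real contents of P6 are not shown in this note); the area-to-domain table and the generated command lines are copied from the procedure itself.

```python
# Area-to-domain table copied from the IF statements in TARGET_ENTITY.COM.
AREA_DOMAINS = {
    "24": ".pko-24", "2": ".zko-2", "3": ".mko-3", "55": ".lkg-55",
    "7": ".mro-7", "8": ".cxo-8", "28": ".alf-28", "5": ".mlo-5",
    "20": ".ako-20", "37": ".hlo-37",
}

def parse_alarm_data(data):
    """Mimic f$element(6," ",data) followed by the split on "."."""
    fields = data.split()
    if len(fields) <= 6 or "." not in fields[6]:
        return None
    address = fields[6]
    area, num = address.split(".", 1)
    return {
        "address": address,                # e.g. "2.345" (node1 in the DCL)
        "area": area,
        "tag": area + num,                 # address without the ".", used in file names
        "domain": AREA_DOMAINS.get(area),  # None when the area is not in the table
    }

def poll_rule_commands(info):
    """Build the create/enable lines the procedure writes to <tag>_poll.com."""
    rule = f"{info['tag']}_poll"
    domain = info["domain"]
    return [
        "manage/enterprise",
        f"create mcc 0 alarms rule {rule} -",
        f"expression=(node4 {info['address']} circuit ethernet state=on),-",
        "exception handler=[alarms.com]node_mail.com,-",
        "perceived severity=critical,-",
        f"in domain {domain}",
        f"enable mcc 0 alarms rule {rule},in domain {domain}",
    ]

# Hypothetical alarm data; the seventh space-separated field is the node address.
sample = "NODE4 BBZK01 CIRCUIT ETHERNET ADJACENT NODE 2.345"
info = parse_alarm_data(sample)
for line in poll_rule_commands(info):
    print(line)
```

One difference worth noting: an area missing from the table yields None here, so a caller can skip the node, whereas the DCL procedure has no fallback for an area that matches none of its IF statements.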
|
839.5 | reachability versus adjacency? | TOOK::CAREY | | Tue Mar 26 1991 11:13 | 49 |
|
Thanks for the workaround, Brad. It looks like good stuff.
Target Entity support will be a valuable contribution to DECmcc,
effectively allowing the network management staff to focus their
attention where experience has shown them the problems lie. Without
that experience, DECmcc is forced to report what it KNOWs, which isn't
really very much.
DECmcc knows that an event report was sent in by a routing node
that noticed a previously adjacent node was no longer reporting in.
That's it. So, we react by bringing the routing node (which reported
the event) to the attention of the network manager so that they can
investigate.
Redirecting to a target entity opens things up for us to start to make
the inferences that you are naturally making, and to allow you to make
those inferences explicit yourself, saving one step in the cognitive
process of understanding what this change at a router means to the
network.
Now that the information is available, it is time to start
building in the mechanisms to help us and you make these inferences
explicit.
* * * * * *
I notice a reply about "reachability changes" being perhaps preferable
to "adjacency down".
My [limited] experience is that in our network, "reachability changes"
might take five minutes or more to be ratified by all of the routers in
a given area. By collecting "adjacency down" information from each of
the routers, I was under the impression that I was getting similar
information in a much more timely manner.
It is true that I shouldn't necessarily infer that an "adjacency down"
means that a node is no longer reachable. On the other hand, it seems
to be a pretty good approximation (at least in our network), and gives
me the information more quickly.
Any thoughts or comments about watching "reachability" versus
"adjacency down"?
-Jim Carey
|
839.6 | | BEAGLE::WLODEK | Network pathologist. | Tue Mar 26 1991 11:33 | 23 |
|
"Reachability down " events will certainly have a longer time to
settle; how long depends on the size of the network and the line speeds.
If the settling time is too long, something has to be done about it.
But really, Reachability events are the only ones that tell whether the remote node
is reachable or not.
Adjacency events alone are not good enough. But if you know that a
certain node is an end node with just one circuit (not hot standby or
phase V multi-circuit), then Adjacency down is enough.
Back to the previous discussion. A router will quite often report a
*network* problem or a problem with a *different* node.
So the router is often just a messenger; there is no problem with the router itself.
MCC needs to be able to decode the source of the event (not
the sender) and deliver the event to the right entity. Things will get very
complicated otherwise.
wlodek
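The rule of thumb above can be written down as a small decision function. This is only a sketch: the event names are placeholder strings, and the caller is assumed to know already whether the node is an end node and how many circuits it has.

```python
def node_presumed_down(event, is_end_node, circuit_count):
    """Return True when the event alone justifies marking the node as down."""
    if event == "node unreachable":
        # A reachability change already accounts for alternative paths.
        return True
    if event == "adjacency down":
        # Only conclusive when the node has no other way onto the network.
        return is_end_node and circuit_count == 1
    return False

print(node_presumed_down("adjacency down", True, 1))   # True
print(node_presumed_down("adjacency down", False, 2))  # False
```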
|
839.7 | it works - but eats resources | MFRNW1::SCHUSTER | Karl Schuster @MFR Network Services | Tue Mar 26 1991 12:23 | 9 |
| Thanks for the recommendations.
I tried out .4, and it works. But it's a lot of work to set it up in a
large environment, and temporarily creating, enabling, disabling, and deleting
rules is veeeery slow and consumes a lot of CPU and I/O.
Will there be significant changes concerning event notification
in V1.2?
Karl
|
839.8 | depends on your definition of "significant" | TOOK::CALLANDER | | Tue Mar 26 1991 16:40 | 14 |
| We have been working hard trying to listen to all of the requirements,
and the ability to retarget (ASSIGN) a new target on a rule or event
has been one of the most highly requested items. It is on the list, but we are
still going through those lists trying to determine what is feasible with
the current resource limitations.
One of the new things coming will be a notification PM interface that will give you
easier access to the occurrence information stored in chronological order, as
well as historical recording of the occurrence information.
BTW -- we use the term occurrence when discussing both alarming information
and events.
jill
|
839.9 | | BEAGLE::WLODEK | Network pathologist. | Wed Mar 27 1991 02:55 | 7 |
|
Retargeting is really a priority 1 if we want to have reasonable
alarming. Otherwise we will end up with lots of side applications
to feed in configurations and manipulate MCC into doing the right thing.
It would be a pity.
wlodek
|
839.10 | suggestion | JETSAM::WOODCOCK | | Wed Mar 27 1991 09:07 | 48 |
| > I tried out .4, and it works. But its a lot of work to set it up in a
> large environment, and temporarily creating,enabling,disabling,deleting
> rules is veeeery slow and consumes a lot of CPU and IO.
This is true and the reason why a formalized MCC approach is needed.
Fortunately this overhead is not an issue for the ESC. All the nodes we
are monitoring are on segments isolated from user LANs. These isolated
segments only have routers and their LHs on them, and therefore there is
an event only when these routing-related nodes go off net. These
"occurrences" are few and far between. Also, because of this method we
are able to track the entire US backbone with only 10 alarms. So for us
the setup was trivial.
I would caution all those pursuing node monitoring using either 4.18
or 4.14 that this could be a serious load on your system if the 4.18
events are coming from *user* segments where nodes are up and down quite
often. Also, using 4.14 (node reachability) events will produce this load
even if you're not on the user segment.
This is one reason I leaned toward the adj node event to do our monitoring.
Another reason was that MCC provides more info in the data for adj down
compared to circuit down occurs. Also, I had done some testing when all this
functionality first came about and found that node reachability events took
about five minutes of lag time to signal the actual down. This testing was
limited to a LAN environment. I wasn't happy with 5 minutes, and the
prospect of further delays across a WAN was unacceptable. To each his own,
but it is safe to say I won't tune our network so that the reporting of
down nodes happens quicker. I view this as a DECnet shortcoming and not
an MCC one.
Today we use adj node occurrences for a couple of purposes. The first is on
point to point circuits which signals the circuit is down. The other is on
Ethernet circuits which signals a node is down.
Since you haven't formalized your method of target entity, may I suggest the
following. The descriptions I have seen thus far indicated that each alarm
would have to indicate a target entity. This will be too much of a burden
to implement in the field because an alarm would have to be written for
each node of interest. The plus side of target_entity.com is that it can
wildcard, and therefore only one rule is needed for each extended LAN, or,
if it were modified for node reachability rules, only one rule per DECnet
area. Your best bet may be to generically parse either 4.14 or 4.18
occurrences for the actual node down, then check for this entity in maps
and apply the notify. Wildcard-type functionality is a must to be successful
in my view.
good day,
brad...
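The "parse the occurrence, then look the node up in the maps" suggestion could look roughly like the sketch below. The event text format and the map table are assumptions made for illustration; nothing here reflects how MCC actually stores occurrences or maps.

```python
import re

# Hypothetical event text; only the adjacent node address is extracted.
EVENT_PATTERN = re.compile(
    r"adjacent node (?P<address>\d+\.\d+) adjacency down", re.IGNORECASE
)

def retarget(event_text, map_entities):
    """Return the map entity to highlight, or None if it is not on any map."""
    match = EVENT_PATTERN.search(event_text)
    if match is None:
        return None
    return map_entities.get(match.group("address"))

# Hypothetical map contents: DECnet address -> entity shown on the iconic map.
map_entities = {"2.345": "NODE4 BBZK07"}
event = "node4 bbzk01 circuit ethernet adjacent node 2.345 adjacency down"
print(retarget(event, map_entities))  # NODE4 BBZK07
```

A single wildcarded rule per extended LAN can feed every event through a function like this, which is what keeps the rule count small.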
|
839.11 | I understand... | TOOK::CALLANDER | | Wed Mar 27 1991 15:47 | 28 |
| and then....the infamous BUT!
I do understand what you are getting at about learning something about the
events and handling them in specific ways. Unfortunately, in a generic environment
we tend towards getting the generic case working first, so that anyone can
write a module and have it work; then we sit down and look at what special-case handling
would be useful and start adding that in.
Right now the plan is to have a command (and the "verbs" used here are place
holders until I get around to seeing what is out there) that will allow the user
to ASSIGN a target entity and severity for any event on any entity.
It would be something like:
ASSIGN DOMAIN X ENTITY=MANAGED_ENTITY, EVENT=EVENT_NAME,
TARGET ENTITY=ENTITY_TO_TURN_COLOR, TARGET SEVERITY=SEVERITY_CODE
The user can change these at any time, save them, delete them, or not
use them at all.
I have been trying to come up with a scheme that would allow the user to
make use of wildcards in this type of command without having to teach the
Notification FM too much. One alternative I have been thinking about is
to have special-case knowledge about phase 4 events (adjacency down and
state change) and just handle them specially; but we will have to see what time
allows.
TOO MUCH WORK....TOO LITTLE TIME...
|
839.12 | confused | JETSAM::WOODCOCK | | Wed Mar 27 1991 16:34 | 14 |
| I'm a bit confused about the workings of this statement.
>> ASSIGN DOMAIN X ENTITY=MANAGED_ENTITY, EVENT=EVENT_NAME,
>> TARGET ENTITY=ENTITY_TO_TURN_COLOR, TARGET SEVERITY=SEVERITY_CODE
The same event will be received from the MANAGED_ENTITY, and it will
describe many different outages (at least if it is a 4.14 or 4.18 event).
Where in this statement is a more descriptive breakdown of the event so
MCC can decipher it and make the assignment to the proper TARGET ENTITY?
And will this method replace alarms so MCC is working directly from events?
Ex. node4 ...... cir ethernet adj node ...... adjacency down
                                       ^^^^^^
                                       This is a variable
|
839.13 | try again | TOOK::CALLANDER | | Wed Mar 27 1991 17:14 | 27 |
| With the direction being taken right now (and I am looking into more versatile
approaches), you would do something like:
ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE BOEHM,
EVENT=NODE REACHABILITY CHANGE,
TARGET ENTITY= NODE4 BOEHM,
TARGET SEVERITY = CRITICAL
Does this make it clearer? You would have to explicitly enter the info.
I have, since entering the last note, talked to a few more people about
trying to do something like
ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE *,
EVENT=NODE REACHABILITY CHANGE,
TARGET ENTITY ,
TARGET SEVERITY = CRITICAL
Where TARGET ENTITY would have an implementation-specific default value that
says use the parent class (node4) with the child instance (boehm) as the
target entity. This would then allow you to set up an easy assignment
for the reachability change, and maybe for bridges too when they get
around to events; forwarding database physical entries work similarly....
Just another idea; please feel free to add in your own. But give them to
me using something like the FCL syntax so I can understand what you are
implying from an implementation standpoint.
jill
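The proposed default (parent class plus child instance) can be illustrated with a short sketch. It assumes, purely for illustration, that the class is the first word of the entity specification and the child instance is the last; the real Notification FM may parse entity specs quite differently.

```python
from fnmatch import fnmatchcase

def default_target(entity):
    """Derive 'NODE4 BOEHM' from 'NODE4 GOSTE REMOTE NODE BOEHM'."""
    parts = entity.split()
    if len(parts) < 3:
        raise ValueError("expected a global entity with a child entity")
    # Assumption: first word is the parent class, last word is the child instance.
    return f"{parts[0]} {parts[-1]}"

def assign_target(entity, pattern):
    """Apply a wildcarded ASSIGN-style pattern; None means the rule does not apply."""
    if not fnmatchcase(entity.upper(), pattern.upper()):
        return None
    return default_target(entity)

print(assign_target("NODE4 GOSTE REMOTE NODE BOEHM", "NODE4 GOSTE REMOTE NODE *"))
# NODE4 BOEHM
```

With a default like this, one wildcarded assignment covers every remote node seen by GOSTE instead of one assignment per node.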
|
839.14 | looks good | JETSAM::WOODCOCK | | Thu Mar 28 1991 09:09 | 25 |
|
> ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE *,
> EVENT=NODE REACHABILITY CHANGE,
> TARGET ENTITY ,
> TARGET SEVERITY = CRITICAL
> Where TARGET ENTITY would have an implementation specific default value that
> says use the parent class (node4) with the child instance (boehm) as the
> target entity. This would then allow you to set up an easy assignement
> for both the reachability change and maybe with bridges when they get
> around to events, forwarding database physical entries work similar....
I figured there was an expansion of the entity spec from your last note
but just couldn't tell. I like this approach very much. The cases where
you're looking for a different target entity will probably always have some
sort of occurrence as the child entity from the event. Also, if wires are
defined as child entities then this may also work for circuits. Ex.
ASSIGN ................ ENTITY=NODE4 GOSTE CIRCUIT SYN-*, (or CIRCUIT *)
EVENT=CIRCUIT DOWN,
TARGET ENTITY,
TARGET SEVERITY=CRITICAL
good idea,
brad...
|
839.15 | It gets even better... | WAKEME::ANIL | | Thu Mar 28 1991 18:13 | 26 |
| RE: .12
>>> And will this method replace alarms so MCC is working directly from events?
No need to worry about your investment in Alarms! In the next version, not
only can you specify a target entity on Rule creation, but you may also
be able to specify two additional parameters on the OCCURS function.
Nothing is committed yet, but the following is the idea:
Expression = (OCCURS ( NODE4 * Event foo , n, hh:mm:ss))
Target Entity = Domain foo
Where n is the number of times the event is detected within the hh:mm:ss time frame.
This expression should be particularly handy in trying to monitor
adjacency up/down events.
In the above expression the added advantage is that the NODE4 x
need not be in the Domain in which the Rule is being enabled!
- Anil Navkal
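The "n times within hh:mm:ss" idea amounts to a sliding-window count. The sketch below shows one way such a test could behave; the class name, the use of seconds for timestamps, and the choice to keep firing while the window stays full are all assumptions, since the note says nothing is committed yet.

```python
from collections import deque

class OccursWithin:
    """Fire when `count` events are seen within `window` seconds."""

    def __init__(self, count, window):
        self.count = count
        self.window = window
        self.times = deque()

    def observe(self, timestamp):
        """Record one event (seconds, non-decreasing); return True if the rule fires."""
        self.times.append(timestamp)
        # Drop events that have fallen out of the time window.
        while self.times and timestamp - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.count

# Three adjacency events within five minutes (300 seconds).
rule = OccursWithin(count=3, window=300)
print([rule.observe(t) for t in (0, 100, 250)])  # [False, False, True]
```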
|
839.16 | Clearing alarms problem ? | BELFST::ROONEY | Hugh Rooney | Wed Nov 27 1991 12:42 | 17 |
| Hi,
I have a customer using TARGET_ENTITY.COM. They have one small problem
which occurs when alarms have fired on several entities in a domain
and one of them was fired by target_entity.com. When they clear the
alarm which was triggered by Target_entity.com the higher level domains
also clear, even though there are still other lower level alarms which
have not been cleared.
Can anyone suggest why this might happen, or a workaround ?
Many thanks in advance for any ideas.
Regards
Hugh Rooney
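For reference, the behaviour the customer expects can be stated as a tiny model: a domain shows the worst severity among the alarms still outstanding beneath it, so clearing one alarm should not clear the domain while others remain. This is a sketch of the expected result only, with an assumed severity ordering; it says nothing about how DECmcc implements propagation.

```python
# Assumed ordering, lowest to highest.
SEVERITY_ORDER = ["clear", "warning", "minor", "major", "critical"]

def domain_severity(alarms):
    """alarms maps entity name -> severity; return the worst outstanding one."""
    if not alarms:
        return "clear"
    return max(alarms.values(), key=SEVERITY_ORDER.index)

def clear_alarm(alarms, entity):
    """Clear one entity's alarm and recompute the domain severity."""
    remaining = {name: sev for name, sev in alarms.items() if name != entity}
    return remaining, domain_severity(remaining)

# Hypothetical entities: one alarm fired via target_entity.com, one fired directly.
alarms = {"node4 a": "critical", "node4 b": "major"}
remaining, severity = clear_alarm(alarms, "node4 a")
print(severity)  # major: the domain should not go clear yet
```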
|
839.17 | you're right, but I'm no help | JETSAM::WOODCOCK | | Wed Nov 27 1991 14:53 | 14 |
| Hi Hugh,
You're definitely right. I just recreated your scenario. Although I find it
hard to believe the target procedure is the culprit, I can't recreate it
without using target_entity. As a test I created different alarms at
different severities and always cleared the highest severity to no avail
in reproducing the problem. It seems to me I remember reading of similar
problems in notes a while back. Maybe someone from the ALARMS team can
shed some light. Could it be because it was created/activated in batch???
Just a guess, I don't have a clue on this one.
best regards,
brad...
|
839.18 | problem has been seen before | TOOK::CALLANDER | MCC = My Constant Companion | Wed Jan 08 1992 11:27 | 9 |
| Well I am from the notif team, not alarms, but will I do? ;->
As to why it happens, who knows. But we do know that it does
happen, and there are a few ways to reproduce it. As to whether
it will be fixed: it will be. We have extended the notification functions in
1.2. Some of them (including propagation of color changes) are not all
100% there yet, but you should see enhanced functionality with the 1.2
EFT kit.
|