
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

839.0. "event notification on iconic map" by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Mon Mar 25 1991 11:49

    I ask this question because the documentation is not very clear to me:
    
    I would like to use the event 4.18 (adjacency down) to change the
    colour of systems which are no longer reachable.
    
    I set up eventlogging for MCC (MCC_STARTUP_DNA4_EVL.COM), and I get
    4.18 events delivered on my mcc-node.
    My alarm rule expression is the following:
        expression=(occurs(node4 <mcc-node> circuit sva-0 adjacent node *
        adjacency down))
    What happens on the map is the following: the system which reports
    the event (<mcc-node>) turns red, but not the adjacent node which is
    no longer reachable.
    
    Is this the intended behaviour of MCC event notification?
    
    Regards, Karl
839.1. by BEAGLE::WLODEK (Network pathologist.) Mon Mar 25 1991 15:18 (11 lines)
    Karl,

    I can't comment on the MCC command, but it is better to use the Reachability
    Change event rather than the adjacency event to change the state of the node.
    An adjacency event only means that routing initialization completed,
    not that one can actually communicate with the remote system. 
    In many cases an adjacency can be lost while the remote node is still
    reachable, because there are alternative paths.

    				wlodek
839.2. "events generated on children apply to their parents" by TOOK::DITMARS (Pete) Mon Mar 25 1991 16:13 (14 lines)
Hi,

This is the way things are supposed to work from an architecture point of
view: the alarm rule was defined on a particular child entity of a global
entity, so when the rule fires, the iconic map color change is associated with
that child entity of the global entity.

It just so happens that the child entity in question is also a global entity, 
and that to most humans it makes much more sense for this particular alarm 
firing event to be "posted" against the associated global entity, and not
the child entity.

We are working on a way to "re-target" events/alarms from one entity to another,
mostly to solve this particular issue.
839.3. "if there's a will there's a way" by JETSAM::WOODCOCK Mon Mar 25 1991 17:41 (8 lines)
It seems there is a real need for highlighting the entity of concern. In
the next reply you will find a "hack" of the midnight variety for those
interested. It may suit the needs of some until the "real" solution is
available. The logic is simple and I put as much description as I could
to clarify. It's not pretty but it should work for the interim. You could
also adapt it for Node Reachability events easily enough I suspect.

brad... 
839.4. "target_entity.com" by JETSAM::WOODCOCK Mon Mar 25 1991 17:48 (110 lines)
$! TARGET_ENTITY.COM
$! written by Brad Woodcock
$! Last revision 3/25/91
$!
$! This procedure is used with MCC alarms to highlight the proper NODE4
$! entity when an ADJACENCY DOWN (4.18) event is received. The following
$! rule characteristics describe the specific alarm which uses this procedure:
$!
$!    create mcc 0 alarms rule zko_nodes -
$!    Expression = (occurs(node4 bbzk01 circuit ethernet adjacent node * -
$!    adjacency down)),-
$!    Procedure = [ALARMS.COM]TARGET_ENTITY.COM;1, Perceived Severity = Clear,-
$!    in domain .zko-2
$!
$! This alarm signals an icon color change to the entity (bbzk01 in this case)
$! where the event originated. Because this is not the desired effect the
$! user should customize the ALARM SEVERITY "clear" to be identical to that
$! of the ENTITY ICON color.
$!
$! TARGET_ENTITY.COM then parses for the actual node number which is down from
$! the alarm data. A procedure is then built and run which creates a polling 
$! alarm to the entity which is down and uses exception handling to highlight 
$! the proper entity.
$!
$! This procedure was specifically built for ESC use and must be customized
$! for others to use. The following potentially need modifying:
$!
$!	- Only those nodes which are located in ESC domains are registered
$!	  in the namespace. To eliminate some overhead a search is done on
$!	  a file to ensure the node which is down is of interest before 
$!	  proceeding. This file was created with the following command:
$!
$!		MCC>DIR NODE4 *, TO FILE=[ALARMS.COM]DNS_NODES.DAT
$!
$!	- When this procedure was written domain information was not passed
$!	  with alarms. Therefore there are a series of IF statements
$!	  which are specific to the ESC and need to be modified for others
$!	  to use.
$!
$!	- Directory and file references need to be modified throughout the
$!	  procedure including the CLEANUP section.
$!
$! START...
$!
$! Parse the node number and create a node symbol without the "." to be
$! used for temp file names.
$!
$ data :='p6
$ node1 = f$element(6," ",data)
$ area = f$element(0,".",node1)
$ num = f$element(1,".",node1)
$ node = area + num
$ NODE:
$!
$!
$! Search the dns file to ensure this is a node of interest.
$!
$ search [alarms.com]dns_nodes.dat " ''node1'"/output='node'.tmp
$ open input 'node'.tmp
$ read/end_of_file=endit input line
$ goto close
$ endit:
$ line :="0"
$ close:
$ close input
$ sho sym line
$ if line .eqs. "0" then goto cleanup
$ search_node :="''f$extract(42,7,line)'"
$ if search_node .nes. node1 then goto cleanup
$!
$!
$! Determine the domain using the area number of the node.
$!
$ if area .eq. "24" then domain :=".pko-24"
$ if area .eq. "2" then domain :=".zko-2"
$ if area .eq. "3" then domain :=".mko-3"
$ if area .eq. "55" then domain :=".lkg-55"
$ if area .eq. "7" then domain :=".mro-7"
$ if area .eq. "8" then domain :=".cxo-8"
$ if area .eq. "28" then domain :=".alf-28"
$ if area .eq. "5" then domain :=".mlo-5"
$ if area .eq. "20" then domain :=".ako-20"
$ if area .eq. "37" then domain :=".hlo-37"
$!
$!
$! Create and run a procedure which polls the node down and fires an alarm.
$!
$ open/write demon 'node'_poll.com
$ write demon "manage/enterprise"
$ write demon "create mcc 0 alarms rule ''node'_poll -"
$ write demon "expression=(node4 ''node1' circuit ethernet state=on),-"
$ write demon "exception handler=[alarms.com]node_mail.com,-"
$ write demon "perceived severity=critical,-"
$ write demon "in domain ''domain'"
$ write demon "enable mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "show node4 ''node1' circuit ethernet state"
$ write demon "show mcc 0 alarms rule ''node'_poll all status,in domain ''domain'"
$ write demon "show mcc 0 alarms rule ''node'_poll all counters,in domain ''domain'"
$ write demon "disable mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "delete mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "exit"
$ close demon
$ @'node'_poll.com
$!
$!
$ CLEANUP:
$ delete [decmcc]'node'.tmp;*
$ delete 'node'_poll.com;*
$ purge/keep=3 [decmcc]target_entity.log
$ exit
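For readers who want to follow the parsing and rule-building steps of TARGET_ENTITY.COM outside DCL, here is a minimal Python sketch of them. It is not Brad's procedure. It assumes, as the DCL does, that the down node's address is the seventh blank-separated word of the P6 alarm data. It reuses the ESC area-to-domain table and leaves out the SHOW commands and the DNS_NODES.DAT check. The function names `parse_down_node` and `poll_rule_commands` are hypothetical.

```python
from __future__ import annotations

# Area number -> MCC domain, copied from the ESC-specific IF statements in
# TARGET_ENTITY.COM; other sites must substitute their own table.
AREA_TO_DOMAIN = {
    24: ".pko-24", 2: ".zko-2", 3: ".mko-3", 55: ".lkg-55", 7: ".mro-7",
    8: ".cxo-8", 28: ".alf-28", 5: ".mlo-5", 20: ".ako-20", 37: ".hlo-37",
}


def parse_down_node(alarm_data: str) -> tuple[str, str, str]:
    """Return (address, area, file_tag) for the node named in the alarm data.

    Hypothetical helper. Like the DCL, it assumes the "area.number" address
    is the seventh blank-separated word (index 6) of the P6 alarm data.
    """
    words = alarm_data.split()          # collapse runs of blanks, like DCL ':='
    address = words[6]                  # f$element(6," ",data)
    area, num = address.split(".", 1)   # f$element(0/1,".",node1)
    return address, area, area + num    # node = area + num (no ".")


def poll_rule_commands(address: str, area: str, tag: str) -> list[str] | None:
    """Build the FCL lines that TARGET_ENTITY.COM writes to <tag>_POLL.COM.

    Hypothetical helper; the SHOW commands of the original are omitted.
    """
    domain = AREA_TO_DOMAIN.get(int(area))   # numeric compare, like DCL .EQ.
    if domain is None:
        return None                          # area not in the table: skip node
    rule = f"{tag}_poll"
    return [
        "manage/enterprise",
        f"create mcc 0 alarms rule {rule} -",
        f"expression=(node4 {address} circuit ethernet state=on),-",
        "exception handler=[alarms.com]node_mail.com,-",
        "perceived severity=critical,-",
        f"in domain {domain}",
        f"enable mcc 0 alarms rule {rule},in domain {domain}",
        f"disable mcc 0 alarms rule {rule},in domain {domain}",
        f"delete mcc 0 alarms rule {rule},in domain {domain}",
        "exit",
    ]
```

Returning `None` for an unknown area is a choice made for this sketch. In the DCL, the same case would leave DOMAIN undefined when the generated procedure runs.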
839.5. "reachability versus adjacency?" by TOOK::CAREY Tue Mar 26 1991 11:13 (49 lines)
    
    
    Thanks for the workaround, Brad.  It looks like good stuff.
    
    Target Entity support will be a valuable contribution to DECmcc,
    effectively allowing the network management staff to focus their
    attention where experience has shown them the problems lie.  Without
    that experience, DECmcc is forced to report what it KNOWS, which isn't
    really very much.
    
    DECmcc knows that an event report was sent in by a routing node
    that noticed a previously adjacent node was no longer reporting in.
    
    That's it.  So, we react by bringing the routing node (which reported 
    the event) to the attention of the network manager so that they can
    investigate.
    
    Redirecting to a target entity opens things up for us to start to make
    the inferences that you are naturally making, and to allow you to make
    those inferences explicit yourself, saving one step in the cognitive
    process of understanding what this change at a router means to the
    network. 
    
    Now that the information is available, it is time to start
    building in the mechanisms to help us and you make these inferences
    explicit.
    
    		*	*	*	*	*	*
    
    I notice a reply about "reachability changes" being perhaps preferable
    to "adjacency down".
    
    My [limited] experience is that in our network, "reachability changes"
    might take five minutes or more to be ratified by all of the routers in
    a given area.  By collecting "adjacency down" information from each of
    the routers, I was under the impression that I was getting similar
    information in a much more timely manner.
    
    It is true that I shouldn't necessarily infer that an "adjacency down"
    means that a node is no longer reachable.  On the other hand, it seems
    to be a pretty good approximation (at least in our network), and gives
    me the information more quickly.
    
    Any thoughts or comments about watching "reachability" versus
    "adjacency down"?
    
    -Jim Carey
    
    
839.6. by BEAGLE::WLODEK (Network pathologist.) Tue Mar 26 1991 11:33 (23 lines)
    "Reachability down" events will certainly take longer to settle;
    how long depends on the size of the network and the line speeds.
    If the settling time is too long, something has to be done about it.
    
    
    But really, reachability events are the only events that actually tell
    whether a remote node is reachable or not. 
    
    Adjacency events alone are not good enough. But if you know that a
    certain node is an end node with just one circuit (not hot standby or
    Phase V multi-circuit), then adjacency down is enough.
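The rule of thumb above can be written as a small decision function. This is a minimal Python sketch under stated assumptions: `NodeProfile` and its fields are a hypothetical model of what the operator knows about each node. The event codes are the Phase IV ones used in this thread: 4.18 for adjacency down and 4.14 for node reachability change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical model of what the operator knows about a monitored node."""
    end_node: bool              # True for an end node, False for a router
    circuits: int               # number of circuits the node uses
    hot_standby: bool = False   # hot standby or Phase V multi-circuit setup


def adjacency_down_is_enough(node: NodeProfile) -> bool:
    """True when losing the adjacency implies the node is unreachable.

    Only a single-circuit end node without hot standby has no alternative path.
    """
    return node.end_node and node.circuits == 1 and not node.hot_standby


def event_to_watch(node: NodeProfile) -> str:
    """Pick the Phase IV event to alarm on for this node."""
    if adjacency_down_is_enough(node):
        return "4.18"   # adjacency down: reported sooner
    return "4.14"       # node reachability change: slower to settle, but authoritative
```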


    Back to the previous discussion. A router will quite often report a
    *network* problem or a problem with a *different* node.
    So the router is often just a messenger; there is no problem with it.

    MCC needs to decode the source of the event (not the sender) and
    deliver the event to the right entity. Things will get very
    complicated otherwise.

    				wlodek
839.7. "it works - but eats resources" by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Tue Mar 26 1991 12:23 (9 lines)
    Thanks for the recommendations. 
    I tried out .4, and it works. But it's a lot of work to set it up in a 
    large environment, and temporarily creating, enabling, disabling, and deleting
    rules is veeeery slow and consumes a lot of CPU and IO.
    
    Will there be significant changes concerning event notification
    in V1.2?
    Karl
    
839.8. "depends on your definition of "significant"" by TOOK::CALLANDER Tue Mar 26 1991 16:40 (14 lines)
We have been working hard trying to listen to all of the requirements,
and the ability to retarget (ASSIGN) a new target on a rule or event
has been one of the most highly requested items. It is on the list, but we are
still going through those lists trying to determine what is feasible with
the current resource limitations.

One of the new things coming will be a notification PM interface that will give you
easier access to the occurrence information, stored in chronological order, as
well as historical recording of the occurrence information.

BTW -- we use the term occurrence when discussing both alarming information
and events.

jill
839.9. by BEAGLE::WLODEK (Network pathologist.) Wed Mar 27 1991 02:55 (7 lines)
    Retargeting is really priority 1 if we want reasonable
    alarming. Otherwise we will end up with lots of side applications
    to feed in configurations and manipulate MCC into doing the right thing.
    It would be a pity.

    			wlodek
839.10. "suggestion" by JETSAM::WOODCOCK Wed Mar 27 1991 09:07 (48 lines)
>    I tried out .4, and it works. But its a lot of work to set it up in a 
>    large environment, and temporarily creating,enabling,disabling,deleting
>    rules is veeeery slow and consumes a lot of CPU and IO.
    
This is true, and it is the reason why a formalized MCC approach is needed.
Fortunately this overhead is not an issue for the ESC. All the nodes we
are monitoring are on segments isolated from user LANs. These isolated
segments only have routers and their LHs on them, so there is an event only
when one of these routing-related nodes goes off net. These "occurrences"
are few and far between. Also, because of this method we
are able to track the entire US backbone with only 10 alarms. So for us
the setup was trivial.   

I would caution all those pursuing node monitoring using either 4.18
or 4.14 that this could be a serious load on your system if the 4.18
events are coming from *user* segments where nodes go up and down quite
often. Also, using 4.14 (node reachability) events will produce this load
even if you're not on the user segment.

This is one reason I leaned toward the adj node event to do our monitoring.
Another reason was that MCC provides more info in the data for adj down
compared to circuit down occurrences. Also, I had done some testing when all this
functionality first came about and found that node reachability events took
about five minutes of lag time to signal the actual down. This testing was
limited to a LAN environment. I wasn't happy with 5 minutes, and the
prospect of further delays across a WAN was unacceptable. To each his own,
but it is safe to say I won't tune our network just so that down nodes
are reported more quickly. I view this as a DECnet shortcoming and not an
MCC one.

Today we use adj node occurrences for a couple of purposes. The first is on
point-to-point circuits, where it signals that the circuit is down. The other
is on Ethernet circuits, where it signals that a node is down.

Since you haven't yet formalized your target entity method, may I suggest the
following. The descriptions I have seen thus far indicate that each alarm
would have to specify a target entity. This will be too much of a burden
to implement in the field, because an alarm would have to be written for
each node of interest. The plus side of target_entity.com is that it can
wildcard, and therefore only one rule is needed for each extended LAN, or,
if it were modified for node reachability, only one rule per DECnet
area. Your best bet may be to generically parse either 4.14 or 4.18
occurrences for the actual node that is down, then check for this entity in
the maps and apply the notification. Wildcard-type functionality is a must to
be successful, in my view.
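The suggestion above amounts to one wildcarded rule feeding a small dispatcher. Below is a minimal Python sketch of that idea, not an MCC interface. `Occurrence`, `notifications`, and the `nodes_on_map` set are hypothetical. The sketch assumes the affected node has already been parsed out of the event data, and that for 4.14 only the "unreachable" transitions are passed in. Dropping duplicate reports of the same node from several routers is a design choice added here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical parsed occurrence from a single wildcarded rule."""
    event: str      # "4.14" (node reachability change) or "4.18" (adjacency down)
    reporter: str   # routing node that sent the event
    affected: str   # node the event is about, parsed from the event data


def notifications(occurrences: list[Occurrence],
                  nodes_on_map: set[str]) -> list[tuple[str, str]]:
    """Turn occurrences into (target entity, severity) pairs for mapped nodes."""
    seen: set[str] = set()
    actions: list[tuple[str, str]] = []
    for occ in occurrences:
        if occ.event not in ("4.14", "4.18"):
            continue                    # not a node-down style event
        if occ.affected not in nodes_on_map:
            continue                    # node not on any map: ignore it
        if occ.affected in seen:
            continue                    # several routers may report the same node
        seen.add(occ.affected)
        actions.append((f"NODE4 {occ.affected}", "critical"))
    return actions
```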

good day,
brad...
839.11. "I understand..." by TOOK::CALLANDER Wed Mar 27 1991 15:47 (28 lines)
and then....the infamous BUT!

I do understand what you are getting at about learning something about the 
events and handling them in specific ways. Unfortunately, in a generic environment
we tend toward getting the generic case working first, so that anyone can
write a module and have it work; then we sit down and look at what special-case
handling would be useful and start adding that in.

Right now the plan is to have a command (the "verbs" used here are placeholders
until I get around to seeing what is out there) that will allow the user
to ASSIGN a target entity and severity for any event on any entity.

It would be something like:

ASSIGN DOMAIN X ENTITY=MANAGED_ENTITY, EVENT=EVENT_NAME,
     TARGET ENTITY=ENTITY_TO_TURN_COLOR, TARGET SEVERITY=SEVERITY_CODE

The user can change these at any time, save them, or delete them, or not
use them at all.
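The proposed ASSIGN command is essentially a lookup table keyed on domain, entity, and event. Here is a minimal Python model of those semantics, assuming exact (non-wildcard) entity names. `AssignmentTable` and its methods are hypothetical names, and upper-casing the keys so lookups ignore case is a choice made for this sketch. When no assignment exists, the color change stays on the reporting entity, which matches today's behaviour as described in this thread.

```python
from __future__ import annotations


class AssignmentTable:
    """Hypothetical in-memory model of the proposed ASSIGN command.

    Keys are (domain, entity, event); values are (target entity, severity).
    Everything is upper-cased so that lookups ignore case.
    """

    def __init__(self) -> None:
        self._rules: dict[tuple[str, str, str], tuple[str, str]] = {}

    @staticmethod
    def _key(domain: str, entity: str, event: str) -> tuple[str, str, str]:
        return (domain.upper(), entity.upper(), event.upper())

    def assign(self, domain: str, entity: str, event: str,
               target_entity: str, target_severity: str) -> None:
        """Add or change an assignment."""
        self._rules[self._key(domain, entity, event)] = (
            target_entity.upper(), target_severity.upper())

    def delete(self, domain: str, entity: str, event: str) -> None:
        """Remove an assignment if one exists."""
        self._rules.pop(self._key(domain, entity, event), None)

    def resolve(self, domain: str, entity: str, event: str,
                severity: str) -> tuple[str, str]:
        """Entity to color and severity; the reporting entity if unassigned."""
        default = (entity.upper(), severity.upper())
        return self._rules.get(self._key(domain, entity, event), default)
```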

I have been trying to come up with a scheme that would allow the user to
make use of wildcards in this type of command without having to teach the
Notification FM too much. One alternative I have been thinking about is
to have special-case knowledge about Phase 4 events (adjacency down and
state change) and just handle them specially; but we will have to see what time
allows.

TOO MUCH WORK....TOO LITTLE TIME...
839.12. "confused" by JETSAM::WOODCOCK Wed Mar 27 1991 16:34 (14 lines)
I'm a bit confused about how this statement works.

>>     ASSIGN DOMAIN X ENTITY=MANAGED_ENTITY, EVENT=EVENT_NAME,
>>     TARGET ENTITY=ENTITY_TO_TURN_COLOR, TARGET SEVERITY=SEVERITY_CODE

The same event will be received from the MANAGED_ENTITY, describing
many different outages (at least for 4.14 or 4.18 events).
Where in this statement is a more descriptive breakdown of the event so
MCC can decipher it and make the assignment to the proper TARGET ENTITY? 
And will this method replace alarms so MCC is working directly from events?

Ex.   node4 ...... cir ethernet adj node ...... adjacency down
                                          ^^^^
                                           This is a variable
839.13. "try again" by TOOK::CALLANDER Wed Mar 27 1991 17:14 (27 lines)
With the direction being taken right now (and I am looking into more versatile
approaches), you would do something like:
ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE BOEHM, 
	EVENT=NODE REACHABILITY CHANGE,
	TARGET ENTITY= NODE4 BOEHM,
	TARGET SEVERITY = CRITICAL

Does this make it clearer? You would have to explicitly enter the info.
I have, since entering the last note, talked to a few more people about
trying to do something like


ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE *, 
	EVENT=NODE REACHABILITY CHANGE,
	TARGET ENTITY ,
	TARGET SEVERITY = CRITICAL

Where TARGET ENTITY would have an implementation-specific default value that
says use the parent class (node4) with the child instance (boehm) as the
target entity. This would then allow you to set up an easy assignment
for the reachability change, and maybe also for bridges when they get
around to events (forwarding database physical entries work similarly)....
Just another idea; please feel free to add your own. But give them to
me using something like the FCL syntax so I can understand what you are
implying from an implementation standpoint.
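The wildcard form with a defaulted TARGET ENTITY can be modeled in a few lines. This is a minimal Python sketch under stated assumptions. Patterns are matched word by word, so `*` (or `SYN-*`) only matches within one word. The default target is taken as the first word (the parent class) plus the last word (the child instance). That default only makes sense where the child instance names a global entity of the same class, as a remote node does. `matches`, `default_target`, and `resolve_target` are hypothetical names.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def matches(pattern: str, entity: str) -> bool:
    """Word-by-word match; '*' in a pattern word matches within that word only."""
    p, e = pattern.upper().split(), entity.upper().split()
    return len(p) == len(e) and all(fnmatchcase(ew, pw) for pw, ew in zip(p, e))


def default_target(entity: str) -> str:
    """Parent class (first word) plus child instance (last word)."""
    words = entity.upper().split()
    return f"{words[0]} {words[-1]}"


def resolve_target(pattern: str, entity: str,
                   target: str | None = None) -> str | None:
    """Entity to color for an occurrence on `entity`, or None if no match."""
    if not matches(pattern, entity):
        return None
    return target.upper() if target else default_target(entity)
```

With this model, `ENTITY=NODE4 GOSTE REMOTE NODE *` and an empty TARGET ENTITY color `NODE4 BOEHM` when the event arrives on `NODE4 GOSTE REMOTE NODE BOEHM`.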

jill
839.14. "looks good" by JETSAM::WOODCOCK Thu Mar 28 1991 09:09 (25 lines)
> ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE *, 
>	EVENT=NODE REACHABILITY CHANGE,
>	TARGET ENTITY ,
>	TARGET SEVERITY = CRITICAL

> Where TARGET ENTITY would have an implementation specific default value that
> says use the parent class (node4) with the child instance (boehm) as the
> target entity. This would then allow you to set up an easy assignment
> for both the reachability change and maybe with bridges when they get
> around to events, forwarding database physical entries work similarly....

I figured there was an expansion of the entity spec from your last note
but just couldn't tell. I like this approach very much. The cases where
you're looking for a different target entity will probably always have some
sort of occurrence as the child entity from the event. Also, if wires are
defined as child entities, then this may also work for circuits. Ex.

ASSIGN ................ ENTITY=NODE4 GOSTE CIRCUIT SYN-*, (or CIRCUIT *)
	EVENT=CIRCUIT DOWN,
	TARGET ENTITY,
	TARGET SEVERITY=CRITICAL

good idea,
brad...
839.15. "It gets even better..." by WAKEME::ANIL Thu Mar 28 1991 18:13 (26 lines)
RE: .12
>>> And will this method replace alarms so MCC is working directly from events?


No need to worry about your investment in Alarms! In the next version, not
only can you specify a target entity at rule creation, but you may also
be able to specify two additional parameters on the OCCURS function.
Nothing is committed yet, but following is the idea:

Expression    = (OCCURS ( NODE4 * Event foo , n, hh:mm:ss))
Target Entity = Domain foo

Where n is the number of times the event is detected within the hh:mm:ss time frame.

This expression should be particularly handy in trying to monitor 
adjacency up/down events.
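The proposed OCCURS(event, n, hh:mm:ss) test is a sliding-window count. Here is a minimal Python model of those semantics, not the planned DECmcc implementation. `OccursCounter` and `parse_hms` are hypothetical names. The sketch assumes timestamps in seconds that never decrease, treats events exactly one window apart as still inside the window, and reports True every time the threshold is met.

```python
from __future__ import annotations

from collections import deque


def parse_hms(text: str) -> int:
    """Convert 'hh:mm:ss' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


class OccursCounter:
    """Sliding-window model of a hypothetical OCCURS(event, n, hh:mm:ss)."""

    def __init__(self, n: int, window: str) -> None:
        self.n = n
        self.window = parse_hms(window)
        self.times: deque[float] = deque()

    def record(self, t: float) -> bool:
        """Record an event at time t; return True if the rule would fire."""
        self.times.append(t)
        # Forget events older than the window ending at t (t itself always stays).
        while t - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.n
```

For a flapping adjacency, `OccursCounter(3, "00:01:00")` fires on the third adjacency-down event inside any one-minute span and stays quiet for isolated drops.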

An added advantage of the above expression is that NODE4 x
need not be in the domain in which the rule is being enabled!

- Anil Navkal





839.16. "Clearing alarms problem ?" by BELFST::ROONEY (Hugh Rooney) Wed Nov 27 1991 12:42 (17 lines)
    Hi,
    
    I have a customer using TARGET_ENTITY.COM. They have one small problem
    which occurs when alarms have fired on several entities in a domain
    and one of them was fired by target_entity.com. When they clear the
    alarm which was triggered by Target_entity.com the higher level domains
    also clear, even though there are still other lower level alarms which
    have not been cleared.
    
    Can anyone suggest why this might happen, or a workaround?
    
    Many thanks in advance for any ideas.
    
    Regards
    
    Hugh Rooney
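To pin down the behaviour Hugh's customer expects, here is a minimal Python model in which a domain shows the worst severity still outstanding in it or in any subdomain. It is recomputed each time, so clearing one alarm cannot clear a parent while other alarms remain. This is a hypothetical model for illustration only. It does not describe how DECmcc actually propagates colors, and it does not explain the bug. The severity names follow the perceived-severity values, and their ordering is an assumption.

```python
from __future__ import annotations

# Assumed ordering of perceived severities, lowest to highest.
RANK = {"clear": 0, "indeterminate": 1, "warning": 2, "minor": 3, "major": 4, "critical": 5}
NAME = {rank: name for name, rank in RANK.items()}


class Domain:
    """Hypothetical model: a domain shows its worst outstanding alarm."""

    def __init__(self, name: str, parent: Domain | None = None) -> None:
        self.name = name
        self.subdomains: list[Domain] = []
        self.alarms: dict[tuple[str, str], str] = {}  # (entity, rule) -> severity
        if parent is not None:
            parent.subdomains.append(self)

    def fire(self, entity: str, rule: str, severity: str) -> None:
        self.alarms[(entity, rule)] = severity

    def clear(self, entity: str, rule: str) -> None:
        self.alarms.pop((entity, rule), None)  # removes this one alarm only

    def severity(self) -> str:
        """Worst severity here or in any subdomain, recomputed on every call."""
        ranks = [RANK[s] for s in self.alarms.values()]
        ranks += [RANK[d.severity()] for d in self.subdomains]
        return NAME[max(ranks, default=0)]
```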
    
839.17. "you're right, but I'm no help" by JETSAM::WOODCOCK Wed Nov 27 1991 14:53 (14 lines)
Hi Hugh,

You're definitely right. I just recreated your scenario. Although I find it
hard to believe the target procedure is the culprit, I can't recreate it
without using target_entity. As a test I created different alarms at
different severities and always cleared the highest severity to no avail
in reproducing the problem. It seems to me I remember reading of similar
problems in notes a while back. Maybe someone from the ALARMS team can
shed some light. Could it be because it was created/activated in batch???
Just a guess, I don't have a clue on this one.

best regards,
brad...   

839.18. "problem has been seen before" by TOOK::CALLANDER (MCC = My Constant Companion) Wed Jan 08 1992 11:27 (9 lines)
    Well I am from the notif team, not alarms, but will I do? ;->
    
    As to why it happens, who knows. But we do know that it does
    happen, and there are a few ways to reproduce it. As to whether
    it will be fixed: it will be. We have extended the notification functions in
    1.2. Some of them (including propagation of color changes) are not all
    100% there yet, but you should see enhanced functionality with the 1.2
    EFT kit.