
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

839.0. "event notification on iconic map" by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Mon Mar 25 1991 11:49

    I ask this question because the documentation is not very clear to me:
    
    I would like to use the event 4.18 (adjacency down) to change the
    colour of systems which are no longer reachable.
    
    I set up eventlogging for MCC (MCC_STARTUP_DNA4_EVL.COM), and I get
    4.18 events delivered on my mcc-node.
    My alarm rule expression is the following:
        expression=(occurs(node4 <mcc-node> circuit sva-0 adjacent node *
        adjacency down))
    What happens on the map is the following: the system which reports
    the event (<mcc-node>) turns red, but not the adjacent node which is
    no longer reachable.
    
    Is this the intended behaviour of MCC event notification?
    
    Regards, Karl

T.R	Title	User	Personal Name	Date	Lines
839.1		BEAGLE::WLODEK	Network pathologist.	`Mon Mar 25 1991 15:18`	11
	Karl, I can't comment on the MCC command, but it is better to use the Reachability Change event to change the state of the node rather than the adjacency event. An adjacency event only means that routing initialization completed, not that one can actually communicate with the remote system. In many cases one can lose an adjacency while the remote node is still reachable, because there are alternative paths. wlodek
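Wlodek's point about alternative paths can be illustrated with an ordinary graph search. The Python sketch below is only an illustration, not anything DECmcc does internally: the topology, the node names MCC, R1, R2 and N, and the helper `reachable` are hypothetical. It shows that losing one adjacency does not by itself make a node unreachable.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over undirected links given as (a, b) pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Hypothetical topology: end node N is adjacent to two routers, R1 and R2.
links = {("MCC", "R1"), ("MCC", "R2"), ("R1", "N"), ("R2", "N")}

# R1 reports "adjacency down" for N, but N is still reachable through R2.
after_one_loss = links - {("R1", "N")}
print(reachable(after_one_loss, "MCC", "N"))    # True

# Only when both adjacencies are gone is N really unreachable.
after_two_losses = after_one_loss - {("R2", "N")}
print(reachable(after_two_losses, "MCC", "N"))  # False
```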
839.2	events generated on children apply to their parents	TOOK::DITMARS	Pete	`Mon Mar 25 1991 16:13`	14
	Hi, This is the way things are supposed to work from an architectural point of view: the alarm rule was defined on a particular child entity of a global entity, so when the rule fires, the iconic map color change is associated with that child entity. It just so happens that the child entity in question is also a global entity, and that to most humans it makes much more sense for this particular alarm firing event to be "posted" against the associated global entity rather than the child entity. We are working on a way to "re-target" events/alarms from one entity to another, mostly to solve this particular issue.
839.3	if there's a will there's a way	JETSAM::WOODCOCK		`Mon Mar 25 1991 17:41`	8
	It seems there is a real need for highlighting the entity of concern. In the next reply you will find a "hack" of the midnight variety for those interested. It may suit the needs of some until the "real" solution is available. The logic is simple and I put as much description as I could to clarify. It's not pretty but it should work for the interim. You could also adapt it for Node Reachability events easily enough I suspect. brad...
839.4	target_entity.com	JETSAM::WOODCOCK		`Mon Mar 25 1991 17:48`	110
$! TARGET_ENTITY.COM
$! written by Brad Woodcock
$! Last revision 3/25/91
$!
$! This procedure is used with MCC alarms to highlight the proper NODE4
$! entity when an ADJACENCY DOWN (4.18) event is received. The following
$! rule characteristics describe the specific alarm which uses this procedure:
$!
$!   create mcc 0 alarms rule zko_nodes -
$!     Expression = (occurs(node4 bbzk01 circuit ethernet adjacent node * -
$!     adjacency down)),-
$!     Procedure = [ALARMS.COM]TARGET_ENTITY.COM;1, Perceived Severity = Clear,-
$!     in domain .zko-2
$!
$! This alarm signals an icon color change to the entity (bbzk01 in this case)
$! where the event originated. Because this is not the desired effect, the
$! user should customize the ALARM SEVERITY "clear" to be identical to that
$! of the ENTITY ICON color.
$!
$! TARGET_ENTITY.COM then parses the alarm data for the number of the node
$! which is down. A procedure is then built and run which creates a polling
$! alarm on the entity which is down and uses exception handling to highlight
$! the proper entity.
$!
$! This procedure was specifically built for ESC use and must be customized
$! for others to use. The following potentially need modifying:
$!
$!   - Only those nodes which are located in ESC domains are registered
$!     in the namespace. To eliminate some overhead, a search is done on
$!     a file to ensure the node which is down is of interest before
$!     proceeding. This file was created with the following command:
$!
$!       MCC>DIR NODE4 , TO FILE=[ALARMS.COM]DNS_NODES.DAT
$!
$!   - When this procedure was written, domain information was not passed
$!     with alarms. Therefore there is a series of IF statements
$!     which are specific to the ESC and need to be modified for others
$!     to use.
$!
$!   - Directory and file references need to be modified throughout the
$!     procedure, including the CLEANUP section.
$!
$! START...
$!
$! Parse the node number and create a node symbol without the "." to be
$! used for temp file names.
$!
$ data := 'p6'
$ node1 = f$element(6," ",data)
$ area = f$element(0,".",node1)
$ num = f$element(1,".",node1)
$ node = area + num
$ NODE:
$!
$! Search the dns file to ensure this is a node of interest.
$!
$ search [alarms.com]dns_nodes.dat " ''node1'"/output='node'.tmp
$ open input 'node'.tmp
$ read/end_of_file=endit input line
$ goto close
$ endit:
$ line :="0"
$ close:
$ close input
$ sho sym line
$ if line .eqs. "0" then goto cleanup
$ search_node :="''f$extract(42,7,line)'"
$ if search_node .nes. node1 then goto cleanup
$!
$! Determine the domain using the area number of the node.
$!
$ if area .eq. "24" then domain :=".pko-24"
$ if area .eq. "2" then domain :=".zko-2"
$ if area .eq. "3" then domain :=".mko-3"
$ if area .eq. "55" then domain :=".lkg-55"
$ if area .eq. "7" then domain :=".mro-7"
$ if area .eq. "8" then domain :=".cxo-8"
$ if area .eq. "28" then domain :=".alf-28"
$ if area .eq. "5" then domain :=".mlo-5"
$ if area .eq. "20" then domain :=".ako-20"
$ if area .eq. "37" then domain :=".hlo-37"
$!
$! Create and run a procedure which polls the node down and fires an alarm.
$!
$ open/write demon 'node'_poll.com
$ write demon "manage/enterprise"
$ write demon "create mcc 0 alarms rule ''node'_poll -"
$ write demon "expression=(node4 ''node1' circuit ethernet state=on),-"
$ write demon "exception handler=[alarms.com]node_mail.com,-"
$ write demon "perceived severity=critical,-"
$ write demon "in domain ''domain'"
$ write demon "enable mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$! NOTE: bbpk99 looks like a hard-coded node name left over from testing;
$! it probably should be the node held in the NODE1 symbol.
$ write demon "show node4 bbpk99 circuit ethernet state"
$ write demon "show mcc 0 alarms rule ''node'_poll all status,in domain ''domain'"
$ write demon "show mcc 0 alarms rule ''node'_poll all counters,in domain ''domain'"
$ write demon "disable mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "delete mcc 0 alarms rule ''node'_poll,in domain ''domain'"
$ write demon "exit"
$ close demon
$ @'node'_poll.com
$!
$ CLEANUP:
$ delete [decmcc]'node'.tmp;
$ delete 'node'_poll.com;*
$ purge/keep=3 [decmcc]target_entity.log
$ exit
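For readers who do not speak DCL, the parsing half of TARGET_ENTITY.COM can be sketched in Python. This is a hypothetical re-expression, not part of Brad's procedure: the function name `parse_alarm_data` is made up, and the sample alarm data string is invented so that the node address is its seventh blank-separated word, which is what `f$element(6," ",data)` assumes; the real P6 layout may differ.

```python
# Hypothetical Python rendering of the parsing steps in TARGET_ENTITY.COM.
# The area-to-domain table is copied from the IF statements in the procedure.
AREA_TO_DOMAIN = {
    "24": ".pko-24", "2": ".zko-2", "3": ".mko-3", "55": ".lkg-55",
    "7": ".mro-7", "8": ".cxo-8", "28": ".alf-28", "5": ".mlo-5",
    "20": ".ako-20", "37": ".hlo-37",
}


def parse_alarm_data(data):
    """Return (node address, temp-file stem, domain), or None if too short.

    Mirrors node1 = f$element(6," ",data), the split of node1 on ".", and
    node = area + num. Splitting on whitespace plays the role of the :=
    assignment, which collapses runs of blanks. Where DCL would just get the
    delimiter back for a missing element, this sketch returns None instead.
    """
    words = data.split()
    if len(words) < 7:
        return None
    node1 = words[6]
    area, _, num = node1.partition(".")
    return node1, area + num, AREA_TO_DOMAIN.get(area)


# Made-up alarm data string; the real P6 layout may differ.
sample = "NODE4 BBZK01 CIRCUIT ETHERNET ADJACENT NODE 2.123 ADJACENCY DOWN"
print(parse_alarm_data(sample))   # ('2.123', '2123', '.zko-2')
```

An area outside the table yields a domain of `None`, which a caller could treat like the procedure's `goto cleanup` path.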
839.5	reachability versus adjacency?	TOOK::CAREY		`Tue Mar 26 1991 11:13`	49
	Thanks for the workaround, Brad. It looks like good stuff. Target Entity support will be a valuable contribution to DECmcc, effectively allowing the network management staff to focus their attention where experience has shown them the problems lie. Without that experience, DECmcc is forced to report what it KNOWS, which isn't really very much. DECmcc knows that an event report was sent in by a routing node that noticed a previously adjacent node was no longer reporting in. That's it. So we react by bringing the routing node (which reported the event) to the attention of the network manager so that they can investigate. Redirecting to a target entity lets us start making the inferences that you are naturally making, and lets you make those inferences explicit yourself, saving one step in the cognitive process of understanding what this change at a router means to the network. Now that the information is available, it is time to start building in the mechanisms to help us and you make these inferences explicit.
	* * * * * *
	I notice a reply about "reachability changes" being perhaps preferable to "adjacency down". My [limited] experience is that in our network, "reachability changes" might take five minutes or more to be ratified by all of the routers in a given area. By collecting "adjacency down" information from each of the routers, I was under the impression that I was getting similar information in a much more timely manner. It is true that I shouldn't necessarily infer that an "adjacency down" means that a node is no longer reachable. On the other hand, it seems to be a pretty good approximation (at least in our network), and gives me the information more quickly. Any thoughts or comments about watching "reachability" versus "adjacency down"? -Jim Carey
839.6		BEAGLE::WLODEK	Network pathologist.	`Tue Mar 26 1991 11:33`	23
	"Reachability down" events will certainly take longer to settle; it depends on the size of the network and the line speeds. If the settling time is too long, something has to be done about it. But really, Reachability events are the only real events telling whether a remote node is reachable or not. Adjacency events alone are not good enough. But if you know that a certain node is an end node with just one circuit (not hot standby or Phase V multi-circuit), then Adjacency down is enough. Back to the previous discussion: a router will quite often report a network problem or a problem with a different node. So a router is often just a messenger; there is nothing wrong with it. It is necessary that MCC can decode the source of the event (not the sender) and deliver the event to the right entity. Things will get very complicated otherwise. wlodek
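Wlodek's rule of thumb can be written as a small decision function. This Python sketch is hypothetical: the function name, the dictionary fields describing a node (`type`, `circuits`, `hot_standby`) and the `status` strings are assumptions; only the 4.14 and 4.18 event numbers come from this topic.

```python
# Hypothetical encoding of the rule of thumb above. Event codes follow the
# DECnet Phase IV numbering used in this topic: 4.14 node reachability
# change, 4.18 adjacency down. The node attributes are assumptions.
REACHABILITY_CHANGE = "4.14"
ADJACENCY_DOWN = "4.18"


def infer_node_down(event_code, node, status=None):
    """Return True if the event is good evidence that `node` is down."""
    if event_code == REACHABILITY_CHANGE:
        # Authoritative, though it may take longer to settle.
        return status == "unreachable"
    if event_code == ADJACENCY_DOWN:
        # Only trust it for an end node with a single circuit and no
        # hot standby or multi-circuit configuration.
        return (node["type"] == "end"
                and node["circuits"] == 1
                and not node.get("hot_standby", False))
    return False


end_node = {"type": "end", "circuits": 1}
router = {"type": "router", "circuits": 3}
print(infer_node_down(ADJACENCY_DOWN, end_node))                     # True
print(infer_node_down(ADJACENCY_DOWN, router))                       # False
print(infer_node_down(REACHABILITY_CHANGE, router, "unreachable"))   # True
```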
839.7	it works - but eats resources	MFRNW1::SCHUSTER	Karl Schuster @MFR Network Services	`Tue Mar 26 1991 12:23`	9
	Thanks for the recommendations. I tried out .4, and it works. But it's a lot of work to set it up in a large environment, and temporarily creating, enabling, disabling and deleting rules is veeeery slow and consumes a lot of CPU and IO. Will there be significant changes concerning event notification in V1.2? Karl
839.8	depends on your definition of "significant"	TOOK::CALLANDER		`Tue Mar 26 1991 16:40`	14
	We have been working hard trying to listen to all of the requirements, and the ability to retarget (ASSIGN) a new target on a rule or event has been one of the most highly requested items. It is on the list, but we are still going through those lists trying to determine what is feasible with the current resource limitations. One of the new things coming will be a notification PM interface that will give you easier access to the occurrence information stored in chronological order, as well as historical recording of the occurrence information. BTW -- we use the term occurrence when discussing both alarming information and events. jill
839.9		BEAGLE::WLODEK	Network pathologist.	`Wed Mar 27 1991 02:55`	7
	Retargeting is really priority 1 if we want reasonable alarming. Otherwise we will end up with lots of side applications to feed in configurations and manipulate MCC into doing the right thing. It would be a pity. wlodek
839.10	suggestion	JETSAM::WOODCOCK		`Wed Mar 27 1991 09:07`	48
	> I tried out .4, and it works. But its a lot of work to set it up in a
	> large environment, and temporarily creating,enabling,disabling,deleting
	> rules is veeeery slow and consumes a lot of CPU and IO.
	This is true, and it is the reason why a formalized MCC approach is needed. Fortunately this overhead is not an issue for the ESC. All the nodes we are monitoring are on segments isolated from user LANs. These isolated segments have only routers and their LHs on them, so there is an event only when one of these routing-related nodes goes off net. These "occurrences" are few and far between. Also, because of this method we are able to track the entire US backbone with only 10 alarms. So for us the setup was trivial.
	I would caution all those pursuing node monitoring using either 4.18 or 4.14 that this could be a serious load on your system if the 4.18 events are coming from user segments where nodes go up and down quite often. Also, using 4.14 (node reachability) events will produce this load even if you're not on the user segment. This is one reason I leaned toward the adj node event to do our monitoring. Another reason was that MCC provides more info in the data for adj down compared to circuit down occurs. Also, I had done some testing when all this functionality first came about and found that node reachability events took about five minutes of lag time to signal the actual down. This testing was limited to a LAN environment. I wasn't happy with 5 minutes, and the prospect of further delays across a WAN was unacceptable. To each his own, but it is safe to say I won't tune our network so that the reporting of down nodes happens quicker. I view this as a DECnet shortcoming and not an MCC one.
	Today we use adj node occurrences for a couple of purposes. The first is on point-to-point circuits, where it signals that the circuit is down. The other is on Ethernet circuits, where it signals that a node is down. Since you haven't formalized your method of target entity, may I suggest the following.
	The descriptions I have seen thus far indicate that each alarm would have to specify a target entity. This will be too much of a burden to implement in the field, because an alarm would have to be written for each node of interest. The plus side of target_entity.com is that it can wildcard, and therefore only one rule is needed for each extended LAN (or, if it were modified for node reachability, only one rule per DECnet area). Your best bet may be to generically parse either 4.14 or 4.18 occurrences for the actual node that is down, then check for this entity in the maps and apply the notification. Wildcard-type functionality is a must to be successful, in my view. good day, brad...
839.11	I understand...	TOOK::CALLANDER		`Wed Mar 27 1991 15:47`	28
	and then....the infamous BUT! I do understand what you are getting at about learning something about the events and handling them in specific ways. Unfortunately, in a generic environment we tend towards getting the generic case working first, so that anyone can write a module and have it work; then we sit down and look at what special-case handling would be useful and start adding that in. Right now the plan is to have a command (the "verbs" used here are placeholders until I get around to seeing what is out there) that will allow the user to ASSIGN a target entity and severity for any event on any entity. It would be something like:
	    ASSIGN DOMAIN X ENTITY=MANAGED_ENTITY, EVENT=EVENT_NAME,
	        TARGET ENTITY=ENTITY_TO_TURN_COLOR, TARGET SEVERITY=SEVERITY_CODE
	The user can change these at any time, save them, or delete them, or not use them at all. I have been trying to come up with a scheme that would allow the user to make use of wildcards in this type of command without having to teach the Notification FM too much. One alternative I have been thinking about is to have special-case knowledge about phase 4 events (adjacency down and state change) and just handle them specially; but we will have to see what time allows. TOO MUCH WORK....TOO LITTLE TIME...
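Jill's proposed ASSIGN table amounts to a lookup keyed on domain, entity and event. The following Python sketch is an assumption about how such a table could behave, not the planned Notification FM design: the table layout, `lookup_target` and the sample entry are hypothetical, and `fnmatchcase` merely stands in for whatever wildcard matching MCC might use. Note that a wildcard in the entity pattern still maps to a single fixed target, which is the gap Brad raises in the next reply.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the proposed ASSIGN table. Each entry is
# (domain, entity pattern, event, target entity, target severity).
ASSIGNMENTS = [
    ("X", "NODE4 GOSTE REMOTE NODE BOEHM", "NODE REACHABILITY CHANGE",
     "NODE4 BOEHM", "CRITICAL"),
]


def lookup_target(domain, entity, event):
    """Return (target entity, severity) for an occurrence.

    With no matching assignment, fall back to the reporting entity and no
    severity override, i.e. today's behaviour.
    """
    for dom, pattern, ev, target, severity in ASSIGNMENTS:
        if dom == domain and ev == event and fnmatchcase(entity, pattern):
            return target, severity
    return entity, None


print(lookup_target("X", "NODE4 GOSTE REMOTE NODE BOEHM",
                    "NODE REACHABILITY CHANGE"))
# ('NODE4 BOEHM', 'CRITICAL')
print(lookup_target("X", "NODE4 GOSTE REMOTE NODE SMITH",
                    "NODE REACHABILITY CHANGE"))
# ('NODE4 GOSTE REMOTE NODE SMITH', None)
```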
839.12	confused	JETSAM::WOODCOCK		`Wed Mar 27 1991 16:34`	14
	I'm a bit confused about the workings of this statement.
	>> ASSIGN DOMAIN X ENTITY=MANAGED_ENTITY, EVENT=EVENT_NAME,
	>> TARGET ENTITY=ENTITY_TO_TURN_COLOR, TARGET SEVERITY=SEVERITY_CODE
	The same event will be received from the MANAGED_ENTITY and will describe many different outages (at least if it is a 4.14 or 4.18 event). Where in this statement is there a more descriptive breakdown of the event, so MCC can decipher it and make the assignment to the proper TARGET ENTITY? And will this method replace alarms, so MCC is working directly from events?
	Ex.  node4 ...... cir ethernet adj node ...... adjacency down
	                                        ^^^^^^
	                                        This is a variable
839.13	try again	TOOK::CALLANDER		`Wed Mar 27 1991 17:14`	27
	With the direction being taken right now (and I am looking into more versatile approaches), you would do something like:
	    ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE BOEHM,
	        EVENT=NODE REACHABILITY CHANGE, TARGET ENTITY=NODE4 BOEHM,
	        TARGET SEVERITY=CRITICAL
	Does this make it clearer? You would have to explicitly enter the info. Since entering the last note, I have talked to a few more people about trying to do something like:
	    ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE *,
	        EVENT=NODE REACHABILITY CHANGE, TARGET ENTITY,
	        TARGET SEVERITY=CRITICAL
	where TARGET ENTITY would have an implementation-specific default value that says to use the parent class (node4) with the child instance (boehm) as the target entity. This would then allow you to set up an easy assignment for the reachability change, and maybe for bridges when they get around to events, since forwarding database physical entries work similarly.... Just another idea; please feel free to add in your own. But give them to me using something like the FCL syntax so I can understand what you are implying from an implementation standpoint. jill
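The default TARGET ENTITY described here (parent class plus child instance) could work roughly as follows. This Python sketch is hypothetical: `default_target`, the set of child classes treated as naming a node, and the circuit name SYN-0 are assumptions for illustration.

```python
# Hypothetical sketch of the proposed default TARGET ENTITY: keep the global
# class (e.g. NODE4) and use the instance named by a node-valued child class.
# Which child classes count as "node-valued" is an assumption.
NODE_CHILD_CLASSES = {("REMOTE", "NODE"), ("ADJACENT", "NODE")}


def default_target(entity):
    words = entity.upper().split()
    # words[0] is the global class and words[1] its instance, so a child
    # class can only start at index 2; its instance name follows the class.
    for i in range(2, len(words) - 2):
        if (words[i], words[i + 1]) in NODE_CHILD_CLASSES:
            return f"{words[0]} {words[i + 2]}"
    return entity  # nothing to retarget: keep the reporting entity


print(default_target("NODE4 GOSTE REMOTE NODE BOEHM"))
# NODE4 BOEHM
print(default_target("node4 bbzk01 circuit ethernet adjacent node 2.123"))
# NODE4 2.123
print(default_target("NODE4 GOSTE CIRCUIT SYN-0"))
# NODE4 GOSTE CIRCUIT SYN-0
```

A circuit entity is left alone here; extending the idea to circuits, as Brad suggests in the next reply, would need wires or similar to be modeled as child entities.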
839.14	looks good	JETSAM::WOODCOCK		`Thu Mar 28 1991 09:09`	25
	> ASSIGN domain jills_domain ENTITY=NODE4 GOSTE REMOTE NODE *,
	> EVENT=NODE REACHABILITY CHANGE,
	> TARGET ENTITY ,
	> TARGET SEVERITY = CRITICAL
	> Where TARGET ENTITY would have an implementation specific default value that
	> says use the parent class (node4) with the child instance (boehm) as the
	> target entity. This would then allow you to set up an easy assignement
	> for both the reachability change and maybe with bridges when they get
	> around to events, forwarding database physical entries work similar....
	I figured there was an expansion of the entity spec from your last note but just couldn't tell. I like this approach very much. The cases where you're looking for a different target entity will probably always have some sort of occurrence as the child entity from the event. Also, if wires are defined as child entities, then this may also work for circuits. Ex.
	    ASSIGN ................ ENTITY=NODE4 GOSTE CIRCUIT SYN-, (or CIRCUIT *)
	        EVENT=CIRCUIT DOWN, TARGET ENTITY, TARGET SEVERITY=CRITICAL
	good idea, brad...
839.15	It gets even better...	WAKEME::ANIL		`Thu Mar 28 1991 18:13`	26
	RE: .12
	>>> And will this method replace alarms so MCC is working directly from events?
	No need to worry about your investment in Alarms! In the next version, not only can you specify a target entity on Rule creation, but you may also be able to specify two additional parameters on the OCCURS function. Nothing is committed yet, but the following is the idea:
	    Expression = (OCCURS ( NODE4 * Event foo , n, hh:mm:ss))
	    Target Entity = Domain foo
	where n is the number of times the event is detected within the hh:mm:ss time frame. This expression should be particularly handy in trying to monitor adjacency up/down events. In the above expression, the added advantage is that the NODE4 x need not be in the Domain in which the Rule is being enabled! - Anil Navkal
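The proposed OCCURS(entity event, n, hh:mm:ss) form is essentially a sliding-window counter. The Python sketch below assumes "at least n occurrences within the window" and timestamps in seconds arriving in order; the class name `OccursCounter` and the sample times are hypothetical, and nothing here reflects how MCC would implement it.

```python
from collections import deque


def parse_hms(text):
    """Convert 'hh:mm:ss' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


class OccursCounter:
    """Fires when an event has been seen at least n times within a window.

    Timestamps are seconds and are assumed to arrive in non-decreasing order.
    """

    def __init__(self, n, window):
        self.n = n
        self.window = parse_hms(window)
        self.times = deque()

    def record(self, t):
        self.times.append(t)
        # Drop occurrences that have slid out of the time window.
        while t - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.n


# Hypothetical rule: three adjacency-down events within five minutes.
rule = OccursCounter(3, "00:05:00")
print([rule.record(t) for t in (0, 100, 250, 1000)])
# [False, False, True, False]
```

A single adjacency flap then stays quiet, while a node bouncing repeatedly inside the window fires the rule.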
839.16	Clearing alarms problem ?	BELFST::ROONEY	Hugh Rooney	`Wed Nov 27 1991 12:42`	17
	Hi, I have a customer using TARGET_ENTITY.COM. They have one small problem, which occurs when alarms have fired on several entities in a domain and one of them was fired by target_entity.com. When they clear the alarm that was triggered by target_entity.com, the higher-level domains also clear, even though there are still other lower-level alarms which have not been cleared. Can anyone suggest why this might happen, or a workaround? Many thanks in advance for any ideas. Regards Hugh Rooney
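The behaviour Hugh's customer expected is that a domain's color reflects the highest severity still outstanding among its members. A minimal Python sketch of that roll-up, assuming a single domain level and an assumed ordering of perceived severities (in particular where `indeterminate` sits); `domain_severity` and the entity names are hypothetical, and this is not DECmcc's documented behaviour.

```python
# Hypothetical severity roll-up for one domain. The ordering below is an
# assumption, not DECmcc's documented precedence.
SEVERITY_ORDER = ["clear", "indeterminate", "warning", "minor", "major", "critical"]
RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


def domain_severity(outstanding):
    """Highest severity among the member alarms that are still uncleared."""
    return max(outstanding.values(), key=RANK.__getitem__, default="clear")


alarms = {"NODE4 BOEHM": "critical", "NODE4 GOSTE": "major"}
alarms.pop("NODE4 BOEHM")        # operator clears the retargeted alarm
print(domain_severity(alarms))   # major: the domain should stay colored
alarms.pop("NODE4 GOSTE")
print(domain_severity(alarms))   # clear
```

Nested domains would apply the same rule recursively, taking each subdomain's rolled-up severity as one of its parent's members.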
839.17	you're right, but I'm no help	JETSAM::WOODCOCK		`Wed Nov 27 1991 14:53`	14
	Hi Hugh, You're definitely right. I just recreated your scenario. Although I find it hard to believe the target procedure is the culprit, I can't recreate it without using target_entity. As a test I created different alarms at different severities and always cleared the highest severity, but I could not reproduce the problem that way. It seems to me I remember reading of similar problems in notes a while back. Maybe someone from the ALARMS team can shed some light. Could it be because it was created/activated in batch??? Just a guess; I don't have a clue on this one. best regards, brad...
839.18	problem has been seen before	TOOK::CALLANDER	MCC = My Constant Companion	`Wed Jan 08 1992 11:27`	9
	Well, I am from the notif team, not alarms, but will I do? ;-> As to why it happens, who knows. But we do know that it does happen, and there are a few ways to reproduce it. As to whether it will be fixed: it will be. We have extended the notification functions in 1.2. Some of them (including propagation of color changes) are not 100% there yet, but you should see enhanced functionality with the 1.2 EFT kit.