
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

994.0. "NODE4 unreachable rule..." by FLYROC::GOYETTE (PLAY BALL !!!) Wed May 08 1991 11:31


	Hi,

	Is there an easier/better way to set up a rule for a node4 entity
	being unreachable ? The sample rule for this condition uses the
	local router to test the condition.

	I have two qualms with this. When node4 X goes down, the router icon
	turns red, not the node4 X icon. Also, if I am using this rule to
	monitor several nodes, I get bombarded by node unreachable messages
	when the router is out of resources. In my building this happens
	very frequently. The alarms are meaningless as well: the nodes are
	reachable; it's just that the alarm module couldn't communicate with
	the router to perform the test.

	Any ideas ?

	Thanks,
	Joe

T.R	Title	User	Personal Name	Date	Lines
994.1	Hello !	FLYROC::GOYETTE	PLAY BALL !!!	`Fri May 10 1991 09:08`	6
	Is anyone here ? A simple yes or no to .0 would suffice.... Thank You, Joe
994.2	Need a bit more info first...	NSSG::R_SPENCE	Nets don't fail me now...	`Fri May 10 1991 09:38`	8
	Let me ask you a question first... Better than what? Could you describe how you are currently thinking of solving this so we can perhaps offer a better way? From your note I don't know how you are now handling it. thanks s/rob
994.3	Does the note 839.5 (thanks Jim Carey) answer your question?	WAKEME::ANIL		`Fri May 10 1991 12:12`	5
	Please refer to note 839; reply 839.5 should answer your question. If note 839 does not answer your question, please let us know. - Anil Navkal
994.4	more info...	FLYROC::GOYETTE	PLAY BALL !!!	`Fri May 10 1991 14:22`	62
	Thanks for the pointers...

	> Let me ask you a question first...
	>
	> Better than what? Could you describe how you are currently thinking
	> of solving this so we can perhaps offer a better way? From your note
	> I don't know how you are now handling it.
	>
	> thanks
	> s/rob

	A method other than the one used by the sample rule.

	I've been doing a lot of work in the Event Management area of system
	management. Currently, I'm looking at, installing, using, testing all
	of Digital's system management/event management tools and applications
	in order to determine each one's strengths, weaknesses, etc... I'm also
	trying to determine the feasibility of using DECmcc as the central
	product around which a cohesive event management system can be built.
	Hence, the reason I asked the question about node unreachable events.
	All of the event management products detect this condition, but using
	MCC to detect it is by far the most difficult.

	> Please refer to note 839.. 839.5 should answer your question.
	> If the 839. does not answer your question please let us know.
	>
	> - Anil Navkal

	I missed this topic on my original pass, having done a dir/title on
	NODE4 and Unreachable... Nevertheless, after reading the notes in that
	topic it seems to me that there's too much extra work on the part of
	the user just to get a particular icon, the target, to change color
	for the node unreachable event.

	Is there a simple way, like a callable interface or something, for an
	outside agent to send a message directly to MCC somehow ?

	For example, let's say I'm using the MCC iconic map. I also have Data
	Center Monitor running on this system. DCM detects that node X has
	just become unreachable, no, let's use a completely different example.
	An example of not only using MCC as the network management tool but
	also as the central repository for enterprise-wide events and alarms.

	DCM is set up to monitor for low disk space on node X. At some point
	DCM determines that disk A on node X is low on disk space. From DCM a
	command file is executed which initiates a purge on disk A. However,
	I would also like to be able to send a message to MCC stating this
	condition and have MCC change the color of the node X icon.

	I'm almost positive that nothing like this currently exists but to me
	it seems like this should be a part of the evolution of MCC and its
	mission to provide enterprise-wide management.

	Your thoughts on this ?

	Joe
994.5	Application generated DECnet events...	NSSG::R_SPENCE	Nets don't fail me now...	`Fri May 10 1991 15:40`	26
	Your last example is doable.

	Define some DECnet events from the ones for customer applications
	that represent the various things that can go wrong.

	Write a simple program that DCM can call when things go wrong that will
	generate the appropriate DECnet event.

	Sink the DECnet events from the systems to the DECmcc station.

	Set up a series of alarm rules for events from the systems.

	Have DCM, when it detects a problem, cause the event generating program
	to run on the system with the problem.

	The procedure that the alarm rule calls could add needed text to a
	message as needed from the info provided from the alarm.

	I realize that it would be better to have DCM generate all the events
	from the DCM system (because the actual system may be too ill to do it)
	but that will have to wait till the indirect notification (or whatever
	Engineering will call it) is available in a future release.

	Anil, it seems to me that the above would work. What do you think?

	s/rob
994.6	Lets me face it....	WAKEME::ANIL		`Fri May 10 1991 16:17`	80
	Rob, your suggestion will certainly work. No problem with Alarms here.
	If a DCM is going to sink an event for the correct NODE4, turning that
	specific NODE4 icon's color is straightforward.

	The problem we encounter with NODE4 is that the remote node that is not
	reachable is the child entity of the [router] node. So if the remote
	node becomes unreachable you see color changes for the actual router.

	This is a particularly sticky problem to solve. Just because the node
	is not reachable does not mean that the node itself is powered down.
	It can become unreachable for any one of the following reasons:

	1. A bridge between the two nodes may be experiencing some
	   hardware/software problems.
	2. There may be a problem with the physical/logical circuit to the
	   node.
	3. There may be inadequate resources at the remote node.

	Also, just because we received the "node unreachable" event we can not
	conclude that the node is unreachable now; all it means is that the
	node became unreachable when the event was generated.

	I agree that as a user all this does not matter. S/He wants the color
	to change when the object/entity is unreachable and to return to the
	normal color (read severity = clear) when it is reachable, no ifs no
	buts! I don't know of any good way of solving this problem. A generic
	Alarms can only go so far!

	All I can say is that if you have an event we can make a rule on, you
	are in business!! The syntax for creating rules based on events is as
	follows:

	expression = (OCCURS (<entity> <event name>)).

	When Alarms detects the event, notification of your choice will take
	place!

	Hope this helps.

	- Anil Navkal

	P.S. I am not a guru in DECNET so please feel free to correct me.

	<<< Note 994.5 by NSSG::R_SPENCE "Nets don't fail me now..." >>>
	-< Application generated DECnet events... >-

	Your last example is doable.

	Define some DECnet events from the ones for customer applications
	that represent the various things that can go wrong.

	Write a simple program that DCM can call when things go wrong that will
	generate the appropriate DECnet event.

	Sink the DECnet events from the systems to the DECmcc station.

	Set up a series of alarm rules for events from the systems.

	Have DCM, when it detects a problem, cause the event generating program
	to run on the system with the problem.

	The procedure that the alarm rule calls could add needed text to a
	message as needed from the info provided from the alarm.

	I realize that it would be better to have DCM generate all the events
	from the DCM system (because the actual system may be too ill to do it)
	but that will have to wait till the indirect notification (or whatever
	Engineering will call it) is available in a future release.

	Anil, it seems to me that the above would work. What do you think?

	s/rob
994.7	Let have an event!	CLAUDI::PETERS		`Fri May 10 1991 17:15`	11
	Rob's suggestion works, but be careful because DECmcc supports only event classes 0, 2-7. That means that DECmcc can not receive event classes 1 (applications) and 8-511. It means, for instance, that you can not alarm on DNS events (352,353) which would be really handy or on application specific events. The nodes still issue DECnet events, you can sink 'em to the system on which DECmcc is running, but MCC_DNA4_EVL ignores them. /Claudia
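The class restriction Claudia describes can be sketched as a small check. This is only an illustration in Python, not anything from MCC_DNA4_EVL itself: the function names are made up, the event IDs are written in the usual "class.type" form, and the type numbers in the sample list are arbitrary.

```python
# Classes that, according to the reply above, DECmcc accepts: 0 and 2-7.
# Class 1 (applications) and classes 8-511 are ignored.
SUPPORTED_CLASSES = frozenset({0, *range(2, 8)})


def event_class(event_id: str) -> int:
    """Return the class part of an event ID written as 'class.type'."""
    class_part, _, _type_part = event_id.partition(".")
    return int(class_part)


def is_supported(event_id: str) -> bool:
    """True if the event's class is one DECmcc is said to accept."""
    return event_class(event_id) in SUPPORTED_CLASSES


if __name__ == "__main__":
    # Illustrative IDs only; the type numbers are not taken from any manual.
    for eid in ("4.14", "1.0", "352.5", "353.1"):
        print(eid, "accepted" if is_supported(eid) else "ignored")
```

Under these assumptions the DNS events (352.*, 353.*) and any class 1 application event fall into the "ignored" branch, which is the limitation being pointed out.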
994.8	?	FLYROC::GOYETTE	PLAY BALL !!!	`Thu May 16 1991 11:32`	64
	RE: .5 NSSG::R_SPENCE

	Sorry for the delay but I was in Landover, MD, supporting a POLYCENTER
	awareness day/demo at the ACT... anyway,

	> Your last example is doable.
	>
	> Define some DECnet events from the ones for customer applications
	> that represent the various things that can go wrong.
	> Write a simple program that DCM can call when things go wrong that will
	> generate the appropriate DECnet event.

	How does one define/generate a DECNET event ?

	> Sink the DECnet events from the systems to the DECmcc station.

	Again, not being fluent in DECNET lingo, what does "sink" mean ?

	> Set up a series of alarm rules for events from the systems.
	> Have DCM, when it detects a problem, cause the event generating program
	> to run on the system with the problem.

	No can do. The DCM display process, which can execute command files
	based on an event type, normally runs on only one node. It currently
	does not have the capability to execute a command file on a remote
	node, i.e., the one with the problem.

	Some more thoughts/questions...

	In its current state, the MCC alarm module can only be configured
	to monitor/alarm on network related events/conditions. ?

	If so, consider the possibility of this...

	What if I was to write an access module for an entity called SMW,
	System Management Workstation. The access module would contain the
	same things as the NODE4 access module as well as the knowledge to
	communicate with the SMW regarding SMW specific attributes and
	characteristics. Using this new access module, would it be
	possible to augment the current alarms function module to monitor/
	alarm on events specific to this entity ?

	A very simplistic example...

	I build this entity called System Management Workstation. From this
	SMW I can manage and monitor all of my systems. I then build an access
	module so MCC can talk to it. Using the alarms module I define a
	rule for the SMW entity, CHECK NEW_EVENT_MESSAGE_COUNTER > 0 every
	00:01:00. Theoretically, NEW_EVENT_MESSAGE_COUNTER is a characteristic
	of the SMW entity. Here's where the additional functionality to the
	alarms FM comes in. If the alarms FM checks this characteristic and
	finds that it is > 0 it could then retrieve the event messages
	from the SMW and zero the counter. Since the alarms FM knows these
	messages are SMW entity related it could parse them for node name and
	severity and change the color of the appropriate icons.

	I may be totally off base here. I'm not all that confident in my
	understanding of exactly how MCC does some things. However, would
	something along these lines be feasible ?

	Joe
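The polling idea in .8 can be sketched roughly as follows. Everything here is hypothetical: the SMW class, the "NODE SEVERITY text" message layout, and the severity-to-color table are assumptions made for illustration, and none of it corresponds to an existing MCC interface.

```python
from dataclasses import dataclass, field

# Assumed mapping from message severity to icon color.
SEVERITY_COLORS = {"critical": "red", "warning": "yellow", "clear": "green"}


@dataclass
class SMW:
    """Stand-in for the proposed System Management Workstation entity."""
    pending: list[str] = field(default_factory=list)

    @property
    def new_event_message_counter(self) -> int:
        # The counter is simply the number of undelivered messages.
        return len(self.pending)

    def drain(self) -> list[str]:
        """Return the queued messages and zero the counter."""
        messages, self.pending = self.pending, []
        return messages


def parse_message(message: str) -> tuple[str, str]:
    """Split an assumed 'NODE SEVERITY text...' message into (node, severity)."""
    node, severity, *_ = message.split()
    return node, severity.lower()


def poll(smw: SMW, icon_colors: dict[str, str]) -> None:
    """One pass of the rule CHECK NEW_EVENT_MESSAGE_COUNTER > 0."""
    if smw.new_event_message_counter > 0:
        for message in smw.drain():
            node, severity = parse_message(message)
            # Unknown severities fall back to yellow (an arbitrary choice).
            icon_colors[node] = SEVERITY_COLORS.get(severity, "yellow")


if __name__ == "__main__":
    smw = SMW(["NODEX critical disk A low on space"])
    colors: dict[str, str] = {}
    poll(smw, colors)  # in the proposal this would run every 00:01:00
    print(colors, smw.new_event_message_counter)
```

The sketch only shows the control flow being proposed (check the counter, fetch the messages, zero the counter, recolor by node); how an access module would actually expose the counter is left open in the note.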
994.9	Some answers...	WAKEME::ANIL		`Thu May 16 1991 12:49`	99
	RE .8:

	>>> How does one define/generate a DECNET event ?
	>
	> Sink the DECnet events from the systems to the DECmcc station.
	>
	>>> Again, not being fluent in DECNET lingo, what does "sink" mean ?

	Read note 715.3. If this note is not adequate you may also want to
	read DECNET Phase IV access module use, Part number: AA-PD5BB-TE,
	chapters 11 through 14.

	>>> No can do. The DCM display process, which can execute command files
	>>> based on an event type, normally runs on only one node. It currently
	>>> does not have the capability to execute a command file on a remote
	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	>>> node, i.e., the one with the problem.
	    ~~~~

	I am not sure I understand your problem. Are you saying that your
	command procedures can not run another command file on a remote node?
	If that's the question then the answer is simple. You have at least
	three ways of solving it.

	1. Release protection on the remote com file so that the world can
	   execute it. - This is the least safe way to hack it!
	2. Use ACLs on the remote node so that "current node/user" can execute
	   the remote node com files. - Moderately safe!
	3. Use a username/password combo to execute the remote com file.

	Did I answer a question you never asked?

	>>> Some more thoughts/questions...
	>>>
	>>> In its current state, the MCC Alarm module can only be configured
	>>> to monitor/Alarm on network related events/conditions. ?
	>>>
	>>> If so, consider the possibility of this...
	>>>
	>>> What if I was to write an access module for an entity called SMW,
	>>> System Management Workstation. The access module would contain the
	>>> same things as the NODE4 access module as well as the knowledge to
	>>> communicate with the SMW regarding SMW specific attributes and
	>>> characteristics. Using this new access module, would it be
	>>> possible to augment the current Alarms function module to monitor/
	>>> Alarm on events specific to this entity ?
	>>>

	Alarms can monitor any events you can see with the "GETEVENT" command
	from FCL! No ifs no buts!! The events can come from any
	software/hardware. As long as an AM has pumped them in by providing a
	GETEVENT service you can depend on Alarms to watch for them.

	>>> A very simplistic example...
	>>>
	>>> I build this entity called System Management Workstation. From this
	>>> SMW I can manage and monitor all of my systems. I then build an access
	>>> module so MCC can talk to it. Using the Alarms module I define a
	>>> rule for the SMW entity, CHECK NEW_EVENT_MESSAGE_COUNTER > 0 every
	>>> 00:01:00. Theoretically, NEW_EVENT_MESSAGE_COUNTER is a characteristic
	>>> of the SMW entity. Here's where the additional functionality to the
	>>> Alarms FM comes in. If the Alarms FM checks this characteristic and
	>>> finds that it is > 0 it could then retrieve the event messages
	>>> from the SMW and zero the counter. Since the Alarms FM knows these
	>>> messages are SMW entity related it could parse them for node name and
	>>> severity and change the color of the appropriate icons.

	I think what you are getting at is the possibility of zeroing a
	counter of an entity based on an Alarms rule firing. Sure you can do
	this. When a rule fires, Alarms writes the information about the
	entity and the data it got in a data file. The command procedure you
	specify on the create rule command has this data available. You can
	write a DCL procedure to get the entity name out and then invoke MCC
	to zero the counters. It sounds complicated but is not all that
	difficult.

	If you are going to write another MM, you could also get events from
	the Alarms rule firing and then zero the counters using the MCC_CALL
	protocol. This mechanism is very efficient and does not require the
	extra overhead of invoking MCC in a batch job.

	>>>
	>>> I may be totally off base here. I'm not all that confident in my
	>>> understanding of exactly how MCC does some things. However, would
	>>> something along these lines be feasible ?
	>>>
	>>> Joe

	Let us know if my answer is totally off base!

	- Anil Navkal
994.10		NSSG::R_SPENCE	Nets don't fail me now...	`Thu May 16 1991 13:28`	12
	Anil, is there any plan to allow events by event number rather than the text description? That would seem to then allow all the other DECnet events beyond those listed in the Phase IV manual, including the ones reserved for customer use. re: the base note... depending on the timing of your need, perhaps you could use the common management agent as it gets developed? Unfortunately, I don't have a pointer to someone who knows details about it but maybe someone watching here does? s/rob
994.11	DNA evl process	TOOK::CALLANDER		`Thu May 16 1991 14:43`	10
	Not quite, Rob. Even if we allowed you to specify an event number, the EVL process would have to be looking for them and passing them into MCC; my understanding from Jim Carey (PL of DECnet AM's) is that, the way the DNA4 EVL process is set up, they do quite a bit of preprocessing in the detached process, and accepting new events would take some work. I did talk to Jim, though, about what it would take to add support for user definable events.
994.12		FLYROC::GOYETTE	PLAY BALL !!!	`Fri May 17 1991 09:07`	14
	> Let us know if my answer is totally off base!
	>
	> - Anil Navkal

	I don't know enough yet about MCC itself to determine whether it is or
	isn't. I'm going to take your advice and do some reading. Hopefully I
	will gain a better understanding of the alarms module and the decnet
	module. When I do, I'll see what I come up with then..

	Thanks again for all the help.

	Joe
994.13	Getting events past the EVL shouldn't be hard?	NSSG::R_SPENCE	Nets don't fail me now...	`Fri May 17 1991 10:18`	19
	Yes Jill. The EVL process DOES do lots of preprocessing, but there
	are NCL (and NCP) commands to easily modify what filtering it does.

	For example (something I usually want to do), if I wanted to set
	alarms on DNS events, I would only have to tell the EVL process to
	PASS the 352.* and 353.* events. No big deal.

	I believe that you will find that more network managers know the event
	number that they want to see than know the "Name" or Descriptive Text
	that goes with it. I agree that the model would like us to use the
	correct names, and certainly the product should allow it, but we should
	also consider life after learning and help experts do work faster with
	approved shortcuts (like the event number if they know it vs the name)
	and accelerator keys for lots of common functions.

	What do other network manager types think about this?

	s/rob
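As a rough illustration of filtering by event number rather than by name, the sketch below matches "class.type" IDs against wildcard patterns such as 352.* and 353.*. It is plain Python and says nothing about how the EVL process or the NCP/NCL filter commands are really implemented; the function name and sample IDs are made up.

```python
from fnmatch import fnmatchcase


def passes(event_id: str, pass_patterns: list[str]) -> bool:
    """True if the event ID matches at least one PASS pattern."""
    return any(fnmatchcase(event_id, pattern) for pattern in pass_patterns)


if __name__ == "__main__":
    # The two pattern strings come from the note; the event IDs are samples.
    dns_filter = ["352.*", "353.*"]
    for eid in ("352.5", "353.1", "4.14", "3520.1"):
        print(eid, "PASS" if passes(eid, dns_filter) else "filtered out")
```

The point of the shortcut is visible in the pattern list itself: two short numeric patterns select every DNS event without the manager needing to know any descriptive names.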
994.14	Hmmm, 3 interesting event issues in one note!	TOOK::CAREY		`Fri May 17 1991 11:34`	159
	Hi guys,

	I guess it is time to get involved.... I've been sort of waiting for
	the dust to settle.

	There are three major issues running in and out in this note, the way
	I see it:

	-- How come "reachability events" light up the router and not the
	   node that went unreachable?
	-- How can I get "other interesting DECnet Events" into MCC?
	-- How can I get "other interesting events in general" into MCC?

	* * * * * *

	NODE UNREACHABLE EVENTS DON'T LIGHT UP THE UNREACHABLE NODE:

	I've heard this request to "light up" the node that has been reported
	in as unreachable, instead of the router that noticed the lack of
	reachability, a few times, and I've been hoping I could defer that
	until "Target Entity" is available.

	From a purist DECnet perspective all we know is that the router whose
	events we are watching noticed that a node has become unreachable.
	Of course that could mean a lot of things:

	-- The unreachable node is down. In this notes file at least,
	   that is a pretty popular choice.
	-- The router has lost a circuit.
	-- A bridge somewhere is down.
	-- Reachability doesn't imply adjacency. Another router in the
	   area could have gone down, causing the problem.
	-- Peak traffic swallowed enough periodic hello messages to
	   make it look like the node is down.
	-- Stan, the electrician working in the ceiling, just took out his
	   wire cutters and made a mistake. Okay, if it is Ethernet, he
	   just took out his chainsaw and made a mistake.

	All we know is that a routing node told us it no longer knows how to
	reach a particular end node.

	Okay, perhaps we shouldn't take the purist perspective. I just don't
	know what perspective to take. There are too many possibilities, most
	of which have nothing to do with either node reported in the event
	report.

	What we expect a network manager to do when an icon "lights up" is go
	to that icon, find out what occurred to get it all upset, and resolve
	the problem. That currently requires looking at the router that
	contains the intelligence to know that a node has become unreachable
	and trying to find out what is happening from there.

	I throw my hands up in confusion and depend upon the user's
	intelligence to unravel the cause. We have room for either:

	- .COM procedures following up the alarm rule firing to help isolate
	  the cause (Brad's thing is a step in that direction),
	- a fill-in-the-blank Functional Module to help add value in this
	  space.

	I know what the next question is: when? Sorry, we really are flat out
	here. I just don't know the answer to that.

	Oh, now I notice that .6 talks about many of the same sorts of
	things.... Sorry to be redundant and repeat myself and say all the
	same things several times.

	Thoughts on target entity:

	This node unreachable situation is almost solved by the idea of
	"target entity" (see note 839), but not quite. We want to be able to
	say something like:

	Notify NODE4 * REMOTE NODE * node reachability change

	to get all reachability change events; then we may want to specify
	that the target entity is:

	TARGET ENTITY = (the Node4 entity that is specified as the Remote
	Node entity in the entity specifier of the received event, if there
	is one).

	Implicitly, we recognize that what the router sees as a remote node
	maps to a real live Node4 out there for our management system. It is
	a little harder to make explicit, but that would sure be nice.

	Any ideas on some kind of a simple syntax that could describe that?
	I KNOW what I want, I just don't know how to say it. Other people are
	spending a lot of time thinking about this sort of issue....

	* * * * * *

	Other DECnet events:

	The VMS or Ultrix EVL will deliver anything to our sink monitor
	(MCC_DNA4_EVL) that we request it to. The limitations on that are
	implementation specific and not worth going into here.

	The problem with receiving "other DECnet events" is in MCC_DNA4_EVL,
	where we receive the event, translate it into MCCspeak and deliver it
	for [possible] consumption by anyone that happens to have a GETEVENT
	outstanding on the request. Translation gets tricky on events we
	don't know about. Describing an unknown event to the event manager is
	also tricky.

	Right now, we only know how to translate the generic Phase 4 events.
	We'd like to add significant application events (such as the 352.*
	events for DNS that Claudia mentioned). We're even looking at some
	approach to getting other DECnet events up to the Event Manager via
	some "unknown event" reporting mechanism or something like that. I'm
	not sure how much utility that would have, or how limited our event
	attribute translation would be. We are not that far advanced.

	Other "generic" events (such as from DCM):

	Enterprise-wide event sinking is a good idea. It does fit in well
	with the kind of mission that DECmcc hopes to be filling in the
	management space.

	There's a lot of talk around developing a "generic event sink" for
	MCC that would just hang around waiting for events to get delivered
	to it. Then it would pass them off to the Event Manager to see if
	anyone has subscribed to those events.

	This sink would allow exactly your DCM "disk purge" scenario to send
	an event to MCC (potentially over the network), and by asking
	Notification to watch for that event, we could "light up the map"
	with valuable enterprise information. Good stuff.

	Like I said, there's a lot of talk, and it is part of our "mission".

	When? Well, as always, we're trying to establish our priorities and
	make the very best use of limited resources. Given that, then just as
	soon as possible. Durn. I get as tired of saying that as you get of
	hearing it.

	Anybody got some decent C programmers out there sitting on their
	thumbs? We can think of something for 'em.... ;-)

	* * * * * *

	Not many answers I'm afraid, but I hope this makes some sense out of
	what we've done, why we've done some of it, and what we would like to
	do towards making them work better still.

	-Jim Carey
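The "target entity" rule described in this reply (use the Node4 named as the Remote Node in the event, if there is one) can be written down as a small Python sketch. The Event record and its field names are hypothetical and are not MCC data structures; the sketch assumes that when no remote node is present the event stays targeted at the reporting router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified view of a received DECnet event."""
    reporting_node: str                 # the router whose event we received
    name: str                           # e.g. "node reachability change"
    remote_node: Optional[str] = None   # Remote Node child entity, if any


def target_entity(event: Event) -> str:
    """Pick the entity whose icon should change color for this event."""
    if event.remote_node is not None:
        # The router's remote node maps to a real Node4 in the management system.
        return f"NODE4 {event.remote_node}"
    # Assumed fallback: no remote node named, so target the reporting node.
    return f"NODE4 {event.reporting_node}"


if __name__ == "__main__":
    ev = Event("ROUTER1", "node reachability change", remote_node="NODEX")
    print(target_entity(ev))
```

This captures the mapping only; the open question raised in the reply, a rule syntax that lets a user say this, is not addressed by the sketch.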
994.15	much clearer now.	FLYROC::GOYETTE	PLAY BALL !!!	`Fri May 17 1991 13:01`	71
	re: .14

	Excellent explanation. I now have a much better understanding of
	what MCC can do, will do, and possibly do. Let me just comment on the
	following...

	> From a purist DECnet perspective all we know is that the router whose
	> events we are watching noticed that a node has become unreachable.
	> Of course that could mean a lot of things:
	>
	> -- The unreachable node is down. In this notes file at least,
	>    that is a pretty popular choice.
	> -- the router has lost a circuit.
	> -- a bridge somewhere is down.
	> -- Reachability doesn't imply adjacency. Another router in the
	>    area could have gone down, causing the problem.
	> -- peak traffic swallowed enough periodic hello messages to
	>    make it look like the node is down.
	> -- Stan, the electrician working in the ceiling, just took out his
	>    wire cutters and made a mistake. Okay, if it is Ethernet, he
	>    just took out his chainsaw and made a mistake.

	The reason I brought up this topic in the first place is that I
	believe that since MCC is the network monitoring tool it should be not
	only the logical choice for determining node unreachable events but
	the only choice. Here's why: currently, every event notification tool
	we have detects the node unreachable condition. With the exception of
	MCC, none of them are able to tell why the node is unreachable...

	In my ideal world, I would like to see a "node down" condition and a
	"node unreachable" condition. I would also like to see a tiny bit of
	AI used in differentiating between the two. If MCC determines that
	bridge X is down then it should turn the bridge icon red, but it
	should also turn all the other, for example, node icons a shade of red
	as well and signal a "node unreachable" condition. In doing so you've
	visually, and instantly, conveyed to the user the fact that all the
	nodes are truly unreachable, not down.

	On the flip side, Node Z crashes. MCC sees that it can't communicate
	with node Z, but before it signals a "node unreachable" condition it
	uses some intelligence of its own... is the router functioning, is
	the bridge functioning, is the network very busy, yes, ok I'll try to
	communicate with node Z again... If after making all these checks MCC
	still can't communicate with node Z, MCC should signal a "node down"
	condition. Once signaled, the node Z icon, and only that icon, should
	turn red. Again, visually conveying to the user that this node has a
	problem... In the future, tools like VCS could be used to definitively
	determine "node down" conditions and then pass them off to MCC via the
	"general events" module.

	I can relate this to the way the sample rule works for node
	unreachable. Today, I have a simple domain with about ten node4 icons
	and a router. I've also enabled an unreachable rule for all of them.
	At some point during the day I get alarms stating that node A is
	unreachable.. due to insufficient resources at the router. This is
	perfectly acceptable except the alarm is saying that node A has a
	problem but the iconic map is showing that the router has a problem.
	MCC should change the color of the node icons to pink, stating the
	node has a problem but it was caused by another object having a
	problem. As it does now, the router icon should turn red, but MCC
	should also generate an "out of resources" alarm for the router even
	though I don't have an explicit rule set up for it. MCC already knows
	the router is out of resources because the other alarms told MCC that
	it was.

	Well I've rambled on long enough and I hope someone can make some
	sense out of it. The main point is that I'd just like to see a little
	bit of intelligence included in the process....

	Thanks again,
	Joe
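The distinction proposed in .15 between "node down" and "node unreachable" can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the inputs are plain dictionaries of made-up names, retrying is left out, and the rule is simply "a silent node behind a failed path component is pink, a silent node behind a healthy path is red".

```python
def icon_colors(nodes_responding: dict[str, bool],
                path_functioning: dict[str, bool]) -> dict[str, str]:
    """Color icons following the red/pink scheme suggested in the note.

    nodes_responding: node name -> whether the node answered.
    path_functioning: router/bridge name -> whether it is functioning.
    """
    colors: dict[str, str] = {}

    # Components on the path that have a problem of their own turn red.
    failed = [name for name, ok in path_functioning.items() if not ok]
    for name in failed:
        colors[name] = "red"

    for node, responds in nodes_responding.items():
        if responds:
            continue  # reachable nodes keep their normal color
        # "node unreachable" (pink) if something on the path failed,
        # "node down" (red) if the path looks healthy.
        colors[node] = "pink" if failed else "red"
    return colors


if __name__ == "__main__":
    print(icon_colors({"A": False, "B": False}, {"ROUTER": False}))
    print(icon_colors({"Z": False}, {"ROUTER": True, "BRIDGE": True}))
```

The hard part, as the later replies point out, is knowing which path components actually sit between the management station and each node; the sketch assumes that is already known.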
994.16	numbers easier	JETSAM::WOODCOCK		`Mon May 20 1991 13:36`	7
	> What do other network manager types think about this?

	The use of EVENT numbers rather than a name would be much more
	convenient.

	brad...
994.17	Topology model needed before AI	TOOK::MATTHEWS		`Fri May 24 1991 17:11`	39
	Joe, your note .15 makes a lot of sense. What is lacking today in
	DECmcc is the concept of a topology model for the enterprise which can
	be referred to when atomic events happen to aid the AI process. Having
	AI itself doesn't do a thing until you have the topology model, and
	that is what we are missing.

	In V1.2 we will add the first brick to the foundation of the topology
	model. We will introduce the concept of a circuit, and the circuit
	will have topological structure. The circuit's state will be derived
	from the state of all of its components. Does this solve the problem?
	NO! However, it is one brick in the right direction.

	After V1.2 we will build a topology FM that provides ways to associate
	all the components of an enterprise into a single topology structure.
	It will take multiple release cycles to build the mechanisms. We know
	how to do some of them but lack the resources to do it instantly.
	Others we will have to innovate to be able to solve.

	When we get to the point that we have a topology model that has no
	holes for basic functions such as reachability, then we can put the AI
	routines into the event processing so that the correct cause is
	identified. Until then, the best we can do is provide the user with
	too much data and hope that he is intelligent enough to wade through
	it and deduce the correct cause.

	On the way to the ideal, we will find ways to correlate multiple
	events from components and generate newer higher level events (higher
	level in the topological structure sense) and reduce the amount of raw
	data we throw at the user and hopefully make him more productive in
	determining the root cause.

	As a result of some work we recently did brainstorming circuits, we
	found a way to reduce multiple events to a single more meaningful
	event for one particular kind of circuit. We will not be able to do it
	for V1.2 but we will plan it for a future release.

	We are working on it, but it will not arrive instantly.

	wally
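One way to picture "the circuit's state will be derived from the state of all of its components" is a worst-state rollup, sketched below in Python. The note does not say how V1.2 derives the state, so the severity ordering, the state names, and the component names are all assumptions.

```python
# Assumed severity ordering, least to most severe.
SEVERITY_ORDER = ["clear", "warning", "critical"]


def circuit_state(component_states: dict[str, str]) -> str:
    """Derive a circuit's state as the most severe state among its components."""
    if not component_states:
        return "clear"  # assumption: an empty circuit has nothing wrong
    return max(component_states.values(), key=SEVERITY_ORDER.index)


if __name__ == "__main__":
    circuit = {"ROUTER1": "clear", "BRIDGE7": "critical", "LINE-A": "warning"}
    print(circuit_state(circuit))
```

A rollup like this reduces several component-level events to one higher-level state, which is the kind of correlation the reply describes as a first step toward a full topology model.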