
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

3385.0. "Rule firing, but not correctly" by HANNAH::B_COBB () Mon Jul 20 1992 07:08

    When I create a rule that states:
    
    
    (node4 noon remote node noon state = unreachable, at every 00:05:00)
    
    And the severity is critical...  When the node goes down, instead of
    the rule firing with a RED critical severity, the rule fires with an
    exception of "Node not currently accessible" which has a severity of
    indeterminate.  Shouldn't the rule fire with a critical severity?  After
    all, the node is not reachable anymore.  Did I construct the rule
    correctly?  The node I am testing is not a router, but an end node.
    
    Thanks for any help
    
    Bill

T.R	Title	User	Personal Name	Date	Lines
3385.1	No, it's really an exception...	TOOK::MINTZ	Erik Mintz, dtn 226-5033	`Mon Jul 20 1992 07:29`	19
	When you try to determine the attribute (node4 A remote node B state ...) the DNA4 AM contacts node "A", and requests the information about node "B". In your case, since "A" and "B" are the same, when the node goes down, the AM is unable to read the information, and returns an exception. The alarms FM is then acting on the exception. The alarms FM has no protocol specific information that would allow it to realize that the exception indicates the condition for which you were testing. The long term solution to this problem is for the DNA4 AM to return a synthesized "reachability" attribute, like the "IPreachability" provided by the SNMP AM. In that case, the AM can try to communicate with a node, and then translate the resulting exception into an attribute, since the AM has protocol specific information about what attributes should mean. In the short term, your best bet is to use different values for "A" and "B" (that is, essentially, ask some other node whether "noon" is reachable). -- Erik
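	The synthesized "reachability" attribute described in .1 can be sketched in a few lines. This is a minimal Python illustration, not DECmcc code: `NodeUnreachable`, `read_attribute`, and `reachability` are hypothetical names standing in for an access module's protocol-specific knowledge. Only the failure that the AM knows to mean "node down" is translated into an attribute value; any other failure (for example an invalid password) still surfaces as an exception.

```python
class NodeUnreachable(Exception):
    """Hypothetical: the protocol-level failure that means 'node is down'."""


def reachability(read_attribute, node):
    """Turn a failed poll into an attribute value instead of an exception.

    read_attribute is an assumed callable that polls the node and raises
    NodeUnreachable when the node does not answer.
    """
    try:
        read_attribute(node)
    except NodeUnreachable:
        return "unreachable"   # a value an alarm rule can test directly
    return "reachable"
```

	With an attribute like this, a rule can compare against "unreachable" and fire with whatever severity the user chose, rather than falling into the exception path.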
3385.2		HANNAH::B_COBB		`Mon Jul 20 1992 07:52`	6
	Thanks for the answer. Should the node to be queried be a routing node or can it be any end node? Thanks Bill
3385.3	End nodes only cache?	TOOK::MINTZ	Erik Mintz, dtn 226-5033	`Mon Jul 20 1992 09:12`	4
	I believe that end nodes have a cache of reachability information for those nodes which they have tried to reach recently. -- Erik
3385.4	ask a router	CTHQ1::WOODCOCK		`Mon Jul 20 1992 11:18`	6
	You would want to ask a ROUTER IN THE SAME AREA about the reachability of another node. An end node's routing database only contains an entry for its designated router and therefore won't tell you what you want to find out. best regards, brad...
3385.5	Works, but seems unreliable	HANNAH::B_COBB		`Mon Jul 20 1992 12:00`	16
	Yes, I have tried this with a level 4 routing node. It seems not to be too reliable. I set up a reachability rule for node X and brought node X down. The routing node still showed node X as reachable. I waited and it was still reachable. I had to physically try to set host to node X before the routing node declared it unreachable. I saw the adjacency for node X drop right away on the routing node's console, but node X was still listed as "reachable" until I tried to contact it. This seems a bit hokey. If a node becomes unreachable, I want my rule to fire as soon as possible. Any comments on how to make this a bit quicker? How does everyone else handle NODE4 reachability? Thanks, Bill
3385.6	exceptions or events	CTHQ3::WOODCOCK		`Mon Jul 20 1992 14:26`	34
	Hi Bill,

	Yes, you are correct with your testing. I went through the same scenario when MCC first came out and found that routers can take up to 5 minutes of lag time before a remote node is changed to unreachable. This is an anomaly of DECnet which can't be avoided.

	You can use a couple of methods for determining reachability more quickly. The first is to poll the node directly and let exception handling fire if the poll fails.

	expression=(node4 xxxxxx buffer size<>576)

	The above expression should never fire except when the poll fails. Actually what I use is a dual purpose alarm with an expression of:

	expression=(node4 * circuit * substate <>none)

	This polls all node4's in a domain and ensures all the circuits are up. Also, if any node goes down we'll get an exception and get notified anyway.

	You could also use DECnet events to find out what is going down. You have already seen how quickly the adjacency events are generated. This is due to the fact that these events are from the DATA LINK layer and not the ROUTING layer of DECnet.

	expression=(occurs(node4 xxxx adjacent node * adjacency down)) syntax??

	Be careful with the above expression because once you set up the event sink you will be receiving adjacency events for all nodes adjacent to the router sending the events to you. These events can cause a heavy load on your system, especially if you're using alarms against them and there are a lot of adjacencies. You also need TARGETs set up to highlight the proper node.

	best regards, brad...
3385.7		HANNAH::B_COBB		`Mon Jul 20 1992 15:54`	23
	Thanks for the help. I was polling nodes themselves with:

	(node4 mynode remote node mynode if state = unreachable, at every XX:XX)

	This works nice, but you get an exception instead of a rule fire and you get the indeterminate severity and its color. I wish we could just poll a node and, if it does not answer, fire a rule of our choice. I also do not like the idea that when the rule/exception fires, the routing icon changes color instead of the "problem" icon. But that is another issue discussed elsewhere in the conference.

	I do not think I want to sink because of what you mentioned about the machine getting pounded with events. As it is, my machine slows down with 20 rules running and I have all of these NML links (from the rules, I believe, for some reason) that create logfiles galore. I am still trying to figure out my strategy with MCC, but it looks like it is going to be more difficult than I thought.

	Thanks for the help. Bill
3385.8	Not a pretty solution, but ....	MLNCSC::BARILARO		`Tue Jul 21 1992 06:28`	47
	Hi Bill,

	I had your same problem with Node4 reachability with MCC v1.1 and also v1.2. As other people wrote before, there isn't an easy way to have the correct notification and graphical information. When I started using MCC I wanted something where the icon is green when a node is up and red when it's down.

	The way I use is to ask a node4 directly for something that is always true when the node is up. For example I use syntax like:

	expression = (Node4 xxxx state = ON, at every 00:10)
	Perceived Severity = Warning (green)

	So, every xx minutes you'll receive a WARNING alarm, and when the node is down you receive an exception. With MCC v1.1 it was "quite" fine because the exception was linked to the critical color (RED). With v1.2 (the version you probably have) it's not the same: the icon becomes the indeterminate color (Light Blue I think), so I had to modify the ALARM colors in the CUSTOMIZE window option and I associated the red color with the INDETERMINATE alarm. I agree it isn't a clean or very intelligent solution, but ....

	One big problem that I had with this kind of rule (I use the same logic, asking for something that always happens, also for Stations and Bridges) is that when the node4 is up you receive a WARNING alarm every xx minutes, and when it's down you receive an EXCEPTION every xx minutes. There is nothing to be done if you want to use the NOTIFICATION window: you'll receive an alarm every xx minutes. But if you want to use the mail or broadcast command files, it's possible to modify them to receive only ONE mail (or broadcast) when the node goes down and another when it comes up. You should modify the standard command files to check for the existence of a flag; if the flag exists, then exit and do not send the mail.

	If you want I could send you these modified command files.

	Hope this helps, Ciao Luciano

	P.S. As usual sorry for my english
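	The flag check described in .8 can be illustrated with a short sketch. This is Python rather than the DCL command files actually used, and the flag-file layout, `should_notify`, and `demo` are assumptions made for illustration only. A flag is created on the first "down" report and removed on the first "up" report, so only state changes produce a mail or broadcast.

```python
import tempfile
from pathlib import Path


def should_notify(flag_dir, node, is_down):
    """Return True only when the node's state changed since the last call."""
    flag = Path(flag_dir) / f"{node}.down"   # hypothetical flag-file naming
    if is_down:
        if flag.exists():
            return False      # outage already reported, stay quiet
        flag.touch()
        return True           # first report of the outage
    if flag.exists():
        flag.unlink()
        return True           # node has come back up
    return False              # still up, nothing to report


def demo():
    """Poll results down, down, up, up: notify on the 1st and 3rd only."""
    with tempfile.TemporaryDirectory() as flag_dir:
        return [should_notify(flag_dir, "noon", down)
                for down in (True, True, False, False)]
```

	The rule itself still fires every interval; the sketch only suppresses the repeated mails, which matches the limitation noted above for the NOTIFICATION window.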
3385.9	Needs to be figured out.	SKIBUM::GASSMAN		`Tue Jul 21 1992 07:37`	11
	The concept of entity availability needs to be addressed. There should be an alarm when the availability changes from reachable to unreachable, and other problems such as "network partner exited", "invalid password", etc. should continue to be 'indeterminate'. There should not need to be continuous alarms each time the rule is re-evaluated, as that degrades the importance of each individual alarm. Since most SNMP managers are optimized for this, it's important that the MCC support community figure out a way to simulate this feature in V1.2, and then support it in V1.3. bill
3385.10		HANNAH::B_COBB		`Tue Jul 21 1992 08:48`	4
	I agree with .9, however does MCC engineering feel that the current way is acceptable or are they going to look into a better way? Any comments?
3385.11	One of many things we'd like to fix	TOOK::MINTZ	Erik Mintz, dtn 226-5033	`Tue Jul 21 1992 08:58`	6
	DECmcc engineering recognizes the limitations of the current situation. Of course, there are many things that we feel need improvement. If you feel this should be higher priority than some other improvements, you could provide that information to product management so that our requirements are prioritized correctly.
3385.12		HANNAH::B_COBB		`Tue Jul 21 1992 09:05`	3
	Fair enough.. Thanks for the help and responses. Bill
3385.13	why fire every interval?	CTHQ3::WOODCOCK		`Wed Jul 22 1992 10:22`	37
	Hi Ciao/Bill,

	> The way I use is to ask directly to a node4 something that is
	> always true when one node is up, for example I use syntax like:
	>
	> expression = (Node4 xxxx state = ON, at every 00:10)
	> Perceived Severity = Warning (green)
	>
	> One big problem that I had with this kind of rule (I use the
	> same logic, asking something that always happen, also for
	> Stations and Bridges) it's that when the node4 is up, every
	> xx minutes you receive a WARNING alarm, and when it's down you
	> receive every xx minutes an EXCEPTION. There isn't nothing to do
	> if you want to use the NOTIFICATION window, you'll receive an alarm
	> every xx minutes, but if you want to use mails or broadcast
	> command files, it's possible to modify them to receive only ONE
	> mail (or broadcast) when the node goes down an another when it
	> goes up.

	Why have it FIRE every interval?? If you use:

	(node4 * state=off, at every yy)

	the rule only fires an exception when the node is down. Using this method, set severity INDETERMINATE to RED like you have now and set your DEFAULT ICON color to GREEN. This would keep your notification window clean for the REAL problems.

	On the subject of reachability, every AM should be STRONGLY RECOMMENDED to provide a reachability attribute (whether it's real or simulated). This is a must for managing anything...

	best regards, brad...

	ps. your english is probably better than most :-)
3385.14	Not quite...	TOOK::MCPHERSON	Life is hard. Play short.	`Wed Jul 22 1992 10:52`	16
	>Why have it FIRE every interval?? If you use:
	>
	>(node4 * state=off, at every yy)
	>
	>the rule only fires an exception when the node is down. Using this method set
	>severity INDETERMINATE to RED like you have now and set your DEFAULT ICON color
	>to GREEN. This would keep your notification window clean for the REAL problems.

	Ummm... I don't think so, Brad.

	If the NODE4's state truly is OFF, then the rule won't be able to evaluate (it's using DECnet/NML to get the attribute, remember?) and you'll get an EXCEPTION of severity indeterminate...

	/doug
3385.15	right, what he said	CTHQ3::WOODCOCK		`Wed Jul 22 1992 13:23`	30
	Hi Doug,

	>>>Why have it FIRE every interval?? If you use:
	>>>
	>>>(node4 * state=off, at every yy)
	>>>
	>>>the rule only fires an exception when the node is down. Using this method set
	>>>severity INDETERMINATE to RED like you have now and set your DEFAULT ICON color
	>>>to GREEN. This would keep your notification window clean for the REAL problems.

	> Ummm... I don't think so, Brad.
	>
	> If the NODE4's state truly is OFF, then the rule won't be able to evaluate
	> (it's using DECnet/NML to get the attribute, remember?) and you'll get an
	> EXCEPTION of severity indeterminate...
	>
	> /doug

	Right, that's the idea. This rule will NEVER fire unless the node is unreachable via DECnet, and then it's an exception. Default icon=green (it's up), indeterminate=red (it's down). The theory is to poll any attribute which WON'T fire an alarm unless the node is down. State=off is probably the best example of using this method for simple reachability.

	Confused??? Good!!! :-) So are the customers trying to implement MCC and hence the need for a reachability attribute for EVERY AM!!!

	kind regards, brad...
3385.16	I quite agree, but....	MLNCSC::BARILARO		`Thu Jul 23 1992 06:55`	40
	RE: .13

	Hi Brad,

	I quite agree with you, but there are 2 things that force me to choose this kind of rule. First, most of our customers want to have something graphical that shows them when a node goes down/up, and second they want only ONE message (mail/broadcast and so on) saying that something has happened. I don't know if you have ever used ENOP (a product that generated alarms on reachability, line/circuit use, disk space, etc.); this product did exactly this.

	If I use the kind of rule that you describe, I have (the customer has) the problem that there is no indication when the node comes back up: I have to check the state of the node manually, and once it's up I have to reset the EXCEPTION alarms manually to get the green icon back. Also, for as long as the node remains down I receive a mail/broadcast every xx minutes.

	So the only solution that I have found until now is this one. I agree it's a dirty one, and sometimes also heavy for the system (every xx minutes N batch jobs start, for alarms or exceptions), and I still have the problem with the NOTIFICATION.

	I also completely agree with your sentence

	> On the subject of reachability, every AM should be STRONGLY RECOMMENDED to
	> provide a reachability attribute (whether its real or simulated). This is a
	> must for managing anything...

	I'm hungry to find a clear and intelligent solution.

	Best regards, Ciao Luciano

	P.S.: The word "Ciao" in italian means "Hi" or "Cheers"
3385.17	i see the need now	CTHQ3::WOODCOCK		`Thu Jul 23 1992 08:52`	11
	Hi Luciano,

	I see what you're looking for but I'll have to think on this one for a while. To MCC's credit one can usually get around such things. If I think of anything I'll come back with it.

	>>P.S.: The word "Ciao" in italian means "Hi" or "Cheers"

	As always, excuse my italian :-)

	Ciao brad...
3385.18	On supporting reachability	TOOK::GUERTIN	It fall down, go boom	`Thu Jul 23 1992 09:59`	14
	RE: .9 and last few During MCC V1.0 design/development there was a proposal for a Reachability FM. It was a "generic" FM which determined reachability (perhaps "availability") for entities. The decision was that the Alarms module was the correct place for such functionality. This may require an arbitrarily complex expression, but in theory can be done. So, if people have specific requirements for Alarms FM (the lights are on, but no one is home) to support reachability, other than what has already been stated, then we (engineering) would be happy to listen. No it doesn't go into a black hole, it just gets added to a very long list. -Matt.
3385.19		HANNAH::B_COBB		`Thu Jul 23 1992 11:15`	7
	One other thing is that if you settle for just getting exceptions and not real rule fires, then you miss the rule fire when the entity is up again giving you the "CLEAR" severity. This is useful for when something goes down, you can see if it has returned with a "quick look" at the notification window.
3385.20		MARVIN::COBB	Graham R. Cobb (DECNIS development), REO2-G/G9, 830-3917	`Fri Jul 24 1992 06:48`	23
	.5> I saw the
	.5> adjacency for node X drop right away on the routing node's console,
	.5> but node X was still listed as "reachable" until I tried to contact
	.5> it. This seems a bit hokey. If a node becomes unreachable, I want
	.5> my rule to fire as soon as possible.

	All routing vector routing protocols (including DECnet Phase IV) have this problem (the "counting to infinity" problem). It takes a long time to decide that something is unreachable if it has gone away altogether. By the way, it isn't a feature of "DECnet": RIP (used in TCP/IP) has exactly the same characteristics (but in the TCP/IP world reachability is tested using ping, not by asking routers).

	That is why "link state" routing protocols were invented. DECnet Phase V uses a link state protocol and will notice much faster that the node has gone down.

	.5> Any comments on how to make this a bit quicker?

	Install Phase V routers, running Phase V routing! If you thought that was difficult then try rewriting your rule in Phase V terms!!

	Graham
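	The "counting to infinity" behaviour mentioned in .20 can be shown with a toy model. This is a simplified Python sketch, not an implementation of DECnet Phase IV or RIP: it assumes just two routers that alternately believe each other's advertised cost to a vanished node, and the "infinity" value of 16 is borrowed from RIP purely as an illustrative default. Real protocols add timers and other mechanisms that this sketch leaves out.

```python
def rounds_until_unreachable(infinity=16):
    """Count update exchanges before both routers give up on node X.

    Router A has just lost its direct link to X; router B still believes
    X is 2 hops away (via A). Each round, A adopts B's cost plus one hop
    and B then adopts A's cost plus one hop, capped at 'infinity'.
    """
    cost_a = infinity   # A knows its own link to X is gone
    cost_b = 2          # B's stale route to X through A
    rounds = 0
    while cost_a < infinity or cost_b < infinity:
        cost_a = min(infinity, cost_b + 1)   # A trusts B's stale route
        cost_b = min(infinity, cost_a + 1)   # B trusts A's new route
        rounds += 1
    return rounds
```

	In this model the number of exchanges grows with the value chosen for "infinity", and each exchange waits on a routing update interval, which is why a router can keep reporting a dead node as reachable for minutes.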
3385.21	The problem still needs to be solved	SKIBUM::GASSMAN		`Mon Jul 27 1992 06:54`	22
	The problem statement should be fairly simple - if it's reachable make it green, if it is unreachable, make it red. When the exception path is used, you lose granularity of your alarms. Many polled nodes will give you an exception due to "invalid password", "network partner exited", and such - which are not RED critical problems. A manager that will be used is one that alerts you when it should, and doesn't when things are "indeterminate", but still ok. The simulated availability is probably the best way to accomplish the required availability feature, however since this feature is not on the V1.3 list yet, this note will have to determine which is the best hack. The real requirement (based on competitive products) includes the ability to put certain nodes into "MARKED" mode - ie, remove them from the polling list. This is hard to do when using wild card alarms, yet is useful when you know a node will be down for weeks, and you don't want it to be red. We're talking features that buyers of other management systems have been used to for two years, so the details of what is needed for parity in the market is well known. Since availability status can come from many sources (events, alarms, remote polling devices, other management systems), perhaps a unique Availability FM should be looked at again to solve this. bill
3385.22	hack methodology	CTHQ::WOODCOCK		`Wed Aug 19 1992 14:25`	97
	Hi there,

	I've had a chance to think this one over a bit (what's it been, a month!!). Contained here are a couple of ideas/thoughts about reachability in a future MCC version and also a methodology for a V1.2 'hack'.

	At best reachability for the next version 'must' be addressed. This should not be an issue of when but NEXT. Reachability is the BUSINESS of network management and an easily understood solution is required without exception handling. One approach is to make all AMs supply a reachability attribute for alarming. Another approach is to force all AMs to simply return the same value for unreachability. How about "No response from entity". In this situation I would tend to think ALARMS detection and notification would be possible relatively easily for unreachable and then re-reachable entities with proper correlation and colors. The last option is a reachability AM or FM as Bill has suggested. This is probably the 'best' solution but the most work. Having the ability to mark objects for non-polling is essential. Having a poll exception list would most likely work in this area.

	For those looking for a potential hack for V1.2 read on. I'm not sure if this meets all the needs but only you can answer. It does require DCL work which I have not done but shouldn't be too large a project for those with the time and need.

	I have been unable to get an internal event from MCC indicating a transition of a rule from FIRED -> CLEAR or EXCEPTION -> CLEAR. If anyone has been successful with this I'd like to see how it was done. This transition is key to getting colors back to the CLEAR state. Because I can't get this internal event from MCC an external process is required to determine when the object becomes reachable again.

	The hack would involve using the data collector as the central source of updating the map. Two approaches could be taken: use your current domain structure as you are today, or use a secondary domain for polling. When using the current domains, FILTERS would be required to be set up for the exceptions (if possible) and the .com would send a collector event to update the map. I see a couple of problems with this approach: setting up the filters at each startup (actually not a biggie) and losing other exceptions in the process. There are also advantages to using a separate polling domain, which is what I'd recommend if I were to pursue this.

	Details:

	Create a domain called REACHABILITY and populate it with every entity you'd like to get availability on, and maintain your current domains for 'viewing' purposes. The only downfall to this method is ensuring the REACHABILITY domain accurately reflects what's in the viewed domains.

	Next write an alarm rule for each entity class with wildcards to poll all devices and fire with an exception if an entity is unavailable. The advantage of this is that you now only require one alarm rule for each class to poll all entities (if the system can handle it). You have most likely saved a great deal of resources already with this reduction of alarms.

	When the exception procedure fires have it do the following:

	- Check for a logical called POLL_BRIDGE_xxxxxx (example)
	    - if present, exit (it has already been reported)
	    - if not, send a collector event to a 'viewed' domain updating the entity color to RED, update the log file, and set the logical POLL_BRIDGE_xxxxxx
	- Also check for the presence of an external reachability job in batch and submit it if not present (this job is described next).
	- A collector is required for each 'viewed' domain.
	- A method of mapping this entity to the 'viewed' domain is required. If you have an alarms process which resubmits itself each night then also have it SHOW DOMAIN * MEMBER , TO FILE=DOMAIN_MEMBERS.LIS;. DOMAIN_MEMBERS.LIS; can now be used for searches to determine which 'viewed' domain the entity resides in and hence which collector to send the event to. Also the title of the event should be something like BRIDGE_xxxxxx_REACHABILITY and have the color tell you whether it is up or down.

	External Batch Job:

	- This job runs at some interval equal to or greater than the polling interval of the alarms when something is down.
	- Retrieve all POLL logicals.
	- Create an MCC procedure to get the name attribute of all reported down entities from the list of logicals, execute the procedure and write to a file.
	- Search the file for entities now back up.
	- Determine the domain/collector (actually could also be in the logical name).
	- Send a collector event BRIDGE_xxxxxx_REACHABILITY with severity clear for all entities back up.
	- Update the log file for entities back up and delete the logical.
	- If all entities are now reachable exit, if not resubmit this job.

	There you have it, a method which gives the proper color (icon color = clear color) for both up and down and needs far fewer resources than firing every interval for every entity. You also save on alarm rules. A potential masterpiece :-).

	best regards, brad...
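	The bookkeeping in .22 (report once when an entity goes down, then clear it from a separate recovery pass) can be sketched as follows. This is an illustrative Python model, not the DCL procedures themselves: the `down` set stands in for the POLL_ logicals, and `send_event` and `is_reachable` are hypothetical callables standing in for the data collector event and the MCC poll.

```python
def on_exception(entity, down, send_event):
    """Exception procedure: report an entity as down only once."""
    if entity in down:
        return False                 # logical already set, already reported
    send_event(entity, "critical")   # color the icon red in the viewed domain
    down.add(entity)                 # stands in for defining POLL_<entity>
    return True


def recovery_pass(down, is_reachable, send_event):
    """Batch job: clear entities that answer again.

    Returns True if something is still down, meaning the job
    should resubmit itself.
    """
    for entity in sorted(down):      # sorted() copies, so discarding is safe
        if is_reachable(entity):
            send_event(entity, "clear")
            down.discard(entity)     # stands in for deleting the logical
    return bool(down)
```

	Keeping the "already reported" state outside the alarm rule is what lets a single wildcard rule per entity class drive both the red and the clear transitions.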
3385.23	internal events WORK!	CTHQ3::WOODCOCK		`Sun Sep 06 1992 11:30`	19
	Hi there,

	If anybody is still listening hold the phone. I have gotten the function in the below paragraph working.

	>I have been unable to get an internal event from MCC indicating a transition
	>of a rule from FIRED -> CLEAR or EXCEPTION -> CLEAR. If anyone has been
	>successful with this I'd like to see how it was done. This transition is
	>key to getting colors back to the CLEAR state. Because I can't get this internal
	>from MCC an external process is required to determine when the object becomes
	>reachable again.

	This makes the external job described in -.1 unnecessary. Stay tuned, a new note will most likely follow in a couple of weeks with a home brewed reachability FM for V1.2. It already works but a couple of bells/whistles are needed.

	best regards, brad...
3385.24	What protocols are you using to determine reachability?	CUJO::HILL	Dan Hill-Net.Mgt.-Customer Resident	`Fri Sep 11 1992 11:22`	27
	Hi, Brad, Which protocols (translation: which AMs) are you using to determine reachability? Can you give me an example of an alarm rule you are using to determine DECnet reachability? What about terminal servers, bridges, ip nodes, and the generic Ethernet station? I have a few global alarm rules of my own which I'll publish in a later note. Also, if you are looking for reachability polling and a reduction in resource consumption, you can modify the generic command procedures to fire alarm rules such that no logging is done. This also means no batch processing overhead. Reachability determination is the PRIMARY reason my customer is using DECmcc. They are tolerating its current deficiencies with the expectation that the product will improve after V1.2. I've heard good news from some in the development groups that there is a dedicated effort to address the issue of reachability. This is encouraging, and I hope it continues with TOP PRIORITY. I'd be interested in testing your procedures. Let me know if I can help. -Dan
3385.25	protocol independent!	CTHQ::WOODCOCK		`Mon Sep 14 1992 09:25`	12
	Hi Dan, Unfortunately the procedures were left behind at the customer site. I am waiting for a tape to be made and put on EASYnet so I can pull it up and make a couple of enhancements. I'm hoping I'll get them this week, if not, I may rewrite them. Protocol didn't matter with the technique once set up!!! When I left they were monitoring STATIONS, NODE4s, SNMPs and the addition of anything else is NO PROBLEM (I think)!! As soon as I've got something I'll let ya know (it could still be a week or two). best regards, brad...
3385.26	Some global alarm rule expressions for reachability	CUJO::HILL	Dan Hill-Net.Mgt.-Customer Resident	`Wed Sep 23 1992 17:29`	22
	I have been testing on a VAXstation 4000 Model 90 with 80MB memory. What a sweet system. 45 VUPs, 33 SPECmarks. I have been testing reachability of SNMP and BRIDGE entities mostly, several hundred of them (total). The performance was great. Expressions: (SNMP * ipReachability = DOWN, AT EVERY = 00:05:00) (BRIDGE * operation state = DOWN, AT EVERY = 00:05:00) I have been trying to determine the strain that polling imposes on the node. Not much for this Model 90 and 300-400 entities. Once again, please let me state the importance of reachability. My customers are beating me up on this issue. This should be something that DECmcc does by default, with NO HACKING or other chicken-rigged setups involved. This should not be "A" top priority, it should be "THE" top priority. Regards, Dan
3385.27	BRIDGE alarm rule expression	CUJO::HILL	Dan Hill-Net.Mgt.-Customer Resident	`Thu Sep 24 1992 11:14`	16
	I don't know why I have trouble remembering the syntax of this BRIDGE alarm rule expression. I've used it so many times, but I've botched it twice in this notes file. Guess my credibility is completely shot. At any rate, here is the REAL expression for bridge reachability: (BRIDGE * DEVICE STATE <> OPERATING, AT EVERY 00:05:00) This works like a champ, except that LTM bridges don't respond properly. What you can do to help eliminate this: Filter the notification of the BRIDGE entities (EXCLUDE them) in the notification window FILTERS section. I haven't been able to stop the icons on the map from changing colors, though. -dan
3385.28	Why not CHANGE_OF instead?	CHRISB::BRIENEN	DECmcc LAN and SNMP Stuff...	`Tue Sep 29 1992 13:42`	5
	Wouldn't the expression: CHANGE_OF( snmp * ipReachability, ,), every 00:05:00 ...work better?
3385.29	color doesn't represent status	CTHQ::WOODCOCK		`Tue Sep 29 1992 13:53`	10
	>Wouldn't the expression:
	>
	> CHANGE_OF( snmp * ipReachability, ,), every 00:05:00
	>
	>...work better?

	2cents - Change_of does not meet the requirement of color status for IS IT UP or IS IT DOWN. Change_of will always give the same color regardless of reachability status (color must be cleared manually).
3385.30	V1.2 won't let you	TOOK::R_SPENCE	Nets don't fail me now...	`Mon Oct 05 1992 14:08`	4
	And besides, CHANGE_OF is NOT supported for wildcard rules in V1.2 s/rob
3385.31		CUJO::HILL	Dan Hill-Net.Mgt.-Customer Resident	`Wed Oct 07 1992 14:13`	9
	I sincerely hope that CHANGE_OF wild carding will be supported in the next release. I simply don't have the resources to enable alarm rules for 200+ nodes. The more global alarm rules you can support, the better DECmcc will be as a network monitoring and troubleshooting tool. Thanks, Dan
3385.32	Note 3894 has potential	CTHQ::WOODCOCK		`Tue Oct 13 1992 13:12`	50
	Hello,

	Note 3894 introduces some procedures which were the best solution I could come up with for V1.2 for reachability and may help with device monitoring today. Better late than never I guess, is it V1.3 yet???

	Going beyond, this has been an interesting exercise in learning what's NEEDED for users in determining reachability and also other functions. To make life easier for the MCC user in the future it may be appropriate to modify our outlook of DOMAINS. The end solution was to create a FUNCTIONAL domain which more closely accommodates a user's tasks, then use a couple of features to marry the functional domain back into how the user VIEWs domains. This may not always apply of course but here is an exercise. There are 6 MCC domains shown below but how many user functional domains are there??

	  A       B
	 / \     / \
	C   D   E   F

	In most cases there are 2 functional domains. Alarms is a clear example. The user wants to monitor all specific devices in each of the 2 hierarchies as a single function for each, one alarm for each with some exceptions (enter no-poll mark). The same may likely hold true for other functions, historical data and metrics, etc. The lesson here is that the arbitrary collection of devices for VIEWing purposes often doesn't meet functional needs. Maybe EXPAND=TRUE needs EXPANDing :-).

	re: -.1

	> I sincerely hope that CHANGE_OF wild carding will be supported in the
	> next release. I simply don't have the resources to enable alarm rules
	> for 200+ nodes.

	> The more global alarm rules you can support, the better DECmcc will be
	> as a network monitoring and troubleshooting tool.

	While it is likely that CHANGE_OF might be coming in the future it still does not solve this problem of REACHABILITY. Why? Because it must compare attribute values from two different polls, and if it can't get the value there will still be an EXCEPTION. Once again, if all AMs provide a reachability attribute then this becomes viable. But...only if you can set a specific severity for a given value of the attribute, otherwise it's RED when it goes down and RED when it comes up. What is truly needed is a change_of type function which doesn't burden the user with things like mail, but still gets the color right for when the device is UP or DOWN.

	MCC_REACH might fill the bill in the short term but this should be brought out functionally within MCC itself; dcl and imagination only go so far.

	best regards, brad...