T.R | Title | User | Personal Name | Date | Lines |
---|
3385.1 | No, it's really an exception... | TOOK::MINTZ | Erik Mintz, dtn 226-5033 | Mon Jul 20 1992 08:29 | 19 |
| When you try to determine the attribute (node4 A remote node B state ...)
the DNA4 AM contacts node "A", and requests the information about
node "B". In your case, since "A" and "B" are the same, when the node
goes down, the AM is unable to read the information, and returns an exception.
The alarms FM is then acting on the exception. The alarms FM has no
protocol specific information that would allow it to realize that
the exception indicates the condition for which you were testing.
The long term solution to this problem is for the DNA4 AM to return
a synthesized "reachability" attribute, like the "IPreachability" provided
by the SNMP AM. In that case, the AM can try to communicate with a node,
and then translate the resulting exception into an attribute, since
the AM has protocol specific information about what attributes should mean.
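
A minimal sketch of that "synthesized attribute" idea, in Python rather than anything DECmcc-specific. The names `PollError`, `reachability`, `answering_node` and `dead_node` are hypothetical and only stand in for whatever the AM does internally; the point is just that the layer which knows the protocol turns a failed poll into an ordinary attribute value.

```python
class PollError(Exception):
    """Hypothetical stand-in for whatever a failed poll raises."""


def reachability(poll):
    """Return 'reachable' or 'unreachable' instead of propagating a poll failure.

    `poll` is any zero-argument callable that talks to the node and raises
    PollError when the node does not answer.
    """
    try:
        poll()
    except PollError:
        return "unreachable"
    return "reachable"


def answering_node():
    # Pretend the node answered with some attribute value.
    return {"state": "on"}


def dead_node():
    # Pretend the node never answered.
    raise PollError("no response from entity")


print(reachability(answering_node))  # reachable
print(reachability(dead_node))       # unreachable
```

A rule can then test the returned value like any other attribute, so no exception ever reaches the alarms layer.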
In the short term, your best bet is to use different values for "A" and "B"
(that is, essentially, ask some other node whether "noon" is reachable).
-- Erik
|
3385.2 | | HANNAH::B_COBB | | Mon Jul 20 1992 08:52 | 6 |
| Thanks for the answer. Should the node to be queried be a routing
node or can it be any end node?
Thanks
Bill
|
3385.3 | End nodes only cache? | TOOK::MINTZ | Erik Mintz, dtn 226-5033 | Mon Jul 20 1992 10:12 | 4 |
| I believe that end nodes have a cache of reachability information for those
nodes which they have tried to reach recently.
-- Erik
|
3385.4 | ask a router | CTHQ1::WOODCOCK | | Mon Jul 20 1992 12:18 | 6 |
| You would want to ask a ROUTER IN THE SAME AREA about the reachability of
another node. An end node's routing database only contains an entry for its
designated router and therefore won't tell you what you want to find out.
best regards,
brad...
|
3385.5 | Works, but seems unreliable | HANNAH::B_COBB | | Mon Jul 20 1992 13:00 | 16 |
| Yes, I have tried this with a level 4 routing node. It does not seem to be
too reliable. I set up a reachability rule for node X and brought node
X down. The routing node still showed node X as reachable. I waited
and it still was reachable. I had to physically try to set host to
node X before the routing node declared it unreachable. I saw the
adjacency for node X drop right away on the routing node's console,
but node X was still listed as "reachable" until I tried to contact
it. This seems a bit hokey. If a node becomes unreachable, I want
my rule to fire as soon as possible.
Any comments on how to make this a bit quicker? How does everyone else
handle NODE4 reachability?
Thanks,
Bill
|
3385.6 | exceptions or events | CTHQ3::WOODCOCK | | Mon Jul 20 1992 15:26 | 34 |
| Hi Bill,
Yes, you are correct with your testing. I went through the same scenario when
MCC first came out and found that routers can take up to 5 minutes lag time
before a remote node is changed to unreachable. This is an anomaly of DECnet
which can't be avoided. You can use a couple of methods for determining
reachability more quickly. The first is to poll the node directly and let
exception handling fire if the poll fails.
expression=(node4 xxxxxx buffer size<>576)
The above expression should never fire except when the poll fails. Actually
what I use is a dual purpose alarm with an expression of:
expression=(node4 * circuit * substate <>none)
This polls all node4's in a domain and ensures all the circuits are up. Also,
if any node goes down we'll get an exception and get notified anyway.
You could also use DECnet events to find out what is going down. You have
already seen how the adjacency events are very quick to be generated. This is
due to the fact that these events are from the DATA LINK layer and not the
ROUTING layer of DECnet.
expression=(occurs(node4 xxxx adjacent node * adjacency down)) syntax??
Be careful with the above expression because once you set up the event sink
you will be receiving adjacency events for all nodes adjacent to the router
sending the events to you. These events can cause a heavy load on your system
especially if you're using alarms against them and there are a lot of adjacencies.
You also need TARGETs set up to highlight the proper node.
best regards,
brad...
|
3385.7 | | HANNAH::B_COBB | | Mon Jul 20 1992 16:54 | 23 |
| Thanks for the help. I was polling nodes themselves with:
(node4 mynode remote node mynode if state = unreachable, at every XX:XX)
This works nicely, but you get an exception instead of a rule fire, and
you get the indeterminate severity and its color. I wish we could just
poll a node and if it does not answer, then fire a rule of your choice.
I also do not like the idea that when the rule/exception fires, the
routing icon changes color instead of the "problem" icon. But that
is another issue discussed elsewhere in the conference. I do not
think I want to sink because of what you mentioned about the machine
getting pounded with events. As it is my machine slows down with 20
rules running and I have all of these NML links (from the rules I
believe for some reason) that create logfiles galore. I am still
trying to figure out my strategy with MCC, but it looks like it is
going to be more difficult than I thought.
Thanks for the help.
Bill
|
3385.8 | Not a pretty solution, but .... | MLNCSC::BARILARO | | Tue Jul 21 1992 07:28 | 47 |
| Hi Bill,
I had the same problem with Node4 reachability with MCC v1.1 and
also v1.2. As other people wrote before, there isn't an easy way to
get both the correct notification and the correct graphical information.
When I started using MCC I wanted something where the icon is green
when a node is up and red when it's down.
What I do is ask a node4 directly for something that is
always true when the node is up. For example, I use syntax like:
expression = (Node4 xxxx state = ON, at every 00:10)
Perceived Severity = Warning (green)
So, every xx minutes you'll receive a WARNING alarm, and when the
node is down you receive an exception. With
MCC v1.1 this was "quite" fine because the exception was
linked to the critical color (RED). With v1.2 (the version
you probably have) it's not the same: the icon takes the
indeterminate color (Light Blue, I think), so I had to
change the ALARM colors in the CUSTOMIZE window options and
associate the red color with the INDETERMINATE alarm.
I agree it isn't a clean or very intelligent solution, but ....
One big problem that I had with this kind of rule (I use the
same logic, asking for something that is always true, also for
Stations and Bridges) is that when the node4 is up you receive
a WARNING alarm every xx minutes, and when it's down you
receive an EXCEPTION every xx minutes. There is nothing you can do about this
if you want to use the NOTIFICATION window: you'll receive an alarm
every xx minutes. But if you want to use mail or broadcast
command files, it's possible to modify them so that you receive only ONE
mail (or broadcast) when the node goes down and another when it
comes back up.
You have to modify the standard command files to check for the existence
of a flag: if the flag exists, exit without sending the mail.
If you want I could send you these modified command files.
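
The flag logic can be sketched roughly as follows. This is Python, not the actual modified command files, and `make_notifier`, `handle` and `send` are hypothetical names; a set plays the role of the per-node flag.

```python
def make_notifier(send):
    """Return handle(node, is_up) that calls send() only on a state change."""
    down_flags = set()  # plays the role of the per-node flag

    def handle(node, is_up):
        if not is_up and node not in down_flags:
            down_flags.add(node)       # first time we see it down: set flag
            send(f"{node} is DOWN")
        elif is_up and node in down_flags:
            down_flags.discard(node)   # it came back: clear flag
            send(f"{node} is UP")
        # otherwise the state is unchanged, so stay quiet

    return handle


sent = []
handle = make_notifier(sent.append)
# One result per polling interval: up, down three times, then up twice.
for is_up in [True, False, False, False, True, True]:
    handle("NODE_X", is_up)
print(sent)  # ['NODE_X is DOWN', 'NODE_X is UP']
```

Six rule evaluations produce only two messages, which is the behaviour the customers are asking for.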
Hope this helps,
Ciao Luciano
P.S. As usual sorry for my english
|
3385.9 | Needs to be figured out. | SKIBUM::GASSMAN | | Tue Jul 21 1992 08:37 | 11 |
| The concept of entity availability needs to be addressed. There should
be an alarm when the availability changes from reachable to
unreachable, and other problems such as "network partner exited",
"invalid password", etc should continue to be 'indeterminate'. There
should be no need for continuous alarms each time the rule is
re-evaluated, as that degrades the importance of each individual alarm.
Since most SNMP managers are optimized for this - it's important that
the MCC support community figure out a way to simulate this feature in
V1.2, and then support it in V1.3.
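
As a rough illustration of the behaviour being asked for (a sketch in Python under my own assumptions, not how the Alarms FM works): unreachable maps to a critical alarm, other poll failures stay indeterminate, and a new alarm is raised only when the severity changes. The reason strings and the names `UNREACHABLE_REASONS`, `severity` and `alarms` are hypothetical.

```python
# Hypothetical set of failure reasons that really mean "unreachable".
UNREACHABLE_REASONS = {"no response", "node unreachable"}


def severity(reason):
    """Map the outcome of one poll to a severity."""
    if reason is None:
        return "clear"            # poll succeeded
    if reason in UNREACHABLE_REASONS:
        return "critical"         # availability problem
    return "indeterminate"        # e.g. invalid password, partner exited


def alarms(reasons):
    """Yield a severity only when it differs from the previous evaluation."""
    previous = "clear"
    for reason in reasons:
        current = severity(reason)
        if current != previous:
            yield current
        previous = current


polls = [None, "no response", "no response", None,
         "invalid password", "invalid password"]
print(list(alarms(polls)))  # ['critical', 'clear', 'indeterminate']
```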
bill
|
3385.10 | | HANNAH::B_COBB | | Tue Jul 21 1992 09:48 | 4 |
| I agree with .9. However, does MCC engineering feel that the current
way is acceptable or are they going to look into a better way?
Any comments?
|
3385.11 | One of many things we'd like to fix | TOOK::MINTZ | Erik Mintz, dtn 226-5033 | Tue Jul 21 1992 09:58 | 6 |
| DECmcc engineering recognizes the limitations of the current situation.
Of course, there are many things that we feel need improvement.
If you feel this should be higher priority than some other improvements,
you could provide that information to product management so that our
requirements are prioritized correctly.
|
3385.12 | | HANNAH::B_COBB | | Tue Jul 21 1992 10:05 | 3 |
| Fair enough.. Thanks for the help and responses.
Bill
|
3385.13 | why fire every interval? | CTHQ3::WOODCOCK | | Wed Jul 22 1992 11:22 | 37 |
|
Hi Ciao/Bill,
> The way I use is to ask directly to a node4 something that is
> always true when one node is up, for example I use syntax like:
>
> expression = (Node4 xxxx state = ON, at every 00:10)
> Perceived Severity = Warning (green)
>
> One big problem that I had with this kind of rule (I use the
> same logic, asking for something that is always true, also for
> Stations and Bridges) is that when the node4 is up you receive
> a WARNING alarm every xx minutes, and when it's down you
> receive an EXCEPTION every xx minutes. There is nothing you can do about this
> if you want to use the NOTIFICATION window: you'll receive an alarm
> every xx minutes. But if you want to use mail or broadcast
> command files, it's possible to modify them so that you receive only ONE
> mail (or broadcast) when the node goes down and another when it
> comes back up.
Why have it FIRE every interval?? If you use:
(node4 * state=off, at every yy)
the rule only fires an exception when the node is down. Using this method set
severity INDETERMINATE to RED like you have now and set your DEFAULT ICON color
to GREEN. This would keep your notification window clean for the REAL problems.
On the subject of reachability, every AM should be STRONGLY RECOMMENDED to
provide a reachability attribute (whether it's real or simulated). This is a
must for managing anything...
best regards,
brad...
ps. your english is probably better than most :-)
|
3385.14 | Not quite... | TOOK::MCPHERSON | Life is hard. Play short. | Wed Jul 22 1992 11:52 | 16 |
| >Why have it FIRE every interval?? If you use:
>
>(node4 * state=off, at every yy)
>
>the rule only fires an exception when the node is down. Using this method set
>severity INDETERMINATE to RED like you have now and set your DEFAULT ICON color
>to GREEN. This would keep your notification window clean for the REAL problems.
Ummm... I don't think so, Brad.
If the NODE4's state truly is OFF, then the rule won't be able to evaluate
(it's using DECnet/NML to get the attribute, remember?) and you'll get an
EXCEPTION of severity indeterminate...
/doug
|
3385.15 | right, what he said | CTHQ3::WOODCOCK | | Wed Jul 22 1992 14:23 | 30 |
| Hi Doug,
>>>Why have it FIRE every interval?? If you use:
>>>
>>>(node4 * state=off, at every yy)
>>>
>>>the rule only fires an exception when the node is down. Using this method set
>>>severity INDETERMINATE to RED like you have now and set your DEFAULT ICON color
>>>to GREEN. This would keep your notification window clean for the REAL problems.
> Ummm... I don't think so, Brad.
>
> If the NODE4's state truly is OFF, then the rule won't be able to evaluate
> (it's using DECnet/NML to get the attribute, remember?) and you'll get an
> EXCEPTION of severity indeterminate...
>
> /doug
Right, that's the idea. This rule will NEVER fire unless the node is
unreachable via DECnet, and then it's an exception. Default icon=green (it's
up), indeterminate=red (it's down). The theory is to poll any attribute which
WON'T fire an alarm unless the node is down. State=off is probably the best
example of using this method for simple reachability.
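
To spell out the three possible outcomes of such a rule, here is a small Python sketch. It only models the logic described above; `PollError`, `evaluate` and `state_is_off` are hypothetical names and nothing here is MCC code.

```python
class PollError(Exception):
    """Hypothetical: raised when the node cannot be reached at all."""


def evaluate(rule, poll):
    """Return 'FIRED', 'CLEAR' or 'EXCEPTION' for one rule evaluation."""
    try:
        value = poll()
    except PollError:
        return "EXCEPTION"      # the attribute could not be read
    return "FIRED" if rule(value) else "CLEAR"


def state_is_off(state):
    return state == "off"


def node_up():
    return "on"                 # a node that answers reports state ON


def node_down():
    raise PollError("unreachable")  # a node that is off cannot answer


# Color mapping as described: default icon green, indeterminate set to red.
colors = {"CLEAR": "green", "EXCEPTION": "red"}
print(colors[evaluate(state_is_off, node_up)])    # green
print(colors[evaluate(state_is_off, node_down)])  # red
```

Because a node that really is off never answers, the "FIRED" branch is not expected to be taken in practice, which is exactly why the rule stays quiet until the exception appears.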
Confused??? Good!!! :-) So are the customers trying to implement MCC and hence
the need for a reachability attribute for EVERY AM!!!
kind regards,
brad...
|
3385.16 | I quite agree, but.... | MLNCSC::BARILARO | | Thu Jul 23 1992 07:55 | 40 |
| RE: .13
Hi Brad,
I quite agree with you, but there are 2 things that force me to
choose this kind of rule.
First, most of our customers want something graphical that
shows them when a node goes down/up, and second, they want only ONE
message (mail, broadcast and so on) saying that something has happened.
I don't know if you ever used ENOP (a product that generated
alarms on reachability, line/circuit usage, disk space, etc.);
this product did exactly this.
If I use the kind of rule that you describe, I (and the customer)
have the problem that there is no indication when the node comes back up. I
have to check the state of the node manually, and once it's up I
have to reset the EXCEPTION alarms manually to get the green icon back.
Also, for as long as the node remains down I receive a mail/broadcast
every xx minutes.
So the only solution that I have found so far is this one. I
agree it's a dirty one, and sometimes also heavy for the system
(every xx minutes it starts N batch jobs for alarms or exceptions),
and I still have the problem with the NOTIFICATION window.
I also completely agree with your sentence
> On the subject of reachability, every AM should be STRONGLY RECOMMENDED to
> provide a reachability attribute (whether it's real or simulated). This is a
> must for managing anything...
I'm hungry to find a clear and intelligent solution.
Best regards,
Ciao Luciano
P.S.: The word "Ciao" in italian means "Hi" or "Cheers"
|
3385.17 | i see the need now | CTHQ3::WOODCOCK | | Thu Jul 23 1992 09:52 | 11 |
| Hi Luciano,
I see what you're looking for but I'll have to think on this one for awhile.
To MCCs credit one can usually get around such things. If I think of anything
I'll come back with it.
>>P.S.: The word "Ciao" in italian means "Hi" or "Cheers"
As always, excuse my italian :-)
Ciao brad...
|
3385.18 | On supporting reachability | TOOK::GUERTIN | It fall down, go boom | Thu Jul 23 1992 10:59 | 14 |
| RE: .9 and last few
During MCC V1.0 design/development there was a proposal for a
Reachability FM. It was a "generic" FM which determined reachability
(perhaps "availability") for entities. The decision was that the
Alarms module was the correct place for such functionality. This may
require an arbitrarily complex expression, but in theory can be done.
So, if people have specific requirements for Alarms FM (the lights are
on, but no one is home) to support reachability, other than what has
already been stated, then we (engineering) would be happy to listen.
No it doesn't go into a black hole, it just gets added to a very long
list.
-Matt.
|
3385.19 | | HANNAH::B_COBB | | Thu Jul 23 1992 12:15 | 7 |
| One other thing is that if you settle for just getting exceptions and
not real rule fires, then you miss the rule fire when the entity is
up again giving you the "CLEAR" severity. This is useful for when
something goes down, you can see if it has returned with a "quick
look" at the notification window.
|
3385.20 | | MARVIN::COBB | Graham R. Cobb (DECNIS development), REO2-G/G9, 830-3917 | Fri Jul 24 1992 07:48 | 23 |
| .5> I saw the
.5> adjacency for node X drop right away on the routing node's console,
.5> but node X was still listed as "reachable" until I tried to contact
.5> it. This seems a bit hokey. If a node becomes unreachable, I want
.5> my rule to fire as soon as possible.
All distance vector routing protocols (including DECnet Phase IV) have this
problem (the "counting to infinity" problem). It takes a long time to
decide that something is unreachable if it has gone away altogether. By the
way, it isn't a feature of "DECnet": RIP (used in TCP/IP) has exactly the
same characteristics (but in the TCP/IP world reachability is tested using
ping, not by asking routers).
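
The counting-to-infinity behaviour can be shown with a toy simulation. This is a sketch in Python under simplifying assumptions: two routers A and B, destination X originally attached to A, no split horizon, and RIP's metric of 16 used as "infinity" (other protocols use other limits). The function name `count_to_infinity` is made up for the example.

```python
INFINITY = 16  # RIP's "unreachable" metric; other protocols use other limits


def count_to_infinity(cost_a=1, cost_b=2):
    """Simulate update rounds after A loses its direct link to X.

    Before the failure A reaches X at cost 1 and B reaches X via A at cost 2.
    Returns a list of (cost_a, cost_b) after each exchange of updates.
    """
    history = []
    # A's direct route is gone, so A can only learn X from B, and B from A.
    while cost_a < INFINITY or cost_b < INFINITY:
        cost_a = min(cost_b + 1, INFINITY)  # A believes B's stale route
        cost_b = min(cost_a + 1, INFINITY)  # B follows A's new, worse cost
        history.append((cost_a, cost_b))
    return history


rounds = count_to_infinity()
print(rounds[0])    # (3, 4)
print(rounds[-1])   # (16, 16)
print(len(rounds))  # 8
```

Each round costs one update interval, so the node keeps looking "reachable" for several intervals after it has actually gone away.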
That is why "link state" routing protocols were invented. DECnet Phase V
uses a link state protocol and will notice much faster that the node has
gone down.
.5> Any comments on how to make this a bit quicker?
Install Phase V routers, running Phase V routing! If you thought that was
difficult then try rewriting your rule in Phase V terms!!
Graham
|
3385.21 | The problem still needs to be solved | SKIBUM::GASSMAN | | Mon Jul 27 1992 07:54 | 22 |
| The problem statement should be fairly simple - if it's reachable make
it green, if it is unreachable, make it red. When the exception path
is used, you lose granularity of your alarms. Many polled nodes will
give you an exception due to "invalid password", "network partner
exited", and such - which are not RED critical problems. A manager
that will be used is one that alerts you when it should, and doesn't
when things are "indeterminate", but still ok. The simulated availability
is probably the best way to accomplish the required availability feature,
however since this feature is not on the V1.3 list yet, this note will
have to determine which is the best hack. The real requirement
(based on competitive products) includes the ability to put certain nodes
into "MARKED" mode - ie, remove them from the polling list. This is
hard to do when using wild card alarms, yet is useful when you know a
node will be down for weeks, and you don't want it to be red. We're
talking features that buyers of other management systems have been used to
for two years, so the details of what is needed for parity in the market
is well known. Since availability status can come from many sources
(events, alarms, remote polling devices, other management systems),
perhaps a unique Availability FM should be looked at again to solve
this.
bill
|
3385.22 | hack methodology | CTHQ::WOODCOCK | | Wed Aug 19 1992 15:25 | 97 |
| Hi there,
I've had a chance to think this one over a bit (what's it been a month!!).
Contained here are a couple of ideas/thoughts about reachability of a future
mcc version and also a methodology for a v1.2 'hack'.
At best reachability for the next version 'must' be addressed. This should
not be an issue of when but NEXT. Reachability is the BUSINESS of network
management and an easily understood solution is required without exception
handling.
One approach is to make all AMs supply a reachability attribute for alarming.
Another approach is to force all AMs to simply return the same value for
unreachability. How about "No response from entity". In this situation I would
tend to think ALARMS detection and notification would be possible relatively
easily for unreachable and then re-reachable entities with proper correlation
and colors.
The last option is a reachability AM or FM as Bill has suggested. This is
probably the 'best' solution but also the most work. Having the ability to mark
objects for non-polling is essential. Having a poll exception list would
most likely work in this area.
For those looking for a potential hack for V1.2 read on. I'm not sure if
this meets all the needs but only you can answer. It does require DCL work
which I have not done but shouldn't be too large a project for those with
the time and need.
I have been unable to get an internal event from MCC indicating a transition
of a rule from FIRED -> CLEAR or EXCEPTION -> CLEAR. If anyone has been
successful with this I'd like to see how it was done. This transition is
key to getting colors back to the CLEAR state. Because I can't get this internal
from MCC an external process is required to determine when the object becomes
reachable again.
The hack would involve using the data collector as the central source of
updating the map. Two approaches could be taken: use your current domain
structure as it is today, or use a secondary domain for polling. When using
the current domains, FILTERS would have to be set up for the exceptions
(if possible) and the .com would send a collector event to update the map. I
see a couple of problems with this approach: setting up the filters at each startup
(actually not a biggie) and losing other exceptions in the process. There are
also advantages to using a separate polling domain, which is what I'd recommend
if I were to pursue this.
Details:
Create a domain called REACHABILITY and populate it with every entity you'd
like to get availability on, and maintain your current domains for 'viewing'
purposes. The only downside to this method is ensuring the REACHABILITY domain
accurately reflects what's in the viewed domains. Next write an alarm rule
for each entity class with wildcards to poll all devices and fire with exception
if entity is unavailable. The advantage of this is that you now only require
one alarm rule for each class to poll all entities (if the system can handle
it). You have most likely saved a great deal of resources already with this
reduction of alarms. When the exception procedure fires have it do the
following:
- Check for a logical called POLL_BRIDGE_xxxxxx (example)
- if present exit (it has already been reported)
- if not send a collector event to a 'viewed' domain updating
the entity color RED, update log file, and set the logical
POLL_BRIDGE_xxxxxx
- Also check for the presence of an external reachability job
in batch and submit if not present (this job to be described next).
- A collector is required for each 'viewed' domain.
- A method of mapping this entity to the 'viewed' domain is required.
If you have an alarms process which resubmits itself each night then
also have it SHOW DOMAIN * MEMBER *, TO FILE=DOMAIN_MEMBERS.LIS;.
DOMAIN_MEMBERS.LIS; can now be used for searches to determine what
'viewed' domain the entity resides and hence which collector to send
the event to. Also the title of the event should be something like
BRIDGE_xxxxxx_REACHABILITY and have the color tell you whether it
is up or down.
External Batch Job:
- This job runs at some interval equal to or greater than the polling
interval of the alarms when something is down.
- Retrieve all POLL* logicals
- Create MCC procedure to get name attribute of all reported down
entities from the list of logicals, execute procedure and write to a
file.
- Search file for entities now back up.
- Determine domain/collector (actually could also be in logical name)
- Send collector event BRIDGE_xxxxxx_REACHABILITY severity clear
to all entities back up.
- Update log file for entities back up and delete logical.
- If all entities now reachable exit, if not resubmit this job.
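
The two procedures above can be sketched like this. It is Python standing in for the DCL that would really be needed, with a set in place of the POLL_* logicals, a dictionary in place of DOMAIN_MEMBERS.LIS, and a list in place of the collector. All names (`on_exception`, `recheck`, `members`, `is_up`) and the "critical" severity label for RED are assumptions made for illustration.

```python
def on_exception(entity, logicals, members, events):
    """Exception procedure: report an entity as down only once."""
    flag = f"POLL_{entity}"
    if flag in logicals:
        return                      # already reported, nothing to do
    domain = members[entity]        # which 'viewed' domain shows this entity
    events.append((domain, f"{entity}_REACHABILITY", "critical"))
    logicals.add(flag)


def recheck(logicals, members, events, is_up):
    """Batch job: clear every flagged entity that answers again.

    Returns True if something is still down and the job should resubmit.
    """
    for flag in sorted(logicals):   # sorted() copies, so removal is safe
        entity = flag[len("POLL_"):]
        if is_up(entity):
            events.append((members[entity], f"{entity}_REACHABILITY", "clear"))
            logicals.discard(flag)
    return bool(logicals)


members = {"BRIDGE_A": "DOMAIN_1", "BRIDGE_B": "DOMAIN_2"}
logicals, events = set(), []
on_exception("BRIDGE_A", logicals, members, events)
on_exception("BRIDGE_A", logicals, members, events)  # ignored, flag is set
on_exception("BRIDGE_B", logicals, members, events)
resubmit = recheck(logicals, members, events, is_up=lambda e: e == "BRIDGE_A")
print(events)
print(resubmit)  # True, BRIDGE_B is still down
```

Only state changes reach the collector, so the map gets one red event per failure and one clear event per recovery.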
There you have it, a method which gives proper color (icon color = clear color)
for both up and down and needs far less resources than firing every interval
for every entity. You also save on alarm rules. A potential masterpiece :-).
best regards,
brad...
|
3385.23 | internal events WORK! | CTHQ3::WOODCOCK | | Sun Sep 06 1992 12:30 | 19 |
| Hi there,
If anybody is still listening hold the phone. I have gotten the function
in the below paragraph working.
>I have been unable to get an internal event from MCC indicating a transition
>of a rule from FIRED -> CLEAR or EXCEPTION -> CLEAR. If anyone has been
>successful with this I'd like to see how it was done. This transition is
>key to getting colors back to the CLEAR state. Because I can't get this internal
>from MCC an external process is required to determine when the object becomes
>reachable again.
This removes the need for the external job described in -.1.
Stay tuned: a new note will most likely follow in a couple of weeks with a
home-brewed reachability FM for V1.2. It already works but a couple of
bells/whistles are needed.
best regards,
brad...
|
3385.24 | What protocols are you using to determine reachability? | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Fri Sep 11 1992 12:22 | 27 |
| Hi, Brad,
Which protocols (translation: which AMs) are you using to determine
reachability? Can you give me an example of an alarm rule you are
using to determine DECnet reachability? What about terminal servers,
bridges, ip nodes, and the generic Ethernet station?
I have a few global alarm rules of my own which I'll publish in a later
note.
Also, if you are looking for reachability polling and a
reduction in resource consumption, you can modify the generic command
procedures to fire alarm rules such that no logging is done. This
also means no batch processing overhead.
Reachability determination is the PRIMARY reason my customer is using
DECmcc. They are tolerating its current deficiencies with the
expectation that the product will improve after V1.2.
I've heard good news from some in the development groups that there is
a dedicated effort to address the issue of reachability. This is
encouraging, and I hope it continues with TOP PRIORITY.
I'd be interested in testing your procedures. Let me know if I can
help.
-Dan
|
3385.25 | protocol independent! | CTHQ::WOODCOCK | | Mon Sep 14 1992 10:25 | 12 |
| Hi Dan,
Unfortunately the procedures were left behind at the customer site. I am
waiting for a tape to be made and put on EASYnet so I can pull it up and make
a couple of enhancements. I'm hoping I'll get them this week, if not, I may
rewrite them. Protocol didn't matter with the technique once set up!!! When I
left they were monitoring STATIONS, NODE4s, SNMPs and the addition of anything
else is NO PROBLEM (I think)!! As soon as I've got *something* I'll let ya
know (it could still be a week or two).
best regards,
brad...
|
3385.26 | Some global alarm rule expressions for reachability | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Wed Sep 23 1992 18:29 | 22 |
| I have been testing on a VAXstation 4000 Model 90 with 80MB memory.
What a sweet system. 45 VUPs, 33 SPECmarks.
I have been testing reachability of SNMP and BRIDGE entities mostly,
several hundred of them (total). The performance was great.
Expressions:
(SNMP * ipReachability = DOWN, AT EVERY = 00:05:00)
(BRIDGE * operation state = DOWN, AT EVERY = 00:05:00)
I have been trying to determine the strain that polling imposes on the
node. Not much for this Model 90 and 300-400 entities.
Once again, please let me state the importance of reachability.
My customers are beating me up on this issue. This should be something
that DECmcc does by default, with NO HACKING or other chicken-rigged
setups involved. This should not be "A" top priority, it should be
"THE" top priority.
Regards,
Dan
|
3385.27 | BRIDGE alarm rule expression | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Thu Sep 24 1992 12:14 | 16 |
| I don't know why I have trouble remembering the syntax of this BRIDGE
alarm rule expression. I've used it so many times, but I've botched it
twice in this notes file. Guess my credibility is completely shot.
At any rate, here is the REAL expression for bridge reachability:
(BRIDGE * DEVICE STATE <> OPERATING, AT EVERY 00:05:00)
This works like a champ, except that LTM bridges don't respond
properly. What you can do to help eliminate this:
Filter the notification of the BRIDGE entities (EXCLUDE them) in
the notification window FILTERS section. I haven't been able to stop
the icons on the map from changing colors, though.
-dan
|
3385.28 | Why not CHANGE_OF instead? | CHRISB::BRIENEN | DECmcc LAN and SNMP Stuff... | Tue Sep 29 1992 14:42 | 5 |
| Wouldn't the expression:
CHANGE_OF( snmp * ipReachability, *,*), every 00:05:00
...work better?
|
3385.29 | color doesn't represent status | CTHQ::WOODCOCK | | Tue Sep 29 1992 14:53 | 10 |
|
>Wouldn't the expression:
>
> CHANGE_OF( snmp * ipReachability, *,*), every 00:05:00
>
>...work better?
2cents - Change_of does not meet the requirement of color status for IS IT UP
or IS IT DOWN. Change_of will always give the same color regardless of
reachability status (color must be cleared manually).
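
A small Python sketch of the difference, with made-up names (`change_of` here is a plain function, not the MCC CHANGE_OF operator): a change detector reports both transitions identically, while a status mapping derives the color from the current value.

```python
def change_of(samples):
    """Fire (True) whenever a sample differs from the previous one."""
    return [prev != cur for prev, cur in zip(samples, samples[1:])]


def status_color(sample):
    """Color taken from the current value, not from the transition."""
    return "green" if sample == "up" else "red"


samples = ["up", "down", "down", "up"]
print(change_of(samples))                  # [True, False, True]
print([status_color(s) for s in samples])  # ['green', 'red', 'red', 'green']
```

The two `True` results look the same to the operator even though one means "went down" and the other "came back".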
|
3385.30 | V1.2 won't let you | TOOK::R_SPENCE | Nets don't fail me now... | Mon Oct 05 1992 15:08 | 4 |
| And besides, CHANGE_OF is NOT supported for wildcard rules in V1.2.
s/rob
|
3385.31 | | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Wed Oct 07 1992 15:13 | 9 |
| I sincerely hope that CHANGE_OF wild carding will be supported in the
next release. I simply don't have the resources to enable alarm rules
for 200+ nodes.
The more global alarm rules you can support, the better DECmcc will be
as a network monitoring and troubleshooting tool.
Thanks,
Dan
|
3385.32 | Note 3894 has potential | CTHQ::WOODCOCK | | Tue Oct 13 1992 14:12 | 50 |
| Hello,
Note 3894 introduces some procedures which were the best solution I could come
up with for V1.2 for reachability and may help with device monitoring today.
Better late than never I guess, is it V1.3 yet???
Going beyond, this has been an interesting exercise in learning what's NEEDED
for users in determining reachability and also other functions. To make life
easier for the MCC user in the future it may be appropriate to modify our
outlook of DOMAINS. The end solution was to create a FUNCTIONAL domain which
more closely accommodates a user's tasks, then use a couple features to marry
the functional domain back into how the user VIEWs domains. This may not always
apply of course but here is an exercise. There are 6 MCC domains shown below
but how many user functional domains are there??
A B
/ \ / \
C D E F
In most cases there are 2 functional domains. Alarms is a clear
example. The user wants to monitor all specific devices in each of the 2
hierarchies as a single function for each, one alarm for each with some
exceptions (enter no-poll mark). The same may likely hold true for other
functions, historical data and metrics, etc. The lesson here is that the
arbitrary collection of devices for VIEWing purposes often doesn't meet
functional needs. Maybe EXPAND=TRUE needs EXPANDing :-).
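
As a sketch of the exercise above (Python, with a hypothetical `functional_domains` helper and the six domains from the diagram), grouping every domain under the root of its hierarchy gives the two functional domains:

```python
def functional_domains(children):
    """Group every domain under the root of its hierarchy."""
    all_children = {c for kids in children.values() for c in kids}
    roots = [d for d in children if d not in all_children]

    def collect(domain):
        # Depth-first walk returning the domain and everything below it.
        found = [domain]
        for kid in children.get(domain, []):
            found.extend(collect(kid))
        return found

    return {root: collect(root) for root in roots}


# The six MCC domains from the diagram: A contains C and D, B contains E and F.
mcc_domains = {"A": ["C", "D"], "B": ["E", "F"]}
print(functional_domains(mcc_domains))
# {'A': ['A', 'C', 'D'], 'B': ['B', 'E', 'F']}
```

One wildcard alarm per root would then cover the whole hierarchy, minus any entities marked as no-poll.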
re: -.1
> I sincerely hope that CHANGE_OF wild carding will be supported in the
> next release. I simply don't have the resources to enable alarm rules
> for 200+ nodes.
> The more global alarm rules you can support, the better DECmcc will be
> as a network monitoring and troubleshooting tool.
While it is likely that CHANGE_OF wildcarding is coming in the future, it still does
not solve this problem of REACHABILITY. Why? Because it must compare attribute
values from two different polls; if it can't get the value there will still be
an EXCEPTION. Once again, if all AMs provide a reachability attribute then this
becomes viable. But...only if you can set a specific severity for a given
value of the attribute, otherwise it's RED when it goes down and RED when it
comes up. What is truly needed is a change_of type function which doesn't
burden the user with things like mail, but still gets the color right for when
the device is UP or DOWN. MCC_REACH might fill the bill in the short term but
this should be brought out functionally within MCC itself, dcl and imagination
only go so far.
best regards,
brad...
|