T.R | Title | User | Personal Name | Date | Lines |
---|
994.1 | Hello ! | FLYROC::GOYETTE | PLAY BALL !!! | Fri May 10 1991 10:08 | 6 |
|
Is anyone here? A simple yes or no to .0 would suffice....
Thank You,
Joe
|
994.2 | Need a bit more info first... | NSSG::R_SPENCE | Nets don't fail me now... | Fri May 10 1991 10:38 | 8 |
| Let me ask you a question first...
Better than what? Could you describe how you are currently thinking
of solving this so we can perhaps offer a better way? From your note
I don't know how you are handling it now.
thanks
s/rob
|
994.3 | Does the note 839.5 (thanks Jim Carey) answer your question? | WAKEME::ANIL | | Fri May 10 1991 13:12 | 5 |
| Please refer to note 839.*. 839.5 should answer your question.
If 839.* does not answer your question, please let us know.
- Anil Navkal
|
994.4 | more info... | FLYROC::GOYETTE | PLAY BALL !!! | Fri May 10 1991 15:22 | 62 |
|
Thanks for the pointers...
> Let me ask you a question first...
>
> Better than what? Could you describe how you are currently thinking
> of solving this so we can perhaps offer a better way? From your note
> I don't know how you are handling it now.
>
> thanks
> s/rob
A method other than the one used by the sample rule. I've been doing a
lot of work in the Event Management area of system management.
Currently, I'm examining, installing, using, and testing all of Digital's
system management/event management tools and applications in order to
determine each one's strengths, weaknesses, etc... I'm also trying
to determine the feasibility of using DECmcc as the central product
around which a cohesive event management system can be built. Hence
my question about node unreachable events. All
of the event management products detect this condition, but using
MCC to detect it is by far the most difficult.
> Please refer to note 839.*. 839.5 should answer your question.
> If the 839.* does not answer your question please let us know.
>
> - Anil Navkal
I missed this topic on my original pass, having done a dir/title on
NODE4 and Unreachable. Nevertheless, after reading the notes in that
topic, it seems to me that there's too much extra work on the part of
the user just to get a particular icon, the target, to change color
for the node unreachable event.
Is there a simple way, like a callable interface or something, for an
outside agent to send a message directly to MCC?
For example, let's say I'm using the MCC iconic map. I also have
Data Center Monitor running on this system. DCM detects that node X
has just become unreachable... no, let's use a completely different
example: one that uses MCC not only as the network management tool
but also as the central repository for enterprise-wide events
and alarms. DCM is set up to monitor for low disk space on node X. At
some point DCM determines that disk A on node X is low on disk space.
From DCM a command file is executed which initiates a purge on disk A.
However, I would also like to be able to send a message to MCC stating
this condition and have MCC change the color of the node X icon.
I'm almost positive that nothing like this currently exists but to
me it seems like this should be a part of the evolution of MCC and
its mission to provide enterprise-wide management.
Your thoughts on this?
Joe
|
994.5 | Application generated DECnet events... | NSSG::R_SPENCE | Nets don't fail me now... | Fri May 10 1991 16:40 | 26 |
| Your last example is doable.
Define some DECnet events, from the range reserved for customer applications,
that represent the various things that can go wrong.
Write a simple program that DCM can call when things go wrong that will
generate the appropriate DECnet event.
Sink the DECnet events from the systems to the DECmcc station.
Set up a series of alarm rules for events from the systems.
Have DCM, when it detects a problem, cause the event-generating program
to run on the system with the problem.
The procedure that the alarm rule calls could add any needed text to the
message, using the information provided by the alarm.
I realize that it would be better to have DCM generate all the events
from the DCM system (because the actual system may be too ill to do it),
but that will have to wait until the indirect notification (or whatever
Engineering will call it) is available in a future release.
Anil, it seems to me that the above would work. What do you think?
s/rob
|
994.6 | Lets me face it.... | WAKEME::ANIL | | Fri May 10 1991 17:17 | 80 |
|
Rob, your suggestion will certainly work. No problem with Alarms
here. If DCM is going to sink an event for the correct NODE4,
changing that specific NODE4 icon's color is straightforward.
The problem we encounter with NODE4 is that the unreachable remote
node is a child entity of the [router] node. So if a remote
node becomes unreachable, you see the color change on the
router itself. This is a particularly sticky problem to solve.
Just because the node is not reachable does not mean that the node
itself is powered down. It can become unreachable for any one of
the following reasons:
1. A bridge between the two nodes may be experiencing some
hardware/software problems.
2. There may be a problem with the physical/logical circuit to
the node.
3. There may be inadequate resources at the remote node.
Also, just because we received the "node unreachable" event, we cannot
conclude that the node is unreachable now; all it means is that
the node became unreachable when the event was generated.
I agree that to the user none of this matters. S/he wants the color to
change when the object/entity is unreachable and return to the
normal color (read: severity = clear) when it is reachable, no ifs, no
buts!
I don't know of any good way of solving this problem. A generic
Alarms module can only go so far! All I can say is that if you have an
event we can make a rule on, you are in business!! The syntax for
creating rules based on events is as follows:
expression = (OCCURS (<entity> <event name>)).
When alarms detects the event, notification of your choice will take
place!
Hope this helps.
- Anil Navkal
P.S. I am not a DECnet guru, so please feel free to correct me.
|
994.7 | Let have an event! | CLAUDI::PETERS | | Fri May 10 1991 18:15 | 11 |
| Rob's suggestion works, but be careful, because DECmcc supports
only event classes 0 and 2-7. That means that DECmcc cannot
receive event class 1 (applications) or classes 8-511.
It means, for instance, that you can not alarm on DNS events
(352,353) which would be really handy or on application
specific events. The nodes still issue DECnet events, you can
sink 'em to the system on which DECmcc is running, but
MCC_DNA4_EVL ignores them.
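To make this limit concrete, here is a minimal Python sketch of the class check described above. The accepted classes (0 and 2-7) come from this note; the "class.type" string format and the function names are assumptions for illustration, not the actual MCC_DNA4_EVL code.

```python
# Hypothetical sketch of the event-class check described in .7.
# Assumptions: event ids arrive as "class.type" strings, and the accepted
# classes are exactly those listed in .7 (0 and 2-7).

SUPPORTED_CLASSES = frozenset({0, 2, 3, 4, 5, 6, 7})


def parse_event_id(event_id: str) -> tuple[int, int]:
    """Split a DECnet event id such as "4.7" into (class, type)."""
    class_text, type_text = event_id.split(".", 1)
    return int(class_text), int(type_text)


def mcc_would_receive(event_id: str) -> bool:
    """True if the event's class is one that .7 says DECmcc accepts."""
    event_class, _event_type = parse_event_id(event_id)
    return event_class in SUPPORTED_CLASSES


if __name__ == "__main__":
    for event_id in ("4.7", "1.0", "352.1", "353.4"):
        verdict = "passed to MCC" if mcc_would_receive(event_id) else "ignored"
        print(f"{event_id}: {verdict}")
```

Application events (class 1) and DNS events (352, 353) fall outside the set, which is why sinking them to the DECmcc system alone does not make them visible to Alarms.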
/Claudia
|
994.8 | ? | FLYROC::GOYETTE | PLAY BALL !!! | Thu May 16 1991 12:32 | 64 |
| RE: .5 NSSG::R_SPENCE
Sorry for the delay, but I was in Landover, MD, supporting a POLYCENTER
awareness day/demo at the act... anyway,
> Your last example is doable.
>
> Define some DECnet events from the ones for customer applications
> that represent the various things that can go wrong.
> Write a simple program that DCM can call when things go wrong that will
> generate the appropriate DECnet event.
How does one define/generate a DECnet event?
> Sink the DECnet events from the systems to the DECmcc station.
Again, not being fluent in DECnet lingo, what does "sink" mean?
> Set up a series of alarm rules for events from the systems.
> Have DCM, when it detects a problem, cause the event generating program
> to run on the system with the problem.
No can do. The DCM display process, which can execute command files
based on an event type, normally runs on only one node. It currently
does not have the capability to execute a command file on a remote
node, i.e., the one with the problem.
Some more thoughts/questions...
Is it true that, in its current state, the MCC Alarms module can only be
configured to monitor/alarm on network-related events/conditions?
If so, consider the possibility of this...
What if I were to write an access module for an entity called SMW
(System Management Workstation)? The access module would contain the
same things as the NODE4 access module as well as the knowledge to
communicate with the SMW regarding SMW-specific attributes and
characteristics. Using this new access module, would it be
possible to augment the current Alarms function module to monitor/
alarm on events specific to this entity?
A very simplistic example...
I build this entity called System Management Workstation. From this
SMW I can manage and monitor all of my systems. I then build an access
module so MCC can talk to it. Using the Alarms module I define a
rule for the SMW entity: CHECK NEW_EVENT_MESSAGE_COUNTER > 0 every
00:01:00. Theoretically, NEW_EVENT_MESSAGE_COUNTER is a characteristic
of the SMW entity. Here's where the additional Alarms FM functionality
comes in. If the Alarms FM checks this characteristic and
finds that it is > 0, it could then retrieve the event messages
from the SMW and zero the counter. Since the Alarms FM knows these
messages are SMW entity related, it could parse them for node name and
severity and change the color of the appropriate icons.
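The counter-driven rule proposed above can be sketched in Python as follows. Everything here is hypothetical: SmwEntity stands in for the SMW access module, and the "node severity text" message format and the severity-to-color table are assumptions. The one-minute schedule is left to whatever evaluates the rule.

```python
from dataclasses import dataclass, field

# Hypothetical severity-to-color mapping; DECmcc's real map colors may differ.
SEVERITY_COLORS = {"clear": "green", "minor": "yellow", "critical": "red"}


@dataclass
class SmwEntity:
    """Stand-in for the proposed SMW entity as seen through its access module."""
    # Pending messages in an assumed "node severity text" format.
    pending: list[str] = field(default_factory=list)

    @property
    def new_event_message_counter(self) -> int:
        return len(self.pending)

    def retrieve_and_zero(self) -> list[str]:
        """Return every pending message and reset the counter to zero."""
        messages, self.pending = self.pending, []
        return messages


def check_rule(smw: SmwEntity, icon_colors: dict[str, str]) -> None:
    """One evaluation of CHECK NEW_EVENT_MESSAGE_COUNTER > 0 (every 00:01:00)."""
    if smw.new_event_message_counter > 0:
        for message in smw.retrieve_and_zero():
            node, severity, _text = message.split(" ", 2)
            # Unrecognized severities are treated as critical in this sketch.
            icon_colors[node] = SEVERITY_COLORS.get(severity, "red")


smw = SmwEntity(["NODEX critical disk A is low on space",
                 "NODEY minor queue stalled"])
colors: dict[str, str] = {}
check_rule(smw, colors)
print(colors)                          # {'NODEX': 'red', 'NODEY': 'yellow'}
print(smw.new_event_message_counter)   # 0
```

Draining and zeroing in one step keeps a message from being reported twice if the rule fires again before the counter is cleared.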
I may be totally off base here. I'm not all that confident in my
understanding of exactly how MCC does some things. However, would
something along these lines be feasible?
Joe
|
994.9 | Some answers... | WAKEME::ANIL | | Thu May 16 1991 13:49 | 99 |
| RE .8:
+------------------------------------------------------------------------------+
|>>> How does one define/generate a DECNET event ? |
| |
|> Sink the DECnet events from the systems to the DECmcc station. |
| |
|>>>> Again, not being fluent in DECNET lingo, what does "sink" mean ? |
+------------------------------------------------------------------------------+
Read note 715.3. If that note is not adequate, you may also want to
read chapters 11 through 14 of DECnet Phase IV Access Module Use
(part number AA-PD5BB-TE).
>>> No can do. The DCM display process, which can execute command files
>>> based on an event type, normally runs on only one node. It currently
>>> does not have the capability to execute a command file on a remote
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> node, i.e., the one with the problem.
~~~~
I am not sure I understand your problem. Are you saying that your command
procedures cannot run another command file on a remote node?
If that's the question, then the answer is simple. You have at least three
ways of solving it.
1. Release protection on the remote com file so that the world can execute it.
- This is the least safe way to hack it!
2. Use ACLs on the remote node so that the "current node/user" can
execute the remote com files.
- Moderately safe!
3. Use a username/password combination to execute the remote com file.
Did I answer a question you never asked?
>>> Some more thoughts/questions...
>>>
>>> In it's current state, the MCC Alarm module can only be configured
>>> to monitor/Alarm on network related events/conditions. ?
>>>
>>> If so, consider the possibility of this...
>>>
>>> What if I was to write an access module for an entity called SMW,
>>> System Management Workstation. The access module would contain the
>>> same things as the NODE4 access module as well as the knowledge to
>>> communicate with the SMW regarding SMW specific attributes and
>>> characteristics. Using this new access module, would it be
>>> possible to augment the current Alarms function module to monitor/
>>> Alarm on events specific to this entity ?
>>>
Alarms can monitor any event you can see with the GETEVENT command from
FCL! No ifs, no buts!! The events can come from any software/hardware.
As long as an AM has pumped them in by providing a GETEVENT service,
you can depend on Alarms to watch for them.
>>> A very simplistic example...
>>>
>>> I build this entity called System Management Workstation. From this
>>> SMW I can manage and monitor all of my systems. I then build an access
>>> module so MCC can talk to it. Using the Alarms module I define a
>>> rule for the SWM entity, CHECK NEW_EVENT_MESSAGE_COUNTER > 0 every
>>> 00:01:00. Theoretically, NEW_MESSAGE_EVENT_COUNTER is a characteristic
>>> of the SMW entity. Here's where the additional functionality to the
>>> Alarms FM comes in. If the Alarms FM checks this characteristic and
>>> finds that it is > 0 it could then retrieve the event messages
>>> from the SWM and zero the counter. Since the Alarms FM knows these
>>> messages are SMW entity related it could parse them for node name and
>>> severity and change the color of the appropriate icons.
I think what you are getting at is the possibility of zeroing a counter
of an entity based on an Alarms rule firing. Sure, you can do this.
When a rule fires, Alarms writes the information about the entity and the
data it got to a data file. The command procedure you specify on the
create rule command has this data available. You can write a DCL
procedure to get the entity name out and then invoke MCC to zero the
counters. It sounds complicated but is not all that difficult.
If you are going to write another MM, you could also get events from
Alarms rule firings and then zero the counters using the MCC_CALL protocol.
This mechanism is very efficient and does not require the extra overhead
of invoking MCC in a batch job.
>>>
>>> I may be totally off base here. I'm not all that confident in my
>>> understanding of exactly how MCC does some things. However, would
>>> something along these lines be feasible ?
>>>
>>> Joe
Let us know if my answer is totally off base!
- Anil Navkal
|
994.10 | | NSSG::R_SPENCE | Nets don't fail me now... | Thu May 16 1991 14:28 | 12 |
| Anil, is there any plan to allow events by event number rather than the
text description? That would seem to allow all the other DECnet
events beyond those listed in the Phase IV manual, including the ones
reserved for customer use.
re: the base note... depending on the timing of your need, perhaps you
could use the common management agent as it gets developed?
Unfortunately, I don't have a pointer to someone who knows details about
it, but maybe someone watching here does?
s/rob
|
994.11 | DNA evl process | TOOK::CALLANDER | | Thu May 16 1991 15:43 | 10 |
|
Not quite, Rob. Even if we allowed you to specify an event number, the
EVL process would have to be looking for those events and passing them into
MCC. My understanding from Jim Carey (PL of the DECnet AMs) is that the
DNA4 EVL process is set up to do quite a bit of preprocessing
in the detached process, so accepting new events would take some work.
That said, I did talk to Jim about what it would take to add support for
user-definable events.
|
994.12 | | FLYROC::GOYETTE | PLAY BALL !!! | Fri May 17 1991 10:07 | 14 |
|
> Let us know if my answer is totally off base!
>
> - Anil Navkal
I don't know enough yet about MCC itself to determine whether it is
or isn't. I'm going to take your advice and do some reading. Hopefully
I will gain a better understanding of the Alarms module and the DECnet
module. When I do, I'll see what I can come up with.
Thanks again for all the help.
Joe
|
994.13 | Getting events past the EVL shouldn't be hard? | NSSG::R_SPENCE | Nets don't fail me now... | Fri May 17 1991 11:18 | 19 |
| Yes Jill. The EVL process DOES do lots of preprocessing, but there are
NCL (and NCP) commands to easily modify what filtering it does.
For example (something I usually want to do), if I wanted to set alarms
on DNS events, I would only have to tell the EVL process to PASS the
352.* and 353.* events. No big deal. I believe that you will find that
more network managers know the event number that they want to see than
know the "Name" or Descriptive Text that goes with them.
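A minimal Python sketch of the class.type wildcard matching Rob describes, for patterns such as 352.* and 353.*. It handles only a single type number or "*" (no ranges or lists), and the function names are hypothetical. The real filtering is done by the EVL process and configured with NCP/NCL.

```python
def matches(pattern: str, event_id: str) -> bool:
    """Match a "class.type" event id against "class.type" or "class.*"."""
    pattern_class, pattern_type = pattern.split(".", 1)
    event_class, event_type = event_id.split(".", 1)
    if int(pattern_class) != int(event_class):
        return False
    return pattern_type == "*" or int(pattern_type) == int(event_type)


def passes(event_id: str, passed_patterns: list[str]) -> bool:
    """True if any configured pattern lets this event through."""
    return any(matches(pattern, event_id) for pattern in passed_patterns)


dns_patterns = ["352.*", "353.*"]
for event_id in ("352.3", "353.0", "4.10"):
    print(event_id, passes(event_id, dns_patterns))
```

Comparing the numbers as integers rather than strings means "4.010" and "4.10" are treated as the same event.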
I agree that the model would like us to use the correct names and
certainly the product should allow it but we should also consider life
after learning and help experts do work faster with approved shortcuts
(like the event number if they know it vs the name) and accelerator
keys for lots of common functions.
What do other network manager types think about this?
s/rob
|
994.14 | Hmmm, 3 interesting event issues in one note! | TOOK::CAREY | | Fri May 17 1991 12:34 | 159 |
|
Hi guys,
I guess it is time to get involved.... I've been sort of waiting for
the dust to settle. There are three major issues running in and out in
this note, the way I see it:
-- How come "reachability events" light up the router and not the
node that went unreachable?
-- How can I get "other interesting DECnet Events" into MCC?
-- How can I get "other interesting events in general" into MCC?
* * * * * *
NODE UNREACHABLE EVENTS DON'T LIGHT UP THE UNREACHABLE NODE:
I've heard this request a few times: "light up" the node that has been
reported as unreachable instead of the router that noticed the loss of
reachability. I've been hoping I could defer that until
"Target Entity" is available.
From a purist DECnet perspective all we know is that the router whose
events we are watching noticed that a node has become unreachable.
Of course that could mean a lot of things:
-- The unreachable node is down. In this notes file at least,
that is a pretty popular choice.
-- the router has lost a circuit.
-- a bridge somewhere is down.
-- Reachability doesn't imply adjacency. Another router in the
area could have gone down, causing the problem.
-- peak traffic swallowed enough periodic hello messages to
make it look like the node is down.
-- Stan, the electrician working in the ceiling, just took out his
wire cutters and made a mistake. Okay, if it is Ethernet, he
just took out his chainsaw and made a mistake.
All we know is that a routing node told us it no longer knows how to
reach a particular end node.
Okay, perhaps we shouldn't take the purist perspective. I just don't
know what perspective to take. There are too many possibilities, most
of which have nothing to do with either node reported in the event
report.
What we expect a network manager to do when an icon "lights up" is go
to that icon, find out what occurred to get it all upset, and resolve
the problem. That currently requires looking at the router that
contains the intelligence to know that a node has become unreachable
and trying to find out what is happening from there.
I throw my hands up in confusion and depend upon the user's
intelligence to unravel the cause. We have room for either:
- .COM procedures following up the alarm rule firing to help
isolate the cause (Brad's thing is a step in that direction),
- a fill-in-the-blank Functional Module to help add value in this
space.
I know what the next question is: when? Sorry, we really *are* flat
out here. I just don't know the answer to that.
Oh, now I notice that .6 talks about many of the same sorts of
things.... Sorry to be redundant and repeat myself and say all the
same things several times.
Thoughts on target entity:
This node unreachable situation is almost solved by the idea of
"target entity" (see note 839), but not quite. We want to be able to
say something like:
Notify NODE4 * REMOTE NODE * node reachability change
to get all reachability change events, then we may want to specify
that the target entity is:
TARGET ENTITY = (the Node4 entity that is specified as the Remote Node
entity in the entity specifier of the received event, if there is one).
Implicitly, we recognize that what the router sees as a remote node
maps to a real live Node4 out there for our management system. It is a
little harder to make explicit, but that would sure be nice.
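One way to read the rule Jim is after, sketched in Python under stated assumptions: entity specifiers are modeled as (class, instance) pairs, and any event reported against NODE4 <router> REMOTE NODE <n> is retargeted to NODE4 <n>. This only illustrates the mapping; it is not proposed FCL syntax or DECmcc code.

```python
# An entity specifier as (class, instance) pairs, outermost first, e.g.
# NODE4 ROUTER1 REMOTE NODE NODEZ -> [("NODE4", "ROUTER1"), ("REMOTE NODE", "NODEZ")]
EntitySpec = list[tuple[str, str]]


def target_entity(source: EntitySpec) -> EntitySpec:
    """Pick the entity whose icon should change for a reachability event.

    Assumed rule: an event reported against NODE4 <router> REMOTE NODE <n>
    is retargeted to NODE4 <n>; any other event keeps its source entity.
    """
    if (len(source) == 2
            and source[0][0] == "NODE4"
            and source[1][0] == "REMOTE NODE"):
        return [("NODE4", source[1][1])]
    return source


def spec_text(spec: EntitySpec) -> str:
    """Render an entity specifier the way it would be typed."""
    return " ".join(f"{cls} {inst}" for cls, inst in spec)


event_source = [("NODE4", "ROUTER1"), ("REMOTE NODE", "NODEZ")]
print(spec_text(event_source), "->", spec_text(target_entity(event_source)))
```

Falling back to the source entity keeps today's behavior (the router lights up) for any event that does not name a remote node.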
Any ideas on some kind of a simple syntax that could describe that? I
KNOW what I want, I just don't know how to say it. Other people are
spending a lot of time thinking about this sort of issue....
* * * * * *
Other DECnet events:
The VMS or Ultrix EVL will deliver anything to our sink monitor
(MCC_DNA4_EVL) that we request it to. The limitations on that are
implementation specific and not worth going into here.
The problem with receiving "other DECnet events" is in MCC_DNA4_EVL
where we receive the event, translate it into MCCspeak and deliver it
for [possible] consumption by anyone that happens to have a GETEVENT
outstanding on the request. Translation gets tricky on events we don't
know about. Describing an unknown event to the event manager is also
tricky.
Right now, we only know how to translate the generic Phase 4 events.
We'd like to add significant application events (such as the 352.*
events for DNS that Claudia mentioned). We're even looking at some
approach to getting other DECnet events up to the Event Manager via
some "unknown event" reporting mechanism or something like that. I'm
not sure how much utility that would have, or how limited our event
attribute translation would be. We are not that far advanced.
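The translate-or-fall-back idea can be sketched in Python as follows. The MccEvent shape is simplified, and the one-entry table is illustrative: 4.14 is assumed here to be the Phase IV node reachability change event, which should be checked against the event list in the manual. The "Unknown Event" name is a hypothetical placeholder for the reporting mechanism mentioned above.

```python
from dataclasses import dataclass


@dataclass
class MccEvent:
    """Simplified translated event, as a GETEVENT requester might see it."""
    entity: str
    event_name: str
    arguments: dict[str, str]


# Assumed one-entry table for illustration; a real translator would cover
# all the generic Phase IV events and decode each event parameter.
KNOWN_EVENTS = {(4, 14): "Node Reachability Change"}


def translate(entity: str, event_class: int, event_type: int,
              data: bytes) -> MccEvent:
    """Translate a raw DECnet event, falling back to a generic report."""
    name = KNOWN_EVENTS.get((event_class, event_type))
    if name is None:
        # No decoding rules: report a generic "unknown event" that keeps the
        # event number and the undecoded parameter bytes.
        return MccEvent(entity, "Unknown Event",
                        {"event": f"{event_class}.{event_type}",
                         "data": data.hex()})
    # A real translator would decode the parameters into named attributes here.
    return MccEvent(entity, name, {"data": data.hex()})


print(translate("NODE4 ROUTER1", 352, 1, b"\x01\x02"))
```

The fallback keeps the raw event number, so a rule could still match on it even though none of its attributes are decoded.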
Other "generic" events (such as from DCM):
Enterprise-wide event sinking is a good idea. It does fit in well with
the kind of mission that DECmcc hopes to be filling in the management
space.
There's a lot of talk around developing a "generic event sink" for MCC
that would just hang around waiting for events to get delivered to it.
Then it would pass them off to the Event Manager to see if anyone has
subscribed to those events. This sink would allow exactly your DCM
"disk purge" scenario to send an event to MCC (potentially over the
network), and by asking Notification to watch for that event, we could
"light up the map" with valuable enterprise information. Good stuff.
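A toy Python model of the generic sink plus Event Manager flow described above. The class names, the pipe-separated "entity|event name|text" wire format, and the "Disk Space Low" event name are all hypothetical. The sketch only shows an outside agent's event reaching a subscriber that changes an icon color.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, str, str], None]


class EventManager:
    """Toy event manager: routes each posted event to its subscribers,
    much as an outstanding GETEVENT would receive it."""

    def __init__(self) -> None:
        self._subscribers: dict[tuple[str, str], list[Callback]] = defaultdict(list)

    def subscribe(self, entity: str, event_name: str, callback: Callback) -> None:
        self._subscribers[(entity, event_name)].append(callback)

    def post(self, entity: str, event_name: str, text: str) -> int:
        """Deliver an event; return how many subscribers received it."""
        callbacks = self._subscribers.get((entity, event_name), [])
        for callback in callbacks:
            callback(entity, event_name, text)
        return len(callbacks)


class GenericEventSink:
    """Accepts events from outside agents such as a DCM command procedure."""

    def __init__(self, manager: EventManager) -> None:
        self.manager = manager

    def receive(self, line: str) -> int:
        # Assumed wire format for this sketch: "entity|event name|free text".
        entity, event_name, text = line.split("|", 2)
        return self.manager.post(entity.strip(), event_name.strip(), text.strip())


icon_colors: dict[str, str] = {}


def turn_red(entity: str, _event_name: str, _text: str) -> None:
    """Notification-style subscriber: light up the entity's icon."""
    icon_colors[entity] = "red"


manager = EventManager()
manager.subscribe("NODE4 NODEX", "Disk Space Low", turn_red)
sink = GenericEventSink(manager)
delivered = sink.receive("NODE4 NODEX|Disk Space Low|disk A low, purge started")
print(delivered, icon_colors)
```

Events nobody has subscribed to are simply dropped, which matches the "see if anyone has subscribed" behavior described above.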
Like I said, there's a lot of talk, and it is part of our "mission".
When? Well, as always, we're trying to establish our priorities and
make the very best use of limited resources. Given that, then just as
soon as possible.
Durn. I get as tired of saying that as you get of hearing it. Anybody
got some decent C programmers out there sitting on their thumbs? We
can think of something for 'em.... ;-)
* * * * * *
Not many answers I'm afraid, but I hope this makes some sense out of
what we've done, why we've done some of it, and what we would like to
do towards making them work better still.
-Jim Carey
|
994.15 | much clearer now. | FLYROC::GOYETTE | PLAY BALL !!! | Fri May 17 1991 14:01 | 71 |
|
re: .14
Excellent explanation. I now have a much better understanding of what
MCC can do, will do, and might do.
Let me just comment on the following...
> From a purist DECnet perspective all we know is that the router whose
> events we are watching noticed that a node has become unreachable.
> Of course that could mean a lot of things:
>
> -- The unreachable node is down. In this notes file at least,
> that is a pretty popular choice.
> -- the router has lost a circuit.
> -- a bridge somewhere is down.
> -- Reachability doesn't imply adjacency. Another router in the
> area could have gone down, causing the problem.
> -- peak traffic swallowed the enough periodic hello messages to
> make it look like the node is down.
> -- Stan, the electrician working in the ceiling, just took out his
> wire cutters and made a mistake. Okay, if it is Ethernet, he
> just took out his chainsaw and made a mistake.
The reason I brought up this topic in the first place is that I
believe that, since MCC is the network monitoring tool, it should be
not only the logical choice for detecting node unreachable events
but the only choice. Here's why: currently, every event notification
tool we have detects the node unreachable condition. With the
exception of MCC, none of them is able to tell *why* the node is
unreachable... In my ideal world, I would like to see a "node down"
condition and a "node unreachable" condition. I would also like to
see a tiny bit of AI used to differentiate between the two. If MCC
determines that bridge X is down, then it should turn the bridge icon
red, but it should also turn the other icons (node icons, for example)
a shade of red and signal a "node unreachable" condition. In
doing so you've visually and instantly conveyed to the user the fact
that all those nodes are truly unreachable, not down. On the flip side,
suppose node Z crashes. MCC sees that it can't communicate with node Z,
but before it signals a "node unreachable" condition it uses some
intelligence of its own: is the router functioning, is the bridge
functioning, is the network very busy; yes, OK, I'll try to communicate
with node Z again... If after making all these checks MCC still can't
communicate with node Z, it should signal a "node down" condition. Once
that is signaled, the node Z icon, and only that icon, should turn red,
again visually conveying to the user that this node has a problem... In
the future, tools like VCS could be used to definitively determine
"node down" conditions and then pass them off to MCC via the "general
events" module.
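The decision Joe outlines can be sketched in Python. It assumes something .17 points out MCC does not yet have: a known list of the components between the management station and the node. The check-then-retry order and the pink/red colors follow this note's description. The function names and retry count are hypothetical, and the "network very busy" case is folded into the retries.

```python
from typing import Callable


def classify(node: str,
             path: list[str],
             component_up: Callable[[str], bool],
             try_contact: Callable[[str], bool],
             retries: int = 3) -> str:
    """Return "up", "unreachable", or "down" for a node that stopped answering.

    path lists the routers/bridges/circuits between the management station
    and the node; knowing it assumes a topology model.
    """
    if any(not component_up(component) for component in path):
        return "unreachable"   # something in between failed; the node may be fine
    for _attempt in range(retries):
        if try_contact(node):
            return "up"        # a transient glitch, e.g. lost hello messages
    return "down"              # the path is healthy but the node never answered


ICON_COLOR = {"up": "green", "unreachable": "pink", "down": "red"}


def all_up(_component: str) -> bool:
    return True


def bridge_x_down(component: str) -> bool:
    return component != "BRIDGE X"


def never_answers(_node: str) -> bool:
    return False


path = ["ROUTER 1", "BRIDGE X"]
print(classify("NODE Z", path, bridge_x_down, never_answers))  # unreachable
print(classify("NODE Z", path, all_up, never_answers))         # down
```

Checking the path first is what keeps a single bridge failure from being reported as a pile of "node down" alarms.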
I can relate this to the way the sample rule works for node unreachable.
Today, I have a simple domain with about ten NODE4 icons and a router.
I've also enabled an unreachable rule for all of them. At some point
during the day I get alarms stating that node A is unreachable... due to
insufficient resources at the router. This is perfectly acceptable,
except that the alarm is saying node A has a problem while the iconic map
is showing that the router has a problem. MCC should change the color
of the node icons to pink, indicating that the node has a problem but
that it was caused by another object having a problem. As it does now,
the router icon should turn red, but MCC should also generate an "out of
resources" alarm for the router even though I don't have an explicit rule
set up for it. MCC already knows the router is out of resources because
the other alarms told it so.
Well, I've rambled on long enough, and I hope someone can make some
sense out of it. The main point is that I'd just like to see a little
bit of intelligence included in the process....
Thanks again,
Joe
|
994.16 | numbers easier | JETSAM::WOODCOCK | | Mon May 20 1991 14:36 | 7 |
|
> What do other network manager types think about this?
The use of EVENT numbers rather than a name would be much more convenient.
brad...
|
994.17 | Topology model needed before AI | TOOK::MATTHEWS | | Fri May 24 1991 18:11 | 39 |
| Joe, your note .15 makes a lot of sense. What is lacking today in
DECmcc is the concept of a topology model for the enterprise which
can be referred to when atomic events happen to aid the AI process.
Having AI itself doesn't do a thing until you have the topology
model and that is what we are missing. In V1.2 we will add the first
brick to the foundation of the topology model. We will introduce
a concept of a circuit and the circuit will have topological structure.
The circuit's state will be derived from the state of all of its
components.
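Wally does not say how V1.2 will combine component states, so this Python sketch assumes the simplest rule: a circuit is as bad as its worst component, and the components in that worst state are reported as likely culprits. The State values and the function name are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    """Component/circuit states ordered from best to worst."""
    UP = 0
    DEGRADED = 1
    DOWN = 2


def circuit_state(components: dict[str, State]) -> tuple[State, list[str]]:
    """Derive a circuit's state from its components.

    Assumed rule: the circuit is as bad as its worst component, and those
    worst components are reported as the likely culprits.
    """
    if not components:
        raise ValueError("a circuit needs at least one component")
    worst = max(components.values())
    culprits = sorted(name for name, state in components.items() if state == worst)
    return worst, culprits


print(circuit_state({"line A": State.UP,
                     "modem B": State.DOWN,
                     "line C": State.DEGRADED}))
```

Returning the culprits alongside the state is one small step toward the root-cause identification the rest of this note describes.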
Does this solve the problem? NO! However, it is one brick in the
right direction. After V1.2 we will build a topology FM that
provides ways to associate all the components of an enterprise
into a single topology structure. It will take multiple release
cycles to build the mechanisms. We know how to do some of them
but lack the resources to do it instantly. Others we will have
to innovate to be able to solve.
When we get to the point that we have a topology model that has
no holes for basic functions such as reachability, then we
can put the AI routines into the event processing so that the
correct cause is identified. Until then, the best we can do
is provide the user with too much data and hope that he is
intelligent enough to wade through it and deduce the correct
cause.
On the way to the ideal, we will find ways to correlate multiple
events from components and generate newer higher level events (higher
level in the topological structure sense) and reduce the amount
of raw data we throw at the user and hopefully make him more
productive in determining the root cause. As a result of some
work we recently did brainstorming circuits we found a way to
reduce multiple events to a single more meaningful event for
one particular kind of circuit. We will not be able to do it
for V1.2 but we will plan it for a future release.
We are working on it, but it will not arrive instantly.
wally
|