T.R | Title | User | Personal Name | Date | Lines |
1400.1 | Alarms uses the Show Directive ... | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Tue Aug 27 1991 16:50 | 15 |
| The Alarms Rule Evaluator converts your Rule Expression into a Show
directive (or Getevent directive for the Occurs function).
If your rule is:
(CHANGE_OF(NODE4 NOMAD LINE SVA-0 RECEIVE FAILURE,*,*), AT EVERY 00:03:00)
Try:
SHOW NODE4 NOMAD LINE SVA-0 RECEIVE FAILURE AT EVERY 00:03:00
Because this is exactly the command we generate. I can't imagine why your
system got bogged down by the NML processes.
/keith
|
1400.2 | Additional NML_nnnn problems using NODE4_AM rules. | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Tue Aug 27 1991 17:43 | 23 |
| Some additional info:
I've created rules for additional nodes to monitor reachability of
nodes in a specific area using the following syntax:
CREATE MCC 0 ALARMS RULE node2_REMOTE_NODE_STATE -
EXPRESSION =(NODE4 node1 REMOTE NODE node2 STATE = UNREACHABLE ,-
AT EVERY 00:05:00) ,-
.
.
.
NOTE that node1 is area router for node2.
Two NML_nnnn processes were started for each end node on the area
router (node1). I enabled rules for 14 nodes, so the area router had
28 NML_nnnn processes running.
---------------
Thanks for the info in .1. I'll continue to research this issue,
but until I resolve it, I can't use the NODE4_AM rules I've created
since the result is a massive system impact.
-dh
|
1400.3 | No answers, some suggestions, some questions | TOOK::CAREY | | Wed Aug 28 1991 15:17 | 46 |
|
Re: .0 -- I'm not sure why you're seeing 4 NML server processes per
node. Due to our connection checking, I would expect 2. Sorry, but 2
is the minimum.
How many nodes are there in your LAVc? I expect that it is less than
20, but tell me if I'm wrong.
How many Node4 alarms are you running? To how many different Node4
entities? (When I ask about Node4, I'm referring to Node4 and all of
its children, such as Node4 Line, etc.)
The request you're making is pretty simple, and even though the
processes are going to be there, I don't expect them to be active for
more than a second or two during every polling cycle.
I'm curious about how much trouble you're having with the Lines
themselves as well; in an in-band management system, management
requests are exposed to the same line problems as the data traffic,
and that confusion can cause excessive overhead. Do you know much
about the problem?
Your LAVc shouldn't be collapsing under the load induced by MCC;
more information about the kinds of load it is under might help us to
understand better.
Re: .2 -- Try either slightly different polling intervals for different
alarms, or use a command procedure to Enable them at, say,
fifteen-second intervals. Since you are running all of these
rules from the same process on the same node, to the same remote node,
skewing the requests a little will allow them to re-use the already
started NML servers. You will probably still use more than
two processes to service the requests, but I expect this to reduce the
count to five or six. When you enable all the rules at once, we get
into a situation where the DNA4 AM is trying to start all of these
connections concurrently. THAT is what costs the process slots.
Incidentally, the first connection will take two processes, the rest
should only require 1, unless you are doing a lot of requests to
different Node4s in the meantime.
We are looking at ways to reduce this load still more -- hopefully
we'll be able to supply you with some options in V1.2.
-Jim Carey
|
1400.4 | Can you solve the problem with less polling ? | DELNI::R_PAQUET | | Thu Aug 29 1991 08:56 | 10 |
|
For the reachability problem, why don't you use DECnet events to
determine reachability, and then alarm on these events. This will
eliminate the polling for reachability.
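As a sketch only (the rule name and event specification below are
placeholders -- check the DECnet Phase IV AM documentation for the exact
event names your version supports), an event-based rule would use the
OCCURS function instead of a polled expression, so the Alarms facility
issues a Getevent rather than a periodic Show:

     CREATE MCC 0 ALARMS RULE node2_unreachable_event -
         EXPRESSION = (OCCURS(NODE4 node1 ANY EVENT))

A rule like this fires when the event arrives, so no AT EVERY polling
interval is needed.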
For the line receive failures, I'd guess that you are seeing the same
failures, but counted by each individual system. Rather than alarm on
every system in the LAVc, why not just pick one as representative (like
the boot node, as it is the most critical) for this alarm?
|
1400.5 | More on NML; Questions on Performance & Max Alarms allowed | CUJO::HILL | Dan Hill-Net.Mgt.-Customer Resident | Fri Sep 06 1991 02:12 | 28 |
| Upon further testing, here is what I found. 3 NML_nnnn processes are
created for every rule enabled for Phase IV nodes. These processes
convert to SERVER_nnnn processes after a few seconds to a minute or
more (depending on how bogged down the target processor is). Two of
the SERVER_nnnn processes eventually timeout and go away leaving a
single process. (It is a bit more complex depending on types of rules,
but this explanation suffices for now).
If you want to use the same NML_nnnn/SERVER_nnnn process for all
alarms, you must wait a few seconds before enabling the next alarm
rule. This "staggering" can be accomplished using the command
MCC> SPAWN WAIT 00:00:15
between each ENABLE command.
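A hypothetical command-file sketch of this staggering (the rule names
are placeholders for whatever rules you created in .2):

     MCC> ENABLE MCC 0 ALARMS RULE node1_REMOTE_NODE_STATE
     MCC> SPAWN WAIT 00:00:15
     MCC> ENABLE MCC 0 ALARMS RULE node2_REMOTE_NODE_STATE
     MCC> SPAWN WAIT 00:00:15
     .
     .
     .

Each subsequent rule then finds the NML_nnnn/SERVER_nnnn process already
running on the target and re-uses it instead of starting its own.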
------------------------------------------------------------------------
Enabling rules while avoiding the consumption of vast amounts of
target node resources can be a real juggling act. Still,
by selectively enabling 40+ rules to monitor everything from NODE4 LINE
RECEIVE COUNTERS to BRIDGE SPANNING TREE changes, my VAXstation 3100/76
with 32 MB of memory incurred a noticeable performance hit as alarms
began to fire.
>>>>> * Does anyone have any info on maximum number of rules that
        can be enabled?
      * What is the minimum polling time allowed without impacting
        performance?
      * What is the minimum suggested VAXstation configuration for
        monitoring 600 nodes with 3 rules each (1200 alarm rules)
        with polling times less than 5 minutes each?
|
1400.6 | | NSSG::R_SPENCE | Nets don't fail me now... | Fri Sep 06 1991 15:55 | 3 |
| You can also put a start time on each enable command.
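For example (a sketch only -- I'm assuming the AT START time qualifier
here, and the exact spelling may differ in your version; the rule names
are the placeholders from .2), skewed start times do the staggering
for you without any SPAWN WAIT:

     MCC> ENABLE MCC 0 ALARMS RULE node1_REMOTE_NODE_STATE, AT START 09:00:00
     MCC> ENABLE MCC 0 ALARMS RULE node2_REMOTE_NODE_STATE, AT START 09:00:15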
s/rob
|