T.R | Title | User | Personal Name | Date | Lines |
---|
711.1 | NMCC or MCC | TOOK::W_MCGRATH | DTN 226-5075 | Wed Feb 13 1991 08:46 | 11 |
| Are you referring to NMCC (NMCC DECnet Monitor) or DECmcc? If you are
referring to the point product tool, NMCC, then try the NAC::NMCC notes
conference. NMCC cannot update the map reachability status based on
the amount of time a node is down.
If you are referring to DECmcc (the EMA product), you might want to ask
again here. My understanding is that as long as the attribute (in this
case, seconds the circuit has been down) is recognized by the AM, the
Alarms FM can alarm on the attribute (with >, <, =, etc...)
-Will-
|
711.2 | DECmcc | ULYSSE::ZITTA | Batchman | Thu Feb 14 1991 03:43 | 15 |
|
RE-1:
Sorry for the lack of precision.
I meant DECmcc.
Can somebody confirm this and explain precisely how to achieve it with DECmcc?
Another related question: if a circuit is down, I would
like to be sure that ACCOUNTING/BILLING is stopped during that time.
Is that possible?
Thanks,
Gerard
|
711.3 | Here is a hack! | WAKEME::ANIL | | Tue Feb 19 1991 19:46 | 46 |
| Re: .0
>
>
> Is it (or will it be) possible with NMCC to automatically give an
> alarm only when a CIRCUIT was DOWN for more than a given time
> (settable by user) ?
>
> Thanks,
>
> Gerard
Let me admit up front that the functionality you have requested does
not exist in V1.1 Alarms, nor is it planned for V1.2. Based on how
important you (and others) feel this functionality is, we could use
it as an input to V2.0.
Now having said the above, here is a hack. Jim Carry may want to
correct me if I am wrong, but from my limited understanding of DECNET
phase4 it should work.
There are a number of ways you can accomplish what you are trying to
do. I am suggesting just one way of doing it!
You can make one rule that watches for a Circuit Down Event from
that Circuit. The rule fires when the circuit goes down.
Let this trigger another session of MCC. (You can do this by invoking a
command procedure.)
In this other session of MCC, ZERO all the counters for this
circuit.
Then enable a rule that looks for the attribute "Byte sent" to
be = 0! If you want to make sure the circuit comes up within 10
minutes, start watching after 10 minutes. If the rule fires after
those 10 minutes, then you know for sure that the circuit was down
for 10 minutes. The rule expression for such a rule will be along
the following lines:
expression = (node4 <node-name> circuit <cir-name> byte sent = 0,
at start +00:10)
It is a hack, but then you don't have to wait till V2.0!
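A minimal Python sketch of the logic behind this hack, not DECmcc syntax. The Circuit class and the down_for_whole_delay function are hypothetical stand-ins for the real counters and rules; the sketch only shows why "bytes sent = 0" after a zeroing step implies the circuit never carried traffic during the delay.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    """Hypothetical stand-in for a node4 circuit and its counters."""
    bytes_sent: int = 0

    def zero_counters(self) -> None:
        self.bytes_sent = 0


def down_for_whole_delay(circuit: Circuit, traffic_during_delay: int) -> bool:
    """Return True if the delayed 'byte sent = 0' rule would fire.

    traffic_during_delay is the number of bytes the circuit sends between
    the Circuit Down event and the delayed check (0 if it never recovers).
    """
    circuit.zero_counters()                     # second session: ZERO the counters
    circuit.bytes_sent += traffic_during_delay  # whatever happens while waiting
    return circuit.bytes_sent == 0              # delayed rule: byte sent = 0


# Circuit stays down: the delayed rule fires.
print(down_for_whole_delay(Circuit(bytes_sent=500), 0))
# Circuit recovers and carries traffic: the delayed rule stays silent.
print(down_for_whole_delay(Circuit(bytes_sent=500), 1200))
```

Note that in this model a circuit that recovered but sent nothing at all during the delay would look the same as one that stayed down.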
- Anil Who-is-not-very-happy-with-his-own-answer!
|
711.4 | too bad! | ULYSSE::ZITTA | Batchman | Wed Feb 20 1991 08:44 | 19 |
|
Thanks Anil for the information.
Glad to hear such a functionality was already on the wishlist.
It would be *very* interesting for any network management center.
Right now the existing tools usually send so many "unuseful" alarms
that you can miss the real problems.
It would be nice to have a tool that can acknowledge the unuseful
events "by itself" and flag only the real problems on the map.
(An example of what I call "unuseful" is CIRCUIT DOWN (and all the
related events) followed closely by CIRCUIT UP.)
Anyway, thanks for your reply. We'll try to put a workaround in place
in the meantime.
Gerard
|
711.5 | Want it? I crave it! | SICKO::FRAZER | Vision operates on many levels... | Wed Feb 20 1991 18:49 | 6 |
| >> Based on how important you (and others) feel this functionality
>> is, we could use it as an input to V2.0.
In a word, VERY_IMPORTANT!
timf
|
711.6 | How might you express this type of Rule? | WAKEME::ROBERTS | Keith Roberts - DECmcc Alarms Team | Thu Feb 21 1991 09:14 | 28 |
| We agree that this type of functionality would be *very* important for
managing a network. How would it be expressed in an Alarm Rule?
Maybe:
------------------------------------------------------------------------
SEQUENCE( Entity-Event-1, Entity-Event-2, Duration )
Ex: (SEQUENCE(node4 A circuit qna-0 circuit down, -
node4 A circuit qna-0 circuit up, +00:10:00))
"If Event-1 occurs, followed by Event-2 within the Duration, tell me"
------------------------------------------------------------------------
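To make the proposed semantics concrete, here is a small Python sketch of how such a SEQUENCE test could be evaluated over time-stamped events. It is only an illustration: sequence_fires, the event tuples and the use of plain seconds for timestamps are assumptions, not part of any existing Alarms implementation.

```python
def sequence_fires(events, first, second, duration):
    """Return True if `second` follows `first` within `duration` seconds.

    events is a list of (timestamp_in_seconds, event_name) tuples,
    sorted by timestamp.
    """
    pending = None  # timestamp of the most recent unmatched `first`
    for stamp, name in events:
        if name == first:
            pending = stamp
        elif name == second and pending is not None:
            if stamp - pending <= duration:
                return True
            pending = None  # too late: wait for the next `first`
    return False


quick = [(0, "circuit down"), (240, "circuit up")]
slow = [(0, "circuit down"), (900, "circuit up")]
print(sequence_fires(quick, "circuit down", "circuit up", 600))
print(sequence_fires(slow, "circuit down", "circuit up", 600))
```

As written, the test stays silent when Event-2 never arrives, so it reports short outages rather than long ones.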
o Anyone have any other ideas?
o What should we be adding to the wish list for Alarm Rules?
- Please try to suggest how this could be written
in terms of an Alarm Rule Expression.
Thanks
Keith Roberts
DECmcc Alarms Team
|
711.7 | catch 22 | JETSAM::WOODCOCK | | Thu Feb 21 1991 12:26 | 36 |
| Hi,
What you're attempting is probably more difficult than meets the eye. Circuits
don't typically go down cleanly and then come up cleanly; they very often
bounce many times. Will your sequence example below be able to handle
multiple outages within that ten minutes? If not, you would have to reduce
the ten minutes: the smaller the window, the more accurate.
Here's the catch 22. What happens if the circuit goes down cleanly for
longer than 10 minutes? This means your sequence won't be satisfied.
Therefore you would have to increase the ten minutes: the larger the better.
Also, from a net mngmt standpoint, a steadily bouncing circuit has a far worse
impact on the net than a circuit which simply goes down and stays down for
the outage. Every time there is an event (up or down) a routing update occurs
which traverses and propagates through the net. This impact on routing can't be
measured to keep our vendors and our rebates honest. Although you appear to
be on the right track, there are practical pitfalls which I'm not certain how
to resolve.
regards,
brad...
>Maybe:
> ------------------------------------------------------------------------
> SEQUENCE( Entity-Event-1, Entity-Event-2, Duration )
> Ex: (SEQUENCE(node4 A circuit qna-0 circuit down, -
> node4 A circuit qna-0 circuit up, +00:10:00))
> "If Event-1 occurs, followed by Event-2 within the Duration, tell me"
|
711.8 | One shoe for all feet? | TOOK::KOHLS | Ruth Kohls | Thu Feb 21 1991 13:14 | 16 |
| Re: .7
It seems to me that you are trying to make too general a tool from one,
very specific, rule. You will still have to set up alarms and notifications
for each situation you want to cover--and a 10 minute period isn't going to fit
a lot of situations!
"Just" listing various things that circuits do when they aren't behaving well
is a starting point, and I suspect the Alarms team will find your description
helpful. It sounds like they may need an expert system or two, among other
things (;-).
I thought the question in .6 was more along the lines of: would an
expression *like* this make mcc alarms nicer for you?
Ruth Kohls
(Mcc Kernel Team)
|
711.9 | It sounds like AI to me 8) | WAKEME::ROBERTS | Keith Roberts - DECmcc Alarms Team | Thu Feb 21 1991 13:59 | 17 |
| There is an extension to the OCCURS function we've been planning:
OCCURS( Event, Count, Duration )
If 'Event' occurs 'Count' times within this 'Duration' ... let me know.
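One way to picture the planned semantics is a sliding time window over event timestamps. The Python sketch below is an assumption about how OCCURS(Event, Count, Duration) might behave, not the Alarms implementation; the function name occurs and the use of seconds are illustrative only.

```python
from collections import deque


def occurs(timestamps, count, duration):
    """Return True if `count` events fall within any `duration`-second window."""
    window = deque()
    for stamp in sorted(timestamps):
        window.append(stamp)
        # Drop events that are too old to share a window with `stamp`.
        while stamp - window[0] > duration:
            window.popleft()
        if len(window) >= count:
            return True
    return False


# Three circuit-down events in under ten minutes: a bouncing circuit.
print(occurs([0, 100, 200], count=3, duration=600))
# Three events spread over more than ten minutes: no alarm.
print(occurs([0, 400, 800], count=3, duration=600))
```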
But - I think Ruth hit the nail ... What describes a happy (or not so happy)
circuit? It can't be expressed as one simple statement.
We *want* to add AND's and OR's to the Rule Expression so you can combine
'thoughts' (not for v1.1 .. sorry)
Let's keep this discussion going -- from what I see DECmcc already has
one of the most powerful Alarming capabilities going.
/Keith
|
711.10 | Can an AM help us here? | WAKEME::ANIL | | Thu Feb 21 1991 16:44 | 21 |
| Generic solutions, if powerful enough, will solve many problems with
surprising elegance. However, I am afraid CIRCUIT DOWN is more
of a technology-specific problem. We can solve the problem of detecting
circuit bouncing with the OCCURS function and the extension
indicated in .-1 by Keith.
I was wondering if the AMs can provide a counter such as
entity down timer = xyz seconds
The above timer could be a generic timer, and any Entity AM could support
it in a generic fashion at any global/child entity level. Availability
of such a counter would make the Alarms job ridiculously easy! It would be
zeroed every time the entity (circuit) reaches a stable Enabled/On state!
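A rough Python sketch of such a timer, assuming the AM sees "down" and "up" state changes with a timestamp in seconds. The DownTimer class and its method names are hypothetical; no existing AM exposes this attribute.

```python
class DownTimer:
    """Hypothetical 'entity down timer' attribute, reported in seconds."""

    def __init__(self):
        self._down_since = None  # time of the first unrecovered "down"

    def on_event(self, state, now):
        if state == "down":
            if self._down_since is None:
                self._down_since = now  # keep the earliest down time
        elif state == "up":
            self._down_since = None     # zeroed when the entity is back up

    def seconds_down(self, now):
        return 0 if self._down_since is None else now - self._down_since


timer = DownTimer()
timer.on_event("down", now=100)
# An alarm rule could then be a plain threshold test on the attribute:
print(timer.seconds_down(now=800) > 600)
```

With an attribute like this, "down for more than x seconds" becomes an ordinary comparison of the kind Alarms already evaluates.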
Alarms could certainly use some extensions, like "trigger rule foo
if rule foobar fires". This would make it much easier to start detecting
a problem/condition only after an event (i.e. circuit down) reaches MCC.
- Anil
|
711.11 | clock it | JETSAM::WOODCOCK | | Thu Feb 21 1991 17:59 | 13 |
| The use of a generic timer seems to be an excellent approach. The clock
can be started by one event/alarm and stopped by another event/alarm,
and the elapsed time then reported. The expression should be able to handle
different events for start and stop. In this case it's circuit down and circuit
up, and it should allow for continuous monitoring of the events and not rely
on the Alarms FM to poll for info at intervals (the use of getevent). The
basenote wants to measure the outage times; my comments in .7 were simply to
point out that determining the absolute time of multiple events cannot easily
be done on an interval basis. It's not a working solution unless you gather
the events and work one time stamp against the other.
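Working one time stamp against the other could look like the Python sketch below. The event format, (seconds, "down" or "up") pairs sorted by time, and the function name outage_durations are assumptions made for illustration.

```python
def outage_durations(events):
    """Pair each 'down' with the next 'up' and return the outage lengths.

    events is a list of (timestamp_in_seconds, "down" | "up") tuples,
    sorted by timestamp. Repeated 'down' events during an outage are ignored.
    """
    durations = []
    down_at = None
    for stamp, name in events:
        if name == "down" and down_at is None:
            down_at = stamp
        elif name == "up" and down_at is not None:
            durations.append(stamp - down_at)
            down_at = None
    return durations


# A bounce followed by a longer outage with a repeated 'down' event.
events = [(0, "down"), (30, "up"), (45, "down"), (50, "down"), (650, "up")]
outages = outage_durations(events)
print(outages, sum(outages))
```

Because every event is used, multiple outages inside one interval are measured individually instead of being lost between polls.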
brad...
|
711.12 | | NSSG::R_SPENCE | Nets don't fail me now... | Fri Feb 22 1991 14:18 | 6 |
| In fact, when I read over the first proposal (several replies back)
of having (sequence(event-1, event-2, duration)), what popped into
my mind before I read the explanation was that the rule would fire
when the sequence occurred, and PASS the duration to the procedure.
s/rob
|
711.13 | from a user only to the experts | ULYSSE::ZITTA | Batchman | Mon Feb 25 1991 05:34 | 89 |
|
Maybe at this point in the discussion we can widen the debate.
In our group we remotely monitor, manage, operate and support internal
and external networks 24 hours a day, 365 days a year. An important area
to improve (I think) is transmission and physical layer management, and
my base note was just a small (but important) part of it.
Right now there is no easy and direct DEC tool to remotely monitor or
manage a line or a circuit. So here are some inputs that, I hope, will
help to find the best solution.
For "store and forward" networks it is not (very) important
to react quickly, but for "real-time" networks (where each lost bit
is worth $$$$$$$$ (-: ) we may need to detect a problem within,
let's say, 3 seconds at most, and troubleshoot (localize) it within
15 minutes at most, for instance.
As a minimum, for monitoring, we want to react as soon as a circuit has
been down for more than x seconds/minutes, and for troubleshooting we
want to be able to retrieve more details if needed. The point is to flag
only critical events at the monitoring center.
There are also different needs (basically for use NOW, DAILY or
MONTHLY):
- Monitoring of all the events if needed.
- Automatic storage and easy retrieval of the events.
- Detection of Priority #1 problems that need immediate attention or
  escalation to support.
- Preventive maintenance: monitor some quality parameters over a
  longer period (e.g. one month) and detect any degradation.
  This can be useful for dealing with carriers and planning upgrades.
- Error performance (ES, SES, DM).
- Testing: if a link fails, break down the DECmcc results/counters
  to allow a detailed analysis that will localize the problem.
  We need results at the end of a test period, daily or monthly, as
  well as an indication of error distribution over time.
The above needs can vary, depending on the "maturity" of the service:
o Development: functional conformance test
o Installation/commissioning/out-of-service repair:
  conformance and full quality validation
o In-service: quality of service monitoring
It can also depend on the regulations and the country. But a good
reference is the CCITT recommendations G.821, G.921 and M.550.
The following can give some ideas on what we need to manage digital
connections:
- G.821 defines a set of performance objectives to test 64 kbps lines.
  The parameters are:
  o ES (Errored seconds): any second during which an error occurs
    (unavailable time is excluded).
  o SES (Severely errored seconds): any 1-second interval during
    which the bit error ratio exceeds 10^-3 (unavailable time is
    excluded too).
  o DM (Degraded minutes): any 1-minute interval during which the
    bit error ratio exceeds 10^-6 (SES and unavailable time are
    excluded).
  Unavailable time begins when the bit error ratio in each second exceeds
  10^-3 for 10 consecutive seconds. These 10 seconds are unavailable
  time.
- G.921 is for 2048 kbps.
- M.550 sets limits for ES, SES and DM according to the application
  and defines the times over which they should be measured.
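As a rough illustration of the G.821 parameters described above, the Python sketch below classifies a list of per-second bit error counts on a 64 kbps line into ES, SES and unavailable seconds. It is a simplification under stated assumptions: the severe threshold is read as a bit error ratio of 10^-3, degraded minutes are left out, and unavailable time is taken to end as soon as one non-severe second appears, which is simpler than the rule in the recommendation. The name classify_seconds and the input format are hypothetical.

```python
BIT_RATE = 64_000      # bits per second, the line rate named above
SES_THRESHOLD = 1e-3   # bit error ratio above which a second is severe
UNAVAILABLE_RUN = 10   # consecutive severe seconds that count as unavailable


def classify_seconds(errors_per_second):
    """Return (errored_seconds, severely_errored_seconds, unavailable_seconds)."""
    severe = [e / BIT_RATE > SES_THRESHOLD for e in errors_per_second]
    unavailable = [False] * len(severe)
    run_start = None
    # A trailing False acts as a sentinel that closes a run at the end.
    for i, is_severe in enumerate(severe + [False]):
        if is_severe and run_start is None:
            run_start = i
        elif not is_severe and run_start is not None:
            if i - run_start >= UNAVAILABLE_RUN:
                for j in range(run_start, i):
                    unavailable[j] = True
            run_start = None
    # ES and SES are counted only over available time.
    es = sum(1 for e, u in zip(errors_per_second, unavailable) if e > 0 and not u)
    ses = sum(1 for s, u in zip(severe, unavailable) if s and not u)
    return es, ses, sum(unavailable)


# One lightly errored second, one isolated severe second, then a 10-second burst.
sample = [0, 1, 100, 0] + [100] * 10 + [0]
print(classify_seconds(sample))
```

Daily or monthly totals of these counts could then be compared against limits of the kind M.550 defines.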
The best would be a system that lets network managers view results
graphically on a daily basis, to quickly see which results exceed the
daily limits.
Those limits should be programmable (entered manually, or calculated
by DECmcc from other data such as link grade, length or circuit
classification). They should satisfy the CCITT recommendations or
user-defined limits.
Summary:
As a minimum, I would like DECmcc to tell me when there is a "real"
problem.
As a complement, I would like DECmcc to perform at least some basic
functions of a data communications analyzer.
I guess another question is: is it possible for the transmission
line to be a manageable object in DECmcc?
gerard
|