T.R | Title | User | Personal Name | Date | Lines |
---|
1194.1 | Try SHOW first. If that works we may have a bug in Alarms! | WAKEME::ANIL | | Thu Jun 27 1991 17:54 | 17 |
| Hi Brad,
As usual you were the first to try Alarms on past data. I don't see
why Alarms should not be able to handle the case. I did not document
it just because I felt it would be a very difficult concept for a user to
grasp and its usefulness was questionable.
Now that you have tried it with not-so-happy results, can I ask you to
try it for just pure data and not stats. Also another question,
were you able to do a "show" on the data that you were trying to
Alarm on?
Let us know your findings.
- Anil Navkal
|
1194.2 | avoids polling | JETSAM::WOODCOCK | | Fri Jun 28 1991 13:57 | 50 |
| Hi Anil,
Now that I know it *should* work I'll dig in and see what I find. As far
as its usefulness goes, I've got some very real needs for it. One complaint about MCC
that I've heard is its complexity. I'm not sure if this should be a separate
topic or not but I'm hoping MCC managers are listening. Please bear with me
as I get a little long winded as I throw real numbers out on the table.
There are a couple of issues at hand which MCC should try to improve: the number
of rules (i.e. complexity) and the amount of WAN polling necessary to manage the
WAN. I know there have been some procedures to help write the rules but this
should probably be taken a step further to some sort of definable default
services for the different entity classes.
The numbers game...
I've got 50 routers and 80 circuits (reality) to manage. If I compare a
similar setup to what is used today to manage the net I come up with the
following amounts of rules and polling. This is very conservative and
nothing fancy.
A rule for each router (50) dealing with circuit outages (events, no polling).
A rule for each site (12) dealing with node outages (events, no polling).
A rule for each circuit (80) for off hours circuit monitoring each 15 min.
(320 polls/hr)
Export or Record mainly *just* counters for each circuit (80) and each
router (50) only once an hour (130 polls/hr).
A rule for each circuit for utilization and error threshold as a warning.
One each for inbound, outbound, and errors: 80 circuits, 240 rules polling
at hourly intervals. PA polls twice per interval therefore 480 polls/hr.
A rule for each circuit for utilization and error threshold as a problem.
One each for inbound, outbound, and errors: 80 circuits, 240 rules polling
at hourly intervals. PA polls twice per interval therefore 480 polls/hr.
A rule for each router (50) for packet throughput each hour. 100 polls/hr.
Grand totals are in the neighborhood of 670 rules and 1510
polls/hr...conservatively!!! No can do. I may not be able to get the number
of rules down but if I use past time for non real-time needs (ie. all stats)
I can reduce the polling by more than 1000 polls/hr.
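The totals above can be checked with a short tally. This is only a sketch of the arithmetic: the variable names are illustrative rather than MCC identifiers, and treating the threshold and throughput rules as the "stats" that could move to past time is an assumption about which items are meant.

```python
# Tally of the rule and polling estimates listed above.
# Names are illustrative only; they are not MCC identifiers.
ROUTERS, SITES, CIRCUITS = 50, 12, 80
PA_POLLS_PER_INTERVAL = 2  # PA polls at the start and end of each interval

# (description, rules, polls per hour, is a statistics item)
workload = [
    ("circuit outage rule per router (events)", ROUTERS, 0, False),
    ("node outage rule per site (events)", SITES, 0, False),
    ("off-hours circuit monitoring, every 15 min", CIRCUITS, CIRCUITS * 4, False),
    ("export/record counters, hourly", 0, CIRCUITS + ROUTERS, False),
    ("warning thresholds (in, out, errors)", CIRCUITS * 3,
     CIRCUITS * 3 * PA_POLLS_PER_INTERVAL, True),
    ("problem thresholds (in, out, errors)", CIRCUITS * 3,
     CIRCUITS * 3 * PA_POLLS_PER_INTERVAL, True),
    ("packet throughput per router, hourly", ROUTERS,
     ROUTERS * PA_POLLS_PER_INTERVAL, True),
]

total_rules = sum(rules for _, rules, _, _ in workload)
total_polls = sum(polls for _, _, polls, _ in workload)
# Assumption: only the statistics items could be evaluated from recorded data.
stats_polls = sum(polls for _, _, polls, is_stat in workload if is_stat)

print(total_rules, total_polls, stats_polls)
```

Under these assumptions the statistics items alone account for the "more than 1000 polls/hr" that evaluating recorded data would avoid.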
While I love the versatility of this product it does come at a price. Other
large companies will also have to grapple with these numbers and look to cut
back on certain non-essentials to make the management manageable.
In any event, I'll let you know of my successes with alarms for past times.
best regards,
brad...
|
1194.3 | prob parsing 'in domain ...' | LUVBOT::MCC | | Mon Jul 01 1991 12:01 | 24 |
| In doing a SHOW command the domain needs to be specified. But if I put the
"in domain" qualifier into the expression ALARMS doesn't appear to parse it
as needed. Bug or unsupported???
thanks,
brad...
create mcc 0 alarms rule past_count -
expression=(node4 bbpk01 cir syn-0 circuit down>0,for start 16:30 -
,in domain .pko-24),-
procedure=mcc_common:mcc_alarms_mail_alarm.com,parameter=mcc,-
in domain .pko-24
!
!MCC 0 ALARMS RULE past_count
!AT 28-JUN-1991 16:50:43
!
!Missing right parenthesis in alarm expression.
!
exit
!
|
1194.4 | Both - bug and unsupported | TOOK::ORENSTEIN | | Mon Jul 01 1991 15:22 | 22 |
| >>> In doing a SHOW command the domain needs to be specified. But if I put the
>>> "in domain" qualifier into the expression ALARMS doesn't appear to parse it
>>> as needed. Bug or unsupported???
Both.
In investigating this, I found a bug in the parse routine for
prepositions. A QAR has been filed.
Also, I discovered that ALARMS does not support the IN DOMAIN
qualifier in expressions. Currently there is no support for
examining data on a domain basis. A QAR has been filed.
I saw your math that states that polling historical data could
save you 1000 polls on your network, but I did have some trouble
understanding that. How important is this to you?
I will see what can get done for V1.2, but I make NO promises.
aud...
|
1194.5 | It IS important! | NSSG::R_SPENCE | Nets don't fail me now... | Mon Jul 01 1991 16:16 | 32 |
| The savings in polling is very important. 1000 polls per hour
translates to an average of about 17 per minute. It will take a very big
system to support that and still leave some resources to deal with
alarms testing true or exception handling, not to mention
any management actions initiated by people.
Where is the savings? Well, for example...
To export data on a node4, (router for example), the Ethernet line
and circuit plus the 4 sync lines and circuits, I have to poll the
router 15 times (maybe more?).
The same sort of number comes up for the Historical Recording. I don't
know what happens if you specify several partitions (Brad, you might
want to make sure you record and then export characteristics too in
case you want to do any external reporting that needs line speeds).
Then, if we want to have alarm rules for % utilization inbound and
outbound plus errors, we add another 15 polls.
That adds up to 45 polls per router for each time we want all this
stuff. If the Historian could record it all with a minimum number
of polls and then export and alarms use the recorded data the polling
could perhaps be reduced from 45 to 10 or less.
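As a rough check of the per-router figures above, the arithmetic can be written out as follows. This is a sketch only: the 15, 45 and 10 come from the text, while applying the saving to 50 identically configured routers (the router count from .2) is an assumption made purely for illustration.

```python
# Per-router polling from the example above (figures taken from the post).
POLLS_PER_FUNCTION = 15  # polls each function issues against one router
FUNCTIONS = ("export", "historical recording", "alarm rules")

separate = POLLS_PER_FUNCTION * len(FUNCTIONS)  # every function polls on its own
shared = 10                                     # the "10 or less" target
saving_per_router = separate - shared

# Assumption: 50 routers (the count quoted in .2), all configured alike.
ROUTERS = 50
saving_all_routers = saving_per_router * ROUTERS

print(separate, saving_per_router, saving_all_routers)
```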
All the RFIs and RFPs I am seeing these days on Network Management
are actually asking us what the traffic level that the management
system will add to the network is. We need to be able to minimize
that traffic.
Hope this clears it up.
s/rob
|
1194.6 | suggestions | JETSAM::WOODCOCK | | Mon Jul 01 1991 17:08 | 38 |
| Hi,
> The same sort of number comes up for the Historical Recording. I don't
> know what happens if you specify several partitions (Brad, you might
> want to make sure you record and then export characteristics too in
> case you want to do any external reporting that needs line speeds).
Actually, I was planning on getting LINE characteristics only once a day for
each circuit to handle reports. Like I said, the numbers were conservative.
As far as what's needed, I'm going to look for different approaches to get the
job done with V1.1. I'll probably end up scaling back info (one threshold for
errors rather than two) and hack something together that partially uses MCC.
But I'm willing to bet big bucks that if our customers understood the mechanics
they wouldn't be happy.
Suggestions, I've got three:
1. Bring in support for ALARMS handling historical data (in specific domains),
and fix the parsing bug. VERY IMPORTANT.
2. Change the way PA operates today. This was a previous suggestion but worth
mentioning several times :-). Rather than having PA poll at the beginning
and end of each interval, have PA poll once each interval and subtract
last_counters from present_counters for calculations (also holds true for
reports). This solves two problems. It effectively reduces the number of
polls by half. Also, as the polling interval decreases, MCC's accuracy becomes
more dependent on both system and network performance with today's method
because the polling must be accurate. If you use the one poll method the
polls could be off but the stats are always on the money. A must in my
opinion.
3. As a bonus set up a utility which handles default (user definable) services
for different entity classes (ie. alarms, stats). This hides some of the
complexity of the management environment.
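Suggestion 2 can be sketched as follows. This is a minimal illustration of computing statistics from consecutive single polls, not a description of how PA is implemented; the `Sample` type and `rate_between` function are hypothetical names, and counter wrap-around is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds, when the poll actually completed
    counter: int      # cumulative counter value read from the entity


def rate_between(previous: Sample, current: Sample) -> float:
    """Average rate per second between two consecutive polls."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Dividing by the measured elapsed time keeps the result correct
    # even when a poll arrives early or late.
    return (current.counter - previous.counter) / elapsed


# Three hourly polls; the second one arrives 90 seconds late.
samples = [Sample(0.0, 1_000), Sample(3690.0, 370_000), Sample(7200.0, 721_000)]
rates = [rate_between(a, b) for a, b in zip(samples, samples[1:])]
print(rates)
```

With this approach N intervals need N + 1 polls rather than 2N, and a late poll changes the length of the interval it closes rather than the accuracy of the rate.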
best regards,
brad...
|
1194.7 | I have a dream... | WAKEME::ANIL | | Tue Jul 02 1991 13:40 | 47 |
| Hi Rob and Brad,
Thanks for the valuable data about number of rules needed to manage
a reasonably sized network. We will look *very* seriously at providing
domain support in rule expressions, but we also have to face the
reality of available (or lack thereof!) people power.
Talking along the lines of suggestions, from the user's point of view
the following thought makes a hell of a lot of sense:
Record the following attributes for entity foo every
1 hour and,
by the way, let me know if the attributes cross the thresholds
indicated below.
List of attribute partitions to record:
  o Characteristics
  o Counters
  o Status

List of attributes for thresholds:
             Threshold values             Change
Attribute    upper bound   lower bound    from      to
aaa          10            20
bbb          30            40
ccc                                       Enable    Disable
ddd                                       router    non router
eee          15.5          20.5
Yes, I know I am dreaming for now. But I do want to make two points.
1. There is no reason why we cannot evaluate the data as it is being
collected. Yes, that does mean Alarms and Historian have to communicate
a lot! But look at the advantage. We need not poll twice for the
same data, nor do we have to wait for the data to be in the MIR.
2. A very simplified user interface that does not need 100 rules to
monitor 100 attributes! Thus saving on the resources.
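The two points above can be sketched in a few lines. This is a hypothetical illustration, not MCC code: `record_and_evaluate`, the `checks` table and the sample values are invented for the example, and the two threshold values per attribute are treated as a simple range.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def record_and_evaluate(
    poll: Callable[[], Sample],
    checks: Dict[str, Callable[[Any], bool]],
    history: List[Sample],
) -> List[str]:
    """Poll once, store the sample, and return the attributes whose check fired."""
    sample = poll()
    history.append(sample)  # the recording step
    return [name for name, check in checks.items()
            if name in sample and check(sample[name])]


poll_count = {"n": 0}


def fake_poll() -> Sample:
    # Stand-in for a real poll of entity foo; the values are made up.
    poll_count["n"] += 1
    return {"aaa": 25, "bbb": 35, "ccc": "Disable"}


# One table of checks instead of one rule per attribute.
checks = {
    "aaa": lambda v: not (10 <= v <= 20),  # outside the 10..20 range
    "bbb": lambda v: not (30 <= v <= 40),  # outside the 30..40 range
    "ccc": lambda v: v == "Disable",       # changed to Disable
}

history: List[Sample] = []
fired = record_and_evaluate(fake_poll, checks, history)
print(fired, poll_count["n"], len(history))
```

A single poll feeds both the recording and the threshold checks, which is the saving point 1 describes.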
I know all this is hindsight. I only hope it becomes foresight
for the future!!
- Anil Navkal
|
1194.8 | Ain't that the truth | TOOK::ORENSTEIN | | Tue Jul 02 1991 14:46 | 6 |
|
Now that sounds like true integration of Network Management
Products!
aud...
|
1194.9 | Yup | NSSG::R_SPENCE | Nets don't fail me now... | Tue Jul 02 1991 14:55 | 9 |
| Anil, exactly...
And add to it integration of Export as well.
Seems like there should be a "data gatherer FM" that gets called for
entity data and by using fuzzy logic it could reduce the network
traffic needed for management by consolidating requests for data.
s/rob
|
1194.10 | Deja vu! | DFLAT::PLOUFFE | Jerry | Tue Jul 02 1991 16:07 | 18 |
|
> Seems like there should be a "data gatherer FM" that gets called for
> entity data and by using fuzzy logic it could reduce the network
> traffic needed for management by consolidating requests for data.
This is exactly what is needed. We used to call this a "subscription
service" and it was talked about many moons ago. I'm glad to see it
brought back to light. Hopefully Brad's numbers will provide the
necessary justification.
We did not call it an FM; we called it a "service" since we thought of it
as being part of the IM. After all, the IM handles all scheduling of
operations (including SHOWs) so it possibly could implement the "fuzzy
logic" that you mentioned.
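One way to picture such a subscription service is sketched below. It is a hypothetical design, not the IM's actual interface: the class and method names and the `router-1` entity are invented, and the "fuzzy logic" for merging requests with different intervals is not modelled. It only shows several consumers sharing one poll.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Tuple

Key = Tuple[str, str]  # (entity, attribute partition)
Data = Dict[str, Any]


class SubscriptionService:
    """Coalesces requests so each (entity, partition) is polled once per cycle."""

    def __init__(self, poll: Callable[[str, str], Data]) -> None:
        self._poll = poll
        self._subscribers: DefaultDict[Key, List[Callable[[Data], None]]] = defaultdict(list)

    def subscribe(self, entity: str, partition: str,
                  callback: Callable[[Data], None]) -> None:
        self._subscribers[(entity, partition)].append(callback)

    def run_cycle(self) -> int:
        """Poll every subscribed key once, fan the data out, return polls issued."""
        for (entity, partition), callbacks in self._subscribers.items():
            data = self._poll(entity, partition)
            for callback in callbacks:
                callback(data)
        return len(self._subscribers)


polls: List[Key] = []


def fake_poll(entity: str, partition: str) -> Data:
    # Stand-in for a real network poll; the returned value is made up.
    polls.append((entity, partition))
    return {"octets_in": 1234}


received: List[Tuple[str, Data]] = []
service = SubscriptionService(fake_poll)
for consumer in ("alarms", "historian", "export"):
    service.subscribe("router-1", "counters",
                      lambda data, who=consumer: received.append((who, data)))

issued = service.run_cycle()
print(issued, [who for who, _ in received])
```

Three consumers asking for the same counters cost one poll instead of three.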
Whatever the design, it certainly seems to be necessary...
- Jerry
|
1194.11 | A couple of solutions | TOOK::ORENSTEIN | | Tue Jul 09 1991 14:25 | 28 |
|
Back to the original topic: Can ALARMS do rules on historical data?
re .3
There are two possibilities for allowing this:
1. Let the user decide:
As you did in your example, we could allow the IN DOMAIN qualifier in
the expression -- the bug you found could be fixed? But this may be
confusing because you could be in domain A and have rules on data
recorded from domain B.
2. Make it transparent:
The domain in which the rules are ENABLED could be used for
determining the domain of the recorded data. In this case, the IN
DOMAIN qualifier will not be allowed in a rule expression; but, when
using the MAP everything will be transparent since the domain
is implicit on every command. This means that you can be in Domain A
and any rules that you Enable will only watch entities in Domain A.
I prefer possibility 2.
Feedback?
aud...
|
1194.12 | either method ok | JETSAM::WOODCOCK | | Tue Jul 09 1991 15:21 | 6 |
| I think method two would be sufficient for our needs. Although someone down
the road might find uses for the first method depending on the domain structure
and how they intend to use it.
regards,
brad...
|
1194.13 | Clarification on .-2 | WAKEME::ANIL | | Wed Jul 10 1991 09:29 | 26 |
| Before anyone jumps at us I would like to clarify the following point:
> 2. Make it transparent:
>
> The domain in which the rules are ENABLED could be used for
> determining the domain of the recorded data. In this case, the IN
> DOMAIN qualifier will not be allowed in a rule expression; but, when
> using the MAP everything will be transparent since the domain
> is implicit on every command. This means that you can be in Domain A
> and any rules that you Enable will only watch entities in Domain A.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What Alarms will do is issue a SHOW/GETEVENT directive with
IN_Q filled in. The MM can then choose to ignore the IN_Q
qualifier and provide the information for the entity. What
this means is that if the node foo is *not* a member of Domain
A, Alarms will still get the data to evaluate the rule
as long as past time has not been specified.
If you do specify past time, the Historian will have to have
collected the data for the node foo, which in turn
will have to be a member of Domain A! (Boy, is it complicated!!)
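The cases above can be summarised as a small decision function. This is a sketch of the behaviour as described in this reply, not Alarms code: the function and parameter names are hypothetical, and the live case assumes the MM does choose to ignore IN_Q.

```python
def rule_data_source(past_time: bool, in_domain: bool, recorded: bool) -> str:
    """Where a rule's data would come from under the behaviour described above."""
    if not past_time:
        # Live SHOW/GETEVENT: assuming the MM ignores IN_Q,
        # domain membership does not matter.
        return "live"
    if in_domain and recorded:
        # Past time: the Historian must have recorded the entity,
        # which requires it to be a member of the domain.
        return "historian"
    return "no data"


print(rule_data_source(past_time=False, in_domain=False, recorded=False))
print(rule_data_source(past_time=True, in_domain=True, recorded=True))
print(rule_data_source(past_time=True, in_domain=False, recorded=False))
```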
Hope this helps. ;)
- Anil
|