
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

1781.0. "Alarm Pre-Processing etc." by SUBWAY::REILLY (Mike Reilly - New York Bank District) Thu Nov 07 1991 13:59

    As the trouble ticket note is generating so much heat :-) I decided to
    move my current pet topic out here.

    It is my understanding that the alarms team are adding a number of 
    new features to the v1.2 product.  Could somebody describe exactly
    what extra functionality will be provided? 

    Having made a number of attempts to implement a large rule base for
    DECmcc, I now feel that we must plan for some form of alarm
    pre-processing in the near future. Hearing that many of our customers
    are planning to have several hundred polls per hour means that
    customers will be screaming for this functionality in a few months.  At 
    the moment the only reason most of our customers are not polling every
    LAN bridge etc. on their network is that it is too much work to key in all
    these alarm rules and they haven't the time to write automatic load scripts.
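The "automatic load scripts" mentioned above could start as simple template expansion: one set of alarm rules per managed object class, instantiated for every entity in an inventory. The following is a minimal Python sketch under stated assumptions: the inventory, the class names, the `TEMPLATES` table and the `AlarmRule` record are all hypothetical, and the condition strings are placeholders rather than real DECmcc rule expressions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical per-class templates: (rule suffix, condition text, severity).
# The condition strings are placeholders, NOT real DECmcc rule syntax.
TEMPLATES: Dict[str, List[Tuple[str, str, str]]] = {
    "bridge": [
        ("stp_change", "{entity}: spanning tree topology change", "warning"),
        ("unreachable", "{entity}: not reachable", "warning"),
    ],
    "router": [
        ("unreachable", "{entity}: not reachable", "warning"),
    ],
}


@dataclass(frozen=True)
class AlarmRule:
    name: str
    entity: str
    condition: str
    severity: str


def expand_rules(inventory: List[Tuple[str, str]],
                 templates: Dict[str, List[Tuple[str, str, str]]]) -> List[AlarmRule]:
    """Instantiate every template of an entity's class for that entity.

    Entities whose class has no templates are skipped.
    """
    rules = []
    for entity, obj_class in inventory:
        for suffix, condition, severity in templates.get(obj_class, []):
            rules.append(AlarmRule(
                name=f"{entity}_{suffix}",
                entity=entity,
                condition=condition.format(entity=entity),
                severity=severity,
            ))
    return rules


# Hypothetical inventory of (entity name, managed object class) pairs.
inventory = [("B01", "bridge"), ("B02", "bridge"), ("R01", "router"), ("PC7", "pc")]
for rule in expand_rules(inventory, TEMPLATES):
    print(rule.name, "|", rule.condition, "|", rule.severity)
```

Adding a bridge to the network then means adding one inventory line rather than keying in each rule by hand.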
     
    It has been implied in another note that a customer who has hundreds of
    alarms triggering at the same time has done something wrong.  I don't agree
    with this.  If I create a rule which will watch every bridge on my LAN
    for a spanning tree change, then it is quite possible that all bridges
    will undergo a change within a few minutes.  The fact that a single bridge
    undergoes a spanning tree change is important and should be reported.  If,
    however, 200 bridges have a spanning tree change, then these alarms must be
    treated differently.  
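The distinction drawn above (one bridge reporting a spanning tree change versus 200 of them) can be expressed as a count of distinct entities per event type within a time window. A minimal sketch, assuming alarms arrive as hypothetical `(timestamp, entity, event_type)` tuples; the window length and threshold are illustrative values, not recommendations:

```python
from collections import defaultdict


def collapse_storms(alarms, window_s, threshold):
    """Group alarms into fixed (tumbling) time windows; within each window,
    replace an event type reported by `threshold` or more distinct entities
    with a single summary alarm, and pass smaller groups through unchanged.

    alarms: iterable of (timestamp_seconds, entity, event_type).
    """
    windows = defaultdict(lambda: defaultdict(set))
    for ts, entity, event_type in alarms:
        windows[int(ts // window_s)][event_type].add(entity)

    out = []
    for w in sorted(windows):
        start = w * window_s
        for event_type in sorted(windows[w]):
            entities = sorted(windows[w][event_type])
            if len(entities) >= threshold:
                out.append({"kind": "summary", "event": event_type,
                            "count": len(entities), "window_start": start})
            else:
                for entity in entities:
                    out.append({"kind": "single", "event": event_type,
                                "entity": entity, "window_start": start})
    return out


# Four bridges report a spanning tree change within one 5-minute window;
# one bridge reports another change later on its own.
alarms = [(10, "B1", "stp_change"), (20, "B2", "stp_change"),
          (30, "B3", "stp_change"), (40, "B4", "stp_change"),
          (50, "R1", "unreachable"), (400, "B1", "stp_change")]
for a in collapse_storms(alarms, window_s=300, threshold=3):
    print(a)
```

Fixed windows keep the sketch short but can split one burst across a window boundary; a sliding window avoids that at the cost of keeping more state.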

    Another cause of a large number of alarms is the classic network
    management problem of declaring everything on the network broken just
    because the management station cannot see it. In DECmcc lingo we call
    these events 'exceptions'.  If an intermediate device between the
    management station and the device being monitored dies, then an
    exception will be generated for the remote device.  If the component
    was a router which links a remote site, then all devices being monitored
    at the remote site will generate exceptions.  Because of the volume of
    exceptions that can be generated, my current customer will no longer allow
    mail messages etc. to be sent when exceptions occur. Instead they
    rely on the failed device rebooting; they are then alerted because
    its 'seconds operating' timer will have a low value.  This is far from
    a good solution, as devices that never return to the network are not
    reported. 
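One way to close the gap described above, so that devices which never return are reported as well, is to remember when each device first became unreachable and report it once it has stayed unreachable for some period. A minimal sketch; `ReachabilityTracker`, its method names and both thresholds are hypothetical, and the polling itself is assumed to happen elsewhere:

```python
class ReachabilityTracker:
    """Remember when each device first failed a poll, so that devices which
    never come back can be reported, not just those that reboot."""

    def __init__(self, missing_after_s, reboot_uptime_s):
        self.missing_after_s = missing_after_s  # report after this long unreachable
        self.reboot_uptime_s = reboot_uptime_s  # uptime below this => recent reboot
        self.first_failure = {}                 # entity -> time of first failed poll

    def poll_failed(self, entity, now):
        # Keep the time of the FIRST failure; later failures don't reset it.
        self.first_failure.setdefault(entity, now)

    def poll_ok(self, entity, seconds_operating):
        """Clear any outstanding failure; return 'rebooted' if uptime is low."""
        self.first_failure.pop(entity, None)
        return "rebooted" if seconds_operating < self.reboot_uptime_s else None

    def still_missing(self, now):
        """Devices that have been unreachable for at least missing_after_s."""
        return sorted(e for e, t in self.first_failure.items()
                      if now - t >= self.missing_after_s)


t = ReachabilityTracker(missing_after_s=600, reboot_uptime_s=300)
t.poll_failed("B1", 0)
t.poll_failed("B2", 0)
print(t.poll_ok("B2", seconds_operating=45))  # B2 came back after a reboot
print(t.still_missing(700))                   # B1 never returned
```

The low-uptime check is kept, so rebooted devices are still flagged as before, but a device that stays away is now reported too.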

    Given that a large number of alarms or exceptions will occur, we now need
    a way of pre-processing these alarms.  I don't believe that much of
    this processing should take place within the Alarms FM. The Alarms
    FM should concentrate on providing the ability to collect large quantities 
    of raw alarm data in a very efficient manner and perform minimal data 
    reduction.  A separate FM should be able to take the raw data, 
    perform some pre-processing and then feed a 'synthetic' alarm back into 
    the system.  This FM should be rule based and its rule base exposed to
    the customers.  We already have some of the code to add some real
    analysis to this FM.  The ELM developers have developed a piece of code which
    can automatically generate the LAN's spanning tree, and can add the
    functionality to determine which devices reside on each segment.  If
    that map of the network is kept in memory within the Alarm
    Pre-Processing FM, then problems due to bridge failures can be
    detected by noting which devices are causing exception messages. 
    Exception messages that are merely a consequence of the failed device
    can then be filtered out. 
    By building similar internal maps for the routing tables of the various 
    protocols, router failures and the associated 'exception' messages
    caused by nodes becoming unreachable can be correlated to determine which
    component failed.      
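The filtering described above can be sketched as a walk up the path from each excepting device toward the management station: if some device further up is itself raising exceptions, the downstream exception is explained by it and can be suppressed. A minimal sketch, assuming the in-memory map is a hypothetical `parent` dictionary (device to next device toward the station) built from the spanning tree or routing tables; the device names are made up:

```python
def correlate_exceptions(parent, failing):
    """Split devices raising exceptions into probable root causes and
    exceptions explained by a failed upstream device.

    parent:  dict device -> next device on the path toward the management
             station (None if the station reaches it directly).
    failing: set of devices currently raising exceptions.
    """
    roots, suppressed = set(), set()
    for dev in failing:
        node, seen = parent.get(dev), {dev}
        explained = False
        while node is not None and node not in seen:  # `seen` guards against a bad map
            if node in failing:
                explained = True
                break
            seen.add(node)
            node = parent.get(node)
        (suppressed if explained else roots).add(dev)
    return roots, suppressed


# Router R1 links a remote site holding H1 and H2; H3 hangs off bridge B2.
parent = {"B1": None, "R1": "B1", "H1": "R1", "H2": "R1", "B2": None, "H3": "B2"}
roots, suppressed = correlate_exceptions(parent, {"R1", "H1", "H2", "H3"})
print(sorted(roots), sorted(suppressed))
```

This relies on the map matching the network's current topology; if paths change, the map has to be rebuilt before the correlation results can be trusted.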

    If this capability is not considered to be an important component of
    DECmcc then I would suggest that the DECmcc kit be shipped with a piece
    of code or an easy API which will allow customers to intercept Alarms
    on their way to the Notification FM, and allow customers to insert their
    own alarms into the system. 
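A customer-visible interception point of the kind requested above could be as simple as a chain of hooks, each of which may drop an alarm, pass it on, or add synthetic alarms. A minimal sketch of that idea only; `run_interceptors` and the two example hooks are hypothetical and are not a DECmcc interface:

```python
from typing import Callable, Dict, List

Alarm = Dict[str, str]
Interceptor = Callable[[Alarm], List[Alarm]]


def run_interceptors(alarm: Alarm, interceptors: List[Interceptor]) -> List[Alarm]:
    """Pass an alarm through each hook in turn. A hook returns a list:
    [] drops the alarm, [alarm] passes it on, and extra entries inject
    new (synthetic) alarms that later hooks also see."""
    batch = [alarm]
    for hook in interceptors:
        batch = [out for a in batch for out in hook(a)]
    return batch


# Example customer hooks (hypothetical).
def drop_informational(a: Alarm) -> List[Alarm]:
    return [] if a.get("severity") == "informational" else [a]


def flag_core_devices(a: Alarm) -> List[Alarm]:
    if a.get("entity", "").startswith("CORE"):
        return [a, {**a, "severity": "critical", "synthetic": "yes"}]
    return [a]


hooks = [drop_informational, flag_core_devices]
print(run_interceptors({"entity": "CORE_R1", "severity": "warning"}, hooks))
print(run_interceptors({"entity": "B7", "severity": "informational"}, hooks))
```

Whatever the chain returns would then be handed on toward notification; the original alarm survives alongside any synthetic one unless a hook explicitly drops it.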

    With many of our competitors adding intelligence to their network
    monitoring probes, we have a chance to add intelligence at the
    'enterprise' level, which will be a real marketing advantage.  In a few
    years we will be able to detect that a TP application is building up
    a large I/O queue on a network link and be able to look at the link and
    see if there are transmission problems there.  Systems that can
    perform these functions will have high dollar value. 
    
    - Mike

T.R	Title	User	Personal Name	Date	Lines
1781.1	More requirements...	RIVAGE::SILVA	Carl Silva - Telecom Eng - DTN 828-5339	Fri Nov 08 1991 06:03	66

    > As the trouble ticket note is generating so much heat :-) I decided to
    > move my current pet topic out here.

    Well, it's that the trouble ticket functionality seems to be a separate
    domain from the alarm correlation/pre-processing. I know that a lot of
    RFPs come in with the two areas lumped together; however, you could have
    a trouble ticket management/tracking system without the alarm
    correlation (tickets created manually).

    > Having made a number of attempt to implement a large rule base for
    > DECmcc I now feel that we must plan for some form of alarm
    > pre-processing in the near future. Hearing that many of our customers
    > are planning to have several hundred polls per hours means that
    > customers will be screaming for this functionality in a few months. At
    > the moment the only reason most of our customers are not polling every
    > LAN bridge etc. on their network is that it is too much work to key in all
    > these alarm rules and they haven't the time to write automatic load scripts.

    Yes, it would be nice to have the alarm rules management functionality
    expanded so that alarm templates could be defined and automatically
    associated with managed object classes.

    > It has been implied in another note that a customer who has hundreds of
    > alarms triggering at the time has done something wrong.

    I don't know if it was me, but I didn't mean to imply that the customer
    was doing something wrong. A fiber optic cable cut can generate many
    alarms. All I was saying is that the types of alarms and their
    distribution over time should be looked at to detect patterns that could
    be suppressed or pre-processed.

    > Given that large number of alarms or exceptions will occur, we now need
    > a was of pre-processing these alarms. I don't believe that much of
    > this processing should take place within the Alarms FM. The Alarms
    > FM should concentrate on providing the ability to collect large quantities
    > of raw-alarm data in a very efficient manor and perform minimal data
    > reduction. A separate FM should be able to take the raw data,
    > perform some pre-processing and then feed a 'synthetic' alarm back into
    > the system. This FM should be rule based and it's rule base exposed to
    > the customers. We already have some of the code to add some real
    > analysis to this FM. The ELM developers have developed a piece of code which
    > can automatically generate the LAN's spanning tree, and can add the
    > functionality to determine which devices resides on each segment. If
    > that map of the network is kept in memory within the Alarm
    > Pre-Processing FM then problems due to bridge failures can be
    > detected by noting which devices are causing exception messages.
    > Exception messages which were caused as a result of the device that
    > failed can be filtered out.
    > By building similar internal maps for the routing tables of the various
    > protocols failure of routers and the associated 'exceptions' messages
    > due to nodes becoming unreachable can be correlated to determine which
    > component failed.

    Sounds like we need an artificial intelligence kind of FM with a nice
    user interface. Any volunteers? 8-)

    > If this capability is not considered to be an important component of
    > DECmcc then I would suggest that the DECmcc kit be shipped with a piece
    > of code or an easy API which will allow customers to intercept Alarms
    > on their way to the Notification FM, and allow customers to insert their
    > own alarms into the system.

    This can be done now. You can build an FM that receives the alarm
    notifications from the Alarms FM.

    Carl

1781.2	we need at least 2 answers	TOOK::MATTHEWS		Fri Nov 08 1991 10:05	45

    I will start by agreeing that customers need much greater functionality
    in the way of event/alarm correlation/filtering than we are currently
    providing. There are many reasons we are not doing more; most of them
    are purely financial. Yes, there is a need, but it is not clear that
    there aren't many different needs identified here, and I doubt that a
    single "hack"/"modification" will do any more than change the perception
    of the need.

    First, we need event correlation/filtering based upon topological
    knowledge of the network, i.e. DECmcc needs to know that getting to
    bridge X from the current instantiation of DECmcc requires going through
    bridges Y, Z, and A. This way a DECmcc "function" could make rational
    decisions about the effects of one of these bridges going down and
    whether to correlate these "events". You can do it with hardwired
    procedures, as suggested in the note, or you can do it based upon
    topological knowledge (which currently does not exist in DECmcc).

    Note that the hardwired solution breaks down when customers provide
    dormant paths in their network that are enabled by changing forwarding
    database entries and creating alternate topologies. Yes, I know that
    spanning tree doesn't allow cycles. But dormant links (i.e.
    non-forwarding links which are potential cycles) are allowed and can be
    used. Thus any hardwired procedure based on a static topology will fail
    in this case.

    I suggest that there are 2 answers to this. First, there needs to be a
    way for customers to write simple filtering scripts that correlate
    multiple events occurring within some time scope and generate a "more
    meaningful" event to go into the Alarms FM. This has the advantage that
    it reduces the load on the Alarms FM. It has the disadvantage that it
    increases the delay for receiving an event at the Alarms FM interface.

    Second, we need to capture topology data about networks, including
    dynamic views and static views, so that an event filtering mechanism can
    filter events based upon topology.

    Neither of these will be in V1.2. It is possible for the first to be
    done by the next release after V1.2. The topology part of the second is
    being planned, but the actual filtering algorithm/mechanism is not yet
    understood well enough to be planned.

    wally

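The dormant-path point above is why the path to a device is better computed from a current snapshot of link states than hardwired. A minimal sketch, assuming a hypothetical list of `(node, node, forwarding)` link tuples; bridges X, Y, Z and A follow the example in the note, while `MCC` (standing for the management station) and bridge `W` are made up:

```python
from collections import defaultdict, deque


def current_path(links, source, target):
    """Breadth-first search over links that are currently forwarding.

    links: iterable of (node_a, node_b, forwarding) tuples, i.e. a snapshot
           of the dynamic topology, including dormant (non-forwarding) links.
    Returns the node list from source to target, or None if unreachable.
    """
    adj = defaultdict(list)
    for a, b, forwarding in links:
        if forwarding:  # dormant links are ignored until they are enabled
            adj[a].append(b)
            adj[b].append(a)
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


before = [("MCC", "Y", True), ("Y", "Z", True), ("Z", "A", True),
          ("A", "X", True), ("MCC", "W", True), ("W", "X", False)]
# Z fails and the dormant W-X link is put into the forwarding state.
after = [("MCC", "Y", True), ("A", "X", True), ("MCC", "W", True), ("W", "X", True)]
print(current_path(before, "MCC", "X"))
print(current_path(after, "MCC", "X"))
```

Re-running the search after each topology change yields the current path that a topology-based filter would need; how such snapshots would be collected is outside this sketch.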
1781.3	configuration management if necessary...	RIVAGE::SILVA	Carl Silva - Telecom Eng - DTN 828-5339	Fri Nov 08 1991 10:19	16

    RE: .2,

    > First, We need event correlation/filtering based upon topological
    > knowledge of the network. Ie. DECmcc needs to know that to get
    > to bridge X from the current instantiation of DECmcc requires
    > going through bridge Y, Z, and A. This way a DECmcc "function"
    > could make rational decisions about the effects of one of these
    > bridges going down and whether to correlate these "events".

    Yes, it is definitely clear that without configuration information in
    the system, it is very difficult to do alarm correlation. Without the
    config info, all you will have is the info contained in the alarm
    reports.

    Are there plans for MCC to do configuration management?

    Carl

1781.4	Alarm & Notification APIs?	ANDRIS::putnins	Hands across the Baltics	Fri Nov 08 1991 10:56	34

    Re: .1

    > This can be done now. You can build an FM that receives the alarm
    > notifications from the alarms FM.

    Where can I obtain information on the Alarms and Notification FM
    interfaces? I would like to be able to write an FM that screens events
    and writes them into an external database so that another, independent
    program can process them. This may be in addition to the facilities
    offered by PNMP, below.

    Re: .2

    > First, there needs to be a way for customers
    > to write simple filtering scripts to reduce multiple events
    > occuring in some time scope to be correlated and generate a
    > "more meaningful" event to go into the alarms fm.

    It appears that the PNMP Alarm Handling FM will offer this capability
    (see TAEC::PNMP conference, note 3.2):

        The PNMP Alarm Handling FM provides the following alarm handling
        functions:

        1. The ability to define an Operation Context for alarm handling.
        2. Collecting alarm reports generated by managed objects or
           generated by user-defined rules.
        3. Filtering alarm reports using an ISO conformant discriminator
           construct.
        4. Creating Alarm Objects corresponding to the filtered alarm
           reports and maintaining these objects. Alarm Objects can then be
           acknowledged, handled, closed, archived and/or purged.
        5. Notifying the PNMP PM when an Alarm Object has been created or
           when its status has changed.
        6. Escalation (with the creation of a new Alarm Object) when an
           Alarm Object has not been acknowledged or handled before a
           specified time (with a different time per severity).

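Item 6 of the quoted list (escalation when an Alarm Object is not dealt with in time, with a per-severity time limit) can be sketched as a periodic check over alarm objects. This is a hypothetical illustration, not the PNMP implementation: the limits, the `AlarmObject` fields and the choice to escalate only objects still in state `open` (not yet acknowledged) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical per-severity limits (seconds) for acknowledging an alarm object.
ESCALATION_LIMIT_S: Dict[str, float] = {"critical": 300, "major": 900, "minor": 3600}


@dataclass
class AlarmObject:
    alarm_id: str
    severity: str
    created: float
    state: str = "open"              # open -> acknowledged -> handled -> closed
    escalated: bool = False
    escalates: Optional[str] = None  # id of the alarm object this one escalates


def escalate_overdue(objects: List[AlarmObject], now: float,
                     limits: Dict[str, float] = ESCALATION_LIMIT_S) -> List[AlarmObject]:
    """Create a new escalation object for every alarm object still 'open'
    after its severity's time limit. Each object is escalated at most once."""
    new_objects = []
    for obj in objects:
        limit = limits.get(obj.severity)
        if limit is None or obj.state != "open" or obj.escalated:
            continue
        if now - obj.created >= limit:
            obj.escalated = True
            new_objects.append(AlarmObject(alarm_id=f"{obj.alarm_id}-esc",
                                           severity=obj.severity, created=now,
                                           escalates=obj.alarm_id))
    return new_objects


objs = [AlarmObject("A1", "critical", 0),
        AlarmObject("A2", "critical", 0, state="acknowledged"),
        AlarmObject("A3", "minor", 0)]
print([o.alarm_id for o in escalate_overdue(objs, now=400)])
```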
1781.5	PNMP Alarm Handling FM	RIVAGE::SILVA	Carl Silva - Telecom Eng - DTN 828-5339	Fri Nov 08 1991 12:07	24

    > Where can I obtain information on the Alarms and Notification FM
    > interfaces? I would like to be able to write an FM that screens events
    > and writes them into an external database so another, independent,
    > program can process them. This may be in addition to the facilities
    > offered by PNMP, below.

    The API is the normal interface that all modules use (mcc_call_access
    or mcc_call_function).

    >> First, there needs to be a way for customers
    >> to write simple filtering scripts to reduce multiple events
    >> occuring in some time scope to be correlated and generate a
    >> "more meaningful" event to go into the alarms fm.
    >
    > It appears that the PNMP Alarm Handling FM will offer this capability
    > (see TAEC::PNMP conference, note 3.2):

    Yes, it will provide some of this functionality.

    RE: .2, can you expand on your requirements?

    Carl

1781.6	As usual, sample code would do wonders..	SUBWAY::REILLY	Mike Reilly - New York Bank District	Fri Nov 08 1991 12:15	30

    re: .2

    > This can be done now. You can build an FM that receives the alarm
    > notifications from the alarms FM.

    If you know of somewhere I could get my hands on some code which
    retrieves alarms and inserts alarms into the system, I would be very
    interested in developing the link to a rule-based system. I would
    suggest that code such as this should be shipped with the next release
    of DECmcc. This will allow customers to use their own alarm
    pre-processing algorithms in the near term.

    My current customer would like to define DECmcc alarm rules which all
    have a severity of 'warning' or lower. When the alarms are passed
    through the pre-processing algorithm, a synthetic alarm would be
    generated (if needed) with a severity of 'critical'. The network
    operations staff would only have to watch for alarms which have a
    severity of 'critical'. The original alarms would still be available if
    needed. We also need a fix to the problem of all exceptions being
    flagged as 'critical' for this scheme to work.

    With regard to the development of an AI system to handle this, I have
    heard that the south of France provides the ideal environment for the
    development of AI systems :-).

    - Mike

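The scheme above leaves open when a synthetic 'critical' alarm is "needed". A minimal sketch with one hypothetical rule: raise a single critical alarm when enough distinct entities at one site report low-severity alarms within a sliding window, while keeping every original alarm. `SeverityPromoter`, the site names and the parameters are all made up, and the sketch does nothing about exceptions being flagged 'critical'.

```python
from collections import defaultdict, deque


class SeverityPromoter:
    """Keep every incoming low-severity alarm, and emit one synthetic
    'critical' alarm when `min_entities` distinct entities at the same site
    report alarms within a sliding window of `window_s` seconds.
    Assumes alarms are fed in non-decreasing timestamp order."""

    def __init__(self, window_s, min_entities):
        self.window_s = window_s
        self.min_entities = min_entities
        self.originals = []               # every original alarm is retained
        self.recent = defaultdict(deque)  # site -> deque of (ts, entity)
        self.raised = set()               # sites with an outstanding synthetic alarm

    def feed(self, ts, site, entity, severity="warning"):
        self.originals.append((ts, site, entity, severity))
        window = self.recent[site]
        window.append((ts, entity))
        while window and ts - window[0][0] > self.window_s:
            window.popleft()              # drop alarms older than the window
        entities = {e for _, e in window}
        if len(entities) < self.min_entities:
            self.raised.discard(site)     # re-arm once the site quietens down
            return None
        if site in self.raised:
            return None                   # already reported for this episode
        self.raised.add(site)
        return {"severity": "critical", "site": site,
                "entities": sorted(entities), "ts": ts}


p = SeverityPromoter(window_s=120, min_entities=3)
for ts, entity in [(0, "B1"), (10, "B2"), (20, "B1"), (30, "R1"), (40, "B3")]:
    synthetic = p.feed(ts, "NYC", entity)
    if synthetic:
        print(synthetic)
```

Operators would watch only the synthetic alarms, while `originals` keeps the underlying warnings available, as the note asks.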
1781.7		RIVAGE::SILVA	Carl Silva - Telecom Eng - DTN 828-5339	Fri Nov 08 1991 12:17	5

    > With regard to the development of an AI system to handle this, I have
    > heard that the south of France provides the ideal environment for the
    > development of AI systems :-).

    Yes, it does!

1781.8	1.2 notif has alot of what you are asking for	TOOK::CALLANDER	MCC = My Constant Companion	Fri Jan 03 1992 11:30	23

    Okay, I know this discussion is old, but...

    The MRMs (Module Reference Manuals) ship with the kit. The Alarms and
    Notification ones will be updated along with the rest for the final
    V1.2 shipment. These documents explain the functions supported by each
    module.

    Now, as to most of the functionality you have listed, you should find a
    lot of it in the notification services in the V1.2 field test kit. The
    logging stuff didn't make it into field test but will be in the final
    product. The log is supposed to allow logging based on the filters you
    have defined.

    As to an open API to do what you are looking for, the Data Collector AM
    provides an easy-to-use open interface that allows information to be
    passed into MCC from any application you want to connect to the API.
    The documentation on this is only being distributed to a few specific
    field test sites so that we can first determine if our implementation
    meets the need. If you want more information on the Data Collector,
    please send mail to Anne Pelagatti or Wally Matthews (both are located
    on TOOK::).

    jill