
Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

5839.0. "SNMP Alarm woes" by ADO75A::BOUCHER (Reece Boucher) Sat Jan 22 1994 19:02

Greetings,

I am having some problems with SNMP IP reachability alarms.

I am running DECmcc V1.3.0 on a VMS V5.5-2 VAXstation 4000 Model 60.  I am 
currently monitoring approximately 70 cisco routers and need the following to 
occur:

	Poll for IP reachability every 60 sec.  If a node is unreachable, send a 
mail message to DECMCC::SYSTEM, which is forwarded directly to Target HOTline 
(a help desk application).  The IP Poller is therefore not suitable.

	Poll for IP reachability every 60 sec.  If reachability is up, send a 
notification with severity Clear to IMPM.  No mail is sent (this is used only 
to clear the icon on the map).

What is actually happening is that a lot of alarms are firing saying that 
entities are unreachable when in fact they are reachable.  This is causing a 
number of problems with Target HOTline.  It appears that a low value of ICMP 
timeout or retry is being applied, even though I have increased both in the 
process that enables the alarms.
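
As a rough check on what the configured values would imply, the sketch below
(Python, purely illustrative and not part of DECmcc) computes the longest wait
before a probe is declared failed.  It assumes the timeout is in seconds, that
each attempt waits the full timeout, and that retries are attempts made in
addition to the first; none of these is confirmed in this note.  The values 30
and 3 come from the SET MCC 0 TCPIP_AM command in the attached enable script,
and 60 seconds from "at every=00:01:00" in the rules.

```python
def worst_case_wait(timeout_s, retries):
    """Upper bound on the time before a probe is declared failed.

    Assumption (not confirmed by the note): one initial attempt plus
    `retries` further attempts, each waiting the full timeout.
    """
    return timeout_s * (1 + retries)


POLL_INTERVAL_S = 60  # from "at every=00:01:00" in the alarm rules

# ICMP TIMEOUT=30, ICMP RETRIES=3 as set in the enable script
configured = worst_case_wait(timeout_s=30, retries=3)

print("worst-case wait (s):", configured)
print("exceeds poll interval:", configured > POLL_INTERVAL_S)
```

Under those assumptions an unreachable node would take up to 120 seconds to be
declared down, which is longer than the 60-second polling interval.  Alarms
that fire much sooner than that would be consistent with lower values being
applied.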

Attached are copies of the alarms rules that are enabled, as well as the log 
file that is created by the alarms process.

Q:  Is there any way I can use the IP reachability Poller to fire an alarm?  I 
need to be able to send a MAIL message on a poll failure.

Q:  Is there a better way of achieving what I need to do?

Your help is greatly appreciated.

Regards,



Reece Boucher
Adelaide, Australia

!
! MCC Alarm Rules
!
!   IP Reachability Down rules
!
Create Domain  LOCAL_NS:.state_net_hubs Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain  LOCAL_NS:.primary_industries Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.road_transport Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.southern_power_and_water Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.environ_land_management Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.housing_urban_development Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.labour_admin_services Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.justice Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.arts_cultural_heritage Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.premier_govt_management Rule SNMP_IP_Reach_Down  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, Up,*),at every=00:01:00), -
  Severity     = Critical, -
  Category     = "Router", -
  Description  = " IP Reachability = DOWN.  Node is unreachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_HOTLINE.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "HOTLINE", -
  Batch Queue                  = "alarms$batch"
!
!
!   IP Reachability Up rules
!
Create Domain LOCAL_NS:.state_net_hubs Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain  LOCAL_NS:.primary_industries Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.road_transport Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.southern_power_and_water Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.environ_land_management Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.housing_urban_development Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.labour_admin_services Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.justice Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.arts_cultural_heritage Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!
Create Domain LOCAL_NS:.premier_govt_management Rule SNMP_IP_Reach_Up  -
  Expression   = (CHANGE_OF (SNMP * ipReachability, down, up),at every=00:01:00), -
  Severity     = Clear, -
  Category     = "Router", -
  Description  = " IP Reachability = UP.  Node is now reachable by IP", -
  Alarm Fired Procedure        = DKA200:[ALARMS]MCC_ALARMS_BROADCAST.COM, -
  Alarm Exception Procedure    = DKA200:[ALARMS]MCC_ALARMS_EXCEPTION.COM, -
  Alarm Fired Parameters       = "NETMAN", -
  Batch Queue                  = "alarms$batch"
!
!


$!
$ manage/enterprise
!
! enable alarms
!
SET MCC 0 TCPIP_AM UDP TIMEOUT=30,UDP RETRIES=3,ICMP TIMEOUT=30,ICMP RETRIES=3
show mcc 0 tcpip_am all attr
do mcc_alarms:enable_alarms.com
!
! wait for a very long time
show mcc 0  all id, at start=(+9999-00:00:00)
exit
$ exit


$set noverify
DECmcc (V1.3.0)


MCC 0 TCPIP_AM 
AT 21-JAN-1994 13:54:51 Characteristics

Examination of Attributes Shows
                            UDP Timeout = 30
                            UDP Retries = 3
                           ICMP Timeout = 30
                           ICMP Retries = 3

MCC 0 TCPIP_AM 
AT 21-JAN-1994 13:54:51 All Attributes

                      Component Version = V1.3.0
               Component Identification = "DECmcc TCP/IP SNMP AM"
                            UDP Timeout = 30
                            UDP Retries = 3
                           ICMP Timeout = 30
                           ICMP Retries = 3
               Mib Extensions Available = ( "rmon",
                                            "EXP_RMON",
                                            "cisco",
                                            "novell",
                                            "synoptics" )

Domain LOCAL_NS:.primary_industries Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:54:53 

Normal operation has begun.

Domain LOCAL_NS:.primary_industries Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:54:54 

Normal operation has begun.

Domain LOCAL_NS:.road_transport Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:54:55 

Normal operation has begun.

Domain LOCAL_NS:.road_transport Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:54:55 

Normal operation has begun.

Domain LOCAL_NS:.southern_power_and_water Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:54:56 

Normal operation has begun.

Domain LOCAL_NS:.southern_power_and_water Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:54:56 

Normal operation has begun.

Domain LOCAL_NS:.environ_land_management Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:54:57 

Normal operation has begun.

Domain LOCAL_NS:.environ_land_management Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:54:57 

Normal operation has begun.

Domain LOCAL_NS:.housing_urban_development Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:54:58 

Normal operation has begun.

Domain LOCAL_NS:.housing_urban_development Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:54:58 

Normal operation has begun.

Domain LOCAL_NS:.labour_admin_services Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:55:00 

Normal operation has begun.

Domain LOCAL_NS:.labour_admin_services Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:55:01 

Normal operation has begun.

Domain LOCAL_NS:.justice Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:55:02 

Normal operation has begun.

Domain LOCAL_NS:.justice Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:55:03 

Normal operation has begun.

Domain LOCAL_NS:.arts_cultural_heritage Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:55:07 

Normal operation has begun.

Domain LOCAL_NS:.arts_cultural_heritage Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:55:08 

Normal operation has begun.

Domain LOCAL_NS:.premier_govt_management Rule SNMP_IP_Reach_Down 
AT 21-JAN-1994 13:55:10 

Normal operation has begun.

Domain LOCAL_NS:.premier_govt_management Rule SNMP_IP_Reach_Up 
AT 21-JAN-1994 13:55:10 

Normal operation has begun.

5839.1. by MOLAR::YAHEY::BOSE Mon Jan 24 1994 10:01 (5 lines)

	When you get an alarm for reachability down, can you immediately
	issue a "SHOW SNMP xxxx ALL STATUS" and tell us the result?

	Rahul.

5839.2. "option: 2nd poll in script" by CTHQ::WOODCOCK (Skiing's 1st Human Groomer) Mon Jan 24 1994 10:52 (15 lines)

Greetings,

Assuming you get the poller as robust as it can be and you are still having
problems, I'd recommend an implementation modification.

Within the script, POLL THE DEVICE AGAIN if you can. I did this using ncp 
because of a problem similar to the one you describe; however, I have not tried
this with snmp. You could take two approaches. I simply did an "ncp tell x sho
exec" and captured the $STATUS. If it is successful, the node is not really
down, so don't send mail (in your case, don't send to TARGET). You might be
able to use the same technique by issuing "mcc show snmp x ipreachability, to
file =x.txt". Then search x.txt for "down" before sending to TARGET.

just a thought,
brad...

5839.3. "SNMP status OK" by ADO75A::BOUCHER (Reece Boucher) Mon Jan 24 1994 22:43 (15 lines)

    Rahul,
    
    Showing the status of the SNMP device directly after an IP reachability
    Down normally shows IP Reachability=UP.
    
    Brad,
    
    I take it the way you are suggesting is to use the Data Collector. 
    This would work, but doesn't it go against the intention of the IP
    Poller ?
    
    Thanks for the replies.
    
    
    Reece Boucher

5839.4. "use a second poll as proof" by CTHQ::WOODCOCK (Skiing's 1st Human Groomer) Tue Jan 25 1994 09:58 (38 lines)

Hi Reece,
    
>    Brad,
>    
>    I take it the way you are suggesting is to use the Data Collector. 
>    This would work, but doesn't it go against the intention of the IP
>    Poller ?
    
Actually no, that is not quite what I was suggesting. This may actually go
against the IP Poller, but I'm not sure because I've never used it. Anyway,
what we do is write a simple alarm rule, enable it, and if something alarms
down then we poll the device again in the script. By doing this we double-check
whether the device is really down, or whether the network glitched, the router
was too busy, and so on.

Alarm rule expression something like     exp=(snmp * ipreachability=down)
					 procedure=node_down.com
enable rule

logic in node_down.com
	
	- get snmp device name
	- poll it again through the script
		$ node_status = "up"
		$ mcc show snmp <device> ipreachability,to file=x.txt
		$ search x.txt "down"
		$ ! SEARCH returns %X00000001 only when it finds a match
		$ if $status .eq. %X00000001 then node_status = "down"
		$ if node_status .eqs. "down"
		$ then
		$   	send mail
		$	etc, etc...
		$ endif

The above is not exact by any means, and it will not be fast. What it does do
is double-check that the device is down before sending mail or creating a
TARGET ticket, which is the major hassle we are faced with.

cheers,
brad...
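
The double-check described in .4 can be sketched independently of MCC and DCL.
The Python below is only an illustration of the logic: the names
confirmed_down, handle_alarm, poll and notify are hypothetical, poll stands in
for whatever re-polls the device (for example the "mcc show snmp" command
above), and notify stands in for sending mail or opening a TARGET ticket.

```python
import time


def confirmed_down(poll, extra_polls=1, delay_s=0.0, sleep=time.sleep):
    """Return True only if every follow-up poll also finds the node down.

    `poll` is a hypothetical callable that returns True when the node
    is reachable and False when it is not.
    """
    for _ in range(extra_polls):
        sleep(delay_s)          # optional pause before polling again
        if poll():
            return False        # node answered: treat the alarm as false
    return True


def handle_alarm(poll, notify, **kwargs):
    """Call notify() only when the down state has been confirmed."""
    if confirmed_down(poll, **kwargs):
        notify()
        return True
    return False


if __name__ == "__main__":
    # A node that answers the second poll produces no notification.
    handle_alarm(lambda: True, lambda: print("send mail"))
    # A node that stays silent for two extra polls does.
    handle_alarm(lambda: False, lambda: print("send mail"), extra_polls=2)
```

The trade-off is the one noted in .4: each extra poll delays the notification,
in exchange for fewer false tickets.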

5839.5. by MOLAR::YAHEY::BOSE Tue Jan 25 1994 10:58 (10 lines)

	RE .3

	Reece,

		This probably means that the alarm encountered an exception.
	Can you check the contents of the alarms notification to confirm 
	that?

	Rahul.

5839.6. "No exceptions encountered" by ADO75A::BOUCHER (Reece Boucher) Wed Jan 26 1994 18:50 (13 lines)

    Rahul,
    
    No exceptions were logged in mcc_notification.log.
    
    Only normal IP Reachability UP and IP Reachability DOWN conditions are
    reported.
    
    The gap between the two, however, can be as much as 10 minutes.
    
    FYI.
    
    
    Reece...