
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

2619.0. "alarm exception questions" by COL01::LUNT () Tue Mar 24 1992 10:03

Hi,

	I have the following questions concerning alarm exceptions.
I am using T1.2.4 base level of MCC running on VMS 5.5.

1. When I submit alarms that should run every 15 minutes and an exception
occurs, the next exception occurs 5 minutes later, not 15, while the
error has not cleared, thereby causing a second exception. Is this correct? 
And is this "exception timer" changeable?

2. How is the Severity of an exception determined? Is there a way to set 
this so that it is not always Indeterminate? Why does it not take on
the Severity of the alarm that was set?

3. Why is an MCC alarm tied to a certain revision of the
Alarms_mail_exceptions.com file in mcc_common? Is there a way to change
this? Please see the log file and, below it, the Rule Creation commands. I
changed the command file and thereby created a new revision number, but did
not notice that my exceptions were not working correctly until I had a
bridge hang and received no exception. 

LOG FILE:

$!
$! This command procedure is always run when anybody on the entire system
$! logs in. It is equivalent to LOGIN.COM except that the instructions
$! contained herein are executed everytime anyone on the VMS system
$! logs in to their account.
$!
$! For interactive processes, turn on Control T, and set the terminal type
$!
$ IF (F$MODE() .EQS. "BATCH") THEN GOTO EXIT
$EXIT:
$EXIT
$! FILE: MCC_ALARMS_SECURITY.COM
$!  
$!  **************************************************************
$!  * ********************************************************** *
$!  * *                                                        * *
$!  * *  Copyright (c) Digital Equipment Corporation, 1990     * *
$!  * *  All Rights Reserved.  Unpublished rights reserved     * *
$!  * *  under the copyright laws of the United States.        * *
$!  * *                                                        * *
$!  * *  The software contained on this media is proprietary   * *
$!  * *  to and embodies the confidential technology of        * *
$!  * *  Digital Equipment Corporation.  Possession, use,      * *
$!  * *  duplication or dissemination of the software and      * *
$!  * *  media is authorized only pursuant to a valid written  * *
$!  * *  license from Digital Equipment Corporation.           * *
$!  * *                                                        * *
$!  * *  RESTRICTED RIGHTS LEGEND   Use, duplication, or       * *
$!  * *  disclosure by the U.S. Government is subject to       * *
$!  * *  restrictions as set forth in Subparagraph (c)(1)(ii)  * *
$!  * *  of DFARS 252.227-7013, or in FAR 52.227-19, as        * *
$!  * *  applicable.                                           * *
$!  * *                                                        * *
$!  * ********************************************************** *
$!  **************************************************************
$!  
$! FACILITY:
$!     MCC -- Management Control Center
$! 
$! ABSTRACT:
$! 
$! This purpose of this command file is to normalize DCL symbols, set
$! user's privileges to NOALL, restrict logical trasnlation to searching
$! on the system logical translation table, all as a security measure.
$! 
$! When a RULE fires, this command file will be queued to the SYS$BATCH
$! or user specifed execution queue.  This file will call the user's
$! specified command file after taking the above security precautions. 
$! It will then set the user's environment back to its original state.
$! 
$! The LOGFILE for the execution of the batch job is written to the
$! directory where the user specified command procedure is located.
$!
$! The parameters p1-p8 are displayed if the MCC_ALARMS_FM_LOG debug mask
$! is set to ??
$! 
$! 24-AUG-1990   Aud Orenstein	- P8 has datafile name concatenated
$! 30-AUG-1990   Anil Navkal      Deleted the data file: 
$! 05-SEP-1990   Aud Orenstein    Set ON for error handling
$! 06-DEC-1990   Aud Orenstein    Why doesn't the data file get deleted?
$! ========================================================================
$! ========================================================================
$
$
$!++
$! If the MCC_ALARMS_FM_LOG set to ALARMS$M_LOG_SECURITY_WRITELNS then write
$! the values of P1 through P8 
$!--
$
$ ALARMS$M_LOG_SECURITY_WRITELNS = %X08
$ 
$ ALARMS$STR_MASK = f$trnlnm ("MCC_ALARMS_FM_LOG")
$ ALARMS$INT_MASK = f$integer ("%x"+ ALARMS$STR_MASK)
$ ALARMS$BIT = ALARMS$INT_MASK .and. ALARMS$M_LOG_SECURITY_WRITELNS
$
$    queue = f$getqui ("DISPLAY_JOB", "QUEUE_NAME",, "THIS_JOB")
$    write sys$output "Current queue:	  ",queue
Current queue:	  MCC$ALARMS
$
$ if ALARMS$BIT .gt. 0
$ endif
$
$!++
$! Delete all the users symbols.  This is done to NORMALIZE the DCL symbols
$!--
$
$ D == "DELETE"
$ D/SYMBOL/GLOBAL/ALL                                                
$
$!++                     
$! DELETE user defined logicals in PROCESS and JOB tables
$! Restrict logical translation to the PROCESS, JOB and SYSTEM logical 
$! translation tables.  GROUP is not used.
$!--
$
$ if ALARMS$BIT .gt. 0 then SHOW LOG

$
$ DEASSIGN/PROCESS/ALL
$ DEASSIGN/JOB/ALL
$ DEFINE/TABLE=LNM$PROCESS_DIRECTORY LNM$FILE_DEV LNM$PROCESS,-
               LNM$JOB, LNM$SYSTEM
$
$ if ALARMS$BIT .gt. 0 then SHOW LOG
$
$!++
$! Save all users privileges and turn them off
$!--
$
$ SAVEPRIVS = F$SETPRV ("NOALL,TMPMBX, NETMBX")
$
$!++
$! Split the DATA filename off from the User's Procudure name
$!--                                          
$
$ COMMAND_PROCEDURE = F$ELEMENT(0, "/", P8)
$ DATA_FILE 	    = F$ELEMENT(1, "/", P8)
$
$!++
$! Set ON incase user's command procedure returns error
$!--                                          
$ on sever_error then continue
$ on error then continue
$ on warning then continue
$
$!++
$! Run the User's Command Procedure
$!--                                          
$ @DISK$DATA:[MCC]MCC_ALARMS_MAIL_EXCEPTION.COM;6 "MCC 0 ALARMS RULE BYLN13_UNREACHABLE"-   	!rulename
		      "DESCRIPTION"-   	!category 
		      "BRIDGE UNREACHABLE "-   	!description
		      "(BRIDGE .EU.BY.BYL.BYLN13 DEVICE STATE <> OPERATING, at  every=00:15:00)"-   	!expression
		      "24-MAR-1992 11:20:38.30"-   	!time
		      "Cannot communicate with target"-   	!dtcrtf or error
		      "@MCC_COMMON:ALARMS_DISTRIBUTION_LISTE.DIS"-   	!notification params
        	      "SYS$SCRATCH:MCC_ALARMS_DATA_11203830.DAT
%DCL-E-OPENIN, error opening DISK$DATA:[MCC]MCC_ALARMS_MAIL_EXCEPTION.COM;6 as input
-RMS-E-FNF, file not found
$ !++
$ ! 	Delete the data file 
$ !--
$ !
$ !	DELETE 'DATA_FILE';
$
$       DELETE/LOG DISK$DATA:[MCC_ALARMS]MCC_ALARMS_DATA_11203830.DAT;
%DELETE-I-FILDEL, DISK$DATA:[MCC_ALARMS]MCC_ALARMS_DATA_11203830.DAT;1 deleted (3 blocks)
$!==========================================================================
$ !++
$ ! clean exit point
$ !--
$ clean_exit: 
$ !
$ !--------------------------------------------------------------------------------------
$ 		exit
  MCC$ALARMS   job terminated at 24-MAR-1992 11:20:42.57

  Accounting information:
  Buffered I/O count:              40         Peak working set size:     445
  Direct I/O count:                41         Peak page file size:      2665
  Page faults:                    519         Mounted volumes:             0
  Charged CPU time:           0 00:00:00.98   Elapsed time:     0 00:00:03.34

===========================================================================

RULE Creation:
==========================================================================
Delete MCC 0 ALARMS RULE BYLN13_UNREACHABLE, in domain = DECNOS_NS:.EU.BY.BYL.K9_PLT
Create MCC 0 ALARMS RULE BYLN13_UNREACHABLE -
  Category           = "BRIDGE UNREACHABLE ", -
  Description        = "DESCRIPTION", -
  Expression         = (BRIDGE .EU.BY.BYL.BYLN13 DEVICE STATE <> OPERATING, at  every=00:15:00), -
  Procedure          = DISK$DATA:[MCC]MCC_ALARMS_MAIL_ALARM.COM, -
  Exception Handler  = DISK$DATA:[MCC]MCC_ALARMS_MAIL_EXCEPTION.COM, -
  Parameter          = "@MCC_COMMON:ALARMS_DISTRIBUTION_LISTE.DIS", -
  Queue              = "MCC$ALARMS", -
  Perceived Severity = CRITICAL, -
  in domain = DECNOS_NS:.EU.BY.BYL.K9_PLT
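
The debug-mask test and the P8 split in the MCC_ALARMS_SECURITY.COM log above can be restated as a short Python sketch. The function names are hypothetical. Treating an undefined MCC_ALARMS_FM_LOG (an empty translation) as 0 is an assumption. The "/" fallback mirrors DCL's F$ELEMENT, which returns the delimiter itself when the requested element does not exist.

```python
from typing import Tuple

# Bit tested by the procedure (ALARMS$M_LOG_SECURITY_WRITELNS = %X08 in the log).
LOG_SECURITY_WRITELNS = 0x08


def security_writelns_enabled(mask_text: str) -> bool:
    """Mirror of the DCL test on the MCC_ALARMS_FM_LOG translation.

    The procedure converts the translation as hexadecimal ("%x" + text) and
    ANDs it with %X08. Treating an empty translation as 0 is an assumption.
    """
    value = int(mask_text, 16) if mask_text.strip() else 0
    return (value & LOG_SECURITY_WRITELNS) != 0


def split_p8(p8: str) -> Tuple[str, str]:
    """Mirror of F$ELEMENT(0, "/", P8) and F$ELEMENT(1, "/", P8).

    F$ELEMENT returns the delimiter itself when the element does not exist,
    so a missing data file name comes back as "/".
    """
    parts = p8.split("/")
    command_procedure = parts[0]
    data_file = parts[1] if len(parts) > 1 else "/"
    return command_procedure, data_file
```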

2619.1. "Some answers" by TOOK::MINTZ (Erik Mintz, DECmcc Development, dtn 226-5033) Tue Mar 24 1992 11:21
>1. When I submit alarms that should run every 15 minutes and an exception
>occurs, the next exception occurs 5 minutes later, not 15, while the
>error has not cleared, thereby causing a second exception. Is this correct? 
>And is this "exception timer" changeable?

Don't know off hand.

>2. How is the Severity of an exception determined? Is there a way to set 
>this so that it is not always Indeterminate? Why does it not take on
>the Severity of the alarm that was set?

It is always Indeterminate.  We are considering changing it to take on
the severity of the corresponding alarm.

>3. Why is an MCC alarm tied to a certain revision of the
>Alarms_mail_exceptions.com file in mcc_common? Is there a way to change

This is a security feature.  There is no way to change it.
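
This sketch assumes two things. First, the rule stores the fully versioned file specification it resolved at creation time; the log shows the job invoking ...MCC_ALARMS_MAIL_EXCEPTION.COM;6. Second, version 6 was later purged or deleted, which the %RMS-E-FNF in the log suggests. Under those assumptions, the failure can be modeled with this Python sketch of VMS-style version lookup. The sketch ignores relative versions such as ;0 and ;-1, and the `resolve` function and the directory dictionary are hypothetical.

```python
import re
from typing import Dict, List, Optional

# NAME.TYPE with an optional explicit ;n version suffix.
SPEC = re.compile(r"(?P<name>[^;]+)(?:;(?P<version>\d+))?")


def resolve(spec: str, directory: Dict[str, List[int]]) -> Optional[str]:
    """Resolve NAME.TYPE or NAME.TYPE;n against {NAME.TYPE: [existing versions]}.

    Without ;n the highest existing version is used; with ;n exactly that
    version must exist. None stands in for %RMS-E-FNF (file not found).
    """
    match = SPEC.fullmatch(spec.strip())
    if match is None:
        return None
    name = match.group("name").upper()
    versions = directory.get(name, [])
    if not versions:
        return None
    if match.group("version") is None:
        return f"{name};{max(versions)}"
    wanted = int(match.group("version"))
    return f"{name};{wanted}" if wanted in versions else None
```

If this reading is right, a pinned ";6" spec stops resolving as soon as version 6 disappears, even though a newer version exists. Re-creating the rule after editing the procedure would then presumably make it pick up the new version.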

2619.2. "More Needed" by BEAGLE::ANDRADE (The sentinel (.)(.)) Fri Mar 27 1992 04:29
    Some more on alarms:
    
    Clients want DECmcc to change icon colors for every event. They also
    do not want to receive too many messages about the same thing.
    
    As an example: using polling to detect when nodes become unreachable 
    and then reachable again. 
    
    In order to do this, it would help if the alarm colors for the exception
    and for the normal alarm firing could be set to two different colors,
    and if DECmcc would always set the icon to the color of the last alarm.
    
    Also, in a CHANGE_OF expression, I find it very strange that the
    EXCEPTION doesn't count in setting the known attribute value. I think
    the attribute value should be changed in this case to "UNKNOWN".
    
    Thus with an alarm of the type:
    
    (CHANGE_OF(NODE4 X state *,*)) , normal firing = clear = blue
    				   , exception firing = critical = red
    
    I could make the ICON go red (and send mail, etc.) when the node becomes
    unreachable (state changes from "on" to "UNKNOWN").
    
    Then, while the node is unreachable, the alarm exception doesn't fire 
    again because the state remains "UNKNOWN". 
    
    But when the node comes up again, the state changes from "UNKNOWN" to 
    "on" and the alarm fires again normally, changing the ICON to blue 
    (and sending mail, etc.).
    
    And lastly, while the node remains reachable, the alarm doesn't fire
    again either, because the state remains "on" all the while.
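
    Gil's proposal can be sketched in Python as a CHANGE_OF that treats a
    failed poll as the value "UNKNOWN". This models the suggested behavior,
    not what DECmcc V1.1 or V1.2 actually does. The function name is
    illustrative, the severity labels come from his example, and treating
    the first poll as a baseline that never fires is an assumption.

```python
from typing import Iterable, List, Optional, Tuple

UNKNOWN = "UNKNOWN"


def change_of_with_unknown(
    samples: Iterable[Optional[str]],
) -> List[Tuple[str, str, str]]:
    """Return (old, new, severity) for every change in the polled state.

    A sample of None means the poll raised an exception; it is recorded as
    UNKNOWN instead of being reported as a separate exception. Changes into
    UNKNOWN are "critical" (red); all other changes are "clear" (blue).
    """
    events: List[Tuple[str, str, str]] = []
    previous: Optional[str] = None
    for sample in samples:
        current = UNKNOWN if sample is None else sample
        # The first poll only establishes a baseline (assumption).
        if previous is not None and current != previous:
            severity = "critical" if current == UNKNOWN else "clear"
            events.append((previous, current, severity))
        previous = current
    return events
```

    Consider a node that is "on", becomes unreachable for three polls, and
    then comes back. It yields exactly two events: ("on", "UNKNOWN",
    "critical") and ("UNKNOWN", "on", "clear").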
    
    Hopefully,  Gil
    
    P.S.  Several concerned people have mentioned something similar to
    me: with DECmcc you can never see node state = off, etc., because
    then the node is unreachable and it's the exception that fires.
    
2619.3. "Try IpReachability" by TOOK::MINTZ (Erik Mintz, DECmcc Development, dtn 226-5033) Fri Mar 27 1992 07:49
For SNMP nodes, I would suggest using the IpReachability attribute,
which should return up or down rather than the exception.

Unfortunately, it is too late in the development cycle for us to
change functionality in alarms for V1.2.  But I will file your
note as a suggestion QAR so it can be considered later.

-- Erik

2619.4. "Occurs N Times + 'generic' reachability?" by TOOK::MCPHERSON (Save a tree: kill an ISO working group.) Fri Mar 27 1992 08:19
>    Clients want DECmcc to change icon colors for every event. They also
>    do not want to receive too many messages about the same thing.

    Maybe I'm reading this incorrectly, but this statement seems to
    contradict itself.

    Would an "OCCURS N TIMES" rule format do anything to help you here? 

Re: your questions about reachability et al:

    For objects that have 'reachability' defined as an attribute (e.g.
    IpReachability for SNMP entities), you can do what you want.   For
    all the rest, you will still need fancier processing (which is not
    in 1.2)...

    Maybe one could make a case for a 'generic' attribute called
    "Reachability" that could be 'inherited' (sorta like reference
    attributes)?  Might that be a more reasonable/predictable way to solve
    the general problem instead of trying to infer from alarm rule
    exceptions firing? 

    /doug
2619.5. "Re. 3 & .4" by BEAGLE::ANDRADE (The sentinel (.)(.)) Mon Mar 30 1992 05:29
    Re.3 Thanks Erik
    
    Too bad it can't be done for v1.2, I guess I should have mentioned it
    sooner. But it seemed so obvious that something like this was needed...
    
    Re.4 Doug
    
    A generic Reachability attribute that is inherited certainly seems like
    something worth doing. A CHANGE_OF rule would fire when the node first
    becomes reachable or unreachable. However, would the ICON colors change
    as well with the node state? This would require being able to set ICON
    colors by the "REACHABILITY" attribute state.
    
    The "UNKNOWN" value for attributes I asked for is in fact the same
    as your "REACHABILITY" attribute, if you implement it for everything:
    not just NODE reachability but every attribute's reachability.
    
    Also people still don't want to be swamped by exception alarms...  
    Exceptions should fire only when the reason for the exception changes.
    
    I also suggest that it would be very useful to have the exception
    and the normal alarm firing turn the ICONs to different colors. This
    is desirable no matter what else is done.
    
    When DECmcc fails to get some data, then all involved attribute values
    should be set to "UNKNOWN" or "UNREACHABLE". For Status attributes, 
    this just means an extra valid state. For String attributes this is just 
    another string. For Numeric attributes it may be more difficult: it means 
    associating a state variable with every one of them. 
    
    One use of such information is to get node reachability state, using 
    CHANGE_OF alarm rules. Another use is to notify you ONCE (both with the
    ICON color and your chosen method) when any attribute you are watching 
    becomes unreachable, and to notify you ONCE again when it becomes 
    reachable, for whatever reason.
    
    For example: you have an alarm to inform you when a line's utilization
    goes over 80%. DECmcc would fire the rule normally as requested, but it
    would also warn you when the UTILIZATION data is not available for whatever
    reason (no need for you to create extra rules, and no fear of being
    swamped by exception alarms).
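
    The utilization example can be sketched in Python. The threshold rule
    still fires on every poll where it is true, while unavailability is
    reported only when availability changes. The function name, the
    notification labels, and starting in the "available" state are
    assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def utilization_notifications(
    samples: Sequence[Optional[float]], threshold: float = 80.0
) -> List[Tuple[int, str]]:
    """Return (poll index, kind) notifications for a line-utilization rule.

    None means the UTILIZATION data could not be obtained on that poll.
    "over" fires on every poll above the threshold (normal rule behavior);
    "unavailable" and "available" fire once per change in data availability.
    """
    notifications: List[Tuple[int, str]] = []
    available = True  # assume data is available before the first poll
    for index, value in enumerate(samples):
        if value is None:
            if available:
                notifications.append((index, "unavailable"))
                available = False
            continue
        if not available:
            notifications.append((index, "available"))
            available = True
        if value > threshold:
            notifications.append((index, "over"))
    return notifications
```

    With polls [50, 85, None, None, None, 90, 40], the three missing
    samples produce one "unavailable" and one "available" notification
    instead of three exceptions.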
    
    Regards,	Gil
    
    ***   Another thing that would be very useful would be to let us, the
    DECmcc users, set the ICON color from the ALARM user command procedure,
    using a command like: 	"MCC> set entity X color Y, in domain Z"
    We could then choose the ICON color based on whatever data item we
    choose and according to our specific environment.
    
    It would also be useful for demos and the like. 
    
2619.6. "smart availability" by SKIBUM::GASSMAN () Mon Mar 30 1992 09:04
    As long as you are talking about reachability - one of the items that
    MSU had over HP (as of hot staging for last October's Interop) was the
    attribute it looks at for the reachability map.  MSU digs down into the
    interfaces and checks their status - so during the 'frequent' poll, it
    not only checks the host, but also its circuit information.  MSU is
    actually quite good at finding and displaying particular circuits down 
    due to this definition of reachability, and this is one of MSU's competitive 
    features MCC needs to emulate as it goes after the LAN/internet market.  
    
    bill
2619.7. "DECmcc does have exception color" by TOOK::R_SPENCE (Nets don't fail me now...) Mon Mar 30 1992 13:27
    By the way, DECmcc DOES have a separate color for exceptions, it
    is the color associated with the severity INDETERMINATE.
    
    s/rob
2619.8. "more discussion" by MCDOUG::MCPHERSON (Save a tree: kill an ISO working group.) Mon Mar 30 1992 15:18
>    A generic Reachability attribute that is inherited certainly seems like
>    something worth doing. A CHANGE_OF rule would fire when the node first
>    becomes reachable or unreachable. However, would the ICON colors change
>    as well with the node state? This would require being able to set ICON
>    colors by the "REACHABILITY" attribute state.

The behavior of the product is this: Icon color is associated with an event or
alarm rule severity.    If there is an event or alarm associated with this
REACHABILITY attribute, THEN the object's icon will change color. 


>    Also people still don't want to be swamped by exception alarms...  
>    Exceptions should fire only when the reason for the exception changes.

Sorry.  Your interpretation for EXCEPTIONs firing is in conflict with DECmcc's
prescribed behavior.  Exceptions fire when an alarm rule CANNOT be evaluated. 
Period. That is the prescribed behavior.  

>    I also suggest that it would be very useful to have the exception
>    and the normal alarm firing turn the ICONs to different colors. This
>    is desirable no matter what else is done.

There is a default notification severity (hence color) for EXCEPTION: the one
associated with severity = Indeterminate.
    
>    When DECmcc fails to get some data, then all involved attribute values
>    should be set to "UNKNOWN" or "UNREACHABLE". For Status attributes, 
>    this just means an extra valid state. For String attributes this is just 
>    another string. For Numeric attributes it may be more difficult: it means 
>    associating a state variable with every one of them. 


This may be true for your particular requirements, but certainly not
universally.    Might it be that what you're hinting at is some indicator in an
entity's description that describes the 'relationship atomicity' (for want of
a better description) of a group of attributes?  I.e. if any member of a given
attribute partition is not returned, then set a flag that indicates "data is
suspect".
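
Doug's "data is suspect" idea can be sketched in Python as a flag computed over a partition of attributes. If any member comes back missing, the whole partition is marked suspect. The partition, the attribute names, and the result layout are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def check_partition(
    partition: Iterable[str], returned: Mapping[str, Optional[object]]
) -> Dict[str, object]:
    """Flag a group of attributes as suspect if any member was not returned.

    `returned` maps attribute name to value; a missing key or a None value
    both count as "not returned". A value of 0 is a real value, not missing.
    """
    names = list(partition)
    missing: List[str] = [name for name in names if returned.get(name) is None]
    return {
        "values": {name: returned.get(name) for name in names},
        "missing": missing,
        "suspect": bool(missing),
    }
```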

Also remember: Alarm rules gotta do what alarm rules gotta do.  They look for
the data needed to satisfy rules in a BOOLEAN fashion.   I.e. there is no
"Maybe.  Come back later" state for an alarm rule; they are either TRUE, FALSE
or INDETERMINATE.  INDETERMINATE means that it couldn't evaluate either TRUE or
FALSE and *that* is considered an EXCEPTION (and not the rule.... ;^)  )
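
The TRUE / FALSE / INDETERMINATE behavior Doug describes can be sketched in Python. `FetchError` and the functions are hypothetical stand-ins for an access module failing to return data. The point is only that a failed fetch yields INDETERMINATE, which is what fires the exception, while FALSE fires nothing.

```python
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FetchError(Exception):
    """Hypothetical: the attribute could not be retrieved."""


class Result(Enum):
    TRUE = "true"
    FALSE = "false"
    INDETERMINATE = "indeterminate"


def evaluate(fetch: Callable[[], T], predicate: Callable[[T], bool]) -> Result:
    """Evaluate one alarm rule expression against freshly fetched data."""
    try:
        value = fetch()
    except FetchError:
        return Result.INDETERMINATE
    return Result.TRUE if predicate(value) else Result.FALSE


def action_for(result: Result) -> Optional[str]:
    """TRUE fires the rule, INDETERMINATE fires the exception, FALSE does nothing."""
    if result is Result.TRUE:
        return "fire rule"
    if result is Result.INDETERMINATE:
        return "fire exception"
    return None


def unreachable_bridge() -> str:
    """Example fetch that fails the way the base note's bridge did."""
    raise FetchError("Cannot communicate with target")
```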
    
>Another thing that would be very useful would be to let us, the
>    DECmcc users, set the ICON color from the ALARM user command procedure,
>    using a command like: 	"MCC> set entity X color Y, in domain Z"
>    We could then choose the ICON color based on whatever data item we
>    choose and according to our specific environment.
>    
>    It would also be useful for demos and the like. 

Use the Data Collector AM & sample code & specify a target entity in the event.
That'll do what you want fairly simply.    

2619.9. "More" by MAYDAY::ANDRADE (The sentinel (.)(.)) Tue Mar 31 1992 12:33
    Re.6 (Bill)
    
    I agree DECmcc alarms need more functionality. Alarms should be more
    than a simple YES/NO check of an entity's attribute.   Having DECmcc
    alarms do one thing per alarm means that users have to create a lot
    of alarms that should not have been needed. 
    
    And this is what I am suggesting: making the alarms keep track of
    the availability of the data they check as well as the data itself,
    then passing the availability data in an effective manner to the user.
    
    Reducing the number of alarm rules, as well as the number of firings
    those rules do is one of my major goals. What I want is an ICON map
    that mirrors DECmcc's knowledge of the CURRENT state of the network.
    (This includes an indication of alarmed entities that are unreachable)
    
    With ONE mail sent to the appropriate people when something serious 
    happens. A node becoming unreachable is serious, I just don't want to
    hear about it thousands of times.
    
    
    Re.7 (Rob)
    
>    By the way, DECmcc DOES have a separate color for exceptions, it
>    is the color associated with the severity INDETERMINATE.
    
    This is news to me, it must be v1.2 functionality. My v1.1 never did
    this.
    
    
    Re.8 (MCPHERSON)
    
>The behavior of the product is this: Icon color is associated with an event or
>alarm rule severity.    If there is an event or alarm associated with this
>REACHABILITY attribute, THEN the object's icon will change color. 

    I know that each alarm rule has ONE color associated with it. What I 
    suggested is that it may be useful to have MANY colors associated with
    a single rule, avoiding the need to create many rules to do the same
    thing. Something like 
    	"STATE= (ON=green, OFF=red, UNREACHABLE=orange)"
    or
    	"UTILIZATION= ([<80]=green, [>=80]=yellow, UNREACHABLE=red)"
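
    Gil's multi-color syntax is a proposal, not DECmcc syntax. Its intended
    semantics can be sketched in Python as an ordered list of value bands
    plus a separate color for "unreachable". The band list and the function
    name are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Band = Tuple[Callable[[float], bool], str]

# "UTILIZATION= ([<80]=green, [>=80]=yellow, UNREACHABLE=red)"
UTILIZATION_BANDS: List[Band] = [
    (lambda value: value < 80, "green"),
    (lambda value: value >= 80, "yellow"),
]


def icon_color(
    value: Optional[float], bands: List[Band], unreachable_color: str
) -> str:
    """Pick the icon color for one poll result; None means unreachable."""
    if value is None:
        return unreachable_color
    for matches, color in bands:
        if matches(value):
            return color
    raise ValueError(f"no color band covers {value!r}")
```

    One rule with ordered bands replaces the separate rules that would
    otherwise be needed for each threshold.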
   
>Sorry.  Your interpretation for EXCEPTIONs firing is in conflict with DECmcc's
>prescribed behavior.  Exceptions fire when an alarm rule CANNOT be evaluated. 
>Period. That is the prescribed behavior.  

    I know that this is how they work. But consider this: you are polling
    10 systems (with 3 alarms each) in an Ethernet segment every 10 minutes. 
    This means that if that segment becomes unreachable for a day (it has
    happened), you will receive over FOUR THOUSAND alarm exceptions in a
    single day informing you that those systems are unreachable (and if you
    requested mail...).
  
    That is why I suggest, to reduce the workload of the users (and DECmcc),
    that those alarm exceptions inform the user ONCE per alarm that
    the system is unreachable, and ONCE again when it becomes reachable. It
    would be a lot better: 60 alarm exceptions as opposed to over 4,000.
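
    Gil's figures can be checked with a few lines of Python. The check
    assumes that, under the current behavior, each of the 30 rules raises
    exactly one exception per poll for the whole day. Under his proposal,
    each rule raises two (once going down, once coming back).

```python
SYSTEMS = 10
ALARMS_PER_SYSTEM = 3
POLL_INTERVAL_MINUTES = 10

rules = SYSTEMS * ALARMS_PER_SYSTEM                 # 30 alarm rules
polls_per_day = 24 * 60 // POLL_INTERVAL_MINUTES    # 144 polls per rule per day

# Current behavior: every poll of every rule raises an exception.
exceptions_per_day_now = rules * polls_per_day      # 4,320

# Proposed behavior: one exception when a rule becomes INDETERMINATE,
# one more when it becomes DETERMINATE again.
exceptions_proposed = rules * 2                     # 60
```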
    

>Also remember: Alarm rules gotta do what alarm rules gotta do.  They look for
>the data needed to satisfy rules in a BOOLEAN fashion.   I.e. there is no
>"Maybe.  Come back later" state for an alarm rule; they are either TRUE, FALSE
>or INDETERMINATE.  INDETERMINATE means that it couldn't evaluate either TRUE or
>FALSE and *that* is considered an EXCEPTION (and not the rule.... ;^)  )
    
    I agree a rule's gotta do what it's gotta do; all I am saying is that 
    exceptions should fire TWICE ONLY: ONCE when the rule becomes 
    INDETERMINATE and ONCE AGAIN when the rule becomes DETERMINATE again. 
    
    If, as indicated, v1.2 sets the ICON to the INDETERMINATE color "WHILE"
    the rule evaluation is INDETERMINATE, then these two things together 
    would provide what I requested in my original reply.
    
    
>Use the Data Collector AM & sample code & specify a target entity in the event.
>That'll do what you want fairly simply.    

    Thanks for the information, I will look into it. 
2619.10. "Wish->QAR->Engineering's 'to do' list..." by TOOK::MCPHERSON (Save a tree: kill an ISO working group.) Tue Mar 31 1992 13:21
    I urge you to file a QAR so that your suggestions about changes to the
    post 1.2 product will get in the queue.  Otherwise, they're likely to
    remain 'nice ideas'. 

    BTW: I understand your requirements and they would be nifty
    enhancements; you just 'cahn't get there from heah' right now.

    /doug

    P.S.  If you're getting tons of mail on your EXCEPTIONS, then you need
    to change the exception procedure to NOT send mail.  At least you
    wouldn't get flooded with mail when a segment goes unreachable...

    P.P.S. 
    You can also create "rules on rules" that look at the associated
    counters (i.e. if a rule is unable to evaluate >10 times, then open the
    pod bay doors... blah blah blah).
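
    A "rule on rules" of the kind Doug mentions can be sketched in Python
    as a second check over per-rule counters. The counter class is a
    hypothetical stand-in; the actual counter attributes DECmcc keeps for
    alarm rules are not named in this thread. The limit of 10 comes from
    the note.

```python
from dataclasses import dataclass


@dataclass
class RuleCounters:
    """Hypothetical evaluation counters for one alarm rule."""

    evaluations: int = 0
    exceptions: int = 0

    def record(self, evaluable: bool) -> None:
        """Count one evaluation; count an exception if it could not be evaluated."""
        self.evaluations += 1
        if not evaluable:
            self.exceptions += 1


def rule_on_rule_fires(counters: RuleCounters, limit: int = 10) -> bool:
    """Fire the meta-rule once the rule has failed to evaluate more than `limit` times."""
    return counters.exceptions > limit
```

    As written, the check stays true on every later evaluation once the
    limit is passed. In that respect it is level-triggered, like an
    ordinary alarm rule.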

    /doug
2619.11. "ok" by MAYDAY::ANDRADE (The sentinel (.)(.)) Thu Apr 09 1992 06:53
    Re.10 /doug
    
    Certainly, I will open a QAR if it will help, but Erik Mintz (re.3)
    has already done so for me.
    
    These last replies are just to iron the functionality details out.
    
    Gil
2619.12. "correlation of events" by TOOK::CALLANDER (MCC = My Constant Companion) Tue Apr 21 1992 11:45
    to make things a bit clearer, what you are asking for has already been
    defined by the standards on alarming, they call this function event
    correlation. There exists in the profile (the document that explains
    how to implement a standard) a description of the how to's and when
    to's for event correlation. The idea behind correlation is so that
    an event is only reported once, and another report doesn't occur until
    the condition has been changed (better or worse). In DECmcc we have
    implemented correlation at the PM level and not the FM level for V1.2
    (though it needs to be handled in the FM; but not until we figure out
    how to handle notification of these events when user processes come and
    go between the leading edge and trailing edge of an event). If you
    select an entity and ask to see its list of notifications (display
    notifications) you will see only the correlated list, not EVERY event
    that has been reported. We hope this helps, but are aware of the
    limitation that the implementation in the PM has on the user.
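
    The correlation Jill describes can be sketched in Python as a
    per-entity memory of the last reported condition. A condition is
    reported once, then again only when it changes, for better or worse.
    The class and function names are illustrative; this is not DECmcc's
    PM implementation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class Correlator:
    """Suppress repeated reports of an unchanged condition per entity."""

    def __init__(self) -> None:
        self._last_reported: Dict[Hashable, str] = {}

    def should_report(self, entity: Hashable, condition: str) -> bool:
        """Report only if the condition differs from the last one reported."""
        if self._last_reported.get(entity) == condition:
            return False
        self._last_reported[entity] = condition
        return True


def correlate(events: Iterable[Tuple[Hashable, str]]) -> List[Tuple[Hashable, str]]:
    """Filter a stream of (entity, condition) events down to the correlated list."""
    correlator = Correlator()
    return [
        (entity, condition)
        for entity, condition in events
        if correlator.should_report(entity, condition)
    ]
```

    Three "critical" polls of the same bridge followed by two "clear" polls
    reduce to two reports: the leading edge and the trailing edge.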
    
    jill
    
2619.13. "We hope to see this functionality also!!!!!" by COL01::LUNT () Tue Apr 28 1992 06:01
Hi,

	This correlation of alarms is really needed. We also hope to see it in
the follow on version to 1.2.

Julie Ann