
Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

1027.0. "Alarms Bug???" by JETSAM::WOODCOCK () Fri May 17 1991 17:00

Hello, I think we may have an alarms bug here, but a sanity check would be
good. The alarm described below was created and enabled. Note that no
SEVERITY or EXCEPTION HANDLING was specified in the rule. After it was
enabled, the NODE4 was taken off the net. Without exception handling I would
think no notification to the map should occur. But the alarm fired and a
color change was seen. The DATA reads "Node not currently accessible".
Furthermore, the SEVERITY is listed as CRITICAL, and the color matches. The
patch has been installed. Have things changed in the last kit or two, or have
I lost it???

brad...



Domain NOCMAN_NS:.PKO-24 Rule TEST 
AT 17-MAY-1991 15:33:14 Characteristics

Examination of attributes shows:
                  Alarm Fired Procedure = SYS$COMMON:[MCC]MCC_ALARMS_MAIL_ALARM.
                                          COM;7
                 Alarm Fired Parameters = "DECMCC"
                             Expression = (NODE4 LHPK01 MAXIMUM ADDRESS>1,AT 
                                          EVERY 00:02:00)
                               Severity = Indeterminate

T.R   Title   User   Personal Name   Date   Lines

1027.1. "You must have ghosts ..." by TOOK::ORENSTEIN () Mon May 20 1991 17:09 (27 lines)

    Hi Brad,
    
    	My only guess is that THIS is not the rule that fired.
    

Domain NOCMAN_NS:.PKO-24 Rule TEST 
AT 17-MAY-1991 15:33:14 Characteristics

Examination of attributes shows:
                  Alarm Fired Procedure = SYS$COMMON:[MCC]MCC_ALARMS_MAIL_ALARM.
                                          COM;7
                 Alarm Fired Parameters = "DECMCC"
                             Expression = (NODE4 LHPK01 MAXIMUM ADDRESS>1,AT 
                                          EVERY 00:02:00)
                               Severity = Indeterminate


    Is there any chance that other rules were running?  Perhaps you forgot
    to turn one off?
    
    You mention DATA -- was this in a mail message or a log?  In these two
    cases, the rule name should be right there too.
    
    It really doesn't sound like a software bug.  SEVERITY cannot change.
    I am interested in anything else strange you find.
    
    aud...

1027.2. "exception procedure different than icon notification" by TOOK::CALLANDER () Mon May 20 1991 17:15 (7 lines)

BTW even if you don't define an action (exception handler procedure) to
do something when the rule exception case is found, the
notification services WILL still pick up the event and cause an
icon color change; but it should be using the severity associated
with the rule (like Audrey said).

jill

1027.3. "gone now" by JETSAM::WOODCOCK () Mon May 20 1991 18:06 (25 lines)

Hi there,

I definitely have had spirits in this system as of late!!! It was the
same rule, the one I showed in .0 and the one which alarmed. The notification
window is where I got the DATA from. Also the name of the rule was 
identical in both the notify window and the defined rule.

Actually, I had *several* rules fire with the same symptom. This led me to 
create the TEST rule to check myself. I suspect something happened when the
patch was installed. BTW, power was shut off over the weekend and therefore the
system has been rebooted. I just double checked and the problem has
disappeared (the correct severity now shows both in color and notify text).

> BTW even if you don't define an action (exception handler procedure) to
> do something when the rule exception case is found, the
> notification services WILL still pick up the event and cause an
> icon color change; but it should be using the severity associated
> with the rule (like Audrey said).

Mail was also sent (and still does). Is this the correct action? I would
think if exception handling isn't defined the user probably doesn't want
his procedure to fire.

regards,
brad...

1027.4. "problem persists" by JETSAM::WOODCOCK () Tue May 21 1991 10:33 (18 lines)

Today's view...

In double checking my testing I have found a couple of things which I
incorrectly stated in the last note. This problem STILL exists.

Clarification on the exact reactions is as follows.

1. Enable rule TEST (with the node up; the alarm is expected to fire)
	- Rule fires with proper severity (color) and sends mail. All
	  seems ok.

2. Disconnect node and enable rule TEST 
	- Rule fires a color change but does NOT send mail (this differs
	  from before the MCC node was rebooted, when mail was also sent)
	- The severity is CRITICAL instead of INDETERMINATE (color matches
	  critical definition).

Any ideas???

1027.5. "The ghosts are gone ..." by TOOK::ORENSTEIN () Tue May 21 1991 11:13 (14 lines)

    
    Yes, that's just right!
    
    If an exception occurs, an EXCEPTION event is generated by ALARMS
    (regardless of any exception handler).  The module that lights up
    the color is aware that this is an exception and thus uses the 
    color associated with CRITICAL.
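
An illustrative sketch of the behaviour described above, written in Python rather than anything MCC-specific. Every name here (`evaluate_rule`, `EntityUnreachable`, `Event`, `node_down`) is hypothetical and not a DECmcc interface; the sketch only assumes what this thread states: an exception event is emitted whether or not an exception handler is defined, and the display treats it as CRITICAL.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class EntityUnreachable(Exception):
    """Raised by the (hypothetical) poll function when the entity cannot be reached."""


@dataclass
class Event:
    rule: str
    kind: str      # "FIRED" or "EXCEPTION"
    severity: str  # severity the notification display would colour by
    data: str


def evaluate_rule(
    name: str,
    rule_severity: str,
    poll: Callable[[], int],
    threshold: int,
    exception_handler: Optional[Callable[[str, str], None]] = None,
) -> Optional[Event]:
    """Evaluate one rule once and return the event the map would see, if any."""
    try:
        value = poll()
    except EntityUnreachable as err:
        # The handler is optional and only controls the user's own procedure.
        if exception_handler is not None:
            exception_handler(name, str(err))
        # The event itself is emitted regardless of any handler, and the
        # display colours it as CRITICAL (behaviour described in this thread).
        return Event(name, "EXCEPTION", "CRITICAL", str(err))
    if value > threshold:
        return Event(name, "FIRED", rule_severity, f"value {value} > {threshold}")
    return None


def node_down() -> int:
    """Stand-in for polling a node that has been taken off the net."""
    raise EntityUnreachable("Node not currently accessible")


if __name__ == "__main__":
    print(evaluate_rule("TEST", "INDETERMINATE", lambda: 1023, 1))
    print(evaluate_rule("TEST", "INDETERMINATE", node_down, 1))
```

With the node reachable the event carries the rule's own severity; with the node unreachable the same rule yields an exception event with CRITICAL, which matches the two cases listed in .4.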
    
    It looks like things are back to normal for you.
    
    As to why you even received mail the other day, I still think it
    is a ghost :)
    
    aud...

1027.6. "" by TOOK::GUERTIN (I do this for a living -- really) Tue May 21 1991 11:55 (8 lines)

    RE: .-1
    
    You mean if we had a "Nuclear Reactor" icon, you would change it to the
    color of CRITICAL, because your "SHOW" call failed?  I wonder if this
    is the right model.  Food for thought: How about an "EXCEPTION" state?
    
    -Matt.
    
1027.7. "is it the right model" by TOOK::CALLANDER () Tue May 21 1991 12:21 (13 lines)

I don't know if it is the right model, but I do understand some of the
reasoning that went into it.

In most cases, when a rule can no longer be evaluated (and it is retried a
number of times before giving up), the entity is not accessible for some
reason, and in most cases that seemed to be of critical importance. As to
whether all users would agree with that, well, who knows... someone made an
executive decision to try it that way. If you have feedback (like "I like it
most of the time but would prefer it to be customizable so that I can have
it come out at the severity I picked...") then enter it in this note. Your
feedback (and any comments you want to enter on behalf of customers) will
help us make 1.2 more user friendly.

1027.8. "exception<>critical" by JETSAM::WOODCOCK () Tue May 21 1991 17:28 (15 lines)

I also don't believe the SEVERITY should be established automatically by
MCC as critical. Adding an EXCEPTION severity seems to be a good idea if
there is time for v1.2. This way the user can set this level to a different
color. When v1.2 comes out (and icon colors can toggle up or down in
severity level, i.e. critical -> clear), a lot of users will want to use the
graphics to know the state of the network in real time as they see it,
without intervention. With this very common scenario it would be wiser not
to confuse the user by mixing critical and exception, and to use separate
colors for clarity.

thanks for clearing/confirming,
brad...



1027.9. "It may just boil down to a matter of opinion" by TOOK::GUERTIN (I do this for a living -- really) Wed May 22 1991 10:48 (21 lines)

    RE:.7
    
    Jill,
      I understand the justification for going with this model now, thanks.
      As I understand it, we (MCC) will be listening for end-user input to
      determine if we should change the way we determine severity/color.
    
      To me it's like buying a smoke detector that sets off the alarm
      whenever the battery gets low.  The first time the alarm goes off,
      everyone runs out of the house.  The second time you run around and
      check if there really is a fire.  After a while, you just don't put a
      battery in.  I think that is why smoke detectors tell you in a
      *different* way that the battery is low.  I have two smoke detectors
      at home.  When the battery is low, one gives off a fast (but not too
      loud) beep-beep-beep, and the other has a flashing light that just
      stops flashing.  Personally, I'm very happy that the engineers who
      designed them decided not to have them just set off the alarm.
      
      Just one man's opinion :-)
      
  -Matt.

1027.10. "Bug? What bug...." by WAKEME::ANIL () Wed May 22 1991 13:34 (59 lines)

    As usual, interesting topics always pop up when I am not around!! Sorry
    for the delay, folks; I was off on a course for the last two days.

    First to you, Brad. As Audrey and Jill pointed out, the color change
    you saw is exactly the way we designed it. Now that we hear that both
    Matt and you are not exactly happy with the model, we will have to
    rethink the strategy. But before I open this can, let me give some of
    the reasoning behind the way the color changes.

    Two types of alarms are of interest to us: one where the equipment you
    are monitoring fails, and the other where the monitoring equipment fails.
    Matt, let us take your example, which should help us understand
    where we want to go.

        Say the fire alarm monitor fails at 10:30 PM. The alarm is
        supposed to stop blinking (i.e. only a visual indication), giving
        a clear indication that if there is a fire now, I am not watching!
        Most of the time you get up the next day and fix the problem, i.e.
        change the battery. What if you are on vacation? Well, then the
        monitor has to wait till you return. Now what if there is a real
        fire? Got the point?

        Of course one can go wild and say all fire alarm monitors should
        be hooked up to the town's fire department. But that's another story.

    In V1.2 we plan on doing the following. Now, here is your chance, folks.
    Bear in mind that resources are limited. If you do have a good idea,
    let's hear it.

    In V1.2 the exception severity will be changed to indeterminate.

    Under OSI, the following values are associated with the different severities.

                      Indeterminate  = 0,             
                      Critical       = 1,             
                      Major          = 2,             
                      Minor          = 3,             
                      Warning        = 4,             
                      Clear          = 5 

    The standard is silent on which severity is higher and which is lower,
    and for a good reason. I think using indeterminate does solve the
    problem of assigning to a severity a meaning that the user may not want
    associated with it.

    Also, in V1.2 we will generate a Rule Clear event that has
    the severity "Clear" associated with it.
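
As a rough illustration only, the severity values listed above and the V1.2 plan can be written down in Python. The numeric values come from the list in this note; the names `Severity`, `exception_severity`, and `RULE_CLEAR_SEVERITY`, and the boolean flag, are made up for the sketch and are not MCC identifiers.

```python
from enum import Enum


class Severity(Enum):
    """Severity values as listed in this note.

    A plain Enum (not IntEnum) is used on purpose: the note says the
    standard does not rank the severities, so ordering comparisons such as
    CRITICAL < MAJOR are left undefined here.
    """
    INDETERMINATE = 0
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    WARNING = 4
    CLEAR = 5


# Severity planned for the V1.2 Rule Clear event, per this note.
RULE_CLEAR_SEVERITY = Severity.CLEAR


def exception_severity(v1_2_plan: bool) -> Severity:
    """Severity attached to a rule exception.

    Behaviour discussed in this thread: CRITICAL today, INDETERMINATE
    under the V1.2 plan. The flag is a hypothetical switch for the sketch.
    """
    return Severity.INDETERMINATE if v1_2_plan else Severity.CRITICAL


if __name__ == "__main__":
    print(exception_severity(v1_2_plan=False))
    print(exception_severity(v1_2_plan=True))
    print(Severity(5))
```

Keeping the exception severity in one function means no extra field is needed on rule creation, which is the constraint raised in the next paragraph.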

    My problem with adding one more argument to Create rule (rule
    exception severity) is that we already have ~15 fields to fill in
    due to OSI compatibility requirements. Let's not add one more
    unless it's absolutely necessary!

    Let us know if we have been off the wall!

    Thanks,

    - Anil Navkal

1027.11. "indeterminate=exceptable" by JETSAM::WOODCOCK () Wed May 22 1991 14:15 (9 lines)

Exception=Indeterminate will work from my point of view. As far as I see
it an exception problem IS indeterminate until an intelligent decision
can be made given the error condition. It is up to the user to decide the
severity based on the error and react accordingly. A flag to the user is
needed but it must be unbiased to severity, and indeterminate would solve
this the same as creating an exception severity. V1.2 plans look ok from
here.

brad...

1027.12. "Thanks Brad" by WAKEME::ANIL () Wed May 22 1991 15:30 (3 lines)

Thanks Brad.

- Anil

1027.13. "Working UI issues is a long tough job" by ENUF::GASSMAN () Thu May 23 1991 08:24 (17 lines)

    Here is a case where experience with MSU can be drawn from without
    looking like we are stealing patented techniques from other vendors.
    When a remote polling daemon goes down, all the devices that were being
    polled by that daemon are colored 'indeterminate' - now, in the MSU
    circles, discussion is going on about leaving the color the same as it
    last was seen - but perhaps making the icon dotted.  The reality is
    that often the problem is somewhere between the management system and
    the agent - sometimes it's even the agent that is broken.  The concept
    of an "I tried to get information but something stopped me" condition is
    needed.  A level deeper would try to determine what stopped the poll:
    was it a timeout, did the network give a specific error message, or did
    the remote device itself indicate that, while it was somewhat alive, it
    wasn't going to service your management request?
    Here's a case where experience will tell us how to make it work, and
    customers will tell us if they like it that way.
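
A small sketch of the idea under discussion here: keep the last known color when a poll fails, mark the icon as dotted, and record why the poll failed. This is not MSU or MCC code; `PollOutcome`, `IconState`, and `apply_poll` are invented names, and the three failure kinds are simply the ones listed in the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PollOutcome(Enum):
    OK = "ok"
    TIMEOUT = "timeout"                # no answer at all
    NETWORK_ERROR = "network error"    # the network reported a specific error
    AGENT_REFUSED = "agent refused"    # device is alive but will not serve the request


@dataclass
class IconState:
    color: str
    dotted: bool = False                  # True means "last known state, not current"
    reason: Optional[PollOutcome] = None  # why the last poll failed, if it did


def apply_poll(icon: IconState, outcome: PollOutcome,
               new_color: Optional[str] = None) -> IconState:
    """Return the icon state after one poll attempt."""
    if outcome is PollOutcome.OK:
        # Fresh data: solid icon, new color if one was reported.
        return IconState(color=new_color or icon.color)
    # Poll failed: keep the last color, but show the icon as stale and say why.
    return IconState(color=icon.color, dotted=True, reason=outcome)


if __name__ == "__main__":
    icon = IconState("green")
    icon = apply_poll(icon, PollOutcome.TIMEOUT)
    print(icon)
    icon = apply_poll(icon, PollOutcome.OK, "red")
    print(icon)
```

The point of the design is that a failed poll never changes the color, so "could not ask" and "asked and it is bad" stay visually distinct.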
    
    bill

1027.14. "" by TOOK::STRUTT (Management - the one word oxymoron) Mon May 27 1991 17:08 (15 lines)

    It's not clear to me that an exception received while evaluating a rule
    has any business being associated *directly* with the icon that will
    change when the rule fires.
    Overloading "indeterminate" seems equally inappropriate.
    
    What you might be better off with is having some way to indicate that
    there's a problem with "the alarm evaluation system" (sort of like
    Matt's analogy with the smoke detectors). One, but perhaps not the
    best, approach might be to have an icon that represents the alarm
    system. You already have something like that in the ability to show
    alarms - though you "cheat" by having the iconic map implement a special
    way of accessing that information. Maybe there's a consistent 'model'
    that could be used to deal with both things?
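
One way to picture this suggestion, as a hedged sketch only: rule exceptions are routed to a separate "alarm system" icon instead of recoloring the entity the rule was watching. `IconMap` and its methods are hypothetical, and the color names are placeholders rather than MCC defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IconMap:
    # Color of each managed entity's icon, driven only by rules that fired.
    entity_colors: Dict[str, str] = field(default_factory=dict)
    # Problems with rule evaluation itself: (rule, entity, reason).
    alarm_system_faults: List[Tuple[str, str, str]] = field(default_factory=list)

    def rule_fired(self, entity: str, severity_color: str) -> None:
        """A rule evaluated successfully and fired: recolor the entity."""
        self.entity_colors[entity] = severity_color

    def rule_exception(self, rule: str, entity: str, reason: str) -> None:
        """A rule could not be evaluated: record it against the alarm system only."""
        self.alarm_system_faults.append((rule, entity, reason))

    def alarm_system_color(self) -> str:
        """Color of the icon representing the alarm evaluation system."""
        return "yellow" if self.alarm_system_faults else "green"


if __name__ == "__main__":
    m = IconMap()
    m.rule_fired("NODE4 LHPK01", "blue")
    m.rule_exception("TEST", "NODE4 LHPK01", "Node not currently accessible")
    print(m.entity_colors["NODE4 LHPK01"], m.alarm_system_color())
```

In this model the entity icon keeps the color of the last rule that actually fired, and the exception shows up only on the alarm-system icon.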
    
    Colin