Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

2151.0. "Event Alarms problem" by BAHTAT::TAYLOR () Tue Jan 21 1992 09:53


	Can anyone shed some light on what I see happening on a DECmcc 1.1
   implementation?
        VS3100-76/32Mb VMS5.4-2 managing a network of 50+ DECrouter 2000's.
	All alarms are based on DECnet events (i.e. Occurs clauses). A total
   of 156 rules have been written and enabled. Since the network was quite 
   stable, an extra rule was created to allow a go/no-go test of the alarms
   function.

	Imagine my surprise when, after enabling all the alarms, I found that
   my test would not work. After disabling the operational alarms the test 
   alarm worked fine. I then enabled further alarms in groups and found that
   with approx. 110 alarms enabled everything worked fine, but when another
   20+ were enabled everything stopped working. That is to say, the test rule
   never fired, although the relevant DECnet event was received by the MCC 
   station (OPCOM logged the event).
      On checking EVL.LOG/MCC_DNA4_EVL.LOG there are no error messages and
   the system looks otherwise OK.

       Anyone else seen this ?

       Is this some resource problem ?


	Thanks in anticipation

		R.
    
T.R  Title  User  Personal Name  Date  Lines
2151.1  Some questions  TOOK::ORENSTEIN  Tue Jan 21 1992 12:36  12
    
    When you said "the rules stopped firing", did you mean that you saw
    no more notification on the map, or that you stopped receiving your
    command script notification, or that the counters stopped going up?
    
    I don't mean to be picky, but each scenario can represent an entirely
    different problem.  Once we determine exactly what stopped working
    we can try to figure out why.
    
    How often did you expect your rules to fire?
    
    aud...
2151.2  More Info  BAHTAT::BOND  Tue Jan 21 1992 12:49  16
    Aud,
    
    I was discussing this earlier today with Richard, and in his absence
    will attempt to answer.  I believe that Richard was saying that the
    rule did not fire, i.e. that any batch job associated with the rule
    evaluating true did not get launched, and no colour change was seen on
    the iconic map.  As Richard said, it was an 'OCCURS' type rule and his
    OPCOM indicated that the system had seen the event coming from EVL.
    
    It seemed to us that the problem had occurred as a result of having
    more than 110 (but fewer than 130) rules enabled.  The only significant
    number I can think of between these two is 127 (or maybe 128), so we
    were thinking of some sort of resource problem.
    
    Hope this helps,
    chris
2151.3  < more clarification >  BAHTAT::TAYLOR  Wed Jan 22 1992 04:45  15
    
     Aud,
    	Just to confirm and expand on .2.
    The map and script notification both ceased. The only alarm expected to
    "fire" was the test alarm, which could be executed at will. (Counters
    zeroed on a remote node.) OPCOM received the relevant event each time
    the remote node counters were zeroed but MCC gave no indication at all.
    Once ALL the Alarms had been disabled and the test alarm re-enabled it
    "fired" each and every time the counters were zeroed. I left the
    station with 78 alarms enabled on "key" routers and everything was
    operating fine.
    
    Thanks for the quick response.
    
    		R.
2151.4  Further information  BAHTAT::TAYLOR  Thu Jan 23 1992 08:01  35
  Aud,

   Further to the previous replies I can now add some more information.
   
   I have spent the last couple of days re-creating the problem back here
 in the office. Observations :-
 
   1. The problem IS reproducible.
   2. At the first attempt the test worked fine with up to 90 alarms enabled.
      Enabling a further 15 alarms caused the test to cease working. Disabling
      the last group of alarms resulted in two things :-
             1) The last of 3 "undelivered" alarms fired off.
             2) The test started working again.
      Alarms were then enabled in 5's. At 100 the test stopped working again -
      however when the last group were disabled the test still did not work,
      in fact ALL alarms had to be disabled before the test could be 
      re-enabled and would work. Also the Getevent directive would not work
      with the test "event".
   3. Continuing to enable alarms, the test failed with only 75 alarms enabled,
      and could only be recovered by disabling all alarms, as before.
   4. This continued, surviving a number of complete logouts and even a system
      reboot until the test would not even work with 15 alarms enabled.
   5. After deleting the Alarm_Instance/Alarm_Attribute .dat files I started
      anew. After creating 180 alarms I once again enabled them in groups of
      15, testing after each group was enabled. At 75 everything worked fine,
      but at 90 the test failed. After disabling alarms 75-90 the test worked 
      OK again. From here I enabled each alarm manually, allowing several 
      seconds to elapse between each enable command. After each group of 5
      alarms was enabled I checked the test. In this way I managed to enable
      all 180 alarms AND have the test work successfully every time.
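
      The paced procedure in step 5 could be scripted roughly as follows.
      This is only a sketch in Python, not anything supplied with DECmcc:
      `enable_rule` and `test_fires` are hypothetical hooks that the reader
      would have to write (for example, one that issues the enable command
      for a single rule and one that zeroes the remote counters and checks
      whether the test alarm fired). The pause length and group size are
      taken from the description above ("several seconds", groups of 5).

```python
import time
from typing import Callable, Iterable, List, Tuple


def enable_paced(
    rules: Iterable[str],
    enable_rule: Callable[[str], None],   # hypothetical: enables one alarm rule
    test_fires: Callable[[], bool],       # hypothetical: runs the go/no-go test
    check_every: int = 5,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], bool]:
    """Enable rules one at a time, pausing after each enable, and run the
    go/no-go test after every `check_every` rules.

    Returns (rules enabled so far, whether the last test passed). Stops
    at the first group after which the test fails.
    """
    enabled: List[str] = []
    for count, rule in enumerate(rules, start=1):
        enable_rule(rule)
        enabled.append(rule)
        sleep(pause_seconds)  # let the enable settle before the next one
        if count % check_every == 0 and not test_fires():
            return enabled, False
    # Final check, which also covers a trailing partial group.
    return enabled, test_fires()
```

      If the failure really is timing-related, a run with `pause_seconds=0`
      should fail well before a paced run does; that comparison is a guess
      about the cause, not something established in this thread.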

      Does any of this help at all ????

		R.
2151.5  Thanks for the detail  TOOK::ORENSTEIN  Thu Jan 23 1992 12:33  8
    Thank you so much for the very complete report.  With this detail we
    will have a better chance of figuring out your problem.
    
    It may be a couple of days before we can get back to you due to the
    huge amount of work we need to do for V1.2.
    
    	aud...
    
2151.6  TOOK::SWIST  Jim Swist LKG2-2/T2 DTN 226-7102  Thu Jan 23 1992 14:51  5
    The number of alarm rules you can enable in the 1.2 EFT kit is severely
    limited and we are working on the problem.   Both VMS and Ultrix fail
    in the vicinity of 100 rules (for different reasons, but that's no
    comfort to the user).