
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

2976.0. "Translan alarms exceptions" by GIDDAY::CHONG (Andrew Chong - Sydney CSC ) Tue May 12 1992 02:17

	
	14 Translan alarms are enabled to poll the bridges at 15-second
intervals to detect changes of status on the links.

	A sample alarm looks like this:


Alarm Fired Procedure = DISK_USER:[NETWORKS.MCC.ALARMS]AWB_BRS_LINK.COM;1
Alarm Exception Procedure = SYS$COMMON:[MCC]MCC_ALARMS_LOG_EXCEPTION.COM;3
Description = "AWB-BRS Megalink has RECOVERED"
Category = "Bridge Recovery"
Expression = (CHANGE_OF(TRANSLAN AWBTL3503 line 2 module state,*, FORWARDING), 
					  at every 00:00:15)
Severity = Critical
                               

	The problem is that for the past two weeks exception alarms have been
firing at random on all bridges. The exceptions are logged in the exception
logfile, and some entries also appear in MCC_ALARMS_date_ERROR.LOG.

Two types of exceptions are seen:

    1.	%MCC-W-TIME_ALREADY_PA, scheduled time has already passed
        	           The rule has been disabled

    2. 	Cannot communicate with target
	
The TIME_ALREADY_PA exception is of particular concern, since the rule
itself does not specify a start or end time for the alarm. Once the rule is
disabled, no further polling is done until the alarm is re-enabled
by a com procedure that runs at midnight.


	Any comments on why it would get the TIME_ALREADY_PA exception?
	Is polling the bridges every 15 seconds too frequent?
	
	A second problem, which may or may not be related, is that the above
alarms are kept alive by a detached process. The detached process runs a com
procedure which enables the Translan alarms and then does a show command within
MCC with start="duration", where "duration" is the time remaining until
midnight. This keeps the process alive until midnight. The procedure then exits
from DECmcc, which disables all alarms, and loops back to enable the alarms for
another 24 hours. Over a period of two to three days, the detached process is
observed to gradually increase its CPU usage to over 60%. The process has to be
restarted to get it back to normal (less than 6% CPU utilization).

	This process is created using the account that normally manages DECmcc
and is created with /authorized.

	Any comments on why CPU usage creeps up to such a huge amount?

	Andrew
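
For readers who want to see the "duration to midnight" arithmetic spelled out, here is a minimal sketch in Python. It is not the com procedure from the note above, which is not shown in this thread. The function name is hypothetical, and the HH:MM:SS output format is an assumption modelled on the "at every 00:00:15" clause in the sample alarm; the exact format DECmcc expects for start= is not given here.

```python
from datetime import datetime, timedelta


def duration_to_midnight(now: datetime) -> str:
    """Return the time remaining until the next midnight as HH:MM:SS.

    Hypothetical helper: the HH:MM:SS format is an assumption, not a
    documented DECmcc requirement.
    """
    # Midnight at the start of the following day.
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Whole seconds remaining (fractions of a second are dropped).
    remaining = int((next_midnight - now).total_seconds())
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Timestamp of the base note, used only as sample input.
    print(duration_to_midnight(datetime(1992, 5, 12, 2, 17, 0)))
```

Recomputing the duration each time the procedure loops, rather than hard-coding 24 hours, keeps the restart aligned with midnight even if the process was started partway through the day.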

T.R	Title	User	Personal Name	Date	Lines
2976.1	ease up on polling interval..	TOOK::MCPHERSON	Life is hard. Play short.	`Tue May 12 1992 09:21`	14
	Yes, a 15s polling interval is too short. You're probably digging yourself into a hole really fast, esp. if the Translans are busy or the lines between them are congested. Suggestion: Using FCL, try the command for each of the Translans in your network and note the longest response time (probably do this a few times just to get a 'feel' for their average response times). Use that value + maybe a 10% fudge factor to derive your shortest "least common denominator" for polling Translans for that attribute. Unfortunately, the CPU usage issue has me boggled... Maybe someone from the alarms team can help there. /doug
2976.2		GIDDAY::CHONG	Andrew Chong - Sydney CSC	`Tue May 12 1992 20:24`	7
	The alarms polling interval will be eased back to 30 seconds to see what effect it has, though I can understand how the rules could be disabled with TIME_ALREADY_PA exceptions. Andrew
2976.3	longer poll interval == problem solved	GIDDAY::CHONG	Andrew Chong - Sydney CSC	`Fri May 22 1992 02:20`	6
	Increasing the poll interval to 30 seconds has cleared up the problem. It has also decreased the CPU usage of the detached process. Andrew
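
The rule of thumb from 2976.1 (longest observed response time plus roughly a 10% fudge factor) can be sketched as below. This is only an illustration in Python: the sample response times are invented, the function names are hypothetical, and the idea that a poll outlasting its interval is what raises TIME_ALREADY_PA is an assumption consistent with the thread, not something confirmed in it.

```python
import math


def min_poll_interval(response_times_s, fudge=0.10):
    """Longest observed response time plus a fudge factor, rounded up
    to whole seconds."""
    if not response_times_s:
        raise ValueError("need at least one measured response time")
    return math.ceil(max(response_times_s) * (1 + fudge))


def count_overruns(response_times_s, interval_s):
    """Count polls whose response took longer than the polling interval.

    Assumption: such a poll would leave the next scheduled time already
    in the past when it completes.
    """
    return sum(1 for t in response_times_s if t > interval_s)


if __name__ == "__main__":
    # Hypothetical response times in seconds, NOT measurements from
    # the thread.
    samples = [4.2, 9.8, 17.5, 6.1, 21.0]
    print("suggested minimum interval:", min_poll_interval(samples))
    print("overruns at 15 s:", count_overruns(samples, 15))
    print("overruns at 30 s:", count_overruns(samples, 30))
```

With these made-up samples, two polls overrun a 15-second interval and none overrun a 30-second one, which matches the direction of the result reported in 2976.3. Real figures would have to come from timing the Translans themselves.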