Title: | DECmcc user notes file. Does not replace IPMT. |
Notice: | Use IPMT for problems. Newsletter location in note 6187 |
Moderator: | TAEC::BEROUD |
Created: | Mon Aug 21 1989 |
Last Modified: | Wed Jun 04 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 6497 |
Total number of notes: | 27359 |
14 Translan alarms are enabled to poll the bridges at 15-second intervals to detect changes of status on the links. A sample alarm looks like this:

    Alarm Fired Procedure = DISK_USER:[NETWORKS.MCC.ALARMS]AWB_BRS_LINK.COM;1
    Alarm Exception Procedure = SYS$COMMON:[MCC]MCC_ALARMS_LOG_EXCEPTION.COM;3
    Description = "AWB-BRS Megalink has RECOVERED"
    Category = "Bridge Recovery"
    Expression = (CHANGE_OF(TRANSLAN AWBTL3503 line 2 module state,*, FORWARDING), at every 00:00:15)
    Severity = Critical

The problem is that for the past 2 weeks the alarms have been raising exceptions at random on all bridges. The exceptions are logged in the exception log file, and some entries also appear in MCC_ALARMS_date_ERROR.LOG. Two types of exception are seen:

1. %MCC-W-TIME_ALREADY_PA, scheduled time has already passed
   The rule has been disabled
2. Cannot communicate with target

The TIME_ALREADY_PA exception is of particular concern, since the rule itself does not specify a start or end time for the alarm. Once the rule is disabled, no further polling is done until the alarm is re-enabled by a command procedure that runs at midnight.

Any comments on why it would get the TIME_ALREADY_PA exception? Is polling the bridges every 15 seconds too frequent?

A second problem, which may or may not be related, is that the above alarms are kept alive by a detached process. The detached process runs a command procedure which enables the Translan alarms and then issues a show command within MCC with start="duration", where "duration" is the time remaining until midnight. This keeps the process alive until midnight. The procedure then exits from DECmcc, which disables all alarms, and loops back to enable the alarms for another 24 hours.

Over a period of two to three days, the detached process is observed to gradually increase its CPU usage to over 60%. The process has to be restarted to bring it back to normal (less than 6% CPU utilization). This process is created under the account that normally manages DECmcc and is created with /authorized.

Any comments on why CPU usage creeps up to such a huge amount?

Andrew
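For readers following the keep-alive scheme described above, the "duration to midnight" value can be computed along the following lines. This is only a rough Python sketch, not the original command procedure: the function names are hypothetical, and the hh:mm:ss output simply mirrors the "00:00:15" form seen in the alarm expression. The exact duration syntax DECmcc expects for start= is an assumption here.

```python
from datetime import datetime, timedelta


def duration_until_midnight(now: datetime) -> timedelta:
    """Return the time remaining from `now` until the next midnight."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight - now


def format_hms(delta: timedelta) -> str:
    """Format a timedelta as hh:mm:ss (assumed format, see note above)."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Print the remaining time for the current wall-clock time.
    print(format_hms(duration_until_midnight(datetime.now())))
```

A procedure started at exactly midnight gets a full 24-hour duration, which matches the "another 24 hours" cycle described in the note.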
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
2976.1 | ease up on polling interval.. | TOOK::MCPHERSON | Life is hard. Play short. | Tue May 12 1992 09:21 | 14 |
Yes, a 15s polling interval is too short. You're probably digging yourself into a hole really fast, esp. if the Translans are busy or the lines between them are congested. Suggestion: Using FCL, try the command for each of the Translans in your network and note the longest response time (probably do this a few times just to get a 'feel' for their average response times). Use that value + maybe a 10% fudge factor to derive your shortest "least common denominator" for polling Translans for that attribute. Unfortunately, the CPU usage issue has me boggled... Maybe someone from the alarms team can help there. /doug | |||||
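The arithmetic in the suggestion above (longest observed response time plus a fudge factor of about 10%) could be sketched like this in Python. The function name and the sample timings are hypothetical and purely illustrative; real values would come from timing the FCL command against each Translan by hand.

```python
def shortest_safe_poll_interval(response_times_s, fudge=0.10):
    """Return the longest observed response time plus a fudge factor.

    response_times_s: measured response times in seconds, one or more
                      per Translan (hypothetical input, gathered by hand).
    fudge:            fractional safety margin; 0.10 means 10%.
    """
    if not response_times_s:
        raise ValueError("need at least one measured response time")
    return max(response_times_s) * (1.0 + fudge)


if __name__ == "__main__":
    # Made-up sample timings in seconds, for illustration only.
    samples = [12.0, 18.5, 20.0]
    print(f"poll no more often than every {shortest_safe_poll_interval(samples):.1f} s")
```

If the result comes out above the configured interval (15 seconds in the base note), a poll may still be outstanding when the next one is scheduled, which would at least be consistent with the "scheduled time has already passed" exception.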
2976.2 | | GIDDAY::CHONG | Andrew Chong - Sydney CSC | Tue May 12 1992 20:24 | 7 |
The alarm polling interval will be eased back to 30 seconds to see what effect it has. I can understand, though, how the rules could be disabled by TIME_ALREADY_PASS exceptions. Andrew | |||||
2976.3 | longer poll interval == problem solved | GIDDAY::CHONG | Andrew Chong - Sydney CSC | Fri May 22 1992 02:20 | 6 |
Increasing the poll interval to 30 seconds has cleared up the problem. It has also decreased the CPU usage of the detached process. Andrew |