| OK -- I give up. I simply do not understand why what I describe here
does not work. Please extract the command file below and try it yourself.
I created the following command file to play with setting up some alarms
on node counters. I set up 6 alarm rules which are basically identical
except for the counter each one monitors (why? because that's the way I did it).
See attached command file for CREATE commands.
Behavior:
Alarm rules 1 and 2 kick in after the appointed 1 min wait, do their thing
and remain enabled.
Rules 3-6 also kick in after the 1 min wait and immediately become disabled.
My MCC$Alarms_9-may-1990_errors.log indicates that alarms 3-6 experienced
an '%MCC-S-COMMON_EXPECTIO, Success with common exception reply' and
'Software logic error detected
MCC Routine Error = %MCC-E-ILVSYNTAXERROR, the ASN.1 valu
e's data type conflicts with supplied
type'
(ugly spacing BTW but that's not the issue)
Question:
Why do the first 2 alarms work and the next 4 fail? Initially, I had them
all running through the same notify command file; then I broke them up, which
made no difference. I tried enabling them in different orders once I
had created the rules; that made no difference either. My conclusion is that
there is something wrong with rules 3-6, but I do not know what. What doesn't
MCC like about rules 3-6? (I thought these were pretty straightforward,
but sometimes you get too close and need other eyes to see.)
Is this a dictionary problem?
/Claudia
!==================================================
! Alarms testing command file
!
! Delete the old rules and add the new rules.
!
DELETE MCC ALARMS RULE NODE_STATUS_ALARM1
DELETE MCC ALARMS RULE NODE_STATUS_ALARM2
DELETE MCC ALARMS RULE NODE_STATUS_ALARM3
DELETE MCC ALARMS RULE NODE_STATUS_ALARM4
DELETE MCC ALARMS RULE NODE_STATUS_ALARM5
DELETE MCC ALARMS RULE NODE_STATUS_ALARM6
!
!
CREATE MCC ALARMS RULE NODE_STATUS_ALARM1 -
EXPRESSION = (NODE4 CLAUDI RECEIVED CONNECT RESOURCE ERROR > 0, AT -
START =(+00:01:00) EVERY=01:00:00 ), -
PROCEDURE = MCC$COMMON:MCC$ALARMS_NOTIFY1.COM, -
PARAMETER = "ResourceError", -
CATEGORY = "COUNTERS CHECK", -
DESCRIPTION = "Test node counters", -
QUEUE = "SYS$BATCH"
ENABLE MCC ALARMS RULE NODE_STATUS_ALARM1
SHOW MCC ALARMS RULE NODE_STATUS_ALARM1 ALL STATUS
!
!
CREATE MCC ALARMS RULE NODE_STATUS_ALARM2 -
EXPRESSION = (NODE4 CLAUDI RESPONSE TIMEOUTS > 0, AT -
START =(+00:01:00) EVERY 01:00:00 ), -
PROCEDURE = MCC$COMMON:MCC$ALARMS_NOTIFY2.COM, -
PARAMETER = "ResponseTimeouts", -
CATEGORY = "COUNTERS CHECK", -
DESCRIPTION = "Test node counters", -
QUEUE = "SYS$BATCH"
ENABLE MCC ALARMS RULE NODE_STATUS_ALARM2
SHOW MCC ALARMS RULE NODE_STATUS_ALARM2 ALL STATUS
!
!
CREATE MCC ALARMS RULE NODE_STATUS_ALARM3 -
EXPRESSION = (NODE4 CLAUDI AGED PACKET LOSS > 0, AT -
START =(+00:01:00) EVERY=01:00:00 ), -
PROCEDURE = MCC$COMMON:MCC$ALARMS_NOTIFY3.COM, -
PARAMETER = "AgedPacketLoss", -
CATEGORY = "COUNTERS CHECK", -
DESCRIPTION = "Test node counters", -
QUEUE = "SYS$BATCH"
ENABLE MCC ALARMS RULE NODE_STATUS_ALARM3
SHOW MCC ALARMS RULE NODE_STATUS_ALARM3 ALL STATUS
!
!
CREATE MCC ALARMS RULE NODE_STATUS_ALARM4 -
EXPRESSION = (NODE4 CLAUDI OVERSIZED PACKET LOSS > 0, AT -
START =(+00:01:00) EVERY=01:00:00 ), -
PROCEDURE = MCC$COMMON:MCC$ALARMS_NOTIFY4.COM, -
PARAMETER = "OversizedPacketLoss", -
CATEGORY = "COUNTERS CHECK", -
DESCRIPTION = "Test node counters", -
QUEUE = "SYS$BATCH"
ENABLE MCC ALARMS RULE NODE_STATUS_ALARM4
SHOW MCC ALARMS RULE NODE_STATUS_ALARM4 ALL STATUS
!
!
CREATE MCC ALARMS RULE NODE_STATUS_ALARM5 -
EXPRESSION = (NODE4 CLAUDI PACKET FORMAT ERROR > 0, AT -
START =(+00:01:00) EVERY=01:00:00 ), -
PROCEDURE = MCC$COMMON:MCC$ALARMS_NOTIFY5.COM, -
PARAMETER = "PacketFormatError", -
CATEGORY = "COUNTERS CHECK", -
DESCRIPTION = "Test node counters", -
QUEUE = "SYS$BATCH"
ENABLE MCC ALARMS RULE NODE_STATUS_ALARM5
SHOW MCC ALARMS RULE NODE_STATUS_ALARM5 ALL STATUS
!
!
CREATE MCC ALARMS RULE NODE_STATUS_ALARM6 -
EXPRESSION = (NODE4 CLAUDI VERIFICATION REJECT > 0, AT -
START =(+00:01:00) EVERY=01:00:00 ), -
PROCEDURE = MCC$COMMON:MCC$ALARMS_NOTIFY6.COM, -
PARAMETER = "VerficationReject", -
CATEGORY = "COUNTERS CHECK", -
DESCRIPTION = "Test node counters", -
QUEUE = "SYS$BATCH"
ENABLE MCC ALARMS RULE NODE_STATUS_ALARM6
SHOW MCC ALARMS RULE NODE_STATUS_ALARM6 ALL STATUS
SHOW MCC ALARMS RULE * ALL STATUS
!
!
|
| Hi there,
This is a continuation of this discussion and it also references note
105 regarding circuit status.
Originally there were two options discussed thru the QAR system and
offline regarding alarms detecting whether a circuit was up or down.
The first was using the circuit substate and the second was using
the adjacent node of a circuit. The problem with circuit substate
was that the substate isn't returned if the circuit is running
properly, which causes an error in the alarm. The problem with the adj
node option was that alarms do not fire on error (which is the case
when the circuit is down and the adj node attribute is not there).
Also datatypes for 'name' and 'adj node' are not yet supported as
brought up in this note.
The solution for circuit status alarms has been worked out by MCC
supplying a substate if none is there (Note 105). Thanks for all
your efforts, this will certainly help MCC fit into our environment
at NETops and other support centers.
But I would like to point out an added feature the adj node option
would have given MCC net mngrs. It occurred to me this week while
using the ICON PM. I am presently creating maps with entities and
drawing lines between everything that is connected via DECnet.
These lines are only pictorial and MCC has no way of verifying this
pictorial topology. Hence, if someone beyond our realm of control
moves a circuit or changes a node, there is no way of telling other
than manually checking the adj node periodically. Worse yet, if a
change is made without us realizing, then a circuit goes down, we
are misinformed when looking at the map and end up wasting valuable
bandwidth/connectivity time searching for topology answers in order
to work the issue properly.
NETops has been using a DT homegrown tool for years now which has
been the mainstay of monitoring our nodes/circuits (even with NMCC).
It works by polling and keying off the ADJ NODES of circuits. If a
circuit goes down OR if the adj node changes, a line is graphically
changed red, notifying analysts to 'take a look'. The key is that it
informs you of circuit problems and/or forces the topology information
to be up to date. Knowledge of the topology is critical in network
management and this tool helps maintain that knowledge especially
in a large network.
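The polling idea described above can be sketched roughly as follows. This is only an illustration in Python, not the code of the DT tool: the function name `check_circuits`, the dictionary layout, and the circuit and node names in the usage note are all assumptions made up for the example. An observed adjacent node of `None` stands for a circuit that is down (no adjacency reported).

```python
from typing import Dict, List, Optional, Tuple


def check_circuits(
    expected: Dict[str, str],
    observed: Dict[str, Optional[str]],
) -> List[Tuple[str, str]]:
    """Compare the adjacency each circuit should have with what a poll returned.

    Returns one (circuit, reason) pair for every circuit that needs a look,
    i.e. every line that would be drawn red on the map.
    """
    alerts = []
    for circuit, want in expected.items():
        got = observed.get(circuit)  # None if the poll returned no adjacency
        if got is None:
            # Circuit is down: no adjacent node attribute is available.
            alerts.append((circuit, "down"))
        elif got != want:
            # Circuit is up, but the topology no longer matches the map.
            alerts.append((circuit, f"adjacent node changed: {want} -> {got}"))
    return alerts
```

For example, with a hypothetical map of `{"SYN-0": "NODEA", "SYN-1": "NODEB"}` and a poll result of `{"SYN-0": "NODEA", "SYN-1": None}`, only SYN-1 is reported, as down. A changed adjacency is reported the same way, which is what forces the map to be kept current.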
Using the following expression in an alarm rule has the same effect: it
informs you when (subtle) changes to the topology info are needed and
forces you to make them.
expression=(node4 <node> circuit syn-n adj node <adj_node> name=adj_node)
This idea becomes VERY important within distributed network management
environments (ie. EASYnet, ACTnet, DSNnet, & ext vendors also) where
the boundaries of responsibility are not so well defined and political!!
In order to successfully implement this, 'firing' alarms on error are
needed. Is this going to be supported (or somehow made an option)? This
also resolves node reachability alarm problems which I know you are
also looking into resolving as a separate issue.
...brad
|
| RE: .4
>> In order to successfully implement this, 'firing' alarms on error are
>> needed. Is this going to be supported (or somehow made an option)? This
>> also resolves node reachability alarm problems which I know you are
>> also looking into resolving as a separate issue.
>> ..brad
Brad,
Alarms has it now! I was really happy to see you "require" this functionality!!
We are adding the finishing touches to the functionality you mentioned, i.e.
alarming on error. An example rule showing the syntax is as follows:
!
CREATE MCC ALARMS RULE Sample_Rule__102 -
Expression = (NODE4 foo Connects Sent > 3, at every 00:00:45),-
Procedure = SYS$COMMON:[MCC]MCC$ALARMS_MAIL_ALARM, -
Parameter = "wakeme::ANIL", -
exception handler = SYS$COMMON:[MCC]MCC$ALARMS_MAIL_ALARM,-
Description = " => Demo of MCC ALARMS functioning"
If there is any error in accessing the entity (or, for that matter, any error
in processing the data returned by the entity), Alarms will queue the command
procedure indicated by the argument "Exception Handler".
If the error is "found" to be of a permanent nature, the rule is disabled;
otherwise we bump the error counter, save the error message in memory, and wait
for the next "scheduled" time. There was a lot of effort put into this
functionality, but I think it is well worth it!
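As a rough illustration of the error handling just described, here is a minimal sketch in Python. It is not the Alarms implementation: the names `Rule`, `EntityError`, and `evaluate`, and the `permanent` flag, are all hypothetical. How Alarms actually decides that an error is permanent is not stated here, so the sketch simply takes that decision as an input.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class EntityError(Exception):
    """Hypothetical error raised when polling or decoding entity data fails."""

    def __init__(self, message: str, permanent: bool = False):
        super().__init__(message)
        self.permanent = permanent  # assumption: the caller classifies the error


@dataclass
class Rule:
    name: str
    enabled: bool = True
    error_count: int = 0
    last_error: Optional[str] = None
    queued: List[str] = field(default_factory=list)  # procedures queued so far


def evaluate(rule: Rule, poll: Callable[[], bool],
             procedure: str = "alarm procedure",
             exception_handler: str = "exception handler") -> None:
    """Run one scheduled evaluation of a rule."""
    if not rule.enabled:
        return
    try:
        fired = poll()  # True when the rule expression is satisfied
    except EntityError as err:
        # Any error queues the exception handler procedure.
        rule.queued.append(exception_handler)
        if err.permanent:
            rule.enabled = False          # permanent error: disable the rule
        else:
            rule.error_count += 1         # transient error: count it,
            rule.last_error = str(err)    # remember the message, and
        return                            # wait for the next scheduled time
    if fired:
        rule.queued.append(procedure)
```

Under this reading, a transient failure leaves the rule enabled so it is retried at the next scheduled time, while a permanent failure takes the rule out of service after the exception handler has been queued once.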
EFT update kit will have this functionality so stay tuned!
- Anil Navkal
Alarms Team Member
|