Title: | DECmcc user notes file. Does not replace IPMT. |
Notice: | Use IPMT for problems. Newsletter location in note 6187 |
Moderator: | TAEC::BEROUD |
|
Created: | Mon Aug 21 1989 |
Last Modified: | Wed Jun 04 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 6497 |
Total number of notes: | 27359 |
1108.0. "Events happen too quick maybe!" by SNOC01::MISNETWORK (They call me LAT) Fri Jun 07 1991 03:52
I have a problem that may be related to what was said in 807, but I am
not sure.
I have alarms set up as follows -
create mcc 0 alarms rule SNO-ZKO-DECnet-DOWN -
expression=(occurs(node4 SNOR04 circ zko-1 circuit down)), -
procedure=disk$userdisk:[tassone.mcc]sprnet_decnet_alarms_broadcast.com;,-
parameter="user=(tassone,engineer)",queue="MCC_ALARMS$BATCH",-
category="SNO-ZKO-DECnet-DOWN",-
description="SNOR04 Nashua DECnet circuit has gone down",-
exception handler=disk$userdisk:[tassone.mcc]SPRNET_ALARMS_BROADCAST_EXCEPTION.COM;,-
perceived severity=critical, in domain nashua
create mcc 0 alarms rule SNO-ZKO-DECnet-UP -
expression=(occurs(node4 SNOR04 circ zko-1 circuit up)), -
procedure=disk$userdisk:[tassone.mcc]sprnet_decnet_alarms_broadcast.com;,-
parameter="user=(tassone,engineer)",queue="MCC_ALARMS$BATCH",-
category="SNO-ZKO-DECnet-UP",-
description="SNOR04 Nashua DECnet circuit now back up",-
exception handler=disk$userdisk:[tassone.mcc]SPRNET_ALARMS_BROADCAST_EXCEPTION.COM;,-
perceived severity=warning, in domain nashua
Both of these alarms are enabled. Today we had a problem with this
circuit which saw the thing bounce a number of times as the following
getevnts showed -
Node4 59.823 Circuit ZKO-1
AT 7-JUN-1991 16:45:57 Any Event
Successfully received events:
Circuit down, circuit fault
Reason = circuit synchronization lost
Adjacent Node Address = 2.10
Node4 59.823 Circuit ZKO-1
AT 7-JUN-1991 16:45:57 Any Event
Successfully received events:
Circuit up
Adjacent Node Address = 2.10
and a repl/enab=net showed -
$
%%%%%%%%%%% OPCOM 7-JUN-1991 16:45:56.97 %%%%%%%%%%%
Message from user DECNET on SPRNET
DECnet event 4.7, circuit down, circuit fault
From node 59.823 (SNOR04), 7-JUN-1991 16:47:45.14
Circuit ZKO-1, Line synchronization lost, Adjacent node = 2.10
$
%%%%%%%%%%% OPCOM 7-JUN-1991 16:45:57.32 %%%%%%%%%%%
Message from user DECNET on SPRNET
DECnet event 4.10, circuit up
From node 59.823 (SNOR04), 7-JUN-1991 16:47:46.12
Circuit ZKO-1, Adjacent node = 2.10
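When comparing what the network actually reported against what a rule is waiting for, it can help to pull the event code and name out of the OPCOM text. The following is only a rough sketch in Python, not anything MCC provides: the function name `parse_opcom` and both regular expressions are made up for illustration and assume the message layout shown above.

```python
import re

# Hypothetical patterns, based only on the OPCOM layout shown in this note.
EVENT_RE = re.compile(r"DECnet event (\d+\.\d+), (.+)")
NODE_RE = re.compile(r"From node (\d+\.\d+) \((\w+)\)")


def parse_opcom(text):
    """Return one dict per 'DECnet event' line, with code, name and node name."""
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = EVENT_RE.match(line)
        if m:
            current = {"code": m.group(1), "name": m.group(2), "node": None}
            events.append(current)
            continue
        m = NODE_RE.match(line)
        if m and current is not None:
            # Attach the reporting node to the most recent event line.
            current["node"] = m.group(2)
    return events


sample = """\
DECnet event 4.7, circuit down, circuit fault
From node 59.823 (SNOR04), 7-JUN-1991 16:47:45.14
DECnet event 4.10, circuit up
From node 59.823 (SNOR04), 7-JUN-1991 16:47:46.12
"""

events = parse_opcom(sample)
for e in events:
    print(e["code"], "|", e["name"], "|", e["node"])
```

With the codes extracted, each one can be checked directly against the event a rule expression names.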
Unfortunately, only the alarm saying the circuit had come up fired; I
don't know why the circuit down alarm didn't fire. Normally it works
fine; this has only happened now that the circuit down and circuit up
occur so close together!
Is this related to what is said in the release notes -
"If too many occurrences are requested at one time it is possible that
the event manager may not be capable of processing them all."
I don't see the events lost message anywhere!
Any ideas,
Cheers,
Louis
p.s. I also had the problem of no events firing; to fix this I
disabled/enabled the local sink monitor. All worked fine after that.
T.R | Title | User | Personal Name | Date | Lines |
---|
1108.1 | events aren't lost | TOOK::CALLANDER | Jill Callander DTN 226-5316 | Sun Jun 09 1991 22:31 | 18 |
| No, it doesn't look like the too many occurrences problem, and you
are right, you would have seen the events lost message (or some other
error) telling you that the manager lost events. As to why you didn't
get it, I don't know. Were you certain that the rule was active? Did you
check to see if the exception handler had already fired, or if there was
some other reason?
I don't know, but it sure looks like it should have worked. Maybe Jim
Carey (PL for decnet AM stuff) could shed some light on how the event
listener for MCC handles getting these, or Matt on how multiple
requesters for the same event are handled. FYI -- Jim, if he is doing
what I think, the decnet AM should have a getevent from the FCL and a
getevent from alarms pending at the same time; could this cause
problems in the AM?
(just guessing)
jill
|
1108.2 | That isn't a lost event | TOOK::GUERTIN | I do this for a living -- really | Mon Jun 10 1991 09:21 | 23 |
| You only get lost events when the volume of events held in the Event
Pool fills to capacity. You never get lost events because events occur
too fast (the putter just blocks). The only time I've seen this kind
of behavior is when the request was for the "Next" event (instead of
a scope of interest), and the next request was too late for the next
occurrence.
Something like:
Getter Thread                     Putter Thread
-------------                     -------------
1) Request Any Next Event
                                  2) Event Occurs
3) Return Event to Requestor
                                  4) Event Occurs
                                     (No requests match so it is thrown away)
5) Request Any Next Event
   (Waits for Next Event)
Note that Step 5 has missed the Event in Step 4.
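The sequence above can be replayed with a toy model. This is only a sketch in Python, not MCC code: the function `run` and the timeline strings are invented for illustration, and the model assumes a one-shot "next event" request that is consumed by the first event that arrives while it is pending.

```python
def run(timeline):
    """Replay 'request' and 'event:<name>' steps in order.

    An event is delivered only if a request is pending when it occurs;
    otherwise no request matches and the event is thrown away.
    """
    pending = False
    delivered, dropped = [], []
    for step in timeline:
        if step == "request":
            pending = True
        else:
            name = step.split(":", 1)[1]
            if pending:
                delivered.append(name)
                pending = False  # the one-shot request is used up
            else:
                dropped.append(name)
    return delivered, dropped


# Steps 1 to 5 from the diagram: two events arrive back to back
# before the second request is posted.
timeline = ["request", "event:first", "event:second", "request"]
delivered, dropped = run(timeline)
print("delivered:", delivered)
print("dropped:  ", dropped)
```

In this model the second event is discarded and the final request is left waiting, which is the behavior described for the "Next" style of request.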
-Matt.
|
1108.3 | wrong event | NETCUR::WADE | Bill Wade T&N Course Development | Mon Jun 10 1991 10:41 | 7 |
| I had the same problem.
If you look at the event it is #4.7 (circuit down circuit fault)
but you are checking for event #4.8 (circuit down). Not sure what
would trigger a 4.8 event??
bill
|
1108.4 | Thats it - no mystery | SNOC02::MISNETWORK | Take a byte | Mon Jun 10 1991 23:07 | 31 |
| You found my problem, Bill. It was pretty obvious in the end; I even
documented it in a previous note, 1093.
Trouble is, how can I check on DECnet outages without having to fire
two alarms, one looking for circuit down (4.8) and one looking for
circuit down circuit fault (4.7), not to mention circuit down
operator initiated (4.9)?
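To make the mismatch concrete, here is a small sketch in Python of the difference between matching one event code and matching any of the circuit down variants. It is not MCC rule syntax and says nothing about what MCC alarms can express; the names `exact_rule_fires`, `any_down_rule_fires` and `CIRCUIT_DOWN_CODES` are hypothetical, and the code table holds only the events quoted in this topic.

```python
# Event codes and names as quoted from the OPCOM output in this topic.
DECNET_EVENTS = {
    "4.7": "circuit down, circuit fault",
    "4.8": "circuit down",
    "4.9": "circuit down, operator initiated",
    "4.10": "circuit up",
}

# Hypothetical grouping: every code that means the circuit went down.
CIRCUIT_DOWN_CODES = {"4.7", "4.8", "4.9"}


def exact_rule_fires(rule_code, event_code):
    """A rule that waits for exactly one event code."""
    return rule_code == event_code


def any_down_rule_fires(event_code):
    """A rule that accepts any of the circuit down variants."""
    return event_code in CIRCUIT_DOWN_CODES


# The rule was waiting for 4.8, but the router reported 4.7.
print(exact_rule_fires("4.8", "4.7"))
print(any_down_rule_fires("4.7"))
print(any_down_rule_fires("4.10"))
```

Whether a single alarm rule can cover all three codes is exactly the open question here; the sketch only shows the matching logic one would want.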
By the way, to get 4.8 to fire, simply turn a DECnet circuit off and
you will get event #4.9 (circuit down operator initiated) on the local
router as follows -
%%%%%%%%%%% OPCOM 11-JUN-1991 10:24:26.96 %%%%%%%%%%%
Message from user DECNET on SPRNET
DECnet event 4.9, circuit down, operator initiated
From node 59.857 (SNLR01), 11-JUN-1991 11:19:32.22
Circuit SNA-1, Line synchronization lost, Adjacent node = 59.430
(SNAR01)
and you will get event #4.8 (circuit down) from the remote router as
follows -
%%%%%%%%%%% OPCOM 11-JUN-1991 10:24:52.56 %%%%%%%%%%%
Message from user DECNET on SPRNET
DECnet event 4.8, circuit down
From node 59.430 (SNAR01), 11-JUN-1991 10:25:29.64
Circuit SNL-2, Adjacent node listener receive timeout
Adjacent node = 59.857
Cheers,
Louis
|