T.R | Title | User | Personal Name | Date | Lines |
---|
6044.1 | Use IP Reachability Poller | BIKINI::KRAUSE | CSC Network Management/Hubs | Tue Jul 12 1994 12:51 | 26 |
| >One of the alarms polls the snmp entities checking for CHANGE OF IPReachability
>changes from whatever state it is to UP, clearing (getting the icon blue) any
>alarm condition it could have before.
>
>The second alarm polls the snmp entities checking for CHANGE OF IPReachability
>changes from whatever state it is to DOWN, firing with some critical condition.
Possibly a dumb question, but why on earth don't you use the IP
Reachability Poller??? It has been designed to do just this and it
works well.
>I wouldn't expect, because of the configuration of the alarm, two consecutive
>firing staying IPREACHABILITY DOWN, as they are happening. The system must
>detect that the system became reachable before it can fire an alarm saying
>that it got unreachable again.
>
>The system says that IPREACHABILITY is UP several times before saying that
>the SNMP node got unreachable again.
Careful, lad! You are running two independent alarm rules. Those two
rules are not synchronized in any way. If you have several 'downs'
between samples this could explain what you see. Again: use the IP
Poller. If you need to act on events (mail, etc.) write occurs rules for
the IP Poller events (down/up).
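To illustrate the point about unsynchronized rules, here is a minimal simulation sketch. It is not MCC code: the node outage windows, the 60-second sample interval, and the helper names (run_rule, state_at) are all made up for the illustration. Each rule only compares its own consecutive samples, so the 'up' rule can miss short outages entirely while the 'down' rule catches them.

```python
from typing import Callable, List, Tuple


def run_rule(target: str, sample_times: List[int],
             state_at: Callable[[int], str]) -> List[Tuple[int, str]]:
    """Fire whenever this rule's own consecutive samples change to `target`."""
    firings = []
    previous = None
    for t in sample_times:
        current = state_at(t)
        if previous is not None and current == target and previous != target:
            firings.append((t, target))
        previous = current
    return firings


def state_at(t: int) -> str:
    # Hypothetical node: down during [50, 70) and [170, 190), otherwise up.
    return "DOWN" if 50 <= t < 70 or 170 <= t < 190 else "UP"


# The two rules sample every 60 seconds, but 30 seconds out of phase.
down_rule = run_rule("DOWN", list(range(0, 181, 60)), state_at)
up_rule = run_rule("UP", list(range(30, 181, 60)), state_at)

timeline = sorted(down_rule + up_rule)
print(timeline)  # [(60, 'DOWN'), (180, 'DOWN')]
```

In this made-up case the 'up' rule never sees the node down, so it never fires, and the 'down' rule fires twice in a row. Shift the phases and you can get the opposite pattern.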
*Robert
|
6044.2 | | CSC32::M_EVANS | skewered shitake | Wed Jul 13 1994 17:00 | 10 |
| Having worked with Richard (and a host of customers who don't want to
use the IP Reachability Poller), I can see what the problem appears to
be. It appears that the UDP timeouts and retries don't work and that
we are left at the default 20 seconds from the IP transport, at least
with UCX.
Out of curiosity, how do we call "ping" from the alarm, if this is what
we are doing?
meg
|
6044.3 | IP poller is per domain based | VNZV01::GONZALEZ | | Thu Jul 14 1994 12:37 | 16 |
|
Hi, again
re:.1
I realize there might be a synchronization problem here, but I
need to change the color of the icons according to the severity of the
fault, and I don't know of another way to do it besides having
two different alarms. If you know a way, please let me know.
The reason for not using the IP POLLER is that, as far as I know, I have
to enable it for each domain once I get into IMPM. I won't do it and,
more important, the customer won't like that. Again, if you know another
way to do that, let me know.
Richard
|
6044.4 | MCC DOESN'T RESPECT MY TIMEOUTS | VNZV01::GONZALEZ | | Thu Jul 14 1994 12:54 | 35 |
|
Hi,
re: .2
I went to the customer with my super IRIS (thanks Mitch and Chris for it)
to observe and collect some information about the pings in the network.
I realized some interesting things there.
First, even though I set the ICMP Timeout to 10 (seconds, according to
the documentation), MCC doesn't wait that long to declare the packet
lost. Actually, after some 10 to 15 msecs (milliseconds) it sends
the retries, and it doesn't wait for the retry timeout either.
Even more, I set the ICMP Timeout to 59 in MCC FCL and tried
'SHOW SNMP .SNMP.IPHOSTS ALL STATUS' (in the same session, of course)
against one node I knew was not reachable, and it came back after 5 or 6
or even 2 seconds saying the IPHOSTS node was not accessible.
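For comparison, the behaviour one would expect from a settable timeout can be sketched as below. This is only an illustration of the expectation, not how MCC or UCX implement it; the function names and the retry count of 2 are assumptions (the retry setting is not given here).

```python
from typing import Callable, Tuple


def poll(send_and_wait: Callable[[float], bool],
         timeout_s: float, retries: int) -> Tuple[bool, int]:
    """Try once plus `retries` more times; each attempt may wait up to timeout_s.

    send_and_wait(timeout_s) is a hypothetical callback that returns True
    if a reply arrived within timeout_s seconds.
    """
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        if send_and_wait(timeout_s):
            return True, attempts
    return False, attempts


def min_seconds_before_unreachable(timeout_s: float, retries: int) -> float:
    """Shortest time a silent node should take to be declared unreachable."""
    return timeout_s * (1 + retries)


# A node that never answers, with the timeout of 59 and an assumed 2 retries.
reachable, attempts = poll(lambda timeout_s: False, timeout_s=59, retries=2)
print(reachable, attempts, min_seconds_before_unreachable(59, 2))  # False 3 177
```

Even with no retries at all, a silent node should take at least 59 seconds to be reported, not 2 to 6.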
I don't think, and I don't want to believe, that this is the way it is
supposed to work.
We (Digital) say that we have a settable timeout.
Can someone explain this behavior to me and to customers?
What can be wrong ?
Thanks in advance for replies to this topic.
Richard
PS: I have the Iris data available, if someone wants to look at it.
|
6044.5 | | CSC32::M_EVANS | skewered shitake | Thu Jul 14 1994 13:32 | 6 |
| Richard,
Can you send me that information, I will add it to your IPMT case.
You are confirming something I have suspected for a while.
meg
|
6044.6 | DATA LOCATION | VNZV01::GONZALEZ | | Thu Jul 14 1994 16:16 | 23 |
|
Hi Meg,
When I used Iris I only had a floppy with me, so I saved just 45K of
information.
That information is available at : VNZV01::PING.DAT.
If you have problems accessing this file, let me know.
I'm going to the customer now. I can probably have more data by the
end of the day. I will make it available as VNZV01::PING1.DAT by
tomorrow morning.
This data is IRIS data, which I think you can load with other protocol
analyzers, but I don't know how. Anyway, IRIS is more than enough.
I hope this is useful.
Thanks.
Richard
|
6044.7 | 'Known Problem' | BIKINI::KRAUSE | CSC Network Management/Hubs | Fri Jul 15 1994 04:39 | 17 |
| I've already opened an IPMT for the timeout problem (MGO100555) some
time ago. Meanwhile a slightly different problem has been fixed (now
ignores redirect messages and other 'noise'). I already sent traces, but
maybe your additional traces could help engineering to find the bug.
The IP Poller works ok - that's why I suggested using it instead. You
should be able to start the poller from a batch job to have it running
all the time. But then you won't get the current status on the screen
when you later start the IM. A viable solution (or should I say
workaround?) is to run the two alarm rules once after starting the IM.
This could be done in batch as well, e.g. have a command procedure that
submits a job to run the rules /after=+00:02 and then starts the IM.
I wish there were a command to tell the IP Poller to send a complete
status and this command could be coupled with IM startup. Am I dreaming?
*Robert
|
6044.8 | MCC, UCX, IRIS, NEW DATA | VNZV01::GONZALEZ | | Fri Jul 15 1994 21:16 | 39 |
|
Hi fellows,
As I told you, I went to the customer and this is what I found:
If I disable all the alarms and enable just a couple of them which
control a device in the same LAN, MCC respects the timeout, but when
all the alarms are enabled and running it doesn't.
I couldn't run any tests in the WAN because all the hosts were OK.
In the file VNZV01::PING1.DAT I have put IRIS filtered data that refers
just to one device in the WAN (name 8230_POZ1=00-80-21-00-94-d1).
BTW, I have also put the file dwnodes.ini where I have defined the
names with their hardware address.
Anyone who looks at that file with Iris will see that the alarms were
polling every 1 minute as expected (if I remember correctly; anyway,
it's easy to see), and since I have two alarms checking for ipreachability,
you will see that the pings are sent in groups of two.
But there are some places where you'll see MCC (I don't have the address,
but it is 1.110) sending up to 3 packets without waiting for the 20 seconds
timeout I had set before. Observe that all the pings got their answer,
but it seems that when the answer arrived MCC was no longer waiting for
it, and MCC actually fired the alarm.
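If someone wants to check a trace for this pattern without eyeballing it, something along these lines might help. It is only a sketch: it assumes the echo request timestamps for one device have already been exported from the analyzer as plain seconds (I don't know of a ready-made export for this), and the sample values below are invented, not taken from PING1.DAT.

```python
from typing import List


def bursts(request_times: List[float], timeout_s: float) -> List[List[float]]:
    """Group requests whose gap to the previous request is under timeout_s."""
    groups: List[List[float]] = []
    for t in sorted(request_times):
        if groups and t - groups[-1][-1] < timeout_s:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def suspicious(request_times: List[float], timeout_s: float = 20.0,
               expected_per_poll: int = 2) -> List[List[float]]:
    """Bursts with more requests than the number of alarm rules polling."""
    return [g for g in bursts(request_times, timeout_s)
            if len(g) > expected_per_poll]


# Invented timestamps: two rules polling every minute, one early retry at 61.0.
sample = [0.0, 0.5, 60.0, 60.5, 61.0, 120.0, 120.5]
print(suspicious(sample))  # [[60.0, 60.5, 61.0]]
```

With two rules, groups of two are normal; a third request inside the 20 second window suggests a retry that did not wait for the timeout.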
I couldn't reproduce this again, but I believe I saw that, some seconds
after disabling all the alarms, Iris was still seeing ping packets in
the network.
Is it possible that we have a synchronization problem between MCC
and UCX ?
Is it possible that MCC respects the timeouts but UCX is very busy
doing other things (btw, I increased the number of sockets before)?
Thank you very much for the support with this problem.
Richard
|