Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

5735.0. "PNM 1.3/ipreachability down, but ping gets 'alive'" by MUNICH::SCHWEMMER () Mon Nov 15 1993 10:52

    
Our customer found unexpected behaviour in PNM 1.3/TCPIP_AM X1.3.7
when testing an SNMP node for ipreachability.

Over the same period of time a PNM-test for ipreachability gave status:
'ipreachability down' while UCX>PING-command answered with 'is alive'.

The snmp-node is wan-connected over a 64kbit-line; this line obviously
isn't very good.

In the first session he did:

MCC> SET MCC 0 TCPIP_AM UDP TIME 20,UDP RETR 3,ICMP TIME 20, ICMP RETR 3
MCC> SHOW SNMP IP.BE1003 IPREACH ,AT EVERY 00:00:03

In the second session he started UCX>ping/all with default parameters,
that is, with timeout=20 seconds.

The ipreachability-test sometimes came back with ipreachability down,
whereas the ping-cmd always gave status 'is alive'.
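
As a rough illustration of what the UDP/ICMP timeout and retry settings above are expected to do, here is a minimal Python sketch of a reachability test. It is not the TCPIP_AM implementation: the `probe` callable and the helper `scripted_probe` are hypothetical, and treating "RETR 3" as one initial attempt plus three retries is an assumption.

```python
from typing import Callable, Iterable


def ip_reachable(probe: Callable[[float], bool],
                 timeout_s: float = 20.0,
                 retries: int = 3) -> bool:
    """Report 'up' as soon as one probe is answered within timeout_s.

    Assumption: 'retries' counts extra attempts after the first one,
    so at most 1 + retries probes are sent before reporting 'down'.
    """
    attempts = 1 + retries
    for _ in range(attempts):
        if probe(timeout_s):
            return True
    return False


def scripted_probe(results: Iterable[bool]) -> Callable[[float], bool]:
    """Hypothetical stand-in for a real ICMP echo: replays canned answers."""
    it = iter(results)

    def probe(timeout_s: float) -> bool:
        # Once the script is exhausted, behave like a lost packet.
        return next(it, False)

    return probe


# Two lost packets followed by a reply should still count as reachable.
print(ip_reachable(scripted_probe([False, False, True])))
```

Under this model a single lost or slow packet should never yield 'ipreachability down'; only a run of consecutive unanswered probes would.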

Because ipreachability is normally checked by alarm rules, he is
confused by the false ipreachability-down messages.

The customer uses UCX V2.0-d, but works with UCX$PING.EXE V2.0-0.

Is there anything we can do within MCC to correct this behaviour?
(Changing parameters obviously doesn't help. He already tried raising
the ICMP Timeout value, but the behaviour was the same.)

The customer doesn't want to use the IP Poller, because it only
reports a change in reachability.


Any help would be much appreciated,

Mathilde Schwemmer,
DSC Munich
T.R  Title  User  Personal Name  Date  Lines
5735.1  X????  TOOK::MINTZ  Erik Mintz  Mon Nov 15 1993 11:29  9
Surely you are kidding.  X1.3.7 is an unsupported, internal pre-field test
baselevel code.  Why on earth is your customer running it at all?

If the same behavior occurs in released code, then by all means
request a fix through the support channels.  But a report against
this kind of software would just get bounced.

-- Erik

5735.2  Occurs also with official module  MUNICH::SCHWEMMER  Tue Nov 16 1993 04:41  8
    
    The customer got TCPIP_AM 1.3.7 because the same problem occurred
    with the official module delivered with PNM 1.3.
    
    Sorry, I forgot to mention it.
    
    Mathilde.
    
5735.3  TOOK::MINTZ  Erik Mintz  Tue Nov 16 1993 06:21  6
I would definitely suggest that you escalate this through the usual
process (see note 7).  X1.3.7 is a much earlier baselevel than the
released code, and part of the purpose of the escalation process is to
ensure that such things don't happen.

-- Erik
5735.4  Clarification  BIKINI::KRAUSE  European NewProductEngineer for MCC  Tue Nov 23 1993 04:31  11
Just to clarify the matter (and cool down Erik :-) :

The MCC_TCPIP_AM image used here was built by Rahul Bose to fix a few
bugs. The link date is 4-OCT-1993, so it is fairly recent. It just happens
to have "X1.3.7" in its Component Version attribute. 

BTW: None of the fixed modules I got from engineering recently in
response to an official CLD reflected the change in 'Component Version'. 
They all show "V1.3.0". So much for the reliability of this attribute...

*Robert
5735.5  Check if the UCX results are correct before blaming DECmcc.  MOLAR::YAHEY::BOSE  Tue Nov 23 1993 10:09  14
	RE .0

	There is a known bug in UCX where the ping or the loop command
	will return a status stating that the node is alive when in
	fact it is not reachable. Can you rlogin or ftp to that node?
	Can you ping that node from an Ultrix or OSF/1 station and compare
	the results?

	I apologise for the version nos. The version nos. are defined in an
	include file used globally by all the MMs, and the system where I built 
	the executable must have had an older version.

	Rahul.
5735.6  MOLAR::YAHEY::BOSE  Tue Nov 23 1993 10:13  6
	One more thing. Try to increase the ICMP Timeout and Retry values
	and see if it makes a difference. Since you are trying to test 
	the node over a WAN, the response time would be greater.

	Rahul.
5735.7  Three to one  BIKINI::KRAUSE  European NewProductEngineer for MCC  Wed Nov 24 1993 09:30  19
Rahul,

the customer already set the ICMP Timeout to 20 and Retry to 3. It
didn't help. UCX PING/ALL sometimes shows delays in the 3 to 5 seconds 
range and every now and then a missing packet, but this shouldn't 
produce an IP Reachability = Down, especially given the high timeout and 
retry values.
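
A back-of-the-envelope check of this argument, as a Python sketch. It rests on assumptions that do not come from the TCPIP_AM code: a test reports Down only when the first attempt and every retry go unanswered, packet losses are independent, and the loss rates shown are made-up illustrative values rather than measurements from this line.

```python
def p_false_down(loss_rate: float, retries: int = 3) -> float:
    """Probability that all 1 + retries probes are lost.

    Assumes independent losses with the same per-probe loss_rate.
    """
    attempts = 1 + retries
    return loss_rate ** attempts


# Illustrative per-probe loss rates (hypothetical, not measured).
for loss in (0.05, 0.10, 0.25):
    print(f"loss={loss:.2f}  P(all 4 attempts lost)={p_false_down(loss):.6f}")
```

Losses on a poor WAN line tend to come in bursts, so the independence assumption is optimistic; even so, an occasional missing packet alone would not be expected to exhaust four attempts with a 20 second timeout each.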

The IP reachability poller, running at the same time and polling even
more frequently, never shows a Down event. Also Telnet sessions to this
node are never interrupted. So there are three voting against the AM :-)

Because of its transient nature this problem is not easy to reproduce.
But I'll have to escalate it anyway because this customer is annoyed by
false alarms. Do you have any ideas how to tackle the problem? Log bits?

Regards,

*Robert
5735.8  Problem seen at many sites in Sweden  ANTIK::WESTERBERG  Stefan Westerberg DS Stockholm  Wed Nov 24 1993 12:11  23
	Hi, this problem has been observed at many sites here in Sweden.
	
	Some things to check to keep the symptoms at a minimum level are:

	1. Check for SNMP rules that generate a lot of exceptions. Try to
	   rewrite them or remove them.

	2. Check the quality of the local LAN the PN station is connected to.
	   This is not tested, but on segments with a high error frequency the
	   IPreachability problem occurs more frequently.

	3. Increase the BYTLIM quota for UCX processes (see UCX$AUX_CONFIG.COM).

	Our general feeling is that it is UCX that causes the problem with
	IPreachability alarms. Another thing is that increasing the ICMP
	timeout and ICMP retries doesn't seem to have any effect. Sometimes
	it only seems to make it worse!

	Regards Stefan

	P.S

	We have entered a CLD on this but I don't recall the number.
5735.9  CLD info?  BIKINI::KRAUSE  European NewProductEngineer for MCC  Thu Nov 25 1993 04:54  6
Thanks Stefan! This makes me feel better. I was almost tempted to 
believe that I was seeing ghosts :-)

Could you send me the CLD info (number, sent to UCX or MCC?)

*Robert