
Conference clusta::acms

Title: ACMS comments and questions
Notice: This is not an official software support channel. Kits 5.*
Moderator: CLUSTA::HALLAN
Created: Mon Feb 17 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 4179
Total number of notes: 15091

4179.0. "network disruption and high swlup logging rates" by CSC32::J_HENSON (Don't get even, get ahead!) Thu Jun 05 1997 16:21

ACMS V4.0-2, OpenVMS V6.2, Alpha, backend only, DECnet Phase IV

Air Touch Cellular has just given me some information that I want to
pass along.  It probably should be investigated in more detail by
engineering.

They recently experienced a problem in which the EXC process literally
swamped the swlup log.  It was generating errors so fast that it
was causing the swlup logger to die about once an hour.  This had
a significant impact on their production.

I can provide extracts of the swlup log if needed.  I also have the
atr log for the same time interval.

The errors being logged in swlup all came from one particular exc
process (there are 3 apps on this node).  They were acmsexc-e-srvnotfound,
system-f-linkexit and acmsexc-e-tsker1.  The tsker1 error was reporting
that an error was occurring in an exchange step for one particular task.
The exchange step uses tdms, although I'm not sure that this is
relevant.  There are exactly 3 srvnotfound errors for every linkexit
and tsker1 error logged.
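
A ratio like the one described above can be checked against a log
extract with a short script.  The following is only a minimal sketch:
it assumes nothing about the swlup log layout beyond the message
identifiers appearing somewhere in each text line, and the function
name count_idents is made up for illustration.

```python
from collections import Counter

# Message identifiers quoted in the note above.
IDENTS = ("acmsexc-e-srvnotfound", "system-f-linkexit", "acmsexc-e-tsker1")


def count_idents(lines, idents=IDENTS):
    """Count how many lines mention each identifier (case-insensitive).

    `lines` is any iterable of strings, e.g. an open text file.
    The real log record layout is NOT assumed; this is a plain
    substring match on each line.
    """
    counts = Counter({ident: 0 for ident in idents})
    for line in lines:
        lowered = line.lower()
        for ident in idents:
            if ident in lowered:
                counts[ident] += 1
    return counts
```

Passing an open text file of the extract to count_idents returns the
per-identifier totals, which can then be compared by eye.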

To make a long story short, it was discovered that the cause of this was
a network disruption.  In this particular case, a transceiver was changed
out.  The entire disruption lasted only about a minute, but the adjacency
was down for that time.  As soon as this happened, the exc began logging
the same sequence of messages over and over.  Other than this, the
only other symptoms were that the swlup log was growing very rapidly
and the users were complaining of some slowdown.  And, when the swlup
process crashes, everything stops until it gets restarted.

To me, it appears that the exc is not properly handling this disconnect.
In this case, I think that one or more of the CPs had died, but the exc
didn't seem to pick up on that, and kept trying over and over to
send the exchange step to it.

Cycling ACMS (or probably just the application) clears up this problem.

Is this worth spr'ing?  I'm not sure how willing the customer is to
provide additional information.  Also, would 4.2 help?

Thanks,

Jerry

4179.1. "Some changes were made in ACMS V4.2 that might help" by OHMARY::HALL (Bill Hall - ACMS Engineering - ZKO2-2) Thu Jun 05 1997 18:01 (34 lines)
    
    	Jerry
    
    	There was a change made in EXC in ACMS V4.2 to correct a problem
    with the handling of some errors in EXC (it talks about TDMS but that's what
    caused the problem, the lack of TDMS on Alpha):
    
    (extracted from the V4.2 Release Notes)
    
           3.2 Logging System Errors to the SWL Log on Alpha Systems No
                Longer Causes Looping
    
                  A looping problem is fixed: error messages were repeatedly
                  logged to the SWL log.
    
                  TDMS errors were incorrectly downgraded to a facility code
                  of zero (0) on systems where TDMS is not implemented.
                  The incorrect downgrading of the facility code to zero
                  conflicted with system errors that do have a facility code
                  of zero. The result was that system errors were incorrectly
                  downgraded and the system looped trying to write errors to
                  the SWL log.
    
                  TDMS errors are no longer downgraded when TDMS is not
                  implemented on the system.
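
    Why a facility code of zero is ambiguous can be shown with the
    standard OpenVMS condition value layout.  The sketch below is not
    the EXC code, and the release notes do not say how the downgrade
    was done; the facility number 0x123 and all function names are
    made up for illustration.

```python
# Standard OpenVMS condition value layout (32 bits):
#   bits 0-2   severity
#   bits 3-15  message number
#   bits 16-27 facility number (0 is the system facility)
#   bits 28-31 control bits
SEV_MASK = 0x7
MSG_SHIFT, MSG_MASK = 3, 0x1FFF
FAC_SHIFT, FAC_MASK = 16, 0xFFF


def make_cond(facility, msg_no, severity):
    """Pack the three fields into a condition value."""
    return (((facility & FAC_MASK) << FAC_SHIFT)
            | ((msg_no & MSG_MASK) << MSG_SHIFT)
            | (severity & SEV_MASK))


def facility_of(cond):
    """Extract the facility number field."""
    return (cond >> FAC_SHIFT) & FAC_MASK


def zero_facility(cond):
    """Clear the facility field, keeping every other bit."""
    return cond & ~(FAC_MASK << FAC_SHIFT) & 0xFFFFFFFF


HYPOTHETICAL_FACILITY = 0x123  # made-up number, NOT the real TDMS facility
FATAL = 4                      # severity value for severe/fatal errors

layered = make_cond(HYPOTHETICAL_FACILITY, msg_no=1, severity=FATAL)
system_like = make_cond(0, msg_no=1, severity=FATAL)
downgraded = zero_facility(layered)

# After the facility field is cleared, the two values are identical,
# so a handler can no longer tell which facility raised the error.
print(hex(layered), hex(downgraded), downgraded == system_like)
```

    Once the facility field is cleared, a layered-product status and a
    genuine system status with the same message number and severity
    compare equal, which is the kind of conflict the release note
    describes.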
    
    
    	
    Also, ACMS V4.0-2 is not supported on OpenVMS Alpha V6.2 according to
    the SPD.
    
    
    	Bill
    
4179.2. "looks like a fit" by CSC32::J_HENSON (Don't get even, get ahead!) Fri Jun 06 1997 10:55 (17 lines)
>>   <<< Note 4179.1 by OHMARY::HALL "Bill Hall - ACMS Engineering - ZKO2-2" >>>
>>            -< Some changes were made in ACMS V4.2 that might help >-

>>    	Jerry
    
>>    	There was a change made in EXC in ACMS V4.2 to correct a problem
>>    with the handling of some errors in EXC (it talks about TDMS but that's what
>>    caused the problem, the lack of TDMS on Alpha):
    
Bill,
    
This sounds like a match.  The customer will be upgrading their system
to v4.2 soon, so this problem will probably go away when they do.

Thanks,

Jerry