Title: | ACMS comments and questions |
Notice: | This is not an official software support channel. Kits 5.* |
Moderator: | CLUSTA::HALL AN |
|
Created: | Mon Feb 17 1986 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 4179 |
Total number of notes: | 15091 |
4179.0. "network disruption and high swlup logging rates" by CSC32::J_HENSON (Don't get even, get ahead!) Thu Jun 05 1997 16:21
ACMS V4.0-2, OpenVMS V6.2, Alpha, backend only, DECnet Phase IV
Air Touch Cellular has just given me some information that I want to
pass along. It probably should be investigated in more detail by
engineering.
They recently experienced a problem in which the EXC process literally
swamped the swlup log. It was generating errors so fast that it
was causing the swlup logger to die about once an hour. This had
a significant impact on their production.
I can provide extracts of the swlup log if needed. I also have the
atr log for the same time interval.
The errors being logged in swlup were all from one particular exc
process (there are 3 apps on this node). There were acmsexc-e-srvnotfound,
system-f-linkexit and acmsexc-e-tsker1. The tsker1 error was reporting
that an error was occurring in an exchange step for one particular task.
The exchange step uses tdms, although I'm not sure that this is
relevant. There are exactly 3 srvnotfound errors for every linkexit
and tsker1 error logged.
To make a long story short, it was discovered that the cause of this was
a network disruption. In this particular case, a transceiver was changed
out. The entire disruption lasted for about a minute, but the adjacency
was down for that time. As soon as this happened, the exc began logging
the same sequence of messages over and over. Other than this, the
only other symptoms were that the swlup log was growing very rapidly,
and the users were complaining of some slowdown. And, when the swlup
process crashes, everything stops until it gets restarted.
To me, it appears that the exc is not properly handling this disconnect.
In this case, I think that one or more of the CPs had died, but the exc
didn't seem to pick up on that, and kept trying over and over to
send the exchange step to a CP that was no longer there.
Cycling ACMS (or probably just the application) clears up this problem.
Is this worth spr'ing? I'm not sure how willing the customer is to
provide additional information. Also, would 4.2 help?
Thanks,
Jerry
T.R | Title | User | Personal Name | Date | Lines |
---|
4179.1 | Some changes were made in ACMS V4.2 that might help | OHMARY::HALL | Bill Hall - ACMS Engineering - ZKO2-2 | Thu Jun 05 1997 18:01 | 34 |
|
Jerry
There was a change made in EXC in ACMS V4.2 to correct a problem
with the handling of some errors in EXC (it talks about TDMS but that's what
caused the problem, the lack of TDMS on Alpha):
(extracted from the V4.2 Release Notes)
3.2 Logging System Errors to the SWL Log on Alpha Systems No
Longer Causes Looping
A looping problem is fixed: error messages were repeatedly
logged to the SWL log.
TDMS errors were incorrectly downgraded to a facility code
of zero (0) on systems where TDMS is not implemented.
The incorrect downgrading of the facility code to zero
conflicted with system errors that do have a facility code
of zero. The result was that system errors were incorrectly
downgraded and the system looped trying to write errors to
the SWL log.
TDMS errors are no longer downgraded when TDMS is not
implemented on the system.
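To illustrate the collision the release note describes, here is a small sketch based on the standard OpenVMS condition value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27). It is not ACMS code. The helper names and the facility number 0x123 are hypothetical, and 0x123 is not the real TDMS facility number.

```python
# Field layout of a 32-bit OpenVMS condition value:
#   bits 0-2   severity (0=warning, 1=success, 2=error, 3=info, 4=fatal)
#   bits 3-15  message number
#   bits 16-27 facility number (0 is the system facility)

def make_cond(facility, message_number, severity):
    """Pack the three fields into a condition value (control bits left at 0)."""
    return ((facility & 0xFFF) << 16) | ((message_number & 0x1FFF) << 3) | (severity & 0x7)

def facility(cond):
    return (cond >> 16) & 0xFFF

def message_number(cond):
    return (cond >> 3) & 0x1FFF

def severity(cond):
    return cond & 0x7

def downgrade_facility(cond):
    """Force the facility field to zero, as the release note says was done."""
    return cond & ~(0xFFF << 16) & 0xFFFFFFFF

if __name__ == "__main__":
    other_error = make_cond(0x123, 5, 2)   # hypothetical non-system facility
    system_error = make_cond(0, 5, 2)      # system facility, same message number
    downgraded = downgrade_facility(other_error)
    # Once the facility is zeroed, the value can no longer be told apart
    # from a genuine system error with the same message number and severity.
    print(hex(other_error), hex(downgraded), downgraded == system_error)
```

The point of the sketch is only that a value whose facility field has been zeroed is indistinguishable from a real facility-zero (system) error, which is the conflict the release note attributes the looping to.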
Also, ACMS V4.0-2 is not supported on OpenVMS Alpha V6.2 according to
the SPD.
Bill
|
4179.2 | looks like a fit | CSC32::J_HENSON | Don't get even, get ahead! | Fri Jun 06 1997 10:55 | 17 |
| >> <<< Note 4179.1 by OHMARY::HALL "Bill Hall - ACMS Engineering - ZKO2-2" >>>
>> -< Some changes were made in ACMS V4.2 that might help >-
>> Jerry
>> There was a change made in EXC in ACMS V4.2 to correct a problem
>> with the handling of some errors in EXC (it talks about TDMS but that's what
>> caused the problem, the lack of TDMS on Alpha):
Bill,
This sounds like a match. The customer will be upgrading their system
to v4.2 soon, so this problem will probably go away when they do.
Thanks,
Jerry
|