T.R | Title | User | Personal Name | Date | Lines |
---|
985.1 | ??? | AZUR::HUREZ | Connectivity & Computing Services @VBE. DTN 828-5159 | Tue Jan 21 1997 13:26 | 18 |
985.2 | Any luck on the dial in... | CSC32::R_RIDGWAY | | Tue Jan 28 1997 14:32 | 11 |
|
Hi Olivier;
I'm sure you're a busy fellow, but I had sent you the dial-in
details for the problem system. I was wondering if you had had
the time to look at the problem with extraneous
message removed/added SNS_C_DSK events.
Thanks;
Rodger Ridgway, csc
|
985.3 | Consolidation problem indeed. | AZUR::HUREZ | Connectivity & Computing Services @VBE. DTN 828-5159 | Wed Jan 29 1997 12:02 | 22 |
| Busy, you say... Indeed. I'm starting to have serious difficulty
serializing things :-( I haven't logged in yet. Sorry.
However, I could see the extraneous messages on the local cluster I'm
using, given the recent hardware problems I've had on the Alpha node
that is connected there...
28-JAN 04:48 LAVA Disk _CCOMCA$DKA300: status is mount verify timeout
28-JAN 04:48 LAVA Disk _CCOMCA$DKA200: status is mount verify timeout
28-JAN 04:47 YIPPEE Disk _CCOMCA$DKA300: status is mount verify timeout
28-JAN 04:47 YIPPEE Disk _CCOMCA$DKA200: status is mount verify timeout
despite LAVA, YIPPEE and CCOMCA all being members of the AZUR cluster.
The SYS$CLUSTER_NAME is correctly defined and the profile is OK, so there
must be a bug somewhere.
This is in ECO03 as well. It seems that I've got one more thing
to debug, and one more ECO to issue :-(
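To illustrate the behaviour one would expect here: since the reporting nodes belong to the same cluster, the per-node reports for a given disk should collapse into a single cluster-wide event. The following is only a sketch under that assumption, not the product's code; the `consolidate` function, the `CLUSTER_OF` mapping and the tuple layout are all hypothetical.

```python
from collections import OrderedDict

# Hypothetical node -> cluster mapping; in this thread all three nodes are in AZUR.
CLUSTER_OF = {"LAVA": "AZUR", "YIPPEE": "AZUR", "CCOMCA": "AZUR"}

# The four reports from the listing above, as (node, disk, status) tuples.
EVENTS = [
    ("LAVA", "_CCOMCA$DKA300:", "mount verify timeout"),
    ("LAVA", "_CCOMCA$DKA200:", "mount verify timeout"),
    ("YIPPEE", "_CCOMCA$DKA300:", "mount verify timeout"),
    ("YIPPEE", "_CCOMCA$DKA200:", "mount verify timeout"),
]


def consolidate(events, cluster_of):
    """Collapse per-node reports of the same disk status into one event per cluster."""
    merged = OrderedDict()
    for node, disk, status in events:
        # A node with no known cluster is treated as its own scope.
        scope = cluster_of.get(node, node)
        merged.setdefault((scope, disk, status), []).append(node)
    return merged


if __name__ == "__main__":
    for (scope, disk, status), nodes in consolidate(EVENTS, CLUSTER_OF).items():
        print(f"{scope} Disk {disk} status is {status} (seen by {', '.join(nodes)})")
```

With the four reports above this yields two events, one per disk, each attributed to the AZUR cluster rather than to LAVA and YIPPEE separately.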
Regards,
-- Olivier.
|
985.4 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Feb 04 1997 20:08 | 22 |
| Ollie,
An update on this problem as I spoke with the customer today. It
seems this is happening during periods of very high CPU utilization
on the agents (as in 100% or close to it). It may be that the agent is
not quite getting enough CPU time to perform its work. The customer
has set the following logicals on the consolidator node:
SNS$DECNET_CONNECTION_TIMEOUT = "60"
SNS$UNR_RETRY_NUMBER = "2"
The customer has speculated that the consolidator has timed-out the
first attempt and the timer for the second attempt is running when we
finally get a response from the Agent node.
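The customer's theory can be put on a timeline. The sketch below is an illustration only: it assumes the timeout is in seconds, that the attempt timers run back to back, and that the retry number counts retries after the first attempt. None of that is confirmed for these logicals, and `attempt_at` is a hypothetical helper.

```python
def attempt_at(response_delay, timeout, retries):
    """Return the 1-based attempt whose timer is running when the reply arrives.

    Returns None if every attempt has already timed out.
    Assumption: one initial attempt plus `retries` retries, timers back to back.
    """
    total_attempts = 1 + retries
    attempt = int(response_delay // timeout) + 1
    return attempt if attempt <= total_attempts else None


if __name__ == "__main__":
    # Values from the customer's settings: timeout "60", retry number "2".
    for delay in (30, 75, 200):
        print(delay, attempt_at(delay, timeout=60, retries=2))
```

Under these assumptions, a reply that takes 75 seconds from a CPU-starved agent arrives during the second attempt, which is the situation the customer describes.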
We are going to try increasing the base-priority of the Agent process
to 6 and see if it has any effect. If there is a hole in the code, I
would suspect the agent, in that it thinks it has lost its connection to the
consolidator but the consolidator has not yet exhausted its retry
limit. It's very weird that this only happens with "disk errors".
Regards,
Dan
|
985.5 | There's a bug in the event consolidation part. | AZUR::HUREZ | Connectivity & Computing Services @VBE. DTN 828-5159 | Wed Feb 05 1997 08:05 | 6 |
| It isn't a timer problem. I located the bug in the code. It is
strange it was not reported before, as it must have been there for a
while... The fix is not easy; I'm busy working on it. It will be
available in ECO04.
-- Olivier.
|