
Conference noted::sns

Title:	POLYCENTER System Watchdog for VMS OSF/1 ULTRIX HP-UX AIX SunOS
Notice:	Wishes:406,FAQ:845,Kits-VMS:1000,UNIX:694 VMS ECO01 FT kit: 521
Moderator:	AZUR::HUREZZ

Created:	Fri May 15 1992
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1033
Total number of notes:	4584

985.0. "SNS_C_DSK message removed/added problems" by CSC32::R_RIDGWAY () Mon Jan 20 1997 13:21

T.R	Title	User	Personal Name	Date	Lines
985.1	???	AZUR::HUREZ	Connectivity & Computing Services @VBE. DTN 828-5159	`Tue Jan 21 1997 13:26`	18
985.2	Any luck on the dial in...	CSC32::R_RIDGWAY		`Tue Jan 28 1997 14:32`	11
	Hi Olivier; I'm sure you're a busy fellow, but I had sent you dial-in information for the problem system. I was wondering whether you had had time to look at the problem with the extraneous "removed/added" SNS_C_DSK event messages. Thanks; Rodger Ridgway, CSC
985.3	Consolidation problem indeed.	AZUR::HUREZ	Connectivity & Computing Services @VBE. DTN 828-5159	`Wed Jan 29 1997 12:02`	22
	Busy, you say... Indeed. I'm starting to have serious difficulty serializing things :-( I haven't logged in yet. Sorry.
	However, I could see the extraneous messages on the local cluster I'm using, given the recent hardware problems on the Alpha node connected there:
	    28-JAN 04:48 LAVA   Disk _CCOMCA$DKA300: status is mount verify timeout
	    28-JAN 04:48 LAVA   Disk _CCOMCA$DKA200: status is mount verify timeout
	    28-JAN 04:47 YIPPEE Disk _CCOMCA$DKA300: status is mount verify timeout
	    28-JAN 04:47 YIPPEE Disk _CCOMCA$DKA200: status is mount verify timeout
	The same disks are reported separately by LAVA and YIPPEE, even though LAVA, YIPPEE and CCOMCA are all members of the AZUR cluster. SYS$CLUSTER_NAME is correctly defined and the profile is OK, so there must be a bug somewhere. It is present in ECO03 as well.
	It seems that I've got one more thing to debug, and one more ECO to issue :-(
	Regards,
	-- Olivier.
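	The log above shows the consolidation failure: two members of AZUR each report the same two CCOMCA disks. The merging one would expect can be sketched as below. This is a minimal Python illustration under stated assumptions, not the Watchdog's actual code: `DiskEvent`, `consolidate` and the `cluster_of` node-to-cluster map are hypothetical names, and it assumes events are merged when cluster name, device and status all match.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class DiskEvent:
    time: str    # timestamp as printed in the log, e.g. "28-JAN 04:48"
    node: str    # reporting node, e.g. "LAVA"
    device: str  # disk name, e.g. "_CCOMCA$DKA300:"
    status: str  # e.g. "mount verify timeout"


def consolidate(events: Iterable[DiskEvent],
                cluster_of: Dict[str, str]) -> List[DiskEvent]:
    """Keep one event per (cluster, device, status), the first one seen.

    Hypothetical sketch: cluster_of maps a node name to its cluster name,
    standing in for the cluster identity each agent would report. A node
    missing from the map is treated as standalone (its own "cluster"),
    so its events are never merged with anyone else's.
    """
    seen: Dict[tuple, DiskEvent] = {}
    for ev in events:
        key = (cluster_of.get(ev.node, ev.node), ev.device, ev.status)
        if key not in seen:  # later duplicates from other members are dropped
            seen[key] = ev
    return list(seen.values())  # dicts preserve insertion order
```

	With LAVA, YIPPEE and CCOMCA mapped to AZUR, the four log entries collapse to one per disk; with an empty map every node counts as its own cluster and all four survive, which matches the behavior observed.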
985.4		CSC32::BUTTERWORTH	Gun Control is a steady hand.	`Tue Feb 04 1997 20:08`	22
	Ollie, an update on this problem, as I spoke with the customer today. It seems this happens during periods of very high CPU utilization on the agents (100% or close to it). It may be that the agent is not getting quite enough CPU time to perform its work. The customer has set the following logicals on the consolidator node:
	    SNS$DECNET_CONNECTION_TIMEOUT = "60"
	    SNS$UNR_RETRY_NUMBER = "2"
	The customer suspects that the consolidator has timed out the first attempt, and the timer for the second attempt is running when we finally get a response from the agent node. We are going to try increasing the base priority of the agent process to 6 and see whether it has any effect. If there is a hole in the code, I would suspect the agent: it thinks it has lost its connection to the consolidator, but the consolidator has not yet exhausted its retry limit. It's very weird that this only happens with disk errors.
	Regards, Dan
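	The customer's theory is a race between the consolidator's retry timer and a late agent reply. With the logicals above it can be modeled roughly as follows. This is a minimal Python sketch with assumed semantics, not documented SNS behavior: it assumes each attempt waits SNS$DECNET_CONNECTION_TIMEOUT seconds before the next one starts, and that SNS$UNR_RETRY_NUMBER counts retries after the first attempt; `attempt_window` is a hypothetical helper.

```python
from typing import Optional


def attempt_window(reply_delay: float, timeout: float,
                   retries: int) -> Optional[int]:
    """Return the index of the attempt whose timer is running when a reply
    to the first request arrives, or None if the consolidator has already
    given up.

    Assumed model: attempt 0 starts at t=0, attempt k starts at k * timeout,
    and there are 1 + retries attempts in total.
    """
    attempts = 1 + retries
    index = int(reply_delay // timeout)  # which timeout window the reply lands in
    return index if index < attempts else None
```

	With a 60-second timeout and 2 retries, a reply delayed 75 seconds by a saturated agent lands while the second attempt's timer is running (index 1), which is the situation the customer suspected; a reply after 180 seconds or more arrives after all three attempts have expired.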
985.5	There's a bug in the event consolidation part.	AZUR::HUREZ	Connectivity & Computing Services @VBE. DTN 828-5159	`Wed Feb 05 1997 08:05`	6
	It isn't a timer problem. I have located the bug in the code. It is strange that it was not reported before, as it must have been there for a while... The fix is not easy; I'm busy working on it. It will be available in ECO04. -- Olivier.