Title: | DECWINDOWS 26-JAN-89 to 29-NOV-90 |
Notice: | See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit |
Moderator: | STAR::VATNE |
Created: | Mon Oct 30 1989 |
Last Modified: | Mon Dec 31 1990 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 3726 |
Total number of notes: | 19516 |
Hi,

Down here we have a 25-node MI cluster with three main CI nodes: a 6320, a 6420 and an 8350. The 6000 machines act as boot/file/X-client servers. Most people run their applications locally on their workstation, but for MAIL, NOTES etc. they start the applications as X clients on one of the 6000 machines. The X-client applications run at base priority 4 in BATCH (interactive sessions also run at base priority 4) via a generic queue. Some X-client applications are started from interactive sessions.

We are connected to an Ethernet backbone that runs through one building, goes through a DEREP to the other building, then into a LANbridge 100 and on to the second segment.

From time to time, the X display server loses its connection with the X-client application. This happens with stations on the same Ethernet segment as the CI node as well as with stations on other segments, and it happens over both the TCP/IP and DECnet X transports.

Where can we look to see what is happening? Are there programmable timers built into the X11 protocol?

I had a look at SYS$MANAGER:DECW$SERVER_0_OUTPUT.LOG and SYS$MANAGER:DECW$SERVER_0_ERROR.LOG. The only messages that look remarkable to me are the following:

26-JUN-1990 14:22:24.7 Client 1 resets the server
26-JUN-1990 13:24:57.1 Client 3 resets the server
. . .

Thanks in advance for any hints.
The included version of ERROR.LOG surely had a lost x-client session

{BRH555} $ type sys$manager:decw$server_0_error.log
26-JUN-1990 10:08:56.0 Hello, this is the X server
Dixmain address=13074
Now attach all known txport images
%DECW-I-ATTACHED, transport DECNET attached to its network
%DECW-W-ATT_FAIL, failed to attach transport LAT
-SYSTEM-F-PRIVINSTALL, shareable images must be installed to run privileged image
in SetFontPath
Connection 99700 is accepted by Txport
out SetFontPath
GPX color/monochrome support loaded
gpx$InitOutput address=12a410
Connection Prefix: len == 42
26-JUN-1990 10:09:37.5 Now I call scheduler/dispatcher
26-JUN-1990 10:09:38.6 Connection 99738 is accepted by Txport
26-JUN-1990 10:09:41.7 Connection 99700 is closed by Txport
26-JUN-1990 10:21:49.7 Connection 99738 is closed by Txport
26-JUN-1990 10:21:51.8 Connection 99700 is accepted by Txport
26-JUN-1990 10:21:55.1 Connection 99770 is accepted by Txport
26-JUN-1990 10:22:04.1 Connection 99738 is accepted by Txport
26-JUN-1990 10:22:04.7 Connection 9adf8 is accepted by Txport
26-JUN-1990 10:22:09.2 Connection 9ae30 is accepted by Txport
26-JUN-1990 11:39:39.0 Using extra todo packet pool...
26-JUN-1990 13:24:56.3 Connection 99738 is closed by Txport 26-JUN-1990 13:24:56.6 Connection 9adf8 is closed by Txport 26-JUN-1990 13:24:57.1 Client 3 resets the server 26-JUN-1990 13:24:58.3 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 13:25:03.0 Now I call scheduler/dispatcher 26-JUN-1990 13:25:04.7 Connection 286c70 is accepted by Txport 26-JUN-1990 13:25:05.3 Connection 286ca8 is accepted by Txport 26-JUN-1990 13:25:08.7 Connection 286c70 is closed by Txport 26-JUN-1990 13:32:20.6 Connection 286ca8 is closed by Txport 26-JUN-1990 13:32:23.8 Connection 286c70 is accepted by Txport 26-JUN-1990 13:32:33.2 Connection 286ca8 is accepted by Txport 26-JUN-1990 13:32:33.7 Connection 286ce0 is accepted by Txport 26-JUN-1990 13:32:37.5 Connection 294718 is accepted by Txport 26-JUN-1990 13:33:19.5 Connection 294718 is closed by Txport 26-JUN-1990 13:33:42.5 Connection 286d18 is accepted by Txport 26-JUN-1990 13:34:24.1 Connection 286d18 is closed by Txport 26-JUN-1990 13:34:30.1 Connection 286ca8 is closed by Txport 26-JUN-1990 13:34:30.3 Connection 286ce0 is closed by Txport 26-JUN-1990 13:34:30.7 Client 1 resets the server 26-JUN-1990 13:34:31.1 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 10:22:09.2 Connection 9ae30 is accepted by Txport 26-JUN-1990 13:34:33.0 Now I call scheduler/dispatcher 26-JUN-1990 13:34:34.7 Connection 286c70 is accepted by Txport 26-JUN-1990 13:34:35.4 Connection 286ca8 is accepted by Txport 26-JUN-1990 13:34:37.6 Connection 286c70 is closed by Txport 26-JUN-1990 13:34:49.7 Connection 286ca8 is closed by Txport 26-JUN-1990 13:34:52.3 Connection 286c70 is accepted by Txport 26-JUN-1990 13:35:01.0 Connection 286ca8 is accepted by Txport 26-JUN-1990 13:35:01.6 Connection 286ce0 is accepted by Txport 26-JUN-1990 13:35:06.2 Connection 294750 is accepted by Txport 26-JUN-1990 13:58:59.8 Connection 294750 is closed by Txport 26-JUN-1990 13:59:04.0 Connection 286ca8 is closed by Txport 
26-JUN-1990 13:59:04.1 Connection 286ce0 is closed by Txport 26-JUN-1990 13:59:04.6 Client 1 resets the server 26-JUN-1990 13:59:05.4 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 13:59:07.2 Now I call scheduler/dispatcher 26-JUN-1990 13:59:09.5 Connection 286c70 is accepted by Txport 26-JUN-1990 13:59:10.0 Connection 286ca8 is accepted by Txport 26-JUN-1990 13:59:11.7 Connection 286c70 is closed by Txport 26-JUN-1990 14:04:01.0 Connection 286ca8 is closed by Txport 26-JUN-1990 14:04:04.7 Connection 286c70 is accepted by Txport 26-JUN-1990 14:04:13.7 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:12:04.7 Connection 294788 is accepted by Txport 26-JUN-1990 14:12:41.1 Connection 294788 is closed by Txport 26-JUN-1990 14:16:24.8 Connection 286d18 is closed by Txport 26-JUN-1990 14:16:30.0 Connection 286ce0 is closed by Txport 26-JUN-1990 14:16:30.2 Connection 286ca8 is closed by Txport 26-JUN-1990 14:16:30.3 Txport status = 2dba002 ( copy and write) on client 2 26-JUN-1990 14:16:30.9 Client 1 resets the server 26-JUN-1990 14:16:31.4 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 14:16:33.3 Now I call scheduler/dispatcher 26-JUN-1990 14:16:34.7 Connection 286c70 is accepted by Txport 26-JUN-1990 14:16:35.4 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:16:37.0 Connection 286c70 is closed by Txport 26-JUN-1990 14:16:53.9 Connection 286ca8 is closed by Txport 26-JUN-1990 14:16:56.5 Connection 286c70 is accepted by Txport 26-JUN-1990 14:17:05.4 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:17:05.8 Connection 286ce0 is accepted by Txport 26-JUN-1990 14:17:11.5 Connection 2947c0 is accepted by Txport 26-JUN-1990 14:20:05.5 Connection 2446c0 is accepted by Txport 26-JUN-1990 14:22:02.2 Co 26-JUN-1990 14:22:25.1 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 14:22:27.1 Now I call scheduler/dispatcher 26-JUN-1990 14:22:28.8 Connection 286c70 is accepted by 
Txport 26-JUN-1990 14:22:29.4 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:22:33.0 Connection 286c70 is closed by Txport 26-JUN-1990 14:23:46.5 Connection 286ca8 is closed by Txport 26-JUN-1990 14:23:49.2 Connection 286c70 is accepted by Txport 26-JUN-1990 14:23:59.3 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:23:59.7 Connection 286ce0 is accepted by Txport 26-JUN-1990 14:24:04.8 Connection 286d18 is accepted by Txport 26-JUN-1990 14:25:20.0 Connection 286d18 is closed by Txport 26-JUN-1990 14:25:28.1 Connection 286d18 is accepted by Txport 26-JUN-1990 14:26:03.1 Connection 286d18 is closed by Txport 26-JUN-1990 14:26:49.6 Connection 286d18 is accepted by Txport 26-JUN-1990 14:29:57.5 Connection 286d18 is closed by Txport 26-JUN-1990 14:30:01.5 Connection 286ce0 is closed by Txport 26-JUN-1990 14:30:01.7 Connection 286ca8 is closed by Txport 26-JUN-1990 14:30:02.0 Client 1 resets the server 26-JUN-1990 14:30:02.5 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 14:30:04.9 Now I call scheduler/dispatcher 26-JUN-1990 14:30:06.3 Connection 286c70 is accepted by Txport 26-JUN-1990 14:30:06.8 Connection 286ca8 is accepted by Txport nnection 2446c0 is closed by Txport 26-JUN-1990 14:22:18.9 Connection 2947c0 is closed by Txport 26-JUN-1990 14:22:24.2 Connection 286ca8 is closed by Txport 26-JUN-1990 14:22:24.3 Connection 286ce0 is closed by Txport 26-JUN-1990 14:22:24.7 Client 1 resets the server 26-JUN-1990 14:04:14.3 Connection 286ce0 is accepted by Txport 26-JUN-1990 14:04:18.0 Connection 286d18 is accepted by Txport 26-JUN-1990 14:30:06.8 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:30:08.5 Connection 286c70 is closed by Txport 26-JUN-1990 14:30:23.3 Connection 286ca8 is closed by Txport 26-JUN-1990 14:30:25.6 Connection 286c70 is accepted by Txport 26-JUN-1990 14:30:29.5 Connection 286ce0 is accepted by Txport 26-JUN-1990 14:30:40.5 Connection 286ca8 is accepted by Txport 26-JUN-1990 14:30:40.9 Connection 
286d18 is accepted by Txport 26-JUN-1990 14:30:45.5 Connection 23b848 is accepted by Txport 26-JUN-1990 17:36:07.5 Connection 2d6818 is accepted by Txport 26-JUN-1990 17:37:05.2 Connection 286d18 is closed by Txport 26-JUN-1990 17:37:05.4 Connection 286ca8 is closed by Txport 26-JUN-1990 17:37:05.7 Client 3 resets the server 26-JUN-1990 17:37:07.5 GPX color/monochrome support loaded gpx$InitOutput address=12a410 26-JUN-1990 17:37:12.1 Now I call scheduler/dispatcher 26-JUN-1990 17:37:13.7 Connection 2d6c18 is accepted by Txport 26-JUN-1990 17:37:14.2 Connection 2d6c50 is accepted by Txport 26-JUN-1990 17:37:15.7 Connection 2d6c18 is closed by Tx 27-JUN-1990 09:24:28.4 Connection 2e47a0 is accepted by Txport 27-JUN-1990 16:11:24.3 Connection 2446f8 is accepted by Txport 27-JUN-1990 18:19:21.9 Connection 2d6c88 is closed by Txport 27-JUN-1990 18:19:22.1 Connection 2d6c50 is closed by Txport 27-JUN-1990 18:19:22.8 Client 3 resets the server 27-JUN-1990 18:19:24.8 GPX color/monochrome support loaded gpx$InitOutput address=12a410 27-JUN-1990 18:19:30.2 Now I call scheduler/dispatcher 27-JUN-1990 18:19:32.1 Connection 355a18 is accepted by Txport 27-JUN-1990 18:19:32.3 Connection 355a50 is accepted by Txport 27-JUN-1990 18:19:35.7 Connection 355a18 is closed by Txport 27-JUN-1990 18:19:57.7 Connection 355a50 is closed by Txport 27-JUN-1990 18:19:59.8 Connection 355a18 is accepted by Txport 27-JUN-1990 18:20:03.5 Connection 35eab0 is accepted by Txport 27-JUN-1990 18:20:14.3 Connection 355a50 is accepted by Txport 27-JUN-1990 18:20:14.8 Connection 355a88 is accepted by Txport 27-JUN-1990 18:20:21.3 Connection 363530 is accepted by Txport 27-JUN-1990 18:28:46.7 Connection 363568 is accepted by Txport 27-JUN-1990 18:28:47.9 Connection 363568 is closed by Txport (status = 20e4) 27-JUN-1990 18:30:00.2 Connection 355ac0 is accepted by Txport 27-JUN-1990 18:30:01.2 Connection 355ac0 is closed by Txport (status = 20e4) 27-JUN-1990 18:31:43.1 Connection 355a50 is closed by Txport 
27-JUN-1990 18:31:43.3 Connection 355a88 is closed by Txport 27-JUN-1990 18:31:43.9 Client 3 resets the server 27-JUN-1990 18:31:45.2 GPX color/monochrome support loaded port 27-JUN-1990 09:16:23.3 Connection 2d6c50 is closed by Txport 27-JUN-1990 09:16:25.4 Connection 2d6c18 is accepted by Txport 27-JUN-1990 09:16:30.7 Connection 2dfce8 is accepted by Txport 27-JUN-1990 18:31:45.2 GPX color/monochrome support loaded gpx$InitOutput address=12a410 27-JUN-1990 18:31:49.4 Now I call scheduler/dispatcher 27-JUN-1990 18:31:51.3 Connection 355a18 is accepted by Txport 27-JUN-1990 18:31:52.0 Connection 355a50 is accepted by Txport 27-JUN-1990 18:31:54.7 Connection 355a18 is closed by Txport 28-JUN-1990 09:02:35.3 Connection 355a50 is closed by Txport 28-JUN-1990 09:02:37.6 Connection 355a18 is accepted by Txport 28-JUN-1990 09:02:41.3 Connection 355a88 is accepted by Txport 28-JUN-1990 09:02:52.5 Connection 355a50 is accepted by Txport 28-JUN-1990 09:02:52.7 Connection 355ac0 is accepted by Txport 28-JUN-1990 09:02:56.2 Connection 355af8 is accepted by Txport 28-JUN-1990 09:11:50.1 Connection 3635a0 is accepted by Txport 28-JUN-1990 09:12:16.2 Connection 244730 is accepted by Txport 28-JUN-1990 15:36:21.4 Connection 244768 is accepted by Txport 28-JUN-1990 15:37:55.8 Connection 244768 is closed by Txport (status = 20e4) 28-JUN-1990 16:13:17.4 Connection 2444a8 is accepted by Txport {BRH555} $ 27-JUN-1990 09:16:41.3 Connection 2d6c50 is accepted by Txport 27-JUN-1990 09:16:42.0 Connection 2d6c88 is accepted by Txport 27-JUN-1990 09:16:46.6 Connection 2e4768 is accepted by Txport 27-JUN-1990 09:24:28.4 Connection 2e47a0 is accepted by Txport 27-JUN-1990 16:11:24.3 Connection 2446f8 is accepted by Txport
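With a log this long, it is easy to miss the few entries that matter. The following is a minimal sketch, not part of the original note, of how the server log could be scanned for server resets and for connections closed with an explicit status. It assumes the entry layout seen above (a `DD-MMM-YYYY HH:MM:SS.t` timestamp followed by the message) and only matches the `(status = ...)` form of close message; the function name `summarize` is hypothetical.

```python
import re

# One entry = a VMS-style timestamp plus the message text up to the next
# timestamp (or end of input). This works on wrapped or flattened logs too.
ENTRY = re.compile(
    r"(\d{1,2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d)\s+(.*?)"
    r"(?=\s*\d{1,2}-[A-Z]{3}-\d{4} |\Z)",
    re.S,
)

# Only closes that report a status, e.g. "... closed by Txport (status = 20e4)".
CLOSED_WITH_STATUS = re.compile(
    r"Connection (\w+) is closed by Txport \(status = ([0-9a-fA-F]+)\)"
)


def summarize(log_text):
    """Return (number of server resets, list of closes that carry a status)."""
    resets = 0
    abnormal = []
    for stamp, message in ENTRY.findall(log_text):
        if "resets the server" in message:
            resets += 1
        match = CLOSED_WITH_STATUS.search(message)
        if match:
            # Keep the timestamp, connection id, and status as an integer.
            abnormal.append((stamp, match.group(1), int(match.group(2), 16)))
    return resets, abnormal
```

Feeding it the text of DECW$SERVER_0_ERROR.LOG would give the reset count and a short list of the closes that reported a status, which are the entries worth correlating with user complaints.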
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
3011.1 | Some explanations, no answer | STAR::VATNE | Peter Vatne, VMS Development | Thu Jun 28 1990 14:55 | 22 |
We've seen lost connections in the past. However, we have no good way of identifying the cause yet. The messages: > 26-JUN-1990 14:22:24.7 Client 1 resets the server > 26-JUN-1990 13:24:57.1 Client 3 resets the server are not unusual. They just mean that you said "quit" to the session manager. What is unusual are messages such as: 28-JUN-1990 15:37:55.8 Connection 244768 is closed by Txport (status = 20e4) If you type "exit %x20e4" to DCL, you will see the message: %SYSTEM-F-LINKABORT, network partner aborted logical link This means that something unusual happened in your network that caused DECnet to shut down your X connection. There are no X11 protocol timers involved. It has everything to do with DECnet timers and conditions. I'm sorry, but I don't have any suggestions for what to look for. | |||||
3011.2 | Connections are always lost on same CI-node. | KETJE::STEUKERS | Fri Jun 29 1990 06:52 | 19 | |
Peter, Thanks for your reply. I'll crosspost the note into the DECNETVAX notesfile. In my opinion it must be an Ethernet/Ethernet-controller related problem, since the faults happen within members of the same cluster, between standalone stations (VMS and Ultrix) and the cluster, over transport layer DECnet as well as TCP/IP. Or it must be a CI-node problem, since the lost connections always happen on the same 6000 machine. I'll survey the users to see if they also lose LAT or CTERM sessions on that CI-node. STEUKERS Erik. | |||||
3011.3 | DECWIN::JMSYNGE | James M Synge, VMS Development | Fri Jun 29 1990 17:41 | 6 | |
Erik, Is the VAX 6000 from which you are losing connections also a DECnet router? James | |||||
3011.4 | Yes, client side is a ROUTER | KETJE::STEUKERS | Thu Nov 15 1990 06:09 | 59 | |
James, We still have the problem down here. The client side that breaks the connections is indeed a ROUTER IV node. The executor characteristics are the following: Node Volatile Characteristics as of 15-NOV-1990 10:51:12 Executor node = 48.15 (BRSADV) Identification = DECnet-VAX V5.3-1, VMS V5.3-1 Management version = V4.0.0 Incoming timer = 45 Outgoing timer = 60 Incoming Proxy = Enabled Outgoing Proxy = Enabled NSP version = V4.1.0 Maximum links = 40 Delay factor = 80 Delay weight = 5 Inactivity timer = 60 Retransmit factor = 10 Routing version = V2.0.0 Type = routing IV Routing timer = 600 Broadcast routing timer = 180 Maximum address = 1023 Maximum circuits = 16 Maximum cost = 1022 Maximum hops = 30 Maximum visits = 63 Maximum area = 63 Max broadcast nonrouters = 64 Max broadcast routers = 32 Maximum path splits = 1 Area maximum cost = 1022 Area maximum hops = 30 Maximum buffers = 100 Buffer size = 576 Nonprivileged user id = DECNET$SERVER Nonprivileged password = ????????? Default access = incoming and outgoing Pipeline quota = 13000 Alias incoming = Enabled Alias maximum links = 32 Alias node = 48.559 (KETJE) Path split policy = Normal Maximum Declared Objects = 31 Sorry it took a while to look at your reply. Thanks in advance. Erik. | |||||
3011.5 | my guess is that it's the long wire | DECWIN::LEMBREE | Just do it. | Fri Nov 16 1990 09:47 | 16 |
Erik, James has gone on to a new position since his reply to your last message, so we may not hear back from him in this conference. Sometimes networks can get a little bit flaky when you're running over a wire as complex as the one you've described. It's possible that packets get lost and cannot be resent for various reasons. This could cause either TCP/IP or DECnet to lose the link. One experiment to try would be to put a workstation on the same Ethernet segment as the 6000 series machine and see if the problem occurs there. If it does, then you have a problem with the network local to the 6000. If it doesn't, then the problem is probably associated with the long run to the remote site. I hope this helps some. As always though, if this is a customer problem, it should be escalated through the proper support channels, as this conference isn't a way to address customer issues. Rob | |||||
3011.6 | Don't accept the explanation `flakey network'.... | CVG::PETTENGILL | mulp | Fri Nov 16 1990 23:17 | 14 |
There is a real problem somewhere, although it hasn't been found yet as far as I know. It does seem to be related to VAX/VMS routers. It seems to show up most often with DECwindows because there is a much higher probability of a message being outstanding at the time of some event; this event then causes the link to be broken, and of course a user is very likely to see this and be annoyed by it. It probably happens at other times too, but we have things like FTSV that mask the problem. After trying to isolate and reproduce this problem, I became convinced that there is a set of network problems that are being handwaved because they are very difficult to isolate. Reports and investigations have been going on for years. However, if we don't track them down, as DECwindows terminals become more common, we're going to have a growing customer satisfaction problem on our hands. Actually, we already have that situation.... | |||||
3011.7 | I've never seen a 'flakey network' return SS$_LINKABORT | KETJE::CORNELIS | Roger Cornelis - EIS Brussels - 856-7612 | Tue Nov 27 1990 08:27 | 23 |
Hi, I'm one of the victims who have given up working on the affected CI-member/DECnet router. It's just impossible when you get thrown out several times a day. Even if (!) the network is 'flakey', that does not explain why we get the error message %SYSTEM-F-LINKABORT, network partner aborted logical link ^^^^^^^^^^^^^^^ i.e. the DECwindows server I would rather expect a message like %SYSTEM-F-PATHLOST, path to network partner node lost ^^^^ In addition, I've never been thrown out when using LAT or CTERM. There's a comparable problem mentioned in the VAXcluster conference (1919.*). I just did an NCP> TELL node SHOW EXECUTOR CHARACTERISTICS, and they only seem to have problems with the single routing node. Any clues ... please? Roger | |||||
3011.8 | Possible help! | DSTEG1::HOSSFELD | I'm so confused! | Tue Nov 27 1990 10:50 | 8 |
One of the most frequent disconnects is caused when the window 'hangs'. One possible workaround, if your problem is caused by a 'hang', is to keep watch on the window when you are working in it and, if it 'hangs', stop all activity in that window until it starts running again. Paul H. | |||||
3011.9 | Client side which breaks connections is a ROUTING IV node. | KETJE::STEUKERS | Tue Nov 27 1990 12:15 | 80 | |
Yes, Indeed the client side that loses the connections is a router. Could it be that the NETACP process is too busy handling other network stuff? The characteristics of the node are NCP>show exec char Node Volatile Characteristics as of 27-NOV-1990 17:48:53 Executor node = 48.15 (BRSADV) Identification = DECnet-VAX V5.3-2, VMS V5.3-2 Management version = V4.0.0 Incoming timer = 45 Outgoing timer = 60 Incoming Proxy = Enabled Outgoing Proxy = Enabled NSP version = V4.1.0 Maximum links = 40 Delay factor = 80 Delay weight = 5 Inactivity timer = 60 Retransmit factor = 10 Routing version = V2.0.0 Type = routing IV Routing timer = 600 Broadcast routing timer = 180 Maximum address = 1023 Maximum circuits = 16 Maximum cost = 1022 Maximum hops = 30 Maximum visits = 63 Maximum area = 63 Max broadcast nonrouters = 64 Max broadcast routers = 32 Maximum path splits = 1 Area maximum cost = 1022 Area maximum hops = 30 Maximum buffers = 100 Buffer size = 576 Nonprivileged user id = DECNET$SERVER Nonprivileged password = IDONTGETFOOLED Default access = incoming and outgoing Pipeline quota = 13000 Alias incoming = Enabled Alias maximum links = 32 Alias node = 48.559 (KETJE) Path split policy = Normal Maximum Declared Objects = 31 NCP> NETACP runs with the default quotas except the following: "NETACP$EXTENT" = "8000" "NETACP$MAXIMUM_WORKING_SET" = "6000" "NETACP$PAGE_FILE" = "20000" What makes the DECwindows client break the connection, or is it DECnet that takes the initiative? Which counters/timers could be tried out to solve this problem? Applications run on transport layer DECnet or TCP both lose their connections, although the ones via DECnet are lost more easily than the ones on TCP. I have no feedback on users using the LAT transport layer with VT1200/VT1300s. Crossposted in the DECNETVAX notes file. Thanks in advance, Erik. | |||||
3011.10 | Problem is probably bugs in DECnet | STAR::VATNE | Peter Vatne, VMS Development | Tue Nov 27 1990 14:09 | 8 |
I have received word that the problems with losing X connections over DECnet are probably due to a number of conspiring DECnet bugs. We are in the midst of testing the proposed fixes. In the meantime, all I can suggest is to keep your network as quiet as possible, make sure your NETACP process has lots of memory, and make sure your router is the fastest CPU possible. |
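
Reply 3011.1 decodes the status 20e4 by typing "exit %x20e4" to DCL. When no VMS system is at hand, the severity and facility fields of such a status can still be read off by hand. The sketch below is an illustration added here, not something from the thread; it assumes the standard VMS condition value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27). The function name `decode_condition` is hypothetical, and the only message text it knows is the one quoted in 3011.1.

```python
# Severity letters as they appear in VMS messages (%FAC-S-IDENT, ...).
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}

# Message text is taken from reply 3011.1; no other codes are assumed here.
KNOWN_MESSAGES = {
    0x20E4: "%SYSTEM-F-LINKABORT, network partner aborted logical link",
}


def decode_condition(value):
    """Split a VMS condition value into its severity, message and facility fields."""
    return {
        "severity": SEVERITY.get(value & 0x7, "?"),   # bits 0-2
        "message_number": (value >> 3) & 0x1FFF,      # bits 3-15
        "facility_number": (value >> 16) & 0xFFF,     # bits 16-27
        "text": KNOWN_MESSAGES.get(value, "(unknown, use EXIT %Xnnnn on VMS)"),
    }


if __name__ == "__main__":
    # The two status values that appear in the server log above.
    for status in (0x20E4, 0x2DBA002):
        print(hex(status), decode_condition(status))
```

For 20e4 this gives severity F with facility number 0, consistent with the %SYSTEM-F- prefix of the message. The other status in the log, 2dba002, has a non-zero facility number; the thread does not say which facility that is, so the sketch leaves it undecoded.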