Conference bulova::decw_jan-89_to_nov-90

Title:	DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice:	See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator:	STAR::VATNE

Created:	Mon Oct 30 1989
Last Modified:	Mon Dec 31 1990
Last Successful Update:	Fri Jun 06 1997
Number of topics:	3726
Total number of notes:	19516

516.0. "Revisit 0x2dba002 Error" by LVS::HABERLAND () Fri Mar 31 1989 10:29

I have read the previous notes 318 and 197 about the problem, but it is still 
occurring. About the configuration:

LAVC with a 3500 (32 MB) as boot node running SDC VMS 5.1

3 satellites: 2 GPXs with 16 MB each and a VS2000 with 14 MB, each having a
local disk used for paging and swapping (page 50k, swap 10k).

I had the customer 

define DECW$SERVER_RETRY_WRITE_MIN 150000/TABLE=DECW$SERVER0_TABLE

and

define DECW$SERVER_RETRY_WRITE_MAX 3000000/TABLE=DECW$SERVER0_TABLE

The error they are receiving is

Error with Xlib Connections on Server
XIO non-translatable VMS Error Code
DEC-W-E-CNX-ABORT
Xlib-f-io error 0x2dba002

The problem gets progressively worse as the day goes on, until they have to
restart the server to clear it.

Does anyone have any suggestions on what I should check next?

Thanks,
     Dave Haberland
     SWS/E Graphics Support

516.1. "Does "no response" = "no solution" ??" by HPSTEK::JBATES (John D. Bates) Sat Apr 29 1989 00:40

After placing the DECW$SERVER_RETRY_WRITE_MIN and DECW$SERVER_RETRY_WRITE_MAX parameters on our server nodes and increasing the number of LRPs, our 2DBA002 problems "virtually" went away. The only problems we now see occur when we have network problems.

HOWEVER: I now have a user at another site with a configuration very similar to .0 who has done the things that made us fly, and he seems to have a LOT of these errors. Since this note has not been responded to in almost a month, is it safe to assume that if the things mentioned above don't fix the problem, there are no other fixes to try? I am going to the user site next week and am not looking forward to saying "Hey, learn to live with it". Any suggestions welcome.

John

516.2. "Our favorite - the 0x2dba002 error" by 34858::SOCHA (Out in the Field) Mon May 22 1989 13:00

There have been numerous notes in this conference regarding the infamous 0x2dba002 error, which can occur when running applications remotely. According to the internal QAR database for DECwindows, this will not be fixed until DECwindows V2.0.

-.1 >>> Since
    >>> this note has not been responded to in almost a month is it safe
    >>> to assume that if these things mentioned above don't fix the
    >>> problem there are no other fixes to try?

Since no one seems to have a definite fix for those of us afflicted with this problem, I would like to gather a list here of the actions that seem to help. Perhaps someone from engineering could indicate whether this problem is always caused by resource shortages, or whether program errors can also show up as this error.

So far, I have seen the following recommendations:

(1) Set DECW$SERVER_RETRY_WRITE_MIN = 150000
    Set DECW$SERVER_RETRY_WRITE_MAX = 3000000
(2) Increase the system parameter LRPCOUNT.
(3) Increase the number of DECnet line receive buffers on the workstation.
(4) Increase the number of global pages.

Unfortunately, the problem still persists. Remote DECwrite applications are aborted about every 10 minutes, and the performance of remote applications is very jerky. I have observed a large number of DECnet line "user buffer unavailable" and "system buffer unavailable" errors on the workstations. The DECnet circuit shows only a few, and there are only a few on the remote client.

So what do we try next??

Kevin

516.3. "One catalyst found..." by 38320::KIRK (Steve Kirk) Mon May 22 1989 17:06

No suggestions on parameter settings here. I have, however, observed that the ListBox widget exacerbates the problem. Our product group eventually wrote our own list-box equivalent, because any time we made a large number of changes to the contents of a ListBox widget, the connection was lost with the infamous 0x2dba002 error. Rather irritating...

516.4. "My collected wisdom on the subject, FWIW" by DECWIN::FISHER (Burns Fisher 381-1466, ZKO3-4/W23) Tue May 23 1989 10:57

The reason there are not many answers is that there are not many workarounds.

After some experimentation, I think I would reduce the RETRY_MIN number substantially. For VMS V5.1, try something more like 5000 (one retry every 500 ms). In VMS V5.2 the units of these numbers have been fixed to be milliseconds, so the equivalent value would be 500.

As to the max, I'm not sure that 3000000 is reasonable. It means that you will keep trying (and thus hanging your server) for 5 minutes. Chances are that if it does not work in 30 seconds, it's not going to, so I would use 300000 for V5.1.

Note that for VMS V5.2, the numbers will default to what I suggested. You will want to remove any logical name definitions you have made, since the multiplier factor has changed. (Sorry about that, but it was just plain wrong in V5.1, and it is not documented anyway!) We did some tests with people who use DECwrite, and they thought these numbers were reasonable.

If that does not help, the next step is to try the TCP/IP transport (see the TCPIP keyword in this conference), or wait for the internal field test of DECwindows Version 2. TCP makes more efficient use of transport buffers. Version 2 will improve transport buffer efficiency for all transports and will give you some "knobs to turn" for the buffer sizes.

You asked whether this can be caused by a program error. The answer is yes, but the reverse is not true: bugs can cause the problem, but the problem does not necessarily imply a bug in the client. Anything that prevents the client from reading for a while can cause the problem. For example, when you respond to a button/menu/whatever and go off and do some work without periodically looking at the input queue, you make the problem more likely. However, it can still happen with a perfectly good client that is swamped with requests. Scroll bars with a "magnifier window", like the one in Mail, are my favorite way to make this happen.

Burns
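
The arithmetic behind the numbers in 516.4 can be checked with a short sketch. It assumes, as the note implies, that in VMS V5.1 one unit of DECW$SERVER_RETRY_WRITE_MIN/MAX is 100 microseconds (5000 units = 500 ms), that the later release counts in milliseconds, and that MIN is the delay between write retries while MAX bounds the total time spent retrying. These readings and the function names are illustrative assumptions, not documented behavior.

```python
# Sketch only: the unit sizes below are inferred from note 516.4, not from
# any DECwindows documentation.
V51_UNIT_US = 100    # assumed: V5.1 counts in 100-microsecond units (5000 = 500 ms)
NEW_UNIT_US = 1000   # assumed: the later release counts in milliseconds (500 = 500 ms)


def v51_to_seconds(value: int) -> float:
    """Convert a V5.1 DECW$SERVER_RETRY_WRITE_* value to seconds."""
    return value * V51_UNIT_US / 1_000_000


def v51_to_new_units(value: int) -> int:
    """Rescale a V5.1 value to the millisecond units of the later release."""
    return value * V51_UNIT_US // NEW_UNIT_US


def retry_attempts(min_value: int, max_value: int) -> int:
    """Retries before the server gives up, reading MIN as the interval
    between write retries and MAX as the total time spent retrying."""
    return max_value // min_value


settings = [("suggested MIN", 5000), ("suggested MAX", 300000),
            ("customer MIN", 150000), ("customer MAX", 3000000)]
for name, value in settings:
    print(f"{name}: {value} = {v51_to_seconds(value)} s, "
          f"{v51_to_new_units(value)} in millisecond units")

print("retries with suggested values:", retry_attempts(5000, 300000))
print("retries with customer values:", retry_attempts(150000, 3000000))
```

Read this way, the customer's values in .0 wait 15 seconds between attempts and can stall the server for 5 minutes, whereas the suggested values retry every half second for 30 seconds (60 attempts).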

516.5. by 4315::KONING (NI1D @FN42eq) Tue May 23 1989 16:30

Why does DECnet use buffers less efficiently than TCP? Or, to put it differently, is the DECnet transport going to be fixed up? There aren't any obvious reasons why one would be less efficient than the other.

paul

516.6. "Internally..." by STAR::BRANDENBERG (Si vis pacem para bellum) Tue May 23 1989 17:31

What Burns meant was that TCP/IP internally buffers more efficiently than DECnet does. (Actually, it buffers less efficiently but more effectively.) The DECwindows transports themselves are nearly identical.

monty

516.7. "Under VMS that is..." by 56579::thomas (The Code Warrior) Tue May 23 1989 18:32

Under Ultrix, DECnet and TCP buffer almost identically.

516.8. by ORPHAN::WINALSKI (Paul S. Winalski) Wed May 24 1989 01:01

My favorite way to make this happen is to start any DECwindows application with the debugger, say GO in debug, then pull down the Commands menu and select EXIT. Debug's exit handler gets control before the server has finished queueing up the flurry of expose events, unmaps, etc. that accompany tearing down a widget hierarchy. The client isn't dispatching events because it's stuck in debug. If you ran the application from a DECterm, you can't even unstick things by giving debug a ^Z or EXIT, because it's the whole server that's hung, and it won't deliver anything to the DECterm anymore.

It's more that TCP/IP and DECnet buffer differently than that one works better than the other. The MIT X server code was originally designed on Unix with a TCP/IP-based transport, so its transport management code fits that transport very well. Had they run on DECnet from day 1, they would have approached the problem differently, and this hang problem might not exist. It's possible that, under those circumstances, it would have been TCP/IP with the problems.

--PSW

516.9. by 4315::KONING (NI1D @FN42eq) Wed May 24 1989 11:36

Re .6: fine, but my question still applies.

paul

516.10. "maybe" by STAR::BRANDENBERG (Si vis pacem para bellum) Wed May 24 1989 12:27

It may happen as part of the IPC project. We talked to them and begged for stream semantics for DECnet IPC connections as an option. If they have time/resources, they will do it.

m

516.11. by DECWIN::FISHER (Burns Fisher 381-1466, ZKO3-4/W23) Wed May 24 1989 12:27

Re .5: I will give you my explanation as a non-expert and let Monty fill in the gaps and fix up the misconceptions... DECnet is record-oriented and TCP is stream-oriented. Since the flow of information between client and server is stream-oriented, it favors TCP. The Version 2 DECwindows server has code that packs more X packets into each DECnet buffer, which makes it look more stream-like. Monty?

Burns
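
The idea of packing several X packets into each DECnet buffer so that a record-oriented link behaves more like a stream can be sketched as simple message coalescing. This is an illustration of the general technique, not the Version 2 server's code; the 32-byte packet size matches core X11 events, while the 1024-byte record size and the function name are arbitrary assumptions.

```python
def coalesce(packets: list[bytes], max_record: int) -> list[bytes]:
    """Pack consecutive packets into records of at most max_record bytes.
    A packet is never split, and a packet larger than max_record gets a
    record of its own."""
    records: list[bytes] = []
    current = bytearray()
    for packet in packets:
        if current and len(current) + len(packet) > max_record:
            records.append(bytes(current))   # record full: send it
            current = bytearray()
        current += packet
    if current:
        records.append(bytes(current))       # flush the partial last record
    return records


events = [bytes([i]) * 32 for i in range(100)]   # 100 events of 32 bytes each
print("one record per event:", len(events), "records")
print("coalesced into 1024-byte records:", len(coalesce(events, 1024)), "records")
```

With these assumptions, 100 events become 4 records instead of 100, which matters most when the transport limits the number of outstanding records rather than the number of bytes.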

516.12. by STAR::BRANDENBERG (Si vis pacem para bellum) Wed May 24 1989 12:29

Burns and I collided, but what he says is so. The changes alleviate but do not eliminate the problem.

m

516.13. "Rehashing old warmed over hash" by 40470::PETTENGILL (mulp) Wed May 24 1989 22:35

See the infamous note 60, response 66; I actually looked at what was going on with a datascope. DECnet/VAX gets to send 18 datagrams with 10,000 bytes of quota, while VAX Ultrix Connection allows 60 datagrams with 4096 bytes of quota. Another implementation of DECnet (DECnet Ultrix, for example) may not behave the same way as DECnet/VAX. However, in the simple test in 60.66, the server aborted the connection in both cases.

516.14. "another fix for 0x2dba002" by MDVAX3::SOCHA (Out in the Field) Fri Aug 18 1989 12:42

I got this response in the DECwrite conference to repeated problems with the 0x2dba002 error. It is another thing to try, especially if you have a lot of nodes on your LAN.

Kevin

              <<< QUEEN::PIX1:[PUBLIC.NOTES]EPIC.NOTE;6 >>>
                   -< You can't go wrong with DECwrite >-
================================================================================
Note 1959.21              DECwrite or DECwindows error?                  21 of 21
DCC::HAGARTY "Essen, Trinken und Shaggen..."         13 lines  18-AUG-1989 04:24
                       -< Network configuration! >-
--------------------------------------------------------------------------------

Ahhh Gi'day...

Sounds like the infamous BROADCAST NONROUTERS problem! MAKE SURE THAT THIS IS DONE ON ALL SYSTEMS IN THE LAN, but firstly yours...

Count the number of nonrouters on the LAN (say in the region of 300), and do a:

$ MC NCP SET EXEC MAX BROADCAST NONROUTERS 512
$ MC NCP DEF EXEC MAX BROADCAST NONROUTERS 512

This will stop the timeouts happening to the other nodes in the LAN! On big machines, make it 1024!
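
The sizing advice in the quoted reply (roughly 300 nonrouters gives 512, and 1024 on big machines) reads like rounding the count up to the next power of two. The sketch below encodes that reading as an assumption and generates the two NCP commands from the note; the rounding rule, the 1024 floor for large systems, and the function names are hypothetical, not a documented DECnet requirement. In NCP, SET changes the volatile (running) database and DEFINE changes the permanent one, which is why the note issues both.

```python
def nonrouter_setting(nonrouters: int, big_machine: bool = False) -> int:
    """Round the counted nonrouters up to the next power of two (an assumed
    reading of the rule of thumb), using 1024 as a floor on big machines."""
    setting = 1
    while setting < nonrouters:
        setting *= 2
    return max(setting, 1024) if big_machine else setting


def ncp_commands(value: int) -> list[str]:
    """SET updates the volatile database; DEFINE updates the permanent one."""
    return [f"$ MC NCP SET EXEC MAX BROADCAST NONROUTERS {value}",
            f"$ MC NCP DEF EXEC MAX BROADCAST NONROUTERS {value}"]


for line in ncp_commands(nonrouter_setting(300)):
    print(line)
```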

516.15. "Curious" by EAGLE1::BRUNNER (VAX Vector Architecture) Thu Jan 04 1990 20:33

What I am trying to figure out, as a novice, is why I get this error when I invoke a remote DECwrite through a remote FileVUE (both on the same system), but not when I invoke the remote DECwrite directly (by remote job or by logging in to the remote system). How is FileVUE getting in the way?