Conference bulova::decw_jan-89_to_nov-90

Title:DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice:See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator:STAR::VATNE
Created:Mon Oct 30 1989
Last Modified:Mon Dec 31 1990
Last Successful Update:Fri Jun 06 1997
Number of topics:3726
Total number of notes:19516

516.0. "Revisit 0x2dba002 Error" by LVS::HABERLAND () Fri Mar 31 1989 11:29

I have read the previous notes 318 and 197 about the problem, but it is still 
occurring. About the configuration:

LAVc with 3500 (32MB) as Boot Node running SDC VMS 5.1

3 satellites: 2 GPXs with 16 MB each and a VS2000 with 14 MB, each having a 
local disk used for paging and swapping (page 50k, swap 10k).

I had the customer 

define DECW$SERVER_RETRY_WRITE_MIN 150000/TABLE=DECW$SERVER0_TABLE

and

define DECW$SERVER_RETRY_WRITE_MAX 3000000/TABLE=DECW$SERVER0_TABLE

The error they are receiving is

Error with Xlib Connections on Server
XIO non-translatable VMS Error Code
DEC-W-E-CNX-ABORT
Xlib-f-io error 0x2dba002

The problem gets progressively worse as the day goes on. Then they have to 
restart the server to clear the problem.

Does anyone have any suggestions on what I should check next?

Thanks,
     Dave Haberland
     SWS/E Graphics Support

T.R  Title  User  Personal Name  Date  Lines
516.1  Does "no response" = "no solution" ??  HPSTEK::JBATES  John D. Bates  Sat Apr 29 1989 01:40  18
	After placing the DECW$SERVER_RETRY_WRITE_MIN and
	DECW$SERVER_RETRY_WRITE_MAX parameters on our server nodes and
	increasing the number of LRPs our 2DBA002 problems "virtually"
	went away. The only problems we now see occur when we have network
	problems. 

	HOWEVER: I now have a user at another site that has a
	configuration very similar to .0  and has done the things that
	made us fly and he seems to have a LOT of these errors. Since
	this note has not been responded to in almost a month is it safe
	to assume that if these things mentioned above don't fix the
	problem there are no other fixes to try? 

	I am going to the user site next week and am not looking forward
	to saying "Hey learn to live with it". Any suggestions welcome. 

					John

516.2  Our favorite - the 0x2dba002 error  34858::SOCHA  Out in the Field  Mon May 22 1989 14:00  39
	There have been numerous Notes in this Conference regarding
the infamous 0x2dba002 error which can occur when running applications
remotely.  From the internal QAR database for DECwindows, it was stated
that this would not be fixed until DECwindows v2.0.  

-.1
>>>     Since
>>>	this note has not been responded to in almost a month is it safe
>>>	to assume that if these things mentioned above don't fix the
>>>	problem there are no other fixes to try? 

	Since no one seems to have a definite fix for those of us afflicted with
this problem, I would like to gather a list here of those actions which
seem to help.  Perhaps someone from engineering could indicate whether
this problem is always caused by resource shortages, or whether program
errors can come up under this error.

	So far, I have seen the following recommendations:

(1)	Set DECW$SERVER_RETRY_WRITE_MIN = 150000
	Set DECW$SERVER_RETRY_WRITE_MAX = 3000000

(2)	Increasing the system parameter LRPCOUNT.

(3)	Increasing the number of DECnet Line Receive Buffers on the workstation.

(4)	Increasing the number of global pages.


	Unfortunately, the problem still persists.  Remote DECwrite applications
are aborted around every 10 minutes, and performance of remote applications
is very jerky.  I have observed a large number of DECnet Line User and
System buffers unavailable on the workstations.  The DECnet circuit only
shows a few, and there are only a few on the remote client.

	So what do we try next??

Kevin

516.3  One catalyst found...  38320::KIRK  Steve Kirk  Mon May 22 1989 18:06  10
    
    No suggestions on parameter settings here.
    
    I have however observed that the ListBox widget exacerbates the
    problem.  Our product group eventually wrote our own list-box
    equivalent because anytime we did a large number of changes to the
    contents of a ListBox widget, the connection was lost via the infamous
    0x2dba002 error.  Rather irritating...
                  

516.4  My collected wisdom on the subject, FWIW  DECWIN::FISHER  Burns Fisher 381-1466, ZKO3-4/W23  Tue May 23 1989 11:57  36
The reason there are not too many answers is that there are not too many
workarounds.  I think, after some experimentation, that I would reduce the
retry_min number substantially.  For VMS 5.1, try something more like
5000 (one retry every 500 ms.)  On VMS version 2, the units of these numbers
have been fixed to be milliseconds, so the number would be more like 500.
As to the max, I'm not sure that 3000000 is reasonable.  This means that you
will keep trying (and thus hanging your server) for 5 minutes.  Chances
are if it does not work in 30 seconds, it's not going to.  I would put
300000 for V5.1.

Note that for VMS V5.2, the numbers will default to what I suggested.  You
will want to remove any logical name definitions that you have made, since
the multiplier factor has changed.  (Sorry about that, but it was just
plain wrong in V5.1, and it is not documented anyway!) We
did some tests with people who use DECwrite, and they thought these numbers
were reasonable.
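
The arithmetic behind these numbers can be sketched in Python. This is an
illustration only: the 100-microsecond unit for V5.1 is inferred from the
"5000 = 500 ms" example above, and the constant and function names are made
up for this sketch. They are not part of DECwindows.

```python
# Units per second for the two interpretations described in this reply.
# Both values are assumptions inferred from the examples in the text.
V51_UNITS_PER_SECOND = 10_000   # 5000 units = 500 ms, so one unit = 100 us
MS_UNITS_PER_SECOND = 1_000     # the corrected units: milliseconds


def to_seconds(value, units_per_second):
    """Convert a DECW$SERVER_RETRY_WRITE_* value to seconds."""
    return value / units_per_second


if __name__ == "__main__":
    settings = [("MIN 150000", 150_000), ("MAX 3000000", 3_000_000),
                ("MIN 5000", 5_000), ("MAX 300000", 300_000)]
    for name, value in settings:
        print(name, "->", to_seconds(value, V51_UNITS_PER_SECOND), "s on V5.1")
```

Under that assumption, the 3000000 from .0 works out to 300 seconds, which
matches the "5 minutes" above, and 300000 works out to the suggested 30
seconds.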

If that does not help, the next step is to try the TCP/IP transport (see
the TCPIP keyword in this conference), or wait for the internal field test
of Version 2 DECwindows.  TCP will make more efficient use of transport
buffers.   Version 2 will improve the transport buffer use efficiency for
all transports, and will give you some "knobs to turn" for the buffer sizes.

You asked about whether this can be caused by program error.  The answer is
yes, but the reverse is not true.  Bugs can cause the problem, but the problem
does not necessarily imply bugs in the client.  If you do anything which
prevents the client from reading for a while, you can cause the problem.
For example, when you respond to a button/menu/whatever and go off and do
some work without periodically looking at the input queue, you are making
the problem more likely to happen.  However, it can still happen with a
perfectly ok client which is swamped with requests.  Scroll bars with
a "magnifier window" like mail are my favorite way to make this happen,
for example.
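
A toy model of that failure mode, as a hedged sketch: it is not the server's
actual algorithm, and every name and number in it (simulate, buffer_slots,
retry_max_ticks and so on) is hypothetical. The server produces one event per
tick, the client is busy for a while, and the server gives up once it has
been unable to write for longer than its retry limit.

```python
def simulate(ticks, buffer_slots, busy_ticks, poll_every, retry_max_ticks):
    """Return "ok" or "aborted" for one simulated connection.

    poll_every is how often the busy client looks at its input queue;
    0 means it never looks until it has finished its work.
    """
    queued = 0        # events sitting in transport buffers
    blocked_for = 0   # consecutive ticks the server could not write
    for tick in range(ticks):
        busy = tick < busy_ticks
        polls = poll_every > 0 and tick % poll_every == 0
        if not busy or polls:
            queued = 0                  # client drains its input queue
        if queued < buffer_slots:
            queued += 1                 # server write succeeds
            blocked_for = 0
        else:
            blocked_for += 1            # server retries the write
            if blocked_for > retry_max_ticks:
                return "aborted"        # server drops the connection
    return "ok"


if __name__ == "__main__":
    # Same workload, with and without periodic polling during the busy period.
    print(simulate(200, 10, 100, 0, 30))
    print(simulate(200, 10, 100, 5, 30))
```

In this model a client that polls its queue every few ticks survives the same
busy period that kills a client that never polls, and a short enough busy
period survives either way.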

Burns

516.5  4315::KONING  NI1D @FN42eq  Tue May 23 1989 17:30  6
Why does DECnet use buffers less efficiently than TCP?  Or to put it 
differently, is the DECnet transport going to be fixed up?  There aren't
any obvious reasons why one would be less efficient than the other.

	paul

516.6  Internally...  STAR::BRANDENBERG  Si vis pacem para bellum  Tue May 23 1989 18:31  8
    
    What Burns meant was that TCP/IP internally buffers more efficiently
    than DECnet does.  (Actually, it buffers less efficiently but more
    effectively.)  The DECwindows transports themselves are nearly
    identical.
    
    						monty

516.7  Under VMS that is...  56579::thomas  The Code Warrior  Tue May 23 1989 19:32  2
Under Ultrix, DECnet and TCP buffer almost identically.

516.8  ORPHAN::WINALSKI  Paul S. Winalski  Wed May 24 1989 02:01  19
My favorite way to make this happen is to start any DECwindows application
with the debugger, say GO in debug, then pull down the Commands menu and
select EXIT.  Debug's exit handler gets control before the server has finished
queueing up the flurry of expose events, unmaps, etc. that accompany tearing
down a widget hierarchy.  The client isn't dispatching events because it's
stuck in debug.  If you ran the application from a DECterm, you can't even
unstick things by giving debug a ^Z or exit, because it's the whole server
that's hung, and it won't deliver stuff to the DECterm any more.

It's more that TCP/IP and DECnet buffer differently than that one works better
than the other.  The MIT X server code was designed originally on Unix and a
TCP/IP-based transport.  Its transport management code thus fits that
transport very effectively.  Had they run on DECnet from day 1, they would
have approached the problem differently and this hang problem might not exist.
It's possible that, under those circumstances, it would have been TCP/IP with
the problems.

--PSW

516.9  4315::KONING  NI1D @FN42eq  Wed May 24 1989 12:36  4
Re .6: fine, but my question still applies.

	paul

516.10  maybe  STAR::BRANDENBERG  Si vis pacem para bellum  Wed May 24 1989 13:27  7
    
    It *may* happen as part of the IPC project.  We talked to them and
    begged for stream semantics for DECnet IPC connections as an option. 
    If they have time/resources, they will do it.
    
    						m

516.11  DECWIN::FISHER  Burns Fisher 381-1466, ZKO3-4/W23  Wed May 24 1989 13:27  12
re .5:  I will tell you my explanation as a non-expert and let Monty
fill in the gaps and fix up the misconceptions...

DECnet is record-oriented and TCP is stream-oriented.  Given that the flow
of info between client and server is stream-oriented, it favors TCP.  The
version 2 DECwindows server has code which compresses more X packets into
DECnet buffers, which makes it look more stream-like.
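
A rough sketch of why packing helps, assuming a hypothetical 512-byte
transport buffer and a run of 32-byte X events (32 bytes is the X11 event
size; the buffer size and function names are invented for illustration and
say nothing about the real server code):

```python
def buffers_one_per_packet(sizes):
    """Record-style sending: every X packet occupies its own buffer."""
    return len(sizes)


def buffers_packed(sizes, buffer_bytes):
    """Stream-like sending: fill each buffer before starting the next.

    Assumes no single packet is larger than buffer_bytes.
    """
    used = 0
    free = 0
    for size in sizes:
        if size > free:        # packet does not fit, start a new buffer
            used += 1
            free = buffer_bytes
        free -= size
    return used


if __name__ == "__main__":
    events = [32] * 100
    print("one per packet:", buffers_one_per_packet(events))
    print("packed:", buffers_packed(events, 512))
```

With these made-up numbers, a burst of 100 events needs 100 buffers when sent
one per record but only 7 when packed, which is the sense in which packing
makes DECnet "look more stream-like".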

Monty?

Burns

516.12  STAR::BRANDENBERG  Si vis pacem para bellum  Wed May 24 1989 13:29  5
    Burns and I collided but what he says is so.  The changes alleviate but
    do not eliminate the problem.
    
    						m

516.13  Rehashing old warmed over hash  40470::PETTENGILL  mulp  Wed May 24 1989 23:35  7
See the infamous note 60, response 66; I actually looked at what was going on
with a datascope.  DECnet/VAX gets to send 18 datagrams with 10,000 bytes of
quota while VAX Ultrix Connection allows 60 datagrams with 4096 bytes of quota.
Another implementation of DECnet, for example, DECnet Ultrix, may not have the
same behavior as DECnet/VAX.  However, in the simple test in 60.66, the server
aborted the connection in both cases.
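
One way to read those two measurements is as quota consumed per datagram.
The sketch below simply divides the figures quoted above; treating the quota
as spread evenly across datagrams is an assumption, not something the
datascope trace is known to show, and the function name is invented.

```python
def quota_per_datagram(quota_bytes, datagrams):
    """Whole bytes of quota consumed per datagram, assuming an even split."""
    return quota_bytes // datagrams


if __name__ == "__main__":
    print("DECnet/VAX:", quota_per_datagram(10_000, 18), "bytes per datagram")
    print("VAX Ultrix Connection:", quota_per_datagram(4_096, 60),
          "bytes per datagram")
```

On that reading, DECnet/VAX spends roughly 555 bytes of quota per datagram
against roughly 68 for the Ultrix Connection, so small X packets exhaust the
DECnet/VAX quota after far fewer sends.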

516.14  another fix for 0x2dba002  MDVAX3::SOCHA  Out in the Field  Fri Aug 18 1989 13:42  27
    I got this response in the DECwrite conference to repeated
    problems with the 0x2dba002 error.  It is another thing to
    try, especially if you have a lot of nodes on your LAN.
    
    Kevin
    
                  <<< QUEEN::PIX1:[PUBLIC.NOTES]EPIC.NOTE;6 >>>
                     -< You can't go wrong with DECwrite >-
================================================================================
Note 1959.21              DECwrite or DECwindows error?                 21 of 21
DCC::HAGARTY "Essen, Trinken und Shaggen..."         13 lines  18-AUG-1989 04:24
                          -< Network configuration! >-
--------------------------------------------------------------------------------
Ahhh Gi'day...

    Sounds like  the  infamous BROADCAST NONROUTERS problem! MAKE SURE THAT
    THIS IS DONE ON ALL SYSTEMS IN THE LAN, but firstly yours...

    Count the  numbers of nonrouters on the LAN (say in the region of 300),
    and do a:

    $ MC NCP SET EXEC MAX BROADCAST NONROUTERS 512
    $ MC NCP DEF EXEC MAX BROADCAST NONROUTERS 512

    This will stop the timeouts happening to the other nodes in the LAN! On
    big machines, make it 1024!

516.15  Curious  EAGLE1::BRUNNER  VAX Vector Architecture  Thu Jan 04 1990 20:33  4
What I am trying to figure out as a novice is why I get this error when I
invoke a remote DECWRITE through a remote FileVUE (both on the same system)
but not when I invoke the remote DECWRITE directly (by remote job or
logging into the remote system.) How is FileVUE getting in the way?