Conference bulova::decw_jan-89_to_nov-90

Title: DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice: See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator: STAR::VATNE
Created: Mon Oct 30 1989
Last Modified: Mon Dec 31 1990
Last Successful Update: Fri Jun 06 1997
Number of topics: 3726
Total number of notes: 19516

2105.0. "fatal x i/o errors" by AUSTIN::BROWN (10$: BRB 10$) Mon Jan 22 1990 16:43

    Below are some questions one of our customers asked regarding x clients
    and fatal x i/o errors. They are running decwindows v1 on vms 5.1 and 
    having trouble with client disconnects.
    
1. Under what circumstances would an X client get fatal X I/O errors?
2. Under what circumstances would an X client connection be aborted?
3. Is there a limit on the number of X ASTs? If there is, what happens
when the limit is reached? Would it cause fatal X I/O errors?
Would it cause the X connection to be aborted?
    
    
    -thanks in advance,
     ed

2105.1. by SITBUL::KLEINSORGE (BFM) Mon Jan 22 1990 16:57 (11 lines)

    Here is something that we have found on V1 (VMS) DECwindows.  If you
    call X$QUERY_POINTER more than about 300 times you get a fatal IO error.
    
    This problem is not in V2 (VMS V5.3).
    
    We run a lot of things at AST level and have real headaches with it:
    either we deadlock the server if we're not careful when doing
    something synchronous, OR we die because the client stayed at AST
    level too long, the input queue filled up, and the server deadlocked.
    
    
2105.2. "Please upgrade ASAP" by DECWIN::JMSYNGE (James M Synge, VMS Development) Mon Jan 22 1990 17:20 (41 lines)

    Let me attempt to (partially) answer your questions.
    
    > 1. Under what circumstances would an X client get fatal X I/O errors?
    
    When a) the network lost connectivity for some reason;
         b) the server aborted the connection;
         c) a transport data structure got corrupted.
    
    > 2. Under what circumstances would an X client connection be aborted?
    
    If the server attempts to write to the client, but there are no buffers
    available for the data, then the server will enter a retry loop. 
    During this time, the server will appear to be locked up (because it
    is!).  If this retry loop times out, then the client will be aborted.
    
    In V1, this whole mechanism was much less robust than in V2.  We HIGHLY
    recommend upgrading to VMS V5.3 in order to get the new version of
    transport (which only works with the other V2 components).
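
    The retry-and-abort behaviour described above can be pictured with a
    small model. This is only a sketch in Python, not the DECwindows
    transport code: write_with_retry, ClientAborted, and the timeout and
    retry interval values are all made-up names and numbers for
    illustration.

```python
import time


class ClientAborted(Exception):
    """Raised when the retry loop gives up on a client (hypothetical name)."""


def write_with_retry(try_write, timeout_s=1.0, retry_interval_s=0.01,
                     clock=time.monotonic, sleep=time.sleep):
    """Call try_write() until it succeeds or timeout_s has elapsed.

    try_write stands in for "write to the client"; it returns False when
    no buffer is available. While this loop runs, the caller does nothing
    else, which is why the server looks locked up in the meantime.
    Returns the number of attempts made on success.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if try_write():
            return attempts
        if clock() >= deadline:
            # The retry loop timed out: drop the client.
            raise ClientAborted(f"no buffer after {attempts} attempts")
        sleep(retry_interval_s)
```

    From the client's side, the abort at the end of such a loop is what
    would surface as a fatal X I/O error.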
    
    > 3. Is there a limit on the number of X ASTs?
    
    In V1, an enormous number of ASTs could get generated within the
    transport layer.  This has been fixed (it should now use at most 3
    simultaneous ASTs per connection).
    
    It is still possible, by using certain backdoors, to cause Xlib to fire
    off lots of ASTs (one per event if you're really foolish :-)).  There
    is no limit on the number which Xlib will attempt to declare, other
    than ASTLM.
    
    > If there is, what happens when the limit is reached?
    
    I don't remember what the Xlib code does.  (I think it just ignores the
    error).  Transport attempts to ignore the error, but it could cause
    things to become rather screwed up, because some exec mode code would
    believe it has sent a signal to some user mode code, which would never
    receive it.  I should look into making that more robust.
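
    The failure mode described here (one side believes it has signalled
    the other, but the signal was never queued) can be shown with a toy
    model. This is a sketch in Python under stated assumptions, not VMS
    or Xlib code: AstQuota, send_ignoring_errors and send_checking_errors
    are hypothetical names, and the quota is a plain counter standing in
    for ASTLM.

```python
class AstQuota:
    """Toy stand-in for a per-process limit on outstanding ASTs."""

    def __init__(self, limit):
        self.limit = limit
        self.outstanding = 0
        self.delivered = []   # notifications the receiver will actually see

    def declare(self, payload):
        """Queue one notification; return False if the quota is used up."""
        if self.outstanding >= self.limit:
            return False
        self.outstanding += 1
        self.delivered.append(payload)
        return True


def send_ignoring_errors(quota, payloads):
    """Sender that ignores declare() failures, as the note describes."""
    believed_sent = []
    for p in payloads:
        quota.declare(p)          # result deliberately ignored
        believed_sent.append(p)   # sender assumes it went through
    return believed_sent


def send_checking_errors(quota, payloads):
    """Sender that checks the result and keeps what could not be queued."""
    sent, deferred = [], []
    for p in payloads:
        (sent if quota.declare(p) else deferred).append(p)
    return sent, deferred
```

    With a limit of 3 and five notifications, the first sender believes
    all five were sent while only three are in quota.delivered; the
    second sender at least knows which two still need to be retried.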
    
    > Would it cause fatal X I/O errors? Would it cause the X connection to be aborted?
    
    Maybe.

2105.3. "Fatal X i/o errors - in DECwindows 2.0 as well" by JEREMY::GAL (I don't know anything, I just work here) Thu Jan 25 1990 09:11 (59 lines)

I seem to be having the same problem described in .0 . I hope this is the right
place to raise the problem.

I run applications such as EVE, Notes, CMS FileView etc. on a cluster mainframe,
in client/server mode, that is, displaying on a VMS workstation
(a VAXstation II/GPX).

Under VMS 5.2/DECwindows 1.0, client/server connection aborts used to
happen about 4-5 times a day, which was very frustrating. In fact, I was
on the verge of abandoning DECwindows altogether, except as a vehicle for
running DECterm sessions, when our system manager found out the problem had
been identified and would be fixed in the VMS 5.3/DECwindows 2.0 release. 

Our systems were upgraded to 5.3/2.0 about a month ago, but the problem still
occurs, about once or twice a day, which is still very annoying.
The error message is different; it is now:

    XIO:  fatal IO error 65535  on X server "LILACH::0.0"
          after 22239 requests (22239 known processed) with 0 events remaining.
    %XLIB-F-IOERROR, xlib io error
    -SYSTEM-F-PATHLOST, path to network partner node lost

The IO error number is always 65535, but there are sometimes more than 0 events
remaining (or so the message says.)

Some more info which may be relevant:

The problem happens with all client/server applications, without any
favorites. (I run them independently, not under FileView, to minimize the
impact of a connection abort; if the FileView aborts, all the applications
under it do as well.)
Also, there seems to be no correlation between the amount of window
activity by the application and the X I/O error: it happens whether or not 
the application's window has input focus, and whether or not it is in the 
iconized state. I have also had it happen after several hours during which
all windows on the workstation were inactive (I mean, as far as the user is
concerned: no typing, mouse actions etc.; I am not familiar with the amount of
background XIO activity which probably occurs as well.) It happens whether
I have only a few windows on screen (say 5) or whether I have many (20).
It also seems to happen no matter which host on the cluster runs the
application, and how many client/server connections I have in total (I use
anywhere from one to five), and how many I have on a single host (when one
connection aborts, the other client/server applications connected to that
host still work.)
Several other people here have this problem, and others simply lost
patience and stopped using applications through DECwindows; running applications
locally on the workstation is just too slow.

Any help/ideas/thoughts will be greatly appreciated.

    ~~~
    ~~~ Gal	(who_is_potentially_a_DECwindows_fan)
    ~~~

2105.4. by DECWIN::JMSYNGE (James M Synge, VMS Development) Thu Jan 25 1990 16:18 (8 lines)

    I would love it if we could provide a better error message, but
    PATHLOST is the error message which DECnet is telling us.
    
    Some recent notes have pointed the finger at high NETACP page fault
    rates.  Do you have this problem (most likely on the boot nodes)?
    
    James