
Conference bulova::decw_jan-89_to_nov-90

Title:DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice:See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator:STAR::VATNE
Created:Mon Oct 30 1989
Last Modified:Mon Dec 31 1990
Last Successful Update:Fri Jun 06 1997
Number of topics:3726
Total number of notes:19516

876.0. "user-written X IO Error Handler question" by CSC32::K_TICE (Ada...Keeping the world safe for bureaucracy!) Fri Jun 02 1989 14:14

    The documentation for XSetIOErrorHandler says a user-written error 
    handler can be created to handle "...any type of system-call error, 
    such as losing the connection to the server..."
    
    OK-fine.  I have a customer who wants to do exactly this.
    
    The manual also says "This is assumed to be a fatal condition; the
    error handler should not return.  If the IO error handler does 
    return, the client process exits."
    
    Ouch!  Picture the following scenario.  One client process has done
    several XtOpenDisplay's, each to a different server.  This kind of
    architecture is one that, from my experience, occurs frequently in 
    many military applications as well as many process control applications.
    In this case, it is a power plant.
    
    ...so you lose a connection to ONE server, and you blow away the 
    client process and ALL other displays with it?  In the meantime, 
    your reactor core is about to melt down, but you don't know it because
    all your displays are gone just because somebody tripped over a
    thin-wire Ethernet cord.
    
    Is there a way to recover from this kind of error without destroying 
    the client process?
    
    
    Ken

T.R  Title  User  Personal Name  Date  Lines
876.1. "setjmp/longjmp" by STAR::BRANDENBERG (Si vis pacem para bellum) Fri Jun 02 1989 15:38  10 lines
    
    This limitation is understood by both DEC and MIT.  I don't know if
    they're planning to change the behaviour of the error handlers but
    rws' response to such comments has been that the programmer could
    setjmp/longjmp out of such an I/O error handler and *never* refer to
    the connection again.  How well this will work in a full toolkit
    environment I can't say.
    
    					monty
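
A rough sketch of the setjmp/longjmp escape described above, for illustration only. It does not call Xlib: `Conn`, `talk_to_server` and `service` are hypothetical stand-ins, and `io_error_handler` only plays the role of the routine a real client would register with XSetIOErrorHandler. Whether the toolkit survives being unwound this way is exactly the open question.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for one server connection (a real client holds a Display *). */
typedef struct {
    const char *name;
    int dead;               /* set once the connection must never be touched again */
} Conn;

static jmp_buf io_recover;  /* recovery point the handler jumps back to */
static Conn *conn_in_use;   /* connection being serviced when the error hit */

/* Plays the role of the I/O error handler: it must not return, so it jumps. */
static void io_error_handler(void)
{
    longjmp(io_recover, 1);
}

/* Stand-in for a library call that discovers the link is gone. */
static void talk_to_server(Conn *c, int link_is_up)
{
    (void)c;
    if (!link_is_up)
        io_error_handler();         /* never returns */
}

/* Service one connection; returns 1 on success, 0 if it is (or was) lost. */
int service(Conn *c, int link_is_up)
{
    assert(c != NULL);
    if (c->dead)
        return 0;                   /* never refer to a lost connection again */
    conn_in_use = c;
    if (setjmp(io_recover) != 0) {  /* arrived here via longjmp */
        conn_in_use->dead = 1;
        return 0;
    }
    talk_to_server(c, link_is_up);
    return 1;
}
```

With two `Conn` objects, losing one marks only that one dead; the other can still be serviced, which is the behaviour the multi-display client wants.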

876.2. "?" by ULTRA::WRAY (John Wray, Secure Systems Development) Fri Jun 02 1989 16:14  2 lines
    What does "SETJMP/LONGJMP" mean?

876.3. by TLE::REAGAN (Pascal, A kinder and gentler language) Fri Jun 02 1989 16:32  6 lines
    It's a C-ism.  Setjmp/longjmp are C's cousin to Pascal's non-local
    GOTO (The C RTL uses a "clone" of the PASRTL's non-local GOTO code
    to do the work...).
    
    				-John

876.4. by STAR::BRANDENBERG (Si vis pacem para bellum) Fri Jun 02 1989 16:34  4 lines
    
    Sorry, I'm operating in C-mode.  C RTL routines which perform stack
    unwinds.  On VMS, implemented with signals and condition handlers.
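
For readers who have not met these routines, here is a minimal sketch of the stack unwind in portable C. The names `descend`, `run_and_unwind` and `frames_returned_normally` are made up for the example; it says nothing about how the VMS C RTL implements the unwind internally.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf unwind_point;   /* saved by setjmp, consumed by longjmp */
static int carried_code;       /* value handed back across the unwind */
static int normal_returns;     /* frames that returned the ordinary way */

/* Recurse to the requested depth, then jump straight back to the setjmp site. */
static void descend(int depth, int code)
{
    if (depth == 0) {
        carried_code = code;
        longjmp(unwind_point, 1);   /* abandons every frame above the setjmp */
    }
    descend(depth - 1, code);
    normal_returns++;               /* skipped: the frames are unwound, not returned from */
}

/* Returns the code carried back by the non-local jump. */
int run_and_unwind(int depth, int code)
{
    assert(depth >= 0);
    normal_returns = 0;
    if (setjmp(unwind_point) == 0) {    /* first, direct return from setjmp */
        descend(depth, code);
        return -1;                      /* not reached */
    }
    return carried_code;                /* second return, caused by longjmp */
}

int frames_returned_normally(void)
{
    return normal_returns;
}
```

The point to notice is that setjmp returns twice: once when called, and again when longjmp fires, with none of the intermediate frames getting a chance to run their remaining code.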

876.5. "More Unixism/OS than C-ism..." by FUEL::graham (If people lead, the leaders will follow) Fri Jun 02 1989 19:35  16 lines
Probably, the real question is...

how does one design a *portable* error or signal handling mechanism
without breaking the disparate language models on various operating
systems or environments?

One of the strongest selling points of X has been its portability....
however, this area of difficulty looks like a mean task for the folks
at MIT.

Check out note 691.* for a discussion of setjmps, longjmps and error/signal
handling under X11.

Kris...

876.6. by ULTRA::WRAY (John Wray, Secure Systems Development) Fri Jun 02 1989 21:55  13 lines
    Note 691 seems to imply that the problem is difficult, but also seems
    to say that the X consortium is dragging its feet on the issue.  Is
    this a correct analysis, or is there active work going on to solve the
    problem?  Meanwhile, what do we tell customers who are investigating a
    distributed DECwindows-based process-control solution?
    
    The previous reply indicated that part of the problem is due to some
    deficiency in the UNIX signalling mechanism.  Is this correct, or is it
    simply that the toolkit doesn't cope with a stack-unwind properly
    (which would be a SMOP to fix, surely - a re-entrant toolkit that could
    cope with being unwound would be fully upwards compatible with the MIT
    version)?

876.7. "Other problems and some ramblings..." by FUEL::graham (If people lead, the leaders will follow) Sat Jun 03 1989 03:18  93 lines
This topic should attract a lot of interest...especially as it relates to
mission critical applications requiring fast and unbuffered X responses to
system and user errors.

I have been asked this question a few times by people developing critical
applications for Wall Street traders... who put several checks and balances
in their callback routines. (Required to trade volatile financial instruments)

The asynchronous nature of X events is actually part of the problem..although
synchronization can be forced in debugging mode.

An excerpt from the X Window System: C Library and Protocol Reference by
Scheifler, Gettys and Newman follows:

"Because Xlib usually does not transmit requests to the server immediately
(that is, it buffers them), errors can be reported much later than they
actually occur..."

However, they go on to advise those users with critical needs for custom error
handlers...

"When Xlib detects an error, it calls an error handler, which your program
can provide..." 
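
To make the two excerpts concrete, here is a toy model of a buffered request queue whose errors only surface at flush time. It is a sketch under my own assumptions, not Xlib code: `Client`, `queue_request`, `flush_requests`, `record_lag` and `last_lag` are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 16

/* Handler is told which request failed and how far the client has got since. */
typedef void (*ErrorHandler)(int failed_seq, int current_seq);

typedef struct {
    int valid[QUEUE_MAX];   /* 1 = request the server accepts, 0 = one it rejects */
    int seq[QUEUE_MAX];     /* sequence number assigned to each queued request */
    int count;              /* requests currently buffered */
    int next_seq;           /* sequence number for the next request */
    ErrorHandler handler;   /* application-supplied error handler, may be NULL */
} Client;

int last_lag = -1;          /* requests issued after the failing one; -1 = no error yet */

/* Sample handler: just records how late the error was reported. */
void record_lag(int failed_seq, int current_seq)
{
    last_lag = current_seq - failed_seq;
}

/* Buffer a request; nothing is checked yet. Returns its sequence number, or -1 if full. */
int queue_request(Client *c, int valid)
{
    assert(c != NULL);
    if (c->count == QUEUE_MAX)
        return -1;
    c->valid[c->count] = valid;
    c->seq[c->count] = c->next_seq;
    c->count++;
    return c->next_seq++;
}

/* Send everything buffered; errors are only discovered here. Returns the error count. */
int flush_requests(Client *c)
{
    int i, errors = 0;
    assert(c != NULL);
    for (i = 0; i < c->count; i++) {
        if (!c->valid[i]) {
            errors++;
            if (c->handler != NULL)
                c->handler(c->seq[i], c->next_seq - 1);
        }
    }
    c->count = 0;
    return errors;
}
```

The handler learns about the bad request only after later requests have already been queued behind it, which is why forcing synchronization helps in debugging.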

BUT, they NEVER tell you how to recover from errors with your own routines.
That must have been the smartest thing to do at the time if you remember
what the goals of X were...

 -  The X Protocol, as repository for X data structures needed to communicate
    (send/receive) between X clients and the server, had to guarantee that
    the X interface operate correctly, regardless of operating system,
    network transports, and programming languages.

Securing the above goals is nothing trivial ..especially when you think of
how one achieves portability for error handling routines for different 
operating systems and machine architectures without compromising the X mission
goals.

Also, it must be remembered that a lot of work was done initially using UNIX 
and C.  And, C is a by-product of UNIX - if one is to trace the history of
UNIX correctly (remember the 'B' language?).

Technically, the LONGJMP and SETJMP algorithms were conceived during the
design of UNIX.  These routines are used for creating program objects/processes
and restoring stack location (push and pops) during kernel context switches.
It is easy to confuse setjmps and longjmps as they relate to system context
switching and their use during user program development.  The confusion is
not very serious...especially when one is dealing with X as a networked
system, where numerous interrupts (via signals) are generated. One can think 
of a high priority Xlib program (call) that *cannot* be interrupted at any
arbitrary point because that program was in the middle of updating a very
complicated data structure.  In this instance, the way to force critical
errors (caused elsewhere) to be generated, would be to set flags in the interrupt
routine.  In this instance, setjmps and longjmps can never save your ass!
I am beginning to think that this problem is bigger than most of us would
presume.  
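
A small sketch of the flag-setting approach, in standard C. The `Queue` structure and `update_queue` are hypothetical, and `raise(SIGINT)` merely simulates an interrupt arriving in the middle of the update; a real program would be interrupted asynchronously.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* The only thing the handler touches: a flag that is safe to set from a signal. */
static volatile sig_atomic_t error_pending = 0;

static void on_signal(int signo)
{
    (void)signo;
    error_pending = 1;      /* note the problem, do nothing else here */
}

/* Hypothetical structure whose two fields must always be updated together. */
typedef struct {
    int head;
    int tail;
} Queue;

/* Returns 1 if an error was noticed at the safe point, 0 otherwise. */
int update_queue(Queue *q, int interrupt_midway)
{
    assert(q != NULL);
    signal(SIGINT, on_signal);

    q->head++;
    if (interrupt_midway)
        raise(SIGINT);      /* simulated interrupt in the middle of the update */
    q->tail++;              /* the update still completes */

    if (error_pending) {    /* safe point: the structure is consistent again */
        error_pending = 0;
        return 1;
    }
    return 0;
}
```

The structure is never left half-updated, at the price of reacting to the error a little later than a longjmp would.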

RE: .6

>The previous reply indicated that part of the problem is due to some
>deficiency in the UNIX signalling mechanism.  Is this correct, or is it
>simply that the toolkit doesn't cope with a stack-unwind properly

[I am not an X designer so, the ideas in here are my own...I do not claim
to represent the DECwindows designers or the people at MIT.
Maybe, they will see this note and provide better comments than mine.]

I believe the problem is a combination of several unresolved problems.  The fact
that the UNIX model was the bias of the MIT designers (for a good reason)
produced a lot of X features inherent in UNIX.  The UNIX style of handling
errors and signals is a good example. 

So, how do you design a toolkit that is re-entrant - without breaking the
fundamental goals of X as a heterogeneous platform with common user code? 
Tough question!

Error handling comes in different flavors (at least, in UNIX).  Sometimes,
there is a need to mix and match error handling with signal handling..for
instance...a test to catch floating point errors...such as when a floating
point number overflows.

The same can be said for the use of networked X applications.  How do we
apply clean signals that determine or pinpoint the exact location of LAN
failures without combining signals and error handlers with application
re-entrancy (at the toolkit level)?

Hopefully, future X extensions will come out with pragmatic ways to deal
with most of these problems.  Some of us see the problems...just that we cannot
prescribe any solutions just yet :-)

Kris...

876.8. by VWSENG::KLEINSORGE (Toys 'R' Us) Sat Jun 03 1989 16:40  21 lines
    
    Aren't you trying to solve too much?  At its simplest level, all that
    is needed is a way to tell Xlib that the error isn't fatal and that it
    should convert the error into a non-fatal error.
    
    There are two error handler routines, XSetErrorHandler and
    XSetIOErrorHandler.  The basic difference is that in one case the
    result of the return is to die, and the other is to dismiss the error
    and continue.  Well the choice of fatal/non-fatal is pretty arbitrary
    in "application" terms.  Note that HOW this is done can be as O/S specific
    as you'd like.  If the user were able to return a status (i.e. a
    "return (1);" which means "hey, forget it, we'll handle the problem",
    or "return (0);" which might mean "die an ugly death"), then it's up to
    the application to worry about the implications of the error.  What the
    application does with the error is then the application's business... very
    X'ish huh?
    
    Personally, I'd like it if error handlers could be set for each
    display (and even perhaps window) as opposed to application wide...
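
Roughly what I have in mind, as a sketch. None of this exists in Xlib: `Conn`, `IoHandler`, `IO_DIE`, `IO_HANDLED`, `report_io_error` and the two sample handlers are all hypothetical names for a per-display handler that returns a status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes a handler hands back to the library. */
enum { IO_DIE = 0, IO_HANDLED = 1 };

typedef struct Conn Conn;
typedef int (*IoHandler)(Conn *);

/* Hypothetical per-display state, each with its own handler. */
struct Conn {
    const char *name;
    int open;
    IoHandler on_io_error;  /* NULL means no handler was installed */
};

/* "Hey, forget it, we'll handle the problem": drop this display, keep running. */
int keep_running(Conn *c)
{
    c->open = 0;
    return IO_HANDLED;
}

/* "Die an ugly death." */
int give_up(Conn *c)
{
    (void)c;
    return IO_DIE;
}

/* Library side: returns 1 if the process should continue, 0 if it should exit. */
int report_io_error(Conn *c)
{
    assert(c != NULL);
    if (c->on_io_error == NULL)
        return 0;           /* no handler: keep the existing fatal behaviour */
    return c->on_io_error(c) == IO_HANDLED;
}
```

The fatal/non-fatal decision moves into the application, one display at a time, and a client that installs nothing sees no change.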
    

876.9. by CASEE::LACROIX (Gone with the wind) Sun Jun 04 1989 15:48  10 lines
> how does one design a *portable* error or signal handling mechanism
> without breaking the disparate language models on various operating
> systems or environments?

    I like signal handling mechanisms similar to the one described in a
    recent SRC report; it seems to work fine on various operating systems,
    but doesn't really fit all language models.

    Denis.

876.10. by ULTRA::WRAY (John Wray, Secure Systems Development) Sun Jun 04 1989 20:10  17 lines
>             <<< Note 876.8 by VWSENG::KLEINSORGE "Toys 'R' Us" >>>
>
>    Aren't you trying to solve too much?  At its simplest level, all that
>    is needed is a way to tell Xlib that the error isn't fatal and that it
>    should convert the error into a non-fatal error.
    
    In the case of a "connection aborted" error, you also want a way to
    tell Xlib to do some stuff of its own - ie clean up any structures that
    had anything to do with the connection.  This sort of clean-up has to
    be provided by Xlib - the application can't do it.
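
As a sketch of the kind of library-side clean-up I mean (not real Xlib internals; `Conn`, `Resource`, `conn_add_resource` and `conn_discard` are hypothetical names): the library frees everything it had hung off the dead connection, without trying to send anything to the server.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical client-side record kept for something created on a connection. */
typedef struct Resource {
    struct Resource *next;
    int id;
} Resource;

/* Hypothetical per-connection state owned by the library. */
typedef struct {
    Resource *resources;    /* linked list of client-side records */
    int nresources;
    int closed;             /* set once the connection has been discarded */
} Conn;

/* Track a new client-side record; returns 1 on success, 0 if out of memory. */
int conn_add_resource(Conn *c, int id)
{
    Resource *r;
    assert(c != NULL);
    r = malloc(sizeof *r);
    if (r == NULL)
        return 0;
    r->id = id;
    r->next = c->resources;
    c->resources = r;
    c->nresources++;
    return 1;
}

/* Clean up after a lost link: frees memory only, talks to nobody.
 * Returns the number of records freed; safe to call more than once. */
int conn_discard(Conn *c)
{
    int freed = 0;
    assert(c != NULL);
    while (c->resources != NULL) {
        Resource *next = c->resources->next;
        free(c->resources);
        c->resources = next;
        freed++;
    }
    c->nresources = 0;
    c->closed = 1;
    return freed;
}
```

Only the library knows what it allocated on behalf of the connection, which is why the application cannot do this part itself.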
    
    What does the protocol allow to happen if the client detects that the
    connection has gone away like this, but the server hasn't noticed yet?
    If the client application were allowed to continue, and tried to
    establish a new connection to the server, would the server get
    confused?

876.11. by VWSENG::KLEINSORGE (Toys 'R' Us) Mon Jun 05 1989 09:44  7 lines
    
    I'd be happy to keep stale data structures wasting memory rather than
    be forced to have the image terminated... though I wouldn't be unhappy
    if there was also a way to get Xlib to clean up...