Conference clt::cma

Title: DECthreads Conference
Moderator: PTHRED::MARYSTEON
Created: Mon May 14 1990
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1553
Total number of notes: 9541

1531.0. "why does a UNIX main need -threads for threaded .so?" by EDSCLU::GARROD (IBM Interconnect Engineering) Fri Apr 25 1997 14:16

    
    I just read the following in note 1530.1.
    
>Note 1530.1    NSR crashes after DCE installation on UNIX V3.2D-2         1 of 3
>DCETHD::BUTENHOF "Dave Butenhof, DECthreads"         32 lines  25-APR-1997 08:53
>...
>It is a requirement that, to use libpthreads within a process, the MAIN
>PROGRAM must have been linked with a direct and explicit dependency on the
>threads libraries. You cannot link a nonthreaded main program and have it run
>with a threaded library.

    The above applies to Digital UNIX prior to V4.0. Since we distribute a
    .so module that heavily uses pthreads, I'd like to understand the
    implications of a customer's program linking against our .so but having
    no reference to libpthreads and not compiling with -threads. So my
    questions are:
    
    1, Exactly what happens?
    2, Exactly what doesn't work?
    3, Doesn't the .so module automatically map in libpthreads.so and its
       dependencies?
    4, Why does the main program cc command need a -threads if it doesn't
       reference anything thread like?
    5, How in general is a main program meant to know that some .so it uses
       has threads calls in it?
    
    Thanks for illumination on this subject.
    
    Dave

1531.1. by DCETHD::BUTENHOF (Dave Butenhof, DECthreads) Fri Apr 25 1997 14:53 (65 lines)

First, remember that we're talking about Digital UNIX (and DEC OSF/1) prior
to Digital UNIX 4.0...

>    1, Exactly what happens?
>    2, Exactly what doesn't work?
>    3, Doesn't the .so module automatically map in libpthreads.so and its
>       dependencies?

I'll take these together, rather than out of order, since the real answer
comes out of question 3.

Yes, .so files store dependencies, and all libraries on which they're
dependent will be mapped when they're loaded. So if the standard "ls",
completely thread-unaware, ends up dlopen-ing your threaded library, the
loader says "ah ha!" and maps in libpthreads.so, libmach.so, and libc_r.so.

But it maps them AFTER all the libraries that are already loaded, libc
included.

Because libc was completely thread-unaware, we had libc_r, which preempts a
bunch of libc entry points to make them thread-safe. For bizarre historical
reasons I couldn't possibly justify since I argued strongly against them,
some thread-safe functions (notably fork() and malloc()/free()/etc.) were not
even in libc_r, but in libpthreads. Because all of these libraries are mapped
"at the end" of the dependency list, they are unable to preempt all of those
unsafe libc entry points. The result is a program that simply isn't
thread-safe, which means that just about anything can happen.

And changing the loader so that it DID preempt the libc entry points wouldn't
be any better. There's already existing static state in libc, and libc_r is
not a clean and perfect replacement (it's more of a blast shield to keep the
worst of the impact of threads away from poor defenseless libc). You'd have
allocated memory (from malloc) that now couldn't be freed, stdio streams that
don't exist, or have different state, and so forth.

Bummer.

>    4, Why does the main program cc command need a -threads if it doesn't
>       reference anything thread like?

In case it's not already obvious, having the main program built with -threads
ensures that the libraries are pulled in IN THE RIGHT ORDER, regardless of
any shared library dependencies.
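
The ordering effect can be illustrated with a toy model of
"first definition in load order wins" symbol resolution. This is only a
sketch: the struct, the function names, and the exact library order are
hypothetical, and fopen stands in for whichever entry points libc_r really
preempts. The only point taken from the discussion is that the thread
libraries either precede libc (main built with -threads) or follow it.

```c
#include <string.h>

/* Toy model only: hypothetical names, not the real loader's data structures. */
struct toy_lib {
    const char *name;
    const char *symbols[4];   /* NULL-terminated list of exported symbols */
};

/* Return the name of the first library in load order that defines sym,
   or NULL if no library in the list defines it. */
const char *toy_resolve(const struct toy_lib *order, size_t n, const char *sym) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < 4 && order[i].symbols[j] != NULL; j++)
            if (strcmp(order[i].symbols[j], sym) == 0)
                return order[i].name;
    return NULL;
}

/* Main program built without -threads: the thread libraries arrive late,
   as dependencies of some .so, and land after libc. */
const struct toy_lib toy_unthreaded_order[] = {
    { "libc",        { "malloc", "fork", "fopen", NULL } },
    { "libpthreads", { "malloc", "fork", NULL } },
    { "libc_r",      { "fopen", NULL } },
};

/* Main program built with -threads: the thread libraries precede libc
   (the relative order among them here is an assumption). */
const struct toy_lib toy_threaded_order[] = {
    { "libpthreads", { "malloc", "fork", NULL } },
    { "libc_r",      { "fopen", NULL } },
    { "libc",        { "malloc", "fork", "fopen", NULL } },
};
```

In this model, toy_resolve(toy_unthreaded_order, 3, "malloc") yields "libc",
while the same lookup against toy_threaded_order yields "libpthreads", which
is the difference that building the main program with -threads makes.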

>    5, How in general is a main program meant to know that some .so it uses
>       has threads calls in it?

You just gotta know.

Hey, at least we've fixed most of the problem in 4.0, except that exceptions
don't quite work if the main program wasn't built with -lexc (a problem that
would be VERY easy for the development environment people to solve, since
libexc preempts only one libc symbol). And of course code built without
_REENTRANT would use the wrong errno -- which may or may not be a problem
(and we'd like to get this solved with the compiler folks, but we haven't
worked out a good method yet).
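
The errno remark can be sketched with a small model. It assumes, as is
common practice but not stated above, that compiling with _REENTRANT turns
errno into a macro that looks up a per-thread cell, while code compiled
without it reads a single process-wide variable. All names are hypothetical,
and "threads" are simulated with an index rather than real threads.

```c
/* Toy model only: hypothetical names, not Digital UNIX internals. */
enum { TOY_MAX_THREADS = 4 };

int toy_global_errno = 0;                      /* read by non-_REENTRANT code */
static int toy_thread_errno[TOY_MAX_THREADS];  /* one cell per simulated thread */
static int toy_current_thread = 0;             /* stand-in for "which thread am I" */

void toy_switch_thread(int id) { toy_current_thread = id; }

/* What a reentrant errno macro would expand to: a per-thread lookup. */
int *toy_errno_location(void) { return &toy_thread_errno[toy_current_thread]; }
#define TOY_ERRNO_REENTRANT (*toy_errno_location())

/* A thread-safe library routine reports failure through the per-thread cell. */
int toy_failing_call(int code) {
    TOY_ERRNO_REENTRANT = code;
    return -1;
}

/* Caller built without _REENTRANT: looks at the process-wide variable. */
int toy_read_errno_nonreentrant(void) { return toy_global_errno; }

/* Caller built with _REENTRANT: looks at its own thread's cell. */
int toy_read_errno_reentrant(void) { return TOY_ERRNO_REENTRANT; }
```

After toy_failing_call(13) on simulated thread 1, the reentrant reader sees
13 but the non-reentrant reader still sees 0: it is looking at the wrong
errno, which may or may not matter depending on whether the caller checks it.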

Solaris, by the way, has this problem in spades, with no solution in sight
(and no sign that anyone even wants to solve it). They didn't think of
something like TIS, so they made libc thread-safe by making direct calls to
thread synchronization functions. They made it work without the thread
library by putting stubs for all of these functions into libc. So threads
simply don't work, at all, even potentially, if you load the thread library
after libc. Period, end of game.
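
The contrast with a TIS-style design can be sketched as follows. This
assumes (the note does not describe the mechanism) that libc calls its
synchronization routines through a table that the thread library fills in
whenever it is loaded, so load order stops mattering. The names are
hypothetical and are not the real TIS interface; the "mutex" is a plain int
so the model runs without any thread library.

```c
/* Toy model only: hypothetical names, not the real TIS interface. */
typedef struct {
    int (*lock)(int *m);
    int (*unlock)(int *m);
} toy_sync_ops;

/* Defaults used while no thread library is present: do nothing. */
static int toy_noop_lock(int *m)   { (void)m; return 0; }
static int toy_noop_unlock(int *m) { (void)m; return 0; }

/* "Real" routines: mark the mutex held, fail if it is already held. */
static int toy_real_lock(int *m)   { if (*m) return -1; *m = 1; return 0; }
static int toy_real_unlock(int *m) { *m = 0; return 0; }

static toy_sync_ops toy_ops = { toy_noop_lock, toy_noop_unlock };

/* Run by the thread library's initialization, however late it is loaded. */
void toy_threads_loaded(void) {
    toy_ops.lock = toy_real_lock;
    toy_ops.unlock = toy_real_unlock;
}

/* libc-side entry points: always indirect through the table, so they
   pick up the real routines as soon as the table is updated. */
int toy_libc_lock(int *m)   { return toy_ops.lock(m); }
int toy_libc_unlock(int *m) { return toy_ops.unlock(m); }
```

With stubs bound directly into libc, as described for Solaris, there is no
table to update: a call that was resolved to the do-nothing stub stays bound
to it when the thread library shows up later.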

	/dave

1531.2. "How can you be sure a customer has done it right?" by EDSCLU::GARROD (IBM Interconnect Engineering) Fri Apr 25 1997 16:15 (30 lines)

    Re .1                                                            
    
    Thanks for your comprehensive explanation, it's very helpful.
    
    One additional question though. As the maintainer of a .so that uses
    threads, I'd like to be sure that any customer program has been
    compiled with cc -threads, or has explicitly linked libpthreads and the
    rest. Asking the customer how their program was compiled and linked is
    an extremely unreliable solution (learned from many years of dealing
    with customers). "Never believe the data supplied with a problem if it
    comes from a human and is not backed up by computer output" has
    unfortunately become a hard-learned lesson. "Yes of course we have the
    XXX patch installed, Joe installed it last week." Many hours later you
    learn that Joe did indeed install it, but on a different machine. But I
    digress.
    
    So is there any easy way to tell from a core dump or from the behaviour
    of a customer program that it was linked properly?
    
    I presume that not linking it properly leads to flaky problems like
    corrupted malloc's etc from the use of non thread safe functions from
    libc. Is that a correct presumption?
    
    As an aside, it seems to me the Solaris approach is preferable to the
    pre V4.0 Digital approach because the failure mode is HARD rather than
    flaky failures that burn up large amounts of support time to find.
    
    I wonder how many programs out there are using threads only through
    .so modules like the above, and so are actually making calls to non
    thread safe functions.

1531.3. "Thank Satan for symbol preemption..." by WTFN::SCALES (Despair is appropriate and inevitable.) Fri Apr 25 1997 17:25 (23 lines)

.2> So is there any easy way to tell from a core dump or from the behaviour
.2> of a customer program that it was linked properly?

You might try asking the debugger where the fork() function (or any of several
others) is:  if it shows up in libc then things are bad; if it shows up in
libpthreads things are, well, worse?  ;-)

.2> I presume that not linking it properly leads to flaky problems like
.2> corrupted malloc's etc from the use of non thread safe functions from
.2> libc.

Well, it leaves the application open to reentrancy problems, which typically
result in corruption-type problems.

.2> it seems to me the Solaris approach is preferable to the
.2> pre V4.0 Digital approach because the failure mode is HARD rather than
.2> flaky failures that burn up large amount of support time to find.

No, I think the net result on Solaris is the same -- you've got a threaded
process using unsafe functions.


				Webb