Conference turris::digital_unix

Title:	DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:	Welcome to the Digital UNIX Conference
Moderator:	SMURF::DENHAM

Created:	Thu Mar 16 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	10068
Total number of notes:	35879

8792.0. "pthreads and longjmp" by RHETT::HALETKY () Tue Feb 11 1997 12:58

    Hello,
    
    We have a customer using pthreads on DU 4.0b who wishes to use the
    longjmp command. However, it appears that this causes a bugcheck in
    threads. Are there any caveats when using longjmp?
    
    Perhaps he is finding the stack pointer incorrectly? Any suggestions?
    
    Here is the code:
    #include <pthread.h>
    #include <stdio.h>
    #include <setjmp.h>
    
    
    jmp_buf x;
    pthread_t thread1, thread2;
    
    void do_thread1(int i)
    {
            printf("thread 1 started\n");
    }
    
    void do_thread2(int i)
    {
            printf("thread 2 started\n");
    }
    
    foo()
    {
            char *s;
    
            printf("inside foo; stack near %x\n", &s);
            pthread_create(&thread2, NULL, (void *) do_thread2, (void *) &s);
    }
    
    main()
    {
            int i;
            char *s = (char *) malloc(32760);
            char *stackptr = (s +32760- 40);
    
            if ((long) stackptr & 0xf)
                    stackptr -= (long)stackptr % 16;
    
            for (i=0; i<32760; i++)
                    s[i] = 0;
    
            printf("new stack base = %x, top = %x\n", stackptr, s);
    
            pthread_create(&thread1, NULL, (void *) do_thread1, (void *) &i);
    
            if (_setjmp(x) == 0)
            {
                    printf("old stack = %x\n", x[34]);
                    x[34] = (long) stackptr;
            }
    
            for (i=0; i<32760; i++)
                    s[i] = 0;
    
            printf("new stack base = %x, top = %x\n", stackptr, s);
    
            pthread_create(&thread1, NULL, (void *) do_thread1, (void *) &i);
    
            if (_setjmp(x) == 0)
            {
                    printf("old stack = %x\n", x[34]);
                    x[34] = (long) stackptr;
                    _longjmp(x,1);
            }
            else
            {
                    foo();
                    pthread_join(thread1, NULL);
                    pthread_join(thread2, NULL);
            }
    }

T.R	Title	User	Personal Name	Date	Lines
8792.1	Well, that's pretty funny. I think....	WTFN::SCALES	Despair is appropriate and inevitable.	`Tue Feb 11 1997 18:09`	30
	.0> We have a customer using pthreads on DU 4.0b who wishes to use the
	.0> longjmp command.

	Actually it looks like he wishes to ABuse the longjmp() function. I don't know whether what he's trying to do would be considered "supported" or not, but longjmp() certainly was not INTENDED to be used in this way.

	The customer is using longjmp() to switch the calling thread to a new stack. This is sheer hackery, and, as far as I can guess, this is a violation of the Alpha calling standard.

	I suspect that the DECthreads bugcheck is arising from the threads library trying to check for a possible stack overflow and finding the SP is completely out of bounds for the current thread. (This is a WAG, since you didn't post any of the bugcheck information, such as the "reason" string.)

	Also, the test program never restores the stack pointer to the original value, so when the main() function returns the registers will be filled with garbage, and that could cause all manner of "interesting" effects.

	Why on earth is the customer trying to switch to a private stack?? What's wrong with the original stack, or why can't the customer simply create a whole separate thread instead of just a separate stack?

	There are a number of things currently and in the future which care a great deal that the stack be properly managed. Stuffing addresses of arbitrary chunks of memory into the stack pointer is not going to work very well, and it will work even less well in the future.

	Webb
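For contrast with the stack-switching trick, here is a minimal sketch of what longjmp() is generally understood to be for: a non-local exit back to a frame that is still active on the same stack. The names `recover_point`, `parse_item` and `parse_all` are hypothetical and do not come from the customer's program.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf recover_point;   /* hypothetical name, for illustration only */

/* Somewhere down the call chain: give up and unwind to the caller. */
static void parse_item(int value)
{
    if (value < 0)
        longjmp(recover_point, 1);   /* the setjmp() frame is still active */
    printf("parsed %d\n", value);
}

/* Returns 0 if every item parsed, 1 if longjmp() unwound the stack. */
int parse_all(const int *items, int count)
{
    int i;

    if (setjmp(recover_point) != 0)
        return 1;                    /* arrived here via longjmp() */

    for (i = 0; i < count; i++)
        parse_item(items[i]);
    return 0;
}
```

The jump only ever moves up the stack it started on, and nothing inside the jmp_buf is inspected or patched, which is the difference from the program in .0.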
8792.2	Workaround? Possible solutions?	RHETT::HALETKY		`Wed Feb 12 1997 10:26`	12
	Hello,

	Can you suggest an alternative? Perhaps more than one, as you already mentioned creating a new thread.

	They claim the code works on 3.2g and /all/ other operating systems. The company is Sybase, so I would presume it does work on other platforms.

	As to /why/ they are doing this, I don't know. But if a valid SP is needed, which they supposedly capture, what solutions are available?

	-ed Haletky
8792.3	What are they trying to do??	WTFN::SCALES	Despair is appropriate and inevitable.	`Wed Feb 12 1997 14:01`	13
	.2> As to /why/ they are doing this, I don't know.

	Well, without knowing why they are doing this, I can't really suggest any alternative. (My suggestion of creating a thread was based on a presumption of what they might be trying to accomplish.)

	.2> They claim the code works on 3.2g and /all/ other operating systems.

	I guess they've been lucky...

	Webb
8792.4	More info from customer. Suggestions?	RHETT::HALETKY		`Fri Feb 14 1997 15:58`	70
	Customer's info and reasoning:

	I don't think it will work this way. Sybase's Open Server product implements its own user mode multithreading model. In the description that follows I will refer to these user mode threads as Sybase threads. Digital UNIX's threads will be referred to as pthreads.

	Description of the Sybase thread implementation:

	In the Sybase thread model we have our own thread scheduler that runs as part of the user's process. The UNIX kernel is unaware of this. A Sybase thread is created by allocating a Sybase thread control structure and allocating a stack for the thread (by default this is done using malloc). The thread control structure contains a jmp_buf structure and we initialise this with a call to setjmp. Then we patch the new stack base into the jmp_buf. The entry point for the new Sybase thread is a function pointer and this also gets patched into the jmp_buf. This Sybase thread control structure is then added to a run queue in the Sybase scheduler.

	The Sybase scheduler selects a Sybase thread to run, saves the Sybase scheduler's own context (using a call to setjmp) and does the context switch by calling longjmp with the jmp_buf of the selected thread. This causes the previously allocated stack to be loaded and control jumps to the entry point patched into the jmp_buf earlier. Note that the Sybase scheduler has its own Sybase thread context which is set up during startup of the Sybase Open Server. At this point the process is executing the code of the new thread with the stack residing in the memory previously obtained through malloc.

	The Sybase thread model is non-preemptive, so a Sybase thread has to yield control back to the Sybase scheduler either directly or implicitly via an Open Server api call. The api call does this by first saving the yielding Sybase thread's context in the jmp_buf for the current thread using setjmp. Then it does a longjmp using the jmp_buf of the Sybase scheduler, thus resuming the scheduler context.

	Finally, when the Open Server is shut down it restores the original process context from that saved at the start, so main() will have its correct stack. The sample provided to DEC emulates the Sybase scheduler's context switching.

	Sybase's Open Server uses the Sybase transport control library (netlib) to make network IO requests. In Sybase System 11 netlib comes in two versions; the first is basically the same as used in System 10. The alternate version creates multiple pthreads to service the requests. When netlib is initialised it creates a fixed number of pthread worker threads, but it can also create additional pthreads as the number of IO requests increases. When a Sybase thread makes an IO request it gets put on the request queue of one of the worker threads. However, if there are no free request queues, netlib creates a new request queue and a new pthread to service it. The pthread_create call to create the new pthread is made in the context of the Sybase thread making the netlib IO request. This causes the problem we are seeing.

	Is there any way we can turn off the stack overflow checking in the pthread library?
8792.5	Customer should use a supported mechanism.	WTFN::SCALES	Despair is appropriate and inevitable.	`Mon Feb 17 1997 16:35`	16
	.4> Is there any way we can turn off the stack overflow checking in the
	.4> pthread library?

	No.

	On the other hand, if the customer were to use a supported mechanism for "user level context switching" we could probably make everything work together [see makecontext(2) and swapcontext(2), although the man pages are almost worse than useless :-( ].

	Note, I'm not saying that using these functions instead of setjmp()/longjmp() WILL solve the problem. However, before we _can_ solve the problem, the customer must be using a supported mechanism.

	Webb
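Since the man pages are described as unhelpful, here is a minimal sketch of the makecontext()/swapcontext() pattern, assuming a system that provides the standard `<ucontext.h>` interfaces. The stack size and the names `scheduler_ctx`, `task_ctx`, `task_entry` and `run_task_twice` are assumptions made for illustration; this is not Sybase's scheduler and says nothing about how DECthreads treats these calls.

```c
#include <stdlib.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024)    /* assumed size, chosen for the sketch */

static ucontext_t scheduler_ctx;     /* context of the caller ("scheduler") */
static ucontext_t task_ctx;          /* context of the user-level task */
static int steps_run;                /* counts the slices the task has run */

static void task_entry(void)
{
    steps_run++;                              /* first slice */
    swapcontext(&task_ctx, &scheduler_ctx);   /* yield to the scheduler */
    steps_run++;                              /* second slice, after resume */
    /* returning here continues at uc_link, i.e. scheduler_ctx */
}

/* Runs the task in two slices on a malloc'd stack; returns slices run. */
int run_task_twice(void)
{
    void *stack = malloc(CO_STACK_SIZE);

    if (stack == NULL)
        return -1;
    steps_run = 0;

    if (getcontext(&task_ctx) == -1) {
        free(stack);
        return -1;
    }
    task_ctx.uc_stack.ss_sp = stack;          /* stack is declared, not patched */
    task_ctx.uc_stack.ss_size = CO_STACK_SIZE;
    task_ctx.uc_link = &scheduler_ctx;        /* where to go when task returns */
    makecontext(&task_ctx, task_entry, 0);

    swapcontext(&scheduler_ctx, &task_ctx);   /* run until the task yields */
    swapcontext(&scheduler_ctx, &task_ctx);   /* resume until the task returns */

    free(stack);
    return steps_run;
}
```

The private stack and entry point are handed over through documented fields and arguments, so nothing depends on the layout of a jmp_buf.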
8792.6		DCETHD::BUTENHOF	Dave Butenhof, DECthreads	`Tue Feb 18 1997 08:20`	22
	I'll also point out that using the makecontext/swapcontext solution would (possibly, potentially) work ONLY if they make sure that each "Sybase thread" runs only on one specific POSIX thread, and never on a different POSIX thread.

	The problem with the current setjmp/longjmp hack (and almost certainly with the current implementation of makecontext/swapcontext) is that they're designed to assume that the underlying system knows nothing about threads. But when you try to use them on top of threads, that's no longer true -- the underlying system DOES know about threads. And pulling the rug out from under the thread may be hazardous. It's not a matter of stacks or stack limit checking -- it's much more than that. It's a matter of basic thread identity.

	Personally, I believe that such user-mode context switching should simply be unsupported when using threads. But should the business decision be otherwise, then a substantial amount of development effort will be required to make it work. And it can only be made to work if supported mechanisms are used. Because setjmp and longjmp are "lighter weight" and intended for a lot of things where changing thread identity would not be desirable, it's very unlikely that we could support a "Sybase thread" model based on those primitives.

	/dave
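One way a user-level scheduler might enforce the "one specific POSIX thread" rule is to record the creating thread in the control structure and refuse to resume anywhere else. This is only a sketch: `struct user_thread` and the three functions are hypothetical names, and the actual context switch is left out.

```c
#include <pthread.h>
#include <stdint.h>

struct user_thread {          /* hypothetical control structure */
    pthread_t owner;          /* POSIX thread this user thread is bound to */
    int resumes;              /* number of accepted resume requests */
};

/* Bind the user-level thread to the POSIX thread that creates it. */
void user_thread_init(struct user_thread *ut)
{
    ut->owner = pthread_self();
    ut->resumes = 0;
}

/* Returns 0 if called on the owning POSIX thread, -1 otherwise. */
int user_thread_resume(struct user_thread *ut)
{
    if (!pthread_equal(ut->owner, pthread_self()))
        return -1;            /* wrong POSIX thread: do not switch */
    ut->resumes++;
    /* a real scheduler would perform its context switch here */
    return 0;
}

/* Helper thread body: tries the resume from a different POSIX thread. */
static void *resume_elsewhere(void *arg)
{
    return (void *)(intptr_t)user_thread_resume(arg);
}

/* Returns what user_thread_resume() reported on the helper thread. */
int resume_from_other_thread(struct user_thread *ut)
{
    pthread_t helper;
    void *result;

    if (pthread_create(&helper, NULL, resume_elsewhere, ut) != 0)
        return -2;
    pthread_join(helper, &result);
    return (int)(intptr_t)result;
}
```

A check like this only detects migration; it does not make user-mode context switching on top of threads supported.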
8792.7	More suggestions needed	RHETT::HALETKY		`Tue Mar 04 1997 10:30`	10
	Even so, why would it work on 'every other platform', so sayeth the customer?

	It appears that makecontext/setcontext will not work for the customer.

	Any other suggestions?

	-ed haletky
8792.8		SMURF::DENHAM	Digital UNIX Kernel	`Tue Mar 04 1997 10:55`	5
	The problem report from the customer reached DECthreads last night. They have an engineer working the stack issues involved. I think those issues are pretty well understood at this point, so stay tuned for a test library most likely.
8792.9	The suggestion is the same: use a supported mechanism	WTFN::SCALES	Despair is appropriate and inevitable.	`Tue Mar 04 1997 12:07`	36
	.7> Even so, why would it work on 'every other platform', so sayeth the
	.7> customer?

	This sounds a lot like a ten-year-old saying, "But, mom, all the other kids get to stay up late..." :-)

	The fact that it works on many other platforms means either that those platforms haven't tried or needed to do anything sophisticated with call stacks or procedure invocations or that the customer has simply been lucky. (Just because it appears to execute correctly doesn't mean that it's correct code, that it's supportable or robust, or that it will continue to work on the next version of the operating system or on any other operating system; the only way to ensure correctness is to write the code properly and not depend on undocumented features or implementation details.)

	The standards don't place any requirements on the contents of a jmp_buf. (They say only that it must be an array.) The Digital Unix calling standard points out that on Digital Unix all that really needs to be in the jmp_buf is a frame pointer (NOT a stack pointer) and a PC -- the things necessary for an "unwind". If we had used the minimal implementation, then the customer would have found that their hack couldn't be ported to Digital Unix at all when they made their initial attempt. :-}

	.7> It appears that makecontext/setcontext will not work for the customer.

	Appearances can be deceiving. If the customer were willing to switch and use a supported mechanism in lieu of their longjmp() hack, then I believe it would be trivial to make swapcontext() work for them. (They need to pass a status along with the context switch, so pass it in the thread control structure of the thread being scheduled.)

	This is not a case of "it won't work"; this is a case of "we got a cool demo threads package from some academic site on the Internet, and we used it in our product, and now we want Digital to support it, even though it's a hack..." :-(

	Webb
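The parenthetical about passing a status could look roughly like the sketch below: swapcontext() has no equivalent of longjmp()'s value argument, so the scheduler stores the status in the control structure before switching and the resumed task reads it afterwards. `struct task`, `wake_status` and the function names are hypothetical, and a system providing `<ucontext.h>` is assumed.

```c
#include <stdlib.h>
#include <ucontext.h>

#define TASK_STACK_SIZE (64 * 1024)   /* assumed size, chosen for the sketch */

struct task {                 /* hypothetical thread control structure */
    ucontext_t ctx;
    void *stack;
    int wake_status;          /* stands in for longjmp()'s value argument */
    int seen_status;          /* what the task observed after being resumed */
};

static ucontext_t sched_ctx;  /* the scheduler's own context */
static struct task *current;  /* task currently being run */

/* Task side: give control back to the scheduler. */
static void task_yield(void)
{
    swapcontext(&current->ctx, &sched_ctx);
}

static void task_body(void)
{
    task_yield();                                  /* wait to be scheduled */
    current->seen_status = current->wake_status;   /* status left by scheduler */
}

/* Scheduler side: store the status, then switch to the task. */
static void task_resume(struct task *t, int status)
{
    t->wake_status = status;
    current = t;
    swapcontext(&sched_ctx, &t->ctx);
}

/* Starts a task, resumes it with `status`, returns what the task saw. */
int demo_status_handoff(int status)
{
    struct task t;
    int seen;

    t.stack = malloc(TASK_STACK_SIZE);
    if (t.stack == NULL)
        return -1;
    t.wake_status = 0;
    t.seen_status = -1;
    if (getcontext(&t.ctx) == -1) {
        free(t.stack);
        return -1;
    }
    t.ctx.uc_stack.ss_sp = t.stack;
    t.ctx.uc_stack.ss_size = TASK_STACK_SIZE;
    t.ctx.uc_link = &sched_ctx;        /* return here when task_body ends */
    makecontext(&t.ctx, task_body, 0);

    task_resume(&t, 0);                /* first run: task yields immediately */
    task_resume(&t, status);           /* second run: task reads the status */

    seen = t.seen_status;
    free(t.stack);
    return seen;
}
```

The sketch is single-threaded by design; per the note in .6, each such task would also have to stay on one POSIX thread.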
8792.10	Not acceptable, Try again	RHETT::HALETKY		`Wed Mar 05 1997 14:12`	31
	Setcontext/makecontext will not work as they would be using it to span threads. Setcontext/makecontext will not work because it would add about 3 years to their development cycle.

	This is Sybase, folks. Does Digital want to risk not having their products work on our machines because of some academic reason? I'll agree it's a hack; they agree it's a hack. But it does work with other POSIX threads implementations, and it does not work with ours. The answers we have given them are /not/ acceptable.

	In .8 it says the problem report reached engineering. Excuse me, I did report the problem via this notes conference and got back /It's a hack/. Care to explain what is going on?

	I'm looking for a solution to give to a company who sells $millions and could sell $millions for Digital. The answer 'it's a hack and they are lucky' is not acceptable to me and definitely not to the customer. I'm out of ideas.

	There is /nothing/ in our documentation that says It won't work. Hence customers will think it does work, whether we consider it to be a hack or not.

	In essence, give me a solution that will work for this customer. Makecontext/swapcontext has been rejected as useable by them for a number of reasons. Are there any others?

	Regards,
	Ed Haletky
	Digital CSC (Customer Service Center)
	The ones who bring you bugs and problems via this Notes Conference and IPMTs.
8792.11	let's hear those reasons	VAXCPU::michaud	Jeff Michaud - ObjectBroker	`Wed Mar 05 1997 14:44`	8
	> Makecontext/swapcontext has been rejected
	> as useable by them for a number of reasons.
	     ^^^^^^^

	I assume you meant "unuseable". Care to elaborate on those "number of reasons", or is this just a case of a stubborn customer?
8792.12	If it's not acceptable, they will be having problems again...and again...	WTFN::SCALES	Despair is appropriate and inevitable.	`Wed Mar 05 1997 16:05`	52
	.10> Excuse me, I did report the problem via this notes conference

	Ed, first off, this notes conference is not a problem reporting mechanism. (This is repeated over and over in this and many other notes conferences.) If you want to make a formal problem report, you must either enter a QAR (for problems encountered by internal groups) or open an IPMT case (for problems experienced by specific external customers). Notes conferences are for exchange of information, only.

	.10> There is /nothing/ in our documentation that says It won't work.

	Likewise, there is nothing in our _documentation_ which implies that it -would- work. Furthermore, there is nothing in any of the pertinent standards specifications which implies that this would work, either.

	Just because you try something and it appears to work doesn't make it a good solution. Just because you have access to the internals of a function doesn't mean that the function will never be changed. Relying on undocumented, unspecified, or internal implementation details will not make your code reliable or supportable.

	.10> Does Digital want to risk not having their products work on our machines
	.10> because of some academic reason?

	The reason is a practical one, not an academic one. Even if we smooth over whatever wrinkle Sybase hit this time, there is no guarantee that their code will continue to work on the -next- version of Digital Unix. The customer is using an unsupported mechanism, and there is no way that we can ensure that it will work with each successive version of Digital Unix. Having the customer use a supported mechanism is in their best interest.

	.10> Setcontext/makecontext will not work as they would be using it to span
	.10> threads.

	I'm not exactly sure what this means, but if swapcontext() won't work, then their existing hack is already dead in the water!!

	Beyond that, it strikes me as really unlikely that replacing the existing code which calls longjmp() with code calling swapcontext() could possibly add anything like 3 years to their development cycle. This is straight FUD.

	We understand that this is a big customer. We understand that they mean a lot of business to Digital. However, it will make life better for everyone if they use facilities which we can ensure will work. If all of our major customers dive into our internals and make use of little bits, in the future we will be unable to enhance our software, and we will be quickly left behind in the industry. So, the result is pretty much the same either way: either we find people willing to work with us and follow the rules, or we're on our way out.

	Webb