Title: | DECthreads Conference |
Moderator: | PTHRED::MARYS TE ON |
Created: | Mon May 14 1990 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1553 |
Total number of notes: | 9541 |
1528.0. "Thread missing from "ladebug show thread" ?" by MUFFIT::gerry (Gerry Reilly) Thu Apr 17 1997 11:26
I am trying to help a partner debug a deadlock in their application, and I
want to understand something I currently don't...
This is on Digital UNIX V4.0A with the latest pthread patches applied.
Using ladebug, I see a list of threads (from "show thread") that does
not include thread 7 and shows thread 3 as terminated. Thread 3 I believe
I understand: it has been cancelled. Thread 7, however, I find puzzling,
because 'where thread all' shows a stack for this thread.
My question: should I expect "show thread" to show all the threads? If not,
under what conditions are threads left out?
Any insight greatly appreciated.
-gerry
(ladebug) show thread
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>*  -3 running                    idle              0 null thread for VP 0x1
     1 blocked    timed cond wait throughput       11 default thread
    -1 blocked    kernel          fifo             32 manager thread
    -2 running                    idle              0 null thread for VP 0x0
     2 blocked    timed cond wait throughput       11 <anonymous>
     4 blocked    kernel          throughput       11 <anonymous>
     5 blocked    cond wait       throughput       11 <anonymous>
     6 blocked    kernel          throughput       11 <anonymous>
     8 blocked    mutex wait      throughput       11 <anonymous>
     3 terminated                 throughput       11 <anonymous>
(ladebug) where thread all
Stack trace for thread -3
>0 0x240d8a8c in nxm_idle(0x1, 0x14004630, 0x63cd4b70, 0x140045c8, 0x240beb74, 0x63cd37f0) DebugInformationStrippedFromFile19:???
#1 0x240c6ba0 in vpIdle(0x240beb74, 0x63cd37f0, 0x240b97e4, 0x140045c8, 0x63cceca8, 0x140045c8) DebugInformationStrippedFromFile110:???
#2 0x240b97e0 in UnknownProcedure7FromFile98(0x0, 0x0, 0x240c07d0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile98:???
#3 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 1
#0 0x240c8a78 in /usr/shlib/libpthread.so
#1 0x240b2d20 in /usr/shlib/libpthread.so
#2 0x240b0634 in dspDispatch(0x63ccf720, 0x0, 0x63ccf528, 0x0, 0x63ccf720, 0x11fff640) DebugInformationStrippedFromFile89:???
#3 0x240ab478 in cvTimedWait(0x7fff1910, 0x7fff1908, 0x63ccf750, 0x63ccf528, 0x0, 0x1400bb60) DebugInformationStrippedFromFile1:???
#4 0x240a961c in __pthread_delay_np(0x7ff8cd30, 0x63ccc120, 0x100000, 0x63ccf528, 0x3354edb4, 0x12028cc8) DebugInformationStrippedFromFile1:???
#5 0x24099084 in pthread_delay_np(0x100000, 0x63ccf528, 0x3354edb4, 0x12028cc8, 0x6fe7f868, 0x6ff384a0) DebugInformationStrippedFromFile7:???
#6 0x6fe7f864 in /opt/cics/lib/libcicsco.so
#7 0x6ff384a8 in CICSH_Suspend(0x6ff384ac, 0x2710, 0x0, 0x200000, 0x6ffe6cc4, 0x2710) DebugInformationStrippedFromFile2:???
#8 0x6ffe6cc0 in TerSH_Emulation(0xfffffffe, 0x36304100, 0x12003264, 0x0, 0x7ffd2ec0, 0x1) DebugInformationStrippedFromFile10:???
#9 0x1200332c in main(0x0, 0x140009818, 0x1, 0x45586732, 0x3, 0x140023400) DebugInformationStrippedFromFile1:???
Stack trace for thread -1
#0 0x240d8a44 in msg_receive_trap(0x14017680, 0x500, 0x240ba52c, 0x0, 0x63cd37f0, 0x1) DebugInformationStrippedFromFile19:???
#1 0x240cf200 in msg_receive(0x63cd37f0, 0x63cd4c50, 0x7, 0x500, 0x63cd6120, 0x9a97c3354e88f) DebugInformationStrippedFromFile6:???
#2 0x240b8ecc in UnknownProcedure3FromFile98(0x0, 0x0, 0x1, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile98:???
#3 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread -2
#0 0x240d8a8c in nxm_idle(0x1, 0x14004098, 0x63cd4b70, 0x14004030, 0x240beb74, 0x63cd37f0) DebugInformationStrippedFromFile19:???
#1 0x240c6ba0 in vpIdle(0x240beb74, 0x63cd37f0, 0x240b97e4, 0x14004030, 0x63cceca8, 0x14004030) DebugInformationStrippedFromFile110:???
#2 0x240b97e0 in UnknownProcedure7FromFile98(0x0, 0x0, 0x240c07d0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile98:???
#3 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 2
#0 0x240c8a78 in /usr/shlib/libpthread.so
#1 0x240b2d20 in /usr/shlib/libpthread.so
#2 0x240b0634 in dspDispatch(0x1403bb80, 0x0, 0x1403c3b0, 0x0, 0x1403bb80, 0x140479e8) DebugInformationStrippedFromFile89:???
#3 0x240ab478 in cvTimedWait(0x514c4, 0x0, 0x14032730, 0x1403c3b0, 0x0, 0x14037b80) DebugInformationStrippedFromFile1:???
#4 0x240a94bc in __pthread_cond_timedwait(0x14032730, 0x1403c3b0, 0x0, 0x14037b80, 0x240938c0, 0x2) DebugInformationStrippedFromFile1:???
#5 0x240938bc in ptdexc_cond_timedwait(0x240938c0, 0x2, 0x24132c1c, 0x64334050, 0x3354e890, 0x2564a5a8) DebugInformationStrippedFromFile4:???
#6 0x24132c18 in UnknownProcedure0FromFile2(0x1, 0x63ccf490, 0x63cd6120, 0x0, 0x3354e890, 0x2564a5a8) DebugInformationStrippedFromFile2:???
#7 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 4
#0 0x243ebff0 in /usr/shlib/libc.so
#1 0x24096e48 in __sigwait(0x0, 0x0, 0x0, 0x0, 0x6fe80a50, 0x0) DebugInformationStrippedFromFile6:???
#2 0x6fe80a4c in /opt/cics/lib/libcicsco.so
#3 0x6ff394f0 in TerSH_SignalInit(0x0, 0x0, 0x240c07d0, 0x6ff39448, 0x1407ce30, 0x0) DebugInformationStrippedFromFile3:???
#4 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 5
#0 0x240c8a78 in /usr/shlib/libpthread.so
#1 0x240b2d20 in /usr/shlib/libpthread.so
#2 0x240b0634 in dspDispatch(0x1403bc40, 0x100000, 0x14080180, 0x1, 0x1403bc40, 0x240abc80) DebugInformationStrippedFromFile89:???
#3 0x240ac288 in cvWait(0x14092910, 0x1, 0x14032ee0, 0x1407f270, 0x14080150, 0x14080180) DebugInformationStrippedFromFile1:???
#4 0x240a9504 in __pthread_cond_wait(0x14032ee0, 0x1407f270, 0x14080150, 0x14080180, 0x2409395c, 0x2) DebugInformationStrippedFromFile1:???
#5 0x24093958 in ptdexc_cond_wait(0x14080150, 0x14080180, 0x2409395c, 0x2, 0x2413c2b0, 0x240ac6d8) DebugInformationStrippedFromFile4:???
#6 0x2413c2ac in UnknownProcedure8FromFile17(0x0, 0x0, 0x642b08d0, 0x7ffd2e80, 0x140e5428, 0x1407f270) DebugInformationStrippedFromFile17:???
#7 0x2413d2c0 in rpc__cthread_stop_all(0x241429f4, 0x140e5428, 0x6430e028, 0x64334698, 0x64333a30, 0x140e5428) DebugInformationStrippedFromFile17:???
#8 0x241429f0 in rpc_server_listen(0x140e52f8, 0x140e52f8, 0x3, 0x45586732, 0x3, 0x11fff328) DebugInformationStrippedFromFile22:???
#9 0x6ff38ee4 in TerSH_RPCInit(0x0, 0x0, 0x1, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile3:???
#10 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 6
#0 0x243b1a68 in /usr/shlib/libc.so
#1 0x241486e8 in UnknownProcedure5FromFile25(0x1, 0x63ccf490, 0x63cd6120, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile25:???
#2 0x2414846c in UnknownProcedure4FromFile25(0x0, 0x0, 0x240c07d0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile25:???
#3 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 8
#0 0x240c8a78 in /usr/shlib/libpthread.so
#1 0x240b2d20 in /usr/shlib/libpthread.so
#2 0x240b0634 in dspDispatch(0x14005d10, 0x14082ee8, 0x14082d70, 0x7ffd2e80, 0x1403b940, 0x0) DebugInformationStrippedFromFile89:???
#3 0x240b4120 in pthread_mutex_block(0x0, 0xb, 0x0, 0x7ffd2e80, 0x1403b940, 0x0) DebugInformationStrippedFromFile96:???
#4 0x240c8870 in __pthread_mutex_lock(0xb, 0x0, 0x7ffd2e80, 0x1403b940, 0x0, 0x24099c00) DebugInformationStrippedFromFile112:???
#5 0x24099bfc in pthread_mutex_lock(0x7ffd2e80, 0x1403b940, 0x0, 0x24099c00, 0x6ff3a070, 0x0) DebugInformationStrippedFromFile7:???
#6 0x6ff3a06c in /opt/cics/lib/librcsco.so
#7 0x6ff39ae0 in TerSH_RSend(0x14086600, 0x48384a0054414843, 0x42495254384a0048, 0x6574786961004848, 0x6d72, 0x742f7665642f0000) DebugInformationStrippedFromFile4:???
#8 0x6ff3f9b0 in UnknownProcedure1FromFile15(0x6430e028, 0x1414d950, 0x1, 0x45586732, 0x3, 0x1) DebugInformationStrippedFromFile15:???
#9 0x24188358 in /usr/shlib/libdce.so
#10 0x2413b4f0 in UnknownProcedure2FromFile17(0x14082d70, 0x14092910, 0x1, 0x63ccf490, 0x63cd6120, 0x0) DebugInformationStrippedFromFile17:???
#11 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
Stack trace for thread 7
#0 0x240c8a78 in /usr/shlib/libpthread.so
#1 0x240b2d20 in /usr/shlib/libpthread.so
#2 0x240b0634 in dspDispatch(0x14006290, 0x14082988, 0x14082810, 0x14082810, 0x1403bc40, 0x0) DebugInformationStrippedFromFile89:???
#3 0x240b4120 in pthread_mutex_block(0x1, 0x240aa94c, 0x14082810, 0x1, 0x14082810, 0x63cd37f0) DebugInformationStrippedFromFile96:???
#4 0x240c8870 in __pthread_mutex_lock(0x240aa94c, 0x14082810, 0x1, 0x14082810, 0x63cd37f0, 0x63cd4b70) DebugInformationStrippedFromFile112:???
#5 0x63cd4b6c in ???
Stack trace for thread 3
#0 0x2425fa98 in UnknownProcedure7FromFile331(0x14091a18, 0x14091a18, 0x240c0efc, 0x2425f238, 0x14087950, 0x0) DebugInformationStrippedFromFile331:???
#1 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x3, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???
(ladebug) detach
(ladebug) quit
T.R | Title | User | Personal Name | Date | Lines |
---|
1528.1 | #7 is dead...it's an ex-thread! | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Apr 17 1997 11:54 | 12 |
| I suspect that the problem is not that "show thread" is missing a thread;
rather I'd bet it's that "where thread all" is showing you one that it
shouldn't.
I suspect that thread 7 has terminated; however, unlike thread 3, thread 7
has been reclaimed (because some thread already joined with it or because it
was explicitly detached). What "where thread all" is showing you for thread
7 is a cached thread "corpse", with an inconsistent execution context (no
surprise that, since the thread is dead!).
Webb
|
1528.2 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Thu Apr 17 1997 16:05 | 23 |
Yes. I recently discovered that, prior to 4.0D, some of the pthreaddebug
thread information functions (such as the "get registers" call that ladebug
uses to start a "where") didn't reject a terminated thread ID. As a result,
they'd return misleading data.
On the other hand, ladebug tracks thread activation and termination to keep
its own list of threads up to date -- so I don't know why "where thread
all" thought there was a thread 7.
If you're using a version of ladebug earlier than 4.0-35 (and especially if
you've got one earlier than 4.0-30), you should update. (Of course, if this
is being done at "a partner"'s site, I don't know whether that's necessarily
possible.) There have been a lot of thread-related problems fixed in ladebug,
though I don't know all the details or have any idea whether this might be
one. But in general, threaded debugging will go much more smoothly with the
latest ladebug.
It's also possible that there's a DECthreads problem, and somehow ladebug
isn't getting all the termination events -- if I have some time at some point
I might try to provoke that to check, but no guarantees. It might well prove
tricky to deliberately catch things in the right state...
/dave
|
1528.3 | Hmmm...did you attach to that process at all? | PTHRED::PORTANTE | Peter Portante, DTN 381-2261, (603)881-2261, MS ZKO2-3/Q18 | Thu Apr 17 1997 16:56 | 6 |
| Gerry,
Did you have ladebug run the program from the beginning or attach to the program
after it had started to run?
-Peter
|
1528.4 | Thanks | MUFFIT::gerry | Gerry Reilly | Fri Apr 18 1997 09:14 | 13 |
Thanks for all the helpful hints.
I will upgrade the system to 4.0-35 and see if I get some more useful
information.
Currently, they are attaching to the process rather than starting the
process under the debugger. They really can't change this, because provoking
the hang in a reasonable time (about 12 hours) requires 60 instances of
the process to be run; they have no idea which of the 60 will
hang, so they would need to start all 60 under ladebug...
-gerry
|
1528.5 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Fri Apr 18 1997 09:55 | 31 |
| The reason Pete asked about attaching is that he, a ladebug developer, and I
were talking about this note yesterday in the hall. Our consensus was that
the "missing thread" may be completely innocuous (and irrelevant) if you
attached. ladebug relies on two mechanisms to keep an internal list of
threads up to date: first, it iterates through all "known threads", and then
it keeps the list current by tracking the activation and termination of
threads.
However, there's a small window after the termination event where the
terminating thread is still "known" to our scheduler. Thus, if you attached
AFTER thread 7 "terminated" but before it went away, ladebug would add the
thread to its list, but would not know to remove it. Because of the bug in
the old pthreaddebug library, it didn't get an error when asking for that
thread's registers later, and showed a bogus stack -- but pthreaddebug
ignored the bogus thread ID for "show thread".
In 4.0D, ladebug will receive an error when trying to get the registers... if
nothing else changes, it should be prepared to deal with this situation by
updating the thread list.
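The 4.0D behavior described here (an error from the register query, handled by updating the thread list) might look roughly like this. The struct, get_registers and where_thread are hypothetical stand-ins, not pthreaddebug or ladebug interfaces, and ESRCH is an assumed choice of error code.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_THREADS 16

/* Hypothetical model: which thread IDs the thread library still holds,
   and which ones the debugger has in its own list. */
struct model {
    bool lib_has[MAX_THREADS];
    bool dbg_lists[MAX_THREADS];
};

/* Hypothetical stand-in for the "get registers" call, with the 4.0D
   behavior described in the note: a thread ID the library no longer
   holds is rejected (ESRCH here) instead of answered with bogus data. */
static int get_registers(const struct model *m, int id, unsigned long *pc)
{
    if (!m->lib_has[id])
        return ESRCH;
    *pc = 0x1000ul + (unsigned long)id;  /* placeholder register value */
    return 0;
}

/* One step of a "where thread all" loop.  Returns true if a stack would
   be printed for `id`; if get_registers fails, the stale entry is removed
   from the debugger's list instead. */
bool where_thread(struct model *m, int id)
{
    unsigned long pc;

    if (id < 0 || id >= MAX_THREADS || !m->dbg_lists[id])
        return false;
    if (get_registers(m, id, &pc) != 0) {
        m->dbg_lists[id] = false;  /* update the thread list */
        return false;
    }
    (void)pc;  /* a real debugger would unwind the stack from here */
    return true;
}
```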
However, it occurs to me that ladebug is really tracking the wrong events. I
think it's tracking ACTIVATE and TERMINATE, which would mean it doesn't know
about threads that have been created but haven't yet run, and it has
forgotten about threads that have terminated but haven't yet been joined or
detached. It should probably be tracking CREATE and FREE events, instead. (If
a FREE event has already been issued for thread 7 when you attach, that
thread will not be "known" to our scheduler.)
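The difference between tracking ACTIVATE/TERMINATE and tracking CREATE/FREE can be modelled as a toy replay of one thread's lifecycle. Everything in this sketch is hypothetical (enum ev, struct policy, listed_after_free), and it assumes that the scheduler "knows" a thread from CREATE until FREE and that the debugger snapshots the known threads when it attaches; it is not the ladebug or pthreaddebug interface.

```c
#include <stdbool.h>

/* Hypothetical lifecycle events, in the order one thread goes through them. */
enum ev { EV_CREATE, EV_ACTIVATE, EV_TERMINATE, EV_FREE };

/* The pair of events a debugger uses to add and remove list entries. */
struct policy { enum ev add_on; enum ev remove_on; };

static const struct policy ACTIVATE_TERMINATE = { EV_ACTIVATE, EV_TERMINATE };
static const struct policy CREATE_FREE        = { EV_CREATE,   EV_FREE };

/* Replays one thread's lifecycle with the debugger attaching just before
   event index `attach_at` (0..4; 4 means after FREE).  On attach the
   debugger snapshots the thread if it has been created but not yet freed;
   after that it only sees events.  Returns true if the thread is still
   listed once the library has freed it, i.e. a stale entry. */
bool listed_after_free(struct policy p, int attach_at)
{
    static const enum ev life[] = { EV_CREATE, EV_ACTIVATE, EV_TERMINATE, EV_FREE };
    bool listed = false;

    for (int i = 0; i < 4; i++) {
        if (i == attach_at)
            listed = (i >= 1);  /* CREATE done; FREE (index 3) not yet */
        if (i < attach_at)
            continue;           /* events before attach are never seen */
        if (life[i] == p.add_on)
            listed = true;
        if (life[i] == p.remove_on)
            listed = false;
    }
    return listed;
}
```

Attaching between TERMINATE and FREE (attach_at = 3) leaves a stale entry under ACTIVATE/TERMINATE but not under CREATE/FREE, while attaching before the thread exists (attach_at = 0) gives the right answer under both.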
In any case, all of this has nothing to do with your partner's deadlock.
/dave
|
1528.6 | | MUFFIT::gerry | Gerry Reilly | Tue Apr 22 1997 10:38 | 38 |
Well, thanks for all the input. With metering enabled and the latest
ladebug we've found the partner's deadlock. It is occurring while they
are handling an exception. That problem can now be fixed.
However, while their system is in the 'hung' state with ladebug attached
and the information from metering available, is there any data available
to me regarding where the thread was when the DECthreads exception was
raised?
Here is the output for thread 1 from
(ladebug) pthread "threads -af"
main thread 1 (blocked, timed cond wait) "default thread" (0x63ccf528), created
by pthread
Waiting on condition variable 4 using mutex 17; timeout at
Mon Apr 21 12:09:01 1997
Scheduling: throughput policy at priority 11
Masked signals: none
Pending signals: none
Object flags: none; self flags: delay; sched flags: none; mutex flags: none;
atomic flags: none
Thread specific data: 0=0x63ccf980, 1=0x140dbc20, 4=0x140b3b40, 5=0x1412bac0,
6=0x1412be80
Stack: 0x11fff300; base is 0x11fffffff, guard area at 0x4000000
General cancelability enabled, asynch cancelability disabled
Current vp is 1, synch port is 14, vp ID is 13
Join uses mutex 16 and condition variable 3; wait uses mutex 17 and
condition variable 4
The thread's start function and argument are unknown
The thread's latest errno is 22, the last DECthreads exception caught was
"exception formatting NYI" (status exception 0x16c9a016 [02662320026])
The thread has mutexes locked: 47, 48, 89
<<info for other threads deleted, as it looked uninteresting>>
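The bracketed number after the hex status in this output appears to be the same value written in octal: 0x16c9a016 and 02662320026 are both 382312470 in decimal. A small check using strtoul, whose base-0 mode honors the C "0x" and leading-"0" prefixes; same_value is a hypothetical helper, and it says nothing about what the status code itself means.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if two numeric strings denote the same value.  With base 0,
   strtoul follows C prefix rules: "0x..." is read as hexadecimal and a
   leading "0" as octal. */
bool same_value(const char *a, const char *b)
{
    char *end_a;
    char *end_b;
    unsigned long va = strtoul(a, &end_a, 0);
    unsigned long vb = strtoul(b, &end_b, 0);

    /* Reject empty input and trailing characters. */
    return end_a != a && end_b != b && *end_a == '\0' && *end_b == '\0'
        && va == vb;
}
```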
Thanks.
-gerry
|
1528.7 | The origination point is not recorded. | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Apr 22 1997 11:34 | 21 |
.6> is there any data available to me regarding where the thread was when
.6> the DECthreads exception was raised?
No. We haven't come up with a feasible way of recording and reporting that
information.
Part of the problem is that the PC at which the exception was originally
raised may not be the one you're interested in, anyway -- what you want is
the deepest PC in _your_code_ which raised or tried to handle the exception!
:-) And, obviously, there's no way for DECthreads to record that (other than
recording the whole stack, and that's not warranted in a "production"
application).
If you have the luxury of running the application under the debugger (which
you don't in this case), you can set a breakpoint in the "raise" routine in
libexc (exc_raise()?) and check the stack at the point where the exception
originates. (However, remember that a number of facilities, DECthreads
included, raise exceptions as a part of -normal- operation...)
Webb
|
1528.8 | Thanks. | MUFFIT::gerry | Gerry Reilly | Tue Apr 22 1997 12:21 | 5 |
I thought that might be the answer, but it was worth asking.
Thanks as always.
-gerry
|