Title: DECthreads Conference
Moderator: PTHRED::MARYS TE ON
Created: Mon May 14 1990
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1553
Total number of notes: 9541
1507.0. "debugging help needed.." by DECWET::ARTI () Tue Mar 18 1997 14:36
Hi,
I am running a multi-threaded test on Steel bl8 that does mmap
operations. I start it with four threads, and it hangs shortly after.
When I attach to the hung process with ladebug and display the threads,
this is the output:
(ladebug) show thread
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
* 6 running throughput 11 <anonymous>
1 running throughput 11 default thread
-1 blocked kernel fifo 32 manager thread
-2 ready idle 0 null thread for VP 0x0
-3 ready idle 0 null thread for VP 0x1
-4 ready idle 0 null thread for VP 0x2
-5 ready idle 0 null thread for VP 0x3
2 blocked kernel throughput 11 <anonymous>
3 ready throughput 11 <anonymous>
4 running throughput 11 <anonymous>
> 5 running throughput 11 <anonymous>
The four threads of interest to me are threads 3 through 6. Their stack
traces follow: two of them are in swtch_pri(), one is in memcpy(), and one
is in hstRestoreRegisters(). Any help in interpreting this information and
debugging this hang would be much appreciated.
thanks,
arti
===========================================================================
(ladebug) thread 3
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> 3 ready throughput 11 <anonymous>
(ladebug) where
>0 0x3ff8057c46c in hstRestoreRegisters(0x3ff80565c14, 0x3ff80579c1c, 0x3ff8057c3f0, 0x4c28, 0x14008f790, 0x3ffc01878a0) DebugInformationStrippedFromFile104
#1 0x3ff80579c18 in hstTransferContext(0x3ffc0183928, 0x140002340, 0x1400048e0, 0x140015100, 0x0, 0x140002340) DebugInformationStrippedFromFile102
#2 0x3ff805644e0 in dspDispatch(0x3ffc0188b50, 0x140002340, 0x3ff80565bd8, 0x3ffc01878a0, 0x3ff00000001, 0x140002340) DebugInformationStrippedFromFile82
#3 0x3ff80565c10 in /usr/shlib/libpthread.so
#4 0x3ff80572340 in pthread_yield_np(0x3ff80572344, 0x140020030, 0x3ff8011549c, 0x140015100, 0x1400022d0, 0x140020030) DebugInformationStrippedFromFile94
#5 0x3ff80115498 in __tis_yield(0x3ff8011549c, 0x140015100, 0x1400022d0, 0x140020030, 0x3ff80157bb8, 0x140015100) DebugInformationStrippedFromFile748
#6 0x3ff80157bb4 in __sched_yield(0x1400022d0, 0x140020030, 0x3ff80157bb8, 0x140015100, 0x1200086e4, 0x140001168) DebugInformationStrippedFromFile467
#7 0x1200086e0 in threadRaceLineUp(0x1200086e4, 0x140001168, 0x1200078f4, 0x140015100, 0x1400022d0, 0x140001560) DebugInformationStrippedFromFile3
#8 0x1200078f0 in /usr/users/arti/arti/mmap/exer/fs/mmap_syscalls_exer
#9 0x1200051ec in mmap_syscallRace(0x0, 0x0, 0x120009e64, 0x140015100, 0x120009e78, 0x140015100) DebugInformationStrippedFromFile1
#10 0x120009e74 in raceFuncWrapper(0x140020030, 0x140015100, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#11 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
(ladebug) thread 4
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> 4 running throughput 11 <anonymous>
(ladebug) where
>0 0x3ff800d4fb8 in memcpy(0x3ff800d4fb8, 0x14000c2c0, 0x20000006394, 0x14009b9c0, 0x20, 0x2000000000000000) DebugInformationStrippedFromFile19
#1 0x120006c1c in UnknownProcedure14FromFile1(0x2524232221201f1e, 0x2d2c2b2a29282726, 0x3534333231302f2e, 0x100000020, 0x2c0, 0x140015118) DebugInformationStrippedFromFile1
#2 0x1200051ec in mmap_syscallRace(0x3534333231302f2e, 0x100000020, 0x2c0, 0x140015118, 0x120009e78, 0x140015118) DebugInformationStrippedFromFile1
#3 0x120009e74 in raceFuncWrapper(0x14002cb50, 0x140015118, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#4 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
(ladebug) thread 5
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> 5 running throughput 11 <anonymous>
(ladebug) where
>0 0x3ff80540858 in swtch_pri(0x3ff80565bd8, 0x3ffc01878a0, 0x3ff00000001, 0x140002340, 0x3ff80572344, 0x14002d0b0) DebugInformationStrippedFromFile21
#1 0x3ff805722bc in pthread_yield_np(0x3ff80572344, 0x14002d0b0, 0x3ff8011549c, 0x140015130, 0x1400022d0, 0x0) DebugInformationStrippedFromFile94
#2 0x3ff80115498 in __tis_yield(0x3ff8011549c, 0x140015130, 0x1400022d0, 0x0, 0x3ff80157bb8, 0x0) DebugInformationStrippedFromFile748
#3 0x3ff80157bb4 in __sched_yield(0x1400022d0, 0x0, 0x3ff80157bb8, 0x0, 0x1200086e4, 0x1400013e0) DebugInformationStrippedFromFile467
#4 0x1200086e0 in threadRaceLineUp(0x1200086e4, 0x1400013e0, 0x12000660c, 0x140015130, 0x1400022d0, 0x14002d1b0) DebugInformationStrippedFromFile3
#5 0x120006608 in UnknownProcedure10FromFile1(0x1400022d0, 0x14002d1b0, 0x1200051f0, 0x140015130, 0x20000002000, 0x140015130) DebugInformationStrippedFromFile1
#6 0x1200051ec in mmap_syscallRace(0x1200051f0, 0x140015130, 0x20000002000, 0x140015130, 0x120009e78, 0x140015130) DebugInformationStrippedFromFile1
#7 0x120009e74 in raceFuncWrapper(0x14002d0b0, 0x140015130, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#8 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
(ladebug) thread 6
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
>* 6 running throughput 11 <anonymous>
(ladebug) where
>0 0x3ff80540858 in swtch_pri(0x3ff80565bd8, 0x3ffc01878a0, 0x3ff00000001, 0x140002340, 0x3ff80572344, 0x14002d610) DebugInformationStrippedFromFile21
#1 0x3ff805722bc in pthread_yield_np(0x3ff80572344, 0x14002d610, 0x3ff8011549c, 0x140015148, 0x1400022d0, 0x0) DebugInformationStrippedFromFile94
#2 0x3ff80115498 in __tis_yield(0x3ff8011549c, 0x140015148, 0x1400022d0, 0x0, 0x3ff80157bb8, 0x0) DebugInformationStrippedFromFile748
#3 0x3ff80157bb4 in __sched_yield(0x1400022d0, 0x0, 0x3ff80157bb8, 0x0, 0x1200086e4, 0x140001470) DebugInformationStrippedFromFile467
#4 0x1200086e0 in threadRaceLineUp(0x1200086e4, 0x140001470, 0x12000686c, 0x140015148, 0x1400022d0, 0x14002d710) DebugInformationStrippedFromFile3
#5 0x120006868 in UnknownProcedure12FromFile1(0x1400022d0, 0x14002d710, 0x1200051f0, 0x140015148, 0x2000000c000, 0x140015148) DebugInformationStrippedFromFile1
#6 0x1200051ec in mmap_syscallRace(0x1200051f0, 0x140015148, 0x2000000c000, 0x140015148, 0x120009e78, 0x140015148) DebugInformationStrippedFromFile1
#7 0x120009e74 in raceFuncWrapper(0x14002d610, 0x140015148, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#8 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
T.R | Title | User | Personal Name | Date | Lines |
---|
1507.1 | Can you get a stack trace for all the threads? | PTHRED::PORTANTE | Peter Portante, DTN 381-2261, (603)881-2261, MS ZKO2-3/Q18 | Tue Mar 18 1997 15:11 | 7 |
| Arti,
Could you post the stack traces for the other threads in the program as well?
Most likely the problem is being caused by one of those. The ones that are
running look fine...
-Peter
|
1507.2 | All of your threads look more or less OK...what's #1 doing? | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Mar 18 1997 15:33 | 15 |
| Well, thread 3 looks OK -- it's not running at the moment (i.e., it's "ready"),
so the top of its stack is as it should be, hstRestoreRegisters(). Threads 3,
5, and 6 are all in calls to sched_yield() called from threadRaceLineUp(), which
doesn't look too weird. (I'd guess that this is either the beginning or the end
of a test run...) You might want to look at whether you think they _ought_ to
be looping around a call to sched_yield().
Thread 4 looks like it's either just begun or not yet finished its piece of the
"race" (i.e., it's in memcpy(), presumably that's part of the test).
The missing piece is thread 1...it's also supposedly running at the moment.
What's it doing??
Webb
|
1507.3 | the traces for the other threads.. | DECWET::ARTI | | Tue Mar 18 1997 15:55 | 73 |
| The stack traces for the other threads follow. Thread 2 is in sigwaitprim(),
which is OK, as the test always starts that thread to catch signals.
Thread 1 is in a function called raceStarter().
arti
======================================================================
(ladebug) thread 1
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> 1 running throughput 11 default thread
(ladebug) where
>0 0x12000a6f4 in raceStarter(0x12000a728, 0x40e8000000000000, 0x120007fc8, 0x1400022d0, 0xb00, 0x0) DebugInformationStrippedFromFile3
#1 0x120007fc4 in threadRace(0x120005084, 0x140000058, 0xb00, 0x0, 0x120005170, 0x0) DebugInformationStrippedFromFile3
#2 0x120005080 in main(0x3ff80017380, 0x140018f00, 0x140010d20, 0x140000220, 0x120003760, 0x0) DebugInformationStrippedFromFile1
(ladebug) thread -1
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> -1 blocked kernel fifo 32 manager thread
(ladebug) where
>0 0x3ff80540774 in msg_receive_trap(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile21
#1 0x3ff805365b8 in msg_receive(0x3ffc01878a0, 0x3ffc0188c30, 0x0, 0x400, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile6
#2 0x3ff8056c888 in UnknownProcedure3FromFile90(0x0, 0x0, 0x3ff00000001, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile90
#3 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
(ladebug) thread -2
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> -2 ready idle 0 null thread for VP 0x0
(ladebug) where
>0 0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1 0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102
(ladebug) thread -3
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> -3 ready idle 0 null thread for VP 0x1
(ladebug) where
>0 0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1 0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102
(ladebug) thread -4
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> -4 ready idle 0 null thread for VP 0x2
(ladebug) where
>0 0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1 0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102
(ladebug) thread -5
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> -5 ready idle 0 null thread for VP 0x3
(ladebug) where
>0 0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1 0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102
(ladebug) thread 2
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> 2 blocked kernel throughput 11 <anonymous>
(ladebug) where
>0 0x3ff801117c8 in __sigwaitprim(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile867
#1 0x3ff8056e714 in __sigwaitd10(0x0, 0x0, 0x0, 0x0, 0x12000aa14, 0x14005ba30) DebugInformationStrippedFromFile92
#2 0x12000aa10 in /usr/users/arti/arti/mmap/exer/fs/mmap_syscalls_exer
#3 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
|
1507.4 | raceStarter function | DECWET::ARTI | | Tue Mar 18 1997 16:03 | 86 |
| Here's the code for the raceStarter function:
// raceStarter - synchronizes thread execution (threadRaceLineUp)
void
raceStarter (void)
{
    register struct perThreadInfo *thisThreadsInfo;
    register int threadReadyCnt;    // how many threads are in Ready state
    register struct raceInfo *r = &raceInfo;

    DEBUG ("raceStarter: turning On lineUpSync");
    r->lineUpSync = On;     // turn on thread race line up synchronization.
                            // it will be turned off by the last racing thread
                            // to return from the user's race function.
                            // no mutex is needed.
    while (1) {
        // spin here until all race threads mark themselves ready
        do {
            threadReadyCnt = 0;
            // begin check at the last thread to improve response
            for (thisThreadsInfo = &r->perThreadInfo[r->numThreads - 1];
                 thisThreadsInfo >= &r->perThreadInfo[0];
                 thisThreadsInfo--) {
                if (thisThreadsInfo->threadState == Ready)
                    threadReadyCnt++;
                else {
                    DEBUG1("raceStarter: yield on threadReadyCnt %d",
                           threadReadyCnt);
#ifdef WANT_YIELD_COUNTS
                    if (++r->raceStarterYieldCnt >
                        r->raceStarterYieldCntMax)
                        r->raceStarterYieldCntMax =
                            r->raceStarterYieldCnt;
#endif
                    pthread_yield();
                    break;
                }
            }
            if (r->raceCancelled || r->lineUpSync == Off)
                return;
        } while (threadReadyCnt != r->numThreads);
#ifdef WANT_YIELD_COUNTS
        r->raceStarterYieldCnt = 0;
#endif
        // Ready
        DEBUG("raceStarter: Go = No");
        r->Go = No;         // hold back racing threads from executing
                            // by initializing Go to No.
        mb();
        // Set
        DEBUG("raceStarter: threadState = Set");
        for (thisThreadsInfo = &r->perThreadInfo[0];
             thisThreadsInfo < &r->perThreadInfo[r->numThreads];
             thisThreadsInfo++)
            thisThreadsInfo->threadState = Set;
        // Go
        mb();
        DEBUG("raceStarter: Go = Yes");
        r->Go = Yes;
        pthread_yield();    // hopefully racers will run
    }
}
|
1507.5 | dbx output of hung test | DECWET::ARTI | | Tue Mar 18 1997 16:13 | 52 |
| Also, if I run dbx on the running kernel and print the traces
for the threads in the hung test, I get the following. Note
that it appears that one thread took a memory-management fault
but was preempted during trap(). How do I look at the
kernel threads from ladebug? Do I have to set $threadlevel?
arti
(dbx) tstack
Thread 0xfffffc000f5878c0:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 msg_dequeue(0xfffffc000f183ea0, 0xfffffc000f183ee8, 0xffffffff8043f020, 0xffffffff8cb8b910, 0x28) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/ipc_basics.c":926, 0xfffffc0000296328]
2 msg_receive_trap(0x3ffc018a100, 0x0, 0xfffffc0007c15e00, 0x14001f680, 0x2800000400) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/ipc_basics.c":1292, 0xfffffc00002969dc]
3 _Xsyscall(0x8, 0x3ff80540774, 0x3ffc017d250, 0x14001f680, 0x400) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1404, 0xfffffc00004fc1f0]
Thread 0xfffffc000facbb80:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 mpsleep(0xfffffc0007d88a80, 0x9, 0x0, 0xfffffc000facbd90, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/bsd/kern_synch.c":592, 0xfffffc0000272414]
2 nxm_get_thread() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":4055, 0xfffffc00002ac730]
Thread 0xfffffc000faca840:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 mpsleep(0xfffffc0007d88ca8, 0x1001, 0x0, 0xfffffc000facaa50, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/bsd/kern_synch.c":592, 0xfffffc0000272414]
2 sigwaitprim() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/bsd/kern_sig.c":2679, 0xfffffc000026bc30]
3 syscall(0x0, 0xffffffffffffffde, 0x0, 0xa3760332e7560, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/syscall_trap.c":552, 0xfffffc0000506690]
4 _Xsyscall(0x8, 0x3ff801117c8, 0x3ffc018fb40, 0x2, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1255, 0xfffffc00004fc094]
Thread 0xfffffc00096ae840:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 swtch_pri(0x140002008, 0xfffffc0000000005, 0xfffffc00004fc1f4, 0xffffffffffffffc5, 0x1400022d0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/syscall_subr.c":374, 0xfffffc00002acc58]
2 _Xsyscall(0x8, 0x3ff80540858, 0x3ffc018fb40, 0x0, 0x14002d610) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1404, 0xfffffc00004fc1f0]
Thread 0xfffffc00096aedc0:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 thread_preempt(thread = 0xfffffc00096aedc0, processor = (nil)) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":3917, 0xfffffc00002ac400]
2 trap() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/trap.c":2112, 0xfffffc0000508274]
3 exception_exit(0x8, 0x12000a6f4, 0x14000c2c0, 0x3ffc0188b50, 0x1) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1005, 0xfffffc00004fbe68]
Thread 0xfffffc00096aeb00:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 thread_preempt(thread = 0xfffffc00096aeb00, processor = (nil)) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":3917, 0xfffffc00002ac400]
2 trap() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/trap.c":2112, 0xfffffc0000508274]
3 _XentMM(0x8, 0x3ff800d4fb8, 0x14000c2c0, 0x20000006394, 0x14009b9c0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1473, 0xfffffc00004fc2d4]
Thread 0xfffffc00096ae2c0:
> 0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
1 thread_preempt(thread = 0xfffffc00096ae2c0, processor = (nil)) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":3917, 0xfffffc00002ac400]
2 trap() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/trap.c":2112, 0xfffffc0000508274]
3 exception_exit(0x8, 0x3ff80540858, 0x3ffc018fb40, 0x0, 0x14002d0b0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1005, 0xfffffc00004fbe68]
|
1507.6 | | SMURF::DENHAM | Digital UNIX Kernel | Tue Mar 18 1997 21:19 | 7 |
| That thread_preempt() under _XentMM could be a page fault in progress,
or the thread could be stuck taking a SEGV in its own SEGV handler.
Try tracing the same thread a bunch of times in the kernel. If
you see pretty much the same trace but the thread is collecting
computes (accumulating CPU time), it's most likely the SEGV. Then
you could try setting the SIGSEGV handler to SIG_DFL in main() and
see if you get a core dump....
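A minimal sketch of that last suggestion, assuming a POSIX sigaction() environment. The helper name restore_default_sigsegv() is hypothetical, not part of the test suite: it puts SIGSEGV back to SIG_DFL and reports whether some other disposition had been installed. Signal dispositions are process-wide, so calling it once near the top of main(), before the racing threads start, should be enough; whether a core file actually appears still depends on the process's core-size limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: restore the default SIGSEGV action (terminate,
   normally with a core dump) so a fault cannot loop inside a handler.
   If had_handler is non-NULL, it is set to 1 when a handler or SIG_IGN
   was installed before the call, 0 otherwise.
   Returns 0 on success, -1 on error. */
static int restore_default_sigsegv(int *had_handler)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    if (sigemptyset(&sa.sa_mask) != 0)
        return -1;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;
    if (had_handler != NULL)
        *had_handler = (old.sa_flags & SA_SIGINFO) != 0 ||
                       old.sa_handler != SIG_DFL;
    return 0;
}
```

Typical use is a single call such as `restore_default_sigsegv(NULL);` as the first statement of main().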
|
1507.7 | There is something missing here... | WTFN::SCALES | Despair is appropriate and inevitable. | Wed Mar 19 1997 09:50 | 27 |
| There's something fishy going on here...Arti, when this thing is "hung" do
the stack traces for threads 1-6 always look pretty much the same, or, if you
let it run again for a sec and then check again do they switch around?
I have the feeling that this program might be doing pretty much what you told
it to -- do a sh*load of yields. (The calls to swtch_pri() are just a side
effect of intensively yielding on a multiprocessor.)
It looks to me like there is at least one hole in the raceStarter() code.
The "starter" starts the threads and then yields, but it never gets any
assurance from the threads that they actually started. Thus, when it loops
around to see if they are now "ready", it can't tell whether a thread is
ready now because it already finished or because it never started! I think
you need some sort of explicit "I did run" mechanism from each thread.
One other tidbit: you should understand that when you call pthread_yield()
there is no guarantee that anything special will happen; the call is a "hint"
to the DECthreads scheduler that "now would be a good time to preempt me",
but there is no requirement that the preemption must actually occur. Thus,
you cannot really rely on the raceStarter thread losing control at any time,
as the code is currently written -- that is, this thread could just soak up
one of your processors for the whole test, forcing you to run the other four
threads on the other three processors... (You might want to make it sleep or
block in some other way.)
Webb
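One way to get the explicit "I did run" acknowledgment described above, sketched under assumptions: it uses C11 <stdatomic.h> (not available to the 1997 toolchain, where volatile fields plus mb() would play the same role), and struct race_gate and its functions are hypothetical names, not the test suite's. The starter publishes a race number; each racer stores that number once it has actually begun, so the starter can tell "started this race" apart from "never started". Waiting uses a short nanosleep() instead of a yield, in line with the advice to block rather than rely on a yield hint, and no pthread functions are needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_RACERS 16   /* hypothetical upper bound on racing threads */

/* Hypothetical start gate; names are illustrative only. */
struct race_gate {
    int num_threads;                /* racers taking part */
    atomic_uint generation;         /* race number published by the starter */
    atomic_uint acked[MAX_RACERS];  /* last race number each racer started */
};

static void race_gate_init(struct race_gate *g, int num_threads)
{
    assert(num_threads >= 0 && num_threads <= MAX_RACERS);
    g->num_threads = num_threads;
    atomic_init(&g->generation, 0u);
    for (int i = 0; i < MAX_RACERS; i++)
        atomic_init(&g->acked[i], 0u);
}

/* Block for about 1 ms: unlike a yield, this really gives up the CPU. */
static void brief_sleep(void)
{
    struct timespec ts = { 0, 1000000L };
    nanosleep(&ts, NULL);
}

/* Racer side: wait until race `gen` is published, then say "I did run". */
static void racer_wait_and_ack(struct race_gate *g, int id, unsigned gen)
{
    assert(id >= 0 && id < MAX_RACERS);
    while (atomic_load_explicit(&g->generation, memory_order_acquire) < gen)
        brief_sleep();
    atomic_store_explicit(&g->acked[id], gen, memory_order_release);
}

/* Starter side: publish the next race number, then wait until every racer
   has acknowledged exactly that number. A racer that never started can no
   longer be mistaken for one that already finished. Returns the number. */
static unsigned starter_go_and_wait(struct race_gate *g)
{
    unsigned gen = atomic_fetch_add_explicit(&g->generation, 1u,
                                             memory_order_acq_rel) + 1u;
    for (int i = 0; i < g->num_threads; i++)
        while (atomic_load_explicit(&g->acked[i], memory_order_acquire) != gen)
            brief_sleep();
    return gen;
}
```

Each racer would call racer_wait_and_ack() with the next race number at the top of its loop, and the starter calls starter_go_and_wait() once per race.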
|
1507.8 | | SMURF::DENHAM | Digital UNIX Kernel | Wed Mar 19 1997 10:32 | 5 |
| This race starter stuff is the core of the suite of MT SMP exercisers.
The race start always looks like a great candidate for a barrier
synchronization setup. I've always disliked the yield approach
because it makes the program look busy when it can, for all intents
and purposes, be considered hung...
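If pthread calls were allowed, the barrier start suggested above could look roughly like this sketch. It assumes POSIX barriers (pthread_barrier_init/wait/destroy, from POSIX.1-2001), which the DECthreads release in this thread may well not provide; struct race_start and its helpers are hypothetical names. The N racers and the starter all wait at one barrier, so no thread spins or yields while the others line up, and a stuck race shows up as threads blocked in the barrier rather than as a busy process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical start line shared by the racers and the starter. */
struct race_start {
    pthread_barrier_t line_up;
};

/* `racers` worker threads plus the starter thread must all arrive. */
static int race_start_init(struct race_start *rs, unsigned racers)
{
    return pthread_barrier_init(&rs->line_up, NULL, racers + 1u);
}

/* Every racer calls this before its race body, and the starter calls it
   once per race. Nobody returns until all racers + 1 threads arrive.
   Returns 1 in exactly one released thread, 0 in the others. */
static int race_start_wait(struct race_start *rs)
{
    int rc = pthread_barrier_wait(&rs->line_up);
    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    return rc == PTHREAD_BARRIER_SERIAL_THREAD;
}

static int race_start_destroy(struct race_start *rs)
{
    return pthread_barrier_destroy(&rs->line_up);
}
```

A POSIX barrier resets itself after each release, so the same object can start race after race.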
|
1507.9 | No pthreads! (Well, not if they can help it...) | WTFN::SCALES | Despair is appropriate and inevitable. | Wed Mar 19 1997 11:40 | 11 |
| .8> The race start always looks like a great candidate for a barrier
.8> synchronization setup.
True -- but the test suite seems to have this rule about not using pthread
functions, except where absolutely necessary (e.g., you'll note the total
absence of mutexes and condition variables, despite the fact that they'd
probably wipe out most of the bugs). So, it'd have to be a "special" barrier...
:-p
Webb
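A "special" barrier that avoids pthread calls could be a centralized sense-reversing barrier, shown here as a sketch. It assumes C11 <stdatomic.h> and POSIX nanosleep(); struct sense_barrier and its functions are hypothetical. Each thread keeps a private sense flag. The last thread to arrive resets the counter and then flips the shared sense, which releases the others; the flip is what lets the same barrier be reused from one race to the next.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical centralized sense-reversing barrier: no pthread calls. */
struct sense_barrier {
    unsigned count;          /* number of participating threads */
    atomic_uint remaining;   /* arrivals still expected this round */
    atomic_bool sense;       /* flips each time the barrier opens */
};

static void sense_barrier_init(struct sense_barrier *b, unsigned count)
{
    assert(count > 0);
    b->count = count;
    atomic_init(&b->remaining, count);
    atomic_init(&b->sense, false);
}

/* Each thread owns a `local_sense` that starts out false.
   Returns true in the last thread to arrive, false in the others. */
static bool sense_barrier_wait(struct sense_barrier *b, bool *local_sense)
{
    bool my_sense = !*local_sense;
    *local_sense = my_sense;

    if (atomic_fetch_sub_explicit(&b->remaining, 1u,
                                  memory_order_acq_rel) == 1u) {
        /* Last arrival: reset the count for the next round first, then
           publish the new sense; the release store makes the reset
           visible to every thread that observes the flip. */
        atomic_store_explicit(&b->remaining, b->count, memory_order_relaxed);
        atomic_store_explicit(&b->sense, my_sense, memory_order_release);
        return true;
    }
    while (atomic_load_explicit(&b->sense, memory_order_acquire) != my_sense) {
        struct timespec ts = { 0, 100000L };  /* 0.1 ms: block, don't yield */
        nanosleep(&ts, NULL);
    }
    return false;
}
```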
|
1507.10 | duh.. | DECWET::ARTI | | Wed Mar 19 1997 22:21 | 2 |
| I see the same trace. What field of the thread do I look at
to figure out if it's collecting computes?
|
1507.11 | reply for 1507.7 | DECWET::ARTI | | Wed Mar 19 1997 22:26 | 6 |
| Oops, my previous reply was a response to 1507.6.
This note is a response to 1507.7
The traces for threads 1-6 stay pretty much the same in
both dbx -k /vmunix and when I decladebug the hung test.
Sounds like I may need to talk to the person that wrote
the test.
|
1507.12 | | SMURF::DENHAM | Digital UNIX Kernel | Thu Mar 20 1997 08:37 | 18 |
| Computes: well, running ps mp PID on the threaded process a few
times will show which kernel threads are building up CPU
time. If you then count down from the top of the ps
output (starting at 0) to the computing thread, you can
usually correlate it with the kernel thread in dbx -k /vmunix.
Set $pid to that process and do a tlist command. Count down
that list (starting with 1 this time) to the same thread number
and tset there. To see if it's going anywhere, you can try
printing these sorts of values:
(dbx) p thread.sched_stamp
(dbx) p thread.last_run_stamp
(dbx) p thread.sleep_stamp
The first two should keep changing if the thread is running at all.
The last one will change if it's running and then blocking occasionally.
FWIW, eh?
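Where the platform supports POSIX per-thread CPU-time clocks (an assumption: pthread_getcpuclockid() is a later POSIX addition and may not exist on the system in this thread), a test could check for "collecting computes" from inside the process instead of through ps mp and dbx -k. The helper names below are hypothetical. Call thread_is_computing() from a monitoring thread on some other thread, since sampling the calling thread also counts the sampling itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Read the CPU time consumed so far by thread `t`, in nanoseconds.
   Returns 0 on success, nonzero on failure. */
static int thread_cpu_ns(pthread_t t, long long *out_ns)
{
    clockid_t cid;
    struct timespec ts;
    int rc = pthread_getcpuclockid(t, &cid);

    if (rc != 0)
        return rc;
    if (clock_gettime(cid, &ts) != 0)
        return -1;
    *out_ns = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}

/* Sample thread `t` twice, `interval_ms` apart. Returns 1 if it gained
   CPU time in between, 0 if not, -1 on error. A thread spinning in a
   yield or fault loop keeps gaining time; a blocked or stopped one
   does not. */
static int thread_is_computing(pthread_t t, unsigned interval_ms)
{
    long long before, after;
    struct timespec interval = { (time_t)(interval_ms / 1000u),
                                 (long)(interval_ms % 1000u) * 1000000L };

    if (thread_cpu_ns(t, &before) != 0)
        return -1;
    nanosleep(&interval, NULL);
    if (thread_cpu_ns(t, &after) != 0)
        return -1;
    return after > before;
}
```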
|
1507.13 | Not hung, just moving very slowly? | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Mar 20 1997 12:09 | 10 |
| .11> The traces for threads 1-6 stay pretty much the same in
.11> both dbx -k /vmunix and when I decladebug the hung test.
But they don't stay _exactly_ the same?? If they are varying at all, I suggest
that your program is not really "hung" (or, at least, not in the way that is
typically meant in this conference). Instead, it's probably just making very
slow progress.
Webb
|
1507.14 | no thread collects computes | DECWET::ARTI | | Thu Mar 20 1997 15:49 | 14 |
| I ran ps mp on the hung process several times, but I always get the
following output with no change:
# ps mp 511
PID TTY S TIME CMD
511 pts/0 T + 10:43.10 ./mmap_syscalls_exer -b -t4 -I10s
T 0:00.05
T 0:00.00
T 2:41.57
T 0:00.00
T 2:40.63
T 2:42.87
T 2:37.96
So I assume that the _XentMM is a page fault and not a SEGV, correct?
|
1507.15 | really hung.. | DECWET::ARTI | | Thu Mar 20 1997 15:50 | 4 |
| Re. .13.
I meant to say that the traces stay _exactly_ the same. The process is
really and truly hung.
|
1507.16 | Um...try again.... | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Mar 20 1997 18:10 | 8 |
| .14> 511 pts/0 T + 10:43.10 ./mmap_syscalls_exer -b -t4 -I10s
Um...the "T" means the process has been "stopped". Did you hit ^Z or do you
have this under the control of the debugger? Given that the threads are all
stopped, of course they are not accumulating any CPU time...
Webb
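To illustrate the point about the "T" state, here is a sketch using POSIX fork() and waitpid(); stop_then_continue_child() is a hypothetical name. A child stops itself the way ^Z stops a job. While stopped it runs no code and gains no CPU time, and it only proceeds after SIGCONT, which is what the shell's fg or bg, or kill -CONT PID, sends. A process held by a debugger is normally resumed from the debugger instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself, confirm it is stopped, resume it with
   SIGCONT, and return its exit status (7), or -1 if any step fails. */
static int stop_then_continue_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);   /* child enters the stopped ("T") state */
        _exit(7);         /* reached only after SIGCONT */
    }
    /* WUNTRACED also reports children that have stopped, not just exited. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    if (kill(pid, SIGCONT) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```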
|