
Conference clt::cma

Title:DECthreads Conference
Moderator:PTHRED::MARYSTEON
Created:Mon May 14 1990
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1553
Total number of notes:9541

1507.0. "debugging help needed.." by DECWET::ARTI () Tue Mar 18 1997 14:36

Hi,

  I am running a multi-threaded test on Steel bl8 that does mmap
operations. I start it with four threads and it hangs shortly after.
When I attach to the hung process with ladebug and display the threads,
here is the output:

(ladebug) show thread
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
*    6 running                    throughput 11       <anonymous>
     1 running                    throughput 11       default thread
    -1 blocked    kernel          fifo       32       manager thread
    -2 ready                      idle        0       null thread for VP 0x0
    -3 ready                      idle        0       null thread for VP 0x1
    -4 ready                      idle        0       null thread for VP 0x2
    -5 ready                      idle        0       null thread for VP 0x3
     2 blocked    kernel          throughput 11       <anonymous>
     3 ready                      throughput 11       <anonymous>
     4 running                    throughput 11       <anonymous>
>    5 running                    throughput 11       <anonymous>

The four threads of interest to me are threads 3 through 6; their traces follow.
Two of them are in swtch_pri(), one is in memcpy(), and one is in
hstRestoreRegisters(). Any help in interpreting this information and
debugging this hang would be much appreciated.

thanks,
arti
===========================================================================

(ladebug) thread 3
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>    3 ready                      throughput 11       <anonymous>

(ladebug) where
>0  0x3ff8057c46c in hstRestoreRegisters(0x3ff80565c14, 0x3ff80579c1c, 0x3ff8057c3f0, 0x4c28, 0x14008f790, 0x3ffc01878a0) DebugInformationStrippedFromFile104
#1  0x3ff80579c18 in hstTransferContext(0x3ffc0183928, 0x140002340, 0x1400048e0, 0x140015100, 0x0, 0x140002340) DebugInformationStrippedFromFile102
#2  0x3ff805644e0 in dspDispatch(0x3ffc0188b50, 0x140002340, 0x3ff80565bd8, 0x3ffc01878a0, 0x3ff00000001, 0x140002340) DebugInformationStrippedFromFile82
#3  0x3ff80565c10 in /usr/shlib/libpthread.so
#4  0x3ff80572340 in pthread_yield_np(0x3ff80572344, 0x140020030, 0x3ff8011549c, 0x140015100, 0x1400022d0, 0x140020030) DebugInformationStrippedFromFile94
#5  0x3ff80115498 in __tis_yield(0x3ff8011549c, 0x140015100, 0x1400022d0, 0x140020030, 0x3ff80157bb8, 0x140015100) DebugInformationStrippedFromFile748
#6  0x3ff80157bb4 in __sched_yield(0x1400022d0, 0x140020030, 0x3ff80157bb8, 0x140015100, 0x1200086e4, 0x140001168) DebugInformationStrippedFromFile467
#7  0x1200086e0 in threadRaceLineUp(0x1200086e4, 0x140001168, 0x1200078f4, 0x140015100, 0x1400022d0, 0x140001560) DebugInformationStrippedFromFile3
#8  0x1200078f0 in /usr/users/arti/arti/mmap/exer/fs/mmap_syscalls_exer
#9  0x1200051ec in mmap_syscallRace(0x0, 0x0, 0x120009e64, 0x140015100, 0x120009e78, 0x140015100) DebugInformationStrippedFromFile1
#10 0x120009e74 in raceFuncWrapper(0x140020030, 0x140015100, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#11 0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94

(ladebug) thread 4
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>    4 running                    throughput 11       <anonymous>

(ladebug) where
>0  0x3ff800d4fb8 in memcpy(0x3ff800d4fb8, 0x14000c2c0, 0x20000006394, 0x14009b9c0, 0x20, 0x2000000000000000) DebugInformationStrippedFromFile19
#1  0x120006c1c in UnknownProcedure14FromFile1(0x2524232221201f1e, 0x2d2c2b2a29282726, 0x3534333231302f2e, 0x100000020, 0x2c0, 0x140015118) DebugInformationStrippedFromFile1
#2  0x1200051ec in mmap_syscallRace(0x3534333231302f2e, 0x100000020, 0x2c0, 0x140015118, 0x120009e78, 0x140015118) DebugInformationStrippedFromFile1
#3  0x120009e74 in raceFuncWrapper(0x14002cb50, 0x140015118, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#4  0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94

(ladebug) thread 5
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>    5 running                    throughput 11       <anonymous>

(ladebug) where
>0  0x3ff80540858 in swtch_pri(0x3ff80565bd8, 0x3ffc01878a0, 0x3ff00000001, 0x140002340, 0x3ff80572344, 0x14002d0b0) DebugInformationStrippedFromFile21
#1  0x3ff805722bc in pthread_yield_np(0x3ff80572344, 0x14002d0b0, 0x3ff8011549c, 0x140015130, 0x1400022d0, 0x0) DebugInformationStrippedFromFile94
#2  0x3ff80115498 in __tis_yield(0x3ff8011549c, 0x140015130, 0x1400022d0, 0x0, 0x3ff80157bb8, 0x0) DebugInformationStrippedFromFile748
#3  0x3ff80157bb4 in __sched_yield(0x1400022d0, 0x0, 0x3ff80157bb8, 0x0, 0x1200086e4, 0x1400013e0) DebugInformationStrippedFromFile467
#4  0x1200086e0 in threadRaceLineUp(0x1200086e4, 0x1400013e0, 0x12000660c, 0x140015130, 0x1400022d0, 0x14002d1b0) DebugInformationStrippedFromFile3
#5  0x120006608 in UnknownProcedure10FromFile1(0x1400022d0, 0x14002d1b0, 0x1200051f0, 0x140015130, 0x20000002000, 0x140015130) DebugInformationStrippedFromFile1
#6  0x1200051ec in mmap_syscallRace(0x1200051f0, 0x140015130, 0x20000002000, 0x140015130, 0x120009e78, 0x140015130) DebugInformationStrippedFromFile1
#7  0x120009e74 in raceFuncWrapper(0x14002d0b0, 0x140015130, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#8  0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94

(ladebug) thread 6
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>*   6 running                    throughput 11       <anonymous>

(ladebug) where
>0  0x3ff80540858 in swtch_pri(0x3ff80565bd8, 0x3ffc01878a0, 0x3ff00000001, 0x140002340, 0x3ff80572344, 0x14002d610) DebugInformationStrippedFromFile21
#1  0x3ff805722bc in pthread_yield_np(0x3ff80572344, 0x14002d610, 0x3ff8011549c, 0x140015148, 0x1400022d0, 0x0) DebugInformationStrippedFromFile94
#2  0x3ff80115498 in __tis_yield(0x3ff8011549c, 0x140015148, 0x1400022d0, 0x0, 0x3ff80157bb8, 0x0) DebugInformationStrippedFromFile748
#3  0x3ff80157bb4 in __sched_yield(0x1400022d0, 0x0, 0x3ff80157bb8, 0x0, 0x1200086e4, 0x140001470) DebugInformationStrippedFromFile467
#4  0x1200086e0 in threadRaceLineUp(0x1200086e4, 0x140001470, 0x12000686c, 0x140015148, 0x1400022d0, 0x14002d710) DebugInformationStrippedFromFile3
#5  0x120006868 in UnknownProcedure12FromFile1(0x1400022d0, 0x14002d710, 0x1200051f0, 0x140015148, 0x2000000c000, 0x140015148) DebugInformationStrippedFromFile1
#6  0x1200051ec in mmap_syscallRace(0x1200051f0, 0x140015148, 0x2000000c000, 0x140015148, 0x120009e78, 0x140015148) DebugInformationStrippedFromFile1
#7  0x120009e74 in raceFuncWrapper(0x14002d610, 0x140015148, 0x3ffc0183890, 0x1, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile3
#8  0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94


1507.1. "Can you get a stack trace for all the threads?" by PTHRED::PORTANTE (Peter Portante, DTN 381-2261, (603)881-2261, MS ZKO2-3/Q18) Tue Mar 18 1997 15:11
Arti,

Could you post the stack traces for the other threads in the program as well?
Most likely the problem is being caused by one of those.  The ones that are
running look fine...

-Peter
1507.2. "All of your threads look more or less OK...what's #1 doing?" by WTFN::SCALES (Despair is appropriate and inevitable.) Tue Mar 18 1997 15:33
Well, thread 3 looks OK -- it's not running at the moment (i.e., it's "ready"),
so the top of its stack is as it should be, hstRestoreRegisters().  Threads 3,
5, and 6 are all in calls to sched_yield() called from threadRaceLineUp(), which
doesn't look too weird.  (I'd guess that this is either the beginning or the end
of a test run...)  You might want to look at whether you think they _ought_ to
be looping around a call to sched_yield().
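Webb's question of whether the racers ought to be looping around sched_yield() at all can be made checkable. A minimal sketch, assuming C11 <stdatomic.h> and a flag the racers poll; wait_flag_with_yield() and its yield limit are hypothetical, not part of the test suite. The loop still yields, but it gives up after a bounded number of passes, so a wedged line-up shows up as a reported failure rather than an apparently busy process.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical diagnostic helper (not from the test suite): spin until
   *flag equals 'wanted', calling sched_yield() between checks, but give up
   after max_yields passes so the caller can report a possible livelock
   instead of looping silently. *yields_used is set either way. */
bool wait_flag_with_yield(atomic_int *flag, int wanted, long max_yields,
                          long *yields_used)
{
    long n = 0;
    while (atomic_load(flag) != wanted) {
        if (n >= max_yields) {
            *yields_used = n;
            return false;          /* flag never reached 'wanted' */
        }
        sched_yield();             /* a scheduling hint; may return at once */
        n++;
    }
    *yields_used = n;
    return true;
}
```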

Thread 4 looks like it's either just begun or not yet finished its piece of the
"race" (i.e., it's in memcpy(), presumably that's part of the test).

The missing piece is thread 1...it's also supposedly running at the moment. 
What's it doing??


				Webb
1507.3. "the traces for the other threads.." by DECWET::ARTI () Tue Mar 18 1997 15:55
The stack traces for the other threads follow. Thread 2 is in sigwaitprim(),
which is OK, as that thread is always started up by the test to catch signals.
Thread 1 is in a function called raceStarter().

arti
======================================================================
(ladebug) thread 1
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>    1 running                    throughput 11       default thread

(ladebug) where
>0  0x12000a6f4 in raceStarter(0x12000a728, 0x40e8000000000000, 0x120007fc8, 0x1400022d0, 0xb00, 0x0) DebugInformationStrippedFromFile3
#1  0x120007fc4 in threadRace(0x120005084, 0x140000058, 0xb00, 0x0, 0x120005170, 0x0) DebugInformationStrippedFromFile3
#2  0x120005080 in main(0x3ff80017380, 0x140018f00, 0x140010d20, 0x140000220, 0x120003760, 0x0) DebugInformationStrippedFromFile1

(ladebug) thread -1
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>   -1 blocked    kernel          fifo       32       manager thread

(ladebug) where
>0  0x3ff80540774 in msg_receive_trap(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile21
#1  0x3ff805365b8 in msg_receive(0x3ffc01878a0, 0x3ffc0188c30, 0x0, 0x400, 0x3ffc018a100, 0x0) DebugInformationStrippedFromFile6
#2  0x3ff8056c888 in UnknownProcedure3FromFile90(0x0, 0x0, 0x3ff00000001, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile90
#3  0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94

(ladebug) thread -2
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>   -2 ready                      idle        0       null thread for VP 0x0

(ladebug) where
>0  0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1  0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102

(ladebug) thread -3
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>   -3 ready                      idle        0       null thread for VP 0x1

(ladebug) where
>0  0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1  0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102

(ladebug) thread -4
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>   -4 ready                      idle        0       null thread for VP 0x2

(ladebug) where
>0  0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1  0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102

(ladebug) thread -5
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>   -5 ready                      idle        0       null thread for VP 0x3

(ladebug) where
>0  0x3ff8057c46c in hstRestoreRegisters(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile104
#1  0x3ff80579c18 in hstTransferContext(0x0, 0x3ffc018fb40, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile102

(ladebug) thread 2
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>    2 blocked    kernel          throughput 11       <anonymous>

(ladebug) where
>0  0x3ff801117c8 in __sigwaitprim(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile867
#1  0x3ff8056e714 in __sigwaitd10(0x0, 0x0, 0x0, 0x0, 0x12000aa14, 0x14005ba30) DebugInformationStrippedFromFile92
#2  0x12000aa10 in /usr/users/arti/arti/mmap/exer/fs/mmap_syscalls_exer
#3  0x3ff8057432c in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile94
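Thread 2's trace ends in __sigwaitprim(), which fits the dedicated signal-catching thread described above. A minimal sketch of that pattern in portable POSIX C, assuming the test does something similar; signal_catcher() and start_signal_catcher() are hypothetical names. The signals are blocked before any other threads are created, so every thread inherits the mask and one thread takes them synchronously with sigwait().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdint.h>

/* Hypothetical signal-catching thread: sits in sigwait() until one of the
   signals in *set arrives, then returns the signal number (or -1). */
static void *signal_catcher(void *arg)
{
    const sigset_t *set = arg;
    int sig = 0;
    if (sigwait(set, &sig) != 0)
        sig = -1;
    return (void *)(intptr_t)sig;
}

/* Block 'set' in the calling thread (threads created later inherit the
   mask) and start the catcher. Call before creating any other threads.
   'set' must stay valid for as long as the catcher runs. */
int start_signal_catcher(pthread_t *tid, sigset_t *set)
{
    int rc = pthread_sigmask(SIG_BLOCK, set, NULL);
    if (rc != 0)
        return rc;
    return pthread_create(tid, NULL, signal_catcher, set);
}
```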
1507.4. "raceStarter function" by DECWET::ARTI () Tue Mar 18 1997 16:03
Here's the code for the raceStarter function:

// raceStarter - synchronizes thread execution (threadRaceLineUp)

void
raceStarter (void)
{
        register struct perThreadInfo *thisThreadsInfo;
        register int threadReadyCnt; // how many threads are in Ready state
        register struct raceInfo *r = &raceInfo;


        DEBUG ("raceStarter: turning On lineUpSync");

        r->lineUpSync = On;  // turn on thread race line up synchronization.
                             // it will be turned off by the last racing thread
                             // to return from the users race function.
                             // no mutex is needed.

        while (1) {

                // spin here until all race threads mark themselves ready

                do {
                        threadReadyCnt = 0;

                        // begin check at the last thread to improve response

                        for (thisThreadsInfo = &r->perThreadInfo[r->numThreads - 1];
                             thisThreadsInfo >= &r->perThreadInfo[0]; thisThreadsInfo--) {

                                if (thisThreadsInfo->threadState == Ready)
                                        threadReadyCnt++;
                                else {
                                        DEBUG1( "raceStarter: yield on threadReadyCnt %d",
                                                threadReadyCnt);
#ifdef WANT_YIELD_COUNTS
                                        if (++r->raceStarterYieldCnt >
                                                    r->raceStarterYieldCntMax)
                                                r->raceStarterYieldCntMax =
                                                        r->raceStarterYieldCnt;
#endif

                                        pthread_yield();
                                        break;
                                }
                        }

                        if (r->raceCancelled || r->lineUpSync == Off)
                                return;

                } while (threadReadyCnt != r->numThreads);

#ifdef WANT_YIELD_COUNTS
                r->raceStarterYieldCnt = 0;
#endif

                // Ready

                DEBUG("raceStarter: Go = No");
                r->Go = No;     // hold back racing threads from executing
                                // by initializing Go to No.

                mb();

                // Set

                DEBUG("raceStarter: threadState = Set");
                for (thisThreadsInfo = &r->perThreadInfo[0];
                     thisThreadsInfo < &r->perThreadInfo[r->numThreads];
                     thisThreadsInfo++)

                        thisThreadsInfo->threadState = Set;

                // Go

                mb();

                DEBUG("raceStarter: Go = Yes");
                r->Go = Yes;

                pthread_yield(); // hopefully racers will run
        }
}

1507.5. "dbx output of hung test" by DECWET::ARTI () Tue Mar 18 1997 16:13
Also, if I dbx the running kernel and print out the traces
for the threads in the hung test, I get the following. Note
that it appears that one thread got a memory-management fault
but got preempted during the trap(). How do I look at the
kernel threads from ladebug? Do I need to set $threadlevel?

arti

(dbx) tstack

Thread 0xfffffc000f5878c0:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 msg_dequeue(0xfffffc000f183ea0, 0xfffffc000f183ee8, 0xffffffff8043f020, 0xffffffff8cb8b910, 0x28) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/ipc_basics.c":926, 0xfffffc0000296328]
   2 msg_receive_trap(0x3ffc018a100, 0x0, 0xfffffc0007c15e00, 0x14001f680, 0x2800000400) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/ipc_basics.c":1292, 0xfffffc00002969dc]
   3 _Xsyscall(0x8, 0x3ff80540774, 0x3ffc017d250, 0x14001f680, 0x400) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1404, 0xfffffc00004fc1f0]

Thread 0xfffffc000facbb80:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 mpsleep(0xfffffc0007d88a80, 0x9, 0x0, 0xfffffc000facbd90, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/bsd/kern_synch.c":592, 0xfffffc0000272414]
   2 nxm_get_thread() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":4055, 0xfffffc00002ac730]

Thread 0xfffffc000faca840:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 mpsleep(0xfffffc0007d88ca8, 0x1001, 0x0, 0xfffffc000facaa50, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/bsd/kern_synch.c":592, 0xfffffc0000272414]
   2 sigwaitprim() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/bsd/kern_sig.c":2679, 0xfffffc000026bc30]
   3 syscall(0x0, 0xffffffffffffffde, 0x0, 0xa3760332e7560, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/syscall_trap.c":552, 0xfffffc0000506690]
   4 _Xsyscall(0x8, 0x3ff801117c8, 0x3ffc018fb40, 0x2, 0x0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1255, 0xfffffc00004fc094]

Thread 0xfffffc00096ae840:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 swtch_pri(0x140002008, 0xfffffc0000000005, 0xfffffc00004fc1f4, 0xffffffffffffffc5, 0x1400022d0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/syscall_subr.c":374, 0xfffffc00002acc58]
   2 _Xsyscall(0x8, 0x3ff80540858, 0x3ffc018fb40, 0x0, 0x14002d610) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1404, 0xfffffc00004fc1f0]

Thread 0xfffffc00096aedc0:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 thread_preempt(thread = 0xfffffc00096aedc0, processor = (nil)) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":3917, 0xfffffc00002ac400]
   2 trap() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/trap.c":2112, 0xfffffc0000508274]
   3 exception_exit(0x8, 0x12000a6f4, 0x14000c2c0, 0x3ffc0188b50, 0x1) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1005, 0xfffffc00004fbe68]

Thread 0xfffffc00096aeb00:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 thread_preempt(thread = 0xfffffc00096aeb00, processor = (nil)) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":3917, 0xfffffc00002ac400]
   2 trap() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/trap.c":2112, 0xfffffc0000508274]
   3 _XentMM(0x8, 0x3ff800d4fb8, 0x14000c2c0, 0x20000006394, 0x14009b9c0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1473, 0xfffffc00004fc2d4]

Thread 0xfffffc00096ae2c0:
>  0 thread_block() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":2206, 0xfffffc00002a9d68]
   1 thread_preempt(thread = 0xfffffc00096ae2c0, processor = (nil)) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/kern/sched_prim.c":3917, 0xfffffc00002ac400]
   2 trap() ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/trap.c":2112, 0xfffffc0000508274]
   3 exception_exit(0x8, 0x3ff80540858, 0x3ffc018fb40, 0x0, 0x14002d0b0) ["/usr/sde/osf1/build/steelos.bl8/src/kernel/arch/alpha/locore.s":1005, 0xfffffc00004fbe68]

1507.6. by SMURF::DENHAM (Digital UNIX Kernel) Tue Mar 18 1997 21:19
    That thread_preempt() under _XentMM could be handling a page fault,
    or it could be stuck taking a SEGV in its SEGV handler.
    Try tracing the same thread a bunch of times in the kernel. If
    you see pretty much the same trace but the thread is collecting
    computes, it's most likely the SEGV. Then you could try setting
    the SIGSEGV handler to SIG_DFL in main() and see if you get
    a core dump....
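The SIG_DFL suggestion, sketched with sigaction(). restore_default_segv() is a hypothetical helper, and it has to run after the test installs its own SIGSEGV handler, wherever that happens (not shown in this thread). With the default action back in place, a repeating SEGV terminates the process with a core dump, subject to the core-size limit, instead of re-entering the handler. A plain signal(SIGSEGV, SIG_DFL) would do the same job.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: reset SIGSEGV to its default action (terminate with
   a core dump) so a fault is not silently re-entered in a handler.
   Returns 0 on success, -1 on failure (as sigaction() does). */
int restore_default_segv(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```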
1507.7. "There is something missing here..." by WTFN::SCALES (Despair is appropriate and inevitable.) Wed Mar 19 1997 09:50
There's something fishy going on here...Arti, when this thing is "hung" do
the stack traces for threads 1-6 always look pretty much the same, or, if you
let it run again for a sec and then check again do they switch around?

I have the feeling that this program might be doing pretty much what you told
it to -- do a sh*load of yields.  (The calls to swtch_pri() are just a side
effect of intensively yielding on a multiprocessor.)

It looks to me like there is at least one hole in the raceStarter() code.
The "starter" starts the threads and then yields, but it never gets any
assurance from the threads that they actually started.  Thus, when it loops
around to see if they are now "ready", it can't tell whether a thread is
ready now because it already finished or because it never started!  I think
you need some sort of explicit "I did run" mechanism from each thread.
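One way to add the explicit "I did run" acknowledgment, sketched with C11 atomics in place of the test's plain flags and mb() calls. struct start_gate, gate_start_race() and gate_wait_for_start() are hypothetical names, not the suite's API. The starter bumps a generation number to say "Go" and then waits until every racer has counted itself in for that generation, so any Ready state it sees afterwards cannot be left over from before the race began.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical start gate with an explicit "I did run" acknowledgment.
   Must start zeroed (static storage, or atomic_init() on both fields). */
struct start_gate {
    atomic_int generation;  /* bumped by the starter once per race ("Go") */
    atomic_int started;     /* racers that have acknowledged this generation */
};

/* Starter side: open the gate for a new race, then wait until all
   nracers have acknowledged it. Only after this returns is a racer's
   later "Ready" state known to mean the end of this race. */
void gate_start_race(struct start_gate *g, int nracers)
{
    atomic_store(&g->started, 0);          /* reset before opening the gate */
    atomic_fetch_add(&g->generation, 1);   /* "Go" */
    while (atomic_load(&g->started) < nracers)
        sched_yield();
}

/* Racer side: wait for a generation newer than last_gen (the value from
   this racer's previous call, or 0 the first time), report "I did run",
   and return the new generation. */
int gate_wait_for_start(struct start_gate *g, int last_gen)
{
    int gen;
    while ((gen = atomic_load(&g->generation)) == last_gen)
        sched_yield();
    atomic_fetch_add(&g->started, 1);      /* the acknowledgment */
    return gen;
}
```

The waits still spin on sched_yield() to stay close to the original style, so they remain only a scheduling hint; the gain is that the starter can no longer mistake a racer that never started for one that already finished.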

One other tidbit:  you should understand that when you call pthread_yield()
there is no guarantee that anything special will happen; the call is a "hint"
to the DECthreads scheduler that "now would be a good time to preempt me",
but there is no requirement that the preemption must actually occur.  Thus,
you cannot really rely on the raceStarter thread losing control at any time,
as the code is currently written -- that is, this thread could just soak up
one of your processors for the whole test, forcing you to run the other four
threads on the other three processors...  (You might want to make it sleep or
block in some other way.)
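A minimal sketch of the "sleep instead of yield" variant for the starter's polling loop, assuming the per-thread Ready states are reduced to an array of atomic flags; wait_all_ready_sleeping() and its parameters are hypothetical. nanosleep() actually blocks the thread for the interval, so the starter stops soaking up a processor between checks, and the pass limit turns a line-up that never completes into a false return instead of an endless loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical polling loop: check n ready flags, sleeping sleep_us
   microseconds between passes. Returns true once all are set, or false
   after max_passes unsuccessful passes. */
bool wait_all_ready_sleeping(atomic_int *ready, int n, long sleep_us,
                             long max_passes)
{
    struct timespec ts = { .tv_sec = sleep_us / 1000000,
                           .tv_nsec = (sleep_us % 1000000) * 1000 };
    for (long pass = 0; pass < max_passes; pass++) {
        int count = 0;
        for (int i = 0; i < n; i++)
            if (atomic_load(&ready[i]))
                count++;
        if (count == n)
            return true;
        /* Really blocks this thread; an early return on a signal only
           shortens one pass, so the result is ignored. */
        nanosleep(&ts, NULL);
    }
    return false;
}
```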


				Webb
1507.8. by SMURF::DENHAM (Digital UNIX Kernel) Wed Mar 19 1997 10:32
    This race starter stuff is the core of the suite of MT SMP exercisers.
    The race start always looks like a great candidate for a barrier
    synchronization setup. I've always disliked the yield approach
    because it makes the program look busy when it can, for all intents
    and purposes, be considered hung...
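A minimal sketch of the barrier-based line-up suggested here, using the POSIX pthread_barrier_* calls. These are a later POSIX addition and may well not exist in the DECthreads version discussed in this thread, so treat this as an illustration of the idea rather than a drop-in fix; race_line_init(), race_line_up() and race_line_destroy() are hypothetical names. The starter counts as one extra participant, so nobody leaves the line until all racers and the starter have arrived.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_barrier_t start_line;

/* +1 so the starter thread also waits at the line. */
int race_line_init(unsigned nracers)
{
    return pthread_barrier_init(&start_line, NULL, nracers + 1);
}

/* Called by the starter and by every racer; blocks (no spinning) until
   all participants arrive. Returns 1 for exactly one caller per round
   (the "serial" thread) and 0 otherwise; other error codes are folded
   into 0 in this sketch. */
int race_line_up(void)
{
    int rc = pthread_barrier_wait(&start_line);
    return rc == PTHREAD_BARRIER_SERIAL_THREAD;
}

int race_line_destroy(void)
{
    return pthread_barrier_destroy(&start_line);
}
```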
1507.9. "No pthreads! (Well, not if they can help it...)" by WTFN::SCALES (Despair is appropriate and inevitable.) Wed Mar 19 1997 11:40
.8> The race start always looks like a great candidate for a barrier
.8> synchronization setup.

True -- but the test suite seems to have this rule about not using pthread
functions, except where absolutely necessary (e.g., you'll note the total
absence of mutexes and condition variables, despite the fact that they'd
probably wipe out most of the bugs).  So, it'd have to be a "special" barrier...
 :-p
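If pthread calls are off-limits, one possible "special" barrier is the textbook sense-reversing spin barrier, sketched here with C11 atomics only. It is a swapped-in technique, not something from the test suite, and the names are hypothetical. Each participant keeps its own local_sense flag (initially false); the last arrival resets the count and then flips the shared sense, which releases everyone waiting in that round and leaves the barrier ready for the next one.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sense-reversing spin barrier: no pthread calls, only C11 atomics. */
struct spin_barrier {
    atomic_int remaining;  /* arrivals still expected in this round */
    atomic_bool sense;     /* flips each time the barrier opens */
    int count;             /* number of participating threads */
};

void spin_barrier_init(struct spin_barrier *b, int count)
{
    b->count = count;
    atomic_init(&b->remaining, count);
    atomic_init(&b->sense, false);
}

/* 'local_sense' is per-thread state owned by the caller, initially false. */
void spin_barrier_wait(struct spin_barrier *b, bool *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->remaining, 1) == 1) {
        /* Last arrival: reset the count first, then release everyone. */
        atomic_store(&b->remaining, b->count);
        atomic_store(&b->sense, *local_sense);
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();  /* still just a hint, as noted in .7 */
    }
}
```

Like the original code it spins, so the yield caveat from .7 still applies; what it changes is that each round has an unambiguous boundary instead of the Ready/Set/Go flag handshake.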


				Webb
1507.10. "duh.." by DECWET::ARTI () Wed Mar 19 1997 22:21
    I see the same trace. What field of the thread do I look at
    to figure out if it's collecting computes?
1507.11. "reply for 1507.7" by DECWET::ARTI () Wed Mar 19 1997 22:26
    Oops, my previous reply was a response to 1507.6.
    This note is a response to 1507.7.
    The traces for threads 1-6 stay pretty much the same in
    both dbx -k /vmunix and when I decladebug the hung test.
    Sounds like I may need to talk to the person that wrote
    the test.
1507.12. by SMURF::DENHAM (Digital UNIX Kernel) Thu Mar 20 1997 08:37
    Computes: well, running ps mp PID on the threaded process a few
    times will show which kernel threads are building up CPU
    time. If you then count down from the top of the ps
    output (starting at 0) to the computing thread, you can
    usually correlate it with the kernel thread in dbx -k /vmunix.
    Set $pid to that process and do a tlist command. Count down
    that list (starting with 1 this time) to the same thread number
    and tset there. To see if it's going anywhere, you can try
    printing these sorts of values:
    
    (dbx) p thread.sched_stamp
    (dbx) p thread.last_run_stamp
    (dbx) p thread.sleep_stamp
    
    The first 2 should keep changing if it's ever running. The last
    one will change if it's running then blocking occasionally.
    
    FWIW, eh?
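Those stamps are read from the kernel side. From inside a threaded program, a rough user-space analogue is to sample a thread's CPU-time clock twice and see whether it moved, which mirrors watching the per-thread TIME column of ps mp. A minimal sketch, assuming a system that supports the POSIX per-thread CPU-time clocks (pthread_getcpuclockid()); thread_cpu_ns() is a hypothetical helper. A stopped or blocked thread will simply show no change.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed so far by thread t, in
   nanoseconds, or -1 on error. Sample it twice some time apart; if the
   value grows, the thread is burning CPU. */
long long thread_cpu_ns(pthread_t t)
{
    clockid_t cid;
    struct timespec ts;
    if (pthread_getcpuclockid(t, &cid) != 0)
        return -1;
    if (clock_gettime(cid, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```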
1507.13. "Not hung, just moving very slowly?" by WTFN::SCALES (Despair is appropriate and inevitable.) Thu Mar 20 1997 12:09
.11> The traces for threads 1-6 stay pretty much the same in
.11> both dbx -k /vmunix and when I decladebug the hung test.

But they don't stay _exactly_ the same??  If they are varying at all, I suggest
that your program is not really "hung" (or, at least, not in the way that is
typically meant in this conference).  Instead, it's probably just making very
slow progress.
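Comparing stack traces by eye is a coarse way to tell "hung" from "very slow". A minimal sketch of a cheaper signal, assuming each racer can bump a shared counter once per completed iteration; note_progress() and made_progress() are hypothetical. A watchdog that samples the counter periodically sees it change in the slow case and stay put in the truly hung case.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical progress counter, bumped once per completed iteration. */
static atomic_ulong progress;

void note_progress(void)
{
    atomic_fetch_add(&progress, 1);
}

/* Compare the counter with the previous sample stored in *last, update
   *last, and report whether anything happened in between. */
bool made_progress(unsigned long *last)
{
    unsigned long now = atomic_load(&progress);
    bool moved = (now != *last);
    *last = now;
    return moved;
}
```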


				Webb
1507.14. "no thread collects computes" by DECWET::ARTI () Thu Mar 20 1997 15:49
    I ran ps mp on the hung process several times but always get the
    following output, with no change:
    # ps mp 511
       PID TTY      S           TIME CMD
       511 pts/0    T  +    10:43.10 ./mmap_syscalls_exer -b -t4 -I10s
                    T        0:00.05                                                
                    T        0:00.00                                                
                    T        2:41.57                                                
                    T        0:00.00                                                
                    T        2:40.63                                                
                    T        2:42.87                                                
                    T        2:37.96                                                
    
    So I assume that the _XentMM is a page fault and not a SEGV, correct?
1507.15. "really hung.." by DECWET::ARTI () Thu Mar 20 1997 15:50
    Re. .13.
    
    I meant to say that the traces stay _exactly_ the same. The process is
    really and truly hung.
1507.16. "Um...try again...." by WTFN::SCALES (Despair is appropriate and inevitable.) Thu Mar 20 1997 18:10
.14>       511 pts/0    T  +    10:43.10 ./mmap_syscalls_exer -b -t4 -I10s

Um...the "T" means the process has been "stopped".  Did you hit ^Z or do you
have this under the control of the debugger?  Given that the threads are all
stopped, of course they are not accumulating any CPU time...


				Webb