
Conference clt::cma

Title:DECthreads Conference
Moderator:PTHRED::MARYSTEON
Created:Mon May 14 1990
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1553
Total number of notes:9541

1513.0. "Locking a mutex" by MUFFIT::helen (Helen Pratt) Wed Mar 26 1997 13:42


I am working with a 3rd party who are currently testing their product
on Digital UNIX V4.0A with the majority of the patches available.

They are currently at the stage of running stress tests and yesterday
after running for just over 1.5 hours on a 2 cpu system, one
of the major components of the system hung.  They managed to attach
with ladebug and gathered thread stack, mutex and condition variable
information.  I have been through this today and I'm a bit confused
by a mutex which doesn't appear to be locked, but has two threads which
are blocked attempting to lock it - can anyone help me in my confusion?

The following is, I believe, the pertinent information from ladebug. (I have
removed the information about the other threads, but if anyone wants to take
a look, let me know.)

Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
    51 blocked    mutex wait      throughput 11       <anonymous>
    72 blocked    mutex wait      throughput 11       <anonymous>

Mutex 706 (normal) "mutex at 0x140201680" (0x140201680, block 0x1401ffcb0) is
  not locked, 2 threads waiting; ref count is 5; waiters: 51, 72


Stack trace for thread 51
#0  0x3ff8057bbb4 in /usr/shlib/libpthread.so
#1  0x3ff80566820 in /usr/shlib/libpthread.so
#2  0x3ff80564254 in dspDispatch(0x1401ffcb0, 0x141090268, 0x1410900f0, 0x0, 0x140201680, 0x100000000) DebugInformationStrippedFromFile89:???
#3  0x3ff80567c50 in pthread_mutex_block(0x0, 0x30040186a30, 0x475, 0x0, 0x140201680, 0x0) DebugInformationStrippedFromFile96:???
#4  0x3ff8057b9b0 in __pthread_mutex_lock(0x30040186a30, 0x475, 0x0, 0x140201680, 0x0, 0x3ff805afc00) DebugInformationStrippedFromFile112:???
#5  0x3ff805afbfc in pthread_mutex_lock(0x0, 0x140201680, 0x0, 0x3ff805afc00, 0x30000113018, 0x1410d11b8) DebugInformationStrippedFromFile7:???
#6  0x30000113014 in tc_watchDog_AddWatch(0x100000001, 0x12004b500, 0x0, 0x0, 0x0, 0x100000000) DebugInformationStrippedFromFile93:???
#7  0x1200c2db4 in UnknownProcedure6FromFile23(0x14173e938, 0x1410d0e80, 0x2, 0x1410cfdb0, 0x100000000, 0xa02f3) DebugInformationStrippedFromFile23:???
#8  0x1200c9e64 in UnknownProcedure45FromFile23(0x1410d11b8, 0x1, 0xa02f3, 0x3, 0x100000000, 0x0) DebugInformationStrippedFromFile23:???
#9  0x1200ca984 in svr_Read(0x14112f160, 0x140f6ee20, 0x141613ce0, 0x1416fa6e0, 0xabcdef02, 0x30000110088) DebugInformationStrippedFromFile23:???
#10 0x12008f43c in svr_FltTFRead(0x100000132, 0x100000002, 0x1410d15e8, 0x0, 0x100000000, 0x1410d1108) DebugInformationStrippedFromFile18:???
#11 0x12009e4cc in UnknownProcedure16FromFile21(0x1200b63b0, 0x1410d36a8, 0x1410d36b0, 0xe8, 0x14113fb58, 0x1401a1180) DebugInformationStrippedFromFile21:???
#12 0x1200b6834 in /opt/encina/bin/sfs
#13 0x3000014bd3c in UnknownProcedure17FromFile112(0x1410900f0, 0x1410a4958, 0x100000001, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile112:???
#14 0x300001e0ec4 in UnknownProcedure12FromFile179(0x3, 0x0, 0x3ff80573c88, 0x300001e0e00, 0x1410900f0, 0x0) DebugInformationStrippedFromFile179:???
#15 0x3ff80573c84 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???


Stack trace for thread 72
#0  0x3ff8057bbb4 in /usr/shlib/libpthread.so
#1  0x3ff80566820 in /usr/shlib/libpthread.so
#2  0x3ff80564254 in dspDispatch(0x1401ffcb0, 0x141535aa8, 0x141535930, 0x0, 0x140201680, 0x100000000) DebugInformationStrippedFromFile89:???
#3  0x3ff80567c50 in pthread_mutex_block(0x0, 0x30040186a30, 0x475, 0x0, 0x140201680, 0x0) DebugInformationStrippedFromFile96:???
#4  0x3ff8057b9b0 in __pthread_mutex_lock(0x30040186a30, 0x475, 0x0, 0x140201680, 0x0, 0x3ff805afc00) DebugInformationStrippedFromFile112:???
#5  0x3ff805afbfc in pthread_mutex_lock(0x0, 0x140201680, 0x0, 0x3ff805afc00, 0x30000113018, 0x1415cd3a8) DebugInformationStrippedFromFile7:???
#6  0x30000113014 in tc_watchDog_AddWatch(0x100000001, 0x12004b500, 0x0, 0x0, 0x0, 0x100000000) DebugInformationStrippedFromFile93:???
#7  0x1200c2db4 in UnknownProcedure6FromFile23(0x14168d2d8, 0x1415ccf70, 0x2, 0x1415cbec0, 0x100000000, 0x2d00fe) DebugInformationStrippedFromFile23:???
#8  0x1200c9e64 in UnknownProcedure45FromFile23(0x1415cd3a8, 0x0, 0x2d00fe, 0x3, 0x100000000, 0x2130456) DebugInformationStrippedFromFile23:???
#9  0x1200ca984 in svr_Read(0x1200924e0, 0x6b95064b, 0x2, 0x0, 0xb400000000, 0x4fa8abed) DebugInformationStrippedFromFile23:???
#10 0x120092504 in svr_FltTRead(0x6b95064b, 0x100000152, 0x1415cd2c0, 0x0, 0x100000000, 0x1415ebf98) DebugInformationStrippedFromFile18:???
#11 0x1200b177c in UnknownProcedure32FromFile21(0x1200b63b0, 0x1415cf6a8, 0x1415cf6b0, 0x218, 0x14125b5d8, 0x141127460) DebugInformationStrippedFromFile21:???
#12 0x1200b6a74 in /opt/encina/bin/sfs


Thank you in advance for any words of wisdom - I've been looking at this
too long!

Helen.

T.R  Title  User  Personal Name  Date  Lines
1513.1. by SMURF::DENHAM (Digital UNIX Kernel) Wed Mar 26 1997 14:50 (1 line)
    Were these threads truly hung or were they burning up cpu cycles?
1513.2. "Not sure on the threads but process was !" by MUFFIT::helen (Helen Pratt) Wed Mar 26 1997 15:25 (15 lines)
>>    Were these threads truly hung or were they burning up cpu cycles?

Unfortunately, I don't know the answer to that for these particular
threads.  The process as a whole was burning up 182% of the 2 CPUs;
however, I suspect that was due to the running threads, of which
there were a couple.

What puzzles me more is the fact that the threads in .0 are both blocked.

Thanks for the quick response,

Helen.


1513.3. by DCETHD::BUTENHOF (Dave Butenhof, DECthreads) Thu Mar 27 1997 06:46 (9 lines)
We have found a few race conditions that can lead to stranded threads on a
mutex. They've all been patched, but I don't know the patch numbers or even
whether they've all gotten out to the field. Pete's been doing the patch
submissions, so perhaps he'll have a better idea. It's impossible to tell for
sure whether your case is related to any of these problems, though, based on
the information you've given. (And might be hard to tell even with all the
information -- they're fairly subtle races.)

	/dave
1513.4. "Wait -- all of this looks OK..." by WTFN::SCALES (Despair is appropriate and inevitable.) Thu Mar 27 1997 09:28 (28 lines)
Hang on everybody, things may not be quite what you think!

.0> They managed to attach with ladebug and gathered thread stack, mutex and
.0> condition variable information.  

So, then, what we're looking at is a single snapshot of a busy process in
action.

.0> Mutex 706 (normal) "mutex at 0x140201680" (0x140201680, block 0x1401ffcb0) is
.0>   not locked, 2 threads waiting; ref count is 5; waiters: 51, 72

There is nothing wrong here, per se.  Suppose just a moment ago the mutex
were locked with three waiters:  when the owner of the mutex unlocks it, it
would wake up one of the waiters, and the result would be exactly what we
have here -- an unlocked mutex with two threads waiting on it.  (You'll
notice that the "ref count" is five, so there's a lot going on with this
mutex right at the moment...)

We would need to know what the other threads in the process are doing (and
some general idea of what they are supposed to be doing) before we could
comment much more on what you're seeing.  Unfortunately, the stack traces you
posted are probably the least interesting ones in the process -- we already
know that these two are waiting for the mutex (which the stack trace
confirms).  :-)  The most interesting ones are probably the two that are
currently running...


				Webb
1513.5. "Just the thing" by CICS03::helen (Helen Pratt) Tue Apr 01 1997 10:57 (16 lines)
Webb,

Thanks for the information; it was just what I was looking for!

>>There is nothing wrong here, per se.  Suppose just a moment ago the mutex
>>were locked with three waiters:  when the owner of the mutex unlocks it, it
>>would wake up one of the waiters, and the result would be exactly what we
>>have here -- an unlocked mutex with two threads waiting on it.  (You'll
>>notice that the "ref count" is five, so there's a lot going on with this
>>mutex right at the moment...)

Regards,

Helen.