
Conference clt::cma

Title:DECthreads Conference
Moderator:PTHRED::MARYSTEON
Created:Mon May 14 1990
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1553
Total number of notes:9541

1485.0. "how to isolate a problem while in debug" by KCBBQ::PRESTON (big enough never is) Mon Feb 17 1997 10:21

<email headers removed>

Hi Taylor,

We're having a problem and hope you or someone at DEC can help us out.

We are having a problem with threaded servers that are going into a hibernate 
state from user AST level.  Once this happens the thread never wakes up.  We
should be able to reproduce this problem sometime today in a debug version of 
the executable, but don't know what we should look for to try and isolate the
problem.  The last time this happened we tried a show task from debug
and received the following error:

    Task 0 is not a valid task.  

We believe that DECThreads is attempting to schedule a new thread at the 
time this happens.  Examining the call stack from SDA does not show a return
address anywhere within the confines of our application.

Our applications are performing asynchronous qio calls with completion ASTs,
but we have verified that there are no calls to sys$hiber() at ast level, or 
any pthread calls that are not callable at AST level.  We realize that you do 
not recommend mixing asynchronous system calls with threaded programs,  but we
do not have a choice here as threads are mission critical for this application.

What information should we gather from the threads debugger to help determine 
if this is a problem that is caused by our application, or a problem with 
DECThreads?

1485.1. "addtl info" by KCBBQ::PRESTON (big enough never is) Mon Feb 17 1997 14:48

Taylor,  Here is some info from an incident we had this afternoon.

David.

------------------------------------------------------------------------

DBG> show task/all

 task id     state hold  pri substate        thread_object
 %TASK     1 SUSP         11 Condition Wait  Initial thread 
 %TASK     2 SUSP         11 Condition Wait  2140261000
 %TASK     3 SUSP         11 Condition Wait  2140271320
 %TASK     4 SUSP         11 Condition Wait  2140271320
 %TASK     5 SUSP         11 Mutex Wait      367477076

DBG> show task/call %task 1

 task id     state hold  pri substate        thread_object
 %TASK     1 SUSP         11 Condition Wait  Initial thread 
 module name     routine name                     line       rel PC
abs PC
 SHARE$CMA$RTL                                              00000000
15A24320
 SHARE$CMA$RTL                                              00000000
15A1AA48
 SHARE$CMA$RTL                                              00000000
15A35DD4
 SHARE$CMA$RTL                                              00000000
15A2FE78
 SHARE$CMA$OPEN_RTL 
                                                            00000024
15A8724C
 SHARE$ORA_DRVR                                             00000000
15608200
 SHARE$ORA_DRVR                                             00000000
15600C9C
 SHARE$ORA_DRVR                                             00000000
1560BE08
 SHARE$ORA_DRVR                                             00000000
156092CC
 SHARE$ORA_DRVR                                             00000000
156088BC
 SHARE$ORA_DRVR                                             00000000
156225A8
                                                            00000000
C9C6A1F8

DBG> show task/call %task 2

 task id     state hold  pri substate        thread_object
 %TASK     2 SUSP         11 Condition Wait  2140261000
 module name     routine name                     line       rel PC
abs PC
 SHARE$CMA$RTL                                              00000000
15A24320
 SHARE$CMA$RTL                                              00000000
15A1AA48
 SHARE$CMA$RTL                                              00000000
15A2EF50
 SHARE$CMA$OPEN_RTL 
                                                            00000024
15A877C4
 SHARE$CMBRTL                                               00000000
15CADF80
 SHARE$CMA$RTL                                              00000000
15A36B58

DBG> show task/call %task 3

 task id     state hold  pri substate        thread_object
 %TASK     3 SUSP         11 Condition Wait  2140271320
 module name     routine name                     line       rel PC
abs PC
 SHARE$CMA$RTL                                              00000000
15A24320
 SHARE$CMA$RTL                                              00000000
15A1AA48
 SHARE$CMA$RTL                                              00000000
15A2EF50
 SHARE$CMA$OPEN_RTL 
                                                            00000024
15A877C4
 SHARE$CSARTL                                               00000000
15C0A89C
 SHARE$CSARTL                                               00000000
15BFEDC8
 SHARE$CSARTL                                               00000000
15C086E4
 SHARE$CSARTL                                               00000000
15C0875C
 SHARE$CMA$RTL                                              00000000
15A36B58

DBG> show task/call %task  
%DEBUG-E-SYNERREXPR, syntax error in expression at or near '%TASK'

DBG> show task/call %task 4

 task id     state hold  pri substate        thread_object
 %TASK     4 SUSP         11 Condition Wait  2140271320
 module name     routine name                     line       rel PC
abs PC
 SHARE$CMA$RTL                                              00000000
15A24320
 SHARE$CMA$RTL                                              00000000
15A1AA48
 SHARE$CMA$RTL                                              00000000
15A2EF50
 SHARE$CMA$OPEN_RTL 
                                                            00000024
15A877C4
 SHARE$CSARTL                                               00000000
15C0A89C
 SHARE$CSARTL                                               00000000
15C098D0
 SHARE$CSARTL                                               00000000
15BFEFE0
 SHARE$CSARTL                                               00000000
15C086E4
 SHARE$CSARTL                                               00000000
15C0875C
 SHARE$CMA$RTL                                              00000000
15A36B58

DBG> show task/call %task 5

 task id     state hold  pri substate        thread_object
 %TASK     5 SUSP         11 Mutex Wait      367477076
 module name     routine name                     line       rel PC
abs PC
 SHARE$CMA$RTL                                              00000000
15A24320
 SHARE$CMA$RTL                                              00000000
15A28B38
 SHARE$CMA$RTL                                              00000000
15A2843C
 SHARE$CMA$RTL                                              00000000
15A26D28
                                                            00000000
80571584
                                                            00000000
80572408
                                                            00000000
805724B0
 SHARE$SHRCCL                                               00000000
168E7F34
                                                            00000000
800985A4
 SHARE$CMA$RTL                                              00000000
15A14070
 SHARE$CMA$RTL                                              00000000
15A30A94
 SHARE$CMA$OPEN_RTL 
                                                            00000024
15A874EC
 SHARE$DBRTL                                                00000000
168119AC
 SHARE$DBRTL                                                00000000
16803D74
 SHARE$DBRTL                                                00000000
16803F34
 SHARE$DBRTL                                                00000000
167FE4C4
 SHARE$DBRTL                                                00000000
167FE6E0
 SHARE$DBRTL                                                00000000
167FE648
 SHARE$DBRTL                                                00000000
167FE648
 SHARE$DBRTL                                                00000000
167FCF10
 SHARE$DBRTL                                                00000000
167FA664
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165D4818
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165CAD74
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165B9FF4
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165E5848
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165F96AC
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165F8F44
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
166121F0
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
16612590
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165FB868
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165FD7E0
 SHARE$SYS$LOGIN_OEN_SRVROUTER_EXE 
                                                            00000000
165FD478
 SHARE$ORA_DRVR                                             00000000
15600564
 SHARE$ORA_DRVR                                             00000000
15608640
 SHARE$CMA$RTL                                              00000000
15A36B58

DBG> 


HERM28-D_PROD> type cma_dump.log
%DECthreads bugcheck (version T2.12-296), terminating execution.
% Running on OpenVMS AXP [OpenVMS V6.2; AlphaServer 8400 Model 5/300, 4cpus,
%  -2048Mb]
% Reason: Can't find null thread (context switch from interrupt?)
%     
%     The DECthreads library has detected an inconsistency in its internal
%   state and cannot continue execution. The inconsistency may be due to a bug
%   within the DECthreads library, the application program, or in any library
%   active in the address space. Common causes are unchecked stack overflows,
%   writes through uninitialized pointers, and synchronization races that
%   result in use of invalid data by some thread.
%
%     Application and library developers are requested to please check for
%   such problems before reporting a DECthreads library problem.
%
%     The information in this file may aid application program, library, or
%   DECthreads developers in determining the state of the process at the time
%   of this bugcheck. When the problem is reported, this information should be
%   included, along with a detailed procedure for reproducing the problem, if
%   that is possible. The 'detailed procedure' most likely to be of use to
%   developers is a complete program.
% 
% The bugcheck occurred at 17-FEB-1997 12:40:32.14 running image
%  DSA1301:[CERNER.W_STANDARD.PROD.VMSALPHAD.][000000]ORA_DRVR.EXE;10 in
%  process 20200581 (named "DXB"), under username "D_PROD". AST delivery is
%  enabled for all modes; ASTs are active in user
% The current thread is -1 (address 0x15E6EB70)
% Current thread traceback:
%     0:  PC 0x15A25CF8, FP 0x15E909E0, DESC 0x159E6AC8
%     1:  PC 0x15A24170, FP 0x15E90AB0, DESC 0x159E6720
%     2:  PC 0x15A28B38, FP 0x15E90B00, DESC 0x159E7060
%     3:  PC 0x15A2843C, FP 0x15E90B70, DESC 0x159E72F0
%     4:  PC 0x15A26D28, FP 0x15E90BA0, DESC 0x159E7610
%     5:  PC 0x15A308CC, FP 0x15E90BD0, DESC 0x159E8A70
%     6:  PC 0x15A87524, FP 0x15E90DC0, DESC 0x15A67230
%     7:  PC 0x166092E0, FP 0x15E90DE0, DESC 0x16586210
%     8:  PC 0x165F5784, FP 0x15E90E20, DESC 0x16582720
%     9:  PC 0x165F8000, FP 0x15E90E60, DESC 0x16582750
%    10:  PC 0x165F5B84, FP 0x15E90EA0, DESC 0x16582B58
%    11:  PC 0x165ECB48, FP 0x15E90EE0, DESC 0x16580BD0
%    12:  PC 0x165ED7D4, FP 0x15E90F20, DESC 0x16580C08
%    13:  PC 0x8056977C, FP 0x15E90F40, DESC 0x1584CA20
%    14:  PC 0x8002B3F4, FP 0x15E90FC0, DESC 0x15E5C180
%    15:  PC 0x15E4C734, FP 0x15E90FE0, DESC 0x15E5C148
%    16:  PC 0x15E4D194, FP 0x15E91060, DESC 0x15E61940
%    17:  PC 0x15EF5090, FP 0x15E91080, DESC 0x15EC76A8
%    18:  PC 0xC9B582BC, FP 0x15E917F0, DESC 0xC9A84DA8
%    19:  PC 0x80000304, FP 0x15E91B10, DESC 0x15E5C180
%    20:  PC 0x15E4C734, FP 0x15E91B30, DESC 0x15E5C148
%    21:  PC 0x15E4D194, FP 0x15E91BB0, DESC 0x15E61940
%    22:  PC 0x15A324B8, FP 0x15E91BD0, DESC 0x159E94B0
%    23:  PC 0x15A36B58, FP 0x15E91DC0, DESC 0x159EA0C0
%    24:  PC 0x15A140A8, FP 0x15E91FD0, DESC 0x159FE1F0 (base)
% DECthreads scheduling database is locked.
Current attributes objects:
  Attributes object 1 "default attr" (0x159FE720)
    Access synchronized by mutex 1
    Scheduling: policy throughput, priority 11; inherit scheduling
    Threads created joinable
    Stack size 11200, guard size 4096
    Mutex type fast
Current thread specific data keys:
  Key 1, destructor is 0x1584DBC0
  Key 2, destructor is 0x15B1B0E8
  Key 3, no destructor
  Key 4, no destructor
  Key 5, destructor is 0x15BCFBF0
  Key 6, destructor is 0x15A99000
  Key 7, destructor is 0x15854578
  Key 8, destructor is 0x15854558
  Key 9, destructor is 0x1584F7E0
  Key 10, destructor is 0x15854F80
Current threads:
  Thread 1 (blocked, cond wait) "default thread" (0x159FE760)
    Waiting on condition variable 23 using mutex 78
    Scheduling: throughput policy at priority 11
    Thread specific data: 1: 0x15E6F988
    (*)Stack: 0x7F91F350 (default stack)
    General cancelability enabled, asynch cancelability disabled
    No current vp
    Join uses mutex 19 and condition variable 1; wait uses mutex 20 and
      condition variable 2
    The thread's start function and argument are unknown
    The thread's latest errno is 0
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
  Thread -2 (blocked) "manager thread" (0x15E6E790)
    Blocked on indeterminate object
    Scheduling: fifo policy at priority 32
    No thread specific data
    Stack: 0x15E87A90; base is 0x15E88000, guard area at 0x15E81FFF
    General cancelability enabled, asynch cancelability disabled
    No current vp
    Join uses mutex 21 and condition variable 3; wait uses mutex 22 and
      condition variable 4
    The thread's start function and argument are 0x159E9570 (0x00000000)
    The thread's latest errno is 0
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
  Thread -1 (blocked, mutex wait) "null thread" (0x15E6EB70)
    Waiting to lock mutex 3
    Scheduling: idle policy at priority 0
    No thread specific data
    Stack: 0x15E90998; base is 0x15E92000, guard area at 0x15E8BFFF
    General cancelability enabled, asynch cancelability disabled
    Current vp is 0x00000000
    Join uses mutex 23 and condition variable 5; wait uses mutex 24 and
      condition variable 6
    The thread's start function and argument are 0x159E94B0 (0x00000000)
    The thread's latest errno is 0
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
  Thread 2 (blocked, cond wait) "<pthread user@0x7F91CA88>" (0x15E71078)
    Waiting on condition variable 11 using mutex 42
    Scheduling: throughput policy at priority 11
    No thread specific data
    Stack: 0x15EADBA0; base is 0x15EAE000, guard area at 0x15EA7FFF
    Detached
    General cancelability enabled, asynch cancelability disabled
    No current vp
    Join uses mutex 43 and condition variable 12; wait uses mutex 44 and
      condition variable 13
    The thread's start function and argument are 0x15C55C30 (0x00000000)
    The thread's latest errno is 0
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
  Thread 3 (blocked, cond wait) "<pthread user@0x7F91F2D8>" (0x15E71648)
    Waiting on condition variable 8 using mutex 35
    Scheduling: throughput policy at priority 11
    No thread specific data
    Stack: 0x1601FB30; base is 0x16020000, guard area at 0x1600BFFF
    General cancelability enabled, asynch cancelability disabled
    No current vp
    Join uses mutex 48 and condition variable 15; wait uses mutex 49 and
      condition variable 16
    The thread's start function and argument are 0x15BD0AD0 (0x15E70E80)
    The thread's latest errno is 0
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
  Thread 4 (blocked, cond wait) "<pthread user@0x7F91F2D8>" (0x15E71A38)
    Waiting on condition variable 7 using mutex 33
    Scheduling: throughput policy at priority 11
    No thread specific data
    Stack: 0x16037AE0; base is 0x16038000, guard area at 0x16023FFF
    General cancelability enabled, asynch cancelability disabled
    No current vp
    Join uses mutex 50 and condition variable 17; wait uses mutex 51 and
      condition variable 18
    The thread's start function and argument are 0x15BD0AD0 (0x15E71A20)
    The thread's latest errno is 0
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
  Thread 5 (blocked, mutex wait) "<pthread user@0x15E74154>" (0x15E76518)
    Waiting to lock mutex 3
    Scheduling: throughput policy at priority 11
    personal pending wake is set
    There is 1 thread waiting to join
    Thread specific data: 2: 0x15E77118, 5: 0x17DDADB8, 6: 0x17DCFD30, 7:
       0x15E76D18, 8: 0x15E76AB0, 9: 0x15E7C688, 10: 0x15E7C7F0
    Stack: 0x17D96AE0; base is 0x17D98000, guard area at 0x17D83FFF
    General cancelability enabled, asynch cancelability disabled
    No current vp
    Join uses mutex 78 and condition variable 23; wait uses mutex 79 and
      condition variable 24
    The thread's start function and argument are 0x155E14F0 (0x15E74150)
    The thread's latest errno is 16
      <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad
        type: <unknown>, should be mutex>>
Current mutexes:
  Mutex 1 (fast) "default attr's mutex" (0x15E6E010) is not locked
  Mutex 2 (fast) "known attr list" (0x15E6E060) is not locked
  Mutex 3 (fast) "known mutex list" (0x15E6E0B0) is locked, 2 threads waiting;
    event flag set; waiters: 5, -1
  Mutex 4 (recursive) "global lock" (0x15E6E100) is not locked
  Mutex 5 (fast) "64 byte VM lookaside" (0x15E6E150) is not locked
  Mutex 6 (fast) "88 byte VM lookaside" (0x15E6E1A0) is not locked
  Mutex 7 (fast) "480 byte VM lookaside" (0x15E6E1F0) is not locked
  Mutex 8 (fast) "1608 byte VM lookaside" (0x15E6E240) is not locked
  Mutex 9 (fast) "2104 byte VM lookaside" (0x15E6E290) is not locked
  Mutex 10 (fast) "attributes object cache" (0x15E6E2E0) is not locked
  Mutex 11 (fast) "thread cache" (0x15E6E330) is not locked
  Mutex 12 (fast) "small stack cache" (0x15E6E380) is not locked
  Mutex 13 (fast) "default stack cache" (0x15E6E3D0) is not locked
  Mutex 14 (fast) "large stack cache" (0x15E6E420) is not locked
  Mutex 15 (fast) "VM stats" (0x15E6E470) is not locked
  Mutex 16 (fast) "per-thread context" (0x15E6E4C0) is not locked
  Mutex 17 (fast) "known cond list" (0x15E6E510) is not locked
  Mutex 18 (fast) "one time init" (0x15E6E560) is not locked
  Mutex 19 (fast) "thread 1 lock" (0x15E6E5B0) is not locked
  Mutex 20 (fast) "thread 1 wait" (0x15E6E650) is not locked
  Mutex 21 (fast) "thread -2 lock" (0x15E6E990) is not locked
  Mutex 22 (fast) "thread -2 wait" (0x15E6EA30) is not locked
  Mutex 23 (fast) "thread -1 lock" (0x15E6ED70) is not locked
  Mutex 24 (fast) "thread -1 wait" (0x15E6EE10) is not locked
  Mutex 25 (fast) "debugger client registry" (0x15E6EF50) is not locked
  Mutex 26 (fast) "<once block@0x1586C7E0>" (0x15E6F740) is not locked
  Mutex 27 (fast) "<once block@0x1586DB20>" (0x15E6FBC0) is not locked
  Mutex 28 (fast) "<CMA user@0x1589C690>" (0x15E6FC10) is not locked
  Mutex 29 (fast) "<pthread user@0x15B7A030>" (0x15E6FC60) is not locked
  Mutex 30 (fast) "<pthread user@0x15B7A03C>" (0x15E6FCB0) is not locked
  Mutex 31 (fast) "<pthread user@0x15E2A01C>" (0x15E6FD48) is not locked
  Mutex 32 (fast) "<pthread user@0x15C2E008>" (0x15E6FD98) is not locked
  Mutex 33 (fast) "<pthread user@0x15C2E058>" (0x15E6FDE8) is not locked
  Mutex 34 (fast) "<pthread user@0x15C2E084>" (0x15E6FEB8) is not locked
  Mutex 35 (fast) "<pthread user@0x15C2E090>" (0x15E6FF08) is not locked
  Mutex 36 (fast) "<pthread user@0x15C2E0D8>" (0x15E6FFD8) is not locked
  Mutex 37 (fast) "<pthread user@0x15C2E0F0>" (0x15E70028) is not locked
  Mutex 38 (fast) "<pthread user@0x15B08014>" (0x15E700C8) is not locked
  Mutex 39 (fast) "<once block@0x15C60000>" (0x15E70E98) is not locked
  Mutex 40 (fast) "<pthread user@0x15CEF880>" (0x15E70EE8) is not locked
  Mutex 41 (fast) "<once block@0x15C600C8>" (0x15E70F38) is not locked
  Mutex 42 (fast) "<pthread user@0x15D017F0>" (0x15E70FD8) is not locked
  Mutex 43 (fast) "thread 2 lock" (0x15E71278) is not locked
  Mutex 44 (fast) "thread 2 wait" (0x15E71318) is not locked
  Mutex 45 (fast) "<once block@0x15C601C0>" (0x15E71458) is not locked
  Mutex 46 (fast) "<pthread user@0x15CE91A0>" (0x15E714A8) is not locked
  Mutex 47 (fast) "for attr obj 4" (0x15E715F8) is not locked
  Mutex 48 (fast) "thread 3 lock" (0x15E71848) is not locked
  Mutex 49 (fast) "thread 3 wait" (0x15E718E8) is not locked
  Mutex 50 (fast) "thread 4 lock" (0x15E71C38) is not locked
  Mutex 51 (fast) "thread 4 wait" (0x15E71CD8) is not locked
  Mutex 52 (fast) "<once block@0x15C60100>" (0x15E71EB8) is not locked
  Mutex 53 (fast) "<pthread user@0x15CE8488>" (0x15E71F08) is not locked
  Mutex 54 (fast) "<once block@0x15C600F0>" (0x15E71FA8) is not locked
  Mutex 55 (fast) "<pthread user@0x15CE8470>" (0x15E71FF8) is not locked
  Mutex 56 (fast) "<pthread user@0x15E7218E>" (0x15E721A8) is not locked
  Mutex 57 (fast) "<pthread user@0x15E72248>" (0x15E72260) is not locked
  Mutex 58 (fast) "<pthread user@0x166F6054>" (0x15E741D8) is not locked
  Mutex 59 (fast) "<pthread user@0x166F6078>" (0x15E74258) is not locked
  Mutex 60 (fast) "<pthread user@0x166F609C>" (0x15E742D8) is not locked
  Mutex 61 (fast) "<pthread user@0x167B815C>" (0x15E74678) is not locked
  Mutex 62 (fast) "<pthread user@0x17C18004>" (0x15E75120) is not locked
  Mutex 63 (fast) "<pthread user@0x17C18048>" (0x15E751D0) is not locked
  Mutex 64 (fast) "<pthread user@0x17C18074>" (0x15E752A0) is not locked
  Mutex 65 (fast) "<pthread user@0x15E74130>" (0x15E75340) is not locked
  Mutex 66 (fast) "<pthread user@0x15E75430>" (0x15E75458) is not locked
  Mutex 67 (fast) "<pthread user@0x15E75580>" (0x15E755A8) is not locked
  Mutex 68 (fast) "<pthread user@0x15E75678>" (0x15E756A0) is not locked
  Mutex 69 (fast) "<pthread user@0x15E75760>" (0x15E75788) is not locked
  Mutex 70 (fast) "<pthread user@0x15E75870>" (0x15E75898) is not locked
  Mutex 71 (fast) "<pthread user@0x16653A94>" (0x15E75E28) is locked
  Mutex 72 (fast) "<pthread user@0x15E75EA8>" (0x15E75ED0) is not locked
  Mutex 73 (fast) "<pthread user@0x15E75FC0>" (0x15E75FE8) is not locked
  Mutex 74 (fast) "<pthread user@0x15E76110>" (0x15E76138) is not locked
  Mutex 75 (fast) "<pthread user@0x15E76208>" (0x15E76230) is not locked
  Mutex 76 (fast) "<pthread user@0x15E762F0>" (0x15E76318) is not locked
    <<queue known mutex element 0x15E76428 (queue head 0x159FEA00): bad type:
      <unknown>, should be mutex>>
Current condition variables:
  Condition variable 1, "thread 1 join" (0x15E6E600) has no waiters
  Condition variable 2, "thread 1 wait" (0x15E6E6A0) has no waiters
  Condition variable 3, "thread -2 join" (0x15E6E9E0) has no waiters
  Condition variable 4, "thread -2 wait" (0x15E6EA80) has no waiters
  Condition variable 5, "thread -1 join" (0x15E6EDC0) has no waiters
  Condition variable 6, "thread -1 wait" (0x15E6EE60) has no waiters
  Condition variable 7, "<pthread user@0x15C2E064>" (0x15E6FE38) has 1 waiter
    using mutex 33; event flag set, on waiters list; waiters: 4
  Condition variable 8, "<pthread user@0x15C2E09C>" (0x15E6FF58) has 1 waiter
    using mutex 35; event flag set, on waiters list; waiters: 3
  Condition variable 9, "<pthread user@0x15C2E0FC>" (0x15E70078) has no
    waiters
  Condition variable 10, "<pthread user@0x15D01470>" (0x15E70F88) has no
    waiters
  Condition variable 11, "<pthread user@0x15D017E0>" (0x15E71028) has 1 waiter
    using mutex 42; event flag set, on waiters list; waiters: 2
  Condition variable 12, "thread 2 join" (0x15E712C8) has no waiters
  Condition variable 13, "thread 2 wait" (0x15E71368) has no waiters
  Condition variable 14, "<pthread user@0x15CE91A8>" (0x15E714F8) has no
    waiters; pending wake flag set
  Condition variable 15, "thread 3 join" (0x15E71898) has no waiters
  Condition variable 16, "thread 3 wait" (0x15E71938) has no waiters
  Condition variable 17, "thread 4 join" (0x15E71C88) has no waiters
  Condition variable 18, "thread 4 wait" (0x15E71D28) has no waiters
  Condition variable 19, "<pthread user@0x15CE8490>" (0x15E71F58) has no
    waiters; deferred signal flag set, pending wake flag set
  Condition variable 20, "<pthread user@0x15CE8478>" (0x15E72048) has no
    waiters; pending wake flag set
  Condition variable 21, "<pthread user@0x17C18054>" (0x15E75220) has no
    waiters
  Condition variable 22, "<pthread user@0x17C18080>" (0x15E752F0) has no
    waiters
  Condition variable 23, "thread 5 join" (0x15E76768) has 1 waiter using mutex
    78; event flag set, on waiters list; waiters: 1
  Condition variable 24, "thread 5 wait" (0x15E76808) has no waiters

Current stacks:
  Thread 1 stack: 0x40000000 to 0x7FFFFFFF (1073741823 bytes)
  Thread -2 stack: 0x15E80000 to 0x15E88000 (32768 bytes); base is 0x15E88000,
    guard is 0x15E81FFF
  Thread -1 stack: 0x15E8A000 to 0x15E92000 (32768 bytes); base is 0x15E92000,
    guard is 0x15E8BFFF; 512 bytes lost in page alignment
  Thread 2 stack: 0x15EA6000 to 0x15EAE000 (32768 bytes); base is 0x15EAE000,
    guard is 0x15EA7FFF; 7168 bytes lost in page alignment
  Thread 3 stack: 0x1600A000 to 0x16020000 (90112 bytes); base is 0x16020000,
    guard is 0x1600BFFF
  Thread 4 stack: 0x16022000 to 0x16038000 (90112 bytes); base is 0x16038000,
    guard is 0x16023FFF
  Thread 5 stack: 0x17D82000 to 0x17D98000 (90112 bytes); base is 0x17D98000,
    guard is 0x17D83FFF

Current memory:
    lookaside 0 (64 bytes; k-vp, u-vp, stk, attr, mut, cv) 2373 in use, 0 free;
      1792 minimums (scale 2^11, 2048) and 1792 maximums (scale 2^11, 2048);
      average minimum is 15.968, average maximum is 15.968; maximum free 621;
      54032 hits, 24394 misses (68.89% hit rate); high water mark is 1; 22068
      adjust intervals (plus 3 iterations); current balance 1, trend is steady
      (for 6 intervals); 22021 flushed
    lookaside 1 (88 bytes; ctx) 0 in use, 0 free; maximum free 0; high water
      mark is 6; 0 adjust intervals (plus 0 iterations); current balance 0,
      trend is steady (for 0 intervals)
    lookaside 2 (480 bytes; tcb) 7 in use, 0 free; maximum free 0; 0 hits, 7
      misses (0.00% hit rate); high water mark is 6; 1 adjust interval (plus 0
      iterations); current balance 0, trend is up (for 0 intervals)
    lookaside 3 (1608 bytes; cv-meter) 0 in use, 0 free; maximum free 0; high
      water mark is 6; 0 adjust intervals (plus 0 iterations); current balance
      0, trend is steady (for 0 intervals)
    lookaside 4 (2104 bytes; mu-meter) 0 in use, 0 free; maximum free 0; high
      water mark is 6; 0 adjust intervals (plus 0 iterations); current balance
      0, trend is steady (for 0 intervals)
    attributes object: 0 in use, 1 free; maximum free 1; 2 hits, 1 miss
      (66.66% hit rate); high water mark is 6; 0 adjust intervals (plus 6
      iterations); current balance 0, trend is steady (for 0 intervals)
    thread: 6 in use, 0 free; maximum free 0; 0 hits, 6 misses (0.00% hit
      rate); high water mark is 6; 0 adjust intervals (plus 6 iterations);
      current balance 6, trend is steady (for 0 intervals)
    small stack: 0 in use, 0 free; maximum free 0; high water mark is 6; 0
      adjust intervals (plus 0 iterations); current balance 0, trend is steady
      (for 0 intervals)
    default stack: 3 in use, 0 free; maximum free 0; 0 hits, 3 misses (0.00%
      hit rate); high water mark is 6; 0 adjust intervals (plus 3 iterations);
      current balance 3, trend is steady (for 0 intervals)
    large stack: 3 in use, 0 free; maximum free 0; 0 hits, 3 misses (0.00% hit
      rate); high water mark is 6; 0 adjust intervals (plus 3 iterations);
      current balance 3, trend is steady (for 0 intervals)
    24422 external calls for 2175584 bytes; 2401 current allocations, 590072
      bytes; 7680 bytes lost in page alignment

HERM28-D_PROD> 

1485.2. "Those pesky ASTs... :-}" by WTFN::SCALES (Despair is appropriate and inevitable.) Mon Feb 17 1997 15:29

.0> we have verified that there are no calls to sys$hiber() at ast level, or 
.0> any pthread calls that are not callable at AST level.

Taylor, despite the above protestation, the only reasonable explanation for this
phenomenon is that the customer's AST routine is calling (perhaps indirectly
through a subroutine or a macro) some DECthreads routine which blocks (such as a
mutex or CV wait).
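
One common way to stay out of this situation is to keep the interrupt-level
routine down to a single non-blocking "wake up" and do everything else in an
ordinary thread. The sketch below is a portable POSIX analogy, not OpenVMS AST
code: a signal handler stands in for the completion AST, and every name in it
(on_completion, worker, run_deferred_completion_demo) is made up for the
example. It assumes a platform with unnamed POSIX semaphores. On VMS the
corresponding wake-up would be whichever AST-callable routine the Guide to
DECthreads lists for the installed version (pthread_cond_signal_int_np, where
available).

```c
/* POSIX analogy only: a signal handler plays the role of the AST routine. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>

static sem_t work_ready;                  /* posted from interrupt context */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int completions_handled;           /* protected by state_lock */

/* Interrupt-level routine: no mutexes, no waits. sem_post() is
   async-signal-safe, so it cannot block the interrupted thread. */
static void on_completion(int signo)
{
    int saved_errno = errno;              /* sem_post() may change errno */
    (void)signo;
    sem_post(&work_ready);
    errno = saved_errno;
}

/* Ordinary thread: free to block on the semaphore and on the mutex. */
static void *worker(void *arg)
{
    (void)arg;
    while (sem_wait(&work_ready) == -1 && errno == EINTR)
        continue;                         /* retry an interrupted wait */
    pthread_mutex_lock(&state_lock);
    completions_handled++;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Pushes one simulated completion through the deferred path and
   returns how many completions the worker handled. */
int run_deferred_completion_demo(void)
{
    pthread_t tid;
    int rc;

    completions_handled = 0;
    rc = sem_init(&work_ready, 0, 0);
    assert(rc == 0);
    signal(SIGUSR1, on_completion);

    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    raise(SIGUSR1);                       /* the "completion" arrives */
    pthread_join(tid, NULL);

    signal(SIGUSR1, SIG_DFL);
    sem_destroy(&work_ready);
    return completions_handled;
}
```

Calling run_deferred_completion_demo() from main() exercises the whole path.
The point of the design is that the only call made at interrupt level is one
that is documented as safe there; the mutex is touched only by the worker.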

.0> We are having a problem with threaded servers that are going into a 
.0> hibernate state

This would be consistent with having blocked all threads.  That is, when there
is no thread to run, DECthreads schedules the DECthreads "null" thread to run,
which places the process into the hibernate state.

.0> Examining the call stack from SDA does not show a return
.0> address anywhere within the confines of our application.

This is consistent with the null thread being the currently "executing" thread.

.0> Task 0 is not a valid task.  

This might be consistent with the null thread being current.  (It's been a long
time since I've used the debugger with a production version of DECthreads -- I
don't know how it responds to having a null thread as the "current task"... :-)

.0> What information should we gather from the threads debugger to help 
.0> determine if this is a problem that is caused by our application, or a 
.0> problem with DECThreads?

[It's unlikely to be a problem with DECthreads.]

First, it'd be great to know which version of which operating system they are
running!!  (I.e., Alpha or VAX, and V6.1 or V7.1??)  And, if they are running on
V7, did they enable the kernel thread support?

Second, does the customer know about the DECthreads debugger (cma_debug() or
pthread_debug())?  (E.g., does the customer have access to the DECthreads
documentation, and have they read the sections on debugging??  [If not, they
should read the latest version of "The Guide to DECthreads" to which they have
access.])

I'd recommend that they look at which mutexes are locked (and which threads are
waiting for them) and at which threads are runnable.  If the currently running
thread is a null thread and the process is currently at AST level, then with
greater than 99% certainty the application has made an unsupported DECthreads
call from AST level.


				Webb

1485.3. "It's a mutex creation/deletion from an AST." by WTFN::SCALES (Despair is appropriate and inevitable.) Mon Feb 17 1997 16:13

.1> % Running on OpenVMS AXP [OpenVMS V6.2; AlphaServer 8400 Model 5/300, 4cpus,
.1> %  -2048Mb]

OK, so that answers that question...  ;-)

.1> Here is some info from an incident we had this afternoon.

I assert that this "incident" and the one reported in the base note have the
same cause (just different impact craters... ;-)  What happened in .1 is that
the null thread itself attempted to block (on a mutex).  This cannot happen. 
That is, what would we run if all the threads _including_the_null_thread_ were
to block?  (Hence the bugcheck.)  Also, and more importantly, the null thread's
code never attempts to lock a mutex.  Yet...

.1> Mutex 3 (fast) "known mutex list" (0x15E6E0B0) is locked, 2 threads waiting;
.1> event flag set; waiters: 5, -1

...here's the null thread waiting to lock mutex #3.  Thus, the null thread must
have been interrupted by an AST routine which itself called something which
tried to lock this mutex.  That is, apparently the AST routine is calling
DECthreads (perhaps indirectly) to create or delete a mutex, and the null thread
is blocking when it tries to access the list of known mutexes, presumably
to add or remove a mutex, because the list's mutex is already locked.
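
The tell-tale in the dump is a negative thread id in a mutex's waiters list.
When there are many dumps to go through, a small filter can flag such lines.
The sketch below is only an illustration: it assumes the "waiters: n, n, ..."
layout shown in the .1 dump and that the null thread is reported as thread -1,
as it is there. The function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the dump line carries a "waiters:" list that contains
   thread id -1 (the id the dump in .1 gives the null thread), else 0. */
int waiters_include_null_thread(const char *line)
{
    static const char tag[] = "waiters:";
    const char *p = strstr(line, tag);

    if (p == NULL)
        return 0;                       /* no waiters list on this line */
    p += strlen(tag);

    for (;;) {
        char *end;
        long id = strtol(p, &end, 10);  /* skips leading blanks itself */

        if (end == p)
            return 0;                   /* no further ids on the line */
        if (id == -1)
            return 1;                   /* the null thread is waiting */
        p = end;
        while (*p == ',' || *p == ' ')
            p++;                        /* step over the separator */
    }
}
```

Fed the continuation line of mutex 3 from the dump ("event flag set; waiters:
5, -1") it reports a hit; the condition variable lines, whose waiters are all
ordinary threads, do not match.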

An analogous situation results in the hang reported in the base note.  In that
case, instead of interrupting the null thread, the AST interrupts the thread
which itself (already) holds the mutex which the AST routine will need.  Thus,
when the AST routine tries to acquire it, the thread enters a self-deadlock, and
the process eventually hangs.  Only, before it does, the current thread
blocks and the null thread starts running _at_AST_level_, which can lead to all
manner of weirdness.
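
The self-deadlock can be shown harmlessly in a few lines of portable POSIX C.
This is a sketch under stated assumptions, not DECthreads code: the "AST" is
simulated by an ordinary function call made while the thread holds the lock,
the mutex is a default (non-recursive) one, and all names are invented for the
example. The routine probes with pthread_mutex_trylock() instead of locking,
because a blocking lock at that point would never return.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for an AST routine that needs list_lock. Returns 0 if the
   lock was free, or EBUSY if it is already held, which is the case in
   which a blocking pthread_mutex_lock() here would wait forever. */
static int simulated_ast_routine(void)
{
    int rc = pthread_mutex_trylock(&list_lock);

    if (rc == 0)
        pthread_mutex_unlock(&list_lock);
    return rc;
}

/* The "interrupt" arrives while this thread owns the lock. */
int demo_interrupt_while_holding_lock(void)
{
    int rc = pthread_mutex_lock(&list_lock);

    assert(rc == 0);
    rc = simulated_ast_routine();
    pthread_mutex_unlock(&list_lock);
    return rc;
}

/* The same routine when nothing holds the lock. */
int demo_interrupt_without_lock(void)
{
    return simulated_ast_routine();
}
```

demo_interrupt_while_holding_lock() comes back with EBUSY and
demo_interrupt_without_lock() with 0. Which of the two a real AST sees depends
only on when it happens to be delivered, which fits a hang that is hard to
reproduce on demand.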


				Webb