[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

8718.0. "help: read() spins on Unix 4.0B / multithreaded program" by TAEC::MARTIN () Wed Feb 05 1997 06:55

Hi all,

  We are working on the migration of our product (DECss7) from DIGITAL Unix 3.2 to DIGITAL Unix 4.0B.
  Beside the fact of the slight change in standards (POSIX) for threads / signals, we are having a problem that
occurs with the read() function.

  Here is a description of the problem:

DECss7 is a multithreaded / distributed platform.
Processes have connections with sockets / mailboxes

In a simple configuration of the platform, all works fine for about 5 minutes and then a process stops with a high
process charge.

this is what i get with "ps":

18071  0.0 /usr/var/ss7/platform_m/bin/ss7_fep_ccitt_m.exe
29915  0.0 /usr/var/ss7/platform_m/bin/ss7_bep_ccitt_m.exe
17964  0.0 /usr/var/ss7/platform_m/bin/ss7_dir_ccitt_m.exe
17421  0.0 /usr/var/ss7/platform_l/bin/ss7_fep_ccitt_l.exe
2180   0.0 /usr/var/ss7/platform_l/bin/ss7_bep_ccitt_l.exe
18036  0.0 /usr/var/ss7/platform_l/bin/ss7_dir_ccitt_l.exe
18896  0.0 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_hlr_thread_ccitt
18864 42.0 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt

We use ladebug to see where it is spinning:

ladebug -pid 18864 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
Welcome to the Ladebug Debugger Version 4.0-26
------------------ 
object file name: /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt 
Reading symbolic information ...done
Attached to process id 18864  ....
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]
(ladebug) show thread
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>*  20 running                    throughput 11       0x1409528c0 TRP-RD-SS7
     1 ready                      throughput 11       default thread
    -1 blocked    kernel          fifo       32       manager thread
    -2 ready                      idle        0       null thread for VP 0x0
     2 ready                      throughput 11       0x140062f20 SIGNAL-SS7
     3 blocked    cond wait       rr         23       <anonymous>
     4 blocked    cond wait       rr         23       <anonymous>
     5 blocked    kernel          rr         23       <anonymous>
     6 blocked    kernel          throughput 11       0x1406e52c0 TRP-CN-SS7
     7 blocked    cond wait       throughput 11       0x1406e5320 TRP-RG-SS7
     8 blocked    cond wait       throughput 11       0x1406e53e0 MBX-RD-SS7
     9 blocked    cond wait       throughput 11       0x1406e5440 MBX-WR-SS7
    10 blocked    cond wait       throughput 11       0x1406e54a0 MBX-IO-SS7
    11 blocked    cond wait       rr         23       0x1406e5500 MBX-RD-SS7
    12 blocked    cond wait       rr         23       0x1406e5560 MBX-WR-SS7
    13 blocked    cond wait       rr         23       0x1406e55c0 MBX-IO-SS7
    14 blocked    kernel          rr         23       <anonymous>
    15 blocked    kernel          rr         23       <anonymous>
    16 blocked    kernel          throughput 11       0x1406e56e0 TRP-NT-SS7
    18 blocked    kernel          throughput 11       0x140952800 TRP-NT-SS7
    19 blocked    cond wait       throughput 11       0x140952860 TRP-WR-SS7
    21 blocked    kernel          throughput 11       0x140952980 TRP-RD-SS7
    22 ready      cond wait       throughput 11       <anonymous>
    23 blocked    cond wait       throughput 11       <anonymous>
(ladebug) where
>0  0x3ff800d1818 in /usr/shlib/libc.so
#1  0x3ffbffeb0e4 in ((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-1016448,
AP_bufadr=0x3fffffb2730="p\"\356\277\377\003", AI_buflen=-1120448)
/usr/users1/martin/ss7dev/ccitt_v30_osf40/dev/transport_layer_tcp_ip.cc:771
...
(ladebug) up
(ladebug) list 760
    760   // We read on the socket until we have filled the given buffer.
    761 
    762   while ( VI_count < AI_buflen )
    763   {
    764     int VI_rslt = 0;
    765 
    766     if (VI_rslt <0 || VI_rslt>2000)
    767       {
    768         printf("TCP/IP erreur: VI_rslt=%d\n",VI_rslt);
    769       }
    770 
>   771     VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ;
    772 
    773     // Check for connection loss.
    774 
    775     if ( VI_rslt < 0 && errno != ECONNRESET )
    776     {
    777       MM_throw_system_error ( errno ) ;
    778     }
    779     
    780     if ( VI_rslt <= 0 )
(ladebug) p VI_rslt
-1016976

As we can see, the read() function call spins.

Is this a known problem or is there a patch for read() in Unix 4.0 .

We are also having problem with sigwait that fails in "cxx__thow" ...???
(not at the same time, but perhaps is it related ? )

Is there a way to debug this ?

Any help appreciated !!!

Thank you !


Laurent Martin
SISB Telecom
France
T.RTitleUserPersonal
Name
DateLines
8718.1We can't tell that it's spinning.WTFN::SCALESDespair is appropriate and inevitable.Thu Feb 06 1997 19:0513
.0> As we can see, the read() function call spins.

Based solely on the information you've provided, it's not at all obvious that
the read() function call is spinning.  You've interrupted an active process and
taken a snapshot of what it was doing at that moment.  Are you certain that the
call to read() never returns?  Do other threads in the process continue to run?

I would recommend that you use the debugger to determine what the current thread
is actually doing...



					Webb
8718.2I'm sure it's spinning !!NETRIX::&quot;[email protected]&quot;Laurent MartinFri Feb 14 1997 05:33101
Hi,

  Webb, you're right, i didn't give details in the debugger output to
show that read is spinning:

We can reproduce the spin when making DECss7 run a certain time on unix4.0 .
We are having troubles with signals too (see topic #8821).

he is another debugging session: (with dbx this time.)

------------------------------------------------------------------------------
--
% dbx -pid 8777 /usr/var/ss7/platform_l/bin/ss7_fep_ccitt_l.exe    
dbx version 3.11.10
Type 'help' for help.

thread 0x15 stopped at>*[__read, 0x3ff800d1818]         beq     r19,
0x3ff800d1830
(dbx) where
>  0 __read(0x0, 0x1201fc4e0, 0x141166ef0, 0x1201fce90, 0x1400875e0)
[0x3ff800d1818]
   1 FI_full_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00,
AI_socket = 1089074352, AP_bufadr = 0x1, AI_buflen = 1091989408)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":774,
0x3ffbffe9ea0]
   2 FI_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket =
17, AP_bufadr = 0x141167230 = "^�", AI_buflen = 2048)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":573,
0x3ffbffe9c38]
   3 FV_read_service__18TC_transport_layerXPv(this = 0x140241e00, AI_socket =
0x11) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc":1221,
0x3ffbffe765c]
   4
FV_entry__31TC_task___T18TC_transport_layerXR31TC_task___T18TC_transport_layer
AR_task = @140dadd80)
["/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh":620, 0x3ffbffe63a8]
   5 thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732) [0x3ff80574274]
(dbx) where
>  0 __read(0x0, 0x1201fc4e0, 0x141166ef0, 0x1201fce90, 0x1400875e0)
[0x3ff800d1818]
   1 FI_full_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00,
AI_socket = 1089074352, AP_bufadr = 0x1, AI_buflen = 1091989408)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":774,
0x3ffbffe9ea0]
   2 FI_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket =
17, AP_bufadr = 0x141167230 = "^�", AI_buflen = 2048)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":573,
0x3ffbffe9c38]
   3 FV_read_service__18TC_transport_layerXPv(this = 0x140241e00, AI_socket =
0x11) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc":1221,
0x3ffbffe765c]
   4
FV_entry__31TC_task___T18TC_transport_layerXR31TC_task___T18TC_transport_layer
AR_task = @140dadd80)
["/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh":620, 0x3ffbffe63a8]
   5 thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732) [0x3ff80574274]
(dbx) list
    86  
    87      -- function Main, the real main routine for the overall program.
    88      --
    89      function Main -- in DEC Ada created code; please list lines 1:36
for more information
    90        (
    91          Argc    : in     Long_Integer;
    92          Argv    : in     System.Address;
    93          Envp    : in     System.Address
    94        )
    95        return Long_Integer
    96      is
(dbx) up
FI_full_read__25TC_transport_layer_tcp_ipXiPci: 774  VI_rslt = read (
AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ;
(dbx) list
   771    {
   772      int VI_rslt ;
   773  
>  774      VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen -
VI_count ) ;
   775  
   776      // Check for connection loss.
   777  
   778      if ( VI_rslt < 0 && errno != ECONNRESET )
   779      {
   780        MM_throw_system_error ( errno ) ;
   781      }
(dbx) stop at 776
[2] stop at
"/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":778
(dbx) cont
-----------------------------------------------------------------------------

and here it never stops at 778, showing that something not normal occurs in
read()
(exception ? spin...?)

If you want more details (debugger details ...) I can give you more.

Thank you,

Laurent
[Posted by WWW Notes gateway]
8718.3SMURF::DENHAMDigital UNIX KernelFri Feb 14 1997 10:2415
    Kernel side stack traces would be useful too. Let this thing
    run and dbx -k /vmunix.
    
    Do a tstack command and try to find some in read(). If you do
    find someone, do a couple of stack traces a few seconds apart.
    Any change?
    
    Then try this, again a few seconds between iterations:
    
    (dbx) p ((thread_t)$tid).last_run_stamp
    
    This will show whether the thread is running at all.
    
    Of course ps mp PID-OF-STUCK-PROCESS a few times will show you the
    same thing -- whether the "spinning" thread is really spinning.
8718.4What is it doing there?WTFN::SCALESDespair is appropriate and inevitable.Fri Feb 14 1997 14:248
> thread 0x15 stopped at>*[__read, 0x3ff800d1818]      beq    r19, 0x3ff800d1830

It might also be illuminating if you were to single-step (by instruction) and
see if you can traverse the whole loop.  That might give us (or someone else
with access to the source) an idea of what's going on.


				Webb
8718.5Re:.2 .3 :dbx and ladebug traces of the problem.NETRIX::&quot;[email protected]&quot;Laurent MartinTue Feb 18 1997 07:44266
Hi,
  As Jeff said, i did a dbx -k /vmunix
with tstack commands (four per second): see annex A
we can see that we are very often in read()

  If i use dbx (or ladebug), stepi doesn't stop: Annex B

I don't know what i could do now ...

Laurent

########################## A ##############################################
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
   1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
   2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x8,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
   3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
   4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
   5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
   6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
   7 syscall(0x8, 0xfffffc00004fd890, 0xa4e75330985be, 0x3ff800d7248, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
>  8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
   1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
   2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x2,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
   3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
   4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
   5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
   6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
   7 syscall(0x8, 0xfffffc00004fda70, 0xe42d4330985be, 0x12009ab68, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
>  8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
   1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
   2 spec_read(0xffffffff90e4f7d8, 0xfffffc0000230000, 0x120086380,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
   3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
   4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
   5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
   6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
   7 syscall(0x8, 0xfffffc00004fd890, 0x370a5330985bf, 0x12008b5a0, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
>  8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
   1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
   2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0xfffffc00004c4d70,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
   3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
   4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
   5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
   6 read(0x8, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
   7 syscall(0x8, 0xfffffc00004fd890, 0xdc09f330985bf, 0x3ff800e5518, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
>  8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack

Thread 0xfffffc000bf5fb80:
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
   1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
   2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x120086380,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
   3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
   4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
   5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
   6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
   7 syscall(0x8, 0xfffffc00004fd890, 0x21798330985c1, 0x120039984, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
>  8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) 
########################## B ##############################################
% ladebug -pid 17685
/usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
Welcome to the Ladebug Debugger Version 4.0-26
------------------ 
object file name:
/usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt 
Reading symbolic information ...done
Attached to process id 17685  ....
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]
(ladebug) where
>0  0x3ff800d1818 in /usr/shlib/libc.so
#1  0x3ffbffe9b90 in
((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-885136,
AP_bufadr=0x3fffffd2820="P;\367\277\377\003", AI_buflen=-989136)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:774
#2  0x3ffbffe9928 in
((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_read(AI_socket=17,
AP_bufadr=0x140acb230="|\001AMUSSETVLR   \b\0029MUSSET      ", AI_buflen=2048)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:573
#3  0x3ffbffe734c in
((TC_transport_layer*)0x1400e9600)->FV_read_service(AI_socket=0x11)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc:1221
#4  0x3ffbffe6098 in TC_task_<TC_transport_layer>::FV_entry(AR_task={ ... })
/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh:620
#5  0x3ff80574274 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3)
DebugInformationStrippedFromFile101:???
(ladebug) stepi
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]     beq     r19, 0x3ff800d1830
(ladebug) 0x3ff800d1818/20 i
*[???:???, 0x3ff800d1818]       beq     r19, 0x3ff800d1830
 [???:???, 0x3ff800d181c]       call_pal        bpt
 [???:???, 0x3ff800d1820]       ldah    gp, 16380(gp)
 [???:???, 0x3ff800d1824]       lda     gp, 6544(gp)
 [???:???, 0x3ff800d1828]       ldq     r28, -27696(gp)
 [???:???, 0x3ff800d182c]       jmp     r31, (r28), 0x3ff800d1160(r31)
 [???:???, 0x3ff800d1830]       ret     r31, (r26), 1
 [???:???, 0x3ff800d1834]       bis     r31, r31, r31
 [???:???, 0x3ff800d1838]       ldq_u   r31, 0(sp)
 [???:???, 0x3ff800d183c]       bis     r31, r31, r31
 [__gettimeofday:???, 0x3ff800d1840]    addq    r31, 0x74, r0
 [__gettimeofday:???, 0x3ff800d1844]    call_pal        callsys
 [???:???, 0x3ff800d1848]       beq     r19, 0x3ff800d1860
 [???:???, 0x3ff800d184c]       br      gp, 0x3ff800d1850
 [???:???, 0x3ff800d1850]       ldah    gp, 16380(gp)
 [???:???, 0x3ff800d1854]       lda     gp, 6496(gp)
 [???:???, 0x3ff800d1858]       ldq     r28, -27696(gp)
 [???:???, 0x3ff800d185c]       jmp     r31, (r28), 0x3ff800cf0a0(r31)
 [???:???, 0x3ff800d1860]       ret     r31, (r26), 1
 [???:???, 0x3ff800d1864]       bis     r31, r31, r31
(ladebug) stopi at  0x3ff800d181c
[#1: stopi at 4395899885596 ]
(ladebug) stopi at 0x3ff800d1818
[#2: stopi at 4395899885592 ]
(ladebug) stopi at 0x3ff800d1830
[#3: stopi at 4395899885616 ]
(ladebug) nexti                      <<-- doesn't stop, i do Ctrl+C
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]     beq     r19, 0x3ff800d1830
(ladebug) stepi
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]     beq     r19, 0x3ff800d1830
(ladebug) up
>1  0x3ffbffe9b90 in
((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-885136,
AP_bufadr=0x3fffffd2820="P;\367\277\377\003", AI_buflen=-989136)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:774
    774     VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen -
VI_count ) ;
(ladebug) list 770
    770   while ( VI_count < AI_buflen )
    771   {
    772     int VI_rslt ;
    773 
>   774     VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen -
VI_count ) ;
    775 
    776     // Check for connection loss.
    777 
    778     if ( VI_rslt < 0 && errno != ECONNRESET )
    779     {
    780       MM_throw_system_error ( errno ) ;
    781     }
    782     
    783     if ( VI_rslt <= 0 )
    784     {
    785       return 0 ;
    786     }
    787 
    788     VI_count += VI_rslt ;
    789   }
    790 
(ladebug) stop at 777
[#4: stop at
"/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":778 ]
(ladebug) stepi
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]     beq     r19, 0x3ff800d1830


[Posted by WWW Notes gateway]
8718.6SMURF::DENHAMDigital UNIX KernelTue Feb 18 1997 10:3312
    Hmmmmm. The fact that you're getting different stack traces
    shows that you're not stuck in read. But the fact that you're
    seeing something besides the read trace shows that you appear
    to be completing the reads.
    
    Let's confirm what you're reading, OK? The call is going to the
    memory device -- so, this is read is to /dev/mem, kmem, zero?
    Something else? Nor can I find a thread_block at line 125
    of mmread, which is a little puzzling.
    
    So, tell us more about what you're reading and what's supposed
    to be happening....
8718.7Maybe it's doing what you told it to do...WTFN::SCALESDespair is appropriate and inevitable.Tue Feb 18 1997 15:019
.-2>    770   while ( VI_count < AI_buflen )

Also, you're doing the read inside a loop...did you take a look at the loop
conditional expression?  (E.g., any chance that AI_buflen is huge, or that
VI_count has been corrupted (e.g., it's negative), or that VI_count somehow
is not being incremented?)


				Webb
8718.8Another possibilityWTFN::SCALESDespair is appropriate and inevitable.Tue Feb 18 1997 15:0511
I see that you're setting the breakpoint inside the loop and it never seems
to hit that, which does suggest that read() is never returning (but which
could be an illusion -- the debugger seems to be pretty shakey on other
issues here...)

Another possibility is that the RA on the stack has been corrupted, so that
every time read() "returns" control passes to the beginning of the read()
routine.  (That's a pretty long shot, but it's no inconceivable...)


				Webb
8718.9it seems that problem doesn't occur on a multi cpu platform..NETRIX::&quot;[email protected]&quot;Laurent MartinThu Feb 20 1997 12:507
I tried DECss7 on a multi cpu (4) platform and i didn't yet have this problem.

But i still have the problem as in topic 8821 (sigwait interrupted with EINTR
 while all signals blocked.)

Laurent
[Posted by WWW Notes gateway]
8718.10Also seeing an apparent spin in readMUFFIT::gerryGerry ReillyTue Mar 04 1997 16:38144
We appear to be hitting a similar/same problem to .0 with a spin in read
from sockets from a multithread program.  This is being run
on V4.0a.  We have set breakpoints after the read - these are not reached.
Not always trusting a debugger, we also added trace statements 
after the reads - also do not get reached.  Therefore I have to conclude
the spin is actually in the read().

Only seen so far on SMP systems.  Also when we get the spin, we always 
see two threads spinning (never one).  These two threads are doing reads
against different sockets.

Any thoughts ?, or useful information that we should collect ?

-gerry


(ladebug) show thread
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
     1 blocked    cond wait       throughput 11       default thread
    -1 ready      kernel          fifo       32       manager thread
    -2 ready                      idle        0       null thread for VP 0x0
    -3 ready                      idle        0       null thread for VP 0x1
     2 blocked    timed cond wait throughput 11       <anonymous>
     3 ready                      throughput 11       <anonymous>
     4 blocked    kernel          throughput 11       <anonymous>
     5 blocked    cond wait       throughput 11       <anonymous>
     6 blocked    cond wait       throughput 11       <anonymous>
     7 blocked    cond wait       throughput 11       <anonymous>
     8 blocked    cond wait       throughput 11       <anonymous>
     9 blocked    cond wait       throughput 11       <anonymous>
    10 blocked    cond wait       throughput 11       <anonymous>
    11 blocked    cond wait       throughput 11       <anonymous>
    12 blocked    cond wait       throughput 11       <anonymous>
    13 ready      kernel          throughput 11       <anonymous>
    32 ready      kernel          throughput 11       <anonymous>
    33 ready      kernel          throughput 11       <anonymous>
    34 blocked    kernel          throughput 11       <anonymous>
    35 blocked    kernel          throughput 11       <anonymous>
    36 blocked    kernel          throughput 11       <anonymous>
    37 blocked    kernel          throughput 11       <anonymous>
    38 blocked    kernel          throughput 11       <anonymous>
    39 blocked    kernel          throughput 11       <anonymous>
    40 blocked    kernel          throughput 11       <anonymous>
    41 blocked    kernel          throughput 11       <anonymous>
>*  42 running                    throughput 11       <anonymous>
    43 blocked    cond wait       throughput 11       <anonymous>
    44 running                    throughput 11       <anonymous>
    45 blocked    cond wait       throughput 11       <anonymous>
    46 blocked    kernel          throughput 11       <anonymous>
    47 blocked    kernel          throughput 11       <anonymous>
    48 blocked    kernel          throughput 11       <anonymous>
(ladebug) thread 42
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>*  42 running                    throughput 11       <anonymous>

(ladebug) where
>0  0x24b49818 in /usr/shlib/libc.so
#1  0x6f593924 in ComIP_SocketReadMsg() /u/canna/Monaco-L970302/src/com/ip/src/co
mip_skrct.c:404
#2  0x6f596518 in ComIP_InternalSocketReceive() /u/canna/Monaco-L970302/src/com/i
p/src/comip_skrct.c:964
#3  0x6f594a50 in ComIP_SocketReceiveThread() /u/canna/Monaco-L970302/src/com/ip/
src/comip_skrct.c:558
#4  0x246c0d74 in UnknownProcedure12FromFile179(0x3, 0x142efa18, 0x24810b98, 0x24
6c0cb0, 0x143b2f30, 0x0) DebugInformationStrippedFromFile179:???
#5  0x24810b94 in thdBase(0x0, 0x0, 0x248002c0, 0x1, 0x45586732, 0x3) DebugInform
ationStrippedFromFile101:???
(ladebug) thread 44
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>   44 running                    throughput 11       <anonymous>

(ladebug) where
>0  0x24b49818 in /usr/shlib/libc.so
#1  0x6f593924 in ComIP_SocketReadMsg() /u/canna/Monaco-L970302/src/com/ip/src/co
mip_skrct.c:404
#2  0x6f596518 in ComIP_InternalSocketReceive() /u/canna/Monaco-L970302/src/com/i
p/src/comip_skrct.c:964
#3  0x6f594a50 in ComIP_SocketReceiveThread() /u/canna/Monaco-L970302/src/com/ip/
src/comip_skrct.c:558
#4  0x246c0d74 in UnknownProcedure12FromFile179(0x3, 0x14419a18, 0x24810b98, 0x24
6c0cb0, 0x1437b490, 0x0) DebugInformationStrippedFromFile179:???
#5  0x24810b94 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationSt
rippedFromFile101:???
(ladebug) detach

===============
Code extract
===============

 This is the code extract from comip_skrct.c. Line 404 is the line:

        Read = SUPOS_SOCKET_READ(sock,BuffP,BuffL);




    ERZIENTRY("ComIP_SocketReadMsg", ("(Socket=%i)",sock));

    /*
     * We are expecting a Rx Header so read until we have all of
     * the header in our buffer.
     */
    Bytes = 0;
    while ((*Continue == TRUE) && (Bytes < COMIP_RX_LEN))
    {
        BuffP = Buffer + Bytes;
        BuffL = COMIP_RX_LEN - Bytes;

        ERZMID(COMIP_TINFO,("Reading %lu bytes of Rx Header",BuffL));

        Read = SUPOS_SOCKET_READ(sock,BuffP,BuffL);

        if (Read > 0)
        {
            ERZMID(COMIP_TINFO,("%lu bytes of Rx Header read",Read));
            Bytes += Read;
        }
        else
        {
            if (Read < 0)
            {
                xerrno = SUPOS_SOCKET_LAST_ERROR;
                ERZMID(COMIP_TERROR,("Read error on Rx Header = %d",xerrno));
            }
            else
            {
                ERZMID(COMIP_TERROR,("Socket closed reading Rx Header"));
            }
            *Continue = FALSE;
        }
    }


This is the #define for SUPOS_SOCKET_READ.

#define SUPOS_SOCKET_READ(Sock,pBuf,pBufLen)   read(Sock,pBuf,pBufLen)

ERZMID(...) writes out trace information 



8718.11fixed most likelySMURF::DENHAMDigital UNIX KernelTue Mar 04 1997 21:178
    Yeah, shoulda updated this yesterday. Working with the ss7
    folks who reported this and related sigwait problem, I located
    the cause of the spinning. You should be seeing EINTR returned
    repeatedly in the errno value (if you zero it before the
    read call of course).
    
    I have some v4.0b patches ready for you test. If you need 
    other versions, let me know and I'll cook them up asap.
8718.12SMURF::DENHAMDigital UNIX KernelTue Mar 04 1997 21:198
    I suppose I should have provided a few details on the nature
    of the problem, but I can't bring myself to say more than
    it's just another feature of trying to get threads and signals
    to work together.
    
    The code causing the problem was ripped out first thing
    when I started some v4.0d enhancements. It took me a while
    to realize why I never saw the problem on my latest kernels....
8718.13QAR for this problem is QAR51813NETRIX::&quot;[email protected]&quot;Laurent MartinWed Mar 05 1997 12:329
Hi,
  We entered a QAR for this problem that was related to the topic #8821.
This the fix to be traced in future Unix versions.

Laurent

SISB-Telecom

[Posted by WWW Notes gateway]