Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
Notice: | Welcome to the Digital UNIX Conference |
Moderator: | SMURF::DENHAM |
Created: | Thu Mar 16 1995 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 10068 |
Total number of notes: | 35879 |
Hi all, We are working on the migration of our product (DECss7) from DIGITAL Unix 3.2 to DIGITAL Unix 4.0B. Beside the fact of the slight change in standards (POSIX) for threads / signals, we are having a problem that occurs with the read() function. Here is a description of the problem: DECss7 is a multithreaded / distributed platform. Processes have connections with sockets / mailboxes In a simple configuration of the platform, all works fine for about 5 minutes and then a process stops with a high process charge. this is what i get with "ps": 18071 0.0 /usr/var/ss7/platform_m/bin/ss7_fep_ccitt_m.exe 29915 0.0 /usr/var/ss7/platform_m/bin/ss7_bep_ccitt_m.exe 17964 0.0 /usr/var/ss7/platform_m/bin/ss7_dir_ccitt_m.exe 17421 0.0 /usr/var/ss7/platform_l/bin/ss7_fep_ccitt_l.exe 2180 0.0 /usr/var/ss7/platform_l/bin/ss7_bep_ccitt_l.exe 18036 0.0 /usr/var/ss7/platform_l/bin/ss7_dir_ccitt_l.exe 18896 0.0 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_hlr_thread_ccitt 18864 42.0 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt We use ladebug to see where it is spinning: ladebug -pid 18864 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt Welcome to the Ladebug Debugger Version 4.0-26 ------------------ object file name: /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt Reading symbolic information ...done Attached to process id 18864 .... Thread received signal INT stopped at [???: ??? 0x3ff800d1818] (ladebug) show thread Thread State Substate Policy Priority Name ------ ---------- --------------- ---------- -------- ------------- >* 20 running throughput 11 0x1409528c0 TRP-RD-SS7 1 ready throughput 11 default thread -1 blocked kernel fifo 32 manager thread -2 ready idle 0 null thread for VP 0x0 2 ready throughput 11 0x140062f20 SIGNAL-SS7 3 blocked cond wait rr 23 <anonymous> 4 blocked cond wait rr 23 <anonymous> 5 blocked kernel rr 23 <anonymous> 6 blocked kernel throughput 11 0x1406e52c0 TRP-CN-SS7 7 blocked cond wait throughput 11 0x1406e5320 TRP-RG-SS7 8 blocked cond wait throughput 11 0x1406e53e0 MBX-RD-SS7 9 blocked cond wait throughput 11 0x1406e5440 MBX-WR-SS7 10 blocked cond wait throughput 11 0x1406e54a0 MBX-IO-SS7 11 blocked cond wait rr 23 0x1406e5500 MBX-RD-SS7 12 blocked cond wait rr 23 0x1406e5560 MBX-WR-SS7 13 blocked cond wait rr 23 0x1406e55c0 MBX-IO-SS7 14 blocked kernel rr 23 <anonymous> 15 blocked kernel rr 23 <anonymous> 16 blocked kernel throughput 11 0x1406e56e0 TRP-NT-SS7 18 blocked kernel throughput 11 0x140952800 TRP-NT-SS7 19 blocked cond wait throughput 11 0x140952860 TRP-WR-SS7 21 blocked kernel throughput 11 0x140952980 TRP-RD-SS7 22 ready cond wait throughput 11 <anonymous> 23 blocked cond wait throughput 11 <anonymous> (ladebug) where >0 0x3ff800d1818 in /usr/shlib/libc.so #1 0x3ffbffeb0e4 in ((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-1016448, AP_bufadr=0x3fffffb2730="p\"\356\277\377\003", AI_buflen=-1120448) /usr/users1/martin/ss7dev/ccitt_v30_osf40/dev/transport_layer_tcp_ip.cc:771 ... (ladebug) up (ladebug) list 760 760 // We read on the socket until we have filled the given buffer. 761 762 while ( VI_count < AI_buflen ) 763 { 764 int VI_rslt = 0; 765 766 if (VI_rslt <0 || VI_rslt>2000) 767 { 768 printf("TCP/IP erreur: VI_rslt=%d\n",VI_rslt); 769 } 770 > 771 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ; 772 773 // Check for connection loss. 774 775 if ( VI_rslt < 0 && errno != ECONNRESET ) 776 { 777 MM_throw_system_error ( errno ) ; 778 } 779 780 if ( VI_rslt <= 0 ) (ladebug) p VI_rslt -1016976 As we can see, the read() function call spins. Is this a known problem or is there a patch for read() in Unix 4.0 . We are also having problem with sigwait that fails in "cxx__thow" ...??? (not at the same time, but perhaps is it related ? ) Is there a way to debug this ? Any help appreciated !!! Thank you ! Laurent Martin SISB Telecom France
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
8718.1 | We can't tell that it's spinning. | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Feb 06 1997 19:05 | 13 |
.0> As we can see, the read() function call spins. Based solely on the information you've provided, it's not at all obvious that the read() function call is spinning. You've interrupted an active process and taken a snapshot of what it was doing at that moment. Are you certain that the call to read() never returns? Do other threads in the process continue to run? I would recommend that you use the debugger to determine what the current thread is actually doing... Webb | |||||
8718.2 | I'm sure it's spinning !! | NETRIX::"[email protected]" | Laurent Martin | Fri Feb 14 1997 05:33 | 101 |
Hi, Webb, you're right, i didn't give details in the debugger output to show that read is spinning: We can reproduce the spin when making DECss7 run a certain time on unix4.0 . We are having troubles with signals too (see topic #8821). he is another debugging session: (with dbx this time.) ------------------------------------------------------------------------------ -- % dbx -pid 8777 /usr/var/ss7/platform_l/bin/ss7_fep_ccitt_l.exe dbx version 3.11.10 Type 'help' for help. thread 0x15 stopped at>*[__read, 0x3ff800d1818] beq r19, 0x3ff800d1830 (dbx) where > 0 __read(0x0, 0x1201fc4e0, 0x141166ef0, 0x1201fce90, 0x1400875e0) [0x3ff800d1818] 1 FI_full_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket = 1089074352, AP_bufadr = 0x1, AI_buflen = 1091989408) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":774, 0x3ffbffe9ea0] 2 FI_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket = 17, AP_bufadr = 0x141167230 = "^�", AI_buflen = 2048) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":573, 0x3ffbffe9c38] 3 FV_read_service__18TC_transport_layerXPv(this = 0x140241e00, AI_socket = 0x11) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc":1221, 0x3ffbffe765c] 4 FV_entry__31TC_task___T18TC_transport_layerXR31TC_task___T18TC_transport_layer AR_task = @140dadd80) ["/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh":620, 0x3ffbffe63a8] 5 thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732) [0x3ff80574274] (dbx) where > 0 __read(0x0, 0x1201fc4e0, 0x141166ef0, 0x1201fce90, 0x1400875e0) [0x3ff800d1818] 1 FI_full_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket = 1089074352, AP_bufadr = 0x1, AI_buflen = 1091989408) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":774, 0x3ffbffe9ea0] 2 FI_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket = 17, AP_bufadr = 0x141167230 = "^�", AI_buflen = 2048) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":573, 0x3ffbffe9c38] 3 FV_read_service__18TC_transport_layerXPv(this = 0x140241e00, AI_socket = 0x11) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc":1221, 0x3ffbffe765c] 4 FV_entry__31TC_task___T18TC_transport_layerXR31TC_task___T18TC_transport_layer AR_task = @140dadd80) ["/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh":620, 0x3ffbffe63a8] 5 thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732) [0x3ff80574274] (dbx) list 86 87 -- function Main, the real main routine for the overall program. 88 -- 89 function Main -- in DEC Ada created code; please list lines 1:36 for more information 90 ( 91 Argc : in Long_Integer; 92 Argv : in System.Address; 93 Envp : in System.Address 94 ) 95 return Long_Integer 96 is (dbx) up FI_full_read__25TC_transport_layer_tcp_ipXiPci: 774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ; (dbx) list 771 { 772 int VI_rslt ; 773 > 774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ; 775 776 // Check for connection loss. 777 778 if ( VI_rslt < 0 && errno != ECONNRESET ) 779 { 780 MM_throw_system_error ( errno ) ; 781 } (dbx) stop at 776 [2] stop at "/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":778 (dbx) cont ----------------------------------------------------------------------------- and here it never stops at 778, showing that something not normal occurs in read() (exception ? spin...?) If you want more details (debugger details ...) I can give you more. Thank you, Laurent [Posted by WWW Notes gateway] | |||||
8718.3 | SMURF::DENHAM | Digital UNIX Kernel | Fri Feb 14 1997 10:24 | 15 | |
Kernel side stack traces would be useful too. Let this thing run and dbx -k /vmunix. Do a tstack command and try to find some in read(). If you do find someone, do a couple of stack traces a few seconds apart. Any change? Then try this, again a few seconds between iterations: (dbx) p ((thread_t)$tid).last_run_stamp This will show whether the thread is running at all. Of course ps mp PID-OF-STUCK-PROCESS a few times will show you the same thing -- whether the "spinning" thread is really spinning. | |||||
8718.4 | What is it doing there? | WTFN::SCALES | Despair is appropriate and inevitable. | Fri Feb 14 1997 14:24 | 8 |
> thread 0x15 stopped at>*[__read, 0x3ff800d1818] beq r19, 0x3ff800d1830 It might also be illuminating if you were to single-step (by instruction) and see if you can traverse the whole loop. That might give us (or someone else with access to the source) an idea of what's going on. Webb | |||||
8718.5 | Re:.2 .3 :dbx and ladebug traces of the problem. | NETRIX::"[email protected]" | Laurent Martin | Tue Feb 18 1997 07:44 | 266 |
Hi, As Jeff said, i did a dbx -k /vmunix with tstack commands (four per second): see annex A we can see that we are very often in read() If i use dbx (or ladebug), stepi doesn't stop: Annex B I don't know what i could do now ... Laurent ########################## A ############################################## (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] 1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c) ["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0] 2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x8, 0xfffffc000ee65800, 0xfffffc0000200000) ["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918] 3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag = 0, cred = 0xfffffc000ee65800) ["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac] 4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0, 0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877, 0xfffffc00004c4a0c] 5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080, 0xfffffc000ebb8000, 0xfffffc000edb3700) ["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8] 6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0) ["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8] 7 syscall(0x8, 0xfffffc00004fd890, 0xa4e75330985be, 0x3ff800d7248, 0x3) ["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00] > 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950) ["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] 1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c) ["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0] 2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x2, 0xfffffc000ee65800, 0xfffffc0000200000) ["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918] 3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag = 0, cred = 0xfffffc000ee65800) ["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac] 4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0, 0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877, 0xfffffc00004c4a0c] 5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080, 0xfffffc000ebb8000, 0xfffffc000edb3700) ["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8] 6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0) ["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8] 7 syscall(0x8, 0xfffffc00004fda70, 0xe42d4330985be, 0x12009ab68, 0x3) ["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00] > 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950) ["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] 1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c) ["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0] 2 spec_read(0xffffffff90e4f7d8, 0xfffffc0000230000, 0x120086380, 0xfffffc000ee65800, 0xfffffc0000200000) ["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918] 3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag = 0, cred = 0xfffffc000ee65800) ["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac] 4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0, 0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877, 0xfffffc00004c4a0c] 5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080, 0xfffffc000ebb8000, 0xfffffc000edb3700) ["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8] 6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0) ["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8] 7 syscall(0x8, 0xfffffc00004fd890, 0x370a5330985bf, 0x12008b5a0, 0x3) ["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00] > 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950) ["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] 1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c) ["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0] 2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0xfffffc00004c4d70, 0xfffffc000ee65800, 0xfffffc0000200000) ["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918] 3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag = 0, cred = 0xfffffc000ee65800) ["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac] 4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0, 0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877, 0xfffffc00004c4a0c] 5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080, 0xfffffc000ebb8000, 0xfffffc000edb3700) ["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8] 6 read(0x8, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0) ["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8] 7 syscall(0x8, 0xfffffc00004fd890, 0xdc09f330985bf, 0x3ff800e5518, 0x3) ["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00] > 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950) ["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] (dbx) tstack Thread 0xfffffc000bf5fb80: 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094, 0xfffffc00002a86bc] 1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c) ["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0] 2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x120086380, 0xfffffc000ee65800, 0xfffffc0000200000) ["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918] 3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag = 0, cred = 0xfffffc000ee65800) ["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac] 4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0, 0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877, 0xfffffc00004c4a0c] 5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080, 0xfffffc000ebb8000, 0xfffffc000edb3700) ["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8] 6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0) ["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8] 7 syscall(0x8, 0xfffffc00004fd890, 0x21798330985c1, 0x120039984, 0x3) ["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00] > 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950) ["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4] (dbx) ########################## B ############################################## % ladebug -pid 17685 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt Welcome to the Ladebug Debugger Version 4.0-26 ------------------ object file name: /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt Reading symbolic information ...done Attached to process id 17685 .... Thread received signal INT stopped at [???: ??? 0x3ff800d1818] (ladebug) where >0 0x3ff800d1818 in /usr/shlib/libc.so #1 0x3ffbffe9b90 in ((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-885136, AP_bufadr=0x3fffffd2820="P;\367\277\377\003", AI_buflen=-989136) /usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:774 #2 0x3ffbffe9928 in ((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_read(AI_socket=17, AP_bufadr=0x140acb230="|\001AMUSSETVLR \b\0029MUSSET ", AI_buflen=2048) /usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:573 #3 0x3ffbffe734c in ((TC_transport_layer*)0x1400e9600)->FV_read_service(AI_socket=0x11) /usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc:1221 #4 0x3ffbffe6098 in TC_task_<TC_transport_layer>::FV_entry(AR_task={ ... }) /usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh:620 #5 0x3ff80574274 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile101:??? (ladebug) stepi Thread received signal INT stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830 (ladebug) 0x3ff800d1818/20 i *[???:???, 0x3ff800d1818] beq r19, 0x3ff800d1830 [???:???, 0x3ff800d181c] call_pal bpt [???:???, 0x3ff800d1820] ldah gp, 16380(gp) [???:???, 0x3ff800d1824] lda gp, 6544(gp) [???:???, 0x3ff800d1828] ldq r28, -27696(gp) [???:???, 0x3ff800d182c] jmp r31, (r28), 0x3ff800d1160(r31) [???:???, 0x3ff800d1830] ret r31, (r26), 1 [???:???, 0x3ff800d1834] bis r31, r31, r31 [???:???, 0x3ff800d1838] ldq_u r31, 0(sp) [???:???, 0x3ff800d183c] bis r31, r31, r31 [__gettimeofday:???, 0x3ff800d1840] addq r31, 0x74, r0 [__gettimeofday:???, 0x3ff800d1844] call_pal callsys [???:???, 0x3ff800d1848] beq r19, 0x3ff800d1860 [???:???, 0x3ff800d184c] br gp, 0x3ff800d1850 [???:???, 0x3ff800d1850] ldah gp, 16380(gp) [???:???, 0x3ff800d1854] lda gp, 6496(gp) [???:???, 0x3ff800d1858] ldq r28, -27696(gp) [???:???, 0x3ff800d185c] jmp r31, (r28), 0x3ff800cf0a0(r31) [???:???, 0x3ff800d1860] ret r31, (r26), 1 [???:???, 0x3ff800d1864] bis r31, r31, r31 (ladebug) stopi at 0x3ff800d181c [#1: stopi at 4395899885596 ] (ladebug) stopi at 0x3ff800d1818 [#2: stopi at 4395899885592 ] (ladebug) stopi at 0x3ff800d1830 [#3: stopi at 4395899885616 ] (ladebug) nexti <<-- doesn't stop, i do Ctrl+C Thread received signal INT stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830 (ladebug) stepi Thread received signal INT stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830 (ladebug) up >1 0x3ffbffe9b90 in ((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-885136, AP_bufadr=0x3fffffd2820="P;\367\277\377\003", AI_buflen=-989136) /usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:774 774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ; (ladebug) list 770 770 while ( VI_count < AI_buflen ) 771 { 772 int VI_rslt ; 773 > 774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ; 775 776 // Check for connection loss. 777 778 if ( VI_rslt < 0 && errno != ECONNRESET ) 779 { 780 MM_throw_system_error ( errno ) ; 781 } 782 783 if ( VI_rslt <= 0 ) 784 { 785 return 0 ; 786 } 787 788 VI_count += VI_rslt ; 789 } 790 (ladebug) stop at 777 [#4: stop at "/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":778 ] (ladebug) stepi Thread received signal INT stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830 [Posted by WWW Notes gateway] | |||||
8718.6 | SMURF::DENHAM | Digital UNIX Kernel | Tue Feb 18 1997 10:33 | 12 | |
Hmmmmm. The fact that you're getting different stack traces shows that you're not stuck in read. But the fact that you're seeing something besides the read trace shows that you appear to be completing the reads. Let's confirm what you're reading, OK? The call is going to the memory device -- so, this is read is to /dev/mem, kmem, zero? Something else? Nor can I find a thread_block at line 125 of mmread, which is a little puzzling. So, tell us more about what you're reading and what's supposed to be happening.... | |||||
8718.7 | Maybe it's doing what you told it to do... | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Feb 18 1997 15:01 | 9 |
.-2> 770 while ( VI_count < AI_buflen ) Also, you're doing the read inside a loop...did you take a look at the loop conditional expression? (E.g., any chance that AI_buflen is huge, or that VI_count has been corrupted (e.g., it's negative), or that VI_count somehow is not being incremented?) Webb | |||||
8718.8 | Another possibility | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Feb 18 1997 15:05 | 11 |
I see that you're setting the breakpoint inside the loop and it never seems to hit that, which does suggest that read() is never returning (but which could be an illusion -- the debugger seems to be pretty shakey on other issues here...) Another possibility is that the RA on the stack has been corrupted, so that every time read() "returns" control passes to the beginning of the read() routine. (That's a pretty long shot, but it's no inconceivable...) Webb | |||||
8718.9 | it seems that problem doesn't occur on a multi cpu platform.. | NETRIX::"[email protected]" | Laurent Martin | Thu Feb 20 1997 12:50 | 7 |
I tried DECss7 on a multi cpu (4) platform and i didn't yet have this problem. But i still have the problem as in topic 8821 (sigwait interrupted with EINTR while all signals blocked.) Laurent [Posted by WWW Notes gateway] | |||||
8718.10 | Also seeing an apparent spin in read | MUFFIT::gerry | Gerry Reilly | Tue Mar 04 1997 16:38 | 144 |
We appear to be hitting a similar/same problem to .0 with a spin in read from sockets from a multithread program. This is being run on V4.0a. We have set breakpoints after the read - these are not reached. Not always trusting a debugger, we also added trace statements after the reads - also do not get reached. Therefore I have to conclude the spin is actually in the read(). Only seen so far on SMP systems. Also when we get the spin, we always see two threads spinning (never one). These two threads are doing reads against different sockets. Any thoughts ?, or useful information that we should collect ? -gerry (ladebug) show thread Thread State Substate Policy Priority Name ------ ---------- --------------- ---------- -------- ------------- 1 blocked cond wait throughput 11 default thread -1 ready kernel fifo 32 manager thread -2 ready idle 0 null thread for VP 0x0 -3 ready idle 0 null thread for VP 0x1 2 blocked timed cond wait throughput 11 <anonymous> 3 ready throughput 11 <anonymous> 4 blocked kernel throughput 11 <anonymous> 5 blocked cond wait throughput 11 <anonymous> 6 blocked cond wait throughput 11 <anonymous> 7 blocked cond wait throughput 11 <anonymous> 8 blocked cond wait throughput 11 <anonymous> 9 blocked cond wait throughput 11 <anonymous> 10 blocked cond wait throughput 11 <anonymous> 11 blocked cond wait throughput 11 <anonymous> 12 blocked cond wait throughput 11 <anonymous> 13 ready kernel throughput 11 <anonymous> 32 ready kernel throughput 11 <anonymous> 33 ready kernel throughput 11 <anonymous> 34 blocked kernel throughput 11 <anonymous> 35 blocked kernel throughput 11 <anonymous> 36 blocked kernel throughput 11 <anonymous> 37 blocked kernel throughput 11 <anonymous> 38 blocked kernel throughput 11 <anonymous> 39 blocked kernel throughput 11 <anonymous> 40 blocked kernel throughput 11 <anonymous> 41 blocked kernel throughput 11 <anonymous> >* 42 running throughput 11 <anonymous> 43 blocked cond wait throughput 11 <anonymous> 44 running throughput 11 <anonymous> 45 blocked cond wait throughput 11 <anonymous> 46 blocked kernel throughput 11 <anonymous> 47 blocked kernel throughput 11 <anonymous> 48 blocked kernel throughput 11 <anonymous> (ladebug) thread 42 Thread State Substate Policy Priority Name ------ ---------- --------------- ---------- -------- ------------- >* 42 running throughput 11 <anonymous> (ladebug) where >0 0x24b49818 in /usr/shlib/libc.so #1 0x6f593924 in ComIP_SocketReadMsg() /u/canna/Monaco-L970302/src/com/ip/src/co mip_skrct.c:404 #2 0x6f596518 in ComIP_InternalSocketReceive() /u/canna/Monaco-L970302/src/com/i p/src/comip_skrct.c:964 #3 0x6f594a50 in ComIP_SocketReceiveThread() /u/canna/Monaco-L970302/src/com/ip/ src/comip_skrct.c:558 #4 0x246c0d74 in UnknownProcedure12FromFile179(0x3, 0x142efa18, 0x24810b98, 0x24 6c0cb0, 0x143b2f30, 0x0) DebugInformationStrippedFromFile179:??? #5 0x24810b94 in thdBase(0x0, 0x0, 0x248002c0, 0x1, 0x45586732, 0x3) DebugInform ationStrippedFromFile101:??? (ladebug) thread 44 Thread State Substate Policy Priority Name ------ ---------- --------------- ---------- -------- ------------- > 44 running throughput 11 <anonymous> (ladebug) where >0 0x24b49818 in /usr/shlib/libc.so #1 0x6f593924 in ComIP_SocketReadMsg() /u/canna/Monaco-L970302/src/com/ip/src/co mip_skrct.c:404 #2 0x6f596518 in ComIP_InternalSocketReceive() /u/canna/Monaco-L970302/src/com/i p/src/comip_skrct.c:964 #3 0x6f594a50 in ComIP_SocketReceiveThread() /u/canna/Monaco-L970302/src/com/ip/ src/comip_skrct.c:558 #4 0x246c0d74 in UnknownProcedure12FromFile179(0x3, 0x14419a18, 0x24810b98, 0x24 6c0cb0, 0x1437b490, 0x0) DebugInformationStrippedFromFile179:??? #5 0x24810b94 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationSt rippedFromFile101:??? (ladebug) detach =============== Code extract =============== This is the code extract from comip_skrct.c. Line 404 is the line: Read = SUPOS_SOCKET_READ(sock,BuffP,BuffL); ERZIENTRY("ComIP_SocketReadMsg", ("(Socket=%i)",sock)); /* * We are expecting a Rx Header so read until we have all of * the header in our buffer. */ Bytes = 0; while ((*Continue == TRUE) && (Bytes < COMIP_RX_LEN)) { BuffP = Buffer + Bytes; BuffL = COMIP_RX_LEN - Bytes; ERZMID(COMIP_TINFO,("Reading %lu bytes of Rx Header",BuffL)); Read = SUPOS_SOCKET_READ(sock,BuffP,BuffL); if (Read > 0) { ERZMID(COMIP_TINFO,("%lu bytes of Rx Header read",Read)); Bytes += Read; } else { if (Read < 0) { xerrno = SUPOS_SOCKET_LAST_ERROR; ERZMID(COMIP_TERROR,("Read error on Rx Header = %d",xerrno)); } else { ERZMID(COMIP_TERROR,("Socket closed reading Rx Header")); } *Continue = FALSE; } } This is the #define for SUPOS_SOCKET_READ. #define SUPOS_SOCKET_READ(Sock,pBuf,pBufLen) read(Sock,pBuf,pBufLen) ERZMID(...) writes out trace information | |||||
8718.11 | fixed most likely | SMURF::DENHAM | Digital UNIX Kernel | Tue Mar 04 1997 21:17 | 8 |
Yeah, shoulda updated this yesterday. Working with the ss7 folks who reported this and related sigwait problem, I located the cause of the spinning. You should be seeing EINTR returned repeatedly in the errno value (if you zero it before the read call of course). I have some v4.0b patches ready for you test. If you need other versions, let me know and I'll cook them up asap. | |||||
8718.12 | SMURF::DENHAM | Digital UNIX Kernel | Tue Mar 04 1997 21:19 | 8 | |
I suppose I should have provided a few details on the nature of the problem, but I can't bring myself to say more than it's just another feature of trying to get threads and signals to work together. The code causing the problem was ripped out first thing when I started some v4.0d enhancements. It took me a while to realize why I never saw the problem on my latest kernels.... | |||||
8718.13 | QAR for this problem is QAR51813 | NETRIX::"[email protected]" | Laurent Martin | Wed Mar 05 1997 12:32 | 9 |
Hi, We entered a QAR for this problem that was related to the topic #8821. This the fix to be traced in future Unix versions. Laurent SISB-Telecom [Posted by WWW Notes gateway] |