| Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
| Notice: | Welcome to the Digital UNIX Conference |
| Moderator: | SMURF::DENHAM |
| Created: | Thu Mar 16 1995 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 10068 |
| Total number of notes: | 35879 |
Hi all,
We are working on the migration of our product (DECss7) from DIGITAL Unix 3.2 to DIGITAL Unix 4.0B.
Beside the fact of the slight change in standards (POSIX) for threads / signals, we are having a problem that
occurs with the read() function.
Here is a description of the problem:
DECss7 is a multithreaded / distributed platform.
Processes have connections with sockets / mailboxes
In a simple configuration of the platform, all works fine for about 5 minutes and then a process stops with a high
process charge.
this is what i get with "ps":
18071 0.0 /usr/var/ss7/platform_m/bin/ss7_fep_ccitt_m.exe
29915 0.0 /usr/var/ss7/platform_m/bin/ss7_bep_ccitt_m.exe
17964 0.0 /usr/var/ss7/platform_m/bin/ss7_dir_ccitt_m.exe
17421 0.0 /usr/var/ss7/platform_l/bin/ss7_fep_ccitt_l.exe
2180 0.0 /usr/var/ss7/platform_l/bin/ss7_bep_ccitt_l.exe
18036 0.0 /usr/var/ss7/platform_l/bin/ss7_dir_ccitt_l.exe
18896 0.0 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_hlr_thread_ccitt
18864 42.0 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
We use ladebug to see where it is spinning:
ladebug -pid 18864 /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
Welcome to the Ladebug Debugger Version 4.0-26
------------------
object file name: /usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
Reading symbolic information ...done
Attached to process id 18864 ....
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]
(ladebug) show thread
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
>* 20 running throughput 11 0x1409528c0 TRP-RD-SS7
1 ready throughput 11 default thread
-1 blocked kernel fifo 32 manager thread
-2 ready idle 0 null thread for VP 0x0
2 ready throughput 11 0x140062f20 SIGNAL-SS7
3 blocked cond wait rr 23 <anonymous>
4 blocked cond wait rr 23 <anonymous>
5 blocked kernel rr 23 <anonymous>
6 blocked kernel throughput 11 0x1406e52c0 TRP-CN-SS7
7 blocked cond wait throughput 11 0x1406e5320 TRP-RG-SS7
8 blocked cond wait throughput 11 0x1406e53e0 MBX-RD-SS7
9 blocked cond wait throughput 11 0x1406e5440 MBX-WR-SS7
10 blocked cond wait throughput 11 0x1406e54a0 MBX-IO-SS7
11 blocked cond wait rr 23 0x1406e5500 MBX-RD-SS7
12 blocked cond wait rr 23 0x1406e5560 MBX-WR-SS7
13 blocked cond wait rr 23 0x1406e55c0 MBX-IO-SS7
14 blocked kernel rr 23 <anonymous>
15 blocked kernel rr 23 <anonymous>
16 blocked kernel throughput 11 0x1406e56e0 TRP-NT-SS7
18 blocked kernel throughput 11 0x140952800 TRP-NT-SS7
19 blocked cond wait throughput 11 0x140952860 TRP-WR-SS7
21 blocked kernel throughput 11 0x140952980 TRP-RD-SS7
22 ready cond wait throughput 11 <anonymous>
23 blocked cond wait throughput 11 <anonymous>
(ladebug) where
>0 0x3ff800d1818 in /usr/shlib/libc.so
#1 0x3ffbffeb0e4 in ((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-1016448,
AP_bufadr=0x3fffffb2730="p\"\356\277\377\003", AI_buflen=-1120448)
/usr/users1/martin/ss7dev/ccitt_v30_osf40/dev/transport_layer_tcp_ip.cc:771
...
(ladebug) up
(ladebug) list 760
760 // We read on the socket until we have filled the given buffer.
761
762 while ( VI_count < AI_buflen )
763 {
764 int VI_rslt = 0;
765
766 if (VI_rslt <0 || VI_rslt>2000)
767 {
768 printf("TCP/IP erreur: VI_rslt=%d\n",VI_rslt);
769 }
770
> 771 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ;
772
773 // Check for connection loss.
774
775 if ( VI_rslt < 0 && errno != ECONNRESET )
776 {
777 MM_throw_system_error ( errno ) ;
778 }
779
780 if ( VI_rslt <= 0 )
(ladebug) p VI_rslt
-1016976
As we can see, the read() function call spins.
Is this a known problem or is there a patch for read() in Unix 4.0 .
We are also having problem with sigwait that fails in "cxx__thow" ...???
(not at the same time, but perhaps is it related ? )
Is there a way to debug this ?
Any help appreciated !!!
Thank you !
Laurent Martin
SISB Telecom
France
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 8718.1 | We can't tell that it's spinning. | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Feb 06 1997 19:05 | 13 |
.0> As we can see, the read() function call spins. Based solely on the information you've provided, it's not at all obvious that the read() function call is spinning. You've interrupted an active process and taken a snapshot of what it was doing at that moment. Are you certain that the call to read() never returns? Do other threads in the process continue to run? I would recommend that you use the debugger to determine what the current thread is actually doing... Webb | |||||
| 8718.2 | I'm sure it's spinning !! | NETRIX::"[email protected]" | Laurent Martin | Fri Feb 14 1997 05:33 | 101 |
Hi,
Webb, you're right, i didn't give details in the debugger output to
show that read is spinning:
We can reproduce the spin when making DECss7 run a certain time on unix4.0 .
We are having troubles with signals too (see topic #8821).
he is another debugging session: (with dbx this time.)
------------------------------------------------------------------------------
--
% dbx -pid 8777 /usr/var/ss7/platform_l/bin/ss7_fep_ccitt_l.exe
dbx version 3.11.10
Type 'help' for help.
thread 0x15 stopped at>*[__read, 0x3ff800d1818] beq r19,
0x3ff800d1830
(dbx) where
> 0 __read(0x0, 0x1201fc4e0, 0x141166ef0, 0x1201fce90, 0x1400875e0)
[0x3ff800d1818]
1 FI_full_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00,
AI_socket = 1089074352, AP_bufadr = 0x1, AI_buflen = 1091989408)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":774,
0x3ffbffe9ea0]
2 FI_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket =
17, AP_bufadr = 0x141167230 = "^�", AI_buflen = 2048)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":573,
0x3ffbffe9c38]
3 FV_read_service__18TC_transport_layerXPv(this = 0x140241e00, AI_socket =
0x11) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc":1221,
0x3ffbffe765c]
4
FV_entry__31TC_task___T18TC_transport_layerXR31TC_task___T18TC_transport_layer
AR_task = @140dadd80)
["/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh":620, 0x3ffbffe63a8]
5 thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732) [0x3ff80574274]
(dbx) where
> 0 __read(0x0, 0x1201fc4e0, 0x141166ef0, 0x1201fce90, 0x1400875e0)
[0x3ff800d1818]
1 FI_full_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00,
AI_socket = 1089074352, AP_bufadr = 0x1, AI_buflen = 1091989408)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":774,
0x3ffbffe9ea0]
2 FI_read__25TC_transport_layer_tcp_ipXiPci(this = 0x140241e00, AI_socket =
17, AP_bufadr = 0x141167230 = "^�", AI_buflen = 2048)
["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":573,
0x3ffbffe9c38]
3 FV_read_service__18TC_transport_layerXPv(this = 0x140241e00, AI_socket =
0x11) ["/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc":1221,
0x3ffbffe765c]
4
FV_entry__31TC_task___T18TC_transport_layerXR31TC_task___T18TC_transport_layer
AR_task = @140dadd80)
["/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh":620, 0x3ffbffe63a8]
5 thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732) [0x3ff80574274]
(dbx) list
86
87 -- function Main, the real main routine for the overall program.
88 --
89 function Main -- in DEC Ada created code; please list lines 1:36
for more information
90 (
91 Argc : in Long_Integer;
92 Argv : in System.Address;
93 Envp : in System.Address
94 )
95 return Long_Integer
96 is
(dbx) up
FI_full_read__25TC_transport_layer_tcp_ipXiPci: 774 VI_rslt = read (
AI_socket , AP_bufadr + VI_count , AI_buflen - VI_count ) ;
(dbx) list
771 {
772 int VI_rslt ;
773
> 774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen -
VI_count ) ;
775
776 // Check for connection loss.
777
778 if ( VI_rslt < 0 && errno != ECONNRESET )
779 {
780 MM_throw_system_error ( errno ) ;
781 }
(dbx) stop at 776
[2] stop at
"/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":778
(dbx) cont
-----------------------------------------------------------------------------
and here it never stops at 778, showing that something not normal occurs in
read()
(exception ? spin...?)
If you want more details (debugger details ...) I can give you more.
Thank you,
Laurent
[Posted by WWW Notes gateway]
| |||||
| 8718.3 | SMURF::DENHAM | Digital UNIX Kernel | Fri Feb 14 1997 10:24 | 15 | |
Kernel side stack traces would be useful too. Let this thing
run and dbx -k /vmunix.
Do a tstack command and try to find some in read(). If you do
find someone, do a couple of stack traces a few seconds apart.
Any change?
Then try this, again a few seconds between iterations:
(dbx) p ((thread_t)$tid).last_run_stamp
This will show whether the thread is running at all.
Of course ps mp PID-OF-STUCK-PROCESS a few times will show you the
same thing -- whether the "spinning" thread is really spinning.
| |||||
| 8718.4 | What is it doing there? | WTFN::SCALES | Despair is appropriate and inevitable. | Fri Feb 14 1997 14:24 | 8 |
> thread 0x15 stopped at>*[__read, 0x3ff800d1818] beq r19, 0x3ff800d1830 It might also be illuminating if you were to single-step (by instruction) and see if you can traverse the whole loop. That might give us (or someone else with access to the source) an idea of what's going on. Webb | |||||
| 8718.5 | Re:.2 .3 :dbx and ladebug traces of the problem. | NETRIX::"[email protected]" | Laurent Martin | Tue Feb 18 1997 07:44 | 266 |
Hi,
As Jeff said, i did a dbx -k /vmunix
with tstack commands (four per second): see annex A
we can see that we are very often in read()
If i use dbx (or ladebug), stepi doesn't stop: Annex B
I don't know what i could do now ...
Laurent
########################## A ##############################################
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x8,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
7 syscall(0x8, 0xfffffc00004fd890, 0xa4e75330985be, 0x3ff800d7248, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
> 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x2,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
7 syscall(0x8, 0xfffffc00004fda70, 0xe42d4330985be, 0x12009ab68, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
> 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
2 spec_read(0xffffffff90e4f7d8, 0xfffffc0000230000, 0x120086380,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
7 syscall(0x8, 0xfffffc00004fd890, 0x370a5330985bf, 0x12008b5a0, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
> 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0xfffffc00004c4d70,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
6 read(0x8, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
7 syscall(0x8, 0xfffffc00004fd890, 0xdc09f330985bf, 0x3ff800e5518, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
> 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
(dbx) tstack
Thread 0xfffffc000bf5fb80:
0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2094,
0xfffffc00002a86bc]
1 mmread(0x0, 0x0, 0xfffffc0000607f30, 0x0, 0xfffffc00004a391c)
["../../../../src/kernel/arch/alpha/mem.c":125, 0xfffffc00005059a0]
2 spec_read(0xffffffff90e4f7d8, 0xfffffc000ee45a00, 0x120086380,
0xfffffc000ee65800, 0xfffffc0000200000)
["../../../../src/kernel/vfs/spec_vnops.c":1870, 0xfffffc00004a3918]
3 msfsspec_read(vp = 0xfffffc00052af600, uio = 0xffffffff90e4f848, ioflag =
0, cred = 0xfffffc000ee65800)
["../../../../src/kernel/msfs/osf/msfs_vnops.c":4422, 0xfffffc00003eacac]
4 vn_read(0xfffffc0000289b64, 0xffffffff90e4f8e0, 0xffffffff90e4f848, 0x0,
0xfffffc000edb3700) ["../../../../src/kernel/vfs/vfs_vnops.c":877,
0xfffffc00004c4a0c]
5 rwuio(0xfffffc000ebb8220, 0xfffffc000bf5fb80, 0xfffffc0000686080,
0xfffffc000ebb8000, 0xfffffc000edb3700)
["../../../../src/kernel/bsd/sys_generic.c":1217, 0xfffffc0000289bb8]
6 read(0xfffffc0000000008, 0xffffffff90e4f838, 0x5e7d850, 0x800000001, 0x0)
["../../../../src/kernel/bsd/sys_generic.c":1167, 0xfffffc0000289ac8]
7 syscall(0x8, 0xfffffc00004fd890, 0x21798330985c1, 0x120039984, 0x3)
["../../../../src/kernel/arch/alpha/syscall_trap.c":540, 0xfffffc0000507a00]
> 8 _Xsyscall(0x8, 0x3ff800d1818, 0x1400302b0, 0x3, 0x11fffb950)
["../../../../src/kernel/arch/alpha/locore.s":1209, 0xfffffc00004fdac4]
(dbx)
########################## B ##############################################
% ladebug -pid 17685
/usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
Welcome to the Ladebug Debugger Version 4.0-26
------------------
object file name:
/usr/users/ss7/perfs/ivp2.loop/obj/multi_thread/ss7_vlr_thread_ccitt
Reading symbolic information ...done
Attached to process id 17685 ....
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818]
(ladebug) where
>0 0x3ff800d1818 in /usr/shlib/libc.so
#1 0x3ffbffe9b90 in
((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-885136,
AP_bufadr=0x3fffffd2820="P;\367\277\377\003", AI_buflen=-989136)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:774
#2 0x3ffbffe9928 in
((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_read(AI_socket=17,
AP_bufadr=0x140acb230="|\001AMUSSETVLR \b\0029MUSSET ", AI_buflen=2048)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:573
#3 0x3ffbffe734c in
((TC_transport_layer*)0x1400e9600)->FV_read_service(AI_socket=0x11)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer.cc:1221
#4 0x3ffbffe6098 in TC_task_<TC_transport_layer>::FV_entry(AR_task={ ... })
/usr/users1/martin/ss7dev/ccitt_v31_du40/dev/task_.hh:620
#5 0x3ff80574274 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3)
DebugInformationStrippedFromFile101:???
(ladebug) stepi
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830
(ladebug) 0x3ff800d1818/20 i
*[???:???, 0x3ff800d1818] beq r19, 0x3ff800d1830
[???:???, 0x3ff800d181c] call_pal bpt
[???:???, 0x3ff800d1820] ldah gp, 16380(gp)
[???:???, 0x3ff800d1824] lda gp, 6544(gp)
[???:???, 0x3ff800d1828] ldq r28, -27696(gp)
[???:???, 0x3ff800d182c] jmp r31, (r28), 0x3ff800d1160(r31)
[???:???, 0x3ff800d1830] ret r31, (r26), 1
[???:???, 0x3ff800d1834] bis r31, r31, r31
[???:???, 0x3ff800d1838] ldq_u r31, 0(sp)
[???:???, 0x3ff800d183c] bis r31, r31, r31
[__gettimeofday:???, 0x3ff800d1840] addq r31, 0x74, r0
[__gettimeofday:???, 0x3ff800d1844] call_pal callsys
[???:???, 0x3ff800d1848] beq r19, 0x3ff800d1860
[???:???, 0x3ff800d184c] br gp, 0x3ff800d1850
[???:???, 0x3ff800d1850] ldah gp, 16380(gp)
[???:???, 0x3ff800d1854] lda gp, 6496(gp)
[???:???, 0x3ff800d1858] ldq r28, -27696(gp)
[???:???, 0x3ff800d185c] jmp r31, (r28), 0x3ff800cf0a0(r31)
[???:???, 0x3ff800d1860] ret r31, (r26), 1
[???:???, 0x3ff800d1864] bis r31, r31, r31
(ladebug) stopi at 0x3ff800d181c
[#1: stopi at 4395899885596 ]
(ladebug) stopi at 0x3ff800d1818
[#2: stopi at 4395899885592 ]
(ladebug) stopi at 0x3ff800d1830
[#3: stopi at 4395899885616 ]
(ladebug) nexti <<-- doesn't stop, i do Ctrl+C
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830
(ladebug) stepi
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830
(ladebug) up
>1 0x3ffbffe9b90 in
((TC_transport_layer_tcp_ip*)0x1400e9600)->FI_full_read(AI_socket=-885136,
AP_bufadr=0x3fffffd2820="P;\367\277\377\003", AI_buflen=-989136)
/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc:774
774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen -
VI_count ) ;
(ladebug) list 770
770 while ( VI_count < AI_buflen )
771 {
772 int VI_rslt ;
773
> 774 VI_rslt = read ( AI_socket , AP_bufadr + VI_count , AI_buflen -
VI_count ) ;
775
776 // Check for connection loss.
777
778 if ( VI_rslt < 0 && errno != ECONNRESET )
779 {
780 MM_throw_system_error ( errno ) ;
781 }
782
783 if ( VI_rslt <= 0 )
784 {
785 return 0 ;
786 }
787
788 VI_count += VI_rslt ;
789 }
790
(ladebug) stop at 777
[#4: stop at
"/usr/ss7prj1/ccitt_v31_du40/base/starlet/transport_layer_tcp_ip.cc":778 ]
(ladebug) stepi
Thread received signal INT
stopped at [???: ??? 0x3ff800d1818] beq r19, 0x3ff800d1830
[Posted by WWW Notes gateway]
| |||||
| 8718.6 | SMURF::DENHAM | Digital UNIX Kernel | Tue Feb 18 1997 10:33 | 12 | |
Hmmmmm. The fact that you're getting different stack traces
shows that you're not stuck in read. But the fact that you're
seeing something besides the read trace shows that you appear
to be completing the reads.
Let's confirm what you're reading, OK? The call is going to the
memory device -- so, this is read is to /dev/mem, kmem, zero?
Something else? Nor can I find a thread_block at line 125
of mmread, which is a little puzzling.
So, tell us more about what you're reading and what's supposed
to be happening....
| |||||
| 8718.7 | Maybe it's doing what you told it to do... | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Feb 18 1997 15:01 | 9 |
.-2> 770 while ( VI_count < AI_buflen ) Also, you're doing the read inside a loop...did you take a look at the loop conditional expression? (E.g., any chance that AI_buflen is huge, or that VI_count has been corrupted (e.g., it's negative), or that VI_count somehow is not being incremented?) Webb | |||||
| 8718.8 | Another possibility | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Feb 18 1997 15:05 | 11 |
I see that you're setting the breakpoint inside the loop and it never seems to hit that, which does suggest that read() is never returning (but which could be an illusion -- the debugger seems to be pretty shakey on other issues here...) Another possibility is that the RA on the stack has been corrupted, so that every time read() "returns" control passes to the beginning of the read() routine. (That's a pretty long shot, but it's no inconceivable...) Webb | |||||
| 8718.9 | it seems that problem doesn't occur on a multi cpu platform.. | NETRIX::"[email protected]" | Laurent Martin | Thu Feb 20 1997 12:50 | 7 |
I tried DECss7 on a multi cpu (4) platform and i didn't yet have this problem. But i still have the problem as in topic 8821 (sigwait interrupted with EINTR while all signals blocked.) Laurent [Posted by WWW Notes gateway] | |||||
| 8718.10 | Also seeing an apparent spin in read | MUFFIT::gerry | Gerry Reilly | Tue Mar 04 1997 16:38 | 144 |
We appear to be hitting a similar/same problem to .0 with a spin in read
from sockets from a multithread program. This is being run
on V4.0a. We have set breakpoints after the read - these are not reached.
Not always trusting a debugger, we also added trace statements
after the reads - also do not get reached. Therefore I have to conclude
the spin is actually in the read().
Only seen so far on SMP systems. Also when we get the spin, we always
see two threads spinning (never one). These two threads are doing reads
against different sockets.
Any thoughts ?, or useful information that we should collect ?
-gerry
(ladebug) show thread
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
1 blocked cond wait throughput 11 default thread
-1 ready kernel fifo 32 manager thread
-2 ready idle 0 null thread for VP 0x0
-3 ready idle 0 null thread for VP 0x1
2 blocked timed cond wait throughput 11 <anonymous>
3 ready throughput 11 <anonymous>
4 blocked kernel throughput 11 <anonymous>
5 blocked cond wait throughput 11 <anonymous>
6 blocked cond wait throughput 11 <anonymous>
7 blocked cond wait throughput 11 <anonymous>
8 blocked cond wait throughput 11 <anonymous>
9 blocked cond wait throughput 11 <anonymous>
10 blocked cond wait throughput 11 <anonymous>
11 blocked cond wait throughput 11 <anonymous>
12 blocked cond wait throughput 11 <anonymous>
13 ready kernel throughput 11 <anonymous>
32 ready kernel throughput 11 <anonymous>
33 ready kernel throughput 11 <anonymous>
34 blocked kernel throughput 11 <anonymous>
35 blocked kernel throughput 11 <anonymous>
36 blocked kernel throughput 11 <anonymous>
37 blocked kernel throughput 11 <anonymous>
38 blocked kernel throughput 11 <anonymous>
39 blocked kernel throughput 11 <anonymous>
40 blocked kernel throughput 11 <anonymous>
41 blocked kernel throughput 11 <anonymous>
>* 42 running throughput 11 <anonymous>
43 blocked cond wait throughput 11 <anonymous>
44 running throughput 11 <anonymous>
45 blocked cond wait throughput 11 <anonymous>
46 blocked kernel throughput 11 <anonymous>
47 blocked kernel throughput 11 <anonymous>
48 blocked kernel throughput 11 <anonymous>
(ladebug) thread 42
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
>* 42 running throughput 11 <anonymous>
(ladebug) where
>0 0x24b49818 in /usr/shlib/libc.so
#1 0x6f593924 in ComIP_SocketReadMsg() /u/canna/Monaco-L970302/src/com/ip/src/co
mip_skrct.c:404
#2 0x6f596518 in ComIP_InternalSocketReceive() /u/canna/Monaco-L970302/src/com/i
p/src/comip_skrct.c:964
#3 0x6f594a50 in ComIP_SocketReceiveThread() /u/canna/Monaco-L970302/src/com/ip/
src/comip_skrct.c:558
#4 0x246c0d74 in UnknownProcedure12FromFile179(0x3, 0x142efa18, 0x24810b98, 0x24
6c0cb0, 0x143b2f30, 0x0) DebugInformationStrippedFromFile179:???
#5 0x24810b94 in thdBase(0x0, 0x0, 0x248002c0, 0x1, 0x45586732, 0x3) DebugInform
ationStrippedFromFile101:???
(ladebug) thread 44
Thread State Substate Policy Priority Name
------ ---------- --------------- ---------- -------- -------------
> 44 running throughput 11 <anonymous>
(ladebug) where
>0 0x24b49818 in /usr/shlib/libc.so
#1 0x6f593924 in ComIP_SocketReadMsg() /u/canna/Monaco-L970302/src/com/ip/src/co
mip_skrct.c:404
#2 0x6f596518 in ComIP_InternalSocketReceive() /u/canna/Monaco-L970302/src/com/i
p/src/comip_skrct.c:964
#3 0x6f594a50 in ComIP_SocketReceiveThread() /u/canna/Monaco-L970302/src/com/ip/
src/comip_skrct.c:558
#4 0x246c0d74 in UnknownProcedure12FromFile179(0x3, 0x14419a18, 0x24810b98, 0x24
6c0cb0, 0x1437b490, 0x0) DebugInformationStrippedFromFile179:???
#5 0x24810b94 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationSt
rippedFromFile101:???
(ladebug) detach
===============
Code extract
===============
This is the code extract from comip_skrct.c. Line 404 is the line:
Read = SUPOS_SOCKET_READ(sock,BuffP,BuffL);
ERZIENTRY("ComIP_SocketReadMsg", ("(Socket=%i)",sock));
/*
* We are expecting a Rx Header so read until we have all of
* the header in our buffer.
*/
Bytes = 0;
while ((*Continue == TRUE) && (Bytes < COMIP_RX_LEN))
{
BuffP = Buffer + Bytes;
BuffL = COMIP_RX_LEN - Bytes;
ERZMID(COMIP_TINFO,("Reading %lu bytes of Rx Header",BuffL));
Read = SUPOS_SOCKET_READ(sock,BuffP,BuffL);
if (Read > 0)
{
ERZMID(COMIP_TINFO,("%lu bytes of Rx Header read",Read));
Bytes += Read;
}
else
{
if (Read < 0)
{
xerrno = SUPOS_SOCKET_LAST_ERROR;
ERZMID(COMIP_TERROR,("Read error on Rx Header = %d",xerrno));
}
else
{
ERZMID(COMIP_TERROR,("Socket closed reading Rx Header"));
}
*Continue = FALSE;
}
}
This is the #define for SUPOS_SOCKET_READ.
#define SUPOS_SOCKET_READ(Sock,pBuf,pBufLen) read(Sock,pBuf,pBufLen)
ERZMID(...) writes out trace information
| |||||
| 8718.11 | fixed most likely | SMURF::DENHAM | Digital UNIX Kernel | Tue Mar 04 1997 21:17 | 8 |
Yeah, shoulda updated this yesterday. Working with the ss7
folks who reported this and related sigwait problem, I located
the cause of the spinning. You should be seeing EINTR returned
repeatedly in the errno value (if you zero it before the
read call of course).
I have some v4.0b patches ready for you test. If you need
other versions, let me know and I'll cook them up asap.
| |||||
| 8718.12 | SMURF::DENHAM | Digital UNIX Kernel | Tue Mar 04 1997 21:19 | 8 | |
I suppose I should have provided a few details on the nature
of the problem, but I can't bring myself to say more than
it's just another feature of trying to get threads and signals
to work together.
The code causing the problem was ripped out first thing
when I started some v4.0d enhancements. It took me a while
to realize why I never saw the problem on my latest kernels....
| |||||
| 8718.13 | QAR for this problem is QAR51813 | NETRIX::"[email protected]" | Laurent Martin | Wed Mar 05 1997 12:32 | 9 |
Hi, We entered a QAR for this problem that was related to the topic #8821. This the fix to be traced in future Unix versions. Laurent SISB-Telecom [Posted by WWW Notes gateway] | |||||