| Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
| Notice: | Welcome to the Digital UNIX Conference |
| Moderator: | SMURF::DENHAM |
| Created: | Thu Mar 16 1995 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 10068 |
| Total number of notes: | 35879 |
I'm getting a strange problem on a particular Digital UNIX system.
It's running V3.2c or maybe V3.2. I'm not sure which.
Here is the problem. A select() call specifying one socket in readfds
never returns, i.e. it blocks. The socket is a TCP/IP socket. The problem
is that there is DEFINITELY data to receive: netstat shows that there
are 7 bytes in the receive queue.
Coincidentally, in another process a writev() is blocking on a completely
different TCP/IP socket. On this socket, netstat shows no data in the
receive queue at the receiving end and no data in the send queue at the
sending end.
Does this problem sound familiar to anyone? It only seems to happen on
one particular machine (I think it is an 8400/5).
I really doubt it is a problem in our use of select(), especially since
I've got a writev() hanging as well.
If it happens again perhaps it would help if I could go looking at some
internal data structures. What exactly drives the setting of readfds?
The man page implies something in a TCP object. I'm not clear what
this is. Yes, you've got it: UNIX internals aren't my strong point.
A few dbx commands/pointers would sure help me.
I attach our macro that does the select(), just in case there is any
obvious bug in it. The basic point of this macro is to call a routine
called "call" that blocks on this io_port (an fd number). We do the
select() beforehand because we don't want our thread blocking inside
the call. That library is not thread-safe, so the call has to be
protected by a mutex (see below), and if it blocked inside the call it
would stop other threads from calling into the same library.
I've tried COMET to see if I can find anything that looks like a
similar problem but I've drawn a blank.
Thanks,
Dave
#define WAIT_FOR_COMPL(stat, iosb, call, io_port, mutex, q_mutex, prior)\
{\
int nfds;\
fd_set readfds;\
int select_status;\
\
if ((q_mutex) != NULL)\
{\
cmn_unlock_queue((q_mutex), prior);\
}\
if (io_port != -1)\
{\
nfds = io_port + 1;\
FD_ZERO(&readfds);\
FD_SET(io_port, &readfds);\
do\
{\
select_status = select(\
nfds,\
&readfds,\
NULL,\
NULL,\
NULL\
);\
if (select_status == -1)\
{\
if ((errno == EINTR) || (errno == EAGAIN))\
stat = SNA_S_NOACTIVITY;\
else\
stat = SNA_S_FUNCABORT;\
}\
else\
{\
if (FD_ISSET(io_port, &readfds))\
{\
pthread_mutex_lock(&mutex);\
stat = call;\
pthread_mutex_unlock(&mutex);\
}\
else\
stat = SNA_S_NOACTIVITY;\
}\
}\
while (stat == SNA_S_NOACTIVITY);\
}\
else\
{\
pthread_mutex_lock(&mutex);\
stat = call;\
pthread_mutex_unlock(&mutex);\
}\
}
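A defensive variation on the wait loop in the macro above can be sketched as a plain function. This is only a sketch, not a diagnosis of the hang: `wait_readable` is a hypothetical name, and the one behavioural difference from the macro is that the `fd_set` is rebuilt before every `select()` call, since `select()` overwrites the set when it returns.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the original macro.
 * Blocks until fd is readable. Returns 0 when readable, or -1 on a
 * select() error other than EINTR (errno is left set by select()). */
static int wait_readable(int fd)
{
    fd_set readfds;
    int rc;

    /* select() can only handle descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        /* Rebuild the set on every pass: select() modifies it in place. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        rc = select(fd + 1, &readfds, NULL, NULL, NULL);
        if (rc > 0 && FD_ISSET(fd, &readfds))
            return 0;
        if (rc == -1 && errno != EINTR)
            return -1;
        /* Interrupted by a signal, or fd not reported: wait again. */
    }
}
```

The caller would then take the mutex and invoke the library routine only after `wait_readable()` returns 0, which mirrors what the macro does around "call".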
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 9173.1 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Fri Mar 14 1997 01:06 | 12 |
> Here is the problem. A select() call specifying one socket in readfds
> never returns, i.e. it blocks. The socket is a TCP/IP socket. The problem
> is that there is DEFINITELY data to receive: netstat shows that there
> are 7 bytes in the receive queue.

If the receive low-water mark was set to something other than 0, then
select(2) will not consider the socket readable until at least that
many bytes are available to read. Before your select(), do a
getsockopt() to check what the low-water mark is set to.
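The check suggested above can be sketched roughly as follows. `get_rcvlowat` is a hypothetical helper name; `SO_RCVLOWAT` at level `SOL_SOCKET` is the standard socket option for the receive low-water mark. On older systems the length argument may be declared `int` rather than `socklen_t`.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the receive low-water mark for sock,
 * or -1 if getsockopt() fails. */
static int get_rcvlowat(int sock)
{
    int lowat = 0;
    socklen_t len = sizeof(lowat);  /* may need to be int on older systems */

    if (getsockopt(sock, SOL_SOCKET, SO_RCVLOWAT, &lowat, &len) == -1) {
        perror("getsockopt(SO_RCVLOWAT)");
        return -1;
    }
    return lowat;
}
```

If the value returned is larger than the number of bytes netstat reports in the receive queue (7 in this case), select() would be expected to keep blocking.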
| 9173.2 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Fri Mar 14 1997 01:10 | 6 |
BTW, I notice you are using threads; you should make sure the
application is compiled and linked properly. Also, I only skimmed it
once, but I don't understand how come in some cases you hold a lock
while you set "stat", and in some cases you do not. Why?
| 9173.3 | More explanation and a request for some dbx commands | EDSCLU::GARROD | IBM Interconnect Engineering | Fri Mar 14 1997 17:00 | 27 |
Re .1 and .2.
We don't set any socket options like low water mark. It also works 99%
of the time. It is just this one system where we're having problems.
uname -a shows
OSF1 stacey.lkg.dec.com V3.2 62 alpha
I believe that is V3.2G?
Also, as I said, I have a writev() on a TCP/IP socket blocked in another
process that shouldn't be.
Please could you give me some dbx commands I can use to show some
interesting data structures around why select() is blocked and writev()
is blocked when I don't think they should be.
Regarding the question on stat: the lock is not to protect the stat
variable. The argument "call" is actually a routine name with a bunch
of parameters of its own, so "call" expands into a routine call. The
lock is taken out around this routine call. As I said in .0, the reason
for this is that this routine call is part of a library that is
not thread-safe. So the point of this macro is to do the wait (select())
outside the library and then call the library knowing that the routine
call won't block inside the routine.
Dave
| 9173.4 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Fri Mar 14 1997 17:44 | 5 |
> uname -a shows
> OSF1 stacey.lkg.dec.com V3.2 62 alpha
> I believe that is V3.2G?

Even easier, do a "head -2 /etc/motd" ...
| 9173.5 | | COL01::LINNARTZ | | Mon Mar 17 1997 16:46 | 15 |
I don't believe it will help you much in analyzing your problem,
but here are the dbx commands:
set $pid=<pid>
p *thread->stack->uthread->np_uthread
look for uu_select_event (kern/event.h)
p *thread->stack->uthread->utask
look for the ofile array (first three are usual stdin,...)
p *(struct file *)<address of your file pointer>
depending on the f_type the f_data points to the needed structure,
as you need socket (type should be 2) use
p *(struct socket *)<address of f_data>
I guess you don't need the inpcb/tcpcb, but the header files mention
the pointers too
Pit