
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9173.0. "select() blocks on TCP/IP socket when data available to read" by EDSCLU::GARROD (IBM Interconnect Engineering) Fri Mar 14 1997 00:00

    I'm getting a strange problem on a particular Digital UNIX system.
    It's running V3.2c or maybe V3.2. I'm not sure which.
    
    Here is the problem. A select() call specifying one socket in readfds
    never returns, i.e. it blocks. The socket is a TCP/IP socket. The problem
    is that there is DEFINITELY data to receive: netstat shows that there
    are 7 bytes in the receive queue.
    
    Coincidentally, in another process a writev() is blocking on a completely
    different TCP/IP socket. For this socket, netstat shows no data in the
    receive queue at the receiving end, and the sending end has no data in
    its send queue.
    
    Does this problem sound familiar to anyone? It only seems to happen on
    one particular machine (I think it is an 8400/5).
    
    I really doubt it is a problem in our use of select(), especially since
    I've got a writev() hanging as well.
    
    If it happens again, perhaps it would help if I could look at some
    internal data structures. What exactly drives the setting of readfds?
    The man page implies something in a TCP object, but I'm not clear what
    this is. Yes, you've got it: UNIX internals aren't my strong point.
    
    A few dbx commands/pointers would sure help me.
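    One user-space check that could be run the next time it hangs, before
    any kernel digging, is to compare how many bytes the kernel reports as
    queued (ioctl() with FIONREAD) with whether a zero-timeout select()
    considers the descriptor readable. This is a minimal sketch, assuming
    FIONREAD is supported on the descriptor (it is on BSD-derived systems);
    probe_readable() is a hypothetical helper name.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/ioctl.h>

/* Report how many bytes are queued on fd (FIONREAD) and whether a
 * zero-timeout select() considers fd readable right now.
 * Returns 0 on success, -1 on error (errno is left set). */
static int probe_readable(int fd, int *queued, int *readable)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* poll only: never block */
    int n;

    /* Bytes currently sitting in the receive buffer. */
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    *queued = n;

    /* What select() thinks of the same descriptor at this moment. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) == -1)
        return -1;
    *readable = FD_ISSET(fd, &rfds) ? 1 : 0;
    return 0;
}
```

    If this logged queued=7 with readable=0 for the stuck socket, that would
    point at the kernel's readability test (for example a raised receive
    low-water mark) rather than at the calling code.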
    
    I attach our macro that does the select(), just in case there is an
    obvious bug in it. The purpose of the macro is to invoke "call" (a
    library routine call passed in as a macro argument), which would
    otherwise block on io_port (an fd number). We do the select()
    beforehand because we don't want our thread blocking inside the call:
    that library is not thread-safe, so the call has to be protected by a
    mutex (see below), and if it blocked inside the call it would stop
    other threads from calling into the same library.
    
    I've tried COMET to see if I can find anything that looks like a
    similar problem but I've drawn a blank.
    
    Thanks,
    
    Dave
    
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/time.h>	/* select(), fd_set (newer systems: <sys/select.h>) */
/* SNA_S_* status codes and cmn_unlock_queue() are application-defined. */

#define WAIT_FOR_COMPL(stat, iosb, call, io_port, mutex, q_mutex, prior)\
{\
    int nfds;\
    fd_set readfds;\
    int select_status;\
\
    if ((q_mutex) != NULL)\
    {\
	cmn_unlock_queue((q_mutex), prior);\
    }\
    if ((io_port) != -1)\
    {\
	nfds = (io_port) + 1;\
	do\
	{\
	    /* select() overwrites readfds, so rebuild it on every pass */\
	    FD_ZERO(&readfds);\
	    FD_SET((io_port), &readfds);\
	    select_status = select(\
				nfds,\
				&readfds,\
				NULL,\
				NULL,\
				NULL\
				);\
	    if (select_status == -1)\
	    {\
		if ((errno == EINTR) || (errno == EAGAIN))\
		    stat = SNA_S_NOACTIVITY;\
		else\
		    stat = SNA_S_FUNCABORT;\
	    }\
	    else\
	    {\
		if (FD_ISSET((io_port), &readfds))\
		{\
		    /* lock only around the non-thread-safe library call */\
		    pthread_mutex_lock(&(mutex));\
		    stat = call;\
		    pthread_mutex_unlock(&(mutex));\
		}\
		else\
		    stat = SNA_S_NOACTIVITY;\
	    }\
	}\
	while (stat == SNA_S_NOACTIVITY);\
    }\
    else\
    {\
	pthread_mutex_lock(&(mutex));\
	stat = call;\
	pthread_mutex_unlock(&(mutex));\
    }\
}
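    select() uses readfds for both input and output: on a successful
    return, only the descriptors that are actually ready are left set. A
    loop that calls select() repeatedly on the same fd_set without
    rebuilding it can therefore end up passing an empty set, and with a
    NULL timeout an empty set means waiting indefinitely. The minimal sketch
    below shows the effect using a zero timeout so it cannot hang; poll_once()
    is a hypothetical helper, and a pipe stands in for the TCP socket.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/* Call select() once with a zero timeout, reusing *set exactly as the
 * caller left it.  Returns 1 if fd is still in *set afterwards, 0 if
 * select() cleared it, or -1 on error. */
static int poll_once(int fd, fd_set *set)
{
    struct timeval tv = { 0, 0 };   /* do not block */

    if (select(fd + 1, set, NULL, NULL, &tv) == -1)
        return -1;
    return FD_ISSET(fd, set) ? 1 : 0;
}
```

    With an empty pipe, a first poll_once() clears the descriptor from the
    set. If a byte is then written and the same set is passed again, the
    byte is missed because the set is now empty; only after FD_ZERO() and
    FD_SET() rebuild the set does the next call report the descriptor as
    readable.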
    
    
    
9173.1. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Fri Mar 14 1997 01:06 (12 lines)
>     Here is the problem. A select() call specifying one socket in readfds
>     never returns ie it blocks. The socket is a TCP/IP socket. The problem
>     is that there is DEFINITELY data to receive, netstat shows that there
>     are 7 bytes in the receive queue.

	If the receive low-water mark (SO_RCVLOWAT) has been raised above
	its default of 1, select(2) will not consider the socket readable
	until at least that many bytes are available to read.

	Before your select(), call getsockopt() with SO_RCVLOWAT to check
	what the low-water mark is set to.
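	A minimal sketch of that check, which could be logged just before
	the select(). rcv_lowat() is a hypothetical helper name; it assumes
	a system with socklen_t (older systems declare the last getsockopt()
	argument as int *).

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return the receive low-water mark of socket fd, or -1 on error.
 * select() reports fd readable only once at least this many bytes are
 * queued (or on end-of-file or a pending error). */
static int rcv_lowat(int fd)
{
    int lowat = 0;
    socklen_t len = sizeof lowat;

    if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, &len) == -1) {
        perror("getsockopt(SO_RCVLOWAT)");
        return -1;
    }
    return lowat;
}
```

	A value of 1 is the POSIX default; anything larger would explain a
	select() that ignores a small amount of queued data.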
9173.2. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Fri Mar 14 1997 01:10 (6 lines)
	BTW, I notice you are using threads. You should make sure the
	application is compiled and linked with the options the threads
	library requires, so that, among other things, errno (which your
	macro tests after select()) is per-thread.

	Also, I only skimmed it once, but why do you hold a lock while
	setting "stat" in some cases and not in others?
9173.3. "More explanation and a request for some dbx commands" by EDSCLU::GARROD (IBM Interconnect Engineering) Fri Mar 14 1997 17:00 (27 lines)
    Re .1 and .2.
    
    We don't set any socket options such as the low-water mark. It also
    works 99% of the time; it is just this one system where we're having
    problems.
    uname -a shows
    
    OSF1 stacey.lkg.dec.com V3.2 62 alpha
    
    I believe that is V3.2G?
    
    Also, as I said, I have a writev() blocked on a TCP/IP socket in
    another process when it shouldn't be.
    
    Please could you give me some dbx commands I can use to examine the
    data structures that would show why select() and writev() are blocked
    when I don't think they should be?
    
    Regarding the question about stat: the lock is not there to protect
    the stat variable. The argument "call" is actually a routine name with
    a bunch of parameters of its own, so "call" expands into a routine
    call, and the lock is taken out around that call. As I said in .0,
    this is because the routine is part of a library that is not
    thread-safe. So the point of this macro is to do the wait (select())
    outside the library and then call the library knowing that the
    routine won't block inside.
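    Written as a function instead of a macro, the same pattern (wait in
    select() without the lock, then make the library call under the mutex)
    might look like the sketch below. lib_call_fn, lib_mutex,
    wait_then_call() and demo_read() are hypothetical names, and the
    SNA_S_NOACTIVITY retry loop of the real macro is left out.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/* Hypothetical signature for the non-thread-safe library routine. */
typedef int (*lib_call_fn)(int fd, void *arg);

/* Serializes every entry into the non-thread-safe library. */
static pthread_mutex_t lib_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Block in select() WITHOUT holding lib_mutex; once fd is readable,
 * make the library call under the mutex so other threads are locked
 * out only for the (hopefully non-blocking) call itself.
 * Returns the call's result, or -1 if select() fails. */
static int wait_then_call(int fd, lib_call_fn call, void *arg)
{
    fd_set readfds;
    int rc;

    for (;;) {
        FD_ZERO(&readfds);          /* rebuild: select() overwrites it */
        FD_SET(fd, &readfds);
        rc = select(fd + 1, &readfds, NULL, NULL, NULL);
        if (rc == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: wait again */
            return -1;
        }
        if (FD_ISSET(fd, &readfds))
            break;
    }

    pthread_mutex_lock(&lib_mutex);
    rc = call(fd, arg);
    pthread_mutex_unlock(&lib_mutex);
    return rc;
}

/* Stand-in library routine: read up to 64 bytes into arg,
 * which must point to a buffer of at least 64 bytes. */
static int demo_read(int fd, void *arg)
{
    return (int)read(fd, arg, 64);
}
```

    The pattern rests on one assumption: nothing else reads the descriptor
    between select() returning and the locked call. If another thread can
    consume the data in that window, the call could still block while
    holding the mutex.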
    
    Dave
9173.4. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Fri Mar 14 1997 17:44 (5 lines)
>     uname -a shows
>     OSF1 stacey.lkg.dec.com V3.2 62 alpha
>     I believe that is V3.2G?

	Even easier: do a "head -2 /etc/motd", which shows the full version string.
9173.5. by COL01::LINNARTZ Mon Mar 17 1997 16:46 (15 lines)
    I don't believe it will help you much in analyzing your problem,
    but here are the dbx commands:

      set $pid=<pid>
      p *thread->stack->uthread->np_uthread
          look for uu_select_event (kern/event.h)
      p *thread->stack->uthread->utask
          look for the ofile array (entry n is file descriptor n, so the
          first three are usually stdin, stdout and stderr)
      p *(struct file *)<address of your file pointer>
          f_data points to the structure that matches f_type; for a
          socket, f_type should be 2, so use
      p *(struct socket *)<address of f_data>
    
    I guess you don't need the inpcb/tcpcb, but the header files show
    the pointers to those as well.
     Pit