
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9418.0. "Socket connections not closed when using getgrgid_r()" by BLAZER::MIKELIS (Software Partner's Eng. MR01-3/F26) Mon Apr 07 1997 17:54

Problem: Digital Unix 4.0A and getgrgid_r and threads

This note just came in from an ISV. However, the ISV cannot reproduce the
problem outside their server. He is concerned that the number of open files may
exceed the system limit and cause problems for their customers. Is this a known
problem? Will these socket connections eventually time out?

Description:

  Their database server starts a thread, calls getgrgid_r() within that
thread, and eventually terminates the thread.

It turns out that getgrgid_r() opens a UDP socket connection, but when running 
under 4.0A, this socket connection is not cleaned up.  Eventually, there are 
literally thousands of these open socket connections hanging around.

  Now I've tried to reproduce this problem outside of our server, but without 
any luck.  I suspect that threads are required to reproduce this problem but 
I didn't try that.

9418.1. by BLAZER::MIKELIS (Software Partner's Eng. MR01-3/F26) Mon Apr 07 1997 20:00
Here is a reproducer:

Compile it on a 3.2 system:

  cxx -g main.cc -D_REENTRANT -o main.x -lc_r -threads 

Run it on a 4.0A system and you should notice it opening
(but not closing) UDP sockets.

I have been using lsof (public domain software) to track
the open files.  What I am seeing on 4.0A is this:

main.x     6576    bobby    4u  inet   0x5cf4ba40        0t0        UDP *:4633
main.x     6576    bobby    5u  inet   0x12308180        0t0        UDP *:4640
main.x     6576    bobby    6u  inet   0x12309200        0t0        UDP *:4645
main.x     6576    bobby    7u  inet   0x12309a40        0t0        UDP *:4651


On our 4.0 system:
These sockets are not released until the process exits.
Each time a new thread is created and calls getgrgid_r,
a new socket is opened and never released.
Eventually, system limits are hit and bad things happen.

On our 3.2 system:
Only ONE socket is opened. It remains open until the process
exits, but that's OK (it's only one file handle per process).

It appears that getgrgid_r is properly recycling/reusing this
file handle in 3.2, but in 4.0 it opens a new socket, somehow
forgetting that it had one open already.


CODE:
------

#include <stdlib.h>
#include <unistd.h>    // sleep()
#include <stream.h>    // pre-standard iostreams header (cout, endl)
#include <grp.h>
#include <pthread.h>

static void* start_routine (void *) {
  gid_t gid;
  group grp;
  char buff[1024];

  cout << "\tThread is running..." << endl;
  grp.gr_name = 0;
  grp.gr_passwd = 0;
  grp.gr_gid = 0;
  grp.gr_mem = 0;

  // Obsolete four-argument form from V3.2 (see .3). On the affected
  // systems, each new thread that makes this call leaves a UDP socket open.
  gid = getgrgid_r(500, &grp, buff, sizeof(buff));
  cout << "\tgetgrgid_r returned = " << gid << endl;

  sleep (5);
  return (0);
}

int main (void) {
   for (int i=0; i < 5; i++) {
     cout << "Starting thread number " << i << endl;

     // pthread_attr_default and pthread_addr_t are DCE (draft 4)
     // thread names, not POSIX 1003.1c ones.
     pthread_t thread;
     pthread_create(&thread, pthread_attr_default, start_routine, 0);

     pthread_addr_t status;
     pthread_join(thread, &status);
     cout << "thread number " << i << " terminated with status "
          << status << endl;
   }
   return 0;
}

9418.2. "there are some anomalies in the supplied code." by SMURF::GAF (Jerry Feldman, Unix Dev. Environment, DTN:381-2970) Tue Apr 08 1997 09:34
     gid = getgrgid_r(500, &grp, buff, sizeof(buff));
    
    The above call to getgrgid_r is missing a parameter.
    The man page for getgrgid_r is:
    int getgrgid_r(
              gid_t gid,
              struct group *grp,
              char *buffer,
              size_t len,
              struct group **result);
    Also, the endgrent_r function should be called before exiting the 
    thread.
    void endgrent_r(
              FILE **gr_fp);
    
    Note that in V4.0, the standard functions are all thread safe.
    
    
9418.3. by BLAZER::MIKELIS (Software Partner's Eng. MR01-3/F26) Wed Apr 09 1997 11:22
I passed on your input and got the following back from my customer:

Yes, this is true for the new 4.0 interface, but according to the man
pages, DEC still supports the 3.2 interface for backward compatibility.
Notice that our code is not a new design.  We are just taking binary
executables built on 3.2 and running them on a 4.0 system.

This is the interface in question:

  [Digital]  The following obsolete functions are supported in order to
             maintain backward compatibility with previous versions of the 
             operating system. You should not use them in new designs.

  int getgrgid_r(
          gid_t gid,
          struct group *grp,
          char *buffer,
          int len);

> 
>     Also, the endgrent_r function should be called before exiting the
>     thread.
> 
>     void endgrent_r(
>               FILE **gr_fp);
> 
>     Note that in V4.0, the standard functions are all thread safe.


If this call must in fact be made, which FILE should be
passed to it?  This file was not opened by the user;
it was opened by the operating system, so it is the operating system's
responsibility to close it, or to provide the handle to it if the user
is expected to close it.
9418.4. "Could be a binary incompatibility" by SMURF::GAF (Jerry Feldman, Unix Dev. Environment, DTN:381-2970) Wed Apr 09 1997 19:03
    I erred when I posted .2, so I marked it hidden. Somehow it became
    unhidden. In any case, the getgrgid_r call always does a setgrent and
    an endgrent.
    
    I have not looked at the V3.2 sources, but in V4.0, changes were made
    to all of libc to eliminate libc_r as a separate library and to
    make all of libc thread-safe. If the customer's code were built
    -non_shared (statically linked), there could be a problem. I would
    suggest recompiling and relinking the code on 4.0 to see whether that
    corrects the problem.
    
9418.5. "getgrgid_r() still a problem" by BLAZER::MIKELIS (Software Partner's Eng. MR01-3/F26) Wed Apr 16 1997 12:45
Here's another response from my ISV. UDP sockets are still
being left open. A test case is included below. Could you
please take a look and see if there is some kind of solution
I can give my ISV? Thanks/james

------------

   As you suggested in a prior email, I am now running
my test program on a 4.0A system using the new POSIX
entry point for getgrgid_r() with the 5-argument signature.

   The problem is present in this version of the OS/RTL
as well.

   Further, the program will hang after about 3950 iterations,
due to having so many UDP sockets open (I suspect).  Each
iteration through the loop creates a thread that calls getgrgid_r,
which opens a UDP socket but never closes it.  Just run lsof
to see how many UDP sockets are open when it's wedged; you'll
see!

   At this point we have no way of working around this problem
and it appears to exist in both the 3.2G and the 4.0A version
of the Digital Unix Operating System.

   As a reference point, we did run this test program on an
SGI system running Irix 6.4, which also supports the latest
pthreads standard.  The program worked there with no problem.

   Can you provide a work-around or a patch for this problem?

Stuck in neutral,
-- bobby --

Here's the code we're running:
------------------------------

#include <stdio.h>
#include <grp.h>
#include <pthread.h>

static void* start_routine (void *) {
  int ret;
  group grp;
  group *grp_ptr;
  char buff[1024];

  fprintf(stderr, "\tThread is running...");
  grp.gr_name = 0;
  grp.gr_passwd = 0;
  grp.gr_gid = 0;
  grp.gr_mem = 0;

  // POSIX 1003.1c five-argument form: returns 0 or an error number.
  ret = getgrgid_r(500, &grp, buff, sizeof(buff), &grp_ptr);
  fprintf(stderr, "\tgetgrgid_r returned = %d\n", ret);

//  sleep (1);
  return (0);
}

int main (void) {
   for (int i=0; i < 5000; i++) {
     fprintf(stderr, "Starting thread number %d\n", i);

     // One short-lived, system-scope thread per iteration, each making a
     // single getgrgid_r call; on the affected systems every pass leaves
     // one more UDP socket open.
     pthread_attr_t attrs;
     pthread_attr_init(&attrs);
     pthread_attr_setscope(&attrs, PTHREAD_SCOPE_SYSTEM);
     pthread_t thread;
     pthread_create(&thread, &attrs, start_routine, 0);
     pthread_attr_destroy(&attrs);

     void* status;
     pthread_join(thread, &status);
     fprintf(stderr, "thread number %d terminated with status %p\n",
	     i, status);
  }

  fprintf(stderr, "End of test program\n");
  return 0;
}
9418.6. "Please file a QAR or CLD." by SMURF::GAF (Jerry Feldman, Unix Dev. Environment, DTN:381-2970) Wed Apr 16 1997 18:23
    You really need to submit either a QAR or CLD. This is the only way
    that the engineering groups can respond or provide a solution.