| Date: 6-MAR-1997 12:54:40.40
From: DEC:.REO.REOVTX::WOOD_J "[email protected]"
Subj: Digital ASAP #21564: defunct (zombie) processes & signals
To: smtp%"[email protected]"
Nick,
I did find mention of a bug in older versions of Digital UNIX whereby,
when the system is short on memory, "ps" incorrectly displays
"<defunct>" for some processes.
However, you're using Digital UNIX v3.2G, which is fairly recent.
Which process is the parent of the defunct processes? To find out, do
something like:
% ps aux | grep defunct
to identify the PID (second column) of defunct processes, then
do:
% ps j -p <PID>
and look at the PPID (parent PID) field in the third column. If your
application is the parent, then I suspect a programming bug;
otherwise it might be an O/S bug. Let me know.
I have done some reading about signals and the circumstances
under which a child process becomes defunct (aka zombie).
Attached is an example program I have been using; it includes
some comments. It may be that your application needs to do
what the do_sigaction() routine does.
Does your application create many child processes? If so,
it is possible that your application is not handling the
termination of the child processes properly. This is
particularly likely if your program was developed on a System V
variant of UNIX and is being ported to Digital UNIX, where the
default signal handling is different. If possible, I would
recommend using the POSIX 1003.1a signal interfaces (e.g.
sigaction(), sigprocmask(), etc.).
I'm not convinced that defunct processes should unduly slow
down your system, because my understanding is that they occupy
very little memory. However, you could be running out of process
slots; but if that were the case I'd expect you to get an error,
e.g. fork() failing with EAGAIN.
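A minimal sketch of checking for that error, assuming only POSIX fork() semantics: fork() returns -1 with errno set to EAGAIN when a system limit on the number of processes is reached, and un-reaped defunct children still count against that limit. The wrapper name checked_fork() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: fork, and say why fork() failed if it did.
   Returns the child's PID in the parent, 0 in the child, -1 on error
   (with errno preserved for the caller). */
pid_t checked_fork(void)
{
    pid_t pid = fork();

    if (pid == -1) {
        int saved_errno = errno;

        if (saved_errno == EAGAIN)
            fprintf(stderr, "fork: process limit reached (EAGAIN); "
                            "check for <defunct> children\n");
        else
            fprintf(stderr, "fork: %s\n", strerror(saved_errno));
        errno = saved_errno;   /* fprintf() may have changed errno */
    }
    return pid;
}
```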
One comment about your signal-handler code, "w_signal()": there
are restrictions on what can safely be done within a signal
handler; see the list of safe (re-entrant) functions in the
"man 4 signal" man page. Note that printf() is not among them!
Hope this helps.
Regards,
John Wood
Software Partner Engineering (UK)
Digital Equipment Co
-----------------------
/*
defunct.c John Wood 6-March-1997
Program to examine behaviour of defunct (zombie) processes.
A defunct process is created when a parent process forks a child,
and the child process exits but the parent has not yet collected its
exit status with wait() or waitpid() (and has not told the system to
discard it, e.g. by ignoring SIGCHLD with SA_NOCLDWAIT).
A defunct process has freed up the program's text & data segments,
and has closed all files, but it still takes up a process table slot,
and a little memory for its exit status.
On Digital UNIX v3.0 and greater, this program will by default create
a number of defunct children, which are only tidied up when the parent
process exits.
Run this program in the background, then use "ps" to see the defunct
(zombie) child processes. E.g.
  cc defunct.c -o defunct.exe    - build .exe
  defunct.exe &                  - run program in background
  ps aux | grep defunct          - should see all the defunct child procs
To prevent the child processes from becoming defunct, compile with DO_SIG
defined so that the do_sigaction() routine is called. E.g.
  cc -DDO_SIG defunct.c -o defunct_dosig.exe
  defunct_dosig.exe &
  ps aux | grep defunct          - won't see any defunct children
Alternatively, compile with DO_WAIT defined so that the program calls
waitpid() to reap the children. E.g.
  cc -DDO_WAIT defunct.c -o defunct_dowait.exe
  defunct_dowait.exe &
  ps aux | grep defunct          - won't see any defunct children
*/
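#include <stdio.h>
#include <stdlib.h>      /* exit() */
#include <string.h>      /* memset() */
#include <signal.h>
#include <errno.h>
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>
#include <unistd.h>      /* fork(), _exit(), sleep() */
#define NUM_CHILDREN 10 /* number of children to fork */
#define PARENT_SLEEP 60 /* sleep time for parent in seconds */
void do_sigaction( void )
{
    struct sigaction action;
    /* On Digital UNIX v3.0 and above, need this code to terminate child */
    /* processes so they don't hang around and clutter up the process */
    /* table as <defunct>. See sigaction(2) man-page, ref. SA_NOCLDWAIT */
    /* Fields are set by name: the member order of struct sigaction is */
    /* not fixed, so a positional initializer is not portable. */
    memset( &action, 0, sizeof action );
    action.sa_handler = SIG_IGN;
    sigemptyset( &action.sa_mask );
    action.sa_flags = SA_NOCLDWAIT;
    printf( "\n\nCalling sigaction() for SIGCHLD with SIG_IGN & SA_NOCLDWAIT\n\n" );
    if (0 != sigaction( SIGCHLD, &action, 0 ))
    {
        perror( "sigaction" );
        exit( 1 );
    }
}
void fork_children( void )
{
    int count;
    pid_t childpid;
    for (count=0; count < NUM_CHILDREN; count++)
    {
        childpid = fork();
        if (childpid == -1)
            perror( "fork" );   /* e.g. EAGAIN: process table full */
        else if (childpid == 0)
            _exit( 0 );  /* child; exit now => defunct zombie. _exit() avoids */
                         /* flushing a copy of the parent's stdio buffers */
        else
            printf( "Child %d process %d created\n", count, (int) childpid );
    }
}
void do_waitpid( void )
{
    pid_t pid;
    int status_locn;
    /* Blocking wait: with WNOHANG, children that had not yet exited */
    /* would be skipped and left to become defunct. Loop ends when */
    /* waitpid() fails with ECHILD (no children left). */
    printf( "\nCalling waitpid() to reap all children\n" );
    while ((pid = waitpid( (pid_t) -1, &status_locn, 0 )) > 0)
    {
        printf( "waitpid() returned <%d>\n", (int) pid );
    }
}
int main( void )
{
#ifdef DO_SIG
    do_sigaction();
#endif
    fork_children();
#ifdef DO_WAIT
    do_waitpid();
#endif
    sleep( PARENT_SLEEP ); /* wait a while so user can see defunct children */
    printf( "parent exiting now\n" );
    return 0;
}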
|
| Date: 6-MAR-1997 15:50:27.79
From: DEC:.REO.REOVTX::WOOD_J "[email protected]"
Subj: Re: Digital ASAP #21564: defunct (zombie) processes & signal
To: SMTP%"[email protected]"
> The processes for which these problems occur are actually
> single-threaded - we don't create child processes. The comments re:
> our signal handling function, w_signal, are interesting. Could the
> problem be a result of our call to printf, or any Sybase dblib calls?
Yes, the problem *could* be a result of printf or Sybase dblib.
It would be better if, for example, your signal handler for Ctrl-C
simply set a global flag indicating that the user wants to exit. Control
would then return from the signal handler to the main-line code, and the
current transaction could be completed. The global flag could then be
examined before starting the next transaction to see whether a graceful
exit should be performed. You would need to declare such a global flag
as volatile (ideally volatile sig_atomic_t) to prevent compiler
optimizations from caching its value. Retro-fitting this to your
existing application probably isn't trivial, nor can I guarantee that
it will resolve your defunct process problems. However, it is somewhere
to start.
John Wood
|