
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

8772.0. "How to suspend a process?" by APACHE::CHAMBERS () Mon Feb 10 1997 11:27

Does Dunix have a way to suspend a process and then let it continue, without
otherwise interfering with its behavior?  I know that with most varieties of
Unix, the answer is a clear "No", but there are clues in dbx that this might
actually be possible with OSF.

Note that I'm not talking about how to use dbx to control another process. 
The question is:  Given a process P that is running, is there a way to tell
the kernel "Don't give P any cpu time, but don't otherwise do anything to
it"?  Then, some time later, I'd like to say "OK, process P can be allowed
to use the cpu now."  This shouldn't send P any signals or do anything else
that might affect its behavior; I'd just like to make it not get any cpu
cycles for a period of time.

It's possible that this might be documented somewhere that I haven't managed
to find (possibly because I didn't guess the right keywords)?
8772.1. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Mon Feb 10 1997 11:33
> Does Dunix have a way to suspend a process and then let it continue, without
> otherwise interfering with its behavior?  I know that with most varieties of
> Unix, the answer is a clear "No", but there are clues in dbx that this might
> actually be possible with OSF.

	The answer is not a clear "no" on most varieties of UNIX.  Just
	the opposite ....

> It's possible that this might be documented somewhere that I haven't managed
> to find (possibly because I didn't guess the right keywords)?

	% man signal
	% man 2 kill

	The signals you want to send are SIGSTOP and SIGCONT.
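	The same thing can be done from a program with kill(2).  A minimal
	sketch, assuming a POSIX system; suspend_for is a hypothetical
	helper name, not an existing library call:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stop process `pid`, keep it stopped for `seconds`,
 * then let it run again.  SIGSTOP can never be caught or ignored, so the
 * target always stops; SIGCONT always resumes it, but a target that has
 * installed its own SIGCONT handler will also see that handler run.
 * Returns 0 on success, or -1 with errno set by kill(2). */
int suspend_for(pid_t pid, unsigned int seconds)
{
    if (kill(pid, SIGSTOP) == -1)
        return -1;
    sleep(seconds);                  /* the target gets no CPU time meanwhile */
    return kill(pid, SIGCONT);
}
```

	For example, suspend_for(1234, 60) would keep pid 1234 stopped for
	a minute; "kill -STOP 1234" and "kill -CONT 1234" do the same by
	hand from the shell.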
8772.2. "a similar alternative..." by QUARRY::petert (rigidly defined areas of doubt and uncertainty) Mon Feb 10 1997 11:59
Under program control, you could attach to the process in /proc, issue an
ioctl PIOCSTOP, and later a PIOCRUN.  This will do the same type of thing
without any signals involved, though I think SIGSTOP and SIGCONT will not
affect the output or behavior of the program either.

PeterT
8772.3. "Yeah, I already knew about SIGSTOP and SIGCONT ..." by APACHE::CHAMBERS () Mon Feb 10 1997 12:56
... but they clearly violate the part of the original question that said
that the scheme should have no effect on the program's behavior.

Even if the program doesn't have signal handlers for SIGSTOP and SIGCONT,
it's still fairly normal that SIGCONT causes a system call to return -1 
with errno=EINTR.  And lots of programs have signal handlers for SIGCONT.  
I could show you a number of them in the programs that I'm working on.

This is a *long* way from having "no effect" on a program's behavior.

BTW, I also noticed a curious inconsistency in the "man 2 sigaction" page.
At one point it says "The signal parameter can be any one of the signal 
values defined in the signal.h header file, except SIGKILL."  This clearly
implies that signal handlers for SIGSTOP and SIGCONT are accepted.

But later, in the ERRORS section, we read:
  [EINVAL]  An attempt was made to ignore or supply a handler for the
        SIGKILL, SIGSTOP, and SIGCONT signals.
This seems to just as clearly imply that attempts to install a handler for
these three signals will fail.

As a typical (i.e., paranoid ;-) C programmer, I'd conclude that I can't
rely on the (il)legality of SIGSTOP and SIGCONT handlers; code that uses
them will possibly work on some releases of the kernel and not on others.
It might even be inconsistent on a single system, depending on which part
of this man page the kernel code is reading when the sigaction() call 
is made by the application.

(From my years of experience trying to write portable code, I could easily
construe this as following the "industry standards" for Unix. ;-)


I did a quick test on a (3.2G I think) system that I have handy.  I added
code to install a SIGCONT signal handler, via signal(SIGCONT,sigcont).  It
returned 0, and errno didn't change.  The sigcont routine just wrote a message
to the program's logfile saying that it was called, and returned.  I ran the
program, which is a background daemon, and used "kill -CONT <pid>".  The log
showed the signal handler's message.  It then showed that the program got a
-1 return from an accept() call, with errno=EINTR, and since it hadn't been
programmed to understand this, it just said "Can't accept any more connections"
and died.  So it appears that 1) SIGCONT handlers are in fact allowed, at
least on 3.2G, and 2) sending a SIGCONT can have a disastrous effect on at
least some processes.


This is a *very* long way from "no effect" on a process's behavior.

(Please, no flames about how the program wasn't written correctly.  I'm talking
about suspending a program that was already written, usually by someone else,
and for which I may not have the source.  The above test seems like pretty
convincing proof that anything involving SIGCONT is *not* the answer.)

8772.4. "PIOCSTOP sounds interesting ..." by APACHE::CHAMBERS () Mon Feb 10 1997 13:22
| Under program control, you could attach to the process in /proc, issue an
| ioctl PIOCSTOP, and later a PIOCRUN. 

This sounds interesting.  A quick `find /usr/man/ -print | xargs grep PIOCSTOP`
turned up proc.4 as the only man page that mentions this symbol.  I wonder if
there are any tools lying about that can already use them to start/stop a
process, or if I'd have to develop my own tool.  The latter is somewhat daunting,
of course, as I'm not sure that I could justify the probably several-month task
that it could easily entail, judging from past experiences twiddling /proc files.

It does seem like dbx might be able to do the job somehow, though a few brief
tests convinced me that you'd have to know *exactly* what you're doing, which
could also be a multi-month project.  All of my attempts to stop/restart a
background daemon from dbx had very noticeable effects on its behavior, up
to and including death from mysterious causes when I try to get dbx to exit.
Clearly, dbx isn't for dummies ...
8772.5. by SMURF::DENHAM (Digital UNIX Kernel) Mon Feb 10 1997 14:02
    Yes, you're allowed to catch SIGCONT. What POSIX says about this is
    that *regardless* of the action specified for SIGCONT, receipt of
    the signal *must* cause all pending stop signals to be discarded
    and the process to be removed from the stopped state. If you're
    not catching SIGCONT, then it is indeed a no-op. Catching SIGCONT
    of course also gives all those caught-signal semantics, as you've
    found. In earlier versions (way back) it *was* illegal to catch
    SIGCONT. But that restriction was removed once the signal was
    required always to continue a stopped process.
    
    You can never catch SIGSTOP.
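    Which signals accept a handler is easy to check on a given system.
    A minimal sketch, assuming a POSIX sigaction(2); try_catch is a
    hypothetical helper that reports whether a handler was accepted and
    puts the old action back afterwards:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>

static void handler(int sig) { (void)sig; }   /* never expected to run */

/* Try to install a handler for `sig`.  Returns 0 if sigaction(2)
 * accepted it, otherwise the errno value it failed with (EINVAL is
 * what POSIX specifies for SIGKILL and SIGSTOP).  On success the
 * previous disposition is restored before returning. */
int try_catch(int sig)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, &old) == -1)
        return errno;
    sigaction(sig, &old, NULL);               /* put the original action back */
    return 0;
}
```

    On a system that follows POSIX, try_catch(SIGCONT) should return 0
    while try_catch(SIGSTOP) and try_catch(SIGKILL) return EINVAL, which
    settles the sigaction man page question from .3 for that system.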
    
    Another completely nonportable approach is to use processor sets.
    Create a processor set with no processors in it.
    
    # pset_create
    pset_id = 2
    # pset_assign_pid 2 PID-TO-SUSPEND
    
    The process and all its threads are now in limbo. To continue them
    with no side effects:
    
    # pset_assign_pid 0 PID-TO-CONTINUE
    
    You can see the available psets with the pset_info command.
    There's always a pset 0, the default_pset, when a machine boots
    up.
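    To drive this from a program rather than the shell, a sketch that
    simply wraps the two commands might look like the following.  It
    assumes pset_create prints a line of the form "pset_id = N" (as in
    the example here), that it runs as root, and that it only uses a set
    it created itself; parse_pset_id, create_empty_pset and assign_pid
    are hypothetical names, not part of any Digital UNIX API:

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse a "pset_id = N" line as printed by pset_create (format assumed
 * from the example in this note).  Returns N, or -1 if no match. */
int parse_pset_id(const char *line)
{
    int id;

    return sscanf(line, " pset_id = %d", &id) == 1 ? id : -1;
}

/* Run pset_create (root only) and return the id of the new, empty
 * processor set, or -1 if its output could not be parsed. */
int create_empty_pset(void)
{
    char line[128];
    int id = -1;
    FILE *p = popen("pset_create", "r");

    if (p == NULL)
        return -1;
    while (id == -1 && fgets(line, sizeof line, p) != NULL)
        id = parse_pset_id(line);
    pclose(p);
    return id;
}

/* Move `pid` into processor set `pset`: an empty set suspends it, and
 * pset 0 (the default_pset) lets it run again.  Returns system(3)'s
 * result. */
int assign_pid(int pset, pid_t pid)
{
    char cmd[64];

    snprintf(cmd, sizeof cmd, "pset_assign_pid %d %ld", pset, (long)pid);
    return system(cmd);
}
```

    Suspending would be assign_pid(id, pid) with an id from
    create_empty_pset() (after checking it isn't -1), and resuming
    assign_pid(0, pid).  Parsing command output is fragile, so for
    anything permanent a programming interface is probably the better
    route.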
    
8772.6. "Why do you want/need to suspend(stop) and then continue a process?" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Mon Feb 10 1997 14:31
> ... but they clearly violate the part of the original question that said
> that the scheme should have no effect on the program's behavior.

	"clearly"?  Sorry, but you should have provided more info in your
	base note, which is the one that obviously wasn't clearly written :-)

> Even if the program doesn't have signal handlers for SIGSTOP and SIGCONT,
> it's still fairly normal that SIGCONT causes a system call to return -1 
> with errno=EINTR.

	No, it's not normal for an *un*-handled signal to cause an
	interruptible system call to be interrupted.  Normal behaviour
	is that an interruptible system call can only be interrupted
	by a signal handler being invoked.
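	This claim can be tested without relying on any particular
	application.  A minimal experiment, assuming a POSIX system: a child
	blocks in read(2) on a pipe, is stopped and continued, and then
	receives a byte.  stop_cont_read_test and its exit codes are
	hypothetical, and the short sleeps only make the timing likely
	rather than guaranteed:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void on_sigcont(int sig) { (void)sig; }   /* catch it, do nothing */

static void pause_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Fork a child that blocks in read(2), stop it with SIGSTOP, resume it
 * with SIGCONT, then send it one byte.  If catch_cont is nonzero the
 * child first installs a SIGCONT handler without SA_RESTART.  Returns
 * the child's exit code: 0 if read got the byte, 2 if read failed with
 * EINTR, 3 otherwise; -1 on a setup error. */
int stop_cont_read_test(int catch_cont)
{
    int data[2], ready[2], status;
    char c;
    pid_t pid;

    if (pipe(data) == -1 || pipe(ready) == -1)
        return -1;
    if ((pid = fork()) == -1)
        return -1;
    if (pid == 0) {                           /* child */
        ssize_t n;
        if (catch_cont) {
            struct sigaction sa;
            memset(&sa, 0, sizeof sa);
            sa.sa_handler = on_sigcont;       /* sa_flags == 0: no SA_RESTART */
            sigemptyset(&sa.sa_mask);
            sigaction(SIGCONT, &sa, NULL);
        }
        write(ready[1], "r", 1);              /* about to block in read */
        n = read(data[0], &c, 1);
        _exit(n == 1 ? 0 : (n == -1 && errno == EINTR) ? 2 : 3);
    }
    read(ready[0], &c, 1);                    /* wait for the child */
    pause_ms(200);                            /* let it enter read(2) */
    kill(pid, SIGSTOP);
    waitpid(pid, &status, WUNTRACED);         /* until it is really stopped */
    kill(pid, SIGCONT);
    pause_ms(200);                            /* let any SIGCONT handler run */
    write(data[1], "x", 1);                   /* data[0] still open: no SIGPIPE */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    close(data[0]); close(data[1]);
    close(ready[0]); close(ready[1]);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

	Where things behave as described here, stop_cont_read_test(0)
	should return 0 (the read simply carries on) and
	stop_cont_read_test(1) should return 2 (the caught SIGCONT
	interrupts the read with EINTR), the same failure seen with accept()
	in .3.  Some systems document exceptions for particular calls:
	Linux's signal(7), for example, lists a few interfaces such as
	epoll_wait that can fail with EINTR after a stop and continue even
	without a handler, though a plain pipe read is not among them.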

> And lots of programs have signal handlers for SIGCONT.

	"lots" is a very vague term (and it's certainly unclear what
	you are trying to say).  I'll stick my neck out and say *most*
	(i.e. a majority, i.e. over 50%, and probably more like 95+%)
	of programs do *not* have a signal handler for SIGCONT (or
	any signal handler for that matter).  The signals UNIX
	applications more commonly catch are SIGCHLD & SIGPIPE.

> This is a *long* way from having "no effect" on a program's behavior.

	Another vague word: "long".  It's not a long way at all.  As
	I already contended, most applications don't set up signal handlers
	for SIGCONT.  As such, SIGSTOP/SIGCONT will have *no* effect on the
	*majority* of applications.

	Do note that regardless of how you stop an application, there is
	a small subset of applications whose behaviour it will change,
	simply because you've changed the timing.  For example, a
	STREAM socket application reading from a socket where the sender
	is sending in small chunks will usually see its read/recv calls
	complete many times, returning a small chunk each time (unless
	it sets a low water mark).  However, if you suspend the
	application while the data is being received, then when you
	resume it, its read/recv will return with more data in the
	buffer than it would have otherwise.  This can make a difference
	if you are trying to debug an application with bugs in this area.
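	The timing effect can be simulated without stopping anything: if
	nobody reads between three small sends on a stream socket, the data
	piles up and one later read can return all of it.  A minimal sketch,
	assuming a POSIX socketpair(2); read_after_pause is a hypothetical
	name:

```c
#define _XOPEN_SOURCE 700
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Simulates a reader that was suspended while the sender wrote three
 * small chunks to a stream socket: since nothing reads in between, the
 * chunks accumulate in the socket buffer and a single read(2) afterwards
 * may return all of them.  Stores that read's data, NUL-terminated, in
 * `out` (outlen must be at least 2) and returns the byte count, or -1. */
ssize_t read_after_pause(char *out, size_t outlen)
{
    int sv[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    write(sv[0], "ab", 2);            /* three small sends ...            */
    write(sv[0], "cd", 2);
    write(sv[0], "ef", 2);
    n = read(sv[1], out, outlen - 1); /* ... and one read once "resumed"  */
    if (n >= 0)
        out[n] = '\0';
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

	On common implementations the single read returns all six bytes,
	"abcdef", where a reader that kept up would typically have seen
	three 2-byte reads.  POSIX makes no promise about how stream data is
	split across reads, which is why code that assumes one read per send
	breaks under this kind of timing change.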

	So saying you want "no effect" really depends on what type of
	application you have and what it's doing, and for what purpose
	you are suspending the application (which you never stated in .0).

> BTW, I also noticed a curious inconsistency in the "man 2 sigaction" page.
> At one point it says "The signal parameter can be any one of the signal 
> values defined in the signal.h header file, except SIGKILL."  This clearly
> implies that signal handlers for SIGSTOP and SIGCONT are accepted.

	QAR it.  FWIW, historically SIGSTOP and SIGKILL are the only two
	signals that can never be caught or ignored.  This is the first
	I've heard of SIGCONT being a third.

	BTW, if you're reading the sigaction man page you should have come
	across the SA_RESTART flag.  It's not portable, but if you're playing
	with signals then your application is non-portable to begin with
	(portability is another vague term: even if you code to ANSI C
	or ANSI C++, you'll find that truly portable programs (ones that
	don't require some system-specific #ifdef or equivalent) are rarer
	than not).
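	A sketch of what SA_RESTART changes, assuming a POSIX system: the
	process blocks reading an empty pipe, SIGALRM arrives a second
	later, and its handler writes a byte into the pipe.
	read_interrupted_by_alarm is a hypothetical helper:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int fds[2];                   /* pipe: the handler writes, we read */

static void on_alarm(int sig)
{
    (void)sig;
    write(fds[1], "x", 1);           /* async-signal-safe; makes data available */
}

/* Block in read(2) on an empty pipe until SIGALRM fires one second
 * later.  The SIGALRM handler is installed with `sa_flags` (0 or
 * SA_RESTART).  Returns read's result; on -1, errno says why. */
ssize_t read_interrupted_by_alarm(int sa_flags)
{
    struct sigaction sa;
    char c;
    ssize_t n;
    int saved_errno;

    if (pipe(fds) == -1)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;
    sigaction(SIGALRM, &sa, NULL);

    alarm(1);
    n = read(fds[0], &c, 1);         /* SA_RESTART: restarted, returns 1 */
    saved_errno = errno;             /* otherwise: -1 with errno EINTR   */
    close(fds[0]);
    close(fds[1]);
    errno = saved_errno;
    return n;
}
```

	With SA_RESTART the interrupted read(2) is restarted and returns the
	byte (1); with sa_flags of 0 it fails with -1 and errno EINTR.  A
	handler installed with SA_RESTART would normally have let the accept()
	in .3 restart as well, though some systems exclude particular calls
	from restarting.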

> I did a quick test on a (3.2G I think) system that I have handy.  I added
> code to install a SIGCONT signal handler, via signal(SIGCONT,sigcont).  It
> returned 0, and errno didn't change.  The sigcont routine just wrote a message
> to the program's logfile saying that it was called, and returned.  I ran the
> program, which is a background daemon, and used "kill -CONT <pid>".  The log
> showed the signal handler's message.  It then showed that the program got a
> -1 return from an accept() call, with errno=EINTR, and since it hadn't been
> programmed to understand this, it just said "Can't accept any more connections"
> and died.  So it appears that 1) SIGCONT handlers are in fact allowed, at
> least on 3.2G, and 2) sending a SIGCONT can have a disastrous effect on at
> least some processes.

	Anything can have a disastrous effect on a poorly written program
	such as the one you described.  A program that explicitly uses signal
	handlers and is not prepared to handle accept(2) returning with
	an errno of EINTR is broken.

> This is a *very* long way from "no effect" on a process's behavior.

	You cut & paste very well :-)  I'll instead point you to my
	response to your first utterance of this sentence :-)

> (Please, no flames about how the program wasn't written correctly. I'm talking
> about suspending a program that was already written, usually by someone else,
> and for which I may not have the source.  The above test seems like pretty
> convincing proof that anything involving SIGCONT is *not* the answer.)

	Convincing?  Just the opposite.  A more *realistic* test is to
	*not* set up a signal handler for SIGCONT.  Your contention at
	the start of this note is that sending SIGSTOP or SIGCONT to a
	process will cause a suspended & interruptible syscall to be
	interrupted, even if the action for these signals in the process
	is SIG_DFL or SIG_IGN (i.e. even if the process is *not* handling
	these signals).  But your so-called "test" *does* set up a signal
	handler for these signals.

	Try again :-)
8772.7. by APACHE::CHAMBERS () Mon Feb 10 1997 17:26
| >    ... I'm talking
| > about suspending a program that was already written, usually by someone else,
| > and for which I may not have the source.  The above test seems like pretty
| > convincing proof that anything involving SIGCONT is *not* the answer.)
| 
| 	Convincing?  Just the opposite.  A more *realistic* test is to
|	*not* set up a signal handler for SIGCONT.  Your contention at
|	the start of this note is that sending SIGSTOP or SIGCONT to a
|	process will cause a suspended & interruptible syscall to be
|	interrupted, even if the action for these signals in the process
|	is SIG_DFL or SIG_IGN (i.e. even if the process is *not* handling
|	these signals).  But your so-called "test" *does* set up a signal
|	handler for these signals.

Well, actually, I didn't contend any such things.  What I've concluded is that,
given a process for which I don't have source (and thus can't add or subtract
signal handlers), I can't generally expect that SIGCONT will cause the process
to continue.  In some very real cases, it causes them to die.  

This is the situation with most Unix-like systems, and so the SIGSTOP/SIGCONT
mechanism can't be used to "stop and restart a process".  It appears this is
true of OSF, also, though perhaps there's a different mechanism that works.

I don't think I've been at all ambiguous about what I'm after.  I've worked
on a lot of time-sharing systems in the past, and all of them but Unix have
had commands to suspend and unsuspend a process.  This is not a complicated
idea.  On Unix-like systems, when you ask about it, you get bogged down in
discussions like this one ....



The idea of an empty processor set is an incredibly cute one.  Now to see if
it's possible to write a script that 1) discovers whether there's a processor
set without a processor, 2) creates one if none exists, and 3) moves a process
to or from this set.  Looking at the output of pset_info, the thought of doing
it in a sh script is rather horrible, but perl can probably handle it without
a whole lotta grief.  Let's see ....
8772.8. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Mon Feb 10 1997 21:03
> Well, actually, I didn't contend any such things.

	Reread your own words in .3 where you said:

	    "Even if the program doesn't have signal handlers for SIGSTOP
	    and SIGCONT, it's still fairly normal that SIGCONT causes a
	    system call to return -1 with errno=EINTR."

	in my book and in Webster's that's what you contended :-)

con.tend \k*n-'tend\ vb [MF or L; MF contendre, fr. L contendere, fr. com- +
   tendere to stretch] 1: to strive or vie in contest or rivalry or against
   difficulties 2: to strive in debate : ARGUE 1: MAINTAIN, ASSERT 2: to
   struggle for - con.tend.er n
	
> What I've concluded is that,
> given a process for which I don't have source (and thus can't add or subtract
> signal handlers), .....

	You still haven't said why you want to be able to suspend/resume
	a process in the first place.  Once someone gives some real context
	to what they are trying to do, it's not uncommon for others to offer
	better ways to skin the cat.

	BTW, FWIW, you are correct that in general on UNIX you can't add
	or remove signal handlers for an already running process.  However,
	you can start up a process so that it ignores certain signals,
	assuming the process doesn't then explicitly change the sigaction
	for those signals after it's started up.  Then there are also clever
	hacks you can do with the aid of LD_LIBRARY_PATH, but I wouldn't
	recommend that for you :-)
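	A sketch of starting a program with a signal already ignored,
	assuming a POSIX system; run_ignoring is a hypothetical launcher.
	It relies on the rule that an ignored disposition survives exec(2),
	while caught signals are reset to the default:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical launcher: run argv[0] (searched on PATH) with `sig` set
 * to SIG_IGN.  The ignored disposition is inherited across fork(2) and
 * kept across exec(2), so the program starts out ignoring `sig` unless
 * it changes that itself.  Waits for the program and returns its wait
 * status, or -1 on error. */
int run_ignoring(int sig, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(sig, SIG_IGN);
        execvp(argv[0], argv);
        _exit(127);                  /* exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

	For example, running sh -c 'kill -s USR1 $$; exit 7' through
	run_ignoring(SIGUSR1, ...) lets the shell exit with status 7 instead
	of dying from SIGUSR1.  Ignoring SIGCONT this way would not stop
	SIGCONT from resuming a stopped process, since POSIX requires that
	even when it is ignored, and it does nothing about a program that
	installs its own handler.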

> I can't generally expect that SIGCONT will cause the process
> to continue.  In some very real cases, it causes them to die.  

	Yes, as you've illustrated, in the very *rare* case of an application
	that both sets up a signal handler for SIGCONT *and* is so broken
	to begin with that it has code that expects a SIGCONT signal to
	arrive, yet cannot handle the case when that signal arrives.

	Again, please provide us with some context of what your real goal
	is in stopping and starting processes w/out their cooperation.

> I don't think I've been at all ambiguous about what I'm after.

	Think again :-)

> I've worked
> on a lot of time-sharing systems in the past, and all of them but Unix have
> had commands to suspend and unsuspend a process.  This is not a complicated
> idea.  On Unix-like systems, when you ask about it, you get bogged down in
> discussions like this one ....

	There we go with words like "lot" again, which are completely
	meaningless because they are vague and come with no examples.
	I doubt that the system or systems you refer to really

	    "... have a way to suspend a process and then let it continue,
	    without otherwise interfering with its behavior?"

	since, as I've already illustrated, no system can meet the
	standard you claim UNIX cannot achieve.  There will always be
	some applications for which suspending and then continuing will
	indeed interfere with their behaviour, and depending on the
	application, can cause it to, as you say, "die".

> The idea of an empty processor set is an incredibly cute one.

	Finally we agree on something :-)  Yes it is cute.  But I am
	confused, as I got the impression you indicated you wanted
	something which was "portable" to multiple UNIX platforms,
	and neither /proc nor processor sets are very portable in
	concept, never mind in implementation.

	Do note that using processor sets requires root privs (not
	only to create a new pset, but also to assign a process to
	a pset), while /proc and SIGSTOP don't if you are the owner.

> Now to see if
> it's possible to write a script that 1) discovers whether there's a processor
> set without a processor, 2) creates one if none exists; ....

	Now I can see why you are afraid of broken applications :-)
	For test purposes the above is fine, but I would advise against
	that logic in a production system.  You should not assign a
	process to a non-default pset which you did not create.  While
	you may find a pset with no cpus in it at the time you check,
	unless you know who created the pset and how they use it, its
	creator could assign one or more processors to it after you've
	found it empty, or could even destroy the pset while you are
	using it.

> Looking at the output of pset_info, the thought of doing
> it in a sh script is rather horrible, but perl can probably handle it without
> a whole lotta grief.  Let's see ....

	It also appears there is a programming API to this functionality
	and the API is rather simple.
8772.9. "A small, so far untested, example..." by QUARRY::petert (rigidly defined areas of doubt and uncertainty) Tue Feb 11 1997 13:57
> This sounds interesting.  A quick `find /usr/man/ -print | xargs grep PIOCSTOP`
> turned up proc.4 as the only man page that mentions this symbol.  I wonder if
> there are any tools lying about that can already use them to start/stop a
> process, or if I'd have to develop my own tool.  The latter is somewhat daunting,
> of course, as I'm not sure that I could justify the probably several-month task
> that it could easily entail, judging from past experiences twiddling /proc files.


Yeah, it's all described in the proc man pages.  Several months?  Well, if you
want to get fancy, I suppose.  It took a few months to get dbx to cut over 
to using the /proc interface, and I still occasionally run into problems,
but to just stop and start:

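#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>		/* atoi, exit */
#include <string.h>		/* memset */
#include <unistd.h>		/* sleep, close */
#include <fcntl.h>		/* open, O_RDWR */
#include <sys/ioctl.h>		/* ioctl */
#include <sys/signal.h>
#include <sys/fault.h>
#include <sys/syscall.h>
#include <sys/procfs.h>		/* PIOCSTOP, PIOCRUN, struct prstatus, struct prrun */

int
main (int argc, char **argv)
{
	char  procname[32];
	int fd;
	struct prstatus prstat;	/* PIOCSTOP fills this in with the stopped state */
	struct prrun prn;	/* flags telling PIOCRUN how to restart it */
	int timetosleep;

/* One attaches to a running process by opening the proc id in /proc.  */
/* take as input a process id, and perhaps a time you want to suspend  */
/* it for.                                                             */


	if (argc < 2) {
	  fprintf(stderr, "usage: %s pid [seconds]\n", argv[0]);
	  exit (1);
	}
	/* argv[1] is a string: use it as typed (see proc(4) for the exact */
	/* file name format); %.20s keeps it within procname.              */
	sprintf(procname, "/proc/%.20s", argv[1]);
	/* get number of seconds here, or pick a default time */
	if (argc < 3)
	   timetosleep = 300;    /* set for five minutes */
	else  { timetosleep = atoi(argv[2]); }

	fd = open(procname, O_RDWR);    /* will work if you're root or own the */
					/* process */
	if (fd < 0) {
		perror(procname);
		exit (1);
	}

	if (ioctl(fd, PIOCSTOP, &prstat) < 0) {
		perror("Suspend failed");
		exit (1);
	}

	sleep (timetosleep);

	memset(&prn, 0, sizeof prn);
	prn.pr_flags = PRCSIG;    /* Clear any signals that caused stop.  Not */
				  /* really applicable in this case, but can't hurt */
	if (ioctl(fd, PIOCRUN, &prn) < 0)  {
	   fprintf(stderr, "Failed to wake process %s\n", procname);
	   exit (1);
	}

	close(fd);
	exit (0);

}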





This is sort of off the top of my head, with a few references to some
other programs lying about.  I don't know that this would compile directly,
but it shouldn't be too far off.  Basically open the file in /proc for 
read and write (you need write access to suspend the task) and use the PIOCSTOP
ioctl to stop the process.  The prstat structure captures the state of the
process as it is stopped.  Might not be useful for you, but I deal with
it all the time.  Then you sleep for a period, and then use the PIOCRUN to 
start up again.  I throw in the PRCSIG in prn.pr_flags, because I'm always
clearing signals in dbx.  But at this point there are no real signals, unless
one has come from the outside.  There are other things you can do to
make this a bit more spiffy, but this should do what you want.  If you have
separate processes so that one suspends and another wakes up, you have to be
careful, as the process may continue once you close the file (as you would
implicitly when you exit).

This should get you started.

PeterT