T.R | Title | User | Personal Name | Date | Lines |
---|
8772.1 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Feb 10 1997 11:33 | 15 |
| > Does Dunix have a way to suspend a process and then let it continue, without
> otherwise interfering with its behavior? I know that with most varieties of
> Unix, the answer is a clear "No", but there are clues in dbx that this might
> actually be possible with OSF.
The answer is not a clear "no" on most varieties of UNIX. Just
the opposite ....
> It's possible that this might be documented somewhere that I haven't managed
> to find (possibly because I didn't guess the right keywords)?
% man signal
% man 2 kill
The signals you want to send are SIGSTOP and SIGCONT.
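From a program, sending the two signals is a pair of kill(2) calls. The following is a minimal sketch, assuming a POSIX system; the names suspend_pid, resume_pid and stop_continue_demo are made up for illustration, and the demo only confirms the stop and continue transitions on a child it forks itself.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Suspend a process. SIGSTOP cannot be caught, blocked, or ignored. */
int suspend_pid(pid_t pid) { return kill(pid, SIGSTOP); }

/* Resume a stopped process. */
int resume_pid(pid_t pid) { return kill(pid, SIGCONT); }

/* Illustrative self-check: stop and continue a forked child and let
 * waitpid() report both state changes. Returns 0 if both were seen. */
int stop_continue_demo(void)
{
    int status, rc = -1;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {               /* child: idle until killed */
        for (;;)
            pause();
    }

    if (suspend_pid(child) == 0 &&
        waitpid(child, &status, WUNTRACED) == child && WIFSTOPPED(status) &&
        resume_pid(child) == 0 &&
        waitpid(child, &status, WCONTINUED) == child && WIFCONTINUED(status))
        rc = 0;

    kill(child, SIGKILL);           /* clean up the helper child */
    waitpid(child, &status, 0);
    return rc;
}
```

The shell equivalents are "kill -STOP <pid>" and "kill -CONT <pid>".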
|
8772.2 | a similar alternative... | QUARRY::petert | rigidly defined areas of doubt and uncertainty | Mon Feb 10 1997 11:59 | 6 |
| Under program control, you could attach to the process in /proc, issue an
ioctl PIOCSTOP, and later a PIOCRUN. This will do the same type of thing,
without any signals involved, though I think the SIGSTOP and SIGCONT
will not affect the output or behavior of the program.
PeterT
|
8772.3 | Yeah, I already knew about SIGSTOP and SIGCONT ... | APACHE::CHAMBERS | | Mon Feb 10 1997 12:56 | 52 |
| ... but they clearly violate the part of the original question that said
that the scheme should have no effect on the program's behavior.
Even if the program doesn't have signal handlers for SIGSTOP and SIGCONT,
it's still fairly normal that SIGCONT causes a system call to return -1
with errno=EINTR. And lots of programs have signal handlers for SIGCONT.
I could show you a number of them in the programs that I'm working on.
This is a *long* way from having "no effect" on a program's behavior.
BTW, I also noticed a curious inconsistency in the "man 2 sigaction" page.
At one point it says "The signal parameter can be any one of the signal
values defined in the signal.h header file, except SIGKILL." This clearly
implies that signal handlers for SIGSTOP and SIGCONT are accepted.
But later, in the ERRORS section, we read:
[EINVAL] An attempt was made to ignore or supply a handler for the SIG-
KILL, SIGSTOP, and SIGCONT signals.
This seems to just as clearly imply that attempts to install a handler for
these three signals will fail.
As a typical (i.e., paranoid ;-) C programmer, I'd conclude that I can't
rely on the (il)legality of SIGSTOP and SIGCONT handlers; code that uses
them will possibly work on some releases of the kernel and not on others.
It might even be inconsistent on a single system, depending on which part
of this man page the kernel code is reading when the sigaction() call
is made by the application.
(From my years of experience trying to write portable code, I could easily
construe this as following the "industry standards" for Unix. ;-)
I did a quick test on a (3.2G I think) system that I have handy. I added
code to install a SIGCONT signal handler, via signal(SIGCONT,sigcont). It
returned 0, and errno didn't change. The sigcont routine just wrote a message
to the program's logfile saying that it was called, and returned. I ran the
program, which is a background daemon, and used "kill -CONT <pid>". The log
showed the signal handler's message. It then showed that the program got a
-1 return from an accept() call, with errno=EINTR, and since it hadn't been
programmed to understand this, it just said "Can't accept any more connections"
and died. So it appears that 1) SIGCONT handlers are in fact allowed, at
least on 3.2G, and 2) sending a SIGCONT can have a disastrous effect on at
least some processes.
This is a *very* long way from "no effect" on a process's behavior.
(Please, no flames about how the program wasn't written correctly. I'm talking
about suspending a program that was already written, usually by someone else,
and for which I may not have the source. The above test seems like pretty
convincing proof that anything involving SIGCONT is *not* the answer.)
|
8772.4 | PIOCSTOP sounds interesting ... | APACHE::CHAMBERS | | Mon Feb 10 1997 13:22 | 16 |
| | Under program control, you could attach to the process in /proc, issue an
| ioctl PIOCSTOP, and later a PIOCRUN.
This sounds interesting. A quick `find /usr/man/ -print | xargs grep PIOCSTOP`
turned up proc.4 as the only man page that mentions this symbol. I wonder if
there are any tools lying about that can already use them to start/stop a
process, or if I'd have to develop my own tool. The latter is somewhat daunting,
of course, as I'm not sure that I could justify the probably several-month task
that it could easily entail, judging from past experiences twiddling /proc files.
It does seem like dbx might be able to do the job somehow, though a few brief
tests convinced me that you'd have to know *exactly* what you're doing, which
could also be a multi-month project. All of my attempts to stop/restart a
background daemon from dbx had very noticeable effects on its behavior, up
to and including death from mysterious causes when I try to get dbx to exit.
Clearly, dbx isn't for dummies ...
|
8772.5 | | SMURF::DENHAM | Digital UNIX Kernel | Mon Feb 10 1997 14:02 | 28 |
| Yes, you're allowed to catch SIGCONT. What POSIX says about this is
that *regardless* of the action specified for SIGCONT, receipt of
the signal *must* cause all pending stop signals to be discarded
and the process to be removed from the stopped state. If you're
not catching SIGCONT, then it is indeed a nop beyond continuing the
process. Catching SIGCONT of course also gives you all those
caught-signal semantics, as you've found. In earlier versions (way
back) it *was* illegal to catch SIGCONT. But that restriction was
removed once the signal was required always to continue a stopped
process.
You can never catch SIGSTOP.
Another completely nonportable approach is to use processor sets.
Create a processor set with no processors in it.
# pset_create
pset_id = 2
# pset_assign_pid 2 PID-TO-SUSPEND
The process and all its threads are now in limbo. To continue them
with no side effects:
# pset_assign_pid 0 PID-TO-CONTINUE
You can see the available psets with the pset_info command.
There's always a pset 0, the default_pset, when a machine boots
up.
|
8772.6 | Why do you want/need to suspend(stop) and then continue a process? | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Feb 10 1997 14:31 | 102 |
| > ... but they clearly violate the part of the original question that said
> that the scheme should have no effect on the program's behavior.
"clearly"? Sorry, but you should have provided more info in your
base note, which is the one that was obviously not clearly written :-)
> Even if the program doesn't have signal handlers for SIGSTOP and SIGCONT,
> it's still fairly normal that SIGCONT causes a system call to return -1
> with errno=EINTR.
No, it's not normal for an *un*-handled signal to cause an
interruptible system call to be interrupted. Normal behaviour
is that an interruptible system call can only be interrupted
by a signal handler being invoked.
> And lots of programs have signal handlers for SIGCONT.
"lots" is a very vague term (and it is certainly unclear what
you are trying to say). I'll stick my neck out and say *most*
(ie. a majority, ie. over 50%, and probably more like 95+%)
of programs do *not* have a signal handler for SIGCONT (or
any signal handler for that matter). More likely signal
handlers for UNIX applications are SIGCHLD & SIGPIPE.
> This is a *long* way from having "no effect" on a program's behavior.
Another vague word, "long". It's not a long way at all. As
I already contend, most applications don't setup signal handlers
for SIGCONT. As such, SIGSTOP/SIGCONT will have *no* effect on the
*majority* of applications.
Do note that regardless of how you stop an application, there is
a small subset of applications for which you will find it does
change the behaviour, simply because you've changed the timing.
For example, a STREAM socket application reading from a socket
where the sender is sending in small chunks, will usually see
(unless they set a low water mark) their read/recv's complete
returning multiple times with small chunks each time. However
if you suspend the application while the data is being received,
when you resume the application, the application will see their
read/recv return with more data in the buffer than they would
have otherwise. This can make a difference if you are trying to
debug an application with bugs in this area.
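The coalescing effect described here can be reproduced without suspending anything, by simply not reading until several small writes have been queued. This is a rough sketch, assuming an AF_UNIX stream socket pair behaves like the stream sockets discussed above; first_read_after_pause is a made-up name, and the caller is assumed to keep 4 * chunks within the 256-byte buffer.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue `chunks` 4-byte writes on a stream socket while the reader is
 * "paused" (it simply has not read yet), then do one read.
 * Returns the byte count of that first read, or -1 on error. */
ssize_t first_read_after_pause(int chunks)
{
    int sv[2];
    char buf[256];
    ssize_t n = -1;
    int i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    for (i = 0; i < chunks; i++) {
        if (write(sv[0], "abcd", 4) != 4)
            goto out;
    }
    /* A stream has no message boundaries, so one read can return the
     * data from several earlier writes at once. */
    n = read(sv[1], buf, sizeof buf);
out:
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A receiver that assumes one read per sender write works when it keeps up with the sender and breaks after any pause, whatever mechanism caused the pause.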
So saying you want "no effect" really depends on what type of
application you have and what it's doing, and for what purpose
you are suspending the application (which you never stated in .0).
> BTW, I also noticed a curious inconsistency in the "man 2 sigaction" page.
> At one point it says "The signal parameter can be any one of the signal
> values defined in the signal.h header file, except SIGKILL." This clearly
> implies that signal handlers for SIGSTOP and SIGCONT are accepted.
QAR it. FWIW, historically SIGSTOP and SIGKILL are the only two
signals that can never be caught or ignored. This is the first
that I've heard SIGCONT is a 3rd one.
BTW, if you're reading the sigaction man page you should have come
across the SA_RESTART flag. While not portable, if you're playing
with signals then your application is non-portable to begin with
(portability is another vague term; even if you code to ANSI C
or ANSI C++, you'll find that truly portable programs (ones that
don't require some system-specific #ifdef or equivalent) are more rare
than not).
> I did a quick test on a (3.2G I think) system that I have handy. I added
> code to install a SIGCONT signal handler, via signal(SIGCONT,sigcont). It
> returned 0, and errno didn't change. The sigcont routine just wrote a message
> to the program's logfile saying that it was called, and returned. I ran the
> program, which is a background daemon, and used "kill -CONT <pid>". The log
> showed the signal handler's message. It then showed that the program got a
> -1 return from an accept() call, with errno=EINTR, and since it hadn't been
> programmed to understand this, it just said "Can't accept any more connections"
> and died. So it appears that 1) SIGCONT handlers are in fact allowed, at
> least on 3.2G, and 2) sending a SIGCONT can have a disastrous effect on at
> least some processes.
Anything can have a disastrous effect on a poorly written program
such as the one you described. A program that explicitly uses signal
handlers and is not prepared to handle accept(2) returning with
an errno of EINTR is broken.
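The usual defences are either SA_RESTART on the handler or a retry loop around the blocking call. Below is a minimal sketch, assuming a POSIX system; install_cont_handler, read_retry and eintr_demo are hypothetical names, and read() on a pipe stands in for accept() so the example stays self-contained. The same loop shape applies to accept().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t cont_seen = 0;
static void on_cont(int signo) { (void)signo; cont_seen = 1; }

/* Install a SIGCONT handler. With restart != 0, SA_RESTART asks the
 * kernel to restart interrupted system calls where it can. */
int install_cont_handler(int restart)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_cont;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGCONT, &sa, NULL);
}

/* read() that retries when a caught signal interrupts it. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Illustrative self-check: a child sends SIGCONT to the parent while
 * the parent waits in read_retry(), then writes one byte.
 * Returns 0 if the handler ran and the read still delivered the byte. */
int eintr_demo(void)
{
    int p[2], status;
    char c;
    ssize_t n;
    pid_t child, parent = getpid();
    struct timespec delay = { 0, 100000000 };   /* 100 ms */

    cont_seen = 0;
    if (install_cont_handler(0) < 0 || pipe(p) < 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        close(p[0]);
        nanosleep(&delay, NULL);        /* let the parent reach read() */
        kill(parent, SIGCONT);
        nanosleep(&delay, NULL);
        if (write(p[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }
    close(p[1]);
    n = read_retry(p[0], &c, 1);
    waitpid(child, &status, 0);
    close(p[0]);
    signal(SIGCONT, SIG_DFL);           /* restore the default action */
    return (n == 1 && cont_seen) ? 0 : -1;
}
```

Neither defence helps with a binary you cannot change, which is the case the original question is about.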
> This is a *very* long way from "no effect" on a process's behavior.
You cut & paste very well :-) I'll instead point you to my
response to your first utterance of this sentence :-)
> (Please, no flames about how the program wasn't written correctly. I'm talking
> about suspending a program that was already written, usually by someone else,
> and for which I may not have the source. The above test seems like pretty
> convincing proof that anything involving SIGCONT is *not* the answer.)
Convincing? Just the opposite. A more *realistic* test is to
*not* setup a signal handler for SIGCONT. Your contention at
the start of this note is that sending SIGSTOP or SIGCONT to a
process will cause a suspended & interruptible syscall to be
interrupted, even if the action for these signals in the process
is SIG_DFL or SIG_IGN (ie. even if the process is *not* handling
these signals). But your so-called "test" *does* setup a signal
handler for these signals.
Try again :-)
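One way such a test could look, with SIGCONT left at its default action, is sketched below. It assumes a POSIX system, and default_cont_demo is a made-up name. The result is only evidence for the platform and the system call tested (here read() on a pipe), since some systems document calls that a stop/continue cycle does interrupt even without a handler.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Block a child in read() with default signal actions, stop and
 * continue it, then feed it one byte.
 * Returns the child's exit code: 0 = read returned the byte,
 * 2 = read failed with EINTR, 3 = other failure; -1 on setup errors. */
int default_cont_demo(void)
{
    int p[2], status;
    pid_t child;
    struct timespec delay = { 0, 100000000 };   /* 100 ms */

    if (pipe(p) < 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        char c;
        ssize_t n;
        close(p[1]);
        signal(SIGCONT, SIG_DFL);       /* make sure no handler is set */
        n = read(p[0], &c, 1);          /* deliberately no retry loop */
        _exit(n == 1 ? 0 : (errno == EINTR ? 2 : 3));
    }
    close(p[0]);
    nanosleep(&delay, NULL);            /* let the child block in read() */

    if (kill(child, SIGSTOP) < 0 ||
        waitpid(child, &status, WUNTRACED) != child || !WIFSTOPPED(status) ||
        kill(child, SIGCONT) < 0 ||
        waitpid(child, &status, WCONTINUED) != child ||
        !WIFCONTINUED(status)) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        close(p[1]);
        return -1;
    }

    if (write(p[1], "x", 1) != 1)
        status = -1;
    close(p[1]);
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```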
|
8772.7 | | APACHE::CHAMBERS | | Mon Feb 10 1997 17:26 | 37 |
| | > ... I'm talking
| > about suspending a program that was already written, usually by someone else,
| > and for which I may not have the source. The above test seems like pretty
| > convincing proof that anything involving SIGCONT is *not* the answer.)
|
| Convincing? Just the opposite. A more *realistic* test is to
| *not* setup a signal handler for SIGCONT. Your contention at
| the start of this note is that sending SIGSTOP or SIGCONT to a
| process will cause a suspended & interruptible syscall to be
| interrupted, even if the action for these signals in the process
| is SIG_DFL or SIG_IGN (ie. even if the process is *not* handling
| these signals). But your so-called "test" *does* setup a signal
| handler for these signals.
Well, actually, I didn't contend any such things. What I've concluded is that,
given a process for which I don't have source (and thus can't add or subtract
signal handlers), I can't generally expect that SIGCONT will cause the process
to continue. In some very real cases, it causes them to die.
This is the situation with most Unix-like systems, and so the SIGSTOP/SIGCONT
mechanism can't be used to "stop and restart a process". It appears this is
true of OSF, also, though perhaps there's a different mechanism that works.
I don't think I've been at all ambiguous about what I'm after. I've worked
on a lot of time-sharing systems in the past, and all of them but Unix have
had commands to suspend and unsuspend a process. This is not a complicated
idea. On Unix-like systems, when you ask about it, you get bogged down in
discussions like this one ....
The idea of an empty processor set is an incredibly cute one. Now to see if
it's possible to write a script that 1) discovers whether there's a processor
set without a processor, 2) creates one if none exists; and 3) moves a process
to or from this set. Looking at the output of pset_info, the thought of doing
it in a sh script is rather horrible, but perl can probably handle it without
a whole lotta grief. Let's see ....
|
8772.8 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Feb 10 1997 21:03 | 98 |
| > Well, actually, I didn't contend any such things.
Reread your own words in .3 where you said:
"Even if the program doesn't have signal handlers for SIGSTOP
and SIGCONT, it's still fairly normal that SIGCONT causes a
system call to return -1 with errno=EINTR."
in my book and in webster that's what you contended :-)
con.tend \k*n-'tend\ vb [MF or L; MF contendre, fr. L contendere, fr. com- +
tendere to stretch] 1: to strive or vie in contest or rivalry or against
difficulties 2: to strive in debate : ARGUE 1: MAINTAIN, ASSERT 2: to
struggle for - con.tend.er n
> What I've concluded is that,
> given a process for which I don't have source (and thus can't add or subtract
> signal handlers), .....
You still haven't said why you want to be able to suspend/resume
a process in the first place. Once someone gives some real context
to what they are trying to do, it's not uncommon for others to offer
better ways to skin the cat.
BTW, FWIW, you are correct that in general on UNIX you can't add/subtract
signal handlers for an already running process. However, you can
startup a process to have it ignore certain signals, assuming the
process doesn't then explicitly change the sigaction for those
signals after it's started up. Then there's also clever hacks
you can do with the aid of LD_LIBRARY_PATH, but I wouldn't
recommend that for you :-)
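Starting a program with a signal already ignored works because an ignored disposition is inherited across exec. Here is a minimal sketch, assuming a POSIX system with sh on the PATH; run_ignoring and ignore_demo are hypothetical names, and SIGUSR1 is used only because it is harmless to test with.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with `signo` ignored from the start.
 * Returns the wait status of the program, or -1 on error. */
int run_ignoring(int signo, char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        signal(signo, SIG_IGN);     /* SIG_IGN survives the exec */
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    if (waitpid(child, &status, 0) != child)
        return -1;
    return status;
}

/* Illustrative self-check: the shell signals itself with SIGUSR1,
 * which would normally kill it, and then exits with status 7.
 * Returns 0 if the shell survived the signal. */
int ignore_demo(void)
{
    char *cmd[] = { "sh", "-c", "kill -USR1 $$; exit 7", NULL };
    int status = run_ignoring(SIGUSR1, cmd);

    return (status != -1 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 7) ? 0 : -1;
}
```

As noted above, this only holds until the program installs its own action for that signal.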
> I can't generally expect that SIGCONT will cause the process
> to continue. In some very real cases, it causes them to die.
Yes, as you've illustrated, in the very *rare* case of an application
that both sets up a signal handler for SIGCONT *and* is severely
broken to begin with, in that it has code that expects a SIGCONT
signal to arrive yet cannot handle the case when that signal arrives.
Again, please provide us with some context of what your real goal
is in stopping and starting processes w/out their cooperation.
> I don't think I've been at all ambiguous about what I'm after.
Think again :-)
> I've worked
> on a lot of time-sharing systems in the past, and all of them but Unix have
> had commands to suspend and unsuspend a process. This is not a complicated
> idea. On Unix-like systems, when you ask about it, you get bogged down in
> discussions like this one ....
There we go with words like "lot" again that are completely
meaningless because they are both vague and come with no examples.
I doubt that the system or systems you refer to
"... have a way to suspend a process and then let it continue,
without otherwise interfering with its behavior?"
as I've already illustrated that this cannot be true
to the standard you claim UNIX cannot achieve. There will
always be some applications for which the action of suspending and
then continuing will indeed interfere with their behaviour and,
depending on the application, can cause them to, as you say, "die".
> The idea of an empty processor set is an incredibly cute one.
Finally we agree on something :-) Yes it is cute. But I am
confused, as I got the impression you indicated you wanted
something which was "portable" to multiple UNIX platforms,
and neither /proc nor processor sets are very portable in
concept, never mind in implementation.
Do note that using processor sets requires root privs (not
only to create a new pset, but also to assign a process to
a pset), while /proc and SIGSTOP don't if you are the owner.
> Now to see if
> it's possible to write a script that 1) discovers whether there's a processor
> set without a processor, 2) creates one if none exists; ....
Now I can see why you are afraid of broken applications :-)
For test purposes the above is fine, but I would advise against
that logic in a production system. You should not assign a
process to a non-default pset which you did not create. While
you may find a pset without cpus in it at the time you check,
unless you know who created the pset and how they use it,
the creator could assign one or more processors to it after
you've found it empty, or could even destroy the pset while
you are using it.
> Looking at the output of pset_info, the thought of doing
> it in a sh script is rather horrible, but perl can probably handle it without
> a whole lotta grief. Let's see ....
It also appears there is a programming API to this functionality
and the API is rather simple.
|
8772.9 | A small, so far untested, example... | QUARRY::petert | rigidly defined areas of doubt and uncertainty | Tue Feb 11 1997 13:57 | 86 |
| > This sounds interesting. A quick `find /usr/man/ -print | xargs grep PIOCSTOP`
> turned up proc.4 as the only man page that mentions this symbol. I wonder if
> there are any tools lying about that can already use them to start/stop a
> process, or if I'd have to develop my own tool. The latter is somewhat daunting,
> of course, as I'm not sure that I could justify the probably several-month task
> that it could easily entail, judging from past experiences twiddling /proc files.
Yeah, it's all described in the proc man pages. Several months? Well, if you
want to get fancy, I suppose. It took a few months to get dbx to cut over
to using the /proc interface, and I still occasionally run into problems,
but to just stop and start:
#include <stdio.h>
#include <stdlib.h>     /* exit, atoi */
#include <string.h>     /* memset */
#include <unistd.h>     /* sleep */
#include <fcntl.h>      /* open, O_RDWR */
#include <sys/ioctl.h>  /* ioctl */
#include <sys/procfs.h>
#include <sys/types.h>
#include <sys/signal.h>
#include <sys/fault.h>
#include <sys/syscall.h>
int main (int argc, char **argv)
{
    char procname[32];
    int fd;
    struct prstatus prstat;
    struct prrun prn;   /* argument for PIOCRUN; type assumed, see proc(4) */
    int timetosleep;
    /* One attaches to a running process by opening the proc id in /proc. */
    /* take as input a process id, and perhaps a time you want to suspend */
    /* it for. */
    if (argc == 1) {
        printf("Please enter a process to suspend\n");
        exit (1);
    }
    /* argv[1] is a string, so convert it before formatting it with %d */
    sprintf(procname, "/proc/%d", atoi(argv[1]));
    /* get number of seconds, minutes here, or pick a default time */
    if (argc < 3)
        timetosleep = 300; /* set for five minutes */
    else { timetosleep = atoi(argv[2]); }
    fd = open(procname, O_RDWR); /* will work if you're root or own the */
                                 /* process */
    if (fd < 0) {
        perror(procname);
        exit (1);
    }
    if (ioctl(fd, PIOCSTOP, &prstat) < 0) {
        printf("Suspend failed \n");
        exit (1);
    }
    sleep (timetosleep);
    memset(&prn, 0, sizeof prn);
    prn.pr_flags = PRCSIG; /* Clear any signals that caused stop. Not */
                           /* really applicable in this case, but can't hurt */
    if (ioctl(fd, PIOCRUN, &prn) < 0) {
        printf("Failed to wake process %s\n", procname);
        exit (1);
    }
    exit (0);
}
This is sort of off the top of my head, with a few references to some
other programs lying about. I don't know that this would compile directly,
but it shouldn't be too far off. Basically open the file in /proc for
read and write (you need write access to suspend the task) and use the PIOCSTOP
ioctl to stop the process. The prstat structure captures the state of the
process as it is stopped. Might not be useful for you, but I deal with
it all the time. Then you sleep for a period, and then use the PIOCRUN to
start up again. I throw in the PRCSIG in prn.pr_flags, because I'm always
clearing signals in dbx. But at this point there are no real signals, unless
one has come from the outside. There are other things you can do, to
make this a bit more spiffy, but this should do what you want. If you have
separate processes so that one suspends, and another wakes up, you have to be
careful, as the process may continue once you close the file (as you would
implicitly when you exit.)
This should get you started.
PeterT
|