Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
Notice: | Welcome to the Digital UNIX Conference |
Moderator: | SMURF::DENHAM |
Created: | Thu Mar 16 1995 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 10068 |
Total number of notes: | 35879 |
8738.0. "Unprivileged unniced read/write loop causes other processes to hang ..." by APACHE::CHAMBERS () Thu Feb 06 1997 12:59
... and we're looking for tools to learn why.
We have several programs which, when run by an unprivileged user as an
ordinary, normal-priority program, result in other programs blocking for
the duration. One of them is /bin/rcp; another is a tiny test program that
I wrote which, leaving out the C boilerplate, looks like this:
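    if (!(b = malloc(s))) {
        fprintf(stderr,"%s: Can't malloc(%d) [errno=%d]\n",av[0],s,errno);
        _exit(ENOMEM);
    }
    /* Read up to s bytes at a time; r = bytes read, n = bytes still to write. */
    while ((r = n = read(0,p=b,s)) > 0) {
        while (p < b+r) {               /* until the whole chunk has gone out */
            if ((w = write(1,p,n)) <= 0) {
                int e = errno;          /* save it; fprintf may change errno */
                fprintf(stderr,"%s: Can't write %d bytes [errno=%d=%s]\n",
                    av[0],n,e,strerror(e));
                _exit(e);
            }
            t += w; n -= w; p += w;     /* step past a partial write */
        }
    }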
This is just a simple read/write loop, copying stdin to stdout; the only
bit of additional complexity is the code to detect partial writes and
finish them up (which is there because of TCP's tendency to accept fewer
than s bytes on a write). This oughta be just about the most innocuous
piece of code around.
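For anyone who wants to compile something close to this, here is a minimal sketch of the same partial-write handling as self-contained functions. It is not the exact test program: the names write_all and copy_fd are made up for illustration, the retry on EINTR is an addition, and it assumes nothing beyond POSIX read() and write().

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Write exactly len bytes from buf to fd, retrying after short writes.
 * Returns 0 on success, -1 on error. (Hypothetical helper name.) */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;       /* interrupted before any byte went out */
            return -1;          /* real error, or a write that accepts nothing */
        }
        buf += w;               /* skip the part that was accepted */
        len -= (size_t)w;       /* and offer only the remainder next time */
    }
    return 0;
}

/* Copy everything from in_fd to out_fd through a bufsize-byte buffer.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd, size_t bufsize)
{
    char *buf = malloc(bufsize);
    long total = 0;
    ssize_t r;

    if (buf == NULL)
        return -1;
    while ((r = read(in_fd, buf, bufsize)) != 0) {
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted read; just try again */
            total = -1;
            break;
        }
        if (write_all(out_fd, buf, (size_t)r) < 0) {
            total = -1;
            break;
        }
        total += r;
    }
    free(buf);
    return total;
}
```

Calling copy_fd(0, 1, s) from main() would give the same stdin-to-stdout behaviour as the loop above.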
What happens on several of our (3.2F and 3.2G) machines is that, when it
(or rcp or any of several other tiny programs) is run, many other processes
on the machine halt dead in their tracks, and do nothing more until this
loop terminates.
The things that halt generally aren't huge, either. One of them is the
"top" program that I have running in a window. Another is csh, which
doesn't echo chars for the duration. (The csh is running from a telnet,
so it's not the window that's hung; that's on a third machine.)
Anyhow, we're desperately looking for tools that can tell us just what
is causing this blocking. It's a total showstopper for the project
when it happens; quite literally because one of the other tasks that gets
blocked is running a video tape recorder, and its screen goes black ...
So far, all the performance monitoring tools we've tried (top, ps, vmstat,
vmubc, and others) pretty much fail, because, well, they're blocked and
don't do anything at all until the copy terminates.
A curiosity is that the tools like ps and top, when they finally wake up,
tell us that the cpu is mostly idle, with the copy loop and all other processes
totalling maybe 30% of the cpu, but the "system" time is usually over 90%.
Several tools say that a lot of memory allocation and swapping has occurred,
but according to ps, no process's pagein value has changed.
Since we've only seen it when the active process is known to be using a TCP
connection, I strongly suspect that IP is somehow involved, but I can't find
any tools that will give any more clues than that. This is only semi-suspicious,
of course, since *most* of our processes are involved with a TCP connection,
if only because their files 0, 1 and 2 are pipes to a remote (d)xterm. But
the copy loop is in fact either reading or writing a TCP socket, though it
doesn't know that because it didn't open the file.
Any ideas how we can finger the culprit?