T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
8725.1 | UNIX 101 | NETRIX::"[email protected]" | Farrell Woods | Wed Feb 05 1997 18:27 | 44 |
| Your program is broken.
Here's a paragraph from the fork() man page:
+ The child process has its own copy of the parent process's file
descriptors. Each of the child's file descriptors refers to the same
open file description with the corresponding file descriptor of the
parent process.
Since the child receives a *copy* of the parent's file descriptors, if the
child is careless enough to close those descriptors (say, via a call to exit)
then the parent's copies will likewise be invalid after that point. The
exit function will cause "fclose" to be called for any currently open
stream. Here's a snippet from the fclose man page:
...The stream is disassociated from the file. If the associated
buffer was automatically allocated, it is deallocated. Any further use of
the stream specified by the stream parameter causes undefined behavior.
Now the child and parent don't share the same instance of the FILE structure
(i.e. "the stream".) So in the parent's context there's still a valid
association between the stream and the file descriptor it uses. But if you
read further on:
The fclose() function performs the close() function on the file descriptor
associated with the stream parameter.
This is how the child can pull the rug out from under the parent. So
again if the child process calls fclose either directly or indirectly,
all bets are off if the parent tries to access the stream again.
Our C library (vs. the GNU C library used in Linux) will likely behave
differently. You were fortunate that your program ran under Linux.
Have the child call _exit instead of exit, and this will keep the child
from trashing that portion of its state that it still shares with the parent
after a fork. _exit is provided for exactly this kind of issue.
-- Farrell
[Posted by WWW Notes gateway]
|
8725.2 | | CECMOW::RAIKO | | Thu Feb 06 1997 03:18 | 20 |
| Thanks for the explanation, but we still have some questions:
> Since the child receives a *copy* of the parent's file descriptors, if the
> child is careless enough to close those descriptors (say, via a call to exit)
> then the parent's copies will likewise be invalid after that point.
As we know,
STDIO streams are implemented completely in the user space as a
library. So, any modifications in the child stream data structures should not
affect the parent's. The only system call we expect from fclose(3)
(for read-only stream) is close(2). Close(2) should not affect the parent
process.
Moreover, if we perform explicit close in the child process, the program in .0
works correctly.
Could you clarify how fclose(3) can affect the parent process?
Regards,
Gleb.
|
8725.3 | Please read .1 again | NETRIX::"[email protected]" | Farrell Woods | Thu Feb 06 1997 10:18 | 11 |
| I explained everything in my first reply.
Yes, the child gets its own copy of the stream data structure. No, the child
DOES NOT get a unique file descriptor. This is made very clear in the man
page for the fork system call.
Your program is broken.
-- Farrell
[Posted by WWW Notes gateway]
|
8725.4 | Curiosity, and a guess | IOSG::MARSHALL | | Thu Feb 06 1997 11:41 | 33 |
| So .1 implies that any program run from a command-line shell which exits (ie any
program!) will close the invoking shell's stdin, stdout, stderr files.
Obviously this doesn't happen, as the shell can happily process more than one
command, so what happens to break this connection between parent and child (ie
that the two processes' file descriptors refer to the same system-wide open file
table)? Probably nothing: I don't think closing a file in the child closes it
in the parent, viz:
According to "Advanced Unix Programming" (Rochkind, 1985) - a bit old now but an
invaluable book:
The child gets copies of the parent's open file descriptors. Each is
opened to the same file, and the file pointer has the same value. The
file pointer is shared. If the child changes it with lseek, then the
parent's next read or write will be at the new location. The file
descriptor itself, however, is distinct: if the child closes it, the
parent's copy is undisturbed.
I think this paragraph probably sums up .0's problem. They are perfectly OK
to (implicitly) close the file in the child (by exiting), and the file will
still be open in the parent. But I wouldn't be surprised if the close from
the child did funny things to the (shared) file pointer, such as resetting it
to the beginning of the file for example. This would be one reason why the
program loops forever.
You could test this (and fix the program at the same time) by using lseek to
get the file pointer before doing fork, and using it again (in the parent!) to
reset the file pointer to the previously-remembered value after the child has
exited.
Just a hopeful guess...
Scott
|
8725.5 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Thu Feb 06 1997 11:44 | 25 |
| Re: .3
| I disagree that the program is broken. I would most definitely
agree with you if vfork() was being used, but not when a full
fledged fork() is being done.
I did some testing and found what's causing the problem. I
used a smaller input file (/etc/motd, 2 lines on my system)
and the behaviour was slightly different than .0 sees, in
my case the last line of the file is only repeated once and
the program does exit, but what I found should explain the
indefinite repeating also. Using the public domain syscall
"trace" program I found:
The parent process is doing a read(..., st_blksize). This is
expected. When the child however fclose(f) (either as .0
has by calling exit(), or by explicitly calling fclose(f)
as the first thing after the fork), the 1st child process
does an lseek(..., -50, SEEK_CUR). So when the parent process
does a read() eventually (when stdio re-fills its buffer)
it re-reads the last line of the file (50 is the number of
lines in the file).
I'd have to say it's a bug in stdio. It should *not* be
doing the lseek in this case.
|
8725.6 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Thu Feb 06 1997 11:50 | 12 |
| > ... (50 is the number of lines in the file).
I meant to say 50 is the number of characters in the last line
of the file.
It also looks like I had a notes collision with .4 :-)
yes, one workaround for this stdio bug would be to save
and restore the file pointer. Another workaround is to
do an _exit() instead of exit() in the child as .1 suggests.
.0, you should submit a QAR (or equiv) and provide your
bug reproducer program.
|
8725.7 | | QUARRY::neth | Craig Neth | Thu Feb 06 1997 15:16 | 12 |
| FWIW, if you add the following line after the fopen:
f->_flag |= _IONONSTD;
Then it appears to work 'as expected'. The backing up in the file
is done by fflush, and there is a big comment in the fflush source
that tries to explain why it's doing it. It implies it is required by
POSIX (_IONONSTD turns off the POSIX stuff).
I have no idea what should happen here. I would suggest a QAR be filed
so that the libc maintainers can consult with the various standards and
try and give you their opinion on it.
|
8725.8 | UNIX has worked this way for a long, long time.... | NETRIX::"[email protected]" | Farrell Woods | Fri Feb 07 1997 13:36 | 50 |
| re .4:
> So .1 implies that any program run from a command-line shell which exits (ie any
> program!) will close the invoking shell's stdin, stdout, stderr files.
Thanks for pointing this out. I started rummaging around the sources
in search of answers.
In the case of shells, they explicitly duplicate file descriptors for
stdin, stdout, and stderr, and arrange for these to appear as fd's 0, 1,
and 2 in the child process before exec'ing a program. Thus the child
can do whatever silly thing it wants to those descriptors and not affect
the parent's view of the descriptors.
I shouldn't have been so adamant about close, because the parent can
still read from a descriptor that was closed by the child. BUT: the
child can still mess up the parent's view of the file by doing things
that cause the seek pointer to be moved (e.g. lseek or read/write.)
I looked at the fclose path in the C library. It appears that the fclose
path can end up calling lseek on the underlying descriptor in some cases.
fclose calls fflush if buffered IO is in use (which it is in .0). fflush
will want to invalidate the unconsumed portion of the buffer. It does this
by calling lseek to move the file pointer back to the very beginning of
what wasn't consumed (by fgets.) But all of that went on in the child and
the parent didn't see any of it and the parent doesn't know that the file
pointer got moved. When the parent calls fgets again, the next line is read
out of the parent's buffer (no physical IO takes place.)
The loop occurs because eventually in the parent's context the buffer runs
dry. fgets calls a function to replenish the buffer. This works because
each time the child called fclose, fclose called fflush and moved the pointer
back (eventually moving it back to the beginning of the file.) That made the
underlying read call scoop up the whole file again.
The point is that you have to take care yourself to make sure that a parent
and child do not share the same descriptors, if you don't want surprises
like this.
IMO the best way (still) to fix the program in .0 is for the child
to use the _exit interface.
If you QAR this it's likely to get bounced.
-- Farrell
[Posted by WWW Notes gateway]
|
8725.9 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Fri Feb 07 1997 19:00 | 46 |
| > UNIX has worked this way for a long, long time....
That sounds like a statement not based on any research or fact.
Especially given what Craig reported about the reason for this
behaviour appearing to have something to do with a POSIX
requirement. Ie. UNIX has been around a long long long time
*before* POSIX.
And I do recall doing similar things to .0 in the past and never
running into this (on ULTRIX at least, if not also DEC OSF/1).
> IMO the best way (still) to fix the program in .0 is for the child
> to use the _exit interface.
And how would you "fix" the program if the program did *not*
want to exit (or _exit), but simply wanted to fclose() the
FILE in the child after the fork? [update, answer is below]
> If you QAR this it's likely to get bounced.
Well in the least the documentation (man pages for fopen and
fflush) should be updated to warn of this supposed POSIX behaviour.
There is mention of having to do a fflush before changing from
reading to writing or vice versa when you have a FILE open for
update ........... light bulb just clicked here for a clean "fix"
to this problem ......
.... prior to the fork(), manually do a fflush(f). I just tested
this and it works. In this case the lseek occurs at the time it's
fflush()ed. As long as the stdio implementation is smart enough
to have fflush be a no-op on a FILE that's had no io done since the
last fflush, it will work. Hopefully this is what POSIX says
so that this solution would be portable. If not, then .....
I also tried setting the FILE to be completely unbuffered (see setbuf
and friends) and that also works, but that does result in a lot more
syscall overhead as a read() is done for one byte of the file at
a time.
I still don't like this supposedly POSIX behaviour as it means any
API that uses stdio under the covers will be screwed if it
doesn't do a fflush after doing IO and before returning to the user.
Not to mention the mess for threaded programs (thank goodness the
only valid thing a child of a threaded process is really allowed to
do is to execve another program).
|
8725.10 | why do you want to do this? | NETRIX::"[email protected]" | Farrell Woods | Mon Feb 10 1997 10:44 | 22 |
| You're complaining because: when two processes share an object, one process
has the ability to affect the other process's view of that object?
Your fflush solution works because the fclose processing caused by the
child won't have any additional effect on the seek offset pointer.
In other words, the child doesn't cause the seek offset pointer to
become inconsistent with the state of the parent's FILE structure.
If the parent were to try to read from the file again before waiting
on the child you'd have a race condition though...
You should notice one read syscall per line of file with your solution,
correct? A bit more activity than the original but not nearly as much as
the unbuffered case you mention.
Do you understand that if process A and process B share a file descriptor,
and process B calls lseek, that this affects the point from which
process A will continue reading from that file?
-- Farrell
[Posted by WWW Notes gateway]
|
8725.11 | QAR #51440 | CECMOW::RAIKO | | Mon Feb 10 1997 11:19 | 0 |
8725.12 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Feb 10 1997 11:29 | 25 |
| > Your fflush solution works because the fclose processing caused by the
> child won't have any additional effect on the seek offset pointer.
Isn't that what I already said?
> You should notice one read syscall per line of file with your solution,
> correct?
Should be one read (making the assumption no line is longer than
the stdio buffer for that FILE) and one lseek per line (except
of course for the last line of the file).
> Do you understand that if process A and process B share a file descriptor,
> and process B calls lseek, that this affects the point from which
> process A will continue reading from that file?
I think I know something about UNIX after developing on it
(and in it) for 15 years :-)
To refresh your memory, the issue is not the lseek and read
syscalls are operating on the same object (ie. single file
position pointer). The issue is that stdio, which provides
an abstract interface, is doing an lseek at all, on fclose,
of a file opened for "r"ead. This is not the traditional
behaviour.
|