Conference turris::digital_unix

Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879

8725.0. "VERY strange behaviour: fork() and stdio. Is it a bug ?" by CECMOW::RAIKO () Wed Feb 05 1997 11:46

OS: DUNIX 3.2C, 4.0

The following program prints a file endlessly on DUNIX. That doesn't seem
to be legal behavior.

file test.c:

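#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <unistd.h>     /* fork, sleep */
#include <sys/wait.h>   /* wait */

int main(void) {
    char buf[1024];
    int  status;
    FILE* f = fopen("test.c", "r");

    if (f == NULL) {
        perror("test.c");
        return 1;
    }
    while(fgets(buf, sizeof buf, f)) {
        if (fork() == 0) {
            sleep(1);
            fputs(buf, stdout);
            exit(1);    /* implicitly fclose()s f in the child */
        } else {
            wait(&status);
        }
    }
    return 0;
}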
------ end of file test.c ----

In Linux, it works as expected.

IS THIS A BUG ? 

Regards,
Gleb.
8725.1. "UNIX 101" by NETRIX::"[email protected]" (Farrell Woods) Wed Feb 05 1997 18:27
Your program is broken.

Here's a paragraph from the fork() man page:

    +  The child process has its own copy of the parent process's file
       descriptors.  Each of the child's file descriptors refers to the same
       open file description with the corresponding file descriptor of the
       parent process.

Since the child receives a *copy* of the parent's file descriptors, if the
child is careless enough to close those descriptors (say, via a call to exit)
then the parent's copies will likewise be invalid after that point.  The
exit function will cause "fclose" to be called for any currently open
stream.  Here's a snippet from the fclose man page:

...The stream is disassociated from the file.  If the associated
   buffer was automatically allocated, it is deallocated.  Any further use of
   the stream specified by the stream parameter causes undefined behavior.

Now the child and parent don't share the same instance of the FILE structure
(i.e. "the stream".)  So in the parent's context there's still a valid
association between the stream and the file descriptor it uses.  But if you
read further on:

  The fclose() function performs the close() function on the file descriptor
  associated with the stream parameter.

This is how the child can pull the rug out from under the parent.  So
again if the child process calls fclose either directly or indirectly,
all bets are off if the parent tries to access the stream again.

Our C library (vs. the GNU C library used in Linux) will likely behave
differently.  You were fortunate that your program ran under Linux.

Have the child call _exit instead of exit, and this will keep the child
from trashing that portion of its state that it still shares with the parent
after a fork.  _exit is provided for exactly this kind of issue.
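
A minimal sketch of that change, not a definitive fix. The helper name
echo_lines_with_underscore_exit is hypothetical, and the sleep(1) from .0 is
left out for brevity. Since _exit does not flush stdio buffers, the sketch
assumes the child has to fflush(stdout) itself or its output may be lost when
stdout is not a terminal.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: echo each line of `path` from a forked child.
 * Returns the number of lines the parent read, or -1 on error. */
int echo_lines_with_underscore_exit(const char *path)
{
    char buf[1024];
    int lines = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    while (fgets(buf, sizeof buf, f) != NULL) {
        lines++;
        fflush(stdout);         /* keep pending parent output out of the child */

        pid_t pid = fork();
        if (pid < 0) {
            fclose(f);
            return -1;
        }
        if (pid == 0) {
            fputs(buf, stdout);
            fflush(stdout);     /* _exit() does not flush stdio buffers */
            _exit(0);           /* skips exit()'s implicit fclose() of f */
        }
        waitpid(pid, NULL, 0);
    }

    fclose(f);
    return lines;
}
```

For a three-line input file the parent should go through the loop exactly
three times, whatever the C library does in fclose.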


	-- Farrell



[Posted by WWW Notes gateway]
8725.2. "" by CECMOW::RAIKO () Thu Feb 06 1997 03:18
Thanks for the explanation, but we still have some questions:

> Since the child receives a *copy* of the parent's file descriptors, if the
> child is careless enough to close those descriptors (say, via a call to exit)
> then the parent's copies will likewise be invalid after that point.

As we know, 
STDIO streams are implemented completely in the user space as a 
library. So, any modifications in the child stream data structures should not
affect parent's ones. The only system call we expect from fclose(3)
(for read-only stream) is close(2). Close(2) should not affect the parent
process.

Moreover, if we perform explicit close in the child process, the program in .0 
works correctly.

Could you clarify how fclose(3) can affect the parent process?

Regards,
Gleb. 
8725.3. "Please read .1 again" by NETRIX::"[email protected]" (Farrell Woods) Thu Feb 06 1997 10:18
I explained everything in my first reply.

Yes, the child gets its own copy of the stream data structure.  No, the child
DOES NOT get a unique file descriptor.  This is made very clear in the man
page for the fork system call.

Your program is broken.

	-- Farrell

[Posted by WWW Notes gateway]
8725.4. "Curiosity, and a guess" by IOSG::MARSHALL () Thu Feb 06 1997 11:41
So .1 implies that any program run from a command-line shell which exits (ie any
program!) will close the invoking shell's stdin, stdout, stderr files.

Obviously this doesn't happen, as the shell can happily process more than one
command, so what happens to break this connection between parent and child (ie
that the two processes' file descriptors refer to the same system-wide open file
table).  Probably nothing: I don't think closing a file in the child closes it
in the parent, viz:

According to "Advanced Unix Programming" (Rochkind, 1985) - a bit old now but an
invaluable book:

    The child gets copies of the parent's open file descriptors.  Each is
    opened to the same file, and the file pointer has the same value.  The
    file pointer is shared.  If the child changes it with lseek, then the
    parent's next read or write will be at the new location.  The file
    descriptor itself, however, is distinct: if the child closes it, the
    parent's copy is undisturbed.
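
The behaviour described in that paragraph can be checked with raw descriptors,
with no stdio involved. This is only a sketch; the function name
offset_after_child_seek is made up for the illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child moves the shared file offset to `target`
 * and closes its own descriptor. Returns the offset the parent observes
 * afterwards through its still-open descriptor, or -1 on error. */
off_t offset_after_child_seek(const char *path, off_t target)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        lseek(fd, target, SEEK_SET);  /* moves the offset both processes share */
        close(fd);                    /* closes only the child's descriptor */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    /* The parent's descriptor is still valid; only the offset has changed. */
    off_t seen = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return seen;
}
```

If the child's close had invalidated the parent's descriptor, the final lseek
would fail with -1; instead it should report the offset the child set.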

I think this paragraph probably sums up .0's problem.  They are perfectly OK
to (implicitly) close the file in the child (by exiting), and the file will
still be open in the parent.  But I wouldn't be surprised if the close from
the child did funny things to the (shared) file pointer, such as resetting it
to the beginning of the file for example.  This would be one reason why the
program loops forever.

You could test this (and fix the program at the same time) by using lseek to
get the file pointer before doing fork, and using it again (in the parent!) to
reset the file pointer to the previously-remembered value after the child has
exited.
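
A rough sketch of that save/restore idea, under the assumption that the value
to remember is the kernel's offset for the descriptor (lseek on fileno(f)) and
not the stream position reported by ftell, since the parent's buffer has
already read ahead. The helper name echo_lines_restoring_offset is
hypothetical, and the sleep(1) from .0 is omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: like the program in .0, but the parent puts the
 * shared file offset back after each child has exited.
 * Returns the number of lines the parent read, or -1 on error. */
int echo_lines_restoring_offset(const char *path)
{
    char buf[1024];
    int lines = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    while (fgets(buf, sizeof buf, f) != NULL) {
        lines++;
        fflush(stdout);  /* keep pending parent output out of the child */

        /* Offset of the descriptor itself, i.e. where the next read() starts. */
        off_t saved = lseek(fileno(f), 0, SEEK_CUR);

        pid_t pid = fork();
        if (pid < 0) {
            fclose(f);
            return -1;
        }
        if (pid == 0) {
            fputs(buf, stdout);
            exit(0);     /* may lseek() the shared offset via fclose(f) */
        }
        waitpid(pid, NULL, 0);

        /* Undo whatever the child did to the shared offset. */
        if (saved >= 0)
            lseek(fileno(f), saved, SEEK_SET);
    }

    fclose(f);
    return lines;
}
```

Restoring the offset only in the parent, and only after wait, avoids racing
with the child.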

Just a hopeful guess...
Scott
8725.5. "" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Thu Feb 06 1997 11:44
Re: .3

	I disagree that the program is broken.  I would most definitely
	agree with you if vfork() was being used, but not when a full
	fledged fork() is being done.

	I did some testing and found what's causing the problem.  I
	used a smaller input file (/etc/motd, 2 lines on my system)
	and the behaviour was slightly different than .0 sees, in
	my case the last line of the file is only repeated once and
	the program does exit, but what I found should explain the
	indefinite repeating also.  Using the public domain syscall
	"trace" program I found:

	The parent process is doing a read(..., st_blksize).  This is
	expected.  When the child however fclose(f) (either as .0
	has by calling exit(), or by explicitly calling fclose(f)
	as the first thing after the fork), the 1st child process
	does an lseek(..., -50, SEEK_CUR).  So when the parent process
	does a read() eventually (when stdio re-fills its buffer)
	it re-reads the last line of the file (50 is the number of
	lines in the file).

	I'd have to say it's a bug in stdio.  It should *not* be
	doing the lseek in this case.
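
The lseek seen in the trace can also be observed without a syscall tracer.
The sketch below (helper name offset_shift_from_child_fclose is hypothetical)
measures how far the shared offset moves when a child does fclose(f). The
result depends on the C library: 0 if fclose leaves the offset alone, or a
negative value equal to the unread part of the buffer if it seeks back.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read one line through stdio, let a child fclose()
 * the stream, and report the change in the shared file offset in *shift.
 * Returns 0 on success, -1 on error. */
int offset_shift_from_child_fclose(const char *path, off_t *shift)
{
    char buf[1024];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }

    /* Where the kernel thinks the descriptor is after stdio filled its buffer. */
    off_t before = lseek(fileno(f), 0, SEEK_CUR);

    pid_t pid = fork();
    if (pid < 0) {
        fclose(f);
        return -1;
    }
    if (pid == 0) {
        fclose(f);   /* the call under test; runs on the child's copy of the FILE */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    off_t after = lseek(fileno(f), 0, SEEK_CUR);
    fclose(f);

    if (before < 0 || after < 0)
        return -1;
    *shift = after - before;
    return 0;
}
```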
8725.6. "" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Thu Feb 06 1997 11:50
> ... (50 is the number of lines in the file).

	I meant to say 50 is the number of characters in the last line
	of the file.

	It also looks like I had a notes collision with .4 :-)
	yes, one workaround for this stdio bug would be to save
	and restore the file pointer.  Another workaround is to
	do an _exit() instead of exit() in the child as .1 suggests.

	.0, you should submit a QAR (or equiv) and provide your
	bug reproducer program.
8725.7. "" by QUARRY::neth (Craig Neth) Thu Feb 06 1997 15:16
FWIW, if you add the following line after the fopen:

	    f->_flag |= _IONONSTD;

Then it appears to work 'as expected'.   The backing up in the file
is done by fflush, and there is a big comment in the fflush source 
that tries to explain why it's doing it.  It implies it is required by
POSIX (_IONONSTD turns off the POSIX stuff).

I have no idea what should happen here.   I would suggest a QAR be filed
so that the libc maintainers can consult with the various standards and
try and give you their opinion on it.
8725.8. "UNIX has worked this way for a long, long time...." by NETRIX::"[email protected]" (Farrell Woods) Fri Feb 07 1997 13:36
re .4:

> So .1 implies that any program run from a command-line shell which exits (ie
> any program!) will close the invoking shell's stdin, stdout, stderr files.

Thanks for pointing this out.  I started rummaging around the sources
in search of answers.

In the case of shells, they explicitly duplicate file descriptors for
stdin, stdout, and stderr, and arrange for these to appear as fd's 0, 1,
and 2 in the child process before exec'ing a program.  Thus the child
can do whatever silly thing it wants to those descriptors and not affect
the parent's view of the descriptors.

I shouldn't have been so adamant about close, because the parent can
still read from a descriptor that was closed by the child.  BUT: the
child can still mess up the parent's view of the file by doing things
that cause the seek pointer to be moved (e.g. lseek or read/write.)

I looked at the fclose path in the C library.  It appears that the fclose
path can end up calling lseek on the underlying descriptor in some cases.
fclose calls fflush if the buffered IO is in use (which it is in .0)  fflush
will want to invalidate the unconsumed portion of the buffer.  It does this
by calling lseek to move the file pointer back to the very beginning of
what wasn't consumed (by fgets.)  But all of that went on in the child and
the parent didn't see any of it and the parent doesn't know that the file
pointer got moved.  When the parent calls fgets again, the next line is read
out of the parent's buffer (no physical IO takes place.)

The loop occurs because eventually in the parent's context the buffer runs
dry.  fgets calls a function to replenish the buffer.  This works because
each time the child called fclose, fclose called fflush and moved the pointer
back (eventually moving it back to the beginning of the file.)  That made the
underlying read call scoop up the whole file again.


The point is that you have to take care yourself to make sure that a parent
and child do not share the same descriptors, if you don't want surprises
like this.

IMO the best way (still) to fix the program in .0 is for the child
to use the _exit interface.

If you QAR this it's likely to get bounced.


	-- Farrell

[Posted by WWW Notes gateway]
8725.9. "" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Fri Feb 07 1997 19:00
> UNIX has worked this way for a long, long time....

	That sounds like a statement not based on any research or fact.
	Especially given what Craig reported about the reason for this
	behaviour appearing to have something to do with a POSIX
	requirement.  Ie. UNIX has been around a long long long time
	*before* POSIX.

	And I do recall doing similar things to .0 in the past and never
	running into this (on ULTRIX at least, if not also DEC OSF/1).

> IMO the best way (still) to fix the program in .0 is for the child
> to use the _exit interface.

	And how would you "fix" the program if the program did *not*
	want to exit (or _exit), but simply wanted to fclose() the
	FILE in the child after the fork? [update, answer is below]

> If you QAR this it's likely to get bounced.

	Well, at the least the documentation (man pages for fopen and
	fflush) should be updated to warn of this supposed POSIX behaviour.

	There is mention of having to do a fflush before changing from
	reading to writing or vice versa when you have a FILE open for
	update ........... light bulb just clicked here for a clean "fix"
	to this problem ......

	.... prior to the fork(), manually do a fflush(f).  I just tested
	this and it works.  In this case the lseek occurs at the time it's
	fflush()ed.  As long as the stdio implementation is smart enough
	to have fflush be a no-op on a FILE that's had no io done since the
	last fflush, it will work.  Hopefully this is what POSIX says
	so that this solution would be portable.  If not, then .....
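
	A sketch of the fflush-before-fork workaround, with a made-up helper
	name (echo_lines_flush_before_fork) and without the sleep(1) from .0.
	It assumes the C library gives fflush on a seekable input stream the
	POSIX meaning (move the file offset to the stream position); ISO C
	itself leaves fflush on an input stream undefined, so this is not
	guaranteed to be portable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: like the program in .0, but the parent syncs the
 * input stream with the file offset before every fork.
 * Returns the number of lines the parent read, or -1 on error. */
int echo_lines_flush_before_fork(const char *path)
{
    char buf[1024];
    int lines = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    while (fgets(buf, sizeof buf, f) != NULL) {
        lines++;
        fflush(stdout);  /* keep pending parent output out of the child */
        fflush(f);       /* assumed POSIX behaviour: offset := stream position */

        pid_t pid = fork();
        if (pid < 0) {
            fclose(f);
            return -1;
        }
        if (pid == 0) {
            fputs(buf, stdout);
            exit(0);     /* implicit fclose(f) finds nothing left to seek back over */
        }
        waitpid(pid, NULL, 0);
    }

    fclose(f);
    return lines;
}
```

	The cost is that the parent's read-ahead is thrown away on every
	iteration, so each line needs its own read.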

	I also tried setting the FILE to be completely unbuffered (see setbuf
	and friends) and that also works, but that does result in a lot more
	syscall overhead as a read() is done for one byte of the file at
	a time.
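
	The unbuffered variant could look roughly like this, using the
	standard setvbuf call with _IONBF. The helper name
	echo_lines_unbuffered is hypothetical. With no read-ahead, the
	stream position and the file offset should never differ, so the
	child's fclose has nothing to undo.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: like the program in .0, but with the input stream
 * switched to unbuffered mode right after fopen.
 * Returns the number of lines the parent read, or -1 on error. */
int echo_lines_unbuffered(const char *path)
{
    char buf[1024];
    int lines = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    /* Must be done before any other operation on the stream. */
    if (setvbuf(f, NULL, _IONBF, 0) != 0) {
        fclose(f);
        return -1;
    }

    while (fgets(buf, sizeof buf, f) != NULL) {
        lines++;
        fflush(stdout);  /* keep pending parent output out of the child */

        pid_t pid = fork();
        if (pid < 0) {
            fclose(f);
            return -1;
        }
        if (pid == 0) {
            fputs(buf, stdout);
            exit(0);     /* no unread input is buffered, so no seek-back is needed */
        }
        waitpid(pid, NULL, 0);
    }

    fclose(f);
    return lines;
}
```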

	I still don't like this supposedly POSIX behaviour, as it means
	any API that uses stdio under the covers will be screwed if it
	doesn't do a fflush after doing IO and before returning to the user.
	Not to mention the mess for threaded programs (thank goodness the
	only valid thing a child of a threaded process is really allowed to
	do is to execve another program).
8725.10. "why do you want to do this?" by NETRIX::"[email protected]" (Farrell Woods) Mon Feb 10 1997 10:44
You're complaining because: when two processes share an object, one process
has the ability to affect the other process's view of that object?

Your fflush solution works because the fclose processing caused by the
child won't have any additional effect on the seek offset pointer.
In other words, the child doesn't cause the seek offset pointer to
become inconsistent with the state of the parent's FILE structure.
If the parent were to try to read from the file again before waiting
on the child you'd have a race condition though...

You should notice one read syscall per line of file with your solution,
correct?  A bit more activity than the original but not nearly as much as
the unbuffered case you mention.

Do you understand that if process A and process B share a file descriptor,
and process B calls lseek, that this affects the point from which
process A will continue reading from that file?


	-- Farrell

[Posted by WWW Notes gateway]
8725.11. "QAR #51440" by CECMOW::RAIKO () Mon Feb 10 1997 11:19
8725.12. "" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Mon Feb 10 1997 11:29
> Your fflush solution works because the fclose processing caused by the
> child won't have any additional effect on the seek offset pointer.

	Isn't that what I already said?

> You should notice one read syscall per line of file with your solution,
> correct?

	Should be one read (making the assumption no line is longer than
	the stdio buffer for that FILE) and one lseek per line (except
	of course for the last line of the file).

> Do you understand that if process A and process B share a file descriptor,
> and process B calls lseek, that this affects the point from which
> process A will continue reading from that file?

	I think I know something about UNIX after developing on it
	(and in it) for 15 years :-)

	To refresh your memory, the issue is not that the lseek and read
	syscalls are operating on the same object (ie. single file
	position pointer).  The issue is that stdio, which provides
	an abstract interface, is doing an lseek at all, on fclose,
	of a file opened for "r"ead.  This is not the traditional
	behaviour.