T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
8725.1 | UNIX 101 | NETRIX::"[email protected]" | Farrell Woods | Wed Feb 05 1997 18:27 | 44 |
| Your program is broken.
Here's a paragraph from the fork() man page:
+ The child process has its own copy of the parent process's file
descriptors. Each of the child's file descriptors refers to the same
open file description with the corresponding file descriptor of the
parent process.
Since the child receives a *copy* of the parent's file descriptors, if the
child is careless enough to close those descriptors (say, via a call to exit)
then the parent's copies will likewise be invalid after that point. The
exit function will cause "fclose" to be called for any currently open
stream. Here's a snippet from the fclose man page:
...The stream is disassociated from the file. If the associated
buffer was automatically allocated, it is deallocated. Any further use of
the stream specified by the stream parameter causes undefined behavior.
Now the child and parent don't share the same instance of the FILE structure
(i.e. "the stream".) So in the parent's context there's still a valid
association between the stream and the file descriptor it uses. But if you
read further on:
The fclose() function performs the close() function on the file descriptor
associated with the stream parameter.
This is how the child can pull the rug out from under the parent. So
again if the child process calls fclose either directly or indirectly,
all bets are off if the parent tries to access the stream again.
Our C library (vs. the GNU C library used in Linux) will likely behave
differently. You were fortunate that your program ran under Linux.
Have the child call _exit instead of exit, and this will keep the child
from trashing that portion of its state that it still shares with the parent
after a fork. _exit is provided for exactly this kind of issue.
-- Farrell
[Posted by WWW Notes gateway]
|
8725.2 | | CECMOW::RAIKO | | Thu Feb 06 1997 03:18 | 20 |
| Thanks for the explanation, but we still have some questions:
> Since the child receives a *copy* of the parent's file descriptors, if the
> child is careless enough to close those descriptors (say, via a call to exit)
> then the parent's copies will likewise be invalid after that point.
As we know,
STDIO streams are implemented completely in the user space as a
library. So, any modifications in the child stream data structures should not
affect the parent's. The only system call we expect from fclose(3)
(for read-only stream) is close(2). Close(2) should not affect the parent
process.
Moreover, if we perform explicit close in the child process, the program in .0
works correctly.
Could you clarify how fclose(3) can affect the parent process?
Regards,
Gleb.
|
8725.3 | Please read .1 again | NETRIX::"[email protected]" | Farrell Woods | Thu Feb 06 1997 10:18 | 11 |
| I explained everything in my first reply.
Yes, the child gets its own copy of the stream data structure. No, the child
DOES NOT get a unique file descriptor. This is made very clear in the man
page for the fork system call.
Your program is broken.
-- Farrell
[Posted by WWW Notes gateway]
|
8725.4 | Curiosity, and a guess | IOSG::MARSHALL | | Thu Feb 06 1997 11:41 | 33 |
| So .1 implies that any program run from a command-line shell which exits (ie any
program!) will close the invoking shell's stdin, stdout, stderr files.
Obviously this doesn't happen, as the shell can happily process more than one
command, so what happens to break this connection between parent and child (ie
that the two processes' file descriptors refer to the same system-wide open file
table)? Probably nothing: I don't think closing a file in the child closes it
in the parent, viz:
According to "Advanced Unix Programming" (Rochkind, 1985) - a bit old now but an
invaluable book:
The child gets copies of the parent's open file descriptors. Each is
opened to the same file, and the file pointer has the same value. The
file pointer is shared. If the child changes it with lseek, then the
parent's next read or write will be at the new location. The file
descriptor itself, however, is distinct: if the child closes it, the
parent's copy is undisturbed.
I think this paragraph probably sums up .0's problem. They are perfectly OK
to (implicitly) close the file in the child (by exiting), and the file will
still be open in the parent. But I wouldn't be surprised if the close from
the child did funny things to the (shared) file pointer, such as resetting it
to the beginning of the file for example. This would be one reason why the
program loops forever.
You could test this (and fix the program at the same time) by using lseek to
get the file pointer before doing fork, and using it again (in the parent!) to
reset the file pointer to the previously-remembered value after the child has
exited.
Just a hopeful guess...
Scott
|
8725.5 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Thu Feb 06 1997 11:44 | 25 |
| Re: .3
| I disagree that the program is broken. I would most definitely
agree with you if vfork() was being used, but not when a full
fledged fork() is being done.
I did some testing and found what's causing the problem. I
used a smaller input file (/etc/motd, 2 lines on my system)
and the behaviour was slightly different than .0 sees, in
my case the last line of the file is only repeated once and
the program does exit, but what I found should explain the
indefinite repeating also. Using the public domain syscall
"trace" program I found:
The parent process is doing a read(..., st_blksize). This is
expected. When the child however fclose(f) (either as .0
has by calling exit(), or by explicitly calling fclose(f)
as the first thing after the fork), the 1st child process
does an lseek(..., -50, SEEK_CUR). So when the parent process
does a read() eventually (when stdio re-fills its buffer)
it re-reads the last line of the file (50 is the number of
lines in the file).
I'd have to say it's a bug in stdio. It should *not* be
doing the lseek in this case.
|
8725.6 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Thu Feb 06 1997 11:50 | 12 |
| > ... (50 is the number of lines in the file).
I meant to say 50 is the number of characters in the last line
of the file.
It also looks like I had a notes collision with .4 :-)
yes, one workaround for this stdio bug would be to save
and restore the file pointer. Another workaround is to
do an _exit() instead of exit() in the child as .1 suggests.
.0, you should submit a QAR (or equiv) and provide your
bug reproducer program.
|
8725.7 | | QUARRY::neth | Craig Neth | Thu Feb 06 1997 15:16 | 12 |
| FWIW, if you add the following line after the fopen:
f->_flag |= _IONONSTD;
Then it appears to work 'as expected'. The backing up in the file
is done by fflush, and there is a big comment in the fflush source
that tries to explain why it's doing it. It implies it is required by
POSIX (_IONONSTD turns off the POSIX stuff).
I have no idea what should happen here. I would suggest a QAR be filed
so that the libc maintainers can consult with the various standards and
try and give you their opinion on it.
|
8725.8 | UNIX has worked this way for a long, long time.... | NETRIX::"[email protected]" | Farrell Woods | Fri Feb 07 1997 13:36 | 50 |
| re .4:
> So .1 implies that any program run from a command-line shell which exits (ie any
> program!) will close the invoking shell's stdin, stdout, stderr files.
Thanks for pointing this out. I started rummaging around the sources
in search of answers.
In the case of shells, they explicitly duplicate file descriptors for
stdin, stdout, and stderr, and arrange for these to appear as fd's 0, 1,
and 2 in the child process before exec'ing a program. Thus the child
can do whatever silly thing it wants to those descriptors and not affect
the parent's view of the descriptors.
I shouldn't have been so adamant about close, because the parent can
still read from a descriptor that was closed by the child. BUT: the
child can still mess up the parent's view of the file by doing things
that cause the seek pointer to be moved (e.g. lseek or read/write.)
I looked at the fclose path in the C library. It appears that the fclose
path can end up calling lseek on the underlying descriptor in some cases.
fclose calls fflush if buffered IO is in use (which it is in .0). fflush
will want to invalidate the unconsumed portion of the buffer. It does this
by calling lseek to move the file pointer back to the very beginning of
what wasn't consumed (by fgets.) But all of that went on in the child and
the parent didn't see any of it and the parent doesn't know that the file
pointer got moved. When the parent calls fgets again, the next line is read
out of the parent's buffer (no physical IO takes place.)
The loop occurs because eventually in the parent's context the buffer runs
dry. fgets calls a function to replenish the buffer. This works because
each time the child called fclose, fclose called fflush and moved the pointer
back (eventually moving it back to the beginning of the file.) That made the
underlying read call scoop up the whole file again.
The point is that you have to take care yourself to make sure that a parent
and child do not share the same descriptors, if you don't want surprises
like this.
IMO the best way (still) to fix the program in .0 is for the child
to use the _exit interface.
If you QAR this it's likely to get bounced.
-- Farrell
[Posted by WWW Notes gateway]
|
8725.9 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Fri Feb 07 1997 19:00 | 46 |
| > UNIX has worked this way for a long, long time....
That sounds like a statement not based on any research or fact.
Especially given what Craig reported about the reason for this
behaviour appearing to have something to do with a POSIX
requirement. Ie. UNIX has been around a long long long time
*before* POSIX.
And I do recall doing similar things to .0 in the past and never
running into this (on ULTRIX at least, if not also DEC OSF/1).
> IMO the best way (still) to fix the program in .0 is for the child
> to use the _exit interface.
And how would you "fix" the program if the program did *not*
want to exit (or _exit), but simply wanted to fclose() the
FILE in the child after the fork? [update, answer is below]
> If you QAR this it's likely to get bounced.
Well in the least the documentation (man pages for fopen and
fflush) should be updated to warn of this supposed POSIX behaviour.
There is mention of having to do a fflush before changing from
reading to writing or vice versa when you have a FILE open for
update ........... light bulb just clicked here for a clean "fix"
to this problem ......
.... prior to the fork(), manually do a fflush(f). I just tested
this and it works. In this case the lseek occurs at the time it's
fflush()ed. As long as the stdio implementation is smart enough
to have fflush be a no-op on a FILE that's had no io done since the
last fflush, it will work. Hopefully this is what POSIX says
so that this solution would be portable. If not, then .....
I also tried setting the FILE to be completely unbuffered (see setbuf
and friends) and that also works, but that does result in a lot more
syscall overhead as a read() is done for one byte of the file at
a time.
I still don't like this supposedly POSIX behaviour as it means any
API that uses stdio under the covers will be screwed if it
doesn't do a fflush after doing IO and before returning to the user.
Not to mention the mess for threaded programs (thank goodness the
only valid thing a child of a threaded process is really allowed to
do is to execve another program).
|
8725.10 | why do you want to do this? | NETRIX::"[email protected]" | Farrell Woods | Mon Feb 10 1997 10:44 | 22 |
| You're complaining because: when two processes share an object, one process
has the ability to affect the other process's view of that object?
Your fflush solution works because the fclose processing caused by the
child won't have any additional effect on the seek offset pointer.
In other words, the child doesn't cause the seek offset pointer to
become inconsistent with the state of the parent's FILE structure.
If the parent were to try to read from the file again before waiting
on the child you'd have a race condition though...
You should notice one read syscall per line of file with your solution,
correct? A bit more activity than the original but not nearly as much as
the unbuffered case you mention.
Do you understand that if process A and process B share a file descriptor,
and process B calls lseek, that this affects the point from which
process A will continue reading from that file?
-- Farrell
[Posted by WWW Notes gateway]
|
8725.11 | QAR #51440 | CECMOW::RAIKO | | Mon Feb 10 1997 11:19 | 0 |
8725.12 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Feb 10 1997 11:29 | 25 |
| > Your fflush solution works because the fclose processing caused by the
> child won't have any additional effect on the seek offset pointer.
Isn't that what I already said?
> You should notice one read syscall per line of file with your solution,
> correct?
Should be one read (making the assumption no line is longer than
the stdio buffer for that FILE) and one lseek per line (except
of course for the last line of the file).
> Do you understand that if process A and process B share a file descriptor,
> and process B calls lseek, that this affects the point from which
> process A will continue reading from that file?
I think I know something about UNIX after developing on it
(and in it) for 15 years :-)
To refresh your memory, the issue is not the lseek and read
syscalls are operating on the same object (ie. single file
position pointer). The issue is that stdio, which provides
an abstract interface, is doing an lseek at all, on fclose,
of a file opened for "r"ead. This is not the traditional
behaviour.
|