
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9070.0. "awk and FS wrong parsing." by TRN02::FRASSINO () Fri Mar 07 1997 08:32

    Hi folks,
    this entry is just to find out whether this is a bug or a feature on
    v4.0a and v4.0b (it works in v3.2C, for example):
    
    Using csh:
    
    # set HOST=`hostname`
    # echo $HOST
    hsmsrv
    
    # set ADDR=`/usr/sbin/arp -u $HOST`
    # echo $ADDR
    hsmsrv (16.192.144.232) -- no entry
    
    # set AA=`echo $ADDR | awk '{FS="."; print $4}'`
    
    you would expect to see
    
    AA=232) -- no entry        but ...
    
    # echo $AA
    no
    
    If you repeat the awk with the option -F"." instead of FS="." inside the
    {}
    
    you see 
    
    # set BB=`echo $ADDR | awk -F"." '{ print $4}'`
    # echo $BB
    232) -- no entry
    
    using a second awk -F")" you can correctly extract the info you need.
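A minimal sketch of that two-stage extraction, written for a Bourne-type shell rather than csh and assuming a POSIX-style awk (the variable name `line` is only for illustration):

```shell
# Sample arp output from this note; "line" is an illustrative name.
line='hsmsrv (16.192.144.232) -- no entry'

# Stage 1 splits on "." and keeps the 4th field ("232) -- no entry").
# Stage 2 splits that on ")" and keeps what precedes it.
printf '%s\n' "$line" | awk -F. '{ print $4 }' | awk -F')' '{ print $1 }'
```

With the sample line above this prints 232.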
    
    To make a long story short:
    is it possible that a "historical" utility such as awk has a bug like
    this on V4.0x (and NOT on v3.2C)?
    
    Have I missed something, or is something really wrong in the new awk?
    
    Thanks in advance,
    					*8-) Pierpa
9070.1. by COOKIE::STMARTIN (Andy St.Martin) Fri Mar 07 1997 18:14

    The "-F." option corresponds to an awk program pattern/action of
    	BEGIN { FS="." }
    If the pattern is not "BEGIN", the result depends on the awk
    implementation.  A better awk command would be
    	awk 'BEGIN {FS="."} { print $4}'
    or, using the split function,
    	awk '{split($0, a, "."); print a[4]}'
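The forms can be compared side by side. This is only a sketch, assuming a Bourne-type shell and a POSIX-style awk; the variable name `line` is made up for the example:

```shell
# Sample arp output taken from .0; "line" is an illustrative name.
line='hsmsrv (16.192.144.232) -- no entry'

# FS set in BEGIN: in effect before the first record is read.
printf '%s\n' "$line" | awk 'BEGIN { FS = "." } { print $4 }'

# -F on the command line: same effect as the BEGIN form.
printf '%s\n' "$line" | awk -F. '{ print $4 }'

# split(): does not depend on FS, splits $0 explicitly on a literal ".".
printf '%s\n' "$line" | awk '{ split($0, a, "."); print a[4] }'
```

All three print "232) -- no entry" for this input.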
                                             
9070.2. "Mmhhh..." by TRN02::FRASSINO () Sat Mar 08 1997 13:07

    Ok,
    so I have to tell the customer that THE AWK IMPLEMENTATION HAS CHANGED
    FROM V3.2C TO V4.0x ??
    
    Mmhhhhh ...

9070.3. by SEPPLT::MARK (Mark Garrett) Mon Mar 10 1997 00:48

If you read the man page you will find that it was not documented to work 
by setting FS inside the main body loop. I'd say that this was a bug fix!

	The correct ways are to set FS inside the BEGIN {} or to use -F'.'.
Also, the awk script will fail in a big way when host names are fully
qualified (FQDN), like

	fred.someco.com

	So the script needed working on anyway.
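To illustrate the point about fully qualified names, here is a hedged sketch (Bourne-type shell and POSIX-style awk assumed). The sample line is hypothetical: it combines the host name above with the address from .0. Counting dot-separated fields breaks as soon as the name itself contains dots, whereas isolating the text between the parentheses first does not depend on the host name:

```shell
# Hypothetical arp output for a fully qualified host name.
line='fred.someco.com (16.192.144.232) -- no entry'

# Dot counting now lands inside the address instead of at its end.
printf '%s\n' "$line" | awk -F. '{ print $4 }'

# Split on "(" or ")" to isolate the address, then take its last octet.
printf '%s\n' "$line" | awk -F'[()]' '{ n = split($2, octet, "."); print octet[n] }'
```

For this input the first command prints 192 and the second prints 232.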

		Cheers
			Mark :)

9070.4. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Mon Mar 10 1997 11:21

> If you read the man page you will find that it was not documented to work 
> by setting FS inside the main body loop. I'd say that this was a bug fix!
> 
> 	The correct ways are to set FS inside the BEGIN {} or use -F'.'
> also the awk script will fail in a big way when host names are FQN like

	I would think that setting FS is legal (safe) in the "body loop" as
	long as you don't reference any fields or NF after doing so
	(ie. so that the next "record" read will be broken into fields
	using the new FS).

	All that appears semi-vague to me from the documentation is whether
	setting FS in the "body loop" takes effect immediately (i.e. whether
	it causes awk to break the current record down into fields again).
	Based on the paragraph beginning:

		"Execution of an awk program starts by executing..."

	it would seem to imply that setting FS inside the "body loop"
	will only affect the way awk breaks down the next record read
	from the input file.

9070.5. "My two cents..." by NETRIX::"[email protected]" (Farrell Woods) Mon Mar 10 1997 14:06

The problem appears to be this: awk does all the BEGIN stuff before the
first line is read and separated into tokens (as delimited by FS.)  This
would include the -F command-line option.  If there's no BEGIN or -F
processing then awk reads in and tokenizes the first line *before* it
executes the main program body on that line.  Thus setting the field
separator as in .0 won't have any tangible effect.

(Note, if you do something like echo $ADDR >foo ; echo $ADDR >>foo, then
cat foo | awk '{FS = "." ; print $4}' you'll see what I mean.)

I agree with Jeff that the field separator can be changed mid-flight.  But
keep in mind that this affects how awk will tokenize the NEXT line of the
input file.  The current line has already been broken up.
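The two-line experiment can also be sketched without a temporary file. This assumes a Bourne-type shell and an awk that follows the documented order of operations; the variable name `line` is illustrative:

```shell
# Sample arp output from .0; "line" is an illustrative name.
line='hsmsrv (16.192.144.232) -- no entry'

# Feed the same record twice. The first record has already been split
# on whitespace when FS is assigned; the second is split on ".".
printf '%s\n%s\n' "$line" "$line" | awk '{ FS = "."; print NR ": " $4 }'
```

On such an awk this prints "1: no" followed by "2: 232) -- no entry". An awk that re-splits the current record, as .0 reports for V3.2C, would show the same text on both lines.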


	-- Farrell

[Posted by WWW Notes gateway]

9070.6. "I don't think the doc. is that vague" by NETRIX::"[email protected]" (Farrell Woods) Mon Mar 10 1997 14:13

If you look at the bullets under the paragraph you cited, it appears
to be pretty concrete:

  Then, each operand in an input-file argument (or standard input if an input
  file is not specified) is processed in turn by:

    +  [XPG4-UNIX]  Reading input data until a record separator is seen (a
       newline character by default)

    +  [XPG4-UNIX]  Splitting the current record into fields using the
       current value of FS

    +  [XPG4-UNIX]  Evaluating each pattern in the program in the order of
       occurrence

Thus, the change to FS isn't noted until after the current record is
split into fields.  I don't see what's vague about any of this as far
as .0 is concerned.

This seems to confirm what we've both said in previous replies.
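If the new separator has to apply to the record that has already been read, one common idiom is to assign $0 to itself after changing FS, which makes awk split the record again. This idiom is not mentioned anywhere in this thread, so treat the sketch below as an assumption to verify on your own awk (Bourne-type shell assumed, `line` is an illustrative name):

```shell
# Sample arp output from .0; "line" is an illustrative name.
line='hsmsrv (16.192.144.232) -- no entry'

# Change FS, then force the current record to be split again by
# assigning $0 to itself; $4 is now taken from the "." split.
printf '%s\n' "$line" | awk '{ FS = "."; $0 = $0; print $4 }'
```

Setting FS in BEGIN or with -F remains the simpler choice when the separator is known up front.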


	-- Farrell

[Posted by WWW Notes gateway]

9070.7. "Thanks, but ..." by TRN02::FRASSINO () Tue Mar 11 1997 14:07

    I accept all your philosophical explanations about awk and apologize
    for my .0.
    
    I would only like to point out that we are speaking about DU3.2C and
    DU4.0a/b, so two different Digital OS versions react differently to
    the same awk statement.
    
    That's all.
    
    		Thanks 
    				*8-) Pierpa