
Conference turris::languages

Title:Languages
Notice:Speaking In Tongues
Moderator:TLE::TOKLAS::FELDMAN
Created:Sat Jan 25 1986
Last Modified:Wed May 21 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:394
Total number of notes:2683

116.0. "Code Generators - how?" by RT101::GRIER (This is of course impossible.) Tue Nov 11 1986 18:21

       On a somewhat-related topic to the "benchmarks" provided in a
    recent note, I've got some questions.
    
       I've been arguing lately with my college roommate about operating
    systems and languages (what else?).  I'm taking the position behind
    VMS, and that all languages are created equal - just use the one
    which is best suited for getting the job done with minimal development
    effort.  He, however, stands behind C and will state unequivocally
    that it produces much, much better code.
    
       I've read through this conference and I realize that C doesn't
    really produce better code inherently - with good compilers.  However,
    in trying to convince him of this, he wants me to explain to him
    how they produce the same code, and I'll explain that most of the
    languages share code generators and will produce roughly equivalent
    code.
    
       What I'm interested in here is how they do that.  We've decided
    that there must be some sort of level of pseudo-code which is
    generated, and passed to the optimizer.  I'd like to know just how
    it all works, and in a non-confidential manner so that at least
    to him I can explain why.  He feels that with an intermediate level
    of code, the C compiler isn't allowed to generate all the optimizations
    which it possibly could, and I'm not sure.
    
       He's not a DEC employee - I guess he was working as an intern
    or something to that effect in the BASIC group some years ago, but
    if the explanation is strictly company confidential, I guess I'll
    have to live with the reasons I've been giving him all along.
    
    
    					-Mike Grier
    
    
116.1. "Some holiday gift ideas!" by TLE::FELDMAN (LSE, zealously) Tue Nov 11 1986 21:04 (27 lines)
    I'm going to leave the job of explaining intermediate code and
    language-independent optimizations to the experts, and just give
    you some sources of information:
    
    First, any good book on compilers should explain various techniques
    for intermediate code.  The "dragon" book is a standard text, for
    two values of dragon: Principles of Compiler Design, by Aho and
    Ullman, and Compilers: Principles, Techniques, and Tools, by Aho,
    Sethi, and Ullman.
    
    The book Engineering a Compiler (by a group of DEC people; sorry, my
    copy's at home, I can't find a citation here, and all of the authors
    left this group before I joined) talks specifically about an early
    version of DEC's VAX PL/I compiler; this is the origin of the VAX code
    generator that is shared by a number of our compilers.  I believe the
    book explicitly states that a C front end was either done or in the
    works.  Perhaps someone else can clarify whether or not it is public
    knowledge that our C and PL/I compilers share a back end.  
    
    I believe that the most popular C compiler is the Unix compiler.
    This compiler uses a documented intermediate code, since it is intended
    to be retargetable.  Thus the very criticism your friend applied
    to a notion of an intermediate code applies to this C compiler.  My
    recollection is that it isn't particularly outstanding with respect
    to optimizations.  

       Gary
116.2"*p++" - is Nothing sacred?NOBUGS::AMARTINAlan H. MartinWed Nov 12 1986 12:1839
Re .0:

>He feels that with an intermediate level
>    of code, the C compiler isn't allowed to generate all the optimizations
>    which it possibly could, and I'm not sure.

Ask your friend which concept is harder for the compiler to optimize into an
INCR instruction; the following Fortran fragment:

	INTEGER I
	. . .
	I = I + 1

or this C fragment:

	int i;
	. . .
	i++;

Fortran compilers accomplish the former with much work.  However, C
can generate the same object code with no optimizer at all merely by
forcing users to waste time writing "optimal" code in the first place.
If you want to write a C compiler which generates good code for "i = i + 1;",
then you have to go through just as much hair as for Fortran.
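
To make the point concrete, here is a minimal sketch (the instruction
in the comments is hypothetical VAX output, purely for illustration):

	void bump()
	{
	    static int i;

	    /* A good optimizing compiler reduces both statements to the
	       same single instruction (hypothetically, INCL I on a VAX);
	       only the source forms differ. */
	    i = i + 1;	/* the optimizer must recognize the idiom */
	    i++;	/* the "optimization" was spelled out by hand */
	}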

If your friend has been marveling at the wonderful optimizers in the average
C compiler, he should try feeding some Fortran code into it and see
how well it does.

Re .1:

The authors of "Engineering a Compiler" are Patricia Anklam, David Cutler,
Roger Heinen Jr. and M. Donald MacLaren.  The book is published by Digital
Press.  The book contains a section which I think of as saying "Towards
the end of the project we decided that implementing PL/I was getting
boring, so we decided to knock off a C compiler one weekend to liven things
up".  (Actually, I think they wrote that they wanted to see how flexible
the VCG was).
				/AHM
116.3. by TLE::NOLAN Wed Nov 12 1986 12:55 (38 lines)
    
    	UNIX and UNIX look-alike systems, such as DEC's ULTRIX, come
    with a C compiler bundled on the kit.  This PCC, meaning Portable
    C Compiler, generates UNIX assembler which is sort of like PDP-11
    assembler code.  The VAX versions of UNIX generate the VAX-like
    assembler.  This code is then run through a macro assembler into
    object code and linked.  This means that you only have to rewrite
    the assembler for each new hardware platform that is to support UNIX.
    
    	However, taking this approach, and also due to the fact that
    the original PCC compiler does not contain an optimizer, the code
    generated is not very good.  The VAX C compiler, which uses the same
    code generator as VAX Ada and VAX PL/I and VAX SCAN, generates much
    better code than PCC.  Tartan Labs have written a C compiler that
    does contain an optimizer and generates much better code than PCC,
    but not quite as good as VAX C, for the VAX hardware.
    
    	As Alan points out in .-1, it is much harder to generate good code
    when the optimal code is not obvious.  C has many operators which
    were intended to map onto the corresponding PDP-11 macro instruction.
    C was originally designed as a systems language for the PDP-11.
    The pre and post increment and decrement instructions can be compiled
    directly to the INCx and DECx instructions on a VAX or to
    auto-increment and auto-decrement mode addressing if the variable
    is a pointer.  VAX C and Tartan Labs C both do this.  PCC does not
    always do it.
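
    	As an illustration (a sketch only - the instruction in the
    comment is hypothetical output, not any particular compiler's), a
    pointer post-increment can map onto a single auto-increment
    instruction:

    	char buf1[100], buf2[100];
    	char *src = buf1, *dst = buf2;

    	void copy_byte()
    	{
    	    /* With src and dst already in registers, this can map onto
    	       one hypothetical VAX instruction: MOVB (R0)+,(R1)+ */
    	    *dst++ = *src++;
    	}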
    
    	In answer to your original question:  if the languages that
    are to use the common code generator are known ahead of time, the
    intermediate language can be designed to meet the needs of all of
    the languages.  If this is true then, in C's case with its many
    operators, a good compiler can generate good code, whether that
    compiler is for C or Ada or PL/I.  It is not the language that
    defines how good the generated code is going to be, but how good
    the compiler writers are.
    
    chris.
    
116.4. "Optimizer => IL needed" by TLE::BRETT Wed Nov 12 1986 14:03 (27 lines)
    There is one other point that should be made.  Optimization practically
    DEMANDS an intermediate language, one that lays bare the exact
    details of the operations so that the nitty gritty can be optimized
    properly.
    
    The classic example is
    
    		A,B : array(1..10) of 5-byte-values
    
    		A(I) := B(I)
    
    Now the addressing trees for A(I) and B(I) both look like
    
                   +                            +
                  / \                          / \
           address   *                  address   *
              |     / \                    |     / \
              A  fetch  5                  B  fetch  5
                   |                            |
                address                      address
                   |                            |
                   I                            I
    
    and thus, when made explicit like this, the optimizer can find the
    common multiply by 5, that was not apparent in the original source.
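
    A hand-written C equivalent of what the optimizer effectively does
    (illustrative only; struct rec is a stand-in for the 5-byte values
    above):

    	/* A 5-byte element: subscripting implies a multiply by 5
    	   that never appears in the source. */
    	struct rec { char bytes[5]; };
    	struct rec a[10], b[10];

    	void copy_elem(i)
    	int i;
    	{
    	    a[i] = b[i];
    	    /* With the addressing arithmetic exposed in the IL,
    	       this is roughly:
    	           t = i * 5;    (the common subexpression, computed once)
    	           *(addr(a) + t) = *(addr(b) + t);
    	    */
    	}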
    
    
    In some ways, the BLISS language itself has many of the characteristics
    of a good intermediate language!
    
    /Bevin
116.5. "for correctness too" by BACH::VANROGGEN Wed Nov 12 1986 17:13 (5 lines)
    Well-defined intermediate languages are also essential for
    proving the correctness of the compiler, in addition to the
    portability and simplicity mentioned earlier.
    
    			---Walter
116.6. "thanks! What's the intermediate code look like?" by RT101::GRIER (This is of course impossible.) Wed Nov 12 1986 18:17 (33 lines)
    Re: .4:
    
       That's how I deduced that there HAD to be an intermediate level
    of code, along with the discussions of the several back-ends of
    the compilers here at Digital.
    
       Hmmm...
    
       I guess my next question is "what is the structure of the
    intermediate language used?"  I personally haven't taken a course
    on compiler-writing, and while I started to read a book by Gries
    on the subject, I was overrun with other things and never finished,
    so I don't have much real knowledge on the subject other than my
    deductions and observations.
    
       Half of this is my own curiosity; the other 1/2 is so I can shut
    him up when he rants about how good C is (over language X - you name
    it) and how good Unix is (over VMS - I've got the personal knowledge
    to stand my own there, definitely.)  My feelings are like most of
    those expressed in this conference - use whatever language best
    suits you and the project.  That's why special purpose languages
    like TPU, VAX SCAN, Lisp, and such come into play (OK - TPU and
    Lisp are *really* special cases, I realize...)  The idea that some
    Pascal or (even worse!) BASIC code can be just as efficient as his
    cryptic C code is very foreign to him.
    
       Thanks for the information so far.  I feel a lot more comfortable
    in what I say about it now.
    
    					-mjg
    
    
    
116.7. "A case where FORTRAN .GT. C" by TLE::RMEYERS (Randy Meyers) Wed Nov 12 1986 19:44 (70 lines)
Re .3:

>It is not the language that defines how good the generated code is going
>to be, but how good are the compiler writers.


I somewhat disagree with you on this one Chris.  There are some languages
that make it so difficult for the poor compiler writer that he or she has
little choice but to give up or spend years of effort to be able
to optimize language FOO as well as language BAR.

An example helps to explain.  Consider the following FORTRAN program:

	SUBROUTINE SUB
	INTEGER I, J, K
	.
	.
	.
	K = I / J
	CALL SUB2
	K = I / J
	.
	.
	.
	END


and the following C program:

	void SUB()
	{ static int I, J, K;
		.
		.
		.
		K = I / J;
		SUB2();
		K = I / J;
		.
		.
		.
	}


Assume that there are paths through both versions of routine SUB that
modify the values of I and J and return.  Further assume that the address
of I, J, and K are not exported from the C routine.  SUB2 is an external
routine in another file.

The two versions of routine SUB are very similar.  In both, the variables
I, J, and K have the same semantics: statically allocated, and they keep
their values between invocations of SUB.  Furthermore, I, J, and K are not
accessible outside of routine SUB.

A FORTRAN compiler will easily realize that the statement after the call
to SUB2 is redundant and can be optimized away.  A C compiler, in general,
cannot do the same.

The reason why is that FORTRAN subroutines are not recursive.  All C functions
are potentially recursive.  The FORTRAN compiler can take it as a given
that no one can get his hands on I and J.  The C compiler must assume
that SUB2 will call SUB recursively and that SUB itself will change the
values of I and J during the call to SUB2.

It is true that a compiler can be written to discover that SUB2 doesn't
call SUB and that it is safe to optimize the second assignment.  But,
this involves much more work than is needed to perform almost any other
optimization in C.

No DEC compilers perform (yet!) this "Universal Optimization," as it is
called.
116.8. "Don't be too hard on him" by TLE::RMEYERS (Randy Meyers) Wed Nov 12 1986 20:27 (37 lines)
Re .6:

Before you lord it too much over your roomie, let me tell you his concerns
are quite valid.  There are some common back ends for compilers that do
provide only least-common-denominator optimizations.  His concern is just
not valid for the VMS compilers because of the great amount of effort put
into them.

Bevin gave the best example of a good intermediate representation: all of
the most elegant IL's look like a parse tree for BLISS (yes, even with the
dots).  The tree is decorated with low level type information and some
additional bookkeeping.
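
As a toy illustration only (this is not the VCG's, or any DEC
compiler's, actual representation), one node of such a decorated tree
might look like:

	/* One node of a hypothetical IL tree.  "Decorated" means each
	   node carries low-level type information along with the
	   operator. */
	struct il_node {
	    int             op;           /* FETCH, STORE, ADD, MUL, ... */
	    int             byte_size;    /* low-level type: size of value */
	    struct il_node *left, *right; /* operand subtrees, 0 if unused */
	};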

Unfortunately, the IL used by the VCG (the VAX Common Code Generator)
isn't quite as elegant.

By the way, any optimizing compiler worth the name creates an
intermediate form of the program.  (This is even true for compilers
whose code generator is made only to generate code for that one language
and compiler.)  The reason is that an optimizer needs to be able to view
an entire function as a unit in order to perform more than minimal
optimizations.  Compilers that do not produce an intermediate form
of the program typically compile, at most, one or two statements at
a time and produce dreadful code.

You may be interested in knowing that PCC produces very bad code.  This
is considered a weakness in DEC's Ultrix offering.  Some of DEC's
competitors (like SUN, in the workstation market) threw away the C
compiler that came with Unix and provide their own optimizing C
compiler.

DEC, Digital, and Ultrix are trademarks of Digital Equipment Corp.
Sun is a trademark of Sun Microsystems.

Unix is in the public domain.

(Ha, Ha, take that ATT.)
116.9. "so, it works like this?" by RT101::GRIER (This is of course impossible.) Wed Nov 12 1986 21:24 (24 lines)
       Re: .8:
    
       That's all I really wanted to know.  So - basically the front-end
    of the compiler performs syntactical checking (lexical analysis is
    the term, right?) and generates a data structure which describes
    the flow of action in the program, in some sort of an m-way tree
    (pure conjecture there - siblings on a level are sequential
    operations?) and each sibling also points off to a structure describing
    its operands and access methods thereof.   And then the back-end
    just has to look at the standardized data, make correlations on
    operations and operands, and combine operations and optimize flow.
    Wow.  That makes a lot of sense, but am I right?
    
       I'm not trying to show him to be completely wrong, just to humble
    his opinions of C/Unix. (And score some points for DEC among college
    CS grads - which is a rare feat these days.)
    
    (FYI - for reference, my fav. language is MACRO - I like being in
    complete control, for better and (sometimes) for worse.)
    
    
    					-mjg
    
    
116.10. "Basically Right" by TLE::RMEYERS (Randy Meyers) Thu Nov 13 1986 02:37 (35 lines)
Re .9:

Yes, you have the hang of it now.  I guess you are ready to go and write
your first compiler ;-).

The only error in what you said was in calling all the front end work
lexical analysis.  Lexical analysis is only part of the FE's work.
Lexical analysis is simply reading the source file character by character
building up the smallest meaningful elements of the program.  These
smallest meaningful elements of the program are called tokens; examples
of tokens are variable names, keywords, and operators.

In addition to lexical analysis, a FE also does syntactic and semantic
analysis.  Syntactic analysis is taking a stream of tokens and making sure
that the tokens form valid phrases of the language (this is also called
parsing).  Semantic analysis verifies that, given whatever additional
information applies, the phrases make sense according to the rules
of the language (for example, that the operands of an operator have
the correct types).  Semantic analysis also builds the internal
representation of the program to be passed to the back end of the
compiler.
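
For example (an illustrative sketch, not any real compiler's data
structures), here is roughly what the phases produce for the statement
"k = i / j;":

	enum kind { IDENT, ASSIGN, DIVIDE, SEMI };
	struct token { enum kind k; char *text; };

	/* Lexical analysis of "k = i / j;" yields the token stream: */
	struct token stream[] = {
	    { IDENT, "k" }, { ASSIGN, "=" }, { IDENT, "i" },
	    { DIVIDE, "/" }, { IDENT, "j" }, { SEMI, ";" }
	};
	/* Syntactic analysis then groups the tokens into the phrase
	   assign(k, divide(i, j)); semantic analysis checks that i and
	   j have arithmetic types and decorates the tree for the back
	   end. */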

Don't get too hung up, though, on the division between syntactic and
semantic analysis.  What gets checked where sometimes gets shifted around
for the convenience of the compiler writer.  For example, in Algol-60,
some of the rules on legal combinations of expressions involving boolean
and arithmetic operators were spelled out in the grammar of the language.
However, the grammar for Algol expressions is not LL(1), that is, it
cannot be parsed using a parsing technique in vogue at the time
most people were writing Algol-60 compilers.  So, the compiler
writers changed the grammar of Algol to allow illegal expressions
through syntactic analysis.  They then put the checks for the illegal
cases in semantic analysis.  (It turns out that the checks had to go into
semantic analysis anyway, so there was no reason not to take this
approach.)
116.11. by DREGS::BLICKSTEIN (Dave) Thu Nov 13 1986 10:52 (43 lines)
    Your friend definitely needs humbling.
    
    Each language tends to have its own set of semantic definitional
    problems that impede optimization.  I can't claim to know C very
    well, but I do know that C's thorny side is aliasing.  C has almost
    uninhibited aliasing.  A C pointer dereference is almost
    like an optimization block.  It has the effect of saying "take
    everything you think you know about the current state of variables
    in the program and forget it, because this pointer dereference can (but
    in most cases actually does not) invalidate that information".  So
    you lose some very important optimizations (CSEs, code motions,
    value propagation, range propagation, dead store elimination,
    split lifetimes, etc.)
    in the general case for an event that occurs only in the rare case.
    (Optimizers must always assume the worst case or they will
    generate incorrect code; optimization that fails to do so is known
    as "unsafe" optimization.)
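
    A small sketch of the problem (illustrative only):

    	int x;

    	int f(p)
    	int *p;
    	{
    	    int a, b;

    	    a = x + 1;
    	    *p = 0;	/* may or may not alias x - the optimizer
    			   must assume the worst */
    	    b = x + 1;	/* so x is re-fetched, and the common
    			   subexpression x + 1 is lost */
    	    return a + b;
    	}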

    Considering that C is touted as a system language, this is
    one strike against it.
    
    Aliasing is handled much better in Pascal, Ada and BLISS.  Why do
    I mention that?  You should notice that it tends to support the
    recommendation of using the most appropriate language.  If your
    program passes a lot of pointers around, you now have some
    reasons to consider not using C, since there are languages that deal
    with them much better.

    Also, it's my understanding that VAX C is considered to be among
    the best optimizing C compilers.  (That's certainly the case for
    the VAX, because the C group has received lots of requests to port
    VAX C to Unix.)  If VAX C is one of the better C optimizers and
    yet still loses benchmarks to other VAX languages, that should at
    least cast some doubt on your friend's statements about the optimization
    potential of the C language - which, by the way, I think are mostly
    wrong; i.e., C doesn't appear to me as a language that was built for
    speed.  C's concept of speed is allowing you to code in high-level
    assembler to get 'manual optimization'.  This is different from having
    a language whose semantics are designed to allow a high level of
    'automatic optimization'.
        
    In my opinion, your friend is being a bit sophomoric.
    
	db
116.12. "Brief digression on Sun's C compiler" by TLE::FELDMAN (PDS, our next success) Thu Nov 13 1986 11:14 (14 lines)
    As an aside about Sun:
    
    My recollection is that they have only released their optimizing
    F77 compiler, and that their optimizing C compiler is still under
    development.
    
    On the other hand, I assume they took responsibility for redoing
    the actual code generator part of the Unix C compiler -- the part
    that has to spit out the 68000 instructions.  I am certainly willing
    to believe that they did a better job at this than the ATT (or
    Berkeley) people did on the VAX equivalent for the Unix C compiler;
    I really don't know for sure.
    
       Gary
116.13. "on unix, use 'as'; on vms, use the vcg" by COOKIE::DOUCETTE (Chuck Doucette, Database A/D @CXO) Thu Mar 10 1988 00:18 (31 lines)
I want to make a few observations/comments that are based on current knowledge
(or lack thereof). Please correct me if I am wrong.

On unix (Ultrix) the standard way of generating code for compilers (at least
this is done by the C compiler cc) is to a) write an assembler file, b) call
c2 for peephole-optimization (of the assembler file), and c) call the unix
assembler 'as' to assemble it and generate object code.

The unix assembler 'as' is supposedly written for compiler writers rather than
humans. For example, it allows the compiler to include high-level debugging
information for dbx. And, it doesn't do macros like MACRO-32 on VMS does.

On VMS, as far as I know, after a compiler (most of them, I believe) gets a
good parse tree, it passes it in some common form to the VCG, which does
optimization and object code generation.  I'm not sure if the VCG helps
compilers output debugging information or does other things (like listings).

I find the compiler our group uses (DECWRL Modula-2) very interesting because
a) it optimizes the parse tree directly (rather than some intermediate form),
b) it generates an intermediate language (p-code, a simple stack machine) file
that is supposed to already be optimized and portable (to machines other than
vax, a titan for instance), and c) it translates this into vax assembler and
then d) calls the unix assembler 'as' to generate object code. For more
details about this compiler, see Dr. Mike Powell's paper, "A Portable
Optimizing Compiler for Modula-2".

One of the interesting arguments I have seen in the usenet news-group on
compilers is whether compilers should generate object code by themselves or
leave this to an assembler or something else (like the VCG).

Chuck
116.14. "Not everyone uses the VCG or MACRO!" by MINAR::BISHOP Thu Mar 10 1988 13:40 (19 lines)
Re .13:
    
    I was on the Pascal project for a few years, and am now on
    the BLISS project.  I can tell you that VAX Pascal, VAX
    FORTRAN and BLISS (-16, -32 and -36) all use different
    language-specific code generators, not the VCG.  They do
    not call the assembler.  I believe that Ada, C, and PL/1
    use the VCG.
    
    BLISS and Pascal optimize the parse tree, and are portable to
    other architectures (or at least designed to be so).  The
    great advantage to not using an assembler is that the compiler
    knows more about the structure of the program--thus VAX Pascal
    leaves pseudo-instructions in the stream to mark the beginning
    and ending of loops and subroutines which have been expanded
    in-line.  This makes certain peephole optimizations possible which an
    assembler could not do safely.
    
    				-John Bishop
116.15. by TLE::HOBBS Thu Mar 10 1988 14:00 (6 lines)
Also, VAX BASIC, VAX COBOL, VAX DIBOL, and VAX RPG use code generators separate
from the VCG.  VAX SCAN is another language product that does use the VCG. 

The structure of each of these compilers is often very different from the
others.  It is very difficult to make any statement that applies to all
the different VAX language products.
116.16. by AITG::VANROGGEN Thu Mar 10 1988 20:56 (9 lines)
    And VAX LISP and VAX PROLOG also have different compiler organizations
    and code generators.
    
    VAX LISP's compiler is incremental and dynamic, in that at run-time
    one can very easily create new code, compile it (if desired for
    efficiency, otherwise can run it interpreted), and then execute
    it.   Another reason for it being different is that it has its own
    notions of linkage and memory management for efficiency reasons.
    (I.e., CALLS and LIB$* are too slow.)
116.17. "Ada" by TLE::MEIER (Bill Meier - VAX Ada) Sat Mar 12 1988 15:43 (5 lines)
    re: past note
    
    Although VAX Ada does use the VCG for output of the final phase of code
    generation, a number of optimizations are done to the semantic tree
    first. 
116.18. by PSW::WINALSKI (Paul S. Winalski) Sat Mar 12 1988 22:24 (21 lines)
Here are a couple of reasons not to emit assembler source code from a compiler
and then use the assembler to produce the final object code:

1) The translation from assembler source to instructions is trivial,
   particularly when compared to a lot of the other stuff a compiler does.
   Instead of having the compiler go to all the bother of producing the
   instruction stream in ASCII form only to have the assembler put it back
   into binary, why not have the compiler directly build the binary in the
   first place?  It ups the compilation speed to do it directly.

2) When one materializes the program as assembler source, compiler-generated
   temporary names, such as branch labels, assume a real existence and one
   has the potential of clashes with user names (such as external variables).
   Unix compilers hack their ways around this problem by prefixing or
   appending varying numbers of underscores to user variable names.  This
   has all manner of curious side-effects, particularly in other tools such
   as linkers and debuggers.  When a compiler directly generates the object
   code, there is no problem with temporary names clashing with user names,
   since the compiler knows which ones are the temps.
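
A sketch of the clash described in (2), with a hypothetical compiler
that numbers its internal branch labels L1, L2, ...:

	extern int L1;		/* a perfectly legal user external... */

	int f(x)
	int x;
	{
	    if (x > 0)		/* ...but the branch emitted here needs a
				   compiler-generated label, say L1, which
				   would collide with the user's L1 in
				   textual assembler output.  Emitting the
				   user name as _L1 avoids the clash -
				   hence the traditional leading
				   underscore. */
	        x = L1;
	    return x;
	}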

--PSW
116.19. by MOIRA::FAIMAN (Ontology Recapitulates Philology) Mon Mar 14 1988 10:32 (6 lines)
    Also, when the compiler emits assembler source, there is the
    possibility of the user modifying that source before assembling
    it.  This is such a uniformly bad idea that anything that makes
    it easier ought to be avoided.
    
    	-Neil
116.20. "Yes, it's obviously bad." by SMURF::REEVES (Jon Reeves, ULTRIX compiler group) Tue Mar 15 1988 09:17 (6 lines)
    That's right, with assembler source the user might actually be able 
    to work around a bug without an expensive support contract, or might 
    be able to apply application-specific optimizations without an
    expensive source code license.
    
    Sorry, I'm feeling cynical today.
116.21. by TOKLAS::FELDMAN (PDS, our next success) Tue Mar 15 1988 09:51 (25 lines)
    On the other hand, a common reason for writing assembler is that it is
    easier for the compiler writers than having to write object code.
    This is especially true when you have no control over the object
    language.  Thus an independent software house could easily justify
    writing assembler source (especially on a system such as Unix, with its
    pipes).  And, as stated earlier, if there is a peephole optimizer built
    into the assembler, you get to share in the manufacturer's expertise
    with respect to instruction timings.  Obviously, these arguments don't
    apply to the systems vendor, such as DEC, since we do control the
    object language and are responsible for any such optimizers.
    
    Of course, the above arguments apply even more strongly to code
    generators.  If we ever were to ship a callable, documented version
    of the VCG, it would be the obvious way to go for third parties
    writing new compilers for DEC hardware.  Not that I'm suggesting
    we actually do that.  (Though, if we could get past the legalities,
    we might want to license it for languages that we know we'll never
    support ourselves, such as Jovial.)
    
       Gary
    
    PS Having had to munge code from a buggy Unix Modula compiler myself,
    I quite agree with the remarks in .20.  It's unfortunate that I
    was even in such a situation, but I was glad the workaround of editing
    .a files existed.
116.22. "Well, not quite" by DSSDEV::JACK (Marty Jack) Tue Mar 15 1988 11:17 (12 lines)
        writing assembler source (especially on a system such as Unix, with its
    pipes).
    
    	pcc doesn't use pipes to connect the phases.  It uses temporary
	files.
    
             And, as stated earlier, if there is a peephole optimizer built
    into the assembler, you get to share in the manufacturers expertise
    
    	The peephole optimizer is not built into the assembler.  It
	runs as a separate assembler-source-to-source pass between
	code generation and assembly.
116.23. "It pays to generate .OBJs" by RACHEL::BARABASH (Digital has you now!) Wed Mar 16 1988 10:12 (15 lines)
  I guess I should add my experience here.  The VAX OPS5 compiler used to
  generate MACRO back when it was "internal only".  When I taught it to
  generate VAX .OBJs, the compiler effectively got FIVE TIMES FASTER.  The
  time required to build XCON (the VAX eXpert CONfigurer) went down from
  something like 8 CPU hours on an 11/780 to 2 hours (almost half of that
  in the VMS Linker!).  Conclusion:  it does indeed pay for compilers to
  generate the object code themselves.

  By the way, I found the computer-generated assembly language not
  significantly easier to read/maintain than the .OBJs.

  -- Bill B.

  [So *that's* why academicians like McKeeman are dissatisfied with the
  speed of their U*ix compilers!]
116.24. by SMURF::REEVES (Jon Reeves, ULTRIX compiler group) Wed Mar 16 1988 10:41 (12 lines)
    A couple of misconceptions here:     
    
    . The UNIX as assembler was designed to be incredibly fast, mostly by
      sacrificing many of the features found in MACRO.  Therefore, experiences
      with MACRO (and VMS process creation) may not translate well to UNIX.
    
    . In ULTRIX, at least, knowledge of short jumps vs. long jumps is
      embedded solely within the assembler; therefore, in a sense, the
      assembler is doing some optimization after the compiler and the
      peephole optimizer have finished.  Obviously, in a system with 
      fixed-length instructions, this is not an issue.
                 
116.25. by TOKLAS::FELDMAN (PDS, our next success) Wed Mar 16 1988 11:40 (16 lines)
    Re: .22
    
    Thanks for the correction.  I didn't know that the peephole optimizer
    was a separate phase, but it makes sense, since it's consistent with
    the old Unix philosophy of writing filters.  I do have a strong
    recollection of seeing a script for the cc command, but either I'm
    misremembering the pipes, or the script I saw was merely expository
    and not the actual implementation.
    
    As for the general issue at hand, I'm wondering: Is the question really
    whether or not compilers should produce code by going through an
    assembler?  or is the question whether compilers should be able to
    produce a listing file that can be assembled, in addition to generating
    the object modules directly?  
    
       Gary 
116.26. "Layered products on Ultrix: name 3" by DENTON::AMARTIN (Alan H. Martin) Thu Mar 17 1988 19:47 (4 lines)
Re .24:

I rather doubt that VAX C depends upon any assemblers to resolve branch lengths.
				/AHM
116.27. "DECnet, LISP, FORTRAN? :-)" by SMURF::REEVES (Jon Reeves, ULTRIX compiler group) Fri Mar 18 1988 19:36 (8 lines)
    re .26:
    
    Sorry, I should have made it clear that I was talking about pcc
    (and gcc and tcc) on ULTRIX; obviously, vcc has no dependence on
    the assembler.  The point I was really trying to make is that this
    (long jumps vs. short jumps) is one thing that the assembler can 
    solve very nicely, rather than each compiler [back end] writer 
    having to deal with it too.
116.28. by TLE::BRETT Fri Mar 18 1988 23:17 (25 lines)
    I think you are lumping too much into the title "assembler".
    
    To me an assembler does the following...
    
    		parses source files containing "assembly language".
    
    		decides what instructions to emit, possibly doing some
    		minor tweaks like short/long jumps, short/long offsets.
    
    		generates the psect contributions
    
    		emits them into the object file in some special format
    
    Now, it seems to me that the last three steps, omitting the parsing
    of source files, could be made into a useful common backend for
    all compilers/assemblers/etc.
    
    I don't see any advantage in having compilers generate text just
    so the assembler can parse it back to instructions again.  That
    is just a waste of cpu cycles and disk space.  Of course, if you're
    in the business of building slow tools to sell fast machines...
          
    
    /Bevin
                                                                   
116.29. by TLE::JONAN (Into the Heart of the Sunrise) Sat Mar 19 1988 15:20 (26 lines)
>    I think you are lumping to much into the title "assembler".
    
    Well, there are a number of translators that are usually called
    "assemblers" that do all sorts of things other than "just" assembling
    instructions from mnemonics.  I guess the one really common thread
    is that they perform 1-1 mappings from the source to "machine
    instructions".

>    Now, it seems to me that the last three steps, omitting the parsing
    
    Sure, but then I suppose it's reasonable to have an "assembler"
    (including the parse step) act as this common "backend" too.
    
>    I don't see any advantage in having compilers generate text just...
    
    Certainly not much of one (previous replies offer some possible
    reasons...), and the disadvantages clearly seem to outweigh any
    such possible benefits.
    
>    Of course, if you're
>    in the business of building slow tools to sell fast machines...
    
    Cheap shot! ;^)
    
    /Jon
    
116.30. "OK, let's look at efficiency..." by SMURF::REEVES (Jon Reeves, ULTRIX compiler group) Mon Mar 21 1988 14:58 (27 lines)
    Let me offer another reason for generating assembler source.  I
    have the advantage(?) of running both styles of compiler on my
    workstation, so it's easy to compare.  Let's look at one ease-of-use
    feature: suppose I want to find out how the compiler will treat
    an operation; in other words, I want to see the machine code that
    was generated.  Now, with any compiler, I can compile and link,
    then use a debugger to inspect the result, but besides being
    inefficient, this probably loses the original variable names (and,
    if jumps are involved, it gets really messy); in any case, it involves
    using a complex tool unrelated to the task.
    
    With a compiler that generates assembler, I specify the -S switch
    and examine the .s file.  Creates one extra file (and some hackery
    can avoid that), but reasonably quick.
    
    With a compiler that produces only an object file, I need to also
    generate a listing file, then I need to run a script to extract
    the 9 meaningful lines out of the 77 lines (OK, that's a little
    unfair -- 45 of them contained no information at all).  I now have
    two files to dispose of (listing and object) instead of one.
    
    Oh, and as for efficiency: the assembler generator took less than
    half the time of the other one (they are close to equal -- slight
    edge to the assembler-generator -- for generating object files, in
    this case).  That's probably more a measure of fit-to-operating-system
    than anything, though.
                          
116.31. "But why do you care?" by TLE::MEIER (Bill Meier - VAX Ada) Mon Mar 21 1988 15:58 (12 lines)
    re: .-1
    
    I don't think many people (except maybe for C users :-) use an "HLL" and
    inspect the machine code generated - either for bugs, or for possible
    "tweaks" in the source code that would improve the generated code.  For
    what language(s) do you think people look at the generated assembler
    output?  And what for?  And how often?  And, if the answers to some of
    those questions are "for performance reasons" and/or "frequently", I
    would say you have picked an HLL when you should have picked assembler
    to start with.
    
    I consider .-1 a weak argument at best.
116.32. "Exactly -- leave it to the compiler writer..." by MJG::GRIER (In search of a real name...) Mon Mar 21 1988 16:49 (33 lines)
    Re: .31:
    
       This (inspecting generated code) is a frequent practice of C
    programmers evidently.  Before working at Digital I was considering a
    position at Masscomp, a graphics workstation outfit in Mass., working
    for a famous ex-DECcie (Jack Burness of DEClander fame.)
     
       What Jack wanted me to do was, after they ran the C compiler on
    their source, to inspect the output and tweak it as much as possible,
    for size and efficiency, for the 68000-based systems they build,
    before assembling it.  They didn't trust the compiler to
    produce good code.
    
       Anyways, I guess it's evident that which option you want depends on
    your attitude towards the language.  If you want the language to be an
    abstraction tool to remove yourself from the implementation issues, you
    probably didn't use C anyways, and thus probably didn't use un*x, and
    are trusting enough to let the compiler designers do the optimization
    for you.
    
       However, it is typical that hackers who are used to using 
    inefficient compilers and trying to write nasty C statements want to
    look at the compiled code and mung it up as much as possible also. You
    begin to get (in my opinion) too worried about the implementation, and
    lose touch with the abstraction which the language is providing.
    
       I agree with .31.  If you care that much about knowing the details
    of the run-time code, you shouldn't be writing in a HLL anyways. (but
    then there's the argument by many that C isn't a HLL...)
    
    
    					-mjg
    
116.33. "Trust the compiler, it knows everything" by SMURF::REEVES (Jon Reeves, ULTRIX compiler group) Mon Mar 21 1988 18:11 (24 lines)
    Fine.  So, we are condemned to writing in assembler code (which
    is obviously not portable) or blindly trusting the compiler.
    
    In point of fact, in my latest example, I wasn't looking at the
    generated output for any reason related to efficiency; instead,
    I was assuming the compilers were correct and trying to figure out
    (by looking at what they were generating) *exactly* what the results
    of casting signed and unsigned bit fields were.  But then, in the
    abstract world, I suppose I should have used printf (and spent time
    agonizing over how to keep the call to printf from affecting the 
    results).  Or -- I should look in the manual, right?
    Well, I looked in four different references, and none directly
    addressed the question I had; I'm sure it was in at least some of
    them, but buried behind enough hidden cross references that it was
    much faster to experiment.  (And that's not an indictment of the
    documents: if you documented all the odd corners of any language
    in enough detail to answer all possible questions, you'd have an
    unusably large manual.)
    
    Gee, if I carry this argument far enough, I could argue against
    producing even the listing of assembly code -- after all, the people
    that care about the generated code should be able to plug the variables
    back into a disassembled form, right?
                            
116.34. "Machine code has a practical usefulness" by URSA::DOTEN (Glenn Doten) Mon Mar 21 1988 18:32 (15 lines)
    Like Jon is saying, I have been forced to use the /MACHINE qualifier a
    number of times on modules written in Pascal, C, and BASIC. Sometimes I
    could swear that the compiler just wasn't working properly and my last
    resort was to look at the generated code and see what was REALLY going
    on (and a few of those times the compiler was at fault). Sometimes I
    just didn't understand enough about the language and taking a peek at
    the machine code gave me the understanding I was missing. This is a
    very useful thing to be able to do; compilers certainly aren't perfect
    and never will be. It's a real pain when things aren't working the way
    you expect and you are stuck with a compiler that doesn't generate
    assembly code nor produces the machine code as part of a listing file.
    Trying to document every little nuance of a compiler simply isn't
    practical. 
    
    -Glenn-
116.35. "Only for the compiler developer" by MJG::GRIER (In search of a real name...) Mon Mar 21 1988 19:58 (46 lines)
    Re: using machine code for debugging purposes:
    
    I don't stand by that.  If I have some code which doesn't work, I'll
    stare at it for a long time (as I assume you did.)  Then if I'm pretty
    confident that I coded my abstraction of the algorithm correctly, I'll
    doubt the compiler and compile with optimization off, with the
    assumption that most production compilers will produce correct code
    when there is no optimization applied.  From my little experience, most
    errors which occur between the syntactic/semantic analysis and code
    generation phases are from incorrect attempts at optimization.
    
    I then see what happens.  If the code runs correctly with optimization
    off, I'll try to weed the offending code into a separate module if
    possible and try again (there are obviously conditions where
    module-wide attributes can lead to faulty optimization.)  If it still
    doesn't work, and I'm still positive the algorithm is correct, once
    I've reduced the incorrect code to the smallest chunk possible, I'll
    compile that with optimization off, and generate relatively slow code
    which works for a small portion of the module.  AND I'll report a bug
    to the compiler group (SPR external customer/QAR internal).
    
    I guess my point is that the only reason to need to see the generated
    code is to debug the compiler, and that's not really our job anyways. 
    It's the compiler-writer's.  Hopefully the bugs were introduced in the
    optimization phase of the compilation, and thus can be worked around by
    requesting that the compiler not perform optimization.  If the bugs
    occur in a non-optimized case, then (slightly heavy handed comment
    here) the compiler probably wasn't ready for release anyways, and the
    writer is again to blame, and in this case in a nasty way, because a
    basic function of the language is malfunctioning.  The test suite for
    the compiler was inadequate.
    
    I personally am not against allowing the compilers to show you the
    machine code generated.  I've (in moments of terrible desperation) done
    the same thing as Glenn suggested, only to find my own problems.  (In
    my case a MACRO-32 routine wasn't saving all the registers it should
    have, which the compiler didn't use in the /NOOPTIMIZE case, but did in
    the /OPTIMIZE case.)
    
    So I guess you can sum up my opinion as that machine code output should
    only be a compiler-debug feature, not a programming tool.
    
    					-mjg
    
    
    
116.36. "No -- even when the compiler is *right*" by SMURF::REEVES (Jon Reeves, ULTRIX compiler group) Tue Mar 22 1988 10:04 (12 lines)
    Perhaps I was unclear.  I didn't mean that machine code inspection
    should be used early in the debugging process.  In the case at hand,
    the problem had already been diagnosed as, "casts of bit
    fields are behaving in ways that are not clear to me."  The results
    had already been tested on three different compilers, and found
    to be identical; therefore, there was no question of a compiler
    bug.  The goal was to reach understanding by experiment: trying
    certain representative cases.  There is *no* easy way to determine
    the result of a cast operation in C other than code inspection.
    
    Finally, note my personal name...  but I've run across similar cases
    when I wasn't in a compiler group, and when I wasn't using C.
116.37. "I can 'C'" by TLE::MEIER (Bill Meier - VAX Ada) Tue Mar 22 1988 14:03 (6 lines)
    re: .-1
    
    "There is *no* easy way to determine the result of a cast operation in
    C other than code inspection." 

    Perhaps that makes some statement about the language?
116.38. by DSSDEV::JACK (Marty Jack) Tue Mar 22 1988 15:35 (7 lines)
    The result of a cast is the same as if the value being cast were
    assigned to a temporary variable of the type being cast to.  Seems
    clear to me.
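
    For instance (an illustrative sketch; the behavior of signed bit
    fields is partly implementation-defined, which is just the sort of
    corner .33 and .36 were probing):

    	struct s { unsigned u : 3; int i : 3; } bits;

    	int experiment()
    	{
    	    int k;

    	    bits.u = 7;
    	    k = (int)bits.u;	/* as if: int tmp = bits.u; k = tmp;
    				   the unsigned 3-bit field widens to 7 */
    	    bits.i = -1;
    	    k = (unsigned)bits.i;	/* whether the signed field
    				   sign-extends before the conversion
    				   is implementation-dependent */
    	    return k;
    	}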
    
    The Masscomp compilers are the only Unix compilers I know of that
    produce a listing file, let alone with the machine code in it. 
    I had a lot to do with their creation in a previous life (but I
    didn't write the optimizer, hold the rocks please).
116.39. "Don't slime other compilers with pcc" by DENTON::AMARTIN (Alan H. Martin) Thu Mar 24 1988 11:14 (66 lines)
Re < Note 116.28 by TLE::BRETT >:

Hmmm.  I've never really heard the case for using a machine-specific IL and a
shared peephole optimized/object file writer put quite so clearly before. It is
a good idea, and you expressed it well.

Re < Note 116.30 by SMURF::REEVES "Jon Reeves, ULTRIX compiler group" >:

Your example neglects a corollary of a point of Paul's from .18 - that
compiler-generated machine listings have the potential to (and traditionally do)
contain more information for a human reader than the output meant for
consumption by an assembler could.  For instance, if a C int formal parameter
named "i" lives 4 bytes beyond the start of the argument list, an assembly code
listing could say this:

	MOVL	i,R0
or
	MOVL	i(AP),R0
or
	MOVL	4(AP),R0	;i,R0

while I see that pcc (at least) generates this:

	MOVL	4(AP),R0

Note that output destined for assemblers without nested scopes for defining
symbols (most assemblers, I bet), cannot simply say this:

	i=4
	MOVL	i(AP),R0

because there might be a nested definition of "i" in the routine.  (This
is why the AT&T C++ preprocessor changes all auto variables named foo to
identifiers something like _au0_foo, _au1_foo, etc.)  You could cut out the
extra garbage if "i" was unique, but that's just more work.

See HELP BLISS/MACHINE_CODE_LIST:([NO]ASSEMBLER,[NO]UNIQUE_NAMES) for an
implementation which does a lot to deal with these issues.

>    With a compiler that produces only an object file, I need to also
>    generate a listing file, then I need to run a script to extract
>    the 9 meaningful lines out of the 77 lines (OK, that's a little
>    unfair -- 45 of them contained no information at all).  I now have
>    two files to dispose of (listing and object) instead of one.

Your position seems to reflect living in an environment where compilers don't
generate assembly-code listings.  I've had better from the KA10 and its F40
compiler since before I graduated from high school, well over a decade ago.
(Including the switch combination "/LIST/NOBINARY" which doesn't write an
unwanted object file, yet doesn't suppress internal code generation).  Perhaps
you should raise your expectations of compilers, or lower your opinion of what
you use.

Re < Note 116.33 by SMURF::REEVES "Jon Reeves, ULTRIX compiler group" >:

If Harbison&Steele doesn't answer your question handily, I'll be disappointed.
Of course the answer may be "implementation-dependent", but then that's part
of the charm of using C.

Re < Note 116.35 by MJG::GRIER "In search of a real name..." >:

Have you ever had an SPR or QAR response that says "User error - application
makes illegal assumptions which cause different results under optimization"?
That seems at least as likely as uncovering lots of errors in the DEC compilers
I've worked with.
				/AHM