T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
116.1 | Some holiday gift ideas! | TLE::FELDMAN | LSE, zealously | Tue Nov 11 1986 21:04 | 27 |
| I'm going to leave the job of explaining intermediate code and
language-independent optimizations to the experts, and just give
you some sources of information:
First, any good book on compilers should explain various techniques
for intermediate code. The "dragon" book is a standard text, for
two values of dragon: Principles of Compiler Design, by Aho and
Ullman, and Compilers: Principles, Techniques, and Tools, by Aho,
Sethi, and Ullman.
The book Engineering a Compiler (by a group of DEC people; sorry, my
copy's at home, I can't find a citation here, and all of the authors
left this group before I joined) talks specifically about an early
version of DEC's VAX PL/I compiler; this is the origin of the VAX code
generator that is shared by a number of our compilers. I believe the
book explicitly states that a C front end was either done or in the
works. Perhaps someone else can clarify whether or not it is public
knowledge that our C and PL/I compilers share a back end.
I believe that the most popular C compiler is the Unix compiler.
This compiler uses a documented intermediate code, since it is intended
to be retargetable. Thus the very criticism your friend applied
to a notion of an intermediate code applies to this C compiler. My
recollection is that it isn't particularly outstanding with respect
to optimizations.
Gary
|
116.2 | "*p++" - is Nothing sacred? | NOBUGS::AMARTIN | Alan H. Martin | Wed Nov 12 1986 12:18 | 39 |
| Re .0:
>He feels that with an intermediate level
> of code, the C compiler isn't allowed to generate all the optimizations
> which it possibly could, and I'm not sure.
Ask your friend which is harder for the compiler to optimize into an
INCR instruction: the following Fortran fragment:
INTEGER I
. . .
I = I + 1
or this C fragment:
int i;
. . .
i++;
Fortran compilers accomplish the former with much work. However, C
can generate the same object code with no optimizer at all merely by
forcing users to waste time writing "optimal" code in the first place.
If you want to write a C compiler which generates good code for "i = i + 1;",
then you have to go through just as much hair as for Fortran.
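To make the point concrete, here is a minimal sketch of my own (not from the
original note): a decent optimizing compiler should emit the same single
increment instruction for both statements, while a compiler with no optimizer
only manages it for the second spelling.

    /* Hypothetical illustration: on a VAX, both statements below can
     * compile to a single INCL of i's storage, but only if the compiler
     * recognizes the "i = i + 1" pattern; "i++" maps onto the increment
     * instruction with no cleverness at all.
     */
    int i;

    void bump(void)
    {
        i = i + 1;
        i++;
    }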
If your friend has been marveling at the wonderful optimizers in the average
C compiler, he should try feeding some Fortran code into it and see
how well it does.
Re .1:
The authors of "Engineering a Compiler" are Patricia Anklam, David Cutler,
Roger Heinen Jr. and M. Donald MacLaren. The book is published by Digital
Press. The book contains a section which I think of as saying "Towards
the end of the project we decided that implementing PL/I was getting
boring, so we decided to knock off a C compiler one weekend to liven things
up". (Actually, I think they wrote that they wanted to see how flexible
the VCG was).
/AHM
|
116.3 | | TLE::NOLAN | | Wed Nov 12 1986 12:55 | 38 |
|
UNIX and UNIX look-alike systems, such as DEC's ULTRIX, come
with a C compiler bundled on the kit. This is PCC, the Portable
C Compiler, which generates UNIX assembler code that is rather like
PDP-11 assembler code. The VAX versions of UNIX generate the
corresponding VAX assembler. This code is then run through an
assembler into object code and linked. This means that you only have
to rewrite the assembler for each new piece of hardware that is to support UNIX.
However, because of this approach, and also because the original
PCC does not contain an optimizer, the code generated
is not very good. The VAX C compiler, which uses the same code
generator as VAX Ada, VAX PL/I, and VAX SCAN, generates much better
code than PCC. Tartan Labs has written a C compiler for the VAX hardware
that does contain an optimizer and generates much better code than PCC,
though not quite as good as VAX C.
As Alan points out in .-1, it is much harder to generate good code
when the optimal code is not obvious. C has many operators which
were intended to map onto corresponding PDP-11 machine instructions;
C was originally designed as a systems language for the PDP-11.
The pre- and post-increment and decrement operators can be compiled
directly to the INCx and DECx instructions on a VAX, or to
auto-increment and auto-decrement addressing modes if the variable
is a pointer. VAX C and Tartan Labs C both do this. PCC does not
always do it.
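As an illustration (a sketch of mine, not part of the note), the pointer
idiom below is the sort of thing that maps straight onto VAX auto-increment
addressing when the compiler handles it well:

    /* Hypothetical example: copying bytes through pointers with
     * post-increment.  On a VAX, "*dst++ = *src++" can become a single
     * MOVB with auto-increment operands, e.g. MOVB (R1)+,(R2)+; a
     * compiler that misses the idiom emits separate moves and adds.
     */
    void copy_bytes(char *dst, char *src, int n)
    {
        while (n-- > 0)
            *dst++ = *src++;
    }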
In answer to your original question: if the languages that
are to use the common code generator are known ahead of time, the
intermediate language can be designed to meet the needs of all of
them. If that is done, then even in C's case, with its many
operators, a good compiler can generate good code, whether that compiler
is for C or Ada or PL/I. It is not the language that defines how
good the generated code is going to be, but how good the compiler
writers are.
chris.
|
116.4 | Optimizer => IL needed | TLE::BRETT | | Wed Nov 12 1986 14:03 | 27 |
| There is one other point that should be made. Optimization practically
DEMANDS an intermediate language, one that lays bare the exact
details of the operations so that the nitty-gritty can be optimized
properly.
The classic example is
A,B : array(1..10) of 5-byte-values
A(I) := B(I)
Now the addressing tree for the A(I) and the B(I) both look like
                 +                                +
           address     *                    address     *
              A     fetch   5                  B     fetch   5
                   address                          address
                      I                                I
and thus, when made explicit like this, the optimizer can find the
common multiply by 5 that was not apparent in the original source.
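Written out in C, the same point looks roughly like this (my sketch, not part
of the note): the scaled index is exactly the common subexpression that the
explicit addressing trees expose.

    /* Hypothetical lowering of A(I) := B(I) for 5-byte elements.
     * Computing the byte offset once is the common subexpression the
     * optimizer can find once the addressing arithmetic is explicit.
     */
    #include <string.h>

    char a[10 * 5], b[10 * 5];        /* ten 5-byte elements each */

    void assign(int i)
    {
        int off = (i - 1) * 5;        /* computed once, used twice */
        memcpy(&a[off], &b[off], 5);
    }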
In some ways, the BLISS language itself has many of the characteristics
of a good intermediate language!
/Bevin
|
116.5 | for correctness too | BACH::VANROGGEN | | Wed Nov 12 1986 17:13 | 5 |
| Well-defined intermediate languages are also essential for
proving the correctness of the compiler, in addition to the
portability and simplicity mentioned earlier.
---Walter
|
116.6 | thanks! What's the intermediate code look like? | RT101::GRIER | This is of course impossible. | Wed Nov 12 1986 18:17 | 33 |
| Re: .4:
That's how I deduced that there HAD to be an intermediate level
of code, along with the discussions of the several back-ends of
the compilers here at digital.
Hmmm...
I guess my next question is "what is the structure of the
intermediate language used?" I personally haven't taken a course
on compiler-writing, and while I started to read a book by Gries
on the subject, I was overrun with other things and never finished it,
so I don't have much real knowledge of the subject other than my
deductions and observations.
Half of this is my own curiosity; the other half is so I can shut
him up when he rants about how good C is (over language X - you name
it) and how good Unix is (over VMS - I've got the personal knowledge
to stand my own there, definitely.) My feelings are like most of
those expressed in this conference - use whatever language best
suits you and the project. That's why special-purpose languages
like TPU, VAX SCAN, Lisp, and such come into play (OK - TPU and
Lisp are *really* special cases, I realize...) The idea that some
Pascal or (even worse!) BASIC code can be just as efficient as his
cryptic C code is very foreign to him.
Thanks for the information so far. I feel a lot more comfortable
in what I say about it now.
-mjg
|
116.7 | A case where FORTRAN .GT. C | TLE::RMEYERS | Randy Meyers | Wed Nov 12 1986 19:44 | 70 |
| Re .3:
>It is not the language that defines how good the generated code is going
>to be, but how good the compiler writers are.
I somewhat disagree with you on this one, Chris. There are some languages
that make it so difficult for the poor compiler writer that he or she has
little choice but to give up or to spend years of effort to be able
to optimize language FOO as well as language BAR.
An example helps to explain. Consider the following FORTRAN program:
SUBROUTINE SUB
INTEGER I, J, K
.
.
.
K = I / J
CALL SUB2
K = I / J
.
.
.
END
and the following C program:
void SUB()
{ static int I, J, K;
.
.
.
K = I / J;
SUB2();
K = I / J;
.
.
.
}
Assume that there are paths through both versions of routine SUB that
modify the values of I and J and return. Further assume that the address
of I, J, and K are not exported from the C routine. SUB2 is an external
routine in another file.
The two versions of routine SUB are very similar. In both, the variables
I, J, and K have the same semantics: statically allocated, keeping their
values between invocations of SUB. Furthermore, I, J, and K are not
accessible outside of routine SUB.
A FORTRAN compiler will easily realize that the statement after the call
to SUB2 is redundant and can be optimized away. A C compiler, in general,
can not do the same.
The reason why is that FORTRAN subroutines are not recursive. All C functions
are potentially recursive. The FORTRAN compiler can take it as a given
that no one can get his hands on I and J. The C compiler must assume
that SUB2 will call SUB recursively and that SUB itself will change the
values of I and J during the call to SUB2.
It is true that a compiler can be written to discover that SUB2 doesn't
call SUB and that it is safe to optimize the second assignment. But,
this involves much more work than is needed to perform almost any other
optimization in C.
No DEC compilers perform this "Universal Optimization," as it is
called (yet!).
|
116.8 | Don't be to hard on him | TLE::RMEYERS | Randy Meyers | Wed Nov 12 1986 20:27 | 37 |
| Re .6:
Before you lord it too much over your roomie, let me tell you his concerns
are quite valid. There are some common back ends for compilers that
provide only least-common-denominator optimizations. His concern is just
not valid for the VMS compilers because of the great amount of effort put
into them.
Bevin gave the best example of a good intermediate representation: all of
the most elegant IL's look like a parse tree for BLISS (yes, even with the
dots). The tree is decorated with low level type information and some
additional bookkeeping.
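For concreteness, a decorated tree of that general sort might be declared
along these lines (a minimal sketch of my own; these are not the actual VCG
or BLISS data structures):

    /* Hypothetical IL node: an operator, a low-level type decoration,
     * and operand subtrees.  The A(I) := B(I) example from .4 would be
     * a STORE whose address operands are ADD(addr, MUL(FETCH(addr), 5))
     * trees.
     */
    enum il_op   { IL_FETCH, IL_STORE, IL_ADD, IL_MUL, IL_ADDR, IL_CONST };
    enum il_type { T_ADDR, T_INT32, T_BYTES };

    struct il_node {
        enum il_op      op;
        enum il_type    type;           /* low-level type decoration   */
        long            value;          /* literal value, for IL_CONST */
        struct il_node *kids[2];        /* operand subtrees            */
    };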
Unfortunately, the IL used by the VCG (the VAX Common Code Generator)
isn't quite as elegant.
By the way, any optimizing compiler worth the name creates an
intermediate form of the program. (This is even true for compilers
whose code generator is made only to generate code for that one language
and compiler.) The reason is that an optimizer needs to be able to view
an entire function as a unit in order to perform more than minimal
optimizations. Compilers that do not produce an intermediate form
of the program typically compile, at most, one or two statements at
a time and produce dreadful code.
You may be interested in knowing that PCC produces very bad code. This
is considered a weakness in DEC's Ultrix offering. Some of DEC's
competitors (like Sun, in the workstation market) threw away the C
compiler that came with Unix and provided their own optimizing C
compiler.
DEC, Digital, and Ultrix are trademarks of Digital Equipment Corp.
Sun is a trademark of Sun Microsystems.
Unix is in the public domain.
(Ha, Ha, take that ATT.)
|
116.9 | so, it works like this? | RT101::GRIER | This is of course impossible. | Wed Nov 12 1986 21:24 | 24 |
| Re: .8:
That's all I really wanted to know. So - basically the front-end
of the compiler performs syntactical checking (lexical analysis is
the term, right?) and generates a data structure which describes
the flow of action in the program, in some sort of an m-way tree
(pure conjecture there - siblings on a level are sequential
operations?) and each sibling also points off to a structure describing
its operands and access methods thereof. And then the back-end
just has to look at the standardized data, make correlations on
operations and operands, and combine operations and optimize flow.
Wow. That makes a lot of sense, but am I right?
I'm not trying to show him to be completely wrong, just humble
his opinions of C/Unix. (And score some points for DEC among college
CS grads - which is a rare feat these days.)
(FYI - for reference, my fav. language is MACRO - I like being in
complete control, for better and (sometimes) for worse.)
-mjg
|
116.10 | Basicly Right | TLE::RMEYERS | Randy Meyers | Thu Nov 13 1986 02:37 | 35 |
| Re .9:
Yes, you have the hang of it now. I guess you are ready to go and write
your first compiler ;-).
The only error in what you said was in calling all the front end work
lexical analysis. Lexical analysis is only part of the FE's work.
Lexical analysis is simply reading the source file character by character
building up the smallest meaningful elements of the program. These
smallest meaningful elements of the program are called tokens; examples
of tokens are variable names, keywords, and operators.
In addition to lexical analysis, an FE also does syntactic and semantic
analysis. Syntactic analysis is taking a stream of tokens and making sure
that the tokens form valid phrases of the language (this is also called
parsing). Semantic analysis verifies that, given whatever additional
information applies, the phrases make sense according to the rules
of the language (for example, that the operands of an operator have
the correct types). Semantic analysis also builds the internal
representation of the program to be passed to the back end of the
compiler.
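As a purely illustrative picture of that division of labor, here is roughly
what the FE phases produce for a one-line statement (the names below are
invented for the sketch):

    /* For the statement "k = i / j;":
     *   lexical analysis   -> tokens: IDENT(k) ASSIGN IDENT(i) DIVIDE IDENT(j) SEMI
     *   syntactic analysis -> phrase: assign(k, divide(i, j))
     *   semantic analysis  -> checks that i and j are arithmetic operands,
     *                         then builds the decorated tree handed to the BE.
     */
    enum token_kind { TK_IDENT, TK_ASSIGN, TK_DIVIDE, TK_SEMI };

    struct token {
        enum token_kind kind;
        char           *text;     /* spelling, e.g. "k" for an identifier */
    };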
Don't get too hung up, though, on the division between syntactic and semantic
analysis. What gets checked where sometimes gets shifted around for the
convenience of the compiler writer. For example, in Algol-60, some of
the rules on legal combinations of expressions involving boolean and
arithmetic operators were spelled out in the grammar of the language.
However, the grammar for Algol expressions is not LL(1); that is, it
cannot be parsed using a parsing technique in vogue at the time
most people were writing Algol-60 compilers. So, the compiler
writers changed the grammar of Algol to allow illegal expressions
through syntactic analysis. They then put the checks for the illegal
cases in semantic analysis. (It turns out that the checks had to go into
semantic analysis anyway, so there was no reason not to take this
approach.)
|
116.11 | | DREGS::BLICKSTEIN | Dave | Thu Nov 13 1986 10:52 | 43 |
| Your friend definitely needs humbling.
Each language tends to have its own set of semantic definitional
problems that impede optimization. I can't claim to know C very
well but I do know that C's thorny side is aliasing. C has almost
uninhibited aliasing. A C pointer dereference is almost
like an optimization block. It has the effect of saying "take
everything you think you know about the current state of variables
in the program and forget it, because this pointer dereference can (but
in most cases actually does not) invalidate that information". So
you lose some very important optimizations (CSEs, code motions,
value propagation, range propagation, dead store elimination,
split lifetimes, etc.)
in the general case for an event that might occur only in the rare case.
(Optimizers must always assume the worst case or they will
generate incorrect code; an optimization that doesn't is an "unsafe"
optimization.)
Considering that C is touted as a systems language, this is
one strike against it.
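A tiny sketch of the effect (mine, not from the note): unless the compiler can
prove the pointer doesn't alias the variable, the store through it kills
everything the optimizer thought it knew.

    /* Hypothetical illustration of aliasing blocking CSE and value
     * propagation: after "*p = 0" the compiler must assume x may have
     * changed, so "x + 1" has to be recomputed from a fresh load of x.
     */
    int x;

    int f(int *p)
    {
        int a, b;

        a = x + 1;
        *p = 0;            /* p might point at x -- assume the worst */
        b = x + 1;         /* cannot simply be reused from a         */
        return a + b;
    }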
Aliasing is handled much better in Pascal, Ada and BLISS. Why do
I mention that? You should notice that it tends to support the
recommendation of using the most appropriate language. If your
program passes a lot of pointers around, you now have some
reasons to consider not using C since there are languages that deal
with them much better.
Also, it's my understanding that VAX C is considered to be among
the best optimizing C compilers. (That's certainly the case for
the VAX, because the C group has received lots of requests to port
VAX C to Unix.) If VAX C is one of the better C optimizers and
yet still loses benchmarks to other VAX languages, that should at
least cast some doubt on your friend's statements about the optimization
potential of the C language - which, by the way, I think are mostly
wrong; i.e., C doesn't appear to me to be a language that was built for
speed. C's concept of speed is allowing you to code in high-level
assembler to get 'manual optimization'. This is different from having
a language whose semantics are designed to allow a high level of
'automatic optimization'.
In my opinion, your friend is being a bit sophomoric.
db
|
116.12 | Brief digression on Sun's C compiler | TLE::FELDMAN | PDS, our next success | Thu Nov 13 1986 11:14 | 14 |
| As an aside about Sun:
My recollection is that they have only released their optimizing
F77 compiler, and that their optimizing C compiler is still under
development.
On the other hand, I assume they took responsibility for redoing
the actual code generator part of the Unix C compiler -- the part
that has to spit out the 68000 instructions. I am certainly willing
to believe that they did a better job at this than the ATT (or
Berkeley) people did on the VAX equivalent for the Unix C compiler;
I really don't know for sure.
Gary
|
116.13 | on unix, use 'as'; on vms, use the vcg | COOKIE::DOUCETTE | Chuck Doucette, Database A/D @CXO | Thu Mar 10 1988 00:18 | 31 |
| I want to make a few observations/comments that are based on current knowledge
(or lack thereof). Please correct me if I am wrong.
On unix (Ultrix) the standard way of generating code for compilers (at least
this is done by the C compiler cc) is to a) write an assembler file, b) call
c2 for peephole-optimization (of the assembler file), and c) call the unix
assembler 'as' to assemble it and generate object code.
The unix assembler 'as' is supposedly written for compiler writers rather than
humans. For example, it allows the compiler to include high-level debugging
information for dbx. And, it doesn't do macros like MACRO-32 on VMS does.
On VMS, as far as I know, after a compiler (most of them I believe) gets a
good parse tree it then passes it in some common form to the VCG which does
optimization and object code generation. I'm not sure if the VCG helps
compilers output debugging information or does other things (like listings).
I find the compiler our group uses (DECWRL Modula-2) very interesting because
a) it optimizes the parse tree directly (rather than some intermediate form),
b) it generates an intermediate language (p-code, a simple stack machine) file
that is supposed to already be optimized and portable (to machines other than
vax, a titan for instance), and c) it translates this into vax assembler and
then d) calls the unix assembler 'as' to generate object code. For more
details about this compiler, see Dr. Mike Powell's paper, "A Portable
Optimizing Compiler for Modula-2".
One of the interesting arguments I have seen in the usenet news-group on
compilers is whether compilers should generate object code by themselves or
leave this to an assembler or something else (like the VCG).
Chuck
|
116.14 | Not everyone uses the VCG or MACRO! | MINAR::BISHOP | | Thu Mar 10 1988 13:40 | 19 |
| Re .13:
I was on the Pascal project for a few years, and am now on
the BLISS project. I can tell you that VAX Pascal, VAX
FORTRAN and BLISS (-16, -32 and -36) all use different
language-specific code generators, not the VCG. They do
not call the assembler. I believe that Ada, C, and PL/1
use the VCG.
BLISS and Pascal optimize the parse tree, and are portable to
other architectures (or at least designed to be so). The
great advantage to not using an assembler is that the compiler
knows more about the structure of the program--thus VAX Pascal
leaves pseudo-instructions in the stream to mark the beginning
and ending of loops and subroutines which have been expanded
in-line. This makes certain peepholes possible which an
assembler could not do safely.
-John Bishop
|
116.15 | | TLE::HOBBS | | Thu Mar 10 1988 14:00 | 6 |
| Also, VAX BASIC, VAX COBOL, VAX DIBOL, and VAX RPG use code generators separate
from the VCG. VAX SCAN is another language product that does use the VCG.
The structure of each of these compilers is often very different from the
others. It is very difficult to make any statement that applies to all
the different VAX language products.
|
116.16 | | AITG::VANROGGEN | | Thu Mar 10 1988 20:56 | 9 |
| And VAX LISP and VAX PROLOG also have different compiler organizations
and code generators.
VAX LISP's compiler is incremental and dynamic, in that at run-time
one can very easily create new code, compile it (if desired, for
efficiency; otherwise it can be run interpreted), and then execute
it. Another reason for it being different is that it has its own
notions of linkage and memory management for efficiency reasons.
(I.e., CALLS and LIB$* are too slow.)
|
116.17 | Ada | TLE::MEIER | Bill Meier - VAX Ada | Sat Mar 12 1988 15:43 | 5 |
| re: past note
Although VAX Ada does use the VCG for output of the final phase of code
generation, a number of optimizations are done to the semantic tree
first.
|
116.18 | | PSW::WINALSKI | Paul S. Winalski | Sat Mar 12 1988 22:24 | 21 |
| Here are a couple of reasons not to emit assembler source code from a compiler
and then use the assembler to produce the final object code:
1) The translation from assembler source to instructions is trivial,
particularly when compared to a lot of the other stuff a compiler does.
Instead of having the compiler go to all the bother of producing the
instruction stream in ASCII form only to have the assembler put it back
into binary, why not have the compiler directly build the binary in the
first place? It ups the compilation speed to do it directly.
2) When one materializes the program as assembler source, compiler-generated
temporary names, such as branch labels, assume a real existence and one
has the potential of clashes with user names (such as external variables).
Unix compilers hack their ways around this problem by prefixing or
appending varying numbers of underscores to user variable names. This
has all manner of curious side-effects, particularly in other tools such
as linkers and debuggers. When a compiler directly generates the object
code, there is no problem with temporary names clashing with user names,
since the compiler knows which ones are the temps.
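A made-up illustration of the second point (the names here are invented): once
assembler text is the interface, the compiler's internal labels and the user's
symbols share one name space.

    /* Hypothetical example.  With direct object generation, a branch
     * target the compiler invents stays purely internal; when assembler
     * source is emitted, that label must be spelled out and must not
     * collide with a user symbol such as the one below -- hence the
     * Unix trick of shifting user names aside (foo becomes _foo).
     */
    extern int L42;            /* a perfectly legal user variable name   */

    int sign(int x)
    {
        if (x < 0)             /* the generated branch around this test  */
            return -1;         /* needs a label, which had better not be */
        return L42;            /* spelled "L42" in the assembler output  */
    }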
--PSW
|
116.19 | | MOIRA::FAIMAN | Ontology Recapitulates Philology | Mon Mar 14 1988 10:32 | 6 |
| Also, when the compiler emits assembler source, there is the
possibility of the user modifying that source before assembling
it. This is such a uniformly bad idea that anything that makes
it easier ought to be avoided.
-Neil
|
116.20 | Yes, it's obviously bad. | SMURF::REEVES | Jon Reeves, ULTRIX compiler group | Tue Mar 15 1988 09:17 | 6 |
| That's right, with assembler source the user might actually be able
to work around a bug without an expensive support contract, or might
be able to apply application-specific optimizations without an
expensive source code license.
Sorry, I'm feeling cynical today.
|
116.21 | | TOKLAS::FELDMAN | PDS, our next success | Tue Mar 15 1988 09:51 | 25 |
| On the other hand, a common reason for writing assembler is that it is
easier for the compiler writers than having to write object code.
This is especially true when you have no control over the object
language. Thus an independent software house could easily justify
writing assembler source (especially on a system such as Unix, with its
pipes). And, as stated earlier, if there is a peephole optimizer built
into the assembler, you get to share in the manufacturer's expertise
with respect to instruction timings. Obviously, these arguments don't
apply to the systems vendor, such as DEC, since we do control the
object language and are responsible for any such optimizers.
Of course, the above arguments apply even more strongly to code
generators. If we ever were to ship a callable, documented version
of the VCG, it would be the obvious way to go for third parties
writing new compilers for DEC hardware. Not that I'm suggesting
we actually do that. (Though, if we could get past the legalities,
we might want to license it for languages that we know we'll never
support ourselves, such as Jovial.)
Gary
PS Having had to munge code from a buggy Unix Modula compiler myself,
I quite agree with the remarks in .20. It's unfortunate that I
was even in such a situation, but I was glad the workaround of editing
.a files existed.
|
116.22 | Well, not quite | DSSDEV::JACK | Marty Jack | Tue Mar 15 1988 11:17 | 12 |
| > writing assembler source (especially on a system such as Unix, with its
> pipes).
pcc doesn't use pipes to connect the phases. It uses temporary
files.
> And, as stated earlier, if there is a peephole optimizer built
> into the assembler, you get to share in the manufacturer's expertise
The peephole optimizer is not built into the assembler. It
runs as a separate assembler-source-to-source pass between
code generation and assembly.
|
116.23 | It pays to generate .OBJs | RACHEL::BARABASH | Digital has you now! | Wed Mar 16 1988 10:12 | 15 |
| I guess I should add my experience here. The VAX OPS5 compiler used to
generate MACRO back when it was "internal only". When I taught it to
generate VAX .OBJs, the compiler effectively got FIVE TIMES FASTER. The
time required to build XCON (the VAX eXpert CONfigurer) went down from
something like 8 CPU hours on an 11/780 to 2 hours (almost half of that
in the VMS Linker!). Conclusion: it does indeed pay for compilers to
assemble the object code.
By the way, I found the computer-generated assembly language not
significantly easier to read/maintain than the .OBJs.
-- Bill B.
[So *that's* why academicians like McKeeman are dissatisfied with the
speed of their U*ix compilers!]
|
116.24 | | SMURF::REEVES | Jon Reeves, ULTRIX compiler group | Wed Mar 16 1988 10:41 | 12 |
| A couple of misconceptions here:
. The UNIX as assembler was designed to be incredibly fast, mostly by
sacrificing many of the features found in MACRO. Therefore, experiences with
MACRO (and VMS process creation) may not translate well to UNIX.
. In ULTRIX, at least, knowledge of short jumps vs. long jumps is
embedded solely within the assembler; therefore, in a sense, the
assembler is doing some optimization after the compiler and the
peephole optimizer have finished. Obviously, in a system with
fixed-length instructions, this is not an issue.
|
116.25 | | TOKLAS::FELDMAN | PDS, our next success | Wed Mar 16 1988 11:40 | 16 |
| Re: .22
Thanks for the correction. I didn't know that the peephole optimizer
was a separate phase, but it makes sense, since it's consistent with
the old Unix philosophy of writing filters. I do have a strong
recollection of seeing a script for the cc command, but either I'm
misremembering the pipes, or the script I saw was merely expository
and not the actual implementation.
As for the general issue at hand, I'm wondering: Is the question really
whether or not compilers should produce code by going through an
assembler? Or is the question whether compilers should be able to
produce a listing file that can be assembled, in addition to generating
the object modules directly?
Gary
|
116.26 | Layered products on Ultrix: name 3 | DENTON::AMARTIN | Alan H. Martin | Thu Mar 17 1988 19:47 | 4 |
| Re .24:
I rather doubt that VAX C depends upon any assemblers to resolve branch lengths.
/AHM
|
116.27 | DECnet, LISP, FORTRAN? :-) | SMURF::REEVES | Jon Reeves, ULTRIX compiler group | Fri Mar 18 1988 19:36 | 8 |
| re .26:
Sorry, I should have made it clear that I was talking about pcc
(and gcc and tcc) on ULTRIX; obviously, vcc has no dependence on
the assembler. The point I was really trying to make is that this
(long jumps vs. short jumps) is one thing that the assembler can
solve very nicely, rather than each compiler [back end] writer
having to deal with it too.
|
116.28 | | TLE::BRETT | | Fri Mar 18 1988 23:17 | 25 |
| I think you are lumping too much into the title "assembler".
To me an assembler does this following...
parses source files containing "assembly language".
decides what instructions to emit, possibly doing some
minor tweaks like short/long jumps, short/long offsets.
generates the psect contributions
emits them into the object file in some special format
Now, it seems to me that the last three steps, omitting the parsing
of source files, could be made into a useful common backend for
all compilers/assemblers/etc.
I don't see any advantage in having compilers generate text just
so the assembler can parse it back to instructions again. That
is just a waste of cpu cycles and disk space. Of course, if you're
in the business of building slow tools to sell fast machines...
/Bevin
|
116.29 | | TLE::JONAN | Into the Heart of the Sunrise | Sat Mar 19 1988 15:20 | 26 |
| > I think you are lumping to much into the title "assembler".
Well, there are a number of translators that are usually called
"assemblers" that do all sorts of things other than "just" assembling
instructions from mnemonics. I guess the one really common thread
is that they perform 1-1 mappings from the source to "machine
instructions".
> Now, it seems to me that the last three steps, omitting the parsing
Sure, but then I suppose it's reasonable to have an "assembler"
(include the parse step) act as this common "backend" too.
> I don't see any advantage in having compilers generate text just...
Certainly not much of one (previous replies offer some possible
reasons...), and the disadvantages clearly seem to outweigh any
such possible benefits.
> Of course, if you're
> in the business of building slow tools to sell fast machines...
Cheap shot! ;^)
/Jon
|
116.30 | OK, let's look at efficiency... | SMURF::REEVES | Jon Reeves, ULTRIX compiler group | Mon Mar 21 1988 14:58 | 27 |
| Let me offer another reason for generating assembler source. I
have the advantage(?) of running both styles of compiler on my
workstation, so it's easy to compare. Let's look at one ease-of-use
feature: suppose I want to find out how the compiler will treat
an operation; in other words, I want to see the machine code that
was generated. Now, with any compiler, I can compile and link,
then use a debugger to inspect the result, but besides being
inefficient, this probably loses the original variable names (and,
if jumps are involved, it gets really messy); in any case, it involves
using a complex tool unrelated to the task.
With a compiler that generates assembler, I specify the -S switch
and examine the .s file. Creates one extra file (and some hackery
can avoid that), but reasonably quick.
With a compiler that produces only an object file, I need to also
generate a listing file, then I need to run a script to extract
the 9 meaningful lines out of the 77 lines (OK, that's a little
unfair -- 45 of them contained no information at all). I now have
two files to dispose of (listing and object) instead of one.
Oh, and as for efficiency: the assembler generator took less than
half the time of the other one (they are close to equal -- slight
edge to the assembler-generator -- for generating object files, in
this case). That's probably more a measure of fit-to-operating-system
than anything, though.
|
116.31 | But why do you care? | TLE::MEIER | Bill Meier - VAX Ada | Mon Mar 21 1988 15:58 | 12 |
| re: .-1
| I don't think many people (except maybe C users :-) use an "HLL" and
inspect the machine code generated, either for bugs or for possible
"tweaks" in the source code that would improve the generated code. For what
language(s) do you think people look at the generated assembler output?
And what for? And how often? And if the answers to some of those
questions are yes, for performance reasons, and/or frequently, I
would say you have picked an HLL when you should have picked assembler
to start with.
I consider .-1 a weak argument at best.
|
116.32 | Exactly -- leave it to the compiler writer... | MJG::GRIER | In search of a real name... | Mon Mar 21 1988 16:49 | 33 |
| Re: .31:
This (inspecting generated code) is a frequent practice of C
programmers evidently. Before working at Digital I was considering a
position at Masscomp, a graphics workstation outfit in Mass., working
for a famous ex-DECcie (Jack Burness of DEClander fame.)
What Jack wanted me to do was this: after they ran the C compiler on their
source, I was going to inspect the output and tweak it as much
as possible, for size and efficiency, for the 68000-based systems they
build, before assembling it. They didn't trust the compiler to
produce good code.
Anyways, I guess it's evident that which option you want depends on
your attitude towards the language. If you want the language to be an
abstraction tool to remove yourself from the implementation issues, you
probably didn't use C anyways, and thus probably didn't use un*x, and
are trusting enough to let the compiler designers do the optimization
for you.
However, it is typical that hackers who are used to using
inefficient compilers and trying to write nasty C statements want to
look at the compiled code and mung it up as much as possible also. You
begin to get (in my opinion) too worried about the implementation, and
lose touch with the abstraction which the language is providing.
I agree with .31. If you care that much about knowing the details
of the run-time code, you shouldn't be writing in an HLL anyways. (But
then there's the argument by many that C isn't an HLL...)
-mjg
|
116.33 | Trust the compiler, it knows everything | SMURF::REEVES | Jon Reeves, ULTRIX compiler group | Mon Mar 21 1988 18:11 | 24 |
| Fine. So, we are condemned to writing in assembler code (which
is obviously not portable) or blindly trusting the compiler.
In point of fact, in my latest example, I wasn't looking at the
generated output for any reason related to efficiency; instead,
I was assuming the compilers were correct and trying to figure out
(by looking at what they were generating) *exactly* what the results
of casting signed and unsigned bit fields were. But then, in the
abstract world, I suppose I should have used printf (and spent time
agonizing over how to keep the call to printf from affecting the
results). Or -- I should look in the manual, right?
Well, I looked in four different references, and none directly
addressed the question I had; I'm sure it was in at least some of
them, but buried behind enough hidden cross references that it was
much faster to experiment. (And that's not an indictment of the
documents: if you documented all the odd corners of any language
in enough detail to answer all possible questions, you'd have an
unusably large manual.)
Gee, if I carry this argument far enough, I could argue against
producing even the listing of assembly code -- after all, the people
that care about the generated code should be able to plug the variables
back into a disassembled form, right?
|
116.34 | Machine code has a practical usefulness | URSA::DOTEN | Glenn Doten | Mon Mar 21 1988 18:32 | 15 |
| Like Jon is saying, I have been forced to use the /MACHINE qualifier a
number of times on modules written in Pascal, C, and BASIC. Sometimes I
could swear that the compiler just wasn't working properly and my last
resort was to look at the generated code and see what was REALLY going
on (and a few of those times the compiler was at fault). Sometimes I
just didn't understand enough about the language and taking a peek at
the machine code gave me the understanding I was missing. This is a
very useful thing to be able to do; compilers certainly aren't perfect
and never will be. It's a real pain when things aren't working the way
you expect and you are stuck with a compiler that neither generates
assembly code nor produces the machine code as part of a listing file.
Trying to document every little nuance of a compiler simply isn't
practical.
-Glenn-
|
116.35 | Only for the compiler developer | MJG::GRIER | In search of a real name... | Mon Mar 21 1988 19:58 | 46 |
| Re: using machine code for debugging purposes:
I don't stand by that. If I have some code which doesn't work, I'll
stare at it for a long time (as I assume you did.) Then if I'm pretty
confident that I coded by abstraction of the algorithm correctly, I'll
doubt the compiler and compile with optimization off, with the
assumption that most production compilers will produce correct code
when there is no optimization applied. From my little experience, most
errors which occur between the syntactic/semantic analysis and code
generation phases are from incorrect attempts at optimization.
I then see what happens. If the code runs correctly with optimization
off, I'll try to weed the offending code into a separate module if
possible and try again (there are obviously conditions where
module-wide attributes can lead to faulty optimization.) If it still
doesn't work, and I'm still positive the algorithm is correct, once
I've reduced the incorrect code to the smallest chunk possible, I'll
compile that with optimization off, and generate relatively slow code
which works for a small portion of the module. AND I'll report a bug
to the compiler group (SPR external customer/QAR internal).
I guess my point is that the only reason to need to see the generated
code is to debug the compiler, and that's not really our job anyways.
It's the compiler-writer's. Hopefully the bugs were introduced in the
optimization phase of the compilation, and thus can be worked around by
requesting that the compiler not perform optimization. If the bugs
occur in a non-optimized case, then (slightly heavy handed comment
here) the compiler probably wasn't ready for release anyways, and the
writer is again to blame, and in this case in a nasty way, because a
basic function of the language is malfunctioning. The test suite for
the compiler was inadequate.
I personally am not against allowing the compilers to show you the
machine code generated. I've (in moments of terrible desperation) done
the same thing as Glenn suggested, only to find my own problems. (In
my case a MACRO-32 routine wasn't saving all the registers it should
have, which the compiler didn't use in the /NOOPTIMIZE case, but did in
the /OPTIMIZE case.)
So I guess you can sum up my opinion as that machine code output should
only be a compiler-debug feature, not a programming tool.
-mjg
|
116.36 | No -- even when the compiler is *right* | SMURF::REEVES | Jon Reeves, ULTRIX compiler group | Tue Mar 22 1988 10:04 | 12 |
| Perhaps I was unclear. I didn't mean that machine code inspection
should be used early in the debugging process. In the case at hand,
the problem had already been diagnosed as, "casts of bit
fields are behaving in ways that are not clear to me." The results
had already been tested on three different compilers, and found
to be identical; therefore, there was no question of a compiler
bug. The goal was to reach understanding by experiment: trying
certain representative cases. There is *no* easy way to determine
the result of a cast operation in C other than code inspection.
Finally, note my personal name... but I've run across similar cases
when I wasn't in a compiler group, and when I wasn't using C.
|
116.37 | I can "C" | TLE::MEIER | Bill Meier - VAX Ada | Tue Mar 22 1988 14:03 | 6 |
| re: .-1
"There is *no* easy way to determine the result of a cast operation in
C other than code inspection."
Perhaps that makes some statement about the language?
|
116.38 | | DSSDEV::JACK | Marty Jack | Tue Mar 22 1988 15:35 | 7 |
| The result of a cast is the same as if the value casted were assigned
to a temporary variable of the type casted to. Seems clear to me.
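For the bit-field question that started this digression, here is a sketch of
what that rule means in practice (my example; note that the signedness of a
plain int bit field is implementation-defined, which is why experimenting was
tempting in the first place):

    /* Hypothetical illustration.  Whether a plain "int" bit field is
     * signed is implementation-defined, so x.f may read back as -1 or
     * as 7; the cast then behaves as if that value were assigned to an
     * int temporary and the temporary converted to unsigned.
     */
    struct s { int f : 3; };          /* 3-bit field of plain int */

    unsigned widen(void)
    {
        struct s x;

        x.f = -1;                     /* -1 if signed, 7 if unsigned     */
        return (unsigned) x.f;        /* as if: int tmp = x.f;           */
                                      /*        return (unsigned) tmp;   */
    }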
The Masscomp compilers are the only Unix compilers I know of that
produce a listing file, let alone with the machine code in it.
I had a lot to do with their creation in a previous life (but I
didn't write the optimizer, hold the rocks please).
|
116.39 | Don't slime other compilers with pcc | DENTON::AMARTIN | Alan H. Martin | Thu Mar 24 1988 11:14 | 66 |
| Re < Note 116.28 by TLE::BRETT >:
Hmmm. I've never really heard the case for using a machine-specific IL and a
shared peephole optimizer/object file writer put quite so clearly before. It is
a good idea, and you expressed it well.
Re < Note 116.30 by SMURF::REEVES "Jon Reeves, ULTRIX compiler group" >:
Your example neglects a corollary of a point of Paul's from .18 - that
compiler-generated machine listings have the potential to (and traditionally do)
contain more information for a human reader than the output meant for
consumption by an assembler could. For instance, if a C int formal parameter
named "i" lives 4 bytes beyond the start of the argument list, an assembly code
listing could say this:
MOVL i,R0
or
MOVL i(AP),R0
or
MOVL 4(AP),R0 ;i,R0
while I see that pcc (at least) generates this:
MOVL 4(AP),R0
Note that output destined for assemblers without nested scopes for defining
symbols (most assemblers, I bet), cannot simply say this:
i=4
MOVL i(AP),R0
because there might be a nested definition of "i" in the routine. (This
is why AT&T C++ preprocessor changes all auto variables named foo to identifiers
something like the form _au0_foo, _au1_foo, etc.) You could cut out the extra
garbage if "i" was unique, but that's just more work.
See HELP BLISS/MACHINE_CODE_LIST:([NO]ASSEMBLER,[NO]UNIQUE_NAMES) for an
implementation which does a lot to deal with these issues.
> With a compiler that produces only an object file, I need to also
> generate a listing file, then I need to run a script to extract
> the 9 meaningful lines out of the 77 lines (OK, that's a little
> unfair -- 45 of them contained no information at all). I now have
> two files to dispose of (listing and object) instead of one.
Your position seems to reflect living in an environment where compilers don't
generate assembly-code listings. I've had better from the KA10 and its F40
compiler since before I graduated from high school, well over a decade ago.
(Including the switch combination "/LIST/NOBINARY" which doesn't write an
unwanted object file, yet doesn't suppress internal code generation). Perhaps
you should raise your expectations of compilers, or lower your opinion of what
you use.
Re < Note 116.33 by SMURF::REEVES "Jon Reeves, ULTRIX compiler group" >:
If Harbison&Steele doesn't answer your question handily, I'll be disappointed.
Of course the answer may be "implementation-dependent", but then that's part
of the charm of using C.
Re < Note 116.35 by MJG::GRIER "In search of a real name..." >:
Have you ever had an SPR or QAR response that says "User error - application
makes illegal assumptions which cause different results under optimization"?
That seems at least as likely as uncovering lots of errors in the DEC compilers
I've worked with.
/AHM
|