T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
374.1 | Don't sweat the efficiency | TLE::JBISHOP | | Mon Mar 07 1994 10:24 | 36 |
| Well, there are several considerations when building a parser:
1. Ease of initial coding;
2. Ease of modification if the grammar changes;
3. Ease of use--does the parser recognize chunks that
make sense, or odd fragments; can error-recovery be
put in easily; can error-repair ("PL/I uses...") be
put in;
4. Efficiency at run-time.
The last is very much a minor point. Any parser is going
to have to look at the input (O(n) and all the file open and
read overhead) and build an internal form (tree/symbol table
or whatever, also O(n)). So we're arguing about the "extra"
part, which isn't going to make much difference.
Generally people don't seem to pay enough attention to point
three and too much to points one and two. Two in particular
gets far too much credit--real grammar changes are rare, as
most language changes are just adding new operators or attributes.
Of the compilers I've worked on, VAX Pascal is recursive-descent,
because it's easy to code and modify and corresponds closely
to the semantics in the heads of the developers; the various
BLISSes are also recursive-descent. But the drawback is that
error-recovery is hard, and there's no boost from using a
generator tool. I worked on a debugger (F77DEBUG), which used
a tool called Galileo to generate a parser--I think it was
LL(1) but don't know for sure. Outside of my direct experience,
I know that DCE IDL uses Yacc and Lex, as I've seen the input
files to those tools.
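
For anyone who hasn't written one, here's roughly the shape of recursive
descent; just a toy sketch for a little expression grammar, not code from
any of the compilers above, with an invented single-character lexer to keep
it short:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy grammar (one C routine per nonterminal, so the code mirrors
       the grammar and the semantics in your head):
           expr   ::= term { '+' term }
           term   ::= factor { '*' factor }
           factor ::= DIGIT | '(' expr ')'
       Single-character tokens and single-digit numbers only. */

    enum { DIGIT = 256, END = 257 };

    static int current;                       /* one token of lookahead */

    static void next_token(void)
    {
        int c = getchar();
        while (c == ' ' || c == '\n') c = getchar();
        if (c == EOF)                  current = END;
        else if (c >= '0' && c <= '9') current = DIGIT;
        else                           current = c;   /* '+', '*', '(', ')' */
    }

    static void error(const char *msg)
    {
        fprintf(stderr, "syntax error: %s\n", msg);
        exit(1);
    }

    static void expr(void);

    static void factor(void)
    {
        if (current == DIGIT) {
            next_token();
        } else if (current == '(') {
            next_token();
            expr();
            if (current != ')') error("expected ')'");
            next_token();
        } else {
            error("expected digit or '('");
        }
    }

    static void term(void)
    {
        factor();
        while (current == '*') { next_token(); factor(); }
    }

    static void expr(void)
    {
        term();
        while (current == '+') { next_token(); term(); }
    }

    int main(void)
    {
        next_token();
        expr();
        if (current != END) error("junk after expression");
        puts("parsed OK");
        return 0;
    }
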
-John Bishop
|
374.2 | my 2¢ | AUSSIE::GARSON | Hotel Garson: No Vacancies | Mon Mar 07 1994 22:26 | 41 |
| re .0
> In general, how is the lexical analysis/parsing done?
I would guess that amongst Digital's suite of compilers a significant
determining factor in the method used would be history.
Nevertheless the first thing that you should look at is whether your
grammar allows the proposed parsing method.
e.g. VAX Pascal uses recursive descent with some lookahead, because certain
constructs in the language can't be parsed without it.
For a sufficiently awkward grammar, recursive descent could be ruled out
and you might *need* to use a table driven LALR parser (with tables
generated automatically).
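
To show what "table driven, with tables generated automatically" means in
practice, here is a hand-worked sketch of an LR driver loop for the tiny
grammar E ::= E '+' n | n. The ACTION/GOTO tables below were filled in by
hand purely for illustration; for a real language you would let yacc or a
similar generator produce them:

    #include <stdio.h>

    /* Productions:  1: E ::= E '+' 'n'     2: E ::= 'n' */

    enum { ERR, SH, RED, ACC };                 /* kinds of parser action */
    struct action { int kind; int arg; };       /* arg = state or production */

    /* Columns: 0 = 'n', 1 = '+', 2 = end of input. */
    static const struct action ACTION[5][3] = {
        /* 0 */ { {SH,2},  {ERR,0}, {ERR,0} },
        /* 1 */ { {ERR,0}, {SH,3},  {ACC,0} },
        /* 2 */ { {ERR,0}, {RED,2}, {RED,2} },
        /* 3 */ { {SH,4},  {ERR,0}, {ERR,0} },
        /* 4 */ { {ERR,0}, {RED,1}, {RED,1} },
    };
    static const int GOTO_E[5]  = { 1, -1, -1, -1, -1 };  /* goto on E */
    static const int RHS_LEN[3] = { 0, 3, 1 };            /* production sizes */

    static int column(char c)
    {
        return c == 'n' ? 0 : c == '+' ? 1 : 2;  /* anything else = end */
    }

    /* The driver loop is the same few lines for any LALR grammar;
       only the tables change. */
    static int parse(const char *input)
    {
        int stack[100], sp = 0, i = 0;
        stack[0] = 0;
        for (;;) {
            struct action a = ACTION[stack[sp]][column(input[i])];
            switch (a.kind) {
            case SH:  stack[++sp] = a.arg; i++;           break;  /* shift  */
            case RED: sp -= RHS_LEN[a.arg];                       /* reduce */
                      stack[sp + 1] = GOTO_E[stack[sp]];
                      sp++;                               break;
            case ACC: return 1;                                   /* accept */
            default:  return 0;                                   /* error  */
            }
        }
    }

    int main(void)
    {
        printf("n+n+n : %s\n", parse("n+n+n") ? "accepted" : "rejected");
        printf("n+    : %s\n", parse("n+")    ? "accepted" : "rejected");
        return 0;
    }
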
The fact that spaces are not significant in FORTRAN (and certain other
idiosyncrasies) means that it *requires* a more specialised scanner/parser
than languages like C, Pascal, Ada. (The classic example: DO10I=1.5 is an
assignment to a variable named DO10I, while DO10I=1,5 begins a DO loop, and
the scanner can't tell which until it reaches the period or the comma.)
> Right now, I'm leaning toward gaining as much efficiency as possible.
I can't see the justification for this, particularly for a school project
rather than a commercial product.
In my personal experience it's more fun doing your compiler in a way
that allows you to add new features quickly so that you can one-up your
fellow students. (-:
MHO is that the trend these days in commercial compilers is towards
harder working compilers that produce better code i.e. run-time speed
is valued more highly than compile-time speed.
Another significant consideration for a compiler is ease of portability
and ease of retargeting.
> or lib$tparse for VMS space;
For the record, I don't think that LIB$TPARSE is a good way of parsing
any serious-sized language. lex and yacc are probably available under
VMS via POSIX if you decide to use them and your platform is VMS.
|
374.3 | some more general questions... | STAR::MOLLOY | | Tue Mar 08 1994 09:27 | 33 |
| Thank you for the responses. In reference to note 374.2:
>> Right now, I'm leaning toward gaining as much efficiency as possible.
> I can't see the justification for this, particularly for a school project
> rather than a commercial product.
> In my personal experience it's more fun doing your compiler in a way
> that allows you to add new features quickly so that you can one-up your
> fellow students. (-:
Well, I can understand what you are saying; however, this is a graduate-level
course, with the potential of becoming a PhD thesis, so I'm not really looking
to one-up my fellow students. I'm interested in theory, and hope to get an
understanding of its relevance to commercial applications. I haven't had any
undergraduate experience in compiler design; however, I have a genuine
interest in compiler technology, and especially in optimization theory, both
from a run-time and a compile-time perspective, and in its applicability to
highly parallel architectures.
Anyway, the motivation behind my question was mainly to understand the
reasoning behind the parsing and design choices made in some of the
commercial compilers. For a school environment, if generality is the question, then why
not use a Turing machine to execute the parsing, since it's more general in
the Chomsky hierarchy? I realize the ease of coding is one of the driving
issues. However, I just want to get a better understanding of theory with
regard to time/space tradeoffs.
Overall though, I would guess that history/folklore does govern the way
projects are implemented. Thanks again.
--John Molloy
|
374.4 | Take a look at DCL ;-) | CVG::PETTENGILL | mulp | Wed Mar 09 1994 23:53 | 27 |
| Take a look at DCL (and CDU) and then ask the question, `what were the driving
forces behind their parsers?'. In contrast, look at unix shells; tcl source
is readily available and has a lot in common with DCL (easy ;-) to extend).
The unix shells have a lot in common with the original BASIC. How do you
figure out what kind of statement you are processing? Simple, look at the
first token (which can't be abbreviated). BASIC, a bunch of other similar
languages (FOCAL-8 comes to mind), and the shells really simplified things
by the form of the language.
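
As a sketch (the verb table here is invented, not DCL's or any real shell's),
that style of statement recognition is basically just:

    #include <stdio.h>
    #include <string.h>

    /* Each statement kind gets its own handler; the "parser" is just a
       lookup on the first word of the line. */
    static void do_print(const char *rest) { printf("print: %s\n", rest); }
    static void do_goto (const char *rest) { printf("goto:  %s\n", rest); }
    static void do_let  (const char *rest) { printf("let:   %s\n", rest); }

    static const struct {
        const char *keyword;
        void (*handler)(const char *rest);
    } verbs[] = {
        { "PRINT", do_print },
        { "GOTO",  do_goto  },
        { "LET",   do_let   },
    };

    static void dispatch(const char *line)
    {
        const char *rest = strchr(line, ' ');
        size_t kwlen = rest ? (size_t)(rest - line) : strlen(line);
        size_t i;

        for (i = 0; i < sizeof verbs / sizeof verbs[0]; i++) {
            if (kwlen == strlen(verbs[i].keyword) &&
                strncmp(line, verbs[i].keyword, kwlen) == 0) {
                verbs[i].handler(rest ? rest + 1 : "");
                return;
            }
        }
        printf("?unrecognized statement: %s\n", line);
    }

    int main(void)
    {
        dispatch("PRINT HELLO, WORLD");
        dispatch("GOTO 100");
        dispatch("FROB X");            /* not in the table */
        return 0;
    }
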
On the other hand, other languages are nearly completely context driven with
few or no reserved words (PL/I being the classic case: something like
IF IF = THEN THEN THEN = ELSE; is legal, since keywords can also be used
as identifiers).
Then there is the matter of how macro processing fits in to the parsing problem.
There is the simplistic unix model where that is a completely separate phase,
but language evolution and the need to support source debugging have made the
preprocessing more involved with the base language parsing. And then there
is BLISS where the macro facility is an integral part of the language. (BLISS
structures are a specialized form of macro).
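
One way to picture macro processing that is bound up with the base-language
scanning (rather than a separate textual pass) is a lexer that expands macros
on the fly, so the parser above it only ever sees the expanded token stream.
A toy sketch, with an invented macro table and single-token macro bodies:

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    /* Toy macro table: identifier -> replacement text. */
    static const struct { const char *name; const char *body; } macros[] = {
        { "BUFSIZE", "512" },
        { "MAXLINE", "BUFSIZE" },       /* macros can expand to other macros */
    };

    static const char *src;             /* the "real" input                  */
    static const char *pending = "";    /* replacement text to rescan first;
                                           a single buffer, so bodies here are
                                           one token; a real implementation
                                           would keep a stack of these        */

    /* Return the next whitespace-delimited token, expanding macros as we go. */
    static const char *get_token(void)
    {
        static char tok[64];
        size_t i, n = 0;

        for (;;) {
            const char **in = *pending ? &pending : &src;
            while (**in && isspace((unsigned char)**in))
                (*in)++;
            if (!**in) {
                if (in == &pending) continue;   /* pending drained; use source */
                return NULL;                    /* real end of input           */
            }
            while (**in && !isspace((unsigned char)**in) && n < sizeof tok - 1)
                tok[n++] = *(*in)++;
            tok[n] = '\0';

            /* If the token names a macro, push its body and rescan. */
            for (i = 0; i < sizeof macros / sizeof macros[0]; i++)
                if (strcmp(tok, macros[i].name) == 0) {
                    pending = macros[i].body;
                    break;
                }
            if (i == sizeof macros / sizeof macros[0])
                return tok;
            n = 0;                              /* it was a macro: scan again */
        }
    }

    int main(void)
    {
        const char *t;
        src = "buf = alloc ( MAXLINE )";
        while ((t = get_token()) != NULL)
            printf("%s ", t);                   /* prints: buf = alloc ( 512 ) */
        putchar('\n');
        return 0;
    }
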
Or how about TECO.
Oh, and before you say, `DCL, shells, and TECO are interpretive and don't
count', well there are some strong justifications for compiling them;
unfortunately the ad hoc ways that they evolved make that difficult, but not
impossible. REXX compilers are available, along with debuggers and other
development tools.
|
374.5 | an interesting recent report | MOVIES::HANCOCK | | Wed Mar 16 1994 04:27 | 6 |
| > Then there is the matter of how macro processing fits
> in to the parsing problem.
For a very interesting perspective on this, see SRC report 121,
`Extensible syntax with lexical scoping'.
|