
Conference turris::languages

Title:Languages
Notice:Speaking In Tongues
Moderator:TLE::TOKLAS::FELDMAN
Created:Sat Jan 25 1986
Last Modified:Wed May 21 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:394
Total number of notes:2683

374.0. "Some general compiler questions..." by NEMEA1::MOLLOY () Mon Mar 07 1994 09:10

    Hi,
    
    I have a general-interest question, which arose from the compiler
    project I'm working on in school.
    
    In general, how is the lexical analysis/parsing done?  I'm not
    looking for detailed information, just a general picture.  For
    example, for languages such as BLISS, C, C++, Ada, ... is a
    generic tool used, something like LEX/YACC in U*ix space or
    LIB$TPARSE in VMS space, or is the parser hand-crafted to obtain
    optimal performance?
    
    The reason I'm asking is that I'm not sure which approach to use:
    lex/yacc or similar utilities, which are easy to use but may cost
    some efficiency, or a more specialized hand-written scanner/parser,
    which may be more efficient but takes longer to develop.  Right now,
    I'm leaning toward gaining as much efficiency as possible.
    
    Any input would be appreciated, thanks.
    
    	-- John
374.1. "Don't sweat the efficiency" by TLE::JBISHOP () Mon Mar 07 1994 10:24 (36 lines)
    Well, there are several considerations when building a parser:
    
    1.	Ease of initial coding;
    
    2.	Ease of modification if the grammar changes;
    
    3.	Ease of use--does the parser recognize chunks that
    	make sense, or odd fragments; can error-recovery be
    	put in easily; can error-repair ("PL/1 Uses...") be
    	put in;
    
    4.	Efficiency at run-time.
    
    The last is very much a minor point.  Any parser is going
    to have to look at the input (O(n), plus all the file open and
    read overhead) and build an internal form (tree/symbol table
    or whatever, also O(n)).  So we're arguing about the "extra"
    part, which isn't going to make much difference.
    
    Generally people don't pay enough attention to point three
    and pay too much to points one and two.  Point two in particular
    gets far too much credit--real grammar changes are rare, as
    most language changes just add new operators or attributes.
    
    Of the compilers I've worked on, VAX Pascal is recursive-descent,
    because it's easy to code and modify and corresponds closely 
    to the semantics in the heads of the developers; the various 
    BLISSes are also recursive-descent.  But the drawback is that
    error-recovery is hard, and there's no boost from using a
    generator tool.  I worked on a debugger (F77DEBUG), which used
    a tool called Galileo to generate a parser--I think it was
    LL(1) but don't know for sure.  Outside of my direct experience,
    I know that DCE IDL uses Yacc and Lex, as I've seen the input
    files to those tools.
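    
    To make the recursive-descent idea concrete, here's a toy fragment
    in C (my own sketch, not code from any of the compilers above):
    one function per grammar rule, each consuming tokens and calling
    the others.  All the names are invented for illustration.

        /* Toy recursive-descent parser/evaluator for the grammar
               expr   -> term { '+' term }
               term   -> factor { '*' factor }
               factor -> NUMBER | '(' expr ')'                        */

        #include <ctype.h>
        #include <stdio.h>
        #include <stdlib.h>

        static const char *p;                  /* current input position */

        static void error(const char *msg)
        {
            fprintf(stderr, "parse error: %s at '%s'\n", msg, p);
            exit(1);
        }

        static int expr(void);                 /* forward declaration */

        static int factor(void)
        {
            if (isdigit((unsigned char)*p)) {
                int v = 0;
                while (isdigit((unsigned char)*p))
                    v = v * 10 + (*p++ - '0');
                return v;
            }
            if (*p == '(') {
                int v;
                p++;                           /* consume '(' */
                v = expr();
                if (*p++ != ')')
                    error("expected ')'");
                return v;
            }
            error("expected number or '('");
            return 0;
        }

        static int term(void)
        {
            int v = factor();
            while (*p == '*') {
                p++;                           /* consume '*' */
                v *= factor();
            }
            return v;
        }

        static int expr(void)
        {
            int v = term();
            while (*p == '+') {
                p++;                           /* consume '+' */
                v += term();
            }
            return v;
        }

        int main(void)
        {
            p = "2+3*(4+5)";
            printf("%d\n", expr());            /* prints 29 */
            return 0;
        }

    Each grammar rule becomes a function, which is why this style
    corresponds so closely to the semantics in the developers' heads.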
    
    		-John Bishop
374.2. "my 2¢" by AUSSIE::GARSON (Hotel Garson: No Vacancies) Mon Mar 07 1994 22:26 (41 lines)
re .0
    
>   In general, how is the lexical analysis/parsing  done?
    
    I would guess that amongst Digital's suite of compilers a significant
    determining factor in the method used would be history.
    
    Nevertheless, the first thing you should look at is whether your
    grammar allows the proposed parsing method.
    
    e.g. VAX Pascal uses recursive descent with some lookahead, because
    certain constructs of VAX Pascal can't be parsed without it.
    For a sufficiently awkward grammar, recursive descent could be ruled out
    and you might *need* to use a table-driven LALR parser (with tables
    generated automatically).
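    
    To show what "table-driven" means in practice, here is a toy
    shift/reduce driver in C (my own sketch; the tables are hand-built
    for the trivial grammar E -> E '+' n | n purely to show the
    mechanism, whereas a real LALR generator would emit them):

        #include <stdio.h>

        enum { TOK_N, TOK_PLUS, TOK_EOF };         /* terminal symbols   */
        enum { ERR, SHIFT, REDUCE, ACCEPT };       /* action kinds       */

        struct action { int kind; int arg; };      /* arg: state or rule */

        /* action[state][terminal] */
        static const struct action action_tab[5][3] = {
            /* state 0 */ { {SHIFT,2}, {ERR,0},    {ERR,0}    },
            /* state 1 */ { {ERR,0},   {SHIFT,3},  {ACCEPT,0} },
            /* state 2 */ { {ERR,0},   {REDUCE,2}, {REDUCE,2} },
            /* state 3 */ { {SHIFT,4}, {ERR,0},    {ERR,0}    },
            /* state 4 */ { {ERR,0},   {REDUCE,1}, {REDUCE,1} },
        };

        /* goto for the single nonterminal E (only state 0 needs it) */
        static const int goto_tab[5] = { 1, -1, -1, -1, -1 };

        /* rule 1: E -> E '+' n (3 symbols);  rule 2: E -> n (1 symbol) */
        static const int rule_len[3] = { 0, 3, 1 };

        int main(void)
        {
            int input[] = { TOK_N, TOK_PLUS, TOK_N, TOK_PLUS, TOK_N, TOK_EOF };
            int stack[64], sp = 0, ip = 0;

            stack[sp] = 0;                         /* start state */
            for (;;) {
                struct action a = action_tab[stack[sp]][input[ip]];
                if (a.kind == SHIFT) {
                    stack[++sp] = a.arg;           /* push new state    */
                    ip++;                          /* consume the token */
                } else if (a.kind == REDUCE) {
                    sp -= rule_len[a.arg];         /* pop the handle    */
                    stack[sp + 1] = goto_tab[stack[sp]];
                    sp++;                          /* push goto state   */
                    printf("reduced by rule %d\n", a.arg);
                } else if (a.kind == ACCEPT) {
                    printf("accepted\n");
                    return 0;
                } else {
                    printf("syntax error at token %d\n", ip);
                    return 1;
                }
            }
        }

    The driver never changes; only the tables do, which is why this
    approach copes with grammars that defeat a hand-written parser.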
    
    The fact that spaces are not significant in FORTRAN (and certain other
    idiosyncrasies) means that it *requires* a more specialised scanner/parser
    than languages like C, Pascal, Ada.  (The classic example: "DO 10 I = 1.5"
    is an assignment to a variable named DO10I, while "DO 10 I = 1,5" starts a
    DO loop, so the scanner can't even tokenise the line without looking well
    ahead.)
        
>   Right now, I'm leaning toward gaining as much efficiency as possible.
    
    I can't see the justification for this, particularly for a school project
    rather than a commercial product.
    
    In my personal experience it's more fun doing your compiler in a way
    that allows you to add new features quickly so that you can one-up your
    fellow students. (-:
    
    MHO is that the trend these days in commercial compilers is towards
    harder-working compilers that produce better code, i.e. run-time speed
    is valued more highly than compile-time speed.
    
    Another significant consideration for a compiler is ease of portability
    and ease of retargeting.
    
>    or lib$tparse for VMS space;
    
    For the record, I don't think that LIB$TPARSE is a good way of parsing
    any serious-sized language. lex and yacc are probably available under
    VMS via POSIX if you decide to use them and your platform is VMS.
374.3. "some more general questions..." by STAR::MOLLOY () Tue Mar 08 1994 09:27 (33 lines)
Thank you for the responses.  In reference to note 374.2:

>>   Right now, I'm leaning toward gaining as much efficiency as possible.
    
>    I can't see the justification for this, particularly for a school project
>    rather than a commercial product.
    
>    In my personal experience it's more fun doing your compiler in a way
>    that allows you to add new features quickly so that you can one-up your
>    fellow students. (-:
 
Well, I can understand what you are saying; however, this is a graduate-level
course, with the potential of becoming a PhD thesis, so I'm not really looking
to one-up my fellow students.  I'm interested in the theory, and hope to get an
understanding of its relevance to commercial applications.  I haven't had any
undergraduate experience in compiler design; however, I have a genuine interest
in compiler technology, especially in optimization theory, in both the run-time
and compile-time environments, and in its applicability to highly parallel
architectures.

Anyway, the justification for my question was mainly to understand the reasoning
behind the parsing and design choices in some of the commercial compilers.
For a school environment, if generality were the only concern, then why not use
a Turing machine to do the parsing, since it's more general in the Chomsky
hierarchy?  I realize that ease of coding is one of the driving issues; however,
I just want to get a better understanding of the theory with regard to
time/space tradeoffs.

Overall though, I would guess that history/folklore does govern the way 
projects are implemented.  Thanks again.

      --John Molloy

374.4. "Take a look at DCL ;-)" by CVG::PETTENGILL (mulp) Wed Mar 09 1994 23:53 (27 lines)
Take a look at DCL (and CDU) and then ask the question, `what were the driving
forces behind their parsers?'.  In contrast, look at the unix shells; tcl source
is readily available and has a lot in common with DCL (easy ;-) to extend).

The unix shells have a lot in common with the original BASIC.  How do you
figure out what kind of statement you are processing?  Simple: look at the
first token (which can't be abbreviated).  BASIC, a bunch of other similar
languages (FOCAL-8 comes to mind), and the shells really simplified things
by the form of the language.
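
Here's a toy C sketch of that first-token dispatch style; the statement
names and handler routines are invented for illustration:

    #include <stdio.h>
    #include <string.h>

    static void parse_print(const char *rest) { printf("PRINT: %s\n", rest); }
    static void parse_goto(const char *rest)  { printf("GOTO:  %s\n", rest); }
    static void parse_let(const char *rest)   { printf("LET:   %s\n", rest); }

    static void parse_statement(const char *line)
    {
        char keyword[16] = "";
        int n = 0;

        /* The first token alone decides the statement type; no lookahead. */
        sscanf(line, "%15s %n", keyword, &n);

        if      (strcmp(keyword, "PRINT") == 0) parse_print(line + n);
        else if (strcmp(keyword, "GOTO")  == 0) parse_goto(line + n);
        else if (strcmp(keyword, "LET")   == 0) parse_let(line + n);
        else    printf("unknown statement: %s\n", line);
    }

    int main(void)
    {
        parse_statement("PRINT \"HELLO\"");
        parse_statement("GOTO 100");
        parse_statement("LET A = 1");
        return 0;
    }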

On the other hand, other languages are nearly completely context driven with
few or no reserved words (PL/I, where even IF and THEN can be used as variable
names, so the first token tells you very little).

Then there is the matter of how macro processing fits into the parsing problem.
There is the simplistic unix model where that is a completely separate phase,
but language evolution and the need to support source debugging have made the
preprocessing more involved with the base language parsing.  And then there
is BLISS, where the macro facility is an integral part of the language.  (BLISS
structures are a specialized form of macro.)

Or how about TECO?

Oh, and before you say, `DCL, shells, and TECO are interpretive and don't
count', well, there are some strong justifications for compiling them;
unfortunately the ad hoc ways that they evolved make that difficult, but not
impossible.  REXX compilers are available, along with debuggers and other
development tools.
374.5. "an interesting recent report" by MOVIES::HANCOCK () Wed Mar 16 1994 04:27 (6 lines)
> Then there is the matter of how macro processing fits
> in to the parsing problem.

For a very interesting perspective on this, see SRC report 121,
`Extensible syntax with lexical scoping'.