T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
374.1 | Don't sweat the efficiency | TLE::JBISHOP | | Mon Mar 07 1994 10:24 | 36 |
| Well, there are several considerations when building a parser:
1. Ease of initial coding;
2. Ease of modification if the grammar changes;
3. Ease of use--does the parser recognize chunks that
make sense, or odd fragments; can error-recovery be
put in easily; can error-repair ("PL/I uses...") be
put in;
4. Efficiency at run-time.
The last is very much a minor point. Any parser is going
to have to look at the input (O(n) and all the file open and
read overhead) and build an internal form (tree/symbol table
or whatever, also O(n)). So we're arguing about the "extra"
part, which isn't going to make much difference.
Generally people don't seem to pay enough attention to point
three and too much to points one and two. Two in particular
gets far too much credit--real grammar changes are rare, as
most language changes are just adding new operators or attributes.
Of the compilers I've worked on, VAX Pascal is recursive-descent,
because it's easy to code and modify and corresponds closely
to the semantics in the heads of the developers; the various
BLISSes are also recursive-descent. But the drawback is that
error-recovery is hard, and there's no boost from using a
generator tool. I worked on a debugger (F77DEBUG), which used
a tool called Galileo to generate a parser--I think it was
LL(1) but don't know for sure. Outside of my direct experience,
I know that DCE IDL uses Yacc and Lex, as I've seen the input
files to those tools.
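
For anyone who hasn't written one, here's roughly the shape of recursive
descent; just a toy sketch for a little expression grammar, not code from
any of the compilers above, with an invented single-character lexer to keep
it short:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy grammar (one C routine per nonterminal, so the code mirrors
       the grammar and the semantics in your head):
           expr   ::= term { '+' term }
           term   ::= factor { '*' factor }
           factor ::= DIGIT | '(' expr ')'
       Single-character tokens and single-digit numbers only. */

    enum { DIGIT = 256, END = 257 };

    static int current;                       /* one token of lookahead */

    static void next_token(void)
    {
        int c = getchar();
        while (c == ' ' || c == '\n') c = getchar();
        if (c == EOF)                  current = END;
        else if (c >= '0' && c <= '9') current = DIGIT;
        else                           current = c;   /* '+', '*', '(', ')' */
    }

    static void error(const char *msg)
    {
        fprintf(stderr, "syntax error: %s\n", msg);
        exit(1);
    }

    static void expr(void);

    static void factor(void)
    {
        if (current == DIGIT) {
            next_token();
        } else if (current == '(') {
            next_token();
            expr();
            if (current != ')') error("expected ')'");
            next_token();
        } else {
            error("expected digit or '('");
        }
    }

    static void term(void)
    {
        factor();
        while (current == '*') { next_token(); factor(); }
    }

    static void expr(void)
    {
        term();
        while (current == '+') { next_token(); term(); }
    }

    int main(void)
    {
        next_token();
        expr();
        if (current != END) error("junk after expression");
        puts("parsed OK");
        return 0;
    }
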
-John Bishop
|
374.2 | my 2¢ | AUSSIE::GARSON | Hotel Garson: No Vacancies | Mon Mar 07 1994 22:26 | 41 |
| re .0
> In general, how is the lexical analysis/parsing done?
I would guess that amongst Digital's suite of compilers a significant
determining factor in the method used would be history.
Nevertheless the first thing that you should look at is whether your
grammar allows the proposed parsing method.
e.g. VAX Pascal uses recursive descent with some lookahead, because certain
constructs in the language can't be parsed without it.
For a sufficiently awkward grammar, recursive descent could be ruled out
and you might *need* to use a table driven LALR parser (with tables
generated automatically).
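
To show what "table driven, with tables generated automatically" means in
practice, here is a hand-worked sketch of an LR driver loop for the tiny
grammar E ::= E '+' n | n. The ACTION/GOTO tables below were filled in by
hand purely for illustration; for a real language you would let yacc or a
similar generator produce them:

    #include <stdio.h>

    /* Productions:  1: E ::= E '+' 'n'     2: E ::= 'n' */

    enum { ERR, SH, RED, ACC };                 /* kinds of parser action */
    struct action { int kind; int arg; };       /* arg = state or production */

    /* Columns: 0 = 'n', 1 = '+', 2 = end of input. */
    static const struct action ACTION[5][3] = {
        /* 0 */ { {SH,2},  {ERR,0}, {ERR,0} },
        /* 1 */ { {ERR,0}, {SH,3},  {ACC,0} },
        /* 2 */ { {ERR,0}, {RED,2}, {RED,2} },
        /* 3 */ { {SH,4},  {ERR,0}, {ERR,0} },
        /* 4 */ { {ERR,0}, {RED,1}, {RED,1} },
    };
    static const int GOTO_E[5]  = { 1, -1, -1, -1, -1 };  /* goto on E */
    static const int RHS_LEN[3] = { 0, 3, 1 };            /* production sizes */

    static int column(char c)
    {
        return c == 'n' ? 0 : c == '+' ? 1 : 2;  /* anything else = end */
    }

    /* The driver loop is the same few lines for any LALR grammar;
       only the tables change. */
    static int parse(const char *input)
    {
        int stack[100], sp = 0, i = 0;
        stack[0] = 0;
        for (;;) {
            struct action a = ACTION[stack[sp]][column(input[i])];
            switch (a.kind) {
            case SH:  stack[++sp] = a.arg; i++;           break;  /* shift  */
            case RED: sp -= RHS_LEN[a.arg];                       /* reduce */
                      stack[sp + 1] = GOTO_E[stack[sp]];
                      sp++;                               break;
            case ACC: return 1;                                   /* accept */
            default:  return 0;                                   /* error  */
            }
        }
    }

    int main(void)
    {
        printf("n+n+n : %s\n", parse("n+n+n") ? "accepted" : "rejected");
        printf("n+    : %s\n", parse("n+")    ? "accepted" : "rejected");
        return 0;
    }
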
The fact that spaces are not significant in FORTRAN (and certain other
idiosyncrasies) means that it *requires* a more specialised scanner/parser
than languages like C, Pascal, Ada. (The classic example: DO10I=1.5 is an
assignment to a variable named DO10I, while DO10I=1,5 begins a DO loop, and
the scanner can't tell which until it reaches the period or the comma.)
> Right now, I'm leaning toward gaining as much efficiency as possible.
I can't see the justification for this, particularly for a school project
rather than a commercial product.
In my personal experience it's more fun doing your compiler in a way
that allows you to add new features quickly so that you can one-up your
fellow students. (-:
MHO is that the trend these days in commercial compilers is towards
harder working compilers that produce better code i.e. run-time speed
is valued more highly than compile-time speed.
Another significant consideration for a compiler is ease of portability
and ease of retargeting.
> or lib$tparse for VMS space;
For the record, I don't think that LIB$TPARSE is a good way of parsing
any serious-sized language. lex and yacc are probably available under
VMS via POSIX if you decide to use them and your platform is VMS.
|
374.3 | some more general questions... | STAR::MOLLOY | | Tue Mar 08 1994 09:27 | 33 |
| Thank you for the responses. In reference to note 374.2:
>> Right now, I'm leaning toward gaining as much efficiency as possible.
> I can't see the justification for this, particularly for a school project
> rather than a commercial product.
> In my personal experience it's more fun doing your compiler in a way
> that allows you to add new features quickly so that you can one-up your
> fellow students. (-:
Well, I can understand what you are saying; however, this is a graduate-level
course, with the potential of becoming a PhD thesis, so I'm not really looking
to one-up my fellow students. I'm interested in theory, and hope to get an
understanding of its relevance to commercial applications. I haven't had any
undergraduate experience in compiler design; however, I have a genuine
interest in compiler technology, and especially in optimization theory, both
from a run-time and a compile-time perspective, and in its applicability to
highly parallel architectures.
Anyway, the motivation behind my question was mainly to understand the
reasoning behind the parsing and design choices made in some of the
commercial compilers. For a school environment, if generality is the question, then why
not use a Turing machine to execute the parsing, since it's more general in
the Chomsky hierarchy? I realize the ease of coding is one of the driving
issues. However, I just want to get a better understanding of theory with
regard to time/space tradeoffs.
Overall though, I would guess that history/folklore does govern the way
projects are implemented. Thanks again.
--John Molloy
|
374.4 | Take a look at DCL ;-) | CVG::PETTENGILL | mulp | Wed Mar 09 1994 23:53 | 27 |
| Take a look at DCL (and CDU) and then ask the question, `what were the driving
forces behind their parsers?'. In contrast, look at unix shells; tcl source
is readily available and has a lot in common with DCL (easy ;-) to extend).
The unix shells have a lot in common with the original BASIC. How do you
figure out what kind of statement you are processing? Simple, look at the
first token (which can't be abbreviated). BASIC, a bunch of other similar
languages (FOCAL-8 comes to mind), and the shells really simplified things
by the form of the language.
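
As a sketch (the verb table here is invented, not DCL's or any real shell's),
that style of statement recognition is basically just:

    #include <stdio.h>
    #include <string.h>

    /* Each statement kind gets its own handler; the "parser" is just a
       lookup on the first word of the line. */
    static void do_print(const char *rest) { printf("print: %s\n", rest); }
    static void do_goto (const char *rest) { printf("goto:  %s\n", rest); }
    static void do_let  (const char *rest) { printf("let:   %s\n", rest); }

    static const struct {
        const char *keyword;
        void (*handler)(const char *rest);
    } verbs[] = {
        { "PRINT", do_print },
        { "GOTO",  do_goto  },
        { "LET",   do_let   },
    };

    static void dispatch(const char *line)
    {
        const char *rest = strchr(line, ' ');
        size_t kwlen = rest ? (size_t)(rest - line) : strlen(line);
        size_t i;

        for (i = 0; i < sizeof verbs / sizeof verbs[0]; i++) {
            if (kwlen == strlen(verbs[i].keyword) &&
                strncmp(line, verbs[i].keyword, kwlen) == 0) {
                verbs[i].handler(rest ? rest + 1 : "");
                return;
            }
        }
        printf("?unrecognized statement: %s\n", line);
    }

    int main(void)
    {
        dispatch("PRINT HELLO, WORLD");
        dispatch("GOTO 100");
        dispatch("FROB X");            /* not in the table */
        return 0;
    }
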
On the other hand, other languages are nearly completely context driven with
few or no reserved words (PL/I being the classic case: something like
IF IF = THEN THEN THEN = ELSE; is legal, since keywords can also be used
as identifiers).
Then there is the matter of how macro processing fits in to the parsing problem.
There is the simplistic unix model where that is a completely separate phase,
but language evolution and the need to support source debugging have made the
preprocessing more involved with the base language parsing. And then there
is BLISS where the macro facility is an integral part of the language. (BLISS
structures are a specialized form of macro).
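
One way to picture macro processing that is bound up with the base-language
scanning (rather than a separate textual pass) is a lexer that expands macros
on the fly, so the parser above it only ever sees the expanded token stream.
A toy sketch, with an invented macro table and single-token macro bodies:

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    /* Toy macro table: identifier -> replacement text. */
    static const struct { const char *name; const char *body; } macros[] = {
        { "BUFSIZE", "512" },
        { "MAXLINE", "BUFSIZE" },       /* macros can expand to other macros */
    };

    static const char *src;             /* the "real" input                  */
    static const char *pending = "";    /* replacement text to rescan first;
                                           a single buffer, so bodies here are
                                           one token; a real implementation
                                           would keep a stack of these        */

    /* Return the next whitespace-delimited token, expanding macros as we go. */
    static const char *get_token(void)
    {
        static char tok[64];
        size_t i, n = 0;

        for (;;) {
            const char **in = *pending ? &pending : &src;
            while (**in && isspace((unsigned char)**in))
                (*in)++;
            if (!**in) {
                if (in == &pending) continue;   /* pending drained; use source */
                return NULL;                    /* real end of input           */
            }
            while (**in && !isspace((unsigned char)**in) && n < sizeof tok - 1)
                tok[n++] = *(*in)++;
            tok[n] = '\0';

            /* If the token names a macro, push its body and rescan. */
            for (i = 0; i < sizeof macros / sizeof macros[0]; i++)
                if (strcmp(tok, macros[i].name) == 0) {
                    pending = macros[i].body;
                    break;
                }
            if (i == sizeof macros / sizeof macros[0])
                return tok;
            n = 0;                              /* it was a macro: scan again */
        }
    }

    int main(void)
    {
        const char *t;
        src = "buf = alloc ( MAXLINE )";
        while ((t = get_token()) != NULL)
            printf("%s ", t);                   /* prints: buf = alloc ( 512 ) */
        putchar('\n');
        return 0;
    }
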
Or how about TECO.
Oh, and before you say, `DCL, shells, and TECO are interpretive and don't
count', well there are some strong justifications for compiling them;
unfortunately the ad hoc ways that they evolved make that difficult, but not
impossible. REXX compilers are available, along with debuggers and other
development tools.
|
374.5 | an interesting recent report | MOVIES::HANCOCK | | Wed Mar 16 1994 04:27 | 6 |
| > Then there is the matter of how macro processing fits
> in to the parsing problem.
For a very interesting perspective on this, see SRC report 121,
`Extensible syntax with lexical scoping'.
|