| An obvious suggestion is to add another production to the grammar:
    dummy_start_symbol : real_start_symbol_1
                       | real_start_symbol_2
                       ;
This implies that the input determines which start symbol to use, as opposed
to having some other controlling aspect of the program make that determination.
Depending on the grammar, this may work, or it may generate conflicts. If it
generates conflicts, then the resulting grammar isn't adequate for yacc to
determine which start symbol to use.
ydb (a third-party, yacc-compatible parser generator) allows you to have multiple
parsers within the same image. Using ydb, you could have two almost
identical grammars (differing only in the selection of the start symbol) and
two parsers at run time. This would have the extra overhead of two parse
tables. I don't believe yacc allows this, because its parser hardwires the
names of the tables, but you could certainly hack around that.
Or you could apply your suggestion. If your lexer is written cleanly, it
shouldn't be too difficult to explain and maintain.
Dare I ask why you're trying to do this? It seems somewhat unusual to want to
change the start symbol, though I could probably come up with some plausible
reasons for doing it.
Gary
|
| Re: .-1:
Well, we have two different syntaxes, and each one contains the
other indirectly, but the start points for the syntaxes *are* different.
The actual situation is more complex, but consider one nonterminal
which represents a boolean expression, another which represents a
decision tree (which contains boolean expressions), and yet another
which contains a list of expressions embedded inside tags intermixed
with normal text. (i.e. boilerplate substitution where each tag
contains arbitrary expressions - possibly including a boolean
expression.) Boolean expressions can involve expressions which are
also found in the context of the boilerplate expressions, so there is an
interdependency. (It's likely that the differentiation between a
boolean expression and an arbitrary expression won't be expressed
syntactically, but rather semantically, but that doesn't alleviate my
major problem which is the decision tree vs. boolean expression
determination.)
The three cases are distinct, and while your suggestion about having
a single start symbol with the (n) nonterminals as productions might
work, it would not allow the parser to detect a contextually incorrect
syntax, and would instead require a semantic check once the parse is done.
(Well, you could set some flag somewhere which the code for the
productions check and raise an error if something bad happens, but that
sounds like just as much of a hack as sticking a fake token in the
input stream.)
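
For what it's worth, the fake-token hack might look roughly like the sketch
below. This is only an illustration under assumptions: the token names
(START_BOOLEAN and friends), select_start(), and real_yylex() are all made up
here, and real_yylex() is a stand-in for the actual scanner. The caller picks
the start symbol, and the yylex() wrapper hands the parser one pseudo-token
before any real input.

```c
#include <assert.h>

/*
 * Assumed grammar shape (hypothetical nonterminal names):
 *
 *   start : START_BOOLEAN       boolean_expr
 *         | START_DECISION_TREE decision_tree
 *         | START_BOILERPLATE   boilerplate
 *         ;
 */

/* Hypothetical token codes; in a real build these come from y.tab.h. */
enum {
    TOK_EOF = 0,
    START_BOOLEAN = 258,    /* pseudo-token: parse a boolean expression */
    START_DECISION_TREE,    /* pseudo-token: parse a decision tree */
    START_BOILERPLATE,      /* pseudo-token: parse tagged boilerplate */
    TOK_IDENT
};

/* Pseudo-token waiting to be delivered, or 0 when none is queued. */
static int pending_start = 0;

/* Stand-in for the real scanner: yields one identifier, then end of input. */
static int real_calls = 0;
static int real_yylex(void)
{
    return real_calls++ == 0 ? TOK_IDENT : TOK_EOF;
}

/* Call this before yyparse() to choose which syntax to parse. */
void select_start(int start_token)
{
    assert(start_token >= START_BOOLEAN && start_token <= START_BOILERPLATE);
    pending_start = start_token;
}

/* The lexer entry point the parser calls. */
int yylex(void)
{
    if (pending_start != 0) {
        int tok = pending_start;
        pending_start = 0;      /* deliver the pseudo-token only once */
        return tok;
    }
    return real_yylex();
}
```

Since each alternative of the start rule begins with a different pseudo-token,
the parser commits to one syntax on the very first token, so a contextually
wrong input shows up as an ordinary syntax error rather than needing a
semantic check afterwards.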
I've thought about it, and I guess the technique I outlined sounds
like the only really possible one, given the tools available. Buying a
3rd party tool sounds like a lot more work than a minor hack. Thanks
anyways!
-mjg
|