PPG: a parser generator for extensible grammars
PPG is a parser generator for extensible grammars, based on the CUP
parser generator. It provides the ability to extend an existing base
language grammar written in CUP or PPG with localized, easily
maintained changes.
Distribution of PPG
PPG was designed and written by Michael Brukman and Andrew C. Myers
at Cornell University. It is part of the Polyglot Java extensible
compiler toolkit. It can also be obtained separately.
Questions about PPG should be directed to Andrew Myers
(myers@cs.cornell.edu).
PPG syntax
The PPG grammar syntax extends that of CUP with several new declarations:
- The first line of a PPG specification is
include "filename"
where filename is a relative path to the inherited specification.
A PPG spec can include a CUP spec or another PPG spec. It is an error
for the chain of included files to contain a cycle.
- A PPG grammar specification can modify the inherited grammar
using the following commands:
- precedence [ left | right | nonassoc ] tokenlist;
specifies new precedence rules, using exactly the same syntax as
CUP. New precedence rules in a PPG specification replace the
precedence rules of the inherited grammar.
- precedence;
deletes all precedence rules from the inherited grammar. It is
mutually exclusive with the form above: use the form above to define
new precedence rules, and this statement only to remove the
inherited ones.
- drop { symbol }
where symbol is an inherited terminal or nonterminal.
The specified symbol is removed from the grammar, and if it is a
nonterminal, all productions with it on the left-hand side are also
eliminated. It is an error to drop a nonterminal without also
dropping the productions that mention it on the right-hand side.
- drop { S ::= <productions> ; }
where S is an inherited nonterminal, and productions are inherited
productions. The specified productions are not inherited from the
base grammar. The nonterminal remains in the grammar, even if ALL of
its productions are dropped in this way; drop { S } must be used if
the nonterminal is to be eliminated.
- override S ::= <productions> ;
The specified productions replace productions of S.
- extend S ::= <productions> ;
The specified productions are added to the nonterminal S.
- transfer S to A1 { rhs1 } to An { rhsn }
where rhsi is one or more right-hand sides of productions of S. Each
of the nonterminals Ai is extended with productions as specified,
and the transferred productions are not inherited by S, so a single
production can be transferred to multiple nonterminals. Note that S
may be one of the Ai, which has the effect of retaining the
productions in S.
- New terminals, nonterminals, and productions can be defined
as in CUP and extend the base grammar.
- PPG supports multiple start symbols in grammars. To specify
which symbols may be used as start symbols, PPG provides the
following syntax:
start with S1 func1
...
start with Sn funcn
where each Si is a nonterminal that the parser may start from, and
funci is the function you call to start parsing from Si. PPG
automatically generates a new nonterminal for each of these
declarations to distinguish between the possible start nonterminals,
and patches the grammar accordingly. After the parser class is
instantiated, any of the funci can be called on it to parse the
corresponding part of the grammar.
Note: if you are using multiple inheritance, make sure to specify the
symbols class you are planning to use with the -symbols
switch. See the Running PPG section.
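The rule for drop { symbol } given above (dropping a nonterminal without also dropping the productions that mention it on the right-hand side is an error) can be illustrated with a small Python sketch. This is not PPG's implementation: the dictionary representation of a grammar, the function name dangling_uses, and the toy grammar are all assumptions made for illustration.

```python
# Hypothetical in-memory grammar: nonterminal -> list of right-hand sides,
# each right-hand side being a tuple of symbol names.
def dangling_uses(productions, dropped):
    """Return (lhs, rhs) pairs that still mention a dropped symbol on the right."""
    # Productions whose left-hand side is dropped disappear with the symbol.
    remaining = {lhs: rhss for lhs, rhss in productions.items() if lhs not in dropped}
    return [
        (lhs, rhs)
        for lhs, rhss in remaining.items()
        for rhs in rhss
        if any(sym in dropped for sym in rhs)
    ]


# Toy base grammar, invented for this example.
base = {
    "expr": [("expr", "PLUS", "term"), ("term",)],
    "term": [("NUMBER",), ("cast",)],
    "cast": [("LPAREN", "NAME", "RPAREN", "term")],
}

# Dropping "cast" alone leaves "term ::= cast" dangling, which would be an error.
errors = dangling_uses(base, {"cast"})
```

In this toy grammar the fix would be to also drop the production term ::= cast with the drop { S ::= <productions> ; } form.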
Running PPG
The syntax for invoking PPG is
ppg [-symbols <class name>] <grammar file>
where
- -symbols <class name>
is an optional switch which specifies the name of the constant class,
as in CUP, and is only applicable if you are using multiple inheritance.
You should specify the same class name to PPG as you will to CUP
when you invoke it on the resulting output, because PPG will use that
class name. The default class name is "sym", as it is in CUP.
- <grammar file>
is either a standard CUP file or a PPG-based grammar inheritance file.
The inherited base grammar file is searched for relative to the location
of the PPG source file.
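The lookup rule above, together with the requirement that the chain of included files contain no cycle, can be sketched in Python as follows. This is a minimal illustration, not PPG's actual code: the function include_chain and the read_include callback (which stands in for parsing a file's first line) are hypothetical names.

```python
import os


def include_chain(spec_path, read_include):
    """Follow includes starting at spec_path; return absolute paths, base grammar last.

    read_include(path) is a hypothetical callback that returns the filename
    included by the file at path, or None if the file is a plain CUP grammar.
    """
    chain = []
    seen = set()
    current = os.path.abspath(spec_path)
    while True:
        if current in seen:
            raise ValueError("cyclic include chain at " + current)
        seen.add(current)
        chain.append(current)
        included = read_include(current)
        if included is None:
            return chain
        # The inherited file is resolved relative to the including file's directory.
        current = os.path.normpath(os.path.join(os.path.dirname(current), included))


# Fake file system for the example: path -> included filename (None for CUP files).
fake_files = {
    os.path.abspath("/proj/ext/ext.ppg"): "../base/java.cup",
    os.path.abspath("/proj/base/java.cup"): None,
}
chain = include_chain("/proj/ext/ext.ppg", fake_files.get)
```

Here chain lists the PPG file first and the base CUP grammar last, with the relative path ../base/java.cup resolved against the directory of ext.ppg rather than the current working directory.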
Semantics of PPG
The order in which commands are specified is not important. The resulting
set of productions for any nonterminal N is as follows:
N' = N - drop(N) - transferL(N) + override(N) + transferR(N) +
extend(N) + newProductions(N)
The resulting set of available terminals is simply:
T' = T - drop(T) + newTerminals
where T is the set of terminals from the base grammar, drop(T) is the
set of terminals dropped using the drop command, and newTerminals is the
set of terminals defined using CUP syntax. The resulting set of
nonterminals is computed similarly.
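The two formulas above can be transcribed into Python set operations. The sketch below assumes that transferL(N) denotes the productions transferred out of N and transferR(N) the productions transferred into N; the function and parameter names are invented for illustration, and productions are modeled as tuples of symbol names without semantic actions. It follows the formula literally and does not try to settle how override treats inherited productions that are not named in the command.

```python
def resulting_productions(inherited, dropped=(), transferred_out=(), overriding=(),
                          transferred_in=(), extended=(), new=()):
    """N' = N - drop(N) - transferL(N) + override(N) + transferR(N) + extend(N) + new."""
    removed = set(dropped) | set(transferred_out)
    added = set(overriding) | set(transferred_in) | set(extended) | set(new)
    # Pure set arithmetic, so the order in which commands appear does not matter.
    return (set(inherited) - removed) | added


def resulting_terminals(inherited, dropped=(), new=()):
    """T' = T - drop(T) + newTerminals."""
    return (set(inherited) - set(dropped)) | set(new)


# Toy productions of a nonterminal "expr", invented for this example.
expr = {("expr", "PLUS", "term"), ("term",), ("LPAREN", "NAME", "RPAREN", "expr")}
expr_prime = resulting_productions(
    expr,
    dropped={("LPAREN", "NAME", "RPAREN", "expr")},
    extended={("expr", "MINUS", "term")},
)
terminals_prime = resulting_terminals({"PLUS", "NUMBER", "LPAREN"},
                                      dropped={"LPAREN"}, new={"MINUS"})
```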
For precedence rules:
if "precedence;" is specified by the PPG grammar, the resulting grammar
has no precedence rules, since none are inherited from the base grammar.
If new precedence rules are defined instead, they override the
precedence rules inherited from the base grammar.
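As a rough sketch of the precedence semantics, the Python function below picks the resulting rule list. The function name, the clear flag (standing in for a bare "precedence;" statement), and the tuple encoding of a rule are hypothetical. The final branch, which keeps the inherited rules when the PPG grammar says nothing about precedence, is an assumption based on the general inheritance model rather than something stated explicitly here.

```python
def resulting_precedence(inherited, new_rules=(), clear=False):
    """Return the precedence rules of the extended grammar."""
    if clear and new_rules:
        # The two forms of the precedence statement are mutually exclusive.
        raise ValueError("'precedence;' cannot be combined with new precedence rules")
    if clear:
        return []                 # "precedence;" removes all inherited rules
    if new_rules:
        return list(new_rules)    # new rules replace the inherited ones
    return list(inherited)        # assumption: untouched rules are inherited as-is


# Toy rules, invented for this example: (associativity, tokens).
base_rules = [("left", ("PLUS", "MINUS")), ("left", ("TIMES",))]
replaced = resulting_precedence(base_rules, new_rules=[("right", ("POWER",))])
cleared = resulting_precedence(base_rules, clear=True)
```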