The Purdue Compiler Construction Tool Set
Version 1.06 Update
December 1, 1992

1. Enhancements

This section describes the update of PCCTS Version 1.00 to Version 1.06 (1.01-1.05 were internal releases).  The list of changes here is not complete since many "little" fixes were made (the predicated parsing feature, however, is major).  Here we concentrate on the enhancements to the system; file BUGS100 contains a list of the bug fixes incorporated in the 1.06 release.  In addition, note that the manual has not been updated and will not be until the next major release (or until we get around to it).

1.1. Predicated Parsing - ALPHA Version

Normally, parsing is a function of syntax alone.  Semantics (the meaning or value of the input) are not taken into account when parsing decisions are made; this is context-free parsing.  Languages such as C++ typically have rules which require the lexical analyzer to consult the symbol table to determine which token (e.g. class name versus enumerated name versus typedef name) is given to the parser.  However, lexical analyzers have no context information except the current token of lookahead and must be coerced via flags to yield the various token types.  Such ad hoc approaches become increasingly difficult to implement as the number of ambiguities (due to lack of semantic information) in a grammar rises.

Context dictates the scope of variables in programming languages.  Because only the parser has context information, it is reasonable to assume that the parser is the correct place to maintain and access the symbol table; there are other situations in which it is useful to have context direct parsing.  Unfortunately, most parser generators do not allow grammar actions to influence the parse; i.e. parsing is a function of syntax alone.  Because PCCTS generates recursive-descent LL(k) parsers in C, allowing user-defined C actions to influence parsing is a straightforward and natural addition to the PCCTS description meta-language.  Although bottom-up parsing strategies can allow actions to direct parsing, bottom-up parsers have much less semantic information; i.e. LR and LALR techniques allow only S-attributed grammars whereas LL allows L-attributed grammars (for example, LL always knows which rule it is currently trying to match whereas LALR, in general, does not; see your favorite book on parsing theory).

In this section, we introduce a rudimentary version of predicates, with a full implementation to appear in future releases.  To support context-sensitive parsing, PCCTS 1.06 allows a new action type called a predicate (-pr must be used), which is denoted:

        <<predicate>>?

or, optionally,

        <<predicate>>?[fail_action]

where the second notation passes a fail action as an argument to the predicate operator "?"; the fail action is executed upon predicate failure.  A predicate is a C action that evaluates to either true (success) or false (failure) and can be used both for semantic validation and for disambiguating syntactic conflicts in the underlying grammar.

Validation predicates are used as simple tests and are functionally equivalent to the following normal action:

        if ( !predicate ) {error message; exit rule}

or, when a fail action is present:

        if ( !predicate ) {fail_action}

For example,

        a : A <<pred>>? B ;

matches A, evaluates pred, and matches B if pred is true; otherwise an error is reported (i.e. "failed predicate 'pred'") and rule a is exited before matching B.
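Concretely, the kind of test a validation predicate turns into can be sketched as follows.  This is only an illustration of the behavior described above; the control flow, the error text, and the exit-via-return are assumptions, not the literal code ANTLR emits:

        /* illustrative expansion of:  a : A <<pred>>? B ;
         * (zzmatch/zzCONSUME are the PCCTS match and consume macros)
         */
        a()
        {
                zzmatch(A); zzCONSUME;
                if ( !(pred) ) {
                        /* validation failure: report and exit the rule */
                        fprintf(stderr, "failed predicate 'pred'\n");
                        return;
                }
                zzmatch(B); zzCONSUME;
        }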
To generate a more useful error message, the user may specify a fail action:

        a : A <<pred>>?[fail] B ;

which executes fail when pred evaluates to false.

A predicate which validates a production can, at the same time, be used to disambiguate a syntactically ambiguous grammar structure.  To be a candidate, a predicate must appear on the extreme left edge of a production.  Disambiguating predicates are used in the alternative prediction expressions to enable or disable certain alternative productions of a block (rule or subrule).  Parsing in this new environment can be viewed in this way:

step 1: Disable invalid productions.
        An invalid production is a production whose disambiguating
        predicate evaluates to false.

step 2: Parse as a normal block.
        Once the list of valid productions has been found, they are parsed
        as a normal rule or subrule.

Essentially, disambiguating predicates "gate" alternative productions in and out depending on their "applicability".  Productions without predicates have an implied predicate of <<1>>?; i.e. they are always valid.

A single predicate can be used as both a validation and a disambiguating predicate; ANTLR determines which way to use it automatically.  The role taken by a predicate depends on its location in the grammar.  Predicates are all considered validation predicates as they must always evaluate to true for parsing of the enclosing production to continue.  However, when a grammatical ambiguity is discovered, ANTLR searches for visible predicates to disambiguate the construct.  A visible predicate is one which can be evaluated in a production prediction expression without consuming a token or executing an action (except initialization actions); e.g. all disambiguating predicates appear on the left edge of some production.  Fail actions are not used in prediction expressions because a failed predicate disables a production; it does not report a syntax error unless no other viable production is present.  A viable production is one which is predicted by the lookahead.  The act of moving a predicate into a prediction expression is called hoisting; currently, at most one predicate can be hoisted per ambiguous production.

In general, predicates may only be a function of:

User variables
        This is dangerous as hoisting may evaluate the predicate in a scope
        where the variable(s) do not exist; e.g. using rule parameters can
        be a problem if a predicate that references them is hoisted outside
        of the rule.

Attributes
        Only attributes in the left context can be used; i.e. attributes of
        rules/tokens created before the predicate is evaluated.

Lookahead
        The next k tokens of lookahead can be tested via LA(i) and
        LATEXT(i).

Also, predicates may not have side effects (if they do, the user must undo anything that was changed; in other words, predicates must "backtrack").  See the Future Work section for information regarding the use of predicates and extremely large lookahead buffers.

Consider the following simple grammar:

        a : <<pred1>>? A B
          | <<pred2>>? A C
          ;

This grammar is LL(2) (-k 2) and ANTLR would not report an ambiguity.  However, it is LL(1) ambiguous (-k 1) and antlr t.g would generate a warning message:

        t.g, line 1: warning: alts 1 and 2 of rule a ambiguous upon { A }

Now, if we allow predicates to be used with antlr -pr t.g, then ANTLR finds that both predicates are visible and can be used to disambiguate the production; it is up to the user to ensure that the predicates do, in fact, disambiguate the alternatives.
ANTLR would generate code similar to the following:

        a()
        {
                if ( pred1 && LA(1)==A ) {
                        ...
                }
                else if ( pred2 && LA(1)==A ) {
                        ...
                }
                else error;
        }

The following grammars are syntactically and semantically equivalent to the grammar above, but the predicates are not hoisted trivially:

        a : ( <<pred1>>? A B )
          | <<pred2>>? A C
          ;

ANTLR looks inside the subrule of production one and finds that pred1 can be evaluated before executing an action and before matching a token; it is hoisted for use in the prediction expression of rule a.  Predicates can be hoisted from other rules as well:

        a : b
          | c
          ;
        b : <<pred1>>? A B ;
        c : <<pred2>>? A C ;

Rule a would still have the same production prediction expression.  The following grammar does not have predicates that can be hoisted to disambiguate the prediction:

        a : A <<pred1>>? B              /* cannot hoist past token */
          | <<action>> <<pred2>>? A C   /* cannot hoist past action */
          ;

Now, consider how a simplified version of K&R C variable and function declarations could be specified:

        decl:   /* variable definition, function DECLARATION/DEFINITION */
                type declarator ( func_body | ";" )
            |   /* function DEFINITION */
                declarator func_body
            ;

        type:   "int"
            |   "float"
            |   WORD
            ;

        declarator
            :   WORD { "\(" args "\)" }
            ;

        func_body
            :   ( decl )* "\{" ( decl )* ( stat )* "\}"
            ;

Unfortunately, rule decl is totally ambiguous in the LL(1) sense (and in the LL(k) sense) since WORD can begin both alternatives of rule decl.  Increasing the lookahead to two tokens does not help as the alternatives are ambiguous upon WORD WORD.  Alternative one could match my_type var and alternative two could match func my_type; hence, the second alternative matches at least one invalid sentence.  This is because the ( decl )* subrule of rule func_body, which would match the second WORD of WORD WORD, does not know whether an argument list was seen previously or not.  The question of whether to match a parameter definition list after a declarator is context-sensitive because a function header may or may not have been seen immediately beforehand.

One way in which this subset could be recognized is to recognize a large superset of the language and then, using actions, verify that the input was valid:

        decl:   { type } declarator ( func_body | ";" ) ;

But jamming everything together makes action introduction more complicated.  It is better to specify the input language exactly with predicates.  To overcome the ambiguity in rule decl and to ensure that only a function declarator is followed by a parameter definition, we augment the grammar as follows:

        decl:   <<int is_func;>>
                /* variable definition, function DECLARATION/DEFINITION */
                type declarator > [is_func]
                ( <<is_func>>?[printf("warning: declarator not func\n");]
                  func_body
                | ";"
                )
            |   /* function DEFINITION */
                declarator > [is_func]
                <<is_func>>?[printf("warning: expecting func header not %s\n",
                                    LATEXT(1));]
                func_body
            ;

        /* istype() is assumed to be a user-supplied symbol table query */
        type:   "int"
            |   "float"
            |   <<LA(1)==WORD ? istype(LATEXT(1)) : 1>>? WORD
            ;

        declarator > [int is_func]
            :   <<LA(1)==WORD ? !istype(LATEXT(1)) : 1>>? WORD
                ( "\(" <<$is_func=1;>> args "\)"
                | <<$is_func=0;>>
                )
            ;

        func_body
            :   ( decl )* "\{" ( decl )* ( stat )* "\}"
            ;

In rule type, we add a predicate that ensures that, if the lookahead is a WORD, the WORD is, in fact, a user-defined type; nothing can be said if the lookahead is something else, so we return success in that case, i.e. the predicate does not apply for that lookahead (context).  If rule type is called directly, a failure of the predicate will report a semantic error.  In our example, the predicate is also hoisted for use as a disambiguating predicate in rule decl.
The predicate in rule declarator ensures that, if the lookahead is a WORD, the WORD is not a user-defined type.  As before, it is also hoisted to rule decl to disambiguate the second production of that rule.

Both productions of rule decl have predicates that are hoisted into the prediction expressions for those productions; ANTLR assumes that the hoisted predicates completely disambiguate the decision, which is true in our case.  The first predicate physically present in rule decl ensures that, if a function body is coming up, the declarator was a function header.  The second predicate in rule decl again ensures that a function header was seen before trying to match a function body.  The variable is_func holds the return value of rule declarator, which is set by two simple actions in declarator.

Currently, the context under which predicates are evaluated may not be correct; i.e. ANTLR does not yet compute the context from which it hoists a predicate.  For example,

        a : b B
          | ( X Y | A C )
          ;
        b : <<pred>>? A ;

The predicate pred will be hoisted for use in disambiguating rule a.  However, the predicate will be evaluated regardless of whether A, which is its context, or X is on the input stream.  It may be invalid to apply the predicate when the lookahead is X.  The user may overcome this by changing pred thus:

        a : b B
          | ( X Y | A C )
          ;
        b : <<LA(1)==A ? pred : 1>>? A ;

which will ensure that pred is only evaluated when A is seen on the input stream.

Another problem area lies in the fact that hoisting can occur in situations that will break the semantics.  For example:

        a : b[3]
          | A
          ;
        b[int i]
          : <<f(i)>>? A
          ;

The predicate f(i) will be hoisted to rule a, where the parameter i is not even defined.  Either do not do this or put a predicate (dummy or otherwise) between the decision and the predicate you do not wish to be hoisted:

        a : <<1>>? b[3]
          | A
          ;
        b[int i]
          : <<f(i)>>? A
          ;

This is an alpha release feature in that it is not anywhere near what we want it to do and has NOT been thoroughly tested.  User beware.  We do not guarantee that the syntax of predicates will stay this way; however, the semantics will not change.  We want users to play with predicates to find new, better or different ways of doing things.  We really want user feedback on these critters before a full implementation is developed.  For example, how do you want the parser to respond if no valid, viable production is found?

        a : <<p1>>? A
          | <<p2>>? A
          | B
          ;

If the input were C, a correct error message would be reported:

        line 1: syntax error at "C" missing { A B }

This is the correct message because it is truly a syntactic, as opposed to semantic, error.  On the other hand, upon input A, when predicates p1 and p2 are false, the parser reports the following bizarre message:

        line 1: syntax error at "A"

which is not very meaningful.  The fail function, zzFAIL, found that, in fact, A is syntactically valid and got confused because an error was raised; this is because the fail routine has no semantic information.  A suitable definition of error reporting in the predicate environment must be created.  Please send any suggestions/questions to parrt@ecn.purdue.edu, hankd@ecn.purdue.edu or use the email server mechanism at pccts@ecn.purdue.edu (send a Subject: line of "question antlr" and fill in the body of the email with your suggestion/question).

1.2. 1.06 Is Written in 1.00

The grammar for the PCCTS meta-language (file format) has been implemented in Version 1.00, making heavy use of #lexclass directives.  File lexhelp.c has been eliminated due to the superiority of 1.00 over 1.00B.
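For readers who have not seen #lexclass, the following is a minimal sketch of the mechanism; the class names, token expressions, and actions here are illustrative only and are not taken from antlr.g:

        /* sketch: tokenize comments in a separate lexical class so their
         * contents are not handed to the parser; zzmode() switches classes
         * and zzskip() discards the matched text
         */
        #lexclass START
        #token "/\*"            << zzmode(COMMENT); zzskip(); >>
        #token "[\ \t\n]"       << zzskip(); >>

        #lexclass COMMENT
        #token "\*/"            << zzmode(START); zzskip(); >>
        #token "~[]"            << zzskip(); >>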
1.3. ANTLR Compiles Under ANSI C

Because of the rewrite of the grammar and some rewrites of ANTLR code, ANTLR now compiles with ANSI compilers without a whimper (except for two "unknown escape sequence" warnings).  Your mileage may vary.

1.4. Grammar Files Distributed

antlr.g and dlg_p.g are now included in the source distribution for 1.06.  Note that the 1.00 PCCTS (both ANTLR and DLG) is required to process these grammar files.  DO NOT DESTROY YOUR OLD COPY OF 1.00 PCCTS (at least save the executables).

1.5. Script Generates Makefiles for PCCTS Projects

A C program called genmk.c is available in the support directory of the PCCTS release; it has the following usage:

        genmk project f1.g f2.g ... fn.g

It generates a makefile that creates an executable, project, from a set of grammar files.  For example, genmk t t.g generates:

        #
        # PCCTS makefile for: t.g
        #
        DLG_FILE = parser.dlg
        ERR_FILE = err.c
        HDR_FILE = stdpccts.h
        TOK_FILE = tokens.h
        K = 1
        ANTLR_H = .
        BIN = .
        ANTLR = $(BIN)/antlr
        DLG = $(BIN)/dlg
        CFLAGS = -I. -I$(ANTLR_H)
        AFLAGS = -fe err.c -fh stdpccts.h -fl parser.dlg -ft tokens.h -k $(K) -gk
        DFLAGS = -C2 -i
        GRM = t.g
        SRC = scan.c t.c err.c
        OBJ = scan.o t.o err.o

        t: $(OBJ) $(SRC)
                cc -o t $(CFLAGS) $(OBJ)

        t.c parser.dlg : t.g
                $(ANTLR) $(AFLAGS) t.g

        scan.c : parser.dlg
                $(DLG) $(DFLAGS) parser.dlg scan.c

This program is handy when beginning a new PCCTS project or when first learning about PCCTS.

1.6. DLG Supports Case Insensitive Scanners

DLG has two new options which provide control over the case sensitivity of the generated scanner.  Specifically, case insensitivity implies that when a character is referenced in a regular expression, DLG behaves as if the user had typed both the upper and lower case versions of that character; i.e. (a|A) where a is some character.  The new options are:

        -ci     Make lexical analyzer case insensitive
        -cs     Make lexical analyzer case sensitive (default)

1.7. Delayed Lookahead Fetches in Generated Parser

Normally, PCCTS generates parsers which always have k tokens of lookahead.  This is done by following the strategy that another token is fetched (zzCONSUME) when one is matched (zzmatch).  This can be a problem for actions that need to occur after a token has been matched, but before the next token of lookahead is fetched.  This is somewhat overcome in PCCTS 1.00 by delaying the fetch if an action is found immediately after a token reference.  With the new delayed lookahead scheme, the next token is not consumed until the next match is required.  This means that any action before the next match (not necessarily adjacent to the previous match) will be executed before a lookahead fetch occurs.  Turn on ANTLR option -gk and DLG option -i to enable this feature.  This feature appears to work with ASTs and attributes, with the constraints mentioned below in the incompatibilities section (e.g. use of LA(i) and LATEXT(i) has been restricted).  It has been tested with the C (k>1) and Pascal (k==1) examples provided in the release and with several other large grammars.

This feature is primarily useful for developers of interactive tools.  Previously, it was really hard to get PCCTS to generate truly interactive tools.  It appeared as if the parser was always waiting on a token fetch rather than executing an appropriate action.  E.g. in PCCTS 1.00,

        a : ( "A" "B" "C" "\n" ) <<printf("got it\n");>> ;

would not print "got it" until one token after the newline had been typed.
PCCTS 1.06 will generate parsers that print the message immediately upon the newline and will exit without waiting for another token, as there are no token references following the action.

Another way in which delayed lookahead is useful lies in translators which add symbols to the symbol table that must be examined by a lexical action.  If a lookahead fetch occurs too soon, the lexical action may miss the introduction of a symbol into the symbol table.

This feature is a bit flaky for the moment: LA(i) and LATEXT(i) will generally not have the same values as when the -gk option is not used (for k>=2).  Use attributes to access token values; the lookahead buffer is not really a user object.  If you insist upon accessing the lookahead buffer, use LA(0) and LATEXT(0), which typically access the last token matched and the last text matched, respectively; this is distinguished from LA(1), which means the next token of lookahead.  Accessing the next token of lookahead is invalid because it will not be fetched from the input stream until needed (just before the next decision or match).

1.8. Tutorials Available

With release 1.06, we are distributing both a beginning and an advanced tutorial.  They have not been thoroughly "debugged" but are much better than nothing:

Beginning
        This tutorial introduces the basic functionality of PCCTS by
        example.  The user need not be familiar with parsing theory or
        other compiler tools, but any familiarity reduces the learning
        curve substantially.

Advanced
        Constructing a translator can be viewed as an iterative refinement
        process moving from language recognition to intermediate-form
        transformation.  This tutorial presents one possible sequence of
        refinements.  It uses as many features of PCCTS as is reasonable
        without regard to optimality.  It develops a compiler for a simple
        string manipulation language called sc.  The resulting compiler
        generates code for a simple stack machine.

1.9. Error Messages for k>1

Previous versions of PCCTS did not handle error messages correctly for k>1.  For example, with two tokens of lookahead and the following grammar:

        a : "A" "B" "D"
          | "A" "C" "E"
          ;

an incorrect input of A D D would yield:

        line 1: syntax error at "A" missing A

which is wrong (and incredibly confusing).  The new error mechanism generates the following error message upon the same incorrect input:

        line 1: syntax error at "A D"; "D" not in { B C }

which is infinitely superior.  Unfortunately, situations may arise when even this method will give an invalid message.  This may occur when alternatives have lookahead sequences which are permutations of the same tokens.

The definition of the standard error reporting function, zzsyn(), has been modified.  The parameter list is now:

        void zzsyn(char *text, int tok, char *egroup, unsigned *eset,
                   int etok, int k, char *bad_text);

Users can ignore this as it is transparent to them; unless, of course, the standard error reporting must be modified.  In addition, zzFAIL is now a function rather than a macro.

1.10. Trace Facility Has Exit Macro

Previously, only an entry trace macro was inserted in parsers when the -gd ANTLR option was used.  An exit macro has been added, which resulted in zzTRACE becoming zzTRACEIN.  Also, a default trace macro prints out "enter rule" followed by the rule name if no user trace macros are defined.  To define your own, the macro definitions must appear in the #header action.  As before, the sole argument to the trace routines is a string representing the rule which has been entered or is about to be exited.
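As an illustration only, trace macros might be supplied in the #header action as follows; the macro names and their string argument are described above, but the bodies shown here are an assumption, not the shipped defaults:

        #header <<
        /* user-supplied trace macros; each receives the rule name as a
         * string (stdio.h is already included by the generated parser)
         */
        #define zzTRACEIN(r)    fprintf(stderr, "enter rule %s\n", (r));
        #define zzTRACEOUT(r)   fprintf(stderr, "exit rule %s\n", (r));
        >>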
1.11. Resource Limitation

Occasionally, ANTLR is unable to analyze a grammar submitted by the user.  This rare situation can only occur when the grammar is large and the amount of lookahead is greater than one.  A nonlinear analysis algorithm is used by PCCTS to handle the general case of LL(k) parsing.  The average complexity of analysis, however, is near linear due to some fancy footwork in the implementation which reduces the number of calls to the full LL(k) algorithm.  To avoid the situation where ANTLR takes 23 hours of CPU time and then runs out of virtual memory, use the -rl n resource limit option, where n is the maximum number of tree nodes to be used by the analysis algorithm.  If this limit is reached, an error message is displayed indicating which grammar construct was being analyzed when ANTLR hit a nonlinearity.  Use this option if ANTLR seems to go off to lunch and your disk starts swapping; try n=10000 to start.  Once the offending construct has been identified, try to remove the ambiguity that ANTLR was trying to overcome with large lookahead analysis.  Future versions of PCCTS may incorporate a known algorithm that does not exhibit this exponential behavior.

1.12. Rule Prefix Option

An ANTLR option (-gp) has been added that prefixes all functions corresponding to rules with a user-supplied string.  This can be used to provide symbol hiding in your project, isolating the parser.  It can also be used to allow rule names that correspond to C keywords such as if and typedef.

1.13. Standard PCCTS Header

Two new ANTLR options have been added that control the creation of a standard PCCTS header file, stdpccts.h.  Option -gh instructs ANTLR to create a file (stdpccts.h, unless -fh is used to rename it) which contains all the header information needed by non-PCCTS-generated files that want to access PCCTS symbols.  For example, it indicates the k of the parser, whether trees are being constructed, whether lookahead is to be delayed, and what the user specified in the #header action of the grammar file.  Previously, the user had to manually reconstruct this information from the grammar file in order to place it in a non-PCCTS C file.  The -fh option is merely used to rename stdpccts.h.

1.14. Doubly-Linked ASTs

A new function is available in ast.c which will doubly link any tree that is passed in.  To use this feature, the user must define zzAST_DOUBLE in the #header directive or on the command line of the C compile.  This automatically defines left and up fields in the AST node typedef.  ANTLR-generated parsers normally construct only singly-linked trees.  The fields can be filled in via code similar to the following:

        #header <<
        #define zzAST_DOUBLE
        #define AST_FIELDS user-fields;
        >>

        <<
        main()
        {
                AST *root = NULL;
                ANTLR(start(&root), stdin);
                zzdouble_link(root, NULL, NULL);
        }
        >>

where the function is defined as:

        zzdouble_link(AST *t, AST *left, AST *up);

1.15. C++ Compatible Parsers

PCCTS parsers may now be compiled with C++ compilers; i.e. the output is more ANSI C-like than before.  It has been successfully compiled with GCC 2.2, but not with GCC 1.37.  We do not guarantee anything.  To be safe, use the -ga option so that PCCTS generates ANSI-style prototypes for functions generated from rules.
As a simple example, consider:

        #header <<
        #include "charbuf.h"
        /* stdio.h is included, by default, but doesn't seem to bother stream.h */
        #include <stream.h>
        >>

        #token "[\ \t\n]"       << zzskip(); >>

        <<
        main()
        {
                ANTLR(a(), stdin);
                cout << "end of C++ test\n";
        }
        >>

        a : "A" "B"     << cout << $1.text << $2.text << "\n"; >>
          ;

which does not do much:

        % t
        A B
        AB
        end of C++ test
        %

but it does compile with G++ 2.2, except that a warning is generated concerning strncpy() not being declared before use.  This is, of course, trivial to fix by modifying the charbuf.h file.  We compiled this with:

        antlr -k 2 -gk -ga t.g
        Antlr parser generator   Version 1.06   1989-1992
        dlg -C2 -i parser.dlg scan.c
        dlg  Version 1.06  1989-1992
        g++ -I. -I../1.06/h -g -c scan.c
        g++ -I. -I../1.06/h -g -c t.c
        t.c: In function `struct Attrib zzconstr_attr (int, char *)':
        t.c:19: warning: implicit declaration of function `strncpy'
        g++ -I. -I../1.06/h -g -c err.c
        g++ -o t -I. -I../1.06/h -g scan.o t.o err.o

We anticipate a rewrite to be more C++-oriented sometime in the future.

2. Acknowledgements

We acknowledge Dan Lawrence of MDBS for the new error reporting facility concerning greater than one token of lookahead; Dana Hoggatt, also of MDBS, suggested the rule prefix idea (-gp option) and beta tested 1.06.  We thank Ed Harfmann of MDBS for creating the makefile.os2 files and porting PCCTS to the PC.  We acknowledge the following beta testers for 1.06 (alphabetically): Thomas Buehlman (buehlman@iwf.mabp.ethz.ch), Peter Dahl (dahl@everest.ee.umn.edu), Chris Song (dsong@ncsa.uiuc.edu), Ariel Tamches (tamches@cs.UMD.EDU).  We thank Russell Quong (quong@ecn.purdue.edu) of Purdue EE for his work with us on defining and refining predicates.  Ariel Tamches (tamches@cs.UMD.EDU) deserves attention for hacking on the particulars of the alpha-release predicates.

3. Machine Compatibility

PCCTS Version 1.06 has been tested on the following platforms:

        Sun 3/60
        Sun SPARC I, II
        Encore Multimax running Umax 4.3
        Sun SPARCstation IPX
        NeXTstation
        DECstation 3100 running Ultrix 4.2
        DEC 5000
        Linux on a 386 PC
        Microsoft C 6.0 on a PC (OS/2, DOS)
        CSET/2 C compiler on a PC (OS/2)
        IBM RS6000
        MIPS R2000

4. Incompatibilities

Due to the bug fixes in 1.06, "new" ambiguities may appear in your grammar.  These were always there; ANTLR simply did not find them before.  The analysis is now more correct.

Calls to zzTRACE are no longer generated by the -gd option.  Now, zzTRACEIN and zzTRACEOUT are called at the beginning and end of functions, respectively.

The way in which PCCTS translates actions has changed; before, they were parsed with a C function, whereas now the #lexclass facility is used.  Some differences in translation may be discovered; e.g. a character may need to be escaped with \ whereas before the simple character was sufficient.

The user should no longer set the next token of lookahead or the text of the next token in the lexical analyzer using LA(1) and LATEXT(1).  This is incompatible with the -gk option; hence, NLA and NLATEXT should be used instead, where the N means "next" (a short sketch appears at the end of this section).

The -ga option no longer generates anything different, as the code generator now dumps both K&R and ANSI declarations with #ifdef's around the ANSI code.

Previously, no prototype was given when -ga was off.  Now, prototypes are always generated (with the appropriate #ifdef's).  These prototypes can conflict with the outside environment if the rule names are things like if and stat (which is a system call).  Use the -gp prefix option to prefix all functions corresponding to rules with a string of your choice.
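As promised above, here is a rough sketch of the kind of lexical action the NLA/NLATEXT change affects; the WORD and TYPENAME token names and the istype() symbol-table test are assumptions made for illustration, not part of the PCCTS distribution:

        /* illustrative lexical action: decide, by consulting the symbol
         * table, which token the parser should see next; under -gk assign
         * NLA (and NLATEXT if the text must change) rather than
         * LA(1)/LATEXT(1)
         */
        #token WORD "[a-zA-Z_][a-zA-Z0-9_]*"
                << if ( istype(zzlextext) ) NLA = TYPENAME; >>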
5. Future Work

Predicates are still under development.  We expect an enhanced version of PCCTS that computes context and hoists more aggressively.

Often a grammar construct cannot be left-factored to remove an ambiguity.  This typically arises when the common prefix can be arbitrarily long.  Fortunately, input is typically finite and one could scan past these constructs given enough lookahead.  This is not the same thing as backtracking, as the parser never backs up; it simply looks ahead really far to make a decision.  This can easily be handled with a predicate which looks ahead in the lookahead buffer to see if the context occurs within some finite number of tokens (a rough sketch of such a predicate appears at the end of this section).  This concept is very similar to LR-Regular (LRR) parsers, for those familiar with parsing theory.  Note that this is a very fast, cheap way to get something resembling the power of backtracking.

Attribute names are expected to be enhanced.  For example, instead of $i, $token_name could be used:

        a : WORD TYPE << /* an action may refer to $WORD and $TYPE */ >> ;

We expect to have a graphical user interface to PCCTS sometime in the future which allows entry of grammars using syntax diagram notation.  The interface is expected to run under X Windows.

We anticipate a version that supports object-oriented programming and generates C++ instead of ANSI C.  For the moment, PCCTS output is merely compatible with C++, but future versions will support C++ directly.  Future versions, both C and C++, will be able to refer to PCCTS symbols by pccts.symbol instead of zzsymbol; e.g. zzskip() will become pccts.skip().

DLG will soon use lookahead of its own to allow the recognition of more complicated expressions; specifically, those which have left substrings in common with other regular expressions.

We expect future versions of PCCTS to dump grammar analysis and parser construction statistics, such as how many rules required one token of lookahead, how many ambiguities were found, etc.
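To make the lookahead-scanning predicates mentioned above concrete, here is a rough sketch of the kind of helper such a predicate might call; the helper name, the fixed scan limit, and the use of LA(i) to walk a (hypothetically large) lookahead buffer are all assumptions for illustration, not part of any released PCCTS interface:

        /* illustrative helper: scan up to `max' tokens ahead for token type
         * `tok'; return 1 if found (context present), 0 otherwise
         */
        static int scan_ahead_for(int tok, int max)
        {
                int i;
                for (i = 1; i <= max; i++) {
                        if ( LA(i) == tok ) return 1;
                }
                return 0;
        }

        /* possible use: gate an alternative on context that lies arbitrarily
         * far ahead (RPAREN and the rule names are hypothetical)
         */
        a : <<scan_ahead_for(RPAREN, 32)>>? long_prefix_alternative
          | other_alternative
          ;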