
LR Parsing

A. V. AHO and S. C. JOHNSON


Bell Laboratories, Murray Hill, New Jersey 07974

The LR syntax analysis method is a useful and versatile technique for parsing
deterministic context-free languages in compiling applications. This paper
provides an informal exposition of LR parsing techniques emphasizing the
mechanical generation of efficient LR parsers for context-free grammars.
Particular attention is given to extending the parser generation techniques to
apply to ambiguous grammars.

Keywords and phrases: grammars, parsers, compilers, ambiguous grammars,
context-free languages, LR grammars.

CR categories: 4.12, 5.23

1. INTRODUCTION

A complete specification of a programming language must perform at least two functions. First, it must specify the syntax of the language; that is, which strings of symbols are to be deemed well-formed programs. Second, it must specify the semantics of the language; that is, what meaning or intent should be attributed to each syntactically correct program.

A compiler for a programming language must verify that its input obeys the syntactic conventions of the language specification. It must also translate its input into an object language program in a manner that is consistent with the semantic specification of the language. In addition, if the input contains syntactic errors, the compiler should announce their presence and try to pinpoint their location. To help perform these functions every compiler has a device within it called a parser.

A context-free grammar can be used to help specify the syntax of a programming language. In addition, if the grammar is designed carefully, much of the semantics of the language can be related to the rules of the grammar.

There are many different types of parsers for context-free grammars. In this paper we shall restrict ourselves to a class of parsers known as LR parsers. These parsers are efficient and well suited for use in compilers for programming languages. Perhaps more important is the fact that we can automatically generate LR parsers for a large and useful class of context-free grammars. The purpose of this article is to show how LR parsers can be generated from certain context-free grammars, even some ambiguous ones. An important feature of the parser generation algorithm is the automatic detection of ambiguities and difficult-to-parse constructs in the language specification.

We begin this paper by showing how a context-free grammar defines a language. We then discuss LR parsing and outline the parser generation algorithm. We conclude by showing how the performance of LR parsers can be improved by a few simple transformations, and how error recovery and "semantic actions" can be incorporated into the LR parsing framework.

For the purposes of this paper, a sentence is a string of terminal symbols. Sentences are written surrounded by a pair of single quotes. For example, 'a', 'ab', and ',' are sentences. The empty sentence is written ''. Two sentences written contiguously are to be concatenated; thus 'a' 'b' is synonymous with

Computing Surveys, Vol. 6, No. 2, June 1974


CONTENTS

1. Introduction
2. Grammars
3. Derivation Trees
4. Parsers
5. Representing the Parsing Action and Goto Tables
6. Construction of a Parser from a Grammar
   6.1 Sets of Items
   6.2 Constructing the Collection of Accessible Sets of Items
   6.3 Constructing the Parsing Action and Goto Tables from the Collection of Sets of Items
   6.4 Computing Lookahead Sets
7. Parsing Ambiguous Grammars
8. Optimization of LR Parsers
   8.1 Merging Identical States
   8.2 Subsuming States
   8.3 Elimination of Reductions by Single Productions
9. Error Recovery
10. Output
11. Concluding Remarks
References

Copyright © 1974, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

'ab'. In this paper the term language merely means a set of sentences.

2. GRAMMARS

A grammar is used to define a language and to impose a structure on each sentence in the language. We shall be exclusively concerned with context-free grammars, sometimes called BNF (for Backus-Naur form) specifications.

In a context-free grammar, we specify two disjoint sets of symbols to help define a language. One is a set of nonterminal symbols. We shall represent a nonterminal symbol by a string of one or more capital roman letters. For example, LIST represents a nonterminal, as does the letter A. In the grammar, one nonterminal is distinguished as a start (or sentence) symbol.

The second set of symbols used in a context-free grammar is the set of terminal symbols. The sentences of the language generated by a grammar will contain only terminal symbols. We shall refer to a terminal or nonterminal symbol as a grammar symbol.

A context-free grammar itself consists of a finite set of rules called productions. A production has the form

    left-side → right-side,

where left-side is a single nonterminal symbol (sometimes called a syntactic category) and right-side is a string of zero or more grammar symbols. The arrow is simply a special symbol that separates the left and right sides. For example,

    LIST → LIST ',' ELEMENT

is a production in which LIST and ELEMENT are nonterminal symbols, and the quoted comma represents a terminal symbol.

A grammar is a rewriting system. If αAγ is a string of grammar symbols and A → β is a production, then we write

    αAγ ⇒ αβγ

and say that αAγ directly derives αβγ. A sequence of strings
    s_0, s_1, ..., s_n

such that s_{i-1} ⇒ s_i for 1 ≤ i ≤ n is said to be a derivation of s_n from s_0. We sometimes also say s_n is derivable from s_0.

The start symbol of a grammar is called a sentential form. A string derivable from the start symbol is also a sentential form of the grammar. A sentential form containing only terminal symbols is said to be a sentence generated by the grammar. The language generated by a grammar G, often denoted by L(G), is the set of sentences generated by G.

Example 2.1: The following grammar, hereafter called G1, has LIST as its start symbol:

    LIST → LIST ',' ELEMENT
    LIST → ELEMENT
    ELEMENT → 'a'
    ELEMENT → 'b'

The sequence:

    LIST ⇒ LIST ',' ELEMENT
        ⇒ LIST ',a'
        ⇒ LIST ',' ELEMENT ',a'
        ⇒ LIST ',b,a'
        ⇒ ELEMENT ',b,a'
        ⇒ 'a,b,a'

is a derivation of the sentence 'a,b,a'. L(G1) consists of nonempty strings of a's and b's, separated by commas.

Note that in the derivation in Example 2.1, the rightmost nonterminal in each sentential form is rewritten to obtain the following sentential form. Such a derivation is said to be a rightmost derivation and each sentential form in such a derivation is called a right sentential form. For example,

    LIST ',b,a'

is a right sentential form of G1.

If αAw is a right sentential form in which w is a string of terminal symbols, and αAw ⇒ αβw, then β is said to be a handle of αβw.* For example, 'b' is the handle of the right sentential form

    LIST ',b,a'

in Example 2.1.

* Some authors use a more restrictive definition of handle.

A prefix of αβ in the right sentential form αβw is said to be a viable prefix of the grammar. For example,

    LIST ','

is a viable prefix of G1, since it is a prefix of the right sentential form

    LIST ',' ELEMENT

(Both α and w are null here.)

Restating this definition, a viable prefix of a grammar is any prefix of a right sentential form that does not extend past the right end of a handle in that right sentential form. Thus we know that there is always some string of grammar symbols that can be appended to the end of a viable prefix to obtain a right sentential form. Viable prefixes are important in the construction of compilers with good error-detecting capabilities; as long as the portion of the input we have seen can be derived from a viable prefix, we can be sure that there are no errors that can be detected having scanned only that part of the input.

3. DERIVATION TREES

Frequently, our interest in a grammar is not only in the language it generates, but also in the structure it imposes on the sentences of the language. This is the case because grammatical analysis is closely connected with other processes, such as compilation and translation, and the translations or actions of the other processes are frequently defined in terms of the productions of the grammar. With this in mind, we turn our attention to the representation of a derivation by its derivation tree.

For each derivation in a grammar we can construct a corresponding derivation tree. Let us consider the derivation in Example 2.1. To model the first step of the derivation, in which LIST is rewritten as

    LIST ',' ELEMENT

using production 1, we first create a root labeled by the start symbol LIST, and then create three direct descendants of the root, labeled LIST, ',', and ELEMENT:

    [diagram: a root LIST with direct descendants LIST, ',', and ELEMENT]

(We follow historical usage and draw our "root" node at the top.) In the second step of the derivation, ELEMENT is rewritten as 'a'. To model this step, we create a direct descendant labeled 'a' for the node labeled ELEMENT:

    [diagram: the tree above, with 'a' a direct descendant of ELEMENT]

Continuing in this fashion, we obtain the following tree:

    [diagram: the completed derivation tree, with frontier 'a,b,a']

Note that if a node of the derivation tree is labeled with a nonterminal symbol A and its direct descendants are labeled X1, X2, ..., Xn, then the production

    A → X1 X2 ... Xn

must be in the grammar.

If a1, a2, ..., am are the labels of all the leaves of a derivation tree, in the natural left-to-right order, then the string

    a1 a2 ... am

is called the frontier of the tree. For example, 'a,b,a' is the frontier of the previous tree. Clearly, for every sentence in a language there is at least one derivation tree with that sentence as its frontier. A grammar that admits two or more distinct derivation trees with the same frontier is said to be ambiguous.

Example 3.1: The grammar G2 with productions

    LIST → LIST ',' LIST
    LIST → 'a'
    LIST → 'b'

is ambiguous because the following two derivation trees have the same frontier.

    [diagrams: two distinct derivation trees in G2 with the same frontier]

In certain situations ambiguous grammars can be used to represent programming languages more economically than equivalent unambiguous grammars. However, if an ambiguous grammar is used, then some other rules should be specified along with the grammar to determine which of several derivation trees is to be associated with a given input. We shall have more to say about ambiguous grammars in Section 7.

4. PARSERS

We can consider a parser for a grammar to be a device which, when presented with an input string, attempts to construct a derivation tree whose frontier matches the input. If the parser can construct such a derivation tree, then it will have verified that the input string is a sentence of the language generated by the grammar. If the input is syntactically incorrect, then the tree construction process will not succeed and the positions at which the process falters can be used to indicate possible error locations.

A parser can operate in many different ways. In this paper we shall restrict ourselves to parsers that examine the input string from left to right, one symbol at a time. These parsers will attempt to construct the derivation tree "bottom-up"; i.e., from the leaves to the root. For historical reasons, these parsers are called LR parsers. The "L" stands for "left-to-right scan of the input", the "R" stands for "rightmost derivation." We shall see that an LR parser operates by reconstructing the reverse of a rightmost derivation for the input. In this section we shall describe in an informal way how a certain class of LR parsers, called LR(1) parsers, operate.

An LR parser deals with a sequence of partially built trees during its tree construction process. We shall loosely call this sequence of trees a forest. In our framework the forest is built from left to right as the input is read. At a particular stage in the construction process, we have read a certain amount of the input, and we have a partially constructed derivation tree. For example, suppose that we are parsing the input string 'a,b' according to the grammar G1. After reading the first 'a' we construct the tree:

    [diagram: a single node labeled 'a']

Then we construct:

    [diagram: ELEMENT with direct descendant 'a']

using the production

    ELEMENT → 'a'

To reflect this parsing action, we say that 'a' is reduced to ELEMENT. Next we use the production

    LIST → ELEMENT

to obtain the tree:

    [diagram: LIST above ELEMENT above 'a']

Here, ELEMENT is reduced to LIST. We then read the next input symbol ',', and add it to the forest as a one node tree.

    [diagram: the LIST tree, followed by a separate node ',']

We now have two trees. These trees will eventually become sub-trees in the final derivation tree. We then read the next input symbol 'b' and create a single node tree for it as well:

    [diagram: the forest above, followed by a separate node 'b']

Using the production

    ELEMENT → 'b'


we reduce 'b' to ELEMENT to obtain:

    [diagram: the forest with the LIST tree, the node ',', and ELEMENT above 'b']

Finally, using the production

    LIST → LIST ',' ELEMENT

we combine these three trees into the final tree:

    [diagram: the completed derivation tree for 'a,b']

At this point the parser detects that we have read all of the input and announces that the parsing is complete. The rightmost derivation of 'a,b' in G1 is

    LIST ⇒ LIST ',' ELEMENT
        ⇒ LIST ',b'
        ⇒ ELEMENT ',b'
        ⇒ 'a,b'

In parsing 'a,b' in the above manner, all we have done is reconstruct this rightmost derivation in reverse. The sequence of productions encountered in going through a rightmost derivation in reverse is called a right parse.

There are four types of parsing actions that an LR parser can make: shift, reduce, accept (announce completion of parsing), or announce error.

In a shift action, the next input symbol is removed from the input. A new node labeled by this symbol is added to the forest at the right as a new tree by itself.

In a reduce action, a production, such as

    A → X1 X2 ... Xn

is specified; each Xi represents a terminal or nonterminal symbol. A reduction by this production causes the following operations:
(1) A new node labeled A is created.
(2) The rightmost n roots in the forest (which will have already been labeled X1, X2, ..., Xn) are made direct descendants of the new node, which then becomes the rightmost root of the forest.

If the reduction is by a production of the form

    A → ''

(i.e., where the right side is the empty string), then the parser merely creates a root labeled A with no descendants.

A parser operates by repeatedly making parsing actions until either an accept or error action occurs.

The reader should verify that the following sequence of parsing actions builds the parse tree for 'a,b' in G1:
(1) Shift 'a'
(2) Reduce by: ELEMENT → 'a'
(3) Reduce by: LIST → ELEMENT
(4) Shift ','
(5) Shift 'b'
(6) Reduce by: ELEMENT → 'b'
(7) Reduce by: LIST → LIST ',' ELEMENT
(8) Accept

We now consider the question of how an LR parser decides what parsing actions to make. Clearly a parsing action can depend on what actions have already been made and on what the next input symbols are. An LR parser that looks at only the next input symbol to decide which parsing action to make is called an LR(1) parser. If it looks at the next k input symbols, k ≥ 0, it is called an LR(k) parser. To help to make its parsing decisions, an LR parser attaches to the root of each tree in the forest a number called a state. The number on the root of the rightmost tree is called the current state. In addition, there is an initial state to the left of the forest, which helps determine the very first


parsing action. We shall write the states in parentheses above the associated roots. For example,

    [diagram: a forest with the initial state (0), a LIST tree whose root carries state (1), and a node ',' carrying state (5)]

represents a forest with states. State 5 is the current state, and state 0 is the initial state. The current state and the next input symbol determine the parsing action of an LR(1) parser.

The following table shows the states of an LR(1) parser for G1, and the associated parsing actions. In this table there is a column labeled '$' with special significance. The '$' stands for the right endmarker, which is assumed to be appended to the end of all input strings. Another way of looking at this is to think of '$' as representing the condition where we have read and shifted all of the "real" characters in the input string.

                        Next Input Symbol
                   'a'      'b'      ','      '$'
              0   shift    shift    error    error
              1   error    error    shift    accept
    Current   2   error    error    Red. 2   Red. 2
    State     3   error    error    Red. 3   Red. 3
              4   error    error    Red. 4   Red. 4
              5   shift    shift    error    error
              6   error    error    Red. 1   Red. 1

    Fig. 1. Parsing Action Table for G1

The reduce actions are represented as "Red. n" in the above table; the integer n refers to the productions as follows:
(1) LIST → LIST ',' ELEMENT
(2) LIST → ELEMENT
(3) ELEMENT → 'a'
(4) ELEMENT → 'b'

We shall refer to the entry for row s and column c as pa(s,c). After making either a shift move or a reduce move, the parser must determine what state to attach to the root of the tree that has just been added to the forest. In a shift move, this state is determined by the current state and the input symbol that was just shifted into the forest. For example, if we have just shifted ',' into the forest

    [diagram: the forest with initial state (0), the LIST tree with state (1), and the newly shifted ',' not yet carrying a state]

then state 1 and ',' determine the state to be attached to the new rightmost root ','.

In a reduce move, suppose we reduce by production

    A → X1 X2 ... Xn

When we make nodes X1, ..., Xn direct descendants of the root A, we remove the states that were attached to X1, ..., Xn. The state that is to be attached to node A is determined by the state that is now the rightmost state in the forest, and the nonterminal A. For example, if we have just reduced by the production

    LIST → LIST ',' ELEMENT

and created the forest

    [diagram: the initial state (0) followed by the newly created LIST root, not yet carrying a state]

then state 0 and the nonterminal LIST determine the state to be attached to the root LIST. Note that the states previously attached to the direct descendants of the new


root have disappeared, and play no role in the calculation of the new state.

The following table determines these new states for G1. For reasons that will become apparent later, we shall call this table the goto table for G1.

                       LABEL OF NEW ROOT
                  LIST   ELEMENT   'a'   'b'   ','
    RIGHTMOST 0    1        2       3     4
    STATE     1                                 5
              5             6       3     4

    Fig. 2. Goto Table for G1

We shall refer to the entry in the row for state s and column c as goto(s, c). It turns out that the entries in the goto table which are blank will never be consulted [Aho and Ullman (1972b)].

An LR parser for a grammar is completely specified when we have given the parsing action table and the goto table. We can picture an LR(1) parser as shown in Fig. 3.

    [Fig. 3. Pictorial Representation of an LR(1) Parser: an input tape 'a , b $' with an input cursor, the LR(1) parsing algorithm, and a forest consisting of a partially constructed derivation tree with states attached]

The LR(1) parsing algorithm can be summarized as follows:

Initialize: Place the initial state into an otherwise empty forest; the initial state is the current state at the beginning of the parse.

Parsing Action: Examine the parsing action table, and determine the entry corresponding to the current state and the current input symbol. On the basis of this entry (Shift, Reduce, Error, or Accept) do one of the following four actions:

Shift: Add a new node, labeled with the current input symbol, to the forest. Associate the state

    goto(current state, input)

to this node and make this state the new current state. Advance the input cursor to read the next character. Repeat the step labeled Parsing Action.

Reduce: If the indicated production is

    A → X1 X2 ... Xn

add a new node labeled A to the forest, and make the rightmost n roots, n ≥ 0, direct descendants of this new node. Remove the states associated with these roots. If s is the state which is now rightmost in the forest (on the root immediately to the left of the new node), then associate the state

    goto(s, A)

with the new node. Make this state the new current state. (Notice that the input character is not changed.) Repeat the step labeled Parsing Action.

Accept: Halt. A complete derivation tree has been constructed.

Error: Some error has occurred in the input string. Announce error, and then try to resume parsing by recovering from the error. (This topic is discussed in Section 9.)

To see how an LR parser works, let us again parse the input string 'a,b' using the parsing action function pa (Figure 1) and the goto function (Figure 2).

Initialization: We place state 0 into the forest; 0 becomes the current state.

Parsing Action 1: pa(0, 'a') = shift. We create a new root labeled 'a' and attach state 3 to it (because goto(0, 'a') = 3). We have:

    [diagram: the initial state (0) and the node 'a' carrying state (3)]
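The algorithm just summarized can also be written out as a short program. The following Python sketch is not from the paper: it encodes the pa table of Fig. 1 and the goto table of Fig. 2 as dictionaries (a missing entry stands for error or blank) and carries the forest as a list of (state, tree) pairs, which is exactly the bookkeeping described above.

```python
# A sketch (not from the paper) of the LR(1) parsing algorithm for G1.
# A tree is a (label, children) pair; the forest is a list of
# (state, tree) pairs, with the initial state at the left.

productions = {1: ("LIST", 3),     # LIST -> LIST ',' ELEMENT
               2: ("LIST", 1),     # LIST -> ELEMENT
               3: ("ELEMENT", 1),  # ELEMENT -> 'a'
               4: ("ELEMENT", 1)}  # ELEMENT -> 'b'

pa = {0: {'a': 'shift', 'b': 'shift'},
      1: {',': 'shift', '$': 'accept'},
      2: {',': ('reduce', 2), '$': ('reduce', 2)},
      3: {',': ('reduce', 3), '$': ('reduce', 3)},
      4: {',': ('reduce', 4), '$': ('reduce', 4)},
      5: {'a': 'shift', 'b': 'shift'},
      6: {',': ('reduce', 1), '$': ('reduce', 1)}}

goto = {(0, 'LIST'): 1, (0, 'ELEMENT'): 2, (0, 'a'): 3, (0, 'b'): 4,
        (1, ','): 5,
        (5, 'ELEMENT'): 6, (5, 'a'): 3, (5, 'b'): 4}

def parse(sentence):
    toks = list(sentence) + ['$']     # append the right endmarker
    forest = [(0, None)]              # the initial state, no tree attached
    while True:
        state = forest[-1][0]         # current state = rightmost state
        action = pa[state].get(toks[0], 'error')
        if action == 'shift':
            sym = toks.pop(0)
            forest.append((goto[(state, sym)], (sym, [])))
        elif action == 'accept':
            return forest[1][1]       # the completed derivation tree
        elif action == 'error':
            raise SyntaxError("error in state %d at %r" % (state, toks[0]))
        else:                         # ('reduce', n)
            lhs, width = productions[action[1]]
            keep = len(forest) - width
            children = [tree for _, tree in forest[keep:]]
            del forest[keep:]         # drop the states on the old roots
            forest.append((goto[(forest[-1][0], lhs)], (lhs, children)))

print(parse('a,b'))
```

Here parse('a,b') goes through exactly the eight actions of the hand trace, and a string not in L(G1), such as 'a,,b', raises SyntaxError from the error action without the parser shifting past the offending symbol.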


Parsing Action 2: pa(3, ',') = reduce 3. We reduce by production (3)

    ELEMENT → 'a'

We examine the state immediately to the left; this is state 0. Since goto(0, ELEMENT) = 2, we label the new root with 2. We now have:

    [diagram: the initial state (0) and ELEMENT carrying state (2), with direct descendant 'a']

Parsing Action 3: pa(2, ',') = reduce 2. We reduce by production (2)

    LIST → ELEMENT

goto(0, LIST) = 1, so the new state is 1.

Parsing Action 4: pa(1, ',') = shift. We shift and attach state 5.

Parsing Action 5: pa(5, 'b') = shift. We shift and attach state 4. We now have

    [diagram: the LIST tree with state (1), the node ',' with state (5), and the node 'b' with state (4)]

Parsing Action 6: pa(4, '$') = reduce 4. We reduce by production (4)

    ELEMENT → 'b'

goto(5, ELEMENT) = 6, so the new state is 6. We now have

    [diagram: the LIST tree with state (1), the node ',' with state (5), and ELEMENT with state (6) above 'b']

Parsing Action 7: pa(6, '$') = reduce 1. We reduce by production (1)

    LIST → LIST ',' ELEMENT

The state to the left of the newly created tree is state 0, so the new state is goto(0, LIST) = 1.

Parsing Action 8: pa(1, '$') = accept. We halt and terminate the parse.

The reader is urged to follow this procedure with another string, such as 'a,b,a', to verify his understanding of this process. It is also suggested that he try a string which is not in L(G1), such as 'a,ba' or 'a,,b', to see how the error detection mechanism works. Note that the grammar symbols on the roots of the forest, concatenated from left to right, always form a viable prefix.

Properly constructed LR(1) parsers can parse a large class of useful languages called the deterministic context-free languages. These parsers have a number of notable properties:
(1) They report error as soon as possible (scanning the input from left to right).
(2) They parse a string in a time which is proportional to the length of the string.
(3) They require no rescanning of previously scanned input (backtracking).
(4) The parsers can be generated mechanically for a wide class of grammars, including all grammars which can be parsed by recursive descent with no backtracking [Knuth (1971)] and those grammars parsable by operator precedence techniques [Floyd (1963)].

The reader may have noticed that the states can be stored on a pushdown stack, since only the rightmost state is ever used at any stage in the parsing process. In a shift move, we stack the new state. In a reduce move, we replace a string of states on top of the stack by the new state.

For example, in parsing the input string 'a,b' the stack would appear as follows at each of the actions referred to above. (The top of the stack is on the right.)

    Action     Stack      Input
    Initial    0          'a,b$'
    1          0 3        ',b$'
    2          0 2        ',b$'
    3          0 1        ',b$'
    4          0 1 5      'b$'
    5          0 1 5 4    '$'
    6          0 1 5 6    '$'
    7          0 1        '$'
    8          0 1        '$'

Thus, the parser control is independent of the trees, and depends only on a stack of states. In practice, we may not need to construct the derivation tree explicitly, if the translation being performed is sufficiently simple. For example, in Section 10, we mention a class of useful translations that can be performed by an LR parser without requiring the forest to be maintained.

If we wish to build the derivation tree, we can easily do so by stacking, along with each state, the root of the tree associated with that state.

5. REPRESENTING THE PARSING ACTION AND GOTO TABLES

Storing the full action and goto tables straightforwardly as matrices is extremely wasteful of space for large parsers. For example, the goto table is typically nearly all blank. In this section we discuss some simple ways of compacting these tables which lead to substantial savings of space; in effect, we are merely representing a sparse matrix more compactly, using a particular encoding.

Let us begin with the shift actions. If x is a terminal symbol and s is a state, the parsing action on x in state s is shift if and only if goto(s, x) is nonblank. We will encode the goto into the shift action, using the notation

    shift 17

as a shorthand for "shift and attach state 17 to the new node." By encoding the gotos on terminal symbols as part of the action table, we need only consider the gotos on nonterminal symbols. We will encode them by columns; i.e., by nonterminal symbol name. If, on a nonterminal symbol A, there are nonblank entries in the goto table corresponding to states s1, s2, ..., sn, and we have

    si' = goto(si, A), for i = 1, ..., n

then we shall encode the column for A in a pseudo-programming language:

    A: if (state = s1) goto = s1'
       ...
       if (state = sn) goto = sn'

The goto table of G1 would be represented in this format as:

    LIST:    if (state = 0) goto = 1
    ELEMENT: if (state = 0) goto = 2
             if (state = 5) goto = 6

It turns out that [Aho and Ullman (1972b)] whenever we do a goto on A, the state will always be one of s1, ..., sn, even if the input string is in error. Thus, one of these branches will always be taken. We shall return to this point later in this section.

We shall encode parsing actions in the same spirit, but by rows of the table. The parsing actions for a state s will also be represented by a sequence of pseudo-programming language statements. If the input symbols a1, ..., an have the associated actions action1, ..., actionn, then we will write:

    s: if (input = a1) action1
       ...
       if (input = an) actionn

As we mentioned earlier, we shall attach goto(s, ai) onto the action if actioni is shift. Similarly, if we have a reduction by the production A → α, we will usually write

    reduce by A → α

as the action.

For example, the parsing actions for state 1 in the parser for G1 are represented by:

    1: if (input = 'a') error
       if (input = 'b') error
       if (input = ',') shift 5
       if (input = '$') accept

At first glance this is no saving over the table, since the parsing action table is usually nearly full. We may make a large saving, however, by introducing the notion of a default action in the statements. A default action is simply a parsing action which is done irrespective of the input character; there may be at most one of these in each state, and it will be written last. Thus, in state 1 we have two error actions, a shift


L R Parsing • 109

action, and an accept action, we shall make in the last section, is dictated b y the current
the error action the default. We will write: state. This state reflects the progress of the
parse, i.e., it summarizes information about
1: i f (input = ',') s h i f t 5 the input string read to this point so t h a t
i f (input = $ ) a c c e p t parsing decisions can be made.
error
Another way to view a state is to consider
There is an additional saving which is the state as a representative of an equiva-
possible. Suppose a state has both error and lence class of viable prefixes. At every stage
reduce entries. Then we m a y replace all of the parsing process, the string formed by
error entries in that state b y one of the re- concatenating the g r a m m a r symbols on the
duce entries. The resulting parser m a y make roots of the existing subtrees m u s t be a vi-
a sequence of reductions where the original able prefix; the current state is the repre-
parser announced error but the new parser sentative of the class containing t h a t viable
will announce error before shifting the next prefix.
input symbol. Thus both parsers announce
error at the same position in the input, but 6.1 Sets of Items
the new parser m a y take slightly longer be- I n the same way t h a t we needed to discuss
fore doing so. partially built trees when talking about pars-
There is a benefit to be had from this modi- ing, we will need to talk about "partially
fication; the new parsing action table will re- recognized productions" when we talk about
quire less space than the original. For example, state 2 of the parsing action table for G1 would originally be represented by:

    2: if (input = 'a') error
       if (input = 'b') error
       if (input = ',') reduce 2
       if (input = '$') reduce 2

Applying this transformation, state 2 would be simply represented as:

    2: reduce 2

Thus in a state with reduce actions, we will always have the shift and accept actions precede the reduce actions. One of the reduce actions will become a default action, and we will ignore the error entries. In a state without reduce actions, the default action will be error. We shall discuss other means of cutting down on the size of a parser in Section 8.

6. CONSTRUCTION OF A PARSER FROM A GRAMMAR

How do we construct the parsing action and goto tables of an LR(1) parser for a given grammar? In this section we outline a method that works for a large class of grammars called the lookahead LR(1) (LALR(1)) grammars.

The behavior of an LR parser, as described in the previous sections, is the key to building parsers. We introduce the notion of item (some authors have used the term "configuration" for item) to deal with this concept. An item is simply a production with a dot (.) placed somewhere in the right-hand side (possibly at either end). For example,

    [LIST → LIST . ',' ELEMENT]
    [ELEMENT → . 'a']

are both items of G1. We enclose items in square brackets to distinguish them more clearly from productions.

Intuitively, a set of items can be used to represent a stage in the parsing process; for example, the item

    [A → α . β]

indicates that an input string derivable from α has just been seen, and, if we next see an input string derivable from β, we may be able to reduce by the production A → αβ. Suppose the portion of the input that we have seen to this point has been reduced to the viable prefix γα. Then the item [A → α . β] is said to be valid for γα if γA is also a viable prefix. In general, more than one item is valid for a given viable prefix; the set of all items which are valid at a particular

Computing Surveys, Vol. 6, No. 2, June 1974



stage of the parse corresponds to the current state of the parser.

As an example, let us examine the viable prefix

    LIST ','

in G1. The item

    [LIST → LIST ',' . ELEMENT]

is valid for this prefix, since, setting γ to the empty string and α to LIST ',' in the definition above, we see that γLIST (which is just LIST) is a viable prefix. In other words, when this item is valid, we have seen a portion of the input that can be reduced to the viable prefix, and we expect to see next a portion of the input that can be reduced to ELEMENT.

The item

    [LIST → . ELEMENT]

is not valid for LIST ',' however, since setting γ to LIST ',' and α to the empty string we obtain

    LIST ',' LIST

which is not a viable prefix.

The reader can (and should) verify that the state corresponding to the viable prefix LIST ',' is associated with the set of items:

    [LIST → LIST ',' . ELEMENT]
    [ELEMENT → . 'a']
    [ELEMENT → . 'b']

If γ is a viable prefix, we shall use V(γ) to denote the set of items that are valid for γ. If γ is not a viable prefix, V(γ) will be empty. We shall associate a state of the parser with each set of valid items and construct the entries in the parsing action table for that state from the set of items. There is a finite number of productions, thus only a finite number of items, and thus a finite number of possible states associated with every grammar G.

6.2 Constructing the Collection of Accessible Sets of Items

We shall now describe a constructive procedure for generating all of the states and, at the same time, generating the parsing action and goto tables. As a running example, we shall construct parsing action and goto tables for G1.

First, we augment the grammar with the production

    ACCEPT → LIST

where in general LIST would be the start symbol of the grammar (here G1). A reduction by this production corresponds to the accept action by the parser.

Next we construct I0 = V(''), the set of items valid for the viable prefix consisting of the empty string. By definition, for G1 this set must contain the item

    [ACCEPT → . LIST]

The dot in front of the nonterminal LIST means that, at this point, we can expect to find as the remaining input any sentence derivable from LIST. Thus, I0 must also contain the two items

    [LIST → . LIST ',' ELEMENT]
    [LIST → . ELEMENT]

obtained from the two productions for the nonterminal LIST. The second of these items has a dot in front of the nonterminal ELEMENT, so we should also add to the initial state the items

    [ELEMENT → . 'a']
    [ELEMENT → . 'b']

corresponding to the two productions for ELEMENT. These five items constitute I0. We shall associate state 0 with I0.

Now suppose that we have computed V(γ), the set of items which are valid for some viable prefix γ. Let X be a terminal or nonterminal symbol. We compute V(γX) from V(γ) as follows:

(1) For each item of the form [A → α . Xβ] in V(γ), we add to V(γX) the item [A → αX . β].

(2) We compute the closure of the set of items in V(γX); that is, for each item of the form [B → α . Cβ] in V(γX), where C is a nonterminal symbol, we add to V(γX) the items

    [C → . α1]
       . . .
    [C → . αn]


where C → α1, . . . , C → αn are all the productions in G with C on the left side. If one of these items is already in V(γX), we do not duplicate it. We continue to apply this process until no new items can be added to V(γX).

It can be shown that steps (1) and (2) compute exactly the items that are valid for γX [Aho and Ullman (1972a)].

For example, let us compute I1 = V(LIST), the set of items that are valid for the viable prefix LIST. We apply the above construction with γ = '' and X = LIST, and use the five items in I0.

In step (1) of the above construction, we add the items

    [ACCEPT → LIST .]
    [LIST → LIST . ',' ELEMENT]

to I1. Since no item in I1 has a nonterminal symbol immediately to the right of the dot, the closure operation adds no new items to I1. The reader should verify that these two items are the only items valid for the viable prefix. We shall associate state 1 with I1.

Notice that the above construction is completely independent of γ; it needs only the items in V(γ), and X. For every set of items I and every grammar symbol X the above construction builds a new set of items which we shall call GOTO(I, X); this is essentially the same goto function encountered in the last two sections. Thus, in our example, we have computed

    GOTO(I0, LIST) = I1

We can extend this GOTO function to strings of grammar symbols as follows:

    GOTO(I, '') = I
    GOTO(I, γX) = GOTO(GOTO(I, γ), X)

where γ is a string of grammar symbols and X is a nonterminal or terminal symbol. If I = V(α), then I = GOTO(I0, α). Thus GOTO(I0, α) ≠ ∅ if and only if α is a viable prefix, where I0 = V('').

The sets of items which can be obtained from I0 by GOTO's are called the accessible sets of items. We build up the set of accessible sets of items by computing GOTO(I, X) for all accessible sets of items I and grammar symbols X; whenever the GOTO construction comes up with a new nonempty set of items, this set of items is added to the set of accessible sets of items and the process continues. Since the number of sets of items is finite, the process eventually terminates.

The order in which the sets of items are computed does not matter, nor does the name given to each set of items. We will name the sets of items I0, I1, I2, . . . in the order in which we create them. We shall then associate state i with Ii.

Let us return to G1. We have computed I0, which contained the items

    [ACCEPT → . LIST]
    [LIST → . LIST ',' ELEMENT]
    [LIST → . ELEMENT]
    [ELEMENT → . 'a']
    [ELEMENT → . 'b']

We now wish to compute GOTO(I0, X) for all grammar symbols X. We have already computed

    GOTO(I0, LIST) = I1

To determine GOTO(I0, ELEMENT), we look for all items in I0 with a dot immediately before ELEMENT. We then take these items and move the dot to the right of ELEMENT. We obtain the single item

    [LIST → ELEMENT .]

The closure operation yields no new items since this item has no nonterminal to the right of the dot. We call the set with this item I2. Continuing in this fashion we find that:

    GOTO(I0, 'a') contains only [ELEMENT → 'a' .]
    GOTO(I0, 'b') contains only [ELEMENT → 'b' .]

and GOTO(I0, ',') and GOTO(I0, '$') are empty. Let us call the two nonempty sets I3 and I4. We have now computed all sets of items that are directly accessible from I0.

We now compute all sets of items that are accessible from the sets of items just computed. We continue computing accessible sets of items until no more new sets of items


are found. The following table shows the collection of accessible sets of items for G1:

    I0: [ACCEPT → . LIST]
        [LIST → . LIST ',' ELEMENT]
        [LIST → . ELEMENT]
        [ELEMENT → . 'a']
        [ELEMENT → . 'b']

    I1: [ACCEPT → LIST .]
        [LIST → LIST . ',' ELEMENT]

    I2: [LIST → ELEMENT .]

    I3: [ELEMENT → 'a' .]

    I4: [ELEMENT → 'b' .]

    I5: [LIST → LIST ',' . ELEMENT]
        [ELEMENT → . 'a']
        [ELEMENT → . 'b']

    I6: [LIST → LIST ',' ELEMENT .]

The GOTO function on this collection can be portrayed as a directed graph in which the nodes are labeled by the sets of items and the edges by grammar symbols, as follows:

    [Figure: the GOTO graph for G1, with edges 0 -LIST-> 1, 0 -ELEMENT-> 2, 0 -'a'-> 3, 0 -'b'-> 4, 1 -','-> 5, 5 -ELEMENT-> 6, 5 -'a'-> 3, and 5 -'b'-> 4]

Here, we have used i in place of Ii. For example, we observe

    GOTO(0, '') = 0
    GOTO(0, LIST ',') = 5
    GOTO(0, LIST ',' ELEMENT) = 6

Observe that there is a path from vertex 0 to a given node if and only if that path spells out a viable prefix. Thus, GOTO(0, 'ab') is empty, since 'ab' is not a viable prefix.

6.3 Constructing the Parsing Action and Goto Tables from the Collection of Sets of Items

The parsing action table is constructed from the collection of accessible sets of items. From the items in each set of items I_s, we generate parsing actions. An item of the form

    [A → α . 'a' β]

in I_s generates the parsing action

    if (input = 'a') shift t

where GOTO(I_s, 'a') = I_t.

An item with the dot at the right end of the production is called a completed item. A completed item [A → α .] indicates that we may reduce by production A → α. However, with an LR(1) parser we must determine on what input symbols this reduction is possible. If 'a1', 'a2', . . . , 'an' are these symbols, and they are not associated with shift or accept actions, then we would generate the sequence of parsing actions:

    if (input = 'a1') reduce by: A → α
    if (input = 'a2') reduce by: A → α
        . . .
    if (input = 'an') reduce by: A → α

As we mentioned in the last section, if the set of items contains only one completed item, we can replace this sequence of parsing actions by the default reduce action

    reduce by: A → α

This parsing action is placed after all shift and accept actions generated by this set of items.

If a set of items contains more than one completed item, then we must generate conditional reduce actions for all completed items except one. Shortly we shall explain how to compute the set of input symbols on which a given reduction is permissible.

If a completed item is of the form

    [ACCEPT → S .]

then we generate the accept action

    if (input = '$') accept


where '$' is the right endmarker for the input string.

Finally, if a set of items generates no reduce action, we generate the default error statement. This statement is placed after all shift and accept actions generated from the set of items.

Returning to our example for G1, from I0 we would generate the parsing actions:

    if (input = 'a') shift 3
    if (input = 'b') shift 4
    error

Notice that these are exactly the same parsing actions as those for state 0 in the parser of Section 4. Similarly, I3 generates the action

    reduce by: ELEMENT → 'a'

The goto table is used to compute the new state after a reduction. For example, when the reduction in state 3 is performed we always have state 0 to the left of 'a'. The new state is determined by simply noting that

    GOTO(I0, ELEMENT) = I2

This gives rise to the code

    if (state = 0) goto = 2

for ELEMENT in the goto table.

In general, if nonterminal A has precisely the following GOTO's in the GOTO graph:

    GOTO(I_s1, A) = I_t1
    GOTO(I_s2, A) = I_t2
        . . .
    GOTO(I_sn, A) = I_tn

then we would generate the following representation for column A of the goto table:

    A: if (state = s1) goto = t1
       if (state = s2) goto = t2
           . . .
       if (state = sn) goto = tn

Thus, the goto table is simply a representation of the GOTO function of the last section, applied to the nonterminal symbols.

We must now determine the input symbols on which each reduction is applicable. This will enable us to detect ambiguities and difficult-to-parse constructs in the grammar, and to decide between reductions if more than one is possible in a given state. In general, this is a complex task; the most general solution of this problem was given by [Knuth (1965)], but his algorithm suffers from large time and memory requirements. Several simplifications have been proposed, notably by [DeRemer (1969 and 1971)], which lack the full generality of Knuth's technique but can construct practical parsers in reasonable time for a large class of languages. We shall describe an algorithm that is a simplification of Knuth's algorithm which resolves all conflicts that can be resolved when the parser has the states as given above.

6.4 Computing Lookahead Sets

Suppose [A → α . β] is an item that is valid for some viable prefix γα. We say that input symbol 'a' is applicable for [A → α . β] if, for some string of terminals 'w', both γαβ'aw' and γA'aw' are right sentential forms. The right endmarker '$' is applicable for [A → α . β] if both γαβ and γA are right sentential forms.

This definition has a simple intuitive explanation when we consider completed items. Suppose input symbol 'a' is applicable for completed item [A → α .]. If an LR(1) parser makes the reduction specified by this item on the applicable input symbol 'a', then the parser will be able to make at least one more shift move without encountering an error.

The set of symbols that are applicable for each item will be called the lookahead set for that item. From now on we shall include the lookahead set as part of an item. The production with the dot somewhere in the right side will be called the core of the item. For example,

    ([ELEMENT → 'a' .], {',', '$'})

is an item of G1 with core

    [ELEMENT → 'a' .]

and lookahead set {',', '$'}.

We shall now describe an algorithm that will compute the sets of valid items for a grammar where the items include their


lookahead sets. Recall that in the last section items in a set of items arose in two ways: by goto calculations, and then by the closure operation. The first type of calculation is very simple; if we have an item of the form

    ([A → α . Xβ], L)

where X is a grammar symbol and L is a lookahead set, then when we perform the goto operation on X on this item, we obtain the item

    ([A → αX . β], L)

(i.e., the lookahead set is unchanged).

It is somewhat harder to compute the lookahead sets in the closure operation. Suppose there is an item of the form

    ([A → α . Bβ], L)

in a set of items, where B is a nonterminal symbol. We must add items of the form

    ([B → . δ], L')

where B → δ is some production in the grammar. The new lookahead set L' will contain all terminal symbols which are the first symbol of some sentence derivable from any string of the form β'a', where 'a' is a symbol in L.

If, in the course of carrying out this construction, a set of items is seen to contain items with the same core, e.g.,

    ([A → α . β], L1)
    ([A → α . β], L2)

then these items are merged to create a single item, ([A → α . β], L1 ∪ L2).

We shall now describe the algorithm for constructing the collection of sets of items in more detail by constructing the valid sets of items for grammar G1. Initially, we construct I0 by starting with the single item

    ([ACCEPT → . LIST], {'$'})

We then compute the closure of this set of items. The two productions for LIST give rise to the two items

    ([LIST → . LIST ',' ELEMENT], {'$'})
    ([LIST → . ELEMENT], {'$'})

The first of these two items gives rise, through the closure operation, to two additional items

    ([LIST → . LIST ',' ELEMENT], {','})
    ([LIST → . ELEMENT], {','})

since the first terminal symbol of any string derivable from

    ',' ELEMENT '$'

is always ','. Since all items with the same core are merged into a single item with the same core and the union of the lookahead sets, we currently have the following items in I0:

    ([ACCEPT → . LIST], {'$'})
    ([LIST → . LIST ',' ELEMENT], {',', '$'})
    ([LIST → . ELEMENT], {',', '$'})

The first two of these items no longer give rise to any new items when the closure operation is applied. The third item gives rise to the two new items

    ([ELEMENT → . 'a'], {',', '$'})
    ([ELEMENT → . 'b'], {',', '$'})

and these five items make up I0.

We shall now compute I3 = GOTO(I0, 'a'). First we add the item

    ([ELEMENT → 'a' .], {',', '$'})

to I3, since 'a' appears to the right of the dot of one item in I0. The closure operation adds no new items to I3.

I3 contains a completed item. The lookahead set {',', '$'} tells us on which input symbols the reduction is applicable.

The reader should verify that the complete collection of sets of items for G1 is:

    I0: ([ACCEPT → . LIST], {'$'})
        ([LIST → . LIST ',' ELEMENT], {',', '$'})
        ([LIST → . ELEMENT], {',', '$'})
        ([ELEMENT → . 'a'], {',', '$'})
        ([ELEMENT → . 'b'], {',', '$'})

    I1: ([ACCEPT → LIST .], {'$'})
        ([LIST → LIST . ',' ELEMENT], {',', '$'})

    I2: ([LIST → ELEMENT .], {',', '$'})

    I3: ([ELEMENT → 'a' .], {',', '$'})


    I4: ([ELEMENT → 'b' .], {',', '$'})

    I5: ([LIST → LIST ',' . ELEMENT], {',', '$'})
        ([ELEMENT → . 'a'], {',', '$'})
        ([ELEMENT → . 'b'], {',', '$'})

    I6: ([LIST → LIST ',' ELEMENT .], {',', '$'})

Although the situation does not occur here, if we generate a set of items I_t such that I_t has the same set of cores as some other set of items I_s already generated, but I_s ≠ I_t, then we combine I_s and I_t into a new set of items I by merging the lookahead sets of items with the same cores. We must then compute GOTO(I, X) for all grammar symbols X.

The lookahead sets on the completed items give the terminal symbols for which the reductions should be performed. There is a possibility that there are ambiguities in the grammar, or that the grammar is too complex to allow a parser to be constructed by this technique; this causes conflicts to be discovered in the actions of the parser. For example, suppose there is a set of items I_s in which 'a' gives rise to the parsing action shift because GOTO(I_s, 'a') exists. Suppose also that there is a completed item

    ([A → α .], L)

in I_s, and that the terminal symbol 'a' is in the lookahead set L. Then we have no way of knowing which action is correct in state s when we see an 'a'; we may shift 'a', or we may reduce by A → α. Our only recourse is to report a shift-reduce conflict.

In the same way, if there are two reductions possible in a state because two completed items contain the same terminal symbol in their lookahead sets, then we cannot tell which reduction we should do; we must report a reduce-reduce conflict.

Instead of reporting a conflict we may attempt to proceed by carrying out all conflicting parsing actions, either by parallel simulation [Earley (1970)] or by backtracking [Pager (1972b)].

A set of items is consistent, or adequate, if it does not generate any shift-reduce or reduce-reduce conflicts. A collection of sets of items is valid if all its sets of items are consistent; our collection of sets of items for G1 is valid.

We summarize the parsing action and goto table construction process:

(1) Given a grammar G, augment the grammar with a new initial production

    ACCEPT → S

where S is the start symbol of G.

(2) Let I be the set with the one item

    ([ACCEPT → . S], {'$'})

Let I0 be the closure of I.

(3) Let C, the current collection of accessible sets of items, initially contain only I0.

(4) For each I in C, and for each grammar symbol X, compute I' = GOTO(I, X). Three cases can occur:

    a. I' = I'' for some I'' already in C. In this case, do nothing.

    b. If the set of cores of I' is distinct from the set of cores of every set of items already in C, then add I' to C.

    c. If the set of cores of I' is the same as the set of cores of some I'' already in C, but I' ≠ I'', then let I''' be the set of items

        ([A → α . β], L1 ∪ L2)

    such that ([A → α . β], L1) is in I' and ([A → α . β], L2) is in I''. Replace I'' by I''' in C.

(5) Repeat step (4) until no new sets of items can be added to C. C is called the LALR(1) collection of sets of items for G.

(6) From C try to construct the parsing action and goto tables as in Section 6.3.

If this technique succeeds in producing a collection of sets of items for a given grammar in which all sets of items are consistent, then that grammar is said to be an LALR(1) grammar. LALR(1) grammars include many important classes of grammars, including the LL(1) grammars [Lewis and Stearns (1968)], the simple mixed strategy precedence grammars [McKeeman, Horning, and Wortman (1970)], and those parsable by operator precedence techniques. Techniques


for proving these inclusions can be found in [Aho and Ullman (1972a and 1973a)].

Step (4) can be rather time-consuming to implement. A simpler, but less general, approach would be to proceed as follows. Let FOLLOW(A) be the set of terminal symbols that can follow nonterminal symbol A in a sentential form. If A can be the rightmost symbol of a sentential form, then '$' is included in FOLLOW(A). We can compute the sets of items without lookaheads as in Section 6.2. Then in each completed item [A → α .] we can approximate the lookahead set L for this item by FOLLOW(A). (In general, L is a subset of FOLLOW(A).) The resulting collection of sets of items is called the SLR(1) collection. If all sets of items in the SLR(1) collection are consistent, then the grammar is said to be simple LR(1) [DeRemer (1971)]. Although not every LALR(1) grammar is simple LR(1), every language generated by an LALR(1) grammar is also generated by a simple LR(1) grammar ([Aho and Ullman (1973a)] contains more details).

7. PARSING AMBIGUOUS GRAMMARS

It is undesirable to have undetected ambiguities in the definition of a programming language. However, an ambiguous grammar can often be used to specify certain language constructs more easily than an equivalent unambiguous grammar. We shall also see that we can construct more efficient parsers directly from certain ambiguous grammars than from equivalent unambiguous grammars.

If we attempt to construct a parser for an ambiguous grammar, the LALR(1) parser construction technique will generate at least one inconsistent set of items. Thus, the parser generation technique can be used to determine that a grammar is unambiguous. That is to say, if no inconsistent sets of items are generated, the grammar is guaranteed to be unambiguous. However, if an inconsistent set of items is produced, then all we can conclude is that the grammar is not LALR(1). The grammar may or may not be ambiguous. (There is no general algorithm to determine if a context-free grammar is ambiguous; see, for example, [Aho and Ullman (1972a)].)

Inconsistent sets of items are useful in pinpointing difficult-to-parse or ambiguous constructions in a given grammar. For example, a production of the form

    A → A A

in any grammar will make that grammar ambiguous and cause a parsing action conflict to arise from sets of items containing the items with the cores

    [A → A A .]
    [A → A . A]

Constructions which are sufficiently complex to require more than one symbol of lookahead also result in parsing action conflicts. For example, the grammar

    S → A 'a'
    A → 'a' | ''

is an LALR(2) but not LALR(1) grammar.

Experience with an LALR(1) parser generator called YACC at Bell Laboratories has shown that a few iterations with the parser generator are usually sufficient to resolve the conflicts in an LALR(1) collection of sets of items for a reasonable programming language.

Example 7.1: Consider the following productions for "if-then" and "if-then-else" statements:

    S → 'if b then' S
    S → 'if b then' S 'else' S

If these two productions appear in a grammar, then that grammar will be ambiguous; the string

    'if b then if b then' S 'else' S

can be parsed in two ways, as shown:

    [Figure: two derivation trees; in the first, the 'else' is associated with the inner (closest) 'then'; in the second, with the outer 'then']


In most programming languages, the first phrasing is preferred. That is, each new 'else' is to be associated with the closest "unelsed" 'then'.

A grammar using these ambiguous productions to specify if-then-else statements will be smaller and, we feel, easier to comprehend than an equivalent unambiguous grammar. In addition, if a grammar has only ambiguities of this type, then we can construct a valid LALR(1) parser for the grammar merely by resolving each shift-reduce conflict in favor of shift [Aho, Johnson, and Ullman (1973)].

Example 7.2: Consider the ambiguous grammar*

    S → 'if b then' S
    S → 'if b then' S 'else' S
    S → 'a'

in which each 'else' is to be associated with the last unelsed 'then'. The LALR(1) collection of sets of items for this grammar is as follows:

    I0: ([ACCEPT → . S], {'$'})
        ([S → . 'if b then' S], {'$'})
        ([S → . 'if b then' S 'else' S], {'$'})
        ([S → . 'a'], {'$'})

    I1: ([ACCEPT → S .], {'$'})

    I2: ([S → 'if b then' . S], {'else', '$'})
        ([S → 'if b then' . S 'else' S], {'else', '$'})
        ([S → . 'if b then' S], {'else', '$'})
        ([S → . 'if b then' S 'else' S], {'else', '$'})
        ([S → . 'a'], {'else', '$'})

    I3: ([S → 'a' .], {'else', '$'})

    I4: ([S → 'if b then' S .], {'else', '$'})
        ([S → 'if b then' S . 'else' S], {'else', '$'})

    I5: ([S → 'if b then' S 'else' . S], {'else', '$'})
        ([S → . 'if b then' S], {'else', '$'})
        ([S → . 'if b then' S 'else' S], {'else', '$'})
        ([S → . 'a'], {'else', '$'})

    I6: ([S → 'if b then' S 'else' S .], {'else', '$'})

I4 contains a shift-reduce conflict. On the input 'else', I4 says that either a shift move to I5 is permissible, or a reduction by production

    S → 'if b then' S

is possible. If we choose to shift, we shall associate the incoming 'else' with the last unelsed 'then'. This is evident because the item with the core

    [S → 'if b then' S . 'else' S]

in I4 gives rise to the shift action.

The complete parsing action table, with the conflict resolved, and the goto table constructed from this collection of sets of items are shown below:

Parsing Action Table

    0: if (input = 'if b then') shift 2
       if (input = 'a') shift 3
       error
    1: if (input = '$') accept
       error
    2: if (input = 'if b then') shift 2
       if (input = 'a') shift 3
       error
    3: reduce by: S → 'a'
    4: if (input = 'else') shift 5
       reduce by: S → 'if b then' S
    5: if (input = 'if b then') shift 2
       if (input = 'a') shift 3
       error
    6: reduce by: S → 'if b then' S 'else' S

Goto Table

    S: if (state = 0) goto = 1
       if (state = 2) goto = 4
       goto = 6

Given an ambiguous grammar, with appropriate rules for resolving the ambiguities, we can often directly produce a smaller parser from the ambiguous grammar than from the equivalent unambiguous grammar.

* The following grammar is an equivalent unambiguous grammar:

    S → 'if b then' S
    S → 'if b then' S1 'else' S
    S → 'a'
    S1 → 'if b then' S1 'else' S1
    S1 → 'a'
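To see the resolved parser in action, here is a small Python sketch of the usual LR driver run over the tables just constructed. The encoding of the tables is our own ('i' abbreviates the terminal 'if b then', 'e' abbreviates 'else'); the tables themselves are transcribed from the example above.

```python
# LR driver for the dangling-else parser above, conflict resolved in favor
# of shift in state 4.  An action keyed by None is the state's default.
ACTION = {
    0: {"i": ("shift", 2), "a": ("shift", 3)},
    1: {"$": ("accept",)},
    2: {"i": ("shift", 2), "a": ("shift", 3)},
    3: {None: ("reduce", "S -> 'a'", 1)},
    4: {"e": ("shift", 5),                              # shift wins the conflict
        None: ("reduce", "S -> 'if b then' S", 2)},
    5: {"i": ("shift", 2), "a": ("shift", 3)},
    6: {None: ("reduce", "S -> 'if b then' S 'else' S", 4)},
}
GOTO = {0: 1, 2: 4, 5: 6}   # goto on S, indexed by the uncovered state

def parse(tokens):
    stack, pos, trace = [0], 0, []
    while True:
        state, look = stack[-1], tokens[pos]
        act = ACTION[state].get(look) or ACTION[state].get(None)
        if act is None:
            return trace + ["error"]
        if act[0] == "accept":
            return trace + ["accept"]
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        else:   # reduce: pop the right side, then consult the goto table on S
            _, prod, width = act
            del stack[-width:]
            stack.append(GOTO[stack[-1]])
            trace.append(prod)

parse(["i", "i", "a", "e", "a", "$"])
```

On the input 'if b then if b then a else a', the reduction by S → 'if b then' S 'else' S precedes the reduction by S → 'if b then' S, confirming that the 'else' has been attached to the inner 'then'; had the conflict been resolved in favor of reduce, the attachments would be reversed.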


However, some of the "optimizations" discussed in the next section will make the parser for the unambiguous grammar as small as that for the ambiguous grammar.

Example 7.3: Consider the following grammar G3 for arithmetic expressions:

    E → E '+' E
    E → E '*' E
    E → '(' E ')'
    E → 'a'

where 'a' stands for any identifier. Assuming that + and * are both left associative and * has higher precedence than +, there are two things wrong with this grammar. First, it is ambiguous in that the operands of the binary operators '+' and '*' can be associated in any arbitrary way. For example, 'a + a + a' can be parsed as

    [Figure: derivation tree grouping the operands as (a + a) + a]

or as

    [Figure: derivation tree grouping the operands as a + (a + a)]

The first parsing gives the usual left-to-right associativity, the second a right-to-left associativity.

If we rewrote the grammar as G4:

    E → E '+' T
    E → E '*' T
    E → T
    T → '(' E ')'
    T → 'a'

then we would have eliminated this ambiguity by imposing the normal left-to-right associativity for + and *. However, this new grammar still has one more defect; + and * have the same precedence, so that an expression of the form 'a+a*a' would be evaluated as (a+a)*a. To eliminate this, we must further rewrite the grammar as G5:

    E → E '+' T
    E → T
    T → T '*' F
    T → F
    F → '(' E ')'
    F → 'a'

We can now construct a parser for G5 quite easily, and find that we have 12 states; if we count the number of parsing actions in the parser (i.e., the sum of the number of shift and reduce actions in all states together with the goto actions) we see that the parser for G5 has 35 actions.

In contrast, the parser for G3 has only 10 states, and 29 actions. A considerable part of the saving comes from the elimination of the nonterminals T and F from G5, as well as the elimination of the productions E → T and T → F.

Let us discuss the resolution of parsing action conflicts in G3 in somewhat more detail. There are two sets of items in the LALR(1) collection of sets of items for G3 which generate conflicts in their parsing actions:

    ([E → E . '+' E], {'+', '*', ')', '$'})
    ([E → E . '*' E], {'+', '*', ')', '$'})
    ([E → E '+' E .], {'+', '*', ')', '$'})

and

    ([E → E . '+' E], {'+', '*', ')', '$'})
    ([E → E . '*' E], {'+', '*', ')', '$'})
    ([E → E '*' E .], {'+', '*', ')', '$'})
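Conflicts of this sort can be decided mechanically from the intended precedence and associativity of the operators. The following Python sketch is our own formulation of that decision rule (in the style later popularized by yacc's disambiguation declarations), with the precedence table taken from the assumptions of Example 7.3.

```python
# Deciding a shift-reduce conflict between a completed item [E -> E op E .]
# and a lookahead operator, from declared precedence and associativity.
PREC  = {'+': 1, '*': 2}            # '*' binds tighter than '+'
ASSOC = {'+': 'left', '*': 'left'}  # both operators are left associative

def resolve(prod_op, lookahead):
    """Return 'reduce' or 'shift' for production operator vs. lookahead."""
    if PREC[prod_op] > PREC[lookahead]:
        return 'reduce'             # the production's operator binds tighter
    if PREC[prod_op] < PREC[lookahead]:
        return 'shift'              # the incoming operator binds tighter
    # Equal precedence: left associativity means reduce, right means shift.
    return 'reduce' if ASSOC[prod_op] == 'left' else 'shift'
```

Applied to the two conflict sets above, the rule reduces on '+' and shifts on '*' in the first set, and reduces on both symbols in the second.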


I n b o t h sets of items, shift-reduce conflicts Goto Table


arise on the two terminal symbols ' + ' and
' . ' . For example, in the first set of items on L I S T : if(state = 0) g o t o = 1
an input of ' + ' we m a y generate either a goto = 5
reduce action or a shift action. Since we wish
+ to be left associative, we wish to reduce Notice t h a t we have only 14 parsing ac-
on this input; a shift would have the effect of tions in this parser, compared to the 16
delaying the reduction until more of the which we had in the earlier parser for G1. I n
string had been read, and would imply right addition, the derivation trees produced b y
associativity. On the input symbol '*', however, if we did the reduction we would end up parsing the string 'a + a * a' as '(a + a) * a'; that is, we would not give * higher precedence than +. Thus, it is correct to shift on this input. Using similar reasoning, we see that it is always correct to generate a reduce action from the second set of items; on the input symbol '*' this is a result of the left associativity of *, while on the input symbol '+' this reflects the precedence relation between + and *.

We conclude this section with an example of how this reasoning can be applied to our grammar G1. We noted earlier that the grammar G2:

LIST → LIST ',' LIST
LIST → 'a'
LIST → 'b'

is ambiguous, but this ambiguity should no longer be of concern. Assuming that the language designer wants to treat ',' as a left associative operator, then we can produce a parser which is smaller and faster than the parser for G1 produced in the last section. The smaller parser looks like:

Parsing Action Table

0: if(input = 'a') shift 2
   if(input = 'b') shift 3
   error
1: if(input = '$') accept
   if(input = ',') shift 4
   error
2: reduce by: LIST → 'a'
3: reduce by: LIST → 'b'
4: if(input = 'a') shift 2
   if(input = 'b') shift 3
   error
5: reduce by: LIST → LIST ',' LIST

The trees constructed by this parser are smaller since the nodes corresponding to the nonterminal symbol ELEMENT are no longer there. This in turn means that the parser makes fewer actions when parsing a given input string. Parsing of ambiguous grammars is discussed by [Aho, Johnson, and Ullman (1973)] in more detail.

8. OPTIMIZATION OF LR PARSERS

There are a number of ways of reducing the size and increasing the speed of an LR(1) parser without affecting its good error-detecting capability. In this section we shall list a few of many transformations that can be applied to the parsing action and goto tables of an LR(1) parser to reduce their size. The transformations we list are some simple ones that we have found to be effective in practice. Many other transformations are possible and a number of these can be found in the references at the end of this section.

8.1 Merging Identical States

The simplest and most obvious "optimization" is to merge states with common parsing actions. For example, the parsing action table for G1 given in Section 5 contains identical actions in states 0 and 5. Thus, it is natural to represent this in the parser as:

0: 5: if(input = 'a') shift 3
      if(input = 'b') shift 4
      error

Clearly the behavior of the LR(1) parser using this new parsing action table is the same as that of the LR(1) parser using the old table.
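As a sketch of this merger (the encoding as Python dicts is our own, not the paper's), the parsing action table can be stored as a map from state to action row, and states with equal rows collapsed onto one representative:

```python
# A sketch of state merging, assuming the action table is stored as a
# dict from state number to its (input symbol -> action) row.
def merge_identical_states(table):
    """Map every state to the lowest-numbered state with the same row."""
    canonical = {}                       # row contents -> representative state
    remap = {}
    for state in sorted(table):
        row = tuple(sorted(table[state].items()))
        canonical.setdefault(row, state)
        remap[state] = canonical[row]
    merged = {s: table[s] for s in sorted(set(remap.values()))}
    return merged, remap

# States 0 and 5 share a row, as in the example above:
table = {
    0: {'a': ('shift', 3), 'b': ('shift', 4)},
    1: {'$': ('accept',), ',': ('shift', 5)},
    5: {'a': ('shift', 3), 'b': ('shift', 4)},
}
merged, remap = merge_identical_states(table)
print(remap)            # {0: 0, 1: 1, 5: 0}
```

In a full implementation, every shift and goto target that names a merged-away state (here 5) would then be redirected through `remap`.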

Computing Surveys, Vol. 6, No. 2, June 1974



8.2 Subsuming States

A slight generalization of the transformation in Section 8.1 is to eliminate a state whose parsing actions are a suffix of the actions of another state. We then label the beginning of the suffix by the eliminated state. For example, if we have:

n: if(input = 'x') shift p
   if(input = 'y') shift q
   error

and

m: if(input = 'y') shift q
   error

then we may eliminate state m by adding the label into the middle of state n:

n: if(input = 'x') shift p
m: if(input = 'y') shift q
   error

Permuting the order of these statements can increase the applicability of this optimization. (See Ichbiah and Morse (1970) for suggestions on the implementation of this optimization.)

8.3 Elimination of Reductions by Single Productions

A single production is one of the form A → X, where A is a nonterminal and X is a grammar symbol. If this production is not of any importance in the translation, then we say that the single production is semantically insignificant. A common situation in which single productions arise occurs when a grammar is used to describe the precedence levels and associativities of operators (see grammar G5 of Example 7.3). We can always cause an LR parser to avoid making these reductions; by doing so we make the LR parser faster, and reduce the number of states. (With some grammars, the size of the "optimized" form of the parsing action table may be greater than that of the unoptimized one.)

We shall give an example in terms of G1, which contains the single production

LIST → ELEMENT

We shall eliminate reductions by this production from the parser for G1 found in Section 5. The only state which calls for a reduction by this production is state 2. Moreover, the only way in which we can get to state 2 is by the goto action

ELEMENT: if(state = 0) goto = 2

After the parser does the reduction in state 2, it immediately refers to the goto action

LIST: goto = 1

at which time the current state becomes 1. Thus, the rightmost tree is only labeled with state 2 for a short period of time; state 2 represents only a step on the way to state 1. We may eliminate this reduction by the single production by changing the goto action under ELEMENT to:

ELEMENT: if(state = 0) goto = 1

so that we bypass state 2 and go directly to state 1. We now find that state 2 can never be reached by any parsing action, so it can be eliminated. Moreover, it turns out here (and frequently in practice as well) that the goto actions for LIST and ELEMENT become compatible at this point; that is, the actions do not differ on the same state. It is always possible to merge compatible goto actions for nonterminals; the resulting parser has one less state, and one less goto action.

Example 8.1: The following is a representation of the parsing action and goto tables for an LR(1) parser for G1. It results from the parsing action and goto tables in Section 5 by applying state merger (Section 8.1) and eliminating the reduction by the single production.

Parsing Action Table

0: 5: if(input = 'a') shift 3
      if(input = 'b') shift 4
      error
1: if(input = ',') shift 5
   if(input = '$') accept
   error
3: reduce by: ELEMENT → 'a'
4: reduce by: ELEMENT → 'b'
6: reduce by: LIST → LIST ',' ELEMENT

Goto Table

LIST:
ELEMENT: if(state = 0) goto = 1
         goto = 6
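The tables of Example 8.1 can be exercised directly. The following driver is a minimal sketch under our own encoding (dicts for the shift entries, a separate reduce map, and the merged goto row coded as the single test "goto 1 if the exposed state is 0, else 6"):

```python
# Hypothetical encoding of the Example 8.1 tables; state 5 shares its
# shift row with state 0 after the merger of Section 8.1.
SHIFT = {0: {'a': 3, 'b': 4}, 5: {'a': 3, 'b': 4}, 1: {',': 5}}
REDUCE = {3: ('ELEMENT', 1), 4: ('ELEMENT', 1), 6: ('LIST', 3)}

def parse(tokens):
    """Run the optimized LR(1) parser on a token list ending in '$'."""
    stack = [0]                          # stack of states; 0 is initial
    pos = 0
    while True:
        state = stack[-1]
        if state in REDUCE:              # reduce: pop |right side| states
            lhs, rhs_len = REDUCE[state]
            del stack[-rhs_len:]
            # merged goto row for LIST and ELEMENT
            stack.append(1 if stack[-1] == 0 else 6)
        elif state == 1 and tokens[pos] == '$':
            return True                  # accept
        elif tokens[pos] in SHIFT.get(state, {}):
            stack.append(SHIFT[state][tokens[pos]])
            pos += 1
        else:
            return False                 # error entry

print(parse(list('a,b,a') + ['$']))      # True
print(parse(list('a,,b') + ['$']))       # False
```

Because the reduction by LIST → ELEMENT has been eliminated, each reduce step goes straight through the merged goto row; state 2 never appears.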




These tables are identical with those for the ambiguous version of G1, after the equal states have been identified. The tables differ only in that the nonterminal symbols LIST and ELEMENT have been explicitly merged in the ambiguous grammar, while the distinction is still nominally made in the tables above.

In the general case, there may be a number of states which call for reductions by the same single production, and there may be other parsing actions in the states which call for these reductions. It is not always possible, in general, to perform these modifications without increasing the number of states; conditions which must be satisfied in order to profitably carry out this process are given in [Aho and Ullman (1973b)]. It is enough for our purposes to notice that if a reduction by a single production A → X is to be eliminated, and if this reduction is generated by exactly one set of items containing the item with the core

[A → X .]

then this single production can be eliminated. It turns out that the single productions which arise in the representation of operator precedence or associativity can always be eliminated; the result is typically the same as if an ambiguous grammar were written, and the conflicts resolved as discussed in Section 6. However, the ambiguous grammar generates the reduced parser immediately, without needing this optimizing algorithm [Aho, Johnson, and Ullman (1973)].

Other approaches to the optimization of LR parsers are discussed by [Aho and Ullman (1972b)], [Anderson (1972)], [Jolliat (1973)], and [Pager (1970)]. [Anderson, Eve, and Horning (1973)], [Demers (1973)], and [Pager (1974)] also discuss the elimination of reductions by single productions.

9. ERROR RECOVERY

A properly designed LR parser will announce that an error has occurred as soon as there is no way to make a valid continuation to the input already scanned. Unfortunately, it is not always easy to decide what the parser should do when an error is detected; in general, this depends on the environment in which the parser is operating. Any scheme for error recovery must be carefully interfaced with the lexical analysis and code generation phases of compilation, since these operations typically have "side effects" which must be undone before the error can be considered corrected. In addition, a compiler should recover gracefully from each error encountered so that subsequent errors can also be detected.

LR parsers can accommodate a wide variety of error recovery stratagems. In place of each error entry in each state, we may insert an error correction routine which is prepared to take some extraordinary actions to correct the error. The description of the state as given by the set of items frequently provides enough context information to allow for the construction of sophisticated error recovery routines.

We shall illustrate one simple method by which error recovery can be introduced into the parsing process. This method is only one of many possible techniques. We introduce error recovery productions of the form

A → error

into the grammar for certain selected nonterminals. Here, error is a special terminal symbol. These error recovery productions will introduce items with cores of the form

[A → . error]

into certain states, as well as introducing new states of the form

[A → error .]

When the LR parser encounters an error, it can announce error and replace the current input symbol by the special terminal symbol error. The parser can then discard trees from the parse forest, one at a time from right-to-left, until the current state (the state on the rightmost tree in the parse forest) has a parsing action shift on the input error. The parser has now reached a state with at least one item of the form

[A → . error]

The parser can then perform the shift

Computing Surveys, Vol. 6, N o 2, June 1974



action and reduce by one of the error recovery productions

A → error

(If more than one error recovery production is present, a choice would have to be specified.) On reducing, the parser can perform a hand-tailored action associated with this error situation. One such action could be to skip forward on the input until an input symbol 'a' is found such that 'a' can legitimately occur either as the last symbol of a string generated by A or as the first symbol of a string that can follow A.

Certain automatic error recovery actions are also possible. For example, the error recovery productions can be mechanically generated for any specified set of nonterminals. Parsing and error recovery can proceed as above, except that on reducing by an error recovery production, the parser can automatically discard input symbols until it finds an input symbol, say 'a', on which it can make a legitimate parsing action, at which time normal parsing resumes. This would correspond to assuming that an error was encountered while the parser was looking for a phrase that could be reduced to nonterminal A. The parser would then assume that by skipping forward on the input to the symbol 'a' it would have found an instance of nonterminal A.

Certain error recovery schemes can produce an avalanche of error messages. To avoid a succession of error messages stemming from an inappropriate recovery, a parser might suppress the announcement of subsequent errors until a certain number of successful shift actions have occurred.

We feel that, at present, there is no efficient general "solution" to the error recovery problem in compiling. We see faults with any uniform approach, including the one above. Moreover, the success of any given approach can vary considerably from application to application. We feel that if a language is cleanly designed and well human-engineered, automatic error recovery will be easier as well.

Particular methods of error recovery during parsing are discussed by [Aho and Peterson (1972)], [Graham and Rhodes (1973)], [James (1972)], [Leinius (1970)], [McGruther (1972)], [Peterson (1972)], and [Wirth (1968)].

10. OUTPUT

In compiling, we are not interested in parsing per se, but rather in producing a translation for the source program. LR parsing is eminently suitable for producing bottom-up translations.

Any translation which can be expressed as the concatenation of outputs associated with each production can be readily produced by an LR parser, without having to construct the forest representing the derivation tree. For example, we can specify a translation of arithmetic expressions from infix notation to postfix Polish notation in this way. To implement this class of translations, when we reduce, we perform an output action associated with that production. For example, to produce postfix Polish from G5, we can use the following translation scheme:

Production            Translation
(1) E → E '+' E       '+'
(2) E → E '*' E       '*'
(3) E → '(' E ')'
(4) E → 'a'           'a'

Here, as in Section 7, we assume that + and * are left associative, and that * has higher precedence than +. The translation element is the output string to be emitted when the associated reduction is done. Thus, if the input string

'a + a * (a + a)'

is parsed, the output will be

'aaaa + * +'

These parsers can also produce three-address code or the parse tree as output with the same ease. However, more complex translations may require more elaborate intermediate storage. Mechanisms for implementing these translations are discussed in [Aho and Ullman (1973a)] and in [Lewis, Rosenkrantz, and Stearns (1973)]. It is our current belief that, if a complicated translation is called for, the best way of implementing it is by constructing a tree. Optimizing transformations can then massage this tree before final code generation takes place. This scheme is simple and has low overhead when the input is in error.

11. CONCLUDING REMARKS

LR parsers belong to the class of shift-reduce parsing algorithms [Aho, Denning, and Ullman (1972)]. These are parsers that operate by scanning their input from left-to-right, shifting input symbols onto a pushdown stack until the handle of the current right sentential form is on top of the stack; the handle is then reduced. This process is continued either until all of the input has been scanned and the stack contains only the start symbol, or until an error has been encountered.

During the 1960s a number of shift-reduce parsing algorithms were found for various subclasses of the context-free grammars. The operator precedence grammars [Floyd (1963)], the simple precedence grammars [Wirth and Weber (1966)], the simple mixed strategy precedence grammars [McKeeman, Horning, and Wortman (1970)], and the uniquely invertible weak precedence grammars [Ichbiah and Morse (1970)] are some of these subclasses. The definitions of these classes of grammars and the associated parsing algorithms are discussed in detail in [Aho and Ullman (1972a)].

In 1965 Knuth defined a class of grammars which he called the LR(k) grammars. These are the context-free grammars that one can naturally parse bottom-up using a deterministic pushdown automaton with k-symbol lookahead to determine shift-reduce parsing actions. This class of grammars includes all of the other shift-reduce parsable grammars and admits of a parsing procedure that appears to be at least as efficient as the shift-reduce parsing algorithms given for these other classes of grammars. [Lalonde, Lee, and Horning (1971)] and [Anderson, Eve, and Horning (1973)] provide some empirical comparisons between LR and precedence parsing that support this conclusion.

In his paper Knuth outlined a method for constructing an LR parser for an LR grammar. However, this algorithm results in parsers that are too large for practical use. A few years later [Korenjak (1969)] and particularly [DeRemer (1969 and 1971)] succeeded in substantially modifying Knuth's original parser construction procedure to produce parsers of practical size. Substantial progress has been made since in improving the size and performance of LR parsers.

The general theory of LR(k) grammars and languages is developed in [Aho and Ullman (1972a and 1973a)]. Proofs of the correctness and efficacy of many of the constructions in this paper can be found there.

Perhaps the biggest advantage of LR parsing is that small, fast parsers can be mechanically generated for a large class of context-free grammars that includes all other classes of grammars for which non-backtracking parsing algorithms can be mechanically generated. In addition, LR parsers are capable of detecting syntax errors at the earliest opportunity in a left-to-right scan of an input string, a property not enjoyed by many other parsing algorithms.

Just as we can parse by constructing a derivation tree for an input string bottom-up (from the leaves to the root), we can also parse top-down by constructing the derivation tree from the root to the leaves. A proper subclass of the LR grammars can be parsed deterministically top-down. These are the LL grammars, first studied by [Lewis and Stearns (1968)]. LL parsers are also efficient and have good error-detecting capabilities. In addition, an LL parser requires less initial optimization to be of practical size. However, the most serious disadvantage of LL techniques is that LL grammars tend to be unnatural and awkward to construct. Moreover, there are LR languages which do not possess any LL grammar.

These considerations, together with practical experience with an automatic parser generating system based on the principles expounded in this paper, lead us to believe that LR parsing is an important, practical tool for compiler design.
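As a concrete postscript, the emit-on-reduce translation of Section 10 can be sketched in a few lines. For brevity this sketch drives the reductions with explicit operator precedences rather than with the full LR tables for G5 (the function name and encoding are our own), but it emits exactly the translation elements of that scheme:

```python
# Infix-to-postfix translation for the operators of G5, emitting the
# translation element each time a reduction is performed.  Precedence
# of '*' over '+' and left associativity mirror Section 7.
PREC = {'+': 1, '*': 2}

def to_postfix(expr):
    out, ops = [], []                 # output, pending-operator stack
    for sym in expr:
        if sym == 'a':
            out.append(sym)           # reduce E -> 'a': emit 'a'
        elif sym == '(':
            ops.append(sym)
        elif sym == ')':
            while ops[-1] != '(':     # reduce pending operators
                out.append(ops.pop())
            ops.pop()                 # reduce E -> '(' E ')': emit nothing
        else:
            # left associativity: reduce equal/higher precedence first
            while ops and ops[-1] != '(' and PREC[ops[-1]] >= PREC[sym]:
                out.append(ops.pop())
            ops.append(sym)
    while ops:
        out.append(ops.pop())         # reduce E -> E op E: emit op
    return ''.join(out)

print(to_postfix('a+a*(a+a)'))        # aaaa+*+
```

The output agrees with the example in Section 10: parsing 'a + a * (a + a)' yields 'aaaa + * +'.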




REFERENCES

AHO, A. V., DENNING, P. J., AND ULLMAN, J. D. "Weak and mixed strategy precedence parsing." J. ACM 19, 2 (1972), 225-243.

AHO, A. V., JOHNSON, S. C., AND ULLMAN, J. D. "Deterministic parsing of ambiguous grammars." Conference Record of ACM Symposium on Principles of Programming Languages (Oct. 1973), 1-21.

AHO, A. V., AND PETERSON, T. G. "A minimum distance error-correcting parser for context-free languages." SIAM J. Computing 1, 4 (1972), 305-312.

AHO, A. V., AND ULLMAN, J. D. The Theory of Parsing, Translation, and Compiling, Vol. 1, Parsing. Prentice-Hall, Englewood Cliffs, N.J., 1972a.

AHO, A. V., AND ULLMAN, J. D. "Optimization of LR(k) parsers." J. Computer and System Sciences 6, 6 (1972b), 573-602.

AHO, A. V., AND ULLMAN, J. D. The Theory of Parsing, Translation, and Compiling, Vol. 2, Compiling. Prentice-Hall, Englewood Cliffs, N.J., 1973a.

AHO, A. V., AND ULLMAN, J. D. "A technique for speeding up LR(k) parsers." SIAM J. Computing 2, 2 (1973b), 106-127.

ANDERSON, T. "Syntactic analysis of LR(k) languages." PhD Thesis, Univ. Newcastle-upon-Tyne, Northumberland, England (1972).

ANDERSON, T., EVE, J., AND HORNING, J. J. "Efficient LR(1) parsers." Acta Informatica 2 (1973), 12-39.

DEMERS, A. "Elimination of single productions and merging nonterminal symbols of LR(1) grammars." Technical Report TR-127, Computer Science Laboratory, Dept. of Electrical Engineering, Princeton Univ., Princeton, N.J., July 1973.

DEREMER, F. L. "Practical translators for LR(k) languages." Project MAC Report MAC TR-65, MIT, Cambridge, Mass., 1969.

DEREMER, F. L. "Simple LR(k) grammars." Comm. ACM 14, 7 (1971), 453-460.

EARLEY, J. "An efficient context-free parsing algorithm." Comm. ACM 13, 2 (1970), 94-102.

FELDMAN, J. A., AND GRIES, D. "Translator writing systems." Comm. ACM 11, 2 (1968), 77-113.

FLOYD, R. W. "Syntactic analysis and operator precedence." J. ACM 10, 3 (1963), 316-333.

GRAHAM, S. L., AND RHODES, S. P. "Practical syntactic error recovery in compilers." Conference Record of ACM Symposium on Principles of Programming Languages (Oct. 1973), 52-58.

GRIES, D. Compiler Construction for Digital Computers. Wiley, New York, 1971.

ICHBIAH, J. D., AND MORSE, S. P. "A technique for generating almost optimal Floyd-Evans productions for precedence grammars." Comm. ACM 13, 8 (1970), 501-508.

JAMES, L. R. "A syntax directed error recovery method." Technical Report CSRG-13, Computer Systems Research Group, Univ. Toronto, Toronto, Canada, 1972.

JOLLIAT, M. L. "On the reduced matrix representation of LR(k) parser tables." PhD Thesis, Univ. Toronto, Toronto, Canada (1973).

KNUTH, D. E. "On the translation of languages from left to right." Information and Control 8, 6 (1965), 607-639.

KNUTH, D. E. "Top down syntax analysis." Acta Informatica 1, 2 (1971), 97-110.

KORENJAK, A. J. "A practical method of constructing LR(k) processors." Comm. ACM 12, 11 (1969), 613-623.

LALONDE, W. R., LEE, E. S., AND HORNING, J. J. "An LALR(k) parser generator." Proc. IFIP Congress 71 TA-3, North-Holland Publishing Co., Amsterdam, the Netherlands (1971), pp. 153-157.

LEINIUS, P. "Error detection and recovery for syntax directed compiler systems." PhD Thesis, Univ. Wisconsin, Madison, Wisc. (1970).

LEWIS, P. M., ROSENKRANTZ, D. J., AND STEARNS, R. E. "Attributed translations." Proc. Fifth Annual ACM Symposium on Theory of Computing (1973), 160-171.

LEWIS, P. M., AND STEARNS, R. E. "Syntax directed transduction." J. ACM 15, 3 (1968), 464-488.

MCGRUTHER, T. "An approach to automating syntax error detection, recovery, and correction for LR(k) grammars." Master's Thesis, Naval Postgraduate School, Monterey, Calif., 1972.

MCKEEMAN, W. M., HORNING, J. J., AND WORTMAN, D. B. A Compiler Generator. Prentice-Hall, Englewood Cliffs, N.J., 1970.

PAGER, D. "A solution to an open problem by Knuth." Information and Control 17 (1970), 462-473.

PAGER, D. "On the incremental approach to left-to-right parsing." Technical Report PE 238, Information Sciences Program, Univ. Hawaii, Honolulu, Hawaii, 1972a.

PAGER, D. "A fast left-to-right parser for context-free grammars." Technical Report PE 240, Information Sciences Program, Univ. Hawaii, Honolulu, Hawaii, 1972b.

PAGER, D. "On eliminating unit productions from LR(k) parsers." Technical Report, Information Sciences Program, Univ. Hawaii, Honolulu, Hawaii, 1974.

PETERSON, T. G. "Syntax error detection, correction and recovery in parsers." PhD Thesis, Stevens Institute of Technology, Hoboken, N.J., 1972.

WIRTH, N. "PL360--a programming language for the 360 computers." J. ACM 15, 1 (1968), 37-74.

WIRTH, N., AND WEBER, H. "EULER--a generalization of ALGOL and its formal definition." Comm. ACM 9, 1 (1966), 13-23, and 9, 2 (1966), 89-99.