
A complete guide for developing Canonical and Look Ahead LR Parser.


Bell Laboratories, Murray Hill, New Jersey 07974

deterministic context-free languages in compiling applications. This paper provides an informal exposition of LR parsing techniques emphasizing the mechanical generation of efficient LR parsers for context-free grammars. Particular attention is given to extending the parser generation techniques to apply to ambiguous grammars.

Keywords and phrases: grammars, parsers, compilers, ambiguous grammars, context-free languages, LR grammars.

CR categories: 4.12, 5.23

A complete specification of a programming language must perform at least two functions. First, it must specify the syntax of the language; that is, which strings of symbols are to be deemed well-formed programs. Second, it must specify the semantics of the language; that is, what meaning or intent should be attributed to each syntactically correct program.

A compiler for a programming language must verify that its input obeys the syntactic conventions of the language specification. It must also translate its input into an object language program in a manner that is consistent with the semantic specification of the language. In addition, if the input contains syntactic errors, the compiler should announce their presence and try to pinpoint their location. To help perform these functions every compiler has a device within it called a parser.

A context-free grammar can be used to help specify the syntax of a programming language. In addition, if the grammar is designed carefully, much of the semantics of the language can be related to the rules of the grammar.

There are many different types of parsers for context-free grammars. In this paper we restrict ourselves to a class of parsers known as LR parsers. These parsers are efficient and well suited for use in compilers for programming languages. Perhaps more important is the fact that we can automatically generate LR parsers for a large and useful class of context-free grammars. The purpose of this article is to show how LR parsers can be generated from certain context-free grammars, even some ambiguous ones. An important feature of the parser generation algorithm is the automatic detection of ambiguities and difficult-to-parse constructs in the language specification.

We begin this paper by showing how a context-free grammar defines a language. We then discuss LR parsing and outline the parser generation algorithm. We conclude by showing how the performance of LR parsers can be improved by a few simple transformations, and how error recovery and "semantic actions" can be incorporated into the LR parsing framework.

For the purposes of this paper, a sentence is a string of terminal symbols. Sentences are written surrounded by a pair of single quotes. For example, 'a', 'ab', and ',' are sentences. The empty sentence is written ''. Two sentences written contiguously are to be concatenated, thus 'a' 'b' is synonymous with 'ab'.

100 • A. V. Aho and S. C. Johnson

In this paper, a language means a set of sentences.

CONTENTS

1 Introduction
2 Grammars
3 Derivation Trees
4 Parsers
5 Representing the Parsing Action and Goto Tables
6 Construction of a Parser from a Grammar
  6.1 Sets of Items
  6.2 Constructing the Collection of Accessible Sets of Items
  6.3 Constructing the Parsing Action and Goto Tables from the Collection of Sets of Items
  6.4 Computing Lookahead Sets
7 Parsing Ambiguous Grammars
8 Optimization of LR Parsers
  8.1 Merging Identical States
  8.2 Subsuming States
  8.3 Elimination of Reductions by Single Productions
9 Error Recovery
10 Output
11 Concluding Remarks
References

2. GRAMMARS

A grammar is used to define a language and to impose a structure on each sentence in the language. We shall be exclusively concerned with context-free grammars, sometimes called BNF (for Backus-Naur form) specifications.

In a context-free grammar, we specify two disjoint sets of symbols to help define a language. One is a set of nonterminal symbols. We shall represent a nonterminal symbol by a string of one or more capital roman letters. For example, LIST represents a nonterminal as does the letter A. In the grammar, one nonterminal is distinguished as a start (or sentence) symbol.

The second set of symbols used in a context-free grammar is the set of terminal symbols. The sentences of the language generated by a grammar will contain only terminal symbols. We shall refer to a terminal or nonterminal symbol as a grammar symbol.

A context-free grammar itself consists of a finite set of rules called productions. A production has the form

    left-side → right-side,

where left-side is a single nonterminal symbol (sometimes called a syntactic category) and right-side is a string of zero or more grammar symbols. The arrow is simply a special symbol that separates the left and right sides. For example,

    LIST → LIST ',' ELEMENT

is a production in which LIST and ELEMENT are nonterminal symbols, and the quoted comma represents a terminal symbol.
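As an illustration that is not part of the original paper, a production can be held in a small data structure. The sketch below assumes Python; the names `Production`, `is_terminal`, `is_nonterminal` and `show` are hypothetical helpers. It follows the conventions stated above: a nonterminal is a string of capital letters, and a terminal is written inside single quotes.

```python
from typing import NamedTuple, Tuple


class Production(NamedTuple):
    """One rule 'left-side -> right-side' (hypothetical helper type)."""
    left: str                # a single nonterminal symbol
    right: Tuple[str, ...]   # zero or more grammar symbols


def is_terminal(symbol: str) -> bool:
    # Convention from the text: terminals are quoted, e.g. "','".
    return len(symbol) >= 2 and symbol[0] == "'" and symbol[-1] == "'"


def is_nonterminal(symbol: str) -> bool:
    # Convention from the text: one or more capital roman letters.
    return symbol.isalpha() and symbol.isupper()


def show(prod: Production) -> str:
    """Render a production in the 'left -> right' form."""
    return f"{prod.left} -> {' '.join(prod.right)}"


# The example production from the text: LIST -> LIST ',' ELEMENT
p = Production("LIST", ("LIST", "','", "ELEMENT"))
```

Keeping the right side as a tuple allows it to be empty, which matches the definition of a right side as a string of zero or more grammar symbols.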

Copyright © 1974, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

A grammar is a rewriting system. If $\alpha A \gamma$ is a string of grammar symbols and $A \to \beta$ is a production, then we write

$$\alpha A \gamma \Rightarrow \alpha \beta \gamma$$

and say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$. A sequence of strings

LR Parsing • 101

$$s_0, s_1, \ldots, s_n$$

such that $s_{i-1} \Rightarrow s_i$ for $1 \le i \le n$ is said to be a derivation of $s_n$ from $s_0$. We sometimes also say $s_n$ is derivable from $s_0$.

The start symbol of a grammar is called a sentential form. A string derivable from the start symbol is also a sentential form of the grammar. A sentential form containing only terminal symbols is said to be a sentence generated by the grammar. The language generated by a grammar G, often denoted by L(G), is the set of sentences generated by G.

Example 2.1: The following grammar, hereafter called G1, has LIST as its start symbol:

    LIST → LIST ',' ELEMENT
    LIST → ELEMENT
    ELEMENT → 'a'
    ELEMENT → 'b'

The sequence:

    LIST ⇒ LIST ',' ELEMENT
         ⇒ LIST ',a'
         ⇒ LIST ',' ELEMENT ',a'
         ⇒ LIST ',b,a'
         ⇒ ELEMENT ',b,a'
         ⇒ 'a,b,a'

is a derivation of the sentence 'a,b,a'. L(G1) consists of nonempty strings of a's and b's, separated by commas.

Note that in the derivation in Example 2.1, the rightmost nonterminal in each sentential form is rewritten to obtain the following sentential form. Such a derivation is said to be a rightmost derivation and each sentential form in such a derivation is called a right sentential form. For example,

    LIST ',b,a'

is a right sentential form of G1.

If $\alpha A w$ is a right sentential form in which $w$ is a string of terminal symbols, and $\alpha A w \Rightarrow \alpha \beta w$, then $\beta$ is said to be a handle of $\alpha \beta w$.* For example, 'b' is the handle of the right sentential form

    LIST ',b,a'

in Example 2.1.

* Some authors use a more restricting definition of handle.

Any prefix of $\alpha \beta$ in the right sentential form $\alpha \beta w$ is said to be a viable prefix of the grammar. For example,

    LIST ','

is a viable prefix of G1, since it is a prefix of the right sentential form

    LIST ',' ELEMENT

(Both $\alpha$ and $w$ are null here.)

Restating this definition, a viable prefix of a grammar is any prefix of a right sentential form that does not extend past the right end of a handle in that right sentential form. Thus we know that there is always some string of grammar symbols that can be appended to the end of a viable prefix to obtain a right sentential form. Viable prefixes are important in the construction of compilers with good error-detecting capabilities: as long as the portion of the input we have seen can be derived from a viable prefix, we can be sure that there are no errors that can be detected having scanned only that part of the input.

3. DERIVATION TREES

Frequently, our interest in a grammar is not only in the language it generates, but also in the structure it imposes on the sentences of the language. This is the case because grammatical analysis is closely connected with other processes, such as compilation and translation, and the translations or actions of the other processes are frequently defined in terms of the productions of the grammar. With this in mind, we turn our attention to the representation of a derivation by its derivation tree.

For each derivation in a grammar we can construct a corresponding derivation tree. Let us consider the derivation in Example 2.1. To model the first step of the derivation, in which LIST is rewritten as

    LIST ',' ELEMENT

using production 1, we first create a root labeled by the start symbol LIST, and then create three direct descendants of the root, labeled LIST, ',', and ELEMENT:


            LIST
          /  |   \
      LIST  ','  ELEMENT

(We follow historical usage and draw our "root" node at the top.) In the second step of the derivation, ELEMENT is rewritten as 'a'. To model this step, we create a direct descendant labeled 'a' for the node labeled ELEMENT:

            LIST
          /  |   \
      LIST  ','  ELEMENT
                    |
                   'a'

Continuing in this fashion, we obtain the following tree:

                  LIST
                /  |   \
            LIST  ','  ELEMENT
          /  |   \        |
      LIST  ','  ELEMENT 'a'
        |           |
    ELEMENT        'b'
        |
       'a'

Note that if a node of the derivation tree is labeled with a nonterminal symbol A and its direct descendants are labeled $X_1, X_2, \ldots, X_n$, then the production

$$A \to X_1 X_2 \cdots X_n$$

must be in the grammar.

If $a_1, a_2, \ldots, a_m$ are the labels of all the leaves of a derivation tree, in the natural left-to-right order, then the string

$$a_1 a_2 \cdots a_m$$

is called the frontier of the tree. For example, 'a,b,a' is the frontier of the previous tree. Clearly, for every sentence in a language there is at least one derivation tree with that sentence as its frontier. A grammar that admits two or more distinct derivation trees with the same frontier is said to be ambiguous.

Example 3.1: The grammar G2 with productions

    LIST → LIST ',' LIST
    LIST → 'a'
    LIST → 'b'

is ambiguous because the following two derivation trees have the same frontier.

[Figure: two distinct derivation trees in G2 with the same frontier]

In certain situations ambiguous grammars can be used to represent programming languages more economically than equivalent unambiguous grammars. However, if an ambiguous grammar is used, then some other rules should be specified along with the grammar to determine which of several derivation trees is to be associated with a given input. We shall have more to say about ambiguous grammars in Section 7.

4. PARSERS

We can consider a parser for a grammar to be a device which, when presented with an input string, attempts to construct a derivation tree whose frontier matches the input. If the parser can construct such a derivation tree, then it will have verified that the input string is a sentence of the language generated by the grammar. If the input is syntactically incorrect, then the tree construction process will not succeed and the positions at which the process falters can be used to indicate possible error locations.

A parser can operate in many different ways. In this paper we shall restrict ourselves to parsers that examine the input string from left to right, one symbol at a time. These parsers will attempt to construct the derivation tree "bottom-up"; i.e., from the leaves to the root. For historical reasons, these parsers are called LR parsers. The "L" stands for "left-to-right scan of the input", the "R" stands for "rightmost derivation." We shall see that an LR parser operates by reconstructing the reverse of a rightmost derivation for the input. In this section we shall describe in an informal way how a certain class of LR parsers, called LR(1) parsers, operate.

An LR parser deals with a sequence of partially built trees during its tree construction process. We shall loosely call this sequence of trees a forest. In our framework the forest is built from left to right as the input is read. At a particular stage in the construction process, we have read a certain amount of the input, and we have a partially constructed derivation tree. For example, suppose that we are parsing the input string 'a,b' according to the grammar G1. After reading the first 'a' we construct the tree:

       'a'

Then we construct:

    ELEMENT
       |
      'a'

using the production

    ELEMENT → 'a'

To reflect this parsing action, we say that 'a' is reduced to ELEMENT. Next we use the production

    LIST → ELEMENT

to obtain the tree:

     LIST
       |
    ELEMENT
       |
      'a'

Here, ELEMENT is reduced to LIST. We then read the next input symbol ',', and add it to the forest as a one node tree.

     LIST     ','
       |
    ELEMENT
       |
      'a'

We now have two trees. These trees will eventually become sub-trees in the final derivation tree. We then read the next input symbol 'b' and create a single node tree for it as well:

     LIST     ','    'b'
       |
    ELEMENT
       |
      'a'

Using the production,

    ELEMENT → 'b'


we obtain:

     LIST     ','   ELEMENT
       |               |
    ELEMENT           'b'
       |
      'a'

Finally, using the production

    LIST → LIST ',' ELEMENT

we combine these three trees into the final tree:

              LIST
            /  |   \
        LIST  ','  ELEMENT
          |           |
      ELEMENT        'b'
          |
         'a'

At this point the parser detects that we have read all of the input and announces that the parsing is complete. The rightmost derivation of 'a,b' in G1 is

    LIST ⇒ LIST ',' ELEMENT
         ⇒ LIST ',b'
         ⇒ ELEMENT ',b'
         ⇒ 'a,b'

In parsing 'a,b' in the above manner, all we have done is reconstruct this rightmost derivation in reverse. The sequence of productions encountered in going through a rightmost derivation in reverse is called a right parse.

There are four types of parsing actions that an LR parser can make: shift, reduce, accept (announce completion of parsing), or announce error.

In a shift action, the next input symbol is removed from the input. A new node labeled by this symbol is added to the forest at the right as a new tree by itself.

In a reduce action, a production such as

$$A \to X_1 X_2 \cdots X_n$$

is specified; each $X_i$ represents a terminal or nonterminal symbol. A reduction by this production causes the following operations:

(1) A new node labeled A is created.
(2) The rightmost n roots in the forest (which will have already been labeled $X_1, X_2, \ldots, X_n$) are made direct descendants of the new node, which then becomes the rightmost root of the forest.

If the reduction is by a production of the form

    A → ''

(i.e., where the right side is the empty string), then the parser merely creates a root labeled A with no descendants.

A parser operates by repeatedly making parsing actions until either an accept or error action occurs.

The reader should verify that the following sequence of parsing actions builds the parse tree for 'a,b' in G1:

(1) Shift 'a'
(2) Reduce by: ELEMENT → 'a'
(3) Reduce by: LIST → ELEMENT
(4) Shift ','
(5) Shift 'b'
(6) Reduce by: ELEMENT → 'b'
(7) Reduce by: LIST → LIST ',' ELEMENT
(8) Accept

We now consider the question of how an LR parser decides what parsing actions to make. Clearly a parsing action can depend on what actions have already been made and on what the next input symbols are. An LR parser that looks at only the next input symbol to decide which parsing action to make is called an LR(1) parser. If it looks at the next k input symbols, k ≥ 0, it is called an LR(k) parser. To help to make its parsing decisions, an LR parser attaches to the root of each tree in the forest a number called a state. The number on the root of the rightmost tree is called the current state. In addition, there is an initial state to the left of the forest, which helps determine the very first parsing action.


We shall write the states in parentheses above the associated roots. For example,

    (0)   (1)      (5)
          LIST     ','
            |
         ELEMENT
            |
           'a'

represents a forest with states. State 5 is the current state, and state 0 is the initial state. The current state and the next input symbol determine the parsing action of an LR(1) parser.

The following table shows the states of an LR(1) parser for G1, and the associated parsing actions. In this table there is a column labeled '$' with special significance. The '$' stands for the right endmarker, which is assumed to be appended to the end of all input strings. Another way of looking at this is to think of '$' as representing the condition where we have read and shifted all of the "real" characters in the input string.

                         Next Input Symbol
    Current State   'a'     'b'     ','      '$'
          0         shift   shift   error    error
          1         error   error   shift    accept
          2         error   error   Red. 2   Red. 2
          3         error   error   Red. 3   Red. 3
          4         error   error   Red. 4   Red. 4
          5         shift   shift   error    error
          6         error   error   Red. 1   Red. 1

FIG. 1. Parsing Action Table for G1

The reduce actions are represented as "Red. n" in the above table; the integer n refers to the productions as follows:

(1) LIST → LIST ',' ELEMENT
(2) LIST → ELEMENT
(3) ELEMENT → 'a'
(4) ELEMENT → 'b'

We shall refer to the entry for row s and column c as pa(s,c). After making either a shift move or a reduce move, the parser must determine what state to attach to the root of the tree that has just been added to the forest. In a shift move, this state is determined by the current state and the input symbol that was just shifted into the forest. For example, if we have just shifted ',' into the forest

    (0)   (1)
          LIST     ','
            |
         ELEMENT
            |
           'a'

then state 1 and ',' determine the state to be attached to the new rightmost root ','.

In a reduce move, suppose we reduce by production

$$A \to X_1 X_2 \cdots X_n$$

When we make nodes $X_1, \ldots, X_n$ direct descendants of the root A, we remove the states that were attached to $X_1, \ldots, X_n$. The state that is to be attached to node A is determined by the state that is now the rightmost state in the forest, and the nonterminal A. For example, if we have just reduced by the production

    LIST → LIST ',' ELEMENT

and created the forest

    (0)
                  LIST
                /  |   \
            LIST  ','  ELEMENT
              |           |
          ELEMENT        'b'
              |
             'a'

then state 0 and the nonterminal LIST determine the state to be attached to the root LIST. Note that the states previously attached to the direct descendants of the new


root have disappeared, and play no role in the calculation of the new state.

The following table determines these new states for G1. For reasons that will become apparent later, we shall call this table the goto table for G1.

                            Label of New Root
    Rightmost State   LIST   ELEMENT   'a'   'b'   ','
           0           1        2       3     4
           1                                        5
           2
           3
           4
           5                    6       3     4
           6

FIG. 2. Goto Table for G1

We shall refer to the entry in the row for state s and column c as goto(s, c). It turns out that the entries in the goto table which are blank will never be consulted [Aho and Ullman (1972b)].

An LR parser for a grammar is completely specified when we have given the parsing action table and the goto table. We can picture an LR(1) parser as shown in Fig. 3.

[Figure: the input 'a , b $' with an input cursor; a forest consisting of a partially constructed derivation tree with states attached; the LR(1) parsing algorithm]

FIG. 3. Pictorial Representation of an LR(1) Parser

The LR(1) parsing algorithm can be summarized as follows:

Initialize: Place the initial state into an otherwise empty forest; the initial state is the current state at the beginning of the parse.

Parsing Action: Examine the parsing action table, and determine the entry corresponding to the current state and the current input symbol. On the basis of this entry (Shift, Reduce, Error, or Accept) do one of the following four actions:

Shift: Add a new node, labeled with the current input symbol, to the forest. Associate the state

    goto(current state, input)

to this node and make this state the new current state. Advance the input cursor to read the next character. Repeat the step labeled Parsing Action.

Reduce: If the indicated production is

$$A \to X_1 X_2 \cdots X_n$$

add a new node labeled A to the forest, and make the rightmost n roots, n ≥ 0, direct descendants of this new node. Remove the states associated with these roots. If s is the state which is now rightmost in the forest (on the root immediately to the left of the new node), then associate the state

    goto(s, A)

with the new node. Make this state the new current state. (Notice that the input character is not changed.) Repeat the step labeled Parsing Action.

Accept: Halt. A complete derivation tree has been constructed.

Error: Some error has occurred in the input string. Announce error, and then try to resume parsing by recovering from the error. (This topic is discussed in Section 9.)

To see how an LR parser works, let us again parse the input string 'a,b' using the parsing action function pa (Figure 1) and the goto function (Figure 2).

Initialization: We place state 0 into the forest; 0 becomes the current state.

Parsing Action 1: pa(0, 'a') = shift. We create a new root labeled 'a' and attach state 3 to it (because goto(0, 'a') = 3). We have:

    (0)   (3)
          'a'
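The algorithm summarized above, driven by the tables of Fig. 1 and Fig. 2, can be sketched as a short Python program. This is an illustration under stated assumptions, not the authors' code: the names `PRODUCTIONS`, `PA`, `GOTO` and `parse` are hypothetical, only the states are stacked (no tree is built), missing action entries are treated as error, and an error stops the parse instead of attempting recovery.

```python
from typing import Dict, List, Tuple, Union

# Productions of G1, numbered as in Fig. 1: (left side, length of right side).
PRODUCTIONS: Dict[int, Tuple[str, int]] = {
    1: ("LIST", 3),      # LIST -> LIST ',' ELEMENT
    2: ("LIST", 1),      # LIST -> ELEMENT
    3: ("ELEMENT", 1),   # ELEMENT -> 'a'
    4: ("ELEMENT", 1),   # ELEMENT -> 'b'
}

# Parsing action table pa(s, c) of Fig. 1; an integer n means "reduce by n".
PA: Dict[Tuple[int, str], Union[str, int]] = {
    (0, "a"): "shift", (0, "b"): "shift",
    (1, ","): "shift", (1, "$"): "accept",
    (2, ","): 2, (2, "$"): 2,
    (3, ","): 3, (3, "$"): 3,
    (4, ","): 4, (4, "$"): 4,
    (5, "a"): "shift", (5, "b"): "shift",
    (6, ","): 1, (6, "$"): 1,
}

# Goto table goto(s, c) of Fig. 2; blank entries are omitted.
GOTO: Dict[Tuple[int, str], int] = {
    (0, "LIST"): 1, (0, "ELEMENT"): 2, (0, "a"): 3, (0, "b"): 4,
    (1, ","): 5,
    (5, "ELEMENT"): 6, (5, "a"): 3, (5, "b"): 4,
}


def parse(text: str) -> Tuple[bool, List[str]]:
    """Return (accepted, trace); trace lists each action that was taken."""
    tokens = list(text) + ["$"]   # append the right endmarker
    pos = 0
    states = [0]                  # initial state; top of the stack is on the right
    trace: List[str] = []
    while True:
        action = PA.get((states[-1], tokens[pos]), "error")
        if action == "shift":
            states.append(GOTO[(states[-1], tokens[pos])])
            trace.append(f"shift {tokens[pos]}")
            pos += 1
        elif action == "accept":
            trace.append("accept")
            return True, trace
        elif action == "error":
            trace.append(f"error at position {pos}")
            return False, trace
        else:                     # reduce by production number `action`
            left, length = PRODUCTIONS[action]
            del states[len(states) - length:]
            states.append(GOTO[(states[-1], left)])
            trace.append(f"reduce {action}")
```

Calling `parse("a,b")` yields the same sequence of shift, reduce, and accept actions as the trace worked through in this section, and strings such as 'a,ba' or 'a,,b' stop with an error at the first symbol that has no action entry.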


Parsing Action 2: pa(3, ',') = reduce 3. We reduce by production (3)

    ELEMENT → 'a'

We examine the state immediately to the left; this is state 0. Since goto(0, ELEMENT) = 2, we label the new root with 2. We now have:

    (0)   (2)
        ELEMENT
           |
          'a'

Parsing Action 3: pa(2, ',') = reduce 2. We reduce by production (2)

    LIST → ELEMENT

goto(0, LIST) = 1, so the new state is 1.

Parsing Action 4: pa(1, ',') = shift. We shift and attach state 5.

Parsing Action 5: pa(5, 'b') = shift. We shift and attach state 4. We now have

    (0)   (1)      (5)   (4)
          LIST     ','   'b'
            |
         ELEMENT
            |
           'a'

Parsing Action 6: pa(4, '$') = reduce 4. We reduce by production (4)

    ELEMENT → 'b'

goto(5, ELEMENT) = 6, so the new state is 6. We now have

    (0)   (1)      (5)     (6)
          LIST     ','   ELEMENT
            |               |
         ELEMENT           'b'
            |
           'a'

Parsing Action 7: pa(6, '$') = reduce 1. We reduce by production (1)

    LIST → LIST ',' ELEMENT

The state to the left of the newly created tree is state 0, so the new state is goto(0, LIST) = 1.

Parsing Action 8: pa(1, '$') = accept. We halt and terminate the parse.

The reader is urged to follow this procedure with another string, such as 'a,b,a' to verify his understanding of this process. It is also suggested that he try a string which is not in L(G1), such as 'a,ba' or 'a,,b', to see how the error detection mechanism works. Note that the grammar symbols on the roots of the forest, concatenated from left to right, always form a viable prefix.

Properly constructed LR(1) parsers can parse a large class of useful languages called the deterministic context-free languages. These parsers have a number of notable properties:

(1) They report error as soon as possible (scanning the input from left to right).
(2) They parse a string in a time which is proportional to the length of the string.
(3) They require no rescanning of previously scanned input (backtracking).
(4) The parsers can be generated mechanically for a wide class of grammars, including all grammars which can be parsed by recursive descent with no backtracking [Knuth (1971)] and those grammars parsable by operator precedence techniques [Floyd (1963)].

The reader may have noticed that the states can be stored on a pushdown stack, since only the rightmost state is ever used at any stage in the parsing process. In a shift move, we stack the new state. In a reduce move, we replace a string of states on top of the stack by the new state.

For example, in parsing the input string 'a,b' the stack would appear as follows at each of the actions referred to above. (The top of the stack is on the right.)

    Action     Stack      Input
    Initial    0          'a,b$'
    1          0 3        ',b$'
    2          0 2        ',b$'
    3          0 1        ',b$'
    4          0 1 5      'b$'


    5          0 1 5 4    '$'
    6          0 1 5 6    '$'
    7          0 1        '$'
    8          0 1        '$'

Thus, the parser control is independent of the trees, and depends only on a stack of states. In practice, we may not need to construct the derivation tree explicitly, if the translation being performed is sufficiently simple. For example, in Section 10, we mention a class of useful translations that can be performed by an LR parser without requiring the forest to be maintained.

If we wish to build the derivation tree, we can easily do so by stacking, along with each state, the root of the tree associated with that state.

5. REPRESENTING THE PARSING ACTION AND GOTO TABLES

Storing the full action and goto tables straightforwardly as matrices is extremely wasteful of space for large parsers. For example, the goto table is typically nearly all blank. In this section we discuss some simple ways of compacting these tables which lead to substantial savings of space; in effect, we are merely representing a sparse matrix more compactly, using a particular encoding.

Let us begin with the shift actions. If x is a terminal symbol and s is a state, the parsing action on x in state s is shift if and only if goto(s, x) is nonblank. We will encode the goto into the shift action, using the notation

    shift 17

as a shorthand for "shift and attach state 17 to the new node." By encoding the gotos on terminal symbols as part of the action table, we need only consider the gotos on nonterminal symbols. We will encode them by columns; i.e., by nonterminal symbol name. If, on a nonterminal symbol A, there are nonblank entries in the goto table corresponding to states $s_1, s_2, \ldots, s_n$, and we have

$$s_i' = \mathrm{goto}(s_i, A), \quad \text{for } i = 1, \ldots, n$$

then we shall encode the column for A in a pseudo-programming language:

    A: if (state = s1) goto = s1'
       ...
       if (state = sn) goto = sn'

The goto table of G1 would be represented in this format as:

    LIST:    if (state = 0) goto = 1
    ELEMENT: if (state = 0) goto = 2
             if (state = 5) goto = 6

It turns out that [Aho and Ullman (1972b)] whenever we do a goto on A, the state will always be one of $s_1, \ldots, s_n$, even if the input string is in error. Thus, one of these branches will always be taken. We shall return to this point later in this section.

We shall encode parsing actions in the same spirit, but by rows of the table. The parsing actions for a state s will also be represented by a sequence of pseudo-programming language statements. If the input symbols $a_1, \ldots, a_n$ have the associated actions $action_1, \ldots, action_n$, then we will write:

    s: if (input = a1) action1
       ...
       if (input = an) actionn

As we mentioned earlier, we shall attach goto(s, $a_i$) onto the action if $action_i$ is shift. Similarly, if we have a reduction by the production $A \to \alpha$, we will usually write

    reduce by A → α

as the action.

For example, the parsing actions for state 1 in the parser for G1 are represented by:

    1: if (input = 'a') error
       if (input = 'b') error
       if (input = ',') shift 5
       if (input = '$') accept

At first glance this is no saving over the table, since the parsing action table is usually nearly full. We may make a large saving, however, by introducing the notion of a default action in the statements. A default action is simply a parsing action which is done irrespective of the input character; there may be at most one of these in each state, and it will be written last. Thus, in state 1 we have two error actions, a shift


action, and an accept action; we shall make the error action the default. We will write:

    1: if (input = ',') shift 5
       if (input = '$') accept
       error

There is an additional saving which is possible. Suppose a state has both error and reduce entries. Then we may replace all error entries in that state by one of the reduce entries. The resulting parser may make a sequence of reductions where the original parser announced error but the new parser will announce error before shifting the next input symbol. Thus both parsers announce error at the same position in the input, but the new parser may take slightly longer before doing so.

There is a benefit to be had from this modification; the new parsing action table will require less space than the original. For example, state 2 of the parsing action table for G1 would originally be represented by:

    2: if (input = 'a') error
       if (input = 'b') error
       if (input = ',') reduce 2
       if (input = '$') reduce 2

Applying this transformation, state 2 would be simply represented as:

    2: reduce 2

Thus in a state with reduce actions, we will always have the shift and accept actions precede the reduce actions. One of the reduce actions will become a default action, and we will ignore the error entries. In a state without reduce actions, the default action will be error. We shall discuss other means of cutting down on the size of a parser in Section 8.

6. CONSTRUCTION OF A PARSER FROM A GRAMMAR

How do we construct the parsing action and goto tables of an LR(1) parser for a given grammar? In this section we outline a method that works for a large class of grammars called the lookahead LR(1) (LALR(1)) grammars.

The behavior of an LR parser, as described in the last section, is dictated by the current state. This state reflects the progress of the parse, i.e., it summarizes information about the input string read to this point so that parsing decisions can be made.

Another way to view a state is to consider the state as a representative of an equivalence class of viable prefixes. At every stage of the parsing process, the string formed by concatenating the grammar symbols on the roots of the existing subtrees must be a viable prefix; the current state is the representative of the class containing that viable prefix.

6.1 Sets of Items

In the same way that we needed to discuss partially built trees when talking about parsing, we will need to talk about "partially recognized productions" when we talk about building parsers. We introduce the notion of item* to deal with this concept. An item is simply a production with a dot (.) placed somewhere in the right-hand side (possibly at either end). For example,

    [LIST → LIST . ',' ELEMENT]
    [ELEMENT → . 'a']

are both items of G1.

We enclose items in square brackets to distinguish them more clearly from productions.

* Some authors have used the term "configuration" for item.

Intuitively, a set of items can be used to represent a stage in the parsing process; for example, the item

$$[A \to \alpha \,.\, \beta]$$

indicates that an input string derivable from $\alpha$ has just been seen, and, if we next see an input string derivable from $\beta$, we may be able to reduce by the production $A \to \alpha\beta$.

Suppose the portion of the input that we have seen to this point has been reduced to the viable prefix $\gamma\alpha$. Then the item $[A \to \alpha \,.\, \beta]$ is said to be valid for $\gamma\alpha$ if $\gamma A$ is also a viable prefix. In general, more than one item is valid for a given viable prefix; the set of all items which are valid at a particular


stage of the parse corresponds to the current state of the parser.

As an example, let us examine the viable prefix

    LIST ','

in G1. The item

    [LIST → LIST ',' . ELEMENT]

is valid for this prefix, since, setting γ to the empty string and α to LIST ',' in the definition above, we see that γLIST (which is just LIST) is a viable prefix. In other words, when this item is valid, we have seen a portion of the input that can be reduced to the viable prefix, and we expect to see next a portion of the input that can be reduced to ELEMENT.

The item

    [LIST → . ELEMENT]

is not valid for this prefix, however, since setting γ to LIST ',' and α to the empty string we obtain

    LIST ',' LIST

which is not a viable prefix.

The reader can (and should) verify that the state corresponding to the viable prefix LIST ',' is associated with the set of items:

    [LIST → LIST ',' . ELEMENT]
    [ELEMENT → . 'a']
    [ELEMENT → . 'b']

If γ is a viable prefix, we shall use V(γ) to denote the set of items that are valid for γ. If γ is not a viable prefix, V(γ) will be empty. We shall associate a state of the parser with each set of valid items and construct the entries in the parsing action table for that state from the set of items. There is a finite number of productions, thus only a finite number of items, and thus a finite number of possible states associated with every grammar G.

6.2 Constructing the Collection of Accessible Sets of Items

We shall now describe a constructive procedure for generating all of the states and, at the same time, generating the parsing action and goto table. As a running example, we shall construct parsing action and goto tables for G1.

First, we augment the grammar with the production

    ACCEPT → LIST

where in general LIST would be the start symbol of the grammar (here G1). A reduction by this production corresponds to the accept action by the parser.

Next we construct I0 = V(''), the set of items valid for the viable prefix consisting of the empty string. By definition, for G1 this set must contain the item

    [ACCEPT → . LIST]

The dot in front of the nonterminal LIST means that, at this point, we can expect to find as the remaining input any sentence derivable from LIST. Thus, I0 must also contain the two items

    [LIST → . LIST ',' ELEMENT]
    [LIST → . ELEMENT]

obtained from the two productions for the nonterminal LIST. The second of the items has a dot in front of the nonterminal ELEMENT, so we should also add to the initial state the items

    [ELEMENT → . 'a']
    [ELEMENT → . 'b']

corresponding to the two productions for ELEMENT. These five items constitute I0. We shall associate state 0 with I0.

Now suppose that we have computed V(γ), the set of items which are valid for some viable prefix γ. Let X be a terminal or nonterminal symbol. We compute V(γX) from V(γ) as follows:

(1) For each item of the form [A → α . Xβ] in V(γ), we add to V(γX) the item [A → αX . β].
(2) We compute the closure of the set of items in V(γX); that is, for each item of the form [B → α . Cβ] in V(γX), where C is a nonterminal symbol, we add to V(γX) the items

    [C → . α1]
    ...
    [C → . αn]


where C → α1, ..., C → αn are all the productions in G with C on the left side. If one of these items is already in V(γX) we do not duplicate this item. We continue to apply this process until no new items can be added to V(γX).

It can be shown that steps (1) and (2) compute exactly the items that are valid for γX [Aho and Ullman (1972a)].

For example, let us compute I1 = V(LIST), the set of items that are valid for the viable prefix LIST. We apply the above construction with γ = '' and X = LIST, and use the five items in I0.

In step (1) of the above construction, we add the items

    [ACCEPT → LIST .]
    [LIST → LIST . ',' ELEMENT]

to I1. Since no item in I1 has a nonterminal symbol immediately to the right of the dot, the closure operation adds no new items to I1. The reader should verify that these two items are the only items valid for the viable prefix LIST. We shall associate state 1 with I1.

Notice that the above construction is completely independent of γ; it needs only the items in V(γ), and X. For every set of items I and every grammar symbol X the above construction builds a new set of items which we shall call GOTO(I, X); this is essentially the same goto function encountered in the last two sections. Thus, in our example, we have computed

    GOTO(I0, LIST) = I1

We can extend this GOTO function to strings of grammar symbols as follows:

    GOTO(I, '') = I
    GOTO(I, γX) = GOTO(GOTO(I, γ), X)

where γ is a string of grammar symbols and X is a nonterminal or terminal symbol. If I = V(α), then I = GOTO(I0, α). Thus GOTO(I0, α) ≠ ∅ if and only if α is a viable prefix, where I0 = V('').

The sets of items which can be obtained from I0 by GOTO's are called the accessible sets of items. We build up the set of accessible sets of items by computing GOTO(I, X) for all accessible sets of items I and grammar symbols X; whenever the GOTO construction comes up with a new nonempty set of items, this set of items is added to the set of accessible sets of items and the process continues. Since the number of sets of items is finite, the process eventually terminates.

The order in which the sets of items are computed does not matter, nor does the name given to each set of items. We will name the sets of items I0, I1, I2, ... in the order in which we create them. We shall then associate state i with I_i.

Let us return to G1. We have computed I0, which contained the items

    [ACCEPT → . LIST]
    [LIST → . LIST ',' ELEMENT]
    [LIST → . ELEMENT]
    [ELEMENT → . 'a']
    [ELEMENT → . 'b']

We now wish to compute GOTO(I0, X) for all grammar symbols X. We have already computed

    GOTO(I0, LIST) = I1

To determine GOTO(I0, ELEMENT), we look for all items in I0 with a dot immediately before ELEMENT. We then take these items and move the dot to the right of ELEMENT. We obtain the single item

    [LIST → ELEMENT .]

The closure operation yields no new items since this item has no nonterminal to the right of the dot. We call the set with this item I2. Continuing in this fashion we find that:

    GOTO(I0, 'a') contains only [ELEMENT → 'a' .]
    GOTO(I0, 'b') contains only [ELEMENT → 'b' .]

and GOTO(I0, ',') and GOTO(I0, '$') are empty. Let us call the two nonempty sets I3 and I4. We have now computed all sets of items that are directly accessible from I0.

We now compute all sets of items that are accessible from the sets of items just computed. We continue computing accessible sets of items until no more new sets of items


are found. The following table shows the collection of accessible sets of items for G1:

    I0: [ACCEPT → . LIST]
        [LIST → . LIST ',' ELEMENT]
        [LIST → . ELEMENT]
        [ELEMENT → . 'a']
        [ELEMENT → . 'b']

    I1: [ACCEPT → LIST .]
        [LIST → LIST . ',' ELEMENT]

    I2: [LIST → ELEMENT .]

    I3: [ELEMENT → 'a' .]

    I4: [ELEMENT → 'b' .]

    I5: [LIST → LIST ',' . ELEMENT]
        [ELEMENT → . 'a']
        [ELEMENT → . 'b']

    I6: [LIST → LIST ',' ELEMENT .]

The GOTO function on this collection can be portrayed as a directed graph in which the nodes are labeled by the sets of items and the edges by grammar symbols, as follows:

[Figure: the GOTO graph for G1, with nodes 0 through 6 and edges labeled by grammar symbols]

Here, we used i in place of I_i. For example, we observe

    GOTO(0, '') = 0
    GOTO(0, LIST ',') = 5
    GOTO(0, LIST ',' ELEMENT) = 6

There is a path from node 0 to a given node if and only if that path spells out a viable prefix. Thus, GOTO(0, 'ab') is empty, since 'ab' is not a viable prefix.

6.3 Constructing the Parsing Action and Goto Tables from the Collection of Sets of Items

The parsing action table is constructed from the collection of accessible sets of items. From the items in each set of items I_s, we generate parsing actions. An item of the form

    [A → α . 'a' β]

in I_s generates the parsing action

    if(input = 'a') shift t

where GOTO(I_s, 'a') = I_t.

An item with the dot at the right end of the production is called a completed item. A completed item [A → α .] indicates that we may reduce by production A → α. However, with an LR(1) parser we must determine on what input symbols this reduction is possible. If 'a1', 'a2', ..., 'an' are these symbols and 'a1', 'a2', ..., 'an' are not associated with shift or accept actions, then we would generate the sequence of parsing actions:

    if(input = 'a1') reduce by: A → α
    if(input = 'a2') reduce by: A → α
    ...
    if(input = 'an') reduce by: A → α

If a set of items contains only one completed item, we can replace this sequence of parsing actions by the default reduce action

    reduce by: A → α

This parsing action is placed after all shift and accept actions generated by this set of items.

If a set of items contains more than one completed item, then we must generate conditional reduce actions for all completed items except one. In a while we shall explain how to compute the set of input symbols on which a given reduction is permissible.

If a completed item is of the form

    [ACCEPT → S .]

then we generate the accept action

    if(input = '$') accept


where '$' is the right endmarker for the input string.

Finally, if a set of items generates no reduce action, we generate the default error statement. This statement is placed after all shift and accept actions generated from the set of items.

Returning to our example for G1, from I0 we would generate the parsing actions:

    if(input = 'a') shift 3
    if(input = 'b') shift 4
    error

Notice that these are exactly the same parsing actions as those for state 0 in the parser of Section 4. Similarly, I3 generates the action

    reduce by: ELEMENT → 'a'

The goto table is used to compute the new state after a reduction. For example, when the reduction in state 3 is performed we always have state 0 to the left of 'a'. The new state is determined by simply noting that

    GOTO(I0, ELEMENT) = I2

This gives rise to the code

    if(state = 0) goto = 2

for ELEMENT in the goto table.

In general, if nonterminal A has precisely the following GOTO's in the GOTO graph:

    GOTO(I_s1, A) = I_t1
    GOTO(I_s2, A) = I_t2
    ...
    GOTO(I_sn, A) = I_tn

then we would generate the following representation for column A of the goto table:

    A: if(state = s1) goto = t1
       if(state = s2) goto = t2
       ...
       if(state = sn) goto = tn

Thus, the goto table is simply a representation of the GOTO function of the last section, applied to the nonterminal symbols.

We must now determine the input symbols on which each reduction is applicable. This will enable us to detect ambiguities and difficult-to-parse constructs in the grammar, and to decide between reductions if more than one is possible in a given state. In general, this is a complex task; the most general solution of this problem was given by [Knuth (1965)], but his algorithm suffers from large time and memory requirements. Several simplifications have been proposed, notably by [DeRemer (1969 and 1971)], which lack the full generality of Knuth's technique, but can construct practical parsers in reasonable time for a large class of languages. We shall describe an algorithm that is a simplification of Knuth's algorithm which resolves all conflicts that can be resolved when the parser has the states as given above.

6.4 Computing Lookahead Sets

Suppose [A → α . β] is an item that is valid for some viable prefix γα. We say that input symbol 'a' is applicable for [A → α . β] if, for some string of terminals 'w', both γαβ'aw' and γA'aw' are right sentential forms. The right endmarker '$' is applicable for [A → α . β] if both γαβ and γA are right sentential forms.

This definition has a simple intuitive explanation when we consider completed items. Suppose input symbol 'a' is applicable for completed item [A → α .]. If an LR(1) parser makes the reduction specified by this item on the applicable input symbol 'a', then the parser will be able to make at least one more shift move without encountering an error.

The set of symbols that are applicable for each item will be called the lookahead set for that item. From now on we shall include the lookahead set as part of an item. The production with the dot somewhere in the right side will be called the core of the item. For example,

    ([ELEMENT → 'a' .], {',', '$'})

is an item of G1 with core

    [ELEMENT → 'a' .]

and lookahead set {',', '$'}.

We shall now describe an algorithm that will compute the sets of valid items for a grammar where the items include their


lookahead sets. Recall that in the last section items in a set of items arose in two ways: by goto calculations, and then by the closure operation. The first type of calculation is very simple; if we have an item of the form

    ([A → α . Xβ], L)

where X is a grammar symbol and L is a lookahead set, then when we perform the goto operation on X on this item, we obtain the item

    ([A → αX . β], L)

(i.e., the lookahead set is unchanged).

It is somewhat harder to compute the lookahead sets in the closure operation. Suppose there is an item of the form

    ([A → α . Bβ], L)

in a set of items, where B is a nonterminal symbol. We must add items of the form

    ([B → . δ], L')

where B → δ is some production in the grammar. The new lookahead set L' will contain all terminal symbols which are the first symbol of some sentence derivable from any string of the form β 'a', where 'a' is a symbol in L.

If, in the course of carrying out this construction, a set of items is seen to contain items with the same core; e.g.,

    ([A → α . β], L1)
and ([A → α . β], L2)

then these items are merged to create a single item; e.g., ([A → α . β], L1 ∪ L2).

We shall now describe the algorithm for constructing the collection of sets of items in more detail by constructing the valid sets of items for grammar G1. Initially, we construct I0 by starting with the single item

    ([ACCEPT → . LIST], {'$'})

We then compute the closure of this set of items. The two productions for LIST give rise to the two items

    ([LIST → . LIST ',' ELEMENT], {'$'})
and ([LIST → . ELEMENT], {'$'})

The first of these two items gives rise, through the closure operation, to two additional items

    ([LIST → . LIST ',' ELEMENT], {','})
and ([LIST → . ELEMENT], {','})

since the first terminal symbol of any string derivable from

    ',' ELEMENT '$'

is always ','. Since all items with the same core are merged into a single item with the same core and the union of the lookahead sets, we currently have the following items in I0:

    ([ACCEPT → . LIST], {'$'})
    ([LIST → . LIST ',' ELEMENT], {',', '$'})
    ([LIST → . ELEMENT], {',', '$'})

The first two of these items no longer give rise to any new items when the closure operation is applied. The third item gives rise to the two new items:

    ([ELEMENT → . 'a'], {',', '$'})
    ([ELEMENT → . 'b'], {',', '$'})

and these five items make up I0.

We shall now compute

    I3 = GOTO(I0, 'a').

First we add the item

    ([ELEMENT → 'a' .], {',', '$'})

to I3, since 'a' appears to the right of the dot of one item in I0. The closure operation adds no new items to I3.

I3 contains a completed item. The lookahead set {',', '$'} tells us on which input symbols the reduction is applicable.

The reader should verify that the complete collection of sets of items for G1 is:

    I0: [ACCEPT → . LIST], {'$'}
        [LIST → . LIST ',' ELEMENT], {',', '$'}
        [LIST → . ELEMENT], {',', '$'}
        [ELEMENT → . 'a'], {',', '$'}
        [ELEMENT → . 'b'], {',', '$'}

    I1: [ACCEPT → LIST .], {'$'}
        [LIST → LIST . ',' ELEMENT], {',', '$'}

    I2: [LIST → ELEMENT .], {',', '$'}

    I3: [ELEMENT → 'a' .], {',', '$'}


    I4: [ELEMENT → 'b' .], {',', '$'}

    I5: [LIST → LIST ',' . ELEMENT], {',', '$'}
        [ELEMENT → . 'a'], {',', '$'}
        [ELEMENT → . 'b'], {',', '$'}

    I6: [LIST → LIST ',' ELEMENT .], {',', '$'}

Although the situation does not occur here, if we generate a set of items I_t such that I_t has the same set of cores as some other set of items I_s already generated, but I_s ≠ I_t, then we combine I_s and I_t into a new set of items I by merging the lookahead sets of items with the same cores. We must then compute GOTO(I, X) for all grammar symbols X.

The lookahead sets on the completed items give the terminal symbols for which the reductions should be performed. There is a possibility that there are ambiguities in the grammar, or the grammar is too complex to allow a parser to be constructed by this technique; this causes conflicts to be discovered in the actions of the parser. For example, suppose there is a set of items I_s in which 'a' gives rise to the parsing action shift because GOTO(I_s, 'a') exists. Suppose also that there is a completed item

    ([A → α .], L)

in I_s, and that the terminal symbol 'a' is in the lookahead set L. Then we have no way of knowing which action is correct in state s when we see an 'a'; we may shift 'a', or we may reduce by A → α. Our only recourse is to report a shift-reduce conflict.

In the same way, if there are two reductions possible in a state because two completed items contain the same terminal symbol in their lookahead sets, then we cannot tell which reduction we should do; we must report a reduce-reduce conflict.

Instead of reporting a conflict we may attempt to proceed by carrying out all conflicting parsing actions, either by parallel simulation [Earley (1970)] or by backtracking [Pager (1972b)].

A set of items is consistent or adequate if it does not generate any shift-reduce or reduce-reduce conflicts. A collection of sets of items is valid if all its sets of items are consistent; our collection of sets of items for G1 is valid.

We summarize the parsing action and goto table construction process:

(1) Given a grammar G, augment the grammar with a new initial production

        ACCEPT → S

    where S is the start symbol of G.
(2) Let I be the set with the one item

        ([ACCEPT → . S], {'$'})

    Let I0 be the closure of I.
(3) Let C, the current collection of accessible sets of items, initially contain only I0.
(4) For each I in C, and for each grammar symbol X, compute I' = GOTO(I, X). Three cases can occur:
    a. I' = I'' for some I'' already in C. In this case, do nothing.
    b. If the set of cores of I' is distinct from the set of cores of any set of items already in C, then add I' to C.
    c. If the set of cores of I' is the same as the set of cores of some I'' already in C but I' ≠ I'', then let I''' be the set of items

           ([A → α . β], L1 ∪ L2)

       such that

           ([A → α . β], L1) is in I' and
           ([A → α . β], L2) is in I''.

       Replace I'' by I''' in C.
(5) Repeat step 4 until no new sets of items can be added to C. C is called the LALR(1) collection of sets of items for G.
(6) From C try to construct the parsing action and goto tables as in Section 6.3.

If this technique succeeds in producing a collection of sets of items for a given grammar in which all sets of items are consistent, then that grammar is said to be an LALR(1) grammar. LALR(1) grammars include many important classes of grammars, including the LL(1) grammars [Lewis and Stearns (1968)], the simple mixed strategy precedence grammars [McKeeman, Horning, and Wortman (1970)], and those parsable by operator precedence techniques. Techniques


[Aho and Ullman (1972a and 1973a)].

Step (4) can be rather time-consuming to implement. A simpler, but less general, approach would be to proceed as follows. Let FOLLOW(A) be the set of terminal symbols that can follow nonterminal symbol A in a sentential form. If A can be the rightmost symbol of a sentential form, then '$' is included in FOLLOW(A). We can compute the sets of items without lookaheads as in Section 6.2. Then in each completed item [A → α .] we can approximate the lookahead set L for this item by FOLLOW(A). (In general, L is a subset of FOLLOW(A).) The resulting collection of sets of items is called the SLR(1) collection. If all sets of items in the SLR(1) collection are consistent, then the grammar is said to be simple LR(1) [DeRemer (1971)]. Although not every LALR(1) grammar is simple LR(1), every language generated by an LALR(1) grammar is also generated by a simple LR(1) grammar ([Aho and Ullman (1973a)] contains more details).

7. PARSING AMBIGUOUS GRAMMARS

It is undesirable to have undetected ambiguities in the definition of a programming language. However, an ambiguous grammar can often be used to specify certain language constructs more easily than an equivalent unambiguous grammar. We shall also see that we can construct more efficient parsers directly from certain ambiguous grammars than from equivalent unambiguous grammars.

If we attempt to construct a parser for an ambiguous grammar, the LALR(1) parser construction technique will generate at least one inconsistent set of items. Thus, the parser generation technique can be used to determine that a grammar is unambiguous. That is to say, if no inconsistent sets of items are generated, the grammar is guaranteed to be unambiguous. However, if an inconsistent set of items is produced, then all we can conclude is that the grammar is not LALR(1). The grammar may or may not be ambiguous. (There is no general algorithm to determine whether a context-free grammar is ambiguous (see, for example [Aho and Ullman (1972a)]).)

Inconsistent sets of items are useful in pinpointing difficult-to-parse or ambiguous constructions in a given grammar. For example, a production of the form

    A → A A

in any grammar will make that grammar ambiguous and cause a parsing action conflict to arise from sets of items containing the items with the cores

    [A → A A .]
    [A → A . A]

Constructions which are sufficiently complex to require more than one symbol of lookahead also result in parsing action conflicts. For example, the grammar

    S → A 'a'
    A → 'a' | ''

is an LALR(2) but not LALR(1) grammar.

Experience with an LALR(1) parser generator called YACC at Bell Laboratories has shown that a few iterations with the parser generator are usually sufficient to resolve the conflicts in an LALR(1) collection of sets of items for a reasonable programming language.

Example 7.1: Consider the following productions for "if-then" and "if-then-else" statements:

    S → 'if b then' S
    S → 'if b then' S 'else' S

If these two productions appear in a grammar, then that grammar will be ambiguous; the string

    'if b then if b then' S 'else' S

can be parsed in two ways as shown:


[Figure: the two parse trees for this string]

In most programming languages, the first phrasing is preferred. That is, each new 'else' is to be associated with the closest "unelsed" 'then'.

A grammar using these ambiguous productions to specify if-then-else statements will be smaller and, we feel, easier to comprehend than an equivalent unambiguous grammar. In addition if a grammar has only ambiguities of this type, then we can construct a valid LALR(1) parser for the grammar merely by resolving each shift-reduce conflict in favor of shift [Aho, Johnson, and Ullman (1973)].

Example 7.2: Consider the ambiguous grammar*

    S → 'if b then' S
    S → 'if b then' S 'else' S
    S → 'a'

in which each else is to be associated with the last unelsed 'then'. The LALR(1) collection of sets of items for this grammar is as follows:

    I0: [ACCEPT → . S], {'$'}
        [S → . 'if b then' S], {'$'}
        [S → . 'if b then' S 'else' S], {'$'}
        [S → . 'a'], {'$'}

    I1: [ACCEPT → S .], {'$'}

    I2: [S → 'if b then' . S], {'else', '$'}
        [S → 'if b then' . S 'else' S], {'else', '$'}
        [S → . 'if b then' S], {'else', '$'}
        [S → . 'if b then' S 'else' S], {'else', '$'}
        [S → . 'a'], {'else', '$'}

    I3: [S → 'a' .], {'else', '$'}

    I4: [S → 'if b then' S .], {'else', '$'}
        [S → 'if b then' S . 'else' S], {'else', '$'}

    I5: [S → 'if b then' S 'else' . S], {'else', '$'}
        [S → . 'if b then' S], {'else', '$'}
        [S → . 'if b then' S 'else' S], {'else', '$'}
        [S → . 'a'], {'else', '$'}

    I6: [S → 'if b then' S 'else' S .], {'else', '$'}

* The following grammar is an equivalent unambiguous grammar:

    S → 'if b then' S
    S → 'if b then' S1 'else' S
    S → 'a'
    S1 → 'if b then' S1 'else' S1
    S1 → 'a'

On the input 'else', I4 says that either a shift move to I5 is permissible, or a reduction by production

    S → 'if b then' S

is possible. If we choose to shift, we shall associate the incoming 'else' with the last unelsed 'then'. This is evident because the item with the core

    [S → 'if b then' S . 'else' S]

in I4 gives rise to the shift action.

The complete parsing action table, with the conflict resolved, and the goto table constructed from this collection of sets of items are shown below:

Parsing Action Table

    0: if(input = 'if b then') shift 2
       if(input = 'a') shift 3
       error
    1: if(input = '$') accept
       error
    2: if(input = 'if b then') shift 2
       if(input = 'a') shift 3
       error
    3: reduce by: S → 'a'
    4: if(input = 'else') shift 5
       reduce by: S → 'if b then' S
    5: if(input = 'if b then') shift 2
       if(input = 'a') shift 3
       error
    6: reduce by: S → 'if b then' S 'else' S

Goto Table

    S: if(state = 0) goto = 1
       if(state = 2) goto = 4
       goto = 6

Given an ambiguous grammar, with the appropriate rules for resolving the ambiguities we can often directly produce a smaller parser from the ambiguous grammar than from the equivalent unambiguous grammar.
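The tables of Example 7.2 can be exercised with a small table-driven parser. The following Python fragment is only a sketch under stated assumptions: the names ACTION, DEFAULT_REDUCE, GOTO_S and parse are hypothetical, 'if b then' is treated as a single input token, and the goto entry for state 0 is inferred from the accept action in state 1. It records the reductions in the order they are made, so one can see which 'then' an 'else' is attached to.

```python
# Conditional actions of the parsing action table for Example 7.2.
ACTION = {
    0: {"if b then": ("shift", 2), "a": ("shift", 3)},
    1: {"$": ("accept",)},
    2: {"if b then": ("shift", 2), "a": ("shift", 3)},
    4: {"else": ("shift", 5)},
    5: {"if b then": ("shift", 2), "a": ("shift", 3)},
}
# Default reduce actions: state -> (length of right side, production).
DEFAULT_REDUCE = {
    3: (1, "S -> 'a'"),
    4: (2, "S -> 'if b then' S"),
    6: (4, "S -> 'if b then' S 'else' S"),
}
# Goto column for S (state 0 -> 1 is an inferred entry).
GOTO_S = {0: 1, 2: 4, 5: 6}


def parse(tokens):
    """Return the list of reductions made while parsing the tokens."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # stack of parser states
    pos = 0
    reductions = []
    while True:
        state = stack[-1]
        act = ACTION.get(state, {}).get(tokens[pos])
        if act is None:
            if state not in DEFAULT_REDUCE:
                raise SyntaxError(f"error in state {state} on {tokens[pos]!r}")
            length, production = DEFAULT_REDUCE[state]
            del stack[-length:]              # pop the right side
            stack.append(GOTO_S[stack[-1]])  # goto on S
            reductions.append(production)
        elif act[0] == "shift":
            stack.append(act[1])
            pos += 1
        else:                                # accept
            return reductions


trace = parse(["if b then", "if b then", "a", "else", "a"])
```

In this trace the if-then-else production is reduced before the outer if-then production, which is the grouping in which the 'else' belongs to the nearest 'then'.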


However, some of the "optimizations" discussed in the next section will make the parser for the unambiguous grammar as small as that for the ambiguous grammar.

Example 7.3: Consider the following grammar G3 for arithmetic expressions:

    E → E '+' E
    E → E '*' E
    E → '(' E ')'
    E → 'a'

where 'a' stands for any identifier. Assuming that + and * are both left associative and * has higher precedence than +, there are two things wrong with this grammar. First, it is ambiguous in that the operands of the binary operators '+' and '*' can be associated in any arbitrary way. For example, 'a + a + a' can be parsed as

[Figure: first parse tree for 'a + a + a']

or as

[Figure: second parse tree for 'a + a + a']

The first parsing gives the usual left-to-right associativity, the second a right-to-left associativity.

If we rewrote the grammar as G4:

    E → E '+' T
    E → E '*' T
    E → T
    T → '(' E ')'
    T → 'a'

then we would have eliminated this ambiguity by imposing the normal left-to-right associativity for + and *. However, this new grammar has still one more defect; + and * have the same precedence, so that an expression of the form 'a+a*a' would be evaluated as (a+a)*a. To eliminate this, we must further rewrite the grammar as G5:

    E → E '+' T
    E → T
    T → T '*' F
    T → F
    F → '(' E ')'
    F → 'a'

We can now construct a parser for G5 quite easily, and find that we have 12 states; if we count the number of parsing actions in the parser (i.e., the sum of the number of shift and reduce actions in all states together with the goto actions) we see that the parser for G5 has 35 actions.

In contrast, the parser for G3 has only 10 states, and 29 actions. A considerable part of the saving comes from the elimination of the nonterminals T and F from G5, as well as the elimination of the productions E → T and T → F.
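The difference between G4 and G5 can be seen by building the tree each grammar assigns to 'a+a*a'. The sketch below is an illustration only, not the LR construction of this paper: it is a hand-written parser in which each left-recursive production is unrolled into a loop, and the names Parser, parse, G4_LEVELS and G5_LEVELS are hypothetical. Each entry of a levels list holds the operators introduced at one nonterminal, lowest precedence first.

```python
def tokenize(text):
    """Split an expression over 'a', '+', '*', '(' and ')' into tokens."""
    return [ch for ch in text if not ch.isspace()]


class Parser:
    def __init__(self, tokens, levels):
        self.tokens = tokens
        self.pos = 0
        self.levels = levels  # operator sets, lowest precedence first

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_level(self, i):
        if i == len(self.levels):
            return self.parse_primary()
        left = self.parse_level(i + 1)
        # A production X -> X op Y becomes a loop, giving left associativity.
        while self.peek() in self.levels[i]:
            op = self.tokens[self.pos]
            self.pos += 1
            right = self.parse_level(i + 1)
            left = (left, op, right)
        return left

    def parse_primary(self):
        tok = self.peek()
        if tok == "a":
            self.pos += 1
            return "a"
        if tok == "(":
            self.pos += 1
            inner = self.parse_level(0)
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text, levels):
    parser = Parser(tokenize(text), levels)
    tree = parser.parse_level(0)
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree


G4_LEVELS = [{"+", "*"}]    # G4: + and * both introduced at E
G5_LEVELS = [{"+"}, {"*"}]  # G5: + at E, * at T

g4_tree = parse("a+a*a", G4_LEVELS)  # groups as (a+a)*a
g5_tree = parse("a+a*a", G5_LEVELS)  # groups as a+(a*a)
```

Both level lists keep + and * left associative; only G5_LEVELS gives * the higher precedence.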

Let us discuss the resolution of parsing action conflicts in G3 in somewhat more detail. There are two sets of items in the LALR(1) collection of sets of items for G3 which generate conflicts in their parsing actions:

    [E → E . '+' E], {'+', '*', ')', '$'}
    [E → E . '*' E], {'+', '*', ')', '$'}
    [E → E '+' E .], {'+', '*', ')', '$'}

and

    [E → E . '+' E], {'+', '*', ')', '$'}
    [E → E . '*' E], {'+', '*', ')', '$'}
    [E → E '*' E .], {'+', '*', ')', '$'}


arise on the two terminal symbols ' + ' and

' . ' . For example, in the first set of items on L I S T : if(state = 0) g o t o = 1

an input of ' + ' we m a y generate either a goto = 5

reduce action or a shift action. Since we wish

+ to be left associative, we wish to reduce Notice t h a t we have only 14 parsing ac-

on this input; a shift would have the effect of tions in this parser, compared to the 16

delaying the reduction until more of the which we had in the earlier parser for G1. I n

string had been read, and would imply right addition, the derivation trees produced b y

associativity. On the input symbol '*', how- this parser are smaller since the nodes cor-

ever, if we did the reduction we would end responding to the nonterminal symbol E L E -

up parsing the string ' a + a , a ' as ( a + a ) , a ; M E N T are no longer there. This in turn

t h a t is, we would not give • higher prece- means t h a t the parser makes fewer actions

dence t h a n + . Thus, it is correct to shift on when parsing a given input string. Parsing

this input. Using similar reasoning, we see of ambiguous g r a m m a r s is d~scussed by

t h a t it is always correct to generate a re- [Aho, Johnson, and Ullman (1973)] in more

duce action from the second set of items; on detail.

the input symbol ',' this is a result of the left associativity of ',', while on the input symbol '+' this reflects the precedence relation between + and ','.

We conclude this section with an example of how this reasoning can be applied to our grammar G1. We noted earlier that the grammar G2:

    LIST → LIST ',' LIST
    LIST → 'a'
    LIST → 'b'

is ambiguous, but this ambiguity should no longer be of concern. Assuming that the language designer wants to treat ',' as a left associative operator, then we can produce a parser which is smaller and faster than the parser for G1 produced in the last section. The smaller parser looks like:

Parsing Action Table

    0: if(input = 'a') shift 2
       if(input = 'b') shift 3
       error
    1: if(input = '$') accept
       if(input = ',') shift 4
       error
    2: reduce by: LIST → 'a'
    3: reduce by: LIST → 'b'
    4: if(input = 'a') shift 2
       if(input = 'b') shift 3
       error
    5: reduce by: LIST → LIST ',' LIST

8. OPTIMIZATION OF LR PARSERS

There are a number of ways of reducing the size and increasing the speed of an LR(1) parser without affecting its good error-detecting capability. In this section we shall list a few of many transformations that can be applied to the parsing action and goto tables of an LR(1) parser to reduce their size. The transformations we list are some simple ones that we have found to be effective in practice. Many other transformations are possible and a number of these can be found in the references at the end of this section.

8.1 Merging Identical States

The simplest and most obvious "optimization" is to merge states with common parsing actions. For example, the parsing action table for G1 given in Section 5 contains identical actions in states 0 and 5. Thus, it is natural to represent this in the parser as:

    0: 5: if(input = 'a') shift 3
          if(input = 'b') shift 4
          error

Clearly the behavior of the LR(1) parser using this new parsing action table is the same as that of the LR(1) parser using the old table.
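As a rough illustration of the merging step of Section 8.1, the following Python sketch groups states whose parsing-action rows are identical. The dictionary layout, the name `merge_identical_states`, and the choice of three sample rows (states 0, 1 and 5 of the parser for G1, with error entries left implicit) are assumptions made for this example; they are not the representation used by any particular parser generator.

```python
from collections import defaultdict

# Sample parsing-action rows: state -> {input symbol: action}.
# Any input symbol missing from a row means "error".
ACTIONS = {
    0: {"a": "shift 3", "b": "shift 4"},
    1: {",": "shift 5", "$": "accept"},
    5: {"a": "shift 3", "b": "shift 4"},
}


def merge_identical_states(action_table):
    """Return a list of (states, row) pairs, one per distinct action row."""
    groups = defaultdict(list)
    for state, row in action_table.items():
        # A sorted tuple of entries is a hashable fingerprint of the row.
        groups[tuple(sorted(row.items()))].append(state)
    return [(sorted(states), dict(key)) for key, states in groups.items()]


MERGED = merge_identical_states(ACTIONS)
for states, row in MERGED:
    label = " ".join(f"{s}:" for s in states)
    print(label, row)
```

States 0 and 5 end up sharing a single row, which corresponds to the "0: 5:" labelling in the text. A goto table can be treated the same way, provided every reference to a merged state is kept consistent.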


8.2 Subsuming States

A slight generalization of the transformation in Section 8.1 is to eliminate a state whose parsing actions are a suffix of the actions of another state. We then label the beginning of the suffix by the eliminated state. For example, if we have:

    n: if(input = 'x') shift p
       if(input = 'y') shift q
       error

and

    m: if(input = 'y') shift q
       error

then we may eliminate state m by adding the label into the middle of state n:

    n: if(input = 'x') shift p
    m: if(input = 'y') shift q
       error

Permuting the order of these statements can increase the applicability of this optimization. (See Ichbiah and Morse (1970) for suggestions on the implementation of this optimization.)

8.3 Elimination of Reductions by Single Productions

A single production is one of the form A → X, where A is a nonterminal and X is a grammar symbol. If this production is not of any importance in the translation, then we say that the single production is semantically insignificant. A common situation in which single productions arise occurs when a grammar is used to describe the precedence levels and associativities of operators (see grammar G5 of Example 7.3). We can always cause an LR parser to avoid making these reductions; by doing so we make the LR parser faster, and reduce the number of states. (With some grammars, the size of the "optimized" form of the parsing action table may be greater than the unoptimized one.)

We shall give an example in terms of G1 which contains the single production

    LIST → ELEMENT

We shall eliminate reductions by this production from the parser for G1 found in Section 5. The only state which calls for a reduction by this production is state 2. Moreover, the only way in which we can get to state 2 is by the goto action

    ELEMENT: if(state = 0) goto = 2

After the parser does the reduction in state 2, it immediately refers to the goto action

    LIST: goto = 1

at which time the current state becomes 1. Thus, the rightmost tree is only labeled with state 2 for a short period of time; state 2 represents only a step on the way to state 1. We may eliminate this reduction by the single production by changing the goto action under ELEMENT to:

    ELEMENT: if(state = 0) goto = 1

so that we bypass state 2 and go directly to state 1. We now find that state 2 can never be reached by any parsing action, so it can be eliminated. Moreover, it turns out here (and frequently in practice as well) that the goto actions for LIST and ELEMENT become compatible at this point; that is, the actions do not differ on the same state. It is always possible to merge compatible goto actions for nonterminals; the resulting parser has one less state, and one less goto action.

Example 8.1: The following is a representation of the parsing action and goto tables for an LR(1) parser for G1. It results from the parsing action and goto tables in Section 5 by applying state merger (Section 8.1), and eliminating the reduction by the single production.

Parsing Action Table

    0: 5: if(input = 'a') shift 3
          if(input = 'b') shift 4
          error
    1: if(input = ',') shift 5
       if(input = '$') accept
       error
    3: reduce by: ELEMENT → 'a'
    4: reduce by: ELEMENT → 'b'
    6: reduce by: LIST → LIST ',' ELEMENT

Goto Table

    LIST: ELEMENT: if(state = 0) goto = 1
                   goto = 6
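The tables of Example 8.1 are small enough to be exercised directly. The Python sketch below is one possible table-driven loop for them, under several assumptions of ours: the names `ACTION`, `REDUCE`, `goto` and `parse` are invented for the illustration, only the stack of states is kept (no parse forest is built), and the input is assumed not to contain the endmarker '$'.

```python
# Parsing action table of Example 8.1; a missing entry means "error".
SHIFT_A_OR_B = {"a": ("shift", 3), "b": ("shift", 4)}
ACTION = {
    0: SHIFT_A_OR_B,  # states 0 and 5 share one row (Section 8.1)
    5: SHIFT_A_OR_B,
    1: {",": ("shift", 5), "$": ("accept", None)},
}

# States that reduce whatever the input is:
# state -> (production, length of its right side).
REDUCE = {
    3: ("ELEMENT -> 'a'", 1),
    4: ("ELEMENT -> 'b'", 1),
    6: ("LIST -> LIST ',' ELEMENT", 3),
}


def goto(state):
    """Merged goto action shared by LIST and ELEMENT."""
    return 1 if state == 0 else 6


def parse(text):
    """Return the list of reductions made; raise SyntaxError on an error entry."""
    symbols = list(text) + ["$"]
    stack = [0]
    reductions = []
    pos = 0
    while True:
        state = stack[-1]
        if state in REDUCE:
            production, length = REDUCE[state]
            del stack[-length:]            # drop the states covering the handle
            stack.append(goto(stack[-1]))  # then consult the goto table
            reductions.append(production)
            continue
        kind, target = ACTION[state].get(symbols[pos], ("error", None))
        if kind == "shift":
            stack.append(target)
            pos += 1
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {symbols[pos]!r} at position {pos}")


print(parse("a,b"))
```

For the input a,b the loop reports three reductions, none of them by LIST → ELEMENT: after reducing by ELEMENT → 'a' in state 0, the goto table leads straight to state 1.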


These tables are identical with those for the ambiguous version of G1, after the equal states have been identified. These tables differ only in that the nonterminal symbols LIST and ELEMENT have been explicitly merged in the ambiguous grammar, while the distinction is still nominally made in the tables above.

In the general case, there may be a number of states which call for reductions by the same single production, and there may be other parsing actions in the states which call for these reductions. It is not always possible, in general, to perform these modifications without increasing the number of states; conditions which must be satisfied in order to profitably carry out this process are given in [Aho and Ullman (1973b)]. It is enough for our purposes to notice that if a reduction by a single production A → X is to be eliminated, and if this reduction is generated by exactly one set of items containing the item with the core

    [A → X .]

then this single production can be eliminated. It turns out that the single productions which arise in the representation of operator precedence or associativity can always be eliminated; the result is typically the same as if an ambiguous grammar were written, and the conflicts resolved as discussed in Section 6. However, the ambiguous grammar generates the reduced parser immediately, without needing this optimizing algorithm [Aho, Johnson, and Ullman (1973)].

Other approaches to optimization of LR parsers are discussed by [Aho and Ullman (1972b)], [Anderson (1972)], [Jolliat (1973)], and [Pager (1970)]. [Anderson, Eve, and Horning (1973)], [Demers (1973)], and [Pager (1974)] also discuss the elimination of reductions by single productions.

9. ERROR RECOVERY

A properly designed LR parser will announce that an error has occurred as soon as there is no way to make a valid continuation to the input already scanned. Unfortunately, it is not always easy to decide what the parser should do when an error is detected; in general, this depends on the environment in which the parser is operating. Any scheme for error recovery must be carefully interfaced with the lexical analysis and code generation phases of compilation, since these operations typically have "side effects" which must be undone before the error can be considered corrected. In addition, a compiler should recover gracefully from each error encountered so that subsequent errors can also be detected.

LR parsers can accommodate a wide variety of error recovery stratagems. In place of each error entry in each state, we may insert an error correction routine which is prepared to take some extraordinary actions to correct the error. The description of the state as given by the set of items frequently provides enough context information to allow for the construction of sophisticated error recovery routines.

We shall illustrate one simple method by which error recovery can be introduced into the parsing process. This method is only one of many possible techniques. We introduce error recovery productions of the form

    A → error

into the grammar for certain selected nonterminals. Here, error is a special terminal symbol. These error recovery productions will introduce items with cores of the form

    [A → . error]

into certain states, as well as introducing new states of the form

    [A → error .]

When the LR parser encounters an error, it can announce error and replace the current input symbol by the special terminal symbol error. The parser can then discard trees from the parse forest, one at a time from right-to-left, until the current state (the state on the rightmost tree in the parse forest) has a parsing action shift on the input error. The parser has now reached a state with at least one item of the form

    [A → . error]

The parser can then perform the shift


action and reduce by one of the error recovery productions

    A → error

(If more than one error recovery production is present, a choice would have to be specified.) On reducing, the parser can perform a hand-tailored action associated with this error situation. One such action could be to skip forward on the input until an input symbol 'a' was found such that 'a' can legitimately occur either as the last symbol of a string generated by A or as the first symbol of a string that can follow A.

Certain automatic error recovery actions are also possible. For example, the error recovery productions can be mechanically generated for any specified set of nonterminals. Parsing and error recovery can proceed as above, except that on reducing by an error recovery production, the parser can automatically discard input symbols until it finds an input symbol, say 'a', on which it can make a legitimate parsing action, at which time normal parsing resumes. This would correspond to assuming that an error was encountered while the parser was looking for a phrase that could be reduced to nonterminal A. The parser would then assume that by skipping forward on the input to the symbol 'a' it would have found an instance of nonterminal A.

Certain error recovery schemes can produce an avalanche of error messages. To avoid a succession of error messages stemming from an inappropriate recovery, a parser might suppress the announcement of subsequent errors until a certain number of successful shift actions have occurred.

We feel that, at present, there is no efficient general "solution" to the error recovery problem in compiling. We see faults with any uniform approach, including the one above. Moreover, the success of any given approach can vary considerably from application to application. We feel that if a language is cleanly designed and well human-engineered, automatic error recovery will be easier as well.

Particular methods of error recovery during parsing are discussed by [Aho and Peterson (1972)], [Graham and Rhodes (1973)], [James (1972)], [Leinius (1970)], [McGruther (1972)], [Peterson (1972)], and [Wirth (1968)].

10. OUTPUT

In compiling, we are not interested in parsing but rather in producing a translation for the source program. LR parsing is eminently suitable for producing bottom-up translations.

Any translation which can be expressed as the concatenation of outputs which are associated with each production can be readily produced by an LR parser, without having to construct the forest representing the derivation tree. For example, we can specify a translation of arithmetic expressions from infix notation to postfix Polish notation in this way. To implement this class of translations, when we reduce, we perform an output action associated with that production. For example, to produce postfix Polish from G1, we can use the following translation scheme:

    Production              Translation
    (1) E → E '+' E         '+'
    (2) E → E '*' E         '*'
    (3) E → '(' E ')'
    (4) E → 'a'             'a'

Here, as in Section 7, we assume that + and * are left associative, and that * has higher precedence than +. The translation element is the output string to be emitted when the associated reduction is done. Thus, if the input string

    'a + a * (a + a)'

is parsed, the output will be

    'aaaa + * +'

These parsers can also produce three address code or the parse tree as output with the same ease. However, more complex translations may require more elaborate intermediate storage. Mechanisms for implementing these translations are discussed in [Aho and Ullman (1973a)] and in [Lewis, Rosenkrantz, and Stearns (1973)]. It is our current belief that, if a complicated translation is called for, the best way of implementing it is by constructing a tree. Optimizing transformations can then massage this tree before final code generation takes place. This scheme is simple and has low overhead when the input is in error.

11. CONCLUDING REMARKS

LR parsers belong to the class of shift-reduce parsing algorithms [Aho, Denning, and Ullman (1972)]. These are parsers that operate by scanning their input from left-to-right, shifting input symbols onto a pushdown stack until the handle of the current right sentential form is on top of the stack; the handle is then reduced. This process is continued either until all of the input has been scanned and the stack contains only the start symbol, or until an error has been encountered.

During the 1960s a number of shift-reduce parsing algorithms were found for various subclasses of the context-free grammars. The operator precedence grammars [Floyd (1963)], the simple precedence grammars [Wirth and Weber (1966)], the simple mixed strategy precedence grammars [McKeeman, Horning, and Wortman (1970)], and the uniquely invertible weak precedence grammars [Ichbiah and Morse (1970)] are some of these subclasses. The definitions of these classes of grammars and the associated parsing algorithms are discussed in detail in [Aho and Ullman (1972a)].

In 1965 Knuth defined a class of grammars which he called the LR(k) grammars. These are the context-free grammars that one can naturally parse bottom-up using a deterministic pushdown automaton with k-symbol lookahead to determine shift-reduce parsing actions. This class of grammars includes all of the other shift-reduce parsable grammars and admits of a parsing procedure that appears to be at least as efficient as the shift-reduce parsing algorithms given for these other classes of grammars. [Lalonde, Lee, and Horning (1971)] and [Anderson, Eve, and Horning (1973)] provide some empirical comparisons between LR and precedence parsing that support this conclusion.

In his paper Knuth outlined a method for constructing an LR parser for an LR grammar. However this algorithm results in parsers that are too large for practical use. A few years later [Korenjak (1969)] and particularly [DeRemer (1969 and 1971)] succeeded in substantially modifying Knuth's original parser constructing procedure to produce parsers of practical size. Substantial progress has been made since in improving the size and performance of LR parsers.

The general theory of LR(k) grammars and languages is developed in [Aho and Ullman (1972a and 1973a)]. Proofs of the correctness and efficacy of many of the constructions in this paper can be found there.

Perhaps the biggest advantage of LR parsing is that small, fast parsers can be mechanically generated for a large class of context-free grammars, that includes all other classes of grammars for which non-backtracking parsing algorithms can be mechanically generated. In addition, LR parsers are capable of detecting syntax errors at the earliest opportunity in a left-to-right scan of an input string, a property not enjoyed by many other parsing algorithms.

Just as we can parse by constructing a derivation tree for an input string bottom-up (from the leaves to the root) we can also parse top-down by constructing the derivation tree from the root to the leaves. A proper subclass of the LR grammars can be parsed deterministically top-down. These are the class of LL grammars, first studied by [Lewis and Stearns (1968)]. LL parsers are also efficient and have good error-detecting capabilities. In addition, an LL parser requires less initial optimization to be of practical size. However, the most serious disadvantage of LL techniques is that LL grammars tend to be unnatural and awkward to construct. Moreover, there are LR languages which do not possess any LL grammar.

These considerations, together with practical experience with an automatic parser generating system based on the principles expounded in this paper, lead us to believe that LR parsing is an important, practical tool for compiler design.
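As a small companion to the shift-reduce loop described in these remarks and to the translation scheme of Section 10, the following Python sketch shifts symbols onto a stack, reduces whenever a handle of the expression grammar is on top, and emits the translation element of each production as it reduces. It is only a sketch under stated assumptions: shift-reduce decisions are made by comparing operator precedences directly (a stand-in for the LR(1) tables, which are not reproduced here), and the names `PREC` and `to_postfix` are ours. The input is assumed not to contain the endmarker '$'.

```python
# Both operators are left associative; '*' has higher precedence than '+'.
PREC = {"+": 1, "*": 2}


def to_postfix(text):
    """Translate an infix expression over 'a', '+', '*', '(' and ')' to postfix."""
    symbols = list(text.replace(" ", "")) + ["$"]
    stack, output, pos = [], [], 0
    while True:
        lookahead = symbols[pos]
        if stack[-1:] == ["a"]:
            # (4) E -> 'a', translation element 'a'
            stack[-1] = "E"
            output.append("a")
        elif stack[-3:] == ["(", "E", ")"]:
            # (3) E -> '(' E ')', nothing is emitted
            stack[-3:] = ["E"]
        elif (len(stack) >= 3 and stack[-3] == "E" and stack[-1] == "E"
              and stack[-2] in PREC
              and PREC[stack[-2]] >= PREC.get(lookahead, 0)):
            # (1) or (2): reduce unless the lookahead binds tighter;
            # equal precedence reduces, which gives left associativity.
            output.append(stack[-2])
            stack[-3:] = ["E"]
        elif lookahead == "$":
            if stack == ["E"]:
                return "".join(output)
            raise SyntaxError("unexpected end of input")
        elif lookahead in "a+*()":
            stack.append(lookahead)  # shift
            pos += 1
        else:
            raise SyntaxError(f"unexpected symbol {lookahead!r}")


print(to_postfix("a + a * (a + a)"))  # prints aaaa+*+
```

Unlike an LR parser, this loop does not detect errors at the earliest opportunity; an ill-formed input is rejected only when no shift or reduction can lead to a stack holding the single symbol E.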


AHO, A. V., DENNING, P. J., AND ULLMAN, J. D. "Weak and mixed strategy precedence parsing." J. ACM 19, 2 (1972), 225-243.

AHO, A. V., JOHNSON, S. C., AND ULLMAN, J. D. "Deterministic parsing of ambiguous grammars." Conference Record of ACM Symposium on Principles of Programming Languages (Oct. 1973), 1-21.

AHO, A. V., AND PETERSON, T. G. "A minimum distance error-correcting parser for context-free languages." SIAM J. Computing 1, 4 (1972), 305-312.

AHO, A. V., AND ULLMAN, J. D. The Theory of Parsing, Translation and Compiling, Vol. 1, Parsing. Prentice-Hall, Englewood Cliffs, N.J., 1972a.

AHO, A. V., AND ULLMAN, J. D. "Optimization of LR(k) parsers." J. Computer and System Sciences 6, 6 (1972b), 573-602.

AHO, A. V., AND ULLMAN, J. D. The Theory of Parsing, Translation, and Compiling, Vol. 2, Compiling. Prentice-Hall, Englewood Cliffs, N.J., 1973a.

AHO, A. V., AND ULLMAN, J. D. "A technique for speeding up LR(k) parsers." SIAM J. Computing 2, 2 (1973b), 106-127.

ANDERSON, T. Syntactic analysis of LR(k) languages. PhD Thesis, Univ. Newcastle-upon-Tyne, Northumberland, England (1972).

ANDERSON, T., EVE, J., AND HORNING, J. J. "Efficient LR(1) parsers." Acta Informatica 2 (1973), 12-39.

DEMERS, A. "Elimination of single productions and merging nonterminal symbols of LR(1) grammars." Technical Report TR-127, Computer Science Laboratory, Dept. of Electrical Engineering, Princeton Univ., Princeton, N.J., July 1973.

DEREMER, F. L. "Practical translators for LR(k) languages." Project MAC Report MAC TR-65, MIT, Cambridge, Mass., 1969.

DEREMER, F. L. "Simple LR(k) grammars." Comm. ACM 14, 7 (1971), 453-460.

EARLEY, J. "An efficient context-free parsing algorithm." Comm. ACM 13, 2 (1970), 94-102.

FELDMAN, J. A., AND GRIES, D. "Translator writing systems." Comm. ACM 11, 2 (1968), 77-113.

FLOYD, R. W. "Syntactic analysis and operator precedence." J. ACM 10, 3 (1963), 316-333.

GRAHAM, S. L., AND RHODES, S. P. "Practical syntactic error recovery in compilers." Conference Record of ACM Symposium on Principles of Programming Languages (Oct. 1973), 52-58.

GRIES, D. Compiler Construction for Digital Computers. Wiley, New York, 1971.

ICHBIAH, J. D., AND MORSE, S. P. "A technique for generating almost optimal Floyd-Evans productions for precedence grammars." Comm. ACM 13, 8 (1970), 501-508.

JAMES, L. R. "A syntax directed error recovery method." Technical Report CSRC-13, Com[...]ronto, Toronto, Canada, 1972.

JOLLIAT, M. L. "On the reduced matrix representation of LR(k) parser tables." PhD Thesis, Univ. Toronto, Toronto, Canada (1973).

KNUTH, D. E. "On the translation of languages from left to right." Information and Control 8, 6 (1965), 607-639.

KNUTH, D. E. "Top down syntax analysis." Acta Informatica 1, 2 (1971), 97-110.

KORENJAK, A. J. "A practical method of constructing LR(k) processors." Comm. ACM 12, 11 (1969), 613-623.

LALONDE, W. R., LEE, E. S., AND HORNING, J. J. "An LALR(k) parser generator." Proc. IFIP Congress 71 TA-3, North-Holland Publishing Co., Amsterdam, the Netherlands (1971), pp. 153-157.

LEINIUS, P. "Error detection and recovery for syntax directed compiler systems." PhD Thesis, Univ. Wisconsin, Madison, Wisc. (1970).

LEWIS, P. M., ROSENKRANTZ, D. J., AND STEARNS, R. E. "Attributed translations." Proc. Fifth Annual ACM Symposium on Theory of Computing (1973), 160-171.

LEWIS, P. M., AND STEARNS, R. E. "Syntax directed transduction." J. ACM 15, 3 (1968), 464-488.

McGRUTHER, T. "An approach to automating syntax error detection, recovery, and correction for LR(k) grammars." Master's Thesis, Naval Postgraduate School, Monterey, Calif., 1972.

McKEEMAN, W. M., HORNING, J. J., AND WORTMAN, D. B. A Compiler Generator. Prentice-Hall, Englewood Cliffs, N.J., 1970.

PAGER, D. "A solution to an open problem by Knuth." Information and Control 17 (1970), 462-473.

PAGER, D. "On the incremental approach to left-to-right parsing." Technical Report PE 238, Information Sciences Program, Univ. Hawaii, Honolulu, Hawaii, 1972a.

PAGER, D. "A fast left-to-right parser for context-free grammars." Technical Report PE 240, Information Sciences Program, Univ. Hawaii, Honolulu, Hawaii, 1972b.

PAGER, D. "On eliminating unit productions from LR(k) parsers." Technical Report, Information Sciences Program, Univ. Hawaii, Honolulu, Hawaii, 1974.

PETERSON, T. G. "Syntax error detection, correction and recovery in parsers." PhD Thesis, Stevens Institute of Technology, Hoboken, N.J., 1972.

WIRTH, N. "PL360--a programming language for the 360 computers." J. ACM 15, 1 (1968), 37-74.

WIRTH, N., AND WEBER, H. "EULER--a generalization of ALGOL and its formal definition." Comm. ACM 9, 1 (1966), 13-23, and 9, 2 (1966), 89-99.
