
Syntax Analyzer (Parser)

Chapter 3

By Esubalew Alemneh

Contents (Session-1)
Introduction
Context-free grammar
Derivation
Parse Tree
Ambiguity
Resolving Ambiguity

Immediate & Indirect Left Recursion


Eliminating Immediate & Indirect Left Recursion

Left Factoring
Non-Context Free Language Constructs

Introduction

Abstract representations of the input program could be:
  abstract-syntax tree / parse tree + symbol table
  intermediate code
  object code
Syntax analysis is done by the parser, which:
  Produces a parse tree from which intermediate code can be generated
  Detects whether the program is written following the grammar rules
  Reports syntax errors, attempts error correction and recovery
  Collects information into symbol tables

Introduction

(Figure: the parser sits between the lexical analyzer and the rest of
the front end. On each "request for token" the lexical analyzer returns
a token from the source program; the parser builds the parse tree and
passes it to the rest of the front end, which produces intermediate
code. Errors are reported along the way, and both phases consult the
symbol table.)

Parsers can be Top-down or Bottom-up.

Context Free Grammars (CFG)

A CFG is used to specify the structure of legal programs.
The design of the grammar is an initial phase of the design of a
programming language.
Formally, a CFG G = (Vt, Vn, S, P), where:
  Vt is the set of terminal symbols in the grammar
  (i.e., the set of tokens returned by the scanner)
  Vn, the non-terminals, are variables that denote sets of (sub)strings
  occurring in the language. These impose a structure on the grammar.
  S is the start/goal symbol, a distinguished non-terminal in Vn
  denoting the entire set of strings in L(G).
  P is a finite set of productions specifying how terminals and
  non-terminals can be combined to form strings in the language.
  Each production must have a single non-terminal on its left-hand side.
The set V = Vt ∪ Vn is called the vocabulary of G.

Context Free Grammars (CFG)

Example (G1):
  E → E+E | E-E | E*E | E/E | -E
  E → (E)
  E → id
Where
  Vt = {+, -, *, /, (, ), id},  Vn = {E}
  S = E
  Productions are shown above
Sometimes → is replaced by ::=
A CFG is more expressive than an RE: every language that can be
described by regular expressions can also be described by a CFG.
  L = {a^n b^n | n >= 1} is an example of a language that can be
  expressed by a CFG but not by an RE.
A context-free grammar is sufficient to describe most programming
languages.

Context Free Grammars (CFG)

BNF (Backus Normal Form or Backus-Naur Form) is a notation technique
for context-free grammars, often used to describe the syntax of
languages used in computing.
It has many extensions and variants:
  Extended Backus-Naur Form (EBNF)
  Augmented Backus-Naur Form (ABNF)
A BNF specification is a set of derivation rules, written as
  <symbol> ::= expression
BNF for valid arithmetic expressions:
  <expr> ::= <expr> <op> <expr>
  <expr> ::= ( <expr> )
  <expr> ::= - <expr>
  <expr> ::= id
  <op> ::= + | - | * | /

Derivation

A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation.
If we have a production E → E+E, then we can replace E by E+E.
In general, a derivation step is αAβ ⇒ αγβ if there is a production
rule A → γ in the grammar, where α and β are arbitrary strings of
terminal and non-terminal symbols.
A derivation of a string must start from a production with the start
symbol on the left: S ⇒ ... ⇒ ω
  αAβ is a sentential form (terminals and non-terminals mixed)
  ω is a sentence if it contains only terminal symbols

Derivation

Derive the string -(id+id) from G1:
  E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)   (LMD)
OR
  E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)   (RMD)
At each derivation step, we can choose any of the non-terminals in the
sentential form of G for the replacement.
If we always choose the left-most non-terminal in each derivation
step, the derivation is called a left-most derivation (LMD).
If we always choose the right-most non-terminal in each derivation
step, the derivation is called a right-most derivation (RMD).
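The leftmost derivation above can be replayed mechanically: at each step the leftmost E is replaced by the body of the chosen production. A minimal sketch (the helper name apply_leftmost is an assumption, not from the slides):

```python
# Sketch: replay the leftmost derivation of -(id+id) in grammar G1.

def apply_leftmost(sentential: str, body: str) -> str:
    """Replace the leftmost occurrence of the non-terminal E with `body`."""
    i = sentential.index("E")
    return sentential[:i] + body + sentential[i + 1:]

form = "E"
for body in ["-E", "(E)", "E+E", "id", "id"]:
    form = apply_leftmost(form, body)
    print(form)
# last line printed: -(id+id), a sentence (terminals only)
```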

Parse Tree

A parse tree can be seen as a graphical representation of a derivation.
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.

(Figure: the parse tree grown step by step for
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id);
each step hangs the body of the chosen production under the expanded
E node, ending with the tree whose leaves spell -(id+id).)

Ambiguity

An ambiguous grammar is one that produces more than one LMD or more
than one RMD for the same sentence. For id+id*id, grammar G1 gives two
left-most derivations:

  E ⇒ E+E              E ⇒ E*E
    ⇒ id+E               ⇒ E+E*E
    ⇒ id+E*E             ⇒ id+E*E
    ⇒ id+id*E            ⇒ id+id*E
    ⇒ id+id*id           ⇒ id+id*id

(Figure: the two corresponding parse trees; in the first, * is nested
under +, in the second, + is nested under *.)

Ambiguity

For most parsers, the grammar must be unambiguous.
If a grammar is unambiguous, then there is a unique selection of the
parse tree for a sentence.
We should eliminate the ambiguity in the grammar during the design
phase of the compiler.
An unambiguous grammar should be written to eliminate the ambiguity:
we prefer one of the parse trees of a sentence (generated by the
ambiguous grammar) and disambiguate the grammar to restrict it to this
choice.

Ambiguity - Dangling Else

stmt → if expr then stmt |
       if expr then stmt else stmt |
       otherstmts

The sentence  if E1 then if E2 then S1 else S2  has two parse trees:

(Figure: in the first tree the else attaches to the outer if, i.e.
if E1 then (if E2 then S1) else S2; in the second tree the else
attaches to the inner if, i.e. if E1 then (if E2 then S1 else S2).)

We prefer the second parse tree (else matches with the closest if).
So, we have to disambiguate our grammar.

Resolving Ambiguity

Option 1: add a meta-rule, e.g. precedence and associativity rules
  For example: else associates with the closest previous if
  works, keeps the original grammar intact
  ad hoc and informal

Option 2: rewrite the grammar to resolve the ambiguity explicitly
  stmt → matchedstmt | unmatchedstmt
  matchedstmt → if expr then matchedstmt else matchedstmt |
                otherstmts
  unmatchedstmt → if expr then stmt |
                  if expr then matchedstmt else unmatchedstmt
  formal, no additional rules beyond syntax
  sometimes obscures the original grammar

Resolving Ambiguity

Option 3: redesign the language to remove the ambiguity
  Stmt ::= ... |
           if Expr then Stmt end |
           if Expr then Stmt else Stmt end
  formal, clear, elegant
  allows a sequence of Stmts in the then and else branches, no { } needed
  extra end required for every if

Left Recursion

A grammar is left recursive if it has a non-terminal A such that there
is a derivation
  A ⇒+ Aα
for some string α.
Top-down parsing techniques cannot handle left-recursive grammars.
So, we have to convert a left-recursive grammar into an equivalent
grammar which is not left-recursive.
Two types of left-recursion:
  immediate left-recursion - appears in a single step of the
  derivation (A ⇒ Aα)
  indirect left-recursion - appears in more than one step of the
  derivation

Eliminating Immediate Left Recursion

A → Aα | β    where β does not start with A

eliminate immediate left recursion:

  A  → βA'
  A' → αA' | ε     (an equivalent grammar)

In general,
A → Aα1 | ... | Aαm | β1 | ... | βn    where β1 ... βn do not start with A

eliminate immediate left recursion:

  A  → β1A' | ... | βnA'
  A' → α1A' | ... | αmA' | ε     (an equivalent grammar)

Eliminating Left Recursion

Remove left recursion from the grammar below
  E → E+T | T
  T → T*F | F
  F → id | (E)
Answer
  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → id | (E)
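The transformation used in the answer can be sketched as a small routine. The grammar representation (a list of bodies per non-terminal, with an empty list standing for ε) and the function name are assumptions for illustration, not part of the slides:

```python
# Sketch: eliminate immediate left recursion for one non-terminal.
# A body is a list of symbols; the empty list [] stands for epsilon.

def eliminate_immediate_left_recursion(nt, bodies):
    """Split A -> A alpha | beta into A -> beta A', A' -> alpha A' | eps."""
    rec = [b[1:] for b in bodies if b and b[0] == nt]   # the alpha parts
    non = [b for b in bodies if not b or b[0] != nt]    # the beta parts
    if not rec:
        return {nt: bodies}                             # nothing to do
    new = nt + "'"
    return {
        nt:  [b + [new] for b in non],                  # A  -> beta A'
        new: [a + [new] for a in rec] + [[]],           # A' -> alpha A' | eps
    }

rules = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(rules)   # E -> T E'   and   E' -> + T E' | eps
```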

Indirect Left-Recursion

A grammar may not be immediately left-recursive, but it still can be
left-recursive.
By just eliminating the immediate left-recursion, we may not get a
grammar which is not left-recursive.
  S → Aa | b
  A → Sc | d
This grammar is not immediately left-recursive, but it is still
left-recursive:
  S ⇒ Aa ⇒ Sca
or
  A ⇒ Sc ⇒ Aac
causes a left-recursion.
So, we have to eliminate all left-recursions from our grammar.

Eliminating Indirect Left-Recursion

Arrange the non-terminals in some order: A1 ... An.
We will remove indirect left recursion by constructing an equivalent
grammar G' such that if Ai → Ajγ is any production of G', then i < j.
For each non-terminal Ai in turn, do:
  For each j with 1 <= j < i such that there is a production rule of
  the form Ai → Ajγ, where the Aj productions are Aj → β1 | ... | βn,
  do:
    Replace the production rule Ai → Ajγ with the rules
    Ai → β1γ | ... | βnγ
  Eliminate any immediate left recursion among the Ai productions.

Eliminating Indirect Left-Recursion

Example 1
  S → Aa | b
  A → Ac | Sd | f
- Order of non-terminals: S = A1, A = A2
  A1 → A2a | b
  A2 → A2c | A1d | f
- The only production with j < i is A2 → A1d
- For A: replace it with A2 → A2ad | bd, giving
  A2 → A2c | A2ad | bd | f
- Eliminate the immediate left-recursion in A:
  A2  → bdA2' | fA2'
  A2' → cA2' | adA2' | ε
So, the resulting equivalent grammar which is not left-recursive is:
  S  → Aa | b
  A  → bdA' | fA'
  A' → cA' | adA' | ε

Eliminating Indirect Left-Recursion

Example 2
  A1 → A2 A3
  A2 → A3 A1 | b
  A3 → A1 A1 | a
Replace A3 → A1 A1 by A3 → A2 A3 A1,
and then replace this by
  A3 → A3 A1 A3 A1  and  A3 → b A3 A1
Eliminating the direct left recursion in the above gives:
  A3 → aK | b A3 A1 K
  K  → A1 A3 A1 K | ε
The resulting grammar is then:
  A1 → A2 A3
  A2 → A3 A1 | b
  A3 → aK | b A3 A1 K
  K  → A1 A3 A1 K | ε

Left Factoring

A predictive parser (a top-down parser without backtracking) insists
that the grammar must be left-factored.
  stmt → if expr then stmt else stmt |
         if expr then stmt
When we see if, we cannot know which production rule to choose to
re-write stmt in the derivation.
In general,
  A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have
one) are different.
When processing α we cannot know whether to expand
  A to αβ1   or   A to αβ2

Left Factoring

But if we re-write the grammar as follows:
  A  → αA'
  A' → β1 | β2
then we can immediately expand A to αA'.

Left Factoring Algorithm

For each non-terminal A with two or more alternatives (production
rules) with a common non-empty prefix, let us say
  A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
  A  → αA' | γ1 | ... | γm
  A' → β1 | ... | βn

Left Factoring

Example 1
  A → abB | aB | cdg | cdeB | cdfB

  A  → aA' | cdg | cdeB | cdfB
  A' → bB | B

  A   → aA' | cdA''
  A'  → bB | B
  A'' → g | eB | fB

Example 2
  A → ad | a | ab | abc | b

  A  → aA' | b
  A' → d | ε | b | bc

  A   → aA' | b
  A'  → d | ε | bA''
  A'' → ε | c
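One factoring step from Example 1 can be sketched as follows. The representation (bodies as symbol lists) and the helper name left_factor are assumptions for illustration:

```python
# Sketch: factor a given prefix out of the alternatives that start with it.

def left_factor(nt, bodies, prefix):
    """Rewrite A -> prefix b1 | ... | prefix bn | rest
    as A -> prefix A' | rest,  A' -> b1 | ... | bn."""
    n = len(prefix)
    with_p = [b[n:] for b in bodies if b[:n] == prefix]   # the beta parts
    without = [b for b in bodies if b[:n] != prefix]      # the gamma parts
    new = nt + "'"
    return {nt: [prefix + [new]] + without, new: with_p}

# Example 1 from the slide, factoring the common prefix "a":
rules = left_factor("A",
                    [["a", "b", "B"], ["a", "B"], ["c", "d", "g"],
                     ["c", "d", "e", "B"], ["c", "d", "f", "B"]],
                    ["a"])
print(rules["A"])    # a A' plus the unfactored cd-alternatives
print(rules["A'"])   # bB | B
```

A second call with prefix ["c", "d"] would then produce the A'' rules of the slide.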

Non-Context Free Language Constructs

There are some language constructs in programming languages which are
not context-free. This means that we cannot write a context-free
grammar for these constructs.
L1 = {ωcω | ω is in (a|b)*} is not context-free
  declaring an identifier and checking whether it is declared or not
  later. We cannot do this with a context-free language. We need a
  semantic analyzer (which is not context-free).
L2 = {a^n b^m c^n d^m | n >= 1 and m >= 1} is not context-free
  declaring two functions (one with n parameters, the other one with
  m parameters), and then calling them with the corresponding numbers
  of actual parameters.

Contents(Session-2)
Top Down Parsing
Recursive-Descent Parsing
Predictive Parser
Recursive Predictive Parsing
Non-Recursive Predictive Parsing

LL(1) Parser - Parser Actions


Constructing LL(1) - Parsing Tables
Computing FIRST and FOLLOW functions
LL(1) Grammars
Properties of LL(1) Grammars

Top Down Parsing

Top-down parsing involves constructing a parse tree for the input
string, starting from the root.
Basically, top-down parsing can be viewed as finding a leftmost
derivation for an input string.
How it works: start with a tree of one node labeled with the start
symbol, and repeat the following steps until the fringe of the parse
tree matches the input string:
1. At a node labeled A, select a production with A on its LHS and,
   for each symbol on its RHS, construct the appropriate child.
2. When a terminal is added to the fringe that doesn't match the
   input string, backtrack.
3. Find the next node to be expanded.
! Minimize the number of backtracks as much as possible.

Top Down Parsing

Two types of top-down parsing:
Recursive-Descent Parsing
  Backtracking is needed (if a choice of a production rule does not
  work, we backtrack to try other alternatives).
  It is a general parsing technique, but not widely used because it
  is not efficient.
Predictive Parsing
  no backtracking and hence efficient
  needs a special form of grammars (LL(1) grammars)
  Two types:
    Recursive Predictive Parsing is a special form of Recursive
    Descent Parsing without backtracking.
    Non-Recursive (Table Driven) Predictive Parsing is also known as
    LL(1) parsing.

Recursive-Descent Parsing

It tries to find the left-most derivation.
Backtracking is needed.
Example
  S → aBc
  B → bc | b
  input: abc
A left-recursive grammar can cause a recursive-descent parser, even
one with backtracking, to go into an infinite loop.
That is, when we try to expand a non-terminal B, we may eventually
find ourselves again trying to expand B without having consumed any
input.
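A minimal recursive-descent parser with backtracking for the example grammar S → aBc, B → bc | b might look like the sketch below. The generator-based backtracking style is an implementation choice, not prescribed by the slides:

```python
# Sketch: recursive descent with backtracking for S -> a B c, B -> bc | b.

def parse_B(s, i):
    """Yield every position B can consume up to, first alternative first."""
    if s[i:i + 2] == "bc":   # B -> b c
        yield i + 2
    if s[i:i + 1] == "b":    # B -> b
        yield i + 1

def parse_S(s):
    """S -> a B c; succeed only if the whole input is consumed."""
    if s[:1] != "a":
        return False
    for j in parse_B(s, 1):                  # backtracking point
        if s[j:j + 1] == "c" and j + 1 == len(s):
            return True
    return False

print(parse_S("abc"))    # True: B -> bc fails first, then B -> b succeeds
print(parse_S("abcc"))   # True: B -> bc succeeds directly
```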

Predictive Parser

Starting from a grammar, we eliminate left recursion and then left
factor it, to obtain a grammar suitable for predictive parsing (an
LL(1) grammar), but there is no 100% guarantee.
When re-writing a non-terminal in a derivation step, a predictive
parser can uniquely choose a production rule by just looking at the
current symbol in the input string.
  stmt → if ...... |
         while ...... |
         begin ...... |
         for ......
Note: when we are trying to re-write the non-terminal stmt, we can
uniquely choose the production rule by just looking at the current
token.
However, even though we eliminate the left recursion in the grammar
and left factor it, it may not be suitable for predictive parsing
(i.e., it may still not be an LL(1) grammar).

Recursive Predictive Parsing

Predictive parsing can be recursive or non-recursive.
In recursive predictive parsing, each non-terminal corresponds to a
procedure/function.
Example
  A → aBb | bAB

proc A {
  case of the current token {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with b, and move to the next token;
    b: - match the current token with b, and move to the next token;
       - call A;
       - call B;
  }
}

Recursive Predictive Parsing

When to apply ε-productions?
  A → aA | bB | ε
If all other productions fail, we should apply an ε-production. For
example, if the current token is not a or b, we may apply the
ε-production.
Most correct choice: we should apply an ε-production for a
non-terminal A when the current token is in the follow set of A
(the terminals that can follow A in the sentential forms).

Non-Recursive Predictive Parsing

A non-recursive predictive parser can be built by maintaining a stack
explicitly, rather than implicitly via recursive calls.
Non-recursive predictive parsing is a table-driven top-down parsing
method.

(Figure: model of a table-driven predictive parser: input buffer,
stack, parsing table, and output.)

Non-Recursive Predictive Parsing

Input buffer
  our string to be parsed. We will assume that its end is marked with
  a special symbol $.
Output
  a production rule representing a step of the derivation sequence
  (left-most derivation) of the string in the input buffer.
Stack
  contains the grammar symbols
  at the bottom of the stack, there is a special end-marker symbol $.
  initially the stack contains only the symbol $ and the start
  symbol S.
  when the stack is emptied (i.e. only $ is left in the stack),
  parsing is completed.
Parsing table
  a two-dimensional array M[A,a], indexed by a non-terminal A and a
  terminal (or $) a.

LL(1) Parser - Parser Actions

The symbol at the top of the stack (say X) and the current symbol in
the input string (say a) determine the parser action.
There are four possible parser actions:
1. If X and a are both $, the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $), the
   parser pops X from the stack and moves to the next symbol in the
   input buffer.
3. If X is a non-terminal, the parser looks at the parsing table
   entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it
   pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack.
   The parser also outputs the production rule X → Y1Y2...Yk.
4. Otherwise (X is a terminal that does not match a, or M[X,a] is
   empty), the parser reports an error.

LL(1) Parser - Example 1

Grammar:
  S → aBa
  B → bB | ε

LL(1) parsing table (we will see how to construct parsing tables very
soon):

          a          b
  S    S → aBa
  B    B → ε      B → bB

Parse of the input abba$:

  stack    input    output
  $S       abba$    S → aBa
  $aBa     abba$
  $aB      bba$     B → bB
  $aBb     bba$
  $aB      ba$      B → bB
  $aBb     ba$
  $aB      a$       B → ε
  $a       a$
  $        $        accept, successful completion
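The trace above can be reproduced by a short table-driven loop. The dict encoding of the table and the function name are assumptions for illustration:

```python
# Sketch: table-driven LL(1) parsing for S -> aBa, B -> bB | eps.
# An epsilon production is encoded as the empty body [].

TABLE = {("S", "a"): ["a", "B", "a"],
         ("B", "a"): [],                 # B -> eps
         ("B", "b"): ["b", "B"]}

def ll1_parse(tokens):
    stack, tokens = ["$", "S"], list(tokens) + ["$"]
    while stack:
        top, look = stack.pop(), tokens[0]
        if top == look == "$":
            return True                               # accept
        if top == look:
            tokens.pop(0)                             # match a terminal
        elif (top, look) in TABLE:
            stack.extend(reversed(TABLE[top, look]))  # expand non-terminal
        else:
            return False                              # error entry
    return False

print(ll1_parse("abba"))   # True  (a b* a is the language)
print(ll1_parse("ab"))     # False
```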

LL(1) Parser - Example 2

Grammar (E is the start symbol):
  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → (E) | id

LL(1) parsing table:

         id          +            *           (           )          $
  E    E → TE'                             E → TE'
  E'              E' → +TE'                           E' → ε     E' → ε
  T    T → FT'                             T → FT'
  T'              T' → ε      T' → *FT'               T' → ε     T' → ε
  F    F → id                              F → (E)

LL(1) Parser - Example 2

Parse of the input id+id$:

  stack      input     output
  $E         id+id$    E → TE'
  $E'T       id+id$    T → FT'
  $E'T'F     id+id$    F → id
  $E'T'id    id+id$
  $E'T'      +id$      T' → ε
  $E'        +id$      E' → +TE'
  $E'T+      +id$
  $E'T       id$       T → FT'
  $E'T'F     id$       F → id
  $E'T'id    id$
  $E'T'      $         T' → ε
  $E'        $         E' → ε
  $          $         accept

LL(1) Parser - Example 3

(Figure: parse of the input id+id*id, which is formed from the
grammar of Example 2.)

Constructing LL(1) Parsing Tables

Two functions are used in the construction of LL(1) parsing tables:
  FIRST
  FOLLOW
FIRST(α) is the set of the terminal symbols which occur as first
symbols in strings derived from α, where α is any string of grammar
symbols.
  If α derives ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of the terminals which occur immediately after
the non-terminal A in strings derived from the start symbol:
  a terminal a is in FOLLOW(A) if S ⇒* αAaβ

Compute FIRST for a String X

1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X is ε, then FIRST(X) = {ε}.
3. If X is a non-terminal symbol and X → ε is a production rule,
   then add ε to FIRST(X).
4. If X is a non-terminal symbol and X → Y1Y2...Yn is a production
   rule, then:
   if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for
   j = 1, ..., i-1, then a is in FIRST(X);
   if ε is in all FIRST(Yj) for j = 1, ..., n, then ε is in FIRST(X).

Compute FIRST for a String X

Example
  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → (E) | id

From Rule 1:  FIRST(id) = {id}
From Rule 2:  FIRST(ε) = {ε}
From Rules 3 and 4:
  FIRST(F)  = {(, id}
  FIRST(T') = {*, ε}
  FIRST(E') = {+, ε}
  FIRST(T)  = {(, id}
  FIRST(E)  = {(, id}
Others:
  FIRST(TE')  = {(, id}
  FIRST(+TE') = {+}
  FIRST(FT')  = {(, id}
  FIRST(*FT') = {*}
  FIRST((E))  = {(}
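Rules 1-4 can be iterated to a fixed point. A sketch for the example grammar follows, with the string "eps" standing for ε (the encoding is an assumption, not from the slides):

```python
# Sketch: compute FIRST for every non-terminal by fixed-point iteration.

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] encodes an epsilon body
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(body, first):
    """FIRST of a string of symbols; 'eps' marks the empty string."""
    out = set()
    for sym in body:
        if sym not in GRAMMAR:            # terminal (rule 1)
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}       # rule 4
        if "eps" not in first[sym]:
            return out
    out.add("eps")                        # every symbol can vanish
    return out

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                f = first_of(body, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
print(sorted(FIRST["E"]))    # ['(', 'id']
print(sorted(FIRST["T'"]))   # ['*', 'eps']
```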

Compute FOLLOW (for non-terminals)

1. $ is in FOLLOW(S), if S is the start symbol.
2. Look at each occurrence of a non-terminal on the RHS of a
   production which is followed by something:
   if A → αBβ is a production rule, then everything in FIRST(β)
   except ε is in FOLLOW(B).
3. Look at each B on the RHS that is not followed by anything:
   if A → αB is a production rule, or A → αBβ is a production rule
   and ε is in FIRST(β), then everything in FOLLOW(A) is in
   FOLLOW(B).

Compute FOLLOW (for non-terminals)

Example
  i.   E  → TE'
  ii.  E' → +TE' | ε
  iii. T  → FT'
  iv.  T' → *FT' | ε
  v.   F  → (E) | id

FOLLOW(E) = {$, )}, because
  from rule 1, FOLLOW(E) contains $;
  from rule 2, FOLLOW(E) contains FIRST()) = {)}, from the production
  F → (E).
FOLLOW(E') = {$, )}, by rule 3 (everything in FOLLOW(E) is in
FOLLOW(E')).
FOLLOW(T) = {+, ), $}:
  from rule 2, + is in FOLLOW(T);
  from rule 3, everything in FOLLOW(E) is in FOLLOW(T), since
  FIRST(E') contains ε.
FOLLOW(T') = {+, ), $}, by the same reasoning as FOLLOW(E').
FOLLOW(F) = {+, *, ), $}, by the same reasoning as FOLLOW(T).
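FOLLOW can be computed by the same fixed-point style, taking the FIRST sets computed above as given literals (a simplification for the sketch):

```python
# Sketch: compute FOLLOW by iterating rules 1-3 to a fixed point.

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}

def first_of(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})        # a terminal is its own FIRST
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    return out | {"eps"}                 # the whole string can vanish

def follow_sets(start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in GRAMMAR.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in GRAMMAR:         # only non-terminals get FOLLOW
                        continue
                    tail = first_of(body[i + 1:])
                    add = tail - {"eps"}         # rule 2
                    if "eps" in tail:
                        add |= follow[a]         # rule 3
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

FOLLOW = follow_sets()
print(sorted(FOLLOW["E"]))   # ['$', ')']
print(sorted(FOLLOW["F"]))   # ['$', ')', '*', '+']
```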

Constructing LL(1) Parsing Tables - Algorithm

For each production rule A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A,a].
2. If ε is in FIRST(α), then for each terminal a in FOLLOW(A), add
   A → α to M[A,a].
3. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].
All other entries of the parsing table are error entries.

Constructing LL(1) Parsing Table - Example

E → TE'      FIRST(TE') = {(, id}
             E → TE' into M[E,(] and M[E,id]

E' → +TE'    FIRST(+TE') = {+}
             E' → +TE' into M[E',+]

E' → ε       FIRST(ε) = {ε}: no entry by rule 1,
             but since ε is in FIRST(ε) and FOLLOW(E') = {$, )},
             E' → ε into M[E',$] and M[E',)]

T → FT'      FIRST(FT') = {(, id}
             T → FT' into M[T,(] and M[T,id]

T' → *FT'    FIRST(*FT') = {*}
             T' → *FT' into M[T',*]

T' → ε       FIRST(ε) = {ε}: no entry by rule 1,
             but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +},
             T' → ε into M[T',$], M[T',)] and M[T',+]

F → (E)      FIRST((E)) = {(}
             F → (E) into M[F,(]

F → id       FIRST(id) = {id}
             F → id into M[F,id]
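Given FIRST and FOLLOW, the table-filling rules reduce to a few lines. The FIRST/FOLLOW literals below are the sets computed on the earlier slides; the encoding is an assumption:

```python
# Sketch: fill M[A,a] from FIRST/FOLLOW; epsilon bodies go under FOLLOW(A).

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"+", "*", "$", ")"}}

def first_of(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})         # a terminal is its own FIRST
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    return out | {"eps"}

def build_table():
    table = {}
    for a, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body)
            for t in f - {"eps"}:
                table[a, t] = body        # rule 1
            if "eps" in f:
                for t in FOLLOW[a]:       # rules 2 and 3 ($ is in FOLLOW)
                    table[a, t] = body
    return table

M = build_table()
print(M["E", "id"])    # ['T', "E'"]
print(M["E'", "$"])    # []  (the epsilon production)
```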

LL(1) Grammars

A grammar whose parsing table has no multiply-defined entries is said
to be an LL(1) grammar.
  The first L means the input is scanned from left to right, the
  second L refers to a leftmost derivation, and 1 means one input
  symbol is used as a lookahead to determine the parser action.
A grammar G is LL(1) if and only if the following conditions hold for
every two distinct production rules A → α and A → β:
1. Both α and β cannot derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a
   terminal in FOLLOW(A).
From 1 and 2, we can say that FIRST(α) ∩ FIRST(β) = ∅.
From 3: if ε is in FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅, and the
like.

A Grammar which is not LL(1)

The parsing table of a grammar may contain more than one production
rule in an entry. In this case, we say that it is not an LL(1)
grammar.
  S → iCtSE | a
  E → eS | ε
  C → b

FIRST(iCtSE) = {i}        FOLLOW(S) = {$, e}
FIRST(a) = {a}            FOLLOW(E) = {$, e}
FIRST(eS) = {e}           FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}

          a         b         e          i           t    $
  S     S → a                         S → iCtSE
  E                       E → eS
                          E → ε                          E → ε
  C             C → b

There are two production rules for M[E,e].
Problem: ambiguity.

A Grammar which is not LL(1)

What do we have to do if the resulting parsing table contains
multiply-defined entries?
  Eliminate left recursion in the grammar, if it is not eliminated.
    A left-recursive grammar cannot be LL(1): for A → Aα | β, any
    terminal that appears in FIRST(β) also appears in FIRST(Aα),
    because Aα ⇒ βα.
    If β is ε, any terminal that appears in FIRST(α) also appears in
    FIRST(Aα) and FOLLOW(A).
  Left factor the grammar, if it is not left factored.
    A grammar that is not left factored cannot be an LL(1) grammar:
    for A → αβ1 | αβ2, any terminal that appears in FIRST(αβ1) also
    appears in FIRST(αβ2).
  If its (the new grammar's) parsing table still contains multiply
  defined entries, then the grammar is ambiguous or it is inherently
  not an LL(1) grammar.
An ambiguous grammar cannot be an LL(1) grammar.

Error Recovery in Predictive Parsing

An error may occur in predictive parsing (LL(1) parsing):
  if the terminal symbol on the top of the stack does not match the
  current input symbol, or
  if the top of the stack is a non-terminal A, the current input
  symbol is a, and the parsing table entry M[A,a] is empty.
What should the parser do in an error case?
  The parser should be able to give an error message (as meaningful
  an error message as possible).
  It should recover from that error case and be able to continue
  parsing the rest of the input.

Contents (Session-3)
Bottom Up Parsing
Handle Pruning
Implementation of a Shift-Reduce Parser
LR Parsers
LR Parsing Algorithm
Actions of A LR-Parser
Constructing SLR Parsing Tables
SLR(1) Grammar
Error Recovery in LR Parsing

Bottom-Up Parsing

A bottom-up parser creates the parse tree of the given input starting
from the leaves towards the root.
A bottom-up parser tries to find the RMD of the given input in
reverse order.
Bottom-up parsing is also known as shift-reduce parsing because its
two main actions are shift and reduce:
  At each shift action, the current symbol in the input string is
  pushed onto a stack.
  At each reduce action, the symbols at the top of the stack
  (matching the right side of a production) are replaced by the
  non-terminal on the left side of that production.
  Accept: successful completion of parsing.
  Error: the parser discovers a syntax error and calls an error
  recovery routine.

Bottom-Up Parsing

A shift-reduce parser tries to reduce the given input string into the
start symbol:

  a string  is reduced to  the start symbol

At each reduction step, a substring of the input matching the right
side of a production rule is replaced by the non-terminal on the left
side of that production rule.
If the substring is chosen correctly, the rightmost derivation of
that string is created in reverse order:

  Rightmost derivation:       S ⇒rm ... ⇒rm ω
  Shift-reduce parser finds:  ω ⇐ ... ⇐ S

Shift-Reduce Parsing - Example

  S → aABb
  A → aA | a
  B → bB | b

input string: aaabb

  aaabb ⇐ aaAbb ⇐ aAbb ⇐ aABb ⇐ S     (reduction)

Right-sentential forms:  S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb

How do we know which substring is to be replaced at each reduction
step?

Handle
Informally, a handle of a string is a substring that matches the
right side of a production rule.
  But not every substring that matches the right side of a production
  rule is a handle.
A handle of a right-sentential form γ (= αβω) is
  a production rule A → β and a position in γ where the string β may
  be found and replaced by A to produce the previous right-sentential
  form in a rightmost derivation of γ:
    S ⇒*rm αAω ⇒rm αβω
If the grammar is unambiguous, then every right-sentential form of
the grammar has exactly one handle.

Handle Pruning

A rightmost derivation in reverse can be obtained by handle pruning:

  S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω   (input string)

Start from γn: find a handle An → βn in γn, and replace βn by An to
get γn-1.
Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to
get γn-2.
Repeat this until we reach S.

A Shift-Reduce Parser - Example

  E → E+T | T
  T → T*F | F
  F → (E) | id

Rightmost derivation of id+id*id:
  E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id
    ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

  Right-sentential form    Reducing production
  id+id*id                 F → id
  F+id*id                  T → F
  T+id*id                  E → T
  E+id*id                  F → id
  E+F*id                   T → F
  E+T*id                   F → id
  E+T*F                    T → T*F
  E+T                      E → E+T
  E

(In each right-sentential form, the handle is the substring that
matches the right side of the reducing production.)

A Stack Implementation of a Shift-Reduce Parser

The initial stack contains only the end-marker $, and the end of the
input string is marked by the end-marker $.

  Stack      Input       Action
  $          id+id*id$   shift
  $id        +id*id$     reduce by F → id
  $F         +id*id$     reduce by T → F
  $T         +id*id$     reduce by E → T
  $E         +id*id$     shift
  $E+        id*id$      shift
  $E+id      *id$        reduce by F → id
  $E+F       *id$        reduce by T → F
  $E+T       *id$        shift
  $E+T*      id$         shift
  $E+T*id    $           reduce by F → id
  $E+T*F     $           reduce by T → T*F
  $E+T       $           reduce by E → E+T
  $E         $           accept

(Figure: the parse tree built bottom-up by this sequence of
reductions.)

Shift-Reduce Parsers

The most prevalent type of bottom-up parser today is based on a
concept called LR(k) parsing:
  L : the input is scanned left to right
  R : a rightmost derivation (constructed in reverse)
  k : lookahead symbols (when k is omitted, it is 1)
LR parsers cover a wide range of grammars:
  Simple LR parser (SLR)
  Look-Ahead LR parser (LALR)
  most general LR parser (LR)
Their relative power nests as SLR ⊂ LALR ⊂ LR ⊂ CFG.
SLR, LR and LALR parsers work the same way; only their parsing tables
are different.

LR Parsers

LR parsing is attractive because:
  LR parsers can be constructed to recognize virtually all
  programming-language constructs for which context-free grammars can
  be written.
  LR parsing is the most general non-backtracking shift-reduce
  parsing method, yet it is still efficient.
  The class of grammars that can be parsed using LR methods is a
  proper superset of the class of grammars that can be parsed with
  predictive parsers:
    LL(1)-Grammars ⊂ LR(1)-Grammars
  An LR parser can detect a syntactic error as soon as it is possible
  to do so on a left-to-right scan of the input.
A drawback of the LR method is that it is too much work to construct
an LR parser by hand.
  Use tools, e.g. yacc.

LR Parsing Algorithm

(Figure: model of an LR parser. The input buffer holds a1 ... ai ... an $.
The stack holds states S0 S1 ... Sm-1 Sm, each state Si standing for
the grammar symbol Xi beneath it. The parser is driven by an Action
table, whose rows are states, whose columns are the terminals and $,
and whose entries are one of four different actions, and by a Goto
table, whose rows are states, whose columns are the non-terminals,
and whose entries are state numbers. The chosen reductions form the
output.)

A Configuration of the LR Parsing Algorithm

A configuration of an LR parser is:

  ( S0 S1 ... Sm , ai ai+1 ... an $ )
       Stack          Rest of Input

Sm and ai decide the parser action by consulting the parsing action
table. (Initially the stack contains just S0.)
A configuration of the LR parser represents the right-sentential
form:

  X1 ... Xm ai ai+1 ... an $

where Xi is the grammar symbol represented by state Si.

Actions of an LR Parser

1. If ACTION[Sm, ai] = shift s, the parser executes a shift move: it
   shifts the next state s onto the stack, entering the configuration
     ( S0 S1 ... Sm s , ai+1 ... an $ )
2. If ACTION[Sm, ai] = reduce A → β, the parser executes a reduce
   move, changing the configuration from
     ( S0 S1 ... Sm , ai ai+1 ... an $ )
   to
     ( S0 S1 ... Sm-r s , ai ai+1 ... an $ )
   where r is the length of β and s = GOTO[Sm-r, A]. The output is
   the reducing production A → β.
   Here the parser first pops r state symbols off the stack, exposing
   state Sm-r, and then pushes s.
3. If ACTION[Sm, ai] = accept, parsing is successfully completed.
4. If ACTION[Sm, ai] = error (an empty entry in the action table),
   the parser has detected an error.

LR Parsing Algorithm

(Figure: pseudocode of the table-driven LR parsing loop.)

(SLR) Parsing Tables for the Expression Grammar

Expression grammar:
  1) E → E+T
  2) E → T
  3) T → T*F
  4) T → F
  5) F → (E)
  6) F → id

            Action table                  Goto table
  state   id    +    *    (    )    $     E    T    F
    0     s5             s4               1    2    3
    1           s6                 acc
    2           r2   s7        r2   r2
    3           r4   r4        r4   r4
    4     s5             s4               8    2    3
    5           r6   r6        r6   r6
    6     s5             s4                    9    3
    7     s5             s4                         10
    8           s6             s11
    9           r1   s7        r1   r1
   10           r3   r3        r3   r3
   11           r5   r5        r5   r5

Actions of an (S)LR-Parser - Example

For the input id*id+id:

  stack        input       action             output
  0            id*id+id$   shift 5
  0id5         *id+id$     reduce by F → id   F → id    b/c goto(0,F) = 3
  0F3          *id+id$     reduce by T → F    T → F     b/c goto(0,T) = 2
  0T2          *id+id$     shift 7
  0T2*7        id+id$      shift 5
  0T2*7id5     +id$        reduce by F → id   F → id    b/c goto(7,F) = 10
  0T2*7F10     +id$        reduce by T → T*F  T → T*F   b/c goto(0,T) = 2
  0T2          +id$        reduce by E → T    E → T     b/c goto(0,E) = 1
  0E1          +id$        shift 6
  0E1+6        id$         shift 5
  0E1+6id5     $           reduce by F → id   F → id    b/c goto(6,F) = 3
  0E1+6F3      $           reduce by T → F    T → F     b/c goto(6,T) = 9
  0E1+6T9      $           reduce by E → E+T  E → E+T   b/c goto(0,E) = 1
  0E1          $           accept
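The LR driving loop itself is tiny once the tables exist. The sketch below encodes the SLR action and goto tables above as Python dicts (the encoding, with ('s', state), ('r', production) and 'acc' entries, is an assumption for illustration):

```python
# Sketch: drive the SLR(1) table for the expression grammar.
# PROD maps production number -> (head non-terminal, body length).

ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): "acc",
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}
PROD = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1),
        5: ("F", 3), 6: ("F", 1)}

def lr_parse(tokens):
    stack, tokens = [0], list(tokens) + ["$"]
    while True:
        act = ACTION.get((stack[-1], tokens[0]))
        if act == "acc":
            return True
        if act is None:
            return False                          # empty entry: syntax error
        if act[0] == "s":                         # shift state act[1]
            stack.append(act[1])
            tokens.pop(0)
        else:                                     # reduce by production act[1]
            head, size = PROD[act[1]]
            del stack[len(stack) - size:]         # pop |body| states
            stack.append(GOTO[stack[-1], head])   # push goto state

print(lr_parse(["id", "*", "id", "+", "id"]))   # True
print(lr_parse(["id", "+", "*", "id"]))         # False
```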

Conflicts During Shift-Reduce Parsing

There are context-free grammars for which shift-reduce parsers cannot
be used: the stack contents and the next input symbol may not decide
the action.
  shift/reduce conflict: the parser cannot decide whether to make a
  shift operation or a reduction.
  reduce/reduce conflict: the parser cannot decide which of several
  reductions to make.
If a shift-reduce parser cannot be used for a grammar, that grammar
is called a non-LR(k) grammar.
An ambiguous grammar can never be an LR grammar.

Constructing SLR Parsing Tables

LR(0) Item
An LR parser makes shift-reduce decisions by maintaining states to
keep track of where we are in a parse.
An LR(0) item of a grammar G is a production of G with a dot at some
position of the right side.

Ex: for A → aBb the possible LR(0) items are (four different
possibilities):
  A → .aBb
  A → a.Bb
  A → aB.b
  A → aBb.

Sets of LR(0) items will be the states of the action and goto tables
of the SLR parser, i.e. states represent sets of items.
A collection of sets of LR(0) items (the canonical LR(0) collection)
is the basis for constructing SLR parsers.

Constructing SLR Parsing Tables

To construct the canonical LR(0) collection for a grammar, we define
an augmented grammar and two functions, CLOSURE and GOTO.
Augmented grammar:
  G' is G with a new production rule S' → S, where S' is the new
  start symbol.
  Purpose: to provide a single production that, when reduced, signals
  the end of parsing.
If I is a set of LR(0) items for a grammar G, then closure(I) is the
set of LR(0) items constructed from I by two rules:
  1. Initially, every LR(0) item in I is added to closure(I).
  2. If A → α.Bβ is in closure(I), then for all production rules
     B → γ in G, add B → .γ to closure(I).
We apply rule 2 until no more new LR(0) items can be added to
closure(I).

Closure(I) - Example

Given the grammar
  E → E+T | T
  T → T*F | F
  F → (E) | id
Then:
  closure({T → T.*F}) = {T → T.*F}
  closure({T → T.*F, T → T*.F}) = {T → T.*F, T → T*.F,
                                   F → .(E), F → .id}
  closure({F → (.E)}) = {F → (.E), E → .E+T, E → .T, T → .T*F,
                         T → .F, F → .(E), F → .id}
  closure({E' → .E}) = {E' → .E, E → .E+T, E → .T, T → .T*F,
                        T → .F, F → .(E), F → .id}
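Closure can be computed with a simple saturation loop. Representing an item as a (head, body, dot-position) tuple is a choice made for this sketch, not something prescribed by the slides:

```python
# Sketch: CLOSURE of a set of LR(0) items for the augmented grammar.

GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Saturate: a dot before a non-terminal B pulls in every B -> .gamma."""
    items = set(items)
    while True:
        new = {(b[d], g, 0)
               for _, b, d in items
               if d < len(b) and b[d] in GRAMMAR
               for g in GRAMMAR[b[d]]}
        if new <= items:
            return items
        items |= new

I = closure({("T", ("T", "*", "F"), 2)})    # closure({T -> T * . F})
print(len(I))                               # 3: the two F-items are added
print(len(closure({("E'", ("E",), 0)})))    # 7: all the items of I0
```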

Goto Operation

If I is a set of LR(0) items and X is a grammar symbol (terminal or
non-terminal), then goto(I,X) is defined as follows:
  if A → α.Xβ is in I, then every item in closure({A → αX.β}) will be
  in goto(I,X).

Example:
  I = {E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E),
       F → .id}

  goto(I,E)  = closure({E' → E., E → E.+T}) = {E' → E., E → E.+T}
  goto(I,T)  = closure({E → T., T → T.*F}) = {E → T., T → T.*F}
  goto(I,F)  = closure({T → F.}) = {T → F.}
  goto(I,()  = closure({F → (.E)}) = {F → (.E), E → .E+T, E → .T,
               T → .T*F, T → .F, F → .(E), F → .id}
  goto(I,id) = closure({F → id.}) = {F → id.}

  goto({E' → E., E → E.+T}, +) = closure({E → E+.T})
             = {E → E+.T, T → .T*F, T → .F, F → .(E), F → .id}

Construction of the Canonical LR(0) Collection

To create the SLR parsing tables for a grammar G, we create the
canonical LR(0) collection of the grammar G.
Algorithm:
  void items(G') {
    C = { closure({S' → .S}) }
    repeat
      for (each set of items I in C)
        for (each grammar symbol X)
          if (goto(I,X) is not empty and not in C)
            add goto(I,X) to C
    until no new sets of items are added
  }
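The items algorithm, with goto built on the closure computation just described, can be sketched as follows (same assumed item representation as before):

```python
# Sketch: GOTO and the canonical LR(0) collection for the augmented
# expression grammar. Items are (head, body, dot-position) tuples.

GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    items = set(items)
    while True:
        new = {(b[d], g, 0)
               for _, b, d in items
               if d < len(b) and b[d] in GRAMMAR
               for g in GRAMMAR[b[d]]}
        if new <= items:
            return items
        items |= new

def goto(items, x):
    """Advance the dot over x in every item that allows it, then close."""
    return frozenset(closure({(h, b, d + 1)
                              for h, b, d in items
                              if d < len(b) and b[d] == x}))

def canonical_collection():
    start = frozenset(closure({("E'", ("E",), 0)}))   # I0
    c, todo = {start}, [start]
    while todo:
        i = todo.pop()
        for x in {b[d] for _, b, d in i if d < len(b)}:
            j = goto(i, x)
            if j and j not in c:
                c.add(j)
                todo.append(j)
    return c

C = canonical_collection()
print(len(C))   # 12 item sets, matching states I0 through I11 on the slides
```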

The Canonical LR(0) Collection - Example

C = { closure({E' → .E}) }
  = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }

This gives us the items for the first state (state 0, I0) of our DFA.

Now we compute the goto function for all of the relevant symbols in the
set. Here we care about the symbols E, T, F, (, and id, since those are
the symbols that have a dot in front of them in some item of I0.

For symbol E,  goto(I0, E)  = closure({E' → E., E → E.+T})
                            = { E' → E., E → E.+T } = I1
For symbol T,  goto(I0, T)  = closure({E → T., T → T.*F})
                            = { E → T., T → T.*F } = I2
For symbol F,  goto(I0, F)  = closure({T → F.}) = { T → F. } = I3
For symbol (,  goto(I0, ()  = closure({F → (.E)})
                            = { F → (.E), E → .E+T, E → .T, T → .T*F,
                                T → .F, F → .(E), F → .id } = I4
For symbol id, goto(I0, id) = closure({F → id.}) = { F → id. } = I5

Repeat this step for the newly created states (I1, I2, I3, I4, I5, ...)
until the dot occurs at the end of every kernel item of each state.

For symbol +, goto(I1, +) = closure({E → E+.T})
    = { E → E+.T, T → .T*F, T → .F, F → .(E), F → .id } = I6
For symbol *, goto(I2, *) = closure({T → T*.F})
    = { T → T*.F, F → .(E), F → .id } = I7
For symbol E, goto(I4, E) = closure({F → (E.), E → E.+T})
    = { F → (E.), E → E.+T } = I8

Summary of states obtained, and to which state each production in a
state goes:

Transition Diagram (DFA) of the Goto Function

LR(0) automaton for the Example

[Figure: the LR(0) automaton for the grammar E' → E, E → E+T | T,
T → T*F | F, F → (E) | id, showing states I0-I11 and the goto
transitions between them.]

Constructing SLR Parsing Table - Example

Before we start construction of the SLR action/goto tables, we need to
compute the FOLLOW sets for all of the non-terminals in the grammar and
to number the productions:

  0: E' → E
  1: E → E + T
  2: E → T
  3: T → T * F
  4: T → F
  5: F → ( E )
  6: F → id

  Follow(E') = {$}
  Follow(E)  = {$, ), +}
  Follow(T)  = {$, ), +, *}
  Follow(F)  = {$, ), +, *}

Each terminal gets a column in the action table, each non-terminal a
column in the goto table, and each state a row in both tables.
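The FOLLOW sets above can be computed mechanically. A minimal sketch in Python: since no production of this grammar derives the empty string, the propagation rules stay simple (for A → αBβ, FIRST(β) goes into FOLLOW(B); for A → αB, FOLLOW(A) goes into FOLLOW(B)):

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E",  ("E", "+", "T")),
    ("E",  ("T",)),
    ("T",  ("T", "*", "F")),
    ("T",  ("F",)),
    ("F",  ("(", "E", ")")),
    ("F",  ("id",)),
]
NONTERMS = {"E'", "E", "T", "F"}
TERMS = {"+", "*", "(", ")", "id"}

def compute_first():
    # FIRST(a) = {a} for a terminal; FIRST(A) unions in FIRST of the
    # first body symbol (valid here because nothing derives epsilon)
    first = {a: {a} for a in TERMS}
    first.update({A: set() for A in NONTERMS})
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            if not first[body[0]] <= first[head]:
                first[head] |= first[body[0]]
                changed = True
    return first

def compute_follow(first):
    follow = {A: set() for A in NONTERMS}
    follow["E'"].add("$")        # $ follows the augmented start symbol
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, B in enumerate(body):
                if B not in NONTERMS:
                    continue
                # symbol after B exists -> its FIRST; else FOLLOW(head)
                add = first[body[i + 1]] if i + 1 < len(body) else follow[head]
                if not add <= follow[B]:
                    follow[B] |= add
                    changed = True
    return follow

follow = compute_follow(compute_first())
print(sorted(follow["T"]))   # ['$', ')', '*', '+']
```

The output matches the sets listed above, including * in FOLLOW(T), which matters for the reduce entries of rules 3 and 4.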

Constructing SLR Parsing Table

1. Construct the canonical collection of sets of LR(0) items for G'.
     C = { I0, ..., In }
2. Create the parsing action table as follows:
   2.1. If a is a terminal, A → α.aβ is in Ii, and goto(Ii,a) = Ij,
        then action[i, a] = shift j.
   2.2. If A → α. is in Ii, then action[i, a] = reduce A → α for all
        a in FOLLOW(A), where A ≠ S'.
   2.3. If S' → S. is in Ii, then action[i, $] = accept.
   2.4. If these rules generate any conflicting actions, the grammar
        is not SLR(1).
3. Create the parsing goto table:
     for all non-terminals A, if goto(Ii, A) = Ij then goto[i, A] = j.
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing S' → .S.
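Rules 2.1-2.4 can be sketched as code. A runnable sketch in Python: the closure/goto helpers are repeated inline so it stands alone, the FOLLOW sets are hard-coded from the previous slide, and state numbers follow discovery order, so apart from I0 they need not match the I0..I11 numbering used here:

```python
GRAMMAR = [
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {"E'", "E", "T", "F"}
SYMBOLS = sorted({s for _, b in GRAMMAR for s in b} | NONTERMS)
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}   # from the previous slide

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in items:
                    items.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(items)

def goto(I, X):
    moved = {(h, b, d + 1) for (h, b, d) in I if d < len(b) and b[d] == X}
    return closure(moved) if moved else frozenset()

# Step 1: the canonical LR(0) collection
states = [closure({("E'", ("E",), 0)})]
for I in states:
    for X in SYMBOLS:
        J = goto(I, X)
        if J and J not in states:
            states.append(J)

# Step 2: the action table
action = {}
def set_entry(key, val):
    # rule 2.4: any clash between entries means the grammar is not SLR(1)
    if key in action and action[key] != val:
        raise ValueError("conflict at %s: not SLR(1)" % (key,))
    action[key] = val

for i, I in enumerate(states):
    for head, body, dot in I:
        if dot < len(body) and body[dot] not in NONTERMS:    # rule 2.1
            set_entry((i, body[dot]),
                      ("shift", states.index(goto(I, body[dot]))))
        elif dot == len(body) and head != "E'":              # rule 2.2
            for a in FOLLOW[head]:
                set_entry((i, a), ("reduce", head, body))
        elif dot == len(body):                               # rule 2.3
            set_entry((i, "$"), ("accept",))

print(len(states), action[(0, "id")][0])
```

No conflict is raised, confirming that the expression grammar is SLR(1); step 3 (the goto table) is a direct read-off of goto(Ii, A) for non-terminals and is omitted here.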

Constructing SLR Parsing Table - Example

From Rule 2.1:
  Take F → .(E) from I0; goto(I0, () = I4, so action[0, (] = shift 4.
  Take E → E.+T from I1; goto(I1, +) = I6, so action[1, +] = shift 6.
  Take T → T.*F from I2; goto(I2, *) = I7, so action[2, *] = shift 7.
  The other shifts can be populated in the same way.

From Rule 2.2:
  Take E → T. from I2; Follow(E) = {$, ), +}, so
    action[2, $] = reduce 2   (2: E → T)
    action[2, )] = reduce 2
    action[2, +] = reduce 2
  The other reduces can be done in the same way.

From Rule 2.3:
  E' → E. is in I1, so action[1, $] = accept.

Constructing SLR Parsing Table - Example

From Rule 3 - creating the parsing goto table:
  Take E in I0: goto(I0, E) = I1, so goto[0, E] = 1
  Take T in I0: goto(I0, T) = I2, so goto[0, T] = 2
  Take F in I0: goto(I0, F) = I3, so goto[0, F] = 3
  Take E in I4: goto(I4, E) = I8, so goto[4, E] = 8
  Take T in I4: goto(I4, T) = I2, so goto[4, T] = 2
  Take F in I4: goto(I4, F) = I3, so goto[4, F] = 3
  Take T in I6: goto(I6, T) = I9, so goto[6, T] = 9
  Take F in I6: goto(I6, F) = I3, so goto[6, F] = 3
  Take F in I7: goto(I7, F) = I10, so goto[7, F] = 10

Parsing Tables of the Expression Grammar

            Action Table                    Goto Table
state |  id    +    *    (    )    $   |   E    T    F
------+--------------------------------+---------------
  0   |  s5             s4             |   1    2    3
  1   |       s6                  acc  |
  2   |       r2   s7        r2   r2   |
  3   |       r4   r4        r4   r4   |
  4   |  s5             s4             |   8    2    3
  5   |       r6   r6        r6   r6   |
  6   |  s5             s4             |        9    3
  7   |  s5             s4             |            10
  8   |       s6            s11        |
  9   |       r1   s7        r1   r1   |
 10   |       r3   r3        r3   r3   |
 11   |       r5   r5        r5   r5   |
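The table can be exercised by the standard LR driver loop: shift pushes the next state, reduce by A → β pops |β| states and pushes goto[top, A]. A sketch in Python with the action/goto entries above transcribed into dictionaries:

```python
# Productions 0-6 as (head, body length); r5 is F -> ( E ), length 3.
PRODS = [("E'", 1), ("E", 3), ("E", 1), ("T", 3), ("T", 1), ("F", 3), ("F", 1)]
ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def parse(tokens):
    stack = [0]                  # state stack; start in state 0
    tokens = tokens + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                     # empty entry = error
        if act == "acc":
            return True
        if act[0] == "s":                    # shift: push target state
            stack.append(int(act[1:]))
            i += 1
        else:                                # reduce by production n
            head, size = PRODS[int(act[1:])]
            del stack[len(stack) - size:]    # pop |body| states
            nxt = GOTO.get((stack[-1], head))
            if nxt is None:
                return False
            stack.append(nxt)

print(parse(["id", "*", "id", "+", "id"]))   # True
print(parse(["id", "+", ")"]))               # False
```

Tracing "id * id + id" through this loop reproduces the shift/reduce sequence the table dictates: r6, r4, s7, s5, r6, r3, r2, s6, s5, r6, r4, r1, accept.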

Exercise

Construct the SLR parse table for the following augmented grammar and
show how the parser accepts the input string ( ) ( ):
  1: S' → S
  2: S → ( S ) S
  3: S → ε

Answer

SLR(1) Grammar

An LR parser using SLR(1) parsing tables for a grammar G is called the
SLR(1) parser for G.

If a grammar G has an SLR(1) parsing table, it is called an SLR(1)
grammar (or SLR grammar for short).

Every SLR grammar is unambiguous, but not every unambiguous grammar is
an SLR grammar.

If the SLR parsing table of a grammar G has a conflict, we say that the
grammar is not an SLR grammar. There are two kinds of conflicts:
shift/reduce conflicts and reduce/reduce conflicts.

Error Recovery in LR Parsing

An LR parser detects an error when it consults the parsing action table
and finds an error entry. All empty entries in the action table are
error entries.
  e.g. missing operand, unbalanced right parenthesis

Errors are never detected by consulting the goto table.

Some error recovery strategies are:
  Discard zero or more input symbols until a symbol a is found on which
  the parser can continue (panic mode).
  Mark each empty entry in the action table with a specific error
  routine (phrase-level recovery).

Assignment 3

Given the following grammar, where a, b, and c are terminals and
S, X, Y are non-terminals:
  S → XaYb | Y | ε
  X → aY | c
  Y → bX | a

1. Build the LL(1) parsing table for the grammar (show all the
   necessary steps).
   - What can you say about the ambiguity of the grammar?
   - Show how the parser accepts/rejects the input cabbab.
2. Build the Simple LR parsing table for the grammar (show all the
   necessary steps).
   - Are there any conflicts during shift-reduce parsing?
   - Show how the parser accepts/rejects the string cabbab.
