
Syntax Analyzer (Parser)

Chapter 3

By Esubalew Alemneh

Contents (Session-1)
Introduction
Context-free grammar
Derivation
Parse Tree
Ambiguity
Resolving Ambiguity

Immediate & Indirect Left Recursion


Eliminating Immediate & Indirect Left Recursion

Left Factoring
Non-Context Free Language Constructs

Introduction

Abstract representations of the input program could be:
  abstract-syntax tree / parse tree + symbol table
  intermediate code
  object code
Syntax analysis is done by the parser, which:
  Produces a parse tree from which intermediate code can be generated
  Detects whether the program is written following the grammar rules
  Reports syntax errors, attempts error correction and recovery
  Collects information into symbol tables

Introduction

(Figure: the parser sits between the lexical analyzer and the rest of
the front end. On each "request for token" the lexical analyzer returns
a token from the source program; the parser builds the parse tree and
passes it to the rest of the front end, which produces intermediate
code. Errors are reported along the way, and both phases consult the
symbol table.)

Parsers can be Top-down or Bottom-up.

Context Free Grammars (CFG)

A CFG is used to specify the structure of legal programs.
The design of the grammar is an initial phase of the design of a
programming language.
Formally, a CFG G = (Vt, Vn, S, P), where:
  Vt is the set of terminal symbols in the grammar
  (i.e., the set of tokens returned by the scanner)
  Vn, the non-terminals, are variables that denote sets of (sub)strings
  occurring in the language. These impose a structure on the grammar.
  S is the start/goal symbol, a distinguished non-terminal in Vn
  denoting the entire set of strings in L(G).
  P is a finite set of productions specifying how terminals and
  non-terminals can be combined to form strings in the language.
  Each production must have a single non-terminal on its left-hand side.
The set V = Vt ∪ Vn is called the vocabulary of G.

Context Free Grammars (CFG)

Example (G1):
  E → E+E | E-E | E*E | E/E | -E
  E → (E)
  E → id
Where
  Vt = {+, -, *, /, (, ), id},  Vn = {E}
  S = E
  Productions are shown above
Sometimes → is replaced by ::=
A CFG is more expressive than an RE: every language that can be
described by regular expressions can also be described by a CFG.
  L = {a^n b^n | n >= 1} is an example of a language that can be
  expressed by a CFG but not by an RE.
A context-free grammar is sufficient to describe most programming
languages.

Context Free Grammars (CFG)

BNF (Backus Normal Form or Backus-Naur Form) is a notation technique
for context-free grammars, often used to describe the syntax of
languages used in computing.
It has many extensions and variants:
  Extended Backus-Naur Form (EBNF)
  Augmented Backus-Naur Form (ABNF)
A BNF specification is a set of derivation rules, written as
  <symbol> ::= expression
BNF for valid arithmetic expressions:
  <expr> ::= <expr> <op> <expr>
  <expr> ::= ( <expr> )
  <expr> ::= - <expr>
  <expr> ::= id
  <op> ::= + | - | * | /

Derivation

A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation.
If we have a production E → E+E, then we can replace E by E+E.
In general, a derivation step is αAβ ⇒ αγβ if there is a production
rule A → γ in the grammar, where α and β are arbitrary strings of
terminal and non-terminal symbols.
A derivation of a string must start from a production with the start
symbol on the left: S ⇒ ... ⇒ ω
  αAβ is a sentential form (terminals and non-terminals mixed)
  ω is a sentence if it contains only terminal symbols

Derivation

Derive the string -(id+id) from G1:
  E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)   (LMD)
OR
  E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)   (RMD)
At each derivation step, we can choose any of the non-terminals in the
sentential form of G for the replacement.
If we always choose the left-most non-terminal in each derivation
step, the derivation is called a left-most derivation (LMD).
If we always choose the right-most non-terminal in each derivation
step, the derivation is called a right-most derivation (RMD).
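The leftmost derivation above can be replayed mechanically: at each step the leftmost E is replaced by the body of the chosen production. A minimal sketch (the helper name apply_leftmost is an assumption, not from the slides):

```python
# Sketch: replay the leftmost derivation of -(id+id) in grammar G1.

def apply_leftmost(sentential: str, body: str) -> str:
    """Replace the leftmost occurrence of the non-terminal E with `body`."""
    i = sentential.index("E")
    return sentential[:i] + body + sentential[i + 1:]

form = "E"
for body in ["-E", "(E)", "E+E", "id", "id"]:
    form = apply_leftmost(form, body)
    print(form)
# last line printed: -(id+id), a sentence (terminals only)
```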

Parse Tree

A parse tree can be seen as a graphical representation of a derivation.
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.

(Figure: the parse tree grown step by step for
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id);
each step hangs the body of the chosen production under the expanded
E node, ending with the tree whose leaves spell -(id+id).)

Ambiguity

An ambiguous grammar is one that produces more than one LMD or more
than one RMD for the same sentence. For id+id*id, grammar G1 gives two
left-most derivations:

  E ⇒ E+E              E ⇒ E*E
    ⇒ id+E               ⇒ E+E*E
    ⇒ id+E*E             ⇒ id+E*E
    ⇒ id+id*E            ⇒ id+id*E
    ⇒ id+id*id           ⇒ id+id*id

(Figure: the two corresponding parse trees; in the first, * is nested
under +, in the second, + is nested under *.)

Ambiguity

For most parsers, the grammar must be unambiguous.
If a grammar is unambiguous, then there is a unique selection of the
parse tree for a sentence.
We should eliminate the ambiguity in the grammar during the design
phase of the compiler.
An unambiguous grammar should be written to eliminate the ambiguity:
we prefer one of the parse trees of a sentence (generated by the
ambiguous grammar) and disambiguate the grammar to restrict it to this
choice.

Ambiguity - Dangling Else

stmt → if expr then stmt |
       if expr then stmt else stmt |
       otherstmts

The sentence  if E1 then if E2 then S1 else S2  has two parse trees:

(Figure: in the first tree the else attaches to the outer if, i.e.
if E1 then (if E2 then S1) else S2; in the second tree the else
attaches to the inner if, i.e. if E1 then (if E2 then S1 else S2).)

We prefer the second parse tree (else matches with the closest if).
So, we have to disambiguate our grammar.

Resolving Ambiguity

Option 1: add a meta-rule, e.g. precedence and associativity rules
  For example: else associates with the closest previous if
  works, keeps the original grammar intact
  ad hoc and informal

Option 2: rewrite the grammar to resolve the ambiguity explicitly
  stmt → matchedstmt | unmatchedstmt
  matchedstmt → if expr then matchedstmt else matchedstmt |
                otherstmts
  unmatchedstmt → if expr then stmt |
                  if expr then matchedstmt else unmatchedstmt
  formal, no additional rules beyond syntax
  sometimes obscures the original grammar

Resolving Ambiguity

Option 3: redesign the language to remove the ambiguity
  Stmt ::= ... |
           if Expr then Stmt end |
           if Expr then Stmt else Stmt end
  formal, clear, elegant
  allows a sequence of Stmts in the then and else branches, no { } needed
  extra end required for every if

Left Recursion

A grammar is left recursive if it has a non-terminal A such that there
is a derivation
  A ⇒+ Aα
for some string α.
Top-down parsing techniques cannot handle left-recursive grammars.
So, we have to convert a left-recursive grammar into an equivalent
grammar which is not left-recursive.
Two types of left-recursion:
  immediate left-recursion - appears in a single step of the
  derivation (A ⇒ Aα)
  indirect left-recursion - appears in more than one step of the
  derivation

Eliminating Immediate Left Recursion

A → Aα | β    where β does not start with A

eliminate immediate left recursion:

  A  → βA'
  A' → αA' | ε     (an equivalent grammar)

In general,
A → Aα1 | ... | Aαm | β1 | ... | βn    where β1 ... βn do not start with A

eliminate immediate left recursion:

  A  → β1A' | ... | βnA'
  A' → α1A' | ... | αmA' | ε     (an equivalent grammar)

Eliminating Left Recursion

Remove left recursion from the grammar below
  E → E+T | T
  T → T*F | F
  F → id | (E)
Answer
  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → id | (E)
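The transformation used in the answer can be sketched as a small routine. The grammar representation (a list of bodies per non-terminal, with an empty list standing for ε) and the function name are assumptions for illustration, not part of the slides:

```python
# Sketch: eliminate immediate left recursion for one non-terminal.
# A body is a list of symbols; the empty list [] stands for epsilon.

def eliminate_immediate_left_recursion(nt, bodies):
    """Split A -> A alpha | beta into A -> beta A', A' -> alpha A' | eps."""
    rec = [b[1:] for b in bodies if b and b[0] == nt]   # the alpha parts
    non = [b for b in bodies if not b or b[0] != nt]    # the beta parts
    if not rec:
        return {nt: bodies}                             # nothing to do
    new = nt + "'"
    return {
        nt:  [b + [new] for b in non],                  # A  -> beta A'
        new: [a + [new] for a in rec] + [[]],           # A' -> alpha A' | eps
    }

rules = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(rules)   # E -> T E'   and   E' -> + T E' | eps
```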

Indirect Left-Recursion

A grammar may not be immediately left-recursive, but it still can be
left-recursive.
By just eliminating the immediate left-recursion, we may not get a
grammar which is not left-recursive.
  S → Aa | b
  A → Sc | d
This grammar is not immediately left-recursive, but it is still
left-recursive:
  S ⇒ Aa ⇒ Sca
or
  A ⇒ Sc ⇒ Aac
causes a left-recursion.
So, we have to eliminate all left-recursions from our grammar.

Eliminating Indirect Left-Recursion

Arrange the non-terminals in some order: A1 ... An.
We will remove indirect left recursion by constructing an equivalent
grammar G' such that if Ai → Ajγ is any production of G', then i < j.
For each non-terminal Ai in turn, do:
  For each j with 1 <= j < i such that there is a production rule of
  the form Ai → Ajγ, where the Aj productions are Aj → β1 | ... | βn,
  do:
    Replace the production rule Ai → Ajγ with the rules
    Ai → β1γ | ... | βnγ
  Eliminate any immediate left recursion among the Ai productions.

Eliminating Indirect Left-Recursion

Example 1
  S → Aa | b
  A → Ac | Sd | f
- Order of non-terminals: S = A1, A = A2
  A1 → A2a | b
  A2 → A2c | A1d | f
- The only production with j < i is A2 → A1d
- For A: replace it with A2 → A2ad | bd, giving
  A2 → A2c | A2ad | bd | f
- Eliminate the immediate left-recursion in A:
  A2  → bdA2' | fA2'
  A2' → cA2' | adA2' | ε
So, the resulting equivalent grammar which is not left-recursive is:
  S  → Aa | b
  A  → bdA' | fA'
  A' → cA' | adA' | ε

Eliminating Indirect Left-Recursion

Example 2
  A1 → A2 A3
  A2 → A3 A1 | b
  A3 → A1 A1 | a
Replace A3 → A1 A1 by A3 → A2 A3 A1,
and then replace this by
  A3 → A3 A1 A3 A1  and  A3 → b A3 A1
Eliminating the direct left recursion in the above gives:
  A3 → aK | b A3 A1 K
  K  → A1 A3 A1 K | ε
The resulting grammar is then:
  A1 → A2 A3
  A2 → A3 A1 | b
  A3 → aK | b A3 A1 K
  K  → A1 A3 A1 K | ε

Left Factoring

A predictive parser (a top-down parser without backtracking) insists
that the grammar must be left-factored.
  stmt → if expr then stmt else stmt |
         if expr then stmt
When we see if, we cannot know which production rule to choose to
re-write stmt in the derivation.
In general,
  A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have
one) are different.
When processing α we cannot know whether to expand
  A to αβ1   or   A to αβ2

Left Factoring

But if we re-write the grammar as follows:
  A  → αA'
  A' → β1 | β2
then we can immediately expand A to αA'.

Left Factoring Algorithm

For each non-terminal A with two or more alternatives (production
rules) with a common non-empty prefix, let us say
  A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
  A  → αA' | γ1 | ... | γm
  A' → β1 | ... | βn

Left Factoring

Example 1
  A → abB | aB | cdg | cdeB | cdfB

  A  → aA' | cdg | cdeB | cdfB
  A' → bB | B

  A   → aA' | cdA''
  A'  → bB | B
  A'' → g | eB | fB

Example 2
  A → ad | a | ab | abc | b

  A  → aA' | b
  A' → d | ε | b | bc

  A   → aA' | b
  A'  → d | ε | bA''
  A'' → ε | c
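One factoring step from Example 1 can be sketched as follows. The representation (bodies as symbol lists) and the helper name left_factor are assumptions for illustration:

```python
# Sketch: factor a given prefix out of the alternatives that start with it.

def left_factor(nt, bodies, prefix):
    """Rewrite A -> prefix b1 | ... | prefix bn | rest
    as A -> prefix A' | rest,  A' -> b1 | ... | bn."""
    n = len(prefix)
    with_p = [b[n:] for b in bodies if b[:n] == prefix]   # the beta parts
    without = [b for b in bodies if b[:n] != prefix]      # the gamma parts
    new = nt + "'"
    return {nt: [prefix + [new]] + without, new: with_p}

# Example 1 from the slide, factoring the common prefix "a":
rules = left_factor("A",
                    [["a", "b", "B"], ["a", "B"], ["c", "d", "g"],
                     ["c", "d", "e", "B"], ["c", "d", "f", "B"]],
                    ["a"])
print(rules["A"])    # a A' plus the unfactored cd-alternatives
print(rules["A'"])   # bB | B
```

A second call with prefix ["c", "d"] would then produce the A'' rules of the slide.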

Non-Context Free Language Constructs

There are some language constructs in programming languages which are
not context-free. This means that we cannot write a context-free
grammar for these constructs.
L1 = {ωcω | ω is in (a|b)*} is not context-free
  declaring an identifier and checking whether it is declared or not
  later. We cannot do this with a context-free language. We need a
  semantic analyzer (which is not context-free).
L2 = {a^n b^m c^n d^m | n >= 1 and m >= 1} is not context-free
  declaring two functions (one with n parameters, the other one with
  m parameters), and then calling them with the corresponding numbers
  of actual parameters.

Contents(Session-2)
Top Down Parsing
Recursive-Descent Parsing
Predictive Parser
Recursive Predictive Parsing
Non-Recursive Predictive Parsing

LL(1) Parser - Parser Actions


Constructing LL(1) - Parsing Tables
Computing FIRST and FOLLOW functions
LL(1) Grammars
Properties of LL(1) Grammars

Top Down Parsing

Top-down parsing involves constructing a parse tree for the input
string, starting from the root.
Basically, top-down parsing can be viewed as finding a leftmost
derivation for an input string.
How it works: start with a tree of one node labeled with the start
symbol, and repeat the following steps until the fringe of the parse
tree matches the input string:
1. At a node labeled A, select a production with A on its LHS and,
   for each symbol on its RHS, construct the appropriate child.
2. When a terminal is added to the fringe that doesn't match the
   input string, backtrack.
3. Find the next node to be expanded.
! Minimize the number of backtracks as much as possible.

Top Down Parsing

Two types of top-down parsing:
Recursive-Descent Parsing
  Backtracking is needed (if a choice of a production rule does not
  work, we backtrack to try other alternatives).
  It is a general parsing technique, but not widely used because it
  is not efficient.
Predictive Parsing
  no backtracking and hence efficient
  needs a special form of grammars (LL(1) grammars)
  Two types:
    Recursive Predictive Parsing is a special form of Recursive
    Descent Parsing without backtracking.
    Non-Recursive (Table Driven) Predictive Parsing is also known as
    LL(1) parsing.

Recursive-Descent Parsing

It tries to find the left-most derivation.
Backtracking is needed.
Example
  S → aBc
  B → bc | b
  input: abc
A left-recursive grammar can cause a recursive-descent parser, even
one with backtracking, to go into an infinite loop.
That is, when we try to expand a non-terminal B, we may eventually
find ourselves again trying to expand B without having consumed any
input.
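A minimal recursive-descent parser with backtracking for the example grammar S → aBc, B → bc | b might look like the sketch below. The generator-based backtracking style is an implementation choice, not prescribed by the slides:

```python
# Sketch: recursive descent with backtracking for S -> a B c, B -> bc | b.

def parse_B(s, i):
    """Yield every position B can consume up to, first alternative first."""
    if s[i:i + 2] == "bc":   # B -> b c
        yield i + 2
    if s[i:i + 1] == "b":    # B -> b
        yield i + 1

def parse_S(s):
    """S -> a B c; succeed only if the whole input is consumed."""
    if s[:1] != "a":
        return False
    for j in parse_B(s, 1):                  # backtracking point
        if s[j:j + 1] == "c" and j + 1 == len(s):
            return True
    return False

print(parse_S("abc"))    # True: B -> bc fails first, then B -> b succeeds
print(parse_S("abcc"))   # True: B -> bc succeeds directly
```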

Predictive Parser

Starting from a grammar, we eliminate left recursion and then left
factor it, to obtain a grammar suitable for predictive parsing (an
LL(1) grammar), but there is no 100% guarantee.
When re-writing a non-terminal in a derivation step, a predictive
parser can uniquely choose a production rule by just looking at the
current symbol in the input string.
  stmt → if ...... |
         while ...... |
         begin ...... |
         for ......
Note: when we are trying to re-write the non-terminal stmt, we can
uniquely choose the production rule by just looking at the current
token.
However, even though we eliminate the left recursion in the grammar
and left factor it, it may not be suitable for predictive parsing
(i.e., it may still not be an LL(1) grammar).

Recursive Predictive Parsing

Predictive parsing can be recursive or non-recursive.
In recursive predictive parsing, each non-terminal corresponds to a
procedure/function.
Example
  A → aBb | bAB

proc A {
  case of the current token {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with b, and move to the next token;
    b: - match the current token with b, and move to the next token;
       - call A;
       - call B;
  }
}

Recursive Predictive Parsing

When to apply ε-productions?
  A → aA | bB | ε
If all other productions fail, we should apply an ε-production. For
example, if the current token is not a or b, we may apply the
ε-production.
Most correct choice: we should apply an ε-production for a
non-terminal A when the current token is in the follow set of A
(the terminals that can follow A in the sentential forms).

Non-Recursive Predictive Parsing

A non-recursive predictive parser can be built by maintaining a stack
explicitly, rather than implicitly via recursive calls.
Non-recursive predictive parsing is a table-driven top-down parsing
method.

(Figure: model of a table-driven predictive parser: input buffer,
stack, parsing table, and output.)

Non-Recursive Predictive Parsing

Input buffer
  our string to be parsed. We will assume that its end is marked with
  a special symbol $.
Output
  a production rule representing a step of the derivation sequence
  (left-most derivation) of the string in the input buffer.
Stack
  contains the grammar symbols
  at the bottom of the stack, there is a special end-marker symbol $.
  initially the stack contains only the symbol $ and the start
  symbol S.
  when the stack is emptied (i.e. only $ is left in the stack),
  parsing is completed.
Parsing table
  a two-dimensional array M[A,a], indexed by a non-terminal A and a
  terminal (or $) a.

LL(1) Parser - Parser Actions

The symbol at the top of the stack (say X) and the current symbol in
the input string (say a) determine the parser action.
There are four possible parser actions:
1. If X and a are both $, the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $), the
   parser pops X from the stack and moves to the next symbol in the
   input buffer.
3. If X is a non-terminal, the parser looks at the parsing table
   entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it
   pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack.
   The parser also outputs the production rule X → Y1Y2...Yk.
4. Otherwise (X is a terminal that does not match a, or M[X,a] is
   empty), the parser reports an error.

LL(1) Parser - Example 1

Grammar:
  S → aBa
  B → bB | ε

LL(1) parsing table (we will see how to construct parsing tables very
soon):

          a          b
  S    S → aBa
  B    B → ε      B → bB

Parse of the input abba$:

  stack    input    output
  $S       abba$    S → aBa
  $aBa     abba$
  $aB      bba$     B → bB
  $aBb     bba$
  $aB      ba$      B → bB
  $aBb     ba$
  $aB      a$       B → ε
  $a       a$
  $        $        accept, successful completion
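The trace above can be reproduced by a short table-driven loop. The dict encoding of the table and the function name are assumptions for illustration:

```python
# Sketch: table-driven LL(1) parsing for S -> aBa, B -> bB | eps.
# An epsilon production is encoded as the empty body [].

TABLE = {("S", "a"): ["a", "B", "a"],
         ("B", "a"): [],                 # B -> eps
         ("B", "b"): ["b", "B"]}

def ll1_parse(tokens):
    stack, tokens = ["$", "S"], list(tokens) + ["$"]
    while stack:
        top, look = stack.pop(), tokens[0]
        if top == look == "$":
            return True                               # accept
        if top == look:
            tokens.pop(0)                             # match a terminal
        elif (top, look) in TABLE:
            stack.extend(reversed(TABLE[top, look]))  # expand non-terminal
        else:
            return False                              # error entry
    return False

print(ll1_parse("abba"))   # True  (a b* a is the language)
print(ll1_parse("ab"))     # False
```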

LL(1) Parser - Example 2

Grammar (E is the start symbol):
  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → (E) | id

LL(1) parsing table:

         id          +            *           (           )          $
  E    E → TE'                             E → TE'
  E'              E' → +TE'                           E' → ε     E' → ε
  T    T → FT'                             T → FT'
  T'              T' → ε      T' → *FT'               T' → ε     T' → ε
  F    F → id                              F → (E)

LL(1) Parser - Example 2

Parse of the input id+id$:

  stack      input     output
  $E         id+id$    E → TE'
  $E'T       id+id$    T → FT'
  $E'T'F     id+id$    F → id
  $E'T'id    id+id$
  $E'T'      +id$      T' → ε
  $E'        +id$      E' → +TE'
  $E'T+      +id$
  $E'T       id$       T → FT'
  $E'T'F     id$       F → id
  $E'T'id    id$
  $E'T'      $         T' → ε
  $E'        $         E' → ε
  $          $         accept

LL(1) Parser - Example 3

(Figure: parse of the input id+id*id, which is formed from the
grammar of Example 2.)

Constructing LL(1) Parsing Tables

Two functions are used in the construction of LL(1) parsing tables:
  FIRST
  FOLLOW
FIRST(α) is the set of the terminal symbols which occur as first
symbols in strings derived from α, where α is any string of grammar
symbols.
  If α derives ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of the terminals which occur immediately after
the non-terminal A in strings derived from the start symbol:
  a terminal a is in FOLLOW(A) if S ⇒* αAaβ

Compute FIRST for a String X

1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X is ε, then FIRST(X) = {ε}.
3. If X is a non-terminal symbol and X → ε is a production rule,
   then add ε to FIRST(X).
4. If X is a non-terminal symbol and X → Y1Y2...Yn is a production
   rule, then:
   if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for
   j = 1, ..., i-1, then a is in FIRST(X);
   if ε is in all FIRST(Yj) for j = 1, ..., n, then ε is in FIRST(X).

Compute FIRST for a String X

Example
  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → (E) | id

From Rule 1:  FIRST(id) = {id}
From Rule 2:  FIRST(ε) = {ε}
From Rules 3 and 4:
  FIRST(F)  = {(, id}
  FIRST(T') = {*, ε}
  FIRST(E') = {+, ε}
  FIRST(T)  = {(, id}
  FIRST(E)  = {(, id}
Others:
  FIRST(TE')  = {(, id}
  FIRST(+TE') = {+}
  FIRST(FT')  = {(, id}
  FIRST(*FT') = {*}
  FIRST((E))  = {(}
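Rules 1-4 can be iterated to a fixed point. A sketch for the example grammar follows, with the string "eps" standing for ε (the encoding is an assumption, not from the slides):

```python
# Sketch: compute FIRST for every non-terminal by fixed-point iteration.

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] encodes an epsilon body
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(body, first):
    """FIRST of a string of symbols; 'eps' marks the empty string."""
    out = set()
    for sym in body:
        if sym not in GRAMMAR:            # terminal (rule 1)
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}       # rule 4
        if "eps" not in first[sym]:
            return out
    out.add("eps")                        # every symbol can vanish
    return out

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                f = first_of(body, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
print(sorted(FIRST["E"]))    # ['(', 'id']
print(sorted(FIRST["T'"]))   # ['*', 'eps']
```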

Compute FOLLOW (for non-terminals)

1. $ is in FOLLOW(S), if S is the start symbol.
2. Look at each occurrence of a non-terminal on the RHS of a
   production which is followed by something:
   if A → αBβ is a production rule, then everything in FIRST(β)
   except ε is in FOLLOW(B).
3. Look at each B on the RHS that is not followed by anything:
   if A → αB is a production rule, or A → αBβ is a production rule
   and ε is in FIRST(β), then everything in FOLLOW(A) is in
   FOLLOW(B).

Compute FOLLOW (for non-terminals)

Example
  i.   E  → TE'
  ii.  E' → +TE' | ε
  iii. T  → FT'
  iv.  T' → *FT' | ε
  v.   F  → (E) | id

FOLLOW(E) = {$, )}, because
  from rule 1, FOLLOW(E) contains $;
  from rule 2, FOLLOW(E) contains FIRST()) = {)}, from the production
  F → (E).
FOLLOW(E') = {$, )}, by rule 3 (everything in FOLLOW(E) is in
FOLLOW(E')).
FOLLOW(T) = {+, ), $}:
  from rule 2, + is in FOLLOW(T);
  from rule 3, everything in FOLLOW(E) is in FOLLOW(T), since
  FIRST(E') contains ε.
FOLLOW(T') = {+, ), $}, by the same reasoning as FOLLOW(E').
FOLLOW(F) = {+, *, ), $}, by the same reasoning as FOLLOW(T).
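FOLLOW can be computed by the same fixed-point style, taking the FIRST sets computed above as given literals (a simplification for the sketch):

```python
# Sketch: compute FOLLOW by iterating rules 1-3 to a fixed point.

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}

def first_of(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})        # a terminal is its own FIRST
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    return out | {"eps"}                 # the whole string can vanish

def follow_sets(start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in GRAMMAR.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in GRAMMAR:         # only non-terminals get FOLLOW
                        continue
                    tail = first_of(body[i + 1:])
                    add = tail - {"eps"}         # rule 2
                    if "eps" in tail:
                        add |= follow[a]         # rule 3
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

FOLLOW = follow_sets()
print(sorted(FOLLOW["E"]))   # ['$', ')']
print(sorted(FOLLOW["F"]))   # ['$', ')', '*', '+']
```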

Constructing LL(1) Parsing Tables - Algorithm

For each production rule A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A,a].
2. If ε is in FIRST(α), then for each terminal a in FOLLOW(A), add
   A → α to M[A,a].
3. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].
All other entries of the parsing table are error entries.

Constructing LL(1) Parsing Table - Example

E → TE'      FIRST(TE') = {(, id}
             E → TE' into M[E,(] and M[E,id]

E' → +TE'    FIRST(+TE') = {+}
             E' → +TE' into M[E',+]

E' → ε       FIRST(ε) = {ε}: no entry by rule 1,
             but since ε is in FIRST(ε) and FOLLOW(E') = {$, )},
             E' → ε into M[E',$] and M[E',)]

T → FT'      FIRST(FT') = {(, id}
             T → FT' into M[T,(] and M[T,id]

T' → *FT'    FIRST(*FT') = {*}
             T' → *FT' into M[T',*]

T' → ε       FIRST(ε) = {ε}: no entry by rule 1,
             but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +},
             T' → ε into M[T',$], M[T',)] and M[T',+]

F → (E)      FIRST((E)) = {(}
             F → (E) into M[F,(]

F → id       FIRST(id) = {id}
             F → id into M[F,id]
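Given FIRST and FOLLOW, the table-filling rules reduce to a few lines. The FIRST/FOLLOW literals below are the sets computed on the earlier slides; the encoding is an assumption:

```python
# Sketch: fill M[A,a] from FIRST/FOLLOW; epsilon bodies go under FOLLOW(A).

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"+", "*", "$", ")"}}

def first_of(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})         # a terminal is its own FIRST
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    return out | {"eps"}

def build_table():
    table = {}
    for a, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body)
            for t in f - {"eps"}:
                table[a, t] = body        # rule 1
            if "eps" in f:
                for t in FOLLOW[a]:       # rules 2 and 3 ($ is in FOLLOW)
                    table[a, t] = body
    return table

M = build_table()
print(M["E", "id"])    # ['T', "E'"]
print(M["E'", "$"])    # []  (the epsilon production)
```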

LL(1) Grammars

A grammar whose parsing table has no multiply-defined entries is said
to be an LL(1) grammar.
  The first L means the input is scanned from left to right, the
  second L refers to a leftmost derivation, and 1 means one input
  symbol is used as a lookahead to determine the parser action.
A grammar G is LL(1) if and only if the following conditions hold for
every two distinct production rules A → α and A → β:
1. Both α and β cannot derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a
   terminal in FOLLOW(A).
From 1 and 2, we can say that FIRST(α) ∩ FIRST(β) = ∅.
From 3: if ε is in FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅, and the
like.

A Grammar which is not LL(1)

The parsing table of a grammar may contain more than one production
rule in an entry. In this case, we say that it is not an LL(1)
grammar.
  S → iCtSE | a
  E → eS | ε
  C → b

FIRST(iCtSE) = {i}        FOLLOW(S) = {$, e}
FIRST(a) = {a}            FOLLOW(E) = {$, e}
FIRST(eS) = {e}           FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}

          a         b         e          i           t    $
  S     S → a                         S → iCtSE
  E                       E → eS
                          E → ε                          E → ε
  C             C → b

There are two production rules for M[E,e].
Problem: ambiguity.

A Grammar which is not LL(1)

What do we have to do if the resulting parsing table contains
multiply-defined entries?
  Eliminate left recursion in the grammar, if it is not eliminated.
    A left-recursive grammar cannot be LL(1): for A → Aα | β, any
    terminal that appears in FIRST(β) also appears in FIRST(Aα),
    because Aα ⇒ βα.
    If β is ε, any terminal that appears in FIRST(α) also appears in
    FIRST(Aα) and FOLLOW(A).
  Left factor the grammar, if it is not left factored.
    A grammar that is not left factored cannot be an LL(1) grammar:
    for A → αβ1 | αβ2, any terminal that appears in FIRST(αβ1) also
    appears in FIRST(αβ2).
  If its (the new grammar's) parsing table still contains multiply
  defined entries, then the grammar is ambiguous or it is inherently
  not an LL(1) grammar.
An ambiguous grammar cannot be an LL(1) grammar.

Error Recovery in Predictive Parsing

An error may occur in predictive parsing (LL(1) parsing):
  if the terminal symbol on the top of the stack does not match the
  current input symbol, or
  if the top of the stack is a non-terminal A, the current input
  symbol is a, and the parsing table entry M[A,a] is empty.
What should the parser do in an error case?
  The parser should be able to give an error message (as meaningful
  an error message as possible).
  It should recover from that error case and be able to continue
  parsing the rest of the input.

Contents (Session-3)
Bottom Up Parsing
Handle Pruning
Implementation of a Shift-Reduce Parser
LR Parsers
LR Parsing Algorithm
Actions of A LR-Parser
Constructing SLR Parsing Tables
SLR(1) Grammar
Error Recovery in LR Parsing

Bottom-Up Parsing

A bottom-up parser creates the parse tree of the given input starting
from the leaves towards the root.
A bottom-up parser tries to find the RMD of the given input in
reverse order.
Bottom-up parsing is also known as shift-reduce parsing because its
two main actions are shift and reduce:
  At each shift action, the current symbol in the input string is
  pushed onto a stack.
  At each reduce action, the symbols at the top of the stack
  (matching the right side of a production) are replaced by the
  non-terminal on the left side of that production.
  Accept: successful completion of parsing.
  Error: the parser discovers a syntax error and calls an error
  recovery routine.

Bottom-Up Parsing

A shift-reduce parser tries to reduce the given input string into the
start symbol:

  a string  is reduced to  the start symbol

At each reduction step, a substring of the input matching the right
side of a production rule is replaced by the non-terminal on the left
side of that production rule.
If the substring is chosen correctly, the rightmost derivation of
that string is created in reverse order:

  Rightmost derivation:       S ⇒rm ... ⇒rm ω
  Shift-reduce parser finds:  ω ⇐ ... ⇐ S

Shift-Reduce Parsing - Example

  S → aABb
  A → aA | a
  B → bB | b

input string: aaabb

  aaabb ⇐ aaAbb ⇐ aAbb ⇐ aABb ⇐ S     (reduction)

Right-sentential forms:  S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb

How do we know which substring is to be replaced at each reduction
step?

Handle
Informally, a handle of a string is a substring that matches the
right side of a production rule.
  But not every substring that matches the right side of a production
  rule is a handle.
A handle of a right-sentential form γ (= αβω) is
  a production rule A → β and a position in γ where the string β may
  be found and replaced by A to produce the previous right-sentential
  form in a rightmost derivation of γ:
    S ⇒*rm αAω ⇒rm αβω
If the grammar is unambiguous, then every right-sentential form of
the grammar has exactly one handle.

Handle Pruning

A rightmost derivation in reverse can be obtained by handle pruning:

  S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω   (input string)

Start from γn: find a handle An → βn in γn, and replace βn by An to
get γn-1.
Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to
get γn-2.
Repeat this until we reach S.

A Shift-Reduce Parser - Example

  E → E+T | T
  T → T*F | F
  F → (E) | id

Rightmost derivation of id+id*id:
  E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id
    ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

  Right-sentential form    Reducing production
  id+id*id                 F → id
  F+id*id                  T → F
  T+id*id                  E → T
  E+id*id                  F → id
  E+F*id                   T → F
  E+T*id                   F → id
  E+T*F                    T → T*F
  E+T                      E → E+T
  E

(In each right-sentential form, the handle is the substring that
matches the right side of the reducing production.)

A Stack Implementation of a Shift-Reduce Parser

The initial stack contains only the end-marker $, and the end of the
input string is marked by the end-marker $.

  Stack      Input       Action
  $          id+id*id$   shift
  $id        +id*id$     reduce by F → id
  $F         +id*id$     reduce by T → F
  $T         +id*id$     reduce by E → T
  $E         +id*id$     shift
  $E+        id*id$      shift
  $E+id      *id$        reduce by F → id
  $E+F       *id$        reduce by T → F
  $E+T       *id$        shift
  $E+T*      id$         shift
  $E+T*id    $           reduce by F → id
  $E+T*F     $           reduce by T → T*F
  $E+T       $           reduce by E → E+T
  $E         $           accept

(Figure: the parse tree built bottom-up by this sequence of
reductions.)

Shift-Reduce Parsers

The most prevalent type of bottom-up parser today is based on a
concept called LR(k) parsing:
  L : the input is scanned left to right
  R : a rightmost derivation (constructed in reverse)
  k : lookahead symbols (when k is omitted, it is 1)
LR parsers cover a wide range of grammars:
  Simple LR parser (SLR)
  Look-Ahead LR parser (LALR)
  most general LR parser (LR)
Their relative power nests as SLR ⊂ LALR ⊂ LR ⊂ CFG.
SLR, LR and LALR parsers work the same way; only their parsing tables
are different.

LR Parsers

LR parsing is attractive because:
  LR parsers can be constructed to recognize virtually all
  programming-language constructs for which context-free grammars can
  be written.
  LR parsing is the most general non-backtracking shift-reduce
  parsing method, yet it is still efficient.
  The class of grammars that can be parsed using LR methods is a
  proper superset of the class of grammars that can be parsed with
  predictive parsers:
    LL(1)-Grammars ⊂ LR(1)-Grammars
  An LR parser can detect a syntactic error as soon as it is possible
  to do so on a left-to-right scan of the input.
A drawback of the LR method is that it is too much work to construct
an LR parser by hand.
  Use tools, e.g. yacc.

LR Parsing Algorithm

(Figure: model of an LR parser. The input buffer holds a1 ... ai ... an $.
The stack holds states S0 S1 ... Sm-1 Sm, each state Si standing for
the grammar symbol Xi beneath it. The parser is driven by an Action
table, whose rows are states, whose columns are the terminals and $,
and whose entries are one of four different actions, and by a Goto
table, whose rows are states, whose columns are the non-terminals,
and whose entries are state numbers. The chosen reductions form the
output.)

A Configuration of the LR Parsing Algorithm

A configuration of an LR parser is:

  ( S0 S1 ... Sm , ai ai+1 ... an $ )
       Stack          Rest of Input

Sm and ai decide the parser action by consulting the parsing action
table. (Initially the stack contains just S0.)
A configuration of the LR parser represents the right-sentential
form:

  X1 ... Xm ai ai+1 ... an $

where Xi is the grammar symbol represented by state Si.

Actions of an LR Parser

1. If ACTION[Sm, ai] = shift s, the parser executes a shift move: it
   shifts the next state s onto the stack, entering the configuration
     ( S0 S1 ... Sm s , ai+1 ... an $ )
2. If ACTION[Sm, ai] = reduce A → β, the parser executes a reduce
   move, changing the configuration from
     ( S0 S1 ... Sm , ai ai+1 ... an $ )
   to
     ( S0 S1 ... Sm-r s , ai ai+1 ... an $ )
   where r is the length of β and s = GOTO[Sm-r, A]. The output is
   the reducing production A → β.
   Here the parser first pops r state symbols off the stack, exposing
   state Sm-r, and then pushes s.
3. If ACTION[Sm, ai] = accept, parsing is successfully completed.
4. If ACTION[Sm, ai] = error (an empty entry in the action table),
   the parser has detected an error.

LR Parsing Algorithm

(Figure: pseudocode of the table-driven LR parsing loop.)

(SLR) Parsing Tables for the Expression Grammar

Expression grammar:
  1) E → E+T
  2) E → T
  3) T → T*F
  4) T → F
  5) F → (E)
  6) F → id

            Action table                  Goto table
  state   id    +    *    (    )    $     E    T    F
    0     s5             s4               1    2    3
    1           s6                 acc
    2           r2   s7        r2   r2
    3           r4   r4        r4   r4
    4     s5             s4               8    2    3
    5           r6   r6        r6   r6
    6     s5             s4                    9    3
    7     s5             s4                         10
    8           s6             s11
    9           r1   s7        r1   r1
   10           r3   r3        r3   r3
   11           r5   r5        r5   r5

Actions of an (S)LR-Parser - Example

For the input id*id+id:

  stack        input       action             output
  0            id*id+id$   shift 5
  0id5         *id+id$     reduce by F → id   F → id    b/c goto(0,F) = 3
  0F3          *id+id$     reduce by T → F    T → F     b/c goto(0,T) = 2
  0T2          *id+id$     shift 7
  0T2*7        id+id$      shift 5
  0T2*7id5     +id$        reduce by F → id   F → id    b/c goto(7,F) = 10
  0T2*7F10     +id$        reduce by T → T*F  T → T*F   b/c goto(0,T) = 2
  0T2          +id$        reduce by E → T    E → T     b/c goto(0,E) = 1
  0E1          +id$        shift 6
  0E1+6        id$         shift 5
  0E1+6id5     $           reduce by F → id   F → id    b/c goto(6,F) = 3
  0E1+6F3      $           reduce by T → F    T → F     b/c goto(6,T) = 9
  0E1+6T9      $           reduce by E → E+T  E → E+T   b/c goto(0,E) = 1
  0E1          $           accept
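The LR driving loop itself is tiny once the tables exist. The sketch below encodes the SLR action and goto tables above as Python dicts (the encoding, with ('s', state), ('r', production) and 'acc' entries, is an assumption for illustration):

```python
# Sketch: drive the SLR(1) table for the expression grammar.
# PROD maps production number -> (head non-terminal, body length).

ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): "acc",
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}
PROD = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1),
        5: ("F", 3), 6: ("F", 1)}

def lr_parse(tokens):
    stack, tokens = [0], list(tokens) + ["$"]
    while True:
        act = ACTION.get((stack[-1], tokens[0]))
        if act == "acc":
            return True
        if act is None:
            return False                          # empty entry: syntax error
        if act[0] == "s":                         # shift state act[1]
            stack.append(act[1])
            tokens.pop(0)
        else:                                     # reduce by production act[1]
            head, size = PROD[act[1]]
            del stack[len(stack) - size:]         # pop |body| states
            stack.append(GOTO[stack[-1], head])   # push goto state

print(lr_parse(["id", "*", "id", "+", "id"]))   # True
print(lr_parse(["id", "+", "*", "id"]))         # False
```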

Conflicts During Shift-Reduce Parsing

There are context-free grammars for which shift-reduce parsers cannot
be used: the stack contents and the next input symbol may not decide
the action.
  shift/reduce conflict: the parser cannot decide whether to make a
  shift operation or a reduction.
  reduce/reduce conflict: the parser cannot decide which of several
  reductions to make.
If a shift-reduce parser cannot be used for a grammar, that grammar
is called a non-LR(k) grammar.
An ambiguous grammar can never be an LR grammar.

Constructing SLR Parsing Tables

LR(0) Item
An LR parser makes shift-reduce decisions by maintaining states to
keep track of where we are in a parse.
An LR(0) item of a grammar G is a production of G with a dot at some
position of the right side.

Ex: for A → aBb the possible LR(0) items are (four different
possibilities):
  A → .aBb
  A → a.Bb
  A → aB.b
  A → aBb.

Sets of LR(0) items will be the states of the action and goto tables
of the SLR parser, i.e. states represent sets of items.
A collection of sets of LR(0) items (the canonical LR(0) collection)
is the basis for constructing SLR parsers.

Constructing SLR Parsing Tables

To construct the canonical LR(0) collection for a grammar, we define
an augmented grammar and two functions, CLOSURE and GOTO.
Augmented grammar:
  G' is G with a new production rule S' → S, where S' is the new
  start symbol.
  Purpose: to provide a single production that, when reduced, signals
  the end of parsing.
If I is a set of LR(0) items for a grammar G, then closure(I) is the
set of LR(0) items constructed from I by two rules:
  1. Initially, every LR(0) item in I is added to closure(I).
  2. If A → α.Bβ is in closure(I), then for all production rules
     B → γ in G, add B → .γ to closure(I).
We apply rule 2 until no more new LR(0) items can be added to
closure(I).

Closure(I) - Example

Given the grammar
  E → E+T | T
  T → T*F | F
  F → (E) | id
Then:
  closure({T → T.*F}) = {T → T.*F}
  closure({T → T.*F, T → T*.F}) = {T → T.*F, T → T*.F,
                                   F → .(E), F → .id}
  closure({F → (.E)}) = {F → (.E), E → .E+T, E → .T, T → .T*F,
                         T → .F, F → .(E), F → .id}
  closure({E' → .E}) = {E' → .E, E → .E+T, E → .T, T → .T*F,
                        T → .F, F → .(E), F → .id}
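Closure can be computed with a simple saturation loop. Representing an item as a (head, body, dot-position) tuple is a choice made for this sketch, not something prescribed by the slides:

```python
# Sketch: CLOSURE of a set of LR(0) items for the augmented grammar.

GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Saturate: a dot before a non-terminal B pulls in every B -> .gamma."""
    items = set(items)
    while True:
        new = {(b[d], g, 0)
               for _, b, d in items
               if d < len(b) and b[d] in GRAMMAR
               for g in GRAMMAR[b[d]]}
        if new <= items:
            return items
        items |= new

I = closure({("T", ("T", "*", "F"), 2)})    # closure({T -> T * . F})
print(len(I))                               # 3: the two F-items are added
print(len(closure({("E'", ("E",), 0)})))    # 7: all the items of I0
```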

Goto Operation

If I is a set of LR(0) items and X is a grammar symbol (terminal or
non-terminal), then goto(I,X) is defined as follows:
  if A → α.Xβ is in I, then every item in closure({A → αX.β}) will be
  in goto(I,X).

Example:
  I = {E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E),
       F → .id}

  goto(I,E)  = closure({E' → E., E → E.+T}) = {E' → E., E → E.+T}
  goto(I,T)  = closure({E → T., T → T.*F}) = {E → T., T → T.*F}
  goto(I,F)  = closure({T → F.}) = {T → F.}
  goto(I,()  = closure({F → (.E)}) = {F → (.E), E → .E+T, E → .T,
               T → .T*F, T → .F, F → .(E), F → .id}
  goto(I,id) = closure({F → id.}) = {F → id.}

  goto({E' → E., E → E.+T}, +) = closure({E → E+.T})
             = {E → E+.T, T → .T*F, T → .F, F → .(E), F → .id}

Construction of the Canonical LR(0) Collection

To create the SLR parsing tables for a grammar G, we create the
canonical LR(0) collection of the grammar G.
Algorithm:
  void items(G') {
    C = { closure({S' → .S}) }
    repeat
      for (each set of items I in C)
        for (each grammar symbol X)
          if (goto(I,X) is not empty and not in C)
            add goto(I,X) to C
    until no new sets of items are added
  }
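The items algorithm, with goto built on the closure computation just described, can be sketched as follows (same assumed item representation as before):

```python
# Sketch: GOTO and the canonical LR(0) collection for the augmented
# expression grammar. Items are (head, body, dot-position) tuples.

GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    items = set(items)
    while True:
        new = {(b[d], g, 0)
               for _, b, d in items
               if d < len(b) and b[d] in GRAMMAR
               for g in GRAMMAR[b[d]]}
        if new <= items:
            return items
        items |= new

def goto(items, x):
    """Advance the dot over x in every item that allows it, then close."""
    return frozenset(closure({(h, b, d + 1)
                              for h, b, d in items
                              if d < len(b) and b[d] == x}))

def canonical_collection():
    start = frozenset(closure({("E'", ("E",), 0)}))   # I0
    c, todo = {start}, [start]
    while todo:
        i = todo.pop()
        for x in {b[d] for _, b, d in i if d < len(b)}:
            j = goto(i, x)
            if j and j not in c:
                c.add(j)
                todo.append(j)
    return c

C = canonical_collection()
print(len(C))   # 12 item sets, matching states I0 through I11 on the slides
```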

The Canonical LR(0) Collection - Example

C = { closure({E' → .E}) }
  = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }

This gives us the items for the first state (state 0, I0) of our DFA.

Now we compute the goto function for all of the relevant symbols in the
set. Here we care about the symbols E, T, F, (, and id, since those are
the symbols that have a dot in front of them in some item of I0.

For symbol E,  goto(I0, E)  = closure({E' → E., E → E.+T})
                            = { E' → E., E → E.+T } = I1
For symbol T,  goto(I0, T)  = closure({E → T., T → T.*F})
                            = { E → T., T → T.*F } = I2
For symbol F,  goto(I0, F)  = closure({T → F.}) = { T → F. } = I3
For symbol (,  goto(I0, ()  = closure({F → (.E)})
                            = { F → (.E), E → .E+T, E → .T, T → .T*F,
                                T → .F, F → .(E), F → .id } = I4
For symbol id, goto(I0, id) = closure({F → id.}) = { F → id. } = I5

Repeat this step for the newly created states (I1, I2, I3, I4, I5, ...)
until the dot occurs at the end of every kernel item of each state.

For symbol +, goto(I1, +) = closure({E → E+.T})
    = { E → E+.T, T → .T*F, T → .F, F → .(E), F → .id } = I6
For symbol *, goto(I2, *) = closure({T → T*.F})
    = { T → T*.F, F → .(E), F → .id } = I7
For symbol E, goto(I4, E) = closure({F → (E.), E → E.+T})
    = { F → (E.), E → E.+T } = I8

Summary of states obtained, and to which state each production in a
state goes:

Transition Diagram (DFA) of the Goto Function

LR(0) automaton for the Example

[Figure: the LR(0) automaton for the grammar E' → E, E → E+T | T,
T → T*F | F, F → (E) | id, showing states I0-I11 and the goto
transitions between them.]

Constructing SLR Parsing Table - Example

Before we start construction of the SLR action/goto tables, we need to
compute the FOLLOW sets for all of the non-terminals in the grammar and
to number the productions:

  0: E' → E
  1: E → E + T
  2: E → T
  3: T → T * F
  4: T → F
  5: F → ( E )
  6: F → id

  Follow(E') = {$}
  Follow(E)  = {$, ), +}
  Follow(T)  = {$, ), +, *}
  Follow(F)  = {$, ), +, *}

Each terminal gets a column in the action table, each non-terminal a
column in the goto table, and each state a row in both tables.
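The FOLLOW sets above can be computed mechanically. A minimal sketch in Python: since no production of this grammar derives the empty string, the propagation rules stay simple (for A → αBβ, FIRST(β) goes into FOLLOW(B); for A → αB, FOLLOW(A) goes into FOLLOW(B)):

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E",  ("E", "+", "T")),
    ("E",  ("T",)),
    ("T",  ("T", "*", "F")),
    ("T",  ("F",)),
    ("F",  ("(", "E", ")")),
    ("F",  ("id",)),
]
NONTERMS = {"E'", "E", "T", "F"}
TERMS = {"+", "*", "(", ")", "id"}

def compute_first():
    # FIRST(a) = {a} for a terminal; FIRST(A) unions in FIRST of the
    # first body symbol (valid here because nothing derives epsilon)
    first = {a: {a} for a in TERMS}
    first.update({A: set() for A in NONTERMS})
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            if not first[body[0]] <= first[head]:
                first[head] |= first[body[0]]
                changed = True
    return first

def compute_follow(first):
    follow = {A: set() for A in NONTERMS}
    follow["E'"].add("$")        # $ follows the augmented start symbol
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, B in enumerate(body):
                if B not in NONTERMS:
                    continue
                # symbol after B exists -> its FIRST; else FOLLOW(head)
                add = first[body[i + 1]] if i + 1 < len(body) else follow[head]
                if not add <= follow[B]:
                    follow[B] |= add
                    changed = True
    return follow

follow = compute_follow(compute_first())
print(sorted(follow["T"]))   # ['$', ')', '*', '+']
```

The output matches the sets listed above, including * in FOLLOW(T), which matters for the reduce entries of rules 3 and 4.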

Constructing SLR Parsing Table

1. Construct the canonical collection of sets of LR(0) items for G'.
     C = { I0, ..., In }
2. Create the parsing action table as follows:
   2.1. If a is a terminal, A → α.aβ is in Ii, and goto(Ii,a) = Ij,
        then action[i, a] = shift j.
   2.2. If A → α. is in Ii, then action[i, a] = reduce A → α for all
        a in FOLLOW(A), where A ≠ S'.
   2.3. If S' → S. is in Ii, then action[i, $] = accept.
   2.4. If these rules generate any conflicting actions, the grammar
        is not SLR(1).
3. Create the parsing goto table:
     for all non-terminals A, if goto(Ii, A) = Ij then goto[i, A] = j.
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing S' → .S.
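Rules 2.1-2.4 can be sketched as code. A runnable sketch in Python: the closure/goto helpers are repeated inline so it stands alone, the FOLLOW sets are hard-coded from the previous slide, and state numbers follow discovery order, so apart from I0 they need not match the I0..I11 numbering used here:

```python
GRAMMAR = [
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {"E'", "E", "T", "F"}
SYMBOLS = sorted({s for _, b in GRAMMAR for s in b} | NONTERMS)
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}   # from the previous slide

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in items:
                    items.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(items)

def goto(I, X):
    moved = {(h, b, d + 1) for (h, b, d) in I if d < len(b) and b[d] == X}
    return closure(moved) if moved else frozenset()

# Step 1: the canonical LR(0) collection
states = [closure({("E'", ("E",), 0)})]
for I in states:
    for X in SYMBOLS:
        J = goto(I, X)
        if J and J not in states:
            states.append(J)

# Step 2: the action table
action = {}
def set_entry(key, val):
    # rule 2.4: any clash between entries means the grammar is not SLR(1)
    if key in action and action[key] != val:
        raise ValueError("conflict at %s: not SLR(1)" % (key,))
    action[key] = val

for i, I in enumerate(states):
    for head, body, dot in I:
        if dot < len(body) and body[dot] not in NONTERMS:    # rule 2.1
            set_entry((i, body[dot]),
                      ("shift", states.index(goto(I, body[dot]))))
        elif dot == len(body) and head != "E'":              # rule 2.2
            for a in FOLLOW[head]:
                set_entry((i, a), ("reduce", head, body))
        elif dot == len(body):                               # rule 2.3
            set_entry((i, "$"), ("accept",))

print(len(states), action[(0, "id")][0])
```

No conflict is raised, confirming that the expression grammar is SLR(1); step 3 (the goto table) is a direct read-off of goto(Ii, A) for non-terminals and is omitted here.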

Constructing SLR Parsing Table - Example

From Rule 2.1:
  Take F → .(E) from I0; goto(I0, () = I4, so action[0, (] = shift 4.
  Take E → E.+T from I1; goto(I1, +) = I6, so action[1, +] = shift 6.
  Take T → T.*F from I2; goto(I2, *) = I7, so action[2, *] = shift 7.
  The other shifts can be populated in the same way.

From Rule 2.2:
  Take E → T. from I2; Follow(E) = {$, ), +}, so
    action[2, $] = reduce 2   (2: E → T)
    action[2, )] = reduce 2
    action[2, +] = reduce 2
  The other reduces can be done in the same way.

From Rule 2.3:
  E' → E. is in I1, so action[1, $] = accept.

Constructing SLR Parsing Table - Example

From Rule 3 - creating the parsing goto table:
  Take E in I0: goto(I0, E) = I1, so goto[0, E] = 1
  Take T in I0: goto(I0, T) = I2, so goto[0, T] = 2
  Take F in I0: goto(I0, F) = I3, so goto[0, F] = 3
  Take E in I4: goto(I4, E) = I8, so goto[4, E] = 8
  Take T in I4: goto(I4, T) = I2, so goto[4, T] = 2
  Take F in I4: goto(I4, F) = I3, so goto[4, F] = 3
  Take T in I6: goto(I6, T) = I9, so goto[6, T] = 9
  Take F in I6: goto(I6, F) = I3, so goto[6, F] = 3
  Take F in I7: goto(I7, F) = I10, so goto[7, F] = 10

Parsing Tables of the Expression Grammar

            Action Table                    Goto Table
state |  id    +    *    (    )    $   |   E    T    F
------+--------------------------------+---------------
  0   |  s5             s4             |   1    2    3
  1   |       s6                  acc  |
  2   |       r2   s7        r2   r2   |
  3   |       r4   r4        r4   r4   |
  4   |  s5             s4             |   8    2    3
  5   |       r6   r6        r6   r6   |
  6   |  s5             s4             |        9    3
  7   |  s5             s4             |            10
  8   |       s6            s11        |
  9   |       r1   s7        r1   r1   |
 10   |       r3   r3        r3   r3   |
 11   |       r5   r5        r5   r5   |
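The table can be exercised by the standard LR driver loop: shift pushes the next state, reduce by A → β pops |β| states and pushes goto[top, A]. A sketch in Python with the action/goto entries above transcribed into dictionaries:

```python
# Productions 0-6 as (head, body length); r5 is F -> ( E ), length 3.
PRODS = [("E'", 1), ("E", 3), ("E", 1), ("T", 3), ("T", 1), ("F", 3), ("F", 1)]
ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def parse(tokens):
    stack = [0]                  # state stack; start in state 0
    tokens = tokens + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                     # empty entry = error
        if act == "acc":
            return True
        if act[0] == "s":                    # shift: push target state
            stack.append(int(act[1:]))
            i += 1
        else:                                # reduce by production n
            head, size = PRODS[int(act[1:])]
            del stack[len(stack) - size:]    # pop |body| states
            nxt = GOTO.get((stack[-1], head))
            if nxt is None:
                return False
            stack.append(nxt)

print(parse(["id", "*", "id", "+", "id"]))   # True
print(parse(["id", "+", ")"]))               # False
```

Tracing "id * id + id" through this loop reproduces the shift/reduce sequence the table dictates: r6, r4, s7, s5, r6, r3, r2, s6, s5, r6, r4, r1, accept.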

Exercise

Construct the SLR parse table for the following augmented grammar and
show how the parser accepts the input string ( ) ( ):
  1: S' → S
  2: S → ( S ) S
  3: S → ε

Answer

SLR(1) Grammar

An LR parser using SLR(1) parsing tables for a grammar G is called the
SLR(1) parser for G.

If a grammar G has an SLR(1) parsing table, it is called an SLR(1)
grammar (or SLR grammar for short).

Every SLR grammar is unambiguous, but not every unambiguous grammar is
an SLR grammar.

If the SLR parsing table of a grammar G has a conflict, we say that the
grammar is not an SLR grammar. There are two kinds of conflicts:
shift/reduce conflicts and reduce/reduce conflicts.

Error Recovery in LR Parsing

An LR parser detects an error when it consults the parsing action table
and finds an error entry. All empty entries in the action table are
error entries.
  e.g. missing operand, unbalanced right parenthesis

Errors are never detected by consulting the goto table.

Some error recovery strategies are:
  Discard zero or more input symbols until a symbol a is found on which
  the parser can continue (panic mode).
  Mark each empty entry in the action table with a specific error
  routine (phrase-level recovery).

Assignment 3

Given the following grammar, where a, b, and c are terminals and
S, X, Y are non-terminals:
  S → XaYb | Y | ε
  X → aY | c
  Y → bX | a

1. Build the LL(1) parsing table for the grammar (show all the
   necessary steps).
   - What can you say about the ambiguity of the grammar?
   - Show how the parser accepts/rejects the input cabbab.
2. Build the Simple LR parsing table for the grammar (show all the
   necessary steps).
   - Are there any conflicts during shift-reduce parsing?
   - Show how the parser accepts/rejects the string cabbab.
