Bottom up parsing
Shift-Reduce Parsing
The goal is to reduce the input string to the start symbol of the grammar.
At every step, a particular substring is matched (in left-to-right fashion) to the right side of some production and is replaced by the left-hand side (LHS) of that production. This replacement is called a reduction.
If the substring is chosen correctly at each step, the sequence of reductions traces out a rightmost derivation in reverse order.
Consider the grammar:
S → aABe
A → Abc | b
B → d
Reductions for the input abbcde:
abbcde
aAbcde
aAde
aABe
S
Rightmost derivation:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
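The reductions in this example can be replayed mechanically. The following is a minimal Python sketch, not a general parser: the names REDUCTIONS and prune are illustrative, and replacing the leftmost occurrence of each right-hand side is an assumption that happens to pick the handle in this particular example.

```python
# Reductions from the example, in the order they are applied.
# Each pair is (LHS, RHS); the RHS occurrence replaced is the handle.
REDUCTIONS = [("A", "b"), ("A", "Abc"), ("B", "d"), ("S", "aABe")]

def prune(sentence, reductions):
    """Apply each reduction to the leftmost match of its RHS and
    return every sentential form seen along the way."""
    forms = [sentence]
    for lhs, rhs in reductions:
        pos = sentence.find(rhs)  # assumption: leftmost match is the handle here
        if pos < 0:
            raise ValueError(f"handle {rhs!r} not found in {sentence!r}")
        sentence = sentence[:pos] + lhs + sentence[pos + len(rhs):]
        forms.append(sentence)
    return forms

forms = prune("abbcde", REDUCTIONS)
print(" <= ".join(forms))
```

Reading the printed forms from right to left gives the rightmost derivation of abbcde.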
Handle
A handle of a string is a substring that matches the RHS of some production and whose reduction represents one step of a rightmost derivation in reverse.
So we scan tokens from left to right, find the handle, and replace it by the corresponding LHS.
Formally:
A handle of a right-sentential form γ is a production A → β and a location of β in γ that satisfies the above property.
That is, A → β is a handle of αβw at the location immediately after the end of α if:
S ⇒* αAw ⇒ αβw (both steps by rightmost derivation)
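If the grammar is ambiguous, a right-sentential form may have more than one handle; if the grammar is unambiguous, every right-sentential form has exactly one handle.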
Handle pruning
The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning.
Handle pruning forms the basis for a bottom-up parsing method.
Two problems must be solved:
locate a handle, and
decide which production to use (if there is more than one candidate production).
[Figure sequence: step-by-step shift-reduce parse of an input of the form num * ( num + num ), showing the stack, the remaining input string, and the SHIFT or REDUCE action taken at each step, ending in ACCEPT.]
Grammar used in the example:
Expr → Expr Op Expr
Expr → (Expr)
Expr → - Expr
Expr → num
Op → +
Op → -
Op → *
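Basic Idea
[Figure: parse tree built bottom-up, with Term and Fact. nodes above the leaves <id,x>, <num,2> and <id,y>.]
Grammar:
S → EXP
EXP → EXP + TERM | TERM
TERM → TERM * FACT | FACT
FACT → (EXP) | ID | NUM
[Figure: a conflict situation in which the stack holds "if then stmt" and the next input token is "else".]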
Conflict Resolution
Conflicts can be resolved by adapting the parsing algorithm (e.g., in parser generators):
Shift-reduce conflict: resolve in favor of shift.
Reduce-reduce conflict: use the production that appears earlier in the grammar.
Shift-Reduce Parsers
1. Operator-Precedence Parser
2. LR-Parsers (e.g., SLR, LALR)
Operator-Precedence Parser
Operator grammars are a small but important class of grammars.
We may have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
In an operator grammar, no production rule can have:
ε at the right side
two adjacent non-terminals at the right side.
Ex:
E → AB
A → a
B → b
not an operator grammar

E → EOE
E → id
O → + | * | /
not an operator grammar

E → E+E | E*E | E/E | id
operator grammar
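Precedence Relations
In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals:
a <. b    a yields precedence to b (b has higher precedence than a)
a =. b    a has the same precedence as b
a .> b    a takes precedence over b (b has lower precedence than a)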
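Precedence relations for the terminals id, +, * and $ (row = terminal on the left, column = terminal on the right, blank = no relation):

        id    +     *     $
id            .>    .>    .>
+       <.    .>    <.    .>
*       <.    .>    .>    .>
$       <.    <.    <.

Then the input string id+id*id with the precedence relations inserted will be:
$ <. id .> + <. id .> * <. id .> $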
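Sentential form at each step and the reduction applied to it:

sentential form         reduction
$ id + id * id $        E → id
$ E + id * id $         E → id
$ E + E * id $          E → id
$ E + E * E $           E → E*E
$ E + E $               E → E+E
$ E $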
The input string is w$, the initial stack is $, and a table holds the precedence relations between certain terminals.
Algorithm:
set p to point to the first symbol of w$;
repeat forever
    if ( $ is on top of the stack and p points to $ ) then return
    else {
        let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
        if ( a <. b or a =. b ) then {
            /* SHIFT */
            push b onto the stack;
            advance p to the next input symbol;
        }
        else if ( a .> b ) then
            /* REDUCE */
            repeat pop the stack
            until ( the top-of-stack terminal is related by <. to the terminal most recently popped );
        else error();
    }
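The algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the REL dictionary hard-codes the relations for id, +, * and $ used in the example, the stack holds terminals only (non-terminals are left implicit), and the function name parse and its return value (the list of handles, written as their terminals) are choices made for this sketch.

```python
# Precedence relations for id, +, * and $: "<" is <. and ">" is .>
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def parse(tokens):
    """Return the terminals of each handle, in the order they are reduced."""
    stack = ["$"]              # terminals only; non-terminals are implicit
    tokens = tokens + ["$"]
    p = 0
    handles = []
    while True:
        a, b = stack[-1], tokens[p]
        if a == "$" and b == "$":
            return handles     # accept
        rel = REL.get((a, b))
        if rel in ("<", "="):  # SHIFT
            stack.append(b)
            p += 1
        elif rel == ">":       # REDUCE: pop until top-of-stack <. last popped
            popped = [stack.pop()]
            while REL.get((stack[-1], popped[-1])) != "<":
                popped.append(stack.pop())
            handles.append(" ".join(reversed(popped)))
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")

print(parse(["id", "+", "id", "*", "id"]))
```

For id+id*id the handles come out as id, id, id, *, +, which corresponds to three reductions by E → id followed by E → E*E and E → E+E.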
Parsing id+id*id with the grammar E → E+E | E*E | id:

stack       input        action
$           id+id*id$    shift
$id         +id*id$      reduce E → id
$E          +id*id$      shift
$E+         id*id$       shift
$E+id       *id$         reduce E → id
$E+E        *id$         shift
$E+E*       id$          shift
$E+E*id     $            reduce E → id
$E+E*E      $            reduce E → E*E
$E+E        $            reduce E → E+E
$E          $            accept
Also, let
( =. )
( <. (
( <. id
$ <. (
$ <. id
id .> )
id .> $
) .> $
) .> )
Operator-Precedence Relations

        +     -     *     /     ^     id    (     )     $
+       .>    .>    <.    <.    <.    <.    <.    .>    .>
-       .>    .>    <.    <.    <.    <.    <.    .>    .>
*       .>    .>    .>    .>    <.    <.    <.    .>    .>
/       .>    .>    .>    .>    <.    <.    <.    .>    .>
^       .>    .>    .>    .>    <.    <.    <.    .>    .>
id      .>    .>    .>    .>    .>                .>    .>
(       <.    <.    <.    <.    <.    <.    <.    =.
)       .>    .>    .>    .>    .>                .>    .>
$       <.    <.    <.    <.    <.    <.    <.
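Then, for any operator O, we make
O <. unary-minus
unary-minus .> O    (if unary-minus has higher precedence than O)
unary-minus <. O    (if unary-minus has lower precedence than, or equal precedence to, O)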
Precedence Functions
Compilers using operator-precedence parsers do not need to store the table of precedence relations.
The table can be encoded by two precedence functions f and g that map terminal symbols to integers.
For symbols a and b:
f(a) < g(b) whenever a <. b
f(a) = g(b) whenever a =. b
f(a) > g(b) whenever a .> b
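One standard way to obtain f and g is to build a graph with a node f_a and g_a for every terminal a, add an edge f_a → g_b whenever a .> b and an edge g_b → f_a whenever a <. b, and take the length of the longest path from each node. The Python sketch below illustrates this for the id, +, *, $ relations; the function name precedence_functions is illustrative, and the sketch assumes the table has no =. relations (those would require merging nodes, which is not done here).

```python
TERMINALS = ["id", "+", "*", "$"]
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def precedence_functions(rel, terminals):
    # An edge u -> v means the value of u must be larger than the value of v.
    edges = {(kind, t): [] for kind in "fg" for t in terminals}
    for (a, b), r in rel.items():
        if r == ">":
            edges[("f", a)].append(("g", b))   # f(a) > g(b)
        elif r == "<":
            edges[("g", b)].append(("f", a))   # g(b) > f(a)
        else:
            raise ValueError("'=.' relations are not handled in this sketch")

    state = {}  # node -> "active" while being visited, else longest path length

    def longest(node):
        if state.get(node) == "active":
            raise ValueError("cycle in graph: no precedence functions exist")
        if node in state:
            return state[node]
        state[node] = "active"
        length = max((1 + longest(n) for n in edges[node]), default=0)
        state[node] = length
        return length

    f = {t: longest(("f", t)) for t in terminals}
    g = {t: longest(("g", t)) for t in terminals}
    return f, g

f, g = precedence_functions(REL, TERMINALS)
print("f =", f)
print("g =", g)
```

Comparing f(a) with g(b) then replaces a table lookup; for instance, f(+) < g(*) encodes + <. *.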
Advantages:
simple
powerful enough for expressions in programming languages
Example
Operator-precedence table with error entries, where e1, e2, e3 and e4 denote error-handling routines:

        id    (     )     $
id      e3    e3    .>    .>
(       <.    <.    =.    e4
)       e3    e3    .>    .>
$       <.    <.    e2    e1
LR Parsers
The most powerful (yet efficient) shift-reduce parsing method is LR(k) parsing:
L: left-to-right scanning of the input
R: rightmost derivation (constructed in reverse)
k: number of lookahead symbols (when k is omitted, it is 1)
More on LR(k)
Can recognize virtually all programming language constructs (if a CFG can be given)
Most general non-backtracking shift-reduce method known, yet it can be implemented efficiently
The class of grammars that can be parsed is a superset of the grammars parsed by LL(k)
Can detect syntax errors as soon as it is possible to do so on a left-to-right scan of the input
More on LR(k)
Main drawback: too tedious to do by hand for typical programming language grammars
We need a parser generator
Many are available:
Yacc (yet another compiler compiler) or bison for the C/C++ environment
CUP (Construction of Useful Parsers) for the Java environment; JavaCC is another parser generator for Java (it generates top-down LL(k) parsers rather than LR parsers)
We write the grammar and the generator produces the parser for that grammar
LR Parsers
LR parsers cover a wide range of grammars.
SLR: simple LR parser
LR: most general LR parser
LALR: intermediate LR parser (lookahead LR parser)
SLR, LR and LALR parsers work the same way (they use the same algorithm); only their parsing tables are different.
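LR Parsing Algorithm
[Figure: model of an LR parser. The input buffer holds a1 ... ai ... an $. The stack holds S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, where the Si are states and the Xi are grammar symbols, with Sm on top. The LR parsing algorithm reads the input and the stack and produces the output. It consults two tables: the Action table, indexed by states and by terminals and $, whose entries are one of four different actions, and the Goto table, indexed by states and by non-terminals, whose entries are state numbers.]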
Key Idea
Deciding when to shift and when to reduce is based on a DFA
applied to the stack
Edges of DFA labeled by symbols that can be on stack
(terminals + non-terminals)
Transition table defines transitions (and characterizes the type
of LR parser)
Table entries:
Entry   Meaning
sn      shift and go to state n
gn      go to state n
rk      reduce by rule k
a       accept
LR(0) Item
An LR(0) item is a production with a position in its RHS marked by a dot (e.g., A → α.β).
The dot tells how much of the RHS we have seen so far. For example, for a production S → XYZ:
S → .XYZ : we hope to see a string derivable from XYZ
S → X.YZ : we have just seen a string derivable from X and we hope to see a string derivable from YZ
S → XY.Z : we have just seen a string derivable from XY and we hope to see a string derivable from Z
S → XYZ. : we have seen a string derivable from XYZ and are going to reduce it to S
(X, Y, Z are grammar symbols)
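A production with n symbols on its right-hand side has n + 1 LR(0) items, one per dot position. The short Python sketch below enumerates them; the function name lr0_items and the string format of an item are illustrative choices, not notation from the slides.

```python
def lr0_items(lhs, rhs):
    """Yield every LR(0) item of the production lhs -> rhs as a string.

    rhs is a list of grammar symbols; the dot is written as '.'.
    """
    for dot in range(len(rhs) + 1):
        symbols = rhs[:dot] + ["."] + rhs[dot:]
        yield f"{lhs} -> {' '.join(symbols)}"

items = list(lr0_items("S", ["X", "Y", "Z"]))
for item in items:
    print(item)
```

For S → XYZ this yields the four items listed above, from the dot at the far left to the dot at the far right.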
SLR PARSING
The central idea in the SLR method is first to construct from the grammar a DFA that recognizes viable prefixes. We group items into sets, which become the states of the SLR parser.
Viable prefixes:
The prefixes of a right-sentential form that can appear on the stack of a shift-reduce parser are called viable prefixes.
Example: a, aa, aab, and aabb are viable prefixes of aabbbbd.
Augmented Grammar
If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and the production S' → S.
The purpose of the augmenting production is to indicate to the parser when it should stop parsing and accept the input. That is, acceptance occurs only when the parser is about to reduce by the production S' → S.
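Closure example
Grammar:
E → E+T | T
T → T*F | F
F → (E)
F → id

The closure is built up step by step: the dot before E adds the items for E, the dot before T then adds the items for T, and the dot before F adds the items for F.
closure({[E' → .E]}) =
{ [E' → .E]
  [E → .E + T]
  [E → .T]
  [T → .T * F]
  [T → .F]
  [F → .( E )]
  [F → .id] }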
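The closure computation can be sketched with a worklist. In this Python illustration an item is represented as a tuple (lhs, rhs, dot); that representation and the names GRAMMAR, closure and show are assumptions made for the sketch.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Return the closure of a set of LR(0) items (lhs, rhs, dot)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        # A dot immediately before a non-terminal B adds every item B -> .gamma
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"

I0 = closure({("E'", ("E",), 0)})
for line in sorted(show(item) for item in I0):
    print(line)
```

Starting from the single item E' → .E, the sketch collects the same seven items as the example.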
Suppose I = { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F], [F → .( E )], [F → .id] }
Then goto(I,E) = closure({[E' → E.], [E → E. + T]})
               = { [E' → E.]
                   [E → E. + T] }
Grammar:
E → E+T | T
T → T*F | F
F → (E)
F → id
Suppose I = { [E' → E.], [E → E. + T] }
Then goto(I,+) = closure({[E → E + .T]})
               = { [E → E + .T]
                   [T → .T * F]
                   [T → .F]
                   [F → .( E )]
                   [F → .id] }
Grammar:
E → E+T | T
T → T*F | F
F → (E)
F → id
State 0
Grammar:
E' → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id

We start by adding the item E' → .E to state 0.
This item has a "." immediately to the left of a nonterminal. Whenever this is the case, we must perform step 3 (closure) of the set construction algorithm.
We add the items E → .E+T and E → .T to state 0, giving
I0: { E' → .E
      E → .E+T
      E → .T }
Reapplying closure to E → .T, we must add the items T → .T*F and T → .F to state 0, giving
I0: { E' → .E
      E → .E+T
      E → .T
      T → .T*F
      T → .F }
Reapplying closure to T → .F, we must add the items F → .(E) and F → .id to state 0, giving
I0: { E' → .E
      E → .E+T
      E → .T
      T → .T*F
      T → .F
      F → .(E)
      F → .id }
State 1
State 1 starts with the items E' → E. and E → E.+T. These items are formed from the items E' → .E and E → .E+T by moving the "." one grammar symbol to the right. In each case, the grammar symbol is E.
Closure does not add any new items, so state 1 ends up with the 2 items:
I1: { E' → E.
      E → E.+T }
[Figure: DFA to this point, with states I0 and I1.]
[Figures: DFA to this point as states I2 and I3 are added.]
State 4
Applying closure to E → .T, we add the items T → .T*F and T → .F to state 4, giving
{ F → (.E)
  E → .E+T
  E → .T
  T → .T*F
  T → .F }
Applying step 3 to T → .F, we add the items F → .(E) and F → .id to state 4, giving the final set of items
I4: { F → (.E)
      E → .E+T
      E → .T
      T → .T*F
      T → .F
      F → .(E)
      F → .id }
[Figure: DFA to this point, including states I0 and I4.]
State 5
I5: { F → id. }
Since this item is a complete item, we will not be able to produce new states from state 5.
[Figure: DFA to this point, including the transition from I0 on id.]
State 6
I6: { E → E+.T
      T → .T*F
      T → .F
      F → .(E)
      F → .id }
[Figure: DFA to this point, including states I1 and I6.]
State 7
I7: { T → T*.F
      F → .(E)
      F → .id }
[Figure: DFA to this point, including states I2 and I7.]
State 8
I8: { F → (E.)
      E → E.+T }
No further items can be added to state 8 through closure.
There are other transitions from state 4, but they do not result in new states.
[Figure: DFA to this point, including states I4 and I8.]
State 9
I9: { E → E+T.
      T → T.*F }
All other transitions from state 6 go to existing states.
[Figure: DFA to this point, including states I6 and I9.]
State 10
I10: { T → T*F. }
All other transitions from state 7 go to existing states.
[Figure: DFA to this point, including states I7 and I10.]
State 11
I11: { F → (E). }
All other transitions from state 8 go to existing states.
State 9 has one transition to an existing state (7). No other new states can be added, so we are done.
[Figure: final DFA for viable prefixes.]
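The whole set construction can be sketched in a few lines of Python. This is an illustration under assumptions: items are tuples (lhs, rhs, dot), the names closure, goto and canonical_collection are chosen for the sketch, and the order of SYMBOLS is picked so that states are discovered in the same order as I0 to I11 in the walkthrough.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]  # order affects numbering only

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({("E'", ("E",), 0)})]
    transitions = {}
    i = 0
    while i < len(states):            # states grows as new sets are found
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target:
                if target not in states:
                    states.append(target)
                transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "states")
print("goto(I6, T) = I%d" % transitions[(6, "T")])
```

The transitions dictionary is the DFA for viable prefixes: terminal edges become shift entries and non-terminal edges become goto entries of the parse table.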
[Figure sequence: the Action and Goto tables are filled in state by state from the DFA, using the numbered rules 1) E → E+T, 2) E → T, 3) T → T*F, 4) T → F, 5) F → (E), 6) F → id. Shift entries such as s5, s4 and s6 and the acc entry are entered first, followed by reduce entries such as r6.]
The complete SLR parse table for the expression grammar:

                     action                      goto
State   id    +     *     (     )     $      E     T     F
0       s5                s4                 1     2     3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                 8     2     3
5             r6    r6          r6    r6
6       s5                s4                       9     3
7       s5                s4                             10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

Rules:
1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id
Notation:
s5 = shift 5
r2 = reduce by E → T
Example
Grammar:
1. C' → C
2. C → A B
3. A → a
4. B → a

I0 = closure({[C' → .C]})
I1 = goto(I0,C) = closure({[C' → C.]})
State I0: C' → .C
          C → .A B
          A → .a
State I1 = goto(I0,C): C' → C.   (final)
State I2 = goto(I0,A): C → A.B
                       B → .a
State I3 = goto(I0,a): A → a.
State I4 = goto(I2,B): C → A B.
State I5 = goto(I2,a): B → a.

[Figure: DFA with start state 0 and transitions on C, A, B and a.]

State   a     $      C     A     B
0       s3           1     2
1             acc
2       s5                       4
3       r3
4             r2
5             r4
Actions of an LR Parser
1. shift s: shifts the next input symbol and the state s onto the stack
( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $ )
LR Parsing Algorithm
Refer to the text: Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, pages 218-219.
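Parsing id*id+id with the SLR table for the expression grammar:

stack             input        action                  output
0                 id*id+id$    shift 5
0id5              *id+id$      reduce by F → id        F → id
0F3               *id+id$      reduce by T → F         T → F
0T2               *id+id$      shift 7
0T2*7             id+id$       shift 5
0T2*7id5          +id$         reduce by F → id        F → id
0T2*7F10          +id$         reduce by T → T*F       T → T*F
0T2               +id$         reduce by E → T         E → T
0E1               +id$         shift 6
0E1+6             id$          shift 5
0E1+6id5          $            reduce by F → id        F → id
0E1+6F3           $            reduce by T → F         T → F
0E1+6T9           $            reduce by E → E+T       E → E+T
0E1               $            accept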
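The table-driven loop behind this trace can be sketched as below. It is an illustration, not the textbook's exact formulation: the ACTION, GOTO and RULES dictionaries hard-code the SLR table for the expression grammar, the stack keeps state numbers only (the grammar symbols are implied by the states), and the function name parse is an assumption of this sketch.

```python
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}
# rule number -> (LHS, length of RHS, printable form)
RULES = {1: ("E", 3, "E -> E+T"), 2: ("E", 1, "E -> T"),
         3: ("T", 3, "T -> T*F"), 4: ("T", 1, "T -> F"),
         5: ("F", 3, "F -> (E)"), 6: ("F", 1, "F -> id")}

def parse(tokens):
    """Return the productions used, in the order they are reduced."""
    stack = [0]                      # state numbers only
    tokens = tokens + ["$"]
    pos, output = 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if act == "acc":
            return output
        if act[0] == "s":            # shift: push the new state, consume a token
            stack.append(int(act[1:]))
            pos += 1
        else:                        # reduce: pop |RHS| states, then take the goto
            lhs, length, text = RULES[int(act[1:])]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(text)

print(parse(["id", "*", "id", "+", "id"]))
```

The list returned for id*id+id matches the output column of the trace, which is a rightmost derivation in reverse.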
Conflict Example
Grammar:
S → L=R
S → R
L → *R
L → id
R → L

I0: S' → .S
    S → .L=R
    S → .R
    L → .*R
    L → .id
    R → .L
I1: S' → S.
I2: S → L.=R
    R → L.
I3: S → R.
I4: L → *.R
    R → .L
    L → .*R
    L → .id
I5: L → id.
I6: S → L=.R
    R → .L
    L → .*R
    L → .id
I7: L → *R.
I8: R → L.
I9: S → L=R.

Problem (in state I2):
FOLLOW(R) = {=, $}
On input =, the parser can either shift 6 or reduce by R → L: a shift/reduce conflict.
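The FOLLOW set that causes this conflict can be checked with a small fixed-point computation. The Python sketch below is written only for grammars without ε-productions (true of this grammar, but an assumption that does not hold in general); the names first_sets and follow_sets are illustrative.

```python
GRAMMAR = [
    ("S'", ["S"]),
    ("S", ["L", "=", "R"]),
    ("S", ["R"]),
    ("L", ["*", "R"]),
    ("L", ["id"]),
    ("R", ["L"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]  # no epsilon-productions: only the first symbol matters
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):      # something follows x in this RHS
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                     # x ends the RHS: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow

follow = follow_sets()
print(sorted(follow["R"]))
```

Because = is in FOLLOW(R), an SLR parser in state I2 would both shift on = (item S → L.=R) and reduce by R → L, which is the conflict described above.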
Conflict Example 2
Grammar:
S → AaAb
S → BbBa
A → ε
B → ε

I0: S' → .S
    S → .AaAb
    S → .BbBa
    A → .
    B → .

Problem (in state I0):
FOLLOW(A) = {a, b}
FOLLOW(B) = {a, b}
On input a: reduce by A → ε or reduce by B → ε, a reduce/reduce conflict.
On input b: reduce by A → ε or reduce by B → ε, a reduce/reduce conflict.
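SLR(1)
There is an easy fix for some of the shift/reduce or reduce/reduce conflicts: it requires looking one token ahead (called the lookahead token). A reduction by A → β is performed only when the lookahead token is in FOLLOW(A).
Example: a state that contains a completed item for E together with the item T → T.*F
reduce only on FOLLOW(E) = { $, +, ) }
shift only on FIRST(* F) = { * }
The two sets do not overlap, so there is no conflict.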
LR(1) items
An LR(1) item is:
A → α.β, a
where a is the lookahead of the item (a is a terminal or the end-marker $).
goto operation
If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
If A → α.Xβ, a is in I, then every item in closure({A → αX.β, a}) will be in goto(I,X).
A set of LR(1) items with the same core but different lookaheads,
A → α.β, a1
...
A → α.β, an
can be written as
A → α.β, a1/a2/.../an
Canonical LR(1) item sets for the grammar S' → S, S → CC, C → cC | d:
I0: S' → .S, $
    S → .CC, $
    C → .cC, c/d
    C → .d, c/d
I1 = goto(I0,S): S' → S., $
I2 = goto(I0,C): S → C.C, $
                 C → .cC, $
                 C → .d, $
I3 = goto(I0,c): C → c.C, c/d
                 C → .cC, c/d
                 C → .d, c/d
I4 = goto(I0,d): C → d., c/d
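The LR(1) closure and goto operations can be sketched in Python as follows. An item is a tuple (lhs, rhs, dot, lookahead); the FIRST sets are written out by hand, and the helper first_of relies on the grammar having no ε-productions. All names here (first_of, closure, goto, show) are illustrative assumptions.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # computed by hand for this grammar

def first_of(symbols, lookahead):
    """FIRST of `symbols` followed by `lookahead` (assumes no epsilon-productions)."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # For [A -> alpha . B beta, a], add [B -> . gamma, b] for b in FIRST(beta a)
            for b in first_of(rhs[dot + 1:], la):
                for lhs, body in GRAMMAR:
                    item = (lhs, body, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, x):
    moved = {(l, r, d + 1, a) for l, r, d, a in items if d < len(r) and r[d] == x}
    return closure(moved)

def show(items):
    """Group lookaheads of items with the same core, e.g. 'C -> . d, c/d'."""
    grouped = {}
    for l, r, d, a in items:
        core = f"{l} -> {' '.join(r[:d] + ('.',) + r[d:])}"
        grouped.setdefault(core, set()).add(a)
    return sorted(f"{core}, {'/'.join(sorted(las))}" for core, las in grouped.items())

I0 = closure({("S'", ("S",), 0, "$")})
I3 = goto(I0, "c")
print("\n".join(show(I0)))
```

Calling show(goto(I0, "C")) or show(I3) prints the item sets I2 and I3 in the same a1/a2 notation.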
LR(1) parsing table for the grammar
1. S → CC
2. C → cC
3. C → d

            action             goto
State   c     d     $      S     C
0       s3    s4           1     2
1                   acc
2       s6    s7                 5
3       s3    s4                 8
4       r3    r3
5                   r1
6       s6    s7                 9
7                   r3
8       r2    r2
9                   r2
LALR(1)
If the lookaheads s1 and s2 are different, then the items A → α., s1 and A → α., s2 are different.
This results in a large number of states, since the number of combinations of expected lookahead symbols can be very large.
Practical Considerations
How to avoid reduce/reduce and shift/reduce conflicts:
left recursion is good, right recursion is bad
LALR parsers are often used in practice because LALR parsing tables are smaller than LR(1) parsing tables.
The number of states in the SLR and LALR parsing tables for a grammar G is equal.
But LALR parsers recognize more grammars than SLR parsers.
yacc creates an LALR parser for the given grammar.
A state of an LALR parser is again a set of LR(1) items.
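Creating an LALR Parser
The LALR parser is obtained from the canonical LR(1) parser by shrinking the number of states: states that have the same core are merged.
I4: C → d., c/d
I7: C → d., $
A new state:
I47: C → d., c/d/$
We will do this for all states of a canonical LR(1) parser to get the states of the LALR parser.
In fact, the number of states of the LALR parser for a grammar will be equal to the number of states of the SLR parser for that grammar.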
I3: C → c.C, c/d
    C → .cC, c/d
    C → .d, c/d
I6: C → c.C, $
    C → .cC, $
    C → .d, $
A new state:
I36: C → c.C, c/d/$
     C → .cC, c/d/$
     C → .d, c/d/$
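Merging states with the same core can be sketched as below. The Python snippet hard-codes the LR(1) item sets I3, I6, I4 and I7 as tuples (lhs, rhs, dot, lookahead); the function names core and merge_by_core, and the convention of naming a merged state by concatenating the original numbers, are assumptions of this sketch.

```python
def items(lhs_rhs_dot, lookaheads):
    """Expand cores [(lhs, rhs, dot), ...] with each lookahead in `lookaheads`."""
    return {(l, r, d, a) for l, r, d in lhs_rhs_dot for a in lookaheads}

C_CORE = [("C", ("c", "C"), 1), ("C", ("c", "C"), 0), ("C", ("d",), 0)]
D_CORE = [("C", ("d",), 1)]

STATES = {
    3: items(C_CORE, "cd"),   # I3: lookaheads c/d
    6: items(C_CORE, "$"),    # I6: lookahead $
    4: items(D_CORE, "cd"),   # I4: C -> d., c/d
    7: items(D_CORE, "$"),    # I7: C -> d., $
}

def core(state):
    """The core of a state: its items with the lookaheads dropped."""
    return frozenset((l, r, d) for l, r, d, _ in state)

def merge_by_core(states):
    groups = {}   # core -> (state numbers, union of their items)
    for number, state in states.items():
        numbers, union = groups.setdefault(core(state), ([], set()))
        numbers.append(number)
        union |= state
    return {"I" + "".join(str(n) for n in numbers): frozenset(union)
            for numbers, union in groups.values()}

merged = merge_by_core(STATES)
for name, state in merged.items():
    print(name, sorted({a for _, _, _, a in state}))
```

Each merged state keeps the shared core and takes the union of the lookaheads, which is how I36 and I47 arise.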
LALR(1) parsing table for the grammar
1. S → CC
2. C → cC
3. C → d

            action             goto
State   c     d     $      S     C
0       s36   s47          1     2
1                   acc
2       s36   s47                5
36      s36   s47                89
47      r3    r3    r3
5                   r1
89      r2    r2    r2
Exercises
Q1. Show that the following grammar
S → Aa | bAc | dc | dba
A → d
is LALR(1) but not SLR(1).
Q2. Show that the following grammar
S → Aa | bAc | Bc | bBa
A → d
B → d
is LR(1) but not LALR(1).