
PARSING:-

Even though a compiler may not actually construct a parse tree, a parser must be capable of constructing one.
The top-down construction of a parse tree starts with the root, labeled with the starting non-terminal, and repeatedly performs the following two steps:
1. At node n, labeled with the non-terminal A, select one of the productions for A and construct children at n for the symbols on the right side of the production.
2. Find the next node at which a subtree is to be constructed.
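Steps in the top-down construction of the parse tree (each indented line lists the children of the node above it):

a) <type>

b) <type>
     array [ <simple> ] of <type>

c) <type>
     array [ <simple> ] of <type>
               num dotdot num

d) <type>
     array [ <simple> ] of <type>
               num dotdot num   <simple>

e) <type>
     array [ <simple> ] of <type>
               num dotdot num   <simple>
                                integer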

For some grammars, the above steps can be implemented during a single left-to-right scan of the input string.
The current token being scanned in the input is frequently referred to as the lookahead symbol.
Initially, the lookahead symbol is the first, i.e., leftmost, token of the input string.
In general, the selection of a production for a non-terminal may involve trial and error, i.e., we may have to try a production and backtrack to try another production if the first is found to be unsuitable.
A production is unsuitable if, after using it, we cannot complete the tree to match the input string.
There is, however, an important special case called predictive parsing, in which backtracking does not occur.
PREDICTIVE PARSING:-
Recursive-descent parsing is a top-down method of syntax analysis in which we execute a set of recursive procedures to process the input.
A procedure is associated with each non-terminal of the grammar.
The predictive parser consists of procedures for the non-terminals <type> and <simple> of the grammar and an additional procedure, match.
MATCH:-
Match is used to simplify the code for <type> and <simple>; it advances to the next input token if its argument t matches the lookahead symbol.
procedure match(t: token);
begin
    { advance only when the expected token t is the lookahead symbol }
    if lookahead = t then
        lookahead := nexttoken
    else
        error
end;

procedure type;
begin
    { the lookahead symbol selects the production for <type> }
    if lookahead is in {integer, char, num} then
        simple
    else if lookahead = '^' then begin
        match('^'); match(id)
    end
    else if lookahead = array then begin
        match(array); match('[');
        simple;
        match(']'); match(of);
        type
    end
    else
        error
end;

procedure simple;
begin
    if lookahead = integer then
        match(integer)
    else if lookahead = char then
        match(char)
    else if lookahead = num then begin
        match(num); match(dotdot); match(num)
    end
    else
        error
end;
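The procedures above can be sketched in Python as follows. This is only an illustration under stated assumptions: tokens are plain strings such as "array" or "dotdot", the end marker "$" is appended by the sketch, and the names TypeParser, ParseError and parses are hypothetical, not part of the notes.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls error."""


class TypeParser:
    def __init__(self, tokens):
        # "$" marks the end of the input (an assumption of this sketch).
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Advance only if the expected token is the lookahead symbol.
        if self.lookahead == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):
        # <type> -> <simple> / ^id / array [ <simple> ] of <type>
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        # <simple> -> integer / char / num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def parses(tokens):
    """Return True if the token list is a complete <type>."""
    parser = TypeParser(tokens)
    try:
        parser.type_()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For instance, parses("array [ num dotdot num ] of integer".split()) follows the same choices as steps a) to e) of the tree construction. No backtracking is needed because each branch is selected by the lookahead symbol alone.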
TOKENS, PATTERNS, LEXEMES:-
->In general, there is a set of strings in the input for which the same token is produced as output.
->This set of strings is described by a rule called a pattern (for example, a regular expression) associated with the token.
->A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
->For example:
const pi = 3.1416
The substring pi is a lexeme for the token identifier.
->Tokens can be treated as terminal symbols in the grammar for the source language.
->The lexeme matched by the pattern for a token represents a string of characters in the source program that can be treated as a lexical unit. In most programming languages the following constructs are treated as tokens:
Keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as parentheses, commas and semicolons.
->In the above example, when the character sequence pi appears in the source program, a token representing an identifier is returned to the parser.
->The returning of a token is often implemented by passing an integer corresponding to the token.
->A pattern is a rule describing the set of lexemes that can represent a particular token in the source program.
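The relation between tokens, patterns and lexemes can be illustrated with a small Python sketch. The token names (const, num, id, eq) and the regular expressions are assumptions chosen for the example above, not a definition taken from any particular language.

```python
import re

# (token name, pattern) pairs. Earlier entries win, so the keyword
# pattern is listed before the identifier pattern.
TOKEN_SPEC = [
    ("const", r"const\b"),
    ("num", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("eq", r"="),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs; whitespace is skipped."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"no pattern matches at position {pos}")
        if m.lastgroup != "ws":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs
```

Here tokenize("const pi = 3.1416") pairs the lexeme pi with the token id. A real lexical analyzer would typically return an integer code for each token rather than its name.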
Parsing:
->Parsing is the process of determining whether a string of tokens can be generated by a grammar.
->A parser can be constructed for any grammar.
->For any CFG there is a parser that takes at most O(n^3) time to parse a string of n tokens.
->But cubic time is too expensive.
->Given a programming language, we can generally construct a grammar that can be parsed quickly.
->Linear algorithms suffice to parse essentially all languages that arise in practice.
->Programming language parsers almost always make a single left-to-right scan over the input, looking ahead one token at a time.
->Most parsing methods fall into one of two classes, called the top-down and bottom-up methods.
->Top-down methods start at the root and proceed towards the leaves, while bottom-up construction starts at the leaves and proceeds towards the root.
Top-down parsing:-
->For example, the following grammar generates a subset of the types of Pascal. We assume that dotdot is the token for "..", to emphasize that the character sequence is treated as a unit.
<type> → <simple> / ^id / array [ <simple> ] of <type>
<simple> → integer / char / num dotdot num
NON-RECURSIVE PREDICTIVE PARSING:
A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls.
The key problem during predictive parsing is that of determining the production to be applied for a non-terminal.
The model of a non-recursive predictive parser is illustrated as:
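[Figure: model of a non-recursive predictive parser. Input buffer: a + b $. Stack, top to bottom: X Y Z $. The predictive parsing program reads both and consults the parsing table M.]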
The non-recursive parser looks up the production to be applied in a parsing table.
A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.
The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to indicate the end of the input string.
The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack.
Initially, the stack contains the start symbol of the grammar on top of $.
The parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is a terminal or the symbol $.
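For example, consider the expression grammar (alternatives are separated by | here, because / is itself a terminal of this grammar):
E → E+T | E-T | T
T → T*F | T/F | F
F → (E) | id
Eliminating left recursion gives the equivalent grammar:
E → TE'
E' → +TE' | -TE' | ε
T → FT'
T' → *FT' | /FT' | ε
F → (E) | id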
The parser is controlled by a program that behaves as follows:
The program determines the action of the parser from X, the symbol on top of the stack, and a, the current input symbol.
The actions are as follows:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M.
This entry is either an X-production of the grammar or an error entry. If M[X, a] contains X → UVW, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error recovery routine.
Algorithm is as follows: (non-recursive predictive parser)
Input: A string w and a parsing table M for grammar G.
Output: If w is in L(G), a leftmost derivation of w; otherwise an error indication.
Method: Initially the parser has $S on the stack, where S is the start symbol, and w$ in the input buffer.
1. Set ip to point to the first symbol of w$.
2. Repeat
       Let X be the top-of-stack symbol and a the symbol pointed to by ip
       If X is a terminal or $ then
           If X = a then
               POP X from the stack and advance ip
           else
               error()
       else
           if M[X, a] = X → Y1 Y2 … Yk then
           {
               POP X from the stack
               PUSH Yk, Yk-1, …, Y1 onto the stack, with Y1 on top
               Output the production X → Y1 Y2 … Yk
           }
           else
               error()
   until X = $
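The algorithm above can be sketched in Python as follows. The table is hard-coded for the grammar E → TE', E' → +TE' / ε, T → FT', T' → *FT' / ε, F → (E) / id; the names TABLE, NONTERMINALS and ll1_parse, and the use of SyntaxError for error(), are assumptions of this sketch.

```python
# M[A, a] as a dictionary; a missing key is an error entry.
# An empty list stands for the right side ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Return the productions of a leftmost derivation, in order."""
    stack = ["$", start]            # top of the stack is the end of the list
    buf = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack[-1] != "$":
        X, a = stack[-1], buf[ip]
        if X not in NONTERMINALS:   # X is a terminal
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:
            rhs = table.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1, Y1 ends on top
            output.append(f"{X} -> {' '.join(rhs) or 'ε'}")
    if buf[ip] != "$":
        raise SyntaxError(f"unexpected input {buf[ip]!r}")
    return output
```

Calling ll1_parse(["id", "+", "id", "*", "id"]) returns the productions in the order in which a leftmost derivation uses them, starting with E -> T E'.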
The behavior of the parser can be described in terms of its configurations.
For example, the parsing table is as follows (blank entries are errors):
NON-TERMINAL   INPUT SYMBOL
               id        +          *          (         )        $
E              E→TE'                           E→TE'
E'                       E'→+TE'                         E'→ε     E'→ε
T              T→FT'                           T→FT'
T'                       T'→ε       T'→*FT'              T'→ε     T'→ε
F              F→id                            F→(E)

The configurations of the parser with input string id+id*id$ and initial stack $E are as follows:
STACK       INPUT        OUTPUT
$E          id+id*id$
$E'T        id+id*id$    E → TE'
$E'T'F      id+id*id$    T → FT'
$E'T'id     id+id*id$    F → id
$E'T'       +id*id$
$E'         +id*id$      T' → ε
$E'T+       +id*id$      E' → +TE'
$E'T        id*id$
$E'T'F      id*id$       T → FT'
$E'T'id     id*id$       F → id
$E'T'       *id$
$E'T'F*     *id$         T' → *FT'
$E'T'F      id$
$E'T'id     id$          F → id
$E'T'       $
$E'         $            T' → ε
$           $            E' → ε

Construction of the parsing table:-

The construction of a predictive parser is aided by two functions associated with a grammar G, namely FIRST and FOLLOW.
FIRST(α) is the set of terminals that begin the strings derived from α, where α ∈ (N ∪ T)*.
If α ⇒* ε, then ε is also in FIRST(α).
FOLLOW(A), for a non-terminal A, is the set of terminals that can appear immediately to the right of A in some sentential form, i.e., the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).
The rules for computing FIRST(X) for all grammar symbols X are as follows:
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1), i.e., Y1 Y2 … Yi-1 ⇒* ε.
If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
If Y1 does not derive ε, then only the terminals of FIRST(Y1) are added to FIRST(X).
If Y1 ⇒* ε, then the terminals of FIRST(Y2) are also added to FIRST(X), and so on.
The rules for computing FOLLOW(A) for all non-terminals A are as follows:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
For example:
Consider the CFG
E → TE'
E' → +TE' / ε
T → FT'
T' → *FT' / ε
F → (E) / id
Then
FIRST(+) = {+}, FIRST(*) = {*}, FIRST(id) = {id},
FIRST('(') = {'('}, FIRST(')') = {')'},
FIRST(E) = {'(', id}
FIRST(T) = {'(', id}
FIRST(F) = {'(', id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = {')', $}
FOLLOW(E') = {')', $}
FOLLOW(T) = {+, ')', $}   (+ comes from FIRST(E'); ')' and $ come from FOLLOW(E), since E' ⇒* ε)
FOLLOW(T') = {+, ')', $}
FOLLOW(F) = {*, +, ')', $}   (* comes from FIRST(T'); the rest come from FOLLOW(T), since T' ⇒* ε)
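The FIRST and FOLLOW rules can be applied mechanically by repeating them until no set changes. The following Python sketch does this for the grammar above; the dictionary layout of GRAMMAR (an empty list for an ε-production) and the function names are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for sym in symbols:
        f = first[sym] if sym in first else {sym}   # rule 1: terminal
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)          # every symbol can derive ε (or the string is empty)
    return result


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:           # repeat until nothing new is added
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                           # rule 1
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue
                    beta = first_of_string(rhs[i + 1:], first)
                    new = beta - {EPS}               # rule 2
                    if EPS in beta:
                        new |= follow[A]             # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow
```

With first = compute_first(GRAMMAR) and follow = compute_follow(GRAMMAR, first, "E"), the resulting sets agree with the ones listed above.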
Algorithm for construction of the predictive parsing table:
Input: Grammar G
Output: Parsing table M
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A).
If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error entry.
→This algorithm can be applied to the above grammar.
→First consider E → TE'.
Since FIRST(TE') = FIRST(T) = {(, id},
the production E → TE' is added to M[E, (] and M[E, id].
Similarly,
E' → +TE' is added to M[E', +].
E' → ε is added to M[E', )] and M[E', $], since FOLLOW(E') = {), $}.
T → FT':
Since FIRST(FT') = FIRST(F) = {(, id},
the production T → FT' is added to M[T, (] and M[T, id].
T' → *FT' is added to M[T', *].
T' → ε is added to M[T', )], M[T', +] and M[T', $],
since FOLLOW(T') = {+, ), $}.
F → (E) is added to M[F, (] and
F → id is added to M[F, id].
The table is as follows:
               INPUT SYMBOL
Non-terminal   id        +          *          (         )        $
E              E→TE'                           E→TE'
E'                       E'→+TE'                         E'→ε     E'→ε
T              T→FT'                           T→FT'
T'                       T'→ε       T'→*FT'              T'→ε     T'→ε
F              F→id                            F→(E)
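The table construction can be sketched in Python as below. To keep the sketch short, the FIRST and FOLLOW sets are copied from the example rather than computed; the names PRODUCTIONS, first_of_string and build_table are hypothetical. Each entry is kept as a list of right sides, so a multiply defined entry would show up as a list with more than one element.

```python
EPS = "ε"

FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}
# (left side, right side); an empty right side stands for ε.
PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]


def first_of_string(symbols):
    """FIRST(α) for a right side α, using the FIRST sets above."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})     # a terminal is its own FIRST set
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_table(productions):
    table = {}
    for A, rhs in productions:                 # step 1
        f = first_of_string(rhs)
        targets = f - {EPS}                    # step 2
        if EPS in f:
            targets |= FOLLOW[A]               # step 3 ($ is in FOLLOW[A] if needed)
        for a in targets:
            table.setdefault((A, a), []).append(rhs)
    return table                               # step 4: missing keys are errors
```

For this grammar build_table(PRODUCTIONS) yields exactly one production per defined entry, which is what makes the grammar suitable for predictive parsing.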
Consider next the grammar
S → iEtSS' / a          FIRST(S) = {i, a}
S' → eS / ε             FIRST(S') = {e, ε}
E → b                   FIRST(E) = {b}
FIRST(i) = {i}
FIRST(a) = {a}
FIRST(b) = {b}
FIRST(t) = {t}
FOLLOW(S) = {e, $}
FOLLOW(S') = {e, $}
FOLLOW(E) = {t}
The parsing table is:
NON-TERMINAL   a       b       e          i            t     $
S              S→a                        S→iEtSS'
S'                             S'→eS                         S'→ε
                               S'→ε
E                      E→b

→If G is left recursive or ambiguous, then M will have at least one multiply defined entry.
→In the above table, M[S', e] contains both S' → eS and S' → ε, since FOLLOW(S') = {e, $}.
LL(1) GRAMMARS
A grammar whose parsing table has no multiply defined entries is said to be LL(1).
→The first 'L' stands for scanning the input from left to right.
→The second 'L' stands for producing a leftmost derivation.
→The '1' stands for using one input symbol of lookahead at each step to make parsing action decisions.
→LL(1) grammars have several distinctive properties: no ambiguous or left-recursive grammar can be LL(1).
→A grammar G is LL(1) iff whenever A → α / β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
NOTE: The grammar for arithmetic expressions above (with left recursion eliminated) is LL(1).
The CFG for if-then-else statements is not LL(1).
→For this reason, transform the grammar by eliminating all left recursion and then left factoring whenever possible.
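Both transformations can be sketched in Python. This is a simplified illustration: it handles only immediate left recursion (A → Aα), it performs a single pass of left factoring, and the new non-terminal is named by appending ' to the old one. The function names are hypothetical.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """A -> A a1 / b1  becomes  A -> b1 A',  A' -> a1 A' / ε."""
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [head]]
    others = [alt for alt in alternatives if alt[:1] != [head]]
    if not recursive:
        return {head: alternatives}
    fresh = head + "'"
    return {
        head: [alt + [fresh] for alt in others],
        fresh: [alt + [fresh] for alt in recursive] + [[]],   # [] is ε
    }


def left_factor(head, alternatives):
    """Factor the longest common prefix of alternatives that start alike."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {head: []}
    fresh = head
    for first_symbol, group in groups.items():
        if first_symbol is None or len(group) == 1:
            result[head].extend(group)       # nothing to factor
            continue
        prefix = []
        for column in zip(*group):           # compare symbols position by position
            if len(set(column)) != 1:
                break
            prefix.append(column[0])
        fresh += "'"
        result[head].append(prefix + [fresh])
        result[fresh] = [alt[len(prefix):] for alt in group]
    return result
```

Applied to E → E+T / T, the first function gives E → TE' and E' → +TE' / ε. Applied to S → iEtS / iEtSeS / a, the second gives S → iEtSS' / a and S' → ε / eS, the grammar whose table was built above.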
BOTTOM-UP PARSING:
→It is also known as shift-reduce parsing.
→An easy-to-implement form of shift-reduce parsing is operator-precedence parsing.
→A much more general method of shift-reduce parsing is called LR parsing.
→It attempts to construct a parse tree for an input string beginning at the leaves (bottom) and working up towards the root (top).
→At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production.
→If the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
For example, consider the grammar
S → aABe
A → Abc / b
B → d
The sentence abbcde can be reduced to S by the following steps:
abbcde
aAbcde
aAde
aABe
S
These reductions trace out the following rightmost derivation in reverse:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
Handles:
A handle of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.
->Formally, a handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
i.e., if S ⇒* αAw ⇒ αβw, then A → β in the position following α is a handle of αβw. The string w to the right of the handle contains only terminals.
->Note that there may be more than one handle if the grammar is ambiguous.
->If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
For example, with the grammar
S → aABe
A → Abc / b
B → d
the sentence abbcde can be reduced by the following steps, which follow the rightmost derivation S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde in reverse:
abbcde
aAbcde
aAde
aABe
S
So abbcde is a right-sentential form whose handle is A → b at position 2.
Likewise, aAbcde is a right-sentential form whose handle is A → Abc at position 2.
Note that we often simply say that the substring β is a handle of αβw.

Handle pruning

It is nothing but reducing the handle to the non-terminal on the left side of the production.
A rightmost derivation in reverse can be obtained by handle pruning.
Two problems must be solved when parsing by handle pruning. They are:
-To locate the substring to be reduced in a right-sentential form, and
-To determine what production to choose if there is more than one production with that substring on the right side.
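Handle pruning on the sentence abbcde can be replayed with a short Python sketch. It does not locate handles by itself (that is the first problem listed above); the handle sequence is supplied by hand. Positions are 1-based as in the text, the first two handles are the ones named above, and the last two positions are worked out for this sketch. The names HANDLES and prune_handles are hypothetical.

```python
# (position, (left side, right side)) for each reduction, in order.
HANDLES = [
    (2, ("A", "b")),
    (2, ("A", "Abc")),
    (3, ("B", "d")),
    (1, ("S", "aABe")),
]


def prune_handles(sentence, handles):
    """Return the right-sentential forms obtained by reducing each handle."""
    forms = [sentence]
    current = sentence
    for position, (lhs, rhs) in handles:
        start = position - 1                 # convert to a 0-based index
        if current[start:start + len(rhs)] != rhs:
            raise ValueError(f"{rhs!r} is not at position {position} of {current!r}")
        # Replace the handle by the non-terminal on the left side.
        current = current[:start] + lhs + current[start + len(rhs):]
        forms.append(current)
    return forms
```

Read from last to first, the forms returned by prune_handles("abbcde", HANDLES) are the steps of the rightmost derivation of abbcde.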
Shift Reduce Parsing:

->A shift-reduce parser is a bottom-up parser, which attempts to construct a parse tree for an input string beginning at the leaves and working up towards the root.
->At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production.
->The operations or actions of a shift-reduce parser are:
Shift: The next input symbol is shifted onto the top of the stack.
Reduce: The handle on top of the stack is reduced to the non-terminal on the left side of the production selected for reducing the handle.
Accept: The parser announces successful completion of parsing.
Error: The parser discovers that a syntax error has occurred and calls an error recovery routine.
IMPLEMENTATION:

• It uses a stack, which holds grammar symbols, and an input buffer, which holds the string to be parsed.
• The $ symbol is used to mark the bottom of the stack and also the right end of the input.
• Initially, the stack is empty and the string w is on the input, as shown below:
STACK     INPUT
$         w$
• The parser operates by shifting zero or more input symbols onto the stack until a handle β is on top of the stack.
• Once the handle appears on top of the stack, the parser reduces β to the left side of the appropriate production.
• The parser repeats the above steps until it has detected an error or until the stack contains the start symbol and the input is empty.
• After the parser enters this state, it halts and announces successful completion of parsing.
• The stack implementation for the input string id1+id2*id3 is as shown below:
STACK        INPUT           ACTION
$            id1+id2*id3$    shift
$id1         +id2*id3$       reduce by E → id
$E           +id2*id3$       shift
$E+          id2*id3$        shift
$E+id2       *id3$           reduce by E → id
$E+E         *id3$           shift
$E+E*        id3$            shift
$E+E*id3     $               reduce by E → id
$E+E*E       $               reduce by E → E*E
$E+E         $               reduce by E → E+E
$E           $               accept

*At one step the stack contains E+E and the next input symbol is *. Even though E+E could be reduced to E by E → E+E, the parser shifts * instead, so that E*E is reduced first and the reductions made are the reverse of the intended rightmost derivation. If E+E were reduced to E at that point, the parse would group id1+id2 before the multiplication.
CONFLICTS DURING SHIFT-REDUCE PARSING:
There are context-free grammars for which shift-reduce parsing cannot be used. For such a grammar the parser runs into conflicts. The conflicts are:
SHIFT/REDUCE CONFLICT:
The parser, even knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce.
REDUCE/REDUCE CONFLICT:
The parser, knowing the entire stack contents and the next input symbol, cannot decide which of several reductions to make.
FOR EX: the dangling-else grammar
<stmt> → if <expr> then <stmt> / if <expr> then <stmt> else <stmt>
STACK: $ … if <expr> then <stmt>        INPUT: else … $
We cannot tell whether if <expr> then <stmt> is the handle or not.
The parser cannot decide whether to shift else or to reduce the symbols on top of the stack,
so this is a shift/reduce conflict.
Simple LR Parsing (SLR):
Let (s0 X1 s1 X2 s2 … Xm sm, ai ai+1 … an $) be the current configuration of the parser. It represents the right-sentential form X1 X2 … Xm ai ai+1 … an. The next move is determined by sm, the state on top of the stack, and ai, the current input symbol:
1. If action[sm, ai] = shift s, the parser executes a shift move, entering the configuration (s0 X1 s1 X2 s2 … Xm sm ai s, ai+1 … an $).
2. If action[sm, ai] = reduce A → β, then the parser executes a reduce move, entering the configuration
(s0 X1 s1 X2 s2 … Xm-r sm-r A s, ai ai+1 … an $)
where s = goto[sm-r, A] and r is the length of β, the right side of the production.
The parser first pops 2r symbols off the stack (r state symbols and r grammar symbols), exposing state sm-r. The parser then pushes both A, the left side of the production, and s, the entry for goto[sm-r, A], onto the stack.
The current input symbol is not changed in a reduce move.
3. If action[sm, ai] = accept, parsing is completed.
4. If action[sm, ai] = error, the parser has discovered an error and calls an error recovery routine.
ALGORITHM:

Input: An input string w and an LR parsing table with functions action and goto for a grammar G.

Output: If w is in L(G), a bottom-up parse for w; otherwise an error indication.
Method: Initially the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
Set ip to point to the first symbol of w$.
Repeat
{
    Let s be the state on top of the stack and a the symbol pointed to by ip;
    If action[s, a] = shift s' then
    {
        Push a then s' on top of the stack. Advance ip to the next input symbol.
    }
    Else if action[s, a] = reduce A → β then
    {
        Pop 2*|β| symbols off the stack
        Let s' be the state now on top of the stack
        Push A then goto[s', A] on top of the stack;
        Output the production A → β
    }
    Else if action[s, a] = accept then
        Return
    Else
        Error()
}
End.
For example:-
1. E->E+T
2. E->T
3. T->T*F
4. T->F
5. F->(E)
6. F->id
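In the table below, si means shift and stack state i, rj means reduce by production number j, acc means accept, and a blank entry means error.
STATE   ACTION                                GOTO
        id    +     *     (     )     $       E    T    F
0       s5                s4                  1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                  8    2    3
5             r6    r6          r6    r6
6       s5                s4                       9    3
7       s5                s4                            10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5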

On the input id*id+id, the sequence of stack and input contents is:
STACK              INPUT        ACTION
0                  id*id+id$    shift
0 id 5             *id+id$      reduce by F->id
0 F 3              *id+id$      reduce by T->F
0 T 2              *id+id$      shift
0 T 2 * 7          id+id$       shift
0 T 2 * 7 id 5     +id$         reduce by F->id
0 T 2 * 7 F 10     +id$         reduce by T->T*F
0 T 2              +id$         reduce by E->T
0 E 1              +id$         shift
0 E 1 + 6          id$          shift
0 E 1 + 6 id 5     $            reduce by F->id
0 E 1 + 6 F 3      $            reduce by T->F
0 E 1 + 6 T 9      $            reduce by E->E+T
0 E 1              $            accept
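The LR parsing algorithm and the table above can be combined into a short Python sketch. The dictionaries ACTION, GOTO and PRODUCTIONS transcribe the table and the numbered grammar; their layout, the function name lr_parse and the use of SyntaxError for Error() are assumptions of this sketch.

```python
# Production number -> (left side, length of right side).
PRODUCTIONS = {
    1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
    4: ("T", 1), 5: ("F", 3), 6: ("F", 1),
}
REDUCE_ON = ("+", "*", ")", "$")
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {a: "r4" for a in REDUCE_ON},
    4: {"id": "s5", "(": "s4"},
    5: {a: "r6" for a in REDUCE_ON},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {a: "r3" for a in REDUCE_ON},
    11: {a: "r5" for a in REDUCE_ON},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the numbers of the productions used, in order of reduction."""
    stack = [0]                       # states and grammar symbols alternate
    buf = list(tokens) + ["$"]
    ip = 0
    output = []
    while True:
        s, a = stack[-1], buf[ip]
        act = ACTION[s].get(a)
        if act is None:
            raise SyntaxError(f"no action for state {s} on {a!r}")
        if act == "acc":
            return output
        if act[0] == "s":             # shift: push a, then the new state
            stack += [a, int(act[1:])]
            ip += 1
        else:                         # reduce by production number act[1:]
            number = int(act[1:])
            lhs, length = PRODUCTIONS[number]
            del stack[len(stack) - 2 * length:]     # pop 2*|β| symbols
            stack += [lhs, GOTO[stack[-1]][lhs]]    # push A, then goto[s', A]
            output.append(number)
```

For the input id*id+id, lr_parse(["id", "*", "id", "+", "id"]) returns [6, 4, 6, 3, 2, 6, 4, 1], the reductions in the trace above. Read backwards, this list is a rightmost derivation of the input.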
