Parsing

Syntax Analysis
Position of a Parser in the Compiler Model

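Figure: Source program → Lexical analyzer → (token, tokenval) → Parser and rest of front-end → Intermediate representation. The parser asks the lexical analyzer to "get next token", and both use the symbol table. Lexical errors are detected by the lexical analyzer; syntax errors and semantic errors by the parser and the rest of the front-end.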
A Parser
Figure: the parser takes a context-free grammar G and a token stream s (from the lexer) and answers "Yes" if s ∈ L(G), "No" otherwise, producing error messages.
Syntax analyzers (parsers) are CFG acceptors that also output the corresponding derivation when the token stream is accepted.
Various kinds: LL(k), LR(k), SLR, LALR
The Parser
A parser implements a C-F grammar. The role of the parser is twofold:
1. To check syntax (= string recognizer)
   o And to report syntax errors accurately
2. To invoke semantic actions
   o For static semantics checking, e.g. type checking of expressions, functions, etc.
   o For syntax-directed translation of the source code to an intermediate representation
Parsing
Universal (any C-F grammar)
o Cocke-Younger-Kasami
o Earley
Top-down (C-F grammar with restrictions)
o Recursive descent (predictive parsing)
o LL (Left-to-right, Leftmost derivation) methods
Bottom-up (C-F grammar with restrictions)
o Operator precedence parsing
o LR (Left-to-right, Rightmost derivation) methods
   SLR, canonical LR, LALR
Top Down parsing

o Start from S (the start symbol)
o Use productions to derive a sequence of tokens
o For arbitrary strings α, β and γ, and for a production A → β:
o A single step of the derivation is αAγ ⇒ αβγ (substitute β for A)
Example: S ⇒ E + S ⇒ … ⇒ (S + E) + E ⇒ (E + S + E) + E
Parsing Top-Down
Goal: construct a leftmost derivation of the string while reading in the sequential token stream
Grammar: S → E + S | E
         E → num | (S)

Partly-derived string   Lookahead   Input (parsed | unparsed)
E + S                   (           | (1+2+(3+4))+5
(S) + S                 1           ( | 1+2+(3+4))+5
(E+S)+S                 1           ( | 1+2+(3+4))+5
(1+S)+S                 2           (1+ | 2+(3+4))+5
(1+E+S)+S               2           (1+ | 2+(3+4))+5
(1+2+S)+S               (           (1+2+ | (3+4))+5
(1+2+E)+S               (           (1+2+ | (3+4))+5
(1+2+(S))+S             3           (1+2+( | 3+4))+5
(1+2+(E+S))+S           3           (1+2+( | 3+4))+5
...
Problem with Top-Down Parsing

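Want to decide which production to apply based on the next symbol
Grammar: S → E + S | E
         E → num | (S)
Ex1: "(1)":   S ⇒ E ⇒ (S) ⇒ (E) ⇒ (1)
Ex2: "(1)+2": S ⇒ E + S ⇒ (S) + S ⇒ (E) + S ⇒ (1) + S ⇒ (1) + E ⇒ (1) + 2
How did you know to pick E + S in Ex2? If you had picked E, followed by E → (S), you couldn't parse it. Both inputs start with the same lookahead "(", so one look-ahead symbol cannot tell the two S productions apart.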
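The Grammar Is the Problem
This grammar cannot be parsed top-down with only a single look-ahead symbol!
S → E + S | E
E → num | (S)
o LL(1) = Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
o Is it LL(k) for some k? No: the E in front of a "+" can be arbitrarily long (nested parentheses), so no fixed number of look-ahead symbols decides between E + S and E.
o Instead, rewrite the grammar to allow top-down parsing: create an LL(1) grammar for the same language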
Making a Grammar LL(1)

Original grammar:      New grammar:
S → E + S              S → E S'
S → E                  S' → ε
E → num                S' → + S
E → (S)                E → num
                       E → (S)
o Problem: can't decide which S production to apply until we see the symbol after the first expression
o Left-factoring: factor out the common prefix of the S productions and add a new non-terminal S' at the decision point; S' derives (+S)*
o Also: convert left recursion to right recursion
Parsing with New Grammar

Grammar: S → E S'
         S' → ε | + S
         E → num | (S)
Input: (1+2+(3+4))+5

Partly-derived string     Lookahead
E S'                      (
(S)S'                     1
(ES')S'                   1
(1S')S'                   +
(1+ES')S'                 2
(1+2S')S'                 +
(1+2+S)S'                 (
(1+2+ES')S'               (
(1+2+(S)S')S'             3
(1+2+(ES')S')S'           3
(1+2+(3S')S')S'           +
(1+2+(3+ES')S')S'         4
...
Predictive Parsing
LL(1) grammar:
o For a given non-terminal, the lookahead symbol uniquely determines the production to apply
o Top-down parsing = predictive parsing
o Driven by a predictive parsing table of non-terminals × terminals → productions
Parsing with Table

Grammar: S → E S'
         S' → ε | + S
         E → num | (S)
Trace on (1+2+(3+4))+5, lookahead in brackets: E S' [(], (S)S' [1], (ES')S' [1], (1S')S' [+], (1+ES')S' [2], (1+2S')S' [+], ...

Predictive parse table:
        num       +         (         )         $
S       → E S'              → E S'
S'                → + S               → ε       → ε
E       → num               → (S)
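The table-driven method can be sketched in C as a driver loop over an explicit stack: a non-terminal on top is expanded using the table entry for the lookahead, and a terminal on top must match the input. This is a minimal sketch, not a production parser; the token codes, the single-character lexer next_token() (a run of digits is one num token), the fixed stack size, and the name ll1_parse() are illustrative assumptions. SP stands for S'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Terminals 0..4 (END is $), then non-terminals; SP stands for S'. */
enum { NUM, PLUS, LP, RP, END, S, SP, E };
#define STACK_MAX 256

/* Right-hand sides, each terminated by -1. */
static const int RHS[5][4] = {
    {E, SP, -1},        /* 0: S  -> E S'    */
    {-1},               /* 1: S' -> epsilon */
    {PLUS, S, -1},      /* 2: S' -> + S     */
    {NUM, -1},          /* 3: E  -> num     */
    {LP, S, RP, -1},    /* 4: E  -> ( S )   */
};

/* TABLE[non-terminal - S][terminal]: production number, -1 = error. */
static const int TABLE[3][5] = {
    /*         num   +   (   )   $  */
    /* S  */ {  0, -1,  0, -1, -1 },
    /* S' */ { -1,  2, -1,  1,  1 },
    /* E  */ {  3, -1,  4, -1, -1 },
};

/* Hypothetical lexer: a run of digits is one num; -1 for unknown chars. */
static int next_token(const char **p) {
    while (**p == ' ') (*p)++;
    if (**p == '\0') return END;
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p)) (*p)++;
        return NUM;
    }
    switch (*(*p)++) {
    case '+': return PLUS;
    case '(': return LP;
    case ')': return RP;
    default:  return -1;
    }
}

/* Returns true iff the input string is in L(G). */
bool ll1_parse(const char *input) {
    int stack[STACK_MAX];
    int sp = 0;
    stack[sp++] = END;                  /* bottom marker $ */
    stack[sp++] = S;                    /* start symbol    */
    int tok = next_token(&input);
    while (sp > 0) {
        int top = stack[--sp];
        if (top < S) {                  /* terminal or $: must match input */
            if (top != tok) return false;
            if (top == END) return true;
            tok = next_token(&input);
        } else {                        /* non-terminal: consult the table */
            int p = tok < 0 ? -1 : TABLE[top - S][tok];
            if (p < 0) return false;
            int n = 0;
            while (RHS[p][n] != -1) n++;
            for (int i = n - 1; i >= 0; i--) {   /* push RHS reversed */
                assert(sp < STACK_MAX);
                stack[sp++] = RHS[p][i];
            }
        }
    }
    return false;
}
```

For example, ll1_parse("(1+2+(3+4))+5") accepts, while ll1_parse("1+") fails when S meets $, an empty entry in the S row.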
How to Implement This?

The table can be converted easily into a recursive-descent parser with 3 procedures, one per non-terminal: parse_S(), parse_S'(), and parse_E()
Recursive-Descent Parser
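    /* S -> E S'. token holds the lookahead token; NUM, ParseError()
       and parse_Sprime() (the procedure for S') are declared elsewhere. */
    void parse_S(void) {
        switch (token) {
        case NUM: parse_E(); parse_Sprime(); return;
        case '(': parse_E(); parse_Sprime(); return;
        default:  ParseError();
        }
    }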
Recursive-Descent Parser (2)

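    /* S' -> + S | ε. next_token() is an assumed lexer call
       that advances the lookahead. */
    void parse_Sprime(void) {
        switch (token) {
        case '+': token = next_token(); parse_S(); return;
        case ')': return;   /* S' -> ε, since ')' is in FOLLOW(S') */
        case EOF: return;   /* S' -> ε, since $ is in FOLLOW(S') */
        default:  ParseError();
        }
    }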
Recursive-Descent Parser (3)

    void parse_E(void) {            /* E -> num | ( S ) */
        switch (token) {
        case NUM: token = next_token(); return;
        case '(':
            token = next_token();
            parse_S();
            if (token != ')') ParseError();
            token = next_token();
            return;
        default: ParseError();
        }
    }
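Put together, the three procedures become a complete recursive-descent parser. The C sketch below is self-contained under a few assumptions that are not on the slides: a hypothetical lexer advance() that reads numbers with strtol, a failed flag in place of a ParseError() that aborts, parse_Sp() as the name of the S' procedure, and a sum returned by each procedure as a simple example of a semantic action.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

enum { NUM, PLUS, LP, RP, END, BAD };

static const char *src;   /* unread input                         */
static int token;         /* lookahead token                      */
static long token_val;    /* value of the most recent num token   */
static bool failed;       /* set by parse_error() instead of exit */

/* Hypothetical lexer: reads the next token into token/token_val. */
static void advance(void) {
    while (*src == ' ') src++;
    if (*src == '\0') { token = END; return; }
    if (isdigit((unsigned char)*src)) {
        char *end;
        token_val = strtol(src, &end, 10);
        src = end;
        token = NUM;
        return;
    }
    switch (*src++) {
    case '+': token = PLUS; break;
    case '(': token = LP; break;
    case ')': token = RP; break;
    default:  token = BAD; break;
    }
}

/* Records the error and forces END so no further input is consumed. */
static void parse_error(void) { failed = true; token = END; }

static long parse_S(void);

/* E -> num | ( S ) ; returns the value of the expression */
static long parse_E(void) {
    switch (token) {
    case NUM: { long v = token_val; advance(); return v; }
    case LP: {
        advance();
        long v = parse_S();
        if (token != RP) { parse_error(); return 0; }
        advance();
        return v;
    }
    default: parse_error(); return 0;
    }
}

/* S' -> + S | epsilon ; returns the sum of the remaining terms */
static long parse_Sp(void) {
    switch (token) {
    case PLUS: advance(); return parse_S();
    case RP: case END: return 0;     /* epsilon on FOLLOW(S') = { ), $ } */
    default: parse_error(); return 0;
    }
}

/* S -> E S' ; only num and ( may start an S */
static long parse_S(void) {
    if (token != NUM && token != LP) { parse_error(); return 0; }
    long first = parse_E();
    return first + parse_Sp();
}

/* Parses the whole input; on success stores the sum in *result. */
bool parse_sum(const char *input, long *result) {
    assert(input != NULL && result != NULL);
    src = input;
    failed = false;
    advance();
    long v = parse_S();
    if (!failed && token != END) parse_error();   /* e.g. a stray ')' */
    if (failed) return false;
    *result = v;
    return true;
}
```

With this sketch, parse_sum("(1+2+(3+4))+5", &v) succeeds with v == 15. The case labels in each procedure are exactly the non-empty entries of that non-terminal's table row.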
Call Tree = Parse Tree

Figure: the tree of calls to parse_S(), parse_S'() and parse_E() made while parsing (1+2+(3+4))+5, shown next to the parse tree for the same input; the two trees have the same shape.
How to Construct Parsing Tables?

Needed: an algorithm for automatically generating a predictive parse table from a grammar
S → E S'
S' → ε | + S
E → num | (S)
Parse table: ??
Constructing Parse Tables

A predictive parser can be constructed if:
o For every non-terminal, every lookahead symbol can be handled by at most 1 production
FIRST(β) for an arbitrary string β of terminals and non-terminals is:
o The set of symbols that might begin the fully expanded version of β
FOLLOW(X) for a non-terminal X is:
o The set of symbols that might follow the derivation of X in the input stream
Figure: a derivation with X in the middle; FIRST labels the tokens at the start of X's expansion, FOLLOW the tokens immediately after it.
Parse Table Entries

Consider a production X → β:
o Add → β to the X row for each symbol in FIRST(β)
o If β can derive ε (β is nullable), add → β for each symbol in FOLLOW(X)
The grammar is LL(1) if there are no conflicting entries
Computing Nullable
X is nullable if it can derive the empty string:
o If it derives ε directly (X → ε)
o If it has a production X → Y Z ... where all RHS symbols (Y, Z, ...) are nullable
Algorithm: assume all non-terminals are non-nullable, then apply the rules repeatedly until nothing changes
For S → E S', S' → ε | + S, E → num | (S): only S' is nullable
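The fixed-point algorithm for nullable fits in a few lines of C. In this sketch the representation is an assumption, not from the slides: symbols are small integers with terminals first, SP stands for S', and an ε-production is a production of length 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Terminals first, then non-terminals; SP stands for S'. */
enum { NUM, PLUS, LP, RP, S, SP, E, NSYM };
#define FIRST_NT S

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod G[] = {
    {S,  2, {E, SP}},       /* S  -> E S'    */
    {SP, 0, {0}},           /* S' -> epsilon */
    {SP, 2, {PLUS, S}},     /* S' -> + S     */
    {E,  1, {NUM}},         /* E  -> num     */
    {E,  3, {LP, S, RP}},   /* E  -> ( S )   */
};
#define NPROD (sizeof G / sizeof G[0])

bool nullable[NSYM];

/* Start with nothing nullable, then mark X when some production
   X -> Y1..Yk has every Yi nullable (k = 0 is X -> epsilon).
   Terminals are never marked, so they always stop the scan. */
void compute_nullable(void) {
    for (int x = 0; x < NSYM; x++) nullable[x] = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t p = 0; p < NPROD; p++) {
            assert(G[p].lhs >= FIRST_NT);   /* only non-terminals have rules */
            if (nullable[G[p].lhs]) continue;
            bool all = true;
            for (int i = 0; i < G[p].len; i++)
                if (!nullable[G[p].rhs[i]]) { all = false; break; }
            if (all) { nullable[G[p].lhs] = true; changed = true; }
        }
    }
}
```

After compute_nullable(), only nullable[SP] is true, matching the result on the slide.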
Computing FIRST
Determining FIRST(X):
o If X is a terminal, then add X to FIRST(X)
o If X → ε, then add ε to FIRST(X)
o If X is a non-terminal and X → Y1 Y2 ... Yk, then a is in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for j = 1 ... i-1 (i.e., it's possible to have an empty prefix Y1 ... Yi-1)
o If ε is in FIRST(Y1 Y2 ... Yk), then ε is in FIRST(X)
For S → E S', S' → ε | + S, E → num | (S):
FIRST(S) = { num, ( }
FIRST(S') = { ε, + }
FIRST(E) = { num, ( }
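The FIRST rules can likewise be iterated to a fixed point. In this C sketch, FIRST sets of terminals are bit masks, and ε-membership is not stored as a bit but read from the nullable result of the previous slide (hard-coded: only S' is nullable). The symbol encoding and all names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM, PLUS, LP, RP, S, SP, E, NSYM };   /* SP stands for S' */
#define FIRST_NT S
#define BIT(t) (1u << (t))

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod G[] = {
    {S,  2, {E, SP}},       /* S  -> E S'    */
    {SP, 0, {0}},           /* S' -> epsilon */
    {SP, 2, {PLUS, S}},     /* S' -> + S     */
    {E,  1, {NUM}},         /* E  -> num     */
    {E,  3, {LP, S, RP}},   /* E  -> ( S )   */
};
#define NPROD (sizeof G / sizeof G[0])

/* Result of the nullable computation: only S' derives epsilon.
   Epsilon membership in FIRST is read from here, not stored as a bit. */
static const bool nullable[NSYM] = { [SP] = true };

unsigned first[NSYM];   /* bit t set  <=>  terminal t is in FIRST(X) */

void compute_first(void) {
    for (int x = 0; x < NSYM; x++)
        first[x] = x < FIRST_NT ? BIT(x) : 0u;   /* FIRST(a) = { a } */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t p = 0; p < NPROD; p++) {
            assert(G[p].lhs >= FIRST_NT);
            unsigned add = 0;
            /* FIRST(Yi) is added only while Y1..Yi-1 are all nullable */
            for (int i = 0; i < G[p].len; i++) {
                add |= first[G[p].rhs[i]];
                if (!nullable[G[p].rhs[i]]) break;
            }
            unsigned old = first[G[p].lhs];
            first[G[p].lhs] = old | add;
            if (first[G[p].lhs] != old) changed = true;
        }
    }
}
```

The first pass finds FIRST(S') and FIRST(E); S only picks up { num, ( } on the second pass, once FIRST(E) is known, which is why the loop runs until nothing changes.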
Computing FOLLOW
Determining FOLLOW(X):
o If S is the start symbol, then $ is in FOLLOW(S)
o If A → αBβ, then add every symbol of FIRST(β) except ε to FOLLOW(B)
o If A → αB, or A → αBβ where ε is in FIRST(β), then add FOLLOW(A) to FOLLOW(B)
For S → E S', S' → ε | + S, E → num | (S), using FIRST(S) = { num, ( }, FIRST(S') = { ε, + }, FIRST(E) = { num, ( }:
FOLLOW(S) = { $, ) }
FOLLOW(S') = { $, ) }
FOLLOW(E) = { +, ), $ }
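One common way to apply the FOLLOW rules is to scan each right-hand side from right to left, carrying a "trailer" set of the terminals that can follow the current position. The C sketch below takes nullable and FIRST as hard-coded inputs from the previous slides; the bit-mask encoding and all names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM, PLUS, LP, RP, END, S, SP, E, NSYM };   /* END is $, SP is S' */
#define FIRST_NT S
#define BIT(t) (1u << (t))

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod G[] = {
    {S,  2, {E, SP}},       /* S  -> E S'    */
    {SP, 0, {0}},           /* S' -> epsilon */
    {SP, 2, {PLUS, S}},     /* S' -> + S     */
    {E,  1, {NUM}},         /* E  -> num     */
    {E,  3, {LP, S, RP}},   /* E  -> ( S )   */
};
#define NPROD (sizeof G / sizeof G[0])

/* Inputs taken from the previous slides. */
static const bool nullable[NSYM] = { [SP] = true };
static const unsigned first[NSYM] = {
    [NUM] = BIT(NUM), [PLUS] = BIT(PLUS), [LP] = BIT(LP), [RP] = BIT(RP),
    [END] = BIT(END),
    [S]  = BIT(NUM) | BIT(LP),
    [SP] = BIT(PLUS),            /* plus epsilon, recorded in nullable[] */
    [E]  = BIT(NUM) | BIT(LP),
};

unsigned follow[NSYM];   /* bit t set  <=>  terminal t is in FOLLOW(X) */

void compute_follow(void) {
    for (int x = 0; x < NSYM; x++) follow[x] = 0;
    follow[S] = BIT(END);                   /* $ follows the start symbol */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t p = 0; p < NPROD; p++) {
            assert(G[p].lhs >= FIRST_NT);
            /* Scan A -> Y1..Yk right to left; trailer = what can follow Yi. */
            unsigned trailer = follow[G[p].lhs];
            for (int i = G[p].len - 1; i >= 0; i--) {
                int y = G[p].rhs[i];
                if (y >= FIRST_NT && (follow[y] | trailer) != follow[y]) {
                    follow[y] |= trailer;
                    changed = true;
                }
                /* a nullable Y lets the old trailer pass through; otherwise
                   only FIRST(Y) can follow the symbol to its left */
                trailer = nullable[y] ? (trailer | first[y]) : first[y];
            }
        }
    }
}
```

The interesting case is FOLLOW(E): in S → E S', the trailer reaching E is FIRST(S') = { + } together with FOLLOW(S), because S' is nullable.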
Putting it all Together

FIRST(S) = { num, ( }      FOLLOW(S) = { $, ) }
FIRST(S') = { ε, + }       FOLLOW(S') = { $, ) }
FIRST(E) = { num, ( }      FOLLOW(E) = { +, ), $ }
For each production X → β: add → β to the X row for each symbol in FIRST(β), and, if β is nullable, for each symbol in FOLLOW(X)
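The table-entry rule can be applied mechanically. This C sketch hard-codes the nullable, FIRST and FOLLOW results listed above, fills in the table, and reports a conflict (an entry claimed by two productions, meaning the grammar is not LL(1)). Production numbers index the array G; the encoding and all names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM, PLUS, LP, RP, END, S, SP, E, NSYM };   /* END is $, SP is S' */
#define FIRST_NT S                      /* also the number of terminals */
#define NUM_NT (NSYM - FIRST_NT)
#define BIT(t) (1u << (t))

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod G[] = {
    {S,  2, {E, SP}},       /* 0: S  -> E S'    */
    {SP, 0, {0}},           /* 1: S' -> epsilon */
    {SP, 2, {PLUS, S}},     /* 2: S' -> + S     */
    {E,  1, {NUM}},         /* 3: E  -> num     */
    {E,  3, {LP, S, RP}},   /* 4: E  -> ( S )   */
};
#define NPROD (sizeof G / sizeof G[0])

/* nullable, FIRST and FOLLOW as listed on the slide. */
static const bool nullable[NSYM] = { [SP] = true };
static const unsigned first[NSYM] = {
    [NUM] = BIT(NUM), [PLUS] = BIT(PLUS), [LP] = BIT(LP), [RP] = BIT(RP),
    [END] = BIT(END),
    [S] = BIT(NUM) | BIT(LP), [SP] = BIT(PLUS), [E] = BIT(NUM) | BIT(LP),
};
static const unsigned follow[NSYM] = {
    [S] = BIT(RP) | BIT(END), [SP] = BIT(RP) | BIT(END),
    [E] = BIT(PLUS) | BIT(RP) | BIT(END),
};

int table[NUM_NT][FIRST_NT];   /* production number, or -1 for error */

/* FIRST of a right-hand side; *eps tells whether the whole RHS is nullable. */
static unsigned first_of_rhs(const Prod *p, bool *eps) {
    unsigned set = 0;
    for (int i = 0; i < p->len; i++) {
        set |= first[p->rhs[i]];
        if (!nullable[p->rhs[i]]) { *eps = false; return set; }
    }
    *eps = true;
    return set;
}

/* Fills one entry; returns false if another production already owns it. */
static bool put(int nt, int t, int prod) {
    assert(nt >= FIRST_NT && t < FIRST_NT);
    int *cell = &table[nt - FIRST_NT][t];
    if (*cell != -1 && *cell != prod) return false;
    *cell = prod;
    return true;
}

/* Returns true iff no entry is claimed twice, i.e. the grammar is LL(1). */
bool build_table(void) {
    for (int a = 0; a < NUM_NT; a++)
        for (int t = 0; t < FIRST_NT; t++) table[a][t] = -1;
    bool ok = true;
    for (size_t p = 0; p < NPROD; p++) {
        bool eps;
        unsigned set = first_of_rhs(&G[p], &eps);   /* FIRST(beta)           */
        if (eps) set |= follow[G[p].lhs];           /* nullable: FOLLOW(X)   */
        for (int t = 0; t < FIRST_NT; t++)
            if ((set & BIT(t)) && !put(G[p].lhs, t, (int)p)) ok = false;
    }
    return ok;
}
```

build_table() returns true for this grammar, and the filled table matches the one on the "Parsing with Table" slide: S' → ε lands in the ) and $ columns because its right-hand side is nullable and FOLLOW(S') = { $, ) }.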
Ambiguous Grammars
Construction of a predictive parse table for an ambiguous grammar results in conflicts
S → S + S | S * S | num
FIRST(S + S) = FIRST(S * S) = FIRST(num) = { num }, so all three productions compete for the same entry (S, num)
LL(1) Grammar
A grammar G is LL(1) if it is not left recursive and, for each collection of productions A → α1 | α2 | ... | αn for non-terminal A, the following holds:
1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
2. If αi ⇒* ε, then
   2.a. αj ⇏* ε for all i ≠ j
   2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all i ≠ j
Non-LL(1) Examples
Grammar                       Not LL(1) because
S → S a | a                   Left recursive
S → a S | a                   FIRST(a S) ∩ FIRST(a) ≠ ∅
S → a R | ε,  R → S | ε       For R: S ⇒* ε and ε ⇒* ε
S → a R a,  R → S | ε         For R: FIRST(S) ∩ FOLLOW(R) ≠ ∅
Impact of Ambiguity
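Different parse trees correspond to different evaluations! Thus, program meaning is not defined!!
Figure: two parse trees for 1 + 2 * 3. With * at the root the tree computes (1 + 2) * 3 = 9; with + at the root it computes 1 + (2 * 3) = 7.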
Can We Get Rid of Ambiguity?

Ambiguity is a function of the grammar, not the language!
A context-free language L is inherently ambiguous if all grammars for L are ambiguous
Every deterministic CFL has an unambiguous grammar
o So, no deterministic CFL is inherently ambiguous
o No inherently ambiguous programming languages have been invented
To construct a useful parser, we must devise an unambiguous grammar
Eliminating Ambiguity
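Ambiguity can often be eliminated by adding non-terminals and allowing recursion only on the right or only on the left
S → S + T | T
T → T * num | num
Figure: the unique parse tree for 1 + 2 * 3: S → S + T, where the left S → T → 1, and T → T * 3 with the inner T → 2
o The T non-terminal enforces precedence
o Left recursion gives left associativity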
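Recursive descent cannot call itself on a left-recursive rule such as S → S + T without looping forever. A standard workaround, swapped in here and not shown on the slides, is to parse S → S + T | T as T { + T } with a loop that folds results to the left, which keeps the left-associative grouping. This C sketch evaluates the grammar above; eval_expr() and its helpers are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

static const char *src;   /* unread input */
static bool failed;       /* set on the first syntax error */

static void skip_spaces(void) { while (*src == ' ') src++; }

/* num: one or more decimal digits */
static long parse_num(void) {
    skip_spaces();
    if (!isdigit((unsigned char)*src)) { failed = true; return 0; }
    char *end;
    long v = strtol(src, &end, 10);
    src = end;
    return v;
}

/* T -> T * num | num, parsed as num { * num }: folding to the left
   gives the same grouping as the left-recursive rule. */
static long parse_T(void) {
    long v = parse_num();
    skip_spaces();
    while (!failed && *src == '*') {
        src++;
        v *= parse_num();
        skip_spaces();
    }
    return v;
}

/* S -> S + T | T, parsed as T { + T }. T is one precedence level
   higher, so every * is applied before the surrounding +. */
static long parse_S(void) {
    long v = parse_T();
    skip_spaces();
    while (!failed && *src == '+') {
        src++;
        v += parse_T();
        skip_spaces();
    }
    return v;
}

/* Evaluates a complete expression; false on a syntax error. */
bool eval_expr(const char *input, long *result) {
    assert(input != NULL && result != NULL);
    src = input;
    failed = false;
    long v = parse_S();
    skip_spaces();
    if (failed || *src != '\0') return false;
    *result = v;
    return true;
}
```

With this sketch, eval_expr("1+2*3", &v) yields 7, the reading given by the unambiguous parse tree, not 9.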
A Closer Look at Eliminating Ambiguity

Precedence is enforced by:
o Introducing distinct non-terminals for each precedence level
o Specifying the operators of a given precedence level as the RHS of that level's production
o Accessing higher-precedence operators by referencing the next-higher precedence non-terminal
Associativity
An operator is either left, right or non-associative
o Left: a + b + c = (a + b) + c
o Right: a ^ b ^ c = a ^ (b ^ c)
o Non: a < b < c is illegal (thus undefined)
The position of the recursion relative to the operator dictates the associativity
o Left (right) recursion → left (right) associativity
o Non: don't be recursive; simply reference the next higher precedence non-terminal on both sides of the operator
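To see how the position of the recursion fixes associativity, this C sketch evaluates a small assumed grammar (the slides give no grammar for ^): S → S - T | T for left-associative subtraction, realized as a loop as in the usual left-recursion workaround, and T → num ^ T | num for right-associative exponentiation, realized as a genuine right-recursive call. ^ binds tighter than - because T is the higher-precedence non-terminal. All names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

static const char *src;   /* unread input */
static bool failed;       /* set on the first syntax error */

static void skip_spaces(void) { while (*src == ' ') src++; }

static long parse_num(void) {
    skip_spaces();
    if (!isdigit((unsigned char)*src)) { failed = true; return 0; }
    char *end;
    long v = strtol(src, &end, 10);
    src = end;
    return v;
}

static long ipow(long base, long e) {
    long r = 1;
    while (e-- > 0) r *= base;
    return r;
}

/* T -> num ^ T | num: the recursion is to the RIGHT of ^,
   so a ^ b ^ c groups as a ^ (b ^ c). */
static long parse_T(void) {
    long base = parse_num();
    skip_spaces();
    if (!failed && *src == '^') {
        src++;
        return ipow(base, parse_T());
    }
    return base;
}

/* S -> S - T | T: the recursion is to the LEFT of -, so a - b - c
   groups as (a - b) - c; recursive descent realizes it as a loop. */
static long parse_S(void) {
    long v = parse_T();
    skip_spaces();
    while (!failed && *src == '-') {
        src++;
        v -= parse_T();
        skip_spaces();
    }
    return v;
}

bool eval_assoc(const char *input, long *result) {
    assert(input != NULL && result != NULL);
    src = input;
    failed = false;
    long v = parse_S();
    skip_spaces();
    if (failed || *src != '\0') return false;
    *result = v;
    return true;
}
```

Here 8-3-2 evaluates to (8 - 3) - 2 = 3 and 2^3^2 to 2 ^ (3 ^ 2) = 512.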
Error Handling
A good compiler should assist in identifying and locating errors
o Lexical errors: important, compiler can easily recover and continue
o Syntax errors: most important for compiler, can almost always recover
o Static semantic errors: important, can sometimes recover
o Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required
o Logical errors: hard or impossible to detect
Error Recovery Strategies

Panic mode
o Discard input until a token in a set of designated synchronizing tokens is found
Phrase-level recovery
o Perform local correction on the input to repair the error
Error productions
o Augment grammar with productions for erroneous constructs
Global correction
o Choose a minimal sequence of changes to obtain a global least-cost correction
Panic Mode Recovery

Add synchronizing actions to undefined entries based on FOLLOW
Pro: Can be automated
Cons: Error messages are needed

Grammar: E → T ER;  ER → + T ER | ε;  T → F TR;  TR → * F TR | ε;  F → ( E ) | id
FOLLOW(E) = { ) $ }   FOLLOW(ER) = { ) $ }   FOLLOW(T) = { + ) $ }   FOLLOW(TR) = { + ) $ }   FOLLOW(F) = { + * ) $ }

       id          +             *             (           )         $
E      E → T ER                                E → T ER    synch     synch
ER                 ER → + T ER                             ER → ε    ER → ε
T      T → F TR    synch                       T → F TR    synch     synch
TR                 TR → ε        TR → * F TR               TR → ε    TR → ε
F      F → id      synch         synch         F → ( E )   synch     synch

synch: the driver pops the current non-terminal A and skips input till a synch token
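The driver side of panic-mode recovery can be sketched in C with the table above. The exact driver behaviour is an assumption, since the slide only gives the table; this sketch follows a common textbook convention: on an empty entry, skip the input token; on a synch entry, pop the non-terminal; on a terminal mismatch, pop the terminal as if it had been present. The lexer (a run of letters is one id token), returning the error count, and all names are illustrative.

```c
#include <assert.h>
#include <ctype.h>

/* Terminals (END is $), then non-terminals starting at E. */
enum { ID, PLUS, STAR, LP, RP, END, E, ER, T, TR, F };
#define BLANK (-1)   /* empty entry: skip the input token */
#define SYNCH (-2)   /* synch entry: pop the non-terminal */
#define STACK_MAX 256

static const int RHS[8][4] = {   /* each terminated by -1 */
    {T, ER, -1},          /* 0: E  -> T ER    */
    {PLUS, T, ER, -1},    /* 1: ER -> + T ER  */
    {-1},                 /* 2: ER -> epsilon */
    {F, TR, -1},          /* 3: T  -> F TR    */
    {STAR, F, TR, -1},    /* 4: TR -> * F TR  */
    {-1},                 /* 5: TR -> epsilon */
    {LP, E, RP, -1},      /* 6: F  -> ( E )   */
    {ID, -1},             /* 7: F  -> id      */
};

static const int TABLE[5][6] = {
    /*          id     +      *      (      )      $    */
    /* E  */ {  0,   BLANK, BLANK,   0,   SYNCH, SYNCH },
    /* ER */ { BLANK,  1,   BLANK, BLANK,   2,     2   },
    /* T  */ {  3,   SYNCH, BLANK,   3,   SYNCH, SYNCH },
    /* TR */ { BLANK,  5,     4,   BLANK,   5,     5   },
    /* F  */ {  7,   SYNCH, SYNCH,   6,   SYNCH, SYNCH },
};

/* Hypothetical lexer: a run of letters is one id; -1 for unknown chars. */
static int next_token(const char **p) {
    while (**p == ' ') (*p)++;
    if (**p == '\0') return END;
    if (isalpha((unsigned char)**p)) {
        while (isalpha((unsigned char)**p)) (*p)++;
        return ID;
    }
    switch (*(*p)++) {
    case '+': return PLUS;
    case '*': return STAR;
    case '(': return LP;
    case ')': return RP;
    default:  return -1;
    }
}

/* Parses the whole input with panic-mode recovery and returns the
   number of errors reported (0 means the input was accepted as is). */
int parse_with_recovery(const char *input) {
    int stack[STACK_MAX];
    int sp = 0, errors = 0;
    stack[sp++] = END;                      /* bottom marker $ */
    stack[sp++] = E;                        /* start symbol    */
    int tok = next_token(&input);
    for (;;) {
        if (tok < 0) {                      /* unknown character: report, skip */
            errors++;
            tok = next_token(&input);
            continue;
        }
        int top = stack[sp - 1];
        if (top == END) {
            if (tok != END) {               /* trailing input: report once */
                errors++;
                while (tok != END) tok = next_token(&input);
            }
            return errors;
        }
        if (top < E) {                      /* terminal on top of the stack */
            if (top == tok) tok = next_token(&input);
            else errors++;                  /* pop it as if it had been present */
            sp--;
            continue;
        }
        int entry = TABLE[top - E][tok];
        if (entry == BLANK) {               /* skip the offending input token */
            errors++;
            if (tok == END) sp--;           /* nothing left to skip: pop instead */
            else tok = next_token(&input);
        } else if (entry == SYNCH) {        /* pop the non-terminal */
            errors++;
            sp--;
        } else {                            /* expand: push the RHS reversed */
            sp--;
            int n = 0;
            while (RHS[entry][n] != -1) n++;
            for (int i = n - 1; i >= 0; i--) {
                assert(sp < STACK_MAX);
                stack[sp++] = RHS[entry][i];
            }
        }
    }
}
```

For id * + id the driver reports one error (F meets + in a synch entry and is popped) and then finishes the parse; for id id the second id hits the empty (TR, id) entry and is skipped.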
Phrase-Level Recovery
Change the input stream by inserting missing tokens
For example: id id is changed into id * id
Pro: Can be automated
Cons: Recovery not always intuitive

Parse table: the same as for panic mode recovery, except that the empty entry (TR, id) now holds "insert *"; after the insertion, parsing can continue from there
insert *: the driver inserts the missing * and retries the production
Error Productions
E → T ER
ER → + T ER | ε
T → F TR
TR → * F TR | ε
F → ( E ) | id
Add the error production TR → F TR to ignore a missing *, e.g.: id id
Pro: Powerful recovery method
Cons: Cannot be automated
Parse table: the same as for panic mode recovery, except that the entry (TR, id) holds TR → F TR
