A Parser
Figure: a context-free grammar G and a token stream s (from the lexer) are the inputs to the Parser.
Syntax analyzers (parsers) = CFG acceptors that also output the corresponding derivation when the token stream is accepted
Various kinds: LL(k), LR(k), SLR, LALR
The Parser
A parser implements a C-F grammar. The role of the parser is twofold:
1. To check syntax (= string recognizer)
Parsing
Universal (any C-F grammar)
o Cocke-Younger-Kasami
o Earley
Top-down
o Recursive descent (predictive parsing)
o LL (Left-to-right, Leftmost derivation) methods
Example
(E + S + E) + E
Parsing Top-Down
Goal: construct a leftmost derivation of the string while reading in the sequential token stream
S → E + S | E
E → num | (S)

Partly-derived string  Lookahead  Parsed part  Unparsed part
E + S                  (                       (1+2+(3+4))+5
(S) + S                1          (            1+2+(3+4))+5
(E+S)+S                1          (            1+2+(3+4))+5
(1+S)+S                2          (1+          2+(3+4))+5
(1+E+S)+S              2          (1+          2+(3+4))+5
(1+2+S)+S              (          (1+2+        (3+4))+5
(1+2+E)+S              (          (1+2+        (3+4))+5
(1+2+(S))+S            3          (1+2+(       3+4))+5
(1+2+(E+S))+S          3          (1+2+(       3+4))+5
...
How did you know to pick E + S in Ex2? If you had picked E, followed by (S), you couldn't parse it.
Grammar Is the Problem
This grammar cannot be parsed top-down with only a single look-ahead symbol!
LL(1) = Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
Is it LL(k) for some k? If yes, then we can rewrite the grammar to allow top-down parsing: create an LL(1) grammar for the same language.
Original grammar    Left-factored grammar
S → E + S           S → E S'
S → E               S' → ε
E → num             S' → + S
E → (S)             E → num
                    E → (S)
Problem: Can't decide which S production to apply until we see the symbol after the first expression.
Left-factoring: factor out the common S prefix and add a new non-terminal S' at the decision point. S' derives (+S)*.
Also: convert left recursion to right recursion.
Partly-derived string  Lookahead  Parsed part  Unparsed part
ES'                    (                       (1+2+(3+4))+5
(S)S'                  1          (            1+2+(3+4))+5
(ES')S'                1          (            1+2+(3+4))+5
(1S')S'                +          (1           +2+(3+4))+5
(1+ES')S'              2          (1+          2+(3+4))+5
(1+2S')S'              +          (1+2         +(3+4))+5
(1+2+S)S'              (          (1+2+        (3+4))+5
(1+2+ES')S'            (          (1+2+        (3+4))+5
(1+2+(S)S')S'          3          (1+2+(       3+4))+5
(1+2+(ES')S')S'        3          (1+2+(       3+4))+5
(1+2+(3S')S')S'        +          (1+2+(3      +4))+5
(1+2+(3+ES')S')S'      4          (1+2+(3+     4))+5
...
Predictive Parsing
LL(1) grammar:
o For a given non-terminal, the lookahead symbol uniquely determines the production to apply
o Top-down parsing = predictive parsing
o Driven by a predictive parsing table of non-terminals × terminals → productions
      num      +        (        )        $
S     → ES'             → ES'
S'             → +S              → ε      → ε
E     → num             → (S)
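The table can drive a parser directly, with an explicit stack in place of recursion. The C below is a minimal sketch under these assumptions: single digits in the input stand for num, the character 'P' stands for S', and the input string ends with '$'. The names `token`, `table`, and `ll1_parse` are hypothetical. The stack starts as $ S. When a non-terminal is on top, it is replaced by the right-hand side from its table cell, pushed in reverse. When a terminal is on top, it must match the lookahead.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed encoding: 'P' stands for S', 'n' for num; single digits in the
   input are num tokens and the input must end with '$'. */
static char token(char c) { return isdigit((unsigned char)c) ? 'n' : c; }

/* Predictive parsing table: RHS to push, "" for S' -> eps, NULL if empty. */
static const char *table(char X, char a) {
    switch (X) {
    case 'S':
        if (a == 'n' || a == '(') return "EP";      /* S -> E S' */
        break;
    case 'P':
        if (a == '+') return "+S";                  /* S' -> + S */
        if (a == ')' || a == '$') return "";        /* S' -> eps */
        break;
    case 'E':
        if (a == 'n') return "n";                   /* E -> num */
        if (a == '(') return "(S)";                 /* E -> ( S ) */
        break;
    }
    return NULL;                                    /* empty cell: syntax error */
}

/* Returns true if the token string can be derived from S. */
static bool ll1_parse(const char *input) {
    char stack[128];
    int top = 0;
    stack[top++] = '$';
    stack[top++] = 'S';                             /* start symbol */
    const char *in = input;
    while (top > 0) {
        char X = stack[--top];                      /* pop top of stack */
        char a = token(*in);                        /* current lookahead */
        if (X == 'S' || X == 'P' || X == 'E') {
            const char *rhs = table(X, a);
            if (rhs == NULL) return false;
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--) {  /* push reversed */
                assert(top < (int)sizeof stack);
                stack[top++] = rhs[k];
            }
        } else if (X != a) {
            return false;                           /* terminal mismatch */
        } else if (X == '$') {
            return true;                            /* both at $: accept */
        } else {
            in++;                                   /* matched a terminal */
        }
    }
    return false;
}
```

With these assumptions, "(1+2+(3+4))+5$" is accepted, and "1++2$" is rejected at the second '+', because the S row has no entry under +.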
Recursive-Descent Parser
    void parse_S(void) {
        switch (token) {                              /* token: the lookahead token */
        case num: parse_E(); parse_Sprime(); return;  /* S -> E S'  (parse_Sprime parses S') */
        case '(': parse_E(); parse_Sprime(); return;  /* S -> E S' */
        default:  ParseError();
        }
    }
S → E S'
S' → ε | + S
E → number | (S)
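The remaining functions follow the same pattern: one function per non-terminal, switching on the lookahead token. The C below is one possible completion, under several assumptions:

- every character is a token, a single digit is a num, and the end of the string plays the role of $;
- each function returns the sum it parsed, so the result can be checked;
- `token()`, `NUM`, `END`, `parse_Sprime`, and `parse` are hypothetical names.

parse_Sprime takes S' → ε only when the lookahead is in FOLLOW(S') = {), $}.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

enum { NUM = 256, END = 257 };     /* token codes outside the char range */

static const char *p;              /* points at the lookahead token */
static bool failed;

static int token(void) {
    return *p == '\0' ? END : isdigit((unsigned char)*p) ? NUM : *p;
}
static void ParseError(void) { failed = true; }

static int parse_S(void);

static int parse_E(void) {                      /* E -> num | ( S ) */
    switch (token()) {
    case NUM: return *p++ - '0';
    case '(': {
        p++;
        int v = parse_S();
        if (token() == ')') p++; else ParseError();
        return v;
    }
    default: ParseError(); return 0;
    }
}

static int parse_Sprime(void) {                 /* S' -> + S | eps */
    switch (token()) {
    case '+': p++; return parse_S();
    case ')': case END: return 0;               /* lookahead in FOLLOW(S') */
    default: ParseError(); return 0;
    }
}

static int parse_S(void) {                      /* S -> E S' */
    switch (token()) {
    case NUM: case '(': { int v = parse_E(); return v + parse_Sprime(); }
    default: ParseError(); return 0;
    }
}

static bool parse(const char *input, int *value) {
    p = input;
    failed = false;
    *value = parse_S();
    if (token() != END) ParseError();           /* trailing input */
    return !failed;
}
```

For example, parse("(1+2+(3+4))+5", &v) succeeds with v = 15. The input "(1+2" is rejected because parse_E finds the end of input where it expects ')'.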
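A predictive parser can be constructed if, for every non-terminal, every lookahead symbol can be handled by at most 1 production.
FIRST(β), for an arbitrary string β of terminals and non-terminals: the set of symbols that might begin the fully expanded version of β
FOLLOW(X), for a non-terminal X: the set of symbols that might follow the derivation of X in the input stream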
Computing Nullable
X is nullable if it can derive the empty string:
o If it derives ε directly (X → ε)
o If it has a production X → YZ ... where all RHS symbols (Y, Z) are nullable
Algorithm: assume all non-terminals are non-nullable, apply the rules repeatedly until no change
Example: for S → E S', S' → ε | + S, E → number | (S), only S' is nullable
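The fixed-point algorithm can be sketched in C for the example grammar. The encoding is an assumption made for illustration: 'P' stands for S', 'n' for number, and an empty right-hand-side string encodes an ε-production.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: S -> E S', S' -> eps | + S, E -> number | ( S ) */
typedef struct { char lhs; const char *rhs; } Production;
static const Production grammar[] = {
    {'S', "EP"}, {'P', ""}, {'P', "+S"}, {'E', "n"}, {'E', "(S)"},
};
#define NPROD ((int)(sizeof grammar / sizeof grammar[0]))

static bool is_nt(char c) { return c == 'S' || c == 'P' || c == 'E'; }

static bool nullable[128];   /* nullable[X]: X =>* epsilon */

static void compute_nullable(void) {
    for (int x = 0; x < 128; x++) nullable[x] = false;   /* assume non-nullable */
    bool changed = true;
    while (changed) {                                    /* until no change */
        changed = false;
        for (int i = 0; i < NPROD; i++) {
            bool all = true;                             /* vacuously true for X -> eps */
            for (const char *y = grammar[i].rhs; *y; y++)
                if (!is_nt(*y) || !nullable[(unsigned char)*y]) { all = false; break; }
            unsigned char X = (unsigned char)grammar[i].lhs;
            if (all && !nullable[X]) { nullable[X] = true; changed = true; }
        }
    }
}
```

The first pass marks S' through S' → ε. The second pass changes nothing, because S and E each need E or a terminal to derive anything.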
Computing FIRST
Determining FIRST(X)
o If X is a terminal, then add X to FIRST(X)
o If X → ε, then add ε to FIRST(X)
o If X is a nonterminal and X → Y1Y2...Yk, then a is in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for j = 1...i-1 (i.e., it's possible to have an empty prefix Y1 ... Yi-1)
o If ε is in FIRST(Y1Y2...Yk), then ε is in FIRST(X)
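The FIRST rules can also be applied until nothing changes. This sketch keeps the same hypothetical encoding ('P' for S', 'n' for num, "" for ε). It records ε-membership in a separate nullable array rather than as a set element. For this grammar it should give FIRST(S) = FIRST(E) = {num, (} and FIRST(S') = {ε, +}.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: 'P' stands for S', 'n' for num, "" is an epsilon RHS. */
typedef struct { char lhs; const char *rhs; } Production;
static const Production grammar[] = {
    {'S', "EP"}, {'P', ""}, {'P', "+S"}, {'E', "n"}, {'E', "(S)"},
};
#define NPROD ((int)(sizeof grammar / sizeof grammar[0]))

static bool is_nt(char c) { return c == 'S' || c == 'P' || c == 'E'; }

static bool first[128][128];  /* first[X][a]: terminal a is in FIRST(X) */
static bool nullable[128];    /* nullable[X]: epsilon is in FIRST(X) */

/* Sets *flag; returns true if it was not already set. */
static bool add(bool *flag) { bool was = *flag; *flag = true; return !was; }

static void compute_first(void) {
    for (bool changed = true; changed; ) {        /* iterate to a fixed point */
        changed = false;
        for (int i = 0; i < NPROD; i++) {
            unsigned char X = (unsigned char)grammar[i].lhs;
            const char *y = grammar[i].rhs;       /* X -> Y1 Y2 ... Yk */
            for (; *y; y++) {
                unsigned char Y = (unsigned char)*y;
                if (!is_nt(*y)) {                 /* terminal: FIRST(Y) = {Y} */
                    changed |= add(&first[X][Y]);
                    break;
                }
                for (int a = 0; a < 128; a++)     /* FIRST(Yi) minus eps */
                    if (first[Y][a]) changed |= add(&first[X][a]);
                if (!nullable[Y]) break;          /* prefix can no longer vanish */
            }
            if (*y == '\0')                       /* all Yi nullable, or X -> eps */
                changed |= add(&nullable[X]);
        }
    }
}
```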
Computing FOLLOW
Determining FOLLOW(X)
o If S is the start symbol, then $ is in FOLLOW(S)
o If A → αBβ, then add everything in FIRST(β) except ε to FOLLOW(B)
o If A → αB, or A → αBβ and ε is in FIRST(β), then add FOLLOW(A) to FOLLOW(B)
Example: FIRST(S) = {num, (}   FIRST(S') = {ε, +}   FIRST(E) = {num, (}
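The FOLLOW rules can be sketched as another fixed-point iteration over the same hypothetical encoding ('P' for S', 'n' for num). FOLLOW needs FIRST and nullable, so the sketch repeats that computation to stay self-contained. For this grammar it should produce FOLLOW(S) = FOLLOW(S') = {), $} and FOLLOW(E) = {+, ), $}.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: 'P' stands for S', 'n' for num, "" is an epsilon RHS. */
typedef struct { char lhs; const char *rhs; } Production;
static const Production grammar[] = {
    {'S', "EP"}, {'P', ""}, {'P', "+S"}, {'E', "n"}, {'E', "(S)"},
};
#define NPROD ((int)(sizeof grammar / sizeof grammar[0]))

static bool is_nt(char c) { return c == 'S' || c == 'P' || c == 'E'; }

static bool nullable[128];        /* epsilon in FIRST(X) */
static bool first[128][128];      /* first[X][a]: a in FIRST(X) */
static bool follow[128][128];     /* follow[X][a]: a in FOLLOW(X) */

static bool add(bool *flag) { bool was = *flag; *flag = true; return !was; }

/* Adds every member of `from` to `to`; returns true if `to` grew. */
static bool add_all(bool *to, const bool *from) {
    bool changed = false;
    for (int a = 0; a < 128; a++)
        if (from[a]) changed |= add(&to[a]);
    return changed;
}

static void compute_first(void) {
    for (bool changed = true; changed; ) {
        changed = false;
        for (int i = 0; i < NPROD; i++) {
            unsigned char X = (unsigned char)grammar[i].lhs;
            const char *y = grammar[i].rhs;
            for (; *y; y++) {
                unsigned char Y = (unsigned char)*y;
                if (!is_nt(*y)) { changed |= add(&first[X][Y]); break; }
                changed |= add_all(first[X], first[Y]);
                if (!nullable[Y]) break;
            }
            if (*y == '\0') changed |= add(&nullable[X]);
        }
    }
}

static void compute_follow(char start) {
    follow[(unsigned char)start]['$'] = true;          /* rule 1 */
    for (bool changed = true; changed; ) {
        changed = false;
        for (int i = 0; i < NPROD; i++) {
            unsigned char A = (unsigned char)grammar[i].lhs;
            for (const char *b = grammar[i].rhs; *b; b++) {
                if (!is_nt(*b)) continue;
                unsigned char B = (unsigned char)*b;   /* A -> alpha B beta */
                const char *q = b + 1;
                for (; *q; q++) {                      /* rule 2: FIRST(beta) minus eps */
                    unsigned char Y = (unsigned char)*q;
                    if (!is_nt(*q)) { changed |= add(&follow[B][Y]); break; }
                    changed |= add_all(follow[B], first[Y]);
                    if (!nullable[Y]) break;
                }
                if (*q == '\0')                        /* rule 3: beta =>* eps */
                    changed |= add_all(follow[B], follow[A]);
            }
        }
    }
}
```

Call compute_first() before compute_follow('S'). FOLLOW(E) picks up + from FIRST(S') under rule 2. Because S' is nullable, rule 3 then adds ) and $ from FOLLOW(S).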
Consider a production X → γ. Add → γ to the X row for each symbol in FIRST(γ). If γ can derive ε (γ is nullable), add → γ for each symbol in FOLLOW(X).
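The rule above can be sketched for the example grammar. To keep the sketch short, FIRST, nullability, and FOLLOW of each non-terminal are hard-coded:

- FIRST(S) = FIRST(E) = {num, (} and FIRST(S') = {ε, +}
- FOLLOW(S) = FOLLOW(S') = {), $} and FOLLOW(E) = {+, ), $}

The encoding and names are hypothetical. `put` counts a conflict whenever a cell would receive a second, different production. That is how a non-LL(1) grammar would show up.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: 'P' stands for S', 'n' for num, "" is an epsilon RHS. */
typedef struct { char lhs; const char *rhs; } Production;
static const Production grammar[] = {
    {'S', "EP"}, {'P', ""}, {'P', "+S"}, {'E', "n"}, {'E', "(S)"},
};
#define NPROD ((int)(sizeof grammar / sizeof grammar[0]))

static bool is_nt(char c) { return c == 'S' || c == 'P' || c == 'E'; }

/* Hard-coded sets for this grammar (FIRST without epsilon). */
static const char *first_of(char X)  { return X == 'P' ? "+" : "n("; }
static bool is_nullable(char X)      { return X == 'P'; }
static const char *follow_of(char X) { return X == 'E' ? "+)$" : ")$"; }

static int table[128][128];   /* production index, or -1 for an empty cell */
static int conflicts;         /* cells that received two different productions */

static void put(char X, char a, int prod) {
    int *cell = &table[(unsigned char)X][(unsigned char)a];
    if (*cell != -1 && *cell != prod) conflicts++;
    *cell = prod;
}

static void build_table(void) {
    for (int x = 0; x < 128; x++)
        for (int a = 0; a < 128; a++) table[x][a] = -1;
    conflicts = 0;
    for (int i = 0; i < NPROD; i++) {             /* production X -> gamma */
        char X = grammar[i].lhs;
        bool gamma_nullable = true;
        for (const char *g = grammar[i].rhs; *g; g++) {   /* FIRST(gamma) */
            if (!is_nt(*g)) { put(X, *g, i); gamma_nullable = false; break; }
            for (const char *a = first_of(*g); *a; a++) put(X, *a, i);
            if (!is_nullable(*g)) { gamma_nullable = false; break; }
        }
        if (gamma_nullable)                       /* gamma =>* eps: use FOLLOW(X) */
            for (const char *a = follow_of(X); *a; a++) put(X, *a, i);
    }
}
```

The result matches the predictive parsing table for this grammar. S' → ε lands under ) and $ because S' is nullable and those symbols are in FOLLOW(S').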
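Ambiguous Grammars
Construction of a predictive parse table for an ambiguous grammar results in conflicts
S → S + S | S * S | num
For example, FIRST(S + S) = FIRST(S * S) = FIRST(num) = {num}, so all three productions land in the same (S, num) cell.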
LL(1) Grammar
A grammar G is LL(1) if it is not left recursive and, for each collection of productions A → α1 | α2 | ... | αn for nonterminal A, the following holds:
1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
2. If αi ⇒* ε, then
   2.a. αj does not derive ε (αj ⇏* ε) for all j ≠ i
   2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all j ≠ i
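Conditions 1, 2.a, and 2.b can be checked per non-terminal once FIRST and FOLLOW are known. In this sketch, each set is passed in as a string of one-character terminals, and ε is tracked by a separate nullable flag. The interface `alternatives_are_ll1` is hypothetical, and the "not left recursive" requirement is not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the two terminal sets (strings of one-char terminals) share no symbol. */
static bool disjoint(const char *x, const char *y) {
    for (; *x; x++)
        if (strchr(y, *x) != NULL) return false;
    return true;
}

/* Checks the LL(1) conditions for A -> a1 | ... | an.
   first[i]: FIRST(ai) without epsilon; nullable[i]: ai =>* eps;
   follow: FOLLOW(A). All sets are assumed to be precomputed. */
static bool alternatives_are_ll1(int n, const char *const first[],
                                 const bool nullable[], const char *follow) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (!disjoint(first[i], first[j])) return false;   /* condition 1 */
    for (int i = 0; i < n; i++) {
        if (!nullable[i]) continue;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            if (nullable[j]) return false;                      /* condition 2.a */
            if (!disjoint(first[j], follow)) return false;      /* condition 2.b */
        }
    }
    return true;
}
```

For S' → ε | + S with FOLLOW(S') = {), $}, the check succeeds. For S → S + S | S * S | num, every FIRST set is {num}, so condition 1 fails.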
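Non-LL(1) Examples
S → aR | ε    R → S | ε
and
S → aRa    R → S | ε
In the first grammar, both alternatives of R are nullable (S ⇒* ε), violating 2.a. In the second, FIRST(S) = {a} and FOLLOW(R) = {a}, so R → S and R → ε collide on lookahead a, violating 2.b.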
Impact of Ambiguity
Different parse trees correspond to different evaluations! Thus, program meaning is not defined!!
Figure: two parse trees for 1 + 2 * 3; one computes (1 + 2) * 3 = 9, the other 1 + (2 * 3) = 7.
No deterministic CFL is inherently ambiguous, since every deterministic CFL has an unambiguous LR(1) grammar. No inherently ambiguous programming languages have been invented.
Eliminating Ambiguity
We can often eliminate ambiguity by adding non-terminals and allowing recursion only on the right or the left:
S → S + T | T
T → T * num | num
Figure: the parse tree for an expression under this grammar.
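A recursive-descent parser cannot handle a left-recursive rule such as S → S + T directly: parse_S would call itself before consuming any input and loop forever. A common workaround is to implement each left-recursive rule as a loop that folds results to the left. This is a standard technique, not one taken from the slides. The sketch below assumes single-digit nums and well-formed input, and its names are hypothetical.

```c
#include <assert.h>

/* Sketch for S -> S + T | T and T -> T * num | num,
   assuming single-digit nums and well-formed input. */
static const char *p;

static int num(void) { return *p++ - '0'; }

static int parse_T(void) {          /* T -> T * num | num, i.e. num (* num)* */
    int v = num();
    while (*p == '*') { p++; v *= num(); }
    return v;
}

static int parse_S(void) {          /* S -> S + T | T, i.e. T (+ T)* */
    int v = parse_T();
    while (*p == '+') { p++; v += parse_T(); }
    return v;
}

static int eval(const char *s) { p = s; return parse_S(); }
```

eval("1+2*3") returns 7. The extra non-terminal T makes * bind tighter than +, so the 1 + (2 * 3) reading is the only one possible.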
Associativity
An operator is either left, right or non-associative
Left (right) recursion → left (right) associativity
Non: don't be recursive; simply reference the next-higher-precedence non-terminal on both sides of the operator
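To illustrate the rule, the sketch below evaluates a left-recursive rule, implemented as a loop, and a right-recursive rule, implemented as recursion. Everything here is assumed for illustration: the grammars D → D - num | num and X → num ^ X | num, single-digit operands, and well-formed input.

```c
#include <assert.h>

/* Assumed grammars: D -> D - num | num   (left recursive)
                     X -> num ^ X | num   (right recursive) */
static const char *p;

static int num(void) { return *p++ - '0'; }

static int parse_D(void) {          /* left recursion as a loop: ((a - b) - c) */
    int v = num();
    while (*p == '-') { p++; v -= num(); }
    return v;
}

static int power(int base, int exp) {
    int r = 1;
    while (exp-- > 0) r *= base;
    return r;
}

static int parse_X(void) {          /* right recursion: a ^ (b ^ c) */
    int base = num();
    if (*p == '^') { p++; return power(base, parse_X()); }
    return base;
}

static int eval_left(const char *s)  { p = s; return parse_D(); }
static int eval_right(const char *s) { p = s; return parse_X(); }
```

8-3-2 evaluates to (8 - 3) - 2 = 3 rather than 7. 2^3^2 evaluates to 2^(3^2) = 512 rather than 64.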
Error Handling
A good compiler should assist in identifying and locating errors
o Lexical errors: important; the compiler can easily recover and continue
o Syntax errors: most important for the compiler; can almost always recover
o Static semantic errors: important; can sometimes recover
o Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are required
o Logical errors: hard or impossible to detect
Panic mode
o Discard input until a token in a set of designated synchronizing tokens is found
Phrase-level recovery
o Perform local correction on the input to repair the error
Error productions
Global correction
Table: predictive parse table with synch entries
synch: the driver pops the current nonterminal A and skips input till ...
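Panic-mode recovery can be sketched as a table-driven C parser for the expression grammar E → T ER, ER → + T ER | ε, T → F TR, TR → * F TR | ε, F → ( E ) | id. The sketch rests on several assumptions:

- 'e' stands for ER, 't' for TR, and 'i' for id, and the input ends with '$'.
- synch entries fill the FOLLOW(A) cells that would otherwise be empty.
- The driver follows the common textbook convention: on a synch entry it pops A, on an empty entry it skips the input symbol, and on a terminal mismatch it pops the terminal.
- It counts errors instead of printing messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed encoding: 'e' = ER, 't' = TR, 'i' = id, input ends with '$'. */
static const char SYNCH[] = "synch";

static bool is_nt(char c) {
    return c == 'E' || c == 'e' || c == 'T' || c == 't' || c == 'F';
}

/* RHS to push, "" for an epsilon production, SYNCH, or NULL (empty cell). */
static const char *table(char A, char a) {
    bool in_follow_T = (a == '+' || a == ')' || a == '$');   /* FOLLOW(T) */
    switch (A) {
    case 'E': if (a == 'i' || a == '(') return "Te";
              if (a == ')' || a == '$') return SYNCH; break;
    case 'e': if (a == '+') return "+Te";
              if (a == ')' || a == '$') return ""; break;
    case 'T': if (a == 'i' || a == '(') return "Ft";
              if (in_follow_T) return SYNCH; break;
    case 't': if (a == '*') return "*Ft";
              if (in_follow_T) return ""; break;
    case 'F': if (a == 'i') return "i";
              if (a == '(') return "(E)";
              if (a == '*' || in_follow_T) return SYNCH; break;
    }
    return NULL;
}

/* LL(1) parse with panic-mode recovery; returns the number of errors. */
static int parse(const char *input) {
    char stack[64];
    int top = 0, errors = 0;
    stack[top++] = '$';
    stack[top++] = 'E';
    const char *in = input;
    while (top > 0) {
        char X = stack[top - 1], a = *in;
        if (!is_nt(X)) {                            /* terminal or $ on top */
            if (X != a) errors++;                   /* missing terminal: pop it */
            else if (a != '$') in++;                /* match: advance */
            top--;
            continue;
        }
        const char *rhs = table(X, a);
        if (rhs == NULL) { errors++; in++; }        /* empty cell: skip a */
        else if (rhs == SYNCH) { errors++; top--; } /* synch: pop A */
        else {
            top--;
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--) {  /* push reversed */
                assert(top < (int)sizeof stack);
                stack[top++] = rhs[k];
            }
        }
    }
    return errors;
}
```

Consider "i+$". The synch entry for T under $ pops T, and parsing finishes with one error. In "i+*i$", the empty T/* cell skips the stray '*'.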
Phrase-Level Recovery
Change the input stream by inserting missing tokens
For example: id id is changed into id * id; parsing can then continue
Pro: can be automated
Cons: recovery is not always intuitive
Table: predictive parse table for E → T ER, ER → + T ER | ε, T → F TR, TR → * F TR | ε, F → ( E ) | id, with synch entries; the TR row under id holds "insert *".
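The single repair described above can be sketched in recursive-descent form: when TR sees id as the lookahead, it inserts the missing * and continues. Assumptions: 'i' stands for id, the input ends with '$', and other errors are only counted, not repaired. `out` holds the repaired token stream. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed encoding: 'i' = id, input ends with '$'. Grammar:
   E -> T ER, ER -> + T ER | eps, T -> F TR, TR -> * F TR | eps, F -> ( E ) | id */
static const char *in;          /* lookahead pointer */
static char out[64];            /* token stream after repair */
static int outn, inserted, errors;

static void emit(char t) {
    assert(outn < (int)sizeof out - 1);
    out[outn++] = t;
    out[outn] = '\0';
}

static void parse_E(void);

static void parse_F(void) {                     /* F -> ( E ) | id */
    if (*in == 'i') {
        emit(*in++);
    } else if (*in == '(') {
        emit(*in++);
        parse_E();
        if (*in == ')') emit(*in++); else errors++;
    } else {
        errors++;                               /* not repaired in this sketch */
    }
}

static void parse_TR(void) {                    /* TR -> * F TR | eps */
    if (*in == '*') {
        emit(*in++);
        parse_F();
        parse_TR();
    } else if (*in == 'i') {                    /* cell [TR, id]: insert '*' */
        inserted++;
        emit('*');
        parse_F();
        parse_TR();
    }                                           /* otherwise take TR -> eps */
}

static void parse_T(void) { parse_F(); parse_TR(); }   /* T -> F TR */

static void parse_ER(void) {                    /* ER -> + T ER | eps */
    if (*in == '+') { emit(*in++); parse_T(); parse_ER(); }
}

static void parse_E(void) { parse_T(); parse_ER(); }   /* E -> T ER */

static void parse(const char *tokens) {
    in = tokens; outn = 0; out[0] = '\0'; inserted = 0; errors = 0;
    parse_E();
    if (*in != '$') errors++;
}
```

After parse("ii$"), out holds "i*i" and one insertion has been recorded.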
Error Productions
E → T ER
ER → + T ER | ε
T → F TR
TR → * F TR | ε
F → ( E ) | id
Table: the same parse table extended with the error production TR → F TR.
Add error production TR → F TR to ignore a missing *, e.g.: id id
Pro: powerful recovery method
Cons: cannot be automated
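Here the repair lives in the grammar itself: TR → * F TR | F TR | ε. The extended grammar stays LL(1): FIRST(F TR) = {id, (} is disjoint from FIRST(* F TR) = {*} and from FOLLOW(TR) = {+, ), $}.

The C sketch below rests on these assumptions:

- Single digits play the role of id, so the parser can also evaluate.
- Each use of the error production counts as a diagnostic.
- Other errors are only counted.

Names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Assumed setup: digits stand for id, the input has no spaces. */
static const char *p;
static int diagnostics;     /* uses of the error production TR -> F TR */
static int syntax_errors;   /* other errors (no recovery attempted) */

static int parse_E(void);

static int parse_F(void) {                          /* F -> ( E ) | id */
    if (isdigit((unsigned char)*p)) return *p++ - '0';
    if (*p == '(') {
        p++;
        int v = parse_E();
        if (*p == ')') p++; else syntax_errors++;
        return v;
    }
    syntax_errors++;
    return 0;
}

static int parse_TR(int left) {                     /* TR -> * F TR | F TR | eps */
    if (*p == '*') { p++; return parse_TR(left * parse_F()); }
    if (isdigit((unsigned char)*p) || *p == '(') {  /* error production TR -> F TR */
        diagnostics++;                              /* e.g. report "missing '*'" */
        return parse_TR(left * parse_F());
    }
    return left;                                    /* TR -> eps */
}

static int parse_T(void) { return parse_TR(parse_F()); }   /* T -> F TR */

static int parse_ER(int left) {                     /* ER -> + T ER | eps */
    if (*p == '+') { p++; return parse_ER(left + parse_T()); }
    return left;
}

static int parse_E(void) { return parse_ER(parse_T()); }   /* E -> T ER */

static int eval(const char *s) {
    p = s; diagnostics = 0; syntax_errors = 0;
    int v = parse_E();
    if (*p != '\0') syntax_errors++;
    return v;
}
```

eval("23") returns 6 with one diagnostic. eval("2(1+2)") also returns 6, because the error production covers every symbol in FIRST(F), not just id.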