CS 1622 Lecture 6
[Figure: compiler front end. Source code -> Scanner -> tokens -> Parser -> IR, with errors reported along the way]
Parser
Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness
Determines if the input is syntactically well formed
Guides checking at deeper levels than syntax
Builds an IR representation of the code
Input: sequence of tokens from the lexer
Output: parse tree of the program
The parse tree is generated if the input is a legal program
Instead of a parse tree, some parsers directly produce:
  an abstract syntax tree (AST) + symbol table, or
  intermediate code, or
  object code
Example
The program: x*y+z Input to parser:
ID TIMES ID PLUS ID
We'll write tokens as follows: id * id + id
[Figure: parse tree for id * id + id, with E at the root and id, *, id, +, id at the leaves]
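The step from the program text to the parser's input can be sketched with a tiny scanner. This is only an illustration: the token names ID, TIMES and PLUS come from the slide, while the regular expressions, the SKIP rule and the function name `scan` are assumptions made for the example.

```python
import re

# Token names follow the slide; the patterns themselves are assumed for this sketch.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("TIMES", r"\*"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return the list of token names for source, ignoring whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append(m.lastgroup)
        pos = m.end()
    return tokens


print(scan("x*y+z"))  # ['ID', 'TIMES', 'ID', 'PLUS', 'ID']
```

The parser never sees the names x, y and z here, only their token kinds, which is why the slide writes the input as id * id + id.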
The parser must return the parse tree
We need:
  A language for describing valid strings of tokens
  A method for distinguishing valid from invalid strings of tokens (and for building the parse tree)
Parser
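[Figure: source -> lexical analyzer -> parser -> rest of front end -> IR. The parser requests the next token and the lexical analyzer returns a token; the parser outputs a parse tree; a symbol table is attached.]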
Grammars
Precise, easy-to-understand description of syntax
Context-free grammars -> efficient parsers (automatically!)
Help in translation and error detection
Syntax Errors
E.g. unbalanced ()
Report errors quickly and accurately
Recover quickly (continue parsing after an error)
Little overhead on parse time
Error Recovery
Panic mode
Phrase level
  Local correction: replace a token by another and continue
Error productions
  Encode commonly expected errors in the grammar
Global correction
  Find the closest input string that is in L(G)
Context-free Grammars
Precise and easy way to specify the syntactic structure of a programming language
Efficient recognition methods exist
Natural specification of many recursive constructs
Terminals T
  Symbols which form strings of L(G), G a CFG (= tokens in the scanner), e.g. if, else, id
Nonterminals N
  Syntactic variables denoting sets of strings of L(G)
  Impose hierarchical structure (e.g., precedence rules)
Start symbol S (∈ N)
  Denotes the set of strings of L(G)
Productions P
  Rules that determine how strings are formed
  N -> (N|T)*
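The four components can be written down directly as a data structure. The sketch below is one possible encoding, not a standard one: the class name `Grammar`, the function `is_well_formed` and the tiny sample grammar E -> E + E | id are all assumptions made for illustration.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    start: str               # S, must be in N
    productions: tuple       # pairs (lhs, rhs), rhs a tuple of symbols


def is_well_formed(g):
    """Check that S is in N, that N and T are disjoint, and that every rule is N -> (N|T)*."""
    if g.start not in g.nonterminals or g.terminals & g.nonterminals:
        return False
    symbols = g.terminals | g.nonterminals
    return all(
        lhs in g.nonterminals and all(s in symbols for s in rhs)
        for lhs, rhs in g.productions
    )


# Hypothetical sample grammar: E -> E + E | id
g = Grammar(
    terminals=frozenset({"+", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(("E", ("E", "+", "E")), ("E", ("id",))),
)
print(is_well_formed(g))  # True
```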
Terminals:
Nonterminals
Start symbol
Notational Conventions
Terminals
Nonterminals
  A, B, C, ...
  S: start symbol (if present), otherwise the first nonterminal in the production list
Terminal strings
  u, v, ...
Productions
  A -> α, where α is a string of grammar symbols
More Definitions
L(G): the language generated by G = the set of strings derived from S
S =>+ w: w is a sentence of G (w is a string of terminals)
S =>+ α: α is a sentential form of G (the string can contain nonterminals)
G and G' are equivalent: L(G) = L(G')
A language generated by a grammar (of the form shown) is called a context-free language
Example
G = ({-, +, *, (, ), <id>}, {E}, E, {E -> E + E, E -> E * E, E -> (E), E -> - E, E -> <id>})
Sentence: -(<id> + <id>)
Derivation: E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)
Leftmost derivation: always replace the leftmost nonterminal
Rightmost derivation: analogously
Left/right sentential form: a sentential form that appears in a leftmost/rightmost derivation
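The derivation of -(<id> + <id>) can be replayed mechanically. The following is a minimal sketch, assuming the helper name `leftmost_derivation` and a list-of-symbols encoding of sentential forms; it applies each chosen production to the leftmost nonterminal and records every form along the way.

```python
NONTERMINALS = {"E"}


def leftmost_derivation(start, rules):
    """Apply each rule (lhs, rhs) to the leftmost nonterminal; return all sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in rules:
        # Index of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"leftmost nonterminal is {form[i]}, not {lhs}")
        form[i:i + 1] = rhs
        forms.append(" ".join(form))
    return forms


steps = leftmost_derivation("E", [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["<id>"]),
    ("E", ["<id>"]),
])
print(" => ".join(steps))
# E => - E => - ( E ) => - ( E + E ) => - ( <id> + E ) => - ( <id> + <id> )
```

A rightmost derivation would use the same productions but pick the last nonterminal in the form at each step.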
Parse Trees
E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)
[Figure: parse tree for -(<id> + <id>)]
Expressive Power
Can express matching () with CFGs
Can express most properties desired for programming languages
Cannot express:
  Identifiers declared before use: L = {wcw | w is in (a|b)*}
  Parameter checking (#formals = #actuals): L = {a^n b^m c^n d^m | n >= 1, m >= 1}
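As an illustration of matching parentheses, here is a minimal recognizer sketch. It assumes the grammar S -> ( S ) S | empty, which is one common way to describe balanced parentheses and is not taken from the slides; the function names are likewise hypothetical.

```python
def balanced(s):
    """Recognise the language of S -> ( S ) S | empty over the characters of s."""

    def parse_s(i):
        # Returns the index just past the S that starts at i, or None on failure.
        if i < len(s) and s[i] == "(":
            j = parse_s(i + 1)                 # inner S
            if j is None or j >= len(s) or s[j] != ")":
                return None                    # the "(" has no matching ")"
            return parse_s(j + 1)              # trailing S
        return i                               # S -> empty

    return parse_s(0) == len(s)


print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```

The recursion in `parse_s` mirrors the recursion in the grammar rule, which is what lets a context-free grammar track nesting depth.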
Parsing
= determining whether a string of tokens can be generated by a grammar
Two classes, based on the order in which the parse tree is constructed:
Top-down parsing
  Start construction at the root of the parse tree
Bottom-up parsing
  Start at the leaves and proceed to the root
Derivation Example
See board
Notes on Derivations
An in-order traversal of the leaves is the original input
The parse tree shows the association of operations; the input string does not
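Both notes can be checked on a small tree. In this sketch a parse tree is encoded as a nested (label, children) pair with plain strings for leaves; that encoding and the function name `leaves` are assumptions made for the example.

```python
# Parse tree for id * id + id with the multiplication grouped first.
# An interior node is (label, children); a leaf is a plain string.
tree = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])


def leaves(node):
    """Collect the leaves of a parse tree from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


print(" ".join(leaves(tree)))  # id * id + id
```

The flat output no longer says which operator was grouped first; that information lives only in the nesting of the tree.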
Terminals
Terminals are so called because there are no rules for replacing them
Once generated, terminals are permanent
Terminals are the tokens of the language represented by the grammar
Note that right-most and left-most derivations have the same parse tree
The difference is the order in which branches are added
Summary of Derivations
Ambiguity (Cont.)
This string has two parse trees
[Figure: the two parse trees, one rooted at E -> E + E and the other rooted at E -> E * E]
Parse trees
Question 1:
for each of the two parse trees, find the corresponding left-most derivation
Question 2:
for each of the two parse trees, find the corresponding right-most derivation
Ambiguity (Cont.)
A grammar is ambiguous if, for some string:
  it has more than one parse tree, or
  there is more than one right-most derivation, or
  there is more than one left-most derivation
Ambiguity is BAD
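Ambiguity can be observed by counting parse trees. The sketch below assumes the ambiguous grammar E -> E + E | E * E | ( E ) | id and a hypothetical helper `count_trees`; it tries every operator as the root of the tree and multiplies the counts of the two sides.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        n = 0
        if j - i == 1 and tokens[i] == "id":
            n += 1                                   # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)   # E -> E op E, op at position k
        return n

    return count(0, len(tokens))


print(count_trees("id * id + id".split()))  # 2
print(count_trees("id + id".split()))       # 1
```

A count above 1 for any string is enough to show that the grammar is ambiguous.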
Use a different nonterminal for each precedence level
Start with the lowest precedence (MINUS)
E -> E - E | E / E | ( E ) | id
rewrite to
E -> E - E | T
T -> T / T | F
F -> id | ( E )
Example
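Parse tree for id - id / id
E -> E - E | T
T -> T / T | F
F -> id | ( E )
[Figure: parse tree with E -> E - E at the root; the left E derives id via T and F, and the right E derives T -> T / T, with each T deriving id via F]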
Question: attempt to construct a parse tree for id - id / id that shows the wrong precedence.
Associativity
The rewritten grammar still fails to express that both subtraction and division are left associative
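Counting parse trees makes the gap visible. This sketch assumes the stratified grammar E -> E - E | T, T -> T / T | F, F -> id | ( E ) and a hypothetical helper `count_trees`; precedence is now fixed, but a chain of the same operator can still be grouped in more than one way.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees under E -> E - E | T, T -> T / T | F, F -> id | ( E )."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # Number of ways nonterminal sym can derive tokens[i:j].
        span = tokens[i:j]
        if sym == "F":
            n = 1 if span == ("id",) else 0                      # F -> id
            if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
                n += count("E", i + 1, j - 1)                    # F -> ( E )
            return n
        op, lower = ("-", "T") if sym == "E" else ("/", "F")
        n = count(lower, i, j)                                   # E -> T or T -> F
        for k in range(i + 1, j - 1):
            if tokens[k] == op:
                n += count(sym, i, k) * count(sym, k + 1, j)     # X -> X op X
        return n

    return count("E", 0, len(tokens))


print(count_trees("id - id / id".split()))  # 1: only the correct precedence is possible
print(count_trees("id - id - id".split()))  # 2: grouping of repeated "-" is still open
```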
Recursion
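A grammar is recursive in nonterminal X if X =>+ ... X ...
Recall that =>+ means "in one or more steps"; that is, X derives a sequence of symbols that includes an X
A grammar is left recursive in X if X =>+ X ...
A grammar is right recursive in X if X =>+ ... X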
The grammar given above is both left and right recursive in the nonterminals exp and term (E and T above)
Try at home: write the derivation steps that show this.
For left associativity, use left recursion.
For right associativity, use right recursion.
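The effect of left recursion can be sketched with a small hand-written parser. The grammar used here, E -> E - T | T, T -> T / F | F, F -> id | ( E ), is an assumed left-recursive variant and is not given in the slides; the parser realises each left-recursive rule as a loop, and the nested-tuple tree format and all function names are hypothetical.

```python
def parse(tokens):
    """Parse with E -> E - T | T, T -> T / F | F, F -> id | ( E ); return a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def factor():
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        expect("id")
        return "id"

    def chain(operand, op):
        # Left recursion X -> X op Y | Y, realised as a loop that nests to the left.
        node = operand()
        while peek() == op:
            expect(op)
            node = (node, op, operand())
        return node

    def term():
        return chain(factor, "/")

    def expr():
        return chain(term, "-")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse("id - id - id".split()))  # (('id', '-', 'id'), '-', 'id')
print(parse("id - id / id".split()))  # ('id', '-', ('id', '/', 'id'))
```

The first result groups the left subtraction first, which matches left associativity; a right-recursive rule would instead nest the tuples to the right.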