After lexical analysis (scanning), we have a series of tokens. Goal: Recover the structure of the program described by that series of tokens. Goal: Report errors if those tokens do not properly encode a structure.
4/30/12
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is usually a parse tree. The syntax analyzer is also known as the parser. The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation to describe CFGs. The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar. If it does, the parser creates the parse tree of that program. Otherwise, the parser reports error messages. A context-free grammar gives a precise syntactic specification of a programming language. The design of the grammar is an initial phase of the design of a compiler. A grammar can be converted directly into a parser by some tools.
Parser
Parser Categories
We categorize parsers into two groups:
1. Top-Down Parser: the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.
Derivations
E ⇒ E+E
E derives E+E (E+E derives from E). We can replace E by E+E; for this, we must have a production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols is called a derivation; the sequence above is a derivation of id+id from E.
In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
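A single derivation step can be sketched as a list operation: find a non-terminal in the sentential form and splice in the right-hand side of one of its productions. The helper name `derive_step` and the list-of-symbols representation are assumptions made for illustration; only the rules E → E+E and E → id from the text are used.

```python
def derive_step(form, position, lhs, rhs):
    """Replace the non-terminal `lhs` at `position` in `form` with `rhs`."""
    if form[position] != lhs:
        raise ValueError(f"expected {lhs} at position {position}")
    return form[:position] + rhs + form[position + 1:]


form = ["E"]
steps = [form]

form = derive_step(form, 0, "E", ["E", "+", "E"])  # E => E+E
steps.append(form)
form = derive_step(form, 0, "E", ["id"])           # E+E => id+E
steps.append(form)
form = derive_step(form, 2, "E", ["id"])           # id+E => id+id
steps.append(form)

# Show the whole derivation as one chain of sentential forms.
print(" => ".join("".join(s) for s in steps))
```

Each call corresponds to one ⇒ in the derivation, with the symbols before and after `position` left unchanged.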
Derivation Examples
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) (left-most derivation)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id) (right-most derivation)
At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation. Top-down parsers try to find the left-most derivation of the given source program. Bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
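The difference between the two derivation orders can be sketched by applying the same sequence of productions while always picking either the left-most or the right-most non-terminal. The function names `expand` and `derive`, and the constants naming each right-hand side, are hypothetical; the productions themselves are the ones used in the example above.

```python
NONTERMINALS = {"E"}


def expand(form, rhs, leftmost=True):
    """Replace the left-most (or right-most) non-terminal in `form` with `rhs`."""
    indices = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not indices:
        raise ValueError("no non-terminal left to expand")
    i = indices[0] if leftmost else indices[-1]
    return form[:i] + rhs + form[i + 1:]


def derive(rhs_sequence, leftmost):
    """Apply the right-hand sides in order and record every sentential form."""
    form = ["E"]
    trace = ["".join(form)]
    for rhs in rhs_sequence:
        form = expand(form, rhs, leftmost)
        trace.append("".join(form))
    return trace


# Right-hand sides of E -> -E, E -> (E), E -> E+E, E -> id.
NEG, PAREN, PLUS, ID = ["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"]

left = derive([NEG, PAREN, PLUS, ID, ID], leftmost=True)
right = derive([NEG, PAREN, PLUS, ID, ID], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both traces end in the same sentence; they differ only in the intermediate form where one `id` has been substituted.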
Parse Tree
A parse tree is a graphical representation of a derivation. Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols.
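As a minimal sketch, a parse tree can be held in a small recursive data structure whose inner nodes carry non-terminals and whose leaves carry terminals. The class name `Node` and the helper `frontier` are assumptions for illustration; the tree built here is for the sentence -(id+id) used in the derivation examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)


def frontier(node):
    """Return the leaf symbols of the tree, read left to right."""
    if not node.children:
        return [node.symbol]
    leaves = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves


# Parse tree for -(id+id): E -> -E, E -> (E), E -> E+E, E -> id (twice).
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [
            Node("E", [Node("id")]),
            Node("+"),
            Node("E", [Node("id")]),
        ]),
        Node(")"),
    ]),
])

print("".join(frontier(tree)))
```

Reading the leaves left to right gives back the derived sentence, which is why the same tree stands for both the left-most and the right-most derivation.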
Ambiguity
A grammar that produces more than one parse tree for a sentence is an ambiguous grammar. It allows conflicting interpretations of a given input string. The first parse tree interprets id+id*id as id+(id*id), whereas the second parse tree interprets it as (id+id)*id, which leads to confusion during the intermediate code generation phase.
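The two readings can be enumerated mechanically. The sketch below assumes the ambiguous grammar E → E+E | E*E | id (the slide's grammar is not reproduced in the text, so this is an assumption) and returns one fully parenthesised string per parse tree; the function name `parses` is hypothetical.

```python
def parses(tokens):
    """All fully parenthesised readings of `tokens` under E -> E+E | E*E | id."""
    if tokens == ["id"]:
        return ["id"]
    results = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            # Treat this operator as the root and combine every reading of each side.
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    results.append(f"({left}{tok}{right})")
    return results


readings = parses(["id", "+", "id", "*", "id"])
for reading in readings:
    print(reading)
```

A grammar is ambiguous for a sentence exactly when this list has more than one entry; for id+id*id it has two.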
Ambiguity Elimination
Ambiguous grammars (ambiguous because of their operators) can be disambiguated according to precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
Disambiguate the grammar using:
Precedence, with associativity:
^ (highest) (right to left)
* (left to right)
+ (left to right)
The result is:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
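The effect of the layered grammar can be checked with a small hand-written parser. This is a sketch under stated assumptions: the left-recursive rules for E and T are implemented as loops (a common rewriting, since recursive descent cannot follow left recursion directly), the input is a pre-split token list, and all function names are hypothetical. It returns a fully parenthesised string so the grouping is visible.

```python
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def parse_e():  # E -> E+T | T, left-associative, written as a loop
        node = parse_t()
        while peek() == "+":
            eat("+")
            node = f"({node}+{parse_t()})"
        return node

    def parse_t():  # T -> T*F | F, left-associative, written as a loop
        node = parse_f()
        while peek() == "*":
            eat("*")
            node = f"({node}*{parse_f()})"
        return node

    def parse_f():  # F -> G^F | G, right-associative via recursion
        node = parse_g()
        if peek() == "^":
            eat("^")
            node = f"({node}^{parse_f()})"
        return node

    def parse_g():  # G -> id | (E)
        if peek() == "id":
            eat("id")
            return "id"
        eat("(")
        node = parse_e()
        eat(")")
        return node

    result = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(parse("id + id * id".split()))
print(parse("id ^ id ^ id".split()))
print(parse("id + id + id".split()))
```

Each input now has exactly one grouping: * binds tighter than +, ^ groups to the right, and + groups to the left.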
Ambiguity
Ambiguity Elimination
We prefer the parse tree in which each else matches the closest if. So, we can disambiguate our grammar to reflect this choice. The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
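The closest-if rule can be illustrated with a short recursive-descent sketch. Note the assumption: rather than encoding the matched/unmatched productions directly, it takes the usual shortcut of consuming an else greedily as soon as one is seen, which yields the same closest-if grouping. The token names `expr` and `other` and the function `parse_stmt` are placeholders chosen for illustration.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at `pos`; return (tree, next position)."""
    if pos < len(tokens) and tokens[pos] == "if":
        if tokens[pos + 1: pos + 3] != ["expr", "then"]:
            raise SyntaxError("expected 'expr then' after 'if'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy choice: an 'else' seen here belongs to this, the closest, 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", then_branch, else_branch), pos
        return ("if", then_branch), pos
    if pos < len(tokens) and tokens[pos] == "other":
        return "other", pos + 1
    raise SyntaxError(f"unexpected input at position {pos}")


tokens = "if expr then if expr then other else other".split()
tree, end = parse_stmt(tokens)
print(tree)
```

In the resulting tree the else branch sits inside the inner if, and the outer if has no else branch.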
Top-Down Parser
A top-down parser can be viewed as an attempt to find a left-most derivation for an input string. It can also be viewed as an attempt to construct a parse tree for the input, starting from the root and creating the nodes of the parse tree in preorder.
1. Root = current-node = start-symbol, and current-input symbol = first input symbol.
2. Form branches of the current-node by applying a production not yet considered at this node.
3. Current-node = left-most unexplored node, traversing in preorder.
4. If (current-node == terminal symbol) {
       If (current-node == current-input symbol) {
           If there are no more input symbols to be read, return success.
           Current-input symbol = next-input symbol
           Go to step 3
       } Else if (there exists any other production for the parent node yet to be applied) {
           Discard the current production applied at the parent node // backtracking needed
           Current-node = parent node
           Go to step 2
       }
   }
[Figure: parse trees built step by step for the input cad, with nodes S, c, A, d, a and b. Annotation: "cad; match found so advance".]
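The backtracking procedure above can be sketched in a few lines. Two assumptions are made and should be read as such: the grammar S → cAd, A → ab | a is inferred from the node labels in the example figure and is not stated in the text, and the sketch uses recursion with generators instead of building tree nodes explicitly. The names `GRAMMAR`, `match` and `accepts` are hypothetical. Like any plain backtracking top-down parser, it would loop on a left-recursive grammar.

```python
# Assumed grammar: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def match(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each production in order. When one fails further
        # on, the loop moves to the next production (this is the backtracking).
        for production in GRAMMAR[first]:
            for mid in match(production, text, pos):
                yield from match(rest, text, mid)
    elif pos < len(text) and text[pos] == first:
        # Terminal that matches the current input symbol: advance.
        yield from match(rest, text, pos + 1)


def accepts(text):
    """True if some sequence of production choices consumes the whole input."""
    return any(end == len(text) for end in match(["S"], text, 0))


print(accepts("cad"))
print(accepts("cabd"))
print(accepts("cd"))
```

For the input cad, the production A → ab is tried first and fails at b against d; the parser then falls back to A → a, after which d matches and the input is accepted.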