
Why Syntax Analysis?

After lexical analysis (scanning), we have a sequence of tokens. The goals of syntax analysis are to:
1. Recover the structure of the program described by that sequence of tokens.
2. Report errors if the tokens do not properly encode a structure.

4/30/12

Syntax Analyzer
The syntax analyzer, also known as the parser, creates the syntactic structure of the given source program. This syntactic structure is usually a parse tree. The syntax of a programming language is described by a context-free grammar (CFG), and we will use BNF (Backus-Naur Form) notation to describe CFGs. The parser checks whether a given source program satisfies the rules of the context-free grammar. If it does, the parser creates the parse tree of that program; otherwise, it reports error messages. A context-free grammar gives a precise syntactic specification of a programming language. Designing the grammar is an initial phase of designing a compiler, and some tools can convert a grammar directly into a parser.


Parser

The parser works on a stream of tokens; the smallest item it deals with is a token.


Parser Categories
We categorize parsers into two groups:
1. Top-down parsers: the parse tree is created from top to bottom, starting from the root.
2. Bottom-up parsers: the parse tree is created from bottom to top, starting from the leaves.


Context Free Grammar(CFG)


Recursive structures of a programming language are defined by a context-free grammar. A context-free grammar consists of:
1. A finite set of terminals (in our case, the set of tokens).
2. A finite set of non-terminals (syntactic variables).
3. A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string).
4. A start symbol (one of the non-terminal symbols).
Example: E → E + E | E - E | E * E | E / E | - E | ( E ) | id
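To make the four components concrete, here is a minimal Python sketch that encodes the example grammar as data. The dictionary layout (non-terminal mapped to a list of right-hand-side tuples) and the names `GRAMMAR`, `START`, `nonterminals`, and `terminals` are illustrative choices, not part of the notes. The sketch writes the binary-minus alternative as `E - E`, and an empty tuple would stand for an empty (ε) right-hand side.

```python
# Hypothetical encoding of the example grammar:
# each non-terminal maps to its alternatives; each alternative is a tuple of symbols.
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"), ("E", "/", "E"),
          ("-", "E"), ("(", "E", ")"), ("id",)],
}
START = "E"  # the start symbol must be one of the non-terminals


def nonterminals(grammar):
    """The non-terminals are exactly the keys of the production table."""
    return set(grammar)


def terminals(grammar):
    """Every right-hand-side symbol that is not a non-terminal is a terminal."""
    nts = set(grammar)
    return {sym for alts in grammar.values() for rhs in alts for sym in rhs
            if sym not in nts}


if __name__ == "__main__":
    print(sorted(nonterminals(GRAMMAR)))
    print(sorted(terminals(GRAMMAR)))
```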


Derivations
E ⇒ E+E means that E derives E+E (E+E is derived from E): we can replace E by E+E, which requires the production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols such as this one is called a derivation of id+id from E. In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
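A single derivation step αAβ ⇒ αγβ can be sketched as a function on sentential forms represented as tuples of symbols. `derive_step` and the two-rule `GRAMMAR` below are hypothetical names used only for illustration. The function refuses any rewrite that is not backed by a production A → γ.

```python
# A subset of the example grammar, enough to derive id+id.
GRAMMAR = {"E": [("E", "+", "E"), ("id",)]}


def derive_step(form, pos, rhs, grammar=GRAMMAR):
    """Apply one step alpha A beta => alpha gamma beta.

    form: the current sentential form (tuple of symbols)
    pos:  index of the non-terminal A inside form
    rhs:  the right-hand side gamma; A -> gamma must be a production
    """
    a = form[pos]
    if a not in grammar:
        raise ValueError(f"{a!r} is not a non-terminal")
    if tuple(rhs) not in grammar[a]:
        raise ValueError(f"no production {a} -> {' '.join(rhs)}")
    # alpha = form[:pos], beta = form[pos + 1:]
    return form[:pos] + tuple(rhs) + form[pos + 1:]


if __name__ == "__main__":
    f0 = ("E",)
    f1 = derive_step(f0, 0, ("E", "+", "E"))  # E => E+E
    f2 = derive_step(f1, 0, ("id",))          # E+E => id+E
    f3 = derive_step(f2, 2, ("id",))          # id+E => id+id
    print(" => ".join("".join(f) for f in (f0, f1, f2, f3)))
```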


Derivation Examples
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)   (left-most derivation)
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)   (right-most derivation)
At each derivation step, we can choose any non-terminal in the sentential form for replacement. If we always choose the left-most non-terminal, the derivation is called a left-most derivation. If we always choose the right-most non-terminal, it is called a right-most derivation. Top-down parsers try to find the left-most derivation of the given source program. Bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
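The two derivations differ only in which non-terminal is rewritten at each step. The sketch below replays the same sequence of productions twice, once always rewriting the left-most non-terminal and once the right-most. `derive`, `STEPS`, and the four-alternative grammar are illustrative assumptions restricted to what the example uses.

```python
# Only the alternatives needed for -(id+id).
GRAMMAR = {"E": [("-", "E"), ("(", "E", ")"), ("E", "+", "E"), ("id",)]}


def derive(start, rhss, leftmost=True, grammar=GRAMMAR):
    """Replay a derivation from `start`, applying each right-hand side in `rhss`
    to the left-most (or right-most) non-terminal of the current form.
    Returns every sentential form as a string."""
    form = (start,)
    forms = ["".join(form)]
    for rhs in rhss:
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if tuple(rhs) not in grammar[form[pos]]:
            raise ValueError(f"no production {form[pos]} -> {''.join(rhs)}")
        form = form[:pos] + tuple(rhs) + form[pos + 1:]
        forms.append("".join(form))
    return forms


# The same productions, in the order they are applied.
STEPS = [("-", "E"), ("(", "E", ")"), ("E", "+", "E"), ("id",), ("id",)]

if __name__ == "__main__":
    print(" => ".join(derive("E", STEPS, leftmost=True)))
    print(" => ".join(derive("E", STEPS, leftmost=False)))
```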

Parse Tree
A parse tree is a graphical representation of a derivation. Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols.
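A parse tree can be represented with a small node type. Reading its leaves from left to right (the tree's yield) gives back the derived string. This is a minimal sketch in which `Node`, `tree_yield`, and `well_formed` are hypothetical names and ε-leaves are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def tree_yield(node: Node) -> list[str]:
    """Collect the leaves from left to right: the string the tree derives."""
    if node.is_leaf():
        return [node.symbol]
    out: list[str] = []
    for child in node.children:
        out.extend(tree_yield(child))
    return out


def well_formed(node: Node, nonterminals: set[str]) -> bool:
    """Interior nodes must be non-terminals and leaves must be terminals."""
    if node.is_leaf():
        return node.symbol not in nonterminals
    return node.symbol in nonterminals and all(
        well_formed(c, nonterminals) for c in node.children)


# Parse tree for E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])]),
        Node(")"),
    ]),
])

if __name__ == "__main__":
    print("".join(tree_yield(tree)))
```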


Ambiguity
A grammar that produces more than one parse tree for some sentence is an ambiguous grammar. Ambiguity leads to conflicting interpretations of an input string. For example, with the expression grammar, the sentence id+id*id has two parse trees. The first interprets it as id+(id*id), whereas the second interprets it as (id+id)*id, which causes confusion during the intermediate code generation phase.
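The sketch below makes the ambiguity concrete. It enumerates every parse tree of id+id*id under E → E+E | E*E | id by letting each operator serve as the root in turn, then evaluates each tree. Trees are written compactly as (operator, left, right) tuples. Purely for illustration, the three ids are labeled id1, id2, id3 and bound to 2, 3, 4, and the helper names are hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def all_trees(tokens):
    """Every parse tree of tokens under E -> E+E | E*E | id.

    tokens alternate operand, operator, operand, ... (no parentheses).
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # each operator may be the root
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree, env):
    """Evaluate a tree: a leaf is an identifier, a node is (op, left, right)."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


ENV = {"id1": 2, "id2": 3, "id3": 4}

if __name__ == "__main__":
    for t in all_trees(["id1", "+", "id2", "*", "id3"]):
        print(t, "=", evaluate(t, ENV))
```

Two trees for one sentence give two different values (14 and 20 here), which is exactly why the grammar must be disambiguated before generating code.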


Ambiguity Elimination
Grammars that are ambiguous because of their operators can be disambiguated according to the operators' precedence and associativity rules. For example, to disambiguate the grammar
E → E+E | E*E | E^E | id | (E)
we use the following rules (from highest to lowest precedence):
^ (highest): right to left
*: left to right
+ (lowest): left to right
The resulting unambiguous grammar is:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
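One common way to use the disambiguated grammar is a hand-written recursive-descent parser with one function per non-terminal. E → E+T and T → T*F are left-recursive, so a naive recursive-descent function for them would call itself forever. This sketch therefore implements them as loops, which keeps + and * left-associative; that is a standard substitution and not something the notes prescribe. F → G^F stays recursive, which makes ^ right-associative. The class and function names and the `(op, left, right)` tree format are assumptions.

```python
import re

# One token per match: the keyword "id", an operator, or a parenthesis.
TOKEN = re.compile(r"\s*(id|[+*^()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """E -> E+T | T,  T -> T*F | F,  F -> G^F | G,  G -> id | (E)"""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):  # E -> T {+ T}: loop builds left-leaning trees
        tree = self.parse_T()
        while self.peek() == "+":
            self.pos += 1
            tree = ("+", tree, self.parse_T())
        return tree

    def parse_T(self):  # T -> F {* F}
        tree = self.parse_F()
        while self.peek() == "*":
            self.pos += 1
            tree = ("*", tree, self.parse_F())
        return tree

    def parse_F(self):  # F -> G ^ F | G: recursion on the right
        base = self.parse_G()
        if self.peek() == "^":
            self.pos += 1
            return ("^", base, self.parse_F())
        return base

    def parse_G(self):  # G -> id | (E)
        if self.peek() == "id":
            self.pos += 1
            return "id"
        self.expect("(")
        tree = self.parse_E()
        self.expect(")")
        return tree


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_E()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("id+id*id"))
    print(parse("id^id^id"))
```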

Ambiguity


Ambiguity Elimination
We prefer the parse tree in which each else matches the closest unmatched if, so we can disambiguate the grammar to reflect this choice. The unambiguous grammar is:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
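Hand-written parsers often enforce the closest-if rule directly instead of coding the matched/unmatched grammar: after parsing `if expr then stmt`, they consume an `else` whenever one follows. The sketch below uses this greedy technique, which is swapped in for illustration and yields the same trees as the unambiguous grammar. It assumes hypothetical single tokens `expr` and `other` for an expression and for other statements, and the tuple tree format is illustrative.

```python
def parse_stmt(tokens, pos=0):
    """Parse stmt -> if expr then stmt [else stmt] | other at tokens[pos].

    Returns (tree, next_pos). Trees: "other", ("if", then), ("if", then, else).
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "other":
        return "other", pos + 1
    if tokens[pos:pos + 3] != ["if", "expr", "then"]:
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at {pos}")
    then_branch, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: the innermost call reaches this check first,
    # so an else is always attached to the closest if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", then_branch, else_branch), pos
    return ("if", then_branch), pos


def parse(text):
    tokens = text.split()
    tree, pos = parse_stmt(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at {pos}")
    return tree


if __name__ == "__main__":
    print(parse("if expr then if expr then other else other"))
```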


Top-Down Parser

A top-down parser can be viewed as an attempt to find a left-most derivation for an input string. It can also be viewed as an attempt to construct a parse tree for the input, starting from the root and creating the nodes of the parse tree in preorder.
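The link between the two views can be checked mechanically. Expanding the interior nodes of a finished parse tree left-most first visits them in preorder, and the sentential forms produced along the way are exactly a left-most derivation. This is a minimal sketch that assumes trees written as `(symbol, children)` pairs and no ε-productions; `leftmost_derivation` is a hypothetical name.

```python
def leftmost_derivation(tree):
    """tree = (symbol, children); leaves have an empty children list.

    Repeatedly replaces the left-most unexpanded interior node by its children
    (i.e., expands nodes in preorder) and records each sentential form.
    """
    frontier = [tree]
    forms = ["".join(sym for sym, _ in frontier)]
    while True:
        idx = next((i for i, (_, kids) in enumerate(frontier) if kids), None)
        if idx is None:  # only terminals remain
            return forms
        _, kids = frontier[idx]
        frontier[idx:idx + 1] = kids
        forms.append("".join(sym for sym, _ in frontier))


def leaf(symbol):
    return (symbol, [])


# Parse tree of -(id+id) under the example expression grammar.
TREE = ("E", [leaf("-"),
              ("E", [leaf("("),
                     ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
                     leaf(")")])])

if __name__ == "__main__":
    print(" => ".join(leftmost_derivation(TREE)))
```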


General Top Down Parsing Algo


1. Root = current-node = start-symbol, and current-input-symbol = first input symbol.
2. Form branches of the current-node by applying a production yet to be considered at this node.
3. Current-node = left-most unexplored node, traversing in preorder.
4. If (current-node == terminal symbol) {
       If (current-node == current-input-symbol) {
           If there is no more input symbol to be read, return success.
           Current-input-symbol = next input symbol.
           Go to step 3.
       } Else if (there exists any other production for the parent node yet to be applied) {
           Discard the current production applied at the parent node. // backtracking needed
           Current-node = parent node.
           Go to step 2.
       }
   }
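The algorithm can be sketched in Python with generators, where exhausting the generator for one production plays the role of discarding it and backtracking. This is an illustrative implementation under stated assumptions and not the notes' own. Tokens are single characters, alternatives are tried in the order listed, `trace` records the steps, and a left-recursive grammar (such as E → E+E) would make it recurse forever. Run on the grammar S → cAd, A → ab | a with input cad, it reproduces the expand, match, and backtrack sequence of the illustration that follows.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def parse(symbols, tokens, pos, grammar, trace):
    """Yield every input position reachable after matching `symbols` at `pos`.

    Each production is tried in turn; when its generator is exhausted,
    that production is discarded (backtracking) and the next one is tried.
    """
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:  # non-terminal: form branches, one production at a time
        for rhs in grammar[first]:
            trace.append(f"expand {first} -> {''.join(rhs)}")
            yield from parse(rhs + rest, tokens, pos, grammar, trace)
            trace.append(f"backtrack on {first} -> {''.join(rhs)}")
    elif pos < len(tokens) and tokens[pos] == first:  # terminal matches input
        trace.append(f"match {first}")
        yield from parse(rest, tokens, pos + 1, grammar, trace)
    else:
        trace.append(f"mismatch {first}")


def accepts(grammar, start, text):
    """Return (True, trace) if the whole input is derived from start."""
    trace = []
    for end in parse([start], list(text), 0, grammar, trace):
        if end == len(text):  # no more input symbols: success
            return True, trace
    return False, trace


if __name__ == "__main__":
    ok, steps = accepts(GRAMMAR, "S", "cad")
    print(ok)
    print("\n".join(steps))
```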


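Illustration of Top Down Parsing Algo.
CFG: S → cAd, A → ab | a; Input string: cad
1. Expand S → cAd. The leaf c matches the input symbol c, so advance the pointer to the next symbol, a.
2. Expand A with its first alternative, A → ab. The leaf a matches, so advance the pointer to the next symbol, d.
3. The leaf b does not match d. Backtrack: discard A → ab and move the pointer back to a.
4. Expand A with A → a. The leaf a matches, so advance the pointer to d. The leaf d matches and no input remains, so parsing succeeds.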
