Why Syntax Analysis?

After lexical analysis (scanning), we have a series of tokens. Goal: Recover the structure of the program described by that series of tokens. Goal: Report errors if the tokens do not properly encode a structure.
4/30/12
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is usually a parse tree. The syntax analyzer is also known as the parser. The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation to describe CFGs. The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar. If it does, the parser creates the parse tree of that program. Otherwise, the parser reports error messages. A context-free grammar gives a precise syntactic specification of a programming language. The design of the grammar is an initial phase of the design of a compiler. Some tools can convert a grammar directly into a parser.
Parser
The parser works on a stream of tokens. The smallest item it handles is a token.
Parser Categories
We categorize parsers into two groups:
1. Top-Down Parser: the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.
Context-Free Grammar (CFG)

Recursive structures of a programming language are defined by a context-free grammar. A context-free grammar consists of:
A finite set of terminals (in our case, this will be the set of tokens).
A finite set of non-terminals (syntactic variables).
A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string).
A start symbol (one of the non-terminal symbols).
Example: E → E + E | E - E | E * E | E / E | - E | ( E ) | id
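The four components of a CFG map naturally onto a small data structure. The sketch below is only an illustration: the class name `Grammar`, its field names, and the method `is_well_formed` are assumptions made for this example, and the binary minus alternative of the example grammar is read here as E - E.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    """A context-free grammar: terminals, non-terminals, productions, start symbol."""
    terminals: frozenset
    nonterminals: frozenset
    productions: dict  # non-terminal -> list of right-hand sides (tuples of symbols)
    start: str

    def is_well_formed(self) -> bool:
        """Check the structural conditions from the definition of a CFG."""
        if self.terminals & self.nonterminals:
            return False  # a symbol cannot be both a terminal and a non-terminal
        if self.start not in self.nonterminals:
            return False  # the start symbol must be a non-terminal
        symbols = self.terminals | self.nonterminals
        for head, bodies in self.productions.items():
            if head not in self.nonterminals:
                return False  # the left-hand side must be a single non-terminal
            for body in bodies:
                # An empty tuple stands for the empty string and is allowed.
                if any(sym not in symbols for sym in body):
                    return False
        return True


# The example expression grammar (E - E is an assumed reading of the source).
EXPR = Grammar(
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    productions={
        "E": [
            ("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"), ("E", "/", "E"),
            ("-", "E"), ("(", "E", ")"), ("id",),
        ]
    },
    start="E",
)

print(EXPR.is_well_formed())
```

Storing each right-hand side as a tuple keeps the empty string representable as `()`.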
Derivations
E ⇒ E+E
E derives E+E (E+E derives from E): we can replace E by E+E, and to do so we must have a production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols is called a derivation; the sequence above is a derivation of id+id from E. In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
Derivation Examples
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) (left-most derivation)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id) (right-most derivation)
At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation. Top-down parsers try to find the left-most derivation of the given source program. Bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
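The two derivations of -(id+id) can be replayed mechanically. The following is a minimal sketch; the helper names `derive` and `run` and the production constants are hypothetical, introduced only for this illustration. The same list of productions is applied twice, once always replacing the left-most non-terminal and once always replacing the right-most one.

```python
NONTERMINALS = {"E"}

# Productions as (head, body) pairs; the constant names are illustrative only.
NEG = ("E", ["-", "E"])
PAREN = ("E", ["(", "E", ")"])
PLUS = ("E", ["E", "+", "E"])
ID = ("E", ["id"])


def derive(form, production, leftmost=True):
    """Apply one derivation step to the left-most or right-most non-terminal."""
    head, body = production
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no non-terminal left to replace")
    i = positions[0] if leftmost else positions[-1]
    if form[i] != head:
        raise ValueError(f"production for {head} does not apply to {form[i]}")
    return form[:i] + body + form[i + 1:]


def run(productions, leftmost):
    """Return every sentential form produced, starting from the start symbol E."""
    form = ["E"]
    steps = ["".join(form)]
    for production in productions:
        form = derive(form, production, leftmost)
        steps.append("".join(form))
    return steps


LEFT = run([NEG, PAREN, PLUS, ID, ID], leftmost=True)
RIGHT = run([NEG, PAREN, PLUS, ID, ID], leftmost=False)
print(" => ".join(LEFT))
print(" => ".join(RIGHT))
```

Both runs end in the same sentence; only the intermediate sentential forms differ.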
Parse Tree
A parse tree is a graphical representation of a derivation. Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols.
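As a rough illustration, a parse tree can be held in a small recursive structure. The class `Node` and its methods are assumptions made for this sketch. The tree built here is the one for -(id+id) under the expression grammar; reading its leaves left to right gives back the derived sentence, and every inner node carries a non-terminal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a parse tree; a node without children is a leaf."""
    symbol: str
    children: list = field(default_factory=list)

    def frontier(self):
        """Leaves read left to right, i.e. the string the tree derives."""
        if not self.children:
            return [self.symbol]
        leaves = []
        for child in self.children:
            leaves.extend(child.frontier())
        return leaves

    def inner_symbols(self):
        """Symbols of all inner (non-leaf) nodes, in preorder."""
        if not self.children:
            return []
        symbols = [self.symbol]
        for child in self.children:
            symbols.extend(child.inner_symbols())
        return symbols


# Parse tree for -(id+id): E -> - E, E -> ( E ), E -> E + E, E -> id (twice).
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [
            Node("E", [Node("id")]),
            Node("+"),
            Node("E", [Node("id")]),
        ]),
        Node(")"),
    ]),
])

print("".join(tree.frontier()))
print(tree.inner_symbols())
```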
Ambiguity
A grammar that produces more than one parse tree for a sentence is an ambiguous grammar. It allows conflicting interpretations of the same input string. For id+id*id, one parse tree interprets the sentence as id+(id*id) whereas another interprets it as (id+id)*id, which leads to confusion during the intermediate code generation phase.
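The ambiguity can be made concrete by enumerating every parse tree. The sketch below assumes the ambiguous fragment E → E+E | E*E | id and a hypothetical helper `parses` that returns each tree as a fully bracketed string; it is a brute-force enumeration for illustration, not a practical parsing method.

```python
def parses(tokens):
    """All parse trees of `tokens` under E -> E+E | E*E | id, as bracketed strings."""
    if len(tokens) == 1:
        return ["id"] if tokens[0] == "id" else []
    results = []
    # Try every operator position as the root of the tree.
    for k in range(1, len(tokens) - 1):
        if tokens[k] in ("+", "*"):
            for left in parses(tokens[:k]):
                for right in parses(tokens[k + 1:]):
                    results.append(f"({left}{tokens[k]}{right})")
    return results


TREES = parses(["id", "+", "id", "*", "id"])
for tree in TREES:
    print(tree)
```

Two distinct bracketings come out for the one sentence, which is exactly what makes the grammar ambiguous.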
Ambiguity Elimination
Ambiguous grammars (ambiguous because of their operators) can be disambiguated according to precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
Disambiguate the grammar using:
Precedence      Associativity
^ (highest)     right to left
*               left to right
+               left to right
The unambiguous grammar:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
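One way to see the effect of the layered grammar is a small recursive-descent parser with one function per non-terminal. This is a sketch under stated assumptions: the function names are hypothetical, and the left-recursive rules E → E+T and T → T*F are rewritten as loops, a standard substitution because a recursive-descent function cannot call itself on its first symbol. The output is a fully bracketed string that exposes the grouping.

```python
def parse(tokens):
    """Parse a token list with E -> E+T | T, T -> T*F | F, F -> G^F | G, G -> id | (E)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1

    def expr():  # E: the loop gives + left-to-right associativity
        left = term()
        while peek() == "+":
            eat("+")
            left = f"({left}+{term()})"
        return left

    def term():  # T: the loop gives * left-to-right associativity
        left = factor()
        while peek() == "*":
            eat("*")
            left = f"({left}*{factor()})"
        return left

    def factor():  # F -> G ^ F: right recursion gives ^ right-to-left associativity
        base = primary()
        if peek() == "^":
            eat("^")
            return f"({base}^{factor()})"
        return base

    def primary():  # G -> id | ( E )
        if peek() == "(":
            eat("(")
            inner = expr()
            eat(")")
            return inner
        eat("id")
        return "id"

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(parse("id + id * id".split()))
print(parse("id ^ id ^ id".split()))
```

Each sentence now has exactly one grouping: the lower a rule sits in the grammar, the tighter its operator binds.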
Ambiguity Elimination
For the dangling-else ambiguity, we prefer the parse tree in which each else matches the closest unmatched if. So, we can disambiguate our grammar to reflect this choice. The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
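The closest-if rule can also be sketched in code. Note that the example below does not implement the matched/unmatched grammar directly; it uses a different, commonly used technique, a recursive-descent parser that greedily attaches an else to the innermost open if, which produces the same grouping. The function name `parse_stmt` and the token spellings `expr` and `other` are assumptions made for this illustration.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at `pos`; return (tree, next_pos).

    Tokens are the strings 'if', 'expr', 'then', 'else' and 'other'.
    """
    if pos < len(tokens) and tokens[pos] == "other":
        return "other", pos + 1
    if pos < len(tokens) and tokens[pos] == "if":
        if tokens[pos + 1:pos + 3] != ["expr", "then"]:
            raise SyntaxError("expected 'expr then' after 'if'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy step: an else seen here belongs to this, the closest, if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", then_branch, else_branch), pos
        return ("if", then_branch), pos
    raise SyntaxError(f"unexpected token at position {pos}")


tokens = "if expr then if expr then other else other".split()
tree, end = parse_stmt(tokens)
print(tree)
```

The else ends up inside the inner if node, leaving the outer if without an else branch.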
Top-Down Parser
Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. It can also be viewed as an attempt to construct a parse tree for the input, starting from the root and creating the nodes of the parse tree in preorder.
General Top Down Parsing Algo

1. Root = current-node = start-symbol, and current-input symbol = first input symbol.
2. Form branches of the current-node by applying a production yet to be considered at this node.
3. Current-node = left-most unexplored node, traversing the preorder route.
4. If (current-node == terminal symbol) {
       If (current-node == current-input symbol) {
           If there is no more input symbol to be read, return success.
           Current-input symbol = next-input symbol
           Go to step 3
       } Else if (there exists any other production for the parent node yet to be applied) {
           Discard current production applied at parent node // backtracking needed
           Current-node = parent node
           Go to step 2
       }
   }
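The backtracking idea behind these steps can be sketched as a short recursive recognizer. This is not a line-by-line transcription of the algorithm above: it keeps a list of symbols still to be matched instead of an explicit tree, and the names `derives` and `recognize` are hypothetical. It expands the left-most non-terminal, tries its productions in order, and falls back to the next alternative when a terminal fails to match. It is tried here on the grammar S → cAd, A → ab | a with input cad. It would not terminate on a left-recursive grammar.

```python
def derives(symbols, tokens, grammar, trace):
    """True if the symbol list can derive exactly `tokens` (left-most expansion)."""
    if not symbols:
        return not tokens  # success only when the input is used up as well
    first, rest = symbols[0], symbols[1:]
    if first not in grammar:
        # Terminal: it must match the current input symbol, then advance.
        return bool(tokens) and tokens[0] == first and derives(rest, tokens[1:], grammar, trace)
    for body in grammar[first]:
        trace.append(f"{first} -> {' '.join(body)}")
        if derives(list(body) + rest, tokens, grammar, trace):
            return True
        # This production failed: backtrack and try the next alternative.
    return False


def recognize(text, grammar, start):
    """Return (accepted, productions tried in order) for a string of one-character tokens."""
    trace = []
    accepted = derives([start], list(text), grammar, trace)
    return accepted, trace


# Non-terminals are the dictionary keys; every other symbol is a terminal.
GRAMMAR = {"S": [("c", "A", "d")], "A": [("a", "b"), ("a",)]}

ok, tried = recognize("cad", GRAMMAR, "S")
print(ok, tried)
```

The trace shows A → ab being tried and abandoned before A → a succeeds, which is the backtracking step.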
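Illustration of Top Down Parsing Algo.
CFG: S → cAd, A → ab | a; Input string: cad
1. Expand S → cAd, giving the tree S(c, A, d). The leaf c matches the first symbol of cad, so advance the pointer to the next symbol, a.
2. Expand A → ab, giving S(c, A(a, b), d). The leaf a matches, so advance the pointer to the next symbol, d.
3. The leaf b does not match d, so discard A → ab and move the pointer back to a (backtracking).
4. Expand A → a, giving S(c, A(a), d). The leaf a matches, so advance the pointer to the next symbol, d. The leaf d then matches and the parse succeeds.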