
Compilers

A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. The translation process should also report the presence of errors in the source program.

Source Program -> Compiler -> Target Program
(the Compiler also produces Error Messages)

There are two parts of compilation. The analysis part breaks up the source program into its constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.

Phases of Compiler
The compiler has a number of phases, plus a symbol table manager and an error handler.

Input: Source Program

1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator

Output: Target Program

The Symbol Table Manager and the Error Handler interact with all of the phases.

The cousins of the compiler are:

1. Preprocessor.
2. Assembler.
3. Loader and Link-editor.

Front End vs Back End of a Compiler

The phases of a compiler are collected into a front end and a back end. The front end includes all analysis phases and the intermediate code generator. The back end includes the code optimization phase and the final code generation phase. The front end analyzes the source program and produces intermediate code, while the back end synthesizes the target program from the intermediate code.

A naive (brute force) approach to the front end might run the phases serially:

1. The lexical analyzer takes the source program as input and produces a long string of tokens.
2. The syntax analyzer takes the output of the lexical analyzer and produces a large tree.
3. The semantic analyzer takes the output of the syntax analyzer and produces another tree.
4. Similarly, the intermediate code generator takes the tree produced by the semantic analyzer as input and produces intermediate code.
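The serial approach can be sketched in a few lines. This is only an illustration under simplifying assumptions: the input language is limited to integers joined by + and -, the semantic phase is omitted, and the function names lex, parse, and gen are made up for the example. Each phase finishes completely and hands its whole result to the next one.

```python
import re

def lex(source):
    """Phase 1: return the complete token list for the source text."""
    return re.findall(r"\d+|[-+]", source)

def parse(tokens):
    """Phase 2: build a left-associative tree from the whole token list.

    Assumes well-formed input: NUMBER (OP NUMBER)*.
    """
    tree = int(tokens[0])
    for i in range(1, len(tokens), 2):
        tree = (tokens[i], tree, int(tokens[i + 1]))
    return tree

def gen(tree):
    """Phase 3: walk the finished tree and emit stack-machine instructions."""
    if isinstance(tree, int):
        return [f"push {tree}"]
    op, left, right = tree
    return gen(left) + gen(right) + ["add" if op == "+" else "sub"]

tokens = lex("9 - 5 + 2")   # the full token list is held in memory
tree = parse(tokens)        # the full tree is held in memory
code = gen(tree)
print(code)
```

Note that both the token list and the tree exist in full before the next phase starts, which is the source of the space cost of this approach.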

Minus Points

Requires an enormous amount of space to store tokens and trees.

Very slow, since each phase would have to read its input from, and write its output to, temporary disk files.

Remedy

Use syntax-directed translation to interleave the actions of the phases.

Compiler Construction Tools

Parser Generators: Produce syntax analyzers, normally from an input specification based on a context-free grammar.
Scanner Generators: Produce lexical analyzers. The input specification is based on regular expressions, and the resulting analyzer is organized as a finite automaton.
Syntax-Directed Translation Engines: Walk the parse tree and, as a result, generate intermediate code.
Automatic Code Generators: Translate the intermediate language into machine language.
Data-Flow Engines: Perform code optimization using data-flow analysis.
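Two of the ideas above can be sketched together: a scanner specified by regular expressions, and a translation that interleaves scanning, parsing, and output instead of running them serially. This is a minimal sketch, not the output of any real generator tool. The token names, the scan and to_postfix functions, and the restriction to integers joined by + and - are all assumptions made for the example. Python's re module does the matching here; a generated scanner would compile the patterns into a finite automaton.

```python
import re

# Token classes written as regular expressions, the kind of input a
# scanner generator accepts. The names are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[-+]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def scan(text):
    """Yield (kind, lexeme) pairs on demand; no token list is stored.

    Characters that match no pattern are silently skipped in this sketch.
    """
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()

def to_postfix(text):
    """Translate infix to postfix in one pass, without building a tree.

    Assumes well-formed input: NUMBER (OP NUMBER)*.
    """
    stream = scan(text)
    _, first = next(stream)
    out = [first]
    for _, op in stream:            # each operator ...
        _, operand = next(stream)   # ... is followed by its right operand
        out.append(operand)
        out.append(op)              # emit the operator after both operands
    return " ".join(out)

print(to_postfix("9 - 5 + 2"))
```

Because scan is a generator, the translator pulls one token at a time, so neither a complete token string nor a parse tree is ever stored.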

Syntax Definition
A context-free grammar, CFG (synonym: Backus-Naur Form, or BNF), is a common notation for specifying the syntax of a language. For example, an "IF-ELSE" statement in the C language has the form

IF (Expr) stmt ELSE stmt

In other words, it is the concatenation of:

the keyword IF;
an opening parenthesis (;
an expression Expr;
a closing parenthesis );
a statement stmt;
the keyword ELSE;
finally, another statement stmt.

The syntax of an "IF-ELSE" statement can be specified by the following production rule in the CFG:

stmt → IF (Expr) stmt ELSE stmt

The arrow (→) is read as "can have the form".

A context-free grammar (CFG) has four components:

1. A set of tokens, called terminals.
2. A set of variables, called nonterminals.
3. A set of production rules.
4. A designation of one of the nonterminals as the start symbol.

Multiple productions with the same nonterminal on the left, like

list → list + digit
list → list - digit
list → digit

may be grouped together, separated by vertical bars, like

list → list + digit | list - digit | digit
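The four components can be written down directly as data. The sketch below does this for the list grammar and then replays a derivation of 9 - 5 + 2. The productions for digit (digit → 0 | 1 | ... | 9) are an assumption, since the text only gives the productions for list, and the derive helper is a made-up name for the example.

```python
# The four components of a CFG for the "list" grammar.
TERMINALS = set("+-0123456789")
NONTERMINALS = {"list", "digit"}
PRODUCTIONS = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    # Assumed: digit -> 0 | 1 | ... | 9 (not spelled out in the text).
    "digit": [[d] for d in "0123456789"],
}
START = "list"

def derive(form, choices):
    """Rewrite the leftmost nonterminal once per entry in choices.

    Each entry is an index into that nonterminal's list of production bodies.
    """
    steps = [" ".join(form)]
    for body_index in choices:
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        form = form[:i] + PRODUCTIONS[form[i]][body_index] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# list => list + digit => list - digit + digit => digit - digit + digit => ...
steps = derive([START], [0, 1, 2, 9, 5, 2])
for step in steps:
    print(step)
```

The last line printed is the token string 9 - 5 + 2, which contains only terminals, so the derivation is complete.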

Ambiguity
A grammar is ambiguous if two or more different parse trees can generate the same token string. Equivalently, an ambiguous grammar allows two different leftmost derivations for a token string. A grammar for a compiler should be unambiguous, since different parse trees give a token string different meanings. Consider the following grammar:

string → string + string | string - string | 0 | 1 | 2 | ... | 9

To show that a grammar is ambiguous, all we need to do is find a single string that has more than one parse tree.

[Figure 23, pg. 31: two parse trees for 9 - 5 + 2]

The figure shows two different parse trees for the token string 9 - 5 + 2, corresponding to two different ways of parenthesizing the expression: (9 - 5) + 2 and 9 - (5 + 2). The first parenthesization evaluates to 6, the second to 2.

Perhaps the most famous example of ambiguity in a programming language is the dangling ELSE.
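The two parse trees for 9 - 5 + 2 can also be found mechanically. The sketch below enumerates every tree the ambiguous string grammar allows by trying each operator as the root, then evaluates each tree. It assumes a well-formed token list that alternates digits and operators; parse_trees and evaluate are illustrative names.

```python
def parse_trees(tokens):
    """Return every parse tree for
    string -> string + string | string - string | digit."""
    if len(tokens) == 1:
        return [int(tokens[0])]
    trees = []
    for i in range(1, len(tokens), 2):   # operators sit at odd positions
        op = tokens[i]
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((op, left, right))
    return trees

def evaluate(tree):
    """Compute the value a tree assigns to the expression."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)

trees = parse_trees("9 - 5 + 2".split())
values = sorted(evaluate(t) for t in trees)
print(len(trees), values)
```

One token string yields two trees with two different values, which is why such a grammar is unsuitable for a compiler as it stands.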

Consider the grammar G with the productions:

S → IF b THEN S ELSE S | IF b THEN S | a

G is ambiguous, since the sentence

IF b THEN IF b THEN a ELSE a

has two different parse trees (derivation trees).

[Figure: Parse tree I]

This parse tree imposes the interpretation

IF b THEN (IF b THEN a) ELSE a

[Figure: Parse tree II]

This parse tree imposes the interpretation

IF b THEN (IF b THEN a ELSE a)

The reason that the grammar G is ambiguous is that an ELSE can be associated with two different THENs. For this reason, programming languages that allow both IF-THEN-ELSE and IF-THEN constructs can be ambiguous.
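The dangling ELSE can be demonstrated with a small exhaustive parser for G. This is a sketch under stated assumptions: tokens are separated by spaces, trees are plain tuples, and parse_S is a hypothetical helper that returns every way to parse an S at a given position rather than committing to one choice.

```python
def parse_S(tokens, pos):
    """Return every (tree, next_pos) pair for an S starting at tokens[pos]."""
    results = []
    if tokens[pos:pos + 1] == ["a"]:                      # S -> a
        results.append(("a", pos + 1))
    if tokens[pos:pos + 3] == ["IF", "b", "THEN"]:
        for then_tree, p in parse_S(tokens, pos + 3):
            results.append((("IF", then_tree), p))        # S -> IF b THEN S
            if tokens[p:p + 1] == ["ELSE"]:               # S -> IF b THEN S ELSE S
                for else_tree, q in parse_S(tokens, p + 1):
                    results.append((("IF", then_tree, else_tree), q))
    return results

sentence = "IF b THEN IF b THEN a ELSE a".split()
# Keep only the parses that consume the whole sentence.
full = [tree for tree, end in parse_S(sentence, 0) if end == len(sentence)]
for tree in full:
    print(tree)
```

The two trees printed correspond to the two interpretations: in one the ELSE belongs to the outer IF, and in the other it belongs to the inner IF.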
