
Introduction

Organization
Basics of Compilers
Stages in a Compiler
Compiler Implementation
Basics of Compilers
What is a Compiler?
Translates code written in a higher-level language into a target language
The target language can be assembly language or object code
Basic Functions of a Compiler
Reports errors and warnings in the input source
Provides options for debugging


Companions of a Compiler
Preprocessor
Compiler
Assembler
Linker
Preprocessor
Substitutes macros, strips comments, and includes header files
Input: higher-level language source
Output: (pure) higher-level language source
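A toy sketch of two of the preprocessor jobs listed above, comment stripping and macro substitution. It assumes only object-like `#define NAME value` macros, ignores `#include` and function-like macros, and assumes comment markers never appear inside string literals; a real C preprocessor (the stage `gcc -E` runs) handles all of these. The name `preprocess` is illustrative.

```python
import re

def preprocess(source: str) -> str:
    """Toy preprocessor: strips C comments and expands object-like #define macros."""
    # Replace /* ... */ comments (which may span lines) with a space, then drop // comments.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)

    macros = {}          # macro name -> replacement text
    out_lines = []
    for line in source.splitlines():
        directive = re.match(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)$", line)
        if directive:
            macros[directive.group(1)] = directive.group(2).strip()
            continue     # the directive itself produces no output line
        for name, body in macros.items():
            # \b stops N from matching inside NN; the lambda keeps the body text literal
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, body=body: body, line)
        out_lines.append(line)
    return "\n".join(out_lines)


src = "#define N 10\nint a[N]; // array size\n/* counter */ int NN = N;"
print(preprocess(src))
```

The output is still C source, which matches the slide: the preprocessor's output is (pure) higher-level language source, now free of comments and macros.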
Compiler
Translates code written in a higher-level language into a target language
Input: preprocessed higher-level language source
Output: lower-level language, usually assembly or object code
Assembler
Converts an assembly language program into a machine language program (object code)
Input: assembly language program
Output: object code
Linker
Links one or more object files into an executable file
Input: one or more object code files
Output: executable file
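To make "links" concrete, here is a toy linker. The object-file format is invented for illustration (a `code` list in which operands written `@name` refer to symbols, plus a `defines` map from symbol to offset); real object files such as ELF or COFF carry relocation entries instead. The sketch shows the two core steps: lay the files out and build a global symbol table, then patch every reference with its final address.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """Hypothetical object file: instructions plus the symbols this file defines."""
    name: str
    code: list                                   # operands written "@sym" are symbol references
    defines: dict = field(default_factory=dict)  # symbol -> offset within this file's code

def link(objects):
    """Combine object files into one executable image (a list of instructions)."""
    # Pass 1: place the files one after another and build a global symbol table.
    symbols, base = {}, 0
    for obj in objects:
        for sym, offset in obj.defines.items():
            if sym in symbols:
                raise ValueError(f"multiple definition of '{sym}'")
            symbols[sym] = base + offset
        base += len(obj.code)
    # Pass 2: copy the code, replacing every "@sym" operand with the symbol's address.
    image = []
    for obj in objects:
        for instr in obj.code:
            op, *args = instr.split()
            for i, arg in enumerate(args):
                if arg.startswith("@"):
                    if arg[1:] not in symbols:
                        raise ValueError(f"undefined reference to '{arg[1:]}'")
                    args[i] = str(symbols[arg[1:]])
            image.append(" ".join([op, *args]))
    return image


main_o = ObjectFile("main.o", ["call @square", "halt"], {"main": 0})
square_o = ObjectFile("square.o", ["mul r0 r0", "ret"], {"square": 0})
print(link([main_o, square_o]))   # ['call 2', 'halt', 'mul r0 r0', 'ret']
```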
The Compiler
Step 2: C compiler (cc1.exe)
Takes in the input source after preprocessing: ex1.i (preprocessed file)
Gives out the assembly language program: ex1.s (assembly file)
Stages in a Compiler
Two stages:
Front end (or analysis): preprocessed file to intermediate code
Back end (or synthesis): intermediate code to target code
ex1.c → Front End / Analysis → Intermediate Code → Back End / Synthesis → Target Assembly Code
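Advantages of a 2-Stage Compiler
Easy to add support for a new machine: write a new back end and reuse the existing front end
Easy to add support for a new language: write a new front end and reuse the existing back end
Static code analysis: analysis of computer software that is performed without actually executing programs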
Front End (Analysis)
ex1.c → Lexical Analysis → Tokens
Tokens → Syntax Analysis → Parse Tree
Parse Tree → Semantic Analysis → Annotated Parse Tree
Annotated Parse Tree → Intermediate Code Generation → Intermediate Code
Intermediate Code → Intermediate Code Optimization → Intermediate Code

Back End (Synthesis)
Intermediate Code → Target Code Generation → Target Assembly Language
Target Assembly Language → Target Code Optimization → Target Assembly Language


Lexical Analysis / Scanning
Reads a stream of characters
Groups the characters into meaningful sequences called lexemes
For each lexeme, the lexical analyzer produces as output a token of the form:
<token-name, attribute value>
token-name: name of a valid token, used during syntax analysis
attribute value: points to the entry for the token in the symbol table
Lexical Analysis / Scanning
Role of Lexical Analysis
Generates tokens
Ignores whitespace (spaces, newlines, etc.)
Generates error messages tagged with the line on which each error occurs
Helps in macro expansion
Lexical Analysis / Scanning
Small meaningful sequences of characters are called tokens
C programming language tokens:
Identifiers (user-defined variables)
Keywords (while, for)
Punctuation marks (left brace, right brace)
Operators (+ and -)
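A minimal hand-written scanner tying the lexical-analysis slides together: it skips whitespace, counts lines so errors can be reported per line, classifies keywords, identifiers, numbers, operators and punctuation, and emits `<token-name, attribute value>` pairs in which an identifier's attribute is the index of its symbol-table entry. The keyword and operator sets are a small assumed subset of C; multi-character operators (`==`, `+=`), floats and strings are left out.

```python
KEYWORDS = {"int", "float", "while", "for", "if", "else", "return"}
OPERATORS = {"+", "-", "*", "/", "="}
PUNCTUATION = {"{", "}", "(", ")", ";", ","}

def tokenize(source):
    """Return (tokens, symbol_table, errors); tokens are (token_name, attribute) pairs."""
    tokens, symtab, errors = [], [], []
    slot = {}                      # lexeme -> index of its entry in symtab
    i, line, n = 0, 1, len(source)
    while i < n:
        ch = source[i]
        if ch == "\n":
            line += 1              # track lines for error messages
            i += 1
        elif ch.isspace():
            i += 1                 # whitespace produces no token
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            lexeme = source[i:j]
            if lexeme in KEYWORDS:
                tokens.append((lexeme, None))
            else:
                if lexeme not in slot:         # first occurrence: new symbol-table entry
                    slot[lexeme] = len(symtab)
                    symtab.append(lexeme)
                tokens.append(("id", slot[lexeme]))
            i = j
        elif ch.isdigit():
            j = i
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(("num", int(source[i:j])))
            i = j
        elif ch in OPERATORS or ch in PUNCTUATION:
            tokens.append((ch, None))
            i += 1
        else:
            errors.append(f"line {line}: unexpected character {ch!r}")
            i += 1                 # skip the bad character and keep scanning
    return tokens, symtab, errors


print(tokenize("count = count + 1;")[0])
# [('id', 0), ('=', None), ('id', 0), ('+', None), ('num', 1), (';', None)]
```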
Syntax Analysis / Parsing
Do the tokens form a valid sequence?
"= b c+d" is not a valid sequence
A syntax (parse) tree can be the output
Tokens → Parser → tree-like intermediate representation (syntax tree)
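A sketch of a recursive-descent parser for single assignment statements, under an assumed small expression grammar (given in the docstring). It builds the syntax tree as nested tuples such as `("+", "b", ("*", "c", "d"))` and raises `SyntaxError` for invalid token sequences such as the slide's `= b c+d`. The tokenizing regex is a shortcut standing in for a real lexer.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def parse_assignment(text):
    """Recursive-descent parser for the (assumed) grammar
         stmt   -> ID '=' expr
         expr   -> term (('+' | '-') term)*
         term   -> factor (('*' | '/') factor)*
         factor -> ID | NUM | '(' expr ')'
    Returns a syntax tree of nested tuples, or raises SyntaxError."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", text)   # any other char is its own token
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(ok, what):
        nonlocal pos
        tok = peek()
        if tok is None or not ok(tok):
            raise SyntaxError(f"expected {what}, got {tok!r}")
        pos += 1
        return tok

    def is_id(tok):
        return IDENT.fullmatch(tok) is not None

    def factor():
        if peek() == "(":
            expect(lambda t: t == "(", "'('")
            node = expr()
            expect(lambda t: t == ")", "')'")
            return node
        return expect(lambda t: is_id(t) or t.isdigit(), "identifier or number")

    def term():
        node = factor()
        while peek() in ("*", "/"):          # left-associative * and /
            op = expect(lambda t: True, "operator")
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):          # left-associative + and -
            op = expect(lambda t: True, "operator")
            node = (op, node, term())
        return node

    target = expect(is_id, "identifier")
    expect(lambda t: t == "=", "'='")
    value = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return ("=", target, value)


print(parse_assignment("a = b + c * d"))   # ('=', 'a', ('+', 'b', ('*', 'c', 'd')))
```

Each grammar rule becomes one function, and operator precedence falls out of the rule nesting: `*` binds tighter than `+` because `term` sits below `expr`.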
Semantic Analysis
Performs type checking
Do syntactically correct statements have a meaningful reading?
'x = y + 2;' is not meaningful if x is a function name or an array and y is a float
Syntax tree + information from the symbol table → Semantic Analysis → checks whether the program is meaningful or not
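A sketch of the type check on the slide's `x = y + 2;` example, using a syntax tree of nested tuples and a symbol table mapping each name to its kind. The type rules are a simplified assumption (only `int` and `float` are arithmetic types; mixing them yields `float`); a real C front end applies the full usual arithmetic conversions.

```python
SCALARS = {"int", "float"}

def type_of(node, symtab):
    """Type of an expression tree (nested tuples); raises TypeError if meaningless."""
    if isinstance(node, str):
        if node.isdigit():
            return "int"                          # integer literal
        kind = symtab.get(node)
        if kind is None:
            raise TypeError(f"'{node}' is not declared")
        if kind not in SCALARS:
            raise TypeError(f"'{node}' is a {kind}; it cannot be used in arithmetic")
        return kind
    _op, left, right = node
    left_t, right_t = type_of(left, symtab), type_of(right, symtab)
    # Simplified arithmetic conversion: int combined with float gives float.
    return "float" if "float" in (left_t, right_t) else "int"

def check_assignment(tree, symtab):
    """Check ('=', target, expr) and return the type annotated on the right-hand side."""
    _, target, rhs = tree
    kind = symtab.get(target)
    if kind not in SCALARS:
        raise TypeError(f"cannot assign to '{target}' ({kind or 'undeclared'})")
    return type_of(rhs, symtab)


symtab = {"x": "int", "y": "float", "f": "function", "arr": "array"}
print(check_assignment(("=", "x", ("+", "y", "2")), symtab))   # float
```

Replacing the target `x` with `f` or `arr` makes `check_assignment` raise `TypeError`, which is the "not meaningful" case on the slide.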
Intermediate Code Generation
Generates intermediate code
Uses three-address code, which consists of a sequence of assembly-like instructions with at most three operands each
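A sketch of three-address code generation from a syntax tree of nested tuples: a post-order walk gives each operator node a fresh temporary (`t1`, `t2`, ...) so that every emitted instruction has at most one operator and three addresses. The tree format and temporary naming are assumptions for illustration.

```python
from itertools import count

def gen_tac(tree):
    """Generate three-address code for an assignment tree ('=', target, expr)."""
    code = []
    temps = count(1)                 # supplies fresh temporary numbers 1, 2, 3, ...

    def gen(node):
        if isinstance(node, str):
            return node              # identifiers and literals are already addresses
        op, left, right = node
        a, b = gen(left), gen(right) # operands first (post-order)
        t = f"t{next(temps)}"
        code.append(f"{t} = {a} {op} {b}")
        return t

    _, target, expr = tree
    code.append(f"{target} = {gen(expr)}")
    return code


for instr in gen_tac(("=", "a", ("+", "b", ("*", "c", "d")))):
    print(instr)
# t1 = c * d
# t2 = b + t1
# a = t2
```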
Intermediate Code Optimization
Optimizes the intermediate code
Methods:
Common Sub-expression Elimination: in the following code, b * c is computed twice:
a = b * c + g; d = b * c * e;
It can be computed once and reused: tmp = b * c; a = tmp + g; d = tmp * e;
Dead Code Elimination: removes code that does not affect the program results
Copy Propagation: the process of replacing the occurrences of targets of direct assignments with their values. For example, given
y = x
z = 3 + y
copy propagation would yield: z = 3 + x
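The three methods above, sketched as passes over straight-line three-address code (a single basic block with no function calls, which is what makes the simple bookkeeping safe). Instructions are assumed tuples `(dest, op, arg1, arg2)`, with `op = None` meaning a plain copy; variables are strings and constants are ints. Commutativity (`b * c` vs `c * b`) is deliberately not handled.

```python
def cse(code):
    """Local common sub-expression elimination."""
    available = {}                        # (op, arg1, arg2) -> variable holding that value
    out = []
    for dest, op, a, b in code:
        key = (op, a, b)
        if op is not None and key in available:
            out.append((dest, None, available[key], None))   # reuse the earlier result
        else:
            out.append((dest, op, a, b))
        # dest now holds a new value: forget expressions that read it or are stored in it
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in k[1:]}
        if op is not None and dest not in (a, b):
            available[key] = dest
    return out

def copy_propagate(code):
    """After 'x = y', replace uses of x with y until x or y is reassigned."""
    copies = {}
    out = []
    for dest, op, a, b in code:
        a, b = copies.get(a, a), copies.get(b, b)
        out.append((dest, op, a, b))
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if op is None and a != dest:
            copies[dest] = a
    return out

def dead_code_elim(code, live_out):
    """Drop assignments whose result is never used afterwards (backward scan)."""
    live = set(live_out)                  # variables still needed after this code
    out = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            out.append((dest, op, a, b))
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
        # otherwise the value is never read, so the instruction is dropped
    return out[::-1]


# a = b * c + g; d = b * c * e;   with a and d used after this block
code = [("t1", "*", "b", "c"), ("a", "+", "t1", "g"),
        ("t2", "*", "b", "c"), ("d", "*", "t2", "e")]
print(dead_code_elim(copy_propagate(cse(code)), {"a", "d"}))
# [('t1', '*', 'b', 'c'), ('a', '+', 't1', 'g'), ('d', '*', 't1', 'e')]
```

The passes feed each other: CSE turns the second `b * c` into the copy `t2 = t1`, copy propagation rewrites the use of `t2`, and dead code elimination then removes the now-unused copy.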
Target Code Generation
Intermediate code is translated into machine or assembly code
Associates memory locations with the variables
Generates assembly instructions for the target processor

Intermediate Code     Target Code (x86 Assembly)
x := y + z            movl _y, %eax
                      addl _z, %eax
                      movl %eax, _x
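A naive code generator that reproduces the table above from three-address tuples `(dest, op, arg1, arg2)`. Following the example, every variable lives in memory under a `_name` label and `%eax` is the only register used; integer constants become immediates. The opcode table covers only `+`, `-` and `*` and is an assumption for illustration.

```python
OPCODES = {"+": "addl", "-": "subl", "*": "imull"}   # AT&T-syntax 32-bit instructions

def operand(x):
    """Integer constants become immediates ($5); variables become memory labels (_x)."""
    return f"${x}" if isinstance(x, int) else f"_{x}"

def gen_x86(code):
    asm = []
    for dest, op, a, b in code:
        asm.append(f"movl {operand(a)}, %eax")                # load the first operand
        if op is not None:
            asm.append(f"{OPCODES[op]} {operand(b)}, %eax")   # combine with the second
        asm.append(f"movl %eax, {operand(dest)}")             # store the result in memory
    return asm


print("\n".join(gen_x86([("x", "+", "y", "z")])))
# movl _y, %eax
# addl _z, %eax
# movl %eax, _x
```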
Target Code Optimization
Target code is transformed into more efficient target code
e.g. better usage of registers
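One simple target-code optimization that makes better use of registers is a peephole pass: when an instruction stores `%eax` to a location and the very next instruction loads that same location back into `%eax`, the load is redundant. This sketch works on assembly lines as strings and assumes plain (non-volatile) memory and no labels or jumps between the two instructions.

```python
STORE = "movl %eax, "

def peephole(asm):
    """Delete a load that re-reads the location just stored from %eax."""
    out = []
    for instr in asm:
        if out and out[-1].startswith(STORE):
            stored = out[-1][len(STORE):]
            if instr == f"movl {stored}, %eax":
                continue              # %eax already holds this value
        out.append(instr)
    return out


naive = ["movl _b, %eax", "imull _c, %eax", "movl %eax, _t1",
         "movl _t1, %eax", "addl _g, %eax", "movl %eax, _a"]
print(peephole(naive))   # the 'movl _t1, %eax' reload is gone
```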


Compiler Implementation
Compiler Implementations
Simple Modular Implementation
The lexical analysis module scans the entire input source program, breaks it up into tokens, and outputs the entire list of tokens.
The syntax analysis module takes the entire list of tokens and outputs a parse tree.
The semantic analysis module takes the entire parse tree, performs type checking, and annotates it with data type information.
Compiler Implementations
Pass and Phase

A pass is a single time the compiler passes over (goes through) the source code

Phase is used to classify compilers according to their construction, e.g. 2 phases (analysis and synthesis)

A Multi-Pass Compiler


Compiler Implementations
Collapsing several phases into one pass
1. Read a single token. [Lexical Analysis]
2. If the token sequence matches a grammar rule, go to step 3; else go to step 1. [Syntax Analysis]
3. Perform the semantic check for the matched grammar construct. [Semantic Analysis]
4. Generate intermediate code for the matched grammar construct. [Intermediate Code Generation]
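The four steps above can be sketched as a generator that reads one token at a time and emits three-address code as soon as a statement is matched, without ever building a full token list or parse tree. The two grammar rules (`int ID ;` and `ID = ID op ID ;`) are an assumed toy language; because both end in `;`, the sketch only tests the rules when a `;` arrives.

```python
import re

def tokens(source):
    """Lexical analysis on demand: yields one token each time the loop asks."""
    for m in re.finditer(r"[A-Za-z_]\w*|\S", source):
        yield m.group()

def is_var(tok):
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None and tok != "int"

def one_pass(source):
    """Yield three-address code statement by statement, in a single pass."""
    declared, buf = set(), []
    for tok in tokens(source):                       # step 1: read a single token
        buf.append(tok)
        if tok != ";":
            continue                                 # step 2: no rule complete yet, back to step 1
        if len(buf) == 3 and buf[0] == "int" and is_var(buf[1]):
            declared.add(buf[1])                     # rule: decl -> 'int' ID ';'
        elif (len(buf) == 6 and is_var(buf[0]) and buf[1] == "="
              and is_var(buf[2]) and buf[3] in ("+", "-") and is_var(buf[4])):
            target, _, left, op, right, _ = buf      # rule: stmt -> ID '=' ID op ID ';'
            for name in (target, left, right):       # step 3: semantic check
                if name not in declared:
                    raise NameError(f"'{name}' is used before it is declared")
            yield f"{target} = {left} {op} {right}"  # step 4: emit intermediate code
        else:
            raise SyntaxError("no grammar rule matches: " + " ".join(buf))
        buf.clear()
    if buf:
        raise SyntaxError("unexpected end of input: " + " ".join(buf))


print(list(one_pass("int a; int b; int c; a = b + c; c = a - b;")))
# ['a = b + c', 'c = a - b']
```

Because `one_pass` is a generator, the code for the first statement is available before the rest of the input has even been scanned.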
Compiler Implementations
Collapsing several phases into one pass
ex1.c
Pass 1: Lexical Analysis + Syntax Analysis + Semantic Analysis + Intermediate Code Generation → Intermediate Code
Pass 2: Intermediate Code Optimization → Intermediate Code
Pass 3: Target Code Generation → Target Assembly Language
Pass 4: Target Code Optimization → Target Assembly Language
Front End (Analysis): Passes 1 and 2; Back End (Synthesis): Passes 3 and 4


Data Structures in a Compiler
Symbol Table
Information about identifiers used in the
input source program
= , + , id etc.
Literal Table
Stores the strings and constants found in the input source program
e.g. "Hello World", 5, "x=%d y=%d\n"
Conserves memory by reusing the constants and strings
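A minimal literal table: each distinct constant is stored once, and every later occurrence gets back the slot of the first one, which is how the table conserves memory. The class and method names are illustrative; keying on `(type, value)` keeps `5` and `5.0` in separate slots even though Python considers them equal.

```python
class LiteralTable:
    """Stores each distinct string or numeric constant once; repeats reuse the same slot."""

    def __init__(self):
        self.entries = []          # slot number -> literal value
        self._slot = {}            # (type, value) -> slot number

    def intern(self, value):
        key = (type(value), value)
        if key not in self._slot:  # first time we see this literal
            self._slot[key] = len(self.entries)
            self.entries.append(value)
        return self._slot[key]


table = LiteralTable()
for lit in ["Hello World", 5, "x=%d y=%d\n", "Hello World", 5]:
    print(repr(lit), "-> slot", table.intern(lit))
```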
Symbol Table
Each entry corresponds to a symbol (identifier)
Contains the name of the symbol, its data type, size, etc.
Updated and looked up in different phases of the compiler
Implemented as a hash table with indexes for fast lookup
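A sketch of a hash-table symbol table built on Python dicts (which are hash tables). The nested-scope stack goes slightly beyond the slide and is an assumption: it shows how an inner declaration can shadow an outer one during lookup. Sizes in the example assume a typical target with 4-byte `int` and `float`.

```python
from dataclasses import dataclass

@dataclass
class Symbol:
    name: str
    data_type: str
    size: int                               # size in bytes

class SymbolTable:
    """Hash-table symbol table with nested scopes (a stack of dicts)."""

    def __init__(self):
        self.scopes = [{}]                  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, data_type, size):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"redeclaration of '{name}'")
        scope[name] = Symbol(name, data_type, size)

    def lookup(self, name):
        # Search the innermost scope first so inner declarations shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


st = SymbolTable()
st.insert("x", "int", 4)
st.enter_scope()
st.insert("x", "float", 4)
print(st.lookup("x"))    # Symbol(name='x', data_type='float', size=4)
st.exit_scope()
print(st.lookup("x"))    # Symbol(name='x', data_type='int', size=4)
```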
Other Data Structures
Intermediate Code
Array / linked list of structures, for easy reorganization
Parse Tree
Pointer-based structure, where the parent and children contain pointers to each other for quick traversal
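A sketch of both structures: intermediate code as a list of quadruple records (easy to insert, delete or reorder), and a parse-tree node holding pointers both to its children and to its parent, so traversal works in either direction. The field names (`op`, `arg1`, `arg2`, `result`, `label`) are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Quad:
    """One intermediate-code instruction: result = arg1 op arg2."""
    op: str
    arg1: str
    arg2: Optional[str]
    result: str

@dataclass(eq=False)
class Node:
    """Parse-tree node; parent and children point at each other."""
    label: str
    parent: Optional["Node"] = field(default=None, repr=False)  # repr=False avoids a cycle
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self                 # link upward ...
        self.children.append(child)         # ... and downward
        return child

    def path_to_root(self):
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels


# Reorganizing a list of quads is just list editing, e.g. deleting a redundant instruction.
code = [Quad("*", "b", "c", "t1"), Quad("+", "t1", "g", "a"), Quad("*", "b", "c", "t2")]
del code[2]

root = Node("=")
root.add(Node("a"))
plus = root.add(Node("+"))
leaf = plus.add(Node("b"))
print(leaf.path_to_root())   # ['b', '+', '=']
```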
Compiler Construction Tools
Scanner Generator:
These tools generate lexical analyzers from regular expressions.

Parser Generator:
These tools produce syntax analyzers from an input grammar.
The generated analyzers build the parse tree.

Syntax Directed Translation Engines:
These tools produce routines that generate intermediate code from the parse tree.
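The scanner-generator idea in miniature: given a list of `(token_name, regex)` rules, `make_scanner` builds a scanner function automatically instead of it being written by hand. Tools such as Lex/Flex emit C source and choose the longest match; this sketch instead compiles one combined Python regex at run time, where the first listed alternative that matches wins, so rule order matters (keywords are listed before identifiers). The `SKIP` token name is an assumed convention for discarded input.

```python
import re

def make_scanner(spec):
    """'Generate' a scanner from (token_name, regex) rules listed in priority order.
    Tokens named SKIP (e.g. whitespace) are matched but not returned.
    Rule regexes should use (?:...) rather than capturing groups."""
    master = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))

    def scan(text):
        pos, out = 0, []
        while pos < len(text):
            m = master.match(text, pos)
            if m is None or m.end() == pos:          # no rule matches here
                raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
            if m.lastgroup != "SKIP":
                out.append((m.lastgroup, m.group()))
            pos = m.end()
        return out

    return scan


SPEC = [
    ("KEYWORD", r"(?:while|for|int)\b"),   # before ID so keywords win
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("OP", r"[-+*/=]"),
    ("PUNCT", r"[{}();]"),
    ("SKIP", r"\s+"),
]
scan = make_scanner(SPEC)
print(scan("while (x1) x1 = x1 - 1;"))
```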
Compiler Construction Tools
Automatic Code Generators:
These tools take a collection of rules that define the translation of each operation of the intermediate language into machine language.

Data Flow Engines:
These tools generate the information needed to perform code optimization.
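As an example of the information a data-flow engine computes, here is iterative live-variable analysis over a control-flow graph, repeated until nothing changes (a fixed point). The CFG, its block names, and the per-block `use`/`defs` sets are hypothetical inputs; `use` holds the variables a block reads before writing them. Liveness is what tells an optimizer, for instance, that an assignment is dead.

```python
def liveness(blocks, succ):
    """Iterative live-variable analysis.
    blocks: name -> (use, defs) sets;  succ: name -> list of successor names.
    Returns (live_in, live_out) dicts of sets."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                           # iterate until a fixed point
        changed = False
        for b, (use, defs) in blocks.items():
            # live on exit = live on entry to any successor
            out = set().union(*(live_in[s] for s in succ.get(b, [])))
            # live on entry = read here, or live on exit and not overwritten here
            inn = use | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


# B1: i = 0; s = 0   B2: if i < n goto B3 else B4   B3: s = s + i; i = i + 1; goto B2   B4: return s
blocks = {
    "B1": (set(), {"i", "s"}),
    "B2": ({"i", "n"}, set()),
    "B3": ({"s", "i"}, {"s", "i"}),
    "B4": ({"s"}, set()),
}
succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"]}
live_in, live_out = liveness(blocks, succ)
print(sorted(live_in["B1"]))    # ['n']
```

Only `n` is live on entry to `B1`, i.e. it is the one variable this code reads without first assigning it.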
Questions
Q1. A compiler is a program. To compile a program we need a compiler. Then how is the code of a compiler executed?

Q2. What is the difference between a compiler and an interpreter?
Compiler vs. Interpreter

Compiler
Pros: less space; fast execution
Cons: slow processing (partly solved by separate compilation); debugging (improved through IDEs)

Interpreter
Pros: easy debugging; fast development
Cons: not for large projects (exceptions: Perl, Python); requires more space; slower execution; interpreter in memory all the time
Applications of Compiler Technology
Implementation of high-level programming languages
- Concepts of OOP
Optimization for computer architectures
- Parallelism
- Memory hierarchies of machines (registers, arrays, etc.)
Design of new computer architectures
- RISC
- CISC
- SIMD, etc.
Program translations
Software productivity tools
Translation of a statement