You are on page 1of 23

Module 1

Introduction to Compilers
Organization of a computer
Categorization of System Software
Text Editors
Compilers
Assemblers
Linkers
Loaders
Macro processor
Compilers
Definition: a program that converts instructions into a
machine-code or lower-level form so that they can be
read and executed by a computer

A compiler is a computer software that transforms


computer code written in one programming language
(the source language) into another programming
language

A compiler is a software program that transforms


high-level source code that is written by a developer in
a high-level programming language into a low level
object code (binary code) in machine language, which
can be understood by the processor. The process of
converting high-level programming into machine
language is known as compilation.
Fig: Structure of a compiler

Fig: Structure of a compiler

Fig: Structure of an interpreter


Difference between compiler and interpreter

INTERPRETER COMPILER

Translates program one statement at a Scans the entire program and translates it
time. as a whole into machine code.

It takes less amount of time to analyze the It takes large amount of time to analyze
source code but the overall execution time the source code but the overall execution
is slower. time is comparatively faster.

Generates intermediate object code which


No intermediate object code is generated,
further requires linking, hence requires
hence are memory efficient.
more memory.

Continues translating the program until the It generates the error message only after
first error is met, in which case it stops. scanning the whole program. Hence
Hence debugging is easy. debugging is comparatively hard.

Programming language like Python, Ruby Programming language like C, C++ use
use interpreters. compilers.
History of Compilers
Corrado Bohm – 1951
Navy Electronics Laboratory
International ALGOL Compiler - NELIAC
Lisp
Forth
Source program

Preprocessor

Modified Source program

Compiler

Target Assembly Program

Assembler

Relocatable Machine Code

Linker/Loader

Target Machine Code

Fig: Language Processing System


Evolution of Programming Languages
Machine language as in sequence of 0’s and 1’s –
1940
◦ Move data
◦ Compare two values
◦ Hard to understand and modify
Mnemonic Assembly Language – 1950
Higher level languages – more user friendly, less
error prone, easy modification
◦ Fortran – Scientific Calculation
◦ Cobol – Business data processing
◦ Lisp – symbolic computation
Generation of Programming Languages
First Generation
◦ Machine languages

Second Generation
◦ Assembly languages

Third Generation
◦ Fortran, Cobol, Lisp, C, C++, C#, Java

Fourth Generation
◦ NOMAD, SQL, Postscript

Fifth Generation
◦ Prolog
Impact on Compilers
Advances in programming language – new
algorithms and representations

Minimize the execution overhead

Compiler development is challenging

Generation of optimal target code


Development of Compilers
Formulation of abstract mathematical model
◦ Problem analysis
◦ Formulation of mathematical abstraction
◦ Solve using mathematical techniques

Preserve the meaning of the program under translation

Models used are: finite-state machines, regular


expressions, context free grammars
◦ Correct, with meaning preserved, improvement in
performance – speed of code execution, efficient in terms
of time and efforts, minimize power consumption and
usable
Code optimization
Correctness of the optimized code

Improve the performance of the program


◦ Speed of program execution
◦ Minimize power consumption
◦ Usable

Reasonable compilation time


◦ Less compilation time and debugging cycle

Manageable engineering effort


◦ Prioritize optimization
Structure of a Compiler
Lexical Analysis
LA or Scanners reads the source program one
character at a time – group of meaningful
sequence of characters – lexeme
Output – token
◦ <token_name, attribute_value>
A token describes a pattern of characters having
same meaning in the source program
Puts information about identifiers into the symbol
table not all attributes
Regular expressions are used to describe tokens
Blank spaces removed
DFA
Syntax Analysis
Parsing / Parser
Produces a syntactic structure of the given
program – syntax tree or parse tree
Syntax tree – terminals and non terminals
Intermediate representation – grammatical
structure
Syntax tree
◦ Interior nodes represent operations
◦ Child nodes represent the arguments of operation
Semantic Analysis
A semantic analyzer checks the source program
for semantic errors and collects the type
information for the code generation
Uses both syntax tree and symbol table
Performs type checking
Determines the meaning of the source string
Applications of Compiler Technology

Implementation of high level programming languages

Optimization for computer architecture


◦ Parallelism
◦ Memory hierarchy

Design of new computer architectures

Program Translations

Software Productivity Tools

You might also like