
Introduction to Compilers

SS 2012
Jun.-Prof. Dr. Christian Plessl
Custom Computing
University of Paderborn
Version 1.1.2 2012-05-01

Outline
compiler structure, intermediate code
code generation
code optimization
retargetable compiler

Translation Process
skeletal source program
    ↓ preprocessor
source program
    ↓ compiler
assembler program
    ↓ assembler
relocatable machine code
    ↓ linker / loader  ← library
absolute machine code


3

Compiler Phases
source program
    ↓
analysis
    lexical analysis
    syntactic analysis
    semantic analysis
    intermediate code generation
synthesis
    code optimization
    code generation
    ↓
target program

symbol table and error handling interact with all phases
4

Overview Analysis Phase


lexical analysis
scanning of the source program and splitting into symbols
regular expressions: recognition by finite automata

syntactic analysis
parsing symbol sequences and construction of sentences
sentences are described by a context-free grammar
A → Identifier := E
E → E + E | E * E | Identifier | Number

semantic analysis
make sure the program parts "reasonably" fit together,
e.g. implicit type conversions
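The lexical-analysis step above can be sketched in a few lines. This is a minimal scanner of my own (token class names and the `tokenize` helper are illustrative, not from the lecture): regular expressions recognize the symbol classes, mirroring "recognition by finite automata".

```python
import re

# Symbol classes as regular expressions; order matters (":=", then "+"/"*").
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Split the source string into (kind, text) symbols."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":          # drop whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("position := initial + rate * 60"))
```

On the assignment from the following example slides, this yields the symbol sequence id := id + id * number.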

Overview Synthesis Phase


generation of intermediate code
machine independent simplified retargeting
should be easy to generate
should be easily translatable into target program

optimization
goals for GP processors: fast code, fast translation
goals for specialized processors: fast code, short code (low memory
requirements), low power consumption
both intermediate and target code can be optimized

code generation
translate intermediate representation into assembler code for target
architecture
apply target specific optimizations
6

Example (1)

source program:

    position := initial + rate * 60

lexical analysis yields the symbol sequence

    id1 := id2 + id3 * 60

where id1, id2, id3 are identifiers (position, initial, rate), := is the
assignment symbol, + and * are operators, and 60 is a number.

Example (2)

syntactic analysis builds the syntax tree:

    :=
    ├─ id1
    └─ +
       ├─ id2
       └─ *
          ├─ id3
          └─ 60

semantic analysis inserts an implicit type conversion around the integer
constant:

    :=
    ├─ id1
    └─ +
       ├─ id2
       └─ *
          ├─ id3
          └─ IntToReal
             └─ 60

Example (3)
intermediate code generation

    tmp1 := IntToReal(60)
    tmp2 := id3 * tmp1
    tmp3 := id2 + tmp2
    id1  := tmp3

code optimization

    tmp1 := id3 * 60.0
    id1  := id2 + tmp1

code generation

    ld.s  $f1, id3
    li.s  $f2, 60.0
    mul.s $f2, $f2, $f1
    ld.s  $f1, id2
    add.s $f2, $f2, $f1
    st.s  $f2, id1
9

Syntax Tree and DAG


a := b*(c-d) + e*(c-d)

syntax tree: the common subexpression (c-d) appears twice

    :=
    ├─ a
    └─ +
       ├─ *
       │  ├─ b
       │  └─ -
       │     ├─ c
       │     └─ d
       └─ *
          ├─ e
          └─ -
             ├─ c
             └─ d

DAG (directed acyclic graph): the subexpression is shared

    := (a, + ( * (b, t), * (e, t) ))    where t = - (c, d) is a single node
10

3 Address Code (1)


3 address instructions
    at most 3 addresses (2 operands, 1 result)
    at most 2 operators

assignment instructions
    x := y op z
    x := op y
    x := y

control flow instructions
    goto L
    if x relop y goto L

indexed assignments
    x := y[i]
    x[i] := y

pointer assignments
    x := &y
    y := *x
    *x := y

subroutines
    param x
    x = call p,n
    return y

11

3 Address Code (2)


Generation of 3 address code from a DAG (valid but not optimal), for the
DAG of a := b*(c-d) + e*(c-d):

    t1 := c - d
    t2 := e * t1
    t3 := b * t1
    t4 := t2 + t3
    a  := t4

12
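The DAG-to-code step above can be sketched as follows. This is a minimal generator of my own (the `Node` class and `gen` helper are illustrative, not from the lecture): shared nodes are cached, so the common subexpression (c - d) is computed into a single temporary.

```python
import itertools

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right  # leaf: op = name

def gen(node, code, temps, counter):
    """Emit 3 address code for a DAG node, reusing temps of shared nodes."""
    if node in temps:                      # shared node: reuse its temporary
        return temps[node]
    if node.left is None:                  # leaf (variable or constant)
        return node.op
    left = gen(node.left, code, temps, counter)
    right = gen(node.right, code, temps, counter)
    t = f"t{next(counter)}"
    code.append(f"{t} := {left} {node.op} {right}")
    temps[node] = t
    return t

# DAG for a := b*(c-d) + e*(c-d): one shared node for (c-d)
cd = Node("-", Node("c"), Node("d"))
root = Node("+", Node("*", Node("b"), cd), Node("*", Node("e"), cd))
code = []
code.append(f"a := {gen(root, code, {}, itertools.count(1))}")
```

Because the shared node is visited once, the subtraction appears only once in the generated sequence.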

3 Address Code (3)


advantages of 3 address code
dissection of long arithmetic expressions
temporary names facilitate reordering of instructions
forms a valid schedule

definition: A 3 address instruction


x := y op z
defines x and
uses y and z

13

Basic Blocks (1)


definition: A basic block is a sequence of instructions where
the control flow enters at the beginning and exits at the end,
without stopping in-between or branching (except at the
end).

example of a basic block:

    t1 := c - d
    t2 := e * t1
    t3 := b * t1
    t4 := t2 + t3
    if t4 < 10 goto L

14

Basic Blocks (2)


determining the basic blocks from a sequence of 3 address
instructions:
1. determine the block beginnings:
   the first instruction
   targets of conditional and unconditional jumps
   instructions that immediately follow conditional and unconditional jumps

2. determine the basic blocks:


there is a basic block for each block beginning
the basic block consists of the block beginning and runs until the
next block beginning (exclusive) or until the program ends

15
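The two-step procedure above can be sketched directly. This is a small helper of my own (hedged, not the lecture's code); jumps are written as "goto (i)" with 1-based instruction numbers, as on the following slides.

```python
import re

def basic_blocks(instructions):
    """Partition a 3 address instruction sequence into basic blocks."""
    n = len(instructions)
    leaders = {1}                                    # rule: first instruction
    for i, instr in enumerate(instructions, start=1):
        m = re.search(r"goto \((\d+)\)", instr)
        if m:
            leaders.add(int(m.group(1)))             # rule: jump target
            if i + 1 <= n:
                leaders.add(i + 1)                   # rule: after a jump
    starts = sorted(leaders)
    return [instructions[s - 1:e - 1]
            for s, e in zip(starts, starts[1:] + [n + 1])]

prog = ["prod := 0", "i := 0", "t1 := 4 * i", "t2 := a[t1]",
        "t3 := 4 * i", "t4 := b[t3]", "t5 := t2 * t4",
        "t6 := prod + t5", "prod := t6", "t7 := i + 1",
        "i := t7", "if i < 20 goto (3)"]
blocks = basic_blocks(prog)   # B1 = instructions 1-2, B2 = instructions 3-12
```

On the do-while example from the following slides this yields exactly the two blocks B1 and B2.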

Control Flow Graphs


"degenerated" control flow graph (CFG)
    shows the possible control flows in a program
    degenerated means that the nodes of the CFG are basic blocks
    (instead of individual instructions)

example (L marks the loop entry):

    B1:    i  := 0
           t2 := 0
    B2: L: t2 := t2 + i
           i  := i + 1
           if i < 10 goto L
    B3:    x := t2

    edges: B1 → B2;  B2 → B2 (taken if i < 10);  B2 → B3 (if i >= 10)

16

DAG of a Basic Block


Definition: A DAG of a basic block is a directed acyclic
graph with the following node markings:
Leaves are marked with a variable / constant name. Variables with
initial values are assigned the index 0.
Inner nodes are marked with an operator symbol. From the operator
we can conclude whether the value or the address of the variable is
being used.
Optionally, a node can be marked with a sequence of variable names.
Then, all variables are assigned the computed value.

17
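Building such a DAG can be sketched with local value numbering. This is a minimal implementation of my own (hedged): each (op, operand, operand) triple is created only once, so repeated computations such as 4 * i map to the same node, as in the DAG for B2 on the following slides.

```python
def dag_of_block(block):
    """Build a DAG for a basic block of (target, expression) pairs."""
    nodes = {}      # (op, left, right) -> inner node name
    current = {}    # variable -> node currently holding its value
    def node_of(name):
        if name in current:
            return current[name]
        return name if name.isdigit() else name + "0"   # leaf: initial value
    for target, expr in block:
        parts = expr.split()
        if len(parts) == 3:                             # "y op z"
            y, op, z = parts
            key = (op, node_of(y), node_of(z))
            current[target] = nodes.setdefault(key, f"n{len(nodes) + 1}")
        else:                                           # plain copy "y"
            current[target] = node_of(parts[0])
    return nodes, current

block = [("t1", "4 * i"), ("t3", "4 * i"), ("t7", "i + 1"), ("i", "t7")]
nodes, current = dag_of_block(block)
```

As on the slide, t1 and t3 mark the same node, and the copy i := t7 attaches i to t7's node.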

Example (1)
C program

int i, prod, a[20], b[20];


...
prod = 0;
i = 0;
do {
prod = prod + a[i]*b[i];
i++;
} while (i < 20);

3 address code

    (1)  prod := 0
    (2)  i := 0
    (3)  t1 := 4 * i
    (4)  t2 := a[t1]
    (5)  t3 := 4 * i
    (6)  t4 := b[t3]
    (7)  t5 := t2 * t4
    (8)  t6 := prod + t5
    (9)  prod := t6
    (10) t7 := i + 1
    (11) i := t7
    (12) if i < 20 goto (3)

18

Example (2)
basic blocks

    B1: (1)  prod := 0
        (2)  i := 0

    B2: (3)  t1 := 4 * i
        (4)  t2 := a[t1]
        (5)  t3 := 4 * i
        (6)  t4 := b[t3]
        (7)  t5 := t2 * t4
        (8)  t6 := prod + t5
        (9)  prod := t6
        (10) t7 := i + 1
        (11) i := t7
        (12) if i < 20 goto (3)

control flow graph: B1 → B2;  B2 → B2
19

Example (3)
basic block B2

    t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * i
    t4 := b[t3]
    t5 := t2 * t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i < 20 goto (3)

DAG for B2 (leaves: a, b, 4, i0, prod0, 1, 20):

    t1, t3:   * (4, i0)          the repeated 4 * i is one node
    t2:       [] (a, t1)
    t4:       [] (b, t3)
    t5:       * (t2, t4)
    t6, prod: + (prod0, t5)
    t7, i:    + (i0, 1)
    <:        < (t7, 20)         controls the branch

20

Compiler and Code Generation


compiler structure, intermediate code
code generation
code optimization
code generation for specialized processors
retargetable compiler

21

Code Generation
requirements
correct code
efficient code
fast code generation

code generation = software synthesis


allocation:
mostly the components are fixed (registers, ALUs)

binding:
register binding (register allocation, register assignment)
instruction selection

scheduling:
instruction sequencing

22

Register Binding
goal: efficient use of registers
minimize number of LOAD/STORE instructions (RISC)
instructions with register operands are generally shorter and faster
than instructions with memory operands (CISC)

register allocation, register assignment


allocation: determine for each point in time the set of variables that
should be held in registers
assignment: assign these variables to registers

optimal register binding


NP-complete problem
additionally: restrictions for register use by the processor
architecture, compiler, and operating system

23

Instruction Selection
naive approach
    use a code pattern for each 3 address instruction

    x := y + z          lw  $r1, y
                        lw  $r2, z
                        add $r1, $r1, $r2
                        sw  $r1, x

    u := x - w          lw  $r1, x
                        lw  $r2, w
                        sub $r1, $r1, $r2
                        sw  $r1, u

problems
    often inefficient code is generated (here: sw $r1, x is immediately
    followed by lw $r1, x) → requires code optimization
    there might be several matching target instructions
    some instructions work only with particular registers
    exploitation of special processor features is difficult,
    e.g. auto-increment / auto-decrement addressing

24
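The naive approach above can be sketched in a few lines (my own hedged helper, not the lecture's): every 3 address instruction is expanded with the same fixed load/op/store pattern, which makes the redundancy visible.

```python
# Fixed code pattern per 3 address instruction (naive instruction selection).
OPS = {"+": "add", "-": "sub"}

def naive_select(three_addr):
    """Expand (target, y, op, z) tuples with a fixed load/op/store pattern."""
    asm = []
    for target, y, op, z in three_addr:
        asm += [f"lw $r1, {y}", f"lw $r2, {z}",
                f"{OPS[op]} $r1, $r1, $r2", f"sw $r1, {target}"]
    return asm

asm = naive_select([("x", "y", "+", "z"), ("u", "x", "-", "w")])
# "sw $r1, x" is immediately followed by "lw $r1, x" -> a peephole target
```

The generated sequence stores x and immediately reloads it, exactly the inefficiency the slide points out.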

Instruction Scheduling (1)


Optimal instruction sequence
    minimal number of instructions for the given number of registers
    NP-complete problem

expression DAG: t4 := t1 - t3, with t1 := a + b, t3 := e - t2, t2 := c + d

two valid schedules for the same DAG:

    t2 := c + d         t1 := a + b
    t3 := e - t2        t2 := c + d
    t1 := a + b         t3 := e - t2
    t4 := t1 - t3       t4 := t1 - t3

25

Instruction Scheduling (2)


with 2 registers (R0, R1), schedule

    t2 := c + d
    t3 := e - t2
    t1 := a + b
    t4 := t1 - t3

machine model (here): CPU with memory operands
    register/register instructions, e.g.
        ADD R0, R1   (R1 = R1 + R0)
    register/memory instructions, e.g.
        MOV e, R0    (R0 = *e)       load contents of address e into R0
        ADD a, R0    (R0 = R0 + *a)  add contents of address a to R0

generated code (8 instructions):

    MOV c, R0       R0: c     R1: -
    ADD d, R0       R0: t2    R1: -
    MOV e, R1       R0: t2    R1: e
    SUB R0, R1      R0: t2    R1: t3
    MOV a, R0       R0: a     R1: t3
    ADD b, R0       R0: t1    R1: t3
    SUB R1, R0      R0: t4    R1: t3
    MOV R0, t4      R0: t4    R1: t3

26
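The machine model above is small enough to simulate. This is a hedged verification sketch of my own: it executes the 8-instruction sequence and checks that it really computes t4 = t1 - t3 = (a + b) - (e - (c + d)).

```python
def run(program, memory):
    """Simulate the two-register machine model (MOV/ADD/SUB, memory operands)."""
    regs = {}
    for op, src, dst in program:
        if op == "MOV" and dst in ("R0", "R1") and src not in ("R0", "R1"):
            regs[dst] = memory[src]                  # load
        elif op == "MOV":
            memory[dst] = regs[src]                  # store
        elif op == "ADD":                            # ADD x, Rd: Rd = Rd + x
            regs[dst] += regs[src] if src in regs else memory[src]
        elif op == "SUB":                            # SUB x, Rd: Rd = Rd - x
            regs[dst] -= regs[src] if src in regs else memory[src]
    return memory

mem = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
prog = [("MOV", "c", "R0"), ("ADD", "d", "R0"), ("MOV", "e", "R1"),
        ("SUB", "R0", "R1"), ("MOV", "a", "R0"), ("ADD", "b", "R0"),
        ("SUB", "R1", "R0"), ("MOV", "R0", "t4")]
mem = run(prog, mem)
# expected: t4 = (1 + 2) - (5 - (3 + 4)) = 5
```

The same simulator can be used to check the 10-instruction spill variant on the next slide.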

Instruction Scheduling (3)


with 2 registers (R0, R1), schedule

    t1 := a + b
    t2 := c + d
    t3 := e - t2
    t4 := t1 - t3

(same machine model as on the previous slide)

generated code (10 instructions):

    MOV a, R0       R0: a     R1: -
    ADD b, R0       R0: t1    R1: -
    MOV c, R1       R0: t1    R1: c
    ADD d, R1       R0: t1    R1: t2
    MOV R0, t1      R0: t1    R1: t2    save t1 to memory
    MOV e, R0       R0: e     R1: t2
    SUB R1, R0      R0: t3    R1: t2
    MOV t1, R1      R0: t3    R1: t1    reload t1 from memory
    SUB R0, R1      R0: t3    R1: t4
    MOV R1, t4      R0: t3    R1: t4

t1 is still live when both registers are needed for other values, so it
must be temporarily saved to memory; this is denoted as a register spill.

27

Compiler and Code Generation


compiler structure, intermediate code
code generation
code optimization
retargetable compiler

28

Code Optimization
transformations on the intermediate code and on the
target code
peephole optimization
small window (peephole) is moved over the code
several passes, because an optimization can generate new optimization
opportunities

local optimization
transformations inside basic blocks

global optimization
transformations across several basic blocks

29

Peephole Optimizations (1)


deletion of unnecessary instructions

    (1) lw $r1, a          (1) lw $r1, a
    (2) sw $r1, a     →

    (2) can be deleted if (1) and (2) are in the same basic block

algebraic simplifications

    x := y + 0*(z**4/(y-1));   →   x := y;
    x := x * 1;                →   delete
    x := x + 0;                →   delete

30
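A peephole pass can be sketched as follows (my own minimal code, hedged): a two-instruction window slides over the code and applies local rewrite rules (redundant store after load, algebraic identities, strength reduction), repeating until no rule fires, since one rewrite can expose another.

```python
import re

def peephole(code):
    """Repeatedly apply local rewrite rules inside a small window."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            cur = code[i]
            nxt = code[i + 1] if i + 1 < len(code) else None
            # lw $r, a directly followed by sw $r, a: the store is redundant
            if nxt and cur.startswith("lw") and nxt == cur.replace("lw", "sw"):
                out.append(cur); i += 2; changed = True; continue
            # strength reduction: multiply by 8 becomes a shift
            cur2 = re.sub(r"(\w+) := (\w+) \* 8$", r"\1 := \2 << 3", cur)
            # algebraic identities: x := x * 1 and x := x + 0 are deleted
            if re.fullmatch(r"(\w+) := \1 \* 1", cur2) or \
               re.fullmatch(r"(\w+) := \1 \+ 0", cur2):
                i += 1; changed = True; continue
            if cur2 != cur:
                changed = True
            out.append(cur2); i += 1
        code = out
    return code

code = peephole(["lw $r1, a", "sw $r1, a", "x := x * 1", "y := i * 8"])
```

The loop terminates once a full pass makes no change.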

Peephole Optimizations (2)


strength reductions

    x := y*8;    →   x := y << 3;
    x := y**2;   →   x := y * y;

31

Local Optimizations
common sub-expression elimination

    (1) a := b + c          (1) a := b + c
    (2) b := a - d     →    (2) b := a - d
    (3) c := b + c          (3) c := b + c
    (4) d := a - d          (4) d := b

    a - d in (4) recomputes the value already assigned to b in (2);
    b + c in (3) is not common with (1), because b is redefined in (2)

variable renaming

    t := b + c     →    u := b + c

    normal form of a basic block: each variable is defined only once

instruction interchange

    t1 := b + c          t2 := x + y
    t2 := x + y     →    t1 := b + c
32
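Common sub-expression elimination on the block above can be sketched with local value numbering (my own hedged implementation, not the lecture's): every variable version gets a value number, and an expression whose value number was already computed is replaced by a copy from a variable still holding it.

```python
import itertools

def cse(block):
    """Local value numbering: (target, (y, op, z)) tuples in, strings out."""
    valnum = {}                  # name -> value number of its current value
    expr_vn = {}                 # (op, vn, vn) -> value number
    holder = {}                  # value number -> a variable holding it
    counter = itertools.count()
    def vn_of(x):
        if x not in valnum:
            valnum[x] = next(counter)
            holder[valnum[x]] = x
        return valnum[x]
    out = []
    for target, (y, op, z) in block:
        key = (op, vn_of(y), vn_of(z))
        if key in expr_vn and expr_vn[key] in holder:
            out.append(f"{target} := {holder[expr_vn[key]]}")   # reuse value
            v = expr_vn[key]
        else:
            out.append(f"{target} := {y} {op} {z}")
            v = expr_vn.setdefault(key, next(counter))
        old = valnum.get(target)             # target is redefined: its old
        if old is not None and holder.get(old) == target:
            del holder[old]                  # value is no longer held by it
        valnum[target] = v
        holder.setdefault(v, target)
    return out

block = [("a", ("b", "+", "c")), ("b", ("a", "-", "d")),
         ("c", ("b", "+", "c")), ("d", ("a", "-", "d"))]
```

Because b is redefined at (2), only the a - d at (4) is recognized as common, reproducing the slide's result.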

Global Optimizations (1)


dead code elimination
    an instruction that defines x can be deleted if x is not used afterwards

copy propagation

    (1) x := t1              (1) x := t1
    (2) a[t2] := t3          (2) a[t2] := t3
    (3) a[t4] := x      →    (3) a[t4] := t1
    (4) goto L               (4) goto L

    if x is not used after (1), (1) becomes dead code

33
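The interplay of the two optimizations above can be sketched with a deliberately simplistic, straight-line approximation (my own hedged code; a real compiler would use data-flow analysis across the CFG): copies are propagated forward, after which the copy itself becomes dead and is removed.

```python
import re

def copy_propagate(code):
    """Replace uses of x by y after a copy 'x := y' (straight-line only)."""
    copies, out = {}, []
    for line in code:
        if " := " in line:
            lhs, rhs = line.split(" := ")
            for var, val in copies.items():
                rhs = re.sub(rf"\b{re.escape(var)}\b", val, rhs)
            base = lhs.split("[")[0]
            # a new definition invalidates copies involving the target
            copies = {v: w for v, w in copies.items() if base not in (v, w)}
            if re.fullmatch(r"\w+", rhs) and "[" not in lhs:
                copies[lhs] = rhs            # record the copy
            out.append(f"{lhs} := {rhs}")
        else:
            out.append(line)
    return out

def dead_code_elim(code):
    """Delete a plain definition whose target is never used afterwards."""
    out = []
    for i, line in enumerate(code):
        if " := " in line and "[" not in line.split(" := ")[0]:
            target = line.split(" := ")[0]
            if not any(re.search(rf"\b{re.escape(target)}\b", later)
                       for later in code[i + 1:]):
                continue                     # dead: drop it
        out.append(line)
    return out

prog = ["x := t1", "a[t2] := t3", "a[t4] := x", "goto L"]
after = dead_code_elim(copy_propagate(prog))
```

After propagation, x is unused, so its definition is deleted as on the slide.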

Global Optimizations (2)


control flow optimizations

    (1)     goto L1          (1)     goto L2
            ...         →            ...
    (2) L1: goto L2          (2) L1: goto L2

    if L1 is then no longer reachable:
    delete (2) (dead code elimination)

34

Global Optimizations (3)


code motion

    while (i <= limit*4+2)        t = limit*4+2;
    {                        →    while (i <= t)
        ....                      {
    }                                 ....
                                  }

    valid if limit is not modified in the loop body

induction variables and strength reduction

    before:                       after:
        j := n                        j := n
    (1) j := j - 1                    t4 := 4 * j
    (2) t4 := 4 * j               (1) j := j - 1
    (3) t5 := a[t4]               (2) t4 := t4 - 4
    (4) if t5 > v goto (1)        (3) t5 := a[t4]
                                  (4) if t5 > v goto (1)
35
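The strength reduction above is easy to sanity-check by simulation (my own hedged sketch; the loop exit is simplified to a fixed trip count instead of the data-dependent test on t5): both versions must produce the same sequence of array offsets t4.

```python
def offsets_original(n):
    """t4 recomputed as 4 * j in every iteration."""
    out, j = [], n
    while True:
        j = j - 1
        t4 = 4 * j
        out.append(t4)
        if j == 0:            # stand-in for the data-dependent exit test
            return out

def offsets_reduced(n):
    """t4 updated by subtraction: the multiply is strength-reduced away."""
    out, j = [], n
    t4 = 4 * j                # hoisted initialization
    while True:
        j = j - 1
        t4 = t4 - 4
        out.append(t4)
        if j == 0:
            return out
```

Since t4 = 4 * j is a loop invariant relation, the subtraction keeps it in step with j.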

Compiler and Code Generation


compiler structure, intermediate code
code generation
code optimization
retargetable compiler

36

Retargetable Compiler
portable compiler
developer retargetable
code generation by tree pattern matching

compiler-compiler
user retargetable (semi-automatic)
compiler is generated from a description of the target architecture
(processor model)

machine independent compiler


automatically retargetable
compiler generates code for several processors / processor variants
for parametrizable architectures

37

Tree Pattern Matching


rules for transforming a syntax tree (DAG) are given as tree patterns

    replacement ← pattern   { action }

example:

    reg i ← + (reg i, reg j)   { ADD Rj, Ri }

    a subtree "reg i + reg j" is rewritten to the single node reg i,
    and the instruction ADD Rj, Ri is emitted

stepwise replacement by tree pattern matching until the tree contains
only one node
38

Target Instructions (1)

    (1) reg i ← const c                       { MOV #c, Ri }
    (2) reg i ← mem a                         { MOV a, Ri }
    (3) mem   ← := (mem a, reg i)             { MOV Ri, a }
    (4) mem   ← := (ind (reg i), reg j)       { MOV Rj, *Ri }
    (5) reg i ← ind (+ (const c, reg j))      { MOV c(Rj), Ri }

39

Target Instructions (2)

    (6) reg i ← + (reg i, ind (+ (const c, reg j)))   { ADD c(Rj), Ri }
    (7) reg i ← + (reg i, reg j)                      { ADD Rj, Ri }
    (8) reg i ← + (reg i, const 1)                    { INC Ri }

40

Tree Pattern Matching - Example (0)


a[i] := b + 1

tree to be matched:

    :=
    ├─ ind
    │  └─ +
    │     ├─ +
    │     │  ├─ const _a
    │     │  └─ reg SP
    │     └─ ind
    │        └─ +
    │           ├─ const _i
    │           └─ reg SP
    └─ +
       ├─ mem b
       └─ const 1
41

Tree Pattern Matching - Example (0)


a[i] := b + 1
    b is a global variable
        stored on the heap
        the compiler knows its address (absolute addressing)
    a and i are local variables
        stored on the stack
        the compiler knows their offsets from the SP (relative addressing)
        the offsets are stored in the constants _a and _i
    how to compute the address of a[i]?
        get the value of i (read memory at address SP + _i)
        a[i] is located at address SP + _a + i

memory layout (from the slide figure):

    heap:  b at a fixed absolute address (the figure shows the
           0x100 region)
    stack: stack pointer SP = 0xF00
           i    at SP + _i = 0xF08   (_i = 0x8)
           a[0] at SP + _a = 0xF0C   (_a = 0xC)
           a[1] at 0xF10, a[2] at 0xF14, a[3] at 0xF18

42

Tree Pattern Matching - Example (1)


a[i] := b + 1

rule (1) matches the leaf const _a, rewrites it to reg 0 and emits

    { MOV #_a, R0 }
43

Tree Pattern Matching - Example (2)


a[i] := b + 1

rule (7) matches + (reg 0, reg SP), rewrites it to reg 0 and emits

    { ADD SP, R0 }
44

Tree Pattern Matching - Example (3)


a[i] := b + 1

rule (6) matches + (reg 0, ind (+ (const _i, reg SP))), rewrites it to
reg 0 and emits

    { ADD _i(SP), R0 }
45

Tree Pattern Matching - Example (4)


a[i] := b + 1

rule (2) matches mem b, rewrites it to reg 1 and emits

    { MOV b, R1 }

46

Tree Pattern Matching - Example (5)


a[i] := b + 1

rule (8) matches + (reg 1, const 1), rewrites it to reg 1 and emits

    { INC R1 }

47

Tree Pattern Matching - Example (6)


a[i] := b + 1

rule (4) matches := (ind (reg 0), reg 1) and emits

    { MOV R1, *R0 }

the tree is reduced to a single node; complete generated code:

    MOV #_a, R0
    ADD SP, R0
    ADD _i(SP), R0
    MOV b, R1
    INC R1
    MOV R1, *R0
48
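The whole walkthrough can be reproduced with a small matcher. This is a greedy, hand-simplified sketch of my own (hedged: it hard-codes only rules (1), (2), (4), (6), (7) and (8), and checks the larger addressing-mode tile before descending, whereas real code selectors use cost-based tiling). Trees are nested tuples.

```python
import itertools

def match(tree):
    """Rewrite the tree bottom-up, emitting one instruction per rule."""
    code = []
    counter = itertools.count(0)

    def reduce_(t):
        op = t[0]
        if op == "const":                                    # rule (1)
            r = f"R{next(counter)}"
            code.append(f"MOV #{t[1]}, {r}")
            return ("reg", r)
        if op == "mem":                                      # rule (2)
            r = f"R{next(counter)}"
            code.append(f"MOV {t[1]}, {r}")
            return ("reg", r)
        if op == "reg":
            return t
        if op == "ind":
            return ("ind", reduce_(t[1]))
        if op == "+":
            left = reduce_(t[1])
            right = t[2]
            # rule (6): checked on the unreduced right subtree, so the whole
            # addressing-mode tile ind(+(const, reg)) is consumed at once
            if (left[0] == "reg" and right[0] == "ind"
                    and right[1][0] == "+" and right[1][1][0] == "const"
                    and right[1][2][0] == "reg"):
                code.append(f"ADD {right[1][1][1]}({right[1][2][1]}), {left[1]}")
                return left
            if left[0] == "reg" and right == ("const", "1"):  # rule (8)
                code.append(f"INC {left[1]}")
                return left
            right = reduce_(right)
            if left[0] == "reg" and right[0] == "reg":        # rule (7)
                code.append(f"ADD {right[1]}, {left[1]}")
                return left
        if op == ":=":
            addr = reduce_(t[1])
            val = reduce_(t[2])
            if addr[0] == "ind" and addr[1][0] == "reg":      # rule (4)
                code.append(f"MOV {val[1]}, *{addr[1][1]}")
                return ("mem",)
        raise ValueError(f"no matching pattern for {t}")

    reduce_(tree)
    return code

# a[i] := b + 1 with SP-relative locals, as on the example slides
tree = (":=",
        ("ind", ("+", ("+", ("const", "_a"), ("reg", "SP")),
                      ("ind", ("+", ("const", "_i"), ("reg", "SP"))))),
        ("+", ("mem", "b"), ("const", "1")))
```

Running `match(tree)` yields the six instructions of the walkthrough in order.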

Compiler Compiler
front-end:

    source program (DFL) → parsing, flow graph generation
        → pattern matching → optimization → executable code

back-end generation:

    processor model (HDL) → instruction set extraction
        → pattern matcher generator → pattern matcher used by the front-end

RECORD Compiler Compiler:

    R. Leupers, Retargetable Generator of Code Selectors from HDL Processor
    Models, European Design and Test Conference, 1997.

49

Instruction Set Extraction


instruction word:  xx011zz

    the control bit fields xx and zz address registers (reg); the
    remaining bits select the operation and the accumulator operand (acc)

operation:  reg[zz] ← reg[xx] + acc

extracted tree pattern:

    reg ← + (reg, acc)

50

Changes
v1.1.2 (2012-05-01)
tree pattern matching: show DAG to be matched before explanation

v1.1.1 (2012-04-27)
fixed semantics of the ADD R0,R1 operation on slide 26 and added a new
slide 27 illustrating the differences between the generated code
moved "control flow optimization" to slide 27 because it is not a local
but a global optimization

v1.1.0 (2012-04-24)
updated for SS2012, minor corrections

v1.0.3 (2010-05-05)
fix minor typos in explanation of a[i]= b + 1 memory layout description

v1.0.2 (2010-05-02)
add discussion of how a[i]= b + 1 is stored in memory

v1.0.1 (2010-04-27)
slide 11: clarified that call instruction in 3 addr code returns a value
51