Slide 1
Computer Systems
Communication Networks Institute
Prof. Dr.-Ing. Christian Wietfeld
Slide 2
Pipelined Processors
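The drawback of the simple processor architecture is that each instruction requires several clock cycles to execute, and the next instruction cannot start before the current one has finished.
Improvement: pipelined execution of instructions, i.e. the execution of consecutive instructions is overlapped.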
Slide 3
No Pipeline
Slide 4
Pipelining
Slide 5
Super Pipelining
Slide 6
A Simple Pipeline
Time:               Cycle 1             Cycle 2             Cycle 3
Instruction n:      Fetch and decode    Execution
Instruction n+1:                        Fetch and decode    Execution
Instruction n+2:                                            Fetch and decode
28.10.2015
Computer Systems | Unit 4 Pipelining| Winterterm 2015 | Dipl.-Ing. Ralf Burda
Slide 7
Pipelines
Pipeline stages are connected by clocked pipeline registers (also called latches).
The logic delay of each pipeline stage is at most one clock period.
In the optimal case, each instruction requires k clock cycles to pass through a pipeline with k stages.
If a new instruction enters the pipeline in every cycle, k instructions are processed in parallel inside the pipeline, and (in the ideal case) one instruction leaves the pipeline at the end of every cycle.
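The ideal behaviour described above can be reproduced with a small cycle-by-cycle model. This is a sketch under the slide's ideal assumptions (one new instruction per cycle, no stalls). The function names `pipeline_occupancy` and `completions_per_cycle` and the example values n = 8, k = 5 are hypothetical.

```python
from typing import List


def pipeline_occupancy(n: int, k: int) -> List[List[int]]:
    """Indices of the instructions inside the pipeline in each clock cycle.

    Ideal k-stage pipeline: instruction i (0-based) enters stage 0 in cycle i
    (0-based) and moves one stage per cycle, so it is in stage (cycle - i).
    """
    total_cycles = k + n - 1
    return [
        [i for i in range(n) if 0 <= cycle - i < k]
        for cycle in range(total_cycles)
    ]


def completions_per_cycle(n: int, k: int) -> List[int]:
    """Number of instructions leaving the last stage at the end of each cycle."""
    total_cycles = k + n - 1
    return [
        sum(1 for i in range(n) if i + k - 1 == cycle)
        for cycle in range(total_cycles)
    ]


if __name__ == "__main__":
    n, k = 8, 5  # example values, not taken from the slides
    for cycle, in_flight in enumerate(pipeline_occupancy(n, k), start=1):
        print(f"cycle {cycle:2d}: {len(in_flight)} in flight {in_flight}")
```

With n = 8 and k = 5, cycles 5 to 8 each hold five instructions at once. From cycle 5 on, exactly one instruction completes per cycle.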
Slide 8
Figure: a pipeline drawn as a chain of stages separated by registers, from Input to Output.
Slide 9
Definitions
The latency is the time an instruction requires to pass through all (relevant) pipeline stages. In the ideal case, a pipeline with k stages has a latency of k clock cycles.
The throughput of a pipeline is the number of instructions that can leave the pipeline per clock cycle. This value represents the (theoretical) performance of the pipeline.
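Here is a worked example of both definitions. It assumes a hypothetical pipeline with k = 5 stages and a clock period τ = 1 ns. These values are chosen for illustration and do not come from the slides.

```latex
\begin{aligned}
t_{\text{latency}} &= k \cdot \tau = 5 \cdot 1\,\text{ns} = 5\,\text{ns} \\
\text{throughput} &= \frac{1\ \text{instruction}}{\tau} = \frac{1}{1\,\text{ns}} = 10^{9}\ \text{instructions/s}
\end{aligned}
```

The latency of an individual instruction is still k cycles. Pipelining raises the throughput, not the latency.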
Slide 10
Speedup
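We assume n instructions, each requiring k steps (pipeline stages) to execute.
A processor without a pipeline requires n*k clock cycles.
A processor with a pipeline requires k+n-1 clock cycles: k cycles until the first instruction completes, then one further cycle for each of the remaining n-1 instructions.
The speedup is therefore S = n*k / (k+n-1), which approaches k for large n.
We assume an ideal pipeline with a latency of k and a throughput of 1.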
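The two cycle counts and the resulting speedup can be computed directly. This is a minimal sketch under the ideal-pipeline assumption above. The function names are illustrative, and `Fraction` is used only to keep the ratio exact.

```python
from fractions import Fraction


def cycles_unpipelined(n: int, k: int) -> int:
    """Each of the n instructions occupies the processor for all k steps."""
    return n * k


def cycles_pipelined(n: int, k: int) -> int:
    """Ideal pipeline: k cycles for the first instruction, then one per cycle."""
    return k + n - 1


def speedup(n: int, k: int) -> Fraction:
    """Exact speedup n*k / (k + n - 1)."""
    return Fraction(cycles_unpipelined(n, k), cycles_pipelined(n, k))


if __name__ == "__main__":
    k = 5
    for n in (1, 5, 100, 10_000):
        print(n, float(speedup(n, k)))
```

For k = 5 the speedup is 1 for a single instruction and 25/9 for five instructions. It approaches 5 as n grows, because the k-1 fill cycles are spread over more instructions.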
Slide 11
Basic Pipeline
Instruction fetch,
Instruction decode,
Operand fetch from the register file
(the memory where all registers are located)
Instruction execution inside the ALU (Arithmetic Logic Unit)
Write back of the result to the register file
Sometimes, instruction decode and operand fetch are combined into a single pipeline stage.
Load/Store instructions require an address calculation and at least one
(additional) memory access stage.
Slide 12
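5-Deep Pipeline
Cycle:  1    2    3    4    5    6    7    8    9
i       IF   ID   EX   MEM  WB
i+1          IF   ID   EX   MEM  WB
i+2               IF   ID   EX   MEM  WB
i+3                    IF   ID   EX   MEM  WB
i+4                         IF   ID   EX   MEM  WB
IF -- Instruction Fetch
ID -- Instruction Decode/Register Fetch
EX -- Execute/Address Calculation
MEM -- Memory Access
WB -- Write Back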
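The staircase follows from one rule: with one instruction issued per cycle and no stalls, instruction i+j is in stage number (cycle - 1 - j) of the five stages. The sketch below regenerates the diagram. `stage_of` and `diagram` are hypothetical helper names.

```python
from typing import List, Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr: int, cycle: int) -> Optional[str]:
    """Stage of instruction `instr` (0 = i, 1 = i+1, ...) in clock `cycle` (1-based).

    Assumes one instruction is issued per cycle and no stalls occur.
    """
    s = cycle - 1 - instr
    return STAGES[s] if 0 <= s < len(STAGES) else None


def diagram(n: int) -> List[str]:
    """Text rows of the staircase diagram for n instructions."""
    total = n + len(STAGES) - 1
    rows = []
    for i in range(n):
        cells = [stage_of(i, c) or "" for c in range(1, total + 1)]
        label = "i" if i == 0 else f"i+{i}"
        rows.append((f"{label:<6}" + "".join(f"{cell:<5}" for cell in cells)).rstrip())
    return rows


if __name__ == "__main__":
    print("\n".join(diagram(5)))
```

In cycle 5 all five stages are busy, each with a different instruction.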
Slide 13
Slide 14
Instruction Fetch
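Figure: instruction fetch (IF) stage. The PC addresses the I-cache, an adder computes PC + 4, and a MUX selects the next PC. The fetched 32-bit instruction and the PC are stored in the IF/ID registers (Instruction Register, PC).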
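The next-PC logic of the IF stage amounts to a two-way selection. This minimal sketch assumes 32-bit byte addresses, 4-byte instructions, and a branch target and taken/not-taken signal supplied by a later stage. `next_pc`, `branch_taken` and `branch_target` are hypothetical names, not labels from the figure.

```python
MASK32 = 0xFFFF_FFFF  # the PC is a 32-bit register


def next_pc(pc: int, branch_taken: bool, branch_target: int) -> int:
    """IF-stage MUX: sequential PC + 4 unless a branch is taken (assumed signals)."""
    sequential = (pc + 4) & MASK32  # output of the "Add" unit with the constant 4
    return (branch_target & MASK32) if branch_taken else sequential
```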
Slide 15
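Instruction decode/register fetch (ID)
Figure: the Instruction Register (IF/ID) supplies two 5-bit register addresses to the Register File. Its two 32-bit outputs are stored in ALU Input Register 1 and ALU Input Register 2. The 16-bit immediate is sign-extended to 32 bits and stored in the Immediate Register. The 5-bit Result Register Selector and the PC are passed on. All values are held in the ID/EX registers.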
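The sign extension from 16 to 32 bits copies the immediate's sign bit into the upper half of the word. This sketch assumes two's-complement encoding. The helper names are hypothetical.

```python
def sign_extend_16_to_32(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a 32-bit two's-complement bit pattern."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # sign bit set: fill the upper 16 bits with ones
        return imm16 | 0xFFFF_0000
    return imm16


def to_signed32(word: int) -> int:
    """Interpret a 32-bit pattern as a signed integer (for display only)."""
    return word - (1 << 32) if word & 0x8000_0000 else word
```

For example, the 16-bit pattern 0xFFFC becomes 0xFFFFFFFC, which is -4 as a signed 32-bit value.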
Slide 16
Execute
Execution/effective address calculation (EX)
Figure: two MUXes select the ALU inputs (ALU Input Register 1 or the PC; ALU Input Register 2 or the Immediate Register). A Zero? test produces the True/False Conditional Register. The ALU Output Register and the Store Value Register are held in the EX/MEM registers.
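For a load or store such as LW R1,4(R2), the EX stage uses the ALU to add the base register and the sign-extended offset from the Immediate Register. This sketch assumes a 32-bit datapath with wrap-around addition. Which operand the Zero? unit tests is also an assumption here (taken to be ALU Input Register 1). All names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF  # 32-bit datapath


def alu_add(a: int, b: int) -> int:
    """32-bit ALU addition; the carry out of bit 31 is discarded."""
    return (a + b) & MASK32


def effective_address(base: int, imm32: int) -> int:
    """Load/store address: base register + sign-extended offset (e.g. LW R1,4(R2))."""
    return alu_add(base, imm32)


def zero_test(value: int) -> bool:
    """'Zero?' unit: True if the tested operand (assumed ALU Input Register 1) is zero."""
    return (value & MASK32) == 0
```

A negative offset such as -4 arrives as 0xFFFFFFFC after sign extension, and the wrap-around addition turns it into a subtraction.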
Slide 17
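Memory access/branch completion (MEM) and write back (WB)
Figure: the ALU Output Register provides the load/store address to the D-cache, and the Store Value Register provides the data to be stored. The loaded value is placed in the Load Memory Data Register and the ALU result in the ALU Result Register of the MEM/WB registers, from which the result is written back to the register file. The True/False Conditional Register completes a branch.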
Slide 18
Slide 19
Resource Conflict
Cycle:  1    2    3    4    5    6    7    8    9    10
LOAD    IF   ID   EX   ME   WB
i+1          IF   ID   EX   ME   WB
i+2               IF   ID   EX   ME   WB
i+3                    IF   ID   EX   ME   WB
i+4                         IF   ID   EX   ME   WB
Access conflict in cycle 4: the ME stage of LOAD and the IF stage of i+3 both need the memory.
Slide 20
Cycle:  1    2    3    4    5    6    7    8    9    10
LOAD    IF   ID   EX   ME   WB
i+1          IF   ID   EX   ME   WB
i+2               IF   ID   EX   ME   WB
i+3                    O    IF   ID   EX   ME   WB
i+4                              IF   ID   EX   ME   WB
Bubble (O): the fetch of i+3 is delayed by one cycle.
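The bubble can be derived mechanically by treating memory as a single port shared by IF and ME. This sketch assumes in-order issue, no other hazards, and that a load/store in ME has priority over an instruction fetch. `schedule_fetches` is a hypothetical name.

```python
from typing import List, Set


def schedule_fetches(is_mem_op: List[bool]) -> List[int]:
    """IF cycle (1-based) of each instruction when IF and ME share one memory port."""
    busy: Set[int] = set()      # cycles in which an ME stage uses the memory port
    fetch_cycles: List[int] = []
    cycle = 1
    for mem_op in is_mem_op:
        while cycle in busy:    # port taken by an earlier load/store: insert a bubble
            cycle += 1
        fetch_cycles.append(cycle)
        if mem_op:
            busy.add(cycle + 3)  # IF, ID, EX, ME: ME is three cycles after IF
        cycle += 1              # in-order: the next fetch is at least one cycle later
    return fetch_cycles


if __name__ == "__main__":
    print(schedule_fetches([True, False, False, False, False]))
```

For LOAD followed by four non-memory instructions, it returns fetch cycles [1, 2, 3, 5, 6], which matches the single bubble before i+3.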
Slide 21
Data Dependencies
Assume two instructions I1 and I2, where I1 precedes I2:
A true dependence δt exists if I1 generates a result that is required by I2.
An anti-dependence δa exists if I1 reads a register that is overwritten by I2.
An output dependence δo exists if both instructions write to the same destination.
Slide 22
Formal structure:
IND: OPERATION DEST, OP1, OP2
S1: ADD R1,R2,2;     R1 = R2+2
S2: ADD R4,R1,R3;    R4 = R1+R3
S3: MULT R3,R5,3;    R3 = R5*3
S4: MULT R3,R6,3;    R3 = R6*3
True dependence S1 δt S2: S2 reads R1, which S1 writes.
Anti-dependence S2 δa S3: S2 reads R3, which S3 overwrites.
Output dependence S3 δo S4: both write R3.
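The three definitions can be checked pairwise on S1 to S4. This sketch looks only at register operands (immediates are dropped) and does not model intervening writes. `Instr` and `dependences` are hypothetical names.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]  # register operands only; immediates are omitted


def dependences(prog: List[Instr]) -> Set[Tuple[str, str, str]]:
    """(kind, I1, I2) for every pair where I1 comes before I2.

    true:   I2 reads the register that I1 writes
    anti:   I2 overwrites a register that I1 reads
    output: I1 and I2 write the same register
    """
    found = set()
    for a, i1 in enumerate(prog):
        for i2 in prog[a + 1:]:
            if i1.dest in i2.srcs:
                found.add(("true", i1.name, i2.name))
            if i2.dest in i1.srcs:
                found.add(("anti", i1.name, i2.name))
            if i1.dest == i2.dest:
                found.add(("output", i1.name, i2.name))
    return found


program = [
    Instr("S1", "R1", ("R2",)),       # ADD  R1,R2,2
    Instr("S2", "R4", ("R1", "R3")),  # ADD  R4,R1,R3
    Instr("S3", "R3", ("R5",)),       # MULT R3,R5,3
    Instr("S4", "R3", ("R6",)),       # MULT R3,R6,3
]

if __name__ == "__main__":
    for dep in sorted(dependences(program)):
        print(dep)
```

Besides the three pairs on the slide, the check also reports an anti-dependence from S2 to S4, since S4 overwrites R3 as well.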
Slide 23
Data Conflicts
Data conflicts can occur if two instructions with data dependencies are located close to each other in the instruction stream.
How close is too close depends on the pipeline structure and on the actual instructions.
Three kinds of data conflicts can occur:
Read after write (RAW), caused by a true dependence
Write after read (WAR), caused by an anti-dependence
Write after write (WAW), caused by an output dependence
Slide 24
Data Conflicts
Cycle:          1    2    3    4    5    6    7    8    9
ADD R1 R2 R3    IF   ID   EX   ME   WB
SUB R4 R1 R5         IF   ID   EX   ME   WB
AND R6 R4 R1              IF   ID   EX   ME   WB
OR R7 R1 R6                    IF   ID   EX   ME   WB
XOR R8 R1 R4                        IF   ID   EX   ME   WB
Slide 25
Slide 26
Data Conflicts
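Cycle:          1    2    3    4    5    6    7    8    9
ADD R1 R2 R3    IF   ID   EX   ME   WB
SUB R4 R1 R5         IF   ID   EX   ME   WB
AND R6 R4 R1              IF   ID   EX   ME   WB
OR R7 R1 R6                    IF   ID   EX   ME   WB
XOR R8 R1 R4                        IF   ID   EX   ME   WB
Assume the register file is written in the first half of a clock cycle (W) and read in the second half (B). Then OR, whose ID stage falls in the WB cycle of ADD (cycle 5), already reads the new R1, while SUB and AND read R1 too early.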
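Which of these instructions conflict can be computed from the distance between producer and consumer. This sketch assumes the 5-stage pipeline without forwarding and one issue per cycle. Operands are read in ID and results written in WB, and the register file is assumed to be written in the first half and read in the second half of a cycle. Under these assumptions, a consumer must be at least three instructions after its producer. The names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]


def raw_hazards(prog: List[Instr], min_distance: int = 3) -> List[Tuple[str, str, str]]:
    """(producer, consumer, register) for every RAW conflict without forwarding.

    A consumer closer than `min_distance` instructions to the most recent
    earlier writer of one of its source registers would read a stale value.
    """
    hazards = []
    for c, consumer in enumerate(prog):
        for reg in consumer.srcs:
            for p in range(c - 1, -1, -1):   # search backwards for the latest writer
                if prog[p].dest == reg:
                    if c - p < min_distance:
                        hazards.append((prog[p].text, consumer.text, reg))
                    break                      # older writers do not matter
    return hazards


program = [
    Instr("ADD R1 R2 R3", "R1", ("R2", "R3")),
    Instr("SUB R4 R1 R5", "R4", ("R1", "R5")),
    Instr("AND R6 R4 R1", "R6", ("R4", "R1")),
    Instr("OR R7 R1 R6", "R7", ("R1", "R6")),
    Instr("XOR R8 R1 R4", "R8", ("R1", "R4")),
]

if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(program):
        print(f"RAW on {reg}: {producer} -> {consumer}")
```

Without forwarding, several reads come too early:
- SUB and AND read R1 before ADD writes it.
- AND reads R4 before SUB writes it.
- OR reads R6 before AND writes it.

OR's read of R1 and both of XOR's reads are safe. Setting `min_distance=4` models a register file without the split cycle.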
Slide 27
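Hardware solution: data forwarding
Figure: a three-stage shift register with parallel output holds the most recent results. Control logic drives a MUX in front of each ALU input, selecting either the register operand or a forwarded result.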
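The control logic of the forwarding MUX can be expressed as a priority selection. If the instruction one ahead (in EX/MEM) writes the needed register, its result is used. Otherwise the instruction two ahead (in MEM/WB) is checked, and failing that, the value read from the register file is used. This is a sketch of one common textbook scheme, not necessarily the exact circuit on the slide. The slide's three-stage shift register may hold one more result than this two-source version. It assumes ALU results only (load data does not exist yet in EX/MEM) and plain register numbers. `forward_select` is a hypothetical name.

```python
from typing import Optional


def forward_select(src: int, ex_mem_dest: Optional[int], mem_wb_dest: Optional[int]) -> str:
    """Choose the source of one ALU input for the instruction entering EX.

    src:          register number the instruction needs
    ex_mem_dest:  destination register of the instruction in EX/MEM (one ahead),
                  or None if it writes no register
    mem_wb_dest:  destination register of the instruction in MEM/WB (two ahead),
                  or None if it writes no register
    """
    if ex_mem_dest is not None and ex_mem_dest == src:
        return "EX/MEM"   # the newest result has priority
    if mem_wb_dest is not None and mem_wb_dest == src:
        return "MEM/WB"
    return "REGFILE"      # no pending write: keep the operand read in ID
```

In the ADD/SUB/AND example, SUB gets R1 from EX/MEM. One cycle later, AND gets R1 from MEM/WB and R4 from EX/MEM.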
Slide 28
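Cycle:          1    2    3    4    5    6    7    8
LW R1 4(R2)     IF   ID   EX   ME   WB
ADD R4 R1 R3         IF   ID   EX   ME   WB
AND R5 R6 R7              IF   ID   EX   ME   WB
OR R7 R6 R8                    IF   ID   EX   ME   WB
Dependence: LW delivers R1 only at the end of its ME stage (cycle 4), but ADD needs R1 at the start of its EX stage in the same cycle, so forwarding alone cannot resolve it.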
Slide 29
Load Instructions
Cycle:          1    2    3    4    5    6    7    8    9
LW R1 4(R2)     IF   ID   EX   ME   WB
ADD R4 R1 R3         IF   ID   O    EX   ME   WB
AND R5 R6 R7              IF   O    ID   EX   ME   WB
OR R7 R6 R8                    O    IF   ID   EX   ME   WB
Bubble (O): after a one-cycle stall, ADD can receive R1 from LW by forwarding.
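The single bubble follows from when the loaded value exists. This sketch assumes full forwarding in the 5-stage pipeline. An ALU result is ready at the end of EX, load data only at the end of ME, and a consumer needs its operand at the start of its own EX stage. `stalls_with_forwarding` is a hypothetical name.

```python
def stalls_with_forwarding(producer_is_load: bool, distance: int) -> int:
    """Bubbles needed before a consumer `distance` instructions after its producer.

    Producer stages: IF in cycle 1, ID 2, EX 3, ME 4. An ALU result is ready
    at the end of EX (cycle 3); load data only at the end of ME (cycle 4).
    The consumer needs the forwarded value at the start of its own EX stage.
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    ready_cycle = 4 if producer_is_load else 3
    consumer_ex_cycle = 3 + distance  # the consumer is fetched `distance` cycles later
    return max(0, ready_cycle + 1 - consumer_ex_cycle)
```

A load followed directly by a dependent instruction costs one bubble. One independent instruction in between, or an ALU producer, costs none.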
Slide 30
Instruction Reordering
ADD R1 R2 R3
SUB R4 R1 R5
AND R6 R7 R8
OR R9 R10 R11
XOR R12 R13 R14
Slide 31
Instruction Reordering
ADD R1 R2 R3
SUB R4 R1 R5
AND R6 R7 R8
OR R9 R10 R11
XOR R12 R13 R14
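The benefit of reordering can be quantified by counting stall cycles. Moving the independent AND and OR between ADD and SUB is legal because they touch none of R1 to R5. It also gives SUB the distance of three instructions it needs. This sketch assumes no forwarding and a split-cycle register file (written in the first half of a cycle, read in the second). The names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]


def stall_cycles(prog: List[Instr], min_distance: int = 3) -> int:
    """Total bubbles in an in-order 5-stage pipeline without forwarding.

    A consumer must issue at least `min_distance` slots after the latest
    writer of each source register; a stall also delays all later instructions.
    """
    slot: List[int] = []              # issue slot of each instruction after stalling
    last_writer: Dict[str, int] = {}  # register -> index of its most recent writer
    total = 0
    for i, ins in enumerate(prog):
        next_free = slot[-1] + 1 if slot else 0
        earliest = next_free
        for reg in ins.srcs:
            if reg in last_writer:
                earliest = max(earliest, slot[last_writer[reg]] + min_distance)
        total += earliest - next_free
        slot.append(earliest)
        last_writer[ins.dest] = i
    return total


original = [
    Instr("ADD R1 R2 R3", "R1", ("R2", "R3")),
    Instr("SUB R4 R1 R5", "R4", ("R1", "R5")),
    Instr("AND R6 R7 R8", "R6", ("R7", "R8")),
    Instr("OR R9 R10 R11", "R9", ("R10", "R11")),
    Instr("XOR R12 R13 R14", "R12", ("R13", "R14")),
]
reordered = [original[0], original[2], original[3], original[1], original[4]]

if __name__ == "__main__":
    print(stall_cycles(original), stall_cycles(reordered))
```

The original order costs two bubbles before SUB, and the reordered sequence runs without stalls.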
Slide 32
Control Dependencies
Cycle:           1    2    3    4    5    6    7    8
Branching        IF   ID   EX   ME   WB
Instruction i+1       IF   ID   EX   ME   WB
Instruction i+2            IF   ID   EX   ME   WB
Instruction i+3                 IF   ID   EX   ME   WB
Slide 33
Control Dependencies
Figure: the branch instruction (Branching) passes through IF ID EX ME WB, while instructions i+1 to i+3 are delayed by bubbles (O) until the branch outcome is known.
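The cost of control dependencies can be estimated from the stage in which a branch is resolved. The datapath label "Memory access/branch completion (MEM)" suggests the branch outcome is known at the end of ME, and this sketch assumes fetching waits until then. The 20% branch frequency in the example is a made-up value, and all names are hypothetical.

```python
from typing import List

STAGES: List[str] = ["IF", "ID", "EX", "ME", "WB"]


def branch_penalty(resolve_stage: str) -> int:
    """Bubbles if fetching waits until the branch is resolved in `resolve_stage`.

    The branch is fetched in cycle 1 and resolved at the end of cycle
    STAGES.index(resolve_stage) + 1; the correct successor is fetched in the
    next cycle instead of in cycle 2.
    """
    resolved_cycle = STAGES.index(resolve_stage) + 1
    return resolved_cycle - 1


def effective_cpi(branch_fraction: float, penalty: int, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction if every branch costs `penalty` bubbles."""
    return base_cpi + branch_fraction * penalty


if __name__ == "__main__":
    p = branch_penalty("ME")
    print(p, effective_cpi(0.2, p))  # 0.2 is a hypothetical branch frequency
```

With a penalty of 3 cycles and 20% branches, the average CPI rises from 1 to 1.6. Resolving branches earlier (for example in ID) shrinks the penalty to 1 cycle.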
Slide 34
Slide 35
Slide 36
Slide 37