Pipelining and Parallel Processing

ECEG 3202 Computer Architecture and Organization
By Getachew T.
What is Pipelining?
Laundry Example
from David Patterson
Almaz, Bekele, Chala, and Desta each have one load of clothes (A, B, C, D) to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
Folder takes 20 minutes
Sequential Laundry
[Figure: task order (loads A, B, C, D) against time from 6 PM to midnight; each load takes 30 + 40 + 20 minutes, and the loads run one after another]
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipelined Laundry
Start work ASAP
[Figure: task order (loads A, B, C, D) against time from 6 PM to midnight; the loads overlap, and the intervals along the time axis are 30, 40, 40, 40, 40, and 20 minutes]
Pipelined laundry takes 3.5 hours for 4 loads
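The two totals can be checked with a short calculation. The following is a minimal sketch, assuming each machine handles one load at a time and that a finished load may wait until the next machine is free; the function names are illustrative and do not come from the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load enters a stage as soon as both the load and the stage are free."""
    stage_free = [0] * len(stage_minutes)  # time at which each stage becomes free
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for j, duration in enumerate(stage_minutes):
            start = max(ready, stage_free[j])
            ready = start + duration
            stage_free[j] = ready
    return stage_free[-1]


STAGES = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_minutes(STAGES, 4) / 60)  # 6.0 hours
print(pipelined_minutes(STAGES, 4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the dryer, the slowest stage, sets the pace once the pipeline is full.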

Effect of Pipelining
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
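These effects can be quantified for the laundry example. The following worked calculation uses only the stage times given there (30, 40, and 20 minutes); the general expression for n loads is an inference from those times, not a formula from the slides.

```latex
\begin{align*}
S_{4\ \text{loads}} &= \frac{4 \times 90}{30 + 4 \times 40 + 20} = \frac{360}{210} \approx 1.71 \\
S_{n\ \text{loads}} &= \frac{90\,n}{40\,n + 50} \;\longrightarrow\; \frac{90}{40} = 2.25 \quad (n \to \infty)
\end{align*}
```

The speedup for 4 loads is well below the 3 pipe stages because of fill and drain time. Even for many loads it approaches only 2.25, because the 40-minute dryer limits the rate. With three balanced 30-minute stages the limit would be 90 / 30 = 3. The first load still takes 90 minutes in both cases.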
Instruction Cycle
Instruction Fetch
Instruction Decoding
Operand Fetch
Execute
Store Result
5 Steps of MIPS Datapath
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: MIPS datapath showing Next PC, Next SEQ PC, Adder (+4), instruction Memory (Address, Inst), Reg File (RS1, RS2, RD), Sign Extend (Imm), MUXes, ALU with Zero? test, data Memory (LMD), and WB Data]
5 Steps of MIPS Datapath
Figure 3.4, Page 134 , CA:AQA 2e
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: the same datapath divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; Next SEQ PC and RD are carried forward through the pipeline registers]
Data stationary control: local decode for each instruction phase / pipeline stage
Visualizing Pipelining
[Figure: pipeline diagram with time in clock cycles (Cycle 1 to Cycle 7) across and instruction order down; four instructions each pass through Ifetch, Reg, ALU, DMem, Reg, each starting one cycle after the previous one]
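The staggered layout of such a diagram can be reproduced programmatically. The following is a minimal sketch, assuming one instruction enters the pipeline per cycle with no stalls; the stage labels follow the figure, while the function names are illustrative.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as fixed-width text, one line per instruction."""
    width = max(len(name) for row in rows for name in row) + 1
    lines = []
    for i, row in enumerate(rows, start=1):
        cells = "".join(name.ljust(width) for name in row)
        lines.append(f"Instr {i}: {cells}".rstrip())
    return "\n".join(lines)


print(format_diagram(pipeline_diagram(4)))
```

Four instructions in a five-stage pipeline need 5 + 4 - 1 = 8 cycles to complete, so a diagram that stops at Cycle 7 cuts off the last stage of the fourth instruction.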
Pipeline Time Analysis
With pipeline
k-segment pipeline
n tasks
$t_p$ = clock cycle time
$k t_p$ = time to complete task T1
$(n - 1) t_p$ = time to complete the remaining n - 1 tasks
$k + (n - 1)$ = clock cycles to complete n tasks
$(k + n - 1) t_p$ = time to complete n tasks
Without pipeline
$t_n$ = time to complete each task
$n t_n$ = time to complete n tasks
Pipeline Time Analysis
Speedup of pipelining
$$S = \frac{n t_n}{(k + n - 1) t_p}, \qquad \lim_{n \to \infty} S = \frac{t_n}{t_p}$$
Assuming equal time for the pipeline and non-pipeline ($t_n = k t_p$):
$$S = \frac{k t_p}{t_p} = k$$
Thus, the theoretical speedup limit is k, the number of pipeline segments
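The approach to this limit can be seen numerically. The following is a minimal sketch of the timing formulas above; the values k = 4 and a 20-unit clock cycle are illustrative assumptions, and the function names are not from the slides.

```python
def pipelined_time(k, n, tp):
    """Time for n tasks on a k-segment pipeline: (k + n - 1) * tp."""
    return (k + n - 1) * tp


def nonpipelined_time(n, tn):
    """Time for n tasks without a pipeline: n * tn."""
    return n * tn


def speedup(k, n, tp, tn=None):
    """S = n*tn / ((k + n - 1)*tp); tn defaults to k*tp (the equal-time assumption)."""
    if tn is None:
        tn = k * tp
    return nonpipelined_time(n, tn) / pipelined_time(k, n, tp)


for n in (1, 10, 100, 10_000):
    print(n, round(speedup(k=4, n=n, tp=20), 3))
```

With a single task the speedup is 1 (no overlap is possible), and it approaches k = 4 as n grows.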
Arithmetic Pipelining
Floating-point operations
Fixed-point multiplication
Other scientific computations
Hazards due to Pipelining
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
Structural hazards: the hardware cannot support this combination of instructions (contention for the same hardware)
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
One Memory Port/Structural Hazards
[Figure: pipeline diagram over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Instr 3, and Instr 4, each passing through Ifetch, Reg, ALU, DMem, Reg]
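The conflict caused by a single memory port can be located by listing which instructions need memory in each cycle. The following is a minimal sketch, assuming one instruction is issued per cycle, that every instruction uses memory for its fetch in its first cycle, and that only the Load accesses data memory, in its fourth cycle. The figure does not say what Instr 1 to Instr 4 do, so treating them as non-memory instructions is an assumption.

```python
def memory_port_conflicts(program):
    """program: list of (name, uses_data_memory) in issue order, one issue per cycle.

    Returns {cycle: [(name, stage), ...]} for cycles in which more than one
    instruction needs the single memory port. Cycles are numbered from 1.
    """
    users = {}
    for i, (name, uses_data_memory) in enumerate(program):
        users.setdefault(i + 1, []).append((name, "IF"))       # instruction fetch
        if uses_data_memory:
            users.setdefault(i + 4, []).append((name, "MEM"))  # data access
    return {cycle: u for cycle, u in sorted(users.items()) if len(u) > 1}


program = [
    ("Load", True),
    ("Instr 1", False),
    ("Instr 2", False),
    ("Instr 3", False),
    ("Instr 4", False),
]
print(memory_port_conflicts(program))
```

Under these assumptions the only conflict is in cycle 4, where the Load's data access and the fetch of Instr 3 both need the memory port.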
Data Hazard on R1
[Figure: pipeline diagram (stages IF, ID/RF, EX, MEM, WB) for the instruction sequence below]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r6,r9
xor r10,r1,r11
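The dependences in this sequence can be found mechanically. The following is a minimal sketch of a read-after-write check, assuming no forwarding and a register file that is written in the first half of a cycle and read in the second half, so that only the two instructions immediately following a producer read a stale value. That timing assumption, the `window` parameter, and the function names are illustrative and not taken from the slides.

```python
def parse(line):
    """Split 'op rd,rs1,rs2' into (op, destination, [sources])."""
    op, operands = line.split(None, 1)
    dest, *sources = [r.strip() for r in operands.split(",")]
    return op, dest, sources


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW hazard.

    A consumer is hazardous if it reads a register written by one of the
    `window` instructions directly before it (assumed pipeline timing).
    """
    parsed = [parse(line) for line in program]
    hazards = []
    for j, (_, _, sources) in enumerate(parsed):
        for i in range(max(0, j - window), j):
            dest = parsed[i][1]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    "add r1,r2,r3",
    "sub r4,r1,r3",
    "and r6,r1,r7",
    "or r8,r6,r9",
    "xor r10,r1,r11",
]
for producer, consumer, reg in raw_hazards(program):
    print(f"'{program[consumer]}' reads {reg} before '{program[producer]}' writes it")
```

Under these assumptions, `sub` and `and` read r1 too early, `or` reads r6 too early, and `xor` is far enough from `add` to read the updated r1.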
Control Hazard due to Branches
[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for the instruction sequence below]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
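The cost of a taken branch in this sequence can be counted directly. The following is a minimal sketch, assuming the branch outcome becomes known in the fourth stage (DMem), which is consistent with three sequential instructions appearing between the branch and its target. The addresses are treated as plain integers, and the function name is illustrative.

```python
def wrong_path_fetches(branch_pc, resolve_stage, instr_bytes=4):
    """Addresses fetched sequentially before a taken branch redirects the PC.

    resolve_stage is the 1-based pipeline stage in which the branch outcome
    is known; one sequential instruction is fetched in each of the cycles
    that the branch spends in stages 2 through resolve_stage.
    """
    return [branch_pc + instr_bytes * i for i in range(1, resolve_stage)]


# beq at address 10, outcome known in stage 4 (assumed)
print(wrong_path_fetches(branch_pc=10, resolve_stage=4))  # [14, 18, 22]
```

If the branch is taken, the instructions at 14, 18, and 22 have already entered the pipeline and must be discarded before the instruction at 36 is fetched. Resolving the branch in an earlier stage would shorten this list.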
Solutions
Instruction Reordering
Branch Prediction
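The slides name branch prediction without specifying a scheme. As one illustration, the following is a minimal sketch of a 2-bit saturating counter, a common dynamic predictor; the choice of scheme, the initial state, and the example outcome sequence are all assumptions made here.

```python
def simulate_two_bit_predictor(outcomes, state=0):
    """Count correct predictions for one branch.

    outcomes: iterable of booleans (True = branch taken).
    state: counter from 0 (strongly not taken) to 3 (strongly taken);
    the branch is predicted taken when state >= 2.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# A loop branch: taken nine times, then not taken on loop exit.
loop_branch = [True] * 9 + [False]
print(simulate_two_bit_predictor(loop_branch))  # 7
```

Starting from the strongly-not-taken state, the predictor misses the first two iterations and the loop exit, and is correct on the other seven.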
Parallel Processing
Concurrent data processing
Possibilities
Fetch the next instruction while the current instruction is executed in the ALU
System may have more than one ALU
System may have more than one CPU
Overall goal is to increase throughput
Multiple Functional Units
Parallel Processing Classifications
Classification of parallel processing can be based on
Internal organization of processors
Interconnection structure between processors
Flow of information through system
Flynn's classification
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
SISD
Single computer with a control unit, a CPU, and memory
Instructions are executed sequentially
Parallel processing achieved by
Multiple functional units
Pipeline processing
SIMD
Multiple processing units supervised by a common control unit
All processors:
Receive the same instruction from the control unit
Operate on different data
Shared memory unit must have multiple modules so that multiple processors can each access their own memory module simultaneously
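The SIMD organization can be modeled in a few lines. The following is a minimal sketch in which a control unit broadcasts each instruction to every processing element, and each element applies it to the value in its own memory module. The two-instruction set and the function name are hypothetical, and the Python loop only models the lockstep behavior; it does not run in parallel.

```python
import operator

# Hypothetical instruction set for the sketch.
OPS = {"add": operator.add, "mul": operator.mul}


def simd_run(instructions, memory_modules):
    """Apply each broadcast instruction to every element's local data.

    instructions: list of (opcode, operand) issued by the control unit.
    memory_modules: one value per processing element.
    """
    local = list(memory_modules)
    for opcode, operand in instructions:  # one instruction at a time
        op = OPS[opcode]
        local = [op(value, operand) for value in local]  # same op, different data
    return local


print(simd_run([("add", 1), ("mul", 2)], [1, 2, 3, 4]))  # [4, 6, 8, 10]
```

Each element sees the same instruction stream but produces a different result because its data differs.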
MIMD
Computer system that simultaneously executes many programs
Category for most multiprocessor and multicomputer systems
