CS465
Multi-cycle CPU
[Figure: datapath annotated with component delays of 2ns, 2ns, 2ns, 2ns, and 1ns]
What are the delays for lw, sw, R-Type, beq, j instructions?
Multi-cycle CPU
[Table: units used (X) by each instruction class, including Register Access and Jump; the totals listed are 6, 8, 7, 5, and 2]
Outline
Today's topic
Pipelining is Natural!
Laundry example
Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes, run one after another]
Pipelined Laundry
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, overlapped, with segments of 30, 40, 40, 40, 40, and 20 minutes]
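The two laundry timelines can be checked with a little arithmetic. The following is a minimal sketch, assuming each load goes through three steps of 30, 40, and 20 minutes (the durations on the timelines) and that each step handles one load at a time; the function names are illustrative only.

```python
STAGES = [30, 40, 20]  # minutes per step, taken from the timelines
LOADS = 4              # loads A, B, C, D

def sequential_minutes(stages, loads):
    """Each load finishes every step before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(stages, loads):
    """Loads overlap, but each step serves only one load at a time."""
    free_at = [0] * len(stages)  # time at which each step becomes free
    for _ in range(loads):
        ready = 0  # time at which this load can enter its next step
        for i, duration in enumerate(stages):
            start = max(ready, free_at[i])
            ready = start + duration
            free_at[i] = ready
    return free_at[-1]

print(sequential_minutes(STAGES, LOADS))  # 360
print(pipelined_minutes(STAGES, LOADS))   # 210
```

The pipelined total is 30 + 4 x 40 + 20 minutes: the 40-minute step sets the rate, which is why the slowest stage limits the pipeline.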
Pipeline
[Figure: pipelined laundry timeline for loads A, B, C]
Multiple tasks operating simultaneously using different resources
Pipelining doesn't help latency of single task, it helps throughput of entire workload
Pipeline rate is limited by slowest pipeline stage
Unbalanced lengths of pipeline stages reduce speedup
Pipeline
[Figure: pipelined laundry timeline for loads A through D]
Potential speedup = number of pipeline stages
Time
Stall for dependencies
MIPS Pipeline
Pipeline Performance
Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps
Chapter 4 The Processor
Pipeline Performance
Single-cycle (Tc= 800ps)
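A short sketch of the comparison, using the lw stage times from the table. It assumes the single-cycle clock must cover the whole lw path and the pipelined clock must cover the slowest stage; the stage names and function names are illustrative assumptions.

```python
# Stage latencies in ps for lw, the slowest instruction in the table.
# The key names are assumptions for illustration.
STAGE_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

def single_cycle_clock(stage_ps):
    """The clock must cover every stage of the slowest instruction."""
    return sum(stage_ps.values())

def pipelined_clock(stage_ps):
    """The clock must cover only the slowest single stage."""
    return max(stage_ps.values())

def single_cycle_time(n, stage_ps):
    return n * single_cycle_clock(stage_ps)

def pipelined_time(n, stage_ps):
    # (stages - 1) cycles to fill the pipeline, then one instruction per cycle
    return (len(stage_ps) - 1 + n) * pipelined_clock(stage_ps)

print(single_cycle_clock(STAGE_PS))    # 800
print(pipelined_clock(STAGE_PS))       # 200
print(single_cycle_time(3, STAGE_PS))  # 2400
print(pipelined_time(3, STAGE_PS))     # 1400
```

For long instruction streams the ratio approaches 800/200 = 4 rather than 5, because the stages are not balanced.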
Pipeline Speedup
If
Load/store addressing
Can calculate address in 3rd stage, access memory in
4th stage
Hazards
Situations
hazard
hazard
Structure Hazards
Conflict
Hence, pipelined datapaths require separate instruction/data memories
Or separate instruction/data caches
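A minimal sketch of why one shared memory causes a conflict, assuming a five-stage pipeline in which instruction i (counting from 0) is fetched in cycle i + 1 and reaches its memory stage in cycle i + 4. The function name and the tuple layout are illustrative assumptions.

```python
def memory_conflicts(ops):
    """Return (cycle, data_access_index, fetch_index) for every cycle in
    which one shared memory is needed for both a data access and a fetch."""
    conflicts = []
    for i, op in enumerate(ops):
        if op in ("lw", "sw"):
            mem_cycle = i + 4            # fetch in cycle i + 1, MEM three cycles later
            fetch_index = mem_cycle - 1  # instruction being fetched in that cycle
            if fetch_index < len(ops):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts

program = ["lw", "add", "sub", "and", "or"]  # a load followed by four instructions
print(memory_conflicts(program))  # [(4, 0, 3)]
```

With separate instruction and data memories the two accesses in cycle 4 go to different units, so the list of conflicts would be empty by construction.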
Structural Hazard: One Memory
[Figure: pipeline diagram over time (clock cycles) for Load, Instr 1, Instr 2, Instr 3, Instr 4 in instruction order; each instruction uses Mem, Reg, ALU, Mem, Reg]
Structural Hazard: One Memory
[Figure: pipeline diagram over time (clock cycles) for Load, Instr 1, Instr 2, a stall, and Instr 3 in instruction order]
Data Hazard Example
Dependences backward in time are hazards
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline diagram in instruction order; each instruction passes through Im, Reg, ALU, Dm, Reg (stages ID/RF, EX, MEM, WB)]
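The backward dependences in the example can be found mechanically. The sketch below assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer three or more instructions behind the producer is safe; that assumption, the tuple layout, and the function name are illustrative. It also ignores the case where an intervening instruction redefines the register.

```python
program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]

def raw_hazards(program, safe_distance=3):
    """List (producer, consumer, register) triples where the consumer reads
    a register before the producer has written it back."""
    hazards = []
    for c, (op, _, sources) in enumerate(program):
        # Only producers fewer than safe_distance instructions ahead matter.
        for p in range(max(0, c - safe_distance + 1), c):
            producer_op, dest, _ = program[p]
            if dest in sources:
                hazards.append((producer_op, op, dest))
    return hazards

print(raw_hazards(program))
# [('add', 'sub', 'r1'), ('add', 'and', 'r1')]
```

Under these assumptions sub and and read r1 too early, while or and xor read it after add has written it.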
lw r1,0(r2)
sub r4,r1,r3
[Figure: pipeline diagram for the two instructions, with stages ID/RF, EX, MEM, WB]
Can't
Pipeline
lw r1,0(r2)
sub r4,r1,r3
[Figure: pipeline diagram for the two instructions, with a Stall before the EX, MEM, WB stages of sub]
Must
lw   $t1, 0($t0)
lw   $t2, 4($t0)
stall
add  $t3, $t1, $t2
sw   $t3, 12($t0)
lw   $t4, 8($t0)
stall
add  $t5, $t1, $t4
sw   $t5, 16($t0)
13 cycles

lw   $t1, 0($t0)
lw   $t2, 4($t0)
lw   $t4, 8($t0)
add  $t3, $t1, $t2
sw   $t3, 12($t0)
add  $t5, $t1, $t4
sw   $t5, 16($t0)
11 cycles
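The 13 and 11 cycle counts can be reproduced with a small counter. This is a simplified sketch, assuming a five-stage pipeline with forwarding in which the only stall is one cycle when an instruction uses the result of the load immediately before it; the parsing helpers are illustrative and handle only the lw, sw, and add forms used here.

```python
def parse(line):
    op, rest = line.split(None, 1)
    return op, [a.strip() for a in rest.split(",")]

def base_register(operand):
    # "12($t0)" -> "$t0"
    return operand[operand.index("(") + 1 : operand.index(")")]

def dest_and_sources(op, args):
    if op == "lw":
        return args[0], [base_register(args[1])]
    if op == "sw":
        return None, [args[0], base_register(args[1])]
    return args[0], args[1:]  # three-register form such as add

def cycles(lines, stages=5):
    stalls = 0
    prev_load_dest = None
    for line in lines:
        op, args = parse(line)
        dest, sources = dest_and_sources(op, args)
        if prev_load_dest is not None and prev_load_dest in sources:
            stalls += 1  # load-use: wait one cycle for the loaded value
        prev_load_dest = dest if op == "lw" else None
    return len(lines) + (stages - 1) + stalls

original = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
reordered = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
print(cycles(original))   # 13
print(cycles(reordered))  # 11
```

Moving the third lw up separates each load from the add that uses it, which removes both stalls.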
Control Hazards
Branch
In
MIPS pipeline
Stall on Branch
Wait
Branch Prediction
Longer
Predict outcome of branch
MIPS pipeline
if wrong
Impact: 0 lost cycles per branch instruction if right, 1 if wrong
Need to squash and restart following instruction if wrong
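The cost of prediction can be estimated from the figures above (0 lost cycles if right, 1 if wrong). The sketch below is illustrative only: the base CPI, branch fraction, and misprediction rate are made-up inputs, not values from the slides.

```python
def average_branch_penalty(miss_rate, penalty_if_wrong=1):
    """Expected lost cycles per branch: 0 when right, penalty when wrong."""
    return miss_rate * penalty_if_wrong

def cpi_with_branches(base_cpi, branch_fraction, miss_rate, penalty_if_wrong=1):
    """Add the expected branch penalty, weighted by how often branches occur."""
    return base_cpi + branch_fraction * average_branch_penalty(miss_rate, penalty_if_wrong)

# Hypothetical workload: 20% branches, 25% of them mispredicted.
print(cpi_with_branches(base_cpi=1.0, branch_fraction=0.2, miss_rate=0.25))
```

A perfect predictor (miss rate 0) leaves the CPI unchanged, while always stalling one cycle corresponds to a miss rate of 1.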
Prediction scheme
Prediction incorrect
Pipeline Summary
The BIG Picture
Pipelining
Subject to hazards
Right-to-left flow leads to hazards
[Figure labels: MEM, WB]
Pipeline registers
Need
Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
Single-clock-cycle pipeline diagram
Shows pipeline usage in a single cycle
Highlight resources used
Well
EX for Load
WB for Load
Wrong register number
EX for Store
WB for Store
form
Observations
IF: NONE
ID: NONE
ALU: RegDst, ALUOp, ALUSrc
MEM: Branch, MemRead, MemWrite
WB: MemtoReg, RegWrite
Pipelined Control
Control
As in single-cycle implementation
Pipeline Hazards
Deeper pipeline
Less work per stage, so shorter clock cycle
Multiple issue
Two-issue packets
One ALU/branch instruction
One load/store instruction
64-bit aligned
ALU/branch, then load/store
Pad an unused instruction with nop
Address  Instruction type  Pipeline Stages
n        ALU/branch        IF  ID  EX  MEM  WB
n + 4    Load/store        IF  ID  EX  MEM  WB
n + 8    ALU/branch            IF  ID  EX  MEM  WB
n + 12   Load/store            IF  ID  EX  MEM  WB
n + 16   ALU/branch                IF  ID  EX  MEM  WB
n + 20   Load/store                IF  ID  EX  MEM  WB
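The packet rules above (one ALU/branch slot, one load/store slot, 64-bit aligned, nop padding) can be sketched as follows. The opcode sets and function names are illustrative assumptions, not a complete MIPS opcode list.

```python
from itertools import zip_longest

# Illustrative opcode sets; a nop may fill either slot.
ALU_BRANCH = {"add", "addu", "addi", "sub", "and", "or", "slt", "beq", "bne", "nop"}
LOAD_STORE = {"lw", "sw", "nop"}

def make_packets(alu_branch_ops, load_store_ops):
    """Pair the two streams slot by slot, padding the shorter one with nop."""
    packets = list(zip_longest(alu_branch_ops, load_store_ops, fillvalue="nop"))
    for alu, mem in packets:
        if alu not in ALU_BRANCH or mem not in LOAD_STORE:
            raise ValueError(f"illegal packet: ({alu}, {mem})")
    return packets

def slot_addresses(packets, base):
    """ALU/branch slot first, then load/store; each packet occupies 8 bytes."""
    if base % 8 != 0:
        raise ValueError("packets must be 64-bit aligned")
    out = []
    for k, (alu, mem) in enumerate(packets):
        out.append((base + 8 * k, alu))
        out.append((base + 8 * k + 4, mem))
    return out

packets = make_packets(["addi", "addu", "bne"], ["lw"])
print(packets)  # [('addi', 'lw'), ('addu', 'nop'), ('bne', 'nop')]
print(slot_addresses(packets, base=0))
```

The addresses step by 4 within a packet and by 8 between packets, matching the n, n + 4, n + 8 pattern in the table.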
Multiple Issue
Speculation
Speculate on load
Roll back if location is updated
Compiler/Hardware Speculation
Compiler
packets
Group of instructions that can be issued on a single cycle
Determined by pipeline resources required
Think
Thought Question:
In
Scheduling Example
Schedule
Loop: lw   $t0, 0($s1)       # $t0=array element
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 0($s1)       # store result
      addi $s1, $s1,-4       # decrement pointer
      bne  $s1, $zero, Loop  # branch $s1!=0

       ALU/branch             Load/store        cycle
Loop:  nop                    lw  $t0, 0($s1)   1
       addi $s1, $s1,-4       nop               2
       addu $t0, $t0, $s2     nop               3
       bne  $s1, $zero, Loop  sw  $t0, 4($s1)   4
Loop Unrolling
Replicate
Use
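       ALU/branch             Load/store        cycle
Loop:  addi $s1, $s1,-16      lw  $t0, 0($s1)   1
       nop                    lw  $t1, 12($s1)  2
       addu $t0, $t0, $s2     lw  $t2, 8($s1)   3
       addu $t1, $t1, $s2     lw  $t3, 4($s1)   4
       addu $t2, $t2, $s2     sw  $t0, 16($s1)  5
       addu $t3, $t3, $s2     sw  $t1, 12($s1)  6
       nop                    sw  $t2, 8($s1)   7
       bne  $s1, $zero, Loop  sw  $t3, 4($s1)   8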
processors
CPU decides whether to issue 0, 1, 2, ... each cycle
Avoiding structural and data hazards
Avoids
Example
lw   $t0, 20($s2)
addu $t1, $t0, $t2
sub  $s4, $s4, $t3
slti $t5, $s4, 20
Can start sub while addu is waiting for lw
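The observation that sub need not wait for lw follows from the register dependences. The sketch below is a simplified illustration: it only tracks which instructions depend, directly or indirectly, on a result that is still pending, and it ignores register renaming. The tuple layout and function name are assumptions.

```python
program = [
    ("lw", "$t0", ["$s2"]),
    ("addu", "$t1", ["$t0", "$t2"]),
    ("sub", "$s4", ["$s4", "$t3"]),
    ("slti", "$t5", ["$s4"]),
]

def not_blocked(program, pending):
    """Return the opcodes that do not depend, directly or through another
    blocked instruction, on the results of the instructions in `pending`."""
    delayed_regs = {program[i][1] for i in pending}
    free = []
    for i, (op, dest, sources) in enumerate(program):
        if i in pending:
            continue
        if any(s in delayed_regs for s in sources):
            delayed_regs.add(dest)  # its own result is delayed as well
        else:
            free.append(op)
    return free

# lw (index 0) has issued but its data has not arrived yet.
print(not_blocked(program, pending={0}))  # ['sub', 'slti']
```

Only addu is held up by the load; sub can start, and slti waits on sub rather than on the load.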
Hold pending operands
Register Renaming
Reservation stations and reorder buffer effectively provide register renaming
On instruction issue to reservation station
Speculation
Predict
speculation
Power Efficiency
Complexity
Microprocessor  Year  Clock Rate  Pipeline Stages  Issue width  Out-of-order/Speculation  Cores  Power
i486            1989  25MHz                                     No                               5W
Pentium         1993  66MHz                                     No                               10W
Pentium Pro     1997  200MHz      10                            Yes                              29W
P4 Willamette   2001  2000MHz     22                            Yes                              75W
P4 Prescott     2004  3600MHz     31                            Yes                              103W
Core            2006  2930MHz     14                            Yes                              75W
UltraSparc III  2003  1950MHz     14                            No                               90W
UltraSparc T1   2005  1200MHz                                   No                               70W
72 physical registers
FP is 5 stages longer
Up to 106 RISC-ops in progress
Bottlenecks
Concluding Remarks
Exception
Arises within the CPU
e.g., undefined opcode, overflow, syscall, ...
Interrupt
From an external I/O controller
4.9 Exceptions
Handling Exceptions
In MIPS, exceptions managed by a System Control Coprocessor (CP0)
Save PC of offending (or interrupted) instruction
An Alternate Mechanism
Vectored Interrupts
either
Handler Actions
Read
Terminate program
Report error using EPC, cause, ...
Exceptions in a Pipeline
Another
to mispredicted branch
Exception Properties
Restartable
exceptions
PC
Exception Example
Exception on add in
40  sub  $11, $2, $4
44  and  $12, $2, $5
48  or   $13, $2, $6
4C  add  $1, $2, $1
50  slt  $15, $6, $7
54  lw   $16, 50($7)
Handler
80000180  sw  $25, 1000($0)
80000184  sw  $26, 1004($0)
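A minimal model of what happens when the add at address 4C raises an exception: the PC of the offending instruction is saved, a cause is recorded, and execution continues at the handler address shown in the example. The `CP0` class, its field names, and the "overflow" cause string are illustrative assumptions, not the real coprocessor interface.

```python
HANDLER_ADDRESS = 0x80000180  # first handler instruction in the example

class CP0:
    """Toy stand-in for the system control coprocessor state."""
    def __init__(self):
        self.epc = None    # PC of the offending instruction
        self.cause = None  # why the exception was raised

def raise_exception(pc, cause, cp0):
    """Record the exception and return the address to fetch from next."""
    cp0.epc = pc
    cp0.cause = cause
    return HANDLER_ADDRESS

cp0 = CP0()
next_pc = raise_exception(0x4C, "overflow", cp0)  # cause is assumed here
print(hex(cp0.epc), cp0.cause, hex(next_pc))  # 0x4c overflow 0x80000180
```

The handler can later read the saved values to decide whether to terminate the program or restart at the saved PC.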
Multiple Exceptions
In complex pipelines
Multiple instructions issued per cycle
Out-of-order completion
Maintaining precise exceptions is difficult!