Datapath Design:
Capabilities & performance characteristics of principal
Functional Units (FUs):
(e.g., Registers, ALU, Shifters, Logic Units, ...)
Ways in which these components are interconnected (bus
connections, multiplexors, etc.).
How information flows between components.
[Figure: Hierarchy of computer architecture. Software: Application, Operating System, Compiler, Assembly Language Programs, Machine Language Program. Software/Hardware Boundary: Instruction Set Architecture, Firmware. Hardware: Microprogram, Register Transfer Notation (RTN), Logic Diagrams, Digital Design, Circuit Diagrams, Circuit Design, Layout.]
EECC550 - Shaaban
#2 Final Review Winter 2000 2-19-2001
How can one measure the performance of this machine running this
program?
Intuitively the machine is said to be faster or has better
performance running this program if the total execution time is
shorter.
Thus the inverse of the total measured program execution time is
a possible performance measure or metric:
$\text{Performance}_A = 1 / \text{Execution Time}_A$
How to compare performance of different machines?
What factors affect performance? How to improve performance?
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
[Table: Factors affecting CPU performance. Rows: Program, Compiler, Instruction Set Architecture (ISA), Organization, Technology. Columns: Instruction Count, CPI, Clock Rate.]
[Figure: Aspects of CPU execution time. Instruction Count and CPI depend on factors drawn from: Program Used, Compiler, ISA, CPU Organization. Clock Cycle depends on: CPU Organization, Technology.]
$$\text{CPI} = \sum_{i=1}^{n} (\text{CPI}_i \times F_i)$$
where CPI_i is the CPI of instruction class i and F_i is the fraction of executed instructions in that class.
Typical Mix
Frequency (F_i)   Cycles (CPI_i)   CPI_i x F_i   % Time
50%               1                0.5           23%
20%               5                1.0           45%
10%               3                0.3           14%
20%               2                0.4           18%
$$\text{CPI} = \sum_{i=1}^{n} (\text{CPI}_i \times F_i)$$
CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2 = 2.2
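The weighted-CPI calculation can be checked with a short sketch. The function names below are illustrative and not part of the course material; the frequencies and cycle counts are the ones used in the CPI = 2.2 example.

```python
def average_cpi(mix):
    """Return the weighted CPI for a list of (frequency, cycles) pairs."""
    return sum(freq * cycles for freq, cycles in mix)


def time_share(mix):
    """Return each class's share of total execution time."""
    total = average_cpi(mix)
    return [freq * cycles / total for freq, cycles in mix]


# (frequency, cycles) pairs taken from the example: .5x1, .2x5, .1x3, .2x2
mix = [(0.5, 1), (0.2, 5), (0.1, 3), (0.2, 2)]

cpi = average_cpi(mix)
percent_time = [round(100 * share) for share in time_share(mix)]

print(f"CPI = {cpi:.1f}")
print("% time per class:", percent_time)
```

The percentage column follows directly from each class's contribution divided by the overall CPI.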
[Figure: Levels at which performance is affected: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins.]
Amdahl's Law:
Performance improvement or speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{Execution Time without } E}{\text{Execution Time with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
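Suppose enhancement E speeds up a fraction F of the execution time by a factor S, and the remainder of the time is unaffected.
Before: Unaffected fraction (1 - F), Affected fraction: F
After:  Unchanged (1 - F), F/S
Execution Time with enhancement E:
$$\text{Execution Time with } E = \left((1 - F) + \frac{F}{S}\right) \times \text{Execution Time without } E$$
$$\text{Speedup}(E) = \frac{\text{Execution Time without enhancement } E}{\text{Execution Time with enhancement } E} = \frac{1}{(1 - F) + F/S}$$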
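A minimal sketch of the Amdahl's Law formula, Speedup(E) = 1 / ((1 - F) + F/S), is given here. The function name and the sample values F = 0.4, S = 10 are assumptions chosen only for illustration.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Illustrative values (not from the slides): 40% of the time sped up 10x.
example = amdahl_speedup(0.4, 10)
print(f"Speedup = {example:.4f}")

# Even an enormous S cannot push the speedup past 1 / (1 - F).
print(f"Upper bound = {1 / (1 - 0.4):.4f}")
```

The second print shows why the unaffected fraction limits the achievable speedup no matter how large S becomes.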
Percentage1 = F1 = 20%
Percentage2 = F2 = 15%
Percentage3 = F3 = 10%
While all three enhancements are in place in the new design, each
enhancement affects a different portion of the code and only one
enhancement can be used at a time.
What is the resulting overall speedup?
$$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}}$$
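The multi-enhancement form of Amdahl's Law, Speedup = 1 / ((1 - sum of F_i) + sum of F_i/S_i), can be sketched as follows. The fractions are the ones listed in the example; the speedup factors 10, 15, and 30 are hypothetical values used only to make the sketch runnable, since the per-enhancement speedups do not appear in this text.

```python
def combined_speedup(enhancements):
    """Speedup for non-overlapping enhancements given as (fraction, factor) pairs."""
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("fractions of disjoint enhancements cannot exceed 1")
    enhanced_time = sum(f / s for f, s in enhancements)
    return 1.0 / ((1.0 - total_fraction) + enhanced_time)


# Fractions F1..F3 from the example; speedup factors are ASSUMED for illustration.
enhancements = [(0.20, 10), (0.15, 15), (0.10, 30)]
overall = combined_speedup(enhancements)
print(f"Overall speedup = {overall:.2f}")
```

Because each enhancement covers a different portion of the code, the affected fractions simply add; the unaffected remainder is 1 minus their sum.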
R-Type:
  op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
I-Type: ALU, Load/Store, Branch
  op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
J-Type: Jumps
  op (6 bits, 31-26) | target address (26 bits, 25-0)
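The three instruction formats can be illustrated with a small field decoder. This is a sketch under stated assumptions: the function name is made up, opcode 0 is treated as R-type, and opcodes 2 and 3 (the standard MIPS `j` and `jal`) are treated as J-type; every other opcode is decoded as I-type.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its format-specific fields."""
    op = (word >> 26) & 0x3F
    if op in (2, 3):  # J-type: op + 26-bit target address
        return {"format": "J", "op": op, "target": word & 0x3FFFFFF}
    fields = {
        "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
    }
    if op == 0:  # R-type: rd, shamt, funct occupy the low 16 bits
        fields.update(
            format="R",
            rd=(word >> 11) & 0x1F,
            shamt=(word >> 6) & 0x1F,
            funct=word & 0x3F,
        )
    else:  # I-type: 16-bit immediate in the low half
        fields.update(format="I", imm16=word & 0xFFFF)
    return fields


add_fields = decode(0x00221820)  # add $3, $1, $2
lw_fields = decode(0x8CE40032)   # lw $4, 50($7)
print(add_fields)
print(lw_fields)
```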
[Figure: Single-cycle datapath. PC, Instruction Memory (Instruction<31:0> supplying Rs, Rt, Rd, Imm16), 32 x 32-bit register file (Rw, Ra, Rb, busA, busB, busW), Extender, ALU, Data Memory, next-PC adders and multiplexors. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; status output: Equal.]
[Figure: Control unit and datapath. Instruction Memory supplies Instruction<31:0>; the fields Op, Fun, Rs, Rt, Rd, Imm16, and Jump_target are extracted. The Control Unit generates nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, and Jump for the DATA PATH, which returns Equal.]
[Table: Main control truth table.
  Columns (by opcode): R-type (op = 00 0000), ori, lw, sw, beq, jump
  Rows (control outputs): RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>
  ALUop (Symbolic): R-type, Or, Add, Add, Subtract, xxx]
[Figure: Worst-case timing (load). After the clock edge the PC changes from old value to new value (Clk-to-Q), followed by the Instruction Memory Access Time; the control signals ALUctr, ExtOp, ALUSrc, MemtoReg, and RegWr change from old value to new value; busA and busB become valid; then Delay through Extender & Mux, ALU Delay (Address), Data Memory Access Time, and busW new value, after which the Register Write Occurs.]
[Figure: Critical paths by instruction class.
  Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
  Load:   PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
  Store:  PC, Inst Memory, Reg File, ALU, Data Mem
  Branch: PC, Inst Memory, Reg File, cmp, mux]
[Figure: Reducing cycle time. A single block of acyclic combinational logic between two storage elements is split into Acyclic Combinational Logic (A) and Acyclic Combinational Logic (B), with an additional storage element inserted between them.]
[Figure: Instruction processing steps: Instruction Fetch, determine address of next instruction, Instruction Decode, Execute, Result Store. Fetch and decode are common steps for all instructions.]
[Figure: Datapath partitioned into stages: Next PC / PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr.]
[Figure: Multicycle datapath partitioned into stages: Next PC / PC, Instruction Fetch (IR), Operand Fetch (Reg File), Ext, ALU, Mem Access (Data Mem, M), Result Store. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr; status output: Equal.]
Registers added:
IR:   Instruction register
A, B: Two registers to hold operands read from register file.
R:    or ALUOut, holds the output of the ALU
M:    or Memory data register (MDR), holds data read from data memory
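R-Type:
  Instruction Fetch:   IR ← Mem[PC]
  Instruction Decode:  A ← R[rs], B ← R[rt]
  Execution:           R ← A + B
  Write Back:          R[rd] ← R, PC ← PC + 4
Logic Immediate:
  Instruction Fetch:   IR ← Mem[PC]
  Instruction Decode:  A ← R[rs]
  Execution:           R ← A OR ZeroExt[imm16]
  Write Back:          R[rt] ← R, PC ← PC + 4
Load:
  Instruction Fetch:   IR ← Mem[PC]
  Instruction Decode:  A ← R[rs]
  Execution:           R ← A + SignEx(Im16)
  Memory:              M ← Mem[R]
  Write Back:          R[rt] ← M, PC ← PC + 4
Store:
  Instruction Fetch:   IR ← Mem[PC]
  Instruction Decode:  A ← R[rs], B ← R[rt]
  Execution:           R ← A + SignEx(Im16)
  Memory:              Mem[R] ← B, PC ← PC + 4
Branch:
  Instruction Fetch:   IR ← Mem[PC]
  Instruction Decode:  A ← R[rs], B ← R[rt]
  Execution:           If Equal = 1 then PC ← PC + 4 + (SignExt(imm16) x 4)
                       else PC ← PC + 4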
[Figure: Finite state machine (FSM) control model. Each Control State (State X) asserts the control points for one register transfer; Next State Logic depends on input, and Output Logic produces the outputs (control points).]
[Figure: Control state diagram. Instruction fetch (IR ← MEM[PC]) and decode (A ← R[rs], B ← R[rt]) are followed by per-instruction states. Execute: R-type R ← A fun B; ORi R ← A or ZX; LW and SW R ← A + SX; branch PC ← PC + SX || 00. Memory: LW M ← MEM[R]; SW MEM[R] ← B, PC ← PC + 4. Write-back: R-type R[rd] ← R, PC ← PC + 4; ORi R[rt] ← R, PC ← PC + 4; LW R[rt] ← M, PC ← PC + 4. Every path returns to instruction fetch.]
[Figure: The same control state diagram with 4-bit state assignments (instruction fetch is state 0000, decode is state 0001, remaining states 0010 through 1100) and the control signals asserted in each state (for example imem_rd, IRen in fetch; Aen, Ben in decode; ALUfun, Sen in R-type execute; RegDst, RegWr, PCen in R-type write-back). Every path returns to instruction fetch state 0000.]
[Table: Control signal values per state. Column groups: Ops (A, B), Exec (Ex, Sr, ALU, S), Mem (R, W, M), Write-Back (M-R, Wr, Dst). ALU operations listed: fun, or, add.]
All instructions:
  Instruction Fetch:   IR ← Mem[PC], PC ← PC + 4
  Instruction Decode:  A ← R[rs], B ← R[rt], ALUout ← PC + (SignExt(imm16) x 4)
R-Type:
  Execution:           ALUout ← A + B
  Write Back:          R[rd] ← ALUout
Logic Immediate:
  Execution:           ALUout ← A OR ZeroExt[imm16]
  Write Back:          R[rt] ← ALUout
Load:
  Execution:           ALUout ← A + SignEx(Im16)
  Memory:              M ← Mem[ALUout]
  Write Back:          R[rt] ← M
Store:
  Execution:           ALUout ← A + SignEx(Im16)
  Memory:              Mem[ALUout] ← B
Branch:
  Execution:           If Equal = 1 then PC ← ALUout
[Figure: Control state diagram for the datapath with ALUout. Instruction fetch (state 0000) and decode (state 0001: A ← R[rs], B ← R[rt], ALUout ← PC + SX) are followed by per-instruction states. Execute: R-type ALUout ← A fun B; ORi ALUout ← A op ZX; LW and SW ALUout ← A + SX; BEQ if A = B then PC ← ALUout. Memory: LW M ← MEM[ALUout]; SW MEM[ALUout] ← B. Write-back: R-type R[rd] ← ALUout; ORi R[rt] ← ALUout; LW R[rt] ← M. Every path returns to instruction fetch.]
Instruction type   Frequency   CPI_i x freq_i
Arith/Logic        40%         1.6
Load               30%         1.5
Store              10%         0.4
branch             20%         0.6
Average CPI:                   4.1
[Figure: Control implementation alternatives.
  Sequencing Control:        Microprogram counter + Dispatch ROMs
  Logic Representation:      Logic Equations, or Truth Tables
  Implementation Technique:  PLA (hardwired control), or ROM (microprogrammed control)]
Microprogrammed Control
Finite state machine control for a full set of instructions is very
complex, and may involve a very large number of states:
Slight microoperation changes require new FSM controller.
Microprogramming: Designing the control as a program that
implements the machine instructions.
A microprogram for a given machine instruction is a symbolic
representation of the control involved in executing the instruction
and comprises a sequence of microinstructions.
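[Figure: Microprogram sequencer. The Microprogram Counter (MicroPC, the State Reg) holds the Microinstruction Address; the microinstruction Fields drive the Multicycle Datapath, and its Sequencing Control Field drives the Address Select Logic, which also takes the Opcode as input and an Adder that increments the address by 1.
Types of branching: set state to 0 (fetch); dispatch i (state 1); use incremented address (seq), for example state number 2.]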
Single-bit control signals (effect when deasserted):
  MemWrite      None
  IorD          Memory address = PC
  IRWrite       None
  PCWrite       None
  PCWriteCond   None
  PCSource      PCSource = ALU

Signal name   Value   Effect
ALUOp         00      ALU adds
              01      ALU subtracts
              10      ALU does function code
              11      ALU does logical OR
ALUSelB       000     2nd ALU input = Reg[rt]
              001     2nd ALU input = 4
              010     2nd ALU input = sign extended IR[15-0]
              011     2nd ALU input = sign extended, shift left 2 IR[15-0]
              100     2nd ALU input = zero extended IR[15-0]
Label    ALU     SRC1   SRC2      Dest.     Memory      PC Write      Sequencing
Fetch:   Add     PC     4                   Read PC     ALU           Seq
         Add     PC     Extshft                                       Dispatch
Rtype:   Func    rs     rt                                            Seq
                                  rd ALU                              Fetch
Ori:     Or      rs     Extend0                                       Seq
                                  rt ALU                              Fetch
Lw:      Add     rs     Extend                                        Seq
                                            Read ALU                  Seq
                                  rt MEM                              Fetch
Sw:      Add     rs     Extend                                        Seq
                                            Write ALU                 Fetch
Beq:     Subt.   rs     rt                              ALUoutCond.   Fetch
[Figure: ALU with 32-bit inputs A and B, 32-bit result S, 4-bit operation select m, and outputs c (carry), zero, and ovf (overflow).]
10 operations, thus 4 control bits:
  00 add     01 addU
  02 sub     03 subU
  04 and     05 or
  06 xor     07 nor
  12 slt     13 sltU
[Figure: 1-bit ALU. An and gate, an or gate, and a 1-bit Full Adder (with CarryIn and CarryOut) feed a Mux; the Operation input selects and, or, or add as the Result.]
[Figure: 32-bit ripple-carry ALU built from 32 1-bit ALUs. Bit i takes Ai, Bi, and CarryIni and produces Resulti and CarryOuti; each CarryOut feeds the next CarryIn, from CarryOut0 / CarryIn1 up to CarryOut30 / CarryIn31, ending with Result31 and CarryOut31.]
Addition/Subtraction Performance:
Assume gate delay = T
Total delay grows linearly with the number of bits, because the carry must ripple through all 32 1-bit ALUs in sequence.
[Figure: 32-bit ALU with set-on-less-than, Zero, and Overflow. 32 1-bit ALUs (inputs A0/B0 through A31/B31) are chained through CarryOut/CarryIn; the Less input is 0 for bits 1 through 31; all Result bits are combined to produce Zero; the most significant bit produces Overflow.]
Carry lookahead:
A   B   C-out
0   0   0       "kill"
0   1   C-in    "propagate"
1   0   C-in    "propagate"
1   1   1       "generate"

G = A and B
P = A xor B

C1 = G0 + C0 P0
C2 = G1 + G0 P1 + C0 P0 P1
C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2

[Figure: Four adder cells, each with inputs A and B and outputs S, G, and P, feeding the carry equations above; the block as a whole produces group G and P.]
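The carry lookahead equations (G = A and B, P = A xor B, C1 = G0 + C0 P0, and so on) can be sketched in software to confirm that they reproduce ordinary addition. The function and helper names below are illustrative only; bit lists are least-significant-bit first.

```python
def lookahead_carries(a_bits, b_bits, c0):
    """Return (carries, sums) using the expanded generate/propagate equations."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate: G = A and B
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate: P = A xor B
    carries = [c0]
    for i in range(len(g)):
        # C(i+1) = G_i + G_(i-1) P_i + ... + C0 P_0 P_1 ... P_i
        carry = c0
        for j in range(i + 1):
            carry &= p[j]
        for k in range(i + 1):
            term = g[k]
            for j in range(k + 1, i + 1):
                term &= p[j]
            carry |= term
        carries.append(carry)
    sums = [p[i] ^ carries[i] for i in range(len(p))]
    return carries, sums


def to_bits(value, width):
    """Little-endian list of bits for a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


carries, sums = lookahead_carries(to_bits(0b1011, 4), to_bits(0b0110, 4), 0)
result = sum(bit << i for i, bit in enumerate(sums)) + (carries[4] << 4)
print(result)  # 11 + 6
```

Each carry here depends only on the G and P signals and C0, not on the previous carry, which is what lets hardware compute all carries in parallel.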
[Figure: Cascaded carry lookahead. A carry lookahead (CLA) unit takes G and P from each 4-bit Adder block (G0, P0, ...) and produces the block carries.]
C1 = G0 + C0 P0
C2 = G1 + G0 P1 + C0 P0 P1
C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2
C4 = . . .
Assuming all gates have equal delay T
( 0 x multiplicand).
( 1 x multiplicand).
[Figure: 4-bit combinational array multiplier. Multiplicand bits A3..A0 are gated by multiplier bits B0..B3 to form rows of partial products, which are summed to give product bits P7..P0.]
[Figure: Sequential multiply hardware with Control and Write signals.]
Multiply Algorithm Version 3
[Flowchart: Start. 1. Test Product0, branching on Product0 = 1 or Product0 = 0. Check "32nd repetition?"; Yes: 32 repetitions, Done.]
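A sketch of a shift-and-add multiplier that tests Product0 on each of 32 repetitions is shown below. It assumes the usual textbook arrangement for this kind of algorithm: the multiplier starts in the right half of a combined product register, the multiplicand is added to the left half when Product0 = 1, and the register is shifted right once per repetition. The flowchart's step text is not preserved in this document, so the details are an assumption rather than a transcription.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Unsigned n-bit multiply using a combined 2n-bit product/multiplier register."""
    mask = (1 << n) - 1
    product = multiplier & mask  # right half = multiplier, left half = 0
    for _ in range(n):
        if product & 1:  # test Product0
            # Add the multiplicand into the left half; any carry-out lands in
            # bit 2n and is recovered by the shift below.
            product += (multiplicand & mask) << n
        product >>= 1  # shift the product register right by 1 bit
    return product


print(multiply_v3(3, 5, n=4))
print(multiply_v3(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF * 0xFFFFFFFF)
```

Sharing one register between the product and the multiplier means no separate multiplier register is needed, since each consumed multiplier bit frees space for a product bit.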
[Figure: Combinational shifter. Data inputs D and A6..A0 (with 0 shifted in), shift amount S2 S1 S0, outputs R7..R0.]
Division
                  1001      Quotient
Divisor 1000 ) 1001010      Dividend
              -1000
                  10
                  101
                  1010
                 -1000
                    10      Remainder
See how big a number can be subtracted, creating quotient bit on each step:
Binary => 1 * divisor or 0 * divisor
[Figure: Divide hardware. A 64-bit Remainder (Quotient) register with Shift Left and Write controls, HI and LO halves, and a Control unit.]
Divide Algorithm Version 3
[Flowchart: Test Remainder, branching on Remainder < 0; check "nth repetition?".]
Single precision:
$$\text{Value} = N = (-1)^S \times 2^{E-127} \times (1.M)$$
0 < E < 255
Actual exponent is: e = E - 127
Fields: sign S (1 bit) | E (8 bits) | M (23 bits)
exponent: excess 127 binary integer added
mantissa: sign + magnitude, normalized binary significand with a hidden integer bit: 1.M
Example:
   0   = 0 00000000 0 . . . 0
  -1.5 = 1 01111111 10 . . . 0
Range of magnitudes: $2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$, that is, about $1.18 \times 10^{-38}$ to $3.40 \times 10^{38}$
Double precision:
$$\text{Value} = N = (-1)^S \times 2^{E-1023} \times (1.M)$$
Fields: sign S (1 bit) | E (11 bits) | M (52 bits)
exponent: excess 1023 binary integer added
Mantissa: sign + magnitude, normalized binary significand with a hidden integer bit: 1.M
Example:
   0   = 0 00000000000 0 . . . 0
  -1.5 = 1 01111111111 10 . . . 0
Range of magnitudes: $2^{-1022} \times (1.0)$ to $2^{1023} \times (2 - 2^{-52})$, that is, about $2.23 \times 10^{-308}$ to $1.8 \times 10^{308}$
Example bit pattern:
  S (1 bit) | E = 10001010 (8 bits) | M = 00100101001001000000000 (23 bits)
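A minimal sketch of decoding a single-precision pattern with Value = (-1)^S x 2^(E-127) x (1.M) follows. The exponent and mantissa fields are the ones shown in the example; the sign bit is not preserved in this text, so S = 0 is an assumption, and the function name is illustrative. The standard-library `struct` module is used only as an independent cross-check.

```python
import struct


def decode_single(sign, exponent_bits, mantissa_bits):
    """Evaluate a normalized IEEE 754 single-precision value from its fields."""
    e = int(exponent_bits, 2)
    m = int(mantissa_bits, 2)
    if not 0 < e < 255:
        raise ValueError("only normalized numbers (0 < E < 255) are handled")
    return (-1) ** sign * 2.0 ** (e - 127) * (1 + m / 2 ** 23)


E_FIELD = "10001010"
M_FIELD = "00100101001001000000000"
SIGN = 0  # ASSUMED: the sign bit is not shown in the source example

value = decode_single(SIGN, E_FIELD, M_FIELD)

# Cross-check by packing the same 32 bits and letting struct interpret them.
raw = (SIGN << 31) | (int(E_FIELD, 2) << 23) | int(M_FIELD, 2)
check = struct.unpack(">f", raw.to_bytes(4, "big"))[0]

print(value, check)
```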
Floating Point Addition Flowchart
[Flowchart: Start. (1) Compare the exponents of the two numbers; shift the smaller number to the right until its exponent matches the larger exponent. Steps (2), (3), and (4) follow. Check "Overflow or Underflow?": Yes, generate exception or return error; No, continue. Check "Still normalized?": No, repeat normalization; Yes, (5) Done.]
Floating Point Multiplication Flowchart
[Flowchart: Start. (1) Is one/both operands = 0? (2) Compute exponent: biased exp.(X) + biased exp.(Y) - bias. Steps (3), (4), and (5) follow. Check "Overflow or Underflow?": Yes, generate exception or return error; No, continue. (6) Round or truncate the result mantissa. Check "Still Normalized?": No, repeat normalization; Yes, (7) Done.]
[Figure: Pre-normalization and post-normalization examples for significand addition and multiplication, showing digits shifted out to the right of the significand.]
Guard Digits: digits to the right of the first p digits of the significand, kept to guard
against loss of digits; they can later be shifted left into the first p places during
normalization.
Addition: carry-out shifted in
Subtraction: borrow digit and guard
Multiplication: carry and guard; Division requires guard
Rounding Digits
Normalized result, but some non-zero digits to the right of the
significand --> the number should be rounded
E.g., B = 10, p = 3:
    0 2 1.69 =  1.6900 * 10^(2-bias)
  - 0 0 7.85 = - .0785 * 10^(2-bias)
  = 0 2 1.61 =  1.6115 * 10^(2-bias)
One round digit must be carried to the right of the guard digit so that
after a normalizing left shift, the result can be rounded, according
to the value of the round digit.
IEEE Standard:
four rounding modes: round to nearest (default)
round towards plus infinity
round towards minus infinity
round towards 0
Round to nearest:
  round digit < B/2: truncate
  round digit > B/2: round up (add 1 to ULP: unit in last place)
  round digit = B/2: round to nearest even digit
It can be shown that this strategy minimizes the mean error
introduced by rounding.
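Round to nearest with ties going to the even digit can be illustrated in base 10 with Python's standard `decimal` module, whose `ROUND_HALF_EVEN` mode implements this rule. The helper name and the extra sample values are assumptions for illustration; 1.6115 is the result from the B = 10, p = 3 example.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_significand(text, p=3):
    """Round a decimal significand d.ddd... to p digits, ties to even."""
    quantum = Decimal(1).scaleb(-(p - 1))  # 10^-(p-1), e.g. 0.01 for p = 3
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_significand("1.6115"))  # round digit 1 < B/2: truncate
print(round_significand("1.6251"))  # digits beyond the tie point: round up
print(round_significand("1.625"))   # exact tie: go to the even digit
print(round_significand("1.635"))   # exact tie: go to the even digit
```

The difference between the second and third calls is the case a sticky bit distinguishes: any non-zero digit past the round digit turns an apparent tie into a round-up.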
Sticky Bit
Additional bit to the right of the round digit to better fine tune rounding
[Figure: Subtracting 0 . 0 0 X ... X XX S from d0 . d1 d2 d3 . . . dp-1 0 0 0, where S is the sticky bit following the guard and round digits XX. Shown for S = 0 and for S = 1; with S = 1 the subtraction generates a borrow.]
Rounding Summary:
Radix 2 minimizes wobble in precision.
Normal operations in +,-,*,/ require one carry/borrow bit + one guard digit.
One round digit needed for correct rounding.
Sticky bit needed when round digit is B/2 for max accuracy.
Rounding to nearest has mean error = 0 if a uniform distribution of digits
is assumed.
Clock cycle        1     2     3     4     5     6     7     8     9
Instruction I      IF    ID    EX    MEM   WB
Instruction I+1          IF    ID    EX    MEM   WB
Instruction I+2                IF    ID    EX    MEM   WB
Instruction I+3                      IF    ID    EX    MEM   WB
Instruction I+4                            IF    ID    EX    MEM   WB

First instruction, I, completed in cycle 5.
Last instruction, I+4, completed in cycle 9.

Pipeline Stages:
IF  = Instruction Fetch
ID  = Instruction Decode
EX  = Execution
MEM = Memory Access
WB  = Write Back
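The timing of an ideal five-stage pipeline (no stalls) can be sketched as follows; with k stages and n instructions the last instruction completes in cycle k + n - 1. The function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(n, k=len(STAGES)):
    """Cycles needed for n instructions on an ideal k-stage pipeline."""
    return k + n - 1


def pipeline_schedule(n):
    """Return one row per instruction; row[c] is the stage active in cycle c + 1."""
    width = total_cycles(n)
    rows = []
    for i in range(n):
        row = [""] * width
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage  # instruction i enters IF in cycle i + 1
        rows.append(row)
    return rows


schedule = pipeline_schedule(5)
for i, row in enumerate(schedule):
    print(f"I+{i}: " + " ".join(f"{stage:>3}" for stage in row))
print("Total cycles:", total_cycles(5))
```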
Multicycle Machine:
2 ns/cycle x 4.6 CPI (due to inst mix) x 1000 inst = 9200 ns
[Figure: Single cycle, multiple cycle, and pipelined execution of Load, Store, and R-type.
  Single Cycle Implementation: 8 ns clock; Load fills the cycle, Store leaves part of it as Waste.
  Multiple Cycle Implementation: 2 ns clock, Cycle 1 to Cycle 10; Load uses IF, ID, EX, MEM, WB; Store uses IF, ID, EX, MEM; R-type begins with IF.
  Pipeline Implementation: Load, Store, and R-type overlap, each starting IF one cycle after the previous instruction.]
[Figure: Pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separating the five stages: IF (Instruction Fetch: PC, Add 4, Instruction memory), ID (Instruction Decode: Registers, 16-to-32 Sign extend), EX (Execution: Shift left 2, Add, ALU with Zero and ALU result, Mux), MEM (Memory: Data memory with Address, Write data, Read data), WB (Write Back: Mux).]
Pipeline Control
Pass needed control signals along from one stage to the next as the
instruction travels through the pipeline just like the data
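Write-back stage control lines:
Instruction   Reg write   Mem to Reg
R-format      1           0
lw            1           1
sw            0           X
beq           0           X

[Figure: Control is generated from the instruction in IF/ID; the EX and WB control fields are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers until the stage that uses them.]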
Pipeline Hazards
Hazards are situations in pipelining which prevent the next
instruction in the instruction stream from executing during
the designated clock cycle resulting in one or more stall cycles.
Hazards reduce the ideal speedup gained from pipelining and
are classified into three classes: structural hazards, data hazards, and control hazards.
[Figure: Pipeline diagram in instruction order: Load followed by Instr 1 through Instr 4, each passing through Mem, Reg, ALU, Mem, Reg.]
Detection is easy in this case (right half highlight means read, left half write)
Program execution order (in instructions):
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
[Figure: Pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) over clock cycles CC 1 to CC 9. Value of register $2 per cycle: 10, 10, 10, 10, 10/ 20, 20, 20, 20, 20.]
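Read-after-write dependences in the sub/and/or/add/sw sequence can be found with a small sketch. It assumes no forwarding and a register file that writes in the first half of a cycle and reads in the second half, so only consumers within two instructions of the producer see a stale value. The tuple representation and function name are illustrative.

```python
# (opcode, destination register or None, source registers)
program = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]


def raw_hazards(instructions, window=2):
    """Return (producer, consumer) index pairs that would read a stale register.

    window=2 ASSUMES no forwarding and a split-cycle register file
    (write in the first half of the cycle, read in the second half).
    """
    hazards = []
    for i, (_, dest, _) in enumerate(instructions):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(instructions))):
            if dest in instructions[j][2]:
                hazards.append((i, j))
    return hazards


found = raw_hazards(program)
for producer, consumer in found:
    print(f"{program[consumer][0]} depends on {program[producer][0]}")
```

Under these assumptions `add` and `sw` read $2 late enough to see the new value, so only `and` and `or` are flagged.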
[Figure: Resolving the data hazard on $2 by stalling. Pipeline diagram over CC 1 to CC 11 for the sequence including or $13, $6, $2 and sw $15, 100($2); STALL cycles are inserted until the new value of $2 is available. Value of register $2 per cycle: 10, 10, 10, 10, 10/ 20, then 20 for the remaining cycles.]
[Figure: Forwarding example. Pipeline diagram over CC 1 to CC 9 for the program execution order sub $2, $1, $3; or $13, $6, $2; sw $15, 100($2) (among others), showing per cycle the value of register $2 (10, then 10/ 20, then 20) together with the values held in the pipeline registers (X where not relevant, 20 once the result is available).]
[Figure: Pipeline diagram over CC 1 to CC 9 (IM, Reg, DM, Reg per instruction) for a sequence that includes or $8, $2, $6.]
[Figure: Pipeline diagram over CC 1 to CC 9 (IM, Reg, DM, Reg per instruction) for a sequence that includes the instruction at address 48, or $13, $6, $2, and the instruction at address 72, lw $4, 50($7).]
Next PC of a branch known in MEM stage: Costs three lost cycles if taken.
If next PC is known in EX stage, one cycle is saved.
Branch address calculation can be moved to ID stage using a register comparator, costing
only one cycle if branch is taken.
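[Figure: Pipelined datapath with branch handling moved to ID. Shows IF.Flush, the Hazard detection unit, Control, the Forwarding unit, PC, Instruction memory, Registers, Sign extend, Shift left 2, ALU, Data memory, multiplexors, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers carrying the EX and WB control fields.]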
Instruction type   Frequency
Arith/Logic        40%
Load               30%   of which 25% are followed immediately by an instruction using the loaded value
Store              10%
branch             20%   of which 45% are taken
CPI = 1 + stall clock cycles per instruction
    = 1 + .3 x .25 x 1 + .2 x .45 x 1
    = 1 + .075 + .09
    = 1.165
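The pipelined CPI of 1.165 can be reproduced with a short sketch. It assumes a one-cycle stall for each load followed immediately by a use of the loaded value and a one-cycle penalty for each taken branch, which is what the stated result implies; the function and parameter names are illustrative.

```python
def pipelined_cpi(load_freq, load_use_fraction, branch_freq, taken_fraction,
                  load_stall=1, branch_stall=1):
    """Ideal CPI of 1 plus average stall cycles per instruction."""
    load_stalls = load_freq * load_use_fraction * load_stall
    branch_stalls = branch_freq * taken_fraction * branch_stall
    return 1 + load_stalls + branch_stalls


# Frequencies from the example; the 1-cycle penalties are ASSUMED.
cpi = pipelined_cpi(load_freq=0.30, load_use_fraction=0.25,
                    branch_freq=0.20, taken_fraction=0.45)
print(f"CPI = {cpi:.3f}")
```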
[Figure: Processor-Memory Performance Gap, 1980 to 2000, performance on a log scale from 1 to 1000. CPU (Proc) performance improves 60%/yr.; DRAM improves 7%/yr.; the gap grows 50% / year.]
Year    CPU speed (MHz)   CPU cycle (ns)   Memory access (ns)   Memory access / CPU cycle
1986:   8                 125              190                  190/125  = 1.5
1988:   33                30               175                  175/30   = 5.8
1991:   75                13.3             155                  155/13.3 = 11.65
1994:   200               5                130                  130/5    = 26
1996:   300               3.33             110                  110/3.33 = 33
Registers
Cache
Main Memory
Magnetic Disk
Optical Disk or Magnetic Tape
Address fields: Block Address (Tag | Index) followed by Block Offset
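[Figure: 4-way set associative cache with 1024 block frames organized as 256 sets (index 0 to 255). The 32-bit address is split into a 22-bit Tag Field (bits 31-10), an 8-bit Index Field (bits 9-2), and a block offset (bits 1-0). Each set holds four Tag/Data (V, Tag, Data) entries; the four tag comparisons produce Hit and drive a 4-to-1 multiplexor that selects the 32-bit Data.]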
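Splitting an address into tag, index, and block offset can be sketched as below. The default field widths (22-bit tag, 8-bit index for 256 sets, 2 offset bits in a 32-bit address) follow the 4-way set associative figure; the function name is illustrative.

```python
def split_address(addr, index_bits=8, offset_bits=2, addr_bits=32):
    """Return (tag, index, offset) for a cache with the given field widths."""
    addr &= (1 << addr_bits) - 1
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


tag, index, offset = split_address(0x00001234)
print(f"tag={tag} index={index} offset={offset}")

# Number of sets = block frames / associativity = 1024 / 4
print("sets:", 1024 // 4, "index bits:", (1024 // 4).bit_length() - 1)
```

The index selects one of the 256 sets; the tag is then compared against all four entries in that set.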
[Figures: Cache organization examples with a valid bit V per block frame, mapping Main Memory blocks to the cache:
  Block Address = 12 bits, Tag = 12 bits, Block offset = 4 bits
  Index = 7 bits, Block offset = 4 bits
  Index = 6 bits, Block offset = 4 bits]
Miss rates by associativity and replacement policy:
2-way              4-way              8-way
LRU      Random    LRU      Random    LRU      Random
5.18%    5.69%     4.67%    5.29%     4.39%    4.96%
1.88%    2.01%     1.54%    1.66%     1.39%    1.53%
1.15%    1.17%     1.13%    1.13%     1.12%    1.12%
CPUs with a higher clock rate spend more cycles per cache miss, so memory has a
larger impact on CPI.
3 Levels of Cache
[Figure: CPU, L1 Cache, L2 Cache, L3 Cache, Main Memory (memory access penalty, M).]
Find CPI.
With single L1, CPI = 1.1 + 1.3 x .05 x 100 = 7.6
CPI = 1.1 + Mem stall cycles per instruction
Mem stall cycles per instruction = Mem accesses per instruction x Stall cycles per access
Stall cycles per memory access
  = [1 - H1] x [ H2 x T2 + (1 - H2) x (H3 x (T2 + T3) + (1 - H3) x M) ]
  = [.05] x [ .97 x 2 + (.03) x ( .985 x (2 + 5) + .015 x 100) ]
  = .05 x [ 1.94 + .03 x ( 6.895 + 1.5) ]
  = .05 x [ 1.94 + .252 ] = .11
CPI = 1.1 + 1.3 x .11 = 1.24
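The three-level cache calculation can be checked numerically with the sketch below. The parameter values (H1 = .95, H2 = .97, H3 = .985, T2 = 2, T3 = 5, M = 100, base CPI 1.1, 1.3 memory accesses per instruction) are read off the worked example; the function name is illustrative.

```python
def stall_cycles_per_access(h1, h2, h3, t2, t3, m):
    """Average stall cycles per memory access for a three-level cache.

    h1..h3 are local hit rates, t2 and t3 are the extra cycles to reach
    L2 and L3, and m is the main memory access penalty in cycles.
    """
    l3_and_beyond = h3 * (t2 + t3) + (1 - h3) * m
    return (1 - h1) * (h2 * t2 + (1 - h2) * l3_and_beyond)


BASE_CPI = 1.1
ACCESSES_PER_INSTRUCTION = 1.3

stalls = stall_cycles_per_access(h1=0.95, h2=0.97, h3=0.985, t2=2, t3=5, m=100)
cpi_three_levels = BASE_CPI + ACCESSES_PER_INSTRUCTION * stalls
cpi_single_l1 = BASE_CPI + ACCESSES_PER_INSTRUCTION * (1 - 0.95) * 100

print(f"Stall cycles per access = {stalls:.4f}")
print(f"CPI with three levels   = {cpi_three_levels:.2f}")
print(f"CPI with single L1      = {cpi_single_l1:.1f}")
```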