
CS M151B / EE M116C

Computer Systems Architecture

Multicycle

Instructor: Prof. Lei He


<LHE@ee.ucla.edu>

Some notes adopted from Glenn Reinman

Single Cycle Implementation

[Figure: single-cycle datapath and control. The PC addresses the instruction memory; the register file, sign-extend unit, ALU, and data memory are connected through multiplexers; two separate adders compute the next sequential PC and the branch target (sign-extended offset shifted left 2). The control unit decodes Instruction[31-26] into RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc selects the next PC.]

Why Go Multicycle?

Single-cycle designs need a cycle time that can fit the instruction with the longest delay
    Solution: break execution into smaller tasks
        each task takes a cycle
        different instructions require different numbers of cycles

Need fewer logic blocks
    One ALU versus an ALU plus 2 adders?
    One unified (instruction + data) memory port?

CPI will increase, but cycle time should drop

Steps in the Multicycle Implementation

Five execution steps (some instructions use fewer)
    IF: Instruction Fetch
    ID: Instruction Decode (plus register fetch and the PC + immediate add)
    EX: Execute
    Mem: Memory access
    WB: Write-Back into registers

Single Cycle Datapath Partitioning

[Figure: the single-cycle datapath partitioned left to right into five stages: IF, ID, EX, Mem, WB.]

Goal is to balance work done in each cycle - minimize cycle time!

We Need More State Elements

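[Figure: multicycle datapath overview. The PC supplies the address to a single memory that holds both instructions and data. New state elements: the instruction register, the memory data register, the A and B registers at the register-file outputs, and ALUOut at the ALU output.]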

Extra registers are needed when
    a signal is computed in one clock cycle and used in another, and
    the inputs to the functional block that produces this signal can
    change before the signal is written into a state element

Multicycle Datapath

[Figure: multicycle datapath. The IorD mux selects the PC or ALUOut as the memory address; MemData is captured in the instruction register (IRWrite) and the memory data register; the RegDst mux selects Instruction[20-16] or Instruction[15-11] as the write register; the MemtoReg mux selects ALUOut or the memory data register as the write data; the ALUSrcA mux selects the PC or A; the ALUSrcB mux selects B, the constant 4, the sign-extended Instruction[15-0], or that value shifted left 2; ALUOut latches the ALU result. Control signals shown: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, MemtoReg.]

Step 1: Fetch

Fetch the instruction from memory
Compute the address of the next sequential instruction
    IR = Mem[PC]
    PC = PC + 4
    (NOTE: the PC may change later)
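
The two register transfers of the fetch step can be sketched in Python. This is only an illustration: the dictionary-based `memory` (keyed by byte address, one word per entry), the `state` dictionary, and the example address are assumptions made for the sketch, not part of the slides.

```python
def fetch(state, memory):
    """Step 1: read the instruction and compute the next sequential PC."""
    state["IR"] = memory[state["PC"]]              # IR = Mem[PC]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF   # PC = PC + 4 (32-bit wrap)
    return state

# Hypothetical memory image: one word, the encoding of lw $t2, 0($t3).
memory = {0x00400000: 0x8D6A0000}
state = fetch({"PC": 0x00400000, "IR": 0}, memory)
```

In hardware both transfers happen in the same clock cycle, which is why the one ALU is busy computing PC + 4 while the memory is read.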

Step 2: Decode and Register Fetch

Without even knowing what instruction we have
    read registers indicated by rs and rt fields
    compute branch target address
        A = Reg[IR[25-21]]
        B = Reg[IR[20-16]]
        ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Up to this point, everything has been instruction independent
After this, everything is instruction dependent
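
A minimal Python sketch of the decode step follows. The helper names `sign_extend16` and `decode`, the list-based register file, and the example register values are assumptions for illustration; only the three register transfers come from the slide.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(ir, pc, regs):
    """Step 2: read rs and rt and speculatively compute the branch target."""
    a = regs[(ir >> 21) & 0x1F]                           # A = Reg[IR[25-21]]
    b = regs[(ir >> 16) & 0x1F]                           # B = Reg[IR[20-16]]
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF  # ALUOut = PC + (offset << 2)
    return a, b, alu_out

regs = [0] * 32
regs[10], regs[11] = 7, 9          # $t2 = 7, $t3 = 9 (made-up values)
# beq $t2, $t3 with a 16-bit offset of -2; pc is the already-incremented PC.
a, b, alu_out = decode(0x114BFFFE, 0x0040000C, regs)
```

The target is computed even if the instruction turns out not to be a branch; the ALU would otherwise sit idle in this cycle, so the work is free.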

Step 3: Execution

Load or Store
    ALUOut = A + sign-extend(IR[15-0])

R-type
    ALUOut = A op B

Branch
    if (A == B) PC = ALUOut
    Branch target was stored in ALUOut in the prior step
    Branches are complete at this step
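
The three cases of the execution step can be sketched as one Python function. The `kind` strings, the function name, and passing the R-type operation as a callable are assumptions made for the sketch.

```python
import operator

def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def execute(kind, a, b, ir, pc, alu_out, op=None):
    """Step 3: return the (pc, alu_out) pair after the execution step."""
    if kind == "mem":                       # load or store: effective address
        alu_out = (a + sign_extend16(ir)) & 0xFFFFFFFF
    elif kind == "rtype":                   # ALUOut = A op B
        alu_out = op(a, b) & 0xFFFFFFFF
    elif kind == "beq":                     # target is already in ALUOut
        if a == b:
            pc = alu_out
    return pc, alu_out

r_result = execute("rtype", 7, 9, 0, 0x00400004, 0, operator.add)
# sw $t5, 8($t3) with A = 0x10010000 as the base address
m_result = execute("mem", 0x10010000, 0, 0xAD6D0008, 0x00400004, 0)
taken = execute("beq", 5, 5, 0, 0x00400010, 0x00400004)
not_taken = execute("beq", 5, 6, 0, 0x00400010, 0x00400004)
```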

Step 4: Memory Access or R-type Completion

Memory Reference
    load
        MDR = Mem[ALUOut]
    store
        Mem[ALUOut] = B
    NOTE: ALUOut holds the effective address calculated in the prior step

R-type (Writeback)
    Reg[IR[15-11]] = ALUOut
    R-type instructions are complete at this step

Step 5: Memory Writeback

Load
    Reg[IR[20-16]] = MDR
    Loads are complete at this step (stores completed in step 4)
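
Steps 4 and 5 can be sketched the same way. The function names, the `kind` strings, and the dictionary-based memory are assumptions for illustration; the sketch also does not model the fact that writes to register 0 are discarded in MIPS.

```python
def memory_or_completion(kind, regs, memory, ir, alu_out, b, mdr):
    """Step 4: memory access for lw/sw, register write for R-type. Returns MDR."""
    if kind == "lw":
        mdr = memory[alu_out]                  # MDR = Mem[ALUOut]
    elif kind == "sw":
        memory[alu_out] = b                    # Mem[ALUOut] = B
    elif kind == "rtype":
        regs[(ir >> 11) & 0x1F] = alu_out      # Reg[IR[15-11]] = ALUOut
    return mdr

def load_writeback(regs, ir, mdr):
    """Step 5 (loads only): Reg[IR[20-16]] = MDR."""
    regs[(ir >> 16) & 0x1F] = mdr

regs = [0] * 32
memory = {0x10010000: 42}                      # made-up data word
lw_ir = 0x8D6A0000                             # lw $t2, 0($t3)
mdr = memory_or_completion("lw", regs, memory, lw_ir, 0x10010000, 0, 0)
load_writeback(regs, lw_ir, mdr)
memory_or_completion("sw", regs, memory, 0, 0x10010008, 99, mdr)
```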

Step Summary

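Step                  | R-type                  | Memory                              | Branch
----------------------+-------------------------+-------------------------------------+------------------------------
Instruction Fetch     | IR = Mem[PC]; PC = PC + 4                        (all instruction types)
Instruction Decode /  | A = Reg[IR[25-21]]; B = Reg[IR[20-16]];
Register Fetch        | ALUOut = PC + (sign-extend(IR[15-0]) << 2)       (all instruction types)
Execution             | ALUOut = A op B         | ALUOut = A + sign-extend(IR[15-0])  | if (A == B) then PC = ALUOut
Memory Access or      | Reg[IR[15-11]] = ALUOut | load:  MDR = Memory[ALUOut]         |
R-type Completion     |                         | store: Memory[ALUOut] = B           |
Writeback             |                         | load:  Reg[IR[20-16]] = MDR         |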

Multicycle Control and Datapath

[Figure: complete multicycle datapath with its control unit. The control unit takes the opcode, Instruction[31-26], and drives PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst. The PCSource mux selects the ALU result, ALUOut, or the jump address as the next PC.]

Also includes logic for the JUMP instruction: Instruction[25-0] is shifted left 2 to give 28 bits, which are concatenated with PC[31-28] to form the jump address [31-0].

Review: Finite State Machines

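Finite state machines:
    a set of states
    next state function
        determined by current state and the input
    output function
        determined by current state and possibly input

[Figure: the current state and the inputs feed the next-state function; its result is clocked into the state register as the next state. The output function takes the current state (and possibly the inputs) and produces the outputs.]

We'll use a Moore machine
    output based only on current state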
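
A Moore machine can be sketched in a few lines of Python. The class name and the parity example are hypothetical and chosen only to show the structure: the output is looked up from the current state alone, and the input only influences the next state.

```python
class MooreMachine:
    def __init__(self, start, next_state, output):
        self.state = start
        self.next_state = next_state   # function: (state, input) -> state
        self.output = output           # dict: state -> output

    def step(self, inp):
        out = self.output[self.state]               # depends only on current state
        self.state = self.next_state(self.state, inp)
        return out

def parity_next(state, bit):
    """Toggle between 'even' and 'odd' whenever a 1 arrives."""
    if bit == 0:
        return state
    return "odd" if state == "even" else "even"

machine = MooreMachine("even", parity_next, {"even": 0, "odd": 1})
outputs = [machine.step(bit) for bit in [1, 0, 1, 1]]
```

In a Mealy machine the `output` lookup would take the input as well; the multicycle control described here does not need that.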

Step 1: Fetch

[Figure: multicycle datapath and control, repeated for this step.]

IR = Memory[PC]
PC = PC + 4

MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00

Control FSM

Instruction fetch
    MemRead
    ALUSrcA = 0
    IorD = 0
    IRWrite
    ALUSrcB = 01
    ALUOp = 00
    PCWrite
    PCSource = 00

Step 2: Decode and Register Fetch

[Figure: multicycle datapath and control, repeated for this step.]

A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUOut = PC + (sign-extend(IR[15-0]) << 2)

ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00

Control FSM

Start -> state 0: Instruction fetch
    MemRead
    ALUSrcA = 0
    IorD = 0
    IRWrite
    ALUSrcB = 01
    ALUOp = 00
    PCWrite
    PCSource = 00

state 1: Instruction decode / register fetch
    ALUSrcA = 0
    ALUSrcB = 11
    ALUOp = 00

From state 1, the opcode selects one of four sub-machines:
    Memory reference FSM (Figure 5.38)
    R-type FSM (Figure 5.39)
    Branch FSM (Figure 5.40)
    Jump FSM (Figure 5.41), taken when (Op = 'JMP')

Step 3: R-Type Execution

[Figure: multicycle datapath and control, repeated for this step.]

ALUOut = A op B

ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10

Step 4: R-Type Writeback

[Figure: multicycle datapath and control, repeated for this step.]

Reg[IR[15-11]] = ALUOut

RegDst = 1, RegWrite, MemtoReg = 0

R-type FSM

From state 1, (Op = R-type)

state 6: Execution
    ALUSrcA = 1
    ALUSrcB = 00
    ALUOp = 10

state 7: R-type completion
    RegDst = 1
    RegWrite
    MemtoReg = 0

To state 0 (Figure 5.37)

Step 3: Load/Store Execution

[Figure: multicycle datapath and control, repeated for this step.]

ALUOut = A + sign-extend(IR[15-0])

ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

Step 4: Memory Access

[Figure: multicycle datapath and control, repeated for this step.]

Load
    MDR = Memory[ALUOut]
    MemRead, IorD = 1

Store
    Memory[ALUOut] = B
    MemWrite, IorD = 1

Step 5: Load Writeback

[Figure: multicycle datapath and control, repeated for this step.]

Reg[IR[20-16]] = MDR

RegWrite, MemtoReg = 1, RegDst = 0

Memory FSM

From state 1, (Op = 'LW') or (Op = 'SW')

state 2: Memory address computation
    ALUSrcA = 1
    ALUSrcB = 10
    ALUOp = 00

state 3: Memory access, reached when (Op = 'LW')
    MemRead
    IorD = 1

state 5: Memory access, reached when (Op = 'SW')
    MemWrite
    IorD = 1

state 4: Write-back step, reached from state 3
    RegWrite
    MemtoReg = 1
    RegDst = 0

States 4 and 5 return to state 0 (Figure 5.37)

Step 3: Branch Completion

[Figure: multicycle datapath and control, repeated for this step.]

if (A == B) PC = ALUOut

ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01

Step 3: Jump Completion

[Figure: multicycle datapath and control, repeated for this step.]

PC = PC[31-28] || (IR[25-0] << 2)

PCWrite, PCSource = 10
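
The jump-address concatenation can be checked with a small Python sketch. The function name and the example addresses are assumptions for illustration; `pc` stands for the PC value at the time of the jump step, that is, the already-incremented PC.

```python
def jump_target(pc, ir):
    """PC = PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000                 # keep PC[31-28]
    lower = (ir & 0x03FFFFFF) << 2          # 26-bit field becomes a 28-bit byte address
    return upper | lower

# j 0x00400020 encodes the word address 0x00400020 >> 2 in IR[25-0].
target = jump_target(0x00400004, 0x08100008)
# The top four bits always come from the PC, not from the instruction.
high_region = jump_target(0x90000004, 0x08000001)
```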

Control FSMs

From state 1, (Op = 'BEQ')

state 8: Branch completion
    ALUSrcA = 1
    ALUSrcB = 00
    ALUOp = 01
    PCWriteCond
    PCSource = 01

To state 0 (Figure 5.37)

From state 1, (Op = 'J')

state 9: Jump completion
    PCWrite
    PCSource = 10

To state 0 (Figure 5.37)

Final FSM

state 0: Instruction fetch (Start)
    MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
    next: state 1

state 1: Instruction decode / register fetch
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    next: state 2 if (Op = 'LW') or (Op = 'SW'); state 6 if (Op = R-type); state 8 if (Op = 'BEQ'); state 9 if (Op = 'J')

state 2: Memory address computation
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    next: state 3 if (Op = 'LW'); state 5 if (Op = 'SW')

state 3: Memory access (load)
    MemRead, IorD = 1
    next: state 4

state 4: Write-back step
    RegDst = 0, RegWrite, MemtoReg = 1
    next: state 0

state 5: Memory access (store)
    MemWrite, IorD = 1
    next: state 0

state 6: Execution
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
    next: state 7

state 7: R-type completion
    RegDst = 1, RegWrite, MemtoReg = 0
    next: state 0

state 8: Branch completion
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
    next: state 0

state 9: Jump completion
    PCWrite, PCSource = 10
    next: state 0
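
The complete control FSM is small enough to write down as a table. The following Python sketch is one possible encoding, not the slides' implementation: the signal values are the ones listed per state in these slides, states are numbered 0 to 9 (3 and 4 for the load's memory access and write-back are assumed from the next-state table later in the deck), and the opcode names `"lw"`, `"sw"`, `"rtype"`, `"beq"`, `"j"` are labels chosen for the sketch.

```python
# Signals asserted in each state; anything not listed is deasserted or don't-care.
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

DISPATCH_FROM_DECODE = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "j": 9}

def next_state(state, op):
    if state == 0:
        return 1
    if state == 1:
        return DISPATCH_FROM_DECODE[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0            # states 4, 5, 7, 8, 9 go back to fetch

def states_for(op):
    """List the states one instruction passes through, starting at fetch."""
    path, state = [0], 0
    while True:
        state = next_state(state, op)
        if state == 0:
            return path
        path.append(state)

paths = {op: states_for(op) for op in DISPATCH_FROM_DECODE}
```

The length of each path is the cycle count of that instruction class, which is where the per-class cycle counts used for CPI come from.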

Questions You Should Be Able to Answer

Given a code fragment, how many cycles will execution of the fragment take?

    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label    # assume not taken
    add $t5, $t2, $t3
    sw  $t5, 8($t3)

Given a code fragment, what is happening in the 5th cycle of execution?
In what cycle does the branch comparison in the above code occur?
Given an instruction mix, find the CPI
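
One way to approach the first three questions is to lay the instructions end to end, since a multicycle (non-pipelined) machine finishes one instruction before fetching the next. The Python sketch below assumes the per-class cycle counts given on the CPI slide (loads 5, stores 4, ALU 4, branches 3); the function names are made up for illustration.

```python
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}
fragment = ["lw", "lw", "beq", "add", "sw"]

def schedule(program):
    """Return (mnemonic, first_cycle, last_cycle) per instruction, 1-indexed."""
    rows, cycle = [], 1
    for op in program:
        rows.append((op, cycle, cycle + CYCLES[op] - 1))
        cycle += CYCLES[op]
    return rows

def step_in_cycle(rows, n):
    """Return (mnemonic, step number within that instruction) for cycle n."""
    for op, first, last in rows:
        if first <= n <= last:
            return op, n - first + 1
    raise ValueError("cycle is past the end of the fragment")

rows = schedule(fragment)
total_cycles = rows[-1][2]
fifth = step_in_cycle(rows, 5)
beq_row = rows[2]
```

Under these assumptions the fragment takes 21 cycles, cycle 5 is the fifth step (write-back) of the first lw, and the beq occupies cycles 11 to 13, so its comparison, which happens in its third step, falls in cycle 13.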

CPI in Multicycle CPU

Instruction Mix
    25% loads, 10% stores, 18% branches, 2% jumps, 45% ALU

Clock cycles per instruction
    loads = 5, stores = 4, ALU = 4, branches = 3, jumps = 3

CPI = CPU clock cycles / Instruction Count
    = sum over instruction classes i of (Instruction Count_i / Total Instruction Count) x CPI_i

CPI = 0.25 x 5 + 0.10 x 4 + 0.45 x 4 + 0.18 x 3 + 0.02 x 3
    = 4.05

Worst case CPI?
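
The weighted sum is easy to check in Python. The dictionary names are arbitrary; the mix and cycle counts are the ones on this slide. The worst-case line assumes "worst case" means a program made up entirely of the slowest instruction class.

```python
mix = {"load": 0.25, "store": 0.10, "alu": 0.45, "branch": 0.18, "jump": 0.02}
cycles = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# CPI is the mix-weighted average of the per-class cycle counts.
cpi = sum(mix[k] * cycles[k] for k in mix)

# If every instruction were of the slowest class, CPI would equal its cycle count.
worst_case_cpi = max(cycles.values())

print(f"CPI = {cpi:.2f}, worst case = {worst_case_cpi}")
```

With these numbers the worst case is 5, reached by a program consisting only of loads.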

Control FSM Implementation

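[Figure: combinational control logic with a state register. Inputs: the opcode field from the instruction register and the current state from the state register. Outputs: the datapath control outputs and the next state, which is written back into the state register.]

Opcode  | State Reg | Next State | Datapath control
--------+-----------+------------+-----------------
xxxxxx  | 0000      | 0001       | 00110...
R-type  | 0001      | 0110       | 10011...
lw, sw  | 0001      | 0010       | 10011...
beq     | 0001      | 1000       | 10011...
xxxxxx  | 0110      | 0111       | R-type cycle 3
xxxxxx  | 0111      | 0000       | R-type cycle 4
xxxxxx  | 1000      | 0000       | beq cycle 3
lw      | 0010      | 0011       | lw/sw cycle 3
sw      | 0010      | 0101       | lw/sw cycle 3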

ROM Implementation

Read Only Memory
    memory values fixed ahead of time
    can be used to implement a truth table
    if the address is m bits, we can address 2^m entries in the ROM
    our outputs are the bits of data that the address points to
    m is the "height" and n is the "width"

Inputs
    6-bit opcode
    4-bit state

Outputs
    16 datapath control bits
    4 next-state bits

ROM is 2^10 x 20 = 20K bits

Wasteful: in many cases the opcode is ignored, so many entries hold the same output
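
The ROM size follows directly from the input and output widths. The sketch below recomputes it in Python; placing the opcode in the upper address bits and the state in the lower ones is an assumption made for illustration, since the slide does not say how the address is formed.

```python
OPCODE_BITS, STATE_BITS = 6, 4
CONTROL_BITS, NEXT_STATE_BITS = 16, 4

address_bits = OPCODE_BITS + STATE_BITS        # m = 10
word_bits = CONTROL_BITS + NEXT_STATE_BITS     # n = 20
entries = 2 ** address_bits                    # 2^10 = 1024 entries
total_bits = entries * word_bits               # 1024 x 20 = 20480 bits = 20K bits

def rom_address(opcode, state):
    """Concatenate opcode and state into one ROM address (assumed bit order)."""
    return (opcode << STATE_BITS) | state

# In state 0 (fetch) the opcode is ignored, so all 64 opcodes map to
# entries that must hold the same 20-bit word.
fetch_addresses = [rom_address(op, 0b0000) for op in range(2 ** OPCODE_BITS)]
```

The 64 duplicated fetch entries are one concrete instance of the waste noted above.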

Other Alternatives?

Microprograms are an alternative to FSMs for specifying control signals
    FSM is like a flow chart;
    Microprogram is a program written in microinstructions

State sequencing can be done with a counter instead of an explicit next state function.

Key Points

Performance gain when using a multicycle implementation
    Saves hardware but needs additional state elements
        registers between steps

Control is more complex than single-cycle
    more signals, more complexity

High level description can be in RTL
    Low level settings of control signals can be given by FSM (later, we'll see microprograms)
    Can implement control in combinational logic or ROM
