Multicycle MIPS
Montek Singh
Mar 25, 2010
Topics
Issue w/ single cycle
Multicycle MIPS
State elements
Now add registers between stages
How to control
Performance
[Figures: multicycle state elements (PC, combined Instr / Data Memory, Register File), followed by the instruction-fetch step (IRWrite enables the register holding Instr) and the register-read step (Instr 25:21 drives A1).]
Multicycle Datapath: lw
immediate
[Figures: lw datapath built up step by step: Sign Extend turns Instr 15:0 into SignImm; the ALU adds SrcA and SrcB and the result is held in ALUOut; IorD selects PC or ALUOut as the memory address Adr, and the word read is held in Data; Data is written to register 20:16 through WD3 under RegWrite; PC is incremented by 4 through the ALU under PCWrite.]
Multicycle Datapath: sw
Already know how to generate addr
Write data in rt to memory
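The address generation shared by lw and sw can be sketched in a few lines of Python. This is an illustrative model only, not part of the slides: the function names, the dictionary used as memory, and the list used as the register file are all assumptions made for the example.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, imm: int) -> int:
    """Base register value plus sign-extended immediate, wrapped to 32 bits."""
    return (base + sign_extend16(imm)) & 0xFFFFFFFF


def store_word(memory: dict, regs: list, rs: int, rt: int, imm: int) -> None:
    """sw: write the value of register rt to memory at regs[rs] + SignImm."""
    memory[effective_address(regs[rs], imm)] = regs[rt] & 0xFFFFFFFF


# Hypothetical register contents: $1 holds a base address, $2 holds the data.
regs = [0] * 32
regs[1] = 0x1000
regs[2] = 0xDEADBEEF
memory = {}
store_word(memory, regs, rs=1, rt=2, imm=0xFFFC)  # offset -4
```

The same `effective_address` helper would serve lw, which differs only in reading memory instead of writing it.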
[Figures: datapath extended for sw (register B drives the memory WD input under MemWrite), for R-type (RegDst selects 20:16 or 15:11 as A3; MemtoReg selects ALUOut or Data as WD3), and for beq (SignImm <<2 added as an ALU SrcB input; Zero and Branch, together with PCWrite, form PCEn; PCSrc selects ALUResult or ALUOut as PC').]
Control Unit
Inputs: Opcode5:0 (to the Main Controller), Funct5:0 (to the ALU Decoder)
Main Controller (FSM) outputs:
    Multiplexer Selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA
    Register Enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
    ALUOp1:0 (to the ALU Decoder)
ALU Decoder output: ALUControl2:0
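A minimal Python sketch of the ALU Decoder may help make the split between the Main Controller and the ALU Decoder concrete. The slides only show ALUControl = 010 being used when ALUOp = 00; the remaining ALUControl and Funct encodings below follow a common textbook convention and are assumptions here, as is the function name.

```python
# Assumed Funct -> ALUControl encodings for R-type instructions.
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Map ALUOp1:0 and Funct5:0 to a 3-bit ALUControl value."""
    if alu_op == 0b00:
        return 0b010  # add: used for PC + 4 and address calculation
    if alu_op == 0b01:
        return 0b110  # subtract: used by beq to compare registers
    # ALUOp = 10: the operation depends on the Funct field
    return FUNCT_TO_ALUCONTROL[funct]
```

With this split, the Main Controller only needs to say "add", "subtract", or "look at Funct", and the Funct field never has to enter the state machine.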
[Figure: datapath with Control Unit during the fetch state; ALUSrcB = 01 and ALUControl = 010.]
S0: Fetch
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
Fetch instruction
Also increment PC (because ALU not in use)
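The effect of the fetch state on the architectural state can be sketched as follows. This is a hedged illustration, not the slides' own notation: the `fetch_step` function, the dictionary-based memory, and the state layout are hypothetical.

```python
def fetch_step(state: dict, memory: dict) -> dict:
    """Model one fetch cycle: load the instruction and advance PC by 4."""
    pc = state["pc"]
    # IorD = 0 selects PC as the memory address; IRWrite latches the word read.
    instr = memory[pc]
    # ALUSrcA = 0 (PC), ALUSrcB = 01 (constant 4), ALUOp = 00 (add);
    # PCSrc = 0 routes ALUResult to PC', and PCWrite enables the PC register.
    next_pc = (pc + 4) & 0xFFFFFFFF
    return {"pc": next_pc, "ir": instr}


# Hypothetical memory image holding one instruction word.
memory = {0x00400000: 0x8C0A0020}
state = fetch_step({"pc": 0x00400000, "ir": 0}, memory)
```

Both updates happen on the same clock edge, which is why the ALU can be borrowed for the increment without costing an extra cycle.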
[Figure: datapath during the fetch state, with the ALU incrementing PC by 4.]
Note: signals only shown when needed, and enables only when asserted.
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
[Figure: datapath during S1: Decode; mux selects and ALUControl are don't-cares (X).]
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW -> S2
S2: MemAdr
[Figure: datapath with Control Unit during S2: MemAdr; ALUSrcB = 10 and ALUControl = 010.]
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW -> S2
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
[Figure: datapath during S2: MemAdr, with ALUSrcA = 1, ALUSrcB = 10 and ALUControl = 010.]
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW -> S2
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
For lw, now need to read from memory
Then write to register
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW -> S2
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
sw just writes to memory
One step shorter
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW -> S2; Op = R-type -> S6
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
The R-type instructions have two steps: compute result in ALU and write to reg
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW -> S2; Op = R-type -> S6; Op = BEQ -> S8
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
beq needs to use ALU twice, so consumes two cycles
One to compute addr
Another to decide on eq
Can take advantage of decode when ALU not used to compute BTA (no harm if BTA not used)
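The two ALU uses of beq described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the model works on plain integers rather than on the ALUOut and A/B registers of the datapath.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc_plus4: int, imm: int) -> int:
    """Decode state: ALUSrcA = 0 (PC), ALUSrcB = 11 (SignImm << 2), add."""
    return (pc_plus4 + (sign_extend16(imm) << 2)) & 0xFFFFFFFF


def beq_next_pc(pc_plus4: int, imm: int, a: int, b: int) -> int:
    """Branch state: subtract A and B; take the branch only if Zero is set."""
    bta = branch_target(pc_plus4, imm)          # computed early, in decode
    zero = ((a - b) & 0xFFFFFFFF) == 0          # ALUOp = 01 (subtract)
    return bta if zero else pc_plus4            # Branch AND Zero enables PC
```

Computing the target during decode is speculative: for any instruction other than beq the value is simply never selected.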
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW -> S2; Op = R-type -> S6; Op = BEQ -> S8; Op = ADDI -> S9
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S9: ADDI Execute
S10: ADDI Writeback
addi is similar to R-type:
Add
Write back
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW -> S2; Op = R-type -> S6; Op = BEQ -> S8; Op = ADDI -> S9
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S9: ADDI Execute
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
    RegDst = 0, MemtoReg = 0, RegWrite
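The three writeback states differ only in their RegDst and MemtoReg settings. The following Python sketch models those two multiplexers; the function name and argument layout are assumptions made for illustration, while the field positions (20:16 for rt, 15:11 for rd) are the ones shown on the datapath.

```python
def writeback(instr: int, alu_out: int, data: int,
              reg_dst: int, mem_to_reg: int) -> tuple:
    """Return (destination register, value) chosen by RegDst and MemtoReg."""
    rt = (instr >> 16) & 0x1F   # Instr 20:16
    rd = (instr >> 11) & 0x1F   # Instr 15:11
    dest = rd if reg_dst else rt            # RegDst mux drives A3
    value = data if mem_to_reg else alu_out  # MemtoReg mux drives WD3
    return dest, value


# Hypothetical instruction word with rt = 8 and rd = 9.
instr = (8 << 16) | (9 << 11)
lw_wb = writeback(instr, alu_out=0x20, data=0x99, reg_dst=0, mem_to_reg=1)    # S4
rtype_wb = writeback(instr, alu_out=0x20, data=0x99, reg_dst=1, mem_to_reg=0)  # S7
addi_wb = writeback(instr, alu_out=0x20, data=0x99, reg_dst=0, mem_to_reg=0)   # S10
```

Seen this way, S10 is a mix of the other two: it writes to rt like lw but takes its value from ALUOut like an R-type instruction.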
Extended Functionality: j
[Figure: datapath extended for j: PCJump is formed from PC bits 31:28 and Instr 25:0 (jump) shifted left by 2 to give bits 27:0; PCSrc widens to PCSrc1:0, with PCJump on input 10.]
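The jump target calculation can be sketched as a short Python function. The name `jump_target` is hypothetical, and the sketch assumes the PC value passed in is the one already incremented during fetch, whose upper four bits are reused.

```python
def jump_target(pc: int, instr: int) -> int:
    """Form PCJump from PC[31:28] and the 26-bit address field shifted left by 2."""
    addr26 = instr & 0x03FFFFFF          # Instr 25:0
    low28 = (addr26 << 2) & 0x0FFFFFFF   # bits 27:0 after the <<2 shift
    return (pc & 0xF0000000) | low28     # concatenate with PC 31:28
```

Because the address field is only 26 bits wide, a j instruction can reach targets only within the 256 MB region that shares the current upper four PC bits.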
Control FSM: j
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW -> S2; Op = R-type -> S6; Op = BEQ -> S8; Op = ADDI -> S9; Op = J -> S11
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDI Execute
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
    RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump
Control FSM: j
S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW -> S2; Op = R-type -> S6; Op = BEQ -> S8; Op = ADDI -> S9; Op = J -> S11
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW -> S3; Op = SW -> S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDI Execute
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
    RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump
    PCSrc = 10, PCWrite
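The next-state logic of the complete controller can be written down as a small Python function and used to count cycles per instruction. This is a sketch rather than the controller itself: the opcode strings and function names are hypothetical, and the unconditional transitions (S0 to S1, S3 to S4, S6 to S7, S9 to S10, and the return to S0 after the last state of each instruction) are assumed from the usual form of this state diagram, since its arrows are not reproduced in the text.

```python
# Decode-state dispatch, keyed by a symbolic opcode name (hypothetical strings).
AFTER_DECODE = {
    "LW": "S2", "SW": "S2", "RTYPE": "S6",
    "BEQ": "S8", "ADDI": "S9", "J": "S11",
}

# States with a single, unconditional successor (assumed transitions).
UNCONDITIONAL = {"S0": "S1", "S3": "S4", "S6": "S7", "S9": "S10"}


def next_state(state: str, op: str) -> str:
    """Return the state that follows `state` for an instruction of type `op`."""
    if state == "S1":
        return AFTER_DECODE[op]
    if state == "S2":
        return "S3" if op == "LW" else "S5"
    # S4, S5, S7, S8, S10 and S11 finish the instruction and return to fetch.
    return UNCONDITIONAL.get(state, "S0")


def cycles(op: str) -> int:
    """Count clock cycles from fetch until the FSM is back in S0."""
    state, count = "S0", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "S0":
            return count
```

Walking each opcode through `cycles` gives 3 cycles for beq and j, 4 for R-type, sw and addi, and 5 for lw.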
Multicycle Performance
Instructions take different numbers of cycles:
3 cycles: beq, j
4 cycles: R-Type, sw, addi
5 cycles: lw
CPI is weighted average
SPECINT2000 benchmark:
25% loads
10% stores
11% branches
2% jumps
52% R-type
Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
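The weighted average can be checked with a few lines of Python. The dictionary names are illustrative; the instruction mix and cycle counts are the ones listed above.

```python
# Instruction mix (fraction of dynamic instructions) from the benchmark above.
mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}

# Cycles per instruction class in the multicycle design.
cycles_per_class = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "rtype": 4}

# CPI is the sum of (fraction * cycles) over all instruction classes.
average_cpi = sum(mix[k] * cycles_per_class[k] for k in mix)
print(f"Average CPI = {average_cpi:.2f}")
```

The result is 4.12, noticeably better than the worst case of 5 cycles that would apply if every instruction took as long as lw.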
Multicycle Performance
Multicycle critical path:
T_c = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup
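To see how the formula behaves, it can be evaluated for a set of example delays. The numbers below are purely hypothetical values in picoseconds chosen for illustration; they do not come from the slides, and the function name is an assumption.

```python
def cycle_time(t_pcq: int, t_mux: int, t_alu: int, t_mem: int, t_setup: int) -> int:
    """Multicycle clock period: the slower of the ALU path and the memory path."""
    return t_pcq + t_mux + max(t_alu + t_mux, t_mem) + t_setup


# Hypothetical element delays in picoseconds (assumed, not from the source).
delays = {"t_pcq": 30, "t_mux": 25, "t_alu": 200, "t_mem": 250, "t_setup": 20}
tc = cycle_time(**delays)
print(f"Tc = {tc} ps")
```

With these assumed delays the memory path (250 ps) is slower than the ALU-plus-mux path (225 ps), so memory sets the period at 325 ps; a faster memory would shift the limit to the ALU path.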
[Figure: multicycle datapath with Control Unit, used to trace the critical path.]
seconds). Why?
[Figures: single-cycle MIPS processor (separate Instruction Memory and Data Memory) and the complete multicycle MIPS processor (shared Instr / Data Memory, including the PCJump path).]
Next Time
We'll look at pipelined MIPS
Adding throughput (and complexity) by trying