
ECE4680 Computer Organization and Architecture Designing a Single Cycle Datapath

Processor Design: How to Implement MIPS
Simplicity favors regularity

ECE4680 Datapath.1

2003-3-19

The Big Picture: Where are We Now?


The Five Classic Components of a Computer
- Processor: Control and Datapath
- Memory
- Input
- Output

Today's Topic: Datapath Design
- What is data?
- What is a datapath?


The Big Picture: The Performance Perspective


Performance of a machine was determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction
Processor design (datapath and control) will determine:
- Clock cycle time
- Clock cycles per instruction
In the next two lectures: single cycle processor
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time


The MIPS Instruction Formats


All MIPS instructions are 32 bits long. The three instruction formats:

R-type:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
I-type:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
J-type:  op (bits 31-26, 6 bits) | target address (25-0, 26 bits)

The different fields are:
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
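The field layout above can be sketched as a small decoder. This is only an illustration: the function name `decode` and the dictionary keys are assumptions made for this example, and every field is extracted regardless of format, so the caller picks the ones that apply.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields (hypothetical helper)."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21
        "rt":     (word >> 16) & 0x1F,   # bits 20-16
        "rd":     (word >> 11) & 0x1F,   # bits 15-11 (R-type only)
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "funct":  word & 0x3F,           # bits 5-0   (R-type only)
        "imm16":  word & 0xFFFF,         # bits 15-0  (I-type only)
        "target": word & 0x03FFFFFF,     # bits 25-0  (J-type only)
    }

# add $3, $1, $2 is R-type: op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x20
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(word)
```

Here `word` evaluates to 0x00221820, and the decoder recovers the register numbers 1, 2, and 3 from it.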

The MIPS Subset


ADD and subtract
- add rd, rs, rt
- sub rd, rs, rt
- Format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
OR Immediate:
- ori rt, rs, imm16
LOAD and STORE
- lw rt, rs, imm16
- sw rt, rs, imm16
BRANCH:
- beq rs, rt, imm16
- Format (ori, lw, sw, beq): op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
JUMP:
- j target
- Format: op (bits 31-26, 6 bits) | target address (25-0, 26 bits)



An Abstract View of the Implementation


Two types of functional units
- Operational elements that operate on data (combinational)
- State elements that contain data (sequential)
Generic implementation:
- use PC to supply instruction address
- get the instruction from memory
- read registers
- use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers
- Why? memory-reference? arithmetic? control flow?

[Figure: abstract datapath. The PC (clocked by Clk) supplies the Instruction Address to the Ideal Instruction Memory; the instruction bus supplies Rd, Rs, Rt (5 bits each) and Imm (16 bits); the register file (Rw, Ra, Rb; 32 32-bit registers) feeds the 32-bit ALU; the ALU result is the Data Address of the Ideal Data Memory (Data In, DataOut, Clk)]

Next step: fill in the details: more units, more connections, and the control unit

State Elements
Unclocked vs. Clocked
Clocks used in synchronous logic
- when should an element that contains state be updated?

[Figure: clock waveform marking the rising edge, the falling edge, and the cycle time]


An unclocked state element


The set-reset latch
- output depends on present inputs and also on past inputs

[Figure: set-reset latch circuit]


Latches and Flip-flops


- Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
- Change of state (value) is based on the clock
  - Latches: state changes whenever the inputs change and the clock is asserted
  - Flip-flops: state changes only on a clock edge (edge-triggered methodology)
  - Asserted means "logically true", which could mean electrically low
- A clocking methodology defines when signals can be read and written
  - wouldn't want to read a signal at the same time it was being written


D-latch and D flip-flop


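Two inputs:
- the data value to be stored (D)
- the clock signal (C) indicating when to read & store D

D latch: output changes when C is high
[Figure: D latch with inputs D and C and outputs Q and Q-bar; timing diagram of D, C, and Q]

D flip-flop: output changes only on the clock edge
[Figure: D flip-flop built from two D latches, with inputs D and C and outputs Q and Q-bar; timing diagram of D, C, and Q]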


Clocking Methodology (Appendix B.7)


[Figure: clock waveform (Clk) with Setup and Hold windows around each clock edge; signal values are "Don't Care" outside those windows]

- All storage elements are clocked by the same clock edge
- Edge-triggered: all stored values are updated on a clock edge
- Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew
- (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time
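The two timing relations above can be checked numerically. The sketch below is illustrative only: the function names and every delay value (in picoseconds) are assumptions chosen for this example, not figures from the lecture.

```python
def min_cycle_time(latch_prop, longest_path, setup, skew):
    """Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew."""
    return latch_prop + longest_path + setup + skew

def hold_ok(latch_prop, shortest_path, skew, hold):
    """Hold constraint: (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time."""
    return latch_prop + shortest_path - skew > hold

# Hypothetical delays in picoseconds
cycle = min_cycle_time(latch_prop=100, longest_path=1500, setup=50, skew=30)
safe = hold_ok(latch_prop=100, shortest_path=40, skew=30, hold=60)
```

With these assumed numbers the minimum cycle time is 1680 ps, and the hold constraint is met because 100 + 40 - 30 = 110 ps exceeds the 60 ps hold time.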

An Abstract View of the Critical Path


Register file and ideal memory:
- The CLK input is a factor ONLY during write operation
- During read operation, behave as combinational logic:
  - Address valid => Output valid after access time.

Critical Path (Load Operation) =
    PC's prop time
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: abstract datapath. PC, Ideal Instruction Memory, register file (Rw, Ra, Rb; 32 32-bit registers), 32-bit ALU, Ideal Data Memory (Data Address, Data In, DataOut, Clk)]
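As a rough illustration of the load critical path, the terms can simply be added up. Every delay below is a made-up value in picoseconds, assumed only to show the arithmetic; real numbers depend on the technology.

```python
# Hypothetical component delays (picoseconds) along the load path
load_path_ps = {
    "PC prop time": 100,
    "instruction memory access": 400,
    "register file access": 200,
    "ALU 32-bit add": 300,
    "data memory access": 400,
    "register file write setup": 50,
    "clock skew": 30,
}

# The single cycle clock period must cover the sum of all terms
critical_path_ps = sum(load_path_ps.values())
```

Under these assumptions the clock period must be at least 1480 ps, and every instruction pays that period even if its own path is shorter.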

The Steps of Designing a Processor


- Instruction Set Architecture => Register Transfer Language
- Register Transfer Language (RTL) =>
  - Datapath components
  - Datapath interconnect
- Datapath components => Control signals
- Control signals => Control logic



What is RTL: The ADD Instruction


Register Transfer Language

add rd, rs, rt
- mem[PC]                     Fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]      The ADD operation
- PC <- PC + 4                Calculate the next instruction's address
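The register transfers for add can be sketched in a few lines. This is a minimal model under stated assumptions: the `state` dictionary, the function name `execute_add`, and the starting PC are hypothetical, and the result simply wraps to 32 bits (overflow handling is not modeled).

```python
def execute_add(state, rd, rs, rt):
    """Apply the add register transfers to a hypothetical machine state."""
    regs = state["R"]
    regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF    # R[rd] <- R[rs] + R[rt]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF     # PC <- PC + 4
    return state

state = {"PC": 0x00400000, "R": [0] * 32}
state["R"][1], state["R"][2] = 7, 5
execute_add(state, rd=3, rs=1, rt=2)
```

After the call, R[3] holds 12 and the PC has advanced to 0x00400004.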


What is RTL: The Load Instruction


lw rt, rs, imm16
- mem[PC]                            Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm16)     Calculate the memory address
- R[rt] <- Mem[Addr]                 Load the data into the register
- PC <- PC + 4                       Calculate the next instruction's address


Combinational Logic Elements


Adder
- Inputs: A (32 bits), B (32 bits), CarryIn
- Outputs: Sum (32 bits), Carry
MUX (p.B-9,B-19)
- Inputs: A (32 bits), B (32 bits), Select
- Output: 32 bits
Decoder
- Outputs: out0, out1, out2, ..., out7
ALU
- Inputs: A (32 bits), B (32 bits), OP
- Outputs: Result (32 bits), Zero

In which cases do we need an adder, ALU, MUX or Decoder?

Storage Element: Register (p.B22-B25)


Register
- Similar to the D Flip Flop except
  - N-bit input and output
  - Write Enable input
- Write Enable:
  - 0: Data Out will not change
  - 1: Data Out will become Data In
- The content is updated at the clock tick ONLY if the Write Enable signal is set to 1.

[Figure: register with inputs Data In (N bits), Write Enable, and Clk, and output Data Out (N bits)]

Array of logical elements (see register file on next 2 slides)


Storage Element: Register File


Register File consists of 32 registers:
- Two 32-bit output busses: busA and busB
- One 32-bit input bus: busW
Register is selected by:
- RA selects the register to put on busA
- RB selects the register to put on busB
- RW selects the register to be written via busW when Write Enable is 1
Clock input (CLK)
- The CLK input is a factor ONLY during write operation
- During read operation, behaves as a combinational logic block:
  - RA or RB valid => busA or busB valid after access time.

[Figure: register file of 32 32-bit registers with 5-bit inputs RW, RA, RB, inputs Write Enable, Clk, and busW (32 bits), and outputs busA and busB (32 bits each)]
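The read and write behavior described above can be modeled as a small class. This is a sketch under assumptions: the class and method names are invented for this example, a clock edge is represented by an explicit method call, and the MIPS convention that register 0 always reads as zero is not modeled because the slide does not mention it.

```python
class RegisterFile:
    """Hypothetical model of a register file with 32 32-bit registers."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Combinational read: busA and busB follow RA and RB with no clock involved
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # The clock matters only for writes, and only when Write Enable is 1
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=0x1234, write_enable=False)   # no change
before = rf.read(3, 0)
rf.clock_edge(rw=3, bus_w=0x1234, write_enable=True)    # register 3 updated
after = rf.read(3, 0)
```

The first clock edge leaves register 3 untouched because Write Enable is 0; only the second one stores the value from busW.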


Storage Element: Register File -- Detailed diagram


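[Figure: register file block with 5-bit inputs RW, RA, RB, inputs Write Enable, Clk, and busW (32 bits), and outputs busA and busB (32 bits each)]

[Figure: detailed register file. RW feeds a decoder (labeled "32-to-1 Decoder") with outputs 0 through 31 that, together with Write Enable and Clk, drive the C inputs of Register 0 through Register 31; busW drives the D inputs; two multiplexers controlled by RA and RB select the register contents placed on busA and busB]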


Storage Element: Idealized Memory


Memory (idealized)
- One input bus: Data In
- One output bus: Data Out
Memory word is selected by:
- Address selects the word to put on Data Out
- Write Enable = 1: Address selects the memory word to be written via the Data In bus
Clock input (CLK)
- The CLK input is a factor ONLY during write operation
- During read operation, behaves as a combinational logic block:
  - Address valid => Data Out valid after access time.

[Figure: memory with inputs Write Enable, Address, Data In (32 bits), and Clk, and output DataOut (32 bits)]


Overview of the Instruction Fetch Unit (Fig. 5.5)


The common RTL operations
- Fetch the instruction: mem[PC]
- Update the program counter:
  - Sequential code: PC <- PC + 4
  - Branch and jump: PC <- "something else"

[Figure: the PC (clocked by Clk) and the Next Address Logic supply the Address to the Instruction Memory, which outputs the 32-bit Instruction Word]


RTL: The ADD Instruction


Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)

add rd, rs, rt
- mem[PC]                     Fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]      The actual operation
- PC <- PC + 4                Calculate the next instruction's address


RTL: The Subtract Instruction


Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)

sub rd, rs, rt
- mem[PC]                     Fetch the instruction from memory
- R[rd] <- R[rs] - R[rt]      The actual operation
- PC <- PC + 4                Calculate the next instruction's address


Datapath for Register-Register Operations


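R[rd] <- R[rs] op R[rt]      Example: add rd, rs, rt
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
- ALUctr and RegWr: control logic after decoding the instruction

Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)

[Figure: register file (32 32-bit registers) with Rw = Rd, Ra = Rs, Rb = Rt (5 bits each), write enable RegWr, and Clk; busA and busB (32 bits each) feed the ALU, controlled by ALUctr; the 32-bit Result returns on busW]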


Register-Register Timing
[Timing diagram over one clock cycle; each signal moves from its old value to its new value after the delay listed]
- PC: new value after Clk-to-Q
- Rs, Rt, Rd, Op, Func: new value after Instruction Memory Access Time
- ALUctr, RegWr: new value after Delay through Control Logic
- busA, busB: new value after Register File Access Time
- busW: new value after ALU Delay
- Register write occurs at the clock edge that ends the cycle

[Figure: register-register datapath. Register file with Rw = Rd, Ra = Rs, Rb = Rt, RegWr, and Clk; busA and busB feed the ALU (ALUctr); the 32-bit Result returns on busW]


RTL: The OR Immediate Instruction


Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

ori rt, rs, imm16
- mem[PC]                              Fetch the instruction from memory
- R[rt] <- R[rs] or ZeroExt(imm16)     The OR operation
- PC <- PC + 4                         Calculate the next instruction's address

ZeroExt(imm16): bits 31-16 = 0000000000000000 (16 bits) | bits 15-0 = immediate (16 bits)
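Zero extension and the OR operation can be sketched as follows. The helper names `zero_ext` and `ori` are assumptions for this example; values are plain Python integers masked to 32 bits.

```python
def zero_ext(imm16):
    """Place the 16-bit immediate in bits 15-0 and fill bits 31-16 with zeros."""
    return imm16 & 0xFFFF

def ori(rs_value, imm16):
    """R[rt] <- R[rs] or ZeroExt(imm16), returned as a 32-bit value."""
    return (rs_value | zero_ext(imm16)) & 0xFFFFFFFF

result = ori(0xFFFF0000, 0x8001)
```

Even though bit 15 of the immediate is 1, the upper half of the extended value stays zero, so `result` is 0xFFFF8001 only because R[rs] already had those upper bits set.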


Datapath for Logical Operations with Immediate


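R[rt] <- R[rs] op ZeroExt[imm16]      Example: ori rt, rs, imm16

Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

[Figure: register-register datapath extended for immediates (newly added parts are in blue color). A mux controlled by RegDst selects Rd or Rt as Rw; Ra = Rs; Rb is don't care (Rt); imm16 passes through ZeroExt (16 to 32 bits); a mux controlled by ALUSrc selects busB or the extended immediate as the second ALU input; the ALU (ALUctr) Result (32 bits) returns on busW; RegWr and Clk control the register write]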


RTL: The Load Instruction


Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

lw rt, rs, imm16
- mem[PC]                            Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm16)     Calculate the memory address
- R[rt] <- Mem[Addr]                 Load the data into the register
- PC <- PC + 4                       Calculate the next instruction's address

SignExt(imm16):
- immediate bit 15 = 0: bits 31-16 = 0000000000000000 (16 bits) | bits 15-0 = immediate (16 bits)
- immediate bit 15 = 1: bits 31-16 = 1111111111111111 (16 bits) | bits 15-0 = immediate (16 bits)
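Sign extension and the load address calculation can be sketched like this. The helper names `sign_ext` and `load_address` are assumptions for this example; the extended value is kept as a 32-bit pattern so the copied sign bits are visible.

```python
def sign_ext(imm16):
    """Copy bit 15 of the immediate into bits 31-16 (result is a 32-bit pattern)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16

def load_address(rs_value, imm16):
    """Addr <- R[rs] + SignExt(imm16), wrapped to 32 bits."""
    return (rs_value + sign_ext(imm16)) & 0xFFFFFFFF

addr = load_address(0x00001000, 0xFFFC)   # immediate encodes -4
```

The immediate 0xFFFC extends to 0xFFFFFFFC, so the address lands 4 bytes below the base, at 0x00000FFC.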


Datapath for Load Operations


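R[rt] <- Mem[R[rs] + SignExt[imm16]]      Example: lw rt, rs, imm16

Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

[Figure: datapath for loads. A mux controlled by RegDst selects Rd or Rt as Rw; Ra = Rs; Rb is don't care (Rt); imm16 passes through the Extender (controlled by ExtOp, 16 to 32 bits); a mux controlled by ALUSrc selects busB or the extended immediate; the ALU (ALUctr) output drives the Data Memory address (Adr); the Data Memory has WrEn = MemWr, Data In (32 bits), and Clk; a mux controlled by MemtoReg selects the ALU output or the memory output onto busW; RegWr and Clk control the register write]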

RTL: The Store Instruction


Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

sw rt, rs, imm16
- mem[PC]                            Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm16)     Calculate the memory address
- Mem[Addr] <- R[rt]                 Store the register into memory
- PC <- PC + 4                       Calculate the next instruction's address
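The store transfers can be sketched on a toy machine state. Everything named here (`state`, `execute_sw`, the dictionary used as memory, the starting PC) is an assumption for this example; memory is modeled as a word-addressed Python dictionary with no alignment checks.

```python
def sign_ext(imm16):
    """Copy bit 15 of the immediate into bits 31-16 (result is a 32-bit pattern)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16

def execute_sw(state, rt, rs, imm16):
    """Apply the sw register transfers to a hypothetical machine state."""
    addr = (state["R"][rs] + sign_ext(imm16)) & 0xFFFFFFFF   # Addr <- R[rs] + SignExt(imm16)
    state["Mem"][addr] = state["R"][rt]                      # Mem[Addr] <- R[rt]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF             # PC <- PC + 4
    return state

state = {"PC": 0x00400000, "R": [0] * 32, "Mem": {}}
state["R"][1] = 0x00001000
state["R"][2] = 0xDEADBEEF
execute_sw(state, rt=2, rs=1, imm16=0xFFFC)
```

Note that, unlike lw, the register file is only read here: rt supplies the data to store rather than naming a destination.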


Datapath for Store Operations


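Mem[R[rs] + SignExt[imm16]] <- R[rt]      Example: sw rt, rs, imm16

Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

[Figure: datapath for stores. A mux controlled by RegDst selects Rd or Rt as Rw; Ra = Rs; Rb = Rt; imm16 passes through the Extender (controlled by ExtOp, 16 to 32 bits); a mux controlled by ALUSrc selects busB or the extended immediate; the ALU (ALUctr) output drives the Data Memory address (Adr); busB drives the Data Memory's Data In (32 bits); WrEn = MemWr; a mux controlled by MemtoReg feeds busW; RegWr and Clk control the register write]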

RTL: The Branch Instruction


Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

beq rs, rt, imm16
- mem[PC]                        Fetch the instruction from memory
- Cond <- R[rs] - R[rt]          Calculate the branch condition
- if (Cond eq 0)                 Calculate the next instruction's address
  - PC <- PC + 4 + ( SignExt(imm16) x 4 )
- else
  - PC <- PC + 4
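The branch decision can be sketched directly from these transfers. The function names are assumptions for this example, and `sign_ext` here returns a signed Python integer so the offset can be negative.

```python
def sign_ext(imm16):
    """Interpret the 16-bit immediate as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc_beq(pc, rs_value, rt_value, imm16):
    """Return the next PC for beq, following the transfers above."""
    cond = (rs_value - rt_value) & 0xFFFFFFFF          # Cond <- R[rs] - R[rt]
    if cond == 0:
        return (pc + 4 + sign_ext(imm16) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF

taken = next_pc_beq(0x00000100, 5, 5, 0x0003)       # equal: branch forward 3 instructions
not_taken = next_pc_beq(0x00000100, 5, 6, 0x0003)   # not equal: fall through
```

With equal registers the next PC is 0x100 + 4 + 12 = 0x110; otherwise it is 0x104.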


Datapath for Branch Operations


beq rs, rt, imm16      We need to compare Rs and Rt!

Instruction format: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)

[Figure: datapath for branches. Ra = Rs and Rb = Rt; the ALU (ALUctr) compares busA and busB and outputs Zero; the Next Address Logic takes Branch, Zero, and imm16 (16 bits) and updates the PC (clocked by Clk), which goes to the Instruction Memory; the RegDst mux, Extender (ExtOp), and ALUSrc mux from the earlier datapaths remain in place]

Binary Arithmetic for the Next Address


In theory, the PC is a 32-bit byte address into the instruction memory:
- Sequential operation: PC<31:0> = PC<31:0> + 4
- Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
The magic number 4 always comes up because:
- The 32-bit PC is a byte address
- And all our instructions are 4 bytes (32 bits) long
In other words:
- The 2 LSBs of the 32-bit PC are always zeros
- There is no reason to have hardware to keep the 2 LSBs
In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
- In either case: Instruction Memory Address = PC<31:2> concat 00
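A quick numeric check shows that the 30-bit form produces the same instruction address as the 32-bit form. The function names and the sample PC value are assumptions for this example.

```python
MASK32 = 0xFFFFFFFF
MASK30 = (1 << 30) - 1

def sign_ext(imm16):
    """Interpret the 16-bit immediate as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target_32(pc, imm16):
    """PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4."""
    return (pc + 4 + sign_ext(imm16) * 4) & MASK32

def branch_target_30(pc30, imm16):
    """PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]."""
    return (pc30 + 1 + sign_ext(imm16)) & MASK30

def instruction_address(pc30):
    """Instruction Memory Address = PC<31:2> concat 00."""
    return (pc30 << 2) & MASK32

pc = 0x00400010                      # hypothetical word-aligned PC
results = [
    (instruction_address(branch_target_30(pc >> 2, imm16)), branch_target_32(pc, imm16))
    for imm16 in (0x0003, 0xFFFE)    # forward 3 and backward 2 instructions
]
```

Each pair in `results` holds two equal addresses, which is why dropping the two low bits loses nothing as long as the PC stays word aligned.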

Next Address Logic: Expensive and Fast Solution


Using a 30-bit PC:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
- In either case: Instruction Memory Address = PC<31:2> concat 00

[Figure: next address logic with two 30-bit adders. One adder computes PC + 1; the other adds SignExt(imm16), taken from Instruction<15:0>, to that sum; a mux controlled by Branch and Zero selects which result is loaded into the PC (clocked by Clk); Addr<31:2> comes from the PC and Addr<1:0> = 00 for the Instruction Memory, which outputs Instruction<31:0>]

Next Address Logic: Cheap and Slow Solution


Why is this slow?
- Cannot start the address add until Zero (output of ALU) is valid
Does it matter that this is slow in the overall scheme of things?
- Probably not here. Critical path is the load operation.

[Figure: next address logic with a single 30-bit adder. A mux controlled by Branch and Zero selects either 0 or SignExt(imm16), taken from Instruction<15:0>; the adder adds that value to the PC with Carry In = 1; the result is loaded into the PC (clocked by Clk); Addr<31:2> comes from the PC and Addr<1:0> = 00 for the Instruction Memory, which outputs Instruction<31:0>]

RTL: The Jump Instruction


Instruction format: op (bits 31-26, 6 bits) | target address (25-0, 26 bits)

j target
- mem[PC]                                        Fetch the instruction from memory
- PC<31:2> <- PC<31:28> concat target<25:0>      Calculate the next instruction's address


Instruction Fetch Unit


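j target      PC<31:2> <- PC<31:28> concat target<25:0>

[Figure: instruction fetch unit. The branch logic (two 30-bit adders, SignExt of imm16 = Instruction<15:0>, and a mux controlled by Branch and Zero) produces one candidate for the next PC; PC<31:28> (4 bits) concatenated with Target = Instruction<25:0> (26 bits) produces the other; a mux controlled by Jump selects between them; the PC is clocked by Clk; Addr<31:2> comes from the PC and Addr<1:0> = 00 for the Instruction Memory, which outputs Instruction<31:0>]

This is the whole design of the Instruction Fetch Unit: 3 inputs (Jump, Branch, and Zero) and 1 output (the instruction word).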
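The selection performed by the fetch unit can be sketched as one function of its three inputs. This is an interpretation under assumptions: the function name `next_pc30` is invented, and the branch mux is assumed to be selected by Branch AND Zero, which the figure suggests but the text does not state explicitly.

```python
MASK30 = (1 << 30) - 1

def next_pc30(pc30, instruction, jump, branch, zero):
    """Compute the next 30-bit PC<31:2> from the current one and the fetched instruction."""
    imm16 = instruction & 0xFFFF
    target26 = instruction & 0x03FFFFFF
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16   # SignExt(imm16)

    sequential = (pc30 + 1) & MASK30
    branch_target = (sequential + offset) & MASK30
    # Assumed: the branch mux takes the target only when Branch and Zero are both 1
    after_branch = branch_target if (branch and zero) else sequential

    # PC<31:28> sits in the top 4 bits of the 30-bit value; target fills the low 26 bits
    jump_target = (pc30 & 0x3C000000) | target26
    return jump_target if jump else after_branch

pc30 = 0x04100002                       # byte address 0x10400008
j_instruction = (2 << 26) | 0x0000040   # j opcode is 2; target field is 0x40
after_jump = next_pc30(pc30, j_instruction, jump=True, branch=False, zero=False)
```

Here `after_jump` is 0x04000040, that is, byte address 0x10000100: the upper four PC bits are kept and the rest comes from the target field.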

Putting it All Together: A Single Cycle Datapath


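We have everything except control signals (underlined)

Control signals: RegDst, RegWr, ALUSrc, ALUctr, ExtOp, MemWr, MemtoReg, Branch, Jump

[Figure: complete single cycle datapath]
- The Instruction Fetch Unit (inputs Branch, Jump, Zero, Clk) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>
- A mux controlled by RegDst selects Rd or Rt as Rw; Ra = Rs; Rb = Rt; RegWr enables the write of busW (32 bits) into the 32 32-bit registers on Clk
- The Extender (ExtOp) widens imm16 from 16 to 32 bits; a mux controlled by ALUSrc selects busB or the extended immediate
- The ALU (ALUctr) produces Zero and a 32-bit result, which drives the Data Memory address (Adr); Data In comes from busB; WrEn = MemWr
- A mux controlled by MemtoReg selects the ALU result or the Data Memory output onto busW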

Where to get more information?


To be continued ...

