Figure: register file read ports. Read register number 1 and Read register number 2 each control a multiplexor that selects Read data 1 or Read data 2 from Register 0 through Register n-1.
Figure: register file block. Inputs: Read register number 1, Read register number 2, Write register, Write data, Write. Outputs: Read data 1, Read data 2.
Figure: register file write port. An n-to-2^n decoder on Register number, the Write signal, the C and D inputs of Register 0 through Register n-1, and Register data.
Figure: instruction memory (Instruction address in, Instruction out), data memory (Address, Write data, Read data, MemRead), and sign-extension unit (16 bits in, 32 bits out).
Figure panels: a. Registers; b. ALU.
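Figure: datapath portion showing an Add unit with the constant 4, a Shift left 2 unit, an ALU producing Add result, a multiplexor, a 16-to-32-bit sign-extension unit, and MemRead.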
Figure: ALU control block. Labels: ALUOp (ALUOp1, ALUOp0), Op2, Op1, Op0, the function field F (5-0) with bits F3 to F0, and outputs Operation2, Operation1, Operation0.
Figure: control function for R-format, lw, sw, and beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.
Assuming that the multiplexors, control unit, PC accesses, sign-extension unit, and
wires have no delay, which of the following implementations would be faster, and by
how much?
To compare the performance, assume the following instruction mix: 25% loads, 10%
stores, 45% ALU instructions, 15% branches, and 5% jumps.
Load word: Instruction fetch, Register access, ALU, Memory access, Register access
PCWriteCond: When deasserted, none. When asserted, the PC is written if the Zero output from the ALU is also
active.
ALUSrcB = 00: The second input to the ALU comes from the B register.
ALUSrcB = 10: The second input to the ALU is the sign-extended, lower 16 bits of the IR.
ALUSrcB = 11: The second input to the ALU is the sign-extended, lower 16 bits of the IR shifted left 2 bits.
PCSource = 01: The contents of ALUOut (the branch target address) are sent to the PC for writing.
PCSource = 10: The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC + 4[31:28]) is
sent to the PC for writing.
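The operand and jump-target selections listed above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, values are treated as 32-bit unsigned integers, and any select code the text does not describe is rejected rather than guessed.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_second_operand(alu_src_b, b_reg, ir):
    """Pick the ALU's second input for the select codes described in the text."""
    imm = sign_extend16(ir)
    if alu_src_b == 0b00:
        return b_reg & 0xFFFFFFFF          # value of the B register
    if alu_src_b == 0b10:
        return imm & 0xFFFFFFFF            # sign-extended lower 16 bits of the IR
    if alu_src_b == 0b11:
        return (imm << 2) & 0xFFFFFFFF     # same, shifted left 2 bits
    # Assumption: codes not described in the text are left unhandled here.
    raise ValueError(f"select code {alu_src_b:02b} is not described in the text")


def jump_target(pc_plus_4, ir):
    """IR[25:0] shifted left 2 bits, concatenated with the top 4 bits of PC + 4."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
```

For example, `alu_second_operand(0b11, 0, 0x00000003)` yields 12, the word offset 3 converted to a byte offset.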
Break instruction execution down into steps, following our rule that data flows through
at most one major functional unit per step (i.e., balance the work across steps).
Instruction Fetch:
IR <= Memory[PC];
PC <= PC + 4;
Instruction Decode and Register Fetch:
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
Memory Reference:
R-type:
ALUOut <= A op B;
Branch:
Write-back step:
The write actually takes place at the end of the cycle, on the clock edge.
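The fetch and decode register transfers above can be mimicked in a few lines of Python. This is a rough sketch, not a full simulator: `fetch` and `decode` are hypothetical names, memory is assumed to be a dict from byte address to 32-bit word, and the register file is assumed to be a 32-entry list.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fetch(memory, pc):
    """IR <= Memory[PC]; PC <= PC + 4."""
    ir = memory[pc]
    return ir, (pc + 4) & 0xFFFFFFFF


def decode(regs, ir, pc):
    """A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]];
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2).
    pc is the value already incremented by the fetch step."""
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out


# Example: the word 0x10220003 has rs = 1, rt = 2, and a 16-bit offset of 3.
regs = [0] * 32
regs[1], regs[2] = 5, 9
ir, pc = fetch({0: 0x10220003}, 0)
a, b, alu_out = decode(regs, ir, pc)
print(a, b, alu_out)  # prints: 5 9 16
```

Each function touches one major unit (memory, or the register file plus the adder), which mirrors the one-unit-per-step rule.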
Answer:
The mix is 25% loads (1% load byte + 24% load word), 10% stores (1%
store byte + 9% store word), 11% branches (6% beq, 5% bne), 2% jumps
(1% jal + 1% jr), and 52% ALU (all the rest of the mix, which we assume to
be ALU instructions). From Figure 5.30 on page 329, the number of clock
cycles for each instruction class is the following:
Loads: 5; Stores: 4; ALU instructions: 4; Branches: 3; Jumps: 3.
The ratio $\text{Instruction count}_i / \text{Instruction count}$
is simply the instruction frequency for the instruction class i. We
can therefore substitute to obtain
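As a quick numerical check of that substitution, the average CPI is the sum over classes of frequency times cycles per class. The Python sketch below uses only the mix and cycle counts quoted in the answer; the dictionary keys and the `average_cpi` helper are illustrative names, not from the source.

```python
# Instruction frequencies from the answer's mix (fractions of all instructions).
mix = {"loads": 0.25, "stores": 0.10, "alu": 0.52, "branches": 0.11, "jumps": 0.02}

# Clock cycles per instruction class, as listed in the answer.
cycles = {"loads": 5, "stores": 4, "alu": 4, "branches": 3, "jumps": 3}


def average_cpi(mix, cycles):
    """Weighted average: sum of (frequency of class i) * (cycles of class i)."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = average_cpi(mix, cycles)
print(f"{cpi:.2f}")  # prints: 4.12
```

The terms are 1.25 + 0.40 + 2.08 + 0.33 + 0.06, so under these figures the average instruction takes a little over four clock cycles.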