
Problem 1

(1)

Input device: keyboard, mouse, scanner and so on.
Output device: monitor, printer and so on.
Storage (on-chip or off-chip): memory, hard disk, CD-ROM and so on.
Control: control circuit in the CPU.
Data-path: ALU in the CPU.

(2)

Instruction set architecture (ISA)

(3)

Conditional branch: PC = current PC + 4 + Branch Addr
(Branch Addr is the last 16 bits of the instruction, sign-extended and shifted left by 2 bits.)

Jump: PC (highest 4 bits) = (current PC + 4) (highest 4 bits)
      PC (lowest 28 bits) = Jump Addr
(Jump Addr is the last 26 bits of the instruction shifted left by 2 bits.)

(4)

Compiler, instruction set, program

(5)

Micro benchmarks, application benchmark set
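
The branch and jump rules in (3) can be checked with a small sketch. The function names are hypothetical, and the sketch assumes standard 32-bit MIPS encoding: the 16-bit immediate is sign-extended, and the upper 4 bits of a jump target come from PC + 4 (which equals the upper bits of the current PC for almost every address).

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, instruction):
    """PC + 4 + (sign-extended 16-bit immediate << 2), wrapped to 32 bits."""
    offset = sign_extend_16(instruction) << 2
    return (pc + 4 + offset) & 0xFFFFFFFF


def jump_target(pc, instruction):
    """Upper 4 bits from PC + 4, lower 28 bits from the 26-bit field << 2."""
    upper = (pc + 4) & 0xF0000000
    lower = (instruction & 0x03FFFFFF) << 2
    return upper | lower


if __name__ == "__main__":
    # Hypothetical instruction words used only for illustration.
    print(hex(branch_target(0x00400000, 0x10000003)))
    print(hex(jump_target(0x00400000, 0x08100004)))
```

A negative immediate such as 0xFFFF (that is, -1) moves the target backwards by one instruction relative to PC + 4, which is why the sign extension matters for loops.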

Problem 2
0100 0100 0000 1111 1100 0000 0000 0000

a) 2^30 + 2^26 + 2^19 + 2^18 + 2^17 + 2^16 + 2^15 + 2^14 = 1141882880

b) The range of the positive floating point numbers is from 2^-127 ≈ 5.9 × 10^-39 to 2^128 ≈ 3.4 × 10^38.

c) 0100 01 - 00 000 - 0 1111 - 1100 0 - 000 0000 0000
   OP = 0x21, R[rd] = R[rs] + R[rt]
   rs = $0, rt = $15 and rd = $24
   (This one has a typo, so everyone gets credit; however, you need to grasp the idea of how the control of a CPU decodes an instruction.)

1001 0100 0000 1111 1100 0100 0000 1000

d) -1810906104
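
The integer readings and the field split can be reproduced with a short sketch. The helper names are hypothetical; the field boundaries assume the standard MIPS R-type layout (6-5-5-5-5-6 bits). The sketch only splits the word into fields and does not try to resolve the typo mentioned in c).

```python
def parse_bits(text):
    """Turn a spaced bit string such as '0100 0100 ...' into an integer."""
    return int(text.replace(" ", ""), 2)


def to_signed32(word):
    """Read a 32-bit pattern as a two's-complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def r_type_fields(word):
    """Split a 32-bit word into MIPS R-type fields (assumed layout)."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


WORD_1 = parse_bits("0100 0100 0000 1111 1100 0000 0000 0000")
WORD_2 = parse_bits("1001 0100 0000 1111 1100 0100 0000 1000")

if __name__ == "__main__":
    print(WORD_1)               # value of the first pattern as an integer
    print(r_type_fields(WORD_1))
    print(to_signed32(WORD_2))  # signed value of the second pattern
```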

Problem 3
Memory-Memory (2 points)
ADD A, B, C
SUB D, A, C

Load-Store (3 points)
Load R1, B
Load R2, C
ADD R3, R1, R2
SUB R3, R3, R2
STORE D, R3

Memory-memory is more efficient as measured by code size because it uses fewer instructions. (1 point)

Memory-memory: 2 x (6 + 3 x 24) / 8 = 19.5 bytes
Load-Store: 5 x 32 / 8 = 20 bytes
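
The code-size comparison is plain arithmetic and can be sketched as follows. The reading of the numbers is an assumption taken from the expressions above: a memory-memory instruction has a 6-bit opcode plus three 24-bit addresses, and a load-store instruction is 32 bits wide.

```python
def code_size_bytes(num_instructions, bits_per_instruction):
    """Total code size in bytes for fixed-width instructions."""
    return num_instructions * bits_per_instruction / 8


# Assumed encodings, read off the expressions in the text.
MEM_MEM_BITS = 6 + 3 * 24   # opcode + three memory addresses
LOAD_STORE_BITS = 32        # fixed 32-bit instruction

mem_mem_size = code_size_bytes(2, MEM_MEM_BITS)
load_store_size = code_size_bytes(5, LOAD_STORE_BITS)

if __name__ == "__main__":
    print(mem_mem_size, load_store_size)
```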

Load-Store architecture is more efficient as measured by memory bandwidth requirement because it can keep the intermediate result in a register instead of memory, and therefore makes fewer memory accesses. (1 point)

Memory-memory: 3 memory accesses for each instruction, 6 memory accesses in total
Load-Store: 3 accesses in total, for B, C and D.


Load-Store architecture is preferred for high performance because operations on registers are much faster than those on memory. (1 point)

Memory-memory: 2 x (4 + 3*10) = 68 cycles
Load-store: 4 + 4 + 3*10 = 38 cycles
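
The cycle counts can be evaluated with a minimal sketch. The cost model is an assumption inferred from the two expressions: each arithmetic operation costs 4 cycles and each memory access costs 10 cycles, with no overlap between them.

```python
ALU_CYCLES = 4    # assumed cost of one ADD or SUB
MEM_CYCLES = 10   # assumed cost of one memory access


def total_cycles(alu_ops, memory_accesses):
    """Cycles for a sequence with no overlap between operations."""
    return alu_ops * ALU_CYCLES + memory_accesses * MEM_CYCLES


# Memory-memory: 2 instructions, each with 3 memory accesses.
mem_mem_cycles = total_cycles(alu_ops=2, memory_accesses=2 * 3)

# Load-store: 2 register operations plus accesses to B, C and D.
load_store_cycles = total_cycles(alu_ops=2, memory_accesses=3)

if __name__ == "__main__":
    print(mem_mem_cycles, load_store_cycles)
```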

Problem 4
For option 1, the total execution time is (1-24%)*1.3 = 0.988 (3 points)
For option 2, the total execution time is (1-24%/2)*1.1 = 0.968 (3 points)

Therefore, option 2 is faster than option 1. Option 2 will be chosen. (2 points)
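
The comparison can be reproduced with a few lines. The interpretation is an assumption, since the problem statement is not shown here: 24% is taken as the fraction of work that an option removes (all of it for option 1, half of it for option 2), and 1.3 and 1.1 are taken as the factors by which each option stretches the cycle time, relative to a baseline time of 1.

```python
def relative_time(fraction_removed, cycle_time_factor):
    """Execution time relative to a baseline of 1.0."""
    return (1 - fraction_removed) * cycle_time_factor


option_1 = relative_time(0.24, 1.3)
option_2 = relative_time(0.24 / 2, 1.1)

if __name__ == "__main__":
    print(round(option_1, 3), round(option_2, 3))
    print("option 2 is faster" if option_2 < option_1 else "option 1 is faster")
```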

Problem 5


        addi  $a0, $zero, 2
        sll   $a0, $a0, 24        # $a0 = address of A[0]
        addi  $a1, $zero, 5
        sll   $a1, $a1, 24        # $a1 = address of C[0]
        and   $s0, $s0, $zero     # $s0 = 0, i = 0
LOOP:   sll   $s1, $s0, 3         # $s1 = i * 8
        add   $s2, $a0, $s1       # $s2 = address of A[i]
        ldc1  $t3, -8($s2)        # $t3 = A[i - 1]
        add.d $t5, $t3, $t1       # $t5 = A[i - 1] + B
        add   $s3, $a1, $s1       # $s3 = address of C[i]
        sdc1  $t5, 0($s3)         # C[i] = A[i - 1] + B
        addi  $s0, $s0, 1         # i = i + 1
        blt   $s0, $t0, LOOP      # if i < N, continue the loop
Exit:
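
The address arithmetic in the loop can be traced with a short sketch. It assumes 8-byte (double-precision) elements, as implied by the shift by 3 and the -8 offset, and the base addresses built by the two addi/sll pairs; the function names are hypothetical.

```python
ELEMENT_SIZE = 8      # bytes per double, matching the shift left by 3
BASE_A = 2 << 24      # addi $a0, $zero, 2 ; sll $a0, $a0, 24
BASE_C = 5 << 24      # addi $a1, $zero, 5 ; sll $a1, $a1, 24


def load_address(i):
    """Address read by the load at -8($s2), i.e. the address of A[i - 1]."""
    return BASE_A + i * ELEMENT_SIZE - 8


def store_address(i):
    """Address written by the store at 0($s3), i.e. the address of C[i]."""
    return BASE_C + i * ELEMENT_SIZE


if __name__ == "__main__":
    for i in range(1, 4):
        print(i, hex(load_address(i)), hex(store_address(i)))
```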

Problem 6
1) Truth table

       Inputs        | Output
   G     P     Cin   | Cout
   0     0     0     | 0
   0     1     0     | 0
   1     0     0     | 1
   1     1     0     | 1
   0     0     1     | 0
   0     1     1     | 1
   1     0     1     | 1
   1     1     1     | 1

Cout = G + P.Cin (2 points for equation and 2 points for truth table)
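
The carry-out equation can be enumerated over all eight input combinations with a minimal sketch. The row order is an assumption chosen so that Cin changes slowest and P fastest.

```python
from itertools import product


def carry_out(g, p, cin):
    """Cout = G + P.Cin, with + as OR and . as AND."""
    return g | (p & cin)


def truth_table():
    """Rows of (G, P, Cin, Cout), with Cin as the slowest-changing input."""
    return [(g, p, cin, carry_out(g, p, cin))
            for cin, g, p in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("G P Cin Cout")
    for g, p, cin, cout in truth_table():
        print(g, p, cin, "  ", cout)
```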
The structure of the PLA is as follows:

[Figure: PLA with an AND plane and an OR plane; Cin is among the inputs and Cout is the output.]

2)
C4 = G[0, 3] + P[0, 3].C0
C8 = G[4, 7] + P[4, 7].C4
C12 = G[8, 11] + P[8, 11].C8
G[0, 15] = G[12, 15] +
           P[12, 15].G[8, 11] +
           P[12, 15].P[8, 11].G[4, 7] +
           P[12, 15].P[8, 11].P[4, 7].G[0, 3]
P[0, 15] = P[12, 15].P[8, 11].P[4, 7].P[0, 3]
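
The group generate and propagate equations in 2) can be checked against ordinary integer addition. This is a sketch under stated assumptions: the per-bit signals are taken as g = a AND b and p = a OR b, operands are 16 bits wide, and all function names are hypothetical.

```python
def group_gp(a, b, lo, hi):
    """Group generate G[lo, hi] and propagate P[lo, hi] for bits lo..hi."""
    G, P = 0, 1
    for i in range(lo, hi + 1):
        g = (a >> i) & (b >> i) & 1          # bit i generates a carry
        p = ((a >> i) | (b >> i)) & 1        # bit i propagates a carry
        G = g | (p & G)
        P = P & p
    return G, P


def lookahead_carries(a, b, c0):
    """Return (C4, C8, C12, C16) using the two-level lookahead equations."""
    g0, p0 = group_gp(a, b, 0, 3)
    g1, p1 = group_gp(a, b, 4, 7)
    g2, p2 = group_gp(a, b, 8, 11)
    g3, p3 = group_gp(a, b, 12, 15)
    c4 = g0 | (p0 & c0)
    c8 = g1 | (p1 & c4)
    c12 = g2 | (p2 & c8)
    G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)   # G[0, 15]
    P = p3 & p2 & p1 & p0                                       # P[0, 15]
    c16 = G | (P & c0)
    return c4, c8, c12, c16


def reference_carry(a, b, c0, k):
    """Carry into bit k, computed by adding the low k bits directly."""
    mask = (1 << k) - 1
    return ((a & mask) + (b & mask) + c0) >> k


if __name__ == "__main__":
    print(lookahead_carries(0x1234, 0x0F0F, 1))
```

The last carry, C16 = G[0, 15] + P[0, 15].C0, is included only to exercise the 16-bit group signals; it follows the same pattern as the other three carries.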
