FIRST SEMESTER EXAMINATIONS June 2000
ADVANCED COMPUTER ARCHITECTURE (623.305)
Answer:
Note that (Ri) denotes the content of register Ri, and that Memory(10) initially
contains 64.
b) Are there any resource dependencies if only one copy of each functional unit
is available in the CPU?
Answer:
a) [Figure: dependence graph among instructions S1, S2, S3, S4, and S5]
b) There are storage dependencies between instruction pairs (S2, S5) and (S4, S5).
There is a resource dependence between S1 and S2 on the load unit, and another
between S4 and S5 on the store unit.
QUESTION 3 (8 Marks)
Compare the PRAM models with physical models of real parallel computers in each of
the following categories:
a) Which PRAM variant can best model SIMD machines and how?
Answer:
a) Since the processing elements of a SIMD machine read and write data from
different memory modules synchronously, no access conflicts should arise.
Thus, any PRAM variant can be used to model SIMD machines.
b) The processors in a MIMD machine can read the same memory location
simultaneously. However, writing to the same memory location is
prohibited. Thus, the CREW-PRAM can best model a MIMD machine.
QUESTION 4 (7 Marks)
Answer:
Consider the following pipelined processor with four stages. This pipeline has a total
evaluation time of six clock cycles. All successor stages must be used after each clock
cycle.
[Figure: four-stage pipeline with stages S1, S2, S3, and S4 between Input and Output]
a) Specify the reservation table for this pipeline with six columns and four rows.
b) List the set of forbidden latencies between task initiations.
c) Draw the state diagram which shows all possible latency cycles.
d) List all greedy cycles from the state diagram.
e) Determine the minimal average latency (MAL).
Answer:
a) Reservation table:
1 2 3 4 5 6
S1 X X
S2 X X
S3 X
S4 X
e) MAL = (1 + 1 + 1 + 5)/4 = 2, the average latency of the latency cycle (1, 1, 1, 5).
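The steps asked for in parts (b) to (e) can be sketched in Python. This is only an illustration under a stated assumption: the column positions of the marks in the reservation table are not recoverable here, so the table in `TABLE` (S1 busy in cycles 1 and 5, S2 in cycles 2 and 6, S3 in cycle 3, S4 in cycle 4) is a hypothetical placement chosen because it matches the row counts and the latency cycle (1, 1, 1, 5) used in the MAL computation. All function names are illustrative.

```python
from collections import deque
from fractions import Fraction

# ASSUMPTION: hypothetical mark positions (stage -> busy clock cycles, 1-based).
# Only the number of marks per row is known from the reservation table.
TABLE = {"S1": [1, 5], "S2": [2, 6], "S3": [3], "S4": [4]}
N_COLS = 6


def forbidden_latencies(table):
    """Distances between any two marks in the same row."""
    return sorted({b - a for cycles in table.values()
                   for a in cycles for b in cycles if b > a})


def state_diagram(forbidden, n_cols):
    """Build the collision-vector state diagram.

    A state is an integer bit mask: bit (i - 1) set means latency i is forbidden.
    Latency n_cols stands for every latency >= n_cols (all lead to the initial state).
    """
    initial = sum(1 << (f - 1) for f in forbidden)
    edges = {}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state in edges:
            continue
        edges[state] = {}
        for lat in range(1, n_cols + 1):
            if (state >> (lat - 1)) & 1:
                continue  # this latency would cause a collision
            nxt = (state >> lat) | initial
            edges[state][lat] = nxt
            queue.append(nxt)
    return initial, edges


def greedy_cycle(initial, edges):
    """Always take the smallest permitted latency until a state repeats."""
    path, seen, state = [], {}, initial
    while state not in seen:
        seen[state] = len(path)
        lat = min(edges[state])
        path.append(lat)
        state = edges[state][lat]
    return path[seen[state]:]


def minimal_average_latency(edges):
    """Smallest average latency over all simple cycles (brute-force search)."""
    best = None

    def dfs(start, state, total, count, visited):
        nonlocal best
        for lat, nxt in edges[state].items():
            if nxt == start:
                avg = Fraction(total + lat, count + 1)
                if best is None or avg < best:
                    best = avg
            elif nxt > start and nxt not in visited:
                # Each cycle is explored once, starting from its smallest state.
                dfs(start, nxt, total + lat, count + 1, visited | {nxt})

    for s in edges:
        dfs(s, s, 0, 0, {s})
    return best


forbidden = forbidden_latencies(TABLE)
initial, edges = state_diagram(forbidden, N_COLS)
cycle = greedy_cycle(initial, edges)
mal = minimal_average_latency(edges)
print(forbidden, format(initial, "b"), cycle, mal)
```

Under the assumed table the only forbidden latency is 4, and the greedy cycle from the initial state has an average latency of 2. That is also the lower bound given by the maximum number of marks in any row.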
QUESTION 6 (8 Marks)
A uniprocessor computer can operate in either scalar or vector mode. In vector mode,
computations can be performed nine times faster than in scalar mode. A certain
benchmark program took time T to run on this computer. Further, it was found that
25% of T was attributed to the vector mode. In the remaining time, the machine
operated in the scalar mode.
a) Calculate the effective speedup under the above conditions as compared with the
condition when the vector mode is not used at all. Also calculate α, the percentage
of code that has been vectorized in the above program.
b) Suppose we double the speed ratio between the vector mode and the scalar mode
by hardware improvements. Calculate the effective speedup that can be achieved.
c) Suppose the same speedup obtained in part (b) must be obtained by compiler
improvements instead of hardware improvements. What would be the new
vectorization ratio α that should be supported by the vectorizing compiler for the
same benchmark program?
Answer:
a) If the vector mode is not used at all, the execution time will be:
0.75T + 9 × 0.25T = 3T
Therefore, effective speedup = 3T/T = 3. Let the fraction of vectorized code be α.
Then α = 9 × 0.25T/3T = 0.75.
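b) Suppose that the speed ratio between the vector mode and the scalar mode is
doubled, to 18. The vector part then takes half as long, so the execution time becomes:
0.75T + 0.25T/2 = 0.875T
Therefore, effective speedup = 3T/0.875T = 24/7 ≈ 3.43.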
c) Suppose the speed for vector mode computation is still nine times as fast as that
for scalar mode. To maintain the effective speedup of 3.43, the vectorization ratio
α must satisfy the following relation:
\[
\frac{1}{\dfrac{\alpha}{9} + \dfrac{1-\alpha}{1}} = \frac{24}{7}
\]
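The arithmetic of parts (a) to (c) can be checked with a short Python sketch. It assumes the usual speedup relation 1/(α/r + (1 − α)), where r is the vector-to-scalar speed ratio and α is the vectorized fraction measured against scalar-only execution time. The function names `speedup` and `alpha_for_speedup` are illustrative, not part of the exam.

```python
from fractions import Fraction


def speedup(alpha, ratio):
    """Speedup over scalar-only execution when a fraction alpha of the
    scalar-only time runs `ratio` times faster in vector mode."""
    return 1 / (alpha / ratio + (1 - alpha))


def alpha_for_speedup(target, ratio):
    """Solve 1 / (alpha/ratio + 1 - alpha) = target for alpha."""
    return (1 - 1 / target) / (1 - Fraction(1, ratio))


# Part (a): 25% of the measured time T is spent in vector mode, ratio 9.
vector_share = Fraction(1, 4)
scalar_only_time = (1 - vector_share) + 9 * vector_share  # in units of T
alpha = 9 * vector_share / scalar_only_time
speedup_a = speedup(alpha, 9)

# Part (b): same alpha, speed ratio doubled to 18.
speedup_b = speedup(alpha, 18)

# Part (c): speed ratio back at 9, same speedup as in part (b).
alpha_c = alpha_for_speedup(speedup_b, 9)

print(alpha, speedup_a, speedup_b, float(speedup_b), alpha_c, float(alpha_c))
```

Exact fractions are used so that 24/7 appears as such rather than as a rounded decimal. Solving the relation in part (c) this way gives α = 51/64, that is, roughly 80% of the code would have to be vectorized.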
On a Fujitsu VP2000, the vector processing unit is equipped with two load/store
pipelines plus five functional pipelines as shown below.
[Figure: Fujitsu VP2000 block diagram. Vector unit: mask registers, two mask pipelines, vector registers, two load/store pipelines, two ALU pipelines, and a divide pipeline. Scalar units: buffer storage and scalar execution unit. Also shown: main storage unit and channel processor.]
Answer:
[Figure: pipeline timing diagram with a Time axis; surviving labels: (BC+DE), Add, BC+DE+FG]
Given the following assembly language code, exploit the maximum degree of
parallelism among the 16 instructions, assuming no resource conflicts and multiple
functional units are available simultaneously. For simplicity, no pipelining is assumed.
All instructions take one machine cycle to execute. Ignore all other overhead.
a) Draw a program graph with 16 nodes to show the flow relationships among the 16
instructions.
Answer:
(a) [Figure: program graph of the 16 instruction nodes; edges are labeled with the registers R1 to R12 and the results X and Y]
(b)
QUESTION 9 (8 Marks)
Explain the applicability and the restrictions involved in using Amdahl’s law and
Gustafson’s law to estimate the speedup performance of an n-processor system
compared with that of a single-processor system. Ignore all communication overheads.
(please answer in one paragraph).
Answer:
Amdahl’s law is based on a fixed workload, where the problem size is fixed regardless
of the machine size. Gustafson’s law is based on a scaled workload, where the problem
size is increased with the machine size so that the solution time is the same for
sequential and parallel executions.
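The contrast between the two laws can be made concrete with a small Python sketch. It uses the standard textbook forms of both laws and, as the question states, ignores communication overhead. The parameter name `seq_fraction` and the sample values are illustrative assumptions. In Amdahl's law the sequential fraction refers to the fixed workload run on one processor. In Gustafson's law it refers to the scaled workload run on n processors.

```python
def amdahl_speedup(seq_fraction, n):
    """Fixed workload: only the parallel part shrinks as n grows."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / n)


def gustafson_speedup(seq_fraction, n):
    """Scaled workload: the parallel part grows with n, run time stays fixed."""
    return seq_fraction + (1.0 - seq_fraction) * n


# Illustrative values: 10% sequential work, growing machine size.
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.1, n), 2), round(gustafson_speedup(0.1, n), 2))
```

With a 10% sequential fraction, the fixed-workload speedup can never exceed 1/0.1 = 10 however many processors are added, while the scaled-workload speedup keeps growing linearly with n.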
QUESTION 10 (5 Marks)
Answer:
In VLSI systems, networks with many dimensions require more and longer wires than
low-dimensional networks. Thus, high-dimensional networks cost more and run more
slowly than low-dimensional networks. Under the assumption of constant wire
bisection, low-dimensional networks have wide channels, and high-dimensional
networks have narrow channels. With the wormhole routing method, which is used by
most of the second- and third-generation multicomputers, the wider channels provide a
lower latency, less contention, and higher hot-spot throughput.