
TAGORE ENGINEERING COLLEGE

RATHINAMANGALAM, CHENNAI - 600 127.


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIT TEST-II
CS6801 – MULTI-CORE ARCHITECTURES AND PROGRAMMING
ANSWER KEY

PART-A
1. List the different challenges of parallel programming design.
i) Synchronization challenge
ii) Communication challenge
iii) Load balancing challenge
iv) Scalability challenge
2. What is a Memory Fence?
A memory fence (also called a memory barrier) is a processor-dependent operation that
ensures one thread can see the memory operations of other threads in the correct order
during processing.
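A minimal C11 sketch of how a fence is used, assuming a producer/consumer pair of threads; the names data, ready, producer and consumer are illustrative only:
```c
#include <stdatomic.h>

int data;                 /* ordinary shared data                  */
atomic_int ready = 0;     /* flag published after the fence        */

void producer(void) {
    data = 42;                                   /* write the payload        */
    atomic_thread_fence(memory_order_release);   /* fence: make the write    */
                                                 /* visible before the flag  */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

void consumer(void) {
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                        /* spin until published     */
    atomic_thread_fence(memory_order_acquire);   /* fence: order the reads   */
    /* data is now guaranteed to be 42 */
}
```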
3. What is partitioning? Explain the ways of partitioning.
Partitioning performs load balancing by dividing the computation and data into
pieces. There are two ways of partitioning.
i) Data-centric partitioning (domain decomposition): a parallel design method that
divides the data of the serial program into small pieces and then determines how to
associate the computations with the data.
ii) Computation-centric partitioning (functional decomposition): the process of
dividing the computation of the program into pieces and then analyzing how to associate
data with the individual computations.
4. Give the ISO-efficiency relation.
For a parallel system with efficiency ε(n,p), define

C = ε(n,p) / (1 - ε(n,p))   and
T0(n,p) = (p-1)σ(n) + p·κ(n,p)

To maintain the scalability of the parallel system, it should satisfy the condition

T(n,1) ≥ C·T0(n,p)

The ISO-efficiency relation is used to determine the range of processors over which
the performance efficiency can be maintained.
5. Define deadlock and livelock.
Deadlock: Deadlock arises when one thread waits for a resource that is already locked by
another thread, while that thread in turn waits for a resource the first thread holds.
Livelock: Livelock occurs when two threads continuously conflict with each other and back
off, so neither makes progress.
6. List the steps for avoiding data races.
i) Confirm that only one thread can update the variable at a time.
ii) Place a synchronization lock around all accesses to the variable.
iii) Ensure that a thread acquires the lock before referencing the variable.
7. What is a Mutex?
The simplest method of providing synchronization is the mutex (mutually exclusive lock).
Only one thread in the program can acquire a mutex lock at a time. The mutex is the
simplest lock implementation that can be used in a program; a short sketch follows.
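A minimal pthreads sketch of a mutex protecting a shared counter; the counter and worker names are illustrative. It also applies the data-race steps from question 6 (one updater at a time, lock around every access):
```c
#include <pthread.h>

long counter = 0;                                  /* shared variable   */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* protects counter  */

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* only one thread may enter */
        counter++;                    /* critical section          */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}
```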
8. List out the different types of locks.
i) Mutex locks
ii) Recursive locks
iii) Reader-writer locks (sketched below)
iv) Spin locks
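A hedged pthreads sketch of the reader-writer lock from the list above; shared_value, reader and writer are illustrative names:
```c
#include <pthread.h>

int shared_value = 0;
pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;

int reader(void) {
    pthread_rwlock_rdlock(&rwlock);   /* many readers may hold this at once */
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void writer(int v) {
    pthread_rwlock_wrlock(&rwlock);   /* writers get exclusive access */
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}
```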
9. What is a spin lock? List its advantages.
A spin lock is a lock in which a waiting thread does not sleep; instead it repeatedly
polls (spins on) the lock until it becomes available. While one thread holds the lock
and continues its work, all other threads busy-wait until the data is unlocked.
Advantage:
Because a waiting thread never sleeps, it acquires the lock as soon as the data is
released by the other thread, avoiding the cost of a context switch.
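A minimal pthreads spin-lock sketch, with illustrative names (total, add); the critical section is kept short because waiters burn CPU while spinning:
```c
#include <pthread.h>

pthread_spinlock_t spin;
long total = 0;

void init(void)  { pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE); }

void add(long x) {
    pthread_spin_lock(&spin);    /* busy-waits (spins) instead of sleeping */
    total += x;                  /* keep the critical section very short   */
    pthread_spin_unlock(&spin);
}
```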
10. What is a barrier?
In parallel programming, some restriction mechanisms allow synchronization among
multiple threads. One such mechanism is the barrier. With this technique, each thread
has to wait for all other threads to reach the barrier before any of them proceeds to
the next execution step.
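A minimal pthreads barrier sketch, assuming NTHREADS participating threads; names are illustrative:
```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
pthread_barrier_t barrier;   /* initialised for NTHREADS participants */

void *phase_worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld: phase 1 done\n", id);
    pthread_barrier_wait(&barrier);              /* wait for all threads   */
    printf("thread %ld: phase 2 begins\n", id);  /* all passed the barrier */
    return NULL;
}

/* in main: pthread_barrier_init(&barrier, NULL, NTHREADS); */
```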
PART-B

1. Explain the challenges in parallel programming design.


Parallel programming improves system performance by implementing threads.
However, threading adds complexity to the programming, and the complexity of a
parallel program increases when more than one functionality occurs in the program.
There are four challenges that are faced in parallel programming.
i) Synchronization challenge:
It is the process in which two or more threads coordinate their functions and activities.
For example, one thread waits for another thread to complete its task before continuing
its own operation.
There are two synchronization operations:
i) Mutual exclusion: one thread can lock the critical section and operate on the shared
data; other threads have to wait until the thread holding the critical section completes
its task.
ii) Condition synchronization: a thread is blocked until the system reaches some specific
condition. Here the threads wait to enter the critical section until the defined
condition is reached.
The synchronization primitives are semaphores, locks and condition variables; a
condition-variable sketch follows.
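A minimal pthreads sketch of condition synchronization using a condition variable; ready, wait_until_ready and set_ready are illustrative names:
```c
#include <pthread.h>

pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
int ready = 0;                       /* the condition being waited on */

void wait_until_ready(void) {
    pthread_mutex_lock(&m);
    while (!ready)                   /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cv, &m);  /* blocks until signalled            */
    pthread_mutex_unlock(&m);
}

void set_ready(void) {
    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_signal(&cv);        /* wake one waiting thread */
    pthread_mutex_unlock(&m);
}
```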
ii) Communication challenge:
A message is the method of communication used to transfer information from one node to
another.
The three concerns of message communication are:
i) Multigranularity
ii) Multithreading
iii) Multitasking
The different kinds of message-passing communication are:
1) Inter-process communication: the two communicating threads reside in two different
processes.
2) Intra-process communication: the two threads that communicate with messages reside
in the same process.
3) Process-to-process communication: two different processes communicate through
messages; a message-passing sketch follows.
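A minimal MPI sketch of process-to-process message passing, assuming two processes (ranks 0 and 1); the payload value is illustrative:
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                       /* sender process   */
        value = 99;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                /* receiver process */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```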
iii) Load balancing challenge:
Load balancing is a major need of parallel programming. It can be done effectively by
using appropriate loop scheduling and partitioning. Partitioning performs load balancing
by dividing the computation and data into pieces. There are two ways of partitioning.
a) Data-centric partitioning (domain decomposition): determines how to associate the
computations with data.
b) Computation-centric partitioning (functional decomposition): analyzes how to
associate data with the individual computations.
Load balancing can be implemented in two ways (a scheduling sketch follows this list).
i) Static load balancing: the tasks are mapped to processors before execution in order
to minimize the communication overhead of the parallel program.
ii) Dynamic load balancing: dynamic load balancing algorithms analyze the tasks at run
time and update the current mapping of tasks to the processors.
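A minimal OpenMP sketch contrasting static and dynamic loop scheduling; the work function is a hypothetical stand-in for uneven per-iteration work:
```c
#include <omp.h>
#include <math.h>

/* hypothetical per-iteration work whose cost varies with i */
static double work(int i) { return sqrt((double)i); }

void run(int n, double *out) {
    /* static: iterations are divided evenly among threads up front */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) out[i] = work(i);

    /* dynamic: an idle thread grabs the next chunk of 4 iterations
       at run time, which balances uneven per-iteration costs */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++) out[i] = work(i);
}
```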
iv) Scalability challenge:
Scalability is the ability of a parallel program to increase its performance as the
number of processors increases.
Scalability is limited by hardware interaction, where the presence of multiple threads
causes the hardware to become less effective.
It is also limited by the software, where the synchronization overhead becomes a
greater issue.

2. Analyze the performance of a parallel program by deriving Amdahl's law and the
Gustafson-Barsis law.
(i) Amdahl's Law:
Used to determine the limit on increasing the number of processors, and also to
determine the asymptotic speedup achievable as the number of processors increases.
Definition:
Let g be the fraction of operations in a computation that must be performed
sequentially, where 0 ≤ g ≤ 1. The maximum speedup ψ achievable by a parallel computer
with p processors performing the computation is

ψ(n,p) ≤ 1 / (g + (1-g)/p)

Derivation:
The speedup of parallel program execution is

ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))

We know that κ(n,p) > 0, so we can write the speedup as

ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p)

Let us assume that g is the sequential fraction of the computation:

g = σ(n) / (σ(n) + φ(n))

Then we can write

ψ(n,p) ≤ 1 / (g + (1-g)/p)
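A small worked example of the bound, as a C sketch; the serial fraction g = 0.1 is an illustrative value:
```c
#include <stdio.h>

/* Amdahl's law: speedup bound for serial fraction g on p processors */
double amdahl(double g, int p) {
    return 1.0 / (g + (1.0 - g) / p);
}

int main(void) {
    /* g = 0.1: even with 8 processors the speedup is at most ~4.7x */
    printf("p=8:          %.2f\n", amdahl(0.1, 8));
    printf("p->inf limit: %.2f\n", 1.0 / 0.1);   /* asymptote = 1/g */
    return 0;
}
```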

(ii) Gustafson-Barsis Law:
It starts from the parallel computation and estimates how much faster the parallel
computation is than the same program executed on a single processor.
Definition: Let a problem of size n be solved with p processors, and let T denote the
fraction of total execution time spent in serial code. Then the maximum speedup ψ
achievable is

ψ(n,p) = p + (1-p)T
Derivation:
We know that the equation for speedup with κ(n,p) > 0 is

ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p) -----------------(A)

Let T denote the fraction of total parallel execution time spent in serial code; the
parallel operations then take the fraction 1-T.

T = σ(n) / (σ(n) + φ(n)/p) -------------------(1)

1-T = (φ(n)/p) / (σ(n) + φ(n)/p) -------------------(2)

From eqn (2),

φ(n) = (σ(n) + φ(n)/p)(1-T)p -----------(3)

From eqn (1),

σ(n) = (σ(n) + φ(n)/p)T ----------------(4)

Substitute equations (3) and (4) in equation (A) and we get the following:

ψ(n,p) = T + (1-T)p
(or)
ψ(n,p) = p + (1-p)T
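A small worked example of the scaled-speedup formula, as a C sketch; T = 0.05 and p = 8 are illustrative values:
```c
#include <stdio.h>

/* Gustafson-Barsis law: scaled speedup with serial fraction T of the
   parallel execution time on p processors */
double gustafson(double T, int p) {
    return p + (1 - p) * T;      /* equivalently T + (1 - T) * p */
}

int main(void) {
    /* T = 0.05 on 8 processors: speedup = 8 + (1-8)*0.05 = 7.65 */
    printf("%.2f\n", gustafson(0.05, 8));
    return 0;
}
```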

3. Derive the Karp-Flatt metric, used to improve the performance of a parallel program.
Both Amdahl's law and the Gustafson-Barsis law ignore the parallel overhead, but here
κ(n,p) is considered. This gives a more accurate basis for high-performance parallel
program design.
Definition:
Given a parallel computation with speedup ψ on p processors, where p > 1, the
experimentally determined serial fraction e is

e = (1/ψ - 1/p) / (1 - 1/p)
Derivation:
We know that the execution time of the parallel program is

T(n,p) = σ(n) + φ(n)/p + κ(n,p) --------------(1)

A serial program has no interprocessor communication or parallel overhead, so its
execution time is

T(n,1) = σ(n) + φ(n) ---------------------(2)

The experimentally determined serial fraction e is defined by

σ(n) + κ(n,p) = T(n,1)·e ------------------(3)

Substitute equation (3) in (1):

T(n,p) = T(n,1)·e + φ(n)/p -----------------(4)

From equation (3), σ(n) = T(n,1)·e - κ(n,p). But in a serial program, parallel overhead
is not possible, so κ(n,p) = 0. Therefore,

σ(n) = T(n,1)·e -------------------------(5)

Substitute equation (5) in (2) to get

φ(n) = T(n,1)(1-e) --------------------------(6)

Substitute equation (6) in (4) to get the parallel execution time:

T(n,p) = T(n,1)·e + T(n,1)(1-e)/p -------------------(7)

We know that the speedup is ψ = T(n,1) / T(n,p). Dividing both sides of equation (7) by
T(n,1) gives 1/ψ = e + (1-e)/p. Rearranging, we finally get

e(1 - 1/p) = 1/ψ - 1/p

where e is the experimentally determined serial fraction.
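A small worked example of the metric, as a C sketch; the measured speedup 4.71 on 8 processors is an illustrative value (it yields e ≈ 0.1, matching the Amdahl example above):
```c
#include <stdio.h>

/* Karp-Flatt metric: experimentally determined serial fraction e
   from a measured speedup psi on p processors (p > 1) */
double karp_flatt(double psi, int p) {
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p);
}

int main(void) {
    /* measured speedup 4.71 on 8 processors gives e close to 0.1 */
    printf("e = %.3f\n", karp_flatt(4.71, 8));
    return 0;
}
```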

4. Derive the ISO-efficiency relation from the scalability of a parallel program.

The scalability of a parallel system is a measure of its ability to increase and improve
performance as the number of processors increases.
Here, the ISO-efficiency relation is formalized to stabilize the performance and
efficiency.
Derivation:
We know that the speedup is

ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))

Multiplying the numerator and denominator by p,

ψ(n,p) ≤ p(σ(n) + φ(n)) / (pσ(n) + φ(n) + p·κ(n,p)) ----------------(1)

We can rewrite pσ(n) as follows:

pσ(n) = σ(n) + (p-1)σ(n) --------------------(2)

Substitute (2) in (1):

ψ(n,p) ≤ p(σ(n) + φ(n)) / (σ(n) + φ(n) + (p-1)σ(n) + p·κ(n,p)) ----------(3)

We already know that

T0(n,p) = (p-1)σ(n) + p·κ(n,p) --------------------- (4)

where T0(n,p) is the total time spent by all processes not performing useful work: in
the sequential portions, p-1 processes sit idle while one process executes the
sequential code.
Substitute eqn (4) in (3):

ψ(n,p) ≤ p(σ(n) + φ(n)) / (σ(n) + φ(n) + T0(n,p))

We know that the efficiency equals the speedup divided by the number of processors,
i.e.,

ε(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n) + T0(n,p)) ------------------- (5)

Divide the numerator and denominator of eqn (5) by T(n,1) = σ(n) + φ(n):

ε(n,p) ≤ 1 / (1 + T0(n,p)/T(n,1))

Rearranging,

T(n,1) ≥ [ε(n,p) / (1 - ε(n,p))] · T0(n,p)

Here, the constant level of efficiency, i.e., the ISO-efficiency relation, is

C = ε(n,p) / (1 - ε(n,p))   and   T0(n,p) = (p-1)σ(n) + p·κ(n,p)

To improve the scalability of the parallel system, it should satisfy the condition

T(n,1) ≥ C·T0(n,p)

The ISO-efficiency relation is used to determine the range of processors for which the
performance efficiency can be maintained.
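A small worked check of the condition T(n,1) ≥ C·T0(n,p), as a C sketch; the times and the 80% efficiency target are illustrative values:
```c
#include <stdio.h>

/* ISO-efficiency check: to hold efficiency eps on p processors, the
   sequential time T1 must satisfy T1 >= C * T0, where
   C = eps / (1 - eps) and T0 is the total overhead time. */
int scalable(double T1, double T0, double eps) {
    double C = eps / (1.0 - eps);
    return T1 >= C * T0;
}

int main(void) {
    /* illustrative numbers: T1 = 100, T0 = 20, target efficiency 0.8,
       so C = 4 and C*T0 = 80 <= 100: the efficiency can be maintained */
    printf("%s\n", scalable(100.0, 20.0, 0.8) ? "maintains 80%" : "does not");
    return 0;
}
```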
