PART-A
1. List the different challenges of parallel programming design.
1. Synchronization challenge
2. Communication challenge
3. Load balancing challenge
4. Scalability challenge
2. What is Memory Fence?
A memory fence (also called a memory barrier) is a processor-dependent operation that
ensures one thread can see the memory operations of other threads in the intended order
during processing.
3. What is partitioning? Explain the ways of partitioning.
Partitioning performs load balancing by dividing the computation and data into
pieces. There are two ways of partitioning.
i) Data-centric partitioning (Domain decomposition): A parallel design method which
divides the data of the serial program into small pieces and then determines how to associate
the computations with the data.
ii) Computation-centric partitioning (Functional decomposition): The process of
dividing the computation of the program into pieces and analyzing how to associate data with
the individual computations.
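The two decomposition styles can be sketched in Python; the data set and the worker functions below are hypothetical, chosen only to contrast dividing the data versus dividing the computation:

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(8))

# i) Domain decomposition: divide the DATA into pieces and apply
#    the SAME computation to each piece.
def square_chunk(chunk):
    return [x * x for x in chunk]

with ThreadPoolExecutor(max_workers=2) as pool:
    halves = [data[:4], data[4:]]
    parts = list(pool.map(square_chunk, halves))
domain_result = parts[0] + parts[1]

# ii) Functional decomposition: divide the COMPUTATION into pieces;
#     each worker performs a DIFFERENT function on the same data.
def total(xs):
    return sum(xs)

def largest(xs):
    return max(xs)

with ThreadPoolExecutor(max_workers=2) as pool:
    f_sum = pool.submit(total, data)
    f_max = pool.submit(largest, data)
functional_result = (f_sum.result(), f_max.result())
```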
4. Give the ISO efficiency relation.
For a parallel system with efficiency ε(n,p), define
C = ε(n,p) / (1 - ε(n,p))
and
T0(n,p) = (p-1)σ(n) + pκ(n,p)
To improve the scalability of the parallel system, it should satisfy the following condition:
T(n,1) ≥ C T0(n,p)
The ISO efficiency relation is used to determine the range of processors for maintaining
the performance efficiency.
5. Define deadlock and livelock.
Deadlock: A deadlock arises when one thread waits for a resource that is already
locked by another thread, which in turn is waiting for a resource held by the first.
Livelock: A livelock occurs when two threads continuously conflict with each other and
back off, so neither makes progress.
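A common way to avoid deadlock is to make every thread acquire locks in the same fixed order, so a circular wait cannot form. A minimal Python sketch (the two locks and the shared record are hypothetical):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
shared = {"a": 0, "b": 0}

def transfer(n):
    # Every thread acquires lock_a BEFORE lock_b. If one thread took
    # lock_a -> lock_b while another took lock_b -> lock_a, each could
    # hold one lock while waiting for the other: a deadlock.
    with lock_a:
        with lock_b:
            shared["a"] += n
            shared["b"] -= n

threads = [threading.Thread(target=transfer, args=(1,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```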
6. List the steps for avoiding data races.
i) Ensure that only one thread can update the variable at a time.
ii) Place a synchronization lock around all accesses to that variable.
iii) Ensure that a thread acquires the lock before referencing the variable.
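The three steps can be sketched in Python with one lock guarding a shared counter (the counter and thread counts are illustrative only):

```python
import threading

counter = 0
counter_lock = threading.Lock()  # one lock guards ALL access to `counter`

def increment(times):
    global counter
    for _ in range(times):
        # The thread acquires the lock before referencing the variable,
        # so only one thread can update it at a time: no data race.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```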
7. What is Mutex?
The simplest method of providing synchronization is the mutex (mutually exclusive lock). Only
one thread in the program can acquire a mutex lock at a time. The mutex is the simplest lock
implementation that can be used in a program.
8. List the different types of locks.
i) Mutex locks
ii) Recursive locks
iii) Reader Writer Locks
iv) Spin Locks
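Of these, the recursive lock differs from a plain mutex in that the same thread may re-acquire it without blocking on itself. A minimal Python sketch using `threading.RLock` (the nested functions are hypothetical):

```python
import threading

rlock = threading.RLock()  # recursive lock: the holding thread may re-acquire it

def outer():
    with rlock:
        return inner() + 1  # re-enters the lock without self-deadlock

def inner():
    with rlock:  # with a plain (non-recursive) mutex this would deadlock
        return 41

result = outer()
```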
9. What is a spin lock? List the advantages.
A spin lock arises when one thread has locked some data and is continuing its work,
making all the other threads busy-wait (spin) until the data is unlocked, instead of being
put to sleep.
Advantage:
A waiting thread acquires the lock immediately, the moment the data is released by
the other thread, with no wake-up delay.
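A minimal busy-wait sketch in Python, using a non-blocking mutex acquire as a stand-in for the atomic test-and-set instruction a real spin lock would use:

```python
import threading

class SpinLock:
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Busy-wait ("spin") instead of sleeping: the moment the holder
        # releases the flag, a spinning thread grabs it immediately.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

lock = SpinLock()
total = 0

def work():
    global total
    for _ in range(1000):
        lock.acquire()
        total += 1
        lock.release()

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```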
10. What are barriers?
In parallel programming, some restriction mechanisms allow synchronization among
multiple threads. One such mechanism is the barrier. With this technique, each
thread has to wait until all the other threads reach the barrier before any of them
proceeds to the next execution step.
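The barrier behaviour can be sketched with Python's `threading.Barrier` (the two "phases" are illustrative): no thread starts phase 2 until every thread has finished phase 1.

```python
import threading

barrier = threading.Barrier(3)  # no thread passes until all 3 arrive
order = []

def phase_worker(name):
    order.append((name, "phase-1"))
    barrier.wait()  # wait here until every thread finishes phase 1
    order.append((name, "phase-2"))

threads = [threading.Thread(target=phase_worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All three phase-1 records precede every phase-2 record.
```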
PART-B
2. Analyze the performance of the parallel program by deriving Amdahl's law and the
Gustafson-Barsis law.
(i) Amdahl's Law:
Used to find the limit on the benefit of increasing the number of processors, i.e. to
determine the asymptotic speedup achievable as the number of processors increases.
Definition:
Let 'f' be the fraction of operations in a computation that must be performed sequentially,
where 0 ≤ f ≤ 1. The maximum speedup ψ achievable by a parallel computer with 'p'
processors performing the computation is
ψ(n,p) ≤ 1 / (f + (1-f)/p)
Derivation:
The speedup of parallel program execution is
ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))
Ignoring the parallel overhead κ(n,p), substituting f = σ(n) / (σ(n) + φ(n)) and dividing
the numerator and denominator by σ(n) + φ(n) gives
ψ(n,p) ≤ 1 / (f + (1-f)/p)
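A small numeric check of the bound (the fraction f = 0.1 is an arbitrary example): with a 10% sequential fraction, the speedup can never exceed 1/f = 10, no matter how many processors are added.

```python
def amdahl_speedup(f, p):
    # Amdahl's law: maximum speedup with sequential fraction f on p processors.
    return 1.0 / (f + (1.0 - f) / p)

s8 = amdahl_speedup(0.1, 8)        # about 4.7 on 8 processors
s1000 = amdahl_speedup(0.1, 1000)  # approaches, but never reaches, 10
```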
(ii) Gustafson-Barsis Law:
Definition: Let a program of size 'n' be solved with 'p' processors, and let 'T' denote the
fraction of the total (parallel) execution time spent in serial code. Then the maximum
speedup ψ achievable is
ψ(n,p) ≤ p + (1-p)T
Derivation:
We know that the equation for speedup with κ(n,p) = 0 is
ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p) -----------------(A)
Let 'T' denote the fraction of the parallel execution time spent in serial code; the parallel
operations then take the fraction 1-T.
T = σ(n) / (σ(n) + φ(n)/p) -------------------(1)
1-T = (φ(n)/p) / (σ(n) + φ(n)/p) -------------------(2)
From equations (1) and (2), σ(n) = T(σ(n) + φ(n)/p) and φ(n) = p(1-T)(σ(n) + φ(n)/p).
Substituting into (A),
ψ(n,p) ≤ T + (1-T)p
(or)
ψ(n,p) ≤ p + (1-p)T
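A numeric illustration of the scaled-speedup formula (the serial fraction T = 0.05 and p = 64 are arbitrary example values):

```python
def gustafson_speedup(T, p):
    # Gustafson-Barsis law: T is the fraction of the PARALLEL execution
    # time spent in serial code, p the number of processors.
    return p + (1 - p) * T

s = gustafson_speedup(0.05, 64)  # 64 - 63 * 0.05 = 60.85
```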
3. Derive the Karp-Flatt metric to improve the performance of a parallel program.
Both Amdahl's law and the Gustafson-Barsis law ignore the parallel overhead, but here
κ(n,p) is considered, which supports high-performance parallel program design.
Definition:
For a parallel computation exhibiting speedup ψ on 'p' processors, where p > 1, the
experimentally determined serial fraction 'e' is
e = ((1/ψ) - (1/p)) / (1 - (1/p))
Derivation:
We know that the execution time of a parallel program is
T(n,p) = σ(n) + (φ(n)/p) + κ(n,p) --------------(1)
A serial program has no interprocessor communication or overhead, so its execution time is
T(n,1) = σ(n) + φ(n) ---------------------(2)
The experimentally determined serial fraction 'e' satisfies
σ(n) + κ(n,p) = T(n,1)e ------------------(3)
Substitute equation (3) in (1):
T(n,p) = T(n,1)e + (φ(n)/p) -----------------(4)
From equation (3),
σ(n) = T(n,1)e - κ(n,p)
But a serial program has no parallel overhead, so κ(n,p) = 0.
Therefore, σ(n) = T(n,1)e -------------------------(5)
Substitute equation (5) in (2) to get
φ(n) = T(n,1)(1-e) --------------------------(6)
Substitute equation (6) in (4) to get the parallel execution time
T(n,p) = T(n,1)e + (T(n,1)(1-e))/p -------------------(7)
We know that the speedup is
ψ = T(n,1) / T(n,p)
Dividing equation (7) by T(n,1) gives 1/ψ = e + (1-e)/p, and solving for 'e' finally gives
e(1 - (1/p)) = (1/ψ) - (1/p)
where 'e' is the experimentally determined serial fraction.
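The metric is easy to apply to measured data. Below, the measured speedup 4.71 on 8 processors is a hypothetical experimental value; the recovered serial fraction comes out close to 0.1, consistent with the Amdahl example above when overhead is negligible.

```python
def karp_flatt(speedup, p):
    # Experimentally determined serial fraction e from a MEASURED speedup.
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

e = karp_flatt(4.71, 8)  # close to 0.1 for this hypothetical measurement
```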
4. Derive the ISO efficiency relation to measure the scalability of the parallel system.
The scalability of a parallel system is a measure of its ability to increase and maintain
performance as the number of processors increases.
Here, an ISO efficiency relation is formalized to stabilize the performance and efficiency.
Derivation:
We know that the speedup is
ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))
and
T0(n,p) = (p-1)σ(n) + pκ(n,p)
To improve the scalability of the parallel system, it should satisfy the following condition:
T(n,1) ≥ C T0(n,p)
where C = ε(n,p) / (1 - ε(n,p)).
The ISO efficiency relation is used to determine the range of processors for maintaining
the performance efficiency.
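The condition can be checked numerically. The component values below (σ(n) = 10, φ(n) = 1000, κ(n,p) = 2, p = 8, target efficiency 0.8) are hypothetical, chosen only to show the test T(n,1) ≥ C·T0(n,p):

```python
def C(eff):
    # C = eps / (1 - eps) for a target efficiency eps
    return eff / (1.0 - eff)

def T0(sigma, kappa, p):
    # total overhead: T0(n,p) = (p-1) * sigma(n) + p * kappa(n,p)
    return (p - 1) * sigma + p * kappa

sigma, phi, kappa, p = 10.0, 1000.0, 2.0, 8  # hypothetical example values
T_serial = sigma + phi                        # T(n,1)
holds = T_serial >= C(0.8) * T0(sigma, kappa, p)
```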