Parallel Computing Techniques
Lecturer:
Phm Trn V
Students:
L Trng Tn
Mai Vn Ninh
Phng Quang Chnh
Nguyn c Cnh
ng Trung Tn
Contents
Motivation of Parallel Computing Techniques
Parallel Computing Techniques
Message-passing computing
Pipelined Computations
Embarrassingly Parallel Computations
Partitioning and Divide-and-Conquer Strategies
Synchronous Computations
Load Balancing and Termination Detection
www.cse.hcmut.edu.vn
Message-Passing Computing
Basics of message-passing programming using user-level message-passing libraries.
Two primary mechanisms needed:
A method of creating separate processes for execution on different computers
A method of sending and receiving messages
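The slides assume a user-level message-passing library (PVM appears later in this deck). As a hedged sketch, Python's multiprocessing can stand in for such a library to show both mechanisms: creating a separate process, and sending and receiving messages. The worker function, the Pipe, and the data are illustrative choices, not part of the slides.

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # The worker receives a message, computes on it, and sends a reply back.
    data = conn.recv()
    conn.send(sum(data))
    conn.close()

def run():
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))  # mechanism 1: create a separate process
    p.start()
    parent.send([1, 2, 3, 4])   # mechanism 2: send a message to the worker
    result = parent.recv()      # ... and receive its reply
    p.join()
    return result

if __name__ == "__main__":
    print(run())
```

On a real cluster the same two mechanisms would be provided by the library's process-spawning and send/receive calls rather than by pipes within one machine.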
Message-Passing Computing
Static process creation:
Source
file
Compile to suit
processor
executables
www.cse.hcmut.edu.vn
Source
file
Processor
0
Source
file
Processor
n-1
Message-Passing Computing
Dynamic process creation (the PVM way): a process running on Processor 1 calls spawn() at some point in time, which starts execution of a new process (process 2) on Processor 2.
Pipelined Computation
The problem is divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each task is executed by a separate process or processor.
Pipelined Computation
Where pipelining can be used to good effect:
1. If more than one instance of the complete problem is to be executed
2. If a series of data items must be processed, each requiring multiple operations
3. If information to start the next process can be passed forward before the process has completed all its internal operations
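A classic illustration is a pipelined sum: each stage receives a partial sum from its left neighbour, adds its own number, and passes the new sum to the right. A minimal sketch, with Python's multiprocessing standing in for a message-passing library and one process per stage (the pipe wiring is an assumption, not the slides' code):

```python
from multiprocessing import Process, Pipe

def stage(my_value, left, right):
    # Each pipeline stage: receive partial sum, add own value, pass it on.
    partial = left.recv()
    right.send(partial + my_value)

def pipelined_sum(values):
    # One pipe between each pair of consecutive stages (plus the two ends).
    pipes = [Pipe() for _ in range(len(values) + 1)]
    procs = [Process(target=stage, args=(v, pipes[i][1], pipes[i + 1][0]))
             for i, v in enumerate(values)]
    for p in procs:
        p.start()
    pipes[0][0].send(0)           # inject the initial sum at the left end
    total = pipes[-1][1].recv()   # collect the result at the right end
    for p in procs:
        p.join()
    return total
```

This matches case 2 above: a series of data items (the partial sums) flows through stages that each perform one operation.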
Embarrassingly Parallel Computations
An embarrassingly parallel computation is one that can be divided into completely independent parts, each executed by a separate processor.
Figure: the master-slave approach for distributing the independent parts of the computation among slave processes.
Figure: partitioning a 640 x 480 image among processes, with a map from region to process; each process handles either an 80 x 480 column strip or an 80 x 80 square region.
Mandelbrot Set
Set of points in a complex plane that are quasi-stable when computed by iterating the function z_{k+1} = z_k^2 + c, where z_0 = 0 and c gives the position of the point in the complex plane.
Mandelbrot Set
Scaling the display coordinates (x, y) to a point c in the complex plane:
c.real = real_min + x * (real_max - real_min)/disp_width
c.imag = imag_min + y * (imag_max - imag_min)/disp_height
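Putting the scaling together with the iteration z_{k+1} = z_k^2 + c gives a minimal sequential sketch; the iteration limit, escape bound, and default plane limits are illustrative assumptions:

```python
def mandelbrot_iterations(c, max_iter=256, bound=2.0):
    # Iterate z = z^2 + c from z = 0; return the iteration count at which
    # |z| first exceeds the bound, or max_iter if the point is quasi-stable.
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > bound:
            return i
    return max_iter

def pixel_to_complex(x, y, disp_width, disp_height,
                     real_min=-2.0, real_max=2.0,
                     imag_min=-2.0, imag_max=2.0):
    # The scaling from the slides: map display coordinates to the plane.
    real = real_min + x * (real_max - real_min) / disp_width
    imag = imag_min + y * (imag_max - imag_min) / disp_height
    return complex(real, imag)
```

Each pixel's count is independent of every other pixel's, which is what makes the problem embarrassingly parallel.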
Mandelbrot Set
Dynamic Task Assignment: have processors request new regions after computing their previous regions.
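Dynamic task assignment can be sketched as a work pool: workers pull a region from a shared task queue only after finishing their previous one, so faster workers naturally take more regions. The queue-based design and the stand-in computation are assumptions, not the slides' code:

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    # Each worker repeatedly requests a region, computes it, and returns
    # the result, until it sees the None sentinel.
    while True:
        region = tasks.get()
        if region is None:
            break
        results.put((region, sum(range(region))))  # stand-in for real work

def work_pool(regions, nworkers=3):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    for r in regions:
        tasks.put(r)
    for _ in procs:
        tasks.put(None)          # one termination sentinel per worker
    out = dict(results.get() for _ in regions)
    for p in procs:
        p.join()
    return out
```

For the Mandelbrot set, each "region" would be a strip of pixels rather than an integer.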
Partitioning and Divide-and-Conquer Strategies
Partitioning can be applied to the program data (data partitioning or domain decomposition) and to the functions of a program (functional decomposition).
It is much less common to find concurrent functions in a problem, so data partitioning is the main strategy for parallel programming.
Partitioning a sequence of numbers (n: number of items, p: number of processors): each processor takes n/p numbers, the first taking x0 ... x(n/p)-1; each adds its part into a partial sum, and the p partial sums are then added to form the final sum.
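The partial-sum scheme can be sketched sequentially, with each slice standing for one processor's share; the even split with the last part taking any remainder is an assumption:

```python
def partial_sums(xs, p):
    # Partition n numbers into p parts of roughly n/p numbers each,
    # form one partial sum per "processor", then combine them.
    n = len(xs)
    chunk = n // p
    parts = [xs[i * chunk:(i + 1) * chunk] for i in range(p - 1)]
    parts.append(xs[(p - 1) * chunk:])        # last part takes the remainder
    partials = [sum(part) for part in parts]  # one partial sum per processor
    return sum(partials)                      # combine the partial sums
```

In a message-passing version the master would scatter the parts and gather the p partial sums.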
Divide and conquer: the problem is divided into subproblems that are of the same form as the larger problem, with further divisions made recursively.
Figure: tree construction — the initial problem is divided repeatedly into smaller problems until the final tasks are reached.
Figure: dividing an original list x0 ... xn-1 across eight processes. P0 holds the initial problem and passes half to P4; P0 and P4 then pass halves to P2 and P6; finally P1, P3, P5, and P7 receive their halves, leaving one final task per process.
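Taken to completion, the tree division above is just a recursive divide-and-conquer sum; a minimal sequential sketch:

```python
def dc_sum(xs):
    # Divide and conquer: split the list into two subproblems of the same
    # form as the larger problem, solve each recursively, combine results.
    if len(xs) <= 1:
        return xs[0] if xs else 0
    mid = len(xs) // 2
    return dc_sum(xs[:mid]) + dc_sum(xs[mid:])
```

In the parallel version each recursive split corresponds to handing half of the list to another process, as in the eight-process figure.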
Many possibilities:
Operations on sequences of numbers, such as simply adding them together
Several sorting algorithms, which can often be partitioned or constructed in a recursive fashion
Numerical integration
The N-body problem
Bucket Sort
The range of values is divided into regions, with one bucket assigned to hold the numbers that fall within each region. The numbers in each bucket are then sorted using a sequential sorting algorithm (n: number of items, m: number of buckets).
Simple approach: assign one processor to each bucket.
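A sequential sketch of the bucket sort described above; the uniform range [0, max_value) is an assumption. In the simple parallel approach, the sort of each bucket would be given to its own processor:

```python
def bucket_sort(numbers, m, max_value):
    # Bucket j holds numbers in [j*max_value/m, (j+1)*max_value/m).
    buckets = [[] for _ in range(m)]
    for x in numbers:
        j = min(m - 1, int(x * m / max_value))  # clamp x == max_value
        buckets[j].append(x)
    result = []
    for b in buckets:
        result.extend(sorted(b))  # sequential sort within each bucket
    return result
```

With uniformly distributed numbers each bucket gets about n/m items, which is what makes the per-bucket sorts balanced.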
A further parallel version uses an all-to-all exchange: each processor first separates its numbers into p small buckets, then small bucket i from every processor is sent to processor i, which sorts the numbers it receives.
Synchronous Computations
Synchronous
Barrier
Barrier Implementation
Centralized Counter Implementation
Tree Barrier Implementation
Butterfly Barrier
Synchronized Computations
Fully synchronous
Data Parallel Computations
Synchronous Iteration (Synchronous Parallelism)
Locally synchronous
Heat Distribution Problem
Sequential Code
Parallel Code
Barrier
A basic mechanism for synchronizing processes: a barrier is inserted at the point in each process where it must wait, and all processes can continue only when all of them have reached it.
Barrier Implementation
Centralized Counter Implementation (linear barrier)
Tree Barrier Implementation
Butterfly Barrier
Local Synchronization
Deadlock
Example code (centralized counter barrier):
Master:
for (i = 0; i < n; i++) /* count slaves as they reach barrier */
   recv(Pany);
for (i = 0; i < n; i++) /* release slaves */
   send(Pi);
Slave processes:
send(Pmaster);
recv(Pmaster);
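The same centralized counter barrier can be sketched in shared memory, with threads standing in for processes; the condition-variable design and the generation counter (used to release each round cleanly) are assumptions:

```python
import threading

class CounterBarrier:
    # Centralized (linear) counter barrier: the arrival phase counts
    # arrivals; the last arrival triggers the release phase.
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.n:
                self.count = 0
                self.generation += 1      # release phase: free all waiters
                self.cond.notify_all()
            else:
                self.cond.wait_for(lambda: self.generation != gen)

def demo(n=4):
    order = []
    barrier = CounterBarrier(n)
    def task():
        order.append("before")   # work done before reaching the barrier
        barrier.wait()
        order.append("after")    # work done after all have arrived
    threads = [threading.Thread(target=task) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Because no thread passes the barrier until all n have arrived, every "before" entry precedes every "after" entry.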
Tree Barrier (eight processes):
First stage:
P1 sends message to P0; P3 sends message to P2; P5 sends message to P4; P7 sends message to P6
Second stage:
P2 sends message to P0 (P2 & P3 have reached their barrier)
P6 sends message to P4 (P6 & P7 have reached their barrier)
Third stage:
P4 sends message to P0 (P4, P5, P6, & P7 have reached their barrier)
P0 terminates the arrival phase (when P0 reaches its barrier and has received the message from P4)
Butterfly Barrier
Pairs of processes synchronize at each of log p stages: first stage P0-P1, P2-P3, P4-P5, P6-P7; second stage P0-P2, P1-P3, P4-P6, P5-P7; third stage P0-P4, P1-P5, P2-P6, P3-P7.
Local Synchronization
Suppose a process Pi needs to be synchronized and to exchange data with process Pi-1 and process Pi+1.
Synchronized Computations
Fully synchronous: all processes involved in the computation must be synchronized.
Locally synchronous: processes need to synchronize only with a set of logically nearby processes, not with all processes involved in the computation.
Synchronous Iteration
Each iteration is composed of several processes that start together at the beginning of the iteration. The next iteration cannot begin until all processes have finished the previous iteration. Using forall:
for (j = 0; j < n; j++)          /* for each synchronous iteration */
   forall (i = 0; i < N; i++) {  /* N processes, each using */
      body(i);                   /* a specific value of i */
   }
Synchronous Iteration
Solving a General System of Linear Equations by Iteration
Suppose the equations are of a general form with n equations and n unknowns x_0, x_1, x_2, ..., x_{n-1}:
a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + ... + a_{n-1,n-1}x_{n-1} = b_{n-1}
.
.
a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 + ... + a_{2,n-1}x_{n-1} = b_2
a_{1,0}x_0 + a_{1,1}x_1 + a_{1,2}x_2 + ... + a_{1,n-1}x_{n-1} = b_1
a_{0,0}x_0 + a_{0,1}x_1 + a_{0,2}x_2 + ... + a_{0,n-1}x_{n-1} = b_0
Synchronous Iteration
By rearranging the ith equation
a_{i,0}x_0 + a_{i,1}x_1 + a_{i,2}x_2 + ... + a_{i,n-1}x_{n-1} = b_i
we get
x_i = (1/a_{i,i}) [ b_i - (a_{i,0}x_0 + a_{i,1}x_1 + ... + a_{i,i-1}x_{i-1} + a_{i,i+1}x_{i+1} + ... + a_{i,n-1}x_{n-1}) ]
or, equivalently,
x_i = (1/a_{i,i}) [ b_i - sum over j != i of a_{i,j}x_j ]
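Iterating the rearranged formula synchronously — every new x_i computed only from the previous iteration's values — is the Jacobi iteration. A minimal sequential sketch; the zero starting vector and fixed iteration count are assumptions:

```python
def jacobi(a, b, iterations=50):
    # x_i = (1/a_ii) * (b_i - sum_{j != i} a_ij * x_j), iterated
    # synchronously: new values use only the previous iteration's x.
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i))
             / a[i][i]
             for i in range(n)]
    return x
```

Convergence requires a suitable system (e.g. diagonally dominant); in the parallel version each iteration ends with a barrier before the next begins.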
Sequential Code
Heat distribution problem, using a fixed number of iterations; each interior point is set to the average of its four neighbours:
for (iteration = 0; iteration < limit; iteration++) {
   for (i = 1; i < n; i++)
      for (j = 1; j < n; j++)
         g[i][j] = 0.25*(h[i-1][j]+h[i+1][j]+h[i][j-1]+h[i][j+1]);
   for (i = 1; i < n; i++)   /* update points */
      for (j = 1; j < n; j++)
         h[i][j] = g[i][j];
}
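The sequential code above can be made runnable as follows; the grid size, the fixed boundary values, and the hot top edge are illustrative assumptions:

```python
def heat(n, limit, top=100.0, other=0.0):
    # (n+2) x (n+2) grid: boundary rows/columns fixed, interior points
    # repeatedly set to the average of their four neighbours.
    h = [[other] * (n + 2) for _ in range(n + 2)]
    for j in range(n + 2):
        h[0][j] = top                       # hot top edge (assumption)
    for _ in range(limit):
        g = [row[:] for row in h]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j]
                                  + h[i][j-1] + h[i][j+1])
        for i in range(1, n + 1):           # update points
            for j in range(1, n + 1):
                h[i][j] = g[i][j]
    return h
```

After enough iterations the interior approaches the steady-state distribution, with points nearer the hot edge hotter than those farther away.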
Parallel Code
With a fixed number of iterations, for process Pi,j (except at the boundary points):
for (iteration = 0; iteration < limit; iteration++) {
   g = 0.25 * (w + x + y + z);
   send(&g, Pi-1,j);   /* non-blocking sends */
   send(&g, Pi+1,j);
   send(&g, Pi,j-1);
   send(&g, Pi,j+1);
   recv(&w, Pi-1,j);   /* synchronous receives */
   recv(&x, Pi+1,j);
   recv(&y, Pi,j-1);
   recv(&z, Pi,j+1);
}
The sends and receives act as a local barrier with the four neighbouring processes.
Load Balancing and Termination Detection
Load Balancing: used to distribute computations fairly across processors in order to obtain the highest possible execution speed.
Termination Detection: detecting when a computation has been completed; this is more difficult when the computation is distributed.
Process Selection
Algorithms for selecting a process:
Round robin algorithm: process Pi requests tasks from process Px, where x is given by a counter that is incremented after each request, using modulo n arithmetic (n processes), excluding x = i.
Random polling algorithm: process Pi requests tasks from process Px, where x is a number selected randomly between 0 and n-1 (excluding i).
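Both selection rules can be sketched directly; the generator interface for the round robin counter is an assumption:

```python
import random

def round_robin_targets(i, n):
    # Round robin: a counter incremented after each request, taken
    # modulo n, with x == i skipped.
    x = 0
    while True:
        if x % n != i:
            yield x % n
        x += 1

def random_polling_target(i, n, rng=random):
    # Random polling: x selected uniformly from 0..n-1, excluding i.
    x = rng.randrange(n - 1)
    return x if x < i else x + 1   # shift past i to exclude it
```

Round robin guarantees every other process is polled equally often; random polling avoids the lock-step pattern at the cost of occasional repeats.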
Termination Conditions
Application-specific local termination conditions exist throughout the collection of processes at time t.
There are no messages in transit between processes at time t.
The second condition is necessary because a message in transit might restart a terminated process. It is more difficult to recognize, since the time messages take to travel between processes is not known in advance.
85
www.cse.hcmut.edu.vn
86
87
www.cse.hcmut.edu.vn
88
References:
Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Second Edition, Prentice Hall, 2005.