
TEKNIK PARALEL PADA PENGOLAHAN CITRA
(Parallel Techniques in Image Processing)
(CNH4O3)
Dr. Putu Harry Gunawan, M.Si, M.Sc
Week 2: Review parallel programming
(OpenMP)

Why parallel
programming?
Simply put, because it may speed up your code. Unlike 10
years ago, today your computer (and probably even your
smartphone) has one or more CPUs with multiple processing
cores (a multi-core processor).
This helps with desktop computing tasks like multitasking
(running multiple programs, plus the operating system,
simultaneously).
For scientific computing, this means you have the ability, in
principle, to split up your computations into groups and
run each group on its own processor.

Why parallel
programming?
Serial
[Figure: serial execution]

Why parallel
programming?
Parallel
[Figure: parallel execution]

Kinds of Parallel
Programming
There are many flavours of parallel programming, some that are general and can be
run on any hardware, and others that are specific to particular hardware
architectures.
Two main paradigms we can talk about here are shared memory versus distributed
memory models.
In shared memory models, multiple processing units all have access to the same,
shared memory space. This is the case on your desktop or laptop with multiple CPU
cores.
In a distributed memory model, multiple processing units each have their own
memory store, and information is passed between them. This is the model that a
networked cluster of computers operates with.
A computer cluster is a collection of standalone computers that are connected to
each other over a network, and are used together as a single system. We won't be
talking about clusters here, but some of the tools we'll talk about (e.g. MPI) are
easily used with clusters.

Types of parallel task


Broadly speaking, we can separate computations into two camps depending on how
they can be parallelized. A so-called embarrassingly parallel problem is one that can
easily be separated into a number of independent tasks that may then be run in
parallel.
Embarrassingly parallel problems
Embarrassingly parallel computational problems are the easiest to parallelize and
you can achieve impressive speedups if you have a computer with many cores.
Even if you have just two cores, you can get close to a two-times speedup. An
example of an embarrassingly parallel problem is when you need to run a
preprocessing pipeline on datasets collected for 15 subjects. Each subject's data can
be processed independently of the others. In other words, the computations involved
in processing one subject's data do not in any way depend on the results of the
computations for processing some other subject's data.
As an example, a grad student in my lab (Heather) figured out how to distribute her
FSL preprocessing pipeline for 24 fMRI subjects across multiple cores on her Mac
Pro desktop (it has 8 cores), and as a result what used to take about 48 hours to run now
takes "just" over 6 hours.

Types of parallel task


Serial problems
In contrast to embarrassingly parallel problems, there is a class of problems that cannot
be split into independent sub-problems; we can call these inherently sequential or serial
problems. For these types of problems, the computation at one stage does depend on the
results of a computation at an earlier stage, and so it is not so easy to parallelize across
independent processing units. In these kinds of problems, there is a need for some
communication or coordination between sub-tasks.
An example of a serial problem is a simulation of an arm movement. We run simulations of
arm movements like reaching, that use detailed mathematical models of muscle
mechanics, activation dynamics, musculoskeletal dynamics and spinal reflexes.
Differential equations govern the relationship between muscle stimulation (the input) and
the resulting arm movement (the output). These equations are "unwrapped" in time by a
differential equation integrator that takes small steps (like 1 millisecond at a time) to
generate a simulation of a whole movement (e.g. 1 second of simulated time). On each
step the current state of the system depends on both the current input (muscle command)
and on the previous state of the system. With a 1 ms step, it takes (at least) 1000
computations to simulate a 1 sec arm movement, but we cannot simply split up those
1000 computations and distribute them to a set of independent processing units. This is an
inherently serial problem where the current computation cannot be carried out without the
results of the previous computation.
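
To make that dependence concrete, here is a toy Euler integration loop. The dynamics used (dx/dt = -x + u) is a made-up placeholder, not the actual musculoskeletal model described above, but it shows why each step must wait for the result of the previous one.

/* Toy Euler integrator illustrating the serial dependence: each step
 * needs the state produced by the previous step, so the 1000 steps
 * cannot simply be distributed to independent processors. */
#include <stdio.h>

int main(void)
{
    const double dt = 0.001;   /* 1 ms step */
    const int n_steps = 1000;  /* 1 s of simulated time */
    double x = 0.0;            /* current state */
    double u = 1.0;            /* constant input ("muscle command") */

    for (int i = 0; i < n_steps; i++) {
        /* the new state depends on the previous state: inherently serial */
        x = x + dt * (-x + u);
    }
    printf("final state after 1 s: %f\n", x);
    return 0;
}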

Types of parallel task


Mixtures
A good example of a problem that has both embarrassingly
parallel properties as well as serial dependency properties, is
the computations involved in training and running an artificial
neural network (ANN). An ANN is made up of several layers of
neuron-like processing units, each layer having many (even
hundreds or thousands) of these units. If the ANN is a pure
feedforward architecture, then computations within each layer
are embarrassingly parallel, while computations between
layers are serial.
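
A rough sketch of that mixed structure, with arbitrary illustrative sizes and a single shared weight (this is not a real training setup): the outer loop over layers stays serial, because each layer needs the previous layer's output, while the inner loop over units within a layer can be parallelized.

/* Feedforward pass: serial across layers, parallel within a layer.
 * Sizes, weights and activations are arbitrary illustrative values.
 * Compile with: gcc -fopenmp -lm */
#include <math.h>
#include <stdio.h>

#define LAYERS 3
#define UNITS  4

int main(void)
{
    /* act[0] holds the input; remaining rows are filled layer by layer */
    double act[LAYERS + 1][UNITS] = {{1.0, 0.5, -0.5, 2.0}};
    double w = 0.1;   /* one shared weight, just to keep the sketch short */

    for (int l = 1; l <= LAYERS; l++) {      /* serial across layers */
        #pragma omp parallel for             /* parallel within a layer */
        for (int j = 0; j < UNITS; j++) {
            double sum = 0.0;
            for (int i = 0; i < UNITS; i++)
                sum += w * act[l - 1][i];
            act[l][j] = tanh(sum);           /* unit activation */
        }
    }
    printf("output of unit 0 in the last layer: %f\n", act[LAYERS][0]);
    return 0;
}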

Tools of parallel
programming
The threads model
The threads model of parallel programming is one in which a
single process (a single program) can spawn multiple,
concurrent "threads" (sub-programs). Each thread runs
independently of the others, although they can all access the
same shared memory space (and hence they can
communicate with each other if necessary). Threads can be
spawned and killed as required, by the main program.
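
A minimal sketch of this model using POSIX threads (Pthreads), which the next slide refers to: the main program spawns worker threads that all write into the same shared array. The names worker and shared_data are just illustrative.

/* Threads model with Pthreads: several threads share one memory space.
 * Compile with: gcc -pthread */
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4

static int shared_data[N_THREADS];   /* visible to every thread */

static void *worker(void *arg)
{
    int id = *(int *)arg;
    shared_data[id] = id * id;        /* each thread writes its own slot */
    return NULL;
}

int main(void)
{
    pthread_t threads[N_THREADS];
    int ids[N_THREADS];

    for (int i = 0; i < N_THREADS; i++) {     /* spawn the threads */
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < N_THREADS; i++)       /* wait for them to finish */
        pthread_join(threads[i], NULL);

    for (int i = 0; i < N_THREADS; i++)
        printf("shared_data[%d] = %d\n", i, shared_data[i]);
    return 0;
}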

Tools of parallel
programming
OpenMP
OpenMP is an API that implements a multi-threaded, shared
memory form of parallelism. It uses a set of compiler directives
(statements that you add to your C code) that are incorporated
at compile-time to generate a multi-threaded version of your
code. You can think of Pthreads (the POSIX implementation of the threads model above) as doing multithreaded programming "by hand", and OpenMP as a slightly
more automated, higher-level API to make your program
multithreaded. OpenMP takes care of many of the low-level
details that you would normally have to implement yourself, if
you were using Pthreads from the ground up.
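
A minimal OpenMP sketch, assuming a GCC-style compiler with the -fopenmp flag: a single compiler directive parallelizes the loop, and the reduction clause takes care of combining each thread's partial sum.

/* One directive is enough to multithread this loop.
 * Compile with: gcc -fopenmp sum.c */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    /* reduction(+:sum) gives each thread a private partial sum
       and adds them together at the end */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("harmonic sum (up to %d threads): %f\n",
           omp_get_max_threads(), sum);
    return 0;
}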

Tools of parallel
programming
MPI
The Message Passing Interface (MPI) is a standard defining core syntax and semantics
of library routines that can be used to implement parallel programming in C (and in other
languages as well). There are several implementations of MPI such as Open MPI,
MPICH2 and LAM/MPI.
In the context of this tutorial, you can think of MPI, in terms of its complexity, scope and
control, as sitting in between programming with Pthreads, and using a high-level API such
as OpenMP.
The MPI interface allows you to manage allocation, communication, and synchronization
of a set of processes that are mapped onto multiple nodes, where each node can be a
core within a single CPU, or CPUs within a single machine, or even across multiple
machines (as long as they are networked together).
One context where MPI shines in particular is the ability to easily take advantage not just
of multiple cores on a single machine, but to run programs on clusters of several
machines. Even if you don't have a dedicated cluster, you could still write a program
using MPI that could run your program in parallel, across any collection of computers, as
long as they are networked together. Just make sure to ask permission before you load
up your lab-mate's computer's CPU(s) with your computational tasks!
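
A minimal MPI sketch (any of the implementations above will do): it is typically compiled with mpicc and launched with a command such as mpirun -np 4, and simply has every process report its rank. This rank/size pattern is the starting point for distributing real work across processes.

/* Every MPI process prints its rank within the communicator. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                    /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes */

    printf("hello from process %d of %d\n", rank, size);

    MPI_Finalize();                            /* shut the runtime down */
    return 0;
}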

Tools of parallel
programming
GPU COMPUTING

The Real World is Massively Parallel:

Who is Using Parallel Computing?

Pause: go to parallel
programming with
OpenMP

OpenMP

https://www.dartmouth.edu/~rc/classes/intro_open
mp/what_is_openmp.html

https://computing.llnl.gov/tutorials/openMP/
