M. D. Jones, Ph.D. (CCR/UB), Shared Memory Programming With OpenMP, HPCI Fall 2014
Introduction to OpenMP

OpenMP is:
- Easy to (mis)use
- Amenable to incremental parallelization
- Fairly easy to get speedup with
- Potentially scalable on large (SMP) systems
- Where HPC architectures are evolving towards (multi-core, single-socket CPUs are already here, and gaining rapidly in popularity)

OpenMP is not:
- intended for distributed memory parallel computing (it can be used in combination with MPI, however)
  - Intel's Cluster OpenMP extends it over distributed memory
  - UPC and Co-Array FORTRAN are proposed PGAS language extensions for shared & distributed memory
- implemented identically by all vendors (not a surprise)
- promised to be the most efficient way to make use of shared memory (data locality is still an outstanding issue)
Fork-Join: the master thread spawns a team of threads inside parallel regions, with implicit synchronization at the end of each region.

[Figure: fork-join execution with OMP_NUM_THREADS=8; the master thread forks a team at !$omp parallel and joins it again at the implicit synchronization point]

Typical use: split compute-intensive loops among the thread team. See the previous slide.

Most OpenMP constructs are compiler directives or pragmas (we will deal with the OpenMP API separately):

C/C++:

    #pragma omp construct [clause [clause] ...]

F77:

    C$OMP construct [clause [clause] ...]

F90:

    !$OMP construct [clause [clause] ...]

To compilers that do not support OpenMP, these directives are comments, and have no effect.
[Figure: the three components of OpenMP: OpenMP Directives (#pragma omp ... / !$omp ...), the OpenMP Run-time Library, and OpenMP Environment Variables]

References:

Book: "Using OpenMP: Portable Shared Memory Parallel Programming," by B. Chapman, G. Jost, and R. van der Pas (MIT, Boston, 2008). Sample codes available online at www.openmp.org

Book: "Parallel Programming in OpenMP," by R. Chandra et al. (Academic, New York, 2001).

Web: www.openmp.org

We will consider the smaller pieces of the OpenMP puzzle first (they are reasonably self-contained and will help to inform the rest).
The OpenMP Run-time Library: Run-Time Control
FORTRAN                                            C/C++
subroutine OMP_SET_NUM_THREADS(nthreads)           void omp_set_num_threads(int nthreads)
integer function OMP_GET_NUM_THREADS()             int omp_get_num_threads(void)
integer function OMP_GET_MAX_THREADS()             int omp_get_max_threads(void)
integer function OMP_GET_THREAD_NUM()              int omp_get_thread_num(void)
integer function OMP_GET_NUM_PROCS()               int omp_get_num_procs(void)
logical function OMP_IN_PARALLEL()                 int omp_in_parallel(void)
subroutine OMP_SET_DYNAMIC(scalar_logical_expr)    void omp_set_dynamic(int dthreads)
logical function OMP_GET_DYNAMIC()                 int omp_get_dynamic(void)
subroutine OMP_SET_NESTED(scalar_logical_expr)     void omp_set_nested(int nested)
logical function OMP_GET_NESTED()                  int omp_get_nested(void)

Some new routines added in OpenMP 3.0 will be discussed later.

Locks:
- simple locks may not be locked if already in a locked state.
- nestable locks may be locked multiple times by the same thread.

Lock variables, in FORTRAN:

    INTEGER(KIND=OMP_LOCK_KIND) :: svar
    INTEGER(KIND=OMP_NEST_LOCK_KIND) :: nvar

and in C/C++:

    omp_lock_t *lock;
    omp_nest_lock_t *lock;
It is easy to get into trouble with locks: setting a lock will cause the non-owning threads to block until the lock is unset. The lock routines:

FORTRAN                                    C/C++
subroutine OMP_INIT_LOCK(svar)             void omp_init_lock(omp_lock_t *lock)
subroutine OMP_INIT_NEST_LOCK(nvar)        void omp_init_nest_lock(omp_nest_lock_t *lock)
subroutine OMP_DESTROY_LOCK(svar)          void omp_destroy_lock(omp_lock_t *lock)
subroutine OMP_DESTROY_NEST_LOCK(nvar)     void omp_destroy_nest_lock(omp_nest_lock_t *lock)
subroutine OMP_SET_LOCK(svar)              void omp_set_lock(omp_lock_t *lock)
subroutine OMP_SET_NEST_LOCK(nvar)         void omp_set_nest_lock(omp_nest_lock_t *lock)
subroutine OMP_UNSET_LOCK(svar)            void omp_unset_lock(omp_lock_t *lock)
subroutine OMP_UNSET_NEST_LOCK(nvar)       void omp_unset_nest_lock(omp_nest_lock_t *lock)
logical function OMP_TEST_LOCK(svar)       int omp_test_lock(omp_lock_t *lock)
logical function OMP_TEST_NEST_LOCK(nvar)  int omp_test_nest_lock(omp_nest_lock_t *lock)

Consider this simple lock example (the lock variable is declared as an object, not a pointer, so that &lock1 is valid in every call; the unset belongs inside the outer if, so only a thread that set the lock releases it):

    #include <omp.h>

    omp_lock_t lock1;

    /* lots of intervening code ... */

    omp_init_lock(&lock1);

    #pragma omp parallel for shared(lock1)
    for (i=0; i<=N-1; i++) { /* simple lock example */
      if (A[i] > CURRENT_MAX) {
        omp_set_lock(&lock1);
        if (A[i] > CURRENT_MAX) {  /* re-check under the lock */
          CURRENT_MAX = A[i];
        }
        omp_unset_lock(&lock1);
      }
    }
    omp_destroy_lock(&lock1);
Timing Routines

The run-time library includes two timing routines (OMP_GET_WTIME and OMP_GET_WTICK) that implement a portable wall-clock timer.

Usage example, in FORTRAN:

    DOUBLE PRECISION tstart,tend
    tstart=OMP_GET_WTIME()
    call bigjob(i,j,M,N,a)
    tend=OMP_GET_WTIME()
    print*, 'bigjob exec time = ',tend-tstart
OMP_NUM_THREADS integer
    how many threads are used in parallel regions by default

OMP_SCHEDULE (type[,chunk])
    controls the default for the SCHEDULE directive; type can be one of static, dynamic, and guided. Also auto in OpenMP 3.0.

OMP_DYNAMIC true|false
    allows/disallows a variable number of threads

OMP_NESTED true|false
    allows/disallows nested parallelism. If allowed, the number of threads used to execute nested parallel regions is implementation dependent (can even be serialized!).

Setting them, in tcsh:

    setenv OMP_NUM_THREADS 2
    setenv OMP_SCHEDULE "guided,4"
    setenv OMP_SCHEDULE "dynamic"

and in bash:

    export OMP_NUM_THREADS=2
    export OMP_SCHEDULE="guided,4"
    export OMP_SCHEDULE="dynamic"
OpenMP Basics: Sentinels

OpenMP Basics: Conditional Compilation
OpenMP Directives
OpenMP Directives: Work-Sharing
The DO/for directive:

    !$OMP DO [clause[,clause]...]
    #pragma omp for [clause[,clause]...]

Available clauses:
- PRIVATE(list)
- FIRSTPRIVATE(list)
- LASTPRIVATE(list)
- REDUCTION({operator|intrinsic_procedure_name}:list)
- SCHEDULE(type[,chunk]) : control how loop iterations are mapped onto threads.

Example using nowait (the first loop starts at i=1 so that A[i-1] never reads before the start of the array, and sqrt requires <math.h>):

    #include <math.h>

    void load_arrays(int n, int m, double *A, double *B, double *C, double *D) {
      int i;
      #pragma omp parallel
      {
        #pragma omp for nowait
        for (i=1; i<n; i++) {
          B[i] = (A[i]-A[i-1])/2.0;
        }
        #pragma omp for nowait
        for (i=0; i<m; i++) {
          D[i] = sqrt(C[i]);
        }
      }
    }
SECTIONS example:

    !$OMP PARALLEL
    !$OMP SECTIONS
    !$OMP SECTION
          CALL XAXIS() ! xaxis(),yaxis(),zaxis() independent
    !$OMP SECTION
          CALL YAXIS()
    !$OMP SECTION
          CALL ZAXIS()
    !$OMP END SECTIONS
    !$OMP END PARALLEL

- Three subroutines executed concurrently
- Scheduling of individual SECTION blocks is implementation dependent.

SINGLE serializes a block of a parallel region (the enclosed code is executed by only one thread):

    !$OMP SINGLE [clause[,clause]...]
     .
     .
    !$OMP END SINGLE [COPYPRIVATE|NOWAIT]

Available clauses:
- PRIVATE(list)
- FIRSTPRIVATE(list)
    void single_example() {
      #pragma omp parallel
      {
        #pragma omp single
        printf("Beginning do_lots_of_work ...\n");

        do_lots_of_work();

        #pragma omp single
        printf("Finished do_lots_of_work.\n");

        #pragma omp single nowait
        printf("Beginning do_lots_more_work ...\n");

        do_lots_more_work();
      }
    }

- No guarantee which thread executes a SINGLE region
- Can use a NOWAIT clause if other threads can continue without waiting at the implicit barrier

The WORKSHARE directive:

    !$OMP WORKSHARE
     .
     .
    !$OMP END WORKSHARE [NOWAIT]

- divides the work in the enclosed code into segments executed once by thread team members.
- units of work are assigned in any manner subject to the execute-once constraint.
- a BARRIER is implied unless END WORKSHARE NOWAIT is used.
Restrictions on WORKSHARE:
- FORTRAN only
- Requires OpenMP version >= 2.0 (often seems to be an exception even in 2.5)
- Enclosed block can only consist of:

Example loop nest of the kind WORKSHARE targets:

    integer :: i,j
    integer,parameter :: n=1000
    real,dimension(n,n) :: A,B,C,D

    do i=1,n
      do j=1,n
        a(i,j) = i*2.0
        b(i,j) = j+2.0
      enddo
    enddo
OpenMP Directives: Data Environment

The REDUCTION clause:
- A private copy of each list variable is created for each thread; the reduced value is written to the global shared variable.
- Listed variables must be named scalars (not arrays or structures), declared as SHARED.
- Watch out for commutativity-associativity (subtraction, for example).

Allowed operators and intrinsics:

FORTRAN operator:   +, *, -, .and., .or., .eqv., .neqv.
FORTRAN intrinsic:  max, min, iand, ior, ieor
C/C++ operator:     +, *, -, ^, &, |, &&, ||

OpenMP 3.1 has (finally) added support for min and max in C/C++.

Example (y is declared int* so that the bitwise XOR is valid):

    void ex_reduction(double *x, int *y, int n) {
      int i,b;
      double a;

      a=0.0;
      b=0;
      #pragma omp parallel for private(i) shared(x,y,n) \
          reduction(+:a) reduction(^:b)
      for (i=0;i<n;i++) {
        a += x[i];
        b ^= y[i]; /* bitwise XOR */
      }
    }
OpenMP Directives: Synchronization
New Features in OpenMP 3.x
New Features in OpenMP 3.x: Improvements to Loop Parallelism