Lecture 18
Stephen Brookes
today
parallel programming
parallelism
exploiting multiple processors
evaluating independent code simultaneously
low-level implementation
scheduling work onto processors
high-level planning
designing code abstractly
without baking in a schedule
our approach
design abstractly
independently of schedule
cost semantics and evaluation
functional benefits
No side effects, so evaluation order doesn't affect the result
caveat
In practice, it's hard to achieve speed-up
Current language implementations
don't make it easy
Problems include:
scheduling overhead
locality of data (cache problems)
runtime sensitive to scheduling choices
why bother?
It's good to think abstractly first
and figure out details later
cost semantics
We've already introduced work and span
cost semantics
We showed how to calculate work and span
cost graphs
A cost graph is a series-parallel graph, built from:
a single node (constant time)
sequential composition of two graphs G1 and G2
parallel composition of two graphs G1 and G2
[diagrams: a single node; sequential composition, with G1 placed above G2; parallel composition, with G1 beside G2]
work
[diagrams: sequential and parallel compositions of G1 and G2]
sequential composition: work(G) = work(G1) + work(G2) + c
(sequential code: add the work)
parallel composition: work(G) = work(G1) + work(G2) + c
(independent code: add the work)
span
[diagrams: sequential and parallel compositions of G1 and G2]
sequential composition: span(G) = span(G1) + span(G2) + c
(sequential code: add the span)
parallel composition: span(G) = max(span(G1), span(G2)) + c
(independent code: take the max)
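These rules can be written down directly in ML. Here is a minimal sketch (the datatype cgraph and its constructor names are mine, not from the lecture), representing series-parallel cost graphs and computing work and span, taking the constant c to be 1:

datatype cgraph = Single                  (* a single constant-time node *)
                | Ser of cgraph * cgraph  (* sequential composition *)
                | Par of cgraph * cgraph  (* parallel composition *)

fun work Single         = 1
  | work (Ser (g1,g2)) = work g1 + work g2 + 1  (* add the work *)
  | work (Par (g1,g2)) = work g1 + work g2 + 1  (* both must still be done *)

fun span Single         = 1
  | span (Ser (g1,g2)) = span g1 + span g2 + 1           (* add the span *)
  | span (Par (g1,g2)) = Int.max (span g1, span g2) + 1  (* take the max *)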
example
[diagram: a cost graph with 11 nodes; an edge from one node to another means the first must be done before the second]
basic operations take constant time
work: time on a single processor
span: time with as many processors as we like
w = 11, s = 4
an optimal schedule uses 5 processors
scheduling
[diagram: an optimal parallel schedule for the example, rounds (i) to (v)]
an optimal parallel schedule (5 rounds, or 4 steps)
example
What if there are only 2 processors?
w = 11, s = 4
[diagram: a best schedule for 2 processors, rounds (i) to (vi)]
a best schedule for 2 processors (6 rounds, 5 steps)
Brent's Theorem
An expression with work w and span s
can be evaluated on a p-processor machine
in time O(max(w/p, s)).
Optimal schedule using p processors:
do (up to) p units of work each round
total work to do is w
needs at least s steps
Brent's Theorem
An expression with work w and span s
can be evaluated on a p-processor machine
in time O(max(w/p, s)).
Find the smallest p such that w/p ≤ s:
using more than this many processors
won't yield any further speed-up
example
w = 11, s = 4
min {p | w/p ≤ s} is 3
[diagram: a best schedule for 3 processors, rounds (i) to (v)]
a best schedule for 3 processors (5 rounds, 4 steps)
3 processors can do the work as fast as 5 (!)
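As a sanity check on this calculation: w/p ≤ s exactly when p ≥ w/s, so the smallest such p is the ceiling of w/s. A quick sketch (minProcs is a hypothetical helper, not from the lecture):

(* smallest p such that w/p <= s, i.e. ceil(w/s) *)
fun minProcs (w : int, s : int) : int = (w + s - 1) div s

val p = minProcs (11, 4)   (* = 14 div 4 = 3, matching the example *)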
next
Exploiting parallelism in ML
A signature for parallel collections
Cost analysis of implementations
Cost benefits of parallel algorithm design
sequences
signature SEQ =
sig
  type 'a seq
  exception Range
  val tabulate  : (int -> 'a) -> int -> 'a seq
  val length    : 'a seq -> int
  val nth       : int -> 'a seq -> 'a
  val map       : ('a -> 'b) -> 'a seq -> 'b seq
  val reduce    : ('a * 'a -> 'a) -> 'a -> 'a seq -> 'a
  val mapreduce : ('a -> 'b) -> 'b -> ('b * 'b -> 'b) -> 'a seq -> 'b
end
implementations
Many ways to implement the signature
lists, balanced trees, arrays, ...
For each one, we can give a cost analysis
There may be implementation trade-offs
arrays: item access is O(1)
trees: item access is O(log n)
Seq : SEQ
An abstract parameterized type of sequences
Think of a sequence as a parallel collection
With parallel-friendly operations
constant-time access to items
efficient map and reduce
We'll work today with an implementation
Seq : SEQ
based on vectors
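The lecture doesn't show the implementation itself, but a minimal sequential sketch of the vector-based pieces might look like this, using SML Basis vectors (written as top-level bindings for readability; map, reduce, and mapreduce are sketched later, where their costs are analyzed):

type 'a seq = 'a vector
exception Range

(* Vector.tabulate applies f at 0, ..., n-1; a parallel implementation
   would evaluate those n independent calls simultaneously *)
fun tabulate (f : int -> 'a) (n : int) : 'a seq =
  Vector.tabulate (n, f) handle Size => raise Range

fun length (s : 'a seq) : int = Vector.length s

(* constant-time access; out-of-range indices raise Range *)
fun nth (i : int) (s : 'a seq) : 'a =
  Vector.sub (s, i) handle Subscript => raise Range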
sequence values
A value of type t seq
is a sequence of values of type t
equality
Two sequence values are (extensionally) equal
iff they have the same length
and their items are equal
operations
For each operation in the signature SEQ
tabulate
tabulate f n = ⟨f 0, ..., f (n-1)⟩
examples
tabulate (fn x:int => x) 6 = ⟨0, 1, 2, 3, 4, 5⟩
tabulate (fn x:int => x*x) 6 = ⟨0, 1, 4, 9, 16, 25⟩
tabulate (fn _ => raise Range) 0 = ⟨ ⟩ (the function is never applied)
length
length ⟨v1, ..., vn⟩ = n
Work is O(1)
Span is O(1)
Cost graph is a single node
Contrast: List.length [v1,...,vn] = n
has work and span O(n)
nth
nth i ⟨v0, ..., v(n-1)⟩ = vi, if 0 ≤ i < n
nth i ⟨v0, ..., v(n-1)⟩ raises Range, otherwise
Work is O(1)
Span is O(1)
Cost graph is a single node
Seq provides constant-time access to items
map
map f ⟨v1, ..., vn⟩ = ⟨f v1, ..., f vn⟩
map f ⟨v1, ..., vn⟩ has cost graph
[diagram: parallel composition of G1, ..., Gn]
where each Gi is the cost graph for f vi
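One way to see where this graph comes from: map can be defined from tabulate, so the n applications of f are independent of one another. A sketch, assuming the vector-based operations above:

(* each f (nth i s) is independent: the parallel composition of G1, ..., Gn *)
fun map (f : 'a -> 'b) (s : 'a seq) : 'b seq =
  tabulate (fn i => f (nth i s)) (length s)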
reduce
reduce should be used to combine a sequence using
an associative function g with identity element z
We write v1 g v2 g ... g vn g z
for the result of combining v1, ..., vn, and z
reduce g z ⟨v1, ..., vn⟩ = v1 g v2 g ... g vn g z
reduce
When g is associative and z is an identity
reduce g z ⟨v1, ..., vn⟩ = v1 g v2 g ... g vn g z
If g is constant time, a divide-and-conquer
implementation gives work O(n) and span O(log n)
(Contrast with foldr, foldl on lists, whose span is O(n))
reduce (op +) 0 ⟨1, 2, 3, 4, 5, 6, 7, 8⟩
[cost graph: a balanced binary tree of + nodes,
adding 1+2, 3+4, 5+6, 7+8 in parallel,
then combining the partial sums pairwise:
((1+2) + (3+4)) + ((5+6) + (7+8))]
reduce cost
reduce g z ⟨v1, ..., v2n⟩ =
g (reduce g z ⟨v1, ..., vn⟩, reduce g z ⟨vn+1, ..., v2n⟩)
cost graph: G1,...,2n is the parallel composition of
G1,...,n and Gn+1,...,2n, followed by a node for g
W(2n) = 2 W(n) + c
S(2n) = S(n) + c
so W(n) is O(n)
and S(n) is O(log₂ n)
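A sketch of the divide-and-conquer reduce behind this recurrence, using VectorSlice from the SML Basis (written sequentially here; the cost analysis assumes the two recursive calls, which are independent, are evaluated in parallel):

fun reduce (g : 'a * 'a -> 'a) (z : 'a) (s : 'a seq) : 'a =
  let
    fun red sl =
      case VectorSlice.length sl of
        0 => z
      | 1 => VectorSlice.sub (sl, 0)   (* z is an identity, so no g needed *)
      | n => let
               val half  = n div 2
               val left  = VectorSlice.subslice (sl, 0, SOME half)
               val right = VectorSlice.subslice (sl, half, NONE)
             in
               (* the two recursive calls are independent *)
               g (red left, red right)
             end
  in
    red (VectorSlice.full s)
  end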
mapreduce
When g is associative and z is an identity,
mapreduce f z g ⟨v1, ..., vn⟩ = (f v1) g ... g (f vn) g z
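A sketch: with the operations above, mapreduce can be written as the composition of map and reduce, though a fused implementation would apply f at the leaves of the reduce tree and avoid building the intermediate sequence:

fun mapreduce (f : 'a -> 'b) (z : 'b) (g : 'b * 'b -> 'b) (s : 'a seq) : 'b =
  reduce g z (map f s)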
examples
fun sum (s : int seq) : int =
  reduce (op +) 0 s

fun count (s : int seq seq) : int =
  sum (map sum s)
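For instance (a hypothetical 3x3 example, assuming the vector-based sketches above):

val rows = tabulate (fn i => tabulate (fn j => i + j) 3) 3
(* ⟨⟨0,1,2⟩, ⟨1,2,3⟩, ⟨2,3,4⟩⟩ *)
val n = count rows   (* sum of the row sums: 3 + 6 + 9 = 18 *)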
analysis
fun sum (s : int seq) : int = reduce (op +) 0 s
fun count (s : int seq seq) : int = sum (map sum s)
analysis
Let s = ⟨s1, ..., sn⟩, with si = ⟨xi1, ..., xin⟩, and let ti = sum si
For each i, sum si = reduce (op +) 0 ⟨xi1, ..., xin⟩
[cost graph of sum si: a balanced tree of + nodes, height log₂ n]
work is O(n)
span is O(log n)
analysis
Let ti = sum si
count s = sum ⟨t1, ..., tn⟩
[cost graph of sum (map sum s): the graphs for sum s1, ..., sum sn
in parallel, each of height log₂ n, followed by a reduce tree
of height log₂ n]
work is n · O(n) + O(n) = O(n²)
span is O(log n) + O(log n) = O(log n)