Professional Documents
Culture Documents
PRNG
(Not)shared
Atomics
2014-11-05
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
The problem
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
And so. . .
Concurrent programming
Hard ("Small fixes to prevent blue-screens")
Attention-demanding ("Free lunch is over")
Summary
The problem
PRNG
(Not)shared
Atomics
And so. . .
Concurrent programming
Hard ("Small fixes to prevent blue-screens")
Attention-demanding ("Free lunch is over")
Summary
The problem
PRNG
(Not)shared
PRNG
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Sequential program
1
2
3
4
5
Summary
The problem
PRNG
(Not)shared
Atomics
Sequential program
1
2
3
4
5
Summary
The problem
PRNG
(Not)shared
Atomics
Sequential program
1
2
3
4
5
Summary
The problem
PRNG
Parallel program
4 cores 4 threads
1/4 iterations each
4x speedup!
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Parallel program
4 cores 4 threads
1/4 iterations each
4x speedup!
1
2
3
4
5
Summary
The problem
PRNG
(Not)shared
Results
Timing:
Parallel way slower. . .
More cores == slower execution
Atomics
Summary
The problem
PRNG
(Not)shared
Results
Timing:
Parallel way slower. . .
More cores == slower execution
Profiling:
Low CPUs load
Mostly waiting on a single mutex
Atomics
Summary
The problem
PRNG
(Not)shared
Results
Timing:
Parallel way slower. . .
More cores == slower execution
Profiling:
Low CPUs load
Mostly waiting on a single mutex
Logic:
Come again?
What MUTEX?!
Atomics
Summary
The problem
PRNG
(Not)shared
Results
Timing:
Parallel way slower. . .
More cores == slower execution
Profiling:
Low CPUs load
Mostly waiting on a single mutex
Logic:
Come again?
What MUTEX?!
Suspect:
simulateRandomEnv()
random()
POSIX: random() is thread-safe. . .
. . . via mutex
Atomics
Summary
The problem
PRNG
What is wrong?
(Not)shared
Atomics
Summary
The problem
PRNG
What is wrong?
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Fix
Drop problematic random()
Add per-thread PRNG
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
Fix
Drop problematic random()
Add per-thread PRNG
1
2
3
4
5
6
7
8
9
10
11
12
13
int simulateRandomEnv()
{
using Distribution = uniform_int_distribution<long>;
random_device rd;
mt19937
gen{rd()};
Distribution dist{0, RAND_MAX};
auto
random = [&]{ return dist(gen); };
int
result = 0;
//
// rest of the code remains the same!
//
return result;
}
The problem
PRNG
(Not)shared
Lessons learned
Measure:
Do it always
Also "the obvious"
Especially when "you are sure"
No excuse for not doing so
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Measure:
Do it always
Also "the obvious"
Especially when "you are sure"
No excuse for not doing so
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Measure:
Do it always
Also "the obvious"
Especially when "you are sure"
No excuse for not doing so
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
(Not)shared
Summary
The problem
PRNG
(Not)shared
Source code
1
2
unsigned a = 0;
unsigned b = 0;
3
4
5
6
7
8
void threadOne()
{
for(int i=0; i<10*1000*1000; ++i)
a += computeSth(i);
}
9
10
11
12
13
14
void threadTwo()
{
for(int i=0; i<10*1000*1000; ++i)
b += computeSthElse(i);
}
Atomics
Summary
The problem
PRNG
(Not)shared
Source code
1
2
unsigned a = 0;
unsigned b = 0;
3
4
5
6
7
8
void threadOne()
{
for(int i=0; i<10*1000*1000; ++i)
a += computeSth(i);
}
9
10
11
12
13
14
void threadTwo()
{
for(int i=0; i<10*1000*1000; ++i)
b += computeSthElse(i);
}
Data-race free
Returns expected results
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Variables in memory
1
2
Atomics
Summary
The problem
PRNG
(Not)shared
Variables in memory
1
2
Assume 32-bit
Most likely:
Consecutive addresses
Same cache line
Atomics
Summary
The problem
PRNG
(Not)shared
Line-wise
Caches are not byte-wise
Operate on lines
Tens-hundreds of bytes
Eg. 64B in my case
Operate on aligned addresses
Atomics
Summary
The problem
PRNG
(Not)shared
Line-wise
Caches are not byte-wise
Operate on lines
Tens-hundreds of bytes
Eg. 64B in my case
Operate on aligned addresses
Atomics
Summary
The problem
HOT-line
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
HOT-line
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Solution #1
Sun Tzu: Art of war. . .
. . . avoid situations like this! :)
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Probable:
Globals from different compilation units
Dynamically allocated memory
Summary
The problem
PRNG
(Not)shared
Atomics
Probable:
Globals from different compilation units
Dynamically allocated memory
Risky business. . .
Summary
The problem
PRNG
(Not)shared
Solution #2
Ensure wont happen
One variable one cache line:
Alignment
Padding
Atomics
Summary
The problem
PRNG
(Not)shared
Solution #2
Ensure wont happen
One variable one cache line:
Alignment
Padding
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Helper template
1
2
3
4
5
6
7
8
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Measure:
Do it always
Also "the obvious"
Especially when "you are sure"
No excuse for not doing so
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Measure:
Do it always
Also "the obvious"
Especially when "you are sure"
No excuse for not doing so
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
int g_counter = 0;
2
3
4
5
6
7
Good. . . ?
Bad. . . ?
Ugly. . . ?
Summary
The problem
PRNG
(Not)shared
Atomics
int g_counter = 0;
2
3
4
5
6
7
Bad. . . ?
Ugly. . . ?
Right 5% of times
Good. . . ?
Summary
The problem
PRNG
(Not)shared
Atomics
2
3
4
5
6
7
Good. . . ?
Bad. . . ?
Ugly. . . ?
Summary
The problem
PRNG
(Not)shared
Atomics
2
3
4
5
6
7
Bad. . . ?
Ugly. . . ?
Good. . . ?
Closest: 1871882/4000000
Summary
The problem
PRNG
(Not)shared
Atomics
std::atomic<int> g_counter(0);
2
3
4
5
6
7
Summary
The problem
PRNG
(Not)shared
Atomics
std::atomic<int> g_counter(0);
2
3
4
5
6
7
Summary
The problem
PRNG
(Not)shared
Atomics
std::atomic<int> g_counter(0);
// shared
2
3
4
5
6
7
8
9
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
3
4
5
6
7
8
9
void thread1()
{
started1 = true;
if(not started2)
std::cout << "thread 1 was first\n";
}
10
11
12
13
14
15
16
void thread2()
{
started2 = true;
if(not started1)
std::cout << "thread 2 was first\n";
}
Atomics
Summary
The problem
PRNG
(Not)shared
Results
Atomics
Summary
The problem
PRNG
(Not)shared
Results
Atomics
Summary
The problem
PRNG
(Not)shared
Results
Atomics
Summary
The problem
PRNG
(Not)shared
1
2
3
4
5
6
7
8
9
void thread1()
{
started1 = true; // memory write
if(not started2) // memory read
{ /* some action */ }
}
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Zoom in reordered
1
2
3
4
5
6
7
8
9
void thread1()
{
if(not started2) // memory read
{ /* some action */ }
started1 = true; // memory write (!)
}
Summary
The problem
PRNG
(Not)shared
Sequential consistency
Reordering by:
Compiler
Hardware
Atomics
Summary
The problem
PRNG
(Not)shared
Sequential consistency
Reordering by:
Compiler
Hardware
Atomics
Summary
The problem
PRNG
(Not)shared
Sequential consistency
Reordering by:
Compiler
Hardware
Atomics
Summary
The problem
PRNG
(Not)shared
Sequential consistency
Reordering by:
Compiler
Hardware
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Never use volatiles for threading
I mean it!
Synchronize using:
Atomics
Mutexes
Conditionals
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Never use volatiles for threading
I mean it!
Synchronize using:
Atomics
Mutexes
Conditionals
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Never use volatiles for threading
I mean it!
Synchronize using:
Atomics
Mutexes
Conditionals
Atomics
Summary
The problem
PRNG
(Not)shared
Lessons learned
Never use volatiles for threading
I mean it!
Synchronize using:
Atomics
Mutexes
Conditionals
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
Summary
The problem
PRNG
(Not)shared
Atomics
Summary
The problem
PRNG
(Not)shared
Atomics
Be cache-aware:
Keep non-shared data in separate cache lines
Prefer local over shared
Summary
The problem
PRNG
(Not)shared
Atomics
Be cache-aware:
Keep non-shared data in separate cache lines
Prefer local over shared
Synchronize properly:
Ensure code is data-race free (DRF)
NEVER use volatiles for sharing
Benefit from sequential consistency (SC)
Summary
The problem
PRNG
(Not)shared
Atomics
Be cache-aware:
Keep non-shared data in separate cache lines
Prefer local over shared
Synchronize properly:
Ensure code is data-race free (DRF)
NEVER use volatiles for sharing
Benefit from sequential consistency (SC)
Homework:
Read C++11 memory model
Read multi-threaded executions and data races
x86* vs. ARM and IA-64
Summary
The problem
PRNG
(Not)shared
Atomics
More materials
Something to watch:
Threads and shared variables in C++11,
Hans Boehm,
available on Channel9
Atomic<> Weapons,
Herb Sutter,
available on Channel9
CPU Caches and Why You care,
Scott Meyers,
available on Code::Dive :)
Summary
The problem
PRNG
(Not)shared
Atomics
More materials
Something to watch:
Threads and shared variables in C++11,
Hans Boehm,
available on Channel9
Atomic<> Weapons,
Herb Sutter,
available on Channel9
CPU Caches and Why You care,
Scott Meyers,
available on Code::Dive :)
Something to read:
C++ Concurrency in action: practical multithreading,
Anthony Williams
Summary
The problem
PRNG
(Not)shared
Atomics
More materials
Something to watch:
Threads and shared variables in C++11,
Hans Boehm,
available on Channel9
Atomic<> Weapons,
Herb Sutter,
available on Channel9
CPU Caches and Why You care,
Scott Meyers,
available on Code::Dive :)
Something to read:
C++ Concurrency in action: practical multithreading,
Anthony Williams
The Hitchhikers Guide to the Galaxy,
Douglas Adams
Summary
The problem
PRNG
(Not)shared
Questions?
Atomics
Summary