HIPerWall: 200 Million Pixels, 50 Displays, 30 Power Mac G5s
[Figure: processor block diagrams (CPU cores, write-back L2 cache, memory/system interface) for Core 2 Xeon (Mac Pro), Dual-Core Opteron, and Core 2 Quad/Extreme]
// Combine the partial sums left by each thread in resultArray
double pi = 0.0;
for (int i = 0; i < threadCount; i++)
    pi += resultArray[i];
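The loop above assumes every worker thread has already stored its partial result in `resultArray`. A minimal sketch of the whole pattern with plain Java threads follows. It assumes, for illustration only, that each thread's work is midpoint-rule integration of 4/(1+x²) over [0, 1]; the class `PiThreads` and method `computePi` are hypothetical names, not from the slides.

```java
// Hypothetical example: PiThreads/computePi are illustrative names, and the
// per-thread work (midpoint-rule integration of 4/(1+x^2) on [0,1]) is assumed.
public class PiThreads {

    /** Estimates pi using threadCount worker threads and `steps` intervals. */
    public static double computePi(int threadCount, int steps) {
        final double[] resultArray = new double[threadCount]; // one slot per thread
        final double width = 1.0 / steps;
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                double sum = 0.0;                 // private accumulator, no sharing
                // Cyclic split: thread id takes intervals id, id+T, id+2T, ...
                for (int i = id; i < steps; i += threadCount) {
                    double x = (i + 0.5) * width; // midpoint of interval i
                    sum += 4.0 / (1.0 + x * x);
                }
                resultArray[id] = sum * width;    // single write to its own slot
            });
            threads[t].start();
        }

        // join() makes every thread's write to resultArray visible here.
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }

        // Combine the partial sums, as in the reduction loop.
        double pi = 0.0;
        for (int i = 0; i < threadCount; i++)
            pi += resultArray[i];
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(computePi(4, 10_000_000)); // prints a value close to Math.PI
    }
}
```

Each thread accumulates into a local variable and writes its slot exactly once, so adjacent `resultArray` slots cost little even if they share a cache line; the final reduction runs on the main thread after all joins.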
UCI EECS Scalable Parallel and Distributed Systems Lab
Java Threads
[Figure: Then vs. Now. Then: CPUs share system memory through a memory interface (memory bottleneck). Now: cores communicate through cache.]
[Figure: Conventional (Spatial Decomposition) vs. Producer/Consumer (SPPM)]
Benefits:
- Memory bandwidth same as sequential version
- Performance improvement (usually)
- Easy in concept
Drawbacks:
- Complex programming
- Some synchronization overhead
- Not always faster than SDM (or sequential)
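The trade-off above can be made concrete with a toy two-stage kernel. This is a simplified producer/consumer pipeline in the spirit of SPPM, not the lab's SPPM implementation; the class, the `stage1`/`stage2` functions, `BLOCK`, and `LOOKAHEAD` are hypothetical. In `spatial`, each thread streams its own half through both stages, so both cores fetch data from memory. In `producerConsumer`, the consumer processes a block shortly after the producer wrote it, which is intended to let it read data that may still be cache-resident, at the cost of queue synchronization.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical toy kernel; names and constants are illustrative only.
public class DecompositionSketch {
    static final int BLOCK = 4096;    // elements per block (assumed)
    static final int LOOKAHEAD = 2;   // max pending blocks ahead of the consumer (assumed)
    static final int DONE = -1;       // sentinel: no more blocks

    static double stage1(int i) { return Math.sqrt(i); }     // first-stage work
    static double stage2(double v) { return v * v + 1.0; }   // second-stage work

    /** SDM-style: each of two threads runs both stages over its own half. */
    static double[] spatial(int n) {
        double[] a = new double[n], b = new double[n];
        Thread[] ts = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int lo = t * n / 2, hi = (t + 1) * n / 2;
            ts[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) a[i] = stage1(i);
                for (int i = lo; i < hi; i++) b[i] = stage2(a[i]);
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) th.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return b;
    }

    /** Producer/consumer: the producer runs stage 1 on block k, and the calling
     *  thread (consumer) runs stage 2 on the same block shortly afterwards. */
    static double[] producerConsumer(int n) {
        double[] a = new double[n], b = new double[n];
        // Bounded queue: put() blocks when LOOKAHEAD blocks are pending, so the
        // producer cannot race far ahead of the consumer.
        BlockingQueue<Integer> ready = new ArrayBlockingQueue<>(LOOKAHEAD);
        int blocks = (n + BLOCK - 1) / BLOCK;
        Thread producer = new Thread(() -> {
            try {
                for (int k = 0; k < blocks; k++) {
                    int end = Math.min(n, (k + 1) * BLOCK);
                    for (int i = k * BLOCK; i < end; i++) a[i] = stage1(i);
                    ready.put(k);                 // hand block k to the consumer
                }
                ready.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            for (int k = ready.take(); k != DONE; k = ready.take()) {
                int end = Math.min(n, (k + 1) * BLOCK);
                for (int i = k * BLOCK; i < end; i++) b[i] = stage2(a[i]);
            }
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return b;
    }
}
```

Whether the pipeline beats the spatial split depends on block size, cache size, and how evenly the two stages balance, which is why it is not always faster than SDM.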
- 8 cores soon
- Room for improvement
  - Multi-way caches expensive
  - Coherence protocols perform poorly
- Stream programming
  - GPU or multi-core
  - GPGPU.org for details
[Figure: Possible Hybrid AMD Multi-Core Design (Athlon 64 CPU, ATI GPU, XBAR crossbar, HyperTransport, memory controller)]
[Figure: FDTD 80x80x80x1000 execution time (seconds) on Cell and Core2Duo for the Seq, SDM, SPPM, and PTM versions]
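The FDTD workload in the chart (presumably an 80x80x80 grid run for 1000 time steps) is a classic target for spatial decomposition. The sketch below is a deliberately small 1-D stand-in, not the benchmarked code: normalized Yee-style E/H updates with an assumed Courant number of 0.5, fixed end cells, and a Gaussian initial pulse. `Fdtd1D`, `sequential`, and `spatial` are hypothetical names. Each thread owns a contiguous slice, and a `CyclicBarrier` separates the H and E half-steps because the cells at each slice edge read a neighbor's values.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical 1-D FDTD toy used only to illustrate spatial decomposition (SDM).
public class Fdtd1D {
    static final double COURANT = 0.5; // assumed; must be <= 1 for 1-D stability

    /** Initial condition: Gaussian pulse centered in the E field. */
    static double[] initialE(int n) {
        double[] ez = new double[n];
        for (int i = 0; i < n; i++) {
            double d = (i - n / 2) / 5.0;
            ez[i] = Math.exp(-d * d);
        }
        return ez;
    }

    // H update on cells [lo, hi); reads ez[i + 1], which may belong to the next slice.
    static void updateH(double[] ez, double[] hy, int lo, int hi) {
        for (int i = lo; i < Math.min(hi, ez.length - 1); i++)
            hy[i] += COURANT * (ez[i + 1] - ez[i]);
    }

    // E update on cells [lo, hi), end cells fixed; reads hy[i - 1] from the previous slice.
    static void updateE(double[] ez, double[] hy, int lo, int hi) {
        for (int i = Math.max(lo, 1); i < Math.min(hi, ez.length - 1); i++)
            ez[i] += COURANT * (hy[i] - hy[i - 1]);
    }

    static double[] sequential(int n, int steps) {
        double[] ez = initialE(n), hy = new double[n];
        for (int s = 0; s < steps; s++) {
            updateH(ez, hy, 0, n);
            updateE(ez, hy, 0, n);
        }
        return ez;
    }

    static double[] spatial(int n, int steps, int threads) {
        double[] ez = initialE(n), hy = new double[n];
        CyclicBarrier barrier = new CyclicBarrier(threads);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int lo = t * n / threads, hi = (t + 1) * n / threads;
            ts[t] = new Thread(() -> {
                try {
                    for (int s = 0; s < steps; s++) {
                        updateH(ez, hy, lo, hi);
                        barrier.await(); // all H done before any E reads a neighbor's hy
                        updateE(ez, hy, lo, hi);
                        barrier.await(); // all E done before the next H reads a neighbor's ez
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return ez;
    }
}
```

Because every cell is updated with the same arithmetic in both versions, the SDM result should match the sequential one bit for bit; the two barrier waits per time step are the synchronization cost that grows with the thread count.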