October 2009
Note
ANSYS FLUENT
Computational Fluid Dynamics (CFD) is a computational technology for simulating fluid flow, heat transfer, and related physical phenomena
Objectives
Test Cluster Configuration
Benchmark Workload
Mellanox InfiniBand Solutions
Industry standard
- Hardware, software, cabling, management
- Designed for clustering and storage interconnect
Performance
- 40Gb/s node-to-node
- 120Gb/s switch-to-switch (12X)
- 1us application latency
- Most aggressive roadmap in the industry
[Chart: the InfiniBand performance gap is increasing; roadmap bandwidth up to 240Gb/s]
Delivering Intelligent Performance
[Diagram: Next Generation Intel Microarchitecture]
- 45nm quad-core Intel Xeon processors
- Bandwidth intensive: Intel QuickPath Technology, integrated memory controller
- Intel 5520 chipset, ICH 9/10, PCI Express* 2.0
- Threaded applications: Intel Hyper-Threading Technology
- Performance on demand: Intel Turbo Boost Technology, Intel Intelligent Power Technology
- Intel Data Center Manager, Intel Node Manager Technology
- Intel X25-E SSDs
Dell PowerEdge Servers helping Simplify IT
FLUENT 12 Benchmark Results - Interconnect
Input Dataset
Truck_14M (14 million elements)
External Flow Over a Truck Body
Performance is reported as normalized scalability
The performance multiple versus the 2-server baseline
For example, a scalability rating of 4 means 4x the performance of the 2-server case
InfiniBand QDR delivers above-linear scalability
At 24 nodes, GigE wastes 50% of the cluster resources
Higher is better
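The normalized-scalability metric above can be illustrated with a short calculation. This is a hypothetical sketch: the timings below are made-up placeholders, not the benchmark's measured numbers.

```python
def scalability_rating(t_baseline: float, t_n: float) -> float:
    """Performance multiple versus the 2-server baseline.
    Performance is inversely proportional to wall-clock time."""
    return t_baseline / t_n

def wasted_fraction(rating: float, nodes: int, baseline_nodes: int = 2) -> float:
    """Fraction of the cluster doing no useful work relative to
    ideal linear scaling from the baseline."""
    ideal = nodes / baseline_nodes  # e.g. 24 nodes / 2 = 12x ideal rating
    return 1.0 - rating / ideal

# Hypothetical wall-clock times (seconds): 2-server baseline vs 24 nodes.
t2, t24 = 1200.0, 200.0
rating = scalability_rating(t2, t24)        # 6.0 -> "scalability of 6"
waste = wasted_fraction(rating, nodes=24)   # 1 - 6/12 = 0.5
print(f"rating={rating:.1f}, wasted={waste:.0%}")
```

A rating of 12 at 24 nodes would be perfectly linear; a rating of only 6 means half of the added hardware produced no speedup, which is the sense in which a slow interconnect "wastes 50% of the cluster resources".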
FLUENT 12 Benchmark Results - Interconnect
Input Dataset
Truck_Poly_14M (14 million elements)
Performance is reported as normalized scalability
The performance multiple versus the 2-server baseline
For example, a scalability rating of 4 means 4x the performance of the 2-server case
InfiniBand QDR delivers above-linear scalability
At 24 nodes, GigE wastes more than 50% of the cluster resources
Higher is better
FLUENT Performance Comparison
FLUENT 12 Profiling MPI Functions
[Chart: number of calls per MPI function (including MPI_Allreduce, MPI_Barrier, MPI_Bcast, MPI_Iprobe, MPI_Isend, MPI_Recv, MPI_Reduce, MPI_Send, MPI_Waitall) at 4, 8, and 16 nodes; y-axis 0-16000]
FLUENT 12 Profiling Timing
[Chart: communication time per MPI function (including MPI_Allreduce, MPI_Barrier, MPI_Bcast, MPI_Iprobe, MPI_Isend, MPI_Recv, MPI_Reduce, MPI_Send, MPI_Waitall); y-axis 0-20000]
FLUENT 12 Profiling Message Transferred
[Chart: number of messages (thousands) by message size bin, from [0..64B] up to messages in the MB range; y-axis 0-28000]
FLUENT Profiling Summary
MPI Functions
FLUENT 12 introduced MPI collective functions
MPI_Allreduce helps improve communication efficiency
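Why a collective such as MPI_Allreduce improves efficiency can be seen from its communication pattern. The sketch below (plain Python, no MPI; function names and values are illustrative) simulates the recursive-doubling pattern commonly used inside MPI_Allreduce and compares its round count with a naive reduce-to-root-then-broadcast.

```python
import math

def allreduce_sum(values):
    """Simulate recursive-doubling allreduce: in each round every
    rank exchanges partial sums with a partner at XOR distance d,
    so all ranks hold the global sum after log2(p) rounds."""
    p = len(values)
    assert p & (p - 1) == 0, "power-of-two rank count assumed"
    vals = list(values)
    d = 1
    while d < p:
        vals = [vals[r] + vals[r ^ d] for r in range(p)]
        d *= 2
    return vals

def doubling_steps(p: int) -> int:
    """Communication rounds for recursive doubling."""
    return int(math.log2(p))

def naive_steps(p: int) -> int:
    """Reduce to root then broadcast, serialized on the root's link:
    (p - 1) receives plus (p - 1) sends."""
    return 2 * (p - 1)

print(allreduce_sum([1, 2, 3, 4]))          # every rank ends with 10
print(doubling_steps(16), naive_steps(16))  # 4 vs 30
```

With 16 ranks the naive scheme needs 30 serialized transfers at the root, while recursive doubling finishes in 4 fully parallel rounds, which is why letting the MPI library perform the reduction as a collective scales better than hand-rolled point-to-point exchanges.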
Productive System = Balanced System
Thank You
HPC Advisory Council
All trademarks are property of their respective owners. All information is provided "as is" without warranty of any kind. The HPC Advisory Council makes no representation as to the accuracy or completeness of the information contained herein. The HPC Advisory Council and Mellanox undertake no duty and assume no obligation to update or correct any information presented herein.