for FEA
John Higgins, PE
Senior Application Engineer
1 © 2012 ANSYS, Inc. May 18, 2012
Agenda
• Overview
• Parallel Processing Methods
• Solver Types
• Performance Review
• Memory Settings
• GPU Technology
• Software Considerations
• Appendix
A machine:
- Number of cores
- RAM

Analyzing the model prior to launching the run may help in choosing the most suitable solver configuration at the first attempt.
Memory Settings
Parallel Processing – Hardware + Software
                     Laptop/Desktop or     Cluster
                     Workstation/Server
ANSYS                YES                   SMP (per node)
Distributed ANSYS    YES                   YES
[Figure: model partitioned into domains across processors 1 to 4]
Distributed PCG
• For static and full transient analyses
The equation solver dominates the solution CPU time, so it deserves close attention.
The equation solver also consumes the most system resources (memory and I/O).
Solution Procedures:
Prep Data → Element Formation → Global Assembly → Solve [K]{x} = {b} → Element Stress Recovery → Element Output

Element formation writes the emat and esav files; symbolic assembly writes the full file; results are written to the rst/rth files. The database and other data objects are held in-core.
Solver Types: SPARSE (Direct)
Files: LN09
PROS
- More robust with poorly conditioned problems (shells, beams)
- Solution always guaranteed
- Fast for the second and subsequent solves (multiple load cases)
CONS
- Factoring the matrix and solving are resource intensive
- Large memory requirements
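The "fast for the second and subsequent solves" point can be made concrete with a small sketch. This is a dense Cholesky factorization in pure Python, purely illustrative and not ANSYS's sparse direct solver: the expensive factorization of [K] is done once, and each additional load case only needs a cheap forward/back substitution.

```python
# Factor-once / solve-many: the property that makes a direct solver
# fast for multiple load cases. Dense Cholesky sketch on a tiny SPD matrix.

def cholesky(K):
    """Factor the SPD matrix K = L L^T; returns the lower-triangular L."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (K[i][i] - s) ** 0.5
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve(L, f):
    """Given K = L L^T, solve K u = f by forward/back substitution (cheap)."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):            # forward: L y = f
        y[i] = (f[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    u = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T u = y
        u[i] = (y[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))) / L[i][i]
    return u

K = [[4.0, 1.0], [1.0, 3.0]]      # made-up 2x2 SPD "stiffness" matrix
L = cholesky(K)                   # expensive step, done once
u1 = solve(L, [1.0, 2.0])         # first load case
u2 = solve(L, [0.0, 1.0])         # second load case reuses the factor L
```

In a real sparse direct solver the factorization cost and memory are dominated by fill-in, which is why the CONS above mention resource use; the reuse pattern, however, is the same.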
PCG (Iterative)
- Minimization of residuals/potential energy (standard conjugate gradient method): {r} = {f} − [K]{u}
- Iterative process requiring a convergence test (PCGTOL)
- Preconditioned CG is used instead to reduce the number of iterations (preconditioner [Q] ≈ [K]⁻¹, with [Q] much cheaper to apply than [K]⁻¹)
- The number of iterations depends on PCGTOL
Solver Types: PCG (Iterative)
Files: *.PC*
PROS
- Lower memory requirements
- Better suited for well-conditioned, larger problems
CONS
- Not useful with near-rigid or rigid body behavior
- Less robust with ill-conditioned models (shells and beams, inadequate boundary conditions (rigid body motions), highly elongated elements, nearly singular matrices, …): it is more difficult to approximate [K]⁻¹ with [Q]
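The PCG iteration described above can be sketched in a few lines. This is an illustrative Jacobi-preconditioned CG in pure Python, not ANSYS's PCG implementation; the matrix, tolerance, and iteration limit are made up for the example. The diagonal preconditioner [Q] = diag([K])⁻¹ is the cheapest possible approximation of [K]⁻¹, and the residual test plays the role of PCGTOL.

```python
# Jacobi-preconditioned conjugate gradient for [K]{u} = {f}.
# Illustrative sketch only -- real solvers use far stronger preconditioners.

def pcg(K, f, tol=1e-8, max_iter=100):
    n = len(f)
    u = [0.0] * n
    r = f[:]                                  # {r} = {f} - [K]{u}, with u = 0
    q = [1.0 / K[i][i] for i in range(n)]     # [Q] = diag(K)^-1, cheap to apply
    z = [q[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for it in range(1, max_iter + 1):
        Kp = [sum(K[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Kp[i] for i in range(n))
        u = [u[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Kp[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:   # convergence test (PCGTOL role)
            return u, it
        z = [q[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return u, max_iter

K = [[4.0, 1.0], [1.0, 3.0]]   # made-up well-conditioned SPD matrix
f = [1.0, 2.0]
u, iters = pcg(K, f)
```

The better [Q] approximates [K]⁻¹, the fewer iterations are needed, which is exactly why ill-conditioned models (the CONS above) hurt this solver.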
Memory Settings
- Click Start, type resmon.exe in the Start Search box, and then press ENTER.
If more than 1547 MB is available, the solve will run fully in-core, which gives the best performance. This is an important statistic, obtained from PCGOPT,Lev_Diff.
[Figure: GPU attached to the host over a PCI Express channel]
Motivation
• Equation solver dominates solution time
– Logical place to add GPU acceleration
Supported hardware
• Currently recommending NVIDIA Tesla 20-series cards
• Recently added support for Quadro 6000
• Requires the following items
– Larger power supply (1 card needs about 225W)
– Open 2x form factor PCIe x16 Gen2 slot
• Supported on Windows/Linux 64-bit
                   NVIDIA Tesla C2050   NVIDIA Tesla C2070   NVIDIA Quadro 6000
Power              225 Watts            225 Watts            225 Watts
CUDA cores         448                  448                  448
Memory             3 GB                 6 GB                 6 GB
Memory bandwidth   144 GB/s             144 GB/s             144 GB/s
Solver Kernel Speedups vs. Overall Speedups
[Chart: ANSYS Mechanical elapsed times in seconds (lower is better) on a dual-socket Xeon 5670 2.93 GHz (Westmere), 1 and 2 sockets. V13sp-5 model: turbine geometry, 2,100K DOF, SOLID187 elements, static nonlinear, one iteration, direct sparse solver. Kernel speedups range from 1.9x to 4.2x. Adding a Tesla C2075 for use with 6 cores makes the run 46% faster than 12 cores alone, leaving 6 cores available for other tasks.]
[Plot: sparse solver factorization speed (Mflops) versus front size (MB)]
ANSYS Mechanical – Multi-Node GPU
[Chart: normalized solution times at 16, 32, and 64 cores for a model with solder balls, mold, and PCB. Results courtesy of MicroConsult Engineering, GmbH]
Trends in Performance by Solver Type
[Chart: comparative trends of elapsed time over three problem-size regimes (I, II, III) for the sparse, sparse+GPU, PCG level 1, and PCG level 2 solvers. With multiple cores and GPUs, all trends can change due to differences in speedup.]
Load balance
• Improvements in domain decomposition
• Amdahl's Law
• Algorithmic enhancements: every part of the code must run in parallel

User-controllable items:
• Contact pair definitions: big contact pairs hurt load balance (each contact pair is assigned to a single domain in our code)
• CE definitions: many CE terms hurt load balance and Amdahl's law (CEs require communication among the domains in which they are defined)
• Use the best and most suitable hardware possible (CPU speed, memory, I/O, and interconnects)
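Amdahl's Law, mentioned above, can be made concrete with a short sketch. The formula and its consequence are standard; the 95% parallel fraction below is an assumed figure chosen only for illustration.

```python
# Amdahl's law: overall speedup is limited by the serial fraction of the code,
# which is why "every part of the code must run in parallel" matters.
# s = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the core count.

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with an assumed 95% of the work parallelized, 64 cores deliver
# well under 64x, because the 5% serial part dominates at high core counts.
s = amdahl_speedup(0.95, 64)
```

This is why load-balance killers like one oversized contact pair (serializing work into a single domain) cap the achievable speedup no matter how many cores are added.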
• Avoid overlapping contact surfaces if possible
• Define a half circle as the target rather than the full circle
• Split the potential contact surface into smaller pieces
[Figure: example with 14 bonded contact pairs; internal CEs are generated by the bonded contact. Torque defined by RBE3 on the end surface only is good practice.]
Most solution differences come from contact applications when comparing NP = 1 with NP = 2, 3, 4, 5, 6, 7, …
• Check the contact pairs to make sure there is no bifurcation, and plot the deformations to inspect the case.
• Tighten the CNVTOL convergence tolerance to check solution accuracy. If the solutions differ by less than, say, 1%, parallel computing merely changes the convergence path and all solutions are acceptable.
• If the model is well-defined and all input settings are correct, report the case to ANSYS Inc. for investigation.
- Do not uninstall HP-MPI; it is required for compatibility with R13. ANSYS Mechanical customers who have R13 installed and wish to continue using R13 should run the following command to ensure compatibility:
"%AWP_ROOT140%\commonfiles\MPI\Platform\8.1.2\Windows\HPMPICOMPAT\hpmpicompat.bat"
(by default: "C:\Program Files\ANSYS Inc\v140\commonfiles\MPI\Platform\8.1.2\Windows\HPMPICOMPAT\hpmpicompat.bat")
The command will display a dialog box titled "ANSYS 13.0 SP1 Help".
- Change the ANSYS path and the number of processors if necessary (-np x)
- Save and run the file "test_mpi14.bat"
- The expected result is shown below:
Name of the machines used for the solve with PCMPI (up to 3)

Field              Description                            Choice
Machine            Number of machines used                1, 2 or 3
Solver             Type of solver used                    sparse or pcg
Division           Division of the edge for meshing       Any integer
Release            ANSYS release selection                140 or 145
GPU                Use GPU acceleration                   yes or no
np total           Total number of cores                  No choice (value calculated)
np / machine       Number of cores per machine            Any integer
PCG level          Only available for the PCG solver      1, 2, 3 or 4
Simulation method  Shared Memory or Distributed Memory    SMP or DMP
INPUT DATA
Create a job.bat file with all the input data given in the Excel sheet.