
Estimating CPU Time & Memory for Computational Chemistry Calculations


Prepared by David Young
for users of the Alabama Supercomputer Center


CPU time, memory, and disk usage all scale according to scaling factors called
complexities. CPU time scales according to a time complexity, which would be
denoted like O(N^2) in computer science texts. Many of these time complexities can be
found in chapter 15 of "Computational Chemistry: A Practical Guide for Applying
Techniques to Real World Problems," D. Young, Wiley, 2001.
The way in which these complexities are used is described in text form on page
129 of the reference above. Here are some examples in equation form.

Example 1: Consider the situation in which you have run a small-molecule MP2
calculation with a given basis set, and you want to estimate the time required to run a
larger molecule using MP2 and the same basis set. You know the number of basis
functions in the small molecule, denoted N1, and the CPU time to run the small molecule,
denoted T1. Note that CPU time should be obtained from the PBS tracejob command, as
the timings printed in the outputs of Gaussian and other codes are extremely inaccurate in
the current generation of operating systems and computer hardware. You also know from
the reference above that the MP2 algorithm has a time complexity of N^5.
The time complexity is a proportionality factor, thus you know

    T ∝ N^5
Now consider how one would estimate the time to run an MP2 calculation with the
same basis set on a larger molecule. We will denote the number of basis functions in the
larger molecule as N2 and the time to run the larger calculation as T2. Now you can make
a first rough estimate of T2 like this.

    T2 / T1 = N2^5 / N1^5

    T2 = (N2 / N1)^5 × T1
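The first rough estimate above is just a ratio raised to a power; as a quick arithmetic check, it can be computed in a few lines. The numbers below are invented for illustration, not real benchmarks.

```python
def estimate_time(t1_hours, n1_basis, n2_basis, exponent=5):
    """Scale a reference CPU time by the ratio of basis set sizes
    raised to the method's time complexity exponent (5 for MP2)."""
    return t1_hours * (n2_basis / n1_basis) ** exponent

# Hypothetical reference job: MP2 with 100 basis functions took 2 CPU hours.
# Estimate the same method and basis set on a molecule with 250 basis functions.
t2 = estimate_time(2.0, 100, 250)
print(f"Estimated CPU time: {t2:.1f} hours")
```

The same function works for any published time complexity by changing the exponent, e.g. exponent=7 for CCSD(T).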
When you run the calculation, you might find that the time was significantly
different from the time you estimated. Now we must look at why this first rough estimate
was imperfect, and how to make it better. Here are some of the factors that our first
estimate failed to take into account.

SCF convergence: The two calculations may have required a different number of
SCF iterations to converge. This is generally a small error in closed-shell, organic
molecule computations. It can be a large error in open-shell systems, or
molecules containing transition metal atoms.
Geometry optimization convergence: If the calculations are geometry
optimization calculations, one may have taken more optimization steps than the
other. In general, the molecule that starts out closest to the optimized geometry
will require fewer optimization iterations. This is why many researchers optimize
their starting geometries with a molecular mechanics calculation before doing an
ab initio geometry optimization. Also, as a general rule, larger molecules require
more optimization steps than small molecules (assuming a similar accuracy of
starting geometry).
Algorithmic improvements: Algorithmic improvements are slick tricks in the
computer code that let the program do the exact same calculation and get the
exact same answer, but do so more quickly. Some examples of algorithmic
improvements are semi-direct integral evaluation, incremental Fock matrix
updates, and linear scaling methods. For example, the original Roothaan SCF
procedure for solving the Hartree-Fock equations has a time complexity of N^4, but
the Gaussian program has so many algorithmic improvements that the effective
time complexity for its HF calculations is closer to N^2.5. This can be the biggest
source of error in the crude estimate we showed above. Fortunately, algorithmic
improvements generally make your estimate a worst-case limit.
Similar size molecules: Consider the situation in which you are using a methane
calculation to estimate the processing requirements for a large peptide chain. If
the code is using a semi-direct integral evaluation scheme, the methane integrals
may have all fit in memory, thus making it effectively an in-core calculation.
However, only a small percentage of the peptide integrals will fit in memory,
making it effectively a full direct calculation. This is an annoying error in our
estimates because it can result in the larger calculation requiring significantly
more resources than estimated. The fix is to use the calculation for a molecule
closer in size to the one being estimated as your reference.

Now the question is how to make a more accurate estimate.
SCF convergence is usually either a very minor or a very big problem. If you are
seeing large numbers of SCF iterations (more than 20) in the output, refer to chapter 22
of the book referenced above.
Geometry optimization issues are best addressed by doing a pre-optimization with
a molecular mechanics method. Making estimates based on a similar sized molecule,
usually somewhat smaller, also minimizes this error in the estimate. Thus using the best
(closest in size) reference molecule calculation available minimizes two different types of
errors.
The really big issue to be addressed is that the program you are using may have
algorithmic improvements that mean that the published complexity isn't correct. These
differences tend to be the most pronounced for Hartree-Fock, MP2, density functional
theory, and certain basis sets. In fact, a method may have a different time complexity in
one software package than in another. Or there may not be a published complexity to
use. Memory complexity can be utilized in the same way to estimate the memory needs
of a larger job, but very few memory complexities are published. Likewise, few
complexities relevant to frequency calculations are published. The solution to any of
these problems is to do a couple of calculations and use them to compute the
effective time/memory/disk complexity.

Example 2: Consider the situation in which we don't know the memory complexity for
MP2 frequency calculations within a given code. An estimate can be made by looking at
two previous jobs, each doing MP2 frequencies with the same basis set for different size
molecules. We can reasonably assume that the value we need to find is X, where N^X is
the basis set scaling for this method in this piece of software. What we know are the first
calculation's memory usage and number of basis functions, denoted M1 and N1
respectively. We know the same information for a second calculation, denoted M2 and
N2.
The proportionality expression is

    M ∝ N^X
This is again made into a ratio, and solved for X

    M2 / M1 = (N2 / N1)^X

    ln(M2 / M1) = X ln(N2 / N1)

    X = ln(M2 / M1) / ln(N2 / N1)
Now this value X can be used, just as we did in Example 1, to give a more
accurate estimate because it is based on the particular method, software package, basis
set, and hopefully somewhat similar size molecules. Nonetheless, it will still be an
estimate. There will still be specific calculations for which something unusual goes
wrong, or right (guess which is most common). In the end, experience is still the best
teacher.
