T1 / T2 = (N1 / N2)^5

T2 = T1 (N2 / N1)^5
When you run the calculation, you might find that the actual time was significantly
different from the time you estimated. Now we must look at why this first rough estimate
was imperfect, and how to make it better. Here are some of the factors that our first
estimate failed to take into account.
SCF convergence: The two calculations may have required a different number of
SCF iterations to converge. This is generally a small error in closed-shell, organic-molecule
computations. It can be a large error in open-shell systems, or
molecules containing transition metal atoms.
Geometry optimization convergence: If the calculations are geometry
optimization calculations, one may have taken more optimization steps than the
other. In general, the molecule that starts out closest to the optimized geometry
will require fewer optimization iterations. This is why many researchers optimize
their starting geometries with a molecular mechanics calculation before doing an
ab initio geometry optimization. Also, as a general rule, larger molecules require
more optimization steps than small molecules (assuming a similar accuracy of
starting geometry).
Algorithmic improvements: Algorithmic improvements are slick tricks in the
computer code that let the program do the exact same calculation and get the
exact same answer, but do so more quickly. Some examples of algorithmic
improvements are semi-direct integral evaluation, incremental Fock matrix
updates, and linear scaling methods. For example, the original Roothaan SCF
procedure for solving the Hartree-Fock equations has a time complexity of N^4, but
the Gaussian program has so many algorithmic improvements that the effective
time complexity for its HF calculations is closer to N^2.5. This can be the biggest
source of error in the crude estimate we showed above. Fortunately, algorithmic
improvements generally make your estimate a worst-case limit.
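The scaling argument above can be sketched as a short calculation. The exponent of 2.5 is the effective HF value quoted above; the run times and basis-set sizes are hypothetical, and the real exponent for your code must be measured, not assumed:

```python
def estimate_time(t_ref, n_ref, n_target, exponent=2.5):
    """Scale a reference run time to a larger job, assuming
    time scales as N**exponent, where N is the number of basis
    functions. The default exponent of 2.5 is the effective HF
    value mentioned in the text; treat it as an assumption."""
    return t_ref * (n_target / n_ref) ** exponent

# Hypothetical reference: a 100-basis-function job took 60 seconds.
# Estimate a 400-basis-function job with the same method and code.
t = estimate_time(60.0, 100, 400)  # 60 * 4**2.5 = 1920 seconds
```

Note that the same one-line formula gives wildly different answers for exponents of 2.5 versus 4, which is exactly why the effective exponent matters so much.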
Similar size molecules: Consider the situation in which you are using a methane
calculation to estimate the processing requirements for a large peptide chain. If
the code is using a semi-direct integral evaluation scheme, the methane integrals
may have all fit in memory, thus making it effectively an in-core calculation.
However, only a small percentage of the peptide integrals will fit in memory,
making it effectively a fully direct calculation. This is an annoying error in our
estimates because it can result in the larger calculation requiring significantly
more resources than estimated. The fix is to use the calculation for a molecule
closer in size to the one being estimated as your reference.
Now the question is: how do we make a more accurate estimate?
SCF convergence is usually either a very minor or a very big problem. If you are
seeing large numbers of SCF iterations (more than 20) in the output, refer to Chapter 22
of the book referenced above.
Geometry optimization issues are best addressed by doing a pre-optimization with
a molecular mechanics method. Making estimates based on a similar-sized molecule,
usually somewhat smaller, also minimizes this error in the estimate. Thus, using the best
(closest in size) reference molecule calculation available minimizes two different types of
errors.
The really big issue to be addressed is that the program you are using may have
algorithmic improvements that mean that the published complexity isn't correct. These
differences tend to be most pronounced for Hartree-Fock, MP2, density functional
theory, and certain basis sets. In fact, a method may have a different time complexity in
one software package than in another. Or there may not be a published complexity to
use. Memory complexity can be utilized in the same way to estimate the memory needs
of a larger job, but very few memory complexities are published. Likewise, few
complexities relevant to frequency calculations are published. The solution to any of
these problems is to do a couple of calculations and use those two calculations to compute
the effective time/memory/disk complexity.
Example 2: Consider the situation in which we don't know the memory complexity for
MP2 frequency calculations within a given code. An estimate can be made by looking at
two previous jobs, each doing MP2 frequencies with the same basis set for different size
molecules. We can reasonably assume that the value we need to find is X, where N^X is
the basis set scaling for this method in this piece of software. What we know are the first
calculation's memory usage and number of basis functions, denoted M1 and N1
respectively. We know the same information for a second calculation, denoted M2 and
N2.
The proportionality expression is

M ∝ N^X

This is again made into a ratio, and solved for X:

M1 / M2 = (N1 / N2)^X

ln(M1 / M2) = X ln(N1 / N2)

X = ln(M1 / M2) / ln(N1 / N2)
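The derivation above can be sketched numerically. The memory figures and basis-set sizes below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

def effective_exponent(m1, n1, m2, n2):
    """Solve M1/M2 = (N1/N2)**X for X, as derived in the text:
    X = ln(M1/M2) / ln(N1/N2)."""
    return math.log(m1 / m2) / math.log(n1 / n2)

# Hypothetical jobs: job 1 used 2.0 GB with 150 basis functions,
# job 2 used 9.0 GB with 300 basis functions.
x = effective_exponent(2.0, 150, 9.0, 300)

# The fitted exponent then predicts memory for a larger job,
# e.g. a 600-basis-function job scaled from job 2:
m3 = 9.0 * (600 / 300) ** x  # 40.5 GB
```

Because both reference jobs come from the same method, basis set, and software package, the fitted exponent already folds in whatever algorithmic improvements that code uses.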
Now this value X can be used, just as we did in Example 1, to give a more
accurate estimate, because it is based on the particular method, software package, basis
set, and (hopefully) somewhat similar size molecules. Nonetheless, it will still be an
estimate. There will still be specific calculations for which something unusual goes
wrong, or right (guess which is most common). In the end, experience is still the best
teacher.