
REVIEW

Biophysical experiments and biomolecular simulations: A perfect match?

Sandro Bottaro and Kresten Lindorff-Larsen*

Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science and Integrative Structural Biology at University of Copenhagen (ISBUC), Department of Biology, University of Copenhagen, Copenhagen, Denmark.
*Corresponding author. Email: lindorff@bio.ku.dk

Downloaded from http://science.sciencemag.org/ on July 26, 2018

A fundamental challenge in biological research is achieving an atomic-level description and mechanistic understanding of the function of biomolecules. Techniques for biomolecular simulations have undergone substantial developments, and their accuracy and scope have expanded considerably. Progress has been made through an increasingly tight integration of experiments and simulations, with experiments being used to refine simulations and simulations used to interpret experiments. Here we review the underpinnings of this progress, including methods for more efficient conformational sampling, accuracy of the physical models used, and theoretical approaches to integrate experiments and simulations. These developments are enabling detailed studies of complex biomolecular assemblies.

In modern biological research, a key goal is to understand the functional consequences of structure, dynamics, and interactions of biological macromolecules. Proteins, lipids, carbohydrates, and nucleic acids interact, rearrange, and modify their shape while effecting their various functions. Experimentalists face the daunting task of characterizing thermodynamic and kinetic properties of macromolecules in a complex environment. Computational simulation plays a role in these efforts, as modeling approaches can aid in understanding experimental data and designing and predicting the outcome of future experiments.

Here we review the progress and current challenges in computational modeling of biomolecules, focusing on the topic of atomistic biomolecular simulations and the relationship between experiments and simulations. We highlight recent technological and theoretical advances in the field and consider whether there is a perfect match between experiments and simulations. Disagreement between computation and experiment provides useful insights to further our understanding, and their complementary use yields a clearer picture than either does alone.

Biomolecular simulations across length and time scales

Experimentalists often collect data that must then be synthesized into a coherent model through inverse problem-solving. Computational modelers deal with the forward problem: constructing a microscopic molecular model that can be compared with observed data (Fig. 1). In the best-case scenario, computational models are fully predictive and widely applicable. This would correspond to a model that correctly predicts the physical behavior of a system, for example, the catalytic power and stability of an enzyme as a function of pH and temperature. Such perfect models do not exist, so variants are developed with distinct strengths and areas of application.

Detailed information for reaction mechanisms and transition states in chemical reactions can be obtained via quantum mechanical (QM) calculations. These allow for simulation of the electronic properties of a subset of atoms within a macromolecule, which can be used to investigate bond cleavage and formation, distribution of charge and spin, and reaction mechanisms. Simulation of electronic properties of molecules requires a great deal of computational power, and thus the applicability of QM methods is, in general, limited to small systems or short time scales (1).

Molecular dynamics (MD) simulations with empirical molecular mechanics force fields treat atoms as classical particles rather than considering their electronic structure: This approximation makes it possible to study the structure and dynamics of larger systems for longer periods of time, such as small proteins at the millisecond time scale. There are many relevant biological processes, however, that involve much larger biomolecular assemblies. The computational complexity of these problems can be decreased by grouping atoms together into single particles called beads. Such coarse-grained (CG) models range in resolution from one or a few beads per amino acid to one bead per hundreds or thousands of DNA bases. Despite their intrinsic approximations, such models are essential for tackling important problems in structural biology, including understanding complex formation between intrinsically disordered proteins (2) and rationalizing chromosome conformation–capture experimental data, thereby gaining insights into the internal chromosome organization (3). There also exist mixed, hybrid, and intermediate models that bridge together different resolutions. We focus below on the use of classical MD simulations to study biomolecules.

Challenges and opportunities for biomolecular simulations

The successful use of MD simulations hinges on solving two distinct, yet related, problems: the “sampling problem” and the “force-field problem” (Fig. 2). The sampling problem refers to our ability to sample the relevant biomolecular configurations and to determine their relative populations. Exhaustive sampling is difficult to achieve, because it is not possible to know in advance the required amount of sampling needed to calculate precise statistical averages. It is even difficult to assess whether a simulation is converged, because one may never know whether there are motions occurring on time scales beyond those sampled, and robust and generally applicable tools to monitor convergence are needed (4, 5). Thus, active areas of development are theories, algorithms, and technological improvements to increase the precision of the simulations.

The force-field problem refers to the construction of the energy function that describes the physical interactions between atoms. Improvements in force fields thus increase the accuracy of simulations by providing a more realistic description of the molecular interactions. Although progress on solving these two problems requires distinct approaches, they are tightly related. Only after taking into account all relevant conformations, that is, those that contribute to thermodynamic averages, is it meaningful to calculate average quantities and compare them to experiments. Hence, our ability to improve force fields is tightly connected to improvements in sampling.

Challenge 1: Improving physical models

The first fundamental challenge in biomolecular modeling is the construction of the physical model itself. Trade-off between computer power and spatial or temporal resolution requires a choice of model, ranging from all-atom representations to CG (6) and ultra-CG models in which multiple residues or nucleotides are represented by a single site (7). A long-standing goal of the field is to construct hybrid models that smoothly couple together different components at different resolutions (8).

In atomistic MD simulations, interactions between particles are modeled by “physics-based” terms that take into account chain connectivity, electrostatic interactions, London dispersion forces, and so on. Parameterized pairwise interaction terms are fitted against QM calculations and experimental data to generate a force field that describes interactions between individual particles. Parameters like the equilibrium distance between two covalently bonded atoms are known with high accuracy. Other parameters, such as partial charges, are difficult to establish, as they do not correspond to physical observables that can be directly probed through experiments.

Accurate force-field parameterization for proteins has benefited from benchmarking and direct optimization of MD simulations with experimental

Bottaro et al., Science 361, 355–360 (2018) 27 July 2018 1 of 6


FRONTIERS IN COMPUTATION

nuclear magnetic resonance (NMR) data on partially structured peptides (9, 10). Simulations of peptides that are 10 to 40 residues long can be converged, yet capture cooperative phenomena, such as helix-coil transitions or formation of small hydrophobic cores, which are difficult to parameterize from smaller molecules. Solution NMR experiments can provide residue-level information and are sensitive to the relative energies of conformations that correspond to local minima and have sizable populations. By optimizing the backbone potential to match the helicity of a 15-residue peptide, as measured by NMR, a small change of about 1 kJ mol⁻¹ was found to be sufficient to balance the secondary-structure populations (10). This small change in energy leads to a substantial improvement in accuracy, as the forces are accumulated over multiple residues and the populations scale exponentially with energy. Corrections to force fields obtained from examining short peptides are transferable among different structural classes of proteins and have improved models of folding processes for small globular and fast-folding proteins (11, 12).

Unfolded states and intrinsically disordered proteins (IDPs) have long appeared to be more compact and structured when observed in MD simulations than when observed through experiments. This discrepancy suggests that important physical effects were not modeled correctly in the simulations. Proposed solutions include improving the description of water or of protein-water interactions (13). These modifications have improved the accuracy of simulations of IDPs (14), which play important roles in biology and disease. A side effect of increasing protein-water interactions is, however, destabilization of folded proteins. In practice, one might thus have to choose between one family of force fields for simulations of folded proteins and a separate set for disordered systems, complicating studies of partially folded systems or of order-disorder transitions such as folding upon binding. To tackle this problem, it is necessary to consider simultaneously proteins that span from fully ordered to completely disordered and to test and optimize parameters at the same time on all of these systems. A comprehensive study of model systems with diverse properties has recently produced a force field capable of handling both fully folded proteins and IDPs (15).
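The earlier remark that a backbone correction of about 1 kJ mol⁻¹ matters because populations depend exponentially on energy can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it treats the correction as a simple additive energy shift per residue, uses a temperature of 300 K, and the function name `population_ratio` is hypothetical rather than taken from the cited work.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def population_ratio(delta_e_kj_mol: float, temperature_k: float = 300.0) -> float:
    """Boltzmann ratio p_shifted / p_reference for an energy shift delta_e."""
    return math.exp(-delta_e_kj_mol / (R_KJ_PER_MOL_K * temperature_k))


# Assumed, illustrative numbers: a 1 kJ/mol shift acting on one residue,
# and the same shift accumulated additively over 15 residues.
per_residue = population_ratio(1.0)
accumulated = population_ratio(1.0 * 15)
print(f"one residue: {per_residue:.2f}, fifteen residues: {accumulated:.4f}")
```

Under these assumptions a single 1 kJ mol⁻¹ shift changes a relative population by a factor of roughly 0.67, whereas the same shift accumulated over 15 residues changes it by a factor of roughly 0.002, which is why seemingly minor parameter changes can rebalance secondary-structure populations.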
In parallel to the development of models to study the structure and dynamics of proteins, there has been a growing interest in modeling nucleic acids, particularly RNA because of its catalytic and regulatory activities. Although important improvements have been made, state-of-the-art RNA force fields remain less accurate than those for polypeptides (16). Here, too, artifacts of MD simulations have been uncovered by direct comparison against solution NMR data on small model systems (17). Similar to the case of IDPs, promising results have been obtained by balancing water-RNA and RNA-RNA interactions (18).

Systematic benchmarking of force fields against experiments has revealed a comforting trend: Force fields are getting better (12, 15, 19). It is worthwhile to note that these improvements have been possible even without substantial changes to the underlying model or mathematical function used in the force fields. Thus, despite the inherent simplicity and lack of, for example, taking polarization into account, it has been possible to improve force fields dramatically. Indeed, it is surprising that it is possible to parameterize force fields that work well across many different proteins and problems (20), and eventually, progress will require models that are more complex.

Improvements of force fields generally rely on ab initio QM calculations. Machine-learning approaches, particularly neural networks, make it possible to train simple potentials with QM-level accuracy (21). Training is typically done on small molecules, and encouraging results have been obtained by transferring these potentials to the study of larger organic molecules (22). Force fields that explicitly include polarization effects are also likely to benefit from automated methods for integrated parameterization from experiments and QM calculations and from improvements in software and algorithms for sampling with these potentials. Here, Bayesian methods for optimizing force fields against experimental data and QM calculations are expected to play an even larger role, by enabling systematic balancing of different sources of information (23–25).

Fig. 1. Simulations and experiments are complementary. (A) Solving an inverse problem aims to describe causal factors that produce a set of observations. Molecular simulations, conversely, can be used to construct a set of microscopic molecular conformations that can be compared with experimental observations through the use of a forward model. (B) Computational approaches to studying biomolecules range from detailed quantum mechanical models to atomistic molecular mechanics to coarse-grained models, where several atoms are grouped together. The decreased computational complexity granted by progressive coarse-graining makes it possible to access longer time scales and greater length scales. (C) Experimental data can be combined with physical models to provide a thermodynamic and kinetic description of a system. As the model quality improves, it becomes possible to describe more complex phenomena with less experimental data. SANS, small-angle neutron scattering; EPR, electron paramagnetic resonance; FRET, fluorescence resonance energy transfer; ΔG, Gibbs free energy.
CREDIT: ADAPTED BY V. ALTOUNIAN/SCIENCE; FREEPIK

Challenge 2: Accessing long time scales

Atomistic biomolecular MD simulations are inherently costly, owing to the need to model forces between tens or hundreds of thousands of individual atoms or more. These forces are evaluated every few femtoseconds of simulation time, requiring about a billion steps to simulate a molecule for a microsecond. Although the speed of a simulation depends strongly on the size of the biomolecular system and the available computational resources, it is not uncommon to require weeks or months of computer time with hundreds or thousands of processors working simultaneously to obtain microsecond-length simulations.
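The step count quoted above follows from simple arithmetic, sketched below. The time steps of 1 and 2 fs and the throughput of 50 ns per day are assumptions chosen only for illustration, and the helper names `md_steps` and `wallclock_days` are hypothetical.

```python
def md_steps(sim_time_us: float, timestep_fs: float) -> int:
    """Number of integration steps needed to cover sim_time_us microseconds."""
    fs_per_us = 1e9  # 1 microsecond = 10^9 femtoseconds
    return round(sim_time_us * fs_per_us / timestep_fs)


def wallclock_days(sim_time_us: float, ns_per_day: float) -> float:
    """Wall-clock days at a given throughput (ns of simulated time per day)."""
    return sim_time_us * 1000.0 / ns_per_day  # 1 microsecond = 1000 ns


# Assumed time steps of 1 fs and 2 fs for one microsecond of simulated time.
print(md_steps(1.0, 1.0))
print(md_steps(1.0, 2.0))
# Assumed throughput of 50 ns/day; real values depend on system size and hardware.
print(wallclock_days(1.0, 50.0))
```

With a 1 fs step, one microsecond corresponds to 10⁹ force evaluations; at the assumed throughput the run would occupy about 20 days, in line with the weeks-to-months range mentioned in the text.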
Conceptually, the most straightforward means to increase the speed and throughput of molecular simulations is perhaps “simply” to decrease the time it takes to perform a single iteration of the simulation. Widely used software packages designed for biomolecular simulations, such as GROMACS (26), NAMD (27), Desmond (28), AceMD (29), and AMBER (30), use different levels of parallelization by taking advantage of multicore processors and high-performance computing facilities. Speedups can be achieved by off-loading calculations to graphics processing units, which provide high performance at reasonable cost. A different route to improve efficiency is to build hardware specifically adapted to molecular simulations such as MDGRAPE (31) and Anton (32). For example, Anton is a massively parallel supercomputer designed to perform fast and accurate simulations of biomolecules by simultaneously considering all parts of the calculations, including MD-specific integrated circuits for calculating the costly parts of the force-field interactions, a specialized communication network tailored to match the periodic boundary conditions used in simulation, and special parallelization algorithms developed for this architecture. Anton enabled the first millisecond-length all-atom MD simulation of a globular protein (32). Its successor, Anton 2, is optimized for larger biomolecular systems and can perform multi-microsecond simulations in a single day for systems such as a small virus or a solvated ribosome with more than 1 million atoms (33).

Fig. 2. Sampling and accuracy in molecular simulations. An MD simulation samples the temporal evolution of molecular configurations, but sampling is, in practice, limited to a finite time (t = t_sim). (A) When the simulation time is much longer than the slowest time scales of motions, many transitions are observed between the relevant conformational states. (B) When such simulations are performed with an accurate force field, statistical averages are converged and will be close to experimental values, and the averages approach the infinite time-scale average. (C) By contrast, when the force field is inaccurate, converged simulations give rise to precise, but inaccurate, results. (D) When the simulation time is too short compared to the time scales of the system, it is difficult to calculate precise quantities. (E) In this case, one may get disagreement between experiment and simulation even when the force field is accurate. (F) The worst situation, when sampling is insufficient and the force field is inaccurate.
CREDIT: ADAPTED BY V. ALTOUNIAN/SCIENCE

Massive parallelization has been exploited in the Folding@home project, which utilizes hundreds of thousands of “stand-by” machines all over the globe (34). Such distributed computing studies may now reach multiple milliseconds of aggregate simulation time and consist of hundreds or thousands of simulations ranging from hundreds of nanoseconds to a few microseconds (35). Because each simulation may be much shorter than the time scales of interest, a key problem is how to extract information about slow, long–time scale processes from a combined analysis of many short simulations. One possible solution to this problem is to build a Markov state model (MSM) (35, 36), which enables one to construct a “memoryless transition network” describing the populations and kinetics of interconversion between different conformational states. In recent years, MSMs have gained widespread use, thanks to improved algorithms and software (37, 38) and several successful applications to biomolecular processes, including protein folding, ligand binding, and protein-protein association (35). Path-based methods such as transition path sampling (39) and milestoning (40) also use many short simulations to study kinetics and mechanisms of long–time scale processes. These and related methods exploit the fact that many conformational transitions are “rare events,” for which the time it takes to cross the barrier is substantially shorter than the waiting time between such events.

Sampling may also be enhanced by changing simulation parameters. Increasing the temperature increases the kinetic energy, making barrier-crossing events faster (41). This idea is at the basis of parallel tempering, perhaps the most widely used enhanced sampling approach. These alterations can be viewed as enhancing simulations along a progress variable, also known as a reaction coordinate or collective variable (CV), in this case related to the energy of the system. For some problems, rapid fluctuations of the energy, and similar energies in different distinct conformational states, mean that increased temperature does not transfer into efficient sampling. This problem can be exacerbated by the fact that the available conformational space at high temperatures is larger, so that increased rates of sampling are more than offset by the increased volume of conformational space. Accelerated MD may instead be used to “boost” the energy along internal degrees of freedom, such as the backbone dihedral angles, thus enhancing the ability to cross local barriers (42).

Enhancing sampling along one or more prespecified CVs that describe the process of interest is another widely used strategy (43). In a protein-folding simulation, the number of native contacts formed or the progress along an initial guess of the folding path might be used to guide the simulation, even if the path is imperfect, and thus provide detailed insight into the folding free-energy landscape. Metadynamics (44) uses a time-dependent potential to simultaneously enhance sampling and construct a free-energy profile along such CVs and is widely used both because of its applicability to a range of problems (e.g., biomolecular processes, molecular docking, chemical reactions, crystal growth, and proton diffusion) and the availability of efficient and easy-to-use software (45).

Long unbiased simulations performed with Anton represent a useful reference to benchmark


and validate enhanced sampling methods and kinetic models. In such applications, one may compare a specific protocol for enhanced sampling or constructing a kinetic model with the results from an unbiased simulation with the same force field to focus on benchmarking the algorithms and avoiding complications from force-field uncertainty (46, 47).

Approaches based on CVs are very powerful, but their optimal choice is a critical and nontrivial step. For complex biomolecular rearrangements, it is difficult to identify CVs that correspond to the relevant, slowly varying degrees of freedom. In this respect, deep learning approaches have recently been used to identify improved CVs (48). CVs are not only useful to enhance sampling: They are essential to rationalize the large amount of complex data generated in MD simulations. New approaches to create better low-dimensional representations of high-dimensional data are also useful to construct improved MSMs (35), and we expect the advances in machine-learning methods [e.g., low-dimensional embedding and clustering (49, 50)] to play an increasingly important role in this field.

Challenge 3: Integrating experiments and simulations

Although simulations and statistical mechanical theories are important and very powerful in their own right, direct integration of experimental data with molecular simulations can provide a rich description of the structure and dynamics of biomolecules. This field, also called integrative structural biology (51), has benefited from recent technological advances in cryo–electron microscopy (cryo-EM) and is particularly important for studies of complex, dynamic systems for which multiple structural techniques provide complementary information. Formally, the problem consists of determining the three-dimensional structure or, more generally, an ensemble of molecular conformations and their associated weights, which are compatible with a set of experimental observations.

One strategy is to modify the simulation to match experimental data (Fig. 3). In this case, the force field is not considered a fixed, immutable model but instead a fitting function to be adjusted by experimental data. Indeed, this “pseudoenergy” approach underlies most structure-determination algorithms in which a physical energy function (often a simplified force field) is combined with an “experimental energy function” that measures the deviation between experiment and simulation (52). These integrative approaches enable accurate protein-structure determination when using chemical shifts (53) (Fig. 3A) or when using sparse, uncertain, and ambiguous experimental data (54). Similar approaches have been developed with the aim of providing molecular models of large molecular complexes constructed by using diverse sets of experimental data, including cross-linking, small-angle x-ray scattering (SAXS), and cryo-EM images (55, 56). The availability of different sources of complementary experimental data is, in this context, important, as it allows one to cross-validate results and avoid overfitting.

Fig. 3. Experimentally driven simulations. (A) Probability distribution of the structural similarity to the native structure of a protein determined by using a simplified force field (blue) or when the same force field is combined with NMR chemical-shift restraints (green) (53). RMSD, root mean square deviation. (B) A representative three-dimensional structure from the restrained simulation (green) and a reference structure (black). (C) In a conventional restrained simulation, the probability distribution of a measured quantity obtained by sampling using the force field alone (blue) is modified by adding an additional energy term that enforces the agreement with experimental data (green). In the resulting ensemble, all individual molecular conformations are close to the experimental average. (D) When heterogeneous conformations give rise to the measured average value (e.g., scalar couplings for different rotameric states), adding the experimental restraints to push individual conformations close to the experimental value is not correct, as this forces the simulation to structures that may not represent any of the relevant states. In maximum-entropy approaches, the experimental data are satisfied by introducing a minimal perturbation to the simulation ensemble. In this simplified example, the solution is a small shift in the populations of the two states, which results in a calculated average (red dashed line) compatible with the experiment.
CREDIT: ADAPTED BY V. ALTOUNIAN/SCIENCE

Structural experiments such as SAXS, NMR, and x-ray diffraction report on quantities averaged over many molecules and long periods of time. For rigid molecules, the error may be small when interpreting ensemble-averaged quantities for individual structures. However, dynamical averaging is crucial when studying flexible molecules, such as IDPs or single-stranded RNA, because the structural interpretation of experimental data must be addressed by considering the coexistence of multiple conformational states (Fig. 3). One theoretical approach for dealing with the averaging problem is based on the maximum-entropy principle (52). The basic idea is to introduce a perturbation to the conformational ensemble generated by simulations in order to match a set of experimental data. The perturbation should be as small as possible: Mathematically, this is achieved by maximizing a quantity called relative Shannon entropy, hence the name maximum entropy. Thus, a minimal modification to the simulations to match the experimental data results in the least-biased combination of the force field and the experimental measurements. In practice, these approaches can remove much of the uncertainty associated with the choice of force fields so that conformational ensembles derived by combining experiments and simulations are more similar than ensembles derived solely from simulations (5, 57).

Although the maximum-entropy principle provides a coherent framework to obtain conformational ensembles that combine force fields and experimental data, the basic formalism does not take sources of error into account. Another important development has thus been theory that considers not only experimental measurements but also the associated uncertainty. When combining data from multiple experimental techniques, uncertainties are essential to set the correct weights among them. For some sources of experimental data, for example chemical shifts from NMR spectroscopy, the measurement itself is extremely precise, but our ability to relate the experimental quantity to molecular structure (i.e., the forward model that is used to calculate experimental quantities from three-dimensional structures) is associated with substantial uncertainty. Both experimental and modeling uncertainty can be treated by using Bayesian approaches such as those used in inferential structural determination protocols, leading to improved precision and a rigorous approach to integrate multiple sources of experimental data (58).

Combined Bayesian–maximum entropy integrative methods that consider uncertainty and averaging offer a promising route to reconstruct the conformational variability of complex biomolecular systems (59, 60). These methods can be used with all-atom simulations or with CG models for larger assemblies. For instance, the structure and allosteric mechanism of a protein kinase were revealed by reweighting CG simulations using SAXS experimental data (61). An alternative



approach is to construct an MSM that has also been biased by using experimental data (62).

An important challenge is how the information gleaned from these studies may be fed back into improved force fields, for example, by systematically analyzing differences between the experimentally restrained ensembles and those obtained from the models alone. For instance, we recently identified a specific dihedral angle in the RNA backbone whose distribution in MD simulations was markedly different from that found when reweighting the same simulations using a Bayesian–maximum entropy approach (63). This observation suggested that force-field errors for this specific term could explain part of the disagreement between experiment and simulations, and, indeed, parallel work on improving RNA force fields resulted in distributions for this dihedral angle that were in much better agreement with the experimentally derived results (18, 63).

The discussion above pertains to experimental data that can be related to equilibrium properties and that can be represented by population-weighted averages over individual conformations in the ensemble. For example, distances probed via nuclear Overhauser effect (NOE) NMR experiments are typically calculated from the average of the inverse sixth power of the distances in each individual structure (58). In reality, NOEs and many other experimental quantities depend on kinetic properties that need to be taken into account for the most accurate calculations. Recent theoretical and practical advances make it possible to construct conformational ensembles also based on such information (62, 64–66) and thus extend applications to new sources of ex-

[...]

to isolate properties that current models fail to describe (9, 17). By testing and optimizing models broadly across different classes of problems and molecules, it will be possible to create force fields that are more transferable. Eventually, we will have to go beyond the current simple functional forms (67, 68), but a surprising observation has been how much force fields could be improved by careful parameter optimization on an increasingly broad set of QM and experimental data. When reading the simulation literature, one should thus check whether a carefully validated force field has been used. Judging this is helped by the increased availability of systematic comparisons on a broad range of systems (12, 15, 19, 69). Further, as it remains difficult to sample conformational space sufficiently, particularly for complex systems, one should check whether convergence has been assessed and whether quantitative differences are backed up by sufficient sampling. This is inherently difficult because it is much easier to prove lack of convergence than the opposite (70). Nevertheless, useful questions to ask include (i) whether the same events are observed multiple times, (ii) whether the simulations are longer than the correlation times and the statistical analyses take time correlation into account, and (iii) whether the observed effects are greater than the statistical uncertainty.

We must, however, also be pragmatic in the way simulations are used. Like experiments, simulations are not perfect, and we will continue to live with uncertainty in sampling and force fields. Here the integration between experiment and simulations can help alleviate problems in both accuracy and sampling. We envision that

[...]

sive maps of the mutational effects on protein stability across entire proteins (79) and enable a deeper understanding and benchmarking of our ability to predict the consequences of mutations (76, 80).

We thus anticipate that simulations will eventually be commonplace when studying the effect of drugs and mutations and will play an essential role in the future of bioengineering in the same way that computer modeling is used today in computational prototyping of cars and buildings.

REFERENCES AND NOTES
1. R. E. Amaro, A. J. Mulholland, Nat. Rev. Chem. 2, 0148 (2018).
2. A. Borgia et al., Nature 555, 61–66 (2018).
3. T. J. Stevens et al., Nature 544, 59–64 (2017).
4. J. D. Chodera, J. Chem. Theory Comput. 12, 1799–1805 (2016).
5. M. Tiberti, E. Papaleo, T. Bengtsen, W. Boomsma, K. Lindorff-Larsen, PLOS Comput. Biol. 11, e1004415 (2015).
6. S. J. Marrink, D. P. Tieleman, Chem. Soc. Rev. 42, 6801–6822 (2013).
7. J. F. Dama et al., J. Chem. Theory Comput. 9, 2466–2480 (2013).
8. M. Praprotnik, L. D. Site, K. Kremer, Annu. Rev. Phys. Chem. 59, 545–571 (2008).
9. J. Graf, P. H. Nguyen, G. Stock, H. Schwalbe, J. Am. Chem. Soc. 129, 1179–1189 (2007).
10. R. B. Best, G. Hummer, J. Phys. Chem. B 113, 9004–9015 (2009).
11. K. Lindorff-Larsen, S. Piana, R. O. Dror, D. E. Shaw, Science 334, 517–520 (2011).
12. K. Lindorff-Larsen et al., PLOS ONE 7, e32131 (2012).
13. P. S. Nerenberg, T. Head-Gordon, Curr. Opin. Struct. Biol. 49, 129–138 (2018).
14. J. Huang et al., Nat. Methods 14, 71–73 (2017).
15. P. Robustelli, S. Piana, D. E. Shaw, Proc. Natl. Acad. Sci. U.S.A. 115, E4758–E4766 (2018).
16. J. Šponer et al., Chem. Rev. 118, 4177–4338 (2018).
17. J. D. Tubbs et al., Biochemistry 52, 996–1010 (2013).
18. D. Tan, S. Piana, R. M. Dirks, D. E. Shaw, Proc. Natl. Acad. Sci. U.S.A. 115, E1346–E1355 (2018).
19. K. A. Beauchamp, Y.-S. Lin, R. Das, V. S. Pande, J. Chem. Theory Comput. 8, 1409–1414 (2012).
20. J. C. Faver et al., PLOS ONE 6, e18868 (2011).
perimental data. these methods will play an increasingly impor- 21. R. T. McGibbon et al., J. Chem. Phys. 147, 161725 (2017).
tant role in studying the relationship between 22. T. Bereau, R. A. DiStasio Jr., A. Tkatchenko, O. A. von Lilienfeld,
Conclusions and outlook structure and dynamics of large biomolecular J. Chem. Phys. 148, 241706 (2018).
23. A. B. Norgaard, J. Ferkinghoff-Borg, K. Lindorff-Larsen,
The complexity of biological systems often man- assemblies or highly flexible molecules. The link Biophys. J. 94, 182–192 (2008).
dates the combined use of multiple techniques, between molecular simulations and cryo-EM, 24. L.-P. Wang, T. J. Martinez, V. S. Pande, J. Phys. Chem. Lett. 5,
including biomolecular simulations. Clearly, sim- inherently a single-molecule technique, might 1885–1891 (2014).
ulations are not ordinary experiments and often be particularly fruitful for looking at conforma- 25. J. Chen, J. Chen, G. Pinamonti, C. Clementi, J. Chem. Theory
Comput. 10.1021/acs.jctc.8b00187 (2018).
require a detailed knowledge of algorithms, un- tional dynamics at high spatial resolution (71, 72). 26. M. J. Abraham et al., SoftwareX 1–2, 19–25 (2015).
derlying assumptions, and tricks that can be Much can also be gained by carefully choosing 27. J. C. Phillips et al., J. Comput. Chem. 26, 1781–1802 (2005).
difficult to access and understand for non- systems that are amenable to both experimental 28. K. J. Bowers et al., in Proceedings of the 2006 Association for
specialists. Much progress has been made on and computational analysis. Recent examples Computing Machinery (ACM)/Institute of Electrical and
Electronics Engineers (IEEE) Conference on Supercomputing
making these tools more user-friendly and ac- include elucidating the molecular details that (ACM, New York, 2006); http://doi.acm.org/10.1145/1188455.
cessible, though analyzing simulations often re- underlie the alternating access mechanism in a 1188544.
quires specialist knowledge. With a wide range minimal sugar transporter (73) and an atomic- 29. M. J. Harvey, G. Giupponi, G. D. Fabritiis, J. Chem. Theory
of tools available, it is important to balance pre- level description of interactions that lead to barrier Comput. 5, 1632–1639 (2009).
30. D. A. Case et al., J. Comput. Chem. 26, 1668–1688 (2005).
cision and accuracy when deciding on a sim- roughness in protein folding (74). 31. I. Ohmura, G. Morimoto, Y. Ohno, A. Hasegawa, M. Taiji,
ulation strategy (sampling method, force field, The overwhelming growth of sequence data Phil. Trans. A Math. Phys. Eng. Sci. 372, 20130387 (2014).
and level of resolution): What level of detail is also presents new opportunities for computa- 32. D. E. Shaw et al., Science 330, 341–346 (2010).
relevant to the problem at hand, what are the tional chemists seeking to understand macro- 33. D. E. Shaw et al., in SC14: International Conference for High
Performance Computing, Networking, Storage and Analysis
relevant time scales, and can I address imper- molecular structure and function. Evolution is, (ACM, New York, 2014), pp. 41–53.
fections in the model by, for example, experi- after all, governed by the same physical forces 34. M. Shirts, V. S. Pande, Science 290, 1903–1904 (2000).
mental restraints? that simulations are constructed to model. One 35. B. E. Husic, V. S. Pande, J. Am. Chem. Soc. 140, 2386–2396
Substantial improvements in force fields have point of convergence has been the use of evolu- (2018).
36. J. D. Chodera, F. Noé, Curr. Opin. Struct. Biol. 25, 135–144 (2014).
been made possible by using data from experi- tionary records to construct statistical models 37. K. A. Beauchamp et al., J. Chem. Theory Comput. 7, 3412–3419
mental studies on systems that are large enough of amino acid sequences (75, 76). Conversely, (2011).
to capture complex behavior yet simple enough computational biophysics can guide interpreta- 38. M. K. Scherer et al., J. Chem. Theory Comput. 11, 5525–5542
to converge simulations. Future progress requires tions of what mutations do to proteins when (2015).
39. J. Juraszek, J. Vreede, P. G. Bolhuis, Chem. Phys. 396, 30–44
that experimentalist and computational chemists analyzing exome sequencing for patient diag- (2012).
continue to work together to design experiments nosis (76–78). Finally, large-scale deep mutational 40. R. Elber, A. West, Proc. Natl. Acad. Sci. U.S.A. 107, 5001–5005
that are best suited to optimize force fields and scanning experiments can provide comprehen- (2010).


41. Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 314, 141–151 (1999).
42. D. Hamelberg, J. Mongan, J. A. McCammon, J. Chem. Phys. 120, 11919–11929 (2004).
43. G. M. Torrie, J. P. Valleau, J. Comput. Phys. 23, 187–199 (1977).
44. A. Laio, M. Parrinello, Proc. Natl. Acad. Sci. U.S.A. 99, 12562–12566 (2002).
45. G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni, G. Bussi, Comput. Phys. Commun. 185, 604–613 (2014).
46. L. C. T. Pierce, R. Salomon-Ferrer, C. Augusto F. de Oliveira, J. A. McCammon, R. C. Walker, J. Chem. Theory Comput. 8, 2997–3002 (2012).
47. Y. Wang, O. Valsson, P. Tiwary, M. Parrinello, K. Lindorff-Larsen, J. Chem. Phys. 149, 072309 (2018).
48. J. M. L. Ribeiro, P. Bravo, Y. Wang, P. Tiwary, J. Chem. Phys. 149, 072301 (2018).
49. C. Wehmeyer, F. Noé, J. Chem. Phys. 148, 241703 (2018).
50. C. X. Hernández, H. K. Wayment-Steele, M. M. Sultan, B. E. Husic, V. S. Pande, Phys. Rev. E 97, 062412 (2017).
51. H. van den Bedem, J. S. Fraser, Nat. Methods 12, 307–318 (2015).
52. W. Boomsma, J. Ferkinghoff-Borg, K. Lindorff-Larsen, PLOS Comput. Biol. 10, e1003406 (2014).
53. W. Boomsma et al., Proc. Natl. Acad. Sci. U.S.A. 111, 13852–13857 (2014).
54. A. Perez, J. A. Morrone, E. Brini, J. L. MacCallum, K. A. Dill, Sci. Adv. 2, e1601274 (2016).
55. D. Russel et al., PLOS Biol. 10, e1001244 (2012).
56. E. Karaca, J. P. G. L. M. Rodrigues, A. Graziadei, A. M. J. J. Bonvin, T. Carlomagno, Nat. Methods 14, 897–902 (2017).
57. T. Löhr, A. Jussupow, C. Camilloni, J. Chem. Phys. 146, 165102 (2017).
58. W. Rieping, M. Habeck, M. Nilges, Science 309, 303–306 (2005).
59. G. Hummer, J. Köfinger, J. Chem. Phys. 143, 243150 (2015).
60. M. Bonomi, C. Camilloni, A. Cavalli, M. Vendruscolo, Sci. Adv. 2, e1501177 (2016).
61. T. A. Leonard, B. Różycki, L. F. Saidi, G. Hummer, J. H. Hurley, Cell 144, 55–66 (2011).
62. S. Olsson, H. Wu, F. Paul, C. Clementi, F. Noé, Proc. Natl. Acad. Sci. U.S.A. 114, 8265–8270 (2017).
63. S. Bottaro, G. Bussi, S. D. Kennedy, D. H. Turner, K. Lindorff-Larsen, Sci. Adv. 4, r8521 (2018).
64. N. Salvi, A. Abyzov, M. Blackledge, J. Phys. Chem. Lett. 7, 2483–2489 (2016).
65. P. D. Dixit, K. A. Dill, J. Chem. Theory Comput. 14, 1111–1119 (2018).
66. R. Capelli, G. Tiana, C. Camilloni, J. Chem. Phys. 148, 184114 (2018).
67. Y. Shi et al., J. Chem. Theory Comput. 9, 4046–4063 (2013).
68. K. T. Debiec et al., J. Chem. Theory Comput. 12, 3926–3947 (2016).
69. C. Bergonzo, N. M. Henriksen, D. R. Roe, T. E. Cheatham 3rd, RNA 21, 1578–1590 (2015).
70. A. Grossfield, D. M. Zuckerman, Annu. Rep. Comput. Chem. 5, 23–48 (2009).
71. S. Hanot et al., bioRxiv 113951 [Preprint]. 25 January 2018. https://doi.org/10.1101/113951
72. T. Nakane, D. Kimanius, E. Lindahl, S. H. Scheres, eLife 7, e36861 (2018).
73. N. R. Latorraca et al., Cell 169, 96–107.e12 (2017).
74. H. S. Chung, S. Piana-Agostinetti, D. E. Shaw, W. A. Eaton, Science 349, 1504–1510 (2015).
75. S. Wang, S. Sun, Z. Li, R. Zhang, J. Xu, PLOS Comput. Biol. 13, e1005324 (2017).
76. T. A. Hopf et al., Nat. Biotechnol. 35, 128–135 (2017).
77. J. Shendure, J. M. Akey, Science 349, 1478–1483 (2015).
78. S. V. Nielsen et al., PLOS Genet. 13, e1006739 (2017).
79. K. A. Matreyek et al., Nat. Genet. 50, 874–882 (2018).
80. V. Gapsys, S. Michielssens, D. Seeliger, B. L. de Groot, Angew. Chem. Int. Ed. Engl. 55, 7364–7368 (2016).

ACKNOWLEDGMENTS
We thank Y. Wang for providing part of Fig. 1. Parts of Fig. 1A were designed by Freepik. Funding: We acknowledge funding from the Velux Foundations, the Lundbeck Foundation BRAINSTRUC initiative, and a Hallas-Møller stipend from the Novo Nordisk Foundation. Competing interests: None declared.

10.1126/science.aat4010
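
Illustrative sketch (not part of the original article). Two calculations mentioned in the text can be made concrete in a few lines: the NOE-style effective distance, taken as a population-weighted average of the inverse sixth power of the distance over an ensemble, and a block-averaging estimate of the statistical uncertainty of a time-correlated observable. The function names, the optional weights argument, and the choice of block averaging as the uncertainty estimator are assumptions made here for illustration; the article does not prescribe a specific implementation, and kinetic effects on NOEs are ignored.

```python
import numpy as np


def noe_effective_distance(distances, weights=None):
    """Effective distance <r^-6>^(-1/6) over an ensemble (hypothetical helper).

    distances: one interatomic distance per conformation.
    weights:   optional population weights (e.g. from reweighting);
               uniform weights are assumed when omitted.
    """
    r = np.asarray(distances, dtype=float)
    if weights is None:
        w = np.full(r.shape, 1.0 / r.size)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalize so the weights sum to one
    # Weighted average of r^-6, converted back to a distance.
    return float(np.sum(w * r ** -6) ** (-1.0 / 6.0))


def block_standard_error(samples, n_blocks=5):
    """Standard error of the mean from block averages (hypothetical helper).

    Splitting the time series into contiguous blocks reduces the effect of
    time correlation, provided each block is longer than the correlation time.
    """
    x = np.asarray(samples, dtype=float)
    block_len = x.size // n_blocks
    if block_len < 1:
        raise ValueError("need at least one sample per block")
    trimmed = x[: block_len * n_blocks]  # drop the incomplete trailing block
    block_means = trimmed.reshape(n_blocks, block_len).mean(axis=1)
    return float(block_means.std(ddof=1) / np.sqrt(n_blocks))
```

Because of the inverse sixth power, short distances dominate the average: an ensemble with equal populations at 3 and 6 Å gives an effective distance much closer to 3 Å than to the arithmetic mean of 4.5 Å. For the uncertainty estimate, an observed difference between two simulations is only meaningful if it exceeds the block-averaged standard error, and the estimate itself should be stable when the number of blocks is varied.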



Biophysical experiments and biomolecular simulations: A perfect match?
Sandro Bottaro and Kresten Lindorff-Larsen

Science 361 (6400), 355-360.


DOI: 10.1126/science.aat4010


Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. 2017 © The Authors, some rights reserved; exclusive
licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. The title
Science is a registered trademark of AAAS.
