
A Priori Crystal Structure Prediction of Native Celluloses

Remco J. Vietor1
Karim Mazeau2
Miles Lakin2
Serge Perez1,2

1 Ingenierie Moleculaire, Institut National de la Recherche Agronomique, Rue de la Géraudière, BP 71627, 44316 Nantes Cedex, France

2 Centre de Recherches sur les Macromolecules Vegetales,* Centre National de la Recherche Scientifique, BP53, 38041 Grenoble Cedex, France
Received 14 May 1999;
accepted 29 March 2000

Abstract: The packing of β-1,4-glucopyranose chains has been modeled to further elaborate the molecular structures of native cellulose microfibrils. A chain pairing procedure was implemented that evaluates the optimal interchain distance and energy for all possible settings of the two chains. Starting with a rigid model of an isolated chain, its interaction with a second chain was studied at various helix-axis translations and mutual rotational orientations while keeping the chains at van der Waals separation. For each setting, the sum of the van der Waals and hydrogen-bonding energy was calculated. No energy minimization was performed during the initial screening, but the energy and interchain distances were mapped to a three-dimensional grid, with evaluation of parallel settings of the cellulose chains. The emergence of several energy minima suggests that parallel chains of cellulose can be paired in a variety of stable orientations. A further analysis considered all possible parallel arrangements occurring between a cellulose chain pair and a further cellulose chain. Among all the low-energy three-chain models, only a few yield closely packed three-dimensional arrangements. From these, unit-cell dimensions as well as lattice symmetry were derived; interestingly, two of them correspond closely to the observed allomorphs of crystalline native cellulose. The most favorable structural models were then optimized using a minicrystal procedure in conjunction with the MM3 force field. The two best crystal lattice predictions were for a triclinic (P1) and a monoclinic (P2₁) arrangement with unit cell dimensions a = 0.63, b = 0.69, c = 1.036 nm, α = 113.0°, β = 121.1°, γ = 76.0°, and a = 0.87, b = 0.75, c = 1.036 nm, γ = 94.1°, respectively. They correspond closely to the respective lattice symmetry and unit-cell dimensions that have been reported for the cellulose Iα and cellulose Iβ allomorphs. The suitability of the modeling protocol is endorsed by the agreement between the predicted and experimental unit-cell dimensions. The results provide pertinent information toward the construction of macromolecular models of microfibrils. © 2000 John Wiley & Sons, Inc. Biopoly 54: 342–354, 2000
Keywords: β-1,4-glucopyranose chains; packing; molecular structure; native cellulose microfibrils; crystal structure prediction

Correspondence to: Serge Perez; email: perez@cermav.cnrs.fr
Contract grant sponsor: INRA, CARENET-2, and CNRS
* Associated with University Joseph Fourier, Grenoble.
Biopolymers, Vol. 54, 342–354 (2000)
© 2000 John Wiley & Sons, Inc.

INTRODUCTION
Many recent advances in the theory and application of
molecular modeling to the structural elucidation of
carbohydrates and carbohydrate polymers have produced a wide range of useful results.1–4 In combination with experimental methods, computer modeling
has become an integral part of the strategy for revealing three-dimensional structures, both in solution and
in the condensed phase. Nevertheless, the realm of
carbohydrate modeling has tended to emphasize intramolecular rather than intermolecular aspects. When
dealing with materials in condensed phases, the modeling technique can be combined with information
derived from electron and fiber diffraction to enable
quantitative solution of the three-dimensional crystalline structure.5 In addition to rationalizing why the
observed crystalline arrangement is the preferred
form, a further goal is the prediction of all stable
three-dimensional organizations accessible to the
polysaccharide in a given conformation. It would also
be desirable to extend this predictive methodology to
less ordered systems such as gels, where chain–chain
interactions may occur to promote the formation of
the so-called junction zones.
These topics require the development of general
rules for analyzing the stability of certain interhelix
arrangements. Several authors have proposed methods for investigating the interhelix structure and energy through nonbonded forces.6–15 These procedures
involve a minimization of the interhelix energy. In
contrast, we have developed a method where the
helices are positioned so as to allow contact but not
interpenetration of the van der Waals surfaces of the
two helices. After the helices are placed at the position of van der Waals contact for a given helix–helix
rotation and translation, the energy is calculated. This
procedure takes considerably less computer time than
methods involving energy minimization, and has been
successfully applied to synthetic polymers,16 and to
the polysaccharides chitin and starch.17 In this latter
example, the structure predicted to be most stable
corresponds to a duplex of parallel double helices, as
found in both the crystalline A and B allomorphs.18,19
From these results, an explanation of the transition
from the A to B allomorph has been proposed.17
Native cellulosic materials are organized into microfibrils in which crystalline domains coexist with amorphous zones. Little is known about the ultrastructures of the amorphous zones. Incidentally, the detailed crystal structure of native cellulose (cellulose I) is still a matter of debate despite more than 70 years of research effort. X-ray fiber diffraction experiments initially led to models based on two-chain or eight-chain unit cells depending upon the source of the sample,20,21 with the eight-chain unit cell invoked to account for weak signals in the diffraction pattern. Later experiments using cross-polarization/magic angle spinning proton NMR indicated the presence of two allomorphs in samples of native cellulose, designated Iα and Iβ.22–24 Subsequent results from electron diffraction25 verified the presence of these two allomorphs and provided data on crystal symmetry and unit cell size for each allomorph. The Iα allomorph crystallizes in the triclinic P1 space group and contains one cellobiose unit per unit cell with a parallel arrangement of the chains, as would be expected. The Iβ form crystallizes in the monoclinic space group P2₁ with two cellobiose moieties per unit cell. Since the two allomorphs are found within the same microfibril, parallel packing for Iβ is the inescapable conclusion. The chain repeat length is found to be invariant at 1.036 nm.
Building realistic macromolecular models of
cellulosic microfibrils starting solely with information derived from the fiber repeat distance is still a
difficult task. One needs to predict both crystalline
allomorphic phases of cellulose together with less
ordered regions, which could occur in the amorphous phase. This requires an exhaustive exploration of the low-energy three-dimensional arrangements of cellulose chains. The present work
assesses the feasibility of this generation methodology. A very important aspect of the work is the
adequacy of the proposed models. Therefore,
prior to the generation of densely packed chains, one
needs a proper validation of the method used; this
can be achieved by a careful prediction of the two
allomorphs of cellulose I. In addition to refining the
methodology for predicting solid state polymorphism, the study should yield the crystal structure
of the two cellulose I allomorphs, and provide some
insight into the nature of each form and into transitions between them.


FIGURE 1 Schematic representation of the cellobiose unit. The atoms belonging to the reducing residue are primed. The relative arrangement of the two glucose residues is determined by three angles: the glycosidic bond angle τ, defined by the atoms C1–O4′–C4′, and the two torsion angles Φ (O5–C1–O1–C4′) and Ψ (C1–O1–C4′–C5′). In addition, the torsion angle ω (O5–C5–C6–O6) was used to describe the orientation of the primary hydroxyl group, which normally adopts one of the three staggered positions: gauche–gauche (gg), gauche–trans (gt), and trans–gauche (tg).

COMPUTATIONAL METHOD
Nomenclature
The atom coordinates for the glucose residue used in this
study were taken from the MONOBANK database of
monosaccharide structures.26 The atom numbering and the
angle definitions used are shown in Figure 1. The relative
orientation of two glucose residues was determined by three
angles: the glycosidic bond angle τ, defined by the atoms
C1–O4′–C4′, and the two torsion angles
Φ (O5–C1–O1–C4′) and Ψ (C1–O1–C4′–C5′). In addition, the torsion angle ω (O5–C5–C6–O6) was used to
describe the position of the primary hydroxyl group.

Computer Generation of Cellulose Chains

The PFOS program27 was used to generate cellobiose units and to calculate the conformational energy and the helical parameters n (number of residues per turn) and h (advance along the helicoidal axis per residue) as a function of Φ and Ψ for the primary hydroxyl conformations gg and tg (ω = ±60°) and gt (ω = 180°; see Table I). The calculations used the 1.036 nm cellulose fiber repeat distance established by x-ray diffraction together with a glycosidic bond angle τ of 117.5°. A typical potential energy surface is represented in Figure 2.

Table I Conformation Parameters of Studied Chains (for ω = ±60° and 180°)

Chain            τ (°)    Φ (°)    Ψ (°)    c (nm)
Helix            117.5    90       153      1.036
Translational    117.5    77       141      1.036
                 117.5    102      164      1.036

FIGURE 2 Rigid residue potential energy surface of cellobiose. Iso-energy contours are shown at 1 kcal/mol intervals relative to the global minimum. Superimposed upon these contours are those calculated for the helical parameters n = 2 and h = 0.518 nm. Where the cellulose chain adopts true 2₁ helical symmetry, (Φ, Ψ) must assume the values (90°, 153°). This requirement may be relaxed if the glycosidic torsion angles alternate between (Φ, Ψ)1 = (77°, 141°) and (Φ, Ψ)2 = (102°, 164°).
Strict helical symmetry of a macromolecular chain requires that equivalent chemical units occupy equivalent positions about the molecular axis. Such model chains were prepared with glycosidic torsion angles falling at minima on the potential energy surface and generating helices having n = 2 and h = 0.518 nm. This type of chain will be referred to as helical, with τ = 117.5°, Φ = 90°, and Ψ = 153°.
Cellulose chains exhibiting a repeat distance of 1.036 nm can also be constructed by regular alternation between two sets of glycosidic torsion angles. Such chains, without internal symmetry, will be referred to as translational. They were obtained by setting one glycosidic linkage to one of the minimum energy conformations determined with PFOS, with the next linkage manipulated until translational symmetry with the required period of 1.036 nm was obtained between residues i and (i + 2). Relative chain energies for the conformations obtained in this way were estimated by averaging the energies determined with PFOS for the individual sets (Φ, Ψ). The chain having the lowest energy was selected for further study. Conformational parameters for the generated chains are collected in Table I. Coordinates of both cellulose chain models are available upon request from the authors.

Prediction of the Stable Chain–Chain Arrangements

Describing the relative orientation of two commensurate chains (chain A and chain B), oriented either parallel or antiparallel, requires a set of four interhelical parameters: μA, a rotation of chain A about the helical axis from 0 to 360°; μB, a rotation of chain B about its axis from 0 to 360°; and Δx and Δz, which are taken as positive and represent positional shifts normal and parallel to the identity axis, respectively. Δz is bounded between 0 and t (fiber repeat). The spatial description of these parameters is shown in Figure 3.

The minimum energy arrangement of the two polymer chains with respect to a displacement will tend to bring the molecules as close as possible without interpenetration of van der Waals radii. In reality, a small amount of repulsive energy resulting from interpenetration of some atom pairs can be compensated by additional attractions from the remaining atom pairs. However, nonbonded contact distances deviate by only 10% (0.02–0.03 nm) in molecular solids.28 The contacting procedure28 involves describing the surface of each chain by circumscribing a hard sphere of appropriate van der Waals radius R around each of the constituent atoms. Then for a given orientation of the chains (as dictated by rotation angles μA and μB and an increment along the chain axis Δz) the relative translation Δx is found that will bring the surfaces into a position where they are in contact, but without interpenetration. In general, the final position of the two initially penetrating polymer chains is characterized by the following conditions:

1. For at least one atom pair (i, j) the ith atom of chain A is separated from the jth atom of chain B by the sum of the associated van der Waals radii (Rij = Ri + Rj). The atom pair i, j that satisfies this condition is referred to as the determining contact.
2. For all atom pairs in the two separate chains, there is no pair at a distance closer than the sum of their van der Waals radii (Dij ≥ Ri + Rj).
3. Condition (2) is violated any time that an atom pair i, j is involved in hydrogen bonding with no a priori optimum value for the interatomic distance. This limiting case is treated by identifying all atom pairs potentially eligible for participation in interchain hydrogen bonding and omitting them from the procedure. Thus hydrogen bonding will not violate principle (1).

FIGURE 3 Interhelical parameters used to define the geometric orientation of the two parallel cellulose chains (A and B): chain rotations μA and μB, interchain contact distance Δx, and longitudinal offset Δz.

Using this contacting procedure for chain–chain construction, the search space is reduced to three geometric
variables. The resulting interchain interaction energy (EAB)
can then be calculated to the required degree of approximation.
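As an illustration of the contacting procedure, the following Python sketch finds the contact offset for a pair of rigid chains. It is a minimal sketch under stated assumptions, not the CHACHA implementation: the function name `contact_offset`, the tuple layout (x, y, z, R), and the choice of moving chain B along the x axis only are hypothetical. For every atom pair it solves for the shift at which the two hard spheres just touch; the largest of these shifts leaves no pair interpenetrating, and the pair that produces it is the determining contact. Pairs flagged as potential hydrogen bonds are skipped, in the spirit of condition (3).

```python
import math

def contact_offset(chain_a, chain_b, excluded=frozenset()):
    """Smallest x shift of chain B giving contact without interpenetration.

    chain_a, chain_b: lists of (x, y, z, radius) tuples in nm (assumed layout).
    excluded: set of (i, j) index pairs, e.g. potential hydrogen bonds,
    that are left out of the contact test.
    Returns (dx, (i, j)), where (i, j) is the determining contact, or
    (-inf, None) if no pair can ever touch along x.
    """
    best_dx, best_pair = -math.inf, None
    for i, (xa, ya, za, ra) in enumerate(chain_a):
        for j, (xb, yb, zb, rb) in enumerate(chain_b):
            if (i, j) in excluded:
                continue
            r_sum = ra + rb
            # Squared separation in the plane normal to the approach direction.
            perp2 = (ya - yb) ** 2 + (za - zb) ** 2
            if perp2 >= r_sum ** 2:
                continue  # these two spheres never touch for any x shift
            # Shift at which atom j (moved by dx) sits exactly r_sum from atom i.
            dx = (xa - xb) + math.sqrt(r_sum ** 2 - perp2)
            if dx > best_dx:
                best_dx, best_pair = dx, (i, j)
    return best_dx, best_pair
```

Calling such a function once for each setting of the two rotations and the longitudinal shift would give the offset surface on which the interaction energy is subsequently evaluated.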
For the simulation of cellulose, the interaction energy of
the two chains was considered to be the sum of all pairwise
atom–atom interactions and was calculated using a 6–12
potential function,29,30 with an additional term to cover the
stabilization arising from the interchain hydrogen bonding.
This was based on the distance between the oxygen atoms
that can interact through hydrogen bonding (0.25–0.30 nm)
without recourse to the hydroxyl hydrogens.
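The energy evaluation can be sketched in the same spirit. The Python fragment below sums a 6–12 term over all interchain atom pairs and adds a stabilizing term for oxygen pairs whose separation falls in the 0.25–0.30 nm window. The well depth, equilibrium separation, and hydrogen-bond energy are placeholder values chosen for illustration, not the parameters of refs. 29 and 30. Treating every O···O pair in the window as a hydrogen bond of constant energy, and replacing rather than supplementing its 6–12 term, are likewise assumptions of this sketch.

```python
import math

def lennard_jones(r, epsilon, r_min):
    """6-12 potential with well depth epsilon at separation r_min."""
    s6 = (r_min / r) ** 6
    return epsilon * (s6 * s6 - 2.0 * s6)

def interchain_energy(chain_a, chain_b, epsilon=0.1, r_min=0.35,
                      hb_energy=-2.0, hb_window=(0.25, 0.30)):
    """Sum pairwise terms between two chains.

    chain_a, chain_b: lists of (element, x, y, z), coordinates in nm.
    All numerical parameters are illustrative placeholders.
    Returns (E_vdW, E_HB, E_total).
    """
    e_vdw = 0.0
    e_hb = 0.0
    for el_a, *pos_a in chain_a:
        for el_b, *pos_b in chain_b:
            r = math.dist(pos_a, pos_b)
            is_hbond = (el_a == "O" and el_b == "O"
                        and hb_window[0] <= r <= hb_window[1])
            if is_hbond:
                # Distance-only criterion: hydroxyl hydrogens are not needed.
                e_hb += hb_energy
            else:
                e_vdw += lennard_jones(r, epsilon, r_min)
    return e_vdw, e_hb, e_vdw + e_hb
```

Keeping the van der Waals and hydrogen-bond contributions separate mirrors the way the two terms are reported side by side in the tabulated results.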
The CHACHA program17 was used to map Δx and EAB as a function of the structural variables μA, μB, and Δz. The analysis was performed by incrementing μA and μB over the whole angular range by increments of a few degrees, and the relative translation (Δz) between the two chains was studied over the length of the whole fiber repeat, typically by increments of 0.01–0.05 nm (i.e., h/20). The rotations μA and μB were set to be both independent and coupled (μA = μB). For each setting of the chains as a function of μA, μB, and Δz, the magnitude of the perpendicular offset Δx was derived according to the contact procedure described above. The value of the energy EAB was then computed. The mapping procedure was used to search for low-energy regions. In order to pinpoint the energy minima, regions containing a local minimum were searched a second time using intervals of 1° (μ) and 0.005 nm (for Δz). This procedure provided a complete overview of the symmetry (or lack of symmetry) of the chain–chain interactions. The set of interhelical parameters relates to the lattice symmetry that characterises the three-dimensional organisation as follows:


1. μA ≠ μB: Chain A and chain B are not related by any symmetry operation. They are independent and both form the asymmetric unit of the cell.
2. μA = μB: Chain B is derived from chain A by a simple translational operation.
3. μA = μB + 180° and Δz = 0: The two chains are parallel and related by a twofold operation. For a twofold screw axis operation, the conditions will be μA = μB + 180° and Δz = t/2.
4. A B 180: The two chains are antiparallel
and related by a twofold or by a twofold screw axis
symmetry operation.
Several protocols were used for the grid search. In the simplest case, the rotations of the two chains were coupled (μA = μB). For the helical chains, the internal 2₁ screw symmetry of the chains reduced the range to be searched to μ between 0 and 180°, and Δz between 0.0 and 0.518 nm. This reduction of the search space was not applicable to the
translational chains.
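The two-stage mapping described above can be sketched as a coarse grid search followed by a finer search around the coarse minimum. In the Python sketch below, `energy` stands for any function returning the interchain energy at contact for a coupled rotation and a longitudinal shift; the function names and the 5° coarse step are assumptions (the text specifies only "a few degrees"), while h/20, 1°, and 0.005 nm follow the values quoted above. For brevity only the lowest coarse minimum is refined, whereas the study refined every region containing a local minimum.

```python
def grid_search(energy, mu_range, dz_range, mu_step, dz_step):
    """Evaluate energy(mu, dz) on a regular grid.

    Returns (E, mu, dz) at the grid point of lowest energy.
    """
    n_mu = int(round((mu_range[1] - mu_range[0]) / mu_step))
    n_dz = int(round((dz_range[1] - dz_range[0]) / dz_step))
    best = None
    for i in range(n_mu + 1):
        mu = mu_range[0] + i * mu_step
        for k in range(n_dz + 1):
            dz = dz_range[0] + k * dz_step
            e = energy(mu, dz)
            if best is None or e < best[0]:
                best = (e, mu, dz)
    return best

def coarse_to_fine(energy, c=1.036):
    """Coarse map of the reduced search space, then a local refinement."""
    h = c / 2.0  # advance per residue for the twofold helical chain
    # Coarse pass: 0-180 deg and 0-h, the range reduced by the screw symmetry.
    _, mu0, dz0 = grid_search(energy, (0.0, 180.0), (0.0, h), 5.0, h / 20.0)
    # Fine pass around the coarse minimum: 1 deg and 0.005 nm intervals.
    return grid_search(energy, (mu0 - 5.0, mu0 + 5.0),
                       (dz0 - 0.025, dz0 + 0.025), 1.0, 0.005)
```

For translational chains, which lack the internal screw axis, the coarse pass would instead have to cover the full 0–360° and 0–c ranges.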
In the study of the monoclinic arrangement, two cases need to be considered. In the first and most general case, the chain axes do not coincide with the 2₁ screw axes of the lattice. This implies that the chains have to be related through 2₁ symmetry, i.e., μA = μB + 180° and Δz = 0.518 nm (c/2). In the alternative case, where the chains do coincide with the screw axis, two groups of chains can be distinguished, with no a priori relation between the chains of different groups; the relative orientations of two such chains are independent. The chains within a group are related by translation in the xy plane. For the helical chains, both of these situations are possible, the first being a subset of the second. For the translational chains, only the first situation is possible.

Expansion to a Three-Dimensional Lattice

Given two independent orientations of B chains with respect to the A chain, a three-dimensional lattice is described. The required parameters, μ1A, μ1B, Δx1, Δz1, and μ2A, μ2B, Δx2, Δz2, are illustrated in Figure 4. Such arrangements were determined by combining a further single chain with a chain pair from one of the stable conformations obtained in the previous step. The rotations of the single chain and the duplex were always coupled since the symmetry of the systems under investigation does not allow more than two orientations of the cellulose chains. The vertical displacement Δz was free for triclinic lattices, but fixed at 0.00 nm for monoclinic lattices.
The three-chain arrangements obtained in this way were screened to exclude those whose space-filling array gave a unit cell volume per cellobiose outside the range of the experimental values, i.e., outside 0.3–0.4 nm³. The remaining arrangements were checked for stability using a Simplex optimization. Finally, unit-cell parameters were determined for the refined arrangements that gave the smallest unit cell volumes (i.e., the highest densities). The detailed procedure to calculate the unit-cell parameters has been described previously16 and the set of equations to be used is given in the caption for Figure 4.

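FIGURE 4 Building up crystalline structures. The calculations for three chains gave the relative positions of the chains in stable triplet interactions. For triclinic lattices, the sides a and b, the unit cell angles α, β, and γ, the projection of γ on the xy plane (γ*), and the lattice volume per cellobiose unit can be calculated from the parameters for the stable configurations using Eqs. (1)–(7). For the monoclinic lattices, Eqs. (8), (9), and (10) were used instead of (1), (2), and (6), respectively, to calculate a, b, and γ.

$$a = \sqrt{\Delta x_1^2 + \Delta z_1^2} \tag{1}$$

$$b = \sqrt{\Delta x_2^2 + \Delta z_2^2} \tag{2}$$

$$\tan\alpha = \frac{\Delta z_2}{\Delta x_2} \tag{3}$$

$$\tan\beta = \frac{\Delta z_1}{\Delta x_1} \tag{4}$$

$$\gamma^* = 180^\circ - \mu_{1,A} + \mu_{2,A} \tag{5}$$

$$\cos\gamma = \cos\alpha\cos\beta + \sin\alpha\sin\beta\cos\gamma^* \tag{6}$$

$$V = \Delta x_1\,\Delta x_2\,c\,\sin\gamma^* \tag{7}$$

$$a = \sqrt{\Delta x_2^2 + (2\Delta x_1)^2 - 4\,\Delta x_1\,\Delta x_2\cos\gamma^*} \tag{8}$$

$$b = \Delta x_2 \tag{9}$$

$$\sin(180^\circ - \gamma) = \frac{2\Delta x_1}{a}\,\sin\gamma^* \tag{10}$$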

Up to five unit cells per series were retained. The general
procedure to predict the occurrence of all possible three-dimensional arrangements is summarized in Figure 5.
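As a worked illustration of the screening step, the Python sketch below evaluates the cell edges and the volume per cellobiose from Eqs. (1), (2), and (7) and applies the 0.3–0.4 nm³ window. The function names are hypothetical, and the sketch covers only the triclinic expressions; it is not the authors' code.

```python
import math

def cell_edges_and_volume(dx1, dz1, dx2, dz2, gamma_star_deg, c=1.036):
    """Return (a, b, V) in nm, nm, and nm^3.

    dx1, dz1 and dx2, dz2 are the perpendicular and longitudinal offsets
    of the two neighbouring chains; gamma_star_deg is the projected angle
    between the two neighbour directions, in degrees.
    """
    a = math.hypot(dx1, dz1)  # Eq. (1)
    b = math.hypot(dx2, dz2)  # Eq. (2)
    volume = dx1 * dx2 * c * math.sin(math.radians(gamma_star_deg))  # Eq. (7)
    return a, b, volume

def passes_volume_screen(volume, lower=0.3, upper=0.4):
    """True if the volume per cellobiose lies within the experimental range."""
    return lower <= volume <= upper
```

Because the volume depends only on the two perpendicular offsets, the fiber repeat, and the projected angle, the screen can be applied before the full set of unit-cell angles is worked out.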

Simplex Calculations
The CHACHA chain–chain algorithm allows three-chain arrays to be considered stable even where two of the chains do not interact. This situation would lead to channel-like voids in the corresponding crystal lattice, and such arrays were therefore excluded.
In order to confirm their viability, Simplex optimizations were performed on the CHACHA-derived three-chain arrangements. The interchain distances (Δx), the chain orientations (μ), the relative shifts (Δz), and the angle in the xy plane determined by the three-chain arrangement were varied (eight parameters). In order to retain the required translational symmetry for the triclinic cases, the rotation around the helical axis was held identical for the three chains. This gave 6 degrees of freedom in total. The Simplex was generated by multiplying the initial value of each variable in turn by 1.05 and using the obtained tuple as an additional vertex for the simplex, giving 7 vertices in total. Simplex optimisation was continued until the minimum for the total interchain energy was reached. Both nonbonding and hydrogen-bonding interactions were taken into account for these energy calculations. Structures that showed large displacements or rotations were rejected.

FIGURE 5 Flow chart showing the general procedure used to investigate all possible three-dimensional parallel and antiparallel arrangements of cellulose chains.

Minicrystal Generation and Energy Minimization
The most favorable structural models (i.e., those having the lowest unit cell volume) were optimized using a minicrystal procedure as described by French et al.31 Minicrystals consisting of 7 cellotetraose units were generated using the calculated chain orientations and lattice parameters, in such a way that a central cellobiose unit was completely surrounded. The resulting model was allowed to relax under the MM3 force field.32,33 The total steric energy calculated by MM3 includes intramolecular terms (bond stretching and bending, forming and deforming pyranose rings), as well as nonbonded forces that can apply to both intra- and intermolecular interactions. The more stable arrangement will have a lower total energy, regardless of whether the stability comes from a more stable isolated molecule or from a better intermolecular arrangement. Instead of energy contributions estimated from explicit atomic charges, the dipole–dipole energy is evaluated. This is very dependent on the dielectric constant, as is the hydrogen-bonding term. A value of 4 was used to mimic the effect of a crystalline environment on isolated molecules. After relaxation, deformation of the central cellobiose unit was measured by calculating the rms atom displacement of all nonhydrogen atoms and of the final values for Φ, Ψ, and τ. The orientations of the primary hydroxyl groups after relaxation were also determined. The lattice energy of the central disaccharide was estimated as the sum of all intermolecular interactions involving this disaccharide.

RESULTS
Chain Construction
Among the (Φ, Ψ) combinations compatible with the 1.036 nm fiber repeat, only the one giving n = 2 and h = 0.518 nm falls within the fully allowed zone of the potential energy surface shown in Figure 2. If an ideal twofold helical structure is assumed for the cellulose chain, there exists only one set of glycosidic torsion angles that can generate the helical parameters. For such a helical chain, minimal glycosidic bond energy is obtained with Φ = 90° and Ψ = 153°. Further stabilization is possible through interresidue hydrogen bonds between O5 and O3 (oxygen–oxygen distance, d(O···O), approximately 0.25 nm) and, depending on the orientation of the primary hydroxyl group, between O6 (orientation gt) and O3 (d(O···O) = 0.31 nm) or O2 and O6 (orientation tg; d(O···O) = 0.28 nm). Consideration of the potential energy surface suggests that slight deviations from the ideal helical structure could not only be accommodated but also be energetically more stable. A chain not conforming to helical symmetry is characterised by a set of alternating glycosidic torsion angles (Φ, Ψ)1 and (Φ, Ψ)2, and is termed a translational chain. Minimum energy for such a chain was found for the combination of (Φ, Ψ)1 = (77°, 141°) and (Φ, Ψ)2 = (102°, 164°).

Packing of Helical Cellulose Chains

Two-chain Arrangements. Both coupled chain rotations and independent rotations were considered and a number of stable arrangements were found. As a first step, all possible arrangements occurring between two parallel cellulose chains were examined. This was performed by rotating μA and μB over the whole angular range from 0 to 360°, and the relative displacement of the two chains was investigated over the whole length of the fibre repeat and with the different orientations of the primary hydroxyl groups. For each setting of the chains as a function of μA, μB, and Δz, the magnitude of the offset perpendicular to the chain axis, Δx, was computed according to the contact procedure. The value of the energy corresponding to each set of chain orientations was evaluated as a function of the set of four interhelical parameters. The search for energy minima was performed within the three-dimensional (μA, μB, Δz) space. Seventeen energy minima were found within an energy window of 2 kcal/mol/cellobiose. The results indicate that the significant energy minima occurred for values of μA lying in the vicinity of μB. This suggests that appropriate packing of neighboring helical chains can be achieved with operations of simple translation.

The following steps were conducted assuming μA = μB. This allowed for a straightforward two-dimensional study. The contour maps calculated as a function of the translation Δz along the fibre axis and the coupled rotation angles μA = μB are shown in Figure 6. Figure 6a is a representation of interchain energy in relation to coupled variations of μA and μB, with the perpendicular offset Δx. Figure 6b shows the interchain potential energy map at the optimum perpendicular offset Δx, as a function of the translation Δz along the chain direction and coupled rotations of μA and μB. In this example, all primary hydroxyl groups of the cellulose chain are tg. This map exhibits an obvious symmetry; four equivalent orientations (μ, Δz), (μ + 180°, Δz), (μ + 180°, c − Δz), and (μ, c − Δz) are observed due to the helical nature of the cellulose chain used as the model. Consequently, only one section needs to be described. The most stable area of the map, as delineated by the first contour, covers a 20° angular range in μ and a 0.15 nm range in Δx. This indicates that there are multiple interchain arrangements and that libration motion within those limits could occur. Domains are found that correspond to a somewhat restricted range of μA. Vertical displacements are 0.00 or 0.279 nm (slightly more than h/2 = 0.259 nm); Δx appears to be mainly dependent on μA, and much less on Δz. Within these domains, 17 energy minima were found within an energy window of 2 kcal/mol/cellobiose. Characteristics of the most energetically stable arrangements are given in Table II. These arrangements have the chains stacked with ring surfaces touching.

FIGURE 6 (a) Interchain potential energy surface as a function of coupled variation of μA and μB with the perpendicular offset Δx. Contours are drawn at intervals of 5 kcal/mol/cellobiose; (b) Interchain potential energy map at the optimum perpendicular offset Δx, as a function of the translation Δz along the chain direction and coupled rotations of μA and μB. Contours are drawn at 5 kcal/mol/cellobiose intervals.

Table II Optimum Values for the Chain–Chain Interactions for Coupled Rotations of Helical Cellulose Chains (energies in kcal/mol/dis)

ω     μA (°)   Δz (nm)   Δx (nm)   E(vdW)   E(HB)   E(Tot)   Contacts
gg    77       0.00      0.481     7.24     0       7.24     154
gg    77       0.520     0.488     6.42     0       6.42     145
gg    85       0.226     0.501     5.50     0       5.50     132
gg    86       0.219     0.503     5.47     0       5.47     131
tg    77       0.00      0.482     7.03     0       7.03     159
tg    60       0.274     0.468     6.17     0       6.17     150
tg    96       0.324     0.544     5.89     0       5.89     135
tg    58       0.000     0.477     5.60     0       5.60     120
tg    88       0.270     0.522     5.18     0       5.18     135
tg    74       0.520     0.494     5.07     0       5.07     127
tg    50       0.179     0.487     4.97     0       4.97     130
tg    111      0.000     0.635     4.91     0       4.91     94
tg    9        0.269     0.642     4.57     0       4.57     100
gt    126      0.279     0.669     5.61     1.95    7.56     101
gt    169      0.000     0.809     3.35     3.95    7.30     62
gt    77       0.000     0.480     6.11     0       6.11     140
gt    101      0.219     0.550     5.68     0       5.68     139
gt    61       0.279     0.468     5.65     0       5.65     138
gt    77       0.520     0.488     5.39     0       5.39     138
gt    58       0.00      0.477     5.13     0       5.13     114
gt    88       0.209     0.508     5.08     0       5.08     119
gt    9        0.264     0.646     4.76     0       4.76     100

Three-chain Arrangements. Three-chain arrangements were calculated starting from each of the stable two-chain arrangements, by combining the chain duplex with a third identical chain. Only stable arrangements with low energy were retained. In most cases, the global minimum for the three-chain searches gave arrangements with the three chains in one plane. Since such arrangements do not correspond to space-filling lattices, they were excluded. As an example, Figure 7 shows the energy and lateral displacement as functions of μ and Δz using a single chain combined with a chain duplex. The presence of a pair of chains instead of a single chain removes a number of equivalent positions that were visible in Figure 6. Here again, the most stable areas extend in both directions, suggesting that there are many chain arrangements with comparable energy.
With the freely oriented chains, differences in chain orientation (μA − μB) of more than ca. 10° caused the appearance of channels in the corresponding lattice, thereby decreasing the chain–chain interaction and the lattice density. Despite their importance in the description of the amorphous structures, such arrangements were not suitable for generating three-dimensional crystalline arrangements. They were therefore discarded.

Packing of Translational Cellulose Chains
Two-chain Arrangements. Stable two-chain and three-chain arrangements were determined and selected as described for the helical chains. Since the translational chains do not contain an internal 2₁ axis, only coupled rotations needed to be considered. Hydrogen bonding was not taken into account at this stage. For the stable configurations the energies found were higher than for the helical chains. Also, the distance between the chains was larger, and the number of chain–chain contacts lower.

Three-chain Arrangements. No stable three-chain arrangements that resulted in a viable space-filling lattice compatible with a monoclinic lattice could be determined. Several arrangements compatible with a triclinic lattice were found, though these resulted in rather large unit-cell volumes compared to those obtained for the helical chains.

FIGURE 7 (a) Chain : chain-pair potential energy surface as a function of chain rotation μ and the perpendicular offset Δx. Contours are drawn at 5 kcal/mol/cellobiose intervals above the global minimum; (b) Chain : chain-pair potential energy map at the optimum perpendicular offset Δx, as a function of the translation Δz along the chain direction, and coupled chain rotations μ. Contours are drawn at 5 kcal/mol/cellobiose intervals above the global minimum.

Final Selection
Unit-cell parameters were calculated for the selected three-chain arrangements obtained, allowing a space-filling lattice to be generated. The structures resulting
in the lowest cell volumes were retained. This gave
two models for the triclinic phase: one with helical
chains and one with translational chains. The triclinic
model with helical chains was considered to be the
preferred candidate in view of its smaller unit cell
volume. Only a model with helical chains could be
retained for the monoclinic phase; the translational
chains did not give acceptable three-chain configurations.
The cellulose chains of both models are arranged
in sheets that can be stabilized by hydrogen bonding.
For the triclinic model, hydrogen bonding between
sheets is not possible, whereas in the monoclinic
model one hydrogen bond per cellobiose unit can
participate in an intersheet link.

Description of the Selected Models


Two models were selected as representative of the
crystalline structure of native cellulose. For both models, the cellulose chains were taken as parallel to the
c axis of the unit cell.
Triclinic Model. The triclinic unit cell, space group P1, contains one cellobiose unit, and is shown in Figure 8. Unit-cell parameters are a = 0.63, b = 0.69, c = 1.036 nm, α = 113.0°, β = 121.1°, γ = 76.0°. As the chains are related by translational symmetry only, all have the same orientation about the chain axis. The chains of this structure are organised into sheets oriented along the (1, 1, 0) plane. Within the sheets, the relative positions of the chains are constant, with a distance between helix axes of 0.82 nm and a displacement of 0.055 nm along the chain axis. These sheets are stabilized by an interchain hydrogen bond between the OH6 of one chain and the OH3 of the closest neighbor within the sheet (d(O···O) = 0.30 nm). The distance between the sheets is 0.433 nm, with a displacement of 0.279 nm along the chain axis. No hydrogen bonding was evident between chains located in different sheets. An intrachain hydrogen bond can be deduced between the tg O6 and O2 in the next ring along the chain (d(O···O) = 0.27 nm).
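Since cell volume was the selection criterion, it may help to see how it follows from the unit-cell parameters. The sketch below uses the standard general (triclinic) volume formula. It assumes that the three angles quoted for the triclinic cell are α, β, γ in that order and that the single angle of the monoclinic cell described next is γ, with the other two angles equal to 90°. The function name `cell_volume` is illustrative only.

```python
import math


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a general unit cell; lengths in nm, angles in degrees."""
    ca, cb, cg = (
        math.cos(math.radians(angle)) for angle in (alpha_deg, beta_deg, gamma_deg)
    )
    # V = abc * sqrt(1 - cos^2(a) - cos^2(b) - cos^2(g) + 2 cos(a) cos(b) cos(g))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Predicted triclinic cell (one cellobiose unit per cell).
v_triclinic = cell_volume(0.63, 0.69, 1.036, 113.0, 121.1, 76.0)

# Predicted monoclinic cell (two cellobiose units per cell).
v_monoclinic = cell_volume(0.87, 0.75, 1.036, 90.0, 90.0, 94.1)
v_monoclinic_per_cellobiose = v_monoclinic / 2
```

With these parameters the triclinic cell comes out at about 0.35 nm³ and the monoclinic cell at about 0.67 nm³, so the volume per cellobiose unit is similar in the two models.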
FIGURE 8 The triclinic model of cellulose I, viewed (a) along the chain axis and (b) perpendicular to the chain axis.

Monoclinic Model. The predicted monoclinic arrangement corresponds to space group P21 and contains two cellobiose units, as shown in Figure 9. The unit-cell parameters are a = 0.87, b = 0.75, c = 1.036 nm, γ = 94.1°. The chain axes coincide with symmetry axes of the unit cell, with one chain placed at the corner of the unit cell and the other at the center. As in the triclinic lattice, the chains are organized into sheets. In the present model, these sheets are arranged parallel to the bc (1, 0, 0) plane with a distance of 0.435 nm between the sheets. Two types of sheets can be distinguished based on a 5° difference in the orientation of the constituent cellulose chains. Interchain hydrogen bonds are possible between O6 of a selected chain and the O2 of the closest neighboring chain (d(O···O) = 0.25–0.28 nm, depending on the sheet) and not O3 as in the triclinic case. A further hydrogen bond may occur between the O6 of a chain in one sheet and the O4 of the closest chain in a neighboring sheet (d(O···O) = 0.25–0.32 nm). O6 of each unit exhibits a gt orientation and lies at 0.31 nm from O3 of the next pyranose ring along the chain.

FIGURE 9 The monoclinic model of cellulose I, viewed (a) along the chain axis and (b) perpendicular to the chain axis.

The difference between our model and some previously suggested models for the monoclinic phase lies in the orientation of the O6 primary hydroxyl group. On the basis of NMR measurements of the C6 chemical shift, a tg conformation has been proposed, derived from an empirical correlation between chemical shifts and the conformation of the primary hydroxyl groups. With that conformation, intersheet hydrogen bonds cannot be formed. Our calculations do reveal some possible monoclinic arrangements for cellulose chains having their primary hydroxyl groups in the tg orientation. However, the corresponding cell dimensions differ substantially from those derived experimentally. Besides, the energy of such arrangements is slightly higher than that of our selected model.

A monoclinic model has been recently proposed.34 It is based on the refinement of two independent sets of x-ray fiber diffraction data. The comparison between their proposed models and ours shows a very satisfactory agreement. The unit-cell parameters and the space group symmetry, along with the chain conformation, are the same except for the conformation of the hydroxymethyl group. A small difference lies in the relative orientation of the two chains. The variation of the cylindrical polar angle between the two chains has been reported to be either null or slightly negative (−3.5°), whereas the magnitude of this angle is 5° in our predicted three-dimensional structure. However, it was shown that variations of this angle in the range −10° to 10° have only a minor influence on the calculated agreement factor.

Not surprisingly, the calculated cell dimensions exhibit some discrepancies with respect to those that have been determined experimentally. In the case of the monoclinic allomorph, the largest deviations amount to 0.03 nm and to 2.2°. When compared to the average experimental data, the maximum deviations in the unit-cell dimensions amount to 6% in the lengths and 7% in the angles for the triclinic model, and 8% in the lengths and only 3% in the angles for the monoclinic model. Those variations reflect the occurrence of some flexibility in the interchain arrangements, as revealed in the present study. They also indicate that while our modeling protocol provides satisfactory models, there is still room for improvement in the methodology. Taking into account the possible conformational changes that the hydroxymethyl pendant group may undergo, along with possible variations in the puckering parameters of the pyranose rings, might also improve the end results in terms of the cell dimensions. It should nevertheless be pointed out that our simulation protocol yields predictions of both the space group symmetry and the unit-cell dimensions, without any constraint other than the polysaccharide fiber repeat. It is therefore remarkable to reach such an agreement for the dimensions of the unit cells.

Lattice Energy Calculations

As a final refinement, seven-chain minicrystals were generated based on the unit cells obtained that had the lowest cell volume. The conformational energy was determined after minimisation with MM3. Lattice energy was determined as the sum of all intermolecular energy contributions involving the central cellobiose unit of the minicrystal.
The results of these calculations are shown in Table III. The structures based on helical chains showed only minimal deformation after minicrystal optimization. The structure based on a translational chain, on the other hand, was severely deformed toward a symmetrical helical chain. As a result of this deformation, the lattice energy could not be usefully estimated.
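The bookkeeping behind such a lattice energy can be sketched as below. This is an illustration under stated assumptions, not the MM3 calculation: `pair_energy` is a hypothetical 12-6 term with made-up parameters, atoms are bare (x, y, z) tuples in nm, and the cutoff is arbitrary. The point shown is only that the sum runs over pairs linking the central cellobiose unit to atoms of the surrounding chains, and excludes contributions within the central unit itself.

```python
import math


def pair_energy(r, epsilon=0.5, r_min=0.38):
    """Hypothetical 12-6 pair term (kJ/mol, nm); a stand-in for the MM3 terms."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def lattice_energy(central_atoms, neighbour_chains, cutoff=1.0):
    """Sum the pair energies between the central unit and all neighbouring chains."""
    total = 0.0
    for atom in central_atoms:
        for chain in neighbour_chains:
            for other in chain:
                r = math.dist(atom, other)
                if r < cutoff:  # ignore pairs beyond the (assumed) cutoff
                    total += pair_energy(r)
    return total


# Toy example: one central atom and two neighbouring "chains" of one atom each.
central = [(0.0, 0.0, 0.0)]
neighbours = [[(0.38, 0.0, 0.0)], [(0.0, 0.38, 0.0)]]
energy = lattice_energy(central, neighbours)
```

In a seven-chain minicrystal, `neighbour_chains` would hold the six chains surrounding the central one.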

Table III  Chain Conformation Parameters after MM3 Seven-Chain Minicrystal Minimization (see Table I for Original Values)

Lattice            (°)     (°)   (°)   (°)   (°)   c (nm)   Lattice Energy (kJ/mol)   Change^a (nm)
Triclinic
  Helix            116     92    149   170   169   1.036    20.0                      0.0114
  Translational    115.3   85    136               1.036    ND^b                      0.0562
Monoclinic
  Helix chain A    116.1   91    157   63    61    1.036    15.7                      0.0136
  Helix chain B    115.2   92    149   66    66    1.036    18.0                      0.0095

a Root mean square of displacement of the non-hydrogen atoms of the central cellobiose unit (except O-6 and O-6′) after fitting to the original conformation.
b Lattice energy could not be determined due to the large lattice deformation.

Allomorphic Transitions
Comparison of the triclinic and monoclinic models indicates that the interchain distances are quite similar. Superposition of the two structures suggests that a transition between them would not give rise to gross crystal deformations.
A transition from the triclinic to the monoclinic form would require the following longitudinal shifts and small rotations of some cellulosic chains:
1. Reorient the conformation of OH-6 from tg to gt.
2. Rotate the chains in every second sheet by 5° about c.
3. Vertically shift every fourth sheet by 0.518 nm, i.e., c/2, or rotate the chains in this sheet by 180° (these changes are equivalent due to the twofold helical axis).
4. As a consequence of these reorientations, a vertical shift of the third layer will result.
Given the absence of hydrogen bonding between the sheets in the triclinic model, the vertical shifts along the c axis should be relatively accessible at elevated temperature. The formation of intersheet hydrogen bonds would then stabilize the structure by fixing the chains in place. No lateral shifts would be required to facilitate the transition, thus leaving the layers intact.
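The equivalence invoked in step 3 can be checked numerically. The sketch below builds a chain with a twofold screw axis along z from an arbitrary, made-up asymmetric unit (the coordinates are purely illustrative, not cellulose atoms), then compares the chain rotated by 180° with the chain shifted by c/2, treating both as periodic along z with repeat c = 1.036 nm.

```python
import math

C = 1.036  # fibre repeat in nm, so c/2 = 0.518 nm


def rotate_z(p, deg):
    """Rotate point p about the z (chain) axis by deg degrees."""
    t = math.radians(deg)
    x, y, z = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t), z)


def shift_z(p, dz):
    """Translate point p along the chain axis."""
    return (p[0], p[1], p[2] + dz)


def twofold_helix(asym_unit, repeats=3):
    """Generate a chain by repeated 180-degree rotation plus c/2 translation."""
    return [
        shift_z(rotate_z(p, 180.0 * k), k * C / 2)
        for k in range(2 * repeats)
        for p in asym_unit
    ]


def close(p, q, tol):
    dz = abs(p[2] % C - q[2] % C)
    dz = min(dz, C - dz)  # periodic distance along z
    return abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol and dz < tol


def same_periodic_chain(chain_a, chain_b, tol=1e-9):
    """True if both point sets describe the same chain modulo the repeat C."""
    return all(any(close(p, q, tol) for q in chain_b) for p in chain_a) and all(
        any(close(q, p, tol) for p in chain_a) for q in chain_b
    )


asym_unit = [(0.10, 0.02, 0.00), (0.05, -0.12, 0.20)]  # hypothetical coordinates
chain = twofold_helix(asym_unit)
rotated = [rotate_z(p, 180.0) for p in chain]
shifted = [shift_z(p, C / 2) for p in chain]
equivalent = same_periodic_chain(rotated, shifted)
```

The same comparison with any rotation other than a multiple of 180° fails, which is why only the twofold helical chains allow this shortcut.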

CONCLUSION
The present work has established a computational
procedure to predict the different ways that a polysaccharide chain of known conformation is able to
interact with other chain-like molecules. The procedure has been applied to cellulose, for which stable
parallel chain pairings have been generated.
Few of these arrangements are capable of generating an efficiently packed three-dimensional array, but the remainder may be pertinent to situations such as the amorphous state or the surface of cellulose crystalline domains.
Structures with parallel chains were used for comparison with experimentally derived data. They
should provide sound starting models for refinement
against observed structure factors derived from x-ray
or electron diffraction, but should not be considered
as complete descriptions of the two allomorphs of
native cellulose. Agreement between the predicted
unit cell dimensions and the published dimensions has
provided some degree of validation of the methodology.
The two most favorable predicted crystalline arrangements correspond to a triclinic lattice, space group P1, and to a monoclinic form, space group P21. These structures correspond closely to those which have been reported for cellulose Iα and Iβ, respectively. The cellulose chains in the selected models form layers, stabilized by interchain hydrogen bonds. Stacking of the layers gives rise to the complete crystal lattice. Layer stacking in the triclinic model is
stabilized only by van der Waals interactions. For the
monoclinic model, the layers are linked through two
interplane hydrogen bonds per cellobiose unit, one to
each neighbouring layer.
The present algorithm is limited by its two-stage determination of stable three-chain configurations, and acceptance of minimal interchain distances where all three chains are not in contact. A new algorithm that directly determines all stable three-chain interactions is being developed in our laboratory.
The authors gratefully acknowledge financial support from INRA to RJV. The work was also conducted within the framework of CARENET-2, a European-funded network within the Training and Mobility of Researchers 1994–1998 (MTL). The provision for financial support by INRA and CNRS is acknowledged.


REFERENCES
1. Perez, S.; Kouwijzer, M. L. C. E.; Mazeau, K.; Engelsen, S. B. E. J Mol Graphics 1997, 14, 307–321.
2. O'Sullivan, A. C. Cellulose 1997, 4, 173–207.
3. Kroon-Batenburg, L. M. J.; Kroon, J. Glycoconjugate J 1997, 14, 677–690.
4. Kroon-Batenburg, L. M. J.; Bouma, B.; Kroon, J. Macromolecules 1996, 29, 5695–5699.
5. Perez, S. Methods Enzymol 1991, 203, 510–556.
6. Aabloo, A.; French, A. D. Macromol Theory Simul 1994, 3, 185–191.
7. Aabloo, A.; French, A. D.; Mikelssar, R. H.; Perstin, A. Cellulose 1994, 1, 161–168.
8. Cousins, S. K.; Malcom Brown, R., Jr. Polymer 1995, 36, 3885–3888.
9. Heiner, A. P.; Sugiyama, J.; Telleman, O. Carbohydr Res 1995, 273, 207–223.
10. Hopfinger, A. J. Biopolymers 1971, 10, 1299–1315.
11. Hopfinger, A. J.; Walron, A. G. J Macromol Sci Phys 1969, B3, 195–208.
12. Hopfinger, A. J.; Walron, A. G. J Macromol Sci Phys 1970, B4, 185–199.
13. Marhofer, R. J.; Relling, S.; Brickman, J. Ber Bunsenges Phys Chem 1996, 100, 1350–1354.
14. Tai, K.; Kobayashi, M.; Tadokoro, H. J Polym Sci Polym Phys Eds 1976, 14, 783–797.
15. Woodcock, C.; Sarko, A. Macromolecules 1980, 13, 1183.
16. Perez, S. In Electron Crystallography of Organic Molecules; Fryer, J.; Dorset, D. L., Eds.; NATO ASI Series; Kluwer Academic: New York, 1990; pp 33–53.
17. Perez, S.; Imberty, A.; Scaringe, R. P. In Computer Modeling of Carbohydrate Molecules; French, A. D.; Brady, J. W., Eds.; ACS Symposium Series, American Chemical Society: Washington, DC, 1990; pp 281–299.
18. Imberty, A.; Chanzy, H.; Perez, S.; Buleon, A.; Tran, V. J Mol Biol 1988, 201, 365–378.
19. Imberty, A.; Perez, S. Biopolymers 1988, 27, 1205–1221.
20. Gardner, K. H.; Blackwell, J. Biopolymers 1974, 13, 1975–2001.
21. Sarko, A.; Muggli, R. Macromolecules 1974, 7, 486–494.
22. Atalla, R. H.; VanderHart, D. L. Science 1984, 223, 283.
23. VanderHart, D. L.; Atalla, R. H. Macromolecules 1984, 17, 1465–1472.
24. VanderHart, D. L.; Atalla, R. H. In The Structure of Cellulose; ACS Symposium Series 1987, American Chemical Society: Washington, DC, 1987; pp 88–118.
25. Sugiyama, J.; Vuong, R.; Chanzy, H. Macromolecules 1991, 24, 4168–4175.
26. Perez, S.; Delage, M. M. Carbohydr Res 1992, 212, 253–259.
27. Tvaroska, I.; Perez, S. Carbohydr Res 1986, 149, 389–410.
28. Scaringe, R. P.; Perez, S. J Phys Chem 1987, 91, 2394–2403.
29. Chou, K. C.; Nemethy, G.; Scheraga, H. A. J Phys Chem 1983, 87, 2869–2881.
30. Chou, K. C.; Nemethy, G.; Scheraga, H. A. J Am Chem Soc 1984, 106, 3161–3170.
31. French, A. D.; Miller, D. P.; Aabloo, A. Int J Biol Macromol 1993, 15, 30–36.
32. Allinger, N. L.; Yuh, Y. H.; Lii, J.-H. J Am Chem Soc 1989, 111, 8551–8566.
33. Allinger, N. L.; Rahman, M.; Lii, J.-H. J Am Chem Soc 1990, 112, 8293–8307.
34. Finkenstadt, V. L.; Millane, R. P. Macromolecules 1998, 31, 7776–7783.
