Viewpoint
Molecular modelling for investigating structure–function relationships of soy glycinin
Thushan S. Withana-Gamage and Janitha P.D. Wanasundara*
* Corresponding author.

Introduction
The conventional approach to discovering new proteins for food applications relies on stepwise isolation and purification, followed by generation of functional property profiles under predetermined conditions applicable to foods. Most studies on functional properties (FP) of food proteins have dealt with screening protein sources in vitro and in model foods by a hit-or-miss approach. In this research area, the emphasis is on the extrinsic factors that govern protein functionality: conditions involved in protein processing, protein denaturation state, and other associated components affecting protein properties are mostly highlighted. The link between the technologically valuable FP and the innate properties of the protein at the molecular and structural level is less explored.
Modelling structure–function relations to quantify food protein functionalities using the Quantitative Structure Activity Relationship (QSAR) approach has evolved since the work of Nakai and co-workers (Nakai, 1983; Nakai & Li-Chan, 1988; Townsend & Nakai, 1983; Voustinas, Cheung, & Nakai, 1983; Voustinas, Nakai, & Harwalker, 1983), and other groups continue to enhance the capability of this area (Pripp, Isaksson, Stepaniak, Sørhaug, & Ardö, 2005). Furthermore, Liebman (1998) evaluated data mining approaches to investigate the relationships between structure and function of proteins for rational molecular design for directed uses. In their review of structure–function relationships of food proteins developed through molecular modelling, Kumosinski, Brown, and Farrel (1991a, 1991b) used the primary sequences of κ-casein and αs1-casein to generate secondary and unrefined three-dimensional structures, demonstrating the ability of molecular modelling to resolve certain structure–function relations of these proteins that are relevant in food applications.
Bioinformatics of the post-genomic era has revealed considerable information on plant proteins, particularly in relation to understanding desirable quality traits of food crops; however, these exponentially expanding data and tools have barely been used to advance food protein research. The knowledge gap between the molecular biology of food proteins and their structures and functions/properties can be narrowed through three-dimensional (3-D) molecular modelling based on homology models. Protein structure is closely linked with protein function, so structural genomics has the potential to inform knowledge of protein function.
The development of molecular modelling programs and their application has been formalized in the design of new drugs, where it is called computer-assisted drug design (CADD)
0924-2244/$ - see front matter Crown Copyright 2012 Published by Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tifs.2012.06.014
T.S. Withana-Gamage, J.P.D. Wanasundara / Trends in Food Science & Technology 28 (2012) 153–167
Fig. 1. The phylogenetic tree showing evolutionary relationships among storage proteins (SSP) of edible seeds. Ring (a) Major SSP clades: Brassicaceae in red, Fabaceae in yellow, Poaceae in blue, other species in black. Ring (b) The phylogeny of major plant protein families: 11–13S globulins (green), 7S globulins (blue), 2S albumins (magenta), prolamin zein (red), and prolamin gliadin/glutelin (orange). Ring (c) Percentage identity (black–white) of each SSP to the best available template. Ring (d) Relationship of SSPs with the plant clade and group: eudicotyledon clade (pink) and monocotyledon group (cyan). Ring (e) Bars represent the length of the sequence: a filled bar indicates that an experimentally determined structure is available for the sequence, and a few selected structures are shown outside Ring e. The inset relates to Ring b and Ring e. Amino acid sequences were obtained from the UniProt database. The phylogenetic tree was generated using ClustalW 2.0 (Larkin et al., 2007, http://www.ebi.ac.uk/Tools/msa/clustalw2/) and the interactive tree of life (iTOL) web server (Ciccarelli et al., 2006; Letunic & Bork, 2007, http://itol.embl.de/).
Fig. 2. Process steps of homology modelling and its application for building soybean glycinin protomer models. (a) Major steps involved in homology or comparative modelling of protein tertiary structure. (b) Structure modelling of the five known glycinin subunits of soybean (Glycine max). Subunits are divided into two groups according to their homology: group I (A1bB2, A1aB1b, and A2B1a) and group II (A3B4 and A5A4B3). The 3-D structures of two protomers, A1aB1b of group I and A3B4 of group II, have been experimentally determined (PDB codes: 1FXZ and 2D5F, respectively). Tertiary and quaternary structures of all five protomers were modelled using these two templates. Dark coloured loop areas in the final protomers show the constructed disordered regions. (c) Bar charts show the homology in terms of sequence identity, sequence similarity, and gaps of each subunit with the corresponding sequence (black and grey) and template (other colours).
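The sequence identity and gap percentages of the kind summarised in panel (c) can be computed directly from a pairwise alignment. The following is a minimal sketch, not the procedure used to produce the figure: the function name `alignment_stats`, the gap character, and the choice of denominator (all alignment columns that are not gap-only) are assumptions made for illustration.

```python
def alignment_stats(aligned_query: str, aligned_template: str, gap: str = "-") -> dict:
    """Percent identity and percent gapped columns for two pre-aligned sequences.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length and use `gap` for insertions/deletions.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    # Ignore columns where both sequences carry a gap.
    columns = [
        (q, t)
        for q, t in zip(aligned_query.upper(), aligned_template.upper())
        if not (q == gap and t == gap)
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")

    gapped = sum(1 for q, t in columns if q == gap or t == gap)
    identical = sum(1 for q, t in columns if q == t and q != gap)
    total = len(columns)
    return {
        "identity_pct": 100.0 * identical / total,
        "gap_pct": 100.0 * gapped / total,
    }
```

Alignment programs differ in the length they normalise by (alignment length, shorter sequence, or aligned residues only), so values obtained this way may not match those reported by a particular tool.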
have obtained either from protein isolates of relevant mutant lines or microbial expression of cDNA sequences.
The glycinin protomers A1bB2, A2B1a, and A5A4B3 could be modelled without any HVRs, consistent with the available crystal structures of the A1aB1b and A3B4 subunits. These structures are available as Supplementary data Fig. S2 and are called core-structures (hereafter referred to as Modelcore throughout this communication). This naming is similar to that of Maruyama et al. (1999), who designated the β-conglycinin subunits isolated from the deletion mutants αc and α′c, which are devoid of extension regions or HVRs, as core regions. The loops can be constructed for all five glycinin subunit variants using the MODELLER program, which uses an optimization-based approach (Fiser, Do, & Šali, 2000); hereafter the structures with loops are referred to as Modelcore+HVR. The loop regions or HVRs (Supplementary data Fig. S1) involving over 12 amino acid residues can be built using the step-by-step procedure of the MODELLER program (Šali & Blundell, 1993). According to Fiser et al. (2000), loops containing 12 residues can be predicted using MODELLER with an average accuracy of 2.61 ± 0.16 Å. The stereochemistry evaluation of loop regions using the PROCHECK and Verify3D programs confirms that these regions have been built without any serious errors (data not shown). The lack of higher order secondary structures in the disordered loop regions (Adachi, Kanamori, et al., 2003) may cause a flexible (free flowing) conformation in these protein regions. Therefore, rather than omitting these regions from the molecule, it is better to include loops, even if they are less accurate, to understand properties within the protein structure. Details of homology modelling for the 11S protein used in this study are explained in our previous communication using 11S cruciferin (Withana-Gamage et al., 2011).
The molecular structures of the protomers of the five defined glycinin subunit variants A1aB1b, A2B1a, A1bB2, A3B4, and A5A4B3 and their respective homotrimers, generated with (Modelcore+HVR) and without (Modelcore) loop regions, are used to explore physico-chemical properties and to understand and predict structure–function relations.
Surface hydrophobicity and related properties
The hydropathy profile of a protein with known structure, determined from its linear amino acid sequence, gives little information about the overall hydrophobicity of the molecule at its tertiary or any higher structural level. The surface hydrophobicity (S0) of a protein plays an important role in determining solubility, emulsifying and foaming properties (Nakai, 1983) in food related systems. The surface hydrophobicity of a protein can be measured in two ways: by its ability to bind small fluorescent molecules such as cis-parinaric acid (CPA) or 8-anilino-1-naphthalenesulfonic acid (ANS), and by its adsorption onto polymer materials such as phenyl- or butyl-Sepharose, generally determined using hydrophobic column chromatography (the higher the surface hydrophobicity, the stronger the adsorption to the column).
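For reference, the sequence-based quantities mentioned here (a hydropathy profile along the chain and a grand average hydropathy over the whole sequence) can be sketched in a few lines. This is an illustrative sketch only: the residue values are the published Kyte and Doolittle (1982) scale, but the function names `gravy` and `hydropathy_profile` and the default window of 9 residues are assumptions, not taken from this communication.

```python
from typing import List

# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Sliding-window mean hydropathy along the sequence (one value per window)."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Both functions see only the linear sequence, which is exactly the limitation noted above: a residue buried in the folded trimer contributes as much as an exposed one.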
Glycinin protomers can be arranged in descending order of average hydrophobicity (H), calculated from the primary amino acid sequence, as A1bB2 > A2B1a > A1aB1b > A3B4 > A5A4B3 (Table 1 & Supplementary data Table S1). When the hydrophobicity values of amino acid residues assigned according to the scale proposed by Kyte and Doolittle (1982) are plotted on the solvent accessible surface of the Modelcore homotrimers (Supplementary data Fig. S2), relatively more hydrophobic residues can be observed on the IE face (interchain S–S bond containing face; Adachi et al., 2001) than on the IA face (intrachain S–S bond containing face; Adachi et al., 2001) for all glycinin protomers (Table 1). Among the Modelcore subunit variants, the number of hydrophobic residues on IA (36–47) and IE
Table 1. Theoretical physico-chemical parameters of modelled glycinin subunits and experimental values reported for functional properties of glycinin subunits.

Property                                        A1aB1b                  A1bB2                   A2B1a                   A3B4                    A5A4B3
Formula                                         C2333H3660N686O741S14   C2377H3719N691O736S17   C2363H3714N694O748S18   C2333H3660N686O741S14   C2765H4325N817O902S11
Mr a (kDa)                                      53.6                    54.3                    54.4                    58.2                    63.8
Asx + Glx (%)                                   30.1                    28.3                    30.0                    28.2                    30.3
His + Arg + Lys (%)                             12.4                    11.3                    10.7                    12.8                    13.9
Acidic:Basic                                    2.4:1                   2.5:1                   2.8:1                   2.2:1                   2.2:1
H a                                             −0.81                   −0.59                   −0.66                   −0.82                   −0.95
pI a                                            5.78                    6.01                    5.46                    5.52                    5.17
ASA b (Å2)                                      24,515.0                23,547.4                23,018.2                26,034.1                26,247.2
Pocket area of central channel cavity c (Å3)    2603.3                  5639.8                  4151.5                  3478.0                  3999.9
Individual pocket opening c (Å3)                477.6 (9)               883.0 (9)               990.6 (10)              644.3 (3)               708.9 (3)
Proline residues d (%)                          6.1                     5.6                     5.4                     7.2                     6.7
Hydrophobic residues e (%)                      28.3                    30.3                    28.8                    29.9                    28.5
Surface hydrophobic residues f: Modelcore
  IA face                                       36                      43                      37                      41                      39
  IE face                                       54                      52                      56                      59                      58
Surface hydrophobic residues f: Modelcore+HVR
  IA face                                       28                      24                      25                      38                      35
  IE face                                       34                      31                      38                      50                      38
Number of –SH groups
  IA face                                       6                       3                       6                       3                       3
  IE face                                       6                       6                       6                       3                       3
[row label missing]
  IA face                                       3                       3                       3                       3                       3
  IE face                                       3                       3                       3                       3                       3

Experimental values h,i (row labels missing)    Group I g               A3B4                    A5A4B3
                                                57.7 ± 0.3              71.7 ± 0.3              61.2 ± 0.2
                                                74.4                    71.1                    65.5
                                                6                       9.5                     12
                                                98                      95                      30
                                                6.8                     3.1                     2.3
                                                47.3                    11.7                    18.8
                                                65.1                    78.0                    73.9
Unassigned values: 67.0, 10.5, 78.1, 19.4, 73.3

a H is the grand average hydrophobicity. pI and Mr are calculated using one sequence of the molecule.
b Solvent-accessible surface area (ASA) was calculated for homotrimers by the rolling ball method with a radius of 1.4 Å.
c Pocket area: size of the cavities around the central channel of core-structures; in addition, the size of the mouth opening of individual pockets is given, with the number of openings in parentheses.
d No. of proline residues per single subunit.
e Mol% of the sum of Val, Pro, Leu, Ile, Phe, and Trp residues.
f Surface exposed hydrophobic residues were counted manually after visualizing the molecule with VMD software.
g A1aB1b, A1bB2 and A2B1a were reported together as Group I subunits.
h Maruyama et al., 2004 (glycinin hexamer form).
i Prak et al., 2005 (proglycinin trimer form).
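The composition rows of Table 1 are simple residue counts, and the Acidic:Basic row is numerically consistent with the ratio of the Asx + Glx percentage to the His + Arg + Lys percentage (for A1aB1b, 30.1/12.4 ≈ 2.4). A minimal sketch of such a calculation is given below. The function name `composition_summary` is hypothetical, and counting D, N, E and Q as Asx + Glx is an assumption about how these rows were derived; the hydrophobic set follows footnote e.

```python
def composition_summary(sequence: str) -> dict:
    """Residue-composition metrics (percent of residues) for one sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def pct(residues: str) -> float:
        # Percentage of positions occupied by any residue in `residues`.
        return 100.0 * sum(seq.count(r) for r in residues) / len(seq)

    asx_glx = pct("DNEQ")   # assumption: Asx = Asp/Asn, Glx = Glu/Gln
    basic = pct("HRK")      # His + Arg + Lys
    return {
        "asx_glx_pct": asx_glx,
        "his_arg_lys_pct": basic,
        # Ratio is undefined when the sequence has no basic residues.
        "acidic_to_basic": asx_glx / basic if basic > 0 else None,
        "proline_pct": pct("P"),
        "hydrophobic_pct": pct("VPLIFW"),  # residues listed in footnote e
    }
```

Applied to the full subunit sequences from UniProt, a function of this kind would be expected to give values close to those tabulated, although small differences could arise from signal peptides or propeptide residues being included or excluded.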
are reported in Table 1. The hydrophobic residues and electrostatic potential of the glycinin trimers can be mapped on the ASA (Fig. 3). The extent of the HVR-III extension region differs between the A3B4 and A5A4B3 protomers. This is evident in the side view of the molecules (90° rotation of the IE or IA face; Fig. 3 iv and v). Furthermore, the centre channel of the A3B4 homotrimer is covered by HVR-III (Fig. 3 iv), but the extension of this disordered region is not as great as that of HVR-III of A5A4B3 (Fig. 3 v and Supplementary data Fig. 2 v). The centre channel of the A5A4B3 trimer is not covered by HVR-III, suggesting that ANS can access it more easily than in the A3B4 molecule. Surface adsorption of protein molecules to the hydrophobic (Sepharose) column may be easier for A3B4 because of its shorter HVR-III; on the other hand, the much more relaxed and highly hydrophilic arms of HVR-III may sterically hinder binding of the A5A4B3 hexamer via the IA face to the hydrophobic column. The high surface hydrophobicity reported for A1bB2 proglycinin (i.e. in trimer configuration, and with the shortest HVRs among all five protomers), determined by hydrophobic column chromatography (Prak et al., 2005), may be due to the differences in HVR length.
Solubility
The solubility properties of a protein depend on the physico-chemical nature of the molecular surface. Moreover, protein solubility under a given set of conditions is the thermodynamic manifestation of the equilibrium between protein–protein and protein–solvent interactions and relates to the net free energy change due to the interaction of hydrophobic and hydrophilic residues on the protein surface with the surrounding solvent. Therefore, the distribution of electrostatic surface potential (which may relate to the salt binding sites) of a molecule and its surface hydrophobicity are critical factors influencing the solubility properties of a protein (Damodaran, 2008). For the Modelcore+HVR glycinin homotrimers, we calculated the electrostatic surface potential by solving the Poisson–Boltzmann equation using the Adaptive Poisson–Boltzmann Solver (APBS) (Baker, Sept, Joseph, Holst, & McCammon, 2001) plug-in (developed by Michael G. Lerner, University of Michigan) of PyMol (Warren L. DeLano, DeLano Scientific, San Carlos, CA, http://www.pymol.org). Electrostatic surface potentials of the Modelcore of soybean protomers generally show a slightly more positive (basic) charge on the IE face than on the IA face. The electrostatic surface potential of the IA face of group II homotrimers (i.e. A3B4 & A5A4B3, Supplementary data Fig. S1) shows a prominent negative charge and aligns well with the lowest value for acidic:basic residues (2.2:1) among the glycinin protomers (Table 1). Similar to surface hydrophobicity, the surface electrostatic potential of glycinin Modelcore+HVR shows remarkable differences when mapped to the surface representation of the homology models (Fig. 3). Generally, the HVRs are rich in acidic residues (Asx and Glx, Supplementary data Fig. S1), and the resulting negative charges may contribute to this property. According to Prak et al. (2005), low ionic strength (μ = 0.08) resulted in precipitation of proglycinin protomers, with incomplete solubility for A2B1a and A1bB2 when the pH changed from 5.7 to 6.7. In the same study, very low

Fig. 3. Surface characterization of the developed soybean glycinin models with HVRs (Modelcore+HVR). (a) Surface hydrophobicity (i: A1aB1b, ii: A1bB2, iii: A2B1a, iv: A3B4, & v: A5A4B3). The distribution of hydrophilic and hydrophobic residues assigned according to the Kyte and Doolittle (1982) scale is represented in green (hydrophilic) and red (hydrophobic) on the solvent accessible surface of the models. (b) Electrostatic potential of the molecular surfaces of the glycinin models is indicated in colour; the values span ±5 kT/e, shown in blue and red.
(5639.8 Å3) > A2B1a (4151.5 Å3) > A5A4B3 (3999.9 Å3) > A3B4 (3478.9 Å3) > A1aB1b (2603.3 Å3) (Table 1). Studies on 7S (Fukuda et al., 2008) and 11S (Tandang-Silvas et al., 2011) globulins have demonstrated that proteins with a large cavity size have low thermal stability; therefore, the thermal stability of glycinin homotrimers could be predicted in descending order as A1aB1b > A3B4 > A5A4B3 > A2B1a > A1bB2. This is exactly the same order of thermal stability reported for soybean proglycinin by Prak et al. (2005) at the subunit level, suggesting that the cavity size of a protein molecule is a good parameter for predicting a protein's thermal stability. The lower proportion of proline residues in A1bB2 (5.6%) and A2B1a (5.4%) than in A1aB1b, A5A4B3, and A3B4 (6.1, 6.7 and 7.2%, respectively) may contribute further to thermal destabilization of the A1bB2 and A2B1a homotrimers (Table 1). Proteins with long loops are more susceptible to heat induced denaturation than those with shorter loops (Chakravarty & Varadarajan, 2002; Kumar & Nussinov, 2001). Although A3B4, A5A4B3 and A1aB1b have longer HVRs than the other subunit variants, features such as a high number of proline residues and a small cavity size may have negated the effect of the loop length difference on thermal stability. The type and stability of a thermally induced gel can be predicted by evaluating surface hydrophobicity, charge distribution, sulfhydryl/disulphide (–SH/S–S) content and the size of the cavities (Damodaran, 2008; Shimada & Matsushita, 1980). Soybean glycinin contains two S–S bonds; one is intrachain (within the acidic chain; A1aB1b: Cys12–Cys45, A1bB2: Cys31–Cys64, A2B1a: Cys28–Cys61, A3B4: Cys32–Cys65, and A5A4B3: Cys33–Cys66) and the other is interchain (between the acidic and basic chains; A1aB1b: Cys88–Cys298, A1bB2: Cys107–Cys304, A2B1a: Cys104–Cys307, A3B4: Cys108–Cys385, and A5A4B3: Cys109–Cys351) (Supplementary data Fig. S3). Using the disulphide bond-deficient mutants C12G and C88S of proglycinin A1aB1b, Adachi and co-workers revealed that the contribution of inter- and intrachain disulphide bonds to thermal stability is low, particularly for the proglycinin A1aB1b protomer (Adachi, Okuda, et al., 2003).
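The cavity-size argument above amounts to a simple ranking rule: the smaller the central channel cavity, the higher the predicted thermal stability. As a hedged illustration, the sketch below sorts the subunits by the cavity values listed in Table 1; the dictionary and function names are hypothetical, and the rule deliberately ignores the other contributing factors discussed in the text (proline content, loop length, disulphide bonds).

```python
# Pocket area of the central channel cavity (Å3) per subunit, as listed in Table 1.
CAVITY_SIZE = {
    "A1aB1b": 2603.3,
    "A1bB2": 5639.8,
    "A2B1a": 4151.5,
    "A3B4": 3478.0,
    "A5A4B3": 3999.9,
}


def rank_by_predicted_thermal_stability(cavity_sizes: dict) -> list:
    """Order subunits from most to least thermally stable.

    Assumption: stability decreases monotonically with cavity size, so the
    ranking is an ascending sort on the cavity value.
    """
    return sorted(cavity_sizes, key=cavity_sizes.get)


predicted_order = rank_by_predicted_thermal_stability(CAVITY_SIZE)
print(" > ".join(predicted_order))
```

A rank-only comparison of this kind says nothing about the size of the stability differences; it only checks whether the ordering agrees with the experimental one.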
The content of –SH and S–S bonds affects the hardness of a heat-induced protein gel because of the disulphide bond exchange that may occur during heating (Shimada & Matsushita, 1980; Tezuka et al., 2004). The HVR-V of group I and HVR-IV of group II contain six (two per protomer) and three (one per protomer) –SH residues on the IE face, respectively (Supplementary data Fig. S2 and Fig. S3). In the glycinin hexamer, the –SH residues may be hidden inside the molecule and may not participate in forming S–S bonds in the initial stage of heating. All glycinin trimers have three more –SH residues embedded in the IA face with the potential to form disulphide bonds when conditions are favourable (such as during heat induced aggregation). Three additional –SH residues are found in group I subunit variants except the A1bB2 protomer
Fig. 4. The proposed pathway for applying homology modelling and models in screening food proteins for desired functional properties. With the enormous amount of amino acid sequence information available in databases and the limited number of 3-D structures, development of knowledge-based molecular structure models of several food proteins is quite possible. Initial screening of proteins at molecular or subunit levels in silico for selected physico-chemical properties that are predictors of technologically valuable functional properties can be achieved at a fraction of the cost and time compared to wet-lab techniques.
and deposited in plants, which is through a selection or enhancement process at molecular level to perform desired biological activity or functionality in a complex system
which is our food.
Conclusion
In this paper, we show the possibility of using homology modelling to predict the structure and physico-chemical properties of glycinin at the molecular level in silico, as an approach to understand and investigate properties that are important in processing functionality. Although functional properties of food proteins are expressed at the macroscopic length scale, the structure related properties of the constituent molecules contribute largely to them. Homology modelling allows a 3-D structure to be predicted on the basis of genetic relationships to related proteins that are well studied. This communication shows one of the ways in which food protein scientists can utilize bioinformatics (with emphasis on homology modelling) to screen or investigate the suitability of a protein for specific functionalities needed in food. This approach resembles the design of drugs in pharmaceutical and medicinal chemistry. The proposed approach requires proper validation with well-defined food proteins and appropriate in vitro data for FP. Homology modelling allows the molecular structure of a protein of interest to be derived, and its structural properties can be investigated to obtain physico-chemical properties of the molecule that are important in processing functionality. Therefore, homology modelling can be complementary to the existing approaches of food protein structure and function prediction.
Acknowledgements
This work is supported by the Agriculture and Agri-Food Canada (AAFC) funded project RBPI 1827.
Supplementary data
Supplementary data related to this article can be found
online at http://dx.doi.org/10.1016/j.tifs.2012.06.014.
References
Adachi, M., Kanamori, J., Masuda, T., Yagasaki, K., Kitamura, K., Mikami, B., et al. (2003). Crystal structure of soybean 11S globulin: glycinin A3B4 homohexamer. Proceedings of the National Academy of Sciences of the United States of America, 100, 7395–7400.
Adachi, M., Okuda, E., Kaneda, Y., Hashimoto, A., Shutov, A. D., Becker, C., et al. (2003). Crystal structures and structural stabilities of the disulfide bond-deficient soybean proglycinin mutants C12G and C88S. Journal of Agricultural and Food Chemistry, 51, 4633–4639.
Adachi, M., Yagasaki, K., Gidamis, A. B., Mikami, B., & Utsumi, S. (2001). Crystal structure of soybean proglycinin A1aB1b homotrimer. Journal of Molecular Biology, 305, 291–305.
Aiking, H. (2011). Future protein supply. Trends in Food Science & Technology, 22, 112–120.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.
Baker, D., & Šali, A. (2001). Protein structure prediction and structural genomics. Science, 294, 93–96.
Baker, N. A., Sept, D., Joseph, S., Holst, M. J., & McCammon, J. A. (2001). Electrostatics of nanosystems: application to microtubules and the ribosome. Proceedings of the National Academy of Sciences of the United States of America, 98, 10037–10041.
Barre, A., Borges, J.-P., & Rougé, P. (2005). Molecular modelling of the major peanut allergen, Ara h 1 and other homotrimeric allergens of the cupin superfamily: a structural basis for their IgE-binding cross-reactivity. Biochimie, 78, 499–506.
Barre, A., Jacquet, G., Sordet, C., Culerrier, R., & Rougé, P. (2007). Homology modelling and conformational analysis of IgE-binding epitopes of Ara h 3 and other legumin allergens with a cupin fold from tree nuts. Molecular Immunology, 44, 3243–3255.
Bonneau, R., & Baker, D. (2001). Ab initio protein structure prediction: progress and prospects. Annual Review of Biophysics and Biomolecular Structure, 30, 173–189.
Bordoli, L., Kiefer, F., Arnold, K., Benkert, P., Battey, J., & Schwede, T. (2009). Protein structure homology modeling using SWISS-MODEL workspace. Nature Protocols, 4, 1–13.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., & Karplus, M. (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4, 187–217.
Cabanos, C., Tandang-Silvas, M. R., Odijk, V., Brostedt, P., Tanaka, A., Utsumi, S., et al. (2010). Expression, purification, cross-reactivity and homology modeling of peanut profilin. Protein Expression and Purification, 73, 36–45.
Cavasotto, C. N., & Phatak, S. S. (2009). Homology modeling in drug discovery: current trends and applications. Drug Discovery Today, 14, 676–682.
http://www.ebi.ac.uk/Tools/msa/clustalw2/.
http://www.ncbi.nlm.nih.gov/bioproject/.
http://www.pdb.org/pdb/home/home.do.
http://www.proteinmodelportal.org/.
http://www.pymol.org/.
http://www.uniprot.org/.