
Trends in Food Science & Technology 28 (2012) 153–167

Viewpoint

Molecular modelling for investigating structure–function relationships of soy glycinin

Thushan S. Withana-Gamage a,b and Janitha P.D. Wanasundara a,b,*

a Agriculture and Agri-Food Canada, Saskatoon SK, Canada S7N 0X2 (e-mails: sanjeewa.thushan@agr.gc.ca; thushan.sanjeewa@usask.ca)
b Department of Food and Bioproduct Sciences, University of Saskatchewan, Saskatoon SK, Canada S7N 5A9 (Tel.: +1 306 956 7684; fax: +1 306 956 7247; e-mail: janitha.wanasundara@agr.gc.ca)

Increasing global demand for food protein drives research to improve existing sources for efficient use or to convert unconventional sources into mainstream protein ingredients. Structure at all levels is the most important intrinsic property that dictates the suitability of a protein for food use. High-throughput sequencing has facilitated genome mapping of food plants at an accelerated pace; however, the information is poorly utilized in food protein research. The use of bioinformatics for data mining and molecular modelling in revealing structure–function relations is discussed using soybean glycinin as a model protein. This in silico approach complements the process of understanding food protein molecular structure and linking the physico-chemical properties of the molecule with the functionalities that the protein provides in food.

Introduction
The conventional approach of discovering new protein for
food application relies on stepwise isolation and
* Corresponding author.

purification, and generating functional property profiles under predetermined conditions applicable to foods. Most
studies on functional properties (FP) of food proteins
have dealt with screening protein sources in vitro and in
model foods by a hit-or-miss approach. In this research
area, the emphasis is on the extrinsic factors that govern
protein functionality. For example, conditions involved in
protein processing, protein denaturation state, and other associated components affecting protein properties are mostly
highlighted. The link between the technologically valuable
FP and innate properties of the protein at molecular and
structural level is less explored.
Modelling structure–function relations to quantify food protein functionalities using the Quantitative Structure–Activity Relationship (QSAR) approach has evolved since the work of Nakai (Nakai, 1983; Nakai & Li-Chan, 1988; Townsend & Nakai, 1983; Voutsinas, Cheung, & Nakai, 1983; Voutsinas, Nakai, & Harwalkar, 1983), and other groups continue to enhance the capability of this area (Pripp, Isaksson, Stepaniak, Sørhaug, & Ardö, 2005). Furthermore, Liebman (1998) evaluated data mining approaches to investigate the relationships between structure and function of proteins for rational molecular design for directed uses. The development of structure–function relationships of food proteins through a molecular modelling approach, reviewed by Kumosinski, Brown, and Farrell (1991a, 1991b), used primary sequences of κ-casein and αs1-casein to generate secondary and unrefined three-dimensional structures, demonstrating the ability of molecular modelling to resolve certain structure–function relations of these proteins relevant to food applications.
Bioinformatics in the post-genomic era has revealed considerable information on plant proteins, particularly related to understanding desirable quality traits of food crops; however, these exponentially expanding data and tools have barely been used to advance food protein research. The knowledge gap on the molecular biology of food proteins, their structures and functions/properties can be narrowed through three-dimensional (3-D) molecular modelling based on homology models. Protein structure is closely linked with protein function; structural genomics therefore has the potential to inform knowledge of protein function.
Development of molecular modelling programs and
their application has been formalized in designing new
drugs and called computer assisted drug design (CADD)

0924-2244/$ - see front matter Crown Copyright 2012 Published by Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tifs.2012.06.014


or computer assisted molecular design (CAMD). A basic


principle of this research area is that the biological activity
of a molecule is dependent on the three dimensional placement of specific functional groups. In this area of research,
hypotheses associating structural properties with bioactivities have been developed and validated in predicting properties and activities of new chemical entities using
computational tools in conjunction with conventional
research techniques that examine structural properties of
existing compounds. Food protein functionality research
can take a similar approach in understanding FP of food
protein molecules and rationalizing modifications to the
molecules for enhanced functionalities.
According to global per capita food consumption data,
plant foods comprise 61.3% of dietary protein intake
(FAOSTAT, 2010) indicating the value of plant proteins
as a food macromolecule. About 84.1% of dietary plant
proteins originate from seeds (FAOSTAT, 2010) and include cereals, legumes, oilseeds and nuts because of their
high protein density and abundant utilization in our food.
Although the protein content of seeds varies considerably,
a major fraction of proteins is stored as a source of C, N,
and S for mobilization and utilization to support seedling
growth and these proteins are known as seed storage proteins; SSP (Derbyshire, Wright, & Boulter, 1976). In food
industry, as the main protein source of seeds, SSP play
a crucial role in satisfying the protein demand of the human
food supply. It is estimated that, on average, six units of
plant protein are required to produce one unit of muscle protein (Pimentel & Pimentel, 2003); consequently,
only ~15% of the protein and energy of animal feed crops
reaches the human mouth (Aiking, 2011). Increase in the direct
use of plant proteins in human food may help to overcome
this inefficiency. Under the current forecast for global food
demand and supply, efficient use of plant proteins will be
needed to satisfy future protein requirements for food and
feed. This means the improved efficiency of plant protein
use should be achieved at both levels of food consumption;
before and during eating (involves desired functionalities in
food products) and after eating (includes nutritional value
and safety).
In this communication, we show the value of bioinformatics databases and tools in predicting important parameters of protein molecules related to food functionality and
possibility of using these tools and information to screen
proteins for selected FP. In addition, the ability of computational proteomics approach to investigate the relationships of SSP among food crops and to direct molecular
structure model selection is discussed.
Among the plant food proteins and SSP, soybean glycinin has information ranging from the genes involved in expressing protomers (subunits) of the multimeric protein,
crystal structure, to FP of homotrimers and homohexamers
(Fukushima, 1991; Tandang-Silvas, Tecson-Mendoza,
Mikami, Utsumi, & Maruyama, 2011). In this communication, we show that the 3-D structures of trimers and

hexamers of soybean glycinin can be developed through


homology modelling using available information on molecular structure and genetic relationship. We analyzed the
most obvious characteristics of glycinin homology-based
structures that may be useful in predicting physicochemical properties and functionalities of hexameric and
trimeric forms, and compared them with the laboratory
data available in literature on similar glycinin molecule
forms.
Homology modelling
Homology or comparative modelling involves computational procedures that can be employed in predicting the
3-D structure of a protein using its amino acid sequence
as the target and solved homologous protein as the template. Homology modelling of the 3-D structure of an unknown protein based on evolutionary relationship with
experimentally determined structure is regarded as
a high-throughput and low-resolution technique and
has successfully been incorporated in drug discovery process by screening many homology models in pharmaceutical industry (Cavasotto & Phatak, 2009; Hillisch, Pineda, &
Hilgenfeld, 2004; Maggio & Ramnarayan, 2001). Bordoli
et al. (2009) describe the step-by-step process of protein structure homology modelling using the SWISS-MODEL workspace. The review by Forster (2002) describes the value of 3-D models and modelling techniques for exploring protein–ligand and protein–protein complexes. The ProFunc server (http://www.ebi.ac.uk/thornton-srv/databases/ProFunc) helps in identifying the likely biochemical function(s) of a protein from its 3-D structure (Laskowski, Watson, & Thornton, 2005). To date, homology modelling of the 3-D structure of food-related proteins and computer (in silico) prediction of protein characteristics in relation to food use are limited to a few studies on allergenic proteins (Barre, Borges, & Rouge, 2005; Barre, Jacquet, Sordet, Culerrier, & Rouge, 2007; Cabanos et al., 2010; Schein, Ivanciuc, & Braun, 2007). Recently, we described the expected properties of cruciferin, the main SSP in the Brassicaceae family, using homology models (Withana-Gamage, Hegedus, Qiu, & Wanasundara, 2011).
For structure–function or structure–property studies, many food proteins lack experimentally obtained structure details. X-ray crystallography and two-dimensional nuclear magnetic resonance (2-D NMR) spectroscopy have been the most employed techniques for protein 3-D structure determination. However, the difficulty of obtaining satisfactory crystals for X-ray analysis and the low solubility of large molecules for NMR analysis have restricted structure availability for many food proteins. Therefore, structure modelling approaches offer a plausible way to obtain the 3-D structure of a food protein that has no available experimental data but is closely related to a well-characterized protein. The de novo or ab initio technique
is an alternative protein structure modelling approach and
is template-independent. This approach mainly depends


on the primary amino acid sequence and in silico folding of the protein to its native state according to the free energy landscape theory using computational algorithms (Bonneau & Baker, 2001). Application of the ab initio method is still limited to small protein molecules because of the complexity of the calculations; therefore, its application to SSP, which have a few hundred amino acid residues, is limited. Considering the limitations of template-independent modelling methods, homology-based modelling offers an alternative approach to understand and investigate protein molecular structure and the relationships of structural properties with function under defined conditions, which could be useful for food proteins. Such understanding will make it possible to design plausible structural modifications that can be extended to the genetic expression level to achieve desirable molecular properties.
Amino acid sequence, structure, and genetic
relationship of SSP
The amino acid sequence of SSP provides a means of
understanding the relationship of similar groups of proteins
in different plants that have evolved from common ancestral gene(s). A search of the protein knowledgebase (UniProtKB, 2010) with the term "seed storage proteins" returns a total of 723 sequences. Among these 723 entries, 477 amino acid sequences are categorized under "nutrient reservoir activity" according to the Gene Ontology (GO) hierarchy. Comparison of the primary structure of SSP reveals
homologies within each plant family and it is one of the
most important considerations in the knowledge-based
structure modelling for protein property prediction. The
phylogenetic tree constructed using interactive tree of life
(iTOL) web server (Ciccarelli et al., 2006; Letunic &
Bork, 2007, http://itol.embl.de/) for the above mentioned
477 amino acid sequences after sequence alignment with
ClustalW 2.0 (Larkin et al., 2007, http://www.ebi.ac.uk/
Tools/msa/clustalw2/) is provided in Fig. 1. A majority of
these amino acid sequences is from the globulin family
(Fig. 1, Ring a & b) that is widely distributed in cereals
(mainly rice and oat), legumes, and oilseeds. The total protein content of these seeds ranges from 10% (cereals) to
~40% (legumes and oilseeds) (Shewry, Napier, &
Tatham, 1995). Currently, the Protein Data Bank (PDB,
http://www.pdb.org/pdb/home/home.do) contains molecular structures of 35 SSP. Most of these SSP with elucidated
structure are from the clade of eudicotyledons (Fig. 1, Ring
d). Availability of such experimentally determined tertiary
and quaternary structures makes it possible to model up
to hundreds of similar yet unknown proteins within each
homology group or each fold-class (Maggio &
Ramnarayan, 2001). The sequence identities (>30%
match) of SSP that align with the primary sequence of
the best available 3-D structure are illustrated in the Ring
c of Fig. 1. Availability of matching templates is indicated
as filled columns in the Ring e of Fig. 1. Sequence identity
value greater than 30% is found for most of the globulins


and albumins of eudicotyledons including food legumes


(e.g. Cicer arietinum, Lupinus albus, Phaseolus vulgaris,
Pisum sativum, Vicia faba), oilseeds (Ara h 3 of Arachis hypogaea, procruciferin of Brassica napus, A1bB2 and
A2B1a subunits of Glycine max) and some cereal globulins
(Avena sativa and Oryza sativa). However, 3-D structures
of prolamins with known primary sequences (i.e. gliadin/
glutelin of Triticum aestivum, Hordeum vulgare and Avena
sativa, zein of Zea mays, Oryza sativa, and Sorghum bicolor) of the Poaceae (Gramineae) family have yet to be
characterized experimentally (Fig. 1, Ring c). In the absence of homologous structures, knowledge-based structure
prediction of proteins such as in the prolamin superfamily,
especially the high molecular weight proteins, is complicated. According to the phylogenetic analysis of SSP shown
here, the 7S and 11/12S proteins are closely related, most likely because they share a common ancestor
(Fig. 1, Ring a). Considering the number of available template structures and the knowledge on structures of globulin
family SSP, it is possible to build good quality homology
models for globulins of many food crops.
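The >30% identity criterion used for Ring c of Fig. 1 can be illustrated with a minimal sketch. It assumes that percent identity is counted over aligned columns in which neither sequence has a gap; other conventions (for example, dividing by the length of the shorter sequence) exist, and the text does not state which one was used. The function names and the toy alignment are hypothetical and are not taken from ClustalW or iTOL.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs are rows of a pairwise alignment ('-' marks a gap).
    The choice of denominator is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def has_usable_template(identity_pct: float, threshold: float = 30.0) -> bool:
    """True when identity exceeds the 30% threshold quoted in the text."""
    return identity_pct > threshold


# Toy alignment (hypothetical sequences, not real SSP data).
target = "ACDE-G"
template = "ACDQFG"
identity = percent_identity(target, template)
print(identity, has_usable_template(identity))
```

In practice the aligned rows would come from an alignment program; the sketch only shows how a threshold of this kind separates sequences with a usable template from those without.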
Knowledge-based structure prediction
Among the higher plants, sequencing of four whole genomes and twenty draft genome assemblies has been completed, and another 76 genome sequencing projects are in progress (The National Center for Biotechnology Information, NCBI, Entrez Genome Project Database, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj). Recently, the entire 1.1-gigabase genome of Glycine max was sequenced using a whole-genome shotgun approach (Schmutz et al., 2010). These projects have generated a large number of amino acid sequences, but only partial information about those genes and their products (i.e. proteins) is available.
The structural information of proteins is required to understand their physiological (in-plant system) and physicochemical (related to FP in food system) properties.
According to statistics from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB; as a member of the wwPDB, http://www.wwpdb.org/, the RCSB PDB curates and annotates PDB data), 86.8% of protein structures have been determined by X-ray crystallography and only 12.4% by solution NMR spectroscopy, with a rate of deposition in the PDB of around 23 structures per day (March, 2011). In general, both structure determination methods are expensive, slow, and difficult to apply to all proteins, especially SSP. The microheterogeneity of most SSP is a barrier to obtaining good quality experimental structures, and only 19 food-related SSP have been characterized to date. Fortunately, structure predictions using the principles of evolution (i.e. homology modelling and threading methods) and global free energy minimization (i.e. ab initio method) have narrowed the gap between protein sequences and 3-D structures. In the
homology modelling process, a high quality structure can


Fig. 1. The phylogenetic tree showing evolutionary relationships among storage proteins (SSP) of edible seeds. Ring (a) Major SSP clades: Brassicaceae in red, Fabaceae in yellow, Poaceae in blue, other species in black. Ring (b) The phylogeny of major plant protein families: 11–13S globulins (green), 7S globulins (blue), 2S albumins (magenta), prolamin zein (red), and prolamin gliadin/glutelin (orange). Ring (c) Percentage identity (black–white) of each SSP to the best available template. Ring (d) Relationship of SSPs with the plant clade and group: eudicotyledon clade (pink) and monocotyledon group (cyan). Ring (e) Bars represent the length of the sequence: filled bars indicate that an experimentally determined structure is available for the sequence, and a few selected structures are shown outside Ring e. Inset relates to Ring b and Ring e. Amino acid sequences were obtained from the UniProt database. The phylogenetic tree was generated using ClustalW 2.0 (Larkin et al., 2007, http://www.ebi.ac.uk/Tools/msa/clustalw2/) and the interactive tree of life (iTOL) web server (Ciccarelli et al., 2006; Letunic & Bork, 2007, http://itol.embl.de/).

be modelled with a template when sequence identity is over 50%, and a sequence identity over 30% may result in a reasonably good structure. The accuracy of such modelled structures is comparable to that of structures obtained from medium-resolution NMR or low-resolution X-ray diffraction (Baker & Šali, 2001). Comparative models with good stereochemistry can be obtained from web-based modelling expert systems such as the SWISS-MODEL workspace (Bordoli et al., 2009) or installable computer software programs such as MODELLER (Šali & Blundell, 1993). Homology or comparative modelling uses the principle of evolutionary conservation of primary structural features between an unknown protein and a known molecular structure. Basic steps of homology model building for a protein with known primary sequence are shown in Fig. 2a. The process includes the initial step of recognizing the known experimental structure (i.e. template), alignment of target and template, building the model in silico (including backbone, side chain, and loop generation), and finally refinement of the modelled structure (including energy minimization and model validation) (Bordoli et al., 2009; Kopp & Schwede, 2004; Martí-Renom et al., 2000). The amino acid sequence of the SSP of interest can be obtained from available protein databases (e.g. UniProtKB/TrEMBL) if the protein is sequenced and the information is deposited. The structures of protein molecules with close homology to the target can be identified using a gapped


BLAST or PSI-BLAST query (Altschul et al., 1997) against a template library such as the PDB. A suitable template is selected based on two criteria: (i) percentage sequence identity between the target and the template sequence and (ii) the experimental quality of the available solved structure (i.e. resolution) (Bordoli et al., 2009). After selecting the best template, the target sequence and the template have to be aligned to construct a 3-D model. Sometimes manual


intervention may be required to minimize any misalignment. In the next phase, if necessary, geometries of side chain packing should be corrected by energy minimization using force-field approaches such as the GROMOS96 (van Gunsteren et al., 1996) or CHARMM22 (Brooks et al., 1983) force fields. The loop regions of the molecule must be carefully optimized or even reconstructed using programs like MODELLER (Šali & Blundell, 1993).

Fig. 2. Process steps of homology modelling and its application for building soybean glycinin protomer models. (a) Major steps involved in homology
or comparative modelling of protein tertiary structure. (b) Structure modelling of the five known subunits of soybean (Glycine max). Subunits are
divided into two groups according to their homology: group I (A1bB2, A1aB1b, and A2B1a) and group II (A3B4 and A5A4B3). The 3-D structures
of two protomers, A1aB1b of group I and A3B4 of group II, have been experimentally determined (PDB codes: 1FXZ and 2D5F, respectively). Tertiary
and quaternary structures of all five protomers were modelled using these two templates. Dark coloured loop areas in the final protomers show the constructed disordered regions. (c) Bar charts show the homology in terms of sequence identity, sequence similarity, and gaps of each subunit with the corresponding sequence (black and grey) and template (other colours).


The stereochemical quality of the modelled structures can be assessed using tools such as PROCHECK (Laskowski, MacArthur, Moss, & Thornton, 1993), Verify3D (Lüthy, Bowie, & Eisenberg, 1992), and ProSA (Wiederstein & Sippl, 2007). As mentioned earlier, the alternative ab initio approach to protein structure modelling is independent of template matching and of conserved structure patterns of the molecule.
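The two template selection criteria and the identity thresholds described above can be sketched as follows. This is only an illustration under stated assumptions: the rule for combining the criteria (highest identity first, better resolution as a tie-breaker) is an assumption of the sketch, the class and function names are hypothetical, and the identity values are invented placeholders. Only the PDB codes and resolutions (1FXZ at 2.80 Å, 2D5F at 1.90 Å) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    pdb_code: str
    identity_pct: float   # target-template sequence identity (%)
    resolution_A: float   # experimental resolution in angstroms; lower is better


def expected_model_quality(identity_pct: float) -> str:
    """Map identity to the qualitative classes quoted in the text."""
    if identity_pct > 50.0:
        return "high"
    if identity_pct > 30.0:
        return "reasonable"
    return "uncertain"  # label for <=30% is an assumption of this sketch


def pick_template(candidates):
    """Prefer the highest identity; break ties with the better resolution.

    This ordering is an assumed way of combining the two criteria.
    """
    if not candidates:
        raise ValueError("no candidate templates")
    return min(candidates, key=lambda t: (-t.identity_pct, t.resolution_A))


candidates = [
    Template("1FXZ", identity_pct=62.0, resolution_A=2.80),  # identity is hypothetical
    Template("2D5F", identity_pct=45.0, resolution_A=1.90),  # identity is hypothetical
]
best = pick_template(candidates)
print(best.pdb_code, expected_model_quality(best.identity_pct))
```

A real workflow would take the identities from a BLAST or PSI-BLAST search against the template library and would still require the alignment, loop building, refinement, and validation steps described above.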
Structure of soybean 12S protein
Information on the protein structure and FP is available
for the soybean glycinin (a 12S globulin) in literature
(Maruyama et al., 2004; Prak et al., 2005; Tezuka,
Yagasaki, & Ono, 2004). In addition, data on physicochemical, thermal and techno-functional properties and
molecular structures are available for the identified five
subunit or protomer variants of glycinin (Adachi,
Kanamori, et al., 2003; Adachi, Okuda, et al., 2003;
Adachi, Yagasaki, Gidamis, Mikami, & Utsumi, 2001;
Fukushima, 1991; Maruyama et al., 1999, 2004; Prak
et al., 2005; Tezuka, Taira, Igarashi, Yagasaki, & Ono,
2000; Tezuka et al., 2004). Glycinin, the major soybean SSP, has a hexameric quaternary structure. Five subunit variants out of the 7 identified protomers may randomly assemble to form the glycinin hexamer (Fukushima, 1991). Two groups of glycinin subunit variants have been identified according to their homology: group I (A1aB1b, A2B1a, and A1bB2) and group II (A3B4 and A5A4B3) (Nielsen et al., 1989). Molecular structures of A1aB1b of group I and A3B4 of group II have been determined using X-ray crystallography at resolutions of 2.80 Å (Adachi et al., 2001, PDB code: 1FXZ) and 1.90 Å (Adachi, Kanamori, et al., 2003, PDB code: 2D5F), respectively.
The three other glycinin protomers can be built using the corresponding template of the respective group (Fig. 2b), utilizing the high degree of homology found among them (Fig. 2c, Supplementary data Fig. S1). The hyper-variable regions (HVRs) of glycinin (Wright, 1987) have resulted in poor atomic density maps (Adachi, Kanamori, et al., 2003; Adachi et al., 2001) due to their molecular heterogeneity; therefore, the available crystal structures do not contain them. As a result, no structural or functional information about these inserted regions can be extracted from the available glycinin template structures. Where physiological (e.g. immunogenicity) or physico-chemical (e.g. hydration properties, electrochemical properties, chemical reactivity) properties are concerned, the loops, including the disordered regions or HVRs, may be equally important. Kealley et al. (2008) have indicated that non-ordered and ordered structure domains of glycinin contribute differently to the mobility and rigidity of the molecular structure.
Structure-based prediction of physico-chemical and
functional properties of glycinin
The available data on structure and on functional and physico-chemical properties of glycinin subunit variants have been obtained either from protein isolates of relevant mutant lines or from microbial expression of cDNA sequences. The glycinin protomers A1bB2, A2B1a, and A5A4B3 can be modelled without any HVRs to be consistent with the available crystal structures of the A1aB1b and A3B4 subunits. These structures are available as Supplementary data Fig. S2 and are called core-structures (hereafter referred to as Modelcore throughout this communication). This naming is similar to that of Maruyama et al. (1999) for the β-conglycinin subunits isolated from the deletion mutants αc and α′c, which are devoid of extension regions or HVRs and were designated as core regions. The loops can be constructed for all five glycinin subunit variants using the MODELLER program, which uses an optimization-based approach (Fiser, Do, & Šali, 2000); hereafter the structures with loops are referred to as Modelcore+HVR. The loop regions or HVRs (Supplementary data Fig. S1) involving over 12 amino acid residues can be built or modelled using the step-by-step procedure of the MODELLER program (Šali & Blundell, 1993). According to Fiser et al. (2000), loops containing 12 residues can be predicted using MODELLER with an average accuracy of 2.61 ± 0.16 Å. The stereochemistry evaluation of loop regions using the PROCHECK and Verify3D programs confirms that these regions have been built without any serious errors (data not shown). The lack of higher order secondary structures in the disordered loop regions (Adachi, Kanamori, et al., 2003) may cause a flexible (free flowing) conformation in these protein regions. Therefore, rather than omitting these regions from the molecule, it is better to include even less accurate loops to understand properties within the protein structure. Details of the homology modelling of 11S protein used in this study are explained in our previous communication on 11S cruciferin (Withana-Gamage et al., 2011).
The molecular structures of protomers of the 5 defined glycinin subunit variants A1aB1b, A2B1a, A1bB2, A3B4, and A5A4B3 and their respective homotrimers generated with (Modelcore+HVR) and without (Modelcore) loop regions are used to explore physico-chemical properties and to understand and predict structure–function relations.
Surface hydrophobicity and related properties
The hydropathy profile of a protein with known structure
determined using the linear amino acid sequence gives little
information with respect to the overall hydrophobicity of
the molecule at its tertiary structure or any higher level.
The surface hydrophobicity (S0) of a protein plays an important role in determining solubility, emulsifying and
foaming properties (Nakai, 1983) for food related systems.
The surface hydrophobicity of a protein can be measured in
two ways: by its ability to bind small fluorescent molecules
such as cis-parinaric acid (CPA) or 8-anilino-1-naphthalenesulfonic acid (ANS), or by its adsorption onto polymer materials such as phenyl- or butyl-Sepharose, generally
determined using hydrophobic column chromatography


(the higher the surface hydrophobicity, the stronger the adsorption to the column).
Glycinin protomers can be arranged in descending order of average hydrophobicity (H), calculated based on the primary amino acid sequence, as A1bB2 > A2B1a > A1aB1b > A3B4 > A5A4B3 (Table 1 & Supplementary data Table S1). When the hydrophobicity values of amino acid residues, assigned according to the scale proposed by Kyte and Doolittle (1982), are plotted on the solvent-accessible surface of the homotrimers of the Modelcore (Supplementary data Fig. S2), relatively more hydrophobic residues can be observed on the IE face (the face containing interchain S–S bonds; Adachi et al., 2001) than on the IA face (the face containing intrachain S–S bonds; Adachi et al., 2001) for all glycinin protomers (Table 1). Among the Modelcore subunit variants, the number of hydrophobic residues on the IA (36–47) and IE

(52–59) faces is not very different (Table 1). However,


the measured surface hydrophobicity values of glycinin
subunits of both trimeric (proglycinin) and hexameric (mature glycinin) form of soybean 11S are different
(Maruyama et al., 1999; Prak et al., 2005; Tezuka et al.,
2004).
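As a minimal sketch of the sequence-based calculation, the average hydrophobicity can be computed as the mean Kyte and Doolittle (1982) value per residue. This assumes that H follows the usual grand average of hydropathy (GRAVY) convention, which the text does not spell out. The mol% of hydrophobic residues follows the residue set named in the Table 1 footnote (Val, Pro, Leu, Ile, Phe, Trp). Function names and the example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue set used for "hydrophobic residues (%)" in the Table 1 footnote.
HYDROPHOBIC_SET = frozenset("VPLIFW")


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue (assumed GRAVY convention)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Raises KeyError on non-standard residue codes.
    return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)


def hydrophobic_mol_percent(sequence: str) -> float:
    """Mol% of Val, Pro, Leu, Ile, Phe, and Trp residues in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(res in HYDROPHOBIC_SET for res in seq) / len(seq)


# Hypothetical peptide, not a glycinin sequence.
peptide = "VPLIFWAAAA"
print(round(gravy(peptide), 2), hydrophobic_mol_percent(peptide))
```

Because this value depends only on composition, it cannot distinguish buried from exposed residues, which is why the surface-mapped counts from the 3-D models are considered separately in the text.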
Multiple sequence alignments of group I and II glycinin proteins and the elucidated structures of A1aB1b (PDB code: 1FXZ) and A3B4 (PDB code: 2D5F) verify six HVRs in group I and five HVRs in group II (Supplementary data Fig. S1). In glycinin group I, HVR-II (A1aB1b and A2B1a: 18; A1bB2: 15 residues) and HVR-V (A1aB1b: 48, A2B1a: 41, and A1bB2: 35 residues) are located on the IE face, while HVR-III (A1aB1b and A2B1a: 20; A1bB2: 22 residues) is located on the IA face (Supplementary data Fig. S3). Among the group II glycinin, HVR-II (A3B4: 12 and A5A4B3: 13 residues), HVR-IV

Table 1. Theoretical physico-chemical parameters of modelled glycinin subunits and experimental values reported for functional properties of glycinin subunits.

Property                                                A1aB1b                 A1bB2                  A2B1a                  A3B4                   A5A4B3
Formula                                                 C2333H3660N686O741S14  C2377H3719N691O736S17  C2363H3714N694O748S18  C2333H3660N686O741S14  C2765H4325N817O902S11
Mr^a (kDa)                                              53.6                   54.3                   54.4                   58.2                   63.8
Asx + Glx (%)                                           30.1                   28.3                   30.0                   28.2                   30.3
His + Arg + Lys (%)                                     12.4                   11.3                   10.7                   12.8                   13.9
Acidic:Basic                                            2.4:1                  2.5:1                  2.8:1                  2.2:1                  2.2:1
H^a                                                     -0.81                  -0.59                  -0.66                  -0.82                  -0.95
pI^a                                                    5.78                   6.01                   5.46                   5.52                   5.17
ASA^b (Å²)                                              24,515.0               23,547.4               23,018.2               26,034.1               26,247.2
Pocket area of central channel cavity^c (Å³)            2603.3                 5639.8                 4151.5                 3478.0                 3999.9
Individual pocket opening^c (Å³)                        477.6 (9)              883.0 (9)              990.6 (10)             644.3 (3)              708.9 (3)
Proline residues^d (%)                                  6.1                    5.6                    5.4                    7.2                    6.7
Hydrophobic residues^e (%)                              28.3                   30.3                   28.8                   29.9                   28.5
Surface hydrophobic residues^f, Modelcore: IA face      36                     43                     37                     41                     39
Surface hydrophobic residues^f, Modelcore: IE face      54                     52                     56                     59                     58
Surface hydrophobic residues^f, Modelcore+HVR: IA face  28                     24                     25                     38                     35
Surface hydrophobic residues^f, Modelcore+HVR: IE face  34                     31                     38                     50                     38
Number of –SH groups: IA face                           6                      3                      6                      3                      3
Number of –SH groups: IE face                           6                      6                      6                      3                      3
Number of S–S groups: IA face                           3                      3                      3                      3                      3
Number of S–S groups: IE face                           3                      3                      3                      3                      3
Hydrophobic chromatography^g,h (min)                                           57.7 ± 0.3                                    71.7 ± 0.3             61.2 ± 0.2
Hydrophobic chromatography^i (min)                      67.0                   74.4                                          71.1                   65.5
ANS binding S0^g,h (at 30 °C)                                                  6                                             9.5                    12
Solubility^g,h (pH 4.8, μ = 0.05) (%)                                          98                                            95                     30
Emulsifying properties^g,h (μm)                                                6.8                                           3.1                    2.3
Emulsifying properties^i (μm)                           10.5                   47.3                   19.4                   11.7                   18.8
Denaturation temperature^i (°C)                         78.1                   65.1                   73.3                   78.0                   73.9

a H is the grand average hydrophobicity. pI and Mr are calculated using 1 sequence of the molecule.
b Solvent-accessible surface area (ASA) was calculated for homotrimers by the rolling ball method with a radius of 1.4 Å.
c Pocket area: size of the cavities around the central channel of core-structures; in addition, the size of the mouth opening of an individual pocket is given, with the number of openings in parentheses.
d No. of proline residues per single subunit.
e Mol% of the sum of Val, Pro, Leu, Ile, Phe, and Trp residues.
f Surface-exposed hydrophobic residues were counted manually after visualizing the molecule with VMD software.
g A1aB1b, A1bB2 and A2B1a were reported together as Group I subunits.
h Maruyama et al., 2004 (glycinin hexamer form).
i Prak et al., 2005 (proglycinin trimer form).


(the longest among glycinin protomers, A3B4: 73 and A5A4B3: 104 residues) and HVR-III (A3B4 and A5A4B3: 22 residues) are found on both IE and IA faces. Given the size of these disordered regions, they can occupy a significant portion of the molecular surface area of both the IE and IA faces of the protomers. The surface properties of glycinin protomers with HVRs (Modelcore HVR; Fig. 3) and without HVRs (Modelcore; Supplementary data Fig. S2) show drastically different hydrophobic residue profiles, although most of the HVRs are deficient in hydrophobic residues.

Maruyama et al. (2004) and Tezuka et al. (2004) reported contradictory surface hydrophobicity estimates for soybean glycinin composed of similar types of subunits when the estimation method was changed from ANS binding to hydrophobic column chromatography. According to ANS binding-based surface hydrophobicity, the glycinin homohexamers are in the decreasing order A5A4B3 > A3B4 > group I (A1aB1b, A1bB2, A2B1a) (Tezuka et al., 2004), whereas the order changes to A3B4 > A5A4B3 > group I when assessed by hydrophobic column chromatography (Maruyama et al., 2004) (Supplementary data Table S1). Careful examination of the glycinin Modelcore HVR structures shows that group II has a higher number of well-exposed hydrophobic residues on the IA face than group I (Fig. 3 and Table 1). For all five subunit variants, the third hypervariable region (HVR-III) is located on the IA face, whereas HVR-IV and HVR-V are located on the IE face of the trimeric protein molecules (Supplementary data Fig. S1 and Fig. S3). According to the primary sequences of the glycinin subunits, HVR-III does not contain hydrophobic residues, but HVR-V of group I and HVR-IV of group II contain a considerable number of lipophilic residues (8–9 and 14–16 residues, respectively) and reside on the IE face (Supplementary data Fig. S1). However, the relatively short HVR-V of group I protomers may not be exposed when the hexamer is formed. The HVR-IV of the group II protomers may protrude from the homohexamer surface because of its long chain length (Fig. 3). HVR-IV of group II contains more hydrophobic residues than any other HVR, so this protruding HVR-IV can enhance the surface hydrophobicity of group II glycinin.

The glycinin molecule composed of A5A4B3 subunits has higher surface hydrophobicity than its A3B4 counterpart when assessed by ANS binding (Tezuka et al., 2004). The opposite trend (i.e. A3B4 > A5A4B3) has been reported for hydrophobicity values assessed using phenyl- and butyl-Sepharose column chromatography (Maruyama et al., 2004). The discrepancies between these studies may be related to differences in the accessibility of the molecular surface. The surface area of a protein molecule that is exposed to the surrounding solvent is referred to as the solvent accessible surface area (ASA) and can be estimated by the rolling ball method with a probe radius of 1.4 Å (Lee & Richards, 1971). The ASA values of the glycinin Modelcore HVR structures are reported in Table 1. The hydrophobic residues and electrostatic potential of the glycinin trimers can be mapped onto the ASA (Fig. 3). The extension of HVR-III differs between the A3B4 and A5A4B3 protomers. This is evident in the side view of the molecules (90° rotation of the IE or IA face; Fig. 3 iv and v). Furthermore, the centre channel of the A3B4 homotrimer is covered by HVR-III (Fig. 3 iv), but this disordered region does not extend as far as HVR-III of A5A4B3 (Fig. 3 v and Supplementary data Fig. 2 v). The centre channel of the A5A4B3 trimer is not covered by HVR-III, suggesting that ANS can access it more easily than in the A3B4 molecule. Surface adsorption of protein molecules on the hydrophobic (Sepharose) column may be easier for A3B4 because of its shorter HVR-III; in contrast, the much more relaxed and highly hydrophilic arms of HVR-III may sterically hinder binding of the A5A4B3 hexamer to the hydrophobic column via the IA face. The high surface hydrophobicity reported for A1bB2 proglycinin (i.e. in trimer configuration, with the shortest HVRs among all five protomers), determined by hydrophobic column chromatography (Prak et al., 2005), may be due to differences in HVR length.
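The idea behind a solvent accessible surface computed with a 1.4 Å probe can be illustrated numerically. The sketch below does not implement the Lee and Richards (1971) procedure itself; it uses the Shrake–Rupley point-sampling approximation, a different and simpler scheme that targets the same quantity. Each atom is inflated by the probe radius, test points are spread over the inflated sphere, and points falling inside any neighbouring inflated sphere are counted as buried. Atom coordinates and radii are supplied by the caller; the function names and the number of test points are illustrative assumptions.

```python
import math


def sphere_points(n):
    """Near-uniform points on the unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # evenly spaced in z, never exactly +/-1
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def accessible_surface(atoms, probe=1.4, n_points=960):
    """Per-atom accessible surface area (square angstroms).

    atoms: list of (x, y, z, radius) tuples, all in angstroms.
    probe: probe radius; 1.4 A approximates a water molecule.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe                      # radius of the probe-centre sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cutoff = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cutoff * cutoff:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

For a full homotrimer the double loop over atoms becomes slow, so a practical implementation would add a neighbour list; the brute-force form is kept here only because it makes the geometric definition easy to follow.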
Solubility
The solubility of a protein depends on the physico-chemical nature of its molecular surface. Moreover, protein solubility under a given set of conditions is the thermodynamic manifestation of the equilibrium between protein–protein and protein–solvent interactions and relates to the net free energy change due to the interaction of hydrophobic and hydrophilic residues on the protein surface with the surrounding solvent. Therefore, the distribution of electrostatic surface potential (which may relate to the salt-binding sites) of a molecule and its surface hydrophobicity are critical factors influencing the solubility of a protein (Damodaran, 2008). For the Modelcore HVR glycinin homotrimers, we calculated the electrostatic surface potential by solving the Poisson–Boltzmann equation using the Adaptive Poisson–Boltzmann Solver (APBS) (Baker, Sept, Joseph, Holst, & McCammon, 2001) plug-in (developed by Michael G. Lerner, University of Michigan) of PyMol (Warren L. DeLano, DeLano Scientific, San Carlos, CA, http://www.pymol.org). Electrostatic surface potentials of the Modelcore of soybean protomers generally show a slightly more positive (basic) charge on the IE face than on the IA face. The electrostatic surface potential of the IA face of group II homotrimers (i.e. A3B4 and A5A4B3; Supplementary data Fig. S1) shows a prominent negative charge and aligns well with the lowest value for acidic:basic residues (2.2:1) among the glycinin protomers (Table 1). Similar to surface hydrophobicity, the surface electrostatic potential of glycinin Modelcore HVR shows remarkable differences when mapped onto the surface representation of the homology models (Fig. 3). Generally, the HVRs are rich in acidic residues (Asx and Glx; Supplementary data Fig. S1) and may reduce the intensity of positive charge on the IE face. The long HVRs with a high number of polar residues and the dominant negative charge on both faces of homotrimers or homohexamers may lead to high solubility of group II proteins. Repulsion between protein molecules due to negative charges may contribute to this property.

According to Prak et al. (2005), low ionic strength (μ = 0.08) resulted in precipitation of proglycinin protomers, with incomplete solubility for A2B1a and A1bB2 when the pH changed from 5.7 to 6.7. In the same study, very low solubility was observed at high ionic strength (μ = 0.5) for all proglycinin protomers except A1aB1b (60% soluble) at pH values lower than 5.8. At the pI and low ionic strength, neutralization of charged residues on the protein surface may occur, and hydrophobic protein–protein interactions may reduce solubility. The high solubility of A1aB1b (Prak et al., 2005) may be related to its lower number of hydrophobic residues on the surface. In the hexamer structure, solubility reflects a combination of the surface hydrophobicity and charge distribution of the IA face, because the IE faces of the trimers interact to form the hexamer (Adachi, Okuda, et al., 2003) and may not be exposed to the solvent phase. According to Maruyama et al. (2004), the solubility values of glycinin at μ = 0.5 are in the order group I > A3B4 > A5A4B3, which agrees with the decreasing order of surface hydrophobic differences explained here.

Fig. 3. Surface characterization of the soybean glycinin models developed with HVRs (Modelcore HVR). (a) Surface hydrophobicity (i: A1aB1b, ii: A1bB2, iii: A2B1a, iv: A3B4, and v: A5A4B3). The distribution of hydrophilic and hydrophobic residues, assigned according to the Kyte and Doolittle (1982) scale, is represented in green (hydrophilic) and red (hydrophobic) on the solvent accessible surface of the models. (b) Electrostatic potential of the molecular surfaces of the glycinin models is indicated in colour; the values range between ±5 kT/e, with blue and red marking the two extremes.
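The acidic:basic residue ratio used above as a sequence-level indicator of surface charge can be computed from the amino acid sequence alone. The sketch below is a minimal illustration under stated assumptions: the basic group is His + Arg + Lys, and the `count_amides` switch (a hypothetical parameter) decides whether Asn and Gln are counted together with Asp and Glu, as the Asx/Glx notation would imply when the amide and acid forms are not distinguished. Whether the 2.2:1 value in Table 1 was obtained with or without the amides is not stated in the text.

```python
ACIDIC = frozenset("DE")   # Asp, Glu
AMIDES = frozenset("NQ")   # Asn, Gln (included in Asx/Glx when not resolved)
BASIC = frozenset("HRK")   # His, Arg, Lys


def acidic_basic_ratio(sequence, count_amides=False):
    """Ratio of acidic to basic residues in a one-letter amino acid sequence."""
    seq = sequence.upper()
    acidic_set = ACIDIC | AMIDES if count_amides else ACIDIC
    n_acidic = sum(aa in acidic_set for aa in seq)
    n_basic = sum(aa in BASIC for aa in seq)
    if n_basic == 0:
        raise ValueError("sequence contains no basic residues")
    return n_acidic / n_basic
```

A ratio computed this way ignores where the residues sit in the folded molecule, so it complements rather than replaces the mapped electrostatic potential.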
Emulsion and foam formation
The emulsifying and foaming properties of proteins are related to molecular surface properties including surface hydrophobicity, ligand binding ability, molecular flexibility, and structural stability (Damodaran, 2008; Kumosinski & Farrell, 1994; Prak et al., 2005; Voutsinas, Cheung, & Nakai, 1983). The solubility of a protein strongly relates to protein–salt–water interactions and depends on the hydrophobicity and the availability of salt-binding sites of the molecules (Kumosinski & Farrell, 1994), which in turn influence emulsifying and foaming properties as well as heat-induced gel formation. The ability to denature at the interface (surface denaturation) and molecular flexibility are critical factors that contribute to the interfacial activity of a protein (Damodaran, 2008). In addition, the degree of exposure of hydrophobic residues influences the thermodynamic stability of the protein; proteins with a high number of exposed hydrophobic patches are more susceptible to thermal and interfacial denaturation than those with more hydrophobic residues buried inside (Damodaran, 1994).

The emulsifying ability of soybean glycinin subunits has been reported in various studies (Supplementary data Table S1). The order of emulsifying ability of glycinin subunits reported by Maruyama et al. (2004) is closer to the order of hydrophobicity values obtained using the ANS probe by Tezuka et al. (2004) than to that obtained using hydrophobic chromatography. The high number of hydrophobic residues on both IA and IE faces and the exposed hydrophobic residues in the HVRs of A5A4B3 and A3B4 (Fig. 3) may have contributed to their higher emulsifying ability compared with group I protomers (A1aB1b, A1bB2, and A2B1a). The calculated main cavity area around the central channel of the glycinin Modelcore HVR structures is in the decreasing order A1bB2 (5639.8 Å³) > A2B1a (4151.5 Å³) > A5A4B3 (3999.9 Å³) > A3B4 (3478.9 Å³) > A1aB1b (2603.3 Å³), which is similar to the descending order of emulsifying ability values reported by Prak et al. (2005) (Table 1). The size of the pocket opening also shows a similar pattern except for A1bB2 and A2B1a (Table 1), indicating a potential relationship of the central cavity area with the emulsifying properties of these proteins.

Globulin Modelcore structures are generally compact and less flexible. The work of Maruyama et al. (1999) on β-conglycinin devoid of extension regions (αc and α′c) confirms the reduced flexibility of the compact molecule, indicating that the HVRs can remarkably influence emulsifying ability. The order of proglycinin subunits according to the residue length of the longest HVR on the IE face (i.e. HVR-V in group I and HVR-IV in group II) is A5A4B3 (104) > A3B4 (74) > A1aB1b (48) > A2B1a (41) > A1bB2 (35) (Supplementary data Fig. S1), which closely follows the order of emulsifying ability of proglycinin, except for the higher values reported for A1aB1b than for A5A4B3 (Table 1; Prak et al., 2005). The HVRs of groups I and II contain 7–8 and 12–14 hydrophobic residues, respectively. It can be postulated that the lengthy HVRs on the IE face and the considerably high number of lipophilic residues of proglycinin, or of the dissociated product of hexameric glycinin (i.e. trimers), together may contribute to interfacial activities more favourable for emulsion formation than those of the glycinin hexamer. According to Martin, Bos, and van Vliet (2002), glycinin in the 3S/7S form (at pH 3, due to dissociation of the 11S form) adsorbs much faster at the air/water interface than the 11S form (at pH 6.7), showing that the less compact structure and exposure of favourable residues of the IE face may affect the interfacial activities of glycinin.
Heat-induced gel formation
Heat-induced gel formation, or thermal gelation, is another important property that proteins contribute to food products such as processed meat, heat-set gels, and cakes. The contributions of glycinin to the understanding of heat-induced gelation of SSP are well documented (Salleh et al., 2002; Tezuka et al., 2000, 2004; Utsumi, Matsumura, & Mori, 1997). According to Yamauchi, Yamagishi, and Iwabuchi (1991), heat-induced gel formation of soy glycinin proceeds through thermal unfolding first, then association–dissociation of subunits, followed by aggregation involving, to a certain extent, sulfhydryl–disulphide exchange. Partial denaturation of the native protein structure on heating depends on the stability of the molecular structure against increased thermal energy and is a prerequisite for subsequent aggregation to form the protein gel network (Utsumi et al., 1997). In fact, the thermal stability of a protein is related to structural features such as the cavity size of the molecule, the number of proline residues (Fukuda, Maruyama, Salleh, Mikami, & Utsumi, 2008; Tandang-Silvas et al., 2010), the occurrence of hydrophobic amino acids (Kumar & Nussinov, 2001), the length of loop regions (Chakravarty & Varadarajan, 2002), and the elimination of surface loops (Kumar & Nussinov, 2001). The calculated main cavity area around the central channel of glycinin Modelcore HVR is in the order A1bB2 (5639.8 Å³) > A2B1a (4151.5 Å³) > A5A4B3 (3999.9 Å³) > A3B4 (3478.9 Å³) > A1aB1b (2603.3 Å³) (Table 1). Studies on 7S (Fukuda et al., 2008) and 11S (Tandang-Silvas et al., 2011) globulins have demonstrated that proteins with a large cavity size have low thermal stability; therefore, the thermal stability of glycinin homotrimers could be predicted in the descending order A1aB1b > A3B4 > A5A4B3 > A2B1a > A1bB2. This is exactly the order of thermal stability reported for soybean proglycinin by Prak et al. (2005) at the subunit level, suggesting that the cavity size of a protein molecule is a good parameter for predicting a protein's thermal stability.
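The agreement between the cavity-size ordering and the thermal-stability ordering can be expressed as a rank correlation. The sketch below is a minimal illustration that computes Spearman's coefficient from the cavity values quoted above and the stability order stated in the text; because only the order of thermal stability is given here, stability is encoded as a rank score (5 = most stable), which is an assumption of this example rather than measured data. The helper names are hypothetical.

```python
def ranks(values):
    """1-based ranks, smallest value first (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Main cavity values around the central channel, as quoted in the text.
CAVITY = {
    "A1bB2": 5639.8, "A2B1a": 4151.5, "A5A4B3": 3999.9,
    "A3B4": 3478.9, "A1aB1b": 2603.3,
}

# Thermal stability order from the text, most stable first.
STABILITY_ORDER = ["A1aB1b", "A3B4", "A5A4B3", "A2B1a", "A1bB2"]
STABILITY_SCORE = {
    name: len(STABILITY_ORDER) - position
    for position, name in enumerate(STABILITY_ORDER)
}

names = list(CAVITY)
rho = spearman_rho(
    [CAVITY[name] for name in names],
    [STABILITY_SCORE[name] for name in names],
)
```

With five protomers whose two orderings are exact inverses, `rho` evaluates to -1.0; with so few points this summarizes the ordering and should not be read as statistical evidence.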
The lower proline content of A1bB2 (5.6%) and A2B1a (5.4%) compared with A1aB1b, A5A4B3, and A3B4 (6.1, 6.7 and 7.2%, respectively) may contribute further to the thermal destabilization of A1bB2 and A2B1a homotrimers (Table 1). Proteins with long loops are more susceptible to heat-induced denaturation than those with shorter loops (Chakravarty & Varadarajan, 2002; Kumar & Nussinov, 2001). Although A3B4, A5A4B3 and A1aB1b have longer HVRs than the other subunit variants, features such as a high number of proline residues and a small cavity size may have negated the effect of the loop length difference on thermal stability. The type and stability of a thermally induced gel can be predicted by evaluating surface hydrophobicity, charge distribution, sulfhydryl/disulphide (–SH/S–S) content and the size of the cavities (Damodaran, 2008; Shimada & Matsushita, 1980). Soybean glycinin contains two S–S bonds; one is intrachain (within the acidic chain; A1aB1b: Cys12–Cys45, A1bB2: Cys31–Cys64, A2B1a: Cys28–Cys61, A3B4: Cys32–Cys65, and A5A4B3: Cys33–Cys66) and the other is interchain (between the acidic and basic chains; A1aB1b: Cys88–Cys298, A1bB2: Cys107–Cys304, A2B1a: Cys104–Cys307, A3B4: Cys108–Cys385, and A5A4B3: Cys109–Cys351) (Supplementary data Fig. S3). Using the disulphide bond-deficient mutants C12G and C88S of proglycinin A1aB1b, Adachi and co-workers showed that the contribution of inter- and intrachain disulphide bonds to thermal stability is low, particularly for the proglycinin A1aB1b protomer (Adachi, Okuda, et al., 2003).

The content of –SH and S–S bonds affects the hardness of a heat-induced protein gel because of the disulphide bond exchange that may occur during heating (Shimada & Matsushita, 1980; Tezuka et al., 2004). The HVR-V of group I and the HVR-IV of group II contain six (two per protomer) and three (one per protomer) –SH residues on the IE face, respectively (Supplementary data Fig. S2 and Fig. S3). In the glycinin hexamer, these –SH residues may be hidden inside the molecule and may not participate in forming S–S bonds in the initial stage of heating. All glycinin trimers have three more –SH residues embedded in the IA face with the potential to form disulphide bonds when conditions are favourable (such as during heat-induced aggregation). Three additional –SH residues are found in the group I subunit variants except the A1bB2 protomer (Cys53 of A1aB1b and Cys69 of A2B1a) on sheet B of the acidic chain (Supplementary data Fig. S1a); these protrude towards the periphery of the molecule (Supplementary data Fig. S3), suggesting their availability to form extra cross-links. Such molecular characteristics of group I variants may contribute to higher gel strength than that of group II. This rationalization is also in accordance with the higher breaking stress reported by Tezuka et al. (2004) for curd generated from group I variants than for curd from group IIa (A5A4B3) and group IIb (A3B4). The presence of a high number of free –SH groups on the IE face of group I variants and external conditions (e.g. pH, ionic strength, and temperature changes) that facilitate the opening of the hexameric structure may further enhance S–S bond formation during aggregation and strengthen the gel structure.
Opportunity of homology modelling in food protein functionality studies
Protein tertiary structure is a source of useful information for predicting functions and is widely used in structure-based methods for functional annotation (Kinoshita & Nakamura, 2003; Thornton, Todd, Milburn, Borkakoti, & Orengo, 2000). Global folds of proteins as well as local structural motifs are important in structure-based prediction of biochemical functions. O'Brien (1991) suggests physico-chemical properties, geometrical indices, topological indices and electrostatic properties of the protein molecular structure as four closely related categories of molecular descriptors that can be employed to describe protein functional properties and biological activities. Similarly, properties of protein structure and sequence, i.e. numerical parameters derived from surface geometry, sequence conservation, electrostatics, and solvent accessibility, are used in predicting proteins' likely active sites for biological functions (Laskowski et al., 2005; ProFunc, http://www.ebi.ac.uk/thornton-srv/databases/profunc/index.html). For example, ligand docking, which relies on binding-site identification for predicting enzyme activity, uses binding pockets on the protein surface that are derived by considering residue conservation, compactness, convexity, protrusion, rigidity, hydrophobicity and charge density (Ma et al., 2001; Rossi, Marti-Renom, & Sali, 2006). Where structure–property relationships are concerned, the physical properties of the protein as well as the structural determinants of protein intermolecular forces are important. The most important FP of food proteins, such as the formation of emulsions, foams, gels and adhesive/cohesive films, are related to protein surface characteristics that include charge density, hydrophobicity, and steric and electrical forces. Knowledge of the quantitative characterization of interactions of protein molecules in chemical space (and biological space) in terms of food functional properties is limited. Therefore, choosing appropriate molecular descriptors is one of the challenges in deriving structure–property relationships of food proteins.


Fig. 4. The proposed pathway for applying homology modelling and models in screening food proteins for desired functional properties. With the enormous amount of amino acid sequence information available in databases and the limited number of 3-D structures, development of knowledge-based molecular structure models of several food proteins is quite possible. Initial screening of proteins at molecular or subunit levels in silico for selected physico-chemical properties that are predictors of technologically valuable functional properties can be achieved at a fraction of the cost and time compared to wet-lab techniques.

In this communication we selected Asx + Glx, His + Arg + Lys, S0, pI, solvent accessible surface area, pocket area of the central cavity, size of the individual pocket opening, number of Pro residues, total number of hydrophobic residues, and the numbers of hydrophobic residues, –SH residues and S–S groups on the IE and IA faces separately, as descriptors or indices of the molecular structure of glycinin variants generated through homology modelling. The relationships and trends observed between the quantitative values of these molecular parameters and functional properties experimentally obtained for the same glycinin variants (from the literature) show moderate predictive power (Supplementary data Table S2), indicating the need to select the most suitable molecular parameters and data analysis methods for enhanced predictive power. Identification of the best descriptors and development of predictive models using an appropriate subset of descriptors may be achieved through regression analysis or pattern recognition techniques. In addition, prediction of digestibility and release of bioactive sequences (by nick-site prediction for gastro-intestinal enzymes), flavour molecule binding (docking or ligand binding), allergenicity (epitope sequences, protein–protein interaction), and design of modification sites for site-directed mutagenesis are possible with the modelled molecular structure.
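Before descriptors are passed to a regression or pattern recognition step, it is common to check how strongly they correlate with one another, since highly collinear descriptors add little independent information. The sketch below is a minimal illustration of such a check using Pearson correlation on three descriptors whose values are quoted in this article (main cavity value, proline content, and the length of the longest IE-face HVR). The choice of these three descriptors, the table layout, and the function names are assumptions made for the example; it is not the analysis behind Supplementary data Table S2.

```python
import math

# Descriptor values quoted in the text for the five glycinin protomers:
# (main cavity value in cubic angstroms, proline %, longest IE-face HVR length).
DESCRIPTORS = {
    "A1aB1b": (2603.3, 6.1, 48),
    "A1bB2": (5639.8, 5.6, 35),
    "A2B1a": (4151.5, 5.4, 41),
    "A3B4": (3478.9, 7.2, 74),
    "A5A4B3": (3999.9, 6.7, 104),
}
DESCRIPTOR_NAMES = ("cavity", "proline_pct", "hvr_length")


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_matrix(table):
    """Square matrix of pairwise Pearson correlations between descriptor columns."""
    columns = list(zip(*table.values()))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


matrix = correlation_matrix(DESCRIPTORS)
```

Rows and columns of `matrix` follow the order in `DESCRIPTOR_NAMES`. With only five protomers the coefficients are indicative at best, which is consistent with the call above for validation on larger, well-defined data sets.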
A focused and integrated approach is needed to link food crop genomics information to expressed proteins and then to their structure and the properties required in food. Therefore, building interfaces between in vitro screening and biological/chemical modification will help to identify and select proteins for efficient use. The proposed mechanism for homology structure-based pre-screening of food proteins for desired functional properties, and the intervention stages for functionality modification, is depicted in Fig. 4. The expected outcome is the efficient use of proteins synthesized and deposited in plants, through a selection or enhancement process at the molecular level, to perform the desired biological activity or functionality in a complex system, which is our food.
Conclusion
In this paper, we show the possibility of using homology modelling to predict the structure and physico-chemical properties of glycinin at the molecular level on an in silico platform as an approach to understand and investigate properties that are important in processing functionality. Although functional properties of food proteins are expressed at the macroscopic length scale, the structure-related properties of the constituent molecules largely contribute to them. Homology modelling allows the 3D structure of a protein to be predicted from its genetic relationship to related proteins that are well studied. This communication shows one of the ways that food protein scientists can use bioinformatics (with emphasis on homology modelling) to screen or investigate the suitability of a protein for specific functionalities needed in food. This approach resembles drug design in pharmaceutical and medicinal chemistry. The proposed approach requires proper validation with well-defined food proteins and appropriate in vitro data for FP. Homology modelling allows the molecular structure of a protein of interest to be derived, and its structural properties can be investigated to obtain the physico-chemical properties of the molecule that are important in processing functionality. Therefore, homology modelling can be complementary to the existing approaches to food protein structure and function prediction.
Acknowledgements
This work is supported by the Agriculture and Agri-Food Canada (AAFC) funded project RBPI 1827. SaskCanola is acknowledged for the Dr. Roger Rimmer Graduate Scholarship provided to T. S. Withana-Gamage. We thank Dr. B. Dave Oomah of AAFC Summerland, BC for his valuable suggestions and critical reading of the manuscript.

Supplementary data
Supplementary data related to this article can be found
online at http://dx.doi.org/10.1016/j.tifs.2012.06.014.
References
Adachi, M., Kanamori, J., Masuda, T., Yagasaki, K., Kitamura, K.,
Mikami, B., et al. (2003). Crystal structure of soybean 11S
globulin: glycinin A3B4 homohexamer. Proceedings of the
National Academy of Sciences of the United States of America,
100, 7395–7400.
Adachi, M., Okuda, E., Kaneda, Y., Hashimoto, A., Shutov, A. D.,
Becker, C., et al. (2003). Crystal structures and structural stabilities
of the disulfide bond-deficient soybean proglycinin mutants C12G
and C88S. Journal of Agricultural and Food Chemistry, 51,
4633–4639.
Adachi, M., Yagasaki, K., Gidamis, A. B., Mikami, B., & Utsumi, S.
(2001). Crystal structure of soybean proglycinin A1aB1b
homotrimer. Journal of Molecular Biology, 305, 291–305.
Aiking, H. (2011). Future protein supply. Trends in Food Science &
Technology, 22, 112–120.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z.,
Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs. Nucleic Acids
Research, 25, 3389–3402.
Baker, D., & Šali, A. (2001). Protein structure prediction and structural
genomics. Science, 294, 93–96.
Baker, N. A., Sept, D., Joseph, S., Holst, M. J., & McCammon, J. A.
(2001). Electrostatics of nanosystems: application to microtubules
and the ribosome. Proceedings of the National Academy of
Sciences of the United States of America, 98, 10037–10041.
Barre, A., Borges, J.-P., & Rouge, P. (2005). Molecular modelling of the
major peanut allergen, Ara h 1 and other homotrimeric allergens
of the cupin superfamily: a structural basis for their IgE-binding
cross-reactivity. Biochimie, 78, 499–506.
Barre, A., Jacquet, G., Sordet, C., Culerrier, R., & Rouge, P. (2007).
Homology modelling and conformational analysis of IgE-binding
epitopes of Ara h 3 and other legumin allergens with a cupin fold
from tree nuts. Molecular Immunology, 44, 3243–3255.
Bonneau, R., & Baker, D. (2001). Ab initio protein structure
prediction: progress and prospects. Annual Review of Biophysics
and Biomolecular Structure, 30, 173–189.
Bordoli, L., Kiefer, F., Arnold, K., Benkert, P., Battey, J., & Schwede, T.
(2009). Protein structure homology modeling using SWISS-MODEL
workspace. Nature Protocols, 4, 1–13.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J.,
Swaminathan, S., & Karplus, M. (1983). CHARMM: a program for
macromolecular energy, minimization, and dynamics
calculations. Journal of Computational Chemistry, 4, 187–217.
Cabanos, C., Tandang-Silvas, M. R., Odijk, V., Brostedt, P., Tanaka, A.,
Utsumi, S., et al. (2010). Expression, purification, cross-reactivity
and homology modeling of peanut profilin. Protein Expression and
Purification, 73, 36–45.
Cavasotto, C. N., & Phatak, S. S. (2009). Homology modeling in drug
discovery: current trends and applications. Drug Discovery Today,
14, 676–682.


Chakravarty, S., & Varadarajan, R. (2002). Elucidation of factors
responsible for enhanced thermal stability of proteins: a structural
genomics based study. Biochemistry, 25, 8152–8161.
Ciccarelli, F. D., Doerks, T., von Mering, C., Creevey, C. J., Snel, B., &
Bork, P. (2006). Toward automatic reconstruction of a highly
resolved tree of life. Science, 311, 1283–1287.
Damodaran, S. (1994). Structure–function relationship of food
proteins. In N. S. Hettiarachchy, & G. R. Ziegler (Eds.), Protein
functionality in food systems (pp. 1–37). New York: Marcel
Dekker.
Damodaran, S. (2008). Amino acids, peptides and proteins. In
S. Damodaran, K. L. Parkin, & O. R. Fennema (Eds.), Fennema's
food chemistry (pp. 217–329). Boca Raton: CRC Press.
Derbyshire, E., Wright, D. J., & Boulter, D. (1976). Legumin and
vicilin, storage proteins of legume seeds. Phytochemistry, 15,
3–24.
FAOSTAT (December, 2010). Food and Agricultural Organization,
Statistics Division, FAOSTAT food balance sheets.
http://faostat.fao.org/site/368/
Fiser, A., Do, R. K. G., & Sali, A. (2000). Modeling of loops in protein
structures. Protein Science, 9, 1753–1773.
Forster, M. J. (2002). Molecular modelling in structural biology.
Micron, 33, 365–384.
Fukuda, T., Maruyama, N., Salleh, M. R., Mikami, B., & Utsumi, S.
(2008). Characterization and crystallography of recombinant 7S
globulins of Adzuki bean and structure–function relationships
with 7S globulins of various crops. Journal of Agricultural and
Food Chemistry, 56, 4145–4153.
Fukushima, D. (1991). Structures of plant storage proteins and their
functions. Food Reviews International, 7, 353–381.
Hillisch, A., Pineda, L. F., & Hilgenfeld, R. (2004). Utility of homology
models in the drug discovery process. Drug Discovery Today, 9,
659–669.
Kealley, C. S., Rout, M. K., Dezfoili, M. R., Strounina, E.,
Whittaker, A. K., Appleqvist, I. A. M., et al. (2008). Structure and
molecular mobility of soy glycinin in the solid state.
Biomacromolecules, 9, 2937–2946.
Kinoshita, K., & Nakamura, H. (2003). Identification of protein
biochemical functions by similarity search using the
molecular surface database eF-site. Protein Science, 12,
1589–1595.
Kopp, J., & Schwede, T. (2004). Automated protein structure
homology modeling: a progress report. Pharmacogenomics, 5,
405–416.
Kumar, S., & Nussinov, R. (2001). How do thermophilic proteins
deal with heat? Cellular and Molecular Life Sciences, 58,
1216–1233.
Kumosinski, T. F., Brown, E. M., & Farrell Jr., H. M. (1991a). Molecular
modeling in food research. Trends in Food Science & Technology,
2, 110–115.
Kumosinski, T. F., Brown, E. M., & Farrell Jr., H. M. (1991b). Molecular
modeling in food research: applications. Trends in Food Science &
Technology, 2, 190–193.
Kumosinski, T. F., & Farrell Jr., H. M. (1994). Solubility of proteins:
protein–salt–water interactions. In N. S. Hettiarachchy, &
G. R. Ziegler (Eds.), Protein functionality in food systems
(pp. 39–77). New York: Marcel Dekker.
Kyte, J., & Doolittle, R. F. (1982). A simple method for displaying the
hydropathic character of a protein. Journal of Molecular Biology,
157, 105–132.
Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R.,
McGettigan, P. A., McWilliam, H., et al. (2007). ClustalW and
ClustalX version 2. Bioinformatics, 23, 2947–2948.
Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M.
(1993). PROCHECK: a program to check the stereochemical
quality of protein structures. Journal of Applied Crystallography,
26, 283–291.


Laskowski, R. A., Watson, J. D., & Thornton, J. M. (2005). ProFunc:


a server for predicting protein function from 3D structure. Nucleic
Acids Research, 33(Web Server issue), W89eW93.
Lee, B., & Richards, F. M. (1971). The interpretation of protein
structure: estimation of static accessibility. Journal of Molecular
Biology, 55, 379e400.
Letunic, I., & Bork, P. (2007). Interactive Tree Of Life (iTOL): an online
tool for phylogenetic tree display and annotation. Bioinformatics,
23, 127e128.
Liebman, M. N. (1998). Information: a renewable resource in the
analysis of protein structure and function. In D. J. Sessa, &
J. L. Willett (Eds.), Paradigm for successful utilization of renewable
resources (pp. 88e106). Champaign: AOCS Press.
L
uthy, R., Bowie, J. U., & Eisenberg, D. (1992). Assessment of protein
models with three-dimensional profiles. Nature, 356, 283e285.
Ma, B., Kumar, S., Tsai, C.-J., Wolfson, H., Sinha, N., & Nussinov, R.
(2001). Protein-ligand interactions: Induced fit. Encyclopedia of
Life Sciences. Chichester: John Wiley & Sons Ltd. http://dx.doi.org/
10.1038/npg.els.0003140. http://www.els.net
Maggio, E. T., & Ramnarayan, K. (2001). Recent developments in
computational proteomics. Structural Bioinformatics, 6,
996e1004.
Martin, A. H., Bos, M. A., & van Vilet, T. (2002). Interfacial
rheological properties and conformational aspects of soy glycinin
at the air/water interface. Food Hydrocolloids, 16, 63e71.
Martí-Renom, M. A., Stuart, A. C., Fiser, A., Sánchez, R., Melo, F., &
Šali, A. (2000). Comparative protein structure modeling of genes
and genomes. Annual Review of Biophysics and Biomolecular
Structure, 29, 291-325.
Maruyama, N., Prak, K., Motoyama, S., Choi, S.-K., Yagasaki, K.,
Ishimoto, M., et al. (2004). Structure-physicochemical function
relationships of soybean glycinin at subunit levels assessed by
using mutant lines. Journal of Agricultural and Food Chemistry,
52, 8197-8201.
Maruyama, N., Sato, R., Wada, Y., Matsumura, Y., Goto, H.,
Okuda, E., et al. (1999). Structure-physicochemical functional
relationships of soybean β-conglycinin constituent subunits.
Journal of Agricultural and Food Chemistry, 47, 5278-5284.
Nakai, S. (1983). Structure-function relationships of food proteins
with an emphasis on the importance of protein hydrophobicity.
Journal of Agricultural and Food Chemistry, 31, 676-683.
Nakai, S., & Li-Chan, E. (1988). Hydrophobic interactions in food
systems. Boca Raton: CRC Press.
Nielsen, N. C., Dickinson, C. D., Cho, T. J., Thanh, V. H.,
Scallon, B. J., Fischer, R. L., et al. (1989). Characterization of the
glycinin gene family in soybean. The Plant Cell, 1, 313-328.
O'Brien, J. (1991). Molecular modeling in food research: an exciting
future. Trends in Food Science & Technology, 2, 185-186.
Pimentel, D., & Pimentel, M. (2003). Sustainability of meat-based and
plant-based diets and the environment. American Journal of
Clinical Nutrition, 78, 660S-663S.
Prak, K., Nakatani, K., Katsube-Tanaka, T., Adachi, M., Maruyama, N.,
& Utsumi, S. (2005). Structure-function relationships of soybean
proglycinins at subunit levels. Journal of Agricultural and Food
Chemistry, 53, 3650-3657.
Pripp, A. H., Isaksson, T., Stepaniak, L., Sørhaug, T., & Ardö, Y. (2005).
Quantitative structure activity relationship modelling of peptides
and proteins as a tool in food science. Trends in Food Science &
Technology, 16, 484-494.
Rossi, A., Marti-Renom, M. A., & Sali, A. (2006). Localization of
binding sites in protein structure by optimization of a composite
scoring function. Protein Science, 15, 2366-2380.

Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by
satisfaction of spatial restraints. Journal of Molecular Biology, 234,
779-815.
Salleh, M. R. B. M., Maruyama, N., Adachi, M., Hontani, N., Saka, S.,
Kato, N., et al. (2002). Comparison of protein chemical and
physicochemical properties of rapeseed cruciferin with those of
soybean glycinin. Journal of Agricultural and Food Chemistry, 50,
7380-7385.
Schein, C. H., Ivanciuc, O., & Braun, W. (2007). Bioinformatics
approaches to classifying allergens and predicting cross-reactivity.
Immunology and Allergy Clinics of North America, 27, 1-27.
Schmutz, J., Cannon, S. B., Schlueter, J., Ma, J., Mitros, T., Nelson, W.,
et al. (2010). Genome sequence of the palaeopolyploid soybean.
Nature, 463, 178-183.
Shewry, P. R., Napier, J. A., & Tatham, A. S. (1995). Seed
storage proteins: structure and biosynthesis. The Plant Cell, 7,
945-956.
Shimada, K., & Matsushita, S. (1980). Relationship between
thermocoagulation of proteins and amino acid compositions.
Journal of Agricultural and Food Chemistry, 28, 413-417.
Tandang-Silvas, M. R. G., Fukuda, T., Fukuda, C., Prak, K.,
Cabanos, C., Kimura, A., et al. (2010). Conservation and
divergence on plant seed 11S globulins based on crystal structures.
Biochimica et Biophysica Acta, 1804, 1432-1442.
Tandang-Silvas, M. R. G., Tecson-Mendoza, E. M., Mikami, B.,
Utsumi, S., & Maruyama, N. (2011). Molecular design of
seed storage proteins for enhanced food physicochemical
properties. Annual Review of Food Science and Technology, 2,
59-73.
Tezuka, M., Taira, H., Igarashi, Y., Yagasaki, K., & Ono, T. (2000).
Properties of tofus and soy milks prepared from soybeans having
different subunits of glycinin. Journal of Agricultural and Food
Chemistry, 48, 1111-1117.
Tezuka, M., Yagasaki, K., & Ono, T. (2004). Changes in characters of
soybean glycinin groups I, IIa, and IIb caused by heating. Journal
of Agricultural and Food Chemistry, 52, 1693-1699.
Thornton, J. M., Todd, A. E., Milburn, D., Borkakoti, N., &
Orengo, C. A. (2000). From structure to function: approaches and
limitations. Nature Structural Biology, (Supplement), 11,
991-994.
Townsend, A.-A., & Nakai, S. (1983). Relationships between
hydrophobicity and foaming characteristics of food proteins.
Journal of Food Science, 48, 588-594.
Utsumi, S., Matsumura, Y., & Mori, T. (1997). Structure-function
relationships of soy proteins. In S. Damodaran, & A. Paraf (Eds.),
Food proteins and their application (pp. 257-291). New York:
Marcel Dekker Inc.
van Gunsteren, W. F., Billeter, S. R., Eising, A. A., Hünenberger, P. H.,
Krüger, P., Mark, A. E., et al. (1996). Biomolecular simulation: The
GROMOS96 manual and user guide. Zürich: vdf Hochschulverlag
AG an der ETH Zürich.
Voutsinas, L. P., Cheung, E., & Nakai, S. (1983). Relationships of
hydrophobicity to emulsifying properties of heat denatured
proteins. Journal of Food Science, 48, 26-32.
Voutsinas, L. P., Nakai, S., & Harwalkar, V. R. (1983). Relationships
between protein hydrophobicity and thermal functional properties
of food proteins. Canadian Institute of Food Science and
Technology Journal, 16, 185-190.
Wiederstein, M., & Sippl, M. (2007). ProSA-web: interactive web
service for the recognition of errors in three-dimensional structures
of proteins. Nucleic Acids Research, 35, W407-W410.
Withana-Gamage, T. S., Hegedus, D. D., Qiu, X., &
Wanasundara, J. P. D. (2011). In silico homology modeling to
predict functional properties of cruciferin. Journal of Agricultural
and Food Chemistry, 59, 12925-12938.
Wright, D. J. (1987). The seed globulins. In B. J. F. Hudson (Ed.),
Developments in food proteins, Vol. 5 (pp. 81-157).
London: Elsevier.
Yamauchi, F., Yamagishi, T., & Iwabuchi, S. (1991). Molecular
understanding of heat induced phenomena of soybean protein.
Food Reviews International, 7, 283-322.


Websites (accessed in December 2010)


http://itol.embl.de/itol.cgi.
http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi.
http://swissmodel.expasy.org/.

http://www.ebi.ac.uk/Tools/msa/clustalw2/.
http://www.ncbi.nlm.nih.gov/bioproject/.
http://www.pdb.org/pdb/home/home.do.
http://www.proteinmodelportal.org/.
http://www.pymol.org/.
http://www.uniprot.org/.
