
Cell and Molecular Biology Laboratory First Term 2016-2017

Experiment 9: Bioinformatics Tools for Cell and Molecular Biology

Daehl R. Santiago, Jerard Angelo A. Sio, Aleziz Kryzzien V. Tan, Elizabeth Jade L. Vicera
Department of Biological Sciences, College of Science
University of Santo Tomas, España, Manila 1051

Date submitted: December 1, 2016

Introduction
Bioinformatics uses the statistical, mathematical and analytical capabilities of computers to interpret and evaluate biological data. This interdisciplinary field emerged in response to the huge amounts of data generated by advances in other scientific fields, especially molecular biology. Storing and analyzing large amounts of data such as DNA and protein sequences became impractical without computers. As technology has evolved, databases and computational and analytical programs have become available that are very useful to researchers around the world. One such program is the Molecular Evolutionary Genetics Analysis software, or MEGA. The program was created by Masatoshi Nei and his associates at Pennsylvania State University. It is mainly used to evaluate evolutionary data and to construct phylogenetic trees from DNA and amino acid sequences.
Genetic or evolutionary distance: a measure of how closely related species or populations are to each other, obtained by analyzing their DNA or protein sequences.
Phylogenetic tree: a diagram used to organize and classify the many species and organisms that have been studied and described. It shows the evolutionary descent of various organisms from a common ancestor and their relationships based on their similarities and differences.
Neighbor-Joining Method: an algorithm that builds a phylogenetic tree from a distance matrix.
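For reference, the neighbor-joining method as it is usually stated (Saitou and Nei's formulation) works as follows: with n taxa still unjoined and current distances d, it joins the pair i, j that minimizes Q(i, j), connects both to a new internal node u with branch lengths δ, and then replaces i and j by u. These are the textbook formulas; this report does not state which variant MEGA implements internally.

```latex
Q(i,j) = (n-2)\,d(i,j) - \sum_{k=1}^{n} d(i,k) - \sum_{k=1}^{n} d(j,k)

\delta(i,u) = \tfrac{1}{2}\,d(i,j) + \frac{1}{2(n-2)}\left[\sum_{k=1}^{n} d(i,k) - \sum_{k=1}^{n} d(j,k)\right],
\qquad \delta(j,u) = d(i,j) - \delta(i,u)

d(u,k) = \tfrac{1}{2}\left[d(i,k) + d(j,k) - d(i,j)\right]
```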
Objectives
To estimate evolutionary distance by computing the differences between DNA and/or protein sequences
To construct a phylogenetic tree of the given species
Procedure
A. Aligning sequences
MEGA 7.0 software was downloaded from the internet, and MEGA's integrated browser was used to retrieve GenBank sequence data from the NCBI website. Align | Edit/Build Alignment was selected from the main menu. When prompted, Create New Alignment was selected and OK was clicked; Protein was then selected. In M7: Alignment Explorer, MEGA's integrated browser was opened by selecting Web | Query GenBank from the main menu. Once the NCBI Protein site had loaded, rbcL followed by the scientific name of the plant was entered in the search box, and the Search button was clicked. In the results, the boxes of the items to be imported into MEGA were ticked. FASTA was clicked, and the page reloaded with the amino acid sequence in FASTA format. The Add to Alignment button was pressed to import the sequences into Alignment Explorer. These steps were repeated for the remaining plant samples, after which the Web Browser window was closed.
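Since the NCBI page returns each sequence in FASTA format (a '>' header line followed by the residues, often wrapped over several lines), it can help to see how such text is read. The following is a minimal Python sketch of a FASTA reader, not how MEGA parses its input; the function name `parse_fasta` and the example records are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text into a {header: sequence} dictionary.

    Sketch only: assumes every record starts with a '>' header line and that
    the sequence may be wrapped over several following lines.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())  # residues are stored in upper case
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


# Made-up records, for illustration only.
example = ">plant_A hypothetical\nACDEFG\nHIKLMN\n>plant_B hypothetical\nacdefg\n"
print(parse_fasta(example))  # {'plant_A hypothetical': 'ACDEFGHIKLMN', 'plant_B hypothetical': 'ACDEFG'}
```

Lines that appear before the first header are ignored in this sketch, and a repeated header overwrites the earlier record.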
Alternative procedure:
The rbcL amino acid sequence of each plant sample was downloaded from http://www.ncbi.nlm.nih.gov/protein. In the search window, rbcL plus the scientific name of the plant was searched. From the list of sequences that appeared, the complete protein was chosen. GenPept, located below the chosen protein, was clicked, and the amino acid sequence was copied and pasted into an MS Word document. These steps were repeated for the remaining plant samples. The amino acid sequences were then copied directly into MEGA7.
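Text copied from the NCBI record view (GenPept flat-file layout) usually carries residue position numbers, spaces and lower-case letters along with the sequence. A minimal Python sketch for stripping that debris before pasting into MEGA; it assumes the copied text is only the sequence block of the record (optionally with its ORIGIN and // marker lines), and `clean_pasted_sequence` is a hypothetical helper name.

```python
import re


def clean_pasted_sequence(text: str) -> str:
    """Return only the residue letters from a pasted sequence block, upper-cased."""
    residues = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip the keyword lines that open and close a flat-file sequence block.
        if stripped.startswith("ORIGIN") or stripped.startswith("//"):
            continue
        # Keep letters only; this drops position numbers, spaces and punctuation.
        residues.append(re.sub(r"[^A-Za-z]", "", stripped))
    return "".join(residues).upper()


# Made-up block in the numbered, space-separated layout.
pasted = "ORIGIN\n        1 acdefghikl mnpqrstvwy\n       21 acde\n//\n"
print(clean_pasted_sequence(pasted))  # ACDEFGHIKLMNPQRSTVWYACDE
```

Because every letter is kept, any other words copied along with the sequence would end up in the output, so only the sequence block should be selected.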
Aligning sequences by ClustalW:
MEGA 7.0 was opened and Align | Build Alignment was selected. When prompted, Create New Alignment was selected, OK was clicked, and Protein was chosen. In the M7: Alignment Explorer that opened, Data | Create a new alignment was clicked and Protein was selected. Edit | Insert blank sequence was clicked, which added a row for the new sequence labeled sequence 1. This row was right-clicked, Edit sequence name was selected, the name of the plant was typed, and Tab was pressed. The amino acid sequence from the MS Word document was then copied and pasted into M7: Alignment Explorer. This was repeated for the remaining plants. Once done, Edit | Select All was clicked, and Alignment | Align by ClustalW was selected from the main menu so that the selected sequences were aligned using the ClustalW algorithm. OK was clicked to accept the default ClustalW settings. The completed alignment was saved by selecting Data | Export Data from the main menu, and Alignment Explorer was then closed by selecting Data | Exit Aln Explorer.
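ClustalW itself works progressively: it aligns every pair of sequences, builds a guide tree from the resulting distances, and then adds sequences to the alignment following that tree. The Python sketch below shows only the pairwise, global (Needleman-Wunsch) step, and it assumes a simple made-up scoring scheme (match +1, mismatch -1, gap -2) rather than the substitution matrices and gap penalties ClustalW actually uses; `needleman_wunsch` is a hypothetical name.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACDE", "ACE"))  # ('ACDE', 'AC-E', 1)
```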

B. Estimating evolutionary distances using Pairwise Distance


The saved data file was opened, and Distance | Compute Pairwise Distance was selected from the main MEGA launch bar. In the Analysis Preferences window, Substitutions Type was set to Amino acid, and the p-distance model was selected under Model/Method. Compute was clicked to start the computation. A progress window appeared, and the resulting window was left open so that the results could be compared.
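The p-distance chosen here is the proportion of aligned sites at which two sequences differ. A minimal Python sketch of that calculation for every pair in an alignment; the gap handling is an assumption (any site with a gap or missing-data symbol in either sequence is skipped, i.e. pairwise deletion), since MEGA lets the user choose how gaps are treated, and the function names and example sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared aligned sites at which two sequences differ.

    Assumption: a site is skipped when either sequence has a gap ('-') or a
    missing-data symbol ('?') there (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x in "-?" or y in "-?":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no sites left to compare")
    return differences / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Made-up aligned sequences, for illustration only.
aln = {"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKM", "s3": "ACD-FGHIKM"}
for pair, d in distance_matrix(aln).items():
    print(pair, round(d, 3))
```

Under complete deletion, by contrast, a gapped column would be dropped for every pair, so s1 and s2 would also be compared over 9 sites instead of 10.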

C. Computing the proportion of amino acid differences


Distance | Compute Pairwise Differences was selected from the main menu of the MEGA window, and the Analysis Preferences window was displayed. Amino acid was selected from the Substitutions Type pull-down, and p-distance was selected under Model/Method. The Compute button was clicked to accept the default values for the remaining options and begin the computation. A results viewer window was displayed with the amino acid p-distance estimates. The results were inspected, and File | Quit Viewer was selected to close the results viewer. The data were closed by selecting Close Data.
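The proportion computed in this step, the p-distance, is the number of aligned sites at which two sequences differ, n_d, divided by the number of sites compared, n. The counts in the worked example below are hypothetical, not taken from this experiment.

```latex
p = \frac{n_d}{n}, \qquad \text{e.g. } n_d = 9,\; n = 450 \;\Rightarrow\; p = \frac{9}{450} = 0.020
```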

D. Building a Neighbor-Joining (NJ) Tree


The data file from earlier was activated, and Phylogeny | Construct/Test Neighbor-Joining Tree was selected from the main MEGA launch bar. In the Analysis Preferences window, p-distance was selected from the Model/Method drop-down. The Compute button was clicked to accept the default values for the remaining options and begin the computation. A progress indicator appeared before the tree was displayed in the Tree Explorer window. A branch was selected, and the up, down, left and right arrow keys were used to move the cursor through the tree. The branch style was changed by selecting View | Tree/Branch Style from the Tree Explorer menu. View | Topology Only was then chosen from the Tree Explorer menu to display only the branching pattern. Numerical branch lengths were displayed by selecting View | Options, opening the Branch tab, and checking the box labeled Display Branch Length.
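The neighbor-joining computation requested in this step can be illustrated with a short Python sketch of the textbook algorithm: repeatedly join the pair that minimizes Q, attach both to a new internal node, and recompute distances to that node. It is not MEGA's implementation; ties are broken arbitrarily (the first minimal pair is taken), negative branch lengths are not corrected, and `neighbor_joining` and the internal labels `node1`, `node2`, ... are hypothetical names. The example uses a small illustrative five-taxon matrix, not data from this experiment.

```python
from itertools import combinations


def neighbor_joining(names: list[str], matrix: list[list[float]]) -> list[tuple[str, str, float]]:
    """Build an unrooted tree by neighbor-joining.

    Returns the tree as a list of (node, node, branch_length) edges; internal
    nodes get the made-up labels node1, node2, ...
    """
    # Distances kept in a dict of dicts so new internal nodes can be added.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges: list[tuple[str, str, float]] = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[x] = sum of distances from x to every other remaining node.
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        counter += 1
        u = f"node{counter}"
        # Branch lengths from i and j to the new node u.
        length_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((u, i, length_i))
        edges.append((u, j, d[i][j] - length_i))
        # Distance from u to each node that is still unjoined.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(names, matrix):
    print(edge)
```

This example matrix fits a tree exactly (it is additive), so the sketch recovers the branch lengths exactly; with real p-distances such as those in Figure 3, the branch lengths are only estimates.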

Discussion

Figure 1

Figure 2
Following the instructions in the laboratory manual produced the phylogenetic trees shown in Figures 1 and 2. Figure 1 displays the evolutionary relationships among the plants, while Figure 2 shows a simplified topology.
It can be inferred that Delonix regia and Arachis hypogaea are the most closely related, since the evolutionary distance obtained between them (about 0.020) is among the shortest. Evolutionary distance reflects how long ago two or more species last shared a common ancestor. MEGA7 estimated it as the proportion of amino acid sites that differ between each pair of aligned sequences (the p-distance).
The common ancestor of Delonix regia and Arachis hypogaea shares an ancestor with Caladium bicolor, and so forth, out to Kyllinga monocephala and Hibiscus rosa-sinensis, which are the least related plants in the selection because they are the farthest from the others in evolutionary distance.
Organisms belonging to the same clade are more likely to be part of the same class or order. Because Delonix regia, Arachis hypogaea, Lagerstroemia speciosa, and Caladium bicolor are very close to each other (both on the phylogenetic tree and in their pairwise sequence distances), we can assume that they are part of the same family or order. Following the same logic, since Kyllinga monocephala and Hibiscus rosa-sinensis are farther from the rest, they may belong to another family and/or order, especially Hibiscus rosa-sinensis.

Figure 3
Figure 3 above shows the p-distances between each pair of plant sequences, that is, the proportion of aligned amino acid sites at which each plant differs from each of the others.
The numbers are the p-distances between pairs of plants (each plant is compared with the plants listed before it):
Pistia stratiotes: 0.039 to Kyllinga monocephala.
Caladium bicolor: 0.056 to Kyllinga monocephala and 0.026 to Pistia stratiotes.
Commelina benghalensis: 0.063 to Kyllinga monocephala, 0.039 to Pistia stratiotes, and 0.046 to Caladium bicolor.
Curcuma longa: 0.059 to Kyllinga monocephala, 0.030 to Pistia stratiotes, 0.023 to Caladium bicolor, and 0.039 to Commelina benghalensis.
Delonix regia: 0.059 to Kyllinga monocephala, 0.033 to Pistia stratiotes, 0.020 to Caladium bicolor, 0.053 to Commelina benghalensis, and 0.036 to Curcuma longa.
Arachis hypogaea: 0.072 to Kyllinga monocephala, 0.046 to Pistia stratiotes, 0.030 to Caladium bicolor, 0.063 to Commelina benghalensis, 0.053 to Curcuma longa, and 0.020 to Delonix regia.
Hibiscus rosa-sinensis: 0.931 to Kyllinga monocephala, 0.928 to Pistia stratiotes, 0.928 to Caladium bicolor, 0.924 to Commelina benghalensis, 0.928 to Curcuma longa, 0.928 to Delonix regia, and 0.928 to Arachis hypogaea.
Lagerstroemia speciosa: 0.059 to Kyllinga monocephala, 0.033 to Pistia stratiotes, 0.013 to Caladium bicolor, 0.053 to Commelina benghalensis, 0.036 to Curcuma longa, 0.016 to Delonix regia, 0.023 to Arachis hypogaea, and 0.928 to Hibiscus rosa-sinensis.
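The values listed above form a lower-triangular distance matrix. A minimal Python sketch that stores them in that form and looks up each plant's nearest neighbor; the numbers are copied from the list, while the helper names `distance` and `nearest` are hypothetical and not part of MEGA.

```python
# Plant order and lower-triangular p-distances as listed for Figure 3.
PLANTS = [
    "Kyllinga monocephala",
    "Pistia stratiotes",
    "Caladium bicolor",
    "Commelina benghalensis",
    "Curcuma longa",
    "Delonix regia",
    "Arachis hypogaea",
    "Hibiscus rosa-sinensis",
    "Lagerstroemia speciosa",
]
LOWER = [
    [],
    [0.039],
    [0.056, 0.026],
    [0.063, 0.039, 0.046],
    [0.059, 0.030, 0.023, 0.039],
    [0.059, 0.033, 0.020, 0.053, 0.036],
    [0.072, 0.046, 0.030, 0.063, 0.053, 0.020],
    [0.931, 0.928, 0.928, 0.924, 0.928, 0.928, 0.928],
    [0.059, 0.033, 0.013, 0.053, 0.036, 0.016, 0.023, 0.928],
]


def distance(a: str, b: str) -> float:
    """Look up the p-distance between two plants (order does not matter)."""
    i, j = PLANTS.index(a), PLANTS.index(b)
    if i == j:
        return 0.0
    if i < j:
        i, j = j, i  # only the lower triangle is stored
    return LOWER[i][j]


def nearest(plant: str) -> tuple[str, float]:
    """Return the other plant with the smallest p-distance to `plant`."""
    best = min((p for p in PLANTS if p != plant), key=lambda p: distance(plant, p))
    return best, distance(plant, best)


print(nearest("Arachis hypogaea"))  # ('Delonix regia', 0.02)
```

For example, `nearest("Pistia stratiotes")` returns Caladium bicolor at 0.026, the pair that the classification in Table 1 places in the same family, Araceae.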
Since the distance between Arachis hypogaea and Delonix regia is among the shortest of all the pairs, the two are closely related compared with the other plants and recently shared a common ancestor. Arachis hypogaea and Delonix regia are also more closely related to Lagerstroemia speciosa and Caladium bicolor than to Curcuma longa, and are thus closer to them in the phylogenetic tree. Because Arachis hypogaea has larger p-distances than Delonix regia, it has the longer evolutionary distance of the two.
It can also be noted that the lower the number, the shorter a plant's evolutionary distance to the other plants; a high number means that the plant evolved separately from them. Hibiscus rosa-sinensis has the largest distances of all the plants, which corresponds to its long evolutionary distance; similarly, Lagerstroemia speciosa has the lowest overall distances and thus the shortest evolutionary distance.
When the plants were compared by order and family (Table 1), Pistia stratiotes and Caladium bicolor were found to belong to the same order, Alismatales, and the same family, Araceae. This agrees with the p-distance of 0.026 obtained from MEGA7. Similarly, Arachis hypogaea and Delonix regia share the same order and family (Fabales and Fabaceae), with a p-distance of 0.020. Interestingly, some plants showed lower p-distances to plants outside their own family or order: Curcuma longa and Caladium bicolor have a value of 0.023, slightly lower than the 0.026 between Pistia stratiotes and Caladium bicolor, even though only the latter pair share a family. A possible reason is the presence of a few key differences within the amino acid sequences.

Plant                    Order          Family
Commelina benghalensis   Commelinales   Commelinaceae
Curcuma longa            Zingiberales   Zingiberaceae
Kyllinga monocephala     Poales         Cyperaceae
Pistia stratiotes        Alismatales    Araceae
Caladium bicolor         Alismatales    Araceae
Delonix regia            Fabales        Fabaceae
Hibiscus rosa-sinensis   Malvales       Malvaceae
Lagerstroemia speciosa   Myrtales       Lythraceae
Arachis hypogaea         Fabales        Fabaceae
Table 1. Order and family of each plant sample

It can be concluded that this program is useful for constructing phylogenetic trees. Figures 1 and 2 show the phylogenetic trees produced by MEGA 7.0, and Table 1 lists the order and family of each plant.

Conclusion
The software MEGA, or Molecular Evolutionary Genetics Analysis, is a bioinformatics tool used to compare DNA and protein sequences. It involves the comparative analysis of homologous gene sequences from different species, and the similarities between sequences can indicate an evolutionary timeline. This can be used to construct a phylogenetic tree of different organisms from their DNA or protein sequences. In this experiment, rbcL sequences of 9 specimens were downloaded from the NCBI website, aligned using MEGA, and used to construct a phylogenetic tree showing the evolutionary relationships among the 9 specimens. The software determined the relationships between specimens by comparing their p-distances. The phylogenetic tree showed that the outgroup among the 9 specimens is Hibiscus rosa-sinensis, and that the remaining specimens share a common ancestor with Kyllinga monocephala. The most closely related specimens are Delonix regia and Arachis hypogaea, which have one of the smallest p-distances (0.020) and share a common ancestor in a single clade of the phylogenetic tree. In summary, the smaller the p-distance between two sequences, the more likely the organisms are to be related in terms of their sequences and to share a recent common ancestor.

