
“The time will come, I believe, though I shall not live to see it, when we shall have fairly true genealogical trees of each great kingdom of Nature” - Charles Darwin

[Figure: Charles Darwin, On the Origin of Species, 1859, Chapter IV - Natural Selection]
What is Molecular Phylogenetics?

Phylogenetics is the study of evolutionary relationships

Example - relationship among species

[Figure: example trees relating crocodiles, birds, lizards, snakes, rodents, primates and marsupials]
A Brief History of Molecular Phylogenetics
1900s
Immunochemical studies: cross-reactions stronger for closely related
organisms
Nuttall (1902) - apes are closest relatives to humans

1960s - 1970s
Protein sequencing methods, electrophoresis, DNA hybridization and PCR
contributed to a boom in molecular phylogeny

late 1970s to present


Discoveries using molecular phylogeny:
- Endosymbiosis - Margulis, 1978
- Divergence of phyla and kingdom - Woese, 1987
- Many Tree of Life projects completed or underway
Molecular data vs. Morphology / Physiology

Molecular data | Morphology / Physiology
Strictly heritable entities | Can be influenced by environmental factors
Data is unambiguous | Ambiguous modifiers: “reduced”, “slightly elongated”, “somewhat flattened”
Regular & predictable evolution | Unpredictable evolution
Quantitative analyses | Qualitative argumentation
Ease of homology assessment | Homology difficult to assess
Relationship of distantly related organisms can be inferred | Only close relationships can be confidently inferred
Abundant and easily generated with PCR and sequencing | Problems when working with micro-organisms and where visible morphology is lacking
Phylogenetic concepts:
Interpreting a Phylogeny
• Physical position in tree is not meaningful
• Swiveling can only be done at the nodes
• Only tree structure matters

[Figure: tree with tips, top to bottom, Sequence A, Sequence B, Sequence C, Sequence D, Sequence E; time axis ending at "Present"]
Phylogenetic concepts:
Interpreting a Phylogeny
• Physical position in tree is not meaningful
• Swiveling can only be done at the nodes
• Only tree structure matters

[Figure: tree with tips, top to bottom, Sequence A, Sequence B, Sequence E, Sequence D, Sequence C; time axis ending at "Present"]
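
The point that only tree structure matters can be checked mechanically. The sketch below is a minimal illustration, assuming trees are written as nested tuples of tip names; the helper names and the three example topologies are hypothetical, since the branching order of the slide's figure cannot be read from the text.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree.

    A tip is a string; an internal node is a tuple of child subtrees.
    Swiveling children at a node does not change the result.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


def same_topology(tree1, tree2):
    """True if the two trees differ only by swiveling at nodes."""
    return canonical(tree1) == canonical(tree2)


# Hypothetical topologies (not read from the figure).
original = (("A", "B"), (("C", "D"), "E"))
swiveled = (("A", "B"), ("E", ("D", "C")))   # same tree, tips drawn in another order
different = (("A", "C"), (("B", "D"), "E"))  # a genuinely different branching pattern

print(same_topology(original, swiveled))
print(same_topology(original, different))
```

Comparing canonical forms rather than the drawn tip order is what makes the physical position of a tip irrelevant.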
Tree Terminology
- Relationships are illustrated by a phylogenetic tree / dendrogram
- The branching pattern is called the tree’s topology
- Trees can be represented in several forms:
  - Rectangular cladogram
  - Slanted cladogram
  - Circular cladogram
Tree Terminology
[Figure: example tree annotated with the terms below]
- Operational taxonomic units (OTU) / Taxa
- Internal nodes
- Terminal nodes
- Sisters
- Root
- Branches
- Polytomy
Tree Terminology
Rooted vs. unrooted trees

[Figure: taxa A to F drawn as a rooted tree and as an unrooted tree]

Rooted trees: Have a root that denotes common ancestry

Unrooted trees: Only specify the degree of kinship among taxa but not the evolutionary path
Tree Terminology
Scaled vs. unscaled trees

[Figure: tree of taxa A to F]

Scaled trees: Branch lengths are proportional to the number of nucleotide/amino acid changes that occurred on that branch (usually a scale is included).
Unscaled trees: Branch lengths are not proportional to the number of nucleotide/amino acid changes (usually used to illustrate evolutionary relationships only).
Tree Terminology
Monophyletic vs. paraphyletic

[Figure: two example trees. Left tips: Saturnite 1, Saturnite 2, Saturnite 3, Martian 1, Martian 3, Martian 2. Right tips: Jupiterian 32, Jupiterian 5, Jupiterian 67, Human 11, Jupiterian 8, Human 3.]

Monophyletic groups: All taxa within the group are derived from a single common ancestor and members form a natural clade.
Paraphyletic groups: The common ancestor is also shared by other taxa outside the group, so members do not form a natural clade.
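
A group can be tested for monophyly by asking whether some clade of the tree contains exactly the group's members. The sketch below assumes trees written as nested tuples of tip names. The function names are hypothetical, and the example topology is an assumption made for illustration; it is not read from the figure.

```python
def leaves(tree):
    """Set of tip names below a node (a tip is a string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the tip set of every subtree, including single tips and the whole tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group is exactly the tip set of one clade."""
    return frozenset(group) in set(clades(tree))


# Assumed topology, for illustration only.
tree = (
    (("Jupiterian 32", "Jupiterian 5"), "Jupiterian 67"),
    (("Human 11", "Human 3"), "Jupiterian 8"),
)
humans = ["Human 11", "Human 3"]
jupiterians = ["Jupiterian 32", "Jupiterian 5", "Jupiterian 67", "Jupiterian 8"]

print(is_monophyletic(tree, humans))
print(is_monophyletic(tree, jupiterians))
```

In this assumed tree the smallest clade holding every Jupiterian also holds the humans, so the Jupiterians alone do not form a natural clade.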
Methods in Phylogenetic Reconstruction

• Distance
• Maximum Parsimony
• Maximum Likelihood
• Bayesian

* All of these methods are implemented in available software, e.g. PAUP, PHYLIP, MacClade, MrBayes etc.
Comparison of Methods

Distance | Maximum parsimony | Maximum likelihood
Uses only pairwise distances | Uses only shared derived characters | Uses all data
Minimizes distance between nearest neighbors | Minimizes total distance | Maximizes tree likelihood given specific parameter values
Very fast | Slow | Very slow
Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model
Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, homoplasy rare) | Good for very small data sets and for testing trees built using other methods
Methods in Phylogenetic Reconstruction

Distance
• Using a sequence alignment, pairwise distances are calculated
• Creates a distance matrix
• A phylogenetic tree is calculated with clustering algorithms, using the
distance matrix.
• Examples of clustering algorithms include the Unweighted Pair Group
Method using Arithmetic averages (UPGMA) and Neighbor Joining
clustering.
[Figure: the tree is built up by successive clustering: first A and B, then A, B and C, then A, B, C and D]
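
The distance steps above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any particular package: it assumes the simplest distance (the proportion of differing sites, with no correction for multiple substitutions) and clusters with UPGMA. The function names are hypothetical; the alignment is the ten-site sample shown on the bootstrap slide.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(alignment):
    """Cluster sequences by UPGMA; returns the topology as nested tuples."""
    # Distance matrix keyed by unordered pairs of clusters.
    dist = {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    size = {name: 1 for name in alignment}  # cluster -> number of tips
    while len(size) > 1:
        # Join the closest pair of clusters (ties go to the first pair found).
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        na, nb = size.pop(a), size.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for c in size:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        size[merged] = na + nb
    return next(iter(size))


alignment = {
    "rat": "GAGGCTTATC",
    "human": "GTGGCTTATC",
    "turtle": "GTGCCCTATG",
    "fruitfly": "CTCGCCTTTG",
    "oak": "ATCGCTCTTG",
    "duckweed": "ATCCCTCCGG",
}
print(p_distance(alignment["rat"], alignment["human"]))
print(upgma(alignment))
```

Rat and human differ at one site in ten, the smallest distance in the matrix, so they are joined first. Several other pairs tie, and a real analysis would need a rule for breaking such ties.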
Methods in Phylogenetic Reconstruction

Maximum Parsimony
• All possible trees are determined for each position of the sequence alignment
• Each tree is given a score based on the number of evolutionary steps needed to produce it
• The most parsimonious tree is the one that requires the fewest evolutionary changes for all sequences to be derived from a common ancestor
• Usually several equally parsimonious trees result from a single run.
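
One common way to score a tree by its number of evolutionary steps is Fitch's algorithm, sketched below. The slides do not say which scoring method is meant, so this is an assumption; the function names and the four short sequences are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column.

    A tip is a string naming a taxon; an internal node is a pair of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_site(left, states)
    right_states, right_changes = fitch_site(right, states)
    shared = left_states & right_states
    if shared:
        # The children can agree, so no extra change is needed here.
        return shared, left_changes + right_changes
    # The children disagree, so one change is counted at this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical data: A and B share two states that C and D lack.
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTG"}
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_ab, alignment))  # 2
print(parsimony_score(tree_ac, alignment))  # 4
```

The tree that groups A with B needs two changes and the alternative needs four, so the first is the more parsimonious.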
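Maximum parsimony: exhaustive stepwise addition
[Figure: Step 1 starts from the single tree for taxa A, B and C. Step 2 adds D at each possible position, giving three trees. Step 3 adds E at each possible position on each of those trees, and so on.]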
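
Exhaustive stepwise addition grows quickly. An unrooted bifurcating tree with k - 1 taxa has 2k - 5 branches, and the k-th taxon can be attached to any one of them, so the number of trees is multiplied by 2k - 5 at each step. The short sketch below counts the trees this produces; the function name is hypothetical.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # three taxa can be joined in only one way
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5  # branches available when the k-th taxon is added
    return count


for n in (3, 4, 5, 10):
    print(n, count_unrooted_trees(n))
# 3 1
# 4 3
# 5 15
# 10 2027025
```

This growth is why exhaustive searches are practical only for small numbers of taxa.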
Methods in Phylogenetic Reconstruction

Maximum Likelihood
• Creates all possible trees like the Maximum Parsimony method, but instead of retaining the trees with the fewest evolutionary steps...
• Employs a model of evolution in which different transition/transversion ratios can be used
• For each tree generated, the probability of the sequence data at each position, given that tree, is calculated
• The calculation is repeated for all nucleotide sites
• Finally, the tree with the best probability is shown as the maximum likelihood tree - usually only a single tree remains
• It is a more realistic tree estimation because it does not assume an equal transition-transversion ratio for all branches.
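
The per-site calculation can be sketched with Felsenstein's pruning algorithm. To stay short, the sketch below assumes the Jukes-Cantor model, which treats all substitutions as equally likely; this is a simplification and not the transition/transversion model described above. The function names, branch lengths and sequences are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, t):
    """Jukes-Cantor probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node, column):
    """Likelihood of the data below a node, for each possible base at that node.

    A tip is a taxon name; an internal node is a pair of (child, branch_length).
    """
    if isinstance(node, str):
        return {b: 1.0 if column[node] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, length in node:
        below = partial(child, column)
        for b in BASES:
            result[b] *= sum(jc_prob(b, c, length) * below[c] for c in BASES)
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log likelihoods, with equal base frequencies at the root."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        at_root = partial(tree, column)
        total += math.log(sum(0.25 * at_root[b] for b in BASES))
    return total


# Hypothetical data and branch lengths (0.1 expected substitutions per site).
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTG"}
ab = (("A", 0.1), ("B", 0.1))
cd = (("C", 0.1), ("D", 0.1))
ac = (("A", 0.1), ("C", 0.1))
bd = (("B", 0.1), ("D", 0.1))
grouped = ((ab, 0.1), (cd, 0.1))
mixed = ((ac, 0.1), (bd, 0.1))

print(log_likelihood(grouped, alignment))
print(log_likelihood(mixed, alignment))
```

The tree that groups the identical sequences has the higher log likelihood. A full analysis would also optimise the branch lengths and model parameters for each tree rather than fixing them as done here.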
How confident are we about the inferred phylogeny?

[Figure: inferred tree of rat, human, turtle, fruit fly, oak and duckweed, with a "?" at each internal node]
Bootstrapping
The Bootstrap

• Computational method to estimate the confidence level of a given phylogenetic tree.

Sample (columns 0123456789)
rat       GAGGCTTATC
human     GTGGCTTATC
turtle    GTGCCCTATG
fruitfly  CTCGCCTTTG
oak       ATCGCTCTTG
duckweed  ATCCCTCCGG

Pseudo sample 1 (columns 001122234556667)
rat       GGAAGGGGCTTTTTA
human     GGTTGGGGCTTTTTA
turtle    GGTTGGGCCCCTTTA
fruitfly  CCTTCCCGCCCTTTT
oak       AATTCCCGCTTCCCT
duckweed  AATTCCCCCTTCCCC

Pseudo sample 2 (columns 445556777888899)
rat       CCTTTTAAATTTTCC
human     CCTTTTAAATTTTCC
turtle    CCCCCTAAATTTTGG
fruitfly  CCCCCTTTTTTTTGG
oak       CCTTTCTTTTTTTGG
duckweed  CCTTTCCCCGGGGGG

Many more replicates (between 100 and 1000)

[Figure: a tree of rat, human, turtle, fruit fly, oak and duckweed is inferred from the sample and from each pseudo sample]
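
The resampling step can be sketched as follows: columns of the alignment are drawn at random with replacement, and the same columns are taken from every sequence. The function names are hypothetical. The number of columns drawn is left as a parameter because the pseudo samples above use 15 columns drawn from a 10-column sample.

```python
import random


def draw_columns(n_sites, n_draws, rng):
    """Draw column indices with replacement, sorted for readability."""
    return sorted(rng.randrange(n_sites) for _ in range(n_draws))


def resample(alignment, columns):
    """Build a pseudo sample by taking the same columns from every sequence."""
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


sample = {
    "rat": "GAGGCTTATC",
    "human": "GTGGCTTATC",
    "turtle": "GTGCCCTATG",
    "fruitfly": "CTCGCCTTTG",
    "oak": "ATCGCTCTTG",
    "duckweed": "ATCCCTCCGG",
}

# Reproduce pseudo sample 1 from its listed column indices.
columns_1 = [0, 0, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 7]
print(resample(sample, columns_1)["rat"])  # GGAAGGGGCTTTTTA

# Draw a fresh replicate; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
replicate = resample(sample, draw_columns(10, 15, rng))
print(replicate["rat"])
```

A tree is then inferred from each replicate, and the bootstrap value of a grouping is the percentage of replicate trees in which that grouping appears.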
Bootstrap values

[Figure: inferred tree of rat, human, turtle, fruit fly, oak and duckweed, with bootstrap values 100, 65, 0 and 55 at its internal nodes]

• Values are in percentages
• Conventional practice: only values 60-100% are shown
Some Discoveries Made Using Molecular Phylogenetics
Universal Tree of Life
• Using rRNA sequences
• Able to study the relationships of uncultivated organisms, obtained from a hot spring in Yellowstone National Park

Barns et al., 1996
Endosymbiosis: Origin of the Mitochondrion and Chloroplast

[Figure: rooted tree with branches labelled α-purple bacteria, other bacteria, chloroplasts, mitochondria, cyanobacteria, eukaryotes and archaea]

Mitochondria and chloroplasts are derived from the α-purple bacteria and the cyanobacteria respectively, via separate endosymbiotic events.
Relationships within species: HIV subtypes

[Figure: tree of HIV subtypes A, B, C, D, F and G, with isolates labelled by country: Rwanda, Ivory Coast, Italy, Uganda, U.S., India, U.K., Ethiopia, S. Africa, Netherlands, Tanzania, Russia, Romania, Taiwan, Cameroon and Brazil]
Problems and Errors in Phylogenetic Reconstruction

• Inherent strengths and weaknesses in different tree-making methodologies (see the Holder & Lewis reading for this week).
• More is better: errors in the inferred phylogeny may be caused by small data sets and/or limited sampling.
• Unsuitable sequences: those undergoing rapid nucleotide changes, or slow to zero changes over time, may skew phylogenetic estimations
• Mutations: duplications, inversions, insertions, deletions etc. can give inaccurate signals
• Genomic hotspots: small regions of rapid evolution are not easily detected
• Homoplasy: nucleotide changes that are similar but occurred independently in separate lineages are mistakenly assumed to be inherited changes
• Sample contamination / mislabeling: always a possibility when working with large data sets
Cautionary tales in phylogenetics
The position of Amborella as sister to all flowering plants

By adding Acorus, a non-cereal monocot, Amborella is placed as a basal flowering plant.
Cautionary tales in phylogenetics
The phylogeny of chordates constructed using 20 mitochondrial genes

Because mitochondrial coding sequences from carp and trout undergo rapid evolution (nucleotide substitutions), they experience long branch attraction, which causes their misplacement.
Han Chuan Ong
hanong@u.washington.edu
