
6. Molecular clocks
Molecular clocks are a tool for measuring time. Specifically, they measure
evolutionary time: how much time has passed since two species split apart
from what used to be a single species.
Another characteristic of these clocks is that they are molecular. In this
context, molecular means that the clocks use information from molecules,
especially DNA and protein molecules, and with that information we can
estimate how much time has passed since the species split apart. Molecular
clocks are also very sloppy clocks: they are not very reliable. So the three
main characteristics of molecular clocks are:
1. They measure time
2. They are molecular
3. They are sloppy
The longer two lineages have been separated, the more differences we
expect when we compare their genomes. So, to use a molecular clock, we
count how many differences there are between two species in a particular
sequence. That sequence can be haemoglobin, glucagon, cytochrome-C... The
number of differences tells us whether those species are far apart in
evolution or not.
Comparisons between DNA sequences can help us estimate when lineages
diverged from each other. To estimate evolutionary time, we need to know
the rate of molecular evolution (r). We obtain it by dividing the number of
differences between the two sequences (K) by twice the time since the two
species split apart (T), because both lineages have been accumulating
changes since the split:

r = K / (2T)
Not so long ago, scientists found that the rates of amino acid replacement
(the rates at which one amino acid changes in the protein sequence) were
approximately the same among different mammalian lineages. That means that
the rate at which haemoglobin changes is approximately the same in all
mammalian species. So if the rate of amino acid substitution in a cow is the
same as in a human, we can use that shared rate to estimate when those two
species split apart. This was the clue that made scientists realize that
molecular clocks are possible.
For any given protein, the rate of molecular evolution is approximately
constant (the same) over time. So haemoglobin has changed at the same rate
over time. Cytochrome-C has changed at a different rate from haemoglobin,
but also constantly over time. And that holds for all lineages. When they
realized this, scientists recognized that they were looking at a molecular
clock.

Molecular clocks are not completely reliable, because the changes do not
occur at exactly regular intervals; that is why they are a little sloppy.
But in general terms the clock holds as long as that particular protein
keeps its function. If a mutation changes the function of haemoglobin in a
particular species, that species falls outside the rule. As long as the
protein we are dealing with keeps the function it is supposed to have, the
clock holds.
If we can count the number of differences between the sequences and we know
the rate of evolution, then we can calculate the time that has passed since
the two species split apart. But how do we know the rate at which a
particular protein changes over evolution? We face a problem with two
unknown parameters (r and T).
What we need is a reference. Our reference is the fossil record, the strata,
which tell us with a certain degree of certainty when a pair of species
split apart. We have traditional data (strata, fossils, geographic
distribution...), and a lot of these data tell us, with a certain degree of
certainty, when particular species split apart. If we compare the sequences
and we know the time (because we know it from these methods), then we get
the rate.
That is how we calibrate our clock: we obtain the rate from two sequences
for which we know exactly, or very well, the time when the species split
apart, because we have plenty of fossil data and other evidence for them.
Then we can use that rate for a pair of species whose divergence time we do
not know. We need this because many times we do not have fossil data or
other types of data to calculate when two species split apart. Once the
clock is calibrated we know the rate, so we can date species of which we do
not have the faintest idea of when they split apart, and add those new data
to the phylogenetic tree.
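The calibration procedure can be sketched in a few lines of Python. This is only a minimal illustration, assuming the relation r = K / (2T) used in these notes; the function names and all the numbers are hypothetical, chosen just to show the two steps (calibrate with a well-dated pair, then date an undated pair).

```python
def calibrate_rate(k_known, t_known_years):
    """Rate r from a pair whose split time is known: r = K / (2T)."""
    return k_known / (2.0 * t_known_years)


def date_split(k_observed, rate):
    """Split time for an undated pair: T = K / (2r)."""
    return k_observed / (2.0 * rate)


# Hypothetical calibration pair: K = 0.10 substitutions per site,
# split dated by fossils at 50 million years.
rate = calibrate_rate(0.10, 50e6)

# Hypothetical undated pair with K = 0.04 substitutions per site.
split_years = date_split(0.04, rate)
print(f"rate = {rate:.2e} per site per year")
print(f"estimated split = {split_years / 1e6:.0f} million years ago")
```

With these made-up inputs the calibrated rate is 1e-9 per site per year and the undated pair comes out at about 20 million years.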
Why does haemoglobin have the same rate of evolution in all species? We
know that the underlying processes can differ and that there are sources of
mutation that can lead to different rates. But even in those cases the
differences counterbalance each other, so in practice the rate is
approximately the same: one process can increase the rate and another can
decrease it.
The neutral theory also has a lot to do with this. We tend to think that
most mutations are driven by success in performance, but the neutral theory
says that these are not the majority. It asserts that the majority of
changes become established just by chance, through genetic drift and similar
non-selective processes.
So, somehow, the non-specific, environmentally driven processes that affect
a particular species counterbalance each other. To date, molecular clocks
are the best tool to check and build phylogenetic trees.

1. Rate of molecular evolution
Darwin said that evolutionary change is continuous, but he never said that
it occurs at the same rate in all lineages. Here we are talking about
organisms, not about a particular protein. They are different concepts.
In fact, THE RATE OF MOLECULAR EVOLUTION VARIES BETWEEN LINEAGES.
We have two examples:
Honeycreepers: 5-10 million years = 50 different species
Coelacanth: 5-10 million years = almost the same
The coelacanth has stayed almost the same over that many years. In
contrast, honeycreepers have evolved at a faster rate.
These are two different things. We can get the rate of evolution of a
protein by comparing the sequences of that protein between organisms, and
then we can compare the rates of evolution of all the proteins in an
organism to get the rate of evolution of that species. But proteins differ
in how important they are for the phenotype and for performance in the
environment.
Whichever protein we choose, that protein changes at approximately the same
rate in all species. When we calculate the phylogenetic distance between two
species, we take several proteins, calculate the rate and the time for each
one, and combine all of those results to obtain the divergence time of the
two species.
For example, cytochrome-C is an essential protein for every single species,
but with haemoglobin things are different in sea mammals, or in any other
species for which haemoglobin is particularly important.
We have to analyze the proteins individually, but then we have to summarize
all of that in one picture to give general numbers for each species.
Molecular clocks allow us to reach a general conclusion at the species
level. We have to make a separate analysis with each molecular clock (there
is one molecular clock for each protein), and the combination of those
measurements gives us the best estimate of the divergence time.
WHY?
George Gaylord Simpson coined the term TEMPO AND MODE IN EVOLUTION
(tempo = pace, mode = type): the way the pace and the type of evolutionary
change can vary between lineages, over different periods in evolutionary
history, or in different places.
To compare the tempo and mode across lineages or periods or places, it is
necessary to estimate rates of evolutionary change.
RATE: DISTANCE DIVIDED BY TIME:

r = K / (2T)

The rate of evolution is what tells us whether we can compare different
sequences. To calculate it we need some measure of the amount of
evolutionary change accumulated and the period of evolutionary time over
which this change occurred. To calculate the rate of evolution of a certain
protein, we compare the sequences of that protein in two different species,
count the differences (K), and divide by twice the divergence time (2T),
because the changes accumulated along both lineages. Conversely, if we have
the rate and we observe the differences between two sequences, we can
calculate their time of divergence, and that is what we use as a molecular
clock.
But the problem is that many times we know neither the rate nor the time.
For that we use fossil record data, which are often quite accurate. Time can
come from fossil data (Simpson): the fossil record provides the primary
source of temporal information in evolutionary biology. But there are not
many data, and the record is patchy (in time, in area and in biological
coverage).

If we have two species for which we know the time of divergence because we
have good fossil records, we calculate the differences in that particular
protein sequence between the two species, and from that we obtain the rate
of evolution of that protein. Usually we do this with many different pairs
of species whose divergence time is known from the fossil record.
DNA sequence:
To use DNA sequences to investigate the tempo and mode of evolution we
need to be able to calibrate the rate of genomic change against a known
evolutionary timescale.
How can we use DNA data to estimate the timing of evolutionary events?

2. Molecular clocks:
Since molecular change accumulates continuously, we predict that two
lineages that have a recent common ancestor will have fewer differences
between their genomes than two more distantly related lineages.
If we can estimate how many genetic changes have occurred since they split,
and we know how fast they accumulate, we can use the genetic change to
predict when two lineages diverged.
MOLECULAR CLOCKS INFER EVOLUTIONARY TIME FROM MOLECULAR CHANGES.
MOLECULAR CLOCK HYPOTHESIS: IMPLICATIONS

If protein sequences evolve at constant rates, they can be used to estimate
the times that sequences diverged. This is analogous to dating geological
specimens by radioactive decay.

r = K / (2T)

r: rate of nucleotide substitution per site per year
K: number of substitutions between two homologous sequences
T: time of divergence (usually inferred from paleontological and
biogeographical data)
RATE OF NUCLEOTIDE SUBSTITUTION r AND TIME OF DIVERGENCE T
r = rate of substitution = 0.56 x 10^-9 per site per year for haemoglobin
alpha
K = 0.093 = number of substitutions per nucleotide site (rat versus human)
r = K / (2T)
T = K / (2r) = 0.093 / (2 x 0.56 x 10^-9) = about 83 million years
(roughly 80 million years)
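The arithmetic of this worked example can be checked with a short Python sketch. The function name is hypothetical; the two input values are the haemoglobin alpha figures quoted in the notes (rat versus human).

```python
def divergence_time(k, r):
    """Return T in years from r = K / (2T), that is, T = K / (2r)."""
    if r <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * r)


# K = 0.093 substitutions per site, r = 0.56e-9 per site per year.
t_years = divergence_time(k=0.093, r=0.56e-9)
print(f"{t_years / 1e6:.0f} million years")
```

The result is about 83 million years, which rounds to the figure of roughly 80 million years.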
MOLECULAR CLOCK:
1960s: Zuckerkandl and Pauling observed that the number of amino acid
differences between haemoglobins had an approximately linear relationship
with the time since the common ancestor (estimated from the fossil record).

We can then build a graph with all those data: for each pair of species we
know how many million years ago they split apart, because we have fossil
data, and we count the differences between their haemoglobin sequences
(haemoglobin is the protein in this example). Plotting one against the other
gives a straight line.
Suppose we want to know the divergence time between dogs and humans (a pair
that is not in the graph), and we do not have fossil data to tell us when
those two species split apart.
We take this graph and count the differences between the haemoglobin
sequences of dogs and humans; since we know the rate, we can calculate when
those two species split apart. That is the aim of molecular clocks: they are
tools to calculate the divergence time between two species.
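Reading a divergence time off the straight line can be sketched as follows. This is an illustration under stated assumptions, not the method used to draw the original graph: the line is fitted by least squares and forced through the origin (zero time, zero differences), and every data point below is hypothetical. With r = K / (2T), the slope of this line plays the role of 2r.

```python
def fit_clock_slope(times_years, differences):
    """Least-squares slope of a line through the origin (differences vs time)."""
    numerator = sum(t * k for t, k in zip(times_years, differences))
    denominator = sum(t * t for t in times_years)
    return numerator / denominator


def estimate_time(observed_differences, slope):
    """Read the divergence time off the fitted line."""
    return observed_differences / slope


# Hypothetical calibration pairs: (divergence time in years, differences per site).
calibration = [(20e6, 0.021), (60e6, 0.058), (100e6, 0.102), (300e6, 0.297)]
times = [t for t, _ in calibration]
diffs = [k for _, k in calibration]

slope = fit_clock_slope(times, diffs)
# Hypothetical value for a pair that is not on the graph.
print(f"{estimate_time(0.09, slope) / 1e6:.0f} million years")
```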
What for? To calculate the relationships in the phylogenetic trees:

Phylogenetic trees are proposals we make, with the data we have, to
reconstruct the history of evolution. In the past we used embryological
data, fossil data, paleontological data, etc. Now we have another source,
molecular data, which allows us to be more accurate.

Left: We can do this with different proteins. We can estimate, in millions
of years, the time of divergence between different groups or species
according to different proteins.
We see that the rates are different, but what is constant is the linear
relationship between the degree of divergence (the differences between the
sequences, K) and the time of divergence (T).
So the important point is that with molecular data, in particular protein
sequences, we can establish a linear relationship between the number of
differences and the time elapsed since the two species split apart. And we
can do that with different proteins: cytochrome-C, haemoglobin,
fibrinopeptides...
The slope for each protein is different because the rate of change is
different. Accordingly, we can observe that cytochrome-C changes slowly in
comparison with haemoglobin or other proteins, so it is probably most useful
for species that are far apart. For example, we could use cytochrome-C for a
divergence time of 20 million years between species, and haemoglobin for
species that diverged 5.8 million years ago...

Right: We have different representations of the same thing. This was a
revolution in molecular evolution, because suddenly scientists realized that
they had a very good way of addressing the question of time in phylogenetic
trees.

Here we have the rates, and we can use one protein or another depending on
its rate and on the distance between the two species.

HYPOTHESIS
Zuckerkandl and Pauling were very excited about that. They realized that for
any given macromolecule (a protein or DNA sequence) the rate of evolution
is approximately constant over time in all evolutionary lineages
(Zuckerkandl and Pauling 1965 in Wen-Hsiung Li 1997). That gave them the
linear relationship.

3. Neutral theory of evolution:


Then Kimura came, and he stressed that not all changes are governed by the
environment: some of them occur randomly and are neutral, which lessens the
importance of selection for those changes.
An often-held view of evolution is that just as organisms propagate through
natural selection, so also DNA and protein molecules are selected for.
According to Motoo Kimura's 1968 neutral theory of molecular evolution, the
vast majority of DNA changes are not selected for in a Darwinian sense. The
main cause of evolutionary change is random drift of mutant alleles that are
selectively neutral (or nearly neutral). Positive Darwinian selection does
occur, but it has a limited role.
Kimura went a step further with the molecular clock theory, and in 1983 he
said:
For each protein the rate of amino acid replacement (rate of evolution) is
approximately constant per year, per site for various lines, as long as the
function and tertiary structure of the molecule remain essentially
unaltered.
This means that if a sequence changes so much that it acquires a different
function, that is, if a protein evolves into a different one, then the
molecular clock is no longer reliable. Molecular clocks can be used as long
as the changes that occur in a protein preserve its function and its
three-dimensional structure, because it is still the same protein with the
same function. When a protein changes so much that it no longer has the same
function, we cannot use it any more.
THE "SACRAMENT" OF THE STRAIGHT LINE:

Here we can see the plot made by combining different sequences:
cytochrome-C, fibrinopeptides, haemoglobin... This is the sacrament of the
straight line: if we combine the data from different molecular clocks
(different proteins) we arrive at this picture, from which we can quite
accurately read the divergence time of each of those pairs.
This does not mean that everything fits on the straight line. As we can see
there, horses and donkeys, for example, appear accelerated in evolution, so
their split time does not match what the molecular data say. Apes and
monkeys, in contrast, appear slowed down in evolution.
But we have a very good prediction, and the line is the molecular clock
expectation. So if we have two species and we calculate the amino acid
replacements per site, we can be fairly confident of being able to estimate
the speciation time for those two species.

THE MOLECULAR CLOCK IS A SLOPPY CLOCK:

Although we can describe an average rate of molecular change, and we can
estimate the probability that a particular site will change in a given time
period, we cannot say exactly which nucleotides in a DNA sequence will
change and when. This is because the process of substitution is influenced
by chance at many different stages.
Therefore molecular change does not tick regularly, and it is difficult to
predict a precise time from irregular changes.

We can estimate the time at which they split apart, but the estimate is not
completely exact. When we have only two observation points, drawing a
straight line through them is very easy (1).
But when we have more than two observation points, we have to use a very
thick line, meaning that a given number of differences corresponds to an
interval of divergence times (2).
Or we can question the accuracy of the measurements on one or both axes: we
assume that some of the points are correct, and that another point either
has more differences than we have observed (it lies lower on the vertical
axis than expected) or has a shorter speciation time than calculated (it
lies further to the right on the horizontal axis than expected) (3). This is
the uncertainty.

[Figure: panels 1, 2 and 3]

We can see that some cases fall outside the molecular calculation, and that
is something that occurs with any calculation based on molecular clocks.

IT IS DIFFICULT TO PREDICT PRECISE TIME FROM IRREGULAR CHANGES:

We are going to see why this occurs, and a method to make the calculation as
accurate as possible. First, as we can see there, we can follow two
sequences as they change over time and count how many changes occur: seven.
In the first one (blue), a change occurs every X years, and after another X
years there is another change: the changes occur regularly. But DNA
sequences do not work like that. Sometimes a change occurs after X/2 years
and the next one after 2X/3 years, and so on (red line). So the changes are
not exactly regular.

So if we measure this rate of change every X years, we get different rates.
In a regular clock the rate is always the same (every X years we have a
change), whereas in a sloppy clock we have a distribution of changes over
time. This is the key: we need to find the most probable rate of change, the
one that is most likely to occur. To do that we cannot use just one
sequence, as we are doing here; we have to use several sequences:

Each line is the sequence of that particular protein in one species, and the
segments mark the different times at which changes occur in that sequence.
So we can build a distribution of how many changes occur after 2 years, how
many after 5 years... A distribution to predict the most probable rate.
Every line is the same protein but in a different species. For example, take
haemoglobin, and suppose it has changed 6 times in cow, 8 in bird, 7 in
rabbit and 9 in humans. If we analyze the intervals between mutations in
each of those sequences and build the distribution, we can easily see that
the most probable interval between changes in this particular protein,
considering these 4 species, is 2 years.
If two lineages have the same rate of substitution, does one have more
substitutions because it is older or because of chance? They have the same
average rate of change but, due to random variation in the intervals between
changes, they have accumulated different absolute numbers of changes in the
same time period.
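A small simulation makes this point concrete. It is only a sketch: it assumes that the waiting times between changes are exponentially distributed (a Poisson process), which is a common model but is not stated in these notes, and the rate of 7.5 changes per time unit is hypothetical. All four lineages share the same average rate and the same time period, yet their counts usually differ.

```python
import random


def count_changes(rate, duration, rng):
    """Count changes in `duration` when waiting times are exponential."""
    elapsed = 0.0
    changes = 0
    while True:
        elapsed += rng.expovariate(rate)  # random gap until the next change
        if elapsed > duration:
            return changes
        changes += 1


rng = random.Random(1)  # fixed seed so the run is repeatable
# Same hypothetical average rate (7.5 changes per time unit) for every lineage.
for species in ["cow", "bird", "rabbit", "human"]:
    print(species, count_changes(rate=7.5, duration=1.0, rng=rng))
```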

So we are able to calculate the average rate for haemoglobin. That is what
we use to build our molecular clock. But some species fall outside the most
likely rate, and those are the ones that fall off the straight line. That is
why molecular clocks are not so accurate. But they are accurate enough to
give us more information about molecular evolution.
Another thing to keep in mind is that in evolution we do not have a single
tool. We need more than one piece of evidence, and usually we analyze the
same problem with different techniques and from different points of view to
arrive at the same conclusion. Molecular clocks are a very strong tool, but
not the only one. That is the limitation of molecular clocks.
So the sloppiness prevents us from saying exactly how long it took to
produce the observed substitutions. But we can use a distribution to
describe how likely the value is to fall within any given range of path
lengths. WE CAN DEFINE A CONFIDENCE SET, or a confidence period of time: we
never say that two species split apart an exact number of years ago, but we
give a range within which we know the two species split apart. So a sloppy
clock is better than no clock.
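One simple way to turn an observed number of changes into a range of times is sketched below. It is an assumption-laden illustration, not the method of these notes: it treats the changes as a Poisson process (variance equal to the mean) and uses a normal approximation, which is rough when the count is small. The function name and the input values are hypothetical.

```python
import math


def split_time_range(n_changes, changes_per_year, z=1.96):
    """Point estimate and approximate 95% range for the split time."""
    spread = z * math.sqrt(n_changes)  # Poisson: variance equals the mean
    low = max(n_changes - spread, 0.0) / changes_per_year
    high = (n_changes + spread) / changes_per_year
    return n_changes / changes_per_year, low, high


# Hypothetical: 100 observed changes, 1e-6 expected changes per year
# summed over both lineages.
estimate, low, high = split_time_range(100, 1e-6)
print(f"{estimate / 1e6:.0f} My (range {low / 1e6:.0f} to {high / 1e6:.0f} My)")
```

With these made-up inputs the point estimate is 100 million years, with a range of roughly 80 to 120 million years: the answer is a period of time, not a single date.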
Controversy in the molecular clock hypothesis: if we are allowed to move any
point horizontally, we will be able to fit any set of data! To avoid this
problem there are tests that do not require knowledge of the divergence
time. The narrower the Gaussian bell curve is, the more precise the
calculation will be; if there is more variation, with some values that make
the curve wide and low, the calculation will be less precise.
That is why we need to know this: to know what margins we are working within
when we deal with particular species whose data do not fit the expectation
exactly.

4. Relative rate tests:


So we need tests to find out with higher accuracy the divergence time
between two species. These are called relative rate tests:
Sarich & Wilson's test
Tajima's one dimension method
Tests involving comparisons of duplicate genes
Suppose we have two species (A and B) and we want to know the divergence
time between them. If the accuracy is not high enough, we cannot be very
sure about the result. There are a few ways to assess the accuracy if we
know a third species (C) for which the data are really accurate. That is why
it is called a relative test: we use another species, for which we have
reliable data, as a reference to calculate the divergence time of the other
two.
Suppose we want to compare the rates in lineages A and B, using a third
species C as an outgroup reference (C should have branched off earlier than
the divergence of A and B). Let O be the common ancestor of A and B.
First we calculate the number of substitutions between A and B. It is the
number of substitutions between A and O plus the number of substitutions
between O and B: (A to O) + (O to B). In other words, the number of
substitutions between A and B (KAB) is equal to the sum of the substitutions
that have occurred from point O to point A and from point O to point B:
KAB = KOA + KOB
KAC, KAB and KBC can be directly estimated from the nucleotide sequences.
We can do the same when we compare each of those two sequences with the
third one (C), for which we have more accurate data. We then set up two more
equations, for the number of substitutions between A and C (A to O + O to
top + top to C) and between B and C (B to O + O to top + top to C), where
top is the top point of the tree:
KAC = KOA + KOC
KBC = KOB + KOC
Here KOC covers the path from O through the top point to C.
So we have these three equations, and some quantities that we can easily
calculate. The numbers of substitutions KAC, KBC and KAB can be estimated
directly from the nucleotide sequences (because we know all three
sequences), so we have a system of three equations with three unknown
parameters, KOA, KOB and KOC, which we can easily solve.
So what we have done is calculate the distance since speciation of those two
sequences taking into account a third sequence that we know. That is why it
is called a relative (comparative) method: we compare the sequences taking a
third sequence as a reference.
Once we have solved the system, we can check whether the rates of
substitution are equal or not, and so whether the clock hypothesis holds or
not.
We can now decide whether the rates of substitution are equal in lineages A
and B by comparing the values of KOA and KOB. If the molecular clock
hypothesis is correct, then KOA - KOB = 0. Since KOA - KOB = KAC - KBC, we
can use KAC - KBC, which is calculated directly from the sequences, as an
estimator of that difference.
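Solving the system can be sketched in Python as follows, assuming the three equations are KAB = KOA + KOB, KAC = KOA + KOC and KBC = KOB + KOC. The function name and the three pairwise distances are hypothetical. A real relative rate test would also ask whether the difference is statistically distinguishable from zero, which this sketch does not do.

```python
def branch_lengths(k_ab, k_ac, k_bc):
    """Solve KAB = KOA + KOB, KAC = KOA + KOC, KBC = KOB + KOC."""
    k_oa = (k_ab + k_ac - k_bc) / 2.0
    k_ob = (k_ab + k_bc - k_ac) / 2.0
    k_oc = (k_ac + k_bc - k_ab) / 2.0
    return k_oa, k_ob, k_oc


# Hypothetical pairwise distances (substitutions per site).
k_ab, k_ac, k_bc = 0.10, 0.25, 0.27
k_oa, k_ob, k_oc = branch_lengths(k_ab, k_ac, k_bc)

# Under a perfect clock this difference would be zero.
print(f"KOA - KOB = {k_oa - k_ob:+.3f}")
print(f"KAC - KBC = {k_ac - k_bc:+.3f}")
```

The two printed differences agree, which is why KAC - KBC can stand in for KOA - KOB without solving for the individual branch lengths.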

We are not going to go into detail, but we should know that this method can
be used with molecular clocks to calculate the distance between two
sequences when we have a third one for which we have plenty of data.
We have an example of the calculation of the differences between synonymous
and non-synonymous substitutions per 100 sites. With those values we can
check whether the difference is very close to zero or not, and with that we
can calculate the time distance between those two sequences.
