
DNA analysis in population genetics

757618S

Teachers: Tanja Pyhäjärvi and Jouni Aspi


Tanja.pyhajarvi@oulu.fi
Room B244
Intro and practicalities
Course description

– Deeper understanding of population genetics theory and coalescence theory


– Neutral theory and other theories related to genetic variation
– Mutation, linkage disequilibrium and recombination affecting genetic variation
– Demographic changes, mating system, selection, population structure and their
relationship to genetic polymorphism
– Identifying natural selection
Practical info

– Two lecturers:
– DNA sequence analysis and theory (Tanja Pyhäjärvi) 6 x 2 h
– Markers and applications, population structure (Jouni Aspi) 6 x 2 h
Course content and requirements
– 10 ECTS = 270 h of student work!
– 12 x 2 h lectures
– 6 h exercises (problem solving)
– 36 h computer exercises
– 3 h seminars
– Take-home exam
– 201 h independent work!!!
Independent work

– Reading
– Homework (Problem solving)
– A project
– You will choose a population genetic dataset
– Will do independent analysis (computer class)
– Write a report (5-10 pages)
– Present your work in a seminar
– Groups of 1-2 persons
– A teacher will be available for advice and supervision
– More info later…
Course literature

Population Genetics / Matthew B. Hamilton

Available in the library
Available as an e-book (Dawsonera) through the University library
Other resources

– Graham Coop's Population Genetics notes
– http://cooplab.github.io/popgen-notes/#the-coalescent-and-patterns-of-neutral-diversity
– Simulations
– http://www.coalescent.dk
– http://scit.us/redlynx/
Course evaluation

– Take-home exam (lectures AND literature): max 30 p


– Written reports: max 50 p
– Seminar presentations: max 10 p
– Activity: max 10 p
Points Grade
90-100 5
80-89 4
70-79 3
60-69 2
50-59 1
< 50 fail
Course Noppa-site

https://noppa.oulu.fi/noppa/kurssi/757618s/etusivu
– Timetable
– Lecture slides
– Material
– News
– Etc.
Format of my 6 x 2 h lectures

Repeat x 6:
– Step 1: Check the reading assignments in Noppa
– Step 2: Reading and thinking…
– Step 3: Lecture, consisting of:
– Discussion on the previous reading assignment
– Lecture on the topic
– New reading assignment
Warm-up Quiz
These you should already know

– https://play.kahoot.it/#/k/07c602db-7c57-4206-8497-ea51f150870f
How does each of these phenomena affect the amount of genetic diversity?

– Mutation
– Linkage disequilibrium
– Drift
– Inbreeding
– Selection
Drift

– This should be familiar from previous courses


– If not, remind yourself
Drift
– Wright-Fisher model

Finite population producing an infinite number of gametes
Alleles A1 (frequency p) and A2 (frequency q): gametes carry each allele in proportion to its frequency

(Figure: replicate Drosophila populations, from Hartl and Clark 1989)
• 107 replicate populations
• 8 males and 8 females transferred to form the next generation
• Frequency of the bw75 allele 0.5 at the beginning
• “Absorbing state”: once an allele is fixed or lost, drift alone cannot change its frequency

D. L. Hartl and A. G. Clark. 1989. Principles of Population Genetics. Sinauer, Sunderland, MA.
http://scit.us/redlynx/
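A minimal Python sketch of the Wright-Fisher sampling described above, set up to resemble the replicate experiment: 107 populations, 2N = 32 gene copies (8 males and 8 females), starting frequency 0.5. The number of generations (19), the seed and the function names are assumptions for illustration only. Each generation, every gene copy of the next generation is drawn independently from the current allele frequency, which is binomial sampling.

```python
import random

def wright_fisher(p0, n_genes, generations, rng):
    """Frequency of allele A1 over time in one Wright-Fisher population.

    Each generation, n_genes gene copies are drawn independently from the
    parental allele frequency, so the A1 count is binomial(n_genes, p).
    """
    count = round(p0 * n_genes)
    freqs = [count / n_genes]
    for _ in range(generations):
        p = count / n_genes
        # Draw each gene copy of the next generation independently
        count = sum(1 for _ in range(n_genes) if rng.random() < p)
        freqs.append(count / n_genes)
    return freqs

def replicate_populations(n_pops=107, n_genes=32, p0=0.5, generations=19, seed=1):
    """Final A1 frequency in many replicate populations under identical conditions."""
    rng = random.Random(seed)
    return [wright_fisher(p0, n_genes, generations, rng)[-1] for _ in range(n_pops)]

final = replicate_populations()
fixed = sum(1 for p in final if p == 1.0)
lost = sum(1 for p in final if p == 0.0)
print(f"fixed: {fixed}, lost: {lost}, still segregating: {len(final) - fixed - lost}")
```

A different seed changes which populations fix or lose the allele, but a frequency of 0 or 1 never changes again, which is the absorbing state.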
Reading assignments

– Hamilton: Section 3.6 Gene genealogies and the coalescent model


– Guiding questions in Noppa
Coalescence: what happens here?

(Figure: Rosenberg & Nordborg 2002)


Coalescent theory
Introduction

Some slides courtesy of Outi Savolainen


Basic Glossary

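– Gene genealogy
– Lineage
– Mutation
– Coalescent event
– MRCA (most recent common ancestor)
– TMRCA (time to the most recent common ancestor)
– Polymorphism
(Figure: a gene genealogy with time running from the past to the present)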
Polymorphism data
Individual 1 ATTGCGGTCCGTAATAATCTGT
Individual 2 ATGGCGGTCCGTAATAATCTGT
Individual 3 ATGGCTGTCCGTAACAATCTGT
Individual 4 ATGGCTGTCCGTAACAATCTGT
Individual 5 ATGGCTGTCCGTAATAATCTGT
Individual 6 ATGGCTGTCCGTAATAATCTGT
(Figure labels: nucleotide diversity; SNPs, single nucleotide polymorphisms)
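As a worked example, the Python sketch below counts the SNPs in the six sequences above and computes nucleotide diversity as the average number of pairwise differences divided by the sequence length (one common definition; the function names are illustrative).

```python
from itertools import combinations

# The six aligned sequences from the polymorphism data above
sequences = [
    "ATTGCGGTCCGTAATAATCTGT",
    "ATGGCGGTCCGTAATAATCTGT",
    "ATGGCTGTCCGTAACAATCTGT",
    "ATGGCTGTCCGTAACAATCTGT",
    "ATGGCTGTCCGTAATAATCTGT",
    "ATGGCTGTCCGTAATAATCTGT",
]

def segregating_sites(seqs):
    """1-based positions where more than one nucleotide occurs (the SNPs)."""
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

snps = segregating_sites(sequences)
k_hat = mean_pairwise_differences(sequences)
pi = k_hat / len(sequences[0])  # nucleotide diversity per site

print("SNP positions:", snps)                     # [3, 6, 15]
print("mean pairwise differences:", k_hat)        # 1.4
print(f"nucleotide diversity per site: {pi:.4f}")  # 0.0636
```

The 15 pairs of sequences differ at 21 sites in total, so the average is 21/15 = 1.4 differences, or about 0.064 per site over 22 sites.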
Why coalescent?

– We observe patterns of genetic diversity


– And want to draw conclusions from the diversity about, e.g.:
– Population size
– Structure
– Demographic events in the past
– Effect of natural selection
– …
Peculiarities of genetic polymorphism data
– Result of a stochastic process of sampling gametes from one generation to the next
+
– Random mutations

Under exactly the same conditions, the outcome (the pattern of polymorphism) can be different

Coalescent theory, the coalescent process and coalescent simulations are tools to account for these random processes and to connect them with data
Example
Resurrecting Surviving Neandertal Lineages from Modern Human Genomes (Vernot & Akey 2014, Science)
– Non-African humans inherit ~1 to 3% of their genomes from Neanderthal ancestors
– The authors developed a method to identify Neanderthal sequences
– They used coalescent simulations to test the method and to identify the admixture model that explains the observed haplotype structure
Wright-Fisher population model

– Mating is _________
– Non-_____________ generations
– ______________ population size
– No ___________
– No ___________
Sampling process, forward

In nature, there is variance in offspring number. Even in a population of constant size, each individual does not produce exactly 1 offspring; it produces 1 on average.

(Figure panels, each running from past to present: the whole population; the present generation; the observed samples; the sample)
Coalescent process

– Models only the genealogy of the sample, rather than the genealogy of all individuals in the population's history
– Ignores any lineage that did not make it to the present
– Originally developed by Kingman (1982); Hudson and Tajima also contributed
Ne

– Effective population size (vs. census size)


“the number of breeding individuals in an idealized population that would
show the same amount of dispersion of allele frequencies under random
genetic drift or the same amount of inbreeding as the population under
consideration” (Sewall Wright)

– Another definition:
– Coalescent effective population size
“the Ne of a W-F population with the same rate of coalescence as observed”
– Can be estimated, e.g., from polymorphism data


Coalescent process

– Backwards in time, start from the sampled individuals
– Offspring pick their parents
– When two pick the same parent: it is a coalescent event!
– When Ne is large and n (sample size) is small:
– Continuous time
– Maximum of one coalescence event per generation (often none)
– Which is more likely: that two individuals have the same parent when Ne = 10 or when Ne = 1000?
– http://www.coalescent.dk
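The backward picture can be simulated directly. This sketch assumes a diploid population of N individuals, i.e. 2N gene copies, where each of two sampled gene copies picks its parental copy uniformly at random every generation; it reports how many generations pass on average until both pick the same parent, for N = 10 and N = 1000. Function names, seeds and replicate counts are arbitrary choices.

```python
import random

def generations_until_same_parent(n_genes, rng):
    """Trace two gene copies backwards in time. Each generation, both pick a
    parental copy uniformly at random among n_genes copies; return the number
    of generations until they pick the same one (a coalescent event)."""
    t = 0
    while True:
        t += 1
        if rng.randrange(n_genes) == rng.randrange(n_genes):
            return t

def mean_waiting_time(N, replicates=500, seed=42):
    """Average waiting time over replicates, for N diploids (2N gene copies)."""
    rng = random.Random(seed)
    times = [generations_until_same_parent(2 * N, rng) for _ in range(replicates)]
    return sum(times) / len(times)

for N in (10, 1000):
    print(f"N = {N}: average generations until a shared parent ~ {mean_waiting_time(N):.1f}")
```

The gap between the two averages gives a direct answer to the question about Ne = 10 versus Ne = 1000.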
Graphical version of the principle of the coalescent
Random process
– Samples of coalescent trees
Add mutations, get polymorphism
– Note that mutations are independent of the genealogy when they are neutral
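A sketch of "genealogy first, then mutations" in Python. It assumes the continuous-time approximation (with k lineages, each of the k(k-1)/2 pairs coalesces with probability 1/2N per generation), a mutation rate mu per generation for the whole locus, and infinitely many sites so that every mutation creates a new SNP. The parameter values and function names are illustrative only. Because the genealogy is drawn without any reference to mutations, this construction only applies to neutral mutations.

```python
import random

def random_genealogy(n, N, rng):
    """Branch lengths (in generations) of a coalescent genealogy for n gene
    copies sampled from N diploids (2N gene copies)."""
    start = [0.0] * n          # time at which each current lineage's branch begins
    time = 0.0
    branches = []
    while len(start) > 1:
        k = len(start)
        rate = (k * (k - 1) / 2) / (2 * N)      # each pair: 1/2N per generation
        time += rng.expovariate(rate)           # waiting time while there are k lineages
        i, j = sorted(rng.sample(range(k), 2))  # two random lineages merge
        branches.append(time - start.pop(j))    # pop the larger index first
        branches.append(time - start.pop(i))
        start.append(time)                      # the ancestral lineage begins here
    return branches

def count_mutations(branch_length, mu, rng):
    """Poisson number of neutral mutations on one branch, counted by adding
    exponential waiting times between mutations along the branch."""
    count, t = 0, rng.expovariate(mu)
    while t < branch_length:
        count += 1
        t += rng.expovariate(mu)
    return count

def segregating_sites(n, N, mu, rng):
    branches = random_genealogy(n, N, rng)                      # genealogy first...
    return sum(count_mutations(b, mu, rng) for b in branches)  # ...then mutations

rng = random.Random(7)
print([segregating_sites(n=6, N=1000, mu=1e-3, rng=rng) for _ in range(10)])
```

The ten replicates use identical parameters, yet the number of SNPs differs from run to run, as both the tree and the mutations are random.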
What the coalescent is…

• A tool to model the random processes involved in genealogical trees
• A way to model evolutionary processes together with drift
• A framework for thinking about population genetics
• Allows hypothesis testing
…and is not

– They are not phylogenetic trees
– Their purpose is not to infer phylogenetic relationships, i.e. which other species this species derived from
– They are not the whole truth
– How do deviations from the assumptions change the coalescent process?
Reminder
– Phylogenetic trees are depictions of
species history
– If divergence was long ago, individual
gene trees will be representative of
species trees
– Genealogies in a population are highly variable; they are of interest because they inform about processes, not in themselves
(Figure: Rosenberg and Nordborg)
Coalescent theory
Specific questions
Coalescent tree (figure: Hedrick 2005)
Questions on basic coalescent

– We think of a population of size N (2N gene copies)
– But normally only k sequences are examined (k << 2N)
– Questions:
– For how long are there k lineages? How long do we wait for a coalescent event?
– What is the general structure of genealogies?
– What is the time to the most recent common ancestor?
– How big is the tree (total branch length)?
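To explore these questions numerically, the sketch below draws the waiting times T_j while there are j lineages (j = k, ..., 2), assuming continuous time and that each pair of lineages coalesces with probability 1/2N per generation. The TMRCA is the sum of the T_j, and the total branch length is the sum of j·T_j, because j branches run in parallel during T_j. N = 1000, the sample sizes and the function names are arbitrary; comparing the simulated averages with your own derivations is left as an exercise.

```python
import random

def coalescent_waiting_times(k, N, rng):
    """Waiting times T_j (generations) while there are j lineages, j = k..2,
    for k gene copies sampled from N diploids (2N gene copies)."""
    return {j: rng.expovariate((j * (j - 1) / 2) / (2 * N)) for j in range(k, 1, -1)}

def tree_summaries(k, N, replicates=5000, seed=3):
    """Average TMRCA (sum of T_j) and total branch length (sum of j * T_j)."""
    rng = random.Random(seed)
    tmrca_sum = length_sum = 0.0
    for _ in range(replicates):
        t = coalescent_waiting_times(k, N, rng)
        tmrca_sum += sum(t.values())
        length_sum += sum(j * tj for j, tj in t.items())
    return tmrca_sum / replicates, length_sum / replicates

for k in (2, 5, 10, 50):
    tmrca, length = tree_summaries(k, N=1000)
    print(f"k = {k:2d}: mean TMRCA ~ {tmrca:6.0f}, "
          f"mean total branch length ~ {length:6.0f} generations")
```

Notice how slowly the TMRCA grows with k compared with the total branch length.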
Finding ancestors
Probability that two alleles in generation t have the same ancestral allele in generation t-1 is: ______________

(Figure: Hamilton 2012, book)

In principle, many alleles can find a shared ancestor in the same generation; we allow only one coalescence at a time

This is reasonable when N is large and the sample size is small (Kingman's important result)
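Geometric distribution

– X = number of trials needed to get the first success, when the probability of success in each trial is p
– $P(X = k) = (1-p)^{k-1} p$
– $E(X) = 1/p$, $\mathrm{Var}(X) = (1-p)/p^2$ ($\approx 1/p^2$ when p is small)
– How many throws does it take to get the first 6 when you throw a die?

– From basic probability theory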
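The dice question can be checked by simulation. This Python sketch counts throws of a fair die until the first 6 (p = 1/6) and compares the sample mean and variance with 1/p and (1-p)/p^2. The seed and the number of repetitions are arbitrary.

```python
import random

def throws_until_six(rng):
    """Number of throws of a fair die needed to get the first 6 (the success)."""
    throws = 1
    while rng.randint(1, 6) != 6:
        throws += 1
    return throws

rng = random.Random(2024)
samples = [throws_until_six(rng) for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)

p = 1 / 6
print(f"simulated mean {mean:.2f} vs 1/p = {1 / p:.2f}")
print(f"simulated variance {var:.1f} vs (1-p)/p^2 = {(1 - p) / p**2:.1f}")
```

The theoretical values are 6 throws and a variance of 30; the simulated ones should be close.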


Two lineages, probability of coalescence
– For two lineages, the waiting time to coalescence follows a geometric distribution with parameter 1/2N

– The probability that two genes coalesce exactly t generations ago is
$P(T = t) = \left(1 - \frac{1}{2N}\right)^{t-1} \frac{1}{2N}$

– This can be approximated using the exponential function:
$P(T = t) \approx \frac{1}{2N} e^{-(t-1)/2N}$

– Using the exponential approximation, the cumulative probability that a pair of lineages coalesces at or before generation t is
$P(T \le t) \approx 1 - e^{-t/2N}$
(Hamilton 2012, book)
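A small numerical check of the exponential approximation, assuming N = 100 for illustration: the exact geometric probability (1 - 1/2N)^(t-1) · (1/2N), its exponential approximation (1/2N) · e^(-(t-1)/2N), and the corresponding cumulative probabilities, exact 1 - (1 - 1/2N)^t versus approximate 1 - e^(-t/2N). The function names are hypothetical.

```python
import math

def p_exact(t, N):
    """Geometric probability that two gene copies coalesce exactly t generations ago."""
    return (1 - 1 / (2 * N)) ** (t - 1) * (1 / (2 * N))

def p_approx(t, N):
    """Exponential approximation of p_exact."""
    return (1 / (2 * N)) * math.exp(-(t - 1) / (2 * N))

def cumulative_exact(t, N):
    """Probability of coalescence at or before generation t (geometric)."""
    return 1 - (1 - 1 / (2 * N)) ** t

def cumulative_approx(t, N):
    """Exponential approximation of cumulative_exact."""
    return 1 - math.exp(-t / (2 * N))

N = 100
for t in (1, 50, 200, 1000):
    print(f"t = {t:4d}: P(T=t) exact {p_exact(t, N):.6f}, approx {p_approx(t, N):.6f}; "
          f"P(T<=t) exact {cumulative_exact(t, N):.4f}, approx {cumulative_approx(t, N):.4f}")
```

The two versions agree closely as long as 2N is large, which is why the continuous-time (exponential) description is used in coalescent theory.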
Expectations from the exponential distribution
– The expected waiting time is the inverse of the probability of occurrence per generation
– Thus the expected waiting time for coalescence event is:______
– Variance of the waiting time is:____________
– What to expect from independent coalescent trees?
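Independent (unlinked) loci are usually modeled as having independent genealogies. The sketch below, assuming the same continuous-time model for a pair of lineages (coalescence rate 1/2N per generation), draws pairwise TMRCAs at several independent loci and then shows how much an average over many loci varies compared with a single locus. N = 1000, the seed, the replicate counts and the function names are arbitrary.

```python
import random
import statistics

def pairwise_tmrca(N, rng):
    """Pairwise coalescence time (generations) at one locus: in continuous time,
    two lineages coalesce at rate 1/2N per generation."""
    return rng.expovariate(1 / (2 * N))

def spread_of_average(n_loci, N, rng, repeats=200):
    """Standard deviation of the multi-locus average TMRCA across repeated
    samples of n_loci independent loci."""
    averages = [statistics.fmean(pairwise_tmrca(N, rng) for _ in range(n_loci))
                for _ in range(repeats)]
    return statistics.stdev(averages)

rng = random.Random(11)
N = 1000
print("TMRCA at 10 independent loci:", [round(pairwise_tmrca(N, rng)) for _ in range(10)])
for n_loci in (1, 10, 100, 1000):
    print(f"{n_loci:4d} loci: SD of the average TMRCA ~ "
          f"{spread_of_average(n_loci, N, rng):.0f} generations")
```

Single-locus TMRCAs differ widely between loci, while the multi-locus average becomes much more stable, which is one reason analyses often combine many loci.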
Reading assignments

– Hamilton: 5.1, 5.3 (Mutation models for DNA sequences), 5.5, 8.2 (skip divergence), pp. 248-250
– Guiding questions in Noppa
