Basic probability
Events are possible outcomes from some
random process
Example
Consider the cross AaBbCc X aaBbCc
The OR rule
Again, consider two possible events, E1 and E2.
If these events are NONOVERLAPPING (they contain
no common elements), then Pr(E1 or E2) = Pr(E1) +
Pr( E2)
Hence, OR = add
Example: What is the probability that a genotype is
A-, i.e., that it is AA or Aa?
The events genotype = AA and genotype = Aa
are nonoverlapping
Hence, Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa)
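For example, under Hardy-Weinberg frequencies with freq(A) = p and freq(a) = q, Pr(AA) = p^2 and Pr(Aa) = 2pq, so Pr(A-) = p^2 + 2pq.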
Conditional Probability
It is ALWAYS true that Pr(A, B) = Pr(A|B) Pr(B).
Note that
P(A|B) = P(A,B)/P(B)
Bayes Theorem
Suppose an unobservable random variable (RV) takes on
values b1 .. bn
Suppose that we observe the outcome A of an RV correlated
with b. What can we say about b given A?
Bayes' theorem:
Pr(bj | A) = Pr(A | bj) Pr(bj) / Pr(A),  where Pr(A) = sum over i of Pr(A | bi) Pr(bi)
Here, for the worked example, [0.60 * 0.023]/0.12 = 0.115
[Slide table: Freq(genotype) values 0.5, 0.3, 0.2 with a second column of values 0.3, 0.6, 0.9.]
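A minimal R sketch of this calculation, using illustrative priors and conditional probabilities (not necessarily those from the table above):

prior <- c(0.5, 0.3, 0.2)    # Pr(b1), Pr(b2), Pr(b3)
like  <- c(0.3, 0.6, 0.9)    # Pr(A | b1), Pr(A | b2), Pr(A | b3)
prA   <- sum(like * prior)   # Pr(A), summed over the bi
post  <- like * prior / prA  # Bayes' theorem: Pr(bi | A)
post                         # posterior probabilities (sum to 1)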
pi > 0,   sum over i of pi = 1
Unit normal
(mean 0, variance 1)
Mean (μ) = peak of the distribution
Covariances
Cov(x,y) = E[(x - μx)(y - μy)]
         = E[x*y] - E[x]*E[y]
Correlation
Cov = 10 tells us nothing about the strength of an
association
What is needed is an absolute measure of association
This is provided by the correlation, r(x,y):
r(x,y) = Cov(x,y) / sqrt[ Var(x) * Var(y) ]
Regressions
Consider the best (linear) predictor of y given we know x:
predicted y = μy + b(x - μx),   where the slope is b = Cov(x,y)/Var(x)
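A short R sketch of these quantities, using made-up data:

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
cov(x, y)                 # covariance (scale-dependent)
cor(x, y)                 # correlation: Cov(x,y)/sqrt(Var(x)*Var(y))
b <- cov(x, y) / var(x)   # slope of the best linear predictor of y given x
a <- mean(y) - b * mean(x)
c(intercept = a, slope = b)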
[Figure: example scatterplots showing associations of increasing strength, r2 = 0.3, 0.6, 0.9, and 1.0.]
Distribution-based estimation
Maximum likelihood estimation
MLE
REML
More in Lynch & Walsh (book) Appendix 3
Bayesian
Maximum Likelihood
p(x1, ..., xn | θ) = density of the observed data (x1, ..., xn)
given the (unknown) distribution parameter(s) θ
Fisher suggested the method of maximum likelihood -- given
the data (x1, ..., xn), find the value(s) of θ that
maximize p(x1, ..., xn | θ)
We usually express p(x1, ..., xn | θ) as a likelihood
function l(θ | x1, ..., xn) to remind us that it is dependent
on the observed data
The Maximum Likelihood Estimator (MLE) of θ is the
value(s) that maximize the likelihood function l given
the observed data x1, ..., xn.
[Figure: likelihood surface l(θ | x); the MLE of θ is the value at which the surface peaks.]
This is formalized by looking at the log-likelihood surface,
L = ln[ l(θ | x) ]. Since ln is a monotonic function, the
value of θ that maximizes l also maximizes L.
The curvature of the likelihood surface in the neighborhood of
the MLE informs us as to the precision of the estimator.
A narrow peak = high precision. A broad peak = low precision.
The larger the curvature, the smaller the variance.
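A minimal R sketch: the MLE of the mean and standard deviation of a normal sample, found by numerically minimizing the negative log-likelihood (equivalent to maximizing L); the data are simulated and the parameterization is an assumption for illustration.

set.seed(1)
x <- rnorm(50, mean = 5, sd = 2)   # simulated data

# negative log-likelihood for theta = (mean, log sd)
negL <- function(theta, x) {
  -sum(dnorm(x, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}

fit <- optim(c(0, 0), negL, x = x, hessian = TRUE)
c(mean = fit$par[1], sd = exp(fit$par[2]))  # the MLEs
solve(fit$hessian)  # inverse curvature: approximate sampling (co)variances of (mean, log sd)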
Bayesian Statistics
An extension of likelihood is Bayesian statistics
Instead of simply producing a point estimate (e.g., the
MLE), the goal is to estimate the entire distribution
for the unknown parameter θ given the data x
p(θ | x) = C * l(x | θ) p(θ)
p(θ | x) is the posterior distribution for θ given the data x
l(x | θ) is just the likelihood function
p(θ) is the prior distribution on θ.
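A small R sketch of this idea for a single unknown parameter: a beta prior on an allele frequency θ combined with a binomial likelihood, evaluated on a grid (the counts and prior parameters are illustrative assumptions).

theta <- seq(0.001, 0.999, length.out = 500)   # grid of parameter values
prior <- dbeta(theta, 2, 2)                    # prior p(theta)
lik   <- dbinom(12, size = 20, prob = theta)   # likelihood l(x | theta): 12 successes in 20 trials
post  <- prior * lik
post  <- post / sum(post * diff(theta)[1])     # normalization supplies the constant C
sum(theta * post * diff(theta)[1])             # posterior mean (compare with the MLE, 12/20 = 0.6)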
Bayesian Statistics
Why Bayesian?
Exact for any sample size
Marginal posteriors
Efficient use of any prior information
MCMC (such as Gibbs sampling) methods
Priors quantify the strength of any prior information.
Often these are taken to be diffuse (with a high
variance), so the prior weight on θ is spread over a wide
range of possible values.
Lecture 1:
Intro/refresher in
Matrix Algebra
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016
Topics
Definitions, dimensionality, addition,
subtraction
Matrix multiplication
Inverses, solving systems of equations
Quadratic products and covariances
The multivariate normal distribution
Eigenstructure
Basic matrix calculations in R
The Singular Value Decomposition (SVD)
Column vector: (3 x 1).   Row vector: (1 x 4).
General Matrices
Usually written in bold uppercase, e.g. A, C, D
Example shapes: a square matrix; a (3 x 2) matrix
Partitioned Matrices
It will often prove useful to divide (or partition) the
elements of a matrix into a matrix whose elements are
themselves matrices.
[Example from the slide: partitioned forms of X^T X, X^T y, and (X^T X)^-1 X^T y.]
Matrix Multiplication:
The order in which matrices are multiplied affects
the matrix product: in general, AB ≠ BA.
For the product of two matrices to exist, the matrices
must conform. For AB, the number of columns of A must
equal the number of rows of B.
The matrix C = AB has the same number of rows as A
and the same number of columns as B.
Example
Matrix multiplication in R
R fills in the matrix from the list c() by columns,
here with 2 rows (nrow=2)
Entering A or B displays what was
entered (always a good thing to check)
The command %*% is the R code
for the multiplication of two matrices
On your own: What is the matrix resulting from BA?
What is A if nrow=1 or nrow=4 is used?
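The slide shows this as a screenshot; a sketch of the corresponding commands (the entries of A and B below are assumed for illustration, not taken from the slide):

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # filled column by column: A is 2 x 3
B <- matrix(c(1, 0, 2, 1, 1, 3), nrow = 3)   # B is 3 x 2
A                                            # echo what was entered
B
A %*% B                                      # matrix product: (2 x 3) %*% (3 x 2) = (2 x 2)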
Solving equations
The identity matrix I, with elements Iij = 1 for i = j and 0 otherwise
Inversion in R
solve(A) computes A^-1
det(A) computes the determinant of A
Using the A entered earlier: compute A^-1, check that A^-1 A = I,
and compute the determinant of A
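A sketch of these commands (the 2 x 2 matrix below is an assumed example, not necessarily the A entered earlier):

A <- matrix(c(4, 1, 2, 3), nrow = 2)
Ainv <- solve(A)   # computes the inverse of A
Ainv %*% A         # check: should return the identity matrix I
det(A)             # determinant of A (nonzero, so A is invertible)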
Homework
Put the following system of equations in matrix
form, and solve using R
3x1 + 4x2 + 4 x3 + 6x4 = -10
9x1 + 2x2 - x3 - 6x4 = 20
x1 + x2 + x3 - 10x4 = 2
2x1 + 9x2 + 2x3 - x4 = -10
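One way to set this up (a sketch: write the system as Ax = c, then solve):

A <- matrix(c(3, 4,  4,   6,
              9, 2, -1,  -6,
              1, 1,  1, -10,
              2, 9,  2,  -1), nrow = 4, byrow = TRUE)  # coefficient matrix
c_vec <- c(-10, 20, 2, -10)                            # right-hand side
solve(A, c_vec)                                        # the solution vector x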
Variance-Covariance matrix
A very important square matrix is the
variance-covariance matrix V associated with
a vector x of random variables.
Vij = Cov(xi, xj), so that the i-th diagonal
element of V is the variance of xi, and the
off-diagonal elements are covariances
V is a symmetric, square matrix
The trace
The trace, tr(A) or trace(A), of a square matrix
A is simply the sum of its diagonal elements
The importance of the trace is that it equals
the sum of the eigenvalues of A, tr(A) = Σi λi
For a covariance matrix V, tr(V) measures the
total amount of variation in the variables
λi / tr(V) is the fraction of the total variation
in x contained in the linear combination ei^T x, where
ei, the i-th principal component of V, is also the
i-th eigenvector of V (V ei = λi ei)
Eigenstructure in R
eigen(A) returns the eigenvalues and vectors of A
For the example matrix: trace = 60, and PC 1 accounts for
34.4/60 = 57% of all the variation
PC 1 is the linear combination 0.400*x1 - 0.139*x2 + 0.906*x3
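A sketch of the command on an assumed 3 x 3 covariance matrix (not the one from the slide):

V <- matrix(c(10,  3,  2,
               3, 20,  5,
               2,  5, 30), nrow = 3, byrow = TRUE)  # a symmetric covariance matrix

ev <- eigen(V)
ev$values                      # eigenvalues lambda_i
ev$vectors[, 1]                # first eigenvector = loadings of PC 1
sum(diag(V))                   # trace = total variance = sum of the eigenvalues
ev$values[1] / sum(ev$values)  # fraction of variation accounted for by PC 1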
"
"
x1, x2 equal variances,
uncorrelated
"
"
39
Eigenstructure of V
The direction of the largest axis of
variation is given by the unit-length
vector e1, the 1st eigenvector of V.
[Figure: axes of variation labeled λ1 e1 and λ2 e2.]
Principal components
The principal components (or PCs) of a covariance
matrix define the axes of variation.
A data set for soybeans grown in New York (Gauch 1992) gives the
GE matrix as
Additional references
Lynch & Walsh Chapter 8 (intro to
matrices)
Online notes:
Appendix 4 (Matrix geometry)
Appendix 5 (Matrix derivatives)
Lecture 3:
Linear and Mixed Models
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016
y = Xβ + e
y = vector of observed dependent values
X = design matrix: observations of the variables in the
assumed linear model
β = vector of unknown parameters to estimate
e = vector of residuals (deviations from the model fit),
e = y - Xβ
y = Xβ + e
The solution for β depends on the covariance structure
(= covariance matrix) of the vector e of residuals
Ordinary least squares (OLS)
OLS: e ~ MVN(0, σ² I)
Residuals are homoscedastic and uncorrelated,
so that we can write the cov matrix of e as Cov(e) = σ²I
The OLS estimate: OLS(β) = (X^T X)^-1 X^T y
Generalized least squares (GLS)
GLS: e ~ MVN(0, V)
Residuals are heteroscedastic and/or dependent:
GLS(β) = (X^T V^-1 X)^-1 X^T V^-1 y
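A minimal R sketch of both estimators (the data and the residual covariance matrix V are simulated assumptions; lm() would give the same answer as the OLS line):

set.seed(2)
n <- 20
x <- rnorm(n)
X <- cbind(1, x)            # design matrix with an intercept column
y <- 2 + 3 * x + rnorm(n)

# OLS: b = (X'X)^-1 X'y
b_ols <- solve(t(X) %*% X, t(X) %*% y)

# GLS with an assumed (heteroscedastic) residual covariance matrix V
V <- diag(seq(0.5, 2, length.out = n))
b_gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)

cbind(b_ols, b_gls)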
BLUE
Both the OLS and GLS solutions are also
called the Best Linear Unbiased Estimator (or
BLUE for short)
Whether the OLS or GLS form is used
depends on the assumed covariance
structure for the residuals
Special case of Var(e) = σ²e I -- OLS
All others, i.e., Var(e) = R -- GLS
Linear Models
One tries to explain a dependent variable y as a linear
function of a number of independent (or predictor)
variables.
A multiple regression is a typical linear model:
y = α + β1 x1 + β2 x2 + ... + βn xn + e
Here e is the residual, or deviation between the true
value observed and the value predicted by the linear
model.
The (partial) regression coefficients βi are interpreted
as follows: a unit change in xi while holding all
other variables constant results in a change of βi in y
Linear Models
As with a univariate regression (y = a + bx + e), the model
parameters are typically chosen by least squares,
wherein they are chosen to minimize the sum of
squared residuals, Σ ei²
This unweighted sum of squared residuals assumes
an OLS error structure, so all residuals are equally
weighted (homoscedastic) and uncorrelated
If the residuals differ in variances and/or some are
correlated (GLS conditions), then we need to minimize
the weighted sum e^T V^-1 e, which removes correlations and
gives all residuals equal variance.
Regression model
A model of the form y_ijk = μ + si + dij + β x_ijk + e_ijk, where
si = effect of sire i
dij = effect of dam j crossed to sire i
x_ijk = age of the kth offspring from the i x j cross
In-class Exercise
Suppose you measure height and sprint speed for
five individuals, with heights (x) of 9, 10, 11, 12, 13
and associated sprint speeds (y) of 60, 138, 131, 170, 221
1) Write in matrix form (i.e., the design matrix
X and vector of unknowns) the following models
y = bx
y = a + bx
y = bx2
y = a + bx + cx2
2) Using the X and y associated with these models,
compute the OLS BLUE, β = (X^T X)^-1 X^T y, for each
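For instance, for the second model (y = a + bx) the OLS computation in R is (a sketch; the other models just change the columns of X):

x <- c(9, 10, 11, 12, 13)
y <- c(60, 138, 131, 170, 221)
X <- cbind(1, x)                          # design matrix for y = a + b x
beta <- solve(t(X) %*% X) %*% t(X) %*% y  # OLS estimates of a and b
beta
# for y = a + bx + cx^2, use X <- cbind(1, x, x^2), and so on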
Sample Variances/Covariances
The residual variance can be estimated as Var(e) = Σ êi² / (n - p),
the residual sum of squares divided by its degrees of freedom
(n observations, p estimated parameters)
Polynomial Regressions
GLM can easily handle any function of the observed
predictor variables, provided the parameters to estimate
are still linear, e.g. y = α + β1 f(x) + β2 g(x) + ... + e
Quadratic regression: y = α + β1 x + β2 x² + e
Interaction Effects
Interaction terms (e.g. sex x age) are handled similarly
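A sketch of how such terms are specified with R's lm() (the data frame here is simulated purely for illustration):

set.seed(3)
dat <- data.frame(x = rnorm(40),
                  sex = gl(2, 20, labels = c("F", "M")),
                  age = runif(40, 1, 10))
dat$y <- 1 + 2 * dat$x - 0.5 * dat$x^2 + 0.3 * dat$age + rnorm(40)

fit_quad <- lm(y ~ x + I(x^2), data = dat)  # quadratic regression
fit_int  <- lm(y ~ sex * age, data = dat)   # main effects plus a sex x age interaction

coef(fit_quad)
coef(fit_int)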
Model diagnostics
It's all about the residuals
Plot the residuals
Environmental effects
Consider yield data measured over several years in a
series of plots.
Standard to treat year-to-year variation at a specific
site as being random effects
Often the plot effects (mean value over years) are
also treated as random.
For example, consider plants grouped in growing
region i, location j within that region, and year
(season) k for that region-location combination. The
environmental effect is then E_ijk = Ri + Lij + eijk
Random models
With a random model, one is assuming that
not all levels of a factor are observed.
Rather, the observed levels are a subset of values
drawn from some underlying distribution
Random models
Let's go back to treating yearly effects as random
If we assume these are uncorrelated, we only use one
degree of freedom, but this makes assumptions about
the covariance structure
Standard: Uncorrelated
Option: some sort of autocorrelation process, say with a
yearly decay of r (must also be estimated)
y_ij = μ + si + e_ij
Conversely, if we are not only interested in these
10 particular sires but also wish to make some
inference about the population from which they
were drawn (such as the additive variance, since
σ²A = 4σ²s), then the si are random effects. In this
case we wish to estimate μ and the variances
σ²s and σ²w. Since 2si also estimates (or predicts)
the breeding value for sire i, we also wish to
estimate (predict) these as well. Under a
random-effects interpretation, we write the model as
y_ij = μ + si + e_ij,   σ²(e) = σ²e I,   σ²(s) = σ²A A
Identifiability
Recall that a fixed effect is said to be
estimable if we can obtain a unique estimate
for it (either because X is of full rank or when
using a generalized inverse it returns a
unique estimate)
Lack of estimability arises because the experimental
design confounds effects
y = Xβ + Zu + e
X = incidence matrix for the fixed effects β
u = vector of random effects, such as individual
breeding values (to be estimated)
y = Xβ + Zu + e
X = incidence matrix for the fixed effects
u = vector of random effects
Observe y, X, Z.
Estimate the fixed effects β
Estimate the random effects u, e
r² = SSM/SST = 1 - SSE/SST
(SST = total, SSM = model, and SSE = error sums of squares)
Hypothesis testing
Provided the residual errors in the model are MVN, then for a model
with n observations and p estimated parameters, the scaled residual sum
of squares follows a chi-square distribution with n - p degrees of freedom.
The difference in the error sums of squares for the full and reduced
models provides a test of whether the model fits are the same
Mixed models
Mixture models
Under a mixture model, an observation potentially
comes from one of several different distributions, so
that the density function is π1 φ1 + π2 φ2 + π3 φ3
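A sketch in R of a two-component normal mixture density (the mixing proportions and component parameters are illustrative assumptions):

x <- seq(-4, 8, length.out = 400)
pi1 <- 0.7; pi2 <- 0.3                      # mixing proportions (sum to 1)
dens <- pi1 * dnorm(x, mean = 0, sd = 1) +  # pi1 * phi1(x)
        pi2 * dnorm(x, mean = 4, sd = 1.5)  # pi2 * phi2(x)
plot(x, dens, type = "l", ylab = "mixture density")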
L(μ, V | x) = (2π)^(-n/2) |V|^(-1/2) exp[ -(1/2) (x - μ)^T V^(-1) (x - μ) ]
Lecture 4:
Introduction to Quantitative
Genetics
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016
Basic model: P = G + E
G = genotypic value, E = environmental value
Var(P) = Var(G) + Var(E)
Genotypic values
It will prove very useful to decompose the genotypic
value into the difference between homozygotes (2a) and
a measure of dominance (d or k = d/a)
Genotype:  aa      Aa      AA
Value:     C - a   C + d   C + a
Computing a and d
Suppose a major locus influences plant height, with
the following values
Genotype:     aa    Aa    AA
Trait value:  10    15    16
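For these values the midpoint is C = (10 + 16)/2 = 13, so a = (16 - 10)/2 = 3, d = 15 - 13 = 2, and k = d/a = 2/3 (partial dominance).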
Genotype:   aa      Aa      AA
Value:      C - a   C + d   C + a
Frequency:  q²      2pq     p²
Mean genotypic value = C + a(p - q) + 2pq d
(the 2pq d term is the contribution from heterozygotes)
Genotype:   aa      Aa      AA
Value:      C - a   C + d   C + a
Frequency:  1/4     1/2     1/4
Mean = C + d/2
Genotype:           aa      Aa      AA
Value:              C - a   C + d   C + a
Frequency (RILs):   q       0       p
Mean for RILs = C + a(p - q)
Note this is independent of the amount of dominance (d)
Random mating
Consider the average effect of allele A when a parent is randomly
mated to another individual from its population
Allele from other parent:  A (probability p)  ->  genotype AA, value C + a
                           a (probability q)  ->  genotype Aa, value C + d
Random mating
Now suppose the parent contributes allele a
Allele from other parent:  A (probability p)  ->  genotype Aa, value C + d
                           a (probability q)  ->  genotype aa, value C - a
E(α) = p αA + q αa = pqα - qpα = 0
The average effect of a random allele is zero,
hence average effects are deviations from the
mean
Dominance deviations
Fisher (1918) decomposed the contribution
to the genotypic value from a single locus as
Gij = μ + αi + αj + δij
Here, μ is the mean (a function of p)
The αi are the average effects
Hence, μ + αi + αj is the predicted genotypic
value given the average effect (over all
genotypes) of alleles i and j.
The dominance deviation associated with
genotype Gij is the difference between its true
value and its value predicted from the sum of
average effects (essentially a residual)
Gij = μG + αi + αj + δij
αi = average contribution to the genotypic value from allele i
Mean value μG = Σ Gij Freq(AiAj)
Gij = μG + αi + αj + δij
Since parents pass along single alleles to their
offspring, the αi (the average effect of allele i)
represent these contributions
The average effect for an allele is POPULATION-
SPECIFIC, as it depends on the types and frequencies
of alleles that it pairs with
The genotypic value predicted from the individual
allelic effects is thus
Ĝij = μG + αi + αj
Gij = μG + αi + αj + δij
The genotypic value predicted from the individual
allelic effects is thus
Ĝij = μG + αi + αj
Dominance deviations --- the difference (for genotype
AiAj) between the genotypic value predicted from the
two single alleles and the actual genotypic value:
δij = Gij - Ĝij
[Figure: regression of genotypic value on N = # copies of allele 2.
Genotypes 11, 21, 22 sit at N = 0, 1, 2 with values G11, G21, G22;
the fitted values are μG + 2α1, μG + α1 + α2, and μG + 2α2, the
deviations are the δij, and the slope of the line is α = α2 - α1.]
Gij = μG + αi + αj + δij   (predicted value plus residual error)
Regression form: Gij = μG + 2α1 + (α2 - α1) N + δij
Independent (predictor) variable N = # of A2 alleles
Note that the slope α2 - α1 = α, the average effect
of an allelic substitution
Gij = μG + 2α1 + (α2 - α1) N + δij, with intercept μG + 2α1 and
regression slope α2 - α1.
[Figure: weighted regression of genotypic value on N = 0, 1, 2. The
size of the circle denotes the weight associated with that genotype.
While the genotypic values do not change, their frequencies (and
hence weights) do.]
[Figure: the same regression (slope α = α2 - α1) under different
allele frequencies.]
Breeding value: A(Gij) = αi + αj
Genetic Variances
Writing the genotypic value as
Gij = μG + (αi + αj) + δij,
the genetic variance can be written as
σ²G = σ²A + σ²D
where σ²A = σ²(αi + αj) is the additive variance and
σ²D = σ²(δij) is the dominance variance
One locus, 2 alleles, with genotypic values
Q1Q1: 0     Q1Q2: a(1+k)     Q2Q2: 2a
Since E[α] = 0, Var(α) = E[(α - μα)²] = E[α²]
Additive variance: σ²A = 2pq a² [1 + k(p - q)]²
Dominance variance: σ²D = (2pq a k)²
[Figure: σ²A as a function of allele frequency p.]
Complete dominance (k = 1)
[Figure: σ²A and σ²D as functions of allele frequency p.]
Epistasis