Regression

Regression is just a conditional distribution:

$p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$

Example: $p_{X,Y}(x,y) = 2$ on $0 < y < x < 1$

$p_X(x) = \int_0^x 2\,dy = 2x$

$p_{Y|X}(y|x) = \frac{2}{2x} = \frac{1}{x}$, i.e. $y|x$ is uniform on $(0,x)$
Bayes theorem

$p(\theta|D) = \frac{p(D,\theta)}{p(D)} = \frac{p(D|\theta)\,p(\theta)}{p(D)}$
Prediction

See $D$, compute the predictive distribution:

$p(\tilde{D}|D) = \int p(\tilde{D}|\theta)\,p(\theta|D)\,d\theta$

(assumes $p(\tilde{D},D|\theta) = p(\tilde{D}|\theta)\,p(D|\theta)$, i.e. conditional independence given $\theta$)
Decision theory

Loss: $L(a,\theta)$ where $a$ = action; $\theta$ = state of nature

Bayesian decision theory:

$\min_a\, E_{\theta|D}\left[L(a,\theta)\right]$; typically, $L(\theta,\hat{\theta}) = (\theta-\hat{\theta})'A(\theta-\hat{\theta})$

Averaging the risk over the prior gives the same answer in either order:

$E_\theta\left[r(\theta)\right] = E_\theta\, E_{D|\theta}\left[L(\theta,a)\right] = E_D\, E_{\theta|D}\left[L(\theta,a)\right]$
Bayes/Classical Estimators

As $n \to \infty$, the likelihood dominates the prior: the prior washes out as long as it is locally uniform. Bayes is consistent unless you have a dogmatic prior.

Asymptotically, the posterior is normal, centered at the MLE:

$p(\theta|D) \approx N\!\left(\hat{\theta}_{MLE},\, H^{-1}\right)$, where $H$ is the negative Hessian of the log-likelihood at $\hat{\theta}_{MLE}$
Bayesian Computations
Before simulation methods, Bayesians used posterior expectations of various functions as summaries of the posterior.
$E[h(\theta)|D] = \int h(\theta)\,p(\theta|D)\,d\theta = \frac{\int h(\theta)\,p(D|\theta)\,p(\theta)\,d\theta}{p(D)}$

note: $p(D) = \int p(D|\theta)\,p(\theta)\,d\theta$
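Such an expectation can be computed by brute force on a grid. A minimal R sketch (the Bernoulli likelihood and Beta(2,2) prior here are illustrative choices, not from the slides; the Beta distribution is introduced formally below):

```r
# grid approximation of E[h(theta)|D] for a Bernoulli likelihood
# (hypothetical example: y = 7 successes in n = 10 trials, Beta(2,2) prior)
y <- 7; n <- 10
theta <- seq(0.001, 0.999, length=1000)
like  <- theta^y * (1 - theta)^(n - y)       # p(D|theta)
prior <- dbeta(theta, 2, 2)                  # p(theta)
post  <- like * prior / sum(like * prior)    # normalizing approximates p(D)
h <- function(theta) theta                   # h = identity -> posterior mean
sum(h(theta) * post)
```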
Conjugate Families
Models with convenient analytic properties almost
invariably come from conjugate families.
Why do I care now?
- conjugate models are used as building blocks
- build intuition about the workings of Bayesian inference
Definition:
A prior is conjugate to a likelihood if the posterior is in the same class of distributions as the prior.
Basically, conjugate priors are like the posterior from
some imaginary dataset with a diffuse prior.
Beta-Binomial model

$y_i \sim \text{Bern}(\theta)$

$\ell(\theta) = \prod_{i=1}^n \theta^{y_i}(1-\theta)^{1-y_i} = \theta^{y}(1-\theta)^{n-y}$, where $y = \sum_{i=1}^n y_i$

$p(\theta|y) = ?$ Need a prior!
Beta distribution

$\text{Beta}(\alpha,\beta)$: $p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$, $\quad E[\theta] = \alpha/(\alpha+\beta)$

[Figure: Beta densities on (0,1) for $(a,b) = (2,4), (3,3), (4,2)$]
Posterior

$p(\theta|D) \propto p(D|\theta)\,p(\theta) \propto \theta^{y}(1-\theta)^{n-y}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+y-1}(1-\theta)^{\beta+n-y-1}$

$\theta|D \sim \text{Beta}(\alpha+y,\ \beta+n-y)$
Prediction

$p(\tilde{y}=1|y) = \int_0^1 \theta\,p(\theta|y)\,d\theta = E[\theta|y] = \frac{\alpha+y}{\alpha+\beta+n}$
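The whole Beta-Binomial analysis fits in a few lines of R; a minimal sketch with made-up hyperparameters and data:

```r
# Beta-Binomial conjugate update (hyperparameters and data are illustrative)
a <- 2; b <- 2                       # Beta(a,b) prior
y <- 7; n <- 10                      # y successes in n Bernoulli trials
apost <- a + y; bpost <- b + n - y   # posterior: Beta(a+y, b+n-y)
apost / (apost + bpost)              # E[theta|y] = p(ytilde = 1 | y)
curve(dbeta(x, apost, bpost))        # plot the posterior density
```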
Regression model

$y_i = x_i'\beta + \varepsilon_i, \quad \varepsilon_i \sim \text{Normal}(0,\sigma^2)$

$p(y_i|\beta,\sigma^2) = \left(2\pi\sigma^2\right)^{-1/2}\exp\left\{-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right\}$
Regression model

$p(x,y) = p(x)\,p(y|x,\beta,\sigma^2)$

(note: simultaneous systems are not written this way!)

$p(\beta,\sigma^2|y,X) \propto p(\beta,\sigma^2)\,\prod_i p(x_i)\,\prod_i p(y_i|x_i,\beta,\sigma^2)$

Since $\prod_i p(x_i)$ does not involve $(\beta,\sigma^2)$, we can condition on $X$.
Conjugate Prior

What is the conjugate prior? It comes from the form of the likelihood function. Here we condition on $X$.

$\varepsilon \sim N(0,\sigma^2 I_n)$; Jacobian $\left|\partial\varepsilon/\partial y\right| = 1$

$\ell(\beta,\sigma^2) = p(y|X,\beta,\sigma^2) \propto \left(\sigma^2\right)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)\right\}$
Geometry of regression

$\hat{y} = x_1\hat{\beta}_1 + x_2\hat{\beta}_2, \quad e = y - \hat{y}, \quad e'\hat{y} = 0$

[Figure: $y$ projected onto the plane spanned by $x_1$ and $x_2$; the residual $e$ is orthogonal to $\hat{y}$]
Traditional regression

$e'\hat{y} = \hat{y}'e = 0$, i.e. $(X\hat{\beta})'(y - X\hat{\beta}) = 0$

More generally, $e$ is orthogonal to every column of $X$: $X'(y - X\hat{\beta}) = 0 \;\Rightarrow\; \hat{\beta} = (X'X)^{-1}X'y$
Cholesky Roots
In Bayesian computations, the fundamental matrix operation is the Cholesky root (chol() in R).
The Cholesky root is the generalization of the
square root applied to positive definite matrices.
As Bayesians with proper priors, we don't ever have to worry about singular matrices!

$\Sigma = U'U$, $\Sigma$ positive definite symmetric, $U$ upper triangular with $u_{ii} > 0$
Cholesky Roots
Cholesky roots can be used to simulate from the multivariate normal distribution:

$\Sigma = U'U;\quad z \sim N(0,I) \;\Rightarrow\; \mu + U'z \sim N(\mu,\Sigma)$
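In R this is one chol() call plus a matrix multiply; a small sketch with illustrative values of mu and Sigma:

```r
# simulate from N(mu, Sigma) using the Cholesky root Sigma = U'U
mu    <- c(1, 2)                          # illustrative values
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
U    <- chol(Sigma)           # upper triangular, t(U) %*% U = Sigma
z    <- rnorm(2)              # z ~ N(0, I)
draw <- mu + t(U) %*% z       # mu + U'z ~ N(mu, Sigma)
```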
Regression with R
data.txt:

UNIT   Y    X1        X2
A      1    0.23815   0.43730
A      2    0.55508   0.47938
A      3    3.03399  -2.17571
A      4   -1.49488   1.66929
B     10   -1.74019   0.35368
B      9    1.40533  -1.26120
B      8    0.15628  -0.27751
B      7   -0.93869  -0.04410
B      6   -3.06566   0.14486
df=read.table("data.txt",header=TRUE)
myreg=function(y,X){
#
# purpose: compute lsq regression
#
# arguments:
# y -- vector of dep var
# X -- array of indep vars
#
# output:
# list containing lsq coef and std errors
#
XpXinv=chol2inv(chol(crossprod(X)))
bhat=XpXinv%*%crossprod(X,y)
res=as.vector(y-X%*%bhat)
ssq=as.numeric(res%*%res/(nrow(X)-ncol(X)))
se=sqrt(diag(ssq*XpXinv))
list(b=bhat,std_errors=se)
}
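A usage sketch (assuming the Y column of data.txt is the dependent variable and X1, X2 are the regressors):

```r
X <- cbind(1, df$X1, df$X2)   # prepend a column of ones for the intercept
out <- myreg(df$Y, X)
out$b
out$std_errors
```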
Regression likelihood

$(y-X\beta)'(y-X\beta) = (y-X\hat{\beta})'(y-X\hat{\beta}) + (X\hat{\beta}-X\beta)'(X\hat{\beta}-X\beta) + 2(y-X\hat{\beta})'(X\hat{\beta}-X\beta)$

The cross term vanishes ($e \perp X$), so

$(y-X\beta)'(y-X\beta) = \nu s^2 + (\beta-\hat{\beta})'X'X(\beta-\hat{\beta})$

where $\nu s^2 = SSE = (y-X\hat{\beta})'(y-X\hat{\beta})$ and $\nu = n-k$. Hence

$p(y|X,\beta,\sigma^2) \propto \left(\sigma^2\right)^{-k/2}\exp\left\{-\frac{(\beta-\hat{\beta})'X'X(\beta-\hat{\beta})}{2\sigma^2}\right\}\,\left(\sigma^2\right)^{-\nu/2}\exp\left\{-\frac{\nu s^2}{2\sigma^2}\right\}$
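The decomposition is easy to verify numerically; a sketch with simulated data (all values illustrative):

```r
# numerical check: (y-Xb)'(y-Xb) = nu*s^2 + (b-bhat)'X'X(b-bhat)
set.seed(1)
n <- 20
X <- cbind(1, rnorm(n))
y <- X %*% c(1, 2) + rnorm(n)
bhat <- solve(crossprod(X), crossprod(X, y))
b    <- c(0.5, 1.5)                             # any beta will do
lhs  <- sum((y - X %*% b)^2)
rhs  <- sum((y - X %*% bhat)^2) +
        drop(t(b - bhat) %*% crossprod(X) %*% (b - bhat))
all.equal(lhs, rhs)                             # TRUE
```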
Regression likelihood

$p(y|X,\beta,\sigma^2) \propto \text{normal kernel in } \beta \times\ ?$

$?$ is a density of the form $p(\sigma^2) \propto \left(\sigma^2\right)^{-a}e^{-b/\sigma^2}$ -- an inverted chi-square (inverse gamma)
Bayesian Regression

Prior:

$p(\beta,\sigma^2) = p(\beta|\sigma^2)\,p(\sigma^2)$

$p(\beta|\sigma^2) \propto \left(\sigma^2\right)^{-k/2}\exp\left\{-\frac{(\beta-\bar{\beta})'A(\beta-\bar{\beta})}{2\sigma^2}\right\}$

$p(\sigma^2) \propto \left(\sigma^2\right)^{-\left(\frac{\nu_0}{2}+1\right)}\exp\left\{-\frac{\nu_0 s_0^2}{2\sigma^2}\right\}$

Inverted Chi-Square: $\sigma^2 \sim \frac{\nu_0 s_0^2}{\chi^2_{\nu_0}}$

Interpretation: as if from another dataset.
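To internalize the prior, it helps to draw from it; a minimal sketch with illustrative hyperparameter values:

```r
# draw (sigmasq, beta) from the natural conjugate prior
# (hyperparameter values are illustrative)
k <- 2
betabar <- rep(0, k); A <- 0.01 * diag(k)   # beta | sigmasq ~ N(betabar, sigmasq*A^-1)
nu0 <- 3; s0sq <- 1                         # sigmasq ~ nu0*s0sq/chisq_nu0
sigmasq <- nu0 * s0sq / rchisq(1, nu0)      # inverted chi-square draw
beta <- betabar + sqrt(sigmasq) * backsolve(chol(A), rnorm(k))
```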
Posterior

$p(\beta,\sigma^2|D) \propto \ell(\beta,\sigma^2)\,p(\beta|\sigma^2)\,p(\sigma^2)$

$\propto \left(\sigma^2\right)^{-n/2}\exp\left\{-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right\} \times \left(\sigma^2\right)^{-k/2}\exp\left\{-\frac{(\beta-\bar{\beta})'A(\beta-\bar{\beta})}{2\sigma^2}\right\} \times \left(\sigma^2\right)^{-\left(\frac{\nu_0}{2}+1\right)}\exp\left\{-\frac{\nu_0 s_0^2}{2\sigma^2}\right\}$
The prior can be combined with the data as an augmented regression: with $A = U'U$,

$z = \begin{bmatrix} y \\ U\bar{\beta} \end{bmatrix}, \qquad W = \begin{bmatrix} X \\ U \end{bmatrix}$
Posterior

$\propto \left(\sigma^2\right)^{-k/2}\exp\left\{-\frac{(\beta-\tilde{\beta})'(X'X+A)(\beta-\tilde{\beta})}{2\sigma^2}\right\} \times \left(\sigma^2\right)^{-\left(\frac{n+\nu_0}{2}+1\right)}\exp\left\{-\frac{\nu_0 s_0^2 + n\tilde{s}^2}{2\sigma^2}\right\}$

$[\beta|\sigma^2] \sim N\!\left(\tilde{\beta},\ \sigma^2\left(X'X+A\right)^{-1}\right)$

$[\sigma^2] \sim \frac{\nu_1 s_1^2}{\chi^2_{\nu_1}}$ with $\nu_1 = \nu_0 + n$, $\quad s_1^2 = \frac{\nu_0 s_0^2 + n\tilde{s}^2}{\nu_0 + n}$

$\tilde{\beta} = \left(X'X+A\right)^{-1}\left(X'X\hat{\beta} + A\bar{\beta}\right)$
IID Simulations

Scheme: $[y|X,\beta,\sigma^2]\,[\beta|\sigma^2]\,[\sigma^2]$

1) Draw $[\sigma^2 | y, X]$
2) Draw $[\beta | \sigma^2, y, X]$
3) Repeat
1) $[\sigma^2|y,X] \sim \frac{\nu_1 s_1^2}{\chi^2_{\nu_1}}$

2) $[\beta|\sigma^2,y,X] \sim N\!\left(\tilde{\beta},\ \sigma^2\left(X'X+A\right)^{-1}\right)$

note: $z \sim N(0,I)$; $\tilde{\beta} + \sigma U'z \sim N\!\left(\tilde{\beta},\ \sigma^2 U'U\right)$ with $U'U = \left(X'X+A\right)^{-1}$
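Put together, one posterior draw looks like the sketch below (assuming y, X, betabar, A, nu0, s0sq are in the workspace; the runireg function later in the deck implements the same draw via the augmented regression):

```r
# one IID draw from the posterior of (beta, sigmasq)
XpX    <- crossprod(X); n <- nrow(X); k <- ncol(X)
btilde <- solve(XpX + A, crossprod(X, y) + A %*% betabar)
nssq   <- sum((y - X %*% btilde)^2) +
          drop(t(btilde - betabar) %*% A %*% (btilde - betabar))
sigmasq <- (nu0 * s0sq + nssq) / rchisq(1, nu0 + n)      # [sigmasq | y, X]
R    <- chol(XpX + A)                                    # R'R = X'X + A
beta <- btilde + sqrt(sigmasq) * backsolve(R, rnorm(k))  # [beta | sigmasq, y, X]
```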
Bayes Estimator

The Bayes estimator under quadratic loss is the posterior mean of $\beta$:

$E[\beta|D] = E_{\sigma^2|D}\left[E[\beta|D,\sigma^2]\right] = E_{\sigma^2|D}\left[\tilde{\beta}\right] = \tilde{\beta}$
$\text{Var}(\beta|\sigma^2,D) = \sigma^2\left(X'X+A\right)^{-1}$ vs. $\sigma^2 A^{-1}$ or $\sigma^2\left(X'X\right)^{-1}$

Is this reasonable?
Non-informative prior?

$p(\beta,\sigma^2) = p(\beta)\,p(\sigma^2) \propto \frac{1}{\sigma^2}$

Is this non-informative?
Of course not: it says that $\beta$ is large with high prior probability.

Is this wise computationally?
No, I have to worry about singularity in $X'X$.

Is this a good procedure?
No, it is not admissible. Shrinkage is good!
runireg

runireg=
function(Data,Prior,Mcmc){
#
# purpose:
#   draw from posterior for a univariate regression model with natural conjugate prior
#
# arguments:
#   Data -- list of data (y, X)
#   Prior -- list of prior hyperparameters
#     betabar, A -- prior mean, prior precision
#     nu, ssq -- prior on sigmasq
#   Mcmc -- list of MCMC parms
#     R -- number of draws
#     keep -- thinning parameter
#
# output:
#   list of beta, sigmasq draws
#   beta is k x 1 vector of coefficients
#
# model:
#   y = X*beta + e, var(e_i) = sigmasq
#   priors: beta | sigmasq ~ N(betabar, sigmasq*A^-1)
#           sigmasq ~ (nu*ssq)/chisq_nu
#
runireg

# unpack arguments (needed for the body to run)
y=Data$y; X=Data$X
betabar=Prior$betabar; A=Prior$A; nu=Prior$nu; ssq=Prior$ssq
n=nrow(X); k=ncol(X)
#
RA=chol(A)
W=rbind(X,RA)
z=c(y,as.vector(RA%*%betabar))
IR=backsolve(chol(crossprod(W)),diag(k))
# W'W = R'R ; (W'W)^-1 = IR IR' -- this is the UL decomp
btilde=crossprod(t(IR))%*%crossprod(W,z)
res=z-W%*%btilde
s=t(res)%*%res
#
# first draw sigmasq
#
sigmasq=(nu*ssq + s)/rchisq(1,nu+n)
#
# now draw beta given sigmasq
#
beta = btilde + as.vector(sqrt(sigmasq))*IR%*%rnorm(k)
list(beta=beta,sigmasq=sigmasq)
}
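A driver loop to produce the draws plotted below might look like this (simulated data and hyperparameter values are illustrative; R = 2000 to match the trace plots):

```r
set.seed(66)
n <- 100; X <- cbind(1, runif(n))
y <- X %*% c(1, 2) + 0.5 * rnorm(n)
Data  <- list(y=y, X=X)
Prior <- list(betabar=c(0,0), A=0.01*diag(2), nu=3, ssq=0.25)
Mcmc  <- list(R=2000)
out <- list(betadraw=matrix(0, Mcmc$R, 2), sigmasqdraw=double(Mcmc$R))
for (r in 1:Mcmc$R) {
  draw <- runireg(Data, Prior, Mcmc)
  out$betadraw[r,]   <- draw$beta
  out$sigmasqdraw[r] <- draw$sigmasq
}
```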
[Figure: trace plot of out$betadraw, 2000 draws, values roughly 1.0-2.5]

[Figure: trace plot of out$sigmasqdraw, 2000 draws, values roughly 0.20-0.35]
Multivariate Regression

$Y = XB + E$, equation by equation:

$y_c = X\beta_c + \varepsilon_c, \quad c = 1,\ldots,m$

$Y = [y_1,\ldots,y_c,\ldots,y_m], \quad B = [\beta_1,\ldots,\beta_c,\ldots,\beta_m], \quad E = [\varepsilon_1,\ldots,\varepsilon_c,\ldots,\varepsilon_m]$

rows of $E \sim$ iid $N(0,\Sigma)$
$p(Y|X,B,\Sigma) \propto |\Sigma|^{-n/2}\exp\left\{-\frac{1}{2}\sum_{r=1}^n (y_r - B'x_r)'\Sigma^{-1}(y_r - B'x_r)\right\}$

$= |\Sigma|^{-n/2}\,\text{etr}\left\{-\frac{1}{2}(Y-XB)'(Y-XB)\Sigma^{-1}\right\}$

$\propto |\Sigma|^{-(n-k)/2}\,\text{etr}\left\{-\frac{1}{2}S\Sigma^{-1}\right\} \times |\Sigma|^{-k/2}\,\text{etr}\left\{-\frac{1}{2}(B-\hat{B})'X'X(B-\hat{B})\Sigma^{-1}\right\}$

where $S = (Y-X\hat{B})'(Y-X\hat{B})$
therefore,

$|\Sigma|^{-k/2}\exp\left\{-\frac{1}{2}(\beta-\hat{\beta})'\left(\Sigma^{-1}\otimes X'X\right)(\beta-\hat{\beta})\right\}, \quad \beta = \text{vec}(B)$
Inverted Wishart distribution

$p(\Sigma|\nu_0,V_0) \propto |\Sigma|^{-(\nu_0+m+1)/2}\,\text{etr}\left\{-\frac{1}{2}V_0\Sigma^{-1}\right\}$

denoted $\Sigma \sim IW(\nu_0,V_0)$

if $\nu_0 > m+1$, $E[\Sigma] = (\nu_0 - m - 1)^{-1}V_0$

$\nu$ -- tightness; $V$ -- location; note, however, that as the location increases, the spread also increases.

If $\Sigma \sim IW(\nu_0,V_0)$, then $\Sigma^{-1} \sim W(\nu_0,V_0^{-1})$; if $\nu_0 > m+1$, $E[\Sigma^{-1}] = \nu_0 V_0^{-1}$

Generalization of $\chi^2$:

Let $\varepsilon_i \sim N_m(0,\Sigma)$. Then $W = \sum_{i=1}^{\nu}\varepsilon_i\varepsilon_i' \sim W(\nu,\Sigma)$
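The "sum of outer products" construction translates directly into R, and can be checked against stats::rWishart, which draws from the same distribution (values below are illustrative):

```r
# W = sum_i eps_i eps_i' with eps_i ~ N_m(0, Sigma) is W(nu, Sigma)
m <- 2; nu <- 10
Sigma <- matrix(c(1, 0.3, 0.3, 1), m, m)              # illustrative scale matrix
eps <- matrix(rnorm(nu * m), nu, m) %*% chol(Sigma)   # nu iid N_m(0, Sigma) rows
W   <- crossprod(eps)                                 # sum of outer products
W2  <- rWishart(1, df=nu, Sigma=Sigma)[,,1]           # direct draw, same distribution
```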
Prior:

$p(\Sigma,B) = p(\Sigma)\,p(B|\Sigma)$

$\Sigma \sim IW(\nu_0,V_0)$

$\beta|\Sigma \sim N\!\left(\bar{\beta},\ \Sigma\otimes A^{-1}\right)$

Posterior:

$\Sigma|Y,X \sim IW\!\left(\nu_0+n,\ V_0+\tilde{S}\right)$

$\beta|Y,X,\Sigma \sim N\!\left(\tilde{\beta},\ \Sigma\otimes\left(X'X+A\right)^{-1}\right)$

$\tilde{S} = (Y-X\tilde{B})'(Y-X\tilde{B}) + (\tilde{B}-\bar{B})'A(\tilde{B}-\bar{B})$

$\tilde{\beta} = \text{vec}(\tilde{B}), \quad \tilde{B} = \left(X'X+A\right)^{-1}\left(X'X\hat{B} + A\bar{B}\right)$
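A sketch of one posterior draw, assuming Y (n x m), X (n x k), Bbar (k x m), A (k x k), nu0, and V0 (m x m) are defined (cf. rmultireg in the bayesm package for a careful implementation):

```r
# Sigma | Y,X ~ IW(nu0+n, V0+Stilde); beta | Sigma ~ N(vec(Btilde), Sigma (x) (X'X+A)^-1)
n <- nrow(Y); k <- ncol(X); m <- ncol(Y)
Btilde <- solve(crossprod(X) + A, crossprod(X, Y) + A %*% Bbar)
Stilde <- crossprod(Y - X %*% Btilde) +
          t(Btilde - Bbar) %*% A %*% (Btilde - Bbar)
Sigma  <- solve(rWishart(1, nu0 + n, solve(V0 + Stilde))[,,1])    # invert a Wishart draw
CI <- backsolve(chol(crossprod(X) + A), diag(k))  # CI %*% t(CI) = (X'X+A)^-1
B  <- Btilde + CI %*% matrix(rnorm(k * m), k, m) %*% chol(Sigma)  # matrix-normal draw
```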
Conjugacy is Fragile!

SUR (Seemingly Unrelated Regressions):

$y_i = X_i\beta_i + \varepsilon_i, \quad i = 1,\ldots,m$

a set of regressions related via correlated errors