
Bayesian statistics

new data: $p(y \mid \theta)$        prior knowledge: $p(\theta)$

$\underbrace{p(\theta \mid y)}_{\text{posterior}} \;\propto\; \underbrace{p(y \mid \theta)}_{\text{likelihood}}\; \underbrace{p(\theta)}_{\text{prior}}$

Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities.

The posterior probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.

Bayes' rule
Given data y and parameters $\theta$, their joint probability can be written in two ways:

$p(\theta \mid y)\, p(y) = p(y, \theta)$
$p(y \mid \theta)\, p(\theta) = p(y, \theta)$

Eliminating $p(y, \theta)$ gives Bayes' rule:

$\underbrace{p(\theta \mid y)}_{\text{Posterior}} = \frac{\overbrace{p(y \mid \theta)}^{\text{Likelihood}}\;\overbrace{p(\theta)}^{\text{Prior}}}{\underbrace{p(y)}_{\text{Evidence}}}$
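As a minimal numerical illustration of Bayes' rule (not part of the original slides), the sketch below discretizes a single parameter, multiplies an assumed Gaussian prior by the likelihood of one observation, and normalizes by the evidence. The grid, prior, noise level and data value are all illustrative choices.

```python
# Bayes' rule on a discretized parameter theta: posterior = likelihood * prior / evidence
import numpy as np
from scipy.stats import norm

theta = np.linspace(-5, 5, 1001)                  # grid of candidate parameter values
d_theta = theta[1] - theta[0]
prior = norm.pdf(theta, loc=0.0, scale=2.0)       # p(theta): prior knowledge (assumed Gaussian)
y = 1.5                                           # one observed data point (made up)
likelihood = norm.pdf(y, loc=theta, scale=1.0)    # p(y | theta): new data

joint = likelihood * prior                        # p(y | theta) p(theta)
evidence = np.sum(joint) * d_theta                # p(y), the normalizing constant
posterior = joint / evidence                      # p(theta | y)

print("posterior mean:", np.sum(theta * posterior) * d_theta)
```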

Principles of Bayesian inference

Formulation of a generative model:
  likelihood $p(y \mid \theta)$
  prior distribution $p(\theta)$

Observation of data: $y$

Update of beliefs based upon observations, given a prior state of knowledge:

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$

Univariate Gaussian
Normal densities:

$p(\theta) = N(\theta;\; \mu_p,\; \lambda_p^{-1})$

$y = \theta + e$
$p(y \mid \theta) = N(y;\; \theta,\; \lambda_e^{-1})$

$p(\theta \mid y) = N(\theta;\; \mu,\; \lambda^{-1})$

$\lambda = \lambda_e + \lambda_p$
$\mu = \lambda^{-1}\left(\lambda_e\, y + \lambda_p\, \mu_p\right)$

Posterior mean = precision-weighted combination of the prior mean and the data.
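A small numeric check of the precision-weighted update above; the prior mean, precisions and data value are arbitrary illustrative numbers.

```python
# Posterior precision and mean for the univariate Gaussian case
mu_p, lambda_p = 0.0, 1.0          # prior mean and prior precision (illustrative)
y, lambda_e = 2.0, 4.0             # observed value and noise precision (illustrative)

lam = lambda_e + lambda_p                        # posterior precision
mu = (lambda_e * y + lambda_p * mu_p) / lam      # precision-weighted posterior mean

print("posterior mean:", mu, "posterior variance:", 1.0 / lam)
```

With a precise observation (large $\lambda_e$) the posterior mean is pulled towards the data; with a precise prior it stays near $\mu_p$.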

Bayesian GLM: univariate case
Normal densities:

$p(\theta) = N(\theta;\; \mu_p,\; \lambda_p^{-1})$

$y = \theta x + e$
$p(y \mid \theta) = N(y;\; \theta x,\; \lambda_e^{-1})$

$p(\theta \mid y) = N(\theta;\; \mu,\; \lambda^{-1})$

$\lambda = \lambda_e x^2 + \lambda_p$
$\mu = \lambda^{-1}\left(\lambda_e\, x\, y + \lambda_p\, \mu_p\right)$

Bayesian GLM: multivariate case
Normal densities:

$y = X\theta + e$
$p(\theta) = N(\theta;\; \mu_p,\; C_p)$
$p(y \mid \theta) = N(y;\; X\theta,\; C_e)$

$p(\theta \mid y) = N(\theta;\; \mu,\; C)$

$C = \left(X^T C_e^{-1} X + C_p^{-1}\right)^{-1}$
$\mu = C\left(X^T C_e^{-1} y + C_p^{-1} \mu_p\right)$

Estimation is done in one step if $C_e$ and $C_p$ are known; otherwise it is iterative.
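A minimal sketch of the one-step multivariate posterior above, assuming $C_e$ and $C_p$ are known. The design matrix, true parameters and covariance values are synthetic, chosen only for illustration.

```python
# One-step Bayesian GLM posterior: C = (X' Ce^-1 X + Cp^-1)^-1, mu = C (X' Ce^-1 y + Cp^-1 mu_p)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # design matrix (synthetic)
theta_true = np.array([1.0, -0.5, 0.25])
y = X @ theta_true + 0.3 * rng.normal(size=50)     # synthetic data

Ce_inv = np.eye(50) / 0.3**2                       # C_e^{-1}: inverse noise covariance (assumed known)
Cp_inv = np.eye(3)                                 # C_p^{-1}: inverse prior covariance (assumed known)
mu_p = np.zeros(3)                                 # prior mean

C = np.linalg.inv(X.T @ Ce_inv @ X + Cp_inv)       # posterior covariance
mu = C @ (X.T @ Ce_inv @ y + Cp_inv @ mu_p)        # posterior mean

print(mu)
```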

Approximate inference: optimization

True posterior:

$p(\theta \mid y, m) = \frac{p(y, \theta \mid m)}{p(y \mid m)}$

Approximate posterior, iteratively improved under the mean-field approximation:

$q(\theta) = \prod_i q(\theta_i)$

Free energy:

$F = \int q(\theta) \log \frac{p(y, \theta \mid m)}{q(\theta)}\, d\theta$

$\log p(y \mid m) = \int q(\theta) \log \frac{p(y, \theta \mid m)}{q(\theta)}\, d\theta + \int q(\theta) \log \frac{q(\theta)}{p(\theta \mid y, m)}\, d\theta = F + KL\!\left[q(\theta)\,\|\,p(\theta \mid y, m)\right]$

[Figure: objective function (free energy) plotted against the value of the parameter.]
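The decomposition above can be checked numerically. The sketch below uses a toy conjugate model of my own choosing ($\theta \sim N(0,1)$, $y \mid \theta \sim N(\theta,1)$), an arbitrary Gaussian $q(\theta)$, and a 1-D grid to evaluate the free energy; the sum $F + KL$ should recover the exact log evidence.

```python
# Numerical check that log p(y|m) = F + KL[q || p(theta|y,m)] for a toy Gaussian model
import numpy as np
from scipy.stats import norm

y = 1.2
theta = np.linspace(-10, 10, 20001)
dtheta = theta[1] - theta[0]

log_joint = norm.logpdf(theta, 0, 1) + norm.logpdf(y, theta, 1)   # log p(y, theta)
log_evidence = norm.logpdf(y, 0, np.sqrt(2))                      # exact log p(y)

q_mean, q_std = 0.4, 0.9                                          # a deliberately imperfect q(theta)
q = norm.pdf(theta, q_mean, q_std)

free_energy = np.sum(q * (log_joint - norm.logpdf(theta, q_mean, q_std))) * dtheta

# Exact posterior is N(y/2, 1/2); KL between two univariate Gaussians in closed form
p_mean, p_var = y / 2, 0.5
kl = 0.5 * (np.log(p_var / q_std**2) + (q_std**2 + (q_mean - p_mean)**2) / p_var - 1)

print(free_energy + kl, log_evidence)   # the two numbers should agree
```

Because the KL term is non-negative, $F$ is a lower bound on $\log p(y \mid m)$, and improving $q$ raises $F$ towards it.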

Simple example: linear regression

Data and model fit: a set of bases (explanatory variables) is fitted to the data by minimizing the sum of squared errors (see figure).

Ordinary least squares:

$y = X\beta + e$

$E_D = (y - X\beta)^T (y - X\beta)$

$\frac{\partial E_D}{\partial \beta} = 0 \;\Rightarrow\; \hat{\beta}_{ols} = (X^T X)^{-1} X^T y$
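A short sketch of the OLS estimate above on synthetic data; the polynomial basis is one possible (assumed) choice of "bases" and is not taken from the slides.

```python
# Ordinary least squares: beta_ols = (X^T X)^{-1} X^T y
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
X = np.vander(x, N=4, increasing=True)        # polynomial explanatory variables (assumed basis)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # minimizes E_D = (y - X b)^T (y - X b)
E_D = np.sum((y - X @ beta_ols) ** 2)         # sum of squared errors at the minimum
print(beta_ols, E_D)
```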

Simple example: linear regression

Data and model fit (see figure): as more bases are added, the fit to the data keeps improving, but:

Over-fitting: the model fits the noise.
Inadequate cost function: the sum of squared errors is blind to overly complex models.

Solution: include uncertainty in the model parameters.

Bayesian linear regression: priors and likelihood

Model:

$y = X\beta + e$

Prior:

$p(\beta \mid \lambda_2) = N_k(\beta;\; 0,\; \lambda_2^{-1} I_k) \propto \exp(-\lambda_2\, \beta^T \beta / 2)$

Sample curves drawn from the prior (before observing any data), together with the mean curve, are shown in the figure.

Likelihood:

$p(y \mid \beta, \lambda_1) = \prod_{i=1}^{N} p(y_i \mid \beta, \lambda_1)$

$p(y_i \mid \beta, \lambda_1) = N(y_i;\; X_i \beta,\; \lambda_1^{-1}) \propto \exp(-\lambda_1 (y_i - X_i \beta)^2 / 2)$
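The "sample curves from the prior" can be reproduced with a few lines of code. This sketch assumes a polynomial basis and a particular value of $\lambda_2$, both illustrative rather than taken from the slides.

```python
# Sample curves from the prior p(beta) = N(0, lambda2^{-1} I_k), before seeing any data
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 100)
X = np.vander(x, N=4, increasing=True)     # k = 4 basis functions (assumed)
lambda2 = 5.0                              # prior precision (illustrative)

betas = rng.normal(scale=1 / np.sqrt(lambda2), size=(10, X.shape[1]))  # 10 prior draws of beta
curves = betas @ X.T                       # each row is one sample curve X beta
mean_curve = curves.mean(axis=0)           # scatters around zero, the prior mean curve
print(curves.shape, mean_curve[:5])
```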

Bayesian linear regression: posterior

Model:

$y = X\beta + e$

Prior:

$p(\beta \mid \lambda_2) = N_k(\beta;\; 0,\; \lambda_2^{-1} I_k)$

Likelihood:

$p(y \mid \beta, \lambda_1) = \prod_{i=1}^{N} p(y_i \mid \beta, \lambda_1)$

Bayes' rule:

$p(\beta \mid y, \lambda) \propto p(y \mid \beta, \lambda_1)\, p(\beta \mid \lambda_2)$

Posterior:

$p(\beta \mid y, \lambda) = N(\beta;\; \mu,\; C)$

$C^{-1} = \lambda_1 X^T X + \lambda_2 I_k$
$\mu = \lambda_1\, C\, X^T y$
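A minimal sketch of the posterior above with known precisions $\lambda_1$ (noise) and $\lambda_2$ (prior); the data, basis and precision values are illustrative.

```python
# Posterior for Bayesian linear regression: C^{-1} = l1 X'X + l2 I, mu = l1 C X' y
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
X = np.vander(x, N=4, increasing=True)                               # assumed polynomial basis
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

lambda1, lambda2 = 1 / 0.2**2, 5.0                                   # noise and prior precisions
C = np.linalg.inv(lambda1 * X.T @ X + lambda2 * np.eye(X.shape[1]))  # posterior covariance
mu = lambda1 * C @ X.T @ y                                           # posterior mean

print(mu)
```

The posterior mean coincides with the ridge-regression estimate with penalty $\lambda_2 / \lambda_1$, i.e. the coefficients are shrunk towards zero relative to OLS, which counteracts over-fitting.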


Posterior Probability Maps (PPMs)

Posterior distribution: probability of the effect given the data

$p(\beta \mid y)$

mean: size of effect
precision: variability

Posterior probability map: an image of the probability (confidence) that an activation exceeds some specified threshold $s_{th}$, given the data y:

$p(\beta > s_{th} \mid y) \geq p_{th}$

[Figure: posterior density $p(\beta \mid y)$, with the probability mass above $s_{th}$ shaded.]

Two thresholds:
activation threshold $s_{th}$: percentage of whole-brain mean signal (physiologically relevant size of effect)
probability threshold $p_{th}$ that voxels must exceed to be displayed (e.g. 95%)
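With a Gaussian posterior, the exceedance probability per voxel is one call to the normal survival function. The means, standard deviations and thresholds below are made-up illustrative values.

```python
# Probability that an effect exceeds the activation threshold s_th, given the data
import numpy as np
from scipy.stats import norm

mu = np.array([0.1, 0.8, 1.5])      # posterior means (size of effect) for three voxels
sd = np.array([0.5, 0.4, 0.6])      # posterior standard deviations
s_th, p_th = 0.7, 0.95              # activation and probability thresholds

p_exceed = norm.sf(s_th, loc=mu, scale=sd)   # p(beta > s_th | y)
display_mask = p_exceed >= p_th              # voxels that would be shown in the PPM
print(p_exceed, display_mask)
```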

Bayesian linear regression: model selection

Bayes' rule:

$p(\beta \mid y, \lambda, m) = \frac{p(y \mid \beta, \lambda, m)\, p(\beta \mid \lambda, m)}{p(y \mid \lambda, m)}$

The normalizing constant is the model evidence:

$p(y \mid \lambda, m) = \int p(y \mid \beta, \lambda, m)\, p(\beta \mid \lambda, m)\, d\beta$

$\log p(y \mid \lambda, m) = \text{accuracy}(m) - \text{complexity}(m)$

accuracy(m): a measure of data fit based on the residuals $y - X\mu$
complexity(m): a penalty that grows with the number of parameters k and the prior precision $\lambda_2$
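Under the Gaussian prior and likelihood above, the evidence integral has a closed form: marginally $y \sim N(0,\; \lambda_1^{-1} I + \lambda_2^{-1} X X^T)$. The sketch below uses this to compare models with different numbers of basis functions; the data and precision values are illustrative assumptions.

```python
# Log model evidence for Bayesian linear regression with known precisions
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)
lambda1, lambda2 = 1 / 0.2**2, 5.0

def log_evidence(k):
    X = np.vander(x, N=k, increasing=True)            # model m: k basis functions
    cov = np.eye(x.size) / lambda1 + X @ X.T / lambda2
    return multivariate_normal.logpdf(y, mean=np.zeros(x.size), cov=cov)

for k in (2, 4, 9):
    print(k, log_evidence(k))   # the evidence trades off accuracy against complexity
```

Unlike the sum of squared errors, the evidence does not keep increasing with k: overly complex models are automatically penalized.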

aMRI segmentation

PPMs of each voxel belonging to grey matter, white matter, or CSF (see figure).

Dynamic Causal Modelling: generative model for fMRI and ERPs

Neural state equation (driven by inputs u):

$\dot{x} = F(x, u, \theta)$

Hemodynamic forward model: neural activity → BOLD
Electric/magnetic forward model: neural activity → EEG / MEG / LFP

fMRI neural model:
  1 state variable per region
  bilinear state equation
  no propagation delays

ERP neural model:
  8 state variables per region
  nonlinear state equation
  propagation delays
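The bilinear state equation for fMRI has the form $\dot{x} = (A + \sum_j u_j B_j)\, x + C u$. As a sketch only, the code below integrates a toy two-region version with made-up connectivity matrices and a boxcar input, stopping at the neural states (no haemodynamic forward model).

```python
# Euler integration of a bilinear neural state equation, dx/dt = (A + sum_j u_j B_j) x + C u
import numpy as np

A = np.array([[-1.0, 0.0], [0.4, -1.0]])      # fixed connectivity between 2 regions (toy values)
B = np.array([[[0.0, 0.0], [0.6, 0.0]]])      # modulation of the 1->2 connection by input 1
C = np.array([[1.0], [0.0]])                  # driving input enters region 1

dt, T = 0.01, 10.0
t = np.arange(0.0, T, dt)
u = ((t > 2) & (t < 6)).astype(float)[:, None]  # one boxcar input, shape (time, n_inputs)

x = np.zeros((t.size, 2))
for i in range(1, t.size):
    J = A + np.tensordot(u[i - 1], B, axes=1)   # effective connectivity A + sum_j u_j B_j
    dx = J @ x[i - 1] + C @ u[i - 1]
    x[i] = x[i - 1] + dt * dx

print(x[-1])   # neural states; a haemodynamic model would map these to BOLD
```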

Bayesian Model Selection for fMRI

[Figure: four candidate DCMs (m1–m4) of the attention network, each containing the stimulus input (stim) and the regions V1, V5 and PPC, differing in which connection is modulated by attention.]

[Figure: bar plot of the log marginal likelihood $\ln p(y \mid m)$ for models m1–m4, and the estimated effective synaptic strengths for the best model (m4).]

[Stephan et al., Neuroimage, 2008]

fMRI time series analysis with spatial priors

$Y = X\beta + E$

Spatial prior on the GLM coefficients:

$p(\beta \mid \alpha) = N(0,\; \alpha^{-1} L^{-1})$

where L is the spatial precision matrix and the prior precision $\alpha$ controls the degree of smoothness.

[Figure: graphical model (aMRI underlay) linking the observations to the GLM coefficients, the AR coefficients (correlated noise) and the data noise, each with its own prior precision. The classical approach smooths Y and uses ML estimates with RFT; here VB estimates of the GLM and AR coefficients are obtained under the spatial prior.]

Penny et al., 2005
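The slides do not spell out L; one common choice (an assumption here, not a statement of the SPM implementation) is a graph-Laplacian-style matrix over neighbouring voxels, so that the prior assigns higher density to spatially smooth coefficient maps. The tiny 1-D example below is purely illustrative.

```python
# A graph-Laplacian-style spatial precision matrix L over a 1-D line of voxels
import numpy as np

n = 8                                      # number of voxels along a line
L = np.zeros((n, n))
for i in range(n - 1):                     # each edge links voxel i and voxel i+1
    L[i, i] += 1; L[i + 1, i + 1] += 1
    L[i, i + 1] -= 1; L[i + 1, i] -= 1

alpha = 2.0                                # prior precision (illustrative)
beta = np.array([0.0, 0.1, 0.2, 0.1, 0.0, -0.1, 0.0, 0.1])
log_prior_unnorm = -0.5 * alpha * beta @ L @ beta   # smoother beta maps score higher
print(log_prior_unnorm)
```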

fMRI time series analysis with spatial priors: posterior probability maps

Posterior density $q(\beta_n)$: probability of getting an effect, given the data

$q(\beta_n) = N(\mu_n,\; \Sigma_n)$

mean: size of effect — Mean image (Cbeta_*.img)
covariance: uncertainty — Std dev image (SDbeta_*.img)

Probability mass above the activation threshold:

$p_n = q(\beta_n > s_{th})$

Display only the voxels that exceed e.g. 95%:

$p_n \geq p_{th}$

PPM image (spmP_*.img)


fMRI time series analysis with spatial priors: Bayesian model selection

$\log p(y \mid m) \approx F(q)$

Log-evidence maps: compute the log-evidence for each model (1 … K) and each subject (1 … N).

BMS maps: at each voxel, $q(r_k)$ is the posterior density over $r_k$, the probability that model k generated the data, summarized as a PPM (e.g. $q(r_k > 0.5) = 0.941$) or as an EPM (exceedance probability map).

Joao et al., 2009
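The maps above are based on the posterior over $r_k$ from random-effects BMS. As a simpler, related sketch (not the method used in the slide), the code below converts per-subject log-evidences into posterior model probabilities under a fixed-effects assumption with equal prior model probabilities; the numbers are made up.

```python
# Fixed-effects model comparison from per-subject log-evidences
import numpy as np

log_ev = np.array([            # rows: subjects, columns: models 1..K (illustrative values)
    [-120.3, -118.1, -119.0],
    [-130.7, -126.2, -128.5],
    [-110.9, -108.4, -109.9],
])

group_log_ev = log_ev.sum(axis=0)                  # log p(y_1..N | m) under fixed effects
rel = group_log_ev - group_log_ev.max()            # subtract the max for numerical stability
post_model_prob = np.exp(rel) / np.exp(rel).sum()  # posterior probability of each model
print(post_model_prob)
```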

Reminder: the example that follows compares a long-term memory model with a short-term memory model.

Compare two models

Short-term memory model: information-theoretic (IT) indices H, h, I, i computed at the stimulus onsets (see figure).

Long-term memory model: the IT indices are smoother (see figure). Missed trials are marked.

H = entropy; h = surprise; I = mutual information; i = mutual surprise

Group data: Bayesian Model Selection maps

Regions best explained by the short-term memory model: primary visual cortex.

Regions best explained by the long-term memory model: frontal cortex (executive control).

Thank you
