
Bayesian Techniques for Parameter Estimation

"He has Van Gogh's ear for music." -- Billy Wilder

Statistical Inference
Goal: The goal of statistical inference is to draw conclusions about a
phenomenon based on observed data.
Frequentist: Observations made in the past are analyzed with a specified
model; the result is regarded as confidence about the state of the real world.
Probabilities are defined as the frequencies with which an event occurs if the
experiment is repeated many times.
Parameter Estimation:
o Relies on estimators derived from different data sets and a specific
sampling distribution.
o Parameters may be unknown but are fixed.
Bayesian: The interpretation of probability is subjective and can be updated
with new data.
Parameter Estimation: Parameters are described by a probability density.

Bayesian Inference
Framework:
Prior Distribution: Quantifies prior knowledge of the parameter values.
Likelihood: Probability of observing the data given a certain set of
parameter values.
Posterior Distribution: Conditional probability distribution of the unknown
parameters given the observed data.
Joint PDF: Quantifies all combinations of parameters and observations.

Bayes Relation: Specifies the posterior in terms of the likelihood, prior, and
normalization constant,

    π(q | d) = π(d | q) π₀(q) / ∫ π(d | q) π₀(q) dq.

Problem: Evaluation of the normalization constant typically requires
high-dimensional integration.

Bayesian Inference
Uninformative Prior: No a priori information about the parameters

Informative Prior: Use conjugate priors; the prior and posterior come from the
same distribution family
Evaluation Strategies:
Analytic integration --- Rare
Classical quadrature; e.g., p = 2
Monte Carlo quadrature techniques
Markov Chains
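The Monte Carlo quadrature strategy above can be sketched directly: the normalization constant ∫ π(d | q) π₀(q) dq is the prior expectation of the likelihood, so averaging the likelihood over prior draws estimates it. The one-observation Gaussian model below is an illustrative assumption, not an example from these slides.

```python
import math
import random

def likelihood(q, d=1.0, sigma=1.0):
    # Gaussian likelihood of a single observation d given parameter q
    return math.exp(-(d - q) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def mc_normalization(n=100_000, seed=0):
    # Z = ∫ π(d|q) π₀(q) dq  ≈  (1/n) Σ likelihood(q_j),  with q_j ~ π₀ = N(0, 1)
    rng = random.Random(seed)
    return sum(likelihood(rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# For this toy model Z is known exactly: marginally d ~ N(0, 2), so Z = N(1; 0, 2)
exact = math.exp(-0.25) / math.sqrt(4 * math.pi)
```

With 10^5 prior draws the estimate agrees with the exact value to roughly three decimal places; in higher parameter dimensions this naive estimator degrades quickly, which is what motivates Markov chain methods.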

Bayesian Inference
Example: Coin flipping -- posterior for the probability of heads, updated as
observations accumulate.

[Figures: posterior densities after 1 head, 0 tails; 5 heads, 9 tails;
49 heads, 51 tails]

Bayesian Inference
Example: Now consider a fair coin.

[Figures: posterior densities after 5 heads, 5 tails; 50 heads, 50 tails]

Note: A poor informative prior incorrectly influences results for a long time.
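The coin example follows the conjugate Beta-Bernoulli update: a Beta(a, b) prior with h heads and t tails gives a Beta(a + h, b + t) posterior. A quick sketch of how a poorly chosen informative prior lingers; the specific prior parameters below are illustrative assumptions:

```python
def posterior_mean(a, b, heads, tails):
    # Beta(a, b) prior with Bernoulli data -> Beta(a + heads, b + tails) posterior;
    # the posterior mean is (a + heads) / (a + b + heads + tails)
    return (a + heads) / (a + b + heads + tails)

# Flat Beta(1, 1) prior: 50 heads / 50 tails gives mean 51/102 = 0.5
flat = posterior_mean(1, 1, 50, 50)
# Heavily biased Beta(50, 1) prior: the same fair-coin data still gives 100/151 ≈ 0.66
biased = posterior_mean(50, 1, 50, 50)
```

The biased prior acts like 49 fictitious extra heads, so even 100 real flips cannot pull the estimate back to 0.5.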

Parameter Estimation Problem

Statistical Model: dᵢ = f(tᵢ; q) + εᵢ , i = 1, ..., n

Likelihood: π(d | q) = (2πσ²)^(−n/2) exp(−SS_q / 2σ²), where
SS_q = Σᵢ [dᵢ − f(tᵢ; q)]²

Assumption: The errors εᵢ are independent and identically distributed with
εᵢ ~ N(0, σ²).

Parameter Estimation: Example


Example: Consider the spring model

Parameter Estimation: Example


Ordinary Least Squares: Here the estimate minimizes the sum of squared
residuals between model and data.

Sampling Distribution:

Parameter Estimation: Example


Bayesian Inference: The likelihood is

Posterior Distribution:

Strategy: Create a Markov chain using random sampling so that the created
chain has the posterior distribution as its limiting (stationary) distribution.

Markov Chains
Definition:

Note: A Markov chain is characterized by three components: a state space, an
initial distribution, and a transition kernel.
State Space:
Initial Distribution: (Mass)

Transition Probability: (Markov Kernel)

Markov Chains
Example:

Chapman-Kolmogorov Equations:

Markov Chains: Limiting Distribution


Example: Raleigh weather -- tomorrow's weather conditioned on today's
weather

[Transition diagram: states rain and sun]
Question:

Definition: This is the limiting distribution (invariant measure)

Markov Chains: Limiting Distribution


Example: Raleigh weather
Solve πP = π with Σᵢ πᵢ = 1.

[Transition diagram: states rain and sun]
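The limiting distribution can also be found numerically by iterating the Chapman-Kolmogorov recursion μₖ₊₁ = μₖ P. The rain/sun transition probabilities below are illustrative assumptions, since the slide's actual matrix is not reproduced here.

```python
def step(dist, P):
    # One Chapman-Kolmogorov step: (mu P)_j = sum_i mu_i P[i][j]
    n = len(P)
    return tuple(sum(dist[i] * P[i][j] for i in range(n)) for j in range(n))

# Hypothetical transition matrix (rows: today, columns: tomorrow)
P = [[0.6, 0.4],   # rain -> rain, rain -> sun
     [0.2, 0.8]]   # sun  -> rain, sun  -> sun

dist = (1.0, 0.0)  # start deterministically on a rainy day
for _ in range(100):
    dist = step(dist, P)
# dist has converged to the stationary distribution, here (1/3, 2/3)
```

Starting instead from a sunny day gives the same limit; that independence from the initial distribution is exactly the uniqueness property of the stationary distribution for irreducible, aperiodic chains.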

Irreducible Markov Chains


Reducible Markov Chain:

[Diagram: chain with two non-communicating parts, p1 and p2]

Note: The limiting distribution is not unique if the chain is reducible.

Irreducible: Every state can be reached from every other state in a finite
number of steps.

Periodic Markov Chains


Example:

Periodicity: A Markov chain is periodic if parts of the state space are visited at
regular intervals. The period of a state i is defined as
k = gcd{ n ≥ 1 : Pⁿ(i, i) > 0 }; the chain is aperiodic if k = 1 for all states.

Periodic Markov Chains


Example:

Stationary Distribution
Theorem: A finite, homogeneous Markov chain that is irreducible and aperiodic
has a unique stationary distribution π, and the chain will converge in the sense
of distributions to π from any initial distribution.
Recurrence (Persistence): A state is recurrent if, starting from it, the chain
returns to it with probability 1; otherwise it is transient.

Example: State 3 is transient

Ergodicity: A state is termed ergodic if it is aperiodic and recurrent. If all
states of an irreducible Markov chain are ergodic, the chain is said to be
ergodic.

Matrix Theory
Definition:

Lemma:

Example:

Matrix Theory
Theorem (Perron-Frobenius):

Corollary 1:

Proposition:

Stationary Distribution
Corollary:

Proof:

Convergence: Express

Stationary Distribution

Detailed Balance Conditions


Reversible Chains: A Markov chain determined by the transition matrix P is
reversible if there is a distribution π that satisfies the detailed balance
conditions

    π(i) p(i, j) = π(j) p(j, i) for all states i, j.

Proof: We need to show that π is stationary; summing the detailed balance
conditions over i gives Σᵢ π(i) p(i, j) = π(j) Σᵢ p(j, i) = π(j).

Example:
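A numerical check of the detailed balance conditions for a small chain; the two-state transition matrix and its stationary distribution below are a hypothetical illustration:

```python
# Hypothetical two-state transition matrix and its stationary distribution
P = [[0.6, 0.4],
     [0.2, 0.8]]
pi = (1 / 3, 2 / 3)   # solves pi P = pi

# Detailed balance: pi_i p(i, j) = pi_j p(j, i) for every pair of states
balanced = all(
    abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
    for i in range(2)
    for j in range(2)
)
# Every two-state chain is reversible with respect to its stationary law,
# so balanced is True here
```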

Markov Chain Monte Carlo Methods


Strategy: Markov chain simulation is used when it is impossible, or
computationally prohibitive, to sample directly from the posterior distribution.

Note:
In Markov chain theory, we are given a Markov chain, P, and we
construct its equilibrium distribution.
In MCMC theory, we are given a distribution and we want to construct
a Markov chain that is reversible with respect to it.

Markov Chain Monte Carlo Methods


General Strategy:

Intuition: Recall that

Markov Chain Monte Carlo Methods


Intuition:

Note: Narrower proposal distribution yields higher probability of acceptance.

Metropolis Algorithm
Metropolis Algorithm: [Metropolis and Ulam, 1949]

Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm:

Examples:

Note: Considered one of the top 10 algorithms of the 20th century

Proposal Distribution
Proposal Distribution: Significantly affects mixing
Too wide: Too many points are rejected and the chain stays still for long
periods;
Too narrow: The acceptance ratio is high but the algorithm is slow to explore
the parameter space.
Ideally, it should have a shape similar to the posterior (target) distribution.

Problem: Anisotropic posterior with an isotropic proposal; efficiency is
nonuniform across parameters.

Result: A proposal shaped to the posterior recovers the efficiency of the
univariate case.

Proposal Distribution and Acceptance Probability


Proposal Distribution: Two basic approaches
Choose a fixed proposal function
o Independent Metropolis
Random walk (local Metropolis)
o Two (of several) choices:

Acceptance Probability:

    α(q* | q) = min{ 1, [π(q* | d) J(q | q*)] / [π(q | d) J(q* | q)] }

Random Walk Metropolis Algorithm for Parameter Estimation

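A minimal random-walk Metropolis sketch for a single scalar parameter. The unnormalized Gaussian log-target below stands in for the posterior π(q | d) and is an assumption for illustration; with a symmetric proposal the Hastings correction cancels, leaving only the target ratio.

```python
import math
import random

def log_target(q):
    # Unnormalized log-posterior; a toy N(1, 1) target for illustration
    return -((q - 1.0) ** 2) / 2.0

def rw_metropolis(n=20_000, width=1.0, seed=0):
    rng = random.Random(seed)
    q = 0.0                                   # initial chain value
    chain = []
    for _ in range(n):
        q_star = q + rng.gauss(0.0, width)    # symmetric random-walk proposal
        log_alpha = min(0.0, log_target(q_star) - log_target(q))
        if rng.random() < math.exp(log_alpha):
            q = q_star                        # accept the candidate
        chain.append(q)                       # rejection repeats the old value
    return chain

chain = rw_metropolis()
burned = chain[5_000:]                        # discard burn-in
mean = sum(burned) / len(burned)              # ≈ 1.0, the center of the target
```

Shrinking `width` raises the acceptance rate but slows exploration, and widening it does the opposite; this is exactly the proposal-distribution trade-off described above.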

Markov Chain Monte Carlo: Example


Example: Consider the spring model

Markov Chain Monte Carlo: Example


Example: Single parameter c

Markov Chain Monte Carlo: Example


Example: Consider the spring model

Case i:

Markov Chain Monte Carlo: Example


Case i:

Note:

Markov Chain Monte Carlo: Example


Case i:

Markov Chain Monte Carlo: Example


Example: SMA-driven bending actuator -- talk with John Crews

Model:

Estimated Parameters:


Transition Kernel and Detailed Balance Condition


Transition Kernel: Recall that

Detailed Balance Condition:

Transition Kernel and Detailed Balance Condition


Detailed Balance Condition: Here

Note:

Transition Kernel: Definition

Sampling Error Variance


Strategy: Treat the error variance σ² as a parameter to be estimated.

Recall: The assumption that the errors are normally distributed yields

Goal: Determine the posterior distribution

Strategy:
Choose the prior so that the posterior is from the same family --- termed a
conjugate prior. For a normal distribution with unknown variance, the
conjugate prior is the inverse gamma distribution, which is equivalent to a
scaled inverse chi-squared distribution.

Sampling Error Variance


Definition:

Strategy:

Note:
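The conjugate update can be sketched as follows: with an inverse-gamma IG(α₀, β₀) prior and n normal residuals whose sum of squares is SS, the conditional posterior for σ² is IG(α₀ + n/2, β₀ + SS/2), which can be sampled by inverting a gamma draw. The prior parameters and toy residual sum below are illustrative assumptions.

```python
import random

def sample_sigma2(ss, n, alpha0=2.0, beta0=1.0, rng=None):
    # IG(alpha0, beta0) prior + n normal residuals with sum of squares ss
    # -> conditional posterior IG(alpha0 + n/2, beta0 + ss/2)
    rng = rng or random.Random()
    alpha = alpha0 + n / 2.0
    beta = beta0 + ss / 2.0
    # If X ~ Gamma(shape=alpha, rate=beta), then 1/X ~ InvGamma(alpha, beta);
    # random.gammavariate takes (shape, scale), and scale = 1/rate
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

rng = random.Random(1)
# Toy case: 100 residuals with squared sum 100 (consistent with sigma^2 = 1)
draws = [sample_sigma2(ss=100.0, n=100, rng=rng) for _ in range(5_000)]
mean = sum(draws) / len(draws)   # posterior mean of IG(52, 51) is 51/51 = 1
```

In a full sampler this draw alternates with Metropolis updates of the model parameters, so σ² is refreshed at every chain iteration.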

Sampling Error Variance


Example: Consider the spring model

Related Topics
Note: This is an active research area and there are a number of related topics
Burn in and convergence
Adaptive algorithms
Population Monte Carlo methods
Sequential Monte Carlo methods and particle filters
Gaussian mixture models
Development of metamodels, surrogates and emulators to improve
implementation speeds
References:
A. Solonen, Monte Carlo Methods in Parameter Estimation of Nonlinear Models,
Master's Thesis, 2006.
H. Haario, E. Saksman and J. Tamminen, An adaptive Metropolis algorithm,
Bernoulli, 7(2), pp. 223-242, 2001.
C. Andrieu and J. Thoms, A tutorial on adaptive MCMC, Statistics and
Computing, 18, pp. 343-373, 2008.
M. Vihola, Robust adaptive Metropolis algorithm with coerced acceptance rate,
arXiv:1011.4381v2.
