
Machine Learning

Srihari

Basic Sampling Methods

Sargur Srihari
srihari@cedar.buffalo.edu


Topics
1.  Motivation
2.  Ancestral Sampling
3.  Basic Sampling Algorithms
4.  Rejection Sampling
5.  Importance Sampling
6.  Sampling-Importance-Resampling


1. Motivation
•  When exact inference is intractable, we need
some form of approximation
–  True of probabilistic models of practical significance
•  Inference methods based on numerical sampling
are known as Monte Carlo techniques
•  In most situations we require the expectations of
functions of unobserved variables, e.g., in order to
make predictions
–  rather than the posterior distribution itself


Task
•  Find the expectation E[f] of some function f(z) wrt a
distribution p(z)
–  Components of z can be discrete, continuous, or a combination
–  The function can be z, z², etc.
•  We wish to evaluate

    E[f] = ∫ f(z) p(z) dz

–  Assume it is too complex to be evaluated analytically


•  E.g., a mixture of Gaussians
–  Note: EM with a GMM is used for clustering; our current interest is
inference
–  In the discrete case, the integral is replaced by a summation

Sampling: main idea


•  Obtain a set of samples z^(l), where l = 1,.., L
–  Drawn independently from the distribution p(z)
•  The expectation of a function f(z), E[f], is approximated by the estimator

    f̂ = (1/L) Σ_{l=1..L} f(z^(l))

–  Then E[f̂] = E[f], so the estimator has the correct mean

–  The estimator variance is

    var[f̂] = (1/L) E[(f − E[f])²]

•  The variance, and hence the accuracy, of the estimator does not depend on the dimensionality of z

•  High accuracy may be achieved with relatively few (10–20) independent samples
•  However
–  1. The samples may not be independent
•  so the effective sample size may be much smaller than the apparent sample size
–  2. f(z) may be small in regions where p(z) is large, and vice versa
•  The expectation may then be dominated by regions of small probability, requiring large
sample sizes
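
To make the estimator concrete, here is a minimal Python sketch; the two-component Gaussian mixture, its parameters, and the choice f(z) = z² are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: a two-component 1-D Gaussian mixture we can sample directly
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.0])
stds = np.array([0.5, 1.0])

def sample_p(L):
    """Draw L independent samples z^(l) from the mixture p(z)."""
    comp = rng.choice(2, size=L, p=weights)
    return rng.normal(means[comp], stds[comp])

def f(z):
    return z ** 2              # illustrative function whose expectation we want

L = 1000
z = sample_p(L)
f_hat = f(z).mean()            # f_hat = (1/L) * sum_l f(z^(l))
print(f_hat)                   # approximates E[f]
```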

2. Ancestral Sampling
•  If joint distribution is represented by a directed graph
with no observed variables
–  a straightforward method exists
•  The distribution is specified by

    p(z) = Π_{i=1..M} p(z_i | pa_i)

–  where z_i is the set of variables associated with node i and
–  pa_i is the set of variables associated with the parents of node i
•  To obtain samples from the joint distribution
–  we make one pass through the set of variables in the order z_1,.., z_M,
sampling from the conditional distribution p(z_i | pa_i)
•  After one pass through the graph we obtain one sample
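
A minimal sketch of ancestral sampling in Python; the three-node network and its conditional probabilities are hypothetical, chosen only to illustrate sampling each variable from p(z_i | pa_i) in topological order:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-node network  Cloudy -> Rain -> WetGrass
# (structure and conditional probabilities are illustrative, not from the slides).
def ancestral_sample():
    """One pass through the nodes in topological order z1,.., zM,
    sampling each z_i from p(z_i | pa_i)."""
    cloudy = rng.random() < 0.5                       # p(cloudy)
    rain = rng.random() < (0.8 if cloudy else 0.2)    # p(rain | cloudy)
    wet = rng.random() < (0.9 if rain else 0.1)       # p(wet | rain)
    return {'cloudy': cloudy, 'rain': rain, 'wet': wet}

# each pass through the graph yields one sample from the joint distribution
samples = [ancestral_sample() for _ in range(5)]
print(samples)
```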

Logic Sampling
•  Directed graph where some nodes are
instantiated with observed values
•  Use ancestral sampling, except
–  When a value is sampled for an observed variable, it is
compared with the observed value; if they agree, the sampled
value is retained and we proceed to the next variable
–  If they don’t agree, the whole sample is discarded
•  This samples correctly from the posterior distribution
–  However, the probability of accepting a sample decreases
as the number of variables increases and as the number of states
the variables can take increases
•  This is a special case of Importance Sampling
–  Rarely used in practice
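
A minimal sketch of logic sampling, using a hypothetical two-node network (Rain → WetGrass) so it stays self-contained: ancestral samples are discarded whenever they disagree with the evidence, and the retained samples approximate the posterior:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-node network  Rain -> WetGrass (probabilities are illustrative).
def ancestral_sample():
    rain = rng.random() < 0.2
    wet = rng.random() < (0.9 if rain else 0.1)
    return {'rain': rain, 'wet': wet}

def logic_sample(evidence, max_tries=100_000):
    """Ancestral sampling, discarding samples that disagree with the evidence."""
    for _ in range(max_tries):
        sample = ancestral_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            return sample              # retained: agrees with every observed value
        # otherwise the whole sample is discarded and we start again
    raise RuntimeError("acceptance rate too low")

# e.g. approximate p(rain = True | wet = True) from the accepted samples
accepted = [logic_sample({'wet': True}) for _ in range(2000)]
print(sum(s['rain'] for s in accepted) / len(accepted))
```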

Undirected Graphs
•  There is no one-pass sampling strategy, even for the
case of no observed variables
•  Computationally expensive methods such
as Gibbs sampling must be used


3. Basic Sampling Algorithms


•  Strategies for generating samples from a given
standard distribution, e.g., Gaussian
•  Assume that we have a pseudo-random number
generator for the uniform distribution over (0,1)
•  For standard distributions we can transform
uniformly distributed samples into desired
distributions


Transforming Uniform to Standard Distribution


[Figure: z uniform over (0,1) is transformed by f into y = f(z)]
•  Suppose z is uniformly distributed over (0,1), i.e., p(z) = 1 on that interval
•  If we transform the values of z using a function f() such that y = f(z)
•  The distribution of y is governed by

    p(y) = p(z) |dz/dy|     (1)

•  The goal is to choose f(z) such that the values of y have the desired distribution p(y)
•  Integrating (1) above gives

    z = h(y) ≡ ∫_{−∞}^{y} p(ŷ) dŷ

since p(z) = 1 and the integral of dz/dy wrt y is z
–  h(y) is the indefinite integral of p(y)
•  Thus y = h^{-1}(z)
•  So we have to transform the uniformly distributed random numbers
–  using a function which is the inverse of the indefinite integral of the desired
distribution

Geometry of Transformation
•  Generating non-uniform random variables

•  h(y) is the indefinite integral of the desired distribution p(y)
•  z ~ U(0,1) is transformed using y = h^{-1}(z)
•  This results in y being distributed as p(y)

Transformation for Exponential


•  Exponential distribution

    p(y) = λ exp(−λy),   where 0 ≤ y < ∞

•  In this case h(y) = 1 − exp(−λy)
•  If we transform using y = −λ^{-1} ln(1 − z)
•  Then y will have an exponential distribution
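
A minimal NumPy sketch of this inverse-CDF transformation (λ = 2 is just an illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                  # rate parameter lambda (illustrative value)

z = rng.random(100_000)                    # z ~ U(0,1)
y = -np.log(1.0 - z) / lam                 # y = -(1/lambda) ln(1 - z) = h^{-1}(z)

# Sanity check: the sample mean should be close to the exponential mean 1/lambda
print(y.mean(), 1.0 / lam)
```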


Transformation for Cauchy


•  Cauchy distribution

    p(y) = 1 / (π (1 + y²))

•  The inverse of the indefinite integral can be expressed as a “tan” function:

    y = tan(π(z − 1/2))

Generalization: Multivariate and Gaussian


•  Box-Muller method for the Gaussian
•  Example of a bivariate Gaussian
•  Generate pairs of uniformly distributed
random numbers
    z₁, z₂ ∈ (−1, 1)
–  Can be done from U(0,1) using the mapping z → 2z − 1
•  Discard each pair unless z₁² + z₂² ≤ 1
•  This leads to a uniform distribution of points
inside the unit circle, with p(z₁, z₂) = 1/π


Generating a Gaussian
•  For each pair z₁, z₂ evaluate the quantities

    y₁ = z₁ (−2 ln r² / r²)^{1/2},   y₂ = z₂ (−2 ln r² / r²)^{1/2},   where r² = z₁² + z₂²

•  Then y₁ and y₂ are independent Gaussians with
zero mean and unit variance
•  If y ~ N(0,1) then σy + µ has distribution N(µ, σ²)
•  In the multivariate case
–  If the components of z are independent and N(0,1), then y = µ + Lz
will have distribution N(µ, Σ)
where Σ = LLᵀ; L is obtained from the Cholesky decomposition of Σ
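
A minimal sketch of the polar Box-Muller method together with the Cholesky construction for the multivariate case; the mean and covariance shown are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

def polar_box_muller(n):
    """Generate n pairs of independent N(0,1) samples via the polar Box-Muller method."""
    out = []
    while len(out) < n:
        z1, z2 = rng.uniform(-1, 1, size=2)        # z1, z2 uniform on (-1, 1)
        r2 = z1**2 + z2**2
        if 0 < r2 <= 1:                            # keep only points inside the unit circle
            factor = np.sqrt(-2.0 * np.log(r2) / r2)
            out.append((z1 * factor, z2 * factor)) # y1, y2 ~ N(0,1), independent
    return np.array(out[:n])

# Multivariate case: y = mu + L z gives N(mu, Sigma) with Sigma = L L^T (Cholesky).
mu = np.array([1.0, -1.0])                         # illustrative mean
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])         # illustrative covariance
L = np.linalg.cholesky(Sigma)
z = polar_box_muller(5000)                         # rows of independent N(0,1) pairs
y = mu + z @ L.T
print(y.mean(axis=0), np.cov(y.T))                 # should approximate mu and Sigma
```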

4. Rejection Sampling
•  The transformation method depends on the ability to
calculate and then invert the indefinite integral
•  The method is feasible only for some standard
distributions
•  A more general strategy is needed
•  Rejection sampling and importance sampling are
largely limited to univariate (low-dimensional) distributions
–  Although not directly applicable to complex high-dimensional problems,
they are important components of more general strategies
•  Rejection sampling allows sampling from relatively complex distributions


Rejection Sampling Method


•  We wish to sample from a distribution p(z)
•  Suppose we are able to easily evaluate p(z) for any
given value of z, up to a normalizing constant: p(z) = p̃(z)/Z_p
•  Samples are drawn from a simple distribution, called the
proposal distribution q(z)
•  Introduce a constant k whose value is such that kq(z)
≥ p̃(z) for all z
–  kq(z) is called the comparison function


Rejection Sampling Intuition


•  Samples are drawn from the
simple distribution q(z)
•  Rejected if they fall in the grey
area
–  Between the un-normalized
distribution p̃(z) and the scaled
distribution kq(z)
•  The resulting samples are
distributed according to p(z),
which is the normalized
version of p̃(z)

How to determine if a sample is in the shaded region?
•  Each step involves generating two random
numbers
–  z₀ from q(z) and u₀ from the uniform distribution over [0, kq(z₀)]
•  This pair has a uniform distribution under the curve of the
function kq(z)
•  If u₀ > p̃(z₀) the pair is rejected, otherwise it is
retained
•  The remaining pairs have a uniform distribution under
the curve of p̃(z), and hence the corresponding z
values are distributed according to p(z), as desired
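
A minimal sketch of rejection sampling in Python; the un-normalized bimodal target p̃(z), the Gaussian proposal N(0, 3²), and the bound k = 12 are illustrative assumptions (k was checked numerically so that kq(z) ≥ p̃(z) everywhere for this particular target):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # illustrative un-normalized bimodal target
    return np.exp(-0.5 * (z - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (z + 2.0) ** 2)

def q_pdf(z, sigma=3.0):
    # Gaussian proposal q(z) = N(0, sigma^2)
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

k = 12.0          # chosen so that k*q(z) >= p_tilde(z) for all z (for this target)

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z0 = rng.normal(0.0, 3.0)                 # z0 drawn from the proposal q(z)
        u0 = rng.uniform(0.0, k * q_pdf(z0))      # u0 ~ U[0, k q(z0)]
        if u0 <= p_tilde(z0):                     # accept pairs lying under p_tilde(z)
            samples.append(z0)
    return np.array(samples)

samples = rejection_sample(2000)   # z values distributed as the normalized p(z)
```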

Rejection Sampling from Gamma


•  Task: sampling from the Gamma distribution

    Gam(z | a, b) = b^a z^{a−1} exp(−bz) / Γ(a),   with a > 1

•  Since the Gamma is roughly bell-
shaped, the proposal distribution is a
Cauchy
•  The Cauchy has to be slightly
generalized
–  To ensure it is nowhere smaller
than the Gamma


Adaptive Rejection Sampling


•  Used when it is difficult to find a suitable
analytic form for the envelope (proposal) distribution q(z)
•  Constructing an envelope is straightforward when p(z) is
log concave
–  i.e., when ln p(z) has derivatives that
are non-increasing functions of z
–  The function ln p(z) and its gradient
are evaluated at an initial set of grid
points
–  The intersections of the resulting tangent lines are used to
construct the envelope
•  The envelope is a sequence of linear functions of ln p(z)

Dimensionality and Rejection Sampling


[Figure: true distribution p(z), Gaussian proposal q(z), and its scaled version kq(z)]
•  Gaussian example: the proposal distribution q(z) is a Gaussian, scaled to kq(z)
•  The acceptance rate is the ratio of the volumes under p(z) and kq(z)
–  it diminishes exponentially with dimensionality


5. Importance Sampling
•  Provides a framework for evaluating the expectation of f(z) with respect to a
distribution p(z) from which it is difficult to draw samples directly
•  Samples {z^(l)} are drawn from a simpler proposal distribution q(z)
•  Terms in the summation are weighted by the ratios p(z^(l)) / q(z^(l)):

    E[f] ≈ (1/L) Σ_{l=1..L} [p(z^(l)) / q(z^(l))] f(z^(l))
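
A minimal sketch of importance sampling; the Gaussian-mixture target, the broad Gaussian proposal, and f(z) = z² are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_pdf(z):                       # target density we can evaluate but not sample easily
    return 0.3 * np.exp(-0.5 * (z + 2) ** 2) / np.sqrt(2 * np.pi) \
         + 0.7 * np.exp(-0.5 * (z - 1) ** 2) / np.sqrt(2 * np.pi)

def q_pdf(z, sigma=3.0):            # broad Gaussian proposal q(z) = N(0, sigma^2)
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def f(z):
    return z ** 2

L = 10_000
z = rng.normal(0.0, 3.0, size=L)    # z^(l) ~ q(z)
weights = p_pdf(z) / q_pdf(z)       # importance ratios p(z^(l)) / q(z^(l))
estimate = np.mean(weights * f(z))  # E[f] ≈ (1/L) sum_l w_l f(z^(l))
print(estimate)
```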

6. Sampling Importance Re-sampling (SIR)


•  Rejection sampling depends on a suitable value of k
–  For many pairs of distributions p(z) and q(z) it is impractical to
determine a suitable value of k
–  If k is large enough to guarantee a bound, it leads to
impractically small acceptance rates
[Figure: rejection sampling with Gaussian proposal q(z), scaled version kq(z), and true distribution p(z)]
•  The SIR method makes use of a sampling distribution q(z) but avoids
having to determine the constant k

SIR Method
•  Two stages
•  Stage 1: L samples z^(1),.., z^(L) are drawn
from q(z)
•  Stage 2: weights w₁,.., w_L are constructed
–  as in importance sampling:
    w_l = [p̃(z^(l)) / q(z^(l))] / Σ_m [p̃(z^(m)) / q(z^(m))]
•  Finally, a second set of L samples is
drawn from the discrete distribution
{z^(1),.., z^(L)} with probabilities given by
{w₁,.., w_L}
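
A minimal SIR sketch reusing the illustrative un-normalized target and Gaussian proposal from the rejection-sampling example; note that no constant k is needed:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # illustrative un-normalized bimodal target
    return np.exp(-0.5 * (z - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (z + 2.0) ** 2)

def q_pdf(z, sigma=3.0):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

L = 5000
# Stage 1: draw L samples from the proposal q(z)
z = rng.normal(0.0, 3.0, size=L)
# Stage 2: construct normalized importance weights, as in importance sampling
w = p_tilde(z) / q_pdf(z)
w /= w.sum()
# Finally: resample L values from {z^(1),.., z^(L)} with probabilities {w_1,.., w_L}
resampled = rng.choice(z, size=L, replace=True, p=w)
# 'resampled' is approximately distributed according to the normalized p(z)
```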

Next Topic
•  Markov Chain Monte Carlo (MCMC)
–  Does not have the limitations of rejection sampling
and importance sampling in high-dimensional spaces
