
Machine Learning

Srihari

Basic Sampling Methods

Sargur Srihari
srihari@cedar.buffalo.edu


Topics
1.  Motivation
2.  Ancestral Sampling
3.  Basic Sampling Algorithms
4.  Rejection Sampling
5.  Importance Sampling
6.  Sampling-Importance-Resampling


1. Motivation
•  When exact inference is intractable, we need
some form of approximation
–  True of probabilistic models of practical significance
•  Inference methods based on numerical sampling
are known as Monte Carlo techniques
•  In most situations we require the expectations of
functions of unobserved variables, e.g., in order to
make predictions
–  rather than the posterior distribution itself


Task
•  Find the expectation E[f] of some function f(z) wrt a
distribution p(z)
–  Components of z can be discrete, continuous, or a combination
–  The function can be z, z², etc.
•  We wish to evaluate

    E[f] = ∫ f(z) p(z) dz

–  Assume it is too complex to be evaluated analytically


•  E.g., a mixture of Gaussians
–  Note: EM with a GMM is used for clustering; our current interest is
inference
–  In the discrete case, the integral is replaced by a summation

Sampling: main idea


•  Obtain a set of samples z^(l), where l = 1,.., L
–  Drawn independently from the distribution p(z)
•  The expectation of a function f(z), E[f], is approximated by the estimator

    f̂ = (1/L) Σ_{l=1..L} f(z^(l))

–  Then E[f̂] = E[f], so the estimator has the correct mean

–  The estimator variance is

    var[f̂] = (1/L) E[(f − E[f])²]

•  The variance, and hence the accuracy, of the estimator does not depend on the dimensionality of z

•  High accuracy may be achieved with relatively few (10–20) independent samples
•  However
–  1. The samples may not be independent
•  so the effective sample size may be much smaller than the apparent sample size
–  2. f(z) may be small in regions where p(z) is large, and vice versa
•  The expectation may then be dominated by regions of small probability, requiring large
sample sizes
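
To make the estimator concrete, here is a minimal Python sketch; the two-component Gaussian mixture, its parameters, and the choice f(z) = z² are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: a two-component 1-D Gaussian mixture we can sample directly
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.0])
stds = np.array([0.5, 1.0])

def sample_p(L):
    """Draw L independent samples z^(l) from the mixture p(z)."""
    comp = rng.choice(2, size=L, p=weights)
    return rng.normal(means[comp], stds[comp])

def f(z):
    return z ** 2              # illustrative function whose expectation we want

L = 1000
z = sample_p(L)
f_hat = f(z).mean()            # f_hat = (1/L) * sum_l f(z^(l))
print(f_hat)                   # approximates E[f]
```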

2. Ancestral Sampling
•  If joint distribution is represented by a directed graph
with no observed variables
–  a straightforward method exists
•  The distribution is specified by

    p(z) = Π_{i=1..M} p(z_i | pa_i)

–  where z_i is the set of variables associated with node i and
–  pa_i is the set of variables associated with the parents of node i
•  To obtain samples from the joint distribution
–  we make one pass through the set of variables in the order z_1,.., z_M,
sampling from the conditional distribution p(z_i | pa_i)
•  After one pass through the graph we obtain one sample
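
A minimal sketch of ancestral sampling in Python; the three-node network and its conditional probabilities are hypothetical, chosen only to illustrate sampling each variable from p(z_i | pa_i) in topological order:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-node network  Cloudy -> Rain -> WetGrass
# (structure and conditional probabilities are illustrative, not from the slides).
def ancestral_sample():
    """One pass through the nodes in topological order z1,.., zM,
    sampling each z_i from p(z_i | pa_i)."""
    cloudy = rng.random() < 0.5                       # p(cloudy)
    rain = rng.random() < (0.8 if cloudy else 0.2)    # p(rain | cloudy)
    wet = rng.random() < (0.9 if rain else 0.1)       # p(wet | rain)
    return {'cloudy': cloudy, 'rain': rain, 'wet': wet}

# each pass through the graph yields one sample from the joint distribution
samples = [ancestral_sample() for _ in range(5)]
print(samples)
```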

Logic Sampling
•  Directed graph where some nodes are
instantiated with observed values
•  Use ancestral sampling, except
–  When a value is sampled for an observed variable, it is
compared with the observed value; if they agree, the sampled
value is retained and we proceed to the next variable
–  If they don’t agree, the whole sample is discarded
•  This samples correctly from the posterior distribution
–  However, the probability of accepting a sample decreases
as the number of variables increases and as the number of states
the variables can take increases
•  This is a special case of Importance Sampling
–  Rarely used in practice
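
A minimal sketch of logic sampling, using a hypothetical two-node network (Rain → WetGrass) so it stays self-contained: ancestral samples are discarded whenever they disagree with the evidence, and the retained samples approximate the posterior:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-node network  Rain -> WetGrass (probabilities are illustrative).
def ancestral_sample():
    rain = rng.random() < 0.2
    wet = rng.random() < (0.9 if rain else 0.1)
    return {'rain': rain, 'wet': wet}

def logic_sample(evidence, max_tries=100_000):
    """Ancestral sampling, discarding samples that disagree with the evidence."""
    for _ in range(max_tries):
        sample = ancestral_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            return sample              # retained: agrees with every observed value
        # otherwise the whole sample is discarded and we start again
    raise RuntimeError("acceptance rate too low")

# e.g. approximate p(rain = True | wet = True) from the accepted samples
accepted = [logic_sample({'wet': True}) for _ in range(2000)]
print(sum(s['rain'] for s in accepted) / len(accepted))
```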

Undirected Graphs
•  There is no one-pass sampling strategy, even for the
case of no observed variables
•  Computationally expensive methods such
as Gibbs sampling must be used


3. Basic Sampling Algorithms


•  Strategies for generating samples from a given
standard distribution, e.g., Gaussian
•  Assume that we have a pseudo-random number
generator for the uniform distribution over (0,1)
•  For standard distributions we can transform
uniformly distributed samples into desired
distributions


Transforming Uniform to Standard Distribution


[Figure: z uniform over (0,1) is transformed by f into y = f(z)]
•  Suppose z is uniformly distributed over (0,1), i.e., p(z) = 1 on that interval
•  If we transform the values of z using a function f() such that y = f(z)
•  The distribution of y is governed by

    p(y) = p(z) |dz/dy|     (1)

•  The goal is to choose f(z) such that the values of y have the desired distribution p(y)
•  Integrating (1) above gives

    z = h(y) ≡ ∫_{−∞}^{y} p(ŷ) dŷ

since p(z) = 1 and the integral of dz/dy wrt y is z
–  h(y) is the indefinite integral of p(y)
•  Thus y = h^{-1}(z)
•  So we have to transform the uniformly distributed random numbers
–  using a function which is the inverse of the indefinite integral of the desired
distribution

Geometry of Transformation
•  Generating non-uniform random variables

•  h(y) is the indefinite integral of the desired distribution p(y)
•  z ~ U(0,1) is transformed using y = h^{-1}(z)
•  This results in y being distributed as p(y)

Transformation for Exponential


•  Exponential distribution

    p(y) = λ exp(−λy),   where 0 ≤ y < ∞

•  In this case h(y) = 1 − exp(−λy)
•  If we transform using y = −λ^{-1} ln(1 − z)
•  Then y will have an exponential distribution
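
A minimal NumPy sketch of this inverse-CDF transformation (λ = 2 is just an illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                  # rate parameter lambda (illustrative value)

z = rng.random(100_000)                    # z ~ U(0,1)
y = -np.log(1.0 - z) / lam                 # y = -(1/lambda) ln(1 - z) = h^{-1}(z)

# Sanity check: the sample mean should be close to the exponential mean 1/lambda
print(y.mean(), 1.0 / lam)
```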


Transformation for Cauchy


•  Cauchy distribution

    p(y) = 1 / (π (1 + y²))

•  The inverse of the indefinite integral can be expressed as a “tan” function:

    y = tan(π(z − 1/2))

Generalization: Multivariate and Gaussian


•  Box-Muller method for the Gaussian
•  Example of a bivariate Gaussian
•  Generate pairs of uniformly distributed
random numbers
    z₁, z₂ ∈ (−1, 1)
–  Can be done from U(0,1) using the mapping z → 2z − 1
•  Discard each pair unless z₁² + z₂² ≤ 1
•  This leads to a uniform distribution of points
inside the unit circle, with p(z₁, z₂) = 1/π


Generating a Gaussian
•  For each pair z₁, z₂ evaluate the quantities

    y₁ = z₁ (−2 ln r² / r²)^{1/2},   y₂ = z₂ (−2 ln r² / r²)^{1/2},   where r² = z₁² + z₂²

•  Then y₁ and y₂ are independent Gaussians with
zero mean and unit variance
•  If y ~ N(0,1) then σy + µ has distribution N(µ, σ²)
•  In the multivariate case
–  If the components of z are independent and N(0,1), then y = µ + Lz
will have distribution N(µ, Σ)
where Σ = LLᵀ; L is obtained from the Cholesky decomposition of Σ
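
A minimal sketch of the polar Box-Muller method together with the Cholesky construction for the multivariate case; the mean and covariance shown are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

def polar_box_muller(n):
    """Generate n pairs of independent N(0,1) samples via the polar Box-Muller method."""
    out = []
    while len(out) < n:
        z1, z2 = rng.uniform(-1, 1, size=2)        # z1, z2 uniform on (-1, 1)
        r2 = z1**2 + z2**2
        if 0 < r2 <= 1:                            # keep only points inside the unit circle
            factor = np.sqrt(-2.0 * np.log(r2) / r2)
            out.append((z1 * factor, z2 * factor)) # y1, y2 ~ N(0,1), independent
    return np.array(out[:n])

# Multivariate case: y = mu + L z gives N(mu, Sigma) with Sigma = L L^T (Cholesky).
mu = np.array([1.0, -1.0])                         # illustrative mean
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])         # illustrative covariance
L = np.linalg.cholesky(Sigma)
z = polar_box_muller(5000)                         # rows of independent N(0,1) pairs
y = mu + z @ L.T
print(y.mean(axis=0), np.cov(y.T))                 # should approximate mu and Sigma
```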

4. Rejection Sampling
•  The transformation method depends on the ability to
calculate and then invert the indefinite integral
•  The method is feasible only for some standard
distributions
•  A more general strategy is needed
•  Rejection sampling and importance sampling are
largely limited to univariate (low-dimensional) distributions
–  Although not directly applicable to complex high-dimensional problems,
they are important components of more general strategies
•  Rejection sampling allows sampling from relatively complex distributions


Rejection Sampling Method


•  We wish to sample from a distribution p(z)
•  Suppose we are able to easily evaluate p(z) for any
given value of z, up to a normalizing constant: p(z) = p̃(z)/Z_p
•  Samples are drawn from a simple distribution, called the
proposal distribution q(z)
•  Introduce a constant k whose value is such that kq(z)
≥ p̃(z) for all z
–  kq(z) is called the comparison function


Rejection Sampling Intuition


•  Samples are drawn from the
simple distribution q(z)
•  Rejected if they fall in the grey
area
–  Between the un-normalized
distribution p̃(z) and the scaled
distribution kq(z)
•  The resulting samples are
distributed according to p(z),
which is the normalized
version of p̃(z)

How to determine if a sample is in the shaded region?
•  Each step involves generating two random
numbers
–  z₀ from q(z) and u₀ from the uniform distribution over [0, kq(z₀)]
•  This pair has a uniform distribution under the curve of the
function kq(z)
•  If u₀ > p̃(z₀) the pair is rejected, otherwise it is
retained
•  The remaining pairs have a uniform distribution under
the curve of p̃(z), and hence the corresponding z
values are distributed according to p(z), as desired
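
A minimal sketch of rejection sampling in Python; the un-normalized bimodal target p̃(z), the Gaussian proposal N(0, 3²), and the bound k = 12 are illustrative assumptions (k was checked numerically so that kq(z) ≥ p̃(z) everywhere for this particular target):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # illustrative un-normalized bimodal target
    return np.exp(-0.5 * (z - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (z + 2.0) ** 2)

def q_pdf(z, sigma=3.0):
    # Gaussian proposal q(z) = N(0, sigma^2)
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

k = 12.0          # chosen so that k*q(z) >= p_tilde(z) for all z (for this target)

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z0 = rng.normal(0.0, 3.0)                 # z0 drawn from the proposal q(z)
        u0 = rng.uniform(0.0, k * q_pdf(z0))      # u0 ~ U[0, k q(z0)]
        if u0 <= p_tilde(z0):                     # accept pairs lying under p_tilde(z)
            samples.append(z0)
    return np.array(samples)

samples = rejection_sample(2000)   # z values distributed as the normalized p(z)
```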

Rejection Sampling from Gamma


•  Task: sampling from the Gamma distribution

    Gam(z | a, b) = b^a z^{a−1} exp(−bz) / Γ(a),   with a > 1

•  Since the Gamma is roughly bell-
shaped, the proposal distribution is a
Cauchy
•  The Cauchy has to be slightly
generalized
–  To ensure it is nowhere smaller
than the Gamma


Adaptive Rejection Sampling


•  Used when it is difficult to find a suitable
analytic form for the envelope (proposal) distribution q(z)
•  Constructing an envelope is straightforward when p(z) is
log concave
–  i.e., when ln p(z) has derivatives that
are non-increasing functions of z
–  The function ln p(z) and its gradient
are evaluated at an initial set of grid
points
–  The intersections of the resulting tangent lines are used to
construct the envelope
•  The envelope is a sequence of linear functions of ln p(z)

Dimensionality and Rejection Sampling


[Figure: true distribution p(z), Gaussian proposal q(z), and its scaled version kq(z)]
•  Gaussian example: the proposal distribution q(z) is a Gaussian, scaled to kq(z)
•  The acceptance rate is the ratio of the volumes under p(z) and kq(z)
–  it diminishes exponentially with dimensionality


5. Importance Sampling
•  Provides a framework for evaluating the expectation of f(z) with respect to a
distribution p(z) from which it is difficult to draw samples directly
•  Samples {z^(l)} are drawn from a simpler proposal distribution q(z)
•  Terms in the summation are weighted by the ratios p(z^(l)) / q(z^(l)):

    E[f] ≈ (1/L) Σ_{l=1..L} [p(z^(l)) / q(z^(l))] f(z^(l))
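
A minimal sketch of importance sampling; the Gaussian-mixture target, the broad Gaussian proposal, and f(z) = z² are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_pdf(z):                       # target density we can evaluate but not sample easily
    return 0.3 * np.exp(-0.5 * (z + 2) ** 2) / np.sqrt(2 * np.pi) \
         + 0.7 * np.exp(-0.5 * (z - 1) ** 2) / np.sqrt(2 * np.pi)

def q_pdf(z, sigma=3.0):            # broad Gaussian proposal q(z) = N(0, sigma^2)
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def f(z):
    return z ** 2

L = 10_000
z = rng.normal(0.0, 3.0, size=L)    # z^(l) ~ q(z)
weights = p_pdf(z) / q_pdf(z)       # importance ratios p(z^(l)) / q(z^(l))
estimate = np.mean(weights * f(z))  # E[f] ≈ (1/L) sum_l w_l f(z^(l))
print(estimate)
```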

6. Sampling Importance Re-sampling (SIR)


•  Rejection sampling depends on a suitable value of k
–  For many pairs of distributions p(z) and q(z) it is impractical to
determine a suitable value of k
–  If k is large enough to guarantee a bound, it leads to
impractically small acceptance rates
[Figure: rejection sampling with Gaussian proposal q(z), scaled version kq(z), and true distribution p(z)]
•  The SIR method makes use of a sampling distribution q(z) but avoids
having to determine the constant k

SIR Method
•  Two stages
•  Stage 1: L samples z^(1),.., z^(L) are drawn
from q(z)
•  Stage 2: weights w₁,.., w_L are constructed
–  as in importance sampling:
    w_l = [p̃(z^(l)) / q(z^(l))] / Σ_m [p̃(z^(m)) / q(z^(m))]
•  Finally, a second set of L samples is
drawn from the discrete distribution
{z^(1),.., z^(L)} with probabilities given by
{w₁,.., w_L}
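
A minimal SIR sketch reusing the illustrative un-normalized target and Gaussian proposal from the rejection-sampling example; note that no constant k is needed:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # illustrative un-normalized bimodal target
    return np.exp(-0.5 * (z - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (z + 2.0) ** 2)

def q_pdf(z, sigma=3.0):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

L = 5000
# Stage 1: draw L samples from the proposal q(z)
z = rng.normal(0.0, 3.0, size=L)
# Stage 2: construct normalized importance weights, as in importance sampling
w = p_tilde(z) / q_pdf(z)
w /= w.sum()
# Finally: resample L values from {z^(1),.., z^(L)} with probabilities {w_1,.., w_L}
resampled = rng.choice(z, size=L, replace=True, p=w)
# 'resampled' is approximately distributed according to the normalized p(z)
```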

Next Topic
•  Markov Chain Monte Carlo (MCMC)
–  Does not have the limitations of rejection sampling
and importance sampling in high-dimensional spaces
