
Mixture Models (Heading)

Mixture models allow us to model differences in data caused by the presence of groups when we don't know which group each observation belongs to. In some cases, we don't even know how many groups there are! Instead of assuming that all observations come from a single distribution, a mixture assumes only that each observation comes from one of K groups, each with its own distribution. Mixture models can be used for very complex tasks, such as real-time video tracking and recognising you from a recording of your voice.
Let's consider the following mixture model:
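The model's equation did not survive extraction. The sampling function takes the inputs w, mu1, mu2, sigma1 and sigma2, so the model is presumably a two-component normal mixture. Under that assumption, and taking w as the weight of the first component, its density would be:

```latex
f(x) = w \, \mathcal{N}(x \mid \mu_1, \sigma_1^2) + (1 - w) \, \mathcal{N}(x \mid \mu_2, \sigma_2^2), \qquad 0 \le w \le 1,

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).
```

Under this form, sampling from f amounts to choosing the first component with probability w and the second otherwise, then drawing one value from the chosen normal.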

It is computationally straightforward to sample from this mixture model exactly (without any rejection step); however, no functions from MATLAB's Statistics Toolbox may be used.
Research (Heading)
Initially, background research was conducted to determine the best method for sampling from this mixture model. The Cross Validated Stack Exchange site suggested first generating a random variable from a Uniform(0, 1) distribution. The interval [0, 1] is divided into consecutive sub-intervals whose lengths equal the mixture weights. If the uniform draw falls in the sub-interval belonging to the kth component, that component is selected; this happens with probability equal to the kth component's weight. A single value is then generated from the kth component's distribution. This process is repeated until the desired number of samples from the mixture distribution has been drawn.
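The selection step described above works for any number of components. It can be sketched in Python as follows. This is an illustrative translation, not the report's MATLAB code. The function names `choose_component` and `sample_by_composition` are hypothetical, and each component is represented by a caller-supplied function that draws one value.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def choose_component(weights, u):
    """Return the 0-based index of the component selected by a Uniform(0, 1) draw u.

    [0, 1) is split into consecutive slices whose lengths are the weights,
    so component k is chosen with probability weights[k].
    """
    cumulative = list(accumulate(weights))
    if any(wt < 0 for wt in weights) or abs(cumulative[-1] - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # The first running total strictly greater than u marks the slice containing u.
    k = bisect_right(cumulative, u)
    # Round-off can leave the final total a hair below 1; clamp to the last component.
    return min(k, len(weights) - 1)


def sample_by_composition(weights, component_samplers, n, rng=None):
    """Draw n values: pick a component with a uniform draw, then sample from it."""
    rng = rng or random.Random()
    return [component_samplers[choose_component(weights, rng.random())](rng)
            for _ in range(n)]
```

For example, `sample_by_composition([0.3, 0.7], [lambda r: r.gauss(0, 1), lambda r: r.gauss(5, 1)], 10000)` draws from a two-component normal mixture. The running totals are recomputed on every call to keep the sketch short.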
This process was followed to write the function used to sample from our mixture model, which drew a total of 10,000 samples. Its inputs are w, mu1, mu2, sigma1, sigma2 and n (the number of samples).
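A function with these inputs might look like the following Python sketch. It assumes the two-normal form with w as the weight of the first component. The name `sample_mixture` and the optional `seed` argument are hypothetical additions, not taken from the report's appendix. Like the report, it uses no statistics package: only a uniform draw and a normal draw from Python's standard `random` module.

```python
import random


def sample_mixture(w, mu1, mu2, sigma1, sigma2, n, seed=None):
    """Draw n values from w*N(mu1, sigma1^2) + (1 - w)*N(mu2, sigma2^2)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: a Uniform(0, 1) draw picks the component (component 1 with probability w).
        if rng.random() < w:
            mu, sigma = mu1, sigma1
        else:
            mu, sigma = mu2, sigma2
        # Step 2: draw a single value from the chosen normal component.
        samples.append(rng.gauss(mu, sigma))
    return samples


if __name__ == "__main__":
    # Hypothetical parameter values; the report's actual values are not given here.
    xs = sample_mixture(w=0.3, mu1=0.0, mu2=5.0, sigma1=1.0, sigma2=1.0, n=10000, seed=1)
    print(len(xs), sum(xs) / len(xs))
```

Each sample costs one uniform draw and one normal draw, and no draws are rejected. This is why the method is exact and cheap.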
MATLAB Function Code (Heading)
The MATLAB function used to generate samples from this mixture model is given in the appendix.
Comparison (Heading)
10,000 samples were generated using this method. MATLAB's ksdensity function was used to estimate their density, and this estimate was plotted alongside the mixture model and its individual component distributions. The comparison of the mixture model with the ksdensity estimate can be seen below.

Figure 1: Mixture model, ksdensity function and individual components; n = 10000.

The green and purple lines show the first and second components of the mixture model, respectively. The red line shows the full mixture model. The blue line shows the ksdensity estimate computed from the sampled values, which are plotted as yellow dots along the x-axis.
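The comparison in Figure 1 can be approximated without MATLAB. The sketch below evaluates the target density (the red line) and a Gaussian kernel density estimate (the blue line) on a grid. It rests on two assumptions. First, it assumes the two-normal form of the model. Second, its default bandwidth uses a normal-reference rule, h = s(4/(3n))^(1/5) with s = MAD/0.6745. This is similar in spirit to ksdensity's default but is not guaranteed to reproduce it. The function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, mu1, mu2, sigma1, sigma2):
    """Target density (the red line), assuming two normal components."""
    return w * normal_pdf(x, mu1, sigma1) + (1.0 - w) * normal_pdf(x, mu2, sigma2)


def gaussian_kde(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference bandwidth with a robust spread estimate (MAD / 0.6745).
        med = statistics.median(samples)
        mad = statistics.median(abs(x - med) for x in samples)
        bandwidth = (mad / 0.6745) * (4.0 / (3.0 * n)) ** 0.2
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (are all samples identical?)")
    # Each sample contributes a normal bump of width `bandwidth`; average the bumps.
    return [sum(normal_pdf(g, x, bandwidth) for x in samples) / n for g in grid]


if __name__ == "__main__":
    w, mu1, mu2, sigma1, sigma2 = 0.3, 0.0, 5.0, 1.0, 1.0  # hypothetical parameters
    rng = random.Random(0)
    samples = [rng.gauss(mu1, sigma1) if rng.random() < w else rng.gauss(mu2, sigma2)
               for _ in range(10000)]
    grid = [-4.0 + 0.5 * i for i in range(27)]  # -4.0 to 9.0 in steps of 0.5
    for g, est in zip(grid, gaussian_kde(samples, grid)):
        target = mixture_pdf(g, w, mu1, mu2, sigma1, sigma2)
        print(f"x={g:5.1f}  target={target:.4f}  kde={est:.4f}")
```

The kernel smoothing tends to flatten sharp peaks slightly. This is one plausible contributor to the small discrepancies at the peaks noted in the next paragraph.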
The plot above shows that the ksdensity estimate matches the target distribution closely. There is a slight error at the peak of the first component, and near the second peak the ksdensity estimate veers away from the target distribution. However, given that only 10,000 samples were used, this is a fair estimate of the mixture model. With fewer samples the estimate becomes less accurate, as is evident when the number of samples is reduced to 1,000.
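The effect of sample size can also be put into numbers. One option, not used in the report, is the Kolmogorov-Smirnov distance: the largest gap between the empirical CDF of the samples and the mixture's CDF. Unlike a kernel estimate, it needs no bandwidth. The sketch below again assumes the two-normal form and uses hypothetical parameter values; `draw` is a hypothetical helper. For an exact sampler, the distance typically shrinks roughly in proportion to 1/sqrt(n).

```python
import math
import random


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, w, mu1, mu2, sigma1, sigma2):
    return w * normal_cdf(x, mu1, sigma1) + (1.0 - w) * normal_cdf(x, mu2, sigma2)


def ks_distance(samples, cdf):
    """Largest gap between the empirical CDF of `samples` and the target `cdf`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def draw(w, mu1, mu2, sigma1, sigma2, n, rng):
    """Composition sampling from the assumed two-normal mixture."""
    return [rng.gauss(mu1, sigma1) if rng.random() < w else rng.gauss(mu2, sigma2)
            for _ in range(n)]


if __name__ == "__main__":
    params = (0.3, 0.0, 5.0, 1.0, 1.0)  # hypothetical values, not the report's
    rng = random.Random(1)
    for n in (1000, 10000):
        d = ks_distance(draw(*params, n, rng), lambda x: mixture_cdf(x, *params))
        print(f"n={n:>6}: KS distance = {d:.4f}")
```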
The sampling itself remains fast when generating between 1,000 and 10,000 samples. Overall, sampling from this mixture model proved faster and easier to implement than expected. It will be interesting to compare this method with rejection sampling or importance sampling later in the report.
