
Parameter Estimation:

Maximum Likelihood Estimation



The discriminant function is taken as the log of the probability density function.

The probability density function is specified by a number of parameters.

To find the decision surface between two different classes, the nature of the
probability density function, which is specified by the parameter vector, is
very important.
For example, take the Gaussian distribution: p(x|ω_j) ~ N(μ_j, Σ_j).
It is described by two parameters: the mean vector μ_j and the covariance matrix Σ_j.

Depending upon the nature of these parameter vectors, we can have different
types of decision boundaries between two classes (linear or non-linear), as
illustrated in the sketch below.
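
As a minimal sketch (assuming NumPy; all names are illustrative), the parameter
vector θ_j = (μ_j, Σ_j) fully specifies the Gaussian class-conditional density,
and hence the decision boundary it induces:

    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        # Multivariate Gaussian density N(mu, sigma) evaluated at x
        d = len(mu)
        diff = x - mu
        inv = np.linalg.inv(sigma)
        norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
        return norm * np.exp(-0.5 * diff @ inv @ diff)

    # Two classes with equal covariances give a linear boundary;
    # unequal covariances give a quadratic (non-linear) boundary.
    mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
    sigma = np.eye(2)
    x = np.array([1.0, 1.0])
    print(gaussian_pdf(x, mu1, sigma), gaussian_pdf(x, mu2, sigma))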

If the probability density function is other than Gaussian, we have to find
out which parameter vector identifies that probability density function.

Maximum likelihood estimation is a technique for estimating the parameter
vector when the parametric form of the probability density function is known.

For example, if p(x|ω_j) ~ N(μ_j, Σ_j), then θ_j consists of the components
of the mean vector μ_j and the covariance matrix Σ_j.


Let there be c classes, and let D_1, D_2, ..., D_c be the sets of samples for
each class.
Assume the samples are independent and identically distributed (i.i.d.).

The samples in D_j have been drawn independently according to the probability
law p(x|ω_j). We also assume that p(x|ω_j) has a known parametric form, and is
therefore determined uniquely by the value of a parameter vector θ_j.

To show the dependence of p(x|ω_j) on θ_j, we write it as p(x|ω_j, θ_j).


Our problem is to use the information provided by the training samples to
obtain good estimates for the unknown parameter vectors θ_1, ..., θ_c
associated with each category.

The estimation of the parameter vector θ_j from the information available in
the sample set D_j is called maximum likelihood estimation.
So, we have to use the information from the training samples in the set D_j to
obtain a good estimate of the parameter vector θ_j.

To simplify the treatment of this problem, we shall assume that samples in D_i
provide no information about θ_j if i ≠ j.

That is, we shall assume that the parameters for the different classes are
functionally independent. This permits us to work with each class separately,
and thus we have c separate problems, each of the same form (see the sketch
below).
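
A minimal sketch of this decomposition (assuming NumPy; estimate_theta is an
illustrative placeholder for any per-class MLE routine): each class's data is
used on its own, giving c independent estimation problems.

    import numpy as np

    def estimate_theta(D_j):
        # Placeholder single-class MLE: here, a Gaussian fit
        mu_hat = D_j.mean(axis=0)
        sigma_hat = np.cov(D_j, rowvar=False, bias=True)  # MLE uses 1/n
        return mu_hat, sigma_hat

    # D maps class label -> array of samples for that class;
    # each class is handled separately, using only its own samples.
    def estimate_all(D):
        return {j: estimate_theta(D_j) for j, D_j in D.items()}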

Use a set D of training samples drawn independently from the probability
density p(x|θ) to estimate the unknown parameter vector θ.

Suppose that D contains n samples, x_1, x_2, ..., x_n. Then, since the samples
were drawn independently, we have

    p(D|θ) = ∏_{k=1}^{n} p(x_k|θ)

p(D|θ) is called the likelihood of θ with respect to the set of samples.

The maximum likelihood estimate θ̂ is, by definition, the value of θ that maximizes p(D|θ).

Instead of taking the likelihood p(D|θ) itself, we can take its logarithm for
analysis; since the logarithm is monotonically increasing, the θ that
maximizes the log-likelihood also maximizes the likelihood.

    Log-likelihood: l(θ) = ln p(D|θ) = ∑_{k=1}^{n} ln p(x_k|θ)
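
A minimal sketch (assuming NumPy; all names are illustrative) of why the
logarithm is preferred in practice: a product of many densities underflows
numerically, while the log-likelihood sum stays well behaved.

    import numpy as np

    def log_likelihood(samples, mu, var):
        # l(theta) = sum_k ln p(x_k|theta) for a 1-D Gaussian N(mu, var)
        return np.sum(-0.5 * np.log(2 * np.pi * var)
                      - (samples - mu) ** 2 / (2 * var))

    samples = np.random.normal(loc=1.0, scale=2.0, size=1000)
    dens = np.exp(-0.5 * (samples - 1.0) ** 2 / 4.0) / np.sqrt(8 * np.pi)
    print(np.prod(dens))                             # underflows to 0.0
    print(log_likelihood(samples, mu=1.0, var=4.0))  # finite and usable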


To maximize the likelihood, differentiate l(θ) and equate the derivative to
zero.

Since θ is a vector, apply the gradient operator ∇_θ instead of a simple
derivative.

Let θ be a p-component vector:

    θ = (θ_1, θ_2, ..., θ_p)^t

and let ∇_θ denote the gradient operator:

    ∇_θ = (∂/∂θ_1, ∂/∂θ_2, ..., ∂/∂θ_p)^t

The value θ̂ that maximizes the log-likelihood can be obtained by making

    ∇_θ l(θ) = 0

where

    ∇_θ l(θ) = ∑_{k=1}^{n} ∇_θ ln p(x_k|θ)

Thus, a set of necessary conditions for the maximum likelihood estimate of θ
can be obtained from this set of p equations.
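
For the Gaussian case these p equations can be solved in closed form: setting
the gradient to zero yields the sample mean for μ̂ and the 1/n sample
covariance for Σ̂. A minimal numerical check (assuming NumPy; names and the
chosen parameters are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=[1.0, -1.0],
                                cov=[[2.0, 0.3], [0.3, 1.0]], size=500)

    mu_hat = X.mean(axis=0)                             # solves grad_mu l = 0
    sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)  # 1/n, not 1/(n-1)
    print(mu_hat)
    print(sigma_hat)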
The graph above shows several training points in one dimension, assumed to be
drawn from a Gaussian of a particular variance but unknown mean.

From the graph we can observe that if we have a larger number of training
points, the likelihood p(D|θ) becomes very narrow and our confidence in
locating its maximum is high. Hence our confidence in the estimate θ̂ that
maximizes the likelihood depends markedly on the number of samples: with more
samples, the estimation of θ is more accurate.
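
A small sketch of this effect (assuming NumPy; names and the 2-nat threshold
are illustrative): for a unit-variance Gaussian with unknown mean, the region
of μ values whose log-likelihood is within 2 nats of the peak shrinks as n
grows, so the peak is pinned down more confidently.

    import numpy as np

    rng = np.random.default_rng(1)
    grid = np.linspace(-1.0, 3.0, 401)     # candidate values of mu
    for n in (5, 50, 500):
        x = rng.normal(loc=1.0, scale=1.0, size=n)
        ll = np.array([np.sum(-0.5 * (x - m) ** 2) for m in grid])
        # Width of the region within 2 nats of the peak shrinks with n
        width = np.ptp(grid[ll > ll.max() - 2.0])
        print(f"n={n:4d}  mu_hat={grid[ll.argmax()]:.3f}  width={width:.3f}")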
