
IC102: Data Analysis and Interpretation

MB
Mani Bhushan,
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076
mbhushan@iitb.ac.in
Acknowledgements: Santosh Noronha (some material from his slides)
Autumn 2012
MB (IIT Bombay) IC102 Autumn 2012 1 / 55
Today's lecture:
Chapter 7
Parameter estimation
Maximum likelihood estimation
Point and interval estimates
Estimation
So far: we knew the probability density function and answered questions on
probabilities of occurrence of various events.
Now: some of the parameters of the probability density function are
unknown. Instead a sample is available.
Problem: estimate the unknown parameters of the density function using the
available sample.
Can have point and interval estimates of such parameters.
Estimator
Statistic: a random variable whose value is determined by the sample data.
Any statistic used to estimate the value of an unknown parameter $\theta$ (of the probability density function) is called an estimator of $\theta$.
E.g., the parameter $\mu$ (unknown mean) of a normal population.
Available: a sample $X_1, X_2, ..., X_n$ from that normal population.
Problem: estimate $\mu$?
Can have several estimators: such as the sample mean $\bar{X} = \sum_{i=1}^n X_i/n$, the sample median, or simply the average of the first ($X_1$) and the last ($X_n$) values.
We will look at ways to compare different estimators later.
Maximum Likelihood Estimators (MLE)
A very popular approach to obtain an estimate of the unknown parameter $\theta$.
Suppose that the random variables $X_1, X_2, ..., X_n$, whose joint probability distribution is assumed given except for an unknown parameter $\theta$, are to be observed.
Let $f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n \mid \theta)$ be the joint density (or mass) function of the random variables $X_1, X_2, ..., X_n$.
Since $\theta$ is assumed unknown, we explicitly show the dependence of $f$ on $\theta$.
We will write $f(x_1, x_2, ..., x_n \mid \theta)$ for simplicity (ignoring the subscripts).
MLE (continued)
$f(x_1, x_2, ..., x_n \mid \theta)$ represents the likelihood that the values $x_1, x_2, ..., x_n$ will be observed when $\theta$ is the true value of the parameter.
A reasonable estimate of $\theta$ is that value yielding the largest likelihood of the observed values.
i.e., the maximum likelihood estimate $\hat{\theta}$ is defined to be that value of $\theta$ maximizing $f(x_1, x_2, ..., x_n \mid \theta)$, where $x_1, x_2, ..., x_n$ are the observed values.
The function $f(x_1, x_2, ..., x_n \mid \theta)$ is referred to as the likelihood function of $\theta$.
Useful fact: instead of maximizing $f(x_1, x_2, ..., x_n \mid \theta)$ we can also maximize $\log[f(x_1, x_2, ..., x_n \mid \theta)]$. Why? The logarithm is monotone, so both have the same maximizer, and maximizing the log often results in a simpler problem.
Maximum Likelihood Estimation of $\mu$, $\sigma^2$ of a normal distribution
Let $X_1, \ldots, X_n$ be independent normal random variables with unknown mean $\mu$ and standard deviation $\sigma$. The joint density is given by
$$f(x_1, x_2, ..., x_n \mid \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = \left(\frac{1}{2\pi}\right)^{n/2} \frac{1}{\sigma^n} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right)$$
The logarithm of the likelihood is thus
$$J = \log f(x_1, ..., x_n \mid \mu, \sigma) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}$$
Maximum Likelihood for Gaussian Parameter Estimation
What estimates of $\mu$ and $\sigma^2$ would maximize the likelihood of seeing the data?
Set
$$\frac{\partial J}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial J}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^3}$$
to 0 and solve for $\mu$ and $\sigma$ as:
$$\hat{\mu} = \sum_{i=1}^n x_i / n, \qquad \hat{\sigma} = \left(\sum_{i=1}^n (x_i - \hat{\mu})^2 / n\right)^{1/2}$$
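As a quick numerical check (not part of the original slides), the closed-form MLEs can be evaluated on the nine received-signal values that appear later in Examples 7.3a/7.3f; `gaussian_mle` is an illustrative helper, a minimal sketch in plain Python:

```python
import math

def gaussian_mle(xs):
    """Closed-form MLEs for a normal sample: mu_hat = sum(x)/n and
    sigma_hat = sqrt(sum((x - mu_hat)^2)/n). Note the divisor n, not n-1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

# The nine received signal values used in Examples 7.3a and 7.3f
data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
mu_hat, sigma_hat = gaussian_mle(data)
print(mu_hat)           # 9.0
print(sigma_hat ** 2)   # 76/9, i.e. (n-1)/n times the sample variance 9.5
```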
MLE of Gaussian Parameters
The MLE of $\mu$ is $\hat{\mu} = \bar{X}$ (sample average).
The MLE of $\sigma^2$ is a surprise:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$$
The MLE of $\sigma^2$ is different from the sample variance $S^2$ and is biased, since
$$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$$
For large $n$, $\hat{\sigma}^2$ is approximately equal to $S^2$.
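The bias can be seen by simulation (an illustrative sketch, not from the slides): averaging the MLE $\hat{\sigma}^2$ over many samples of size $n = 5$ from $N(0, 1)$ gives a value near $(n-1)/n = 0.8$, not $\sigma^2 = 1$:

```python
import random
import statistics

random.seed(0)
n, trials = 5, 100_000
mle_vars = []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # true sigma^2 = 1
    xbar = sum(xs) / n
    mle_vars.append(sum((x - xbar) ** 2 for x in xs) / n)  # MLE divides by n

avg = statistics.fmean(mle_vars)
print(avg)  # close to (n-1)/n * sigma^2 = 0.8, not 1.0
```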
Maximum Likelihood Estimation
The MLE of a parameter $\theta$ is that value of $\theta$ that maximizes the likelihood (or log-likelihood) of seeing the observed values.
Most MLEs are intuitive.
The MLE of the success probability $p$ for a binomial RV ($X$ heads in $n$ tosses) is
$$\hat{p} = \frac{X}{n}, \qquad E[\hat{p}] = p$$
Read the book for the proof of the above expression for $\hat{p}$ and other examples.
Interval Estimation
MLE gave us a point estimate of the unknown parameters.
For example, for a sample from a normal population, we obtained the estimate of $\mu$ as $\bar{X} = \sum_{i=1}^n X_i/n$.
However, we don't expect that the sample mean $\bar{X}$ will exactly equal $\mu$, but that it will be close.
It is useful to specify an interval for which we have a certain degree of confidence that $\mu$ lies within.
For example, it would be nice to be able to give a bound $b$ such that $\bar{X} - b \le \mu \le \bar{X} + b$.
Remember that $\bar{X}$ is a RV.
Chebyshev's Inequality for a Random Variable X
Theorem
$$P(|X - \mu| > k\sigma) \le \frac{1}{k^2}$$
(Figure: a density curve with the interval $(\mu - k\sigma,\ \mu + k\sigma)$ marked.)
Chebyshev's inequality for a random variable X
This applies to ANY distribution.
For $k = 2$, Chebyshev's theorem gives $P(|X - \mu| > 2\sigma) \le 0.25$.
For a Gaussian, $P(|X - \mu| > 2\sigma) \approx 0.05$.
(Figure: normal density $f(x)$ with the 68%, 95%, and 99.7% intervals at $\mu \pm \sigma$, $\mu \pm 2\sigma$, $\mu \pm 3\sigma$. MONTGOMERY: Applied Statistics, 3e, Fig. 4.12.)
Chebyshev's Inequality for $\bar{X}$
$$P\left(|\bar{X} - \mu| > \frac{k\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2}$$
Let $k\sigma/\sqrt{n} = \epsilon$ = some tolerance. Then
$$P\{|\bar{X} - \mu| > \epsilon\} \le \frac{\sigma^2}{n\epsilon^2}, \qquad P\{|\bar{X} - \mu| < \epsilon\} \ge 1 - \frac{\sigma^2}{n\epsilon^2}$$
For a given tolerance, as $n \to \infty$, $\bar{X} \to \mu$.
This is the weak law of large numbers.
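A small simulation (illustrative, not from the slides; Gaussian data is used only for convenience, since Chebyshev itself assumes no particular distribution) shows the empirical exceedance probability of $\bar{X}$ shrinking with $n$ while always staying under the Chebyshev bound $\sigma^2/(n\epsilon^2)$:

```python
import random

# Empirical check of Chebyshev's bound for the sample mean:
# P(|Xbar - mu| > eps) <= sigma^2 / (n * eps^2), for any distribution.
random.seed(1)
mu, sigma, eps, trials = 0.0, 1.0, 0.25, 5_000

results = []
for n in (10, 40, 160):
    count = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            count += 1
    freq = count / trials
    bound = min(sigma ** 2 / (n * eps ** 2), 1.0)  # Chebyshev, capped at 1
    results.append((n, freq, bound))
    print(n, freq, bound)
```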
Which estimate intervals do we want?
Chebyshev's inequality gives us a loose bound. We want better estimates.
For a normal distribution,
1. interval estimate of $\mu$ with $\sigma^2$ known,
2. interval estimate of $\mu$ with $\sigma^2$ unknown,
3. interval estimate of $\sigma^2$ with $\mu$ unknown,
4. interval estimate of the difference in means of two normal populations with known (same or different) variances,
5. interval estimate of the difference in means of two normal populations with unknown but same variances.
For a binomial distribution,
6. interval estimate of $p$.
1. Normal: interval for $\mu$, $\sigma^2$ known
We have $n$ independent normally distributed points.
$$X_i \sim N(\mu, \sigma^2)$$
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
1. Normal: interval for $\mu$, $\sigma^2$ known
Recall the $z_\alpha$ notation for a threshold:
$$P\{Z > z_\alpha\} = \alpha$$
1. Normal: interval for $\mu$, $\sigma^2$ known
For a two-sided interval,
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
specifically
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
1. Normal: interval for $\mu$, $\sigma^2$ known
We can rearrange the inequalities below to find an interval around $\mu$:
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
We get
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
that is, 95% of the time $\mu$ will lie within $1.96\sigma/\sqrt{n}$ units of the sample average.
If we now observe the sample and it turns out that $\bar{X} = \bar{x}$, then we say that with 95% confidence
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
The interval
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
is called a 95% confidence interval estimate of $\mu$.
Example 7.3a
Suppose that when a signal having value $\mu$ is transmitted from location A, the value received at location B is normally distributed with mean $\mu$ and variance 4. That is, if $\mu$ is sent, then the value received is $\mu + N$, where $N$, representing noise, is normal with mean 0 and variance 4. To reduce error, suppose the same value is sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, construct a 95% confidence interval (CI) for $\mu$.
$\bar{x} = 81/9 = 9$. Under the assumption that the values received are independent, a 95% CI for $\mu$ is:
$$\left(9 - 1.96\,\frac{2}{3},\ 9 + 1.96\,\frac{2}{3}\right) = (7.69, 10.31)$$
i.e., we are 95% confident that the true message value lies between 7.69 and 10.31.
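The computation above can be reproduced in a few lines of Python (an illustrative sketch; `z_interval` is a hypothetical helper, not a library routine, and the 1.96 threshold is obtained from `statistics.NormalDist`):

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for mu with known sigma: xbar +/- z_{alpha/2}*sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # z_{alpha/2}, about 1.96 for 95%
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
xbar = sum(data) / len(data)                  # 9.0
lo, hi = z_interval(xbar, sigma=2.0, n=len(data))
print(round(lo, 2), round(hi, 2))  # 7.69 10.31
```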
1. Normal: interval for $\mu$, $\sigma^2$ known
What does a 95% confidence interval mean?
(Figure: interval estimates from many repeated samples plotted against the true value of $\mu$; most, but not all, of the intervals cover $\mu$.)
In the long run, 95% of such intervals will contain $\mu$.
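A coverage simulation makes this concrete (an illustrative sketch; the true $\mu = 10$, $\sigma = 2$, $n = 9$ are arbitrary choices): the fraction of intervals that cover $\mu$ comes out near 0.95:

```python
import random
from statistics import NormalDist

random.seed(2)
mu, sigma, n, trials = 10.0, 2.0, 9, 10_000
z = NormalDist().inv_cdf(0.975)  # about 1.96

hits = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    half = z * sigma / n ** 0.5
    if xbar - half < mu < xbar + half:  # does this interval cover mu?
        hits += 1

coverage = hits / trials
print(coverage)  # close to 0.95
```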
1. Normal: interval for $\mu$, $\sigma^2$ known
For a two-sided interval,
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
1. Normal: interval for $\mu$, $\sigma^2$ known: $100(1-\alpha)\%$ confidence interval
In general: the two-sided $100(1-\alpha)\%$ confidence interval is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
We can also have $100(1-\alpha)\%$ upper and lower one-sided confidence intervals:
$$\left(\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ +\infty\right), \qquad \left(-\infty,\ \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$$
1. Normal: interval for $\mu$, $\sigma^2$ known: sample size
The two-sided $100(1-\alpha)\%$ CI is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
What should the sample size $n$ be if we desire $\bar{x}$ to approach $\mu$ to within a desired level of confidence (i.e., given $\alpha$)?
Rearrange and solve for $n$:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$$
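A small helper (illustrative, not from the slides) computes the required $n$, rounding up to an integer; the example values $\sigma = 2$ and a desired error of 0.5 are assumptions for illustration:

```python
import math
from statistics import NormalDist

def sample_size(sigma, error, conf=0.95):
    """n = (z_{alpha/2} * sigma / error)^2, rounded up to an integer."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil((z * sigma / error) ** 2)

# e.g. sigma = 2, want xbar within 0.5 of mu with 95% confidence
print(sample_size(sigma=2.0, error=0.5))  # 62
```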
1. Normal: interval for $\mu$, $\sigma^2$ known: sample size
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$$
Does the dependence of $n$ on the various terms make sense?
You need more samples if you want $\bar{x}$ to come very close to $\mu$.
(Figure: the error $E = |\bar{x} - \mu|$ and the interval endpoints $l = \bar{x} - z_{\alpha/2}\sigma/\sqrt{n}$ and $u = \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}$. MONTGOMERY: Applied Statistics, 3e, Fig. 8.2.)
1. Normal: interval for $\mu$, $\sigma^2$ known
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$$
Does the dependence of $n$ on the various terms make sense?
You need more samples if
you want $\bar{x}$ to come very close to $\mu$,
$\sigma$ is large,
$z_{\alpha/2}$ is increased (i.e., $\alpha$ is decreased, or $1 - \alpha$ is increased).
(Figure: normal density $f(x)$ with the 68%, 95%, and 99.7% intervals. MONTGOMERY: Applied Statistics, 3e, Fig. 4.12.)
2. Normal: interval for $\mu$, $\sigma^2$ unknown
The two-sided interval for $\mu$, when $\sigma^2$ is known, is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
What happens when $\sigma^2$ is unavailable?
We must estimate $\sigma^2$. Its point estimate is $s^2$.
Now:
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \text{but} \qquad \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$
i.e., it is a $t$ random variable with $n - 1$ degrees of freedom.
2. Normal: interval for $\mu$, $\sigma^2$ unknown
$\dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the $t_{n-1}$ curve.
Remember that the $t$-density function is symmetric about the mean (0).
2. Normal: interval for $\mu$, $\sigma^2$ unknown
For any $\alpha \in (0, 1)$ we have
$$P\left(-t_{\alpha/2,n-1} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,n-1}\right) = 1 - \alpha$$
or,
$$P\left(\bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$
Thus, if it is observed that $\bar{X} = \bar{x}$ and $S = s$, then we can say with $100(1-\alpha)\%$ confidence that
$$\mu \in \left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$
Similarly, $100(1-\alpha)\%$ upper and lower one-sided intervals would be
$$\left(\bar{x} - t_{\alpha,n-1}\frac{s}{\sqrt{n}},\ \infty\right), \qquad \left(-\infty,\ \bar{x} + t_{\alpha,n-1}\frac{s}{\sqrt{n}}\right)$$
Example 7.3f
Suppose that when a signal having value $\mu$ is transmitted from location A, the value received at location B is normally distributed with mean $\mu$ and variance $\sigma^2$, with $\sigma^2$ unknown. That is, if $\mu$ is sent, then the value received is $\mu + N$, where $N$, representing noise, is normal with mean 0 and variance $\sigma^2$. To reduce error, suppose the same value is sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, construct a 95% confidence interval (CI) for $\mu$.
$\bar{x} = 81/9 = 9$, and
$$s^2 = \frac{\sum_{i=1}^{9} x_i^2 - 9\bar{x}^2}{8} = 9.5, \qquad s = 3.082$$
With $t_{0.025,8} = 2.306$, a 95% CI for $\mu$ is
$$\left(9 - 2.306\,\frac{3.082}{3},\ 9 + 2.306\,\frac{3.082}{3}\right) = (6.63, 11.37)$$
Note: in Example 7.3a we used the $Z$ distribution with $\sigma^2 = 4$ to get a 95% CI of (7.69, 10.31). If we assume $s^2 = 4$ now, we would get the 95% CI to be (7.46, 10.54), which is wider than in the earlier case.
The $t$-density function has heavier tails than the $Z$-density function.
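The same interval can be computed programmatically (an illustrative sketch; the Python standard library has no $t$ quantile function, so $t_{0.025,8} = 2.306$ is taken from a $t$ table, as on the slide, and `t_interval` is a hypothetical helper):

```python
import statistics

def t_interval(data, t_crit):
    """Two-sided CI for mu with unknown sigma: xbar +/- t_{alpha/2,n-1}*s/sqrt(n).
    t_crit must be looked up in a t table (no t quantile in the stdlib)."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n-1)
    half = t_crit * s / n ** 0.5
    return xbar - half, xbar + half

data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
lo, hi = t_interval(data, t_crit=2.306)  # t_{0.025,8} = 2.306 (table value)
print(round(lo, 2), round(hi, 2))  # 6.63 11.37
```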
So far
Interval estimates for mean with known and unknown variances.
Now consider interval estimate for the unknown variance.
3. Normal: interval for $\sigma^2$
Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. The point estimate of $\sigma^2$ is $S^2$:
$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad E[S^2] = \sigma^2$$
We have
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \qquad \text{or} \qquad S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$$
3. Normal: interval for $\sigma^2$
(Figure: $\chi^2$ densities $f(x)$ for $k = 2, 5, 10$ degrees of freedom. MONTGOMERY: Applied Statistics, 3e, Fig. 8.8.)
The $\chi^2$ distribution has $x$ always $> 0$ and is skewed to the right.
It is not symmetric.
The skew decreases as $n \to \infty$.
It is related to the gamma function.
3. Normal: interval for $\sigma^2$
We compute intervals, using quantiles, as before.
No symmetry: we need to evaluate both quantiles for a two-sided interval.
(Figure: $\chi^2$ density with both tail quantiles marked, e.g. $\chi^2_{0.95,10} = 3.94$ and $\chi^2_{0.05,10} = 18.31$, each cutting off tail area 0.05. MONTGOMERY: Applied Statistics, 3e, Fig. 8.9.)
3. Normal: interval for $\sigma^2$
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right) = 1 - \alpha$$
3. Normal: interval for $\sigma^2$
Hence, when $S^2 = s^2$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$$
One-sided intervals can be obtained similarly:
$$\text{Lower: } \left(0,\ \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}}\right), \qquad \text{Upper: } \left(\frac{(n-1)s^2}{\chi^2_{\alpha,n-1}},\ \infty\right)$$
Example 7.3h
A company produces washers with very small deviations in their thickness. Suppose that 10 such randomly chosen washers were measured and their thicknesses found to be (in inches):
.123, .124, .126, .120, .130, .133, .125, .128, .124, .126.
Compute a 90% CI for the standard deviation of the thickness of a washer produced by the company.
Here $s^2 = 1.366 \times 10^{-5}$. With $\chi^2_{0.05,9} = 16.917$ and $\chi^2_{0.95,9} = 3.334$, the 90% CI for $\sigma^2$ is
$$\left(\frac{9 \times 1.366 \times 10^{-5}}{16.917} = 7.267 \times 10^{-6},\ \frac{9 \times 1.366 \times 10^{-5}}{3.334} = 36.875 \times 10^{-6}\right)$$
Thus, a 90% CI for $\sigma$ is (on taking square roots)
$$(2.696 \times 10^{-3},\ 6.072 \times 10^{-3})$$
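A sketch of this computation (illustrative; the $\chi^2$ quantiles are the table values quoted on the slide, since the standard library provides no $\chi^2$ quantile function, and `var_interval` is a hypothetical helper):

```python
import math
import statistics

def var_interval(data, chi2_hi, chi2_lo):
    """CI for sigma^2: ((n-1)s^2/chi2_{alpha/2,n-1}, (n-1)s^2/chi2_{1-alpha/2,n-1}).
    chi2_hi and chi2_lo are table values for the upper and lower quantiles."""
    n = len(data)
    ss = (n - 1) * statistics.variance(data)  # (n-1) * s^2
    return ss / chi2_hi, ss / chi2_lo

thick = [.123, .124, .126, .120, .130, .133, .125, .128, .124, .126]
lo, hi = var_interval(thick, chi2_hi=16.917, chi2_lo=3.334)  # 90% CI, 9 d.o.f.
print(math.sqrt(lo), math.sqrt(hi))  # CI for sigma: about (0.0027, 0.0061)
```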
3. Normal: interval for $\sigma^2$
Notice how we standardize:
We standardize $x_i$ as $\dfrac{x_i - \mu}{\sigma}$.
We standardize $\bar{x}$ as $\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
We standardize $s^2$ as $\dfrac{(n-1)s^2}{\sigma^2}$.
Difference in Means of Two Normal Populations
So far: data from a single normal population.
Now consider data from two different normal populations:
$X_1, X_2, ..., X_n$, a sample of size $n$ from $N(\mu_1, \sigma_1^2)$; and
$Y_1, Y_2, ..., Y_m$, a sample of size $m$ from $N(\mu_2, \sigma_2^2)$.
The two samples are independent of each other.
Estimate $\mu_1 - \mu_2$?
Distribution of $\bar{X} - \bar{Y}$
Individual sample means are
$$\bar{X} = \frac{\sum_{i=1}^n X_i}{n}, \qquad \bar{Y} = \frac{\sum_{i=1}^m Y_i}{m}$$
$\bar{X} - \bar{Y}$ is the maximum likelihood estimator of $\mu_1 - \mu_2$.
What's the distribution of $\bar{X} - \bar{Y}$?
A sum of independent normal variables is also normal:
$$\bar{X} \sim N(\mu_1, \sigma_1^2/n), \qquad \bar{Y} \sim N(\mu_2, \sigma_2^2/m)$$
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right)$$
4. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2, \sigma_2^2$ known
Assuming $\sigma_1^2, \sigma_2^2$ are known,
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \sim N(0, 1)$$
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} < z_{\alpha/2}\right) = 1 - \alpha$$
or, equivalently,
$$P\left(\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right) = 1 - \alpha$$
4. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2, \sigma_2^2$ known
If $\bar{X}, \bar{Y}$ are observed to be $\bar{x}, \bar{y}$, then a $100(1-\alpha)\%$ two-sided confidence interval on $\mu_1 - \mu_2$ is
$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}},\ \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right)$$
Example 7.4a
Estimating the difference in means of two normal populations with known variances.
Two different types of electrical cable insulation have been tested to determine the voltage level at which failures tend to occur. When specimens were subjected to an increasing voltage stress in a laboratory experiment, failures for the two types of cable insulation occurred at the following voltages:
Type A: 36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44.
Type B: 52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62.
Suppose it is known that the amount of voltage that cables having type A insulation can withstand is normally distributed with unknown mean $\mu_A$ and known variance $\sigma_A^2 = 40$, whereas the corresponding distribution for type B insulation is normal with unknown mean $\mu_B$ and known variance $\sigma_B^2 = 100$.
Determine a 95% confidence interval for $\mu_A - \mu_B$.
Example 7.4a (continued)
$\bar{x} = 42.8$, $\bar{y} = 55.8$
$\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$
$n = 14$, $\sigma_1^2 = 40$; $m = 12$, $\sigma_2^2 = 100$
$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \mp z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right) = (-19.6, -6.5)$$
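The interval can be reproduced as follows (an illustrative sketch; `two_sample_z_interval` is a hypothetical helper, not a library function):

```python
from statistics import NormalDist, fmean

def two_sample_z_interval(xs, ys, var_x, var_y, conf=0.95):
    """CI for mu1 - mu2 with known variances:
    (xbar - ybar) +/- z_{alpha/2} * sqrt(var_x/n + var_y/m)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    diff = fmean(xs) - fmean(ys)
    half = z * (var_x / len(xs) + var_y / len(ys)) ** 0.5
    return diff - half, diff + half

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
lo, hi = two_sample_z_interval(type_a, type_b, var_x=40, var_y=100)
print(round(lo, 1), round(hi, 1))  # -19.6 -6.5
```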
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2, \sigma_2^2$ unknown
What if $\sigma_1^2, \sigma_2^2$ are also unknown, along with $\mu_1, \mu_2$?
It is natural to replace $\sigma_1^2, \sigma_2^2$ with $S_1^2, S_2^2$:
$$S_1^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad S_2^2 = \frac{\sum_{i=1}^m (Y_i - \bar{Y})^2}{m-1}$$
Then use
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n + S_2^2/m}}$$
But the distribution of the above random variable is complicated for the case $\sigma_1^2 \neq \sigma_2^2$.
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2 = \sigma_2^2$ unknown
Let the unknowns be $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
Distribution of $S_1^2, S_2^2$:
$$(n-1)\frac{S_1^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{m-1}$$
Since the samples are independent, the two chi-square random variables are independent:
$$(n-1)\frac{S_1^2}{\sigma^2} + (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{n+m-2}$$
Let
$$S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}$$
be the pooled sample variance. Then
$$\frac{n+m-2}{\sigma^2}S_p^2 \sim \chi^2_{n+m-2}$$
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2 = \sigma_2^2$ unknown
Since
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right)$$
it follows that
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \sim N(0, 1)$$
Thus,
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \Bigg/ \sqrt{S_p^2/\sigma^2} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n + 1/m)}}$$
has a $t$-distribution with $n + m - 2$ degrees of freedom.
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2 = \sigma_2^2$ unknown
Thus,
$$P\left(-t_{\alpha/2,n+m-2} \le \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n + 1/m}} \le t_{\alpha/2,n+m-2}\right) = 1 - \alpha$$
Hence, when the data result in $\bar{X} = \bar{x}$, $\bar{Y} = \bar{y}$, $S_p^2 = s_p^2$, a $100(1-\alpha)\%$ two-sided CI for $\mu_1 - \mu_2$ is
$$\left(\bar{x} - \bar{y} - t_{\alpha/2,n+m-2}\,s_p\sqrt{1/n + 1/m},\ \bar{x} - \bar{y} + t_{\alpha/2,n+m-2}\,s_p\sqrt{1/n + 1/m}\right)$$
Example 7.4a Modified
Estimating the difference in means of two normal populations with unknown but same variances.
Consider the earlier example, but with the variances unknown and assumed to be the same.
$n = 14$, $\bar{x} = 42.8$, $s_1^2 = 52.03$; $m = 12$, $\bar{y} = 55.8$, $s_2^2 = 110.88$
$$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2} = 79$$
$\alpha = 0.05$, $t_{0.025,24} = 2.06$
$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \mp t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac{1}{n} + \frac{1}{m}}\right) = (-20.26, -5.83)$$
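A corresponding sketch for the pooled-variance interval (illustrative; `pooled_t_interval` is a hypothetical helper, and $t_{0.025,24} \approx 2.064$ is the $t$-table value, slightly more precise than the 2.06 quoted above):

```python
from statistics import fmean, variance

def pooled_t_interval(xs, ys, t_crit):
    """CI for mu1 - mu2 with unknown but equal variances:
    (xbar - ybar) +/- t_{alpha/2,n+m-2} * s_p * sqrt(1/n + 1/m),
    where t_crit = t_{alpha/2,n+m-2} comes from a t table."""
    n, m = len(xs), len(ys)
    sp2 = ((n - 1) * variance(xs) + (m - 1) * variance(ys)) / (n + m - 2)
    half = t_crit * (sp2 * (1 / n + 1 / m)) ** 0.5
    diff = fmean(xs) - fmean(ys)
    return diff - half, diff + half

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
lo, hi = pooled_t_interval(type_a, type_b, t_crit=2.064)  # t_{0.025,24}
print(round(lo, 2), round(hi, 2))  # -20.26 -5.83
```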
6. Binomial: interval for p
So far: CIs for parameters of the normal distribution.
Now consider: the success probability $p$ in the binomial or Bernoulli distribution.
$p$ is the mean of a Bernoulli random variable.
Assume $n$ trials, $X$ positive outcomes. Then
$$X \sim \mathrm{Bi}(n, p), \qquad E[X] = np, \qquad \mathrm{var}(X) = np(1-p)$$
As $n$ increases, using the Central Limit Theorem (CLT),
$$X \approx N(np,\ np(1-p)) \qquad \text{therefore} \qquad \frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1)$$
6. Binomial: interval for p
To obtain the interval, observe that
$$X \approx N(np,\ np(1-p)) = N(\mu, \sigma^2)$$
From the Gaussian density function:
$$P\left(-z_{\alpha/2} < \frac{X - \mu}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha$$
Thus,
$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
6. Binomial: interval for p
Thus, for $X = x$, an approximate $100(1-\alpha)\%$ CI for $p$ is
$$\left\{p : -z_{\alpha/2} < \frac{x - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right\}$$
The above is, however, not an interval.
We can replace $p$ under the square root with $\hat{p} = X/n = \hat{p}_{ML}$:
$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{n\hat{p}(1-\hat{p})}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
Multiply throughout by $\sqrt{n\hat{p}(1-\hat{p})}$. Also, since $X = n\hat{p}$,
$$P\left(-z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})} < n\hat{p} - np < z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})}\right) \approx 1 - \alpha$$
Rearrange:
$$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) \approx 1 - \alpha$$
6. Binomial: interval for p
A two-sided $100(1-\alpha)\%$ CI on $p$ then is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Ex 7.5a: A sample of 100 transistors is randomly chosen from a large batch to determine if they meet the standards. If 80 of them meet the standards, then an approximate 95% CI for $p$, the fraction of all transistors that meet the standards, is given by
$$\left(0.8 - 1.96\sqrt{0.8(0.2)/100},\ 0.8 + 1.96\sqrt{0.8(0.2)/100}\right) = (0.7216, 0.8784)$$
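Reproducing Ex 7.5a (an illustrative sketch; `binomial_p_interval` is a hypothetical helper, with the normal quantile taken from `statistics.NormalDist`):

```python
from statistics import NormalDist

def binomial_p_interval(x, n, conf=0.95):
    """Approximate (CLT-based) CI for p:
    p_hat +/- z_{alpha/2} * sqrt(p_hat*(1-p_hat)/n), with p_hat = x/n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = x / n
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

lo, hi = binomial_p_interval(x=80, n=100)  # 80 of 100 transistors pass
print(round(lo, 4), round(hi, 4))  # 0.7216 0.8784
```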
6. Binomial: Sample Size n
Let $b$ be the desired width of a $100(1-\alpha)\%$ confidence interval:
$$b = 2z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Solving for $n$ gives
$$n = \hat{p}(1-\hat{p})\left(\frac{2z_{\alpha/2}}{b}\right)^2$$
that is, if $k$ items were initially sampled to obtain the estimate $\hat{p}$ of $p$, then an additional $n - k$ (or 0 if $n \le k$) items should be sampled (keeping $\hat{p}$ fixed at its earlier value).
We can get an upper bound on $n$ for $\hat{p} = 1/2$:
$$n = \frac{1}{4}\left(\frac{2z_{\alpha/2}}{b}\right)^2$$
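A sketch of the sample-size calculation (illustrative helper; the width $b = 0.1$ at 95% confidence is an assumed example value):

```python
import math
from statistics import NormalDist

def binomial_sample_size(b, p_hat=0.5, conf=0.95):
    """n = p_hat*(1-p_hat) * (2*z_{alpha/2}/b)^2, rounded up.
    The default p_hat = 1/2 gives the worst-case (largest) n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(p_hat * (1 - p_hat) * (2 * z / b) ** 2)

print(binomial_sample_size(b=0.1))             # worst case: 385
print(binomial_sample_size(b=0.1, p_hat=0.8))  # 246 once p_hat = 0.8 is known
```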
THANK YOU