
IC102: Data Analysis and Interpretation

MB
Mani Bhushan,
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076
mbhushan@iitb.ac.in
Acknowledgements: Santosh Noronha (some material from his slides)
Autumn 2012
MB (IIT Bombay) IC102 Autumn 2012 1 / 55
Today's lecture:
Chapter 7
Parameter estimation
Maximum likelihood estimation
Point and interval estimates
Estimation
So far: we knew the probability density function and answered questions on
probabilities of occurrence of various events.
Now: some of the parameters of the probability density function are
unknown. Instead a sample is available.
Problem: estimate the unknown parameters of the density function using the
available sample.
Can have point and interval estimates of such parameters.
Estimator
Statistic: a random variable whose value is determined by the sample data.
Any statistic used to estimate the value of an unknown parameter $\theta$ (of the probability density function) is called an estimator of $\theta$.
E.g., the parameter $\mu$ (unknown mean) of a normal population.
Available: a sample $X_1, X_2, ..., X_n$ from that normal population.
Problem: estimate $\mu$?
Can have several estimators: such as the sample mean $\bar{X} = \sum_{i=1}^n X_i/n$, the sample median, or simply the average of the first ($X_1$) and the last ($X_n$) values.
We will look at ways to compare different estimators later.
Maximum Likelihood Estimators (MLE)
A very popular approach to obtain an estimate of the unknown parameter $\theta$.
Suppose that the random variables $X_1, X_2, ..., X_n$, whose joint probability distribution is assumed given except for an unknown parameter $\theta$, are to be observed.
Let $f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n \mid \theta)$ be the joint density (or mass) function of the random variables $X_1, X_2, ..., X_n$.
Since $\theta$ is assumed unknown, we explicitly show the dependence of $f$ on $\theta$.
We will write $f(x_1, x_2, ..., x_n \mid \theta)$ for simplicity (ignoring the subscripts).
MLE (continued)
$f(x_1, x_2, ..., x_n \mid \theta)$ represents the likelihood that the values $x_1, x_2, ..., x_n$ will be observed when $\theta$ is the true value of the parameter.
A reasonable estimate of $\theta$ is that value yielding the largest likelihood of the observed values.
i.e., the maximum likelihood estimate $\hat{\theta}$ is defined to be that value of $\theta$ maximizing $f(x_1, x_2, ..., x_n \mid \theta)$, where $x_1, x_2, ..., x_n$ are the observed values.
The function $f(x_1, x_2, ..., x_n \mid \theta)$ is referred to as the likelihood function of $\theta$.
Useful fact: instead of maximizing $f(x_1, x_2, ..., x_n \mid \theta)$ we can also maximize $\log[f(x_1, x_2, ..., x_n \mid \theta)]$. Why? The logarithm is monotone, so both have the same maximizer, and maximizing the log often results in a simpler problem.
Maximum Likelihood Estimation of $\mu$, $\sigma^2$ of a normal distribution
Let $X_1, \ldots, X_n$ be independent normal random variables with unknown mean $\mu$ and standard deviation $\sigma$. The joint density is given by
$$f(x_1, x_2, ..., x_n \mid \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = \left(\frac{1}{2\pi}\right)^{n/2} \frac{1}{\sigma^n} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right)$$
The logarithm of the likelihood is thus
$$J = \log f(x_1, ..., x_n \mid \mu, \sigma) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}$$
Maximum Likelihood for Gaussian Parameter Estimation
What estimates of $\mu$ and $\sigma^2$ would maximize the likelihood of seeing the data?
Set
$$\frac{\partial J}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial J}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^3}$$
to 0 and solve for $\mu$ and $\sigma$ as:
$$\hat{\mu} = \sum_{i=1}^n x_i / n, \qquad \hat{\sigma} = \left(\sum_{i=1}^n (x_i - \hat{\mu})^2 / n\right)^{1/2}$$
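As a quick numerical check (not part of the original slides), the closed-form MLEs can be evaluated on the nine received-signal values that appear later in Examples 7.3a/7.3f; `gaussian_mle` is an illustrative helper, a minimal sketch in plain Python:

```python
import math

def gaussian_mle(xs):
    """Closed-form MLEs for a normal sample: mu_hat = sum(x)/n and
    sigma_hat = sqrt(sum((x - mu_hat)^2)/n). Note the divisor n, not n-1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

# The nine received signal values used in Examples 7.3a and 7.3f
data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
mu_hat, sigma_hat = gaussian_mle(data)
print(mu_hat)           # 9.0
print(sigma_hat ** 2)   # 76/9, i.e. (n-1)/n times the sample variance 9.5
```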
MLE of Gaussian Parameters
The MLE of $\mu$ is $\hat{\mu} = \bar{X}$ (sample average).
The MLE of $\sigma^2$ is a surprise:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$$
The MLE of $\sigma^2$ is different from the sample variance $S^2$ and is biased, since
$$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$$
For large $n$, $\hat{\sigma}^2$ is approximately equal to $S^2$.
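The bias can be seen by simulation (an illustrative sketch, not from the slides): averaging the MLE $\hat{\sigma}^2$ over many samples of size $n = 5$ from $N(0, 1)$ gives a value near $(n-1)/n = 0.8$, not $\sigma^2 = 1$:

```python
import random
import statistics

random.seed(0)
n, trials = 5, 100_000
mle_vars = []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # true sigma^2 = 1
    xbar = sum(xs) / n
    mle_vars.append(sum((x - xbar) ** 2 for x in xs) / n)  # MLE divides by n

avg = statistics.fmean(mle_vars)
print(avg)  # close to (n-1)/n * sigma^2 = 0.8, not 1.0
```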
Maximum Likelihood Estimation
The MLE of a parameter $\theta$ is that value of $\theta$ that maximizes the likelihood (or log-likelihood) of seeing the observed values.
Most MLEs are intuitive.
The MLE of the success probability $p$ for a binomial RV ($X$ heads in $n$ tosses) is
$$\hat{p} = \frac{X}{n}, \qquad E[\hat{p}] = p$$
Read the book for the proof of the above expression for $\hat{p}$ and other examples.
Interval Estimation
MLE gave us a point estimate of the unknown parameters.
For example, for a sample from a normal population, we obtained the estimate of $\mu$ as $\bar{X} = \sum_{i=1}^n X_i/n$.
However, we don't expect that the sample mean $\bar{X}$ will exactly equal $\mu$, but that it will be close.
It is useful to specify an interval for which we have a certain degree of confidence that $\mu$ lies within.
For example, it would be nice to be able to give a bound $b$ such that $\bar{X} - b \le \mu \le \bar{X} + b$.
Remember that $\bar{X}$ is a RV.
Chebyshev's Inequality for a Random Variable X
Theorem
$$P(|X - \mu| > k\sigma) \le \frac{1}{k^2}$$
(Figure: a density curve with the interval $(\mu - k\sigma,\ \mu + k\sigma)$ marked.)
Chebyshev's inequality for a random variable X
This applies to ANY distribution.
For $k = 2$, Chebyshev's theorem gives $P(|X - \mu| > 2\sigma) \le 0.25$.
For a Gaussian, $P(|X - \mu| > 2\sigma) \approx 0.05$.
(Figure: normal density $f(x)$ with the 68%, 95%, and 99.7% intervals at $\mu \pm \sigma$, $\mu \pm 2\sigma$, $\mu \pm 3\sigma$. MONTGOMERY: Applied Statistics, 3e, Fig. 4.12.)
Chebyshev's Inequality for $\bar{X}$
$$P\left(|\bar{X} - \mu| > \frac{k\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2}$$
Let $k\sigma/\sqrt{n} = \epsilon$ = some tolerance. Then
$$P\{|\bar{X} - \mu| > \epsilon\} \le \frac{\sigma^2}{n\epsilon^2}, \qquad P\{|\bar{X} - \mu| < \epsilon\} \ge 1 - \frac{\sigma^2}{n\epsilon^2}$$
For a given tolerance, as $n \to \infty$, $\bar{X} \to \mu$.
This is the weak law of large numbers.
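A small simulation (illustrative, not from the slides; Gaussian data is used only for convenience, since Chebyshev itself assumes no particular distribution) shows the empirical exceedance probability of $\bar{X}$ shrinking with $n$ while always staying under the Chebyshev bound $\sigma^2/(n\epsilon^2)$:

```python
import random

# Empirical check of Chebyshev's bound for the sample mean:
# P(|Xbar - mu| > eps) <= sigma^2 / (n * eps^2), for any distribution.
random.seed(1)
mu, sigma, eps, trials = 0.0, 1.0, 0.25, 5_000

results = []
for n in (10, 40, 160):
    count = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            count += 1
    freq = count / trials
    bound = min(sigma ** 2 / (n * eps ** 2), 1.0)  # Chebyshev, capped at 1
    results.append((n, freq, bound))
    print(n, freq, bound)
```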
Which estimate intervals do we want?
Chebyshev's inequality gives us a loose bound. We want better estimates.
For a normal distribution,
1. interval estimate of $\mu$ with $\sigma^2$ known,
2. interval estimate of $\mu$ with $\sigma^2$ unknown,
3. interval estimate of $\sigma^2$ with $\mu$ unknown,
4. interval estimate of the difference in means of two normal populations with known (same or different) variances,
5. interval estimate of the difference in means of two normal populations with unknown but same variances.
For a binomial distribution,
6. interval estimate of $p$.
1. Normal: interval for $\mu$, $\sigma^2$ known
We have $n$ independent normally distributed points.
$$X_i \sim N(\mu, \sigma^2)$$
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
1. Normal: interval for $\mu$, $\sigma^2$ known
Recall the $z_\alpha$ notation for a threshold:
$$P\{Z > z_\alpha\} = \alpha$$
1. Normal: interval for $\mu$, $\sigma^2$ known
For a two-sided interval,
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
specifically
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
1. Normal: interval for $\mu$, $\sigma^2$ known
We can rearrange the inequalities below to find an interval around $\mu$:
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
We get
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
that is, 95% of the time $\mu$ will lie within $1.96\sigma/\sqrt{n}$ units of the sample average.
If we now observe the sample and it turns out that $\bar{X} = \bar{x}$, then we say that with 95% confidence
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
The interval
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
is called a 95% confidence interval estimate of $\mu$.
Example 7.3a
Suppose that when a signal having value $\mu$ is transmitted from location A, the value received at location B is normally distributed with mean $\mu$ and variance 4. That is, if $\mu$ is sent, then the value received is $\mu + N$, where $N$, representing noise, is normal with mean 0 and variance 4. To reduce error, suppose the same value is sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, construct a 95% confidence interval (CI) for $\mu$.
$\bar{x} = 81/9 = 9$. Under the assumption that the values received are independent, a 95% CI for $\mu$ is:
$$\left(9 - 1.96\,\frac{2}{3},\ 9 + 1.96\,\frac{2}{3}\right) = (7.69, 10.31)$$
i.e., we are 95% confident that the true message value lies between 7.69 and 10.31.
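The computation above can be reproduced in a few lines of Python (an illustrative sketch; `z_interval` is a hypothetical helper, not a library routine, and the 1.96 threshold is obtained from `statistics.NormalDist`):

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for mu with known sigma: xbar +/- z_{alpha/2}*sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # z_{alpha/2}, about 1.96 for 95%
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
xbar = sum(data) / len(data)                  # 9.0
lo, hi = z_interval(xbar, sigma=2.0, n=len(data))
print(round(lo, 2), round(hi, 2))  # 7.69 10.31
```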
1. Normal: interval for $\mu$, $\sigma^2$ known
What does a 95% confidence interval mean?
(Figure: interval estimates from many repeated samples plotted against the true value of $\mu$; most, but not all, of the intervals cover $\mu$.)
In the long run, 95% of such intervals will contain $\mu$.
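A coverage simulation makes this concrete (an illustrative sketch; the true $\mu = 10$, $\sigma = 2$, $n = 9$ are arbitrary choices): the fraction of intervals that cover $\mu$ comes out near 0.95:

```python
import random
from statistics import NormalDist

random.seed(2)
mu, sigma, n, trials = 10.0, 2.0, 9, 10_000
z = NormalDist().inv_cdf(0.975)  # about 1.96

hits = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    half = z * sigma / n ** 0.5
    if xbar - half < mu < xbar + half:  # does this interval cover mu?
        hits += 1

coverage = hits / trials
print(coverage)  # close to 0.95
```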
1. Normal: interval for $\mu$, $\sigma^2$ known
For a two-sided interval,
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
1. Normal: interval for $\mu$, $\sigma^2$ known: $100(1-\alpha)\%$ confidence interval
In general: the two-sided $100(1-\alpha)\%$ confidence interval is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
We can also have $100(1-\alpha)\%$ upper and lower one-sided confidence intervals:
$$\left(\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ +\infty\right), \qquad \left(-\infty,\ \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$$
1. Normal: interval for $\mu$, $\sigma^2$ known: sample size
The two-sided $100(1-\alpha)\%$ CI is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
What should the sample size $n$ be if we desire $\bar{x}$ to approach $\mu$ to within a desired level of confidence (i.e., given $\alpha$)?
Rearrange and solve for $n$:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$$
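A small helper (illustrative, not from the slides) computes the required $n$, rounding up to an integer; the example values $\sigma = 2$ and a desired error of 0.5 are assumptions for illustration:

```python
import math
from statistics import NormalDist

def sample_size(sigma, error, conf=0.95):
    """n = (z_{alpha/2} * sigma / error)^2, rounded up to an integer."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil((z * sigma / error) ** 2)

# e.g. sigma = 2, want xbar within 0.5 of mu with 95% confidence
print(sample_size(sigma=2.0, error=0.5))  # 62
```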
1. Normal: interval for $\mu$, $\sigma^2$ known: sample size
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$$
Does the dependence of $n$ on the various terms make sense?
You need more samples if you want $\bar{x}$ to come very close to $\mu$.
(Figure: the error $E = |\bar{x} - \mu|$ and the interval endpoints $l = \bar{x} - z_{\alpha/2}\sigma/\sqrt{n}$ and $u = \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}$. MONTGOMERY: Applied Statistics, 3e, Fig. 8.2.)
1. Normal: interval for $\mu$, $\sigma^2$ known
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$$
Does the dependence of $n$ on the various terms make sense?
You need more samples if
you want $\bar{x}$ to come very close to $\mu$,
$\sigma$ is large,
$z_{\alpha/2}$ is increased (i.e., $\alpha$ is decreased, or $1 - \alpha$ is increased).
(Figure: normal density $f(x)$ with the 68%, 95%, and 99.7% intervals. MONTGOMERY: Applied Statistics, 3e, Fig. 4.12.)
2. Normal: interval for $\mu$, $\sigma^2$ unknown
The two-sided interval for $\mu$, when $\sigma^2$ is known, is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
What happens when $\sigma^2$ is unavailable?
We must estimate $\sigma^2$. Its point estimate is $s^2$.
Now:
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \text{but} \qquad \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$
i.e., it is a $t$ random variable with $n - 1$ degrees of freedom.
2. Normal: interval for $\mu$, $\sigma^2$ unknown
$\dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the $t_{n-1}$ curve.
Remember that the $t$-density function is symmetric about the mean (0).
2. Normal: interval for $\mu$, $\sigma^2$ unknown
For any $\alpha \in (0, 1)$ we have
$$P\left(-t_{\alpha/2,n-1} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,n-1}\right) = 1 - \alpha$$
or,
$$P\left(\bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$
Thus, if it is observed that $\bar{X} = \bar{x}$ and $S = s$, then we can say with $100(1-\alpha)\%$ confidence that
$$\mu \in \left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$
Similarly, $100(1-\alpha)\%$ upper and lower one-sided intervals would be
$$\left(\bar{x} - t_{\alpha,n-1}\frac{s}{\sqrt{n}},\ \infty\right), \qquad \left(-\infty,\ \bar{x} + t_{\alpha,n-1}\frac{s}{\sqrt{n}}\right)$$
Example 7.3f
Suppose that when a signal having value $\mu$ is transmitted from location A, the value received at location B is normally distributed with mean $\mu$ and variance $\sigma^2$, with $\sigma^2$ unknown. That is, if $\mu$ is sent, then the value received is $\mu + N$, where $N$, representing noise, is normal with mean 0 and variance $\sigma^2$. To reduce error, suppose the same value is sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, construct a 95% confidence interval (CI) for $\mu$.
$\bar{x} = 81/9 = 9$, and
$$s^2 = \frac{\sum_{i=1}^{9} x_i^2 - 9\bar{x}^2}{8} = 9.5, \qquad s = 3.082$$
With $t_{0.025,8} = 2.306$, a 95% CI for $\mu$ is
$$\left(9 - 2.306\,\frac{3.082}{3},\ 9 + 2.306\,\frac{3.082}{3}\right) = (6.63, 11.37)$$
Note: in Example 7.3a we used the $Z$ distribution with $\sigma^2 = 4$ to get a 95% CI of (7.69, 10.31). If we assume $s^2 = 4$ now, we would get the 95% CI to be (7.46, 10.54), which is wider than in the earlier case.
The $t$-density function has heavier tails than the $Z$-density function.
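The same interval can be computed programmatically (an illustrative sketch; the Python standard library has no $t$ quantile function, so $t_{0.025,8} = 2.306$ is taken from a $t$ table, as on the slide, and `t_interval` is a hypothetical helper):

```python
import statistics

def t_interval(data, t_crit):
    """Two-sided CI for mu with unknown sigma: xbar +/- t_{alpha/2,n-1}*s/sqrt(n).
    t_crit must be looked up in a t table (no t quantile in the stdlib)."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n-1)
    half = t_crit * s / n ** 0.5
    return xbar - half, xbar + half

data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
lo, hi = t_interval(data, t_crit=2.306)  # t_{0.025,8} = 2.306 (table value)
print(round(lo, 2), round(hi, 2))  # 6.63 11.37
```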
So far
Interval estimates for mean with known and unknown variances.
Now consider interval estimate for the unknown variance.
3. Normal: interval for $\sigma^2$
Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. The point estimate of $\sigma^2$ is $S^2$:
$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad E[S^2] = \sigma^2$$
We have
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \qquad \text{or} \qquad S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$$
3. Normal: interval for $\sigma^2$
(Figure: $\chi^2$ densities $f(x)$ for $k = 2, 5, 10$ degrees of freedom. MONTGOMERY: Applied Statistics, 3e, Fig. 8.8.)
The $\chi^2$ distribution has $x$ always $> 0$ and is skewed to the right.
It is not symmetric.
The skew decreases as $n \to \infty$.
It is related to the gamma function.
3. Normal: interval for $\sigma^2$
We compute intervals, using quantiles, as before.
No symmetry: we need to evaluate both quantiles for a two-sided interval.
(Figure: $\chi^2$ density with both tail quantiles marked, e.g. $\chi^2_{0.95,10} = 3.94$ and $\chi^2_{0.05,10} = 18.31$, each cutting off tail area 0.05. MONTGOMERY: Applied Statistics, 3e, Fig. 8.9.)
3. Normal: interval for $\sigma^2$
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right) = 1 - \alpha$$
3. Normal: interval for $\sigma^2$
Hence, when $S^2 = s^2$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$$
One-sided intervals can be obtained similarly:
$$\text{Lower: } \left(0,\ \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}}\right), \qquad \text{Upper: } \left(\frac{(n-1)s^2}{\chi^2_{\alpha,n-1}},\ \infty\right)$$
Example 7.3h
A company produces washers with very small deviations in their thickness. Suppose that 10 such randomly chosen washers were measured and their thicknesses found to be (in inches):
.123, .124, .126, .120, .130, .133, .125, .128, .124, .126.
Compute a 90% CI for the standard deviation of the thickness of a washer produced by the company.
Here $s^2 = 1.366 \times 10^{-5}$. With $\chi^2_{0.05,9} = 16.917$ and $\chi^2_{0.95,9} = 3.334$, the 90% CI for $\sigma^2$ is
$$\left(\frac{9 \times 1.366 \times 10^{-5}}{16.917} = 7.267 \times 10^{-6},\ \frac{9 \times 1.366 \times 10^{-5}}{3.334} = 36.875 \times 10^{-6}\right)$$
Thus, a 90% CI for $\sigma$ is (on taking square roots)
$$(2.696 \times 10^{-3},\ 6.072 \times 10^{-3})$$
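A sketch of this computation (illustrative; the $\chi^2$ quantiles are the table values quoted on the slide, since the standard library provides no $\chi^2$ quantile function, and `var_interval` is a hypothetical helper):

```python
import math
import statistics

def var_interval(data, chi2_hi, chi2_lo):
    """CI for sigma^2: ((n-1)s^2/chi2_{alpha/2,n-1}, (n-1)s^2/chi2_{1-alpha/2,n-1}).
    chi2_hi and chi2_lo are table values for the upper and lower quantiles."""
    n = len(data)
    ss = (n - 1) * statistics.variance(data)  # (n-1) * s^2
    return ss / chi2_hi, ss / chi2_lo

thick = [.123, .124, .126, .120, .130, .133, .125, .128, .124, .126]
lo, hi = var_interval(thick, chi2_hi=16.917, chi2_lo=3.334)  # 90% CI, 9 d.o.f.
print(math.sqrt(lo), math.sqrt(hi))  # CI for sigma: about (0.0027, 0.0061)
```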
3. Normal: interval for $\sigma^2$
Notice how we standardize:
We standardize $x_i$ as $\dfrac{x_i - \mu}{\sigma}$.
We standardize $\bar{x}$ as $\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
We standardize $s^2$ as $\dfrac{(n-1)s^2}{\sigma^2}$.
Difference in Means of Two Normal Populations
So far: data from a single normal population.
Now consider data from two different normal populations:
$X_1, X_2, ..., X_n$, a sample of size $n$ from $N(\mu_1, \sigma_1^2)$; and
$Y_1, Y_2, ..., Y_m$, a sample of size $m$ from $N(\mu_2, \sigma_2^2)$.
The two samples are independent of each other.
Estimate $\mu_1 - \mu_2$?
Distribution of $\bar{X} - \bar{Y}$
Individual sample means are
$$\bar{X} = \frac{\sum_{i=1}^n X_i}{n}, \qquad \bar{Y} = \frac{\sum_{i=1}^m Y_i}{m}$$
$\bar{X} - \bar{Y}$ is the maximum likelihood estimator of $\mu_1 - \mu_2$.
What's the distribution of $\bar{X} - \bar{Y}$?
A sum of independent normal variables is also normal:
$$\bar{X} \sim N(\mu_1, \sigma_1^2/n), \qquad \bar{Y} \sim N(\mu_2, \sigma_2^2/m)$$
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right)$$
4. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2, \sigma_2^2$ known
Assuming $\sigma_1^2, \sigma_2^2$ are known,
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \sim N(0, 1)$$
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} < z_{\alpha/2}\right) = 1 - \alpha$$
or, equivalently,
$$P\left(\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right) = 1 - \alpha$$
4. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2, \sigma_2^2$ known
If $\bar{X}, \bar{Y}$ are observed to be $\bar{x}, \bar{y}$, then a $100(1-\alpha)\%$ two-sided confidence interval on $\mu_1 - \mu_2$ is
$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}},\ \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right)$$
Example 7.4a
Estimating the difference in means of two normal populations with known variances.
Two different types of electrical cable insulation have been tested to determine the voltage level at which failures tend to occur. When specimens were subjected to an increasing voltage stress in a laboratory experiment, failures for the two types of cable insulation occurred at the following voltages:
Type A: 36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44.
Type B: 52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62.
Suppose it is known that the amount of voltage that cables having type A insulation can withstand is normally distributed with unknown mean $\mu_A$ and known variance $\sigma_A^2 = 40$, whereas the corresponding distribution for type B insulation is normal with unknown mean $\mu_B$ and known variance $\sigma_B^2 = 100$.
Determine a 95% confidence interval for $\mu_A - \mu_B$.
Example 7.4a (continued)
$\bar{x} = 42.8$, $\bar{y} = 55.8$
$\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$
$n = 14$, $\sigma_1^2 = 40$; $m = 12$, $\sigma_2^2 = 100$
$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \mp z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right) = (-19.6, -6.5)$$
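The interval can be reproduced as follows (an illustrative sketch; `two_sample_z_interval` is a hypothetical helper, not a library function):

```python
from statistics import NormalDist, fmean

def two_sample_z_interval(xs, ys, var_x, var_y, conf=0.95):
    """CI for mu1 - mu2 with known variances:
    (xbar - ybar) +/- z_{alpha/2} * sqrt(var_x/n + var_y/m)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    diff = fmean(xs) - fmean(ys)
    half = z * (var_x / len(xs) + var_y / len(ys)) ** 0.5
    return diff - half, diff + half

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
lo, hi = two_sample_z_interval(type_a, type_b, var_x=40, var_y=100)
print(round(lo, 1), round(hi, 1))  # -19.6 -6.5
```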
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2, \sigma_2^2$ unknown
What if $\sigma_1^2, \sigma_2^2$ are also unknown, along with $\mu_1, \mu_2$?
It is natural to replace $\sigma_1^2, \sigma_2^2$ with $S_1^2, S_2^2$:
$$S_1^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad S_2^2 = \frac{\sum_{i=1}^m (Y_i - \bar{Y})^2}{m-1}$$
Then use
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n + S_2^2/m}}$$
But the distribution of the above random variable is complicated for the case $\sigma_1^2 \neq \sigma_2^2$.
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2 = \sigma_2^2$ unknown
Let the unknowns be $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
Distribution of $S_1^2, S_2^2$:
$$(n-1)\frac{S_1^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{m-1}$$
Since the samples are independent, the two chi-square random variables are independent:
$$(n-1)\frac{S_1^2}{\sigma^2} + (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{n+m-2}$$
Let
$$S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}$$
be the pooled sample variance. Then
$$\frac{n+m-2}{\sigma^2}S_p^2 \sim \chi^2_{n+m-2}$$
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2 = \sigma_2^2$ unknown
Since
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right)$$
it follows that
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \sim N(0, 1)$$
Thus,
$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \Bigg/ \sqrt{S_p^2/\sigma^2} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n + 1/m)}}$$
has a $t$-distribution with $n + m - 2$ degrees of freedom.
5. Distribution of $\bar{X} - \bar{Y}$; $\sigma_1^2 = \sigma_2^2$ unknown
Thus,
$$P\left(-t_{\alpha/2,n+m-2} \le \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n + 1/m}} \le t_{\alpha/2,n+m-2}\right) = 1 - \alpha$$
Hence, when the data result in $\bar{X} = \bar{x}$, $\bar{Y} = \bar{y}$, $S_p^2 = s_p^2$, a $100(1-\alpha)\%$ two-sided CI for $\mu_1 - \mu_2$ is
$$\left(\bar{x} - \bar{y} - t_{\alpha/2,n+m-2}\,s_p\sqrt{1/n + 1/m},\ \bar{x} - \bar{y} + t_{\alpha/2,n+m-2}\,s_p\sqrt{1/n + 1/m}\right)$$
Example 7.4a Modified
Estimating the difference in means of two normal populations with unknown but same variances.
Consider the earlier example, but with the variances unknown and assumed to be the same.
$n = 14$, $\bar{x} = 42.8$, $s_1^2 = 52.03$; $m = 12$, $\bar{y} = 55.8$, $s_2^2 = 110.88$
$$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2} = 79$$
$\alpha = 0.05$, $t_{0.025,24} = 2.06$
$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \mp t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac{1}{n} + \frac{1}{m}}\right) = (-20.26, -5.83)$$
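A corresponding sketch for the pooled-variance interval (illustrative; `pooled_t_interval` is a hypothetical helper, and $t_{0.025,24} \approx 2.064$ is the $t$-table value, slightly more precise than the 2.06 quoted above):

```python
from statistics import fmean, variance

def pooled_t_interval(xs, ys, t_crit):
    """CI for mu1 - mu2 with unknown but equal variances:
    (xbar - ybar) +/- t_{alpha/2,n+m-2} * s_p * sqrt(1/n + 1/m),
    where t_crit = t_{alpha/2,n+m-2} comes from a t table."""
    n, m = len(xs), len(ys)
    sp2 = ((n - 1) * variance(xs) + (m - 1) * variance(ys)) / (n + m - 2)
    half = t_crit * (sp2 * (1 / n + 1 / m)) ** 0.5
    diff = fmean(xs) - fmean(ys)
    return diff - half, diff + half

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
lo, hi = pooled_t_interval(type_a, type_b, t_crit=2.064)  # t_{0.025,24}
print(round(lo, 2), round(hi, 2))  # -20.26 -5.83
```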
6. Binomial: interval for p
So far: CIs for parameters of the normal distribution.
Now consider: the success probability $p$ in the binomial or Bernoulli distribution.
$p$ is the mean of a Bernoulli random variable.
Assume $n$ trials, $X$ positive outcomes. Then
$$X \sim \mathrm{Bi}(n, p), \qquad E[X] = np, \qquad \mathrm{var}(X) = np(1-p)$$
As $n$ increases, using the Central Limit Theorem (CLT),
$$X \approx N(np,\ np(1-p)) \qquad \text{therefore} \qquad \frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1)$$
6. Binomial: interval for p
To obtain the interval, observe that
$$X \approx N(np,\ np(1-p)) = N(\mu, \sigma^2)$$
From the Gaussian density function:
$$P\left(-z_{\alpha/2} < \frac{X - \mu}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha$$
Thus,
$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
6. Binomial: interval for p
Thus, for $X = x$, an approximate $100(1-\alpha)\%$ CI for $p$ is
$$\left\{p : -z_{\alpha/2} < \frac{x - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right\}$$
The above is, however, not an interval.
We can replace $p$ under the square root with $\hat{p} = X/n = \hat{p}_{ML}$:
$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{n\hat{p}(1-\hat{p})}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
Multiply throughout by $\sqrt{n\hat{p}(1-\hat{p})}$. Also, since $X = n\hat{p}$,
$$P\left(-z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})} < n\hat{p} - np < z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})}\right) \approx 1 - \alpha$$
Rearrange:
$$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) \approx 1 - \alpha$$
6. Binomial: interval for p
A two-sided $100(1-\alpha)\%$ CI on $p$ then is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Ex 7.5a: A sample of 100 transistors is randomly chosen from a large batch to determine if they meet the standards. If 80 of them meet the standards, then an approximate 95% CI for $p$, the fraction of all transistors that meet the standards, is given by
$$\left(0.8 - 1.96\sqrt{0.8(0.2)/100},\ 0.8 + 1.96\sqrt{0.8(0.2)/100}\right) = (0.7216, 0.8784)$$
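Reproducing Ex 7.5a (an illustrative sketch; `binomial_p_interval` is a hypothetical helper, with the normal quantile taken from `statistics.NormalDist`):

```python
from statistics import NormalDist

def binomial_p_interval(x, n, conf=0.95):
    """Approximate (CLT-based) CI for p:
    p_hat +/- z_{alpha/2} * sqrt(p_hat*(1-p_hat)/n), with p_hat = x/n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = x / n
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

lo, hi = binomial_p_interval(x=80, n=100)  # 80 of 100 transistors pass
print(round(lo, 4), round(hi, 4))  # 0.7216 0.8784
```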
6. Binomial: Sample Size n
Let $b$ be the desired width of a $100(1-\alpha)\%$ confidence interval:
$$b = 2z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Solving for $n$ gives
$$n = \hat{p}(1-\hat{p})\left(\frac{2z_{\alpha/2}}{b}\right)^2$$
that is, if $k$ items were initially sampled to obtain the estimate $\hat{p}$ of $p$, then an additional $n - k$ (or 0 if $n \le k$) items should be sampled (keeping $\hat{p}$ fixed at its earlier value).
We can get an upper bound on $n$ for $\hat{p} = 1/2$:
$$n = \frac{1}{4}\left(\frac{2z_{\alpha/2}}{b}\right)^2$$
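A sketch of the sample-size calculation (illustrative helper; the width $b = 0.1$ at 95% confidence is an assumed example value):

```python
import math
from statistics import NormalDist

def binomial_sample_size(b, p_hat=0.5, conf=0.95):
    """n = p_hat*(1-p_hat) * (2*z_{alpha/2}/b)^2, rounded up.
    The default p_hat = 1/2 gives the worst-case (largest) n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(p_hat * (1 - p_hat) * (2 * z / b) ** 2)

print(binomial_sample_size(b=0.1))             # worst case: 385
print(binomial_sample_size(b=0.1, p_hat=0.8))  # 246 once p_hat = 0.8 is known
```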
THANK YOU