
INFERENTIAL STATISTICS

Statistical inference may be divided into two major areas:
Estimation
Testing of Hypothesis

ESTIMATION Terms
In the theory of estimation, a STATISTIC is called an ESTIMATOR (for example, the sample mean \(\bar{x}\)).
The value of an estimator is called an ESTIMATE.

Point Estimate
A point estimate is a SINGLE VALUE of the estimator, obtained from the available sample observations.
Example: the proportion of vegetarians in a random sample of 50 PGP students can be a point estimate of the corresponding proportion in the population of all PGP students.
Interval Estimate (Confidence Interval)
An interval estimate is an INTERVAL that provides a lower and an upper bound for a specific unknown population parameter.
Example: the interval (45%, 52%) may contain the true proportion of vegetarians among all PGP students with 95% confidence.
The point estimate always lies within the interval estimate constructed from it.

POINT ESTIMATION

Point Estimators and Their Properties
An estimator of a parameter is a statistic used to estimate the parameter. The most commonly used estimators are:
Population parameter: Estimator (sample statistic)
Mean (\(\mu\)): sample mean (\(\bar{x}\))
Variance (\(\sigma^2\)): sample variance (\(s^2\))
Standard deviation (\(\sigma\)): sample standard deviation (s)
Proportion (p): sample proportion (\(\hat{p}\))
Difference of means (\(\mu_1 - \mu_2\)): difference of sample means (\(\bar{x}_1 - \bar{x}_2\))

Desirable properties of estimators include:


Unbiasedness
Efficiency
Consistency

Unbiased and Biased Estimators

An unbiased estimator is on target on average.
A biased estimator is off target on average.

Properties of estimator: Unbiasedness


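T is said to be an unbiased estimator of \(\theta\) iff \(E(T) = \theta\).
Example:
THE SAMPLE MEAN IS AN UNBIASED ESTIMATOR OF THE POPULATION MEAN
\[ E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu \]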

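Example of a biased estimator: the sample variance with divisor n.
Given a sample of size n from a population with unknown mean (\(\mu\)) and variance (\(\sigma^2\)), we estimate the mean by \(\bar{x}\) as before and the variance (intuitively) as:
\[ T = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \]
Its expected value is:
\[ E(T) = \frac{1}{n}\sum_{i=1}^{n} E(x_i^2) - E(\bar{x}^2) \]
\[ = \frac{1}{n}\sum_{i=1}^{n}\left[\mathrm{var}(x_i) + (E(x_i))^2\right] - \left[\mathrm{var}(\bar{x}) + (E(\bar{x}))^2\right] \]
\[ = \frac{1}{n}\sum_{i=1}^{n}\left[\sigma^2 + \mu^2\right] - \left[\frac{\sigma^2}{n} + \mu^2\right] \]
\[ = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2 \]
So T is not an unbiased estimator of the population variance. That is why, when the mean and variance are unknown, the following formula is used for the sample variance:
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \]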
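The bias of the divisor-n estimator can be checked exactly on a toy case. The sketch below assumes a hypothetical four-value population and samples of size 2 drawn with replacement (both are illustrative choices, not from the slides); it enumerates every equally likely sample and averages the two estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of a sequence of integers."""
    return Fraction(sum(values), len(values))


def variance_divisor_n(sample):
    """Intuitive estimator T: squared deviations divided by n."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def variance_divisor_n_minus_1(sample):
    """Estimator s^2: squared deviations divided by n - 1."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


# Hypothetical population and sample size, chosen only for illustration.
population = [1, 2, 3, 4]
n = 2

mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All ordered samples drawn with replacement are equally likely (iid sampling).
samples = list(product(population, repeat=n))
expected_T = sum(variance_divisor_n(s) for s in samples) / len(samples)
expected_s2 = sum(variance_divisor_n_minus_1(s) for s in samples) / len(samples)

print("population variance:", sigma2)
print("E(T)   =", expected_T)
print("E(s^2) =", expected_s2)
```

In this toy case E(T) equals (n - 1)/n times the population variance, while E(s^2) equals the population variance itself.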

Consistency
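A consistent estimator converges towards the parameter being estimated as the sample size increases. This holds, for example, when \(E(T) \to \theta\) and \(V(T) \to 0\) as \(n \to \infty\).
[Figure: sampling distributions of the estimator for n = 10 and n = 100]
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.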

Efficiency
An estimator is efficient if it has a relatively small variance (and standard deviation).

An efficient estimator is, on average, closer to the parameter being estimated.
An inefficient estimator is, on average, farther from the parameter being estimated.

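Sample mean vs sample median (normal population):
E(sample mean) = \(\mu\)
E(sample median) = \(\mu\)
V(sample mean) is \(\sigma^2/n\)
V(sample median) is approximately \(1.57\sigma^2/n\) for large n
Both are unbiased, but the sample mean has the smaller variance, so it is the more efficient estimator.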
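A rough simulation sketch of this comparison, assuming a standard normal population, samples of size n = 25, 20,000 repetitions and a fixed seed (all illustrative choices, not from the slides). It estimates the variance of each estimator across repeated samples.

```python
import random
import statistics


def simulate_variances(n=25, reps=20000, seed=0):
    """Estimate V(sample mean) and V(sample median) by repeated sampling."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = simulate_variances()
ratio = var_median / var_mean

print("V(sample mean)   ~", round(var_mean, 4))
print("V(sample median) ~", round(var_median, 4))
print("ratio            ~", round(ratio, 2))
```

The ratio should come out clearly above 1 and in the neighbourhood of the large-sample factor 1.57, though the exact figure depends on n and on the random draws.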

Example
Suppose you want to estimate the mean and sd of the score of a batsman in one-day cricket. You have randomly chosen 5 different innings and recorded the scores below:
20 52 8 63 11
Find unbiased estimates of the mean and the variance.

\(\bar{x} = 30.8\)
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = 628.7, \quad s = \sqrt{628.7} = 25.07 \]
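The same figures can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. This is only a quick check of the arithmetic, using the five scores given above.

```python
import statistics

# Scores from the five randomly chosen innings.
scores = [20, 52, 8, 63, 11]

sample_mean = statistics.mean(scores)          # point estimate of the mean
sample_variance = statistics.variance(scores)  # divisor n - 1 (unbiased for the variance)
sample_sd = statistics.stdev(scores)           # square root of the sample variance

print("mean     =", sample_mean)
print("variance =", round(sample_variance, 2))
print("sd       =", round(sample_sd, 2))
```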

Interval Estimator = Point Estimator ± Margin of Error

Elements of Interval Estimation

A probability that the population parameter falls somewhere within the interval.
[Figure: a confidence interval centred on the sample statistic (point estimator), bounded by the lower confidence limit and the upper confidence limit]

Elements of Interval Estimation


Confidence Coefficient/Level: the probability that the confidence interval will contain the true parameter.
Denoted by 100\((1-\alpha)\)%, e.g. 90%, 95%, 99%.
\(\alpha\): the probability that the interval does not contain the parameter.

The confidence coefficient is the area under the curve of the sampling distribution between the confidence limits.

Interval Estimator (Large Sample)
Confidence intervals:
Mean, \(\sigma\) known
Mean, \(\sigma\) unknown
Proportion

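Confidence Interval of \(\mu\) (\(\sigma\) known)
Assumption
Population standard deviation, \(\sigma\), is known
Sample size is large

Confidence Interval Estimator of \(\mu\):
Let \(x_1, x_2, \ldots, x_n\) be an iid random sample of size n, drawn from a population with mean \(\mu\) and sd \(\sigma\).
The 100\((1-\alpha)\)% Confidence Interval of \(\mu\) is
\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]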


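The quantity \(z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\) is often called the margin of error or the sampling error.

For example, if n = 30, \(\sigma\) = 20 and \(\bar{x}\) = 122, a 95% confidence interval is obtained as follows:
\(\alpha\) = 0.05; \(\alpha/2\) = 0.025; \(z_{0.025}\) = 1.96
\[ \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{30}} = 122 \pm (1.96)(3.65) = 122 \pm 7.15 = (114.85, 129.15) \]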
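A minimal sketch of this calculation in Python, assuming the standard library's `statistics.NormalDist` is used for the critical value; the function name `z_interval` is a hypothetical helper, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # right-tail area alpha/2
    margin = z * sigma / sqrt(n)                 # margin of error
    return x_bar - margin, x_bar + margin


# Values from the example: n = 30, sigma = 20, sample mean = 122.
lower, upper = z_interval(x_bar=122, sigma=20, n=30)
print(round(lower, 2), round(upper, 2))
```

The result agrees with the hand calculation up to rounding, since the slides round z to 1.96 and the standard error to 3.65.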

What is happening?
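[Figure: Sampling Distribution of the Mean, f(x) against the sample mean. The central 95% of the area lies between \(\mu - 1.96\frac{\sigma}{\sqrt{n}}\) and \(\mu + 1.96\frac{\sigma}{\sqrt{n}}\), with 2.5% in each tail.]
2.5% fall below the interval
2.5% fall above the interval
95% fall within the interval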

Background
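We define \(z_{\alpha/2}\) as the z value that cuts off a right-tail area of \(\alpha/2\) under the standard normal curve.
\[ P(z > z_{\alpha/2}) = \alpha/2 \]
\[ P(z < -z_{\alpha/2}) = \alpha/2 \]
\[ P(-z_{\alpha/2} < z < z_{\alpha/2}) = (1-\alpha) \]
[Figure: Standard Normal Distribution, f(z) against z, with central area \((1-\alpha)\) and area \(\alpha/2\) in each tail]

\((1-\alpha)\)100% Confidence Interval:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]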

Critical Values of z and Levels of Confidence
\((1-\alpha)\)    \(\alpha/2\)    \(z_{\alpha/2}\)
0.99      0.005    2.576
0.98      0.010    2.326
0.95      0.025    1.960
0.90      0.050    1.645
0.80      0.100    1.282
[Figure: Standard Normal Distribution, f(z) against z, with central area \((1-\alpha)\) and area \(\alpha/2\) in each tail]
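The critical values in this table can be reproduced with the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z value cutting off a right-tail area of alpha/2, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


levels = (0.99, 0.98, 0.95, 0.90, 0.80)
table = {level: round(z_critical(level), 3) for level in levels}

for level, z in table.items():
    print(f"{level:.2f}  {z:.3f}")
```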

Example 10.1

Confidence Level and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
[Figure: two Standard Normal Distribution panels, f(z) against z]
80% Confidence Interval: \(\bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}}\)
95% Confidence Interval: \(\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}\)

Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
[Figure: two Sampling Distribution of the Mean panels, f(x) against the sample mean]
95% Confidence Interval: n = 20
95% Confidence Interval: n = 40
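Both width effects can be seen numerically. The sketch below assumes a known population sd of 20 (a hypothetical value) and compares the interval width at 80% and 95% confidence for n = 20, then at 95% confidence for n = 20 and n = 40.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Full width of the z interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / sqrt(n)


sigma = 20  # hypothetical population sd, assumed known

width_80_n20 = interval_width(sigma, n=20, confidence=0.80)
width_95_n20 = interval_width(sigma, n=20, confidence=0.95)
width_95_n40 = interval_width(sigma, n=40, confidence=0.95)

print("80%, n = 20:", round(width_80_n20, 2))
print("95%, n = 20:", round(width_95_n20, 2))
print("95%, n = 40:", round(width_95_n40, 2))
```

Doubling the sample size shrinks the width by a factor of the square root of 2, not by half, because n enters the formula through its square root.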

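Confidence Interval of \(\mu\) (\(\sigma\) unknown)

Assumption
Population standard deviation, \(\sigma\), is unknown
Sample size is large

Confidence Interval Estimator of \(\mu\):
Let \(x_1, x_2, \ldots, x_n\) be an iid random sample of large size n, drawn from a population with mean \(\mu\) and sd \(\sigma\). The 100\((1-\alpha)\)% Confidence Interval of \(\mu\) is
\[ \bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}} \]
where \( s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \). For large n, dividing by n or by n - 1 makes little practical difference.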

Practice Problem 1:
A manufacturer of light bulbs claims that its light bulbs have a mean life of \(\mu\) hours with a standard deviation of 85 hours. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of 1505 hours, find the 95% Confidence Interval of \(\mu\).
Solution: Given n = 40 (large), \(\sigma\) = 85 (known), \(1-\alpha\) = 0.95, \(\alpha\) = 0.05, \(\bar{x}\) = 1505 and \(z_{\alpha/2} = z_{0.025} = 1.96\).
Therefore, the 95% CI of \(\mu\) is given by
\[ \left(1505 - 1.96\frac{85}{\sqrt{40}},\ 1505 + 1.96\frac{85}{\sqrt{40}}\right) = (1478.66, 1531.34) \]

Practice Problem 2:
Waiting times (in hours) at a popular restaurant are found to have a mean of 1.52 hours with sd 2.25 hours for a sample of 50 customers. Construct the 99% confidence interval for the population mean.
Solution: Given n = 50 (large), s = 2.25 (estimated), \(1-\alpha\) = 0.99, \(\alpha\) = 0.01, \(\bar{x}\) = 1.52 and \(z_{\alpha/2} = z_{0.005} = 2.58\).
Therefore, the 99% CI of \(\mu\) is given by
\[ \left(1.52 - 2.58\frac{2.25}{\sqrt{50}},\ 1.52 + 2.58\frac{2.25}{\sqrt{50}}\right) = (1.52 - 0.82,\ 1.52 + 0.82) = (0.70, 2.34) \]

Large-Sample Confidence Intervals for the Population Proportion, p
The estimator of the population proportion, p, is the sample proportion, \(\hat{p}\). If the sample size is large, \(\hat{p}\) has an approximately normal distribution, with \(E(\hat{p}) = p\) and \(V(\hat{p}) = \frac{pq}{n}\), where q = (1 - p). When the population proportion is unknown, use the estimated value, \(\hat{p}\), to estimate the standard deviation of \(\hat{p}\).
For estimating p, a sample is considered large enough when both np and nq are greater than 5.

Large-Sample Confidence Intervals for the Population Proportion, p
Assumptions
Two categorical outcomes
Population follows a binomial distribution
Large sample
The 100\((1-\alpha)\)% Confidence Interval for the population proportion p is given by
\[ \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Practice Problem 3:
A marketing research firm wants to estimate the share that foreign companies
have in the Indian market for certain products. A random sample of 100
consumers is obtained, and it is found that 34 people in the sample are users
of foreign-made products; the rest are users of domestic products. Give a
95% confidence interval for the share of foreign products in this market.

\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} \]
\[ = 0.34 \pm (1.96)(0.04737) \]
\[ = 0.34 \pm 0.0928 \]
\[ = (0.2472, 0.4328) \]

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.

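Reducing the Width of Confidence Intervals: The Value of Information
The width of a confidence interval can be reduced only at the price of:
a lower level of confidence, or
a larger sample.

Lower Level of Confidence: 90% Confidence Interval (n = 100)
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.645\sqrt{\frac{(0.34)(0.66)}{100}} \]
\[ = 0.34 \pm (1.645)(0.04737) \]
\[ = 0.34 \pm 0.07792 \]
\[ = (0.2621, 0.4179) \]

Larger Sample Size: 95% Confidence Interval, Sample Size n = 200
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{200}} \]
\[ = 0.34 \pm (1.96)(0.03350) \]
\[ = 0.34 \pm 0.0657 \]
\[ = (0.2743, 0.4057) \]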
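The proportion intervals for the market-share data can be recomputed with a few lines of Python. This is a sketch assuming the large-sample normal approximation described above; `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Sample proportion 0.34 (34 users of foreign-made products out of 100).
ci_95_n100 = proportion_interval(0.34, 100, 0.95)
ci_90_n100 = proportion_interval(0.34, 100, 0.90)  # lower confidence level
ci_95_n200 = proportion_interval(0.34, 200, 0.95)  # larger sample, same proportion

for label, (lo, hi) in [("95%, n=100", ci_95_n100),
                        ("90%, n=100", ci_90_n100),
                        ("95%, n=200", ci_95_n200)]:
    print(label, round(lo, 4), round(hi, 4))
```

Either change narrows the interval relative to the 95% interval with n = 100; the 90% interval with n = 100 rounds to (0.2621, 0.4179).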

Interval Estimator (Small Sample)
Confidence intervals:
Mean, \(\sigma\) known
Mean, \(\sigma\) unknown

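Confidence Interval of \(\mu\) (\(\sigma\) known)

Assumption
Population distribution is normal
Population standard deviation, \(\sigma\), is known

Confidence Interval Estimator of \(\mu\):
Let \(x_1, x_2, \ldots, x_n\) be an iid random sample of size n, drawn from a normal distribution with mean \(\mu\) and sd \(\sigma\).
The 100\((1-\alpha)\)% Confidence Interval of \(\mu\) is
\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]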

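Confidence Interval of \(\mu\) (\(\sigma\) unknown)
Assumption
Population distribution is normal
Population standard deviation, \(\sigma\), is unknown

Confidence Interval Estimator of \(\mu\):
Let \(x_1, x_2, \ldots, x_n\) be a random sample of size n, drawn from a normal distribution with mean \(\mu\) and sd \(\sigma\). The 100\((1-\alpha)\)% Confidence Interval of \(\mu\) is
\[ \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \]
where \(t_{\alpha/2,\,n-1}\) is the value of the t distribution with n - 1 degrees of freedom that cuts off a tail area of \(\alpha/2\) to its right, and
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \]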

Practice Problem 4:
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of \(\bar{x}\) = 10.37% and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.
The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail area of 0.025 is:
\(t_{0.025} = 2.145\)
Taking the reported s = 3.5 as computed with divisor n, the unbiased variance estimate is
\[ \hat{s}^2 = \frac{n}{n-1}s^2 = \frac{15}{14}(3.5)^2 = 13.125; \quad \hat{s} = 3.623 \]
The corresponding confidence interval or interval estimate is:
\[ \bar{x} \pm t_{0.025}\frac{\hat{s}}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.623}{\sqrt{15}} = 10.37 \pm 2.01 = (8.36, 12.38) \]
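A short arithmetic check of this problem in Python. The sketch takes the critical value 2.145 as quoted from the t table (the standard library has no t quantile function) and assumes, as the solution does, that the reported 3.5 is rescaled by n/(n - 1) before use.

```python
from math import sqrt

# Critical value t_{0.025} with 14 degrees of freedom, as quoted in the problem.
T_CRITICAL = 2.145

n = 15
x_bar = 10.37      # sample mean return, in percent
s_reported = 3.5   # reported sd, assumed here to use divisor n

# Rescale to the divisor n - 1 version: s_hat^2 = n * s^2 / (n - 1).
s_hat = sqrt(n * s_reported ** 2 / (n - 1))

margin = T_CRITICAL * s_hat / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print("s_hat  =", round(s_hat, 3))
print("margin =", round(margin, 2))
print("95% CI =", (round(lower, 2), round(upper, 2)))
```

With these inputs the margin of error comes out at about 2.01 percentage points.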

Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:

How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
What do you want the desired confidence level \((1-\alpha)\) to be so that the distance between your estimate and the parameter is less than or equal to B?
What is your estimate of the variance (or standard deviation) of the population in question?

For example, a 100\((1-\alpha)\)% Confidence Interval for \(\mu\) is \(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\), where the bound is \(B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\).

Minimum Sample Size: Mean and Proportion
Minimum required sample size in estimating the population mean, \(\mu\):
\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} \]
Bound of estimate: B (known)

Minimum required sample size in estimating the population proportion, p:
\[ n = \frac{z_{\alpha/2}^2\,pq}{B^2} \]

Example 1
A marketing research firm wants to conduct a survey to estimate the average
amount spent on entertainment by each person visiting a popular resort. The
people who plan the survey would like to determine the average amount spent by
all people visiting the resort to within $120, with 95% confidence. From past
operation of the resort, an estimate of the population standard deviation is
\(\sigma\) = $400. What is the minimum required sample size?
\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{(1.96)^2(400)^2}{120^2} = 42.684 \Rightarrow 43 \]

Example 2
The manufacturers of a sports car want to estimate the proportion of people in a given income bracket who are interested in the model. The company wants to know the population proportion, p, to within 0.10 with 99% confidence. Current company records indicate that the proportion p may be around 0.25. What is the minimum required sample size for this survey?

\[ n = \frac{z_{\alpha/2}^2\,pq}{B^2} = \frac{(2.576)^2(0.25)(0.75)}{(0.10)^2} = 124.42 \Rightarrow 125 \]
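Both sample-size calculations can be sketched in Python as below. The function names are hypothetical helpers; the z values are the rounded table values used in the worked solutions, and the proportion example uses the bound B = 0.10 that appears in its worked solution. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil


def sample_size_for_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound."""
    return ceil(z ** 2 * sigma ** 2 / bound ** 2)


def sample_size_for_proportion(z, p, bound):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= bound."""
    return ceil(z ** 2 * p * (1.0 - p) / bound ** 2)


# Example 1: sd about $400, bound $120, 95% confidence (z = 1.96).
n_mean = sample_size_for_mean(z=1.96, sigma=400, bound=120)

# Example 2: p about 0.25, bound 0.10, 99% confidence (z = 2.576).
n_prop = sample_size_for_proportion(z=2.576, p=0.25, bound=0.10)

print("Example 1: n =", n_mean)
print("Example 2: n =", n_prop)
```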

Problem
NDTV randomly selected 10,000 final-year students across different management schools in India and asked them about their career choices. 4% said they want to take the plunge and start their own companies, even if that meant giving up lucrative job offers from established MNCs. Find a 99% confidence interval for the true population proportion of management students in India who want to start their own companies.
