
Review #2

Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 9
A statistic is a random variable describing a
characteristic of a random sample.
Sample mean
Sample variance
We use the values of statistics in inferential
statistics (making inferences about population
characteristics from sample characteristics).
Statistics have distributions of their own.
The Central Limit Theorem
The distribution of the sample mean is normal if the
parent distribution is normal.
The distribution of the sample mean approaches the
normal distribution for sufficiently large samples
($n \ge 30$), even if the parent distribution is not normal.
The parameters of the sampling distribution of the mean
are:
Mean: $\mu_{\bar{x}} = \mu$
Standard deviation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Problem 1
Given a normal population whose mean is
50 and whose standard deviation is 5,
Find the probability that a random sample of 4
has a mean between 49 and 52
Answer:
$$P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{4}} < Z < \frac{52-50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) = .7881 - .3446 = .4435$$
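The table lookup can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_mean_between` is an illustrative choice and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, lo, hi):
    """P(lo < sample mean < hi) for a sample of size n from N(mu, sigma)."""
    # The sample mean is normal with mean mu and standard error sigma / sqrt(n).
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)

# Problem 1: population N(50, 5), sample of 4, mean between 49 and 52
print(round(prob_mean_between(50, 5, 4, 49, 52), 4))
```

The result agrees with the table-based answer up to rounding of the Z values. Changing `n` shows how a larger sample concentrates the sample mean around 50.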
Problem 2
Find the probability that a random sample of 16
has a mean between 49 and 52.
Answer:
$$P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{16}} < Z < \frac{52-50}{5/\sqrt{16}}\right) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$$
Problem 2
The amount of time per day spent by adults
watching TV is normally distributed with $\mu = 6$ and
$\sigma = 1.5$ hours.
What is the probability that a randomly selected adult
watches TV for more than 7 hours a day?
Answer:
$$P(X > 7) = P\left(Z > \frac{7-6}{1.5}\right) = P(Z > .67) = 1 - .7486 = .2514$$
What is the probability that 5 adults watch TV, on the
average, 7 or more hours a day?
Answer:
$$P(\bar{x} \ge 7) = P\left(Z \ge \frac{7-6}{1.5/\sqrt{5}}\right) = P(Z \ge 1.49) = 1 - .9319 = .0681$$
Problem 2
Additional question
What is the probability that the total TV
watching time of the five adults sampled will
exceed 28 hours?
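Answer:
A total above 28 hours for five adults is the same event as a sample mean above 28/5 = 5.6 hours.
$$P\left(\sum X > 28\right) = P(\bar{x} > 28/5) = P\left(Z > \frac{5.6-6}{1.5/\sqrt{5}}\right) = P(Z > -.60) = .7257$$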
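The step from a total to a sample mean can be sketched in a few lines. This is an illustrative check with the standard library; the function name `prob_total_exceeds` is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

def prob_total_exceeds(mu, sigma, n, total):
    """P(sum of n observations > total) for a N(mu, sigma) population."""
    # "total > t" is the same event as "sample mean > t / n".
    xbar = NormalDist(mu, sigma / sqrt(n))
    return 1 - xbar.cdf(total / n)

# Five adults, mu = 6 hours, sigma = 1.5 hours, total above 28 hours
print(round(prob_total_exceeds(6, 1.5, 5, 28), 4))
```

Because the code does not round Z to two decimals, its output differs slightly from a Z-table answer.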
Sampling distribution of the
sample proportion
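In a sample of size n, if np > 5 and n(1-p) > 5,
then the sample proportion $\hat{p} = x/n$ is
approximately normally distributed with the
following parameters:
$$\mu_{\hat{p}} = p \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}, \quad\text{therefore}\quad Z = \frac{\hat{p}-p}{\sqrt{p(1-p)/n}}$$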
Problem 3
A commercial for a household appliance
manufacturer claims that less than 5% of all
of its products require a service call in the
first year.
A survey of 400 households that recently
purchased the manufacturer's products was
conducted to check the claim.
Problem 3
Assuming the manufacturer is right, what is the
probability that more than 10% of the surveyed
households require a service call within the first
year?

$$P(\hat{p} > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1-.05)/400}}\right) = P(Z > 4.59) \approx 0$$
If indeed 10% of the sampled households reported
a call for service within the first year, what does it
tell you about the manufacturer's claim?
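To see how extreme a 10% sample proportion would be under the 5% claim, the Z value and tail probability can be computed directly. A minimal sketch, with `prob_phat_exceeds` as a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_exceeds(p, n, threshold):
    """Return (z, P(p_hat > threshold)) under the normal approximation."""
    se = sqrt(p * (1 - p) / n)      # standard deviation of the sample proportion
    z = (threshold - p) / se
    return z, 1 - NormalDist().cdf(z)

z, prob = prob_phat_exceeds(0.05, 400, 0.10)
print(round(z, 2), prob)
```

The tail probability is on the order of a few in a million, which is why the slide rounds it to 0.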
Chapter 10
A population parameter can be estimated
by a point estimator and by an interval
estimator.
A confidence interval with confidence level
$1-\alpha$ is an interval estimator that covers the
estimated parameter $100(1-\alpha)\%$ of the time.
Confidence intervals are constructed using
sampling distributions.
Confidence interval of the mean
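We use the central limit theorem to build
the following confidence interval:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The interval leaves probability $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and $\alpha/2$ in each tail.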
Problem 4
How many classes do university students miss
each semester? A survey of 100 students
was conducted (see Missed Classes).
Assuming the standard deviation of the
number of classes missed is 2.2, estimate
the mean number of classes missed per
student.
Use 99% confidence level.
Problem 4
Solution
$1-\alpha = .99$, so $\alpha = .01$, $\alpha/2 = .005$, and $z_{\alpha/2} = z_{.005} = 2.575$.
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm 2.575\frac{2.2}{\sqrt{100}} = 10.21 \pm .57$$
LCL = 9.64, UCL = 10.78

Missed classes (Excel descriptive statistics)
Mean                 10.21
Standard Error       0.21755993
Median               10
Mode                 10
Standard Deviation   2.1755993
Sample Variance      4.73323232
Kurtosis             0.91111511
Skewness             -0.107237
Range                14
Minimum              3
Maximum              17
Sum                  1021
Count                100
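The interval can be reproduced without a Z-table. The sketch below assumes a known population standard deviation, as the problem does; `z_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Problem 4: xbar = 10.21, sigma = 2.2, n = 100, 99% confidence
lcl, ucl = z_interval(10.21, 2.2, 100, 0.99)
print(round(lcl, 2), round(ucl, 2))
```

Using `inv_cdf` rather than a rounded table value changes the limits only in the third decimal place.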
Selecting the sample size
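The shorter the confidence interval, the
more precise the estimate.
We can, therefore, limit the half-width of the
interval to W, and get
$$\bar{x} \pm W = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad W = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
From here we have
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$$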
Problem 5
An operations manager wants to estimate the
average amount of time needed by a worker
to assemble a new electronic component.
Sigma is known to be 6 minutes.
The required estimate accuracy is within 20
seconds.
Use confidence levels of 90% and 95%.
Find the sample size in each case.
Problem 5
Solution
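$\sigma$ = 6 min; W = 20 sec = 1/3 min
$1-\alpha = .90$: $z_{\alpha/2} = z_{.05} = 1.645$
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{z_{.05}\,\sigma}{W}\right)^2 = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75$$
Take n = 877.
$1-\alpha = .95$: $z_{\alpha/2} = z_{.025} = 1.96$
$$n = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.68$$
Take n = 1245.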
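The sample-size formula lends itself to a short helper. This is a sketch under the same assumptions as the problem (known sigma, W as the allowed error on either side of the mean); the name `sample_size` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, w, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= w."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / w) ** 2)   # always round up

# Problem 5: sigma = 6 min, W = 1/3 min
print(sample_size(6, 1 / 3, 0.90), sample_size(6, 1 / 3, 0.95))
```

Rounding up rather than to the nearest integer guarantees the interval is no wider than requested.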
Chapter 11
Hypotheses tests
In hypothesis tests we hypothesize on the value of
a population parameter, and test to see whether there
is sufficient evidence to support our belief.
The structure of a hypothesis test
Formulate two hypotheses.
$H_0$: The null hypothesis, the one we try to reject in favor of $H_1$.
$H_1$: The alternative hypothesis, the one we try to prove.
Define a significance level $\alpha$.
Hypotheses tests
The significance level is the probability of
erroneously rejecting the null hypothesis.
$\alpha = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})$
Sample from the population and calculate a
statistic that indicates whether or
not the parameter value defined under $H_1$ is
more probable.
We shall test the population mean assuming the
standard deviation is known.
Problem 6
A machine is set so that the average
diameter of ball bearings it produces is .50
inch. In a sample of 100 ball bearings the
mean diameter was .51 inch. Assuming the
standard deviation is .05 inch, can we
conclude at the 5% significance level that the
mean diameter is not .50 inch?
Problem 6
The population studied is the ball-bearing
diameters.
We hypothesize on the population mean.
A good point estimator for the population
mean is the sample mean.
We use the distribution of the sample mean
to build a sample statistic to test whether
$\mu = .50$ inch.
Problem 6
Solution
Define the hypotheses:
$H_0: \mu = .50$
$H_1: \mu \ne .50$
Define a rejection region. Note that this is a two-tail
test because $H_1$ is stated as an inequality ($\ne$).
$$P(\bar{X} > \bar{X}_{L1} \text{ or } \bar{X} < \bar{X}_{L2} \mid \mu = .50) = .05 \quad\text{(the probability of a type I error)}$$
$$P(Z > Z_{L1} \text{ or } Z < Z_{L2} \mid \mu = .50) = .05$$
Let us take a symmetrical rejection area: $Z_{L2} = -Z_{L1}$.
Problem 6

$$P(Z > z_{.025} \text{ or } Z < -z_{.025} \mid \mu = .50) = .05$$
Critical Z: $z_{.025} = 1.96$ (obtained from the Z-table)
Build a rejection region: $Z_{sample} > z_{\alpha/2}$ or $Z_{sample} < -z_{\alpha/2}$, that is, $Z_{sample} > 1.96$ or $Z_{sample} < -1.96$.
Calculate the value of the sample Z statistic
and compare it to the critical value:
$$Z_{sample} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{.51 - .50}{.05/\sqrt{100}} = 2$$
Since 2 > 1.96, there is sufficient evidence to reject
$H_0$ in favor of $H_1$ at the 5% significance level.
Problem 6
We can also perform the test in terms of the sample mean.
Let us find the critical mean values for
rejection:
$$\bar{X}_{L1} = \mu_0 + z_{.025}\frac{\sigma}{\sqrt{n}} = .50 + 1.96\frac{.05}{\sqrt{100}} = .5098$$
$$\bar{X}_{L2} = \mu_0 - z_{.025}\frac{\sigma}{\sqrt{n}} = .50 - 1.96\frac{.05}{\sqrt{100}} = .4902$$
Since .51 > .5098, there is sufficient evidence to
reject the null hypothesis at the 5% significance level.
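Both versions of the two-tail test (standardized statistic and critical means) can be sketched together. This is an illustrative helper, not the course's software; the name `two_tail_z_test` and its return layout are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Two-tail Z test of H0: mu = mu0 with known sigma."""
    se = sigma / sqrt(n)                          # standard error of the mean
    z = (xbar - mu0) / se                         # sample Z statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    # Critical values expressed on the scale of the sample mean
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    reject = abs(z) > z_crit
    return z, z_crit, (lower, upper), reject

# Problem 6: xbar = .51, mu0 = .50, sigma = .05, n = 100, alpha = .05
z, z_crit, (lower, upper), reject = two_tail_z_test(0.51, 0.50, 0.05, 100, 0.05)
print(round(z, 2), round(z_crit, 2), round(lower, 4), round(upper, 4), reject)
```

The two formulations always agree: the sample mean falls outside the critical means exactly when the absolute Z statistic exceeds the critical Z.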
Problem 7
The average annual return on investment for
American banks was found to be 10.2% with a
standard deviation of 0.8%.
It is believed that banks that exercise comprehensive
planning do better.
A sample of 26 banks that conducted
comprehensive planning provided the following
result: Mean return = 10.5%.
Can we infer that the belief about bank performance
is supported at the 10% significance level by this sample
result?
Problem 7
The population tested is the annual rate of
return.
$H_0: \mu = 10.2$
$H_1: \mu > 10.2$
Let us perform the test with the p-value method:
$$P(\bar{X} > 10.5 \mid \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/\sqrt{26}}\right) = P(Z > 1.91) = 1 - .9719 = .0281$$
Since .0281 < .10 we reject the null hypothesis at the
10% significance level.
Problem 7
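Note the equivalence between the
standardized (rejection region) method
and the p-value method.
$P(Z > z_{.10}) = .10$, so $z_{.10} = 1.28$.
The sample statistic 1.91 lies to the right of the critical value 1.28, and correspondingly the p-value .0281 is smaller than .10.
Run the test with Data Analysis Plus.
See data in Return.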
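The p-value method can be sketched in a few lines of Python. The helper name `right_tail_p_value` is an illustrative assumption; the calculation itself is the right-tail Z test with known sigma.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_p_value(xbar, mu0, sigma, n):
    """Return (z, p-value) for H0: mu = mu0 against H1: mu > mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# Problem 7: xbar = 10.5, mu0 = 10.2, sigma = .8, n = 26
z, p = right_tail_p_value(10.5, 10.2, 0.8, 26)
print(round(z, 2), round(p, 4), p < 0.10)
```

Because Z is not rounded to two decimals before the lookup, the computed p-value differs from the table-based one in the fourth decimal place; the decision at the 10% level is unaffected.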
Type II Error
A type II error occurs when $H_0$ is erroneously not
rejected.
The probability of a type II error is called $\beta$.
$\beta = P(\text{do not reject } H_0 \text{ when } H_1 \text{ is true})$
To calculate $\beta$:
$H_1$ specifies an actual parameter value (not a range of
values). Example: $H_0: \mu = 100$; $H_1: \mu = 110$
The critical value is expressed in original terms (not in
standard terms).
Problem 7a
What is the probability you'll believe the
mean return in Problem 7 is 10.2% while
it is actually 10.6%, if the sample provided a
mean return of 10.5%?
Problem 7a
Solution
The two hypotheses are:
$H_0: \mu = 10.2$
$H_1: \mu = 10.6$
$H_0$ is not rejected (we believe $\mu = 10.2$) if the
sample mean is less than a critical value.
Therefore, the probability required is:
$\beta = P(\bar{X} < \bar{X}_{cr} \mid \mu = 10.6)$.
Problem 7a
The critical value is (recall, this problem was a
right-tail test with a 10% significance
level):
$$\bar{X}_L = \mu_0 + z_{.10}\frac{\sigma}{\sqrt{n}} = 10.2 + 1.28\frac{.8}{\sqrt{26}} = 10.40$$
$$\beta = P(\bar{X} < 10.40 \mid \mu = 10.6) = P\left(Z < \frac{10.40 - 10.6}{.8/\sqrt{26}}\right) = P(Z < -1.27) = .102$$
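The two steps (find the critical mean under H0, then evaluate its probability under H1) can be sketched as follows. The function name `type_ii_error` is hypothetical, and the sketch covers only the right-tail case used here.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Beta for a right-tail Z test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    # Step 1: critical sample mean, computed under H0
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Step 2: probability of staying below it when mu = mu1
    beta = NormalDist(mu1, se).cdf(x_crit)
    return x_crit, beta

# Problem 7a: mu0 = 10.2, mu1 = 10.6, sigma = .8, n = 26, alpha = .10
x_crit, beta = type_ii_error(10.2, 10.6, 0.8, 26, 0.10)
print(round(x_crit, 2), round(beta, 3))
```

Trying a larger `n` or a larger `alpha` in the call shows beta shrinking, which is the usual trade-off between the two error types.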
Chapter 12
Generally, the standard deviation is unknown, in the
same way that the mean is unknown.
When the standard deviation is unknown, we need
to change the test statistic from Z to t.
We shall test three population parameters:
Mean
Variance
Proportion
Testing the mean
(unknown variance)
Replace the statistic Z with t:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
The original distribution must be normal (or at
least mound shaped).
Problem 8
A federal agency inspects packages to determine if
the contents are at least as great as advertised.
A random sample of (i) 5, (ii) 50 containers whose
packaging states that the weight is 8.04 ounces
was drawn. (See Content.)
From the sample results:
Can we conclude that the average weight does not meet
the weight stated? (Use $\alpha = .05$.)
Estimate the mean weight of all containers with 99%
confidence.
What assumption must be met?
Problem 8
Solution
We hypothesize on the mean weight.
$H_0: \mu = 8.04$
$H_1: \mu < 8.04$
(i) n = 5. For small samples let us solve manually.
Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94
The rejection region: $t < -t_{\alpha,n-1} = -t_{.05,5-1} = -2.132$
To find $t_{sample}$ we need the sample mean and standard deviation:
Mean = (8.07 + ... + 7.94)/5 = 7.996
Std. Dev. = $\{[(8.07-7.996)^2 + \dots + (7.94-7.996)^2]/4\}^{1/2}$ = 0.0546
Problem 8
The sample t statistic is calculated as follows:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{7.996 - 8.04}{0.0546/\sqrt{5}} = -1.80$$
Since -1.80 > -2.132, the sample statistic does not
fall into the rejection region ($t < -2.132$). There is insufficient
evidence to conclude that the mean weight is smaller
than 8.04, at the 5% significance level.
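The manual calculation for the five-container sample can be verified with the standard library. In this sketch the critical value 2.132 is taken from the t-table as quoted in the solution (the standard library has no t distribution), and `t_statistic` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    # stdev() uses the n - 1 divisor, matching the sample standard deviation
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

weights = [8.07, 8.03, 7.99, 7.95, 7.94]   # sample assumed in the solution
T_CRIT = 2.132                             # t_{.05,4}, from the t-table

t = t_statistic(weights, 8.04)
print(round(t, 2), "reject H0" if t < -T_CRIT else "do not reject H0")
```

The comparison against -2.132 is what decides the left-tail test; the statistic has to fall below that value for H0 to be rejected.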
Problem 8
(ii) n = 50. To calculate the sample statistics we use
Excel (Descriptive Statistics from the Tools > Data
Analysis menu). From the sample we obtain:
Mean = 8.02; Std. Dev. = .04
$1-\alpha = .99$, so $\alpha = .01$, $\alpha/2 = .005$, and $t_{.005,50-1}$ is about 2.678 from the t-table.
The confidence interval is calculated by
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 8.02 \pm 2.678\frac{.04}{\sqrt{50}} = 8.02 \pm .015$$
or LCL = 8.005, UCL = 8.035
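The interval arithmetic can be checked with a small helper. Since Python's standard library does not provide t quantiles, this sketch takes the critical value as an input (here the table value 2.678 quoted above); `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean using a supplied t critical value."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Problem 8 (ii): xbar = 8.02, s = .04, n = 50, t_{.005,49} taken as 2.678
lcl, ucl = t_interval(8.02, 0.04, 50, 2.678)
print(round(lcl, 3), round(ucl, 3))
```

Both limits lie within .015 of the sample mean, as the half-width in the formula requires.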
Problem 8
Comments
Check whether it appears that the distribution is
normal.
[Histogram of the sample weights: frequency (0 to 20) for the bins 7.93, 7.97, 8.01, 8.05, 8.09, More]
Using Excel
To obtain an exact value for t use the TINV
function:
=TINV(0.01,49)
Here .01 is the two-tail probability and 49 is the degrees of freedom.
The exact value: 2.6799535
Problem 8
In our example recall:
$H_0: \mu = 8.04$
$H_1: \mu < 8.04$

t-Test: Two-Sample Assuming Unequal Variances
                               Weights     V2
Mean                           8.0182      8.04
Variance                       0.001627    0
Observations                   50          50
Hypothesized Mean Difference   0
df                             49
t Stat                         -3.82126
P(T<=t) one-tail               0.000187
t Critical one-tail            1.676551
P(T<=t) two-tail               0.000375
t Critical two-tail            2.009574

The p-value = .000187 < .05. There is sufficient
evidence to reject $H_0$ in favor of $H_1$.
Note: $t = (8.018 - 8.04)/(.0403/\sqrt{50}) = -3.82 < -t_{.05,49} = -1.676$


Inference about the population
Variance
The following statistic is $\chi^2$ (chi-squared)
distributed with n-1 degrees of freedom:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
We use this relationship to test and estimate
the variance.
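As a small illustration of the statistic itself, the sketch below computes it from a sample variance and a hypothesized variance. The function name and the numbers in the call are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chi2_statistic(n, sample_variance, sigma0_squared):
    """(n - 1) * s^2 / sigma0^2, chi-squared with n - 1 d.f. under H0."""
    return (n - 1) * sample_variance / sigma0_squared

# Hypothetical numbers: n = 11, s^2 = 4, hypothesized variance 2
print(chi2_statistic(11, 4.0, 2.0))
```

A statistic near n - 1 is what H0 would predict on average; values far above or below it point toward a larger or smaller population variance.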
Inference about the population
Variance
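The hypotheses tested are:
$H_0: \sigma^2 = \sigma_0^2$
$H_1: \sigma^2 \ne \sigma_0^2$, or $\sigma^2 > \sigma_0^2$, or $\sigma^2 < \sigma_0^2$
The rejection region is:
$$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha,n-1} \ \text{(right tail)} \quad\text{or}\quad \frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,n-1} \ \text{(left tail)}$$
For the two-tail test replace $\alpha$ with $\alpha/2$.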
Problem 9
A random sample of 100 observations was
taken from a normal population. The
sample standard deviation was 29.76.
Can we infer at the 2.5% significance level that
the population variance is less than $30^2$?
Estimate the population variance with 90%
confidence.
Problem 9
Solution:
$H_0: \sigma^2 = 30^2$
$H_1: \sigma^2 < 30^2$
Rejection region: $\chi^2 < \chi^2_{1-\alpha,n-1}$
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(100-1)29.76^2}{30^2} = 97.42$$
$\chi^2_{1-\alpha,n-1} = \chi^2_{.975,100-1}$ = about 73.36
Since 97.42 > 73.36, the statistic does not fall into the rejection region. There is
insufficient evidence at the 2.5% significance level to conclude that the
variance is smaller than $30^2$.
For the confidence interval
look at page 370.
Using Excel
We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$
for a given $\chi^2$ and known d.f. This makes it
possible to determine the p-value.
Use the CHIDIST function: =CHIDIST($\chi^2$, d.f.)
For example: =CHIDIST(97.42,99) = .526
That is: $P(\chi^2_{99} > 97.42) = .526$
In our example we had a left-tail rejection region.
The p-value is calculated based on the $\chi^2$ value (97.42):
$P(\chi^2_{99} < 97.42) = 1 - .526 = .474$
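Outside Excel, the upper-tail probability can be approximated with the standard library alone. The sketch below swaps in the Wilson-Hilferty normal approximation to the chi-squared distribution; this is not how CHIDIST works, only an approximation that is close when the degrees of freedom are large. The function name is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

def chi2_upper_tail_approx(x, df):
    """Approximate P(chi-squared with df d.f. > x) via Wilson-Hilferty."""
    # (X / df)^(1/3) is approximately normal with
    # mean 1 - 2/(9 df) and variance 2/(9 df).
    c = 2 / (9 * df)
    z = ((x / df) ** (1 / 3) - (1 - c)) / sqrt(c)
    return 1 - NormalDist().cdf(z)

upper = chi2_upper_tail_approx(97.42, 99)
print(round(upper, 3), round(1 - upper, 3))   # upper tail, then left-tail p-value
```

For exact values, a statistics package with a chi-squared distribution (or Excel's CHIDIST, as above) is the better tool; the approximation is only a convenient cross-check.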
Using Excel
We can get the exact $\chi^2$ value for which
$P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability $\alpha$
and known d.f.
Use the CHIINV function: =CHIINV($\alpha$, d.f.)
For example: =CHIINV(.025,99) = 128.4219
That is: $P(\chi^2_{99} > 128.4219) = .025$
Inference about a population
proportion
The test and the confidence interval are based on
the approximately normal distribution of the
sample proportion, if np > 5 and n(1-p) > 5.
For the confidence interval of p we have:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $\hat{p} = x/n$
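The interval formula can be sketched as a short function. The name `proportion_interval` and the counts in the call (329 successes out of 400, at 95% confidence) are illustrative choices for this example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Illustrative numbers: 329 successes in a sample of 400, 95% confidence
low, high = proportion_interval(329, 400, 0.95)
print(round(low, 4), round(high, 4))
```

Note that the interval uses the sample proportion in the standard error, because the population proportion is what is being estimated.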

For the hypotheses test, we run a Z test.


Problem 10
A consumer protection group ran a survey
of 400 dentists to check a claim that 4 out of
5 dentists recommend ingredients included
in a certain toothpaste.
The survey results are as follows:
71 No; 329 Yes
At 5% significance level, can the consumer
group infer that the claim is true?
Problem 10
Solution
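The two hypotheses are:
$H_0: p = .8$
$H_1: p > .8$
The rejection region: $Z > z_\alpha$, with $z_{.05} = 1.645$
The sample proportion is $\hat{p} = 329/400 = .8225$. Under $H_0$ the standard deviation of $\hat{p}$ is computed with the hypothesized p = .8:
$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.8225 - .8}{\sqrt{.8(1-.8)/400}} = 1.125$$
Since 1.125 < 1.645 the consumer group cannot confirm
the claim at the 5% significance level.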
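The test can be sketched as below. One assumption to note: the standard error here uses the hypothesized p = .8, following the sampling-distribution formula from Chapter 9; putting the sample proportion .8225 in its place gives a statistic of about 1.18 instead. The conclusion is the same either way. The helper name `proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Return (p_hat, z) for a Z test of H0: p = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)   # standard error under H0 (assumption noted above)
    return p_hat, (p_hat - p0) / se0

# Problem 10: 329 "Yes" answers out of 400, H0: p = .8, alpha = .05
p_hat, z = proportion_z(329, 400, 0.8)
z_crit = NormalDist().inv_cdf(0.95)   # right-tail critical value z_{.05}
print(round(p_hat, 4), round(z, 3), z > z_crit)
```

The final comparison prints whether the statistic lands in the right-tail rejection region.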
