
Key Statistical Concepts

Population
A population is the group of all items of interest to a
statistics practitioner.
Frequently very large; sometimes infinite.
E.g. all 5 million Florida voters, per Example 12.5

Sample
A sample is a set of data drawn from the population.
Potentially very large, but smaller than the population.
E.g. a sample of 765 voters exit polled on election day.
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.


Key Statistical Concepts


Parameter
A descriptive measure of a population.
Statistic
A descriptive measure of a sample.


Descriptive and Inferential Statistics


Descriptive statistics are methods of organizing,
summarizing, and presenting data in a convenient and
informative way.
Descriptive statistics describe the data set that's being
analyzed, but don't allow us to draw any conclusions or
make any inferences about the data.
Inferential statistics is also a set of methods, but it is used to
draw conclusions or inferences about characteristics of
populations based on data from a sample.


Chapter 9
Sampling Distributions


In real life, calculating parameters of populations is
prohibitive because populations are very large.
Rather than investigating the whole population, we
take a sample, calculate a statistic related to the
parameter of interest, and make an inference.
The sampling distribution of the statistic is the tool
that tells us how close the statistic is to the
parameter.


Remember
Parameters describe populations
Parameters are almost always unknown
We take a random sample of a population to obtain the
necessary data
We calculate one or more statistics from the sample
data


Notation for Samples and Populations


Statistics (describe a sample)
\(\bar{x}\) = sample mean
\(s^2\) = sample variance
\(s\) = sample standard deviation

Parameters (describe a population)
\(\mu\) = population mean
\(\sigma^2\) = population variance
\(\sigma\) = population standard deviation


Properties of the Standard Deviation, s


1. s measures the variability in a sample of measurements. It is a
measure of how much the sample values deviate from the
sample mean.

2. s is a nonnegative number. If all the numbers in a sample are
equal, the value of the standard deviation will be zero. This is
the smallest possible value for the standard deviation.

3. When comparing 2 samples of data, the sample that is more
variable will have a larger standard deviation.


Interpreting Standard Deviation


The standard deviation can be used to
compare the variability of several distributions
make a statement about the general shape of a distribution.

The empirical rule: If a sample of observations has a
mound-shaped distribution, the interval
\((\bar{x} - s, \bar{x} + s)\) contains approximately 68% of the measurements
\((\bar{x} - 2s, \bar{x} + 2s)\) contains approximately 95% of the measurements
\((\bar{x} - 3s, \bar{x} + 3s)\) contains approximately 99.7% of the measurements
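The empirical rule can be checked numerically. The following is a minimal sketch, assuming a hypothetical mound-shaped sample of 10,000 normally distributed values (the mean of 70 and standard deviation of 10 are illustrative choices, not taken from the slides). It counts the share of observations within 1, 2, and 3 sample standard deviations of the sample mean.

```python
import random
import statistics

random.seed(1)

# Hypothetical mound-shaped sample (illustrative parameters only).
sample = [random.gauss(70, 10) for _ in range(10_000)]
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)


def share_within(k):
    """Fraction of the sample inside (x_bar - k*s, x_bar + k*s)."""
    lo, hi = x_bar - k * s, x_bar + k * s
    return sum(lo < x < hi for x in sample) / len(sample)


coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} s of the mean: {share:.3f}")
```

The printed shares should land near 0.68, 0.95, and 0.997; a strongly skewed sample would not be expected to follow the rule as closely.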


Simulation
We will first simulate a number of samples, n, of a
certain size. We will then find the mean of each of
the samples. Then we will talk about the
distributions of these means. We will also make a
histogram to graphically illustrate the distribution.


Describing Quantitative Data - Histograms


The histogram will look like a bar chart, but the intervals
will be right next to each other. There may not be spaces
between bars.
Note: Spaces would actually be intervals with no
observations, so technically, there is never space between
bars in a histogram.
We will use technology to produce histograms, so we will
focus on interpreting the picture, rather than producing it
by hand.

Histogram Example #1
Suppose we examine the scores on the first test for
students in MAT 213 from past semesters.
Clearly this is quantitative data, and as such, a histogram
of the scores is appropriate.
Using Minitab, the following histogram was produced.
Notice the bars are equal width, the heights of the bars
give the number of scores in each interval, etc.


Describing Quantitative Data - Histograms


[Figure: Histogram of Test1. Horizontal axis: Test1 (score, 20 to 100); vertical axis: Frequency.]

Histogram Example #1 Questions


There are many questions you should be able to answer
from this type of graph.
What are the largest and smallest values possible in the
data set?
Are there any unusual data points (called outliers)?

Where do most values lie in the data set (if appropriate)?


What is the shape of the histogram (if appropriate)?

Thinking About the Sample Mean


Suppose I take a sample of book costs for 100 NKU
students. I then find the sample mean, \(\bar{x}\) = $340.
Suppose I then take a second sample of 100 NKU
students, different than before, and find their book costs.
If I calculated the sample mean, how do you think it
would compare to $340?
Would it be the same?
Would it be similar?

Thinking About the Sample Mean


Now suppose I performed the experiment a large number
of times, with each step involving:
sample 100 NKU students
record textbook costs
calculate the mean for the sample of 100 students
Each sample produces a different \(\bar{x}\).
What happens if I make a histogram for all the different
\(\bar{x}\) values?


9.2 The Sampling Distribution of the Sample Mean

When you take a sample, compute a statistic, repeat the
process a large number of times, and then make a
histogram of the statistics you observed, you are
examining the sampling distribution of the statistic.
Under special conditions, some of these distributions
(histograms) will begin to resemble a normal distribution.
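The repeat-and-histogram procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the population is a hypothetical right-skewed (exponential) one with mean 1, each sample has 100 observations, and the process is repeated 5,000 times. None of these values come from the slides.

```python
import random
import statistics

random.seed(2)


def sample_mean(n):
    """Mean of n draws from a hypothetical right-skewed population (mean 1)."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))


# Take a sample, compute the statistic, and repeat many times.
means = [sample_mean(100) for _ in range(5_000)]

# Crude text histogram of the observed sample means.
bin_width = 0.05
counts = {}
for m in means:
    left_edge = round(m // bin_width * bin_width, 2)
    counts[left_edge] = counts.get(left_edge, 0) + 1
for left_edge in sorted(counts):
    print(f"{left_edge:4.2f} {'#' * (counts[left_edge] // 25)}")
```

Even though the individual observations are skewed, the bars should pile up symmetrically around 1, which is the behavior the slide describes.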


Sampling Distribution of the Mean


A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.
The probability distribution of X is:

x      1    2    3    4    5    6
P(x)  1/6  1/6  1/6  1/6  1/6  1/6

and the mean and variance are calculated as well:
\[ \mu = \sum x P(x) = 3.5 \]
\[ \sigma^2 = \sum (x - \mu)^2 P(x) = 2.92 \]
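As a quick check of the die's mean and variance, here is a small sketch that applies the definitions \(\mu = \sum x P(x)\) and \(\sigma^2 = \sum (x - \mu)^2 P(x)\) with exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mu = sum(x * p for x in faces)                    # expected value
sigma2 = sum((x - mu) ** 2 * p for x in faces)    # variance

print(f"mean = {mu} = {float(mu):.1f}")
print(f"variance = {sigma2} = {float(sigma2):.4f}")
```

Working in fractions gives 7/2 and 35/12, so the decimals 3.5 and about 2.92 involve no rounding surprises.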


Throwing a die twice: sample mean

Sample  Throws  Mean    Sample  Throws  Mean    Sample  Throws  Mean
1       1,1     1.0     13      3,1     2.0     25      5,1     3.0
2       1,2     1.5     14      3,2     2.5     26      5,2     3.5
3       1,3     2.0     15      3,3     3.0     27      5,3     4.0
4       1,4     2.5     16      3,4     3.5     28      5,4     4.5
5       1,5     3.0     17      3,5     4.0     29      5,5     5.0
6       1,6     3.5     18      3,6     4.5     30      5,6     5.5
7       2,1     1.5     19      4,1     2.5     31      6,1     3.5
8       2,2     2.0     20      4,2     3.0     32      6,2     4.0
9       2,3     2.5     21      4,3     3.5     33      6,3     4.5
10      2,4     3.0     22      4,4     4.0     34      6,4     5.0
11      2,5     3.5     23      4,5     4.5     35      6,5     5.5
12      2,6     4.0     24      4,6     5.0     36      6,6     6.0

Sampling Distribution of Two Dice


A sampling distribution is created by looking at
all samples of size n = 2 (i.e. two dice) and their means.

While there are 36 possible samples of size 2, there are only
11 values for \(\bar{x}\), and some (e.g. \(\bar{x}\) = 3.5) occur more
frequently than others (e.g. \(\bar{x}\) = 1).
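The two-dice sampling distribution can be built by brute force. The sketch below enumerates all 36 ordered samples, tallies how often each sample mean occurs, and then computes the mean and variance of that distribution. Names such as `dist` and `var_of_means` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered samples of size n = 2 from a fair die.
samples = list(product(range(1, 7), repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Sampling distribution of the sample mean: value -> probability.
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}
for m in dist:
    print(f"{float(m):.1f}  {counts[m]}/36")

mean_of_means = sum(m * prob for m, prob in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * prob for m, prob in dist.items())
print(mean_of_means, var_of_means)
```

The mean of the sample means equals the die's mean (7/2), while the variance comes out to 35/24, half of the single-throw variance of 35/12.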

Sampling Distribution of Two Dice


The sampling distribution of \(\bar{x}\) is shown below:

\(\bar{x}\)    \(P(\bar{x})\)
1.0    1/36
1.5    2/36
2.0    3/36
2.5    4/36
3.0    5/36
3.5    6/36
4.0    5/36
4.5    4/36
5.0    3/36
5.5    2/36
6.0    1/36

[Figure: bar chart of \(P(\bar{x})\) against \(\bar{x}\) = 1.0, 1.5, ..., 6.0, rising from 1/36 to a peak of 6/36 at 3.5 and falling back to 1/36.]

Compare
Compare the distribution of X with the sampling distribution of \(\bar{x}\).

[Figure: the distribution of X (one die) next to the sampling distribution of \(\bar{x}\), horizontal axis 1.0 to 6.0.]

As well, note that:
\[ \mu_{\bar{x}} = \mu \]
\[ \sigma^2_{\bar{x}} = \frac{\sigma^2}{2} \]

Generalize
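We can generalize the mean and variance of the sampling distribution of
the mean of two dice:
\[ \mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma^2_{\bar{x}} = \frac{\sigma^2}{2} \]
to n dice:
\[ \mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \]
The standard deviation of the
sampling distribution is
called the standard error:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]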
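To see how the variance of the sample mean shrinks as n grows, the sketch below evaluates \(\sigma^2/n\) and the standard error \(\sigma/\sqrt{n}\) for the fair-die population (\(\sigma^2 = 35/12\)). The function names are illustrative, not from any library.

```python
import math

sigma2 = 35 / 12  # variance of a single fair-die throw


def variance_of_mean(sigma2, n):
    """Variance of the sample mean for samples of size n."""
    return sigma2 / n


def standard_error(sigma2, n):
    """Standard deviation of the sample mean (the standard error)."""
    return math.sqrt(sigma2 / n)


for n in (2, 5, 10, 25):
    print(n, round(variance_of_mean(sigma2, n), 4), round(standard_error(sigma2, n), 4))
```

For n = 5, 10, and 25 the variances come out to .5833, .2917, and .1167.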


Central Limit Theorem


The sampling distribution of the mean of a random sample
drawn from any population is approximately normal for a
sufficiently large sample size.
The larger the sample size, the more closely the sampling
distribution of \(\bar{X}\) will resemble a normal distribution.


Sampling Distribution of the Mean


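[Figure: three panels showing the sampling distribution of \(\bar{x}\) for n = 5, n = 10, and n = 25.]

n = 5:   \(\mu_{\bar{x}} = 3.5\),  \(\sigma^2_{\bar{x}} = .5833 \; (= \sigma^2_x / 5)\)
n = 10:  \(\mu_{\bar{x}} = 3.5\),  \(\sigma^2_{\bar{x}} = .2917 \; (= \sigma^2_x / 10)\)
n = 25:  \(\mu_{\bar{x}} = 3.5\),  \(\sigma^2_{\bar{x}} = .1167 \; (= \sigma^2_x / 25)\)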

Central Limit Theorem


If the population is normal, then \(\bar{X}\) is normally distributed
for all values of n.
If the population is non-normal, then \(\bar{X}\) is approximately
normal only for larger values of n.
In many practical situations, a sample size of 30 may be
sufficiently large to allow us to use the normal distribution
as an approximation for the sampling distribution of \(\bar{X}\).
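One way to watch the central limit theorem at work is to track how the skewness of simulated sample means fades as n grows. This is a rough sketch, assuming a hypothetical exponential population (strongly right-skewed) and 4,000 repetitions per sample size. The `skewness` helper is a plain moment-based estimate written for this illustration.

```python
import random
import statistics

random.seed(3)


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulated_means(n, reps=4_000):
    """Sample means from a hypothetical exponential population with mean 1."""
    return [statistics.mean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


skew_by_n = {n: skewness(simulated_means(n)) for n in (1, 5, 30)}
for n, sk in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means is about {sk:.2f}")
```

The skewness should fall steadily toward 0 as n increases, although for a population this skewed it is still visibly above 0 at n = 30, which is why "sufficiently large" depends on the population.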


Sampling Distribution of the Sample Mean


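1. \(\mu_{\bar{x}} = \mu\)
2. \(\sigma^2_{\bar{x}} = \sigma^2 / n\) and \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\)
3. If X is normal, \(\bar{X}\) is normal. If X is nonnormal, \(\bar{X}\) is
approximately normal for sufficiently large sample sizes.
Note: the definition of "sufficiently large" depends on the
extent of nonnormality of X (e.g. heavily skewed;
multimodal)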


Sampling Distribution of the Mean

Demonstration: The variance of the sample mean is
smaller than the variance of the population.

Population: 1, 2, 3
Let us take samples of two observations.
Sample means: Mean = 1.5, Mean = 2, Mean = 2.5
Compare the variability of the population
to the variability of the sample mean.

[Figure: the population values 1, 2, 3 plotted above the repeated sample means 1.5, 2, and 2.5, which cluster more tightly around 2.]

Sampling Distribution of the Mean


Also,
Expected value of the population =
(1 + 2 + 3)/3 = 2

Expected value of the sample mean =
(1.5 + 2 + 2.5)/3 = 2


Example 9.1(a)
The foreman of a bottling plant has observed that the amount
of soda in each 32-ounce bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of .3 ounce.
If a customer buys one bottle, what is the probability that the
bottle will contain more than 32 ounces?


Sampling Distribution of the Sample Mean


Example 9.1
The foreman of a bottling plant has observed that the
amount of soda in each 32-ounce bottle is actually a
normally distributed random variable, with a mean of
32.2 ounces and a standard deviation of 0.3 ounce.
Find the probability that a bottle bought by a customer
will contain more than 32 ounces.
Solution
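The random variable X is the
amount of soda in a bottle.
\[ P(X > 32) = P\left(\frac{X - \mu}{\sigma_x} > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = 0.7486 \]

[Figure: normal curve centered at \(\mu\) = 32.2; the shaded area to the right of x = 32 is 0.7486.]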

Example 9.1(a)
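We want to find P(X > 32), where X is normally distributed
with \(\mu\) = 32.2 and \(\sigma\) = .3:
\[ P(X > 32) = P\left(Z > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = .7486 \]
There is about a 75% chance that a single bottle of soda
contains more than 32 oz.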


Example 9.1(b)
The foreman of a bottling plant has observed that the amount
of soda in each 32-ounce bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will be
greater than 32 ounces?


Sampling Distribution of the Sample Mean


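Find the probability that a carton of four bottles will have a
mean of more than 32 ounces of soda per bottle.
Solution
Define the random variable as the mean amount of soda
per bottle.
\[ P(\bar{X} > 32) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{.3/\sqrt{4}}\right) = P(Z > -1.33) = 0.9082 \]

[Figure: two normal curves centered at \(\mu\) = 32.2. The area to the right of x = 32 is 0.7486 for a single bottle; the area to the right of \(\bar{x}\) = 32 is 0.9082 for the mean of four bottles.]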

Example 9.1(b)
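We want to find P(\(\bar{X}\) > 32), where X is normally distributed
with \(\mu\) = 32.2 and \(\sigma\) = .3
Things we know:
1) X is normally distributed, therefore so is \(\bar{X}\).
2) \(\mu_{\bar{x}} = \mu\) = 32.2 oz.
3) \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = .3/\sqrt{4} = .15\) oz.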


Example 9.1(b)
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will be
greater than 32 ounces?

There is about a 91% chance the mean of the four bottles
will exceed 32 oz.
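Both parts of Example 9.1 can be reproduced without a z table. The sketch below uses Python's standard-library `statistics.NormalDist`; only the numbers 32.2, 0.3, 32, and n = 4 come from the example. Because the code does not round z to two decimals, its answers differ from the table-based .7486 and .9082 in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma = 32.2, 0.3   # population mean and standard deviation (ounces)

# (a) One bottle: X is normal with mean mu and standard deviation sigma.
p_one = 1 - NormalDist(mu, sigma).cdf(32)

# (b) Mean of four bottles: standard error is sigma / sqrt(n).
n = 4
standard_error = sigma / math.sqrt(n)
p_four = 1 - NormalDist(mu, standard_error).cdf(32)

print(f"P(X > 32)     is about {p_one:.4f}")
print(f"P(X-bar > 32) is about {p_four:.4f}")
```

The second probability is larger because the sample mean varies less than a single observation, so less of its distribution falls below 32.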


Graphically Speaking
[Figure: two normal curves, each with mean = 32.2, one for a single bottle and one for the mean of four bottles.]

What is the probability that one bottle will
contain more than 32 ounces?

What is the probability that the mean of
four bottles will exceed 32 oz?

Sampling Distribution of the Sample Mean


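Fully describe the sampling distribution of \(\bar{X}\).

Original population of soda pop: \(\mu\) = 32.2 oz, \(\sigma\) = 0.3 oz
Random sample, n = 4
Sampling distribution of \(\bar{X}\):
\[ \mu_{\bar{X}} = \mu_x = 32.2 \]
\[ \sigma_{\bar{X}} = \frac{\sigma_x}{\sqrt{n}} = \frac{0.3}{\sqrt{4}} = 0.15 \]

The sampling distribution is normal, since the population is
normally distributed.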

Chapter-Opening Example
The dean of the School of Business claims that the average
salary of the school's graduates one year after graduation is
$800 per week with a standard deviation of $100. A second-year
student would like to check whether the claim about the
mean is correct. He does a survey of 25 people who
graduated one year ago and determines their weekly salary.
He discovers the sample mean to be $750. To interpret his
finding he needs to calculate the probability that a sample of
25 graduates would have a mean of $750 or less when the
population mean is $800 and the standard deviation is $100.


Chapter-Opening Example
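We want to compute
\[ P(\bar{X} \le 750) \]
Although X is likely skewed, it is likely that \(\bar{X}\)
is normally distributed. The mean of \(\bar{X}\) is
\[ \mu_{\bar{x}} = \mu = 800 \]
The standard deviation is
\[ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 100/\sqrt{25} = 20 \]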


Chapter-Opening Example
\[ P(\bar{X} \le 750) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \le \frac{750 - 800}{20}\right) = P(Z \le -2.5) \]
\[ = .5 - P(0 < Z < 2.5) = .5 - .4938 = .0062 \]

The probability of observing a sample mean as low as $750 when the
population mean is $800 is extremely small (0.62%). Because the event
is quite unlikely, we would conclude that the dean's claim is not
justified.
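The chapter-opening calculation can be verified directly. This short sketch standardizes the observed sample mean of $750 and looks up the left-tail area with the standard library's `NormalDist`; the inputs (800, 100, 25, 750) are the ones given in the example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 100, 25       # claimed mean, standard deviation, sample size
x_bar = 750                       # observed sample mean

standard_error = sigma / math.sqrt(n)   # 100 / 5 = 20
z = (x_bar - mu) / standard_error       # (750 - 800) / 20 = -2.5
p = NormalDist().cdf(z)                 # P(Z <= -2.5)

print(f"z = {z}, P(X-bar <= 750) is about {p:.4f}")
```

The result rounds to .0062, matching the table-based answer.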


Standardizing the Sample Mean


The sampling distribution can be used to make inferences
about population parameters. In order to do so, the sample
mean can be standardized to the standard normal distribution
using the following formulation:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]


Another Way to State the Probability


In Chapter 8 we saw that
\[ P(-1.96 < Z < 1.96) = .95 \]
From the sampling distribution of the mean we have
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Substituting this definition of Z in the probability statement we
produce
\[ P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95 \]

Another Way to State the Probability


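With a little algebra we rewrite the probability statement as
\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95 \]

Similarly
\[ P\left(\mu - 1.645\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.645\frac{\sigma}{\sqrt{n}}\right) = .90 \]

In general
\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

All are probability statements about \(\bar{X}\), which we'll use in
statistical inference.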

Return to the Chapter-Opening Example


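Substituting \(\mu\) = 800, \(\sigma\) = 100, n = 25, and \(\alpha\) = .05, we get
\[ P\left(\mu - z_{.025}\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{.025}\frac{\sigma}{\sqrt{n}}\right) = 1 - .05 \]
\[ P\left(800 - 1.96\frac{100}{\sqrt{25}} < \bar{X} < 800 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]
\[ P(760.8 < \bar{X} < 839.2) = .95 \]

This is another way of checking the dean's claim. The probability that
\(\bar{X}\) falls between 760.8 and 839.2 is 95%. It is unlikely that we
would observe a sample mean as low as $750 when the population
mean is $800.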
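The general statement \(P(\mu - z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} < \mu + z_{\alpha/2}\,\sigma/\sqrt{n}) = 1 - \alpha\) translates into a small helper. This is a sketch only: `central_interval` is a hypothetical name, and it obtains \(z_{\alpha/2}\) from `NormalDist.inv_cdf` rather than from a table.

```python
import math
from statistics import NormalDist


def central_interval(mu, sigma, n, alpha):
    """Interval that contains the sample mean with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. about 1.96
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width


low, high = central_interval(mu=800, sigma=100, n=25, alpha=0.05)
print(f"95% of sample means fall between {low:.1f} and {high:.1f}")
print(f"Is 750 inside the interval? {low < 750 < high}")
```

With the chapter-opening values the interval is 760.8 to 839.2, and the observed $750 falls outside it.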

Calculating a Probability for \(\bar{X}\)

1. Make sure the sample contains at least 30 measurements (to
allow for normality)
2. Find the values of the mean and standard deviation of the
sampling distribution of the sample mean using
\[ \mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
3. Sketch a normal curve, and shade the area, or probability, of
interest


Continued
4. Calculate the z-scores corresponding to the appropriate values of
\(\bar{x}\) using the formula
\[ z = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \]
5. Use Table 1 to find the area under the normal curve corresponding
to each calculated z-score.
6. Using the curve sketched in Step 3, find the probability of interest
by adding or subtracting the appropriate areas.


Fully describe the sampling distribution


To fully describe the sampling distribution:
1. Describe the population parameters given
2. Describe the sample with data given
3. Give the properties of the sampling distribution of
\(\bar{x}\), that is, its mean, its standard error, and
whether the sampling distribution is normal


Sampling Distribution of a Proportion


The estimator of a population proportion of successes is the
sample proportion. That is, we count the number of
successes in a sample and compute:
\[ \hat{p} = \frac{X}{n} \]
(read this as "p-hat").

X is the number of successes, n is the sample size.


Normal Approximation to Binomial


Binomial distribution with n = 20 and p = .5 with a normal
approximation superimposed (\(\mu\) = 10 and \(\sigma\) = 2.24)


Normal Approximation to Binomial


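Binomial distribution with n = 20 and p = .5 with a normal
approximation superimposed (\(\mu\) = 10 and \(\sigma\) = 2.24)
Where did these values come from?
From 7.6 we saw that:
\[ \mu = np \quad\text{and}\quad \sigma^2 = np(1-p) \]
Hence:
\[ \mu = 20(.5) = 10 \]
and
\[ \sigma = \sqrt{20(.5)(1 - .5)} = 2.24 \]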

Normal Approximation to Binomial


Normal approximation to the binomial works best when the
number of experiments, n, (sample size) is large, and the
probability of success, p, is close to 0.5
For the approximation to provide good results two
conditions should be met:
1) \(np \ge 5\)
2) \(n(1 - p) \ge 5\)


Normal Approximation to Binomial


To calculate P(X=10) using the
normal distribution, we can find
the area under the normal curve
between 9.5 & 10.5

\[ P(X = 10) \approx P(9.5 < Y < 10.5) \]
where Y is a normal random variable approximating
the binomial random variable X

Normal Approximation to Binomial


In fact:
P(X = 10) = .176
while
P(9.5 < Y < 10.5) = .1742
so the approximation is quite good.

\[ P(X = 10) \approx P(9.5 < Y < 10.5) \]
where Y is a normal random variable approximating
the binomial random variable X
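The comparison between the exact binomial probability and its normal approximation can be reproduced as follows. This is a sketch using only the standard library. Note that the code keeps full precision in z, so the approximate area comes out near .177 rather than the table-based .1742, which corresponds to rounding z to .22 before the lookup.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Exact binomial probability P(X = 10).
exact = comb(n, 10) * p ** 10 * (1 - p) ** (n - 10)

# Normal approximation with the continuity correction: P(9.5 < Y < 10.5).
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = Y.cdf(10.5) - Y.cdf(9.5)

print(f"exact  P(X = 10)         is about {exact:.4f}")
print(f"approx P(9.5 < Y < 10.5) is about {approx:.4f}")
```

Either way, the approximation agrees with the exact value to about two decimal places.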

Sampling Distribution of a Sample Proportion


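Using the laws of expected value and variance, we can
determine the mean, variance, and standard deviation of \(\hat{p}\):
\[ E(\hat{p}) = p, \qquad V(\hat{p}) = \sigma^2_{\hat{p}} = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
(The standard deviation of \(\hat{p}\) is called the standard error of
the proportion.)

Sample proportions can be standardized to a standard normal
distribution using this formulation:
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \]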


Calculating a Probability for \(\hat{p}\)

Consider a population of qualitative data where
p is the proportion having a particular attribute
of interest. Let \(\hat{p}\) be the corresponding
proportion in a random sample of n
observations. To find the probability associated
with \(\hat{p}\), follow these steps:


Step 1

Define the sample proportion of interest in words. This
is important because every qualitative set of data has
more than one category, and you must be sure to
identify the category, or attribute, of interest. Also
specify the values of the population proportion p and
the sample size n.


Step 2

Find the values of the mean and standard error of the
sampling distribution of \(\hat{p}\) using
\[ \mu_{\hat{p}} = p \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]

Step 3

Verify that the sampling distribution of \(\hat{p}\) is
approximately normal by checking that the
following holds:
np > 5 and n(1 - p) > 5


Step 4

Sketch a normal curve, and shade the area
corresponding to the probability of interest.


Step 5

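Calculate the z-scores corresponding to the appropriate
values of \(\hat{p}\). (Remember that p is the same as the mean of
the sampling distribution, \(\mu_{\hat{p}}\).)
\[ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]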

Step 6
Use table to find the area under the normal curve
corresponding to each calculated z-score.


Step 7
With the help of the curve sketched in Step 4, find the
probability of interest by adding or subtracting
appropriate areas.


Example 9.3
A state representative received 52% of the votes
in the last election.
One year later the representative wanted to study
his popularity.
If his popularity has not changed, what is the
probability that more than half of a sample of
300 voters would vote for him?


Sampling Distribution
Example 9.3

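Solution - Describe the Sampling Distribution of \(\hat{p}\)

Population: p = .52
Sample: random, n = 300
Sampling distribution:
\[ \mu_{\hat{p}} = p = .52 \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(.52)(1 - .52)}{300}} = .0288 \]
Value of interest: \(\hat{p}\) = .50

Approximately normal because np = 300(.52) = 156 and
n(1 - p) = 300(1 - .52) = 144 (both greater than 5)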


Example 9.3
Solution for the probability:
\[ P(\hat{p} > .50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{.50 - .52}{\sqrt{(.52)(1 - .52)/300}}\right) = P(Z > -.69) = .7549 \]

The probability that more than half of the 300 sampled
voters will vote for him is 75.49%.
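Example 9.3 follows the seven-step recipe closely enough to script. The sketch below computes the standard error of \(\hat{p}\), checks the normality conditions, and finds the right-tail probability. Because z is not rounded to -.69 first, the result is about .756 rather than the table-based .7549.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 300        # population proportion and sample size
p_hat = 0.50            # sample proportion of interest

# Conditions for approximate normality of the sampling distribution.
conditions_ok = n * p > 5 and n * (1 - p) > 5

standard_error = sqrt(p * (1 - p) / n)     # about .0288
z = (p_hat - p) / standard_error           # about -0.69
prob = 1 - NormalDist().cdf(z)             # P(p-hat > .50)

print(f"conditions met: {conditions_ok}")
print(f"standard error = {standard_error:.4f}, z = {z:.2f}")
print(f"P(p-hat > .50) is about {prob:.4f}")
```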


Sampling Distribution of the Sample Proportion


1. \(\mu_{\hat{p}} = p\)

2. \(\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\)

3. The sampling distribution is approximately normal if np > 5 and n(1 - p) > 5


Part 3
Finding probabilities when \(\sigma\) is unknown


Inference With Standard Deviation Unknown


Previously, we looked at estimating and testing the
population mean when the population standard
deviation (\(\sigma\)) was known or given:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]

But how often do we know the actual population variance?
Instead, we use the Student t-statistic, given by:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]


Inference With Standard Deviation Unknown


When \(\sigma\) is unknown, we use its point estimator s,
and the z-statistic is replaced by the t-statistic, where the
number of degrees of freedom, \(\nu\), is n - 1.


The t - Statistic

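\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
The t distribution is mound-shaped
and symmetrical around zero.
The degrees of freedom
(a function of the sample size)
determine how spread out the
distribution is (compared to the
normal distribution).

[Figure: two t distribution curves centered at 0, one with d.f. = \(\nu_1\) and one with d.f. = \(\nu_2\), where \(\nu_1 < \nu_2\).]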


Inference About \(\mu\) with \(\sigma\) Unknown

What value changes (possibly) with each new sample
for the z-statistic? \(\bar{x}\)
What values change (possibly) with each new sample
for the t-statistic? \(\bar{x}\) and s
Clearly, this adds more variability to the distribution
which describes possible t-scores.
This distribution is called a t-distribution and assumes
the sample comes from a normal distribution.

Degrees of freedom
I want to find 3 numbers that add up to 10.
If I choose 2 and 5 for the first two, what is the
third?
If I choose 4 and -6 for the first two, what is the
third?
Once I have designated two of the numbers, the third
is fixed, that is, there is no choice if the original
condition is to be met (sum = 10)
In this case we have 2 degrees of freedom.


Degrees of Freedom
If I want 4 numbers that add up to 20, how many
degrees of freedom do I have?
If I want 2 numbers that add up to 15, how many
degrees of freedom do I have?
If I want 5 numbers that add up to 16, how many
degrees of freedom do I have?


The t-distribution
Properties of the t-distribution:
1. It is symmetric and mound-shaped but it is not a
normal distribution.
2. The mean value (its center) is zero.
3. The t-distribution has n - 1 degrees of freedom,
written df = n - 1.
4. The standard deviation of the distribution is
\[ \sigma_t = \sqrt{\frac{df}{df - 2}} \quad (df > 2) \]

Example 12.1
Will new workers achieve 90% of the level of experienced
workers within one week of being hired and trained?
Experienced workers can process 500 packages/hour, thus if
our conjecture is correct, we expect new workers to be
able to process .90(500) = 450 packages per hour.
Given the data, is this the case?


Example 12.1

COMPUTE

Our test statistic is:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
With n = 50 data points, we have n - 1 = 49 degrees of freedom.


Example 12.1
From the data, we calculate \(\bar{x}\) = 460.38, s = 38.83 and thus:
\[ t = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89 \]

To change a t value to the probability (or area under the
curve) we use the table you received as a handout.


The t value from Table 4


To find the appropriate value of t to correctly estimate the
population mean, we must use Table 4.

Calculate the degrees of freedom, df = n - 1


Look for the appropriate row labeled with that value
of df in Table 4
Look under the column with the appropriate level of
confidence


Example 12.1
Since t = 1.89, and we have 49 degrees of freedom, we go to
row 49, and then over to 1.9 for the calculated value of t.
This gives us 0.0317. This is the area under the curve.
NOTICE the shaded region on the top of the table. The
numbers in the table are the right tail of the distribution.
So the probability of getting the value obtained from the
experiment (in this case 460.38) or a higher value, if the
true mean is 450, is 0.0317 or 3.17%.
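The t statistic and its right-tail area for Example 12.1 can be computed without a table. The sketch below uses only the standard library. Since Python's `statistics` module has no Student t distribution, the tail area is obtained by integrating the t density with Simpson's rule; `t_pdf` and `t_right_tail` are helper names made up for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2_000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


x_bar, s, n, mu0 = 460.38, 38.83, 50, 450
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_right_tail(t_stat, df=n - 1)

print(f"t = {t_stat:.2f}, right-tail area is about {p_value:.4f}")
```

The computed area is close to .032, slightly above the table lookup of .0317 that results from rounding t up to 1.9.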


Example 12.1

COMPUTE

Alternatively, we can use t-Test: Mean from
Tools > Data Analysis Plus in Excel.

Example 12.1

COMPUTE

Technology gives a p-value of 0.0323. This is actually more
accurate, because we had to use t = 1.9 on our table, even
though we calculated t = 1.89, and even there we rounded.
Technology doesn't round, and so gives a more accurate
answer. However, the practical difference between 3.17%
and 3.23% isn't going to change many of our decisions.


Simulations
We can simulate the sampling distribution of z.

The z statistic gives us the number of standard deviations
away from the mean in a normal distribution.
So the average of all the z statistics in a simulation should be
close to 0, because there should be as many pieces of data on one
side of the mean as on the other.
The standard deviation of all the z statistics in a simulation
should be close to 1, because z is measured in units of the
standard deviation.
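A simulation of the z statistic might look like the sketch below. It assumes a normal population with \(\mu\) = 800 and \(\sigma\) = 100 and samples of n = 25 (values borrowed from the chapter-opening example purely for illustration), and repeats the sampling 10,000 times.

```python
import math
import random
import statistics

random.seed(4)

mu, sigma, n = 800, 100, 25   # assumed known population parameters


def z_statistic():
    """Draw one sample of size n and standardize its mean."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return (statistics.mean(sample) - mu) / (sigma / math.sqrt(n))


z_values = [z_statistic() for _ in range(10_000)]
print(f"mean of z:               {statistics.mean(z_values):.3f}")
print(f"standard deviation of z: {statistics.stdev(z_values):.3f}")
```

The simulated mean and standard deviation will not be exactly 0 and 1, but they should be close.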

Sampling Distribution of z
In summary,
The shape of the sampling distribution of z is normal if the
population is normal, or n > 30.

\[ \mu_z = 0 \]
\[ \sigma_z = 1 \]


The sampling distribution of t


Properties of the t-distribution:
It is symmetric and mound-shaped but it is not a
normal distribution.
The mean value (its center) is zero:
\[ \mu_t = 0 \]
The standard deviation of the distribution is
\[ \sigma_t = \sqrt{\frac{df}{df - 2}} \quad (df > 2) \]
The t-distribution has n - 1 degrees of freedom,
written df = n - 1.


Sampling Distribution using Simulation


We have looked at 4 sampling distributions.
These are \(\bar{x}\), \(\hat{p}\), z, and t.
To do simulations of these, use StatCrunch.
For problems involving these:
1. Give the theoretical distribution:
   a. Shape
   b. Mean
   c. Standard deviation
2. Do the simulation using technology, and make a
histogram of the distribution.
3. See how well the actual distribution compares to the theoretical one.
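The three steps above can be sketched for the t statistic in plain Python rather than StatCrunch. The example assumes a standard normal population and samples of n = 10 (so df = 9); these values are illustrative. It compares the simulated mean and standard deviation of t with the theoretical 0 and \(\sqrt{df/(df-2)}\).

```python
import math
import random
import statistics

random.seed(5)

n = 10
df = n - 1

# Step 1: theoretical mean and standard deviation of t.
theoretical_mean = 0.0
theoretical_sd = math.sqrt(df / (df - 2))


def t_statistic():
    """Draw one sample from a normal population with mean 0 and compute t."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    return statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))


# Step 2: simulate many t statistics.
t_values = [t_statistic() for _ in range(20_000)]

# Step 3: compare the simulation with the theory.
print(f"mean: simulated {statistics.mean(t_values):.3f}, theoretical {theoretical_mean}")
print(f"sd:   simulated {statistics.stdev(t_values):.3f}, theoretical {theoretical_sd:.3f}")
```

The simulated standard deviation exceeds 1, reflecting the extra variability that comes from estimating \(\sigma\) with s.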

Summary
Statistic      Shape                                        Mean                       Standard Deviation

\(\bar{x}\)    Normal if population is normal OR n > 30     \(\mu_{\bar{x}} = \mu\)    \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)

\(\hat{p}\)    Normal if np > 5 AND n(1 - p) > 5            \(\mu_{\hat{p}} = p\)      \(\sigma_{\hat{p}} = \sqrt{p(1-p)/n}\)

z              Normal if population is normal OR n > 30     \(\mu_z = 0\)              \(\sigma_z = 1\)

t              Symmetric and mound-shaped but not normal    \(\mu_t = 0\)              \(\sigma_t = \sqrt{df/(df-2)}\)