
Introduction: What is Statistics?

Definition: Statistics is the science of measurement and
decision-making under conditions of uncertainty, randomness,
and variability.

More briefly: Statistics is the field of dealing with data.
Math 321 - Dr. Minnotte

In statistics, we make observations to collect information
that helps us make decisions.

If that sounds familiar, it should. We do that sort of thing
every day, in every field of study, and in our everyday life.

In statistics, we simply formalize this process mathematically.
This allows us to recognize smaller differences than might
otherwise be found, and to make decisions under conditions of
greater uncertainty.

The term statistic is also used to describe any bit of numerical
information, like the 6.3% unemployment rate in April 2014 or
the 15,143 students enrolled at UND in Fall 2013.

These numerical bits of data are thrown at us every time we read
the newspaper, or watch TV news, or read a journal in our field.

Just as words should be read with understanding, so should
statistics. If we uncritically accept the numbers others give us,
we open ourselves to believing misinformation.

Statistics is an important tool in almost every field. In this
class, we'll look at examples like:

How can doctors tell if a new vaccine really works?
How can irrigation engineers use past river flow
rates to predict future flows?
How can polltakers use responses from a few
thousand voters to predict the results of an
election in which more than a hundred million
people vote?

What are some other examples of statistics in practice?

The Challenger Disaster: A Statistical Cautionary Tale

In 1986, a lack of statistical thinking contributed to a tragedy:
the explosion of the space shuttle Challenger.

The destruction of the Challenger killed seven astronauts,
including Christa McAuliffe, a 37-year-old teacher selected to
be the first teacher in space, and set the U.S. manned space
program back several years.

The solid rocket motors used to launch the space shuttles are
shipped to the Kennedy Space Center in four pieces. Large rubber
O-rings are used to seal the three joints between the pieces.

The Challenger explosion occurred when one of the O-rings failed
to seal quickly enough to prevent hot gases from escaping from
the rocket and igniting the large external fuel tank.

Implicated in the failure was the unusually cold (for Florida)
launch temperature of 29°F.

The night before the launch, forecasters predicted a temperature
of 31°F for the launch time.

A three-hour teleconference took place between people at:

Morton Thiokol (manufacturer of the rocket motors),
Marshall Space Flight Center (NASA center for motor design
control), and
Kennedy Space Center.

There was concern that the cold temperatures could lead to
problems with the O-rings.

In 7 out of 23 previous launches, some O-ring damage had occurred.

Some participants recommended delaying the launch until the
temperature rose above 53°F, the lowest previous launch
temperature, at which the greatest number of damaged O-rings
had occurred.

In the end, the recommendation was made to launch on schedule,
in part because of the following plot.

The plot shows temperature vs. number of damaged O-rings for the
7 affected launches.

The relationship seems limited, at most.

What error was made preparing this plot?


By only including the launches in which incidents occurred, the
investigators left out some important information!

When the data from all 23 launches are plotted, a temperature
dependence becomes obvious.
All 4 of the launches below 66°F had damage.
Only 3 out of 16 flights above that temperature suffered damage.

Note where 31°F or 29°F would appear on that plot.

More sophisticated analyses are possible, but unnecessary.

Had the concerned engineers presented the complete data in such
a format, they might well have convinced the decision-makers to
delay the launch and prevented the tragedy.

There's more to this story, so we'll return to it later in the
semester.

Chapter 1: Univariate Data

Populations and Samples


Definition: A population consists of all
potential observations from a distribution
of interest.

In an enumerative study, the population will be tangible, real
and finite, and might be represented by a sampling frame listing
the members of the population.

o  Examples include populations of people, or corporations, or
   items in a shipment.

In an analytic study, we study an ongoing process, and the
conceptual population is infinite and simply a useful theoretical
construct. No sampling frame is possible.

o  Examples include populations of rainfall over time, or objects
   coming off an ongoing assembly line, or repeated measurements
   of the same underlying weight.

As an investigator, you have a great deal of flexibility in
defining the population of interest.

Example: We are interested in the ages of UND students. What are
some possible relevant populations?

Example: A quality engineer wishes to study the volume of milk in
containers coming off a production line. What are possible
populations?

Example: We wish to examine the incidence of obesity in preteen
children. What is an appropriate population?

Once we have defined our population, we take a sample from that
population.

Measurements from each member of the sample will be the
observations which make up the dataset we will analyze.

Example: Student ages.

Experiments

Suppose that a chemical engineer wants to determine how the
concentration of a catalyst affects the yield of a process.

The engineer can run the process several times, changing the
concentration each time, and compare the yields that result.

This sort of experiment is called a controlled experiment because
the values of the concentration variable are under the control of
the experimenter.

Observational Studies

There are many situations in which scientists cannot control the
variables of interest.

Many studies have been conducted to determine the effect of
cigarette smoking on the risk of lung cancer. In these studies,
rates of cancer among smokers are compared with rates among
nonsmokers.

The experimenter cannot control who smokes and who doesn't.

This kind of study is called an observational study.

When we study a sample, we must make sure it is representative of
the population.

One option is a census, or complete enumeration, of everyone in
the population. What are some problems with this approach?

Usually, the best solution is to take a random sample, choosing
your sample with planned probability methods.
The most basic such method is called a simple random sample (SRS).
In an SRS, we draw individuals out of the population with the
equivalent of drawing names out of a (well-mixed) hat.
Each subset of the population of the appropriate size is equally
likely to make up the sample.
This is theoretically convenient, but often hard to arrange in
practice.
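The "names out of a hat" idea can be sketched in a few lines of Python. This is only an illustration: the sampling frame of ID numbers 1 to 100, the sample size, and the helper name `simple_random_sample` are all assumptions made for the example, not part of the course material.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members without replacement.

    random.sample makes every subset of size n equally likely,
    which is the defining property of an SRS.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(population), n)


# Hypothetical sampling frame: ID numbers 1 through 100.
frame = range(1, 101)
sample = simple_random_sample(frame, 10, seed=321)
print(sample)
```

The hard part in practice is usually not the draw itself but obtaining a complete sampling frame to draw from.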

When viewed in order, or over time, the observations of an SRS
should not show any noticeable pattern or trend.

An SRS is not guaranteed to reflect the population perfectly.

SRSs always differ in some ways from each other; occasionally a
sample is substantially different from the population.

This phenomenon is known as sampling variation.

The items in a sample are independent if knowing the values of
some of the items does not help to predict the values of the
others.

Items in a simple random sample may be treated as independent in
most cases encountered in practice. The exception occurs when the
population is finite and the sample comprises a large fraction
(more than 5%) of the population.

Samples of Convenience

A nonrandom sample, or sample of convenience, may be easier to
collect, but may be nonrepresentative in some important ways.

Such a sample may bias your results, making them worthless (or at
least a whole lot less trustworthy).

Example: We are interested in the size of hometowns for all U.S.
college students, but only sample at UND.

Example: We want to survey UND students on math anxiety, and pick
a class to interview:

Math 321?
Upper-division English?

Example: Not everyone will consent to test a new AIDS vaccine. We
could give those who consent the vaccine, and leave those who
don't alone to be the control group.

What about a historical control (compare vaccinated group with
past infection rates)?

Terminology and Notation

From each individual person or object in our sample, we are
generally interested only in a small number of characteristics.

Each characteristic we record will be called a variable, and
assigned a letter from the end of the alphabet.

Data that we collect may be of two main types:

1) Categorical: classifying the subject into one of several
   distinct groups.
   o  X = Sex
   o  T = Hair Color
   o  W = Zip Code

2) Numerical: data recorded as a number, where operations like
   averages make sense.
   o  Y = Age
   o  U = Rainfall
   o  Z = Volume of milk

We also classify datasets based on how many variables we measure
on each individual.

If we only collect a single variable (e.g. age), we say the
dataset is univariate.

If we collect two variables for each individual (e.g. age and
sex), we say it is bivariate.

With still more variables, we say that it is trivariate,
quadrivariate, and so on, or more commonly, that it is
multivariate.

We often use subscripts on the variable name (letter) to indicate
specific observations in a dataset, such as $X_1, X_2, \ldots, X_n$.

A subscript of i (occasionally j or k) indicates a specific, but
arbitrary, observation.

We usually reserve the label n for the number of observations
(the sample size).

There are two primary branches of statistics:

1) Descriptive statistics simply attempts to simplify and
   understand a dataset.

2) Inferential statistics attempts to say (infer) something about
   the broader population or distribution from which the data
   were drawn.

Descriptive statistics are simpler, so we'll start there.

Summary Statistics (1.2)

Given data $X_1, X_2, \ldots, X_n$, we frequently use sample
statistics to summarize the dataset.

A statistic is anything which may be calculated from a dataset. A
sample statistic simply makes clear that it derives from a sample.

Use of sample statistics can improve our understanding of the
data, as well as make it easier to communicate with others about
it.

The Sample Mean

The most important feature of a dataset to describe is generally
its location, or the location of its center.

The most commonly used statistic for center is the familiar
average, or sample mean.

Definition: The sample mean of data $X_1, X_2, \ldots, X_n$ is

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
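As a quick illustration, the sample mean (the sum of the observations divided by n) can be computed directly. The four return values below are made up for the sketch; they are not the stock data used in class.

```python
def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


# Hypothetical annual returns (percent), for illustration only.
returns = [12.0, -3.5, 20.1, 7.4]
xbar = sample_mean(returns)
print(f"n = {len(returns)}, sample mean = {xbar:.2f}")
```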

Example: Stocks:


To understand how the mean works, suppose we were to take a very
thin yardstick or similarly marked board, and place a small
(equal) weight at the mark for each observation's value.

The mean may be thought of as the point where this would balance.

Outliers

An outlier is an observation which is very different from the
rest of the sample. For univariate data, this means it is much
larger or much smaller than the rest.

Outliers should be carefully examined. Often they are the result
of measurement or recording errors.

If so, they should be fixed or deleted. Correct but unusual
values, however, should be kept.

The sample mean is not robust (resistant to outliers). Changing
even one observation can change the sample mean as much as we
want.

Example: Mistype the final stock return as 374 (instead of 37.4).
What is the sample mean now?

Measures of Variability

After center, the second-most-used feature to describe a sample
is its variability, or spread.

The simplest measure of variability is the range, the difference
between the maximum and minimum values.

\[ R = \max(X) - \min(X) \]

Unfortunately, the range both wastes most of the data, and is
maximally non-robust, using only the two extreme data points, so
it is rarely used.

A better solution looks at the deviations from the mean,
$X_i - \bar{X}$. This removes the effect of the mean (location),
and looks only at the variability around the mean.

One option: Look at the average deviation from the mean.

Problem: Positive deviations cancel out negative ones, and the
average deviation from the mean is always 0.

We could take absolute values of the deviations, but for a few
theoretical reasons, it's better to look at the squared
deviations instead.

Definition: The sample variance, $s^2$, measures the spread of a
dataset:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

Definition: The sample standard deviation, $s$, is the square
root of the sample variance.

Use of the definition formula is tedious, as it requires finding
and squaring each of the n deviations from the mean.

It is usually simpler to calculate $s^2$ using the following
computation formula.

\[ s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right) \]
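The sketch below compares the two routes to the sample variance. It assumes the usual n - 1 divisor (which is what Minitab and most software report), and the five data values are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math


def variance_definition(xs):
    """Definition formula: sum of squared deviations over (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_computation(xs):
    """Computation formula: (sum of squares - n * mean^2) over (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)


# Hypothetical data: mean 5, squared deviations 9 + 1 + 1 + 1 + 16 = 28.
data = [2.0, 4.0, 4.0, 6.0, 9.0]
s2 = variance_definition(data)
s = math.sqrt(s2)
print(s2, variance_computation(data), s)
```

The two formulas agree algebraically. On a computer the computation formula can lose precision when the mean is large relative to the spread, so the definition formula is often the safer choice in code even though it is more tedious by hand.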

Example: What are the variance and standard deviation of the
stocks data?

The sample variance and standard deviation are measures of the
spread of a dataset, and estimates of the variance and standard
deviation of the underlying population or distribution.

Like the sample mean, they are not robust.

Example: Stocks, replace 37.4 with 374:

$s^2$ = ?
$s$ = ?

While very useful practically and theoretically, the variance and
standard deviation are a little tricky intuitively.

One helpful rule of thumb:

About 2/3 of data should fall in $\bar{X} \pm s$
About 95% of data should fall in $\bar{X} \pm 2s$
Almost all data should fall in $\bar{X} \pm 3s$

Example: Stock data:

If $X_1, \ldots, X_n$ is a sample, and $Y_i = a + b X_i$, where a
and b are constants, then

\[ \bar{Y} = a + b\bar{X}, \qquad s_Y^2 = b^2 s_X^2, \qquad s_Y = |b| \, s_X \]

This is most commonly needed if we change units for our data.
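A small numerical check of the change-of-units rules: under Y = a + bX, the mean becomes a + b times the old mean and the standard deviation is multiplied by |b|. The three Celsius readings are hypothetical, and the standard-library `statistics` functions used here divide by n - 1.

```python
import math
import statistics

# Hypothetical Celsius readings, for illustration only.
celsius = [25.0, 30.0, 35.0]
a, b = 32.0, 9.0 / 5.0                 # Fahrenheit = 32 + (9/5) Celsius
fahrenheit = [a + b * x for x in celsius]

mean_c = statistics.mean(celsius)
sd_c = statistics.stdev(celsius)       # sample s.d. (n - 1 divisor)
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.stdev(fahrenheit)

# The shift a moves the mean but not the spread;
# the scale factor b multiplies both.
print(mean_f, a + b * mean_c)
print(sd_f, abs(b) * sd_c)
```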

Example: Let $X_1, \ldots, X_n$ be a sample of temperatures
measured in degrees Celsius, with $\bar{X} = 30$. Let
$Y_1, \ldots, Y_n$ be the same temperatures in degrees Fahrenheit,
$Y_i = \frac{9}{5} X_i + 32$. What is $\bar{Y}$?

Example: Let the variance of the Celsius temperatures be
$s_X^2 = 25$.

What is the standard deviation?

What is the variance of the Fahrenheit temperatures? The s.d.?

Order Statistics and Robust Measures of Center and Spread

Definition: The ith order statistic, $X_{(i)}$, is the ith
smallest value when the X's are sorted. The minimum is $X_{(1)}$,
the second smallest $X_{(2)}$, and so on up to the maximum,
$X_{(n)}$.

Example: Stock data (sorted):

$X_{(1)} = -7.2$, $X_{(4)} = 1.3$, $X_{(20)} = 37.4$, and so on.

Because outliers will always be in the first or last few order
statistics, values computed from middle order statistics will be
very robust.

Definition: The sample median is the middle of the sorted data.

If n is odd, the sample median is the ((n+1)/2)th order
statistic.

If n is even, it is the average of the (n/2)th and ((n+2)/2)th
order statistics.

Example: Stocks: sample median = ?

The sample median has 50% of the data on either side of it.

The sample median is very robust; changing one or a few
observations won't change it much, if at all.

Example: Stocks: Replace 37.4 with 374, and the sample median
remains 17.6.
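The contrast between the mean and the median under a single typing error is easy to see numerically. The five values below are hypothetical (not the class stock data); only the last one is mistyped, mimicking the 37.4 versus 374 example.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [2.0, 4.0, 5.0, 7.0, 37.4]
typo = data[:-1] + [374.0]          # last value mistyped: 374 instead of 37.4

for label, xs in (("correct", data), ("with typo", typo)):
    print(f"{label:10s} mean = {statistics.mean(xs):7.2f}  "
          f"median = {statistics.median(xs):5.2f}")
```

One bad value drags the mean far from the bulk of the data, while the median (a middle order statistic) does not move.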

Quartiles

The quartiles of the data divide the sample into quarters.

The first quartile, Q1, splits the lowest quarter of the sample
from the rest.

If (n+1)/4 is an integer, Q1 is the ((n+1)/4)th order statistic.
If (n+1)/4 is not an integer, Q1 is the average of the two order
statistics on either side.

The third quartile, Q3, splits the highest quarter from the rest.

Find it as Q1, but using 3(n+1)/4.

Example: Sorted stocks:

Q1 = ?
Q3 = ?

Definition: The sample interquartile range is a robust measure of
spread, found as the difference between the sample quartiles,
IQR = Q3 - Q1.

Example: Stocks: IQR = ?

Note: Changing 37.4 to 374 doesn't change Q1, Q3, or IQR.

Percentiles

Definition: The pth sample percentile has (roughly) p% of the
data below it, and (100-p)% above it.

Compute p(n + 1)/100. If this is an integer, use that order
statistic. If not, average the two closest order statistics.

The median and quartiles are just special names for the 50th,
25th, and 75th percentiles.
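The p(n + 1)/100 rule can be written as a short function. This is a sketch of the hand-calculation rule only; the function name and the seven data values are made up for illustration, and statistical software generally interpolates between order statistics instead of averaging them, so its answers can differ slightly.

```python
import math


def sample_percentile(xs, p):
    """pth sample percentile using the p(n + 1)/100 rule.

    If the position is an integer, return that order statistic;
    otherwise average the order statistics on either side of it.
    """
    s = sorted(xs)
    n = len(s)
    pos = p * (n + 1) / 100
    if pos < 1 or pos > n:
        raise ValueError("percentile position falls outside the sample")
    lo, hi = math.floor(pos), math.ceil(pos)   # equal when pos is an integer
    return (s[lo - 1] + s[hi - 1]) / 2         # order statistics are 1-based


# Hypothetical data with n = 7, so (n + 1)/4 = 2 is an integer.
data = [3, 1, 4, 1, 5, 9, 2]
q1 = sample_percentile(data, 25)
median = sample_percentile(data, 50)
q3 = sample_percentile(data, 75)
print(q1, median, q3, "IQR =", q3 - q1)
```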

Example: Descriptive Statistics in Minitab

Descriptive Statistics: Stock Returns 1976-1995

Variable           Mean  StDev  Variance  Minimum    Q1  Median     Q3  Maximum
Stock Returns 19  15.37  13.66    186.49    -7.20  5.48   17.60  28.90    37.40

Variable            IQR
Stock Returns 19  23.43

Basic Statistical Graphics (1.3)

Some of the most powerful tools available for understanding a
dataset are graphics which we can use to look at our data.

It's very hard to get much useful out of large tables or long
columns of numbers. But the human eye is very good at picking out
patterns in pictures.

Bar Charts

Given categorical data, the most useful plot available is usually
a simple bar chart.

A bar is drawn for each category, with the height proportional to
the count (frequency) or percentage found in that category.

Other measurements for each category may also be compared.

Example: Television Picture Grades

Perfect, Good, Satisfactory, Fail

Category      Count
Perfect          64
Good             47
Satisfactory     33
Fail              6
Total           150
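As a rough sketch, the counts in the table can be turned into percentages and a crude text bar chart. The counts come from the table above; the scale of one `#` per two observations is an arbitrary choice for the illustration.

```python
# Counts from the television picture grades table.
counts = {"Perfect": 64, "Good": 47, "Satisfactory": 33, "Fail": 6}
total = sum(counts.values())

# Percentage of observations in each category.
percent = {grade: 100 * c / total for grade, c in counts.items()}

# Bars start at 0 and keep the natural ordering of the grades.
for grade, c in counts.items():
    bar = "#" * (c // 2)               # one '#' per two observations
    print(f"{grade:12s} {c:3d} {percent[grade]:5.1f}%  {bar}")
```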

Spaces between the bars show distinct categories.

Bars should start at 0 and show full height (no truncation!).
Otherwise, relative heights get distorted.

Unless there is a strong natural ordering (e.g.
poor-fair-good-excellent; not alphabetical), bars should be
sorted in ascending or descending order. This makes comparisons
between close values much easier.

Many categories or long category names may be better served by
horizontal bars.

3-D perspective looks fancy but hurts clarity; usually a bad idea.

A stacked bar chart includes a second categorical variable, but
focuses on the totals for the main category of the bars.

[Figure: stacked bar chart, "Individuals on the Titanic". Bars:
1st Class, 2nd Class, 3rd Class, Crew. Segments: Survived, Died.
Vertical axis: count, 0 to 1000.]

A clustered bar chart focuses on the counts of the specific
combinations of categories, and is useful for comparing the
distribution of one variable for different values of the other.

[Figure: clustered bar chart of the Titanic data. Groups: 1st
Class, 2nd Class, 3rd Class, Crew. Bars within each group: Died,
Survived. Vertical axis: count, 0 to 800.]

Example Minitab Bar Charts


Pie Charts

The other common chart for categorical data.

A pie chart should only be used when the categories represent
(all of the) parts of some whole, and so should always plot
percentages.

Each category's slice gets an angle equal to 360° times the
proportion of observations in that category.

Comparing angles is much more difficult than comparing heights or
lengths. Bar charts are almost always more effective.

3-D pie charts are the work of the devil. (Probably worse than no
chart.)

Minitab:


Dotplots

Dotplots are simple plots which are very useful for looking at
univariate numeric data, especially when the sample size is small
or there are many ties in the data.

Each observation is plotted at its location above an appropriate
number line. If there are ties, one dot is stacked for each tied
observation.

Example: Temperature (°F) at launch of the first 25 space shuttle
launches.

66  70  69  80  68
67  72  73  70  57
63  78  70  67  53
75  67  70  81  76
79  75  76  58  31
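A dotplot of these 25 launch temperatures can be approximated in plain text, with the number line running down the page and one `*` per observation. The temperatures are the ones listed above; the sideways layout is just a convenience for the sketch.

```python
from collections import Counter

# Launch temperatures (degrees F) from the example above.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]

counts = Counter(temps)

# One row per degree so that gaps in the data stay visible.
for t in range(min(temps), max(temps) + 1):
    print(f"{t:3d} | " + "*" * counts[t])
```

Printing every degree, including the empty ones, is what makes the isolated value at 31 stand out from the rest.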

Histograms

A histogram is a bar chart for numerical data.

The shape of the histogram describes the shape of the
distribution of the data.

If you have a large, randomly collected sample, the shape is also
descriptive of the population the sample was taken from.

Your book also describes stem-and-leaf plots, which are similar,
but rarely used.

Constructing a Histogram

1) Find the minimum and maximum of the data.

2) Break that interval into class intervals.
   o  5-20 classes is often a good start. More for large samples,
      fewer for small ones.
   o  A reasonable rule of thumb is
   o  Select your classes so that each is of equal width.

3) Find the frequencies (counts, $n_i$) and relative frequencies
   ($f_i = n_i/n$) in each class.

4) Plot the bar chart with a bar over each class whose height
   equals $f_i$ or $n_i$.
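The four construction steps can be sketched as a small function that returns the class counts and relative frequencies. The data, the choice of three classes, and the convention that the maximum goes in the last class are all assumptions made for this illustration.

```python
def histogram_counts(xs, num_classes):
    """Counts and relative frequencies for equal-width classes.

    Classes span [min, max]; each class includes its left endpoint,
    and the last class also includes the maximum.
    Assumes the data are not all equal (so the width is positive).
    """
    lo, hi = min(xs), max(xs)                  # step 1
    width = (hi - lo) / num_classes            # step 2: equal widths
    counts = [0] * num_classes
    for x in xs:                               # step 3: frequencies
        k = min(int((x - lo) / width), num_classes - 1)
        counts[k] += 1
    rel = [c / len(xs) for c in counts]        # relative frequencies
    return counts, rel


# Hypothetical data, for illustration only.
data = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]
counts, rel = histogram_counts(data, 3)
for k, (c, f) in enumerate(zip(counts, rel)):  # step 4: crude text plot
    print(f"class {k + 1}: n_i = {c}, f_i = {f:.1f}  " + "#" * c)
```

Rerunning with a different number of classes is a quick way to see how much the picture depends on that choice.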

Example: Stock Data (Annual Rate of Return, 1976-1995):

The shape of the histogram tells us about the distribution. Some
things to look for include:

Is the distribution left-skewed? Symmetric? Right-skewed?

Is the distribution bimodal? Multimodal?

Are there any outliers?

It's a good idea to look at several choices of bin width and
location, as different choices here can produce dramatically
different histograms.

Features that remain in many histograms are likely to be
trustworthy; those that only appear sometimes are less certain.

Example: Milk Fill Weights Data


Boxplots

Definition: A boxplot is another graphical tool for displaying a
sample:

The box goes from the first to the third quartile, with a line at
the median.

For boxplots, outliers are usually defined as any values below
Q1 - 1.5 IQR or above Q3 + 1.5 IQR.

Those points are marked individually.

The whiskers go from the quartiles to the least and greatest
values among the non-outliers.
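The outlier cutoffs are simple to compute once the quartiles are known. The sketch below plugs in the quartiles, minimum, and maximum reported in the Minitab summary of the stock returns earlier in these notes; the function name `boxplot_fences` is just a label chosen for the example.

```python
def boxplot_fences(q1, q3):
    """Return (lower, upper) cutoffs: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Quartiles and extremes from the Minitab stock-returns summary.
q1, q3 = 5.48, 28.90
minimum, maximum = -7.20, 37.40

lower, upper = boxplot_fences(q1, q3)
print(f"fences: ({lower:.2f}, {upper:.2f})")
print("minimum flagged:", minimum < lower)
print("maximum flagged:", maximum > upper)
print("mistyped 374 flagged:", 374 > upper)
```

With these values neither extreme is flagged, so the whiskers would run all the way to the minimum and maximum, while the mistyped value 374 would be marked individually.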

Boxplots are much less informative than histograms for a single
distribution, so the histogram is usually preferable.

On the other hand, comparing histograms is difficult, while
comparing boxplots is easy.

Use boxplots to compare 2-20 (or more) distributions.

Example: Fish length data


Example: Circuit board data by board.


Ch. 2: Bivariate Data

Statistics is most powerful when looking at relationships between
variables.

In the simplest case, this involves looking at pairs of
measurements made on the same subjects, (x, y).

Recall that such data are called bivariate (two variables).

Examples:

Heights and weights of a group of people.
ACT score and Freshman GPA for college students.
January and April average temperatures for many years at a
specified location.
January and February inflows of the Nile river at a location.

We usually picture our variables in a cause-and-effect
relationship.

The explanatory (independent, predictor) variable, x, is assumed
to play some role in determining the value of the response
(dependent) variable, y.

Scatterplots (2.1)

Definition: A scatterplot is the most common graph for displaying
bivariate data. It consists of plotting each point at
$(x_i, y_i)$, on a standard x-y graph.

The pattern formed by the points describes the relationship
between the variables.

Minitab Scatterplot:


Correlation

Suppose we have a sample of (x, y) pairs and compute the sample
means, $\bar{x}$ and $\bar{y}$.

For each observation $(x_i, y_i)$, compute the product of the two
deviations from the means, $(x_i - \bar{x})(y_i - \bar{y})$.

Dividing the scatterplot at the means results in two quadrants
where the product is positive, and two where it is negative.

For a scatterplot with a positive relationship, most of the
products will have a positive sign, and the sum will be positive.

Likewise, if the picture shows a negative relationship, the sum
of the products will be negative.

Unfortunately, the exact value of the sum depends on the units
and spread (as measured by standard deviation) of the variables.

Dividing by measures of spread for x and y solves this issue.

Then

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]

is a good, unitless measure of the linear relationship between x
and y called the correlation coefficient.

Example: Nile flow data: n=115

What is r?
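The correlation coefficient can be computed from the sums of products and squares of the deviations. The sketch assumes the standard definition with sample standard deviations (n - 1 divisor), in which case the n - 1 factors cancel and only the three sums are needed. The (x, y) values are hypothetical, not the Nile data.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r of paired data."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
r_rescaled = correlation([10 * x + 3 for x in xs], ys)  # change units of x
r_swapped = correlation(ys, xs)                         # swap the labels
print(r, r_rescaled, r_swapped)
```

Changing the units of x or swapping which variable is called x leaves r unchanged, since r is unitless and symmetric in the two variables.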

Properties of r

1. The value of r does not depend on the units of x or y. We will
   not change r if we multiply all x's, all y's, or both by a
   positive constant or if we add any constant to all x's, all
   y's, or both.

2. The value of r does not depend on which variable is labeled x.

3. Correlation is always between -1 and +1.

4. The sign of r shows whether the relationship between x and y
   is positive or negative.

Properties of r (continued)

5. The absolute value of r measures the strength of the linear
   relationship between x and y. Roughly speaking:
   a. If |r| < 0.5, the relationship (if any) is weak.
   b. If 0.5 < |r| < 0.8, the association is moderate.
   c. If 0.8 < |r| < 1.0, the association is strong.
   d. If |r| = 1.0, the association is perfect. This occurs only
      when all (x, y) points fall in a perfect line.

Note that strength is often context- and discipline-dependent. An
engineer might find any correlation less than .95 to be weak,
while a social scientist might find a correlation of .3 to be
very strong.

Properties of r (continued)

6. The correlation coefficient cannot measure the strength of a
   nonlinear (curved) relationship.

7. Outliers can also lead to an inappropriate value in either
   direction!

High correlation indicates strong association, not necessarily
causality.

If |r| is large, there are at least 3 possible explanations:

1) x determines y
2) y determines x
3) Some third value, z, (called a confounding factor) determines
   both x and y.

Example: Weekly surveys show that per capita chocolate
consumption is strongly correlated with traffic fatalities.

Should driving under the influence of chocolate be outlawed?
Do people eat a lot of chocolate at funerals?
Is there a third explanation that makes more sense?

Example: Over time, ministers' salaries in Massachusetts are
strongly correlated with the price of rum in Havana. What is the
causal relationship here?

Example: Children's shoe size is correlated with size of
vocabulary. What is the causal relationship?

One advantage of well-designed randomized, controlled experiments
is that potential confounding factors should be (roughly)
balanced between levels of the independent variable we are
investigating, and so should be much less likely to produce a
spurious correlation.

Linear Regression (2.2-2.3)

Definition: Regression involves modeling and predicting the
values of one response variable, based on the observed values of
one or more other explanatory variables.

We'll focus on the case of simple linear regression, where a
straight line is fit to a scatterplot of x and y.

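We want an equation for a line of the form

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

The most common way to estimate $\beta_0$ and $\beta_1$ uses the
least squares fit, minimizing

\[ \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \]

This leads to the least squares estimates,

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]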
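The least squares slope and intercept can be computed from the same deviation sums used for correlation. This sketch assumes the standard formulas (slope = sum of products of deviations divided by the sum of squared x deviations; intercept = mean of y minus slope times mean of x). The data are hypothetical and lie exactly on a line so the answer is easy to verify.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = least_squares(xs, ys)
prediction = b0 + b1 * 2.5             # predict y at x = 2.5
print(b0, b1, prediction)
```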

[Figure: Deviations from a potential regression line.]

[Figure: The least squares line best fits the scatter plot.]

Example: Nile flow data

What is the least-squares line for these data, and what should we
predict the flow for February to be if January's was 3?

What would we predict for February from a January value of 10?

Is this likely to be a valid prediction? (Recall, January's mean
is about 4, and its standard deviation is about 1.)

Extrapolation outside the range of the data is dangerous.

Residuals and Goodness-of-Fit

Definition: Given a data set $(x_i, y_i)$ and an associated
fitted regression model, the fitted value for observation i is

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]

Definition: The residual for i is

\[ e_i = y_i - \hat{y}_i \]

The smaller the residuals, the better x and the regression line
are at predicting y.

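The error sum of squares (SSE) is

\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

where $\hat{y}_i$ is the fitted value for observation i.

SSE is usually compared to the total sum of squares, SST:

\[ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]

and the regression sum of squares, SSR:

\[ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \]

To avoid having to calculate all the residuals, we may use the
computing formula:

SSE = SST - SSR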

The coefficient of determination, r², measures the proportion of the total variation of y which is explained by x:

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$

The closer r² is to 1, the more successful the relationship is at explaining the variation in y.

As the notation suggests, the coefficient of determination is the square of the correlation coefficient.

Example: Nile flow data:

Find SST, SSR, SSE, and r².

What do these say about our predictions?

Note: r = 0.933.

The coefficient of determination r² is found as R-Sq in Minitab output.

The sums of squares may be found in the SS column of the Analysis of Variance table.

The regression equation is
February Inflow = - 0.4698 + 0.8362 January Inflow

S = 0.330519   R-Sq = 87.1%   R-Sq(adj) = 87.0%

Analysis of Variance
Source        DF       SS       MS       F      P
Regression     1  83.3794  83.3794  763.25  0.000
Error        113  12.3444   0.1092
Total        114  95.7238
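The link between the Analysis of Variance table and r² can be checked with a few lines of arithmetic. This is only a sketch using the three SS values printed in the Minitab output; the variable names are illustrative.

```python
import math

# Sums of squares read from the SS column of the Analysis of Variance table.
SSR = 83.3794   # regression sum of squares
SSE = 12.3444   # error sum of squares
SST = 95.7238   # total sum of squares

# Computing formula: SSE = SST - SSR.
sse_from_formula = SST - SSR

# Coefficient of determination: proportion of variation in y explained by x.
r_squared = SSR / SST

# The fitted slope is positive, so r is the positive square root of r^2.
r = math.sqrt(r_squared)

print(round(sse_from_formula, 4), round(r_squared, 3), round(r, 3))
```

The rounded results agree with the R-Sq value of 87.1% and with the correlation r = 0.933 noted for this example.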


Chapter 3: Probability

Definition: Probability is the branch of


mathematics dealing with chance,
randomness, and uncertainty.

Probability provides most of the


mathematical foundation for inferential
statistics.


Definition: A situation for which the


outcome cannot be determined in advance
is called an experiment.

Examples:

The roll of a die.


The draw of a card.
The lifetime of an electronic component.


Definition: The sample space, S, of an


experiment is the set of all possible
outcomes.

Examples:


Die: S = {1, 2, 3, 4, 5, 6}
Card: S = ?
Component: S = ?


An experiment with several steps can be


visually represented by a tree diagram:

Example: Toss a coin three times:


Events
Definition: Set A is a subset of set B (A ⊂ B) if every element of A is also in B.

Example: S = {1, 2, 3, 4, 5, 6}

A = {1, 3, 5} ⊂ S
B = {1, 2, 6, 7} ⊄ S

Every set is a subset of itself.

The empty set, ∅, consisting of no elements, is a subset of every set.

Definition: Any interesting subset of the


sample space can be called an event.

Examples:


Die: A = odd numbers = {1, 3, 5}


Card: B = ?
Component: C = ?

The individual outcomes which make up S


are sometimes called simple events.


Combining Events

For subsets A and B of S (A ⊂ S, B ⊂ S):

1) The union of A and B (A ∪ B) is the set consisting of all elements found in A, B, or both.

Keyword: "or"

Example: S = {1, 2, 3, 4, 5, 6}

A = {1, 3, 5} ⊂ S
B = {1, 2, 3} ⊂ S
A ∪ B = ?

2) The intersection of A and B (A ∩ B) is the set consisting of all elements found in both A and B.

Keywords: "and", "both"

Example: S = {1, 2, 3, 4, 5, 6}

A = {1, 3, 5}
B = {1, 2, 3}
A ∩ B = ?


3) The complement of A (A^c) is the set consisting of all elements of S not found in A.

Keyword: "not"

Example: S = {1, 2, 3, 4, 5, 6}

A = {1, 3, 5}
A^c = ?


4) Sets A and B are said to be mutually exclusive if there are no elements in both A and B. That is, if A ∩ B = ∅ (the empty set).

Example: S = {1, 2, 3, 4, 5, 6}

A = {1, 3, 5}
C = {4, 6}
A ∩ C = ∅, so A and C are mutually exclusive.
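The four ways of combining events map directly onto Python's built-in set type. The sketch below uses the sets from the die examples; the variable names are illustrative.

```python
# Sample space for one roll of a die, and the events used in the examples.
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {1, 2, 3}
C = {4, 6}

union = A | B            # "A or B": elements in A, B, or both
intersection = A & B     # "A and B": elements in both
complement_A = S - A     # "not A": elements of S not in A

# Mutually exclusive events share no outcomes.
a_c_exclusive = A.isdisjoint(C)

print(union, intersection, complement_A, a_c_exclusive)
```

Note that the complement is always taken relative to the sample space S, which is why it is written as a set difference here.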

Example: Three coin tosses.

S={HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let A = First toss is a head = ?

Let B = Last toss is a head = ?

What simple events make up the event A and


B?

A or B?

Not A?

Are A and B mutually exclusive?



The Axioms of Probability


Definition: A probability function P(·) is a function from subsets of S (events) to the real numbers which satisfies the following axioms of probability:

1) P(S) = 1.
2) 0 ≤ P(A) ≤ 1 for all events A.
3) If A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B).

Example: A fair die.

P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6,


P(5) = 1/6, P(6) = 1/6.

Probabilities of bigger events are found by


axiom 3:

P({1,3}) = P(1) + P(3) = 1/6 + 1/6 = 2/6 = 1/3


P({1,3,5}) = ?


Example: A biased die.

P(1) = 1/12, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6,


P(5) = 1/6, P(6) = 3/12 = 1/4.

Note:

(as required by axiom 2)

P({1,3}) = P(1) + P(3) = 1/12 + 1/6 = 1/4


P({1,3,5}) = ?


When applied to real experiments,


probability measures (long-term)
likelihood: if the experiment is repeated
many times, event A should occur roughly
P(A) fraction of the time.
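The long-run interpretation can be illustrated with a small simulation. This is a sketch only: it assumes a fair die simulated with Python's pseudo-random generator, and the function name and seed are arbitrary choices.

```python
import random


def relative_frequency(event, n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return the fraction of rolls in `event`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) in event)
    return hits / n_rolls


A = {1, 3, 5}  # odd roll; P(A) = 1/2 for a fair die
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(A, n))
```

As the number of rolls grows, the printed fractions should settle near P(A) = 0.5, although any single run will differ slightly.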


Additional Properties of Probability

The axioms of probability imply some


additional properties:

1) For any event A, P(A^c) = 1 − P(A).

This is sometimes called the complementary events rule, or the opposites rule.

Show:
Note: Since S^c = ∅, P(∅) = 0.

2) For any events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

This is sometimes called the general addition rule.

Show:
Note: if A and B are mutually exclusive, P(A ∩ B) = P(∅) = 0, so this is the same as axiom 3.


Example: A fair die.

P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6, P(5) = 1/6, P(6) = 1/6.
A = {1, 3, 5}, P(A) = 3/6 = 1/2.
B = {1, 2}, P(B) = 2/6 = 1/3.
P(A^c) = ?
A ∩ B = {1}, P(A ∩ B) = 1/6.
P(A ∪ B) = ?


We don't need to know the entire probability function to use these.

Example: Lifetime of a component (T). Suppose we know:

P(A) = P(T ≤ 60) = .47
P(B) = P(40 ≤ T ≤ 80) = .34
P(A ∩ B) = P(40 ≤ T ≤ 60) = .26

Then:

P(T > 60) = ?
P(lifetime no more than 80) = ?

Example: Suppose the probability that an


integrated circuit chip has defective
etching is 0.12. The probability that the
chip has a crack defect is 0.29. And the
probability of both defects is 0.07.

What is the probability the chip does not


have defective etching?

What is the probability it has at least one


defect?

What is the probability it has neither


defect?

Equally Likely Outcomes

If S consists of N equally likely outcomes,


and event A consists of k of them,
P(A) = k/N.

Example: A fair die (see slides 148, 153).

Example: Draw a card at random from a


standard deck (52 cards, 13 spades). What
is the probability of drawing a spade?

Example: A shipment of 1000 hard drives


contains 6 which do not work. If we draw one
at random, what is the probability of selecting
a defective drive?

Conditional Probability (3.2)

Suppose we have partial information about


the outcome of an experiment. In
particular, suppose we know that the event
B has occurred.

We may use this information to revise the


probability of another event, A.

We call the revised probability a


conditional probability, as it depends on
the condition of B being true.

Example: Fair die. Let

A = {1, 3, 5}
P(A) = 3/6 = 1/2
B = {1, 2, 3}
P(B) = 3/6 = 1/2
P(A ∩ B) = P({1, 3}) = 2/6 = 1/3
If I roll the die and, without showing you, tell
you event B has occurred (I rolled no greater
than 3), now what is the probability of event
A?


Since B has occurred, the sample space


reduces to B: {1, 2, 3}.

Two of the three possibilities are odd (in


A), and the chances are still equal. So
P(A|B) = 2/3.

Once we know the roll is 3 or less, the probability that it's odd increases from 1/2 to 2/3.

Definition: The conditional probability of A given B is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

(undefined if P(B) = 0).

This is the probability, given that event B has occurred, that event A has also occurred.

Die: P(A|B) = P(A ∩ B) / P(B) = (1/3) / (1/2) = 2/3.

Example (continued from slide 155):

P(defective etching) = 0.12.


P(crack defect) = 0.29.
P(etching and crack defects) = 0.07.

If a chip has a crack defect, what is the


(conditional) probability that it also has
defective etching?


What is the probability that a chip has a crack


defect but satisfactory etching?

If a chip has a crack defect, what is the


probability that it has satisfactory etching?

Note: P(A|B) = 1 − P(A^c|B), just like P(A) = 1 − P(A^c).

If a chip has defective etching, what is the


probability that it also has a crack defect?

Note: In general, P(A|B) and P(B|A) are different quantities; knowing one does not by itself tell us the other.
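The chip questions can be worked with exact fractions as a check on hand calculations. The sketch below uses only the three probabilities given in the example, together with the complement rule, the general addition rule, and the definition P(A|B) = P(A and B) / P(B); the variable names are illustrative.

```python
from fractions import Fraction

p_etch = Fraction(12, 100)    # P(defective etching)
p_crack = Fraction(29, 100)   # P(crack defect)
p_both = Fraction(7, 100)     # P(etching and crack defects)

# Complement rule and general addition rule.
p_no_etch = 1 - p_etch
p_at_least_one = p_etch + p_crack - p_both
p_neither = 1 - p_at_least_one

# Conditional probabilities: P(A|B) = P(A and B) / P(B).
p_etch_given_crack = p_both / p_crack
p_crack_given_etch = p_both / p_etch

# Crack defect but satisfactory etching, and the same event given a crack.
p_crack_only = p_crack - p_both
p_ok_etch_given_crack = p_crack_only / p_crack

print(p_etch_given_crack, p_crack_given_etch, p_ok_etch_given_crack)
```

The two conditional probabilities 7/29 and 7/12 share a numerator but have different denominators, which is why swapping the roles of A and B changes the answer.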


Independence

Definition: If P(A ∩ B) = P(A) P(B), we say A and B are independent.

If A and B are independent, P(A) > 0, P(B) > 0, then

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A).$$

Likewise, P(B|A) = P(B). Your book uses this as the definition of independence.

Assuming P(A) > 0, P(B) > 0, any one of

P(A ∩ B) = P(A) P(B)
P(A|B) = P(A)
P(B|A) = P(B)

proves independence and the other two.


Example: Draw one card at random from a


well-shuffled deck. Define:

A = {draw a club}
B = {draw an ace}
C = {draw a red card}

Are A and B independent? A and C?
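One way to check answers to this example is to enumerate a 52-card deck and test the product rule P(A and B) = P(A) P(B) directly. This is a sketch under the assumption of a standard deck with equally likely cards; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """Probability of an event, given as a function card -> bool."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_club(card):
    return card[1] == "clubs"


def is_ace(card):
    return card[0] == "A"


def is_red(card):
    return card[1] in ("diamonds", "hearts")


def independent(e1, e2):
    """True if P(e1 and e2) equals P(e1) * P(e2) exactly."""
    both = prob(lambda card: e1(card) and e2(card))
    return both == prob(e1) * prob(e2)


print(independent(is_club, is_ace), independent(is_club, is_red))
```

Clubs and red cards cannot occur together, so that pair also illustrates that mutually exclusive events with positive probability are not independent.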


Note that events being mutually exclusive


and their being independent is not the
same thing.

Show: If P(A) > 0, P(B) > 0, and A and B


are mutually exclusive, they cannot be
independent!


We'll often assume independence to calculate probabilities of intersections.

Example: Roll a red die and a black die.

A = {red 6}, P(A) = 1/6
B = {black 6}, P(B) = 1/6 (fair dice)

Results on one die shouldn't influence the other, so we assume independence.

P(double-sixes) = P(A ∩ B) = P(A) P(B) = (1/6)(1/6) = 1/36.

This extends to more than 2 events.

The multiplication law for independent events says that if events A1, A2, …, An are independent (that is, knowledge of any combination of the Ai's does not change the probabilities of the remainder), then

P(A1 ∩ A2 ∩ … ∩ An) = P(A1) P(A2) … P(An).

Note: this is the probability that all n events occur.

Example: Flip a fair coin 4 times.

Let Ai = {Flip i is a head}.


P(Ai) = 1/2,
i = 1, 2, 3, 4
Separate flips are independent. (Why?)
P(4 heads) = P(A1 ∩ A2 ∩ A3 ∩ A4)
= P(A1) P(A2) P(A3) P(A4)
= (1/2) (1/2) (1/2) (1/2)
= 1/16.


Example: Draw a card from a standard


deck 3 times with replacement (replace
and reshuffle after each draw).

Let Ai = {Draw i is a spade}.


P(Ai) = 13/52 = 1/4, i = 1, 2, 3
Separate draws are independent. (Why?)
P(3 spades) = ?


What if events aren't independent?

Recall,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Therefore, P(A ∩ B) = P(A|B) P(B).

The general multiplication law:

P(A1 and A2) = P(A1) P(A2|A1).



Example: Suppose we have 4 cards,


labeled 1, 2, 3, and 4. Suppose we
draw two at random without replacement.
What is the probability both cards are
odd?


Example: Suppose we draw two cards at


random without replacement from a
standard deck. What is the probability
both cards are spades?
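Both without-replacement examples follow the same pattern, P(A1 and A2) = P(A1) P(A2|A1). The sketch below applies that law with exact fractions and, for the four-card case, cross-checks it by listing every ordered pair of draws; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Four cards labeled 1-4, two drawn without replacement.
# P(both odd) = P(first odd) * P(second odd | first odd).
p_both_odd = Fraction(2, 4) * Fraction(1, 3)

# Brute-force check: enumerate all ordered draws of two distinct cards.
draws = list(permutations([1, 2, 3, 4], 2))
p_both_odd_enum = Fraction(
    sum(1 for a, b in draws if a % 2 == 1 and b % 2 == 1), len(draws)
)

# Standard deck: 13 spades among 52 cards, then 12 among the remaining 51.
p_both_spades = Fraction(13, 52) * Fraction(12, 51)

print(p_both_odd, p_both_odd_enum, p_both_spades)
```

The second factor in each product is a conditional probability: after one odd card (or one spade) is removed, both the count of favorable cards and the deck size shrink by one.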


Random Variables (3.3)

Definition: A random variable is a random


number. It is obtained by assigning a
number to each outcome of an
experiment.

Example: Roll a die. The number rolled is


a random variable.


Example: Flip a coin 5 times. Is the


sequence of heads and tails a random
variable (Example: HHTHT)?

Some random variables we could


generate from 5 coin flips:

X = # H
Y = # H − # T
Z = # H before first T

We usually denote random variables by


capital letters from the end of the alphabet.


Example: Select a rat at random from a


large colony. What are some possible
random variables?


There are two main types of random


variables: discrete and continuous.

Definition: A discrete random variable can


only take on a specified (countable) list of
values. There is a gap between any two
elements in its sample space.

In practice, these are usually counts of some


sort, and thus whole numbers.

Example: Number of heads in 5 coin flips.



Definition: A continuous random variable


may take any real number in some (set of)
interval(s).

Examples: Weight, lifetime.

We will need to deal differently with


discrete and continuous random variables.

Discrete Random Variables


Definition: The probability mass function
(p.m.f.) of a discrete random variable X is
a function p() from the support of X to the
real numbers, where
p(x) = P(X = x) .

Notation:

X: capital letter, indicates a random variable.


x: lowercase letter, indicates a specific value.

Example: Let X be the roll of a fair die.


S = {1, 2, 3, 4, 5, 6}
p(1) = P(X = 1) = 1/6
p(2) = P(X = 2) = 1/6
and so on.

We might write
p(x) = 1/6, x ∈ {1, 2, 3, 4, 5, 6}


Example: An industrial plant has 3 machines. The probability that X are operating at a given random time may be found from

x:     0     1     2     3
p(x):  0.12  0.27  0.46  0.15


The laws of probability tell us that:

1) ? ≤ p(x) ≤ ? for all p(x)

2) Σ_{x ∈ S} p(x) = ?


A p.m.f. is plotted as spikes:


Or as a probability histogram, with areas


equal to probabilities:


Continuous Random Variables

Recall, a continuous random variable may


take any value in some real interval.

Continuous random variables are typically


measurements (length, weight, lifetime,
etc.).


With continuous random variables, we can't use a p.m.f. to find probabilities. Instead:

Definition: A probability density function (density, p.d.f.), f(x), is a function which determines the probability properties of a continuous random variable. If X ~ f(x), then

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$


If f(x) is a p.d.f.:

f(x) ≥ ? for all x, and

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Note: for a continuous random variable,

$$P(X = a) = 0 \text{ for any single value } a.$$

Why?

Example: a continuous random variable


has p.d.f.

Is f(x) a true p.d.f.?


Example (continued): What is the


probability that X will be between 0.5 and
1.0?

P(2.5 ≤ X ≤ 3.0) = ?

P(0.2 ≤ X ≤ 0.2) = ?

P(X < 1.0) = ?


Definition: The cumulative distribution function (c.d.f.), F(x), of a random variable is defined as
F(x) = P(X ≤ x).

If X is continuous,

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$


Properties of continuous c.d.f.s:

1) lim_{x→−∞} F(x) = 0
2) lim_{x→∞} F(x) = 1
3) F is nondecreasing (if x < y, F(x) ≤ F(y)).
4) P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).
This is often easier than integrating f(x).
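To see property 4 in action, the sketch below compares numerical integration of a density with a difference of c.d.f. values. The density f(x) = x/2 on [0, 2] is a hypothetical choice made only for illustration (it is not the density from the example in these notes), and the midpoint-rule integrator is a simple stand-in for any numerical method.

```python
def f(x):
    """Hypothetical density (an assumption for illustration): x/2 on [0, 2]."""
    return x / 2 if 0 <= x <= 2 else 0.0


def F(x):
    """c.d.f. for the density above: 0 below 0, x^2/4 on [0, 2], 1 above 2."""
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x * x / 4


def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h


a, b = 0.5, 1.0
by_integration = integrate(f, a, b)   # P(a <= X <= b) from the p.d.f.
by_cdf = F(b) - F(a)                  # the same probability from the c.d.f.
print(by_integration, by_cdf)
```

Once F is known in closed form, each interval probability costs two function evaluations instead of a fresh integral.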

Example (back to earlier p.d.f.):

P(0.5 ≤ X ≤ 1.0) = ?

(Compare to slide 192.)



The Population Mean


Definition: The population mean (expectation, expected value) of random variable X is

$$\mu = E(X) = \sum_x x\,p(x)$$

if X is discrete, and

$$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

if X is continuous.
It can be thought of as the long-term average of X, or the mean of a sample that follows the distribution of X perfectly.


Example: Die roll

p(x) = 1/6, x ∈ {1, 2, …, 6}
μ = ?

Example: Machines

x:     0     1     2     3
p(x):  0.12  0.27  0.46  0.15

μ = ?


Example:

=?

Example:

=?


Expectations of Functions of
Random Variables

Given a random variable, X, suppose we are really interested in a function, h(X). The expected value of h(X) is

$$E[h(X)] = \sum_x h(x)\,p(x)$$

if X is discrete, and

$$E[h(X)] = \int_{-\infty}^{\infty} h(x)\,f(x)\,dx$$

if X is continuous.

Example: X ~ p(x) = , x = 1, 2.
What is E(X²)?

Note: In general, E[h(X)] ≠ h[E(X)].

Example: For the above p.m.f., what is E(X)? [E(X)]²?

Is E(X²) = [E(X)]²?

The Population Variance and


Standard Deviation

Just as we have a population mean to measure the center of a distribution, the population variance and standard deviation measure a distribution's spread.


Definition: Let X be a random variable with mean μ. Then the population variance of X, σ², is

$$\sigma^2 = V(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$$

Definition: The population standard deviation, σ, of random variable X is the square root of the variance of X.


Example: Die roll

p(x) = 1/6, x ∈ {1, 2, …, 6}
μ = ?
E(X²) = ?
V(X) = ?
σ = ?

Example: p(x) = 1/2, x ∈ {3, 4}

μ = ?
E(X²) = ?
V(X) = ?
σ = ?


Example: Machines

x:     0     1     2     3
p(x):  0.12  0.27  0.46  0.15

μ = ?
E(X²) = ?
V(X) = ?
σ = ?
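For a discrete random variable, all four quantities come from weighted sums over the p.m.f. The sketch below works the machines example, assuming the standard shortcut V(X) = E(X²) − μ²; the helper names are illustrative.

```python
import math


def expect(pmf, h=lambda x: x):
    """E[h(X)] for a discrete p.m.f. given as a dict {value: probability}."""
    return sum(h(x) * p for x, p in pmf.items())


def variance(pmf):
    """V(X) via the shortcut E(X^2) - mu^2."""
    mu = expect(pmf)
    return expect(pmf, lambda x: x * x) - mu ** 2


# Number of machines operating, with its p.m.f. from the example.
machines = {0: 0.12, 1: 0.27, 2: 0.46, 3: 0.15}

mu = expect(machines)
ex2 = expect(machines, lambda x: x * x)
var = variance(machines)
sd = math.sqrt(var)

print(round(mu, 4), round(ex2, 4), round(var, 4), round(sd, 4))
```

The same `expect` helper handles E(X) and E(X²) because both are instances of E[h(X)], with h(x) = x and h(x) = x² respectively.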


Example:

μ = ?

E(X²) = ?

V(X) = ?

σ = ?


Linear Functions of Random


Variables (3.4)

Recall, a linear function (or linear combination) of variables x1, x2, …, xn, is a function of the form
f(x1, x2, …, xn) = a1x1 + a2x2 + … + anxn + b
where b and all of the ai's are fixed constants.

Given any random variables X1, X2, …, Xn and known constants a1, a2, …, an, and b, then
E(a1X1 + a2X2 + … + anXn + b) = a1E(X1) + a2E(X2) + … + anE(Xn) + b.

To find the expectation of a linear


combination of random variables, we need
only know the constants and the
expectation of each random variable
individually.

Example: Let X be a random temperature


measured in degrees Celsius, with E(X) =
10. Let Y be the same temperature in
degrees Fahrenheit, Y = 9/5 X + 32. What
is E(Y)?

Example: The expectation of the roll of a


fair die is 3.5. What is the expectation of
the sum of four such rolls?

Independent Random Variables

Recall, events are said to be independent


if knowledge of one does not affect the
probability of the other.

Likewise, random variables X and Y are independent if knowing the value of X does not affect probabilities of Y, no matter what value X takes (and vice versa).

If X and Y are independent, any event


involving X alone will be independent from
any event involving Y alone.
P(X ∈ A and Y ∈ B) = P(X ∈ A) P(Y ∈ B)
for any A and B.

Draws with replacement are independent.

Draws in a simple random sample are not


independent, but may be treated as
though they are if the sample size is much
smaller than the population size.

If the random variables are independent, then

V(a1X1 + a2X2 + … + anXn + b) = a1²V(X1) + a2²V(X2) + … + an²V(Xn).

Notes:

The shift b does not affect the variance.
The coefficients ai are squared.
Dependent random variables require a more complex formula.

Example: Let the variance of the Celsius


temperature X be V(X) = 25.

What is the standard deviation of X?

What is the variance of Y = 9/5 X + 32?

What is the standard deviation of Y?


Example: The variance of the roll of a fair


die is 35/12. What is the variance of the
sum of four such rolls?

If we take a single roll and multiply it by 4,


what is the variance of the result? Why is
this different?
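The contrast between a sum of four independent rolls and four times a single roll can be verified by exact enumeration. This sketch assumes a fair die and treats every listed outcome as equally likely; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)


def variance(values):
    """Variance of a list of equally likely values, as an exact fraction."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n


# One roll, the sum of four independent rolls, and 4 times one roll.
var_one_roll = variance(list(FACES))
var_sum_of_four = variance([sum(rolls) for rolls in product(FACES, repeat=4)])
var_four_times_one = variance([4 * x for x in FACES])

print(var_one_roll, var_sum_of_four, var_four_times_one)
```

The sum has variance 4 × V(X) because each independent roll contributes its own variance, while 4X has variance 4² × V(X) because a single roll's deviation is stretched by a factor of 4 with no chance for high and low rolls to offset each other.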


Suppose X and Y each have mean 10 and variance 4. What are the mean and variance of Z = X − Y?


Mean and Variance of


the Sample Mean

An important special case concerns the sample mean of the Xi's,

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Note that X̄ is a linear combination of the Xi's.


Theorem: If X1, X2, …, Xn are independent random variables, each with E(Xi) = μ and V(Xi) = σ², then

$$E(\bar X) = \mu$$

and

$$V(\bar X) = \frac{\sigma^2}{n}.$$

Proof:


Example: A (possibly biased) coin has


probability p of coming up heads. We flip
it and let X = 1 if heads, 0 if tails.

What are E(X) and V(X)?

Suppose we flip it n times, and look at


Chapter 4: Common Distributions

Often we will have useful mathematical forms


which represent entire families of
distributions.

These distributions include one or more


constants (called parameters) which must be
specified to define a specific distribution.

We will concentrate on two especially


important families, the binomial and normal
distributions.

The Binomial Distribution (4.1)

The binomial distribution is the most


important common named family of
discrete distributions.

Recall, a discrete distribution is described


by a probability mass function p(), where

p(0) = P(X = 0)
p(1) = P(X = 1)
and so on.

Suppose our experiment consists of trials


with only two possible outcomes.

One outcome, called a "success," occurs with probability p.

The other outcome is called a "failure," and occurs with probability (1 − p).

Such a process is called a Bernoulli trial


(after 17th-century probabilist James
Bernoulli).

The binomial distribution looks at a fixed


number of independent identical Bernoulli
trials, and counts the number of successes.

Example: Suppose silicon computer chips


are made in pairs, and that 30% of all
chips produced are defective.

Also assume that the chips in a pair are


independent of each other.

Out of pairs in which the first chip is good,


the second is defective in 30% of pairs.
This remains true for pairs in which the
first chip is defective.

Out of all pairs, 70% will have a good first


chip. Out of those, 70% will also have a
good second chip. Overall, 70% of 70%, or
49% (.7*.7 = .49) will have two good chips.

Likewise, 30% of that 70%, or 21% overall


(.7*.3 = .21) will have a good first chip and a
defective second chip.

By the same reasoning, 30% will have a


defective first chip, and 70% of those (21%
overall) will have a good second chip.

Finally, 30% of 30%, or 9% will have both


chips defective.

If we let the letter S (for success)


represent a good chip, and F (for failure)
represent a defective one, we can
summarize as:

P(SS) = .7*.7 = .49


P(SF) = .7*.3 = .21
P(FS) = .3*.7 = .21
P(FF) = .3*.3 = .09


Now let X be the number of good chips


produced in a pair.

Then X can take the values 0, 1, or 2.


From the above,

p(0) = P(X = 0) = P(FF) = .09


p(2) = P(X = 2) = P(SS) = .49
p(1) = P(X = 1) = P(SF or FS) = .21 + .21
= .42


What if the chips are produced in sets of


4?

If we want the probability of a set consisting of 2 good and 2 defective chips, we can think about the case of SSFF: the first and second chips are good, while the third and fourth are defective.

The probability of this particular outcome


will be .7*.7*.3*.3 = .0441 or 4.41%.

But there are other ways we can have two successes and two failures: 5 other ways, in this case:

P(SSFF) = .7*.7*.3*.3 = .0441


P(SFSF) = .7*.3*.7*.3 = .0441
P(SFFS) = .7*.3*.3*.7 = .0441
P(FSSF) = .3*.7*.7*.3 = .0441
P(FSFS) = .3*.7*.3*.7 = .0441
P(FFSS) = .3*.3*.7*.7 = .0441

Overall, p(2) = P(X = 2) = 6*.0441


=.2646.

In general, suppose we have an


experiment consisting of n independent
Bernoulli trials.

Those trials which satisfy the condition we


wish to count are called successes, and
occur with probability p.

The remaining trials are called "failures"; these occur with probability (1 − p).

Let X be the number of successes in the


full experiment.

If these conditions are true, we say that X,


the number of successes in the
experiment, has a binomial distribution
with parameters n and p.

X ~ Binomial(n, p) or X ~ Bin(n, p).

The mass function for X is:

$$p(x) = P(X = x) = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n.$$


Note: the exclamation mark is pronounced "factorial."

Given n items, n! is the number of arrangements, and is found as
n! = n × (n − 1) × (n − 2) × … × 2 × 1.

Since there is one (empty) way to arrange 0 objects, we define 0! = 1.

Example: The chips (30% defective) are


produced in batches of 4. Let X be the
number of good chips in a batch.

What distribution does X follow?

What is p(2)?

What is the probability that a random batch


will contain no more than one good chip?
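The batch-of-4 questions can be checked with the standard binomial mass function, p(x) = C(n, x) p^x (1 − p)^(n − x). The sketch below assumes X counts good chips, so the success probability is 0.7; the function name is illustrative.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 4, 0.7  # 4 chips per batch; each is good with probability 0.7

p_two_good = binom_pmf(2, n, p)
p_at_most_one = binom_pmf(0, n, p) + binom_pmf(1, n, p)

print(round(p_two_good, 4), round(p_at_most_one, 4))
```

The factor `comb(4, 2)` equals 6, matching the six orderings of two successes and two failures listed earlier for a set of 4 chips.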


Example: In a genetics study, a second-generation cross of pure green peas with pure yellow peas leads to pods where p = P(yellow) = 3/4.

If pods contain 8 seeds, what is the


probability that a random pod will contain 6
yellow seeds?

What is the probability that a random pod


will contain at least 6 yellow seeds?

Table A.1 in your book can save calculations by providing probabilities of P(X ≤ x) for n ≤ 20 and certain values of p.

Example: Draw 16 times with replacement


from a standard deck, and let X = number
of spades drawn.

Find P(X > 6).


With standard distributions, the mean and


variance may generally be found as a
function of the parameters.

If X ~ Binomial(n, p), then μ = np.

Example: If 75% of all seeds are yellow, and


each pod contains 8 seeds, what is the mean
number of yellow seeds per pod?
Example: If we have 4 fair coins which we flip
as a batch, what is the mean number of
heads?

Additionally, if X ~ Bin(n, p), then σ² = np(1 − p).

Example: X = # yellow seeds ~ Bin(8, .75).


What are the variance and standard deviation
of X?

Example: X = # heads in 4 flips ~ Bin(4, .5).


What are the variance and standard deviation
of X?


Recall, draws without replacement (simple


random samples) are not independent.

However, we may do calculations as


though they are independent (including
binomial calculations) as long as the
sample size is small (less than 5%)
compared to the population size.


Example: A lot of several thousand


components contains 7% defective. We
sample 8 at random.

What is the probability of no defective


components in our sample?

What is the probability of at least one


defective?

What is the expected number of defectives


in our sample?

The Normal Distribution (4.3)

The continuous normal (or Gaussian) distribution has two parameters, μ and σ².

If X ~ N(μ, σ²),

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

This distribution is often seen in practice,


and is also very important theoretically.

The normal p.d.f. is a bell-shaped curve, symmetric around, and with its peak at, μ. E(X) = μ.

Its width is determined by σ²; large values of σ² imply a wide, low curve, while small values imply a narrow, tall one. V(X) = σ².

An important special case is the standard normal distribution, with μ = 0 and σ² = 1.

We usually identify standard normal variables with the letter Z.

If Z is standard normal, Z ~ N(0, 1) and the density of Z is

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$


There is no closed-form integral for the normal probability density function, so we can't find probabilities that way.

To find normal probabilities, we must use


computer programs (which themselves
use numeric integration), or tables such as
Table A.2 (p. 521-522, and inside the front
cover of your book) of the standard normal
distribution.


Examples:

P(Z ≤ 1.00) = ?

P(Z > 1.00) = ?

P(-2.00 ≤ Z ≤ 0.75) = ?


For X ~ N(μ, σ²), we find proportions by converting to standard units.

If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1).

Remember to convert both sides of any


inequality the same way.


Examples: Let X ~ N(3, 4).

P(X ≤ 6.00) = ?

P(X > 4.00) = ?
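As a cross-check on table lookups, normal probabilities can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch assumes the inequalities in the examples are "less than or equal to", and it reads N(3, 4) as mean 3 and variance 4, so the standard deviation passed to `NormalDist` is 2.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal: mean 0, standard deviation 1
X = NormalDist(mu=3, sigma=2)    # N(3, 4): variance 4 means sigma = 2

# Standard normal examples.
p_z_le_1 = Z.cdf(1.00)
p_z_gt_1 = 1 - Z.cdf(1.00)
p_z_between = Z.cdf(0.75) - Z.cdf(-2.00)

# Non-standard normal: standardizing gives the same answer as using X directly.
p_x_le_6 = X.cdf(6.00)
p_x_le_6_standardized = Z.cdf((6.00 - 3) / 2)
p_x_gt_4 = 1 - X.cdf(4.00)

print(round(p_z_le_1, 4), round(p_z_between, 4), round(p_x_le_6, 4), round(p_x_gt_4, 4))
```

Passing the variance 4 as `sigma` is a common slip; `NormalDist` expects the standard deviation.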


Normal Percentiles

Just as for samples, the pth percentile of a distribution has p% of the probability below it, and (100 − p)% above.

We find percentiles for the normal


distribution using Table A.2 again, but
reading from the inside out.

Since probabilities are in the middle of the


table, start there.
Read to the outside to find the percentile.

Example: Z ~ N(0, 1). What is the 70th


percentile of Z?

Example: What is the 25th percentile of Z?


For non-standard normal variables, first find the desired percentile for the standard normal, then use the fact that since Z = (X − μ)/σ, therefore X = μ + σZ.

Example: X ~ N(10, 25). What is the 95th


percentile of X?
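Reading the table "from the inside out" corresponds to evaluating the inverse c.d.f. The sketch below uses `statistics.NormalDist.inv_cdf` (Python 3.8 or later) and reads N(10, 25) as mean 10 and variance 25, so σ = 5; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Percentiles of the standard normal.
z70 = Z.inv_cdf(0.70)
z25 = Z.inv_cdf(0.25)

# 95th percentile of X ~ N(10, 25): convert the Z percentile with X = mu + sigma * Z.
mu, sigma = 10, 5
z95 = Z.inv_cdf(0.95)
x95 = mu + sigma * z95

# Direct computation for comparison.
x95_direct = NormalDist(mu, sigma).inv_cdf(0.95)

print(round(z70, 3), round(z25, 3), round(x95, 3))
```

Percentiles below the 50th come out negative for Z, reflecting the symmetry of the standard normal curve around 0.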


Besides the binomial and normal


distributions, there are a number of other
named families of distributions with useful
properties.

For example, the Poisson distribution


(Section 4.2) is useful for modeling random
counts in a fixed interval of time or space.

See Sections 4.4-4.6 for discussion of the


lognormal, exponential, gamma, and Weibull
distributions, which are useful for modeling
continuous histograms which are positively
skewed and unimodal.

Sampling Distributions (4.8)

Suppose random variable X is drawn from


some distribution f. (X ~ f )

Now suppose we generate n of these random variables, X1, …, Xn, independently from f.

We say that X1, …, Xn make a random sample from f.
Sometimes we say that X1, …, Xn are i.i.d. (independent and identically distributed) from f.

Since the X's make a sample, we can compute sample statistics such as the mean, X̄.

Recall (3.4), since the X's are random, so is X̄, and since it is a number, X̄ is itself a random variable with a distribution.

This distribution is referred to as the sampling distribution of X̄ and plays a large role in inferential statistics.

Example: Let pX(x) = 1/3, x = 1, 2, 3, and let X1 and X2 be independent draws from pX(x).
Now let X̄ = (X1 + X2)/2 be the average of X1 and X2.
Note that X̄ is also a discrete random variable, and therefore has a probability mass function.
What is the mass function (sampling distribution) of X̄?

Example: Suppose X ~ N(50, 4). A


histogram of 1000 Xs looks like this:


Sample 25 X's and compute X̄.

If we repeat this process 1000 times, we


get a histogram such as this:


Note that X̄ has a distribution that:

Is centered on 50 (μ);
Is narrower than the solid normal curve for the individual X's: the variance and standard deviation of X̄ are smaller than those of X.
Remains bell-shaped and (roughly?) normal.

Understanding the distributions of sample


statistics and their relationships to the
associated population parameters is the
basis of most of inferential statistics.
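The repeated-sampling experiment described in this example can be imitated in a few lines. This is a simulation sketch, not the code behind the original histograms: it assumes N(50, 4) means variance 4 (σ = 2), and the seed and variable names are arbitrary.

```python
import random
import statistics

rng = random.Random(321)
MU, SIGMA = 50, 2      # X ~ N(50, 4): variance 4, so sigma = 2
N, REPS = 25, 1000     # 25 observations per sample, 1000 repeated samples

# Each entry is the mean of one sample of 25 draws.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(round(center, 3), round(spread, 3))
```

The center of the simulated means should land near 50, and their spread near 2/√25 = 0.4, much smaller than the standard deviation of 2 for individual observations.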


In general, if a sample statistic is used to


estimate a population parameter:

The sampling distribution of the statistic is


centered on (or at least near) the parameter.
The spread of the sampling distribution will
decrease as the sample size gets larger.
As the sample size gets larger, the shape of
the sampling distribution will usually get more
and more bell-shaped (normal).


Sampling Distributions of the Mean


Let X̄ be the sample mean of a random sample X1, X2, …, Xn, from a population or process with mean μ and standard deviation σ. Then (recall, Section 3.4):

The mean of the sampling distribution of X̄, μ_X̄, is μ, the population mean, regardless of sample size n.
The standard deviation of the sampling distribution of X̄, σ_X̄, is σ/√n, the population standard deviation divided by the square root of the sample size.
Math 321 - Dr. Minnotte

257

The standard deviation of the sample mean, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, is often called the standard error of the sample mean.

This emphasizes that it describes a sampling distribution, not a population.

As the sample size gets larger, we have more information and can make better estimates, so the standard error decreases.

(Note, however, that the square root means we have diminishing returns; each new observation provides less new information than the previous one.)

The larger the sample, the closer $\bar{X}$ is likely to be to $\mu$.

If our original population has a normal distribution, the sampling distribution of $\bar{X}$ is also normal, regardless of sample size.

Example: An automated filling machine fills soft drink cans with a volume that has a normal distribution with $\sigma$ = 0.05 ounces.

If we sample 4 cans and take the sample mean, what is the probability that $\bar{X}$ will be within 0.04 ounces of the population mean $\mu$?
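A quick numerical check of the can-filling example, as a sketch: since the population is normal, $\bar{X}$ is normal with standard error $\sigma/\sqrt{n}$, and the question becomes a standard normal probability. The variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, margin = 0.05, 4, 0.04

standard_error = sigma / sqrt(n)   # 0.05 / 2 = 0.025 ounces
z = margin / standard_error        # how many standard errors 0.04 ounces is

# P(-z < Z < z) for a standard normal Z
prob_within = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(f"standard error = {standard_error:.3f}, z = {z:.2f}")
print(f"P(|Xbar - mu| < {margin}) = {prob_within:.4f}")
```

Note that the unknown value of $\mu$ never enters the calculation; only the distance from $\mu$ matters.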

The Central Limit Theorem

The Central Limit Theorem is the most important theorem in statistics.

It shows the importance of the normal distribution, and provides the justification of many of the most fundamental statistical methods.

If we know that a population or process has a normal distribution, we know that the sampling distribution of $\bar{X}$ will also be normal. This allows us to compute useful probabilities.

Unfortunately, we often do not know the population distribution (or perhaps we know that it is not normal).

Fortunately, this is not always required.

The sample mean (or sum) of a large number of independent random variables has a sampling distribution which is approximately normal, no matter what distribution the original random variables come from.

This important result is the Central Limit Theorem.

Theorem (Central Limit Theorem): If $X_1, X_2, \ldots, X_n$ are independent random variables from a population or process with mean $\mu$ and standard deviation $\sigma$, then as long as n is sufficiently large,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{and} \quad X_1 + X_2 + \cdots + X_n \sim N(n\mu, n\sigma^2), \quad \text{approximately.}$$

We can use this to find probabilities for sums or averages, without knowing the distribution of the $X_i$s!

Example: The (population) mean time required for maintenance on an air-conditioning unit is 1 hour, and the standard deviation is also 1 hour. A company operates 50 such units.

Could we find the probability that the maintenance on a single unit requires more than 2 hours from the information given?

What is the probability that the average time for maintenance will be more than 75 minutes?

What is the probability that the total time for maintenance will be less than 40 hours?
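The last two questions can be approached with the Central Limit Theorem, since n = 50 is reasonably large. (The first cannot: a single unit's time has an unknown distribution.) The sketch below treats the average and the total of the 50 maintenance times as approximately normal; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 50  # hours, hours, number of units

# CLT: Xbar is approximately N(mu, sigma^2 / n)
mean_dist = NormalDist(mu, sigma / sqrt(n))
p_mean_over_75_min = 1 - mean_dist.cdf(75 / 60)  # 75 minutes = 1.25 hours

# CLT: the total is approximately N(n * mu, n * sigma^2)
total_dist = NormalDist(n * mu, sigma * sqrt(n))
p_total_under_40_hr = total_dist.cdf(40)

print(f"P(average > 75 min) is about {p_mean_over_75_min:.4f}")
print(f"P(total < 40 hours) is about {p_total_under_40_hr:.4f}")
```

Both answers are approximations whose quality depends on how skewed the maintenance-time distribution is.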

How large is "large"?

As a general rule, $n \ge 30$ is usually large enough that the Central Limit Theorem is reasonable.

Symmetric populations can get by with much less, often as few as 10, or even fewer.

Highly skewed populations require more. 50 or more should be fairly safe in all but the worst cases.

The Normal Approximation to the Binomial Distribution

Recall, if X ~ B(n, p), then E(X) = np and V(X) = np(1-p).

If the particular values of n and p lead to a binomial distribution which is not very skewed, the $N(np, np(1-p))$ distribution can be a good approximation to the B(n, p) distribution.

We usually require that $np \ge 10$ and $n(1-p) \ge 10$.

Example: Roll a die 120 times and count the number of 6s rolled (X).

What distribution does X follow?

What are E(X) and V(X)?

What is $P(X \ge 25)$?

The true binomial probability is 0.136.

We're pretty close, but we can do better.

Binomial probabilities are located entirely on the integers, but normal probabilities are smeared out over the whole real line (remember the probability histogram).

We'll get a better approximation if we use a continuity correction, by taking the normal probability from (x - .5) to (x + .5) to approximate the binomial P(X = x).

So, for X ~ B(120, 1/6), $P(X \ge 25)$ is approximated by the normal probability $P(X \ge 24.5)$ = ?

Example: If X ~ Bin(120, 1/6), use the normal approximation to estimate P(15 < X < 25).
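To see what the continuity correction buys, the sketch below compares the exact binomial tail with the normal approximation, with and without the half-unit shift. It also handles P(15 < X < 25), which for integer X is P(16 <= X <= 24). The helper `binom_pmf` is written here for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 1 / 6
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))  # N(np, np(1-p))


def binom_pmf(k):
    """Exact binomial probability P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# P(X >= 25): exact, plain normal, and continuity-corrected normal
exact_tail = sum(binom_pmf(k) for k in range(25, n + 1))
tail_no_correction = 1 - approx.cdf(25)
tail_with_correction = 1 - approx.cdf(24.5)

# P(15 < X < 25) = P(16 <= X <= 24), corrected to the interval (15.5, 24.5)
exact_between = sum(binom_pmf(k) for k in range(16, 25))
between_with_correction = approx.cdf(24.5) - approx.cdf(15.5)

print(f"P(X >= 25): exact {exact_tail:.4f}, "
      f"normal {tail_no_correction:.4f}, corrected {tail_with_correction:.4f}")
print(f"P(15 < X < 25): exact {exact_between:.4f}, "
      f"corrected {between_with_correction:.4f}")
```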

Chapter 5: Statistical Estimation

The remainder of the course will focus on inferential statistics.

Recall, in probability, we generally know the distribution in question and wish to calculate something about particular outcomes or events.

In inferential statistics, we have a sample, and wish to use that information to say something about the population or distribution the sample was drawn from.

[Diagram: probability reasons from the population to the sample; inferential statistics reasons from the sample back to the population.]

Recall: A parameter is an unknown quantity related to a population or distribution.

A statistic is a known quantity which can be calculated from a dataset.

Estimation uses a statistic (what we know) to tell us something about an unknown parameter (what we wish we knew).

Point Estimation (5.1)

Definition: A point estimate of a parameter $\theta$ is a statistic, $\hat{\theta}$, which represents a "best guess" for $\theta$.

Example: We have an unknown distribution, X ~ f(x), and we wish to know the unknown parameter $\mu$ = E(X). We take a sample $X_1, X_2, \ldots, X_n$, and estimate $\mu$ with the known statistic $\bar{X}$.

Other common point estimates:

Estimate V(X) = $\sigma^2$ with $s^2$.
If X ~ Binomial(n, p) (n known, p unknown), estimate p with $\hat{p} = X/n$.

All of our standard sample statistics (median, quartiles, etc.) are good estimates of the corresponding population or distribution parameters.

Properties of Estimates

There are a few properties that we like to see in a parameter estimate.

On average (over many samples), an estimate should give the correct value for the parameter. If the mean of the sampling distribution of our estimate is the parameter we are estimating, that is, $E(\hat{\theta}) = \theta$, we say that $\hat{\theta}$ is an unbiased estimate of $\theta$.

Example: We know that $E(\bar{X}) = \mu$, so $\bar{X}$ is an unbiased estimate of $\mu$.

Also, $E(s^2) = \sigma^2$ and $E(\hat{p}) = p$ (proof: $E(\hat{p}) = E(X)/n = np/n = p$), so the sample variance and proportion are unbiased estimates of the population variance and proportion.

This is why we divide by (n - 1) instead of n to find $s^2$.

On the other hand, the sample standard deviation, s, has $E(s) \ne \sigma$, so s is a biased estimate for $\sigma$.

Fortunately, the bias (defined as $E(s) - \sigma$, or more generally, $E(\hat{\theta}) - \theta$) is small, especially as n gets large.
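A small simulation can make the difference between $s^2$ and s concrete. The sketch below draws many small samples from a standard normal population (an arbitrary choice for illustration, with $\sigma$ = 1) and averages $s^2$ and s over the samples.

```python
import random
import statistics

random.seed(321)  # arbitrary seed, for reproducibility only

SIGMA, N, REPS = 1.0, 5, 20000  # illustrative population sd, sample size, samples

variances, sds = [], []
for _ in range(REPS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    s2 = statistics.variance(sample)  # divides by (n - 1)
    variances.append(s2)
    sds.append(s2 ** 0.5)

mean_s2 = statistics.fmean(variances)  # should be near sigma^2 = 1 (unbiased)
mean_s = statistics.fmean(sds)         # should fall below sigma = 1 (biased)

print(f"average s^2 = {mean_s2:.3f}  (sigma^2 = {SIGMA**2})")
print(f"average s   = {mean_s:.3f}  (sigma   = {SIGMA})")
```

With n as small as 5 the shortfall in the average of s is noticeable; rerunning with a larger `N` should show it shrinking.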

Note that just because an estimate is unbiased, that does not guarantee that it will give you the exact parameter on this (or possibly, any) sample.

Example: X ~ Binomial(n = 25, p = 0.3). Even though $\hat{p}$ is unbiased for p, there is no value of X that will give $\hat{p} = 0.3$.

Remember our sampling distributions; an unbiased estimate's distribution will be centered correctly, but it will still have some spread.

The variance of the sampling distribution of our estimate measures that spread and is also important in measuring how well it performs.

We combine these two aspects into a single measure, the mean squared error:

$$MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [\mathrm{bias}(\hat{\theta})]^2$$

A small MSE means that both bias and variance are small.

Example: Suppose $X_1$ and $X_2$ are independent, with $E(X_1) = E(X_2) = \mu$ and $V(X_1) = V(X_2) = \sigma^2$.

Let

Find:

Example (continued): Let

Find:

For what values of $\mu$ and $\sigma^2$ is

Confidence Intervals (5.2)

Having a good estimate is a good first step in learning about a population parameter.

We should also be interested in how close our estimate is likely to be to the parameter.

One approach is to calculate the standard error, remembering that we will usually be within 2-3 standard errors of the parameter (if we use an unbiased estimate).

Another way to look at this issue is that we know our estimate is incorrect. (We just don't know by exactly how much.)

We can improve this situation by expanding our point estimate to an interval estimate, providing a range of plausible values for $\theta$.

Done carefully, we can identify how likely it is that our interval includes $\theta$.

If our sample size, n, is large, we can use the Central Limit Theorem to give us the following.

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le 1.96\right) \approx 0.95$$

Therefore, the interval

$$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}}$$

is a random interval which covers the population mean $\mu$ with probability 0.95.

We call such an interval a 95% confidence interval.

This represents a set of plausible values of $\mu$ that are consistent with the data.

Example: The costs to repair a particular kind of damage at a random sample of 80 auto body shops have mean $472.36 and standard deviation $62.35.

What is the 95% confidence interval for the mean of this population?
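The large-sample interval is simple enough to compute directly. The sketch below wraps it in a small helper (`z_interval` is a name invented for this illustration) and applies it to the auto body shop data, using the critical value 1.96 from the slides.

```python
from math import sqrt


def z_interval(xbar, s, n, z_crit=1.96):
    """Large-sample confidence interval: xbar +/- z_crit * s / sqrt(n)."""
    half_width = z_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = z_interval(xbar=472.36, s=62.35, n=80)
print(f"95% CI for the mean repair cost: ({low:.2f}, {high:.2f})")
```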

Is it correct to say $P(458.70 \le \mu \le 486.02) = 0.95$?

No! Nothing inside the probability statement is random. Recall:

$$P\left(\bar{X} - 1.96 \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{s}{\sqrt{n}}\right) \approx 0.95$$

The random parts are the sample statistics.

The interval is random, not the population parameter, $\mu$.

If we constructed many 95% confidence intervals from independent datasets, we'd get many different sample means and sample standard deviations, and each would lead to a different confidence interval.

In the long run, about 95% of these different confidence intervals would contain the true parameter $\mu$.

Remember, randomness is in the sample and the interval, not in the parameter!
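The long-run interpretation can be checked by simulation. The sketch below repeatedly samples from a population whose mean is known (a normal population with mean 50 and standard deviation 2, chosen arbitrarily for illustration), builds a 95% interval each time, and counts how often the interval covers the true mean.

```python
import random
import statistics
from math import sqrt

random.seed(321)  # arbitrary seed, for reproducibility only

MU, SIGMA = 50.0, 2.0  # illustrative population, known only to the simulation
N, REPS = 40, 2000     # sample size and number of independent datasets

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    half_width = 1.96 * statistics.stdev(sample) / sqrt(N)
    # The interval changes from sample to sample; MU never does.
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals covering mu: {coverage:.3f}")
```

The observed fraction should land near 0.95, typically a little below it because s is only an estimate of $\sigma$ at this sample size.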

We call the value 95% the confidence level. We say we are 95% confident that the population mean lies within the computed interval.

We can select other confidence levels if desired, by replacing the critical value 1.96 with the Z-percentile that gives the appropriate center probability.

A confidence level of 95% (1.96) is most common, but levels of 90% (1.645) and 99% (2.575) are also often used.

In general, define $z_p$ to be the value above which there is probability p in the tail of the standard normal distribution.

Then $z_p$ will be the 100(1-p)th percentile of the standard normal distribution.

For a $100(1-\alpha)\%$ confidence interval, we use the critical value $z_{\alpha/2}$.

Example: What critical value would we use for an 80% confidence interval?
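Critical values for any confidence level can be read from the inverse of the standard normal CDF instead of a table. The helper below is a sketch (the name `z_critical` is an assumption of this example); the one-sided option anticipates confidence bounds.

```python
from statistics import NormalDist


def z_critical(confidence, two_sided=True):
    """Standard normal critical value for the given confidence level."""
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha  # probability left in the upper tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} two-sided: z = {z_critical(level):.3f}")
```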

What factors affect the length (precision) of the confidence interval?

s: If s is bigger, $\bar{X}$ is less accurate, and the interval must be wider.
Confidence level: To be more confident of including the true value, we must make the interval wider.
n: As n gets bigger, the standard error of $\bar{X}$ gets smaller, and the interval gets narrower.

If we require a 95% confidence interval of error width (interval half-width) no more than w, we can compute a (rough) minimum sample size if we have an estimate or upper bound for s:

$$n \ge \left(\frac{1.96\, s}{w}\right)^2$$

Of course, we can substitute the appropriate Z critical value to find sample sizes for other confidence levels.

Example: Milk fill weights.

n = 50, $\bar{X}$ = 2.0727, s = 0.0711

Find a 95% confidence interval for $\mu$.

w = ?

If we require $w \le 0.01$, how big should n be?
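The milk questions can be worked as follows. This is a sketch that uses the observed s as the planning value for the sample-size calculation, which is itself an assumption since s would change with a new sample.

```python
from math import ceil, sqrt

n, xbar, s = 50, 2.0727, 0.0711
z_crit = 1.96  # 95% two-sided critical value

# Half-width and interval from the sample we have
w = z_crit * s / sqrt(n)
interval = (xbar - w, xbar + w)

# Smallest n with half-width at most target_w: n >= (z * s / w)^2
target_w = 0.01
n_required = ceil((z_crit * s / target_w) ** 2)

print(f"w = {w:.4f}, 95% CI = ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"sample size needed for w <= {target_w}: {n_required}")
```

Halving the half-width roughly quadruples the required sample size, which is the diminishing-returns effect of the square root.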

Confidence Bounds

Sometimes, we only wish to know a lower (or upper) bound on $\mu$.

We can generate one-sided confidence intervals, also called confidence bounds, in a similar way to the usual two-sided case.

If we have a large sample, then:

A 95% lower confidence bound for $\mu$ is $\bar{X} - 1.645\, s/\sqrt{n}$.

A 95% upper confidence bound for $\mu$ is $\bar{X} + 1.645\, s/\sqrt{n}$.

To get 90%, 99%, or $100(1-\alpha)\%$ bounds, replace 1.645 with 1.28, 2.33, or $z_\alpha$, respectively.

Example: A sample of 48 shear strength measurements gives a mean of 17.17 N/mm² and a standard deviation of 3.28 N/mm².

If we only care that the population mean shear strength is great enough, find a 90% lower bound on $\mu$.

For our normal-based confidence interval and level to be valid, we must know (or at least assume) that:

The sample is a random draw from the population.
The sample size n is large enough that the sample mean is approximately normally distributed and that s is a good estimate of $\sigma$.

Chapter 6: Hypothesis Testing

Estimation (both point and interval) is useful for providing an idea of the value of a population parameter.

Frequently, we may wish to investigate a more specific question about a parameter. For this purpose, we use the other major branch of inferential statistics, hypothesis testing.

One-Sample Z-Tests (6.1-6.2)

Example: (Milk data) Suppose our bottle-filling machine is supposed to dispense 2.04 L of milk. Recall, a sample of size 50 gave $\bar{X}$ = 2.0727, s = 0.0711. Does the machine need to be recalibrated?

To answer this, let's assume that the machine is working properly, and see how likely we are to get a sample mean as far or further from the expected value as the sample mean we actually saw (2.0727).

More formally, we choose a null hypothesis, H0.

This is a statement about a population parameter (say, $\mu$), generally that it is equal to the value of interest (denoted $\mu_0$).

Usually, the null hypothesis means "everything is as it should be," or "nothing interesting is happening."

Here: H0: $\mu$ = 2.04 (= $\mu_0$)

We also choose an alternative hypothesis, H1, that the null is incorrect.

H1: $\mu \ne 2.04$

The alternative is literally simply that the null is incorrect, but this is often the more interesting or important result.

Next, we compute a test statistic, under the assumption that H0 is correct.

For large-sample tests on the population mean, $\mu$, we usually use the z-statistic (using s in place of $\sigma$ when $\sigma$ is unknown):

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

Here: z = ?

If H0 is true, $\bar{X} \sim N(\mu_0, \sigma^2/n)$ (approximately) and z ~ N(0, 1).

Is z a typical value from a N(0, 1) distribution?

Formally, we find a P-value, the probability that a sample from the null distribution would give a test statistic as unusual as, or more unusual than, the one we just saw.

Since H1: $\mu \ne 2.04$, we use a two-sided P-value: $P = P(|z| \ge 3.25)$ (z ~ N(0,1)).

From our table, if z ~ N(0,1), $P(|z| \ge 3.25) = .0012$.
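The z-statistic and two-sided P-value for the milk data can be reproduced in a few lines. This is a sketch using s in place of $\sigma$, as the large-sample test allows.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s, mu0 = 50, 2.0727, 0.0711, 2.04

# Test statistic: how many standard errors xbar lies from the null value
z = (xbar - mu0) / (s / sqrt(n))

# Two-sided P-value: probability of |Z| at least this large under H0
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, two-sided P = {p_two_sided:.4f}")
```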

So we have two possibilities:

1) H0 is correct, $\mu$ = 2.04, and we got very unlucky to happen to get the (roughly) 1 in 800 chance to get $\bar{X} \ge 2.0727$ (or the equally unusual $\bar{X} \le 2.0073$), or

2) H0 is wrong.

Which seems more reasonable to believe?

Since P is so small, we reject H0 and decide the filling machine does require recalibration.

All hypothesis tests follow this general pattern:

1) We observe some difference in a sample and wish to decide if it reflects a true difference in the population.
2) Identify the null and alternative hypotheses.
3) Compute a test statistic which has a known distribution when the null hypothesis is true.
4) Find a P-value: the probability of a statistic as unusual as, or more unusual than, the one we observed, when the null hypothesis is true.
5) If P is small, reject the null hypothesis. Otherwise, fail to reject it.

This basic pattern holds for many different tests on different parameters with different assumptions.

For questions about the population mean for a single population, we often use the one-sample z-test demonstrated above.

Details on the one-sample z-test:

1) We have a single population, and a specific value, $\mu_0$, we wish to consider for the population mean.

This may be a known population mean for some related population (see next example).
Or it may be a desired population mean (example: milk data).
A sample from the population will give a sample mean different from $\mu_0$, even if that is the actual population mean.

2) Identify H0 and H1.

H1 is a statement that something interesting is going on. It is usually what we wish to prove.
We should decide if we care about a one-sided or two-sided alternative, ideally before we ever see data.
Two-sided: H0: $\mu = \mu_0$ vs. H1: $\mu \ne \mu_0$.
One-sided: H0: $\mu \le \mu_0$ vs. H1: $\mu > \mu_0$
or: H0: $\mu \ge \mu_0$ vs. H1: $\mu < \mu_0$
We always compute z and P using $\mu_0$, so $\mu = \mu_0$ is always part of H0.

Example: A newspaper article says that college freshmen average 7.5 hours per week at parties.

We suspect the number is lower at our college.

H0 = ?
H1 = ?

3) Compute the test statistic.

For a one-sample z-test, use the z statistic with $\mu_0$:

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

If $\sigma$ is unknown, use s instead.

Example (cont): Interview 100 freshmen. The average reported time spent at parties is 6.6 hours, and the standard deviation is 9 hours.

z = ?

4) Find the P-value.

Under H0, z ~ N(0,1). Use probabilities on z* ~ N(0,1), depending on H1:

H1                  P
$\mu \ne \mu_0$     $P(|z^*| \ge |z|) = 2 P(z^* \le -|z|)$
$\mu > \mu_0$       $P(z^* \ge z) = P(z^* \le -z)$
$\mu < \mu_0$       $P(z^* \le z)$

This gives the probability that, if H0 is true, a new sample would give a statistic $\bar{X}^*$ which disagrees with H0 at least as much as the statistic $\bar{X}$ we have.

Example (cont): Interview 100 freshmen. $\bar{X}$ = 6.6 hours, s = 9 hours.

z = ?
P = ?
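Steps 3 and 4 can be packaged into one small function covering all three forms of H1. This is a sketch; the function name and the `alternative` labels are choices made for this example, not course notation. It is applied to the freshman data with the one-sided alternative that the mean is below 7.5.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, s, n, mu0, alternative="two-sided"):
    """Return (z, P) for a large-sample z-test of H0: mu = mu0."""
    z = (xbar - mu0) / (s / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":    # H1: mu != mu0
        p = 2 * std_normal.cdf(-abs(z))
    elif alternative == "greater":    # H1: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":       # H1: mu < mu0
        p = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


z, p = one_sample_z_test(xbar=6.6, s=9, n=100, mu0=7.5, alternative="less")
print(f"z = {z:.2f}, one-sided P = {p:.4f}")
```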

5) Reject H0 for small P.

Choose a small significance level, $\alpha$. Values of 0.05 or 0.01 are most commonly used.

If $P \le \alpha$, the evidence is pretty strong against H0, and we say we reject H0 (at the $\alpha$ level). We have strong evidence of H1.
If $P > \alpha$, our test statistic is pretty reasonable under H0, so we say we fail to reject H0.
Note: a large P-value is not proof of H0; many other hypotheses may also be reasonable. This is why we do not say that we accept H0.

Many people call any result with P < 0.05 statistically significant, and any with P < 0.01 highly (statistically) significant.
Note that this is very artificial. A P-value of 0.049 is only slightly stronger than one of 0.051, yet we treat them very differently.
We should always report the P-value, to provide full information.
You should always explain in words what your conclusion implies for the situation.

Example (Student data, cont): What does our P-value suggest about our hypotheses?

What does this say about the party habits of freshmen at our university?

If P = 0.16, is it correct to say $P(\mu \ge 7.5) = .16$?

In general, with P-value P, can we say P(H0 is true) = P?

No! Nothing in that statement is random.

P is a conditional probability given H0 is true, not a probability on H0 itself.

Example: A machine that produces metal cylinders is set to make cylinders with diameter 50mm. A random sample of 60 cylinders has $\bar{X}$ = 49.9865 and s = 0.0524.

Is the machine calibrated correctly?

Note that practical significance is not the same as statistical significance.

Even though we found statistical significance for our machine's calibration, it may be that the difference between 50mm and 49.9865mm is too small to justify the expense of recalibration.

Large samples are particularly prone to indicate statistical significance despite a difference too small to be important.

Conversely, small samples may come with large standard errors, so that a difference which might be very important if confirmed cannot be shown to be statistically significant.

We should supplement our test with a confidence interval, which will do a much better job of indicating the size and therefore importance of a potential difference.

Example: Machine: n = 60, $\bar{X}$ = 49.9865, and s = 0.0524.

What is a 95% confidence interval for $\mu$?
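For the cylinder data, computing the test and the interval side by side shows how the interval conveys the size of the difference. The sketch below uses s in place of $\sigma$ and the slides' critical value 1.96.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s, mu0 = 60, 49.9865, 0.0524, 50.0

se = s / sqrt(n)                             # standard error of the sample mean
z = (xbar - mu0) / se                        # test statistic for H0: mu = 50
p_two_sided = 2 * NormalDist().cdf(-abs(z))  # two-sided P-value

ci = (xbar - 1.96 * se, xbar + 1.96 * se)    # 95% confidence interval for mu

print(f"z = {z:.3f}, P = {p_two_sided:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The P-value falls just under 0.05 and the interval just misses 50, yet the whole interval lies within about 0.03mm of the target, which is the practical-significance question raised above.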

Note that a $100(1 - \alpha)\%$ confidence interval for $\mu$ will include (exclude) $\mu_0$ exactly whenever a two-sided test of H0: $\mu = \mu_0$ fails to reject (rejects) at the $\alpha$ level.

Similarly, a $100(1 - \alpha)\%$ lower bound will fall below (above) $\mu_0$ exactly whenever a one-sided test of H0: $\mu \le \mu_0$ fails to reject (rejects) at the $\alpha$ level.

Likewise for upper bounds and tests on H0: $\mu \ge \mu_0$.

Small-Sample (t) Intervals (5.4)

Strictly speaking, the z-tests and confidence intervals for means we looked at in Sections 5.2 and 6.1 require that we know the standard deviation, $\sigma$, for our population.

In practice, for large samples, s is a good enough estimate for $\sigma$ that we can use s without harming our P-values or interval coverage severely.

If n is small, s may be far off from $\sigma$, and we require an adjustment to our intervals that takes into account this uncertainty.

The t-statistic

Since in practice, both $\mu$ and $\sigma$ are usually unknown, when n is small (n < 30) we often use the following result:

If $X_1, \ldots, X_n$ ~ Normal($\mu$, $\sigma^2$), then

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has a t distribution with n-1 degrees of freedom.

The t distribution is a bell curve with heavier tails than the normal distribution.

It is always centered on, and symmetric around, 0.

The $t_1$ curve is most spread out (has the heaviest tails). As the degrees of freedom $\nu$ get larger, the tails get lighter and the curve gets less spread out.

As $\nu \to \infty$, the t distribution approaches the standard normal distribution.

Table A.3 contains important percentiles (critical values).

Each row represents a different t distribution.

Example: T ~ $t_{12}$

$P(T \ge 2.681) = 0.01$
P(T < -1.356) = 0.10

Example: T ~ $t_9$

$P(T \ge 1.833)$ = ?
P(-2.262 < T < 2.262) = ?
Find c such that P(T > c) = 0.001

We can use a calculation similar to the one in Section 5.2 to justify a t-based confidence interval when n < 30.

Let $\bar{X}$ and s be the sample mean and standard deviation from a sample of size n from a normal population or process. Then a $100(1-\alpha)\%$ confidence interval for the population mean $\mu$ has the form

$$\bar{X} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$$

Note this is the same form as for the usual z-interval. The only difference is the replacement of the usual normal (z) critical value (such as 1.96) with one found on the t-table with (n-1) degrees of freedom.

One-sided intervals and bounds may be found by taking only the appropriate limit (+ or -) and choosing the one-sided t critical value ($t_{n-1,\alpha}$).

Example: An experiment measuring the tire life of a new rubber compound finds the mileage to end-of-life. A sample of size 10 finds a mean of 61,492 miles and a standard deviation of 3,035 miles. A normal model is appropriate.

Find a 95% confidence interval for the population mean tire life.

If we only care about a minimum, find a 95% lower bound for population mean tire life.
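The tire example can be worked with the two $t_9$ critical values that appear in the table exercises earlier in this section (2.262 for a two-sided 95% interval, 1.833 for a one-sided 95% bound). The sketch hard-codes them as table lookups because Python's standard library has no t quantile function.

```python
from math import sqrt

n, xbar, s = 10, 61492.0, 3035.0

# Critical values for 9 degrees of freedom, read from a t table
T_9_TWO_SIDED_95 = 2.262  # leaves 0.025 in each tail
T_9_ONE_SIDED_95 = 1.833  # leaves 0.05 in the upper tail

se = s / sqrt(n)  # standard error of the sample mean

ci = (xbar - T_9_TWO_SIDED_95 * se, xbar + T_9_TWO_SIDED_95 * se)
lower_bound = xbar - T_9_ONE_SIDED_95 * se

print(f"95% CI for mean tire life: ({ci[0]:.0f}, {ci[1]:.0f}) miles")
print(f"95% lower bound: {lower_bound:.0f} miles")
```

Because all of the allowed error goes on one side, the lower bound sits closer to the sample mean than the lower end of the two-sided interval.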

A brand of margarine was analyzed to determine the level of polyunsaturated fatty acid. In 6 samples, the mean percent is 16.98%, and the s.d. is 0.32%. A normal distribution is reasonable for this variable.

Find a 99% confidence interval for population mean percent pfa.

Find a 90% upper bound for population mean percent pfa.

t-Tests (6.4)

We should also use the t distribution to conduct hypothesis tests when n is small, and $\sigma$ is unknown.

This is especially important when n < 30, but can be used for any sample size.

We will continue to require the assumption of a normal population.

The process of conducting a t-test is identical to conducting a z-test, except for step 4, computation of the P-value.

1) We have a single population, and a specific value, $\mu_0$, we wish to consider for the population mean.

2) Identify H0 and H1 just as for a z-test.

3) Compute the test statistic

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

4) Find the P-value.

Under H0, t ~ $t_{n-1}$. Use probabilities on t* ~ $t_{n-1}$, depending on H1:

H1                  P
$\mu \ne \mu_0$     $P(|t^*| \ge |t|) = 2 P(t^* \ge |t|)$
$\mu > \mu_0$       $P(t^* \ge t)$
$\mu < \mu_0$       $P(t^* \le t)$

Tables (such as Table A.3) can give a range for P. Minitab and other software can make your P-values exact.
P still gives the probability that, if H0 is true, a new sample would give an $\bar{X}^*$ which is at least as unusual as the $\bar{X}$ we have.

5) Reject H0 for small P.

Example: A car manufacturer claims a model gets 35 mpg. A consumer group wishes to test this claim. We measure 14 cars, find $\bar{X}$ = 34.271 mpg, s = 2.915 mpg.

H0?
H1?
t = ?
d.f. = ?
P = ?
Conclusion?

Example Minitab output: One-sample tests, intervals

Results for: cholesterol.txt

One-Sample T: Cholesterol in mg/dL

Test of mu = 175 vs not = 175

Variable            N     Mean    StDev  SE Mean  95% CI                 T      P
Cholesterol in m   20  205.800   48.392   10.821  (183.152, 228.448)  2.85  0.010
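The SE Mean, T, and P figures in the cholesterol output can be cross-checked by hand. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_two_sided_p` are made up for this example, and software such as Minitab computes the same quantity more directly.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value, using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|, by symmetry
    return 1 - central_area


n, xbar, s, mu0 = 20, 205.800, 48.392, 175.0
se = s / sqrt(n)
t_stat = (xbar - mu0) / se
p_value = t_two_sided_p(t_stat, df=n - 1)

print(f"SE Mean = {se:.3f}, T = {t_stat:.2f}, P = {p_value:.3f}")
```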

Type I and Type II Errors (6.6)

We can make two different kinds of error when hypothesis testing, depending on our decision and the actual (unknown) truth:

                              Truth
Decision              H0 True             H1 True
Reject H0             Type I Error        Correct Decision
Fail to Reject H0     Correct Decision    Type II Error

Type I Error is generally considered more serious. This may influence the choice of H0 and H1.

Our significance level, $\alpha$, is the probability of Type I error we are willing to accept (when the null hypothesis is true).

Note that this does not control Type II error at all. The probability of Type II error may be very large, especially for small n.

It might help to think about the possible consequences of the different errors, by considering the (usually nonstatistical) example of a jury trial.

                                     Truth
Decision                      H0 True                 H1 True
                              (Defendant Innocent)    (Defendant Guilty)
Reject H0 (Convict)           Type I Error            Correct Decision
Fail to Reject H0 (Acquit)    Correct Decision        Type II Error

Power (6.7)

Definition: The power of a significance test with significance level $\alpha$ is
power = P(reject H0 | H0 false)
= 1 - P(Type II Error | H0 false).

If there is more than one value of $\mu$ associated with H1, power will generally be computed for a specific value of $\mu$.

Large power is good, and power will increase as the sample size increases.

There is no firm rule about power, but it is desirable to have a power of at least 0.8 or 0.9 for a difference which is big enough to be important.

Power should be computed prior to conducting an experiment whenever possible, to verify that the experiment will probably show results if the difference you desire or anticipate exists.

To compute power, we must:

1. Decide on a significance level, $\alpha$.
2. Compute the rejection region, the set of possible values of $\bar{X}$ which would lead to rejecting H0.
3. Compute the probability of finding $\bar{X}$ in the rejection region, given the specified value of $\mu$.

See your book for a full example.
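The three steps above can be sketched for an upper-tailed large-sample z-test. All of the numbers here (null mean 80, true mean 82, $\sigma$ = 5, n = 50) are hypothetical values chosen for illustration and do not come from the course examples; the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of the z-test of H0: mu <= mu0 vs. H1: mu > mu0 at a given true mean."""
    se = sigma / sqrt(n)
    # Step 2: rejection region is Xbar > cutoff, with the cutoff set under H0
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Step 3: probability that Xbar lands in the rejection region when mu = mu_true
    return 1 - NormalDist(mu_true, se).cdf(cutoff)


power = power_upper_tailed_z(mu0=80, mu_true=82, sigma=5, n=50)
print(f"power at mu = 82: {power:.4f}")
```

Setting `mu_true` equal to `mu0` returns $\alpha$ itself, and increasing `n` raises the power, which matches the slide's remarks.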

Statistical software such as Minitab can compute the power for a specified test, or the sample size necessary to achieve a given power.

Power and Sample Size

1-Sample t Test
Testing mean = null (versus > null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 1.5

            Sample
Difference    Size     Power
                    0.786845

Multiple Testing (6.8)

Even if the null hypothesis is true, there is a 5% chance that the result of a test will have a p-value of less than 0.05.

If we carry out 200 tests, we expect 5%, or 10, to be significant, even if the null is true every time.

This can be useful in exploration, but the significant results should be reconfirmed with an additional study with new data.

Example: Researchers looked at asbestos fibers in water and many cancer rates. They did many tests, only a few of which gave p-values less than 0.05, and only one of which gave less than 0.01.

They concluded there was a strong relationship between lung cancer rates and asbestos concentration, even though their own study suggested that a 100-fold increase in asbestos was accompanied by a 5% increase in lung cancer rate.

If we have multiple tests, we can adjust for this with the Bonferroni correction.

Adjusted P = Original P $\times$ Number of tests

The Bonferroni adjustment is conservative, so if the adjusted P is small, we may still reject H0.

Example: Test 5 new fertilizer formulations for mean yields higher than current standard formulation.

Formulation A: P = 0.49
Formulation B: P = 0.24
Formulation C: P = 0.17
Formulation D: P = 0.003
Formulation E: P = 0.53

Adjust Formulation D's P-value: $P_B$ = 5(.003) = .015.

Still small, reject H0. D appears to be an improvement.

Example: As before, but for D, P = .022.

Now $P_B$ = 5(0.022) = 0.11.

Not small enough to be conclusive. Fail to reject H0. D might have been higher by chance.

If the confidence interval suggests practical significance, rerun the experiment for D, and collect new data.

If P is small with new data, results are convincing.
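The Bonferroni adjustment in the two fertilizer examples amounts to a one-line computation. The sketch below applies it to all five formulations; capping the adjusted value at 1 is a common convention added here and is not stated on the slides.

```python
def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return {name: min(1.0, m * p) for name, p in p_values.items()}


first_run = {"A": 0.49, "B": 0.24, "C": 0.17, "D": 0.003, "E": 0.53}
adjusted = bonferroni_adjust(first_run)

# Second example: same tests, but D's original P-value is 0.022
second_run = dict(first_run, D=0.022)
adjusted_second = bonferroni_adjust(second_run)

print(f"D adjusted, first example:  {adjusted['D']:.3f}")
print(f"D adjusted, second example: {adjusted_second['D']:.3f}")
```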

Chapter 7: Two-Sample Inference

In many situations, we may not care about the specific value of the mean of a population, so much as comparing the means of two separate, but related, populations.

We use two-sample inference to investigate these sorts of questions.

Example: Sexual discrimination?

$\mu_M$ = population mean of male salaries
$\mu_F$ = population mean of female salaries
Test H0: $\mu_M \le \mu_F$ vs. H1: $\mu_M > \mu_F$

Example: New fertilizer for corn

$\mu_1$ = average yield for the new treatment
$\mu_0$ = average yield for a common current treatment
Test H0: $\mu_1 \le \mu_0$ vs. H1: $\mu_1 > \mu_0$
Confidence interval on ($\mu_1 - \mu_0$)

Suppose we have two populations or processes.

Population 1 (X) has mean $\mu_X$ and standard deviation $\sigma_X$.

Likewise, population 2 (Y) has mean $\mu_Y$ and standard deviation $\sigma_Y$.

We will compare the individual means by looking at the difference, $\mu_X - \mu_Y$.

We usually do this by collecting independent samples from each population (of sizes $n_X$ and $n_Y$, which may or may not be the same) and computing the sample means ($\bar{X}$ and $\bar{Y}$) and standard deviations ($s_X$ and $s_Y$).

We will estimate the difference of population means, $\mu_X - \mu_Y$, by the difference of the sample means, $\bar{X} - \bar{Y}$.

To do inference, we need to know about the sampling distribution of $\bar{X} - \bar{Y}$.

Recall the following results from Sections 3.4 and 4.3:

1) For any X and Y, $\mu_{X-Y} = \mu_X - \mu_Y$.
   The mean of the difference is the difference of the means.

2) For any independent X and Y, $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$.
   The variance of the difference is the sum of the variances.

3) If X and Y are (approximately) normal, so is the difference.

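Those results, together with what we already know about the sampling distributions of $\bar{X}$ and $\bar{Y}$, give us:

1) $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_X - \mu_Y$.

Show:

2) The standard error of $\bar{X} - \bar{Y}$ is $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}$.

Show: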

3) If both populations are normal, so is the sampling distribution of $\bar{X} - \bar{Y}$.

4) If both sample sizes are large, the Central Limit Theorem tells us that the sampling distribution of $\bar{X} - \bar{Y}$ will be approximately normal no matter what shapes the population distributions have.

5) When standardizing using sample standard deviations from small samples from normal populations, we should continue to use a t-distribution.

Two-Sample z-Tests (7.1)

If we have two independent samples, we construct a two-sample test in much the same way as the one-sample version.

If both samples are large, we may use the normal distribution as we do in the single-sample case.

Remember our five steps of hypothesis testing.

1) We have two populations whose means we wish to compare, and a sample from each population.

Population 1 (X) has mean $\mu_X$ and standard deviation $\sigma_X$.
Likewise, population 2 (Y) has mean $\mu_Y$ and standard deviation $\sigma_Y$.
The sample means $\bar{X}$ and $\bar{Y}$ will be different, even if the population means $\mu_X$ and $\mu_Y$ are the same.
Are $\bar{X}$ and $\bar{Y}$ different enough to provide strong evidence that $\mu_X$ and $\mu_Y$ are different as well?

Example: A psychological test (the Chapin Social Insight Test) is given to a large number of college students, with a desire to see if there is a difference in how men and women score.

Does a test suggest that the populations of college men and women have different means on this test?

2) Identify $H_0$ and $H_1$. We generally rearrange our hypotheses to describe a statement about a difference in the population means.

Two-sided:
$H_0: \mu_X - \mu_Y = \Delta_0$ vs. $H_1: \mu_X - \mu_Y \ne \Delta_0$.

One-sided:
$H_0: \mu_X - \mu_Y \le \Delta_0$ vs. $H_1: \mu_X - \mu_Y > \Delta_0$.

or:
$H_0: \mu_X - \mu_Y \ge \Delta_0$ vs. $H_1: \mu_X - \mu_Y < \Delta_0$.

Usually, $\Delta_0 = 0$, so that $\mu_X = \mu_Y$ is part of $H_0$.

Example: (Students, continued):

We are interested in showing any difference between the men's population mean ($\mu_X$) and the women's population mean ($\mu_Y$), so use a two-sided alternative.

$H_0: \mu_X - \mu_Y = 0$ vs. $H_1: \mu_X - \mu_Y \ne 0$.

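3) Compute the test statistic.

A z (or t) statistic always has the form:

$z = \dfrac{\text{point estimate} - \text{null value}}{\text{standard error}}$

In this case, our parameter is $\mu_X - \mu_Y$.

Our point estimate is $\bar{X} - \bar{Y}$.

Our null value is $\Delta_0$.

The standard error of $\bar{X} - \bar{Y}$ is $\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}$, estimated with $s_X$ and $s_Y$ in place of $\sigma_X$ and $\sigma_Y$ when the samples are large.

Our z statistic is therefore

$z = \dfrac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}$

Example: Students:

z = ?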

4) Find the P-value.

This is still a z-test. Under $H_0$, $z \sim N(0,1)$. Use probabilities on $z^* \sim N(0,1)$, depending on $H_1$:

$H_1$                             P-value
$\mu_X - \mu_Y \ne \Delta_0$      $P(|z^*| \ge |z|) = 2P(z^* \le -|z|)$
$\mu_X - \mu_Y > \Delta_0$        $P(z^* \ge z) = P(z^* \le -z)$
$\mu_X - \mu_Y < \Delta_0$        $P(z^* \le z)$

P is now the probability that, if $H_0$ is true, a new sample would give a difference which is at least as unusual as the one we have.

Example: z = 0.65. P = ?
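As a rough check on table lookups, the three P-value rules for a z statistic can be written with Python's standard library. The function name `z_test_p_value` and the labels for the alternatives are assumptions made for this sketch.

```python
from statistics import NormalDist


def z_test_p_value(z: float, alternative: str = "two-sided") -> float:
    """P-value for a z statistic under H0, where z* ~ N(0, 1)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))   # 2 P(z* <= -|z|)
    if alternative == "greater":
        return std_normal.cdf(-z)            # P(z* >= z) = P(z* <= -z)
    if alternative == "less":
        return std_normal.cdf(z)             # P(z* <= z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# The students example uses a two-sided alternative with z = 0.65.
p_two_sided = z_test_p_value(0.65)
print(round(p_two_sided, 4))
```

For z = 0.65 the two-sided value comes out a little above 0.5, so a difference this size would be quite ordinary if $H_0$ were true.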
5) Reject $H_0$ for small P.

The interpretation of P is exactly the same as for any other hypothesis test.
If $P \le \alpha$, the evidence is pretty strong against $H_0$, and we say we reject $H_0$ (at the $\alpha$ level). We have strong evidence of $H_1$.
If $P > \alpha$, our test statistic is pretty reasonable under $H_0$, so we fail to reject $H_0$. $H_0$ is plausible (although probably so is $H_1$).

Example: What does our P say about the populations of student men and women?

Example: It is claimed that (on average) tensile strength should be at least 8 N/mm$^2$ greater for 12mm-diameter steel rods than for 10mm-diameter rods. Samples of size 50 give:

Can we confirm the claim?

Two-Sample Confidence Intervals

Instead of, or in addition to, a two-sample test, we may desire a confidence interval for the difference of the population means, $\mu_X - \mu_Y$.

The same results that allow us to conduct a test also allow computation of this interval.

A $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$ will be

$(\bar{X} - \bar{Y}) \pm z_{\alpha/2}\sqrt{s_X^2/n_X + s_Y^2/n_Y}$

Interpretation of the confidence level is exactly the same as in the one-sample case.

Example: Students:

Find a 95% confidence interval for the difference in population means between men and women.

Example: Steel rods:

Find a 99% lower bound on the difference in population means.

Two-Sample t-Tests and Intervals (7.3)

If at least one of the sample sizes is small, we run into the same dangers in estimating the standard error from the sample as we do in the single-sample case.

We again use a t-table to compensate.

A two-sample t-test is conducted exactly as a two-sample z-test, but uses t probabilities for P-values.

We find P from the t-table or a computer package such as Minitab, just as with the one-sample t-test.

The degrees of freedom to be used can be calculated as

$\nu = \dfrac{\left(s_X^2/n_X + s_Y^2/n_Y\right)^2}{\dfrac{(s_X^2/n_X)^2}{n_X - 1} + \dfrac{(s_Y^2/n_Y)^2}{n_Y - 1}}$, rounded down to the nearest integer.

Confidence intervals use critical values $t_{\nu,\alpha/2}$.

Example: Kudzu pulp yield:

Researchers are investigating using kudzu as an alternative to wood pulp in paper production.
They wish to determine if adding the chemical anthraquinone increases the pulp yield.

Treatment: 25 experiments with

Control: 20 experiments without

Is the anthraquinone increasing the population mean yield? Conduct a two-sample t-test.

Construct a 95% confidence interval for mean improvement.

Inference with Paired Data (7.4)

Sometimes, we can improve an estimate of population differences by arranging to collect the data in a paired fashion.

Each observation from population 1 (X) should be paired with an observation from population 2 (Y).

This will be effective if the pairing is such that the pairs tend to be correlated.

Example: Heart rate data

Compare two drugs for effect on heart rate reduction.
For samples of size 20, we could get 40 volunteers and divide them at random into two groups to get independent samples.
If drug response varies substantially from subject to subject, it may be better to give both drugs to each subject (on different occasions, in random order).
This reduces the effect of subject variability, and is probably cheaper and easier as well!

Other examples:

Test tire wear for two brands by putting Brand A on left front wheel and Brand B on right front wheel (or vice-versa, at random) on the same cars.

Compare airplane deicing procedures by using one method on left wing, other on right wing of the same planes.

Dealing with paired data $(X_i, Y_i)$ is actually simpler than dealing with two independent samples.

For each observation, compute the difference $D_i = X_i - Y_i$, and then conduct a one-sample z- or t-test or construct a one-sample z- or t-confidence interval on the differences $D_i$, depending on the number of pairs.

This will give us a test or interval for $\mu_D = \mu_X - \mu_Y$.

Example: (Heart rate data, continued):

Let $X_i$ be the percent rate reduction from the standard drug for subject i, and let $Y_i$ be the same for the new drug. Let $D_i = X_i - Y_i$.

Data:

Patient   Xi     Yi     Di
1         28.5   34.8   -6.3
2         26.6   37.3   -10.7
:         :      :      :
20        40.1   40.8   -0.7

Example (Heart rate, continued):

Is there enough evidence to conclude that the new drug is more effective at reducing heart rate than the old one on average?

Construct a 95% paired t-interval for $\mu_d = \mu_1 - \mu_2$.

Example Minitab output: Two-sample tests, intervals

Results for: kudzu.txt

Two-Sample T-Test and CI: With, Without

Two-sample T for With vs Without

          N   Mean  StDev  SE Mean
With     25  44.18   3.99     0.80
Without  20  38.56   3.63     0.81

Difference = mu (With) - mu (Without)
Estimate for difference:  5.62500
95% lower bound for difference:  3.71000
T-Test of difference = 0 (vs >): T-Value = 4.94  P-Value = 0.000  DF = 42
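The T-Value and DF in this output can be approximately reproduced from the printed summary statistics. The sketch below assumes the unpooled (Welch) form of the two-sample t statistic with degrees of freedom rounded down; the function name `welch_t_summary` is hypothetical. Because the printed means are rounded, the result agrees with Minitab only to about two decimals.

```python
import math


def welch_t_summary(mean_x, sd_x, n_x, mean_y, sd_y, n_y, null_diff=0.0):
    """Two-sample t statistic and degrees of freedom (rounded down) from summaries."""
    var_term_x = sd_x ** 2 / n_x   # s_X^2 / n_X
    var_term_y = sd_y ** 2 / n_y   # s_Y^2 / n_Y
    std_error = math.sqrt(var_term_x + var_term_y)
    t_value = (mean_x - mean_y - null_diff) / std_error
    df_exact = (var_term_x + var_term_y) ** 2 / (
        var_term_x ** 2 / (n_x - 1) + var_term_y ** 2 / (n_y - 1)
    )
    return t_value, math.floor(df_exact), std_error


# Kudzu summary statistics as printed in the Minitab output above.
t_stat, df, se = welch_t_summary(44.18, 3.99, 25, 38.56, 3.63, 20)
print(round(t_stat, 2), df, round(se, 3))
```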

Results for: heartrate.txt

Paired T-Test and CI: StdDrug, NewDrug

Paired T for StdDrug - NewDrug

             N      Mean    StDev  SE Mean
StdDrug     40   31.1825   4.8318   0.7640
NewDrug     40   33.8375   4.9379   0.7808
Difference  40  -2.65500  3.73012  0.58978

99% CI for mean difference: (-4.25208, -1.05792)
T-Test of mean difference = 0 (vs not = 0): T-Value = -4.50  P-Value = 0.000
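The paired analysis is a one-sample t procedure on the differences, so the output above can be checked from the "Difference" row alone. In this sketch the function name is hypothetical, and the critical value 2.708 for 39 degrees of freedom is taken from a standard t table rather than from the slides.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n_pairs, null_diff=0.0):
    """One-sample t statistic computed on the paired differences D_i."""
    std_error = sd_diff / math.sqrt(n_pairs)
    t_value = (mean_diff - null_diff) / std_error
    return t_value, n_pairs - 1, std_error


# "Difference" row of the Minitab output: N = 40, mean -2.655, StDev 3.73012.
t_stat, df, se = paired_t_from_summary(-2.655, 3.73012, 40)

# 99% interval; 2.708 is an assumed table value for t with 39 d.f. and 0.005 in each tail.
T_CRIT_39_005 = 2.708
ci_low = -2.655 - T_CRIT_39_005 * se
ci_high = -2.655 + T_CRIT_39_005 * se
print(round(t_stat, 2), df, round(ci_low, 3), round(ci_high, 3))
```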

Chapter 9: Analysis of Variance

In Chapter 7, we looked at comparing the means of samples from two populations.

Suppose we have samples from more than two populations, and we wish to test whether all of the populations have the same mean.

We use an extension of two-sample tests called Analysis of Variance (ANOVA).

ANOVA generates P-values using another important distribution, the F distribution.

The F distribution (7.5)

The F distribution is another distribution commonly used in hypothesis tests.

It is a skewed distribution with support on the positive real line.

It has two degrees of freedom parameters, $\nu_1$ and $\nu_2$.

If $X \sim F_{\nu_1,\nu_2}$ (X has an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom),

Table A.6 contains critical values similar to those for the t distributions.

With a separate table needed for each combination of $\nu_1$ and $\nu_2$, we only get critical points for a few values of $\alpha$.

We have tables for $\alpha$ = 0.10, 0.05, 0.01, and 0.001.

F tests are generally upper-tailed, so these generally give us what we need.

Minitab and other packages can give us more precise P-values.

Example: $x \sim F_{5,7}$

For what c is $P(x \ge c) = 0.10$?

For what c is $P(x \ge c) = 0.01$?

If an F statistic is 10 (with degrees of freedom 5 and 7), what can we say about an upper-tailed P-value?

Single Factor Analysis of Variance (9.1)

Suppose we have I populations ($I \ge 3$). The populations are often called levels.

The variable identifying the levels is called a factor.

A factor variable may be categorical, but it also may identify different treatment groups in a controlled experiment.

From each population i we take an independent sample of size $J_i$.

Note: all variances and standard deviations are assumed to be equal.

The total sample size is $N = J_1 + J_2 + \cdots + J_I$.

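We wish to test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ vs.
$H_1$: Two or more of the $\mu_i$ are different.

We begin by estimating the individual population means as $\bar{X}_{i.} = \frac{1}{J_i}\sum_{j=1}^{J_i} X_{ij}$ and the common, or grand, mean (if $H_0$ is true) as $\bar{X}_{..} = \frac{1}{N}\sum_{i=1}^{I}\sum_{j=1}^{J_i} X_{ij}$.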

Example: Artery data

                    Stenosis
                    Level 1   Level 2   Level 3
Flowrate (ml/s)     10.6      11.7      19.6
at collapse          9.7      12.7      15.1
                     8.3      17.6      16.6
                     :         :         :
Ji                  11        14        10
Sample mean         11.209    15.086    17.330

N = 11 + 14 + 10 = 35

The variability between levels is estimated by the treatment sum of squares.

$SSTr = \sum_{i=1}^{I} J_i(\bar{X}_{i.} - \bar{X}_{..})^2$

Here $\bar{X}_{i.}$ is the sample mean of level i and $\bar{X}_{..}$ is the grand mean.

The variability within levels is estimated by the error sum of squares.

$SSE = \sum_{i=1}^{I}\sum_{j=1}^{J_i}(X_{ij} - \bar{X}_{i.})^2 = \sum_{i=1}^{I}(J_i - 1)s_i^2$

The total variability of the dataset is found as the total sum of squares.

$SST = \sum_{i=1}^{I}\sum_{j=1}^{J_i}(X_{ij} - \bar{X}_{..})^2 = (N-1)s^2$

where $s^2$ is the sample variance of the full dataset.

Note that SST = SSTr + SSE.

Example (Artery data):

To test our hypotheses, we compare these two measures of variability in an F-statistic

$F = \dfrac{SSTr/(I-1)}{SSE/(N-I)} = \dfrac{MSTr}{MSE}$

The numerator and denominator of this statistic are called the mean square for treatments and the mean square error, respectively.

If the null hypothesis is true, then the mean square for treatments and the mean square error are both estimates of the common variance, $\sigma^2$, and $F \sim F_{I-1,N-I}$.

Large differences between the level means will lead to large values of F, so the P-value is defined as $P = P(X \ge F)$, where $X \sim F_{I-1,N-I}$.

Example (Artery data):

F = ?

P-value?

Conclusion?

Results for: arteries.txt

One-way ANOVA: Collapse Flowrate versus Amount of Stenosis

Source            DF      SS      MS      F      P
Amount of Stenos   2  204.02  102.01  23.57  0.000
Error             32  138.47    4.33
Total             34  342.49

S = 2.080   R-Sq = 59.57%   R-Sq(adj) = 57.04%

Level     N    Mean  StDev
level 1  11  11.209  1.899
level 2  14  15.086  2.150
level 3  10  17.330  2.168

(Character plot: Individual 95% CIs For Mean Based on Pooled StDev, axis from 10.0 to 17.5)

Pooled StDev = 2.080
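The ANOVA table can be rebuilt from the per-level summary rows (N, Mean, StDev) alone. The sketch below assumes the usual summary-statistic forms of the sums of squares, SSTr as the size-weighted squared deviations of level means from the grand mean and SSE as the sum of $(J_i - 1)s_i^2$; the function name is hypothetical. Rounded inputs mean the results match the Minitab table only approximately.

```python
import math


def one_way_anova_from_summary(sizes, means, sds):
    """Sums of squares, mean squares, and F from per-level summary statistics."""
    n_total = sum(sizes)
    num_levels = len(sizes)
    grand_mean = sum(j * m for j, m in zip(sizes, means)) / n_total
    sstr = sum(j * (m - grand_mean) ** 2 for j, m in zip(sizes, means))
    sse = sum((j - 1) * s ** 2 for j, s in zip(sizes, sds))
    mstr = sstr / (num_levels - 1)
    mse = sse / (n_total - num_levels)
    return {"SSTr": sstr, "SSE": sse, "MSTr": mstr, "MSE": mse, "F": mstr / mse}


# Artery data: level sizes, means, and standard deviations from the output above.
anova = one_way_anova_from_summary(
    [11, 14, 10], [11.209, 15.086, 17.330], [1.899, 2.150, 2.168]
)
pooled_sd = math.sqrt(anova["MSE"])
print({name: round(value, 2) for name, value in anova.items()}, round(pooled_sd, 3))
```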

Pairwise Comparisons (9.2)

If an ANOVA test rejects $H_0$, it suggests that at least some of the level means are different from one another, but does not automatically identify which ones.

We can do two-sample tests at this point, but doing many tests risks false positives (Type I error; recall Section 6.8).

We could use a Bonferroni correction, but this is unnecessarily conservative.

Instead, we should use the Tukey-Kramer multiple comparisons procedure, which adjusts for the number of tests.

Such a procedure will have probability $\alpha$ of one or more Type I errors (false significant differences) out of the full set.

The more tests conducted, the smaller the individual probabilities of Type I error (and the more conservative the individual tests) must be.

In addition to the level means and sample sizes, we need an estimate of the common variance, $\sigma^2$.

The mean square error, MSE = SSE/(N - I), is an estimate of $\sigma^2$, so we use that.

We also need a critical value from Table A.7, the Studentized range distribution, which adjusts for the multiple comparisons.

We use $q_{I,N-I,\alpha}$.

A $100(1-\alpha)\%$ confidence interval for the difference between level means $\mu_i$ and $\mu_j$ is

$\bar{X}_{i.} - \bar{X}_{j.} \pm q_{I,N-I,\alpha}\sqrt{\dfrac{MSE}{2}\left(\dfrac{1}{J_i} + \dfrac{1}{J_j}\right)}$

Then $\mu_i$ and $\mu_j$ are considered significantly different at level $\alpha$ if this interval does not include 0.

Example: (Artery data)

$q_{3,32,.05} \approx 3.49$

Find 95% simultaneous c.i.'s for:

$\mu_2 - \mu_1$:
$\mu_3 - \mu_1$:
$\mu_3 - \mu_2$:

Which levels are significantly different?

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Amount of Stenosis

Individual confidence level = 98.06%

Amount of Stenosis = level 1 subtracted from:

Amount of Stenosis  Lower  Center  Upper
level 2             1.814   3.877  5.939
level 3             3.884   6.121  8.357

Amount of Stenosis = level 2 subtracted from:

Amount of Stenosis  Lower  Center  Upper
level 3             0.125   2.244  4.364

(Character plots of these intervals, on an axis from -3.5 to 7.0.)
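These intervals can be approximated by hand. The sketch assumes the Tukey-Kramer half-width $q\sqrt{(MSE/2)(1/J_i + 1/J_j)}$, with the rounded inputs MSE = 4.33 and q = 3.49 from the artery example; the function name is hypothetical. Because q is only a two-decimal table value, the endpoints differ from Minitab's in the third decimal.

```python
import math


def tukey_kramer_interval(mean_i, mean_j, n_i, n_j, mse, q_crit):
    """Simultaneous interval for mu_i - mu_j (Tukey-Kramer form, assumed here)."""
    center = mean_i - mean_j
    half_width = q_crit * math.sqrt(mse / 2 * (1 / n_i + 1 / n_j))
    return center - half_width, center + half_width


MSE = 4.33      # mean square error from the ANOVA table
Q_CRIT = 3.49   # approximate q_{3,32,.05} quoted in the example

ci_2_vs_1 = tukey_kramer_interval(15.086, 11.209, 14, 11, MSE, Q_CRIT)
ci_3_vs_1 = tukey_kramer_interval(17.330, 11.209, 10, 11, MSE, Q_CRIT)
ci_3_vs_2 = tukey_kramer_interval(17.330, 15.086, 10, 14, MSE, Q_CRIT)

for label, (low, high) in [("2-1", ci_2_vs_1), ("3-1", ci_3_vs_1), ("3-2", ci_3_vs_2)]:
    # A pair is flagged as different when its interval excludes 0.
    print(label, round(low, 2), round(high, 2), "differ" if low > 0 or high < 0 else "not shown to differ")
```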

Model Assumptions

Remember that ANOVA is based on a model of independent draws from normal populations with a common variance.

Minor deviations from normality or a common variance will not have a strong effect, but large deviations will require other techniques.

Histograms, boxplots, and prior experience can all be useful guides.

Inference for Population Proportions

We used the known sampling distribution of $\bar{X}$ to construct confidence intervals for its related parameter, $\mu$.

We can do the same thing with the sampling distribution of the sample proportion, $\hat{p}$, to construct tests and confidence intervals for a population proportion or probability, p.

Z-tests for Proportions (6.3)

Recall from 5.1, if $X \sim$ Binomial(n, p) and $\hat{p} = X/n$, then $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.

If $np \ge 10$ and $n(1 - p) \ge 10$, then n is large enough that the Central Limit Theorem tells us that $\hat{p}$ is approximately normal as well.

We can use this sampling distribution to do hypothesis tests on p. Since we use the normal table, these will be z-tests.

1) We have a single population and a specific value, $p_0$, we wish to consider for the population probability of success or population proportion of successes.

A sample from the population will give a sample proportion different from $p_0$, even if that is the actual population proportion.
Example: We have a possibly biased coin. We wish to test whether or not $p = P(\text{Heads}) = 0.5 = p_0$.

2) Identify $H_0$ and $H_1$.

Set up $H_1$ as the statement that something interesting is going on, or what we wish to prove.
Choose a one-sided or two-sided alternative, depending on our purpose.
Two-sided: $H_0: p = p_0$ vs. $H_1: p \ne p_0$.
One-sided: $H_0: p \le p_0$ vs. $H_1: p > p_0$
or: $H_0: p \ge p_0$ vs. $H_1: p < p_0$
Example: Coin: $H_0: p = 0.5$ vs. $H_1: p \ne 0.5$.

3) Compute the test statistic.

Just as with tests on $\mu$, we find the z test statistic

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

Example: 400 coin flips, 176 heads.

$\hat{p}$ = ?
z = ?

4) Find the P-value.

This is still a z-test. Under $H_0$, $z \sim N(0,1)$. Use probabilities on $z^* \sim N(0,1)$, depending on $H_1$:

$H_1$          P-value
$p \ne p_0$    $P(|z^*| \ge |z|) = 2P(z^* \le -|z|)$
$p > p_0$      $P(z^* \ge z) = P(z^* \le -z)$
$p < p_0$      $P(z^* \le z)$

Now this is the probability that, if $H_0$ is true, a new sample would give a $\hat{p}$ which disagrees with $H_0$ at least as much as the $\hat{p}$ we have.
Example: z = -2.40. P = ?

5) Choose a small $\alpha$, and reject $H_0$ for $P < \alpha$. We have strong evidence against the null hypothesis.

Otherwise, fail to reject $H_0$. The null hypothesis is plausible.
Example:
$H_0: p = 0.5$ vs. $H_1: p \ne 0.5$.
z = -2.40, P = 0.0164.
If $\alpha = 0.05$, what should we conclude?
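The coin example's numbers can be checked with a short script. It assumes the usual z statistic for one proportion, with the standard error computed from the null value $p_0$; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float):
    """Sample proportion, z statistic, and two-sided P-value for H0: p = p0."""
    p_hat = successes / n
    std_error_null = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_error_null
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return p_hat, z, p_two_sided


# 400 coin flips, 176 heads, testing p = 0.5 against a two-sided alternative.
p_hat, z, p_value = one_proportion_z_test(176, 400, 0.5)
print(p_hat, round(z, 2), round(p_value, 4))
```

The result agrees with the slide's z = -2.40 and P = 0.0164, which falls below 0.05.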
Example: 600 students, 217 knew the author of The Canterbury Tales. Should we believe that more than 1/3 of all students know this?

Minitab will do the same test using exact binomial probabilities for a slightly more accurate P-value.

Test and CI for One Proportion

Test of p = 0.5 vs p not = 0.5

                                                    Exact
Sample    X    N  Sample p                95% CI  P-Value
1       176  400  0.440000  (0.390707, 0.490187)    0.019

Test and CI for One Proportion

Test of p = 0.5 vs p not = 0.5

Sample    X    N  Sample p                95% CI  Z-Value  P-Value
1       176  400  0.440000  (0.391355, 0.488645)    -2.40    0.016

Minitab may also be used to conduct power calculations for tests of proportions.

Power and Sample Size

Test for One Proportion

Testing proportion = 0.5 (versus not = 0.5)
Alpha = 0.05

Alternative  Sample  Target
 Proportion    Size   Power  Actual Power
       0.55     783     0.8      0.800239

Confidence Intervals for Population Proportions (5.3)

We have:

$\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0,1)$ (approximately, for large n)

Isolating p in the probability statement is trickier than in the case for $\mu$, because it appears in both the numerator and the denominator.

In the past, people usually replaced the unknown p's in the standard error with the known $\hat{p}$, so the 95% confidence interval for p would be

$\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$

This traditional interval has the same format as the $\mu$-interval:

$\bar{X} \pm 1.96\, s/\sqrt{n}$

It works well for large n, but is no longer recommended.

Unfortunately, the probability of containing p in the interval can be well below 95% for smaller n.

It turns out this can be corrected by adding 2 successes and 2 failures to our counts:

$\tilde{n} = n + 4$, $\tilde{p} = \dfrac{X + 2}{n + 4}$

Then a $100(1-\alpha)\%$ confidence interval for p will be

$\tilde{p} \pm z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/\tilde{n}}$

for all n.

Example: In a blind taste test, subjects choose between instant and fresh-brewed coffee. Out of 40 subjects, 12 prefer the instant coffee. If p is the probability that a random person prefers instant coffee in a blind test, find a 95% confidence interval for p.
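One way to work this example is sketched below. It assumes the adjusted interval takes the form $\tilde{p} \pm z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$ with $\tilde{p} = (X+2)/(n+4)$, following the "add 2 successes and 2 failures" description; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def plus_four_interval(successes: int, n: int, confidence: float = 0.95):
    """Confidence interval for p after adding 2 successes and 2 failures."""
    n_tilde = n + 4
    p_tilde = (successes + 2) / n_tilde
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z_crit * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


# Coffee taste test: 12 of 40 subjects prefer instant coffee.
ci_low, ci_high = plus_four_interval(12, 40)
print(round(ci_low, 3), round(ci_high, 3))
```

The same function can be reused for the coin-toss examples by changing the counts.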

Example: Suppose we flip a (possibly biased) coin. Let p be the probability of a head. If 100 tosses result in 45 heads, find a 95% confidence interval for p. Is it plausible that our coin could be fair?

If 1000 tosses give 450 heads, now what is our confidence interval for p?

If we have an idea of the value of p, and we require a 95% confidence interval of error bound (interval half-width) no more than w, we can compute a minimum sample size.

Of course, we can substitute the appropriate z critical value to find sample sizes for other confidence levels.

If p is unknown, we should replace it with the conservative value 0.5, and require a minimum sample size of

The associated error bound of

is the margin of error that surveys and polls generally report.

Example: If we take a survey and want a 2% margin of error (w = 0.02), how big a sample must we take?

What if we're willing to settle for a 3% margin of error?

Two-Proportion Inference (7.2)

Just as we can use hypothesis tests and confidence intervals to compare means of two related populations, we may also use them to compare two related binomial probabilities or population proportions.

Suppose $X \sim B(n_X, p_X)$, and $Y \sim B(n_Y, p_Y)$.

We estimate the probabilities with $\hat{p}_X = X/n_X$ and $\hat{p}_Y = Y/n_Y$.

If we want a confidence interval on the difference between two proportions, we can use the fact that for independent X and Y with large $n_X$ and $n_Y$,

$\hat{p}_X - \hat{p}_Y \sim N\!\left(p_X - p_Y,\; \dfrac{p_X(1-p_X)}{n_X} + \dfrac{p_Y(1-p_Y)}{n_Y}\right)$ (approximately).

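Recall that for a modern confidence interval on a single proportion, we used an alternative estimate of p, adding two successes and two failures to our counts:

$\tilde{p} = \dfrac{X + 2}{n + 4}$

With two samples, we distribute the extras between the two estimates:

$\tilde{p}_X = \dfrac{X + 1}{n_X + 2}$, $\tilde{p}_Y = \dfrac{Y + 1}{n_Y + 2}$

Our $100(1-\alpha)\%$ confidence interval is

$\tilde{p}_X - \tilde{p}_Y \pm z_{\alpha/2}\sqrt{\dfrac{\tilde{p}_X(1-\tilde{p}_X)}{n_X + 2} + \dfrac{\tilde{p}_Y(1-\tilde{p}_Y)}{n_Y + 2}}$

Example: Are rural households more likely to use a natural Christmas tree than urban ones?

Rural: $n_X = 160$, X = 64
Urban: $n_Y = 261$, Y = 89
Find a 95% confidence interval for $p_X - p_Y$.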
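A sketch of this calculation follows. It assumes that "distributing the extras" means adding one success and one failure to each sample, so that each adjusted proportion is (count + 1)/(n + 2); that reading, and the function name, are assumptions rather than something the surviving slide text spells out.

```python
import math
from statistics import NormalDist


def two_proportion_interval(x: int, n_x: int, y: int, n_y: int, confidence: float = 0.95):
    """Interval for p_X - p_Y, adding 1 success and 1 failure to each sample (assumed)."""
    p_x = (x + 1) / (n_x + 2)
    p_y = (y + 1) / (n_y + 2)
    std_error = math.sqrt(p_x * (1 - p_x) / (n_x + 2) + p_y * (1 - p_y) / (n_y + 2))
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    center = p_x - p_y
    return center - z_crit * std_error, center + z_crit * std_error


# Christmas tree example: rural 64 of 160, urban 89 of 261.
ci_low, ci_high = two_proportion_interval(64, 160, 89, 261)
print(round(ci_low, 4), round(ci_high, 4))
```

Under these assumptions the interval straddles 0, so this interval alone would not establish a rural-urban difference.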
To test $H_0: p_X = p_Y = p$ vs. $H_1: p_X \ne p_Y$, we must estimate the common null proportion p with the pooled proportion:

$\hat{p} = \dfrac{X + Y}{n_X + n_Y}$

Then our z statistic may be found as

$z = \dfrac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_X} + \dfrac{1}{n_Y}\right)}}$

Compute P-values as for any other z test.

Example: Natural Christmas trees

Rural: $n_X = 160$, X = 64
Urban: $n_Y = 261$, Y = 89
Can we conclude that rural households are more likely to use a natural Christmas tree than urban ones?

Example: Are male college students more likely to be frequent binge drinkers than female students?

Of 9916 college women surveyed, 1684 were classified as frequent binge drinkers.
Of 7180 college men surveyed, 1630 were considered frequent binge drinkers.
Is there a significant difference in the populations?

Minitab:

Test and CI for Two Proportions

Sample     X     N  Sample p
1       1684  9916  0.169827
2       1630  7180  0.227019

Difference = p (1) - p (2)
Estimate for difference:  -0.0571930
95% upper bound for difference:  -0.0469659
Test for difference = 0 (vs < 0):  Z = -9.34  P-Value = 0.000
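The Z value in this output can be reproduced from the counts. The sketch assumes the pooled two-proportion z statistic, with the common proportion estimated by (X + Y)/(n_X + n_Y); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x: int, n_x: int, y: int, n_y: int):
    """Pooled z statistic for H0: p_X = p_Y, with the lower-tailed P-value."""
    p_x = x / n_x
    p_y = y / n_y
    p_pooled = (x + y) / (n_x + n_y)  # estimate of the common null proportion
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_x + 1 / n_y))
    z = (p_x - p_y) / std_error
    p_lower_tail = NormalDist().cdf(z)  # P(z* <= z), for H1: p_X < p_Y
    return z, p_lower_tail


# Sample 1: women (1684 of 9916); sample 2: men (1630 of 7180).
z, p_value = two_proportion_z_test(1684, 9916, 1630, 7180)
print(round(z, 2), p_value)
```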

Chi-Squared Tests (6.5)

A number of common tests use members of the chi-squared family of distributions as null distributions.

We will study chi-squared tests for hypotheses involving categorical variables with more than two categories.

This includes tests of proportions comparing more than two populations.

The Chi-Squared Distribution

The chi-squared distribution is skewed, with support on the positive real line.

It has one degrees of freedom parameter, $\nu$.

If $Y \sim \chi^2_\nu$, then $E(Y) = \nu$ and $V(Y) = 2\nu$.

Table A.5 contains important critical values, $\chi^2_{\nu,\alpha}$.

If $X \sim \chi^2_\nu$, then $P(X \ge \chi^2_{\nu,\alpha}) = \alpha$.

Example: $X \sim \chi^2_3$

$P(X \ge 7.815) = 0.05$
$P(X \ge 11.345) = 0.01$

Example: $X \sim \chi^2_5$

$P(X \ge 9.236)$ = ?

Find c such that $P(X \ge c) = 0.025$.

Chi-Square Tests on One Categorical Variable

In many situations, outcomes are broken down into multiple categories.

If there are only two categories of interest (success / failure), then we study the probability of a success, p, and test using the z-test.

If there are more than two categories of interest, we analyze them in a different way.

Just as with binomial data, we generally don't record each observation individually. Instead, we record the number of times each category occurs, in a contingency table.

Example: A die is rolled 90 times, and we record the numbers rolled. The contingency table might look like:

Roll    1   2   3   4   5   6   Total
Count  14  20  17  10   8  21   n = 90

Example: Factory floor accidents are recorded by day of the week.

Day     M   T   W  Th   F   Total
Count  65  43  48  41  73   n = 270

Example: Snapdragon color genotype may be any of three colors.

Color  Red  Pink  White  Total
Count   57    89     54  n = 200

Suppose we have N independent trials. Each trial may result in category 1 with probability $p_1$, category 2 with probability $p_2$, and so on up to category k with probability $p_k$. (Note: $p_1 + \cdots + p_k = 1$.)

Let $O_i$ be the observed count of trials in category i, $i = 1, \ldots, k$.

Suppose we wished to test whether the die was fair, or whether accidents were equally likely each day of the week, or whether the snapdragons satisfy standard genetic theory.

Such a hypothesis can be written:
$H_0: p_1 = p_{10}, \ldots, p_k = p_{k0}$.

Example (Die): $H_0: p_i = 1/6$, $i = 1, \ldots, 6$.
Example (Factory): $H_0: p_i = 1/5$, $i = 1, \ldots, 5$.
Example (Snapdragons): $H_0: p_1 = .25, p_2 = .5, p_3 = .25$

The many-sided alternative, $H_1$, here is simply that at least one of the probabilities is incorrect. This is often left implied.

We test these hypotheses by comparing the observed cell frequencies $O_1, \ldots, O_k$, with expected cell frequencies $E_1, \ldots, E_k$.

If the probability of category i is $p_{i0}$, then the expected count in this category from N trials is $E_i = Np_{i0}$.

We need an expected frequency for each cell in our table.

Example (Die): N = 90, $p_{i0} = 1/6$, so $E_i = 90/6 = 15$, $i = 1, \ldots, 6$.

Example (Factory): N = 270, $p_{i0} = 1/5$, so $E_i = 270/5 = 54$, $i = 1, \ldots, 5$.

Note that the $E_i$'s might not be integers, and will only be equal if the $p_{i0}$'s are.

Example (Snapdragons): N = 200, so $E_1 = E_3 = 200(0.25) = 50$, $E_2 = 200(0.5) = 100$.

Once we have an observed and an expected frequency for each cell in our table, we compute a test statistic.

This statistic will compare the two sets of frequencies, with a larger value indicating less similar sets.

We will compute P-values using the chi-square table, so these are referred to as chi-square statistics (and tests).

The chi-square statistic:

$X^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$

$X^2$ gets larger as $O_i$ differs from $E_i$ for any cell, in either direction.

Cells where $E_i$ is larger require a larger difference to contribute as much to $X^2$.

We compute P-values by comparing $X^2$ to a chi-square distribution with (k-1) degrees of freedom (one less than the number of cells).

P-value = $P(\chi^2_{k-1} \ge X^2)$

We can reject $H_0$ at the $\alpha$ level if our statistic is larger than the critical point $\chi^2_{k-1,\alpha}$.

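Example (Die):

Roll    1   2   3   4   5   6   Total
Oi     14  20  17  10   8  21   N = 90
Ei     15  15  15  15  15  15   N = 90

$X^2 = \dfrac{(14-15)^2 + (20-15)^2 + (17-15)^2 + (10-15)^2 + (8-15)^2 + (21-15)^2}{15} = \dfrac{140}{15} \approx 9.33$

d.f. = 5, 0.05 < P < 0.10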
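The die example can be checked numerically. The sketch assumes the usual Pearson form of the statistic, the sum of $(O_i - E_i)^2 / E_i$ over cells, and brackets the P-value with two table critical values for 5 degrees of freedom (9.236 for 0.10 and 11.070 for 0.05; the second is a standard table value not quoted on the slides). The function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: 90 rolls, H0 says each face has probability 1/6.
die_counts = [14, 20, 17, 10, 8, 21]
n_rolls = sum(die_counts)
die_expected = [n_rolls / 6] * 6
die_x2 = chi_square_statistic(die_counts, die_expected)

# Upper-tail critical values for 5 degrees of freedom (table values).
CRIT_5_010 = 9.236
CRIT_5_005 = 11.070
p_between_005_and_010 = CRIT_5_010 < die_x2 < CRIT_5_005

print(round(die_x2, 3), p_between_005_and_010)
```

The same function applies to any one-way table once the expected counts $E_i = N p_{i0}$ are filled in.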
Example (Snapdragons):

Color  Red  Pink  White  Total
Oi      57    89     54  N = 200
Ei

$H_0: p_1 = 0.25, p_2 = 0.5, p_3 = 0.25$

$X^2$ = ?

P?

Two-Way Tables and Testing Independence

Sometimes we collect data on more than one categorical variable at once. We can present the results for two such variables in a two-way table, with one variable in rows, the other in columns.

Recall, such data can be plotted in clustered bar charts.

Example: Handedness by Sex

               Men  Women  Total
Right-handed   934   1070   2004
Left-handed    113     92    205
Ambidextrous    20      8     28
Total         1067   1170   2237

We generalize the notation as:

                 Column
Row              1     2     ...   j     ...   J     Row Totals
1                O11   O12   ...   O1j   ...   O1J   O1.
2                O21   O22   ...   O2j   ...   O2J   O2.
:
i                Oi1   Oi2   ...   Oij   ...   OiJ   Oi.
:
I                OI1   OI2   ...   OIj   ...   OIJ   OI.
Column Totals    O.1   O.2   ...   O.j   ...   O.J   O.. = N

The null hypothesis we usually wish to test from such two-way data is that the two variables are independent. That is, the probability of seeing one level in variable 1 does not depend on the level in variable 2 and vice-versa.

We test this hypothesis against an alternative of dependence using another chi-square test.

Note: We have no specific probabilities in mind here.

Alternatively, if one of our variables represents several related populations, our null hypothesis is that these populations are homogeneous with respect to the remaining variable.

That is, the probability for each category is the same for each population.

This will be tested exactly the same way as independence.

The observed cell frequencies are the counts $O_{ij}$.

The expected cell frequencies for a test of independence are found as

$E_{ij} = \dfrac{O_{i.}\,O_{.j}}{N}$ (row total times column total, divided by the grand total).

We still compute $X^2$ as before, summing over all cells.

We use a chi-square with $(I - 1)(J - 1)$ degrees of freedom to find our P-value.

Example: Handedness by Sex

               Men  Women  Total
Right-handed   934   1070   2004
Left-handed    113     92    205
Ambidextrous    20      8     28
Total         1067   1170   2237

$X^2$?

d.f.?

P?

Minitab:

Chi-Square Test: Men, Women

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           Men    Women  Total
    1      934     1070   2004
        955.86  1048.14
         0.500    0.456

    2      113       92    205
         97.78   107.22
         2.369    2.160

    3       20        8     28
         13.36    14.64
         3.306    3.015

Total     1067     1170   2237

Chi-Sq = 11.806, DF = 2, P-Value = 0.003
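The Minitab result can be reproduced from the observed counts. In this sketch the expected counts are taken as (row total)(column total)/N, and the count of 8 ambidextrous women is inferred from the row and column totals. The closed-form P-value used at the end, $e^{-x/2}$, is exact only for a chi-square with 2 degrees of freedom, which happens to be the case here. The function name is hypothetical.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


# Rows: right-handed, left-handed, ambidextrous; columns: men, women.
handedness = [[934, 1070], [113, 92], [20, 8]]
x2, df = chi_square_independence(handedness)

# Upper-tail probability; this shortcut is valid only because df == 2.
p_value = math.exp(-x2 / 2)
print(round(x2, 3), df, round(p_value, 3))
```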


Ch. 8: Inference in Regression

Recall our analyses of Chapter 2, where we looked at bivariate data.

Example:

January and February inflows of the Nile river at a location.

Regression involves modeling and predicting the values of one response variable, based on the observed values of one or more other explanatory variables.

Inference in Simple Linear Regression

The simple linear regression model fits a straight line to a set of paired data observations.

Formally (slide 471):

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\beta_0$ and $\beta_1$ are (unknown) constants
$\varepsilon_1, \ldots, \varepsilon_n$ are assumed to be independent draws from a $N(0, \sigma^2)$ distribution.
$y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$
$E(y_i) = \beta_0 + \beta_1 x_i$

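The most common way to estimate $\beta_0$ and
$\beta_1$ uses the least squares fit, minimizing

$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$

This leads to the least squares estimates,

$$\hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.$$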
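
As an illustration, the least squares estimates can be computed directly from the standard closed-form expressions (slope = Sxy / Sxx, intercept = ybar - slope * xbar). This is a minimal sketch; the function name `least_squares` and the five data points are hypothetical and are not the Nile data.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx: sum of squared deviations of x; Sxy: sum of cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical paired observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```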

Recall: Given a data set $(x_i, y_i)$ and an
associated fitted regression model, the fitted
value for observation i is

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i.$$

The residual for i is

$$e_i = y_i - \hat{y}_i.$$

The error sum of squares (SSE) is found as

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

An estimate of $\sigma$, the standard deviation
around the regression line, is

$$s = \sqrt{\frac{SSE}{n-2}}.$$

Also recall the total sum of squares, SST:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

and the regression sum of squares, SSR:

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,$$

which give us the computing formula
SSE = SST - SSR.

The coefficient of determination, $r^2$, measures
the proportion of the total variation of y which
is explained by x:

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$
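
The quantities recalled above (fitted values, residuals, SSE, s, SST, SSR, and the coefficient of determination) can be computed in a few lines. This is a sketch on hypothetical data, not the Nile data, using the standard definitions.

```python
import math

# Hypothetical paired observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]               # fitted values
residuals = [y - f for y, f in zip(ys, fitted)]  # residuals e_i

sse = sum(e ** 2 for e in residuals)             # error sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)          # total sum of squares
ssr = sst - sse                                  # regression sum of squares
s = math.sqrt(sse / (n - 2))                     # estimate of sigma
r2 = ssr / sst                                   # coefficient of determination

print(f"SSE = {sse:.4f}, SST = {sst:.4f}, s = {s:.4f}, r^2 = {r2:.4f}")
```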

Inference in simple linear regression
usually focuses on $\hat\beta_1$, the estimate of the
slope parameter $\beta_1$, which measures how
much y changes for a one-unit change in x.

Just like other sample statistics, $\hat\beta_1$ has a
sampling distribution.

Under our model, this distribution is known
and may be used to construct confidence
intervals and hypothesis tests.

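Under the standard regression model (slide
471),

$$t = \frac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} \sim t_{n-2}, \qquad s_{\hat\beta_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}.$$

We may test $H_0: \beta_1 = \beta_{10}$ using test statistic t
with $\beta_{10}$ in place of $\beta_1$ and using a t table for
n-2 degrees of freedom to find a P-value.

Most commonly, $\beta_{10} = 0$.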
Example: Nile data (testing $H_0: \beta_1 = 0$):

n = 115
$\hat\beta_1$ = .836
s = .331
$\sum (x_i - \bar{x})^2$ = 119.25

$s_{\hat\beta_1}$ = ?
t = ?

d.f. = ?

P?
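
A short sketch of the arithmetic for this example, assuming that the value 119.25 on the slide is the sum of squared deviations of x (this reading reproduces the standard error in the Minitab output for the same data). The P-value itself would come from a t table with the computed degrees of freedom.

```python
import math

n = 115
b1_hat = 0.836   # estimated slope
s = 0.331        # estimated standard deviation around the line
sxx = 119.25     # assumed to be the sum of (x_i - x_bar)^2

se_b1 = s / math.sqrt(sxx)      # standard error of the slope
t_stat = (b1_hat - 0) / se_b1   # test statistic for H0: beta_1 = 0
df = n - 2

print(f"SE = {se_b1:.4f}, t = {t_stat:.2f}, d.f. = {df}")
```

Because the inputs are rounded, the t statistic differs slightly from the value a package would report from the unrounded data.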

The regression equation is
February = - 0.470 + 0.836 January

Predictor      Coef  SE Coef      T      P
Constant    -0.4698   0.1257  -3.74  0.000
January     0.83617  0.03027  27.63  0.000

Can we conclude that the population slope
$\beta_1$ is greater than 0.8?

We can use the distribution of t to
construct a confidence interval for $\beta_1$ as

$$\hat\beta_1 \pm t_{\alpha/2,\, n-2} \, s_{\hat\beta_1}.$$

Example: Nile data (95% c.i. for $\beta_1$):
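
Both Nile questions (is the slope greater than 0.8, and what is a 95% confidence interval for it) can be sketched using the coefficient and standard error from the Minitab output. The two critical values below are approximate t-table values for 113 degrees of freedom, interpolated between the 100 and 120 d.f. rows; they are assumptions of this sketch, not numbers from the slides.

```python
b1_hat = 0.83617   # slope estimate from the Minitab output
se_b1 = 0.03027    # its standard error from the Minitab output
n = 115
df = n - 2

# One-sided test of H0: beta_1 = 0.8 against H1: beta_1 > 0.8.
t_stat = (b1_hat - 0.8) / se_b1
T_CRIT_ONE_SIDED_05 = 1.658   # approximate upper 5% point, 113 d.f.
reject_h0 = t_stat > T_CRIT_ONE_SIDED_05

# 95% confidence interval for beta_1.
T_CRIT_TWO_SIDED_95 = 1.981   # approximate upper 2.5% point, 113 d.f.
margin = T_CRIT_TWO_SIDED_95 * se_b1
ci = (b1_hat - margin, b1_hat + margin)

print(f"t = {t_stat:.3f}, reject H0 at the 5% level: {reject_h0}")
print(f"95% c.i. for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the test statistic falls short of the critical value, and the interval contains 0.8, so the two calculations tell a consistent story.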

Inference in Correlation

Recall the definition of the sample correlation
coefficient:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}.$$

Then

.
The population correlation is denoted $\rho$.

Suppose we wish to test the hypotheses
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$.

We can test this with a t test.

Our test statistic is

$$U = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}.$$

Under $H_0$, $U \sim t_{n-2}$, so we use the t table to
estimate our P-value.
Example: State data: n = 50

High school graduation vs. murder:

r = -.488
U = ?

d.f. = ?
P?
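
A minimal sketch of the computation for the state data, using the standard t statistic for testing zero correlation, U = r * sqrt(n - 2) / sqrt(1 - r^2). The P-value would then be read from a t table with the computed degrees of freedom.

```python
import math

n = 50
r = -0.488   # sample correlation, HS graduation vs. murder

# Test statistic for H0: rho = 0.
u = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
df = n - 2

print(f"U = {u:.3f}, d.f. = {df}")
```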

Correlations: Illiteracy, HS Grad, Murder

           Illiteracy  HS Grad
HS Grad        -0.657
                0.000

Murder          0.703   -0.488
                0.000    0.000

Cell Contents: Pearson correlation
               P-Value
Residual Plots (8.2)

Examination of residuals is an important
check on a regression analysis.

Generally, we look at a residual plot of $x_i$
(or $\hat{y}_i$) versus $e_i$.

If the residual plot is flat, even, and
appears random around e = 0, everything
is probably fine.

Residual outliers indicate points which are
not well fit by the model. Check for
explanations, and possibly remove those
points.

Nonlinear patterns in the residual plot
suggest a linear fit is inappropriate.

Funnel-shaped plots mean your data are
heteroscedastic (different scatter), meaning the
standard deviation of y is not constant; it
depends on x. Fitted values may still be
reasonable, but $r^2$ and s may not mean much.

Power Transformations

Skewness (including outliers), curvature,
and heteroscedasticity can often all be
improved by the use of nonlinear
transformations on y, on x, or on both.

Such transformations include options such
as logs, square roots, and reciprocals (1/x).

Apply the transformation to all observations
on this variable.
Example: Tortilla frying time in seconds (x) vs.
moisture content in % (y) shows a curved,
decreasing relationship.

Taking logs of x and y leads to a quite
linear plot.

Finding a good combination of transformations on
x and y may require some experimentation and
patience.

Once we have a linear scatterplot, we may fit the
transformed variables. Inverse transformations
may give us models for x and y.

Example: Tortilla data. Fitting ln(y) to ln(x) gives
us
ln(y) = 4.64 - 1.05 ln(x)
and taking antilogs on each side gives us
$y = 103.4\, x^{-1.05}$.
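
The back-transformation in the tortilla example can be sketched as follows. The function name `predict_moisture` is hypothetical. Exponentiating the rounded intercept 4.64 gives a multiplier near 103.5; the 103.4 on the slide presumably comes from the unrounded fitted intercept.

```python
import math

# Fitted log-log model from the slide: ln(y) = 4.64 - 1.05 ln(x).
b0, b1 = 4.64, -1.05

# Taking antilogs: y = exp(b0) * x ** b1.
multiplier = math.exp(b0)


def predict_moisture(frying_time_seconds):
    """Predicted moisture content (%) on the original scale."""
    return multiplier * frying_time_seconds ** b1


print(f"multiplier = {multiplier:.2f}")
print(f"predicted moisture at x = 10 s: {predict_moisture(10):.2f}%")
```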
Inference in Multiple Regression (8.3)

Often we are interested in the relationship
between a response variable and multiple
explanatory variables.

Visually, we can inspect these
relationships with a matrix plot (or
scatterplot matrix).

Each pair of variables appears as two
scatterplots, one in each orientation.
Example: In late 1980, the city of
Concord, New Hampshire, began a
campaign to encourage water
conservation.

We wish to examine household water
usage (in cubic feet) in 1981 based on
1980 usage and a variety of other
household variables.

We can predict y from multiple x's using
multiple regression.

Our model looks like the one for simple
linear regression, with additional x terms.
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$.

Coefficient $\beta_j$ measures the amount we
expect y to change when increasing $x_j$ by
one unit, while holding all of the other x's
constant.
We fit our coefficients again using the
method of least squares.

The computations are much more
complex than in the simple linear
regression case.

They are most easily represented and
computed using matrix methods: difficult
by hand, but easy for a computer.
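
To give a flavor of the matrix computation, here is a minimal sketch that solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination in plain Python. The function name and the small data set are hypothetical; statistical packages typically use more numerically stable methods than this one.

```python
def fit_multiple_regression(rows, ys):
    """Least squares coefficients [b0, b1, ..., bp] via the normal equations."""
    # Design matrix with a leading 1 in each row for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, ys))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2,
# so the fit should recover those three coefficients.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefs])
```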

Regression Analysis: WATER81 versus WATER80, INCOME, ...

The regression equation is
WATER81 = 412 + 0.489 WATER80 + 0.0193 INCOME - 43.7 EDUCATION
          + 235 PEOPLE81 + 96.6 CHPEOPLE

Predictor       Coef   SE Coef      T      P
Constant       412.0     189.0   2.18  0.030
WATER80      0.48885   0.02638  18.53  0.000
INCOME      0.019271  0.003368   5.72  0.000
EDUCATION     -43.65     13.23  -3.30  0.001
PEOPLE81      234.71     28.00   8.38  0.000
CHPEOPLE       96.56     80.76   1.20  0.232

S = 851.914   R-Sq = 67.5%   R-Sq(adj) = 67.1%

Analysis of Variance

Source           DF          SS         MS       F      P
Regression        5   737617962  147523592  203.27  0.000
Residual Error  490   355620748     725757
Total           495  1093238710

Example: If two households are the same
in all respects except that one includes an
additional person, how much additional
water should we predict the larger
household will use?

Example: Suppose a household:

used 5000 cubic feet of water in 1980,
contained 4 people in both 1980 and 1981,
had a household income of $25,000, and
had a head of household with 12 years of
education.

Predict the water usage for this household
in 1981.
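
A sketch of this prediction, plugging the household's values into the fitted coefficients from the Minitab output. Two readings of the variables are assumed here and are not confirmed by the slides: INCOME is taken to be in dollars, and CHPEOPLE is taken to be the change in the number of people between 1980 and 1981 (so 0 for this household).

```python
# Coefficients from the Minitab regression output for WATER81.
coef = {
    "Constant": 412.0,
    "WATER80": 0.48885,
    "INCOME": 0.019271,
    "EDUCATION": -43.65,
    "PEOPLE81": 234.71,
    "CHPEOPLE": 96.56,
}

# Household described in the example (units are assumptions, see above).
household = {
    "WATER80": 5000,    # cubic feet used in 1980
    "INCOME": 25000,    # assumed to be in dollars
    "EDUCATION": 12,    # years of education, head of household
    "PEOPLE81": 4,      # people in the household in 1981
    "CHPEOPLE": 0,      # assumed: change in people, 1980 to 1981
}

predicted = coef["Constant"] + sum(coef[name] * value for name, value in household.items())
print(f"predicted 1981 water use: {predicted:.0f} cubic feet")
```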

As in simple regression, we measure the
effectiveness of our model by

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

and

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

and combine them with the coefficient of
multiple determination

$$R^2 = 1 - \frac{SSE}{SST}.$$
As in the simple linear regression case, $R^2$
may be interpreted as the proportion of
variance in y explained by our model and
all of the x's.

It is not a simple correlation squared any
more.

Example: What percentage of the
variability in 1981 water usage is
explained by our model?
$R^2$ can never decrease when you add
another x to your model, and it almost
always increases, even if the new x is
related to y only by chance.

Because of this, many analysts prefer the
adjusted $R^2$ for multiple regression,
especially when comparing models with
different numbers of explanatory variables.
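
The R-Sq and R-Sq(adj) values in the water-usage output can be reproduced from its ANOVA table. This sketch uses the usual definition of adjusted R-squared, 1 - [SSE/(n-p-1)] / [SST/(n-1)], which is not stated on the slides; n = 496 is inferred from the total degrees of freedom (495).

```python
# Sums of squares from the ANOVA table for the WATER81 regression.
sse = 355620748    # residual error sum of squares
sst = 1093238710   # total sum of squares
n = 496            # inferred: total d.f. 495 = n - 1
p = 5              # number of explanatory variables

r2 = 1 - sse / sst
# Adjusted R-squared penalizes each additional explanatory variable.
r2_adj = 1 - (sse / (n - p - 1)) / (sst / (n - 1))

print(f"R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * r2_adj:.1f}%")
```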

We can still use t-tests and confidence
intervals on the individual coefficients in a
multiple regression model.

The standard errors are complicated to
compute, but available in output from
Minitab and other packages.

The degrees of freedom for the t-table will
be $n - (p + 1)$.

We can also conduct a test of model utility
on the entire model at once.

The null hypothesis is that all slope
coefficients are 0 (so none of the x's are
useful in predicting y).

The (F) test statistic is

$$F = \frac{SSR/p}{SSE/[n-(p+1)]}.$$

Under $H_0$, F has an F distribution with p
and $n - (p + 1)$ degrees of freedom.
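
As a check, the F statistic for the water-usage model can be recomputed from the sums of squares in its ANOVA table, using F = (SSR/p) / (SSE/[n-(p+1)]). As before, n = 496 is inferred from the total degrees of freedom.

```python
# Sums of squares from the ANOVA table for the WATER81 regression.
ssr = 737617962   # regression sum of squares
sse = 355620748   # residual error sum of squares
n = 496           # inferred: total d.f. 495 = n - 1
p = 5             # number of explanatory variables

msr = ssr / p                  # regression mean square
mse = sse / (n - (p + 1))      # error mean square
f_stat = msr / mse

print(f"F = {f_stat:.2f} on {p} and {n - (p + 1)} d.f.")
```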
The F test and P-value may be found in
the ANOVA table in regression output from
Minitab and other statistical packages.

The test of model utility is also found in the
simple regression case, but is completely
equivalent to the t-test on $\beta_1$ for this case.

Example: What does the test of model
utility say about our multiple regression of
1981 water usage?

What do the t-tests say about the
individual terms in the model?

Interactions and Polynomials

An important special case of multiple
regression involves the use of interaction
(product) terms.
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{1i} x_{2i} + \varepsilon_i$.

The effect of increasing $x_1$ on y depends
on the value of $x_2$.

Interpretation becomes more difficult, but
relationships suggested by interaction
terms may be important and interesting.
Regression Analysis: WATER81 versus WATER80, INCOME, ...

The regression equation is
WATER81 = -769 + 0.974 WATER80 + 0.0213 INCOME + 39.5 EDUCATION
          + 217 PEOPLE81 - 0.0336 WATER80*EDUCATION

Predictor               Coef   SE Coef      T      P
Constant              -768.9     313.4  -2.45  0.014
WATER80               0.9742    0.1090   8.93  0.000
INCOME              0.021263  0.003310   6.42  0.000
EDUCATION              39.55     22.25   1.78  0.076
PEOPLE81              216.57     27.52   7.87  0.000
WATER80*EDUCATION  -0.033617  0.007275  -4.62  0.000

S = 835.152   R-Sq = 68.7%   R-Sq(adj) = 68.4%
What is the effective coefficient (simple
slope) on WATER80 for a household whose
head has 8 years of education?

12 years?

16 years?
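
With an interaction term, the simple slope on WATER80 is the WATER80 coefficient plus the interaction coefficient times the education level. A minimal sketch using the coefficients from the Minitab output (the function name `water80_slope` is hypothetical):

```python
# Coefficients from the interaction model's Minitab output.
B_WATER80 = 0.9742
B_INTERACTION = -0.033617   # coefficient on WATER80*EDUCATION


def water80_slope(education_years):
    """Effective coefficient on WATER80 at a given education level."""
    return B_WATER80 + B_INTERACTION * education_years


slopes = {years: water80_slope(years) for years in (8, 12, 16)}
for years, slope in slopes.items():
    print(f"{years} years of education: slope on WATER80 = {slope:.3f}")
```

The slope shrinks as education increases, which is what the negative interaction coefficient implies.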

Another important special case is polynomial
regression.
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$.

This allows fitting polynomial curves to nonlinear
scatterplots.

Example: Yield (kg/ha) vs. time to harvest (days
after flowering) for paddy, a grain from India.
Polynomial Regression Analysis: Yield versus Time

The regression equation is
Yield = - 1070 + 293.5 Time - 4.536 Time**2

S = 203.883   R-Sq = 79.4%   R-Sq(adj) = 76.2%

Analysis of Variance

Source      DF       SS       MS      F      P
Regression   2  2084779  1042390  25.08  0.000
Error       13   540388    41568
Total       15  2625168
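
One use of a fitted quadratic is to locate the time at which predicted yield peaks. The slides do not ask this question; the sketch below is only an illustration of working with the fitted equation, using the vertex of the parabola at Time = -b1 / (2 * b2).

```python
# Fitted quadratic from the Minitab output:
# Yield = -1070 + 293.5 Time - 4.536 Time**2
b0, b1, b2 = -1070.0, 293.5, -4.536


def predicted_yield(time_days):
    """Predicted yield (kg/ha) at a given time to harvest (days)."""
    return b0 + b1 * time_days + b2 * time_days ** 2


# The parabola opens downward (b2 < 0), so its vertex is a maximum.
peak_time = -b1 / (2 * b2)
peak_yield = predicted_yield(peak_time)

print(f"predicted yield peaks near {peak_time:.1f} days at about {peak_yield:.0f} kg/ha")
```

Any such prediction is only trustworthy within the range of harvest times actually observed in the data.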

Note that regression with higher powers or
interaction terms is still considered linear
regression, not because it is linear in the
x's (it's not), but because it is linear in the
parameters being estimated, the $\beta$'s.

Intelligent Consumption of Statistics

Even most statisticians deal with pre-generated
statistics in journals and the
news far more frequently than they are
called upon to compute them themselves.

The ability to read and understand such
statistics with a properly critical eye and
brain (statistical literacy) is one which
should be expected of any educated adult.
Consider the following newspaper items:

In 2008, the export value of opium from
Afghanistan was $3.4 billion.
Three out of every 1,000 patients who have
their stomachs stapled will die within three
months.
Each year, about 1,100 suicides occur on
U.S. college campuses.

How firmly should we believe these?

Remember, any observed value can be
broken down into three parts:

the true value (what we'd like to know)
randomness (unavoidable)
nonstatistical mistakes (what to watch for)

Too often, we assume the observed value
is the true value, and forget the other
components.

We deal with randomness by

insisting on large samples wherever possible,
reporting standard errors,
using confidence intervals instead of point
estimates,
testing for statistical significance,
and so on.

Nonstatistical mistakes are trickier.


Dangers in Data Collection

Remember, our statistical methods of data
analysis all assume we've collected our
data through planned introduction of
chance.

This means:

Randomized, controlled experiments or
Random samples.

Many experiments use no control group,
or a nonrandomized one (such as
historical data).

Example: A study on coronary bypass surgery
showed its subjects survived longer than
historical controls.
The sickest subjects couldn't be given the
surgery (they likely wouldn't survive it), so
were excluded from the study.
A randomized controlled study showed only
minor survival differences between the
groups.
Many studies use samples of
convenience.

Psychologists often require their students to
participate in experiments.
Critics have suggested modern psychology be
renamed "psychology of the college
sophomore."

Even a random sample may be poor if
drawn from a different population than
implied.

A university does not provide daycare for
children of students. A sample of students of
the university asking if they require daycare to
attend classes is virtually certain to have a
small percentage (at most) saying "yes." This
is not useful for determining if providing
daycare would allow other parents to attend
classes.
If a report on a study gives no information
about how the sample was collected, take
the results with a few barrels of salt,
especially if they seem unreasonable
otherwise.

Dangers of Survey Research

Survey results are among the most
commonly reported statistical results, yet
they have some of the greatest dangers
associated with them.

Knowing the details of a survey is critical
to evaluating the results it reports.

Defining the population is critical, but
finding it once it is defined is often difficult
or impossible.

"Teenage mothers in Grand Forks County" is
a well-defined population, but it could be hard
to find a list.

Selection and nonresponse biases will
often produce skewed results.

In 1936, Franklin Delano Roosevelt (a
Democrat) was running for his second
term as president against Republican Alf
Landon.

Literary Digest magazine sent out
questionnaires to 10 million people from
phone books, club membership lists, and
magazine subscription lists.
From the 2.4 million responses they
received, they predicted Landon would win
57% to 43%. On election day, Roosevelt
won 62% to 38%.

What went wrong?

In 1987, Shere Hite published The Hite
Report: Women and Love. This was a
study based on a long, essay-type survey
of women on love and sex.

Out of 100,000 questionnaires distributed
to organizations like church groups,
political organizations, and counseling
centers, Hite received 4,500 back.

From this, she generated claims (and
headlines!) like "70% of women married
five years or more are having sex outside
of their marriages."

Hite claims that because her sample was
large, and well-matched to census data in
factors such as race, income, and
geographic region, her results can be
taken as representative of the country as a
whole.

Is this a valid claim?


The wording of questions is critical; ignore
a survey that will not provide the exact
questions asked.

13% of Americans think we spend too much
on "assistance to the poor."
44% think we spend too much on "welfare."

August 15-17, 2009, NBC/Wall St. Journal poll:

"Would you favor or oppose creating a public health
care plan administered by the federal government that
would compete directly with private health insurance
companies?"

43% favor, 47% oppose

August 19, 2009, Survey USA poll:

"In any health care proposal, how important do you
feel it is to give people a choice of both a public plan
administered by the federal government and a private
plan for their health insurance: extremely important,
quite important, not that important, or not at all
important?"

77% extremely (58%) + quite (19%), 22% not that (7%) + not
at all (15%)
From Republican congressman
John Culberson's web page:

Sept. 19-22, 2008:

Pew Research: "As you may know, the government is
potentially investing billions to try to keep financial institutions
and markets secure. Do you think this is the right thing or the
wrong thing for the government to be doing?"

57% Right thing, 30% Wrong thing

L.A. Times/Bloomberg: "Do you think the government should
use taxpayers' dollars to rescue ailing private financial firms
whose collapse could have adverse effects on the economy
and market?"

31% Yes, 55% No

Wash. Post/ABC News: "Do you approve or disapprove of the
steps the Federal Reserve and the Treasury Department
have taken to try to deal with the current situation involving
the stock market and major financial institutions?"

47% Approve, 42% Disapprove

Even if not slanted, a question's wording
may be confusing, and thus bias the
results.

1992: "Does it seem possible or does it seem
impossible to you that the Nazi extermination
of the Jews never happened?"
22% said "possible."
1994: "Does it seem possible to you that the
Nazi extermination of the Jews never
happened, or do you feel certain that it
happened?"
1% said "possible."
Who is asking the question can influence
the answer.

Two teams of interviewers, one white, one
black, surveyed Southern blacks during World
War II, asking if blacks would be treated better
or worse if Japan conquered the U.S.
Black interviewers: 9% better, 25% worse.
White interviewers: 3% better, 45% worse.

This desire to look good and please the
interviewer can affect the results no matter
who does the questioning.

A study on toothbrushing habits found that
if people brushed as much as they
claimed, toothpaste sales would be three
times higher than they actually were.

In December 2012, Public Policy Polling found
that 39% of Americans had an opinion about the
Simpson-Bowles deficit reduction plan.

They also found that 25% had an opinion on the
Panetta-Burns plan, even though the latter didn't
exist!

In November 2010, California voters
voted on Proposition 19 to legalize, tax,
and regulate recreational marijuana use.

As of July 2010, 6 telephone polls had
been conducted:

3 polls with human interviewers showed the
proposition being defeated by 1, 2, and 4
percentage points.
3 automated polls (robopolls) showed the
proposition passing by 10, 14, and 16 points.
Dangers of Inference

Garbage In, Garbage Out

Statistical procedures don't check the data. If
there are issues with the data collection, there
will still be perfectly good-looking results from
the computer.
That's why checking the study design is so
critical.

Data snooping

If we observe an extreme result, and then
conduct a test, we should not be surprised
when that test returns a significant result.
Ex: A town of 50,000 has very high voltage
power lines. One year, the rate of a particular
type of cancer is 3 times the national
average.
A test of significance gives a p-value of
0.0002 = 1/5,000. Are the power lines
causing cancer?
If you split the U.S. population of more than
250,000,000 into sets of 50,000, there would
be more than 5,000 of them. By chance,
you'd expect at least one to have such a high
rate. Since the high rate led us to test, it's not
convincing yet.
If the high rate were to persist over several
years, it would suggest that something in the
town was causing it. (Remember, correlation
is not causation!)
Important studies should be replicated to be
truly convincing.
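
The arithmetic behind this argument can be sketched as follows, under the simplifying assumption (not stated on the slide) that the towns behave independently and that each has the same 0.0002 chance of showing such an extreme rate by chance alone.

```python
population = 250_000_000
town_size = 50_000
p_extreme = 0.0002   # chance one town shows so high a rate by luck alone

n_towns = population // town_size

# Expected number of towns with a "significant" excess purely by chance.
expected_extreme_towns = n_towns * p_extreme

# Probability that at least one town shows such an excess, assuming
# the towns are independent.
prob_at_least_one = 1 - (1 - p_extreme) ** n_towns

print(f"{n_towns} towns, {expected_extreme_towns:.1f} expected by chance")
print(f"P(at least one) = {prob_at_least_one:.3f}")
```

So a single town with a p-value of 1 in 5,000 is about what chance alone would produce somewhere in the country.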
Does the difference prove it?

Charles Tart wanted to prove ESP. He built a
device called the Aquarius which chose one
of 4 targets, which the subject was supposed
to predict.
Out of 7,500 guesses from 15 "clairvoyant"
subjects, 2,006 were hits. Compared to a null
of p=1/4, this gives a p-value of 0.0002.
Did this prove ESP?
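
The quoted p-value can be reproduced approximately with a one-sided test of a proportion using the normal approximation to the binomial. This is a sketch; the slide does not say which method produced its value.

```python
import math

n_guesses = 7500
hits = 2006
p_null = 0.25   # chance of a hit with 4 equally likely targets

expected_hits = n_guesses * p_null
sd_hits = math.sqrt(n_guesses * p_null * (1 - p_null))

# z statistic and one-sided (upper-tail) normal P-value.
z = (hits - expected_hits) / sd_hits
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"expected hits = {expected_hits:.0f}, z = {z:.2f}, P = {p_value:.5f}")
```

A small P-value only rules out the null model p = 1/4; it does not say what caused the excess hits.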

When checked, it came out that the random
number generator almost never picked the
same target twice in a row.
By selecting a different choice for the next
guess after the target lit up, a subject could
have almost a 1/3 chance of a hit.
A replication with an improved r.n.g. showed
no significant results.
The results of the first experiment weren't due
to chance with p=1/4, but they weren't due to
ESP either.
Statistical tests won't check your experimental
design.
How much should we believe?

The opium value was unsourced in an
editorial. Google found a report on The
Afghan Opium Survey 2008 from the
United Nations Office on Drugs and Crime.

Methodology included use of satellite
imagery and surveys of farmers, villagers,
and traders.

Stomach stapling fatality rates are from
the International Bariatric Surgery
Registry. No information is given on how
they were computed.

College suicide rates are estimates by the
American Foundation for Suicide
Prevention. Again, no information is given.
