
AP Statistics

Chapter Four Lecture Notes


Thiele

Name: _____________________________
Period: _____________________________

The first thing we should probably tackle in our year-long journey to understand the basics of
statistics is to define what it encompasses:
STATISTICS:

AP Statistics Major Themes


Now that we know what statistics is, it is probably a good idea to overview the four major
components of the statistical problem solving process (which also serve as the four major
components of the AP Statistics curriculum):
I.
II.
III.
IV.
In this chapter we are going to carefully examine the topic described in I: ______________________

Approaches to Data Collection


(I) When trying to answer a question about some group using data (for example, "What
proportion of Harker students own an iPhone?"), the easiest approach is to perform a
___________________.

Definition:

If we changed our question of interest to "What proportion of American high school students
own an iPhone?", what would be difficult about this approach?
(1)
(2)
(3)
(4)
(II) As a result, a _____________ is often inferior to taking a ___________________.

Definition:

#4

THE KEY TO MAKING THIS DATA COLLECTION APPROACH WORK WELL IS THAT THE
SAMPLE NEEDS TO BE ______________________ OF THE POPULATION OF INTEREST.
Types of Sample Design
VOLUNTARY RESPONSE SAMPLE

CONVENIENCE SAMPLE

Does either of these two sampling designs accomplish our goal of having a _______________ sample?
As a matter of fact, the above methods of data collection often create a big problem with regard to
obtaining appropriate data:
Bias:

Example 1:

Before the presidential election of 1936, which pitted Franklin D. Roosevelt against Alf
Landon, the magazine Literary Digest predicted that Landon would win the election in a
landslide, based on a survey of 2.8 million people. George Gallup surveyed only
50,000 people and predicted that Roosevelt would win, which he did. Literary
Digest's survey sample came from magazine subscribers, car owners, telephone
directories, etc. Describe the source of bias.

Example 2:

Suppose that you want to estimate the total amount of money spent by students on
textbooks each semester at UC Berkeley. You collect register receipts for
students as they leave the bookstore during lunch one day. Describe the source of
bias.

#7,9

OUR BEST APPROACH TO ENSURING THAT OUR SAMPLE IS ______________ AND DOES
NOT INTRODUCE BIAS IS TO TAKE A _____________ SAMPLE.

SIMPLE RANDOM SAMPLE (SRS) OF SIZE N

Example 3:

Mr. Thiele runs a lucrative Statistics tutoring business in the San Mateo area. In
order to provide high-quality customer service, he wishes to interview a sample of
five clients in detail. To avoid bias, he wishes to choose an SRS of size
five. Using Table D, line 130, determine which five clients should be selected
from this group:

Angela Anova
Cos Ashon
Lynnie Arregression
Miss Calculate
Dennis Decile
Stan Deviation
Norma Distribution
Martin Gale
Marge Innovera
Cara Lashon

Corey Lashon
Maxim M. Likelihood
Ester Mate
Jean Mean
Marian Median
Moe Ment
Chris Miss
Moe Mode
Dee Morgan
Mark Off

Kurt O'Sis
Penelope Probability
Sam Pull
Quincy Quartile
Sally Sample
Sammy Sample
Randy Sampling
Cheb E. Shev
Minnie Tabb
Percy N Tile
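
The labeling idea behind Example 3 can also be sketched in software. The snippet below is only an illustration: it assigns the labels 01 through 30 to the clients in the order listed (reading down each column) and lets Python's pseudo-random generator stand in for Table D. The seed value is an arbitrary assumption, so the clients it picks will not match the ones obtained from line 130 of the table.

```python
import random

# Clients from Example 3, in the order listed (reading down each column).
clients = [
    "Angela Anova", "Cos Ashon", "Lynnie Arregression", "Miss Calculate",
    "Dennis Decile", "Stan Deviation", "Norma Distribution", "Martin Gale",
    "Marge Innovera", "Cara Lashon", "Corey Lashon", "Maxim M. Likelihood",
    "Ester Mate", "Jean Mean", "Marian Median", "Moe Ment", "Chris Miss",
    "Moe Mode", "Dee Morgan", "Mark Off", "Kurt O'Sis",
    "Penelope Probability", "Sam Pull", "Quincy Quartile", "Sally Sample",
    "Sammy Sample", "Randy Sampling", "Cheb E. Shev", "Minnie Tabb",
    "Percy N Tile",
]

# Step 1: give every client a two-digit label, 01 through 30.
labels = {f"{i:02d}": name for i, name in enumerate(clients, start=1)}

# Step 2: draw five distinct labels. random.sample draws without
# replacement, which plays the role of skipping repeats in the table.
rng = random.Random(130)  # arbitrary seed (an assumption), NOT Table D
chosen = sorted(rng.sample(list(labels), 5))

# Step 3: translate the chosen labels back into client names.
for label in chosen:
    print(label, labels[label])
```

Every group of five clients is equally likely to be chosen under this procedure, which is the defining property of an SRS.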

#12,16
STRATIFIED RANDOM SAMPLE

Example 4:

The financial aid office of a university wishes to estimate the average amount
of money that students spend on textbooks each term. They are considering
taking a stratified sample. For each of the following proposed stratification
schemes, determine whether it would be worthwhile to stratify the university
students in this manner.
(a) Strata corresponding to class standing (freshman, sophomore, junior,
senior, grad student).
(b) Strata corresponding to field of study, using the following categories:
engineering, architecture, business, other.

(c) Strata corresponding to the first letter of the last name: A-E, F-K, etc.
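
One way to explore when stratifying pays off is a small simulation. The sketch below uses an entirely made-up population in which average textbook spending differs by field of study (the dollar figures, the stratum sizes, and the sample sizes are all assumptions chosen for illustration, not data from any university). It compares how much the estimated mean varies from sample to sample under an SRS and under a stratified random sample of the same total size.

```python
import random
import statistics

rng = random.Random(4)  # arbitrary seed so the run is repeatable

# Hypothetical population: made-up mean spending (dollars) for each stratum,
# 1000 students per stratum, standard deviation 50 within each stratum.
STRATUM_MEANS = {"engineering": 600, "architecture": 500,
                 "business": 400, "other": 300}
population = {
    field: [rng.gauss(mean, 50) for _ in range(1000)]
    for field, mean in STRATUM_MEANS.items()
}
everyone = [x for values in population.values() for x in values]

def srs_estimate(n):
    """Sample mean from a simple random sample of n students."""
    return statistics.mean(rng.sample(everyone, n))

def stratified_estimate(n_per_stratum):
    """Sample mean from an SRS of n_per_stratum students in each stratum.

    The strata are equally sized here, so pooling the draws and averaging
    gives a fair estimate of the overall mean.
    """
    draws = []
    for values in population.values():
        draws.extend(rng.sample(values, n_per_stratum))
    return statistics.mean(draws)

# Repeat each design many times and measure the spread of the estimates.
srs_results = [srs_estimate(40) for _ in range(2000)]
strat_results = [stratified_estimate(10) for _ in range(2000)]
srs_spread = statistics.stdev(srs_results)
strat_spread = statistics.stdev(strat_results)

print("spread of SRS estimates:       ", round(srs_spread, 1))
print("spread of stratified estimates:", round(strat_spread, 1))
```

In this made-up setting the stratified estimates vary less, because students within a stratum are similar to each other while the strata differ from one another. If the strata did not differ in spending, the two spreads would be about the same.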

CLUSTER SAMPLE

Example 5:

#21,24

A hotel has 30 floors with 40 rooms per floor. The rooms on one side of the
hotel face the water, while rooms on the other side face a golf course. There is
an extra charge for the rooms with a water view. The hotel manager wants to
survey 120 guests who stayed at the hotel during a convention about their
overall satisfaction with the property.
(a) Explain why choosing a stratified random sample might be preferable to
an SRS in this case.

(b) What would you use as strata?

(c) Why might a cluster sample be a simpler option?

(d) What would you use as clusters?
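
To see how the two designs in Example 5 differ mechanically, here is a rough sketch of one possible stratified sample and one possible cluster sample of 120 rooms. Several things in it are assumptions made only for illustration: that each room holds one guest to survey, that odd-numbered rooms face the water and even-numbered rooms face the golf course, that view is used as the stratifying variable, and that floors are used as clusters.

```python
import random

rng = random.Random(21)  # arbitrary seed so the run is repeatable

# 30 floors x 40 rooms per floor = 1200 rooms, identified as (floor, room).
rooms = [(floor, room) for floor in range(1, 31) for room in range(1, 41)]

def view(room_id):
    """Assumed layout: odd room numbers face the water, even face the golf course."""
    return "water" if room_id[1] % 2 == 1 else "golf"

# Stratified random sample: a separate SRS of 60 rooms from each view.
stratified = []
for side in ("water", "golf"):
    stratum = [r for r in rooms if view(r) == side]
    stratified.extend(rng.sample(stratum, 60))

# Cluster sample: choose 3 floors at random and take ALL 40 rooms on each.
chosen_floors = rng.sample(range(1, 31), 3)
cluster = [r for r in rooms if r[0] in chosen_floors]

print("stratified sample size:", len(stratified))
print("floors chosen as clusters:", sorted(chosen_floors))
print("cluster sample size:", len(cluster))
```

The stratified sample guarantees that both views are represented, but the surveyor has to visit rooms scattered over the whole hotel. The cluster sample only requires visiting three floors.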

SYSTEMATIC SAMPLE

Example 6:

For each of the situations described below, state the kind of sampling
procedure that is used.
(a) All freshmen at a university are enrolled in one of 30 sections of a
freshman seminar course. To select a sample of freshmen at this
university, a researcher selects four sections of the freshman seminar
course at random from the 30 sections and all students in the four
selected sections are included in the sample.

(b) To obtain a sample of students, faculty, and staff at a university, a
researcher randomly selects 50 faculty members from a list of faculty, 100
students from a list of students, and 30 staff members from a list of staff.

#26

(c) A university researcher obtains a sample of students at his university by
using the 85 students enrolled in his Psychology 101 class.

(d) To obtain a sample of the seniors at a particular high school, a researcher
writes the name of each senior on a slip of paper, places the slips in a
box and mixes them, and then selects 10 slips. The students whose
names are on the selected slips of paper are included in the sample.

(e) To obtain a sample of those attending a basketball game, a researcher
selects the 24th person through the door. Then, every 50th person after
that is also included in the sample.
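
The selection rule in part (e) is easy to write out explicitly. The sketch below is a minimal illustration, assuming a hypothetical attendance of 500 people numbered in order of arrival. The function name and its parameters are made up for this example. When no starting position is supplied, the function picks one at random from the first `step` arrivals, which is the usual way to start this kind of sample.

```python
import random

def every_kth_person(population_size, step, start=None, rng=random):
    """Return the arrival positions selected by taking every step-th person.

    If start is not given, a random starting position between 1 and step
    is chosen.
    """
    if start is None:
        start = rng.randint(1, step)
    return list(range(start, population_size + 1, step))

# Part (e): start with the 24th person, then every 50th person after that.
# The attendance figure of 500 is an assumption made for illustration.
selected = every_kth_person(population_size=500, step=50, start=24)
print(selected)
```

With these numbers the selected positions are 24, 74, 124, and so on up to 474, for ten people in all.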

Although ____________ methods of sampling work to guard against ____________, it is obvious
that the sample results will depend on which individuals are actually selected. This leads to a second
potential problem.
Variability:

OUR BEST APPROACH TO REDUCE _________________ IN A SAMPLE IS TO TAKE A
_____________ SAMPLE.

                      VARIABILITY
                   HIGH          LOW

BIAS     HIGH

         LOW

Sources of Bias in Sampling


SAMPLING ERRORS
(1) Bad Sampling Methods

(2) Selection Bias (Undercoverage)

NONSAMPLING ERRORS
(1) Nonresponse Bias

(2) Measurement Bias

(3) Response Bias

(4) Wording Bias

Example 7:

According to an article on the CNN.com website, dated 17 September 1998,
entitled "Majority of U.S. Teens Are Not Sexually Active, Study Shows,"
52% of surveyed teenagers had never had sexual intercourse. A very large
random sample of 16,262 high school students was the source of this
information. If the population of interest consists of all teenagers in the
United States, are there individuals in the population who have no chance of
being selected? What kind of bias is this? Do you think this bias is a serious
problem?

Example 8:

The article "Study Provides New Data on the Extent of Gambling by College
Athletes" (Chronicle of Higher Education, 22 January 1999) reported that 72
percent of college football and basketball players had bet money at least once
since entering college. This conclusion was based on a study in which
copies of the survey were mailed to 3000 athletes at 182 Division I
institutions, 25 percent of whom responded. What types of bias might have
influenced the results of this study? Explain.

#27,31,34,35

Methods of Data Collection


(1)
(2)

(3)

Example 9:

According to an observational study from the Fred Hutchinson Cancer
Research Center (see the CNN.com article titled "Broccoli, Not Pizza
Sauce, Cuts Cancer Risk, Study Finds," 5 January 2000), men who ate more
cruciferous vegetables had a lower risk of prostate cancer. This study made
separate comparisons for men who ate different levels of vegetables.
According to one of the investigators, at any given level of total vegetable
consumption, as the percent of cruciferous vegetables increased, the prostate
cancer risk decreased. Based on this study, is it reasonable to conclude that
eating cruciferous vegetables causes a reduction in prostate cancer risk?
Explain.

Observational studies cannot easily provide evidence of ________________. They can,
however, provide information about ______________________.
The reason for this, as the above example shows, is that observational studies do not account for
other variables (such as the men's general health habits) that may impact the response of interest,
and as a result these __________________ variables can end up becoming _______________
variables.
LURKING VARIABLE:

CONFOUNDING VARIABLE:

Example 10:

"Crime Finds the Never Married" is the conclusion drawn in an article from
USA Today (29 June 2001). This conclusion is based on data from the Justice
Department's National Crime Victimization Survey, which estimated the
number of violent crimes per 1000 people, 12 years of age or older, to be 51
for the never married, 42 for the divorced or separated, 13 for married
individuals and 8 for the widowed. Does being single cause an increased risk
of violent crime? Describe a lurking variable that illustrates why it is
unreasonable to conclude that a change in marital status causes a change in
crime risk.

#46,47,50

Experimental Terminology
EXPLANATORY VARIABLE (FACTOR):

RESPONSE VARIABLE:

EXPERIMENTAL UNITS:

TREATMENT:
LEVELS:

Example 11: As we have already seen, nonresponse is a big issue in survey design. A
particular company was interested in reducing the rate of refusals in telephone
surveys. Most people who answer at all listen to the interviewer's introductory
remarks and then decide whether to continue. To address this issue, a study made
telephone calls to randomly selected households to ask opinions about the next
election. In some calls, the interviewer gave her name, in others she identified the
university she was representing, and in still others she identified both herself and
the university. For each type of call, the interviewer either did or did not offer to
send a copy of the final survey results to the person interviewed. The company
then examined whether or not these differences in the introduction affected
whether the interview was completed.
(a) What are the experimental units?

(b) What are the factors?

(c) What is the response variable?

(d) List the treatments.

#54,57,63

Experimental Design

GOAL:

A good experimental design should contain the following:


(1)
(2)
(3)

(4)

(5)
(6)

Example 12: Will providing child-care for employees make a company more attractive to
women, even those who are unmarried? You are designing an experiment to
answer this question. You prepare recruiting material for two fictitious
companies, both in similar businesses in the same location. Company A's
brochure does not mention child-care. There are two versions of Company B's
material, identical except that one describes the company's on-site child-care
facility. Your subjects are 40 unmarried women who are college seniors seeking
employment. Each subject will read recruiting material for both companies and
choose the one she would prefer to work for. You will give each version of
Company B's brochure to half the women. You expect that a higher percentage
of those who read the description that included child-care will choose Company
B.
(a) Describe an appropriate design for the experiment.

(b) The names of the subjects appear below. Use Table D, beginning at line
131, to do the randomization required by your design. Indicate the subjects
who read the version that mentions child-care.
Abrams
Adamson
Afifi
Brown
Cansico
Chen
Cortez
Curzakis

Danielson
Durr
Edwards
Fluharty
Garcia
Gerson
Green
Gupta

Gutierrez
Howard
Hwang
Iselin
Janle
Kaplan
Kim
Lattimore


Lippman
Martinez
McNeill
Morse
Ng
Quinones
Rivera
Roberts

Rosen
Sugiwara
Thompson
Travers
Turing
Ullmann
Williams
Wong
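
The random assignment in part (b) can also be carried out in software. The sketch below assumes one reasonable reading of the design: a completely randomized design in which 20 of the 40 subjects are chosen at random to read the Company B brochure that mentions child-care, and the remaining 20 read the other version. Python's pseudo-random generator replaces Table D here and the seed is arbitrary, so the resulting groups will differ from the ones found using line 131.

```python
import random

# The 40 subjects from Example 12, in the order listed.
subjects = [
    "Abrams", "Adamson", "Afifi", "Brown", "Cansico", "Chen", "Cortez",
    "Curzakis", "Danielson", "Durr", "Edwards", "Fluharty", "Garcia",
    "Gerson", "Green", "Gupta", "Gutierrez", "Howard", "Hwang", "Iselin",
    "Janle", "Kaplan", "Kim", "Lattimore", "Lippman", "Martinez", "McNeill",
    "Morse", "Ng", "Quinones", "Rivera", "Roberts", "Rosen", "Sugiwara",
    "Thompson", "Travers", "Turing", "Ullmann", "Williams", "Wong",
]

rng = random.Random(131)  # arbitrary seed (an assumption), NOT Table D

# Shuffle a copy of the list; the first 20 names form the child-care group.
shuffled = subjects[:]
rng.shuffle(shuffled)
child_care_group = sorted(shuffled[:20])
no_child_care_group = sorted(shuffled[20:])

print("Read the version that mentions child-care:")
print(", ".join(child_care_group))
```

Because chance alone decides who lands in each group, the two groups should be roughly similar in every respect other than the brochure they read.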

Principles of Experimental Design


(1) CONTROL

(2) RANDOMIZATION

(3) REPLICATION

(4) BLOCKING

Matched Pairs:
#67

Other Issues of Concern when Experimenting


(1) Placebo Effect

Counteract: Establish a ___________________ to receive a placebo treatment.

(2) Blinding
Single Blind:

Double Blind:

#69,71,73
Example 13:

Mr. Stoll is concerned about room temperatures in Dobbins having an impact
on student performance on mathematics examinations. For a given school
year, there are four sections of AP Statistics being offered, with two sections
being taught by Professor Mortlock and the other two by Mr. Thiele. Mr.
Stoll wants to set the room temperature in two of the rooms at 65°F and the
other two rooms at 75°F.
(a) What are the experimental units, explanatory and response variables, and
treatments?

(b) What is an extraneous variable in this setting?

(c) How could blocking be utilized in this scenario to prevent the extraneous
variable from becoming confounding?


Example 14:

To compare two levels of treatment with a new fertilizer, cherry tomatoes are
to be grown in each of eight test plots. Tall windows line the north side of the
hothouse and a breezy doorway is located on the east side.
[Figure: layout of the eight test plots, with sides labeled "Tall Windows" and "Breezy Doorway"]

(a) What are the experimental units, explanatory and response variables, and
treatments?

(b) What would blocking using the scheme above (one block is white, the
other is gray) accomplish?

(c) Comment on the strength and weakness of the above scheme as
compared to the following blocking scheme (one block is white, the
other is gray).

[Figure: second layout of the eight test plots, with sides labeled "Tall Windows" and "Breezy Doorway"]

(d) If a breezy doorway were added to the side of the hothouse opposite that
of the current breezy doorway, what type of blocking structure would
work well?

(e) Describe the assignment of treatments to the test plots using the blocking
structure used in part (d).


Example 15:

In search of a mosquito repellent that is safer than the ones that are currently
on the market, scientists have developed a new compound that is rated as less
toxic than the current compound, thus making a repellent that contains this
new compound safer for human use. Scientists also believe that a repellent
containing the new compound will be more effective than the ones that
contain the current compound. To test the effectiveness of the new compound
versus that of the current compound, scientists have randomly selected 100
people from a state (probably Minnesota). Up to 100 bins, with an equal
number of mosquitoes in each bin, are available for use in the study. After a
compound is applied to a participant's forearm, the participant will insert his
or her forearm into a bin for one minute, and the number of mosquito bites on
the arm at the end of that time will be determined. Suppose this study is to be
conducted using a matched-pairs design. Describe a randomization process.
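
One possible randomization for Example 15 can be sketched as follows. It assumes that each participant serves as his or her own pair, receiving the new compound on one forearm and the current compound on the other, with a virtual coin flip deciding which arm gets which compound. This is only one reasonable reading of the matched-pairs design, and it says nothing about how the bins themselves are scheduled.

```python
import random

rng = random.Random(15)  # arbitrary seed so the run is repeatable

# Participants are identified by the numbers 1 through 100.
assignment = {}
for person in range(1, 101):
    # Coin flip: which forearm receives the new compound?
    new_arm = rng.choice(["left", "right"])
    current_arm = "right" if new_arm == "left" else "left"
    assignment[person] = {"new compound": new_arm,
                          "current compound": current_arm}

# Show the assignment for the first five participants.
for person in range(1, 6):
    print(person, assignment[person])
```

Randomizing the arm within each person keeps any left-arm versus right-arm difference from being confounded with the compound. The order in which each person's two arms are tested could be randomized in the same way.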

#78,81,85,87

Scope of Inference
Researchers who conduct statistical studies often want to draw conclusions (i.e., make inferences)
that go beyond the data produced. So, what types of conclusions are appropriate? The answer
depends on the design of the study.

                                  INDIVIDUALS RANDOMLY ASSIGNED TO GROUPS?
                                  YES                    NO

RANDOMLY SELECTED        YES
INDIVIDUALS?
                         NO

Another issue that may prevent us from generalizing an experiment's results is lack of ______________.
#103,104,108

Yet another issue occurs when experiments are impractical or unethical to complete. The criteria
for establishing causation when an experiment cannot be completed include:
(1)
(2)
(3)
(4)
(5)

Data Ethics
All studies involving humans should always adhere to the guidelines of:
(1)

(2)

(3)

#109,112,114,115

Simulation
Useful when actually carrying out an experiment is too ___________, _____________, or
_______________
Steps of a Simulation:
(1)
(2)
(3)
(4)
(5)

Example 16:

In the 2006 Iowa Intercollegiate Athletic Conference (IIAC) Women's Soccer
Tournament championship game, the mighty Loras Duhawks were taken to
the dreaded penalty kick phase by the pesky Storm of Simpson College in
order to determine the conference champion and automatic qualifier for the
NCAA tournament. Although the Loras women missed their first kick, the
Storm missed their last three kicks. Knowing that penalty kicks are converted
80% of the time, use simulation to determine whether the event of missing
three out of five penalty kicks should be considered normal or if the mystique
of the Duhawk was at play on that glorious November evening in Indianola,
Iowa.
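
In class a simulation like this is usually run with a random digit table or a calculator, but the same steps can be sketched in Python. The version below rests on some assumptions: each kick is converted with probability 0.8 independently of the others, a team takes five kicks, and the event of interest is read as "three or more misses out of five". The function name, the number of repetitions, and the seed are all arbitrary choices made for this illustration.

```python
import random
from math import comb

def estimate_three_or_more_misses(trials=100_000, kicks=5, p_make=0.8, seed=2006):
    """Estimate the chance that a team misses at least 3 of its kicks."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # One repetition: simulate each kick; it is a miss when the random
        # number falls at or above p_make (probability 1 - p_make).
        misses = sum(1 for _ in range(kicks) if rng.random() >= p_make)
        if misses >= 3:
            count += 1
    return count / trials

estimate = estimate_three_or_more_misses()

# Exact binomial value for comparison: misses ~ Binomial(5, 0.2).
exact = sum(comb(5, k) * 0.2**k * 0.8**(5 - k) for k in range(3, 6))

print("simulated probability:", round(estimate, 4))
print("exact probability:    ", round(exact, 4))
```

Under these assumptions the exact probability is about 0.058, so three or more misses in five kicks is unusual but far from impossible. The simulated value should land close to that figure.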
