TABLE OF CONTENTS
Summary of Problems
Content
  Data and Variables
    Vocabulary
    Key Concepts
  Data and Distributions
    Vocabulary
    Key Concepts
    Visual Aids
  Drawing Conclusions from Studies
    Vocabulary
    Key Concepts
  Random Sampling
    Vocabulary
    Key Concepts
  Designing Experiments
    Vocabulary
    Key Concepts
  Two-Way Tables
    Vocabulary
    Key Concepts
    Visual Aids
  Displaying and Describing Distributions
    Vocabulary
    Key Concepts
    Visual Aids
  Measures of Center
    Vocabulary
    Key Concepts
  Measures of Spread
    Vocabulary
    Key Concepts
  More Summary Measures and Graphs
    Vocabulary
    Key Concepts
    Visual Aids
Sample AP Problems
  Problem 1 2014 Released Exam
  Problem 2 2009 Released Exam
  Problem 3 2009 Form B Released Exam
Solutions to AP Problems
  Problem 1 2014 Released Exam
  Problem 2 2009 Released Exam
  Problem 3 2009 Form B Released Exam
Quiz Multiple Choice and Free Response
Answers to Quiz Multiple Choice and Free Response
References
SUMMARY OF PROBLEMS
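Included in the Design of Experiments review packet are the following topics: data and variables, data and
distributions, drawing conclusions from studies, random sampling, designing experiments, two-way tables,
displaying and describing distributions, measures of center, measures of spread, and more summary measures
and graphs.
These topics form the foundation necessary to study statistics in a useful fashion. They cover how to set up an
experiment, some of the ways to eliminate bias, how to interpret graphs and basic summary statistics, and how
to begin to describe the distribution of data.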
CONTENT
In the following subsections, vocabulary and key topics will be summarized and any useful applications will be
listed. Practice problems and AP Problems will be available at the end of this review.
DATA AND VARIABLES
VOCABULARY
Variable Any characteristic of a person or thing that can be assigned a number or category.
Observational Unit The person or thing to which the number or category is assigned.
Data The numbers or categories that are recorded for the observational units.
Variability The phenomenon of a variable taking on different values or categories from one
observational unit to another.
KEY CONCEPTS
DATA AND DISTRIBUTIONS
KEY CONCEPTS
When referring to the distribution of a quantitative variable, always make sure to refer to the center and
spread of the data.
Always begin to analyze data by constructing a graph.
Make sure to label graphs clearly.
Make sure to relate any comments about a graph to the context of the data.
The simplest graphs to construct are dotplots (for quantitative data) and bar graphs (for categorical data).
When making a bar graph, it is considered best practice to put the proportion of the data that falls in each
category on the vertical axis of the graph, not the raw count.
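As a minimal sketch of the bar-graph advice above, the Python snippet below converts category counts into proportions, which are the values that would go on the vertical axis. The responses are made-up illustration data, not data from this packet, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical categorical data (illustration only).
responses = ["cat", "dog", "dog", "fish", "dog", "cat", "dog", "cat", "dog", "fish"]

def category_proportions(values):
    """Return {category: proportion of observations in that category}."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

proportions = category_proportions(responses)
for category, share in sorted(proportions.items()):
    # A crude text bar graph: one '#' per 5% of the data.
    print(f"{category:<5} {share:.2f} {'#' * round(share * 20)}")
```

Using proportions rather than counts keeps bar heights comparable when two groups of different sizes are graphed side by side.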
VISUAL AIDS
DRAWING CONCLUSIONS FROM STUDIES
KEY CONCEPTS
RANDOM SAMPLING
VOCABULARY
Simple Random Sampling Every possible sample of a given size has the same chance of being selected, so all
potential observational units also have the same chance of being selected.
Table of Random Digits A randomly generated table (usually by a computer) that lists digits to be used
to aid in the simple random sampling process.
Unbiased No preference or tendency to overrepresent or underrepresent a specific group or variable.
Sampling Variability The values of sample statistics vary from one sample to another.
Precision How closely the values of a sample statistic agree from sample to sample. Less sampling variability
means higher precision.
KEY CONCEPTS
A random sample is one chosen by an impersonal mechanism, not one chosen by the researcher (no
matter how hard the researcher tries to make it unbiased).
The size of the population does not have anything to do with the issues of sampling bias or precision.
Increasing the sample size will reduce sampling variability.
Larger samples = higher precision
If the sampling method is biased in the first place, taking a larger sample will not eliminate that bias.
Be very careful how you select a sample if you intend to generalize its results to a population.
Bias is a property of the sampling method, not the individual sample.
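The points above about sample size, precision, and bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is made up, the "biased" method (sampling only from the larger half of the population) is a hypothetical example of a bad sampling frame, and the function names are not from the source.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements (illustration only).
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# A deliberately biased sampling frame: only the larger half of the population.
top_half = sorted(population)[len(population) // 2:]

def simple_random_sample(n):
    # random.sample gives every group of n units the same chance of selection.
    return random.sample(population, n)

def biased_sample(n):
    # Random within the frame, but the frame itself leaves out half the population.
    return random.sample(top_half, n)

def summarize(draw, n, repetitions=500):
    """Return (center, spread) of the sample mean over many repeated samples."""
    means = [statistics.mean(draw(n)) for _ in range(repetitions)]
    return statistics.mean(means), statistics.stdev(means)

results = {}
for label, draw in (("SRS", simple_random_sample), ("biased", biased_sample)):
    for n in (25, 400):
        results[label, n] = summarize(draw, n)
        center, spread = results[label, n]
        print(f"{label:>6} n={n:<3} center={center:6.2f} spread={spread:5.2f}")
print(f"population mean = {population_mean:.2f}")
```

In both methods the spread of the sample means shrinks as n grows, but only the simple random sample is centered on the population mean. The larger biased sample is more precise and still wrong.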
DESIGNING EXPERIMENTS
VOCABULARY
Anecdotal evidence Evidence that comes to mind easily and is of little value in terms of scientific
research.
Experiment A study in which the experimenter actively imposes a treatment on the subjects.
Random assignment Each subject has an equal chance of being assigned to any of the treatment
groups.
Randomized Comparative Experiment An experiment in which random assignment has been carried
out.
Placebo effect This is when a subject tends to respond positively to a treatment whether or not the
treatment is actually effective.
Placebo A treatment with no active ingredient that is virtually indistinguishable from the true treatment to
everyone except the experimenter. It is given to a group of subjects in order to establish a comparison group.
Blind Subjects do not know whether they are receiving the true treatment or a placebo.
Double Blind Neither the test subjects nor the person evaluating their responses knows which subject
received which treatment.
KEY CONCEPTS
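Random assignment is very different from random sampling. They are two different methods with two
distinctly different goals.
o Random sampling: usually used in observational studies, so that results can be generalized to the population
o Random assignment: usually used in experiments, so that the treatment groups are comparable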
Random assignment aims to produce treatment groups that are similar in all aspects except for the
treatment that is imposed. Then, if the groups differ substantially in the response variable, you can
conclude that the explanatory variable caused the difference.
Random assignment guards against confounding variables by balancing them, on average, across the treatment
groups.
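Random assignment can be carried out with any impersonal mechanism. The sketch below shuffles a list of subjects and deals them into equal-sized groups. The subject labels, group names, and function name are hypothetical, and the sketch assumes the number of subjects divides evenly among the groups.

```python
import random

def randomly_assign(subjects, group_names, seed=None):
    """Shuffle the subjects and deal them into equal-sized treatment groups.

    Assumes len(subjects) is a multiple of len(group_names).
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }

# 24 hypothetical subjects split between two hypothetical treatments.
students = [f"student_{i:02d}" for i in range(1, 25)]
groups = randomly_assign(students, ["treatment_1", "treatment_2"], seed=2024)

for name, members in groups.items():
    print(name, len(members), members[:3], "...")
```

Because the shuffle decides who lands in which group, neither the experimenter nor the subjects can influence the assignment.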
TWO-WAY TABLES
VOCABULARY
Two-Way Table This table classifies each person according to two categorical variables.
Marginal Distribution The proportional breakdown of each of a variable's categories.
Conditional Distribution The distribution of the response variable for a particular category of the
explanatory variable.
Segmented Bar Graphs Used to visually represent the conditional distributions of the response variable for
each category of the explanatory variable.
Independent Two categorical variables are independent if the conditional distribution of one variable is
identical for every category of the other variable.
Association A relationship between two variables.
Simpson's Paradox The idea that aggregate proportions can reverse the direction of the relationship
seen in the individual pieces.
KEY CONCEPTS
Make sure to first summarize data through graphical displays and then through numerical summaries.
Remember that the way you obtained your data determines which kind and scope of conclusion you can
draw.
Simpson's paradox explains why one hospital may have a lower survival rate overall but, when the data are
broken down, a higher survival rate than the other hospital in each of the two categories.
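The hospital example can be made concrete with a small calculation. The counts below are invented for illustration (they are not from the source), and the category names "good" and "poor" patient condition are assumptions. The sketch computes each hospital's conditional survival rate by condition and its overall survival rate.

```python
# Hypothetical counts (illustration only): (survived, total) by hospital and
# by the condition of the patient on arrival.
counts = {
    "Hospital A": {"good": (98, 100), "poor": (720, 900)},
    "Hospital B": {"good": (864, 900), "poor": (70, 100)},
}

def conditional_rates(by_condition):
    """Survival rate within each patient-condition category."""
    return {cond: survived / total for cond, (survived, total) in by_condition.items()}

def overall_rate(by_condition):
    """Survival rate with the two categories combined."""
    survived = sum(s for s, _ in by_condition.values())
    total = sum(t for _, t in by_condition.values())
    return survived / total

rates = {h: conditional_rates(c) for h, c in counts.items()}
overall = {h: overall_rate(c) for h, c in counts.items()}

for hospital in counts:
    print(hospital, rates[hospital], f"overall={overall[hospital]:.3f}")
```

Hospital A does better within each category but worse overall, because in these made-up counts it treats far more patients in poor condition.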
VISUAL AIDS
DISPLAYING AND DESCRIBING DISTRIBUTIONS
KEY CONCEPTS
Three important things to discuss when talking about the distribution of data: shape, spread, and center.
Pay careful attention to outliers. Check whether each one is truly an outlier and investigate possible
reasons for its occurrence.
Examine several different types of graphs. Do not rely solely on one type of graph. Each graph has its own
advantages and disadvantages.
If desired, use your graphing calculator to help display the data.
Make sure all comments about the distribution of the data relate to the context that the data is in.
VISUAL AIDS
MEASURES OF CENTER
VOCABULARY
Mean An ordinary arithmetic average, found by adding up all of the values for each of the observational
units and then dividing by the number of values.
Median The value of the middle observation (once all of the values have been arranged in ascending
order). The median of an even number of values is defined as the average of the two middle values.
Resistant A measure whose value is not affected by outliers.
KEY CONCEPTS
The mean and median values can be quite similar, but a few scenarios can cause them to differ greatly.
Decide which measure of center to use based on the question asked.
The median is resistant to outliers.
The mean is not resistant to outliers.
You cannot completely summarize a distribution with only the value of the measure of center. You must
combine the measure of center value with the shape and variability to produce an accurate explanation of
the data.
In many cases, it is appropriate to report the mean and median.
Use 1-Var Stats on the calculator to calculate the mean and median more easily.
Neither mean nor median conveys any information about the shape or variability of the distribution.
Center is often the most important aspect of a distribution.
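The difference in resistance between the mean and the median is easy to see numerically. The incomes below are hypothetical values chosen for illustration; the sketch uses Python's standard statistics module.

```python
import statistics

# Hypothetical incomes in dollars (illustration only).
incomes = [48_000, 49_000, 50_000, 51_000, 52_000]
incomes_with_outlier = incomes + [1_000_000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

print(f"without outlier: mean={mean_before:,.0f} median={median_before:,.0f}")
print(f"with outlier:    mean={mean_after:,.0f} median={median_after:,.0f}")
```

One extreme value moves the mean from 50,000 to more than 200,000, while the median moves only from 50,000 to 50,500.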
MEASURES OF SPREAD
VOCABULARY
Range A measure of variability calculated as the difference between the maximum and minimum values
of a data set.
Interquartile Range The difference between the upper quartile and lower quartile of a data set.
Lower Quartile This is the value such that 25% of the values are less than this value and 75% are greater
than it.
Upper Quartile This is the value such that 75% of the values are less than this value and 25% are greater
than it.
Standard Deviation This can be loosely interpreted to be the typical deviation of an observation from
the mean.
Empirical Rule Applies to normal distributions. Approximately 68% of data falls within 1 standard
deviation of the mean. Approximately 95% of data falls within 2 standard deviations of the mean.
Approximately 99.7% of data (nearly all) falls within 3 standard deviations of the mean.
z-score Indicates how many standard deviations above or below the mean an observation falls.
KEY CONCEPTS
When describing a distribution, focus on the bulk of the data instead of the few extreme values. It is
important to mention them, but they are not the most important values in describing the distribution.
Standard deviation is the most widely used measure of spread.
Mean and standard deviation provide a useful summary of a symmetric distribution.
The interquartile range is always acceptable, but is more useful with skewed distributions.
Calculating a z-score standardizes the observation.
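z-score: $z = \frac{x - \bar{x}}{s}$, where $x$ is the observation, $\bar{x}$ is the mean, and $s$ is the standard deviation.
Standard deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$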
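As a minimal sketch, the standard deviation and z-score can be computed as follows. It assumes the usual sample standard deviation with divisor n - 1; the data values and function names are hypothetical.

```python
import math

def sample_standard_deviation(values):
    """Square root of the sum of squared deviations divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def z_score(x, values):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    mean = sum(values) / len(values)
    return (x - mean) / sample_standard_deviation(values)

data = [1, 2, 3, 4, 5]  # hypothetical observations
s = sample_standard_deviation(data)

print(f"standard deviation = {s:.4f}")
for x in data:
    print(f"x={x} z={z_score(x, data):+.3f}")
```

An observation equal to the mean has a z-score of 0, and observations on opposite sides of the mean get z-scores of opposite sign.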
MORE SUMMARY MEASURES AND GRAPHS
VOCABULARY
Five-Number Summary This summary provides a quick description of the distribution of data. It includes
the minimum, lower quartile, median, upper quartile, and maximum.
Modified Boxplots These boxplots contain useful information about data when dealing with outliers.
They mark outliers with special symbols and then extend the whiskers only to the most extreme non-outlier
value. Any value more than 1.5 times the interquartile range below the lower quartile or above the upper
quartile is considered an outlier.
KEY CONCEPTS
VISUAL AIDS
SAMPLE AP PROBLEMS
The problems included below are from released Free Response sections of Advanced Placement Statistics Exams.
All questions are the property of the College Board.
PROBLEM 1 2014 RELEASED EXAM
As part of its twenty-fifth reunion celebration, the class of 1988 (students who graduated in 1988) at a state
university held a reception on campus. In an informal survey, the director of alumni development asked 50 of the
attendees about their incomes. The director computed the mean income of the 50 attendees to be $189,952. In a
news release, the director announced, "The members of our class of 1988 enjoyed resounding success. Last year's
mean income of its members was $189,952!"
(a) What would be a statistical advantage of using the median of the reported incomes, rather than the mean, as
the estimate of the typical income?
(b) The director felt the members who attended the reception may be different from the class as a whole. A more
detailed survey of the class was planned to find a better estimate of the income as well as other facts about the
alumni. The staff developed two methods based on the available funds to carry out the survey.
Method 1: Send out an e-mail to all 6,826 members of the class asking them to complete an online form.
The staff estimates that at least 600 members will respond.
Method 2: Select a simple random sample of members of the class and contact the selected members
directly by phone. Follow up to ensure that all responses are obtained. Because method 2 will require
more time than method 1, the staff estimates that only 100 members of the class could be contacted
using method 2.
Which of the two methods would you select for estimating the average yearly income of all 6,826 members of the
class of 1988? Explain your reasoning by comparing the two methods and the effect of each method on the
estimate.
PROBLEM 2 2009 RELEASED EXAM
Before beginning a unit on frog anatomy, a seventh-grade biology teacher gives each of the 24 students in the class
a pretest to assess their knowledge of frog anatomy. The teacher wants to compare the effectiveness of an
instructional program in which students physically dissect frogs with the effectiveness of a different program in
which students use computer software that only simulates the dissection of a frog. After completing one of the
two programs, students will be given a posttest to assess their knowledge of frog anatomy. The teacher will then
analyze the changes in the test scores (score on posttest minus score on pretest).
(a) Describe a method for assigning the 24 students to two groups of equal size that allows for a statistically valid
comparison of the two instructional programs.
(b) Suppose the teacher decided to allow the students in the class to select which instructional program on frog
anatomy (physical dissection or computer simulation) they prefer to take, and 11 students choose actual
dissection and 13 students choose computer simulation. How might that self-selection process jeopardize a
statistically valid comparison of the changes in the test scores (score on posttest minus score on pretest) for the
two instructional programs? Provide a specific example to support your answer.
PROBLEM 3 2009 FORM B RELEASED EXAM
(a) Based on the boxplot, what are the approximate values of the median and the interquartile range of the
distribution of state gasoline taxes, in cents per gallon? Mark and label the boxplot to indicate how you found
the approximated values.
(b) The federal tax imposed on gasoline was 18.4 cents per gallon at the time the state taxes were in effect. The
federal gasoline tax was added to the state gasoline tax for each state to create a new distribution of combined
gasoline taxes. What are approximate values, in cents per gallon, of the median and interquartile range of the
new distribution of combined gasoline taxes? Justify your answer.
SOLUTIONS TO AP PROBLEMS
This section contains the solutions (with work when necessary) to the problems from released Advanced
Placement Statistics Exams.
PROBLEM 1 2014 RELEASED EXAM
(a) The median is a better estimate of the typical income than the mean because the median is resistant to
outliers and the mean is not. For example, there may be a data point representing an income of $1,000,000
while the rest of the incomes are near $50,000. The mean would be pulled toward the higher end while the
median would be essentially unaffected.
(b) The researcher should choose Method 2. Method 1 is potentially biased because it relies on members
voluntarily responding to the survey, so certain groups may respond more often than others. Method 2
addresses this problem by selecting members at random, contacting them directly, and following up so that
responses are obtained even from members who would have ignored an e-mail. Even though Method 2 has a
smaller sample size, it is more likely to be representative of the population of interest.
PROBLEM 3 2009 FORM B RELEASED EXAM
(b) The new value of the median for the combined gasoline tax distribution is 39.4 cents per gallon. This is obtained
by adding 18.4 cents per gallon to the state median of 21 cents per gallon. The new interquartile range is still
7 cents per gallon because 18.4 cents per gallon is added to both Q1 and Q3, and adding the same amount to each
does not change the difference between them.
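The effect of adding a constant can be checked numerically. The state tax values below are made up for illustration and were chosen only so that their median (21) and interquartile range (7) match the values quoted in the solution; they are not the actual data behind the boxplot. The sketch uses statistics.quantiles with the "inclusive" method, which is one of several quartile conventions.

```python
import math
import statistics

def median_and_iqr(values):
    """Return (median, interquartile range) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q3 - q1

# Hypothetical state gasoline taxes in cents per gallon (illustration only).
state_taxes = [8, 14, 17, 18, 20, 21, 22, 24, 25, 28, 32]
FEDERAL_TAX = 18.4  # cents per gallon, as given in the problem

combined_taxes = [tax + FEDERAL_TAX for tax in state_taxes]

state_median, state_iqr = median_and_iqr(state_taxes)
combined_median, combined_iqr = median_and_iqr(combined_taxes)

print(f"state:    median={state_median:.1f} IQR={state_iqr:.1f}")
print(f"combined: median={combined_median:.1f} IQR={combined_iqr:.1f}")
```

Adding a constant shifts every measure of center by that constant and leaves every measure of spread unchanged.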
QUIZ MULTIPLE CHOICE AND FREE RESPONSE
1. A certain country has 1,000 farms. Corn is grown on 100 of these farms but on none of the others. In
order to estimate the total farm acreage of corn for the country, two plans are proposed.
Plan I:
(a) Sample 20 farms at random.
(b) Estimate the mean acreage of corn per farm in a confidence interval.
(c) Multiply both ends of the interval by 1,000 to get an interval estimate of the total.
Plan II:
(a) Identify the 100 corn-growing farms.
(b) Sample 20 corn-growing farms at random.
(c) Estimate the mean acreage of corn for corn-growing farms in a confidence interval.
(d) Multiply both ends of the interval by 100 to get an interval estimate of the total.
On the basis of the information given, which of the following is the better method for estimating the total
farm acreage of corn for the country?
4. The boxplots above summarize two data sets, A and B. Which of the following statements must be true?
I. Set A contains more data than Set B
II. The box of Set A contains more data than the box of Set B
III. The data in Set A have a larger range than the data in Set B
(A) I only
(B) III only
(C) I and II only
(D) II and III only
(E) I, II, and III
5. The student government at a high school wants to conduct a survey of student opinion. It wants to begin
with a simple random sample of 60 students. Which of the following survey methods will produce a
simple random sample?
(A) Survey the first 60 students to arrive at school in the morning.
(B) Survey every 10th student entering the school library until 60 students have been sampled.
(C) Use random numbers to choose 15 students from each grade.
(D) Number the cafeteria seats. Use a table of random digits to choose seats and interview those
students.
(E) Number the students in the official roster. Use a table of random digits to choose 60 students from
this roster for the survey.
6. What would a distribution look like if the mean were greater than the median?
7. Describe the meaning of a z-score.
8. What is the most important principle of statistics?
ANSWERS TO QUIZ MULTIPLE CHOICE AND FREE RESPONSE
1. (B) You would choose Plan II over Plan I because Plan I includes farms that do not have any corn. Since you
are trying to estimate the acreage of corn, you would want to sample only farms with corn on them.
2. (B) A cause-and-effect relationship can only be drawn when a treatment is imposed on test subjects.
3. (D) Brand is a confounding variable in this experiment. In order to properly determine if temperature has
an effect, you would need to use the same brand of rubber band for each temperature. One brand may
be naturally stretchier than another.
4. (B) Based on the boxplots, you can only tell the spread of the data. The boxplots do not give any indication
as to how many data points are in each set.
5. (E) Only this method gives every possible group of 60 students from the whole school the same chance of
being selected. The other methods either leave out some students (for example, those who arrive late or do
not use the library or cafeteria) or restrict which groups of 60 can be chosen.
6. The distribution would have most of its data points at the low end of the graph, with a tail extending
toward the higher values (skewed to the right).
7. A z-score is the number of standard deviations an observation lies above or below the mean.
8. The most important principle of statistics is variability.
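The reasoning behind answer 1 can be checked with a small simulation. This is a sketch under stated assumptions: the corn acreages are made up, and for simplicity it compares point estimates of the total rather than the confidence intervals described in the question.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical country: 100 corn farms (acreages made up) and 900 farms with no corn.
corn_acreages = [random.uniform(50, 150) for _ in range(100)]
all_farms = corn_acreages + [0.0] * 900
true_total = sum(corn_acreages)

def plan_one():
    # Sample 20 of all 1,000 farms, then scale the mean acreage up by 1,000.
    return statistics.mean(random.sample(all_farms, 20)) * 1000

def plan_two():
    # Sample 20 of the 100 corn-growing farms, then scale the mean up by 100.
    return statistics.mean(random.sample(corn_acreages, 20)) * 100

plan_one_estimates = [plan_one() for _ in range(2000)]
plan_two_estimates = [plan_two() for _ in range(2000)]

print(f"true total acreage: {true_total:,.0f}")
for label, estimates in (("Plan I", plan_one_estimates), ("Plan II", plan_two_estimates)):
    print(f"{label:<7} mean estimate={statistics.mean(estimates):,.0f} "
          f"spread={statistics.stdev(estimates):,.0f}")
```

Both plans are centered near the true total, but the Plan II estimates vary much less from sample to sample, because a Plan I sample of 20 farms typically contains only a couple of corn-growing farms.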
REFERENCES
All information contained within this review packet is from the following sources:
Workshop Statistics by Allan A. Rossman, Beth L. Chance, J. Barr von Oehsen, et al.
The College Board, Advanced Placement Program Released Materials