
Math 1040 Study Guide/Lecture Notes Set 1 (Ch. 1–3)
**Information in dashed boxes will be covered in class**

Section 1.1—Introduction
Definitions:
 ______________________________: how to collect, organize, summarize and analyze
information so that conclusions can be drawn with a
measure of confidence
o ___________________________________: organizing and summarizing collected data
o ___________________________________: methods that take sample results and extend them to (generalize
about) the population with a measure of reliability
 ______________________________: entire group of interest in a study
 ______________________________: a single person or object being studied
 ______________________________: subset of the population being studied
 ______________________________: numerical summary of a sample computed
from
 ______________________________: numerical summary of a population ____________
 ______________________________: characteristics of individuals in the population
o ______________________________________: responses based on attributes or characteristics
o _______________________________________: responses are numerical measures
 ______________________________: finite (countable) number of possible values,
can be listed without skipping any
 ______________________________: infinite number of possible values, can have
more and more decimal places

Exercise: Is this a statistic or a parameter?

1. In a study of all 2223 passengers aboard the Titanic, it is found that 706 survived when
it sank.

2. A recent survey of a sample of MBAs reported that the average salary for an MBA is
more than $82,000. (Source: The Wall Street Journal)

3. Starting salaries for the 667 MBA graduates from the University of Chicago Graduate
School of Business increased 8.5% from the previous year.

4. In a recent poll of Salt Lake Community College students, 83% of students owned a
vehicle.

5. In a recent survey of 457 attendees of Jumanji, 402 would recommend the movie to a
friend.
Exercise: Are the following a discrete or a continuous data set?

1. In a survey of 1059 adults, it is found that 39% of them have guns in their homes.
2. The number of heads obtained after flipping a coin five times.
3. The distance Tiger Woods can drive a golf ball.
4. Points scored in a basketball game.
5. Volume of water lost each day from a leaky faucet.
6. Length of a song.
7. Number of words in a song.

Exercise: Classify the variable as qualitative or quantitative.

1. Gender
2. Temperature
3. Nation of origin
4. Number of siblings
5. Number of days
6. Grams of carbohydrates in a donut
7. Phone number
8. Value of a house
9. Zip code

Exercise: The following information relates to the 2011 model year product line of BMW
automobiles. Identify the individuals being studied, the variables, and whether the data
corresponding to the variables are qualitative or quantitative, and continuous or discrete.

Model      Body Style      Weight   Number of Seats
3 Series   Coupe           3362     4
5 Series   Sedan           4056     5
6 Series   Convertible     4277     4
7 Series   Sedan           4564     5
X3         Sport Utility   4012     5
Z4         Coupe           3505     2

Individuals:    Variable(s):    Qualitative/Quantitative    Discrete/Continuous


Exercise: Smoker’s IQ. A study was conducted in which 20,211 18-year-old recruits were given
an exam to measure IQ. In addition, the recruits were asked to disclose their smoking status.
An individual was considered a smoker if he smoked at least 1 cigarette per day. The goal of
the study was to determine whether adolescents aged 18 to 21 who smoke have a lower IQ
than nonsmokers. It was found that the average IQ of the smokers was 94, while the average
IQ of nonsmokers was 101. The researchers concluded that lower IQ individuals are more
likely to choose to smoke, not that smoking makes people less intelligent.

a) What is the population being studied?

b) What is the sample?

c) What are the descriptive statistics?

Section 1.2—Observational Studies vs. Designed Experiments


Definitions:
 ______________________________: measuring response variable without influencing the values of
either the response variable or explanatory variable
 Cross-sectional Studies: observational studies that collect data at one particular time
 Case-control Studies: observational studies that require individuals to look back in
time to record measurements (retrospective)
 Cohort Studies: observational studies that track individuals repeatedly over
time (prospective)
 ______________________________: measuring response variable after intentionally changing the
value of the explanatory variable
 ______________________________: cannot distinguish between the effects of two or more
explanatory variables, or between the explanatory variable and
some other variable(s) not included in the study
 ______________________________: a variable that was not considered in the study, but affects the
response variable
IMPORTANT COMMENT
Because of lurking variables, OBSERVATIONAL STUDIES _____________________ ALLOW
A RESEARCHER TO CLAIM CAUSATION, ONLY _________________________________!
 ______________________________: a variable in the study where you can’t tell the difference
between its effect and another explanatory variable’s effect

 ______________________________: list of information from all individuals in a population


Exercise: Determine whether the study depicts an observational study or an
experiment.

1. Rats with cancer are divided into two groups. One group received 5 milligrams of a medication
that is thought to fight cancer, and the other received 10 milligrams. After 2 years, the spread
of cancer is measured.

2. Conservation agents netted 250 large-mouth bass in a lake and determined how many were
carrying parasites.

3. Seventh grade students are randomly divided into two groups. One group is taught math using
traditional techniques; the other is taught math using a reform method. After 1 year, each
group is given an achievement test to compare proficiency.

4. A survey was conducted asking 400 people, “Do you prefer Coke or Pepsi?”

Exercise: Daily Coffee Consumption. Is there an association between daily coffee
consumption and the occurrence of skin cancer? Researchers asked 93,676 women to
disclose their coffee-drinking habits and also determined which of the women had
nonmelanoma skin cancer. The researchers concluded that women who
drank six or more cups of caffeinated coffee per day had a significantly lower chance of
nonmelanoma skin cancer.

a) Was this an observational study or an experiment? If observational, what type of
observational study was it?

b) What is the response variable? What is the explanatory variable?

c) Can we conclude drinking six or more cups of coffee reduces the chance of nonmelanoma skin
cancer?

Exercise: Get Married, Gain Weight. Are young couples who marry or cohabitate more
likely to gain weight than those who stay single? Researchers followed 8000 men and women for 7
years. At the start of the study, none of the participants were married or living with a romantic
partner. The researchers found that women who married or cohabitated during the study period
gained 9 pounds more than single women, and married or cohabitating men gained, on average, 6
pounds more than single men.

a) Why is this an observational study? What type of observational study is it?

b) What is the response variable in the study? What is the explanatory variable?

c) Identify some of the potential lurking variables in the study.

d) Can we conclude that getting married or cohabitating causes one to gain weight?
Section 1.3—Simple Random Sampling
Definitions:
 ______________________________: using chance (an objective device) to select individuals from a
population to be included in the sample
 ______________________________: every sample of a certain size is equally likely (and every
subject has an equal chance of being selected)
Notation:
__________ population size
__________ sample size
 ______________________________: list of all individuals in a population being studied.
 ______________________________: once selected, an individual cannot be chosen again.
 ______________________________: once selected, individuals are placed back in the population
and can be chosen again.
Examples of random devices for selecting a Simple Random Sample: draw names from a hat;
computer software program/website, random number generator; random number table.
Step 1: Randomly find a starting point
Step 2: Using as many columns as digits needed, move through the table, skipping
numbers outside the range of the frame or ones that have already been used,
until you reach your sample size.

Population: N=10; Sample, n=4; start at row 5/column 26 and move across:

Population: N=65; Sample, n=6; start at row 16/column 14 and move downward:

Population: N=304; Sample, n=8; start at row 18/column 27 and move downward:
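
The two steps above can be sketched in Python. This is only a sketch under stated assumptions: the digit string is made up for illustration (it is not taken from the course's random number table), the helper name `sample_from_digits` is hypothetical, and individuals in the frame are assumed to be labeled 1 through N.

```python
import random


def sample_from_digits(digits, population_size, sample_size):
    """Read random digits in fixed-width chunks, as with a random number table."""
    width = len(str(population_size))  # as many columns as digits needed
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip labels outside the frame and labels that were already used.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen


# Made-up digits standing in for one row of a table (an assumption).
row = "0715996503"
print(sample_from_digits(row, 65, 3))

# Software alternative: sampling without replacement from labels 1..65.
print(sorted(random.sample(range(1, 66), 3)))
```

With N = 65 the digits are read two at a time: 07 and 15 are kept, 99 is skipped because it is outside the frame, and 65 completes the sample.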

Exercise: Sophia has 4 tickets to a concert and 10 friends that she would like to invite. Which of
the following would produce a simple random sample of the 4 friends that she will bring with
her:

Mike, Jamie, Adam, Yvette, Ashley, Monica, Cherie, Julie, Willard, Bruce

a. List each person's name on a separate piece of paper, place them in a hat and draw 4.
b. List the names in alphabetical order and take the first 4 names.
c. Ask one of her friends who she would bring.
d. Number the friends from 1 to 10 and use a random number generator to produce 4 numbers
between 1 and 10 that correspond to the 10 friends.

Use Random Number Table 1 to find a list of 4 friends.

Use the TI graphing calculator to obtain a random sample of the 4 friends that she will bring
with her and give their names.

Exercise: To complete the Citizenship in the World merit badge, one must select 3 of the
following eight organizations and describe their role in the world. The list of digits below is
from a random number generator using technology.

7, 4, 4, 7, 3, 6, 2, 1, 9, 9, 5

United Nations, The World Court, The World Health Organization, CARE,
Amnesty International, The Red Cross, World Trade Organization, The World Bank

a. Give a list of 3 organizations using the random number list above.

b. Give a list of 3 organizations using the TI graphing calculator.

c. Give a list of 3 organizations using the Random Number Table 1.

Exercise: A student completing an associate's degree is required to select two courses from
the following list of courses as part of the program.

CRN 1040 Elementary Statistics
CRN 1060 Nonparametric Statistics
CRN 1412 Methods of Multivariate Analysis
CRN 1212 Educational Planning
CRN 1280 Fieldwork Methods and Research

a. Give all possible two-course selections:

b. What is the chance that a student will pick the combination CRN 1040 and CRN 1060?

Exercise: Suppose you are the CEO of a coffee shop chain and you wish to conduct a survey to
determine the length of time customers are in line. Your administrative assistant provides
you with a list of the 674 coffee shops in your chain.

a. Discuss a procedure you could follow to obtain a simple random sample of 5 coffee shops
for your survey.

b. Obtain a sample using the following random number table:

86571 03875
17245 55042
07498 57343
73278 71956
16387 36139
02561 04736
84702 36139

Exercise: The owner of a private food store is concerned with employee morale. She decides to
survey the employees to learn about work environment and job satisfaction. Obtain a simple
random sample of size 4 from the table below using the random number table provided.

10, 37, 22, 46, 03, 15, 02, 18, 14, 05, 27

Section 1.4—Other Effective Sampling Methods

Definitions:
 ______________________________: a simple random sample is drawn from each nonoverlapping
subgroup that the population has been separated into (“strata”)
Works best when: individuals in each stratum are similar in some way and we
want to make sure each stratum is represented; the size of
each simple random sample is proportional to the number of
individuals in the stratum from which it is selected
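
As a rough illustration of the proportional rule above, the following Python sketch allocates a sample across strata and then draws a simple random sample within each one. The strata, their sizes, and the function names are all hypothetical, and plain rounding is an assumption: for other stratum sizes the rounded counts may not add up exactly to the intended sample size.

```python
import random


def proportional_sizes(strata_sizes, sample_size):
    """Allocate the sample across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    sizes = proportional_sizes(
        {name: len(members) for name, members in strata.items()}, sample_size)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}


# Hypothetical frame: 60 freshmen, 30 sophomores, 10 juniors.
strata = {
    "freshman": [f"F{i}" for i in range(60)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(10)],
}
sample = stratified_sample(strata, 10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Here a sample of 10 is split 6, 3, and 1, matching the 60%, 30%, and 10% shares of the three strata.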

 ______________________________: a starting point is randomly selected (p) and then every kth
subject is included in the sample

k = N/n, rounded down to the nearest integer
the n subjects in the sample are numbered:
p, p + k, p + 2k, p + 3k, . . . , p + (n-1)k

Example: n = 4; the 5th and then every 8th subject is selected


1st subject: p = random start = _______ (5)
2nd subject: p + k = _______ (13)
3rd subject: p + 2k = _______ (21)
4th subject: p + 3k = p + (n-1)k = 5 + (4-1)8 = _______ (29)
Works best when: individuals are arranged sequentially but you’re not sure how
many total individuals there are because you don’t have a frame of
the population (files in a drawer, people coming out of a building,
etc.)
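
The numbering p, p + k, p + 2k, . . . can be generated directly. The sketch below uses hypothetical function names and reproduces the example above (n = 4, start p = 5, every 8th subject); the population size of 35 in the second call is made up to show the rounding-down rule.

```python
def step_size(population_size, sample_size):
    """k = N / n, rounded down to the nearest integer."""
    return population_size // sample_size


def systematic_positions(start, k, sample_size):
    """Positions p, p + k, p + 2k, ..., p + (n - 1)k."""
    return [start + i * k for i in range(sample_size)]


# The example above: n = 4, random start p = 5, every 8th subject.
print(systematic_positions(5, 8, 4))  # [5, 13, 21, 29]

# A made-up population of N = 35 with n = 4 gives k = 8.
print(step_size(35, 4))  # 8
```

In an actual draw the start p would come from a random device; a common convention, assumed here rather than stated in these notes, is to pick p at random between 1 and k.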

 ______________________________: the population is divided into sections (“clusters”); all
individuals from randomly selected clusters are included in the
sample
Works best when: individuals are grouped geographically so it makes logistical sense
to sample this way, but location does not affect your variable of
interest; clusters are heterogeneous like the population is
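
A minimal Python sketch of this method, assuming a made-up set of clusters (parking lots and the drivers in each) and a hypothetical function name: whole clusters are chosen at random, and then every individual in a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly select whole clusters and keep every individual in them."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), number_of_clusters)
    individuals = [person for name in selected for person in clusters[name]]
    return selected, individuals


# Hypothetical clusters: parking lots and the drivers parked in each.
lots = {
    "Lot A": ["Ana", "Ben"],
    "Lot B": ["Cy", "Dee", "Eli"],
    "Lot C": ["Fay"],
    "Lot D": ["Gus", "Hal"],
}
selected, drivers = cluster_sample(lots, 2, seed=3)
print(selected, drivers)
```

Note that the randomness applies only to which clusters are picked; nobody inside a selected cluster is left out.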

 ______________________________: sampling subjects who are easy to get (bad sampling method)
Works best when: NEVER!! (not random)

 ______________________________: a convenience sample where individuals are self-selected
(another bad sampling method)
Works best when: NEVER!! (not random, favors those with stronger opinions)

 ______________________________: a sample that is selected in multiple stages, each of which
might use a different method of sampling
Works best when: the population is organized hierarchically but no list exists
Identify the sampling method used in each of the following examples:
a. Simple Random   b. Stratified   c. Systematic   d. Cluster
e. Convenience   f. Voluntary Response   g. Multistage

______ The fifth person exiting the student center and every 12th person afterward is
interviewed.
______ Two out of the six parking lots on campus are randomly selected and every driver in the
two lots receives a questionnaire.
______ Each registered student is assigned a number, 20 numbers are randomly generated and the
corresponding students are interviewed.
______ Six buildings on campus are randomly selected and every 4th person who exits each
building is included in the sample.
______ Randomly select 10 males and 10 females in the activity center to be interviewed.
______ Every person in my class that sits next to me in Math 1040 is interviewed.
______ A pile of questionnaires is left on a table in the library with a sign that says,
“Please complete.”

Exercise: Determine the sampling method used:

a. To estimate the percentage of defects in a recent manufacturing batch, a quality control
manager selects every 8th chip that comes off the assembly line, starting with the 3rd, until she
obtains 140 chips.
b. To determine customer opinion of its boarding policy, Southwest Airlines randomly selects 60
flights during a certain week and surveys all passengers on the flights.
c. A member of Congress wishes to determine her constituency’s opinion regarding estate taxes.
She divides her constituency into three income classes: low-income households, middle-
income households, and upper-income households. She then takes a simple random sample of
households from each income class.
d. In an effort to identify whether an advertising campaign has been effective, a marketing firm
conducts a nationwide poll by randomly selecting individuals from a list of known users of
the product.
e. A radio station asks its listeners to call in their opinion regarding the use of U.S. forces in
peacekeeping missions.
f. A college official divides the student population into five classes: freshman, sophomore, junior,
senior and graduate student. The official takes a simple random sample from each class and
asks the members’ opinions regarding student services.

Exercise: The human resources department at a certain company wants to conduct a survey
regarding worker morale. The department has a list of all 400 employees at the company and
wants to do a systematic sample of size 20. To do this, they randomly select the 8th person
as the starting point and then select every 20th person after that. List the corresponding
workers to be surveyed.

1st = ___________, 2nd = _____________, 3rd = ___________, . . . . . , 20th = ____________

Section 1.5—Bias in Sampling
Definitions:
 ______________________________: when sample results are not representative of the population
 ______________________________: bias because the technique used to select the sample favors
some individuals over others (not random)
o ______________________________: type of sampling bias where part of the population has
a lower chance or no chance of being in the sample (frame
is incomplete)
 ______________________________: bias because individuals in the sample do not respond
 ______________________________: bias because responses given are not accurate

Types of Questions:

 ______________________________: allows respondent to choose his or her own response
ex. What is the most important problem facing youths today?

 ______________________________: requires the respondents to choose from a list of predetermined
responses.

I am going to leave it to you to read Section 1.5. It is 3 pages long and is a must-read
section about sources of bias in sampling.

Exercise: In a group, create some questions that will introduce bias from respondents.

Exercise: In a group, create some samples that will introduce bias.

Section 2.1—Organizing Qualitative (Categorical) Data
Definitions:

 A ___________________________________ lists each category of data and the number of occurrences for each category.

 A __________________________________ lists each category of data and the proportion (or percent) of
observations in that category.
Relative frequency =
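
A short Python sketch of both tables, assuming the usual definition that a category's relative frequency is its frequency divided by the sum of all frequencies. The responses are made up for illustration (they are not the survey data from the exercise that follows), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: (count, count / total)
            for category, count in counts.items()}


# Made-up responses for illustration only.
responses = ["Friday", "Saturday", "Friday", "Sunday"]
table = frequency_table(responses)
for day, (frequency, relative) in table.items():
    print(f"{day:<10}{frequency:>3}{relative:>8.2f}")
```

A quick check on any table built this way: the frequencies should sum to the number of observations and the relative frequencies should sum to 1.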

Exercise: Favorite Day to Eat Out – A survey was conducted by Wakefield Research in which 40
participants were asked to disclose their favorite night to go out to dinner. The following
data are based on their results.

a. Construct a frequency distribution

Day of the Week    Tally    Frequency
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday

Sum = _______

b. Construct a relative frequency distribution

Day of the Week    Relative Frequency
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday

Sum = _______

c. Construct a bar graph
[Blank axes: Favorite Day to Eat Out; horizontal axis: Day of the Week; vertical axis: Frequency, 0 to 16]

d. Construct a relative frequency bar graph
[Blank axes: Favorite Day to Eat Out; horizontal axis: Day of the Week; vertical axis: Relative Frequency, 0 to 0.4]

 A ________________ is a circle divided into sectors. Each sector represents a
category of data and its area is proportional to the frequency.

[Pie chart: Favorite Day to Eat Out. Sunday 7%, Monday 5%, Tuesday 13%, Wednesday 15%,
Thursday 5%, Friday 35%, Saturday 20%]

e. If you own a restaurant, on which days would you purchase an advertisement on the local
radio? Are there days that you should avoid?

Exercise: College Survey – In a national survey conducted among college students, students
were asked, “How often do you wear a seat belt when riding in a car driven by someone else?”
The frequencies were as follows:

a. Construct a relative frequency distribution.

Response            Relative Frequency
Never
Rarely
Sometimes
Most of the time
Always

b. What percentage of respondents answered “Always”?

c. What percentage of respondents answered “Never” or “Rarely”?

d. Suppose a representative from the Center says, “52.7% of college students always wear a
seatbelt.” Is this a descriptive statistic or an inferential statement? Why?

e. Construct a relative frequency bar graph.

[Blank axes: College Survey; horizontal axis: Response (Never, Rarely, Sometimes, Most of the
time, Always); vertical axis: Relative Frequency, 0 to 0.6]

f. If a certain college has 12,728 students, how many would we expect to never wear a seat
belt?

Exercise: Identity fraud occurs when someone else’s personal information is used to open
credit card accounts, apply for a job, receive benefits, and so on. The following relative
frequency bar graph represents the various types of identity theft based on a study conducted
by the Federal Trade Commission in a recent year.

a. If there were 10 million cases of identity fraud in a recent year, how many were credit card
fraud?

b. If the commission claimed, “The results indicate that 17% of the fraud committed in that
year was from Utilities fraud”, would you say this statement is descriptive or inferential?
Why?

Exercise: Desirability Attributes – A random sample of 2163 adults (aged 18 and over) was
asked, “Given a choice of the following, which one would you most want to be?” The results
of the survey are presented in the side-by-side bar graph.

a. What proportion of males would like to be richer?

b. What attribute do females desire more than males?

Exercise: Made in America – A random sample of 2163 adults (aged 18 and over) was asked,
“When you see an ad emphasizing that a product is ‘Made in America’, are you more likely to
buy it, less likely to buy it, or neither more nor less likely to buy it?” The results of the
survey are presented in the side-by-side bar graph.

a. Which age group has the greatest proportion who are more likely to buy when made in
America?

b. Which age group has a majority of respondents who are less likely to buy when
made in America?

c. What is the apparent association between age and likelihood to buy when made in
America?

Section 2.2—Organizing Quantitative Data
Definitions:
 A ___________________ is for qualitative data, while a __________________________ is used for discrete
quantitative data or continuous quantitative data. Data is then grouped into _______________.
Notes on histograms:
(1) Bars have equal widths
(2) Bars are touching
 ______________________________: smallest value in a class
 ______________________________: largest value in a class
 ______________________________: difference between consecutive lower class limits (or
consecutive upper class limits); NOT the difference between the lower
and upper class limits of a class
 ______________________________: a type of table where the first class has no lower limit and/or
the last class has no upper limit

Example: Get height data for our class:

Construct a Frequency Distribution with ______ classes

Class Width = (largest data value − smallest data value) / (number of classes)

Lower Class Limits:

Upper Class Limits:
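
The class-width calculation and the resulting class limits can be sketched as follows. The heights are hypothetical (whole inches from 60 to 76), the function names are made up, and rounding the width up to a whole number is a common convention assumed here rather than something these notes state. The `gap` parameter is the spacing between recorded values (1 for whole-number data).

```python
import math


def class_width(smallest, largest, number_of_classes):
    """(largest - smallest) / number of classes, rounded up to a whole number."""
    return math.ceil((largest - smallest) / number_of_classes)


def class_limits(smallest, width, number_of_classes, gap=1):
    """Pairs of (lower class limit, upper class limit)."""
    lowers = [smallest + i * width for i in range(number_of_classes)]
    uppers = [lower + width - gap for lower in lowers]
    return list(zip(lowers, uppers))


# Hypothetical heights in whole inches, from 60 to 76, in 5 classes.
width = class_width(60, 76, 5)
print(width)
print(class_limits(60, width, 5))
```

Consecutive lower limits differ by exactly the class width, which is the definition given above. If the division comes out to a whole number, the largest value would fall just past the last class, so in that case the width (or the number of classes) would need to be bumped up by hand.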

Identify the Shape of a Distribution
One way that a quantitative variable is described is through the shape of its distribution.

Exercise: Cigarette Tax Rates – The table shows the tax, in dollars, on a pack of cigarettes in each
of the 50 states and Washington DC as of January 2014.

a. Construct a relative frequency distribution with lower class limit 0 and a class width of 0.50.
b. Construct a relative frequency histogram

c. Construct a relative frequency distribution with lower class limit 0 and a class width of 0.10.
d. Construct a relative frequency histogram

e. Does one frequency distribution provide a better summary of the data than the other? Explain

Exercise: Predicting School Enrollment – To predict future
enrollment in a school district, fifty households within the district
were sampled, and asked to disclose the number of children under
the age of five living in the household. The results of the survey are
presented here:

a. Is the given data discrete or continuous? Why?

b. Construct a relative frequency distribution.

c. Construct a relative frequency histogram.

d. What percentage of households have two or more children?

e. What is the shape of the distribution?

Exercise: Back Pack Weights – The following frequency histogram represents the backpack
weights of a random sample of college students.

a. How many students were sampled?

b. Determine the class width.

c. Which weight is the most common?

d. What percent of students have a backpack that weighs at least 30 pounds?

e. What is the shape of the distribution?


 A ______________________________ is another way to represent quantitative data where the rightmost (last)
digit forms a “leaf” and all other digits form the “stem”
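
The stem/leaf split described above can be sketched in Python for non-negative whole numbers: integer division by 10 gives the stem and the remainder gives the leaf. The data and the function name are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        plot[value // 10].append(value % 10)
    return dict(plot)


# Made-up whole-number data.
data = [12, 25, 31, 22, 12, 38, 20]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch skips stems that have no data; a hand-drawn plot would normally still list those stems with no leaves so that gaps in the distribution stay visible.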

Exercise: Grams of Fat in a McDonald’s Breakfast – The following data represent the number
of grams of fat in breakfast meals offered at McDonald’s. Construct a stem and leaf plot and
describe the distribution.

 A ______________________________ is another graph for quantitative data created by placing a dot above
the value for each observation across a horizontal number line in increasing order.

Exercise: Wendy’s Arrival Times – The manager at a Wendy’s fast-food restaurant wants to know the
typical number of customers who arrive during the lunch hour. The following data
represent the number of customers who arrived at Wendy’s during 40 randomly selected 15-minute
intervals of time during lunch. Construct a dot plot and identify the distribution.

Number of Arrivals at Wendy's

7 8 9 6 4 9 8 7
8 7 11 7 8 8 8 8
6 6 8 7 8 3 2 9
9 8 7 1 8 7 9 8
4 7 7 9 6 5 9 5

[Blank dot plot axis: Number of Arrivals at Wendy's; horizontal axis: Number of Customers, 1 to 11]

a. What is the shape of the distribution?

b. What proportion (percent) of 15-minute intervals had more than 7 customers?

Section 2.3—Graphical Misrepresentations of Data (Bad Graphs)
1. Misrepresentation of Data (Circle the good graph; cross out the bad graph)

What’s wrong with the bad graph?

2. Manipulating the Vertical Scale (Circle the good graph; cross out the bad graph)

What’s wrong with the bad graph?

What’s wrong with the bad graph?

3. Inappropriate dimensions (Circle the good graph; cross out the bad graph)
What’s wrong with the
bad graph?

What’s wrong with the bad graph?

Also watch out for:


 Inconsistent scales (values on axes should be equally spaced and numbered)
 Confusing/difficult to read graphs
 Any graph that makes it more difficult to understand the data

Section 3.1—Measures of Central Tendency
Advantages/Disadvantages
 ________________________: the arithmetic average, computed by
adding up all the values and dividing by the number of
observations
1. Add up all the data values
2. Divide by the # of values
Population: μ = ___________________

Sample: x̅ = ___________________
Example: 3, 5, 2, 6, 3
Sum of all data values: ∑ xi = ___________
Number of data values: n = ___________
Divide: x̅ = (∑ xi) / n = ___________ (3.8)
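
The same two steps (add up the values, divide by how many there are) can be checked with a few lines of Python. This is just a sketch using the example data above; the function name is arbitrary.

```python
def mean(values):
    """Add up all the data values, then divide by the number of values."""
    return sum(values) / len(values)


data = [3, 5, 2, 6, 3]
print(sum(data))   # sum of all data values
print(len(data))   # number of data values
print(mean(data))  # the mean
```

The formula is the same for a population (μ) and a sample (x̅); only the symbol and the interpretation change.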

Advantages/Disadvantages
 _________________________: the middle value, value that lies in the
middle of the data when arranged in ascending order
1. Sort the data, from least to greatest
2. Count from the ends to the middle of the list
3. If there is 1 value in the middle (n odd), that is the median
If there are 2 values in the middle (n even), average them
Example: 3, 4, 2, 6, 3
Ordered data values,
smallest to largest: ___ ___ ___ ___ ___
Middle value: ________ (3)
Example: 3, 4, 2, 6, 3, 7
Ordered data values,
smallest to largest: ___ ___ ___ ___ ___ ___
Average of 2
middle values: ________ (3.5)

Comparing the Mean and Median


Definition:
 _________________________: extreme values do not affect the value of a numerical summary

Picture
Symmetric Data: Mean Median

Left-skewed: Mean Median

Right-skewed: Mean Median
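
To see how differently the mean and median react to an extreme value, the following sketch compares a small made-up data set with a copy in which the largest value has been replaced by something extreme. It uses Python's standard `statistics` module; the numbers are hypothetical.

```python
import statistics

original = [2, 3, 3, 4, 6]
with_outlier = [2, 3, 3, 4, 60]  # largest value replaced by an extreme value

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label, statistics.mean(data), statistics.median(data))
```

The mean moves from 3.6 to 14.4 while the median stays at 3, which is the behavior the definition above describes.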

 _________________________: the most frequently occurring observation
1. Choose the value that occurs the most often
2. There may be multiple modes or no modes.
Example: 3, 4, 2, 6, 3 Which value is repeated the most often? _________
Example: 6, 3, 2, 6, 3 Which value is repeated the most often? _________
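
Counting how often each value occurs makes the "multiple modes or no modes" cases explicit. In this sketch the function name is hypothetical, and returning an empty list when no value repeats is one possible convention, assumed here for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count; [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 4, 2, 6, 3]))  # [3]
print(modes([6, 3, 2, 6, 3]))  # [3, 6]
print(modes([1, 2, 3]))        # []
```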

TI 83/84 or StatCrunch

Example: Random sample of car emissions (CO2 equivalents in tons per year)
7.2, 7.1, 7.4, 7.9, 6.5, 7.2, 8.2, 9.3

Mean: Median: Mode:

Example: Exam scores in a statistics class taught using traditional lecture and a class taught
using a “flipped” classroom model were recorded. The “flipped” classroom is one where the
content is delivered via video and watched at home, while class time is used for activities and
exploration.

a. Determine the mean and median score for each class. (Comment on any differences)

b. Suppose the score of 59.8 in the traditional course was incorrectly recorded as 598. How
would this affect the mean? The median? What property does this illustrate?

Example: The following data represent the pulse rates (beats per
minute) of nine students enrolled in a statistics course. Treat the 9
students as a population.

a. Determine the population mean pulse rate.

b. Find 3 samples of size 3 and determine the sample mean pulse rate for each. Are they
overestimates or underestimates?

i)

ii)

iii)

Example: Hours Working – A random sample of 25 college
students was asked, “How many hours per week typically
do you work outside the home?” Their responses were as
follows.

a. Determine the distribution of hours worked by drawing a frequency histogram.

b. Find the mean and median. Which measure of central tendency better describes the hours
worked?

Example: The following data represent the weights (in grams) of a simple random sample of 50
M&M plain candies. Determine the shape of the distribution of weights of M&M’s by drawing a
frequency histogram. Find the mean and median. Which measure of central tendency better
describes the weight of plain M&M’s?

Create classes with a lower limit of 0.8 and a class width of 0.025.

Class            Frequency
0.800 - 0.824
0.825 - 0.849
0.850 - 0.874
0.875 - 0.899
0.900 - 0.924
0.925 - 0.949
0.950 - 0.974
0.975 - 1.000

Example: The median for the given set of data values is 16. What is the missing value?
3 7 12 13 ____ 25 28 31

Example: In a class with 4 exams, a student has an average of 84 on the first 3 exams. What
must she score on the fourth exam to have an average of 86?

Section 3.2—Measures of Dispersion
Compare the wait times for two restaurants. Notice that the histograms for both restaurants are
centered at 8 minutes, but Restaurant B is much more consistent (wait times are between about 6 and
10 minutes) than Restaurant A (wait times are between about 1 and 15 minutes). How consistent or
inconsistent measurements are is called dispersion (or spread).

Definition:
 ______________________________: a measure of how much the data are spread out (in general, we
prefer less dispersion)

Measures of Variation:
 ______________________________: difference between the largest and smallest data values.

Advantage / Disadvantage
Subtract: (largest data value) – (smallest data value)
Example: 3, 5, 2, 4, 1

R = Max – Min = ______ – ______ = _________

 ______________________________: measures the typical deviation between a data value and the mean.
1. Find the mean
2. Find deviations from each data value to the mean
3. Square all deviations and add them up
Note: we square them since the sum of the deviations
without squaring is always 0.
4. Divide by N (population) or n–1 (sample)
5. Take the square root
Population: σ = √( ∑(x − μ)² / N )        Sample: s = √( ∑(x − x̅)² / (n − 1) )
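
The five steps for the standard deviation, along with the range, can be sketched in Python. The data are the values from the range example above (3, 5, 2, 4, 1); the function names and the `sample` flag are choices made for this illustration.

```python
import math


def data_range(values):
    """(largest data value) - (smallest data value)."""
    return max(values) - min(values)


def standard_deviation(values, sample=True):
    """Follow the five steps above; divide by n - 1 for a sample, N for a population."""
    mean = sum(values) / len(values)                        # step 1
    squared_deviations = [(x - mean) ** 2 for x in values]  # steps 2 and 3
    divisor = len(values) - 1 if sample else len(values)    # step 4
    return math.sqrt(sum(squared_deviations) / divisor)     # step 5


data = [3, 5, 2, 4, 1]
print(data_range(data))
print(standard_deviation(data, sample=False))  # treat the data as a population
print(standard_deviation(data, sample=True))   # treat the data as a sample
```

For these values the mean is 3 and the squared deviations add up to 10, so the population result is √(10/5) = √2 and the sample result is √(10/4) = √2.5. The sample value is always the larger of the two because it divides by the smaller number n − 1.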

Definition:
 ______________________________: a measure of the amount of data available; the number
of observations that are free to be any value given the sum
of the observations = n – 1

The larger the standard deviation, the __________________ dispersion the distribution has.

 ______________________________: the square of the standard deviation.

Population variance: $\sigma^2$    Sample variance: $s^2$

Definition:
 ______________________________: when a statistic consistently underestimates or overestimates
a parameter

Exercise: Consider the grades from an exam of 10 students enrolled in an
introductory statistics class. Treat the 10 students as a population.

a. Compute the sample standard deviation of a random sample of 4 students.

sample mean= __________

Score Deviation from the mean Squared Deviation

b. Compute the population standard deviation using the TI Calculator and StatCrunch.

c. Compute the sample variance and population variance of the exam scores.

Exercise: Consider the emissions from 8 cars from a rental fleet. Compute the following using
the TI 83/84 and StatCrunch

Car emissions sample: 7.2, 7.1, 7.4, 7.9, 6.5, 7.2, 8.2, 9.3

Range:
Standard Deviation:
Variance:
What if these were the car emissions for all the cars in a particular fleet?

Range:
Standard Deviation:
Variance:
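
As a cross-check alongside the TI 83/84 and StatCrunch, the same quantities can be computed with Python's standard `statistics` module. This is a sketch only; the variable names are illustrative. `stdev` and `variance` divide by n – 1 (sample), while `pstdev` and `pvariance` divide by N (population).

```python
import statistics

emissions = [7.2, 7.1, 7.4, 7.9, 6.5, 7.2, 8.2, 9.3]

# Range is the same whether the data are a sample or a population
data_range = max(emissions) - min(emissions)

# Treating the 8 cars as a sample: divide by n - 1
sample_sd = statistics.stdev(emissions)
sample_var = statistics.variance(emissions)

# Treating the 8 cars as the entire fleet (a population): divide by N
pop_sd = statistics.pstdev(emissions)
pop_var = statistics.pvariance(emissions)

print(f"range = {data_range:.1f}")
print(f"sample:     s = {sample_sd:.3f}, s^2 = {sample_var:.3f}")
print(f"population: sigma = {pop_sd:.3f}, sigma^2 = {pop_var:.3f}")
```
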
Exercise: Do example 7 from homework.

Summary: The _________________________________ is used in conjunction with the _______________ to
numerically describe distributions that are ____________________________. The _____________
measures the center of the distribution while the _________________________________
measures the ________________ of the distribution.

Exercise: Which distribution depicts the highest standard deviation?

Empirical Rule of Thumb


For bell-shaped data:
About __________% of all values lie within 1 standard deviation of the mean.
About __________% of all values lie within 2 standard deviations of the mean.
About __________% of all values lie within 3 standard deviations of the mean.
[Figure: bell curve with x̅ = 12 and s = 2, showing the 68%, 95%, and 99.7% regions. Fill the boxes with the appropriate values.]

If a distribution is bell-shaped with x̅ = 12 and s = 2, approximately what percent of the data is:
between 10 and 14?

less than 8 or greater than 16?

If a distribution is bell-shaped with x̅ = 12 and s = 2, approximately 99.7% are between what two values?

What percent of the data is greater than 14?

What percent of the data is less than 6?
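
The Empirical Rule intervals for a bell-shaped distribution can be tabulated with a short Python sketch. The function name `empirical_rule_intervals` is a made-up helper for illustration; it uses the x̅ = 12, s = 2 values from the questions above.

```python
def empirical_rule_intervals(mean, sd):
    """Return the interval mean +/- k*sd for k = 1, 2, 3, keyed by approximate percent."""
    percents = {1: 68, 2: 95, 3: 99.7}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in percents.items()}

intervals = empirical_rule_intervals(12, 2)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of values lie between {low} and {high}")
```

Tail percentages follow from symmetry: whatever lies outside one of these intervals is split evenly between the lower tail and the upper tail.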

Exercise: The following data represents the weights
(in grams) of a random sample of 50 M&M plain
candies.

Open in StatCrunch chapter 3.2 #23

a. Determine the sample standard deviation weight.

b. On the basis of the histogram, comment on the appropriateness of using the Empirical Rule
to make any general statements about the weights of M&Ms.

c. 68% of the candies’ weights will be between what two weights?

d. Use the Empirical Rule to determine the percentage of M&Ms with weights between 0.803
and 0.947 grams, inclusive.

e. Use the Empirical Rule to determine the percentage of M&Ms that weigh more than 0.911
grams.

f. Determine the actual percentage of M&Ms that weigh more than 0.911 grams.

Exercise: SAT Math scores have a bell-shaped distribution with a mean of 515 and a standard
deviation of 114.

a. What percentage of SAT scores are between 401 and 629?

b. What percentage of SAT scores are less than 401?

c. What percentage of SAT scores are greater than 743?

Section 3.3—Measures of Central Tendency & Dispersion from Grouped Data
Definition:
 ______________________________: data is given summarized in a frequency table, rather than the
raw data values for each observation. Since we only know counts
of observations that fall within certain categories, we can’t
compute the mean or standard deviation using the formulas from
Sections 3.1 and 3.2.

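Instead, $\mu \text{ (or } \bar{x}\text{)} \approx \dfrac{(\text{1st class midpoint} \times \text{1st class frequency}) + \cdots + (\text{last class midpoint} \times \text{last class frequency})}{\text{total frequency}} = \dfrac{\sum x_i f_i}{\sum f_i}$

$x_i$ is the _____________________ of the $i$th class $= \dfrac{\text{lower limit} + \text{next lower limit}}{2}$

$f_i$ is the ______________________ of the $i$th class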

Formulas for the standard deviation of grouped data (frequency distributions):

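Population: $\sigma = \sqrt{\dfrac{\sum (x_i-\mu)^2 f_i}{\sum f_i}}$    Sample: $s = \sqrt{\dfrac{\sum (x_i-\bar{x})^2 f_i}{\left(\sum f_i\right) - 1}}$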

However, we will use the calculator to determine the mean and standard deviation.
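
To see what the calculator is doing with grouped data, here is a minimal Python sketch of the midpoint method. The frequency table is hypothetical (made-up class limits and counts, not data from these notes), and all variable names are illustrative.

```python
import math

# Hypothetical frequency table: (lower class limit, frequency).
# These numbers are made up for illustration only.
classes = [(0, 3), (10, 7), (20, 6), (30, 4)]
class_width = 10

# Midpoint = (lower limit + next lower limit) / 2 = lower limit + width / 2
midpoints = [lower + class_width / 2 for lower, _ in classes]
freqs = [f for _, f in classes]
total = sum(freqs)                                   # sum of the f_i

# Approximate mean: sum of (midpoint x frequency) / total frequency
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total

# Weighted sum of squared deviations of the midpoints from the mean
sum_sq = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))

pop_sd = math.sqrt(sum_sq / total)                   # divide by sum of f_i
sample_sd = math.sqrt(sum_sq / (total - 1))          # divide by (sum of f_i) - 1

print(mean, round(pop_sd, 3), round(sample_sd, 3))
```

The results are approximations because every observation in a class is treated as if it sat exactly at the class midpoint.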

Exercise: The five-year rate of return of a random sample of 40 large-blended mutual
funds is given. Approximate the mean and standard deviation of the five-year rate of
return using the TI-calculator.

Exercise: Recently a random sample of 25-34 year olds was asked, “How much do you
currently have in savings, not including retirement savings?” Approximate the mean
and standard deviation amount of savings using the TI-calculator.

Exercise: The following data represents the number of people aged 25 to 64 years covered by
health insurance (private or government) in 2003. Approximate the mean and standard
deviation for age using the TI-calculator.

Exercise: The following is the daytime household temperature that the thermostat is set to when
someone is home for a random sample of 750 households.

a. Approximate the mean and standard deviation using the TI-calculator.

b. Draw a frequency histogram of the data to verify that the distribution is bell shaped.

[Blank histogram grid: Thermostat Temperature. Horizontal axis: Temperature (in degrees), 55 to 85 in steps of 2. Vertical axis: frequency, 0 to 200 in steps of 50.]

c. On the basis of the histogram, comment on the appropriateness of using the empirical rule
to make any general statements about the temperature data.

d. According to the Empirical Rule, 95% of days in the month will be between what two
temperatures?

Section 3.4—Measures of Position and Outliers
Definitions:
 __________________________: the number of standard deviations that a data value is from the mean

Formulas: $z = \dfrac{x-\mu}{\sigma}$ (population z-score)    $z = \dfrac{x-\bar{x}}{s}$ (sample z-score)


Characteristics of z-score:
(1) unitless
(2) mean of z-scores = ________
(3) standard deviation of z-scores = ________
(4) if a data value is larger than the mean, the z-score is ____________________
(5) if a data value is smaller than the mean, the z-score is __________________
(6) can be used to compare measurements on different scales

Example: Final exam scores for a biology course have a mean of 79 and standard deviation of 5.
Final exam scores for a statistics course have a mean of 82 and a standard deviation of 7. You
scored 86 on the biology final and 91 on the statistics final. How many standard deviations above
the mean were you in each course? Relative to the rest of the students, on which exam did you do
better (scored higher)?

86 on Biology:
x (test score) = ___________
µ (Biology mean) = _________
σ (Biology st. dev.) = _________
$z = \dfrac{x-\mu}{\sigma}$ = ___________
86 on Biology is __________ standard deviations _____________ (above/below) the class mean.

vs.

91 on Statistics:
x (test score) = ___________
µ (Statistics mean) = _________
σ (Statistics st. dev.) = _________
$z = \dfrac{x-\mu}{\sigma}$ = ___________
91 on Statistics is __________ standard deviations _____________ (above/below) the class mean.

You scored higher on the ____________________ test relative to the rest of the students because it was
further above the mean.
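
The comparison above can be checked with a small Python sketch. The helper `z_score` is an illustrative name, and the numbers are the exam means and standard deviations given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_biology = z_score(86, 79, 5)
z_statistics = z_score(91, 82, 7)

# For exam scores, the larger z-score is the better relative performance
better = "biology" if z_biology > z_statistics else "statistics"
print(f"biology z = {z_biology:.2f}, statistics z = {z_statistics:.2f}, better: {better}")
```

When a smaller value is better, as with the finishing times in the triathlon example that follows, the comparison flips: the more negative z-score marks the stronger relative performance.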

Example: Roberto finishes a triathlon (750-meter swim, 5-kilometer run, and 20-kilometer bicycle)
in 63.2 minutes. Among all men in the race, the mean finishing time was 69.4 minutes with a
standard deviation of 8.0 minutes. Zandra finishes the same triathlon in 79.3 minutes. Among all
women in the race, the mean finishing time was 84.7 minutes with a standard deviation of 7.4
minutes. Who did better relative to their gender?

 _______________________: Pk, the data value where k% of the observations are less than or equal to it.

Example: The 85th percentile of the IQ scores of honor roll graduates of a certain college is 124. What
does this mean?
_______% of the college’s honor roll graduates have an IQ of 124 or _____________, and
_______% of the college’s honor roll graduates have an IQ _____________ than 124.

Example: The 80th percentile of the distance students of a certain college drive is 12.4 miles. What does
this mean?
_______% of the students drive 12.4 miles or _____________, and
_______% of the students drive _____________ than 12.4 miles.

 ______________________________: values that divide data sets into fourths


Q1 divides the bottom _______% from the top _______% = the _______th percentile
Q2 divides the bottom _______% from the top _______% = the _______th percentile or the _____________
Q3 divides the bottom _______% from the top _______% = the _______th percentile
How to find quartiles:
Step 1: Arrange the data in ______________________________ order.
Step 2: Determine the ________________________ (also called Q2).
Step 3: Use the median to divide the data set into halves.
Q1 is the median of the ___________________ half of the data and
Q3 is the median of the ___________________ half of the data.
Note: If the number of observations is odd, do not include the median when determining Q1 and Q3 by hand.

 _______________________________________________: range of the middle 50% of the observations


IQR = ___________ – ___________ Note: IQR is a resistant measure of spread

 ______________________________: extreme observations, values far from the bulk of the data
Determining outliers (Note: there are other methods for determining outliers as well):
Lower fence = _________ – 1.5(_________)
Upper fence = _________ + 1.5(_________)
Values outside the fences (less than the ______________________________ or greater than the
______________________________) are considered outliers.

Another way to determine outliers: |z-score| > 2 means the value is unusual (outlier)
Example: One variable that is measured by online homework systems is the amount of time a student
spends on homework for each section of text. The following is a summary of the number of
minutes a student spends for each section for last semester.

Q1 = 42, Q2 = 51.5, and Q3 = 72.5

a. Provide an interpretation of these results.

b. Determine and interpret the interquartile range.

c. Suppose a student spent 2 hours doing homework for a section. Is this an outlier?

d. Do you believe that the distribution of time spent doing homework is skewed or
symmetric? Why?
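
The fence calculation for this example can be sketched in Python using the quartiles given above. The helper name `fences` is illustrative, and the 2 hours are converted to minutes to match the units of the quartiles.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = fences(42, 72.5)

minutes = 2 * 60  # 2 hours expressed in minutes
is_outlier = minutes < lower or minutes > upper
print(lower, upper, is_outlier)
```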

Example: Hemoglobin in cats: 5.7 6.1 7.8 8.8 9.4 9.4 9.6 9.9
10.0 10.3 10.6 10.7 11.5 11.7 12.9 14.3
Q1 = Q2 = Q3 =

Distance between Q1 & Median =

Distance between Q3 & Median =

Skewed?

IQR =

Lower fence =

Upper fence =

Outliers?

Example: A credit card company has a fraud-detection
service that determines if a card has any unusual
activity. The company maintains a database of daily
charges on a customer’s credit card. If a day’s worth
of charges appears unusual, the customer is
contacted to make sure that the credit card has not
been compromised. The company uses the upper
fence as the cutoff point for the daily charges that
must be exceeded before the customer is contacted.
What is the cutoff point?

Section 3.5—The Five-Number Summary and Boxplots


Definitions:

 ________________________________________: consists of the smallest data value, Q1, median, Q3, and
the largest data value

 ______________________________: a type of graph created using the five-number summary

 ______________________________: lines extending from the box to the smallest and largest (non-
outlier) values

Skewness in Boxplots:
Right-skewed: Median left of box’s center, right whisker longer
Symmetric: Median roughly at box’s center, whiskers equal length
Left-skewed: Median right of box’s center, left whisker longer
Note: outliers are marked with an *

Example: Below are two boxplots for the ACT scores of incoming freshmen at two colleges. Use the
boxplots to compare the two colleges.

5-Number Summaries for each college:
A:
B:

Which college has a higher median ACT score?
College ______ has a higher median than College ______ (compare center lines).
Which college’s ACT scores have less dispersion?
College ______ has less variation than College ______ (compare boxplot lengths).
Which college has more symmetric ACT scores?
College ______ is more symmetric than College ______ (compare shapes).
Which college has more skewed ACT scores?
College ______ is more skewed than College ______ (compare shapes).

Example: Consider the number of items produced per hour at a factory:
24 15 29 30 26
28 20 27 23 30
22 7 26 28 21

Five-Number Summary: __________, __________, __________, __________, __________

Fences:

Draw the boxplot:

What if one of the largest values, 30, is replaced with 40?
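
The by-hand quartile method and the fences can be combined in one Python sketch for the factory data above. The function names are made up for illustration. Note that this follows the by-hand rule from these notes (the median is left out of both halves when the number of observations is odd), so other software may report slightly different quartiles.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the by-hand quartile method."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]        # excludes the median when n is odd
    upper_half = values[(n + 1) // 2:]   # excludes the median when n is odd
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


def find_outliers(data):
    """Values below the lower fence or above the upper fence."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


items = [24, 15, 29, 30, 26, 28, 20, 27, 23, 30, 22, 7, 26, 28, 21]
summary = five_number_summary(items)
outliers = find_outliers(items)

# Replace one of the two 30s with 40 and check for outliers again
modified = items.copy()
modified[modified.index(30)] = 40
modified_outliers = find_outliers(modified)

print(summary, outliers, modified_outliers)
```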

Example: Do store-brand chocolate chip cookies have
fewer chips per cookie than Keebler’s Chips Deluxe
Chocolate Chip Cookies? To find out, a student
randomly selected 13 cookies of each brand and
counted the number of chips in the cookies.

Name Brand: 22 22 23 23 24 25 26 28 28 29 31 32 35

Store Brand: 15 17 19 21 22 23 24 24 26 27 28 28 33

a. Determine the 5-number summary for each brand of cookies using the TI-84 or by hand

Name Brand:

Store Brand:

b. Draw side-by-side boxplots for each brand of cookies

Name Brand:

Store Brand:

Does there appear to be a difference in the number of chips per cookie?

c. Does one have a more consistent number of chips per cookie?
