
Measures of Central Tendency for Ungrouped Data
Section 3.1

Mean (Arithmetic Mean)

Typically what is meant by the term "average"
Mean for population data is denoted by \(\mu\): \(\mu = \frac{\sum x}{N}\)
Mean for sample data is denoted by \(\bar{x}\): \(\bar{x} = \frac{\sum x}{n}\)
N is the population size
n is the sample size

Example
Number of car thefts in the past 12 days:
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

\[ \text{Mean} = \frac{6 + 3 + 7 + 11 + 4 + 3 + 8 + 7 + 2 + 6 + 9 + 15}{12} = \frac{81}{12} = 6.75 \]
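The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the names `thefts` and `mean` are illustrative and not part of the slides.

```python
# Car thefts per day over the past 12 days (values from the example).
thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

print(sum(thefts))   # 81
print(mean(thefts))  # 6.75
```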

Sample Mean vs. Population Mean


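Number of car thefts in the past 12 days:
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

Sample means:
Odd-numbered days: \(\bar{x} = \frac{6 + 7 + 4 + 8 + 2 + 9}{6} = \frac{36}{6} = 6\)
Even-numbered days: \(\bar{x} = \frac{3 + 11 + 3 + 7 + 6 + 15}{6} = \frac{45}{6} = 7.5\)

*Sample mean varies from sample to sample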
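A short sketch of the same comparison, treating the 12 days as the population and the odd-numbered and even-numbered days as two samples. The variable names are illustrative assumptions.

```python
thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]

def mean(values):
    return sum(values) / len(values)

odd_days = thefts[0::2]    # days 1, 3, 5, 7, 9, 11
even_days = thefts[1::2]   # days 2, 4, 6, 8, 10, 12

population_mean = mean(thefts)   # 6.75
odd_mean = mean(odd_days)        # 6.0
even_mean = mean(even_days)      # 7.5

print(population_mean, odd_mean, even_mean)
```

Both sample means differ from each other and from the mean of all 12 days, which is the point of the slide.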

Outliers and Mean


Recall: Outliers are values that are very small or very
large relative to the majority of values in a data set
Mean is heavily influenced by outliers
Example: The following data represent the number of
tornadoes that touched down during 1950-1994 in the 12
states that had the most tornadoes during that period.
State      CO    FL    IA    IL    KS    LA    MO    MS    NE    OK    SD    TX
Tornadoes  1113  2009  1374  1137  2110  1086  1166  1039  1673  2300  1139  5490

\[ \bar{x} = \frac{21636}{12} = 1803 \]
Eliminate the outlier (TX, 5490) from the sample set:
\[ \bar{x} = \frac{16146}{11} \approx 1468 \]
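To see how much a single outlier moves the mean, the two calculations can be sketched as follows. The dictionary name `tornadoes` is an illustrative choice.

```python
# Tornado counts, 1950-1994, for the 12 states in the example.
tornadoes = {
    "CO": 1113, "FL": 2009, "IA": 1374, "IL": 1137, "KS": 2110, "LA": 1086,
    "MO": 1166, "MS": 1039, "NE": 1673, "OK": 2300, "SD": 1139, "TX": 5490,
}

all_counts = list(tornadoes.values())
mean_all = sum(all_counts) / len(all_counts)

# Drop the outlier (TX) and recompute.
without_tx = [count for state, count in tornadoes.items() if state != "TX"]
mean_without_tx = sum(without_tx) / len(without_tx)

print(mean_all)                # 1803.0
print(round(mean_without_tx))  # 1468
```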

Trimmed Mean
Can use a trimmed mean to reduce the effect of outliers.
Calculate trimmed mean by dropping certain percentage
of values from each end of a ranked data set.
Example: The following data represent the number of
tornadoes that touched down during 1950-1994 in the 12
states that had the most tornadoes during that period.
State      CO    FL    IA    IL    KS    LA    MO    MS    NE    OK    SD    TX
Tornadoes  1113  2009  1374  1137  2110  1086  1166  1039  1673  2300  1139  5490

Rank data: 1039, 1086, 1113, 1137, 1139, 1166, 1374, 1673, 2009, 2110, 2300, 5490
Trim 10% from both sides. 10% of 12 is 1.2, so eliminate 1 point from each side
Trimmed data: 1086, 1113, 1137, 1139, 1166, 1374, 1673, 2009, 2110, 2300

\[ \text{Trimmed mean} = \frac{15107}{10} = 1510.7 \approx 1511 \]
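The trimming steps can be written as a small function. This is a sketch, not a library routine: `trimmed_mean` is a hypothetical helper, and it assumes the number of points dropped from each end is the trim percentage times n, rounded down (1.2 becomes 1, as in the example).

```python
def trimmed_mean(values, proportion):
    """Drop int(proportion * n) values from each end of the ranked data, then average."""
    ranked = sorted(values)
    k = int(proportion * len(ranked))   # 10% of 12 is 1.2, truncated to 1
    kept = ranked[k:len(ranked) - k]
    return sum(kept) / len(kept)

counts = [1113, 2009, 1374, 1137, 2110, 1086, 1166, 1039, 1673, 2300, 1139, 5490]
print(trimmed_mean(counts, 0.10))  # 1510.7
```

With a proportion of 0 the function returns the ordinary mean, so the trimmed mean can be compared directly with the untrimmed value of 1803.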

Weighted Mean
Use when certain values of a data set are
considered more important than others
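\[ \text{Weighted mean} = \frac{\sum xw}{\sum w} \]
where x is a value and w is the weight assigned to it.

Example: Weighted grades for STA2023

Class               Weight   Actual Grade
Computer Practice   10%      95%
Computer Quizzes    15%      90%
In-Class Quizzes    10%      87%
In-Class Tests      65%      83%

\[ \text{Weighted mean} = \frac{10(95) + 15(90) + 10(87) + 65(83)}{10 + 15 + 10 + 65} = \frac{8565}{100} = 85.65\% \]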
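A minimal sketch of the weighted-grade calculation, with the weights and grades taken from the table. The function name `weighted_mean` is an illustrative assumption.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Computer Practice, Computer Quizzes, In-Class Quizzes, In-Class Tests
weights = [10, 15, 10, 65]
grades = [95, 90, 87, 83]

print(weighted_mean(grades, weights))  # 85.65
```

Because the result is divided by the sum of the weights, the weights do not have to add up to 100.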

Median
The value of the middle term in a data set
that has been ranked in increasing order
Gives the center point of the histogram
Not influenced by outliers

Example: Number of car thefts in the past 12 days
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

Rank values: 2, 3, 3, 4, 6, 6, 7, 7, 8, 9, 11, 15
Even number of values, so take the average of the 2 middle terms
Median = (6 + 7) / 2 = 6.5
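The ranking and averaging of the two middle terms can be sketched directly, and checked against `statistics.median` from the Python standard library. The helper name `median_by_hand` is illustrative.

```python
from statistics import median

thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]

def median_by_hand(values):
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd n: the single middle term
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average the 2 middle terms

print(median_by_hand(thefts))  # 6.5
print(median(thefts))          # 6.5
```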

Mode
The value that occurs with the highest frequency in a
data set
Data set may have none (no observation occurs more
than once) or may have more than one mode
One mode - unimodal
Two modes - bimodal
More than two modes - multimodal
Example: Number of car thefts in the past 12 days
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

3, 6 & 7 all occur twice, so this data set is multimodal with modes = 3, 6, 7
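A small sketch of finding every mode by counting frequencies. The function `modes` is a hypothetical helper; it returns an empty list when no value repeats, matching the "no mode" case described above.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # every observation occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]
print(modes(thefts))  # [3, 6, 7]
```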

Relationship Among Mean, Median & Mode

Measures of Dispersion
for Ungrouped Data
Section 3.2

Measures of Dispersion

Help us learn about the spread of a data set


Range
o The difference between the largest & smallest values in a data
set
o Heavily influenced by outliers
o Range = Largest Value - Smallest Value
o Example: The following data represent the number of tornadoes
that touched down during 1950-1994 in the 12 states that had
the most tornadoes during that period.
State      CO    FL    IA    IL    KS    LA    MO    MS    NE    OK    SD    TX
Tornadoes  1113  2009  1374  1137  2110  1086  1166  1039  1673  2300  1139  5490

Range = 5490 - 1039 = 4451
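The range is a one-line calculation. The sketch below also recomputes it without the Texas value, an extra step not on the slide, to illustrate how strongly one outlier affects it.

```python
counts = [1113, 2009, 1374, 1137, 2110, 1086, 1166, 1039, 1673, 2300, 1139, 5490]

data_range = max(counts) - min(counts)
print(data_range)  # 4451

# Illustration only: drop the outlier (TX = 5490) and recompute.
without_tx = [x for x in counts if x != 5490]
range_without_tx = max(without_tx) - min(without_tx)
print(range_without_tx)  # 1261
```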

Standard Deviation

Tells us how closely the values of a data set are clustered around the mean
Lower values indicate the data set is spread over a smaller range around the mean
Never a negative value
Variance: the square of the standard deviation

Population Variance: \(\sigma^2 = \frac{\sum (x - \mu)^2}{N}\)
Population Standard Deviation: \(\sigma = \sqrt{\sigma^2}\)
Sample Variance: \(s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}\)
Sample Standard Deviation: \(s = \sqrt{s^2}\)

Calculating Standard Deviation


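Example: The following data set belongs to a population. Calculate the variance and standard deviation.
5, -7, 1, 0, -9, 16, 10, 7

Recall: \(\sigma^2 = \frac{\sum (x - \mu)^2}{N}\)

Calculate \(\mu\): \(\mu = 2.875\)
Calculate \(\sum (x - \mu)^2 = 494.875\)
\[ \sigma^2 = \frac{494.875}{8} = 61.859 \]
\[ \sigma = \sqrt{61.859} = 7.865 \]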
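The population variance and standard deviation can be computed step by step as a check on the figures above. A minimal sketch using the basic (not short-cut) formula; the variable names are illustrative.

```python
import math

data = [5, -7, 1, 0, -9, 16, 10, 7]   # population data from the example

N = len(data)
mu = sum(data) / N                                   # 2.875
sum_sq_dev = sum((x - mu) ** 2 for x in data)        # 494.875
variance = sum_sq_dev / N                            # divide by N for a population
std_dev = math.sqrt(variance)

print(mu, sum_sq_dev)
print(round(variance, 3), round(std_dev, 3))  # 61.859 7.865
```

For sample data the only change would be dividing by n - 1 instead of N.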

Short-Cut Formulas for Variance


\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \]

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \]

Calculating Variance & Standard Deviation Using Short-Cut Formulas
Example: The following data represent the number of
tornadoes that touched down during 1950-1994 in the 12
states that had the most tornadoes during that period.
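State      CO    FL    IA    IL    KS    LA    MO    MS    NE    OK    SD    TX
Tornadoes  1113  2009  1374  1137  2110  1086  1166  1039  1673  2300  1139  5490

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \]
Need to know: \(\sum x^2\) and \((\sum x)^2\)
\(\sum x^2 = 56{,}052{,}418\) and \((\sum x)^2 = 468{,}116{,}496\)

\[ s^2 = \frac{56{,}052{,}418 - \frac{468{,}116{,}496}{12}}{12 - 1} = 1{,}549{,}337.273 \]
\[ s = \sqrt{1{,}549{,}337.273} = 1244.724 \]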
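The short-cut calculation can be sketched in Python and compared with `statistics.variance`, the standard library's sample variance (which also divides by n - 1). Variable names are illustrative.

```python
import math
from statistics import variance

counts = [1113, 2009, 1374, 1137, 2110, 1086, 1166, 1039, 1673, 2300, 1139, 5490]

n = len(counts)
sum_x = sum(counts)                    # 21636
sum_x2 = sum(x * x for x in counts)    # 56052418

# Short-cut formula for the sample variance.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(round(s2, 3), round(s, 3))
print(abs(s2 - variance(counts)) < 1e-6)  # the two methods agree
```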

Parameter vs. Statistic

Parameter
o A numerical measure calculated for a population data
set
o e.g. mean, median, mode, range, variance, standard
deviation
o Symbols: \(\mu\), \(\sigma\)

Statistic

o A summary measure calculated for a sample data set


o e.g. sample mean, median, mode, range, sample
variance, sample standard deviation
o Symbols: \(\bar{x}\), \(s\)

Mean, Variance & Standard Deviation for Grouped Data
Section 3.3

Mean for Grouped Data


Population: \(\mu = \frac{\sum mf}{N}\)
Sample: \(\bar{x} = \frac{\sum mf}{n}\)

m = midpoint, f = frequency of class

Example: The table gives the grouped data on the ounces of milk dispensed by a machine into 1-gallon jugs for a sample of 250 jugs of milk selected from a day's production. Note that 1 gallon is equal to 128 ounces.

Ounces of Milk         Number of Jugs (f)   Midpoint (m)   mf
121 to less than 123   5                    122            122*5 = 610
123 to less than 125   13                   124            124*13 = 1612
125 to less than 127   42                   126            126*42 = 5292
127 to less than 129   129                  128            128*129 = 16512
129 to less than 131   61                   130            130*61 = 7930

\[ \bar{x} = \frac{\sum mf}{n} = \frac{31956}{250} \approx 127.82 \text{ ounces} \]
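A minimal sketch of the grouped-data mean for the milk example. Each class is stored as (lower limit, upper limit, frequency); this layout and the names are illustrative assumptions.

```python
# (lower limit, upper limit, number of jugs) for each class
classes = [
    (121, 123, 5),
    (123, 125, 13),
    (125, 127, 42),
    (127, 129, 129),
    (129, 131, 61),
]

n = sum(f for _, _, f in classes)                       # 250 jugs in the sample
sum_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # midpoint times frequency
grouped_mean = sum_mf / n

print(n, sum_mf)
print(round(grouped_mean, 3))  # 127.824
```

The result is an approximation of the true sample mean, because every jug in a class is treated as if it held the class midpoint.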

Variance for Grouped Data


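Population: \(\sigma^2 = \frac{\sum f(m - \mu)^2}{N}\)
Sample: \(s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}\)

Short-cut Formulas:
\[ \sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} \]

\[ s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} \]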

Calculating Variance and Standard Deviation for Grouped Data
Example: The table gives the grouped data on the ounces of milk dispensed by a machine into 1-gallon jugs for a sample of 250 jugs of milk selected from a day's production. Note that 1 gallon is equal to 128 ounces.

Ounces of Milk         Number of Jugs (f)   Midpoint (m)   mf                m^2 f
121 to less than 123   5                    122            122*5 = 610       74420
123 to less than 125   13                   124            124*13 = 1612     199888
125 to less than 127   42                   126            126*42 = 5292     666792
127 to less than 129   129                  128            128*129 = 16512   2113536
129 to less than 131   61                   130            130*61 = 7930     1030900

\(\sum mf = 31956\)
\(\sum m^2 f = 4085536\)

Recall:
\[ s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} \]

\[ s^2 = \frac{4085536 - \frac{31956^2}{250}}{250 - 1} = 3.182 \]
\[ s = \sqrt{3.182} = 1.784 \]
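The column totals and the short-cut formula can be verified with a short sketch. Classes are stored here as (midpoint, frequency) pairs, an illustrative layout.

```python
import math

# (midpoint m, frequency f) for each class in the milk example
classes = [(122, 5), (124, 13), (126, 42), (128, 129), (130, 61)]

n = sum(f for _, f in classes)                  # 250
sum_mf = sum(m * f for m, f in classes)         # 31956
sum_m2f = sum(m * m * f for m, f in classes)    # 4085536

# Short-cut formula for the sample variance of grouped data.
s2 = (sum_m2f - sum_mf ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(n, sum_mf, sum_m2f)
print(round(s2, 3), round(s, 3))
```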

Use of Standard Deviation
Section 3.4

Standard Deviation
Allows us to find the proportion or
percentage of total observations that fall
within a given interval about the mean

Chebyshev's Theorem
Gives a lower bound for the area under a
curve between two points on opposite sides
of the mean and at the same distance from
the mean.
For any number k greater than 1, at least \(1 - \frac{1}{k^2}\) of the data values lie within k standard deviations of the mean.

Chebyshev's Theorem
If k = 2,
\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = .75 = 75\% \]
At least 75% of the values in the data set lie within 2 standard deviations of the mean.
If k = 3,
\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 1 - \frac{1}{9} \approx .89 = 89\% \]
At least 89% of the values in the data set lie within 3 standard deviations of the mean.

Example using Chebyshev's Theorem

According to the National Center for Education Statistics, the
amounts of all loans granted to students during the 2007-2008
academic year had a distribution with a mean of $8109.65.
Suppose that the standard deviation of this distribution is
$2412.
a. Using Chebyshev's theorem, find at least what percentage of
students had loans between
i. $2079.65 and $14,139.65
ii. $3285.65 and $12,933.65
b. Using Chebyshev's theorem, find the interval that contains the
amounts of loans for at least 89% of all students.

Solution
a.i. At least 84% of all students had loans between
$2079.65 and $14,139.65 (k = 2.5).
a.ii. At least 75% of all students had loans between
$3285.65 and $12,933.65 (k = 2).
b. The interval $873.65 to $15,345.65 contains the loan amounts of
at least 89% of all students (k = 3).
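The solution follows from finding k for each interval and applying the bound. A minimal sketch, assuming each interval is centered on the mean; the function names are hypothetical helpers.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

mean, sd = 8109.65, 2412.0   # loan figures from the example

def bound_for_interval(low, high):
    """Lower bound for an interval assumed to be centered on the mean."""
    k = (high - low) / (2 * sd)   # half-width of the interval in standard deviations
    return chebyshev_lower_bound(k)

print(round(bound_for_interval(2079.65, 14139.65), 2))  # 0.84 (k = 2.5)
print(round(bound_for_interval(3285.65, 12933.65), 2))  # 0.75 (k = 2)

# Part b: k = 3 gives a bound of about 89%, so the interval is mean +/- 3 sd.
interval_b = (mean - 3 * sd, mean + 3 * sd)
print(round(interval_b[0], 2), round(interval_b[1], 2))  # 873.65 15345.65
```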

Empirical Rule
*Only applies to a bell-shaped (normal) distribution.

For a bell-shaped distribution, approximately


1. 68% of observations lie within one standard
deviation of the mean.
2. 95% of observations lie within two standard
deviations of the mean.
3. 99.7% of observations lie within three standard
deviations of the mean.

Example using Empirical Rule


The prices of all college textbooks follow a bell-shaped distribution with a mean of $180 and a
standard deviation of $30.
a. Using the empirical rule, find the percentage of all
college textbooks with their prices between
i. $150 and $210
ii. $120 and $240
b. Using the empirical rule, find the interval that
contains the prices of 99.7% of college textbooks.

Solution
a.i. 68% of all textbooks cost between $150
and $210
a.ii. 95% of all textbooks cost between $120
and $240
b. 99.7% of college textbooks cost between
$90 and $270
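The intervals in the solution are the mean plus or minus 1, 2, and 3 standard deviations. The sketch below computes them and, as an added check that is not on the slide, uses `statistics.NormalDist` from the Python standard library to show where the 68%, 95%, and 99.7% figures come from, assuming an exactly normal distribution.

```python
from statistics import NormalDist

prices = NormalDist(mu=180, sigma=30)   # textbook prices from the example

def share_within(k):
    """Interval of mean +/- k standard deviations and the share of values inside it."""
    low = prices.mean - k * prices.stdev
    high = prices.mean + k * prices.stdev
    return low, high, prices.cdf(high) - prices.cdf(low)

for k in (1, 2, 3):
    low, high, share = share_within(k)
    print(f"within {k} sd: {low:.0f} to {high:.0f} dollars, about {share:.1%}")
```

The exact normal probabilities are roughly 68.3%, 95.4%, and 99.7%, which the empirical rule rounds to 68%, 95%, and 99.7%.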

Measures of Position
Section 3.5

Measure of Position
The measure of position determines the
position of a single value in relation to other
values in a sample or population data set.

Quartiles
Summary measures that divide a ranked data
set into four equal parts
The 2nd quartile, Q2 is the same as the median.
The 1st quartile, Q1 is the value of the middle
term among observations less than the median.
The 3rd quartile, Q3 is the value of the middle
term among observations greater than the
median.

Interquartile Range
The interquartile range (IQR) is the difference
between the third and first quartiles.
IQR = Q3 - Q1
Example: Number of car thefts in the past 12 days
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

Q1 = 3.5
Q2 = 6.5
Q3 = 8.5
IQR = 8.5 - 3.5 = 5
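The quartile definitions used here (median of the lower half, overall median, median of the upper half) can be sketched as follows. The function `quartiles` is a hypothetical helper written to follow that definition; library routines such as `statistics.quantiles` use interpolation rules that can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves method."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # For an odd number of values, the median itself belongs to neither half.
    upper = ranked[half:] if n % 2 == 0 else ranked[half + 1:]
    return median(lower), median(ranked), median(upper)

thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]
q1, q2, q3 = quartiles(thefts)
iqr = q3 - q1

print(q1, q2, q3)  # 3.5 6.5 8.5
print(iqr)         # 5.0
```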

Percentiles
Percentiles divide a ranked data set into 100
equal parts.

Calculating Percentiles - The approximate value of the kth percentile, denoted by \(P_k\), is
\(P_k\) = Value of the \(\left(\frac{kn}{100}\right)\)th term in a ranked data set
where k denotes the number of the percentile and n
represents the sample size.

Calculating Percentiles
Example: Number of car thefts in the past 12 days
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

Rank the data: 2, 3, 3, 4, 6, 6, 7, 7, 8, 9, 11, 15

The position of the 70th percentile is:
kn/100 = 70(12)/100 = 8.4th term, which rounds to the 8th term
P70 = 70th percentile = 7 car thefts
Approximately 70% of the 12 days had 7 or fewer car thefts.
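The position-based calculation can be sketched as a small function. The rounding rule is an assumption: the example only shows 8.4 becoming 8, so the sketch rounds the position to the nearest whole term (halves rounded up). The name `approx_percentile` is hypothetical.

```python
import math

def approx_percentile(values, k):
    """Approximate kth percentile: the value of the (k*n/100)th term in the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    position = k * n / 100                     # 70 * 12 / 100 = 8.4
    term = math.floor(position + 0.5)          # assumed rule: nearest whole position
    term = min(max(term, 1), n)                # keep the position inside the data set
    return ranked[term - 1]                    # positions are 1-based

thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]
print(approx_percentile(thefts, 70))  # 7
```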

Percentile Rank
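The percentile rank of a value \(x_i\) in a data set gives the
percentage of values in the data set that are less than \(x_i\).
\[ \text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100\% \]

Example: Number of car thefts in the past 12 days
6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15

Rank the data: 2, 3, 3, 4, 6, 6, 7, 7, 8, 9, 11, 15

Percentile rank of 9 = 9/12 x 100 = 75%
So 75% of days had fewer than 9 car thefts.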
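A minimal sketch of the percentile-rank calculation: count the values strictly less than the given value and divide by the size of the data set. The function name `percentile_rank` is illustrative.

```python
def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

thefts = [6, 3, 7, 11, 4, 3, 8, 7, 2, 6, 9, 15]
print(percentile_rank(thefts, 9))  # 75.0
```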

Box-and-Whisker Plot
Section 3.6

Box-and-Whisker Plot
A graphic presentation of data using five
measures: median, first quartile, third
quartile, and smallest and largest values in
the data set between the lower and upper
inner fences.
A visualization of the center, the spread, and
the skewness of a data set.
Can detect outliers
Lower inner fence = Q1 - 1.5 x IQR
Upper inner fence = Q3 + 1.5 x IQR

Constructing a Box-and-Whisker Plot


Example: The following data are the incomes (in thousands of dollars) for a
sample of 12 households.
75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98

1. Rank the data: 69, 74, 75, 79, 81, 84, 90, 94, 98, 104, 112, 144
2. Calculate the median: Median = (84 + 90) / 2 = 87
3. Calculate Q1: Q1 = (75 + 79) / 2 = 77
4. Calculate Q3: Q3 = (98 + 104) / 2 = 101
5. Calculate IQR: IQR = Q3 - Q1 = 101 - 77 = 24
6. Calculate Lower Inner Fence: Q1 - 1.5 x IQR = 77 - 1.5 x 24 = 41
7. Calculate Upper Inner Fence: Q3 + 1.5 x IQR = 101 + 1.5 x 24 = 137
8. Determine the smallest and largest values in the data set within the inner fences: 69 and 112
9. Draw a box from the first quartile to the third quartile, with a line at the median.
10. Draw 2 lines (whiskers) to the smallest and largest values within the inner fences.
11. Mark values that fall outside the whiskers with an asterisk (here, 144).
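Steps 1 through 8 and 11 are pure arithmetic and can be sketched in code; the drawing itself is left out. The function `box_plot_summary` is a hypothetical helper that uses the median-of-halves quartile method from these slides.

```python
from statistics import median

def box_plot_summary(values):
    """Numbers needed to draw a box-and-whisker plot."""
    ranked = sorted(values)
    half = len(ranked) // 2
    lower = ranked[:half]
    upper = ranked[half:] if len(ranked) % 2 == 0 else ranked[half + 1:]
    q1, q2, q3 = median(lower), median(ranked), median(upper)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ranked if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [x for x in ranked if x < lower_fence or x > upper_fence],
    }

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
summary = box_plot_summary(incomes)
for name, value in summary.items():
    print(name, value)
```

For the income data this reports whiskers ending at 69 and 112 and a single outlier, 144, which lies above the upper inner fence of 137.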

Review Exercise
The following data give the time (in minutes)
that each of 20 students selected from a
university waited in line at their bookstore to
pay for their textbooks at the beginning of the
Fall 2012 semester. Create a box-and-whisker plot displaying the data set.
15, 8, 23, 21, 5, 6, 5, 10, 14, 17, 31, 19, 34, 3, 22, 30, 31, 25, 17, 16
