10-3 Measures of Central Tendency and Variation

So far, we have discussed some graphical methods of data description. Now, we
will investigate how statements of central tendency and variation can be used.
Statements of central tendency are nothing more than attempts to describe the
whole distribution of a data set by reporting one most typical value. The most
typical values serve to represent the point or points about which most of the
values in the distribution are centered.
Three statements of central tendency are commonly used. They are the mean,
the median, and the mode.

Computing Means
The mean is the average value. Imagine that we have recorded the ages of 11
children with chicken pox.
2, 4, 5, 5, 6, 6, 6, 7, 7, 8, and 10 years
One way to represent these 11 ages with one most typical value is to calculate the
mean age for the group of 11 children.
To find the mean, we must first sum the ages (2 + 4 + 5 + 5 + 6 + 6 + 6 + 7 + 7 + 8
+ 10 = 66), and then divide the total by the number of cases (66 ÷ 11 = 6 years).
The mean for this age group is 6 years.
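The calculation can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and are not part of the original notes.

```python
from statistics import mean

# Ages of the 11 children with chicken pox (from the text).
ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]

total = sum(ages)          # sum of all the ages
count = len(ages)          # number of cases
mean_age = total / count   # mean = total divided by number of cases

print(total, count, mean_age)

# The standard library gives the same result.
print(mean(ages))
```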

To compute the mean, or average, we use the following definition:
Definition of Mean

The mean is the average, the location in the distribution of values at which the
deviations above it and the deviations below it are equal.

We can think of the mean as the balance point.
The mean is sensitive to exceptional values.

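[Figure: balance-point illustration. The distances from the mean on one side,
1 + 4 + 4 + 5 = 14, equal the distances on the other side,
1 + 1 + 1 + 1 + 2 + 2 + 2 + 4 = 14.]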

Graphically, the sum of the distances to the data points below the mean equals
the sum of the distances to the data points above the mean. The mean is
sensitive to every value.
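The balance-point property can be illustrated with the chicken pox ages. The sketch below adds up the distances on each side of the mean; the variable names are illustrative only.

```python
# Ages of the 11 children with chicken pox (from the text).
ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]

m = sum(ages) / len(ages)  # the mean, 6.0

# Total distance from the mean for values below it and for values above it.
below = sum(m - x for x in ages if x < m)
above = sum(x - m for x in ages if x > m)

print(below, above)  # the two totals match, so the mean "balances" the data
```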

Computing Median
The median is the middle value in a group of ordered values. Look again at the
ages of the 11 children with chicken pox:

2, 4, 5, 5, 6, 6, 6, 7, 7, 8, and 10 years
(5 values on the left, then the median, then 5 values on the right)
Notice that the ages have been ordered from the youngest to the oldest, and the
middle value is the sixth value from either end. The median for this group is 6
years.

In the previous example, there was an odd number of values, so the median was
actually one of the values.

If we have an even number of values, the median is found by averaging the two
middle values. For example, imagine that the ordered ages of a group of six
children are:
1, 2, 3, 4, 5, and 6

In this case, there is no single middle data value. The two middle values are 3
and 4 years. The average of these is (3 + 4) ÷ 2 = 3.5. Therefore, 3.5 years
represents the median value for this group of six ages, but it is not one of the
ages.

In general, to find the median for a set of n numbers:


1. First sort the values in order.
2. If the number of values is odd, the median is the number located in the exact
middle of the list.
3. If the number of values is even, the median is found by computing the mean
of the two middle numbers.

Recall that the mean is dramatically affected by extreme values, whereas the
median is not.
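The three steps above can be sketched in Python as follows. The function name `find_median` is hypothetical; the standard library's `statistics.median` follows the same rule.

```python
def find_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # step 1: sort the values
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the middle value
        return ordered[mid]
    # step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(find_median([2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]))  # the 11 ages
print(find_median([1, 2, 3, 4, 5, 6]))                   # the six ages
```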

Finding the Modes


The mode is the most frequently occurring value in a group of values. The most
frequently occurring age in the group of 11 children with chicken pox:
2, 4, 5, 5, 6, 6, 6, 7, 7, 8, and 10 years
is 6 years. Three of the children were 6 years old, and none of the other ages
were represented more than twice.
It is important to remember that a distribution may have no single most
frequently occurring value. If a mode exists, it may not be unique; i.e., there
may be more than one mode.
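One way to find every mode is to count how often each value occurs and keep the values that share the highest count. The sketch below does this; the function name `modes` and the second data set are illustrative assumptions. Note that when every value occurs equally often, this function returns all of them, whereas some textbooks say such data has no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequently occurring values."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest count
    return sorted(v for v, c in counts.items() if c == highest)


ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]
print(modes(ages))              # a single mode
print(modes([1, 1, 2, 2, 3]))   # hypothetical data with two modes
```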

Choosing the Most Appropriate Average


When choosing the most appropriate statement of central tendency to describe a
set of data, two factors must be considered:
1. The shape of the distribution. If the distribution is symmetrical with a
single peak, the mean, median, and mode will be equal or very close, and each
may be used as the most typical representative value. If the distribution is
not symmetrical (skewed), the median is usually the best choice as a measure
of central tendency.
2. The scale of measurement. If we are dealing with unordered categories, then
our only choice is the mode. For continuous data we may use the mean, median,
or mode, depending on symmetry.
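The effect of skew on the mean and the median can be seen in a small experiment. The sketch below takes the 11 chicken pox ages and replaces the oldest age with a hypothetical extreme value of 40; that altered data set is an assumption made only for illustration.

```python
from statistics import mean, median

ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]

# Hypothetical skewed data: the oldest age (10) is replaced by 40.
skewed = ages[:-1] + [40]

print(mean(ages), median(ages))      # mean and median agree for the original ages
print(mean(skewed), median(skewed))  # the mean is pulled upward, the median is not
```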

Measures of Spread or Dispersion


Consider the following data:
20, 22, 22, 25, 26, 27, 27, 28, 30, 35
Range = upper extreme - lower extreme = 35 - 20 = 15
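As a quick check, the range of this data set can be computed in Python. The variable names are illustrative.

```python
data = [20, 22, 22, 25, 26, 27, 27, 28, 30, 35]

# Range = upper extreme (largest value) minus lower extreme (smallest value).
data_range = max(data) - min(data)

print(data_range)
```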

Box Plots
Boxplots are another graphical means of displaying key characteristics of data.
The idea is to arrange the data in increasing order and choose three numbers Q1,
Q2, and Q3 that divide it into four equal parts as indicated below.
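[Figure: box plot marking the minimum data point, Q1, Q2 (the median), Q3, and
the maximum data point. The bottom 25% of the data lies below Q1 and the top
25% lies above Q3. Axis values shown: 15, 20, 25, 30, 35, 45.]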

Outliers
An outlier is a value that is located very far away from almost all of the other
data values. Relative to the other data, an outlier is an extreme value.
Here, an outlier is any value that lies more than 1.5 times the interquartile
range (Q3 - Q1) above the upper quartile (Q3) or below the lower quartile (Q1).
Outliers are commonly indicated with an asterisk.

[Figure: box plot labeled Min, Q1, Q2, Q3, and Max, with an outlier marked by an
asterisk.]
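The 1.5 × interquartile range rule can be sketched in Python. The notes do not say how the quartiles are computed, so this example assumes the common "median of each half" convention, in which the overall median is left out of both halves when the number of values is odd. The function names and the extra value 60 are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                  # values below the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # values above the median
    return median(lower), median(ordered), median(upper)


def outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


data = [20, 22, 22, 25, 26, 27, 27, 28, 30, 35]
print(quartiles(data))        # Q1, Q2, Q3 for the data from the text
print(outliers(data))         # no value falls outside the fences
print(outliers(data + [60]))  # a hypothetical extreme value is flagged
```

Other quartile conventions exist and can move Q1 and Q3 slightly, which in turn can change which values are flagged.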

Mean Absolute Deviation


The mean absolute deviation (MAD) makes use of the absolute value to find the
distance each data point is away from the mean. The following steps are used to
determine the MAD.
1. Find the deviation of each data value by subtracting the mean from the data
value.
2. Find the absolute value of the differences.
3. Sum those absolute values.
4. Find the mean by dividing the sum by the number of scores.
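The four steps can be sketched in Python using the 11 chicken pox ages. The function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Return the mean absolute deviation (MAD) about the mean."""
    m = sum(values) / len(values)
    deviations = [x - m for x in values]        # step 1: value minus mean
    absolute = [abs(d) for d in deviations]     # step 2: absolute values
    total = sum(absolute)                       # step 3: sum them
    return total / len(values)                  # step 4: divide by the count


ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]
mad = mean_absolute_deviation(ages)
print(mad)  # 16 / 11, roughly 1.45 years
```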

[Figure: Bar Graph of Ages with Mean. A visual picture of the mean absolute
deviation for the 11 ages of children with chicken pox. Axes: Ages and
Children Listed Numerically, with a segment marking the mean.]


Compute the MAD for the following two sets of data.

Variance and Standard Deviation


The variance and the standard deviation are two commonly used statements of
dispersion. The variance of a sample may be defined as the sum of the squared
deviations from the mean value divided by the number of values. The variance is
calculated as follows:
1. Find the deviation of each value in the set from the mean value.
2. Square each of these deviations.
3. Sum all of these squared deviations.
4. Divide the sum of the squared deviations from the mean by the number of
values.
Formula for the variance, v, of n values x1, x2, ..., xn with mean x̄:
v = [(x1 - x̄)^2 + (x2 - x̄)^2 + ... + (xn - x̄)^2] ÷ n

The standard deviation, s, is the square root of the variance:
s = √v

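Now, let us calculate the variance and standard deviation of the ages of the 11
children with chicken pox. The mean is 6 years, so the deviations are
-4, -2, -1, -1, 0, 0, 0, 1, 1, 2, and 4, and their squares sum to
16 + 4 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 4 + 16 = 44.

Variance: v = 44 ÷ 11 = 4

Standard deviation: s = √4 = 2 years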
Normal Distributions
A normal distribution is a frequency distribution with continuous, randomly
occurring data plotted on the x axis and frequency (counts) plotted on the y axis.
This distribution is actually a theoretical distribution, but many real-world
situations are close to this ideal. A normal curve has the following properties:
1. The normal curve has a bell-shape.
2. The curve extends infinitely in both directions and gets closer and closer to
the x-axis but never reaches it.
3. The curve is symmetrical about its center point, but not all symmetrical
distributions are normal.
4. The three statements of central tendency (mean, median, and mode) all fall
in the exact same place.
On a normal curve, about 68% of the values lie within 1 standard deviation of the
mean, about 95% lie within 2 standard deviations, and about 99.7% are within 3
standard deviations.
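These percentages can be checked numerically with Python's `statistics.NormalDist`, which models a normal distribution. The sketch below uses the standard normal curve (mean 0, standard deviation 1); the same proportions hold for any normal curve.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # standard normal curve

# Proportion of values within k standard deviations of the mean.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```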

Applications of the Normal Curve


Suppose that cholesterol values for a population have a mean of 200 mg/dl and a
standard deviation of 40 mg/dl. The following shows the raw cholesterol levels for
plus/minus 1 and 2 standard deviations.
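The raw cholesterol levels at plus/minus 1 and 2 standard deviations follow directly from the mean and standard deviation given above. A minimal sketch, with illustrative variable names:

```python
mean_chol = 200  # mg/dl, from the text
sd_chol = 40     # mg/dl, from the text

# Lower and upper cholesterol levels k standard deviations from the mean.
intervals = {k: (mean_chol - k * sd_chol, mean_chol + k * sd_chol) for k in (1, 2)}

for k, (low, high) in intervals.items():
    print(f"within {k} SD: {low} to {high} mg/dl")
```

If cholesterol values are roughly normal, about 68% of the population would fall in the first interval and about 95% in the second.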

Example 1 Calculate the mean, median, and mode for the following data sets:
a. 2, 5, 7, 5, 8, 9, 5, 10, 8

b. 10, 12, 12, 15, 17, 12, 18, 14, 11, 13

c. 17, 21, 21, 18, 39, 17, 13

Example 2 John's fall-quarter grades are below. Find his grade point
average for the term (A = 4, B = 3, C = 2, D = 1, F = 0).
Course     Credits   Grade
Math       5         B
English    3         A
Physics    5         C
German     3         D
Handball   1         A
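One way to work this example is to treat the grade point average as a credit-weighted mean, that is, total grade points divided by total credits. The notes do not state this formula, so it is an assumption here, and the variable names are illustrative.

```python
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# (course, credits, grade) taken from the table above.
courses = [
    ("Math", 5, "B"),
    ("English", 3, "A"),
    ("Physics", 5, "C"),
    ("German", 3, "D"),
    ("Handball", 1, "A"),
]

total_credits = sum(credits for _, credits, _ in courses)
total_points = sum(credits * grade_points[grade] for _, credits, grade in courses)
gpa = total_points / total_credits  # credit-weighted mean of the grade values

print(total_points, total_credits, round(gpa, 2))
```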

Example 3 For certain workers, the mean wage is $5.00/hr, with a standard
deviation of $0.50. If a worker is chosen at random, what is the probability that the
worker's wage is between $4.50 and $5.50? Assume a normal distribution of
wages.

Example 4 Ginny's median score on three tests was 90. Her mean score was 92
and her range was 6. What were her three test scores?

10.3 #A-1, 3, 9, 13, 15, 17, 19, 21, 23, B-1, 13, 19, 21
