
Measures of Central Tendency

Introduction
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics.
The mean (often called the average) is most likely the measure of central tendency that you are
most familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under different
conditions, some measures of central tendency become more appropriate to use than others. In
the following sections, we will look at the mean, mode and median, and learn how to calculate
them and under what conditions they are most appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central tendency. It can
be used with both discrete and continuous data, although its use is most often with continuous
data (see our Types of Variable guide for data types). The mean is equal to the sum of all the
values in the data set divided by the number of values in the data set. So, if we have n values in
a data set and they have values $x_1, x_2, \ldots, x_n$, the sample mean, usually denoted
by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$,
pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$

You may have noticed that the above formula refers to the sample mean. So, why have we
called it a sample mean? This is because, in statistics, samples and populations have very
different meanings and these differences are very important, even if, in the case of the mean,
they are calculated in the same way. To acknowledge that we are calculating the population
mean and not the sample mean, we use the Greek lower case letter "mu", denoted as $\mu$:

$$\mu = \frac{\sum x}{n}$$
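The definition of the mean (sum of the values divided by the number of values) can be sketched in a few lines of Python. The function name `sample_mean` and the data are assumptions made for illustration only; they do not come from the guide.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Made-up illustrative data (not from the text).
data = [2, 4, 6, 8]
print(sample_mean(data))  # 5.0
```

The same calculation applies to a sample and to a population; only the symbol used to report the result differs.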

The mean is essentially a model of your data set. It is a single value that stands in for all the
values in the set, though it is not necessarily the most common value. You will
notice, in fact, that the mean is often not one of the actual values that you have observed in
your data set. However, one of its important properties is that it minimises error in the prediction
of any one value in your data set. That is, it is the value that produces the lowest total
squared error when compared against all the values in the data set.
An important property of the mean is that it includes every value in your data set as part of the
calculation. In addition, the mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
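Both properties can be checked numerically. The sketch below assumes that "error" is measured as the sum of squared differences, which is one common way to make the idea precise; the data and the helper names `mean` and `sum_squared_error` are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sum_squared_error(values, guess):
    """Total squared distance between each value and a single guessed value."""
    return sum((x - guess) ** 2 for x in values)


# Made-up illustrative data (not from the text).
data = [2, 4, 6, 8, 10]
m = mean(data)

# Property 1: the deviations from the mean sum to zero.
deviations = [x - m for x in data]
print(sum(deviations))  # 0.0

# Property 2: moving the guess away from the mean increases the squared error.
print(sum_squared_error(data, m))      # 40.0
print(sum_squared_error(data, m - 1))  # 45.0
print(sum_squared_error(data, m + 1))  # 45.0
```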
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers.
These are values that are unusual compared to the rest of the data set by being especially small
or large in numerical value. For example, consider the wages of staff at a factory below:
Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that
this mean value might not be the best way to accurately reflect the typical salary of a worker, as
most workers have salaries in the $12k to $18k range. The mean is being skewed by the two
large salaries. Therefore, in this situation, we would like to have a better measure of central
tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
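The salary figures above can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are assumptions, and the salaries are entered in thousands of dollars.

```python
import statistics

# Salaries of the ten staff, in thousands of dollars, as listed above.
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries_k)
median_salary = statistics.median(salaries_k)

print(mean_salary)    # 30.7
print(median_salary)  # 15.5
```

The median of 15.5 sits inside the $12k to $18k range where most salaries fall, while the mean is pulled far above it by the two largest salaries.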
Another time when we usually prefer the median over the mean (or mode) is when our data is
skewed (i.e., the frequency distribution for our data is skewed). If we consider the normal
distribution - as this is the most frequently assessed in statistics - when the data is perfectly
normal, the mean, median and mode are identical. Moreover, they all represent the most typical
value in the data set. However, as the data becomes skewed the mean loses its ability to
provide the best central location for the data because the skewed data is dragging it away from
the typical value. However, the median best retains this position and is not as strongly
influenced by the skewed values. This is explained in more detail in the skewed distribution
section later in this guide.
Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
The median is less affected by outliers and skewed data. In order to calculate the median,
suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case, 56 (the sixth value in the ordered list). It is the middle mark
because there are 5 scores before it and 5 scores after it. This works fine when you have an
odd number of scores, but what happens when you have an even number of scores? What if
you had only 10 scores? Well, you simply have to take the middle two scores and average the
result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th score in our data set and average them to get a
median of 55.5.
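The two cases (odd and even number of scores) can be written as one small function. This is a minimal sketch rather than a reference implementation; the name `median` and the variable names are assumptions, and Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


eleven_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
ten_scores = eleven_scores[:-1]  # the same marks without the final 92

print(median(eleven_scores))  # 56
print(median(ten_scores))     # 55.5
```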
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it is represented by the
highest bar. You can, therefore, sometimes consider the mode as being the
most popular option. An example of a mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the most
common category, as illustrated below:

We can see above that the most common form of transport, in this particular data set, is the
bus. However, one of the problems with the mode is that it is not unique, so it leaves us with
problems when we have two or more values that share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is
particularly problematic when we have continuous data because we are more likely not to have
any one value that is more frequent than the others. For example, consider measuring the weights of 30
people (to the nearest 0.1 kg). How likely is it that many of them will have
exactly the same weight (e.g., 67.4 kg)? The answer is that exact matches will be rare: many
people might be close, but with such a small sample (30 people) and a large range of possible
weights, most weights will occur only once, so no single value stands out as the mode.
This is why the mode is very rarely used with continuous data.
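The non-uniqueness of the mode is easy to see in code. The sketch below returns every value that shares the highest frequency; the function name `modes` and both data sets are made up for illustration and are not taken from the guide's figures.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical categorical data with a tie for the most common category.
transport = ["bus", "car", "bus", "train", "car", "walk"]
print(modes(transport))  # ['bus', 'car']

# Hypothetical continuous measurements: every value occurs once, so all of them tie.
weights_kg = [67.4, 70.1, 58.9]
print(modes(weights_kg))  # [58.9, 67.4, 70.1]
```

When every value ties, as in the second case, the mode says nothing useful about where the data is centred.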
Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as
depicted in the diagram below:

In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is
not representative of the data, which is mostly concentrated around the 20 to 30 value range. To
use the mode to describe the central tendency of this data set would be misleading.


Skewed Distributions and the Mean and Median
We often test whether our data is normally distributed because this is a common assumption
underlying many statistical tests. An example of a normally distributed set of data is presented
below:

When you have a normally distributed sample you can legitimately use either the mean or the
median as your measure of central tendency. In fact, in any symmetrical, unimodal distribution the mean,
median and mode are equal. However, in this situation, the mean is widely preferred as the best
measure of central tendency because it is the measure that includes all the values in the data
set for its calculation, and any change in any of the scores will affect the value of the mean. This
is not the case with the median or mode.
However, when our data is skewed, as with a right-skewed data set,
we find that the mean is being dragged in the direction of the skew. In these situations, the median
is generally considered to be the best representative of the central location of the data. The
more skewed the distribution, the greater the difference between the median and mean, and the
greater emphasis should be placed on using the median as opposed to the mean. A classic
example of the above right-skewed distribution is income (salary), where higher-earners provide
a false representation of the typical income if expressed as a mean and not a median.
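The pull of the skew on the mean can be illustrated with simulated data. The sketch below draws hypothetical "incomes" from a log-normal distribution, chosen here only because it has a long right tail; the seed, sample size, and parameters are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed data: log-normal draws have a long right tail.
incomes = [rng.lognormvariate(0, 1) for _ in range(10_000)]

sample_mean = statistics.mean(incomes)
sample_median = statistics.median(incomes)

# In a right-skewed sample the mean is dragged toward the tail, above the median.
print(sample_mean, sample_median)
print(sample_mean > sample_median)
```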
If you are expecting a normal distribution, and tests of normality show that the data is non-normal, it is
customary to use the median instead of the mean. However, this is more a rule of thumb than a
strict guideline. Sometimes, researchers wish to report the mean of a skewed distribution if the
median and mean are not appreciably different (a subjective assessment), and if it allows easier
comparisons to previous research to be made.
Summary of when to use the mean, median and mode
Please use the following summary table to know what the best measure of central tendency is
with respect to the different types of variable.
Type of Variable              Best measure of central tendency
Nominal                       Mode
Ordinal                       Median
Interval/Ratio (not skewed)   Mean
Interval/Ratio (skewed)       Median
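The summary table can be read as a simple lookup rule. The sketch below encodes it as a function; the name `best_measure` and its parameters are hypothetical, and the rule is only the rule of thumb given in the table, not a substitute for judgment.

```python
def best_measure(variable_type, skewed=False):
    """Suggest a measure of central tendency following the summary table."""
    kind = variable_type.lower()
    if kind == "nominal":
        return "mode"
    if kind == "ordinal":
        return "median"
    if kind in ("interval", "ratio"):
        return "median" if skewed else "mean"
    raise ValueError(f"unknown type of variable: {variable_type!r}")


print(best_measure("nominal"))              # mode
print(best_measure("ratio"))                # mean
print(best_measure("ratio", skewed=True))   # median
```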

Please find below some common questions that are asked regarding measures of central
tendency, along with their answers. These FAQs are in addition to our article on measures of
central tendency found on the previous page.
What is the best measure of central tendency?
There can often be a "best" measure of central tendency with regards to the data you are
analysing, but there is no one "best" measure of central tendency. This is because whether you
use the median, mean or mode will depend on the type of data you have (see our Types of
Variable guide), such as nominal or continuous data; whether your data has outliers and/or is
skewed; and what you are trying to show from your data. Further considerations of when to use
each measure of central tendency are found in our guide on the previous page.
In a strongly skewed distribution, what is the best indicator of central tendency?
It is usually inappropriate to use the mean in such situations where your data is skewed. You
would normally choose the median or mode, with the median usually preferred. This is
discussed on the previous page under the subtitle, "When not to use the mean".
Does all data have a median, mode and mean?
Yes and no. All continuous data has a median, mode and mean. However, strictly speaking,
ordinal data has a median and mode only, and nominal data has only a mode. However, a
consensus has not been reached among statisticians about whether the mean can be used with
ordinal data, and you can often see a mean reported for Likert data in research.
When is the mean the best measure of central tendency?
The mean is usually the best measure of central tendency to use when your data distribution
is continuous and symmetrical, such as when your data is normally distributed. However, it all
depends on what you are trying to show from your data.
When is the mode the best measure of central tendency?
The mode is the least used of the measures of central tendency, but it is the only one that can be used when
dealing with nominal data. For this reason, the mode will be the best measure of central
tendency (as it is the only one appropriate to use) when dealing with nominal data. The mean
and/or median are usually preferred when dealing with all other types of data, but this does not
mean that the mode is never used with these data types.
When is the median the best measure of central tendency?
The median is usually preferred to other measures of central tendency when your data set is
skewed (i.e., forms a skewed distribution) or you are dealing with ordinal data. However, the
mode can also be appropriate in these situations, but is not as commonly used as the median.
What is the most appropriate measure of central tendency when the data has outliers?
The median is usually preferred in these situations because the value of the mean can be
distorted by the outliers. However, it will depend on how influential the outliers are. If they do not
significantly distort the mean, using the mean as the measure of central tendency will usually be
preferred.
In a normally distributed data set, which is greatest: mode, median or mean?
If the data set is perfectly normal, the mean, median and mode are equal to each other (i.e., the
same value).
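For any data set, which measures of central tendency have only one value?
The mean and the median each have exactly one value for a given data set. The mode can have more than
one value, because two or more values can share the highest frequency.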

Skew and Central tendency
Recall from Module 4 that skew is a measure of asymmetry in a set of data.
Skew affects the location of the mode, median, and mean. In a symmetric
distribution such as a normal distribution, the three measures of central tendency coincide.
That is, the most frequent score (mode) equals the midpoint
(median), which equals the average (mean) (Figure 5.3).
This is not the case in a skewed distribution. Scores in the tail of a skewed
distribution are outliers, and we already saw via the teeter-totter what happens when an
outlier is introduced: the mean moves in the direction of the
extreme score, that is, toward the tail. Because the mean is the most sensitive
of the three measures of central tendency to extreme scores, the mean is pulled
most toward the tail. The mode, which is simply the most frequent score,
remains where it was. The median falls between the mean and the mode. This
happens in both negatively and positively skewed distributions (Figure 5.4).
Because of the known relationship of the mode, median, and mean in normal versus
skewed distributions, a researcher can tell from the calculated values whether a distribution
is normally distributed or skewed. From Figures 5.3 and 5.4, we see that if the mean is lower
than the mode, the distribution is negatively skewed. Conversely, if the mean is higher than
the mode, the distribution is positively skewed. Similarly, a researcher can tell from the shape
of the distribution where the mean, median, and mode will fall. If a distribution is negatively
skewed, the mean must be lower than the mode. Conversely, if a distribution is positively
skewed, the mean must be higher than the mode.
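The comparison described above can be sketched as a small helper. The function name `skew_direction` is an assumption, and the rule is only the mean-versus-mode heuristic stated in the text, not a formal test of skewness.

```python
def skew_direction(mean, mode):
    """Apply the rule of thumb: mean below the mode suggests negative skew, above suggests positive."""
    if mean < mode:
        return "negative"
    if mean > mode:
        return "positive"
    return "none indicated"


# Made-up summary values for illustration.
print(skew_direction(mean=30.7, mode=15))  # positive
print(skew_direction(mean=62.0, mode=70))  # negative
print(skew_direction(mean=50.0, mode=50))  # none indicated
```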

Discrete vs. Continuous Variables
Quantitative variables can be further classified as discrete or continuous. If a variable can take on any
value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is
called a discrete variable.
Some examples will clarify the difference between discrete and continuous variables.
Suppose the fire department mandates that all fire fighters must weigh between 150 and 250
pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire
fighter's weight could take on any value between 150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of heads could be any
integer value between 0 and plus infinity. However, it could not be any number between 0 and
plus infinity. We could not, for example, get 2.3 heads. Therefore, the number of heads must be a
discrete variable.

An interval scale is a scale which represents quantity and has equal units but for which zero represents
simply an additional point of measurement. On an interval scale the distance between any two positions
is of known size. The Fahrenheit scale is a clear example of the interval scale of measurement. Thus, 60
degrees Fahrenheit and -10 degrees Fahrenheit are interval data. Elevation measured relative to sea level is another
example of an interval scale. With each of these scales there is a direct, measurable quantity with equality
of units. In addition, zero does not represent the absolute lowest value. Rather, it is a point on the scale
with numbers both above and below it (for example, -10 degrees Fahrenheit).

Examples of interval data:
-Temperature (Degrees F)
-Dates
-Dollars
-Years
-Most personality measures.
-WAIS intelligence score.
-Sea Level

Example: A student who scores 90% is probably a better student than someone who scores 70%. The
difference between the two scores is 20 percentage points. On an interval scale, the data can be ranked, and the
difference between two values can be calculated and interpreted.



Interval scale
The interval type allows for the degree of difference between items, but not the ratio between them.
Examples include temperature with the Celsius scale, which has an arbitrarily-defined zero point (the
freezing point of a particular substance under particular conditions), and date when measured from an
arbitrary epoch (such as AD). Ratios are not allowed since 20 °C cannot be said to be "twice as hot" as
10 °C, nor can multiplication/division be carried out between any two dates directly. However, ratios of
differences can be expressed; for example, one difference can be twice another. Interval type variables
are sometimes also called "scaled variables", but the formal mathematical term is an affine space (in this
case an affine line).
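A quick way to see why ratios are not meaningful on an interval scale is to express the same temperatures in Fahrenheit. In the sketch below, the helper name `c_to_f` and the chosen temperatures are assumptions for illustration; the conversion formula itself is the standard one.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


low_c, mid_c, high_c = 10.0, 20.0, 40.0
low_f, mid_f, high_f = c_to_f(low_c), c_to_f(mid_c), c_to_f(high_c)

# The plain ratio depends on the arbitrary zero point, so it changes with the scale.
print(mid_c / low_c)  # 2.0
print(mid_f / low_f)  # 1.36

# The ratio of two differences does not depend on the zero point.
print((high_c - mid_c) / (mid_c - low_c))  # 2.0
print((high_f - mid_f) / (mid_f - low_f))  # 2.0
```

The value "twice" only survives the change of scale when it describes a ratio of differences, which is exactly what the interval type permits.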

Central tendency and statistical dispersion
The mode, median, and arithmetic mean are allowed to measure central tendency of interval variables,
while measures of statistical dispersion include range and standard deviation. Since one cannot divide,
one cannot define measures that require a ratio, such as the studentized range or the coefficient of
variation. More subtly, while one can define moments about the origin, only central moments are
meaningful, since the choice of origin is arbitrary. One can define standardized moments, since ratios of
differences are meaningful, but one cannot define the coefficient of variation, since the mean is a moment
about the origin, unlike the standard deviation, which is (the square root of) a central moment.
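The point about the coefficient of variation can be checked on the same temperatures expressed in two scales. This sketch uses the population standard deviation (`statistics.pstdev`); the data and the helper names `coefficient_of_variation` and `standardized` are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (depends on where zero is placed)."""
    return statistics.pstdev(values) / statistics.mean(values)


def standardized(values):
    """Deviations from the mean in units of the standard deviation (a ratio of differences)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [(x - m) / s for x in values]


# Hypothetical temperatures, first in Celsius and then converted to Fahrenheit.
celsius = [10.0, 20.0, 30.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

# The coefficient of variation differs between the two scales...
print(coefficient_of_variation(celsius), coefficient_of_variation(fahrenheit))

# ...while the standardized values agree, up to floating-point rounding.
print(standardized(celsius))
print(standardized(fahrenheit))
```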

Levels of Measurement
Not all data is created equally. Some is quantitative, and some is qualitative. Some is
continuous and some is discrete.
Another way to separate data is to look at what is being measured. To do this there are four
levels of measurement: nominal, ordinal, interval and ratio. Different levels of measurement call
for different statistical techniques. For example, it makes no sense whatsoever to find the mean
and median of a list of Social Security numbers.
Nominal Level of Measurement
The nominal level of measurement is the lowest of the four ways to characterize data. Nominal
means "in name only" and that should help to remember what this level is all about. Nominal
data deals with names, categories, or labels.
Data at the nominal level is qualitative. Colors of eyes, yes or no responses to a survey, and
favorite breakfast cereal all deal with the nominal level of measurement. Even some things with
numbers associated with them, such as a number on the back of a football jersey, are nominal
since it is used to "name" an individual player on the field.
Data at this level can't be ordered in a meaningful way, and it makes no sense to calculate
things such as means and standard deviations.
Ordinal Level of Measurement
The next level is called the ordinal level of measurement. Data at this level can be ordered, but
no differences between the data can be taken that are meaningful.
Here you should think of things like a list of the top ten cities to live. The data, here ten cities,
are ranked from one to ten, but differences between the cities don't make much sense. There's
no way from looking at just the rankings to know how much better life is in city number 1 than
city number 2.
Another example of this is letter grades. You can order things so that an A is higher than a B, but
without any other information, there is no way of knowing how much better an A is than a B.
As with the nominal level, data at the ordinal level should not be used in calculations.
Interval Level of Measurement
The interval level of measurement deals with data that can be ordered, and in which differences
between the data do make sense. Data at this level does not have a true zero starting point.
The Fahrenheit and Celsius scales of temperatures are both examples of data at the interval
level of measurement. You can talk about 30 degrees being 60 degrees less than 90 degrees,
so differences do make sense. However, 0 degrees (in both scales), cold as it may be, does not
represent the total absence of temperature.
Data at the interval level can be used in calculations. However, data at this level does lack one
type of comparison. Even though 3 x 30 = 90, it is not correct to say that 90 degrees Celsius is
three times as hot as 30 degrees Celsius.
Ratio Level of Measurement
The fourth and highest level of measurement is the ratio level. Data at the ratio level possess all
of the features of the interval level, in addition to a true zero value. Due to the presence of a true zero, it
now makes sense to compare the ratios of measurements. Phrases such as "four times" and
"twice" are meaningful at the ratio level.
Distances, in any system of measurement, give us data at the ratio level. A measurement such
as 0 feet does make sense, as it represents no length. Furthermore 2 feet is twice as long as 1
foot. So ratios can be formed between the data.
At the ratio level of measurement, not only can sums and differences be calculated, but also
ratios. One measurement can be divided by any nonzero measurement, and a meaningful
number will result.
