
Algebra 1

Section 9.1

Descriptive Statistics
Sets of Data

A set of data for a single variable is a list of numbers, which may or may not be in order.
Each value in the list is a separate piece of information about what is being measured.
For example, to study the ages of students in a college class, one could record the age
of each student and obtain the data set below.

{19, 20, 18, 19, 21, 15, 21, 22, 19, 20}

The set above is not in order: the values are listed in no particular order and do not increase
from left to right. Ordering a set of data is often convenient because it is easier to see the nature
of an ordered data set than of an unordered one. Ordering the above set of data gives the
equivalent data set below.

{15, 18, 19, 19, 19, 20, 20, 21, 21, 22}

Notice that when the data is ordered, it is easier to find the highest, lowest, and middle value of
the set.
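Ordering a data set can also be done by computer. The sketch below uses Python's built-in sorted function, which returns a new list in increasing order and leaves the original list unchanged; the variable names ages and ordered_ages are illustrative choices, not part of the lesson.

```python
# Student ages, as recorded (unordered)
ages = [19, 20, 18, 19, 21, 15, 21, 22, 19, 20]

# sorted() returns a new list in increasing order; ages itself is unchanged
ordered_ages = sorted(ages)

print(ordered_ages)                        # [15, 18, 19, 19, 19, 20, 20, 21, 21, 22]
# Lowest and highest values
print(ordered_ages[0], ordered_ages[-1])   # 15 22
# The two middle values (this set has an even number of values)
print(ordered_ages[4], ordered_ages[5])    # 19 20
```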

Descriptive Statistics

Each data set has a number of values that can be calculated from it to describe aspects of the
set. The most important characteristics of a data set are its shape (which values occur frequently or
rarely in the set), center (the middle part of the data set), and spread (how much the data varies
from its center). These characteristics can be found by calculating descriptive statistics for the data
set, and descriptive statistics can be compared between data sets to understand how the sets differ
in shape, center, and spread. Some commonly used descriptive statistics, along with what they
measure and how they are calculated, are given below.

Mean: The mean of a data set measures the center of that data set. It is the same as the average
of the values in the data set. It is calculated by adding the values in a data set and dividing
the total by the number of values in the data set. For example, to find the mean of the student-age
data set above, add the values in the set and divide by 10 (the number of values in the set).

(15 + 18 + 19 + 19 + 19 + 20 + 20 + 21 + 21 + 22)/10 = 194/10 = 19.4

Notice that no value in the set is exactly 19.4, even though 19.4 is the average of the values.
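The mean calculation above translates directly into code. This is a minimal sketch; the function name mean is an illustrative choice (Python's standard statistics module also provides a mean function that gives the same result here).

```python
def mean(values):
    """Add the values and divide the total by how many values there are."""
    return sum(values) / len(values)

ages = [15, 18, 19, 19, 19, 20, 20, 21, 21, 22]
print(mean(ages))  # 19.4  (194 / 10)
```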
Median: The median of a data set is another way to measure the center of that data set. It
is found by first ordering the data set and then finding the value in the middle of the set. For
example, if a data set has five values, the third value has two values to its left and two values
to its right, so it is in the middle of the set and is the data set's median. The data set
{4, 3, 1, 7, 9}, for example, would first be ordered as {1, 3, 4, 7, 9}, and then the
middle value of the set (and thus the median) is 4.

The student-age data set, however, has an even number of values, so the middle of the set lies
between two values in the set. In such a case, take the mean of the two middle values. For
example, the student-age data set has 10 values, so the fifth and sixth values (19 and 20,
respectively) are in the middle of the set. The mean of 19 and 20 is (19 + 20)/2 = 19.5, so the
median of the data set is 19.5. Notice that although the median and mean of the data set both
measure its center, their values differ slightly. In other data sets, the median and mean may be
very different. In such a case, one should calculate both values and see which one better
represents the center of the set; the statistic that most of the values in the set lie closer to is
the better representation of the set's center.
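The two cases for the median (odd and even number of values) can be sketched as follows. The function name median is illustrative; Python's statistics.median follows the same rule.

```python
def median(values):
    """Order the values, then return the middle value (odd count)
    or the mean of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exactly one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values

print(median([4, 3, 1, 7, 9]))                            # 4
print(median([19, 20, 18, 19, 21, 15, 21, 22, 19, 20]))   # 19.5
```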
Range: The range of a data set measures the spread of the set. It is calculated by taking the
difference between the highest and lowest values in the data set. For example, the range
of the student-age data set is 22 - 15 = 7. Data sets with large ranges have large
spreads.
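The range needs only the highest and lowest values, so the data does not even have to be ordered first. In this sketch the function is named data_range (an illustrative name) to avoid hiding Python's built-in range.

```python
def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

print(data_range([19, 20, 18, 19, 21, 15, 21, 22, 19, 20]))  # 7  (22 - 15)
```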
Mode: The mode of a data set is the value in the set that occurs most frequently. It describes
the shape of the data. For example, the mode of the student-age data set is 19, since 19 occurs
three times and no other value occurs more than twice. If several, but not all, values in the data
set occur with the highest frequency, the data set has multiple modes (all of the values that
occur with the highest frequency). If every value in the data set occurs with the same frequency,
the data set has no mode.
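The three mode rules above (one mode, multiple modes, no mode) can be sketched with a frequency count. The function name modes is illustrative. Note that Python's statistics.multimode does not follow the "no mode" rule: when every value ties, it returns all of them, so this sketch checks for that case explicitly.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of every value that occurs with the highest
    frequency, or an empty list when every value occurs equally often
    (the 'no mode' case)."""
    counts = Counter(values)             # value -> number of occurrences
    top = max(counts.values())           # highest frequency
    most_frequent = [v for v, c in counts.items() if c == top]
    if len(most_frequent) == len(counts):
        return []                        # every value ties, so there is no mode
    return sorted(most_frequent)

print(modes([19, 20, 18, 19, 21, 15, 21, 22, 19, 20]))  # [19]
print(modes([1, 1, 2, 2, 3]))                            # [1, 2]
print(modes([4, 3, 1, 7, 9]))                            # []
```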

Interquartile Range (IQR): The interquartile range (IQR) of a data set measures the spread
of that data set. When the set is ordered, the median splits it into two halves: the values to the
left of the median and the values to the right of the median. (If the set has an odd number of
values, the median itself belongs to neither half.) The median of the values to the left of the
median is called the 1st quartile of the data set. The median of the values to the right of the
median is called the 3rd quartile of the data set. The IQR is the 3rd quartile minus the 1st quartile.

For example, the data set {2, 4, 3, 8, 6, 5, 1} becomes {1, 2, 3, 4, 5, 6, 8} when ordered (always
remember to order the set when finding the median or IQR). The median of the set is 4 because
3 values lie to the left of it and 3 values lie to the right of it. The values to the left of 4 are
{1, 2, 3}, so the 1st quartile is the median of these values, or 2. The values to the right of 4 are
{5, 6, 8}, so the 3rd quartile is the median of these values, or 6. Then the IQR is the difference
of the 3rd and 1st quartiles, or 6 - 2 = 4. When a half contains an even number of values, its
quartile is the mean of the two middle values of that half. That is, like the median, the 1st and
3rd quartiles are not always values in the set. Data sets with large IQRs have large spreads.
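The quartile procedure above can be sketched in code. This follows the method described in this section (split the ordered set at the median, leave the median out of both halves when the count is odd, and take the median of each half) and assumes the set has at least two values. The function names are illustrative; statistical software and library quartile functions may use other conventions, which can give slightly different quartiles.

```python
def median(values):
    """Middle value of the ordered values, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (1st quartile, 3rd quartile) by splitting at the median.
    Assumes at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values left of the median
    upper = ordered[half + n % 2:]  # values right of the median (skips it when n is odd)
    return median(lower), median(upper)

def iqr(values):
    """3rd quartile minus 1st quartile."""
    q1, q3 = quartiles(values)
    return q3 - q1

print(quartiles([2, 4, 3, 8, 6, 5, 1]))  # (2, 6)
print(iqr([2, 4, 3, 8, 6, 5, 1]))        # 4
```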

Standard Deviation: The standard deviation of a data set measures the spread of that data
set, and it is an alternative to the IQR or range. One should find all three measures of spread and
decide which one best represents the spread of the data set. The standard deviation of a data set
is found by subtracting the set's mean from every value, squaring each difference, taking the mean
of these squared differences, and then taking the square root of that mean. For example, consider
the data set {2, 4, 6, 8}.

The mean of the set is (2 + 4 + 6 + 8)/4 = 20/4 = 5. Then, take the difference of every value in
the set and the mean and square each difference. Use these squared differences to make a new
data set.

(2 - 5)² = (-3)² = 9. (4 - 5)² = (-1)² = 1. (6 - 5)² = 1² = 1. (8 - 5)² = 3² = 9.

So the new set is {9, 1, 1, 9}. Next, take the mean of this set: (9 + 1 + 1 + 9)/4 = 20/4 = 5.
Finally, take the square root of this mean; the resulting value is the standard deviation of the
original set. In this case, the standard deviation is √5 ≈ 2.236.
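The four steps above (find the mean, square each difference from the mean, average the squares, take the square root) can be sketched as follows; the function name std_dev is illustrative. Because it divides by the number of values, this is the population standard deviation, the same quantity as Python's statistics.pstdev. Python's statistics.stdev instead divides by one less than the number of values (the sample standard deviation) and gives a larger result, so calculators and software may report either one.

```python
import math

def std_dev(values):
    """Population standard deviation: the square root of the mean of
    the squared differences between each value and the mean."""
    n = len(values)
    mu = sum(values) / n                               # step 1: the mean
    squared_diffs = [(x - mu) ** 2 for x in values]    # step 2: squared differences
    mean_sq = sum(squared_diffs) / n                   # step 3: their mean
    return math.sqrt(mean_sq)                          # step 4: square root

print(round(std_dev([2, 4, 6, 8]), 3))  # 2.236  (square root of 5)
```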

All of these statistics can be calculated for several data sets and compared to find which set has
a higher center, a larger spread, or any other comparative characteristic.

It is important to recognize when a data set has outliers, or unusual values that are not
representative of the majority of the set. For example, in the data set {3, 15, 16, 17, 17, 19, 20, 21},
the value 3 is an outlier because it is far from all of the other values in the set. Outliers can
strongly affect some descriptive statistics, such as the mean and standard deviation, because these
statistics include every value in the data set in their calculations. Other statistics, such as the
median, are affected very little by outliers, because they depend only on the middle values of the
ordered data set.
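The effect of the outlier 3 in the example above can be checked by computing the mean and median with and without it. This sketch uses mean and median from Python's standard statistics module; the variable names are illustrative. Removing 3 moves the mean from 16 to about 17.857, while the median stays at 17 for this particular set.

```python
from statistics import mean, median

with_outlier = [3, 15, 16, 17, 17, 19, 20, 21]
without_outlier = [15, 16, 17, 17, 19, 20, 21]

for label, data in [("with 3", with_outlier), ("without 3", without_outlier)]:
    print(f"{label}: mean = {mean(data):.3f}, median = {median(data):.1f}")
# with 3: mean = 16.000, median = 17.0
# without 3: mean = 17.857, median = 17.0
```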

Examples

Here are a few examples to test the concepts in this section. Solutions are given in the
Solutions section that follows.

1. Order the set of data below.

{26, 14, 12, 2, 6, 17, 25, 20, 8, 5}

2. Find the mean, median, range, and mode of the set of data below.

{22, 26, 35, 21, 26, 28, 29, 33, 30, 36}

3. Find the interquartile range of the set of data below.

{12, 4, 7, 3, 10, 2, 11, 14, 5, 9, 10}

4. Find the standard deviation of the set of data below.

{2, 1, 6, 5, 1, 4, 2}

Solutions

These are the solutions to the examples in the previous section.

1. Simply put the items in the set of data in increasing order. To do this on paper, try looking
for the lowest value, writing it down, and crossing it off on the original set. Then, find the next
lowest value, write it down after the first number, cross it off, and continue. The result will be
the ordered set of data below.

{2, 5, 6, 8, 12, 14, 17, 20, 25, 26}

2. To find the mean of the set of data, first add up the values in the data set and then divide by the
total number of values: 22 + 26 + 35 + 21 + 26 + 28 + 29 + 33 + 30 + 36 = 286. There are 10
values in this set of data, so the mean is 286/10 = 28.6.

To find the median of the set of data, order the set and find the middle value or values. This
data set has 10 values, so the middle two values are 28 and 29. Taking the mean of these two
values gives the median of the data set: (28 + 29)/2 = 28.5. Thus, the median is 28.5. The
ordered set is given below.

{21, 22, 26, 26, 28, 29, 30, 33, 35, 36}

The mode is the most frequently occurring value. The value 26 occurs twice in the data set and
no other value does. Thus, the mode of the data set is 26.

To find the range of the set of data, take the difference of the highest and lowest values. In this
case, they are 36 and 21, respectively. Thus, the range is 36 - 21 = 15.

3. To find the IQR, one needs to know the 1st and 3rd quartiles. To find the 1st and 3rd quartiles,
one needs to know the median. To find the median, the data set needs to be ordered. The
ordered data set is given below.

{2, 3, 4, 5, 7, 9, 10, 10, 11, 12, 14}

The median of the data set is the middle value of the ordered set. In this case, there are five
values to the left of and to the right of 9. Thus, the median of the data set is 9.

The 1st quartile is the median of the values to the left of 9, or the median of {2, 3, 4, 5, 7}, and
there are two values to the right of and to the left of 4. Thus, the 1st quartile of the data is 4.

The 3rd quartile is the median of the values to the right of 9, or the median of {10, 10, 11, 12, 14},
and there are two values to the right of and to the left of 11. Thus, the 3rd quartile of the data
is 11.

The IQR is the difference of the 3rd and 1st quartiles, so the IQR of this data set is 11 - 4 = 7.

4. The standard deviation of a set of data is the square root of the mean of the squared differences
between each value and the mean of the data set. Thus, one must first find the mean of the data set:
(2 + 1 + 6 + 5 + 1 + 4 + 2)/7 = 21/7 = 3. The difference of each value from the mean, and the
square of that difference, are given in the table below.

Value    Difference (Value - 3)    Square of Difference
  2              -1                         1
  1              -2                         4
  6               3                         9
  5               2                         4
  1              -2                         4
  4               1                         1
  2              -1                         1

The average of the squared differences is (1 + 4 + 9 + 4 + 4 + 1 + 1)/7 = 24/7 ≈ 3.429. The standard
deviation of the data set is the square root of this value, or √3.429 ≈ 1.852.
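The four answers above can be double-checked with Python's standard statistics module, which provides mean, median, mode, and pstdev (the population standard deviation used in this section). This is a minimal sketch; the variable names ex1 to ex4 are illustrative, and the quartiles for Example 3 are found by splitting the ordered set at the median as this section describes, rather than with a library quartile function, since libraries may use a different quartile convention.

```python
import math
import statistics

# Example 1: ordering
ex1 = [26, 14, 12, 2, 6, 17, 25, 20, 8, 5]
print(sorted(ex1))  # [2, 5, 6, 8, 12, 14, 17, 20, 25, 26]

# Example 2: mean, median, range, mode
ex2 = [22, 26, 35, 21, 26, 28, 29, 33, 30, 36]
print(statistics.mean(ex2), statistics.median(ex2),
      max(ex2) - min(ex2), statistics.mode(ex2))  # 28.6 28.5 15 26

# Example 3: IQR, leaving the median out of both halves (odd count)
ex3 = sorted([12, 4, 7, 3, 10, 2, 11, 14, 5, 9, 10])
half = len(ex3) // 2
q1 = statistics.median(ex3[:half])                 # median of {2, 3, 4, 5, 7}
q3 = statistics.median(ex3[half + len(ex3) % 2:])  # median of {10, 10, 11, 12, 14}
print(q1, q3, q3 - q1)  # 4 11 7

# Example 4: population standard deviation, the square root of 24/7
ex4 = [2, 1, 6, 5, 1, 4, 2]
print(round(statistics.pstdev(ex4), 3))  # 1.852
```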
