
This module will discuss ways to summarize data using some familiar and some not-so-familiar calculations.

In statistics, a set of data has two important characteristics: central tendency and dispersion.

Some of these calculations are fairly easy. Others, including those for variance and standard deviation, are tedious when done by hand.

Measures of Central Tendency

Measures of central tendency are numerical values that locate the middle of a set of data. This module will focus on the mean, median, and mode of a data set. The mean is the sum of all the values divided by the number of data values (recall calculating averages). The population mean, $\mu$, is the measure of central tendency that points toward the center of the population data and is computed as follows:

(Eqn. 1)  $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

where N is the number of values in the population.

This formula uses summation notation ($\Sigma$). See the HyperStat activity on summation notation for a review and some examples.

The median of a variable is the value that lies in the middle of the data when arranged in order of size. If the number of data items is even, the median is the mean of the two middle observations.

The mode of a variable is the value that occurs most frequently. For example, the mode for the data set 4, 2, 9, 5, 2 is 2 because it occurs twice, more often than any other value.
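As a quick check of these definitions, the following Python sketch computes the mean, median, and mode of the data set 4, 2, 9, 5, 2 using the standard library's statistics module. The variable names are illustrative, and the four-value list used for the even-count case is an assumption made only for this example (the same data with one 2 removed).

```python
import statistics

data = [4, 2, 9, 5, 2]

mean = statistics.mean(data)      # sum of the values divided by the count: 22 / 5
median = statistics.median(data)  # middle value of the sorted data 2, 2, 4, 5, 9
mode = statistics.mode(data)      # value that occurs most often

print(mean, median, mode)         # 4.4 4 2

# Even number of items (illustrative data): the median is the mean
# of the two middle observations, here 4 and 5.
even_data = [4, 2, 9, 5]
even_median = statistics.median(even_data)
print(even_median)                # 4.5
```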

Measures of Dispersion

MAT130 Module Three

Dispersion, or variability, describes the spread of a data set. Two sets of data can have the same mean but drastically different levels of spread. Data that are closely grouped will have relatively small values for the measures of dispersion. This module will focus on the range, the variance, and the standard deviation of a data set.

Range is the difference between the largest and smallest values within the data. The data set 4, 2, 9, 5, 2 has a range of 9 - 2 = 7.

Variance measures dispersion about the mean. The population variance is the mean of the squared differences from the population mean, as shown below.

(Eqn. 2)  $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

The population standard deviation is the square root of the population variance, or $\sigma = \sqrt{\sigma^2}$.
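To make the population formulas concrete, here is a minimal Python sketch that computes the range, population variance, and population standard deviation for the data set 4, 2, 9, 5, 2, treating those five values as the whole population (an assumption made for illustration). The variable names are not from the module. The result is cross-checked against the standard library's statistics.pvariance.

```python
import math
import statistics

data = [4, 2, 9, 5, 2]   # assumed here to be the entire population
N = len(data)

data_range = max(data) - min(data)                 # 9 - 2 = 7
mu = sum(data) / N                                 # population mean, 4.4
sigma_sq = sum((x - mu) ** 2 for x in data) / N    # mean of the squared differences
sigma = math.sqrt(sigma_sq)                        # population standard deviation

# Cross-check the hand-written formula against the standard library.
assert math.isclose(sigma_sq, statistics.pvariance(data))

print(data_range, round(sigma_sq, 2), round(sigma, 3))   # 7 6.64 2.577
```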

Sample Data

If a sample of n observations is being considered, rather than a population, the sample mean, $\bar{x}$, and sample variance, $s^2$, are as follows.

(Eqn. 3)  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

(Eqn. 4)  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Notice that the sample mean is calculated in the same manner as the population mean. However, the variance for a sample differs from the variance for a population: the sum of the squared differences is divided by n - 1 rather than by the number of observations. The standard deviation for a sample is $s = \sqrt{s^2}$.
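The effect of dividing by n - 1 can be seen in a short Python sketch. It treats the data set 4, 2, 9, 5, 2 as a sample (an assumption for illustration) and computes the sample variance next to the population-style calculation on the same values. The variable names are illustrative.

```python
import math
import statistics

data = [4, 2, 9, 5, 2]   # treated here as a sample of n = 5 observations
n = len(data)

x_bar = sum(data) / n                                  # sample mean, same formula as before
squared_diffs = sum((x - x_bar) ** 2 for x in data)    # sum of squared differences

s_sq = squared_diffs / (n - 1)    # sample variance: divide by n - 1
pop_var = squared_diffs / n       # population-style calculation, for comparison only
s = math.sqrt(s_sq)               # sample standard deviation

# Cross-check against the standard library's sample variance.
assert math.isclose(s_sq, statistics.variance(data))

print(round(s_sq, 2), round(pop_var, 2), round(s, 3))   # 8.3 6.64 2.881
```

For the same values, the sample variance is always larger than the population-style result because the divisor is smaller.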

Quartiles

Quartiles divide a data set into four quarters and help in visualizing the spread of the data. The quartiles are used to obtain the five-number summary, which contains the following elements:

The minimum
Q1, the value below which 25% of the data fall
Q2, or the median of the data
Q3, the value below which 75% of the data fall
The maximum

The five-number summary is used to create a boxplot, another visual representation of data. For example, given below are the weights of a sample of 25 tablets.
Weight (grams)
0.608  0.598  0.608  0.606  0.608
0.610  0.610  0.605  0.612  0.611
0.601  0.600  0.610  0.602  0.608
0.607  0.607  0.609  0.608  0.605
0.611  0.600  0.605  0.610  0.603

There are many programs that can numerically summarize this data. The following table and boxplot were created using StatCrunch. Notice that the table gives values for the sample mean, sample variance, sample standard deviation, range, and the five-number summary.

Summary statistics:
Column      Weight of Tablet
Mean        0.60648
Variance    1.5176666E-5
Std. Dev.   0.0038957242
Median      0.608
Range       0.014
Min         0.598
Max         0.612
Q1          0.605
Q3          0.61
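The same summary can be sketched in Python with the standard library's statistics module (Python 3.8 or later for statistics.quantiles). This is not how StatCrunch computes its results. In particular, the text does not state which quartile method StatCrunch uses, so the "inclusive" method below is an assumption, chosen because it gives the reported Q1 and Q3 for this data. The variable names are illustrative.

```python
import statistics

# Weights (grams) of the 25 tablets listed above.
weights = [
    0.608, 0.598, 0.608, 0.606, 0.608,
    0.610, 0.610, 0.605, 0.612, 0.611,
    0.601, 0.600, 0.610, 0.602, 0.608,
    0.607, 0.607, 0.609, 0.608, 0.605,
    0.611, 0.600, 0.605, 0.610, 0.603,
]

mean = statistics.mean(weights)
variance = statistics.variance(weights)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(weights)        # sample standard deviation
data_range = max(weights) - min(weights)

# Assumption: "inclusive" quartiles; other methods can give slightly
# different Q1 and Q3 for the same data.
q1, q2, q3 = statistics.quantiles(weights, n=4, method="inclusive")

five_number_summary = (min(weights), q1, q2, q3, max(weights))

print(f"mean={mean:.5f} variance={variance:.7e} std_dev={std_dev:.7f}")
print(f"range={data_range:.3f} five-number summary={five_number_summary}")
```

Up to rounding, these values agree with the StatCrunch table above.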


As shown, software is extremely useful for carrying out computations that would otherwise be tedious.
