In this and the next chapter we shall work in the area of descriptive statistics.
As indicated above, descriptive techniques serve to summarize data; we want to put the
data into a “readable form” or to “create order.” In doing so we can determine, for
written as
Our first construct is what is called an absolute frequency distribution – it shows the
absolute frequencies with which the “different” values of a variable X occur in a set of
data, where absolute frequency (fj) is the number of times a particular value of X is
recorded.
Example 2.1
X: 1, 3, 1, 2, 4, 1, 3, 5, 6, 6, 4, 2, 3, 4, 3.
Table 2.1 describes the absolute frequency distribution for this data set. Note that while
for X (we have X: Xi, i=1,…,15), there are only 6 “different” values of X since most of the
X values occur more than once. Hence the j subscript indexes the different X’s (there are
only 6 of them so j = 1, …, 6). So given this absolute frequency distribution, what sort of
each absolute frequency (fj) as a fraction of the total number of observations n. Hence a
Example 2.2
Given the Example 2.1 data set, determine the associated relative frequency
distribution. To form Table 2.2, we use the same X heading as in Table 2.1 but replace fj
by fj /n. So while
Table 2.2
Relative Frequency Distribution for X
the absolute frequencies sum to 15, the relative frequencies must sum to 1. █
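The counting behind Tables 2.1 and 2.2 can be checked with a short sketch. The data are those of Example 2.1; the variable names (`abs_freq`, `rel_freq`) are ours and purely illustrative, and exact fractions are used only so that the relative frequencies sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction

# Example 2.1 data, copied from the text.
x = [1, 3, 1, 2, 4, 1, 3, 5, 6, 6, 4, 2, 3, 4, 3]
n = len(x)  # total number of observations

# Absolute frequencies f_j: how often each different value of X occurs.
abs_freq = dict(sorted(Counter(x).items()))

# Relative frequencies f_j / n, kept as exact fractions.
rel_freq = {value: Fraction(f, n) for value, f in abs_freq.items()}

for value in abs_freq:
    print(value, abs_freq[value], rel_freq[value])
```

Here the index j of the text corresponds to the 6 keys of `abs_freq`, while i runs over the 15 entries of `x`.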
Example 2.3
Using the Example 2.2 relative frequency table, determine the corresponding
Table 2.3
Percent Distribution for X
X % (=100 x fj /n)
1 20.00
2 13.33
3 26.66
4 20.00
5 6.66
6 13.33
≈ 100
reveals the desired set of percentages. Clearly the percent column must total 100 (%). █
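As a rough check on Table 2.3, the sketch below recomputes the percents from the Example 2.1 data. The entries 26.66 and 6.66 in the table look truncated rather than rounded to two decimals, which would explain why the column only approximately totals 100; that reading is our inference, not something the text states. The names `truncated` and `rounded` are illustrative.

```python
from collections import Counter

# Example 2.1 data, copied from the text.
x = [1, 3, 1, 2, 4, 1, 3, 5, 6, 6, 4, 2, 3, 4, 3]
n = len(x)
abs_freq = dict(sorted(Counter(x).items()))

# Percent = 100 * f_j / n, cut to two decimals in two different ways.
# Truncation: integer division drops everything past the second decimal.
truncated = {v: (10000 * f // n) / 100 for v, f in abs_freq.items()}
# Conventional rounding to two decimals.
rounded = {v: round(100 * f / n, 2) for v, f in abs_freq.items()}

for v in abs_freq:
    print(v, truncated[v], rounded[v])
print("totals:", sum(truncated.values()), sum(rounded.values()))
```

Under truncation the column falls slightly short of 100, while the exact percents total 100.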
ways of conveying basically the same information. Which one you choose to employ
depends upon the type of information you wish to present. For instance, from Tables 2.1
– 2.3, it is apparent that X=3 occurs 4 times (Table 2.1) or accounts for 4/15 of all data
[Three bar charts plotting fj, fj /n, and % against X = 1, …, 6]
Figure 2.1
The preceding set of descriptive techniques (tabular and graphical) was easily
applied to discrete sets of data with a relatively small number of observations. (A small
sample in statistics typically has about thirty observations (give or take a few data points)
whereas a medium-sized sample has around fifty observations.) Much larger data sets,
efficient manner. That is, instead of listing each and every different value of some
variable X (there may be hundreds of them) and then finding its associated absolute or
relative frequency or percent, we can opt to group the values of X into specified classes.
with which the various values of a variable X are distributed among chosen classes.
Here an absolute frequency is now a class frequency – it is the number of items falling
1. Choose the classes into which the data are to be grouped. Here we
must:
class frequencies.
While there are no hard and fast rules for carrying out Step 1, certain guidelines or “rules
inclusive;
2. obviously one must choose the classes so that all of the data can be
accommodated;
classes);
4. the class intervals should be equal (the class interval is the length of
a class);¹
5. try to avoid open-ended classes (e.g., the class “65 and over” is
open-ended);
Example 2.4
2.4.
Table 2.4
Observations on X
1 5 3 4 7 3 4 1 3
5 8 5 9 8 1 5 3 3 41
2 1 2 6 2 7 3 2
9 4 2 5 3 7 6 1 4 21
3 1 5 1 1 2 4 3
5 6 2 9 1 6 7 8 3 35
1 8 2 3 4 2 5 7 2 42
¹ See Appendix 2.A for the case of unequal class intervals.
7 9 4 7 3 8 8 2 8
3 3 3 4 6 4 3 5 1
4 0 1 6 0 4 6 2 8 49
Given that the smallest observation or min Xi = 1 and the largest observation or max Xi
= 89, let us determine the range of our data set (a measure of the total spread of the data)
as
$$\text{range} = \max X_i - \min X_i = 89 - 1 = 88.$$
Since the range of our data set can be viewed as being made up of classes and class
So given that we know the range, we can choose the number of classes and then solve
for the common class interval or class length; or we can pick a class interval and then
solve for the required number of classes. Usually the class interval is chosen as
something easy to work with, such as 5 or 10 or 20, etc. If we decide to have the length
$$\text{number of classes} \approx \frac{\text{range}}{\text{class interval}} = \frac{88}{10} = 8.8.$$
Since we cannot have a fractional part of a class, we round to 9 classes. (Had we at the
outset decided to use 9 classes, then, from (2.2), we would solve for
$$\text{class interval} \approx \frac{\text{range}}{\text{number of classes}} = \frac{88}{9} \approx 9.8.$$
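The two ways of using the range can be sketched as follows. The values of min Xi and max Xi are those given in the text. Rounding 8.8 up with `math.ceil` is our assumption about how "we round to 9 classes" is meant, chosen so that all of the data can be accommodated.

```python
import math

x_min, x_max = 1, 89            # min Xi and max Xi, as stated in the text
data_range = x_max - x_min      # range = 88

# Option 1: pick the class interval, then solve for the number of classes.
class_interval = 10
n_classes = math.ceil(data_range / class_interval)   # 8.8 rounded up (assumed)

# Option 2: pick the number of classes, then solve for the class interval.
interval_for_9 = data_range / 9

print(n_classes, round(interval_for_9, 1))
```

With 9 classes the solved interval is not a whole number, which is one reason a convenient length such as 10 is usually chosen first.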
Table 2.5
Absolute Frequency Distribution for X
Classes of X    Absolute frequency (fj)    Class mark (mj)
0-9             2                          4.5
10-19           8                          14.5
20-29           10                         24.5
30-39           12                         34.5
40-49           9                          44.5
50-59           4                          54.5
60-69           2                          64.5
70-79           2                          74.5
80-89           1                          84.5
                50 = ∑ fj
and largest values that can go into any class. Obviously the values 0,
10, 20, etc. are lower class limits while the values 9, 19, 29, etc.
3. Given Table 2.5, how can we find the class interval or the
successive lower class limits, e.g., 10-0 =10, 20-10=10, etc. (Note: the
class interval is “not” the difference between the upper and lower class
limits of a given class, i.e., for the third class, 29-20=9 ≠ class length =
10.)
and is obtained by averaging the class limits, e.g., for the second class
$m_2 = \frac{10 + 19}{2} = 14.5$. What is the role of a class mark? To answer this
when faced with a table such as Table 2.5, in the class 40-49, we have
class – it is m5 = 44.5, the class midpoint. (Note that the class interval
between the second and third classes is 19.5 (19.5 is halfway between
19 and 20) and so on. (Note that the class interval can be calculated as
the difference between the upper and lower boundaries of a given class,
e.g., for the class 20-29, its length is 29.5 – 19.5 = 10.) Query: Why
are class boundaries always impossible values? (The answer is obvious
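The quantities discussed in this list (class interval, class mark, class boundary) can all be computed from the class limits of Table 2.5. The sketch below is only illustrative, and the names `limits`, `intervals`, `marks` and `boundaries` are ours.

```python
# Class limits (lower, upper) from Table 2.5.
limits = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49),
          (50, 59), (60, 69), (70, 79), (80, 89)]

# Class interval: difference between successive lower class limits.
intervals = [nxt[0] - cur[0] for cur, nxt in zip(limits, limits[1:])]

# Class mark: average of the lower and upper limits of a class.
marks = [(lo + hi) / 2 for lo, hi in limits]

# Class boundary: halfway between the upper limit of one class
# and the lower limit of the next class.
boundaries = [(cur[1] + nxt[0]) / 2 for cur, nxt in zip(limits, limits[1:])]

print(intervals)
print(marks)
print(boundaries)
# Length of the class 20-29 from its boundaries: 29.5 - 19.5.
print(boundaries[2] - boundaries[1])
```

Note that the difference between the limits of a single class (e.g., 29 – 20 = 9) does not appear anywhere in this calculation.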
Example 2.4
Given the classes and absolute frequencies appearing in Table 2.5, the reader
fj /n) and a percent distribution (multiply the relative frequencies by 100 to convert them
Tables 2.5 – 2.7 is the histogram – a set of vertical bars or rectangles whose heights
correspond to the absolute frequencies (or relative frequencies or percents) of the classes
of X and whose bases are the class intervals. To avoid gaps between the rectangles, we
need only plot the various lower class limits on the X-axis.
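A minimal sketch of the bar geometry, using the classes and absolute frequencies of Table 2.5: each bar starts at a lower class limit and is one class interval wide, so adjacent bars share an edge. The tuple layout `(left, right, height)` and the crude text rendering are our own illustrative choices, not part of the text.

```python
# Lower class limits and absolute frequencies from Table 2.5.
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80]
abs_freq = [2, 8, 10, 12, 9, 4, 2, 2, 1]
class_interval = 10
n = sum(abs_freq)

# Heights for the relative frequency and percent histograms.
rel_freq = [f / n for f in abs_freq]
percent = [100 * f / n for f in abs_freq]

# Each bar runs from a lower class limit to the next lower class limit,
# so the right edge of one bar is the left edge of the next (no gaps).
bars = [(left, left + class_interval, height)
        for left, height in zip(lower_limits, abs_freq)]

# Crude text rendering of the absolute frequency histogram, one row per class.
for left, right, height in bars:
    print(f"{left:>2} to {right:<2} | " + "#" * height)
```

The three histograms differ only in the scale of the vertical axis; the bar positions and widths are identical.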
Example 2.5
Figure 2.2 houses an absolute frequency histogram, a relative frequency
[Histogram panels, each with X running from 0 to 90 in steps of 10: a relative frequency histogram (vertical axis in fiftieths) and c. Percent Histogram (vertical axis in %, marked from 4 to 24)]
Figure 2.2
Appendix 2.A Histograms With Classes of Different Lengths
correspond to absolute class frequencies and whose bases are the class intervals. This
definition implicitly assumes that all of the classes are of equal length. However, for
A more general definition of a histogram is: a set of vertical bars whose areas are
and
Hence frequency density amounts to absolute class frequency on a per standard width
basis – it is the absolute class frequency per interval length. So for classes of equal
length,
histogram is
$\left[\text{class interval}\left(\frac{\text{standard width}}{\text{class interval}}\right)\right]$.
Example 2.A.1
histogram (Figure 2.A.1). Here the standard width of each class is 3 and the height of
each class is its frequency density = absolute class frequency (equation (2.A.2)). Since
the classes are of equal length, the areas are determined as standard width (3) times absolute
class frequency.
Table 2.A.1
Absolute frequency distribution
Classes of X    Absolute frequency    Height = frequency density        Area of rectangle
                                      (= absolute class frequency)
5-7             2                     2/(3/3) = 2                       3 x 2 = 6
8-10            5                     5/(3/3) = 5                       3 x 5 = 15
11-13           22                    22/(3/3) = 22                     3 x 22 = 66
14-16           13                    13/(3/3) = 13                     3 x 13 = 39
17-19           7                     7/(3/3) = 7                       3 x 7 = 21
20-22           1                     1/(3/3) = 1                       3 x 1 = 3
                50
[Figure 2.A.1: histogram of Table 2.A.1. Vertical axis: absolute class frequency; horizontal axis: X, marked at 5, 8, 11, 14, 17, 20.]
complicated class structure. Its associated histogram appears as Figure 2.A.2. Here the
width of the first class is 5, the width of the last class is 30, and the classes 10-19 through
70-79 each have a “standard width” of 10. The height of each class is again determined
Table 2.A.2
Absolute frequency distribution
Classes of X    Absolute frequency    Height = frequency density    Area of rectangle =
                                                                    height x class interval
5-9             6                     6/(5/10) = 12                 12 x 5 = 60
10-19           8                     8/(10/10) = 8                 8 x 10 = 80
20-29           11                    11/(10/10) = 11               11 x 10 = 110
30-39           13                    13/(10/10) = 13               13 x 10 = 130
40-49           20                    20/(10/10) = 20               20 x 10 = 200
50-59           7                     7/(10/10) = 7                 7 x 10 = 70
60-69           5                     5/(10/10) = 5                 5 x 10 = 50
70-79           2                     2/(10/10) = 2                 2 x 10 = 20
80-109          2                     2/(30/10) = 2/3               2/3 x 30 = 20
                74
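The columns of Table 2.A.2 can be reproduced with the sketch below, which takes frequency density to be absolute class frequency divided by (class interval / standard width), as in the table's own entries. The right edge of 110 for the class 80-109 is inferred from its stated width of 30. The names `classes`, `rows` and `end` are illustrative.

```python
from fractions import Fraction

# Table 2.A.2: (lower class limit, absolute class frequency).
classes = [(5, 6), (10, 8), (20, 11), (30, 13), (40, 20),
           (50, 7), (60, 5), (70, 2), (80, 2)]
end = 110             # right edge of the last class, 80-109 (width 30)
standard_width = 10

# Class interval = difference between successive lower class limits.
lowers = [lo for lo, _ in classes] + [end]

rows = []
for (lo, freq), nxt in zip(classes, lowers[1:]):
    width = nxt - lo
    # Height = frequency density = frequency / (class interval / standard width).
    density = Fraction(freq) / Fraction(width, standard_width)
    # Area of rectangle = height x class interval.
    area = density * width
    rows.append((lo, width, density, area))

for lo, width, density, area in rows:
    print(lo, width, density, area)
```

Every area equals the class frequency times the standard width, so the areas (not the heights) are proportional to the absolute class frequencies.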
[Figure 2.A.2: histogram of Table 2.A.2. Vertical axis: frequency density; horizontal axis: X, marked at 5, 10, 20, 30, 40, 50, 60, 70, 80, 110.]