
CHAPTER 2: ORGANIZING DATA

RAW DATA
• Definition
  - Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data.

Table 2.1 Ages of 50 students

21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24

Table 2.2 Status of 50 Students

J F SO SE J J SE J J J
F F J F F F SE SO SE J
J F SE SO SO F J F SE SE
SO SE J SO SO J J SO F SO
SE SE F SE J SO F J SO SO

ORGANIZING AND GRAPHING QUALITATIVE DATA
• Frequency Distributions
• Relative Frequency and Percentage Distributions
• Graphical Presentation of Qualitative Data
  - Bar Graphs
  - Pie Charts

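TABLE 2.3 Type of Employment Students Intend to Engage In

Type of Employment             Number of Students
Private companies/businesses   44
Federal government             16
State/local government         23
Own business                   17
                               Sum = 100

In this table, the type of employment is the variable, each type of employment is a category, the second column is the frequency column, and each entry in that column is a frequency.
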
Frequency Distributions
• Definition
  - A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

Variable
A characteristic that varies from one individual or object to another is called a variable.

• Discrete variable: a variable whose values change by steps or jumps.
• Continuous variable: a variable that can take any value within a range of numbers.

Frequency Distribution:
A frequency distribution divides observations in the data set into conveniently established,
numerically ordered classes (groups or categories).
For example:
An advertising company kept an account of response letters received each day
over a period of 50 days. The observations were:
0 2 1 1 1 2 0 0 1 0 1
0 0 1 0 1 1 0 2 0 0 2
0 1 0 1 0 1 0 3 1 0 1
0 1 0 2 5 1 2 0 0 0 0
3 0 1 1 2 0

Frequency Distribution:
Any table arranged in such a way that the data values are listed together with their frequencies.

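As a rough illustration of the definition above, the 50 daily counts of response letters can be tallied into a frequency table. This is only a sketch: the variable names (`letters`, `frequency_table`) are chosen here for illustration and do not come from the slides.

```python
from collections import Counter

# Response letters received on each of the 50 days (values from the example).
letters = [
    0, 2, 1, 1, 1, 2, 0, 0, 1, 0, 1,
    0, 0, 1, 0, 1, 1, 0, 2, 0, 0, 2,
    0, 1, 0, 1, 0, 1, 0, 3, 1, 0, 1,
    0, 1, 0, 2, 5, 1, 2, 0, 0, 0, 0,
    3, 0, 1, 1, 2, 0,
]

counts = Counter(letters)

# List every value from the smallest to the largest, including values
# that never occurred (frequency 0), so the table has no gaps.
frequency_table = [
    (value, counts.get(value, 0))
    for value in range(min(letters), max(letters) + 1)
]

print("No. of letters  Frequency")
for value, freq in frequency_table:
    print(f"{value:>14}  {freq:>9}")
print(f"{'Total':>14}  {len(letters):>9}")
```

Listing the value 4 with frequency 0 is a presentation choice made in this sketch; a table could equally omit values that never occur.
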
CLASS BOUNDARIES:
The true class limits of a class are known as its class boundaries.

In this example, it should be noted that the difference between the upper class boundary and the lower class boundary of any class is equal to the class interval h = 3: 32.95 − 29.95 = 3, 35.95 − 32.95 = 3, and so on.

Relative Frequency Distribution

Mid Point

Example 2-1
A sample of 30 employees from large companies was selected, and these employees were asked how stressful their jobs were. The responses of these employees are recorded next, where very represents very stressful, somewhat means somewhat stressful, and none stands for not stressful at all.

Somewhat  None      Somewhat  Very      Very      None
Very      Somewhat  Somewhat  Very      Somewhat  Somewhat
Very      Somewhat  None      Very      None      Somewhat
Somewhat  Very      Somewhat  Somewhat  Very      None
Somewhat  Very      Very      Somewhat  None      Somewhat

Construct a frequency distribution table for these data.

Solution 2-1
Table 2.4 Frequency Distribution of Stress on Job

Stress on Job   Tally             Frequency (f)
Very            |||| ||||         10
Somewhat        |||| |||| ||||    14
None            |||| |            6
                                  Sum = 30

Relative Frequency and Percentage Distributions
• Calculating Relative Frequency of a Category

Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)

• Calculating Percentage

Percentage = (Relative frequency) · 100

Example 2-2
Determine the relative frequency and percentage for the data in Table 2.4.

Solution 2-2
Table 2.5 Relative Frequency and Percentage Distributions of Stress on Job

Stress on Job   Relative Frequency   Percentage
Very            10/30 = .333         .333(100) = 33.3
Somewhat        14/30 = .467         .467(100) = 46.7
None            6/30 = .200          .200(100) = 20.0

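The two formulas used for Table 2.5 can be checked with a few lines of Python. This is a minimal sketch that starts from the frequencies of Table 2.4; the dictionary names are illustrative assumptions, not part of the original slides.

```python
# Frequencies taken from Table 2.4.
frequencies = {"Very": 10, "Somewhat": 14, "None": 6}
total = sum(frequencies.values())

# Relative frequency = frequency of the category / sum of all frequencies.
relative = {cat: f / total for cat, f in frequencies.items()}

# Percentage = relative frequency * 100.
percentage = {cat: 100 * r for cat, r in relative.items()}

print(f"{'Stress on Job':<15}{'Rel. freq.':>12}{'Percentage':>12}")
for cat in frequencies:
    print(f"{cat:<15}{relative[cat]:>12.3f}{percentage[cat]:>12.1f}")
```

Note that the sketch multiplies the unrounded relative frequency by 100, whereas the table multiplies the rounded value; for these data both routes give the same percentages to one decimal place.
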
Graphical Presentation of Qualitative Data
• Definition
  - A graph made of bars whose heights represent the frequencies of respective categories is called a bar graph.

Figure 2.1 Bar graph for the frequency distribution of Table 2.4. (Horizontal axis: Stress on Job, with bars for Very, Somewhat, and None; vertical axis: Frequency, from 0 to 16.)

• Definition
  - A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.

Table 2.6 Calculating Angle Sizes for the Pie Chart

Stress on Job   Relative Frequency   Angle Size
Very            .333                 360(.333) = 119.88
Somewhat        .467                 360(.467) = 168.12
None            .200                 360(.200) = 72.00
                Sum = 1.00           Sum = 360

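The angle sizes in Table 2.6 are simply 360 degrees multiplied by each relative frequency. A small sketch, using the rounded relative frequencies from the table (the names `relative` and `angles` are illustrative):

```python
# Relative frequencies as rounded in Table 2.6.
relative = {"Very": 0.333, "Somewhat": 0.467, "None": 0.200}

# Angle of each pie slice = 360 degrees * relative frequency.
angles = {cat: round(360 * r, 2) for cat, r in relative.items()}
total_angle = round(sum(angles.values()), 2)

for cat, angle in angles.items():
    print(f"{cat:<10}{angle:>8.2f} degrees")
print(f"{'Sum':<10}{total_angle:>8.2f} degrees")
```
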
Figure 2.2 Pie chart for the percentage distribution of Table 2.5. (Slices: Very, 33.3%; Somewhat, 46.7%; None, 20.0%.)

Graphical Presentation of Data
TYPES OF DATA

• Qualitative
  - Univariate: Frequency Table, Percentages, Pie Chart, Bar Chart
  - Bivariate: Frequency Table, Component Bar Chart, Multiple Bar Chart
• Quantitative
  - Discrete: Frequency Distribution, Line Chart
  - Continuous: Frequency Distribution, Histogram, Frequency Polygon, Frequency Curve

Presentation of Qualitative Data

• Univariate: Frequency Table, Percentages, Pie Chart, Bar Chart
• Bivariate: Frequency Table, Component Bar Chart, Multiple Bar Chart

For example, consider the medium of instruction of each student at school level.

We will have an array of observations as follows:
U, U, E, U, E, E, E, U, ...
(U: Urdu medium; E: English medium)

This will result in the following table:

Medium of Institution   No. of Students (f)
Urdu                    719
English                 481
Total                   1200

(Bar chart of the number of students by medium: Urdu 719, English 481.)

Dividing the cell frequencies by the total frequency and multiplying by 100, we obtain the following:

Medium of Institution   f      %
Urdu                    719    59.9 ≈ 60%
English                 481    40.1 ≈ 40%
Total                   1200

PIE CHART

Medium of Institution   f      Angle (degrees)
Urdu                    719    215.70
English                 481    144.30
Total                   1200

(Pie chart with two sectors: Urdu, 215.70 degrees; English, 144.30 degrees.)

SIMPLE BAR CHART:
Suppose we have available to us information regarding the turnover of a company for 5 years, as given in the table below:

Year                1965     1966     1967     1968     1969
Turnover (Rupees)   35,000   42,000   43,500   48,000   48,500

(Simple bar chart of turnover by year, 1965 to 1969; vertical axis from 0 to 50,000 rupees.)

Bivariate Data:
Suppose that along with the enquiry about the Medium of Institution, we are also recording the sex of the student.

Student No.   Medium   Gender
1             U        F
2             U        M
3             E        M
4             U        F
5             E        M
6             E        F
7             U        M
8             E        M
:             :        :
:             :        :

Medium \ Sex   Male   Female   Total
Urdu           202    517      719
English        350    131      481
Total          552    648      1200
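
A two-way table like the one above is obtained by counting how often each (medium, sex) pair occurs. The sketch below tallies only the eight student records actually listed earlier, so its counts are illustrative and do not reproduce the 1,200-student table; all variable names are assumptions made for this example.

```python
from collections import Counter

# The eight (medium, gender) records shown in the listing; the remaining
# records are elided in the source, so this is only a small illustration.
records = [
    ("U", "F"), ("U", "M"), ("E", "M"), ("U", "F"),
    ("E", "M"), ("E", "F"), ("U", "M"), ("E", "M"),
]

cells = Counter(records)          # frequency of each (medium, sex) pair
media = ["U", "E"]
sexes = ["M", "F"]

# Build the cross-classification with row and column totals.
table = {m: {s: cells[(m, s)] for s in sexes} for m in media}
row_totals = {m: sum(table[m].values()) for m in media}
col_totals = {s: sum(table[m][s] for m in media) for s in sexes}

print(f"{'Medium':<8}{'M':>4}{'F':>4}{'Total':>7}")
for m in media:
    print(f"{m:<8}{table[m]['M']:>4}{table[m]['F']:>4}{row_totals[m]:>7}")
print(f"{'Total':<8}{col_totals['M']:>4}{col_totals['F']:>4}{len(records):>7}")
```
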

COMPONENT BAR CHART:

(Component bar chart with one bar for Male and one for Female, each divided into Urdu and English components; vertical axis from 0 to 800.)

MULTIPLE BAR CHART

Suppose we have information regarding the imports and exports of Pakistan for the years 1970-71 to 1974-75, as shown in the table below:

Years     Imports (Crores of Rs.)   Exports (Crores of Rs.)
1970-71   370                       200
1971-72   350                       337
1972-73   840                       855
1973-74   1438                      1016
1974-75   2092                      1029

(Multiple Bar Chart Showing Imports & Exports of Pakistan, 1970-71 to 1974-75: paired bars for Imports and Exports in each year; vertical axis from 0 to 2500.)

ORGANIZING AND GRAPHING QUANTITATIVE DATA
• Frequency Distributions
• Constructing Frequency Distribution Tables
• Relative and Percentage Distributions
• Graphing Grouped Data
  - Histograms
  - Polygons

Frequency Distributions
Table 2.7 Weekly Earnings of 100 Employees of a Company

Weekly Earnings (dollars)   Number of Employees (f)
401 to 600                  9
601 to 800                  22
801 to 1000                 39
1001 to 1200                15
1201 to 1400                9
1401 to 1600                6

In this table, weekly earnings is the variable and the second column is the frequency column. The third class is 801 to 1000, and the frequency of the third class is 39. The lower limit of the sixth class is 1401, and the upper limit of the sixth class is 1600.

• Definition
  - A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.

• Definition
  - The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class.

Finding Class Width

Class width = Upper boundary − Lower boundary

Calculating Class Midpoint or Mark

Class midpoint or mark = (Lower limit + Upper limit) / 2

Constructing Frequency Distribution Tables
Calculation of Class Width

Approximate class width = (Largest value − Smallest value) / (Number of classes)

Table 2.8 Class Boundaries, Class Widths, and Class Midpoints for Table 2.7

Class Limits    Class Boundaries              Class Width   Class Midpoint
401 to 600      400.5 to less than 600.5      200           500.5
601 to 800      600.5 to less than 800.5      200           700.5
801 to 1000     800.5 to less than 1000.5     200           900.5
1001 to 1200    1000.5 to less than 1200.5    200           1100.5
1201 to 1400    1200.5 to less than 1400.5    200           1300.5
1401 to 1600    1400.5 to less than 1600.5    200           1500.5

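The entries of Table 2.8 follow directly from the class limits of Table 2.7 and the three definitions given earlier (boundary, width, midpoint). The following is a sketch under the assumption that the gap between consecutive classes is constant, as it is here (1 dollar); the variable names are illustrative.

```python
# Class limits from Table 2.7.
limits = [(401, 600), (601, 800), (801, 1000),
          (1001, 1200), (1201, 1400), (1401, 1600)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]

rows = []
for lower, upper in limits:
    lower_boundary = lower - gap / 2      # midpoint between adjacent limits
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower + upper) / 2
    rows.append((lower_boundary, upper_boundary, width, midpoint))

for (lower, upper), (lb, ub, width, mid) in zip(limits, rows):
    print(f"{lower} to {upper}: {lb} to less than {ub}, "
          f"width {width:g}, midpoint {mid}")
```
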
Example 2-3
Table 2.9 gives the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season. Construct a frequency distribution table.

Table 2.9 Home Runs Hit by Major League
Baseball Teams During the 2002
Season

Team Home Runs Team Home Runs

Anaheim 152 Milwaukee 139


Arizona 165 Minnesota 167
Atlanta 164 Montreal 162
Baltimore 165 New York Mets 160
Boston 177 New York Yankees 223
Chicago Cubs 200 Oakland 205
Chicago White Sox 217 Philadelphia 165
Cincinnati 169 Pittsburgh 142
Cleveland 192 St. Louis 175
Colorado 152 San Diego 136
Detroit 124 San Francisco 198
Florida 146 Seattle 152
Houston 167 Tampa Bay 133
Kansas City 140 Texas 230
Los Angeles 155 Toronto 187

Solution 2-3
Approximate width of each class = (230 − 124) / 5 = 21.2

Now we round this approximate width to a convenient number, say 22.

The lower limit of the first class can be taken as 124 or any number less than 124. Suppose we take 124 as the lower limit of the first class. Then our classes will be
124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233

Table 2.10 Frequency Distribution for the Data of Table 2.9

Total Home Runs   Tally            f
124 – 145         |||| |           6
146 – 167         |||| |||| |||    13
168 – 189         ||||             4
190 – 211         ||||             4
212 – 233         |||              3
                                   ∑f = 30

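The steps of Solution 2-3 (approximate width, rounding to 22, building the classes, counting) can be retraced in code. This is a sketch using the home run totals of Table 2.9; the choice of 5 classes, width 22, and starting point 124 follows the solution, while the variable names are illustrative.

```python
# Home runs per team from Table 2.9 (left column first, then right column).
home_runs = [
    152, 165, 164, 165, 177, 200, 217, 169, 192, 152,
    124, 146, 167, 140, 155,
    139, 167, 162, 160, 223, 205, 165, 142, 175, 136,
    198, 152, 133, 230, 187,
]

num_classes = 5
approx_width = (max(home_runs) - min(home_runs)) / num_classes  # (230 - 124) / 5

width = 22          # approximate width rounded to a convenient number
first_lower = 124   # lower limit of the first class

# Class limits: 124-145, 146-167, ..., 212-233.
classes = [
    (first_lower + i * width, first_lower + (i + 1) * width - 1)
    for i in range(num_classes)
]

# Count how many teams fall inside each class (limits are inclusive).
freq = [sum(lo <= x <= hi for x in home_runs) for lo, hi in classes]

for (lo, hi), f in zip(classes, freq):
    print(f"{lo} - {hi}: {f}")
print("Sum of f =", sum(freq))
```
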
Relative Frequency and Percentage Distributions

Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies) = f / ∑f

Percentage = (Relative frequency) ⋅ 100

Example 2-4
Calculate the relative frequencies and percentages for Table 2.10.

Solution 2-4
Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10

Total Home Runs   Class Boundaries            Relative Frequency   Percentage
124 – 145         123.5 to less than 145.5    .200                 20.0
146 – 167         145.5 to less than 167.5    .433                 43.3
168 – 189         167.5 to less than 189.5    .133                 13.3
190 – 211         189.5 to less than 211.5    .133                 13.3
212 – 233         211.5 to less than 233.5    .100                 10.0

Graphing Grouped Data

• Definition
  - A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies, or percentages are represented by the heights of the bars. In a histogram, the bars are drawn adjacent to each other.

Figure 2.3 Frequency histogram for Table 2.10. (Horizontal axis: Total home runs, with classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233; vertical axis: Frequency.)

Figure 2.4 Relative frequency histogram for Table 2.10. (Horizontal axis: Total home runs, with the same classes; vertical axis: Relative frequency, from .10 to .50.)

• Definition
  - A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Figure 2.5 Frequency polygon for Table 2.10. (Horizontal axis: classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233; vertical axis: Frequency.)

Figure 2.6 Frequency distribution curve. (Vertical axis: Frequency.)

Example 2-5
The following data give the average travel time from home to work (in minutes) for 50 states. The data are based on a sample survey of 700,000 households conducted by the Census Bureau (USA TODAY, August 6, 2001).

22.4 18.2 23.7 19.8 26.7 23.4 23.5 22.5 24.3 26.7 24.2
19.7 27.0 21.7 17.6 17.7 22.5 23.7 21.2 29.2 26.1 22.7
21.6 21.9 23.2 16.0 16.1 22.3 24.4 28.7 19.9 31.2 22.6
15.4 22.1 19.6 21.4 23.8 21.9 21.9 15.6 22.7 23.6 20.8
21.1 25.4 24.9 25.5 20.1 17.1

Construct a frequency distribution table. Calculate the relative frequencies and percentages for all classes.

Solution 2-5
Approximate width of each class = (31.2 − 15.4) / 6 = 2.63

Rounding this to a convenient number gives a class width of 3; taking 15 as the lower boundary of the first class gives the classes in Table 2.12.

Table 2.12 Frequency, Relative Frequency, and Percentage Distributions of Average Travel Time to Work

Class Boundaries      f          Relative Frequency   Percentage
15 to less than 18    7          .14                  14
18 to less than 21    7          .14                  14
21 to less than 24    23         .46                  46
24 to less than 27    9          .18                  18
27 to less than 30    3          .06                  6
30 to less than 33    1          .02                  2
                      Σf = 50    Sum = 1.00           Sum = 100%

Example 2-6
The administration in a large city wanted to know
the distribution of vehicles owned by households in
that city. A sample of 40 randomly selected
households from this city produced the following
data on the number of vehicles owned:
5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Construct a frequency distribution table for these
data, and draw a bar graph.

61
Solution 2-6
Table 2.13 Frequency Distribution of Vehicles Owned

Vehicles Owned   Number of Households (f)
0                2
1                18
2                11
3                4
4                3
5                2
                 Σf = 40

Figure 2.7 Bar graph for Table 2.13. (Horizontal axis: Vehicles owned, from no car to 5 cars; vertical axis: Frequency.)

SHAPES OF HISTOGRAMS
1. Symmetric
2. Skewed
3. Uniform or rectangular

Figure 2.8 Symmetric histograms.

Figure 2.9 (a) A histogram skewed to the right. (b) A histogram skewed to the left.

Figure 2.10 A histogram with uniform distribution.

Figure 2.11 (a) and (b) Symmetric frequency curves. (c) Frequency curve skewed to the right. (d) Frequency curve skewed to the left.

CUMULATIVE FREQUENCY DISTRIBUTIONS
• Definition
  - A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Example 2-7
Using the frequency distribution of Table 2.10, reproduced below, prepare a cumulative frequency distribution for the home runs hit by Major League Baseball teams during the 2002 season.

Total Home Runs   f
124 – 145         6
146 – 167         13
168 – 189         4
190 – 211         4
212 – 233         3

Solution 2-7
Table 2.14 Cumulative Frequency Distribution of Home Runs by Baseball Teams

Class Limits   Class Boundaries            Cumulative Frequency
124 – 145      123.5 to less than 145.5    6
124 – 167      123.5 to less than 167.5    6 + 13 = 19
124 – 189      123.5 to less than 189.5    6 + 13 + 4 = 23
124 – 211      123.5 to less than 211.5    6 + 13 + 4 + 4 = 27
124 – 233      123.5 to less than 233.5    6 + 13 + 4 + 4 + 3 = 30

CUMULATIVE FREQUENCY DISTRIBUTIONS cont.
• Calculating Cumulative Relative Frequency and Cumulative Percentage

Cumulative relative frequency = (Cumulative frequency of a class) / (Total observations in the data set)

Cumulative percentage = (Cumulative relative frequency) ⋅ 100

Table 2.15 Cumulative Relative Frequency and Cumulative Percentage Distributions for Home Runs Hit by Baseball Teams

Class Limits   Cumulative Relative Frequency   Cumulative Percentage
124 – 145      6/30 = .200                     20.0
124 – 167      19/30 = .633                    63.3
124 – 189      23/30 = .767                    76.7
124 – 211      27/30 = .900                    90.0
124 – 233      30/30 = 1.00                    100.0

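Cumulative frequencies are running totals of the class frequencies, and the cumulative relative frequencies and percentages follow from the two formulas above. A minimal sketch using the frequencies of Table 2.10 (the variable names are illustrative):

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Tables 2.10 and 2.14.
upper_boundaries = [145.5, 167.5, 189.5, 211.5, 233.5]
freq = [6, 13, 4, 4, 3]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(freq))
n = cumulative[-1]                      # total observations in the data set

cum_relative = [c / n for c in cumulative]
cum_percentage = [100 * r for r in cum_relative]

for ub, c, r, p in zip(upper_boundaries, cumulative, cum_relative, cum_percentage):
    print(f"less than {ub}: cumulative f = {c}, "
          f"relative = {r:.3f}, percentage = {p:.1f}")
```

The pairs (upper boundary, cumulative frequency) produced here are the points that an ogive joins with straight lines.
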
• Definition
  - An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes.

Figure 2.12 Ogive for the cumulative frequency distribution in Table 2.14. (Horizontal axis: Total home runs, marked at the class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, and 233.5; vertical axis: Cumulative frequency, up to 30.)

STEM-AND-LEAF DISPLAYS
• Definition
  - In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in a display.

Example 2-8
The following are the scores of 30 college students on a statistics test:
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98

Construct a stem-and-leaf display.

Solution 2-8
To construct a stem-and-leaf display for these scores, we split each score into two parts. The first part contains the first digit, which is called the stem. The second part contains the second digit, which is called the leaf.

We observe from the data that the stems for all scores are 5, 6, 7, 8, and 9 because all the scores lie in the range 50 to 98.

Figure 2.13 Stem-and-leaf display.

Stems
5 | 2    (leaf for 52)
6 |
7 | 5    (leaf for 75)
8 |
9 |

After we have listed the stems, we read the leaves for all scores and record them next to the corresponding stems on the right side of the vertical line.

Figure 2.14 Stem-and-leaf display of test scores.

5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8

Figure 2.15 Ranked stem-and-leaf display of test scores.

5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8

Example 2-9
The following data are monthly rents paid by a sample of 30 households selected from a small city.
880 1081 721 1075 1023 775 1235 750 965 960
1210 985 1231 932 850 825 1000 915 1191 1035
1151 630 1175 952 1100 1140 750 1140 1370 1280

Construct a stem-and-leaf display for these data.

Solution 2-9
Figure 2.16 Stem-and-leaf display of rents.

 6 | 30
 7 | 75 50 21 50
 8 | 80 25 50
 9 | 32 52 15 60 85 65
10 | 23 81 35 75 00
11 | 91 51 40 75 40 00
12 | 10 31 35 80
13 | 70

Example 2-10
The following stem-and-leaf display is prepared for the number of hours that 25 students spent working on computers during the last month.

0 | 6
1 | 1 7 9
2 | 2 6
3 | 2 4 7 8
4 | 1 5 6 9 9
5 | 3 6 8
6 | 2 4 4 5 7
7 |
8 | 5 6

Prepare a new stem-and-leaf display by grouping the stems.

Solution 2-10

Figure 2.17 Grouped stem-and-leaf display.

0–2 | 6 * 1 7 9 * 2 6
3–5 | 2 4 7 8 * 1 5 6 9 9 * 3 6 8
6–8 | 2 4 4 5 7 * * 5 6
