
Republic of the Philippines

DEPARTMENT OF SCIENCE AND TECHNOLOGY


PHILIPPINE SCIENCE HIGH SCHOOL
Cordillera Administrative Region Campus
Purok 12, Upper Irisan, Baguio City email: pshs_carc@yahoo.com.ph
Name:
Section:
R Worksheet 3
WORKSHEET ON GROUPED QUANTITATIVE TABULAR PRESENTATION OF DATA USING R
In our two previous meetings, we learned how to set up grouped quantitative and qualitative frequency
distributions manually. This time, we are going to use R to set up a grouped quantitative frequency distribution.
For our example, we will use Example 2-1 again from Bluman.
Suppose a researcher wished to do a study on the ages of the top 50 wealthiest people in the world. The
researcher first would have to get the data on the ages of the people. In this case, these ages are listed
in Forbes Magazine.
49 57 38 73 81
74 59 76 65 69
54 56 69 68 78
65 85 49 69 61
48 81 68 37 43
78 82 43 64 67
52 56 81 77 79
85 40 85 59 80
60 71 57 61 69
61 83 90 87 74
In this worksheet, you are going to learn how to import data from text files using the scan() function.
Preparing and importing the data
First, open Notepad and type the data points column by column, from the first column to the last. Press the Enter key after every entry.
Save the text file as wealthy.txt on your Desktop. Now, go to R and change your working directory to the Desktop.
We shall import the data and store it in a variable named wealthy in R using the following script.
wealthy = scan("wealthy.txt")
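To try the import without creating the file by hand, the sketch below writes the ages to a temporary file and reads them back with scan(). The temporary file is only a stand-in for wealthy.txt, and the names ages and path are assumptions made for this illustration.

```r
# The 50 ages, entered row by row exactly as printed in the table above.
ages = matrix(c(
  49, 57, 38, 73, 81,
  74, 59, 76, 65, 69,
  54, 56, 69, 68, 78,
  65, 85, 49, 69, 61,
  48, 81, 68, 37, 43,
  78, 82, 43, 64, 67,
  52, 56, 81, 77, 79,
  85, 40, 85, 59, 80,
  60, 71, 57, 61, 69,
  61, 83, 90, 87, 74
), ncol = 5, byrow = TRUE)

# Stand-in for the Notepad file (an assumption): one value per line,
# written column by column. as.vector() reads a matrix column-wise.
path = tempfile(fileext = ".txt")
writeLines(as.character(as.vector(ages)), path)

# scan() reads the whitespace-separated numbers into a numeric vector.
wealthy = scan(path)
length(wealthy)  # number of values read
head(wealthy)    # the first few values, i.e. the top of the first column
```

If length(wealthy) is not 50, an entry was probably skipped or typed twice.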
We shall follow the steps in the book for setting up the frequency distribution, and use R for executing each step.
We use an ideal number of classes of 7 as with the book example.
Step 1: Determine the classes
1. First, we determine the highest value and lowest value in our data set.
range(wealthy)
## [1] 37 90
Alternatively, we can use the min() and max() functions to determine the lowest value and highest value, respectively.
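A minimal sketch of the alternative, using a short stand-in vector x (an assumption for illustration; it is not the full data set, although it contains the same lowest and highest values):

```r
# Stand-in values, not the full data set.
x = c(49, 57, 38, 73, 81, 37, 90)

min(x)          # lowest value: 37
max(x)          # highest value: 90
diff(range(x))  # highest minus lowest: 53
```

Here diff(range(x)) gives the range as a single number, which is the quantity needed for the class width.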
2. From the previous step, we can determine the class width by dividing the range by the ideal number of classes.
(90 - 37)/7
## [1] 7.571
We round this off to the nearest odd number, 7, which becomes our class width.
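The rounding can also be done in R. The sketch below is one possible way, not the book's method; the names lowest, highest, n.classes, raw.width and width are assumptions made for this illustration.

```r
lowest = 37
highest = 90
n.classes = 7   # ideal number of classes

raw.width = (highest - lowest) / n.classes

# Nearest odd number: every value from 2k up to (but not including)
# 2k + 2 is closest to the odd number 2k + 1.
width = 2 * floor(raw.width / 2) + 1
width
```

An exact even value such as 8 is equally far from 7 and 9; this formula rounds it up to 9.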
3. We shall now compute the class boundaries. In our manual computation of the frequency distribution, we
determined the lower class limits first before we determined the class boundaries. The largest multiple of
7 that is less than 37, the lowest value in our data set, is 35. The smallest multiple of 7 that is larger than
90, the highest value in our data set, is 91. The lower class limits therefore run from 35 to 84, the last class
is 84-90, and the class boundaries run from 34.5 to 90.5. (Always remember that a class boundary is one
decimal place more precise than the class limits.)
We use the seq() function to generate a sequence of numbers starting at 34.5 and going no higher than 91.5,
with a common difference of 7, and store the values in the object Breaks. Since 91.5 is not itself a term of
this sequence, the last value generated is 90.5.
Breaks = seq(34.5, 91.5, by = 7)
Would you like to see the class boundaries? What should you type in the R console to see them? ____________________
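As a check on the boundaries, we can recover the class limits from them. This is only a sketch; the names n.classes, lower.limits and upper.limits are assumptions made for this illustration.

```r
Breaks = seq(34.5, 91.5, by = 7)

# Each class lies between two consecutive boundaries.
n.classes = length(Breaks) - 1

# A class limit is 0.5 away from its boundary.
lower.limits = head(Breaks, -1) + 0.5   # drop the last boundary
upper.limits = tail(Breaks, -1) - 0.5   # drop the first boundary

n.classes
paste(lower.limits, upper.limits, sep = "-")
```

The first class should be 35-41 and the last 84-90, matching the manual computation.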
Tallying
We shall now tell R to determine the class to which each data point belongs. This is the same as tallying
in our manual computations.
wealthy.cut = cut(wealthy, breaks = Breaks, right = FALSE)
The cut() function cuts the data set into intervals, using the values in Breaks as the break points. Setting the right
parameter to FALSE makes each interval closed on the left and open on the right. This means that whenever a data point falls exactly
on a boundary, it belongs to the next (higher) class. This is not much of a concern to us since our data points are
integers and none of the boundaries is an integer. Type wealthy.cut to see the outcome of the script.
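To see what the right parameter does, consider a small made-up example in which the data points do fall exactly on the boundaries. The vectors x and b are hypothetical and are not part of the worksheet data.

```r
x = c(5, 10)           # hypothetical data points lying exactly on boundaries
b = c(0, 5, 10, 15)    # hypothetical boundaries

# right = FALSE: intervals look like [a,b), so a boundary value
# goes to the class that starts at it.
left.closed = cut(x, breaks = b, right = FALSE)
left.closed    # 5 falls in [5,10) and 10 falls in [10,15)

# right = TRUE (the default): intervals look like (a,b], so a boundary
# value goes to the class that ends at it.
right.closed = cut(x, breaks = b, right = TRUE)
right.closed   # 5 falls in (0,5] and 10 falls in (5,10]
```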
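Note that a semicolon (;) allows several commands to be written on one line; R executes them from left to right.
For example, adding ; wealthy.cut to the end of the script above would display the result right after computing it.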
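A small illustration of the semicolon, using a made-up vector (the name ages and its values are assumptions for this example only):

```r
# Two commands on one line: the assignment runs first,
# then the sorted values are printed.
ages = c(61, 37, 90); sort(ages)
```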
Write the frequencies
We are now ready to tabulate our results. R has the table() function for this.
wealthy.freq = table(wealthy.cut)
wealthy.freq
## wealthy.cut
## [34.5,41.5) [41.5,48.5) [48.5,55.5) [55.5,62.5) [62.5,69.5) [69.5,76.5)
##           3           3           4          10          10           5
## [76.5,83.5) [83.5,90.5)
##          10           5
This display is hard to read because we are used to seeing a frequency distribution arranged in columns, with one
class per row. We can get that layout with the cbind() function. You may think of cbind as binding in a column. What do you think
is the function for displaying the results in row format?
cbind(wealthy.freq)
##             wealthy.freq
## [34.5,41.5)            3
## [41.5,48.5)            3
## [48.5,55.5)            4
## [55.5,62.5)           10
## [62.5,69.5)           10
## [69.5,76.5)            5
## [76.5,83.5)           10
## [83.5,90.5)            5
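The steps of this worksheet can be collected into one script, sketched below. To keep it self-contained, it assumes the ages are typed directly with c() instead of being read from wealthy.txt; the order of the values does not affect the frequencies.

```r
# The 50 ages, typed in directly (an assumption made so that the
# script runs without wealthy.txt).
wealthy = c(49, 57, 38, 73, 81,
            74, 59, 76, 65, 69,
            54, 56, 69, 68, 78,
            65, 85, 49, 69, 61,
            48, 81, 68, 37, 43,
            78, 82, 43, 64, 67,
            52, 56, 81, 77, 79,
            85, 40, 85, 59, 80,
            60, 71, 57, 61, 69,
            61, 83, 90, 87, 74)

# Step 1: class boundaries with a class width of 7.
Breaks = seq(34.5, 91.5, by = 7)

# Tallying: assign each data point to its class.
wealthy.cut = cut(wealthy, breaks = Breaks, right = FALSE)

# Write the frequencies and display them in a column.
wealthy.freq = table(wealthy.cut)
cbind(wealthy.freq)

# Check: the frequencies should add up to the number of data points.
sum(wealthy.freq)
```

If the sum is not 50, some data points fell outside the boundaries or were mistyped.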