
Commission on Higher Education

in collaboration with the Philippine Normal University

TEACHING GUIDE FOR SENIOR HIGH SCHOOL

Statistics and Probability


CORE SUBJECT

This Teaching Guide was collaboratively developed and reviewed by educators from public
and private schools, colleges, and universities. We encourage teachers and other education
stakeholders to email their feedback, comments, and recommendations to the Commission on
Higher Education, K to 12 Transition Program Management Unit - Senior High School
Support Team at k12@ched.gov.ph. We value your feedback and recommendations.

INITIAL RELEASE: 13 JUNE 2016


Published by the Commission on Higher Education, 2016
Chairperson: Patricia B. Licuanan, Ph.D.

Commission on Higher Education


K to 12 Transition Program Management Unit
Office Address: 4th Floor, Commission on Higher Education,
C.P. Garcia Ave., Diliman, Quezon City
Telefax: (02) 441-1143 / E-mail Address: k12@ched.gov.ph

DEVELOPMENT TEAM
Team Leader: Jose Ramon G. Albert, Ph.D.
Writers:
Zita VJ Albacea, Ph.D., Mark John V. Ayaay
Isidoro P. David, Ph.D., Imelda E. de Mesa
Technical Editors:
Nancy A. Tandang, Ph.D., Roselle V. Collado
Copy Reader: Rea Uy-Epistola
Illustrator: Michael Rey O. Santos
Cover Artists: Paolo Kurtis N. Tan, Renan U. Ortiz

CONSULTANTS
THIS PROJECT WAS DEVELOPED WITH THE PHILIPPINE NORMAL UNIVERSITY.
University President: Ester B. Ogena, Ph.D.
VP for Academics: Ma. Antoinette C. Montealegre, Ph.D.
VP for University Relations & Advancement: Rosemarievic V. Diaz, Ph.D.

Ma. Cynthia Rose B. Bautista, Ph.D., CHED
Bienvenido F. Nebres, S.J., Ph.D., Ateneo de Manila University
Carmela C. Oracion, Ph.D., Ateneo de Manila University
Minella C. Alarcon, Ph.D., CHED
Gareth Price, Sheffield Hallam University
Stuart Bevins, Ph.D., Sheffield Hallam University

SENIOR HIGH SCHOOL SUPPORT TEAM
CHED K TO 12 TRANSITION PROGRAM MANAGEMENT UNIT
Program Director: Karol Mark R. Yee
Lead for Senior High School Support: Gerson M. Abesamis
Lead for Policy Advocacy and Communications: Averill M. Pizarro
Course Development Officers:
John Carlo P. Fernando, Danie Son D. Gonzalvo
Teacher Training Officers:
Ma. Theresa C. Carlos, Mylene E. Dones
Monitoring and Evaluation Officer: Robert Adrian N. Daulat
Administrative Officers: Ma. Leana Paula B. Bato,
Kevin Ross D. Nera, Allison A. Danao, Ayhen Loisse B. Dalena

This Teaching Guide by the Commission on Higher Education is licensed under a Creative
Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means you are
free to:

Share: copy and redistribute the material in any medium or format
Adapt: remix, transform, and build upon the material.

The licensor, CHED, cannot revoke these freedoms as long as you follow the license terms.
However, under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if
changes were made. You may do so in any reasonable manner, but not in any way that suggests
the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your
contributions under the same license as the original.

Printed in the Philippines by EC-TEC Commercial, No. 32 St. Louis Compound 7, Baesa,
Quezon City, ectec_com@yahoo.com

Introduction
As the Commission supports DepEd's implementation of Senior High School (SHS), it upholds the vision
and mission of the K to 12 program, stated in Section 2 of Republic Act 10533, or the Enhanced Basic
Education Act of 2013, that "every graduate of basic education be an empowered individual, through a
program rooted on... the competence to engage in work and be productive, the ability to coexist in fruitful
harmony with local and global communities, the capability to engage in creative and critical thinking,
and the capacity and willingness to transform others and oneself."

To accomplish this, the Commission partnered with the Philippine Normal University (PNU), the
National Center for Teacher Education, to develop Teaching Guides for Courses of SHS. Together with
PNU, this Teaching Guide was studied and reviewed by education and pedagogy experts, and was
enhanced with appropriate methodologies and strategies.

Furthermore, the Commission believes that teachers are the most important partners in attaining this
goal. Incorporated in this Teaching Guide is a framework that will guide them in creating lessons and
assessment tools, support them in facilitating activities and questions, and assist them towards deeper
content areas and competencies. Thus, the introduction of the SHS for SHS Framework.

The SHS for SHS Framework


The SHS for SHS Framework, which stands for Saysay-Husay-Sarili for Senior High School, is at the
core of this book. The lessons, which combine high-quality content with flexible elements to
accommodate diversity of teachers and environments, promote these three fundamental concepts:

SAYSAY: MEANING
Why is this important?
Through this Teaching Guide, teachers will be able to facilitate an understanding of the value of
the lessons, for each learner to fully engage in the content on both the cognitive and affective
levels.

HUSAY: MASTERY
How will I deeply understand this?
Given that developing mastery goes beyond memorization, teachers should also aim for deep
understanding of the subject matter where they lead learners to analyze and synthesize knowledge.

SARILI: OWNERSHIP
What can I do with this?
When teachers empower learners to take ownership of their learning, they develop independence
and self-direction, learning about both the subject matter and themselves.

The Parts of the Teaching Guide
This Teaching Guide is mapped and aligned to the DepEd SHS Curriculum, designed to be highly
usable for teachers. It contains classroom activities and pedagogical notes, and is integrated with
innovative pedagogies. All of these elements are presented in the following parts:

1. INTRODUCTION
Highlight key concepts and identify the essential questions
Show the big picture
Connect and/or review prerequisite knowledge
Clearly communicate learning competencies and objectives
Motivate through applications and connections to real-life

2. INSTRUCTION/DELIVERY
Give a demonstration/lecture/simulation/hands-on activity
Show step-by-step solutions to sample problems
Use multimedia and other creative tools
Give applications of the theory
Connect to a real-life problem if applicable

3. PRACTICE
Discuss worked-out examples
Provide easy-medium-hard questions
Give time for hands-on unguided classroom work and discovery
Use formative assessment to give feedback

4. ENRICHMENT
Provide additional examples and applications
Introduce extensions or generalisations of concepts
Engage in reflection questions
Encourage analysis through higher order thinking prompts

5. EVALUATION
Supply a diverse question bank for written work and exercises
Provide alternative formats for student work: written homework, journal, portfolio,
group/individual projects, student-directed research project

Pedagogical Notes
The teacher should strive to keep a good balance between conceptual understanding and facility in
skills and techniques. Teachers are advised to be conscious of the content and performance
standards and of the suggested time frame for each lesson, but flexibility in the management of
the lessons is possible. Interruptions in the class schedule, or students' poor reception or difficulty
with a particular lesson, may require a teacher to extend a particular presentation or discussion.

Computations in some topics may be facilitated by the use of calculators. This is encouraged;
however, it is important that the student understands the concepts and processes involved in the
calculation. Exams for the Basic Calculus course may be designed so that calculators are not
necessary.

Because senior high school is a transition period for students, the latter must also be prepared for
college-level academic rigor. Some topics in calculus require much more rigor and precision than
topics encountered in previous mathematics courses, and treatment of the material may be
different from teaching more elementary courses. The teacher is urged to be patient and careful in
presenting and developing the topics. To avoid too much technical discussion, some ideas can be
introduced intuitively and informally, without sacrificing rigor and correctness.

The teacher is encouraged to study the guide very well, work through the examples, and solve
exercises, well in advance of the lesson. The development of calculus is one of humankind's
greatest achievements. With patience, motivation and discipline, teaching and learning calculus
effectively can be realized by anyone. The teaching guide aims to be a valuable resource in this
objective.

On DepEd Functional Skills and CHED's College Readiness Standards
As Higher Education Institutions (HEIs) welcome the graduates of the Senior High School program, it is
of paramount importance to align Functional Skills set by DepEd with the College Readiness Standards
stated by CHED.

The DepEd articulated a set of 21st century skills that should be embedded in the SHS curriculum across
various subjects and tracks. These skills are desired outcomes that K to 12 graduates should possess in
order to proceed to either higher education, employment, entrepreneurship, or middle-level skills
development.

On the other hand, the Commission declared the College Readiness Standards that consist of the
combination of knowledge, skills, and reflective thinking necessary to participate and succeed - without
remediation - in entry-level undergraduate courses in college.

The alignment of both standards, shown below and also presented in this Teaching Guide, prepares
Senior High School graduates for the revised college curriculum, which will initially be implemented by
AY 2018-2019.

College Readiness Standards Foundational Skills and the corresponding DepEd Functional Skills

College Readiness Standard: Produce all forms of texts (written, oral, visual, digital) based on:
1. Solid grounding on Philippine experience and culture;
2. An understanding of the self, community, and nation;
3. Application of critical and creative thinking and doing processes;
4. Competency in formulating ideas/arguments logically, scientifically, and creatively; and
5. Clear appreciation of one's responsibility as a citizen of a multicultural Philippines and a
diverse world;
DepEd Functional Skills: Visual and information literacies; Media literacy; Critical thinking and
problem solving skills; Creativity; Initiative and self-direction

College Readiness Standard: Systematically apply knowledge, understanding, theory, and skills
for the development of the self, local, and global communities using prior learning, inquiry, and
experimentation
DepEd Functional Skills: Global awareness; Scientific and economic literacy; Curiosity; Critical
thinking and problem solving skills; Risk taking; Flexibility and adaptability; Initiative and
self-direction

College Readiness Standard: Work comfortably with relevant technologies and develop
adaptations and innovations for significant use in local and global communities;
DepEd Functional Skills: Global awareness; Media literacy; Technological literacy; Creativity;
Flexibility and adaptability; Productivity and accountability

College Readiness Standard: Communicate with local and global communities with proficiency,
orally, in writing, and through new technologies of communication;
DepEd Functional Skills: Global awareness; Multicultural literacy; Collaboration and
interpersonal skills; Social and cross-cultural skills; Leadership and responsibility

College Readiness Standard: Interact meaningfully in a social setting and contribute to the
fulfilment of individual and shared goals, respecting the fundamental humanity of all persons
and the diversity of groups and communities
DepEd Functional Skills: Media literacy; Multicultural literacy; Global awareness; Collaboration
and interpersonal skills; Social and cross-cultural skills; Leadership and responsibility; Ethical,
moral, and spiritual values

Preface
Prior to the implementation of K-12, Statistics was taught in public high schools in the Philippines
typically in the last quarter of third year. In private schools, Statistics was taught as either an elective,
or a required but separate subject outside of regular Math classes. In college, Statistics was taught
practically to everyone either as a three-unit or six-unit course. All college students had to take at least
three to six units of a Math course, and would typically endure a Statistics course to graduate.
Teachers who taught these Statistics classes, whether in high school or in college, would typically be
Math teachers, who may not necessarily have had formal training in Statistics. They were selected out
of the understanding (or misunderstanding) that Statistics is Math. Statistics does depend on and use a
lot of Math, but so do many disciplines, e.g. engineering, physics, accounting, chemistry, computer
science. But Statistics is not Math, not even a branch of Math. Hardly would one think that accounting
is a branch of mathematics simply because it does a lot of calculations. An accountant would also not
describe himself as a mathematician.

Math largely involves a deterministic way of thinking and the way Math is taught in schools leads
learners into a deterministic way of examining the world around them. Statistics, on the other hand, is
by and large dealing with uncertainty. Statistics uses inductive thinking (from specifics to generalities),
while Math uses deduction (from the general to the specific).

"Statistics has its own tools and ways of thinking, and statisticians are quite insistent that
those of us who teach mathematics realize that statistics is not mathematics, nor is it even a
branch of mathematics. In fact, statistics is a separate discipline with its own unique ways of
thinking and its own tools for approaching problems." - J. Michael Shaughnessy, "Research on
Students' Understanding of Some Big Concepts in Statistics" (2006)

Statistics deals with data; its importance has been recognized by governments, by the private sector,
and across disciplines because of the need for evidence-based decision making. It has become even more
important in the past few years, now that more and more data is being collected, stored, analyzed and
re-analyzed. From the time when humanity first walked the face of the earth until 2003, we created as
much as 5 exabytes of data (1 exabyte being a billion gigabytes). Information communications
technology (ICT) tools have provided us the means to transmit and exchange data much faster, whether
these data are in the form of sound, text, visual images, signals or any other form or any combination of
those forms, using desktops, laptops, tablets, mobile phones, and other gadgets connected to the
internet and social media (e.g., Facebook, Twitter). With the data deluge arising from using ICT tools, as of 2012,
as much as 5 exabytes were being created every two days (the amount of data created from the
beginning of history up to 2003); a year later, this same amount of data was now being created every ten
minutes.
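
The scale of this change is easier to appreciate with a little arithmetic. The short Python sketch below is only an illustration: it takes the figures quoted above at face value (5 exabytes every two days, then every ten minutes) and the variable names are our own.

```python
# Figures quoted in the text: 5 exabytes created every 2 days (as of 2012),
# then the same amount created every 10 minutes (a year later).
EXABYTE_IN_GIGABYTES = 1_000_000_000  # "a billion gigabytes", as stated above

minutes_in_two_days = 2 * 24 * 60          # 2880 minutes
speedup = minutes_in_two_days / 10         # how many times faster the later rate is

exabytes_per_day_2012 = 5 / 2
exabytes_per_day_later = 5 * (24 * 60 / 10)
gigabytes_per_day_2012 = exabytes_per_day_2012 * EXABYTE_IN_GIGABYTES

print(speedup)                  # 288.0
print(exabytes_per_day_2012)    # 2.5
print(exabytes_per_day_later)   # 720.0
```

In other words, under these quoted figures the rate of data creation grew by a factor of 288 within about a year.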
In order to make sense of data, which typically involve variation and uncertainty, we need the science
of Statistics to enable us to summarize data for describing or explaining phenomena, or to make
predictions (assuming trends in the data continue). Statistics is the science that studies data, and what
we can do with data. Teachers of Statistics and Probability can easily spend much time on the formal
methods and computations, losing sight of the real applications, and taking the excitement out of things.
The eminent statistician Bradley Efron mentioned how diverse statistical applications are:

During the 20th Century statistical thinking and methodology has become the scientific
framework for literally dozens of fields including education, agriculture, economics, biology, and
medicine, and with increasing influence recently on the hard sciences such as astronomy,
geology, and physics. In other words, we have grown from a small obscure field into a big obscure
field.

In consequence, the work of a statistician has even become fashionable. Google's chief economist Hal
Varian wrote in 2009 that "the sexy job in the next ten years will be statisticians." He went on to
say that "The ability to take data - to be able to understand it, to process it, to extract value from
it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades, not only at
the professional level but even at the educational level for elementary school kids, for high school kids,
for college kids."

This teaching guide, prepared by a team of professional statisticians and educators, aims to assist
Senior High School teachers of the Grade 11 second semester course in Statistics and Probability so that
they can help Senior High School students discover the fun in describing data, and in exploring the
stories behind the data. The K-12 curriculum provides for concepts in Statistics and Probability to be
taught from Grade 1 up to Grade 8, and in Grade 10, but the depth at which learners absorb these
concepts may need reinforcement. Thus, the first chapter of this guide discusses basic tools (such as
summary measures and graphs) for describing data. While Probability may have been discussed prior to
Grade 11, it is also discussed in Chapter 2, as a prelude to defining Random Variables and their
Distributions. The next chapter discusses Sampling and Sampling Distributions, which bridges
Descriptive Statistics and Inferential Statistics. The latter is started in Chapter 4, in Estimation, and
further discussed in Chapter 5 (which deals with Tests of Hypothesis). The final chapter discusses
Regression and Correlation.

Although Statistics and Probability may be tangential to the primary training of many if not all Senior
High School teachers of Statistics and Probability, it will be of benefit for them to see why this course is
important to teach. After all, if the teachers themselves do not find meaning in the course, neither will
the students. Work developing this set of teaching materials has been supported by the Commission on
Higher Education under a Materials Development Sub-project of the K-12 Transition Project. These
materials will also be shared with the Department of Education.

Writers of this teaching guide recognize that few Senior High School teachers would have formal
training or applied experience with statistical concepts. Thus, the guide gives concrete suggestions on
classroom activities that can illustrate the wide range of processes behind data collection and data
analysis.
It would be ideal to use technology (i.e. computers) as a means to help teachers and students with
computations; hence, the guide also provides suggestions in case the class may have access to a
computer room (particularly the use of spreadsheet applications like Microsoft Excel). It would be
unproductive for teachers and students to spend too much time working on formulas, and checking
computation errors at the expense of gaining knowledge and insights about the concepts behind the
formulas.

The guide gives a mixture of lectures and activities (the latter include actual collection and analysis of
data). It tries to follow suggestions of the Guidelines for Assessment and Instruction in Statistics
Education (GAISE) Project of the American Statistical Association to go beyond lecture methods, and
instead exercise conceptual learning, use active learning strategies and focus on real data. The guide
suggests what material is optional, as there is a lot of material that could be taught but too little
time. Teachers will have to find a way of recognizing the diverse needs of students with variable
abilities and interests.

This teaching guide for Statistics and Probability, to be made available both digitally and in print to
senior high school teachers, shall provide Senior High School teachers of Statistics and Probability with
much-needed support as the country's basic education system transitions into the K-12 curriculum. It is
earnestly hoped that Senior High School teachers of Grade 11 Statistics and Probability can direct
students into examining the context of data and identifying the consequences and implications of the stories
behind Statistics and Probability, so that students become critical consumers of information. It is further hoped
that the competencies gained by students in this course will help them become more statistically literate
and better prepared for whatever employment choices (and higher education specializations) they pursue, given that
employers are recognizing the importance of having employees skilled in data management
and analysis in this very data-centric world.
Chapter 1: Exploring Data

Lesson 1: Introducing Statistics

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
In decision making, we use statistics although some of us may not be aware of it. In this
lesson, we make the students realize that to decide logically, they need to use statistics. An
inquiry could be answered or a problem could be solved through the use of statistics. In fact,
without knowing it we use statistics in our daily activities.

LEARNING COMPETENCIES: At the end of the lesson, the learner should be able to
identify questions that could be answered using a statistical process and describe the activities
involved in a statistical process.

LESSON OUTLINE:
1. Motivation
2. Statistics as a Tool in Decision-Making
3. Statistical Process in Solving a Problem

DEVELOPMENT OF THE LESSON


A. Motivation

You may ask the students to share a question that is on their minds at that moment. You may write
their questions on the board. (Note: You may try to sort the questions into two groups as you write
them on the board: one group for questions that are answerable by a single fact, and the other
for questions that require more than one piece of information and need further thinking.)

The following are examples of what you could have written on the board:

Group 1:
How old is our teacher?
Is the vehicle of the Mayor of our city/town/municipality bigger than the vehicle used by
the President of the Philippines?
How many days are there in December?
Does the Principal of the school have a post graduate degree?
How much does the Barangay Captain receive as allowance?
What is the weight of my smallest classmate?
Group 2:
How old are the people residing in our town?

Do dogs eat more than cats?
Does it rain more in our country than in Thailand?
Do math teachers earn more than science teachers?
How many books do my classmates usually bring to school?
What is the proportion of Filipino children aged 0 to 5 years who are underweight or
overweight for their age?

The first group of questions could be answered by a piece of information which is considered
always true. There is a correct answer which is based on a fact, and you don't need the
process of inquiry to answer such a question. For example, there is one and only one
correct answer to the first question in Group 1, and that is your age as of your last birthday or
the number of years since your birth year.

On the other hand, in the second group of questions one needs observations or data to be able
to respond to the question. In some questions you need to get the observations or responses of
all those concerned to be able to answer the question. For the first question in the second
group, you need to ask all the people in the locality about their age, and from the values you
obtain you get a representative value. To answer the second question in the second group,
you would need the amount of food that all dogs and all cats eat.
However, we know that it is not feasible to do so. Thus, what you can do is get a representative
group of dogs and another representative group of cats. Then we measure the amount of
food each group of animals eats. From these two sets of values, we could then infer whether
dogs do eat more than cats.

So as you can see in the second group of questions you need more information or data to be
able to answer the question. Either you need to get observations from all those concerned or
you get representative groups from which you gather your data. But in both cases, you need
data to be able to respond to the question. Using data to find an answer or a solution to a
problem or an inquiry is actually using the statistical process or doing it with statistics.
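
For teachers who have access to a computer, the two approaches just described (asking everyone versus asking a representative group) can be illustrated with a short simulation. The Python sketch below is only an illustration under stated assumptions: the town of 10,000 residents and their ages are made up, the group is chosen at random, and the median is used as one possible "representative value".

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical town: 10,000 made-up ages between 0 and 90 (not real data).
town_ages = [random.randint(0, 90) for _ in range(10_000)]

# Approach 1: ask everyone and take a representative value.
census_median = statistics.median(town_ages)

# Approach 2: ask only a randomly chosen group of 200 residents.
sample_ages = random.sample(town_ages, 200)
sample_median = statistics.median(sample_ages)

print("Median age, all residents:", census_median)
print("Median age, group of 200: ", sample_median)
```

The two printed values will usually be close but not identical, which is a useful opening for discussing why a representative group can stand in for everyone.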

Now, let us formalize what we discussed and know more about statistics and how we use it in
decision-making.

B. Main Lesson

1. Statistics as a Tool in Decision-Making

Statistics is defined as a science that studies data to be able to make a decision. Hence, it is a
tool in the decision-making process. Mention that Statistics as a science involves the methods of
collecting, processing, summarizing and analyzing data in order to provide answers or
solutions to an inquiry. One also needs to interpret and communicate the results of the
methods identified above to support a decision that one makes when faced with a problem or
an inquiry.
Trivia: The word "statistics" actually comes from the word "state" because governments
have been involved in statistical activities, especially the conduct of censuses for either
military or taxation purposes. The need for and conduct of censuses are recorded in the
pages of holy texts. In the Christian Bible, particularly the Book of Numbers, God is
reported to have instructed Moses to carry out a census. Another census mentioned in the
Bible is the census ordered by Caesar Augustus throughout the entire Roman Empire before
the birth of Christ.
Inform students that uncovering patterns in data involves not just science but also art,
and this is why some people may think "Stat is eeeks!" and may view any statistical
procedures and results with much skepticism. (See Figure 1-1.)

Make known to students that Statistics enables us to


characterize persons, objects, situations, and phenomena;
explain relationships among variables;
formulate objective assessments and comparisons; and, more importantly
make evidence-based decisions and predictions.
To use Statistics in decision-making, there is a statistical process to follow, which is
discussed in the next section.

2. Statistical Process in Solving a Problem


You may go back to one of the questions identified in the second group and use it to discuss
the components of a statistical process. As an illustration, let us discuss how we
could answer the question "Do dogs eat more than cats?"
As discussed earlier, this question requires you to gather data to generate statistics which will
serve as the basis for answering the query. There should be a plan or a design on how to collect the
data so that the information we get from it is sufficient for us to minimize any bias
in responding to the query. In relation to the query, we said earlier that we cannot gather the
data from all dogs and cats. Hence, the plan is to get a representative group of dogs and another
representative group of cats. These representative groups are observed for some
characteristics like the animal's weight, the amount of food in grams eaten per day, and the breed of the
animal. Included in the plan are factors like how many dogs and cats are included in each
group, how to select those included in the representative groups, and when to observe these
animals for their characteristics.
After the data are gathered, we must verify the quality of the data to make a good decision.
Data quality check could be done as we process the data to summarize the information
extracted from the data. Then using this information, one can then make a decision or provide
answers to the problem or question at hand.
To summarize, a statistical process in making a decision or providing solutions to a problem
includes the following:

Planning or designing the collection of data to answer statistical questions in a way that
maximizes information content and minimizes bias;
Collecting the data as required in the plan;
Verifying the quality of the data after they were collected;
Summarizing the information extracted from the data; and
Examining the summary statistics so that insight and meaningful information can be
produced to support decision-making or solutions to the question or problem at hand.
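
The five activities above can be sketched in a few lines of Python for the question "Do dogs eat more than cats?". This is a minimal illustration, not a prescribed procedure: the readings are invented, the quality check (dropping missing or negative readings) is one simple assumed rule, and the mean is only one possible summary.

```python
import statistics

# Step 1 (plan): record grams of food eaten per day for a few dogs and cats.
# Step 2 (collect): the readings below are invented for illustration only.
dog_grams = [310, 295, None, 420, 350, -5]   # None = missing, -5 = recording error
cat_grams = [60, 75, 70, 65, 80, None]

def quality_check(readings):
    """Step 3 (verify): keep only readings that are present and positive."""
    return [r for r in readings if r is not None and r > 0]

dogs = quality_check(dog_grams)
cats = quality_check(cat_grams)

# Step 4 (summarize): one summary value per group.
dog_mean = statistics.mean(dogs)
cat_mean = statistics.mean(cats)

# Step 5 (examine): compare the summaries to support an answer.
print(f"Dogs: {dog_mean:.2f} g/day from {len(dogs)} usable readings")
print(f"Cats: {cat_mean:.2f} g/day from {len(cats)} usable readings")
print("Dogs eat more in this group:", dog_mean > cat_mean)
```

With these invented readings the dogs average 343.75 g/day and the cats 70 g/day; a real answer would of course depend on how the representative groups were chosen.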

Hence, several activities make up a statistical process. For some problems the process is simple,
but for others it might be a little complicated to implement. Also, not all questions or
problems can be answered by a simple statistical process; some
need a complex one. However, one can be assured that logical decisions or
solutions can be formulated using a statistical process.

KEY POINTS
Difference between questions that could be and those that could not be answered using
Statistics.
Statistics is a science that studies data.
There are many uses of Statistics but its main use is in decision-making.
Logical decisions or solutions to a problem could be attained through a statistical process.

REFERENCES

Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo
Patungan, Nelia Marquez), published by Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College, Laguna 4031
Workbooks in Statistics 1 (From 1st to 13th Edition), Authored by the Faculty of the Institute
of Statistics, UP Los Baños, College, Laguna 4031
https://www.illustrativemathematics.org/content-standards/tasks/703
http://www.cartoonstock.com

ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

1. Identify which of the following questions are answerable using a statistical process.
a. What is a typical size of a Filipino family? (answerable through a statistical process)
b. How many hours are there in a day? (not answerable through a statistical process)
c. How old is the oldest man residing in the Philippines? (answerable through a
statistical process)
d. Is planet Mars bigger than planet Earth? (not answerable through a statistical
process)
e. What is the average wage rate in the country? (answerable through a statistical
process)
f. Would Filipinos prefer eating bananas rather than apples? (answerable through a
statistical process)
g. How long did you sleep last night? (not answerable through a statistical process)

h. How much does a newly hired public school teacher in NCR earn in a month? (not
answerable through a statistical process)
i. How tall is a typical Filipino? (answerable through a statistical process)
j. Did you eat your breakfast today? (not answerable through a statistical process)

2. For each of the identified questions in Number 1 that are answerable using a statistical
process, describe the activities involved in the process.

For a. What is a typical size of a Filipino family? (The process includes getting a
representative group of Filipino families and asking each family head how many
members the family has. From the gathered data, which had undergone
a quality check, a typical value of the number of family members could be obtained.
Such a typical value represents a possible answer to the question.)

For c. How old is the oldest man residing in the Philippines? (The process includes getting
the ages of all residents of the country. From the gathered data, which had undergone
a quality check, the highest value of age could be obtained. Such a value is the answer to
the question.)

For e. What is the average wage rate in the country? (The process includes getting all
prevailing wage rates in the country. From the gathered data, which had undergone a
quality check, a typical value of the wage rate could be obtained. Such a value is the
answer to the question.)

For f. Would Filipinos prefer eating bananas rather than apples? (The process includes
getting a representative group of Filipinos and asking each one of them which fruit
he/she prefers, banana or apple. From the gathered data, which had undergone a
quality check, the proportion of those who prefer banana and the proportion of those
who prefer apple will be computed and compared. The results of this comparison
could provide a possible answer to the question.)

For i. How tall is a typical Filipino? (The process includes getting a representative group
of Filipinos and measuring the height of each member of the representative group.
From the gathered data, which had undergone a quality check, a typical value of the
height of a Filipino could be obtained. Such a typical value represents a possible answer
to the question.)

Note: Tell the students that getting a representative group and obtaining a typical value are
to be learned in subsequent lessons in this subject.
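
As an illustration of the process described for question (f) above, the comparison of
proportions could be sketched in Python as follows. The responses are made up for
illustration only (they are not from an actual survey), and the function name
preference_proportions is a hypothetical helper, not part of this guide.

```python
from collections import Counter

# Hypothetical responses from a representative group of ten Filipinos
# (made-up values for illustration only).
responses = ["banana", "apple", "banana", "banana", "apple",
             "banana", "apple", "banana", "banana", "banana"]

def preference_proportions(answers):
    """Return the proportion of respondents choosing each fruit."""
    counts = Counter(answers)
    total = len(answers)
    return {fruit: count / total for fruit, count in counts.items()}

proportions = preference_proportions(responses)
print(proportions)
```

Comparing the two proportions then suggests which fruit is preferred in the group; how far
such a result can be generalized depends on how the representative group was obtained.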

Chapter 1: Exploring Data

Lesson 2: Data Collection Activity

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
As we learned in the previous lesson, Statistics is a science that studies data. Hence, in
teaching Statistics, the use of real data sets is recommended. In this lesson, we present an
activity where the students will be asked to provide some data that will be submitted for
consolidation by the teacher for future lessons. Data on heights and weights, for instance,
will be used for calculating Body Mass Index in the integrative lesson. Students will also be
given the perspective that the data they provide are part of a bigger group of data, as the
same data will be asked from much larger groups (the entire class, all Grade 11 students in
school, all Grade 11 students in the district). The contextualization of data will also be
discussed.

LEARNING COMPETENCIES: At the end of the lesson, the learner should be able to:

Recognize the importance of providing correct information in a data collection activity;
Understand the issue of confidentiality of information in a data collection activity;
Participate in a data collection activity; and
Contextualize data.

LESSON OUTLINE:

1. Preliminaries in a Data Collection Activity
2. Performing a Data Collection Activity
3. Contextualization of Data

DEVELOPMENT OF THE LESSON

A. Preliminaries in a Data Collection Activity


Before the lesson, prepare a sheet of paper listing everyone's name in class with a Class
Student Number (see Attachment A for the suggested format). The class student number
is a random number chosen in the following fashion:
(a) Make a box with tickets (small pieces of paper of equal size) listing the numbers 1 up
to the number of students in the class.
(b) Shake the box, get a ticket, and assign the number on the ticket to the first person in the
list.
(c) Shake the box again, get another ticket, and assign the number on this ticket to the next
person in the list.
(d) Repeat (c) until you run out of tickets in the box.
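
For teachers who prefer to prepare the class list on a computer, the ticket-drawing steps
above can be mimicked with a random shuffle. The following is only a minimal sketch: the
function name assign_class_student_numbers, the placeholder names, and the seed value are
assumptions made for illustration.

```python
import random

def assign_class_student_numbers(names, seed=None):
    """Pair each name with a distinct number from 1 to len(names), drawn at random."""
    rng = random.Random(seed)
    tickets = list(range(1, len(names) + 1))  # step (a): one ticket per student
    rng.shuffle(tickets)                      # shaking the box
    return dict(zip(names, tickets))          # steps (b) to (d): draw until empty

# Placeholder names; replace with the actual class list.
class_list = ["Student A", "Student B", "Student C", "Student D", "Student E"]
numbers = assign_class_student_numbers(class_list, seed=2016)
for name, number in numbers.items():
    print(name, number)
```

Shuffling the tickets once and handing them out in list order is equivalent to drawing
tickets one at a time without putting them back in the box.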

At this point, all the students have their corresponding class student number written across
from their names in the prepared class list. Note that the class list is prepared before the
class starts.

At the start of the class, inform each student confidentially of his/her class student number.
Perhaps, when attendance is called, each student can be given a separate piece of paper that
lists his/her name and class student number. Tell students to remember their class student
number and to always use it throughout the semester whenever data are requested of them.
Explain to students that in a data collection activity, specific identities like their names are
not required, especially because people have a right to confidentiality, but there should be a
way to develop and maintain a database to check the quality of the data provided and, if
necessary, to verify with respondents the data that they provided.

These preliminary steps for generating a class student number and informing students
confidentially of their class student number are essential for the data collection activities to
be performed in this lesson and in other lessons, so that students can be uniquely identified
without having to obtain their names. Also inform the students that the class student numbers
they were given are meant to identify them, without revealing their specific identities, in the
class recording sheet (which will contain the consolidated records that everyone has
provided). This helps protect the confidentiality of information.

In statistical activities, facts are collected from respondents for the purpose of getting
aggregate information, but confidentiality should be protected. Mention that the agencies
mandated to collect data are bound by law to protect the confidentiality of information
provided by respondents. Market research organizations in the private sector and individual
researchers also guard confidentiality, as they merely want to obtain aggregate data. This way,
respondents can be truthful in giving information, and the researcher can give a commitment
to respondents that the data they provide will never be released to anyone in a form that will
identify them without their consent.

B. Performing a Data Collection Activity

Explain to the students that the purpose of this data collection activity is to gather data that
they could use in their future lessons in Statistics. It is important that they provide the
needed information to the best of their knowledge. Before they respond to the questionnaire
provided in Attachment B as the Student Information Sheet (SIS), each item in the SIS
should be clarified. The following are suggested clarifications for each item:

1. CLASS STUDENT NUMBER: This is the number that you provided confidentially to the
student at the start of the class.

2. SEX: This is the student's biological sex and not his/her preferred gender. Hence, the
student has to choose only one of the two choices by placing a check mark (✓) in the
space provided before the choices.

3. NUMBER OF SIBLINGS: This is the number of brothers and sisters that the student has
in his/her nuclear or immediate family. The count excludes the student. Thus, if the
student is the only child in the family, he/she will report zero as his/her number of
siblings.

4. WEIGHT (in kilograms): This refers to the student's weight based on the student's
knowledge. Note that the weight has to be reported in kilograms. In case the student
knows his/her weight in pounds, the value should be converted to kilograms by dividing
the weight in pounds by a conversion factor of 2.2 pounds per kilogram.

5. HEIGHT (in centimeters): This refers to the student's height based on the student's
knowledge. Note that the height has to be reported in centimeters. In case the student
knows his/her height in inches, the value should be converted to centimeters by
multiplying the height in inches by a conversion factor of 2.54 centimeters per inch.

6. AGE OF MOTHER (as of her last birthday in years): This refers to the age of the
student's mother in years as of her last birthday; thus, this number should be reported as a
whole number. In case the student's mother is dead or nowhere to be found, ask the
student to provide the age she would have if she were alive or around. You could help the
student determine his/her mother's age based on other information that the student could
provide, like the birth year of the mother or the student's age. Note also that zero is not
an acceptable value.

7. USUAL DAILY ALLOWANCE IN SCHOOL (in pesos): This refers to the usual
amount in pesos that the student is given when he/she goes to school on a weekday.
Note that the student can give zero as a response for this item in case he/she has no
monetary allowance per day.

8. USUAL DAILY FOOD EXPENDITURE IN SCHOOL (in pesos): This refers to the
usual amount in pesos that the student spends on food, including drinks, in school per day.
Note that the student can give zero as a response for this item in case he/she does not
spend on food in school.

9. USUAL NUMBER OF TEXT MESSAGES SENT IN A DAY: This refers to the usual
number of text messages that the student sends in a day. Note that the student can give
zero as a response for this item in case he/she does not have a gadget for sending text
messages or simply does not send text messages.

10. MOST PREFERRED COLOR: The student is to choose the color that he/she most prefers
among the given choices. Note that the student can choose only one. Hence, he/she has
to place a check mark (✓) in the space provided before the color he/she considers
his/her most preferred among those given.

11. USUAL SLEEPING TIME: This refers to the usual sleeping time at night during a
typical weekday or school day. Note that the time is to be reported using the 24-hour
clock, also known as military time (0:00 to 23:59 are the possible values).

12. HAPPINESS INDEX FOR THE DAY: The student has to respond on how he/she feels
at that time using codes from 1 to 10. Code 1 means that the student is very unhappy,
while Code 10 means that the student is very happy on the day when the data are being
collected.
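
The unit conversions described in items 4 and 5 above can be checked with a short
calculation. The sketch below uses the conversion factors stated in this lesson; the function
names and the sample inputs (132 pounds, 63 inches) are hypothetical and chosen only for
illustration.

```python
POUNDS_PER_KILOGRAM = 2.2      # approximate factor stated in item 4
CENTIMETERS_PER_INCH = 2.54    # factor stated in item 5

def pounds_to_kilograms(pounds):
    """Convert a weight in pounds to kilograms by dividing by 2.2."""
    return pounds / POUNDS_PER_KILOGRAM

def inches_to_centimeters(inches):
    """Convert a height in inches to centimeters by multiplying by 2.54."""
    return inches * CENTIMETERS_PER_INCH

# Hypothetical student who knows only his/her weight in pounds and height in inches.
weight_kg = pounds_to_kilograms(132)
height_cm = inches_to_centimeters(63)
print(round(weight_kg, 1), round(height_cm, 1))
```

Rounding to one decimal place is a presentation choice made here; the lesson does not
prescribe how many decimal places the students should report.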

After the clarification, give the students at most 10 minutes to respond to the
questionnaire. Ask the students to submit the completed SIS so that you could consolidate the
data gathered using the formatted worksheet file provided to you as Attachment C. Having the
data in an electronic file makes it easier for you to use them in future lessons. Be sure that
the students have provided the information for all items in the SIS.
Inform the students that you will compile all their responses, and that compiling these
records from everyone in the class is an example of a census, since data have been gathered
from every student in class. Mention that the government, through the Philippine Statistics
Authority
(PSA), conducts censuses to obtain information about socio-demographic characteristics of
the residents of the country. Census data are used by the government to make plans, such as
how many schools and hospitals to build. Censuses of population and housing are conducted
every 10 years on years ending in zero (e.g., 1990, 2000, 2010) to obtain population counts,
and demographic information about all Filipinos. Mid-decade population censuses have also
been conducted since 1995. Censuses of Agriculture, and of Philippine Business and
Industry, are also conducted by the PSA to obtain information on production and other
relevant economic information.
PSA is the government agency mandated to conduct censuses and surveys. Through Republic
Act 10625 (also referred to as The Philippine Statistical Act of 2013), PSA was created from
four former government statistical agencies, namely: National Statistics Office (NSO),
National Statistical Coordination Board (NSCB), Bureau of Labor and Employment
Statistics (BLES) and Bureau of Agricultural Statistics (BAS). The other agency created
through RA 10625 is the Philippine Statistical Research and Training Institute (PSRTI), which
is mandated as the research and training arm of the Philippine Statistical System. PSRTI
replaced its forerunner, the former Statistical Research and Training Center (SRTC).

C. Contextualization of Data
Ask students what comes to their minds when they hear the term "data" (which may be
viewed as a collection of facts from experiments, observations, sample surveys and
censuses, and administrative reporting systems).
Present to the students the following collection of numbers, figures, symbols, and words, and
ask them if they could consider the collection as data.
3, red, F, 156, 4, 65, 50, 25, 1, M, 9, 40, 68, blue, 78, 168, 69, 3, F, 6, 9, 45, 50, 20,
200, white, 2, pink, 160, 5, 60, 100, 15, 9, 8, 41, 65, black, 68, 165, 59, 7, 6, 35, 45
Although the collection is composed of numbers and symbols that could be classified as
numeric or non-numeric, the collection has no meaning or it is not contextualized, hence it
cannot be referred to as data.
Tell the students that data are facts and figures that are collected, presented, and
analyzed. Data are either numeric or non-numeric and must be contextualized. To
contextualize data, that is, to put meaning on the data, we must identify the following six
Ws of the data:
1. Who? Who provided the data?

2. What? What information was obtained from the respondents, and what is the unit of
measurement used for each item of information (if any)?

3. When? When was the data collected?

4. Where? Where was the data collected?

5. Why? Why was the data collected?

6. HoW? HoW was the data collected?

Let us take as an illustration the data that you have just collected from the students, and let
us put meaning on it, or contextualize it, by responding to the W-questions. It is
recommended that the students answer the W-questions so that they will learn how to do it.
1. Who? Who provided the data?

The students in this class provided the data.

2. What? What information was obtained from the respondents, and what is the unit of
measurement used for each item of information (if any)?

The information gathered includes Class Student Number, Sex, Number of Siblings,
Weight, Height, Age of Mother, Usual Daily Allowance in School, Usual Daily Food
Expenditure in School, Usual Number of Text Messages Sent in a Day, Most
Preferred Color, Usual Sleeping Time and Happiness Index for the Day.

The units of measurement for the information on Number of Siblings, Weight, Height,
Age of Mother, Usual Daily Allowance in School, Usual Daily Food Expenditure in
School, and Usual Number of Text Messages Sent in a Day are person, kilogram,
centimeter, year, pesos, pesos and message, respectively.

3. When? When was the data collected?

The data was collected on the first few days of classes for Statistics and Probability.

4. Where? Where was the data collected?

The data was collected inside our classroom.

5. Why? Why was the data collected?

As explained earlier, the data will be used in our future lessons in Statistics and
Probability.

6. HoW? HoW was the data collected?

The students provided the data by responding to the Student Information Sheet
prepared and distributed by the teacher for the data collection activity.

Once the data are contextualized, the collection of numbers and symbols takes on meaning.
It may now look like the following, which is just a small part of the data collected in the
earlier activity.

Class    Sex  Number of    Weight   Height   Age of      Usual daily  Usual daily       Usual number      Most       Usual     Happiness
Student       siblings     (in kg)  (in cm)  mother      allowance    food expenditure  of text messages  Preferred  Sleeping  Index for
Number        (in person)                    (in years)  in school    in school         sent in a day     Color      Time      the Day
                                                         (in pesos)   (in pesos)
1        M    2            60       156      60          200          150               20                RED        23:00     8
2        F    5            63       160      66          300          200               25                PINK       22:00     9
3        F    3            65       165      59          250          50                15                BLUE       20:00     7
4        M    1            55       160      55          200          100               30                BLACK      19:00     6
5        M    0            65       167      45          350          300               35                BLUE       20:00     8
:        :    :            :        :        :           :            :                 :                 :          :         :
:        :    :            :        :        :           :            :                 :                 :          :         :

KEY POINTS
Providing correct information in a government data collection activity is a responsibility of
every citizen in the country.
Data confidentiality is important in a data collection activity.
Census is collecting data from all possible respondents.
Data to be collected must be clarified before the actual data collection.
Data must be contextualized by answering six W-questions.

REFERENCES

Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo
Patungan, Nelia Marquez). Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition). Faculty of the Institute of Statistics,
UP Los Baños, College, Laguna 4031.
Workbooks in Statistics 1 (1st to 13th Edition). Faculty of the Institute of Statistics,
UP Los Baños, College, Laguna 4031.
https://www.khanacademy.org/math/probability/statistical-studies/statistical-questions/v/statistical-questions
https://www.illustrativemathematics.org/content-standards/tasks/703

ATTACHMENT A: CLASS LIST
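    STUDENT NAME          CLASS STUDENT NUMBER        STUDENT NAME          CLASS STUDENT NUMBER
 1. ____________________  ____________________    36. ____________________  ____________________
 2. ____________________  ____________________    37. ____________________  ____________________
 3. ____________________  ____________________    38. ____________________  ____________________
 4. ____________________  ____________________    39. ____________________  ____________________
 5. ____________________  ____________________    40. ____________________  ____________________
 6. ____________________  ____________________    41. ____________________  ____________________
 7. ____________________  ____________________    42. ____________________  ____________________
 8. ____________________  ____________________    43. ____________________  ____________________
 9. ____________________  ____________________    44. ____________________  ____________________
10. ____________________  ____________________    45. ____________________  ____________________
11. ____________________  ____________________    46. ____________________  ____________________
12. ____________________  ____________________    47. ____________________  ____________________
13. ____________________  ____________________    48. ____________________  ____________________
14. ____________________  ____________________    49. ____________________  ____________________
15. ____________________  ____________________    50. ____________________  ____________________
16. ____________________  ____________________    51. ____________________  ____________________
17. ____________________  ____________________    52. ____________________  ____________________
18. ____________________  ____________________    53. ____________________  ____________________
19. ____________________  ____________________    54. ____________________  ____________________
20. ____________________  ____________________    55. ____________________  ____________________
21. ____________________  ____________________    56. ____________________  ____________________
22. ____________________  ____________________    57. ____________________  ____________________
23. ____________________  ____________________    58. ____________________  ____________________
24. ____________________  ____________________    59. ____________________  ____________________
25. ____________________  ____________________    60. ____________________  ____________________
26. ____________________  ____________________    61. ____________________  ____________________
27. ____________________  ____________________    62. ____________________  ____________________
28. ____________________  ____________________    63. ____________________  ____________________
29. ____________________  ____________________    64. ____________________  ____________________
30. ____________________  ____________________    65. ____________________  ____________________
31. ____________________  ____________________    66. ____________________  ____________________
32. ____________________  ____________________    67. ____________________  ____________________
33. ____________________  ____________________    68. ____________________  ____________________
34. ____________________  ____________________    69. ____________________  ____________________
35. ____________________  ____________________    70. ____________________  ____________________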

ATTACHMENT B: STUDENT INFORMATION SHEET

Instruction to the Students: Please provide completely the following information. Your
teacher is available to respond to your queries regarding the items in this information
sheet, if you have any. Rest assured that the information that you will be providing will
only be used in our lessons in Statistics and Probability.
1. CLASS STUDENT NUMBER: ______________

2. SEX (Put a check mark, ✓): ____ Male ____ Female 3. NUMBER OF SIBLINGS: _____

4. WEIGHT (in kilograms): ______________ 5. HEIGHT (in centimeters): ______

6. AGE OF MOTHER (as of her last birthday in years): ________

(If your mother is deceased, provide the age she would have if she were alive.)

7. USUAL DAILY ALLOWANCE IN SCHOOL (in pesos): _________________

8. USUAL DAILY FOOD EXPENDITURE IN SCHOOL (in pesos): ___________

9. USUAL NUMBER OF TEXT MESSAGES SENT IN A DAY: ______________

10. MOST PREFERRED COLOR (Put a check mark, ✓. Choose only one):

____ WHITE ____ RED ____ PINK ____ ORANGE ____ YELLOW ____ GREEN

____ BLUE ____ PEACH ____ BROWN ____ GRAY ____ BLACK ____ PURPLE

11. USUAL SLEEPING TIME (on weekdays): ______________

12. HAPPINESS INDEX FOR THE DAY:

On a scale from 1 (very unhappy) to 10 (very happy), how do you feel today? ______

ATTACHMENT C: CLASS RECORDING SHEET (for the Teacher's Use)
Class    Sex  Number of    Weight   Height   Age of      Usual daily  Usual daily       Usual number      Most       Usual     Happiness
Student       siblings     (in kg)  (in cm)  mother      allowance    food expenditure  of text messages  Preferred  Sleeping  Index for
Number        (in person)                    (in years)  in school    in school         sent in a day     Color      Time      the Day
                                                         (in pesos)   (in pesos)

Chapter 1: Exploring Data

Lesson 3: Basic Terms in Statistics

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
As a continuation of Lesson 2 (where we contextualized data), in this lesson we define basic
terms in statistics as we continue to explore data. These basic terms include universe,
variable, population, and sample. We will also discuss in detail other concepts related to a
variable.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to:

Define universe and differentiate it from population; and
Define and differentiate between qualitative and quantitative variables, and between
discrete and continuous variables (that are quantitative).

LESSON OUTLINE:
1. Recall previous lesson on Contextualizing Data
2. Definition of Basic Terms in Statistics (universe, variable, population and sample)
3. Broad Classification of Variables (qualitative and quantitative, discrete and continuous)

DEVELOPMENT OF THE LESSON


A. Recall previous lesson on Contextualizing Data
Begin by recalling with the students the data they provided in the previous lesson and how
they contextualized such data. You could show them the compiled data set in a table like this:
Class    Sex  Number of    Weight   Height   Age of      Usual daily  Usual daily       Usual number      Most       Usual     Happiness
Student       siblings     (in kg)  (in cm)  mother      allowance    food expenditure  of text messages  Preferred  Sleeping  Index for
Number        (in person)                    (in years)  in school    in school         sent in a day     Color      Time      the Day
                                                         (in pesos)   (in pesos)
1        M    2            60       156      60          200          150               20                RED        23:00     8
2        F    5            63       160      66          300          200               25                PINK       22:00     9
3        F    3            65       165      59          250          50                15                BLUE       20:00     7
4        M    1            55       160      55          200          100               30                BLACK      19:00     6
5        M    0            65       167      45          350          300               35                BLUE       20:00     8
:        :    :            :        :        :           :            :                 :                 :          :         :
:        :    :            :        :        :           :            :                 :                 :          :         :

Recall also their response on the first W of the data, that is, on the question "Who provided
the data?" Last time, we said that the students of the class provided the data, or that the
data were taken from the students.
Another W of the data is What: "What information was obtained from the respondents?" and
"What is the unit of measurement used for each item of information (if any)?" Our responses
were the following:

The information gathered includes Class Student Number, Sex, Number of Siblings,
Weight, Height, Age of Mother, Usual Daily Allowance in School, Usual Daily Food
Expenditure in School, Usual Number of Text Messages Sent in a Day, Most
Preferred Color, Usual Sleeping Time and Happiness Index.

The units of measurement for the information on Number of Siblings, Weight, Height,
Age of Mother, Usual Daily Allowance in School, Usual Daily Food Expenditure in
School, and Usual Number of Text Messages Sent in a Day are person, kilogram,
centimeter, year, pesos, pesos and message, respectively.

B. Main Lesson

1. Definition of Basic Terms


The collection of respondents from whom one obtains the data is called the universe of the
study. In our illustration, the set of students of this Statistics and Probability class is our
universe. But we must caution the students that a universe is not necessarily composed of
people, since there are studies where the observations are taken from plants or animals, or
even from non-living things like buildings, vehicles, farms, etc. So formally, we define the
universe as the collection or set of units or entities from which we got the data. Thus, this
set of units answers the first W of data contextualization.
On the other hand, the items of information we asked from the students are referred to as the
variables of the study, and in the data collection activity, we have 12 variables including
Class Student Number. A variable is a characteristic that is observable or measurable in
every unit of the universe. From each student of the class, we got his/her class student
number, sex, number of siblings, weight, height, age of mother, usual daily allowance in
school, usual daily food expenditure in school, usual number of text messages sent in a day,
most preferred color, usual sleeping time, and happiness index for the day. Since these
characteristics are observable in each and every student of the class, they are referred to as
variables.
The set of all possible values of a variable is referred to as a population. Thus, for each
variable we observed, we have a population of values. The number of populations in a study
is equal to the number of variables observed. In the data collection activity we had, there
are 12 populations corresponding to the 12 variables.
A subgroup of a universe or of a population is a sample. There are several ways to take a
sample from a universe or a population and the way we draw the sample dictates the kind of
analysis we do with our data.
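
The distinction among universe, population, and sample can be sketched in Python using the
first five records of the class data table shown earlier in this lesson. This is only an
illustration: the five students are treated as if they were the whole universe, only three of
the twelve variables are kept, and the dictionary keys and the seed value are assumptions
made for this sketch.

```python
import random

# First five records from the class data table (only three variables kept).
records = [
    {"class_student_number": 1, "sex": "M", "height_cm": 156},
    {"class_student_number": 2, "sex": "F", "height_cm": 160},
    {"class_student_number": 3, "sex": "F", "height_cm": 165},
    {"class_student_number": 4, "sex": "M", "height_cm": 160},
    {"class_student_number": 5, "sex": "M", "height_cm": 167},
]

# Universe: the units (students) from whom the data were obtained.
universe = [r["class_student_number"] for r in records]

# One population per variable: the values of that variable over the universe.
population_of_height = [r["height_cm"] for r in records]
population_of_sex = [r["sex"] for r in records]

# A sample is a subgroup; here, three units drawn at random (the seed is arbitrary).
rng = random.Random(3)
sample_units = sorted(rng.sample(universe, k=3))
sample_heights = [r["height_cm"] for r in records
                  if r["class_student_number"] in sample_units]

print(universe)
print(population_of_height)
print(sample_units, sample_heights)
```

Note that there is one universe (the list of students) but as many populations as there are
variables, and that a sample can be described either by its units or by their values.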

We can further visualize these terms in the following figure:
            VARIABLE 1       VARIABLE 2       ...   VARIABLE 12
Unit 1      Value 1          Value 1                Value 1
Unit 2      Value 2          Value 2                Value 2
Unit 3      Value 3          Value 3                Value 3
  :           :                :                      :
Unit N      Value N          Value N                Value N
UNIVERSE    POPULATION       POPULATION             POPULATION
            OF VARIABLE 1    OF VARIABLE 2          OF VARIABLE 12

SAMPLE: a sample of units (Unit 1, ..., Unit n) OR a sample of population values
(Value 1, ..., Value n)
Figure 3.1 Visualization of the relationship among universe, variable, population and sample.

2. Broad Classification of Variables


Following up on the concept of a variable, inform the students that a variable usually takes
on several values. Occasionally, however, a variable can assume only one value, in which
case it is called a constant. For instance, in a class of fifteen-year-olds, the age in years of
the students is constant. Variables can be broadly classified as either qualitative or
quantitative, with the latter further classified into discrete and continuous types (see
Figure 3.3 below).

Figure 3.3 Broad Classification of Variables


(i) Qualitative variables express a categorical attribute, such as sex (male or female),
religion, marital status, region of residence, or highest educational attainment. Qualitative
variables do not strictly take on numeric values (although we can have numeric codes for
them, e.g., for the sex variable, 1 and 2 may refer to male and female, respectively).
Qualitative data answer questions such as "what kind?" Sometimes, there is a sense of
ordering in qualitative data, e.g., income data grouped into high, middle and low-income
status. Data on sex or religion do not have a sense of ordering, as there is no such thing as
a weaker or stronger sex, or a better or worse religion. Qualitative variables are sometimes
referred to as categorical variables.

(ii) Quantitative (otherwise called numerical) data, whose sizes are meaningful, answer
questions such as "how much?" or "how many?" Quantitative variables have actual units of
measure. Examples of quantitative variables include the height, weight, number of
registered cars, household size, and total household expenditures/income of survey
respondents. Quantitative data may be further classified into:

a. Discrete data are those data that can be counted, e.g., the number of days for
cellphones to fail, the ages of survey respondents measured to the nearest year, and
the number of patients in a hospital. These data assume only a finite or countably
infinite number of values.

b. Continuous data are those that can be measured, e.g., the exact height of a survey
respondent and the exact volume of some liquid substance. The possible values are
uncountably infinite.

With this classification, let us then test the understanding of our students by asking them to
classify the variables we had in our last data gathering activity. They should be able to
classify these variables as qualitative or quantitative, and furthermore, classify the
quantitative variables as discrete or continuous. If they did it right, you have the following:

VARIABLE                                            TYPE OF VARIABLE   TYPE OF QUANTITATIVE VARIABLE
Class Student Number                                Qualitative
Sex                                                 Qualitative
Number of Siblings                                  Quantitative       Discrete
Weight (in kilograms)                               Quantitative       Continuous
Height (in centimeters)                             Quantitative       Continuous
Age of Mother                                       Quantitative       Discrete
Usual Daily Allowance in School (in pesos)          Quantitative       Discrete
Usual Daily Food Expenditure in School (in pesos)   Quantitative       Discrete
Usual Number of Text Messages Sent in a Day         Quantitative       Discrete
Usual Sleeping Time                                 Qualitative
Most Preferred Color                                Qualitative
Happiness Index for the Day                         Qualitative
Special Note:
For quantitative data, arithmetical operations have some physical interpretation. One can add
301 and 302 if these have quantitative meanings, but if these numbers refer to room
numbers, then adding them does not make any sense. Even though a variable may take
numerical values, that does not make the variable quantitative! The issue is whether
performing arithmetical operations on these data would make any sense. It would
certainly not make sense to sum two zip codes or multiply two room numbers.

KEY POINTS

A universe is a collection of units from which the data were gathered.


A variable is a characteristic we observed or measured from every element of the
universe.
A population is a set of all possible values of a variable.
A sample is a subgroup of a universe or a population.
In a study there is only one universe but could have several populations.
Variables could be classified as qualitative or quantitative, and the latter could be further
classified as discrete or continuous.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua,
Welfredo Patungan, Nelia Marquez). Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition). Faculty of the Institute of Statistics,
UP Los Baños, College, Laguna 4031.
Takahashi, S. (2009). The Manga Guide to Statistics. Trend-Pro Co. Ltd.
Workbooks in Statistics 1 (1st to 13th Edition). Faculty of the Institute of Statistics,
UP Los Baños, College, Laguna 4031.

ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

1. A market research company requested all teachers of a particular school to fill out a
questionnaire in relation to its product market study. The following are some of the
items of information supplied by the teachers:
highest educational attainment
predominant hair color
body temperature
civil status
brand of laundry soap being used
total household expenditures last month in pesos
number of children in the household
number of hours standing in queue while waiting to be served by a bank teller
amount spent on rice last week by the household
distance travelled by the teacher in going to school
time (in hours) consumed on Facebook on a particular day
a. If we are to consider the collection of information gathered through the completed
questionnaire, what is the universe for this data set? (The universe is the set of all
teachers in that school)
b. Which of the variables are qualitative? Which are quantitative? Among the quantitative
variables, classify them further as discrete or continuous.
highest educational attainment (qualitative)
predominant hair color (qualitative)
body temperature (quantitative: continuous)
civil status (qualitative)
brand of laundry soap being used (qualitative)
total household expenditures last month in pesos (quantitative: discrete)
number of children in a household (quantitative: discrete)
number of hours standing in queue while waiting to be served by a bank teller
(quantitative: discrete)
amount spent on rice last week by a household (quantitative: discrete)
distance travelled by the teacher in going to school (quantitative: continuous)
time (in hours) consumed on Facebook on a particular day (quantitative: continuous)
c. Give at least two populations that could be observed from the variables identified in (b).
(Possible answer: The population is the set of all values of the highest educational
attainment and another population is {single, married, divorced, separated,
widow/widower})

2. The Engineering Department of a big city did a listing of all buildings in their locality. If
you are planning to gather the characteristics of these buildings,
a. what is the universe of this data collection activity? (Set of all buildings in the big city)
b. what are the crucial variables to observe? It would also be better if you could classify
each variable as qualitative or quantitative and, furthermore, classify each
quantitative variable as discrete or continuous. (A possible answer is the number of
floors in the building: quantitative, discrete)

3. A survey of students in a certain school is conducted. The survey questionnaire details
the information on the following variables. For each of these variables, identify whether
the variable is qualitative or quantitative, and if the latter, state whether it is discrete or
continuous.
a. number of family members who are working (quantitative: discrete)
b. ownership of a cell phone among family members (qualitative)
c. length (in minutes) of longest call made on each cell phone owned per month
(quantitative: continuous)
d. ownership/rental of dwelling (qualitative)
e. amount spent in pesos on food in one week (quantitative: discrete)
f. occupation of household head (qualitative)
g. total family income (quantitative: discrete)
h. number of years of schooling of each family member (quantitative: discrete)
i. access of family members to social media (qualitative)
j. amount of time last week spent by each family member using the internet
(quantitative: continuous)

Explanatory Note:

Teachers have the option to just ask this assessment orally to the entire class, or to group
students and ask them to identify answers, or to give this as homework, or to use some
questions/items here for a chapter examination.

Chapter 1: Exploring Data

Lesson 4: Levels of Measurement

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
In this lesson we discuss the different levels of measurement as we continue to explore data.
Knowing these levels enables us to plan the data collection process needed to gather
appropriate data for analysis.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to identify and
differentiate the different levels of measurement and methods of data collection

LESSON OUTLINE:
1. Motivational Activity
2. Levels of Measurement
3. Data Collection Methods

DEVELOPMENT OF THE LESSON


A. Motivational Activity

Ask the students first if they believe the following statement:

Students who eat a healthy breakfast will do best on a quiz, students who eat an unhealthy
breakfast will get an average performance, and students who do not eat anything for
breakfast will do the worst on a quiz

You could further ask one or more students who have different answers to defend their
answers. Then challenge the students to apply a statistical process to investigate the
validity of this statement. You could enumerate on the board the steps in the process,
such as the following:

1. Plan or design the collection of data to verify the validity of the statement in a way that
maximizes information content and minimizes bias;
2. Collect the data as required in the plan;
3. Verify the quality of the data after they are collected;
4. Summarize the information extracted from the data; and
5. Examine the summary statistics so that insight and meaningful information can be
produced to support your decision on whether or not to believe the given statement.

Let us discuss in detail the first step. In planning or designing the data collection activity, we
could consider the set of all the students in the class as our universe. Then let us identify the
variables we need to observe or measure to verify the validity of the statement. You may ask
the students to participate in the discussion by asking them to identify a question to get the
needed data. The following are some possible suggested queries:

1. Do you usually have breakfast before going to school?
(Note: This is answerable by Yes or No.)
2. What do you usually have for breakfast?
(Note: Possible responses to this question are rice, bread, banana, oatmeal, cereal, etc.)

The responses to Questions 1 and 2 could lead us to identify whether a student in
the class had a healthy breakfast, an unhealthy breakfast, or no breakfast at all.
Furthermore, there is a need to determine the performance of the student in a quiz on that
day. The score in the quiz could be used to identify the student's performance as best,
average, or worst.
As we describe the data collection process to verify the validity of the statement, there is also
a need to include the levels of measurement for the variables of interest.

B. Main Lesson:

1. Levels of Measurement

Inform students that there are four levels of measurement of variables: nominal, ordinal,
interval and ratio. These are hierarchical in nature and are described as follows:

Nominal level of measurement arises when we have variables that are categorical and non-
numeric or where the numbers have no sense of ordering. As an example, consider the
numbers on the uniforms of basketball players. Is the player wearing a number 7 a worse
player than the player wearing number 10? Maybe, or maybe not, but the number on the
uniform does not have anything to do with their performance. The numbers on the uniform
merely help identify the basketball player. Other examples of variables measured at the
nominal level include sex, marital status, and religious affiliation. For the study on the
validity of the statement regarding the effect of breakfast on school performance, students
who responded Yes to Question Number 1 can be coded 1, while those who responded No can be
coded 0. These numbers serve only as codes; they cannot be used for ordering or for any
mathematical computation.

Ordinal level also deals with categorical variables like the nominal level, but in this level
ordering is important; that is, the values of the variable can be ranked. For the study on the
validity of the statement regarding the effect of breakfast on school performance, students who
had a healthy breakfast can be coded 1, those who had an unhealthy breakfast 2, and those
who had no breakfast at all 3. Using the codes, the responses can be ranked. Thus, the
students who had a healthy breakfast are ranked first while those who had no breakfast at all
are ranked last in terms of having a healthy breakfast. The numerical codes here have a
meaningful sense of ordering: unlike the numbers on basketball uniforms, the codes suggest
that one student is having a healthier breakfast than another student. Other examples of the
ordinal scale include socio-economic status (A to E, where A is wealthy, E is poor), difficulty
of questions in an exam (easy, medium, difficult), rank in a contest (first place, second place,
etc.), and perceptions in Likert scales.

Note to Teacher: Let us also emphasize to the students that while there is a sense of ordering,
there is no zero point in an ordinal scale. In addition, there is no way to find out how much
distance there is between one category and another. In a scale from 1 to 10, the difference
between 7 and 8 may not be the same as the difference between 1 and 2.

Interval level tells us that one unit differs by a certain amount of degree from another unit.
Knowing how much one unit differs from another is an additional property of the interval
level on top of the properties possessed by the ordinal level. When measuring
temperature in Celsius, a 10-degree difference has the same meaning anywhere along the
scale: the difference between 10 and 20 degrees Celsius is the same as that between 80 and 90
degrees Celsius. But we cannot say that 80 degrees Celsius is twice as hot as 40 degrees Celsius
since there is no true zero, only an arbitrary zero point. A measurement of 0 degrees
Celsius does not reflect a true "lack of temperature." Thus, the Celsius scale is at the interval
level. Another example of a variable measured at the interval level is the Intelligence Quotient
(IQ) of a person. We can tell not only which person ranks higher in IQ but also how much higher
he or she ranks than another, but zero IQ does not mean no intelligence. The students could also
be classified or categorized according to their IQ level. Hence, IQ as measured at the
interval level also has the properties of variables measured at the ordinal and
nominal levels.
Special Note: Also inform the students that the interval level allows addition and subtraction
operations, but it does not possess an absolute zero. Zero is arbitrary, as it does not mean that
the quantity being measured is absent. Zero only represents an additional measurement point.
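
The Celsius example can be checked with a little arithmetic. The sketch below is only an illustration and is not part of the original lesson: it brings in the Kelvin scale (which has a true zero, at -273.15 degrees Celsius) as an outside reference to show why differences in Celsius are meaningful while ratios are not. The function name is a hypothetical helper.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin (illustrative helper, not from the lesson)."""
    return celsius + 273.15

# Differences are meaningful at the interval level:
diff_low = 20 - 10     # 10 degrees
diff_high = 90 - 80    # 10 degrees, the same amount of change

# Ratios are not: dividing Celsius readings gives a misleading "twice as hot".
naive_ratio = 80 / 40  # 2.0

# On a scale with a true zero, the ratio of the same two temperatures
# is only about 1.13, not 2.
kelvin_ratio = celsius_to_kelvin(80) / celsius_to_kelvin(40)

print(diff_low, diff_high)
print(naive_ratio, round(kelvin_ratio, 3))
```

Teachers may use the two printed ratios to prompt a discussion of why "twice as hot" is not a valid statement for Celsius readings.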
Ratio level also tells us that one unit has so many times as much of the property as does
another unit. The ratio level possesses a meaningful (unique and non-arbitrary) absolute,
fixed zero point and allows all arithmetic operations. The existence of the true zero point is
the only difference between the ratio and interval levels of measurement. Examples of the ratio
scale include mass, height, weight, energy and electric charge. With mass as an example, the
difference between 120 grams and 135 grams is 15 grams, and this is the same as the difference
between 380 grams and 395 grams. The size of a unit is constant at any point on the scale, and a
measurement of 0 reflects a complete lack of mass. Amount of money is also at the ratio
level. We can say that 2,000 pesos is twice as much as 1,000 pesos. In addition, money has a
true zero point: if you have zero money, this implies the absence of money. For the study on
the validity of the statement regarding the effect of breakfast on school performance, the
student's score in the quiz is measured at the ratio level. A score of zero implies that the
student did not get any correct answer.
In summary, we have the following levels of measurement:

Level      Property                                            Basic Empirical Operation
Nominal    No order, distance, or origin                       Determination of equivalence
Ordinal    Has order but no distance or unique origin          Determination of greater or lesser values
Interval   Both with order and distance but no unique origin   Determination of equality of intervals or difference
Ratio      Has order, distance and unique origin               Determination of equality of ratios or means

The level of measurement depends mainly on the method of measurement, not on the
property measured. The weight of primary school students measured in kilograms is at the ratio
level, but the students can also be categorized as overweight, normal, or underweight, in which
case weight is measured at the ordinal level. Also, many scales are only interval
because their zero point is arbitrarily chosen.
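
As a rough illustration of how the same property can be recorded at two levels, the sketch below turns weights in kilograms (ratio) into ordered categories (ordinal). The cut-off values and the sample weights are hypothetical and chosen only for demonstration; they are not health standards and do not come from the lesson.

```python
# Ordered categories, from lowest to highest (ordinal level).
ORDER = ["underweight", "normal", "overweight"]

def weight_category(kg, low=25.0, high=45.0):
    """Classify a weight using hypothetical cut-offs (illustration only)."""
    if kg < low:
        return "underweight"
    if kg <= high:
        return "normal"
    return "overweight"

weights = [22.5, 30.0, 48.2, 36.4]                   # hypothetical weights in kg
categories = [weight_category(w) for w in weights]

# Ratio level: statements such as "about 2.1 times as heavy" make sense.
ratio = weights[2] / weights[0]

# Ordinal level: only the ranking of the categories is meaningful.
ranks = [ORDER.index(c) for c in categories]

print(categories)
print(round(ratio, 2), ranks)
```

Once the weights are replaced by categories, the distance between two students can no longer be recovered; only their order remains.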

To assess the students' understanding of the lesson, you may go back to the set of variables in
the data gathering activity done in Lesson 2. You could ask the students to identify the level
of measurement for each of the variables. If they did it right, you have the following:

VARIABLE                                             LEVEL OF MEASUREMENT
Class Student Number                                 Nominal
Sex                                                  Nominal
Number of Siblings                                   Ratio
Weight (in kilograms)                                Ratio
Height (in centimeters)                              Ratio
Age of Mother                                        Ratio
Usual Daily Allowance in School (in pesos)           Ratio
Usual Daily Food Expenditure in School (in pesos)    Ratio
Usual Number of Text Messages Sent in a Day          Ratio
Usual Sleeping Time                                  Nominal
Most Preferred Color                                 Nominal
Happiness Index for the Day                          Ordinal

2. Methods of Data Collection


Variables are observed or measured using any of three methods of data collection,
namely: objective, subjective and use of existing records. The objective and subjective
methods obtain the data directly from the source. The former uses any one or a combination of
the five senses (sight, touch, hearing, taste and smell) to measure the variable, while
the latter obtains data by getting responses through a questionnaire. The resulting data from
these two methods of data collection are referred to as primary data. The data gathered in
Lesson 2 are primary data and were obtained using the subjective method.

On the other hand, secondary data are obtained through the use of existing records or data
collected by other entities for certain purposes. For example, when we use data gathered by
the Philippine Statistics Authority, we are using secondary data and the method we employ to
get the data is the use of existing records. Other data sources include administrative records,
news articles, internet, and the like. However, we must emphasize to the students that when
we use existing data we must be confident of the quality of the data we are using by knowing
how the data were gathered. Also, we must remember to request permission and acknowledge
the source of the data when using data gathered by other agencies or people.

KEY POINTS

Four levels of measurement: nominal, ordinal, interval and ratio.
Knowing the level at which a variable was measured or observed guides us to the
type of analysis to apply.
Three methods of data collection: objective, subjective and use of existing records.
Using the data collection method as basis, data can be classified as either primary or
secondary data.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo
Patungan, Nelia Marquez), published by Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College, Laguna 4031
Takahashi, S. (2009). The Manga Guide to Statistics. Trend-Pro Co. Ltd.
Workbooks in Statistics 1 (From 1st to 13th Edition), Authored by the Faculty of the Institute
of Statistics, UP Los Baños, College, Laguna 4031

ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

1. Using the data on the teachers in a particular school gathered by a market research
company, identify the level of measurement for each of the following variables.
highest educational attainment (ordinal)
predominant hair color (nominal)
body temperature (interval)
civil status (nominal)
brand of laundry soap being used (nominal)
total household expenditures last month in pesos (ratio)
number of children in a household (ratio)
number of hours standing in queue while waiting to be served by a bank teller (ratio)
amount spent on rice last week by a household (ratio)
distance travelled by the teacher in going to school (ratio)
time (in hours) consumed on Facebook on a particular day (ratio)
2. The following variables are included in a survey conducted among students in a certain
school. Identify the level of measurement for each of the variables.
a. number of family members who are working (ratio);
b. ownership of a cell phone among family members (nominal);
c. length (in minutes) of longest call made on each cell phone owned per month (ratio);
d. ownership/rental of dwelling (nominal);
e. amount spent in pesos on food in one week (ratio);
f. occupation of household head (nominal);
g. total family income (ratio);
h. number of years of schooling of each family member (ratio);
i. access of family members to social media (nominal);
j. amount of time last week spent by each family member using the internet (ratio)

3. In the following, identify the data collection method used and the type of resulting data.
a. The website of Philippine Airlines provides a questionnaire instrument that can be
answered electronically. (subjective method, primary data)
b. The latest series of the Consumer Price Index (CPI) generated by the Philippine
Statistics Authority was downloaded from PSA website. (use of existing record,
secondary data)
c. A reporter recorded the number of minutes to travel from one end to another of the
Metro Manila Rail Transit (MRT) during peak and off-peak hours. (objective
method, primary data)
d. Students measure the height of plants using a meter stick. (objective method,
primary data)
e. A PSA enumerator conducting the Labor Force Survey goes around the country to
interview household heads on employment-related variables. (subjective method,
primary data)

Chapter 1: Exploring Data

Lesson 5: Data Presentation

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
In this lesson we enrich what the students have already learned from Grades 1 to 10 about
presenting data. Additional concepts could help the students describe a data set
more fully and appropriately.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to identify and
use the appropriate method of presenting information from a data set effectively.

LESSON OUTLINE:
1. Review of Lessons in Data Presentation taken up from Grade 1 to 10.
2. Methods of Data Presentation
3. The Frequency Distribution Table and Histogram

DEVELOPMENT OF THE LESSON


A. Review of Lessons in Data Presentation taken up from Grade 1 to 10.

You could assist the students to recall what they have learned in Grade 1 to 10 regarding data
presentation by asking them to participate in an activity. The activity is called Toss the Ball.
This is actually a review and wake-up exercise. Toss a ball to a student and he/she will give
the most important concept he/she learned about data presentation.

You may list on the board their responses. You could summarize their responses to be able to
establish what they already know about data presentation techniques and from this you could
build other concepts on the topic. A suggestion is to classify their answers according to the
three methods of data presentation, i.e. textual, tabular and graphical. A possible listing will
be something like this:
Textual or Narrative Presentation:
Detailed information is given in textual presentation.
Narrative report is a way to present data.
Tabular Presentation:
Numerical values are presented using tables.
Information is lost in tabular presentation of data.
Frequency distribution table is also applicable for qualitative variables.

Graphical Presentation:
Trends are easily seen in graphs compared to tables.
It is good to present data using pictures or figures like the pictograph.
Pie charts are used to present data as part of one whole.
Line graphs are for time-series data.
It is better to present data using graphs than tables as they are much better to look at.

B. Main Lesson

1. Methods of Data Presentation


You could inform the students that in general there are three methods to present data. Two or
all of these three methods could be used at the same time to present appropriately the
information from the data set. These methods include the (1) textual or narrative; (2) tabular;
and (3) graphical method of presentation.
In presenting the data in textual, paragraph or narrative form, one describes the data by
enumerating some of the highlights of the data set, such as the highest, lowest or
average values. In case there are only a few observations, say fewer than ten, the
values could be enumerated if there is a need to do so. An example is shown below:

The country's poverty incidence among families as reported by the Philippine
Statistics Authority (PSA), the agency mandated to release official poverty
statistics, decreased from 21% in 2006 to 19.7% in 2012. For 2012, the
regional estimates released by PSA indicate that the Autonomous Region in Muslim
Mindanao (ARMM) is the poorest region, with poverty incidence among families
estimated at 48.7%. The region with the smallest estimated poverty incidence
among families, at 2.6%, is the National Capital Region (NCR).

Data could also be summarized or presented using tables. The tabular method of presentation
is applicable for large data sets. Trends could easily be seen in this kind of presentation.
However, there is a loss of information when using such kind of presentation. The frequency
distribution table is the usual tabular form of presenting the distribution of the data. The
following are the common parts of a statistical table:
a. Table title includes the number and a short description of what is found inside the table.
b. Column header provides the label of what is being presented in a column.
c. Row header provides the label of what is being presented in a row.
d. Body is the information in the cells where the rows and columns intersect.

In general, a table should have at least three rows and/or three columns. However, conveying
too much information in a single table is also not advisable. Tables are usually used in written
technical reports and in oral presentations. Table 5.1 is an example of presenting data in
tabular form. This example was taken from 2015 Philippine Statistics in Brief, a regular
publication of the PSA which is also the basis for the example of the textual presentation
given above.

Table 5.1 Regional estimates of poverty incidence among families (in percent) based on
the Family Income and Expenditures Survey conducted in the
same year of reporting.

Region     2006    2009    2012
NCR         2.9     2.4     2.6
CAR        21.1    19.2    17.5
I          19.9    16.8    14.0
II         21.7    20.2    17.0
III        10.3    10.7    10.1
IV A        7.8     8.8     8.3
IV B       32.4    27.2    23.6
V          35.4    35.3    32.3
VI         22.7    23.6    22.8
VII        30.7    26.0    25.7
VIII       33.7    34.5    37.4
IX         40.0    39.5    33.7
X          32.1    33.3    32.8
XI         25.4    25.5    25.0
XII        31.2    30.8    37.1
Caraga     41.7    46.0    31.9
ARMM       40.5    39.9    48.7
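
The highlights quoted in the textual presentation (highest and lowest regional estimates for 2012) can be read off Table 5.1 directly. As a small sketch, assuming the 2012 column is typed into a Python dictionary (the variable names here are illustrative only), the same highlights can be extracted as follows:

```python
# 2012 column of Table 5.1: poverty incidence among families, in percent.
poverty_2012 = {
    "NCR": 2.6, "CAR": 17.5, "I": 14.0, "II": 17.0, "III": 10.1,
    "IV A": 8.3, "IV B": 23.6, "V": 32.3, "VI": 22.8, "VII": 25.7,
    "VIII": 37.4, "IX": 33.7, "X": 32.8, "XI": 25.0, "XII": 37.1,
    "Caraga": 31.9, "ARMM": 48.7,
}

# Region with the highest and the lowest estimate.
highest = max(poverty_2012, key=poverty_2012.get)
lowest = min(poverty_2012, key=poverty_2012.get)

print(highest, poverty_2012[highest])  # ARMM 48.7
print(lowest, poverty_2012[lowest])    # NCR 2.6
```

These two values are exactly the ones highlighted in the narrative example, which shows how a textual presentation summarizes a table.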

Graphical presentation, on the other hand, is a visual presentation of the data. Graphs are
commonly used in oral presentations. There are several forms of graphs to use, such as the pie
chart, pictograph, bar graph, line graph, histogram and box-plot. Which form to use depends
on what information is to be relayed. For example, trends across time are easily seen using a
line graph. However, values of variables at the nominal or ordinal level of measurement should
not be presented using a line graph; a bar graph is more appropriate. A graphical
presentation in the form of a vertical bar graph of the 2012 regional estimates of poverty
incidence among families is shown below:

[Figure: vertical bar graph. Vertical axis: Poverty Incidence Among Families in Percent (0 to 60).
Horizontal axis: regions (NCR, CAR, I, II, III, IV A, IV B, V, VI, VII, VIII, IX, X, XI, XII, Caraga, ARMM).]

Figure 5.1 2012 Regional poverty incidence among families (2012 FIES).

Other examples of graphical presentations that are shown below are lifted from the Handbook
of Statistics 1 (listed in the reference section at the end of this Teaching Guide).

Figure 5.2. Percentage distribution of dogs according to groupings identified in a dog show.

Figure 5.3. Distribution of fruits sales of a store for two days.

Figure 5.4 Weapons arrest rate from 1965 to 1992 by age of offender.

[Figure: plot of weight in kg (vertical axis, 30 to 80) against height in cm (horizontal axis, 110 to 190).]

Figure 5.5. Height and weight of STAT 1 students registered during the previous term.

2. The Frequency Distribution Table and Histogram


A special type of tabular and graphical presentation is the frequency distribution table (FDT)
and its corresponding histogram. Specifically, these are used to depict the distribution of the
data. Most of the time, these are used in technical reports. An FDT is a presentation
containing non-overlapping categories or classes of a variable and the frequencies or counts
of the observations falling into the categories or classes. There are two types of FDT
according to the type of data being organized: a qualitative FDT or a quantitative FDT. For a
qualitative FDT, the non-overlapping categories of the variable are identified, and the
frequencies, as well as the percentages of observations falling into the categories, are
computed. A quantitative FDT, on the other hand, is of two types: ungrouped
and grouped. An ungrouped FDT is constructed when there are only a few observations or when
the data set contains only a few possible values. A grouped FDT is constructed
when there is a large number of observations and the data set involves many possible
values; the distinct values are then grouped into class intervals. The creation of columns for a
grouped FDT follows a set of guidelines. One such procedure is described in the following
steps, which are lifted from the Workbook in Statistics 1 (listed in the reference section at the
end of this Teaching Guide).

Steps in the construction of a grouped FDT

1. Identify the largest data value or the maximum (MAX) and the smallest data value or the minimum
(MIN) from the data set and compute the range, R. The range is the difference between the largest
and smallest values, i.e. R = MAX - MIN.

2. Determine the number of classes, k, using k = √N, where N is the total number of observations in
the data set. Round off k to the nearest whole number. It should be noted that the computed k
might not be equal to the actual number of classes constructed in an FDT.

3. Calculate the class size, c, using c = R/k. Round off c to the nearest value with the same
precision as the raw data.

4. Construct the classes or the class intervals. A class interval is defined by a lower limit (LL) and an
upper limit (UL). The LL of the lowest class is usually the MIN of the data set. The LLs of the
succeeding classes are then obtained by adding c to the LL of the preceding class. The UL of the
lowest class is obtained by subtracting one unit of measure (1/10^x, where x is the maximum
number of decimal places observed from the raw data) from the LL of the next class. The ULs of
the succeeding classes are then obtained by adding c to the UL of the preceding class. The
lowest class should contain the MIN, while the highest class should contain the MAX.

5. Tally the data into the classes constructed in Step 4 to obtain the frequency of each class. Each
observation must fall in one and only one class.
6. Add (if needed) the following distributional characteristics:

a. True Class Boundaries (TCB). The TCBs reflect the continuous property of continuous data.
They are defined by a lower TCB (LTCB) and an upper TCB (UTCB). These are obtained by taking
the midpoints of the gaps between classes or by using the following formulas: LTCB = LL -
0.5(one unit of measure) and UTCB = UL + 0.5(one unit of measure).

b. Class Mark (CM). The CM is the midpoint of a class and is obtained by taking the average of
the lower and upper TCBs, i.e. CM = (LTCB + UTCB)/2.

c. Relative Frequency (RF). The RF refers to the frequency of the class as a fraction of the total
frequency, i.e. RF = frequency/N. RF can be computed for both qualitative and quantitative
data. RF can also be expressed in percent.

d. Cumulative Frequency (CF). The CF refers to the total number of observations greater than or
equal to the LL of the class (>CF) or the total number of observations less than or equal to the
UL of the class (<CF).

e. Relative Cumulative Frequency (RCF). RCF refers to the fraction of the total number of
observations greater than or equal to the LL of the class (>RCF) or the fraction of the total
number of observations less than or equal to the UL of the class (<RCF). Both the <RCF and
>RCF can also be expressed in percent.
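
Steps 1 to 5 above can be sketched in a few lines of Python. This is a minimal illustration, not the Workbook's own procedure: the function name, the returned dictionary keys, and the quiz scores are hypothetical. The sketch works in whole units of measure to avoid floating-point drift, and it relies on Python's built-in `round`, which rounds exact halves to the nearest even number.

```python
import math

def grouped_fdt_classes(data, decimals=0):
    """Steps 1-5: build class limits and tally frequencies.

    `decimals` is the maximum number of decimal places in the raw data,
    so one unit of measure is 1 / 10**decimals.
    """
    scale = 10 ** decimals
    units = [round(x * scale) for x in data]   # data in whole units of measure

    # Step 1: range, R = MAX - MIN
    lo, hi = min(units), max(units)
    r = hi - lo
    # Step 2: number of classes, k = sqrt(N), rounded to a whole number
    k = round(math.sqrt(len(units)))
    # Step 3: class size, c = R / k, rounded to the precision of the raw data
    c = max(1, round(r / k))
    # Step 4: class limits; add classes until the MAX is covered
    classes = []
    ll = lo
    while True:
        ul = ll + c - 1            # next LL minus one unit of measure
        classes.append((ll, ul))
        if ul >= hi:
            break
        ll += c
    # Step 5: tally; each observation falls in exactly one class
    freq = [sum(ll <= u <= ul for u in units) for ll, ul in classes]

    return {
        "R": r / scale,
        "k": k,
        "c": c / scale,
        "classes": [(ll / scale, ul / scale) for ll, ul in classes],
        "freq": freq,
    }

# Hypothetical quiz scores (whole numbers, so decimals=0).
result = grouped_fdt_classes([12, 15, 7, 20, 18, 9, 14, 16, 11])
print(result)
```

With these nine scores the computed k is 3, but four classes are needed to cover the maximum, which illustrates the remark in Step 2 that the computed k may differ from the actual number of classes.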

The histogram is a graphical presentation of the frequency distribution table in the form of a
vertical bar graph. There are several forms of the histogram; the most common form has
the frequency on its vertical axis and the true class boundaries on its horizontal axis.
As an example, the FDT and its corresponding histogram of the 2012 estimated poverty
incidences of 144 municipalities and cities of Region VIII are shown below.

Poverty Incidence (%)    Frequency
00.000 - 20.015          3
20.015 - 40.015          59
40.015 - 60.015          78
60.015 - 80.015          4
80.015 - 100.00          0

[Figure: histogram. Vertical axis: Frequency (0 to 80). Horizontal axis: True Class Boundaries.
Bar heights, in class order: 3, 59, 78, 4, 0.]

KEY POINTS

Three methods of data presentation: textual, tabular and graphical.
Two or all of the methods could be combined to fully describe the data at hand.
The distribution of data is presented using a frequency distribution table and a histogram.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo
Patungan, Nelia Marquez), published by Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College, Laguna 4031
Takahashi, S. (2009). The Manga Guide to Statistics. Trend-Pro Co. Ltd.
Workbooks in Statistics 1 (From 1st to 13th Edition), Authored by the Faculty of the Institute
of Statistics, UP Los Baños, College, Laguna 4031

ASSESSMENT
Note: This exercise and its corresponding possible answers were lifted from Workbook in
Statistics 1 (listed in the reference section)
A. You are to describe the data on the following table. Perform what is being asked for in the
questions found after the table.
Table 5.2 Characteristics of the 30 members of the Batong Malake Senior Citizens Association
(BMSCA) who participated in their 2009 Lakbay-Aral.

No.  Gender  Age as of Last Birthday  Receiving Monthly Pension? (Y/N)  Gross Monthly Family Income (in thousand pesos)  Number of Years as Member
1 Female 61 Yes 45.0 1
2 Female 64 Yes 26.3 2
3 Male 74 No 33.5 10
4 Male 80 No 50.0 12
5 Female 63 Yes 18.4 2
6 Female 71 Yes 30.0 9
7 Female 75 No 41.0 2
8 Male 64 No 10.1 3
9 Male 65 No 46.5 5
10 Female 68 Yes 18.0 3
11 Female 71 Yes 34.2 6
12 Female 63 Yes 73.1 2
13 Female 72 Yes 15.6 11
14 Male 76 Yes 17.4 11
15 Female 69 No 33.8 8
16 Male 70 Yes 35.1 9
17 Male 74 Yes 18.6 6
18 Female 68 Yes 65.7 8
19 Female 70 No 19.6 3
20 Male 65 Yes 53.0 2
21 Male 64 Yes 18.4 1
22 Female 62 Yes 27.8 1
23 Female 63 No 33.4 2
24 Male 68 No 38.0 5
25 Male 67 Yes 37.6 5
26 Male 69 No 50.4 7
27 Female 68 Yes 44.3 4
28 Female 66 No 36.7 3
29 Female 63 No 18.0 2
30 Male 64 Yes 63.2 2
1. Choose a QUANTITATIVE variable from the given data set. Construct a quantitative
grouped FDT for this variable. Show preliminary computations (R, k, and c). Also,
construct a histogram for the data. Use appropriate labels and titles for the table and
graph. Describe the characteristics of the units in the data set using a brief narrative
report. Refer to the FDT and histogram constructed.

R = ____________________ k = ____________________ c = ________________

Table ______________________________________________________________________

Classes        Frequency  RF      CF            RCF (%)          CM     TCB
LL     UL      (F)        (%)     <CF   >CF     <RCF    >RCF            LTCB    UTCB

Histogram:

Textual presentation:

________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Which of the three methods of data presentation do you think is most appropriate to use
for the variable chosen in Number 1? Justify your answer.

________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

2. Choose a QUALITATIVE variable from Table 5.2. Construct an appropriate graph. Use
labels and a title for the graph.

Give a brief report describing the variable:


________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Possible Answers:

1. For the quantitative variable gross monthly family income:

R = 73.1 - 10.1 = 63.0      k = √30 = 5.477 ≈ 5      c = 63/5 = 12.6

Table 1. Distribution of the gross monthly family income (in thousand pesos) of the 30
Batong Malake Senior Citizens Association members who joined the Lakbay-Aral.
Classes        Frequency  RF      CF            RCF (%)           CM      TCB
LL     UL      (F)        (%)     <CF   >CF     <RCF     >RCF             LTCB    UTCB
10.1   22.6    9          30.00   9     30      30.00    100.00   16.35   10.05   22.65
22.7   35.2    8          26.67   17    21      56.67    70.00    28.95   22.65   35.25
35.3   47.8    7          23.33   24    13      80.00    43.33    41.55   35.25   47.85
47.9   60.4    3          10.00   27    6       90.00    20.00    54.15   47.85   60.45
60.5   73.0    2          6.67    29    3       96.67    10.00    66.75   60.45   73.05
73.1   85.6    1          3.33    30    1       100.00   3.33     79.35   73.05   85.65
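
The derived columns of Table 1 (RF, CF, RCF, CM and TCB) follow from the class limits and frequencies alone. The sketch below is one possible way to reproduce them, with the limits and frequencies copied from Table 1 and one unit of measure taken as 0.1 (thousand pesos); the variable and key names are illustrative only.

```python
# Class limits (LL, UL) and frequencies copied from Table 1.
classes = [(10.1, 22.6), (22.7, 35.2), (35.3, 47.8),
           (47.9, 60.4), (60.5, 73.0), (73.1, 85.6)]
freqs = [9, 8, 7, 3, 2, 1]
unit = 0.1                      # one unit of measure (one decimal place)
n = sum(freqs)                  # total number of observations, 30

rows = []
below = 0                       # observations in all preceding classes
for (ll, ul), f in zip(classes, freqs):
    ltcb = round(ll - 0.5 * unit, 2)       # LTCB = LL - 0.5(unit)
    utcb = round(ul + 0.5 * unit, 2)       # UTCB = UL + 0.5(unit)
    less_cf = below + f                    # <CF: observations <= UL
    greater_cf = n - below                 # >CF: observations >= LL
    rows.append({
        "LL": ll, "UL": ul, "F": f,
        "RF": round(100 * f / n, 2),
        "<CF": less_cf, ">CF": greater_cf,
        "<RCF": round(100 * less_cf / n, 2),
        ">RCF": round(100 * greater_cf / n, 2),
        "CM": round((ltcb + utcb) / 2, 2),  # class mark
        "LTCB": ltcb, "UTCB": utcb,
    })
    below += f

for row in rows:
    print(row)
```

Each printed row should agree with the corresponding row of Table 1, which gives students a way to check their hand computations.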

Histogram:
[Figure: histogram. Vertical axis: Frequency (0 to 10). Horizontal axis: TCB, with boundaries
10.05, 22.65, 35.25, 47.85, 60.45, 73.05 and 85.65 marking classes 1 to 6.]

Figure 1. Monthly gross family income (in thousand pesos) of the 30 BMSCA members.
Textual presentation:
(Sample) The monthly gross family income of the 30 BMSCA members ranges from 10.1 to
73.1 thousand pesos. More than half of them have an income of at most 35,250 pesos. Only
three of them, or 10%, have a monthly family income of at least 60,450 pesos.

Which of the three methods of data presentation do you think is most appropriate to use for
the variable chosen in Number 1? Justify your answer.
(Sample)
Textual presentation: It is most appropriate to use a textual presentation since the highlights
of the family income of the BMSCA members can be presented.
Tabular presentation: It is most appropriate to use a tabular presentation since a lot of the
numerical information can be presented and trends in the monthly income of the members
can be seen.
Graphical presentation: A graphical presentation is most appropriate so that trends in the
monthly income of the BMSCA are easily visible.

2. For the qualitative variable: gender

Figure 2. Distribution of the 30 BMSCA members by gender.

Brief Description: The majority of the 30 BMSCA members who joined the Lakbay-Aral are
male. Only 43% are female.

For the qualitative variable: whether member is receiving monthly pension or not

Figure 2. Distribution of the 30 BMSCA members as to whether
they are receiving monthly pension or not.

Brief Description: More than half of the 30 BMSCA members receive a monthly pension.
Forty percent do not receive a monthly pension.

Chapter 1: Exploring Data

Lesson 6: Measures of Central Tendency

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
The lesson begins with students engaging in a review of some measures of central tendency
by considering a numerical example. Students are also asked to examine both strengths and
limitations of these measures. Assessments will be given to students on their ability to
calculate these measures, and also to get an overall sense of whether they recognize how
these measures respond to changes in data values.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to

Calculate commonly used measures of central tendency,


Provide a sound interpretation of these summary measures, and
Discuss the properties of these measures.

LESSON OUTLINE:
1. Motivation
2. Common Measures of Central Tendency: Mean, Median and Mode
3. Properties of the Mean, Median and Mode

DEVELOPMENT OF THE LESSON


A. Motivation
Present to the students the following frequency distribution table of the monthly income of 35
families residing in a nearby barangay/village.

Monthly Family Income in Pesos Number of Families


12,000 2
20,000 3
24,000 4
25,000 8
32,250 9
36,000 5
40,000 2
60,000 2

You may ask the students the following questions to pique their interest and, at the same
time, introduce them to some summary statistics.

1. What is the highest monthly family income? Lowest?

Answer: Highest monthly family income is 60,000 pesos while the lowest is 12,000
pesos.

You may emphasize that the highest and lowest values, which are commonly known as the
maximum and minimum, respectively, are summary measures of a data set. They represent
important location values in the distribution of the data. However, these measures do not give
a measure of location in the center of the distribution.

2. What monthly family income is most frequent in the village?

Answer: Monthly family income that is most frequent is 32,250 pesos.

The value of 32,250 occurs most often or it is the value with the highest frequency. This is
called the modal value or simply the mode. In this data set, the value of 32,250 is found in
the center of the distribution.

3. If you list down individually the values of the monthly family income from lowest to
highest, what is the monthly family income where half of the total number of families have
monthly family income less than or equal to that value while the other half have monthly
family income greater than that value?

Answer: When the data are arranged in increasing order, that is, in an array such as the
following:

12,000; 12,000; 20,000; 20,000; 20,000; 24,000; 24,000; 24,000; 24,000; 25,000;
25,000; 25,000; 25,000; 25,000; 25,000; 25,000; 25,000; 32,250; 32,250; 32,250;
32,250; 32,250; 32,250; 32,250; 32,250; 32,250; 36,000; 36,000; 36,000; 36,000;
36,000; 40,000; 40,000; 60,000; 60,000

there are 17 values that are less than the middle value while another 17 values are greater
than or equal to the middle value. That middle value is the 18th observation and it is equal to
32,250 pesos. The middle value is called the median and is found in the center of the
distribution.

4. What is the average monthly family income?

Answer: When computed using the data values, the average is 30,007.14 pesos.

The average monthly family income is commonly referred to as the arithmetic mean or
simply the mean, which is computed by adding all the values and then dividing the sum by
the number of values included in the sum. The average value is also found somewhere in the
center of the distribution.

Let us now summarize what we have learned from our illustration and introduce the three
common measures of central tendency.

B. Common Measures of Central Tendency: Mean, Median and Mode

Inform students that the most widely used measure of the center is the (arithmetic) mean. It
is computed as the sum of all observations in the data set divided by the number of
observations that you include in the sum. If we use the summation symbol $\sum_{i=1}^{N} x_i$,
read as the sum of the observations represented by $x_i$, where $i$ takes the values from 1 to
$N$ and $N$ refers to the total number of observations being added, we could compute the mean
(usually denoted by the Greek letter $\mu$) as $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$. Using the
example earlier with 35 observations of family income, the mean is computed as

$\mu = \frac{12{,}000 + 12{,}000 + \cdots + 60{,}000}{35} = \frac{1{,}050{,}250}{35} = 30{,}007.14$

Alternatively, we could do the computation as follows:

Monthly Family Income   Number of Families   x_i × f_i
in Pesos (x_i)          (f_i)
12,000                  2                    12,000 × 2 = 24,000
20,000                  3                    20,000 × 3 = 60,000
24,000                  4                    24,000 × 4 = 96,000
25,000                  8                    25,000 × 8 = 200,000
32,250                  9                    32,250 × 9 = 290,250
36,000                  5                    36,000 × 5 = 180,000
40,000                  2                    40,000 × 2 = 80,000
60,000                  2                    60,000 × 2 = 120,000
                        Sum = 35             Sum = 1,050,250
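
If a computer is available, the same frequency-table computation can be shown in a few lines. This is a minimal sketch for classroom illustration; the dictionary name `income_freq` is an assumption, not something prescribed by this guide.

```python
# Frequency table from the motivation example:
# monthly family income in pesos -> number of families.
income_freq = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

n = sum(income_freq.values())                         # N, the sum of the f_i
total = sum(x * f for x, f in income_freq.items())    # sum of x_i * f_i
mean = total / n

print(n, total, round(mean, 2))   # 35 1050250 30007.14
```

Multiplying each value by its frequency before adding gives the same total as adding all 35 individual observations.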

For a large number of observations, it is advisable to use a computing tool like a calculator
or computer software, e.g. a spreadsheet application such as Microsoft Excel.
The median, on the other hand, is the middle value in an array of observations. To determine
the median of a data set, the observations must first be arranged in increasing or decreasing
order. Then locate the middle value so that half of the observations are less than or equal to
that value while the other half of the observations are greater than or equal to it.

If N (total number of observations in a data set) is odd, the median or the middle value is the
$\left(\frac{N+1}{2}\right)^{\text{th}}$ observation in the array. On the other hand, if N is even,
then the median or the middle value is the average of the two middle values, that is, the
average of the $\left(\frac{N}{2}\right)^{\text{th}}$ and $\left(\frac{N}{2}+1\right)^{\text{th}}$
observations. In the example given earlier, there are 35 observations so N is 35, an
odd number. The median is then the
$\left(\frac{35+1}{2}\right)^{\text{th}} = \left(\frac{36}{2}\right)^{\text{th}} = 18^{\text{th}}$
observation in the array.
Locating the 18th observation in the array leads us to the value equal to 32,250 pesos.

The mode or the modal value is the value that occurs most often, that is, the value that has the
highest frequency. In other words, the mode is the most fashionable value in the data set.
In the example above, the value of 32,250 pesos occurs most often; it is the value with
the highest frequency, which is equal to nine.
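
The rules for the median and the mode can also be checked by machine. The sketch below is illustrative only; it expands the frequency table into the full array of 35 observations, applies the odd/even rule by hand, and compares the result with Python's standard `statistics` module. The variable names are assumptions.

```python
from statistics import median, multimode

income_freq = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

# Expand the table into the sorted array of individual observations.
values = sorted(x for x, f in income_freq.items() for _ in range(f))
n = len(values)

if n % 2 == 1:
    # Odd N: the ((N + 1) / 2)th observation (list indices start at 0).
    middle = values[(n + 1) // 2 - 1]
else:
    # Even N: average of the (N / 2)th and (N / 2 + 1)th observations.
    middle = (values[n // 2 - 1] + values[n // 2]) / 2

modes = multimode(values)          # list of all most frequent values

assert middle == median(values)    # the hand rule agrees with the library
print(middle, modes)               # 32250 [32250]
```

`multimode` returns a list, so a bimodal or multimodal data set would show more than one value.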

C. Properties of the Mean, Median and Mode

Each of these three measures has its own properties. Most of the time we use these properties
as the basis for determining what measure to use to represent the center of the distribution.
As mentioned before, the mean is the most commonly used measure of central tendency. It
can be likened to a center of gravity: if the values in an array were put on a
beam balance, the mean would act as the balancing point where the smaller observations balance
the larger ones, as seen in the following illustration.

[Illustration: a beam balance with rectangles placed at 12,000; 20,000; 24,000; 25,000;
32,250; 36,000; 40,000 and 60,000.]

Note that the frequency represented by the size of the rectangle serves as weights in this
beam balance.

To illustrate this property further, we could ask the students to subtract the value of the mean
from each observation (the difference is denoted as di) and then sum all the differences. The
computation can also be done alternatively as shown in the following table.

Monthly Family Income   d_i = x_i - μ                  Number of Families   d_i × f_i
in Pesos (x_i)          (rounded off)                  (f_i)
12,000                  12,000 - 30,007.14 = -18,007   2                    -18,007 × 2 = -36,014
20,000                  20,000 - 30,007.14 = -10,007   3                    -10,007 × 3 = -30,021
24,000                  24,000 - 30,007.14 = -6,007    4                    -6,007 × 4 = -24,029
25,000                  25,000 - 30,007.14 = -5,007    8                    -5,007 × 8 = -40,057
32,250                  32,250 - 30,007.14 = 2,243     9                    2,243 × 9 = 20,186
36,000                  36,000 - 30,007.14 = 5,993     5                    5,993 × 5 = 29,964
40,000                  40,000 - 30,007.14 = 9,993     2                    9,993 × 2 = 19,986
60,000                  60,000 - 30,007.14 = 29,993    2                    29,993 × 2 = 59,986
                                                       Sum = 35             Sum = 0

The sum of the differences across all observations will be equal to zero. This indicates that the
mean is indeed the center of the distribution, since the negative and positive deviations cancel
out and the sum is equal to zero.
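
The balancing property can be verified exactly. The following sketch is an illustration, not part of the original lesson: it uses Python's `fractions.Fraction` so that the mean is not rounded, and the names `weighted_devs` and `total_dev` are assumptions.

```python
from fractions import Fraction

income_freq = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

n = sum(income_freq.values())
# Exact mean as a fraction (1,050,250 / 35), so no rounding error creeps in.
mean = Fraction(sum(x * f for x, f in income_freq.items()), n)

# d_i * f_i for every distinct income value.
weighted_devs = {x: (x - mean) * f for x, f in income_freq.items()}
total_dev = sum(weighted_devs.values())

for x, d in weighted_devs.items():
    print(x, round(float(d)))      # products rounded to the nearest peso

print(total_dev)                   # 0
```

When rounded deviations are used instead, the products may differ from the exact ones by a peso or so, but the exact total is still zero.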

In the expression given above, we could see that each observation has a contribution to the
value of the mean. All the data contribute equally in its calculation. That is, the weight of
each of the data items in the array is the reciprocal of the total number of observations in the
data set, i.e. $1/N$.

Means are also amenable to further computation, that is, you can combine subgroup means to
come up with the mean for all observations. For example, if there are 3 groups with means
equal to 10, 5 and 7 computed from 5, 15, and 10 observations respectively, one can compute
the mean for all 30 observations as follows:

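$\mu = \frac{n_1\mu_1 + n_2\mu_2 + n_3\mu_3}{30} = \frac{5(10) + 15(5) + 10(7)}{30} = \frac{195}{30} = 6.5$

where $n_1$, $n_2$, $n_3$ are the numbers of observations in the three groups and $\mu_1$, $\mu_2$,
$\mu_3$ are the corresponding group means.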
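
The combination of subgroup means is a weighted average in which each group mean is weighted by its group size. A minimal sketch, with list names chosen only for illustration:

```python
group_means = [10, 5, 7]      # means of the three groups
group_sizes = [5, 15, 10]     # number of observations in each group

total_n = sum(group_sizes)                                        # 30
weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))  # 195
combined_mean = weighted_sum / total_n

print(weighted_sum, total_n, combined_mean)   # 195 30 6.5
```

Note that simply averaging the three means, (10 + 5 + 7)/3, would give a different and incorrect value because the groups are of unequal size.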

If there are extremely large values, the mean will tend to be pulled upward, while if there are
extremely small values, the mean will tend to be pulled downward. The extremely low or high
values are referred to as outliers. Thus, outliers do affect the value of the mean.

To illustrate this property, we could tell the students that if the two families with the highest
income each had a very high monthly income of 600,000 pesos instead of only 60,000 pesos,
the computed value of the mean would be pulled upward, that is,

$\mu = \frac{12{,}000 + 12{,}000 + \cdots + 600{,}000}{35} = \frac{2{,}130{,}250}{35} = 60{,}864.29$

Thus, in the presence of extreme values or outliers, the mean is not a good measure of the
center. An alternative measure is the median. The mean is also computed only for
quantitative variables that are measured at least in the interval scale.

Like the mean, the median is computed for quantitative variables. But the median can also be
computed for variables measured in at least the ordinal scale. Another property of the
median is that it is not easily affected by extreme values or outliers. In the example above,
with a monthly family income of 600,000 pesos as the extreme value, the median
remains the same, which is equal to 32,250 pesos.
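
The contrast between the mean and the median in the presence of an outlier can be demonstrated as follows. This sketch assumes that the 60,000 entry of the frequency table (two families) is replaced by 600,000, which reproduces the total of 2,130,250 used in the computation above; the helper name `expand` is hypothetical.

```python
from statistics import mean, median

def expand(freq_table):
    """Turn a {value: frequency} table into a sorted list of observations."""
    return sorted(x for x, f in freq_table.items() for _ in range(f))

original = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}
# Assumed change: the 60,000 entry becomes 600,000 (an extreme value).
with_outlier = {(600_000 if x == 60_000 else x): f for x, f in original.items()}

for label, table in (("original", original), ("with outlier", with_outlier)):
    data = expand(table)
    print(label, round(mean(data), 2), median(data))
```

The mean roughly doubles while the median stays at 32,250, because the 18th observation in the sorted array is unchanged.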

For variables in the ordinal scale, the median should be used in determining the center of the
distribution. On the other hand, the mode is usually computed for data sets which are
mainly measured in the nominal scale of measurement. It is also sometimes referred to as the
nominal average. In a given data set, the mode can easily be picked out by ocular inspection,
especially if the data are not too many. In some data sets, the mode may not be unique. The
data set is said to be unimodal if there is a unique mode, bimodal if there are two modes,
and multimodal if there are more than two modes. For continuous data, the mode is not very
useful since here, measurements (to the most precise significant digit) would theoretically
occur only once.

The mode is a more helpful measure for discrete and qualitative data with numeric codes than
for other types of data. In fact, in the case of qualitative data with numeric codes, the mean
and median are not meaningful.

The following diagram provides a guide in choosing the most appropriate measure of central
tendency to use in order to pinpoint or locate the center or the middle of the distribution of
the data set. Such a measure, being the center of the distribution, typically represents the data
set as a whole. Thus, it is crucial to use the appropriate measure of central tendency.

What is the level of measurement of the variable?

   Nominal: Best to Use Mode
   Ordinal:
      Small Number of Observations: Best to Use Median
      Large Number of Observations: Best to Use Mean or Median
   Interval/Ratio: Presence of Outliers?
      With Outliers: Best to Use Median
      Without Outliers: Best to Use Mean

KEY POINTS

A measure of central tendency is a location measure that pinpoints the center or middle
value.
The three common measures of central tendency are the mean, median and mode.
Each measure has its own properties that serve as basis in determining when to use it
appropriately.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua,
Welfredo Patungan, Nelia Marquez), published by Rex Bookstore.
Deciding Which Measure of Center to Use
http://www.sharemylesson.com/teaching-resource/deciding-which-measure-of-center-to-use-50013703/
Handbook of Statistics 1 (1st and 2nd Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College, Laguna 4031
Workbooks in Statistics 1 (From 1st to 13th Edition), Authored by the Faculty of the Institute
of Statistics, UP Los Baños, College, Laguna 4031
ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

1. Thirty people were asked the question, "How many people do you consider your best
friend?" The graph below shows their responses.

[Bar graph: Frequency (vertical axis, 0 to 12) against Number of Best Friends
(horizontal axis, 1 to 8).]

What measure of central tendency would you use to find the center for the number of best
friends people have? Explain your answer. (Since an outlier is present, one
can use the median, which is numerically equal to 3.)

2. The mean age of 10 full time guidance counselors is 35 years old. Two new full time
guidance counselors, aged 28 and 30, are hired. Five years from now, what would be the
average age of these twelve guidance counselors? (The sum of ages is 350 for the 10
counselors; with the two newly hired, the sum is now 408, thus yielding a current mean of
34 years. Five years from now, the mean will go up to 39 years for the 12 guidance
counselors.)

3. Houses in a certain area in a big city have a mean price of PhP4,000,000 but a median
price of only PhP2,500,000. How might you best explain this? (There is an outlier (an
extremely expensive house) in the prices of the houses.)

4. Five persons were asked about the usual number of hours they spend watching television in a
week. Their responses are: 5, 7, 3, 38, and 7 hours.
a. Obtain the mean, median and mode. (The mean is 12; median is 7, mode is 7.)
b. If another person were to be asked the same question and he/she responded 200 hours,
how would this affect the mean, median and mode? (Median and mode unchanged;
mean increases to 43.3)

5. For the senior high school dance, there is a debate going on among students regarding the
color that will be featured prominently. Votes were sent by students via SMS, and the
results are as follows:

Color                   Red   Green   Orange   White   Yellow   Blue   Brown   Purple
No. of Votes Received   300   550     70       130     220      710    35      5

a. Is there a clear winner on the choice of color? (Yes)
b. Compute for the mean, median and modal color (if possible). (We cannot compute for
the mean and median. But the modal color is said to be blue.)
c. Why is it that we could or could not find each measure of the central tendency? (We
cannot compute for the mean and median since color is a qualitative variable and
is measured at the nominal level)
d. Which measure of central tendency will determine the color to be prominently used
during the senior high school dance? (mode)

6. Everyone studied very hard for the quiz in the Statistics and Probability Course. There
were 10 questions in the quiz, and the scores are distributed as follows:

Score Number of Students


10 8
9 12
8 6
7 5
6 3
5 2
4 0
3 1
2 1
1 0
0 2

a. Compute for the mean, median, and mode for this set of data. (The computation
could be done as follows:

Score    Number of Students   x_i × f_i    Less Than Cumulative Frequency
(x_i)    (f_i)                             (< CF)
10       8                    80           40
9        12                   108          32
8        6                    48           20
7        5                    35           14
6        3                    18           9
5        2                    10           6
4        0                    0            4
3        1                    3            4
2        1                    2            3
1        0                    0            2
0        2                    0            2
         Sum = 40             Sum = 304

Mean = $\frac{304}{40} = 7.6$;
Median is the average of the 20th and 21st observations = $\frac{8 + 9}{2} = 8.5$. Note that
the 20th observation is 8 while the 21st observation is 9 based on the less
than cumulative frequency.
Mode = 9 since that is the score with the highest frequency equal to 12.)
c. Suppose the teacher said, "Everyone in the class will be getting either the mean,
median, or mode for their official score."
i. What would students want to receive (mean, median, or mode)? (Mode)
ii. Which would students want to receive the least (mean, median or mode)? (Mean)
iii. What would be the fairest score to receive? Ask students to explain their
answers. (Note: There is no right or wrong answer for this question. It all
depends on the reasoning of the students.)

Chapter 1: Exploring Data

Lesson 7: Other Measures of Location

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
In the previous lesson we discussed a measure of location known as the measure of central
tendency. There are other measures of location which are useful in describing the distribution
of the data set. These measures of location include the maximum, minimum, percentiles,
deciles and quartiles. How to compute and interpret these measures are also discussed in this
lesson.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to

Calculate measures of location other than the measure of central tendency, and
Provide a sound interpretation of these summary measures.

LESSON OUTLINE:
1. Motivation
2. Measures of Location: Maximum, Minimum, Percentiles, Deciles and Quartiles

DEVELOPMENT OF THE LESSON


A. Motivation
In the previous lesson, we asked the students to identify the highest and lowest family income,
and emphasized that the highest and the lowest values, which are commonly known as the
maximum and minimum, respectively, are important summary measures of a data set. They
represent important location values in the distribution of the data. However, these measures
do not give a measure of location in the center of the distribution. Instead, these two location
measures give extreme locations or points in a distribution.

For example, after a long test or examination, we are interested in the highest and lowest
scores and, of course, in who got these scores. These are in addition to knowing the
average, median and modal scores. These measures tell us how the students performed in the
long test. Knowing these measures, we could take further actions like rewarding the student(s)
who got the highest score and assisting the student(s) who got the lowest score. In addition,
these measures also indicate if the long test is difficult or easy, and they may also
indicate the students' level of understanding of the concepts that are covered in the test.

To motivate the students, present the following distribution of scores in a 50-item long test of
150 Grade 11 students of a nearby Senior High School and ask them to respond to some
questions.

Score in a Long Test Number of Students


10 4
16 5
18 5
20 15
25 19
30 22
33 18
38 28
40 10
42 7
45 8
50 9

1. What is the highest score? Lowest score?


Answer: Highest score is 50 while the lowest is 10.

2. What is the most frequent score?


Answer: Most frequent score is 38 which is the score of 28 students.

3. What is the median score?


Answer: The median score is 33, which implies that 50% of the students, or around 75
students, have scores of at most 33.

4. What is the average or mean score?


Answer: On the average, the students answered 32.04667, or 32 (rounded off), out of the
50 items correctly.

You could ask more questions like:


1. What is the score that 75% of the 150 students scored less than or equal to?
2. Do you think the long test is easy since 75 students have scores of at most 33 out of 50?
3. Do you need to be alarmed when 10% of the class got a score of at most 20 out of 50?

These questions could be answered by knowing other measures of location.

B. Measures of Location: Maximum, Minimum, Percentiles, Deciles and Quartiles

We formally define the maximum as a measure of location that pinpoints the highest value in
the data distribution while the minimum locates the lowest value. There are other measures
of location that are becoming common because of their constant use in reporting the rank of a
score in a distribution of scores, such as the percentile rank in college entrance examinations.
These measures are referred to as percentiles, deciles, and quartiles.

A percentile is a measure that pinpoints a location that divides the distribution into 100 equal
parts. It is usually represented by Pj, the value which separates the bottom j% of the
distribution from the top (100-j)%. For example, P30 is the value that separates the bottom
30% of the distribution from the top 70%. Thus, 30% of the total number of observations in
the data set are less than or equal to P30 while the remaining 70% have values greater than
or equal to P30.

The steps in finding the jth percentile (Pj), lifted from the workbook cited as a reference at
the end of this lesson, are as follows:

Step 1: Arrange the data values in ascending order of magnitude.
Step 2: Find the location of Pj in the arranged list by computing
$L = \left(\frac{j}{100}\right) \times N$, where N is the total number of observations in the
data set.
Step 3:
a. If L is a whole number, then Pj is the mean or average of the values in the Lth and
(L+1)th positions.
b. If L is not a whole number, then Pj is the value in the next higher position.
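
The three steps above can be sketched as a short function. This is an illustration under stated assumptions: j is taken to be a whole number from 1 to 99, and the function name `percentile` is hypothetical. Other textbooks and software packages use slightly different percentile rules, so results may differ from those of a spreadsheet.

```python
def percentile(values, j):
    """Return P_j of the given values using the three-step rule in this lesson."""
    if not 0 < j < 100:
        raise ValueError("j is assumed to be a whole number from 1 to 99")
    data = sorted(values)                  # Step 1: ascending order
    n = len(data)
    whole, remainder = divmod(j * n, 100)  # Step 2: L = (j / 100) * N
    if remainder == 0:
        # Step 3a: L is a whole number, so average the Lth and (L+1)th values.
        return (data[whole - 1] + data[whole]) / 2
    # Step 3b: L is not a whole number, so take the next higher position.
    return data[whole]


sample = range(1, 11)                      # the ten values 1, 2, ..., 10
print(percentile(sample, 30))              # L = 3, average of 3 and 4 -> 3.5
print(percentile(sample, 25))              # L = 2.5, 3rd value -> 3
```

Integer arithmetic (`divmod`) is used to decide whether L is a whole number, which avoids rounding problems with decimal fractions.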

To illustrate we use the data on long test scores of 150 Grade 11 students of nearby Senior
High School. An additional column on less than cumulative frequency was included to
facilitate the computation.

Score in a Long Test Number of Students < CF


10 4 4
16 5 9
18 5 14
20 15 29
25 19 48
30 22 70
33 18 88
38 28 116
40 10 126
42 7 133
45 8 141
50 9 150

To find P30 we note that j = 30. Since the observations are tabulated in increasing order, we
could proceed to Step 2, which asks us to compute L as
$L = \left(\frac{j}{100}\right) \times N = \left(\frac{30}{100}\right) \times 150 = 45$.
The computed L, which is equal to 45, is a whole number and thus we follow the first rule in
Step 3, which states that Pj is the average or mean of the values found in the Lth and (L+1)th
positions. Thus, we take the average of the 45th and 46th observations, which are both equal to
25. We then say that the bottom 30% of the scores are less than or equal to 25
while the top 70% of the observations (around 105 scores) are greater than or equal to 25.

Deciles and quartiles are then defined in relation to percentiles. If percentiles divide the
distribution into 100 equal parts, deciles divide the distribution into 10 equal parts while
quartiles divide the distribution into 4 equal parts. Thus, we say that the 10th Percentile is the
same as the 1st Decile, the 20th Percentile the same as the 2nd Decile, the 25th Percentile the
same as the 1st Quartile, the 50th Percentile the same as the 5th Decile or 2nd Quartile, and so
forth. Note also that by the definition of the median in the previous lesson, we could say that
the median value is equal to the 50th Percentile or 5th Decile or 2nd Quartile. Because of this
relationship, the computation of the quartile and decile could be coursed through the
computation of the percentile.
To illustrate, if we want to compute the 3rd Decile or D3, then we compute the 30th Percentile
or P30. In other words, D3 = P30 = 25 based on our earlier computation. The 3rd Quartile or Q3
is equal to P75. We compute L as
$L = \left(\frac{j}{100}\right) \times N = \left(\frac{75}{100}\right) \times 150 = 112.5$.
The computed L, which is equal to 112.5, is not a whole number and thus we follow the second
rule in Step 3, which states that Pj is the value found in the next higher position, specifically,
in the 113th position, the next higher position after 112.5. Thus, we take the 113th observation,
which is equal to 38, as the value of P75. We then say that 75% of the class of 150 students, or
around 113 students, correctly answered at most 38 out of the 50 items.
The median, which is equal to P50, is computed as the mean or average of the 75th and 76th
observations, which are both equal to 33. Hence, we get the same value as the one we
obtained using the definition we had in the previous lesson.
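
The values of P30 (D3), P75 (Q3) and P50 (the median) can be read directly off the less than cumulative frequency column, without listing all 150 scores. The sketch below illustrates that lookup; it assumes j is a whole number from 1 to 99, and the function names `value_at` and `percentile` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

scores = [10, 16, 18, 20, 25, 30, 33, 38, 40, 42, 45, 50]
freqs = [4, 5, 5, 15, 19, 22, 18, 28, 10, 7, 8, 9]

less_cf = list(accumulate(freqs))   # the "< CF" column: 4, 9, 14, ..., 150
n = less_cf[-1]                     # N = 150


def value_at(position):
    """Score of the observation at a given (1-based) position in the array."""
    # First class whose cumulative frequency reaches the position.
    return scores[bisect_left(less_cf, position)]


def percentile(j):
    """P_j by the lesson's rule, for a whole number j from 1 to 99."""
    whole, remainder = divmod(j * n, 100)
    if remainder == 0:
        return (value_at(whole) + value_at(whole + 1)) / 2
    return value_at(whole + 1)


print(percentile(30), percentile(75), percentile(50))   # 25.0 38 33.0
```

The same lookup answers questions such as "what score is in the 113th position?" by finding the first class whose cumulative frequency is at least 113.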

KEY POINTS
There are other measures of location that could further describe the distribution of the
data set.
The maximum and minimum values are measures of location that pinpoint the extreme
values, which are the highest and lowest values, respectively.
Percentiles, quartiles and deciles are measures of location that divide the distribution into
100, 4 and 10 equal parts, respectively.

REFERENCES
Deciding Which Measure of Center to Use
http://www.sharemylesson.com/teaching-resource/deciding-which-measure-of-center-to-use-50013703/
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua,
Welfredo Patungan, Nelia Marquez), published by Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College, Laguna 4031
Moore, D.S. (2007). The Basic Practice of Statistics, Fourth Edition. W.H. Freeman and
Company.
Workbooks in Statistics 1 (From 1st to 13th Edition), Authored by the Faculty of the Institute
of Statistics, UP Los Baños, College, Laguna 4031

ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

1. A businesswoman is planning to have a restaurant in the university belt. She wants to
study the weekly food allowance of the students in order to plan her pricing strategy for
the different menus she is going to offer. She asked 213 students and gathered the
following data:

Weekly Food Allowance   Frequency   Weekly Food Allowance   Frequency
(pesos)                             (pesos)
50                      5           550                     3
100                     3           600                     18
150                     6           700                     22
170                     1           750                     8
200                     8           800                     16
250                     5           900                     11
300                     5           1000                    27
350                     5           1200                    2
400                     6           1500                    3
450                     11          1700                    1
500                     46          2000                    1

a. Determine the value that the weekly food allowance of 60% of the students does not exceed.
(The statistic we want is P60. We compute L as
$L = \left(\frac{j}{100}\right) \times N = \left(\frac{60}{100}\right) \times 213 = 127.8 \approx 128$.
Then we take the 128th observation, which is equal to 700. Thus
we say that 60% of the students have at most 700 pesos as their weekly food
allowance.)

b. What percentage of the students have a weekly food allowance that is at most 170
pesos?
(Here we are looking for the value of j. It is given that Pj = 170 is the 15th
observation in the array of 213 values. Thus, 15 is the value of L and using this
we compute the value of j as
$j = \left(\frac{L}{N}\right) \times 100 = \left(\frac{15}{213}\right) \times 100 \approx 7$.
Therefore we say
that 7% of the students have a weekly food allowance of at most 170 pesos.)

c. If the businesswoman wants at least 50% of the students to be able to afford to eat
in her restaurant, what should be the minimum total cost of the meals that a student
could have in a week?
(The statistic we want is the median or P50. We compute L as
$L = \left(\frac{j}{100}\right) \times N = \left(\frac{50}{100}\right) \times 213 = 106.5 \approx 107$.
Then we take the 107th observation, which is equal
to 600. Thus we say that at least 50% of the students could afford to eat in the
restaurant if the minimum total cost of the meals that a student could have in a
week is 600 pesos.)

Chapter 1: Exploring Data

Lesson 8: Measures of Variation

TIME FRAME: 1 hour session

OVERVIEW OF LESSON
In this lesson, students will be shown, by scrutinizing two different data sets with the same
measures of central tendency, that it is not enough to get measures of central tendency
for a data set. We illustrate this using data on the returns on stocks where not only the mean,
median and mode are the same, but also other measures of location such as the
minimum and maximum. However, the spread of the observations is different, which means that
to further describe the data sets we need additional measures of the
dispersion of the data, i.e. range, interquartile range, variance, standard deviation, and
coefficient of variation. Also, the standard deviation, as a measure of dispersion, can be
viewed as a measure of risk, specifically in the case of making investments in the stock market.
The smaller the value of the standard deviation, the smaller the risk.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to

Calculate some measures of dispersion;


Think of the strengths and limitations of these measures; and
Provide a sound interpretation of these measures.

LESSON OUTLINE:
1. Introduction: The Case of the Returns on Stocks
2. Absolute Measures of Dispersion: Range, Interquartile Range, Variance and Standard
Deviation
3. Relative Measure of Dispersion: Coefficient of Variation

DEVELOPMENT OF THE LESSON


A. Introduction: The Case of the Returns on Stocks.
To introduce this lesson, tell the students the importance of thinking about their future, of
saving, and of wealth generation. Explain that a number of people invest money into the
stock market as an alternative financial instrument to generate wealth from savings.
Explanatory Note: Stocks are shares of ownership in a company. When people buy stocks
they become part owners of the company, whether in terms of profits or losses of the
company.

Mention to students that the history of performance of a particular stock may be a useful guide
to what may be expected of its performance in the foreseeable future. This is, of course, a very
big assumption, but we have to assume it anyway.
Provide the following data to students representing the rates of return for two stocks, which
we will call Stock A and Stock B.

Year Stock A Stock B Year Stock A Stock B


2005 0.081 0.214 2010 0.241 0.081
2006 0.231 0.193 2011 0.193 0.181
2007 0.214 0.132 2012 0.133 0.230
2008 0.214 0.073 2013 0.071 0.214
2009 0.181 0.066 2014 0.066 0.241

Inform students that the rate of return is defined as the increase in value of the portfolio
(including any dividends or other distributions) during the year divided by its value at the
beginning of the year. For instance, if the parents of Juana dela Cruz invest 50,000 pesos in
a stock at the beginning of the year, and the value of the stock goes up to 60,000 pesos, thus
having an increase in value of 10,000 pesos, then the rate of return here is 10,000/50,000 =
0.20.
Explain to students that the rate of return may be positive or negative. It represents the
fraction by which your wealth would have changed had it been invested in that particular
combination of securities.
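The definition above can be written as a short computation. The sketch below is illustrative
only: the function name `rate_of_return`, its parameter names, and the second scenario (a fall
to 45,000 pesos) are assumptions made for this example and are not part of the lesson.

```python
def rate_of_return(value_at_start, value_at_end, distributions=0.0):
    """Increase in value (plus any dividends or other distributions) over the starting value."""
    increase = (value_at_end - value_at_start) + distributions
    return increase / value_at_start


# Example from the lesson: 50,000 pesos grows to 60,000 pesos within the year.
gain_example = rate_of_return(50_000, 60_000)

# Hypothetical loss, to show that the rate of return can be negative.
loss_example = rate_of_return(50_000, 45_000)

print(f"{gain_example:.2f}")
print(f"{loss_example:.2f}")
```

A negative result simply means that the value at the end of the year is lower than the value
at the beginning.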
Now, let us compute some measures of location that we learned in previous lessons to
describe the data given above. You could ask the students to do this as an assessment
of what they have already learned. It could be done by recitation or through a quiz. Below is a
summary of the computed values as well as a graphical presentation of the rates of return of
Stocks A and B.

           Maximum   Minimum   Mean     Median   Mode
Stock A    0.241     0.066     0.1625   0.187    0.214
Stock B    0.241     0.066     0.1625   0.187    0.214

[Figure: Line graph of the rates of return of Stock A and Stock B by year, 2005 to 2014.
Vertical axis from 0 to 0.3 in steps of 0.05.]

Notice that there are no differences in the computed summary statistics, but the trend and
actual values of the rates of return for the two stocks are different, as depicted in the line
graph. This observation tells us that it is not enough to simply use measures of location to
describe a data set. We need additional measures, such as measures of variation or dispersion,
to describe further the data sets.
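As a quick check of the summary table, the following sketch recomputes the five measures of
location for both stocks with Python's standard `statistics` module. The helper name
`location_summary` is an assumption made for this illustration.

```python
from statistics import mean, median, mode

# Rates of return for 2005 to 2014, copied from the table above
stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
stock_b = [0.214, 0.193, 0.132, 0.073, 0.066, 0.081, 0.181, 0.230, 0.214, 0.241]


def location_summary(returns):
    """Maximum, minimum, mean, median, and mode of a list of rates of return."""
    return {
        "maximum": max(returns),
        "minimum": min(returns),
        "mean": round(mean(returns), 4),
        "median": round(median(returns), 4),
        "mode": mode(returns),
    }


summary_a = location_summary(stock_a)
summary_b = location_summary(stock_b)

print(summary_a)
print(summary_a == summary_b)  # the two summaries are identical
```

Although the two lists contain different values in a different order, every measure of
location coincides, which is why a measure of dispersion is needed to tell the stocks apart.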
In particular, summary measures of variability (such as the range and the standard deviation)
of the rates of return are used to measure the risk associated with an investment. We could use
measures of variation to decide whether it would make any difference if we invest
wholly in Stock A, wholly in Stock B, or half of our investments in Stock A and another half
in Stock B. In general, there is higher risk in investing if the rate of return fluctuates much,
that is, if there is high variability in its historical values. Thus, other things being equal, we
prefer the investment whose rates of return have the smaller measure of dispersion.
There are two types of measures of variability or dispersion. One type is the absolute
measure, which includes the range, interquartile range, variance, and standard deviation.
An absolute measure of dispersion provides a measure of variability of observations or values
within a data set. The other type, the relative measure of dispersion, is used to compare
the variability of data sets of different variables or of variables measured in different units
of measurement. The coefficient of variation is a relative measure of variability.

B. Absolute Measures of Dispersion: Range, Interquartile Range, Variance, and Standard Deviation

The range is a simple measure of variation defined as the difference between the maximum
and minimum values. The range depends on the extremes; it ignores information about what
goes in between the smallest (minimum) and largest (maximum) values in a data set. The
larger the range, the larger is the dispersion of the data set. We already encountered the range
in a previous lesson, where we discussed the construction of a frequency distribution table (FDT).

Using the data on the scores of 150 Grade 11 students of a nearby Senior High School on a
50-item long test, we could demonstrate the computation of these measures.

Score in a Long Test   Number of Students   Less-than Cumulative Frequency (< CF)
10                     4                    4
16                     5                    9
18                     5                    14
20                     15                   29
25                     19                   48
30                     22                   70
33                     18                   88
38                     28                   116
40                     10                   126
42                     7                    133
45                     8                    141
50                     9                    150

In the above data, the maximum is 50 and the minimum is 10, hence the range is 40. But note
that, because the range depends only on the two extreme values, it is easily affected by
them. Because of this property, another measure, the interquartile range or IQR, is often
used instead.
The interquartile range or IQR is the difference between the 3rd and the 1st quartiles. Hence,
it gives you the spread of the middle 50% of the data set. Like the range, the higher the value
of the IQR, the larger is the dispersion of the data set. Based on the computations we did in
the previous lesson, the 3rd quartile or Q3 is the 113th observation and is equal to 38 while Q1
or P25 is the 38th observation and is equal to 25. Hence, IQR = Q3 - Q1 = 38 - 25 = 13.
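The range and IQR quoted above can be reproduced from the frequency table. The sketch below
is one way to do it, under the assumption that a quartile is read off as the observation in
position ceil(N x p) of the ordered data; this rule was inferred from the 38th and 113th
positions cited in the text and is not stated in the lesson. The names `score_frequencies`
and `kth_observation` are likewise assumptions for this illustration.

```python
import math

# Score -> number of students, from the frequency table above
score_frequencies = {10: 4, 16: 5, 18: 5, 20: 15, 25: 19, 30: 22,
                     33: 18, 38: 28, 40: 10, 42: 7, 45: 8, 50: 9}

# Expand the table into the ordered list of all 150 individual scores.
scores = sorted(score for score, count in score_frequencies.items()
                for _ in range(count))


def kth_observation(sorted_values, fraction):
    """Return (position, value) of the ceil(N * fraction)-th ordered observation."""
    position = math.ceil(fraction * len(sorted_values))  # 1-based position
    return position, sorted_values[position - 1]


data_range = scores[-1] - scores[0]
q1_position, q1 = kth_observation(scores, 0.25)
q3_position, q3 = kth_observation(scores, 0.75)
iqr = q3 - q1

print(len(scores), data_range)
print(q1_position, q1, q3_position, q3, iqr)
```

Other quartile conventions (for example, those that interpolate between neighboring
observations) can give slightly different values for other data sets.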
Recall with the students the following property of the mean: when the deviation (difference)
of each observation from the mean is obtained and summed over all the observations, the sum is
equal to zero. We said that this property shows that the deviations of the observations from
the mean cancel out, indicating that the mean is indeed the center of the distribution. What if
we square each difference before we get the sum and use it to measure the spread of
observations? Doing it in our example, we have the following table:

Score in a        di = xi - μ             di²     Number of        di² fi
Long Test (xi)    (μ rounded off to 32)           Students (fi)
10                10 - 32 = -22           484     4                1936
16                16 - 32 = -16           256     5                1280
18                18 - 32 = -14           196     5                980
20                20 - 32 = -12           144     15               2160
25                25 - 32 = -7            49      19               931
30                30 - 32 = -2            4       22               88
33                33 - 32 = 1             1       18               18
38                38 - 32 = 6             36      28               1008
40                40 - 32 = 8             64      10               640
42                42 - 32 = 10            100     7                700
45                45 - 32 = 13            169     8                1352
50                50 - 32 = 18            324     9                2916
                                                           Sum =   14009

So what we did is, for each unique observation, subtract the mean (we refer to the
difference as di), square the difference, and sum it for all observations. Note that in the table
we have to multiply the square of the difference by the number of students to account for
all observations. We then divide the sum by the total number of observations, denoted by N.
Summarizing these steps in a formula, we have $\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$. We
usually denote this expression as $\sigma^2$ and call it the variance. Thus in this example,
$\sigma^2 = 14009/150 = 93.39$. For ease in computation, instead of
$\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, we use an equivalent expression
$\frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$. When applied to our example, we have
$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 = \frac{168{,}057}{150} - 32.04667^2 \approx 93.39$
(rounded off).
Variance is a measure of dispersion that accounts for the average squared deviation of each
observation from the mean. Since we square the difference of each observation from the
mean, the unit of measurement of the variance is the square of the unit used in measuring
each observation. This property makes the variance a little bit problematic to interpret. For
example, point² or kilogram² is more difficult to interpret than inches².
Hence, instead of the variance, the standard deviation is computed, which is the positive
square root of the variance, that is, $\sigma = \sqrt{\sigma^2}$. In the example,
$\sigma = \sqrt{93.3933} = 9.6640$. To interpret, we say that on the average, the scores of the
students deviate from the mean score of 32 points by as much as 9.6640 or approximately 10 points.
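The variance and standard deviation computations above can be retraced step by step. The
sketch below is illustrative only; the variable names are assumptions. It reproduces the
table total of 14009 (which uses the mean rounded off to 32), and then evaluates both the
defining formula and the shortcut formula with the unrounded mean to show that they agree.

```python
from math import sqrt

# Score -> number of students, from the frequency table of the long test
score_frequencies = {10: 4, 16: 5, 18: 5, 20: 15, 25: 19, 30: 22,
                     33: 18, 38: 28, 40: 10, 42: 7, 45: 8, 50: 9}

n = sum(score_frequencies.values())
total = sum(x * f for x, f in score_frequencies.items())
mu = total / n  # unrounded mean

# Sum of squared deviations using the mean rounded off to 32, as in the table.
sum_sq_dev_table = sum(f * (x - 32) ** 2 for x, f in score_frequencies.items())

# Definition: average squared deviation from the (unrounded) mean.
variance_definition = sum(f * (x - mu) ** 2
                          for x, f in score_frequencies.items()) / n

# Shortcut: mean of the squared scores minus the square of the mean.
sum_of_squares = sum(f * x ** 2 for x, f in score_frequencies.items())
variance_shortcut = sum_of_squares / n - mu ** 2

sigma = sqrt(variance_definition)

print(n, total, sum_sq_dev_table, sum_of_squares)
print(round(variance_definition, 2), round(variance_shortcut, 2), round(sigma, 2))
```

Using the rounded mean (32) or the unrounded mean (about 32.04667) changes the variance only
in the third decimal place, so both routes give 93.39 when rounded off.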
If all the observations are equal to a constant, then the mean is that constant, and the measure
of variation is zero. Furthermore, if for a given data set, the variance and standard deviation
turn out to be zero, then all the deviations from the average must be zero, which means that
all observations are equal. Note that if a data set were rescaled, that is, if the observations
were multiplied by some constant, then the standard deviation of the new data set is merely
the absolute value of the scaling factor multiplied by the standard deviation of the original
data set.
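The rescaling property follows directly from the definition of the variance. The derivation
below is a sketch in which the rescaled observations are written as $y_i = c\,x_i$; the
symbols $y_i$, $c$, $\mu_y$, and $\sigma_y$ are introduced here only for illustration and do
not appear elsewhere in the lesson.

```latex
\[
\mu_y = \frac{1}{N}\sum_{i=1}^{N} c\,x_i = c\,\mu,
\qquad
\sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N} (c\,x_i - c\,\mu)^2
           = c^2 \cdot \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
           = c^2\,\sigma^2,
\qquad
\sigma_y = \sqrt{c^2\,\sigma^2} = |c|\,\sigma .
\]
```

The absolute value appears because a standard deviation cannot be negative, even when the
scaling factor is.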

The variance and standard deviation are based on all the observations in the data set,
and each observation is given a proper weight. They are extremely useful measures of variability
as they measure the average scattering of the data around the mean, that is, how much the data
fluctuate above and below the mean. The variance and standard deviation increase with an
increase in the deviations about the mean, and decrease with decreases in these deviations. A
small standard deviation (and variance) means a high degree of uniformity in the
observations and of homogeneity in a series.

The variance is the most suitable for algebraic manipulations but, as was pointed out earlier,
its value is in squared units of measurement. On the other hand, the standard deviation has
the same unit of measure as the observations. Thus, the standard deviation serves as the
primary measure of variation, just as the mean is the primary measure of central location.

Let us go back to the motivating example on the two stocks, A and B.
Both stocks have the same expected return as measured by the mean. However, the standard
deviation of the rates of return for Stock A is 0.0688 while that for Stock B is 0.0685,
indicating that Stock A has higher risk compared to Stock B, although the difference is not
that large.
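The two standard deviations quoted above can be checked with the standard `statistics`
module. One caveat, noted here as an observation rather than something the lesson states:
the values 0.0688 and 0.0685 are reproduced by `statistics.stdev`, which divides by n - 1,
whereas the formula developed earlier in this lesson divides by N (`statistics.pstdev`). The
sketch prints both so the difference is visible.

```python
from statistics import pstdev, stdev

# Rates of return for 2005 to 2014, copied from the table in the introduction
stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
stock_b = [0.214, 0.193, 0.132, 0.073, 0.066, 0.081, 0.181, 0.230, 0.214, 0.241]

# Divisor n - 1: matches the values quoted in the text.
sd_a = stdev(stock_a)
sd_b = stdev(stock_b)

# Divisor N: matches the variance formula developed in this lesson.
population_sd_a = pstdev(stock_a)
population_sd_b = pstdev(stock_b)

print(round(sd_a, 4), round(sd_b, 4))
print(round(population_sd_a, 4), round(population_sd_b, 4))
```

With either divisor, Stock A comes out slightly more variable than Stock B, so the
conclusion about relative risk does not depend on this choice.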

C. Relative Measure of Dispersion: Coefficient of Variation

To compare variability between or among different data sets, that is, when the data sets are for
different variables or for the same variable but measured in different units of measurement, the
coefficient of variation (CV) is used as a measure of relative dispersion. It is usually expressed
as a percentage and is computed as $CV = \frac{\sigma}{\mu} \times 100\%$. CV is a measure of
dispersion relative to the mean of the data set. With $\sigma$ and $\mu$ having the same unit of
measurement, CV is unitless, that is, it does not depend on the unit of measurement. Hence, it is
used to compare the variability across different data sets.

As an example, the CV of the scores of the students in the long test is computed as
$CV = \frac{\sigma}{\mu} \times 100\% = \frac{9.6640}{32.04667} \times 100\% = 30.16\%$ while the
CV of the rates of return of Stock A is $CV = \frac{0.0688}{0.1625} \times 100\% = 42.34\%$. Thus,
we say the rate of return of Stock A is more variable than the scores of the students in the test.
Here, we used the CV to compare the variability of two different data sets.
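The two CV values above follow from a single division each. The sketch below simply repeats
that arithmetic; the function name `coefficient_of_variation` is an assumption for this
illustration, and the inputs are the rounded standard deviations and means quoted in the
lesson.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation relative to the mean, expressed as a percentage."""
    return standard_deviation / mean * 100


cv_scores = coefficient_of_variation(9.6640, 32.04667)  # long-test scores
cv_stock_a = coefficient_of_variation(0.0688, 0.1625)   # rates of return of Stock A

print(round(cv_scores, 2), round(cv_stock_a, 2))
print(cv_stock_a > cv_scores)
```

Because the units cancel in the division, the two percentages can be compared directly even
though one data set is in points and the other is a rate.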

KEY POINTS
Measures of dispersion are used to further describe the distribution of a data set.
Absolute measures of variation include range, interquartile range, variance and standard
deviation.
A relative measure of dispersion is provided by the coefficient of variation.


ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

1. Three friends, Gerald, Carmina, and Rodolfo, are planning their business of selling
homemade peanut butter. They start the planning by doing a market study where they
obtained the prices (in pesos) of a 250-gram jar of several known brands of peanut butter.
Below is the data set they have collected:

100.80   197.60   158.00   131.60   184.40   149.20
136.00   109.60   360.40   122.80   131.60

After studying the data, Gerald said, "The prices of peanut butter are pretty similar. The
range is only PhP 30.80." Carmina said, "You are mistaken! The prices are very different.
The range is PhP 259.60." Rodolfo said, "I think you are both mistaken. The range isn't a
useful measure to describe the variation of the data set."

a. Explain what you think is the basis used by each person in support of their claims.
(Gerald did not arrange the data set from smallest to largest, and erroneously
subtracted the first value (100.80) from the last value (131.60) in the data set. Carmina
found the range correctly by subtracting the smallest value (100.80) from the largest
value (360.40). Rodolfo noticed that the maximum, 360.40, is an outlier. As a result, the
computed range of PhP 259.60 describes the variation of the observations poorly, as it
was unduly increased by the extreme value.)

b. Who should we agree with? Why?


(We can agree with both Carmina and Rodolfo. Carmina correctly calculated the
range; Rodolfo intelligently observed that while Carmina was correct in her calculation,
the range is not very useful in describing the variability of the observations, as the range
would only be PHP 96.80 if the outlier were removed from the data set.)

2. Three hundred students taking a basic course in Statistics are given the same final
examination. After checking the papers, and while studying the distribution of
the final examination scores, the professor thought of several scenarios, which are described below:

a. Suppose the professor will give 30% weight to the final examination. What effect would
multiplying all the final scores by 30% have on the mean of the final exam scores? On the
standard deviation of the final exam scores?
(The mean will also be multiplied by 30%, and so will the standard deviation.)
b. Suppose the professor wants to bloat the final examination scores. What will be the effect
on the mean of the final exam scores if 5 points are added to each final score? On
the standard deviation of the final exam scores?
(The mean will also go up by 5 points, while the standard deviation stays the same.)

3. In a fitness center, the weights of a certain group of students were taken, and all of them
were found to have a common weight of 140 pounds. What would be the standard deviation of the
distribution of weights?
(Zero, since the observations do not vary.)

4. Determine whether each of the following statements is TRUE or FALSE. Explain your answer
briefly.

a. If each observation in a data set is doubled, then the standard deviation would also be
doubled.
(True, since the variance would be quadrupled, and taking the square root of the
resulting variance will result in twice the standard deviation.)

b. If in a set of data, positive numbers are changed to negative, while negative are
changed to positive, then the standard deviation changes its sign as well.
(False, since the standard deviation is always nonnegative. In fact, it stays the same,
because reversing every sign is the same as multiplying each observation by -1.)

Explanatory Note:
Teachers have the option to ask this assessment orally to the entire class to either introduce or
recall the notions of computing the range and of computing the standard deviation, or to group
students and ask them to identify answers, or to give this as homework, or to use some
questions/items here for a chapter examination.
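The answers to items 1, 2, and 4 can be checked numerically. The sketch below uses the
peanut butter prices from item 1; the lists `exam_scores` and `mixed_values` are hypothetical
data invented for this illustration, since items 2 and 4 do not give actual values.

```python
from math import isclose
from statistics import mean, pstdev

# Item 1: prices (in pesos) in the order they were listed
prices = [100.80, 197.60, 158.00, 131.60, 184.40, 149.20,
          136.00, 109.60, 360.40, 122.80, 131.60]

gerald_range = round(prices[-1] - prices[0], 2)      # last minus first, unsorted
carmina_range = round(max(prices) - min(prices), 2)  # largest minus smallest
without_outlier = [p for p in prices if p != max(prices)]
range_without_outlier = round(max(without_outlier) - min(without_outlier), 2)

# Items 2 and 4a: hypothetical exam scores
exam_scores = [62, 75, 81, 90, 58, 77]
weighted = [0.30 * s for s in exam_scores]  # item 2a
bumped = [s + 5 for s in exam_scores]       # item 2b
doubled = [2 * s for s in exam_scores]      # item 4a

# Item 4b: hypothetical data with both signs, then every sign reversed
mixed_values = [-3, 4, -1, 7, 2]
flipped = [-v for v in mixed_values]

print(gerald_range, carmina_range, range_without_outlier)
print(isclose(pstdev(weighted), 0.30 * pstdev(exam_scores)))
print(isclose(pstdev(bumped), pstdev(exam_scores)))
print(isclose(pstdev(doubled), 2 * pstdev(exam_scores)))
print(isclose(pstdev(flipped), pstdev(mixed_values)))
```

The same comparisons hold if `stdev` (divisor n - 1) is used in place of `pstdev`.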

Chapter 1: Exploring Data
Lesson 9: More on Describing Data: Summary Measures and Graphs

TIME FRAME: 1 hour session

OVERVIEW OF LESSON: In this lesson, students will do an activity that will use the data on
heights and weights which were collected in Lesson 2. They will construct box plots and
calculate the summary measures they have learned in previous lessons. These computed
summary measures and constructed box plots will be used to describe the data set fully so as to
provide a simple analysis of the data at hand.

LEARNING OUTCOME(S): At the end of the lesson, the learner is able to


Construct and interpret box plots; and
Provide simple analysis of a data set based on its descriptive measures.

LESSON OUTLINE:
1. Preliminaries: Teacher's Preparation for the Lesson
2. Motivation: The Students' Height and Weight and Corresponding BMI
3. Construction and Interpretation of a Box-plot

DEVELOPMENT OF THE LESSON


A. Preliminaries: Teacher's Preparation for the Lesson
Note: This is an activity that the teacher has to do in preparation for the lesson.
A day before the actual schedule for this lesson, you should review some information about the
body mass index (BMI) so that you could compute the BMI of each student in the class based on
the students' weights and heights collected in Lesson 2. This will also make you more confident
in discussing BMI in the class as well as in using it to integrate the lessons learned in this
chapter. The following discussion provides useful information about BMI.
The BMI, devised by Adolphe Quetelet, is defined as the body mass divided by the square of the
body height, and is universally expressed in units of kg/m², using weight in kilograms and height
in meters. When the term BMI is used informally, the units are usually omitted. A high BMI can
be an indicator of high body fatness. The BMI can be used to screen for weight categories that
may lead to health problems.
The BMI provides a simple numeric measure of a person's thickness or thinness, allowing
medical and health professionals to discuss weight problems more objectively with their adult
patients. The standard weight status categories associated with BMI ranges for adults are listed
below:

BMI Range        Weight Status              Health Risk
Below 18.5       Underweight                Risk of developing problems such as nutritional
                                            deficiency and osteoporosis
18.5 - 22.9      Normal or Healthy Weight   Low risk (healthy range)
23.0 - 27.4      Overweight                 Moderate risk of developing heart disease, high
                                            blood pressure, stroke, diabetes
27.5 and above   Obese                      High risk of developing heart disease, high blood
                                            pressure, stroke, diabetes

For adults, a BMI from 18.5 up to 23 indicates optimal weight, while a BMI lower than 18.5
suggests that the person is underweight, a number from 23 up to 27.5 indicates that the person is
overweight, and a number from 27.5 upwards suggests the person is obese. Note that the thresholds
23 and 27.5 are used for Southeast Asians, as suggested by the World Health Organization
(WHO), though generally 25 and 30 are used.
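The adult categories in the table above can be expressed as a simple lookup. The sketch below
is illustrative only: the function name `weight_status` is an assumption, and so is the
treatment of boundary values, since the table leaves small gaps between ranges (for example,
between 22.9 and 23.0). Here each category is taken to start at its lower threshold and to
run up to, but not include, the next one.

```python
def weight_status(bmi):
    """Adult weight status using the thresholds 18.5, 23.0, and 27.5 from the table."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 23.0:
        return "Normal or Healthy Weight"
    if bmi < 27.5:
        return "Overweight"
    return "Obese"


for example_bmi in (17, 21, 25, 31):
    print(example_bmi, weight_status(example_bmi))
```

As the special notes that follow explain, these fixed thresholds apply to adults; for
children and teens the BMI is compared against age- and sex-specific percentiles instead.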
Special Notes about interpreting BMI:
1. Many but not all athletes have a high muscle to fat ratio and may have a BMI that is
misleadingly high relative to their body fat percentage. Exceptions also can be made for the
elderly, and the infirm.

2. For children and teens, the interpretation of BMI depends upon age and sex, even though it is
computed using the same formula. This difference in interpretation is due to the variability in
the amount of body fat with age and between girls and boys, among children and teens.
Instead of comparison against fixed thresholds for underweight and overweight, the BMI is
compared against the percentile for children of the same gender and age. A BMI that is less
than the 5th percentile is considered underweight and above the 95th percentile is considered
obese. Children with a BMI between the 85th and 95th percentile are considered to be
overweight.

3. The following are other limitations in the interpretation of BMI.


a. Since the BMI depends upon weight and the square of height, it ignores the basic scaling
law which states that mass increases with the 3rd power of linear dimensions. Thus, taller
individuals, even if they had exactly the same body shape and relative composition,
always have a larger BMI.

b. BMI also does not account for body frame size; a person may have a small frame and be
carrying more fat than optimal, but the BMI may suggest that the person is normal.
A large-framed individual may be quite healthy with a fairly low body fat percentage, but
the BMI may yield an overweight classification.

In the Philippines, the government's Food and Nutrition Research Institute (FNRI) of the
Department of Science and Technology collects anthropometric data through the National
Nutrition Survey (NNS) to generate estimates of the extent of child malnutrition using
three indicators of undernutrition: underweight, wasted, and stunted. The NNS is conducted every
five years, and the nutritional status of Filipinos is assessed based on the gathered weights
and heights.

A Filipino child whose weight is more than three standard deviations below the median
weight-for-age of the child growth standards is said to be severely underweight, while a child
whose weight is more than two but not more than three standard deviations below the median is
moderately underweight. Similarly, (moderate and severe) wasting and stunting are
defined in terms of the child growth standards on weight-for-height and height-for-age,
respectively. Using these standards, FNRI estimated, based on the 2013 NNS, that about one in five
children aged 0 to 5 years were underweight and about three in ten had stunted growth. Wasting,
or low weight-for-height, was estimated at 7.9 percent.

It was also reported that the incidence of malnutrition was high among the poorest 20
percent of families: underweight (29.8 percent), stunting (44.8 percent), and wasting (9.5
percent). Malnutrition is thus related to poverty. The percentage of overweight children was
highest among the "wealthiest" (10.7 percent). The figure below shows the trends in the
prevalence of stunting, underweight, and wasting from 1989 to 2013 based on the data gathered by
FNRI through its NNS.
[Figure: Line graph with one line each for stunting, underweight, and wasting. Vertical axis
marked from 10 to 50; horizontal axis: Year, marked from 1990 to 2015.]

Figure 1. Prevalence of stunting, underweight, and wasting among preschoolers 0-5 years old
in the Philippines, 1989-2013.

When children under five are experiencing malnutrition, they are likely to carry this over to early
childhood, which has repercussions on learning achievements in school. In consequence,
government, through the Department of Social Welfare and Development, as well as the
Department of Education (DepED), has developed feeding programs to reduce hunger, to aid in
the development of children, to improve nutritional status and promote good health, as well
as to reduce inequities by encouraging families to send their children to school given the
incentive of school feeding benefits. School records of heights and weights are thus regularly
collected by DepED at the beginning and end of the school year to monitor the nutrition of
school-aged children.
With this information and the class data gathered in Lesson 2, you are now to compute the BMI
of each student so that a table with the following format will be ready for the group activity
described in the next section.

Class Student Number   Sex   Height (in meters)   Weight (in kilograms)   BMI (rounded off to whole numbers)

Note that the heights of the students collected in Lesson 2 are in centimeters, thus you have to
divide the values by 100 to get the values in meters. Also, BMI is rounded off to whole numbers
for ease of computation in the group activity.
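Preparing the BMI column amounts to one conversion and one division per student. The sketch
below shows the computation under the assumption that heights are recorded in centimeters and
weights in kilograms, as described above; the function name `body_mass_index` and the list
`sample_students` are assumptions for this illustration, with the three (height, weight)
pairs taken from the sample class data used later in this lesson.

```python
def body_mass_index(weight_kg, height_cm):
    """BMI in kg/m^2, rounded off to a whole number."""
    height_m = height_cm / 100  # convert centimeters to meters
    return round(weight_kg / height_m ** 2)


# (height in cm, weight in kg) for students 1, 5, and 26 of the sample class data
sample_students = [(164, 40), (102, 60), (177, 27)]
sample_bmis = [body_mass_index(weight, height) for height, weight in sample_students]

print(sample_bmis)
```

The same computation can be done in a spreadsheet by adding a column that divides the weight
by the square of the height in meters.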

B. Motivation: The Students' Height and Weight and Corresponding BMI

The activities for this lesson are to be done by groups and will be conducted during the entire
class period. Hence, it is recommended that the grouping be done at the start of the class and
that the group members sit together in a circle, as the activity requires group discussion. The
students should be advised to stay in their groups for the entire class period.

A suggested way to group the students into three groups is to have them count 1-2-3 sequentially;
students with the same number will belong to the same group. Once they are seated together as
a group, you could begin the lesson by asking the students if they think that males and females
have the same heights, weights, and BMI. Have them guess what the distribution of heights,
weights, and BMI might look like for the whole class and whether the distribution of heights,
weights, and BMI for males and females would be the same.

The following are some possible questions to ask:


Are the heights, weights, and BMI of males and females the same or different?
What are some other factors besides sex that might affect heights, weights and BMI?
(Possible factors that could be studied are age, location where person resides, and year the
data was collected.)

You could write these questions on the board so that the students will be reminded of them
while they perform the group activity. Assign the first group (those students who were
numbered "1") the variable height; the second group (those students who were numbered
"2") the variable weight; and the third group (those students who were numbered "3") the
variable BMI. You will be using the class data you prepared in the preliminary activity for this
lesson. The following table provides sample data showing what your class data should look like.
Class Student Number   Sex   Height (in meters)   Weight (in kilograms)   BMI (rounded off to whole numbers)
1 F 1.64 40 15
2 F 1.52 50 22
3 F 1.52 49 21
4 F 1.65 45 17
5 F 1.02 60 58
6 F 1.63 45 17
7 F 1.50 38 17
8 F 1.60 51 20
9 F 1.42 42 21
10 F 1.52 54 23
11 F 1.48 46 21
12 F 1.62 54 21
13 F 1.50 36 16
14 F 1.54 50 21
15 F 1.67 63 23
16 M 1.72 55 19
17 M 1.65 61 22
18 M 1.56 60 25
19 M 1.50 52 23
20 M 1.70 90 31
21 M 1.53 50 21
22 M 1.62 90 34
23 M 1.79 80 25
24 M 1.57 58 24
25 M 1.70 68 24
26 M 1.77 27 9
27 M 1.48 50 23
28 M 1.73 94 31
29 M 1.56 66 27
30 M 1.75 50 16

With the class data, ask each group to do the following for the assigned variable in their group:
1. Compute the descriptive measures for the whole class and also for each subgroup in the data
set with sex as the grouping variable. The descriptive measures to compute include the
measures of location such as the minimum, maximum, mean, median, and first and third quartiles;
and measures of dispersion such as the range, interquartile range (IQR), and standard deviation.
Each group could use the following format of the table to present the computed measures:

Table 9.1 Summary statistics of the variable __________.
Descriptive Measure        Computed Value
                           For the whole class   For the subgroup of   For the subgroup of
                           with N = ___          Males with N = ___    Females with N = ___
Measures of Location
  Minimum
  Maximum
  Mean
  First Quartile
  Median
  Third Quartile
Measures of Dispersion
  Range
  IQR
  Standard Deviation

2. With the computed descriptive measures, write a textual presentation of the data for the
variable assigned to the group.

The following tables provide the descriptive measures of the sample class data as a whole and by
subgroup. Note that there might be discrepancies in the computed values due to rounding off.

Table 9.2 Summary statistics of the variable height (in meters) using the sample data.
Descriptive Measure        Computed Value
                           For the whole class   For the subgroup of   For the subgroup of
                           with N = 30           Males with N = 15     Females with N = 15
Measures of Location
  Minimum                  1.020                 1.480                 1.020
  Maximum                  1.790                 1.790                 1.670
  Mean                     1.582                 1.642                 1.522
  First Quartile           1.520                 1.560                 1.500
  Median                   1.585                 1.650                 1.520
  Third Quartile           1.670                 1.730                 1.630
Measures of Dispersion
  Range                    0.770                 0.310                 0.650
  IQR                      0.150                 0.170                 0.130
  Standard Deviation       0.144                 0.103                 0.157
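For teachers who want to check the entries of Table 9.2, or to prepare an answer key for
their own class data, the sketch below recomputes every row from the sample heights. Two
conventions are assumed because they reproduce the tabulated values, not because the lesson
states them: quartiles are read off as the observation in position ceil(N x p) of the ordered
data, and the standard deviation uses the divisor n - 1 (`statistics.stdev`). The helper
names `quartile` and `describe` are also assumptions.

```python
import math
from statistics import mean, median, stdev

# (sex, height in meters) for the 30 students in the sample class data
heights = [
    ("F", 1.64), ("F", 1.52), ("F", 1.52), ("F", 1.65), ("F", 1.02),
    ("F", 1.63), ("F", 1.50), ("F", 1.60), ("F", 1.42), ("F", 1.52),
    ("F", 1.48), ("F", 1.62), ("F", 1.50), ("F", 1.54), ("F", 1.67),
    ("M", 1.72), ("M", 1.65), ("M", 1.56), ("M", 1.50), ("M", 1.70),
    ("M", 1.53), ("M", 1.62), ("M", 1.79), ("M", 1.57), ("M", 1.70),
    ("M", 1.77), ("M", 1.48), ("M", 1.73), ("M", 1.56), ("M", 1.75),
]


def quartile(sorted_values, fraction):
    """Observation in position ceil(N * fraction) of the ordered data (assumed rule)."""
    position = math.ceil(fraction * len(sorted_values))
    return sorted_values[position - 1]


def describe(values):
    """Measures of location and dispersion, rounded to three decimal places."""
    ordered = sorted(values)
    q1, q3 = quartile(ordered, 0.25), quartile(ordered, 0.75)
    return {
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "mean": round(mean(ordered), 3),
        "first quartile": q1,
        "median": round(median(ordered), 3),
        "third quartile": q3,
        "range": round(ordered[-1] - ordered[0], 3),
        "iqr": round(q3 - q1, 3),
        "standard deviation": round(stdev(ordered), 3),  # divisor n - 1
    }


by_group = {
    "whole class": describe([h for _, h in heights]),
    "males": describe([h for sex, h in heights if sex == "M"]),
    "females": describe([h for sex, h in heights if sex == "F"]),
}

for group, measures in by_group.items():
    print(group, measures)
```

Replacing the heights with the weights or BMIs of the sample data gives the corresponding
rows for the other two variables under the same two assumptions.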

Possible textual presentation of the data on heights:

Based on Table 9.2, on the average, a student of this class is 1.582 meters tall. The shortest
student is just a little bit over one meter while the tallest is 1.79 meters tall, resulting in a
range of 0.77 meter. The median, which is 1.585 meters, is almost the same as the mean height.

Comparing the male and female students, on the average, male students are taller than female
students, but the dispersion of the heights of the female students is wider compared to that of
the male students. Thus, the male students of this class tend to be more similar in height than
the female students.

Table 9.3 Summary statistics of the variable weight (in kilograms) using the sample data.
Descriptive Measure        Computed Value
                           For the whole class   For the subgroup of   For the subgroup of
                           with N = 30           Males with N = 15     Females with N = 15
Measures of Location
  Minimum                  27.0                  27.0                  36.0
  Maximum                  94.0                  94.0                  63.0
  Mean                     55.8                  63.4                  48.2
  First Quartile           46.0                  50.0                  42.0
  Median                   51.5                  60.0                  49.0
  Third Quartile           61.0                  80.0                  54.0
Measures of Dispersion
  Range                    67.0                  67.0                  27.0
  IQR                      15.0                  30.0                  12.0
  Standard Deviation       15.9                  18.4                  7.7

Possible textual presentation of the data on weights:

Using the statistics in Table 9.3, on the average, a student of this class weighs 55.8 kilograms.
The minimum weight of the students in this class is only 27 kilograms while the heaviest student
of this class weighs 94 kilograms. There is a wide variation among the values of the weights of
the students in this class as measured by the range, which is equal to 67 kilograms. The median
weight for this class is 51.5 kilograms, which is quite different from the mean, as the value of
the latter was pulled by the presence of extreme values.

Comparing the male and female students, on the average, male students are heavier than female
students. The extreme values observed for the class both come from male students. The
wide variation observed in the students' weights for this class was also observed among the
weights of the male students. In fact, the standard deviation of the weights of the male students
is more than double the standard deviation of the weights of the female students.

Table 9.4 Summary statistics of the variable BMI (in kg/m²) using the sample data.
Descriptive Measure        Computed Value
                           For the whole class   For the subgroup of   For the subgroup of
                           with N = 30           Males with N = 15     Females with N = 15
Measures of Location
  Minimum                  9.0                   9.0                   15.0
  Maximum                  58.0                  34.0                  58.0
  Mean                     22.9                  23.6                  22.2
  First Quartile           19.0                  21.0                  17.0
  Median                   21.5                  24.0                  21.0
  Third Quartile           24.0                  27.0                  22.0
Measures of Dispersion
  Range                    49.0                  25.0                  43.0
  IQR                      5.0                   6.0                   5.0
  Standard Deviation       8.3                   6.2                   10.2

Possible textual presentation of the data on BMIs:

Table 9.4 shows that the minimum BMI of the students in the class is 9 kg/m² while the maximum
is 58 kg/m². On the average, a student of this class has a BMI of 22.9. Also, the median BMI for
this class is 21.5, which is near the value of the mean BMI. The variability of the values is also
not that large, as a standard deviation of 8.3 was obtained.

Comparing the male and female students, on the average, the BMIs of the male and female
students are near each other, with numerical values equal to 23.6 and 22.2, respectively. But
there is a wider variation among the BMI values of the female students compared to that of the
male students. The standard deviation of the BMIs of the male students is less than that of the
female students.

Visual comparison of the data distributions between two or among several groups could be
achieved through box-plots. You may ask the students if they already know how to construct a
box-plot. If so, you may just review the steps with them. Otherwise, you may briefly discuss the
steps in constructing box-plot as given in the next section before you ask them to construct box-
plots for their respective data sets.

C. Construction of a Box-Plot
Using five summary statistics, namely: minimum, maximum, median, first and third quartiles, a
box-plot can be constructed as follows:
1. Draw a rectangular box (horizontally or vertically) with the first and third quartiles as the
endpoints. Thus the width of the box is given by the IQR which is the difference between
the third and first quartiles.

2. Locate the median inside the box and identify it with a line segment.

3. Compute 1.5 x IQR. Use this value to identify markers. These markers are used to
identify outliers. The lowest marker is given by Q1 - 1.5 x IQR while the highest marker is
Q3 + 1.5 x IQR. Values outside these markers are said to be outliers and could be represented
by a solid circle.

4. One of the two whiskers of the box-plot is a line segment joining the side of the box
representing Q1 and the minimum while the other whisker is a line segment joining Q3
and the maximum. This is for the case when the minimum and maximum are not outliers.
In the case that there are outliers, the whiskers will only be line segments from the side of
the box to its corresponding marker.
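The four steps above can be traced numerically before any drawing is done. The sketch below
applies them to the BMI values of the sample class data. It is illustrative only: the helper
names are assumptions, and quartiles are read off as the observation in position ceil(N x p)
of the ordered data, a rule assumed here because it reproduces the quartiles in Table 9.4.

```python
import math
from statistics import median

# BMI values of the sample class data, by sex
bmi_by_sex = {
    "F": [15, 22, 21, 17, 58, 17, 17, 20, 21, 23, 21, 21, 16, 21, 23],
    "M": [19, 22, 25, 23, 31, 21, 34, 25, 24, 24, 9, 23, 31, 27, 16],
}


def quartile(sorted_values, fraction):
    """Observation in position ceil(N * fraction) of the ordered data (assumed rule)."""
    position = math.ceil(fraction * len(sorted_values))
    return sorted_values[position - 1]


def box_plot_summary(values):
    ordered = sorted(values)
    q1, q3 = quartile(ordered, 0.25), quartile(ordered, 0.75)  # step 1: the box
    iqr = q3 - q1
    lower_marker = q1 - 1.5 * iqr                              # step 3: the markers
    upper_marker = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < lower_marker or v > upper_marker]
    return {
        "box": (q1, q3),
        "median": median(ordered),                             # step 2
        "markers": (lower_marker, upper_marker),
        "outliers": outliers,
        # Step 4: a whisker reaches the minimum or maximum unless that value is an
        # outlier, in which case it stops at the corresponding marker.
        "whiskers": (max(ordered[0], lower_marker), min(ordered[-1], upper_marker)),
    }


summaries = {sex: box_plot_summary(values) for sex, values in bmi_by_sex.items()}
for sex, summary in summaries.items():
    print(sex, summary)
```

Statistical software may place the whisker ends at the most extreme observations that are
still inside the markers rather than at the markers themselves, so a software-generated plot
can look slightly different from one drawn by following step 4.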
Inform the students as well that a box-plot is also called a box-and-whiskers plot and that it
could easily be generated using statistical software. Comparison of data distributions could
easily be done visually using this kind of plot. Likewise, in technical papers or reports, a
box-plot is an accepted graphical presentation of a data distribution.
To complete the activity for this lesson, ask each group to construct box-plots of the male
and female data distributions of their assigned variable. They could further improve their
textual presentation by interpreting the resulting box-plots of their data sets.
Using the sample class data, the following figures provide the box-plots for the variables
heights, weights, and BMI by sex of the student. The said figures confirm what was stated in
the textual presentation.

Figure 9.1 Box-plots of the variable heights of the 30 students by sex.


We could also note that in Figure 9.1, the distribution of heights for the girls has a larger range
because of an outlier, represented by a solid circle on the plot. The distribution of the
girls' heights has a smaller median than the distribution of the boys' heights.
Figure 9.2 Box-plots of the variable weights of the 30 students by sex.
For the variable weights, females have a lower median weight than males, as well as less
variability. The middle 50% of the female weight distribution is also observed to be contained
within the range of the male weight data.

Figure 9.3 Box-plots of the variable BMI of the 30 students by sex.

As for the variable BMI, females have a lower median BMI and lower variability compared to
males. There is at least one extremely obese female and one severely underweight male.

With the computed descriptive statistics and corresponding box-plot(s), the analysis or textual
presentation could be further improved by describing data not only in terms of the measures but
also in terms of the interpretation of box-plots. Furthermore, these measures allow us to answer
the guide questions provided at the start of the class.

KEY POINTS

Descriptive measures are important statistics required in simple data analysis.
Groups of data could be compared in terms of their descriptive measures.
A box-plot provides a way to compare data distributions visually.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua,
Welfredo Patungan, Nelia Marquez), published by Rex Bookstore.
Armspans in STatistics Education Web (STEW)
http://www.amstat.org/education/stew/pdfs/Armspans.docx
Deciding Which Measure of Center to Use
http://www.sharemylesson.com/teaching-resource/deciding-which-measure-of-center-to-use-50013703/
Handbook of Statistics 1 (1st and 2nd Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College Laguna 4031
Workbooks in Statistics 1 (From 1st to 13th Edition), Authored by the Faculty of the Institute of
Statistics, UP Los Baños, College Laguna 4031

ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.

In a university, the grading scale used for a subject is as follows: 1.0, 1.25, 1.5, 1.75, 2.0,
2.25, 2.5, 2.75, 3.0, 4.0, and 5.0. Grades from 1.0 to 3.0 are passing grades, with 1.0 as the
highest possible grade. The grade of 5.0 is failing while 4.0 is a conditional grade. At the end
of the semester, the general weighted average (GWA) of each student is computed, and students
with high GWAs are usually recognized. Below is a table showing the GWA and sex of thirty
students who are to be recognized in a program for having high GWAs.

Name         GWA    Sex
Imelda       1.54   F
Frederick    1.45   M
Gerald       1.42   M
Jose         1.52   M
Ana          1.56   F
Isidoro      1.34   M
Roberto      1.36   M
Katherine    1.43   F
Barbara      1.49   F
Josie        1.58   F
Maria        1.64   F
Kenneth      1.56   M
Ofelia       1.56   F
Amparo       1.49   F
James        1.42   M
Ditas        1.24   F
Frenz        1.78   F
Ronald       1.06   M
Ruben        1.33   M
Belle        1.45   F
Elmo         1.38   M
Connie       1.27   F
Gina         1.22   F
Marcia       1.59   F
Jikko        1.60   M
Susan        1.59   F
Emman        1.63   M
Pinky        1.70   F
Rose         1.75   M
Brad         1.58   M

Use the approaches below to compare the academic performance of male and female students in
the previous term.
1. Compute the descriptive measures by sex. These include the measures of location, such as
the minimum, maximum, mean, median, and first and third quartiles, and the measures of
dispersion, such as the range, interquartile range (IQR) and standard deviation.

Descriptive Measure        Computed Value
                           Males (N = 14)    Females (N = 16)
Measures of Location
   Minimum                 1.06              1.22
   Maximum                 1.75              1.78
   Mean                    1.46              1.51
   First Quartile          1.36              1.44
   Median                  1.44              1.55
   Third Quartile          1.58              1.59
Measures of Dispersion
   Range                   0.69              0.56
   IQR                     0.22              0.15
   Standard Deviation      0.17              0.16
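
The values in the table can be checked with a short Python sketch. The function name `describe` is hypothetical, and two conventions are assumed here because they are consistent with the tabulated values: quartiles are the medians of the lower and upper halves of the sorted data, and the standard deviation is the sample standard deviation (divisor n − 1).

```python
from statistics import mean, median, stdev

# GWAs copied from the table of thirty students, split by sex.
males = [1.45, 1.42, 1.52, 1.34, 1.36, 1.56, 1.42,
         1.06, 1.33, 1.38, 1.60, 1.63, 1.75, 1.58]
females = [1.54, 1.56, 1.43, 1.49, 1.58, 1.64, 1.56, 1.49,
           1.24, 1.78, 1.45, 1.27, 1.22, 1.59, 1.59, 1.70]


def describe(values):
    """Return the measures of location and dispersion for one group."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])               # median of the lower half
    q3 = median(data[len(data) - half:])   # median of the upper half
    return {
        "N": len(data),
        "Minimum": data[0],
        "Maximum": data[-1],
        "Mean": mean(data),
        "First Quartile": q1,
        "Median": median(data),
        "Third Quartile": q3,
        "Range": data[-1] - data[0],
        "IQR": q3 - q1,
        "Standard Deviation": stdev(data),  # sample SD, divisor n - 1
    }


for label, group in (("Males", males), ("Females", females)):
    print(label)
    for name, value in describe(group).items():
        print(f"  {name}: {value:.3f}")
```

The output is shown to three decimal places so that the rounding in the table is visible. For example, the male median is the average of 1.42 and 1.45, or 1.435, which the table reports as 1.44.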

2. Using the computed descriptive statistics, compare the two distributions in terms of their
measures of location and measures of dispersion. On the average, which group of students
performed better academically in the previous term? Which group varies more?
(On the average, the GWA of the female students is 1.51 while the male students
have an average GWA of 1.46. Since a numerically lower grade is a better grade on
this scale, the male students in this group performed better academically than the
female students. The computed medians also differ (1.44 for the males and 1.55 for
the females) and lead to the same observation that the males performed better than
the females. However, the variability of the observations for the male students is
higher than that of the female students, as shown by their larger range, IQR and
standard deviation. Hence, we say that the GWAs of the male students vary more than
those of the female students.)

3. Sort the data within each group then determine what proportion in each group is within one
standard deviation of that group's mean. Are the proportions similar?

(Sorted data for the male students:

1.06 1.33 1.34 1.36 1.38 1.42 1.42 1.45 1.52 1.56 1.58 1.60 1.63 1.75

$\bar{x} \pm s = 1.46 \pm 0.17 = (1.29,\ 1.63)$. Note that 12 out of 14 observations, or
86% of the observations, are within one standard deviation of the mean, counting the
value 1.63 at the upper limit of the interval.

Sorted data for the female students:

1.22 1.24 1.27 1.43 1.45 1.49 1.49 1.54 1.56 1.56 1.58 1.59 1.59 1.64 1.70 1.78

$\bar{x} \pm s = 1.51 \pm 0.16 = (1.35,\ 1.67)$. Note that 11 out of 16 observations, or
69% of the observations, are within one standard deviation of the mean.

The proportions of observations that are within one standard deviation of the mean for
each group are not the same. The proportion for the male group (86%) is larger than that
for the female group (69%). This supports the earlier observation that the GWAs of the
male students are more varied compared to those of the female students.)
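
The counts in this answer can be reproduced with the sketch below. The function name `count_within` is hypothetical. The code assumes, as the answer does, that the mean and standard deviation are first rounded to two decimal places and that values equal to an interval limit are counted as inside; `Decimal` is used so that a value such as 1.63 compares exactly with the limit 1.46 + 0.17.

```python
from decimal import Decimal

# Sorted GWAs from the answer key, stored as exact decimals.
males = [Decimal(x) for x in
         "1.06 1.33 1.34 1.36 1.38 1.42 1.42 1.45 "
         "1.52 1.56 1.58 1.60 1.63 1.75".split()]
females = [Decimal(x) for x in
           "1.22 1.24 1.27 1.43 1.45 1.49 1.49 1.54 "
           "1.56 1.56 1.58 1.59 1.59 1.64 1.70 1.78".split()]


def count_within(values, center, spread):
    """Count the values in the closed interval center +/- spread."""
    low, high = center - spread, center + spread
    inside = [x for x in values if low <= x <= high]
    return len(inside), len(values)


# Rounded mean and standard deviation taken from the table in item 1.
m_in, m_n = count_within(males, Decimal("1.46"), Decimal("0.17"))
f_in, f_n = count_within(females, Decimal("1.51"), Decimal("0.16"))

print(f"Males:   {m_in} of {m_n} = {m_in / m_n:.0%}")    # 12 of 14 = 86%
print(f"Females: {f_in} of {f_n} = {f_in / f_n:.0%}")    # 11 of 16 = 69%
```

The male count is sensitive to rounding. If the unrounded mean and standard deviation are used instead, the upper limit for the males is about 1.626, so the value 1.63 falls just outside and the count becomes 11 of 14. The female count is 11 of 16 either way.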

4. Construct box-plots of the GWAs for the males and females. Compare the two data
distributions of GWAs.

(Visually, the two distributions of GWAs are different. The GWAs of the female
students are less dispersed than those of the male students. Numerically, the
median GWA of the male students is lower than that of the female students. Hence, the
male students in this group performed better academically than their female
counterparts. However, the GWAs of the female students are closer to one another.)
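
Before drawing the two box-plots, it helps to check whether either group has outliers under the 1.5 × IQR rule. The sketch below does this for the GWA data. The function name `outlier_markers` is hypothetical, and the quartiles are again assumed to be the medians of the lower and upper halves of the sorted data.

```python
from statistics import median

males = [1.06, 1.33, 1.34, 1.36, 1.38, 1.42, 1.42,
         1.45, 1.52, 1.56, 1.58, 1.60, 1.63, 1.75]
females = [1.22, 1.24, 1.27, 1.43, 1.45, 1.49, 1.49, 1.54,
           1.56, 1.56, 1.58, 1.59, 1.59, 1.64, 1.70, 1.78]


def outlier_markers(values):
    """Return the lower marker, the upper marker, and any outliers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return low, high, outliers


for label, group in (("Males", males), ("Females", females)):
    low, high, outliers = outlier_markers(group)
    print(f"{label}: markers at {low:.3f} and {high:.3f}, outliers: {outliers}")
```

Under these assumptions neither group has an outlier: the markers are about 1.03 and 1.91 for the males and about 1.215 and 1.815 for the females. Each whisker would therefore extend to the group's minimum and maximum.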
