
INTRODUCTION

STATISTICS is a collection of methods for planning experiments,
obtaining data, and then organizing, summarizing, analyzing,
interpreting, and drawing conclusions based on the data.

DESCRIPTIVE STATISTICS consists of procedures used to
summarize and describe the important characteristics of a set
of measurements.

INFERENTIAL STATISTICS consists of procedures used to
make inferences about population characteristics from
information contained in a sample drawn from this population.

"The theory of statistics uses probability to measure the
uncertainty associated with an inference. It enables us to
calculate the probabilities of observing specific samples,
under specific assumptions about the population. The
statistician uses these probabilities to evaluate the
uncertainties associated with sample inferences."
Definition of terms:
Data are information or facts necessary to conduct a certain
study.
A variable is a characteristic that changes or varies over time
and/or for different individuals or objects under
consideration.
A random variable is a variable whose numerical value is
determined by the outcome of some chance experiment.

An experimental unit is the individual or object on which a
variable is measured. A single measurement or data
value results when a variable is actually measured on an
experimental unit.
The population in a statistical study is the group of objects
about which conclusions are to be drawn.
A sample is a subset of measurements selected from the
population of interest.

A parameter is a numerical measurement describing some
characteristic of a population, and a statistic is a
numerical measurement describing some characteristic
of a sample.

Univariate data result when a single variable is measured on
a single experimental unit.
Bivariate data result when two variables are measured on a
single experimental unit.
Multivariate data result when more than two variables are
measured.

A. Types of variables:
A qualitative variable measures a quality or characteristic on
each experimental unit.
Ex. - taste ranking: excellent, good, fair, poor
    - color of M&M candy: brown, yellow, red, orange,
      green, blue
A quantitative variable measures a numerical quantity or
amount on each experimental unit.
Ex. - weight of a package ready to be shipped
    - volume of orange juice in a glass

Types of Quantitative Data:
Discrete data result from either a finite number of possible
values or a countable number of possible values (that is, the
number of possible values is 0, 1, 2, and so on).
Continuous data result from infinitely many possible values that
can be associated with points on a continuous scale in such a
way that there are no gaps or interruptions.

B. Four Levels of Measurement:
The nominal level of measurement is characterized by
data that consist of names, labels, or categories only; the
data cannot be arranged in an ordering scheme.
Ex. - collection of yes, no, undecided responses to a
      survey question
    - responses consisting of 10 nurses, 15 teachers,
      16 engineers, 5 priests, 20 businessmen
The ordinal level of measurement involves data that may
be arranged in some order, but differences between data
values either cannot be determined or are meaningless.
Ex. - in a sample of 24 car stereos, 15 were rated
      good, 6 were rated better, 3 were rated best
    - in considering employee promotion, a manager
      ranked Myrna 3rd, Al 7th, and Jena 10th

The interval level of measurement is like the ordinal level, with
the additional property that meaningful amounts of difference
between data values can be determined. However, there is no
inherent zero starting point.
Ex. - body temperatures (in degrees Celsius)
The ratio level of measurement is the interval level modified
to include an inherent zero starting point. For values at
this level, both differences and ratios are meaningful.
Ex. - heights of pine trees along Session Road
    - temperature readings on the Kelvin scale, since
      the scale has an absolute zero
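
To see why ratios are meaningful only at the ratio level, the following Python sketch compares the same two temperatures on the Celsius and Kelvin scales. The readings of 10 and 20 degrees Celsius are hypothetical values chosen for illustration, and the helper name celsius_to_kelvin is likewise an assumption of this sketch.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (absolute zero is -273.15 deg C)."""
    return c + 273.15

a_c, b_c = 10.0, 20.0                  # hypothetical readings in degrees Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

diff_c = b_c - a_c                     # difference on the interval scale
diff_k = b_k - a_k                     # same size of difference on the ratio scale
ratio_c = b_c / a_c                    # 2.0, but this does not mean "twice as hot"
ratio_k = b_k / a_k                    # about 1.035, the meaningful ratio

print(f"difference: {diff_c:.1f} (Celsius) vs {diff_k:.1f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The differences agree on both scales, while the ratio changes once the zero point is moved, which is why Celsius readings are treated as interval data and Kelvin readings as ratio data.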

Classify the following statements as belonging to the area of
descriptive statistics or statistical inference:
(a) As a result of recent cutbacks by the oil-producing
    nations, we can expect the price of gasoline to double
    in the next year.
(b) At least 5% of all fires reported last year in a certain city
    were deliberately set by arsonists.
(c) Of all patients who have received this particular type of
    drug at a local clinic, 60% later developed significant
    side effects.
(d) Assuming that less than 20% of the Colombian coffee
    beans were destroyed by frost this past winter, we
    should expect an increase of no more than 30 cents for
    a kilogram of coffee by the end of the year.
(e) As a result of a recent poll, most Americans are in favor
    of building additional nuclear power plants.

EXERCISES: Understanding the concepts
A. Identify the experimental units on which the following
   variables are measured:
   1. Gender of a student
   2. Number of errors on a midterm exam
   3. Age of a cancer patient
   4. Number of flowers on an azalea plant
   5. Color of a car entering the parking lot

B. Identify each variable as quantitative or qualitative:
   1. Amount of time it takes to assemble a simple puzzle
   2. Number of students in a first grade classroom
   3. Rating of a newly elected politician (excellent, good,
      fair, poor)
   4. State in which a person lives
C. Identify the following quantitative variables as discrete
   or continuous:
   1. Population in a particular area of the Philippines
   2. Weight of newspapers recovered for recycling on a
      single day
   3. Time to complete a probability exam

D. A data set consists of the ages at death for each of the
   41 past presidents of the United States.
   1. Is this set of measurements a population or a
      sample?
   2. What is the variable being measured?
   3. Is the variable in part 2 quantitative or qualitative?
E. Determine which of the four levels of measurement is
   most appropriate:
   1. Weights of a sample of M&M candies
   2. Instructors rated as superior, above average, average,
      or poor
   3. Lengths (in minutes) of movies
   4. Zip codes
   5. Movies listed according to their genre, such as comedy,
      adventure, and romance

FREQUENCY DISTRIBUTION
When a set of data includes a large number of
observed values, it becomes practical to group the data into
classes or categories with the corresponding number of
items falling into each class. The result is a tabular
arrangement called a frequency distribution.
Definition of terms:
A frequency table lists categories (or classes) of scores,
along with counts (or frequencies) of the number of scores
that fall into each category.
The frequency for a particular class is the number of
original scores that fall into that class.

CLASS                                                 CLASS      CLASS
INTERVAL  TALLY                            FREQUENCY  BOUNDARY   MARK   <CF   >CF
5-9       IIII                             4          4.5-9.5    7      4     100
10-14     IIII/ III                        8          9.5-14.5   12     12    96
15-19     IIII/ IIII/ IIII/ II             17         14.5-19.5  17     29    88
20-24     IIII/ IIII/ IIII/ IIII/ IIII/ I  26         19.5-24.5  22     55    71
25-29     IIII/ IIII/ IIII/ IIII/          20         24.5-29.5  27     75    45
30-34     IIII/ IIII/ IIII/                15         29.5-34.5  32     90    25
35-39     IIII/ IIII/                      10         34.5-39.5  37     100   10
(IIII/ denotes a bundle of five tally marks. The total frequency is 100.)

Lower class limits are the smallest numbers that can actually
belong to the different classes.
Upper class limits are the largest numbers that can actually
belong to the different classes.
Class boundaries are the numbers used to separate
classes, but without the gaps created by the class limits.
They are obtained by increasing the upper class limits and
decreasing the lower class limits by the same amount so
that there are no gaps between consecutive classes. The
amount to be added or subtracted is one-half the difference
between the upper limit of one class and the lower limit of
the following class.
Class marks are the midpoints of the classes. They can be
found by adding the lower class limit to the upper class limit
and dividing by 2.

Class width or class size is the difference between two
consecutive lower class limits or two consecutive lower
class boundaries.
Relative frequency is the ratio of the class frequency to the
total frequency.
Cumulative frequency is the accumulated frequency less than (<)
or greater than (>) a stated value. We obtain the > cumulative
frequency (>CF) if the frequencies are summed from the bottom
up to find the number of observations greater than a specified
lower class boundary. The < cumulative frequency (<CF) is
constructed if the frequencies are summed from the top down to
find the number of observations less than a particular upper
class boundary.
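
As a rough illustration of the definitions above, the following Python sketch computes class boundaries, class marks, class width, relative frequencies, and both cumulative frequencies for the classes 5-9 through 35-39 of the table. It assumes class frequencies of 4, 8, 17, 26, 20, 15, 10; the first two are taken as the differences of consecutive >CF entries (100 - 96 and 96 - 88). The variable names are choices of this sketch only.

```python
from itertools import accumulate

# Class limits and frequencies of the table above
# (assumption: 4 and 8 are derived from the >CF column as 100 - 96 and 96 - 88).
limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [4, 8, 17, 26, 20, 15, 10]

# One-half the gap between an upper limit and the next lower limit.
half_gap = (limits[1][0] - limits[0][1]) / 2           # (10 - 9) / 2 = 0.5

boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]
marks = [(lo + hi) / 2 for lo, hi in limits]           # class midpoints
width = limits[1][0] - limits[0][0]                    # consecutive lower limits

total = sum(freqs)
rel_freqs = [f / total for f in freqs]
less_cf = list(accumulate(freqs))                      # summed from the top down
greater_cf = list(accumulate(reversed(freqs)))[::-1]   # summed from the bottom up

for row in zip(limits, freqs, boundaries, marks, rel_freqs, less_cf, greater_cf):
    (lo, hi), f, (lb, ub), m, rf, lcf, gcf = row
    print(f"{lo:>2}-{hi:<2} {f:>3}  {lb:>4}-{ub:<4} {m:>4}  {rf:.2f} {lcf:>4} {gcf:>4}")
```

The two cumulative columns mirror each other: in every row, <CF plus >CF equals the total frequency plus that row's own class frequency.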


A. Steps in constructing a frequency table
Step 1: Count the number of data points in the set of data.
Step 2: Determine the range R for the entire data set. The
range is the largest value in the set of data minus the
smallest value.
Step 3: Decide on the number of class intervals. The
ideal number of class intervals is somewhere between 5
and 15. To approximate the appropriate number of class
intervals, we may use Herbert Sturges' formula
    K = 1 + 3.322 log n
where K stands for the number of classes suggested, n
represents the total frequency, and log is the common
(base-10) logarithm. Avoid having too many
classes or too few classes. Too many classes may lead to
several empty classes. Too few classes tend to lose
important details of the data.
Step 4: Determine the class width by dividing the range by
the number of classes. Round the result up to a
convenient number. This rounding up (not off) not only
is convenient, but also guarantees that all of the data will
be included in the frequency table.
    Class width (i) = round up of (range / number of classes)
Step 5: Select as the lower limit of the first class either the
lowest score or a convenient value slightly less than the
lowest score. This value serves as the starting point.
Step 6: Add the class width to the starting point to get
the second lower class limit. Add the class width to the
second lower class limit to get the third, and so on.
Step 7: List the lower class limits in a vertical column,
and enter the upper class limits, which can be easily
identified at this stage.
Step 8: Represent each score by a tally in the
appropriate class.
Step 9: Replace the tally marks in each class with the
total frequency count for that class.
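
Steps 1 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive procedure: the data are assumed to be whole numbers (so each upper limit is one unit below the next lower limit), K is rounded to the nearest whole number, and the names sturges_classes, class_limits, and sample are hypothetical. The values in sample are made up for illustration.

```python
import math

def sturges_classes(n):
    """Suggested number of classes, K = 1 + 3.322 log10(n), rounded to nearest."""
    return round(1 + 3.322 * math.log10(n))

def class_limits(data, start=None):
    """Return (lower, upper) class limits for whole-number data (Steps 1-7)."""
    n = len(data)                                    # Step 1: count the data
    data_range = max(data) - min(data)               # Step 2: range
    k = sturges_classes(n)                           # Step 3: number of classes
    width = math.ceil(data_range / k)                # Step 4: round up, not off
    lower = min(data) if start is None else start    # Step 5: starting point
    limits = []
    while lower <= max(data):                        # Steps 6-7: list the limits
        limits.append((lower, lower + width - 1))    # assumes a unit of 1
        lower += width
    return limits

sample = [12, 15, 21, 22, 27, 30, 33, 38, 41, 43]    # hypothetical scores
print(sturges_classes(len(sample)))                  # suggested K for n = 10
print(class_limits(sample))
```

The loop keeps adding classes until the largest value is covered, so the sketch may produce one class more than K when the starting point lies below the lowest score or when the range divides evenly by K.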

Example: The test scores of sixty students in Statistics are
recorded as follows:
78  51  61  74  68  78  62  71  88  72
66  77  82  68  68  73  56  82  66  71
58  75  67  75  86  66  70  71  64  73
85  74  62  84  66  92  91  57  61  78
63  73  58  79  61  83  88  81  75  57
68  70  54  79  62  78  59  70  66  81

1. Number of data points = 60
2. Range = 92 - 51 = 41
3. Using Sturges' formula, K = 1 + 3.322 log 60 = 6.91, which
   rounds to 7. Therefore, the suggested number of class
   intervals is seven.
4. The class size or width is computed as i = 41/7 = 5.86,
   rounded up to 6.
Instead of starting the first class at 51, choose to start
at the nice round number 50.
Thus, the first class is 50-55. Adding 6 to both limits, we
obtain the next interval, 56-61. Because the starting point is
below the lowest score, an eighth class (92-97) is needed to
include the highest score, 92.

CLASS     CLASS
INTERVAL  BOUNDARIES  MIDPOINT  TALLY             FREQUENCY
50-55     49.5-55.5   52.5      II                2
56-61     55.5-61.5   58.5      IIII/ IIII        9
62-67     61.5-67.5   64.5      IIII/ IIII/ I     11
68-73     67.5-73.5   70.5      IIII/ IIII/ IIII  14
74-79     73.5-79.5   76.5      IIII/ IIII/ II    12
80-85     79.5-85.5   82.5      IIII/ II          7
86-91     85.5-91.5   88.5      IIII              4
92-97     91.5-97.5   94.5      I                 1
(IIII/ denotes a bundle of five tally marks. The total frequency is 60.)
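
The tallying in Steps 8 and 9 can be checked with a short Python sketch. It uses the sixty test scores of the example with the starting point 50 and class width 6 chosen above; the variable names are assumptions of this sketch.

```python
from collections import Counter

scores = [78, 51, 61, 74, 68, 78, 62, 71, 88, 72,
          66, 77, 82, 68, 68, 73, 56, 82, 66, 71,
          58, 75, 67, 75, 86, 66, 70, 71, 64, 73,
          85, 74, 62, 84, 66, 92, 91, 57, 61, 78,
          63, 73, 58, 79, 61, 83, 88, 81, 75, 57,
          68, 70, 54, 79, 62, 78, 59, 70, 66, 81]

start, width = 50, 6        # first lower class limit and class width

# Step 8: assign each score to the index of its class; Step 9: count per class.
counts = Counter((s - start) // width for s in scores)
n_classes = max(counts) + 1
freqs = [counts[i] for i in range(n_classes)]

for i, f in enumerate(freqs):
    lo = start + i * width
    hi = lo + width - 1
    print(f"{lo}-{hi}  {lo - 0.5}-{hi + 0.5}  {(lo + hi) / 2}  {f}")
```

Integer division by the class width does the work of tallying here: every score from 50 to 55 maps to class index 0, every score from 56 to 61 to index 1, and so on.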

3. The number of television viewing hours per household
and the prime viewing times are two factors that affect
television advertising income. A random sample of 50
households in a particular viewing area produced the
following estimates of viewing hours per household.
 3.0   6.0   7.5  15.0  12.0   6.6   9.5   6.5   8.0   4.0
 5.5   6.0   5.6  13.3  13.1   5.5  12.5   5.0  12.0   1.0
 3.5   3.0   2.4   3.8   4.5   8.0   2.5   7.5   5.0  10.0
 8.0   3.5   2.6   8.5   2.5   6.4   7.6   9.0   2.0   6.5
 5.0   7.7   9.3   6.5   8.2   8.8   1.0  14.5  10.5  11.0

a. Starting with the lowest value as the lower class limit,
   construct a frequency distribution.
b. Determine the class marks, class boundaries, relative
   frequency, <CF, and >CF.

GRAPHICAL REPRESENTATION OF FREQUENCY DISTRIBUTION
A histogram, or frequency histogram, is a bar
graph which consists of a set of rectangles, while the
frequency polygon is a line graph. Both graphs are
intended to show the more salient features of the frequency
distribution.
a. HISTOGRAM
The histogram is a set of vertical bars having their bases
on the horizontal axis, centered on the class marks.
The width of each bar corresponds to the class width and the
height corresponds to the class frequency.
A histogram differs from a bar chart in that the bases of each
bar are the class boundaries rather than the class limits.
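
A text-only sketch of a histogram can be produced in Python without any plotting library. The sketch below is rotated (one horizontal bar per class instead of vertical bars) and uses the class boundaries 4.5 to 39.5 of the 100-observation frequency table; the frequencies 4 and 8 for the first two classes are assumed from the >CF column, and the function name text_histogram is hypothetical.

```python
# Class boundaries and frequencies of the 100-observation frequency table
# (assumption: 4 and 8 are derived from the >CF column as 100 - 96 and 96 - 88).
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5]
freqs = [4, 8, 17, 26, 20, 15, 10]

def text_histogram(boundaries, freqs):
    """Return one line per class: its boundaries and a bar of '#' marks."""
    lines = []
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        # The bar spans one class, from lower to upper boundary; its length
        # is the class frequency.
        lines.append(f"{lower:>4}-{upper:<4} | {'#' * f} ({f})")
    return lines

lines = text_histogram(boundaries, freqs)
print("\n".join(lines))
```

Because consecutive classes share a boundary, the bars of a true histogram touch one another, which is the visual difference from a bar chart drawn on class limits.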


[Figure: HISTOGRAM. Vertical axis: FREQUENCY; horizontal axis: CLASS BOUNDARY.]

b. FREQUENCY POLYGON
The frequency polygon is a modification of the histogram:
it is a line graph in which the class
frequencies are plotted against the class marks. To close the
polygon, an extra class mark with zero frequency must be
added at each end. The
frequency polygon can also be obtained by connecting the
midpoints of the tops of the rectangles in the histogram.
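
The points of a frequency polygon can be listed with a few lines of Python. This sketch uses the class marks 7 to 37 of the 100-observation frequency table, with the first two frequencies (4 and 8) assumed from the >CF column; the variable names are illustrative.

```python
# Class marks and frequencies of the 100-observation frequency table
# (assumption: 4 and 8 are derived from the >CF column).
marks = [7, 12, 17, 22, 27, 32, 37]
freqs = [4, 8, 17, 26, 20, 15, 10]
width = marks[1] - marks[0]          # distance between consecutive class marks

# Close the polygon with an extra class mark of frequency 0 at each end.
polygon = ([(marks[0] - width, 0)]
           + list(zip(marks, freqs))
           + [(marks[-1] + width, 0)])

for x, y in polygon:
    print(x, y)
```

Joining these points in order with straight line segments gives the polygon; the two added points at class marks 2 and 42 bring the line back down to the horizontal axis.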
c. OGIVES
A line graph showing the cumulative frequencies of a distribution
is called an ogive. For the less-than ogive, the less-than
cumulative frequencies are plotted against the upper class
boundaries. For the greater-than ogive, the greater-than
cumulative frequencies are plotted directly above the lower
class boundaries. These graphs are useful in estimating the
number of observations that are less than or more than a
specified value.
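
Reading an estimate off an ogive amounts to interpolating along its line segments. The Python sketch below builds the points of both ogives for the 100-observation frequency table and estimates how many observations fall below a given value. It assumes that observations are spread evenly within each class (straight-line interpolation) and that the first two class frequencies are 4 and 8, as implied by the >CF column; the function name estimate_less_than is hypothetical.

```python
from itertools import accumulate

boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5]
freqs = [4, 8, 17, 26, 20, 15, 10]   # assumption: 4 and 8 derived from >CF
total = sum(freqs)

less_cf = list(accumulate(freqs))
greater_cf = [total - c for c in [0] + less_cf[:-1]]

# Less-than ogive: <CF against upper class boundaries.
less_ogive = list(zip(boundaries[1:], less_cf))
# Greater-than ogive: >CF against lower class boundaries.
greater_ogive = list(zip(boundaries[:-1], greater_cf))

def estimate_less_than(x):
    """Estimate the number of observations below x by linear interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(total)
    below = 0                         # cumulative frequency below current class
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if x <= upper:
            return below + (x - lower) / (upper - lower) * f
        below += f

print(estimate_less_than(22))            # observations estimated below 22
print(total - estimate_less_than(22))    # observations estimated above 22
```

For the value 22, which lies halfway through the class 19.5-24.5, the sketch adds half of that class's frequency (13) to the 29 observations below 19.5.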

[Figure: POLYGON. Vertical axis: FREQUENCY; horizontal axis: CLASS BOUNDARY.]

STEM AND LEAF PLOTS
Another simple way to display the distribution of a
quantitative data set is the stem and leaf plot. This
procedure was introduced by Tukey and is one of the
primary tools of exploratory data analysis. A stem and leaf
diagram consists of a series of horizontal rows of
numbers. The number used to label a row is called a stem,
and the remaining numbers in the row are called leaves.

Steps:
1. Divide each measurement into two parts: the stem and
   the leaf.
2. List the stems in a column, with a vertical line to their right.
3. For each measurement, record the leaf portion in the
   same row as its corresponding stem.
4. Order the leaves from lowest to highest in each stem.
5. Provide a key to your stem and leaf coding so that the
   reader can recreate the actual measurements if
   necessary.

Sometimes the available stem choices result in a plot that
contains too few stems and a large number of leaves
within each stem. In this situation, you can stretch the
stems by dividing each one into several lines, depending
on the leaf values assigned to them. Stems are usually
divided in one of two ways:
- Into two lines, with leaves 0-4 in the first line and
  leaves 5-9 in the second line.
- Into five lines, with leaves 0-1, 2-3, 4-5, 6-7, and
  8-9 in the five lines respectively.

Example:
The data below are the GPAs of 30 Adamson
University freshmen, recorded at the end of the
freshman year. Construct a stem and leaf plot to display
the distribution of the data.
2.0  3.1  1.9  2.5  1.9  2.3  2.6  3.1  2.5  2.1
2.9  3.0  2.7  2.5  2.4  2.7  2.5  2.4  3.0  3.4
2.6  2.8  2.5  2.7  2.9  2.7  2.8  2.2  2.7  2.1

STEM AND LEAF DIAGRAM (leaves as recorded, not yet ordered)
STEM      LEAF      FREQUENCY
1 (8-9)   99        2
2 (0-1)   011       3
2 (2-3)   32        2
2 (4-5)   5554545   7
2 (6-7)   6777767   7
2 (8-9)   9898      4
3 (0-1)   1010      4
3 (2-3)             0
3 (4-5)   4         1

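STEM AND LEAF DIAGRAM (leaves ordered from lowest to highest)
STEM      LEAF      FREQUENCY
1 (8-9)   99        2
2 (0-1)   011       3
2 (2-3)   23        2
2 (4-5)   4455555   7
2 (6-7)   6677777   7
2 (8-9)   8899      4
3 (0-1)   0011      4
3 (2-3)             0
3 (4-5)   4         1
Key: stem 2 with leaf 5 represents a GPA of 2.5.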
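
The ordered diagram can be reproduced with a short Python sketch that follows the five steps above and stretches each stem into five lines (leaves 0-1, 2-3, 4-5, 6-7, 8-9). The data are the 30 GPAs of the example; the variable names and the "stem (range) | leaves" layout are choices of this sketch.

```python
from collections import defaultdict

gpas = [2.0, 3.1, 1.9, 2.5, 1.9, 2.3, 2.6, 3.1, 2.5, 2.1,
        2.9, 3.0, 2.7, 2.5, 2.4, 2.7, 2.5, 2.4, 3.0, 3.4,
        2.6, 2.8, 2.5, 2.7, 2.9, 2.7, 2.8, 2.2, 2.7, 2.1]

rows = defaultdict(list)
for g in gpas:
    tenths = round(g * 10)               # 2.5 -> 25, avoids float digit errors
    stem, leaf = divmod(tenths, 10)      # stem = ones digit, leaf = tenths digit
    rows[(stem, leaf // 2)].append(leaf) # leaf pairs 0-1, 2-3, 4-5, 6-7, 8-9

first, last = min(rows), max(rows)
plot = []
key = first
while key <= last:                       # include empty lines between stems
    stem, part = key
    leaves = "".join(str(d) for d in sorted(rows.get(key, [])))
    plot.append(f"{stem} ({2 * part}-{2 * part + 1}) | {leaves}")
    key = (stem, part + 1) if part < 4 else (stem + 1, 0)

print("\n".join(plot))
print("Key: 2 | 5 represents a GPA of 2.5")
```

Sorting the leaves within each line corresponds to Step 4, and the empty line for stem 3 (2-3) is kept so that the gap in the data remains visible.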
