
STATISTICAL

Topic 2
MEASURES OF DATA
Parameter and Statistic
Measures of Central Tendency
Measures of Variation
Chebyshev's Theorem
Z-scores
Parameter and Statistic
A measure computed on the basis
of data obtained from a sample is
termed a statistic.

A parameter is a measure
computed on the basis of data
obtained from an entire population.
The sample statistic is
presumed to be an estimate
of the population parameter.
Example

The average height of a selected sample of grade one pupils in a certain school is a statistic.
The average height of all grade one pupils in the school is a parameter.
Measures of Central Tendency

Any measure indicating the
center of a set of data, arranged
in increasing or decreasing
order of magnitude.
A single value that attempts to describe a
set of data by identifying the central
position within that set of data.
It is also referred to as a measure of central
location.
Measures of Central Tendency

MEAN
MEDIAN
MODE
FOR UNGROUPED DATA (< 30 values)

MEAN (Arithmetic)

The mean is the average.


It is computed by summing
the inputs and dividing by the
number of inputs.
Mean
The most popular and well known measure
of central tendency.
The mean is essentially a model of your
data set: a single value that represents all the data,
although it need not equal any observed value.
Used as a prediction for every value in your
data set, it minimizes the total squared error.
Formula:

$\bar{X} = \frac{\sum_{i=1}^{N} x_i}{N}$
Example

1. The numbers of employees at 5 different drug
stores are 3, 5, 6, 4, and 6. Treating the data
as a population, find the mean number of
employees for the 5 stores.
2. A food inspector examined a random sample
of 7 cans of a certain brand of tuna to
determine the percent of foreign impurities.
The following data were recorded: 1.8, 2.1,
1.7, 1.6, 0.9, 2.7, and 1.8. Compute the
sample mean.
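The two examples above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only and are not part of the original slides.

```python
from statistics import mean

# Example 1: employees at 5 drug stores, treated as a population
employees = [3, 5, 6, 4, 6]
# Example 2: percent of foreign impurities in a sample of 7 cans
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

# The arithmetic is the same for both: sum the inputs, divide by the count
population_mean = mean(employees)
sample_mean = mean(impurities)

print("Population mean:", round(population_mean, 2))
print("Sample mean:", round(sample_mean, 2))
```

The only difference between the two cases is interpretation: the first result is a parameter, the second is a statistic.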
Disadvantage of using mean
It is particularly susceptible to the influence
of outliers.

Outliers are values that are unusual
compared to the rest of the data set, being
especially small or large in numerical value.
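A minimal sketch of this sensitivity, using hypothetical data that does not come from the slides: eight ordinary values plus two very large ones. The mean is pulled toward the outliers while the median barely moves.

```python
from statistics import mean, median

# Hypothetical values (assumed for illustration): the last two are outliers
values = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_with_outliers = mean(values)
median_with_outliers = median(values)

# Same data with the two outliers removed
trimmed = values[:-2]
mean_without_outliers = mean(trimmed)

print("Mean with outliers:", mean_with_outliers)
print("Median with outliers:", median_with_outliers)
print("Mean without outliers:", mean_without_outliers)
```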
Skewed Data
A distribution is said to be skewed when
the data points cluster more toward one side
of the scale than the other, creating a curve
that is not symmetrical.
The right and left sides of the distribution are
shaped differently from each other.
Negative Skew

The scores fall toward the higher side of the scale and there
are very few low scores.
Positive Skew

The scores fall toward the lower side of the scale and there are very few
high scores.
Skewed Mean and Median
MEDIAN

The median is the midpoint
of a set of numbers. The
numbers must be arranged in
order from lowest to highest or
vice versa.
If there is an odd number of
inputs, the median is the middle
input.
If there is an even number of
inputs, the median is the
average of the two inputs in the
middle.
Example

1. On 5 term tests in sociology, a student
has made grades of 82, 93, 86, 92, and
79. Find the median for this population
of grades.
2. The nicotine contents for a random
sample of 6 cigarettes of a certain
brand are found to be 2.3, 2.7, 2.5,
2.9, 3.1, and 1.9 milligrams. Find the
median.
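The odd and even cases can be sketched as follows. The helper `median_by_hand` is a hypothetical name written for illustration; it follows the rule stated on the slide rather than any particular library.

```python
def median_by_hand(values):
    """Sort the inputs, then take the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

grades = [82, 93, 86, 92, 79]             # odd number of inputs
nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]  # even number of inputs

median_grades = median_by_hand(grades)
median_nicotine = median_by_hand(nicotine)

print("Median grade:", median_grades)
print("Median nicotine content:", round(median_nicotine, 2))
```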
MODE

The mode is the input that
appears the most times. There
can be more than one mode.
Example

1. The donations from the residents of
Fairway Forest towards the Virginia Lung
Association are recorded as 9, 10, 5, 9, 9,
7, 8, 6, 10, and 11. Find the mode.
2. The numbers of movies attended last
month by a random sample of 12 high
school students were recorded as follows:
2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, and 4. Find the
mode.
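Because a data set can have more than one mode, a sketch for these two examples should return every most-frequent value. The version below assumes Python 3.8 or later, where `statistics.multimode` is available.

```python
from statistics import multimode

donations = [9, 10, 5, 9, 9, 7, 8, 6, 10, 11]
movies = [2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, 4]

# multimode returns a list of all values tied for the highest count
donation_modes = multimode(donations)
movie_modes = multimode(movies)

print("Donation mode(s):", donation_modes)
print("Movie mode(s):", movie_modes)
```

The second data set is bimodal: two different values share the highest frequency.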
Notation:

Measure of Central Tendency    Statistic    Parameter
Mean                           $\bar{X}$    $\mu$
Median                         md           Md
Mode                           mo           Mo
Problem:

The following data show the
amount of phosphates per
load of laundry, in grams, for a
random sample of various
types of detergents used
according to the prescribed
directions.

Laundry Detergent      Phosphates per Load (g)
A&P Blue Sail          48
Dash                   47
Concentrated All       42
Cold Water All         42
Breeze                 41
Oxydol                 34
Ajax                   31
Sears                  30
Fab                    29
Cold Power             29
Bold                   29
Rinso                  26
For the given data, find the
mean
median
mode
Solution:

Computing for the MEAN:

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
$\bar{X} = \frac{48 + 47 + 42 + 42 + 41 + 34 + 31 + 30 + 29 + 29 + 29 + 26}{12}$
$\bar{X} \approx 35.67$
Computing for the MEDIAN:

$md = \frac{34 + 31}{2}$
$md = 32.5$
Computing for the MODE:

mo = 29
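The three results for the detergent data can be reproduced with a short sketch, assuming Python's standard `statistics` module. The list name `phosphates` is illustrative.

```python
from statistics import mean, median, mode

# Phosphates per load (grams) for the 12 detergents in the problem
phosphates = [48, 47, 42, 42, 41, 34, 31, 30, 29, 29, 29, 26]

sample_mean = mean(phosphates)      # sum of the inputs divided by n = 12
sample_median = median(phosphates)  # average of the 6th and 7th ordered values
sample_mode = mode(phosphates)      # most frequent value

print("Mean:", round(sample_mean, 2))
print("Median:", sample_median)
print("Mode:", sample_mode)
```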
Variability
A measure of the spread of a data set.

Refers to how spread out a group of data is.
In other words, variability measures how
much your scores differ from each other. It is
also referred to as dispersion or spread.
Measures of Variability

A measure of dispersion
about the mean. It describes how
the observations spread out along
the scale of distribution.
Measures of Variability

Consider the following measurements, in liters, for
two samples of orange juice bottled by companies A
and B.

Sample A    0.97    1.00    0.94    1.03    1.06
Sample B    1.06    1.01    0.88    0.91    1.14

Both samples have the same mean, 1.00 liters.
Company A bottles orange juice with a more
uniform content than company B.
The variability or dispersion of the observations
from the average is less for sample A than for
sample B.
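A quick sketch makes the comparison concrete. It assumes Python's standard `statistics` module and uses the sample standard deviation as the measure of spread, which is one reasonable choice among the measures this topic covers.

```python
from statistics import mean, stdev

sample_a = [0.97, 1.00, 0.94, 1.03, 1.06]
sample_b = [1.06, 1.01, 0.88, 0.91, 1.14]

mean_a, mean_b = mean(sample_a), mean(sample_b)
# stdev uses n - 1 in the denominator (sample standard deviation)
sd_a, sd_b = stdev(sample_a), stdev(sample_b)

print("Means:", round(mean_a, 2), round(mean_b, 2))
print("Sample standard deviations:", round(sd_a, 4), round(sd_b, 4))
```

The means agree, but the spread for sample B is roughly twice that of sample A.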
Measures of Variation

Range
Variance
Standard Deviation
FOR UNGROUPED DATA (< 30 values)

RANGE
The range of a set of data is
the difference between the
largest and smallest number in
the set.
R = xH - xL
Example:

The IQs of 5 members of a
family are 108, 112, 127, 118,
and 113. Find the range.
R = xH - xL
= 127 - 108
= 19
VARIANCE

The variance is a measure of how
close the scores in the data set are
to the mean.
It is the average of the squared
deviations (except that n - 1 is used
instead of n for sample variance).
For a population,
variance, $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

For a sample,
variance, $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Example:

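Assuming that two sets A and B are populations,
calculate their variance.

Set A    3    4    5    6    8    9    10    12    15
Set B    3    7    7    7    8    8    8     9     15

For Set A:
$\mu = \frac{\sum_{i=1}^{9} x_i}{9} = 8$
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
$= \frac{(-5)^2 + (-4)^2 + (-3)^2 + (-2)^2 + 0^2 + 1^2 + 2^2 + 4^2 + 7^2}{9}$
$= \frac{124}{9} \approx 13.78$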
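The population variance formula can be sketched directly. The function name `population_variance` is hypothetical; it divides the sum of squared deviations by N, as in the population formula above, and is applied to both sets.

```python
def population_variance(values):
    """Average of the squared deviations from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]

variance_a = population_variance(set_a)
variance_b = population_variance(set_b)

print("Variance of Set A:", round(variance_a, 2))
print("Variance of Set B:", round(variance_b, 2))
```

Both sets have the same mean and the same range, yet Set B clusters more tightly around the mean and therefore has the smaller variance.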
STANDARD DEVIATION

The standard deviation is


a measure of the spread or
dispersion of scores from the
mean in a distribution.
Sample data sets for s
A: 28, 29, 30, 31, 32
B: 10, 20, 30, 40, 50
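A brief sketch for these two data sets, assuming Python's standard `statistics` module. Both sets have a mean of 30, so any difference in s reflects spread alone.

```python
from statistics import mean, stdev

data_a = [28, 29, 30, 31, 32]
data_b = [10, 20, 30, 40, 50]

# stdev computes the sample standard deviation (n - 1 denominator)
s_a = stdev(data_a)
s_b = stdev(data_b)

print("Means:", mean(data_a), mean(data_b))
print("s for A:", round(s_a, 2))
print("s for B:", round(s_b, 2))
```

Every deviation in B is ten times the corresponding deviation in A, so s for B is ten times s for A.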


Sample Problem
Find the sample variance and the sample
standard deviation for the height data below:

A simple random sample of five men is


chosen from a large population of men, and
their heights are measured. The five heights
(in inches) are 65.51, 72.30, 68.31, 67.05, and
70.68.
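One way to work this problem is sketched below, following the sample formulas with n - 1 in the denominator. The variable names are illustrative, and the result is cross-checked against Python's standard `statistics` module.

```python
from math import sqrt
from statistics import stdev, variance

heights = [65.51, 72.30, 68.31, 67.05, 70.68]  # inches

n = len(heights)
x_bar = sum(heights) / n
# Sum of squared deviations from the sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)

sample_variance = sum_sq_dev / (n - 1)
sample_sd = sqrt(sample_variance)

print("Sample mean:", round(x_bar, 2))
print("Sample variance:", round(sample_variance, 4))
print("Sample standard deviation:", round(sample_sd, 4))

# Cross-check against the library implementations
assert abs(sample_variance - variance(heights)) < 1e-9
assert abs(sample_sd - stdev(heights)) < 1e-9
```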
Problem:

A certain city in the south has a total of
nine industrial factories. These factories
reported the following numbers of days
their operations were stopped because of
strikes: 20, 19, 18, 16, 15, 14, 13, 12, and
8. Determine the following: range,
variance, and standard deviation.
Solution:

Computing for the range:

R = 20 - 8 = 12
Computing for the MEAN:

$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$
$= \frac{20 + 19 + 18 + 16 + 15 + 14 + 13 + 12 + 8}{9}$
$= 15$
Computing for the variance: ($\mu = 15$)

$X_i$    $X_i - \mu$    $(X_i - \mu)^2$
20       5              25
19       4              16
18       3              9
16       1              1
15       0              0
14       -1             1
13       -2             4
12       -3             9
8        -7             49

$\sum |X_i - \mu| = 26$    $\sum (X_i - \mu)^2 = 114$


From the table: $\sum (X_i - \mu)^2 = 114$, N = 9

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
$\sigma^2 = \frac{114}{9}$
$\sigma^2 = 12.67$
Computing for the Standard Deviation:

It is the square root of the variance.
$\sigma^2 = 12.67$
$\sigma = \sqrt{12.67}$
$\sigma = 3.56$
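The whole solution can be reproduced in a few lines. This sketch assumes Python's standard `statistics` module and treats the nine factories as a population, matching the division by N = 9 in the solution.

```python
from statistics import mean, pstdev, pvariance

strike_days = [20, 19, 18, 16, 15, 14, 13, 12, 8]

data_range = max(strike_days) - min(strike_days)
mu = mean(strike_days)
# pvariance and pstdev divide by N (population formulas)
sigma_squared = pvariance(strike_days)
sigma = pstdev(strike_days)

print("Range:", data_range)
print("Mean:", mu)
print("Variance:", round(sigma_squared, 2))
print("Standard deviation:", round(sigma, 2))
```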
Chebyshev's
Theorem
The Russian mathematician
Pafnuty Lvovich Chebyshev
discovered that the fraction of the
measurements falling between any
two values symmetric about the mean
is related to the standard deviation.
Chebyshev's Theorem

At least $1 - \frac{1}{k^2}$ of the measurements lie within k standard deviations of the mean, for any k > 1.
Chebyshev's theorem is true for any sample set.

Interval = $\bar{x} \pm ks$
Problem 1:

A coffee-maker is regulated so that it
takes an average of 5.8 minutes to brew a
cup of coffee, with a standard deviation of
0.6 minute. Using Chebyshev's theorem,
determine the percentage of the times
this coffee-maker is used for which the
brewing time will be anywhere from 4.6
minutes to 7.0 minutes.
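Solution:
$\bar{x} = 5.8$, $s = 0.6$, upper limit of the interval = 7.0

Interval = $\bar{x} \pm ks$
7.0 = 5.8 + k(0.6)
k(0.6) = 1.2
k = 2

$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = \frac{3}{4} = 75\%$

At least 75% of the brewing times fall between 4.6 and 7.0 minutes.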
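The same steps can be sketched as a small function. The name `chebyshev_lower_bound` and its signature are hypothetical, introduced only for illustration; the function solves for k from an interval symmetric about the mean and then applies the bound 1 - 1/k².

```python
def chebyshev_lower_bound(mean, sd, low, high):
    """Return (k, minimum fraction of values inside [low, high]).

    Assumes the interval is symmetric about the mean and k > 1.
    """
    k = (high - mean) / sd
    if abs((mean - low) / sd - k) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return k, 1 - 1 / k ** 2

# Coffee-maker problem: mean 5.8 min, standard deviation 0.6 min
k, fraction = chebyshev_lower_bound(5.8, 0.6, 4.6, 7.0)

print("k =", round(k, 2))
print("At least", round(fraction * 100, 1), "percent")
```

The result is a guaranteed minimum, not an estimate: the actual percentage may be higher, depending on the shape of the distribution.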
Problem 2
Empirical Rule

For many large populations (bell-shaped), the empirical rule provides an
estimate of the approximate percentage of observations that are
contained within one, two, or three standard deviations of the mean:

Approximately 68% of the observations are in the interval $\mu \pm 1\sigma$.
Approximately 95% of the observations are in the interval $\mu \pm 2\sigma$.
Almost all of the observations are in the interval $\mu \pm 3\sigma$.
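These percentages come from the normal distribution, which is the usual model behind "bell-shaped". A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`, computes the share of a standard normal population within k standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Probability of falling between -k and +k standard deviations
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"Within {k} standard deviation(s): {shares[k] * 100:.1f}%")
```

Unlike Chebyshev's theorem, these figures apply only when the data are approximately normal.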
Problem 3
Measures of Relative Variation

If the units are different between


two or more distributions and/or
their means are different, then, the
measures of relative variation are
more appropriate in comparing the
variability among these
distributions.
Measures of Relative Variation

They are:
the standard score, Z
the coefficient of variation, V
Standard Score

The standard score tells the


relative location of a
particular raw score with
regards to the mean of all
scores in a series.
Formula:

$z = \frac{x - \mu}{\sigma}$
Coefficient of Variation

The coefficient of variation


expresses the standard deviation
as a percentage of the mean.
Formula:

$V = \frac{s}{\bar{x}} \times 100\%$
OR
$V = \frac{\sigma}{\mu} \times 100\%$
Problem:

An automobile salesman made a profit of $245 on a
subcompact model for which the average profit has
been $200 with a standard deviation of $50.
Later on the same day, he made a profit of $620 on a
large luxury model for which the average profit has
been $500 with a standard deviation of $150.
For which of these two models is the
salesman's profit relatively higher?
Solution: $z = \frac{x - \mu}{\sigma}$

For Subcompact Model                    For Luxury Model
$x_1 = 245$                             $x_2 = 620$
$\mu_1 = 200$                           $\mu_2 = 500$
$\sigma_1 = 50$                         $\sigma_2 = 150$
$z_1 = \frac{245 - 200}{50} = 0.9$      $z_2 = \frac{620 - 500}{150} = 0.8$
Profit for the subcompact model is


relatively higher.
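The comparison can be sketched with a small helper. The function name `z_score` is illustrative; it implements the standard score formula used in the solution.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_subcompact = z_score(245, mean=200, sd=50)
z_luxury = z_score(620, mean=500, sd=150)

print("Subcompact z:", round(z_subcompact, 2))
print("Luxury z:", round(z_luxury, 2))

# The larger z marks the profit that is higher relative to its own distribution
better = "subcompact" if z_subcompact > z_luxury else "luxury"
print("Relatively higher profit:", better)
```

Although $620 is the larger profit in absolute terms, it sits fewer standard deviations above its own average than $245 does.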
Problem:

The mean closing price of stock A


over the past year is P120 with a
standard deviation of P15. In the
case of stock B, the mean is P100
with a standard deviation of P8.
Which stock varies more in price?
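Solution: $V = \frac{s}{\bar{x}} \times 100\%$

For Stock A                                     For Stock B
$\bar{x} = 120$                                 $\bar{x} = 100$
$s = 15$                                        $s = 8$
$V = \frac{15}{120} \times 100\% = 12.5\%$      $V = \frac{8}{100} \times 100\% = 8.0\%$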

Stock A has a greater variability
in price. Therefore, stock A is
riskier than stock B.
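A short sketch of the same calculation follows. The function name `coefficient_of_variation` is illustrative; it expresses the standard deviation as a percentage of the mean.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_stock_a = coefficient_of_variation(sd=15, mean=120)
cv_stock_b = coefficient_of_variation(sd=8, mean=100)

print("Stock A:", round(cv_stock_a, 1), "percent")
print("Stock B:", round(cv_stock_b, 1), "percent")
```

Because the two stocks have different mean prices, comparing the raw standard deviations (15 versus 8) would overstate the gap; the coefficient of variation puts both on the same relative scale.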
Student
Activity
Answer the following:

1. What is the median of the following set


of data?
1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7
2. Which of the following sets of data has
the highest variability relative to the
mean?
a. 10, 15, 19, 26, 28, 34, 48, 55
b. 62, 65, 68, 71, 75, 76, 79, 80
c. 120, 120, 124, 125, 128, 128
3. Listed below are the average monthly
incomes in pesos of Filipino families for
each of the 13 regions in the country:
Determine the ungrouped mean,
median and modal monthly income.
Which of the three measures best
describes the data? Why?
1249.50 7500.00 20,000.00
2250.00 7249.00 10,250.00
5249.00 8250.00 2250.00
6250.00 9249.00 6250.00
6500.00
4. During the dry season, when there is
a peak of construction activity,
shortages of cement and consequent
price fluctuations occur. A materials
purchaser for a construction firm
makes a random check on 15 cement
dealers and obtains the following
prices in pesos per 45 kg bag of
cement:
105, 105, 105, 110, 110, 115, 115, 120,
120, 120, 125, 125, 130, 130, 135

What is the current price range of a 45
kg bag of cement?
Compute the standard deviation and the mean
deviation of the given data.
5. The IQ distribution of first year
Engineering students has a mean of
120 and a standard deviation of 8.
Use Chebyshev's theorem to
determine:
a. the interval containing at least three-fourths of the IQs
b. at least what percentage of these scores
must lie between 104 and 136
6. A real estate broker is in charge of
selling lots in two new subdivisions.
The mean selling price of a lot in
Jade Park Village is P250,000 with a
standard deviation of P15,000. In
Ruby Valley Subdivision, the mean
price is P300,000 and the standard
deviation is P25,000. Which
subdivision has the larger variation
in lot prices?
7. Shown below are John's scores,
the mean, and the standard deviation of
each of the three tests given to 1000
students. On which test did John
stand highest? Lowest?

Test       Mean    Std. Deviation    John's Score
Math       47.2    4.8               53
English    64.2    8.3               71
Physics    75.4    11.7              72
8. By stopwatch method, the completion
time for a particular task in an
assembly line is measured for a
sample of 40 workers. The results are
summarized below. The production
manager has determined that a
median completion time of 3.75
minutes or less is acceptable. He
would however pull out workers for
further training if the median time is
found to be greater than 3.75
minutes. Will training be necessary?
Completion Time (min)    Number of Workers
5.9 - 5.5                2
5.4 - 5.0                5
4.9 - 4.5                7
4.4 - 4.0                10
3.9 - 3.5                8
3.4 - 3.0                5
2.9 - 2.5                2
2.4 - 2.0                1
