
Math 103
Statistics and Probability
Central Tendency and Spread
CJD
Characteristics of Data
Center: A representative or average value that indicates where the middle of the data set is located
Variation: A measure of the amount that the values vary among themselves
Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)
Outliers: Sample values that lie very far away from the vast majority of other sample values
Measures of Center
Measure of center: a value at the center or middle of a data set
Notation:
$\Sigma$ denotes the sum (addition) of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
Mean
Mean (Arithmetic Mean), also called the AVERAGE: the number obtained by adding the values and dividing the total by the number of values
$\mu$ is pronounced "myu" and denotes the mean of all values in a population: $\mu = \frac{\sum x}{N}$
$\bar{x}$ is pronounced "x-bar" and denotes the mean of a set of sample values: $\bar{x} = \frac{\sum x}{n}$
Calculators can compute the mean of data.
Mean
Example: Find the mean of the following weights (in kg) of a sample of carry-on luggage presented at an airport check-in counter in the last hour.
6.72 3.46 3.60 6.44 26.70
Solution:
Sum of all weights = 6.72 + 3.46 + 3.60 + 6.44 + 26.70 = 46.92
Number of weights = 5
Mean = 46.92 / 5 = 9.384 kg
Notice the impact of the outlier 26.70 on the mean: it pulls the mean above four of the five weights.
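A minimal Python sketch of the sample mean $\bar{x} = \frac{\sum x}{n}$, applied to the luggage weights above. The helper name `mean` is hypothetical, and recomputing the mean without the 26.70 kg value is an illustration added here to make the outlier's effect concrete.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

weights = [6.72, 3.46, 3.60, 6.44, 26.70]  # kg, from the example
with_outlier = mean(weights)
without_outlier = mean([w for w in weights if w != 26.70])  # drop the outlier

print(round(with_outlier, 3))     # 9.384
print(round(without_outlier, 3))  # 5.055
```

Removing one value out of five lowers the mean by more than 4 kg, which is why the mean is described as sensitive to outliers.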
Median
The median is the middle value when the original data values are arranged in order of increasing magnitude.
It is often denoted by $\tilde{x}$ (pronounced "x-tilde") or by $\tilde{\mu}$ (pronounced "myu-tilde").
It is not affected by an extreme value.
Median
Odd number of values:
Unsorted: 6.72 3.46 3.60 6.44 26.70
Sorted: 3.46 3.60 6.44 6.72 26.70
Exact middle: the MEDIAN is 6.44.
Even number of values:
Unsorted: 6.72 3.46 3.60 6.44
Sorted: 3.46 3.60 6.44 6.72
No exact middle; it is shared by two numbers: $\frac{3.60 + 6.44}{2} = 5.02$
The MEDIAN is 5.02.
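The odd and even cases above can be sketched in Python as follows. `median` is a hypothetical helper written for illustration; the standard library's `statistics.median` applies the same rule, so the two should agree.

```python
import statistics

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # shared by two numbers

odd = [6.72, 3.46, 3.60, 6.44, 26.70]
even = [6.72, 3.46, 3.60, 6.44]
print(median(odd), statistics.median(odd))
print(median(even), statistics.median(even))
```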
Mode
- the score that occurs most frequently
- a data set may be Unimodal, Bimodal, Multimodal or have No Mode
- denoted by M
- the only measure of central tendency that can be used with nominal data
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 1 2 2 2 3 4 5 6 6 6 7 9: Bimodal, 2 and 6
c. 1 2 3 6 7 8 9 10: No Mode
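A minimal Python sketch that reproduces the three answers above, assuming "no mode" means that no value repeats. `modes` is a hypothetical helper; it is written by hand because the standard library's `statistics.multimode` returns every value when none repeats, rather than an empty result.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when no value repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))        # [5]
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # [2, 6]
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # []
```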
Qualitative Data
Units Sold   Model     Maker
376          Pajero    Mitsubishi
581          Lancer    Mitsubishi
1,243        Jeepney   Sarao
459          CRV       Honda
960          City      Honda
732          Civic     Honda
417          Innova    Toyota
725          Altis     Toyota
104          Prius     Toyota
1,098        Vios      Toyota
Mode: Sarao Jeepney (the model with the most units sold, 1,243)
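When qualitative data come already tallied, as in this table, the mode is the category with the largest count. A minimal Python sketch, treating units sold as the frequency of each model (as the answer above does); the dictionary keys are just labels chosen for illustration.

```python
units_sold = {
    "Pajero (Mitsubishi)": 376,
    "Lancer (Mitsubishi)": 581,
    "Jeepney (Sarao)": 1243,
    "CRV (Honda)": 459,
    "City (Honda)": 960,
    "Civic (Honda)": 732,
    "Innova (Toyota)": 417,
    "Altis (Toyota)": 725,
    "Prius (Toyota)": 104,
    "Vios (Toyota)": 1098,
}

# The modal category is the key whose frequency is highest.
mode_model = max(units_sold, key=units_sold.get)
print(mode_model)  # Jeepney (Sarao)
```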
Comparison
Mode: can be used also for nominal data; hardly requires any calculation if data is sorted; may not exist or may not be unique; not useful for small n
Median: second most useful; not affected by outliers, so it gives a truer average when outliers are present; easy to compute if data is sorted or n is small; varies greatly from sample to sample
Mean: most useful; easiest to compute for large n; uses all data; does not vary much from sample to sample; distribution of means is well known; affected by outliers
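Weighted Mean
Each individual value $x$ may have a weight $w$ associated with it:
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Example: A talent show is judged 40% execution, 30% difficulty, 20% originality and 10% audience impact. If a contestant scored 8, 9, 6 and 7, respectively, the weighted mean is
$\bar{x} = \frac{(0.40)(8) + (0.30)(9) + (0.20)(6) + (0.10)(7)}{0.40 + 0.30 + 0.20 + 0.10} = \frac{3.2 + 2.7 + 1.2 + 0.7}{1} = 7.8$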
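A minimal Python sketch of $\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$. Here the weights are entered as whole percentages (40, 30, 20, 10) rather than decimals, a choice made in this sketch to show that the division by $\sum w$ makes the weights' scale irrelevant; `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

scores = [8, 9, 6, 7]      # execution, difficulty, originality, audience impact
weights = [40, 30, 20, 10]  # judging percentages
print(weighted_mean(scores, weights))  # 7.8
```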
Raw Data: Test Scores in a Statistics Test
60 74 74 58 72
58 82 52 26 72
66 66 60 92 78
46 38 50 66 50
62 64 68 62 84
54 66 66 44 60
84 70 76 72 66
70 64 52 40 78
76 42 50 64 48
64 40 82 54 74
Sorted Data (highest to lowest, reading down each column)
92 74 66 60 50
84 74 66 60 50
84 72 66 60 48
82 72 66 58 46
82 72 64 58 44
78 70 64 54 42
78 70 64 54 40
76 68 64 52 40
76 66 62 52 38
74 66 62 50 26
Applying the formulas (and using a calculator), we get:
Mode: $M = 66$
Median: $\tilde{x} = 64$
Mean: $\bar{x} = \frac{3136}{50} \approx 62.7$
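The three measures of center for the 50 test scores can be checked with Python's standard-library `statistics` module. This is a sketch for verification only; the list is the raw data above, entered row by row.

```python
import statistics

scores = [
    60, 74, 74, 58, 72,
    58, 82, 52, 26, 72,
    66, 66, 60, 92, 78,
    46, 38, 50, 66, 50,
    62, 64, 68, 62, 84,
    54, 66, 66, 44, 60,
    84, 70, 76, 72, 66,
    70, 64, 52, 40, 78,
    76, 42, 50, 64, 48,
    64, 40, 82, 54, 74,
]

print(statistics.mode(scores))    # 66
print(statistics.median(scores))  # 64.0
print(statistics.mean(scores))    # 62.72
```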
Measures of Spread or Variation
Range
Mean Deviation
Variance
Standard Deviation
Range and Midrange
Range = Highest Value - Lowest Value (a measure of spread)
Mid-Range = (Highest Value + Lowest Value) / 2 (a measure of center)
In the test-score example:
Range = 92 - 26 = 66
Mid-Range = (92 + 26) / 2 = 59
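A minimal Python sketch of both formulas on the test scores (listed here in increasing order). The helper names are hypothetical; note that each one looks only at the highest and lowest values.

```python
def data_range(values):
    """Highest value minus lowest value: a measure of spread."""
    return max(values) - min(values)

def mid_range(values):
    """Average of the highest and lowest values: a measure of center."""
    return (max(values) + min(values)) / 2

scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]
print(data_range(scores), mid_range(scores))  # 66 59.0
```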
Mean Deviation
Mean deviation of a sample: $MD = \frac{\sum |x - \bar{x}|}{n}$
Mean deviation of a population: $MD = \frac{\sum |x - \mu|}{N}$
Example: Weights of carry-on luggage (kg): 6.72 3.46 3.60 6.44 26.70
Mean = 9.384    Range = 26.70 - 3.46 = 23.24
$MD = \frac{|6.72 - 9.384| + |3.46 - 9.384| + |3.60 - 9.384| + |6.44 - 9.384| + |26.70 - 9.384|}{5} = \frac{34.632}{5} = 6.926$
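A minimal Python sketch of the sample mean deviation $\frac{\sum |x - \bar{x}|}{n}$ for the luggage weights; `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

weights = [6.72, 3.46, 3.60, 6.44, 26.70]
print(round(mean_deviation(weights), 3))  # 6.926
```

The absolute value matters: without it, the deviations from the mean would always sum to zero.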
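Variance
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Dividing by $n - 1$ reduces the bias: dividing a sample's squared deviations by $n$ tends to underestimate the population variance.
Computing Formula for Variance
Sample: $s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$
Population: $\sigma^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2}$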
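The claim that dividing by $n$ underestimates the variance can be checked with a small simulation. This sketch assumes a hypothetical population (the digits 0 through 9, variance 8.25), draws many samples of size 5 with replacement using a fixed seed, and averages both versions of the variance; theory predicts about $\frac{n-1}{n}\sigma^2 = 6.6$ when dividing by $n$ and about $8.25$ when dividing by $n - 1$.

```python
import random

def var_divide_by_n(xs):
    """Average squared deviation from the mean, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_divide_by_n_minus_1(xs):
    """Sample variance s^2, dividing by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

population = list(range(10))            # hypothetical population: 0, 1, ..., 9
sigma2 = var_divide_by_n(population)    # population variance = 8.25

rng = random.Random(2024)               # fixed seed so the run is repeatable
n, trials = 5, 20_000
total_n = total_n_minus_1 = 0.0
for _ in range(trials):
    sample = [rng.choice(population) for _ in range(n)]  # draw with replacement
    total_n += var_divide_by_n(sample)
    total_n_minus_1 += var_divide_by_n_minus_1(sample)

avg_n = total_n / trials
avg_n_minus_1 = total_n_minus_1 / trials
print(f"sigma^2 = {sigma2}")
print(f"average when dividing by n:     {avg_n:.2f}")
print(f"average when dividing by n - 1: {avg_n_minus_1:.2f}")
```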
Variance Example
Example: Weights of a sample of carry-on luggage (kg): 6.72 3.46 3.60 6.44 26.70
$x_i$      $x_i^2$
6.72       45.158
3.46       11.972
3.60       12.960
6.44       41.474
26.70      712.890
$\sum$: 46.92   824.454
$s^2 = \frac{5(824.454) - (46.92)^2}{5(4)} = 96.039$
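A Python sketch confirming that the definition $\frac{\sum (x_i - \bar{x})^2}{n - 1}$ and the computing formula $\frac{n \sum x_i^2 - (\sum x_i)^2}{n(n-1)}$ give the same $s^2$ for the luggage weights. Both helper names are hypothetical.

```python
def sample_variance_definition(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_computing(xs):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))"""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

weights = [6.72, 3.46, 3.60, 6.44, 26.70]
print(round(sum(weights), 2), round(sum(x * x for x in weights), 3))  # 46.92 824.454
print(round(sample_variance_definition(weights), 3))  # 96.039
print(round(sample_variance_computing(weights), 3))   # 96.039
```

The computing formula avoids subtracting the mean from every value, which is why it suits hand calculation; with floating-point arithmetic on large, tightly clustered data it can lose precision, so the definition is usually preferred in code.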
Standard Deviation
Population SD: $\sigma = \sqrt{\sigma^2}$    Sample SD: $s = \sqrt{s^2}$
In the example, $s = \sqrt{96.039} = 9.800$
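A minimal Python sketch of $s = \sqrt{s^2}$ for the luggage weights, computing the sample variance with the $n - 1$ divisor first; `sample_sd` is a hypothetical helper name.

```python
import math

def sample_sd(xs):
    """s = square root of the sample variance (which divides by n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(s2)

weights = [6.72, 3.46, 3.60, 6.44, 26.70]
print(f"{sample_sd(weights):.3f}")  # 9.800
```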
Symbols for Standard Deviation
Source                          Sample               Population
Textbook                        $s$                  $\sigma$
Some graphics calculators       $Sx$                 $\sigma x$
Some non-graphics calculators   $x\sigma_{n-1}$      $x\sigma_n$
Excel variance function         VAR                  VARP
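Python's standard-library `statistics` module makes the same sample/population split as the table: `variance` and `stdev` divide by $n - 1$ (like VAR), while `pvariance` and `pstdev` divide by $N$ (like VARP). The sketch below treats the luggage weights as a population only to show the difference between the two divisors.

```python
import statistics

weights = [6.72, 3.46, 3.60, 6.44, 26.70]

s2 = statistics.variance(weights)       # sample variance, divides by n - 1
sigma2 = statistics.pvariance(weights)  # population variance, divides by N
s = statistics.stdev(weights)           # sample SD
sigma = statistics.pstdev(weights)      # population SD

print(round(s2, 3), round(sigma2, 3))   # 96.039 76.831
print(round(s, 3), round(sigma, 3))
```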
Comparison
Range: easiest to use and interpret; not useful for large n; says nothing about the distribution of data between the max and min
Mean Deviation: considers all data relative to the mean; simple but awkward to compute
Variance / SD: most useful; considers all data relative to the mean; computation more involved; interpretation not straightforward
Coefficient of Variation
Used to compare the spreads of samples/populations with different means:
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$    Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Example: The prelim exam grades of a Statistics class and a Calculus class are summarized below:
Subject      Mean   Standard Deviation   Coefficient of Variation
Statistics   68.2   22.0                 32.26%
Calculus     56.5   17.8                 31.50%
Therefore, relative to their means, the statistics grades are only slightly more variable than the calculus grades.
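A minimal Python sketch of $CV = \frac{s}{\bar{x}} \cdot 100\%$ reproducing the table's last column; `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: the standard deviation relative to the mean."""
    return sd / mean * 100

grades = {"Statistics": (68.2, 22.0), "Calculus": (56.5, 17.8)}  # subject: (mean, SD)
for subject, (mean, sd) in grades.items():
    print(f"{subject}: {coefficient_of_variation(sd, mean):.2f}%")
# Statistics: 32.26%
# Calculus: 31.50%
```

Because CV divides out the mean, the 22.0 vs 17.8 gap in standard deviations shrinks to less than one percentage point once each class's average is taken into account.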
z Scores
z Score (or standard score)
- A measure of position relative to other data
- The number of standard deviations that a given value $x$ is above or below the mean
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round z scores to 2 decimal places.
Example
A student scored 67 in a calculus test and 74 in a statistics test. If the calculus test has a mean of 53 with an SD of 8, and the statistics test has a mean of 65 with an SD of 6, did the student fare better relative to his classmates in calculus or in statistics?
Calculus: z = (67 - 53)/8 = 1.75
Statistics: z = (74 - 65)/6 = 1.50
Conclusion: The student fared better in Calculus, where his score lies more standard deviations above the mean.
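A minimal Python sketch of the comparison, using the sample form $z = \frac{x - \bar{x}}{s}$ and rounding to 2 decimal places as the slides recommend; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)

calculus_z = z_score(67, 53, 8)
statistics_z = z_score(74, 65, 6)
print(calculus_z, statistics_z)  # 1.75 1.5

better = "Calculus" if calculus_z > statistics_z else "Statistics"
print(better)  # Calculus
```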
Interpreting z scores
[Figure: a z axis marked from -3 to 3; ordinary values lie in the central region around z = 0, and unusual values lie in both tails.]
Measures of Location or Position
Fractiles or Quantiles divide ranked data into equal parts:
Percentiles: 100 parts (1% each)
Deciles: 10 parts (10% each)
Quartiles: 4 parts (25% each)
Percentiles
$P_1, P_2, P_3, \ldots, P_{98}, P_{99}$
$i\%$ of the data falls at or below ($\le$) $P_i$.
Deciles
$D_1, D_2, D_3, D_4, D_5, D_6, D_7, D_8, D_9$ divide ranked data into ten equal parts.
[Figure: ranked data split by $D_1$ through $D_9$ into ten segments of 10% each]
$(i \cdot 10)\%$ of the data falls at or below ($\le$) $D_i$.
$D_9$ is the 90th percentile, or $P_{90}$.
Quartiles
$Q_1, Q_2, Q_3$ divide ranked scores into four equal parts:
(minimum) 25% $Q_1$ 25% $Q_2$ (median) 25% $Q_3$ 25% (maximum)
$(i \cdot 25)\%$ of the data falls at or below ($\le$) $Q_i$.
$Q_2$ is the 50th percentile $P_{50}$, or the 5th decile $D_5$.
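Example
For the sorted test scores (n = 50), locate each value by computing $L = \frac{k}{100} \cdot n$ for $P_k$ (or the matching fraction for deciles and quartiles) and counting positions from the lowest score. If $L$ is not a whole number, round it up and take that value; if $L$ is a whole number, take the midpoint of the $L$th and $(L+1)$th values.
$P_5 = 40$: $(5/100)(50) = 2.5$ rounds up to the 3rd value
$P_4 = 39$: $(4/100)(50) = 2$, so take the midpoint of the 2nd and 3rd values (38 and 40)
$P_{94} = 83$: $(94/100)(50) = 47$, so take the midpoint of the 47th and 48th values
$P_{99} = 92$: $(99/100)(50) = 49.5$ rounds up to the 50th value
$D_4 = 61$: $(4/10)(50) = 20$, so take the midpoint of the 20th and 21st values
$D_9 = 80$: $(9/10)(50) = 45$, so take the midpoint of the 45th and 46th values
$Q_2 = 64$: $(2/4)(50) = 25$, so take the midpoint of the 25th and 26th values
$Q_3 = 72$: $(3/4)(50) = 37.5$ rounds up to the 38th value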
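The round-up / midpoint rule used in this example can be sketched in Python. `locate` is a hypothetical helper; exact `Fraction` arithmetic is a choice made here so that whole-number positions such as 47 are detected reliably (floating-point products like 0.94 × 50 may not come out exactly whole). Returning the last value when $L = n$ is an assumption the slides do not cover.

```python
import math
from fractions import Fraction

def locate(sorted_scores, k, parts):
    """k-th fractile for parts = 100 (percentile), 10 (decile) or 4 (quartile).

    L = k * n / parts. If L is not whole, round up and take the L-th value;
    if L is whole, average the L-th and (L+1)-th values (1-based positions).
    """
    n = len(sorted_scores)
    L = Fraction(k * n, parts)  # exact, so whole numbers are detected reliably
    if L.denominator == 1:
        L = int(L)
        if L >= n:
            return sorted_scores[-1]  # assumption: no (L+1)-th value, use the last
        return (sorted_scores[L - 1] + sorted_scores[L]) / 2
    return sorted_scores[math.ceil(L) - 1]

scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]
requests = {"P5": (5, 100), "P4": (4, 100), "P94": (94, 100), "P99": (99, 100),
            "D4": (4, 10), "D9": (9, 10), "Q2": (2, 4), "Q3": (3, 4)}
results = {name: locate(scores, k, parts) for name, (k, parts) in requests.items()}
print(results)
```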
Quantile of a Score
Percentile of score $x = 100 \cdot \frac{\text{number of scores} \le x}{\text{total number of scores}}$
Decile of score $x = 10 \cdot \frac{\text{number of scores} \le x}{\text{total number of scores}}$
Quartile of score $x = 4 \cdot \frac{\text{number of scores} \le x}{\text{total number of scores}}$
If the result is not an integer, round up to the next higher integer.
Example: 32 of the 50 test scores are $\le 66$.
$\frac{32}{50} \cdot 100 = 64$, $\frac{32}{50} \cdot 10 = 6.4$, $\frac{32}{50} \cdot 4 = 2.56$
So the test score 66 is in $P_{64}$, $D_7$ and $Q_3$.
To improve the estimate, include only half of the other scores equal to $x$ in the numerator.
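A minimal Python sketch of the basic rule (without the half-count refinement). `quantile_of_score` is a hypothetical helper; it rounds up with integer ceiling division so that results like 64 are not disturbed by floating-point error.

```python
def quantile_of_score(scores, x, parts):
    """ceil(parts * count(scores <= x) / n), for parts = 100, 10 or 4."""
    count = sum(1 for s in scores if s <= x)
    n = len(scores)
    return (parts * count + n - 1) // n  # integer ceiling division

scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]
print(quantile_of_score(scores, 66, 100),  # 64 -> P64
      quantile_of_score(scores, 66, 10),   # 7  -> D7
      quantile_of_score(scores, 66, 4))    # 3  -> Q3
```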
Interquartile and Percentile Range
Interquartile Range (or IQR) = $Q_3 - Q_1$
Percentile Range = $P_{90} - P_{10}$
For the test scores:
$Q_3 = 72$, $Q_1 = 52$, IQR = 72 - 52 = 20
$P_{90} = 80$, $P_{10} = 43$, $P_{10}$ to $P_{90}$ Range = 80 - 43 = 37
Range = 92 - 26 = 66
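A Python sketch of both ranges using the round-up / midpoint rule from the percentile example (`locate` is a hypothetical helper, repeated so the block runs on its own). For contrast, it also calls the standard library's `statistics.quantiles`, whose default interpolation method is a different convention and gives $Q_3 = 72.5$ here; software and hand rules can legitimately disagree on quartiles.

```python
import math
import statistics
from fractions import Fraction

def locate(sorted_scores, k, parts):
    """Round-up / midpoint rule: L = k * n / parts (1-based positions)."""
    n = len(sorted_scores)
    L = Fraction(k * n, parts)
    if L.denominator == 1:
        L = int(L)
        if L >= n:
            return sorted_scores[-1]  # assumption: no (L+1)-th value
        return (sorted_scores[L - 1] + sorted_scores[L]) / 2
    return sorted_scores[math.ceil(L) - 1]

scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]
q1, q3 = locate(scores, 1, 4), locate(scores, 3, 4)
p10, p90 = locate(scores, 10, 100), locate(scores, 90, 100)
print("IQR =", q3 - q1)                     # IQR = 20
print("P10 to P90 range =", p90 - p10)      # P10 to P90 range = 37.0
print(statistics.quantiles(scores, n=4))    # [52.0, 64.0, 72.5]
```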
Boxplots
A boxplot, also known as the box-and-whisker plot, is a simple graph that indicates the median, the IQR and outliers.
Test scores: minimum = 26, $Q_1 = 52$, $Q_2 = 64$, $Q_3 = 72$, maximum = 92, IQR = 20
Basic Boxplot: [Figure: a box from $Q_1$ to $Q_3$ with a line at the median $Q_2$, and whiskers extending to the minimum (26) and the maximum (92)]
Variation: Whiskers extend only to 1.5 × IQR below $Q_1$ and 1.5 × IQR above $Q_3$. Data outside these limits are marked with circles as outliers.
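The 1.5 × IQR rule in the boxplot variation can be sketched in Python. This assumes quartiles are found with the round-up / midpoint rule from the percentile example (`locate` and `boxplot_outliers` are hypothetical helpers); applying it to the five luggage weights as well is an illustration added here.

```python
import math
from fractions import Fraction

def locate(sorted_values, k, parts):
    """Round-up / midpoint rule: L = k * n / parts (1-based positions)."""
    n = len(sorted_values)
    L = Fraction(k * n, parts)
    if L.denominator == 1:
        L = int(L)
        if L >= n:
            return sorted_values[-1]  # assumption: no (L+1)-th value
        return (sorted_values[L - 1] + sorted_values[L]) / 2
    return sorted_values[math.ceil(L) - 1]

def boxplot_outliers(values):
    """Values beyond 1.5 * IQR below Q1 or above Q3 (drawn as circles)."""
    data = sorted(values)
    q1, q3 = locate(data, 1, 4), locate(data, 3, 4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

scores = [
    26, 38, 40, 40, 42, 44, 46, 48, 50, 50,
    50, 52, 52, 54, 54, 58, 58, 60, 60, 60,
    62, 62, 64, 64, 64, 64, 66, 66, 66, 66,
    66, 66, 68, 70, 70, 72, 72, 72, 74, 74,
    74, 76, 76, 78, 78, 82, 82, 84, 84, 92,
]
weights = [6.72, 3.46, 3.60, 6.44, 26.70]
print(boxplot_outliers(scores))   # [] (limits 22 and 102 contain every score)
print(boxplot_outliers(weights))  # [26.7]
```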
End
