Measures
Measures of Central Tendency
Symmetric, Skewed
Using Box-and-Whisker Plots
Coefficient of Correlation
Pitfalls in Numerical Descriptive Measures and Ethical Issues
Summary Measures
  Central Tendency: Mean, Median, Mode, Geometric Mean
  Quartiles
  Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Introduction
Think of a sample portfolio composed of three
stocks.
[Figure: portfolio holdings, labeled with share counts (200, 100, 100, 100 shares) and annual rates of return (ARR = 10%, 15%, 20%, 25%)]
Measures of Central
Tendency
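Central Tendency: Mean, Median, Mode, Geometric Mean
Mean: \( \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \) (sample), \( \mu = \frac{\sum_{i=1}^{N} X_i}{N} \) (population)
Geometric Mean: \( \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \)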
Measures of Central
Tendency
The central data point reflects the
locations of all the actual data points.
How?
If the third data point appears in the center,
the measure of central location will remain
in the center.
But if the third data point
appears on the left-hand side
of the midrange, it should pull
the central location to the left.
Measures of Central
Tendency
As more and more data points are added, the
central location moves (left and right) as required
in order to reflect the effects of all the points.
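Mean
Sample mean: \( \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \), where n is the sample size
Population mean: \( \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \), where N is the population size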
[Figure: two dot plots. On an axis from 0 to 10, Mean = 5; on an axis from 0 to 14, Mean = 6.]
Median
Robust Measure of Central Tendency
Not Affected by Extreme Values
[Figure: two dot plots. On an axis from 0 to 10, Median = 5; on an axis from 0 to 14, Median = 5.]
In an ordered array, the median is the middle number:
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the 2
middle numbers
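The odd/even rule and the median's robustness to extreme values can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; all three data lists are made-up values for illustration, not data from the slides.

```python
from statistics import mean, median

# Hypothetical data (not from the slides): an odd-sized ordered array.
data = [1, 3, 5, 7, 9]
print(mean(data), median(data))     # both are 5

# Same array with the largest value replaced by an extreme one.
with_outlier = [1, 3, 5, 7, 59]
print(mean(with_outlier))           # the mean is pulled up to 15
print(median(with_outlier))         # the median is still 5

# Even number of values: the median averages the two middle numbers.
even = [1, 3, 5, 7]
print(median(even))                 # (3 + 5) / 2 = 4.0
```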
Mode
A Measure of Central Tendency
Value that Occurs Most Often
Not Affected by Extreme Values
There May Not Be a Mode
There May Be Several Modes
Used for Either Numerical or Categorical Data
[Figure: two dot plots. On an axis from 0 to 14, Mode = 9; on an axis from 0 to 6, No Mode.]
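The cases listed above (one mode, no mode, several modes, categorical data) can be sketched in a few lines of Python. The helper `modes` is a hypothetical name, the inputs are made-up values, and the convention that a data set has no mode when no value repeats is an assumption of this sketch.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest count, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:            # assumption: no repeated value means "no mode"
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 4, 5, 9, 9, 9]))    # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))    # []  (no mode)
print(modes([1, 1, 2, 3, 3]))          # [1, 3]  (several modes)
print(modes(["red", "blue", "red"]))   # ['red']  (categorical data)
```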
Geometric Mean
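Useful in the Measure of Rate of Change of a Variable over Time
\( \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \)
Geometric Mean Rate of Return:
\( \bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \)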
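Example
An investment of $100,000 declined to $50,000 at the
end of year one and rebounded back to $100,000 at the end
of year two:
\( R_1 = -0.5 \) (or -50%), \( R_2 = 1 \) (or 100%)
\( \bar{R}_G = [(1 - 0.5) \times (1 + 1)]^{1/2} - 1 = 1^{1/2} - 1 = 0 \) (or 0%)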
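The investment example can be reproduced with a short script. This is a minimal sketch; the function name `geometric_mean_return` is hypothetical, and the arithmetic mean is computed only for comparison.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return for per-period returns given as decimals."""
    # Each period multiplies the investment by (1 + R); every factor must be >= 0.
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# Year one: $100,000 -> $50,000 (R1 = -0.5); year two: $50,000 -> $100,000 (R2 = 1.0).
returns = [-0.5, 1.0]

arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print(arithmetic)   # 0.25, i.e. 25%
print(geometric)    # 0.0, i.e. 0%
```

The investment ends where it started, so the 0% geometric figure describes the overall change, while the 25% arithmetic average overstates it.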
Quartiles
Split Ordered Data into 4 Quarters
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Position of the i-th quartile \( Q_i \): \( \frac{i(n+1)}{4} \)
Q and
1
Quartiles
The lower half of a data set is the set of all values that are
to the left of the median value when the data has been put
into increasing order.
The upper half of a data set is the set of all values that are
to the right of the median value when the data has been
put into increasing order.
The first quartile, denoted by Q1 , is the median of
the lower half of the data set. This means that about 25%
of the numbers in the data set lie below Q1 and about 75%
lie above Q1 .
The third quartile, denoted by Q3 , is the median of
the upper half of the data set. This means that about 75%
of the numbers in the data set lie below Q3 and about 25%
lie above Q3 .
Quartiles
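Data in Ordered Array: 11 12 13 16 16 17 17 18 21
Median = 16 (the 5th value)
Position of Q1 = \( \frac{1(9+1)}{4} = 2.5 \), so Q1 = \( \frac{12 + 13}{2} = 12.5 \)
Position of Q3 = \( \frac{3(9+1)}{4} = 7.5 \), so Q3 = \( \frac{17 + 18}{2} = 17.5 \)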
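The position rule i(n+1)/4 can be sketched in Python as follows. The function names are hypothetical, and linear interpolation between neighbouring values for fractional positions is an assumption of this sketch; it reproduces the averaging used above when the position ends in .5.

```python
def quartile_position(n, i):
    """1-based position of the i-th quartile in an ordered array of n values."""
    return i * (n + 1) / 4

def quartile(ordered, i):
    """i-th quartile of an already sorted list, interpolating between neighbours."""
    n = len(ordered)
    pos = quartile_position(n, i)
    lower = int(pos)            # whole part of the position
    frac = pos - lower          # fractional part
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    lo, hi = ordered[lower - 1], ordered[lower]
    return lo + frac * (hi - lo)

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
print(quartile_position(len(data), 1))   # 2.5
print(quartile(data, 1))                 # 12.5
print(quartile(data, 2))                 # 16.0 (the median)
print(quartile(data, 3))                 # 17.5
```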
Measures of Variation
Measures of central location fail to tell the whole story.
Measures of Variation
Variation
  Range
  Interquartile Range
  Variance: Population Variance, Sample Variance
  Standard Deviation: Population Standard Deviation, Sample Standard Deviation
  Coefficient of Variation
Range
Measure of Variation
Difference between the Largest and the Smallest
Observations:
\( \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \)
Range = 12 - 7 = 5
Interquartile Range
Measure of Variation
Also Known as Midspread
Spread in the middle 50%
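As an illustration, the range and the interquartile range can be computed for the ordered array from the quartiles example. This sketch assumes the usual definition of the interquartile range as Q3 minus Q1 (the formula itself is not shown on the slide) and uses `statistics.quantiles` with `method="exclusive"`, which places quartiles at positions based on n + 1.

```python
from statistics import quantiles

# Ordered array from the quartiles example.
data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles at positions i(n + 1)/4, then the spread of the middle 50%.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range)   # 10
print(q1, q3)       # 12.5 17.5
print(iqr)          # 5.0
```

Unlike the range, the interquartile range ignores the smallest and largest quarter of the data, so a single extreme observation does not change it.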
Variance
Important Measure of Variation
Shows Variation about the Mean
Sample Variance:
\( S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \)
Population Variance:
\( \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \)
The Variance
Example
Find the variance of the following set of numbers,
representing annual rates of returns for a group of
mutual funds. Assume the set is (i) a sample, (ii) a
population: -2, 4, 5, 6.9, 10
Solution:
The Variance
Solution:
Assuming a sample
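The worked solution did not survive on these slides, so the following is a minimal sketch of the calculation, not a reproduction of the original solution. It applies the sample formula (divide by n - 1) and the population formula (divide by N) to the five returns and cross-checks against Python's `statistics` module.

```python
from statistics import mean, pvariance, variance

# Annual rates of return from the example.
returns = [-2, 4, 5, 6.9, 10]

x_bar = mean(returns)                                # about 4.78
sum_sq = sum((x - x_bar) ** 2 for x in returns)      # about 78.368

sample_var = sum_sq / (len(returns) - 1)             # (i) sample: divide by n - 1
population_var = sum_sq / len(returns)               # (ii) population: divide by N

print(round(sample_var, 4))       # 19.592
print(round(population_var, 4))   # 15.6736

# The library functions give the same results up to rounding error.
print(round(variance(returns), 4), round(pvariance(returns), 4))
```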
Standard Deviation
Most Important Measure of Variation
Shows Variation about the Mean
Has the Same Units as the Original Data
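Sample Standard Deviation:
\( S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \)
Population Standard Deviation:
\( \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \)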
Standard Deviation
Example
Standard Deviation
Solution:
Line 1:
Standard Deviation
Solution:
Line 2:
Standard Deviation
Line 1 should be considered less consistent
because the standard deviation of its defective
proportion is larger (and therefore the standard
deviation of its good-item proportion is also
larger).
distribution.
refer to
Standard Deviation
From a Frequency Distribution
\( S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} \)
where
n = sample size
c = number of classes in the frequency distribution
m_j = midpoint of the jth class
f_j = frequency of the jth class
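The grouped-data formula can be sketched in Python as below. The function name `grouped_std` and the three-class frequency table are hypothetical, and the mean is approximated from the class midpoints as the frequency-weighted average of the m_j, which is an assumption of this sketch.

```python
import math

def grouped_std(midpoints, freqs):
    """Approximate sample standard deviation from class midpoints and frequencies."""
    n = sum(freqs)                                              # sample size
    # Assumption: the mean is approximated by the weighted average of midpoints.
    x_bar = sum(m * f for m, f in zip(midpoints, freqs)) / n
    sum_sq = sum((m - x_bar) ** 2 * f for m, f in zip(midpoints, freqs))
    return math.sqrt(sum_sq / (n - 1))

# Hypothetical frequency distribution with c = 3 classes.
midpoints = [15, 25, 35]   # m_j
freqs = [2, 5, 3]          # f_j, so n = 10 and the approximate mean is 26

s = grouped_std(midpoints, freqs)
print(round(s, 3))
```

Because every observation in a class is treated as if it sat at the class midpoint, the result is an approximation of the standard deviation of the raw data.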
Comparing Standard
Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
[Figure: each data set shown as a dot plot on an axis from 11 to 21]
Coefficient of Variation
Measure of Relative Variation
Always in Percentage (%)
Shows Variation Relative to the Mean
Used to Compare Two or More Sets of Data
\( CV = \left( \frac{S}{\bar{X}} \right) \times 100\% \)
Sensitive to Outliers
Comparing Coefficient
of Variation
Stock A:
Average price last year = $50
Standard deviation = $2
Stock B:
Average price last year = $100
Standard deviation = $5
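Coefficient of Variation:
Stock A: \( CV = \left( \frac{S}{\bar{X}} \right) \times 100\% = \left( \frac{\$2}{\$50} \right) \times 100\% = 4\% \)
Stock B: \( CV = \left( \frac{S}{\bar{X}} \right) \times 100\% = \left( \frac{\$5}{\$100} \right) \times 100\% = 5\% \)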
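The stock comparison can be reproduced with a one-line function. This is a minimal sketch; the function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(2, 50)     # Stock A: S = $2, mean price = $50
cv_b = coefficient_of_variation(5, 100)    # Stock B: S = $5, mean price = $100

print(cv_a)   # 4.0
print(cv_b)   # 5.0
```

Stock B varies more relative to its average price (5% against 4%), and because the dollar units cancel, the two figures can be compared directly.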
Shape of a Distribution
Describe How Data are Distributed
Measures of Shape
Symmetric or skewed
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
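Comparing the mean with the median gives a quick check of shape. The sketch below is a rule-of-thumb illustration only: the function name `describe_shape`, its `tolerance` parameter, and all three data lists are hypothetical, and real data rarely has a mean exactly equal to its median.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.0):
    """Classify shape by comparing the mean with the median (rule of thumb)."""
    m, med = mean(values), median(values)
    if m - med > tolerance:
        return "right-skewed"    # mean pulled above the median by large values
    if med - m > tolerance:
        return "left-skewed"     # mean pulled below the median by small values
    return "symmetric"

# Hypothetical data sets, not from the slides.
print(describe_shape([1, 2, 3, 4, 20]))      # right-skewed (mean 6, median 3)
print(describe_shape([1, 17, 18, 19, 20]))   # left-skewed (mean 15, median 18)
print(describe_shape([1, 2, 3, 4, 5]))       # symmetric (mean 3, median 3)
```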
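[Figure: box-and-whisker plot marking X_smallest, Q1, Median (Q2), Q3, and X_largest]
[Figure: box-and-whisker plots comparing distribution shapes (panels include Symmetric and Right-Skewed), each marking Q1, Q2, and Q3]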
shaped:
Roughly 68% of the Observations Fall Within 1