Graphical Excellence
“Graphical excellence” deals with the
effective use of graphical techniques.
Effective graphical techniques give the viewer a presentation of the data that is
– informative,
– concise,
– clear.
[Figure: example charts plotted against Time, with Dollars on the vertical axis]
Measures of Central Location
Measures of Central Location (Central Tendency)
Usually, we focus our attention on two aspects of measures of
central location:
– Measure of the central data point (the average).
– Measure of dispersion of the data about the average.
• Example 4.1
The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$
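The arithmetic in Example 4.1 can be checked with a few lines of Python. This is only a minimal sketch: the variable names `sample` and `x_bar` are chosen here for illustration and do not come from the slides.

```python
from statistics import mean

# The six measurements from Example 4.1
sample = [7, 3, 9, -2, 4, 6]

# Sample mean: sum of the measurements divided by how many there are
x_bar = sum(sample) / len(sample)

print(x_bar)         # 4.5
print(mean(sample))  # 4.5, the same result from the standard library
```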
• Example 4.2
Suppose the telephone bills of Example 2.1 represent the population of measurements. The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \dots + x_{200}}{200} = \frac{42.19 + 15.30 + \dots + 53.21}{200} = 43.59$
• Example 4.3
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 employees was recorded as follows:
NUMBER OF CHILDREN     0   1   2   3
NUMBER OF EMPLOYEES    3   4   7   2    (16 employees)
$\bar{x} = \frac{\sum_{i=1}^{16} x_i}{16} = \frac{x_1 + x_2 + \dots + x_{16}}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5$
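The frequency-table shortcut in Example 4.3 amounts to a weighted sum. The sketch below assumes the table is stored as a dictionary mapping each value to its frequency; the name `freq` is illustrative only.

```python
# Frequency table from Example 4.3: number of children -> number of employees
freq = {0: 3, 1: 4, 2: 7, 3: 2}

# Total number of observations is the sum of the frequencies
n = sum(freq.values())

# Each value contributes (value * frequency) to the overall sum
total = sum(value * count for value, count in freq.items())

x_bar = total / n
print(n, total, x_bar)  # 16 24 1.5
```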
The median
– The median of a set of measurements is the
value that falls in the middle when the
measurements are arranged in order of
magnitude.
Example 4.4
Odd number of observations
Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29.
Find the median salary.
First, sort the salaries: 26, 26, 28, 29, 30, 32, 60.
Then, locate the value in the middle: the median is 29.
Even number of observations
Suppose one employee’s salary of $31,000 was added to the group recorded before.
Find the median salary.
First, sort the salaries: 26, 26, 28, 29, 30, 31, 32, 60.
Then, locate the values in the middle. There are two middle values, 29 and 30, so the median is their average, 29.5.
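The two cases of Example 4.4 (odd and even number of observations) can be sketched as follows. The helper `middle` is a hypothetical name introduced here; the standard library function `statistics.median` applies the same rule.

```python
from statistics import median

salaries_odd = [28, 60, 26, 32, 30, 26, 29]   # seven salaries, in 1000s
salaries_even = salaries_odd + [31]           # the added eighth salary

def middle(values):
    """Return the median: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle(salaries_odd))    # 29
print(middle(salaries_even))   # 29.5
print(median(salaries_even))   # 29.5, standard library equivalent
```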
The mode
– The mode of a set of measurements is the value
that occurs most frequently.
– A set of data may have one mode (or modal class), or two or more modes.
For large data sets the modal class is much more relevant than a single-value mode.
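As a small illustration of one mode versus several modes, the sketch below uses `statistics.multimode` (available from Python 3.8). Both data lists are hypothetical values made up for this example; they are not taken from the slides.

```python
from statistics import multimode

# Hypothetical data sets, invented only to illustrate the definition
one_mode = [34, 36, 34, 38, 36, 34, 40]   # 34 occurs most often
two_modes = [1, 1, 2, 2, 3]               # 1 and 2 tie for most frequent

# multimode returns every value that attains the highest frequency
print(multimode(one_mode))    # [34]
print(multimode(two_modes))   # [1, 2]
```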
– Example 4.5
The manager of a men’s store observes the waist
• Example 4.6
A professor of statistics wants to report the results of a midterm exam, taken by 100 students. The data appear in file XM04...
Find the mean, median, and mode, and describe the information they provide.
Marks
Mean                 73.98
Standard Error       2.1502163
Median               81
Mode                 84
Standard Deviation   21.502163
Sample Variance      462.34303
Kurtosis             0.3936606
Skewness             -1.073098
Range                89
Minimum              11
Maximum              100
Sum                  7398
Count                100
The mean provides information about the overall performance level of the class. It can serve as a tool for making comparisons with other classes and/or other exams.
The median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%.
The mode must be used when data is qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then, the mode becomes a logical measure to compute.
Excel Histogram
Bin     Frequency
10      0
20      3
30      2
40      6
50      6
60      5
70      10
80      16
90      28
100     24
More    0
[Figure: histogram of Frequency by bin, with the modal class marked (bin 90, frequency 28)]
The histogram is skewed to the left.
Relationship among Mean, Median,
and Mode
If a distribution is symmetrical, the mean, median and mode coincide.
– Solution
Since $R_g$ is the geometric mean,
$(1 + R_g)^3 = (1 + .2)(1 + .1)(1 - .05) = 1.2540$
Thus,
$R_g = \sqrt[3]{(1 + .2)(1 + .1)(1 - .05)} - 1 = .0784$, or 7.84%.
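The geometric mean calculation in this solution can be reproduced numerically. The sketch assumes the three period returns are 20%, 10% and -5%, as read off the factors in the formula; the names `returns`, `growth` and `r_g` are illustrative.

```python
# Period returns read from the factors (1+.2)(1+.1)(1-.05)
returns = [0.20, 0.10, -0.05]

# Compound growth factor over all periods
growth = 1.0
for r in returns:
    growth *= 1 + r

# Geometric mean return: the n-th root of the growth factor, minus 1
r_g = growth ** (1 / len(returns)) - 1

print(round(growth, 4))  # 1.254
print(round(r_g, 4))     # 0.0784
```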
Measures of variability
(Dispersion or Spread)
Measures of central location fail to tell the
whole story about the distribution.
A question of interest still remains unanswered:
[Figure: the range, shown as the distance between the smallest measurement and the largest measurement]
The variance
– This measure of dispersion reflects the values
of all the measurements.
– The variance of a population of N measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
– The variance of a sample of n measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but measurements in B are much more dispersed than those in A.
Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2. Sum = 0
B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6. Sum = 0
The sum of deviations is zero in both cases, therefore, another measure is needed.
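A short sketch makes the point about the sum of deviations concrete for populations A and B. The helper name `deviations` is hypothetical and used only here.

```python
pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

def deviations(values):
    """Return each measurement's deviation from the mean of the list."""
    mu = sum(values) / len(values)
    return [x - mu for x in values]

for name, pop in (("A", pop_a), ("B", pop_b)):
    devs = deviations(pop)
    # Positive and negative deviations cancel, so the sum is 0 for both
    print(name, devs, sum(devs))
```

Because the deviations always cancel around the mean, their plain sum cannot distinguish A from B, which is why squared deviations are used instead.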
The sum of squared deviations is used in calculating the variance. See the example next.
Let us calculate the variance of the two populations
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
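The two population variances can be verified directly from the definition. In this sketch `population_variance` is a hypothetical helper written for illustration; `statistics.pvariance` from the standard library computes the same quantity.

```python
from statistics import pvariance

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

print(population_variance(pop_a), pvariance(pop_a))  # both equal 2
print(population_variance(pop_b), pvariance(pop_b))  # both equal 18
```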
Why is the variance defined as the average squared deviation?
Why not use the sum of squared deviations as a measure of dispersion instead?
After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!
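Which data set has a larger dispersion?
Data set B is more dispersed around the mean.
[Figure: data set A plotted on an axis marked 1, 2, 3, with the values 1 and 3 each occurring 5 times; data set B plotted on an axis marked 1, 3, 5]
$\text{Sum}_A = (1-2)^2 + \dots + (1-2)^2 + (3-2)^2 + \dots + (3-2)^2 = 10$ (each group of terms occurs 5 times)
However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked.
$\sigma_A^2 = \text{Sum}_A / N = 10/10 = 1$ (N = 10 observations)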
– Example 4.8
Find the mean and the variance of the following
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
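To see the difference between the sample and population formulas, the sketch below applies both to the five values 8, 9, 10, 11, 12 (population A from the variance discussion). Treating the same values as a sample is an assumption made here purely for comparison.

```python
from statistics import pstdev, stdev

data = [8, 9, 10, 11, 12]

# Population standard deviation: square root of the variance with divisor N
sigma = pstdev(data)   # sqrt(10 / 5)

# Sample standard deviation: square root of the variance with divisor n - 1
s = stdev(data)        # sqrt(10 / 4)

print(round(sigma, 4), round(s, 4))
```

The sample version is slightly larger because it divides the same sum of squared deviations by n - 1 rather than N.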
– Example 4.9
Rates of return over the past 10 years for two mutual funds are summarized below.
                     Fund A    Fund B
Mean                 16        12
Standard Error       5.295     3.152
Median               14.6      11.75
Mode                 #N/A      #N/A
Standard Deviation   16.74     9.969
Sample Variance      280.3     99.37
Kurtosis             -1.34     -0.46
Skewness             0.217     0.107
Range                49.1      30.6
Minimum              -6.2      -2.8
Maximum              42.9      27.8
Sum                  160       120
Count                10        10
Fund A should be considered riskier because its standard deviation is larger.
The coefficient of variation
– The coefficient of variation of a set of
measurements is the standard deviation divided
by the mean value.
Sample coefficient of variation: $cv = \frac{s}{\bar{x}}$
Population coefficient of variation: $CV = \frac{\sigma}{\mu}$
– This coefficient provides a proportionate measure of variation.
A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
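The comparison in the last sentence (a standard deviation of 10 against means of 100 and 500) can be expressed as a one-line function. The name `coefficient_of_variation` is illustrative, and the zero-mean check is an added safeguard not discussed in the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a proportion of the mean."""
    if mean == 0:
        # The ratio is undefined when the mean is zero
        raise ValueError("coefficient of variation is undefined for mean 0")
    return std_dev / mean

print(coefficient_of_variation(10, 100))  # 0.1
print(coefficient_of_variation(10, 500))  # 0.02
```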
Interpreting Standard
Deviation
The standard deviation can be used to
– compare the variability of several distributions
– make a statement about the general shape of a
distribution.
The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
– $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
– $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
– $(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements
– Example 4.10
The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements.
• Solution
First check if the histogram has an approximate mound shape.
[Figure: histogram of call durations; horizontal axis bins 2, 5, 8, 11, 14, 17, 20, More; vertical axis frequency from 0 to 10]
• Calculate the mean and the standard deviation:
Mean = 10.26; Standard deviation = 4.29.
Interval         Empirical Rule    Actual percentage
5.97, 14.55      68%               70%
1.68, 18.84      95%               96.7%
-2.61, 23.13     100%              100%
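The three intervals in the table follow from the reported mean and standard deviation. The sketch below recomputes only the interval endpoints; the actual percentages depend on the 30 call durations, which are not reproduced here, so they are not recomputed.

```python
mean, sd = 10.26, 4.29   # values reported for Example 4.10

# Intervals of one, two and three standard deviations around the mean
intervals = [
    (round(mean - k * sd, 2), round(mean + k * sd, 2))
    for k in (1, 2, 3)
]

for k, (low, high) in zip((1, 2, 3), intervals):
    print(f"mean ± {k}s: ({low}, {high})")
```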
Other conclusions
By the empirical rule, approximately 95% of the area
under a mound-shaped histogram lies between