1.
2.
a. Find the mean, median and mode of the following observations and compare their values.
5, 7, 3, 5, 6, 8, 5, 6, 4, 6, 25
b. Eliminate the last observation x = 25 and then find the mean, median and mode. How do these values compare with those found using the full data set?
c. How do possible outliers (such as 25) affect the values of these three measures of center?
3.
Suppose that as a prospective buyer you wish to compare the prices of homes in several parts of town.
a. If the homes in one area of town consisted of tract homes with about the same square feet of living space,
what measure would you use in finding the typical cost of a home in this area?
b. If the homes in another area of town consisted of homes which were built by individual contractors with
varying square feet of living area including several large custom homes, what measure might you wish to
use to find the typical cost of a home in this area?
4.
The following data are the ages (in months) at which n = 50 children were first enrolled in a preschool.
38 47 32 55 42 40 36 35 45 45
40 35 34 39 50 48 41 40 42 38
30 34 41 33 37 36 43 30 41 46
35 43 30 32 39 31 48 46 36 36
39 41 46 32 33 36 40 37 50 31
a.
b.
c.
d.
5.
The average weekly unemployment benefit amounts for the 50 US states are given to the nearest dollar.
$159 190 163 210 160 256 258 215 220 212
$222 238 247 225 182 202 212 293 244 290
$188 222 217 290 180 247 231 210 236 214
$190 181 189 227 213 216 204 281 198 233
$284 209 252 157 186 188 233 264 253 207
a.
b.
c.
1.
2.
a.
b. mean is smaller.
c. The mean is strongly affected by the outlier, while the median and mode are affected little or not at all.
3.
a. If all homes cost about the same amount, and there are no unusually high- or low-priced homes in the tract, use the sample mean.
b. Since the custom homes may be much more expensive than the other homes, the average or mean cost may not be a good measure of center. You should use the median cost.
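The effect described in solutions 2 and 3 can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics` module; the helper name `centers` is introduced here for illustration only. It computes the three measures of center for the Exercise 2 observations with and without x = 25.

```python
from statistics import mean, median, multimode

# Observations from Exercise 2; the last value, 25, is the suspected outlier.
full = [5, 7, 3, 5, 6, 8, 5, 6, 4, 6, 25]
trimmed = full[:-1]  # the same data with x = 25 removed


def centers(data):
    """Return (mean, median, sorted list of modes) for a list of numbers."""
    return mean(data), median(data), sorted(multimode(data))


full_mean, full_median, full_modes = centers(full)
trim_mean, trim_median, trim_modes = centers(trimmed)

print(f"with 25:    mean={full_mean:.2f} median={full_median} modes={full_modes}")
print(f"without 25: mean={trim_mean:.2f} median={trim_median} modes={trim_modes}")
```

Removing the single large value moves the mean far more than the median, and leaves the modes unchanged, which is why the median is the safer choice when a few very expensive homes are present.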
4.
a. The data are arranged in order of ascending magnitude below.
30 30 30 31 31 32 32 32 33 33
34 34 35 35 35 36 36 36 36 36
37 37 38 38 39 39 39 40 40 40
40 41 41 41 41 42 42 43 43 45
45 46 46 46 47 48 48 50 50 55
The median for n = 50 observations is the average of the 25th and 26th ordered observations, or
\[ \text{median} = \frac{39 + 39}{2} = 39 \]
and
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{1954}{50} = 39.08. \]
b. The frequency histogram is shown below. The modal class is 35 to < 40 with midpoint (35 + 40)/2 = 37.5.
c. Notice that the mean is shifted slightly to the right of both the median and the midpoint of the modal class.
d. The distribution in part b is slightly skewed to the right, which explains why the mean is shifted to the right of the median and the midpoint of the modal class.
[Figure: Histogram of Ages. Horizontal axis: Ages (30 to 60); vertical axis: Frequency (0 to 16).]
5.
a. The data are arranged in order of ascending magnitude below.
157 159 160 163 180 181 182 186 188 188
189 190 190 198 202 204 207 209 210 210
212 212 213 214 215 216 217 220 222 222
225 227 231 233 233 236 238 244 247 247
252 253 256 258 264 281 284 290 290 293
The median for n = 50 observations is the average of the 25th and 26th ordered observations, or
\[ \text{median} = \frac{215 + 216}{2} = 215.5 \]
and
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{10998}{50} = 219.96. \]
b. Using the relative frequency histogram from Exercise 2, SCE 1C, the modal class is 200 to < 215, which has midpoint 207.5.
c. Since the mean is shifted to the right of the median and the modal class, the distribution should be skewed to the right. See the histogram in Exercise 2, SCE 1C.
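The means and medians quoted in solutions 4 and 5 can be reproduced directly from the raw data. This is a small sketch, assuming Python's standard `statistics` module; the variable names are illustrative. In both data sets the mean lies to the right of the median, which is the pattern the solutions associate with right skew.

```python
from statistics import mean, median

# Ages (in months) from Exercise 4.
ages = [
    38, 47, 32, 55, 42, 40, 36, 35, 45, 45,
    40, 35, 34, 39, 50, 48, 41, 40, 42, 38,
    30, 34, 41, 33, 37, 36, 43, 30, 41, 46,
    35, 43, 30, 32, 39, 31, 48, 46, 36, 36,
    39, 41, 46, 32, 33, 36, 40, 37, 50, 31,
]

# Weekly unemployment benefits (in dollars) from Exercise 5.
benefits = [
    159, 190, 163, 210, 160, 256, 258, 215, 220, 212,
    222, 238, 247, 225, 182, 202, 212, 293, 244, 290,
    188, 222, 217, 290, 180, 247, 231, 210, 236, 214,
    190, 181, 189, 227, 213, 216, 204, 281, 198, 233,
    284, 209, 252, 157, 186, 188, 233, 264, 253, 207,
]

summary = {}
for name, data in (("ages", ages), ("benefits", benefits)):
    summary[name] = (mean(data), median(data))
    x_bar, m = summary[name]
    # A mean larger than the median suggests skewing to the right.
    print(f"{name}: n={len(data)} mean={x_bar:.2f} median={m} mean>median={x_bar > m}")
```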
1.
2.
3.
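1.
a.
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{18}{5} = 3.6 \]
b.
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(2 - 3.6)^2 + \cdots + (5 - 3.6)^2}{4} = \frac{5.2}{4} = 1.3 \]
c.
\[ s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{70 - \frac{18^2}{5}}{4} = \frac{5.2}{4} = 1.3 \]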
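The shortcut formula in part c needs only n, the sum of the observations and the sum of their squares. The sketch below is a hedged illustration of that formula; the function name `sample_variance_from_sums` is hypothetical, and the sums are the ones quoted in the solution (the raw observations are not listed here).

```python
from math import sqrt


def sample_variance_from_sums(n, sum_x, sum_x2):
    """Shortcut formula: s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Sums quoted in solution 1: n = 5, sum of x = 18, sum of x^2 = 70.
s2 = sample_variance_from_sums(n=5, sum_x=18, sum_x2=70)
s = sqrt(s2)
print(f"s^2 = {s2:.4f}, s = {s:.3f}")
```

The same function applies to any solution in this set that quotes the two sums, since it never touches the individual observations.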
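2.
a. $R = 10 - 6 = 4$
b.
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{72}{9} = 8; \quad s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{588 - \frac{72^2}{9}}{8} = \frac{12}{8} = 1.5; \quad s = \sqrt{1.5} = 1.225. \]
c. $R/s = 4/1.225 = 3.27$. The range is between 3 and 4 standard deviations.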
3.
a.
\[ s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{78{,}118 - \frac{1954^2}{50}}{49} = \frac{1755.68}{49} = 35.8302; \quad s = \sqrt{35.8302} = 5.99. \]
b. Using the sorted data in the solution for Exercise 4, SCE 2A, $R = 55 - 30 = 25$.
c. $R/s = 25/5.99 = 4.2$. The range is slightly more than 4 standard deviations.
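The comparison of the range with the standard deviation can be reproduced from the preschool ages themselves. This is a minimal sketch, assuming Python's standard `statistics.stdev` (which divides by n - 1, matching the formula above); the variable names are illustrative.

```python
from statistics import stdev

# Ages (in months) from Exercise 4, SCE 2A.
ages = [
    38, 47, 32, 55, 42, 40, 36, 35, 45, 45,
    40, 35, 34, 39, 50, 48, 41, 40, 42, 38,
    30, 34, 41, 33, 37, 36, 43, 30, 41, 46,
    35, 43, 30, 32, 39, 31, 48, 46, 36, 36,
    39, 41, 46, 32, 33, 36, 40, 37, 50, 31,
]

s = stdev(ages)                     # sample standard deviation
data_range = max(ages) - min(ages)  # R = largest - smallest
ratio = data_range / s              # how many standard deviations the range spans

print(f"s = {s:.2f}, R = {data_range}, R/s = {ratio:.2f}")
```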
Self-Correcting Exercises 2C: Tchebysheff's Theorem, the Empirical Rule and the Range Approximation
1.
2.
Suppose you are told that the mean and standard deviation of a sample of n = 500 observations were
$\bar{x} = 50$ and $s = 10$. You know nothing else about the shape of the distribution for these data.
a. What can be said about the proportion of observations between 40 and 60?
b. What can be said about the proportion of observations between 30 and 70?
c. What can be said about the proportion of observations smaller than 30?
3.
Suppose now you are told that the data in Exercise 2 are mound-shaped.
a. What can be said about the proportion of observations between 40 and 60?
b. What can be said about the proportion of observations between 30 and 70?
c. What can be said about the proportion of observations smaller than 30?
4.
1.
a. $s \approx R/3 = (10 - 1)/3 = 3$
b.
\[ s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{386 - \frac{54^2}{10}}{9} = \frac{94.4}{9} = 10.4889 \]
2.
a. Since nothing is known about the shape of the distribution, you must use Tchebysheff's Theorem to describe the data. The interval 40 to 60 represents $\bar{x} \pm s \Rightarrow 50 \pm 10$. Since $k = 1$, the theorem guarantees only that at least $1 - 1/k^2 = 0$ of the measurements are in this interval, which gives no useful information.
b. The interval 30 to 70 represents $\bar{x} \pm 2s \Rightarrow 50 \pm 20$. Since $k = 2$, you can say that at least $1 - 1/2^2 = 3/4$ of the measurements are in this interval.
c. If at least 3/4 of the measurements are between 30 and 70, at most 1/4 of the measurements are outside this interval. Since you know nothing about the shape of the distribution, all of the measurements outside the interval might be less than 30, so you can say only that at most 1/4 of the measurements are smaller than 30.
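The bounds used in solution 2 all come from the expression 1 - 1/k^2. The sketch below evaluates it for the intervals in this exercise; the function name `tchebysheff_lower_bound` is introduced here for illustration.

```python
def tchebysheff_lower_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("Tchebysheff's Theorem is stated for k >= 1")
    return 1 - 1 / k ** 2


x_bar, s = 50, 10  # sample mean and standard deviation from Exercise 2
bounds = {}
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    bounds[k] = tchebysheff_lower_bound(k)
    print(f"{low} to {high}: at least {bounds[k]:.3f} of the measurements")
```

For k = 1 the bound is 0, which is why part a can promise nothing about the interval 40 to 60.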
3.
a. Using the Empirical Rule, approximately 68% of the measurements will be between 40 and 60.
b. Approximately 95% of the measurements will be in the interval $\bar{x} \pm 2s \Rightarrow 50 \pm 20$, or 30 to 70.
c. From b., approximately 5% of the measurements are outside the interval from 30 to 70. Since a mound-shaped distribution is symmetric about the mean, approximately $\tfrac{1}{2}(5\%) = 2.5\%$ will be less than 30.
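The Empirical Rule percentages are rounded values for a normal (bell-shaped) curve. As a rough check, and assuming the data follow a normal distribution exactly (an assumption made only for this sketch), Python's `statistics.NormalDist` gives the corresponding areas for a mean of 50 and a standard deviation of 10.

```python
from statistics import NormalDist

# Assumption for illustration: an exactly normal distribution with mean 50, sd 10.
dist = NormalDist(mu=50, sigma=10)

within_1s = dist.cdf(60) - dist.cdf(40)  # area between 40 and 60
within_2s = dist.cdf(70) - dist.cdf(30)  # area between 30 and 70
below_30 = dist.cdf(30)                  # area to the left of 30

print(f"40 to 60: {within_1s:.4f}")
print(f"30 to 70: {within_2s:.4f}")
print(f"below 30: {below_30:.4f}")
```

The exact normal areas differ slightly from 68%, 95% and 2.5%, which is why the solution describes each proportion as approximate.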
4.
a.
b.
c.
1.
Find the median, the lower and upper quartiles and the interquartile range for the following data: 4, 0, 5, 3, 6, 2,
5, 9, 5, 3.
2.
3.
Construct a boxplot for the data in Exercise 1. Are there any suspected outliers? Any extreme outliers?
4.
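1.
The ordered data are 0, 2, 3, 3, 4, 5, 5, 5, 6, 9.
Then $m = (4 + 5)/2 = 4.5$; $Q_1 = 2 + .75(3 - 2) = 2.75$; $Q_3 = 5 + .25(6 - 5) = 5.25$ and $\text{IQR} = Q_3 - Q_1 = 2.5$.
2.
a.
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{42}{10} = 4.2; \quad s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{230 - \frac{42^2}{10}}{9} = 5.9556; \quad s = \sqrt{5.9556} = 2.44. \]
b.
\[ \text{For } x = 0, \quad z\text{-score} = \frac{x - \bar{x}}{s} = \frac{0 - 4.2}{2.44} = -1.72. \]
\[ \text{For } x = 9, \quad z\text{-score} = \frac{x - \bar{x}}{s} = \frac{9 - 4.2}{2.44} = 1.97. \]
Neither value is unusually large or small.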
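The quartile arithmetic in solution 1 locates each quartile at position p(n + 1) in the ordered data and interpolates between neighbors. The following sketch implements that rule and the z-scores from solution 2; the helper name `quantile_by_position` is hypothetical and chosen for this illustration.

```python
from statistics import mean, stdev


def quantile_by_position(sorted_data, p):
    """Value at position p * (n + 1) in ordered data, with linear interpolation."""
    n = len(sorted_data)
    position = p * (n + 1)
    lower_rank = int(position)        # 1-based rank of the lower neighbor
    fraction = position - lower_rank  # how far to move toward the upper neighbor
    if lower_rank < 1:
        return sorted_data[0]
    if lower_rank >= n:
        return sorted_data[-1]
    lower = sorted_data[lower_rank - 1]
    upper = sorted_data[lower_rank]
    return lower + fraction * (upper - lower)


data = sorted([4, 0, 5, 3, 6, 2, 5, 9, 5, 3])  # data from Exercise 1

q1 = quantile_by_position(data, 0.25)
m = quantile_by_position(data, 0.50)
q3 = quantile_by_position(data, 0.75)
iqr = q3 - q1

# z-scores of the smallest and largest observations.
x_bar, s = mean(data), stdev(data)
z_min = (min(data) - x_bar) / s
z_max = (max(data) - x_bar) / s

print(f"Q1={q1}, m={m}, Q3={q3}, IQR={iqr}")
print(f"z(min)={z_min:.2f}, z(max)={z_max:.2f}")
```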
3.
4.
a. From Exercise 5a, SCE 2A, $\bar{x} = 219.96$ and $m = 215.5$. Using the sorted data in the solution to that exercise, the positions of the lower and upper quartiles are:
\[ .25(n + 1) = .25(51) = 12.75 \]
\[ .75(n + 1) = .75(51) = 38.25 \]
Then
\[ Q_1 = 190 + .75(190 - 190) = 190; \quad Q_3 = 244 + .25(247 - 244) = 244.75 \]
and $\text{IQR} = Q_3 - Q_1 = 54.75$.
Also,
\[ s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}} = \sqrt{\frac{2{,}478{,}896 - \frac{10998^2}{50}}{49}} = 34.93. \]
b. Lower and Upper Fences:
\[ Q_1 - 1.5\,\text{IQR} = 190 - 1.5(54.75) = 107.875 \]
c. [Figure: plot with horizontal axis labeled Dollars, 150 to 300.]
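The quartiles and fences for the unemployment benefit data can be recomputed from the raw values. This is a sketch under one assumption: that `statistics.quantiles` with `method="exclusive"` is an acceptable stand-in for the p(n + 1) position rule used above (for these data it returns the same quartiles). The upper fence, Q3 + 1.5 IQR, is computed here by the standard rule; its value is not quoted in the solution text.

```python
from statistics import quantiles

# Weekly unemployment benefits (in dollars) from Exercise 5, SCE 2A.
benefits = [
    159, 190, 163, 210, 160, 256, 258, 215, 220, 212,
    222, 238, 247, 225, 182, 202, 212, 293, 244, 290,
    188, 222, 217, 290, 180, 247, 231, 210, 236, 214,
    190, 181, 189, 227, 213, 216, 204, 281, 198, 233,
    284, 209, 252, 157, 186, 188, 233, 264, 253, 207,
]

# n=4 asks for the three cut points that split the data into quarters.
q1, m, q3 = quantiles(benefits, n=4, method="exclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations beyond the fences are suspected outliers.
suspected_outliers = [x for x in benefits if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, m={m}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"suspected outliers: {suspected_outliers}")
```

Every observation lies between 157 and 293, well inside the fences, so no value in this data set is flagged as a suspected outlier.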