Lecture 1

Introduction to
Statistics and
Data Analysis

Copyright © 2010 Pearson Addison-Wesley. All rights reserved.


Measures of
Location: The
Sample Mean
and Median



Definition 1.1
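
Assuming Definition 1.1 is the sample mean named in the section title, and writing the observations as $x_1, x_2, \ldots, x_n$ (notation assumed here), the standard formula is:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```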

Definition 1.2
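
Assuming Definition 1.2 is the sample median named in the section title, and writing the observations in increasing order as $x_1 \le x_2 \le \cdots \le x_n$ (notation assumed here), the standard definition is:

```latex
\tilde{x} =
\begin{cases}
x_{(n+1)/2}, & \text{if } n \text{ is odd},\\[4pt]
\tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even}.
\end{cases}
```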

Table 1.1 Data Set for Example 1.2

Figure 1.4 Sample mean as a centroid
of the with-nitrogen stem weight

Other Measures of Locations

• One instructive class of estimators is the class of trimmed means. A trimmed mean is computed by “trimming away” a certain percentage of both the largest and the smallest values.

• For example, the 10% trimmed mean is found by eliminating the largest 10% and smallest 10% of the values and computing the average of the remaining values.
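
The trimming rule can be sketched in Python. This is a minimal illustration rather than the textbook's procedure: the function name `trimmed_mean` and the sample values are hypothetical, and the sketch assumes the number of values dropped from each end is the trimming fraction times the sample size, rounded down.

```python
import math


def trimmed_mean(values, fraction):
    """Average of `values` after dropping `fraction` of the data from each end."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    # Assumption: trim floor(n * fraction) values per end; the small epsilon
    # guards against floating-point products such as 2.9999999999999996.
    k = math.floor(len(ordered) * fraction + 1e-9)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical sample of size 10 with one unusually large value.
sample = [2, 4, 5, 5, 6, 7, 7, 8, 9, 40]

print(sum(sample) / len(sample))   # ordinary mean: 9.3
print(trimmed_mean(sample, 0.10))  # 10% trimmed mean: 6.375
```

With a sample of size 10, the 10% trimmed mean drops exactly one value from each end, so the single large value no longer pulls the average upward.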

Other Measures of Locations

For example, in the case of the stem weight data, we would eliminate the largest and smallest values, since 10% of a sample of size 10 is one observation. So for the without-nitrogen group, the 10% trimmed mean is the average of the eight values that remain.

Exercise 1.2

According to the journal Chemical Engineering, an important property of a fiber is its water absorbency. A random sample of 20 pieces of cotton fiber was taken and the absorbency on each piece was measured. The following are the absorbency values:

18.71  21.41  20.72  21.81  19.29  22.43  20.17
23.71  19.44  20.50  18.92  20.33  23.00  22.85
19.25  21.77  22.11  19.77  18.04  21.12

a) Calculate the sample mean and median for the above sample values.
b) Compute the 10% trimmed mean.
c) Do a dot plot of the absorbency data.
d) Using only the values of the mean, median, and trimmed mean, do
you have evidence of outliers in the data?

Exercise 1.2 [Solution]
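
Parts (a) and (b) can be checked numerically. The following is a sketch using Python's standard `statistics` module, not the worked solution from the slide; the variable names are illustrative, and the trimmed mean assumes that 10% of n = 20 means dropping two values from each end.

```python
import statistics

# Absorbency values from Exercise 1.2 (n = 20).
absorbency = [
    18.71, 21.41, 20.72, 21.81, 19.29, 22.43, 20.17,
    23.71, 19.44, 20.50, 18.92, 20.33, 23.00, 22.85,
    19.25, 21.77, 22.11, 19.77, 18.04, 21.12,
]

n = len(absorbency)

# (a) Sample mean and median.
sample_mean = statistics.mean(absorbency)
sample_median = statistics.median(absorbency)

# (b) 10% trimmed mean: drop 10% of n (here 2 values) from each end.
k = round(0.10 * n)
kept = sorted(absorbency)[k:n - k]
trimmed_mean_10 = sum(kept) / len(kept)

print(f"mean             = {sample_mean:.4f}")      # 20.7675
print(f"median           = {sample_median:.2f}")    # 20.61
print(f"10% trimmed mean = {trimmed_mean_10:.4f}")  # 20.7431
```

For part (d), the three values lie close to one another, which on its own gives little indication of outliers. Part (c), the dot plot, is not covered by this sketch.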

Section 1.4
Measures of
Variability



Definition 1.3a

Range
The difference in value between the highest-valued (H) and the lowest-valued (L) observations:

range = high value – low value = H – L

For example, the sample 3, 3, 5, 6, 8 has a range of H – L = 8 – 3 = 5.
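
The same computation as a short Python sketch, using the sample from the slide (the variable names are illustrative):

```python
# Sample from the slide.
sample = [3, 3, 5, 6, 8]

high = max(sample)  # H, the highest value
low = min(sample)   # L, the lowest value
sample_range = high - low

print(sample_range)  # 5
```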

Definition 1.3
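
Assuming Definition 1.3 is the sample variance (the quantity Exercise 1.3 asks for), with observations $x_1, \ldots, x_n$ and sample mean $\bar{x}$ (notation assumed here), the standard definition with divisor $n - 1$ is:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}
```

Here $s$ is the sample standard deviation.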

Example 1.4

Exercise 1.3

Compute the sample variance and standard deviation for the water
absorbency data of Exercise 1.2.
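
A sketch of how this exercise could be checked with Python's standard `statistics` module. It assumes the sample variance uses the divisor n - 1, which is what `statistics.variance` and `statistics.stdev` compute; the variable names are illustrative.

```python
import statistics

# Absorbency values from Exercise 1.2 (n = 20).
absorbency = [
    18.71, 21.41, 20.72, 21.81, 19.29, 22.43, 20.17,
    23.71, 19.44, 20.50, 18.92, 20.33, 23.00, 22.85,
    19.25, 21.77, 22.11, 19.77, 18.04, 21.12,
]

# Sample variance and standard deviation (divisor n - 1).
sample_variance = statistics.variance(absorbency)
sample_sd = statistics.stdev(absorbency)

print(f"s^2 = {sample_variance:.4f}")  # about 2.5329
print(f"s   = {sample_sd:.4f}")        # about 1.5915
```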

Section 1.6
Statistical Modeling,
Scientific Inspection,
and Graphical Diagnostics



Table 1.3 Tensile strength

Figure 1.5 Scatter plot of tensile
strength and cotton percentages



Table 1.4 Car Battery Life



Table 1.5 Stem-and-Leaf Plot of
Battery Life

Table 1.7 Relative Frequency
Distribution of Battery Life

Figure 1.6 Relative frequency
histogram

Figure 1.7 Estimating frequency
distribution

Figure 1.8 Skewness of data

• A distribution is said to be symmetric if it can be folded along a vertical axis so that the two sides coincide. A distribution that lacks symmetry with respect to a vertical axis is said to be skewed.
• The distribution illustrated in Figure 1.8(a) is said to be
skewed to the right since it has a long right tail and a
much shorter left tail.
• In Figure 1.8(b) we see that the distribution is symmetric,
while in Figure 1.8(c) it is skewed to the left.
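
The direction of skewness can also be checked numerically. The sketch below uses the moment coefficient of skewness, a common summary that the slides themselves do not define; the function name `skewness` and all three samples are hypothetical. A positive value indicates a longer right tail, a negative value a longer left tail, and a value near zero indicates approximate symmetry.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed summary)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples illustrating the three shapes.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # long right tail
symmetric = [1, 2, 3, 4, 5]                 # mirror image about 3
left_skewed = [-x for x in right_skewed]    # long left tail

print(skewness(right_skewed) > 0)  # True
print(skewness(symmetric) == 0)    # True
print(skewness(left_skewed) < 0)   # True
```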

Table 1.8 Nicotine Data for
Example 1.5

Figure 1.10 Stem-and-Leaf plot
for the nicotine data
