Describing Numerical Data
Chapter 4
4.1 Summaries of Numerical Variables
Can 500 different songs fit on the iPod Shuffle?
To answer this question, we must understand the typical size of a song and the variation of song sizes around that typical size.
We can do this using summary statistics.
Copyright 2011 Pearson Education, Inc.
A Subset of the Data
The Median
Value in the middle of a sorted list of numerical
values (a typical value)
Half of the values fall below the median; half fall
above
It is the 50th percentile
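As a rough illustration of the definition, the median can be found by sorting the values and taking the middle one. The function name `median` and the song sizes below are hypothetical and made up for this sketch; averaging the two middle values for an even count is a common convention that the slides do not spell out.

```python
def median(values):
    """Return the middle value of a list of numbers (the 50th percentile)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical song sizes in MB, for illustration only.
sizes_mb = [3.1, 4.8, 2.2, 3.6, 5.0]
print(median(sizes_mb))  # 3.6
```

Two of the five values fall below 3.6 and two fall above it, which matches the "half below, half above" description.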
Common Percentiles
Lower Quartile = 25th Percentile
Upper Quartile = 75th Percentile
One quarter of the values fall below the lower
quartile and one quarter fall above the upper
quartile
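A minimal sketch of computing the quartiles, assuming Python's standard `statistics.quantiles` with the "inclusive" method. Software packages use slightly different interpolation rules for percentiles, so other tools may return slightly different quartiles for the same data. The helper name `quartiles` and the data are hypothetical.

```python
import statistics


def quartiles(values):
    """Return (lower quartile, median, upper quartile) of the values."""
    # n=4 splits the data into four equal-probability groups, which
    # yields the 25th, 50th, and 75th percentiles as cut points.
    lower, middle, upper = statistics.quantiles(values, n=4, method="inclusive")
    return lower, middle, upper


# Hypothetical data, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(data))  # (3.0, 5.0, 7.0)
```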
The Interquartile Range (IQR)
IQR = 75th Percentile - 25th Percentile
A measure of variation based on quartiles
Used to accompany the median
The Range
Range = Maximum - Minimum
Maximum Value = 100th Percentile
Minimum Value = 0th Percentile
Another measure of variation; not preferred
because it is based on extreme values
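To see why relying on extreme values is a drawback, the sketch below compares the range and the IQR on a small made-up data set before and after one value is replaced by an extreme one. The helper `spread` and the data are hypothetical; the quartiles come from `statistics.quantiles` with the "inclusive" method, which is one of several conventions.

```python
import statistics


def spread(values):
    """Return (range, IQR) for a list of numbers."""
    value_range = max(values) - min(values)
    lower, _, upper = statistics.quantiles(values, n=4, method="inclusive")
    return value_range, upper - lower


# Hypothetical data: the second list swaps the largest value for an extreme one.
base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(spread(base))          # (8, 4.0)
print(spread(with_extreme))  # (89, 4.0)
```

One unusual value inflates the range from 8 to 89, while the IQR is unchanged in this example.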
The Five Number Summary
Minimum
Lower Quartile
Median
Upper Quartile
Maximum
The Five Number Summary for Song Sizes
Minimum = 0.148 MB
Lower Quartile = 2.85 MB
Median = 3.5015 MB
Upper Quartile = 4.32 MB
Maximum = 21.622 MB
Summary Statistics for Song Sizes
Median = 3.5015 MB
IQR = 4.32 MB - 2.85 MB = 1.47 MB
Range = 21.622 MB - 0.148 MB = 21.474 MB
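The arithmetic can be checked directly from the five-number summary reported for the song sizes. This is only a sketch: the dictionary keys are names chosen here, and the results are rounded to three decimals to hide floating-point noise.

```python
# Five-number summary for song sizes (MB), as reported in the slides.
five_number = {
    "minimum": 0.148,
    "lower_quartile": 2.85,
    "median": 3.5015,
    "upper_quartile": 4.32,
    "maximum": 21.622,
}

# IQR = upper quartile - lower quartile
iqr = round(five_number["upper_quartile"] - five_number["lower_quartile"], 3)

# Range = maximum - minimum
value_range = round(five_number["maximum"] - five_number["minimum"], 3)

print(iqr)          # 1.47
print(value_range)  # 21.474
```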
The Mean (Average)
Arithmetic average; divide the sum of the values
by the number of values (another typical value)
The symbol y represents the variable of interest
The symbol $\bar{y}$ (read "y bar") represents the mean
$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
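The definition of the mean translates directly into code: add up the values and divide by how many there are. The function name `mean` and the data below are hypothetical and serve only to illustrate the calculation.

```python
def mean(values):
    """Return the arithmetic average: the sum of the values divided by their count."""
    n = len(values)
    if n == 0:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / n


# Hypothetical song sizes in MB, for illustration only.
sizes_mb = [2.0, 3.0, 4.0, 7.0]
print(mean(sizes_mb))  # 4.0
```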
The Variance ($s^2$)
A measure of variation based on the mean
How far a value is from the mean is known
as its deviation; the variance is the average
of the squared deviations
$$s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}$$
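A minimal sketch of the variance calculation, assuming the usual sample convention of dividing the sum of squared deviations by n - 1 rather than n. It also takes the square root to obtain the standard deviation, which is in the same units as the data. The function names and the data are hypothetical.

```python
import math


def variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("variance needs at least two values")
    y_bar = sum(values) / n
    # Each deviation is the distance of a value from the mean.
    squared_deviations = [(y - y_bar) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)  # assumed n - 1 divisor


def standard_deviation(values):
    """Square root of the variance, expressed in the units of the data."""
    return math.sqrt(variance(values))


# Hypothetical data, for illustration only.
data = [2.0, 4.0, 6.0]
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```

If the data were in MB, the variance would be in squared MB, which is why the standard deviation is usually the quantity reported.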
4.5 Epilog
Can 500 different songs fit on the iPod
Shuffle?
Because of variation, not every collection of 500
songs will fit. The longest 500 songs won't fit.
However, based on the typical song size, the
amount of variation in song sizes and the shape
of its distribution, we can say that most
collections of 500 songs will fit!
Best Practices
Be sure that data are numerical when using
histograms and summaries such as the mean
and standard deviation.
Summarize the distribution of a numerical
variable with a graph.
Choose interval widths appropriate to the data
when preparing a histogram.
Scale your plots to show data, not empty space.
Anticipate what you will see in a histogram.
Label clearly.
Check for gaps.
Pitfalls
Do not use the methods of this chapter for
categorical variables.
Do not assume that all numerical data have a
bell-shaped distribution.
Do not ignore the presence of outliers.
Do not remove outliers unless you have a good
reason.
Do not forget to take the square root of a
variance.