
Descriptive Statistics

2015 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a
license distributed with a certain product or service or otherwise on a password-protected website for classroom use.
What is Statistics?

Statistics is a way to get information from data.

Data → Statistics → Information

Statistics is a tool for creating new understanding from a set of numbers.

Definitions: Oxford English Dictionary


Statistical Techniques/Methods

1) Formulate problem
2) Get some data
3) Visualize the data
4) Do some statistical calculations
5) Interpret results

Descriptive Statistics
Descriptive statistics deals with methods of organizing,
summarizing, and presenting data in a convenient and
informative way.

One form of descriptive statistics uses graphical techniques,
which allow statistics practitioners to present data in ways that
make it easy for the reader to extract useful information.

Chapter 2 introduces several graphical methods.

Descriptive Statistics
Another form of descriptive statistics uses numerical
techniques to summarize data.

The mean and median are popular numerical techniques to
describe the location of the data.

The range, variance, and standard deviation measure the
variability of the data.

Chapter 4 introduces several numerical statistical measures
that describe different features of the data.
Inferential statistics
The information we would like to acquire in the case is an
estimate of annual profits from the exclusivity agreement. The
data are the numbers of cans of soft drinks consumed in 7 days
by the 500 students in the sample.

We want to know the mean number of soft drinks consumed
by all 50,000 students on campus.

To accomplish this goal we need another branch of statistics:
inferential statistics.

Inferential statistics
Inferential statistics is a body of methods used to draw
conclusions or inferences about characteristics of populations
based on sample data. The population in question in this case
is the soft drink consumption of the university's 50,000
students. Interviewing every student would be prohibitively
expensive and extremely time consuming. Statistical
techniques make such endeavors unnecessary. Instead, we can
sample a much smaller number of students (the sample size is
500) and infer from the data the number of soft drinks
consumed by all 50,000 students. We can then estimate annual
profits for Cola.
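
The idea of estimating a population mean from a sample can be sketched in a few lines of Python. This is only an illustration: the population below is synthetic (random integers between 0 and 20 cans per week), not the case data, and the seed and variable names are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic population (assumption): weekly cans for 50,000 students.
population = [rng.randint(0, 20) for _ in range(50_000)]
parameter = statistics.mean(population)   # population mean, a parameter

# Simple random sample of 500 students, drawn without replacement.
sample = rng.sample(population, 500)
statistic = statistics.mean(sample)       # sample mean, a statistic

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

In practice only the sample mean would be available; the population mean is computed here solely to show how close the estimate tends to be.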

Key Statistical Concepts
Population
A population is the group of all items of interest to
a statistics practitioner.
Frequently very large; sometimes infinite.
E.g., all cola users

Sample
A sample is a set of data drawn from the
population.
Potentially very large, but smaller than the population.
E.g., a sample of drinkers
Key Statistical Concepts
Parameter
A descriptive measure of a population.

Statistic
A descriptive measure of a sample.

Key Statistical Concepts
[Diagram: the Sample is a subset of the Population; the Population has a Parameter, the Sample has a Statistic.]

Populations have Parameters,
Samples have Statistics.

Descriptive Statistics
are methods of organizing, summarizing, and presenting
data in a convenient and informative way. These methods
include:
Graphical Techniques (Chapter 2), and
Numerical Techniques (Chapter 4).
The actual method used depends on what information we
would like to extract. Are we interested in
- measure(s) of central location? and/or
- measure(s) of variability (dispersion)?

Descriptive statistics helps to answer these questions.

Inferential Statistics
Descriptive statistics describes the data set that is being
analyzed, but does not allow us to draw any conclusions or
make any inferences about the data. Hence we need
another branch of statistics: inferential statistics.

Inferential statistics is also a set of methods, but it is used to
draw conclusions or inferences about characteristics of
populations based on data from a sample.

Statistical Inference
Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.
[Diagram: a Sample is drawn from the Population; Inference runs from the sample Statistic back to the population Parameter.]

What can we infer about a Population's Parameters
based on a Sample's Statistics?
Statistical Inference
We use statistics to make inferences about parameters.

Therefore, we can make an estimate, prediction, or decision
about a population based on sample data.

Thus, we can apply what we know about a sample to the
larger population from which it was drawn!

Types of Data

Cross-sectional data
Data collected by recording a characteristic of many subjects
at the same point in time, or without regard to differences in
time.
Subjects might include individuals, households, firms,
industries, regions, and countries.

Types of Data

Time series data


Data collected by recording a characteristic of a subject over
several time periods.
Data can include daily, weekly, monthly, quarterly, or
annual observations.
[Figure: U.S. GDP growth rate from 1980 to 2010, an example of time series data.]

Variables and Scales of Measurement

Describe variables and various types of measurement scales.

A variable is the general characteristic being
observed on an object of interest.
Types of Variables
- Qualitative: gender, race, political affiliation
- Quantitative: test scores, age, weight
  - Discrete
  - Continuous

Variables and Scales of Measurement

Types of Quantitative Variables


Discrete
A discrete variable assumes a countable
number of distinct values.
Examples: Number of children in a family,
number of points scored in a basketball
game.

Variables and Scales of Measurement

Types of Quantitative Variables


Continuous
A continuous variable can assume an
infinite number of values within some
interval.
Examples: Weight, height, investment
return.

Variables and Scales of Measurement

Scales of Measurement

- Nominal (Qualitative)
- Ordinal (Qualitative)
- Ratio (Quantitative)

Variables and Scales of Measurement

The Nominal Scale

Data are simply categories for grouping the data.

Qualitative values may be converted
to quantitative values for analysis purposes.

Variables and Scales of Measurement

The Ordinal Scale


Ordinal data may be categorized and ranked with respect to
some characteristic or trait.
For example, instructors are often evaluated on an ordinal scale
(excellent, good, fair, poor).

Differences between categories are meaningless because the
actual numbers used may be arbitrary.
There is no objective way to interpret the difference between levels of
instructor quality.

Variables and Scales of Measurement

The Ratio Scale


The strongest level of measurement.
Differences between values are equal and meaningful.
There is an absolute 0 or defined starting point. A value of 0
means the absence of the quantity being measured. Thus,
meaningful ratios may be obtained.

Variables and Scales of Measurement

The Ratio Scale


The following variables are measured on a ratio scale:
General Examples: Weight, Time, and Distance
Business Examples: Sales, Profits, and Inventory Levels

Hierarchy of Data
Ratio
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.

Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.

Nominal
Values are the arbitrary numbers that represent categories.
Only calculations based on the frequencies of occurrence are valid.
Data cannot be treated as ordinal or ratio.
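
As a rough illustration of this hierarchy, the sketch below applies to each scale only the kind of calculation the slide lists as valid. All three data sets are hypothetical values invented for the example, and the 4-to-1 coding of the ratings is an assumption.

```python
import statistics
from collections import Counter

# Hypothetical data (assumptions made for illustration only).
work_status_codes = [1, 1, 5, 2, 1, 7, 5, 1]  # nominal: codes are arbitrary labels
ratings = [4, 3, 3, 2, 4, 1, 3]               # ordinal: 4 = excellent ... 1 = poor
sales = [120.0, 80.0, 100.0, 300.0]           # ratio: real numbers with a true zero

# Nominal: only frequencies of occurrence are meaningful.
nominal_counts = Counter(work_status_codes)

# Ordinal: calculations based on ordering, such as the median, are meaningful.
ordinal_median = statistics.median(ratings)

# Ratio: all calculations, including the mean, are meaningful.
ratio_mean = statistics.mean(sales)

print(nominal_counts.most_common(1), ordinal_median, ratio_mean)
```

Ratio data could also be passed through `Counter` or `statistics.median`, which mirrors the statement that ratio data may be treated as ordinal or nominal.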

Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count
the frequency of each value of the variable.

We can summarize the data in a table that presents the
categories and their counts, called a frequency distribution.

A relative frequency distribution lists the categories and the
proportion with which each occurs.

Example 2.1 Work Status in the GSS 2012 Survey
[GSS2012*] In Chapter 1 we briefly introduced the General Social Survey.
In the 2012 survey, respondents were asked the following question:
"Last week were you working full time, part time, going to school, keeping
house, or what?" The responses were
1. Working full time
2. Working part time
3. Temporarily not working
4. Unemployed, laid off
5. Retired
6. School
7. Keeping house
8. Other
The responses were recorded using the codes 1, 2, 3, 4, 5, 6, 7, and 8,
respectively.
Frequency and Relative Frequency Distributions

Work Status               Code   Frequency   Relative Frequency (%)
Working full-time           1       912            46.2
Working part-time           2       226            11.5
Temporarily not working     3        40             2.0
Unemployed, laid off        4       104             5.3
Retired                     5       357            18.1
School                      6        70             3.5
Keeping house               7       210            10.6
Other                       8        54             2.7
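
The percentages in this table can be recomputed from the frequencies. The following is a minimal sketch using the counts shown above; the dictionary and variable names are chosen only for the example.

```python
# Frequencies from the work status table (GSS 2012).
frequencies = {
    "Working full-time": 912,
    "Working part-time": 226,
    "Temporarily not working": 40,
    "Unemployed, laid off": 104,
    "Retired": 357,
    "School": 70,
    "Keeping house": 210,
    "Other": 54,
}

total = sum(frequencies.values())  # total number of responses

# Relative frequency (%) = 100 * class frequency / total, rounded to one decimal.
relative = {status: round(100 * count / total, 1)
            for status, count in frequencies.items()}

for status, count in frequencies.items():
    print(f"{status:<25} {count:>5} {relative[status]:>6.1f}")
```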

Nominal Data (Frequency)
[Bar chart: frequency for each WRKSTAT code 1 to 8: 912, 226, 40, 104, 357, 70, 210, 54.]

Bar Charts are often used to display frequencies


Nominal Data (Relative Frequency)
[Pie chart: relative frequency for each WRKSTAT code: 1, 46.2%; 2, 11.5%; 3, 2.0%; 4, 5.3%; 5, 18.1%; 6, 3.5%; 7, 10.6%; 8, 2.7%.]

Pie Charts show relative frequencies


Nominal Data
The bar chart and the pie chart carry the same information (they are based on the same data); only the presentation differs.

Describing the Relationship between Two Nominal Variables
To describe the relationship between two nominal variables, we must
remember that we are permitted only to determine the frequency of the
values. As a first step we need to produce a cross-classification table,
which lists the frequency of each combination of the values of the two
variables.
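
A cross-classification table can be built by counting each combination of values. The sketch below uses a handful of hypothetical (browser, day) pairs; the data and names are assumptions for illustration, not taken from any data set in these slides.

```python
from collections import Counter

# Hypothetical observations of two nominal variables: (browser, day of week).
observations = [
    ("Chrome", "Mon"), ("Firefox", "Mon"), ("Chrome", "Tue"),
    ("Chrome", "Mon"), ("Other", "Tue"), ("Firefox", "Tue"),
]

cell_counts = Counter(observations)  # frequency of each combination

rows = sorted({browser for browser, _ in observations})
cols = sorted({day for _, day in observations})

# table[row][col] holds the frequency of that combination (0 if never observed).
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

print("browser ", *cols)
for r in rows:
    print(f"{r:<8}", *(table[r][c] for c in cols))
```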

Problem
A chocolate manufacturing company sells quality chocolate products at
its plant and retail stores. Two years ago, the company developed a Web
site and began selling its products over the Internet. Web site sales have
exceeded the company's expectations, and management is now considering
strategies to increase sales even further. To learn more about the Web site
customers, a sample of 50 transactions was selected from the
previous month's sales.

The data show
- the day of the week each transaction was made,
- the type of browser the customer used,
- the time spent on the Web site,
- the number of Web site pages viewed, and
- the amount spent by each of the 50 customers.

Box Plot

Example 3.1
Following deregulation of telephone service, several new
companies were created to compete in the business of
providing long-distance telephone service. In almost all
cases these companies competed on price since the service
each offered is similar. Pricing a service or product in the
face of stiff competition is very difficult. Factors to be
considered include supply, demand, price elasticity, and the
actions of competitors. Long-distance packages may employ
per-minute charges, a flat monthly rate, or some combination
of the two. Determining the appropriate rate structure is
facilitated by acquiring information about the behaviors of
customers and in particular the size of monthly long-distance
bills.

Example 3.1
As part of a larger study, a long-distance company wanted to
acquire information about the monthly bills of new
subscribers in the first month after signing with the
company. The company's marketing manager conducted a
survey of 200 new residential subscribers wherein the first
month's bills were recorded. These data are stored in file
Xm03-01. The general manager planned to present his
findings to senior executives. What information can be
extracted from these data?

Example 3.1
We have chosen eight classes defined in such a way that each
observation falls into one and only one class. These classes are defined
as follows:
Classes
Amounts that are less than or equal to 15
Amounts that are more than 15 but less than or equal to 30
Amounts that are more than 30 but less than or equal to 45
Amounts that are more than 45 but less than or equal to 60
Amounts that are more than 60 but less than or equal to 75
Amounts that are more than 75 but less than or equal to 90
Amounts that are more than 90 but less than or equal to 105
Amounts that are more than 105 but less than or equal to 120
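
Because each class includes its upper limit but not its lower one, every amount falls into exactly one class. A minimal sketch of that assignment rule follows; the function name `class_index` and the sample bills are hypothetical and only illustrate the class definitions listed above.

```python
from bisect import bisect_left

# Upper limits of the eight classes defined above.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]

def class_index(amount):
    """Return the 0-based class whose interval (lower, upper] contains amount.

    The first class also contains 0, since it is 'less than or equal to 15'.
    """
    if amount < 0 or amount > upper_limits[-1]:
        raise ValueError("amount outside the class range")
    # bisect_left finds the first upper limit that is >= amount.
    return bisect_left(upper_limits, amount)

# Hypothetical bills (assumption), chosen to probe the class boundaries.
bills = [0.0, 15.0, 15.01, 42.19, 90.0, 119.63]

counts = [0] * len(upper_limits)
for bill in bills:
    counts[class_index(bill)] += 1

print(counts)
```

A bill of exactly 15 lands in the first class and 15.01 in the second, which is what "more than 15 but less than or equal to 30" requires.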

Example 3.1
[Histogram: Frequency (0 to 80) versus Bills, with class upper limits 15, 30, 45, 60, 75, 90, 105, 120.]

Interpret

About half (71 + 37 = 108) of the bills are small,
i.e., less than $30.

(18 + 28 + 14 = 60) ÷ 200 = 30%,
i.e., nearly a third of the phone bills fall in the three highest classes
(more than $75).

There are only a few telephone bills in the middle range.

Building a Histogram
1) Collect the Data
2) Create a frequency distribution for the data
How?
a) Determine the number of classes to use
How?
Refer to table 3.2: with 200 observations, we should have
between 7 & 10 classes.

Alternatively, we could use Sturges' formula:

Number of class intervals = 1 + 3.3 log(n)
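
As a quick check of Sturges' formula for the 200 bills, the sketch below assumes the logarithm is base 10 (the coefficient 3.3 is close to 1 / log10(2), which suggests this reading). The function name is hypothetical.

```python
import math

def sturges_classes(n):
    """Number of class intervals suggested by 1 + 3.3 * log10(n).

    Assumption: 'log' in the formula means the base-10 logarithm.
    """
    return 1 + 3.3 * math.log10(n)

k = sturges_classes(200)
print(f"suggested number of classes for n = 200: {k:.2f}")
```

The result lies between 8 and 9, which agrees with the 7 to 10 classes recommended for 200 observations.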
Building a Histogram
1) Collect the Data
2) Create a frequency distribution for the data
How?
a) Determine the number of classes to use. [8]
b) Determine how large to make each class
How?
Look at the range of the data, that is,
Range = Largest Observation − Smallest Observation
Range = $119.63 − $0 = $119.63
Then each class width becomes:
Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15
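
The class-width arithmetic can be reproduced directly. In this sketch the raw width is rounded up to a whole dollar, which is one simple way (an assumption, not necessarily the textbook's rule) to arrive at the convenient width of 15.

```python
import math

largest = 119.63   # largest observation
smallest = 0.0     # smallest observation
num_classes = 8

data_range = largest - smallest
raw_width = data_range / num_classes    # exact width before rounding
class_width = math.ceil(raw_width)      # rounded up to a convenient whole number

print(f"range = {data_range:.2f}, raw width = {raw_width:.2f}, class width = {class_width}")

# Eight classes of this width must cover the largest observation.
covers_data = num_classes * class_width >= largest
print(covers_data)
```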

Shapes of Histograms
Symmetry
A histogram is said to be symmetric if, when we draw a
vertical line down the center of the histogram, the two sides
are identical in shape and size:
[Figure: three symmetric histograms, Frequency versus Variable.]

Shapes of Histograms
Skewness
A skewed histogram is one with a long tail extending to
either the right or the left:
[Figure: two histograms, Frequency versus Variable: Positively Skewed (long tail to the right) and Negatively Skewed (long tail to the left).]

Shapes of Histograms
Modality
A unimodal histogram is one with a single peak, while a
bimodal histogram is one with two peaks:

[Figure: two histograms, Frequency versus Variable: Unimodal and Bimodal.]

A modal class is the class with
the largest number of observations.

Shapes of Histograms
Bell Shape
A special type of symmetric unimodal histogram is one that
is bell shaped:

[Figure: a bell-shaped histogram, Frequency versus Variable.]

Many statistical techniques require that the population
be bell shaped.

Drawing the histogram helps verify the shape of
the population in question.
Histogram Comparison
Compare & contrast the following histograms based on data
from Ex. 3.3 & Ex. 3.4:
The two courses, Business Statistics
and Mathematical Statistics, have very
different histograms: unimodal vs. bimodal.

Ogive
(pronounced Oh-jive) is a graph of
a cumulative frequency distribution.

We create an ogive in three steps.

First, from the frequency distribution created earlier,
calculate relative frequencies:

Relative Frequency = (# of observations in a class) ÷ (Total # of observations)

Relative Frequencies
For example, we had 71 observations in our first class
(telephone bills from $0.00 to $15.00). Thus, the relative
frequency for this class is 71 ÷ 200 (the total # of phone
bills) = 0.355 (or 35.5%).

Ogive
is a graph of a cumulative frequency distribution.

We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the
current class relative frequency to the previous class
cumulative relative frequency.
(For the first class, its cumulative relative frequency is just its relative frequency.)

Cumulative Relative Frequencies

[Table: cumulative relative frequency for each class, beginning with the first class.]

Ogive
is a graph of a cumulative frequency distribution.
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies.
3) Graph the cumulative relative frequencies.
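
The first two steps can be sketched as follows, with the third step reduced to printing the points an ogive would be drawn through. The counts 71, 37, 18, 28 and 14 appear in these slides; the three middle-class counts (13, 9, 10) are not shown here and are an assumption, chosen so that the total is 200 bills.

```python
from itertools import accumulate

upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]

# Class frequencies; the middle three values (13, 9, 10) are assumed.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
total = sum(frequencies)

# Step 1: relative frequency of each class.
relative = [f / total for f in frequencies]

# Step 2: running sum of the relative frequencies.
cumulative = list(accumulate(relative))

# Step 3 (stand-in for the graph): the points the ogive passes through.
for upper, cum in zip(upper_limits, cumulative):
    print(f"{upper:>4}  {cum:.3f}")
```

Plotting these pairs with any charting tool, starting from a cumulative value of 0 at the lower limit of the first class, gives the ogive.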

Ogive

The ogive can be used to
answer questions like:

What telephone bill value
is at the 50th percentile?

Around $35
(Refer also to Fig. 2.13 in your textbook)
Numerical Descriptive Techniques
Measures of Central Location
Mean, Median, Mode

Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation

Measures of Relative Standing
Percentiles, Quartiles

Measures of Linear Relationship
Covariance, Correlation, Determination, Least Squares Line
Measures of Central Location
The arithmetic mean, a.k.a. average, shortened to mean, is
the most popular & useful measure of central location.

It is computed by simply adding up all the observations and
dividing by the total number of observations:

Mean = (Sum of the observations) ÷ (Number of observations)

Notation
When referring to the number of observations in a
population, we use uppercase letter N.

When referring to the number of observations in a
sample, we use lowercase letter n.

The arithmetic mean for a population is denoted with the Greek
letter mu: $\mu$

The arithmetic mean for a sample is denoted with an
x-bar: $\bar{x}$
Arithmetic Mean

Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

The Arithmetic Mean
is appropriate for describing measurement data, e.g.
heights of people, marks of student papers, etc.

is seriously affected by extreme values called outliers.


E.g. as soon as a billionaire moves into a neighborhood, the
average household income increases beyond what it was
previously!

Measures of Central Location
The median is calculated by placing all the observations in
order; the observation that falls in the middle is the median.

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22}, N = 9 (odd)

Sort them bottom to top and find the middle:
0 0 5 7 8 9 12 14 22
median = 8

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}, N = 10 (even)

Sort them bottom to top; the middle is the simple average of 8 and 9:
0 0 5 7 8 9 12 14 22 33
median = (8 + 9)/2 = 8.5
Sample and population medians are computed the same way.
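
The sort-and-pick-the-middle procedure can be sketched as follows. The function name `median_of` is a hypothetical choice; the two data sets are the ones shown above.

```python
def median_of(observations):
    """Sort the data and return the middle value (or the average of the two middle values)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single observation sits in the middle.
        return ordered[middle]
    # Even count: average the two observations around the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]
even_data = odd_data + [33]
print(median_of(odd_data), median_of(even_data))
```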
Measures of Central Location
The mode of a set of observations is the value that occurs
most frequently.

A set of data may have one mode (or modal class), or two, or
more modes.

The mode is useful for all data types, though it is mainly used for
nominal data.

For large data sets the modal class is much more relevant
than a single-value mode.

Sample and population modes are computed the same way.


Mode
E.g. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10

Which observation appears most often?


The mode for this data set is 0. How is this a measure of
central location?

[Figure: histogram of Frequency against Variable, with a modal class marked]

=MODE(range) in Excel
Note: if you are using Excel for your data analysis and your
data is multi-modal (i.e. there is more than one mode), Excel
only calculates the smallest one.

You will have to use other techniques (e.g. a histogram) to determine if your data is bimodal, trimodal, etc.
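
Outside Excel, one way to see every mode at once is to count frequencies directly. The sketch below uses Python's standard-library `collections.Counter`; the function name `all_modes` and the bimodal list are hypothetical, added only for illustration.

```python
from collections import Counter


def all_modes(observations):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(observations)
    if not counts:
        return []
    top_frequency = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top_frequency)


hours = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]   # data set from the Mode slide
bimodal = [1, 1, 2, 3, 3]                    # hypothetical bimodal data
print(all_modes(hours), all_modes(bimodal))
```

A result with more than one entry signals a multi-modal data set.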

Mean, Median, Mode
If a distribution is symmetrical, the mean, median and mode may coincide.

[Figure: symmetrical distribution with the mean, median, and mode marked]

Mean, Median, Mode
If a distribution is asymmetrical, say skewed to the left or to
the right, the three measures may differ. E.g.:

[Figure: skewed distribution with the mode, median, and mean marked]

Mean, Median, Mode: Which Is Best?
With three measures from which to choose, which one
should we use?

The mean is generally our first selection. However, there are several circumstances when the median is better.

The mode is seldom the best measure of central location.

One advantage the median holds is that it is not as sensitive to extreme values as is the mean.

Mean, Median, Mode: Which Is Best?
To illustrate, consider the data in Example 4.1.

The mean was 11.0 and the median was 8.5.

Now suppose that the respondent who reported 33 hours actually reported 133 hours (obviously an Internet addict).
The mean becomes

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0 + 7 + 12 + 5 + 133 + 14 + 8 + 0 + 9 + 22}{10} = \frac{210}{10} = 21.0$$

Mean, Median, Mode: Which Is Best?
This value is exceeded by only two of the ten
observations in the sample, making this statistic a poor
measure of central location.

The median stays the same. When there is a relatively small number of extreme observations (either very small or very
large, but not both), the median usually produces a better
measure of the center of the data.
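
The effect of the single extreme value can be checked with Python's standard-library `statistics` module. The helper name `summarize` is hypothetical; the data are the ten Example 4.1 observations, with 33 replaced by 133 in the second list.

```python
import statistics

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
hours_with_outlier = [133 if h == 33 else h for h in hours]


def summarize(observations):
    """Return the mean, the median, and how many observations exceed the mean."""
    mean = statistics.mean(observations)
    median = statistics.median(observations)
    above_mean = sum(1 for x in observations if x > mean)
    return mean, median, above_mean


print(summarize(hours))
print(summarize(hours_with_outlier))
```

The mean moves from 11 to 21 while the median stays at 8.5, and only two observations lie above the new mean.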

Mean, Median, & Modes for Ordinal & Nominal Data
For ordinal and nominal data the calculation of the mean is
not valid.

Median is appropriate for ordinal data.

For nominal data, a mode calculation is useful for determining highest frequency but not central location.

Measures of Central Location Summary
Compute the Mean to
Describe the central location of a single set of interval data

Compute the Median to
Describe the central location of a single set of interval or ordinal data

Compute the Mode to
Describe a single set of nominal data

Measures of Variability
Measures of central location fail to tell the whole story about
the distribution; that is, how much are the observations
spread out around the mean value?

For example, two sets of class grades are shown. The mean (= 50) is the same in each case, but the red class has greater variability than the blue class.

[Figure: distributions of two sets of class grades, red and blue, both centered at 50]

Range
The range is the simplest measure of variability, calculated
as:

Range = Largest observation − Smallest observation

E.g.
Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46
The range is the same in both cases,
but the data sets have very different distributions
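
A minimal sketch of the range calculation on the two data sets above; the function name `value_range` is a hypothetical choice.

```python
def value_range(observations):
    """Largest observation minus smallest observation."""
    return max(observations) - min(observations)


clustered = [4, 4, 4, 4, 50]
spread_out = [4, 8, 15, 24, 39, 50]
print(value_range(clustered), value_range(spread_out))
```

Both calls return 46, even though the observations between the end points are arranged very differently.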

Range
Its major advantage is the ease with which it can be
computed.

Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end
points.

Hence we need a measure of variability that incorporates all the data and not just two observations.

Variance
Variance and its related measure, standard deviation, are
arguably the most important statistics. Used to measure
variability, they also play a vital role in almost all statistical
inference procedures.

Population variance is denoted by $\sigma^2$ (lowercase Greek letter sigma, squared).

Sample variance is denoted by $s^2$ (lowercase s, squared).

Variance
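The variance of a population is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean and $N$ is the population size.

The variance of a sample is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean.

Note: The denominator is the sample size (n) minus one!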

Variance
As you can see, you have to calculate the sample mean (x-bar) in order to calculate the sample variance.

Alternatively, there is a short-cut formulation to calculate sample variance directly from the data without the
intermediate step of calculating the mean. It is given by:

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

Application
Example 4.7. The following sample consists of the number
of jobs six students applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance.

What are we looking to calculate?

Because the data are a sample, we calculate the sample mean $\bar{x}$ and the sample variance $s^2$, as opposed to $\mu$ or $\sigma^2$.
Sample Mean & Variance
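Sample Mean

$$\bar{x} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$$

Sample Variance

$$s^2 = \frac{(17-14)^2 + (15-14)^2 + (23-14)^2 + (7-14)^2 + (9-14)^2 + (13-14)^2}{6 - 1} = \frac{166}{5} = 33.2$$

Sample Variance (shortcut method)

$$s^2 = \frac{1}{6 - 1}\left[\left(17^2 + 15^2 + 23^2 + 7^2 + 9^2 + 13^2\right) - \frac{84^2}{6}\right] = \frac{1}{5}\left[1342 - 1176\right] = 33.2$$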
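
Both routes to the sample variance can be checked on the Example 4.7 data. This is a sketch only; the function names `sample_variance` and `sample_variance_shortcut` are hypothetical.

```python
def sample_variance(observations):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / (n - 1)


def sample_variance_shortcut(observations):
    """Shortcut form: uses the sum and the sum of squares, not the mean."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(observations)
    total_of_squares = sum(x * x for x in observations)
    return (total_of_squares - total ** 2 / n) / (n - 1)


jobs = [17, 15, 23, 7, 9, 13]   # number of jobs applied for, Example 4.7
print(sum(jobs) / len(jobs), sample_variance(jobs), sample_variance_shortcut(jobs))
```

The two functions agree on this data. With floating-point data of large magnitude the shortcut form can lose precision, so the definitional form is usually the safer default in code.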

Standard Deviation
The standard deviation is simply the square root of the
variance, thus:

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$
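
Python's standard-library `statistics` module provides both versions: `stdev` divides by n − 1 (sample) and `pstdev` divides by N (population). The sketch below applies them to the Example 4.7 data; treating those six values as a whole population in the second call is purely for illustration.

```python
import math
import statistics

jobs = [17, 15, 23, 7, 9, 13]   # sample from Example 4.7

sample_sd = statistics.stdev(jobs)        # square root of the sample variance (n - 1)
population_sd = statistics.pstdev(jobs)   # square root of the population variance (N)

print(round(sample_sd, 2), round(population_sd, 2))
```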

Standard Deviation
Consider Example 4.8 [Xm04-08], where a golf club manufacturer has
designed a new club and wants to determine if it is hit more
consistently (i.e. with less variability) than an old club.

Using Data > Data Analysis > Descriptive Statistics in Excel, we produce the following tables for interpretation:

[Tables: Excel Descriptive Statistics output for the two clubs]

You get more consistent distance with the new club.

Interpreting Standard Deviation
The standard deviation can be used to compare the variability of
several distributions and make a statement about the general shape
of a distribution. If the histogram is bell shaped, we can use the
Empirical Rule, which states:

1) Approximately 68% of all observations fall within one standard
deviation of the mean.
2) Approximately 95% of all observations fall within two standard
deviations of the mean.
3) Approximately 99.7% of all observations fall within three standard
deviations of the mean.
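
The three percentages can be checked numerically. The sketch below assumes the bell shape is exactly a normal distribution (an assumption; the rule itself is stated only for bell-shaped histograms) and uses `statistics.NormalDist` from Python's standard library. The function name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The results are close to 68%, 95%, and 99.7%, which is why the rule is stated with the word "approximately".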

The Empirical Rule
Approximately 68% of all observations fall within one standard deviation of the mean.

Approximately 95% of all observations fall within two standard deviations of the mean.

Approximately 99.7% of all observations fall within three standard deviations of the mean.
Chebysheff's Theorem
A more general interpretation of the standard deviation is
derived from Chebysheff's Theorem, which applies to all
shapes of histograms (not just bell shaped).

The proportion of observations in any sample that lie within k standard deviations of the mean is at least:

$$1 - \frac{1}{k^2} \quad \text{for } k > 1$$

For k = 2 (say), the theorem states that at least 3/4 of all observations
lie within 2 standard deviations of the mean. This is a lower bound,
compared to the Empirical Rule's approximation (95%).

Interpreting Standard Deviation
Suppose that the mean and standard deviation of last year's
midterm test marks are 70 and 5, respectively. If the
histogram is bell-shaped then we know that approximately
68% of the marks fell between 65 and 75, approximately
95% of the marks fell between 60 and 80, and approximately
99.7% of the marks fell between 55 and 85.

If the histogram is not at all bell-shaped we can say that at least 75% of the marks fell between 60 and 80, and at least
88.9% of the marks fell between 55 and 85. (We can use
other values of k.)
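
The intervals and the Chebysheff lower bounds quoted above can be reproduced with a short sketch. The function names `interval_around_mean` and `chebysheff_lower_bound` are hypothetical; the mean of 70 and standard deviation of 5 come from the example.

```python
def interval_around_mean(mean, standard_deviation, k):
    """Return the interval lying within k standard deviations of the mean."""
    return (mean - k * standard_deviation, mean + k * standard_deviation)


def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations, for k > 1."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    low, high = interval_around_mean(70, 5, k)
    print(k, low, high, round(chebysheff_lower_bound(k), 3))
```

Other values of k work the same way; for instance k = 1.5 gives a bound of about 55.6%.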

Coefficient of Variation
The coefficient of variation of a set of observations is the
standard deviation of the observations divided by their mean,
that is:

Population coefficient of variation = CV = $\frac{\sigma}{\mu}$

Sample coefficient of variation = cv = $\frac{s}{\bar{x}}$

Coefficient of Variation
This coefficient provides a proportionate measure of variation, e.g.

A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the
mean value is 500.
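
The comparison can be made concrete with a small sketch. The function name `coefficient_of_variation` is hypothetical; the inputs are the standard deviation of 10 and the two mean values mentioned above.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return standard_deviation / mean


cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)
print(cv_mean_100, cv_mean_500)
```

The same standard deviation is 10% of the mean in the first case and 2% in the second.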

Measures of Relative Standing & Box Plots
Measures of relative standing are designed to provide
information about the position of particular values relative to
the entire data set.

Percentile: the Pth percentile is the value for which P percent of the observations are less than that value and (100 − P)% are greater than that
value.
Suppose you scored in the 60th percentile on the GMAT; that
means 60% of the other scores were below yours, while 40% of
scores were above yours.

Quartiles
We have special names for the 25th, 50th, and 75th
percentiles, namely quartiles.

The first or lower quartile is labeled Q1 = 25th percentile.

The second quartile, Q2 = 50th percentile (which is also the median).

The third or upper quartile, Q3 = 75th percentile.

We can also convert percentiles into quintiles (fifths) and deciles (tenths).

Commonly Used Percentiles
First (lower) decile = 10th percentile
First (lower) quartile, Q1, = 25th percentile
Second (middle) quartile, Q2, = 50th percentile
Third quartile, Q3, = 75th percentile
Ninth (upper) decile = 90th percentile

Note: If your exam mark places you in the 80th percentile, that doesn't mean you scored 80% on the exam; it means
that 80% of your peers scored lower than you on the exam.
It is about your position relative to others.

Location of Percentiles
The following formula allows us to approximate the location
of any percentile:

$$L_P = (n + 1)\frac{P}{100}$$

where $L_P$ is the location of the Pth percentile.

Location of Percentiles
Recall the data from Example 4.1:
0 0 5 7 8 9 12 14 22 33

Where is the location of the 25th percentile? That is, at which point are 25% of the values lower and 75% of the
values higher?

L25 = (10+1)(25/100) = 2.75

The 25th percentile is three-quarters of the distance between the second (which is 0) and the third (which is 5) observations. Three-quarters of the
distance is: (.75)(5 - 0) = 3.75
Because the second observation is 0, the 25th percentile is 0 + 3.75 = 3.75
Location of Percentiles
What about the upper quartile?

L75 = (10+1)(75/100) = 8.25

0 0 5 7 8 9 12 14 22 33

It is located one-quarter of the distance between the eighth and the ninth
observations, which are 14 and 22, respectively. One-quarter of the distance
is: (.25)(22 - 14) = 2, which means the 75th percentile is at: 14 + 2 = 16
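
The location formula and the interpolation step can be sketched together. The function names are hypothetical, and clamping to the first or last observation when the location falls outside the data is an assumption of this sketch, not something the slides specify.

```python
import math


def percentile_location(n, p):
    """Location (n + 1) * P / 100 of the P-th percentile among n sorted values."""
    return (n + 1) * p / 100


def percentile_value(observations, p):
    """Interpolate between the two observations surrounding the location."""
    ordered = sorted(observations)
    n = len(ordered)
    location = percentile_location(n, p)
    below = math.floor(location)   # 1-based position just below the location
    fraction = location - below
    if below < 1:                  # assumption: clamp to the smallest observation
        return ordered[0]
    if below >= n:                 # assumption: clamp to the largest observation
        return ordered[-1]
    lower, upper = ordered[below - 1], ordered[below]
    return lower + fraction * (upper - lower)


hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]   # Example 4.1 data
print(percentile_location(10, 25), percentile_value(hours, 25))
print(percentile_location(10, 75), percentile_value(hours, 75))
```

The first line prints the location 2.75 and the value 3.75; the second prints the location 8.25 and the value 16.0, matching the hand calculations.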

Location of Percentiles
Please remember:

0 0 | 5 7 8 9 12 14 | 22 33

The 25th percentile is at position 2.75 and has the value 3.75.
The 75th percentile is at position 8.25 and has the value 16.

$L_P$ determines the position in the data set where the percentile value lies,
not the value of the percentile itself.

Interquartile Range
The quartiles can be used to create another measure of
variability, the interquartile range, which is defined as
follows:

Interquartile Range = Q3 − Q1

The interquartile range measures the spread of the middle 50% of the observations.

Large values of this statistic mean that the 1st and 3rd
quartiles are far apart, indicating a high level of variability.
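
As an illustration, the interquartile range of the Example 4.1 data can be computed with `statistics.quantiles` from Python's standard library. Its `"exclusive"` method places quartiles at position (n + 1)P/100, the same location rule used for percentiles in this chapter.

```python
import statistics

hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]   # Example 4.1 data

# n=4 requests the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(hours, n=4, method="exclusive")
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```

Other software may use a different interpolation rule and report slightly different quartiles for the same data.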
Box Plots
The box plot is a technique that graphs five statistics:
the minimum and maximum observations, and
the first, second, and third quartiles.

[Figure: box plot with each whisker labeled 1.5*(Q3 − Q1)]

The lines extending to the left and right are called whiskers. Any points that lie outside
the whiskers are called outliers. The whiskers extend outward to the smaller of 1.5
times the interquartile range or to the most extreme point that is not an outlier.
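
The whisker rule can be sketched as follows. This assumes the common convention that the 1.5 × IQR limit is measured outward from Q1 and Q3, and that each whisker stops at the most extreme observation inside that limit; the function name `box_plot_summary` is hypothetical. The data are the Example 4.1 values, with 33 replaced by 133 to produce an outlier.

```python
import statistics


def box_plot_summary(observations):
    """Return quartiles, whisker end points, and outliers under the 1.5 * IQR rule."""
    ordered = sorted(observations)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr   # assumed convention: measured from Q1
    upper_limit = q3 + 1.5 * iqr   # assumed convention: measured from Q3
    inside = [x for x in ordered if lower_limit <= x <= upper_limit]
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    return {
        "quartiles": (q1, q2, q3),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


hours_with_outlier = [0, 7, 12, 5, 133, 14, 8, 0, 9, 22]
summary = box_plot_summary(hours_with_outlier)
print(summary)
```

Here the upper whisker stops at 22 and the value 133 is reported as an outlier.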
Example 4.15
A large number of fast-food restaurants with drive-through
windows offer drivers and their passengers the
advantages of quick service. To measure how good the
service is, an organization called QSR planned a study
wherein the amount of time taken by a sample of drive-through
customers at each of five restaurants was recorded.
Compare the five sets of data using a box plot and interpret
the results.

Box Plots
These box plots are based on data in Xm04-15.

[Figure: box plots of drive-through service times for the five restaurants]

Wendy's service time is shortest and least variable.

Hardee's has the greatest variability, while Jack-in-the-Box has the longest
service times.

3.5 Mean-Variance Analysis and
the Sharpe Ratio
Explain mean-variance analysis and the Sharpe Ratio.

Mean-variance analysis:
The performance of an asset is measured by its rate of return.
The rate of return may be evaluated in terms of its reward
(mean) and risk (variance).
Higher average returns are often associated with higher risk.
The Sharpe ratio uses the mean and variance to
evaluate risk.

LO 3.5
Mean-Variance Analysis and
the Sharpe Ratio

Sharpe Ratio
Measures the extra reward per unit of risk.
For an investment, the Sharpe ratio is computed as:

$$\text{Sharpe Ratio} = \frac{\bar{x} - \bar{R}}{s}$$

where $\bar{x}$ is the mean return for the investment,
$\bar{R}$ is the mean return for a risk-free asset, and
$s$ is the standard deviation for the investment.
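
A minimal sketch of the calculation follows. The function name `sharpe_ratio` and the return figures are hypothetical, chosen only to illustrate the arithmetic; they are not the fund data from the slides.

```python
def sharpe_ratio(mean_return, risk_free_return, standard_deviation):
    """Extra reward (mean return above the risk-free return) per unit of risk."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_return - risk_free_return) / standard_deviation


# Hypothetical fund: 10% mean return, 12% standard deviation, 4% risk-free return.
example_ratio = sharpe_ratio(10.0, 4.0, 12.0)
print(example_ratio)
```

All three inputs must be in the same units (here, percent); a higher ratio means more reward per unit of risk.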

Mean-Variance Analysis and
the Sharpe Ratio
Sharpe Ratio Example
Compute the Sharpe ratios for the Metals and Income funds
given a risk-free return of 4%.

Since 0.56 > 0.41, the Metals fund offers more reward per unit
of risk as compared to the Income fund.

Parameters and Statistics
                              Population       Sample
Size                          $N$              $n$
Mean                          $\mu$            $\bar{x}$
Variance                      $\sigma^2$       $s^2$
Standard Deviation            $\sigma$         $s$
Coefficient of Variation      CV               cv
Covariance                    $\sigma_{xy}$    $s_{xy}$
Coefficient of Correlation    $\rho$           $r$