
IMS 502

Information Analysis for Decision Making


Descriptive Statistics
Content
• Measure of Location
• Measure of Spread
• Measure of Shape
Learning Outcomes
• After completing this chapter, you should be
able to:
– Apply the appropriate measures of location,
spread and shape
Concept of Descriptive
Statistics
[Diagram: Describing Data Numerically, with three branches: Measure of Location (Central Tendency), Measure of Spread (Variability) and Measure of Shape]
Concept of Descriptive
Statistics
Measure of Location
(Central Tendency)
• Descriptive information about the single numerical value that is
considered to be the most typical of the values of a quantitative variable.
• A measure of central tendency is a measure which indicates where the
middle of the data is.
• Three common measures of central tendency are:
– Mode
– Median
– Mean
Measures of Central
Tendency
• Statistics to represent the ‘centre’ of a
distribution
– Mode (most frequent)
– Median (50th percentile)
– Mean (average)
• Choice of measure dependent on
– Type of data
– Shape of distribution (esp. skewness)
Measures of Central
Tendency

Measure of Central Tendency

Level of
Measurement    Mode    Median    Mean
Nominal         X
Ordinal         X        X         X
Interval        X        X         X
Ratio           X?       X         X
Measures of Central
Tendency
• Choosing a measure of central tendency
Use the Mode when:
1. The variable is measured at the nominal level.
2. You want a quick and easy measure for ordinal and interval-ratio variables.
3. You want to report the most common score.
Use the Median when:
1. The variable is measured at the ordinal level.
2. A variable measured at the interval-ratio level has a highly skewed distribution.
3. You want to report the central score. The median always lies at the exact center of a distribution.
Use the Mean when:
1. The variable is measured at the interval-ratio level (except when the variable is highly skewed).
2. You want to report the typical score. The mean is “the fulcrum that exactly balances all of the scores.”
3. You anticipate additional statistical analysis.
Measures of Central
Tendency
• Mean
– Is the average of the data
• The Population Mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
which is usually unknown, so we use the sample mean to estimate or approximate it.
• The Sample Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Measures of Central
Tendency

Example:
Here is a random sample of size 10 of ages, where
x1 = 42, x2 = 28, x3 = 28, x4 = 61, x5 = 31,
x6 = 23, x7 = 50, x8 = 34, x9 = 32, x10 = 37.

x̄ = (42 + 28 + … + 37) / 10 = 36.6
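The calculation above can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the variable names are chosen for this example and are not part of the course material.

```python
from statistics import mean

# Ages from the example above (a sample of size n = 10).
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

# Sample mean by the formula: sum of the observations divided by n.
manual_mean = sum(ages) / len(ages)

# The standard-library helper should agree with the manual calculation.
library_mean = mean(ages)

print(manual_mean, library_mean)
```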


Measures of Central
Tendency
• Properties of the Mean
– Uniqueness: For a given set of data there is
one and only one mean.
– Simplicity: It is easy to understand and to
compute.
– Affected by extreme values: All values
enter into the computation.
Measures of Central
Tendency

Example:
Assume the values are 115, 110, 119, 117, 121 and 126.

The mean = 118.

But assume that the values are 75, 75, 80, 80 and 280.

The mean = 118, a value that is not representative of the set of data as
a whole.
Concept of Descriptive
Statistics
• Median
– When the data are ordered, it is the observation that
divides the set of observations into two equal parts,
such that half of the data are before it and the other
half are after it.
– The median is the center point in a set of numbers
(50% above, 50% below)
Concept of Descriptive
Statistics
• Median
– Check to see which of the following two
rules applies:
• Rule One
If n is odd, the median will be the middle
observation.
It will be the (n+1)/2 th ordered observation.
When n = 11, the median is the 6th ordered observation.
Example:
Three is the median for the numbers 1, 1, 3, 4, 9
Concept of Descriptive
Statistics
• Median
– Check to see which of the following two
rules applies:
• Rule Two
If n is even, there are two middle observations. The median
will be the mean of these two middle observations.
It will be the (n+1)/2 th ordered observation.
When n = 12, then the median is the 6.5th observation,
which is an observation halfway between the 6th and 7th
ordered observation.
Example:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median is the 5.5th observation, i.e. the mean of the
5th and 6th ordered observations: (32+34)/2 = 33
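The two rules can be sketched as a small Python function. The function name `median_by_rule` is hypothetical and chosen only for this illustration; it sorts the data first and then applies Rule One or Rule Two depending on whether n is odd or even.

```python
def median_by_rule(values):
    """Median via the (n + 1) / 2 position rule applied to the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Rule One: n is odd, so take the ((n + 1) / 2)-th ordered observation.
        # With 0-based indexing that observation sits at index n // 2.
        return ordered[middle]
    # Rule Two: n is even, so average the two middle ordered observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = median_by_rule([1, 1, 3, 4, 9])
even_example = median_by_rule([23, 28, 28, 31, 32, 34, 37, 42, 50, 61])
print(odd_example, even_example)
```

Sorting inside the function means the caller does not have to order the data beforehand.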
Measures of Central
Tendency
• Properties of the Median
– Uniqueness: For a given set of data there is
one and only one median.
– Simplicity: It is easy to calculate.
– It is not affected by extreme values as the
mean is.
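To illustrate the last property, the sketch below compares the mean and the median for the two data sets used in the earlier mean example (one without and one with an extreme value). It uses only the Python standard library; the variable names are assumptions made for this illustration.

```python
from statistics import mean, median

# Data sets from the earlier example on the mean.
typical = [115, 110, 119, 117, 121, 126]
with_extreme_value = [75, 75, 80, 80, 280]

for label, data in (("typical", typical), ("with extreme value", with_extreme_value)):
    # Both sets share the same mean, but the median of the second set
    # stays with the bulk of the values instead of following 280.
    print(label, "mean:", mean(data), "median:", median(data))
```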
Concept of Descriptive
Statistics
• Mode
– The mode is simply the most frequently
occurring number.
– If all values are different there is no mode.
– Sometimes, there is more than one mode.
Measures of Central
Tendency

Example:
Assume the values are 23, 28, 28, 31, 32, 34, 37, 42, 50, 61.

The mode = 28 (it occurs twice; every other value occurs once).
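A minimal Python sketch of finding the mode follows. The helper name `modes` is hypothetical; it returns a list so that it can report no mode (all values different) or more than one mode, matching the cases described for the mode earlier.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once: by the convention used here, there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


ages = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(modes(ages))             # single mode
print(modes([1, 1, 2, 2, 3]))  # two modes
print(modes([1, 2, 3]))        # no mode
```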


Measures of Central
Tendency
• Properties of the Mode
– Sometimes, it is not unique.
– It may be used for describing qualitative
data.
Measures of Central
Tendency
• The only procedure in SPSS that will produce all three
commonly used measures of location is Frequencies.
• To begin:
Working Example (Pg. 59)
• One hundred tennis players participated
in a serving competition. Gender and
number of aces were recorded for each
player. The data can be found in
Work4.sav on the iLearn web site that
accompanies this title.
• Follow steps 1, 2, 3, 4, 6, 8 & 11.
Exercises
• Use the Frequencies command to get all
three (3) measures of location.
• Write a sentence or two reporting each
measure. You may choose three (3)
variables to report on.
Concept of Descriptive
Statistics
Measure of Spread
(Variability/Dispersion)
• Give information on the spread or variability of the data values.
• Measures of deviation from the central tendency.
• Non-parametric/non-normal:
• range, percentiles, min, max
• Parametric:
• standard deviation (SD) & properties of the normal distribution
Concept of Descriptive
Statistics
Measure of Spread
(Variability/Dispersion)
• Four common measures of spread are:
– Range
– Quartiles
– Variance
– Standard Deviation
Measures of Spread

Measure of Spread

Level of
Measurement    Range, Min/Max    Percentile    Standard Deviation (SD)
Nominal
Ordinal              X
Interval             X               X                 X?
Ratio                X               X                 X
Measures of Spread
• Range (R)
– The range is the difference between the
highest and lowest scores in a distribution.
– For example, if the highest mark among 100
students was 85 and the lowest was 35, the
range is 50 (85 – 35 = 50).
– The range is limited as a way of describing
the general spread of a group of data, but
it does set the boundaries of the scores.
Measures of Spread
• Quartiles (Q)
– The quartiles split the ordered data into four
quarters:
• Q1, or 25th percentile: 25.0% of the
observations fall at or below it
• Q2, the median: 50.0% of the observations
fall at or below it
• Q3, or 75th percentile: 75.0% of the
observations fall at or below it
– The difference Q3 - Q1 is called the
inter-quartile range, or IQR.

[Diagram: ordered data split into four parts of 25% each, with Q1, Q2 and Q3 marking the boundaries between them]
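As a rough illustration, the range, quartiles and IQR can be computed in Python for the ten ages used in the earlier examples. Note the assumption: `statistics.quantiles` (Python 3.8+) is called with its default "exclusive" interpolation method, and other software, including SPSS, may use a different rule and report slightly different quartiles for small samples.

```python
from statistics import quantiles

# Ages from the earlier central tendency examples.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

# Range: highest score minus lowest score.
data_range = max(ages) - min(ages)

# Quartiles: the three cut points that split the ordered data into quarters.
# Assumption: the default method="exclusive" interpolation is acceptable here.
q1, q2, q3 = quantiles(ages, n=4)

# Inter-quartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
print("IQR:", iqr)
```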
Measures of Spread
• Variance
– The variance is the square of the standard
deviation.
– The lower the variance, the more accurately
the mean represents the scores of all cases
in a distribution of data.
Measures of Spread
• Standard Deviation
– The standard deviation provides the
researcher with an indicator of how the scores
for a variable are spread around the mean.
– The higher the standard deviation, the more
spread out the scores are around the mean.
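The relationship between the variance and the standard deviation can be sketched in Python. The slides do not give a formula, so this example assumes the usual sample versions (dividing by n - 1), which is also what `statistics.variance` and `statistics.stdev` compute; the data are the ten ages from the earlier examples.

```python
from math import sqrt
from statistics import mean, stdev, variance

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

x_bar = mean(ages)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
manual_variance = sum((x - x_bar) ** 2 for x in ages) / (len(ages) - 1)

# Standard deviation: the square root of the variance,
# so the variance is the square of the standard deviation.
manual_sd = sqrt(manual_variance)

print("variance:", manual_variance, "library:", variance(ages))
print("standard deviation:", manual_sd, "library:", stdev(ages))
```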
Measures of Spread
• Using SPSS for measure of spread.
• To begin:
Working Example (Pg. 59)
• One hundred tennis players participated
in a serving competition. Gender and
number of aces were recorded for each
player. The data can be found in
Work4.sav on the iLearn web site that
accompanies this title.
• Follow steps 1, 2, 3, 4, 5, 7, 8 & 11.
Exercises
• Choose three (3) variables to work on.
• Write a few sentences summarizing these
tables for each under measurement of
spread.
• Describe the difference (if any).
Concept of Descriptive
Statistics
Measure of Shape

• Describes how the data are distributed.


• Use the normal curve (combination of mean & standard deviation) to construct
precise descriptive statements.
• Two common measures of shape are:
– Kurtosis: a measure of whether the data are peaked or flat relative to a normal
distribution.
– Skewness: a measure of symmetry, or more precisely, the lack of symmetry.
[Figure panels: Platykurtic, Normal, Leptokurtic, Positive Skewness, Negative Skewness]
Measure of Shape - Skewness
– Negative skewness: Mean < Median < Mode (coefficient is negative)
– Symmetric: Mean = Median = Mode (coefficient = 0)
– Positive skewness: Mode < Median < Mean (coefficient is positive)
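The sign of the skewness coefficient can be illustrated with a short Python sketch. This assumes the simple moment-based coefficient (third central moment divided by the 1.5 power of the second central moment); SPSS reports a sample-adjusted version, so its values will differ somewhat, although the sign is interpreted the same way. The function name `skewness` is chosen only for this example.

```python
from statistics import mean


def skewness(values):
    """Moment-based skewness coefficient: m3 / m2 ** 1.5."""
    n = len(values)
    x_bar = mean(values)
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - x_bar) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Ages from the earlier examples: mode 28 < median 33 < mean 36.6,
# so a positive coefficient is expected.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
symmetric = [1, 2, 3, 4, 5]

print("ages:", skewness(ages))
print("symmetric:", skewness(symmetric))
```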
Measure of Shape - Kurtosis
• Leptokurtic (kurtosis, γ2 > 0)
– Usually drawn as a narrow distribution with a small
standard deviation.
– Data sets with high kurtosis tend
to have a distinct peak near the
mean, decline rather rapidly, and
have heavy tails.
– Most of the data cluster around or
close to the mean.
• Platykurtic (kurtosis, γ2 < 0)
– Usually drawn as a wide distribution with a large
standard deviation.
– Data sets with low kurtosis tend
to have a flat top near the mean
rather than a sharp peak.
– The data are spread further away from
the mean.
• γ2 is measured relative to the normal distribution, for which γ2 = 0.
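A minimal Python sketch of the sign convention follows. It assumes the simple moment-based excess kurtosis (fourth central moment divided by the squared second central moment, minus 3); SPSS uses a sample-adjusted formula, so exact values will differ. The function name and the two small data sets are made up purely for illustration.

```python
from statistics import mean


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (0 for a normal curve)."""
    n = len(values)
    x_bar = mean(values)
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - x_bar) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


# Hypothetical data: evenly spread values give a flat-topped shape.
flat = [1, 2, 3, 4, 5]
# Hypothetical data: most values at the center plus two far-out values
# give a sharp peak with heavy tails.
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

print("flat (platykurtic):", excess_kurtosis(flat))
print("peaked (leptokurtic):", excess_kurtosis(peaked))
```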
Measure of Shape
• Using SPSS for measure of shape.
• To begin:
Working Example (Pg. 59)
• One hundred tennis players participated
in a serving competition. Gender and
number of aces were recorded for each
player. The data can be found in
Work4.sav on the iLearn web site that
accompanies this title.
• Follow steps 1, 2, 3, 4, Click on Skewness
& Kurtosis, 8, 9, 10 & 11.
Exercise
• Choose three (3) variables to work on.
• Write a few sentences summarizing these
tables for each under measurement of
shape.
• Describe the difference (if any).
