
PSY 2101: Introduction to Statistical Analysis

Instructor: Kris Preacher



TA: Carolyn Shivers
Meeting time: T/R 2:35-3:50


Lecture 4: Central Tendency and Variability


1 PSY 2101

Last time we covered...

Algebra and notation
Histograms
Kernel densities
Stem-and-leaf plots
Scatter plots
Orientation to SPSS

Today we'll cover...

Mean, median, and mode
Range, variance, standard deviation
Estimator consistency, efficiency, and unbiasedness
Boxplots


Agenda

A frequency distribution describes the distribution in the sample.

e.g., a histogram:







A probability density function (pdf) describes the distribution in the population.

e.g., normal curve:


Two kinds of distributions

Central tendency describes the "middle" of a distribution.

Also known as location.

Measured by the mode, median, or mean.

Central tendency

Mode: the most common category or bin.

Not a very good measure of central tendency.

A distribution is unimodal if it has one mode, bimodal if it has two modes, and multimodal
if it has more than two.


Modes for frequency plots / histograms
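As a minimal sketch of finding modes in a sample, the Python snippet below counts the most frequent value(s) and labels the result. The sample values and the function name `describe_modality` are made up for illustration. Note that it only detects exact ties for the highest frequency, whereas in practice a histogram with two clear peaks of unequal height is also described as bimodal.

```python
from statistics import multimode  # available in Python 3.8+


def describe_modality(scores):
    """Return the mode(s) of a sample and a label for how many there are."""
    modes = multimode(scores)  # every value tied for the highest frequency
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")


# Hypothetical samples, made up for illustration.
one_peak = [1, 2, 2, 3, 3, 3, 4]
two_peaks = [1, 1, 1, 2, 3, 4, 4, 4]

print(describe_modality(one_peak))   # ([3], 'unimodal')
print(describe_modality(two_peaks))  # ([1, 4], 'bimodal')
```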

Mode: the highest point on the curve.













Modes for probability density functions
[Figure: density curve with its peak labeled "mode"; x-axis runs from 0 to 120]
Say we know $f(x)$.
Set $f'(x) = 0$, solve for $x$.
Is $f''(x)$ positive or negative?
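Finding the mode of a density means setting the first derivative to zero and checking that the second derivative is negative there (a maximum rather than a minimum). As a worked illustration, take the normal density; this choice of example is an assumption made here, not something the slide specifies.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$

$$f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x) = 0 \quad\Rightarrow\quad x = \mu$$

$$f''(x) = \left[\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] f(x), \qquad f''(\mu) = -\frac{f(\mu)}{\sigma^2} < 0$$

Because $f''(\mu)$ is negative, $x = \mu$ is a maximum, so the mode of a normal distribution is $\mu$.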

Median: the point below which half the points lie.

If there are an even number of points, take the middle two and find their average.

Medians for frequency plots / histograms
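The odd and even cases of the sample median can be sketched in a few lines of Python. The function below is an illustrative implementation written for this example (Python's built-in `statistics.median` does the same job); the input values are hypothetical.

```python
def median(scores):
    """Middle score of a sample; average of the middle two if N is even."""
    if not scores:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: average the middle two


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```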

Median: the point below which half the area under the curve lies.


Medians for probability density functions
The median is the value of $a$ for which:

$$\int_{-\infty}^{a} f(x)\,dx = \int_{a}^{\infty} f(x)\,dx = .5$$
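As a worked illustration of solving for the median of a density, consider an exponential distribution with rate $\lambda$. Both the distribution and the symbol $\lambda$ are assumptions chosen here for the example. The density is zero below 0, so the lower limit of integration becomes 0:

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

$$\int_{0}^{a} \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda a} = .5 \quad\Rightarrow\quad a = \frac{\ln 2}{\lambda}$$

The mean of this distribution is $1/\lambda$, which is larger than the median $\ln 2 / \lambda \approx .693/\lambda$, as expected for a distribution with a long right tail.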

Mean: the expectation of a variable.

Means for frequency plots / histograms
$$\bar{x} = E(X) = \frac{\sum_{i=1}^{N} x_i}{N}$$


A more general formula:

$$\bar{x} = E(X) = \sum_{i=1}^{k} p(x_i)\,x_i$$

For example, with a fair 6-sided die:

$$\sum_{i=1}^{6} p(x_i)\,x_i = \tfrac{1}{6}(1) + \tfrac{1}{6}(2) + \tfrac{1}{6}(3) + \tfrac{1}{6}(4) + \tfrac{1}{6}(5) + \tfrac{1}{6}(6) = 3.5$$

$$E(X) = \tfrac{1}{6}\left[(1) + (2) + (3) + (4) + (5) + (6)\right] = \tfrac{1}{6}\sum_{i=1}^{6} x_i = \frac{\sum_{i=1}^{6} x_i}{N}$$


A more general formula:

$$\bar{x} = E(X) = \sum_{i=1}^{k} p(x_i)\,x_i$$

For example, with an unfair 6-sided die:

$$\begin{aligned}
E(X) &= .14(1) + .07(2) + .14(3) + .14(4) + .37(5) + .14(6) \\
     &= .14 + .14 + .42 + .56 + 1.85 + .84 \\
     &= 3.95
\end{aligned}$$
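The probability-weighted formula for the mean can be checked with a short Python sketch. The function name `expected_value` is made up for illustration; the probabilities are the ones used in the fair and unfair die examples.

```python
def expected_value(values, probabilities):
    """Probability-weighted mean: the sum of p(x_i) * x_i over all outcomes."""
    if len(values) != len(probabilities):
        raise ValueError("need one probability per value")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(values, probabilities))


faces = [1, 2, 3, 4, 5, 6]
fair = [1 / 6] * 6
unfair = [0.14, 0.07, 0.14, 0.14, 0.37, 0.14]

print(round(expected_value(faces, fair), 2))    # 3.5
print(round(expected_value(faces, unfair), 2))  # 3.95
```

With equal probabilities the weighted formula reduces to the ordinary mean, which is why the fair die gives the same answer as summing the faces and dividing by 6.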


Means for probability density functions

Mean = $E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$

Trimmed mean: The mean computed after eliminating k extreme scores from the lower tail
and k extreme scores from the upper tail of a sample distribution.

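Trimmed means (like the related "winsorized" means, which replace the extreme scores with
the nearest retained scores instead of dropping them) are more stable from sample to sample
than ordinary means when the data contain outliers or come from a heavy-tailed distribution.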

However, they are based on fewer data.

As k increases toward N/2, the trimmed mean approaches the median.
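A minimal Python sketch of the trimmed mean follows. The function name `trimmed_mean` and the scores are hypothetical; the data include one extreme value so that the effect of increasing k is visible.

```python
from statistics import mean, median


def trimmed_mean(scores, k):
    """Mean after dropping the k lowest and the k highest scores."""
    if k < 0 or 2 * k >= len(scores):
        raise ValueError("k must satisfy 0 <= k < N/2")
    ordered = sorted(scores)
    return mean(ordered[k:len(ordered) - k])


# Hypothetical sample of N = 9 scores with one extreme value.
scores = [2, 4, 5, 6, 7, 8, 9, 11, 60]

for k in range(5):
    print(k, round(trimmed_mean(scores, k), 2))
print("median:", median(scores))
```

With k = 0 the result is the ordinary mean. At k = 4 only the middle score is left, so the trimmed mean equals the median.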


Outliers: values that clearly violate the trend established by other data. The mode is
relatively unaffected by outliers.



Outliers' effect on the mode
SPSS

Outliers have little effect on the median unless there are lots of them.

Outliers' effect on the median
SPSS

The mean can be greatly affected by outliers.

Outliers' effect on the mean
SPSS
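The different sensitivity of the mode, median, and mean to outliers can be seen in a small Python sketch. The scores are hypothetical, and the single outlier (95) is chosen to be obviously extreme.

```python
from statistics import mean, median, mode

# Hypothetical scores, then the same scores plus one extreme value.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]
with_outlier = scores + [95]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(label, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

In this example the mode and median both stay at 5, while the mean moves from 5 to 14.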


Scales of measurement (again)
            mode    median    mean
nominal      x
ordinal      x        x
interval     x        x        x
ratio        x        x        x

1. The mean is easy to deal with algebraically.

2. The population mean is usually of interest. The best estimate of the population mean is
the sample mean.

3. We know all about the sampling distribution* of the mean; not so much about
distributions of medians or modes.




*The distribution of a statistic across (theoretically infinite) repeated samples from the same population.
Why the sample mean?



Other means ("Pythagorean means")

Arithmetic mean:

$$\frac{\sum_{i=1}^{N} x_i}{N}$$

Geometric mean:

$$\left(\prod_{i=1}^{N} x_i\right)^{1/N}$$

Harmonic mean:

$$\frac{N}{\sum_{i=1}^{N} \left(\frac{1}{x_i}\right)}$$
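The three Pythagorean means can be computed directly from their formulas, as in this Python sketch. The function name `pythagorean_means` and the scores are made up for illustration, and the geometric and harmonic means assume all scores are positive.

```python
from math import prod  # available in Python 3.8+


def pythagorean_means(xs):
    """Arithmetic, geometric, and harmonic means of positive scores."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("need a non-empty sample of positive scores")
    n = len(xs)
    arithmetic = sum(xs) / n               # sum of scores over N
    geometric = prod(xs) ** (1 / n)        # N-th root of the product
    harmonic = n / sum(1 / x for x in xs)  # N over the sum of reciprocals
    return arithmetic, geometric, harmonic


scores = [2, 4, 8]  # hypothetical
arithmetic, geometric, harmonic = pythagorean_means(scores)
print(round(arithmetic, 3), round(geometric, 3), round(harmonic, 3))
```

For positive scores that are not all equal, the arithmetic mean is the largest of the three and the harmonic mean the smallest.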


Variability (or dispersion) concerns how "spread out" the scores of a distribution are.
Variability
SPSS

Range is simply the difference between the lowest score and the highest score.

Range is a very crude measure of variability; low ranges indicate low variability, but high
ranges are more difficult to interpret.
Range and percentiles
SPSS
[Figure: two sets of scores plotted on a scale from 1 to 16, one with Range = 6 and one with Range = 14]

A percentile is the value of x at or below which a given percentage of scores lie.

Common percentiles used as descriptive statistics in social science research are the
quartiles (25%, 50% [median], and 75%) and tertiles (33%, 67%).

Interquartile range is the difference between the 25th and 75th percentiles (i.e., the range
of the middle 50% of the scores in a distribution).
SPSS
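The range, quartiles, and interquartile range can be sketched in Python as follows. The scores are hypothetical. The choice of `method="inclusive"` is an assumption made for this example: statistical packages use several different interpolation rules for percentiles, so other software may report slightly different quartiles for the same data.

```python
from statistics import quantiles  # available in Python 3.8+

# Hypothetical scores, made up for illustration.
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

# Interquartile range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print("range:", score_range)
print("quartiles:", q1, q2, q3)
print("IQR:", iqr)
```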

Boxplots are a common way to summarize a distribution graphically.


Boxplots
SPSS
[Figure: boxplot annotated with the median, the 25th and 75th percentiles, the lower fence (here also the minimum), the upper fence, and outliers]

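Variance and standard deviation
SPSS

Variance ($s^2$)

Definitional, sample:
$$\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$$

Definitional, estimate of population parameter:
$$\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

Computational, sample:
$$\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$

Computational, estimate of population parameter:
$$\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}$$

Standard deviation ($s$)

Definitional, sample:
$$\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}$$

Definitional, estimate of population parameter:
$$\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Computational, sample:
$$\sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}}$$

Computational, estimate of population parameter:
$$\sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}}$$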
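The definitional and computational formulas give the same sum of squared deviations, which a short Python sketch can confirm. The helper names `ss_definitional` and `ss_computational` and the scores are made up for illustration. The standard library's `pvariance` (denominator N) and `variance` (denominator N - 1) are used only as a cross-check.

```python
from math import sqrt
from statistics import pvariance, variance


def ss_definitional(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def ss_computational(xs):
    """Same quantity from raw sums: sum of x^2 minus (sum of x)^2 / N."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


scores = [2, 4, 4, 5, 7, 8]  # hypothetical
n = len(scores)
ss = ss_definitional(scores)

sample_var = ss / n          # describes this sample only
estimate_var = ss / (n - 1)  # estimate of the population variance

print("sum of squares:", ss, ss_computational(scores))
print("sample:   variance", sample_var, "sd", round(sqrt(sample_var), 3))
print("estimate: variance", estimate_var, "sd", round(sqrt(estimate_var), 3))
```

The computational form is convenient by hand, but in floating-point arithmetic it can lose precision when the scores are large relative to their spread, so the definitional form is usually the safer one to program.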


The statistics discussed today (mean, standard deviation, variance, and so on) are all
estimators if they are used as sample-based guesses of population parameters (that is, if
they are used as inferential statistics).

"Good" estimators have three very important characteristics: unbiasedness, efficiency, and
consistency.

Say we collect an infinite number of samples from the population, then compute a given
sample statistic in each one.

Unbiasedness: The average of the statistics across samples will equal the parameter.

Efficiency: The estimates will vary little from sample to sample around the parameter (small
sampling variance).

Consistency: As N increases, the estimates converge on the parameter.

Howell notes that good estimators should also be resistant to the influence of outliers.
Sidebar: Good estimators




$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$$

turns out to be a biased estimate of the population variance.

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

on the other hand, is unbiased.

Homework 1 has been posted! (due 1/31 at the beginning of class)

Read Chapter 3.
Next time...
