9 PSY 2101
Means for frequency plots / histograms
Mean: the expectation of a variable.
$$\bar{x} = E(X) = \frac{\sum_{i=1}^{N} x_i}{N}$$
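Here is a minimal Python sketch of the formula above, using hypothetical scores. It also computes the mean from a frequency table (value to count), which is how you would read the mean off a frequency plot or histogram, and shows that both routes give the same answer. The helper names `mean_raw` and `mean_from_frequencies` are illustrative only.

```python
from collections import Counter

def mean_raw(scores):
    """Arithmetic mean: the sum of the N scores divided by N."""
    return sum(scores) / len(scores)

def mean_from_frequencies(freq):
    """Mean from a frequency table {value: count}: sum(value * count) / N."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n

scores = [2, 3, 3, 4, 4, 4, 5, 7]   # hypothetical scores
freq = Counter(scores)             # counts per value: 2:1, 3:2, 4:3, 5:1, 7:1

print(mean_raw(scores))             # 4.0
print(mean_from_frequencies(freq))  # 4.0
```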
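Means for frequency plots / histograms
A more general formula (k = the number of distinct values X can take):
$$\bar{x} = E(X) = \sum_{i=1}^{k} x_i\, p(x_i)$$
For example, with a fair 6-sided die:
$$\sum_{i=1}^{6} x_i\, p(x_i) = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = 3.5$$
Because every face is equally likely, this matches the simple formula:
$$\bar{x} = E(X) = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$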
Means for frequency plots / histograms
A more general formula:
$$\bar{x} = E(X) = \sum_{i=1}^{k} x_i\, p(x_i)$$
For example, with an unfair 6-sided die:
$$\begin{aligned} E(X) &= .14(1) + .07(2) + .14(3) + .14(4) + .37(5) + .14(6) \\ &= .14 + .14 + .42 + .56 + 1.85 + .84 \\ &= 3.95 \end{aligned}$$
[Figure: plot with the mean marked ("Mean =").]
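The general formula E(X) = Σ x_i p(x_i) is easy to check numerically. The sketch below applies it to the fair die and to the unfair die's probabilities from the slide. It uses `fractions.Fraction` so that the arithmetic is exact, with no floating-point rounding. The function name `expected_value` is illustrative, not taken from the course.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E(X) = sum of x_i * p(x_i) over the k distinct values of X."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    return sum(x * p for x, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]

# Fair die: each face has probability 1/6.
fair = [Fraction(1, 6)] * 6
print(expected_value(faces, fair))   # 7/2, i.e. 3.5

# Unfair die (probabilities from the slide): .14, .07, .14, .14, .37, .14
unfair = [Fraction(p, 100) for p in (14, 7, 14, 14, 37, 14)]
print(sum(unfair))                            # 1 (a valid probability distribution)
print(float(expected_value(faces, unfair)))   # 3.95
```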
Means for probability density functions
$$E(X) = \int x\, f(x)\, dx$$
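For a continuous variable, the sum becomes an integral. The sketch below approximates E(X) = ∫ x f(x) dx with the midpoint rule. It assumes a density with bounded support; the triangular density f(x) = 2x on [0, 1] is a hypothetical example, and its exact mean is 2/3. For real work, a library integrator such as `scipy.integrate.quad` is the more usual choice.

```python
def expectation_pdf(pdf, lower, upper, n=10_000):
    """Approximate E(X) = integral of x * f(x) over [lower, upper] (midpoint rule)."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * h   # midpoint of the i-th subinterval
        total += x * pdf(x)
    return total * h

def triangular_pdf(x):
    """Hypothetical density f(x) = 2x on [0, 1]; it integrates to 1."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

print(round(expectation_pdf(triangular_pdf, 0.0, 1.0), 4))  # 0.6667
```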
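Means for probability density functions
Trimmed mean: The mean computed after eliminating the k most extreme scores from the lower tail and the k most extreme scores from the upper tail of a sample distribution.
Trimmed means are usually more stable from sample to sample than ordinary means, especially when the distribution has outliers or heavy tails. (A winsorized mean is a close relative: rather than discarding the 2k extreme scores, it replaces each one with the nearest remaining score.)
However, trimmed means are based on fewer data points.
As k increases toward N/2, the trimmed mean approaches the median.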
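The sketch below implements the trimmed mean as defined above, plus the winsorized variant for comparison. It uses hypothetical scores with one large outlier. With N = 8 and k = 3 (the largest k allowed here), only the two middle scores remain, so the trimmed mean equals the median. The function names are illustrative.

```python
def trimmed_mean(scores, k):
    """Drop the k lowest and k highest scores, then average the rest."""
    if not 0 <= 2 * k < len(scores):
        raise ValueError("need 0 <= 2k < N")
    kept = sorted(scores)[k:len(scores) - k]
    return sum(kept) / len(kept)

def winsorized_mean(scores, k):
    """Replace the k lowest/highest scores with the nearest remaining score, then average."""
    if not 0 <= 2 * k < len(scores):
        raise ValueError("need 0 <= 2k < N")
    s = sorted(scores)
    low, high = s[k], s[len(s) - k - 1]
    return sum(min(max(x, low), high) for x in s) / len(s)

scores = [2, 3, 3, 4, 5, 6, 9, 40]   # hypothetical scores with one outlier

print(sum(scores) / len(scores))   # 9.0   ordinary mean, pulled up by the 40
print(trimmed_mean(scores, 1))     # 5.0
print(winsorized_mean(scores, 1))  # 5.25
print(trimmed_mean(scores, 3))     # 4.5   equals the median of these scores
```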
Outliers' effect on the mode
Outliers: values that clearly violate the trend established by the other data. The mode is relatively unaffected by outliers.
SPSS
Outliers' effect on the median
Outliers have little effect on the median unless there are many of them.
SPSS
Outliers' effect on the mean
The mean can be greatly affected by outliers, because every score, including an extreme one, enters the sum.
SPSS
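Python's standard `statistics` module can reproduce the pattern from the last three slides. The sketch below adds one extreme value to a set of hypothetical scores. The mode does not move, the median shifts slightly, and the mean jumps.

```python
from statistics import mean, median, mode

scores = [3, 4, 4, 4, 5, 6, 7]   # hypothetical scores
with_outlier = scores + [70]     # the same scores plus one extreme value

print(mode(scores), mode(with_outlier))       # 4 4    (mode unchanged)
print(median(scores), median(with_outlier))   # 4 4.5  (median shifts a little)
print(mean(scores), mean(with_outlier))       # mean jumps from about 4.71 to 12.875
```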
Scales of measurement (again)
Measures of central tendency that can be used with each scale:
Nominal: mode
Ordinal: mode, median
Interval: mode, median, mean
Ratio: mode, median, mean
Why the sample mean?
1. The mean is easy to deal with algebraically.
2. The population mean is usually of interest, and the best estimate of the population mean is the sample mean.
3. We know a great deal about the sampling distribution* of the mean, but much less about the sampling distributions of medians or modes.
*The distribution of a statistic across (theoretically infinitely many) repeated samples from the same population.
Other means ("Pythagorean means")
Arithmetic mean:
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
Geometric mean:
$$\left(\prod_{i=1}^{N} x_i\right)^{1/N}$$
Harmonic mean:
$$\left(\frac{\sum_{i=1}^{N} x_i^{-1}}{N}\right)^{-1}$$
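The sketch below implements the three formulas directly and cross-checks them against `statistics.geometric_mean` and `statistics.harmonic_mean` (Python 3.8+). It assumes positive scores, because the geometric and harmonic means as written are not defined for zero or negative values. The data are hypothetical. For positive data, harmonic ≤ geometric ≤ arithmetic always holds.

```python
import math
from statistics import geometric_mean, harmonic_mean

def arithmetic_mean(xs):
    # sum of x_i divided by N
    return sum(xs) / len(xs)

def geometric_mean_manual(xs):
    # (product of x_i) ** (1/N), computed as exp(mean of logs) to avoid overflow;
    # requires every x_i > 0
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean_manual(xs):
    # (mean of 1/x_i) ** -1, i.e. N / sum(1/x_i); requires every x_i > 0
    return len(xs) / sum(1 / x for x in xs)

xs = [1, 2, 4]   # hypothetical positive scores
am, gm, hm = arithmetic_mean(xs), geometric_mean_manual(xs), harmonic_mean_manual(xs)
print(am, gm, hm)   # roughly 2.333, 2.0, 1.714

# Cross-check against the standard library.
assert math.isclose(gm, geometric_mean(xs))
assert math.isclose(hm, harmonic_mean(xs))
```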
Variability
Variability (or dispersion) concerns how "spread out" the scores of a distribution are.
SPSS
Range and percentiles
Range is simply the difference between the highest score and the lowest score.
Range is a very crude measure of variability. A low range indicates low variability, but a high range is harder to interpret, because a single extreme score can produce it.
SPSS
[Figure: two distributions plotted on a 1 to 16 scale, one with Range = 6 and one with Range = 14.]
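The range is a one-line computation. The hypothetical data below are chosen so that the two ranges echo the slide's labels (6 and 14). They show that changing a single score is enough to more than double the range, which is why a high range is hard to interpret.

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

clustered = [5, 6, 7, 8, 9, 10, 11]     # hypothetical, evenly spread scores
one_extreme = [5, 6, 7, 8, 9, 10, 19]   # identical except for one extreme score

print(score_range(clustered))     # 6
print(score_range(one_extreme))   # 14
```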
Range and percentiles
A percentile is the value of x at or below which a given percentage of the scores lie.
Common percentiles used as descriptive statistics in social science research are the quartiles (25%, 50% [median], and 75%) and the tertiles (33%, 67%).
The interquartile range is the difference between the 25th and 75th percentiles (i.e., the range of the middle 50% of the scores in a distribution).
SPSS
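Python's `statistics.quantiles` (3.8+) computes quartiles and tertiles. One caveat: there are several conventions for interpolating between scores. Python's default ("exclusive") and its "inclusive" method give different cut points for the same data, and other packages such as SPSS may use yet another rule. The scores below are hypothetical.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical scores

# Quartiles (25th, 50th, 75th percentiles) using the "inclusive" rule.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
print(q1, q2, q3)   # 3.0 5.0 7.0
print(q3 - q1)      # interquartile range: 4.0

# The default "exclusive" rule gives different cut points for the same data.
print(quantiles(scores, n=4))   # [2.5, 5.0, 7.5]

# Tertiles (about the 33rd and 67th percentiles) use n=3.
print(quantiles(scores, n=3, method="inclusive"))
```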
Boxplots
Boxplots are a common way to summarize a distribution graphically.
SPSS
[Figure: annotated boxplot marking the median, the 25th and 75th percentiles, the lower fence (and, here, the minimum), the upper fence, and outliers plotted as o and *.]
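The slide does not state how the fences and outliers are defined, so the sketch below assumes the common Tukey-style convention. Scores more than 1.5 × IQR beyond the box are plotted as "o", and scores more than 3 × IQR beyond it as "*", which is how SPSS-style boxplots usually distinguish the two symbols. The whiskers stop at the most extreme scores that are not outliers. Treat the multipliers, the "inclusive" quartile rule, and the helper name `boxplot_summary` as assumptions.

```python
from statistics import quantiles

def boxplot_summary(scores, k_out=1.5, k_ext=3.0):
    """Hypothetical helper: Tukey-style boxplot pieces (multipliers are assumptions)."""
    s = sorted(scores)
    q1, med, q3 = quantiles(s, n=4, method="inclusive")
    iqr = q3 - q1
    lo_out, hi_out = q1 - k_out * iqr, q3 + k_out * iqr   # outlier cutoffs ("o")
    lo_ext, hi_ext = q1 - k_ext * iqr, q3 + k_ext * iqr   # extreme cutoffs ("*")
    inside = [x for x in s if lo_out <= x <= hi_out]
    return {
        "q1": q1, "median": med, "q3": q3,
        # whiskers end at the most extreme scores inside the outlier cutoffs
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers_o": [x for x in s if lo_ext <= x < lo_out or hi_out < x <= hi_ext],
        "extremes_star": [x for x in s if x < lo_ext or x > hi_ext],
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 15, 30])   # hypothetical scores
print(summary)
```

With these scores the quartiles are 3 and 7 (IQR = 4). The lower whisker ends at the minimum (1), as in the slide's figure, 15 falls in the "o" band, and 30 is flagged with "*".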
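Variance and standard deviation
Variance ($s^2$)
Definitional formula, sample:
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$$
Definitional formula, estimate of population parameter:
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
Computational formula, sample:
$$s^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$
Computational formula, estimate of population parameter:
$$s^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}$$
Standard deviation ($s$)
Definitional formula, sample:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}$$
Definitional formula, estimate of population parameter:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$
Computational formula, sample:
$$s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}}$$
Computational formula, estimate of population parameter:
$$s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}}$$
SPSS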
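The sketch below computes the variance with both the definitional and the computational formulas, dividing by either N or N − 1, and compares the results with `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N − 1). The parameter name `ddof` ("delta degrees of freedom") is borrowed from common numerical-library usage. The scores are hypothetical. Note that the computational formula was designed for hand calculation; with very large values it can lose floating-point precision, because it subtracts two nearly equal numbers.

```python
import math
from statistics import pvariance, variance

def var_definitional(xs, ddof=0):
    """sum((x - mean)^2) / (N - ddof); ddof=0 divides by N, ddof=1 by N - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - ddof)

def var_computational(xs, ddof=0):
    """(sum(x^2) - (sum x)^2 / N) / (N - ddof); algebraically the same value."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - ddof)

xs = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores (mean = 5)
for ddof in (0, 1):
    d, c = var_definitional(xs, ddof), var_computational(xs, ddof)
    print(ddof, d, c, math.sqrt(d))   # for ddof=0: 0 4.0 4.0 2.0
```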
Sidebar: Good estimators
The statistics discussed today (mean, standard deviation, variance, and so on) are all estimators if they are used as sample-based guesses of population parameters (that is, if they are used as inferential statistics).
"Good" estimators have three very important characteristics: unbiasedness, efficiency, and consistency.
Say we collect an infinite number of samples from the population and compute a given sample statistic in each one.
Unbiasedness: The average of the statistics across samples will equal the parameter.
Efficiency: The estimates will not vary much around the parameter (the statistic has a small sampling variance).
Consistency: As N increases, the estimate gets closer and closer to the parameter.
Howell notes that good estimators should also be resistant to the influence of outliers.
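Sidebar: Good estimators
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$$
turns out to be a biased estimate of the population variance: on average, it underestimates it by a factor of $(N - 1)/N$.
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
on the other hand, is unbiased.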
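Unbiasedness can be checked exactly, without simulation, by listing every possible sample. The sketch below assumes a hypothetical population {1, 2, 3} and samples of size N = 2 drawn with replacement, so all 9 ordered samples are equally likely. Averaged over all samples, the N − 1 version equals the population variance (2/3). The N version comes out at (N − 1)/N × 2/3 = 1/3. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical tiny population
N = 2                    # sample size

def mean(xs):
    return Fraction(sum(xs), len(xs))

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # population variance

# Every ordered sample of size N drawn with replacement is equally likely.
samples = list(product(population, repeat=N))
avg_biased = sum(sum_sq_dev(s) / N for s in samples) / len(samples)
avg_unbiased = sum(sum_sq_dev(s) / (N - 1) for s in samples) / len(samples)

print(sigma2, avg_biased, avg_unbiased)   # 2/3 1/3 2/3
```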
Next time...
Homework 1 has been posted! (due 1/31 at the beginning of class)
Read Chapter 3.