
Lesson 1

Measures of Central Tendency: The Mean, Median, and Mode
One of the most basic purposes of statistics is simply to enable us to make sense of large
numbers. For example, if you want to know how the students in your school are doing in the
statewide achievement test, and somebody gives you a list of all 600 of their scores, that's
useless. This everyday problem is even more obvious and staggering when you're dealing, let's
say, with the population data for the nation.
We've got to be able to consolidate and synthesize large numbers to reveal their collective
characteristics and interrelationships, and transform them from an incomprehensible mass to a
set of useful and enlightening indicators.
The Mean
One of the most useful and widely used techniques for doing this (one which you already
know) is the average or, as it is known in statistics, the mean. And you know how to calculate
the mean: you simply add up a set of scores and divide by the number of scores. Thus we have
our first and perhaps the most basic statistical formula:

\bar{X} = \frac{\Sigma X}{N}

Where:
\bar{X} (sometimes called "X-bar") is the symbol for the mean.
\Sigma (the Greek letter sigma) is the symbol for summation.
X is the symbol for the scores.
N is the symbol for the number of scores.
So this formula simply says you get the mean by summing up all the scores and dividing the total
by the number of scores: the old average, which in this case we're all familiar with, so it's a good
place to begin.
This is pretty simple when you have only a few numbers. For example, if you have just 6 numbers
(3, 9, 10, 8, 6, and 5), you insert them into the formula for the mean, and do the math:

\bar{X} = \frac{3 + 9 + 10 + 8 + 6 + 5}{6} = \frac{41}{6} = 6.83
But we usually have many more numbers to deal with, so let's do a couple of examples where the
numbers are larger, and show how the calculations should be done. In our first example, we're
going to compute the mean salary of 36 people. Column A of Table 1 shows the salaries (ranging
from $20K to $70K), and column B shows how many people earned each of the salaries.
Table 1
Example 1 of Method for Computing the Mean

Salary (X)    Frequency (f)    fX
$20K          1                   20
$25K          2                   50
$30K          3                   90
$35K          4                  140
$40K          5                  200
$45K          6                  270
$50K          5                  250
$55K          4                  220
$60K          3                  180
$65K          2                  130
$70K          1                   70
Sum           36               1,620

To get the \Sigma fX for our formula, we multiply the number of people in each salary category by the
salary for that category (e.g., 1 x 20, 2 x 25, etc.), and then total those numbers (the ones in
column C). Thus we have:

\bar{X} = \frac{\Sigma fX}{N} = \frac{1{,}620}{36} = 45, \text{ i.e., } \$45K
And this is how the distribution of these salaries looks:


Figure 1
Distribution of Example 1 Salaries

The scores in this distribution are said to be normally distributed, i.e., clustered around a central
value, with decreasing numbers of cases as you move to the extreme ends of the range. Thus the
term normal curve.
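Incidentally, if you want to check a computation like this by machine, here is a minimal Python sketch of the frequency-weighted mean, using the Table 1 data (the variable names are mine):

salaries    = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]   # column A, in $K
frequencies = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]              # column B

n = sum(frequencies)                                         # N = 36
sum_fx = sum(x * f for x, f in zip(salaries, frequencies))   # sum of fX = 1,620
print(sum_fx / n)                                            # 45.0, i.e., $45K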
So, computing the mean is pretty simple. Piece of cake, right? Not so fast.
In our second example, let's look at what happens if we change just six people's salaries in Table 1.
Let's suppose that the three people who made $60K actually made $200K, that the two who
made $65K made $205K, and that the one person who made $70K made $210K. The revised salary
table is the same except for these changes.
Table 2
Example 2 of Method for Computing the Mean

Salary (X)    Frequency (f)    fX
$20K          1                   20
$25K          2                   50
$30K          3                   90
$35K          4                  140
$40K          5                  200
$45K          6                  270
$50K          5                  250
$55K          4                  220
$200K         3                  600
$205K         2                  410
$210K         1                  210
Sum           36               2,460

But before we recompute the mean, let's look at how different the distribution looks.
Figure 2
Distribution of Example-2 Salaries

Now, using the revised numbers in Table 2, we compute the mean as follows:

\bar{X} = \frac{\Sigma fX}{N} = \frac{2{,}460}{36} = 68.3, \text{ i.e., } \$68.3K
What this shows is that changing the salaries of just six individuals to extreme values greatly
affects the mean. In this case, it raised the mean from $45K to $68.3K (an increase of 52%), even
though all the other scores remained the same. In fact, the mean is a figure that no person in the
group has; hardly a figure we would think of as "average" for the group.
The important lesson here is that the mean is intended to be a measure of central tendency, but it
works usefully as such only if the data on which it is based are more or less normally distributed
(like in Figure 1). The presence of extreme scores distorts the mean, and, in this case, gives us a
mean salary ($68.3K) that is not a very good indication of the "average" salary of this group of 36
individuals.
So if we know or suspect that our data may have some extreme scores that would distort the
mean, what measure can we use to give us a better measure of central tendency? One such
measure is the median, and we move on to learn about that now.
The Median
If your data are normally distributed (like those in Figure 1), the preferred measure of central
tendency is the mean. However, if your data are not normally distributed (like those in Figure 2),
the median is a better measure of central tendency, for reasons we'll see in a moment.

The median is the point in the distribution above which and below which 50% of the scores lie. In
other words, if we list the scores in order from highest to lowest (or lowest to highest) and find
the middle-most score, that's the median.
For example, suppose we have the following scores: 2, 12, 4, 11, 3, 7, 10, 5, 9, 6. The next step is
to array them in order from lowest to highest:
2  3  4  5  6  7  9  10  11  12
Since we have 10 scores, and 50% of 10 is 5, we want the point above which and below which
there are five scores. Careful. If you count up from the bottom, you might think the median is 6.
But that's not right, because there are 4 scores below 6 and 5 above it. So how do we deal with
that problem? We deal with it by understanding that in statistics, a measurement or a score is
regarded not as a point but as an interval ranging from half a unit below to half a unit above the
value. So in this case, the actual midpoint or median of this distribution (the point above which
and below which 50% of the scores lie) is 6.5.
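A quick Python check of this result; the statistics module's median function sorts the list and, for an even number of scores, returns the midpoint of the two middle values:

import statistics

scores = [2, 12, 4, 11, 3, 7, 10, 5, 9, 6]
# With 10 scores, the two middle values are 6 and 7: (6 + 7) / 2 = 6.5
print(statistics.median(scores))  # 6.5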
As we saw with the mean, when we have only a few numbers, it's pretty simple. But how do we
find the median when we have larger numbers and more than one person with the same score?
It's not difficult. Let's use the salary data in Table 1.
Table 3
Example 1 of Method for Computing the Median

Salary    Range              Frequency
$20K      $19.5K-20.5K       1
$25K      $24.5K-25.5K       2
$30K      $29.5K-30.5K       3
$35K      $34.5K-35.5K       4
$40K      $39.5K-40.5K       5
$45K      $44.5K-45.5K       6
$50K      $49.5K-50.5K       5
$55K      $54.5K-55.5K       4
$60K      $59.5K-60.5K       3
$65K      $64.5K-65.5K       2
$70K      $69.5K-70.5K       1
Sum                          36

The salaries are already in order from lowest to highest, so the next step in finding the median is
to determine how many individuals (ratings, scores, or whatever) we have. Those are shown in
the frequency column, and the total is 36. So our N = 36, and we want to find the salary point
above which and below which 50%, or 18, of the individuals fall. If we count up from the bottom
through the $40K level, we have 15, and we need three more. But if we include the $45K level (in
which there are 6), we have 21, three more than we need. Thus, we need 3, or 50%, of the 6 cases
in the $45K category. We add this value (.5) to the lower limit of the interval in which we know the
median lies ($44.5K-$45.5K), and this gives us a value of $45K.
In this case, the mean and the median are the same, as they always are in normal distributions.
So in situations like this, the mean is the preferred measure.
But things aren't always so neat and tidy. Let's now compute the median for the salary data in
Table 2, which we know (from Figure 2) are not normally distributed.

Table 4
Example 2 of Method for Computing the Median

Salary    Range                Frequency
$20K      $19.5K-20.5K         1
$25K      $24.5K-25.5K         2
$30K      $29.5K-30.5K         3
$35K      $34.5K-35.5K         4
$40K      $39.5K-40.5K         5
$45K      $44.5K-45.5K         6
$50K      $49.5K-50.5K         5
$55K      $54.5K-55.5K         4
$200K     $199.5K-200.5K       3
$205K     $204.5K-205.5K       2
$210K     $209.5K-210.5K       1
Sum                            36

The N is the same (36), so we go through exactly the same calculations we did for the data in
Table 3. When we do that (count up from the bottom, find that we need half the cases in the $45K
category to get 50% (18) of the total, and do so by adding .5 to the lower limit of that category),
incredibly we get exactly the same result ($45K) we did with the data in Table 3. In other words,
those six extreme cases (the six whose salaries changed from $60K, $65K, and $70K to $200K,
$205K, and $210K) don't affect the median even though they made a big change in the mean.
They are still above the midpoint, and it doesn't matter how much above it in the calculation of
the median.
This example illustrates dramatically what the median is and why it's a better measure of central
tendency than the mean when we have extreme scores.
We've done the calculations for the median in a simple, descriptive way (arraying the scores from
high to low, counting up to the mid-category, dividing it as necessary, etc.), but just so you won't
feel slighted, here is the statistical formula for doing what we've just done.

\mathrm{Mdn} = L + \left(\frac{N/2 - \Sigma f_b}{f_w}\right) i

Where:
Mdn is the median.
L is the lower limit of the interval containing the median.
N is the total number of scores.
\Sigma f_b is the sum of the frequencies or number of scores up to the interval containing the median.
f_w is the frequency or number of scores within the interval containing the median.
i is the size or range of the interval.
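Here is a small Python sketch of this formula applied to the Table 3 data; the variable names are mine, chosen to mirror the symbols above:

# Grouped median: Mdn = L + ((N/2 - sum_f_below) / f_within) * i
salaries    = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]   # $K
frequencies = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

n = sum(frequencies)          # 36
half = n / 2                  # 18
cum = 0                       # frequencies accumulated so far
for salary, f in zip(salaries, frequencies):
    if cum + f >= half:                       # interval containing the median
        lower_limit = salary - 0.5            # L, e.g., 44.5
        median = lower_limit + (half - cum) / f * 1.0   # i = 1
        break
    cum += f

print(median)  # 45.0, i.e., $45K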
The Mode
The third and last of the measures of central tendency we'll be dealing with in this course is the
mode. It's very simple: The mode is the most frequently occurring score or value. In our case
(see Figures 1 and 2), that value is $45K. But sometimes we may have odd distributions in which
there may be two peaks. Even if the peaks are not exactly equal, they're referred to as bi-modal
distributions.
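Finding the mode by machine is a one-liner. A minimal Python sketch using the Table 1 frequencies, which have a single peak at $45K (a bi-modal table would simply return two values):

salaries    = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]   # $K
frequencies = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]              # Table 1

peak = max(frequencies)
modes = [s for s, f in zip(salaries, frequencies) if f == peak]
print(modes)  # [45] -- one mode; ties would make the list longer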

Let's assume we have such a bi-modal distribution of salaries as shown in Table 5 and Figure 3.
Table 5
Bi-Modal Distribution of Salaries

Salary (X)    Frequency (f)    fX
$20K          1                   20
$25K          3                   75
$30K          4                  120
$35K          6                  210
$40K          3                  120
$45K          1                   45
$50K          3                  150
$55K          5                  275
$60K          6                  360
$65K          3                  195
$70K          1                   70
Sum           36               1,640

Figure 3
Example of a Bi-Modal Distribution

Before we talk about the mode, using the formulas and calculation procedures you've just
learned, calculate the mean and median for the salaries in Table 5 (the fX and \Sigma fX data are in
column C).
When you look at this distribution of salaries, as shown graphically in Figure 3, it's hard to
discern any central tendency. The mean (which you just calculated) is about $45K, which only one
person earns, and the median is also $45K, which, while it's the middle-most value (50% of the
cases are above and below it), certainly doesn't give us a meaningful indication of the central
tendency in this distribution, because there isn't any.
Therefore, the most informative general statement we can make about this distribution is to say
that it is bi-modal.
You now know the three principal measures of central tendency (the mean, the median, and the
mode), when they should be used, and how to calculate them, so we now move on to the other
side of the central-tendency coin: dispersion.
Lesson 2

The Standard Deviation and the Normal Curve


A Measure of Dispersion: The Standard Deviation
For various important reasons we'll see as we get further into this course, we often want to know
not only what the central tendency is in a set of scores or values (i.e., the mean, the median, or
the mode), but also how bunched up or spread out the scores are. The most widely
used indicator of dispersion is the standard deviation which, in a nutshell, is based on the
deviation of each score from the mean.
To illustrate, compare the distributions of test scores in Figures 4 and 5. The first is flat and
spread out, while the second is concentrated and bunched up closely around the mean.
Figure 4
Graphic Display of Flat or Spread-Out Score Distribution

Figure 5
Display of a Narrow or Concentrated Distribution

Note that the mean and median of these two quite different distributions are the same (\bar{X} = 150,
Mdn = 150), so simply calculating and reporting those two measures of central tendency would
fail to reveal how different the dispersion of scores is between the two groups. But we can do this
by calculating the standard deviation.

The standard deviation provides us with a measure of just how spread out the scores are: a high
standard deviation means the scores are widely spread; a low standard deviation means they're
bunched up closely on either side of the mean.
We'll now calculate the standard deviation for both these distributions. The formula for the
standard deviation is:

\sigma = \sqrt{\frac{\Sigma fd^2}{N}}

Where:
\sigma (little sigma) is the standard deviation.
d^2 is a score's deviation from the mean, squared.
f is the number of cases receiving each score.
N is the number of cases.
The numbers we need to calculate the standard deviation for Figure 4, the flat distribution, are in
Table 6.
Table 6
Data for Figure 4, the Flat Distribution

Test Score (X)  Frequency (f)  X-Mean (d)  fd      fd²
100             8              50           400     20,000
110             13             40           520     20,800
120             17             30           510     15,300
130             20             20           400      8,000
140             21             10           210      2,100
150             22             0              0          0
160             21             -10         -210      2,100
170             20             -20         -400      8,000
180             17             -30         -510     15,300
190             13             -40         -520     20,800
200             8              -50         -400     20,000
SUM             180                                 132,400

Column A displays the test scores (X).
Column B shows how many people got each test score (f).
Column C is the test score minus the mean (d).
Column D is the frequency times the deviation (fd).
Column E is the frequency times the squared deviation (fd²).
Of course, to get the deviation of each score from the mean (column C), we have to calculate the
mean, and you already know how to do that. We now have what we need to calculate the standard
deviation for the flat distribution in Figure 4:

\sigma = \sqrt{\frac{\Sigma fd^2}{N}} = \sqrt{\frac{132{,}400}{180}} = \sqrt{736} = 27.1

You can do the last part of this calculation, the square root of 132,400/180 (which is 736), by using
the square-root button on your little hand calculator.
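Or let Python be the calculator. Here is a minimal sketch of the same computation for both distributions (data from Tables 6 and 7; std_dev is just a local helper of mine):

import math

scores = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
flat   = [8, 13, 17, 20, 21, 22, 21, 20, 17, 13, 8]    # Table 6
narrow = [0, 0, 0, 10, 45, 70, 45, 10, 0, 0, 0]        # Table 7

def std_dev(values, freqs):
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(sum_fd2 / n)   # sigma = sqrt(sum of fd^2 / N)

print(std_dev(scores, flat))    # about 27.1
print(std_dev(scores, narrow))  # about 9.7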
Now let's compute the standard deviation for the data in Figure 5. The data are in Table 7, and
you follow the same steps we've just completed.
Table 7
Example of a Narrow or Concentrated Distribution

Test Score (X)  Frequency (f)  X-Mean (d)  fd      fd²
100             0              50             0         0
110             0              40             0         0
120             0              30             0         0
130             10             20           200     4,000
140             45             10           450     4,500
150             70             0              0         0
160             45             -10         -450     4,500
170             10             -20         -200     4,000
180             0              -30            0         0
190             0              -40            0         0
200             0              -50            0         0
SUM             180                                 17,000

\sigma = \sqrt{\frac{\Sigma fd^2}{N}} = \sqrt{\frac{17{,}000}{180}} = \sqrt{94.4} = 9.7

The two standard deviations provide a statistical indication of how different the distributions
are: 27 for the spread-out distribution and 10 for the bunched-up distribution.
So once we know the mean and median, why do we need to know the standard deviation? What
use is it?
The standard deviation is important because, regardless of the mean, it makes a great deal of
difference whether the distribution is spread out over a broad range or bunched up closely
around the mean. For example, suppose you have two classes whose mean reading scores are
the same. With only that information, you would be inclined to teach the two classes in the same
way. But suppose you discover that the standard deviation of one of the classes is 27 and the
other is 10, as in the examples we just finished working with. That means that in the first class
(the one where \sigma = 27), you have many students throughout the entire range of performance.
You'll need to have teaching strategies for both the gifted and the challenged. But in the second
class (the one where \sigma = 10), you don't have any gifted or challenged students. They're all
average, and your teaching strategy will be entirely different.
The Normal Curve
Before we leave the standard deviation, it's a good time to learn a little more about the normal
curve. We'll be coming back to it later.
First, why is it called the normal curve? The reason is that so many things in life are distributed in
the shape of this curve: IQ, strength, height, weight, musical ability, resistance to disease, and so
on. Not everything is normally distributed, but most things are. Thus the term normal curve.
In Figure 6, we have a set of scores which are normally distributed. The range is from 0 to 200,
the mean and median are 100, and the standard deviation is 20. In a normal curve, the standard
deviation indicates precisely how the scores are distributed. Note that the percentage of scores
is marked off by standard deviations on either side of the mean. In the range between 80 and 120
(that's one standard deviation on either side of the mean), there are 68.26% of the cases. In other
words, in a normal distribution, roughly two thirds of the scores lie within one standard
deviation on either side of the mean. If we go out to two standard deviations on either side of the
mean, we will include 95.44% of the scores; and if we go out three standard deviations, that will
encompass 99.74% of the scores; and so on.
Another way to think about this is to realize that in this distribution, if you have a score that's
within one standard deviation of the mean, i.e., between 80 and 120, that's pretty average: two
thirds of the people are concentrated in that range. But if you have a score that's two or three
standard deviations away from the mean, that is clearly a deviant score, i.e., very high or very
low. Only a small percent of the cases lie that far out from the mean.
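You can verify these percentages with Python's statistics.NormalDist, which exposes the cumulative distribution function of the normal curve (the printed values differ from the table's by a rounding hair):

from statistics import NormalDist

nd = NormalDist(mu=100, sigma=20)        # the Figure 6 distribution
for k in (1, 2, 3):
    lo, hi = 100 - k * 20, 100 + k * 20
    coverage = nd.cdf(hi) - nd.cdf(lo)   # share of cases within k SDs
    print(k, round(coverage * 100, 2))   # 68.27, 95.45, 99.73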
This is valuable to understand in its own right, and will become useful when we take up
determining the significance of difference between means, which we're going to do next in
Lesson 3.
Figure 6
Normal Curve Showing the Percent of Cases Lying Within 1, 2, and 3 Standard Deviations From
the Mean

Lesson 3

Testing the Difference Between Means: The t-Test


This is one of the most important parts of this course in basic statistics. Here we're going to learn
about testing the significance of difference between means. What does that mean?
Suppose you're the superintendent, and one of your principals bursts into your office
enthusiastically and says, "I know you'll be happy to learn that after our big effort this year in
reading, my third graders improved from 187 to 195 on the state reading test!"
You immediately ask her, "Is the 8-point difference between those means statistically
significant?" When her eyes glaze over and she says, "Huh?" you smile forbearingly (because
you've taken this course in basic statistics, and she hasn't), and you patiently explain to her that
simply because there is a numerical difference between last year's and this year's mean scores
doesn't mean that there is a real difference. It could be due to chance variation in the scores.
So how do we know when the difference between two means is probably a real difference, not
one due to chance? We have to say "probably" because nothing in statistics is absolutely certain
(as is the case with most things in life). But there are statistical tests which can tell us how likely
it is that a difference between two means is due to chance.
One of the most widely used statistical methods for testing the difference between means, and
the one we're going to get you up to speed on, is called the t-test.
Let's go back to the salary data we worked with in Table 1 of Lesson 1, but now let's compare the
mean salary of that group with another group, and ask whether the mean salaries of the two
groups are significantly different.

First, let's look at the formula for the t-test, and determine what we need to make the
computation:

t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Where:
\bar{X}_1 is the mean for Group 1.
\bar{X}_2 is the mean for Group 2.
n_1 is the number of people in Group 1.
n_2 is the number of people in Group 2.
s_1^2 is the variance for Group 1.
s_2^2 is the variance for Group 2.
The only thing in this formula you're not familiar with is the symbol s^2, which stands for the variance. The variance is the same as the standard deviation
without the square root, i.e., it's nothing more than the sum of the squared deviations of all the scores from the mean divided by n - 1.
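As a computational sketch, here is the formula above turned into a small Python function. The numbers fed to it are hypothetical, just to show the call:

import math

def t_independent(mean1, mean2, var1, var2, n1, n2):
    # Difference between the means over the standard error of the difference
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical groups: means 50 and 53, variances 140 and 160, n = 36 each
print(t_independent(50, 53, 140, 160, 36, 36))  # about -1.04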

The formula above is for testing the significance of difference between two independent samples, i.e., groups of different people. If we wanted to test the
difference between, say, the pre-test and post-test means of the same group of people, we would use a different formula for dependent samples. That
formula is:

t = \frac{\Sigma D}{\sqrt{\frac{n\Sigma D^2 - (\Sigma D)^2}{n - 1}}}

Where:
\Sigma D is the sum of all the individuals' pre-post score differences.
\Sigma D^2 is the sum of all the individuals' pre-post score differences, squared.
n is the number of paired observations.

But for now, we'll test the significance of difference between the mean salaries of two different groups. You can try the one for dependent samples on your
own. (I knew you'd welcome that opportunity.)
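If you do take that invitation up, here is a minimal Python sketch of the dependent-samples formula above, with hypothetical pre/post scores:

import math

def t_dependent(pre, post):
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    sum_d = sum(diffs)                    # sum of differences
    sum_d2 = sum(d * d for d in diffs)    # sum of squared differences
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

# Hypothetical pre-test and post-test scores for five students
pre  = [70, 68, 75, 80, 72]
post = [74, 70, 76, 85, 75]
print(t_dependent(pre, post))  # about 4.24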

Tables 8 and 9 provide the numbers we need to compute the t-test for the difference in mean salaries of the two groups.

Table 8
Salaries and t-Test Calculation Data for Group 1

Salary (X)  Frequency (f)  X-Mean (d)  fd     fd²
20          1              25           25     625
25          2              20           40     800
30          3              15           45     675
35          4              10           40     400
40          5              5            25     125
45          6              0             0       0
50          5              -5          -25     125
55          4              -10         -40     400
60          3              -15         -45     675
65          2              -20         -40     800
70          1              -25         -25     625
SUM         36                                5,250

The variance (s^2) = \frac{\Sigma fd^2}{n - 1} = \frac{5{,}250}{35} = 150

Table 9
Salaries and t-Test Calculation Data for Group 2

Salary (X)  Frequency (f)  X-Mean (d)  fd     fd²
20          0              27            0       0
25          2              22           44     968
30          3              17           51     867
35          3              12           36     432
40          4              7            28     196
45          6              2            12      24
50          6              -3          -18      54
55          5              -8          -40     320
60          3              -13         -39     507
65          2              -18         -36     648
70          2              -23         -46   1,058
SUM         36                                5,074

The variance (s^2) = \frac{\Sigma fd^2}{n - 1} = \frac{5{,}074}{35} = 145

You can see from a quick inspection of the two tables that the salary distributions are similar.
There are a few more people making higher salaries in the second group. The mean of the second
group (which has been calculated for you) is slightly higher (47 vs. 45 for the first group), and the
variance is slightly smaller (145 vs. 150). So let's plug the numbers into the t-test formula and see what we get.

We now know that t = .222. So what does that mean? Is the difference between the two means
statistically significant or not? To find out whether a t value of any size is significant, we
simply look it up in a table that can be found in the appendices of any statistics textbook. The
quick answer in this case is no, it is not statistically significant. That is, the 2-point difference in
the mean salaries of these two groups could well have occurred by chance.
But that's the quick and dirty answer. There's more about the matter of statistical significance we
need to understand. So we're going to turn to that important topic now, and we'll return to this
example after we've done that.
Lesson 4

Statistical Significance and the Type I and Type II Errors


Certainty and Uncertainty: Universes and Samples
Why do we have to use statistical tests, anyway? When we have two groups with different means,
why can't we just say that one is higher than the other, and that's it? The reason is that the
difference between the means of the two groups may be due to chance, and if we were to make
the comparison again, the difference might be turned around.
How can that be? The two main reasons are sampling and measurement error. The particular
sample we have may not be representative of the universe from which it is drawn. Also, tests and
measuring instruments are not perfect.
For example, suppose that within the next hour we could somehow magically measure the height
of every adult man and woman in the world, and we found that the mean height of the men was
5'6", and the mean for the women was 5'3". Since we have measured the entire universe of adult
men and women, those are the averages, not estimates of them based on samples. We don't need
to run a t-test to see if the 3" difference between the means is statistically significant. That is the
difference.
But if, as is almost always the case in whatever we do, we have to use a sample, we have to
account for the fact that the sample, no matter how carefully drawn, may not be representative of
the universe. Usually it is, but sometimes it's not.
A good way to understand this important point is to realize that if we were to take 100 random
samples of 1,000 people each, the means of those samples would form a normal curve (just like
the ones we worked on in Lesson 2). In other words, the means of some of those samples would
be as much as (or more than) 3 standard deviations on either side of the collective mean of the
100,000 people.
When we take just one sample, which is what we usually have to work with, the chances are it's
close to the real mean, simply because most of the values are clustered close to the mean
(remember, 68% of the values are within 1 standard deviation of the mean). But we can't be
sure. The sample we're working with just might be one of those that's lying out at the extremes of
the normal curve.
That's why we have tests of statistical significance. They can't tell us for sure whether the means
we're comparing are close to the true mean, but they can give us a good estimate or probability
of whether that's the case.
Scientific Knowledge and the Null Hypothesis
As you've probably realized by now, scientists and statisticians understand that error and
uncertainty are inevitable, but they're very uncomfortable with them. Thus, one of the basic tenets of
science, which is reflected in statistics, is the requirement that nothing be admitted into the body
of scientific knowledge unless we're as sure as we can be that it's true. In other words, there is a
strong conservative bias in science and statistics. Scientists would rather be guilty of waiting
until there's more evidence to be sure than accept a finding prematurely and be wrong. In
statistics, this takes the form of what is called the "null hypothesis." Basically, the null
hypothesis says that whenever you are, for example, setting out to compare the difference
between two means, you begin with the assumption (indeed, the assertion) that there is no
difference between the means. And in order to conclude that there is a difference, your task is to
disprove the null hypothesis.
Levels of Significance
Now this leads to a very difficult decision. And to understand the difficulty, let's first go back to
the t-test of the two means we ran in Lesson 3. We found that, for that test, t = .222. In order to
find out if the difference between the means is statistically significant (i.e., how likely it is that it is
due to chance), we look up the value of t in one of the statistical significance tables that are found
in the appendices of all statistics texts. The t-test table we need is reproduced below.

Table 10
t-Test Values Required to Reject the Null Hypothesis at the .05 and .01 Levels of Confidence
(Two-Tailed Test)
__________________________________________________________________________
Degrees of Freedom (df) .05 .01
__________________________________________________________________________
20 2.09 2.85
21 2.08 2.83
22 2.07 2.82
23 2.07 2.81
24 2.06 2.80
25 2.06 2.79
26 2.06 2.78
27 2.05 2.77
28 2.05 2.76
29 2.05 2.76
30 2.04 2.75
35 2.03 2.72
40 2.02 2.71
45 2.01 2.70
50 2.01 2.68
55 2.00 2.67
60 2.00 2.66
65 2.00 2.66
70 2.00 2.65
75 1.99 2.64
80 1.99 2.64
85 1.99 2.64
90 1.99 2.63
95 1.99 2.63
100 1.98 2.63
Infinity 1.96 2.58

________________________________________________________________________
In order to use this table, we enter it with our t value (.222) and something called "degrees of
freedom." The degrees of freedom is simply (n1 - 1) + (n2 - 1) or, in our case, 70. Note that there are two
columns of t values, one labeled .05, and the other labeled .01. If we go down to the degrees of
freedom nearest to ours, which would be 70, we find that both the .05 and the .01 t values are
substantially larger than our .222. So we didn't achieve a large enough t value to reject the null
hypothesis, i.e., to be able to conclude that the difference wasn't due to chance.
Why do we have two columns, one labeled .05 and the other .01? Because those are the two
levels of significance commonly used in statistical analysis. The t values in the .05 column are
likely to occur by chance 5 percent of the time, whereas the t values in the .01 column are likely
to occur by chance only 1 percent of the time.
Type I and Type II Errors
The choice of what significance level to use (.05, .01, or lower or higher) is the difficult choice that
you as the researcher must make. If you decide to accept the .05 level of confidence, which
requires a smaller t value, you can more easily reject the null hypothesis and declare that there is
a statistically significant difference between the means than if you select the .01 level, but you
will be wrong 5 percent of the time. This is the Type I error.
On the other hand, if you select the .01 value, you will be wrong only 1 percent of the time. But
since the .01 value requires a larger t value, you will less often be able to reject the null
hypothesis and say that there is a statistically significant difference between the means when in
fact that is the case. This is the Type II error. It is crucially important to an understanding of even
basic statistics that we have a clear understanding of these two errors. If you spend a little time
with Table 11, it will help you achieve this understanding.
Table 11
Accepting and Rejecting Null Hypotheses and the Making of Type I and Type II Errors*

If the null hypothesis is really true, i.e., there is not a real difference between the means of the two groups:
- Accept the null hypothesis: You accepted the null hypothesis when it is true, i.e., you concluded that there is not a real difference between the means of the two groups which, in fact, is the case. That was a good decision.
- Reject the null hypothesis: You rejected the null hypothesis when it is true, i.e., you concluded that there is a real difference between the means of the two groups when, in fact, there is not a difference. That was a bad decision. You made the Type I error.

If the null hypothesis is really false, i.e., there is a real difference between the means of the two groups:
- Accept the null hypothesis: You accepted the null hypothesis when it is false, i.e., you concluded that there is not a real difference between the means of the two groups when in fact there is a real difference. That was a bad decision. You made the Type II error.
- Reject the null hypothesis: You rejected the null hypothesis when it is false, i.e., you concluded that there is a real difference between the means of the two groups which, in fact, is the case. That was a good decision.

*This table was adapted from a similar one found in Neil Salkind's Statistics for People Who
(Think They) Hate Statistics, Sage Publications, 2000, p. 176.
Table 11 and the work we've done in this lesson make the mysteries of statistical significance
and Type I and Type II errors transparently clear. When you're reading a professional journal and
you encounter a discussion of the difference between the means of two groups, and the authors
conclude by saying, "t = 2.64, p < .01, df = 70, two-tailed test," you will immediately know that:
1. The t-test of the two means yielded a t value of 2.64.
2. A t value of 2.64 with df = 70 for independent samples (the two-tailed rather than the one-tailed test) is statistically significant beyond the .01 level of confidence, i.e., likely to occur
by chance less than 1 in 100 times.
So, this knowledge is a major step forward in your journey to master basic statistics. And we've
got a few more neat things to cover.
Lesson 5
The Effect Test
Take another look at Table 10 in Lesson 4, which provides the significance levels for the t-test.
You probably noticed that the size of the t value needed to reject the null hypothesis (and enable
you to declare that there is a statistically significant difference between two means) is dependent
on the size of the samples on which the means are based. With df = 20, you need a t value of 2.09
to reach the .05 level of significance; but with df = 100, you need a t value of only 1.98.
In other words, if you have small Ns, you will need a large difference between the means to
achieve statistical significance; but if you have very large Ns, you will need only a very small
difference to be able to declare that the difference between the means is statistically significant.
So why is that of more than technical interest? Because we don't want to mistake statistical
significance for educational significance. Suppose you are comparing the mean reading scores
of students in your traditional program with those in a new reading program. There are 500
students in each program, and at the end of the year there is a 3-point difference favoring the new
program, and that 3-point difference is statistically significant beyond the .001 level of
confidence. The proponents of the new program are likely to cite that finding as clear research
evidence of the superiority of the new program and call on you, as the superintendent, to junk the
traditional program, even though the new program is substantially more costly.
But you should be wary of that recommendation. Why? Because the mean difference in reading
scores, even though it's statistically significant, is very small. Is a 3-point difference likely to have
any practical significance, or even be observable? Probably not. Even if the difference were a few
points greater, would such a difference justify the expenditure of substantially more funds?
Probably not.
It turns out that statisticians have developed a test that is intended to give some help when
confronting the question of whether a difference between two means is of practical consequence.
It's called the Effect Test.
The formula for the Effect Test is:

E = \frac{\bar{X}_1 - \bar{X}_2}{(s_1 + s_2)/2}

Where:
E is the effect size.
\bar{X}_1 is the mean of Group 1.
\bar{X}_2 is the mean of Group 2.
s_1 is the standard deviation of Group 1.
s_2 is the standard deviation of Group 2.
As you can see, the formula simply divides the difference between the two means by the
average of the score variability in the two groups.
There is a general consensus that an effect size of .33 or greater indicates that the difference has
practical meaning or significance.
Let's do an example.
We have two groups with mean reading test scores of 188 and 185 and standard deviations of 30
and 32. N = 500 for both groups, and the difference between the means is statistically significant.
We plug the numbers into the Effect Test formula as follows:

E = \frac{188 - 185}{(30 + 32)/2} = \frac{3}{31} = .10

The effect size does not reach the .33 level, so the 3-point difference between the means would
not be regarded as practically consequential, even though it's statistically significant.
But suppose the two means are 193 and 182, and the Ns and standard deviations are the same.
Then we have:

E = \frac{193 - 182}{(30 + 32)/2} = \frac{11}{31} = .35

The 11-point difference between the means (with the associated score variability as reflected in
the standard deviations) exceeds the .33 threshold for practical significance. So in this case, we
would be justified in saying that the difference between the two groups is not only statistically
significant, it can also be regarded as having some practical educational meaning.
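Here are the same two calculations as a small Python sketch of the effect-size formula above:

def effect_size(mean1, mean2, sd1, sd2):
    # Difference between the means divided by the average variability
    return (mean1 - mean2) / ((sd1 + sd2) / 2)

print(effect_size(188, 185, 30, 32))  # about .10 -- below the .33 threshold
print(effect_size(193, 182, 30, 32))  # about .35 -- above the .33 threshold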
However, in the final analysis, you, as the experienced educator and administrator, must make
the judgment about practical meaning. Many times you will be presented with mean differences
that are large by any practical standard, but because of small Ns or large variances, they're not
statistically significant. In those cases, the judgment is fairly easy: You would be on very soft
ground making policy and budgetary decisions based on differences that are not statistically
significant.
But the other case is more difficult. If you have a mean difference that is both statistically
significant and practically significant as indicated by the effect size, you still have to be the judge
of whether that difference justifies changing programs, spending more money, hiring or firing
staff, and so on.
The new knowledge you now have about how to determine statistical and practical significance
adds greatly to your ability to make decisions about the effectiveness of educational programs
and the formulation of educational policies, but there are no automatic answers. You, as the
responsible administrator, must bring your experience to bear in making the final decision.
Lesson 6
Correlation

What is a Correlation?
Thus far we've covered the key descriptive statistics (the mean, median, mode, and standard
deviation), and we've learned how to test the difference between means. But often we want to
know how two things (usually called "variables" because they vary from high to low)
are related to each other.
For example, we might want to know whether reading scores are related to math scores, i.e.,
whether students who have high reading scores also have high math scores, and vice versa. The
statistical technique for determining the degree to which two variables are related (i.e., the degree
to which they co-vary) is, not surprisingly, called correlation.
There are several different types of correlation, and we'll talk about them later, but in this lesson
we're going to spend most of the time on the most commonly used type of correlation: the
Pearson Product Moment Correlation. This correlation, signified by the symbol r, ranges from
-1.00 to +1.00. A correlation of 1.00, whether it's positive or negative, is a perfect correlation. It
means that as scores on one of the two variables increase or decrease, the scores on the other
variable increase or decrease by the same magnitude, something you'll probably never see in
the real world. A correlation of 0 means there's no relationship between the two variables, i.e.,
when scores on one of the variables go up, scores on the other variable may go up, down, or
whatever. You'll see a lot of those.
Thus, a correlation of .8 or .9 is regarded as a high correlation, i.e., there is a very close
relationship between scores on one of the variables and the scores on the other. And
correlations of .2 or .3 are regarded as low correlations, i.e., there is some relationship between
the two variables, but it's a weak one. Knowing people's score on one variable wouldn't allow you
to predict their score on the other variable very well.
Computing the Pearson Product Moment Correlation
Let's do a correlation to see how the formula works and what it produces. The formula for the
Pearson product moment correlation is:

r_{xy} = \frac{n\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[n\Sigma X^2 - (\Sigma X)^2][n\Sigma Y^2 - (\Sigma Y)^2]}}

Where:
r_{xy} is the correlation coefficient between X and Y.
n is the size of the sample.
X is the individual's score on the X variable.
Y is the individual's score on the Y variable.
XY is the product of each X score times its corresponding Y score.
X^2 is the individual X score, squared.
Y^2 is the individual Y score, squared.
Let's see what the correlation is between 30 students' reading scores and their math scores. The
data we need to compute the formula are given in Table 12.
Table 12
Reading and Math Scores and the Associated Data for Computing the Pearson Product Moment
Correlation (N=30)

X (Reading)  Y (Math)  X²      Y²      XY
191          180       36481   32400   34380
103          101       10609   10201   10403
187          173       34969   29929   32351
108          103       11664   10609   11124
180          170       32400   28900   30600
118          113       13924   12769   13334
178          171       31684   29241   30438
127          122       16129   14884   15494
176          168       30976   28224   29568
134          130       17956   16900   17420
165          150       27225   22500   24750
147          145       21609   21025   21315
160          150       25600   22500   24000
157          154       24649   23716   24178
155          145       24025   21025   22475
168          164       28224   26896   27552
150          145       22500   21025   21750
172          170       29584   28900   29240
145          130       21025   16900   18850
185          179       34225   32041   33115
140          141       19600   19881   19740
195          193       38025   37249   37635
135          136       18225   18496   18360
100          101       10000   10201   10100
130          128       16900   16384   16640
125          121       15625   14641   15125
105          106       11025   11236   11130
120          118       14400   13924   14160
115          112       13225   12544   12880
110          108       12100   11664   11880
Total (Σ)
4381         4227      664583  616805  639987
So, we plug the numbers from this table into the formula, and do the math:

r_{xy} = \frac{30(639{,}987) - (4{,}381)(4{,}227)}{\sqrt{[30(664{,}583) - (4{,}381)^2][30(616{,}805) - (4{,}227)^2]}}

or

r_{xy} = \frac{19{,}199{,}610 - 18{,}518{,}487}{\sqrt{(744{,}329)(636{,}621)}} = \frac{681{,}123}{688{,}372}

or

r_{xy} = .99
In this case, the correlation between reading and math scores is remarkably high (because I
concocted the numbers so it would turn out that way). With real scores, it would be high, but not
that high. If you glance over the numbers in Table 12, even before we've computed the correlation
you can easily see (in this small sample of 30) that high scores in reading tend to go with high
scores in math, low reading scores tend to go with low math scores, and so on. But, of course,
you wouldn't be able to see that pattern if you had a sample of 500.
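For larger samples you would compute rather than eyeball it. Here is a compact Python version of the raw-score formula above (pearson_r is just a local helper name); fed the 30 score pairs from Table 12, it returns approximately .99:

import math

def pearson_r(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator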
Positive and Negative Correlations
I pointed out above that a correlation can vary from +1.00 to -1.00. The correlation we just
computed is a positive correlation. That is, high reading scores go with high math scores, low
with low, and so on. However, we could have a negative correlation. This is not something bad; it
simply denotes an association in which high scores on one variable go with low scores on the
other. For example, if we were computing a correlation between, say, the amount of time students
watch television and their achievement scores, we would find a negative correlation: high TV
watching is associated with lower achievement scores, and vice versa. Such a correlation might
be something like -.71.
Determining Statistical Significance
OK, so we have a correlation coefficient. What precisely does it mean, and how do we interpret
it? It's not a percent, as many people mistakenly think.
First, we can determine its statistical significance in the same way we did with the t-test. We can
look it up in a table in the appendices of any statistics text. In the case of our .99 correlation
between reading and math scores, if we look that up in the table for correlations, we find that the
value needed to reject the null hypothesis at the .01 level of confidence (and declare that the
correlation is statistically significant, or unlikely to be due to chance) for our sample of 30 is .45 (in this
case using the one-tailed test because the samples are dependent).

So if we were stating this finding in a research report, we could say that the correlation of reading
scores with math scores = .99, p < .01, with df = 28. (Now see how smart you are because you know
what all that means.)
Practical vs. Statistical Significance
But we have the same issue we had with the t-test: determining its practical vs. its statistical
significance. We don't have an effect test, as we did with the t-test, but we have something
similar. It has an imposing name (the coefficient of determination), but you'll be ecstatically
happy to learn that it's very simple.
The coefficient of determination is nothing more than r². You simply multiply r by itself, and
you've got it. OK, you've got it, but what does it mean? The coefficient of determination, r², tells us
how much of the variance in one of the variables is accounted for by the variance in the other
variable. Thus, if we have a correlation of .60 between, say, students' achievement scores and a
measure of their socioeconomic status, r² = .36. That means that 36% of the variance in the
students' achievement scores (not 60%, which is the correlation) can be accounted for by variance
in their socioeconomic status. But that also means that the remaining variance (64%) in
achievement scores cannot be accounted for by socioeconomic status, but is attributable to
many other factors, such as study time, intelligence, motivation, quality of instruction, and so on.
Other Correlations
All the correlations we've talked about so far have been based on what we call interval data, i.e.,
data where the distance between scores or values is the same. The distance between a 65 and a 66
is assumed to be the same as the distance between a 14 and a 15. But many times we want to
determine the relationship between two variables when that is not the case. Suppose, for
example, we want to compute the correlation between students' class rank in their junior year
and their class rank in their senior year. Ranks are not the same as scores; there may be a much
smaller (or bigger) difference between ranks 1 and 2 than between ranks 8 and 10 (like the
difference between the first two teams and the last two teams in football or baseball). If the data
we have are ranks rather than scores, we can't use the product moment formula. But there is
another correlation formula for use with ranks (it's called rho).
And suppose we want to determine the relationship between two variables when one is based on
what is called nominal or categorical data, and the other is interval data. An example would be
correlating gender with achievement scores. Again, the product moment correlation can't be
used, but there is also a special formula for doing a correlation with these disparate types of
data. In this case, it's called the point biserial correlation.
Table 13 displays the several different types of correlation for use with variables based on
different levels of measurement. In this course, we're not going to compute them. But with the
knowledge and skills you've developed thus far, when you encounter situations where the
variables you want to correlate are based on different levels of measurement (interval, ordinal, or
nominal), you'll be able to select the type you need.
Table 13
Alternative Types of Correlation for Different Levels of Measurement*

Variable X: Interval (reading scores)
Variable Y: Interval (math scores)
Correlation being computed: Correlation between reading and math achievement
Type of correlation: Pearson product moment (r)

Variable X: Ordinal (class rank in the junior year)
Variable Y: Ordinal (class rank in the senior year)
Correlation being computed: Correlation between class rank in the last two years of high school
Type of correlation: Spearman rank coefficient (rho)

Variable X: Nominal (social class: high, middle, or low)
Variable Y: Ordinal (rank in high school graduating class)
Correlation being computed: Correlation between social class and rank in high school
Type of correlation: Rank biserial coefficient (r_bs)

Variable X: Nominal (family configuration, e.g., intact or single parent)
Variable Y: Interval (grade point average)
Correlation being computed: Correlation between family configuration and grade point average
Type of correlation: Point biserial (r_pb)

Variable X: Nominal (voting preference: Republican or Democrat)
Variable Y: Nominal (gender, i.e., male or female)
Correlation being computed: Correlation between voting preference and gender
Type of correlation: Phi coefficient (φ)

*This table was adapted from a similar one found in Neil Salkind's Statistics for People Who
(Think They) Hate Statistics, Sage Publications, 2000, p. 101.
Correlation and Cause
Before we conclude this lesson, we need to understand one of the most important facts about
correlation, namely, that it does not necessarily indicate cause. It may be that one of the variables
does in fact cause the other, but we don't know that just from the fact that the two are correlated.
Smoking and Lung Cancer
It is now an established fact that smoking causes lung cancer, but that conclusion could not be
reached simply because there is a correlation between the two. When the association between
smoking and lung cancer first appeared, and many argued that it indicated that smoking caused
lung cancer, the tobacco companies argued that there were other factors that could explain the
relationship, e.g., smoking is higher among blue-collar workers who also have greater exposure
to other toxic elements, smokers drink more and lead more stressful lives, and so on. And
logically they were right. It took other kinds of direct physiological evidence and animal
experiments to prove that the association was indeed causal.
We often find strong correlations where a causal relationship clearly makes no sense. For
example, we may find a strong correlation between car sales and college attendance. Neither one
of these is causing the other; both increase during financially prosperous times.
Wine Consumption and Heart Disease
But it is when two correlated variables seem likely to be causally related to one another that we
tend to jump to the unsupportable conclusion that one causes the other. For example, when we
hear about a correlation between an increase in stork nests and the birth rate in Germany, we
laugh it off as clearly due to some unknown third factor. But when we hear that moderate wine
consumption is associated with lower rates of heart disease, we're ready to immediately
conclude (especially if we're wine lovers) that there is obviously some medically beneficial
element in wine. But when these reports first came out, skeptics (they were probably
statisticians) pointed out that other things could account for the association between moderate
wine consumption and lower rates of heart disease. Moderate wine drinkers are likely to be more
educated, non-smokers, get more exercise, and have lower rates of obesity. Again, as it has
turned out, other kinds of physiological evidence do support the conclusion that moderate wine
consumption is medically beneficial, but we can't conclude that just on the basis of the
correlation.
The Important Lesson About Correlation and Cause
The important lesson here is that the correlation coefficient is a highly useful statistic for
determining the relationship between variables, but a correlation does not demonstrate a causal
relationship between the variables.
The same holds for differences between means. If, for example, we give a pre-test and a post-test
to students who have participated in a new reading program, and we find that the increase in the
mean reading score is both statistically and practically significant, that does not entitle us to
conclude that the new program caused the increase. Any number of other factors could account
for the increase: the students were older, and they had been exposed to many other influences
and experiences that could have (and probably did) improve their reading. To determine how
much, if any, of the improvement was caused by the new program, we would have to employ a
control group (or some other method for determining "the expectation of non-treatment"). This
would tell us how much improvement occurred in comparable students who had the same
experiences except for the new reading program. For additional information on these and other
designs that address this question, see the Ed Leaders Evaluation Web Site
at http://edl.nova.edu/secure/EVASupport/index.html.
Lesson 7
Chi Square
Parametric and Non-Parametric Statistics
Most of the statistics we've learned so far (the mean, the standard deviation, the t-test, and the
product moment correlation) belong to a category called parametric statistics. That's because it
is assumed the data used to compute them have certain parameters or meet certain conditions.
One of these is that the variances are similar; another is that the sample is large enough to be
representative of the universe from which it is drawn. We used examples of 30 or more cases
when we worked on the mean, the t-test, and the product moment correlation because there is a
general consensus among statisticians that this is the minimum-size sample to use with
parametric tests. You should keep this in mind when using these tests in your practicum and in
your own research.
But what do we do when we can't meet these conditions? Happily, there's another category of
statistics, and you shouldn't be surprised to learn that it's called non-parametric statistics. We
can do many of the same things with non-parametric statistics. They're regarded as somewhat
less powerful than parametric statistics, but they're not to be looked down on. When conditions
call for them, they are the things to use.
Chi Square
One of the most useful of the non-parametric statistics is chi square. We use it when our data
consist of people distributed across categories, and we want to know whether that distribution is
different from what we would expect by chance (or from another set of expectations). We don't have
scores, we don't have means. We just have numbers, or frequencies. In other words, we have
nominal data.
For example, suppose we have the data in Table 14, which display the number of students who elect
different majors, and we want to know whether those numbers differ from chance. In other words,
are some majors selected more often than others, or is the selection pattern essentially random?
Table 14
Number of Students Selecting Different Majors

Pre-Med   Computer Sciences   English Literature   Education   Engineering   Total
50        85                  25                   60          80            300

The null hypothesis here, of course, is that there is no difference between this distribution of
major selections and what would be expected by chance. So what chi square does is compare
these numbers (the observed frequencies) with those that would be expected by chance (the
expected frequencies).
The formula for chi square is:

\chi^2 = \Sigma \frac{(O - E)^2}{E}

Where:
\chi^2 is the value for chi square.
\Sigma is the sum.
O is the observed frequency.
E is the expected frequency.
The first question in doing the calculation is, how do we get the expected frequencies? That's
easy. If we are testing the observed frequencies (those in Table 14) against what we would expect
by chance, then since we have five categories of majors, we would expect one-fifth of the individuals
to fall in each of the categories. One-fifth (20%) of 300 is 60. So if the selection of majors is
largely a chance pattern, we would expect to find 60 people in each category.
Table 15 displays the observed and expected frequencies for each major, computes the
difference between them (O - E), squares O - E ((O - E)²), divides the squares by the expected
frequencies ((O - E)²/E), and sums those quantities to give us our \chi^2, which is 39.17.
Table 15
Observed and Expected Frequencies for the Selection of Majors

Major               O (observed)  E (expected)  O - E   (O - E)²  (O - E)²/E
Pre-Med             50            60            -10       100       1.67
Computer Sciences   85            60             25       625      10.42
English Literature  25            60            -35      1225      20.42
Education           60            60              0         0       0.00
Engineering         80            60             20       400       6.67
Total               300           300                              39.17

By now, you know the next step: determining if we can reject the null hypothesis. We do it the
same way we did for the t-test and the correlation. We enter the chi square significance table
(which I have handy, but you don't) with our chi square value (39.17) and the appropriate degrees
of freedom. For chi square, the degrees of freedom are equal to the number of rows minus one
(R - 1). In our case we have five rows, so df = 4.
Entering the chi square table with our result of 39.17 and df = 4, we find that we need a chi square
value of 13.28 to reject the null hypothesis at the .01 level of confidence. We clearly have that, so
we can say that the distribution of major selections is not simply a chance pattern; or \chi^2 = 39.17,
p < .01, df = 4.
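The chi-square statistic itself is easy to compute in code. A minimal Python sketch of this example:

observed = {"Pre-Med": 50, "Computer Sciences": 85,
            "English Literature": 25, "Education": 60, "Engineering": 80}

expected = sum(observed.values()) / len(observed)   # 300 / 5 = 60 per major
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
print(round(chi_square, 2))  # 39.17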
Lesson 8
Summarizing the Steps and Moving On
In the statistical tests we've calculated (the t-test, correlation, and chi square), we've gone through a
series of steps that you'll go through when you compute any statistical test.
Recapping, here they are:
1. First, determine the level of measurement you have. Are the data you have interval,
ordinal, or nominal?
2. If you have interval data, determine whether they meet the requirements of a parametric
test (adequate sample size and variance similarity).
3. Based on the determinations you made in (1) and (2), select the statistical test (t, r, \chi^2,
or whatever).
4. Calculate the values required, plug them into the formula, and compute the test. (Now that
you have gone through these calculations and understand them, the labor can be done for
you by any one of the available statistical software packages.)
5. Select the level of risk you want to take in rejecting the null hypothesis and making (or
avoiding) the Type I and Type II errors. Usually that will be .05 or .01.
6. Enter the appropriate significance table (e.g., for t, r, or \chi^2) with the test result and the
proper degrees of freedom.
7. Determine whether your test result is large enough to reject the null hypothesis and
enable you to conclude that it is statistically significant.
8. If it is statistically significant, use whatever additional tests may be available (e.g., the
effect test, the coefficient of determination, etc.) and your own reasoned judgment to
determine if the result is also practically significant.
Congratulate yourself. The fact that you understand these steps and can execute them shows
how far you've come. You now have a good grip on basic statistics. You can understand them in
research journals, and you can use them in your practicum and in your own research. And you
are now in a position to go on to more advanced statistics (I know you can't wait).
References
I have not provided a set of references because there are literally dozens of introductory
statistics texts, and just about any of them will do. You definitely should have one of these texts
for reference purposes, especially for the significance tables they all provide. My favorite, and the
one I highly recommend, is Neil Salkind's Statistics for People Who (Think They) Hate Statistics,
Sage Publications, 2000.
Statistical Software
This short course has taken you through both the explanation of the major statistical concepts
and the actual computation of the most common statistical tests you will be encountering in the
research literature and using in your own research.
Now that you have this essential, basic understanding, you won't need to do any computations
by hand. There are software applications that will do that for you. Once you enter the data, they
will compute a correlation in less than a second, and provide you with the significance levels.
There are a number of such programs. You can, in fact, do a number of statistical tests with
Microsoft Excel, which is mainly a spreadsheet program, and many of you probably have this
application on your computers, either as a stand-alone program or as part of Microsoft Office.
But one of the most highly regarded and user-friendly statistical programs is GB-STAT, so if you
don't already have such a program, this would be a good one to get.
Good luck in all your research endeavors.
John Evans
evansj@nsu.nova.edu
