X̄ = ΣX / N
Where:
X̄ (sometimes called the X-bar) is the symbol for the mean.
Σ (the Greek letter sigma) is the symbol for summation.
X is the symbol for the scores.
N is the symbol for the number of scores.
So this formula simply says you get the mean by summing up all the scores and dividing the total
by the number of scores: the old average, which in this case we're all familiar with, so it's a good
place to begin.
This is pretty simple when you have only a few numbers. For example, if you have just 6 numbers
(3, 9, 10, 8, 6, and 5), you insert them into the formula for the mean, and do the math:
X̄ = (3 + 9 + 10 + 8 + 6 + 5) / 6 = 41 / 6 = 6.83
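In code, that calculation is a one-liner (a quick sketch; the variable names are mine):

```python
# A quick sketch of the calculation above.
scores = [3, 9, 10, 8, 6, 5]
mean = sum(scores) / len(scores)  # sum the scores, divide by N
print(round(mean, 2))  # 6.83
```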
But we usually have many more numbers to deal with, so let's do a couple of examples where the
numbers are larger, and show how the calculations should be done. In our first example, we're
going to compute the mean salary of 36 people. Column A of Table 1 shows the salaries (ranging
from $20K to $70K), and column B shows how many people earned each of the salaries.
Table 1
Example 1 of Method for Computing the Mean
A            B              C
Salary (X)   Frequency (f)  fX
$20K         1              20
$25K         2              50
$30K         3              90
$35K         4              140
$40K         5              200
$45K         6              270
$50K         5              250
$55K         4              220
$60K         3              180
$65K         2              130
$70K         1              70
Sum          36             1,620
To get the ΣfX for our formula, we multiply the number of people in each salary category by the
salary for that category (e.g., 1 x 20, 2 x 25, etc.), and then total those numbers (the ones in
column C). Thus we have:
X̄ = ΣfX / N = 1,620 / 36 = 45, or a mean salary of $45K
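The Table 1 calculation can be checked with a short sketch (salaries in $K; the variable names are mine):

```python
# A sketch of the Table 1 calculation (salaries in $K).
salaries = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
freqs = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

n = sum(freqs)                                       # N = 36
total = sum(f * x for f, x in zip(freqs, salaries))  # the sum of the fX column
mean = total / n
print(n, total, mean)  # 36 1620 45.0
```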
The scores in this distribution are said to be normally distributed, i.e., clustered around a central
value, with decreasing numbers of cases as you move to the extreme ends of the range. Thus the
term normal curve.
So, computing the mean is pretty simple. Piece of cake, right? Not so fast.
In our second example, let's look at what happens if we change just six people's salaries in Table 1.
Let's suppose that the three people who made $60K actually made $200K, that the two who
made $65K made $205K, and that the one person who made $70K made $210K. The revised salary
table is the same except for these changes.
Table 2
Example 2 of Method for Computing the Mean
A            B              C
Salary (X)   Frequency (f)  fX
$20K         1              20
$25K         2              50
$30K         3              90
$35K         4              140
$40K         5              200
$45K         6              270
$50K         5              250
$55K         4              220
$200K        3              600
$205K        2              410
$210K        1              210
Sum          36             2,460
But before we recompute the mean, let's look at how different the distribution looks.
Figure 2
Distribution of Example-2 Salaries
Now, using the revised numbers in Table 2, we compute the mean as follows:
X̄ = ΣfX / N = 2,460 / 36 = 68.3, or a mean salary of $68.3K
What this shows is that changing the salaries of just six individuals to extreme values greatly
affects the mean. In this case, it raised the mean from $45K to $68.3K (an increase of 52%), even
though all the other scores remained the same. In fact, the mean is a figure that no person in the
group has, hardly a figure we would think of as "average" for the group.
The important lesson here is that the mean is intended to be a measure of central tendency, but it
works usefully as such only if the data on which it is based are more or less normally distributed
(like in Figure 1). The presence of extreme scores distorts the mean, and, in this case, gives us a
mean salary ($68.3K) that is not a very good indication of the "average" salary of this group of 36
individuals.
So if we know or suspect that our data may have some extreme scores that would distort the
mean, what measure can we use to give us a better measure of central tendency? One such
measure is the median, and we move on to learn about that now.
The Median
If your data are normally distributed (like those in Figure 1), the preferred measure of central
tendency is the mean. However, if your data are not normally distributed (like those in Figure 2),
the median is a better measure of central tendency, for reasons we'll see in a moment.
The median is the point in the distribution above which and below which 50% of the scores lie. In
other words, if we list the scores in order from highest to lowest (or lowest to highest) and find
the middle-most score, that's the median.
For example, suppose we have the following scores: 2, 12, 4, 11, 3, 7, 10, 5, 9, 6. The next step is
to array them in order from lowest to highest.
2, 3, 4, 5, 6, 7, 9, 10, 11, 12
Since we have 10 scores, and 50% of 10 is 5, we want the point above which and below which
there are five scores. Careful. If you count up from the bottom, you might think the median is 6.
But that's not right, because there are 4 scores below 6 and 5 above it. So how do we deal with
that problem? We deal with it by understanding that in statistics, a measurement or a score is
regarded not as a point but as an interval ranging from half a unit below to half a unit above the
value. So in this case, the actual midpoint or median of this distribution, the point above which
and below which 50% of the scores lie, is 6.5.
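A quick check of the example above: with an even number of scores, the median falls between the two middle scores, landing here on the boundary at 6.5.

```python
import statistics

# The ten scores from the text; statistics.median averages the two
# middle scores (6 and 7) when N is even.
scores = [2, 12, 4, 11, 3, 7, 10, 5, 9, 6]
med = statistics.median(scores)
print(med)  # 6.5
```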
As we saw with the mean, when we have only a few numbers, it's pretty simple. But how do we
find the median when we have larger numbers and more than one person with the same score?
It's not difficult. Let's use the salary data in Table 1.
Table 3
Example 1 of Method for Computing the Median
Salary   Range             Frequency
$20K     $19.5K-$20.5K     1
$25K     $24.5K-$25.5K     2
$30K     $29.5K-$30.5K     3
$35K     $34.5K-$35.5K     4
$40K     $39.5K-$40.5K     5
$45K     $44.5K-$45.5K     6
$50K     $49.5K-$50.5K     5
$55K     $54.5K-$55.5K     4
$60K     $59.5K-$60.5K     3
$65K     $64.5K-$65.5K     2
$70K     $69.5K-$70.5K     1
Sum                        36
The salaries are already in order from lowest to highest, so the next step in finding the median is
to determine how many individuals (ratings, scores, or whatever) we have. Those are shown in
the frequency column, and the total is 36. So our N = 36, and we want to find the salary point
above which and below which 50%, or 18, of the individuals fall. If we count up from the bottom
through the $40K level, we have 15, and we need three more. But if we include the $45K level (in
which there are 6), we have 21, three more than we need. Thus, we need 3, or 50%, of the 6 cases
in the $45K category. We add this value (.5) to the lower limit of the interval in which we know the
median lies ($44.5K-$45.5K), and this gives us a value of $45K.
In this case, the mean and the median are the same, as they always are in normal distributions.
So in situations like this, the mean is the preferred measure.
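The counting procedure just described can be sketched in code, using the Table 3 salary data (values in $K; the variable names are mine):

```python
# A sketch of the counting method for a grouped median (Table 3 data, in $K).
salaries = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
freqs = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

n = sum(freqs)
half = n / 2                 # 18 cases must lie below the median
interval_width = 1           # each interval is $1K wide, e.g., $44.5K-$45.5K
below = 0
for x, f in zip(salaries, freqs):
    if below + f >= half:
        lower_limit = x - 0.5                    # lower real limit of interval
        median = lower_limit + (half - below) / f * interval_width
        break
    below += f
print(median)  # 45.0
```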
But things aren't always so neat and tidy. Let's now compute the median for the salary data in
Table 2, which we know (from Figure 2) are not normally distributed.
Table 4
Example 2 of Method for Computing the Median
Salary   Range               Frequency
$20K     $19.5K-$20.5K       1
$25K     $24.5K-$25.5K       2
$30K     $29.5K-$30.5K       3
$35K     $34.5K-$35.5K       4
$40K     $39.5K-$40.5K       5
$45K     $44.5K-$45.5K       6
$50K     $49.5K-$50.5K       5
$55K     $54.5K-$55.5K       4
$200K    $199.5K-$200.5K     3
$205K    $204.5K-$205.5K     2
$210K    $209.5K-$210.5K     1
Sum                          36
The N is the same (36), so we go through exactly the same calculations we did for the data in
Table 3. When we do that (count up from the bottom, find that we need half the cases in the $45K
category to get 50% (18) of the total, and do so by adding .5 to the lower limit of that category),
incredibly we get exactly the same result ($45K) we did with the data in Table 3. In other words,
those six extreme cases (the six whose salaries changed from $60K, $65K, and $70K to $200K,
$205K, and $210K) don't affect the median even though they made a big change in the mean.
They are still above the midpoint, and it doesn't matter how much above it in the calculation of
the median.
This example illustrates dramatically what the median is and why it's a better measure of central
tendency than the mean when we have extreme scores.
We've done the calculations for the median in a simple, descriptive way (arraying the scores from
high to low, counting up to the mid-category, dividing it as necessary, etc.), but just so you won't
feel slighted, here is the statistical formula for doing what we've just done:
Mdn = L + ((N/2 - Σfb) / fw) × i
Where:
Mdn is the median.
L is the lower limit of the interval containing the median.
N is the total number of scores.
Σfb is the sum of the frequencies or number of scores up to the interval containing the median.
fw is the frequency or number of scores within the interval containing the median.
i is the size or range of the interval.
The Mode
The third and last of the measures of central tendency we'll be dealing with in this course is the
mode. It's very simple: The mode is the most frequently occurring score or value. In our case
(see Figures 1 and 2), that value is $45K. But sometimes we may have odd distributions in which
there may be two peaks. Even if the peaks are not exactly equal, they're referred to as bi-modal
distributions.
Let's assume we have such a bi-modal distribution of salaries, as shown in Table 5 and Figure 3.
Table 5
Bi-Modal Distribution of Salaries
A            B              C
Salary (X)   Frequency (f)  fX
$20K         1              20
$25K         3              75
$30K         4              120
$35K         6              210
$40K         3              120
$45K         1              45
$50K         3              150
$55K         5              275
$60K         6              360
$65K         3              195
$70K         1              70
Sum          36             1,640
Figure 3
Example of a Bi-Modal Distribution
Before we talk about the mode, using the formulas and calculation procedures you've just
learned, calculate the mean and median for the salaries in Table 5 (the fX and ΣfX data are in
Column C).
When you look at this distribution of salaries, as shown graphically in Figure 3, it's hard to
discern any central tendency. The mean (which you just calculated) is $45K, which only one
person earns, and the median is also $45K, which, while it's the middle-most value (50% of the
cases are above and below it), certainly doesn't give us a meaningful indication of the central
tendency in this distribution, because there isn't any.
Therefore, the most informative general statement we can make about this distribution is to say
that it is bi-modal.
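Finding the mode(s) is just a matter of locating the peak frequency; a sketch using the Table 5 data (salaries in $K):

```python
# Finding the mode(s) of Table 5's distribution (salaries in $K).
salaries = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
freqs = [1, 3, 4, 6, 3, 1, 3, 5, 6, 3, 1]

peak = max(freqs)
modes = [x for x, f in zip(salaries, freqs) if f == peak]
print(modes)  # [35, 60] -> two modes, i.e., a bi-modal distribution
```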
You now know the three principal measures of central tendency (the mean, the median, and the
mode), when they should be used, and how to calculate them, so we now move on to the other
side of the central-tendency coin: dispersion.
Lesson 2
Figure 5
Display of a Narrow or Concentrated Distribution
Note that the mean and median of these two quite different distributions are the same (X̄ = 150,
Mdn = 150), so simply calculating and reporting those two measures of central tendency would
fail to reveal how different the dispersion of scores is between the two groups. But we can do this
by calculating the standard deviation.
The standard deviation provides us with a measure of just how spread out the scores are: a high
standard deviation means the scores are widely spread; a low standard deviation means they're
bunched up closely on either side of the mean.
We'll now calculate the standard deviation for both these distributions. The formula for the
standard deviation is:
σ = √(Σfd² / N)
Where:
σ (little sigma) is the standard deviation.
d² is a score's deviation from the mean, squared.
N is the number of cases.
The numbers we need to calculate the standard deviation for Figure 4, the flat distribution, are in
Table 6.
Table 6
Data for Figure 4the Flat Distribution
A               B              C             D      E
Test Score (X)  Frequency (f)  X - Mean (d)  fd     fd²
100             8              50            400    20,000
110             13             40            520    20,800
120             17             30            510    15,300
130             20             20            400    8,000
140             21             10            210    2,100
150             22             0             0      0
160             21             -10           -210   2,100
170             20             -20           -400   8,000
180             17             -30           -510   15,300
190             13             -40           -520   20,800
200             8              -50           -400   20,000
SUM             180                                 132,400
σ = √(132,400 / 180) = √735.6 = 27.1
You can do the last part of this calculation, taking the square root of 132,400/180 (which is
735.6), by using the square-root button on your little hand calculator.
Now let's compute the standard deviation for the data in Figure 5. The data are in Table 7, and
you follow the same steps we've just completed.
Table 7
Example of a Narrow or Concentrated Distribution
A               B              C             D      E
Test Score (X)  Frequency (f)  X - Mean (d)  fd     fd²
100             0              50            0      0
110             0              40            0      0
120             0              30            0      0
130             10             20            200    4,000
140             45             10            450    4,500
150             70             0             0      0
160             45             -10           -450   4,500
170             10             -20           -200   4,000
180             0              -30           0      0
190             0              -40           0      0
200             0              -50           0      0
SUM             180                                 17,000
σ = √(17,000 / 180) = √94.4 = 9.7, or roughly 10
The two standard deviations provide a statistical indication of how different the distributions
are: 27 for the spread-out distribution and 10 for the bunched-up distribution.
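Both calculations can be checked with one short sketch, using the Table 6 and Table 7 frequencies (the mean is 150 in both distributions; function and variable names are mine):

```python
import math

# Standard deviation for the two distributions in Tables 6 and 7,
# using sigma = sqrt(sum(f * d^2) / N) with d = X - mean.
scores = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
flat = [8, 13, 17, 20, 21, 22, 21, 20, 17, 13, 8]    # Table 6
narrow = [0, 0, 0, 10, 45, 70, 45, 10, 0, 0, 0]      # Table 7

def stdev(values, freqs, mean=150):
    n = sum(freqs)
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(ss / n)

print(round(stdev(scores, flat), 1))    # 27.1
print(round(stdev(scores, narrow), 1))  # 9.7
```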
So once we know the mean and median, why do we need to know the standard deviation? What
use is it?
The standard deviation is important because, regardless of the mean, it makes a great deal of
difference whether the distribution is spread out over a broad range or bunched up closely
around the mean. For example, suppose you have two classes whose mean reading scores are
the same. With only that information, you would be inclined to teach the two classes in the same
way. But suppose you discover that the standard deviation of one of the classes is 27 and the
other is 10, as in the examples we just finished working with. That means that in the first class
(the one where σ = 27), you have many students throughout the entire range of performance.
You'll need to have teaching strategies for both the gifted and the challenged. But in the second
class (the one where σ = 10), you don't have any gifted or challenged students. They're all
average, and your teaching strategy will be entirely different.
The Normal Curve
Before we leave the standard deviation, it's a good time to learn a little more about the normal
curve. We'll be coming back to it later.
First, why is it called the normal curve? The reason is that so many things in life are distributed in
the shape of this curve: IQ, strength, height, weight, musical ability, resistance to disease, and so
on. Not everything is normally distributed, but most things are. Thus the term normal curve.
In Figure 6, we have a set of scores which are normally distributed. The range is from 0 to 200,
the mean and median are 100, and the standard deviation is 20. In a normal curve, the standard
deviation indicates precisely how the scores are distributed. Note that the percentage of scores
is marked off by standard deviations on either side of the mean. In the range between 80 and 120
(that's one standard deviation on either side of the mean), there are 68.26% of the cases. In other
words, in a normal distribution, roughly two thirds of the scores lie within one standard
deviation on either side of the mean. If we go out to two standard deviations on either side of the
mean, we will include 95.44% of the scores; and if we go out three standard deviations, that will
encompass 99.74% of the scores; and so on.
Another way to think about this is to realize that in this distribution, if you have a score that's
within one standard deviation of the mean, i.e., between 80 and 120, that's pretty average: two
thirds of the people are concentrated in that range. But if you have a score that's two or three
standard deviations away from the mean, that is clearly a deviant score, i.e., very high or very
low. Only a small percent of the cases lie that far out from the mean.
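Those percentages come from the normal curve itself; a sketch of where they originate, using Python's error function:

```python
import math

# Fraction of a normal distribution within k standard deviations of the
# mean, computed with the error function.
def within(k):
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 2))  # 1 68.27, 2 95.45, 3 99.73
```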
This is valuable to understand in its own right, and will become useful when we take up
determining the significance of the difference between means, which we're going to do next in
Lesson 3.
Figure 6
Normal Curve Showing the Percent of Cases Lying Within 1, 2, and 3 Standard Deviations From
the Mean
First, let's look at the formula for the t-test, and determine what we need to make the
computation:
t = (X̄₁ - X̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
Where:
X̄₁ is the mean for Group 1.
X̄₂ is the mean for Group 2.
n₁ is the number of people in Group 1.
n₂ is the number of people in Group 2.
σ₁² and σ₂² are the variances (the squared standard deviations) of the two groups.
The formula above is for testing the significance of the difference between two independent samples, i.e., groups of different people. If we wanted to test the
difference between, say, the pre-test and post-test means of the same group of people, we would use a different formula, for dependent samples. That
formula is:
t = D̄ / (s_D / √n)
Where:
D̄ is the mean of the difference scores, s_D is the standard deviation of the difference scores, and n is the number of pairs of scores.
But for now, we'll test the significance of the difference between the mean salaries of two different groups. You can try the one for dependent samples on your
own. (I knew you'd welcome that opportunity.)
Tables 8 and 9 provide the numbers we need to compute the t-test for the difference in mean salaries of the two groups.
Table 8
Salaries and t-Test Calculation Data for Group 1
A           B              C             D      E
Salary (X)  Frequency (f)  X - Mean (d)  fd     fd²
20          1              25            25     625
25          2              20            40     800
30          3              15            45     675
35          4              10            40     400
40          5              5             25     125
45          6              0             0      0
50          5              -5            -25    125
55          4              -10           -40    400
60          3              -15           -45    675
65          2              -20           -40    800
70          1              -25           -25    625
SUM         36                           0      5,250
Table 9
Salaries and t-Test Calculation Data for Group 2
A           B              C             D      E
Salary (X)  Frequency (f)  X - Mean (d)  fd     fd²
20          0              27            0      0
25          2              22            44     968
30          3              17            51     867
35          3              12            36     432
40          4              7             28     196
45          6              2             12     24
50          6              -3            -18    54
55          5              -8            -40    320
60          3              -13           -39    507
65          2              -18           -36    648
70          2              -23           -46    1,058
SUM         36                                  5,074
We now know that t = .222. So what does that mean? Is the difference between the two means
statistically significant or not? To find out whether a t value of any size is significant or not, we
simply look it up in a table that can be found in the appendices of any statistics textbook. The
quick answer in this case is no, it is not statistically significant. That is, the 2-point difference in
the mean salaries of these two groups could likely have occurred by chance.
But that's the quick and dirty answer. There's more about the matter of statistical significance we
need to understand. So we're going to turn to that important topic now, and we'll return to this
example after we've done that.
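The t computation for the Table 8 and Table 9 data can be sketched as follows (salaries in $K; I use the common pooled-variance formula here, which may differ slightly from the lesson's own variant, so treat the exact value as illustrative; either way it falls far below the critical value near 2.0, so the difference is not significant):

```python
import math

# Independent-samples t for the Group 1 and Group 2 salary data (in $K),
# using the pooled-variance formula.
salaries = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
f1 = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]   # Group 1 frequencies (Table 8)
f2 = [0, 2, 3, 3, 4, 6, 6, 5, 3, 2, 2]   # Group 2 frequencies (Table 9)

def mean_and_ss(freqs):
    """Return n, mean, and sum of squared deviations for a frequency table."""
    n = sum(freqs)
    m = sum(f * x for f, x in zip(freqs, salaries)) / n
    ss = sum(f * (x - m) ** 2 for f, x in zip(freqs, salaries))
    return n, m, ss

n1, m1, ss1 = mean_and_ss(f1)
n2, m2, ss2 = mean_and_ss(f2)
df = n1 + n2 - 2                          # degrees of freedom = 70
pooled_var = (ss1 + ss2) / df
t = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(df, round(t, 2))  # 70 0.78
```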
Lesson 4
Statistical Significance and
Table 10
t-Test Values Required to Reject the Null Hypothesis at the .05 and .01 Levels of Confidence
(Two-Tailed Test)
__________________________________________________________________________
Degrees of Freedom (df) .05 .01
__________________________________________________________________________
20 2.09 2.85
21 2.08 2.83
22 2.07 2.82
23 2.07 2.81
24 2.06 2.80
25 2.06 2.79
26 2.06 2.78
27 2.05 2.77
28 2.05 2.76
29 2.05 2.76
30 2.04 2.75
35 2.03 2.72
40 2.02 2.71
45 2.01 2.70
50 2.01 2.68
55 2.00 2.67
60 2.00 2.66
65 2.00 2.66
70 2.00 2.65
75 1.99 2.64
80 1.99 2.64
85 1.99 2.64
90 1.99 2.63
95 1.99 2.63
100 1.98 2.63
Infinity 1.96 2.58
________________________________________________________________________
In order to use this table, we enter it with our t value (.222) and something called "degrees of
freedom." The degrees of freedom is simply (n₁ - 1) + (n₂ - 1) or, in our case, 70. Note that there are two
columns of t values, one labeled .05, and the other labeled .01. If we go down to the degrees of
freedom nearest to ours, which would be 70, we find that both the .05 and the .01 t values are
substantially larger than our .222. So we didn't achieve a large enough t value to reject the null
hypothesis, i.e., to be able to conclude that the difference wasn't due to chance.
Why do we have two columns, one labeled .05 and the other .01? Because those are the two
levels of significance commonly used in statistical analysis. The t values in the .05 column are
likely to occur by chance 5 percent of the time, whereas the t values in the .01 column are likely
to occur by chance only 1 percent of the time.
Type I and Type II Errors
The choice of what significance level to use (.05, .01, or lower or higher) is the difficult choice that
you as the researcher must make. If you decide to accept the .05 level of confidence, which
requires a smaller t value, you can more easily reject the null hypothesis and declare that there is
a statistically significant difference between the means than if you select the .01 level, but you
will be wrong 5 percent of the time. This is the Type I error.
On the other hand, if you select the .01 value, you will be wrong only 1 percent of the time. But
since the .01 value requires a larger t value, you will less often be able to reject the null
hypothesis and say that there is a statistically significant difference between the means when in
fact that is the case. This is the Type II error. It is crucially important to an understanding of even
basic statistics that we have a clear understanding of these two errors. If you spend a little time
with Table 10, it will help you achieve this understanding.
Table 11
Accepting and Rejecting Null Hypotheses and the Making of Type I and Type II Errors*
Decision
E = (X̄₁ - X̄₂) / σ
Where:
E is the effect size.
X̄₁ and X̄₂ are the two means, and σ is the standard deviation on which the comparison is based.
The effect size does not reach the .33 level, so the 3-point difference between the means would
not be regarded as practically consequential, even though it's statistically significant.
But suppose the two means are 193 and 182, and the Ns and standard deviations are the same.
Then we have:
The 11-point difference between the means (with the associated score variability as reflected in the
the standard deviations) exceeds the .33 threshold for practical significance. So in this case, we
would be justified in saying that the difference between the two groups is not only statistically
significant, it can also be regarded as having some practical educational meaning.
However, in the final analysis, you, as the experienced educator and administrator, must make
the judgment about practical meaning. Many times you will be presented with mean differences
that are large by any practical standard, but because of small Ns or large variances, they're not
statistically significant. In those cases, the judgment is fairly easy: You would be on very soft
ground making policy and budgetary decisions based on differences that are not statistically
significant.
But the other case is more difficult. If you have a mean difference that is both statistically
significant and practically significant as indicated by the effect size, you still have to be the judge
of whether that difference justifies changing programs, spending more money, hiring or firing
staff, and so on.
The new knowledge you now have about how to determine statistical and practical significance
adds greatly to your ability to make decisions about the effectiveness of educational programs
and the formulation of educational policies, but there are no automatic answers. You, as the
responsible administrator, must bring your experience to bear in making the final decision.
Lesson 6
Correlation
What is a Correlation?
Thus far we've covered the key descriptive statistics (the mean, median, mode, and standard
deviation) and we've learned how to test the difference between means. But often we want to
know how two things (usually called "variables" because they vary from high to low)
are related to each other.
For example, we might want to know whether reading scores are related to math scores, i.e.,
whether students who have high reading scores also have high math scores, and vice versa. The
statistical technique for determining the degree to which two variables are related (i.e., the degree
to which they co-vary) is, not surprisingly, called correlation.
There are several different types of correlation, and we'll talk about them later, but in this lesson
we're going to spend most of the time on the most commonly used type of correlation: the
Pearson Product Moment Correlation. This correlation, signified by the symbol r, ranges from
-1.00 to +1.00. A correlation of 1.00, whether it's positive or negative, is a perfect correlation. It
means that as scores on one of the two variables increase or decrease, the scores on the other
variable increase or decrease by the same magnitude (something you'll probably never see in
the real world). A correlation of 0 means there's no relationship between the two variables, i.e.,
when scores on one of the variables go up, scores on the other variable may go up, down, or
whatever. You'll see a lot of those.
Thus, a correlation of .8 or .9 is regarded as a high correlation, i.e., there is a very close
relationship between scores on one of the variables and the scores on the other. And
correlations of .2 or .3 are regarded as low correlations, i.e., there is some relationship between
the two variables, but it's a weak one. Knowing people's scores on one variable wouldn't allow you
to predict their scores on the other variable very well.
Computing the Pearson Product Moment Correlation
Let's do a correlation to see how the formula works and what it produces. The formula for the
Pearson product moment correlation is:
r = (nΣXY - (ΣX)(ΣY)) / √[(nΣX² - (ΣX)²)(nΣY² - (ΣY)²)]
Where:
rxy is the correlation coefficient between X and Y.
n is the size of the sample.
X is the individual's score on the X variable.
Y is the individual's score on the Y variable.
XY is the product of each X score times its corresponding Y score.
X² is the individual X score squared.
Y² is the individual Y score squared.
Let's see what the correlation is between 30 students' reading scores and their math scores. The
data we need to compute the formula are given in Table 12.
Table 12
Reading and Math Scores and the Associated Data for Computing the Pearson Product Moment
Correlation (N=30)
X (Reading   Y (Math      X²       Y²       XY
Scores)      Scores)
191          180          36481    32400    34380
103          101          10609    10201    10403
187          173          34969    29929    32351
108          103          11664    10609    11124
180          170          32400    28900    30600
118          113          13924    12769    13334
178          171          31684    29241    30438
127          122          16129    14884    15494
176          168          30976    28224    29568
134          130          17956    16900    17420
165          150          27225    22500    24750
147          145          21609    21025    21315
160          150          25600    22500    24000
157          154          24649    23716    24178
155          145          24025    21025    22475
168          164          28224    26896    27552
150          145          22500    21025    21750
172          170          29584    28900    29240
145          130          21025    16900    18850
185          179          34225    32041    33115
140          141          19600    19881    19740
195          193          38025    37249    37635
135          136          18225    18496    18360
100          101          10000    10201    10100
130          128          16900    16384    16640
125          121          15625    14641    15125
105          106          11025    11236    11130
120          118          14400    13924    14160
115          112          13225    12544    12880
110          108          12100    11664    11880
Total (Σ)
4381         4227         664583   616805   639987
So, we plug the numbers from this table into the formula, and do the math:
r = (30 × 639,987 - 4,381 × 4,227) / √[(30 × 664,583 - 4,381²)(30 × 616,805 - 4,227²)]
or
r = 681,123 / √(744,329 × 636,621)
or
r = 681,123 / 688,372
or
r ≈ .98
In this case, the correlation between reading and math scores is remarkably high (because I
concocted the numbers so it would turn out that way). With real scores, it would be high, but not
that high. If you glance over the numbers in Table 12, even before we've computed the correlation
you can easily see (in this small sample of 30) that high scores in reading tend to go with high
scores in math, low reading scores tend to go with low math scores, and so on. But, of course,
you wouldn't be able to see that pattern if you had a sample of 500.
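The arithmetic can be checked with a short sketch using only the summary totals from Table 12 (the variable names are mine; rounding at intermediate steps of a hand calculation can shift the last digit slightly):

```python
import math

# Pearson r from Table 12's summary totals (n = 30).
n = 30
sum_x, sum_y = 4381, 4227
sum_x2, sum_y2 = 664583, 616805
sum_xy = 639987

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))  # 0.99
```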
Positive and Negative Correlations
I pointed out above that a correlation can vary from +1.00 to -1.00. The correlation we just
computed is a positive correlation. That is, high reading scores go with high math scores, low
with low, and so on. However, we could have a negative correlation. This is not something bad; it
simply denotes an association in which high scores on one variable go with low scores on the
other. For example, if we were computing a correlation between, say, the amount of time students
watch television and their achievement scores, we would find a negative correlation: high TV
watching is associated with lower achievement scores, and vice versa. Such a correlation might
be something like -.71.
Determining Statistical Significance
OK, so we have a correlation coefficient. What precisely does it mean, and how do we interpret
it? It's not a percent, as many people mistakenly think.
First, we can determine its statistical significance in the same way we did with the t-test. We can
look it up in a table in the appendices of any statistical text. In the case of our .98 correlation
between reading and math scores, if we look that up in the table for correlations, we find that the
value needed to reject the null hypothesis at the .01 level of confidence (and declare that the
correlation is statistically significant, or unlikely due to chance) for our sample of 30 is .45 (in this
case using the one-tailed test because the samples are dependent).
So if we were stating this finding in a research report, we could say that the correlation of reading
scores with math scores = .98, p < .01, with df = 28. (Now see how smart you are, because you know
what all that means.)
Practical vs. Statistical Significance
But we have the same issue we had with the t-test: determining its practical vs. its statistical
significance. We don't have an effect-size measure, as we did with the t-test, but we have something
similar. It has an imposing name (the coefficient of determination), but you'll be ecstatically
happy to learn that it's very simple.
The coefficient of determination is nothing more than r². You simply multiply r by itself, and
you've got it. OK, you've got it; what does it mean? The coefficient of determination, r², tells us
how much of the variance in one of the variables is accounted for by the variance in the other
variable. Thus, if we have a correlation of .60 between, say, students' achievement scores and a
measure of their socioeconomic status, r² = .36. That means that 36% of the variance in the
students' achievement scores (not 60, which is the correlation) can be accounted for by variance
in their socioeconomic status. But that also means that the remaining variance (64%) in
achievement scores cannot be accounted for by socioeconomic status, but is attributable to
many other factors, such as study time, intelligence, motivation, quality of instruction, and so on.
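The arithmetic above can be sketched in two lines:

```python
# Coefficient of determination for a correlation of .60.
r = 0.60
r_squared = r * r
print(round(r_squared, 2))      # 0.36 -> 36% of the variance accounted for
print(round(1 - r_squared, 2))  # 0.64 -> the rest is due to other factors
```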
Other Correlations
All the correlations we've talked about so far have been based on what we call interval data, i.e.,
data where the distance between scores or values is the same. The distance between a 65 and a 66
is assumed to be the same as the distance between a 14 and a 15. But many times we want to
determine the relationship between two variables when that is not the case. Suppose, for
example, we want to compute the correlation between students' class rank in their junior year
and their class rank in their senior year. Ranks are not the same as scores; there may be a much
smaller (or bigger) difference between ranks 1 and 2 than between ranks 8 and 10 (like the
difference between the first two teams and the last two teams in football or baseball). If the data
we have are ranks rather than scores, we can't use the product moment formula. But there is
another correlation formula for use with ranks (it's called rho).
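The text doesn't work through rho, but the usual formula is simple enough to sketch; here it is applied to two made-up sets of ranks for eight students (the ranks and variable names are mine):

```python
# Spearman rank correlation: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
# where d is the difference between each pair of ranks.
junior_rank = [1, 2, 3, 4, 5, 6, 7, 8]
senior_rank = [2, 1, 3, 5, 4, 7, 6, 8]

n = len(junior_rank)
d_squared = sum((a - b) ** 2 for a, b in zip(junior_rank, senior_rank))
rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))
print(round(rho, 2))  # 0.93
```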
And suppose we want to determine the relationship between two variables when one is based on
what is called nominal or categorical data, and the other is interval data. An example would be
correlating gender with achievement scores. Again, the product moment correlation cant be
used, but there is also a special formula for doing a correlation with these disparate types of
data. In this case, its called the point biserial correlation.
Table 13 displays the several different types of correlation for use with variables based on
different levels of measurement. In this course, we're not going to compute them. But with the
knowledge and skills you've developed thus far, when you encounter situations where the
variables you want to correlate are based on different levels of measurement (interval, ordinal, or
nominal), you'll be able to select the type you need.
Table 13
Alternative Types of Correlation for Different Levels of Measurement*
Type of Measurement and Examples
Variable X          Variable Y          Correlation Being       Type of
                                        Computed                Correlation
Interval (reading   Interval (math      Correlation between     Pearson product
scores)             scores)             reading and math        moment (r)
                                        achievement
Ordinal (class      Ordinal (rank in    Correlation between     Spearman rank
rank in the         high school         class rank in the       coefficient
junior year)        graduating class)   last two years of       (rho, ρ)
                                        high school
Nominal (social     Ordinal (rank in    Correlation between     Rank biserial
class: high,        high school)        social class and        coefficient (rbs)
middle, or low)                         rank in high school
Nominal (family     Interval (grade     Correlation between     Point biserial
configuration,      point average)      family configuration    (rpb)
e.g., intact or                         and grade point
single parent)                          average
Nominal (voting     Nominal (gender,    Correlation between     Phi coefficient
preference:         i.e., male or       voting preference       (φ)
Republican or       female)             and gender
Democrat)
*This table was adapted from a similar one found in Neil Salkinds Statistics for People Who
(Think They) Hate Statistics, Sage Publications, 2000, p. 101.
Correlation and Cause
Before we conclude this lesson, we need to understand one of the most important facts about
correlation, namely, that it does not necessarily indicate cause. It may be that one of the variables
does in fact cause the other, but we don't know that just from the fact that the two are correlated.
Smoking and Lung Cancer
It is now an established fact that smoking causes lung cancer, but that conclusion could not be
reached simply because there is a correlation between the two. When the association between
smoking and lung cancer first appeared, and many argued that it indicated that smoking caused
lung cancer, the tobacco companies argued that there were other factors that could explain the
relationship, e.g., smoking is higher among blue collar workers who also have greater exposure
to other toxic elements, smokers drink more and lead more stressful lives, and so on. And
logically they were right. It took other kinds of direct physiological evidence and animal
experiments to prove that the association was indeed causal.
We often find strong correlations where clearly a causal relationship makes no sense. For
example, we may find a strong correlation between car sales and college attendance. Neither one
of these is causing the other; both increase during financially prosperous times.
Wine Consumption and Heart Disease
But it is when two correlated variables seem likely to be causally related to one another that we
tend to jump to the unsupportable conclusion that one causes the other. For example, when we
hear about a correlation between an increase in stork nests and the birth rate in Germany, we
laugh it off as clearly due to some unknown third factor. But when we hear that moderate wine
consumption is associated with lower rates of heart disease, we're ready to immediately
conclude (especially if we're wine lovers) that there is obviously some medically beneficial
element in wine. But when these reports first came out, skeptics (they were probably
statisticians) pointed out that other things could account for the association between moderate
wine consumption and lower rates of heart disease. Moderate wine drinkers are likely to be more
educated, non-smokers, get more exercise, and have lower rates of obesity. Again, as it has
turned out, other kinds of physiological evidence do support the conclusion that moderate wine
consumption is medically beneficial, but we can't conclude that just on the basis of the
correlation.
The Important Lesson About Correlation and Cause
The important lesson here is that the correlation coefficient is a highly useful statistic for
determining the relationship between variables, but a correlation does not demonstrate a causal
relationship between the variables.
The same holds for differences between means. If, for example, we give a pre-test and a post-test
to students who have participated in a new reading program, and we find that the increase in the
mean reading score is both statistically and practically significant, that does not entitle us to
conclude that the new program caused the increase. Any number of other factors could account
for the increase: the students were older, and they had been exposed to many other influences
and experiences that could have, and probably did, improve their reading. To determine how
much, if any, of the improvement was caused by the new program, we would have to employ a
control group (or some other method for determining "the expectation of non-treatment"). This
would tell us how much improvement occurred in comparable students who had the same
experiences except for the new reading program. For additional information on these and other
designs that address this question, see the Ed Leaders Evaluation Web Site
at http://edl.nova.edu/secure/EVASupport/index.html.
Lesson 7
Chi Square
Parametric and Non-Parametric Statistics
Most of the statistics we've learned so far (the mean, the standard deviation, the t-test, and the
product moment correlation) belong to a category called parametric statistics. That's because it
is assumed the data used to compute them have certain parameters or meet certain conditions.
One of these is that the variances are similar; another is that the sample is large enough to be
representative of the universe from which it is drawn. We used examples of 30 or more cases
when we worked on the mean, the t-test, and the product moment correlation because there is a
general consensus among statisticians that this is the minimum-size sample to use with
parametric tests. You should keep this in mind when using these tests in your practicum and in
your own research.
But what do we do when we can't meet these conditions? Happily, there's another category of
statistics, and you shouldn't be surprised to learn that it's called non-parametric statistics. We
can do many of the same things with non-parametric statistics. They're regarded as somewhat
less powerful than parametric statistics, but they're not to be looked down on. When conditions
call for them, they are the things to use.
Chi Square
One of the most useful of the non-parametric statistics is chi square. We use it when our data
consist of people distributed across categories, and we want to know whether that distribution is
different from what we would expect by chance (or another set of expectations). We don't have
scores, and we don't have means. We just have numbers, or frequencies. In other words, we have
nominal data.
For example, suppose we have the data in Table 14 that display the number of students who elect
different majors, and we want to know whether those numbers differ from chance. In other words,
are some majors selected more often than others, or is the selection pattern essentially random?
Table 14
Number of Students Selecting Different Majors

Pre-Med | Computer Sciences | English Literature | Education | Engineering | Total
50 | 85 | 25 | 60 | 80 | 300
The null hypothesis here, of course, is that there is no difference between this distribution of
major selections and what would be expected by chance. So what chi square does is compare
these numbers (the observed frequencies) with those that would be expected by chance (the
expected frequencies).
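Under the chance (null) hypothesis, each of the five majors would be expected to draw an equal share of the 300 students. A quick sketch of that calculation in Python (the variable names are mine):

```python
# Observed frequencies from Table 14.
observed = {"Pre-Med": 50, "Computer Sciences": 85, "English Literature": 25,
            "Education": 60, "Engineering": 80}

total = sum(observed.values())    # 300 students in all
expected = total / len(observed)  # 300 / 5 = 60 expected per major
print(expected)  # 60.0
```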
The formula for chi square is:

χ² = Σ [(O − E)² / E]

Where:
O is the observed frequency in each category.
E is the expected frequency in each category.

Since the 300 students are spread across five majors, the expected frequency for each major
under the null hypothesis is 300/5 = 60. The calculations are shown below:

Major | O (observed frequency) | E (expected frequency) | O − E | (O − E)² | (O − E)²/E
Pre-Med | 50 | 60 | -10 | 100 | 1.67
Computer Sciences | 85 | 60 | 25 | 625 | 10.42
English Literature | 25 | 60 | -35 | 1225 | 20.42
Education | 60 | 60 | 0 | 0 | 0.00
Engineering | 80 | 60 | 20 | 400 | 6.67
Total | 300 | 300 | | | 39.17
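The arithmetic in the table above can be checked with a few lines of Python (pure standard library; the variable names are mine):

```python
# Observed frequencies for the five majors, and the expected
# frequency of 60 for each under the null hypothesis.
observed = [50, 85, 25, 60, 80]
expected = [60] * 5

# Chi square: sum (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 39.17
```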
By now, you know the next step: determining if we can reject the null hypothesis. We do it the
same way we did for the t-test and the correlation. We enter the chi square significance table
(which I have handy, but you don't) with our chi square value (39.17) and the appropriate degrees
of freedom. For chi square, the degrees of freedom are equal to the number of rows minus one
(R - 1). In our case we have five rows, so df = 4.
Entering the chi square table with our result of 39.17 and df = 4, we find that we need a chi square
value of 13.28 to reject the null hypothesis at the .01 level of confidence. We clearly have that, so
we can say that the distribution of major selections is not simply a chance pattern; or
χ² = 39.17,
p < .01, df = 4.
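If you don't have a significance table handy, the tail probability of the chi square distribution has a simple closed form when the degrees of freedom are even. A sketch for df = 4 (the function name is mine, and this particular formula holds only for df = 4):

```python
import math

def chi2_sf_df4(x):
    """P(X > x) for chi square with df = 4.
    For even df the tail probability is
    exp(-x/2) * sum of (x/2)^i / i! for i from 0 to df/2 - 1,
    which for df = 4 reduces to exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

# The tabled .01 critical value for df = 4 is about 13.28:
print(round(chi2_sf_df4(13.28), 3))  # ~0.01
# Our chi square of 39.17 lies far beyond it:
print(chi2_sf_df4(39.17) < 0.01)  # True
```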
Lesson 8
Summarizing the Steps and Moving On
In the statistical tests we've calculated (the t-test, correlation, and χ²), we've gone through a
series of steps that you'll go through when you compute any statistical test.
Recapping, here they are:
1. First, determine the level of measurement you have. Are the data you have interval,
ordinal, or nominal?
2. If you have interval data, determine whether they meet the requirements of a parametric
test (adequate sample size and variance similarity).
3. Based on the determinations you made from (1) and (2), select the statistical test (t, r, χ²,
or whatever).
4. Calculate the values required, plug them into the formula, and compute the test. (Now that
you have gone through these calculations and understand them, the labor can be done for
you by any one of the available statistical software packages.)
5. Select the level of risk you want to take in rejecting the null hypothesis and making (or
avoiding) the Type I and Type II errors. Usually that will be .05 or .01.
6. Enter the appropriate significance table (e.g., for t, r, or χ²) with the test result and the
proper degrees of freedom.
7. Determine whether your test result is large enough to reject the null hypothesis and
enable you to conclude that it is statistically significant.
8. If it is statistically significant, use whatever additional tests may be available (e.g., the
effect size, the coefficient of determination, etc.) and your own reasoned judgment to
determine if the result is also practically significant.
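The first three steps above can be sketched as a small decision function, say in Python. This is an illustrative sketch of this lesson's recap only, not an exhaustive guide, and the function name and thresholds (the 30-case minimum from the lesson) are mine:

```python
def select_test(level, n, variances_similar=True):
    """Illustrative sketch of steps 1-3: pick a statistical test
    from the level of measurement and the parametric conditions
    (adequate sample size and similar variances)."""
    if level == "nominal":
        # Frequencies in categories call for chi square.
        return "chi square"
    if level == "interval" and n >= 30 and variances_similar:
        # Interval data meeting the parametric conditions.
        return "parametric test (t or r)"
    # Otherwise fall back to a non-parametric alternative.
    return "non-parametric test"

print(select_test("interval", 36))   # parametric test (t or r)
print(select_test("nominal", 300))   # chi square
print(select_test("interval", 12))   # non-parametric test
```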
Congratulate yourself. The fact that you understand these steps and can execute them shows
how far you've come. You now have a good grip on basic statistics. You can understand them in
research journals, and you can use them in your practicum and in your own research. And you
are now in a position to go on to more advanced statistics (I know you can't wait).
References
I have not provided a set of references because there are literally dozens of introductory
statistics texts, and just about any of them will do. You definitely should have one of these texts
for reference purposes, especially for the significance tables they all provide. My favorite, and the
one I highly recommend, is Neil Salkind's Statistics for People Who (Think They) Hate Statistics.
Sage Publications, 2000.
Statistical Software
This short course has taken you through both the explanation of the major statistical concepts
and the actual computation of the most common statistical tests you will be encountering in the
research literature and using in your own research.
Now that you have this essential, basic understanding, you won't need to do any computations
by hand. There are software applications that will do that for you. Once you enter the data, they
will compute a correlation in less than a second, and provide you with the significance levels.
There are a number of such programs. You can, in fact, do a number of statistical tests with
Microsoft Excel, which is mainly a spreadsheet program. And many of you probably have this
application on your computers, either as a stand-alone program or as part of Microsoft Office.
But one of the most highly regarded and user-friendly statistical programs is GB-STAT, so if you
don't already have such a program, this would be a good one to get.
Good luck in all your research endeavors.
John Evans
evansj@nsu.nova.edu