Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Taylor & Francis, Ltd. and the American Statistical Association are collaborating with JSTOR to digitize, preserve and extend access to The American Statistician.

http://www.jstor.org
This content downloaded from 137.159.75.151 on Mon, 27 Apr 2015 18:05:30 UTC
All use subject to JSTOR Terms and Conditions
test is sup{Pr(|X̄_A − X̄_B| > a): .75 ≤ p1 = p2 ≤ 1}. The power function K(p1, p2, n1, n2) is Pr(|X̄_A − X̄_B| > a). To construct the tables, we evaluate the power function (for given n1 and n2 and a = .1) at the points (p1, p2):

Pr(|X̄_A − X̄_B| > .1)
  = Pr(X̄_A − X̄_B > .1) + Pr(X̄_A − X̄_B < −.1)
  = Pr((X̄_A − X̄_B)/[pq(1/n1 + 1/n2)]^(1/2) ≥ 1) + Pr((X̄_A − X̄_B)/[pq(1/n1 + 1/n2)]^(1/2) ≤ −1)
  = Pr(Z ≥ 1) + Pr(Z ≤ −1),
In 1885, Sir Francis Galton first defined the term "regression" and completed the theory of bivariate correlation. A decade later, Karl Pearson developed the index that we still use to measure correlation, Pearson's r. Our article is written in recognition of the 100th anniversary of Galton's first discussion of regression and correlation. We begin with a brief history. Then we present 13 different formulas, each of which represents a different computational and conceptual definition of r. Each formula suggests a different way of thinking about this index, from algebraic, geometric, and trigonometric settings. We show that Pearson's r (or simple functions of r) may variously be thought of as a special type of mean, a special type of variance, the ratio of two means, the ratio of two variances, the slope of a line, the cosine of an angle, and the tangent to an ellipse, and may be looked at from several other interesting perspectives.
INTRODUCTION
We are currently in the midst of a "centennial decade" for correlation and regression. The empirical and theoretical developments that defined regression and correlation as statistical topics were presented by Sir Francis Galton in 1885. Then, in 1895, Karl Pearson published Pearson's r. Our article focuses on Pearson's correlation coefficient, presenting both the background and a number of conceptualizations of r that will be useful to teachers of statistics.

We begin with a brief history of the development of correlation and regression. Following that, we present a longer review of ways to interpret the correlation coefficient. This presentation demonstrates that the correlation has developed into a broad and conceptually diverse index; at the same time, for a 100-year-old index it is remarkably unaffected by the passage of time.
The basic idea of correlation was anticipated substantially before 1885 (MacKenzie 1981). Pearson (1920) credited Gauss with developing the normal surface of n correlated variates in 1823. Gauss did not, however, have any particular interest in the correlation as a conceptually distinct notion; instead he interpreted it as one of the several parameters in his distributional equations. In a previous his-

The American Statistician, February 1988, Vol. 42, No. 1
[Figure 1. Diagram based on Table I (from Galton 1885): adult children's heights and their deviations from 68¼ inches, plotted against mid-parent heights; all female heights multiplied by 1.08.]
[Table: landmarks in the history of correlation and regression, listing persons and dates (1823, 1843, 1846, 1868, 1877, 1885, 1888, 1895, 1920, 1985); the Person column includes Karl Pearson.]
1. CORRELATION AS A FUNCTION OF RAW SCORES AND MEANS

r = Σ(X_i − X̄)(Y_i − Ȳ) / [Σ(X_i − X̄)² Σ(Y_i − Ȳ)²]^(1/2).  (1.1)
This, or some simple algebraic variant, is the usual formula found in introductory statistics textbooks. In the numerator, the raw scores are centered by subtracting out the mean of each variable, and the sum of cross-products of the centered variables is accumulated. The denominator adjusts the scales of the variables to have equal units. Thus Equation (1.1) describes r as the centered and standardized sum of cross-products of two variables. Using the Cauchy-Schwarz inequality, it can be shown that the absolute value of the numerator is less than or equal to the denominator (e.g., Lord and Novick 1968, p. 87); therefore, the limits of ±1 are established for r. Several simple algebraic transformations of this formula can be used for computational purposes.
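As a numerical illustration, Equation (1.1) can be computed directly from raw scores; this is a minimal pure-Python sketch, with the data invented for the example.

```python
# Pearson's r from raw scores via Equation (1.1): the centered,
# standardized sum of cross-products. Data invented for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                sum((yi - my) ** 2 for yi in y))
r = num / den  # |num| <= den by Cauchy-Schwarz, so -1 <= r <= 1
```

The same data are reused in the later sketches, so each of the 13 formulas can be seen to return this same value of r.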
2. CORRELATION AS STANDARDIZED COVARIANCE

r = s_XY / (s_X s_Y),  (2.1)

where s_XY is the sample covariance of X and Y, and s_X and s_Y are the two sample standard deviations.
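A quick check of Equation (2.1), computed here with divide-by-N moments (the divisor cancels between numerator and denominator, so divide-by-(N − 1) moments give the same r); data invented for illustration.

```python
# Equation (2.1): r as the covariance rescaled by the two standard
# deviations. Population (divide-by-N) moments; the N's cancel.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n   # covariance
s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)          # sd of X
s_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)          # sd of Y
r = s_xy / (s_x * s_y)
```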
3. CORRELATION AS STANDARDIZED SLOPE OF THE REGRESSION LINE

r = b_Y·X (s_X / s_Y) = b_X·Y (s_Y / s_X),  (3.1)
where b_Y·X and b_X·Y are the slopes of the regression lines for predicting Y from X and X from Y, respectively. Here, the correlation is expressed as a function of the slope of either regression line and the standard deviations of the two variables. The ratio of standard deviations has the effect of rescaling the units of the regression slope into units of the correlation. Thus the correlation is a standardized slope.

A similar interpretation involves the correlation as the slope of the standardized regression line. When we standardize the two raw variables, the standard deviations become unity and the slope of the regression line becomes the correlation. In this case, the intercept is 0, and the regression line is easily expressed as
ẑ_Y = r z_X.  (3.2)
From this interpretation, it is clear that the correlation rescales the units of the standardized X variable to predict units of the standardized Y variable. Note that the slope of the regression of z_Y on z_X restricts the regression line to fall between the two diagonals portrayed in the shaded region of Figure 2. Positive correlations imply that the line will pass through the first and third quadrants; negative correlations imply that it will pass through the second and fourth quadrants. The regression of z_X on z_Y has the same angle with the Y axis that the regression of z_Y on z_X has with the X axis, and it will fall in the unshaded region of Figure 2, as indicated.
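The standardized-slope interpretation is easy to verify numerically: after z-scoring both variables, the least-squares slope equals r and the intercept vanishes. A sketch with invented data:

```python
# Section 3: after standardizing both variables, the least-squares slope
# of z_Y on z_X equals r, and the intercept is 0. Data invented.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

zx = [(a - mx) / sx for a in x]
zy = [(b - my) / sy for b in y]

# Least-squares slope of z_Y on z_X: sum(zx*zy) / sum(zx^2)
slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
intercept = (sum(zy) - slope * sum(zx)) / n   # 0, since both means are 0
```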
4. CORRELATION AS THE GEOMETRIC MEAN OF THE TWO REGRESSION SLOPES

r = ±(b_Y·X b_X·Y)^(1/2).  (4.1)
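Equation (4.1) follows because b_Y·X b_X·Y = r², so r is, up to sign, the geometric mean of the two regression slopes. A sketch with invented data, taking the sign from the covariance:

```python
# Equation (4.1): r^2 = b_yx * b_xy, so r is (up to sign) the geometric
# mean of the two regression slopes. Data invented for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

b_yx = sxy / sxx          # slope for predicting Y from X
b_xy = sxy / syy          # slope for predicting X from Y
r = math.copysign(math.sqrt(b_yx * b_xy), sxy)
```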
5. CORRELATION AS THE SQUARE ROOT OF THE RATIO OF TWO VARIANCES

r = (s²_Ŷ / s²_Y)^(1/2) = [Σ(Ŷ_i − Ȳ)² / Σ(Y_i − Ȳ)²]^(1/2) = s_Ŷ / s_Y.  (5.1)
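Equation (5.1) says that |r| is the standard deviation of the fitted values over the standard deviation of the observed Y, i.e., the square root of the proportion of variability accounted for. A sketch with invented data:

```python
# Equation (5.1): r = s_Yhat / s_Y, the sd of the fitted values over the
# sd of the observed Y (unsigned). Data invented for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)

b = sxy / sxx
a0 = my - b * mx
yhat = [a0 + b * xi for xi in x]              # fitted values; mean = my

ss_reg = sum((yh - my) ** 2 for yh in yhat)   # variability accounted for
ss_tot = sum((yi - my) ** 2 for yi in y)
r = math.sqrt(ss_reg / ss_tot)
```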
6. CORRELATION AS THE MEAN CROSS-PRODUCT OF STANDARDIZED VARIABLES

r = Σ z_X z_Y / N.  (6.1)
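Equation (6.1) holds when the z-scores are formed with the divide-by-N standard deviation, so that dividing the cross-product sum by N gives r exactly. A sketch with invented data:

```python
# Equation (6.1): r = sum(z_x * z_y) / N, with z-scores based on the
# divide-by-N standard deviation. Data invented for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)   # population sd
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

r = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / n
```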
7. CORRELATION AS A FUNCTION OF THE ANGLE BETWEEN THE TWO STANDARDIZED REGRESSION LINES

r = sec(β) ± tan(β).  (7.1)
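Equation (7.1) can be checked numerically. For 0 < r < 1, the two standardized regression lines have slopes r and 1/r, so the angle β between them satisfies tan(β) = (1/r − r)/(1 + r·(1/r)) = (1 − r²)/(2r), and then sec(β) − tan(β) recovers r (the ± in (7.1) covers sign and orientation conventions). A sketch for an illustrative value of r:

```python
# Equation (7.1), numerically: for 0 < r < 1, the angle beta between the
# two standardized regression lines (slopes r and 1/r) has
# tan(beta) = (1 - r*r) / (2*r), and sec(beta) - tan(beta) = r.
import math

r = 0.6                               # illustrative correlation
tan_b = (1.0 / r - r) / 2.0           # angle between slopes r and 1/r
beta = math.atan(tan_b)
recovered = 1.0 / math.cos(beta) - math.tan(beta)   # sec - tan
```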
8. CORRELATION AS A FUNCTION OF THE ANGLE BETWEEN THE TWO VARIABLE VECTORS

r = cos(α).  (8.1)

When the angle is 0, the vectors fall on the same line and cos(α) = ±1. When the angle is 90°, the vectors are perpendicular and cos(α) = 0. [Rodgers, Nicewander, and Toothaker (1984) showed the relationship between orthogonal and uncorrelated variable vectors in person space.]
[Figure 2. The Geometry of Bivariate Correlation for Standardized Variables.]
9. CORRELATION AS A RESCALED VARIANCE OF THE DIFFERENCE BETWEEN STANDARDIZED SCORES

r = 1 − s²(z_Y − z_X)/2,  (9.1)

or, equivalently,

r = s²(z_Y + z_X)/2 − 1.  (9.2)
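Equations (9.1) and (9.2) rest on the identities var(z_Y − z_X) = 2(1 − r) and var(z_Y + z_X) = 2(1 + r), which hold when the z-scores and variances use the divide-by-N convention. A sketch with invented data:

```python
# Equations (9.1)-(9.2): with divide-by-N z-scores,
# var(z_y - z_x) = 2(1 - r) and var(z_y + z_x) = 2(1 + r).
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
zx = [(a - mx) / sx for a in x]
zy = [(b - my) / sy for b in y]

def pop_var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

r_from_diff = 1.0 - pop_var([b - a for a, b in zip(zx, zy)]) / 2.0   # (9.1)
r_from_sum = pop_var([b + a for a, b in zip(zx, zy)]) / 2.0 - 1.0    # (9.2)
```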
10. CORRELATION ESTIMATED FROM THE BALLOON RULE

This interpretation is due to Chatillon (1984a). He suggested drawing a "birthday balloon" around the scatterplot of a bivariate relationship. The balloon is actually a rough ellipse, from which two measures, h and H, are obtained, giving

r = [1 − (h/H)²]^(1/2).  (10.1)

[Figure 3. A bivariate ellipse for standardized data, showing the tangent at z_X = 0 and the major and minor axes.]
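Equation (10.1) can be checked against the geometry of the standardized bivariate-normal contour z_x² − 2r z_x z_y + z_y² = c: the vertical slice at z_x = 0 has height h = 2√c, the full vertical extent is H = 2√(c/(1 − r²)), so (h/H)² = 1 − r². A sketch with illustrative values of r and c:

```python
# Equation (10.1) against the ellipse z_x^2 - 2 r z_x z_y + z_y^2 = c:
# h = 2*sqrt(c) (slice at z_x = 0), H = 2*sqrt(c / (1 - r^2)) (full
# vertical extent), hence r = sqrt(1 - (h/H)^2). Illustrative r and c.
import math

r, c = 0.75, 1.0
h = 2.0 * math.sqrt(c)                      # height of slice at z_x = 0
H = 2.0 * math.sqrt(c / (1.0 - r * r))      # maximum vertical extent
r_balloon = math.sqrt(1.0 - (h / H) ** 2)
```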
11. CORRELATION IN RELATION TO THE BIVARIATE ELLIPSES OF ISOCONCENTRATION

r = (D² − d²)/(D² + d²),  (11.1)

where D and d are the lengths of the major and minor axes of the ellipse. These axes are also portrayed in Figure 3, and the interpretation is invariant to the choice of ellipse, as before.
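For a standardized bivariate normal, the ellipse of isoconcentration has its major and minor axes along the two diagonals, with lengths proportional to √(1 + r) and √(1 − r); substituting these into (11.1) returns r exactly. A sketch with an illustrative r and an arbitrary scale constant c:

```python
# Equation (11.1) for a standardized bivariate-normal ellipse: axis
# lengths D, d are proportional to sqrt(1+r), sqrt(1-r), so
# (D^2 - d^2) / (D^2 + d^2) = ((1+r) - (1-r)) / ((1+r) + (1-r)) = r.
import math

r, c = 0.75, 1.0
D = 2.0 * math.sqrt(c * (1.0 + r))   # major axis, along z_y =  z_x
d = 2.0 * math.sqrt(c * (1.0 - r))   # minor axis, along z_y = -z_x
r_axes = (D * D - d * d) / (D * D + d * d)
```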
12. CORRELATION AS A FUNCTION OF TEST STATISTICS

r = t/(t² + n − 2)^(1/2),  (12.1)

where t is the t statistic for testing that the regression slope (equivalently, the correlation) is 0.
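Equation (12.1) inverts the standard identity t = r√((n − 2)/(1 − r²)) for the test of zero correlation. A sketch with invented data, computing t from r and recovering r via (12.1):

```python
# Equation (12.1): r = t / sqrt(t^2 + n - 2), where t tests a zero slope
# (or correlation). Data invented for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# t statistic for H0: rho = 0, via the standard identity
t = r * math.sqrt((n - 2) / (1.0 - r * r))
r_from_t = t / math.sqrt(t * t + n - 2)     # Equation (12.1)
```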
13. CORRELATION AS THE RATIO OF TWO MEANS

r = z̄_Y / z̄_X,  (13.1)

where the means are taken over a subgroup of observations selected on X.
CONCLUSION

Certainly there are other ways to interpret the correlation coefficient. A wealth of additional fascinating and useful portrayals is available when a more statistical and less algebraic approach is taken to the correlation problem. We in no sense presume to have summarized all of the useful or interesting approaches, even within the fairly tight framework that we have defined. Nevertheless, these 13 approaches illustrate the diversity of interpretations available for teachers and researchers who use correlation.
REFERENCES
Box, J. F. (1978), R. A. Fisher: The Life of a Scientist, New York: John Wiley.
Brogden, H. E. (1946), "On the Interpretation of the Correlation Coefficient as a Measure of Predictive Efficiency," Journal of Educational Psychology, 37, 65-76.
Carroll, J. B. (1961), "The Nature of the Data, or How to Choose a Correlation Coefficient," Psychometrika, 26, 347-372.
Chatillon, G. (1984a), "The Balloon Rules for a Rough Estimate of the Correlation Coefficient," The American Statistician, 38, 58-60.
Chatillon, G. (1984b), "Reply to Schilling," The American Statistician, 38, 330.
Cook, T. C., and Campbell, D. T. (1979), Quasi-Experimentation, Boston: Houghton Mifflin.
Draper, N. R., and Smith, H. (1981), Applied Regression Analysis, New York: John Wiley.
Fisher, R. A. (1925), Statistical Methods for Research Workers, Edinburgh, U.K.: Oliver & Boyd.
Galton, F. (1885), "Regression Towards Mediocrity in Hereditary Stature," Journal of the Anthropological Institute, 15, 246-263.
Henrysson, S. (1971), "Gathering, Analyzing, and Using Data on Test Items," in Educational Measurement, ed. R. L. Thorndike, Washington, DC: American Council on Education, pp. 130-159.
Election Recounting
BERNARD HARRIS*
1. INTRODUCTION
THE ELECTION