ABSTRACT
The purpose of this paper is to introduce researchers to the calculation of correlation coefficients. The Pearson product-moment correlation coefficient is the most common type of correlation; however, it is not applicable in all cases. There are many types of correlation coefficients, and the correct choice depends on the classification of the independent variable (X) and the dependent variable (Y). Data are classified into one of three types: quantitative, nominal, and ordinal. This paper explains the various types of correlations on the basis of the X-by-Y data-type combination.
Key words:
Biserial correlation coefficient, Goodman-Kruskal lambda, Pearson correlation coefficient, point biserial correlation coefficient, polychoric correlation coefficient, rank biserial correlation coefficient, Spearman rho, and tetrachoric correlation coefficient.
JEL Code:
C10, C12, C13 & C19
Citation:
Louangrath, Paul I., Correlation Coefficient According to Data Classification (March 30, 2014). Available at SSRN: http://ssrn.com/abstract=2417910
Correlation Coefficient According to Data Classification Dr. Paul T.I. Louangrath
This section of the paper covers ten types of correlation coefficients; each type is used according to the characteristics of the data arrays: quantitative, ordinal, or nominal for the variables X and Y. There are nine common correlation coefficient types; one additional type is a variant of the tetrachoric correlation (2 × 2) made to accommodate K × L contingency data for ordinal-by-ordinal data. This extension of Pearson's 2 × 2 contingency is called polychoric correlation. Before examining each type of correlation, it is important to be familiar with data classification.
measure for central tendency. An IQ test, for instance, yields ordinal data; it is ranked data.[1] There is no measurement that quantifies intelligence.
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s_X}\right)\left(\frac{Y_i - \bar{Y}}{s_Y}\right)   (1)
where …
[1] Mussen, Paul Henry (1973). Psychology: An Introduction. Lexington (MA): Heath. p. 363. ISBN 0-669-61382-7. "The I.Q. is essentially a rank; there are no true 'units' of intellectual ability."
[2] Michell, J. (1997). "Quantitative science and the definition of measurement in psychology." British Journal of Psychology, 88, 355-383.
Z = \frac{X_i - \bar{X}}{s_X} is the standard score measuring how far the individual data point lies from the mean;

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i is the mean of X_i, and

\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i is the mean of Y_i.
Another way to define r is to use the slope of the linear equation Y = a + bX + c as the parameter and multiply the slope by the quotient of the standard deviation of X divided by the standard deviation of Y:[3]

r = b\left(\frac{s_X}{s_Y}\right)   (2)
where …
b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} where X: (x_1, x_2, ..., x_n) and Y: (y_1, y_2, ..., y_n)   (3)
s_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} and s_Y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}   (4)
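Equations (1) and (2)-(3) are two routes to the same coefficient, which can be checked numerically. A minimal sketch with made-up paired data (the values below are hypothetical, not from the text):

```python
import math

# Hypothetical paired data for illustration only.
X = [2.0, 4.0, 6.0, 8.0, 10.0]
Y = [1.5, 3.9, 6.1, 7.8, 10.2]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in X) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in Y) / (n - 1))

# Equation (1): average product of standard scores.
r1 = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
         for x, y in zip(X, Y)) / (n - 1)

# Equations (2)-(3): slope of the least-squares line times s_X / s_Y.
b = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / \
    (n * sum(x * x for x in X) - sum(X) ** 2)
r2 = b * (s_x / s_y)

print(round(r1, 6), round(r2, 6))  # the two routes agree
```

Both routes agree because b = r(s_Y/s_X) algebraically, so multiplying the slope by s_X/s_Y recovers r.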
The value of the correlation coefficient ranges between -1 and +1. A value of zero means that there is no association between the two arrays. A negative coefficient means that there is an inverse association: if there is an increase in X, there is a
[3] Y = a + bX + c, where a is the Y-intercept, b is the slope of the linear regression line, and c is the forecast error. In basic statistics, the linear regression line equation is obtained through the following three statements:

I_{XY} = N\sum XY - (\sum X)(\sum Y);  II_X = N\sum X^2 - (\sum X)^2;  III_Y = N\sum Y^2 - (\sum Y)^2,

where a = \bar{Y} - b\bar{X}, b = \frac{I_{XY}}{II_X}, and

c = \sqrt{\left[\frac{1}{N(N-2)}\right]\left[III_Y - \frac{(I_{XY})^2}{II_X}\right]}
corresponding decrease in Y, and vice versa. A positive coefficient means that there is a direct association: if there is an increase in X, there is also an increase in Y.
r_b = \left(\frac{\bar{Y}_1 - \bar{Y}_0}{\sigma_Y}\right)\frac{pq}{u}   (5)

where …

\bar{Y}_1 and \bar{Y}_0 are the Y-score means for data pairs with x = 1 and x = 0:

\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} y_i and \bar{Y}_0 = \frac{1}{n_0}\sum_{i=1}^{n_0} y_i ;

q = 1 - p ;
p = proportion of data pairs with score x = 1; and
\sigma_Y = population standard deviation for the y data, and u is the height (ordinate) of the standardized normal distribution at the point z where P(z' < z) = q and P(z' > z) = p.
Note that the probabilities p and q may be given by the Laplace Rule of Succession:[4]

p = \frac{s+1}{n+2}, where s = number of successes and n = total observations.   (6)

q = 1 - p   (7)
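The rule of succession in equation (6) is a one-line calculation; a minimal sketch with hypothetical counts:

```python
# Laplace's rule of succession, equation (6). The counts are made up.
def laplace_p(successes: int, n: int) -> float:
    """Estimated success probability after s successes in n observations."""
    return (successes + 1) / (n + 2)

p = laplace_p(7, 10)   # 7 successes in 10 observations (hypothetical)
q = 1 - p              # equation (7)
print(round(p, 4), round(q, 4))
```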
t = \frac{\bar{x} - \mu}{S/\sqrt{n}}   (8)

\mu = \bar{x} - t\left(\frac{S}{\sqrt{n}}\right)   (9)
[4] Viz. Laplace, Pierre-Simon (1814). Essai philosophique sur les probabilités. Paris: Courcier.
With known µ , the population standard deviation may be determined through the Z-
equation. The Z-equation is given by:
Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}   (10)

\sigma = \left(\frac{\bar{x} - \mu}{Z}\right)\sqrt{n}   (11)
Note that p and q are used when discrete probability is involved. In point-biserial correlation, discrete probability is used because Y (the response variable) exists as a ranked or ordinal variable. The response Y: (y_1, y_2, ..., y_n) either falls within a rank placement [1st, 2nd, ..., ith] or it does not. This "either-or" argument dichotomizes the ordinal variable into a {Yes | No} identifier, which can be scored as Yes = 1 and No = 0. Therefore, the p and q of the discrete binomial probability are used. The test statistic for the binomial probability is given by:
Z_{bin} = \frac{\frac{X}{n} - p}{\sqrt{\frac{pq}{n}}}   See infra.
r_{pb} = \left(\frac{M_1 - M_0}{s_n}\right)\sqrt{\frac{n_1 n_0}{n^2}}   (12)

s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}   (13)
M_0 = \frac{1}{n_0}\sum_{i=1}^{n_0} X_i. The combined sample size is given by n = n_1 + n_0. If the data come from only one sample of the population, s_{n-1} is used for the standard deviation. Thus r_{pb} is written as:
r_{pb} = \left(\frac{M_1 - M_0}{s_{n-1}}\right)\sqrt{\frac{n_1 n_0}{n(n-1)}}   (14)
The standard deviation for the “sample only” data set is given by:
s_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}   (15)
r_{pb} = \left(\frac{M_1 - M_0}{s_n}\right)\sqrt{\frac{n_1 n_0}{n^2}} = \left(\frac{M_1 - M_0}{s_{n-1}}\right)\sqrt{\frac{n_1 n_0}{n(n-1)}}

t_{pb} = r_{pb}\sqrt{\frac{n_1 + n_0 - 2}{1 - r_{pb}^2}}   (16)
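The point-biserial coefficient in (12) is just the Pearson r with X coded as 1/0, which can be verified numerically. A minimal sketch with hypothetical scores:

```python
import math

# Hypothetical Y scores for the x = 1 group and the x = 0 group.
y1 = [8.0, 9.0, 10.0]
y0 = [4.0, 5.0, 6.0]
n1, n0 = len(y1), len(y0)
n = n1 + n0

M1 = sum(y1) / n1
M0 = sum(y0) / n0
y_all = y1 + y0
mean_y = sum(y_all) / n
s_n = math.sqrt(sum((y - mean_y) ** 2 for y in y_all) / n)  # equation (13)

# Equation (12): point-biserial correlation.
r_pb = ((M1 - M0) / s_n) * math.sqrt(n1 * n0 / n ** 2)

# Equation (16): t statistic with n1 + n0 - 2 degrees of freedom.
t_pb = r_pb * math.sqrt((n1 + n0 - 2) / (1 - r_pb ** 2))

# Check: r_pb equals the ordinary Pearson r with X coded 1/0.
x = [1.0] * n1 + [0.0] * n0
mean_x = sum(x) / n
num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y_all))
den = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                sum((b - mean_y) ** 2 for b in y_all))
print(round(r_pb, 4), round(num / den, 4))  # identical values
```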
r_b = \left(\frac{M_1 - M_0}{s_n}\right)\left(\frac{n_1 n_0}{n^2 u}\right)   (17)
[5] Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons." Journal of the American Statistical Association, 69: 730-737; and Stephens, M. A. (1986). "Tests Based on EDF Statistics." In D'Agostino, R.B. and Stephens, M.A., Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
There are three types of point-biserial correlation, namely (i) the Pearson correlation between item scores and total test scores including the item scores, (ii) the Pearson correlation between item scores and total test scores excluding the item scores, and (iii) the correlation adjusted for the bias resulting from the inclusion of the item scores. The correlation adjusted for that bias is given by:
r_{upb} = \frac{M_1 - M_0 - 1}{\sqrt{\frac{n^2 s^2}{n_1 n_0} - 2(M_1 - M_0) + 1}}   (18)
P(X) = \frac{n!}{(n-X)!\,X!}\,p^X q^{n-X}   (19)

Z_{bin} = \frac{\frac{X}{n} - p}{\sqrt{\frac{pq}{n}}}   (20)
Recall that the term a in the 2 × 2 contingency table is the frequency of a perfect match of {Yes: observed} and {Yes: forecast}. The frequency a is equal to X in P(X), as illustrated in the table below.

                         Y
               YES           NO          Forecast
X   YES        a             b           P(F) = a + b
    NO         c             d           1 - P(F)
Observation    P(O) = a + c  1 - P(O)    a + b + c + d

The term a + b + c + d is the combined total of all events in the set.
of a well-ordered set: {first, second, third, …, nth}.[6] However, there is a claim made by Lehman that the Spearman coefficient can be used for both continuous and discrete variables.[7] This section focuses on ordinal data for both the dependent and independent variables.
Assume that there are two arrays of data: the independent variable X_i and the dependent variable Y_i. The ordinal data of these variables are x_i and y_i respectively. The correlation coefficient of x_i and y_i is given by:
\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}   (21)
There is an alternative calculation of rho through the use of the difference of the two ranked arrays x_i and y_i:[8]

\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}   (22)

where d_i = x_i - y_i and n is the number of elements in the paired set. This method is not used if the researcher is looking for the top X ranks. Generally, equation (21) is used.
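When there are no tied ranks, equations (21) and (22) agree exactly; a minimal numerical check with hypothetical rank pairs:

```python
import math

# Hypothetical ranked data (no ties), ranks 1..n for both variables.
rx = [1, 2, 3, 4, 5, 6]
ry = [2, 1, 4, 3, 6, 5]
n = len(rx)

# Equation (22): rank-difference form (valid when there are no ties).
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho_diff = 1 - 6 * d2 / (n * (n ** 2 - 1))

# Equation (21): Pearson formula applied directly to the ranks.
mx = sum(rx) / n
my = sum(ry) / n
num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
den = math.sqrt(sum((a - mx) ** 2 for a in rx) *
                sum((b - my) ** 2 for b in ry))
rho_pearson = num / den

print(round(rho_diff, 4), round(rho_pearson, 4))  # identical values
```

With ties, averaged ranks break this identity, which is one reason equation (21) is generally preferred.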
The test statistic used for the Spearman rank correlation is given by the Z-test or t-test. In order to use the Z-score test, it is necessary to find the Fisher transformation of r. The Fisher transformation for the correlation is given by:

F(r) = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)   (23)

where r = b\left(\frac{S_x}{S_y}\right) and ln is the natural log of base e = 2.718…. The test statistic under the Z-equation is given by:
[6] Dauben, J. W. (1990). Georg Cantor: His Mathematics and Philosophy of the Infinite. Princeton, NJ: Princeton University Press, p. 199; Moore, G. H. (1982). Zermelo's Axiom of Choice: Its Origin, Development, and Influence. New York: Springer-Verlag, p. 52; and Suppes, P. (1972). Axiomatic Set Theory. New York: Dover, p. 129.
[7] Lehman, Ann (2005). JMP for Basic Univariate and Multivariate Statistics: A Step-by-step Guide. Cary, NC: SAS Press. p. 123. ISBN 1-59047-576-3.
[8] Myers, Jerome L.; Well, Arnold D. (2003). Research Design and Statistical Analysis (2nd ed.). Lawrence Erlbaum. p. 508. ISBN 0-8058-4037-0; and Maritz, J. S. (1981). Distribution-Free Statistical Methods. Chapman & Hall. p. 217. ISBN 0-412-15940-6.
-9-
Correlation Coefficient According to Data Classification Dr. Paul T.I. Louangrath
z = \sqrt{\frac{n-3}{1.06}}\,F(r)   (24)
The null hypothesis is that r = 0, which means that there is statistical independence or no dependent association.[9] In the alternative, the test statistic may also be determined by the t-test:

t_r = r\sqrt{\frac{n-2}{1-r^2}}   (25)
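Equations (23)-(25) chain together directly; a minimal sketch with a hypothetical coefficient and sample size:

```python
import math

# Hypothetical Spearman coefficient and sample size.
r, n = 0.6, 30

# Equation (23): Fisher transformation of r.
F = 0.5 * math.log((1 + r) / (1 - r))

# Equation (24): z statistic under H0: r = 0.
z = math.sqrt((n - 3) / 1.06) * F

# Equation (25): t statistic with df = n - 2.
t = r * math.sqrt((n - 2) / (1 - r ** 2))

print(round(F, 4), round(z, 4), round(t, 4))
```

Both statistics comfortably exceed conventional critical values here, so H0: r = 0 would be rejected either way.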
The degrees of freedom are given by df = n - 2.[10] The argument in support of the t_r approach rests on the idea of permutation.[11] An equivalent of the above determination is Kendall's tau,[12] which is beyond the scope of the present topic.
              Y
        Yes     No
X  Yes  a       b       p_F
   No   c       d       1 - p_F
        p_o     1 - p_o

Table 2.0: The 2 × 2 contingency table expressing the frequencies in terms of joint probabilities.
[9] Choi, S. C. (1977). "Tests of Equality of Dependent Correlation Coefficients." Biometrika, 64(3): 645-647; and Fieller, E. C.; Hartley, H. O.; Pearson, E. S. (1957). "Tests for rank correlation coefficients. I." Biometrika, 44: 470-481.
[10] Press; Vetterling; Teukolsky; Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). p. 640.
[11] Kendall, M. G.; Stuart, A. (1973). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Griffin. ISBN 0-85264-215-6. (Sections 31.19, 31.21).
[12] Viz. Kowalczyk, T.; Pleszczyńska, E.; Ruland, F. (eds.) (2004). Grade Models and Methods for Data Analysis with Applications for the Analysis of Data Populations. Studies in Fuzziness and Soft Computing 151. Berlin, Heidelberg, New York: Springer Verlag. ISBN 978-3-540-21120-4.
The three quantities a, p_o, and p_F determine the values in the table. The bias of this determination is given by:

Bias = \frac{P_F}{P_O} or Bias = \frac{a+b}{a+c}   (26)
Juras and Pasaric (2006) formally explained the tetrachoric correlation coefficient as:

a = \int_{z_O}^{\infty}\int_{z_F}^{\infty} \phi(x_1, x_2, r)\, dx_1\, dx_2,   (2)

\phi(x_1, x_2, r) = \frac{1}{2\pi\sqrt{1-r^2}}\exp\left[-\frac{1}{2(1-r^2)}\left(x_1^2 - 2r x_1 x_2 + x_2^2\right)\right].   (3)

The lines x_1 = z_O and x_2 = z_F divide the bivariate normal into four quadrants whose probabilities correspond to relative frequencies in the 2 × 2 table. Clearly, the standard normal deviates z_O and z_F are uniquely determined by P_O and P_F, respectively. The double integral in (2) can be expressed as (National Bureau of Standards, 1959):

a = \frac{1}{2\pi}\int_{\arccos r}^{\pi}\exp\left[-\frac{1}{2}\left(z_O^2 + z_F^2 - 2 z_O z_F \cos\omega\right)\csc^2\omega\right] d\omega   (4)
p_a = \int_{\Phi^{-1}(1-p_X)}^{\infty}\int_{\Phi^{-1}(1-p_Y)}^{\infty} \phi(x_1, x_2, r_{tc})\, dx_2\, dx_1   (27)

p_a = \Phi_2\left(\Phi^{-1}(1-p_X),\ \Phi^{-1}(1-p_Y),\ r_{tc}\right)   (28)
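The quadrant probability in (27) can be checked by brute-force numerical integration of the bivariate normal density; this is only a sketch of the definition, not how the coefficient is estimated in practice. The helper names are my own, and `NormalDist` from the standard library supplies the inverse normal CDF:

```python
from statistics import NormalDist
import math

def bvn_density(x, y, r):
    # Standard bivariate normal density with correlation r, as in (3)/(42).
    return math.exp(-(x * x - 2 * r * x * y + y * y) /
                    (2 * (1 - r * r))) / (2 * math.pi * math.sqrt(1 - r * r))

def cell_prob(p_x, p_y, r, lim=6.0, steps=400):
    # Equation (27): P(X* > z1, Y* > z2) via a midpoint-rule double integral,
    # truncating the infinite upper limits at `lim` standard deviations.
    z1 = NormalDist().inv_cdf(1 - p_x)
    z2 = NormalDist().inv_cdf(1 - p_y)
    hx = (lim - z1) / steps
    hy = (lim - z2) / steps
    total = 0.0
    for i in range(steps):
        x = z1 + (i + 0.5) * hx
        for j in range(steps):
            y = z2 + (j + 0.5) * hy
            total += bvn_density(x, y, r) * hx * hy
    return total

# With r = 0 the quadrant probability factorizes: p_a = p_X * p_Y.
print(round(cell_prob(0.5, 0.5, 0.0), 3))
# With p_X = p_Y = 0.5 and r = 0.5, the known value is 1/4 + arcsin(r)/(2*pi).
print(round(cell_prob(0.5, 0.5, 0.5), 3))
```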
These formal definitions are not helpful for the actual calculation of the tetrachoric correlation coefficient. For practical purposes, take the 2 × 2 contingency table below as the basis for further discussion of the tetrachoric correlation coefficient r_{tc}.
              Y
        Pos.    Neg.
X  Pos. p_a     p_b     p_X
   Neg. p_c     p_d     1 - p_X
        p_Y     1 - p_Y

Table 3.0: The 2 × 2 contingency table expressing the probability of frequencies in terms of joint probabilities.
s_P = \frac{a - P_O P_F}{P_O(1 - P_O)}   (29)

s_H = \frac{2(a - P_O P_F)}{P_O + P_F - 2 P_O P_F}   (30)

s_D = \frac{a - P_O P_F}{\sqrt{P_O(1 - P_O)\,P_F(1 - P_F)}}   (31)

In addition, Yule's odds-ratio skill score is said to also give an approximation of the tetrachoric correlation coefficient:

S_Y = \frac{a - P_O P_F}{a\left[1 - 2(P_O + P_F) + 2a\right] + P_O P_F}   (32)
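Equations (29)-(32) can be sketched numerically. The table values below are hypothetical relative frequencies of my own choosing; treating a, P_O, and P_F as proportions (rather than raw counts) keeps every score inside [-1, 1]:

```python
import math

# Hypothetical 2x2 relative frequencies (proportions summing to 1).
a, b, c, d = 0.4, 0.1, 0.1, 0.4
PO = a + c          # observed "yes" proportion
PF = a + b          # forecast "yes" proportion

sP = (a - PO * PF) / (PO * (1 - PO))                           # Peirce, eq. (29)
sH = 2 * (a - PO * PF) / (PO + PF - 2 * PO * PF)               # Heidke, eq. (30)
sD = (a - PO * PF) / math.sqrt(PO * (1 - PO) * PF * (1 - PF))  # Doolittle, eq. (31)
SY = (a - PO * PF) / (a * (1 - 2 * (PO + PF) + 2 * a) + PO * PF)  # Yule, eq. (32)

print(round(sP, 3), round(sH, 3), round(sD, 3), round(SY, 3))
```

For this symmetric table, (32) reproduces Yule's Q = (ad - bc)/(ad + bc) = 0.15/0.17, while the first three scores coincide at 0.6.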
Finally, the actual tetrachoric correlation coefficient is given by:[13]

S_r = \sin\left(\frac{\pi}{2}(4a - 1)\right)   (33)
r_{tc} = \frac{\alpha - 1}{\alpha + 1}   (34)

where \alpha = \left(\frac{AD}{BC}\right)^{\pi/4} and the equivalence of ABCD is A = p_a, B = p_b, C = p_c, and D = p_d. This shorthand version is less complicated than Pearson's original version.[14] Yet another shorthand formula for the tetrachoric correlation coefficient is given by:
r_{tc} = \cos\left(\frac{180°}{1 + \sqrt{BC/AD}}\right)   (35)
              Y
        0       1
X  1    A       B       A + B
   0    C       D       C + D
        A + C   B + D

Table 4.0: The 2 × 2 contingency table with scores of 0 and 1. ABCD are relative frequencies.
              Y
        0           1
X  1    A = 10      B = 5       A + B = 15
   0    C = 5       D = 10      C + D = 15
        A + C = 15  B + D = 15  30
[13] Juras and Pasaric, p. 67, citing Sheppard (1899): "On the application of the theory of error to cases of normal distribution and normal correlations," Philos. Tr. R. Soc. S.-A, 192, 101-167. See also Johnson, N.L. and Kotz, S. (1972). Distributions in Statistics. 4. Continuous Multivariate Distributions. Wiley, New York, p. 333.
[14] Pearson, Karl (1904). Mathematical Contributions to the Theory of Evolution. XIII. On the Theory of Contingency and Its Relation to Association and Normal Correlation. Biometric Series I. Draper's Company Research Memoirs. Pp. 5-21.
\alpha = \left(\frac{AD}{BC}\right)^{\pi/4} = \left(\frac{10(10)}{5(5)}\right)^{3.14/4} = \left(\frac{100}{25}\right)^{3.14/4} = 4^{3.14/4}, then …

\alpha = 4^{0.79} = 2.97

r_{tc} = \frac{\alpha - 1}{\alpha + 1} = \frac{2.97 - 1}{2.97 + 1} = \frac{1.97}{3.97} = 0.50
r_{tc} = \sin\left(\frac{\pi}{2}\left(\frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)\right)   (36)

r_{tc} = \sin\left(\frac{\pi}{2}\left(\frac{\sqrt{10(10)} - \sqrt{5(5)}}{\sqrt{10(10)} + \sqrt{5(5)}}\right)\right) = \sin\left(\frac{\pi}{2}\left(\frac{\sqrt{100} - \sqrt{25}}{\sqrt{100} + \sqrt{25}}\right)\right) = \sin\left(\frac{\pi}{2}\left(\frac{10 - 5}{10 + 5}\right)\right)

r_{tc} = \sin\left(\frac{\pi}{2}\left(\frac{5}{15}\right)\right) = \sin(1.57(0.33)) = \sin(0.52)

r_{tc} = 0.50
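The two worked approximations above can be checked in a few lines. This sketch uses the table values A = D = 10, B = C = 5 and assumes the sine form is built on the square roots of ad and bc, which is the reading consistent with the (10 - 5)/(10 + 5) step in the arithmetic:

```python
import math

# The worked 2x2 example: A = D = 10, B = C = 5.
A, B, C, D = 10, 5, 5, 10

# Equation (34): alpha shorthand.
alpha = (A * D / (B * C)) ** (math.pi / 4)
r_alpha = (alpha - 1) / (alpha + 1)

# Equation (36): sine approximation based on sqrt(AD) and sqrt(BC).
r_sin = math.sin((math.pi / 2) *
                 (math.sqrt(A * D) - math.sqrt(B * C)) /
                 (math.sqrt(A * D) + math.sqrt(B * C)))

print(round(r_alpha, 2), round(r_sin, 2))  # both ~ 0.50
```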
The result of the calculation shows that equations (34) and (36):
r_{tc} = \frac{\alpha - 1}{\alpha + 1} and r_{tc} = \sin\left(\frac{\pi}{2}\left(\frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)\right) produce the same result, and the equation r_{tc} = \cos\left(\frac{180°}{1 + \sqrt{BC/AD}}\right) produces a higher coefficient. In comparison,
equation (33) yields the following computation:

S_r = \sin\left(\frac{\pi}{2}(4a - 1)\right)

S_r = \sin\left(\frac{\pi}{2}(4(10) - 1)\right) = \sin\left(\frac{\pi}{2}(40 - 1)\right) = \sin\left(\frac{\pi}{2}(39)\right)

S_r = \sin\left(\frac{3.14}{2}(39)\right) = \sin(1.57(39)) = \sin(61.23)

S_r = -1
The results computed under the various claimed methods are not in agreement. Below are the computations of the Peirce, Heidke, Doolittle, and Yule measures. For convenience, the following definitions and values are given:

P_O = a + c = 10 + 5 = 15
P_F = a + b = 10 + 5 = 15

The calculation under the Peirce measure follows:

s_P = \frac{a - P_O P_F}{P_O(1 - P_O)} = \frac{10 - 15(15)}{15(1 - 15)} = \frac{-215}{-210} = 1.02

The result does not seem to be accurate because the range of a correlation coefficient is between -1 and +1. The calculation under the Heidke measure follows:

s_H = \frac{2(a - P_O P_F)}{P_O + P_F - 2 P_O P_F} = \frac{2(10 - 225)}{30 - 2(225)} = \frac{-430}{-420} = 1.02

Again a number outside the range is obtained. The calculation under the Doolittle measure follows:
s_D = \frac{a - P_O P_F}{\sqrt{P_O(1 - P_O)\,P_F(1 - P_F)}} = \frac{10 - 15(15)}{\sqrt{15(14)\,15(14)}} = \frac{215}{\sqrt{15(14)\,15(14)}}

s_D = \frac{215}{\sqrt{210(210)}} = \frac{215}{\sqrt{44100}} = \frac{215}{210} = 1.02
This result coincides with the Peirce measure. Lastly, under Yule's method the calculation follows:

S_Y = \frac{a - P_O P_F}{a\left[1 - 2(P_O + P_F) + 2a\right] + P_O P_F}

S_Y = \frac{10 - (15)(15)}{10[1 - 2(15 + 15) + 2(10)] + 15(15)} = \frac{10 - 225}{10[1 - 2(30) + 20] + 225}

S_Y = \frac{-215}{10[1 - 60 + 20] + 225} = \frac{-215}{10(-39) + 225} = \frac{-215}{-390 + 225} = \frac{-215}{-165}

S_Y = 1.30
The result under Yule's measure also exceeds 1.00. Bias in each case is determined by Bias = P_F / P_O = 15/15 = 1.00. The results of the various measures may be summarized thus:

s_P = 1.02
s_H = 1.02
s_D = 1.02
S_Y = 1.30
S_r = -1.00

The values appear to be consistent, except for the sign of S_r. With the exception of the negative sign of S_r, the interpretation of the correlation coefficient would otherwise be consistent. However, with the negative S_r, the S_r value would point to an opposite meaning in interpretation.
X_i      \bar{X}   (X_i - \bar{X})   S        Z_i = (X_i - \bar{X})/S   \hat{Z}_i
1.02     0.672     0.348             0.9425   0.369                     1.65
1.02     0.672     0.348             0.9425   0.369                     1.65
1.02     0.672     0.348             0.9425   0.369                     1.65
1.30     0.672     0.628             0.9425   0.666                     1.65
-1.00    0.672     -1.672            0.9425   -1.774                    1.65

Table 5.0: Determining the level of significance for the variance estimates for the tetrachoric correlation. All estimates are consistent except S_r, which shows a significant difference: |-1.774| > 1.65.
X^* = \begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix} and Y^* = \begin{bmatrix} y_1^* \\ y_2^* \\ \vdots \\ y_n^* \end{bmatrix}
Thus, X^* = \{x_1^*, x_2^*, ..., x_n^*\} and Y^* = \{y_1^*, y_2^*, ..., y_n^*\}. These are observed data. Assume that there are discrete and random variables X and Y (note that there is no asterisk marking this "unobserved" set) which relate to X^* and Y^* where
[15] Holgado-Tello, F.P., Chacón-Moscoso, S., Barbero-García, I. and Vila-Abad, E. (2010). "Polychoric versus Pearson correlations in exploratory and confirmatory factor analysis of ordinal variables." Quality and Quantity, 44(1), 153-166.
X^* \to X = \begin{bmatrix} x_1^* \to x_1 \\ x_2^* \to x_2 \\ \vdots \\ x_n^* \to x_k \end{bmatrix} and Y^* \to Y = \begin{bmatrix} y_1^* \to y_1 \\ y_2^* \to y_2 \\ \vdots \\ y_n^* \to y_l \end{bmatrix}

P[y = l] = q_l for Y   (40)
The cloud of the unobserved variables X^* and Y^* as defined by x_i^* and y_i^* may be projected onto a space, thus:

\begin{bmatrix} \pi_{11} & \pi_{12} & \cdots & \pi_{1L} \\ \pi_{21} & \pi_{22} & \cdots & \pi_{2L} \\ \vdots & & & \vdots \\ \pi_{K1} & \pi_{K2} & \cdots & \pi_{KL} \end{bmatrix}   (41)

The term \pi_{KL} is called the discrete cell proportion. The objective of polychoric correlation is to find the value of \pi_{KL}. Recall that \pi is the joint probability of an X and Y pair that is not observed.
With the above setup, polychoric correlation can now be discussed. Polychoric correlation was developed as a result of the inadequacy of the tetrachoric correlation to handle a K × L contingency table;[16] recall that the tetrachoric
[16] Ritchie-Scott used the notation r × s in labeling the contingency matrix. Zoran Pasaric and Josip Juras use K × L. Common statistics designates the multivariable contingency table as K × K. In general, the K × K or K × L contingency table is provided as:

                       OBSERVATIONS
                 C_1     C_2     ...   C_K
FORECAST   C_1   P_11    P_12    ...   P_1K    P_F1
           ...   ...     ...     ...   ...     ...
17
correlation is confined to a 2 × 2 contingency table. In 1900, Pearson introduced
tetrachoric correlation calculation as an attempt to obtain a quantitative measurement
of a continuous variable. However, Pearson’s earlier attempt was confined to 2 × 2
18
scenario. The work was further expanded by Ritchie-Scott to cover a K × L
scenario which became known as polychoric correlation today. Where as Pearson’s
2 × 2 tetrachoric handles dichotomous data, Ritchie-Scott’s K × L handles
polytomous tests.
The observed arrays X^* and Y^* are assumed to be bivariate normal, and the correlation between X^* and Y^* is given by \rho (rho) for the series of unobserved data sets x_i^* and y_i^*. The objective of polychoric correlation is to obtain the correlation of the unobserved arrays X and Y from the product-moment correlation of the observed arrays X^* and Y^*, where x_i^* and y_i^* are jointly normally distributed. The polychoric correlation is estimated from the discrete cell proportion \pi_{KL}; thus, this estimated value is designated as \hat{\pi}_{KL} from the K × L contingency table.
The probability density function (PDF) of the bivariate normal X^* and Y^* is given by:

\phi(x^*, y^*; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{x^{*2} - 2\rho x^* y^* + y^{*2}}{2(1-\rho^2)}\right]   (42)
The cumulative distribution function (CDF) for the bivariate normal X^* and Y^* is given by:

\Phi_2(\xi_i, \eta_i; \rho) = \int_{-\infty}^{\xi_i}\int_{-\infty}^{\eta_i} \phi(x^*, y^*; \rho)\, dy^*\, dx^*   (43)
Bias = (P_{F,1}/P_{O,1}, ..., P_{F,K-1}/P_{O,K-1}) and P_{obs} = (P_{obs,1}, ..., P_{obs,K-1}). The notation above uses P_{row,column}. Note that observation and forecast are switched above; this alternation does not change the interpretation of the result.
[17] Pearson, K. (1900). "Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable." Philosophical Transactions of the Royal Society of London, Series A, 195: 1-47.
[18] Ritchie-Scott, A. (1918). "A new measure of rank correlation." Biometrika, 30: 81-93.
The term h_{kl} stands for the likelihood of events k and l occurring, and this likelihood is a function of theta (\theta):

\pi_{kl} = h_{kl}(\theta)   (47)

A general statement may now be made about \pi; let \pi = [\pi_{11}, ..., \pi_{KL}]' and the likelihood of \theta be generally stated as h(\theta) = [h_{11}(\theta), ..., h_{KL}(\theta)]'. Now the polychoric equation may be written as:

\pi = h(\theta)   (48)
Olsson provides a close estimate of \pi as the maximum log likelihood, which is equivalent to the estimated theta \hat{\theta}.[19] Olsson's maximum log likelihood is given by:

\ln L = \sum_{k=1}^{K}\sum_{l=1}^{L} \pi_{kl}\,\log h_{kl}(\theta)   (49)
Recall that theta (\theta) parameterizes the likelihood function. There are three kinds of maximum likelihood functions used according to the type of data distribution: (i) the Bernoulli distribution, (ii) the normal distribution, and (iii) the Poisson distribution. Polychoric correlation deals with ordinal data, which are ranked data. The appropriate maximum likelihood function type here is the one used for the normal distribution. The maximum likelihood function for the normal distribution is provided thus:
f(x_1, x_2, ..., x_n \mid \mu, \sigma) = \prod \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]   (50)

f(x_1, x_2, ..., x_n \mid \mu, \sigma) = \frac{(2\pi)^{-n/2}}{\sigma^n}\exp\left[-\frac{\sum(x_i - \mu)^2}{2\sigma^2}\right]   (51)

The maximum likelihood is expressed as the natural log of the function; therefore,
[19] Olsson, U. (1979). "Maximum likelihood estimation of the polychoric correlation coefficient." Psychometrika, 44: 443-460.
\ln f = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{\sum(x_i - \mu)^2}{2\sigma^2}   (52)

To find the expected mean of the function, the derivative of the maximum likelihood function is taken:

\frac{\partial(\ln f)}{\partial\mu} = \frac{\sum(x_i - \mu)}{\sigma^2} = 0, which gives the expected mean as:   (53)

\hat{\mu} = \frac{\sum x_i}{n}   (54)

\frac{\partial(\ln f)}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\sum(x_i - \mu)^2}{\sigma^3} = 0   (55)

\hat{\sigma} = \sqrt{\frac{\sum(x_i - \hat{\mu})^2}{n}}   (56)
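The derivation in (52)-(56) can be verified numerically: the closed-form estimates should achieve a higher log-likelihood than nearby perturbed values. A minimal sketch with a hypothetical sample:

```python
import math

# Hypothetical sample.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)

# Equations (54) and (56): the ML estimates from setting the score to zero.
mu_hat = sum(xs) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)

def log_lik(mu, sigma):
    # Equation (52): log-likelihood of an i.i.d. normal sample.
    return (-0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

# The closed-form estimates should beat small perturbations of themselves.
best = log_lik(mu_hat, sigma_hat)
print(mu_hat, sigma_hat)
print(best > log_lik(mu_hat + 0.1, sigma_hat),
      best > log_lik(mu_hat, sigma_hat + 0.1))
```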
The above steps obtain the maximum likelihood estimates of the mean and standard deviation as the mean and standard deviation of the sample. This may be a biased estimate; nevertheless, for purposes of demonstrating how the maximum likelihood is calculated in the context of the maximum likelihood of \hat{\pi}_{KL}, it is an adequate explanation.
r_{rb} = 2\left(\frac{M_1 - M_0}{n_1 + n_0}\right)   (57)

where the subscripts 1 and 0 refer to the scores of 1 and 0 in the 2 × 2 contingency table, M is the mean of the frequency of the scores, and n is the sample size. The null hypothesis is that r_{rb} = 0, meaning there is no correlation. If the null hypothesis is true, the data are distributed as the Mann-Whitney U.
The objective of the Mann-Whitney U test is to verify the claim that the standard deviation of population A is the same as the standard deviation of population B; if so, then the two populations are identical, except for their locations, i.e. the
[20] Viz. Glass, Gene V. and Hopkins, Kenneth D. (1995). Statistical Methods in Education and Psychology (3rd ed.). Allyn & Bacon. ISBN 0-205-14212-5.
populations in two cities have the same income. The case involves two populations located in different places; this could be termed a parallel-group case. The null hypothesis (H_0) is that the two populations are the same and have the same population standard deviation. The logic follows that "if the two standard deviations are the same, their difference, i.e. \sigma_1 - \sigma_2 = 0, must equal zero." The alternative hypothesis (H_A) states the obverse: "the two populations are different; their means are different. Therefore, \sigma_1 - \sigma_2 \neq 0."
The procedure for conducting the Mann-Whitney U test involves five steps. Each step is explained below:
1. Collect a sample from each population. The sample sizes of the two samples may be equal or unequal. Mark one sample as n_1 and the second sample as n_2. It does not matter which one is designated as the first or the second sample; however, conventional practice dictates treating the larger sample as n_1 and the smaller sample as n_2.
2. Combine the two samples into one array, i.e. one set as shown below:
n = n_1 + n_2   (58)
3. Rank the combined sample (n) in ascending order, i.e. from low to high, so that the elements of the set are arranged as: n_1 < n_2 < ... < n_N.
4. Calculate the test statistic for the Mann-Whitney U test according to the formula below:
Z = \frac{W_1 - \frac{n_1(n_1 + n_2 + 1)}{2} + C}{S_w}   (59)
where …

W_1 = \sum_{k=1}^{n_1} Rank(X_{1k})   (60)

This (W_1) is called the rank sum. The standard deviation of the ranked set is given by:
S_w = \sqrt{\frac{n_1 n_2(n_1 + n_2 + 1)}{12} - \frac{n_1 n_2 \sum_i (t_i^3 - t_i)}{12(n_1 + n_2)(n_1 + n_2 - 1)}}   (61)

where t_1 = number of observations tied at value one; t_2 = number of observations tied at value two, and so on. C = correction factor; this number is fixed at 0.50 if the numerator of Z is negative and -0.50 if the numerator of Z is positive.
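Steps 1-4 can be sketched as follows. The samples are hypothetical, tied observations share an averaged rank, and the continuity correction C follows the rule stated above:

```python
import math

def mann_whitney_z(sample1, sample2):
    """Rank the pooled data (step 3), form the rank sum W1 (eq. 60),
    the tie-corrected S_w (eq. 61), and the Z statistic (eq. 59)."""
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted(sample1 + sample2)

    # Average ranks so that tied observations share a rank.
    rank_of, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1..j
        i = j

    w1 = sum(rank_of[x] for x in sample1)      # equation (60)

    # Tie correction term: sum of t^3 - t over groups of tied values.
    tie_term = sum(t ** 3 - t for t in
                   (pooled.count(v) for v in rank_of) if t > 1)
    s_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12
                    - n1 * n2 * tie_term
                    / (12 * (n1 + n2) * (n1 + n2 - 1)))  # equation (61)

    num = w1 - n1 * (n1 + n2 + 1) / 2
    c = -0.5 if num > 0 else 0.5               # continuity correction
    return (num + c) / s_w

# Larger sample designated n1, per the convention in step 1.
z = mann_whitney_z([10, 11, 13, 14, 16], [12, 15, 18, 20])
print(round(z, 3))
```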
- 22 -
Correlation Coefficient According to Data Classification Dr. Paul T.I. Louangrath
5. Use the following decision rule to determine whether to accept or reject the null hypothesis: reject H_0 if the computed |Z| exceeds the critical value at the chosen significance level; otherwise, do not reject H_0.
The Mann-Whitney U test is used in the following cases: (i) the test involves the comparison of two populations; (ii) the values of the data are non-parametric; therefore, it is an alternative to the conventional t-test; (iii) the data are classified as ordinal and NOT interval scale. "Ordinal" means that the spacing between each score unit is unequal or non-constant. For example, a scale of 1 (lowest) to 5 (highest) would not be able to use the Mann-Whitney U test, whereas a 1st place, 2nd place, and 3rd place type of answer choice, where the distances between first, second, and third are not equal, may be appropriate for this test; and (iv) it is said that the Mann-Whitney test is more robust and efficient than the conventional t-test.[21]
21
“Robustness” means that the final result is not unduly affected by the
outliers. Outliers are extreme value. If the system is robust, it will not be affected by
extreme value. Generally, extreme value tends to create bias estimate by the estimator
because outliers or extreme values creates greater variance and thus larger standard
deviation. This problem is eliminated through the use of ranking the data by arranging
the combined sets of n = n1 + n2 into one set ranking from lowest value to highest
value.
“Efficiency” is the measure of the desirability of the estimator. The estimator
is desirable if it yields an optimal result. It yields the optimal result it the observed
22
data meets or comes closest to the expected value.
Recall that the conventional t-test is given by:

t = \frac{\bar{x} - \mu}{S/\sqrt{n}}   (62)

Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}   (63)
From equation (63), the population standard deviation may be written as:
[21] Portnoy, S. and He, X. (2000). "A Robust Journey in the New Millennium." Journal of the American Statistical Association, Vol. 95, No. 452 (Dec. 2000), 1331-1335.
[22] Everitt, Brian S. (2002). The Cambridge Dictionary of Statistics. Cambridge University Press. ISBN 0-521-81099-X. p. 128.
\sigma = \left(\frac{\bar{x} - \mu}{Z}\right)\sqrt{n}   (64)

Note that the population standard deviation in equation (64) may not be determined unless the conventional t-equation (62) is used to determine the population mean (\mu). The value of \mu is derived from equation (62) as:

\mu = \bar{x} - t\left(\frac{S}{\sqrt{n}}\right)   (65)
r_{phi} = \phi = \frac{p_a - p_X p_Y}{\sqrt{p_X p_Y(1 - p_X)(1 - p_Y)}}   (66)

There is an equivalent of equation (66) by the contingency coding method of blocks ABCD in the table:

r_{phi} = \frac{BC - AD}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}   (67)

r_{phi} = \frac{BC - AD}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} = \frac{25 - 100}{\sqrt{(15)(15)(15)(15)}} = \frac{-75}{225}

r_{phi} = -0.33
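The worked computation of equation (67) is a one-liner; a quick check using the values from Table 4.0:

```python
import math

# The 2x2 relative frequencies from Table 4.0: A = D = 10, B = C = 5.
A, B, C, D = 10, 5, 5, 10

# Equation (67), using the block coding of the table in the text.
r_phi = (B * C - A * D) / math.sqrt(
    (A + B) * (C + D) * (A + C) * (B + D))

print(round(r_phi, 2))  # -0.33, matching the worked computation
```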
\phi = \sqrt{\frac{\chi^2}{n}}   (68)

where \chi^2 = \frac{(n-1)S^2}{\sigma^2} or \chi^2 = \sum\frac{(O_i - E_i)^2}{E_i}.
C = \sqrt{\frac{\chi^2}{N + \chi^2}}   (69)

where \chi^2 is chi-square and N is the grand total of observations. Generally, the range for a correlation coefficient is -1 to +1; however, C does not reach this range: for a 2 × 2 table it can reach only 0.707, and 0.870 for a 4 × 4 table. In order to approach the interval maximum, more categories have to be added.[24]
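Equations (68) and (69) can be checked together from the same 2 × 2 counts used earlier; note how C stays below |r_phi| for the identical table:

```python
import math

# Same 2x2 counts as the phi example: A = 10, B = 5, C = 5, D = 10.
obs = [[10, 5], [5, 10]]
N = sum(sum(row) for row in obs)

# Pearson chi-square from observed vs expected cell counts.
row_tot = [sum(row) for row in obs]
col_tot = [sum(col) for col in zip(*obs)]
chi2 = sum((obs[i][j] - row_tot[i] * col_tot[j] / N) ** 2
           / (row_tot[i] * col_tot[j] / N)
           for i in range(2) for j in range(2))

phi = math.sqrt(chi2 / N)              # equation (68)
C = math.sqrt(chi2 / (N + chi2))       # equation (69)
print(round(chi2, 3), round(phi, 3), round(C, 3))
```

Here phi reproduces the magnitude of the earlier r_phi = -0.33 (the chi-square route loses the sign), while C is slightly smaller.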
\lambda = \frac{\varepsilon_1 - \varepsilon_2}{\varepsilon_1}   (70)

where \varepsilon_1 is the overall non-modal frequency and \varepsilon_2 is the sum of the non-modal frequencies for each value of the independent variable. The range of lambda is 0 \le \lambda \le 1. Zero means there is no association between the independent and dependent variables, and one means there is a perfect association between the two variables.
Goodman and Kruskal deal with the optimal prediction of two polytomies (multiples) which are asymmetrical, where there is no underlying continuum and no ordering of interest.[25] The Goodman and Kruskal (GK) polytomy is described by the A × B crossing in the table below:
                            B
A        B_1           B_2           ...   B_\beta          Total
A_1      \rho_{11}     \rho_{12}     ...   \rho_{1\beta}    \rho_{1\cdot}
A_2      \rho_{21}     \rho_{22}     ...   \rho_{2\beta}    \rho_{2\cdot}
[23] Pearson, Karl (1904), p. 16.
[24] Smith, S. C., & Albaum, G. S. (2004). Fundamentals of Marketing Research. Sage: Thousand Oaks, CA. p. 631.
[25] Goodman, Leo A. & Kruskal, William H. (1954). "Measures of Association for Cross Classifications." Journal of the American Statistical Association, Vol. 49, No. 268 (Dec. 1954), pp. 732-764.
- 25 -
Correlation Coefficient According to Data Classification Dr. Paul T.I. Louangrath
A_\alpha  \rho_{\alpha 1}  \rho_{\alpha 2}  ...  \rho_{\alpha\beta}  \rho_{\alpha\cdot}
Total     \rho_{\cdot 1}   \rho_{\cdot 2}   ...  \rho_{\cdot\beta}   1

Table 6.0: A divides the population into alpha (\alpha) classes, where \alpha: (A_1, A_2, ..., A_\alpha). Similarly, B divides the population into beta (\beta) classes, where \beta: (B_1, B_2, ..., B_\beta). The proportion classified as both A_\alpha and B_\beta is \rho_{\alpha\beta}. The marginal proportion \rho_{\alpha\cdot} is the proportion of the population classified as A_\alpha, and \rho_{\cdot\beta} is the proportion classified as B_\beta. See Goodman & Kruskal (1954), p. 734.
\lambda_b = \frac{P(e_1) - P(e_2)}{P(e_1)}   (71)

which can be written as:

\lambda_b = \frac{\sum_a \rho_{am} - \rho_{\cdot m}}{1 - \rho_{\cdot m}}   (72)

\lambda_a = \frac{\sum_b \rho_{mb} - \rho_{m\cdot}}{1 - \rho_{m\cdot}}   (73)
[26] Goodman & Kruskal, p. 742.
[27] Guttman, Louis (1941). "An outline of the statistical theory of prediction." Supplementary Study B-1, pp. 253-318 in Horst, Paul et al. (eds.), The Prediction of Personal Adjustment, Bulletin 48, Social Science Research Council, New York (1941).
- 26 -
Correlation Coefficient According to Data Classification Dr. Paul T.I. Louangrath
\lambda = \frac{0.5\left[\sum_a \rho_{am} + \sum_b \rho_{mb} - \rho_{\cdot m} - \rho_{m\cdot}\right]}{1 - 0.5(\rho_{\cdot m} + \rho_{m\cdot})}   (74)

The lambda proposed by Goodman and Kruskal lies between the \lambda_a and \lambda_b described by Guttman. The range of GK's lambda is 0 \le \lambda \le 1. Goodman and Kruskal alternatively define the terms in lambda: "[l]et v be the total number of individuals in the population, v_{ab} = v\rho_{ab}, v_{am} = v\rho_{am}, v_{mb} = v\rho_{mb}, and so on."[28] Under this general definition, the Guttman \lambda_a and \lambda_b become:

\lambda_b = \frac{\sum_a v_{am} - v_{\cdot m}}{v - v_{\cdot m}}   (75)

\lambda_a = \frac{\sum_b v_{mb} - v_{m\cdot}}{v - v_{m\cdot}}   (76)

The following table demonstrates GK's definition of the population and its components.
                        B
A            B_1     B_2     B_3     B_4     v_{a\cdot}
A_1          1768    807     189     47      2811
A_2          946     1387    746     53      3132
A_3          115     438     288     16      857
v_{\cdot b}  2829    2632    1223    116     v = 6800

Table 7.0: The numerical example is taken from Kendall's work and was reproduced in Goodman and Kruskal's article (p. 744). See Kendall, Maurice G. (1948). The Advanced Theory of Statistics. London: Charles Griffin and Co., Ltd.; p. 300.
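Equation (75) can be applied directly to the counts in Table 7.0, assuming (as in the standard Goodman-Kruskal reading) that v_{am} denotes the modal (largest) frequency in row a and v_{\cdot m} the largest column total:

```python
# Kendall's 3x4 example reproduced in Table 7.0.
v = [[1768, 807, 189, 47],
     [946, 1387, 746, 53],
     [115, 438, 288, 16]]
total = sum(sum(row) for row in v)                  # v = 6800

# Equation (75): v_am = modal frequency in row a; v_.m = modal column total.
row_modes = sum(max(row) for row in v)              # 1768 + 1387 + 438
col_totals = [sum(col) for col in zip(*v)]          # 2829, 2632, 1223, 116
v_dot_m = max(col_totals)                           # 2829
lambda_b = (row_modes - v_dot_m) / (total - v_dot_m)

print(round(lambda_b, 3))
```

The interpretation: knowing an individual's A class reduces the error in predicting its B class by about 19 percent relative to always guessing the modal B class.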
[28] Goodman & Kruskal, p. 743.