
Non-parametric Correlation

Nonparametric methods were developed for cases in which the researcher knows nothing about the parameters of the variable of interest in the population (hence the name nonparametric). In more technical terms, nonparametric methods do not rely on the estimation of parameters (such as the mean or the standard deviation) describing the distribution of the variable of interest in the population. For this reason they are also sometimes (and more appropriately) called parameter-free or distribution-free methods.

Commonly used nonparametric correlation coefficients are Spearman R, Kendall tau, and Gamma.

Spearman R assumes that the variables under consideration were measured on at least an ordinal (rank order) scale, that is, that the individual observations can be ranked into two ordered series. It is calculated by applying the Pearson correlation formula to the ranks of the data rather than to the actual data values. Pearson r gives the magnitude and direction of the linear association between two variables X and Y that are measured on an interval or ratio scale. When the relationship is nonlinear, a useful alternative is Spearman's rank correlation coefficient, rho, a Pearson-type correlation coefficient computed on the ranks of the X and Y values. The differences $d_i = x_i - y_i$ between the ranks of each observation on the two variables are calculated, and the coefficient is given by:

$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$

where n is the number of observations.

Example

Statistics score y   Math score x   Rank of y   Rank of x   d^2
1                    2              1           1           0
2                    3              2           2           0
3                    5              3           3           0
5                    6              4           4           0
6                    7              5           5.5         0.25
7                    10             6           8           4
8                    7              7.5         5.5         4
…                    …              7.5         …           0.25
                                                Sum of d^2  8.50
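As a quick check of the arithmetic in this example, the following Python sketch plugs the squared rank differences from the table into the Spearman shortcut formula. The function name `spearman_from_d2` is an illustrative choice, not something defined in the text. The shortcut formula is exact only when there are no tied ranks, so with the ties in this example the result is best read as the usual textbook approximation.

```python
def spearman_from_d2(d_squared):
    """Spearman's r_s from squared rank differences (shortcut formula)."""
    n = len(d_squared)
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))


# Squared rank differences d^2 for the eight observations in the example.
d_squared = [0, 0, 0, 0, 0.25, 4, 4, 0.25]

r_s = spearman_from_d2(d_squared)
print(f"sum of d^2 = {sum(d_squared)}")  # sum of d^2 = 8.5
print(f"r_s = {r_s:.4f}")                # r_s = 0.8988
```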

$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(8.50)}{8(8^2 - 1)} = 1 - 0.1012 = 0.8988$

For n = 8 at the 1% significance level, the critical value of $r_s$ is 0.833. Since 0.8988 > 0.833, we reject H0.

Kendall tau is equivalent to Spearman R with regard to the underlying assumptions. It is also comparable in terms of statistical power. However, the two coefficients have different interpretations. Spearman R can be thought of as the regular Pearson product moment correlation coefficient, that is, in terms of the proportion of variability accounted for, except that it is computed from ranks. Kendall tau, on the other hand, represents a probability: it is the difference between the probability that, in the observed data, the two variables are in the same order and the probability that they are in different orders.

The Gamma statistic is preferable to Spearman R or Kendall tau when the data contain many tied observations.

Point biserial correlation is used to examine the correlation between a dichotomous variable and a continuous variable, for example the correlation between GPA (continuous) and gender (dichotomous). To calculate $r_{pb}$, assume that the dichotomous variable Y takes the two values 0 and 1. If we divide the data set into two groups, group 1 which received the value "1" on Y and group 2 which received the value "0" on Y, then the point-biserial correlation coefficient is calculated as follows:

$r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}}$

where $s_n$ is the standard deviation used when you have data for every member of the population:

$s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$

Here $M_1$ is the mean value of the continuous variable X for all data points in group 1, and $M_0$ is the mean value of X for all data points in group 2. Further, $n_1$ is the number of data points in group 1, $n_0$ is the number of data points in group 2, and n is the total sample size.

Example

Scores   Finished early (p)   Finished late (v)   f
80-84    3                                        3
75-79    4                    2                   6
70-74    6                    2                   8
65-69    5                    5                   10
60-64    10                   9                   19
55-59    10                   5                   15
50-54    15                   5                   20
45-49    4                    3                   7
40-44    3                    2                   5
35-39                         4                   4
30-34                         2                   2
25-29                         1                   1
Total    60                   40                  100

M = 58.05 (mean of all scores, N = 100)
SD = 11.63 (standard deviation of all scores)
Mp = 60.08 (mean score of those who finished early)
Mv = 55.00 (mean score of those who finished late)
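The summary values in this example can be reproduced, and a point-biserial coefficient computed from them, with the sketch below. It rests on several assumptions that are not stated in the text: each score is represented by the midpoint of its class interval, blank cells in the frequency table are treated as zero, the standard deviation is the population form (dividing by n), and the coefficient is taken as (Mp - Mv) / SD * sqrt(np * nv / n^2). The variable names are illustrative only.

```python
from math import sqrt

# Class midpoints for 80-84, 75-79, ..., 25-29 (assumption: every score
# in a class is represented by the midpoint of that class).
midpoints = [82, 77, 72, 67, 62, 57, 52, 47, 42, 37, 32, 27]
early = [3, 4, 6, 5, 10, 10, 15, 4, 3, 0, 0, 0]  # finished early (p)
late = [0, 2, 2, 5, 9, 5, 5, 3, 2, 4, 2, 1]      # finished late (v)


def grouped_mean(values, freqs):
    """Mean of grouped data: sum(value * frequency) / sum(frequency)."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


total = [a + b for a, b in zip(early, late)]
n, n_p, n_v = sum(total), sum(early), sum(late)

m_all = grouped_mean(midpoints, total)
m_p = grouped_mean(midpoints, early)
m_v = grouped_mean(midpoints, late)

# Population standard deviation of all scores (divide by n, not n - 1).
s_n = sqrt(sum(f * (x - m_all) ** 2 for x, f in zip(midpoints, total)) / n)

# Point-biserial correlation between score and finishing early/late.
r_pb = (m_p - m_v) / s_n * sqrt(n_p * n_v / n ** 2)

print(f"M = {m_all:.2f}, SD = {s_n:.2f}")  # M = 58.05, SD = 11.63
print(f"Mp = {m_p:.2f}, Mv = {m_v:.2f}")   # Mp = 60.08, Mv = 55.00
print(f"r_pb = {r_pb:.4f}")                # r_pb = 0.2142
```

Under these assumptions the sketch reproduces the four summary values listed above, which suggests the midpoint convention matches the one used in the example.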

Tetrachoric correlation is applicable when both observed variables are dichotomous. If you have dichotomous data on two variables but are willing to assume that the underlying variables are normally distributed, you may use the tetrachoric correlation to estimate the size of the Pearson correlation between the underlying variables. Suppose you are interested in the relationship between misanthropy and attitudes about animal rights. You have data on 91 people. For each person you know whether that person is a misanthrope (1) or not (0) and whether that person supports animal rights (1) or not (0). You are willing to assume that the underlying variables are both normally distributed.
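The tetrachoric coefficient is normally estimated numerically from the bivariate normal distribution, which is more than a short example can show. As a rough illustration only, the sketch below uses the classical cosine approximation, r ≈ cos(π / (1 + sqrt(ad / bc))), on a 2×2 table of counts. The cell counts are hypothetical and are not the data for the 91 respondents, which the text does not report; the function name `tetrachoric_cos_approx` is likewise an assumption made for this sketch.

```python
from math import cos, pi, sqrt


def tetrachoric_cos_approx(a, b, c, d):
    """Cosine approximation to the tetrachoric correlation.

    a, d: counts in the agreeing cells (0, 0) and (1, 1)
    b, c: counts in the disagreeing cells (0, 1) and (1, 0)
    """
    if b * c == 0:
        # The ratio ad/bc is undefined when a disagreeing cell is empty.
        raise ValueError("approximation needs non-zero b and c")
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


# Hypothetical 2x2 counts, chosen only to illustrate the calculation.
r_tet = tetrachoric_cos_approx(a=30, b=10, c=10, d=30)
print(f"approximate tetrachoric r = {r_tet:.4f}")  # 0.7071
```

When ad equals bc the approximation returns a value of essentially zero, and it moves toward +1 or -1 as the agreeing or disagreeing cells dominate.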
