
THE COEFFICIENT Ordinal by ordinal

TYPES

ASSUMPTION/FUNCTION The data is a bivariate random variable. The measurement scale is at least ordinal. (Xi, Yi) is independent of (Xj, Yj) where i ≠ j.

EXAMPLE The relationship between how far from school students live and the IB Geography grades they attain.

COMMENT

Spearman's rho (rs) Spearman's rank correlation coefficient, or Spearman's rho, is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Red type indicates what you have been given. Black type indicates the working done.

Null Hypothesis: There is no relationship between the two sets of data.

Distance From School (in miles): 3, 7, 2, 2, 1

Rank: 2, 1, 3, 3, 5

IB Geography Grades Attained: 4, 4, 7, 6, 5

There is a weak positive correlation between the two sets of data. The null hypothesis is rejected.

A few advantages:

1. Less sensitive to bias due to the effect of outliers. It can be used to reduce the weight of outliers (large distances get treated as a one-rank difference).

2. Does not require the assumption of normality.

3. When the intervals between data points are problematic, it is advisable to study the rankings rather than the actual values.
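As a rough illustration of why Spearman's rho depends only on rankings, the sketch below ranks each variable (tied values share the average of their ranks) and then takes the ordinary correlation of the ranks. The function names and the sample data are hypothetical and are not taken from the worked example above.

```python
def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the correlation of the ranks rather than the raw values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a monotone but clearly non-linear relationship.
hours = [1, 2, 3, 4, 5]
scores = [1, 8, 27, 64, 125]
rho = spearman_rho(hours, scores)  # the ranks agree perfectly
```

Because only the ordering matters, replacing the largest score with an extreme outlier would leave the result unchanged, which is the outlier advantage listed above.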

Kendall's tau (τ) A statistic used to measure the association between two measured quantities.

This test is nonparametric, as it does not rely on any assumptions about the distributions of X or Y or the distribution of (X, Y). If the agreement between the two rankings is perfect (i.e., the two rankings are the same), the coefficient has value 1. If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other), the coefficient has value −1. If X and Y are independent, then we would expect the coefficient to be approximately zero.

e.g., association between Likert scale on work satisfaction and work output; pain intensity (no, mild, moderate, severe) and dosage of pethidine
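The behaviour described for Kendall's tau (1 for identical rankings, −1 for reversed rankings) can be checked with a small sketch that counts concordant and discordant pairs. This is the simple "tau-a" variant, which makes no correction for ties; the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables order this pair the same way
            concordant += 1
        elif product < 0:    # the two variables order this pair oppositely
            discordant += 1
        # product == 0 means a tie; tau-a counts it as neither
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])      # identical rankings
reverse = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])   # reversed rankings
```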

Dichotomous by interval/ratio

Point biserial correlation coefficient (rpb): a correlation coefficient used when one variable (e.g. Y) is dichotomous and the other variable is continuous; Y can either be "naturally" dichotomous, like gender, or an artificially dichotomized variable.

Interval/ratio by interval/ratio

"Pearson's correlation": measures the degree of linear association between two interval-scaled variables.

- Related pairs
- Scale of measurement. For Pearson, data should be interval or ratio in nature.
- Normality
- Linearity
- Homoscedasticity

Examples: the age and height of students; the number of days students are absent and their achievement; scores on two tests such as math and reading.

Assumption 1: The variables are bivariately normally distributed. If the variables are bivariately normally distributed, each variable is normally distributed ignoring the other variable, and each variable is normally distributed at all levels of the other variable. If the bivariate normality assumption is met, the only type of statistical relationship that can exist between two variables is a linear relationship. However, if the assumption is violated, a non-linear relationship may exist. It is important to determine whether a non-linear relationship exists between two variables before describing the results using the Pearson correlation coefficient. Non-linearity can be assessed visually by examining a scatterplot of the data points.

Assumption 2: The cases represent a random sample from the population, and the scores on variables for one case are independent of scores on the variables for other cases.

The significance test for a Pearson correlation coefficient is not robust to violations of the independence assumption. If this assumption is violated, the correlation significance test should not be computed.
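To make the link between Pearson's correlation and the point biserial coefficient concrete, here is a minimal sketch. It assumes the dichotomous variable is coded 0/1, in which case the point biserial value equals the ordinary Pearson correlation. The function names and data are hypothetical, and no significance test is attempted.

```python
def pearson_r(x, y):
    """Pearson's r: degree of linear association between two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def point_biserial(groups, values):
    """rpb from group means: (M1 - M0) / s * sqrt(p * q), with s the population SD."""
    ones = [v for g, v in zip(groups, values) if g == 1]
    zeros = [v for g, v in zip(groups, values) if g == 0]
    n = len(values)
    mean_all = sum(values) / n
    sd = (sum((v - mean_all) ** 2 for v in values) / n) ** 0.5
    p, q = len(ones) / n, len(zeros) / n
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / sd * (p * q) ** 0.5


# Hypothetical data: pass/fail coded 1/0 against hours of study.
passed = [0, 0, 0, 1, 1, 1]
hours = [2, 3, 4, 5, 6, 7]
r_pb = point_biserial(passed, hours)
r = pearson_r(passed, hours)  # same value, computed as an ordinary Pearson r
```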

Nominal by nominal

Chi-square

Contingency (or cross-tab) tables: observed, expected, row and/or column %s, marginal totals

Bivariate frequency tables: cell frequencies (red), marginal totals (blue)

Clustered bar chart
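The observed counts, expected counts, and marginal totals of a contingency table fit together as in the sketch below. It uses the standard formulas (expected = row total × column total / n, and Cramer's V = sqrt(chi-square / (n × (k − 1))) with k the smaller table dimension), which are assumed here rather than taken from this table. The function names and counts are hypothetical.

```python
def chi_square(observed):
    """Return (chi-square statistic, n) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # marginal totals (rows)
    col_totals = [sum(col) for col in zip(*observed)]  # marginal totals (columns)
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n


def cramers_v(observed):
    """Cramer's V; for a 2x2 table this equals the absolute value of phi."""
    chi2, n = chi_square(observed)
    k = min(len(observed), len(observed[0]))
    return (chi2 / (n * (k - 1))) ** 0.5


# Hypothetical 2x2 table: rows = gender, columns = pass / fail.
table = [[30, 10],
         [10, 30]]
chi2, n = chi_square(table)
v = cramers_v(table)
```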

Phi/Cramer's V (non-parametric measures of correlation)

The choice of the appropriate statistic depends on whether the contingency table is 2×2 (each variable has two categories) or larger.

Phi (φ): Use for 2x2, 2x3, 3x2 analyses, e.g., Gender (2) & Pass/Fail (2)

Cramer's V: Use for 3x3 or greater analyses, e.g., Favourite Season (4) x Favourite Sense (5)

Coefficient of Determination (r²)

The coefficient of determination, r², is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph. The coefficient of determination is the ratio of the explained variation to the total variation.

It is a measure of how well the regression line represents the data. If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation. The further the line is away from the points, the less it is able to explain.

For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
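The "explained variation over total variation" reading of r² can be checked numerically. The sketch below fits a least-squares line and divides the variation of the fitted values by the variation of the observed values. The function name and data are hypothetical; only the figure r = 0.922 comes from the example above.

```python
def r_squared_from_fit(x, y):
    """Ratio of explained to total variation for a least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    explained = sum((p - mean_y) ** 2 for p in predicted)  # variation the line accounts for
    total = sum((b - mean_y) ** 2 for b in y)              # all variation in y
    return explained / total


# Hypothetical data that do not lie exactly on a line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared_from_fit(x, y)

# The arithmetic from the example in the text: r = 0.922.
explained_share = round(0.922 ** 2, 3)        # share of variation explained
unexplained_share = round(1 - 0.922 ** 2, 3)  # share left unexplained
```

When every point lies on the line, the explained and total variation coincide and the ratio is 1, matching the description of a line that passes through every point.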
