
Correlations and Inferential Statistics

Carlo Magno, PhD, De La Salle University, Manila

Objectives
Identify the uses of a correlation coefficient
Test the significance level of a correlation coefficient
Obtain a correlation coefficient using a computer or the LR mode of a scientific calculator
Interpret a correlation coefficient
Propose a study in a school context making use of a correlation coefficient

A. Case Analysis
A guidance counselor (GC) wanted to replace the Metropolitan Achievement Test (MAT) with a less expensive but equally useful test, the Iowa Tests of Basic Skills (ITBS). To determine whether the ITBS measures the same thing as the MAT, the GC looked at the MAT manual and found that the MAT converges with the ITBS, with a correlation coefficient of r = .82, p < .001.
What was the purpose of the guidance counselor? How did she find an answer to her purpose? Can the ITBS be used as a replacement for the MAT? What helped her decide to use the ITBS as a replacement?

B. Case Analysis
A guidance counselor wanted to find out if an aptitude test is good to include with an achievement test to create a battery of tests given to new student applicants for high school. She had data available for 150 students from past years. She used these data to correlate the aptitude test scores with the achievement test scores. She obtained a correlation coefficient of r = .91, p < .01.
What did the guidance counselor do to decide if the aptitude test can be included with the achievement test in a battery? Can the aptitude test be included to form a battery of tests? What will help the guidance counselor decide if the aptitude test is good to include?

C. Case Analysis
A guidance counselor (GC) wanted to predict, through their grades, whether the new students will be successful in the school in the future. The new students took an achievement test as well as an aptitude test before enrollment. After the first quarter, the GC obtained the grades of the new students and used their aptitude and achievement test scores to predict the grades. She ran a multiple regression and found that aptitude contributes .38 (p < .05) to grades and achievement contributes .25 (p < .05) to grades.
Why do we need to know if the entrance test predicts the grades of students in the future? What were used as predictors of grades? What statistic was used to make the predictions? Are achievement and aptitude predictive of the grades?

What is a correlation?
Measure of relationship between two variables
Ex. Grades in English tend to be related to grades in Foreign Language; height and weight
Involves two variables that are paired because there is a relationship between them
Deals with measurements made on two variables, X and Y (bivariate)

Nature of a correlation
Magnitude/direction of the relationship
Strength of the relationship
Variance explained
Significance of the relationship

Linear Regression
There is a straight-line relationship between variables X and Y
When X increases, Y also increases: positive relationship
When X increases, Y decreases, or vice versa: negative relationship

Relationship between achievement and aptitude

Achievement (X) 100 95 90 85 82 80 75 70 65

Aptitude (Y) 99 98 94 87 84 81 78 73 68

Regression Line between achievement and aptitude


[Scatterplot: X vs. Y, with 95% confidence band. Y = 14.379 + .85633 * X. Correlation: r = .98966]

Laziness 100 95 90 85 75 70 65 60 55

Perseverance 35 40 45 50 55 60 64 70 76

Relationship between Laziness and Perseverance


[Scatterplot: Y vs. X, with 95% confidence band. X = 139.94 - 1.138 * Y. Correlation: r = -.9959]

Magnitude of the Relationship


Positive relationship: as one variable increases, the other variable also increases. Ex. academic grades and intelligence
Negative relationship: as one variable increases, the other decreases, or vice versa. Ex. procrastination and motivation
Absence of relationship between variables: denoted by .00

Strength of Relationship
A correlation coefficient is computed for a bivariate distribution using a statistical formula.

Correlation Coefficient Value    Interpretation
0.80-1.00                        Very strong relationship
0.60-0.79                        Strong relationship
0.40-0.59                        Substantial/marked relationship
0.20-0.39                        Low relationship
0.00-0.19                        Negligible relationship
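The interpretation bands above can be written as a small lookup. This is only an illustrative sketch: the function name `interpret_strength` is an assumption, and it applies the bands to the absolute value of r so that a negative coefficient is read by its size while its sign gives the direction.

```python
def interpret_strength(r):
    """Return the verbal label for a correlation coefficient r, using the bands in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign only gives the direction
    if size >= 0.80:
        return "Very strong relationship"
    if size >= 0.60:
        return "Strong relationship"
    if size >= 0.40:
        return "Substantial/marked relationship"
    if size >= 0.20:
        return "Low relationship"
    return "Negligible relationship"


# Coefficients quoted elsewhere in these slides
for r in (0.82, -0.9959, 0.6698, 0.0305):
    print(r, "->", interpret_strength(r))
```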

Relationship of math ability and attitude


attitude in math    math ability
45                  18
18                  20
14                  20
16                  32
48                  52
45                  32
16                  18
20                  25
48                  64
18                  12

Strength of relationship
[Scatterplot (Spreadsheet1 2v*10c): math ability (vertical axis) against attitude in math (horizontal axis). Fitted line: math ability = 8.4351 + 0.7245*x. attitude in math : math ability: r = 0.6698, p = 0.0341]
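The fitted line and correlation reported for attitude in math and math ability can be checked by hand from the ten pairs of scores. The sketch below is a minimal illustration using deviation scores; the function name `pearson_and_line` is an assumption and is not part of the original slides, which obtained the values from statistical software.

```python
import math

# The ten paired scores from the slide
attitude = [45, 18, 14, 16, 48, 45, 16, 20, 48, 18]
math_ability = [18, 20, 20, 32, 52, 32, 18, 25, 64, 12]


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # sum of squared deviations of x
    syy = sum((yi - mean_y) ** 2 for yi in y)  # sum of squared deviations of y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # sum of cross-products
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


r, slope, intercept = pearson_and_line(attitude, math_ability)
# r comes out near 0.67, the slope near 0.72, and the intercept near 8.44
print(f"r = {r:.4f}, math ability = {intercept:.4f} + {slope:.4f} * attitude")
```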

Relationship of spelling and English ability


spelling          14 26 16 10 45 34 30 23 34 24
English ability   34 12 23 21 22 19 50 20 29 18

Strength of Relationship
[Scatterplot (Spreadsheet1 2v*10c): English ability (vertical axis) against spelling (horizontal axis). Fitted line: English ability = 24.0142 + 0.0307*x. spelling : English ability: r = 0.0305, p = 0.9333]

Variance
How much of the variability in Y is explained (accounted for) by X
Proportion explained: the square of the correlation coefficient
r = .67, r² = .4489; .4489 × 100 = 44.89
Interpretation: 44.89% of the variability in math ability is explained by attitude in math. The remaining 55.11% is explained by other factors.
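The arithmetic for variance explained is short enough to sketch directly. The function name `variance_explained` is an assumption used only for illustration.

```python
def variance_explained(r):
    """Return (r squared, percent of variance explained, percent left unexplained)."""
    r_squared = r ** 2
    explained = r_squared * 100
    return r_squared, explained, 100 - explained


r2, explained, unexplained = variance_explained(0.67)
print(f"r squared = {r2:.4f}")
print(f"{explained:.2f}% explained, {unexplained:.2f}% due to other factors")
```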

Conditions in interpreting r
Linearity of regression: the points in a scatterplot should tend to fall along a straight line.
The size of r reflects the amount of variance that can be accounted for by a straight line.
Homoscedasticity: the tendency of the standard deviations (or variances) of the arrays to be equal.

Correlational Techniques
Pearson Product-Moment correlation (r): used for interval/ratio sets of variables
Spearman Rank-order correlation: used when the two sets of data are ordinal
Phi coefficient: used when each of the variables is a dichotomy
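The three techniques are closely related, which a short sketch can show: Spearman's coefficient is a Pearson r computed on ranks, and phi is computed from the four cell counts of a 2x2 table. The function names and the example numbers below are hypothetical; tied scores are given the mean of their ranks, which is a common convention but an assumption here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data, for illustration only
print(average_ranks([10, 20, 20, 30]))
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
print(phi_coefficient(20, 10, 10, 20))
```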

Hypothesis Testing
A systematic procedure for deciding whether the results of a research study, which examines a sample, support a particular theory or practical innovation, which applies to the population (Aron & Aron, 2004).

An example of data for Hypothesis Testing


A researcher wanted to determine the relationship between a student's performance in general psychology and his attitude towards the subject. Performance was measured through a series of tests in GENPSYC. Attitude was measured by the Shore and Shores Attitude Scale.

Steps in Hypothesis Testing


STEP 1: State the null and alternative hypotheses
H0: There is no significant relationship between attitude and performance.
r = 0
H1: There is a significant relationship between attitude and performance.
r ≠ 0

Steps in Hypothesis Testing


STEP 2: Determine the alpha level of significance, degrees of freedom, and critical value
Alpha level: α = .05 or .01
This is the 5% or 1% of the comparison distribution in which a sample score would be considered so extreme that the possibility that it came from a distribution like this would be rejected.
5% or 1% = region of rejection
95% or 99% = region of acceptance

Steps in Hypothesis Testing


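Degrees of freedom (df): the number of scores that are free to vary when a statistic is estimated. It is tied to the sample size, and so to the power of a statistical test.
The more cases, the higher the df, and the more likely it is that the sample represents the population.
For a correlation, df = n - 2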

Steps in Hypothesis Testing


Critical value: the cut-off sample score
How extreme a sample score needs to be to draw a confident conclusion

Steps in Hypothesis Testing


STEP 3: Computation
Formulas are used to determine the obtained or computed value

Steps in Hypothesis Testing


STEP 4: Decision Rule
Decide whether to reject or retain the null hypothesis.
Reject the null hypothesis if the probability of getting a result this extreme when the null hypothesis is true is less than 5% (p < .05).
When a sample score is so extreme that researchers reject the null hypothesis, the result is said to be statistically significant.

Steps in Hypothesis Testing


p < .05/.01 = reject the H0, significant
p > .05/.01 = retain the H0, not significant
Obtained value > critical value = reject the H0, significant
Obtained value < critical value = retain the H0, not significant

Example
1. H0: There is no significant relationship between attitude and performance.
   H1: There is a significant relationship between attitude and performance.
2. N = 157, α = .05, df = 155, r critical = .161
3. r computed = .11, p value = .19
4. Decision: since the r obtained (.11) is less than the r critical (.161), the null hypothesis is not rejected. There is no significant relationship between attitude and performance in general psychology.
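The decision step of this example can be sketched as two small rules, one based on the critical value and one based on the p value. The function names are assumptions for illustration; the numbers are the ones given in the example, and comparing the absolute size of the obtained value is an added assumption so that negative coefficients are handled.

```python
def decide_by_critical_value(obtained, critical):
    """Reject H0 when the obtained value is larger in absolute size than the critical value."""
    if abs(obtained) > critical:
        return "reject H0 (significant)"
    return "retain H0 (not significant)"


def decide_by_p_value(p, alpha=0.05):
    """Reject H0 when the p value is smaller than the chosen alpha level."""
    if p < alpha:
        return "reject H0 (significant)"
    return "retain H0 (not significant)"


n = 157
df = n - 2  # degrees of freedom for a correlation
print("df =", df)
print(decide_by_critical_value(obtained=0.11, critical=0.161))
print(decide_by_p_value(p=0.19, alpha=0.05))
```

Both rules lead to the same conclusion here, which is what the example reports.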

Illustration

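[Figure: comparison distribution with a central 95% region and a 2.5% region of rejection in each tail. The computed value (Z = 1.38, r = .11) falls short of the critical value (Z = 2.03, r = .161).]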
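The two values marked on the illustration (1.38 and 2.03) are consistent with converting each r to a test statistic with df = n - 2. That reading is an assumption, since the slide does not show how the values were obtained; the sketch below uses the standard conversion t = r * sqrt(df) / sqrt(1 - r²), with a hypothetical function name.

```python
import math


def r_to_t(r, n):
    """Convert a correlation r from n pairs to a t statistic with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


n = 157
t_computed = r_to_t(0.11, n)   # statistic for the computed r
t_critical = r_to_t(0.161, n)  # statistic corresponding to the slide's critical r
print(f"computed: {t_computed:.2f}, critical: {t_critical:.2f}")
print("reject H0" if t_computed > t_critical else "retain H0")
```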

Decision Errors
Type I error: you reject the null hypothesis when in fact the null hypothesis is true.
Type II error: in reality the research hypothesis is true, but the result doesn't come out extreme enough to reject the null hypothesis.

Decision error
                                        Real situation:     Real situation:
Conclusion from the study               H0 is true          H1 is true
H1 is supported (reject H0)             Type I error        Correct decision
Study inconclusive (do not reject H0)   Correct decision    Type II error

Computation of Pearson r
Student   Aptitude Test (Time 1) X   Aptitude Retest (Time 2) Y   XY      X²      Y²
A         45                         47                           2115    2025    2209
B         30                         33                           990     900     1089
C         20                         25                           500     400     625
D         15                         19                           285     225     361
E         26                         28                           728     676     784
F         20                         23                           460     400     529
G         35                         38                           1330    1225    1444
H         26                         29                           754     676     841
I         10                         15                           150     100     225
J         27                         29                           783     729     841
          ΣX = 254                   ΣY = 286                     ΣXY = 8095   ΣX² = 7356   ΣY² = 8948

Computation of Pearson r
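\[
r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}
\]

\[
r = \frac{10(8095) - (254)(286)}{\sqrt{\left[10(7356) - (254)^2\right]\left[10(8948) - (286)^2\right]}}
\]

r = .996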
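The same computation can be followed step by step from the raw scores in the table. This is a minimal sketch of the raw-score formula; the variable and function names are assumptions for illustration.

```python
import math

# Aptitude test (Time 1) and retest (Time 2) scores for students A to J
time1 = [45, 30, 20, 15, 26, 20, 35, 26, 10, 27]
time2 = [47, 33, 25, 19, 28, 23, 38, 29, 15, 29]


def pearson_r_raw(x, y):
    """Pearson r from raw scores, using the sums that appear in the table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_raw(time1, time2)
print(f"r = {r:.3f}")
```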

Examples
Magno, Tangco, Cabangon, & Crisostomo

Magno (2007)

Exercise in computing for the Pearson r


Math test         8 7 7 7 5 2 2 6 6 5
Statistics test   9 9 9 9 8 8 8 6 6 6

Bivariate correlation: Y = a + bX

Multiple correlation: y = b1x1 + b2x2 + ... + bnxn + c

Multiple regression: the association between a criterion variable and two or more predictor variables (Aron & Aron, 2003).
Multiple correlation coefficient = R
Using two or more variables to predict a criterion variable.
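Both equations have the same shape: a constant plus a weighted sum of predictor scores. The sketch below shows how a predicted criterion score is produced once the weights are known. The function name is an assumption, the bivariate weights are the fitted line reported earlier for attitude in math and math ability, and the multiple-regression weights are hypothetical numbers chosen only for illustration.

```python
def predict(constant, weights, scores):
    """Return constant + b1*x1 + b2*x2 + ... + bn*xn for one case."""
    if len(weights) != len(scores):
        raise ValueError("one weight is needed per predictor score")
    return constant + sum(b * x for b, x in zip(weights, scores))


# Bivariate case: math ability = 8.4351 + 0.7245 * attitude in math
print(predict(8.4351, [0.7245], [45]))

# Multiple case with two predictors (hypothetical weights and scores)
print(predict(10.0, [0.5, 0.25], [80, 90]))
```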

Onwuegbuzie, A. J., Bailey, P, & Daley, C. E. (2000). Cognitive, affective, personality, and demographic predictors of foreign-language achievement. The Journal of Educational Research, 94, 3-15.

[Figure: predictor model]
Cognitive: Academic Ach., Study Habits, Expectation
Affective: Perception, Anxiety
Personality: Cooperativeness, Competitiveness
Demographic: Gender, Age
Criterion: Foreign Language Achievement

Espin, C., Shin, J., Deno, S. L., Skare, S., Robinson, S., & Brenner, B. (2000). Identifying indicators of written expression proficiency for middle school students. The Journal of Special Education, 34, 140-153.

[Figure: predictor model]
Predictors: Words written; Words correct; Characters; Sentences; Character/Word; Word/sentences; Correct word sentences; Incorrect word sentences; Correct minus incorrect word sentences; Mean length of correct word sentences
Criterion: Written Expression Proficiency

Results
Regression coefficient (β) / beta weight: the distinct contribution of a variable, excluding any overlap with other predictor variables.
Unstandardized: the simple regression coefficient.
Standardized regression coefficient: the variables (independent and dependent) are converted to z-scores before doing the regression. It indicates which independent variable has the most effect on the dependent variable.

Results
Multiple correlation coefficient (R): the correlation between the criterion variable and all the predictor variables taken together.
Squared correlation coefficient (R²): the percent of variance in the dependent variable explained collectively by all of the independent variables.
R² adjusted: assesses the goodness of fit of a regression equation, that is, how well the predictors (regressors), taken together, explain the variation in the dependent variable.
R²adj = 1 - (1 - R²)(N - 1)/(N - n - 1), where N is the number of cases and n is the number of predictors.

R²adj above 75% is very good; 50-75% is good; 25-50% is fair; below 25% is poor and perhaps unacceptable. R²adj values above 90% are rare in psychological data.
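As a worked illustration, the sketch below computes adjusted R² using the usual definition, 1 - (1 - R²)(N - 1)/(N - n - 1), and then applies the rule-of-thumb bands above. The function names and the example values (R² = .50, 16 cases, 3 predictors) are hypothetical.

```python
def adjusted_r2(r2, n_cases, n_predictors):
    """Adjusted R squared: shrinks R squared to account for the number of predictors."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


def rate_fit(r2_adj):
    """Verbal rating of an adjusted R squared given as a proportion (0 to 1)."""
    if r2_adj > 0.75:
        return "very good"
    if r2_adj >= 0.50:
        return "good"
    if r2_adj >= 0.25:
        return "fair"
    return "poor"


# Hypothetical example: R squared of .50 from 16 cases and 3 predictors
value = adjusted_r2(0.50, n_cases=16, n_predictors=3)
print(f"adjusted R squared = {value:.3f} ({rate_fit(value)})")
```

With few cases and several predictors the adjustment is noticeable; with a large sample the adjusted value stays close to R².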

Residual: the deviation of a particular point from the regression line (its predicted value).
t-tests: used to assess the significance of individual b coefficients.
F test: used to test the significance of R. F = [R²/k]/[(1 - R²)/(n - k - 1)], where k is the number of predictors and n is the number of cases.
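The F formula can be sketched in a few lines. The function name and the example values are hypothetical; the result would then be compared against a critical F with k and n - k - 1 degrees of freedom from a table or statistical software.

```python
def f_statistic(r2, n, k):
    """F for testing R: [R2 / k] / [(1 - R2) / (n - k - 1)]."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical example: R squared of .50 from n = 16 cases and k = 3 predictors
n, k = 16, 3
f_value = f_statistic(0.50, n, k)
print(f"F({k}, {n - k - 1}) = {f_value:.2f}")
```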

Considerations in using multiple regression:


The units (usually people) observed should be a random sample from some well-defined population.
The dependent variable should be measured on an interval, continuous scale.
The independent variables should be measured on interval scales.

Considerations in using multiple regression:


The distributions of all the variables should be normal.
The relationships between the dependent variable and the independent variables should be linear.
Although the independent variables can be correlated, there must be no perfect (or near-perfect) correlations among them, a situation called multicollinearity.

Considerations in using multiple regression:


There must be no interactions (in the ANOVA sense) between independent variables.
A rule of thumb for testing b coefficients is to have N >= 104 + m, where m = number of independent variables.
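The sample-size rule of thumb is easy to apply when planning a study. The sketch below is a hypothetical helper, not part of the original slides.

```python
def minimum_n(m):
    """Smallest N that satisfies the rule of thumb N >= 104 + m for m independent variables."""
    return 104 + m


def sample_is_large_enough(n, m):
    """True when a sample of n cases meets the rule of thumb for m independent variables."""
    return n >= minimum_n(m)


print(minimum_n(3))                    # three predictors
print(sample_is_large_enough(150, 2))  # 150 cases, two predictors
print(sample_is_large_enough(16, 3))   # 16 cases, three predictors
```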

Reporting regression results:


The data were analyzed by multiple regression, using as regressors age, income and gender. The regression was a rather poor fit (R² adjusted = 40%), but the overall relationship was significant (F(3, 12) = 4.32, p < 0.05). With other variables held constant, depression scores were negatively related to age and income, decreasing by 0.16 for every extra year of age, and by 0.09 for every extra pound per week of income. Women tended to have higher scores than men, by 3.3 units. Only the effect of income was significant (t(12) = 3.18, p < 0.01).

Do the exercise for the multiple regression case

Demonstrate how multiple regression is done using Statistica

Partial Correlation
Partial correlation, in its squared form, is the percent of variance in the dependent variable uniquely and jointly attributable to the given independent variable when the other variables in the equation are controlled.
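The slide gives no formula, so the sketch below uses the standard textbook formula for a first-order partial correlation (one controlled variable) as an assumption: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). The function name and the three correlations are hypothetical.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with the third variable z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations among a predictor x, a criterion y, and a controlled variable z
r = partial_r(r_xy=0.5, r_xz=0.3, r_yz=0.4)
print(f"partial r = {r:.4f}, squared = {r ** 2:.4f}")
```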

Stepwise Regression
y = β0 + β1x1 + β2x2 + ... + β14x14 + e
Choose a subset of the independent variables which "best" explains the dependent variable.

Stepwise Regression
1) Forward Selection
Start by choosing the independent variable which explains the most variation in the dependent variable.
Choose a second variable which explains the most residual variation, and then recalculate the regression coefficients.
Continue until no variables "significantly" explain residual variation.

Stepwise Regression
2) Backward Selection
Start with all the variables in the model, and drop the least "significant", one at a time, until you are left with only "significant" variables.
3) Mixture of the two
Perform a forward selection, but drop variables which become no longer "significant" after the introduction of new variables.

Hierarchical Regression
The researcher determines the order of entry of the variables.
F-tests are used to compute the significance of each added variable (or set of variables) to the explanation reflected in R-square.
This is an alternative to comparing betas for purposes of assessing the importance of the independent variables.

Show study on Asian values and epistemological beliefs

Categorical Regression
Used when there is a combination of nominal, ordinal, and interval-level independent variables.

Workshop
1. Propose a hypothesis of a study that you can do in your school.
2. Explain some background about your hypothesis (keep it short; just explain it in the presentation).
3. Indicate your plan for the respondents, instruments, procedure, and data analysis.
4. Indicate how the information will be useful in your school.
