5.1 Correlation
We are interested in how two or more attributes of individuals or objects in a population are related to one another. A scatterplot of bivariate numerical data gives a visual impression of how strongly the x and y values are related. A correlation coefficient is a quantitative assessment of the strength of the relationship between x and y.
$$r = \frac{\sum z_x z_y}{n - 1}$$
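Pearson's sample correlation coefficient r is by far the most commonly used correlation coefficient. In the formula, z_x and z_y are the z-scores (standardized values) of x and y, and n is the number of (x, y) pairs.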
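As a minimal sketch with made-up data, the formula can be computed directly in Python: standardize each value using the sample mean and sample standard deviation, multiply the paired z-scores, and divide the sum by n - 1. The function name `pearson_r` and the data are illustrative assumptions, not part of these notes.

```python
import math

def pearson_r(x, y):
    """Pearson's sample correlation r = sum(z_x * z_y) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (divisor n - 1), matching the n - 1 in r
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Standardize each value: z = (value - mean) / standard deviation
    z_x = [(xi - mean_x) / s_x for xi in x]
    z_y = [(yi - mean_y) / s_y for yi in y]
    # Sum the products of paired z-scores, then divide by n - 1
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

If either variable is constant, its standard deviation is 0 and r is undefined; this sketch then raises `ZeroDivisionError`.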
Example: for n = 6 pairs whose products z_x z_y sum to 0.25,
$$r = \frac{\sum z_x z_y}{n - 1} = \frac{0.25}{6 - 1} = 0.05,$$
a very weak positive linear relation.
Create a scatterplot using Excel:
1. Highlight the input data.
2. Click Insert.
3. Click Scatter and choose the scatterplot type.
Excel creates the scatterplot. We can use Chart Layouts to change the layout or add titles.
The Misery Index = the inflation rate + the unemployment rate.
The Revised Misery Index = the inflation rate + 2 × the unemployment rate.
Using inflation, unemployment, and suicide rates for 1958 to 1992, the researchers reported a Pearson correlation of .97 between the Misery Index and the suicide rate, and a Pearson correlation of .61 between the Revised Misery Index and the suicide rate.
Conclusion: Although there is a positive relationship between suicide rate and both indexes, the relationship is much stronger for the original index than for the revised index.
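The two index definitions are simple formulas; a minimal Python sketch of them follows. The function names and the example rates (in percent) are hypothetical, chosen only to show that the revised index weights unemployment twice as heavily.

```python
def misery_index(inflation_rate, unemployment_rate):
    """Misery Index = inflation rate + unemployment rate."""
    return inflation_rate + unemployment_rate

def revised_misery_index(inflation_rate, unemployment_rate):
    """Revised Misery Index = inflation rate + 2 * unemployment rate."""
    return inflation_rate + 2 * unemployment_rate

# Hypothetical rates in percent, for illustration only
print(misery_index(3.0, 5.5))          # 8.5
print(revised_misery_index(3.0, 5.5))  # 14.0
```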
[Data table and scatterplot: mare weight and foal weight, observations 1 to 15]
The scatterplot indicates that there is almost no linear relation between foal weight and mare weight.
Exercise: How does the average finish time (in minutes) in a marathon vary with age group for female participants?
Age Group | Representative Age (x) | Average Finish Time (y, minutes)
10-19 | 15 |
20-29 | 25 |
30-39 | 35 |
40-49 | 45 |
50-59 | 55 |
60-69 | 65 |
Let x = representative age and y = average finish time. Construct a scatterplot and find r. Is there a strong linear relation between age and average finish time?
Answer: r = 0.038477, so there is a very weak linear relation between age and average finish time.
In Excel's Data Analysis dialog (Data tab), choose Regression.
In the Regression dialog box, enter the Input Y Range first (B2:B6) and then the Input X Range (A2:A6). Optionally, choose an Output Range.
Excel produces a summary with a lot of information. (You may widen the columns for a better view.) For the least-squares line, we need only the Coefficients column: a = Intercept = 101.33 and b = X Variable 1 = -9.30. The least-squares line is ŷ = 101.33 - 9.30x.
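The two coefficients Excel reports can also be computed from the least-squares formulas b = Sxy / Sxx and a = ȳ - b x̄, where Sxy = Σ(x - x̄)(y - ȳ) and Sxx = Σ(x - x̄)². Below is a minimal Python sketch; the function name `least_squares` and the test points are hypothetical, chosen to lie exactly on the line y = 2 + 3x so the result is easy to check.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    Uses b = Sxy / Sxx and a = y_bar - b * x_bar.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared x deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope (Excel: "X Variable 1")
    a = y_bar - b * x_bar    # intercept (Excel: "Intercept")
    return a, b

# Hypothetical points that lie exactly on y = 2 + 3x
a, b = least_squares([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)  # 2.0 3.0
```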
Exercise: Is Age Related to Recovery Time for Injured Athletes? How quickly can athletes return to their sport following injuries requiring surgery? An article reported the data in the table for 10 weight lifters, where x = age and y = number of days after arthroscopic shoulder surgery before being able to return to their sport. Find the least-squares line.
Answer: ŷ = -5.05 + 0.272x
x (age) | 33 | 31 | 32 | 28 | 33 | 26 | 34 | 32 | 28 | 27
y (days) | 6 | 4 | 4 | 1 | 3 | 3 | 4 | 2 | 3 | 2