Professional Documents
Culture Documents
qxd
11/20/2007
9:09 PM
Page 389
34
Learning Objectives:
Terms: reliability, prediction, scatterplot, bivariate, strength, positive direction, negative direction, linear, curvilinear, outliers
Distinguish between correlational and experimental studies Create a scatterplot Estimate the strength and direction of a set of data based on its scatterplot Understand the effect of outliers on the strength and direction of a correlation
389
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 390
390
Most of the time, correlational studies are used for prediction rather than for establishing the reliability of the tests themselves. That is, we seek to establish relationships so that the score of a person on one variable can be used to predict that persons probable score on a second variable. For example, once a relationship is established between the number of hours children watch television and childrens academic performance in school, we can predict any given childs probable academic performance in school just by knowing the number of hours of television he or she watches. Similarly, a researcher interested in prediction may want to know the relationship between the amount of time students study and their grade on a test, the amount of antidepressant medication clients take and their reported mood level, air temperature and crime rate, income and years of education, height and weight, and IQ and shoe size (do you think there is any relationship?).
CHECK YOURSELF! Complete the following table to compare and contrast studies analyzed by three different statistics. Purpose of Study or Type of Conclusion to Be Drawn
No. of Variables
PRACTICE
1. Which of these studies might be analyzed with a correlation? a. The relationship between birth order and academic achievement b. The number of inches babies grow when breast-fed versus bottle-fed c. Whether the amount of time runners practice is related to their running times during a tournament d. Which of three treatments best helps hyperactive children stay on task
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 391
391
2. Which of these studies might be analyzed with a correlation? a. The cost of various cars when new and the amount of time before those cars need repair b. The number of illnesses reported per year by vegetarians versus nonvegetarians c. The duration of fever for viral versus bacterial infections d. The number of miles dieters walked and the amount of weight they lost
Figure 34.1
Drawing a Scatterplot
Assume that 10 students receive the following scores on Quiz 1 and Quiz 2. The scatterplot for this set of data is displayed in Figure 34.2, with the location of the first student, Abby, indicated. Student Abby Babs Clyde DeShawn Emily Fred Gino Hortense Ingrid Jorge Quiz 1 9 7 10 9 5 6 10 9 6 10 Quiz 2 10 8 7 8 5 4 8 6 6 9
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 392
392
12 11 10
Figure 34.2
PRACTICE
3. Create a scatterplot of the following heights and weights of ninth-grade girls: Subject Quashonna Ronette Sherelle Telise Ulinda Vonnie Winona Xelei Yvette Zoe Height 66 68 62 60 67 60 68 70 63 69 Weight 132 121 105 107 130 110 136 148 96 154
4. Create a scatterplot of the following data on the average annual household income and the average cost of a three-bedroom, 1-bath home (both to the nearest thousand dollars) in 10 geographic areas: Area 1 2 3 4 5 6 7 8 9 10 Income 32 46 29 67 42 78 35 52 39 43 House 104 232 95 291 165 482 128 347 119 143
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 393
393
Relationship Strength
The relationship between two sets of scores has two characteristics: strength and direction. Lets look at both strength and direction in more detail. The strength of a relationship tells the degree to which scores on one variable are related to scores on the other variable. Strength is expressed from .00 to 1.00. The higher the numerical value (regardless of sign), the stronger the relationship. A correlaMathematics is the tion of .82 is strong, while a correlation of .13 is weak. In a perfect relationship, science of patterns. all data points fall along a straight line. For example, by knowing the temperature on the Celsius scale, we can exactly predict the temperature on the Fahrenheit Lynn Arthur Steen scale. Thus, the correlation between Celsius and Fahrenheit is 1.00. A correlation of .00, at the other extreme, indicates no relationship. For example, there is no relationship between adult IQ and shoe size. Adults with high, medium, or low IQs are equally likely to have small, medium, or large shoe sizes. Thus, the data points fall in a circular blob. Here is how these two scatterplots would look (Figure 34.3).
14 13
Degrees Fahrenheit
Zero relationship
Although we sometimes find perfect relationships in the physical sciences, relationships are rarely perfect in the social sciences. With human beings, most variables are only moderately related. SAT score predicts freshman GPA, for example, but only moderately. Other variables such as motivation or test anxiety come into play. Thus, some people who score very high on the SAT do poorly in college, and some people who score low on the SAT do very well in college. Similarly, the number of absences during a semester predicts grade on a final exam but only moderately. Some students who never miss a class do poorly on the final exam, and some students who are frequently absent do well on the final exam. Other variables such as prior knowledge, diligence in reading the textbook, and obtaining missed notes come into play. The data for most social science relationships fall neither on a straight line nor in a circular blob. Rather, they fall in an ellipse. The thinner the ellipse, the closer the points fall to a straight line and, hence, the stronger (closer to 1.00) the relationship. The wider the ellipse, the farther the points fall from a straight line and, hence, the weaker (closer to .00) the relationship. Figure 34.4 shows two typical moderate relationships in the social sciences.
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 394
394
4 Absences
0 60 65 70 75 80 85 90 95 100
Figure 34.4
Relationship Direction
The direction of a relationship tells whether or not the values on two variables go up and down together. Direction is indicated by a positive or a negative sign. If two variables are positively correlated, then as the values on one variable go up, so do the values on the other variable. For example, the relationship between SAT score and freshman college GPA is positive. Thus, students who receive higher scores on the SAT are more likely to receive higher grades during their freshman year of college. You can see this pattern in the upper scatterplot in Figure 34.4. Note the direction of the data points. With a positive relationship, the data points go from the bottom left to the upper right.
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 395
395
If two variables are negatively correlated, then as the values of one variable go up, the values of the other variable go down. For example, the relationship between the number of absences in a course and score on the final exam in that course is negative. He seemed to have very In other words, the more often students are absent from class, the lower strong intuitions but their grades tend to be on the final exam. You can see this pattern in the unfortunately of negative sign. lower scatterplot in Figure 34.4. Note the direction of the data points. With a negative relationship, the data points go from the upper left to the lower the biologist Francis Crick, right. referring to Ren Thom, in Lets return to the set of quiz scores from the beginning of this module. the book What Mad Pursuit Judging from the scatterplot shown in Figure 34.5, what is the approximate strength of the relationship? What is the direction of the relationship?
12 11 10 9 Score on Quiz 2 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 Score on Quiz 1
Figure 34.5
The relationship is moderate because the points form an ellipse about halfway between a straight line and a circle. And it is positive because the general trend is from lower left to upper right. That is, as the scores on Quiz 1 go up, the scores on Quiz 2 also go up.
CHECK YOURSELF! What two terms are used to describe a correlational relationship? What does each term indicate?
PRACTICE
5. For each of the following, indicate whether the expected relationship between the two variables will be positive (+), negative (), or zero (0): ____ a. The average number of calories eaten per day and body weight ____ b. Running speed and general physical condition ____ c. Length of hair and introversion ____ d. The amount of education and the time spent on welfare ____ e. Per capita consumption of alcohol and suicide rate
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 396
396
6. For each of the following, indicate whether the expected relationship between the two variables will be positive (+), negative (), or zero (0): ____ a. Air temperature and the amount of snow on the ground ____ b. The number of minutes of exercise per day and score on a physical fitness test ____ c. The number of years since having a drivers license and age ____ d. The number of pages in a textbook and cost of that textbook ____ e. Age at which a child takes its first step and educational level of the parents
4 Absences
0 60 65 70 75 80 85 90 95 100
Figure 34.6
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 397
397
Not all relationships are linear. Some are curvilinear. In a curvilinear relationship, the trend in the data changes direction. For example, the relationship between test score and test anxiety is curvilinear. High levels of test anxiety impair test performance, but so do low levels of test anxiety. The best test performance occurs among test takers having moderate anxiety. Thus, the relationship takes on an inverted U shape, as you can see in Figure 34.7.
Test score
Anxiety level
Figure 34.7
An Inverted U Relationship
The relationship mentioned at the beginning of this module between the amount of antidepressant medication taken and reported mood level is also curvilinear. You can see this pattern in Figure 34.8.
Mood
Dosage
Figure 34.8
An S-Shaped Relationship
Note from the graph that mood elevates sharply with initial dosage, then elevates more slowly with a further increased dosage, and finally levels off at a still higher dosage. Thus, the relationship takes on a leaning S shape.
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 398
398
If we try to fit a straight line to curvilinear data, many data points will fall off the line. This is especially true for the relationship between test score and test anxiety. However, if we fit a curved line to the data, the data points hug the line quite closely. Figure 34.9 shows how a curved line fits the data better than a straight line does.
Test score
Anxiety level
Mood
Dosage
Figure 34.9
Recall that the closer data fall to the line, the stronger the relationship. Because the data in Figure 34.9 closely hug a curved line, fitting a curved line to the data will yield a relatively high correlationsomething close to 1.00. Conversely, because the same data fall quite far from a straight line, fitting a straight line to the data will yield a relatively low correlation something close to .00. This is why you should always plot your data before calculating any statistic. As you will learn in Module 35, some correlation statistics are linear, and some are curvilinear. Using a linear statistic to describe curvilinear data will seriously underestimate the amount of correlation.
34-Steinberg-45460.qxd
11/20/2007
9:09 PM
Page 399
399
CHECK YOURSELF! How can you tell if a relationship is linear or curvilinear? Why does the shape of the data matter when calculating a correlation?
Figure 34.10 Data Without and With an Outlier For the most part, a straight line fits the data well. However, the outlier pulls the line in the direction of the outlier, as shown in the lower example in Figure 34.10. When the line is pulled toward the outlier, the remaining points then fall farther from the line than they otherwise would have. The fit is worse; hence, the correlation is lower. Outliers lower the amount of correlation from what it would be without the outliers. Often an outlier is due to error: a mismarked answer paper, a mistake in entering a score in a database, a subject who misunderstood the directions, and so on. You should always seek to understand the cause of an outlying score. If the cause is not legitimate, you should eliminate the outlying score from the analysis. Leaving an incorrect or errant score in the analysis distorts the true correlation.
34-Steinberg-45460.qxd
11/21/2007
12:14 PM
Page 400
400
At other times, outliers are legitimate. One person may legitimately score far beyond everyone else on the variable being measured. Nevertheless, if the outlier seriously distorts interpretation of the remaining data, it is customary to present the data with and without the outlier, state that you are eliminating it, and then conduct further analyses with the outlier removed.
CHECK YOURSELF! How can you tell if a set of data includes outliers? Why does the presence of outliers matter when calculating a correlation?
Looking Ahead
So far, we have merely estimated the strength and direction of relationships through visual inspection of the data. In the next module, we will calculate a correlation coefficient. Then, we will be able to state the exact strength and direction of a relationship.
PRACTICE
7. Here are the quiz scores for 6 students on each of three pop quizzes: Quiz 1 Ann Brianna Chara Diane Eliza Felice 11 9 8 6 4 3 Quiz 2 1 7 9 10 11 12 Quiz 3 5 9 12 10 6 6
a. Create three separate scatterplots: Quiz 1 and Quiz 2, Quiz 1 and Quiz 3, and Quiz 2 and Quiz 3. b. Describe the direction and apparent strength of the relationship in each scatterplot. Comment on the effect of outliers, if any. 8. Gather data from 10 adults (perhaps from students in your statistics class) for height and shoe size. a. Create a scatterplot for the data. b. Describe the direction and apparent strength of the data. Comment on the effect of outliers, if any. 9. Gather data from 10 adults (perhaps from students in your statistics class) for the number of hours of television watched on an average day and the typical number of hours of sleep obtained in an average night. a. Create a scatterplot for the data. b. Describe the direction and apparent strength of the data. Comment on the effect of outliers, if any.
Visit the study site at www.sagepub.com/steinbergsastudy for practice quizzes and other study resources.