Lab Activity #13: Correlation and Regression

STAT 1350

Please read carefully through this handout and answer each question as completely as you can.
This material relates to what you have learned from Chapters 14 and 15.

There are 13 questions in this activity. You will need to submit the answers to these questions no
later than 11:55 p.m. on Friday, April 19th. You will share the answers via a Word or PDF file
that you submit through the course website (either by going to the “Submit” link within the
Week 14 Overview, next to the place where you downloaded this lab activity handout, or by
going to the Assignments link on the left side of the course page and clicking on Lab Activity
#13). Note that we encourage you to ask questions and help each other through this activity. A
special discussion forum can be found on the course site for questions that relate specifically to
the content of Week 14 (see the link within the Week 14 Overview, or go to the
Discussions link on the left side of the course page).

Your answers to each question do not have to be long, but they should be as complete as
possible. Aim to be concise but thorough in your answers! You should also be sure to type your
name at the top of your assignment so we can easily see it. As always, you will be graded based
on completeness and the correctness of a random selection of problems that we choose to grade.
For this reason, please try to complete the entire lab, and ask for help if you get stuck along the way!

If any problems require calculations, please attempt to write out how you arrived at your
answer so we can see your thought process.

Part 1: Predicting the length of pregnancy

Suppose that longevity is something we have a good deal of information about and typically
understand well, but the gestation period (or the length of pregnancy) of mammals is less clear.
We therefore want to think about how we might be able to predict or explain gestation period
(especially if we uncover new species of mammal) based on longevity.

The scatterplot below shows the relationship between longevity (measured in years) and gestation
(measured in days) for a sample of 38 mammals.

The correlation between gestation and longevity is r = 0.59. The regression equation to predict gestation
based on longevity is:

Predicted gestation = 19.66 + 12.68 (longevity)

Use this information to answer the following questions.
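
For illustration only, here is a minimal Python sketch of how a regression equation like the one
above turns a longevity value into a predicted gestation. The function name predicted_gestation
and the 10-year longevity are hypothetical choices made for this example and do not come from the
handout; only the intercept (19.66) and slope (12.68) are taken from the equation above.

```python
# Intercept and slope copied from the regression equation in the handout.
INTERCEPT = 19.66   # predicted gestation (days) when longevity is 0 years
SLOPE = 12.68       # change in predicted gestation (days) per extra year of longevity


def predicted_gestation(longevity_years):
    """Return the predicted gestation in days for a given longevity in years."""
    return INTERCEPT + SLOPE * longevity_years


# Hypothetical example value (not one of the lab questions).
example_longevity = 10
print(f"Predicted gestation: {predicted_gestation(example_longevity):.2f} days")
# prints "Predicted gestation: 146.46 days"
```

The same substitution can be done by hand: multiply the longevity by the slope, then add the
intercept.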

1. From what you can see in the scatterplot, and what you know about the correlation between the
variables, how would you describe the form, direction, and strength of this relationship?

Form:

Direction:

Strength:

2. Remember that longevity is measured in years and gestation is measured in days. Given this
information, what units would the correlation coefficient, or r, have?

3. Look carefully at the regression equation that appears below the scatterplot on the previous page.

A. What is the slope? Please interpret this value in the context of the problem.

B. What is the intercept? Please interpret this value in the context of the problem.

C. What would you predict gestation to be for a mammal with a longevity of 18 years?

4. ____________% of the variability in gestation can be explained by the regression equation and
____________% of the variability in gestation cannot be explained by the regression equation.

5. A human is a mammal, and a human has an average longevity of over 70 years. Should we use
the regression equation to predict the gestation period of a human? Please explain why or why
not.

6. Suppose you switched “x” and “y” in this example. In other words, rather than have x = longevity
and y = gestation, you decide that x should be gestation and y should be longevity. If you make
this switch, would you expect the correlation coefficient, or r, to change? Please explain.

Part 2: More reasoning about scatterplots, correlation, and regression

7. The correlation between height (measured in inches) and weight (measured in pounds), for a
sample of 87 4th-grade students, is r = 0.65. If you decide it would be better to express height in
centimeters instead of inches, the correlation between height (measured in centimeters) and
weight (measured in pounds), for the same sample of 87 4th-grade students, would be r = 0.65.

A. True B. False

8. The correlation of r = 0.9 is stronger than the correlation of r = -0.9.

A. True B. False

9. The scatterplot below shows the relationship between the number of hours a student typically
works each week (hrswork) and the number of hours of television the student typically watches
each week (hrstv), for a random sample of 50 college students. It turns out that 29.16% of the
variability in hrstv can be explained by the regression equation. This means the correlation
between hrswork and hrstv must be equal to what value?

10. Midwest Office Equipment sells copiers and performs maintenance and repair services on their
copiers. A random sample of 45 service calls was selected. For each call, the number of copiers
serviced and the total service time (in minutes) were recorded. This information is presented in
the scatterplot below. Suppose one company has 4 copiers serviced, and the total service time
ends up being 125 minutes. If we added this new observation to our analysis, how would this
affect the strength of the relationship between Number of Copiers Serviced and Total Service
Time?

A. The correlation between the variables would get weaker.
B. The correlation between the variables would get stronger.
C. The correlation between the variables would not change at all.

11. The gas mileage of an automobile first increases and then decreases as the speed increases.
Suppose that this relationship is very regular, as shown by the following data on speed (miles per
hour) and mileage (miles per gallon):

Speed (mph)     30   40   50   60   70
Mileage (mpg)   20   24   26   24   20

A scatterplot of the above data is shown below.

A. Based on what you see in the scatterplot above, describe the relationship between speed and
mileage.

B. The correlation between speed and mileage is actually r = 0. Explain why the correlation is 0
even though there is obviously a relationship between speed and mileage.
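
The value r = 0 stated in part B can be checked directly from the five data points. The sketch
below computes the correlation coefficient from its usual definition in plain Python. The helper
name pearson_r is an assumption made for this example and is not part of the course materials.

```python
import math

# Data copied from the speed/mileage table in question 11.
speed = [30, 40, 50, 60, 70]      # miles per hour
mileage = [20, 24, 26, 24, 20]    # miles per gallon


def pearson_r(xs, ys):
    """Compute the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of r).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(speed, mileage)
print(f"r = {r:.4f}")  # zero, up to floating-point rounding
```

Printing the individual products of deviations inside pearson_r is one way to see how the terms
combine for these data.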

12. A linear regression analysis reveals a strong, negative linear relationship between x and y. Which
of the following could possibly be the results from this analysis? Note we are using the symbol “ŷ”
to stand for “Predicted y.”

A.

B.

C.

D.

E.

13. The scatterplot below shows the selling price (in dollars) of a house versus the living area (in
square feet). Which one of the following statements is false?

A. There is a positive linear relationship between selling price and living area.
B. The house that is around 1050 square feet sold for about $305,000.
C. An increase in living area causes an increase in selling price.
D. If we predict the selling price using a regression equation, it would be extrapolation if we tried to
predict prices for homes smaller than 1000 square feet or larger than 4000 square feet.
