Chapter 4 Project

The topic we chose to collect our data on is the number of people in a student's
household and the number of bedrooms in the student's house. We chose this topic because we
thought these two variables would be related. We predicted that the more people who lived in a
household, the more bedrooms there would be in the household. Likewise, households with fewer
people would have fewer bedrooms. We collected our data using a systematic approach, surveying
every fourth person in the freshman and junior classes and every third person in the sophomore
and senior classes. The explanatory variable (x) is the number of people in the household. The
response variable (y) is the number of bedrooms in the household.
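A systematic sample like the one described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure we actually ran: the function name `systematic_sample` and the class rosters are hypothetical, and it assumes each class list is in a fixed order before every k-th student is selected.

```python
import random


def systematic_sample(roster, k, start=None):
    """Select every k-th person from an ordered roster.

    If start is None, a random starting index in [0, k) is chosen,
    which is a common way to begin a systematic sample.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if start is None:
        start = random.randrange(k)
    if not 0 <= start < k:
        raise ValueError("start must be between 0 and k - 1")
    return roster[start::k]


# Hypothetical rosters; the real class lists are not part of this report.
freshmen = [f"F{i}" for i in range(1, 21)]    # 20 freshmen
sophomores = [f"S{i}" for i in range(1, 19)]  # 18 sophomores

# start = k - 1 picks the 4th, 8th, ... (or 3rd, 6th, ...) person on the list
print(systematic_sample(freshmen, 4, start=3))    # ['F4', 'F8', 'F12', 'F16', 'F20']
print(systematic_sample(sophomores, 3, start=2))  # ['S3', 'S6', 'S9', 'S12', 'S15', 'S18']
```

Using k = 4 for freshmen and juniors and k = 3 for sophomores and seniors matches the sampling rule described above; the starting position is an assumption.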
Our scatter plot displays the relationship between the number of individuals who live in a
household and the number of bedrooms in that house. The scatter plot shows a positive
correlation between the two variables. Positive correlation means that greater x-values tend to go
along with greater y-values, while lower x-values tend to go along with lower y-values. Our data
set has a moderate positive correlation, with an R-value of .575. The R-value (the correlation
coefficient r) measures the strength and direction of the linear relationship across all the data
points in a set. The closer r is to positive 1, the stronger the positive correlation. If r is 0, there
is no linear correlation, and the closer r is to negative 1, the stronger the negative correlation.
The relationship is summarized on the graph by a least-squares regression line. Our regression
line has the equation y = 0.369x + 2.018.
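The report does not list the raw survey responses, so the sketch below cannot reproduce our r = .575 or our regression line; it only shows the standard least-squares formulas behind those numbers, applied to made-up points. The function name `least_squares` is hypothetical.

```python
import math


def least_squares(xs, ys):
    """Return (r, slope, intercept) for paired data using least-squares formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # spread of x
    syy = sum((y - y_bar) ** 2 for y in ys)                       # spread of y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation of x and y
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take more than one value")
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient, between -1 and 1
    slope = sxy / sxx                   # change in predicted y per unit of x
    intercept = y_bar - slope * x_bar   # makes the line pass through (x_bar, y_bar)
    return r, slope, intercept


# Hypothetical points lying exactly on y = 2x, so r is 1.
print(least_squares([1, 2, 3], [2, 4, 6]))  # (1.0, 2.0, 0.0)
```

Applied to our actual (x, y) pairs, these formulas would be expected to give the R-value and line reported above.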
Our X-bar value for our data set is 5.04, and our Y-bar value is 3.87. These two numbers tell
us that the average number of people per household is about 5, while the average number of
bedrooms is about 4. The X-bar and Y-bar values are the averages of the two separate sets of
data: X-bar is the average of all the x-values, and Y-bar is the average of all the y-values.
The significance of these values is that the point (X-bar, Y-bar) = (5.04, 3.87) always lies on
the least-squares regression line. The marginal change is the number of units the response
variable (y-values) changes for each one-unit change in the explanatory variable (x-values); it
equals the slope of the regression line. For our line, the marginal change is 0.369, so each
additional person in a household goes along with about 0.369 more bedrooms.
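Both facts can be checked directly against our regression equation. In this minimal sketch, `predict_bedrooms` is a hypothetical helper name; the slope, intercept, X-bar, and Y-bar are the values reported above.

```python
SLOPE = 0.369      # from our regression line y = 0.369x + 2.018
INTERCEPT = 2.018


def predict_bedrooms(people):
    """Predicted number of bedrooms for a household of the given size."""
    return SLOPE * people + INTERCEPT


x_bar, y_bar = 5.04, 3.87

# The line passes through (X-bar, Y-bar), up to rounding of the reported values.
print(round(predict_bedrooms(x_bar), 2))   # 3.88

# Marginal change: increase in predicted y when x goes up by one.
print(round(predict_bedrooms(6) - predict_bedrooms(5), 3))  # 0.369
```

The small gap between 3.88 and 3.87 comes from the slope, intercept, and means each being rounded.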
From our data set about the number of people in a household (x) and the number of
bedrooms (y), there were two influential points: one at (4,6) and the other at (5,6). These
points represent households of 4 and 5 people, respectively, each living in a house with 6
bedrooms. These two influential points could be explained by a lurking variable: households that
have several children who have gone to college and moved out of the house. An influential point
is one that, if removed, would substantially change the equation of the least-squares line or
other calculations associated with the linear regression. Our two influential points would change
the equation if removed because they lie far from the regression line.
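One way to put a number on how far these two points sit from the line is to compute their residuals (observed y minus predicted y) from our equation y = 0.369x + 2.018. This is a minimal sketch with a hypothetical `residual` helper. A large residual shows that a point is far from the line; confirming how much removing it would change the equation would require refitting without it, which needs the full data set that is not listed in this report.

```python
SLOPE, INTERCEPT = 0.369, 2.018  # our regression line


def residual(point):
    """Observed y minus the y predicted by the regression line."""
    x, y = point
    return y - (SLOPE * x + INTERCEPT)


for point in [(4, 6), (5, 6)]:
    print(point, round(residual(point), 3))
# (4, 6) 2.506
# (5, 6) 2.137
```

Both points have about 2 to 2.5 more bedrooms than the line predicts.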
One other value influenced by those points is the R-squared value. Our R-squared value is .331,
which is the square of our R-value (.575 squared is about .331). This value helps us identify the
explained and unexplained variation in our data set. The explained variation is 33.1%, and the
unexplained variation is 66.9%. The explained variation means that 33.1% of the variation in the
y-values can be explained by the corresponding x-values using the regression line. The
unexplained variation of 66.9% is due to random chance or to possible lurking variables.
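The arithmetic linking r, R-squared, and the explained and unexplained percentages is short enough to check directly. This sketch uses only the R-value reported above; the variable names are illustrative.

```python
r = 0.575                  # our correlation coefficient
r_squared = r ** 2         # coefficient of determination

explained_pct = round(r_squared * 100, 1)          # share of y-variation explained by the line
unexplained_pct = round((1 - r_squared) * 100, 1)  # share left over

print(round(r_squared, 3))              # 0.331
print(explained_pct, unexplained_pct)   # 33.1 66.9
```

The two percentages always add to 100%, since every bit of variation in y is either explained by the line or left unexplained.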
We looked for lurking variables by examining our data and checking whether any variable other
than the number of people in a household and the number of bedrooms in the household could have
affected the data. A lurking variable is a variable that is neither the response nor the
explanatory variable, but may be responsible for changes in the x or y values. One possible
lurking variable in our data is that many students' households have at least one child who is in
college and no longer lives in the house. Households with one or more children who have moved out
to go to college or live on their own have fewer people living in them, but still have the same
number of bedrooms. College students or other household members who have moved out cause some
data values to have fewer people living in the household but more bedrooms. This lurking variable
could be one reason our data had a lower correlation coefficient.
Another lurking variable that could have scattered the data is that many of the households had as
many bedrooms as people living in the house. Before we conducted the study, we expected that in
most cases the number of bedrooms would be one less than the number of people living in the
household, because we predicted the parents would share a bedroom. On the contrary, many
households had equal numbers of people and bedrooms, which made the data more scattered and
produced points like (6,6) that were not close to the regression line.
Interpolation is predicting y-values for x-values that fall between observed x-values in the data
set. The x-value we chose was 4. Even though 4 was already an x-value in our data set, we did not
want to choose a value like 3.5, because there cannot be 3.5 people living in a household. We
substituted x = 4 into the regression equation, and the predicted y-value was 3.494. This means
that when 4 people live in a household, the model predicts 3.494 bedrooms. Obviously, there cannot
be 3.494 bedrooms in a house, so using the model for interpolation is flawed. We found similar
results when using extrapolation to predict y-values for x-values beyond the observed x-values in
the data set. The x-value we chose was 9, and when we substituted x = 9 into our regression
equation, the predicted y-value was 5.339. This tells us that for 9 people living in a household,
the model predicts about 5.339 bedrooms. Clearly, a house cannot have a fractional number of
bedrooms between 5 and 6, so using the model for extrapolation is also flawed.
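Both predictions come from the same substitution into y = 0.369x + 2.018; what differs is whether x lies inside the observed range. In this minimal sketch, `predict_bedrooms` and `prediction_type` are hypothetical helper names, and the list of observed household sizes is made up, since the report only tells us that 4 is inside the range and 9 is beyond it.

```python
SLOPE, INTERCEPT = 0.369, 2.018  # our regression line


def predict_bedrooms(people):
    """Predicted bedrooms from the regression line y = 0.369x + 2.018."""
    return SLOPE * people + INTERCEPT


def prediction_type(x, observed_xs):
    """'interpolation' if x lies within the observed x-range, else 'extrapolation'."""
    if min(observed_xs) <= x <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical household sizes; not the survey's actual x-values.
observed = [2, 3, 4, 5, 6, 7]

for x in (4, 9):
    print(x, prediction_type(x, observed), round(predict_bedrooms(x), 3))
# 4 interpolation 3.494
# 9 extrapolation 5.339
```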
