
Laine Reed AP Stats 6th Kiker 10/2/13

The Linear Regression Project


One day I was thinking about the future and where I would like to go to college, and a sudden question popped into my head: why are there so many schools in California? Probably because so many people live there, and since the weather in California is always perfect, no one ever leaves. So what about other states in the US? A state's population must affect, in some way, the number of colleges and universities in that state. Based on this sudden inquiry, I decided to find the linear relationship between state populations and the number of colleges and universities in each state.

[Scatterplot: Population vs Colleges. x-axis: Population in State (0 to 40,000,000); y-axis: Colleges in State (0 to 1,400)]

As shown in the scatterplot, the explanatory variable (x-axis) of the relationship is the population, and the response variable (y-axis) is the number of colleges. The relationship can safely be set up this way because the number of colleges a specific state has can be explained by its population. When fit with a least squares regression line, this model can be written as the equation:

ŷ = -7.966 + (2.89e-5)x

This equation gives predicted values. Here, ŷ is the predicted number of colleges and x is the population. A linear model is reasonable for this data because the r-value, or correlation coefficient, is .95, meaning there is a very strong, positive linear association. In the context of this data set, the slope shows that for every additional person added to a state's population, the predicted number of colleges increases by 2.89e-5 (about 29 colleges for every additional million people). The y-intercept, in this case -7.966, represents how many colleges the model predicts for a state whose population is zero. The r² value, or coefficient of determination (.9025), shows that 90.25% of the variation in the number of colleges can be explained by the regression line on population.

As you can see on the scatterplot, there is one point (38041430, 1246) that is very far away from the others. Using the 1.5IQR outlier method for both the x and y variables, I found that this point is in fact an outlier in the data set. However, removing it would change the regression equation drastically and weaken the strong positive correlation. If this point were removed, the new equation would be ŷ = 27.273 + (2.385e-5)x, the r-value would be 0.91, and the r² value would be 0.83. Since the equation changes so much, this outlier can be considered an influential point.

Below is a residual plot of the data. It shows how the actual data compares to the predicted data. The points are fairly scattered, and many are relatively close to the horizontal line at zero, which represents the regression line. This means that the data can be represented linearly and that the actual data is very close to the predicted data.
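The slope and correlation figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original analysis: it uses the coefficients and r-values reported in the text, and it assumes the slope of the fit without the outlier is 2.385e-5 per person. The names `FITS` and `summarize` are made up for this example.

```python
# Coefficients and correlations as reported in the text.
# Assumption: the slope of the fit without the outlier is 2.385e-5 per person.
FITS = {
    "all states": {"intercept": -7.966, "slope": 2.89e-5, "r": 0.95},
    "outlier removed": {"intercept": 27.273, "slope": 2.385e-5, "r": 0.91},
}


def summarize(fit):
    """Return the slope per million residents and r squared for one fit."""
    return {
        "colleges_per_million": fit["slope"] * 1_000_000,
        "r_squared": fit["r"] ** 2,
    }


summaries = {name: summarize(fit) for name, fit in FITS.items()}

for name, s in summaries.items():
    print(
        f"{name}: {s['colleges_per_million']:.2f} colleges per million people, "
        f"r^2 = {s['r_squared']:.4f}"
    )
```

Scaling the slope to "per million people" makes the change easier to see: roughly 29 predicted colleges per million residents with every state included, versus roughly 24 once the outlier is dropped. Squaring 0.91 gives about 0.83, which agrees with the r² value reported for the reduced data set.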

[Residual Plot. x-axis: Population (0 to 40,000,000); y-axis: Residuals (-200 to 200)]

To find out whether the least squares regression equation is really accurate and can reasonably be used to predict the number of colleges in a state, we will use it to predict how many colleges Colorado has (assuming we don't already know):

ŷ = -7.966 + (2.89e-5)(5,187,582) = 141.96
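The same prediction can be reproduced in code. This is a minimal sketch using the regression coefficients from the text; the function name `predict_colleges` is made up for illustration, and the observed count of 171 colleges is the figure the report gives for Colorado. The residual is computed as actual minus predicted.

```python
# Least squares coefficients reported in the text.
INTERCEPT = -7.966
SLOPE = 2.89e-5


def predict_colleges(population):
    """Predicted number of colleges for a state with the given population."""
    return INTERCEPT + SLOPE * population


colorado_population = 5_187_582
colorado_actual = 171  # observed number of colleges, as stated in the report

predicted = predict_colleges(colorado_population)
residual = colorado_actual - predicted  # actual minus predicted

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A positive residual means the line predicts fewer colleges than the state actually has, so in Colorado's case the model underestimates.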

This prediction is fairly accurate: the actual number of colleges in Colorado is 171. This makes the residual approximately 29, meaning the least squares regression line underestimates the number of colleges in Colorado. This is pretty good considering the large range of the variable, and it shows that the least squares regression line is a good method of predicting the number of colleges.

This information could be helpful to many people. It would be especially helpful for people who work as professional grant writers. Grant writers write to the government and request to have money loaned to certain colleges and institutions for research. There is more than one grant writer in each state, and they could all use this information to find which states are in need of more college funding, and how many colleges they could loan money to. This is a very important career because without these people writing grants, many colleges and universities would not have the necessary funding for certain types of research.

Population vs. college count results in a very strong linear relationship, and the actual values come close to the predicted values. It's linear because as the population count rises, so does the college count. This increase is a continuous one. If a single state's population skyrockets, then the state will probably build new learning institutions to cater to the sudden influx of people. A linear regression, rather than any other type of regression, is the most appropriate model for this data.

Works Cited

"Colleges and Universities in US by State/ Possession." US Colleges and Universities Directory. N.p., n.d. Web. 2 Oct. 2013.

"Top 50 Cities in the U.S. by Population and Rank | Infoplease.com." Infoplease: Encyclopedia, Almanac, Atlas, Biographies, Dictionary, Thesaurus. Free online reference, research & homework help. | Infoplease.com. Pearson Education, n.d. Web. 2 Oct. 2013.