
Scatter Plot Project

Mr. Mun

Period 1

By: Erin Kim


Analysis

I researched how the number of AP classes a student takes correlates with the number of hours they spend on their phone in a day. I conducted my research using a Google Survey with two questions: “How many AP classes are you taking? How many hours do you spend on your phone per day?” Then, I put the link in my Instagram bio and informed my followers by mentioning it on my Instagram Story. Because not many people were taking the survey, I became desperate; therefore, I kindly asked each of my friends via Snapchat to take the survey. After collecting a total of sixty data points from my lovely Instagram followers and Snapchat friends, my graph told me four things: the association is negative, linear, and weak, with one significant outlier at the data point (1, 19), 1 being the number of AP classes taken (x) and 19 being the hours spent on phone (y). The correlation coefficient is -0.24497. The standard deviation of the residuals is 3.44526. The coefficient of determination is 0.06001.
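As a rough sketch of how these three statistics can be computed from paired survey answers, the Python below works through the usual least-squares formulas. The function name `regression_summary` and the five sample points are made up for illustration (they are not my survey data), and dividing by n - 2 for the standard deviation of the residuals is an assumption, since some calculators divide by a different number.

```python
import math

def regression_summary(xs, ys):
    """Least-squares line and summary statistics for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

    # Residual = observed y minus predicted y
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Assumption: divide by n - 2 (standard error of the estimate)
    residual_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r ** 2,
        "residual_sd": residual_sd,
    }

# Made-up sample: (AP classes, hours on phone). NOT the real survey data.
sample_x = [0, 1, 2, 3, 4]
sample_y = [7, 6, 6, 4, 3]
summary = regression_summary(sample_x, sample_y)
print(summary)
```

Running the same function on the sixty real (x, y) pairs should give values comparable to the ones reported above, up to the choice of divisor for the residuals.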

Based on my results, I can conclude that, while correlation does not imply causation, there was a fairly evident negative slope in the graph. In other words, on average, the more AP classes a student took, the fewer hours they spent on their phone. The correlation coefficient was -0.24497; because correlation coefficients range from -1 to +1, and this value is negative and much closer to 0 than to -1 (which would indicate a strong negative correlation), we can say that there was a weak negative correlation. The standard deviation of the residuals, or the standard error of the estimate, measures how far the observed values of the dependent variable (hours spent on phone) typically fall from the regression line. It was 3.44526; therefore, we can say that the regression line shown in the scatter plot is not very accurate at predicting hours spent on phone, as this value is not close to the standard deviation, 1.3707. Lastly, the coefficient of determination was 0.06001; therefore, about 6% of the variance in the response variable can be explained by the explanatory variable. And no, we cannot use a linear model; according to the residual plot, there is an evident pattern in the data points: a negative linear pattern. We can use a linear model only if the residuals are scattered randomly, showing no sign of any pattern.
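The link between the correlation coefficient and the coefficient of determination can be checked with a few lines of Python. This is only a quick arithmetic check using the value of r reported above; the variable names are my own.

```python
# Correlation coefficient reported in the analysis
r = -0.24497

# The coefficient of determination is the square of r
r_squared = r ** 2

# Share of the variance in y explained by x, as a percentage
percent_explained = r_squared * 100

print(round(r_squared, 5))
print(round(percent_explained, 1))
```

Squaring r gives the reported 0.06001 after rounding, which is where the "about 6%" figure comes from.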

SCATTER PLOT (top)


Title: Number of AP Classes Taken vs. Number of Hours Spent on Phone
(x) # of AP classes taken
(y) # of hours spent on phone

RESIDUAL PLOT (bottom)


Title: Number of AP Classes Taken vs. Number of Hours Spent on Phone
(x) # of AP classes taken
(y) # of hours spent on phone

Sample size: 60
Mean x (x̄): 2.1333333333333
Mean y (ȳ): 5.0666666666667
Intercept (a): 6.4779582366589
Slope (b): -0.66154292343387
Regression line equation: ŷ = 6.4779582366589 - 0.66154292343387x
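
As a small illustration of how the regression line is used, the sketch below plugs the outlier (1, 19) into the equation above to get its predicted value and residual, and also checks that the intercept equals the mean of y minus the slope times the mean of x. The helper name `predict_hours` is hypothetical; the numbers are the ones listed above.

```python
# Values listed in the summary above
mean_x = 2.1333333333333
mean_y = 5.0666666666667
intercept = 6.4779582366589
slope = -0.66154292343387

def predict_hours(ap_classes):
    """Predicted hours on phone from the regression line (illustrative helper)."""
    return intercept + slope * ap_classes

# The outlier: 1 AP class, 19 hours on phone
outlier_x, outlier_y = 1, 19
predicted = predict_hours(outlier_x)

# Residual = observed y minus predicted y
residual = outlier_y - predicted

# The least-squares line passes through (mean x, mean y),
# so the intercept should equal mean_y - slope * mean_x
intercept_check = mean_y - slope * mean_x

print(round(predicted, 4))
print(round(residual, 4))
print(round(intercept_check, 6))
```

The line predicts a little under 6 hours for a student taking one AP class, so the observed 19 hours sits about 13 hours above the line, which is why that point stands out as an outlier.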
