Regression Analysis
League Points using Goals Scored and Wins by 3 or more Goals
Scott Johnson

Consider predicting league points in the Barclays Premier League from the number of goals
scored in a season and the number of wins by 3 or more goals (footballzz.com, statto.com).
The data are available on request.
The purpose of this report is to analyze the data using regression in order to identify any
association between the response variable (league points) and each of the explanatory
variables (goals scored and wins by 3 or more goals). From this, we can determine which variable
is better to use in predicting future results.
[Figure: Points in Relation to Goals Scored. League Points (y-axis) against Goals Scored (x-axis), with fitted line y = 1.1157x - 7.6971, R² = 0.8539, R = 0.924.]

[Figure: Points in Relation to Wins by 3+ Goals. League Points (y-axis) against Wins by 3+ Goals (x-axis), with fitted line y = 4.8603x + 38.963, R² = 0.426, R = 0.653.]

The relationship in the Points in Relation to Goals Scored graph appears to be positive, linear,
and fairly strong: as the number of goals increases, the points increase at a roughly constant
rate. The r value is .924, indicating a strong positive relationship between goals scored and
league points. The slope, b, is 1.12, meaning that each additional goal scored is associated with
an increase of about 1.12 in predicted league points. The y-intercept is -7.7; this has no
practical meaning, since a team cannot have negative points. It may be accounting for the handful
of goals that teams are essentially guaranteed to score: the fitted line reaches zero points at
about 7 goals. The coefficient of determination, r², is .85; this means that 85% of the variation
in league points is accounted for by the regression. This supports the strength of the
relationship, since only 15% of the variation is left unexplained. The predicted error, s (the
standard error of the regression), is 6.81. This means that predictions of points based on goals
scored are typically off by about 6.81 points, which again reflects a strong relationship.
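The quantities discussed above (slope b, y-intercept, r, r², and s) can all be computed directly from paired data. The following is a minimal sketch of a least-squares fit, not the method used to produce the report's figures. The function name `fit_line` and the `goals` and `points` lists are assumptions made up for illustration; they are not the report's data, so the printed values will not match the numbers quoted above.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns the summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    # Residual = observed value minus value predicted by the line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))     # standard error of the regression
    return {"slope": b, "intercept": a, "r": r, "r2": r * r, "s": s}

# Hypothetical season totals, invented for illustration (NOT the report's data)
goals = [35, 42, 48, 55, 61, 70, 84]
points = [33, 40, 44, 52, 63, 71, 86]

stats = fit_line(goals, points)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Dividing the squared residuals by n - 2 rather than n is the usual convention for simple linear regression, because two parameters (slope and intercept) are estimated from the data.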
The relationship in the Points in Relation to Wins by 3+ Goals graph is less clear. The points
are widely scattered, and the association between the variables appears much weaker. The r value
is .653; this shows that the positive correlation between wins by 3 or more goals and league
points is only moderate. The b value, or slope, is 4.86. This means that for every additional win
by 3 or more goals, the predicted league points increase by 4.86. The y-intercept is 38.96 and
means that with no wins by 3 or more goals, the predicted point total is about 39. The
coefficient of determination, or r², is .426. This means that only 42.6% of the variation in
league points is accounted for by the regression, so predictions from this regression are likely
to be less accurate. The predicted error, s, is 13.5, a large error given that the points in the
data span a range of only 64 points. This means that a prediction of league points from wins by
3 or more goals will typically be off by about 13.5 points, confirming that the relationship is
weak.
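To make the two fitted lines concrete, the sketch below plugs values into the equations reported on the graphs. The coefficients come from the report; the function names and the example team (60 goals, 5 wins by 3 or more goals) are hypothetical and chosen only for illustration.

```python
def predict_from_goals(goals_scored):
    """Predicted league points from goals scored (line from the report)."""
    return 1.1157 * goals_scored - 7.6971

def predict_from_big_wins(big_wins):
    """Predicted league points from wins by 3 or more goals (line from the report)."""
    return 4.8603 * big_wins + 38.963

# Hypothetical team, invented for illustration: 60 goals, 5 wins by 3+ goals
print(round(predict_from_goals(60), 1))     # 59.2
print(round(predict_from_big_wins(5), 1))   # 63.3
```

The two predictions need not agree. Given the reported s values (6.81 against 13.5), the prediction from goals scored would be expected to land closer to the actual total.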

[Figure: Residual Plot. Residuals (y-axis) against Goals Scored (x-axis).]

[Figure: Residual Plot. Residuals (y-axis) against Wins By 3+ Goals (x-axis).]

The residual plot for goals scored shows a fairly random scatter, with residuals that stay
fairly close to the zero line. This indicates that a linear model fits well and that the
relationship is strong.
The residual plot for wins by 3 or more goals is also random; however, the residuals lie much
farther from the zero line. This means that this regression is weaker than the one based on
goals scored.
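A residual is the observed value minus the value predicted by the fitted line, and a residual plot graphs these differences against the explanatory variable. As a rough sketch of how the values behind such a plot are obtained, the code below uses the goals-scored line from the report. The `seasons` pairs are hypothetical, invented for illustration, and are not the report's data.

```python
# Fitted line for goals scored, taken from the report
SLOPE, INTERCEPT = 1.1157, -7.6971

def residual(goals_scored, actual_points):
    """Observed points minus the points predicted by the fitted line."""
    predicted = SLOPE * goals_scored + INTERCEPT
    return actual_points - predicted

# Hypothetical (goals, points) pairs, invented for illustration only
seasons = [(38, 36), (47, 41), (55, 58), (66, 64), (80, 85)]

for goals, points in seasons:
    print(f"goals={goals:3d}  points={points:3d}  residual={residual(goals, points):+.2f}")
```

A positive residual means the team earned more points than the line predicts, and a negative residual means it earned fewer. Residuals that stay close to zero with no visible pattern are what the discussion above treats as evidence of a good linear fit.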
In conclusion, based on the r, r², and s values, the better of the two predictor variables for
league points is the number of goals scored.
