
Business Statistics Project

Topic: MULTIPLE LINEAR REGRESSION ANALYSIS






Names :-
Gaurav Mohta (90)
Priyanjali Moulik (91)
Ameya Rege (99)
Tanuj Nabar (93)
Kunal Kashyap (81)
Class & Batch :-
PGDM 2
Date of submission :-
Saturday, 16 Nov, 2013

Multiple Regression Theory
Multiple regression is regression with two or more independent variables on the right-hand side
of the equation. Use multiple regression if more than one cause is associated with the effect you
wish to understand.
For prediction: Multiple regression lets you use more than one factor to make a prediction.
For explanation: Multiple regression lets you separate causal factors, analyzing each one's influence on what you are trying to explain.
The equation and the true plane
For the case of two independent variables, you can write the equation for a multiple regression
model this way:
$Y = \alpha + \beta X + \gamma Z + \text{error}$
Imagine that the X- and Z-axes are on a table in front of you, with the X-axis pointing to the right and the Z-axis pointing directly away from you. The Y-axis is standing vertically, straight up from the table.
$Y = \alpha + \beta X + \gamma Z$ is the formula for a flat plane that is floating in the three-dimensional space above the table.
$\alpha$ is the height of the plane above the point on the table where X = 0 and Z = 0.
$\beta$ is the slope of the plane in the X direction, i.e. how fast the plane rises as you go to the right. If $\beta$ is bigger than 0, the plane is tilted so that the part to your right is higher than the part to your left.
$\gamma$ is the slope of the plane in the Z direction, i.e. how fast the plane rises as it goes away from you. If $\gamma$ is bigger than 0, the part of the plane farther from you is higher than the part nearer to you. If $\gamma$ is negative, the nearer part is higher.
The error term in $Y = \alpha + \beta X + \gamma Z + \text{error}$ means that the data points do not lie exactly on this plane. Instead, the data points form a cluster or a cloud above and below the plane described by $Y = \alpha + \beta X + \gamma Z$.
When we collect data, we do not get to see the true plane. All we have is the cloud of points,
floating in space. Multiple regression with two independent variables tries to find the plane that
best fits that cloud of points, in the hope that the regression plane will be close to the true plane.
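The following sketch illustrates what "the plane that best fits that cloud of points" means. It assumes NumPy and generates a cloud of points around a known plane. The values α = 2, β = 0.5, γ = -1.5 and the noise level are made up for this illustration only. It then recovers the plane by ordinary least squares with `np.linalg.lstsq`. The estimates land close to the true values but not exactly on them, which is the "hope" described above.

```python
import numpy as np

# Illustrative "true plane" (assumed values, not from the project data)
ALPHA, BETA, GAMMA = 2.0, 0.5, -1.5

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, n)
Z = rng.uniform(0, 10, n)
error = rng.normal(0.0, 0.5, n)           # scatter above and below the plane
Y = ALPHA + BETA * X + GAMMA * Z + error  # the observed cloud of points

# Design matrix: a column of ones for alpha, then the X and Z values
A = np.column_stack([np.ones(n), X, Z])

# Least squares: choose (a, b, c) minimising the sum of squared vertical distances
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a_hat, b_hat, c_hat = coef
print(f"alpha ~ {a_hat:.3f}, beta ~ {b_hat:.3f}, gamma ~ {c_hat:.3f}")
```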


Our regression model aims to find out how the number of goals scored by A-list football players depends on factors such as matches played, assists, yellow cards, red cards, times substituted on and times substituted off.
The sample was determined on the basis of certain criteria regarding the players, which are as follows :-
Now, here in our model,
Dependent variable (y) = Number of goals scored by the four players in different seasons
Independent variables (x1) = Number of matches played
(x2) = Number of assists
(x3) = Number of yellow cards issued to each player
(x4) = Number of red cards issued to each player
(x5) = Number of times each player was substituted on
(x6) = Number of times each player was substituted off




Player Season Goals (y) Matches (x1) Assists (x2) Yellow cards (x3) Red cards (x4) Substituted on (x5) Substituted off (x6)
Ronaldo 2009-2010 26 29 11 3 1 1 4
2010-2011 40 34 13 2 0 2 1
2011-2012 46 38 13 4 0 1 2
2012-2013 34 34 11 9 0 4 5
Messi 2009-2010 34 35 13 3 0 5 4
2010-2011 31 33 21 4 0 2 1
2011-2012 50 37 20 6 0 1 0
2012-2013 46 32 14 1 0 4 1
Rooney 2009-2010 26 32 5 6 0 0 10
2010-2011 11 28 11 5 0 3 4
2011-2012 27 34 7 1 0 2 6
2012-2013 12 27 10 7 0 5 9
Torres 2009-2010 18 22 3 5 0 2 10
2010-2011 10 37 6 7 0 7 9
2011-2012 6 32 7 4 1 12 5
2012-2013 8 36 9 4 0 8 6
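The fit can be reproduced with the following sketch. It assumes NumPy, enters the table above as an array, and solves the ordinary least squares problem with `np.linalg.lstsq`. The column order (goals, then x1 to x6) follows the table. This is an independent check, not the authors' own procedure. If the table was transcribed exactly, the printed coefficients can be compared with the regression equation reported below.

```python
import numpy as np

# Rows from the table: goals, matches, assists, yellow, red, sub_on, sub_off
data = np.array([
    [26, 29, 11, 3, 1, 1, 4],   # Ronaldo 2009-2010
    [40, 34, 13, 2, 0, 2, 1],
    [46, 38, 13, 4, 0, 1, 2],
    [34, 34, 11, 9, 0, 4, 5],
    [34, 35, 13, 3, 0, 5, 4],   # Messi 2009-2010
    [31, 33, 21, 4, 0, 2, 1],
    [50, 37, 20, 6, 0, 1, 0],
    [46, 32, 14, 1, 0, 4, 1],
    [26, 32, 5, 6, 0, 0, 10],   # Rooney 2009-2010
    [11, 28, 11, 5, 0, 3, 4],
    [27, 34, 7, 1, 0, 2, 6],
    [12, 27, 10, 7, 0, 5, 9],
    [18, 22, 3, 5, 0, 2, 10],   # Torres 2009-2010
    [10, 37, 6, 7, 0, 7, 9],
    [6, 32, 7, 4, 1, 12, 5],
    [8, 36, 9, 4, 0, 8, 6],
], dtype=float)

y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])  # intercept + x1..x6

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef
n, p = X.shape
df_resid = n - p  # 16 observations - 7 parameters = 9

names = ["intercept", "x1", "x2", "x3", "x4", "x5", "x6"]
for name, b in zip(names, coef):
    print(f"{name:>9}: {b:9.4f}")
```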



Hence, we find the regression equation to be:


y = 0.72x1 - 0.604x2 + 0.33x3 - 4.62x4 - 2.435x5 - 3.037x6 + 32.405


where: y = Goals scored
x1 = Number of matches
x2 = Number of assists by the player
x3 = Number of yellow cards given to the player
x4 = Number of red cards given to the player
x5 = Number of times the player is substituted on
x6 = Number of times the player is substituted off



For example, under the following conditions, the number of goals that this model predicts a player will score in a season is given below:

Number of matches = 30
Number of assists by the player = 14
Number of yellow cards given to the player = 1
Number of red cards given to the player = 0
Number of times the player is substituted on = 3
Number of times the player is substituted off = 2

y = 32.52550589

Approximately, 33
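The arithmetic of this example can be checked in plain Python. `predict_goals` is a hypothetical helper name used only for this sketch. With the coefficients exactly as printed in the equation (rounded to two or three decimals), the prediction works out to 32.50 rather than 32.5255. The small gap presumably comes from the unrounded coefficients used in the original calculation.

```python
# Coefficients as printed in the regression equation (rounded)
COEFS = {"x1": 0.72, "x2": -0.604, "x3": 0.33, "x4": -4.62, "x5": -2.435, "x6": -3.037}
INTERCEPT = 32.405

def predict_goals(x):
    """Hypothetical helper: predicted goals for a dict of values x1..x6."""
    return INTERCEPT + sum(COEFS[k] * x[k] for k in COEFS)

example = {"x1": 30, "x2": 14, "x3": 1, "x4": 0, "x5": 3, "x6": 2}
y_hat = predict_goals(example)
print(f"{y_hat:.4f}")
```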

To check our model at 5% level of significance:

Null hypothesis: H0: All six regression coefficients (of x1 to x6) are zero

Alternate hypothesis: H1: At least one of the regression coefficients is not zero


F Method

F = 5.210223049

F crit = 3.37 {F(6,9) at 5% level of significance}

.: F > F crit

.: Reject H0

.: Conclude that at least one of the factors has a non-zero coefficient, i.e. the model as a whole is significant
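The F method can be checked with a short sketch, assuming SciPy is installed. `stats.f.ppf(0.95, 6, 9)` returns the upper 5% point of the F distribution with 6 and 9 degrees of freedom. The 6 is the number of independent variables, and 9 = 16 - 6 - 1 is the residual degrees of freedom for the 16 player-seasons.

```python
from scipy import stats

n, k = 16, 6                        # 16 player-seasons, 6 independent variables
df_model, df_resid = k, n - k - 1   # 6 and 9

F = 5.210223049                     # F statistic reported for the fitted model
F_crit = stats.f.ppf(0.95, df_model, df_resid)  # upper 5% point of F(6, 9)

print(f"F crit = {F_crit:.3f}")
print("Reject H0" if F > F_crit else "Fail to reject H0")
```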


Significance-F method

Significance-F = 0.014113619

As it is smaller than 0.05 (the 5% level of significance), reject H0

.: Conclude that at least one of the factors has a non-zero coefficient
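Significance-F is the probability that an F(6, 9) variable exceeds the observed F. The following sketch, assuming SciPy, computes it with `stats.f.sf`, the survival function (1 minus the CDF), and applies the 5% decision rule.

```python
from scipy import stats

F, df_model, df_resid = 5.210223049, 6, 9
sig_F = stats.f.sf(F, df_model, df_resid)  # P(F(6, 9) > observed F)
print(f"Significance F = {sig_F:.6f}")

level = 0.05
print("Reject H0" if sig_F < level else "Fail to reject H0")
```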


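t Distribution method

This method tests each coefficient separately (H0: the coefficient of that variable is zero).

t-values:
x1 = 1.088923443
x2 = -0.520578462
x3 = 0.240927386
x4 = -0.577088367
x5 = -2.873058225
x6 = -1.576609914

t crit = 2.262 {two-tailed t(9) at 5% level of significance; 1.833 is the one-tailed value}
Only the t value for x5 is greater than t crit in absolute value (2.873 > 2.262), so H0 is rejected for x5. For the other five variables, H0 cannot be rejected.

.: Conclude that, taken one at a time, only the coefficient of x5 (number of times the player is substituted on) is significantly different from zero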
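The t method compares each coefficient with the critical value one variable at a time. The following sketch, assuming SciPy, uses the two-tailed 5% critical value `stats.t.ppf(0.975, 9)` and lists which coefficients pass.

```python
from scipy import stats

# t statistics reported for the fitted model, in the order x1..x6
t_values = {
    "x1": 1.088923443, "x2": -0.520578462, "x3": 0.240927386,
    "x4": -0.577088367, "x5": -2.873058225, "x6": -1.576609914,
}
df_resid = 9
t_crit = stats.t.ppf(0.975, df_resid)  # two-tailed test at 5%: about 2.262

# A coefficient is significant when |t| exceeds the critical value
significant = [name for name, t in t_values.items() if abs(t) > t_crit]
print(f"t crit = {t_crit:.3f}; reject H0 for {significant}")
```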


Error method to find factors of dependence:
Here the "error percentage" is the p-value written as a percentage: the probability of getting a t value at least as far from zero as the one observed if the true coefficient were zero.
for x1, i.e. number of matches played, p value = 0.30448508
.: Error percentage = 30.44850803 %
for x2, i.e. number of assists by the player, p value = 0.615223742
.: Error percentage = 61.52237417 %
for x3, i.e. number of yellow cards given to the player, p value = 0.815010301
.: Error percentage = 81.50103014 %
for x4, i.e. number of red cards given to the player, p value = 0.578027352
.: Error percentage = 57.80273521 %
for x5, i.e. number of times the player is substituted on, p value = 0.018384826
.: Error percentage = 1.838482588 %
for x6, i.e. number of times the player is substituted off, p value = 0.149339776
.: Error percentage = 14.93397762 %
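Each p-value can be recomputed from its t-value as a two-tailed tail probability of the t distribution with 9 degrees of freedom. The following sketch assumes SciPy, and its printed values can be compared with those listed above.

```python
from scipy import stats

# t statistics for x1..x6 as reported for the fitted model
t_values = [1.088923443, -0.520578462, 0.240927386,
            -0.577088367, -2.873058225, -1.576609914]
df_resid = 9

# Two-tailed p-value: chance of a |t| at least this large if the coefficient is zero
p_values = [2 * stats.t.sf(abs(t), df_resid) for t in t_values]
for j, p in enumerate(p_values, start=1):
    print(f"x{j}: p = {p:.4f} ({100 * p:.2f} %)")
```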


Conclusion:-
Holding the other factors constant:
If one more match is played, then the number of goals goes up by 0.72.
If there is one more assist by the player, then the number of goals goes down by 0.6.
If one more yellow card is issued, then the number of goals goes up by 0.33.
If one more red card is issued, then the number of goals goes down by 4.62, i.e. about 5.
If the player is substituted on one more time, then the number of goals goes down by 2.435, i.e. about 2.
If the player is substituted off one more time, then the number of goals goes down by 3.037, i.e. about 3.
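Each statement in the conclusion reads a coefficient as the change in predicted goals when that variable rises by one and the others stay fixed. The following plain-Python sketch makes this concrete. `predict` is a hypothetical helper, and `base` is an arbitrary starting point. Raising one input by 1 changes the prediction by exactly that variable's coefficient, whatever the starting point.

```python
COEFS = [0.72, -0.604, 0.33, -4.62, -2.435, -3.037]  # x1..x6 from the fitted equation
INTERCEPT = 32.405

def predict(x):
    """Hypothetical helper: predicted goals for values [x1, ..., x6]."""
    return INTERCEPT + sum(b * xi for b, xi in zip(COEFS, x))

base = [30, 14, 1, 0, 3, 2]  # any starting point gives the same differences
changes = []
for j in range(len(COEFS)):
    bumped = list(base)
    bumped[j] += 1  # raise one variable by 1, hold the rest fixed
    changes.append(predict(bumped) - predict(base))
    print(f"x{j + 1}: change in predicted goals = {changes[-1]:+.3f}")
```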






Hence, we find that x2, x3 and x4 have high p-values (0.62, 0.82 and 0.58), i.e. a high error value, when used as factors to determine the value of y.
Hence, the number of assists by the player and the number of yellow and red cards given to the player do not appear to be significant determining factors for the number of goals scored by him in the season.
.: The new equation of regression is:
y = 0.72x1 - 2.435x5 - 3.037x6 + 32.405
.: The number of goals that will be scored in a season by a player according to this model under the following conditions:
Number of matches = 30
Number of times the player is substituted on = 3
Number of times the player is substituted off = 2
is given by:
y = 40.64899767
Approximately, 41
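The reduced equation keeps the full-model coefficients of x1, x5 and x6 unchanged. A common alternative is to refit the model using only those three variables. The refitted coefficients will generally differ from 0.72, -2.435 and -3.037, because dropping regressors changes the remaining estimates unless the dropped variables are uncorrelated with the ones kept. The following sketch of that refit assumes NumPy and uses the columns y, x1, x5 and x6 from the data table. It prints the refitted equation and its prediction for the same conditions; no particular output is claimed here.

```python
import numpy as np

# goals (y), matches (x1), substituted on (x5), substituted off (x6), from the data table
rows = np.array([
    [26, 29, 1, 4], [40, 34, 2, 1], [46, 38, 1, 2], [34, 34, 4, 5],    # Ronaldo
    [34, 35, 5, 4], [31, 33, 2, 1], [50, 37, 1, 0], [46, 32, 4, 1],    # Messi
    [26, 32, 0, 10], [11, 28, 3, 4], [27, 34, 2, 6], [12, 27, 5, 9],   # Rooney
    [18, 22, 2, 10], [10, 37, 7, 9], [6, 32, 12, 5], [8, 36, 8, 6],    # Torres
], dtype=float)

y = rows[:, 0]
X = np.column_stack([np.ones(len(y)), rows[:, 1:]])  # intercept, x1, x5, x6
coef, *_ = np.linalg.lstsq(X, y, rcond=None)         # least squares refit

b0, b1, b5, b6 = coef
y_new = b0 + b1 * 30 + b5 * 3 + b6 * 2  # matches = 30, sub on = 3, sub off = 2
print(f"refitted: y = {b1:.3f}x1 {b5:+.3f}x5 {b6:+.3f}x6 {b0:+.3f}")
print(f"prediction = {y_new:.2f}")
```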
