
Multiple regression: used when there is one dependent variable (Y) and two or more independent variables (x1, x2, x3, etc.).


It is a parametric test.
Y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4
b0 is the intercept.
b1, b2, b3, b4 are the coefficients of x1, x2, x3, x4 respectively.
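
As a rough illustration, such a model can be fitted with ordinary least squares. The sketch below uses Python's statsmodels; the file name and column names (data.csv, y, x1 ... x4) are hypothetical placeholders, not part of the original example.

# Minimal sketch: fit Y = b0 + b1*x1 + ... + b4*x4 by ordinary least squares.
# The CSV file and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("data.csv")            # hypothetical data file
X = sm.add_constant(data[["x1", "x2", "x3", "x4"]])   # add_constant supplies the intercept b0
y = data["y"]

model = sm.OLS(y, X).fit()
print(model.params)                       # b0 (const), b1, b2, b3, b4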

Assumptions for multiple regression:


Before applying multiple regression, the following assumptions should be checked.
Multiple linear regression analysis makes several key assumptions:
1. Linear relationship: it is checked with a scatter plot diagram.
There should be a linear relationship between the dependent variable and each independent variable, so a scatter plot diagram has to be made for each independent variable against the dependent variable.
An independent variable that has a non-linear relationship with the dependent variable is of no use in multiple regression; that variable should be excluded from the analysis.

The linearity assumption can best be tested with scatter plots; plots in which the points show no pattern, or a clearly curved pattern, indicate cases where no or little linearity is present. A sketch of such plots is given below.
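
A minimal sketch of these scatter plots, reusing the same hypothetical column names as above:

# Minimal sketch: scatter plot of the dependent variable against each
# independent variable to eyeball linearity. Column names are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("data.csv")            # hypothetical data file
predictors = ["x1", "x2", "x3", "x4"]

fig, axes = plt.subplots(1, len(predictors), figsize=(16, 4))
for ax, col in zip(axes, predictors):
    ax.scatter(data[col], data["y"], s=10)
    ax.set_xlabel(col)
    ax.set_ylabel("y")
plt.tight_layout()
plt.show()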

2. Normality: the multiple linear regression analysis requires all variables to be normally distributed. This assumption can best be checked with a histogram and a fitted normal curve, or with a Q-Q plot. Normality can also be checked with a goodness-of-fit test, e.g. the Kolmogorov-Smirnov test. A sketch of these checks is given below.
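
A minimal sketch of these normality checks (histogram with fitted normal curve, Q-Q plot, Kolmogorov-Smirnov test) for one variable, again with hypothetical names:

# Minimal sketch: normality checks for one variable. Column name is hypothetical.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

data = pd.read_csv("data.csv")            # hypothetical data file
v = data["y"]

# Histogram with a fitted normal curve
plt.hist(v, bins=20, density=True)
grid = np.linspace(v.min(), v.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, v.mean(), v.std()))
plt.show()

# Q-Q plot against the normal distribution
stats.probplot(v, dist="norm", plot=plt)
plt.show()

# Kolmogorov-Smirnov test against a normal with the sample mean and SD
print(stats.kstest(v, "norm", args=(v.mean(), v.std())))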

3. No or little multicollinearity: i.e. there should be no or little correlation between the independent variables. This can be checked as follows.

Correlation matrix: when computing the matrix of Pearson's bivariate correlations among all independent variables, the correlation coefficients need to be smaller than 0.8.

Variance Inflation Factor (VIF): the variance inflation factor of the linear regression is defined as VIF = 1/T, where T is the tolerance of an independent variable (1 minus the R square of that variable regressed on all the other independent variables). With VIF > 10 there is an indication that multicollinearity is present.

Condition index: the condition index is calculated using a factor analysis on the independent variables. Values of 10-30 indicate moderate multicollinearity in the regression variables; values > 30 indicate strong multicollinearity. A sketch of these three checks is given below.
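
A minimal sketch of these multicollinearity checks, with hypothetical column names and one common way of computing the condition index (columns scaled to unit length, eigenvalues of X'X):

# Minimal sketch: multicollinearity checks on the independent variables.
# Column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

data = pd.read_csv("data.csv")            # hypothetical data file
X = data[["x1", "x2", "x3", "x4"]]

# Correlation matrix of the predictors
print(X.corr())

# VIF = 1 / tolerance for each predictor (index 0 is the added constant)
Xc = sm.add_constant(X)
for i, name in enumerate(X.columns, start=1):
    print(name, variance_inflation_factor(Xc.values, i))

# Condition index: columns scaled to unit length, then the square root of the
# ratio of the largest eigenvalue of X'X to each eigenvalue
Xs = X / np.sqrt((X ** 2).sum())
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
print(np.sqrt(eigvals.max() / eigvals))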

4. No auto-correlation

Multiple linear regression analysis requires that there is little or no autocorrelation in the data. Autocorrelation occurs when the residuals are not independent from each other; in other words, when the value of y(x+1) is not independent from the value of y(x). This typically occurs, for instance, in stock prices, where today's price is not independent from yesterday's price.

While a scatter plot lets you check for autocorrelation, you can test the multiple linear regression model for autocorrelation with the Durbin-Watson test. Durbin-Watson's d tests the null hypothesis that the residuals are not linearly autocorrelated. While d can assume values between 0 and 4, values around 2 indicate no autocorrelation. As a rule of thumb, values of 1.5 < d < 2.5 show that there is no autocorrelation in the multiple linear regression data. However, the Durbin-Watson test is limited to linear autocorrelation and direct neighbours (so-called first-order effects). A sketch of this test is given below.
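
A minimal sketch of the Durbin-Watson check on the residuals of the fitted model, with the same hypothetical column names:

# Minimal sketch: Durbin-Watson statistic on the model residuals.
# The CSV file and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

data = pd.read_csv("data.csv")            # hypothetical data file
X = sm.add_constant(data[["x1", "x2", "x3", "x4"]])
model = sm.OLS(data["y"], X).fit()

d = durbin_watson(model.resid)
print(d)   # values around 2 (roughly 1.5-2.5) suggest no first-order autocorrelation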

5. Homoscedasticity

The last assumption the multiple linear regression analysis makes is homoscedasticity. A scatter plot of the residuals is a good way to check whether homoscedasticity (that is, that the error terms along the regression line have equal variance) holds. If the data are heteroscedastic, the spread of the points changes systematically across the plot, for example widening into a funnel shape.

The Goldfeld-Quandt test can test for heteroscedasticity. The test splits the multiple linear regression data into high and low values to see if the two samples are significantly different. If heteroscedasticity is present in the multiple linear regression model, a non-linear correction might fix the problem, but it might also introduce multicollinearity into the model. A sketch of this test is given below.
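
A minimal sketch of the Goldfeld-Quandt test using statsmodels, again with hypothetical column names:

# Minimal sketch: Goldfeld-Quandt test for heteroscedasticity.
# The CSV file and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

data = pd.read_csv("data.csv")            # hypothetical data file
X = sm.add_constant(data[["x1", "x2", "x3", "x4"]])
y = data["y"]

f_stat, p_value, _ = het_goldfeldquandt(y, X)
print(f_stat, p_value)   # a small p value (< .05) suggests heteroscedasticity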

So, before applying multiple regression, all of the above should be checked, and if any independent variable violates the assumptions it should be excluded, or other corrective measures should be taken.
After checking all this, one can perform the regression analysis.
Multiple regression in SPSS:

On the menu, click Analyze > Regression > Linear.... You will be presented with the Linear Regression dialogue box.

Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart_rate and gender into the Independent(s): box, using the arrow buttons (all other boxes can be ignored).

Click the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box.

In addition to the options that are selected by default, select Confidence intervals in the Regression Coefficients area, leaving the Level(%): option at "95".

Click the Continue button. You will be returned to the Linear Regression dialogue box.

Click the OK button. This will generate the output.
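
For readers without SPSS, a minimal sketch of the same regression in Python's statsmodels, assuming the variables sit in a CSV file (the file name vo2max.csv is a placeholder):

# Minimal sketch: the same regression with 95% confidence intervals,
# run with statsmodels instead of SPSS. The file name is hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("vo2max.csv")          # hypothetical data file
X = sm.add_constant(data[["age", "weight", "heart_rate", "gender"]])
y = data["VO2max"]

model = sm.OLS(y, X).fit()
print(model.summary())                    # model summary, F test, coefficients, t and Sig.
print(model.conf_int(alpha=0.05))         # 95% confidence intervals for the coefficients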

How to read model summary


R square: also called the coefficient of determination.
If a model predicts perfectly, R square = 1; if a model has no predictive capability, R square = 0. As additional variables are added to a regression equation, R square increases even when the new variables have no real predictive capability, but adjusted R square does not increase unless the new variables have additional predictive capability. So adjusted R square carries much importance. If there is a huge difference between R square and adjusted R square, that means there is some problem in the model and some unwanted independent variables are present. A small demonstration is sketched below.
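
A minimal sketch of this point, on data simulated purely for illustration: adding a pure-noise predictor raises R square but leaves adjusted R square essentially unchanged.

# Minimal sketch: R square rises when a useless predictor is added,
# adjusted R square does not. Data are simulated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                 # has no real predictive capability

m1 = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise]))).fit()

print(m1.rsquared, m1.rsquared_adj)
print(m2.rsquared, m2.rsquared_adj)        # R square goes up; adjusted R square barely moves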

F value: it relates to the null hypothesis that the regression model has no predictive capability (all coefficients are zero). To check its significance, the p value has to be checked. If it is less than .05, the null hypothesis is rejected, which shows that the regression model is valid and can be accepted. An example ANOVA table:
Source     DF   Sum of Squares   Mean Square   F Value   Prob > F (p value)
Model        2         0.47066       0.23533     8.205    0.0004
Error      233         6.68271       0.02868
C Total    235         7.15337
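
For reference, each mean square in the table is the sum of squares divided by its degrees of freedom (e.g. 0.47066 / 2 = 0.23533), and the F value is the model mean square divided by the error mean square: 0.23533 / 0.02868 ≈ 8.205.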

B value:

The B values in the first column are the coefficients of the independent variables.

Ypredicted = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4
b0 is the constant (intercept).
b1 = .389 and is for math. This shows that a 1 unit change in the value of x1 brings a .389 unit change in the value of Y, keeping all other independent variables constant; the same interpretation applies to the other coefficients (b2, b3, b4).
b2 = -2.010 and is for female; b3 and b4 are for the social studies and reading scores, .050 and .335 respectively. Negative and positive signs show a negative or positive relation of the independent variable with the dependent variable.

Now one can put in the values accordingly and predict the value of Y; a sketch is given below.
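
A minimal sketch of such a prediction using the coefficients above. The intercept b0 is not reported in this excerpt, so the value used here is purely hypothetical, as is the example student.

# Minimal sketch: predicting Y from the reported coefficients.
b0 = 10.0                                   # hypothetical constant (not given in the text)
b_math, b_female, b_socst, b_read = 0.389, -2.010, 0.050, 0.335

# Hypothetical student: math 55, female (1), social studies 60, reading 52
y_pred = b0 + b_math * 55 + b_female * 1 + b_socst * 60 + b_read * 52
print(y_pred)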
t and Sig. - These are the t-statistics and their associated 2-tailed p-values used in testing
whether a given coefficient is significantly different from zero. Using an alpha of 0.05:
The coefficient for math (0.389) is significantly different from 0 because its p-value is 0.000,
which is smaller than 0.05.
The coefficient for female (-2.010) is not significantly different from 0 because its p-value is
0.051, which is larger than 0.05.
The coefficient for socst (0.0498443) is not statistically significantly different from 0 because
its p-value is definitely larger than 0.05.
The coefficient for read (0.3352998) is statistically significant because its p-value of 0.000 is
less than .05.
