The linearity assumption is best tested with scatter plots; the following two
examples depict cases where no linearity and little linearity are present.
Variance Inflation Factor (VIF) the variance inflation factor of the linear regression is
defined as VIF = 1/T, where the tolerance T = 1 - R² is obtained by regressing each
independent variable on all the others. A VIF > 10 is an indication that multicollinearity
is present.
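Outside SPSS, the VIF can be computed directly from the data. The sketch below (assuming Python with numpy; the function name `vif` is ours) regresses each independent variable on the remaining ones and applies VIF = 1/(1 - R²):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()            # R² of x_j on the others
        vifs.append(1.0 / (1.0 - r2))               # VIF = 1 / tolerance
    return vifs
```

Two nearly identical predictors will produce VIFs far above 10, while an unrelated predictor stays close to 1.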
Condition Index the condition index is calculated from a factor analysis of the
independent variables. Values of 10-30 indicate moderate multicollinearity in the
regression variables; values > 30 indicate strong multicollinearity.
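The condition index can likewise be approximated from the singular values of the column-scaled predictor matrix. A minimal sketch, assuming numpy and leaving out the intercept column that SPSS includes in its collinearity diagnostics:

```python
import numpy as np

def condition_indices(X):
    """Condition indices of a predictor matrix: each column is scaled
    to unit length, then each singular value is compared to the largest."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)          # unit-length columns
    s = np.linalg.svd(Xs, compute_uv=False)     # singular values, descending
    return s[0] / s
```

Nearly collinear columns drive the largest condition index well past 30; well-separated columns keep it small.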
No auto-correlation
Fourthly, multiple linear regression analysis requires that there is little or no
autocorrelation in the data. Autocorrelation occurs when the residuals are not
independent of each other. In other words, the value of y(x+1) is not
independent of the value of y(x). This typically occurs, for instance, in stock
prices, where today's price is not independent of yesterday's price.
While a scatter plot lets you check for autocorrelation, you can test the multiple
linear regression model for autocorrelation with the Durbin-Watson test. Durbin-Watson's
d tests the null hypothesis that the residuals are not linearly autocorrelated.
While d can assume values between 0 and 4, values around 2 indicate
no autocorrelation. As a rule of thumb, values of 1.5 < d < 2.5 show that there is
no autocorrelation in the multiple linear regression data. However, the Durbin-Watson
test is limited to linear autocorrelation and direct neighbors (so-called first
order effects).
Homoscedasticity
The last assumption the multiple linear regression analysis makes is homoscedasticity.
The scatter plot is a good way to check whether homoscedasticity (that is, equal
error variance along the regression line) is given. If the data are heteroscedastic,
the scatter plots look like the following examples:
The Goldfeld-Quandt test can test for heteroscedasticity. The test splits the multiple
linear regression data into high and low values to see if the two samples are significantly
different. If heteroscedasticity is present in our multiple linear regression model, a
non-linear correction might fix the problem, but it might also introduce multicollinearity
into the model.
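The idea behind the Goldfeld-Quandt test can be sketched as follows. Note this is a simplified illustration, not the full test: the proper procedure refits the regression on each subsample, whereas here we only compare residual variances of the low and high halves after sorting by one predictor:

```python
import numpy as np

def goldfeld_quandt(x, residuals, drop_frac=0.2):
    """Simplified Goldfeld-Quandt ratio: sort residuals by the predictor x,
    drop the middle observations, and compare the mean squared residual of
    the high-x half to that of the low-x half. Ratios far above 1 suggest
    variance growing with x (heteroscedasticity)."""
    order = np.argsort(x)
    e = np.asarray(residuals, dtype=float)[order]
    n = len(e)
    drop = int(n * drop_frac)                  # discard the middle band
    half = (n - drop) // 2
    lo, hi = e[:half], e[-half:]
    return (hi ** 2).mean() / (lo ** 2).mean()
```

For homoscedastic residuals the ratio stays near 1; when the error spread grows with x it becomes large.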
So before applying multiple regression, all the assumptions above should be checked;
if any independent variable violates them, it should be excluded or other corrective
measures should be taken.
After checking all these assumptions, one can perform the regression analysis:
Multiple regression in SPSS:
You will be presented with the Linear Regression dialogue box below:
Transfer the dependent variable, VO2max, into the Dependent: box and the independent
variables, age, weight, heart_rate and gender into the Independent(s): box, using the
buttons, as shown below (all other boxes can be ignored):
Click the Statistics button. You will be presented with the Linear Regression:
Statistics dialogue box, as shown below:
In addition to the options that are selected by default, select Confidence intervals in the
Regression Coefficients area, leaving the Level(%): option at "95". You will end up with the
following screen:
Click the Continue button to return to the Linear Regression dialogue box, then
click the OK button to run the analysis.
F value: it tests the null hypothesis that the regression model explains nothing
(all coefficients are zero). To check its significance, the p-value has to be
examined. If it is less than .05, the null hypothesis is rejected, which shows that
the regression model as a whole is significant and can be accepted.
Source     DF   Sum of Squares   Mean Square
Model        2         0.47066       0.23533
Error      233         6.68271       0.02868
C Total    235         7.15337
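From such a table, the F value is the mean square of the model divided by the mean square of the error. A quick sanity check in Python using the numbers above:

```python
# ANOVA quantities taken from the table in the text
ss_model, df_model = 0.47066, 2
ss_error, df_error = 6.68271, 233

ms_model = ss_model / df_model   # mean square of the model
ms_error = ss_error / df_error   # mean square of the error
f_value = ms_model / ms_error    # F statistic, roughly 8.2 here
```

The resulting F is then compared against the F distribution with (2, 233) degrees of freedom to obtain the p-value.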
B values: these are the estimated regression coefficients. Substituting them, together
with the values of the independent variables, into the regression equation lets one
predict the value of Y.
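For illustration, plugging B values into the regression equation looks like the sketch below. The coefficients here are hypothetical, chosen only to show the mechanics; in practice they would be read from the SPSS Coefficients table:

```python
# Hypothetical B values for illustration only -- in practice,
# read these from the SPSS Coefficients output table.
intercept = 87.83
b_age, b_weight, b_heart_rate, b_gender = -0.165, -0.385, -0.118, 13.208

def predict_vo2max(age, weight, heart_rate, gender):
    """Predict Y by substituting the B values into the regression equation."""
    return (intercept + b_age * age + b_weight * weight
            + b_heart_rate * heart_rate + b_gender * gender)
```

Calling the function with a subject's values returns the model's predicted VO2max for that subject.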
t and Sig. - These are the t-statistics and their associated 2-tailed p-values used in testing
whether a given coefficient is significantly different from zero. Using an alpha of 0.05:
The coefficient for math (0.389) is significantly different from 0 because its p-value is 0.000,
which is smaller than 0.05.
The coefficient for female (-2.010) is not significantly different from 0 because its p-value is
0.051, which is larger than 0.05.
The coefficient for socst (0.0498443) is not statistically significantly different from 0
because its p-value is larger than 0.05.
The coefficient for read (0.3352998) is statistically significant because its p-value of 0.000 is
less than .05.
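The decision rule above can be sketched in code. The standard errors in the usage comments are hypothetical; the rule relies on the fact that for large degrees of freedom the two-tailed 5% critical value of t is about 1.96:

```python
def t_stat(b, se):
    """t-statistic for testing whether a coefficient b differs from zero."""
    return b / se

def is_significant(b, se, critical=1.96):
    """Two-tailed test at alpha = 0.05: for large degrees of freedom the
    t distribution is close to standard normal, so |t| > 1.96 flags a
    coefficient as significantly different from zero."""
    return abs(t_stat(b, se)) > critical
```

For example, with a hypothetical standard error of 0.07 the coefficient 0.389 is clearly significant, while -2.010 with a hypothetical standard error of 1.03 falls just short of the cutoff.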