
Quantitative Techniques In Analysis 1 Multiple Regression Model

IBAD UR REHMAN - 5873 SAAD AHMED - 5810 SHOAIB SHAIKH - 5785 MBA (M)

MULTIPLE REGRESSION MODEL PREDICTION OF MATH ACHIEVEMENTS

SUBMITTED TO: USMAN ALI WARRAICH

Objectives of Multiple Regression Model

Multiple regression models are appropriate for either of two types of research problem:
1. Prediction
2. Explanation

In our regression model we predict a particular dependent variable with the help of some predictors (independent variables). We then explain the output by examining the regression coefficients: their magnitudes, signs and statistical significance. By doing this we can determine the relative importance of each individual variable in predicting the dependent variable, and the nature of the relationships between the independent variables and the dependent variable.

Research Model

The multiple regression problem selected for this assignment is:

Prediction of math achievement by a combination of variables: motivation, competence, pleasure, grades in high school, parents' education, and gender.

That is, we will evaluate how well these variables predict and explain the dependent variable, math achievement.

Dependent and Independent Variables

Multiple linear regression is a dependence technique, so we must specify which variable is dependent and which are independent. Six predictors are initially included in the research problem (five scale and one nominal dichotomous). They are:
1. Motivation: A scale variable measuring the motivation level of students. The values lie between 1 and 4.
2. Pleasure: Another scale variable, measuring the personal pleasure level of students, i.e. how much pleasure a student gets in life. The values lie between 1 and 4.
3. Grades in High School: The grades obtained by the cases, scaled from 8 to 1, indicating grade A to D.
4. Competence: Another subjective predictor, indicating the ability, aptitude or IQ of the students. It is again scored from 1 to 4.
5. Parents' Education: A scale variable measuring the education of the students' parents on a scale from 2 to 10, where 2 indicates the lowest education level (less than high school) and 10 the highest (PhD).
6. Gender: A nominal variable which is dichotomous (having 2 categories) can be included as a predictor in a multiple regression model. Here gender has two categories, male and female, coded 0 and 1 respectively.

The dependent variable to be predicted here is the math achievement test. We will see whether math achievement can be predicted better from a combination of these predictors. The table from the test run showing the variables entered is given below.

Variables Entered/Removed (b)
Model 1
  Variables Entered: competence scale, gender, parents' education, pleasure scale, grades in h.s., motivation scale (a)
  Variables Removed: .
  Method: Enter

a. All requested variables entered.
b. Dependent Variable: math achievement test

Sample Size

Sample size is the most influential element under our control in the research problem, and it plays an important role in determining the statistical power of an analysis. Our research problem includes 75 cases for each variable.

Assumptions in Multiple Regression Analysis

Multiple regression analysis is based on certain assumptions. The assumptions tested for this model are:
1. Normality
2. Multicollinearity

1) Normality

We have to check that the residuals are normally distributed, because the calculation of confidence intervals and the various significance tests for the coefficients are all based on the assumption of normally distributed errors. To check whether the residuals follow a normal distribution, we consider the tests of normality.

Tests of Normality
                         Kolmogorov-Smirnov (a)       Shapiro-Wilk
                         Statistic   df   Sig.        Statistic   df   Sig.
math achievement test    .071        75   .200*       .966        75   .040

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
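The Sig. values in this table are read against the conventional 0.05 level. A minimal sketch of that decision rule, using only the two values reported above (the function name and dictionary are illustrative, not part of the SPSS output):

```python
# Sig. values copied from the Tests of Normality table.
ALPHA = 0.05  # significance level used throughout the report

normality_sig = {
    "Kolmogorov-Smirnov": 0.200,  # reported as a lower bound of the true value
    "Shapiro-Wilk": 0.040,
}


def normality_not_rejected(sig, alpha=ALPHA):
    """Return True when the test fails to reject normality (sig > alpha)."""
    return sig > alpha


decisions = {name: normality_not_rejected(sig) for name, sig in normality_sig.items()}

for name, ok in decisions.items():
    verdict = "normality not rejected" if ok else "normality rejected"
    print(f"{name}: Sig. = {normality_sig[name]:.3f} -> {verdict}")
```

Under this rule the two tests in the table do not agree, which is worth keeping in mind when reading the conclusion drawn from the table.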

The rule for reading this table is: if the significance value is greater than 0.05, the hypothesis of normality is not rejected. The Kolmogorov-Smirnov significance value (.200) is greater than 0.05, so on that test we conclude that the errors are normally distributed. Note, however, that the Shapiro-Wilk significance value in the same table (.040) is below 0.05, so this conclusion rests on the Kolmogorov-Smirnov result alone.

2) Multicollinearity

Multicollinearity (or collinearity) occurs when there are high inter-correlations among some set of the predictor variables. In other words, multicollinearity happens when two or more predictors contain much of the same information. To check for multicollinearity we consider the collinearity statistics in the following table.

Collinearity Statistics
Model 1                Tolerance   VIF
(Constant)
gender                 .811        1.233
grades in h.s.         .747        1.339
competence scale       .549        1.821
motivation scale       .641        1.561
pleasure scale         .776        1.289
parents' education     .796        1.256

Tolerance and VIF give the same information (Tolerance = 1 / VIF). They tell us whether there is multicollinearity. If the tolerance value is less than 0.5, or equivalently the VIF is greater than 2, then there is probably a problem with multicollinearity. Here the tolerance values for all predictors are greater than 0.5, so there is low collinearity among the predictors. The competence scale (tolerance .549) is closest to the threshold, possibly because it is correlated with the motivation scale.

Having checked the assumptions on which the model is based, we can proceed to the further tests and results to assess our model.
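The relation Tolerance = 1 / VIF and the 0.5 / 2 rule of thumb can be checked directly against the collinearity table. The following is a small sketch using the reported values; the variable and function names are illustrative.

```python
# (tolerance, VIF) pairs copied from the Collinearity Statistics table.
collinearity = {
    "gender": (0.811, 1.233),
    "grades in h.s.": (0.747, 1.339),
    "competence scale": (0.549, 1.821),
    "motivation scale": (0.641, 1.561),
    "pleasure scale": (0.776, 1.289),
    "parents' education": (0.796, 1.256),
}

TOLERANCE_MIN = 0.5  # rule of thumb stated in the report
VIF_MAX = 2.0


def is_collinear(tolerance, vif):
    """Flag a predictor when it crosses either rule-of-thumb threshold."""
    return tolerance < TOLERANCE_MIN or vif > VIF_MAX


# Largest gap between the reported tolerance and 1 / VIF (rounding error only).
max_gap = max(abs(tol - 1.0 / vif) for tol, vif in collinearity.values())

flagged = [name for name, (tol, vif) in collinearity.items() if is_collinear(tol, vif)]
closest = min(collinearity, key=lambda name: collinearity[name][0])

print(f"max |tolerance - 1/VIF| = {max_gap:.4f}")
print("flagged predictors:", flagged)
print("closest to the threshold:", closest)
```

No predictor is flagged, and the predictor with the lowest tolerance is the competence scale, in line with the discussion above.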

TEST RUN 1

Statistical Significance of the Model

The null hypothesis of the ANOVA table is that there is no relationship between the predictors and the outcome. The significance value in the ANOVA table given below indicates whether the combination of the independent variables significantly predicts the dependent variable. If the Sig. value (the p-value) is less than 0.05, then the result is considered statistically significant and the null hypothesis is rejected.
ANOVA (b)
Model 1        Sum of Squares   df   Mean Square   F       Sig.
Regression     1523.461          6   253.910       9.284   .000 (a)
Residual       1750.333         64    27.349
Total          3273.794         70

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Dependent Variable: math achievement test

Here the Sig. value is less than 0.05, which shows that the above predictors are contributing to the model and predicting the math achievement results significantly. The ANOVA table shows that F = 9.284 and is significant. This indicates that the combination of the predictors significantly predicts math achievement.
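The F statistic in the ANOVA table follows from the sums of squares and degrees of freedom: each mean square is a sum of squares divided by its df, and F is the ratio of the two mean squares. A short sketch reproducing the reported figures (the helper name is illustrative):

```python
def anova_f(ss_regression, df_regression, ss_residual, df_residual):
    """Return (mean square regression, mean square residual, F)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression, ms_residual, ms_regression / ms_residual


# Values copied from the Test Run 1 ANOVA table.
ms_reg, ms_res, f_value = anova_f(1523.461, 6, 1750.333, 64)

print(f"Mean Square (regression) = {ms_reg:.3f}")
print(f"Mean Square (residual)   = {ms_res:.3f}")
print(f"F = {f_value:.3f}")
```

The Sig. column is the upper-tail probability of this F value with (6, 64) degrees of freedom; computing it needs an F distribution routine, which is left out here to keep the sketch free of third-party dependencies.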

Model Summary Table

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .682 (a)   .465       .415                5.22962

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale

The Model Summary table shows that the multiple correlation coefficient (R), using all the predictors, is .682, R square is .465 and the adjusted R square is .415, meaning that about 46.5% of the variance in math achievement can be predicted from gender, competence and the other variables combined. Note that the adjusted R square is lower than the unadjusted value. This is because it takes into account the sample size and the number of independent variables in the regression model.
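The Model Summary figures can be recovered from the ANOVA table of the same run. The sketch below uses the standard formulas R² = SS(regression) / SS(total) and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). It assumes n = 71, taken as the total df (70) plus one; that figure is inferred from the ANOVA table rather than stated in the report.

```python
import math

# Values copied from the Test Run 1 ANOVA table.
SS_REGRESSION = 1523.461
SS_RESIDUAL = 1750.333
SS_TOTAL = 3273.794
DF_TOTAL = 70
K = 6               # number of predictors
N = DF_TOTAL + 1    # assumption: cases used = total df + 1

r_square = SS_REGRESSION / SS_TOTAL
r = math.sqrt(r_square)

# The adjustment penalises R square for the number of predictors relative to n.
adjusted_r_square = 1 - (1 - r_square) * (N - 1) / (N - K - 1)

# Std. Error of the Estimate is the square root of the residual mean square.
std_error = math.sqrt(SS_RESIDUAL / (N - K - 1))

print(f"R = {r:.3f}, R Square = {r_square:.3f}")
print(f"Adjusted R Square = {adjusted_r_square:.3f}")
print(f"Std. Error of the Estimate = {std_error:.5f}")
```

With few cases per predictor the gap between R square and adjusted R square widens, which is why the adjusted value is the safer one to quote for a sample of this size.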

Coefficients Table

This is a very important table in multiple regression analysis. Here the individual predictors are assessed. The results in this table show which variables are contributing significantly to the model and which are not. The table also shows the relative importance of the predictors and the type of relationship each has with the dependent variable. The null hypothesis for each predictor is that its coefficient B equals 0. We do not reject the null hypothesis if the Sig. value is greater than 0.05.
Coefficients (a)
                       Unstandardized Coefficients   Standardized Coefficients
Model 1                B         Std. Error          Beta                        t        Sig.
(Constant)             -7.859    4.546                                           -1.729   .089
gender                 -3.729    1.382               -.274                       -2.698   .009
grades in h.s.          1.966     .455                .457                        4.325   .000
competence scale         .360    1.253                .035                         .287   .775
motivation scale        1.701    1.207                .161                        1.410   .164
pleasure scale           .852    1.180                .075                         .722   .473
parents' education       .592     .314                .193                        1.883   .064

a. Dependent Variable: math achievement test
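Each t value in the coefficients table is the unstandardized coefficient B divided by its standard error, and a predictor is treated as significant when its Sig. value is below 0.05. A brief sketch that recomputes t from the reported B and Std. Error columns (small differences from the printed t values come from rounding in the table):

```python
# (B, Std. Error, reported t, Sig.) copied from the Test Run 1 coefficients table.
coefficients = {
    "(Constant)": (-7.859, 4.546, -1.729, 0.089),
    "gender": (-3.729, 1.382, -2.698, 0.009),
    "grades in h.s.": (1.966, 0.455, 4.325, 0.000),
    "competence scale": (0.360, 1.253, 0.287, 0.775),
    "motivation scale": (1.701, 1.207, 1.410, 0.164),
    "pleasure scale": (0.852, 1.180, 0.722, 0.473),
    "parents' education": (0.592, 0.314, 1.883, 0.064),
}

ALPHA = 0.05

# Recompute t = B / SE for every row.
t_values = {name: b / se for name, (b, se, _, _) in coefficients.items()}

# Predictors (excluding the constant) whose Sig. value is below alpha.
significant = [
    name
    for name, (_, _, _, sig) in coefficients.items()
    if name != "(Constant)" and sig < ALPHA
]

for name, t in t_values.items():
    reported = coefficients[name][2]
    print(f"{name:20s} t = {t:7.3f} (reported {reported:7.3f})")
print("significant at 0.05:", significant)
```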

This table has several columns to explain. The t value and the Sig. value opposite each independent variable indicate whether that variable is contributing significantly to the equation for predicting math achievement from the whole set of predictors. Thus high school grades and gender are the only variables in this table that add significantly to the prediction: their Sig. values are less than 0.05, so the null hypothesis is rejected for them.

The standardized Beta coefficients in the table show the relative importance of each predictor in the model. Grades in high school, with the highest Beta value of 0.457, contributes the most to the research model, while the competence scale, with a value of 0.035, contributes the least. The signs of the coefficients show the type of relationship a predictor has with the predicted variable. Except for gender, all predictors show a positive relationship with the dependent variable (math achievement). Individual interpretation of the predictors is given later.

The magnitude of the unstandardized B coefficients shows how much change in the math achievement score would take place per unit change in each predictor. Here one additional mark in high school grades would increase the math achievement score by 1.966. The other variables are interpreted later in the report.

According to the coefficients table, the competence scale, pleasure scale, motivation scale and parents' education are not statistically significant. This means they are not contributing significantly to our research model. Therefore, to improve the overall model, i.e. to increase the F value, we eliminate the insignificant variables.

TEST RUN 2

We now re-run the test after removing the predictor with the highest Sig. value, which is the competence scale with a Sig. value of 0.775. The rest of the predictors are included.

Dependent and Independent variables
Variables Entered/Removed (b)
Model 1
  Variables Entered: parents' education, pleasure scale, gender, grades in h.s., motivation scale (a)
  Variables Removed: competence scale
  Method: Enter

a. All requested variables entered.
b. Dependent Variable: math achievement test

Following are the results obtained:

Statistical Significance of the Model

ANOVA (b)
Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     1530.302          5   306.060       11.673   .000 (a)
Residual       1756.774         67    26.221
Total          3287.076         72

a. Predictors: (Constant), parents' education, pleasure scale, gender, grades in h.s., motivation scale
b. Dependent Variable: math achievement test

The F value has increased in the new model, which is consistent with an insignificant variable having been removed.

Model Summary

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .682 (a)   .466       .426                5.12060

a. Predictors: (Constant), parents' education, pleasure scale, gender, grades in h.s., motivation scale

Interestingly, the R square value is slightly higher than in the previous model summary despite the use of fewer predictors. The competence scale had little predictive power, and its correlation with the other predictors may have reduced their predictive power as well. Note also that the total degrees of freedom in the ANOVA table rise from 70 to 72, so this run includes two more cases than Test Run 1, which may account for part of the change. Almost 47% of the variation in the math achievement test can now be explained by the model.

Coefficients Table
Coefficients (a)
                       Unstandardized Coefficients   Standardized Coefficients
Model 1                B         Std. Error          Beta                        t        Sig.
(Constant)             -7.707    4.423                                           -1.742   .086
gender                 -3.827    1.305               -.284                       -2.933   .005
grades in h.s.          2.051     .407                .482                        5.042   .000
motivation scale        1.872    1.022                .177                        1.832   .071
pleasure scale           .946    1.067                .084                         .886   .379
parents' education       .548     .283                .187                        1.934   .057

a. Dependent Variable: math achievement test

In these results, gender and grades in high school are still significantly predicting math achievement, while the other predictors are not. The pleasure scale, being the weakest predictor, should be removed in the next cycle.

TEST RUN 3

In this step we remove the pleasure scale, which produced a p-value of 0.379, and analyze the results.

Dependent and Independent variables
Variables Entered/Removed (b)
Model 1
  Variables Entered: parents' education, motivation scale, grades in h.s., gender (a)
  Variables Removed: pleasure scale
  Method: Enter

a. All requested variables entered.
b. Dependent Variable: math achievement test

Statistical Significance of the Model

ANOVA (b)
Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     1509.723          4   377.431       14.440   .000 (a)
Residual       1777.353         68    26.138
Total          3287.076         72

a. Predictors: (Constant), parents' education, motivation scale, grades in h.s., gender
b. Dependent Variable: math achievement test

The F value has now increased further, to 14.440 in the new model compared with 11.673 in the previous one, which is consistent with an insignificant variable having been removed.

Model Summary

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .678 (a)   .459       .427                5.11249

a. Predictors: (Constant), parents' education, motivation scale, grades in h.s., gender

In this table the R square value has decreased very slightly compared with the previous model summary, because we are now using fewer predictors. The decrease is small because the removed predictor (pleasure scale) was not affecting the model significantly; that is, the pleasure scale was not contributing much to explaining the variation in math achievement. About 46% of the variation in the math achievement test is now explained by the model.

Coefficients Table
Coefficients (a)
                       Unstandardized Coefficients   Standardized Coefficients
Model 1                B         Std. Error          Beta                        t        Sig.
(Constant)             -5.444    3.605                                           -1.510   .136
gender                 -3.631    1.284               -.269                       -2.828   .006
grades in h.s.          1.991     .400                .468                        4.972   .000
motivation scale        2.148     .972                .203                        2.211   .030
parents' education       .580     .280                .198                        2.070   .042

a. Dependent Variable: math achievement test

Finally, all the predictors are now contributing significantly to the model in predicting math achievement. Each predictor's p-value is less than 0.05. We can obtain the same results in one step through the backward method.

Backward Method

Variables
Variables Entered/Removed (b)
Model 1
  Variables Entered: parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale (a)
  Variables Removed: .
  Method: Enter
Model 2
  Variables Entered: .
  Variables Removed: competence scale
  Method: Backward (criterion: Probability of F-to-remove >= .100).
Model 3
  Variables Entered: .
  Variables Removed: pleasure scale
  Method: Backward (criterion: Probability of F-to-remove >= .100).

a. All requested variables entered.
b. Dependent Variable: math achievement test

Model Summary

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .682 (a)   .465       .415                5.22962
2       .682 (b)   .465       .423                5.19258
3       .677 (c)   .458       .425                5.18352

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s.
c. Predictors: (Constant), parents' education, gender, motivation scale, grades in h.s.

ANOVA (d)
Model                Sum of Squares   df   Mean Square   F        Sig.
1    Regression      1523.461          6   253.910        9.284   .000 (a)
     Residual        1750.333         64    27.349
     Total           3273.794         70
2    Regression      1521.208          5   304.242       11.284   .000 (b)
     Residual        1752.586         65    26.963
     Total           3273.794         70
3    Regression      1500.446          4   375.111       13.961   .000 (c)
     Residual        1773.348         66    26.869
     Total           3273.794         70

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s.
c. Predictors: (Constant), parents' education, gender, motivation scale, grades in h.s.
d. Dependent Variable: math achievement test

Coefficients (a)
                            Unstandardized Coefficients   Standardized Coefficients
Model                       B         Std. Error          Beta                        t        Sig.
1    (Constant)             -7.859    4.546                                           -1.729   .089
     gender                 -3.729    1.382               -.274                       -2.698   .009
     grades in h.s.          1.966     .455                .457                        4.325   .000
     competence scale         .360    1.253                .035                         .287   .775
     motivation scale        1.701    1.207                .161                        1.410   .164
     pleasure scale           .852    1.180                .075                         .722   .473
     parents' education       .592     .314                .193                        1.883   .064
2    (Constant)             -7.750    4.498                                           -1.723   .090
     gender                 -3.750    1.370               -.275                       -2.737   .008
     grades in h.s.          2.007     .429                .467                        4.681   .000
     motivation scale        1.873    1.039                .177                        1.803   .076
     pleasure scale           .967    1.102                .085                         .878   .383
     parents' education       .591     .312                .193                        1.893   .063
3    (Constant)             -5.462    3.659                                           -1.493   .140
     gender                 -3.518    1.342               -.258                       -2.621   .011
     grades in h.s.          1.947     .423                .453                        4.608   .000
     motivation scale        2.158     .985                .204                        2.190   .032
     parents' education       .627     .309                .204                        2.031   .046

a. Dependent Variable: math achievement test
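The backward method can be sketched as a loop: fit the model, find the predictor with the largest p-value, remove it if that p-value meets the removal criterion (>= .100 here), and repeat. The sketch below does not refit any regression. As a stand-in it looks up the Sig. values reported in the coefficients table for each step, so the lookup table and function names are illustrative assumptions. For a single coefficient the F-to-remove probability equals the Sig. of its t test, which is why the Sig. column can be used.

```python
FULL = ["gender", "grades in h.s.", "competence scale",
        "motivation scale", "pleasure scale", "parents' education"]

# Sig. values per model, copied from the backward-method coefficients table.
REPORTED_SIG = {
    frozenset(FULL): {
        "gender": 0.009, "grades in h.s.": 0.000, "competence scale": 0.775,
        "motivation scale": 0.164, "pleasure scale": 0.473,
        "parents' education": 0.064,
    },
    frozenset(p for p in FULL if p != "competence scale"): {
        "gender": 0.008, "grades in h.s.": 0.000, "motivation scale": 0.076,
        "pleasure scale": 0.383, "parents' education": 0.063,
    },
    frozenset(p for p in FULL if p not in ("competence scale", "pleasure scale")): {
        "gender": 0.011, "grades in h.s.": 0.000, "motivation scale": 0.032,
        "parents' education": 0.046,
    },
}


def reported_p_values(predictors):
    """Stand-in for refitting the model: return the Sig. values the report lists."""
    return REPORTED_SIG[frozenset(predictors)]


def backward_eliminate(predictors, p_values_for, p_remove=0.100):
    """Drop the weakest predictor until every p-value is below p_remove."""
    current = list(predictors)
    removed = []
    while current:
        p = p_values_for(current)
        worst = max(current, key=lambda name: p[name])
        if p[worst] < p_remove:
            break  # nothing left that meets the removal criterion
        current.remove(worst)
        removed.append(worst)
    return current, removed


kept, removed = backward_eliminate(FULL, reported_p_values)
print("removed in order:", removed)
print("final predictors:", kept)
```

To run this on real data, `reported_p_values` would be replaced by a function that fits the regression on the given predictors and returns their p-values.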

The above tables give approximately the same results as were obtained when predictors were removed individually in three steps.

Excluding the Constant

The Sig. value of the constant term is greater than 0.05, so it is insignificant in the model. We can therefore exclude the constant term. The results are as follows:
Model Summary
Model   R          R Square (b)   Adjusted R Square   Std. Error of the Estimate
1       .935 (a)   .875           .863                5.30902
2       .935 (c)   .875           .865                5.26945
3       .935 (d)   .875           .867                5.23083
4       .933 (e)   .870           .864                5.29067

a. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale, competence scale
b. For regression through the origin (the no-intercept model), R Square measures the proportion of the variability in the dependent variable about the origin explained by regression. This CANNOT be compared to R Square for models which include an intercept.
c. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale
d. Predictors: parents' education, gender, motivation scale, grades in h.s.
e. Predictors: parents' education, gender, grades in h.s.

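With the constant term excluded, R square becomes about 87%. For regression through the origin this means that 87% of the variability of math achievement about the origin (not about its mean) is explained by the independent variables, so this figure cannot be compared directly with the R square of the models that include the constant.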
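Why the through-the-origin R square is so much larger can be seen from the sums of squares. The sketch below compares the R square about the origin with the same residual sum of squares measured against the mean-corrected total. It assumes the no-intercept run and the intercept run use the same 71 cases, which is inferred from the degrees of freedom and not stated in the report.

```python
# No-intercept model 1 (six predictors), from its ANOVA table.
SS_REGRESSION_ORIGIN = 12799.551
SS_RESIDUAL_ORIGIN = 1832.071
SS_TOTAL_UNCORRECTED = 14631.622   # sum of y**2, not corrected for the mean

# Intercept model with the same six predictors (Test Run 1 ANOVA table).
SS_RESIDUAL_INTERCEPT = 1750.333
SS_TOTAL_CORRECTED = 3273.794      # sum of (y - mean)**2

# R square as reported for regression through the origin.
r2_about_origin = SS_REGRESSION_ORIGIN / SS_TOTAL_UNCORRECTED

# The same no-intercept fit judged against variation about the mean
# (assumption: both runs use the same cases).
r2_about_mean_no_intercept = 1 - SS_RESIDUAL_ORIGIN / SS_TOTAL_CORRECTED

# R square of the intercept model, for comparison.
r2_intercept = 1 - SS_RESIDUAL_INTERCEPT / SS_TOTAL_CORRECTED

print(f"R square about the origin:            {r2_about_origin:.3f}")
print(f"No-intercept fit, about the mean:     {r2_about_mean_no_intercept:.3f}")
print(f"Intercept model, about the mean:      {r2_intercept:.3f}")
```

Under that assumption the residual sum of squares is actually larger without the constant, so the jump to 87% reflects the change of baseline and not a better fit.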

ANOVA (f, g)
Model                Sum of Squares    df   Mean Square   F         Sig.
1    Regression      12799.551          6   2133.258       75.686   .000 (a)
     Residual         1832.071         65     28.186
     Total           14631.622 (b)     71
2    Regression      12798.992          5   2559.798       92.188   .000 (c)
     Residual         1832.630         66     27.767
     Total           14631.622 (b)     71
3    Regression      12798.397          4   3199.599      116.938   .000 (d)
     Residual         1833.225         67     27.362
     Total           14631.622 (b)     71
4    Regression      12728.224          3   4242.741      151.574   .000 (e)
     Residual         1903.398         68     27.991
     Total           14631.622 (b)     71

a. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale, competence scale
b. This total sum of squares is not corrected for the constant because the constant is zero for regression through the origin.
c. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale
d. Predictors: parents' education, gender, motivation scale, grades in h.s.
e. Predictors: parents' education, gender, grades in h.s.
f. Dependent Variable: math achievement test
g. Linear Regression through the Origin

From the above ANOVA table we can see that the F value has increased (to 151.574 in the final model) with the exclusion of the insignificant constant term.

Coefficients (a, b)
                            Unstandardized Coefficients   Standardized Coefficients
Model                       B         Std. Error          Beta                        t        Sig.
1    gender                 -4.077    1.388               -.208                       -2.937   .005
     grades in h.s.          1.679     .430                .691                        3.908   .000
     competence scale         .179    1.268                .042                         .141   .888
     motivation scale        1.073    1.168                .220                         .918   .362
     pleasure scale          -.198    1.027               -.044                        -.193   .848
     parents' education       .549     .318                .186                        1.727   .089
2    gender                 -4.085    1.376               -.208                       -2.968   .004
     grades in h.s.          1.701     .396                .700                        4.295   .000
     motivation scale        1.163     .968                .239                        1.202   .234
     pleasure scale          -.133     .911               -.030                        -.146   .884
     parents' education       .549     .316                .186                        1.740   .087
3    gender                 -4.154    1.284               -.212                       -3.234   .002
     grades in h.s.          1.695     .391                .697                        4.336   .000
     motivation scale        1.061     .662                .218                        1.601   .114
     parents' education       .539     .306                .182                        1.763   .083
4    gender                 -4.068    1.298               -.207                       -3.134   .003
     grades in h.s.          2.108     .297                .867                        7.092   .000
     parents' education       .647     .302                .219                        2.146   .035

a. Dependent Variable: math achievement test
b. Linear Regression through the Origin

In the coefficients table, when the constant is excluded only three variables (gender, grades in high school and parents' education) are left in the final model to explain the variation in the math achievement test. In the discussion that follows we keep the insignificant constant term in the multiple regression equation predicting the math achievement score, because excluding it changes the results greatly.

Conclusion

From this discussion we conclude that in our research model, i.e. the prediction of math achievement test scores with the help of six predictors, only four predictors are contributing significantly. These four predictors are:
1) Gender
2) Grades in High School
3) Motivation Scale
4) Parents' Education

Multiple Regression Equation

$$Y' = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + C$$

Where
$Y'$ = Predicted math achievement
$X_1$ = Gender
$X_2$ = Grades in high school
$X_3$ = Motivation scale
$X_4$ = Parents' education
$C$ = Constant term
$\beta_1, \dots, \beta_4$ = unstandardized coefficients (B) of the respective predictors

With the coefficients from the Test Run 3 coefficients table, the equation is:

$$Y' = -3.631 X_1 + 1.991 X_2 + 2.148 X_3 + 0.580 X_4 - 5.444$$

Interpretation of the equation

With the help of the Test Run 3 coefficients table, we can interpret the coefficients of the predictors in the equation.

1) Gender
The coefficient of gender is -3.631. We have coded male as 0 and female as 1, so the coefficient means that, with the other predictors held constant, a female is predicted to score 3.631 less than a male on the math achievement test.

2) Grades in high school
The coefficient is 1.991, which means that a one-unit increase in a student's high school grades predicts an increase of 1.991 in math achievement.

3) Motivation Scale
The coefficient of 2.148 means that a one-unit increase in motivation would increase math achievement by 2.148 units.

4) Parents' Education
The coefficient here is small. An increase of one level in the parents' education would increase the math achievement score by only 0.58.

5) Constant
The constant term in this equation is -5.444. That is, the predicted score is negative when all the predictors are equal to zero.

We can therefore conclude that the model (gender, parents' education, motivation and grades in high school) can predict the math achievement test.

Future Use of the Equation
1. Our multiple regression model can be used for forecasting. Math achievement scores can be predicted for future cases as well.
2. With the combination of these variables we can also predict other types of test similar to the math achievement test.
3. This model can be used in other quantitative techniques like logistic regression.

Limitations:
1. Several variables initially used in the model, such as Pleasure, Motivation and Competence, are subjective and difficult to treat as scalar quantities. (These scores were obtained from various group and individual tests.)
2. The level of pleasure provided to students should have a limit, as an increase in the level of pleasure will subsequently affect math achievement.
3. The constant term is included in the equation even though it was not statistically significant, in order to give realistic predictions of math achievement from the given predictors.
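The fitted equation from Test Run 3 can be written as a small function for forecasting. This is a sketch only: the function name and the example student below are hypothetical, while the coefficients are those reported in the Test Run 3 coefficients table.

```python
# Unstandardized coefficients (B) from the Test Run 3 coefficients table.
CONSTANT = -5.444
B_GENDER = -3.631       # gender coded 0 = male, 1 = female
B_GRADES = 1.991        # grades in high school, scale 1 to 8
B_MOTIVATION = 2.148    # motivation scale, 1 to 4
B_PARENTS_EDU = 0.580   # parents' education, scale 2 to 10


def predict_math_achievement(gender, grades, motivation, parents_education):
    """Return the predicted math achievement score for one student."""
    return (CONSTANT
            + B_GENDER * gender
            + B_GRADES * grades
            + B_MOTIVATION * motivation
            + B_PARENTS_EDU * parents_education)


# Hypothetical student: male, grades 5, motivation 3, parents' education 4.
male_score = predict_math_achievement(0, 5, 3, 4)

# Same hypothetical profile but female; the gap equals the gender coefficient.
female_score = predict_math_achievement(1, 5, 3, 4)

print(f"Predicted score (male):   {male_score:.3f}")
print(f"Predicted score (female): {female_score:.3f}")
print(f"Female minus male:        {female_score - male_score:.3f}")
```

Predictions are only meaningful for inputs within the ranges observed in the data. With every predictor set to zero the function returns the constant, -5.444, which lies outside the scales described for grades, motivation and parents' education.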