
Question: 27.25

(a) WEIGHT (GRAMS) = -579 + 14.3 LENGTH (CM) + 113 WIDTH (CM)

(b) 93.7% of the variation in perch weight is explained by the model.

(c) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.

(d) The p-values for β1 and β2 are 0.014 and approximately zero, with t-values of 2.53 and 3.75 respectively. So the data provide evidence that both β1 and β2 are significantly different from zero.

(e) WEIGHT (GRAMS) = 114 - 3.48 LENGTH (CM) - 94.6 WIDTH (CM) + 5.24 INTERACTION

(f) 98.5% of the variation in perch weight is explained by the model.

(g) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.

(h) The t-statistic changed to -1.10 and the p-value becomes 0.274, which is greater than the α-level of 0.05, so the data do not provide evidence that β1 is significantly different from zero.

Question: HW 4

1. Dependent variable: ManHours
Independent variables: OCCUP, CHECKIN, HOURS, COMMON, WINGS, CAP, ROOMS

Step-1
Root MSE of model = 455.167, R-Sq = 96.1%, R-Sq(adj) = 94.5%
The model p-value is approximately zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.129), HOURS (0.722) and WINGS (0.708) are greater than the alpha level of 0.05, so we run another regression without HOURS, which has the highest p-value of the three.

Step-2
Dependent variable: ManHours
Independent variables: OCCUP, CHECKIN, COMMON, WINGS, CAP, ROOMS
Root MSE of model = 444.049, R-Sq = 96.1%, R-Sq(adj) = 94.8%
Root MSE decreased and R-Sq(adj) increased, both only slightly. The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.122) and WINGS (0.696) are greater than the alpha level of 0.05, so these variables are not significant. We run another regression without WINGS, which has the highest p-value.
Step-3
Dependent variable: ManHours
Independent variables: OCCUP, CHECKIN, COMMON, CAP, ROOMS
Root MSE of model = 434.089, R-Sq = 96.1%, R-Sq(adj) = 95.0%
Root MSE decreased further compared with the previous run, and R-Sq(adj) increased slightly. The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of OCCUP is 0.096, which is greater than the alpha level of 0.05, so OCCUP is not significant. We run another regression without OCCUP.

Step-4
Dependent variable: ManHours
Independent variables: CHECKIN, COMMON, CAP, ROOMS
Root MSE of model = 455.909, R-Sq = 95.4%, R-Sq(adj) = 94.5%
Root MSE increased and R-Sq(adj) decreased slightly compared with the previous run. The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of COMMON is 0.098, which is greater than the alpha level of 0.05, so COMMON is not significant. We run another regression without COMMON.

Step-5
Dependent variable: ManHours
Independent variables: CHECKIN, CAP, ROOMS
Root MSE of model = 477.343, R-Sq = 94.7%, R-Sq(adj) = 94.0%
Root MSE increased further and R-Sq(adj) decreased slightly compared with the previous run. The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of all independent variables are less than the 0.05 alpha level, so every variable remaining in this run is significant and contributes to the model.
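The elimination rule used in the steps above (drop the predictor with the largest p-value above 0.05, then refit) can be sketched in a few lines of Python. This is only an illustration of the selection logic, not a regression fit: the function name `variable_to_drop` is hypothetical, and the dictionaries hold only the p-values quoted in the write-up, since the p-values of the significant predictors are not listed.

```python
ALPHA = 0.05  # significance level used throughout the steps


def variable_to_drop(p_values, alpha=ALPHA):
    """Return the predictor with the largest p-value above alpha, or None."""
    if not p_values:
        return None
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


# Non-significant p-values as reported in each step.
# Predictors whose p-values are not quoted in the text are omitted.
reported = {
    "Step-1": {"OCCUP": 0.129, "HOURS": 0.722, "WINGS": 0.708},
    "Step-2": {"OCCUP": 0.122, "WINGS": 0.696},
    "Step-3": {"OCCUP": 0.096},
    "Step-4": {"COMMON": 0.098},
    "Step-5": {},  # every remaining predictor is below 0.05
}

for step, p_values in reported.items():
    dropped = variable_to_drop(p_values)
    print(step, "->", f"drop {dropped}" if dropped else "stop, all significant")
```

Run on the reported values, the rule removes HOURS, WINGS, OCCUP and COMMON in that order and then stops, which matches the sequence of regressions described.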
[Figure: residual plots (response is ManHours): Histogram (Frequency vs. Residual) and Versus Fits (Residual vs. Fitted Value).]

Final model is

ManHours = 118 + 1.93 CHECKIN - 11.0 CAP + 22.7 ROOMS
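As a minimal sketch of how the fitted equation would be used, the function below evaluates it for one set of inputs. The function name `predict_manhours` and the input values are hypothetical and chosen only for illustration; the coefficients are the rounded ones from the equation above.

```python
def predict_manhours(checkin, cap, rooms):
    """Evaluate ManHours = 118 + 1.93 CHECKIN - 11.0 CAP + 22.7 ROOMS."""
    return 118 + 1.93 * checkin - 11.0 * cap + 22.7 * rooms


# Hypothetical inputs, not taken from the data set.
example = predict_manhours(checkin=100, cap=50, rooms=40)
print(round(example, 1))
```

Because the coefficients are rounded, predictions from this function may differ slightly from those produced by the statistical software.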

The histogram of residuals looks approximately normal, and the scatter plot of residuals versus fitted values shows no pattern and looks random. So we can choose this as the final model to predict the man-hour requirement for BOQ for the US Navy.

2. Best subset model

The two best one-variable models use CHECKIN and ROOMS. Both have a very high Cp, far from the target value of 2 for a single variable, and a very high Se, so we should look at models with more independent variables. The situation for the two best two-variable models is similar.

Among the two best three-variable models, Cp is still far from 4, except for the model with CHECKIN, CAP and ROOMS, which has Cp = 6.1, Se = 477.38, R-Sq = 94.7% and R-Sq(adj) = 94.0% and is a potential candidate.

Among the two best four-variable models, the model with CHECKIN, COMMON, CAP and ROOMS has Cp = 5.1, which is close to 5 and very good. Its R-Sq and R-Sq(adj) are 95.4% and 94.5% respectively, which compare well with the other four-variable model, whose Cp is 13.4. So this model is a candidate, provided no better model turns up.

The five-variable models have Cp values of 4.3 and 6.4, which are not close to 6, with Se values higher than that of the four-variable candidate. Se decreases for the two six-variable models, but Cp is 6.1 for both, which is not close enough to 7, and R-Sq and R-Sq(adj) increase only slightly. The model with all seven independent variables has Cp = 8, which equals the target value (8). Its Se is 455.17, very close to that of the four-variable candidate, with a small increase in R-Sq and R-Sq(adj). Overall there is no large change in any of these measures.

Looking at all models, I would choose the four-variable model with CHECKIN, COMMON, CAP and ROOMS, which has a Cp close to its target (5.1), good R-Sq (95.4%) and R-Sq(adj) (94.5%), and a low Se (455.91). These values are slightly better than those of the model chosen in question 1. Still, I feel I should go with the model chosen in question 1. The four-variable model is

ManHours = 199 + 1.72 CHECKIN - 17.0 COMMON - 13.1 CAP + 27.0 ROOMS
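The Cp comparisons above rely on Mallows' Cp, which for a subset model with p parameters (predictors plus the intercept) is SSE_p / MSE_full - (n - 2p), with a well-fitting model giving Cp close to p. The sketch below shows the calculation and checks that the full model always has Cp equal to its own parameter count. The function name `mallows_cp` is hypothetical, and the sample size used is an assumption, since the write-up does not state n.

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Mallows' Cp = SSE_p / MSE_full - (n - 2p); p counts the intercept."""
    return sse_subset / mse_full - (n_obs - 2 * n_params)


# ASSUMPTION: n_obs is a hypothetical sample size; the text does not give it.
n_obs = 25
s_full = 455.167        # root MSE of the seven-variable model (from Step-1)
n_params_full = 8       # seven predictors plus the intercept

mse_full = s_full ** 2
sse_full = mse_full * (n_obs - n_params_full)

# For the full model, Cp reduces to (n - p) - (n - 2p) = p, whatever n is.
cp_full = mallows_cp(sse_full, mse_full, n_obs, n_params_full)
print(round(cp_full, 6))
```

This is why the seven-variable model's Cp of 8 carries no information on its own: it matches its target by construction, so the comparison is only meaningful for the smaller subsets.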
[Figure: residual plots (response is ManHours): Histogram (Frequency vs. Residual) and Versus Fits (Residual vs. Fitted Value).]

The residual histogram looks approximately normal and the scatter plot looks random.

3. Collinearity is a problem in choosing a regression model when there is a strong correlation between independent variables. Correlated independent variables do not improve the model; they only inflate the standard errors of the estimated parameters. One of a pair of strongly correlated independent variables is redundant to the model.
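One common way to quantify this inflation is the variance inflation factor (VIF), which is not used in the answer above and is shown here only as an illustration. For the special case of two predictors, VIF = 1 / (1 - r^2), where r is their correlation; with more predictors, r^2 is replaced by the R-Sq from regressing one predictor on all the others. The function names and the data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)


# Hypothetical data: x2 is almost an exact multiple of x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
print(vif_two_predictors(x1, x2))  # a large value signals strong collinearity

# Hypothetical uncorrelated predictors give a VIF of 1 (no inflation).
print(vif_two_predictors([-1, 0, 1], [1, -2, 1]))
```

The variance of each coefficient estimate is multiplied by its VIF, so the standard error grows by the square root of that factor.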
