
IE 561

STATISTICAL METHODS AND DATA ANALYSIS


ASSIGNMENT:
Homework 6
Instructor:
Dr. Cheng-Hung Hu
Department of Industrial Engineering and Management
Work elaborated by:
Belen Vargas
ID: 1015459
Carlos Manuel Santos
ID: 1015458
Due date: 2012/11/06

QUESTION 1
A)
Histogram of X1, X2, X3, X4 (with fitted normal curves)

[Figure: four histogram panels of frequency versus value, one per predictor.
 X1: Mean 103.4, StDev 20.30, N = 25
 X2: Mean 106.7, StDev 17.29, N = 25
 X3: Mean 100.8, StDev 8.855, N = 25
 X4: Mean 94.68, StDev 10.68, N = 25]

Boxplot of X1, X2, X3, X4

[Figure: four boxplot panels, one per predictor.]

The distributions of X2 and X3 are slightly skewed, there could be a potential outlier in X1, and X1 seems to follow a normal distribution.

B)

Matrix Plot of Yi, X1, X2, X3, X4

[Figure: scatter plot matrix of Yi against X1-X4 and of each pair of predictors.]
Correlations: Yi, X1, X2, X3, X4

         Yi      X1      X2      X3
X1    0.514
      0.009

X2    0.497   0.102
      0.011   0.627

X3    0.897   0.181   0.519
      0.000   0.387   0.008

X4    0.869   0.327   0.397   0.782
      0.000   0.111   0.050   0.000

Cell Contents: Pearson correlation
               P-Value

Based on the correlation values and the analysis of the scatter plot matrix, Y is highly linearly correlated with all four predictor variables. Based on the highlighted correlation values and the corresponding scatter plots, X2 and X3, and X3 and X4, are highly correlated, which indicates a possible multicollinearity problem.
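The Pearson correlations and the tests behind their p-values can be reproduced outside Minitab. The sketch below (plain Python; the helper names are ours, not Minitab's) computes the coefficient and the t statistic used for the test of H0: rho = 0, t = r * sqrt(n-2) / sqrt(1-r^2), with n - 2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_stat(r, n):
    """t statistic for testing H0: rho = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
```

Applying `t_stat` to, say, r = 0.514 with n = 25 and comparing against the t(23) distribution reproduces the tabled p-value of 0.009.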

C) The regression equation is

Yi = -124.382 + 0.29573 X1 + 0.04829 X2 + 1.3060 X3 + 0.5198 X4

Predictor       Coef  SE Coef       T      P    VIF
Constant    -124.382    9.941  -12.51  0.000
X1           0.29573  0.04397    6.73  0.000  1.138
X2           0.04829  0.05662    0.85  0.404  1.370
X3            1.3060   0.1641    7.96  0.000  3.017
X4            0.5198   0.1319    3.94  0.001  2.835

It seems as if X2 should be excluded from the model due to its p-value of 0.404.
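Minitab's VIF column can be checked by hand: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing X_j on the remaining predictors. A minimal numpy sketch (function names are ours; this is an illustration, not Minitab's implementation):

```python
import numpy as np

def ols_coefs(A, y):
    """Least-squares coefficients of y on the columns of A."""
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b

def vif(X):
    """VIF of each column of X (predictor columns only, no intercept):
    VIF_j = 1 / (1 - R^2_j), R^2_j from regressing X_j on the rest."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        yj = X[:, j]
        fitted = others @ ols_coefs(others, yj)
        ss_res = np.sum((yj - fitted) ** 2)
        ss_tot = np.sum((yj - yj.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return out

# Two orthogonal predictors: no multicollinearity, so both VIFs equal 1.
demo = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
vifs = vif(demo)
```

Running `vif` on the homework's X matrix would reproduce the 1.138, 1.370, 3.017, 2.835 column above.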


D) USING FITTED VALUES (Yhat)

Best Subsets Regression: Yi versus X1, X2, X3, X4

Response is Yi

                                               X X X X
Vars   R-Sq  R-Sq(adj)  Mallows Cp        S    1 2 3 4
   1   80.5       79.6        84.2   8.7676        X
   1   75.6       74.5       110.6   9.8039          X
   1   26.5       23.3       375.3   17.014    X
   1   24.7       21.4       384.8   17.217      X
   2   93.3       92.7        17.1   5.2512    X   X
   2   87.7       86.6        47.2   7.1073        X X
   2   81.5       79.8        80.6   8.7193    X     X
   2   80.6       78.8        85.5   8.9336      X X
   3   96.2       95.6         3.7   4.0720    X   X X
   3   93.4       92.5        18.5   5.3306    X X X
   3   87.9       86.2        48.2   7.2237      X X X
   3   84.5       82.3        66.3   8.1653    X X   X
   4   96.3       95.5         5.0   4.0986    X X X X

The four best subset regression models according to the R²a,p criterion are (X1,X3,X4), (X1,X2,X3,X4), (X1,X3), and (X1,X2,X3).
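Best-subsets regression is simply an exhaustive enumeration of predictor subsets ranked by a criterion. A small numpy sketch of the R²a,p (adjusted R²) version, assuming the predictors sit in the columns of an array (names and data here are illustrative, not the homework's):

```python
import numpy as np
from itertools import combinations

def adj_r2(X, y, cols):
    """Adjusted R-squared of the OLS fit of y on the selected columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    p = len(cols) + 1                      # parameters incl. intercept
    return 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))

def best_subsets(X, y):
    """Rank every non-empty predictor subset by adjusted R² (best first)."""
    k = X.shape[1]
    subsets = [c for r in range(1, k + 1) for c in combinations(range(k), r)]
    return sorted(subsets, key=lambda c: adj_r2(X, y, c), reverse=True)

# Synthetic demo: y depends on columns 0 and 2, column 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=30)
ranking = best_subsets(X, y)
```

With four predictors this enumerates all 15 subsets, matching the 13 rows Minitab prints (it shows only the top models per size).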

E)

We would choose the stepwise method because it is an automatic procedure for statistical model selection in cases where there is a large number of potential explanatory variables (which is our case) and no underlying theory on which to base the model selection. At each stage in the process, after a new variable is added, a test is made to check whether some variables can be deleted without appreciably increasing the residual sum of squares. The procedure terminates when the selection measure is (locally) maximized, or when the available improvement falls below some critical value.
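The procedure described above can be sketched as a greedy forward search. For simplicity, the sketch below uses a fixed partial-F threshold in place of Minitab's alpha-to-enter, and omits the removal step, so it illustrates the idea rather than reproducing Minitab's exact algorithm:

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for the OLS fit on the selected columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return r @ r

def forward_stepwise(X, y, f_enter=4.0):
    """Greedily add the predictor with the largest partial F statistic,
    as long as it exceeds f_enter (a fixed stand-in for alpha-to-enter)."""
    k = X.shape[1]
    n = len(y)
    chosen, remaining = [], list(range(k))
    while remaining:
        base = sse(X, y, chosen)
        best_f, best_j = -1.0, None
        for j in remaining:
            new = sse(X, y, chosen + [j])
            df = n - len(chosen) - 2        # error df after adding j
            f = (base - new) / (new / df)
            if f > best_f:
                best_f, best_j = f, j
        if best_f < f_enter:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

# Synthetic demo: y depends strongly on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 3.0 * X[:, 0] + X[:, 2] + rng.normal(scale=0.1, size=40)
selected = forward_stepwise(X, y)
```

A full stepwise routine would also re-test the variables already in the model against an alpha-to-remove after each addition.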

F)

Stepwise Regression: Yi versus X1, X2, X3, X4

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Yi on 4 predictors, with N = 25

Step              1        2        3
Constant     -106.1   -127.6   -124.2

X3             1.97     1.82     1.36
T-Value        9.74    14.81     8.94
P-Value       0.000    0.000    0.000

X1                      0.348    0.296
T-Value                  6.49     6.78
P-Value                 0.000    0.000

X4                               0.52
T-Value                          3.95
P-Value                         0.001

S              8.77     5.25     4.07
R-Sq          80.47    93.30    96.15
R-Sq(adj)     79.62    92.69    95.60
Mallows Cp     84.2     17.1      3.7

The stepwise model selection method suggests that the model with (X1,X3,X4) as predictor variables is the best subset model, which is consistent with the conclusion we drew earlier with the R²a,p criterion in part D.
G)
Both the Stepwise method and the R²a,p criterion indicate that the model with (X1,X3,X4) as predictor variables is the best subset regression model. The main comparison between the two subset-selection methods is that the R²a,p criterion shows how each subset fits the data, while Stepwise directly returns a single suggested subset; so with a larger number of predictors we would choose the method that computes an answer faster, which in this case is the Stepwise method.

QUESTION 2
A)
The regression equation is
LNY = -2.04 - 0.712 LNX1 + 0.747 LN(140-X2) + 0.757 LNX3
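Fitting this transformed model is an ordinary least-squares fit on LNX1, LN(140-X2), and LNX3. The sketch below uses synthetic data generated from arbitrarily chosen coefficients (not the homework's data) purely to show the mechanics of building the transformed design matrix:

```python
import numpy as np

# Hypothetical data standing in for the homework's X1, X2, X3;
# ranges chosen so that 140 - X2 stays positive.
rng = np.random.default_rng(1)
X1 = rng.uniform(60, 120, size=33)
X2 = rng.uniform(80, 130, size=33)
X3 = rng.uniform(0.5, 3.0, size=33)

# Generate LNY from the same functional form with made-up coefficients.
lnY = -2.0 - 0.7 * np.log(X1) + 0.75 * np.log(140 - X2) + 0.76 * np.log(X3)

# Design matrix of the transformed predictors, then OLS.
A = np.column_stack([np.ones(33), np.log(X1), np.log(140 - X2), np.log(X3)])
coef, *_ = np.linalg.lstsq(A, lnY, rcond=None)
```

Because the demo response is noise-free, `coef` recovers the generating coefficients exactly; on real data it would return the estimated ones.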

B)
Normal Probability Plot (response is LNY)

[Figure: normal probability plot of the residuals; percent versus residual, with residuals ranging from about -0.4 to 0.4.]

Scatterplot of RESI1 vs FITS1, LNX3, LN(140-X2), LNX1

[Figure: residuals RESI1 plotted against the fitted values FITS1 and against each transformed predictor.]

C)

Predictor         Coef  SE Coef      T      P    VIF
Constant        -2.043    1.019  -2.00  0.054
LNX1          -0.71195  0.09203  -7.74  0.000  1.339
LN(140-X2)      0.7474   0.1570   4.76  0.000  1.330
LNX3            0.7574   0.1592   4.76  0.000  1.016

There are no indications that serious multicollinearity exists because the average of the VIF values is not considerably greater than one.
D)

Subject         T         Subject         T
  1      -0.02397           18       0.955272
  2       0.00340           19       0.955632
  3      -0.21765           20       1.648977
  4       0.27942           21      -1.68083
  5      -0.15891           22      -0.60600
  6      -0.16114           23       0.425732
  7       0.649973          24       0.229971
  8      -0.58074           25      -0.73581
  9      -0.04679           26      -1.43759
 10      -0.51129           27      -0.33157
 11       1.127888          28       2.214574
 12      -0.86439           29      -2.51995
 13      -0.08746           30      -0.76873
 14      -0.11518           31      -0.97484
 15       1.070476          32       1.98281
 16      -1.43825           33       0.829027
 17       0.734645

We would like to test whether subject 29, which has the largest absolute value, is an outlier. We shall use the Bonferroni simultaneous test procedure with α = 0.10:
t(1 - α/(2n); n - p - 1) = t(0.9985; 28) = 3.24993
We can observe that the absolute value for subject 29 (2.51995) is not larger than the critical t-value given by the Bonferroni procedure; therefore we do not have statistical evidence that the model has any outliers.
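The test above reduces to comparing the largest |t_i| against the Bonferroni critical value. A short check in plain Python, using the studentized deleted residuals from part D and the critical value quoted in the text:

```python
# Studentized deleted residuals for subjects 1-33, from part D.
deleted_resids = [
    -0.02397, 0.0034, -0.21765, 0.27942, -0.15891, -0.16114, 0.649973,
    -0.58074, -0.04679, -0.51129, 1.127888, -0.86439, -0.08746, -0.11518,
    1.070476, -1.43825, 0.734645, 0.955272, 0.955632, 1.648977, -1.68083,
    -0.606, 0.425732, 0.229971, -0.73581, -1.43759, -0.33157, 2.214574,
    -2.51995, -0.76873, -0.97484, 1.98281, 0.829027,
]

# Bonferroni critical value t(1 - alpha/(2n); n - p - 1) quoted in the
# text for alpha = 0.10, n = 33, 28 error degrees of freedom.
t_crit = 3.24993

# Index of the largest residual in absolute value (subject = index + 1).
worst = max(range(len(deleted_resids)), key=lambda i: abs(deleted_resids[i]))
is_outlier = abs(deleted_resids[worst]) > t_crit
```

Here `worst` points at subject 29 and `is_outlier` is False, matching the conclusion above.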
