
Homework 3 (Attendance 5) for Statistics 512

Applied Regression Analysis


Material Covered: Chapter 6 Neter et al. and Kuhn
By: Friday, 3rd October, Fall 2003

This homework is worth 5% and marked out of 5 points. Homework assignments
are to be handed in using Vista on the Internet before 4am. Vista will not allow
any homework assignment to be handed in late. It is highly recommended that you
complete the homework, by hand, before logging onto Vista; use Vista simply to
submit your answers. Submit as many times as you want before the deadline and
receive the highest score of all the submissions. This is an individual homework
and so each student submits their own homework, although they are encouraged to
cooperate with other students.

1. Applied Linear Statistical Models (Neter et al.) Questions.

Chapter              Problem(s)                            Hints
6, pages 252-257     6.9, 6.10, 6.11, 6.12, 6.13, 6.14     Chemical shipment
                     6.18, 6.19, 6.20, 6.21                Mathematicians salaries
(6.9) chemical shipment, hw3-6-9-chem-diagnos

*HOMEWORK 3, 6-9, PAGES 252-257;


DATA CHEMICAL;
INPUT Y X1 X2 TIME;
DATALINES;
58 7 5.11 1
152 18 16.72 2
41 5 3.2 3
93 14 7.03 4
101 11 10.98 5
38 5 4.04 6
203 23 22.07 7
78 9 7.03 8
117 16 10.62 9
44 5 4.76 10
121 17 11.02 11
112 12 9.51 12
50 6 3.79 13
82 12 6.45 14
48 8 4.6 15
127 15 13.86 16
140 17 13.03 17
155 21 15.21 18
39 6 3.64 19
90 11 9.57 20
;
*6.9(A) STEM AND LEAF OF X1 AND X2;
PROC UNIVARIATE DATA=CHEMICAL PLOT;
TITLE1 '6.9(A) STEM AND LEAF OF NUMBER OF DRUMS, X1';
TITLE2 'AND OF WEIGHT OF SHIPMENTS, X2';
VAR X1 X2;
RUN;
*6.9(B) TIMEPLOTS OF X1 AND X2;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=CHEMICAL;
TITLE1 '6.9(B-1) TIMEPLOT OF NUMBER OF DRUMS, X1';
PLOT X1*TIME;
RUN;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=CHEMICAL;
TITLE1 '6.9(B-2) TIMEPLOT OF WEIGHT OF SHIPMENTS, X2';
PLOT X2*TIME;
RUN;
*6.9(C) SCATTERPLOT MATRICES AND CORRELATION;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=CHEMICAL;
TITLE1 '6.9(C-1) HANDLING TIME VERSUS NUMBER OF DRUMS, Y VS X1';
PLOT Y*X1;
RUN;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=CHEMICAL;
TITLE1 '6.9(C-2) HANDLING TIME VERSUS WEIGHT OF SHIPMENTS, Y VS X2';
PLOT Y*X2;
RUN;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=CHEMICAL;
TITLE1 '6.9(C-3) NUMBER OF DRUMS VERSUS WEIGHT OF SHIPMENTS, X1 VS X2';
PLOT X1*X2;
RUN;
PROC CORR DATA=CHEMICAL;
TITLE '6.9(C-4) CORRELATION Y, X1 AND X2';
VAR Y X1 X2;
RUN;
QUIT;

(a) Stem-and-leaf plots.
Look for outliers.
(b) Time plots.
Any patterns?
(c) Scatter plots and correlation matrix.
Ideally Y is strongly linearly related to both X1 and X2, while X1 and X2
are not strongly linearly related to one another (strong correlation
between the predictors signals multicollinearity).
(6.10) chemical shipment again, hw3-6-10-chem-residual

*HOMEWORK 3, 6-10, PAGES 252-257;


DATA CHEMICAL;
INPUT Y X1 X2 TIME;
X1X2 = X1*X2;
DATALINES;
58 7 5.11 1
152 18 16.72 2
41 5 3.2 3
93 14 7.03 4
101 11 10.98 5
38 5 4.04 6
203 23 22.07 7
78 9 7.03 8
117 16 10.62 9
44 5 4.76 10
121 17 11.02 11
112 12 9.51 12
50 6 3.79 13
82 12 6.45 14
48 8 4.6 15
127 15 13.86 16
140 17 13.03 17
155 21 15.21 18
39 6 3.64 19
90 11 9.57 20
;
*6.10(A) REGRESSION;
PROC REG DATA=CHEMICAL OUTEST=EST;
TITLE1 '6.10(A) REGRESSION OF Y VS X1 AND X2';
MODEL Y = X1 X2;
OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
*6.10(B) BOXPLOT OF RESIDUALS;
PROC UNIVARIATE DATA=OUTPLOT PLOT;
TITLE1 '6.10(B) BOXPLOT OF RESIDUALS';
VAR RESID;
RUN;
*6.10(C) RESIDUALS VS PREDICTED, X1, X2 AND X1X2;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
TITLE '6.10(C-1) RESIDUALS VS PREDICTED';
PLOT RESID*PRED;
RUN;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
TITLE '6.10(C-2) RESIDUALS VS X1';
PLOT RESID*X1;
RUN;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
TITLE '6.10(C-3) RESIDUALS VS X2';
PLOT RESID*X2;
RUN;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
TITLE '6.10(C-4) RESIDUALS VS X1X2';
PLOT RESID*X1X2;
RUN;
*6.10(C) NORMAL PROBABILITY PLOT;
* RESIDUALS VS EXPECTED RESIDUALS;
PROC SORT DATA=OUTPLOT;
BY RESID;
RUN;
DATA OUTPLOT;
SET OUTPLOT NOBS=NOBS;
QUANTILE = PROBIT( (_N_- (3/8)) / (NOBS + (1/4)) );
RUN;
DATA OUTPLOT2;
IF _N_ = 1 THEN SET EST;
SET OUTPLOT;
EXPRESIDUAL = _RMSE_*QUANTILE;
RUN;
PROC GPLOT DATA=OUTPLOT2;
TITLE '6.10(C-5) NORMAL PROBABILITY PLOT';
PLOT RESID*EXPRESIDUAL;
RUN;
*6.10(D) TIMEPLOT OF RESIDUALS;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
TITLE1 '6.10(D) TIMEPLOT OF RESIDUALS';
PLOT RESID*TIME;
RUN;
*6.10(E) LEVENE TEST OF RESIDUALS;
DATA NEWCHEMICAL;
SET OUTPLOT;
IF PRED < 92 THEN LEVENEGROUP = 'A';
IF PRED GE 92 THEN LEVENEGROUP = 'B';
RUN;
PROC GLM DATA=NEWCHEMICAL ALPHA=0.01;
TITLE1 '6.10(E) (UNMODIFIED) LEVENE TEST';
TITLE2 'OF HOMOGENEITY OF VARIANCE OF RESIDUALS';
CLASS LEVENEGROUP;
MODEL RESID = LEVENEGROUP;
MEANS LEVENEGROUP / HOVTEST = LEVENE (TYPE=ABS);
RUN;
QUIT;
(a) Estimated regression function.
(b) Box plot of the residuals.
Look for outliers.
(c) Residual plots.
The fit looks adequate if the residual plots show no pattern and no outliers.
(d) Residuals versus time plot.
(e) Levene Test
1. Statement.
The statement of the test is (check none, one or more):
(i) $H_0$: error variance constant versus $H_1: \sigma^2 > 1$.
(ii) $H_0$: error variance constant versus $H_1$: error variance not constant.
(iii) $H_0$: error variance constant versus $H_1: \sigma^2 \neq 1$.
2. Test.
From SAS, the p-value is (choose one) 0.446 / 0.8278 / 0.989
The level of significance is (circle one) 0.01 / 0.05 / 0.10
3. Conclusion.
Since the p-value is smaller / larger than the level of significance we
(circle one) accept / reject the null hypothesis that the error variance
is constant.
(6.11) chemical shipment again, hw3-6-11-chem-regress

*HOMEWORK 3, 6-11, PAGES 252-257;


DATA CHEMICAL;
INPUT Y X1 X2 TIME;
DATALINES;
58 7 5.11 1
152 18 16.72 2
41 5 3.2 3
93 14 7.03 4
101 11 10.98 5
38 5 4.04 6
203 23 22.07 7
78 9 7.03 8
117 16 10.62 9
44 5 4.76 10
121 17 11.02 11
112 12 9.51 12
50 6 3.79 13
82 12 6.45 14
48 8 4.6 15
127 15 13.86 16
140 17 13.03 17
155 21 15.21 18
39 6 3.64 19
90 11 9.57 20
;
*6.11 REGRESSION;
PROC REG DATA=CHEMICAL;
TITLE1 '6.11 REGRESSION OF Y VS X1 AND X2';
MODEL Y = X1 X2;
RUN;
QUIT;

Source        Sum of Squares    Degrees of Freedom       Mean Squares
Regression    40,496.48         p - 1 = 3 - 1 = 2        20,248.24
Error            536.47         n - p = 20 - 3 = 17          31.56
Total         41,032.95         n - 1 = 20 - 1 = 19

(a) Test of regression relation at $\alpha = 0.05$.
1. Statement.
The statement of the test is (check none, one or more):
(i) $H_0: \beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 = \beta_2 > 0$.
(ii) $H_0: \beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 = \beta_2 < 0$.
(iii) $H_0: \beta_1 = \beta_2 = 0$ versus $H_1$: not all $\beta_i$ are zero.
2. Test.
From SAS, the p-value is (choose one) 0 / 0.0827 / 0.098
The level of significance is (circle one) 0.01 / 0.05 / 0.10
3. Conclusion.
Since the p-value is smaller / larger than the level of significance we
(circle one) accept / reject the null hypothesis that $\beta_1 = \beta_2 = 0$.
(b) Bonferroni Confidence Intervals.
From TI83 (INVT 17 ENTER 0.9875 ENTER)
$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2(2));\ 20 - 3) = t(0.9875;\ 17) = 2.458$
From SAS,
1. Bonferroni CI for $\beta_1$:
$b_1 = 3.7681$ and $s\{b_1\} = 0.614$,
$b_1 \pm B s\{b_1\} = 3.7681 \pm 2.458(0.614) = ?$
2. Bonferroni CI for $\beta_2$:
$b_2 = 5.0796$ and $s\{b_2\} = 0.666$,
$b_2 \pm B s\{b_2\} = 5.0796 \pm 2.458(0.666) = ?$
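For those who prefer to let SAS supply the critical value instead of the TI83, the calculation in (b) can be sketched as a short DATA step. This is only an illustration: the data set name BONF and the variable names are made up for this sketch, and the estimates and standard errors are typed in from the SAS output quoted above. TINV is the SAS quantile function for the t distribution.

```sas
* SKETCH ONLY: BONFERRONI MULTIPLIER AND JOINT CIs FOR B1 AND B2;
DATA BONF;
  ALPHA = 0.05;  * FAMILY SIGNIFICANCE LEVEL;
  G = 2;         * NUMBER OF INTERVALS IN THE FAMILY;
  N = 20;        * NUMBER OF SHIPMENTS;
  P = 3;         * NUMBER OF PARAMETERS B0 B1 B2;
  B = TINV(1 - ALPHA/(2*G), N - P);
  * ESTIMATES AND STANDARD ERRORS TYPED IN FROM THE PROC REG OUTPUT;
  LOWER1 = 3.7681 - B*0.614;
  UPPER1 = 3.7681 + B*0.614;
  LOWER2 = 5.0796 - B*0.666;
  UPPER2 = 5.0796 + B*0.666;
RUN;
PROC PRINT DATA=BONF;
RUN;
```

Changing G, N and P adapts the same step to the mathematicians salaries problem.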
(c) Coefficient of Multiple Determination.
$R^2 = \dfrac{SSR}{SSTO} = \dfrac{40{,}496.48}{41{,}032.95} = 0.987$
$R^2$ is also given directly on the SAS output.
(6.12) chemical shipment again, hw3-6-12-chem-respCI

*HOMEWORK 3, 6-12, PAGES 252-257;


DATA CHEMICALX;
INPUT Y X1 X2 TIME;
DATALINES;
58 7 5.11 1
152 18 16.72 2
41 5 3.2 3
93 14 7.03 4
101 11 10.98 5
38 5 4.04 6
203 23 22.07 7
78 9 7.03 8
117 16 10.62 9
44 5 4.76 10
121 17 11.02 11
112 12 9.51 12
50 6 3.79 13
82 12 6.45 14
48 8 4.6 15
127 15 13.86 16
140 17 13.03 17
155 21 15.21 18
39 6 3.64 19
90 11 9.57 20
. 5 3.20 21
. 6 4.80 22
. 10 7.00 23
. 14 10.00 24
. 20 18.00 25
;
DATA CHEMICAL X;
SET CHEMICALX;
IF Y NE . THEN OUTPUT CHEMICAL;
ELSE OUTPUT X;
RUN;
PROC REG DATA=CHEMICAL ALPHA=0.05 NOPRINT;
TITLE '6.12(A) BONFERRONI AND WH JOINT CIs FOR MEAN';
MODEL Y = X1 X2;
OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
PROC REG DATA=CHEMICALX;
MODEL Y = X1 X2;
OUTPUT OUT=PRED_DS(WHERE=(Y =.)) P=PHAT STDP=STDP;
RUN;
PROC PRINT DATA=PRED_DS;
RUN;
PROC PLOT DATA=CHEMICALX;
TITLE '6.12(B) RANGE OF X1 AND X2';
PLOT X1*X2=Y;
RUN;
PROC G3D DATA=CHEMICALX;
SCATTER X1*X2=Y;
RUN;
QUIT;

(a) Family CIs For Different Responses.
At $\alpha = 0.05$, and g = 5 (five simultaneous intervals),
from TI83,
$W = \sqrt{pF(1 - \alpha;\ p,\ n - p)} = \sqrt{3F(1 - 0.05;\ 3,\ 20 - 3)} = 3.098$
(INVF 3 ENTER 17 ENTER 0.95 ENTER,
then multiply by 3 and find the square root)
$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2(5));\ 20 - 3) = t(0.995;\ 17) = 2.898$
(INVT 17 ENTER 0.995 ENTER)
Since W = 3.098 > B = 2.898, use ?
From SAS,
1. Xh1 = 5, Xh2 = 3.20:
$\hat{Y}_h = 38.4195$ and $s\{\hat{Y}_h\} = 2.0332$
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 38.4195 \pm 2.898(2.0332) = ?$
2. Xh1 = 6, Xh2 = 4.80:
$\hat{Y}_h = 50.3150$ and $s\{\hat{Y}_h\} = 1.9192$
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 50.3150 \pm 2.898(1.9192) = ?$
3. Xh1 = 10, Xh2 = 7.00:
$\hat{Y}_h = 76.5625$ and $s\{\hat{Y}_h\} = 1.3701$
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 76.5625 \pm 2.898(1.3701) = ?$
4. Xh1 = 14, Xh2 = 10.00:
$\hat{Y}_h = 106.8737$ and $s\{\hat{Y}_h\} = 1.4761$
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 106.8737 \pm 2.898(1.4761) = ?$
5. Xh1 = 20, Xh2 = 18.00:
$\hat{Y}_h = 170.1191$ and $s\{\hat{Y}_h\} = 2.6096$
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 170.1191 \pm 2.898(2.6096) = ?$
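The comparison of W with B, and the five intervals that follow from it, can also be sketched in SAS. The step below is an illustration rather than part of the assigned program: the data set name FAMILYCI and the variable names are invented here, and the fitted values and standard errors are simply retyped from the list above. FINV and TINV are the SAS quantile functions for the F and t distributions; MIN picks whichever multiplier is smaller.

```sas
* SKETCH ONLY: COMPARE W AND B THEN BUILD THE FAMILY OF INTERVALS;
DATA FAMILYCI;
  INPUT XH1 XH2 YHAT SYHAT;
  W = SQRT(3*FINV(1 - 0.05, 3, 20 - 3));  * WORKING-HOTELLING WITH P = 3;
  B = TINV(1 - 0.05/(2*5), 20 - 3);       * BONFERRONI WITH G = 5;
  MULT = MIN(W, B);                       * USE THE SMALLER MULTIPLIER;
  LOWER = YHAT - MULT*SYHAT;
  UPPER = YHAT + MULT*SYHAT;
  DATALINES;
5 3.20 38.4195 2.0332
6 4.80 50.3150 1.9192
10 7.00 76.5625 1.3701
14 10.00 106.8737 1.4761
20 18.00 170.1191 2.6096
;
PROC PRINT DATA=FAMILYCI;
RUN;
```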
(b) Plot Xi1 versus Xi2 .
The point (X1 , X2 ) = (20, 5) is clearly where in the scatter of points?
The point (X1 , X2 ) = (20, 19) is clearly where in the scatter of points?
(6.13) chemical shipment again, hw3-6-13-chem-respPI

*HOMEWORK 3, 6-13, PAGES 252-257;


DATA CHEMICAL;
INPUT Y X1 X2 TIME;
DATALINES;
58 7 5.11 1
152 18 16.72 2
41 5 3.2 3
93 14 7.03 4
101 11 10.98 5
38 5 4.04 6
203 23 22.07 7
78 9 7.03 8
117 16 10.62 9
44 5 4.76 10
121 17 11.02 11
112 12 9.51 12
50 6 3.79 13
82 12 6.45 14
48 8 4.6 15
127 15 13.86 16
140 17 13.03 17
155 21 15.21 18
39 6 3.64 19
90 11 9.57 20
;
PROC IML;
USE CHEMICAL;
READ ALL VAR {'X1'} INTO X1;
READ ALL VAR {'X2'} INTO X2;
READ ALL VAR {'Y'} INTO Y;
N = NROW(X1);
M = NCOL(Y);
J = J(N,N,1);
X = J(N,1,1)||X1||X2;
B = INV(X`*X)*X`*Y;
H = X*INV(X`*X)*X`;
SSE = Y`*(I(N) - H)*Y;
DFE = N - 3;
MSE = SSE/DFE;
XH = { 1 1 1 1,
9 12 15 18,
7.20 9.00 12.50 16.50};
YHAT = XH`*B;
*VECDIAG KEEPS ONLY THE FOUR DIAGONAL ENTRIES, ONE PER NEW POINT;
S2PRED = MSE*(1 + VECDIAG(XH`*INV(X`*X)*XH));
SPRED = SQRT(S2PRED);
PRINT YHAT;
PRINT S2PRED;
PRINT SPRED;
RUN;
QUIT;

At $\alpha = 0.05$, g = 4 (four simultaneous intervals)
and p = 3 (three parameters, $\beta_0$, $\beta_1$, $\beta_2$),
from TI83,
$S = \sqrt{gF(1 - \alpha;\ g,\ n - p)} = \sqrt{4F(1 - 0.05;\ 4,\ 20 - 3)} = 3.441$
(INVF 4 ENTER 17 ENTER 0.95 ENTER,
then multiply by 4 and find the square root)
$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2(4));\ 20 - 3) = t(0.99375;\ 17) = 2.793$
(INVT 17 ENTER 0.99375 ENTER)
Since S = 3.441 > B = 2.793, use B because the Bonferroni procedure gives
narrower (more efficient) prediction intervals than the Scheffé procedure.
From SAS,
1. Xh1 = 9, Xh2 = 7.20:
$\hat{Y}_h = 73.8103$ and $s\{pred\} = 5.8076$
$\hat{Y}_h \pm B s\{pred\} = 73.8103 \pm 2.793(5.8076) = ?$
2. Xh1 = 12, Xh2 = 9.00:
$\hat{Y}_h = 94.2579$ and $s\{pred\} = 5.7578$
$\hat{Y}_h \pm B s\{pred\} = 94.2579 \pm 2.793(5.7578) = ?$
3. Xh1 = 15, Xh2 = 12.50:
$\hat{Y}_h = 123.3408$ and $s\{pred\} = 5.8217$
$\hat{Y}_h \pm B s\{pred\} = 123.3408 \pm 2.793(5.8217) = ?$
4. Xh1 = 18, Xh2 = 16.50:
$\hat{Y}_h = 154.9635$ and $s\{pred\} = 6.1013$
$\hat{Y}_h \pm B s\{pred\} = 154.9635 \pm 2.793(6.1013) = ?$
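As a cross-check on the TI83 values, the Scheffé and Bonferroni multipliers for g = 4 predictions can be written to the SAS log. This is a minimal sketch with made-up variable names; it only illustrates the two formulas above and applies the smaller multiplier to the first new shipment, using the fitted value and s{pred} quoted from the SAS output.

```sas
* SKETCH ONLY: SCHEFFE VERSUS BONFERRONI MULTIPLIERS FOR G = 4 PREDICTIONS;
DATA _NULL_;
  ALPHA = 0.05;
  G = 4;
  N = 20;
  P = 3;
  S = SQRT(G*FINV(1 - ALPHA, G, N - P));
  B = TINV(1 - ALPHA/(2*G), N - P);
  PUT 'SCHEFFE S = ' S 6.3 '   BONFERRONI B = ' B 6.3;
  * FIRST NEW SHIPMENT WITH XH1 = 9 AND XH2 = 7.20;
  LOWER = 73.8103 - MIN(S, B)*5.8076;
  UPPER = 73.8103 + MIN(S, B)*5.8076;
  PUT LOWER= UPPER=;
RUN;
```

Note that the Scheffé multiplier uses g in place of p, which is the only difference from the Working-Hotelling formula used for mean responses.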
(6.14) chemical shipment again, hw3-6-14-chem-respPmean

*HOMEWORK 3, 6-14, PAGES 252-257;


DATA CHEMICAL;
INPUT Y X1 X2 TIME;
DATALINES;
58 7 5.11 1
152 18 16.72 2
41 5 3.2 3
93 14 7.03 4
101 11 10.98 5
38 5 4.04 6
203 23 22.07 7
78 9 7.03 8
117 16 10.62 9
44 5 4.76 10
121 17 11.02 11
112 12 9.51 12
50 6 3.79 13
82 12 6.45 14
48 8 4.6 15
127 15 13.86 16
140 17 13.03 17
155 21 15.21 18
39 6 3.64 19
90 11 9.57 20
;
PROC IML;
USE CHEMICAL;
READ ALL VAR {'X1'} INTO X1;
READ ALL VAR {'X2'} INTO X2;
READ ALL VAR {'Y'} INTO Y;
N = NROW(X1);
M = NCOL(Y);
J = J(N,N,1);
X = J(N,1,1)||X1||X2;
B = INV(X`*X)*X`*Y;
H = X*INV(X`*X)*X`;
SSE = Y`*(I(N) - H)*Y;
DFE = N - 3;
MSE = SSE/DFE;
XH = {1, 7, 6};
YHAT = XH`*B;
*STANDARD ERROR FOR THE MEAN OF M = 3 NEW OBSERVATIONS AT XH;
SPRED = SQRT(MSE*(1/3 + XH`*INV(X`*X)*XH));
PRINT YHAT;
PRINT SPRED;
RUN;
QUIT;

(a) Mean of New Observations CI.
At $\alpha = 0.05$, p = 3 (three parameters, $\beta_0$, $\beta_1$, $\beta_2$),
and m = 3 (mean of three new observations)
from TI83,
$B = t(1 - \alpha/2;\ n - p) = t(1 - 0.05/2;\ 20 - 3) = t(0.975;\ 17) = 2.110$
(INVT 17 ENTER 0.975 ENTER)
Xh1 = 7, Xh2 = 6:
$\hat{Y}_h = 60.1786$ and $s\{predmean\} = 3.7281$
$\hat{Y}_h \pm B s\{predmean\} = 60.1786 \pm 2.110(3.7281) = ?$
(b) A prediction interval for the total handling time, then, would be
$3 \times (52.31, 68.04) = ?$
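The step from the interval for the mean of m = 3 new shipments to the interval for their total is just multiplication of both limits by m. A small SAS sketch, with variable names chosen only for this illustration and the fitted value and s{predmean} retyped from above, might look like this:

```sas
* SKETCH ONLY: INTERVAL FOR THE MEAN OF M NEW OBSERVATIONS AND FOR THE TOTAL;
DATA _NULL_;
  M = 3;                           * NUMBER OF NEW SHIPMENTS;
  T = TINV(1 - 0.05/2, 20 - 3);    * T QUANTILE WITH N - P = 17 DF;
  YHAT = 60.1786;                  * FITTED VALUE AT XH1 = 7 AND XH2 = 6;
  SPREDMEAN = 3.7281;              * STANDARD ERROR FROM PROC IML;
  LOWER = YHAT - T*SPREDMEAN;
  UPPER = YHAT + T*SPREDMEAN;
  TOTLOWER = M*LOWER;              * LIMITS FOR THE TOTAL OF M SHIPMENTS;
  TOTUPPER = M*UPPER;
  PUT LOWER= UPPER= TOTLOWER= TOTUPPER=;
RUN;
```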
(6.18) Mathematicians salaries, hw3-6-18-math-diagnos

*HOMEWORK 3, 6-18, PAGES 252-257;


DATA MATH;
INPUT Y X1 X2 X3;
X1X2 = X1*X2;
X1X3 = X1*X3;
X2X3 = X2*X3;
DATALINES;
33.2 3.5 9 6.1
40.3 5.3 20 6.4
38.7 5.1 18 7.4
46.8 5.8 33 6.7
41.4 4.2 31 7.5
37.5 6 13 5.9
39 6.8 25 6
40.7 5.5 30 4
30.1 3.1 5 5.8
52.9 7.2 47 8.3
38.2 4.5 25 5
31.8 4.9 11 6.4
43.3 8 23 7.6
44.1 6.5 35 7
42.8 6.6 39 5
33.6 3.7 21 4.4
34.2 6.2 7 5.5
48 7 40 7
38 4 35 6
35.9 4.5 23 3.5
40.4 5.9 33 4.9
36.8 5.6 27 4.3
45.2 4.8 34 8
35.1 3.9 15 5
;
*6.18(A) STEM AND LEAF OF X1, X2 AND X3;
PROC UNIVARIATE DATA=MATH PLOT;
TITLE1 '6.18(A) STEM AND LEAF OF WORK QUALITY, X1';
TITLE2 'AND OF YEARS OF EXPERIENCE, X2';
TITLE3 'AND OF PUBLICATION SUCCESS, X3';
VAR X1 X2 X3;
RUN;
*6.18(B) SCATTERPLOT MATRICES AND CORRELATION;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=MATH;
TITLE '6.18(B) SCATTERPLOT MATRICES';
PLOT Y*X1;
PLOT Y*X2;
PLOT Y*X3;
PLOT X1*X2;
PLOT X1*X3;
PLOT X2*X3;
RUN;
PROC CORR DATA=MATH;
TITLE '6.18(B) CORRELATION Y, X1, X2 AND X3';
VAR Y X1 X2 X3;
RUN;
*6.18(C) REGRESSION;
PROC REG DATA=MATH OUTEST=EST;
TITLE1 '6.18(C) REGRESSION OF Y VS X1, X2 AND X3';
MODEL Y = X1 X2 X3;
OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
*6.18(D) BOXPLOT OF RESIDUALS;
PROC UNIVARIATE DATA=OUTPLOT PLOT;
TITLE1 '6.18(D) BOXPLOT OF RESIDUALS';
VAR RESID;
RUN;
*6.18(E) RESIDUALS VS PREDICTED, X1, X2, X3 AND INTERACTIONS;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
TITLE '6.18(E-1) RESIDUALS VS VARIOUS';
PLOT RESID*PRED;
PLOT RESID*X1;
PLOT RESID*X2;
PLOT RESID*X3;
PLOT RESID*X1X2;
PLOT RESID*X1X3;
PLOT RESID*X2X3;
RUN;
PROC SORT DATA=OUTPLOT;
BY RESID;
RUN;
DATA OUTPLOT;
SET OUTPLOT NOBS=NOBS;
QUANTILE = PROBIT( (_N_- (3/8)) / (NOBS + (1/4)) );
RUN;
DATA OUTPLOT2;
IF _N_ = 1 THEN SET EST;
SET OUTPLOT;
EXPRESIDUAL = _RMSE_*QUANTILE;
RUN;
PROC GPLOT DATA=OUTPLOT2;
TITLE '6.18(E-2) NORMAL PROBABILITY PLOT';
PLOT RESID*EXPRESIDUAL;
RUN;
*6.18(F) LEVENE TEST OF RESIDUALS;
DATA NEWMATH;
SET OUTPLOT;
IF PRED < 38.75 THEN LEVENEGROUP = 'A';
IF PRED GE 38.75 THEN LEVENEGROUP = 'B';
RUN;
PROC GLM DATA=NEWMATH ALPHA=0.05;
TITLE1 '6.18(F) (UNMODIFIED) LEVENE TEST';
TITLE2 'OF HOMOGENEITY OF VARIANCE OF RESIDUALS';
CLASS LEVENEGROUP;
MODEL RESID = LEVENEGROUP;
MEANS LEVENEGROUP / HOVTEST = LEVENE (TYPE=ABS);
RUN;
QUIT;
(a) Stem and Leaf Plots.
(b) Scatterplots and Correlation Matrix
(c) Estimated Regression.
(d) Residual Box Plot.
(e) Residual Plots.
(f) Lack of Fit Test.
(g) Levene Test
1. Statement.
The statement of the test is (check none, one or more):
(i) $H_0$: error variance constant versus $H_1: \sigma^2 > 1$.
(ii) $H_0$: error variance constant versus $H_1$: error variance not constant.
(iii) $H_0$: error variance constant versus $H_1: \sigma^2 \neq 1$.
2. Test.
From SAS, the p-value is (choose one) 0.446 / 0.8278 / 0.884
The level of significance is (circle one) 0.01 / 0.05 / 0.10
3. Conclusion.
Since the p-value is smaller / larger than the level of significance we
(circle one) accept / reject the null hypothesis that the error variance
is constant.
(6.19) Mathematicians salaries continued, hw3-6-19-math-famCI

*HOMEWORK 3, 6-19, PAGES 252-257;


DATA MATH;
INPUT Y X1 X2 X3;
X1X2 = X1*X2;
X1X3 = X1*X3;
X2X3 = X2*X3;
DATALINES;
33.2 3.5 9 6.1
40.3 5.3 20 6.4
38.7 5.1 18 7.4
46.8 5.8 33 6.7
41.4 4.2 31 7.5
37.5 6 13 5.9
39 6.8 25 6
40.7 5.5 30 4
30.1 3.1 5 5.8
52.9 7.2 47 8.3
38.2 4.5 25 5
31.8 4.9 11 6.4
43.3 8 23 7.6
44.1 6.5 35 7
42.8 6.6 39 5
33.6 3.7 21 4.4
34.2 6.2 7 5.5
48 7 40 7
38 4 35 6
35.9 4.5 23 3.5
40.4 5.9 33 4.9
36.8 5.6 27 4.3
45.2 4.8 34 8
35.1 3.9 15 5
;
*6.19 REGRESSION OF Y ON X1, X2 AND X3;
PROC REG DATA=MATH OUTEST=EST TABLEOUT ALPHA=0.05;
TITLE '6.19 REGRESSION';
TITLE2 'BONFERRONI JOINT CIs FOR B1, B2 AND B3';
TITLE3 'CORRELATION';
MODEL Y = X1 X2 X3;
OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
QUIT;

(a) Test of regression relation at $\alpha = 0.05$.
1. Statement.
The statement of the test is (check none, one or more):
(i) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1: \beta_1 = \beta_2 = \beta_3 > 0$.
(ii) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1: \beta_1 = \beta_2 = \beta_3 < 0$.
(iii) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1$: not all $\beta_i$ are zero.
2. Test.
From SAS, the p-value is (choose one) 0 / 0.0827 / 0.098
The level of significance is (circle one) 0.01 / 0.05 / 0.10
3. Conclusion.
Since the p-value is smaller / larger than the level of significance we
(circle one) accept / reject the null hypothesis that $\beta_1 = \beta_2 = \beta_3 = 0$.
(b) Bonferroni Confidence Intervals.
From TI83 (INVT 20 ENTER 0.9917 ENTER)
$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2(3));\ 24 - 4) = t(0.9917;\ 20) = 2.614$
From SAS,
1. Bonferroni CI for $\beta_1$:
$b_1 = 1.1031$ and $s\{b_1\} = 0.330$,
$b_1 \pm B s\{b_1\} = 1.1031 \pm 2.614(0.330) = ?$
2. Bonferroni CI for $\beta_2$:
$b_2 = 0.3215$ and $s\{b_2\} = 0.037$,
$b_2 \pm B s\{b_2\} = 0.3215 \pm 2.614(0.037) = ?$
3. Bonferroni CI for $\beta_3$:
$b_3 = 1.2889$ and $s\{b_3\} = 0.298$,
$b_3 \pm B s\{b_3\} = 1.2889 \pm 2.614(0.298) = ?$
(c) Coefficient of Multiple Determination.
$R^2$ is given directly on the SAS output.
(6.20) Mathematicians salaries, hw3-6-20-math-respCI

*HOMEWORK 3, 6-20, PAGES 252-257;


DATA MATHX;
INPUT Y X1 X2 X3;
DATALINES;
33.2 3.5 9 6.1
40.3 5.3 20 6.4
38.7 5.1 18 7.4
46.8 5.8 33 6.7
41.4 4.2 31 7.5
37.5 6 13 5.9
39 6.8 25 6
40.7 5.5 30 4
30.1 3.1 5 5.8
52.9 7.2 47 8.3
38.2 4.5 25 5
31.8 4.9 11 6.4
43.3 8 23 7.6
44.1 6.5 35 7
42.8 6.6 39 5
33.6 3.7 21 4.4
34.2 6.2 7 5.5
48 7 40 7
38 4 35 6
35.9 4.5 23 3.5
40.4 5.9 33 4.9
36.8 5.6 27 4.3
45.2 4.8 34 8
35.1 3.9 15 5
. 5.0 20 5
. 6.0 30 6
. 4.0 10 4
. 7.0 50 7
;
*6.20 BONFERRONI AND WH JOINT CIs FOR MEAN;
DATA MATH X;
SET MATHX;
IF Y NE . THEN OUTPUT MATH;
ELSE OUTPUT X;
RUN;
PROC REG DATA=MATH ALPHA=0.05 NOPRINT;
TITLE '6.20 BONFERRONI AND WH JOINT CIs FOR MEAN';
MODEL Y = X1 X2 X3;
OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
PROC REG DATA=MATHX;
MODEL Y = X1 X2 X3;
OUTPUT OUT=PRED_DS(WHERE=(Y =.)) P=PHAT STDP=STDP;
RUN;
PROC PRINT DATA=PRED_DS;
RUN;
QUIT;

(a) At $\alpha = 0.05$, and g = 4 (four simultaneous intervals),
and p = 4 (parameters: $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$)
from TI83,
$W = \sqrt{pF(1 - \alpha;\ p,\ n - p)} = \sqrt{4F(1 - 0.05;\ 4,\ 24 - 4)} = 3.388$
(INVF 4 ENTER 20 ENTER 0.95 ENTER,
then multiply by 4 and find the square root)
$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2(4));\ 24 - 4) = t(0.99375;\ 20) = 2.744$
(INVT 20 ENTER 0.99375 ENTER)
Since W = 3.388 > B = 2.744, use B because the Bonferroni procedure gives
narrower (more efficient) CIs than the Working-Hotelling procedure.
From SAS,
1. Xh1 = 5, Xh2 = 20, Xh3 = 5:
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 36.2377 \pm 2.744(0.4631) = ?$
2. Xh1 = 6, Xh2 = 30, Xh3 = 6:
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 41.8449 \pm 2.744(0.4170) = ?$
3. Xh1 = 4, Xh2 = 10, Xh3 = 4:
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 30.6304 \pm 2.744(0.7560) = ?$
4. Xh1 = 7, Xh2 = 50, Xh3 = 7:
$\hat{Y}_h \pm B s\{\hat{Y}_h\} = 50.6674 \pm 2.744(0.8975) = ?$
The questions from the text are altered somewhat to fit into the multiple choice
context given on Vista. The altered questions are given below.

Problem 6.9, pp 252-257.


Match the problems with the answers.

problem answer
6.9(a) time plots indicate wavelike pattern in Xi1 and Xi2
6.9(b) time plots indicate fairly random distribution of Xi1 and Xi2
6.9(c) scatterplot, correlation indicates strong correlation between Y and Xi1 only
stem and leaf plots indicate Xi1 , Xi2 both have two extreme outliers
stem and leaf plots indicate fairly even distribution in Xi1 , Xi2
scatterplot, correlation indicates strong correlations between Y , Xi1 and Xi2

Problem 6.10, pp 252-257.


Match the problems with the answers.

problem answer
6.10(a) $\hat{Y}$ = 3.324 + 4.768Xi1 + 5.080Xi2
6.10(b) box plot indicates no outlying residuals
6.10(c) residual plot, normal probability plot indicates no outlying residuals
6.10(d) residual vs time plot indicates no outlying residuals
6.10(e) Levene test p-value is 0.8278
$\hat{Y}$ = 3.324 + 3.768Xi1 + 5.080Xi2
box plot indicates one outlying residual
residual plot, normal probability plot indicates one outlying residual
residual vs time plot indicates one outlying residual
Levene test p-value is 0.989

Problem 6.11, pp 252-257.


Match the problems with the answers.

problem answer
6.11(a) $R^2$ = 0.787
6.11(b) Bonferroni CI for $\beta_1$ is (2.259, 5.277)
6.11(c) $R^2$ = 0.987
test of regression relation has F = 541.58
test of regression relation has F = 641.58
Bonferroni CI for $\beta_1$ is (3.443, 6.717)

Problem 6.12, pp 252-257.
Match the problems with the answers.

problem answer
6.12(a) for family CIs of response, B = 4.098 > W = 2.898
6.12(b) point (Xh1 , Xh2 ) = (20, 5) is inside scatter plot
for family CIs of response, W = 4.098 > B = 2.898
for family CIs of response, W = 3.098 > B = 2.898
point (Xh1 , Xh2 ) = (20, 5) is outside scatter plot
point (Xh1 , Xh2 ) = (20, 19) is outside scatter plot

Problem 6.13, pp 252-257.


Match the problems with the answers.

problem answer
6.13(a) for Xh1 = 12 and Xh2 = 9.00, CI is (78.176, 100.339)
6.13(b) for Xh1 = 15 and Xh2 = 12.50, CI is (107.081, 159.600)
6.13(c) for Xh1 = 15 and Xh2 = 12.50, CI is (107.081, 139.600)
6.13(d) for Xh1 = 18 and Xh2 = 16.50, CI is (157.923, 172.004)
for Xh1 = 9 and Xh2 = 7.20, CI is (47.590, 90.031)
for Xh1 = 9 and Xh2 = 7.20, CI is (57.590, 90.031)
for Xh1 = 12 and Xh2 = 9.00, CI is (78.176, 110.339)
for Xh1 = 18 and Xh2 = 16.50, CI is (137.923, 172.004)

Problem 6.14, pp 252-257.


Match the problems with the answers.

problem answer
6.14(a) for (Xh1 , Xh2 ) = (7, 6), PI of TOTAL is (166.94, 204.14)
6.14(b) for (Xh1 , Xh2 ) = (7, 6), PI of TOTAL is (156.94, 214.14)
for (Xh1 , Xh2 ) = (7, 6), CI of MEAN is (42.312, 68.045)
for (Xh1 , Xh2 ) = (7, 6), CI of MEAN is (52.312, 78.045)
for (Xh1 , Xh2 ) = (7, 6), CI of MEAN is (52.312, 68.045)
for (Xh1 , Xh2 ) = (7, 6), PI of TOTAL is (156.94, 204.14)

Problem 6.18, pp 252-257.


Match the problems with the answers.
problem answer
6.18(a) scatterplot, correlation indicates strong correlations between Y , Xi1 , Xi2 and Xi3
6.18(b) residual box plot indicates badly skewed distribution
6.18(c) $\hat{Y}$ = 7.84693 + 0.10313Xi1 + 0.32152Xi2 + 1.28894Xi3
6.18(d) residual box plot indicates fairly symmetric distribution
6.18(e) residual plots, normal probability plot indicate data normal
6.18(f) lack of fit test p-value is 0.567
6.18(g) Levene test p-value is 0.884
stem and leaf plots indicate one extreme outlier in Xi1 , Xi2 , Xi3
stem and leaf plots indicate fairly even distribution in Xi1 , Xi2 , Xi3
$\hat{Y}$ = 17.84693 + 1.10313Xi1 + 0.32152Xi2 + 1.28894Xi3
scatterplot, correlation indicates strong correlations between Y and Xi1 , Y and Xi2 , Y and Xi3 only
residual plots, normal probability plot indicate data not normal
unable to do lack of fit test because no repeated observations
Levene test p-value is 0.584

Problem 6.19, pp 252-257.


Match the problems with the answers.

problem answer
6.19(a) test of regression relation has F = 68.119
6.19(b) Bonferroni CI for $\beta_3$ is (0.240, 1.966)
6.19(c) $R^2$ = 0.8087
test of regression relation has F = 54.581
Bonferroni CI for $\beta_3$ is (0.510, 2.068)
$R^2$ = 0.9109

Problem 6.20, pp 252-257.


Match the problems with the answers.

problem answer
6.20(a) for (Xh1 , Xh2 , Xh3 ) = (5, 20, 5), CI is (36.967, 37.508)
6.20(b) for (Xh1 , Xh2 , Xh3 ) = (6, 30, 6), CI is (40.701, 42.989)
6.20(c) for (Xh1 , Xh2 , Xh3 ) = (7, 50, 7), CI is (48.205, 55.130)
6.20(d) for (Xh1 , Xh2 , Xh3 ) = (7, 50, 7), CI is (48.205, 53.130)
for (Xh1 , Xh2 , Xh3 ) = (5, 20, 5), CI is (34.967, 37.508)
for (Xh1 , Xh2 , Xh3 ) = (6, 30, 6), CI is (41.701, 42.989)
for (Xh1 , Xh2 , Xh3 ) = (4, 10, 4), CI is (29.556, 32.705)
for (Xh1 , Xh2 , Xh3 ) = (4, 10, 4), CI is (28.556, 32.705)
