1. Regress y on all x.
2. Select x₁*, the predictor whose removal results in the smallest decrease in R².
3. Test the significance of x₁* given all the other predictors.
Stop if it is significant – no predictor should be removed.
Remove this predictor if it is insignificant and go to the next step.
4. Choose the second predictor x₂*, whose removal gives the smallest decrease in R².
5. Test the significance of x₂* given the other predictors.
Stop if it is significant – no more variables should be removed.
Remove this variable if it is insignificant, and so on, until no more predictors can be
removed from the regression.
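The steps above can be sketched in code. This is a minimal illustration on simulated data, with F-out = 4 as suggested below; the helpers `ols_sse` and `backward_eliminate` and all variable names are my own, not from the notes:

```python
import random

def ols_sse(X, y):
    """Residual sum of squares from OLS, via the normal equations (X'X)b = X'y."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for j in range(p):                                  # Gaussian elimination with pivoting
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c[r] -= f * c[j]
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

def backward_eliminate(cols, y, f_out=4.0):
    """Repeatedly drop the regressor with the smallest partial F until all exceed f_out."""
    names, n = list(cols), len(y)
    while names:
        full = [[1.0] + [cols[m][i] for m in names] for i in range(n)]
        sse_full = ols_sse(full, y)
        worst, worst_f = None, None
        for m in names:                                 # partial F for removing each regressor
            red = [[1.0] + [cols[k][i] for k in names if k != m] for i in range(n)]
            f = (ols_sse(red, y) - sse_full) / (sse_full / (n - len(names) - 1))
            if worst_f is None or f < worst_f:
                worst, worst_f = m, f
        if worst_f >= f_out:        # even the weakest regressor is significant: stop
            break
        names.remove(worst)
    return names

random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(40)]
x2 = [random.gauss(0, 1) for _ in range(40)]
x3 = [random.gauss(0, 1) for _ in range(40)]        # unrelated to y; typically eliminated
y = [3 + 2 * a - b + random.gauss(0, 0.3) for a, b in zip(x1, x2)]
print(backward_eliminate({"x1": x1, "x2": x2, "x3": x3}, y))
```

The genuine predictors x1 and x2 have huge partial F values and survive, while the noise regressor x3 is usually removed in the first pass.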
Remarks
For practical convenience we may choose an F-in value and an F-out value for selecting
the regressors in the equation. For example, we may set F-in = 4 and F-out = 4. (Why
4? Because the 5% critical value of F with 1 numerator degree of freedom is roughly 4.)
Sometimes we may choose a smaller F-out than F-in so that it is harder to remove a
regressor that has already been included.
Alternatively, we may set α-in and α-out. For example, α-in = 0.05 and α-out = 0.10.
Example 1
The regression of y on x1, x2, x3 and x4 using the data contained in ‘stepwise.xls’
produces the following results:
Total SS = 743.18, n = 25
Stepwise Regression
3.2 Multicollinearity
This is a situation where the regressor variables are highly correlated. When some
columns of X are linear combinations of other columns, multicollinearity occurs. In
practice we seldom have exact collinearity.
Standardize each regressor column:
x_ij → x_ij* = (x_ij − x̄_i) / √( Σ (x_ij − x̄_i)² ),   so that X → X*.
If one or some of the r’s are close to 1, then X*′X* may be near singular and the
inverse of X*′X* will be very sensitive to the r’s. In this situation X*′X* is said to be
ill-conditioned.
Examples
Since var(β̂) = σ²(X*′X*)⁻¹, we have
[Table: var(β̂₁) under Cases A, B, C and D]
We say that the variance is inflated from 1.0 (the ideal case) to 63.94 in Case A. We
define the variance inflation factor (VIF) to be the increase in the variance of an
estimated coefficient as compared with the ideal case.
From the regression of y on x₁ alone,
var(β̂₁) = σ² / S₁₁.
With the other regressors present,
var(β̂₁) = var(β̂₁ from y on x₁ alone) × 1/(1 − R₁²).
With the presence of other variables, the variance of β̂₁ will be inflated by the factor
1/(1 − R₁²), where R₁² is the R² of the regression of x₁ on the other regressors. We define
VIF = 1/(1 − R₁²) ≡ variance inflation factor of x₁.
Since 0 ≤ R₁² ≤ 1, VIF ≥ 1.
If R₁² = 0, x₁ is uncorrelated with the other regressors, VIF = 1 and var(β̂₁) is not
inflated.
If R₁² = 1, x₁ is an exact linear combination of the other regressors, VIF = ∞ and
var(β̂₁) is inflated without bound.
The average VIF over the k regressors is
VIF̄ = (1/k) Σᵢ₌₁ᵏ VIFᵢ.
A mean VIF much greater than 1, or any individual VIFᵢ > 10, suggests that
multicollinearity may severely affect the stability of the estimated coefficients.
Another diagnostic is the condition number of X*′X*:
Condition number = λ_max / λ_min,
where λ_max and λ_min are the largest and smallest eigenvalues of X*′X*.
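With only two regressors these diagnostics have closed forms, since R₁² = r², the squared correlation between the two columns, and the eigenvalues of the 2×2 correlation matrix are 1 ± |r|. A small check on made-up, nearly collinear data (all numbers below are illustrative):

```python
import math

def corr(u, v):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return suv / (su * sv)

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]   # x1 plus a little noise
r = corr(x1, x2)
vif = 1 / (1 - r ** 2)                 # VIF of x1 (and of x2, by symmetry)
cond = (1 + abs(r)) / (1 - abs(r))     # lambda_max / lambda_min of the 2x2 correlation matrix
print(round(r, 4), round(vif, 1), round(cond, 1))
```

Both the VIF and the condition number blow up as |r| approaches 1, which is exactly the ill-conditioning described above.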
Example 2
Student consumption data in a certain month in 2010:
Con = Student consumption other than tuition and boarding fees (in $)
Yd = Student Disposable income (in $)
LiA = Student liquid assets (in $)
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 6739470 3369735 22.75 0.003
Error 5 740530 148106
Total 7 7480000
Regression of CON on LiA only
s = 367.2
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 99151096 49575548 367.73 0.000
Error 6 808901 134817
Total 8 99960000
Correlation matrix
3.3 Autocorrelation
The standard regression model assumes a constant variance for the errors or
disturbances. However, this may not always be satisfied in the real world. For example,
in a regression of expenditure on income, we may expect expenditures to vary less at
lower incomes but more at higher income levels, because higher-income groups have
more room to exhibit different and varied expenditure behaviours. The assumption of
independent errors may also be violated when time series data are involved, as today's
values tend to influence tomorrow's values. In this case we say we have
autocorrelated or serially correlated errors.
ε_i = ρ ε_{i−1} + ν_i,   ν_i ~ IN(0, σ_ν²),   |ρ| < 1.
Note that
E(ε_i) = 0,
var(ε_i) = ρ² var(ε_{i−1}) + 2ρ cov(ν_i, ε_{i−1}) + var(ν_i)
         = ρ² var(ε_i) + σ_ν²   (ν_i is independent of ε_{i−1}, and by stationarity var(ε_i) = var(ε_{i−1})),
so that
var(ε_i) = σ_ν² / (1 − ρ²).
Also
E(ε_i ε_{i−1}) = ρ E(ε_{i−1}²) + E(ν_i ε_{i−1}) = ρ σ_ε² = cov(ε_i, ε_{i−1}).
Thus the disturbances have the same variance but are not independent. The covariance
matrix of the disturbances is

cov(ε) = σ_ε² ×
⎡ 1      ρ      ρ²     ⋯  ρⁿ⁻¹ ⎤
⎢ ρ      1      ρ      ⋯  ρⁿ⁻² ⎥
⎢ ρ²     ρ      1      ⋯  ρⁿ⁻³ ⎥
⎢ ⋮      ⋮      ⋮      ⋱  ⋮    ⎥
⎣ ρⁿ⁻¹   ρⁿ⁻²   ρⁿ⁻³   ⋯  1    ⎦ .
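A quick simulation can check the variance result var(ε_i) = σ_ν²/(1 − ρ²). The values ρ = 0.6 and σ_ν = 1 below are my own choices, so the target is 1/0.64 = 1.5625:

```python
import random

random.seed(0)
rho, n = 0.6, 20000
e, series = 0.0, []
for _ in range(n):
    e = rho * e + random.gauss(0, 1)    # nu_i ~ N(0, 1)
    series.append(e)
m = sum(series) / n
var = sum((v - m) ** 2 for v in series) / n
print(round(var, 3))    # should be near 1 / (1 - 0.6**2) = 1.5625
```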
Let e_i be the least squares residuals from the regression of y on x. The Durbin-Watson
statistic is defined by

DW = d = Σᵢ₌₂ⁿ (e_i − e_{i−1})² / Σᵢ₌₁ⁿ e_i².

d ranges from 0 to 4.
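The statistic is a one-liner. On two made-up residual series (my own toy numbers) it behaves as the bounds suggest: values well below 2 for positively autocorrelated residuals, well above 2 for alternating ones:

```python
def durbin_watson(e):
    """Durbin-Watson statistic of a residual series."""
    return sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e))) / sum(v * v for v in e)

smooth = [1.0, 0.8, 0.9, 0.7, 0.6, 0.8, 0.5, -0.2, -0.4, -0.5, -0.3, -0.6]       # wanders slowly
alternating = [1.0, -1.0, 0.9, -0.9, 1.1, -1.1, 0.8, -0.8, 1.0, -1.0, 0.9, -0.9]
print(round(durbin_watson(smooth), 2))        # well below 2: positive autocorrelation
print(round(durbin_watson(alternating), 2))   # well above 2: negative autocorrelation
```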
y_i = β₀ + β₁ x_i + ε_i,
ε_i = ρ ε_{i−1} + ν_i,   ν_i ~ IN(0, σ_ν²),   |ρ| < 1.
The unknown autocorrelation coefficient ρ is usually estimated by the first order
autocorrelation of the residuals from the regression of y on x.
Cochrane-Orcutt Procedure
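A minimal one-iteration sketch of the procedure on simulated data, assuming the usual transform y*_t = y_t − ρ̂·y_{t−1} and x*_t = x_t − ρ̂·x_{t−1}, with ρ̂ estimated from the lag-1 autocorrelation of the OLS residuals (the helper `slr` and all data below are illustrative):

```python
import random

def slr(x, y):
    """Simple least squares regression: return (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b

def cochrane_orcutt_step(x, y):
    """One transform-and-refit iteration; returns (rho_hat, intercept, slope)."""
    a, b = slr(x, y)
    e = [c - a - b * v for v, c in zip(x, y)]
    rho = sum(e[i] * e[i - 1] for i in range(1, len(e))) / sum(v * v for v in e)
    xs = [x[i] - rho * x[i - 1] for i in range(1, len(x))]   # x* = x_t - rho*x_{t-1}
    ys = [y[i] - rho * y[i - 1] for i in range(1, len(y))]
    a2, b2 = slr(xs, ys)
    return rho, a2 / (1 - rho), b2      # transformed intercept estimates b0*(1 - rho)

random.seed(3)
x = list(range(60))
err, e = [], 0.0
for _ in range(60):
    e = 0.7 * e + random.gauss(0, 1)    # AR(1) disturbances with rho = 0.7
    err.append(e)
y = [10 + 2 * t + u for t, u in zip(x, err)]    # true model: y = 10 + 2x + error
rho_hat, b0, b1 = cochrane_orcutt_step(x, y)
print(round(rho_hat, 2), round(b0, 1), round(b1, 3))
```

The refitted slope and intercept stay close to the true values while the transformed disturbances are approximately uncorrelated; in practice the step is repeated until ρ̂ stabilises.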
Example 3
Filename: House price
House price and household income
Y= average house price in $
X=average household income in $
Analysis of Variance
Source DF SS MS F P
Regression 1 10104102863 10104102863 1465.31 0.000
Residual Error 20 137910319 6895516
Total 21 10242013182
d = DW = 0.383
Since d < 2 we test for positive autocorrelation.
Analysis of Variance
Source DF SS MS F P
Regression 1 823501802 823501802 320.70 0.000
Residual Error 19 48788555 2567819
Total 20 872290357
y = 7988 + 19.4 x.
If the DW statistic is still significant, we may repeat the Cochrane-Orcutt procedure on
y* and x*: estimate the autocorrelation (ρ̂₁, say) from the residuals of the regression
of y* on x* and transform y* and x* accordingly.
Analysis of Variance
Source DF SS MS F P
Regression 1 303543183 303543183 130.97 0.000
Residual Error 18 41718181 2317677
Total 19 345261364
y = 8480 + 19.2 x
after taking into account the autocorrelation of the errors. Had independent errors
been assumed, the estimated regression equation would be
y = 5989 + 20.4 x.
3.4 Indicator variables
A regression model may involve both quantitative and qualitative regressor variables.
For example, to predict the weight (y) of a person from his/her height (x) it might be
more reasonable to use two models, one for males and the other for females because
of the differences in body profile. By defining a variable (called indicator, categorical
or dummy variable) D for gender
D = 0 for female
D = 1 for male
the regression model(s) may be formulated as
y = β 0 + γ D + β1 x + ε .
Note that we have two different equations with the same slope coefficient but different
intercepts:
For females, D=0, thus the regression equation is y = β 0 + β1 x + ε .
For males, D=1, the regression equation becomes y = ( β 0 + γ ) + β1 x + ε .
[Figure: two parallel lines, y = β₀ + β₁x (females, intercept β₀) and
y = (β₀ + γ) + β₁x (males, intercept β₀ + γ)]
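This model has a simple closed-form fit: with a common slope, the OLS slope is the pooled within-group slope, and the two intercepts follow from the group means. A sketch on made-up height-weight numbers (the helper name and data are illustrative):

```python
def fit_dummy_model(x0, y0, x1, y1):
    """OLS for y = b0 + g*D + b1*x, where group 0 has D = 0 and group 1 has D = 1."""
    n0, n1 = len(x0), len(x1)
    xb0, yb0 = sum(x0) / n0, sum(y0) / n0
    xb1, yb1 = sum(x1) / n1, sum(y1) / n1
    sxy = sum((a - xb0) * (b - yb0) for a, b in zip(x0, y0)) \
        + sum((a - xb1) * (b - yb1) for a, b in zip(x1, y1))
    sxx = sum((a - xb0) ** 2 for a in x0) + sum((a - xb1) ** 2 for a in x1)
    b1_ = sxy / sxx                   # common slope: pooled within-group slope
    b0 = yb0 - b1_ * xb0              # intercept of the D = 0 (female) line
    g = (yb1 - b1_ * xb1) - b0        # intercept shift for the D = 1 (male) line
    return b0, g, b1_

# heights (cm) and weights (kg): females then males
b0, g, b1 = fit_dummy_model([150, 160, 170], [50, 55, 60], [160, 170, 180], [65, 70, 75])
print(b0, g, b1)   # two parallel lines, slope 0.5; male line sits 10 kg above the female line
```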
Example 4
The mileage per gallon of private cars is recorded to study the relationship of
mileage with vehicle weight and transmission type. (filename: City MPG)
Regression Analysis: citympg versus weight, auto
Analysis of Variance
Source DF SS MS F P
Regression 2 692.19 346.10 77.34 0.000
Residual Error 29 129.78 4.48
Total 31 821.97
To test whether manual and auto cars have different MPG we can test the significance
of the coefficient γ. If γ is significantly different from 0 it means that the two
regressions have different intercepts, hence two separate regression lines.
y = β 0 + γ 1 D1 + γ 2 D2 + γ 3 D3 + β1 x1 + ... + ε .
3.5 Assessment of assumptions
The ideal conditions for regression are:
a. The relationship is linear
b. The disturbances have the same variance
c. The disturbances are independent
d. The disturbances are normally distributed
e. The disturbances are not correlated with the regressor variables
The violation of any of the above conditions will lead to very undesirable results,
e.g. The estimates are no longer unbiased,
The estimates are not stable (large variances), etc.
Examination of the residuals usually gives us some idea of whether these conditions
are satisfied.
Some plots of residuals commonly used to identify violations of the above conditions
are:
This is the ideal case in which the residuals are scattered randomly around the 0-line.
This is an example where the error variance is not constant: the residuals have smaller
variation at lower levels but larger variation at higher levels.
Example 5
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.972318
R Square 0.945403
Adjusted R Square 0.940439
Standard Error 11.95915
Observations 13
            Coefficients   Standard Error   t Stat     P-value
Intercept   -2235.88       166.6896         -13.4134   3.67E-08
Year        1.223445       0.088647         13.80129   2.73E-08
[Year Residual Plot: residuals vs Year, 1800–1950]
[Plot: residuals vs predicted values]
Though the R-square is quite large, indicating a satisfactory fit, the residuals show a
systematic pattern when plotted against time (the regressor variable) or against the
predicted y. This indicates a non-linear relationship between y and x. For example, the
fit may be improved by including the square of the time term.
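The residual pattern is easy to reproduce: fit a straight line to exactly quadratic data (made-up numbers below) and the residuals come out positive at both ends and negative in the middle:

```python
def slr(x, y):
    """Simple least squares regression: return (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b

t = list(range(13))
y = [2 + 0.5 * v + 0.3 * v * v for v in t]    # smooth quadratic trend, no noise
a, b = slr(t, y)
e = [yi - a - b * v for v, yi in zip(t, y)]
print("".join("+" if v > 0 else "-" for v in e))   # + at the ends, - in the middle
```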
The regression of y on x and x2 gives the following results:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.99639837
R Square 0.9928097
Adjusted R Square 0.99137165
Standard Error 4.55181799
Observations 13
            Coefficients   Standard Error   t Stat    P-value
Intercept   26948.0616     3594.7123        7.496584  2.07E-05
Year        -29.835602     3.8252303        -7.79969  1.47E-05
Year2       0.00826038     0.0010173        8.119839  1.03E-05
[Year Residual Plot: residuals vs Year, 1800–1950]
The R-square has increased to 99.3%, indicating a near-perfect fit. Though the
residuals no longer show any systematic pattern, the constant variance condition
seems to be violated: the residuals vary very little in the early years but substantially
in later years. This means that the variances of the disturbances increase with time. The
problem of non-constant variances will not be handled here.
3.6 Non-linear effects
In some applications, though the dependent variable does not seem to be linearly related
to the predictor variable, it may be possible to linearize the relation so that the linear
regression procedure can still be applied to estimate the non-linear relation.
The table below gives some common nonlinear relations that can be linearized by a
simple transformation.
Model          Equation            Linearized form               Regress
Power          y = a·x^b           ln y = ln a + b·ln x          ln(y) on ln(x)
Exponential    y = a·e^(bx)        ln y = ln a + b·x             ln(y) on x
Reciprocal     y = a·x/(b + x)     1/y = 1/a + (b/a)·(1/x),      1/y on 1/x
(growth rate                       with y′ = 1/y, x′ = 1/x
model)
Logarithmic    y = a + b·ln x      y = a + b·x′, x′ = ln x       y on ln(x)
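As an illustration of the first row, a power relation y = a·x^b (made-up data with a = 2 and b = 1.5) is recovered exactly by regressing ln(y) on ln(x) and exponentiating the intercept:

```python
import math

def slr(x, y):
    """Simple least squares regression: return (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / sum((u - xb) ** 2 for u in x)
    return yb - b * xb, b

xs = [1, 2, 3, 4, 5, 8, 10]
ys = [2 * v ** 1.5 for v in xs]          # exact power-law data, y = 2 x^1.5
la, b = slr([math.log(v) for v in xs], [math.log(v) for v in ys])
print(round(math.exp(la), 4), round(b, 4))   # recovers a = 2 and b = 1.5
```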
Example 6
The damage susceptibility of peaches in relation to the height from which they are
dropped (drop height, measured in mm) and the density of the peaches (measured in
g/cm3) is given in the following table.
ln y = ln α + x ln β
Sxx = 1716771 – (4199.7)2/11 = 113363.72
y = (1.4087)(0.9967)x
R2 = 1.2086/2.9404 = 0.411
3.7 Regression without intercept
yi = β xi + ε i
Let the fitted line be ŷ_i = bx_i. The least squares principle is to choose b to minimize
φ = Σᵢ₌₁ⁿ (y_i − bx_i)².
∂φ/∂b = −2 Σᵢ₌₁ⁿ x_i(y_i − bx_i).
Setting this partial derivative equal to zero, we shall obtain the normal equation
b Σx_i² = Σx_i y_i,
which, upon solving, gives
b = Σxy / Σx².
Note that
b = Σxy / Σx² = (1/Σx²) Σ x(βx + ε) = β + (1/Σx²) Σ xε.
Obviously E(b) = β, so b is unbiased. In fact, b is BLUE (best linear unbiased).
var(b) = E(b − E(b))² = E( Σxε / Σx² )² = σ² / Σx².
Residual SS = Σ(y − bx)² = Σy² − b²Σx².
Here, R² = b²Σx² / Σy². Note that a different definition of R² than in the with-intercept
case is used here.
3.7.1 Sum of residuals
yi = α + β xi + ε i .
e = y − ŷ = y − a − bx
  = y − (ȳ − bx̄) − bx
  = (y − ȳ) − b(x − x̄).
Thus
Σe = Σ(y − ȳ) − b Σ(x − x̄) = 0.
For the case without intercept,
e = y − ŷ = y − bx = y − x · (Σxy / Σx²).
Σe = Σy − (Σx)(Σxy) / Σx²
   = (1/Σx²) Σᵢ (Σx² − x_i Σx) y_i.
This sum is not fixed; it may take any value. Thus the residuals sum to 0 when there is
an intercept but could differ from 0 when there is no intercept.
Σe² = Σ(y − bx)² = Σy² − b²Σx² = Σy² − (Σxy)² / Σx².
s² = Residual SS / (n − 1).
Estimated var(b) = s² / Σx².
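The contrast in residual sums is easy to verify numerically on made-up data: residuals from the fit with an intercept sum to zero, while those from the fit through the origin generally do not:

```python
def slr(x, y):
    """Simple least squares regression with intercept: return (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / sum((u - xb) ** 2 for u in x)
    return yb - b * xb, b

x = [1, 2, 3, 4, 5]
y = [3.2, 4.1, 6.3, 7.0, 9.4]
a, b = slr(x, y)
e_with = [v - a - b * u for u, v in zip(x, y)]
b0 = sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)   # b = sum(xy) / sum(x^2)
e_origin = [v - b0 * u for u, v in zip(x, y)]
print(round(sum(e_with), 12), round(sum(e_origin), 4))   # first is 0 (up to rounding)
```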
Exercises
1. The data set (Nerlove data, filename: nerlove) has been used by many for
benchmarking software performance. You are required to find a regression
model to predict the kilowatt output from other factors. Use the three selection
procedures in turn to obtain the best set of predictor variables. Show your
work step by step. You may use computer packages to help with your
computation and check your results with those obtained from a statistical
package with a variable selection procedure.
25 7.5492 2969 8183.34 80.657 9.0000 0.2397 0.3972 0.3631
75 22.5612 3571 7297.71 78.255 41.5951 0.1142 0.1833 0.7025
167 21.5587 3886 9538.68 63.569 30.8894 0.1252 0.2033 0.6715
62 20.8671 3965 8403.59 74.480 33.1992 0.1162 0.2151 0.6687
80 21.5454 3981 8186.05 75.082 35.2049 0.1052 0.2299 0.6650
89 17.4802 4148 7536.89 74.025 24.5837 0.1176 0.2035 0.6789
82 29.8011 4187 7996.44 74.120 47.4257 0.1052 0.1824 0.7125
181 19.4391 4560 8558.37 76.464 23.7777 0.1531 0.2577 0.5893
96 30.2067 5286 7084.10 73.325 38.3384 0.0884 0.1969 0.7147
103 24.2903 5316 9759.83 74.025 27.8380 0.1894 0.1740 0.6366
85 30.8773 5643 10182.50 61.040 27.8498 0.1722 0.2204 0.6074
95 22.4421 5648 8954.12 78.440 25.9160 0.0834 0.2111 0.7055
179 33.9733 5708 10024.20 78.102 42.1660 0.0986 0.1826 0.7188
91 19.9008 5785 7969.55 71.910 22.2448 0.1093 0.2180 0.6727
111 37.0666 6754 10177.90 77.197 25.6208 0.2070 0.2363 0.5566
81 35.5303 6770 7798.26 67.570 29.8250 0.1108 0.2814 0.6078
112 25.1686 6779 7826.93 74.200 20.2790 0.1427 0.2662 0.3909
87 24.3565 6793 6336.88 70.295 18.5909 0.1266 0.3253 0.5481
76 33.0175 6837 7310.15 69.795 28.4405 0.1187 0.2515 0.6298
110 40.5281 6891 6769.55 74.120 35.9651 0.0895 0.2393 0.6711
71 42.2514 7320 5879.51 92.063 39.2104 0.0864 0.2064 0.7072
177 33.8814 7382 7512.72 72.362 25.9001 0.1393 0.2486 0.6140
104 31.2922 7484 8063.73 67.680 23.5267 0.1713 0.2535 0.5752
94 27.0832 7896 7119.96 74.513 20.1100 0.1196 0.2484 0.6320
100 32.5840 7930 7119.01 48.997 22.8380 0.1209 0.2772 0.6018
120 52.7634 9145 10373.50 81.750 35.8083 0.2027 0.1997 0.5976
115 41.1798 9275 8657.53 76.140 24.5804 0.1047 0.3284 0.5666
102 47.3864 9530 7624.57 83.880 31.5825 0.1266 0.2106 0.6628
97 30.1678 9602 7054.18 59.977 20.2010 0.0928 0.2164 0.6908
92 28.7861 9660 6686.73 79.542 20.2630 0.0697 0.2391 0.6913
132 57.7267 10004 6472.86 76.300 28.0959 0.1806 0.2362 0.5832
123 38.8472 10057 6035.95 81.578 25.8240 0.0844 0.2178 0.6978
105 31.9884 10149 6437.92 73.140 18.5343 0.1169 0.2367 0.6464
166 51.7415 10361 9578.63 68.016 28.1423 0.1913 0.2407 0.5680
114 55.1764 10855 8061.96 71.490 31.7601 0.1192 0.2362 0.6445
125 48.1125 11114 8413.86 69.975 22.5536 0.1301 0.2969 0.5730
139 76.2528 11667 10436.30 80.660 46.0701 0.1120 0.1708 0.7172
169 66.1032 11837 8709.43 75.379 31.3321 0.1627 0.2103 0.6296
118 68.4800 12542 8142.84 80.385 35.7882 0.1336 0.1688 0.6976
126 79.0705 12706 9282.51 70.853 37.2477 0.1108 0.2011 0.6880
113 45.1827 12936 8320.06 65.760 22.0330 0.1027 0.1992 0.6981
106 41.9016 12954 6460.64 62.330 21.7550 0.0865 0.2194 0.6941
129 77.8849 13702 7113.79 70.850 34.9616 0.1212 0.2121 0.6667
119 97.3859 13846 7786.37 88.540 44.1571 0.1003 0.2066 0.6931
117 80.3593 16311 7282.61 81.550 40.9692 0.0527 0.1337 0.8136
176 79.6207 16508 9404.97 78.044 42.2086 0.1501 0.1556 0.6943
135 90.7168 17280 9191.47 72.967 36.8816 0.0918 0.1795 0.7287
109 58.1154 17875 6288.41 73.395 20.6191 0.0658 0.2781 0.6561
174 107.9780 18455 6690.23 76.300 32.9654 0.1513 0.2101 0.6386
140 134.2280 19445 9829.32 67.580 38.8027 0.1756 0.1834 0.6410
171 90.3718 21956 7954.47 83.338 22.9115 0.1169 0.2984 0.5847
170 113.2560 22522 9500.78 76.732 25.0289 0.1961 0.2604 0.5435
127 111.8680 23217 6873.73 83.880 33.3944 0.0849 0.2007 0.7144
142 125.3360 24001 8047.35 74.372 33.0932 0.0998 0.2457 0.6544
137 183.2320 27118 9914.36 78.480 41.7578 0.1280 0.2265 0.6455
130 87.1015 27708 6378.23 63.600 20.3000 0.1060 0.2257 0.6683
144 240.5140 29613 9312.93 81.750 41.8872 0.1561 0.2017 0.6422
143 191.5630 30958 9810.10 69.541 36.3076 0.1636 0.1524 0.6840
141 168.3780 34212 5683.83 80.385 40.5286 0.0651 0.1361 0.7988
138 169.2350 38343 9117.16 65.992 31.5897 0.0663 0.2192 0.7144
175 269.7730 46870 9761.38 69.541 33.1999 0.1594 0.2194 0.6212
172 240.4860 53918 6068.87 78.380 31.1954 0.0966 0.1846 0.7188
1564.25 601.46 277.44 32.00 404.44 MIDWEST
1634.75 585.10 312.35 36.00 283.11 MIDWEST
1159.25 524.56 292.87 34.00 222.44 SOUTH
1202.75 535.17 268.27 31.00 283.11 WEST
1294.25 486.03 309.85 32.00 242.66 WEST
1467.50 540.17 291.03 28.00 333.66 MIDWEST
1583.75 583.85 289.29 27.00 313.44 MIDWEST
1124.75 499.15 272.55 26.00 374.11 WEST
3. The table (filename: passenger miles) below gives the cost in $ and passenger
miles of an airline for 22 consecutive years. Run a regression of C on Q in the
form:
C = β 0 + β1Q + ε
to see how passenger miles affect its operating cost. Calculate the Durbin-Watson
statistic and test for its significance; state whether the errors are possibly positively
or negatively correlated. If the D-W statistic is significant, apply the Cochrane-Orcutt
procedure to remove the error autocorrelation. State your final regression equation.
T = year;  Q = output, revenue;  C = total cost, in $
T   Q         C
1   1140640   952.757
2   1215690   986.757
3 1309570 1091.98
4 1511530 1175.78
5 1676730 1160.17
6 1823740 1173.76
7 2022890 1290.51
8 2314760 1390.67
9 2639160 1612.73
10 3247620 1825.44
11 3787750 1546.04
12 3867750 1527.9
13 3996020 1660.2
14 4282880 1822.31
15 4748320 1936.46
16 569292 520.635
17 640614 534.627
18 777655 655.192
19 999294 791.575
20 1203970 842.945
21 1358100 852.892
22 1501350 922.843
The rate of return of a factor is defined as the percentage change in output with
respect to the percentage change in input of the factor; e.g., the rate of return of
capital K is given by
(∂Q/Q) / (∂K/K) = (K/Q)·(∂Q/∂K) = ∂ln Q / ∂ln K.
(a) Show that the rates of return of capital and labour are α and β respectively.
8 4257.46 714.2 5585.01
9 1625.19 320.54 1618.75
10 1272.05 253.17 1562.08
11 1004.45 236.44 662.04
12 598.87 140.73 875.37
13 853.1 145.04 1696.98
14 1165.63 240.27 1078.79
15 1917.55 536.73 2109.34
16 9849.17 1564.83 13989.55
17 1088.27 214.62 884.24
18 8095.63 1083.1 9119.7
19 3175.39 521.74 5686.99
20 1653.38 304.85 1701.06
21 5159.31 835.69 5206.36
22 3378.4 284 3288.72
23 592.85 150.77 357.32
24 1601.98 259.91 2031.93
25 2065.85 497.6 2492.98
26 2293.87 275.2 1711.74
27 745.67 137 768.59
Production data
Primary metals, 27 Statewide observations, data are per establishment
ValueAdd : value added
Labor : labor input
Capital : capital stock - gross value of plant and equipment
(c) Find the returns of the inputs and total return to scale.
(d) If the return to a factor is smaller than that of another, the former is said to
be more intensive than the latter in production. Which factor is more
intensive?