
(a) Suppose that MLR1-4 hold and the dependent variable, respond, is measured
with error (i.e. respond = respond* + e). We then have the following
scenarios.
If the measurement error e is independent of all regressors (the x's), then the
Zero Conditional Mean (ZCM) assumption still holds and the OLS estimators we get
from the regression will be unbiased and consistent.
i.e. e = respond − respond*. The model can be expressed as
respond* = β0 + β1 resplast + β2 avggift + β3 propresp + ⋯ + u
respond = β0 + β1 resplast + β2 avggift + β3 propresp + ⋯ + (u + e)
with a new error term (u + e).
On the other hand, if e is correlated with the x variables, the ZCM assumption
fails and the OLS estimators will be biased.
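The unbiasedness claim can be checked with a small simulation. The sketch below uses synthetic data with an assumed true slope of 2.0 (all names and values are illustrative, not from the actual dataset): measurement error in the dependent variable that is independent of the regressor leaves the OLS slope essentially unchanged, only inflating the error variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0            # illustrative sample size and true slope

x = rng.normal(size=n)
u = rng.normal(size=n)            # structural error
e = rng.normal(size=n)            # measurement error, independent of x

y_star = 1.0 + beta * x + u       # true dependent variable
y = y_star + e                    # observed with error: new error term is (u + e)

# OLS slope: Cov(x, y) / Var(x)
b_hat = np.cov(x, y)[0, 1] / np.var(x)
print(b_hat)                      # close to the true slope of 2.0
```

The estimate stays near 2.0 because Cov(x, u + e) = 0, so ZCM still holds for the composite error.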
(b) Suppose that MLR1-4 hold for the model and all variables are correctly
measured except one regressor, mailsyear, which is measured with an additive
error e, where e is uncorrelated with the true value mailsyear*, i.e.
e = mailsyear − mailsyear*.
The model can be expressed as
ecobuy = β0 + β1 resplast + β2 avggift + ⋯ + β4 mailsyear* + u
       = β0 + β1 resplast + β2 avggift + ⋯ + β4 (mailsyear − e) + u
       = β0 + β1 resplast + β2 avggift + ⋯ + β4 mailsyear + (u − β4 e)
with a new error term (u − β4 e).
Since e has expected value 0 and is uncorrelated with mailsyear*, i.e.
Cov(mailsyear*, e) = 0 and E(e) = 0,
Cov(mailsyear, e) = Cov(mailsyear* + e, e) = Cov(mailsyear*, e) + E(e²) − [E(e)]²
= σe² > 0 (the variance of e). This shows a systematic relationship between
the measurement error and the observed variable mailsyear. Hence ZCM fails to
hold, and the OLS estimators (β̂0, β̂1, β̂2, ...) are biased and inconsistent;
in particular, β̂4 on the mismeasured regressor is biased toward zero.
This is also known as attenuation bias.
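A quick simulation illustrates both results above: Cov(mailsyear, e) equals the variance of e, and the OLS coefficient on the mismeasured regressor shrinks toward zero by the factor Var(mailsyear*) / [Var(mailsyear*) + Var(e)]. All parameter values below are illustrative assumptions, not estimates from the data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta4 = 200_000, 0.06                  # illustrative true coefficient

x_star = rng.normal(size=n)               # true regressor (mailsyear*)
e = rng.normal(size=n)                    # measurement error, Cov(x*, e) = 0
x = x_star + e                            # observed regressor (mailsyear)
u = rng.normal(scale=0.5, size=n)
y = beta4 * x_star + u                    # model depends on the true regressor

# Cov(x, e) equals Var(e) = 1, as derived above
cov_xe = np.cov(x, e)[0, 1]
print(cov_xe)

# OLS of y on x is attenuated by Var(x*)/(Var(x*) + Var(e)) = 0.5
b_hat = np.cov(x, y)[0, 1] / np.var(x)
print(b_hat)                              # roughly 0.03, biased toward zero
```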
(c) Using OLS, the model is estimated to be
ecobuy-hat = −0.1172 − 0.0987 resplast + 0.00017 avggift + 0.731 propresp + 0.062 mailsyear
             (0.0245)  (0.018)           (0.000085)        (0.7307)         (0.010)
             [0.0233]  [0.020]           [0.000021]        [0.0340]         [0.011]
n = 4268, R² = 0.2095
The heteroskedasticity-robust standard errors are in brackets [].
Ceteris paribus, if the number of mailings per year increases by one,
the predicted probability of responding with a gift increases by 0.062.
Comparing the heteroskedasticity-robust standard errors with the usual
ones, there are no large differences between them. Note, however, that
even though the robust standard errors are not markedly different from
the usual ones, one cannot conclude that heteroskedasticity is absent,
since the model is a linear probability model (LPM).
(d) To test for the presence of heteroskedasticity, a Breusch-Pagan LM test can
be used.
H0: Homoskedasticity is present in the model (MLR5), against
Ha: Heteroskedasticity is present.
Regress ecobuy on all regressors and save the squared residuals û².
Regress û² on the same regressors again to get the R-squared, R²û².
The LM test statistic is then LM = nR²û², where n is the number of
observations.
As the test statistic LM follows a χ² distribution with 4 degrees of
freedom (the number of regressors) under the null, a large LM or a small
p-value leads to rejection of the null hypothesis.
Conclusion: As the LM test yields a p-value of 0.000, we reject
the null hypothesis; there is strong statistical evidence of
heteroskedasticity.
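The steps of the BP test above can be sketched as follows. This is a minimal numpy/scipy implementation run on synthetic heteroskedastic data (the data-generating process and values are illustrative, not the actual regression output):

```python
import numpy as np
from scipy.stats import chi2

def bp_lm_test(y, X):
    """Breusch-Pagan LM test; X includes a constant column."""
    n = len(y)
    # Step 1: OLS of y on X, save the squared residuals
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - X @ b) ** 2
    # Step 2: regress u^2 on the same regressors, compute the R-squared
    g = np.linalg.lstsq(X, u2, rcond=None)[0]
    r2 = 1.0 - (u2 - X @ g).var() / u2.var()
    # Step 3: LM = n * R^2 ~ chi2(k) under the null of homoskedasticity
    lm = n * r2
    k = X.shape[1] - 1                  # regressors excluding the constant
    return lm, chi2.sf(lm, df=k)

# toy data whose error variance rises with x (heteroskedastic by construction)
rng = np.random.default_rng(2)
x = rng.uniform(1, 5, size=4268)
y = 0.5 + 0.2 * x + rng.normal(size=4268) * x    # Var(u|x) = x^2
X = np.column_stack([np.ones_like(x), x])
lm, p = bp_lm_test(y, X)
print(lm, p)    # large LM, tiny p-value: reject homoskedasticity
```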
(e) Suppose that MLR1-4 hold for the model. In the LPM, the variance of the
error term given all regressors can be expressed as
Var(u|x) = Var(y|x) = E(y|x)[1 − E(y|x)] = Pr(y=1|x)[1 − Pr(y=1|x)],
which is estimated by ŷ(1 − ŷ).
(f) Proceeding from part (e), estimating the model by OLS gives the fitted
values ŷᵢ. For the i-th observation, the heteroskedasticity takes the form
ĥᵢ = ŷᵢ(1 − ŷᵢ). Thus a suitable WLS estimation with weight 1/ĥᵢ can be
used to regain homoskedasticity.
After checking whether all of the fitted values are inside the interval (0,1),
and adjusting those that are not, we proceed with WLS by dividing the i-th
observation by √ĥᵢ, and then regressing yᵢ/√ĥᵢ on
{1/√ĥᵢ, resplastᵢ/√ĥᵢ, avggiftᵢ/√ĥᵢ, proprespᵢ/√ĥᵢ, mailsyearᵢ/√ĥᵢ}
without an intercept.

Practically, the fitted values y
i
s do not necessarily fall inside of interval (0,1)
which could be problematic. As either y
i
0 or y
i
1, for h
i
= y
i
(1-y
i
), will
cause h
i
s to be negative(or zero), in which case we cannot proceed WLS
as we cannot take square root of a negative number or divide by 0. One
option to overcome this problem is to firstly check the range for all y
i
s,
and then replace negative or 0 y
i
s with positive near 0 numbers(0.001)
and y
i
1 with numbers less than but close to 1(0.999). However, the
choices of the numbers are arbitrary and do not work well if too many fitted
values are outside the interval (0, 1).
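The WLS procedure, including the (0,1) adjustment, can be sketched as follows on synthetic binary data (the clipping value eps = 0.001 is the arbitrary choice discussed above; all other values are illustrative):

```python
import numpy as np

def lpm_wls(y, X, eps=0.001):
    """WLS for a linear probability model; X includes a constant column."""
    # Step 1: OLS to get fitted probabilities y-hat
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    p_hat = X @ b_ols
    # Step 2: adjust fitted values outside (0, 1) -- the arbitrary fix above
    p_hat = np.clip(p_hat, eps, 1 - eps)
    # Step 3: h_i = y-hat(1 - y-hat); divide every variable by sqrt(h_i)
    w = 1.0 / np.sqrt(p_hat * (1 - p_hat))
    return np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

# toy binary-outcome data (illustrative, not the actual ecobuy data)
rng = np.random.default_rng(3)
x = rng.normal(size=5000)
p = np.clip(0.5 + 0.15 * x, 0.01, 0.99)
y = (rng.uniform(size=5000) < p).astype(float)
X = np.column_stack([np.ones_like(x), x])
b_wls = lpm_wls(y, X)
print(b_wls)    # intercept near 0.5, slope near 0.15
```

Dividing the whole design matrix (including the constant column) by √ĥᵢ is what makes the transformed regression "without an intercept": the former constant becomes the regressor 1/√ĥᵢ.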
(g) As shown in the previous parts, the assumption of a linear form of
heteroskedasticity can be problematic for this model, since the fitted values
may fall outside the required interval. Thus another method, Feasible GLS
(FGLS), is used to estimate the form h(x), in order to make generalized
least squares (GLS) estimation feasible.
Assume an exponential functional form for the heteroskedasticity,
i.e. Var(u|x) = σ² exp(δ0 + δ1 x1 + δ2 x2 + ⋯ + δk xk).
The form of the heteroskedasticity is therefore
h(x) = exp(δ0 + δ1 x1 + δ2 x2 + ⋯ + δk xk).
The FGLS method then works as follows:
Run an OLS regression of ecobuy on all regressors and save the squared
residuals û².
Take the natural logarithm of the squared residuals, log(û²).
Run an OLS regression of log(û²) on all regressors to get the fitted
values ĝ.
Apply the WLS method (as discussed before) to re-estimate the original
model using weight 1/ĥ = exp(−ĝ).
Estimation using FGLS:
ecobuy-hat = −0.1172 − 0.0987 resplast + 0.00017 avggift + 0.731 propresp + 0.062 mailsyear
             (0.0245)  (0.018)           (0.000085)        (0.7307)         (0.010)
n = 4268, R² = 0.2095

Estimation using ordinary OLS:
ecobuy-hat = −0.1172 − 0.0987 resplast + 0.00017 avggift + 0.731 propresp + 0.062 mailsyear
             (0.0245)  (0.018)           (0.000085)        (0.7307)         (0.010)
n = 4268, R² = 0.2095

Comparing the two sets of estimates above, the standard errors for most of
the FGLS estimates are smaller than those from ordinary OLS when
heteroskedasticity is present. This reflects the fact that, in large samples,
FGLS is consistent, asymptotically normal, and more efficient than OLS in the
presence of heteroskedasticity.
(h) To test for possible model misspecification, we consider cases where the
misspecification is due to omitting a function (e.g. quadratic, cubic) of the
independent variables, and test for it using Ramsey's Regression Equation
Specification Error Test (RESET). It is essentially a hypothesis test of the
statistical significance of additional functions of the fitted values when
they are added to the model.
Run an OLS regression of the original model and save the fitted values ŷ.
Set out the hypothesis test as follows:
H0: δ1 = 0, δ2 = 0 in the model
y = β0 + β1 x1 + ⋯ + βk xk + δ1 ŷ² + δ2 ŷ³ + u,
versus Ha: at least one δ ≠ 0.
The joint test statistic (F-statistic) follows an F-distribution with
parameters 2 and 4261 (F(2,4261)) under the null.
We reject the null for a small p-value, or equivalently a test statistic
greater than the F(2,4261) critical value. In this case, the p-value is
large, which leads us to fail to reject H0 at any reasonable significance
level.
In conclusion, there is no strong evidence that δ1 or δ2 differs from 0,
and hence no strong evidence of functional-form misspecification; the model
does not appear to require additional quadratic or cubic terms.
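The RESET procedure can be sketched as follows. Here the synthetic data are deliberately generated from a quadratic model while a linear model is fitted, so the test rejects (unlike the result reported above); all names and values are illustrative:

```python
import numpy as np
from scipy.stats import f as f_dist

def reset_test(y, X):
    """Ramsey RESET: add yhat^2 and yhat^3 to the model and F-test them jointly."""
    def ssr(Z):
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        return ((y - Z @ b) ** 2).sum()
    # Restricted model: the original regressors only
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ssr_r = ssr(X)
    # Unrestricted model: add powers of the fitted values
    Xu = np.column_stack([X, yhat ** 2, yhat ** 3])
    ssr_u = ssr(Xu)
    q, df2 = 2, len(y) - Xu.shape[1]
    F = ((ssr_r - ssr_u) / q) / (ssr_u / df2)
    return F, f_dist.sf(F, q, df2)

# toy example: true model is quadratic, fitted model is linear
rng = np.random.default_rng(5)
x = rng.normal(size=4268)
y = 1.0 + x + 0.5 * x**2 + rng.normal(size=4268)
X = np.column_stack([np.ones_like(x), x])
F, p = reset_test(y, X)
print(F, p)    # large F, tiny p-value: misspecification detected
```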
(i) This is another form of the Breusch-Pagan test for heteroskedasticity;
the F-statistic computed from the data follows F(4,4263) under the null,
since there are 4 regressors and 4268 observations
(4268 − 4 − 1 = 4263 degrees of freedom).
(j) The R-squared from the BP auxiliary regression can be interpreted as the
proportion of the variation in the squared residuals that is explained by the
regressors. This should be small when homoskedasticity is present, since the
squared residuals should then be uncorrelated with the x's. Accordingly, the
R-squared from the original regression is typically far larger than the
R-squared from the BP regression.
