
Dougherty: Econometrics 3e Instructor's Manual

8 STOCHASTIC REGRESSORS AND MEASUREMENT ERROR

8.1

Demonstrate that $b_1 = \bar{Y} - b_2\bar{X}$ is a consistent estimator of $\beta_1$ in the simple regression model. (You may take as given that $b_2$ is a consistent estimator of $\beta_2$.)

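Answer:

$$\operatorname{plim} b_1 = \operatorname{plim}\left(\beta_1 + \beta_2\bar{X} + \bar{u}\right) - \operatorname{plim} b_2 \,\operatorname{plim}\bar{X} = \beta_1$$

since $\operatorname{plim}\bar{u} = 0$ and $\operatorname{plim} b_2 = \beta_2$.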

Alternatively, we have seen that $b_1$ is unbiased and that its variance is

$$\sigma_{b_1}^2 = \sigma_u^2\left[\frac{1}{n} + \frac{\bar{X}^2}{n\,\mathrm{MSD}(X)}\right]$$

Since the variance tends to zero as n tends to infinity, the estimator is consistent.
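The consistency argument can be illustrated numerically. The following is a minimal sketch, not part of the original answer: the parameter values (an intercept of 2.0, a slope of 0.5, a uniform X on [0, 10], a standard normal disturbance) and the function names are assumptions chosen only for illustration.

```python
import random

def ols(x, y):
    """Return (b1, b2) from the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def intercept_estimate(n, beta1=2.0, beta2=0.5, seed=0):
    """Draw one sample of size n and return b1 (all values are illustrative)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [beta1 + beta2 * xi + rng.gauss(0.0, 1.0) for xi in x]
    b1, _ = ols(x, y)
    return b1

# The estimation error of b1 should typically shrink as n grows.
for n in (25, 2500, 100000):
    print(n, round(intercept_estimate(n) - 2.0, 4))
```

With a single sample per sample size the errors need not fall monotonically, but the error for the largest sample should be small relative to the standard deviation implied by the variance expression.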
8.2

In a certain industry, firms relate their stocks of finished goods, Y, to their expected annual sales, $X^e$, according to a linear relationship $Y = \beta_1 + \beta_2 X^e$. Actual sales, X, differ from expected sales by a random quantity u, which is distributed with zero mean and constant variance: $X = X^e + u$ where u is distributed independently of $X^e$. An investigator has data on Y and X (but not on $X^e$) for a cross-section of firms in the industry. Describe the problems that would be encountered if OLS were used to estimate $\beta_1$ and $\beta_2$, regressing Y on X.

Answer:

This is a standard errors-in-the-explanatory-variable model, with OLS leading to inconsistent estimates. Applying equation (8.16), the large-sample bias in the slope coefficient is

$$-\beta_2\,\frac{\sigma_u^2}{\sigma_{X^e}^2 + \sigma_u^2}$$
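A small simulation can be used to check the bias expression. This is a sketch under assumed values that do not come from the exercise: $\beta_1 = 10$, $\beta_2 = 0.5$, expected sales drawn from a normal distribution with mean 20 and standard deviation 2, and a sales shock u with standard deviation 1. The function names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def simulate_stocks(n=100000, beta1=10.0, beta2=0.5, sd_xe=2.0, sd_u=1.0, seed=1):
    """Return the OLS slope of Y on X and the large-sample limit from the bias formula."""
    rng = random.Random(seed)
    xe = [rng.gauss(20.0, sd_xe) for _ in range(n)]   # expected sales (assumed distribution)
    x = [v + rng.gauss(0.0, sd_u) for v in xe]        # actual sales: X = Xe + u
    y = [beta1 + beta2 * v for v in xe]               # stocks: Y = beta1 + beta2 * Xe
    bias = -beta2 * sd_u ** 2 / (sd_xe ** 2 + sd_u ** 2)
    return ols_slope(x, y), beta2 + bias

b2, limit = simulate_stocks()
print(round(b2, 3), round(limit, 3))
```

Under these assumed variances the formula gives a limit of 0.4 rather than the true 0.5, and the simulated slope should lie close to that limit.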

8.3

In a similar industry, firms relate their intended stocks of finished goods, Y*, to their expected annual sales, $X^e$, according to a linear relationship $Y^* = \beta_1 + \beta_2 X^e$. Actual sales, X, differ from expected sales by a random quantity u, which is distributed with zero mean and constant variance: $X = X^e + u$ where u is distributed independently of $X^e$. Since unexpected sales lead to a reduction in stocks, actual stocks are given by $Y = Y^* - u$.

C. Dougherty 2001-2006. All rights reserved. Version of 15.04.07.


An investigator has data on Y and X (but not on Y* or $X^e$) for a cross-section of firms in the industry. Describe analytically the problems that would be encountered if OLS were used to estimate $\beta_1$ and $\beta_2$, regressing Y on X. [Note: You are warned that the standard expression for measurement error bias is not valid in this case.]

Answer:

This model is slightly more complex than the basic errors-in-variables model since the measurement error in the (measured) dependent variable is correlated with the (measured) explanatory variable. You may wish to warn students that they should analyze the large-sample properties starting from scratch, rather than attempt to borrow or modify expression (8.28).

Given the definitions,

$$Y + u = \beta_1 + \beta_2(X - u)$$

so

$$Y = \beta_1 + \beta_2 X - (1 + \beta_2)u = \beta_1 + \beta_2 X + v$$

where $v = -(1 + \beta_2)u$. If you use OLS to fit the equation,

$$b_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

can be decomposed as

$$b_2 = \beta_2 + \frac{\sum_{i=1}^{n}(X_i - \bar{X})(v_i - \bar{v})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \beta_2 + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(v_i - \bar{v})}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

The numerator and denominator have both been divided by n to assure that they have probability limits. Hence

$$\operatorname{plim} b_2 = \beta_2 + \frac{\operatorname{cov}(X, v)}{\operatorname{var}(X)} = \beta_2 + \frac{\operatorname{cov}\left([X^e + u],\, -(1 + \beta_2)u\right)}{\sigma_X^2} = \beta_2 - (1 + \beta_2)\frac{\sigma_u^2}{\sigma_X^2}$$

assuming that there is some variation in X. Hence in large samples the slope coefficient is downwards biased. It follows that the intercept will be upwards biased.
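The difference between this result and the standard measurement error expression can be seen in a simulation. The sketch below uses assumed values that are not in the exercise ($\beta_1 = 10$, $\beta_2 = 0.5$, $\sigma_{X^e} = 2$, $\sigma_u = 1$, mean expected sales of 20) and hypothetical function names.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def simulate_unplanned_sales(n=100000, beta1=10.0, beta2=0.5,
                             sd_xe=2.0, sd_u=1.0, seed=2):
    """Return (OLS slope, limit derived from scratch, limit from the standard formula)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xe = rng.gauss(20.0, sd_xe)          # expected sales (assumed distribution)
        u = rng.gauss(0.0, sd_u)             # unexpected sales
        xs.append(xe + u)                    # X = Xe + u
        ys.append(beta1 + beta2 * xe - u)    # Y = Y* - u
    var_x = sd_xe ** 2 + sd_u ** 2
    limit = beta2 - (1.0 + beta2) * sd_u ** 2 / var_x   # expression derived in the answer
    standard = beta2 - beta2 * sd_u ** 2 / var_x        # standard expression, not valid here
    return ols_slope(xs, ys), limit, standard

b2, limit, standard = simulate_unplanned_sales()
print(round(b2, 3), round(limit, 3), round(standard, 3))
```

With these values the derived limit is 0.2 while the standard expression would suggest 0.4; the simulated slope should sit near the former.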
8.4*

A variable Q is determined by the model

$$Q = \beta_1 + \beta_2 X + v,$$

where X is a variable and v is a disturbance term that satisfies the regression model conditions. The dependent variable is subject to measurement error and is measured as Y where

$$Y = Q + r$$

and r is the measurement error, distributed independently of v. Describe analytically the consequences of using OLS to fit this model if


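(1) the expected value of r is not equal to zero (but r is distributed independently of Q),
(2) r is not distributed independently of Q (but its expected value is zero).

Answer:

Substituting for Q, the model may be rewritten

$$Y = \beta_1 + \beta_2 X + v + r = \beta_1 + \beta_2 X + u$$

where $u = v + r$. Then

$$b_2 = \beta_2 + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2} = \beta_2 + \frac{\sum (X_i - \bar{X})(v_i - \bar{v}) + \sum (X_i - \bar{X})(r_i - \bar{r})}{\sum (X_i - \bar{X})^2}$$

and

$$\begin{aligned}
E(b_2) &= E\left\{\beta_2 + \frac{\sum (X_i - \bar{X})(v_i - \bar{v}) + \sum (X_i - \bar{X})(r_i - \bar{r})}{\sum (X_i - \bar{X})^2}\right\} \\
&= \beta_2 + \frac{1}{\sum (X_i - \bar{X})^2}\, E\left\{\sum (X_i - \bar{X})(v_i - \bar{v}) + \sum (X_i - \bar{X})(r_i - \bar{r})\right\} \\
&= \beta_2 + \frac{1}{\sum (X_i - \bar{X})^2}\left\{\sum (X_i - \bar{X})E(v_i - \bar{v}) + \sum (X_i - \bar{X})E(r_i - \bar{r})\right\} \\
&= \beta_2
\end{aligned}$$

provided that X is nonstochastic. Note that $E(r_i - \bar{r}) = 0$ even when E(r) is not zero, since $r_i$ and $\bar{r}$ have the same expectation. (If X is stochastic, the proof that the expected value of the error term is zero is parallel to that in Section 8.2 of the text.) Thus $b_2$ remains an unbiased estimator of $\beta_2$. However, the estimator of the intercept is affected if E(r) is not zero.

$$b_1 = \bar{Y} - b_2\bar{X} = \beta_1 + \beta_2\bar{X} + \bar{u} - b_2\bar{X} = \beta_1 + \beta_2\bar{X} + \bar{v} + \bar{r} - b_2\bar{X}$$

Hence

$$E(b_1) = \beta_1 + \beta_2\bar{X} + E(\bar{v}) + E(\bar{r}) - E(b_2\bar{X}) = \beta_1 + \beta_2\bar{X} + E(\bar{v}) + E(\bar{r}) - \bar{X}E(b_2) = \beta_1 + E(\bar{r})$$

Thus the intercept is biased if E(r) is not equal to zero, for then $E(\bar{r})$ is not equal to 0.

If r is not distributed independently of Q, the situation is a little more complicated. For it to be distributed independently of Q, it must be distributed independently of both X and v, since these are the determinants of Q. Thus if it is not distributed independently of Q, one of these two conditions must be violated. We will consider each in turn.

(a) r not distributed independently of X. We now have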


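$$\operatorname{plim} b_2 = \beta_2 + \frac{\operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})(v_i - \bar{v}) + \operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})(r_i - \bar{r})}{\operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})^2} = \beta_2 + \frac{\sigma_{Xr}}{\sigma_X^2}$$

Since $\sigma_{Xr} \neq 0$, $b_2$ is an inconsistent estimator of $\beta_2$. It follows that $b_1$ will also be an inconsistent estimator of $\beta_1$:

$$b_1 = \beta_1 + \beta_2\bar{X} + \bar{v} + \bar{r} - b_2\bar{X}$$

Hence

$$\operatorname{plim} b_1 = \beta_1 + \beta_2\bar{X} + \operatorname{plim}\bar{v} + \operatorname{plim}\bar{r} - \bar{X}\operatorname{plim} b_2 = \beta_1 + \bar{X}(\beta_2 - \operatorname{plim} b_2)$$

and this is different from $\beta_1$ if $\operatorname{plim} b_2$ is not equal to $\beta_2$.

(b) r is not distributed independently of v. This condition is not required in the proof of the unbiasedness of either $b_1$ or $b_2$ and so both remain unbiased.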
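Cases (1) and (a) can be checked with a short simulation. The sketch below is illustrative only: the values $\beta_1 = 1$ and $\beta_2 = 2$, the uniform distribution of X on [0, 10], and the way the measurement error is generated (`mean_r + gamma * (X - 5)` plus noise, so that $\sigma_{Xr}/\sigma_X^2$ equals the hypothetical parameter `gamma`) are all assumptions, not part of the exercise.

```python
import random

def ols(x, y):
    """Return (b1, b2) from the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def simulate(mean_r, gamma, n=100000, beta1=1.0, beta2=2.0, seed=3):
    """Regress Y = Q + r on X, where r has mean mean_r and covaries with X when gamma != 0."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = []
    for xi in x:
        v = rng.gauss(0.0, 1.0)                                 # disturbance term
        r = mean_r + gamma * (xi - 5.0) + rng.gauss(0.0, 1.0)   # measurement error in Y
        y.append(beta1 + beta2 * xi + v + r)
    return ols(x, y)

# Case (1): E(r) = 3 and r independent of X. Slope unaffected, intercept shifted by E(r).
b1_case1, b2_case1 = simulate(mean_r=3.0, gamma=0.0)
# Case (a): E(r) = 0 but cov(X, r) / var(X) = 0.3. Slope and intercept both affected.
b1_case_a, b2_case_a = simulate(mean_r=0.0, gamma=0.3)
print(round(b1_case1, 2), round(b2_case1, 2))
print(round(b1_case_a, 2), round(b2_case_a, 2))
```

In case (1) the estimates should be close to $\beta_1 + E(r) = 4$ and $\beta_2 = 2$. In case (a) the slope should be close to $\beta_2 + \sigma_{Xr}/\sigma_X^2 = 2.3$ and the intercept close to $\beta_1 + \bar{X}(\beta_2 - \operatorname{plim} b_2) = 1 - 5 \times 0.3 = -0.5$.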
8.5*

A variable Y is determined by the model

$$Y = \beta_1 + \beta_2 Z + v,$$

where Z is a variable and v is a disturbance term that satisfies the regression model conditions. The explanatory variable is subject to measurement error and is measured as X where

$$X = Z + w$$

and w is the measurement error, distributed independently of v. Describe analytically the consequences of using OLS to fit this model if
(1) the expected value of w is not equal to zero (but w is distributed independently of Z),
(2) w is not distributed independently of Z (but its expected value is zero).

Answer:

Substituting for Z, we have

$$Y = \beta_1 + \beta_2(X - w) + v = \beta_1 + \beta_2 X + u$$

where $u = v - \beta_2 w$.

$$b_2 = \beta_2 + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2}$$

It is not possible to obtain a closed-form expression for the expectation of the error term since both its numerator and its denominator depend on w. Instead we take plims, having first divided the numerator and the denominator of the error term by n so that they have limits:


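$$\begin{aligned}
\operatorname{plim} b_2 &= \beta_2 + \frac{\operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})(u_i - \bar{u})}{\operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})^2} = \beta_2 + \frac{\operatorname{cov}(X, u)}{\operatorname{var}(X)} \\
&= \beta_2 + \frac{\operatorname{cov}([Z + w], [v - \beta_2 w])}{\operatorname{var}(X)} \\
&= \beta_2 + \frac{\operatorname{cov}(Z, v) - \beta_2\operatorname{cov}(Z, w) + \operatorname{cov}(w, v) - \beta_2\operatorname{cov}(w, w)}{\operatorname{var}(X)}
\end{aligned}$$

If E(w) is not equal to 0, $b_2$ is not affected. The first three terms in the numerator are zero and

$$\operatorname{plim} b_2 = \beta_2 - \beta_2\frac{\sigma_w^2}{\sigma_X^2}$$

so $b_2$ remains inconsistent as in the standard case.

If w is not distributed independently of Z, then the second term in the numerator is not 0. $b_2$ remains inconsistent, but the expression is now

$$\operatorname{plim} b_2 = \beta_2 - \beta_2\frac{\sigma_{Zw} + \sigma_w^2}{\sigma_X^2}$$

The OLS estimator of the intercept is affected in both cases, but like the slope coefficient, it was inconsistent anyway.

$$b_1 = \bar{Y} - b_2\bar{X} = \beta_1 + \beta_2\bar{X} + \bar{u} - b_2\bar{X} = \beta_1 + \beta_2\bar{X} + \bar{v} - \beta_2\bar{w} - b_2\bar{X}$$

Hence

$$\operatorname{plim} b_1 = \beta_1 + (\beta_2 - \operatorname{plim} b_2)\bar{X} + \operatorname{plim}\bar{v} - \beta_2\operatorname{plim}\bar{w}$$

In the standard case this would reduce to

$$\operatorname{plim} b_1 = \beta_1 + (\beta_2 - \operatorname{plim} b_2)\bar{X} = \beta_1 + \beta_2\frac{\sigma_w^2}{\sigma_X^2}\bar{X}.$$

If w has expected value $\mu_w$, not equal to zero,

$$\operatorname{plim} b_1 = \beta_1 + \beta_2\left(\frac{\sigma_w^2}{\sigma_X^2}\bar{X} - \mu_w\right).$$

If w is not distributed independently of Z,

$$\operatorname{plim} b_1 = \beta_1 + \beta_2\frac{\sigma_{Zw} + \sigma_w^2}{\sigma_X^2}\bar{X}.$$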
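The expressions for case (1) can be checked by simulation. This is a sketch under assumed values that do not come from the exercise: $\beta_1 = 5$, $\beta_2 = 1$, Z normal with mean 10 and standard deviation 3, and w normal with mean $\mu_w = 2$ and standard deviation 2. The population mean of X is used in place of $\bar{X}$ when evaluating the limit.

```python
import random

def ols(x, y):
    """Return (b1, b2) from the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def simulate_mismeasured_x(n=100000, beta1=5.0, beta2=1.0, mean_z=10.0,
                           sd_z=3.0, mu_w=2.0, sd_w=2.0, seed=5):
    """Return (b1, b2, limit of b1, limit of b2) when X = Z + w and E(w) = mu_w."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z = rng.gauss(mean_z, sd_z)
        w = rng.gauss(mu_w, sd_w)                         # measurement error, independent of Z
        x.append(z + w)
        y.append(beta1 + beta2 * z + rng.gauss(0.0, 1.0))
    ratio = sd_w ** 2 / (sd_z ** 2 + sd_w ** 2)           # sigma_w^2 / sigma_X^2
    limit_b2 = beta2 - beta2 * ratio
    limit_b1 = beta1 + beta2 * (ratio * (mean_z + mu_w) - mu_w)
    b1, b2 = ols(x, y)
    return b1, b2, limit_b1, limit_b2

b1, b2, limit_b1, limit_b2 = simulate_mismeasured_x()
print(round(b1, 2), round(limit_b1, 2), round(b2, 3), round(limit_b2, 3))
```

The slope limit depends only on the variances, so the nonzero mean of w leaves it unchanged, while the intercept limit includes the extra term $-\beta_2\mu_w$.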


8.6*

A researcher investigating the shadow economy using international cross-sectional data for 25 countries hypothesizes that consumer expenditure on shadow goods and services, Q, is related to total consumer expenditure, Z, by the relationship

$$Q = \beta_1 + \beta_2 Z + v$$

where v is a disturbance term that satisfies the regression model conditions. Q is part of Z and any error in the estimation of Q affects the estimate of Z by the same amount. Hence

$$Y_i = Q_i + w_i$$

and

$$X_i = Z_i + w_i$$

where $Y_i$ is the estimated value of $Q_i$, $X_i$ is the estimated value of $Z_i$, and $w_i$ is the measurement error affecting both variables in observation i. It is assumed that the expected value of w is zero and that v and w are distributed independently of Z and of each other.

(1) Derive an expression for the large-sample bias in the estimate of $\beta_2$ when OLS is used to regress Y on X, and determine its sign if this is possible. [Note: You are warned that the standard expression for measurement error bias is not valid in this case.]

(2) In a Monte Carlo experiment based on the model above, the true relationship between Q and Z is Q = 2.0 + 0.2Z. A sample of 25 observations is generated using the integers 1, 2, ..., 25 as data for Z. The variance of Z is 52.0. A normally distributed random variable with mean 0 and variance 25 is used to generate the values of the measurement error in the dependent and explanatory variables. The results with 10 samples are summarized in the table. Comment on the results, stating whether or not they support your theoretical analysis.

Sample     b1    s.e.(b1)    b2    s.e.(b2)    R^2
   1     -0.85     1.09     0.42     0.07     0.61
   2     -0.37     1.45     0.36     0.10     0.36
   3     -2.85     0.88     0.49     0.06     0.75
   4     -2.21     1.59     0.54     0.10     0.57
   5     -1.08     1.43     0.47     0.09     0.55
   6     -1.32     1.39     0.51     0.08     0.64
   7     -3.12     1.12     0.54     0.07     0.71
   8     -0.64     0.95     0.45     0.06     0.74
   9     -0.57     0.89     0.38     0.05     0.69
  10     -0.54     1.26     0.40     0.08     0.50


[Figure: scatter diagram for the first sample. Vertical axis: Q, Y. Horizontal axis: Z, X.]

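(3) The graph plots the points (Q, Z) and (Y, X) for the first sample, with each (Q, Z) point linked to the corresponding (Y, X) point. Comment on this graph, given your answers to parts (1) and (2).

Answer:

(1) Substituting for Q and Z in the first equation,

$$(Y - w) = \beta_1 + \beta_2(X - w) + v.$$

Hence

$$Y = \beta_1 + \beta_2 X + v + (1 - \beta_2)w = \beta_1 + \beta_2 X + u$$

where $u = v + (1 - \beta_2)w$. So

$$b_2 = \beta_2 + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2}$$

It is not possible to obtain a closed-form expression for the expectation of the error term since both its numerator and its denominator depend on w. Instead we take plims, having first divided the numerator and the denominator of the error term by n so that they have limits:

$$\begin{aligned}
\operatorname{plim} b_2 &= \beta_2 + \frac{\operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})(u_i - \bar{u})}{\operatorname{plim}\frac{1}{n}\sum (X_i - \bar{X})^2} = \beta_2 + \frac{\operatorname{cov}(X, u)}{\operatorname{var}(X)} \\
&= \beta_2 + \frac{\operatorname{cov}([Z + w], [v + (1 - \beta_2)w])}{\operatorname{var}(X)} \\
&= \beta_2 + \frac{\operatorname{cov}(Z, v) + (1 - \beta_2)\operatorname{cov}(Z, w) + \operatorname{cov}(w, v) + (1 - \beta_2)\operatorname{cov}(w, w)}{\operatorname{var}(X)}
\end{aligned}$$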


Since v and w are distributed independently of Z and of each other, cov(Z, v) = cov(Z, w) = cov(w, v) = 0, and so

$$\operatorname{plim} b_2 = \beta_2 + (1 - \beta_2)\frac{\sigma_w^2}{\sigma_X^2}.$$

$\beta_2$ clearly should be positive and less than 1, so the bias is positive.

(2) $\sigma_X^2 = \sigma_Z^2 + \sigma_w^2$, given that w is distributed independently of Z, and hence $\sigma_X^2 = 52 + 25 = 77$. Thus

$$\operatorname{plim} b_2 = 0.2 + \frac{(1 - 0.2) \times 25}{77} = 0.46$$

The estimates of the slope coefficient do indeed appear to be distributed around this number. As a consequence of the slope coefficient being overestimated, the intercept is underestimated, negative estimates being obtained in each case despite the fact that the true value is positive. The standard errors are invalid, given the severe problem of measurement error.

(3) The diagram shows how the measurement error causes the observations to be displaced along 45° lines. Hence the slope of the regression line will be a compromise between the true slope, $\beta_2$, and 1. More specifically, $\operatorname{plim} b_2$ is a weighted average of $\beta_2$ and 1, the weights being the variances of Z and w:

$$\operatorname{plim} b_2 = \beta_2 + (1 - \beta_2)\frac{\sigma_w^2}{\sigma_Z^2 + \sigma_w^2} = \beta_2\frac{\sigma_Z^2}{\sigma_Z^2 + \sigma_w^2} + \frac{\sigma_w^2}{\sigma_Z^2 + \sigma_w^2}$$
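The arithmetic in parts (2) and (3) can be reproduced directly from the design of the Monte Carlo experiment. The sketch below uses only numbers stated in the exercise (Z = 1, ..., 25, $\beta_1 = 2.0$, $\beta_2 = 0.2$, $\sigma_w^2 = 25$, and the ten reported slope estimates); the variable names are illustrative.

```python
# Z takes the integer values 1..25; its variance is computed as a mean square deviation.
z_values = list(range(1, 26))
z_mean = sum(z_values) / len(z_values)
var_z = sum((z - z_mean) ** 2 for z in z_values) / len(z_values)

beta1, beta2 = 2.0, 0.2
var_w = 25.0
var_x = var_z + var_w                      # w is independent of Z

# Two equivalent forms of the large-sample limit of b2.
plim_b2 = beta2 + (1.0 - beta2) * var_w / var_x
weighted_average = (beta2 * var_z + 1.0 * var_w) / var_x

# Implied limit of the intercept, evaluated at the mean of Z (E(w) = 0).
plim_b1 = beta1 + (beta2 - plim_b2) * z_mean

# Slope estimates reported in the table for the ten samples.
table_b2 = [0.42, 0.36, 0.49, 0.54, 0.47, 0.51, 0.54, 0.45, 0.38, 0.40]
mean_b2 = sum(table_b2) / len(table_b2)

print(var_z, round(plim_b2, 3), round(plim_b1, 2), round(mean_b2, 3))
```

The mean of the ten reported slope estimates, 0.456, is close to the limit of about 0.46, and the implied limit for the intercept is negative, in line with the comment that negative intercept estimates were obtained.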

8.7

In a certain economy the variance of transitory income is 0.5 that of permanent income, the propensity to consume nondurables out of permanent income is 0.6, and there is no expenditure on durables. What would be the value of the multiplier derived from a naive regression of consumption on income, and what would be the true value?

Answer:

In a conventional OLS regression,

$$\operatorname{plim} b_2 = 0.6 - 0.6\frac{\sigma_{Y^T}^2}{\sigma_{Y^P}^2 + \sigma_{Y^T}^2} = 0.6 - 0.6\frac{0.5\sigma_{Y^P}^2}{\sigma_{Y^P}^2 + 0.5\sigma_{Y^P}^2} = 0.6 - 0.2 = 0.4$$

Hence the multiplier would appear to be

$$\frac{1}{1 - 0.4} = 1.67.$$

In reality,

$$C = 0.6Y^P = 0.6Y^P + 0Y^T$$

Hence if a change in income is regarded as permanent, the multiplier will be $\frac{1}{1 - 0.6} = 2.50$. If a change is regarded as transitory, the multiplier will be $\frac{1}{1 - 0} = 1$. If the change in income is regarded as partly permanent, partly transitory, the multiplier will lie between these limits.
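The figures in this answer follow from two small formulas, sketched below. The function names are illustrative; the inputs (a propensity of 0.6 and a variance ratio of 0.5) are those given in the exercise.

```python
def naive_slope(true_mpc, variance_ratio):
    """Large-sample OLS slope when var(transitory) = variance_ratio * var(permanent)."""
    return true_mpc - true_mpc * variance_ratio / (1.0 + variance_ratio)

def multiplier(mpc):
    """Simple multiplier 1 / (1 - mpc)."""
    return 1.0 / (1.0 - mpc)

apparent_mpc = naive_slope(0.6, 0.5)
print(round(apparent_mpc, 2))               # apparent propensity from the naive regression
print(round(multiplier(apparent_mpc), 2))   # apparent multiplier
print(round(multiplier(0.6), 2))            # change regarded as permanent
print(round(multiplier(0.0), 2))            # change regarded as transitory
```

Changing `variance_ratio` shows how the apparent multiplier approaches the permanent-income value as transitory variation becomes small.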


8.8

In his definition of permanent consumption, Friedman includes the consumption of services provided by durables. Purchases of durables are classified as a form of saving. In an economy similar to that in Exercise 8.7, the variance of transitory income is 0.5 that of permanent income, the propensity to consume nondurables out of permanent income is 0.6, and half of current saving (actual income minus expenditure on nondurables) takes the form of expenditure on durables. What would be the value of the multiplier derived from a naive regression of consumption on income, and what would be the true value?

Answer:

Let C be expenditure on nondurables, D be expenditure on durables, E = C + D be total consumer expenditure as conventionally measured, S be saving, Y be actual income, $Y^P$ be permanent income and $Y^T$ be transitory income. Then

$$Y = Y^P + Y^T$$

$$C = 0.6Y^P$$

$$D = 0.5S = 0.5(Y - C)$$

$$E = C + 0.5(Y - C) = 0.5C + 0.5Y = 0.3Y^P + 0.5Y = 0.8Y - 0.3Y^T$$

Multiplier from naive regression: In a conventional OLS regression of E on Y, the slope coefficient $b_2$ would be computed as

$$\begin{aligned}
b_2 &= \frac{\sum (Y_i - \bar{Y})(E_i - \bar{E})}{\sum (Y_i - \bar{Y})^2} = \frac{\sum (Y_i - \bar{Y})\left(0.8[Y_i - \bar{Y}] - 0.3[Y_i^T - \bar{Y}^T]\right)}{\sum (Y_i - \bar{Y})^2} \\
&= 0.8 - 0.3\,\frac{\frac{1}{n}\sum (Y_i - \bar{Y})(Y_i^T - \bar{Y}^T)}{\frac{1}{n}\sum (Y_i - \bar{Y})^2} \\
&= 0.8 - 0.3\,\frac{\frac{1}{n}\sum \left([Y_i^P - \bar{Y}^P] + [Y_i^T - \bar{Y}^T]\right)(Y_i^T - \bar{Y}^T)}{\frac{1}{n}\sum \left([Y_i^P - \bar{Y}^P] + [Y_i^T - \bar{Y}^T]\right)^2}
\end{aligned}$$

In large samples,

$$\begin{aligned}
\operatorname{plim} b_2 &= 0.8 - 0.3\,\frac{\operatorname{cov}(Y^T, Y^P) + \operatorname{cov}(Y^T, Y^T)}{\operatorname{var}(Y^P) + \operatorname{var}(Y^T) + 2\operatorname{cov}(Y^P, Y^T)} \\
&= 0.8 - 0.3\,\frac{\sigma_{Y^T}^2}{\sigma_{Y^P}^2 + \sigma_{Y^T}^2} \\
&= 0.8 - 0.3\,\frac{0.5\sigma_{Y^P}^2}{\sigma_{Y^P}^2 + 0.5\sigma_{Y^P}^2} \\
&= 0.8 - 0.1 = 0.7
\end{aligned}$$

where the second line uses the assumption that transitory income is uncorrelated with permanent income, so that $\operatorname{cov}(Y^P, Y^T) = 0$. Hence the multiplier derived from a conventional OLS regression would be $\frac{1}{1 - 0.7} = 3.33$.

True multiplier: The expression for E can be rewritten

$$E = 0.8Y - 0.3Y^T = 0.8Y^P + 0.5Y^T$$

Hence if the change in income is regarded as permanent, the multiplier will be $\frac{1}{1 - 0.8} = 5$, and if it is regarded as transitory, it will be $\frac{1}{1 - 0.5} = 2$. If, more realistically, it is regarded as partly permanent, partly transitory, the multiplier will correspondingly lie between these limits.

Why do we get a multiplier of 2 for transitory income, instead of 1? We are still assuming that transitory income does not give rise to consumption expenditure, defined narrowly, but it does give rise to consumer expenditure, defined broadly. Although it is still being assumed that all transitory income is saved, part of the saving is in the form of expenditure on durables, and this gives rise to the multiplier effect.
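The coefficients 0.8 and 0.5 and the resulting multipliers can be derived from the three parameters of the exercise. The sketch below is illustrative (the variable names are not from the text) and assumes, as the derivation does, that transitory and permanent income are uncorrelated.

```python
mpc_nondurables = 0.6   # propensity to consume nondurables out of permanent income
durables_share = 0.5    # share of saving spent on durables
variance_ratio = 0.5    # var(transitory income) / var(permanent income)

# E = C + D, with C = mpc * YP and D = share * (Y - C), written as aP * YP + aT * YT.
coef_permanent = mpc_nondurables + durables_share * (1.0 - mpc_nondurables)
coef_transitory = durables_share

# Large-sample slope of E on Y: coef_permanent minus the attenuation term.
plim_b2 = coef_permanent - (coef_permanent - coef_transitory) * variance_ratio / (1.0 + variance_ratio)

def multiplier(propensity):
    """Simple multiplier 1 / (1 - propensity)."""
    return 1.0 / (1.0 - propensity)

print(round(coef_permanent, 2), round(coef_transitory, 2), round(plim_b2, 2))
print(round(multiplier(plim_b2), 2),
      round(multiplier(coef_permanent), 2),
      round(multiplier(coef_transitory), 2))
```

The naive multiplier lies between the transitory and permanent values because the OLS slope is a variance-weighted average of the two coefficients.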

8.9

In Exercise 8.2, the amount of labor, L, employed by the firms is also a linear function of expected sales: $L = \delta_1 + \delta_2 X^e$. Explain how this relationship might be exploited by the investigator to counter the problem of measurement error bias.

Answer:

In view of the fact that L is a linear function of $X^e$, it will be highly correlated with X, and presumably it will be uncorrelated with u. Hence it can be used as an instrument for X. To demonstrate consistency,

$$b_2^{IV} = \frac{\sum (Y_i - \bar{Y})(L_i - \bar{L})}{\sum (X_i - \bar{X})(L_i - \bar{L})} = \frac{\frac{1}{n}\sum (Y_i - \bar{Y})(L_i - \bar{L})}{\frac{1}{n}\sum (X_i - \bar{X})(L_i - \bar{L})}.$$

Hence

$$\operatorname{plim} b_2^{IV} = \frac{\operatorname{cov}(Y, L)}{\operatorname{cov}(X, L)} = \frac{\operatorname{cov}([\beta_1 + \beta_2 X - \beta_2 u], L)}{\operatorname{cov}(X, L)} = \beta_2 + \frac{\operatorname{cov}(-\beta_2 u, L)}{\operatorname{cov}(X, L)} = \beta_2$$

since cov(u, L) = 0 and cov(X, L) is nonzero.
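The contrast between OLS and the IV estimator can be illustrated with simulated data. This is a sketch under assumed values that are not in the exercise: $\beta_1 = 10$, $\beta_2 = 0.5$, hypothetical labor parameters of 3.0 and 0.2, expected sales normal with mean 20 and standard deviation 2, and a sales shock with standard deviation 1.

```python
import random

def cov(a, b):
    """Sample covariance (mean of cross products of deviations)."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

def simulate_iv(n=100000, beta1=10.0, beta2=0.5, delta1=3.0, delta2=0.2, seed=4):
    """Return (OLS slope, IV slope) for the regression of Y on X with L as instrument."""
    rng = random.Random(seed)
    xe = [rng.gauss(20.0, 2.0) for _ in range(n)]      # expected sales (assumed distribution)
    x = [v + rng.gauss(0.0, 1.0) for v in xe]          # actual sales: X = Xe + u
    y = [beta1 + beta2 * v for v in xe]                # stocks
    labor = [delta1 + delta2 * v for v in xe]          # labor, linear in expected sales
    b2_ols = cov(x, y) / cov(x, x)
    b2_iv = cov(y, labor) / cov(x, labor)
    return b2_ols, b2_iv

b2_ols, b2_iv = simulate_iv()
print(round(b2_ols, 3), round(b2_iv, 3))
```

Under these assumptions the OLS slope should settle near 0.4, the attenuated value, while the IV slope should be close to the true value of 0.5.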


8.10

It is possible that the ASVABC test score is a poor measure of the kind of ability relevant for earnings. Accordingly, perform an OLS regression of the logarithm of hourly earnings on years of schooling, work experience, and ASVABC using your EAEF data set and an IV regression using SM, SF, SIBLINGS, and LIBRARY as instruments for ASVABC. Perform a Durbin-Wu-Hausman test to evaluate whether ASVABC appears to be subject to measurement error.

Answer:

The coefficient of ASVABC rises from 0.009 in the OLS regression to 0.025 in the IV regression with SM, SF, SIBLINGS, and LIBRARY used as instruments, the increase being consistent with the hypothesis of measurement error. However, ASVABC is not highly correlated with any of the instruments (the largest correlation, with SM, is 0.39) and the standard error of the coefficient rises from 0.003 in the OLS regression to 0.015 in the IV regression. The chi-squared statistic, 1.32, is low and there is no evidence that the change in the estimate is anything other than random.


. ivreg LGEARN S EXP MALE ETHBLACK ETHHISP (ASVABC=SM SF SIBLINGS LIBRARY)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  6,   533) =   49.22
       Model |  72.3086993     6  12.0514499           Prob > F      =  0.0000
    Residual |  141.701688   533  .265856826           R-squared     =  0.3379
-------------+------------------------------           Adj R-squared =  0.3304
       Total |  214.010387   539  .397050811           Root MSE      =  .51561

------------------------------------------------------------------------------
      LGEARN |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .0253515   .0149574     1.69   0.091    -.0040312    .0547341
           S |   .0745083   .0319962     2.33   0.020     .0116542    .1373624
         EXP |   .0231284   .0075958     3.04   0.002      .008207    .0380498
        MALE |   .2577989   .0522823     4.93   0.000     .1550943    .3605035
    ETHBLACK |   .0474775   .1656892     0.29   0.775    -.2780065    .3729615
     ETHHISP |   .0457423   .1327565     0.34   0.731    -.2150479    .3065325
       _cons |  -.0599028   .3136999    -0.19   0.849    -.6761426    .5563371
------------------------------------------------------------------------------
Instrumented:  ASVABC
Instruments:   S EXP MALE ETHBLACK ETHHISP SM SF SIBLINGS LIBRARY
------------------------------------------------------------------------------

. estimates store IV1

. reg LGEARN S EXP ASVABC MALE ETHBLACK ETHHISP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  6,   533) =   52.66
       Model |  79.6526724     6  13.2754454           Prob > F      =  0.0000
    Residual |  134.357715   533  .252078265           R-squared     =  0.3722
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  214.010387   539  .397050811           Root MSE      =  .50207

------------------------------------------------------------------------------
      LGEARN |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1088435   .0111413     9.77   0.000     .0869572    .1307298
         EXP |   .0295295   .0050282     5.87   0.000      .019652     .039407
      ASVABC |   .0085607   .0031108     2.75   0.006     .0024498    .0146717
        MALE |   .2866854   .0446382     6.42   0.000      .198997    .3743738
    ETHBLACK |  -.1204073   .0760945    -1.58   0.114    -.2698892    .0290745
     ETHHISP |  -.0506455   .1001965    -0.51   0.613     -.247474    .1461829
       _cons |   .2334226   .1775463     1.31   0.189    -.1153537     .582199
------------------------------------------------------------------------------

. estimates store OLS1

. hausman IV1 OLS1, constant

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      IV1          OLS1        Difference          S.E.
-------------+----------------------------------------------------------------
      ASVABC |    .0253515     .0085607        .0167907        .0146303
           S |    .0745083     .1088435       -.0343352        .0299938
         EXP |    .0231284     .0295295       -.0064011        .0056933
        MALE |    .2577989     .2866854       -.0288865        .0272189
    ETHBLACK |    .0474775    -.1204073        .1678848         .147182
     ETHHISP |    .0457423    -.0506455        .0963878        .0870917
       _cons |   -.0599028     .2334226       -.2933254        .2586212
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from ivreg
            B = inconsistent under Ha, efficient under Ho; obtained from regress


    Test:  Ho:  difference in coefficients not systematic

                  chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        1.32
                Prob>chi2 =      0.9880

. cor ASVABC SM SF SIBLINGS LIBRARY
(obs=540)

             |   ASVABC       SM       SF SIBLINGS  LIBRARY
-------------+---------------------------------------------
      ASVABC |   1.0000
          SM |   0.3931   1.0000
          SF |   0.3854   0.6236   1.0000
    SIBLINGS |  -0.1999  -0.2688  -0.2664   1.0000
     LIBRARY |   0.2663   0.3577   0.3256  -0.1504   1.0000
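As an informal arithmetic check on the output, the ASVABC row of the Hausman table can be reproduced from the coefficients and standard errors reported by the two regressions. The sketch below is not how Stata computes the reported statistic, which is a quadratic form in all seven coefficient differences; it only looks at a single coefficient, and the variable names are illustrative.

```python
import math

# ASVABC coefficient and standard error from the ivreg and regress output.
b_iv, se_iv = 0.0253515, 0.0149574
b_ols, se_ols = 0.0085607, 0.0031108

# Difference in the estimates and the square root of the difference in variances.
difference = b_iv - b_ols
se_difference = math.sqrt(se_iv ** 2 - se_ols ** 2)

# Squared ratio for this one coefficient (a single-coefficient contrast).
single_coefficient_statistic = (difference / se_difference) ** 2

print(round(difference, 7), round(se_difference, 7),
      round(single_coefficient_statistic, 2))
```

The difference and its standard error match the ASVABC row of the table, and the squared ratio comes out close to the reported chi-squared value of 1.32.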

8.11

What is the difference between an instrumental variable and a proxy variable (as described in Section 6.4)? When would you use one and when would you use the other?

Answer:

An instrumental variable estimator is used when one has data on an explanatory variable in the regression model but OLS would give inconsistent estimates because the explanatory variable is not distributed independently of the disturbance term. The instrumental variable partially replaces the original explanatory variable in the estimator and the estimator is consistent.

A proxy variable is used when one has no data on an explanatory variable in a regression model. The proxy variable is used as a straight substitute for the original variable. The interpretation of the regression coefficients will depend on the relationship between the proxy and the original variable, and the properties of the other estimators in the model and the tests and diagnostic statistics will depend on the degree of correlation between the proxy and the original variable.
