
Chapter 10 Instrumental Variables Regression

Multiple Choice

1)

Estimation of the IV regression model a. requires exact identification. b. allows only one endogenous regressor, which is typically correlated with the error term. c. requires exact identification or overidentification. d. is only possible if the number of instruments is the same as the number of regressors. Answer: c

2)

Two stage least squares is calculated as follows: in the first stage a. Y is regressed on the exogenous variables only. The predicted value of Y is then regressed on the instrumental variables. b. the unknown coefficients in the reduced form equation are estimated by OLS, and the predicted values are calculated. In the second stage, Y is regressed on these predicted values and the other exogenous variables. c. the exogenous variables are regressed on the instruments. The predicted value of the exogenous variables is then used in the second stage, together with the instruments, to predict the dependent variable. d. the unknown coefficients in the reduced form equation are estimated by weighted least squares, and the predicted values are calculated. In the second stage, Y is regressed on these predicted values and the other exogenous variables. Answer: b

3)

The conditions for valid instruments do not include the following: a. each instrument must be uncorrelated with the error term. b. each one of the instrumental variables must be normally distributed. c. at least one of the instruments must enter the population regression of X on the Zs and the Ws. d. perfect multicollinearity between the predicted endogenous variables and the exogenous variables must be ruled out.

Answer: b

4)

The IV regression assumptions include all of the following with the exception of a. the error terms must be normally distributed. b. $E(u_i \mid W_{1i}, \ldots, W_{ri}) = 0$. c. the Xs, Ws, Zs, and u all have nonzero, finite fourth moments. d. $(X_{1i}, \ldots, X_{ki}, W_{1i}, \ldots, W_{ri}, Z_{1i}, \ldots, Z_{mi}, Y_i)$ are i.i.d. draws from their joint distribution. Answer: a

5)

The rule-of-thumb for checking for weak instruments is as follows: for the case of a single endogenous regressor, a. a first stage F must be statistically significant to indicate a strong instrument. b. a first stage F > 1.96 indicates that the instruments are weak. c. the t-statistic on each of the instruments must exceed at least 1.64. d. a first stage F < 10 indicates that the instruments are weak. Answer: d

6)

The J-statistic a. tells you if the instruments are exogenous. b. provides you with a test of the hypothesis that the instruments are exogenous for the case of exact identification. c. is distributed $\chi^2_{m-k}$ where m-k is the degree of overidentification. d. is distributed $\chi^2_{m-k}$ where m-k is the number of instruments minus the number of regressors. Answer: c

7)

In the case of the simple regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, \ldots, n$, when X and u are correlated, then a. the OLS estimator is biased in small samples only. b. OLS and TSLS produce the same estimate. c. X is exogenous. d. the OLS estimator is inconsistent. Answer: d

8)

The following will not cause correlation between X and u in the simple regression model: a. simultaneous causality. b. omitted variables. c. irrelevance of the regressor. d. errors in variables. Answer: c

9)

The distinction between endogenous and exogenous variables is a. that exogenous variables are determined inside the model and endogenous variables are determined outside the model. b. dependent on the sample size: for n > 100, endogenous variables become exogenous. c. dependent on the distribution of the variables: when they are normally distributed, they are exogenous; otherwise they are endogenous. d. whether or not the variables are correlated with the error term. Answer: d

10)

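The two conditions for a valid instrument are a. $\mathrm{corr}(Z_i, X_i) = 0$ and $\mathrm{corr}(Z_i, u_i) \neq 0$. b. $\mathrm{corr}(Z_i, X_i) = 0$ and $\mathrm{corr}(Z_i, u_i) = 0$. c. $\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$. d. $\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) \neq 0$. Answer: c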

11)

Instrument relevance a. means that the instrument is one of the determinants of the dependent variable. b. is the same as instrument exogeneity. c. means that some of the variance in the regressor is related to variation in the instrument. d. is not possible since X and u are correlated and Z and u are not correlated. Answer: c

12)

Consider a competitive market where the demand and the supply depend on the current price of the good. Then fitting a line through the quantity-price outcomes will a. give you an estimate of the demand curve. b. estimate neither a demand curve nor a supply curve. c. enable you to calculate the price elasticity of supply. d. give you the exogenous part of the demand in the first stage of TSLS. Answer: b

13)

When there is a single instrument and single regressor, the TSLS estimator for the slope can be calculated as follows a. $\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}}$. b. $\hat{\beta}_1^{TSLS} = \frac{s_{XY}}{s_X^2}$. c. $\hat{\beta}_1^{TSLS} = \frac{s_{ZX}}{s_{ZY}}$. d. $\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_Z^2}$. Answer: a
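As a rough illustration of the ratio-of-covariances formula in question 13, the Python sketch below computes the single-instrument TSLS slope as s_ZY / s_ZX and sets it beside the OLS slope. The simulated data, the coefficient values, and the helper names `sample_cov` and `simulate` are assumptions made for this example; none of them come from the textbook.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=5000, beta0=1.0, beta1=2.0, seed=0):
    """Hypothetical data in which X is endogenous and Z is a valid instrument."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                # instrument, independent of u
        vi = rng.gauss(0, 1)                # shock shared by X and u
        ui = 0.8 * vi + rng.gauss(0, 0.6)   # error term, correlated with X
        xi = zi + vi                        # Z is relevant: corr(Z, X) != 0
        z.append(zi)
        x.append(xi)
        y.append(beta0 + beta1 * xi + ui)
    return z, x, y


z, x, y = simulate()
beta_ols = sample_cov(x, y) / sample_cov(x, x)    # s_XY / s_X^2
beta_tsls = sample_cov(z, y) / sample_cov(z, x)   # s_ZY / s_ZX
print(f"OLS slope:  {beta_ols:.3f}")
print(f"TSLS slope: {beta_tsls:.3f} (true slope in the simulation is 2.0)")
```

Under this assumed design, cov(X, u)/var(X) = 0.8/2, so the OLS slope should settle near 2.4 while the TSLS slope should stay close to 2.0.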

14)

The TSLS estimator is a. consistent and has a normal distribution in large samples. b. unbiased. c. efficient in small samples. d. F-distributed. Answer: a

15)

The reduced form equation for X a. regresses the endogenous variable X on the smallest possible subset of regressors. b. relates the endogenous variable X to all the available exogenous variables, both those included in the regression of interest and the instruments. c. uses the predicted values of X from the first stage as a regressor in the original equation. d. uses smaller standard errors, such as homoskedasticity-only standard errors, for inference. Answer: b

16)

When calculating the TSLS standard errors a. you do not have to worry about heteroskedasticity, since it was eliminated in the first stage. b. you can use the standard errors reported by OLS estimation of the second stage regression. c. the critical values from the standard normal table should be adjusted for the proper degrees of freedom. d. you should use heteroskedasticity-robust standard errors. Answer: d

17)

Having more relevant instruments a. is a problem because instead of being just identified, the regression now becomes overidentified. b. is like having a larger sample size in that the more information is available for use in the IV regressions. c. typically results in larger standard errors for the TSLS estimator. d. is not as important for inference as having the same number of endogenous variables as instruments. Answer: b

18)

Weak instruments are a problem because a. the TSLS estimator may not be normally distributed, even in large samples. b. they result in the instruments not being exogenous. c. the TSLS estimator cannot be computed. d. you cannot predict the endogenous variables any longer in the first stage. Answer: a

19)

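(Requires Appendix Material) The relationship between the TSLS slope and the corresponding population parameter is:

a. $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.

b. $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.

c. $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})^2}$.

d. $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.

Answer: a

20)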

If the instruments are not exogenous, a. you cannot perform the first stage of TSLS. b. then, in order to conduct proper inference, it is essential that you use heteroskedasticity-robust standard errors. c. your model becomes overidentified. d. then TSLS is inconsistent. Answer: d

21)

In the case of exact identification, a. you can use the J-statistic in a test of overidentifying restrictions. b. you cannot use TSLS for estimation purposes. c. you must rely on your personal knowledge of the empirical problem at hand to assess whether the instruments are exogenous. d. OLS and TSLS yield the same estimate. Answer: c

22)

To calculate the J-statistic you regress the a. squared values of the TSLS residuals on all exogenous variables and the instruments. The statistic is then the number of observations times the regression $R^2$. b. TSLS residuals on all exogenous variables and the instruments. You then multiply the homoskedasticity-only F-statistic from that regression by the number of instruments. c. OLS residuals from the reduced form on the instruments. The F-statistic from this regression is the J-statistic. d. TSLS residuals on all exogenous variables and the instruments. You then multiply the heteroskedasticity-robust F-statistic from that regression by the number of instruments. Answer: b

23)

(Requires Chapter 8) When using panel data and in the presence of endogenous regressors, a. the TSLS does not exist. b. you do not have to worry about the validity of instruments, since there are so many fixed effects. c. the OLS estimator is consistent. d. application of the TSLS estimator is straightforward if you use two time periods and difference the data. Answer: d

24)

In practice, the most difficult aspect of IV estimation is a. finding instruments that are both relevant and exogenous. b. that you have to use two stages in the estimation process. c. calculating the J-statistic. d. finding instruments that are exogenous. Relevant instruments are easy to find. Answer: a

25)

Consider a model with one endogenous regressor and two instruments. Then the J-statistic will be large a. if the number of observations is very large. b. if the coefficients are very different when estimating the coefficients using one instrument at a time. c. if the TSLS estimates are very different from the OLS estimates. d. when you use homoskedasticity-only standard errors. Answer: b

Essays

1)

Write a short essay about the overidentifying restrictions test. What is meant exactly by overidentification? State the null hypothesis. Describe how to calculate the J-statistic and what its distribution is. Use an example of two instruments and one endogenous variable to explain under what situation the test will be likely to reject the null hypothesis. What does this example tell you about the exactly identified case? If your variables pass the test, is this sufficient for these variables to be good instruments? Answer: The regression coefficients in the regression model with endogenous regressors can be either underidentified, exactly identified, or overidentified. If the number of instruments (m) equals the number of endogenous regressors (k), then the coefficients are exactly identified. If there are more instruments than the number of endogenous regressors, then the regression coefficients are overidentified. For the instrumental variable estimator to exist, there must be at least as many instruments as endogenous regressors ($m \geq k$). In the case of overidentification, the exogeneity of the instruments can be tested. Under the null hypothesis, all instruments are exogenous. Under the alternative hypothesis, at least one of the instruments is endogenous. Technically, the overidentifying restrictions test uses the TSLS residuals to see if these are correlated with the instruments. The residuals are regressed on the instruments and the included exogenous regressors. Under the null hypothesis, all coefficients other than the constant are zero. Since this is a case of joint hypothesis testing, the F-statistic is computed, and from it the J-statistic, where $J = mF$. In large samples the distribution of this statistic is $\chi^2_{m-k}$. Calculating the J-statistic amounts to comparing different IV estimates. In the case of two instruments and one endogenous regressor, where the degree of overidentification is one, two such estimates exist.
Due to sample variation, these estimates will differ, although they should be similar, or close to each other. If one or both of the instruments is not exogenous, then the estimates will not be similar, or the difference between the two will be sufficiently large so as not to be the result of pure sampling variation. In this situation the null hypothesis will be rejected. This procedure can only be executed when the coefficients are overidentified, since there is no comparison possible for the case of exactly identified coefficients. Passing the test is not sufficient for the instruments to be valid since, in addition to being exogenous, they must also be relevant, i.e. they must be correlated with the endogenous regressor.
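The idea that the test amounts to comparing IV estimates can be sketched numerically. The Python example below does not compute the J-statistic itself; it only shows, on simulated data, how the two single-instrument estimates stay close when both instruments are exogenous and drift apart when one is not. The data-generating process, the parameter `gamma` (a hypothetical direct effect of the second instrument on the error term), and the helper names are all assumptions for illustration.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def iv_slope(z, x, y):
    """Single-instrument IV estimate of the slope: s_ZY / s_ZX."""
    return sample_cov(z, y) / sample_cov(z, x)


def iv_gap(gamma, n=5000, beta1=2.0, seed=1):
    """Absolute gap between the IV estimates based on Z1 alone and Z2 alone.

    gamma = 0 means both instruments are exogenous; gamma != 0 makes Z2
    correlated with the error term (an assumed form of invalidity).
    """
    rng = random.Random(seed)
    z1, z2, x, y = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                            # instrument Z1
        b = rng.gauss(0, 1)                            # instrument Z2
        v = rng.gauss(0, 1)                            # shock shared by X and u
        u = 0.8 * v + rng.gauss(0, 0.6) + gamma * b    # error term
        xi = a + b + v                                 # both instruments relevant
        z1.append(a)
        z2.append(b)
        x.append(xi)
        y.append(beta1 * xi + u)
    return abs(iv_slope(z1, x, y) - iv_slope(z2, x, y))


gap_valid = iv_gap(gamma=0.0)
gap_invalid = iv_gap(gamma=1.0)
print(f"gap with two exogenous instruments: {gap_valid:.3f}")
print(f"gap when Z2 is not exogenous:       {gap_invalid:.3f}")
```

With only one instrument there is a single estimate and nothing to compare it with, which mirrors the point that the test is unavailable in the exactly identified case.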

2)

Using some of the examples from your textbook, describe econometric studies which required instrumental variable techniques. In each case emphasize why the need for instrumental variables arises and how authors have approached the problem. Make sure to include a discussion of overidentification, the validity of instruments, and testing procedures in your essay. Answer: The textbook mentions several studies which used instrumental variable estimation techniques, starting with Wright's problem to estimate demand and supply elasticities on animal and vegetable oils and fats. This is a case of simultaneous causality bias since the price and quantity in the market are determined by both the supply and demand for the commodity. Wright used the weather, which shifted the supply curve only and thereby traced out the demand curve. Since there was only a single instrument, the coefficients are exactly identified, and the validity of the instrument cannot be tested. Another example mentioned is the effect of class size on test scores. The reason for a correlation between class size and the error term potentially stems from omitted variable bias here, such as the quality of the teaching staff and outside opportunities for some of the students. In the hypothetical example of an earthquake, some schools may receive more students than usual depending on the closeness to the epicenter, if the school was unaffected structurally. The increase in class size is related to the closeness to the epicenter, but this distance should be uncorrelated with the ability of the teaching staff and the outside opportunities. As in the previous study, there is only a single instrument and hence no possibility to use the overidentification test. The primary example of instrumental variable estimation in the chapter involves estimation of the demand elasticity for cigarettes.
Due to simultaneity bias for the demand equation, sales taxes are used as an instrument first in a cross section of states in a single year and later in a panel. Prices and quantities are determined simultaneously by supply and demand, and as a result, prices will be correlated with the error term in the demand equation. Sales taxes are fairly highly correlated with prices, explaining almost half of the variation in these. It is argued that due to differences in choices about public finance due to political considerations across states, these are exogenous. Only one instrument is used in the cross-section and hence there is no degree of overidentification. Later another instrument is introduced, cigarette-specific taxes. With two instruments and one endogenous regressor, the J-statistic can be computed for the overidentifying restrictions test.


Other examples discussed in the textbook include the effect of an increase in the prison population on crime rates, further discussion of class size and test scores, and aggressive treatment of heart attacks and the potential for saving lives.

3)

Describe the consequences of estimating an equation by OLS in the presence of an endogenous regressor. How can you overcome these obstacles? Present an alternative estimator and state its properties. Answer: In the case of an endogenous regressor, there is correlation between the variable and the error term. In this case, the OLS estimator is inconsistent. To get a consistent estimator in this situation, instrumental variable techniques, such as TSLS, should be used. If one or more valid instruments can be found, meaning that the instrument must be relevant and exogenous, then a consistent estimator can be derived. The relevance of instruments can be tested using the rule of thumb (a first-stage F-statistic of more than 10 in the TSLS estimator). The exogeneity of the instruments can be tested using the J-statistic. The test requires that there is at least one more instrument than endogenous regressors, i.e., that the equation is overidentified. In large samples the sampling distribution of the TSLS estimator is approximately normal, so that statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic. However, inference based on these statistics will be misleading in the case where instruments are not valid.

4)

Write an essay about where valid instruments come from. Part of your answer must deal with checking the validity of instruments and what the consequences of weak instruments are. Answer: In order for instruments to be valid, they have to be relevant and exogenous. To find valid instruments, two approaches are typically used. First, economic theory can serve as a guide.
In the case of simultaneous causality in a market, for example, theory predicts shifts in one curve but not the other as a result of changes in an instrumental variable. The second approach focuses on shifts in the endogenous regressor that are caused by an exogenous source of variation resulting from a random phenomenon. The textbook uses the example of an earthquake which changes student-teacher ratios as students in affected areas have to be redistributed.


To check the validity of instruments, there is the rule of thumb to determine whether or not an instrument is weak. It states that the F-statistic in the first stage of the TSLS procedure should exceed 10. Instrument exogeneity can be tested only in the case of overidentification. If there are more instruments than endogenous regressors, then the J-statistic can be calculated. The null hypothesis of exogeneity will be rejected, in essence, if the TSLS residuals are correlated with the instruments. If instruments are weak, then the TSLS estimator is biased and statistical inference does not yield reliable confidence intervals even in large samples.
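The rule of thumb can be illustrated with a small Python sketch. It computes a first-stage F-statistic for a single instrument and compares it with 10. Two simplifications are assumed here and are not taken from the textbook: the F-statistic is the homoskedasticity-only version (the squared t-statistic on the instrument), and the first stage contains no exogenous regressors other than the constant. The simulated data and the names `first_stage_f` and `simulate_first_stage` are likewise hypothetical.

```python
import random


def first_stage_f(z, x):
    """Homoskedasticity-only F for H0: slope = 0 in a regression of X on Z.

    With one instrument, this F equals the squared t-statistic on Z.
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_x = sum(x) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    slope = s_zx / s_zz
    intercept = mean_x - slope * mean_z
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    se_slope = (ssr / (n - 2) / s_zz) ** 0.5
    return (slope / se_slope) ** 2


def simulate_first_stage(pi1, n=200, seed=2):
    """Hypothetical first stage X = pi1 * Z + noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi1 * zi + rng.gauss(0, 1) for zi in z]
    return z, x


f_strong = first_stage_f(*simulate_first_stage(pi1=1.0))
f_weak = first_stage_f(*simulate_first_stage(pi1=0.01))
for label, f_stat in (("pi1 = 1.00", f_strong), ("pi1 = 0.01", f_weak)):
    verdict = "weak by the rule of thumb" if f_stat < 10 else "not flagged as weak"
    print(f"{label}: first-stage F = {f_stat:.2f} ({verdict})")
```

In applied work a heteroskedasticity-robust F-statistic would often be reported instead; the simpler formula is used here only to keep the sketch short.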

5)

You have estimated a government reaction function, i.e., a multiple regression equation, where a government instrument, say the federal funds rate, depends on past government target variables, such as inflation and unemployment rates. In addition, you added the previous period's popularity deficit of the government, e.g. the approval rating of the president minus 50%, as one of the regressors. Your idea is that the Federal Reserve, although formally independent, will try to expand the economy if the president is unpopular. One of your peers, a political science student, points out that approval ratings depend on the state of the economy and thereby indirectly on government instruments. The approval rating is therefore endogenous and should be estimated along with the reaction function. Initially you want to reply by using a phrase that includes the words "money neutrality" but are worried about a lengthy debate. Instead you state that as an economist, you are not concerned about government approval ratings, and that government approval ratings are determined outside your (the economic) model. Does your whim make the regressor exogenous? Why or why not? Answer: In general, the question of whether or not a variable is endogenous or exogenous depends on its correlation with the error term, not on the size of the underlying model. The point to make is that just because a variable is endogenous does not imply that its determinants have to be modeled. If the purpose of the exercise is to eventually simulate the model for policy purposes, then the feedback envisioned by the political science student is potentially important. However, if the aim is simply to forecast the behavior of the government reaction function, then the issue of endogeneity or exogeneity is only relevant for questions regarding the type of estimator to be used. Of course, if a regressor is endogenous, then instrumental variable techniques must be used to ensure desirable properties of the estimator.


Mathematical and Graphical Problems

1)

To analyze the year-to-year variation in temperature data for a given city, you regress the daily high temperature (Temp) for 100 randomly selected days in two consecutive years (1997 and 1998) for Phoenix. The results are (heteroskedasticity-robust standard errors in parentheses):

$\widehat{Temp}^{PHX}_{1998} = 15.63 + 0.80 \, Temp^{PHX}_{1997}$; $R^2 = 0.65$, $SER = 9.63$
(standard errors: 6.46 for the intercept, 0.10 for the slope)

(a)

Calculate the predicted temperature for the current year if the temperature in the previous year was 40°F, 78°F, and 100°F. How does this compare with your prior expectation? Sketch the regression line and compare it to the 45 degree line. What are the implications? Answer: The three predicted temperatures will be 47.6, 78.0, and 95.6 respectively. The initial expectation should be that the temperature in 1998 is the same as in 1997 for a given date. The regression line and the 45 degree line are sketched in the accompanying figure. The implication is mean reversion: if the temperature was low (40 degrees), then it will also be low the following year, but not as low. Alternatively, if the temperature was high (100), then it will be high again, but not as high. If this prediction is extrapolated into the future, then eventually all temperatures should be the same for all days. This obviously does not make sense.

[Figure: 1997 and 1998 Temperature. Horizontal axis: Degrees in 1997. Vertical axis: Degrees in 1998. Series: Predicted Temp 1998 and the 45 degree line.]
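The predictions in part (a) follow directly from the estimated OLS equation and can be checked with a few lines of Python. The function name `predict_temp_1998` is made up for this sketch, and the crossing point with the 45 degree line is derived here from the reported coefficients; it is not stated in the original answer.

```python
def predict_temp_1998(temp_1997, intercept=15.63, slope=0.80):
    """Predicted 1998 temperature from the reported OLS coefficients."""
    return intercept + slope * temp_1997


# Predictions for the three temperatures asked about in part (a).
predictions = {t: round(predict_temp_1998(t), 1) for t in (40, 78, 100)}

# Temperature at which the regression line crosses the 45 degree line,
# i.e. where the prediction equals the previous year's value.
crossing = 15.63 / (1 - 0.80)

for temp, pred in predictions.items():
    direction = "above" if pred > temp else "below"
    print(f"1997: {temp}F -> predicted 1998: {pred}F ({direction} the 45 degree line)")
print(f"regression line crosses the 45 degree line near {crossing:.2f}F")
```

Inputs below the crossing point are predicted to rise and inputs above it are predicted to fall, which is the mean reversion described in the answer.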


(b)

You recall having studied errors-in-variables before. Although the Web site you received your data from seems quite reliable in measuring data accurately, what if the temperature contained measurement error in the following sense: for any given day, say January 28, there is a true underlying seasonal temperature ($X$), but each year there are different temporary weather patterns ($v$, $w$) which result in a temperature $\tilde{X}$ different from $X$. For the two years in your data set, the situation can be described as follows:

$\tilde{X}_{1997} = X + v_{1997}$ and $\tilde{X}_{1998} = X + w_{1998}$

Subtracting $\tilde{X}_{1997}$ from $\tilde{X}_{1998}$, you get $\tilde{X}_{1998} = \tilde{X}_{1997} + w_{1998} - v_{1997}$. Hence the population parameters for the intercept and slope are zero and one, as expected. It is not difficult to show that the OLS estimator for the slope is inconsistent, where

$\hat{\beta}_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_x^2 + \sigma_v^2}$

As a result you consider estimating the slope and intercept by TSLS. You think about an instrument and consider the temperature one month ahead of the observation in the previous year. Discuss instrument validity for this case. Answer: For an instrument to be valid, two conditions have to hold. First, the instrument has to be relevant, and second, the instrument has to be exogenous. If temperatures one month ahead can predict the current temperature, as they certainly do in Phoenix, then the instrument is relevant, or correlated with the current month's temperature. If in addition, whatever caused the temperature in the current month to deviate from its long-term value is only a temporary phenomenon, such as a weather system created by a storm in the Pacific, then next month's temperature should not be correlated with this event. Hence the instrument would be exogenous.

(c)

The TSLS estimation result is as follows:

$\widehat{Temp}^{PHX}_{1998} = -6.24 + 1.07 \, Temp^{PHX}_{1997}$
(standard errors: 5.0 for the intercept, 0.06 for the slope)

Perform a t-test on whether or not the slope is now significantly different from 1. Answer: The t-statistic is (1.07 - 1)/0.06 = 1.17, and hence you cannot reject the null hypothesis that the slope equals 1.
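The arithmetic behind the t-test in part (c) can be written out as a short Python check. The helper `t_statistic` is a name chosen for this sketch, and the 1.96 cutoff assumes a two-sided test at the 5% level with the large-sample normal approximation.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t-statistic for H0: coefficient = hypothesized value."""
    return (estimate - hypothesized) / std_error


# TSLS slope and its standard error from part (c); H0: slope = 1.
t = t_statistic(estimate=1.07, hypothesized=1.0, std_error=0.06)
reject_at_5pct = abs(t) > 1.96   # two-sided, large-sample normal critical value

print(f"t = {t:.2f}, reject H0 at the 5% level: {reject_at_5pct}")
```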

2)

Consider the following population regression model relating the dependent variable $Y_i$ and regressor $X_i$,

$Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, \ldots, n$
$X_i = Y_i + Z_i$

where Z is a valid instrument for X.

(a)

Explain why you should not use OLS to estimate $\beta_1$. Answer: Substitution of the first equation into the identity shows that X is correlated with the error term. Hence estimation with OLS results in an inconsistent estimator.

(b)

To generate a consistent estimator for $\beta_1$, what should you do? Answer: The instrumental variable estimator is consistent and in this case is

$\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}}$.

Adventurous students will derive this estimator along the lines shown in Appendix 10.2.

(c)

The two equations above make up a system of equations in two unknowns. Specify the two reduced form equations in terms of the original coefficients. (Hint: Substitute the identity into the first equation and solve for Y. Similarly, substitute Y into the identity and solve for X.) Answer:

$Y_i = \beta_0 + \beta_1 (Y_i + Z_i) + u_i$
$X_i = (\beta_0 + \beta_1 X_i + u_i) + Z_i$

or

$(1 - \beta_1) Y_i = \beta_0 + \beta_1 Z_i + u_i$
$(1 - \beta_1) X_i = \beta_0 + Z_i + u_i$

Hence

$Y_i = \pi_0 + \pi_2 Z_i + v_{1i}$
$X_i = \pi_3 + \pi_4 Z_i + v_{2i}$

where $\pi_0 = \pi_3 = \frac{\beta_0}{1 - \beta_1}$, $\pi_2 = \frac{\beta_1}{1 - \beta_1}$, $\pi_4 = \frac{1}{1 - \beta_1}$, and $v_{1i} = v_{2i} = \frac{u_i}{1 - \beta_1}$.


(d)

Do the two reduced form equations satisfy the OLS assumptions? If so, can you find consistent estimators of the two slopes? What is the ratio of the two estimated slopes? This estimator is called indirect least squares. How does it compare to the TSLS in this example? Answer: Since Z is a valid instrument by assumption, it must be uncorrelated with the error term, and hence using OLS results in a consistent estimator. The ratio of the two estimated slopes is

$\frac{\hat{\pi}_2}{\hat{\pi}_4} = \frac{s_{YZ} / s_{ZZ}}{s_{XZ} / s_{ZZ}} = \frac{s_{YZ}}{s_{XZ}}$,

which is identical to the TSLS estimator.

3)

Here are some examples of the instrumental variables regression model. In each case you are given the number of instruments and the J-statistic. Find the relevant value from the $\chi^2_{m-k}$ distribution, using a 1% and 5% significance level, and make a decision whether or not to reject the null hypothesis.

(a)

$Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}$ are valid instruments, J = 2.58.

Answer: The test statistic is distributed $\chi^2_1$ and the critical values are 6.63 and 3.84 at the 1% and 5% significance level. Hence you cannot reject the null hypothesis that all the instruments are exogenous.

(b)

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 W_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments, J = 9.63.

Answer: The test statistic is distributed $\chi^2_2$ and the critical values are 9.21 and 5.99 at the 1% and 5% significance level. Hence you can reject the null hypothesis that all the instruments are exogenous.

(c)

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \beta_3 W_{2i} + \beta_4 W_{3i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments, J = 11.86.

Answer: The test statistic is distributed $\chi^2_3$ and the critical values are 11.34 and 7.81 at the 1% and 5% significance level. Hence you can reject the null hypothesis that all the instruments are exogenous.
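The three decisions above follow one pattern: the degrees of freedom are the number of instruments minus the number of endogenous regressors, and J is compared with the matching critical value. A minimal Python sketch of that logic follows. The critical values are the ones quoted in the answers; the function name `j_test` and the lookup table `CRITICAL_VALUES` are assumptions for illustration.

```python
# Chi-squared critical values quoted in the answers: df -> (5% level, 1% level).
CRITICAL_VALUES = {1: (3.84, 6.63), 2: (5.99, 9.21), 3: (7.81, 11.34)}


def j_test(j_stat, n_instruments, n_endogenous):
    """Return (df, reject at 5%, reject at 1%) for the overidentification test."""
    df = n_instruments - n_endogenous   # degree of overidentification, m - k
    if df <= 0:
        raise ValueError("the J-test requires overidentification (m > k)")
    cv_5, cv_1 = CRITICAL_VALUES[df]
    return df, j_stat > cv_5, j_stat > cv_1


results = {
    "a": j_test(2.58, n_instruments=2, n_endogenous=1),
    "b": j_test(9.63, n_instruments=4, n_endogenous=2),
    "c": j_test(11.86, n_instruments=4, n_endogenous=1),
}
for part, (df, reject_5, reject_1) in results.items():
    print(f"({part}) df = {df}, reject at 5%: {reject_5}, reject at 1%: {reject_1}")
```

Only the endogenous regressors count toward k; the included exogenous Ws in (b) and (c) do not change the degrees of freedom.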


4)

To study the determinants of growth among the countries of the world, researchers have used panels of countries and observations spanning long periods of time (e.g. 1965-1975, 1975-1985, 1985-1990). Some of these studies have focused on the effect that inflation has on growth, and found that although the effect is small for a given time period, it accumulates over time and therefore has an important negative effect.

(a)

Explain why the OLS estimator may be biased in this case. Answer: The presence of simultaneous causality is highly likely since inflation may respond to growth. Depending on the list of regressors, omitted variables can also bias the estimator for the effect of the inflation rate.

(b)

Explain how methods using panel data could potentially alleviate the problem. Answer: Country fixed effects or differencing the data can solve the problem if inflation stays relatively constant over time from one country to the other. Unfortunately if the effect of inflation on growth is the focus of the study, then much of the cross-sectional information is lost using this approach.

(c)

Some authors have suggested using an index of central bank independence as an instrument. Discuss whether or not such an index would be a valid instrument. Answer: For this index to be valid, central bank independence has to be relevant and exogenous. If inflation rates are correlated with the index, then central bank independence is a relevant instrument. Although there is a high correlation for developed countries, there is little to no correlation when data for all countries are considered. Whether or not the index is exogenous cannot be tested unless the coefficients of the equation are overidentified. Otherwise personal judgment is the only guide. An argument that central bank independence is exogenous would have to rely on it being based on institutional arrangements which are independent of inflation. Although the independence of central banks in many countries was initially determined by concerns independent of inflation, there have been many situations where the institutional arrangements were altered as a result of high inflation.

5)

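(Requires Matrix Algebra) The population multiple regression model can be written in matrix form as

$Y = X\beta + U$

where

$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$, $U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$, $X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} & W_{11} & \cdots & W_{r1} \\ 1 & X_{12} & \cdots & X_{k2} & W_{12} & \cdots & W_{r2} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} & W_{1n} & \cdots & W_{rn} \end{pmatrix}$, and $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}$.

Note that the X matrix contains both k endogenous regressors and (r + 1) included exogenous regressors (the constant is obviously exogenous). The instrumental variable estimator for the overidentified case is

$\hat{\beta}^{IV} = [X'Z(Z'Z)^{-1}Z'X]^{-1} X'Z(Z'Z)^{-1}Z'Y$,

where Z is a matrix, which contains two types of variables: first the r included exogenous regressors plus the constant, and second, m instrumental variables.

$Z = \begin{pmatrix} 1 & Z_{11} & \cdots & Z_{m1} & W_{11} & \cdots & W_{r1} \\ 1 & Z_{12} & \cdots & Z_{m2} & W_{12} & \cdots & W_{r2} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & Z_{1n} & \cdots & Z_{mn} & W_{1n} & \cdots & W_{rn} \end{pmatrix}$

It is of order $n \times (m + r + 1)$.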

For this estimator to exist, both $(Z'Z)$ and $[X'Z(Z'Z)^{-1}Z'X]$ must be invertible. State the conditions under which this will be the case and relate them to the degree of overidentification. Answer: In order for a matrix to be invertible, it must have full rank. Since $Z'Z$ is of order $(m + r + 1) \times (m + r + 1)$, then in order to invert $Z'Z$, it must have rank $(m + r + 1)$. In the case of a product such as $Z'Z$, the rank is less than or equal to the rank of $Z'$ or $Z$, whichever is smaller. $Z$ is of order $n \times (m + r + 1)$, and assuming that there is no perfect multicollinearity, will have either rank $n$ or rank $(m + r + 1)$, whichever is the smaller of the two. Hence if there are fewer observations than the number of instrumental variables plus exogenous variables, then the rank of $Z$ will be $n$ $(< m + r + 1)$, and the rank of $Z'Z$ is also $n$ $(< m + r + 1)$. Hence $Z'Z$ does not have full rank, and therefore cannot be inverted. The IV estimator does not exist as a result. In the past, this was considered a strong possibility with large econometric models, where many predetermined variables entered. If there are more observations than instruments, then the rank of $Z'Z$ is $(m + r + 1)$. $X'Z$ will be of order $(k + r + 1) \times (m + r + 1)$, which will have rank $(k + r + 1)$ if m > k, i.e. if there is overidentification. Furthermore $[X'Z(Z'Z)^{-1}Z'X]$ is of order $(k + r + 1) \times (k + r + 1)$ and will have full rank since the rank of a product of the three matrices involved is at most the rank of the minimum of the three matrices $X'Z$, $Z'Z$, and $Z'X$.

6)

Consider the following model of demand and supply of coffee:

Demand: $Q_i^{Coffee} = \beta_1 P_i^{Coffee} + \beta_2 P_i^{Tea} + u_i$
Supply: $Q_i^{Coffee} = \beta_3 P_i^{Coffee} + \beta_4 P_i^{Tea} + \beta_5 Weather + v_i$

(Variables are measured in deviations from means, so that the constant is omitted.) What are the expected signs of the various coefficients in this model? Assume that the price of tea and Weather are exogenous variables. Are the coefficients in the supply equation identified? Are the coefficients in the demand equation identified? Are they overidentified? Is this result surprising given that there are more exogenous regressors in the second equation? Answer: Changes in Weather will shift the supply equation and thereby trace out the demand equation. Hence the coefficients of the demand equation are exactly identified since the number of instruments equals the number of endogenous regressors. However, the coefficients of the supply equation are underidentified since there is no instrumental variable available for estimation. The result is not surprising, since it is not the number of exogenous regressors in the equation that matters when determining whether or not the coefficients are identified. Instead what matters is the number of instruments available relative to the number of endogenous regressors. It is possible that the regression coefficients can be (over)identified even if there are no exogenous regressors present in the equation.

7)

You started your econometrics course by studying the OLS estimator extensively, first for the simple regression case and then for extensions of it. You have now learned about the instrumental variable estimator. Under what situation would you prefer one to the other? Be specific in explaining under which situations one estimation method generates superior results.

Answer: Under the OLS assumptions, the OLS estimator is unbiased and consistent. The sampling distribution of the estimator is approximately normal in large samples. Hence statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic.

One major concern throughout the text has been the development of new estimation techniques for the case where one of the OLS assumptions is violated, specifically where there is correlation between the error term and at least one of the regressors. This may be the result of omitted variables, errors-in-variables, or simultaneous causality. These make up three of the threats to internal validity. In each of these cases, OLS becomes biased and inconsistent, and an alternative estimator should be used.

Even if the OLS assumptions are violated and the OLS estimator is biased because of omitted variables, simultaneous causality, or errors-in-variables, using TSLS will not improve the situation if the instruments are not valid. TSLS is inconsistent if the instruments are not exogenous. If the instruments are weak, TSLS is biased, statistical inference is not valid, and the estimator is not even approximately normally distributed in large samples.

If the instruments are valid and the other IV regression assumptions hold, then the TSLS estimator is consistent and therefore preferable to the OLS estimator. Although its distribution is complicated in small samples, the sampling distribution of the estimator is approximately normal in large samples. Hence statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic.

8) Your textbook gave an example of attempting to estimate the demand for a good in a market, but being unable to do so because the demand function was not identified. Is this the case for every market? Consider, for example, the demand for sports events. One of your peers estimated the following demand function after collecting data over two years for every one of the 162 home games of the 2000 and 2001 seasons for the Los Angeles Dodgers.

$$\widehat{Attend} = \underset{(8{,}770)}{15{,}005} + \underset{(121)}{201}\,Temperat + \underset{(169)}{465}\,DodgNetWin + \underset{(26)}{82}\,OppNetWin + \underset{(1{,}505)}{9{,}647}\,DFSaSu + \underset{(3{,}355)}{1{,}328}\,Drain + \underset{(1{,}819)}{1{,}609}\,D150m + \underset{(1{,}184)}{271}\,DDiv - \underset{(1{,}143)}{978}\,D2001;$$

$R^2 = 0.416$, SER = 6,983

Attend is announced stadium attendance, Temperat is the average temperature on game day, DodgNetWin are the net wins of the Dodgers before the game (wins-losses), OppNetWin is the opposing team's net wins at the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables, taking a value of 1 if the game was played on a weekend, it rained during that day, the opposing team was within a 150-mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during 2001, respectively. Numbers in parentheses are heteroskedasticity-robust standard errors. Even if there is no identification problem, is it likely that all regressors are uncorrelated with the error term? If not, what are the consequences?

Answer: In the case of sports events, price and quantity are often not simultaneously determined by supply and demand. For baseball games, the supply of seats is fixed at the capacity level of the stadium. In addition, prices for games are fixed in advance and do not vary with the attractiveness of the opponent. Therefore the supply curve is infinitely elastic up to the point where the game is sold out. This situation is complicated by ticket scalping and the fact that teams stage special events (fireworks, etc.). Taking these considerations into account may result in simultaneous causality bias, a threat to internal validity because of the identification problem. However, even assuming that there is no identification problem, there may still be omitted variable bias or errors-in-variables bias. For example, attendance typically increases the tighter the race for a play-off spot towards the end of the season. Furthermore, it is not the opposing team's net wins at the end of the previous season that account for the attractiveness of the opponent, but its performance during the current season.
If the opposing team's current performance is related to its performance in the previous season, then the OLS estimator is biased.

9) Earnings functions, whereby the log of earnings is regressed on years of education, years of on-the-job training, and individual characteristics, have been studied for a variety of reasons. Some studies have focused on the returns to education, others on discrimination, union and non-union differentials, etc. For all these studies, a major concern has been the fact that ability should enter as a determinant of earnings, but that it is close to impossible to measure and therefore represents an omitted variable. Assume that the coefficient on years of education is the parameter of interest. Given that education is positively correlated with ability, since, for example, more able students attract scholarships and hence receive more years of education, the OLS estimator for the returns to education could be upward-biased. To overcome this problem, various authors have used instrumental variables estimation techniques. For each of the potential instruments listed below, briefly discuss instrument validity.

(a) The individual's postal zip code.

Answer: Instrument validity has two components, instrument relevance ($\mathrm{corr}(Z_i, X_i) \neq 0$) and instrument exogeneity ($\mathrm{corr}(Z_i, u_i) = 0$). The individual's postal zip code will be largely uncorrelated with the omitted variable, ability, although some zip codes may attract more able individuals. However, this is an example of a weak instrument, since it is also essentially uncorrelated with years of education.

(b) The individual's IQ or test score on a work-related exam.

Answer: There is instrument relevance in this case, since, on average, individuals who do well on intelligence tests or other work-related tests will have more years of education. Unfortunately, there is bound to be a high correlation with the omitted variable, ability, since this is what these tests are supposed to measure.

(c) Years of education for the individual's mother or father.

Answer: A non-zero correlation between the mother's or father's years of education and the individual's years of education can be expected. Hence this is a relevant instrument. However, it is not clear that the parents' years of education are uncorrelated with the parents' ability, which in turn can be a major determinant of the individual's ability. If this is the case, then years of education of the mother or father is not a valid instrument.

(d) Number of siblings the individual has.

Answer: There is some evidence that the larger the number of siblings of an individual, the fewer years of education the individual receives. Hence number of siblings is a relevant instrument. It has been argued that number of siblings is uncorrelated with an individual's ability. In that case it also represents an exogenous instrument.
However, there is the possibility that ability depends on the attention an individual receives from parents, and this attention is shared with other siblings.

10) The two conditions for instrument validity are $\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$. The reason for the inconsistency of OLS is that $\mathrm{corr}(X_i, u_i) \neq 0$. But if X and Z are correlated, and X and u are also correlated, then how can Z and u not be correlated? Explain.

Answer: The introduction to Chapter 10 on instrumental variables regression and Section 10.1 explain this problem at length. The main idea is that the variation in X has two parts: one that is uncorrelated with u and a second that is correlated with u. Correlation is not transitive, so Z can be correlated with X through the first part alone without being correlated with u. The trick is to isolate the uncorrelated part of X. For the instrument to be valid, $\mathrm{corr}(Z_i, u_i) = 0$ and $\mathrm{corr}(Z_i, X_i) \neq 0$ must hold. TSLS then generates predicted values of X in the first stage by using a linear combination of the instruments. As long as $\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$, the part of X which is uncorrelated with the error term is extracted through the prediction. In the second stage, this captured exogenous variation in X is then used to estimate the effect of X on Y.
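The decomposition described in this answer can be made concrete with a small simulation. The Python sketch below is an illustration under assumed values, not an example from the textbook: the data-generating process, the coefficient of 2.0, the single instrument, and the function names are all hypothetical. It shows Z correlated with X, X correlated with u, and Z nonetheless uncorrelated with u, and then runs the two TSLS stages by hand.

```python
import random

def simulate(n, beta=2.0, seed=0):
    """Draw (z, x, u, y) where x is endogenous because it shares the shock u."""
    rng = random.Random(seed)
    z, x, u, y = [], [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                    # instrument, independent of u
        ui = rng.gauss(0, 1)                    # structural error
        ei = 0.8 * ui + 0.6 * rng.gauss(0, 1)   # part of x that moves with u
        xi = 1.0 * zi + ei                      # x = exogenous part + endogenous part
        z.append(zi)
        x.append(xi)
        u.append(ui)
        y.append(beta * xi + ui)
    return z, x, u, y

def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

def slope(x, y):
    """No-intercept OLS slope; the simulated variables have population mean zero."""
    return sum(p * q for p, q in zip(x, y)) / sum(p * p for p in x)

def tsls(z, x, y):
    pi_hat = slope(z, x)                  # first stage: regress x on z
    x_hat = [pi_hat * zi for zi in z]     # predicted values of x
    return slope(x_hat, y), x_hat         # second stage: regress y on x_hat

z, x, u, y = simulate(20000, seed=1)
beta_tsls, x_hat = tsls(z, x, y)
print("corr(Z, X):    ", round(corr(z, x), 3))
print("corr(X, u):    ", round(corr(x, u), 3))
print("corr(Z, u):    ", round(corr(z, u), 3))
print("corr(X_hat, u):", round(corr(x_hat, u), 3))
print("OLS slope:     ", round(slope(x, y), 3))
print("TSLS slope:    ", round(beta_tsls, 3))
```

Because the predicted values are built from Z alone, they inherit Z's lack of correlation with u, which is why the second-stage slope should be close to the assumed coefficient while the OLS slope is not.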
