
Summary Chapter 2

Estimation of a two-variable relationship
Definition and derivation of the OLS estimator $\hat\beta$
$E(\hat\beta)$ and $\mathrm{Var}(\hat\beta)$
Properties of $\hat\beta$:
BLUE: Best Linear Unbiased Estimator
Among all linear unbiased estimators, the OLS estimator has the smallest variance

This result (the Gauss-Markov theorem) relies on a set of assumptions.
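As a reminder of the Chapter 2 estimator, the two-variable OLS formulas are $\hat\beta_1 = \sum_i (X_i-\bar X)(Y_i-\bar Y) / \sum_i (X_i-\bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. The minimal Python sketch below computes them directly; the function name `ols_simple` and the toy data are illustrative assumptions, not part of the course material.

```python
def ols_simple(x, y):
    """Two-variable OLS: returns (b0, b1) for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations of (x, y) over sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data lying exactly on y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)  # 1.0 2.0
```

Because the toy points lie exactly on a line, OLS recovers the intercept and slope with no residual.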


Econ 321-Stéphanie Lluis

Multiple Regression Chapter 3 Part I


Econ 321 Introduction to Econometrics


Outline
Multiple regression
Interpretation of the results
Specification:
Omitting variables (omitted variable bias)
Adding irrelevant variables


Moving to Multiple Regression


In a bivariate analysis, we assume that only X matters for Y:
all other factors (potentially affecting Y) are in the error term and are assumed to be uncorrelated with X.

In a multiple regression, we take into account the fact that multiple factors simultaneously affect Y.
We can conduct a ceteris paribus analysis: we can control for other factors. This is crucial given that we deal with non-experimental data.
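One way to see what "controlling for" X2 means is the partialling-out result (often called the Frisch-Waugh-Lovell theorem): the multiple-regression coefficient on X1 equals the slope from regressing Y on the part of X1 that X2 does not explain. The pure-Python sketch below checks this on simulated data; the helper names, the data-generating process and the seed are hypothetical choices made for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def ols_two_regressors(y, x1, x2):
    """OLS slopes (b1, b2) for y = b0 + b1*x1 + b2*x2 + u, via the normal equations."""
    s11, s22, s12 = dev_cross(x1, x1), dev_cross(x2, x2), dev_cross(x1, x2)
    s1y, s2y = dev_cross(x1, y), dev_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

def partialled_out_slope(y, x1, x2):
    """Regress x1 on x2, keep the residual, then regress y on that residual."""
    g1 = dev_cross(x1, x2) / dev_cross(x2, x2)
    g0 = mean(x1) - g1 * mean(x2)
    r1 = [a - (g0 + g1 * b) for a, b in zip(x1, x2)]  # part of x1 not explained by x2
    return sum(r * yi for r, yi in zip(r1, y)) / sum(r * r for r in r1)

# Hypothetical simulated data in which x1 and x2 are correlated
rng = random.Random(321)
x2 = [rng.gauss(0, 1) for _ in range(500)]
x1 = [0.5 * b + rng.gauss(0, 1) for b in x2]
y = [1.0 + 2.0 * a - 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

b1, b2 = ols_two_regressors(y, x1, x2)
print(abs(b1 - partialled_out_slope(y, x1, x2)) < 1e-9)  # True: same coefficient
```

The equality is an algebraic identity, so it holds in every sample, not just on average.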

What we will see


Two examples: growth data and earnings data.

Adding a variable X2: its effect on Y (measured by $\hat\beta_2$) and its effect on the estimated coefficient of X1 ($\hat\beta_1$).

Adding a variable may have a strong effect, or it may not!


EX1: Analysis of Growth


Suppose you are interested in the relationship between growth and revolutions/insurrections.
Statistical model (PRF): $\text{Growth} = \beta_0 + \beta_1\,\text{Rev} + u$
We obtain data on average percentage growth in GDP from 1960 to 1995 for 65 countries.

EX1: Variables


Countries in the sample (country_name):
1. Argentina; 2. Australia; 3. Austria; 4. Bangladesh; 5. Belgium; 6. Bolivia; 7. Brazil; 8. Canada; 9. Chile; 10. Colombia; 11. Costa Rica; 12. Cyprus; 13. Denmark; 14. Dominican Republic; 15. Ecuador; 16. El Salvador; 17. Finland; 18. France; 19. Germany; 20. Ghana; 21. Greece; 22. Guatemala; 23. Haiti; 24. Honduras; 25. Iceland; 26. India; 27. Ireland; 28. Israel; 29. Italy; 30. Jamaica; 31. Japan; 32. Kenya; 33. Korea, Republic of; 34. Malaysia; 35. Malta; 36. Mauritius; 37. Mexico; 38. Netherlands; 39. New Zealand; 40. Niger; 41. Norway; 42. Pakistan; 43. Panama; 44. Papua New Guinea; 45. Paraguay; 46. Peru; 47. Philippines; 48. Portugal; 49. Senegal; 50. Sierra Leone; 51. South Africa; 52. Spain; 53. Sri Lanka; 54. Sweden; 55. Switzerland; 56. Taiwan, China; 57. Thailand; 58. Togo; 59. Trinidad and Tobago; 60. United Kingdom; 61. United States; 62. Uruguay; 63. Venezuela; 64. Zaire; 65. Zimbabwe

Stata command: regress growth rev_coups

. regress growth rev_coups

      Source |       SS       df       MS         Number of obs =      65
-------------+------------------------------      F(1, 63)      =    5.60
       Model |  18.7910648     1  18.7910648      Prob > F      =  0.0211
    Residual |  211.548992    63  3.35792051      R-squared     =  0.0816
-------------+------------------------------      Adj R-squared =  0.0670
       Total |  230.340057    64  3.59906339      Root MSE      =  1.8325

      growth |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
   rev_coups |  -2.411691   1.019486   -2.37   0.021    -4.448971   -.3744113
       _cons |   2.346553   .2842592    8.25   0.000     1.778507      2.9146

One additional revolution or insurrection is associated with an average growth rate that is 2.41 percentage points lower.
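The header statistics of this Stata output can be reproduced from its ANOVA table and coefficient row. The Python sketch below uses only numbers copied from that output and the standard formulas: $R^2$ = SS Model / SS Total, $\bar R^2 = 1 - (1-R^2)(n-1)/(n-k-1)$, F = MS Model / MS Residual, Root MSE = $\sqrt{\text{MS Residual}}$, and t = Coef. / Std. Err.

```python
import math

# Numbers copied from the output of: regress growth rev_coups
ss_model, ss_resid, ss_total = 18.7910648, 211.548992, 230.340057
n, k = 65, 1                      # observations, slope coefficients
coef, se = -2.411691, 1.019486    # rev_coups row

df_resid = n - k - 1
r2 = ss_model / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (ss_model / k) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)
t_stat = coef / se

print(f"R-squared = {r2:.4f}, Adj R-squared = {adj_r2:.4f}, F = {f_stat:.2f}")
print(f"Root MSE = {root_mse:.4f}, t = {t_stat:.2f}")
```

The printed values (0.0816, 0.0670, 5.60, 1.8325, -2.37) match the Stata header and the t column.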

EX1: Adding a New Variable: Two Main Interests


1. The individual effect of the new variable X2 ($\hat\beta_2$).
2. How the estimated effect of the main variable of interest X1 ($\hat\beta_1$) changes.


EX1: Adding Average Education


New PRF: $\text{Growth} = \beta_0 + \beta_1\,\text{Rev} + \beta_2\,\text{Educ} + u$

. regress growth rev_coups yearsschool

      Source |       SS       df       MS         Number of obs =      65
-------------+------------------------------      F(2, 62)      =    5.22
       Model |  33.1790605     2  16.5895302      Prob > F      =  0.0081
    Residual |  197.160997    62  3.18001607      R-squared     =  0.1440
-------------+------------------------------      Adj R-squared =  0.1164
       Total |  230.340057    64  3.59906339      Root MSE      =  1.7833

      growth |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
   rev_coups |  -1.663581   1.052608   -1.58   0.119    -3.767714    .4405526
 yearsschool |   .1978976   .0930369    2.13   0.037     .0119196    .3838756
       _cons |   1.432645   .5110023    2.80   0.007      .411166    2.454124

Controlling for average education in the country, one additional revolution or insurrection is associated with an average growth rate that is 1.66 percentage points lower.

EX1: Adding Assassinations


New PRF: $\text{Growth} = \beta_0 + \beta_1\,\text{Rev} + \beta_2\,\text{Assa} + u$

. regress growth rev_coups assasinations

      Source |       SS       df       MS         Number of obs =      65
-------------+------------------------------      F(2, 62)      =    2.76
       Model |  18.8254472     2   9.4127236      Prob > F      =  0.0711
    Residual |   211.51461    62  3.41152597      R-squared     =  0.0817
-------------+------------------------------      Adj R-squared =  0.0521
       Total |  230.340057    64  3.59906339      Root MSE      =   1.847

      growth |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
   rev_coups |  -2.353678   1.178935   -2.00   0.050    -4.710334     .002979
assasinati~s |  -.0541002   .5388961   -0.10   0.920    -1.131338    1.023138
       _cons |   2.351855   .2913458    8.07   0.000     1.769463    2.934247

Controlling for assassinations, one additional revolution or insurrection is associated with an average growth rate that is 2.35 percentage points lower.

EX2: Analysis of Earnings and Tenure


PRF: $\log(\text{wage}) = \beta_0 + \beta_1\,\text{tenure} + u$

. regress lwage tenure

      Source |       SS       df       MS         Number of obs =     741
-------------+------------------------------      F(1, 739)     =   18.32
       Model |  3.12220883     1  3.12220883      Prob > F      =  0.0000
    Residual |  125.927949   739  .170403179      R-squared     =  0.0242
-------------+------------------------------      Adj R-squared =  0.0229
       Total |  129.050158   740  .174392106      Root MSE      =   .4128

       lwage |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      tenure |     .01281   .0029927    4.28   0.000     .0069349    .0186852
       _cons |   6.706028   .0265696  252.39   0.000     6.653867    6.758189

One more year of tenure is associated with an increase in earnings of approximately 1.281% ($100 \times 0.01281$), or, using the exact formula, $(e^{0.01281} - 1) \times 100 = 1.289\%$.
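With a log dependent variable, $100 \times \hat\beta_1$ is only an approximation to the percentage change; the exact figure is $100 \times (e^{\hat\beta_1} - 1)$. The gap is negligible for small coefficients such as the tenure estimate but grows with the size of the coefficient. In the Python sketch below, the helper `pct_change_from_log_coef` is illustrative and the coefficient 0.5 is a hypothetical value chosen to show the gap.

```python
import math

def pct_change_from_log_coef(b):
    """Exact % change in Y for a one-unit increase in X, in the model log(Y) = b0 + b*X + u."""
    return (math.exp(b) - 1) * 100

# 0.01281 is the tenure coefficient; 0.5 is a hypothetical large coefficient
for b in (0.01281, 0.5):
    print(f"b = {b}: approximation {100 * b:.3f}%, exact {pct_change_from_log_coef(b):.3f}%")
```

For the tenure coefficient the two figures differ only in the third decimal (1.281% vs 1.289%), whereas for a coefficient of 0.5 they differ by almost 15 percentage points (50% vs about 64.9%).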


EX2: Adding age


New PRF: $\log(\text{wage}) = \beta_0 + \beta_1\,\text{tenure} + \beta_2\,\text{age} + u$

. regress lwage tenure age

      Source |       SS       df       MS         Number of obs =     741
-------------+------------------------------      F(2, 738)     =   18.43
       Model |  6.13983427     2  3.06991713      Prob > F      =  0.0000
    Residual |  122.910324   738  .166545154      R-squared     =  0.0476
-------------+------------------------------      Adj R-squared =  0.0450
       Total |  129.050158   740  .174392106      Root MSE      =   .4081

       lwage |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      tenure |   .0090145     .00309    2.92   0.004     .0029481    .0150808
         age |   .0215171    .005055    4.26   0.000     .0115933    .0314409
       _cons |   6.025259   .1620739   37.18   0.000     5.707078    6.343439

Controlling for age, one more year of tenure is associated with an increase in earnings of approximately 0.90%, or $(e^{0.009} - 1) \times 100 \approx 0.90\%$.


EX2: Adding # of siblings


New PRF: $\log(\text{wage}) = \beta_0 + \beta_1\,\text{tenure} + \beta_2\,\text{sibs} + u$

. regress lwage tenure sibs

      Source |       SS       df       MS         Number of obs =     741
-------------+------------------------------      F(2, 738)     =   15.45
       Model |  5.18577098     2  2.59288549      Prob > F      =  0.0000
    Residual |  123.864387   738  .167837923      R-squared     =  0.0402
-------------+------------------------------      Adj R-squared =  0.0376
       Total |  129.050158   740  .174392106      Root MSE      =  .40968

       lwage |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      tenure |   .0124695   .0029716    4.20   0.000     .0066356    .0183034
        sibs |  -.0232682   .0066359   -3.51   0.000    -.0362956   -.0102407
       _cons |   6.775552   .0329917  205.37   0.000     6.710783     6.84032

Controlling for the number of siblings, one more year of tenure is associated with an increase in earnings of approximately 1.24%, or $(e^{0.0124} - 1) \times 100 \approx 1.25\%$.

Summary: Effect of Adding X2 on $\hat\beta_1$

Growth example
Explanatory variables       Estimate of revolutions (percentage points)
Revolutions                 -2.41
Revolutions + educ          -1.66
Revolutions + assa          -2.35
Education has a strong effect on $\hat\beta_1$; assassinations do not.

Earnings example
Explanatory variables       Estimate of tenure
Tenure                      1.28%
Tenure + age                0.90%
Tenure + siblings           1.24%
Age has a strong effect on $\hat\beta_1$; siblings do not.


Specification of the Equation


You have in mind a relationship between Y and X1.
The conditional model is $E(Y \mid X_1) = \beta_0 + \beta_1 X_1$.

Which other variables X2, X3, ... should you place in the model? We will look at:
the effect of omitting relevant variables;
the effect of including irrelevant variables.

Omitted Variable Bias


True model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$
Instead, we estimate a shorter equation: $Y = \alpha_0 + \alpha_1 X_1 + v$

Omitted Variable Bias


Rewriting the true model so it looks like the shorter equation:
$Y = \beta_0 + \beta_1 X_1 + (\beta_2 X_2 + u)$

The error term in parentheses contains X2. If X2 is correlated with X1, the coefficient on X1 will pick up the effect of X2.

Illustration
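Suppose X2 is a linear function of X1: $X_2 = \gamma_0 + \gamma_1 X_1 + e$
Back into the regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2(\gamma_0 + \gamma_1 X_1 + e) + u$
Rearranging: $Y = (\beta_0 + \beta_2\gamma_0) + (\beta_1 + \gamma_1\beta_2) X_1 + (\beta_2 e + u)$
so that $\alpha_0 = \beta_0 + \beta_2\gamma_0$ and $\alpha_1 = \beta_1 + \gamma_1\beta_2$, where $\gamma_1\beta_2$ is the bias.

The alphas combine the effects of X1 and X2.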
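With the notation of this illustration (true model with $\beta$'s, short model with $\alpha$'s, $X_2 = \gamma_0 + \gamma_1 X_1 + e$), the omitted variable formula $\alpha_1 = \beta_1 + \gamma_1\beta_2$ also holds exactly for the OLS estimates in any sample: the short-regression slope equals the long-regression slope on X1 plus $\hat\gamma_1\hat\beta_2$, where $\hat\gamma_1$ is the slope from regressing X2 on X1. The Python sketch below checks this on simulated data with hypothetical values $\beta_1 = 1$, $\beta_2 = 2$, $\gamma_1 = 0.8$, so the short regression should give a slope near $1 + 0.8 \times 2 = 2.6$ rather than 1. The helper names are illustrative.

```python
import random

def mean(v):
    return sum(v) / len(v)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def short_slope(y, x):
    """Slope from regressing y on x (with an intercept)."""
    return dev_cross(x, y) / dev_cross(x, x)

def long_slopes(y, x1, x2):
    """Slopes (b1, b2) from regressing y on x1 and x2 (with an intercept)."""
    s11, s22, s12 = dev_cross(x1, x1), dev_cross(x2, x2), dev_cross(x1, x2)
    s1y, s2y = dev_cross(x1, y), dev_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# Hypothetical data-generating process: Y = 0.5 + 1*X1 + 2*X2 + u,
# with X2 = 0.5 + 0.8*X1 + e, so X2 is positively correlated with X1
rng = random.Random(2024)
n = 1000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 + 0.8 * a + rng.gauss(0, 1) for a in x1]
y = [0.5 + 1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

alpha1 = short_slope(y, x1)            # short regression: X2 omitted
beta1, beta2 = long_slopes(y, x1, x2)  # long (true) specification
gamma1 = short_slope(x2, x1)           # auxiliary regression of X2 on X1

print(f"alpha1 = {alpha1:.3f}, beta1 = {beta1:.3f}, gamma1*beta2 = {gamma1 * beta2:.3f}")
print(abs(alpha1 - (beta1 + gamma1 * beta2)) < 1e-9)  # the decomposition holds exactly
```

Here both $\gamma_1$ and $\beta_2$ are positive, so omitting X2 pushes the estimated slope on X1 upward.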

Illustration
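Growth example
Explanatory variables       Estimate of revolutions (percentage points)
Revolutions                 -2.41
Revolutions + educ          -1.66
$\hat\gamma_1 = -3.780 < 0$, $\hat\beta_2 = 0.198 > 0$, Bias $= \hat\gamma_1\hat\beta_2 = -0.75 < 0$
$\hat\alpha_1 = \hat\beta_1 + \hat\gamma_1\hat\beta_2$: $-2.41 = -1.66 - 0.75$

Earnings example
Explanatory variables       Estimate of tenure
Tenure                      1.28%
Tenure + age                0.90%
$\hat\gamma_1 = 0.176 > 0$, $\hat\beta_2 = 0.0215 > 0$, Bias $= \hat\gamma_1\hat\beta_2 = 0.0038 > 0$
$\hat\alpha_1 = \hat\beta_1 + \hat\gamma_1\hat\beta_2$: $0.0128 = 0.009 + 0.0038$

($\hat\gamma_1$ is the slope from regressing X2 on X1: education on revolutions, age on tenure.)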
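The auxiliary regressions of education on revolutions and of age on tenure are not shown in these notes, but because $\hat\alpha_1 = \hat\beta_1 + \hat\gamma_1\hat\beta_2$ holds exactly within a given sample, $\hat\gamma_1$ can be backed out from the short and long coefficients as a consistency check. This Python sketch uses only coefficients copied from the Stata outputs for these regressions; the variable names are illustrative.

```python
# Growth: alpha1 from "regress growth rev_coups",
# beta1 and beta2 from "regress growth rev_coups yearsschool"
alpha1_growth, beta1_growth, beta2_growth = -2.411691, -1.663581, 0.1978976
# Earnings: alpha1 from "regress lwage tenure",
# beta1 and beta2 from "regress lwage tenure age"
alpha1_wage, beta1_wage, beta2_wage = 0.01281, 0.0090145, 0.0215171

# Implied auxiliary slopes from alpha1 = beta1 + gamma1 * beta2
gamma1_growth = (alpha1_growth - beta1_growth) / beta2_growth
gamma1_wage = (alpha1_wage - beta1_wage) / beta2_wage

print(f"growth:   gamma1 = {gamma1_growth:.3f}, bias = {gamma1_growth * beta2_growth:.4f}")
print(f"earnings: gamma1 = {gamma1_wage:.3f}, bias = {gamma1_wage * beta2_wage:.4f}")
```

The implied values (about -3.780 and 0.176) agree with the $\hat\gamma_1$ reported above, and the implied biases are about -0.75 and 0.0038.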

Problem with Important Omitted Variables


If relevant explanatory variables are omitted from your equation, and they are correlated with variables that are included in the model:
Your estimated coefficients will not reflect just the effect of the variable included in the model; they will also pick up the effect of the omitted variable. Your coefficients are biased: they systematically overshoot or undershoot the true effect. Whether the bias in $\hat\alpha_1$ is upward or downward depends on whether the omitted variable X2 is positively or negatively correlated with the included variable X1 (whether $\gamma_1 > 0$ or $\gamma_1 < 0$), and on whether X2 has a positive or negative effect on Y (whether $\beta_2 > 0$ or $\beta_2 < 0$), since the bias equals $\gamma_1\beta_2$.

Direction of Bias When X2 is Omitted
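The direction of the bias follows from the sign of the product $\gamma_1\beta_2$: the sign of $\gamma_1$ is the sign of the correlation between X1 and X2, and the sign of $\beta_2$ is the sign of X2's effect on Y. The small Python sketch below prints the four resulting cases; `bias_direction` is a hypothetical helper written for illustration.

```python
def bias_direction(beta2, corr_x1_x2):
    """Direction of the omitted variable bias gamma1*beta2 in alpha1 = beta1 + gamma1*beta2.

    beta2: effect of the omitted X2 on Y (only its sign matters)
    corr_x1_x2: correlation between X1 and X2 (same sign as gamma1)
    """
    product = beta2 * corr_x1_x2
    if product > 0:
        return "positive (upward) bias"
    if product < 0:
        return "negative (downward) bias"
    return "no bias"

for beta2 in (1, -1):
    for corr in (1, -1):
        print(f"beta2 {'> 0' if beta2 > 0 else '< 0'}, corr(X1, X2) {'> 0' if corr > 0 else '< 0'}: "
              f"{bias_direction(beta2, corr)}")
```

In the growth example ($\hat\beta_2 > 0$, $\hat\gamma_1 < 0$) this gives a downward bias; in the earnings example (both positive) an upward bias.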


Correcting for Omitted Variable Bias


The remedy depends on the source of the problem.
You have the data but did not include it: put the variable(s) into the model.
You do not have the data in your data set (this is more often the case):
Get the damn data. This is a reason for planning data collection very carefully: it is costly, possibly impossible, to go back.

Use a proxy for the data you would prefer to have.
You may not have exactly the variable you would like to use, but you may be able to find an alternative that is close and largely eliminates the omitted variable bias.

Use intuition to get the direction of the omitted variable bias.
Use instrumental variable estimation.

Consequences of Including Irrelevant Variables


The coefficient of the irrelevant variable should be close to 0 or statistically insignificant.
The other coefficients are unchanged or do not change much.
The standard errors of the other coefficients will generally be larger than they would be if the irrelevant variable were not included.
t-tests will therefore be less likely to reject the null hypothesis than with the correct specification; this does not matter much in moderately large data sets.
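The sketch below illustrates these consequences on simulated data, assuming a hypothetical irrelevant variable X3 (true coefficient 0) that is strongly correlated with X1. The coefficient on X1 stays close to its true value of 2, but its standard error is larger when X3 is included; if X3 were uncorrelated with X1, the inflation would be negligible. Standard errors use the usual homoskedastic OLS formula, and the data, seed and names are illustrative.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def dev_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

# Hypothetical data: Y depends on X1 only; X3 is irrelevant but correlated with X1
rng = random.Random(7)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [0.9 * a + 0.5 * rng.gauss(0, 1) for a in x1]
y = [1.0 + 2.0 * a + rng.gauss(0, 1) for a in x1]

# Correct specification: Y on X1
s11, s1y = dev_cross(x1, x1), dev_cross(x1, y)
b1_short = s1y / s11
b0_short = mean(y) - b1_short * mean(x1)
ssr_short = sum((yi - b0_short - b1_short * a) ** 2 for yi, a in zip(y, x1))
se_short = math.sqrt(ssr_short / (n - 2) / s11)

# Over-specified model: Y on X1 and the irrelevant X3
s33, s13, s3y = dev_cross(x3, x3), dev_cross(x1, x3), dev_cross(x3, y)
det = s11 * s33 - s13 ** 2
b1_long = (s1y * s33 - s3y * s13) / det
b3_long = (s3y * s11 - s1y * s13) / det
b0_long = mean(y) - b1_long * mean(x1) - b3_long * mean(x3)
ssr_long = sum((yi - b0_long - b1_long * a - b3_long * c) ** 2
               for yi, a, c in zip(y, x1, x3))
# Var(b1) = s^2 * [inverse of the demeaned X'X matrix]_11 = s^2 * s33 / det
se_long = math.sqrt(ssr_long / (n - 3) * s33 / det)

print(f"without X3: b1 = {b1_short:.3f} (se {se_short:.3f})")
print(f"with X3:    b1 = {b1_long:.3f} (se {se_long:.3f}), b3 = {b3_long:.3f}")
```

With this correlation the standard error on X1 roughly doubles, so the t-statistic on X1 roughly halves even though the estimate itself stays near 2.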

From Ex1 and Ex2


Growth example
Assassinations could be considered an irrelevant variable: its coefficient $\hat\beta_2$ is estimated with little precision (a large standard error relative to the estimate).

Earnings example
The number of siblings is NOT irrelevant: even though adding it does not have a strong effect on the estimated relationship between earnings and tenure ($\hat\beta_1$), it does have an effect on Y ($\hat\beta_2$ is precisely estimated and non-negligible).

Consequences of Altering the Specification


When adding a previously omitted variable:
its coefficient will typically be statistically significant and correctly signed;
R-bar² will increase, as the variable has explanatory power;
the other coefficients, particularly those of interest, will change as bias is removed.

When adding an irrelevant variable:
its coefficient will not be statistically significant or, in large data sets, will be far less significant than the other variables;
R-bar² will not increase and will likely fall;
the other coefficients, particularly those of interest, will not change much, as we are not eliminating bias.
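The behaviour of R-bar² follows from its formula, $\bar R^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$ with k slope coefficients: an extra regressor never lowers $R^2$, but if it is irrelevant it raises $R^2$ only slightly while the penalty factor $(n-1)/(n-k-1)$ grows. This Python sketch recomputes the adjusted $R^2$ of the growth regressions from their reported $R^2$; the helper `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """R-bar squared for a regression with n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# (R-squared, number of slope coefficients) from the growth regressions, n = 65
growth_models = {
    "rev_coups": (0.0816, 1),
    "rev_coups + assasinations": (0.0817, 2),
    "rev_coups + yearsschool": (0.1440, 2),
}
for name, (r2, k) in growth_models.items():
    print(f"{name:28s} R2 = {r2:.4f}  R-bar2 = {adjusted_r2(r2, 65, k):.4f}")
```

The results (0.0670, 0.0521, 0.1164) reproduce the Adj R-squared values in the Stata output: adding assassinations raises $R^2$ from 0.0816 to 0.0817 but lowers R-bar², while adding education raises both.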

Which Variables?
How do we decide what should and should not be in the model? It depends on the question being analyzed.
If we just want to know how much more union members earn, then the first (simple) estimate is fine. If we want to measure how much union membership increases the earnings of otherwise similar individuals, then we want to use as many control variables as possible to ensure we are comparing similar individuals.
