Course outline
Course materials
Hayashi, F., Econometrics, Princeton University Press, 2000
Econometric software package STATA
Lecture sheets (available on Nestor)
Lecturers
1. prof. dr. Rob Alessie; email: r.j.m.alessie@rug.nl (weeks 1-4)
2. dr. Jochem de Bresser; email j.de.bresser@rug.nl (weeks 5-7)
3. Bert ter Bogt (homework); email: t.e.ter.bogt@student.rug.nl
Format of the course
A combination of lectures and tutorials (we will discuss the assignments).
Assessment
Weekly assignments (20%). Assignments will have both applied and
theoretical questions.
Important: a group of exactly 2 students can collaborate and hand
in one set of answers!
Please note that you will have to enroll in a homework group consisting
of exactly two students. This can be done on Nestor (see Group
enroll below). The deadline for registering is February 10.
Assignments have to be typed; hand-written answers are not accepted.
Email submissions are not allowed; please hand in a hard copy.
Students who
1. followed the course Introduction to Econometrics last year or before, and
2. already made the weekly assignments,
don't need to hand in the assignments again. These students will retain
the old grade for the assignments.
Individual exam (80%).
Supplementary reading:
Verbeek, M., A guide to modern econometrics: Fourth edition, 2012.
Wooldridge, J.M., Introduction to econometrics, (4th edn.), 2006.
Prerequisite knowledge
Matrix algebra, e.g. matrix multiplication, rank of a matrix, inverse of
a matrix, determinant, trace, positive (semi-)definite matrices, projection
matrix, etc. (cf. handouts Paul Bekker, available on Nestor).
Calculus: functions of several variables, partial derivatives, first principles
of optimization, integrals.
Probability theory and statistics, e.g. random variables, joint distributions,
conditional distribution, independence, conditional and unconditional
expectations, (multivariate) normal distribution, χ² distribution, etc.
(cf. course Linear Models in Statistics).
Students with insufficient knowledge are expected to study the required
material themselves.
Lecture 1
Hayashi, chapter 1: section 1.1
Organisation of lecture 1
1. What is Econometrics?
2. The classical linear regression model (cf. section 1.1 Hayashi)
The linearity assumption
Matrix notation
The strict exogeneity assumption
Implications of strict exogeneity
Other assumptions
What is econometrics?
Econometrics is the development of statistical methods for
estimating economic relations
testing economic theories
evaluating economic and business policy (proposals)
Examples
Forecasting of the interest rate, inflation rate and GDP
What is the causal effect of schooling on wages?
What is the effect of colonial institutions on economic growth?
Econometrics focuses on problems inherent in analyzing non-experimental
economic data.
Steps in empirical economic analysis
1. Formulation of a research question
2. Construction of an economic model
3. Construction and estimation of an econometric model
Structure of economic data
Cross-sectional data (obtained from a random sample)
Time series data (more difficult to analyze because one cannot assume
that the data are obtained from a random sample).
Panel data
In this course we mainly focus on the analysis of cross-section data
Causality and the notion of ceteris paribus
2.1 The linearity assumption

yi = β1 xi1 + β2 xi2 + . . . + βK xiK + εi ,   i = 1, . . . , n   (1)

where
yi := dependent variable; other terms: regressand or left-hand side
variable
(xi1 , . . . , xiK ) := K explanatory variables. Other terms: a) regressors,
b) right-hand side variables, c) independent variables.
β1 , . . . , βK := regression coefficients.
εi := disturbance term or error term with certain properties to be
specified below.
Remarks
The part of the right-hand side involving the regressors,
β1 xi1 + β2 xi2 + . . . + βK xiK ,
is called the regression or the regression function.
The disturbance term εi captures all influences on yi left unexplained by
the regressors.
In almost all cases, a regression model includes a constant term, i.e. xi1 = 1
for all i. In that case model (1) becomes
yi = β1 + β2 xi2 + . . . + βK xiK + εi ,   i = 1, . . . , n   (2)
Example (Engel curve for food), in levels and in logs:
foodi = β1 + β2 ydi + εi   (3)
ln(foodi ) = β1 + β2 ln(ydi ) + εi   (4)
where ydi denotes disposable income.
In empirical studies, one typically finds that 0 < β2 < 1 in (4) (food is a necessary good).
In equation (4), the income elasticity β2 is the same for everybody (this might
be a restrictive assumption).
Alternative specification:
ln(foodi ) = β1 + β2 ln(ydi ) + β3 ln²(ydi ) + εi   (5)
2.2 Matrix notation

In vector notation, model (1) reads
yi = xi′β + εi ,   i = 1, . . . , n   (8)
and stacking all n observations gives
y = Xβ + ε   (9)
where y = (y1 , ..., yn )′ and ε = (ε1 , ..., εn )′ are (n × 1) vectors and X is the
(n × K) matrix defined as follows:

      x1′      x11 . . . x1K
      x2′      x21 . . . x2K
X  =  ..    =  ..         ..
      .        .           .
      xn′      xn1 . . . xnK
2.3 The strict exogeneity assumption

yi = xi′β + εi ,   i = 1, . . . , n   (10)
E(εi | x1 , . . . , xn ) = E(εi | X) = 0,   i = 1, . . . , n   (11)

To state the assumption differently, take, for any given observation i, the
joint distribution of the nK + 1 random variables, f(εi , x1 , . . . , xn ), and
consider the conditional distribution, f(εi | x1 , . . . , xn ).
The conditional mean E(εi | x1 , . . . , xn ) = g(x1 , . . . , xn ), where g(·) is in
general a nonlinear function of x1 , . . . , xn .
The strict exogeneity assumption: E(εi | x1 , . . . , xn ) = 0.
Assuming this constant to be zero is not restrictive if the regressors include
a constant.
To see this, suppose that E(εi | X) = µ and xi1 = 1. Then equation (1) can
be rewritten as:
yi = β1 + β2 xi2 + . . . + βK xiK + εi
   = (β1 + µ) + β2 xi2 + . . . + βK xiK + (εi − µ)
   ≡ β̃1 + β2 xi2 + . . . + βK xiK + ε̃i   (12)
Notice that E(ε̃i | X) = 0.
Instead of assuming strict exogeneity, one could assume that εi and X are
independent of each other. The assumption of independence says that
f(εi | X) = f(εi )   (13)
Suppose that E(εi ) = 0. Then stochastic independence implies strict
exogeneity (εi is conditional mean independent of X). This can be seen
as follows:
E(εi | X) = ∫ ε f(ε | X) dε = ∫ ε f(ε) dε = E(εi ) = 0   (14)
2.4 Implications of strict exogeneity

1. Under strict exogeneity, the unconditional mean of the error term is
zero: by the law of total expectations,
E(εi ) = E[E(εi | x1 , . . . , xn )] = 0,   i = 1, . . . , n   (17)
q.e.d.
2. Under strict exogeneity, the regressors are orthogonal to the error term
for all observations, i.e.,
E(xjk εi ) = 0,   i, j = 1, . . . , n; k = 1, . . . , K   (18)
Proof:
E(xjk εi ) = E[E(xjk εi | x1 , . . . , xn )]   (19)
          = E[xjk E(εi | x1 , . . . , xn )] = 0   (20)
The last line of equation (20) follows from result (19). q.e.d.
3. Because the mean of the error term is zero, the orthogonality conditions
(18) are equivalent to zero-correlation conditions (recall that
Corr(x, y) = Cov(x, y)/(sd(x) sd(y))). This is because
Cov(εi , xjk ) = E(εi xjk ) − E(εi )E(xjk ) = E(εi xjk ) = 0   (21)
The second equality follows from E(εi ) = 0 (cf. equation (17)), the last
equality from the orthogonality conditions (18).
4. In particular, for i = j, Cov(xik , εi ) = 0.
Therefore, strict exogeneity implies that the regressors are uncorrelated
with the error term.
5. Together with the linearity assumption 1.1, the strict exogeneity
assumption also implies that
E(yi | X) = E(xi′β + εi | X) = xi′β + E(εi | X) = xi′β   (22)
2.5 Strict exogeneity in time-series models

For time-series models where the index i is time, the implication (18) of strict
exogeneity can be rephrased as: the regressors are orthogonal to the past,
current, and future error terms (or equivalently, the error term is orthogonal
to the past, current, and future regressors).
But for most time-series models, this condition (and a fortiori strict
exogeneity) is not satisfied.
The clearest example of a failure of strict exogeneity is a model where the
regressors include the lagged dependent variable. Consider the simplest
such model:
yi = β yi−1 + εi   (23)
where we assume that yi−1 is uncorrelated with εi : E(yi−1 εi ) = 0.
Notice that
E(yi εi ) = E[(β yi−1 + εi )εi ] = β E(yi−1 εi ) + E(εi²) = E(εi²) > 0   (24)
so yi , which serves as the regressor for the next observation, is correlated
with the current error term: strict exogeneity fails.
Another example: a feedback mechanism. Consider
gGDPi = β1 + β2 ri + εi   (25)
where
gGDPi := GDP growth rate in period i
ri := interest rate
Suppose that E(ri εi ) = 0.
The treasury bill interest rate is typically set by the president of a central
bank, e.g. Janet Yellen (USA).
Behavior of Janet Yellen (feedback mechanism):
ri = δ1 + δ2 (gGDPi−1 − 3) + ηi ,   δ2 > 0   (26)
Through this feedback rule the future interest rate depends on current GDP
growth, and hence on εi , so strict exogeneity fails.
2.6 Other assumptions

Assumption 1.4 (spherical error variance):
E(εi² | X) = σ² > 0,   i = 1, . . . , n   (27)
E(εi εj | X) = 0,   i, j = 1, . . . , n; i ≠ j   (28)

Assumption 1.3 (no multicollinearity): the rank of the (n × K) matrix X
is K with probability 1, i.e. no regressor is a linear combination of the others.

Example (perfect multicollinearity): consider
log(wagei ) = β1 + β2 Si + β3 agei + β4 yobi + εi   (29)
where
Si = years of schooling of individual i
agei = age of individual i (measured in years)
yobi = year of birth of individual i
Obviously, yobi = 1996 − agei for all i.
Since model (29) includes a constant term (xi1 = 1), the regressor yobi is a
linear combination of the constant term and the variable age. In other
words, we face a problem of perfect multicollinearity.
Given the identity yobi = 1996 − agei , we can rewrite equation (29) as
follows:
log(wagei ) = (β1 + 1996 β4 ) + β2 Si + (β3 − β4 ) agei + εi   (30)
which shows that only the difference β3 − β4 , but not β3 and β4 separately,
can be estimated.
Under Assumptions 1.2 and 1.4, the conditional variance of the error term
equals its conditional second moment:
Var(εi | X) = E(εi² | X) − [E(εi | X)]²   (34)
Due to the assumption of strict exogeneity, the second term of the rhs
of equation (34) cancels out, so Var(εi | X) = E(εi² | X) = σ².
In the context of time-series models, (28) states that there is no serial
correlation in the error term.
Since the (i, j) element of the n × n matrix εε′ is εi εj , Assumption 1.4
can be written compactly as
E(εε′ | X) = σ² In   (35)
2.7 The classical regression model for random samples

If the sample is i.i.d. across observations, the assumptions on the error
term reduce to conditions involving observation i only:
E(εi | xi ) = 0,   i = 1, . . . , n   (38)
E(εi² | xi ) = σ² > 0,   i = 1, . . . , n   (39)

2.8 Fixed regressors

Many textbooks treat the regressors as fixed (non-random) constants:
yi = xi′β + εi ,   i = 1, . . . , n   (40)
with E(εi ) = 0 (41) and Var(ε) = σ² In (42). This is a special case of the
framework above: all results stated conditionally on X carry over.
Introduction to Econometrics
Coordinator: prof. dr. Rob Alessie
email: r.j.m.alessie@rug.nl
office: DUI 727, tel: 050-3637240
February 4, 2016
Lecture 2
Hayashi, chapter 1: section 1.2, 1.3
Organization of the lecture
1. The algebra of least squares
Minimizing the residual sum of squares
Normal equations
OLS estimator
2. Other concepts (fitted value, residual, R², R²uc )

The OLS estimator minimizes the sum of squared residuals:
SSR(β̃) ≡ Σi=1..n (yi − xi′β̃)² = (y − Xβ̃)′(y − Xβ̃)   (4)
b ≡ argmin β̃ SSR(β̃)   (5)
1.1 Normal equations

SSR(β̃) = (y − Xβ̃)′(y − Xβ̃)
        = y′y − β̃′X′y − y′Xβ̃ + β̃′X′Xβ̃
        = y′y − 2y′Xβ̃ + β̃′X′Xβ̃
        ≡ y′y − 2a′β̃ + β̃′Aβ̃   (6)
where a = X′y and A = X′X.
Recalling from matrix algebra (cf. handout Matrix Algebra) that
∂(a′β̃)/∂β̃ = a and ∂(β̃′Aβ̃)/∂β̃ = 2Aβ̃ for A symmetric,
∂SSR(β̃)/∂β̃ = −2a + 2Aβ̃ = −2X′y + 2X′Xβ̃   (7)
Setting this derivative to zero,
−2X′y + 2X′Xb = 0   (8)
or
(X′X)b = X′y   (9)
Here, we replaced β̃ by b because the OLS estimate b is the β̃ that satisfies
the first-order conditions.
The K equations (9) are called the normal equations.
The normal equations can also be written as follows (check!):
(Σi=1..n xi xi′) b = Σi=1..n xi yi   (10)
For a minimum, the Hessian
∂²SSR(β̃)/∂β̃ ∂β̃′ = 2X′X   (12)
should be positive definite at the optimum.
Indeed X′X is a positive definite matrix, so that b is the unique solution
of the minimization problem (4).
The system of normal equations is linear in b, with solution
b = (Σi=1..n xi xi′)⁻¹ Σi=1..n xi yi = (X′X)⁻¹X′y   (13)
1.2 More concepts

Fitted value:
ŷi = xi′b   (14)
Residual:
ei = yi − ŷi = yi − xi′b   (15)
Sum of squared residuals:
SSR = SSR(b) = e′e = Σi=1..n ei²   (16)
If the regressors include a constant, the normal equations imply that
Σi=1..n ei = 0   (17a)
and that
ȳ = x̄′b   (17b)
Projection matrix:
P ≡ X(X′X)⁻¹X′   (18)
Annihilator matrix:
M ≡ In − P   (19)
Since e = My = Mε and M is symmetric and idempotent,
SSR = e′e = ε′M′Mε = ε′Mε   (20)
OLS estimate of the error variance σ²:
s² ≡ e′e/(n − K)   (21)
Uncentered R². Using y = ŷ + e, we can decompose y′y as follows:
y′y = (ŷ + e)′(ŷ + e)
    = ŷ′ŷ + 2ŷ′e + e′e
    = ŷ′ŷ + 2b′X′e + e′e
    = ŷ′ŷ + e′e   (22)
since X′e = 0 by the normal equations. Define
R²uc ≡ 1 − e′e/(y′y)   (23)
     = ŷ′ŷ/(y′y)   (24)
Since both ŷ′ŷ and y′y are nonnegative, 0 ≤ R²uc ≤ 1.
Centered R². Let ȳ = (1/n) Σi=1..n yi . If the regressors include a constant,
the total variation of yi decomposes as
Σi=1..n (yi − ȳ)² = Σi=1..n (ŷi − ȳ)² + Σi=1..n ei²   (25)
Define
R² ≡ 1 − Σi=1..n ei² / Σi=1..n (yi − ȳ)²   (26)
   = Σi=1..n (ŷi − ȳ)² / Σi=1..n (yi − ȳ)²   (27)
It holds that 0 ≤ R² ≤ 1.
Interpretation of R²: 100·R² is the percentage of the sample variance in y
explained by the nonconstant regressors (the model).
low R²: bad fit of the model
high R²: good fit of the model
Be careful with the interpretation of R²! The value of R² increases with the
inclusion of extra irrelevant regressors.
The adjusted R², R̄², addresses this problem somewhat:
R̄² = 1 − [e′e/(n − K)] / [Σi=1..n (yi − ȳ)²/(n − 1)]   (28)
2. Finite-sample properties of OLS

yi = xi′β + εi ,   i = 1, . . . , n   (29)

2.1 Unbiasedness and variance of b

Proposition 1.1: under Assumptions 1.1-1.3, (a) E(b | X) = β (unbiasedness);
(b) adding Assumption 1.4, Var(b | X) = σ²(X′X)⁻¹; (c) OLS is efficient in
the class of linear unbiased estimators (Gauss-Markov theorem below).
2.2 Gauss-Markov theorem

Gauss-Markov theorem: under Assumptions 1.1-1.4, for any linear unbiased
estimator β̂,
Var(β̂ | X) ≥ Var(b | X)
in the matrix sense.

Remarks
The matrix inequality in part (c) says that the K × K matrix
Var(β̂ | X) − Var(b | X) is positive semidefinite, so
a′[Var(β̂ | X) − Var(b | X)]a ≥ 0, or a′Var(β̂ | X)a ≥ a′Var(b | X)a   (30)
for any K-dimensional vector a.
Consider a = ek = (0, . . . , 1, 0, . . . , 0)′ (k-th element equal to one, other
elements zero); then equation (30) implies that
Var(β̂k | X) ≥ Var(bk | X)   (31)
That is, for any regression coefficient, the variance of the OLS estimator
is no larger than that of any other linear unbiased estimator.
Proof. Since β̂ is linear in y, write β̂ = Cy and define D ≡ C − A, where
A ≡ (X′X)⁻¹X′ (so that b = Ay). Then
β̂ = (D + A)y
  = Dy + b
  = D(Xβ + ε) + b
  = DXβ + Dε + b
E(β̂ | X) = DXβ + D E(ε | X) + E(b | X) = DXβ + β
Since β̂ is unbiased it should hold that DX = O.
So β̂ = Dε + b and
β̂ − β = Dε + (b − β) = (D + A)ε
So
Var(β̂ | X) = E((β̂ − β)(β̂ − β)′ | X)
           = E((D + A)εε′(D′ + A′) | X)
           = (D + A) E(εε′ | X) (D′ + A′)
           = σ²(D + A)(D′ + A′)
           = σ²(DD′ + AD′ + DA′ + AA′)
Since DX = O, DA′ = DX(X′X)⁻¹ = O and AD′ = O, while AA′ = (X′X)⁻¹.
Hence
Var(β̂ | X) − Var(b | X) = σ²(DD′ + (X′X)⁻¹) − σ²(X′X)⁻¹ = σ² DD′ ≥ 0   (32)
q.e.d.
2.3 Estimation of the error variance

s² ≡ e′e/(n − K)   (33)
Dividing by n − K rather than n reflects that K parameters have been
estimated before obtaining e. More specifically, e has to satisfy the K
normal equations, which limits the variability of the residual.
Estimate of Var(b | X) = σ²(X′X)⁻¹:
Var̂(b | X) = s²(X′X)⁻¹   (34)
Introduction to Econometrics
Coordinator: prof. dr. Rob Alessie
email: r.j.m.alessie@rug.nl
office: DUI 727, tel: 050-3637240
February 9, 2016
Lecture 3
Hayashi, chapter 1: section 1.3, 1.4
Organisation of the lecture
1. Finite sample properties of s2
2. Estimate of the variance V ar(b|X)
3. Examples
Omitted variable bias: the simple case
Inclusion of irrelevant variables in a regression model
The variance of the OLS estimator revisited
Interpretation dummy variables
4. Partitioned regression: Frisch-Waugh theorem
First express the residual vector in terms of ε:
e = y − Xb
  = y − X(X′X)⁻¹X′y
  = (In − P)y
  = My
  = M(Xβ + ε) = Mε
since MX = O. Notice that ε′Mε = Σi Σj mij εi εj .
Also
E(ε′Mε | X) = E(Σi Σj mij εi εj | X)
            = Σi Σj mij E(εi εj | X)
            = σ² Σi mii = σ² trace(M)
using Assumption 1.4. Finally,
trace(M) = trace(In − P) = trace(In ) − trace(P) = n − K   (1)
because trace(P) = trace(X(X′X)⁻¹X′) = trace((X′X)⁻¹X′X) = trace(IK ) = K.
Hence E(s² | X) = σ² (proposition 1.2).
Intuition for proposition 1.2
Dividing e′e by n − K rather than by n makes this estimate unbiased for σ².
The intuitive reason is that K parameters (β) have to be estimated before
obtaining the residual vector e. More specifically, e has to satisfy the K
normal equations, which limits the variability of the residual.
Estimate of Var(b | X) = σ²(X′X)⁻¹:
Var̂(b | X) = s²(X′X)⁻¹   (2)
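A minimal Stata sketch (assuming the wage data of the example below are loaded) of computing s² = e′e/(n − K) by hand:

quietly regress wage educ
predict e, resid                  // residuals e_i
generate e2 = e^2
quietly summarize e2
display "s^2 = " r(sum)/e(df_r)   // e(df_r) = n - K
display "Root MSE = " sqrt(r(sum)/e(df_r))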
Example: a simple wage equation,
wagei = β1 + β2 educi + εi   (3)
where
wagei = wage measured in dollars per hour
educi = years of schooling
Model (3) has been estimated using Stata (the Stata
program and dataset are available on Nestor).
. sum wage, detail

                   average hourly earnings
-------------------------------------------------------------
      Percentiles      Smallest
 1%       1.67            .53
 5%       2.75           1.43
10%       2.92           1.5        Obs            526
25%       3.33           1.5        Sum of Wgt.    526
50%       4.65                      Mean           5.896103
                       Largest      Std. Dev.      3.693086
75%       6.88          21.86
90%         10           22.2       Variance       13.63888
95%         13          22.86       Skewness       2.007325
99%         20          24.98       Kurtosis       7.970083
Regression results:
. reg wage educ

      Source |       SS       df       MS              No obs        =     526
-------------+------------------------------           F(1, 524)     =  103.36
       Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
    Residual |  5980.68225   524  11.4135158           R^2           =  0.1648
-------------+------------------------------           Adj R^2       =  0.1632
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

        wage |      Coef.   Std. Err.       t
-------------+---------------------------------
        educ |   .5413593     .053248   10.17
       _cons |  -.9048516    .6849678   -1.32

beduc = .5413593
Interpretation: one year of extra schooling leads to 54 dollar cents extra
wage per hour.
s² = e′e/(n − K) = 5980.68225/524 = 11.4135158
Standard error of beduc :
se(beduc | X) = √(Var̂(beduc | X)) = .053248
R² = 1 − e′e/Σ(yi − ȳ)² = 1 − 5980.68225/7160.41429
   = 1179.73204/7160.41429 = 0.1648
Adjusted R² = 1 − [e′e/(n − K)]/[Σ(yi − ȳ)²/(n − 1)]
            = 1 − 11.4135158/13.6388844 = 0.1632
[Figure: scatter plot of hourly wage (0-25) against years of education (0-20), with the OLS fitted values line.]
Semi-logarithmic specification:
log(wagei ) = β1 + β2 educi + εi   (4)

. sum log_wage, detail (selected output)
Percentiles (50%, 75%, 90%, 95%, 99%): 1.536867, 1.928619, 2.302585, 2.564949, 2.995732
Largest: 3.084659, 3.100092, 3.129389, 3.218076
Mean 1.623268; Std. Dev. .5315382; Variance .2825329; Skewness .3909038; Kurtosis 3.386529
Regression results:
. reg log_wage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(1, 524)     =  119.58
       Model |  27.5606288     1  27.5606288           Prob > F      =  0.0000
    Residual |  120.769123   524  .230475425           R-squared     =  0.1858
-------------+------------------------------           Adj R-squared =  0.1843
       Total |  148.329751   525   .28253286           Root MSE      =  .48008

    log_wage |      Coef.   Std. Err.       t
-------------+---------------------------------
        educ |   .0827444    .0075667   10.94
       _cons |   .5837727    .0973358    6.00
Interpretation of beduc (see Table 1 below): the return to education is equal
to 8.27 percent. Due to a one-year increase in educ (years of schooling), the
hourly wage rate increases by 8.27 percent (= .0827444 × 100).
[Figure: scatter plot of log_wage against years of education (0-20), with the OLS fitted values line.]
Table 1: interpretation of β2 in logarithmic functional forms.

Model                  Dep. var.   Regressor   Interpretation of β2
level-log              y           ln(x)       Δy ≈ (β2 /100) %Δx
log-level (semi-log)   ln(y)       x           %Δy ≈ 100 β2 Δx
log-log                ln(y)       ln(x)       %Δy = β2 %Δx

Note: the log-level approximation is precise for small values of β2 .
Semi-log specification
A useful approximation for the natural logarithm, for small x, is
ln(1 + x) ≈ x
This linear approximation is very close for |x| < 0.1, and reasonably close
for 0.1 < |x| < 0.2, but the difference increases with |x|.
Now, if y* is c% greater than y, then
y* = (1 + c/100) y.
Taking natural logarithms,
ln(y*) = ln(y) + ln(1 + c/100)
or
ln(y*) − ln(y) = ln(1 + c/100) ≈ c/100
1.1 Omitted variable bias: the simple case

Suppose the true model includes ability:
ln(wagei ) = β1 + β2 educi + β3 abili + εi   (5)
but we estimate the short regression
ln(wagei ) = β1 + β2 educi + νi   (6)
where νi = β3 abili + εi .
Notice that
E(νi | educi ) = E(β3 abili + εi | educi ) = β3 E(abili | educi ) ≠ 0
because abili and educi are positively correlated (E(abili | educi ) > 0).
In case of equation (6), assumption 1.2 (strict exogeneity) is violated! In
other words, a problem of omitted variable bias arises.
Omitted variable bias. Let β̃2 be the estimator of β2 from a simple
regression of ln(wagei ) on educi (cf. equation (6)). Write
yi = ln(wagei ), x2i = educi and x3i = abili .
Model (5) can be rewritten as follows:
yi = β1 + β2 x2i + β3 x3i + εi   (7)
One can show that
E(β̃2 | X) = β2 + β3 δ̃32   (8)
where δ̃32 is the slope from a regression of x3i on x2i (and a constant);
with β3 > 0 and δ̃32 > 0, β̃2 is biased upwards.
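A small simulation sketch of this bias in Stata (all parameter values are made up for illustration):

* simulate model (5) with corr(educ, abil) > 0, then estimate (6)
clear
set obs 10000
set seed 12345
generate abil = rnormal()
generate educ = 12 + 2*abil + rnormal()          // educ and abil positively correlated
generate lnwage = 1 + 0.05*educ + 0.10*abil + rnormal(0, 0.3)
regress lnwage educ         // short regression: slope exceeds the true 0.05
regress lnwage educ abil    // long regression: recovers both coefficients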
1.2 Inclusion of irrelevant variables in a regression model

Suppose the true model is yi = β1 + β2 x2i + εi , but we estimate
yi = β1 + β2 x2i + β3 x3i + εi   (12)
which includes the irrelevant variable x3i ; in terms of (12) the true
coefficient vector is β = (β1 , β2 , 0)′.
E(β̂ | X) = E((X′X)⁻¹X′y | X) = β
E(β̂) = E(E(β̂ | X)) = β
In other words, β̂ is an unbiased estimator. Moreover, β̂ is linear in y.
However, the regression including the irrelevant variable generally yields
less precise estimates of (β1 , β2 , β3 ) than the short regression.
1.3 The variance of the OLS estimator revisited

Under Assumptions 1.1-1.4, the variance of the OLS estimator of slope
coefficient j can be written as
Var(bj | X) = σ² / [Σi=1..n (xji − x̄j )² (1 − Rj²)],   j = 2, . . . , K   (14)
where Rj² is the R-squared from regressing xj on all other independent
variables (and including an intercept).
Thus Var(bj | X) is small when the error variance σ² is small, the total
variation Σi=1..n (xji − x̄j )² of regressor j is large, and Rj² is small (little
multicollinearity). The factor 1/(1 − Rj²) is known as the variance
inflation factor.
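A one-line Stata check of the multicollinearity factor in equation (14), assuming the wage data are loaded:

quietly regress log_wage educ exper expersq
estat vif        // reports 1/(1 - R_j^2) for each regressor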
Adding experience and its square:

                                     Number of obs =     526
                                     F(3, 522)     =   74.67
                                     Prob > F      =  0.0000
                                     R-squared     =  0.3003
                                     Adj R-squared =  0.2963
                                     Root MSE      =  .44591

    log_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0903658    .007468    12.10   0.000     .0756948    .1050368
       exper |   .0410089   .0051965     7.89   0.000     .0308002    .0512175
     expersq |  -.0007136   .0001158    -6.16   0.000     -.000941   -.0004861
       _cons |   .1279975   .1059323     1.21   0.227    -.0801085    .3361035
The same specification for a different sample (n = 935):
log(wagei ) = β1 + β2 educi + β3 experi + β4 expersqi + εi

Nobs = 935; F(3, 931) = 46.75; Prob > F = 0.0000; R^2 = 0.1309;
Adj R^2 = 0.1281; Root MSE = .39324

    log_wage |      Coef.   Std. Err.      t
-------------+---------------------------------
        educ |   .0779866    .0066242   11.77
       exper |    .016256      .01354    1.20
     expersq |    .000152     .000567    0.27
       _cons |   5.517432    .1248186   44.20
A restricted specification on the same data:
Nobs = 935; F(2, 932) = 70.16; Prob > F = 0.0000; R^2 = 0.1309;
Adj R^2 = 0.1290; Root MSE = .39304
Adding a female dummy:

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(4, 521)     =   86.69
       Model |  59.2711314     4  14.8177829           Prob > F      =  0.0000
    Residual |    89.05862   521   .17093785           R-squared     =  0.3996
-------------+------------------------------           Adj R-squared =  0.3950
       Total |  148.329751   525   .28253286           Root MSE      =  .41345

    log_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0841361   .0069568    12.09   0.000     .0704692    .0978029
       exper |     .03891   .0048235     8.07   0.000      .029434    .0483859
     expersq |   -.000686   .0001074    -6.39   0.000     -.000897   -.0004751
      female |  -.3371868   .0363214    -9.28   0.000    -.4085411   -.2658324
       _cons |    .390483   .1022096     3.82   0.000     .1896894    .5912767
23
24
If the regression model is to have dierent intercepts for, say, g categories, we need to include g 1
dummy variables in the model along with an intercept.
Example: In example below, we distinguish 4 different regions (g = 4):
1.
2.
3.
4.
. reg log_wage educ exper expersq female married northcen south west

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(8, 517)     =   45.95
       Model |  61.6387993     8  7.70484991           Prob > F      =  0.0000
    Residual |  86.6909521   517  .167680758           R-squared     =  0.4156
-------------+------------------------------           Adj R-squared =  0.4065
       Total |  148.329751   525   .28253286           Root MSE      =  .40949

    log_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0808207   .0070101    11.53   0.000      .067049    .0945925
       exper |   .0363615   .0052269     6.96   0.000     .0260929    .0466301
     expersq |   -.000645   .0001128    -5.72   0.000    -.0008665   -.0004235
      female |  -.3345661   .0364315    -9.18   0.000     -.406138   -.2629941
     married |   .0711934   .0417725     1.70   0.089    -.0108712     .153258
    northcen |   -.070182   .0519674    -1.35   0.177    -.1722752    .0319111
       south |  -.1162238   .0486825    -2.39   0.017    -.2118637   -.0205839
        west |     .04643   .0576308     0.81   0.421    -.0667894    .1596494
       _cons |   .4625799   .1074952     4.30   0.000     .2513988     .673761
Interpretation of bsouth : an employee living in the Southern part of the US
earns about 11.6% less than an employee living in the North East (the
reference group), keeping other factors fixed.
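As a sketch, Stata's factor-variable notation generates the g − 1 dummies automatically; this assumes a categorical variable region with 4 values (1 = North East as base category):

regress log_wage educ exper expersq female i.region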
Partition the model as
y = X1 β1 + X2 β2 + ε   (16)
and define
P1 ≡ X1 (X1′X1 )⁻¹X1′,  M1 ≡ I − P1 ,  ỹ ≡ M1 y,  X̃2 ≡ M1 X2   (17)

Frisch-Waugh theorem:
1. Under assumptions 1.1-1.3, b2 = (X̃2′X̃2 )⁻¹X̃2′ỹ. That is, b2 can be
obtained by regressing the residuals ỹ on the matrix of residuals X̃2 .
2. The residuals from the regression of ỹ on X̃2 numerically equal e, the
residuals from the regression of y on X = (X1 ⋮ X2 ).
3. Under assumptions 1.1-1.4, Var̂(b2 ) = s²(X̃2′X̃2 )⁻¹.
4. b2 = (X̃2′X̃2 )⁻¹X̃2′y. That is, b2 can also be obtained by regressing y
(rather than ỹ) on the matrix of residuals X̃2 .
Comments on the Frisch-Waugh theorem
Consider the following simple regression model
yi = β1 x1i + β2 x2i + εi
where x1i = 1 (the intercept).
If we stack the observations we obtain
y = β1 ιn + β2 x2 + ε
where ιn = (1, . . . , 1)′ is an (n × 1) vector of ones.
Now we apply the Frisch-Waugh theorem with x1 = ιn and
P1 = ιn (ιn′ιn )⁻¹ιn′ = (1/n) ιn ιn′.
Notice that
P1 ιn = ιn
M1 ιn = (In − P1 )ιn = ιn − ιn = 0
ỹ = M1 y = y − ȳ ιn , where ȳ = (1/n) Σi=1..n yi (sample average of yi ), so
ỹi = yi − ȳ (deviation from the average)
x̃2 = M1 x2 = x2 − x̄2 ιn
x̃2i = x2i − x̄2 (deviation from the average)
Part 1 of the Frisch-Waugh theorem says that we can obtain the OLS
estimate b2 by regressing ỹ on x̃2 . Then we obtain
b2 = Σi=1..n (x2i − x̄2 )(yi − ȳ) / Σi=1..n (x2i − x̄2 )² = rx2,y (sy /sx2 )   (18)
where rx2,y is the sample correlation and sy , sx2 the sample standard
deviations.
Proof of part 1. Partition the normal equations (X′X)b = X′y as
X1′X1 b1 + X1′X2 b2 = X1′y   (19)
X2′X1 b1 + X2′X2 b2 = X2′y   (20)
From (19), b1 = (X1′X1 )⁻¹X1′(y − X2 b2 ). Substituting into (20) and using
that M1 = I − P1 is symmetric and idempotent:
X2′(I − P1 )X2 b2 = X2′(I − P1 )y
X2′M1 y = X2′M1′M1 y = X̃2′ỹ and X2′M1 X2 = X̃2′X̃2
so
b2 = (X̃2′X̃2 )⁻¹X̃2′ỹ
q.e.d.
Proof of part 2 (sketch). The full-regression residuals satisfy
e = y − X1 b1 − X2 b2 , and
M1 e = e because
M1 e = (I − P1 )e
     = e − P1 e
     = e − X1 (X1′X1 )⁻¹X1′e
     = e
since X1′e = 0 by the normal equations. Applying M1 to
e = y − X1 b1 − X2 b2 then gives e = ỹ − X̃2 b2 , the residual from the
regression of ỹ on X̃2 .   (21)
Proof of part 4:
b2 = (X̃2′X̃2 )⁻¹X̃2′ỹ
   = (X̃2′X̃2 )⁻¹X2′M1′M1 y
   = (X̃2′X̃2 )⁻¹X2′M1 y
   = (X̃2′X̃2 )⁻¹X̃2′y
q.e.d.
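A short Stata demonstration of the theorem (a sketch, assuming the wage data are loaded), with X1 = (constant, exper, expersq) and X2 = educ:

quietly regress log_wage exper expersq
predict ytilde, resid                   // ytilde = M1*y
quietly regress educ exper expersq
predict x2tilde, resid                  // x2tilde = M1*X2
regress ytilde x2tilde                  // slope equals b_educ from the full model
regress log_wage educ exper expersq     // compare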
Introduction to Econometrics
Coordinator: prof. dr. Rob Alessie
email: r.j.m.alessie@rug.nl
office: DUI 727, tel: 050-3637240
February 11, 2016
Lecture 4
Hayashi, chapter 1: section 1.4 (except page 45)
Organization of the lecture
1. Hypothesis Testing under Normality
Normally Distributed Error Terms
Testing Hypotheses about Individual Regression Coefficients
Decision Rule for the t-Test
Confidence Interval
p-Value
Example t-test
Linear Hypotheses
The F -Test
A more convenient expression for the F test
t versus F
Consider the linear regression model
yi = xi′β + εi ,   i = 1, . . . , n   (1)
Example: a log-linear Cobb-Douglas production function,
ln(Yi ) = α + β1 ln(Li ) + β2 ln(Ki ) + εi ,   i = 1, . . . , n   (2)
where Yi denotes output, Li labor input, and Ki capital. If production
technology exhibits constant returns to scale, then β1 + β2 = 1.
The unbiasedness property of OLS (E(b | X) = β) guarantees that (cf.
equation (2))
E(b1 + b2 ) = β1 + β2 = 1
if the restriction of constant returns to scale is true.
However, given the sampling error, b1 + b2 is not exactly equal to 1 for a
particular sample at hand.
Obviously, we cannot conclude that the restriction is false just because the
estimate b1 + b2 differs from 1.
To decide whether the sampling error
b1 + b2 − 1
is too large for the restriction to be true, we need to construct from the
sampling error some test statistic whose probability distribution is known
given the truth of the hypothesis.
1.1 Normally distributed error terms

Assumption 1.5 (normality): conditional on X, the error vector ε is jointly
normal. Combined with
E(ε | X) = 0 and Var(ε | X) = σ² In ,
the conditional density is
f(ε | X) = f(ε) = (2πσ²)^(−n/2) exp(−ε′ε/(2σ²))
In other words
ε | X ~ N(0, σ² In ),  ε ~ N(0, σ² In )   (3)
Since b − β = (X′X)⁻¹X′ε is linear in ε,
b − β | X ~ N(0, σ²(X′X)⁻¹)   (5)
and, for each element,
bk − βk | X ~ N(0, σ²(X′X)⁻¹kk )   (6)
where (X′X)⁻¹kk is the (k, k) element of (X′X)⁻¹.
Standardizing bk − βk gives
zk ≡ (bk − βk ) / √(σ²(X′X)⁻¹kk ) ~ N(0, 1)   (7)
Since σ² is unknown, replace it by s², which yields the t-ratio
tk ≡ (bk − βk ) / √(s²(X′X)⁻¹kk ) = (bk − βk ) / √(Var̂(b | X)kk )   (8)
Proposition 1.3 (distribution of the t-ratio): suppose Assumptions
1.1,...,1.5 hold. Under the null hypothesis H0 : βk = β̄k , the t-ratio defined
as in equation (8) is distributed as t(n − K) (the t-distribution with
n − K degrees of freedom).
Some facts:
Fact 1: If
1. x ~ N(0, 1)
2. y ~ χ²(m)
3. x and y are independent
then the ratio x/√(y/m) ~ t(m).
Applying this to the t-ratio:
tk = (bk − βk ) / √(s²(X′X)⁻¹kk )
   = zk / √(s²/σ²)
   = zk / √((e′e/(n − K))/σ²)
   = zk / √(q/(n − K))   (9)
where q ≡ e′e/σ². One can show that q | X ~ χ²(n − K) and that zk and q
are independent conditional on X, so tk ~ t(n − K).
1.4 Confidence interval

Inverting the t-test yields a 1 − α confidence interval for βk :
bk ± t α/2 (n − K) · SE(bk )   (10)
where t α/2 (n − K) is the critical value of the t(n − K) distribution.

1.5 p-Value

The p-value is the probability, under H0 , that a t(n − K)-distributed random
variable exceeds the observed t-ratio in absolute value; H0 is rejected when
the p-value is below the significance level α.
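A quick Stata sketch of the critical value and confidence bounds, using the numbers from the example below (n − K = 931):

display invttail(931, 0.025)                        // critical value, about 1.96
display .0779866 - invttail(931,0.025)*.0066242     // lower bound for beta_educ
display .0779866 + invttail(931,0.025)*.0066242     // upper bound for beta_educ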
1.6
Example t-test
                                     Number of obs =     935
                                     F(3, 931)     =   46.75
                                     Prob > F      =  0.0000
                                     R-squared     =  0.1309
                                     Adj R-squared =  0.1281
                                     Root MSE      =  .39324

    log_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0779866   .0066242    11.77   0.000     .0649864    .0909867
       exper |    .016256     .01354     1.20   0.230    -.0103164    .0428285
     expersq |    .000152    .000567     0.27   0.789    -.0009607    .0012647
       _cons |   5.517432   .1248186    44.20   0.000     5.272474    5.762391

t-ratio for H0 : β2 = 0 (educ):
(b2 − 0)/se(b2 ) = .0779866/.0066242 = 11.77
1.7 Linear hypotheses

A joint null hypothesis can be written as
H0 : Rβ = r
where R is a (#r × K) matrix of full row rank and r a (#r × 1) vector.
Example (K = 4): H0 : β3 = 0 and β4 = 0 corresponds to

R = ( 0 0 1 0 )       r = ( 0 )
    ( 0 0 0 1 ),          ( 0 )

Other linear hypotheses, e.g. restrictions that also involve β2 or set a
coefficient equal to 1, fit the same Rβ = r format with suitable (R, r).
Throughout, the significance level is taken to be 0.05.
1.8 The F-test

To test H0 : Rβ = r, define the F-ratio
F ≡ (Rb − r)′[R Var̂(b | X) R′]⁻¹(Rb − r)/#r   (13)
Write F as
F = (w/#r) / (q/(n − K))
where q = e′e/σ² and
w = (Rb − r)′[σ² R(X′X)⁻¹R′]⁻¹(Rb − r)
We need to show
1) w | X ~ χ²(#r)
2) q | X ~ χ²(n − K) (this is part 1 in the proof of theorem 1.3)
3) w and q are independently distributed conditional on X.
Then, by the definition of the F distribution,
F ~ F(#r, n − K)
1) Let v ≡ Rb − r. Under H0 : Rb − r = R(b − β).
We already know that b − β | X ~ N(0, σ²(X′X)⁻¹).
So conditional on X, v is normal with mean 0 and variance
Var(v | X) = Var(R(b − β) | X) = R Var(b − β | X) R′ = σ² R(X′X)⁻¹R′
which is the inverse of the middle matrix in the quadratic form for w.
Hence,
w = v′ Var(v | X)⁻¹ v
Since R is of full row rank and X′X is nonsingular (assumption 1.3),
Var(v | X) = σ² R(X′X)⁻¹R′ is nonsingular (and we can therefore take the
inverse in the equation above).
Moreover, due to fact 4, w | X ~ χ²(#r).
1.9 A more convenient expression for the F-test

The F-ratio can also be computed from the restricted and unrestricted sums
of squared residuals, SSRR and SSRU :
F = [(SSRR − SSRU )/#r] / [SSRU /(n − K)]   (16)
The restricted estimator β̂ minimizes SSR(β̃) subject to Rβ̃ = r, with
Lagrangian
L(β̃, λ) = 0.5(y′y − 2y′Xβ̃ + β̃′X′Xβ̃) + λ′(r − Rβ̃)
First-order conditions (check!):
X′Xβ̂ = X′y + R′λ  ⟹  β̂ = b + (X′X)⁻¹R′λ   (17)
Rβ̂ = r   (18)
Obviously ε̂ = y − Xβ̂ = y − Xb + X(b − β̂) = e + X(b − β̂).
Then
SSRR = ε̂′ε̂ = (e + X(b − β̂))′(e + X(b − β̂))
     = SSRU + 2(b − β̂)′X′e + (b − β̂)′(X′X)(b − β̂)
     = SSRU + (b − β̂)′(X′X)(b − β̂)
since X′e = 0, so
SSRR − SSRU = (b − β̂)′(X′X)(b − β̂)   (20)
            = (Rb − r)′(R(X′X)⁻¹R′)⁻¹(Rb − r)
q.e.d.
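A Stata sketch of computing F from the two SSRs, for H0 : β_exper = β_expersq = 0 in the example below (#r = 2):

quietly regress log_wage educ exper expersq
scalar SSRu = e(rss)
scalar df = e(df_r)
quietly regress log_wage educ                  // restricted model
scalar SSRr = e(rss)
display "F = " ((SSRr - SSRu)/2) / (SSRu/df)   // compare with -test exper expersq-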
1.10 t versus F

                                     Number of obs =     935
                                     F(3, 931)     =   46.75
                                     Prob > F      =  0.0000
                                     R-squared     =  0.1309
                                     Adj R-squared =  0.1281
                                     Root MSE      =  .39324

    log_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0779866   .0066242    11.77   0.000     .0649864    .0909867
       exper |    .016256     .01354     1.20   0.230    -.0103164    .0428285
     expersq |    .000152    .000567     0.27   0.789    -.0009607    .0012647
       _cons |   5.517432   .1248186    44.20   0.000     5.272474    5.762391

. test exper expersq
 ( 1)  exper = 0
 ( 2)  expersq = 0
       F(  2,   931) =   17.95
            Prob > F =   0.0000

Although the individual t-ratios of exper and expersq are small, the joint
F-test strongly rejects H0 : β3 = β4 = 0.
Introduction to Econometrics
Coordinator: prof. dr. Rob Alessie
email: r.j.m.alessie@rug.nl
office: DUI 727, tel: 050-3637240
February 16, 2016
Overview Chapter 1
Assumptions
Assumption 1.1: the relationship between the dependent variable and the
regressors is linear:
yi = β1 xi1 + β2 xi2 + . . . + βK xiK + εi ,   i = 1, . . . , n   (1)
2.1 Modes of convergence

1) Convergence in probability: {zn } converges in probability to 0 if, for any
δ > 0, lim n→∞ P(|zn | > δ) = 0.   (3)
We write this as zn →p 0.
2) Almost sure convergence: P(lim n→∞ zn = 0) = 1.   (4)
We write this as zn →a.s. 0.
3) Mean square convergence: lim n→∞ E(zn²) = 0. We write this as
zn →m.s. 0. For each mode, zn → z is the same as zn − z → 0.
4) Convergence in distribution
Let {zn } be a sequence of random scalars and Fn be the cumulative
distribution function (c.d.f.) of zn .
We say that {zn } converges in distribution to a random scalar z if
lim n→∞ Fn (z) = F(z)
at every continuity point of F, the c.d.f. of z; we write zn →d z. Often F is
well known, e.g. zn →d N(0, 1).
If z is a constant,
zn →d z ⟺ zn →p z   (5)
2.2 Useful lemmas

Lemma 2.2 (relations among modes of convergence):
(a) zn →m.s. z ⟹ zn →p z
(b) zn →a.s. z ⟹ zn →p z
(c) zn →p z ⟹ zn →d z

Lemma 2.3 (preservation of convergence for continuous transformation):
suppose a(·) is a vector-valued continuous function that does not depend on n.
(a) zn →p α ⟹ a(zn ) →p a(α), or stated differently,
plim a(zn ) = a(plim zn ).

Lemma 2.4 (parts (a) and (c) are called Slutsky's theorem):
(a) xn →d x, yn →p α ⟹ xn + yn →d x + α
(b) xn →d x, yn →p 0 ⟹ yn′xn →p 0
(c) xn →d x, An →p A ⟹ An xn →d Ax, provided An and xn are conformable
(d) xn →d x, An →p A ⟹ xn′An⁻¹xn →d x′A⁻¹x, provided A is nonsingular   (6)

We say that two sequences zn and xn are asymptotically equivalent when
zn − xn →p 0; we then write
zn ~a xn or zn = xn + op (1)   (7)
Therefore, replacing yn by its probability limit does not change the
asymptotic distribution of yn xn , provided xn converges in distribution to
some random vector x.
Example: the sample mean. If {zi } is i.i.d. with mean µ and finite second
moments, then z̄n →m.s. µ, and consequently z̄n →p µ (lemma 2.2a).
Moreover,
√n (z̄n − µ) →d N(0, Σ)

[Figure 1: simulated densities of the sample mean x̄ for n = 4, n = 30 and n = 500; the density concentrates around the true mean as n grows.]
Central Limit Theorems (CLTs) are about the limiting behavior of the
difference between z̄n and E(z̄n ) = µ, scaled by √n. Lindeberg-Levy CLT:
if {zi } is i.i.d. with mean µ and variance Σ, then
√n (z̄n − µ) = (1/√n) Σi=1..n (zi − µ) →d N(0, Σ)

Example continued
The table below and figures 2 and 3 confirm the validity of the CLT
(Lindeberg-Levy), i.e. as the sample size increases the ratio
√n (x̄n − µ) = √n (x̄n − 1)
converges in distribution to N(0, 2) (remember that in the example
xi ~ χ²(1), so µ = 1 and var(xi ) = 2).
[Figure 2: simulated densities of √n(ȳ − 1) for n = 4, n = 30 and n = 500.]
[Figure 3: simulated density of √500(ȳ − 1) for n = 500, close to the N(0, 2) limit.]
In case of time-series data, the assumption of strict exogeneity is far
stronger than that of contemporaneous exogeneity (E(εi | xi ) = 0) (see
slides of lecture 1).
Suppose that the regressors include a constant (almost always the case).
Then assumption 2.3 implies
1. E(εi ) = 0
2. Cov(xi , εi ) = E(xi εi ) = 0. That is, all (nonconstant) regressors xi are
uncorrelated with the error term εi .
Assumption 2.4 (rank condition): Σxx ≡ E(xi xi′) is nonsingular.
Since by the random sample assumption 2.2 {xi } is i.i.d., {xi xi′} is i.i.d.,
so by Kolmogorov's second strong law of large numbers:
Sxx →a.s. Σxx
where
Sxx ≡ (1/n) Σi=1..n xi xi′ = (1/n) X′X
Hence, for n large enough, Sxx is nonsingular and
b = Sxx⁻¹ sxy can be computed (sxy ≡ (1/n) Σi=1..n xi yi ).
3.2 Consistency of the OLS estimator

Let
gi ≡ xi εi ,  ḡ ≡ (1/n) Σi=1..n gi = (1/n) Σi=1..n xi εi .
Proposition 2.1.a (consistency of b for β): under Assumptions 2.1-2.4,
plim n→∞ b = β
Proof:
The sampling error b − β can be written as
b − β = (X′X)⁻¹X′ε
      = ((1/n) X′X)⁻¹ ((1/n) X′ε)
      = ((1/n) Σi=1..n xi xi′)⁻¹ ((1/n) Σi=1..n xi εi )
      = Sxx⁻¹ ḡ   (8)
where
gi = xi εi ,  ḡ = (1/n) Σi=1..n gi
Sxx⁻¹ →p Σxx⁻¹ by lemma 2.3, and ḡ →p E(gi ) = 0 by the LLN and
assumption 2.3. So
plim n→∞ (b − β) = Σxx⁻¹ · 0 = 0, i.e.
plim n→∞ b = β
q.e.d.
Proposition 2.1.b (asymptotic normality): under Assumptions 2.1-2.5,
√n (b − β) →d N(0, Σxx⁻¹ S Σxx⁻¹)
Proof
Rewrite (8) as
√n (b − β) = Sxx⁻¹ (√n ḡ)   (10)
Notice that
1. by assumptions 2.1 and 2.2, {gi } (= {xi εi }) is i.i.d.
2. by assumption 2.3, E(gi ) = 0
3. by assumptions 2.5 and 2.3,
var(gi ) = E(gi gi′) = S
We can therefore apply Lindeberg-Levy's CLT and state that
√n ḡ →d N(0, S)
Combining with Sxx⁻¹ →p Σxx⁻¹ and lemma 2.4(c),
√n (b − β) →d N(0, Σxx⁻¹ S Σxx⁻¹)   (11)
so Avar(b) = Σxx⁻¹ S Σxx⁻¹.
3.3 s² is consistent

Proposition 2.2: under assumptions 2.1-2.4 (and finite E(εi²)),
s² = e′e/(n − K) →p σ².

Hypothesis testing
√n (bk − βk ) →d N(0, Avar(bk )) and Avar̂(bk ) →p Avar(bk ), so
tk ≡ √n (bk − β̄k ) / √(Avar̂(bk )) = (bk − β̄k ) / SE*(bk ) →d N(0, 1)   (12)
where
SE*(bk ) ≡ √((1/n) Avar̂(bk ))
and
Avar̂(bk ) = (Sxx⁻¹ Ŝ Sxx⁻¹)kk
Proposition 2.3: suppose, in addition, that Ŝ →p S. Then
(a) under H0 : βk = β̄k , tk as defined in (12) →d N(0, 1)
(b) under H0 : Rβ = r,
W ≡ n (Rb − r)′ [R Avar̂(b) R′]⁻¹ (Rb − r) →d χ²(#r)   (13)
Proof of part b
Write W as
W = cn′ Q̂⁻¹ cn
where
cn ≡ √n (Rb − r) and Q̂ ≡ R Avar̂(b) R′
Since the #r-dimensional random vector cn is asymptotically normally
distributed with variance Q = R Avar(b) R′, and since Q̂ →p Q, the limit of
cn′ Q̂⁻¹ cn is distributed as χ²(#r).
q.e.d.
Conditional homoskedasticity:
E(εi² | xi ) = σ²   (14)
Then
S = E(gi gi′) = E(xi xi′ εi²)
  = E[E(xi xi′ εi² | xi )]
  = E[xi xi′ E(εi² | xi )]
  = E[xi xi′ σ²]
  = σ² E(xi xi′) = σ² Σxx   (15)
(using the law of total expectations; cf. lemma 2.3.a).
Since by proposition 2.2 s² →p σ², the estimator Ŝ = s² Sxx is consistent
for S.
Substituting (15) into (11) yields the following simplified expression for
Avar(b):
Avar(b) = σ² Σxx⁻¹   (16)
with consistent estimate
Avar̂(b) = s² Sxx⁻¹ = n s² (X′X)⁻¹   (17)
The t-ratio then reduces to the usual expression:
tk = √n (bk − β̄k ) / √((n s²(X′X)⁻¹)kk ) = (bk − β̄k ) / SE(bk )   (18)
where SE(bk ) denotes the usual standard error of bk .
The Wald statistic defined in equation (13) simplifies (see page 128 of the
book) to
W = #r · F = (SSRR − SSRU )/s²   (19)
2. E(εi ) = 0, var(εi ) = 2, skewness √8 and kurtosis 15 (by contrast, a
normal error has skewness 0 and kurtosis 3).
3. 1000 simulations performed: b2 , se2 , t-values of the t-test of H0 : β2 = 2
and the outcome of a two-sided test of H0 at (nominal) significance
level 0.05.
4. Notice that this example satisfies assumptions 2.1-2.5. Therefore we
would expect that b2 is a consistent estimator.
5. Moreover, we expect that the t-ratio formulated above follows a N(0, 1)
distribution, especially in the case n = 1500 (the t-distribution and
standard normal distribution almost coincide if n is large, n > 120).
6. See Stata program simulate_mean.do for the simulations.
7. Simulation results indicate that
(a) the mean of the point estimates is very close to the true value of β2
(irrespective of sample size).
This confirms that OLS is an unbiased estimator.
(b) The standard deviation of the point estimates is close to the mean
of the standard errors.
(c) The distribution of the t-ratio converges to a standard normal
distribution as the sample size increases (the distribution becomes less
right-skewed as the sample size increases).
This is a confirmation of proposition 2.3.b (2.5b, see below).
(d) The row reject2f indicates the fraction of rejections (out of 1000
simulations) of H0 : β2 = 2. It basically measures the actual size of the
test (the probability of a Type I error). The actual size is close to the
nominal size (see below).
n=150
. tabstat b2f se2f t2f reject2f, stat(mean sd sk ku min max) long col(stat)

    variable |      mean        sd  skewness  kurtosis        min       max
-------------+----------------------------------------------------------------
         b2f |  2.000506    .08427  .5324175  4.206323   1.719513   2.40565
        se2f |  .0839776  .0172588  .4718556  3.067525   .0415919   .145264
         t2f |  .0028714  .9932668  .5252773  3.664809  -2.824061  4.556576
    reject2f |      .046  .2095899  4.334438  19.78735          0         1

n=1500
. tabstat b2f se2f t2f reject2f, stat(mean sd sk ku min max) long col(stat)

    variable |      mean        sd  skewness  kurtosis        min       max
-------------+----------------------------------------------------------------
         b2f |  1.999467  .0266265  .3857573  3.136166   1.925347  2.103468
        se2f |  .0258293   .001761  .2088724  3.007599   .0201466   .031988
         t2f | -.0211082   1.02924  .3641272  3.093189  -3.016825  3.843584
    reject2f |      .052  .2221381  4.035545  17.28562          0         1
34
S
ei xi xi
n i=1
n
(20)
1
1
\ = S 1
Avar(b)
e2i xi xi Sxx
(21)
xx
n i=1
proof for special case K = 1
Since K = 1, b, and xi are scalars
Assumption 2.5: S exists and is nite
b
(proposition 2.1)
p
35
Obviously
ei² = ((yi − xi β) − (b − β)xi )²
    = (εi − (b − β)xi )²
    = εi² − 2(b − β)xi εi + (b − β)² xi²
Pre-multiplying this equation with xi² and averaging over i yields
(1/n) Σ xi² ei² = (1/n) Σ xi² εi² − 2(b − β)(1/n) Σ xi³ εi + (b − β)²(1/n) Σ xi⁴   (22)
1) (1/n) Σi=1..n xi² εi² →p S
2) provided (1/n) Σ xi³ εi and (1/n) Σ xi⁴ converge in probability to finite
limits (see analytical exercise), the last two terms vanish because
b − β →p 0. So
(1/n) Σi=1..n xi² ei² − (1/n) Σi=1..n xi² εi² →p 0
or Ŝ − S →p 0, i.e. Ŝ →p S
q.e.d.
5.1 Degrees-of-freedom correction

In finite samples, a degrees-of-freedom-corrected version of (20) is often used:
Ŝ = (1/(n − K)) Σi=1..n ei² xi xi′   (23)
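In Stata, a sketch of this estimator in practice: the option vce(robust) of -regress- reports White standard errors with a small-sample correction of this type, for instance (with the wage data used earlier)

regress log_wage educ exper expersq, vce(robust)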
. * simulation results with White standard errors
. tabstat b2f se2f t2f reject2f, stat(mean sd sk ku min max) long col(stat)

    variable |      mean        sd  skewness  kurtosis        min       max
-------------+----------------------------------------------------------------
         b2f |  1.999467  .0266265  .3857573  3.136166   1.925347  2.103468
        se2f |  .0253119  .0040947  1.268405  6.061013   .0161489  .0493915
         t2f | -.1157212  1.054541 -.1146999  3.109138  -4.511499  3.327462
    reject2f |      .059  .2357426  3.743241  15.01185          0         1

. set seed 10101
. global numobs 15000
. * simulation results, n = 15000
. tabstat b2f se2f t2f reject2f, stat(mean sd sk ku min max) long col(stat)

    variable |      mean        sd  skewness  kurtosis        min       max
-------------+----------------------------------------------------------------
         b2f |  2.000018  .0080713  .1267341  2.752077   1.977108   2.02643
        se2f |   .008128  .0004615  .6295791  4.125336   .0069674  .0106837
         t2f | -.0305723  .9933768 -.0500329  2.711769  -3.070038   2.82062
    reject2f |      .047  .2117447  4.280878  19.32591          0         1
Under conditional homoskedasticity, the robust and conventional variance
estimators are asymptotically equivalent:
Ŝ − s² Sxx = (1/n) Σi=1..n ei² xi xi′ − s² (1/n) Σi=1..n xi xi′
           = (1/n) Σi=1..n (ei² − s²) xi xi′ →p O   (24)
so both lead to the same asymptotic inference when E(εi² | xi ) = σ².   (25)
Example: labor demand, estimated in levels.

                                     Number of obs =     569
                                     F(3, 565)     = 2716.02
                                     Prob > F      =  0.0000
                                     R-squared     =  0.9352
                                     Adj R-squared =  0.9348
                                     Root MSE      =  156.26

       labor |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        wage |  -6.741904   .5014054   -13.45   0.000     -7.72675   -5.757057
      output |   15.40047   .3556333    43.30   0.000     14.70194    16.09899
     capital |  -4.590491   .2689693   -17.07   0.000    -5.118793   -4.062189
       _cons |   287.7186   19.64175    14.65   0.000     249.1388    326.2984

. scalar s=sqrt(e(rss)/e(df_r))
White test: regress the squared residuals res2 on the regressors, their
squares and cross products.

                                     Number of obs =     569
                                     F(9, 559)     =  279.41
                                     Prob > F      =  0.0000
                                     R-squared     =  0.8181
                                     Adj R-squared =  0.8152
                                     Root MSE      =   62443

        res2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |  -2573.292   512.1794    -5.02   0.000    -3579.324   -1567.261
     capital |   2810.435   663.0731     4.24   0.000     1508.016    4112.854
        wage |   554.3507   833.0281     0.67   0.506    -1081.897    2190.598
outputoutput |   27.59449   1.836334    15.03   0.000     23.98753    31.20144
  outputwage |   58.53846    8.11748     7.21   0.000     42.59397    74.48295
outputcapi~l |  -40.02937   3.746344   -10.68   0.000      -47.388   -32.67073
    wagewage |  -10.07189   9.290223    -1.08   0.279     -28.3199    8.176121
 wagecapital |  -48.24574   14.01991    -3.44   0.001    -75.78389   -20.70759
capitalcap~l |   14.41762   2.010053     7.17   0.000     10.46944     18.3658
       _cons |  -260.8893   18478.55    -0.01   0.989    -36556.77    36034.99

. scalar LM=e(r2)*e(N)
. scalar pval2=chi2tail(e(df_m),LM)
. dis LM
465.51928
. dis pval2
1.377e-94

. /* rerun the regression with white standard errors;
outreg2 also reports the results of the white test
(rejection of H_0) */
. reg labor output capital wage, robust
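As a sketch, the same test can be obtained with Stata's built-in command after -regress-:

quietly regress labor wage output capital
estat imtest, white     // reports the n*R^2 statistic from the auxiliary regression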
Generalized least squares (GLS). Suppose
Var(ε | X) = σ² V   (29)
with V a known, positive definite (n × n) matrix. Choose C such that
C V C′ = In (e.g. C = V^(−1/2)) and transform the model:   (30)
Cy = CXβ + Cε   (31)
Notice that
Var(Cε | X) = C Var(ε | X) C′ = σ² C V C′ = σ² In   (33)
In other words the transformed model (31) satisfies the Gauss-Markov
assumptions, including (A1.4).
Therefore the GLS estimator
β̂(V) = (X′V⁻¹X)⁻¹X′V⁻¹y   (34)
is BLUE.
Obviously
Var(β̂(V) | X) = σ² (X′V⁻¹X)⁻¹   (35)
Unbiased estimator for σ²:
σ̂² = (1/(n − K)) (Cy − CXβ̂)′(Cy − CXβ̂)
   = (1/(n − K)) (y − Xβ̂)′V⁻¹(y − Xβ̂)   (36)
Feasible GLS under multiplicative heteroskedasticity:
yi = xi′β + εi ,   i = 1, . . . , n   (37)
Suppose that
Var(εi | xi ) = σi² = σ² hi² = σ² exp{α1 zi1 + . . . + αJ ziJ } = σ² exp{zi′α}   (38)
Problem: the parameter vector α is not known.
We need a consistent estimate of this vector. This estimate can be obtained
by running the following regression:
log ei² = log σ² + zi′α + νi   (39)
With α̂ in hand, set ĥi = exp(zi′α̂/2) and run OLS on the transformed
variables yi /ĥi and xi /ĥi (do not forget to transform the constant term!).
See the Stata do-file labour2.do for an implementation of feasible GLS.
Estimation results, labor demand in levels (standard errors in parentheses;
*** p<0.01, ** p<0.05, * p<0.1):

                     (1)          (2)           (3)           (4)
VARIABLES            OLS          OLS level     alpha param   FGLS level
                                  white s.e.
wage                -6.742***    -6.742***      0.00842      -2.638***
                    (0.501)      (1.858)       (0.00713)     (0.318)
output              15.40***     15.40***       0.0341***    17.06***
                    (0.356)      (2.491)       (0.00506)     (0.500)
capital             -4.590***    -4.590***     -0.0201***    -3.724***
                    (0.269)      (1.719)       (0.00383)     (0.235)
Constant            287.7***     287.7***       6.937***     116.6***
                    (19.64)      (65.11)       (0.279)       (11.46)
Observations        569          569            569          569
R-squared           0.935        0.935          0.124        0.900
s                   156.3
adj R squared       0.935        0.935
White test                       465.5
p-value White test               0
Optimal prediction. Normally the econometrician is concerned with
estimating unknown parameters from a sample.
However, assume for the moment that we know the joint distribution of
(y, x), where y is the dependent variable and x a random vector. Moreover,
we know the value of x.
On the basis of this knowledge we wish to predict y.
A predictor is a function f(x) of x (f(·) determined by the joint distribution
of (y, x)).
Forecast error: y − f(x)
Mean squared error: E[(y − f(x))²]
8.2 Best linear predictor

Let β* satisfy the orthogonality condition E[x(y − x′β*)] = 0. For any other
coefficient vector β̃,
E[(y − x′β̃)²] = E[((y − x′β*) + x′(β* − β̃))²]
             = E[(y − x′β*)²] + 2(β* − β̃)′ E[x(y − x′β*)]
               + (β* − β̃)′ E(xx′)(β* − β̃)
             = E[(y − x′β*)²] + (β* − β̃)′ E(xx′)(β* − β̃)
             ≥ E[(y − x′β*)²]
q.e.d.
The best linear predictor including a constant is
E*(y | x) ≡ E*(y | 1, x) = α + x′γ   (46a)
where
γ = Var(x)⁻¹ Cov(x, y) and α = E(y) − γ′E(x)   (46b)
Proof of equation (46):
If (α, γ) are the least squares projection coefficients, they satisfy
E(x̆ x̆′) (α, γ′)′ = E(x̆ y), with x̆ = (1, x′)′   (47)
or
[ 1      E(x′)  ] [α]   [ E(y)  ]
[ E(x)   E(xx′) ] [γ] = [ E(xy) ]   (48)
or
α = E(y) − E(x′)γ   (49)
E(x)α + E(xx′)γ = E(xy)   (50)
Substituting (49) into (50):
E(x)(E(y) − E(x′)γ) + E(xx′)γ = E(xy)   (51)
or
(E(xx′) − E(x)E(x′))γ = E(xy) − E(x)E(y)   (52)
or
Var(x)γ = Cov(x, y)  ⟹  γ = Var(x)⁻¹ Cov(x, y)   (53)
Replacing population moments by sample moments gives
((1/n) Σi=1..n xi xi′)⁻¹ (1/n) Σi=1..n xi yi = (X′X)⁻¹X′y
which is the usual OLS estimator b.
That is, under Assumption 2.2 (random sample) and Assumption 2.4
guaranteeing the nonsingularity of E(xx′), the OLS estimator is always
consistent for the projection coefficient vector, the β that satisfies the
orthogonality condition (45).
One also needs assumptions 2.1 (linearity) and 2.3 (predetermined
regressors) in order to attach an economic interpretation to β and its OLS
estimate b.
Estimation results, labor demand in logs (standard errors in parentheses;
*** p<0.01, ** p<0.05, * p<0.1):

                     (1)          (2)           (3)           (4)
VARIABLES            OLS log      OLS log       alpha param   FGLS log
                                  white s.e.
ln wage             -0.928***    -0.928***     -0.0611       -0.856***
                    (0.0714)     (0.0867)      (0.344)       (0.0719)
ln output            0.990***     0.990***      0.267**       1.035***
                    (0.0264)     (0.0468)      (0.127)       (0.0273)
ln capital          -0.00370     -0.00370      -0.331***     -0.0569***
                    (0.0188)     (0.0379)      (0.0904)      (0.0216)
Constant             6.177***     6.177***     -3.254***      5.895***
                    (0.246)      (0.294)       (1.185)       (0.248)
Observations         569          569           569           569
R-squared            0.843        0.843         0.024         0.990
s                    0.465
adj R squared        0.842        0.842
White test                        58.54
p-value White test                2.56e-09
Week 5 (Jochem de Bresser, March 2016)

Outline
1. GMM in Stata
Optimal prediction

For any predictor f(x), decompose the squared forecast error around the
conditional mean:
(y − f(x))² = (y − E(y|x))² + 2(y − E(y|x))(E(y|x) − f(x)) + (E(y|x) − f(x))²
Taking expectations, the cross term vanishes (law of iterated expectations):
E(y − f(x))² = E(y − E(y|x))² + E(E(y|x) − f(x))²
which is minimized by f(x) = E(y|x): the conditional expectation is the
optimal predictor under mean squared error.
Proof.
We follow the add-and-subtract strategy once more. With β* the linear
projection coefficients and x̃ the regressor vector,
mean squared error = E[(y − x̃′β̃)²]
 = E[((y − x̃′β*) + x̃′(β* − β̃))²]
 = E[(y − x̃′β*)²] + 2(β* − β̃)′ E[x̃(y − x̃′β*)] + (β* − β̃)′ E(x̃x̃′)(β* − β̃)
 = E[(y − x̃′β*)²] + (β* − β̃)′ E(x̃x̃′)(β* − β̃)
 ≥ E[(y − x̃′β*)²]
since the middle term vanishes by the orthogonality condition.
Proof:
The projection coefficients (α, β̃) solve
[ 1       E(x̃′)   ] [α ]   [ E(y)   ]
[ E(x̃)    E(x̃x̃′) ] [β̃] = [ E(x̃ y) ]
From the first row,
α = E(y) − E(x̃′)β̃
Done with α; now substitute the expression for α into the second equation:
E(x̃)(E(y) − E(x̃′)β̃) + E(x̃x̃′)β̃ = E(x̃ y)
or
(E(x̃x̃′) − E(x̃)E(x̃′))β̃ = E(x̃ y) − E(x̃)E(y)
or
Var(x̃)β̃ = Cov(x̃, y)  ⟹  β̃ = Var(x̃)⁻¹ Cov(x̃, y)

In conclusion, the best linear predictor is E*(y | 1, x̃) = α + x̃′β̃ with the
coefficients above.
When is OLS inconsistent? The OLS estimator is
bOLS = ((1/n) Σi=1..n xi xi′)⁻¹ ((1/n) Σi=1..n xi yi )
Suppose the data generating process is yi = xi′β + ui + εi , so that the
composite error contains a component ui possibly correlated with xi . Then
bOLS = ((1/n) Σ xi xi′)⁻¹ ((1/n) Σ xi (xi′β + ui + εi ))
     = β + ((1/n) Σ xi xi′)⁻¹ ((1/n) Σ xi ui ) + ((1/n) Σ xi xi′)⁻¹ ((1/n) Σ xi εi )
By the law of large numbers,
bOLS →p β + Σxx⁻¹ E(xi ui )
so OLS is inconsistent whenever E(xi ui ) ≠ 0.
Week 5
18/84
March 2016
18 / 84
cov (xi , i )
var (xi )
E (xi i ) = E [(wi + ui ) (i 2 ui )] = 2 u2
var (xi ) = var (wi + ui ) = w2 + u2
Attenuation bias: b2 biased towards zero
Jochem de Bresser (RUG)
Week 5
19/84
March 2016
19 / 84
2
1 2 u 2
w + u
w2
b1,OLS of the
Week 5
20/84
March 2016
20 / 84
Week 5
21/84
March 2016
21 / 84
u2
2
> 0; cov (pi , i ) =
<0
1 1
1 1
cov (pi , qi )
var (pi )
Week 5
22/84
March 2016
22 / 84
Week 5
23/84
March 2016
23 / 84
Week 5
24/84
March 2016
24 / 84
If temperature tempi shifts supply but does not enter demand (and is
uncorrelated with εi ), then
β1 = cov(tempi , qi ) / cov(tempi , pi )
and the IV estimator replaces these covariances by their sample analogues
((1/n) Σ ...).
Neglected nonlinearity. yi = β0 + β1 xi + u(xi ) implies
dy/dx = β1 + du/dx
OLS is inconsistent for β1 since it measures dy/dx, not just β1 .
In a path diagram: zi → xi → yi .
Note that zi does not directly cause yi , but zi and yi are correlated
through xi .
The IV estimator (just-identified case, with instrument vector z) is
bIV = (z′x)⁻¹ z′y
General formulation
Notation:
yi : dependent variable of interest
zi : L-dimensional vector of regressors (some of which endogenous)
δ : parameter vector of interest
xi : K-dimensional vector of instruments (exogenous regressors and at
least 1 instrument for each endogenous regressor)
wi : vector of unique and non-constant elements of (yi , zi , xi )
Orthogonality conditions: E(gi ) = 0, where gi ≡ xi εi and εi = yi − zi′δ.
Identification
3.4: rank condition for identification: the K × L matrix
Σxz = E(xi zi′)
is of full column rank (rank equals L, the number of its columns).
To see why we need this, look at the orthogonality condition:
E(gi ) = E(g(wi ; δ)) = E(xi εi ) = E(xi (yi − zi′δ)) = 0
E(xi yi ) − E(xi zi′)δ = 0
Σxz δ = σxy
where Σxz = E(xi zi′) and σxy = E(xi yi ).
This is a system of K equations in L unknowns.
Unique solution if and only if Σxz is of full column rank.
In that case δ is identified.
Assumptions so far:
3.1: linearity: the equation to be estimated is linear
3.2: random sample: {wi } is an i.i.d. stochastic process
3.3: instrument exogeneity or instrument validity: all K variables
in xi are orthogonal to the error term: E(xi εi ) = E(gi ) = 0
3.4: rank condition for identification: the K × L matrix
Σxz = E(xi zi′) is of full column rank (rank equals L, the number of its
columns)
Method of moments (MM): replace the population moments by sample
analogues,
(1/n) Σi=1..n xi ε̃i = (1/n) Σi=1..n xi (yi − zi′δ̃) = 0
where ε̃i = yi − zi′δ̃.
MM amounts to choosing the δ̃ that solves this system of K equations with
L unknowns:
Sxz δ̃ = sxy
where Sxz = (1/n) Σi=1..n xi zi′ and sxy = (1/n) Σi=1..n xi yi .
If K = L, Sxz is square and (in large samples, by 3.4) invertible:
δ̂IV = Sxz⁻¹ sxy = ((1/n) Σi=1..n xi zi′)⁻¹ ((1/n) Σi=1..n xi yi )
If all regressors are predetermined (xi = zi ), the moment conditions become
(1/n) Σi=1..n zi ε̃i = (1/n) Σi=1..n zi (yi − zi′δ̃) = 0
(1/n) Σ zi yi − (1/n) Σ zi zi′ δ̃ = 0
Szz δ̃ = szy
so
δ̂IV = δ̂OLS = ((1/n) Σi=1..n zi zi′)⁻¹ ((1/n) Σi=1..n zi yi )
Back to the demand example:
qi = β0 + β1 pi + εi
with instrument vector xi = (1, tempi )′ and regressor vector zi = (1, pi )′.
The moment conditions are
E[(1, tempi )′ (qi − β0 − β1 pi )] = 0
and the IV estimator solves the sample analogue:

[β̂0]   [ 1               (1/n) Σ pi        ]⁻¹ [ (1/n) Σ qi        ]
[β̂1] = [ (1/n) Σ tempi   (1/n) Σ tempi pi  ]   [ (1/n) Σ tempi qi  ]
Generalized method of moments (GMM). When K > L the system
Sxz δ̃ = sxy has, in general, no exact solution. GMM minimizes a quadratic
form in the sample moments
gn (δ̃) ≡ (1/n) Σi=1..n xi ε̃i = (1/n) Σi=1..n xi yi − (1/n) Σi=1..n xi zi′ δ̃ = sxy − Sxz δ̃
and Ŵ is a weighting matrix that can be random and depend on the
sample size (estimated); Ŵ →p W, with W symmetric and positive
definite.
The first-order conditions yield a unique solution, the GMM estimator δ̂(Ŵ):
δ̂(Ŵ) = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ sxy
If K = L then Sxz is square and the GMM estimator reduces to the
IV estimator
δ̂IV = Sxz⁻¹ sxy
Sampling error
Model:
yi = zi′δ + εi
Multiply both sides from the left with xi and take averages:
sxy = Sxz δ + ḡ
where ḡ ≡ (1/n) Σi=1..n xi εi = (1/n) Σi=1..n g(wi ; δ) = gn (δ).
Substitute into the definition of the GMM estimator:
δ̂(Ŵ) = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ sxy
      = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ (Sxz δ + ḡ)
so the sampling error is
δ̂(Ŵ) − δ = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ ḡ
Consistency
δ̂(Ŵ) − δ = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ ḡ
Key term: ḡ = (1/n) Σi=1..n xi εi = (1/n) Σi=1..n g(wi ; δ) = gn (δ)
Proposition
Under assumptions 3.1-3.4 the GMM estimator is consistent:
plim n→∞ δ̂(Ŵ) = δ
since ḡ →p 0 (orthogonality conditions plus the LLN), Sxz →p Σxz and
Ŵ →p W.
Asymptotic normality
Key term: ḡ = (1/n) Σi=1..n xi εi = (1/n) Σi=1..n g(wi ; δ) = gn (δ)
Proposition
Under assumptions 3.1-3.5 the GMM estimator is asymptotically normal:
√n (δ̂(Ŵ) − δ) →d N(0, avar(δ̂(Ŵ)))
where
avar(δ̂(Ŵ)) = (Σxz′ W Σxz )⁻¹ Σxz′ W S W Σxz (Σxz′ W Σxz )⁻¹
and Σxz = E(xi zi′); plim Ŵ = W; S = E(gi gi′) = E(xi xi′ εi²).
Ingredients of the proof:
Sxz →p Σxz ; √n ḡ →d N(0, S) (CLT);
√n ḡ →a N(0, S) ⟹ √n A ḡ →a N(0, A S A′) with A = (Σxz′ W Σxz )⁻¹ Σxz′ W.
Consistent estimation of the asymptotic variance
δ̂(Ŵ) − δ = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ ḡ
Proposition
Suppose we have a consistent estimator Ŝ for S. Then, under random
sampling, we can estimate avar(δ̂(Ŵ)) consistently by
(Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ Ŝ Ŵ Sxz (Sxz′ Ŵ Sxz )⁻¹
Under 3.1, 3.2 and a finite second moment of the regressors (E(zi zi′) is
finite) we can show:
Proposition
The error variance can be estimated using any consistent estimator δ̂:
(1/n) Σi=1..n ε̂i² →p E(εi²)
if E(εi²) exists and is finite.

Estimation of S
Suppose E[(xik zil )²] exists and is finite for all k = 1, ..., K and
l = 1, ..., L.
Proposition
Use a consistent estimator δ̂ to calculate the residuals ε̂i . Then, under 3.1,
3.2, 3.5 and 3.6:
Ŝ = (1/n) Σi=1..n ε̂i² xi xi′
is consistent for S.
Optimal weighting matrix Ŵ
The asymptotic variances of GMM estimators indexed by W are given by
avar(δ̂(Ŵ)) = (Σxz′ W Σxz )⁻¹ Σxz′ W S W Σxz (Σxz′ W Σxz )⁻¹
Proposition
A lower bound on the asymptotic variance is given by (Σxz′ S⁻¹ Σxz )⁻¹.
That bound is achieved if plim Ŵ = S⁻¹:
Ŵ = Ŝ⁻¹ = ((1/n) Σi=1..n ε̂i² xi xi′)⁻¹
Two-step efficient GMM:
1. Choose any convenient matrix Ŵ, such as Ŵ = IK or Ŵ = Sxx⁻¹.
2. Use this Ŵ to compute a consistent estimate δ̂(Ŵ), the residuals ε̂i
and hence Ŝ.
3. Use Ŝ to calculate the efficient GMM estimator:
δ̂(Ŝ⁻¹) = (Sxz′ Ŝ⁻¹ Sxz )⁻¹ Sxz′ Ŝ⁻¹ sxy
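In Stata, a sketch of two-step efficient GMM with a heteroskedasticity-robust weighting matrix; y, z1, x1, x2 are placeholder names (z1 endogenous, x1 and x2 instruments):

ivregress gmm y (z1 = x1 x2), wmatrix(robust)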
Hypothesis testing
√n (δ̂(Ŵ) − δ) →a N(0, avar(δ̂(Ŵ)))
Under H0 : δl = δ̄l :
tl ≡ √n (δ̂l (Ŵ) − δ̄l ) / √(avar̂(δ̂(Ŵ))ll ) = (δ̂l (Ŵ) − δ̄l ) / SEl →d N(0, 1)
Under H0 : Rδ = r:
n (Rδ̂(Ŵ) − r)′ [R avar̂(δ̂(Ŵ)) R′]⁻¹ (Rδ̂(Ŵ) − r) →d χ²(#r)
#r is the number of restrictions and R (#r × L) is of full row rank.

Testing the overidentifying restrictions: since √n ḡ →d N(0, S) and Ŝ →p S,
the J statistic
J ≡ n gn (δ̂(Ŝ⁻¹))′ Ŝ⁻¹ gn (δ̂(Ŝ⁻¹)) →d χ²(K − L)
2SLS: efficient GMM under conditional homoskedasticity
Under conditional homoskedasticity, S = σ² Σxx , so
√n ḡ →a N(0, σ² Σxx )
and the efficient GMM estimator becomes the 2SLS estimator, with
√n (δ̂2SLS − δ) →a N(0, A σ² Σxx A′)
where A = (Σxz′ (Σxx )⁻¹ Σxz )⁻¹ Σxz′ (Σxx )⁻¹.
Notice how terms cancel out (write B ≡ Σxz′ (Σxx )⁻¹ Σxz ):
A σ² Σxx A′ = B⁻¹ Σxz′ (Σxx )⁻¹ σ² Σxx (Σxx )⁻¹ Σxz B⁻¹
            = σ² B⁻¹ Σxz′ (Σxx )⁻¹ Σxz B⁻¹
            = σ² B⁻¹ B B⁻¹
            = σ² B⁻¹
            = σ² (Σxz′ (Σxx )⁻¹ Σxz )⁻¹
Under conditional homoskedasticity, the J statistic becomes Sargan's
statistic ~ χ²(K − L).
Computation of the Sargan statistic:
1. Estimate the equation yi = zi′δ + εi by means of 2SLS (with xi as
instruments). Compute the residuals ε̂i = yi − zi′δ̂2SLS .
2. Regress ε̂i on xi and compute the Sargan statistic as n·R².
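A Stata sketch of this overidentification test (placeholder variable names again):

ivregress 2sls y (z1 = x1 x2)
estat overid      // reports Sargan's statistic and its p-value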
Hausman test: OLS versus 2SLS

                          bOLS           δ̂2SLS
H0 : E(zi εi ) = 0        consistent     consistent     plim(δ̂2SLS − bOLS ) = 0
H1 : E(zi εi ) ≠ 0        inconsistent   consistent     plim(δ̂2SLS − bOLS ) ≠ 0

Tricky bit: estimating
V[δ̂2SLS − bOLS ] = V[δ̂2SLS ] + V[bOLS ] − 2 Cov[δ̂2SLS , bOLS ]
Usual assumption: one of the estimators is efficient (OLS efficient under H0 ).
Then
V[δ̂2SLS − bOLS ] = V[δ̂2SLS ] − V[bOLS ]
Heteroscedasticity-robust versions of the Hausman test exist.
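A Stata sketch of a Durbin-Wu-Hausman-type endogeneity test after 2SLS (placeholder names):

ivregress 2sls y (z1 = x1 x2)
estat endogenous      // tests H0 that z1 is exogenous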
GMM in practice
Example: the effect of competition ranking on musicians' success.
Second stage:
yi = β0 + β1 ri + β2 qi + ui
   = β0 + β1 ri + εi
where ri is the final ranking and qi unobserved quality, so εi = β2 qi + ui .
Estimated equation:
yi = β0 + β1 ri + εi
Success is measured by presence in catalogues and by critics' reviews.
First stage:
ri = γ0 + γ1 firsti + γ2 femalei + γ3 latei + ηi
where firsti = 1{i was first to perform in competition};
femalei = 1{i is female};
latei = 1{i was second to play on a particular evening}
Test for relevance: test for joint significance of the instruments.
Findings:
Musicians who play during the first evening rank 3 places lower on average.
Females are ranked 2 places lower on average.
Those who perform second on an evening gain 1 rank.
Though the order of performing is random, it does affect the final ranking.
The jury hears each concerto for the first time too: maybe they get less
severe as the competition unfolds.