James B. McDonald
Brigham Young University
9/29/2010
Explanatory Variables
A. Basic Concepts
1. The expected value of the vector y is defined by

$$E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix} = \mu.$$
2. The variance of the vector y is defined by
$$\Sigma = \operatorname{Var}(y) = E\left[(y-\mu)(y-\mu)'\right] = \begin{pmatrix}
E(y_1-\mu_1)^2 & E(y_1-\mu_1)(y_2-\mu_2) & \cdots & E(y_1-\mu_1)(y_n-\mu_n) \\
E(y_2-\mu_2)(y_1-\mu_1) & E(y_2-\mu_2)^2 & \cdots & E(y_2-\mu_2)(y_n-\mu_n) \\
\vdots & \vdots & \ddots & \vdots \\
E(y_n-\mu_n)(y_1-\mu_1) & E(y_n-\mu_n)(y_2-\mu_2) & \cdots & E(y_n-\mu_n)^2
\end{pmatrix}.$$
3. If y is distributed multivariate normal, y ~ N(μ, Σ), its density is given by

$$f(y;\, \mu, \Sigma) = \frac{e^{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)}}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}.$$
The marginal density of $y_1$ is given by

$$f(y_1;\, \mu_1, \sigma^2) = \frac{e^{-\frac{1}{2}(y_1-\mu_1)(\sigma^2)^{-1}(y_1-\mu_1)}}{(2\pi)^{1/2}(\sigma^2)^{1/2}} = \frac{e^{-(y_1-\mu_1)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}.$$
NOTE: If $y \sim N(\mu\mathbf{1},\ \sigma^2 I)$, so that $\bar{y} = \left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)y$, verify that

(a) $\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)\begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix} = \mu$

(b) $\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)\left(\sigma^2 I\right)\begin{pmatrix} \tfrac{1}{n} \\ \vdots \\ \tfrac{1}{n} \end{pmatrix} = \sigma^2/n,$

i.e., $E(\bar{y}) = \mu$ and $\operatorname{Var}(\bar{y}) = \sigma^2/n$.
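To illustrate (a hypothetical two-observation example, not from the original notes): with n = 2 and σ² = 4,

$$\left(\tfrac{1}{2},\ \tfrac{1}{2}\right)\begin{pmatrix} \mu \\ \mu \end{pmatrix} = \mu
\qquad\text{and}\qquad
\left(\tfrac{1}{2},\ \tfrac{1}{2}\right)\begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}\begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \end{pmatrix} = 4\left(\tfrac{1}{4} + \tfrac{1}{4}\right) = 2 = \frac{\sigma^2}{n}.$$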
Note that βi can be interpreted as the marginal impact of a unit increase in xi on the
expected value of y.
The model expresses $y_t$ as a linear function of the k explanatory variables:

(3) $y_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t, \qquad t = 1, \ldots, n,$

or, in matrix notation, y = Xβ + ε, where

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} \qquad\text{and}\qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

This system of n equations can be written compactly as

y = Xβ + ε.
C. Estimation
We will derive the least squares, MLE, BLUE and instrumental variables estimators in
this section.
1. Least Squares:
$$y = X\beta + \varepsilon = X\hat\beta + e = \hat{Y} + e$$

where $\hat{Y} = X\hat\beta$ is an n×1 vector of predicted values for the dependent variable and e denotes the corresponding vector of residuals or estimated errors.
The sum of squared errors is defined by
$$SSE(\hat\beta) = \sum_{t=1}^{n} e_t^2
= (e_1, e_2, \ldots, e_n)\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
= e'e
= (y - X\hat\beta)'(y - X\hat\beta)$$
$$= y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta
= y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta.$$
The least squares estimator of β is defined as the $\hat\beta$ which minimizes SSE($\hat\beta$). A necessary condition for SSE($\hat\beta$) to be a minimum is that

$$\frac{dSSE(\hat\beta)}{d\hat\beta} = 0 \qquad \text{(see Appendix A for how to differentiate a real-valued function with respect to a vector)}$$

$$\frac{dSSE(\hat\beta)}{d\hat\beta} = -2X'y + 2X'X\hat\beta = 0,$$

i.e., $X'X\hat\beta = X'y$ (the normal equations); hence,

$$\hat\beta = (X'X)^{-1}X'y.$$
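This formula can be checked numerically in Stata against the built-in regress command. The following is a hedged sketch (the auto dataset and the variables price, weight, and mpg are only illustrative):

. sysuse auto, clear
. regress price weight mpg
. matrix accum XX = weight mpg            // forms X'X (a constant column is appended last)
. matrix vecaccum yX = price weight mpg   // forms y'X with the same column ordering
. matrix bhat = invsym(XX)*yX'            // (X'X)^(-1) X'y
. matrix list bhat                        // should reproduce the regress coefficients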
2. Maximum Likelihood (MLE): Under the assumptions (A.1)–(A.5), the likelihood function is given by

$$L(\beta, \sigma^2) = \frac{e^{-\frac{1}{2}(y-X\beta)'(\sigma^2 I)^{-1}(y-X\beta)}}{(2\pi)^{n/2}\,|\sigma^2 I|^{1/2}}
= \frac{e^{-(y-X\beta)'(y-X\beta)/2\sigma^2}}{(2\pi)^{n/2}(\sigma^2)^{n/2}}.$$
The log-likelihood function is

$$\ell = \ln L = -\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2} - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2$$
$$= -\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta) - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2$$
$$= -\frac{1}{2\sigma^2}\left[y'y - 2\beta'X'y + \beta'X'X\beta\right] - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2.$$
Setting the derivatives of ℓ with respect to β and σ² equal to zero yields

$$\frac{\partial \ell}{\partial \beta} = \frac{X'y - X'X\beta}{\sigma^2} = 0$$

$$\frac{\partial \ell}{\partial \sigma^2} = \frac{(y - X\beta)'(y - X\beta)}{2(\sigma^2)^2} - \frac{n}{2\sigma^2} = 0;$$
i.e.,
$$\tilde\beta = (X'X)^{-1}X'y$$

$$\tilde\sigma^2 = \frac{1}{n}(y - X\tilde\beta)'(y - X\tilde\beta) = \frac{e'e}{n} = \frac{\sum e_t^2}{n}.$$
NOTE: (1) $\tilde\beta = \hat\beta$, i.e., the ML and least squares estimators of β coincide.
(2) $\tilde\sigma^2$ is a biased estimator of σ²; whereas

$$s^2 = \frac{1}{n-k}(y - X\hat\beta)'(y - X\hat\beta) = \frac{e'e}{n-k} = \frac{SSE}{n-k}$$

is unbiased (see Appendix B).
(3) The first-order conditions imply X'e = 0.
(4) The maximized value of the log-likelihood function is

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\frac{SSE}{n}\right].$$
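As a hedged check of note (4) (not in the original notes; the dataset and variables are only illustrative), the maximized log-likelihood can be computed from the reported residual sum of squares and compared with the value reported by estat ic:

. sysuse auto, clear
. regress price weight mpg
. scalar ll = -(e(N)/2)*(1 + ln(2*_pi) + ln(e(rss)/e(N)))
. display ll                              // should equal the ll(model) reported by estat ic
. estat ic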
3. BLUE ESTIMATORS OF β, β*. The BLUE is the best (minimum variance) linear unbiased estimator. We first consider the desired properties and then derive the associated estimator.

Linear: $\beta^* = Ay$ where A is a k×n matrix of constants.

Unbiased: $E(\beta^*) = AE(y) = AX\beta$. We note that $E(\beta^*) = AX\beta = \beta$ requires AX = I.

Minimum variance: $\operatorname{Var}(\beta^*_i) = A_i \operatorname{Var}(y) A_i' = \sigma^2 A_i A_i'$, where $A_i$ denotes the ith row of A.

Thus, the construction of the BLUE is equivalent to selecting the matrix A so that the rows of A solve

min $A_i A_i'$,  i = 1, 2, . . ., k
s.t. AX = I

or equivalently, min Var($\beta^*_i$) s.t. AX = I (unbiasedness).

NOTE: (1) The solution (see Appendix C) is

$$\beta^* = \tilde\beta = \hat\beta = (X'X)^{-1}X'y;$$

(2) $AX = (X'X)^{-1}X'X = I$; thus $\beta^*$ is unbiased.
4. Instrumental Variables: The least squares first-order conditions are equivalent to X'e = 0, also known as the normal equations in the OLS framework, and yield the OLS estimator by solving

$$X'e = X'(y - X\hat\beta) = 0$$

for $\hat\beta$. The instrumental variables estimator replaces X with a matrix of instruments Z in these conditions:

$$Z'e = Z'(y - X\hat\beta_Z) = 0, \quad\text{i.e.,}\quad Z'y = Z'X\hat\beta_Z; \quad\text{hence,}\quad \hat\beta_Z = (Z'X)^{-1}Z'y.$$

Instrumental variables are useful when the explanatory variables are correlated with the error term, e.g., because of measurement error. In this case OLS will yield biased and inconsistent estimators; whereas, instrumental variables can yield consistent estimators.
NOTE: (1) The motivation for the selection of the instruments (Z) is that they be correlated with the explanatory variables but uncorrelated with the error term.

(2) If $\operatorname{plim}\left(\dfrac{Z'X}{n}\right)$ is nonsingular and $\operatorname{plim}\left(\dfrac{Z'\varepsilon}{n}\right) = 0$, then $\hat\beta_Z$ is a consistent estimator of β.
The Stata command for the instrumental variables estimator is given by

ivregress 2sls depvar [varlist_2] (varlist_1 = varlist_iv)

where varlist_1 contains the endogenous regressors, varlist_iv the instruments, varlist_2 any exogenous regressors, and where gmm or liml may be specified in place of the 2sls (two-stage least squares) estimator.
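For instance (a hedged illustration on one of Stata's example datasets; the choice of endogenous variable and instruments is only for demonstration):

. webuse hsng2, clear
. ivregress 2sls rent pcturban (hsngval = faminc i.region)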
D. Sampling Distributions

Recall that under the assumptions (A.1) – (A.5), y ~ N(Xβ, Σ = σ²I) and

$$\hat\beta = \tilde\beta = \beta^* = (X'X)^{-1}X'y = Ay$$

where A = (X'X)⁻¹X'. Therefore,

$$E(\hat\beta) = AX\beta = (X'X)^{-1}X'X\beta = \beta$$

and

$$\operatorname{Var}(\hat\beta) = \sigma^2 AA' = \sigma^2 (X'X)^{-1}X'\left((X'X)^{-1}X'\right)'
= \sigma^2 (X'X)^{-1}X'X\left((X'X)^{-1}\right)'
= \sigma^2 \left((X'X)^{-1}\right)'
= \sigma^2 \left((X'X)'\right)^{-1}
= \sigma^2 (X'X)^{-1}.$$

Therefore,

$$\hat\beta = \tilde\beta = \beta^* \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right).$$
NOTE: (1) σ²(X'X)⁻¹ can be shown to be the Cramér–Rao matrix, the matrix of lower bounds for the variances of unbiased estimators.
(2) $\hat\beta$, $\tilde\beta$, $\beta^*$ are unbiased, consistent, and normally distributed estimators.
(3) The estimated covariance matrix s²(X'X)⁻¹ is reported by most regression programs.
. reg y x
. estat vce
The distribution of s² is based on

$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k).$$

NOTE: This can be proven using the theorem (II'.A.4(b)) and noting that

$$(n-k)s^2 = e'e = (y - X\hat\beta)'(y - X\hat\beta)$$
$$= (X\beta + \varepsilon)'(I - X(X'X)^{-1}X')(X\beta + \varepsilon)$$
$$= \varepsilon'(I - X(X'X)^{-1}X')\varepsilon.$$

Therefore,

$$\frac{(n-k)s^2}{\sigma^2} = \frac{\varepsilon'(I - X(X'X)^{-1}X')\varepsilon}{\sigma^2} = \frac{\varepsilon'M\varepsilon}{\sigma^2} \sim \chi^2(n-k)$$

because M = I − X(X'X)⁻¹X' is symmetric and idempotent with tr(M) = n − k (see Appendix B).
E. Statistical Inference
1. Ho: β2 = β3 = . . . = βk = 0
This hypothesis tests the overall explanatory power of the explanatory variables by comparing the model with all variables included to the model without any of the explanatory variables, i.e., yt = β1 + εt (all non-intercept coefficients equal to zero). Recall that the total sum of squares (SST) can be partitioned as follows:
$$\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2 = \sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2 + \sum_{t=1}^{n}\left(\hat{y}_t - \bar{y}\right)^2, \quad\text{or}\quad SST = SSE + SSR.$$
Dividing both sides of the equation by σ² yields quadratic forms, each having a chi-square distribution:
$$\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSR}{\sigma^2}$$
$$\chi^2(n-1) = \chi^2(n-k) + \chi^2(k-1).$$
The test statistic is given by

$$F = \frac{SSR/(k-1)}{SSE/(n-k)} = \frac{(n-k)\,SSR}{(k-1)\,SSE} \sim F(k-1,\ n-k).$$
NOTE: (1)

$$\frac{SSR}{SSE} = \frac{SSR}{SST - SSR} = \frac{SSR/SST}{1 - \dfrac{SSR}{SST}} = \frac{R^2}{1-R^2};$$

hence, the F-statistic for this hypothesis can also be rewritten as

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{n-k}{k-1}\cdot\frac{R^2}{1-R^2} \sim F(k-1,\ n-k).$$
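As a worked illustration (hypothetical numbers, not from the notes): with n = 30 observations, k = 4 coefficients, and R² = 0.40,

$$F = \frac{0.40/3}{0.60/26} \approx \frac{0.1333}{0.0231} \approx 5.78,$$

which exceeds the approximate 5% critical value F(3, 26) ≈ 2.98, so Ho would be rejected.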
Regression packages typically report these quantities in an analysis of variance (ANOVA) table, where the ratio of the model and error MSE's yields the F statistic just discussed.
Additionally, remember that the adjusted R², $\bar{R}^2$, defined by
$$\bar{R}^2 = 1 - \frac{\left(\sum e_t^2\right)/(n-K)}{\sum\left(Y_t - \bar{Y}\right)^2/(n-1)},$$
will only increase with the addition of a new variable if the t-statistic associated with
the new variable is greater than 1 in absolute value. This result follows from the
equation
$$\bar{R}^2_{New} - \bar{R}^2_{Old} = \frac{(n-1)}{(n-K)(n-K-1)}\cdot\frac{SSE_{New}}{SST}\left[\frac{\hat\beta^2_{New\_var}}{s^2_{\hat\beta_{New\_var}}} - 1\right] \gtrless 0,$$

where the last term in the product is t² − 1, K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K + 1 coefficients.
The Lagrange multiplier (LM) and likelihood ratio (LR) tests can also be used to test this hypothesis, where
$$LM = nR^2 \ \overset{a}{\sim}\ \chi^2(k-1)$$
$$LR = -n\ln\left(1 - R^2\right) \ \overset{a}{\sim}\ \chi^2(k-1).$$
2. Ho: βi = βi⁰. Recall that

$$\hat\beta \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right)$$

where

$$\sigma^2(X'X)^{-1} = \begin{pmatrix}
\sigma^2_{\hat\beta_1} & \sigma_{\hat\beta_1\hat\beta_2} & \cdots & \sigma_{\hat\beta_1\hat\beta_k} \\
\sigma_{\hat\beta_2\hat\beta_1} & \sigma^2_{\hat\beta_2} & \cdots & \sigma_{\hat\beta_2\hat\beta_k} \\
\vdots & & \ddots & \vdots \\
\sigma_{\hat\beta_k\hat\beta_1} & \sigma_{\hat\beta_k\hat\beta_2} & \cdots & \sigma^2_{\hat\beta_k}
\end{pmatrix};$$

therefore, under Ho,

$$\frac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} \sim t(n-k).$$
3. Ho: δ'β = γ. Recall that

$$\hat\beta \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right);$$

therefore,

$$\delta'\hat\beta \sim N\left(\delta'\beta,\ \delta'\sigma^2(X'X)^{-1}\delta\right).$$

This test involves running one regression and estimating the variance of δ'β̂ from s²(X'X)⁻¹.
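In Stata, a linear combination such as β̂3 + β̂4 and its standard error can be obtained with the lincom command after regress (a hedged sketch; the variable names are hypothetical):

. reg y x2 x3 x4
. lincom x3 + x4           // reports the estimate of b3 + b4, its standard error, and a confidence interval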
4. General hypothesis tests, Ho: g(β) = 0

a. Introduction
Previous subsections considered testing hypotheses about an individual coefficient (e.g., Ho: β3 = 6) and testing the validity of a linear constraint on the coefficients (Ho: δ'β = γ). In this section we will consider how more general tests can be performed. The testing procedures will be based on the Chow and Likelihood ratio (LR) tests. The hypotheses may be of many different types and involve the previous tests as special cases; other examples might include joint hypotheses on several coefficients at once. The basic idea is that if the hypothesis is really valid, then goodness of fit measures such as SSE, R², and log-likelihood values (ℓ) will not be significantly impacted by imposing the valid hypothesis in estimation. Hence, the SSE, R², or ℓ values will not be significantly different for the constrained and the unconstrained regression model. The tests of the validity of the hypothesis are based on the model
y=Xβ+ε
The Chow and likelihood ratio tests for testing Ho: g(β) = 0 can be constructed from the output obtained from estimating the two following regression models: the unconstrained model and the constrained model with g(β) = 0 imposed, together with the associated sums of squared errors (SSE and SSE*), R², log-likelihood values (ℓ and ℓ*), and degrees of freedom for each.
b. Chow test
$$F = \frac{(SSE^* - SSE)/r}{SSE/(n-k)} \sim F(r,\ n-k)$$

where r denotes the number of restrictions imposed by Ho and the asterisk (*) denotes the constrained model.
Note that if the hypothesis (H0: g(β) = 0) is valid, then we would expect R2 (SSE)
and R2* (SSE*) to not be significantly different from each other. Thus, it is only
large values (greater than the critical value) of F which provide the basis for
rejecting the hypothesis. Again, the R 2 form of the Chow test is only valid if the
dependent variable is the same in the constrained and unconstrained regression.
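A hedged Stata sketch of the Chow test computation from stored regression results (the variable names and the two restrictions are hypothetical):

. reg y x2 x3 x4           // unconstrained model
. scalar sse_u = e(rss)
. scalar df_u  = e(df_r)   // n - k
. reg y x2                 // constrained model (here Ho: b3 = b4 = 0, so r = 2)
. scalar sse_r = e(rss)
. scalar F = ((sse_r - sse_u)/2)/(sse_u/df_u)
. display F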
References:
(1) Chow, G. C., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions," Econometrica, 28 (1960), 591-605.
(2) Fisher, F. M., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note," Econometrica, 38 (1970), 361-366.
c. Likelihood ratio (LR) test

$$LR = 2(\ell - \ell^*) \ \overset{a}{\sim}\ \chi^2(r).$$

The LR test is more general than the Chow test; for the case of independent and identically distributed normal errors with known σ², the LR statistic is equal to

$$LR = \frac{SSE^* - SSE}{\sigma^2}.$$
Recall that s² = SSE/(n − k) appears in the denominator of the Chow test statistic and that, for large values of (n − k), s² is "close" to σ²; hence, we can see the close relationship between the two statistics. If the hypothesis of no explanatory power (β2 = . . . = βk = 0) is imposed on the normal linear regression model, then SSE* = SST and LR can be rewritten as

$$LR = n\ln\left[\frac{1}{1-R^2}\right] = -n\ln\left[1-R^2\right] \ \overset{a}{\sim}\ \chi^2(k-1).$$
In this case, the Chow test is identical to the F test for overall explanatory power
discussed earlier.
Thus the Chow test and LR test are similar in structure and purpose. The LR test is more general than the Chow test; however, its distribution is only valid asymptotically. For example, for a test of two restrictions (r = 2) in a model with four coefficients (k = 4), the two statistics are

$$F = \frac{R^2 - R^{*2}}{1-R^2}\cdot\frac{n-4}{2} \sim F(2,\ n-4)$$

and

$$LR = 2(\ell - \ell^*) \ \overset{a}{\sim}\ \chi^2(2).$$
d. Testing the equality of coefficients in two regressions. Consider two samples of n1 and n2 observations and the unconstrained model

(1) $y^{(1)} = X^{(1)}\beta^{(1)} + \varepsilon^{(1)}, \qquad y^{(2)} = X^{(2)}\beta^{(2)} + \varepsilon^{(2)},$

estimated as two separate regression models, where each β(i) contains k coefficients and (n − k) = n1 + n2 − 2k. Now impose the hypothesis that β(1) = β(2) = β and write (1) as

(2)' $\quad y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}\beta + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}.$
The Chow and LR statistics for Ho: β(1) = β(2) are then

$$F = \frac{R^2 - R^{*2}}{1-R^2}\cdot\frac{n_1+n_2-2k}{k} \sim F(k,\ n_1+n_2-2k)$$

and

$$LR = 2(\ell - \ell^*) \ \overset{a}{\sim}\ \chi^2(k).$$
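A hedged Stata sketch of this test from stored results (the variable group marking the two sub-samples is hypothetical):

. reg y x2 x3 if group == 1
. scalar sse1 = e(rss)
. reg y x2 x3 if group == 2
. scalar sse2 = e(rss)
. reg y x2 x3              // pooled (constrained) regression
. scalar sse_r = e(rss)
. scalar k = e(df_m) + 1   // number of coefficients in each sub-sample model
. scalar F = ((sse_r - sse1 - sse2)/k)/((sse1 + sse2)/(e(N) - 2*k))
. display F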
The log-likelihood values needed for the LR test can be obtained following the estimation of each model with the Stata command

estat ic
(2) H1: β2 = 1
H2: β3 = 0
H3: β3 + β4 = 1
H4: β3β4 = 1
H5: β2 = 1 and β3 = 0
reg Y X2 X3 X4          estimates the unconstrained model
test X2 = 1 (Tests H1)
test X3 = 0 (Tests H2)
test X3 + X4 = 1 (Tests H3)
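H4 involves a nonlinear restriction and H5 a joint restriction; hedged sketches of the corresponding commands (testnl handles nonlinear restrictions) are:

testnl _b[X3]*_b[X4] = 1 (Tests H4)
test (X2 = 1) (X3 = 0) (Tests H5)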
By default, 95% confidence intervals are reported in Stata. To change the confidence level, use the "level" option as follows:
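For example (a hedged illustration with hypothetical variable names):

. reg y x2 x3, level(90)

which reports 90% rather than 95% confidence intervals.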
F. Stepwise Regression
Stepwise regression selects explanatory variables according to their statistical significance and not according to any theoretical reason. While stepwise regression can be performed in a forward or backward direction, the logic is similar in both cases. In forward selection, a stepwise regression will add one independent variable at a time to see if it is significant. If the variable is significant, it is kept in the model and another variable is tried; if a variable is insignificant, it is not included in the model. This process continues until no additional variables are significant.
The corresponding Stata commands take the forms:

Forward: stepwise, pe(#) lockterm1: regress depvar (forced_indepvars) other_indepvars
Backward: stepwise, pr(#) lockterm1: regress depvar (forced_indepvars) other_indepvars

where the lockterm1 option together with the parentheses around forced_indepvars keeps those variables in the model,
where the "#" in "pr(#)" is the significance level at which variables are removed (e.g., 0.051), and the "#" in "pe(#)" is the significance level at which variables are entered or added to the model. If pr(#1) and pe(#2) are both included in a stepwise regression command, #1 must be greater than #2. Also, "depvar" represents the dependent variable,
“forced_indepvars” represent the independent variables which the user wishes to remain
in the model no matter what their significance level may be, and “other_indepvars”
represents the other independent variables which the stepwise regression will consider
including or excluding. Forward and backward stepwise regression may yield different
results.
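For example (a hedged sketch on Stata's auto dataset; the variables and thresholds are only illustrative):

. sysuse auto, clear
. stepwise, pr(.10): regress price weight mpg foreign     // backward elimination
. stepwise, pe(.05): regress price weight mpg foreign     // forward selection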
G. Forecasting
Let yt = F(Xt, β) + εt
denote the stochastic relationship between the variable yt and the vector of variables Xt
Forecasts are generally made by estimating the vector of parameters β (yielding β̂) and evaluating the estimated relationship at the relevant values of Xt. (Uncertainty about Xt will be discussed later.)
[Figure: the stochastic relationship between Yt and Xt.]
If β and Xt are known, the forecast error arises only from the error term:

$$FE = y_t - \hat{y}_t = y_t - F(X_t, \beta) = \varepsilon_t$$

with variance

$$\sigma^2_{FE} = \operatorname{Var}(FE) = \operatorname{Var}(\varepsilon_t) = \sigma^2,$$

and a (1 − α) confidence interval for yt is given by

$$\Pr\left[F(X_t, \beta) - t_{(\alpha/2)}\,\sigma < y_t < F(X_t, \beta) + t_{(\alpha/2)}\,\sigma\right] = 1 - \alpha.$$
[Figure: confidence band for Yt about the relationship F(Xt, β), plotted against Xt.]
3. Uncertainty about β
Assume F(Xt, β) = Xtβ in the model yt = F(Xt, β) + εt; then the predicted value of yt is

$$\hat{y}_t = X_t\hat\beta,$$

and the variance of $\hat{y}_t$ (sample regression line), $\sigma^2_{\hat{y}_t}$, is given by

$$\sigma^2_{\hat{y}_t} = X_t \operatorname{Var}(\hat\beta)\, X_t',$$

with the variance of the forecast error (actual y) given by

$$\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat{y}_t}.$$

Note that $\sigma^2_{FE}$ takes account of the uncertainty associated with the unknown regression line and the error term and can be used to construct confidence intervals for the actual value of Y rather than just the regression line.
[Figure: confidence intervals for the population regression line and for the actual value of Yt, plotted against Xt.]
Some students have found that the following table facilitates their understanding of the different confidence intervals for the population regression line and the actual value of Y. The column for the estimated coefficients is only included for comparison. For each object of interest, the rows report the sampling distribution, the associated t-statistic, and the resulting (1 − α) confidence interval:

Estimated coefficient, βi:
  Distribution: $\hat\beta_i \sim N\left(\beta_i,\ \sigma^2_{\hat\beta_i}\right)$
  t-stat: $t = \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \sim t(n-k)$
  Confidence interval: $\Pr\left[\hat\beta_i - t_{\alpha/2}\,s_{\hat\beta_i} \le \beta_i \le \hat\beta_i + t_{\alpha/2}\,s_{\hat\beta_i}\right] = 1-\alpha$

Population regression line, $E(Y_t \mid X_t) = X_t\beta$:
  Distribution: $\hat{Y}_t = X_t\hat\beta \sim N\left(X_t\beta,\ \sigma^2_{\hat{Y}_t} = \sigma^2 X_t(X'X)^{-1}X_t'\right)$
  t-stat: $t = \dfrac{X_t\hat\beta - X_t\beta}{s_{\hat{Y}_t}} \sim t(n-k)$
  Confidence interval: $\Pr\left[X_t\hat\beta - t_{\alpha/2}\,s_{\hat{Y}_t} \le X_t\beta \le X_t\hat\beta + t_{\alpha/2}\,s_{\hat{Y}_t}\right] = 1-\alpha$

Actual value, Yt:
  Distribution: $FE = Y_t - \hat{Y}_t \sim N\left(0,\ \sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat{Y}_t}\right)$
  t-stat: $t = \dfrac{FE - 0}{s_{FE}} \sim t(n-k)$
  Confidence interval: $\Pr\left[X_t\hat\beta - t_{\alpha/2}\,s_{FE} \le Y_t \le X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right] = 1-\alpha$

where $s_{\hat{Y}_t}$ is used to compute confidence intervals for the regression line ($E(Y_t \mid X_t)$) and $s_{FE}$ is used in the calculation of confidence intervals for the actual value of Y. Recall that $s^2_{FE} = s^2 + s^2_{\hat{Y}_t}$; hence $s^2_{FE} > s^2_{\hat{Y}_t}$, and the confidence intervals for Y are wider than those for the population regression line.
5. Uncertainty about X. In many situations the value of the independent variable also
needs to be predicted along with the value of y. Not surprisingly, a “poor” estimate of
Xt will likely result in a poor forecast for y. This can be represented graphically as
follows:
[Figure: forecasting Yt at a predicted value X̂t of Xt; an error in predicting Xt translates into an additional error in the forecast Ŷt.]
One way to explore the predictive ability of a model is to estimate the model on a subset of the data and then use the estimated model to predict known outcomes which were not used in estimation. As an example, consider the estimated relationship

$$\hat{y}_t = \hat\beta_1 + \hat\beta_2 G_t + \hat\beta_3 M_t$$

where yt, Gt, and Mt denote GDP, government expenditure, and the money supply.
Assume that

$$s^2(X'X)^{-1} = \begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}\times 10^{-3}, \qquad s^2 = 10,$$

$\hat\beta = (10,\ 2.5,\ 6)'$, and $X_t = (1,\ 100,\ 200)$. Then

$$\hat{y}_t = X_t\hat\beta = (1,\ 100,\ 200)\begin{pmatrix} 10 \\ 2.5 \\ 6 \end{pmatrix} = 10 + 250 + 1200 = 1460,$$

$$s^2_{\hat{y}_t} = X_t\left(s^2(X'X)^{-1}\right)X_t' = (1,\ 100,\ 200)\begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}\begin{pmatrix} 1 \\ 100 \\ 200 \end{pmatrix}\times 10^{-3} = 921.81,$$

$$s_{\hat{y}_t} = 30.36,$$

$$s^2_{FE} = s^2 + s^2_{\hat{y}_t} = 10 + 921.81 = 931.81,$$

$$s_{FE} = 30.53.$$
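These calculations can be replicated with Stata's matrix commands (a hedged sketch using the numbers assumed above):

. matrix V = (10, 5, 2 \ 5, 20, 3 \ 2, 3, 15)
. matrix V = .001*V                 // s^2 (X'X)^(-1)
. matrix b = (10 \ 2.5 \ 6)
. matrix x = (1, 100, 200)
. matrix yhat = x*b                 // 1460
. matrix s2y = x*V*x'               // 921.81
. display el(yhat,1,1), el(s2y,1,1), sqrt(el(s2y,1,1) + 10)   // 1460, 921.81, 30.53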
Predictions can be obtained in Stata as follows:

a) The data file should include values for the explanatory variables for the observations to be predicted.
b) Estimate the regression model over the desired (estimation) observations.
c) Use the predict command, picking the name you want for the predictions, in order to generate the predicted values and their standard errors.
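A hedged sketch of such commands (the estimation range and variable names are illustrative):

. reg y x2 x3 if t <= 100      // estimate on the first sub-sample only
. predict yhat                 // fitted values X_t * bhat for all observations
. predict e, residuals         // residuals
. predict sfe, stdf            // standard error of the forecast (actual Y)
. predict syhat, stdp          // standard error of the prediction (regression line)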
These commands result in the calculation and reporting of Y, Ŷ, e, s_FE, and s_Ŷ for observations 1 through n2. The predictions will show up in the Data Editor of Stata under the variable names you picked (in this case, yhat, e, sfe, and syhat).
Note that $s^2_{\hat{y}_t} = s^2_{FE} - s^2$.
Theory
OBJECTIVE: The objective of problems 1 & 2 is to demonstrate that the matrix equations and
summation equations for the estimators and variances of the estimators are equivalent.
Remember $\sum_{t=1}^{n} X_t = N\bar{X}$, and don't get discouraged!!
1. Consider the model

(1)' $\quad \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$

(1)'' $\quad Y = X\beta + \varepsilon.$
The least squares estimator of $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$ is $\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}X'Y$ with

$$\operatorname{Var}(\hat\beta) = \begin{pmatrix} \operatorname{Var}(\hat\beta_1) & \operatorname{Cov}(\hat\beta_1, \hat\beta_2) \\ \operatorname{Cov}(\hat\beta_2, \hat\beta_1) & \operatorname{Var}(\hat\beta_2) \end{pmatrix} = \sigma^2(X'X)^{-1}.$$

Demonstrate that:

a. $X'X = \begin{pmatrix} N & N\bar{X} \\ N\bar{X} & \sum X_t^2 \end{pmatrix}$ and $X'Y = \begin{pmatrix} N\bar{Y} \\ \sum_{t=1}^{N} X_t Y_t \end{pmatrix}$

b. $\hat\beta_2 = \left(\sum X_t Y_t - N\bar{X}\bar{Y}\right) / \left(\sum X_t^2 - N\bar{X}^2\right)$

c. $\hat\beta_1 = \bar{Y} - \hat\beta_2\bar{X}$

d. $\operatorname{Var}(\hat\beta_2) = \sigma^2 / \left(\sum X_t^2 - N\bar{X}^2\right)$

e. $\operatorname{Var}(\hat\beta_1) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum X_t^2 - N\bar{X}^2}\right] = \operatorname{Var}(\bar{Y}) + \bar{X}^2\operatorname{Var}(\hat\beta_2)$

f. $\operatorname{Cov}(\hat\beta_1, \hat\beta_2) = -\bar{X}\operatorname{Var}(\hat\beta_2)$
(JM II’-A, JM Stats)
2. Consider the model Y = Xβ + ε, where

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

a. Evaluate X'X and X'Y.
b. Using the matrices in 2(a), evaluate (X X )-1 X Y and compare your answer with
the results obtained in question 4 in Problem Set 2.1.
Applied
b. What is the estimated increase in price for a house with one more bedroom, holding
square footage constant?
c. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).
d. What percentage variation in price is explained by square footage and number of
bedrooms?
e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling
price for this house from the OLS regression line.
f. The actual selling price of the first house in the sample was $300,000 (so price = 300).
Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for
the house?
Theory
1. Recall that

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where $SSE = \sum e_t^2$, $SST = \sum\left(Y_t - \bar{Y}\right)^2$, and $SSR = \sum\left(\hat{Y}_t - \bar{Y}\right)^2$.
a. Demonstrate that 0 ≤ R² ≤ 1.
d. The adjusted R², $\bar{R}^2$, is defined by $\bar{R}^2 = 1 - \dfrac{SSE/(n-k)}{SST/(n-1)}$. Demonstrate that

$$\frac{1-k}{n-k} \le \bar{R}^2 \le R^2 \le 1,$$

i.e., the adjusted R² can be negative.

$$\left(\text{Hint: } 1 - \bar{R}^2 = \frac{SSE}{SST}\cdot\frac{n-1}{n-k} = \frac{n-1}{n-k}\left(1 - R^2\right)\right)$$
e. Verify that

$$LR = \frac{SSE^* - SSE}{\sigma^2}$$

if σ² is known.
2. Demonstrate that
Applied
3. The following model can be used to study whether campaign expenditures affect election outcomes:

$$voteA = \beta_0 + \beta_1\ln(expendA) + \beta_2\ln(expendB) + \beta_3\,prtystrA + u$$
where voteA is the percent of the vote received by Candidate A, expendA and expendB are
campaign expenditures by Candidates A and B, and prtystrA is a measure of party
strength for Candidate A (the percent of the most recent presidential vote that went to A's
party).
i) What is the interpretation of β1?
ii) In terms of the parameters, state the null hypothesis that a 1% increase in A's
expenditures is offset by a 1% increase in B's expenditures.
iii) Estimate the model above using the data in VOTE1.RAW and report the results in
the usual form. Do A's expenditures affect the outcome? What about B's
expenditures? Can you use these results to test the hypothesis in part (ii)?
iv) Estimate a model that directly gives the t statistic for testing the hypothesis in part
(ii). What do you conclude? (Use a two-sided alternative.) A possible approach: to
test H0: β1 = −β2, define θ = β1 + β2, plug θ − β2 in for β1, and simplify to obtain
a model in which θ appears as a coefficient.
You can check your results by constructing the "high-tech" t-test or by using the
Stata command, test ln(expendA) + ln(expendB) = 0, following the estimation of
the unconstrained regression model.
(Wooldridge C. 4.1)
Consider the transcendental production function

$$Y = e^{\beta_1 + \beta_2 t + \beta_3 L + \beta_4 K}\, L^{\beta_5} K^{\beta_6}$$
a. What restrictions on the transcendental production function result in a Cobb-Douglas
production function?
b. Estimate the transcendental production function using the data in problem 2 and use the Chow
and LR tests to compare it with the Cobb-Douglas production function.
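A hedged Stata sketch for part (b) (variable names are illustrative, and the restrictions tested should be whichever ones part (a) identifies; those shown are just an example):

. gen lnY = ln(Y)
. gen lnL = ln(L)
. gen lnK = ln(K)
. reg lnY t L K lnL lnK        // transcendental form, estimated in logs
. test (L = 0) (K = 0)         // example Chow-type test of the Cobb-Douglas restrictions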
(JM II)
APPENDIX A
Some important derivatives:
Let $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$, $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ (symmetric: $a_{12} = a_{21} = a$).

1. $\dfrac{d(a'X)}{dX} = \dfrac{d(X'a)}{dX} = a$

2. $\dfrac{d(X'AX)}{dX} = 2AX$

Proof of $\dfrac{d(X'a)}{dX} = a$:

$$\frac{d(X'a)}{dX} = \begin{pmatrix} \partial(X'a)/\partial x_1 \\ \partial(X'a)/\partial x_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a.$$

Proof of $\dfrac{d(X'AX)}{dX} = 2AX$:

$$\frac{d(X'AX)}{dX} = \begin{pmatrix} \partial(X'AX)/\partial x_1 \\ \partial(X'AX)/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2a_{11}x_1 + 2a x_2 \\ 2a x_1 + 2a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11}x_1 + a x_2 \\ a x_1 + a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11} & a \\ a & a_{22} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 2AX.$$
APPENDIX B
This appendix demonstrates that

$$s^2 = \frac{1}{n-k}\left(y'\left(I - X(X'X)^{-1}X'\right)y\right) = SSE/(n-k)$$

is an unbiased estimator of σ². The derivation uses the trace operator,

$$\operatorname{tr}(A) = \sum_i a_{ii},$$

with properties including
1) tr(I) = n
5) tr(ABC) = tr(CAB)
6) tr(kA) = k tr(A).

Recall

$$\hat\sigma^2 = \frac{1}{n}e'e \qquad\text{and}\qquad s^2 = \frac{1}{n-k}e'e$$

where

$$e = y - X\hat\beta = y - X(X'X)^{-1}X'y = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon$$

with M = I − X(X'X)⁻¹X' (note that MX = 0 and M is symmetric and idempotent). Hence

$$\hat\sigma^2 = \frac{1}{n}\varepsilon'M'M\varepsilon = \frac{1}{n}\varepsilon'M\varepsilon \qquad\text{and}\qquad s^2 = \frac{1}{n-k}\varepsilon'M\varepsilon.$$

Taking expectations,

$$E(\hat\sigma^2) = \frac{1}{n}E(\varepsilon'M\varepsilon) = \frac{1}{n}E\left(\operatorname{tr}(\varepsilon'M\varepsilon)\right) \qquad\text{(a scalar equals its trace)}$$
$$= \frac{1}{n}E\left(\operatorname{tr}(M\varepsilon\varepsilon')\right) = \frac{1}{n}\operatorname{tr}\left(M\,E(\varepsilon\varepsilon')\right)$$
$$= \frac{1}{n}\operatorname{tr}(M\sigma^2 I) = \frac{1}{n}\operatorname{tr}(\sigma^2 M) \qquad\text{(because } E(\varepsilon\varepsilon') = \sigma^2 I,\ \operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \ne j\text{)}$$
$$= \frac{\sigma^2}{n}\operatorname{tr}(M) = \frac{\sigma^2}{n}\operatorname{tr}\left(I - X(X'X)^{-1}X'\right)$$
$$= \frac{\sigma^2}{n}\left(n - \operatorname{tr}\left(X(X'X)^{-1}X'\right)\right) = \frac{\sigma^2}{n}\left(n - \operatorname{tr}\left(X'X(X'X)^{-1}\right)\right)$$
$$= \frac{\sigma^2}{n}\left(n - \operatorname{tr}(I_k)\right) = \frac{\sigma^2}{n}(n - k) = \frac{n-k}{n}\sigma^2,$$

so

$$E(s^2) = \frac{n}{n-k}E(\hat\sigma^2) = \sigma^2.$$

Therefore $\hat\sigma^2$ is biased, but s² is unbiased.
APPENDIX C
This appendix demonstrates that $\beta^* = Ay = (X'X)^{-1}X'y$ is BLUE.

Proof: Let $\beta^*_i = A_i y$ where $A_i$ denotes the ith row of the matrix A. Since the result will be symmetric for each $\beta^*_i$ (hence, for each $A_i$), denote $A_i'$ by a where a is an (n × 1) vector. Minimizing $\operatorname{Var}(\beta^*_i) = \sigma^2 a'a$ subject to unbiasedness ($X'a = \iota_i$, the ith column of $I_k$) can be written as

min a'a, or min a'Ia, s.t. X'a = ι_i,

with Lagrangian $\mathcal{L} = a'Ia + \lambda'(X'a - \iota_i)$ and first-order conditions

$$\frac{\partial \mathcal{L}}{\partial a} = 2a'I + \lambda'X' = 0$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = X'a - \iota_i = 0.$$

This implies

$$a = (-1/2)X\lambda.$$

Now substitute a = (−1/2)Xλ into the expression ∂L/∂λ = 0 and we obtain

$$(-1/2)X'X\lambda = \iota_i$$
$$\lambda = -2(X'X)^{-1}\iota_i$$
$$a' = (-1/2)(-2)\,\iota_i'(X'X)^{-1}X' = \iota_i'(X'X)^{-1}X' = A_i,$$

which implies

$$A = (X'X)^{-1}X';$$

hence, $\beta^* = (X'X)^{-1}X'y$.