
James B. McDonald

Brigham Young University

9/29/2010

Explanatory Variables

A. Basic Concepts

1. The expected value of the vector y is defined by

$$E(y) = \begin{bmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{bmatrix}.$$

2. The variance of the vector y is defined by

$$\operatorname{Var}(y) = \begin{bmatrix} \operatorname{Var}(y_1) & \operatorname{Cov}(y_1, y_2) & \cdots & \operatorname{Cov}(y_1, y_n)\\ \operatorname{Cov}(y_2, y_1) & \operatorname{Var}(y_2) & \cdots & \operatorname{Cov}(y_2, y_n)\\ \vdots & & & \vdots\\ \operatorname{Cov}(y_n, y_1) & \operatorname{Cov}(y_n, y_2) & \cdots & \operatorname{Var}(y_n) \end{bmatrix}.$$

NOTE: Let μ = E(y); then

$$\operatorname{Var}(y) = E\left[(y-\mu)(y-\mu)'\right] = E\begin{bmatrix} y_1-\mu_1\\ \vdots\\ y_n-\mu_n\end{bmatrix}(y_1-\mu_1, \ldots, y_n-\mu_n)$$

$$= \begin{bmatrix} E(y_1-\mu_1)^2 & E(y_1-\mu_1)(y_2-\mu_2) & \cdots & E(y_1-\mu_1)(y_n-\mu_n)\\ E(y_2-\mu_2)(y_1-\mu_1) & E(y_2-\mu_2)^2 & \cdots & E(y_2-\mu_2)(y_n-\mu_n)\\ \vdots & & & \vdots\\ E(y_n-\mu_n)(y_1-\mu_1) & E(y_n-\mu_n)(y_2-\mu_2) & \cdots & E(y_n-\mu_n)^2 \end{bmatrix}$$

$$= \begin{bmatrix} \operatorname{Var}(y_1) & \operatorname{Cov}(y_1, y_2) & \cdots & \operatorname{Cov}(y_1, y_n)\\ \operatorname{Cov}(y_2, y_1) & \operatorname{Var}(y_2) & \cdots & \operatorname{Cov}(y_2, y_n)\\ \vdots & & & \vdots\\ \operatorname{Cov}(y_n, y_1) & \operatorname{Cov}(y_n, y_2) & \cdots & \operatorname{Var}(y_n) \end{bmatrix}.$$

3. The vector y is said to be distributed multivariate normal with mean vector μ and variance-covariance matrix Σ (denoted y ~ N(μ, Σ)) if the probability density function of y is given by

$$f(y; \mu, \Sigma) = \frac{e^{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)}}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}.$$

For n = 1, with Σ = σ², this specializes to the familiar univariate normal density:

$$f(y_1; \mu_1, \sigma^2) = \frac{e^{-\frac{1}{2}(y_1-\mu_1)(\sigma^2)^{-1}(y_1-\mu_1)}}{(2\pi)^{1/2}(\sigma^2)^{1/2}} = \frac{e^{-(y_1-\mu_1)^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}.$$

4. Some useful results on functions of normal vectors:

a. If y ~ N(μ, Σ) and A is a matrix of constants, then z = Ay ~ N(Aμ, AΣA').

b. If y ~ N(0, I) and A is an idempotent matrix of constants, then y'Ay ~ χ²(m), where m = Rank(A) = trace(A).

c. If y ~ N(0,I) and L is a k x n matrix of rank k, then Ly and y'Ay are

independently distributed if LA = 0.

d. If y ~ N(0,I), then the idempotent quadratic forms y'Ay and y'By are

independently distributed χ2 variables if AB = 0.

NOTE:

E(z) = E(Ay) = AE(y) = Aµy

VAR(z) = E[(z - E(z))(z - E(z))']

= E[(Ay - Aµy)(Ay - Aµy)']

= E[A(y - µy)(y - µy)'A']

= AE[(y - µy)(y - µy)']A'

= AΣyA' = Σz

EXAMPLE: Let y₁, ..., yₙ be independently and identically distributed N(μ, σ²). Then

$$y = \begin{bmatrix} y_1\\ \vdots\\ y_n\end{bmatrix} \sim N\left(\begin{bmatrix}\mu\\ \vdots\\ \mu\end{bmatrix},\ \begin{bmatrix}\sigma^2 & & 0\\ & \ddots & \\ 0 & & \sigma^2\end{bmatrix}\right)$$

and

$$\bar y = \frac{1}{n}y_1 + \cdots + \frac{1}{n}y_n = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)y \sim N(\mu,\ \sigma^2/n).$$

Verify that

(a) $\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)\begin{bmatrix}\mu\\ \vdots\\ \mu\end{bmatrix} = \mu$, and

(b) $\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)(\sigma^2 I)\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)' = \sigma^2/n.$
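These two verification steps can be checked numerically; a quick sketch (the values n = 5, μ = 2, σ² = 9 below are arbitrary choices, not from the notes):

```python
import numpy as np

n, mu, sig2 = 5, 2.0, 9.0          # arbitrary illustrative values
w = np.full(n, 1.0 / n)            # the (1/n, ..., 1/n) weight vector

# (a) mean of ybar: w @ (mu * 1-vector) = mu
mean_ybar = w @ (mu * np.ones(n))

# (b) variance of ybar: w (sigma^2 I) w' = sigma^2 / n
var_ybar = w @ (sig2 * np.eye(n)) @ w

print(mean_ybar, var_ybar)   # 2.0  1.8
```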

Note that βi can be interpreted as the marginal impact of a unit increase in xi on the

expected value of y.

The model can be written compactly as

(3) y = Xβ + ε

where

$$y = \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix},\qquad X = \begin{bmatrix}x_{11} & \cdots & x_{1k}\\ x_{21} & \cdots & x_{2k}\\ \vdots & & \vdots\\ x_{n1} & \cdots & x_{nk}\end{bmatrix}$$
(n×1) (n×k)

(the columns correspond to individual variables; the rows may represent observations at a given point in time), and

$$\beta = \begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_k\end{bmatrix}\qquad\text{and}\qquad \varepsilon = \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix}.$$

Among the maintained assumptions is that

$$\lim_{n\to\infty}\frac{X'X}{n} = \Sigma_x \text{ is nonsingular.}$$

C. Estimation

We will derive the least squares, MLE, BLUE and instrumental variables estimators in

this section.

1. Least Squares:

y = Xβ + ε

 = Xβ̂ + e = Ŷ + e

where Ŷ = Xβ̂ is an n×1 vector of predicted values for the dependent variable and e denotes a vector of residuals or estimated errors.

The sum of squared errors is defined by

$$SSE(\hat\beta) = \sum_{t=1}^{n}e_t^2 = (e_1, e_2, \ldots, e_n)\begin{bmatrix}e_1\\ e_2\\ \vdots\\ e_n\end{bmatrix} = e'e$$

$$= (y - X\hat\beta)'(y - X\hat\beta)$$

$$= y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta$$

$$= y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta.$$

The least squares estimator of β is defined as the β̂ which minimizes SSE(β̂). A necessary condition for SSE(β̂) to be a minimum is that

$$\frac{dSSE(\hat\beta)}{d\hat\beta} = 0$$

(see Appendix A for how to differentiate a real-valued function with respect to a vector):

$$\frac{dSSE(\hat\beta)}{d\hat\beta} = -2X'y + 2X'X\hat\beta = 0,$$

or X'Xβ̂ = X'y (the normal equations); hence

$$\hat\beta = (X'X)^{-1}X'y.$$
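The normal equations can be illustrated numerically; the small data set below is made up purely for illustration:

```python
import numpy as np

# made-up data: n = 6 observations, k = 3 columns (intercept + 2 regressors)
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.1, 4.2, 6.8, 6.1, 11.0, 9.9])

# beta_hat = (X'X)^{-1} X'y, i.e. the solution of the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# the residuals satisfy the k restrictions X'e = 0
e = y - X @ beta_hat
print(X.T @ e)   # numerically a zero vector
```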

2. Maximum Likelihood: If ε ~ N(0, σ²I), the likelihood function of the sample is

$$L(y; \beta, \Sigma = \sigma^2 I) = \frac{e^{-\frac{1}{2}(y-X\beta)'\Sigma^{-1}(y-X\beta)}}{(2\pi)^{n/2}|\Sigma|^{1/2}} = \frac{e^{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)}}{(2\pi)^{n/2}|\sigma^2 I|^{1/2}} = \frac{e^{-(y-X\beta)'(y-X\beta)/2\sigma^2}}{(2\pi)^{n/2}(\sigma^2)^{n/2}},$$

with log-likelihood

$$\ell = \ln L = -\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2} - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2$$

$$= -\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta) - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2$$

$$= -\frac{1}{2\sigma^2}\left[y'y - 2\beta'X'y + \beta'X'X\beta\right] - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2.$$

The MLEs of β and σ² are defined by the two equations (necessary conditions for a maximum):

$$\frac{\partial\ell}{\partial\beta} = -\frac{1}{2\sigma^2}\left(-2X'y + 2(X'X)\beta\right) = 0$$

$$\frac{\partial\ell}{\partial\sigma^2} = \frac{(y-X\beta)'(y-X\beta)}{2(\sigma^2)^2} - \frac{n}{2\sigma^2} = 0$$

i.e.,

$$\tilde\beta = (X'X)^{-1}X'y$$

$$\tilde\sigma^2 = \frac{1}{n}(y - X\tilde\beta)'(y - X\tilde\beta) = \frac{e'e}{n} = \frac{\sum e_t^2}{n}.$$

NOTE: (1) β̃ = β̂, i.e., the MLE of β is the least squares estimator.

(2) σ̃² is a biased estimator of σ²; whereas

$$s^2 = \frac{1}{n-k}e'e = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{n-k} = \frac{SSE}{n-k}$$

is unbiased (see Appendix B). Only n-k of the estimated residuals are independent. The necessary conditions for least squares estimates impose k restrictions on the estimated residuals (e). The restrictions are summarized by the normal equations X'Xβ̂ = X'y, or equivalently

X'e = 0.

(3) Substituting the MLEs into ℓ yields the maximized log-likelihood function

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\frac{SSE}{n}\right].$$
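The maximized log-likelihood formula can be checked against a direct evaluation of ln L at the MLEs; the data below are made up for illustration:

```python
import numpy as np

X = np.column_stack([np.ones(6), [1., 2., 3., 4., 5., 6.]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
n, k = X.shape

beta_tilde = np.linalg.solve(X.T @ X, X.T @ y)   # MLE of beta (= OLS)
e = y - X @ beta_tilde
SSE = e @ e
sig2_tilde = SSE / n                             # MLE of sigma^2 (biased)
s2 = SSE / (n - k)                               # unbiased estimator

# direct log-likelihood at the MLEs ...
logL = -SSE / (2 * sig2_tilde) - (n / 2) * np.log(2 * np.pi) \
       - (n / 2) * np.log(sig2_tilde)
# ... versus the concentrated (maximized) form
logL_formula = -(n / 2) * (1 + np.log(2 * np.pi) + np.log(SSE / n))
print(logL, logL_formula)
```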

3. BLUE estimators of β, β̃.

We seek the best linear unbiased estimator. We first consider the desired properties and then derive the associated estimator.

Linear: β̃ = Ay, where A is a k×n matrix of constants.

Unbiased: E(β̃) = AE(y) = AXβ. We note that E(β̃) = AXβ = β for all β requires AX = I.

Minimum variance: letting Aᵢ denote the ith row of A,

Var(β̃ᵢ) = Aᵢ Var(y) Aᵢ' = σ²AᵢAᵢ'.

Thus, the construction of the BLUE is equivalent to selecting the matrix A so that the rows of A solve

min AᵢAᵢ'   i = 1, 2, . . ., k
s.t. AX = I

or equivalently, min Var(β̃ᵢ) s.t. AX = I (unbiased). The details of this derivation are contained in Appendix C.

NOTE: (1) β̃ = β̂ = (X'X)⁻¹X'y, i.e., the BLUE is the least squares estimator.

(2) AX = (X'X)⁻¹X'X = I; thus β̃ is unbiased.

Method of moments parameter estimators are selected to equate sample and

corresponding theoretical moments. The open question is what theoretical

moments should be considered and what are the corresponding sample moments.

With the regression model we might consider the following theoretical moments

which follow from the underlying theoretical assumptions:

(A.2) E(ε_t) = 0

(A.5) Cov(X_it, ε_t) = 0

The sample moment associated with (A.2) is

$$\sum_{t=1}^{n} e_t/n = \bar e = 0,$$

and the sample moments associated with (A.5) are

$$\sum_{t=1}^{n}(X_{it} - \bar X_i)(e_t - \bar e)/n = \sum_{t=1}^{n}(X_{it} - \bar X_i)e_t/n = \sum_{t=1}^{n}X_{it}e_t/n = 0.$$

Collecting these conditions in matrix form,

$$\begin{bmatrix}1 & 1 & \cdots & 1\\ x_{12} & x_{22} & \cdots & x_{n2}\\ \vdots & \vdots & & \vdots\\ x_{1k} & x_{2k} & \cdots & x_{nk}\end{bmatrix}\begin{bmatrix}e_1\\ e_2\\ \vdots\\ e_n\end{bmatrix}\Big/n = X'e/n = 0,$$

which is equivalent to X'e = 0; these are also known as the normal equations in the OLS framework and yield the OLS estimator by solving

$$X'e = X'(Y - X\hat\beta) = 0$$

for β̂.

Instrumental variables: In the model y = Xβ + ε, replacing the sample moment conditions X'e = 0 with conditions based on a matrix of instruments Z gives

$$Z'Y = Z'X\hat\beta_z;\quad\text{hence,}\quad \hat\beta_z = (Z'X)^{-1}Z'y.$$

Instrumental variables estimators are often employed when the variables on the right-hand side include "endogenous" variables or in the case of measurement error. In this case OLS will yield biased and inconsistent estimators; whereas instrumental variables can yield consistent estimators.

NOTE: (1) The motivation for the selection of the instruments (Z) is the behavior of Z'X and Z'ε.

(2) If Lim (Z'X/n) is nonsingular and Lim (Z'ε/n) = 0, then β̂_z is a consistent estimator of β.

(3) Some packages report an R² for instrumental variables estimation using the formula R² = 1 - SSE/SST. Since this can be negative, there is not a natural interpretation of R² for instrumental variables estimators. Further, the R² can't be used to construct F-statistics for IV estimators.

(4) If the instruments are weak (weakly correlated with the X's), then the variances of the IV estimator can be large, and the corresponding asymptotic biases can be large if the Z and the error are correlated. This can be seen by noting that the bias of the instrumental variables estimator is given by

$$E\left[\left(Z'X/n\right)^{-1}\left(Z'\varepsilon/n\right)\right].$$

(5) As a special case, if Z = X, then β̂_z = β̂, the least squares estimator.

(6) If Z is an n × k* matrix where k < k* (Z contains more variables than X), then the IV estimator defined above must be modified. The most common approach in this case is to replace Z in the "IV" equation by the projections of X on the columns of Z, i.e., X̂ = Z(Z'Z)⁻¹Z'X:

$$\hat\beta_{IV} = (\hat X'X)^{-1}\hat X'Y = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'Y,$$

which yields estimates for k ≤ k*.
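Notes (5) and (6) can be illustrated numerically; the data and instruments below are made up for illustration:

```python
import numpy as np

X = np.column_stack([np.ones(8), [1., 2., 3., 4., 5., 6., 7., 8.]])
y = np.array([2., 3., 5., 4., 7., 8., 9., 11.])
# made-up instruments: intercept plus two columns (k* = 3 > k = 2)
Z = np.column_stack([np.ones(8),
                     [1., 1., 2., 2., 3., 3., 4., 4.],
                     [0., 1., 1., 2., 2., 3., 3., 4.]])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# note (5): with Z = X the IV formula (Z'X)^{-1} Z'y reproduces OLS
Z_eq_X = X
beta_iv_x = np.linalg.solve(Z_eq_X.T @ X, Z_eq_X.T @ y)

# note (6): k* > k, so use Xhat = Z (Z'Z)^{-1} Z'X (2SLS projection)
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
print(beta_ols, beta_iv_x, beta_2sls)
```

By construction, the 2SLS residuals are orthogonal to the projected regressors X̂.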

The Stata command for the instrumental variables estimator is given by

ivregress estimator depvar (varlist_1 = varlist_iv) [varlist_2]

where estimator = 2sls, gmm, or liml, with 2sls being the default estimator. For example:

ivregress 2sls y1 (y2 = z1 z2 z3) x1 x2 x3
ivregress 2sls y1 (y2 x1 x2 x3 = z1 z2 z3)

The 2sls estimator can be motivated by first regressing the right-hand-side endogenous variables on the set of instrumental variables. This can be thought of as obtaining estimates of Π in the "reduced form" equation

X = ZΠ + V

to yield

$$\hat\Pi = (Z'Z)^{-1}Z'X;$$

hence, the estimate of X is given by

$$\hat X = Z\hat\Pi = Z(Z'Z)^{-1}Z'X.$$

D. Distribution of β̂, β , β

Recall that under the assumptions (A.1) – (A.5) y ~ N(Xβ, = ζ2I) and

β̂ = β = β = (X X ) -1 X y;

Δ

β̂ = β = β ~ N(A y A y A ) = N[Ax , A 2

IA ]

III 15

where A = (X'X)-1X'.

AXβ = (X'X)-1X'Xβ = β

ζ2AA' = ζ2(X'X)-1X'((X'X)-1X')'

= ζ2(X'X)-1X'X((X'X)-1)'

= ζ2((X'X)-1)'

= ζ2((X'X)')-1

= ζ2(X'X)-1.

Δ

1

Therefore β̂ = β = β ~ N β; 2

XX

NOTE: (1) ζ2(X'X)-1 can be shown to be the Cramer-Rao matrix, the matrix

of lower bounds for the variances of unbiased estimators.

Δ

(2) β̂, β, β, are

unbiased

consistent

estimators

normally distributed

s2(X'X)-1

III 16
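The matrix algebra A(σ²I)A' = σ²(X'X)⁻¹, with A = (X'X)⁻¹X', can be confirmed numerically; the design matrix and σ² below are made-up values:

```python
import numpy as np

X = np.column_stack([np.ones(7),
                     [1., 2., 3., 4., 5., 6., 7.],
                     [1., 4., 2., 8., 5., 7., 3.]])
sig2 = 2.5

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                      # beta_hat = A y

left = sig2 * A @ A.T                  # A (sigma^2 I) A'
right = sig2 * XtX_inv                 # sigma^2 (X'X)^{-1}
AX = A @ X                             # should be the identity (unbiasedness)
print(np.allclose(left, right), np.allclose(AX, np.eye(3)))
```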

The estimated variance-covariance matrix s²(X'X)⁻¹ is reported by most regression programs. In Stata:

. reg y x

. estat vce

$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k)$$

NOTE: This can be proven using the theorem (II'.A.4(b)) and noting that

(n-k)s² = e'e = (Y - Xβ̂)'(Y - Xβ̂)

 = (Xβ + ε)'(I - X(X'X)⁻¹X')(Xβ + ε)

 = ε'(I - X(X'X)⁻¹X')ε.

Therefore,

$$\frac{(n-k)s^2}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)'\left(I - X(X'X)^{-1}X'\right)\left(\frac{\varepsilon}{\sigma}\right) = \left(\frac{\varepsilon}{\sigma}\right)'M\left(\frac{\varepsilon}{\sigma}\right) \sim \chi^2(n-k)$$

because ε/σ ~ N(0, I) and M = I - X(X'X)⁻¹X' is idempotent with Rank(M) = trace(M) = n - k.

E. Statistical Inference

1. Ho: β2 = β3 = . . . = βk = 0

This hypothesis tests for the statistical significance of overall explanatory power

of the explanatory variables by comparing the model with all variables included to

the model without any of the explanatory variables, i.e., yt = β1 + εt (all non-

intercept coefficients = 0). Recall that the total sum of squares (SST) can be

partitioned as follows:

$$\sum_{t=1}^{N}(y_t - \bar y)^2 = \sum_{t=1}^{N}(y_t - \hat y_t)^2 + \sum_{t=1}^{N}(\hat y_t - \bar y)^2 \qquad\text{or}\qquad SST = SSE + SSR.$$

Dividing both sides of the equation by σ² yields quadratic forms, each having a chi-square distribution:

$$\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSR}{\sigma^2}$$

$$\chi^2(n-1) = \chi^2(n-k) + \chi^2(k-1).$$

The two chi-square variables on the right-hand side are independent, so under H₀

$$F = \frac{SSR/(K-1)}{SSE/(n-K)} = \frac{SSR}{SSE}\cdot\frac{n-K}{K-1} \sim F(K-1,\ n-K).$$

NOTE: (1)

$$\frac{SSR}{SSE} = \frac{SSR}{SST - SSR} = \frac{SSR/SST}{1 - \dfrac{SSR}{SST}} = \frac{R^2}{1 - R^2};$$

hence, the F-statistic for this hypothesis can also be rewritten as

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{n-k}{k-1}\cdot\frac{R^2}{1-R^2} \sim F(k-1,\ n-k).$$
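The equality of the sums-of-squares and R² forms of the F statistic can be checked on made-up data:

```python
import numpy as np

X = np.column_stack([np.ones(8),
                     [1., 2., 3., 4., 5., 6., 7., 8.],
                     [2., 1., 4., 3., 6., 5., 8., 7.]])
y = np.array([3.1, 4.2, 6.8, 6.1, 11.0, 9.9, 15.2, 13.8])
n, k = X.shape

beta = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta
SSE = e @ e
SST = ((y - y.mean()) ** 2).sum()
SSR = SST - SSE
R2 = SSR / SST

F_ss = (SSR / (k - 1)) / (SSE / (n - k))        # sums-of-squares form
F_r2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))    # R^2 form
print(F_ss, F_r2)
```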

(2) These results are often summarized in an analysis of variance (ANOVA) table as follows:

Source    SS     df      MS
Model     SSR    K-1     SSR/(K-1)
Error     SSE    n-K     SSE/(n-K) = s²
Total     SST    n-1

K = number of coefficients in model

where the ratio of the model and error mean squares yields the F statistic just discussed.

Additionally, remember that the adjusted R² (R̄²), defined by

$$\bar R^2 = 1 - \frac{\left(\sum e_t^2\right)/(n-K)}{\sum\left(Y_t - \bar Y\right)^2/(n-1)},$$

will only increase with the addition of a new variable if the t-statistic associated with the new variable is greater than 1 in absolute value. This result follows from the equation

$$\bar R^2_{New} - \bar R^2_{Old} = \frac{n-1}{(n-K)(n-K-1)}\cdot\frac{SSE_{New}}{SST}\left[\left(\frac{\hat\beta_{New\_var} - 0}{s_{\hat\beta_{New\_var}}}\right)^2 - 1\right],$$

where the last term in the product is t² - 1, K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K+1 coefficients.

The Lagrangian multiplier (LM) and likelihood ratio (LR) tests can also be used to test this hypothesis, where

$$LM = NR^2 \overset{a}{\sim} \chi^2(k-1)$$

$$LR = -N\ln\left(1 - R^2\right) \overset{a}{\sim} \chi^2(k-1).$$
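The claim that R̄² rises exactly when the added variable's |t| exceeds 1 can be illustrated numerically; both the direct change in R̄² and the closed-form expression are computed (made-up data):

```python
import numpy as np

def fit(X, y):
    n, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta
    SSE = e @ e
    SST = ((y - y.mean()) ** 2).sum()
    R2bar = 1 - (SSE / (n - k)) / (SST / (n - 1))
    se = np.sqrt(SSE / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, SSE, SST, R2bar

y = np.array([3.1, 4.2, 6.8, 6.1, 11.0, 9.9, 15.2, 13.8])
x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
x2 = np.array([2., 1., 4., 3., 6., 5., 8., 7.])     # candidate new variable

X_old = np.column_stack([np.ones(8), x1])
X_new = np.column_stack([X_old, x2])
_, _, _, _, R2bar_old = fit(X_old, y)
beta, se, SSE_new, SST, R2bar_new = fit(X_new, y)

t = beta[2] / se[2]                 # t-statistic on the added variable
n, K = 8, 2                         # K = coefficients in the "old" model
diff_formula = (n - 1) / ((n - K) * (n - K - 1)) * (SSE_new / SST) * (t**2 - 1)
print(R2bar_new - R2bar_old, diff_formula, t)
```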

Recall that

$$\hat\beta \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right)$$

where

$$\sigma^2(X'X)^{-1} = \begin{bmatrix}\sigma^2_{\hat\beta_1} & \sigma_{\hat\beta_1\hat\beta_2} & \cdots & \sigma_{\hat\beta_1\hat\beta_k}\\ \sigma_{\hat\beta_2\hat\beta_1} & \sigma^2_{\hat\beta_2} & \cdots & \sigma_{\hat\beta_2\hat\beta_k}\\ \vdots & & & \vdots\\ \sigma_{\hat\beta_k\hat\beta_1} & \sigma_{\hat\beta_k\hat\beta_2} & \cdots & \sigma^2_{\hat\beta_k}\end{bmatrix}$$

is estimated by

$$s^2(X'X)^{-1} = \begin{bmatrix}s^2_{\hat\beta_1} & s_{\hat\beta_1\hat\beta_2} & \cdots & s_{\hat\beta_1\hat\beta_k}\\ s_{\hat\beta_2\hat\beta_1} & s^2_{\hat\beta_2} & \cdots & s_{\hat\beta_2\hat\beta_k}\\ \vdots & & & \vdots\\ s_{\hat\beta_k\hat\beta_1} & s_{\hat\beta_k\hat\beta_2} & \cdots & s^2_{\hat\beta_k}\end{bmatrix}.$$

The hypothesis H₀: βᵢ = βᵢ⁰ can then be tested using

$$\frac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} \sim t(n-k),$$

which follows from

$$\frac{N(0,1)}{\sqrt{\chi^2(d)/d}} \sim t(d)$$

since

$$\frac{\hat\beta_i - \beta_i}{\sigma_{\hat\beta_i}} \sim N(0,1)\qquad\text{and}\qquad \frac{(n-k)}{\sigma^2_{\hat\beta_i}}\,s^2_{\hat\beta_i} \sim \chi^2(n-k).$$

Hypotheses about linear combinations of the coefficients,

$$\sum_{i=1}^{k}\delta_i\beta_i = (\delta_1, \ldots, \delta_k)\begin{bmatrix}\beta_1\\ \vdots\\ \beta_k\end{bmatrix} = \delta'\beta,$$

take the form H₀: δ'β = γ. Recall that

$$\hat\beta \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right);$$

therefore,

$$\delta'\hat\beta \sim N\left(\delta'\beta,\ \delta'\sigma^2(X'X)^{-1}\delta\right);$$

hence,

$$t = \frac{\delta'\hat\beta - \gamma}{\sqrt{\delta's^2(X'X)^{-1}\delta}} = \frac{\delta'\hat\beta - \gamma}{s_{\delta'\hat\beta}} \sim t(n-k).$$

This approach involves running one regression and estimating the variance of δ'β̂ from s²(X'X)⁻¹.

a. Introduction

We have considered testing hypotheses about individual coefficients (e.g., H₀: β₃ = 6) and testing the validity of a linear constraint on the coefficients (H₀: δ'β = γ). In this section we will consider how more general tests can be performed. The testing procedures will be based on the Chow and likelihood ratio (LR) tests. The hypotheses may be of many different types and involve the previous tests as special cases. Other examples might include joint hypotheses involving several restrictions, H₀: g(β) = 0. If the hypothesis is really valid, then goodness of fit measures such as SSE, R² and log-likelihood values (ℓ) will not be significantly impacted by imposing the valid hypothesis in estimation. Hence, the SSE, R² or ℓ values will not be significantly different for the constrained and the unconstrained regression model. The tests of the validity of the hypothesis are based on the model

y = Xβ + ε.

The Chow and likelihood ratio tests for testing H₀: g(β) = 0 can be constructed from the output obtained from estimating the two following regression models: the unconstrained model and the constrained model (estimated subject to g(β) = 0), with the associated sum of squared errors, R², log-likelihood value and degrees of freedom denoted by (SSE, R², ℓ, n-k) and (SSE*, R²*, ℓ*, (n-k)*), respectively.

b. Chow test

$$\text{Chow} = \frac{(SSE^* - SSE)/r}{SSE/(n-k)} \sim F(r,\ n-k),$$

where r denotes the number of independent restrictions imposed by the hypothesis. For example, if the hypothesis was H₀: β₂ + 6β₅ = 4, β₃ = β₇ = 0, then the numerator degrees of freedom (r) is equal to 3. In applications where the SST is unaltered by imposing the restrictions, we can divide the numerator and denominator by SST to rewrite the Chow test in terms of the change in the R² between the constrained and unconstrained regressions:

$$F = \frac{(R^2 - R^{2*})/r}{(1 - R^2)/(n-k)} \sim F(r,\ n-k).$$

Note that if the hypothesis (H0: g(β) = 0) is valid, then we would expect R2 (SSE)

and R2* (SSE*) to not be significantly different from each other. Thus, it is only

large values (greater than the critical value) of F which provide the basis for

rejecting the hypothesis. Again, the R 2 form of the Chow test is only valid if the

dependent variable is the same in the constrained and unconstrained regression.
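The equivalence of the SSE and R² forms of the Chow statistic (dependent variable unchanged) can be checked numerically; here the made-up restriction is H₀: β₃ = 0, so r = 1:

```python
import numpy as np

def sse_r2(X, y):
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta
    SSE = e @ e
    SST = ((y - y.mean()) ** 2).sum()
    return SSE, 1 - SSE / SST

y = np.array([3.1, 4.2, 6.8, 6.1, 11.0, 9.9, 15.2, 13.8])
x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
x2 = np.array([2., 1., 4., 3., 6., 5., 8., 7.])

X_u = np.column_stack([np.ones(8), x1, x2])   # unconstrained
X_c = np.column_stack([np.ones(8), x1])       # constrained: beta_3 = 0
n, k, r = 8, 3, 1

SSE, R2 = sse_r2(X_u, y)
SSE_star, R2_star = sse_r2(X_c, y)

chow_sse = ((SSE_star - SSE) / r) / (SSE / (n - k))
chow_r2 = ((R2 - R2_star) / r) / ((1 - R2) / (n - k))
print(chow_sse, chow_r2)
```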

References:

(1) Chow, G. C., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions," Econometrica, 28(1960), 591-605.

(2) Fisher, F. M., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note," Econometrica, 38(1970), 361-366.

c. Likelihood ratio (LR) test

The LR test is based on a comparison of maximized log-likelihood statistics. The motivation behind the LR test is similar to that of the Chow test except that it is based on determining whether there has been a significant reduction in the value of the log-likelihood as a result of imposing the hypothesized constraints on β in the estimation process. The LR test statistic is defined to be twice the difference between the unconstrained and constrained log-likelihood values (2(ℓ - ℓ*)) and, under fairly general regularity conditions, is asymptotically distributed as a chi-square with degrees of freedom equal to the number of independent restrictions (r) imposed by the hypothesis. This may be summarized as follows:

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(r).$$

The LR test is more general than the Chow test, and for the case of independent and identically distributed normal errors with known σ², LR is equal to

$$LR = \frac{SSE^* - SSE}{\sigma^2}.$$

Recall that s² = SSE/(n-k) appears in the denominator of the Chow test statistic and that for large values of (n-k), s² is "close" to σ²; hence, we can see the close relationship between the Chow statistic and LR = 2(ℓ - ℓ*).

NOTE: If the hypothesis being tested is H₀: β₂ = β₃ = . . . = β_k = 0 in the normal linear regression model, then SSE* = SST and LR can be rewritten in the form

$$LR = n\ln\left[\frac{1}{1-R^2}\right] = -n\ln\left[1 - R^2\right] \overset{a}{\sim} \chi^2(k-1).$$

In this case, the Chow test is identical to the F test for overall explanatory power discussed earlier.

Thus the Chow test and LR test are similar in structure and purpose. The LR test is more general than the Chow test; however, its distribution is only known asymptotically. Recall that the maximized log-likelihood can be evaluated as

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\frac{SSE}{n}\right].$$

EXAMPLE: For a model with k = 4 estimated coefficients (n - k = n - 4) and a hypothesis imposing two restrictions ((n-k)* - (n-k) = 2):

$$\text{Chow} = \frac{(SSE^* - SSE)/[(n-k)^* - (n-k)]}{SSE/(n-k)} = \frac{(SSE^* - SSE)/2}{SSE/(n-4)} = \frac{n-4}{2}\cdot\frac{SSE^* - SSE}{SSE}$$

$$= \frac{(R^2 - R^{2*})/2}{(1-R^2)/(n-4)} \sim F(2,\ n-4)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(2).$$

d. Testing for structural change. The equality of the regression coefficients in two regimes (H₀: β⁽¹⁾ = β⁽²⁾) can be tested by comparing the unconstrained and constrained models. The unconstrained model allows the coefficient vectors to differ across the two sets of observations:

(1)' $$y = \begin{bmatrix}y^{(1)}\\ y^{(2)}\end{bmatrix} = \begin{bmatrix}X^{(1)} & 0\\ 0 & X^{(2)}\end{bmatrix}\begin{bmatrix}\beta^{(1)}\\ \beta^{(2)}\end{bmatrix} + \begin{bmatrix}\varepsilon^{(1)}\\ \varepsilon^{(2)}\end{bmatrix}$$

and (n - k) = n₁ + n₂ - 2k.

Now impose the hypothesis that β⁽¹⁾ = β⁽²⁾ = β and write (1)' as

(2)' $$y = \begin{bmatrix}y^{(1)}\\ y^{(2)}\end{bmatrix} = \begin{bmatrix}X^{(1)}\\ X^{(2)}\end{bmatrix}\beta + \begin{bmatrix}\varepsilon^{(1)}\\ \varepsilon^{(2)}\end{bmatrix}$$

with (n - k)* = n₁ + n₂ - k. Then

$$\text{Chow} = \frac{(SSE^* - SSE)/[(n-k)^* - (n-k)]}{SSE/(n-k)} = \frac{(R^2 - R^{2*})/k}{(1-R^2)/(n_1+n_2-2k)} \sim F(k,\ n_1 + n_2 - 2k)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(k).$$
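A sketch of the structural-change Chow statistic on made-up two-regime data (the unconstrained SSE is the sum of the two regime SSEs; the constrained SSE comes from the pooled regression):

```python
import numpy as np

def sse(X, y):
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta
    return e @ e

# made-up regime 1 and regime 2 data, k = 2 (intercept + slope)
X1 = np.column_stack([np.ones(5), [1., 2., 3., 4., 5.]])
y1 = np.array([1.2, 2.9, 5.1, 6.8, 9.2])
X2 = np.column_stack([np.ones(5), [1., 2., 3., 4., 5.]])
y2 = np.array([3.1, 3.8, 5.2, 5.9, 7.1])
n1 = n2 = 5
k = 2

# unconstrained: separate regressions, SSE = SSE_1 + SSE_2
SSE = sse(X1, y1) + sse(X2, y2)
# constrained: pooled regression imposing beta(1) = beta(2)
SSE_star = sse(np.vstack([X1, X2]), np.concatenate([y1, y2]))

chow = ((SSE_star - SSE) / k) / (SSE / (n1 + n2 - 2 * k))
print(chow)
```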

NOTE: a. Stata reports the log-likelihood values when the command

estat ic

is executed following estimation of the model.

(2) H1: β2 = 1

H2: β3 = 0

H3: β3 + β4 = 1

H4: β3β4 = 1

H5: β2 = 1 and β3 = 0


reg Y X2 X3 X4 (estimates the unconstrained model)

test X2 = 1 (Tests H1)

test X3 = 0 (Tests H2)

test X3 + X4 = 1 (Tests H3)

testnl _b[X3]*_b[X4] = 1 (Tests H4; testnl is the command for testing nonlinear hypotheses. The prefix "_b", along with the brackets, must be used when testing nonlinear hypotheses.)

The default confidence level for reported confidence intervals is 95% in Stata. To change the confidence level, use the "level" option as follows:

reg Y X2 X3 X4, level(90) (changes the confidence level to 90%)

F. Stepwise Regression

In stepwise regression, explanatory variables are added to or dropped from the model on the basis of their statistical significance and not according to any theoretical reason. While stepwise regression can be run in several ways, in forward selection a stepwise regression will add one independent variable at a time to see if it is significant. If the variable is significant, it is kept in the model and another variable is tried; if it is insignificant, it is not included in the model. This process continues until no additional variables are statistically significant. The corresponding Stata commands take the following forms:

Forward:

Backward:

where the "#" in "pr(#)" is the significance level at which variables are removed, such as 0.051, and the "#" in "pe(#)" is the significance level at which variables are entered or added to the model. If pr(#1) and pe(#2) are both included in a stepwise regression command, #1 must be greater than #2. Also, "depvar" represents the dependent variable,

“forced_indepvars” represent the independent variables which the user wishes to remain

in the model no matter what their significance level may be, and “other_indepvars”

represents the other independent variables which the stepwise regression will consider

including or excluding. Forward and backward stepwise regression may yield different

results.

G. Forecasting

Let yt = F(Xt, β) + εt

denote the stochastic relationship between the variable yt and the vector of variables Xt

Forecasts are generally made by estimating the vector of parameters β (β̂) and evaluating the estimated relationship at X_t:

$$\hat y_t = F(X_t, \hat\beta).$$

(The impact of uncertainty about β and about X_t on forecasts is discussed later.)

[Figure: the population regression line F(X_t, β), with Y_t plotted against X_t]

If β and X_t are known, the forecast error is

FE = y_t - ŷ_t = y_t - F(X_t, β) = ε_t

with variance

σ²_FE = Variance(FE) = Var(ε_t) = σ²,

and a (1 - α) confidence interval for y_t is given by

$$\Pr\left[F(X_t, \beta) - t_{(\alpha/2)}\sigma < y_t < F(X_t, \beta) + t_{(\alpha/2)}\sigma\right] = 1 - \alpha.$$

[Figure: the regression line with the corresponding confidence band for y_t]

3. Uncertainty about β

Assume F(X_t, β) = X_tβ in the model y_t = F(X_t, β) + ε_t; then the predicted value of y_t is

$$\hat y_t = X_t\hat\beta,$$

and the variance of ŷ_t (the sample regression line), σ²_ŷt, is given by

$$\sigma^2_{\hat y_t} = X_t\operatorname{Var}(\hat\beta)X_t',$$

with the variance of the forecast error (actual y) given by

$$\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat y_t}.$$

Note that σ²_FE takes account of the uncertainty associated with the unknown regression line and the error term and can be used to construct confidence intervals for the actual value of Y rather than just the regression line. Estimators of σ²_ŷt and σ²_FE can be easily obtained by replacing σ² with its unbiased estimator s². Then

$$\Pr\left[X_t\hat\beta - t_{(\alpha/2)}s_{FE} < Y_t < X_t\hat\beta + t_{(\alpha/2)}s_{FE}\right] = 1 - \alpha.$$

[Figure: confidence bands for the regression line and for the actual value of Y]

4. Some students have found that the following summary facilitates their understanding of the different confidence intervals for the population regression line and the actual value of Y. The entry for the estimated coefficients is only included to compare the structure of the three cases.

Estimated coefficients, β̂ = (X'X)⁻¹X'Y:
- Distribution: β̂ ~ N(β, σ²(X'X)⁻¹)
- t-stat: 1 - α = Pr[-t_(α/2) < (β̂ᵢ - βᵢ)/s_β̂ᵢ < t_(α/2)]
- Confidence interval: Pr[β̂ᵢ - t_(α/2)s_β̂ᵢ < βᵢ < β̂ᵢ + t_(α/2)s_β̂ᵢ] = 1 - α

Predicted value of the regression line, Ŷ_t = X_tβ̂ (the predicted Y value corresponding to X_t):
- Distribution: Ŷ_t ~ N(X_tβ, σ²_Ŷt = X_t(σ²(X'X)⁻¹)X_t')
- t-stat: 1 - α = Pr[-t_(α/2) < (Ŷ_t - X_tβ)/s_Ŷt < t_(α/2)]
- Confidence interval: Pr[X_tβ̂ - t_(α/2)s_Ŷt < E(Y_t|X_t) < X_tβ̂ + t_(α/2)s_Ŷt] = 1 - α

Forecast error, FE = Y_t - Ŷ_t = Y_t - X_tβ̂:
- Distribution: FE ~ N(0, σ²_FE = σ² + σ²_Ŷt)
- t-stat: 1 - α = Pr[-t_(α/2) < (FE - 0)/s_FE < t_(α/2)]
- Confidence interval: Pr[X_tβ̂ - t_(α/2)s_FE < Y_t < X_tβ̂ + t_(α/2)s_FE] = 1 - α

where s_Ŷt is used to compute confidence intervals for the regression line (E(Y_t|X_t)) and s_FE is used in the calculation of confidence intervals for the actual value of Y. Recall that s²_FE = s² + s²_Ŷt; hence s²_FE > s²_Ŷt and the confidence intervals for Y are larger than for the population regression line.

5. Uncertainty about X. In many situations the value of the independent variable also

needs to be predicted along with the value of y. Not surprisingly, a “poor” estimate of

Xt will likely result in a poor forecast for y. This can be represented graphically as

follows:

[Figure: a poor prediction X̂_t of X_t translates into a poor forecast Ŷ_t of Y_t]

One way to explore the predictive ability of a model is to estimate the model on a subset of the data and then use the estimated model to predict known outcomes which were not used in the estimation.

EXAMPLE: Consider the estimated relationship

ŷ_t = β̂₁ + β̂₂G_t + β̂₃M_t

where y_t, G_t, M_t denote GDP, government expenditure, and the money supply. Assume that

$$s^2(X'X)^{-1} = \begin{bmatrix}10 & 5 & 2\\ 5 & 20 & 3\\ 2 & 3 & 15\end{bmatrix}\cdot 10^{-3},\qquad s^2 = 10,$$

with β̂ = (10, 2.5, 6)' and X_t = (1, 100, 200). Then

$$\hat y_t = X_t\hat\beta = (1,\ 100,\ 200)\begin{bmatrix}10\\ 2.5\\ 6\end{bmatrix} = 10 + 250 + 1200 = 1460$$

$$s^2_{\hat y_t} = X_t\left(s^2(X'X)^{-1}\right)X_t' = (1,\ 100,\ 200)\begin{bmatrix}10 & 5 & 2\\ 5 & 20 & 3\\ 2 & 3 & 15\end{bmatrix}\begin{bmatrix}1\\ 100\\ 200\end{bmatrix}\cdot 10^{-3} = 921.81$$

$$s_{\hat y_t} = 30.36$$

$$s^2_{FE} = s^2 + s^2_{\hat y_t} = 10 + 921.81 = 931.81$$

$$s_{FE} = 30.53$$
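The arithmetic in this example is easy to verify with a quick numerical check:

```python
import numpy as np

s2_XtX_inv = np.array([[10., 5., 2.],
                       [5., 20., 3.],
                       [2., 3., 15.]]) * 1e-3
s2 = 10.0
beta_hat = np.array([10.0, 2.5, 6.0])
X_t = np.array([1.0, 100.0, 200.0])

y_hat = X_t @ beta_hat                      # 10 + 250 + 1200 = 1460
s2_yhat = X_t @ s2_XtX_inv @ X_t            # 921.81
s2_FE = s2 + s2_yhat                        # 931.81
print(y_hat, s2_yhat, np.sqrt(s2_yhat), np.sqrt(s2_FE))
```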

a) The data file should include values for the explanatory variables

c) Use the predict command, picking the name you want for the predictions, in


predict e, resid this option predicts the residuals (e)

predict sfe, stdf this option predicts the standard

error of the forecast ( s FE )

predict syhat, stdp this option predicts the standard

error of the prediction ( sYˆ )

list y yhat sfe this option lists indicated variables

These commands result in the calculation and reporting of Y, Ŷ, e, sFE and

sYˆ for observations 1 through n2. The predictions will show up in the Data

Editor of STATA under the variable names you picked (in this case, yhat,

e, sfe and syhat).

NOTE: s²_ŷt can also be recovered as s²_ŷt = s²_FE - s².

Theory

OBJECTIVE: The objective of problems 1 & 2 is to demonstrate that the matrix equations and

summation equations for the estimators and variances of the estimators are equivalent.

Remember that $\sum_{t=1}^{n}X_t = N\bar X$, and don't get discouraged!!

1. Consider the simple linear regression model Y_t = β₁ + β₂X_t + ε_t, written equivalently as

(1)' $$\begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{bmatrix} = \begin{bmatrix}1 & X_1\\ 1 & X_2\\ \vdots & \vdots\\ 1 & X_n\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} + \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix}$$

(1)'' Y = Xβ + ε.

The least squares estimator of β = (β₁, β₂)' is β̂ = (X'X)⁻¹X'Y with

$$\operatorname{Var}(\hat\beta) = \begin{bmatrix}\operatorname{Var}(\hat\beta_1) & \operatorname{Cov}(\hat\beta_1, \hat\beta_2)\\ \operatorname{Cov}(\hat\beta_2, \hat\beta_1) & \operatorname{Var}(\hat\beta_2)\end{bmatrix} = \sigma^2(X'X)^{-1}.$$

Verify each of the following. *Hint: It might be helpful to work backwards on parts c and e.

a. $$X'X = \begin{bmatrix}N & N\bar X\\ N\bar X & \sum X_t^2\end{bmatrix}\qquad\text{and}\qquad X'Y = \begin{bmatrix}N\bar Y\\ \sum_{t=1}^{N}X_tY_t\end{bmatrix}$$

b. $\hat\beta_2 = \left(\sum X_tY_t - N\bar X\bar Y\right)\big/\left(\sum X_t^2 - N\bar X^2\right)$

c. $\hat\beta_1 = \bar Y - \hat\beta_2\bar X$

d. $\operatorname{Var}(\hat\beta_2) = \sigma^2\big/\left(\sum X_t^2 - N\bar X^2\right)$

e. $\operatorname{Var}(\hat\beta_1) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar X^2}{\sum X_t^2 - N\bar X^2}\right] = \operatorname{Var}(\bar Y) + \bar X^2\operatorname{Var}(\hat\beta_2)$

f. $\operatorname{Cov}(\hat\beta_1, \hat\beta_2) = -\bar X\operatorname{Var}(\hat\beta_2)$

(JM II’-A, JM Stats)
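The equivalences asserted in parts a-f can be confirmed numerically on made-up data:

```python
import numpy as np

Xv = np.array([1., 2., 3., 4., 5., 6.])
Y = np.array([2.2, 3.9, 6.1, 8.2, 9.8, 12.1])
N = len(Xv)
X = np.column_stack([np.ones(N), Xv])

# matrix form of the estimator
b_matrix = np.linalg.solve(X.T @ X, X.T @ Y)

# summation forms (parts b and c)
Xbar, Ybar = Xv.mean(), Y.mean()
b2 = (np.sum(Xv * Y) - N * Xbar * Ybar) / (np.sum(Xv**2) - N * Xbar**2)
b1 = Ybar - b2 * Xbar

# variance expressions (parts d-f), for an arbitrary sigma^2
sig2 = 4.0
V = sig2 * np.linalg.inv(X.T @ X)
var_b2 = sig2 / (np.sum(Xv**2) - N * Xbar**2)
var_b1 = sig2 * (1 / N + Xbar**2 / (np.sum(Xv**2) - N * Xbar**2))
cov_b1b2 = -Xbar * var_b2
print(b_matrix, (b1, b2))
```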

where

$$Y = \begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{bmatrix},\qquad X = \begin{bmatrix}X_1\\ X_2\\ \vdots\\ X_n\end{bmatrix},\qquad \varepsilon = \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix}.$$

b. Using the matrices in 2(a), evaluate (X'X)⁻¹X'Y and compare your answer with the results obtained in question 4 in Problem Set 2.1.

c. Evaluate (X'X)⁻¹.

(JM II’-A)

Applied

where price is the house price measured in thousands of dollars, sqrft is

the floorspace measured in square feet, and bdrms is the number of bedrooms.


b. What is the estimated increase in price for a house with one more bedroom, holding

square footage constant?

c. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).

d. What percentage variation in price is explained by square footage and number of

bedrooms?

e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling

price for this house from the OLS regression line.

f. The actual selling price of the first house in the sample was $300,000 (so price = 300).

Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for

the house?


Theory

1. The coefficient of determination (R²) is defined by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where SSE = Σe_t², SST = Σ(Y_t - Ȳ)², and SSR = Σ(Ŷ_t - Ȳ)².

a. Demonstrate that 0 ≤ R² ≤ 1.

b. Demonstrate that SST = SSR + SSE. (Be careful! Show that Ȳ = Ŷ̄ = X̄β̂.)

c. If an additional explanatory variable is added to the regression model, will the R² increase, decrease, or remain unaltered? (Hint: What is the effect upon SST, SSE?)

d. The adjusted R², R̄², is defined by

$$\bar R^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)}.$$

Demonstrate that

$$\frac{1-k}{n-k} \le \bar R^2 \le 1,$$

i.e., the adjusted R² can be negative.

$$\left(\text{Hint: } 1 - \bar R^2 = \frac{SSE}{SST}\cdot\frac{n-1}{n-k} = \frac{n-1}{n-k}(1 - R^2)\right)$$

e. Verify that

$$LR = \frac{SSE^* - SSE}{\sigma^2}\quad\text{if }\sigma^2\text{ is known,}$$

where SSE* denotes the restricted SSE, and that for the hypothesis H₀: β₂ = . . . = β_k = 0 it can be written as

$$LR = n\ln\left(\frac{1}{1-R^2}\right) = -n\ln(1 - R^2).$$

FYI: The corresponding Lagrangian multiplier (LM) test statistic for this hypothesis can be written in terms of the coefficient of determination as LM = NR².

(JM II-B)

2. Demonstrate that

b. X'e = 0 implies that the sum of the estimated error terms will equal zero if the regression equation includes an intercept.

Remember: e = Y - Ŷ = Y - Xβ̂.

(JM II-B)

Applied

3. The following model can be used to study whether campaign expenditures affect election outcomes:

voteA = β₀ + β₁ln(expendA) + β₂ln(expendB) + β₃prtystrA + ε

where voteA is the percent of the vote received by Candidate A, expendA and expendB are

campaign expenditures by Candidates A and B, and prtystrA is a measure of party

strength for Candidate A (the percent of the most recent presidential vote that went to A's

party).

i) What is the interpretation of β1?

ii) In terms of the parameters, state the null hypothesis that a 1% increase in A's

expenditures is offset by a 1% increase in B's expenditures.

iii) Estimate the model above using the data in VOTE1.RAW and report the results in

the usual form. Do A's expenditures affect the outcome? What about B's

expenditures? Can you use these results to test the hypothesis in part (ii)?

iv) Estimate a model that directly gives the t statistic for testing the hypothesis in part (ii). What do you conclude? (Use a two-sided alternative.) A possible approach: to test H₀: β₁ + β₂ = 0, define θ = β₁ + β₂, plug θ - β₂ in for β₁, and simplify to obtain a regression in which θ appears as the coefficient on ln(expendA).

You can check your results by constructing the “high-tech” t-test or by using the

Stata command, test ln(expendA) + ln(expendB) =0 following the estimation of

the unconstrained regression model.

(Wooldridge C. 4.1)

2 40.84 66.30 139.24

3 42.83 65.27 141.64

4 43.89 67.32 148.77

5 46.10 67.20 151.02

6 44.45 65.18 143.38

7 43.87 65.57 148.19

8 49.99 71.42 167.12

9 52.64 77.52 171.33

10 57.93 79.46 176.41

(1) $$Y_t = e^{\beta_1 + \beta_2 t}L_t^{\beta_3}K_t^{\beta_4}\varepsilon_t$$

where β₂t takes account of changes in output for any reason other than a change in L_t or K_t; ε_t denotes a random disturbance having the property that lnε_t is distributed N(0, σ²). Labor's share,

total wage receipts / total sales receipts,

is given by β₃ if β₃ + β₄ (the returns to scale) is equal to one, and β₂ is the rate of growth of output, (dY_t/dt)/Y_t, for fixed L and K. Taking the natural logarithm of equation (1), we obtain

(2) $$\ln Y_t = \beta_1 + \beta_2 t + \beta_3\ln L_t + \beta_4\ln K_t + \ln\varepsilon_t.$$

b. Corresponding to equation (2)

1) Test the hypothesis Ho: β2 = β3 = β4 = 0. Explain the implications of this

hypothesis. (95% confidence level)

2) perform and interpret individual tests of significance of β2, β3, and β4, i.e. test

Ho : βi = 0 .α = .05.

3) test the hypothesis of constant returns to scale, i.e., Ho: β3 + β4 = 1, using

a. a t-test for general linear hypothesis, let restrictions δ= (0,0,1,1);

b. a Chow test;

c. a LR test.

c. Estimate equation (3) and test the hypothesis that labor’s share is equal to .75, i.e., β3 =

.75.

d. Re-estimate the model (equation 2) with the first nine observations and check to see if the actual

log(output) for the 10th observation lies in the 95% forecast confidence interval.

(JM II)

a. What restrictions on the translog production function result in a Cobb-Douglas

production function?

b. Estimate the translog production function using the data in problem 5 and use the Chow and

LR tests to determine whether it provides a statistically significant improved fit to the data,

relative to the Cobb-Douglas function.

(JM II)

Consider the transcendental production function

$$Y = e^{\beta_1 + \beta_2 t + \beta_3 L + \beta_4 K}\,L^{\beta_5}K^{\beta_6}.$$

a. What restrictions on the transcendental production function result in a Cobb-Douglas

production function?

b. Estimate the transcendental production function using the data in problem 2 and use the Chow

and LR tests to compare it with the Cobb-Douglas production function.

(JM II)

APPENDIX A

Some important derivatives:

Let

$$X = \begin{bmatrix}x_1\\ x_2\end{bmatrix},\qquad a = \begin{bmatrix}a_1\\ a_2\end{bmatrix},\qquad A = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}\ \text{(symmetric: } a_{12} = a_{21} = a\text{)}.$$

1. $\dfrac{d(a'X)}{dX} = \dfrac{d(X'a)}{dX} = a$

2. $\dfrac{d(X'AX)}{dX} = 2AX$

Proof of d(X'a)/dX = a:

$$\frac{d(X'a)}{dX} = \begin{bmatrix}\partial(X'a)/\partial x_1\\ \partial(X'a)/\partial x_2\end{bmatrix} = \begin{bmatrix}a_1\\ a_2\end{bmatrix} = a.$$

Proof of d(X'AX)/dX = 2AX:

$$\frac{d(X'AX)}{dX} = \begin{bmatrix}\partial(X'AX)/\partial x_1\\ \partial(X'AX)/\partial x_2\end{bmatrix} = \begin{bmatrix}2a_{11}x_1 + 2a x_2\\ 2a x_1 + 2a_{22}x_2\end{bmatrix} = 2\begin{bmatrix}a_{11}x_1 + a x_2\\ a x_1 + a_{22}x_2\end{bmatrix} = 2\begin{bmatrix}a_{11} & a\\ a & a_{22}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = 2AX.$$
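Both derivative rules can be verified with a central-difference gradient check; the vectors a, x and the symmetric matrix A below are arbitrary illustrative values:

```python
import numpy as np

a = np.array([3.0, -1.0])
A = np.array([[2.0, 0.5],
              [0.5, 4.0]])    # symmetric, as in the appendix
x = np.array([1.5, -2.5])
h = 1e-6

def num_grad(f, x):
    # central-difference approximation of the gradient of f at x
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

grad1 = num_grad(lambda z: a @ z, x)        # should equal a
grad2 = num_grad(lambda z: z @ A @ z, x)    # should equal 2 A x
print(grad1, grad2, 2 * A @ x)
```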

APPENDIX B

Demonstration that

$$s^2 = \frac{1}{n-k}\left(y'(I - X(X'X)^{-1}X')y\right) = SSE/(n-k)$$

is an unbiased estimator of σ². Useful properties of the trace operator, tr(A) = Σᵢ aᵢᵢ, include:

1) tr(I) = n

5) tr(ABC) = tr(CAB)

6) tr(kA) = k tr(A)

Recall

$$\hat\sigma^2 = \frac{1}{n}e'e\qquad\text{and}\qquad s^2 = \frac{1}{n-k}e'e,$$

where

e = y - Xβ̂ = y - X(X'X)⁻¹X'y = My

 = M(Xβ + ε) = MXβ + Mε

 = Mε,

where M = I - X(X'X)⁻¹X' (so that MX = 0, M' = M, and MM = M). So

$$\hat\sigma^2 = \frac{1}{n}e'e = \frac{1}{n}\varepsilon'M'M\varepsilon = \frac{1}{n}\varepsilon'MM\varepsilon = \frac{1}{n}\varepsilon'M\varepsilon$$

and

$$s^2 = \frac{1}{n-k}\varepsilon'M\varepsilon.$$

Then

$$E(\hat\sigma^2) = \frac{1}{n}E(\varepsilon'M\varepsilon) = \frac{1}{n}E(\operatorname{tr}(\varepsilon'M\varepsilon))\quad(\varepsilon'M\varepsilon\text{ is a scalar})$$

$$= \frac{1}{n}E\operatorname{tr}(M\varepsilon\varepsilon') = \frac{1}{n}\operatorname{tr}(M\,E(\varepsilon\varepsilon'))$$

$$= \frac{1}{n}\operatorname{tr}(M\sigma^2 I)\quad(\text{using } E(\varepsilon\varepsilon') = \sigma^2 I,\ \text{i.e., } \operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0,\ i\ne j)$$

$$= \frac{1}{n}\operatorname{tr}(\sigma^2 M) = \frac{\sigma^2}{n}\operatorname{tr}(M)$$

$$= \frac{\sigma^2}{n}\operatorname{tr}(I - X(X'X)^{-1}X') = \frac{\sigma^2}{n}\left(n - \operatorname{tr}(X(X'X)^{-1}X')\right)$$

$$= \frac{\sigma^2}{n}\left(n - \operatorname{tr}(X'X(X'X)^{-1})\right) = \frac{\sigma^2}{n}\left(n - \operatorname{trace}(I_k)\right)$$

$$= \frac{\sigma^2}{n}(n-k) = \frac{n-k}{n}\sigma^2,$$

so

$$E(s^2) = \frac{n}{n-k}E(\hat\sigma^2) = \sigma^2.$$

Therefore σ̂² is biased, but E(s²) = σ² and s² is unbiased.
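The key facts used above, that M is idempotent and tr(M) = n - k, can be checked numerically for an arbitrary design matrix:

```python
import numpy as np

# made-up design matrix: intercept, trend, and trend squared (n = 9, k = 3)
X = np.column_stack([np.ones(9), np.arange(1.0, 10.0), np.arange(1.0, 10.0)**2])
n, k = X.shape

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(M @ M, M), np.trace(M))   # idempotent; trace = n - k = 6
```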

APPENDIX C

Demonstration that β̃ = AY = (X'X)⁻¹X'Y is BLUE.

Proof: Let β̃ᵢ = AᵢY where Aᵢ denotes the ith row of the matrix A. Since the result will be symmetric for each β̃ᵢ (hence, for each Aᵢ), denote Aᵢ' by a where a is an (n by 1) vector. The problem of minimizing the variance subject to unbiasedness is

min a'Ia s.t. X'a = i,

where i denotes the corresponding column of the k×k identity matrix. Forming the Lagrangian ℒ = a'Ia + λ'(X'a - i), the necessary conditions are

∂ℒ/∂a = 2Ia + Xλ = 0

∂ℒ/∂λ = (X'a - i) = 0.

This implies

a = (-1/2)Xλ.

Now substitute a = (-1/2)Xλ into the expression ∂ℒ/∂λ = 0 and we obtain

(-1/2)X'Xλ = i

λ = -2(X'X)⁻¹i

a' = (-1/2)(-2)i'(X'X)⁻¹X' = i'(X'X)⁻¹X' = Aᵢ,

which implies

A = (X'X)⁻¹X';

hence, β̃ = (X'X)⁻¹X'y.
