
REVISION

MULTIPLE REGRESSION

Econometrics I - Exercise 2
Lubomír Cingl

March 4, 2014


REVISION

Simple regression model

$$y = \beta_0 + \beta_1 x + u \tag{1}$$

y: dependent / explained variable

x: independent / explanatory / control variable

u: error term

$\beta_0$: intercept

$\beta_1$: slope

we want to know the relationship between x and y


ASSUMPTIONS

Average value of u in the population is 0: $E(u) = 0$
  - not very restrictive

Zero conditional mean
  - knowing something about x does not give any information about u
  - $E(u|x) = E(u) = 0$, which implies $E(y|x) = \beta_0 + \beta_1 x$


[Figure: $E(y|x)$ as a linear function of x; the distribution of y is centered around its expected value]


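ESTIMATION FROM A SAMPLE

we have a random sample of observations

for each observation holds:

$$y_i = \beta_0 + \beta_1 x_i + u_i \tag{2}$$

we want the best estimates of the parameters $\hat{\beta}_0$, $\hat{\beta}_1$
  - they define the sample regression line

3 ways to find them: method of moments (MoM), ordinary least squares (OLS), maximum likelihood (ML)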


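FORMULAS TO KNOW

intercept

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{3}$$

slope

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{4}$$

if $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$
  - $\hat{\beta}_1$ is the sample covariance between x and y divided by the sample variance of x

If x and y are positively correlated, the slope is positive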
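
The intercept and slope formulas (3) and (4) can be tried out directly. The following is a minimal sketch, assuming NumPy is available; the function name `ols_simple` and the small data set are made up for illustration and do not come from the course materials.

```python
import numpy as np

def ols_simple(x, y):
    """Return (intercept, slope) from the closed-form formulas (3) and (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    sxx = np.sum(dx ** 2)
    if sxx <= 0:
        # The slope is only defined if x varies in the sample.
        raise ValueError("x must vary in the sample")
    slope = np.sum(dx * (y - y.mean())) / sxx      # equation (4)
    intercept = y.mean() - slope * x.mean()        # equation (3)
    return intercept, slope

# Hypothetical illustrative data (e.g. years of education and income).
x = np.array([8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 18.0])
y = np.array([4.1, 5.9, 7.2, 6.8, 8.3, 9.1, 10.6])
b0, b1 = ols_simple(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

The slope computed this way should agree with the ratio of the sample covariance to the sample variance of x, since the `n - 1` factors cancel.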


SAMPLE REGRESSION LINE

e.g. $\widehat{inc} = 0.33 + 0.56 \, edu$


R SQUARED

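each observation can be decomposed into an explained and an unexplained part

$y_i = \hat{y}_i + \hat{u}_i$

we can define the following:
  - $\sum (y_i - \bar{y})^2$ is the total sum of squares, SST (proportional to the sample variance of y)
  - $\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares, SSE
  - $\sum \hat{u}_i^2$ is the residual sum of squares, SSR

SST = SSE + SSR

$R^2 = SSE/SST = 1 - SSR/SST$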
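
The decomposition SST = SSE + SSR can be checked numerically. This is a small sketch under stated assumptions: NumPy is available, the line is fitted with `np.polyfit`, and the helper name `decompose` and the data are hypothetical.

```python
import numpy as np

def decompose(x, y):
    """Return (SST, SSE, SSR, R^2) for a simple OLS regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # OLS line with an intercept
    y_hat = intercept + slope * x            # fitted values
    u_hat = y - y_hat                        # residuals
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
    ssr = np.sum(u_hat ** 2)                 # residual sum of squares
    return sst, sse, ssr, sse / sst

# Hypothetical illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.9, 4.1, 4.8, 6.3, 6.6])
sst, sse, ssr, r2 = decompose(x, y)
print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SSR = {ssr:.3f}, R^2 = {r2:.3f}")
```

In simple regression the resulting R^2 also equals the squared sample correlation between x and y.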


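PROPERTIES OF OLS ESTIMATOR

Unbiased
  - the expected value of the estimator is its true value
  - $\hat{\beta}_1 = \beta_1 + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}$, so $E(\hat{\beta}_1) = \beta_1$

Variance
  - Assume homoskedasticity: $Var(u|x) = \sigma^2$
  - $\sigma^2$ is the error variance
  - $Var(\hat{\beta}_1) = \sigma^2 / \sum (x_i - \bar{x})^2 = \sigma^2 / s_x^2$, where $s_x^2$ denotes $\sum (x_i - \bar{x})^2$

We also have to estimate $\sigma^2$
  - $\hat{\sigma}^2 = \frac{1}{n-2} \sum \hat{u}_i^2 = SSR/(n-2)$
  - $\widehat{Var}(\hat{\beta}_1) = \hat{\sigma}^2 / \sum (x_i - \bar{x})^2 = \frac{1}{n-2} \sum \hat{u}_i^2 / s_x^2$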
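
The estimated variance of the slope can be computed step by step from the residuals. The sketch below assumes NumPy and homoskedastic errors; the function name `slope_standard_error` and the data are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def slope_standard_error(x, y):
    """Return (slope estimate, its standard error) for simple OLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()
    sxx = np.sum(dx ** 2)                          # sum of (x_i - x_bar)^2
    b1 = np.sum(dx * (y - y.mean())) / sxx         # slope estimate
    b0 = y.mean() - b1 * x.mean()                  # intercept estimate
    u_hat = y - b0 - b1 * x                        # residuals
    sigma2_hat = np.sum(u_hat ** 2) / (n - 2)      # SSR / (n - 2)
    return b1, np.sqrt(sigma2_hat / sxx)

# Hypothetical illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.4, 7.9, 9.3])
b1, se_b1 = slope_standard_error(x, y)
print(f"slope = {b1:.3f}, se = {se_b1:.3f}")
```

The divisor `n - 2` reflects the two estimated parameters (intercept and slope).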


EXAMPLE 2.7
Consider the savings function

$$sav = \beta_0 + \beta_1 inc + u, \quad u = \sqrt{inc} \cdot e \tag{5}$$

where e is a random variable with $E(e) = 0$ and $Var(e) = \sigma_e^2$.
Assume that e is independent of inc.
1. Show that E(u|inc) = 0, so that the key zero conditional
mean assumption is satisfied. [Hint: if e independent of
inc, then E(e|inc) = E(e) and Var(e|inc) = Var(e)]
2. Show that $Var(u|inc) = \sigma_e^2 \, inc$.
3. Provide a discussion that supports the assumption that the
variance of savings increases with family income.


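INTERPRETATION

linear function
  - $y = \beta_0 + \beta_1 x$
  - $\beta_1$ is the marginal effect of x on y

natural log
  - $y = \log(x)$; $x > 0$
  - good to know: $\log(1 + x) \approx x$ for $x \approx 0$, and $\log(x_1) - \log(x_0) \approx (x_1 - x_0)/x_0 = \Delta x / x_0$
  - therefore $100 \cdot \Delta \log(x) \approx \% \Delta x$

constant elasticity model
  - $\log(y) = \beta_0 + \beta_1 \log(x)$; $y, x > 0$
  - $\beta_1$ is the elasticity of y with respect to x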
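
How good the log approximation to a percentage change is depends on the size of the change. The following sketch uses only the Python standard library; the function names and the numbers 100, 102 and 150 are arbitrary choices for illustration.

```python
import math

def exact_pct_change(x0, x1):
    """Exact percentage change from x0 to x1."""
    return 100.0 * (x1 - x0) / x0

def log_pct_change(x0, x1):
    """Approximate percentage change: 100 times the change in logs."""
    return 100.0 * (math.log(x1) - math.log(x0))

# Small change: the two measures are close.
small_exact = exact_pct_change(100.0, 102.0)
small_log = log_pct_change(100.0, 102.0)

# Large change: the approximation becomes noticeably worse.
large_exact = exact_pct_change(100.0, 150.0)
large_log = log_pct_change(100.0, 150.0)

print(f"small: exact {small_exact:.2f}%, log {small_log:.2f}%")
print(f"large: exact {large_exact:.2f}%, log {large_log:.2f}%")
```

For the 2% change the log difference gives about 1.98%, while for the 50% change it gives only about 40.5%, so the approximation should be reserved for small changes.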


EXAMPLE 2.6
Using data from 1988 for houses sold close to a garbage incinerator, the
following equation relates housing price to distance from the
incinerator:

$$\widehat{\log(price)} = 9.4 + 0.312 \log(dist); \quad n = 135; \quad R^2 = 0.162 \tag{6}$$

1. Interpret the coefficient on log(dist). Is the sign of this
estimate what you expect it to be?
2. Do you think simple regression provides an unbiased
estimator of the ceteris paribus elasticity of price with
respect to dist? (Think about the city's decision on where to
put the incinerator.)
3. What other factors about a house affect its price? Might
these be correlated with distance from the incinerator?


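REGRESSION THROUGH ORIGIN

restriction: y = 0 when x = 0
  - then we estimate $\tilde{y} = \tilde{\beta}_1 x$

we know that

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{7}$$

if we set $\bar{x} = 0$, $\bar{y} = 0$, we get a possibly biased estimator:

$$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \tag{8}$$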
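
To see why the through-origin estimator (8) is only "possibly" biased, it can be compared with the usual slope (7) on noiseless data. This is a sketch assuming NumPy; the function names and the two data-generating lines (`y = 2x` and `y = 3 + 2x`) are hypothetical choices for illustration.

```python
import numpy as np

def slope_through_origin(x, y):
    """Slope from regressing y on x without an intercept, equation (8)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sum(x ** 2)

def slope_with_intercept(x, y):
    """Usual OLS slope from a regression with an intercept, equation (7)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

x = np.array([1.0, 2.0, 3.0])
y_zero_intercept = 2.0 * x          # true intercept 0, true slope 2
y_shifted = 3.0 + 2.0 * x           # true intercept 3, true slope 2

print(slope_through_origin(x, y_zero_intercept), slope_with_intercept(x, y_zero_intercept))
print(slope_through_origin(x, y_shifted), slope_with_intercept(x, y_shifted))
```

When the true intercept is zero both estimators recover the slope; when it is not, forcing the line through the origin pushes the slope estimate away from 2, while the regression with an intercept is unaffected.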


EXAMPLE 2.8
Consider the standard simple regression model with the standard
assumptions met. Let $\tilde{\beta}_1$ be the estimator of $\beta_1$ obtained by
assuming the intercept is zero.
1. Find $E(\tilde{\beta}_1)$ in terms of the $x_i$, $\beta_0$, and $\beta_1$. Verify that $\tilde{\beta}_1$ is
unbiased for $\beta_1$ when the population intercept $\beta_0$ is zero.
Are there other cases where $\tilde{\beta}_1$ is unbiased?
2. Find the variance of $\tilde{\beta}_1$. Hint: It does not depend on $\beta_0$.
3. Show that $Var(\tilde{\beta}_1) \leq Var(\hat{\beta}_1)$. Hint: $\sum x_i^2 \geq \sum (x_i - \bar{x})^2$,
with strict inequality unless $\bar{x} = 0$.
4. Comment on the trade-off between bias and variance
when choosing between $\hat{\beta}_1$ and $\tilde{\beta}_1$.


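EXAMPLE 2.9

1. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the intercept and slope from the regression
of $y_i$ on $x_i$, using n observations. Let $c_1$ and $c_2$, with $c_2 \neq 0$,
be constants. Let $\tilde{\beta}_0$ and $\tilde{\beta}_1$ be the intercept and slope from
the regression of $c_1 y_i$ on $c_2 x_i$. Show that $\tilde{\beta}_1 = (c_1/c_2)\hat{\beta}_1$ and
$\tilde{\beta}_0 = c_1 \hat{\beta}_0$, thereby verifying the claims on units of
measurement in Section 2.4. [Hint: To obtain $\tilde{\beta}_1$ and $\tilde{\beta}_0$, plug
the scaled versions of x and y into the OLS formulas.]
2. Now let $\tilde{\beta}_0$ and $\tilde{\beta}_1$ be from the regression of $(c_1 + y_i)$ on
$(c_2 + x_i)$ (with no restriction on $c_1$ or $c_2$). Show that $\tilde{\beta}_1 = \hat{\beta}_1$
and $\tilde{\beta}_0 = \hat{\beta}_0 + c_1 - c_2 \hat{\beta}_1$.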


EXAMPLE 2.9

3. Now let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the OLS estimates from the
regression of $\log(y_i)$ on $x_i$, with $y_i > 0$. For $c_1 > 0$, let $\tilde{\beta}_0$ and $\tilde{\beta}_1$ be the
intercept and slope from the regression of $\log(c_1 y_i)$ on $x_i$.
Show that $\tilde{\beta}_1 = \hat{\beta}_1$ and $\tilde{\beta}_0 = \hat{\beta}_0 + \log(c_1)$.
4. Now, assuming that $x_i > 0$, let $\tilde{\beta}_0$ and $\tilde{\beta}_1$ be the intercept and
slope from the regression of $y_i$ on $\log(c_2 x_i)$. How do $\tilde{\beta}_0$ and $\tilde{\beta}_1$
compare with the intercept and slope from the regression of $y_i$
on $\log(x_i)$?
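
Before proving the claims in parts 1 to 3, it can help to confirm them numerically. The sketch below assumes NumPy; the helper `fit`, the simulated data, and the constants `c1 = 3.0` and `c2 = 0.5` are arbitrary illustrative choices, and a numerical check is not a substitute for the derivation the exercise asks for.

```python
import numpy as np

def fit(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 5.0, size=50)
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.1, size=50))  # y > 0 by construction
c1, c2 = 3.0, 0.5

b0, b1 = fit(x, y)                      # baseline regression of y on x
s0, s1 = fit(c2 * x, c1 * y)            # part 1: rescaled variables
t0, t1 = fit(c2 + x, c1 + y)            # part 2: shifted variables
l0, l1 = fit(x, np.log(y))              # baseline regression of log(y) on x
m0, m1 = fit(x, np.log(c1 * y))         # part 3: log of rescaled y

print(s1, (c1 / c2) * b1)               # slopes under rescaling
print(t0, b0 + c1 - c2 * b1)            # intercepts under shifting
print(m0, l0 + np.log(c1))              # intercepts with log(c1 * y)
```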


Multiple regression


MULTIPLE REGRESSION MODEL

parallels with the simple regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + u \tag{9}$$

y: dependent / explained variable

x: independent / explanatory / control variables

u: error term
  - similar assumptions about the error term

$\beta_0$: intercept

$\beta_1, \beta_2, ...$: slopes

linear: linear in parameters

Interpretation: $\Delta y = \beta_1 \Delta x_1$
  - ceteris paribus: holding other factors constant


EXAMPLE I

effect of education on wage

$$wage = \beta_0 + \beta_1 educ + \beta_2 exper + u \tag{10}$$

we take exper out of the error term

ceteris paribus: holding experience constant, we can measure the effect of education on wage
  - partial effect


EXAMPLE II

effect of income on consumption

$$cons = \beta_0 + \beta_1 inc + \beta_2 inc^2 + u \tag{11}$$

same way of getting the parameters

interpretation is different: $\beta_1$ alone does not make sense as the effect of inc

$$\frac{\Delta cons}{\Delta inc} \approx \beta_1 + 2 \beta_2 inc$$

marginal effect of income on consumption depends on both terms
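
The dependence of the marginal effect on the income level can be illustrated with a few lines of plain Python. The coefficient values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from any data set.

```python
def predicted_cons(b0, b1, b2, inc):
    """Predicted consumption from the quadratic model (error term set to 0)."""
    return b0 + b1 * inc + b2 * inc ** 2

def marginal_effect(b1, b2, inc):
    """Approximate effect of one more unit of income: b1 + 2 * b2 * inc."""
    return b1 + 2.0 * b2 * inc

# Hypothetical coefficients: consumption rises with income at a decreasing rate.
B0, B1, B2 = 1.0, 0.8, -0.01

me_low = marginal_effect(B1, B2, 10.0)    # at a low income level
me_high = marginal_effect(B1, B2, 30.0)   # at a higher income level
print(f"marginal effect at inc=10: {me_low:.2f}")
print(f"marginal effect at inc=30: {me_high:.2f}")
```

With a negative coefficient on the squared term, the marginal effect shrinks as income grows, which is why the coefficient on inc cannot be read on its own.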


ASSUMPTIONS

Expected value of u is 0: $E(u) = 0$

Zero conditional mean (ASS3)
  - knowing something about the x's does not give any information about u
  - $E(u|x_1, x_2, ..., x_k) = 0$
  - all unobserved factors in the error term are uncorrelated with all $x_j$

No perfect collinearity (ASS4)
  - no independent variable is constant
  - no exact linear relationship among them holds
  - the x's can be correlated, but not perfectly

we need at least as many observations as parameters: $n \geq k + 1$


ESTIMATION FROM A SAMPLE

we have a random sample of observations (ASS2)

for each observation holds (ASS1):

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_k x_{ik} + u_i \tag{12}$$

we want the best estimates of the parameters $\hat{\beta}_0, \hat{\beta}_1, ...$

3 ways to find them: MoM, OLS, ML

we will get the estimated sample regression function

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik} \tag{13}$$


PROPERTIES

if ASS1 through ASS4 hold, then our OLS estimator is unbiased
  - i.e. the procedure of getting the estimate is unbiased

if we add an irrelevant variable
  - no effect on the parameters of the relevant variables: still unbiased
  - slight increase in $R^2$

if we omit an important variable
  - OLS will be biased, usually
  - omitted variable bias


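OMITTED VARIABLE BIAS

true model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$

estimated model omitting $x_2$: $\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$

$$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2} = \beta_1 + \beta_2 \tilde{\delta}_1$$

where $\tilde{\delta}_1$ is the slope from the regression of $x_2$ on $x_1$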
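
The bias formula has an exact in-sample counterpart: the slope from the short regression equals the long-regression slope on x1 plus the long-regression slope on x2 times the slope from regressing x2 on x1. The following sketch checks this on simulated data, assuming NumPy; the helper `ols`, the seed, and the true coefficients (1, 2, 3) are made up for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients [intercept, slopes...] for regressors X (1-D or 2-D)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xc, y, rcond=None)[0]

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                  # x2 positively correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)  # hypothetical true model

b_long = ols(np.column_stack([x1, x2]), y)  # regression of y on x1 and x2
b_short = ols(x1, y)                        # regression of y on x1 only (x2 omitted)
delta = ols(x1, x2)                         # regression of x2 on x1

implied = b_long[1] + b_long[2] * delta[1]  # long slope + b2_hat * delta1_tilde
print(f"short-regression slope: {b_short[1]:.4f}")
print(f"implied by the formula: {implied:.4f}")
```

Because the effect of x2 is positive and x2 is positively correlated with x1 in this setup, the short regression overstates the effect of x1.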


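VARIANCE

Assume homoskedasticity: $Var(u|x_1, x_2, ..., x_k) = \sigma^2$
  - variance of u is the same for all combinations of outcomes of the x's

$$Var(\hat{\beta}_j) = \frac{\sigma^2}{\sum (x_{ij} - \bar{x}_j)^2 (1 - R_j^2)} = \frac{\sigma^2}{SST_j (1 - R_j^2)} \tag{14}$$

  - $R_j^2$ is the $R^2$ from the regression of $x_j$ on all other x's
  - its size is important: the closer $R_j^2$ is to 1, the larger the variance

We also have to estimate $\sigma^2$
  - $\hat{\sigma}^2 = \frac{1}{n - k - 1} \sum \hat{u}_i^2 = SSR/df$
  - $\widehat{Var}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{SST_j (1 - R_j^2)}$
  - $se(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{SST_j (1 - R_j^2)}}$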
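
The standard error formula with $SST_j$ and $R_j^2$ can be computed directly from two regressions: the main one for the residuals, and an auxiliary one of $x_j$ on the other regressors. This is a minimal sketch assuming NumPy and homoskedastic errors; the function names, the seed, and the simulated data are illustrative assumptions.

```python
import numpy as np

def ols_fit(X, y):
    """Return (coefficients, residuals) from OLS of y on X plus an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    return beta, y - Xc @ beta

def se_via_rj2(X, y, j):
    """se of the slope on column j of X: sigma_hat / sqrt(SST_j * (1 - R_j^2))."""
    n, k = X.shape
    _, u_hat = ols_fit(X, y)
    sigma2_hat = u_hat @ u_hat / (n - k - 1)      # SSR / df
    xj = X[:, j]
    others = np.delete(X, j, axis=1)              # all regressors except x_j
    _, r = ols_fit(others, xj)                    # residuals of x_j on the others
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - (r @ r) / sst_j                  # R^2 of the auxiliary regression
    return np.sqrt(sigma2_hat / (sst_j * (1.0 - r2_j)))

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                # correlated regressors
X = np.column_stack([x1, x2])
y = 1.0 + 0.5 * x1 - 1.5 * x2 + rng.normal(size=n)

print(f"se(b1) = {se_via_rj2(X, y, 0):.4f}, se(b2) = {se_via_rj2(X, y, 1):.4f}")
```

The result should agree with the square roots of the diagonal of the usual matrix expression, the estimated error variance times the inverse of X'X, which is how most software reports standard errors.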


EXAMPLE 3.9
The following equation describes the median housing price in a
community in terms of amount of pollution (nox for nitrous
oxide) and the average number of rooms in houses in the
community (rooms):

$$\log(price) = \beta_0 + \beta_1 \log(nox) + \beta_2 rooms + u$$

1. What are the probable signs of the regression slopes?
Interpret $\beta_1$ and explain.
2. Why might nox [more precisely, log(nox)] and rooms be
negatively correlated? If this is the case, does the simple
regression of log(price) on log(nox) produce an upward or
downward biased estimator of $\beta_1$?


EXAMPLE 3.9
3. Using the data in HPRICE2.RAW, the following equations
were estimated:

$$\widehat{\log(price)} = 11.71 - 1.043 \log(nox), \quad n = 506, \quad R^2 = 0.264$$

$$\widehat{\log(price)} = 9.23 - 0.718 \log(nox) + 0.306 \, rooms, \quad n = 506, \quad R^2 = 0.514$$

Is the relationship between the simple and multiple regression
estimates of the elasticity of price with respect to nox what you
would have predicted, given your answer in part 2.? Does this
mean that -0.718 is definitely closer to the true elasticity than
-1.043?


EXAMPLE 3.12
1. Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ under
the first four Gauss-Markov assumptions. For some function
g(x), for example $g(x) = x^2$ or $g(x) = \log(1 + x^2)$, define
$z_i = g(x_i)$. Define a slope estimator as

$$\tilde{\beta}_1 = \frac{\sum (z_i - \bar{z}) y_i}{\sum (z_i - \bar{z}) x_i}$$

Show that $\tilde{\beta}_1$ is linear and unbiased. Remember, because
$E(u|x) = 0$, you can treat both $x_i$ and $z_i$ as nonrandom in your
derivation.
2. Add the homoskedasticity assumption, MLR.5. Show that

$$Var(\tilde{\beta}_1) = \sigma^2 \frac{\sum (z_i - \bar{z})^2}{\left( \sum (z_i - \bar{z}) x_i \right)^2}$$


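EXAMPLE 3.12

3. Show directly that, under the Gauss-Markov assumptions,
$Var(\hat{\beta}_1) \leq Var(\tilde{\beta}_1)$, where $\hat{\beta}_1$ is the OLS estimator.
[Hint: The Cauchy-Schwarz inequality in Appendix B implies
that

$$\left( n^{-1} \sum (z_i - \bar{z})(x_i - \bar{x}) \right)^2 \leq \left( n^{-1} \sum (z_i - \bar{z})^2 \right) \left( n^{-1} \sum (x_i - \bar{x})^2 \right);$$

notice that we can drop $\bar{x}$ from the sample covariance.]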


EXAMPLE 3.15
The file CEOSAL2.RAW contains data on 177 chief executive
officers, which can be used to examine the effects of firm
performance on CEO salary.
1. Estimate a model relating annual salary to firm sales and
market value. Make the model of the constant elasticity
variety for both independent variables. Write the results
out in equation form.
2. Add profits to the model from part 1. Why can this
variable not be included in logarithmic form? Would you
say that these firm performance variables explain most of
the variation in CEO salaries?


EXAMPLE 3.15

3. Add the variable ceoten to the model in part 2. What is the
estimated percentage return for another year of CEO tenure,
holding other factors fixed?
4. Find the sample correlation coefficient between the variables
log(mktval) and profits. Are these variables highly correlated?
What does this say about the OLS estimators?
