Introductory Econometrics: Chapter 2 Slides

REVISION
MULTIPLE REGRESSION
Econometrics I - Exercise 2
Lubomír Cingl
March 4, 2014
Revision
Simple regression model
$y = \beta_0 + \beta_1 x + u \qquad (1)$
y: dependent / explained variable
x: independent / explanatory / control variable
u: error term
$\beta_0$: intercept
$\beta_1$: slope
we want to learn how y is related to x, i.e. how y changes when x changes
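ASSUMPTIONS
Average value of u in the population is 0: $E(u) = 0$
not very restrictive: as long as the intercept $\beta_0$ is included, any nonzero mean of u can be absorbed into $\beta_0$
Zero conditional mean
knowing something about x gives no information about the average of u
$E(u \mid x) = E(u) = 0$, which implies $E(y \mid x) = \beta_0 + \beta_1 x$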
[Figure: $E(y \mid x)$ as a linear function of x; for each value of x, the distribution of y is centered around its expected value.]
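ESTIMATION FROM A SAMPLE
we have a random sample of observations $\{(x_i, y_i): i = 1, \dots, n\}$
for each observation:
$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (2)$
we want the best estimates $\hat\beta_0, \hat\beta_1$ of the parameters $\beta_0, \beta_1$
three ways to find them: method of moments (MoM), ordinary least squares (OLS), maximum likelihood (ML)
[Figure: sample regression line]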
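FORMULAS TO KNOW
intercept
$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \qquad (3)$
slope
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} \qquad (4)$
provided that $\sum_{i=1}^{n} (x_i - \bar x)^2 > 0$, i.e. x is not constant in the sample
$\hat\beta_1$ is the sample covariance between x and y divided by the sample variance of x
if x and y are positively correlated in the sample, the slope estimate is positive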
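Formulas (3) and (4) translate directly into code. The sketch below uses plain Python with no external libraries; the function name `ols_simple` and the toy data are illustrative assumptions, not part of the original slides.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for the OLS regression of y on x, using (3) and (4)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    # numerator of (4): sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # denominator of (4): must be > 0, i.e. x must vary in the sample
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant in the sample; the slope is not defined")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar   # formula (3)
    return beta0_hat, beta1_hat

# made-up data lying exactly on y = 1 + 2x
b0, b1 = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)   # 1.0 2.0
```

Because the toy data lie exactly on a line, the estimates reproduce that line; with real data the residuals would be nonzero.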
SAMPLE REGRESSION LINE
e.g. $\widehat{inc} = 0.33 + 0.56\, edu$
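R-SQUARED
each observation can be split into an explained part and an unexplained part:
$y_i = \hat y_i + \hat u_i$
we define the following:
$SST = \sum_{i=1}^{n} (y_i - \bar y)^2$: total sum of squares ($n - 1$ times the sample variance of y)
$SSE = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$: explained sum of squares
$SSR = \sum_{i=1}^{n} \hat u_i^2$: residual sum of squares
$SST = SSE + SSR$
$R^2 = SSE/SST = 1 - SSR/SST$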
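A small numerical check of the decomposition $SST = SSE + SSR$ and of the two equivalent $R^2$ formulas. The helper `r_squared` and the five data points are hypothetical, chosen only so that the numbers are easy to verify by hand.

```python
from statistics import mean

def r_squared(x, y):
    """Fit y on x by OLS and return (SST, SSE, SSR, R2)."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]            # explained part (fitted values)
    u_hat = [b - f for b, f in zip(y, y_hat)]   # unexplained part (residuals)
    sst = sum((b - y_bar) ** 2 for b in y)
    sse = sum((f - y_bar) ** 2 for f in y_hat)
    ssr = sum(u ** 2 for u in u_hat)
    return sst, sse, ssr, sse / sst

sst, sse, ssr, r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sst, sse, ssr, r2)   # SST = 6, SSE = 3.6, SSR = 2.4, R2 = 0.6 (up to rounding)
```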
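PROPERTIES OF OLS ESTIMATOR
Unbiased
the expected value of the estimator equals the true parameter value
$\hat\beta_1 = \beta_1 + \dfrac{\sum_{i} (x_i - \bar x) u_i}{\sum_{i} (x_i - \bar x)^2}$, so, conditional on the sample values of x and because $E(u_i \mid x) = 0$, $E(\hat\beta_1) = \beta_1$
Variance
assume homoskedasticity: $Var(u \mid x) = \sigma^2$
$\sigma^2$ is the error variance
$Var(\hat\beta_1) = \sigma^2 \big/ \sum_{i} (x_i - \bar x)^2 = \sigma^2 / SST_x$
we also have to estimate $\sigma^2$:
$\hat\sigma^2 = \frac{1}{n-2} \sum_{i} \hat u_i^2 = SSR/(n-2)$
$\widehat{Var}(\hat\beta_1) = \hat\sigma^2 / SST_x = \dfrac{\frac{1}{n-2} \sum_{i} \hat u_i^2}{\sum_{i} (x_i - \bar x)^2}$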
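The variance formula becomes a standard error once $\sigma^2$ is replaced by $\hat\sigma^2 = SSR/(n-2)$. A minimal sketch, assuming homoskedasticity as on this slide; the function name `slope_and_se` and the data are illustrative.

```python
import math
from statistics import mean

def slope_and_se(x, y):
    """OLS slope of y on x and its usual (homoskedasticity-only) standard error."""
    n = len(x)
    if n <= 2:
        raise ValueError("need n > 2 to estimate sigma^2")
    x_bar, y_bar = mean(x), mean(y)
    sst_x = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    sigma2_hat = ssr / (n - 2)          # two parameters estimated, so n - 2 degrees of freedom
    se_b1 = math.sqrt(sigma2_hat / sst_x)
    return b1, se_b1

b1, se = slope_and_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b1, se)   # slope 0.6, standard error sqrt(0.8 / 10), about 0.283
```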
EXAMPLE 2.7
Consider the savings function
$sav = \beta_0 + \beta_1 inc + u, \quad u = \sqrt{inc}\, e \qquad (5)$
where e is a random variable with $E(e) = 0$ and $Var(e) = \sigma_e^2$.
Assume that e is independent of inc.
1. Show that $E(u \mid inc) = 0$, so that the key zero conditional mean assumption is satisfied. [Hint: if e is independent of inc, then $E(e \mid inc) = E(e)$ and $Var(e \mid inc) = Var(e)$.]
2. Show that $Var(u \mid inc) = \sigma_e^2\, inc$.
3. Provide a discussion that supports the assumption that the variance of savings increases with family income.
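A quick simulation can make parts 1 and 2 plausible before proving them (it is not a proof). Beyond what the exercise states, the sketch assumes that e is normally distributed with $\sigma_e = 2$ and that income takes the hypothetical values 10, 40 and 100; only Python's standard library is used.

```python
import math
import random
from statistics import mean, pvariance

def simulate_u(inc, sigma_e, n, rng):
    """Draw n values of u = sqrt(inc) * e, with e ~ N(0, sigma_e^2) drawn independently of inc."""
    return [math.sqrt(inc) * rng.gauss(0.0, sigma_e) for _ in range(n)]

rng = random.Random(12345)   # fixed seed so the run is reproducible
sigma_e = 2.0                # hypothetical standard deviation of e
results = {}
for inc in (10, 40, 100):    # hypothetical income levels
    u = simulate_u(inc, sigma_e, 20_000, rng)
    results[inc] = (mean(u), pvariance(u))
    # the sample mean should be near 0 and the sample variance near sigma_e**2 * inc
    print(inc, round(results[inc][0], 2), round(results[inc][1], 1), sigma_e ** 2 * inc)
```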
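INTERPRETATION
linear function
$y = \beta_0 + \beta_1 x$
$\beta_1$ is the marginal effect of x on y: $\Delta y = \beta_1 \Delta x$
natural log
$y = \log(x)$, $x > 0$
good to know: $\log(1 + x) \approx x$ for $x \approx 0$, and
$\log(x_1) - \log(x_0) \approx (x_1 - x_0)/x_0 = \Delta x / x_0$
therefore $100 \cdot \Delta\log(x) \approx \%\Delta x$
constant elasticity model
$\log(y) = \beta_0 + \beta_1 \log(x)$; $y, x > 0$
$\beta_1$ is the elasticity of y with respect to x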
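The log approximation can be checked numerically: for small changes, $100 \cdot \Delta\log(x)$ is close to the exact percentage change, and the gap widens as the change grows. The helper names below are illustrative.

```python
import math

def pct_change_exact(x0, x1):
    """Exact percentage change from x0 to x1."""
    return 100 * (x1 - x0) / x0

def pct_change_log(x0, x1):
    """Log approximation to the percentage change: 100 * (log(x1) - log(x0))."""
    return 100 * (math.log(x1) - math.log(x0))

for x1 in (101, 105, 110, 150):
    print(x1, pct_change_exact(100, x1), round(pct_change_log(100, x1), 3))
```

A 1% change is reproduced almost exactly, while a 50% change is understated by roughly 9.5 percentage points.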
EXAMPLE 2.6
Using data from 1988 for houses sold near a garbage incinerator, the following equation relates housing price to the distance from the incinerator:
$\widehat{\log(price)} = 9.4 + 0.312 \log(dist); \quad n = 135, \; R^2 = 0.162 \qquad (6)$
1. Interpret the coefficient on log(dist). Is the sign of this estimate what you expect it to be?
2. Do you think simple regression provides an unbiased estimator of the ceteris paribus elasticity of price with respect to dist? (Think about the city's decision on where to put the incinerator.)
3. What other factors about a house affect its price? Might these be correlated with distance from the incinerator?
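Taking equation (6) at face value, the coefficient 0.312 is an elasticity. The sketch compares the elasticity approximation $0.312 \cdot \%\Delta dist$ with the change implied by exponentiating the fitted log-log line, $(dist_1/dist_0)^{0.312} - 1$. The function name and the chosen percentage changes are illustrative, and the sketch says nothing about whether 0.312 is an unbiased estimate (parts 2 and 3).

```python
beta1 = 0.312  # coefficient on log(dist) in equation (6)

def implied_pct_change_price(pct_change_dist, elasticity=beta1):
    """Percentage change in predicted price implied by log(price) = b0 + b1 * log(dist)."""
    ratio = 1 + pct_change_dist / 100
    exact = 100 * (ratio ** elasticity - 1)    # exact under the fitted log-log line
    approx = elasticity * pct_change_dist       # elasticity approximation
    return exact, approx

for pct in (1, 10, 50):
    exact, approx = implied_pct_change_price(pct)
    print(f"dist +{pct}%: price {exact:.2f}% (exact), {approx:.2f}% (approx)")
```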
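REGRESSION THROUGH ORIGIN
restriction: the fitted line passes through the origin, i.e. $\tilde y = 0$ when $x = 0$
then we estimate $\tilde y = \tilde\beta_1 x$
we know that
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} \qquad (7)$
if we set $\bar x = 0$ and $\bar y = 0$, we get a possibly biased estimator (the same one obtained by minimizing $\sum_{i} (y_i - \tilde\beta_1 x_i)^2$):
$\tilde\beta_1 = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \qquad (8)$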
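Formulas (7) and (8) can be compared on made-up data generated from a line with a nonzero intercept ($y = 3 + 2x$, a hypothetical example): OLS with an intercept recovers the slope, while the regression through the origin does not. The function names are illustrative.

```python
def slope_through_origin(x, y):
    """Regression-through-origin slope (8): sum(x * y) / sum(x^2)."""
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("all x are zero")
    return sum(a * b for a, b in zip(x, y)) / sxx

def slope_with_intercept(x, y):
    """Usual OLS slope (7)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))

# hypothetical data on the line y = 3 + 2x: the true intercept is not zero
x = [1, 2, 3, 4]
y = [3 + 2 * a for a in x]
print(slope_with_intercept(x, y))   # 2.0, the true slope
print(slope_through_origin(x, y))   # 3.0, pulled away from 2 by the omitted intercept
```

If the data are instead generated with a zero intercept, e.g. `y = [2 * a for a in x]`, both functions return the same slope.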
EXAMPLE 2.8
Consider the standard simple regression model with the standard assumptions met. Let $\tilde\beta_1$ be the estimator of $\beta_1$ obtained by assuming the intercept is zero.
1. Find $E(\tilde\beta_1)$ in terms of the $x_i$, $\beta_0$, and $\beta_1$. Verify that $\tilde\beta_1$ is unbiased for $\beta_1$ when the population intercept $\beta_0$ is zero. Are there other cases where $\tilde\beta_1$ is unbiased?
2. Find the variance of $\tilde\beta_1$. Hint: it does not depend on $\beta_0$.
3. Show that $Var(\tilde\beta_1) \le Var(\hat\beta_1)$. Hint: $\sum_{i} x_i^2 \ge \sum_{i} (x_i - \bar x)^2$, with strict inequality unless $\bar x = 0$.
4. Comment on the trade-off between bias and variance when choosing between $\hat\beta_1$ and $\tilde\beta_1$.
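A Monte Carlo sketch of the trade-off in part 4. All settings are hypothetical: $\beta_0 = 1$, $\beta_1 = 0.5$, x fixed at 1, ..., 10, normal errors with $\sigma = 1$, and 5000 replications. The simulation only illustrates what you are asked to derive; it does not replace the derivation.

```python
import random
from statistics import mean, variance

def mc_compare(beta0, beta1, x, reps=5000, sigma=1.0, seed=2014):
    """Simulated mean and variance of OLS (beta1_hat) and regression through the origin (beta1_tilde)."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sst_x = sum((a - x_bar) ** 2 for a in x)
    sum_x2 = sum(a * a for a in x)
    hats, tildes = [], []
    for _ in range(reps):
        # x is held fixed across replications (we condition on x); only u is redrawn
        y = [beta0 + beta1 * a + rng.gauss(0.0, sigma) for a in x]
        y_bar = sum(y) / n
        hats.append(sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sst_x)
        tildes.append(sum(a * b for a, b in zip(x, y)) / sum_x2)
    return (mean(hats), variance(hats)), (mean(tildes), variance(tildes))

x = list(range(1, 11))
(m_hat, v_hat), (m_tilde, v_tilde) = mc_compare(beta0=1.0, beta1=0.5, x=x)
print(f"OLS: mean {m_hat:.3f}, var {v_hat:.4f}")
print(f"RTO: mean {m_tilde:.3f}, var {v_tilde:.4f}")
```

With a nonzero $\beta_0$, the regression-through-origin estimates should center away from 0.5 but vary less across replications than the OLS estimates.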
EXAMPLE 2.9
1. Let $\hat\beta_0$ and $\hat\beta_1$ be the intercept and slope from the regression of $y_i$ on $x_i$, using n observations. Let $c_1$ and $c_2$, with $c_2 \neq 0$, be constants. Let $\tilde\beta_0$ and $\tilde\beta_1$ be the intercept and slope from the regression of $c_1 y_i$ on $c_2 x_i$. Show that $\tilde\beta_1 = (c_1/c_2)\hat\beta_1$ and $\tilde\beta_0 = c_1 \hat\beta_0$, thereby verifying the claims on units of measurement in Section 2.4. [Hint: To obtain $\tilde\beta_1$ and $\tilde\beta_0$, plug the scaled versions of x and y into the OLS formulas.]
2. Now let $\tilde\beta_0$ and $\tilde\beta_1$ be from the regression of $(c_1 + y_i)$ on $(c_2 + x_i)$ (with no restriction on $c_1$ or $c_2$). Show that $\tilde\beta_1 = \hat\beta_1$ and $\tilde\beta_0 = \hat\beta_0 + c_1 - c_2 \hat\beta_1$.
EXAMPLE 2.9
3. Now let $\hat\beta_0$ and $\hat\beta_1$ be the OLS estimates from the regression of $\log(y_i)$ on $x_i$, where $y_i > 0$. For $c_1 > 0$, let $\tilde\beta_0$ and $\tilde\beta_1$ be the intercept and slope from the regression of $\log(c_1 y_i)$ on $x_i$. Show that $\tilde\beta_1 = \hat\beta_1$ and $\tilde\beta_0 = \hat\beta_0 + \log(c_1)$.
4. Now, assuming that $x_i > 0$ (and $c_2 > 0$), let $\tilde\beta_0$ and $\tilde\beta_1$ be the intercept and slope from the regression of $y_i$ on $\log(c_2 x_i)$. How do $\tilde\beta_0$ and $\tilde\beta_1$ compare with the intercept and slope from the regression of $y_i$ on $\log(x_i)$?
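Before proving the claims of Example 2.9, they can be checked numerically on arbitrary data. The sketch uses hypothetical data and constants ($c_1 = 3$, $c_2 = 0.5$); parts 1 to 3 are checked with `assert`, while part 4 only prints both regressions so you can compare them yourself. The helper `ols` is illustrative.

```python
import math

def ols(x, y):
    """Intercept and slope from the simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

# hypothetical data and constants, chosen only for illustration
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 3.9, 6.2, 8.1, 9.7]
c1, c2 = 3.0, 0.5
b0, b1 = ols(x, y)

# Part 1: c1*y on c2*x; claim: slope (c1/c2)*b1, intercept c1*b0
t0, t1 = ols([c2 * a for a in x], [c1 * b for b in y])
assert math.isclose(t1, (c1 / c2) * b1) and math.isclose(t0, c1 * b0)

# Part 2: (c1 + y) on (c2 + x); claim: slope b1, intercept b0 + c1 - c2*b1
t0, t1 = ols([c2 + a for a in x], [c1 + b for b in y])
assert math.isclose(t1, b1) and math.isclose(t0, b0 + c1 - c2 * b1)

# Part 3: log(c1*y) on x versus log(y) on x; claim: same slope, intercept shifted by log(c1)
l0, l1 = ols(x, [math.log(b) for b in y])
t0, t1 = ols(x, [math.log(c1 * b) for b in y])
assert math.isclose(t1, l1) and math.isclose(t0, l0 + math.log(c1))

# Part 4: compare y on log(c2*x) with y on log(x); inspect the output and explain it
p4_log_x = ols([math.log(a) for a in x], y)
p4_log_c2x = ols([math.log(c2 * a) for a in x], y)
print(p4_log_x)
print(p4_log_c2x)
```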
Multiple regression
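MULTIPLE REGRESSION MODEL
parallels with the simple regression model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \qquad (9)$
y: dependent / explained variable
$x_1, \dots, x_k$: independent / explanatory / control variables
u: error term; similar assumptions about the error term
$\beta_0$: intercept
$\beta_1, \beta_2, \dots$: slopes
linear: the model is linear in the parameters
Interpretation: $\Delta y = \beta_1 \Delta x_1$
ceteris paribus: holding the other factors ($x_2, \dots, x_k$ and u) constant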
EXAMPLE I
effect of education on wage
$wage = \beta_0 + \beta_1 educ + \beta_2 exper + u \qquad (10)$
we take exper out of the error term
ceteris paribus: holding experience constant, we can measure the effect of education on wage
$\beta_1$ is a partial effect
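EXAMPLE II
effect of income on consumption
$cons = \beta_0 + \beta_1 inc + \beta_2 inc^2 + u \qquad (11)$
same way of getting the parameters
the interpretation differs: $\beta_1$ alone makes no sense, because inc cannot change while $inc^2$ is held fixed
$\dfrac{\Delta cons}{\Delta inc} \approx \beta_1 + 2\beta_2\, inc$
the marginal effect of income on consumption depends on both terms (and on the level of inc)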
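For model (11), the approximation $\Delta cons / \Delta inc \approx \beta_1 + 2\beta_2\, inc$ can be compared with the exact change in the quadratic part of the model. The coefficients below are hypothetical (positive $\beta_1$, negative $\beta_2$), chosen only to show how the marginal effect varies with the level of income.

```python
def marginal_effect(beta1, beta2, inc):
    """Approximate marginal effect of income in model (11): beta1 + 2 * beta2 * inc."""
    return beta1 + 2 * beta2 * inc

def exact_change(beta1, beta2, inc, d_inc):
    """Exact change in beta1*inc + beta2*inc**2 when inc rises by d_inc (the intercept cancels)."""
    return beta1 * (inc + d_inc) + beta2 * (inc + d_inc) ** 2 - (beta1 * inc + beta2 * inc ** 2)

# hypothetical coefficients: positive beta1, negative beta2 (diminishing marginal effect)
b1, b2 = 0.8, -0.002
for inc in (10, 50, 100):
    print(inc, round(marginal_effect(b1, b2, inc), 4), round(exact_change(b1, b2, inc, 1), 4))
```

The exact change for a one-unit increase is $\beta_1 + \beta_2(2\,inc + 1)$, so it differs from the approximation by exactly $\beta_2$.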
ASSUMPTIONS
Expected value of u is 0: $E(u) = 0$
Zero conditional mean (ASS3)
knowing something about the x's gives no information about the average of u
$E(u \mid x_1, x_2, \dots, x_k) = 0$
this implies that all unobserved factors in the error term are uncorrelated with all $x_j$
No perfect collinearity (ASS4)
no independent variable is constant
no exact linear relationship among the independent variables holds
the x's can be correlated, but not perfectly
we need at least as many observations as parameters: $n \ge k + 1$
ESTIMATION FROM A SAMPLE
we have a random sample of observations (ASS2)
for each observation (ASS1):
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i \qquad (12)$
we want the best estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ of the parameters
three ways to find them: MoM, OLS, ML
the estimates give the sample regression function:
$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \dots + \hat\beta_k x_{ik} \qquad (13)$
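The slides do not write out the OLS solution for k regressors. One standard way to compute it is to solve the normal equations $(X'X)\hat\beta = X'y$, where X has a column of ones followed by the regressors. The pure-Python sketch below does this with Gaussian elimination; the function name `ols_multiple` is hypothetical, and in practice one would rely on a numerical library (for example `numpy.linalg.lstsq`). The singular-matrix check corresponds to the no-perfect-collinearity assumption (ASS4).

```python
def ols_multiple(X, y):
    """OLS via the normal equations (X'X) b = X'y. X is a list of rows WITHOUT the constant."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend the intercept column
    k1 = len(rows[0])                                    # k + 1 parameters
    if len(rows) < k1:
        raise ValueError("need n >= k + 1 observations")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k1)] for i in range(k1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k1)]
    # Gaussian elimination with partial pivoting on the augmented matrix [X'X | X'y]
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k1):
        pivot = max(range(col, k1), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("perfect collinearity: X'X is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k1):
            f = a[r][col] / a[col][col]
            for c in range(col, k1 + 1):
                a[r][c] -= f * a[col][c]
    # back substitution
    b = [0.0] * k1
    for i in reversed(range(k1)):
        b[i] = (a[i][k1] - sum(a[i][j] * b[j] for j in range(i + 1, k1))) / a[i][i]
    return b

# hypothetical data lying exactly on y = 1 + 2*x1 - 3*x2
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
b = ols_multiple(X, y)
print([round(v, 6) for v in b])   # approximately [1.0, 2.0, -3.0]
```

Passing a design with $x_2 = 2 x_1$ raises the collinearity error, because $X'X$ is then singular.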
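PROPERTIES
if ASS1 through ASS4 hold, then our OLS estimator is unbiased: $E(\hat\beta_j) = \beta_j$, $j = 0, 1, \dots, k$
i.e. the procedure for obtaining the estimates is unbiased, not any particular estimate from one sample
if we add an irrelevant variable
the estimators of the parameters on the relevant variables remain unbiased
$R^2$ increases slightly (it can never decrease when a regressor is added)
if we omit an important variable
OLS will usually be biased (unless the omitted variable is uncorrelated with the included ones)
omitted variable bias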
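OMITTED VARIABLE BIAS
true model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
estimated model ($x_2$ omitted): $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
$E(\tilde\beta_1) = \beta_1 + \beta_2 \dfrac{\sum_{i} (x_{i1} - \bar x_1) x_{i2}}{\sum_{i} (x_{i1} - \bar x_1)^2} = \beta_1 + \beta_2 \tilde\delta_1$
where $\tilde\delta_1$ is the slope from the simple regression of $x_2$ on $x_1$ (the expectation is conditional on the sample values of $x_1$ and $x_2$)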
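The bias formula concerns expectations, but its sample counterpart holds exactly: in any sample, the slope from the short regression of y on $x_1$ equals $\hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\hat\beta_1, \hat\beta_2$ come from the regression of y on $x_1$ and $x_2$. A sketch with made-up data, using the closed-form two-regressor OLS solution written in centered sums; the helper names are illustrative.

```python
def centered_sums(a, b):
    """Sum of (a_i - a_bar) * (b_i - b_bar)."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b))

def two_regressor_slopes(x1, x2, y):
    """OLS slopes (beta1_hat, beta2_hat) from regressing y on x1, x2 and a constant."""
    s11, s22, s12 = centered_sums(x1, x1), centered_sums(x2, x2), centered_sums(x1, x2)
    s1y, s2y = centered_sums(x1, y), centered_sums(x2, y)
    det = s11 * s22 - s12 ** 2     # zero under perfect collinearity
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# hypothetical data; x2 is positively correlated with x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0, 2.5, 6.0, 5.5, 9.0, 8.0]

b1_hat, b2_hat = two_regressor_slopes(x1, x2, y)
b1_tilde = centered_sums(x1, y) / centered_sums(x1, x1)   # short regression: y on x1 only
delta1 = centered_sums(x1, x2) / centered_sums(x1, x1)    # regression of x2 on x1
print(b1_tilde, b1_hat + b2_hat * delta1)                 # equal up to rounding
```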
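VARIANCE
assume homoskedasticity: $Var(u \mid x_1, x_2, \dots, x_k) = \sigma^2$
the variance of u is the same for all combinations of outcomes of the x's
$Var(\hat\beta_j) = \dfrac{\sigma^2}{\sum_{i} (x_{ij} - \bar x_j)^2 (1 - R_j^2)} = \dfrac{\sigma^2}{SST_j (1 - R_j^2)} \qquad (14)$
$R_j^2$ is the $R^2$ from the regression of $x_j$ on all other x's
its size is important: as $R_j^2 \to 1$, $Var(\hat\beta_j)$ grows without bound
we also have to estimate $\sigma^2$:
$\hat\sigma^2 = \frac{1}{n - k - 1} \sum_{i} \hat u_i^2 = SSR/df$, with $df = n - k - 1$
$\widehat{Var}(\hat\beta_j) = \dfrac{\hat\sigma^2}{SST_j (1 - R_j^2)}$
$se(\hat\beta_j) = \dfrac{\hat\sigma}{\sqrt{SST_j (1 - R_j^2)}}$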
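To see why the size of $R_j^2$ matters in (14), the sketch evaluates $Var(\hat\beta_1)$ for two hypothetical designs with the same $x_1$: one where $x_2$ is weakly related to $x_1$ and one where it is nearly collinear. It assumes $k = 2$, in which case $R_1^2$ is just the squared sample correlation between $x_1$ and $x_2$, and sets $\sigma^2 = 1$ for illustration. The factor $1/(1 - R_j^2)$ is often called the variance inflation factor.

```python
def sst(a):
    """Total sum of squares of a list: sum of (a_i - a_bar)^2."""
    a_bar = sum(a) / len(a)
    return sum((v - a_bar) ** 2 for v in a)

def r2_of_x1_on_x2(x1, x2):
    """R_1^2 from regressing x1 on x2 and a constant; with k = 2 this is the squared correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s12 ** 2 / (sst(x1) * sst(x2))

def var_beta1(sigma2, x1, x2):
    """Var(beta1_hat) from formula (14) for a model with regressors x1 and x2."""
    return sigma2 / (sst(x1) * (1 - r2_of_x1_on_x2(x1, x2)))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_low = [3.0, 1.0, 6.0, 2.0, 5.0, 4.0]    # hypothetical: weakly related to x1
x2_high = [1.1, 2.0, 2.9, 4.2, 4.9, 6.1]   # hypothetical: nearly collinear with x1
for x2 in (x2_low, x2_high):
    r2 = r2_of_x1_on_x2(x1, x2)
    # R_1^2, the variance inflation factor 1/(1 - R_1^2), and Var(beta1_hat) with sigma^2 = 1
    print(round(r2, 3), round(1 / (1 - r2), 1), round(var_beta1(1.0, x1, x2), 4))
```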
EXAMPLE 3.9
The following equation describes the median housing price in a community in terms of the amount of pollution (nox for nitrous oxide) and the average number of rooms in houses in the community (rooms):
$\log(price) = \beta_0 + \beta_1 \log(nox) + \beta_2 rooms + u$
1. What are the probable signs of the regression slopes? Interpret $\beta_1$; explain.
2. Why might nox [more precisely, log(nox)] and rooms be negatively correlated? If this is the case, does the simple regression of log(price) on log(nox) produce an upward or downward biased estimator of $\beta_1$?
EXAMPLE 3.9
3. Using the data in HPRICE2.RAW, the following equations were estimated:
$\widehat{\log(price)} = 11.71 - 1.043 \log(nox), \quad n = 506, \; R^2 = 0.264$
$\widehat{\log(price)} = 9.23 - 0.718 \log(nox) + 0.306\, rooms, \quad n = 506, \; R^2 = 0.514$
Is the relationship between the simple and multiple regression estimates of the elasticity of price with respect to nox what you would have predicted, given your answer in part 2? Does this mean that -0.718 is definitely closer to the true elasticity than -1.043?
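Part 3 can be linked back to part 2 with the in-sample identity behind the omitted variable bias formula: $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\tilde\delta_1$ is the slope from regressing rooms on log(nox). Assuming both equations were estimated on the same 506 observations, the reported coefficients imply a value for $\tilde\delta_1$, only approximately, since the coefficients are rounded. The arithmetic is sketched below; the variable names are illustrative.

```python
# coefficients reported in the two estimated equations (rounded to 3 decimals)
b1_simple = -1.043   # slope on log(nox), rooms omitted
b1_multi = -0.718    # slope on log(nox), rooms included
b2_multi = 0.306     # slope on rooms

# in-sample identity: b1_simple = b1_multi + b2_multi * delta1,
# where delta1 is the slope from regressing rooms on log(nox)
delta1 = (b1_simple - b1_multi) / b2_multi
bias_term = b2_multi * delta1   # difference between the simple and multiple estimates
print(round(delta1, 3), round(bias_term, 3))
```

The sign of the implied $\tilde\delta_1$ can then be compared with your answer to part 2.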
EXAMPLE 3.12
1. Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ under the first four Gauss-Markov assumptions. For some function $g(x)$, for example $g(x) = x^2$ or $g(x) = \log(1 + x^2)$, define $z_i = g(x_i)$. Define a slope estimator as
$\tilde\beta_1 = \sum_{i} (z_i - \bar z) y_i \Big/ \sum_{i} (z_i - \bar z) x_i$
Show that $\tilde\beta_1$ is linear and unbiased. Remember, because $E(u \mid x) = 0$, you can treat both $x_i$ and $z_i$ as nonrandom in your derivation.
2. Add the homoskedasticity assumption, MLR.5. Show that
$Var(\tilde\beta_1) = \sigma^2 \Big(\sum_{i} (z_i - \bar z)^2\Big) \Big/ \Big(\sum_{i} (z_i - \bar z) x_i\Big)^2$
EXAMPLE 3.12
3. Show directly that, under the Gauss-Markov assumptions, $Var(\hat\beta_1) \le Var(\tilde\beta_1)$, where $\hat\beta_1$ is the OLS estimator. [Hint: The Cauchy-Schwarz inequality in Appendix B implies that
$\Big(n^{-1} \sum_{i} (z_i - \bar z)(x_i - \bar x)\Big)^2 \le \Big(n^{-1} \sum_{i} (z_i - \bar z)^2\Big)\Big(n^{-1} \sum_{i} (x_i - \bar x)^2\Big)$;
notice that we can drop $\bar x$ from the sample covariance.]
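A Monte Carlo sketch of parts 1 to 3, assuming (hypothetically) $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$, a fixed design $x = 0.5, 1, \dots, 10$, normal errors and $g(x) = x^2$. Both estimators should be centered on $\beta_1$, and the simulated variances should be close to $\sigma^2/SST_x$ for OLS and to the formula in part 2 for $\tilde\beta_1$, with OLS the smaller. The function name `simulate` is illustrative.

```python
import random
from statistics import mean, variance

def simulate(g, beta0=1.0, beta1=2.0, sigma=1.0, reps=4000, seed=7):
    """Monte Carlo for OLS and the z-based slope estimator, holding x fixed."""
    rng = random.Random(seed)
    x = [0.5 * i for i in range(1, 21)]       # hypothetical fixed design: 0.5, 1.0, ..., 10.0
    z = [g(a) for a in x]
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    dz = [c - z_bar for c in z]
    sst_x = sum((a - x_bar) ** 2 for a in x)
    szx = sum(d * a for d, a in zip(dz, x))   # sum of (z_i - z_bar) * x_i
    if szx == 0:
        raise ValueError("z is uncorrelated with x in this sample; the estimator is undefined")
    ols, tilde = [], []
    for _ in range(reps):
        y = [beta0 + beta1 * a + rng.gauss(0.0, sigma) for a in x]
        y_bar = sum(y) / n
        ols.append(sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sst_x)
        tilde.append(sum(d * b for d, b in zip(dz, y)) / szx)
    theory = {
        "ols": sigma ** 2 / sst_x,
        "tilde": sigma ** 2 * sum(d * d for d in dz) / szx ** 2,
    }
    return (mean(ols), variance(ols)), (mean(tilde), variance(tilde)), theory

(m_ols, v_ols), (m_tilde, v_tilde), theory = simulate(lambda a: a ** 2)
print(f"OLS:   mean {m_ols:.3f}, var {v_ols:.5f} (theory {theory['ols']:.5f})")
print(f"tilde: mean {m_tilde:.3f}, var {v_tilde:.5f} (theory {theory['tilde']:.5f})")
```

Passing `lambda a: math.log(1 + a ** 2)` (after `import math`) tries the other suggested g.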
EXAMPLE 3.15
The file CEOSAL2.RAW contains data on 177 chief executive
officers, which can be used to examine the effects of firm
performance on CEO salary.
1. Estimate a model relating annual salary to firm sales and
market value. Make the model of the constant elasticity
variety for both independent variables. Write the results
out in equation form.
2. Add profits to the model from part 1. Why can this
variable not be included in logarithmic form? Would you
say that these firm performance variables explain most of
the variation in CEO salaries?
EXAMPLE 3.15
3. Add the variable ceoten to the model in part 2. What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?
4. Find the sample correlation coefficient between the variables
log(mktval) and profits. Are these variables highly correlated?
What does this say about the OLS estimators?
