Lecture 4 - ETC3440

Introductory Econometrics
Regression
Interpretation, functional form, scaling
Farshid Vahid
2016
Recap
The regression model relates the dependent variable to k explanatory variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
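When we have a sample of n observations $\{y_i, x_{i1}, x_{i2}, \ldots, x_{ik}\}$ for $i = 1, \ldots, n$, we can write the model for the entire sample in matrix notation:
$$\underset{n \times 1}{y} = \underset{n \times (k+1)}{X}\;\underset{(k+1) \times 1}{\beta} + \underset{n \times 1}{u}$$
where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$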
The OLS procedure finds the linear combination of the columns of X that is closest to the vector y, i.e. the one whose error vector (the OLS residuals) is the shortest.
This implies that the OLS residual vector is perpendicular to all columns of X, i.e.,
$$X'\hat{u} = 0$$
Since $\hat{u} = y - X\hat{\beta}$, we obtain the famous OLS formula
$$\hat{\beta} = (X'X)^{-1}X'y$$
A consequence of orthogonality of residuals and columns of X is that
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}\hat{u}_i^2$$
or
SST = SSE + SSR
This leads to the definition of the coefficient of determination $R^2$, which is a measure of goodness of fit:
$$R^2 = SSE/SST = 1 - SSR/SST$$
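The recap can be checked numerically. The following is a minimal sketch on simulated data (the sample size, coefficients and seed are made up for illustration), assuming NumPy is available and that X contains a column of ones. It computes the OLS estimate, confirms that the residuals are orthogonal to the columns of X, and verifies SST = SSE + SSR.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is reproducible
n, k = 200, 2                           # hypothetical sample size and number of regressors

# Simulated data (purely illustrative): X has a leading column of ones.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])  # made-up population coefficients
y = X @ beta_true + rng.normal(size=n)

# OLS formula: solve (X'X) b = X'y instead of forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat                    # fitted values
u_hat = y - y_hat                       # OLS residuals

# Orthogonality: X'u_hat is zero up to floating-point error.
orthogonal = np.allclose(X.T @ u_hat, 0.0, atol=1e-8)

# Decomposition of the total sum of squares.
sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)
r2 = sse / sst

print("orthogonal:", orthogonal)
print("SST = SSE + SSR:", np.isclose(sst, sse + ssr))
print("R^2:", round(r2, 3))
```

Solving the normal equations with `np.linalg.solve` gives the same estimate as the formula with the explicit inverse, and is the usual numerical choice.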
Lecture Outline
Interpretation of the OLS estimates with examples (textbook reference 3-2a to 3-2e)
The world is non-linear: How useful can a linear model be? (textbook reference 2-4b, 6-2b, 6-2c)
Units of measurement: do the results qualitatively change if we change the units of measurement? (textbook reference 2-4a, 6-1 (exclude 6-1a))
Interpretation of OLS estimates

Example: The causal effect of education on wage
Consider an extension of the wage equation from last week's consolidation lesson:
$$wage = \beta_0 + \beta_1\, educ + \beta_2\, IQ + u$$
where IQ is the IQ score (in the population, it has a mean of 100 and a standard deviation of 15).
We are primarily interested in $\beta_1$, because we want to know the value that education adds to a person's wage.
Without IQ in the equation, the coefficient of educ will show how strongly wage and educ are correlated, but both could be caused by a person's ability.
By explicitly including IQ in the equation, we obtain a more persuasive estimate of the causal effect of education, provided that IQ is a good proxy for intelligence.

If we only estimate a regression of wage on education, we cannot be sure whether we are measuring the effect of education or whether education is acting as a proxy for smartness. This is important, because if the education system adds no value other than separating smart people from the not so smart, society could achieve that much more cheaply with national IQ tests!

The coefficient of educ now shows that for two people with the same IQ score, the one with 1 more year of education is expected to earn $42 more.

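Consider k = 2 for simplicity.
The conditional expectation of y given $x_1$ and $x_2$ (also known as the population regression function) is
$$E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
The estimated regression (also known as the sample regression function) is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$
The formula
$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$$
allows us to compute how predicted y changes when $x_1$ and $x_2$ change by any amount.
What if we hold $x_2$ fixed, that is, its change is zero, $\Delta x_2 = 0$?
$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 \quad \text{if } \Delta x_2 = 0$$
In particular,
$$\hat{\beta}_1 = \frac{\Delta\hat{y}}{\Delta x_1} \quad \text{if } \Delta x_2 = 0$$
In other words, $\hat{\beta}_1$ is the slope of $\hat{y}$ with respect to $x_1$ when $x_2$ is held fixed.
We also refer to $\hat{\beta}_1$ as the estimate of the partial effect of $x_1$ on y holding $x_2$ constant.
Yet another legitimate interpretation is that $\hat{\beta}_1$ estimates the effect of $x_1$ on y after the influence of $x_2$ has been removed (or has been controlled for).
Similarly,
$$\Delta\hat{y} = \hat{\beta}_2 \Delta x_2 \quad \text{if } \Delta x_1 = 0$$
and
$$\hat{\beta}_2 = \frac{\Delta\hat{y}}{\Delta x_2} \quad \text{if } \Delta x_1 = 0$$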
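The "influence of $x_2$ has been removed" interpretation can be illustrated numerically. The sketch below uses simulated data (all numbers are made up) and assumes NumPy is available. It first regresses $x_1$ on a constant and $x_2$, keeps the residuals, and then regresses y on those residuals. The resulting slope matches the coefficient on $x_1$ from the full regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)      # x1 is correlated with x2 by construction
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)

# Full regression of y on a constant, x1 and x2.
b_full = ols(np.column_stack([ones, x1, x2]), y)

# Step 1: remove the influence of x2 from x1
# (residuals from regressing x1 on a constant and x2).
Z = np.column_stack([ones, x2])
r1 = x1 - Z @ ols(Z, x1)

# Step 2: slope from regressing y on those residuals alone.
b_partial = (r1 @ y) / (r1 @ r1)

print("coefficient on x1, full regression:", round(b_full[1], 6))
print("slope on partialled-out x1:        ", round(b_partial, 6))
```

The two printed numbers agree up to rounding error, which is what "after controlling for $x_2$" means in practice.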
Let's go back to the regression output and interpret the parameters.
$$\widehat{wage} = -128.89 + 42.06\, educ + 5.14\, IQ$$
$$n = 935, \quad R^2 = .134$$
42.06 shows that for two people with the same IQ, the one with one more year of education is predicted to earn $42.06 more in monthly wages.
Or: Every extra year of education increases the predicted wage by $42.06, keeping IQ constant (or "after controlling for IQ", or "after removing the effect of IQ", or "all else constant", or "all else equal", or "ceteris paribus").
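As a small worked example, the two slope estimates above can be used to compute the change in predicted monthly wage for any combination of changes in educ and IQ. The function name and the second scenario (2 more years of education and 10 more IQ points) are illustrative choices, not taken from the lecture.

```python
# Slope estimates copied from the fitted wage equation in the text.
B_EDUC = 42.06   # dollars of monthly wage per extra year of education
B_IQ = 5.14      # dollars of monthly wage per extra IQ point

def predicted_wage_change(d_educ, d_iq):
    """Change in predicted monthly wage for given changes in educ and IQ."""
    return B_EDUC * d_educ + B_IQ * d_iq

# One more year of education, same IQ: the partial effect of educ.
same_iq = predicted_wage_change(d_educ=1, d_iq=0)

# Hypothetical scenario: two more years of education and 10 more IQ points.
both = predicted_wage_change(d_educ=2, d_iq=10)

print(round(same_iq, 2))   # 42.06
print(round(both, 2))      # 135.52
```

The intercept plays no role here because it cancels when two predictions are differenced.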
Is a linear model useful in a non-linear world?

We all feel that the world is non-linear. Our speed in learning (or any other activity) accelerates as we grow up, reaches a peak and eventually goes downhill. How good is a linear model in this non-linear world?
But the linear regression model only needs to be linear in parameters.
y and $x_1$ to $x_k$ can be non-linear transformations of variables.
It is quite usual for y and some of the x variables to be logarithms of observed variables, and some x variables can also be quadratic functions of measured variables.
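To see what "linear in parameters" means in practice, here is a minimal sketch, assuming NumPy is available and using a made-up, noise-free quadratic relationship. The regressors are 1, x and $x^2$, so the relationship between y and x is curved, yet the coefficients are still obtained by ordinary OLS.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 21)          # hypothetical measured variable
y = 3.0 + 2.0 * x - 0.2 * x**2          # exact quadratic with made-up coefficients

# Regressors: constant, x and x squared. The model is non-linear in x
# but linear in the three coefficients, so the OLS formula applies.
X = np.column_stack([np.ones_like(x), x, x**2])
b = np.linalg.solve(X.T @ X, X.T @ y)

print(np.round(b, 6))
```

Because the simulated data contain no noise, OLS recovers the three coefficients up to floating-point error.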
Example: Recall the wage example:
$$\widehat{wage} = -128.89 + 42.06\, educ + 5.14\, IQ$$
$$n = 935, \quad R^2 = .134$$
This is not satisfactory because it predicts that regardless of what your wage currently is, an extra year of schooling will add $42.06 to your wage. It is more realistic to assume that it adds a constant percentage to your wage, not a constant dollar amount.
How can we incorporate this in the model? We can use the natural logarithm of wage as the dependent variable:
$$\log(wage) = \beta_0 + \beta_1\, educ + \beta_2\, IQ + u$$
Holding IQ and u fixed,
$$\Delta\log(wage) = \beta_1 \Delta educ$$
so
$$\beta_1 = \frac{\Delta\log(wage)}{\Delta educ}$$
Useful result from calculus:
$$100\,\Delta\log(wage) \approx \%\Delta wage$$
This leads to a simple interpretation of $\beta_1$:
$$100\beta_1 \approx \%\Delta wage \quad \text{when } \Delta educ = 1, \text{ holding IQ constant}$$
If we do not multiply by 100, we have the decimal version (the proportionate change).
In this example, $100\beta_1$ is often called the return to education (just like an investment). This measure is free of the units of measurement of wage (currency, price level).
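The approximation $100\,\Delta\log(wage) \approx \%\Delta wage$ works well for small changes and less well for large ones. The sketch below compares it with the exact percentage change $100\,(e^{\beta_1} - 1)$ implied by a change of $\beta_1$ in log(wage). The exact formula is standard algebra rather than something stated in the lecture, and the values of $\beta_1$ are illustrative, not estimates.

```python
import math

def approx_pct(beta):
    """Approximate percentage change in wage: 100 * beta."""
    return 100.0 * beta

def exact_pct(beta):
    """Exact percentage change implied by a change of beta in log(wage)."""
    return 100.0 * (math.exp(beta) - 1.0)

for beta in (0.01, 0.05, 0.20, 0.50):   # illustrative values, not estimates
    print(f"beta={beta:.2f}  approx={approx_pct(beta):6.2f}%  exact={exact_pct(beta):6.2f}%")
```

For coefficients of a few hundredths the two numbers are nearly identical, which is why the simple $100\beta_1$ reading is usually adequate for returns to education.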
Let's revisit the wage equation:
$$\widehat{\log(wage)} = 5.66 + 0.039\, educ + 0.006\, IQ$$
$$n = 935, \quad R^2 = .130$$
These results tell us that ...
Warning: This R-squared is not directly comparable to the R-squared when wage is the dependent variable. We can only compare the R-squared of two models if they have the same dependent variable. The total variations (SSTs) in $wage_i$ and $\log(wage_i)$ are completely different.
We can use a logarithmic transformation of x as well. Table 2.3 of the textbook summarises the different nonlinear functional forms that involve logarithms:

Model         Dep Var    Indep Var    Interpretation
Level-level   y          x            $\Delta y = \beta_1 \Delta x$
Level-log     y          log(x)       $\Delta y = (\beta_1/100)\,\%\Delta x$
Log-level     log(y)     x            $\%\Delta y = (100\beta_1)\,\Delta x$
Log-log       log(y)     log(x)       $\%\Delta y = \beta_1\,\%\Delta x$
The last one is very important in economics because $\beta_1$ measures the elasticity of y with respect to x. For analysing the effect of a change in tax rates, it is important to measure demand and supply elasticities.
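The elasticity reading of the log-log model can be illustrated with a constant-elasticity relationship. In the sketch below the prices, quantities and the elasticity of -1.5 are made up. Regressing log(y) on log(x) with the simple-regression slope formula returns the elasticity.

```python
import math

ELASTICITY = -1.5                          # made-up constant elasticity
xs = [1.0, 2.0, 4.0, 8.0, 16.0]            # hypothetical prices
ys = [10.0 * x ** ELASTICITY for x in xs]  # hypothetical quantities, y = 10 * x^(-1.5)

lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]

# Simple-regression slope: sample covariance over sample variance.
mx = sum(lx) / len(lx)
my = sum(ly) / len(ly)
slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)

print(round(slope, 6))
```

With no noise in the data the slope equals the elasticity up to floating-point error. With real data it would be an estimate of it.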
Considerations for using levels or logarithms:

1. A variable must have a strictly positive range to be a candidate for
logarithmic transformation.
2. Thinking about the problem: does it make sense that a unit change
in x leads to a constant change in the magnitude of y or a constant
% change in y ?
3. Looking at the scatter plot, if there is only one x.
4. Explanatory variables that are measured in years, such as years of
education, experience or age, are not logged.
5. Variables that are already in percentages (such as interest rate or
tax rate) are not logged. A unit change in these variables already is
a one percent change.
6. If a variable is positively skewed (like income or wealth), taking
logarithms makes its distribution less skewed.
There is a good discussion in 6-2a
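Point 6 can be checked on simulated data. The sketch below draws a right-skewed "income" variable from a log-normal distribution (an assumption made purely for illustration, with arbitrary parameters) and compares the sample skewness before and after taking logarithms.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(0)
# Simulated "income": log-normal draws are strictly positive and right-skewed.
income = [random.lognormvariate(10.0, 0.8) for _ in range(5000)]
log_income = [math.log(v) for v in income]

print("skewness of income:     ", round(skewness(income), 2))
print("skewness of log(income):", round(skewness(log_income), 2))
```

The level variable shows strong positive skewness, while its logarithm is close to symmetric. Real income data need not be exactly log-normal, so the reduction in skewness is typically less clean.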
Summary
Interpretation of OLS estimates in multiple regression: It is very important to interpret the coefficients using the context, and very important to remember that each $\hat{\beta}$ estimates the partial effect of its corresponding x, with all other x variables held constant.
Modelling non-linear relationships: The linear regression model is only linear in parameters. By using non-linear transformations (such as logarithmic or quadratic) of y or any of the x variables, we can model non-linear relationships with the regression model.
