
Introductory Econometrics

Regression
Interpretation, functional form, scaling
Farshid Vahid

2016

Recap

The regression model relates the dependent variable to k explanatory variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$

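When we have a sample of n observations $\{y_i, x_{i1}, x_{i2}, \ldots, x_{ik}\}$ for $i = 1, \ldots, n$, we can write the model for the entire sample in matrix notation
$$\underset{n \times 1}{y} = \underset{n \times (k+1)}{X}\;\underset{(k+1) \times 1}{\beta} + \underset{n \times 1}{u}$$
where
$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
$$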
The OLS procedure finds the linear combination of the columns of X that is closest to the vector y, i.e. the one whose error vector (the vector of OLS residuals) has the shortest length.

This implies that the OLS residual vector $\hat{u}$ is perpendicular to all columns of $X$, i.e.,
$$X'\hat{u} = 0$$

Since $\hat{u} = y - X\hat{\beta}$, we obtain the famous OLS formula
$$\hat{\beta} = (X'X)^{-1}X'y$$
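
The orthogonality condition and the OLS formula can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data; the sample size, the values in `beta_true` and the random seed are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is reproducible
n, k = 200, 2                            # hypothetical sample size and number of regressors

# X is n x (k+1): a column of ones followed by k simulated regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical population coefficients
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations X'X b = X'y (equivalent to (X'X)^{-1} X'y,
# but numerically preferable to forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# OLS residuals and their inner products with every column of X
u_hat = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("X'u_hat :", X.T @ u_hat)          # zero up to floating-point rounding
```

Because the first column of X is a column of ones, orthogonality also implies that the residuals sum to zero.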

A consequence of the orthogonality of the residuals and the columns of $X$ is that
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} \hat{u}_i^2$$
or
$$SST = SSE + SSR$$

This leads to the definition of the coefficient of determination $R^2$, which is a measure of goodness of fit:
$$R^2 = SSE/SST = 1 - SSR/SST$$
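
The decomposition and the two equivalent expressions for $R^2$ can be verified on any data set, provided the regression includes an intercept. Below is a small sketch on simulated data; the data-generating coefficients and the seed are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100                                   # hypothetical sample size
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)   # hypothetical model

X = np.column_stack([np.ones(n), x1, x2])            # intercept included
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                                 # fitted values
u_hat = y - y_hat                                    # OLS residuals

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum(u_hat ** 2)                 # residual sum of squares

r2 = sse / sst
print("SST:", sst, " SSE + SSR:", sse + ssr)
print("R^2:", r2, " 1 - SSR/SST:", 1.0 - ssr / sst)
```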

Lecture Outline

Interpretation of the OLS estimates with examples (textbook reference 3-2a to 3-2e)

The world is non-linear: How useful can a linear model be? (textbook reference 2-4b, 6-2b, 6-2c)

Units of measurement: do the results qualitatively change if we change the units of measurement? (textbook reference 2-4a, 6-1 (exclude 6-1a))

Interpretation of OLS estimates


Example: The causal effect of education on wage

Consider an extension of the wage equation from last week's consolidation lesson:
$$wage = \beta_0 + \beta_1\, educ + \beta_2\, IQ + u$$
where IQ is the IQ score (in the population, it has a mean of 100 and a standard deviation of 15).

We are primarily interested in $\beta_1$, because we want to know the value that education adds to a person's wage.

Without IQ in the equation, the coefficient of educ shows how strongly wage and educ are correlated, but both could be caused by a person's ability.

By explicitly including IQ in the equation, we obtain a more persuasive estimate of the causal effect of education, provided that IQ is a good proxy for intelligence.

If we only estimate a regression of wage on education, we cannot be sure whether we are measuring the effect of education or whether education is acting as a proxy for smartness. This is important because, if the education system adds no value other than separating smart people from the not so smart, society could achieve that much more cheaply with national IQ tests!

The coefficient of educ now shows that for two people with the
same IQ score, the one with 1 more year of education is expected to
earn $42 more.

Interpretation of OLS estimates



Consider k = 2 for simplicity.

The conditional expectation of y given $x_1$ and $x_2$ (also known as the population regression function) is
$$E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

The estimated regression (also known as the sample regression function) is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

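The formula
$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$$
allows us to compute how predicted y changes when $x_1$ and $x_2$ change by any amount.

What if we hold $x_2$ fixed, that is, its change is zero, $\Delta x_2 = 0$?
$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 \quad \text{if } \Delta x_2 = 0$$

In particular,
$$\hat{\beta}_1 = \frac{\Delta\hat{y}}{\Delta x_1} \quad \text{if } \Delta x_2 = 0$$

In other words, $\hat{\beta}_1$ is the slope of $\hat{y}$ with respect to $x_1$ when $x_2$ is held fixed.

We also refer to $\hat{\beta}_1$ as the estimate of the partial effect of $x_1$ on y holding $x_2$ constant.

Yet another legitimate interpretation is that $\hat{\beta}_1$ estimates the effect of $x_1$ on y after the influence of $x_2$ has been removed (or has been controlled for).

Similarly,
$$\Delta\hat{y} = \hat{\beta}_2 \Delta x_2 \quad \text{if } \Delta x_1 = 0$$
and
$$\hat{\beta}_2 = \frac{\Delta\hat{y}}{\Delta x_2} \quad \text{if } \Delta x_1 = 0$$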
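
One way to make the "influence removed" interpretation concrete is the partialling-out calculation: first regress $x_1$ on a constant and $x_2$, then regress y on the residuals from that first step. The slope from the second step equals the coefficient on $x_1$ in the full regression. The sketch below illustrates this on simulated data; the helper `ols`, the coefficients and the seed are hypothetical choices for the illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(2)
n = 500                                            # hypothetical sample size
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)                 # x1 is correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n) # hypothetical model
const = np.ones(n)

# Full regression of y on a constant, x1 and x2
b_full = ols(np.column_stack([const, x1, x2]), y)

# Step 1: remove the influence of x2 (and the constant) from x1
Z = np.column_stack([const, x2])
r1 = x1 - Z @ ols(Z, x1)

# Step 2: slope of y on the part of x1 that is unrelated to x2
b1_partial = (r1 @ y) / (r1 @ r1)

# For contrast: simple regression of y on x1 alone (x2 omitted)
b_simple = ols(np.column_stack([const, x1]), y)[1]

print("coefficient on x1, full regression:", b_full[1])
print("coefficient on x1, partialled out :", b1_partial)
print("coefficient on x1, x2 omitted     :", b_simple)
```

In this simulation the simple regression slope is noticeably larger than the other two, because it also picks up the effect of the omitted $x_2$, which is positively correlated with $x_1$.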

Let's go back to the regression output and interpret the parameters.

$$\widehat{wage} = -128.89 + 42.06\, educ + 5.14\, IQ$$
$$n = 935, \quad R^2 = .134$$

42.06 shows that for two people with the same IQ, the one with one
more year of education is predicted to earn $42.06 more in monthly
wages.

Or: Every extra year of education increases the predicted wage by $42.06, keeping IQ constant (or after controlling for IQ, or after removing the effect of IQ, or all else constant, or all else equal, or ceteris paribus).
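
The same slope estimates can be used to compute the change in predicted wage for any combination of changes in educ and IQ. The short sketch below uses the two slope estimates reported above; the function name `predicted_wage_change` and the particular changes fed into it are hypothetical choices for illustration.

```python
# Slope estimates from the fitted wage equation in this lecture
B_EDUC = 42.06   # change in predicted monthly wage per extra year of education
B_IQ = 5.14      # change in predicted monthly wage per extra IQ point

def predicted_wage_change(d_educ, d_iq):
    """Change in predicted monthly wage for given changes in educ and IQ."""
    return B_EDUC * d_educ + B_IQ * d_iq

same_iq = predicted_wage_change(1, 0)     # one more year of education, IQ held fixed
one_sd_iq = predicted_wage_change(0, 15)  # 15 more IQ points (one population sd), educ held fixed
both = predicted_wage_change(2, 10)       # hypothetical joint change in both variables

print(round(same_iq, 2), round(one_sd_iq, 2), round(both, 2))
```

Only the first call has a ceteris paribus interpretation for education; the last call shows that the formula also handles changes in both variables at once.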

Is a linear model useful in a non-linear world?



We all feel that the world is non-linear. Our speed in learning (or
any other activity) accelerates as we grow up, gets to a peak and
goes downhill eventually. How good is a linear model in this
non-linear world?

But the linear regression model only needs to be linear in parameters.

y and $x_1$ to $x_k$ can be non-linear transformations of variables.

It is quite usual that y and some of the x variables are logarithms of observed variables, and some x variables can be quadratic functions of measured variables.
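
As an illustration of "linear in parameters", a hump-shaped relationship can be fitted by OLS simply by adding the square of x as an extra column of X. The following sketch uses simulated data; the coefficients, noise level and seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300                                        # hypothetical sample size
x = rng.uniform(0.0, 10.0, size=n)

# Hypothetical relationship that rises, peaks at x = 6 and then declines
y = 2.0 + 3.0 * x - 0.25 * x ** 2 + rng.normal(scale=0.5, size=n)

# The model is non-linear in x but linear in (b0, b1, b2),
# so the usual OLS formula applies with columns 1, x and x^2
X = np.column_stack([np.ones(n), x, x ** 2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The fitted curve peaks where its derivative b1 + 2*b2*x equals zero
turning_point = -beta_hat[1] / (2.0 * beta_hat[2])

print("beta_hat:", beta_hat)
print("estimated turning point:", turning_point)
```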

Example: Recall the wage example:

$$\widehat{wage} = -128.89 + 42.06\, educ + 5.14\, IQ$$
$$n = 935, \quad R^2 = .134$$

This is not satisfactory because it predicts that, regardless of what your wage currently is, an extra year of schooling will add $42.06 to your wage. It is more realistic to assume that it adds a constant percentage to your wage, not a constant dollar amount.

How can we incorporate this in the model? We can use the natural logarithm of wage as the dependent variable:
$$\log(wage) = \beta_0 + \beta_1\, educ + \beta_2\, IQ + u$$

Holding IQ and u fixed,
$$\Delta\log(wage) = \beta_1 \Delta educ$$
so
$$\beta_1 = \frac{\Delta\log(wage)}{\Delta educ}$$

Useful result from calculus (an approximation that is accurate when the change is small):
$$100 \cdot \Delta\log(wage) \approx \%\Delta wage$$

This leads to a simple interpretation of $\beta_1$:
$$100\beta_1 \approx \%\Delta wage \quad \text{when } \Delta educ = 1, \text{ holding IQ constant}$$

If we do not multiply by 100, we have the decimal version (the proportionate change).

In this example, $100\beta_1$ is often called the return to education (just like an investment). This measure is free of units of measurement of wage (currency, price level).
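
To see how good the approximation is, it can be compared with the exact percentage change implied by a log-level model, which is $100(e^{\beta_1 \Delta educ} - 1)$. The sketch below does this for a few illustrative coefficient values; the values and the function names are hypothetical and are not estimates from the lecture.

```python
import math

def approx_pct_change(beta1, d_educ=1.0):
    """Approximate % change in wage: 100 * beta1 * change in educ."""
    return 100.0 * beta1 * d_educ

def exact_pct_change(beta1, d_educ=1.0):
    """Exact % change in wage implied by a change of beta1 * d_educ in log(wage)."""
    return 100.0 * (math.exp(beta1 * d_educ) - 1.0)

# Hypothetical coefficient values, from small to large
for beta1 in (0.01, 0.05, 0.10, 0.30):
    print(beta1,
          round(approx_pct_change(beta1), 2),
          round(exact_pct_change(beta1), 2))
```

The two columns are close for small coefficients and drift apart as the implied change in log(wage) grows, which is why the simple interpretation is usually reserved for small changes.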

Let's revisit the wage equation:

$$\widehat{\log(wage)} = 5.66 + 0.039\, educ + 0.006\, IQ$$
$$n = 935, \quad R^2 = .130$$

These results tell us that ...

Warning: This R-squared is not directly comparable to the R-squared when wage is the dependent variable. We can only compare the R-squared of two models if they have the same dependent variable. The total variations (SSTs) in $wage_i$ and $\log(wage_i)$ are completely different.
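
The point about different total variations can be illustrated with simulated data. In the sketch below, the data-generating process, the helper `sst_and_r2` and the seed are all hypothetical; only the sample size mirrors the lecture's example.

```python
import numpy as np

def sst_and_r2(X, y):
    """Return the total sum of squares of y and the R-squared of regressing y on X."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    sst = np.sum((y - y.mean()) ** 2)
    return sst, 1.0 - np.sum(u_hat ** 2) / sst

rng = np.random.default_rng(6)
n = 935
educ = rng.integers(9, 19, size=n).astype(float)   # hypothetical years of education, 9 to 18
log_wage = 5.7 + 0.04 * educ + rng.normal(scale=0.4, size=n)   # hypothetical model
wage = np.exp(log_wage)

X = np.column_stack([np.ones(n), educ])
sst_level, r2_level = sst_and_r2(X, wage)        # dependent variable: wage
sst_log, r2_log = sst_and_r2(X, log_wage)        # dependent variable: log(wage)

print("SST of wage     :", sst_level, " R^2:", r2_level)
print("SST of log(wage):", sst_log, " R^2:", r2_log)
```

The two R-squared values are fractions of very different totals, so a higher value in one model does not by itself indicate a better fit than the other.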

We can use a logarithmic transformation of x as well. Table 2.3 of the textbook summarises the different nonlinear functional forms that involve logarithms:

Model         Dep Var   Indep Var   Interpretation
Level-level   y         x           $\Delta y = \beta_1 \Delta x$
Level-log     y         log(x)      $\Delta y = (\beta_1/100)\,\%\Delta x$
Log-level     log(y)    x           $\%\Delta y = (100\beta_1)\,\Delta x$
Log-log       log(y)    log(x)      $\%\Delta y = \beta_1\,\%\Delta x$

The last one is very important in economics because $\beta_1$ measures the elasticity of y with respect to x. For analysing the effect of a change in tax rates, it is important to measure demand and supply elasticities.
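
The elasticity interpretation of the log-log model can be illustrated with a simulated constant-elasticity demand curve. Everything in the sketch below (the variable names `price` and `quantity`, the elasticity of -1.2, the noise level and the seed) is a hypothetical choice for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400                                         # hypothetical sample size
elasticity_true = -1.2                          # hypothetical price elasticity of demand

price = rng.uniform(1.0, 20.0, size=n)          # strictly positive, so logs are defined
noise = np.exp(rng.normal(scale=0.1, size=n))   # multiplicative error term
quantity = 50.0 * price ** elasticity_true * noise

# Log-log regression: log(quantity) = b0 + b1 * log(price) + u
X = np.column_stack([np.ones(n), np.log(price)])
beta_hat = np.linalg.solve(X.T @ X, X.T @ np.log(quantity))

print("estimated elasticity:", beta_hat[1])
print("approx. % change in quantity for a 10% price rise:", 10.0 * beta_hat[1])
```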

Considerations for using levels or logarithms:

1. A variable must have a strictly positive range to be a candidate for logarithmic transformation.
2. Thinking about the problem: does it make sense that a unit change in x leads to a constant change in the magnitude of y or a constant % change in y?
3. Looking at the scatter plot, if there is only one x.
4. Explanatory variables that are measured in years, such as years of education, experience or age, are not logged.
5. Variables that are already in percentages (such as interest rate or tax rate) are not logged. A unit change in these variables is already a one percentage point change.
6. If a variable is positively skewed (like income or wealth), taking logarithms makes its distribution less skewed.
There is a good discussion in textbook section 6-2a.
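
The effect of the logarithm on a positively skewed variable can be seen in a short simulation. The sketch below draws a hypothetical income variable from a lognormal distribution (an assumption made only for illustration) and compares the sample skewness before and after taking logs; the helper `skewness` is written out by hand so that only NumPy is needed.

```python
import numpy as np

def skewness(a):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    a = np.asarray(a, dtype=float)
    d = a - a.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

rng = np.random.default_rng(5)
# Hypothetical income data: strictly positive with a long right tail
income = rng.lognormal(mean=10.0, sigma=0.8, size=10_000)

print("skewness of income     :", skewness(income))
print("skewness of log(income):", skewness(np.log(income)))
```

The variable is strictly positive, so it also satisfies the first consideration in the list.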

Summary

Interpretation of OLS estimates in multiple regression: It is very important to interpret the coefficients using the context, and very important to remember that each $\hat{\beta}_j$ estimates the partial effect of its corresponding x, with all other x variables held constant.

Modelling non-linear relationships: The linear regression model is only linear in parameters. By using non-linear transformations (such as logarithmic or quadratic) of y or any of the x variables, we can model non-linear relationships with the regression model.
