
Economics 420

Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Simple Regression (Regression with One Regressor)
1. Introduction to linear regression
2. Defining the linear regression model
3. Estimating the linear regression model: method of moments
4. Algebraic Properties and measures of fit of OLS
5. Sampling distribution of the OLS estimator
6. Hypothesis testing and confidence intervals for $\beta_0$ and $\beta_1$

1. Introduction to linear regression


Empirical problem: Class size and educational output
Policy question: What is the effect of reducing class size by one
student per class? by 8 students/class?
What is the right output (performance) measure?
o parent satisfaction
o student personal development
o future adult welfare
o future adult earnings
o performance on standardized tests

What do data say about class sizes and test scores?


The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420)
Variables:
5th grade test scores (Stanford-9 achievement test, combined
math and reading), district average
Student-teacher ratio (STR) = number of students in the
district divided by number of full-time equivalent teachers

An initial look at the California test score data

Do districts with smaller classes (lower STR) have higher test
scores?

The class size/test score policy question


What is the effect on test scores of reducing STR by one
student/class?
We want to know the slope of the line relating test score and STR
This suggests that we want to draw a line through the Test
Score v. STR scatterplot
But how?

Notation and Terminology


We can write the population regression line (or population
regression function):
$\text{Test Score} = \beta_0 + \beta_1\,\text{STR}$
This represents the true relationship between STR and Test
Score
$\beta_1$ = slope of population regression line
= change in test score for a unit change in STR
Why are $\beta_0$ and $\beta_1$ "population" parameters?
We would like to know the population value of $\beta_1$
But we don't know $\beta_1$, so we must estimate it using data

How can we estimate $\beta_0$ and $\beta_1$ from data?


We will focus on the least squares ("ordinary least squares" or
"OLS") estimator of the unknown parameters $\beta_0$ and $\beta_1$, which
solves
$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$

The OLS estimator minimizes the average squared difference
between the actual values of $Y_i$ and the values of $Y_i$ predicted
by the estimated line
This minimization problem can be solved using calculus (but
the OLS estimator can be derived in two other ways, one of
which we will use)
The results are the OLS estimators of $\beta_0$ and $\beta_1$

Why use OLS, rather than some other estimator?


OLS is a generalization of the sample average
If the "line" is just an intercept (no X), then the OLS estimator
is just the sample average of $Y_1, \ldots, Y_n$ (or $\bar{Y}$)
Like $\bar{Y}$, the OLS estimator has some desirable properties
o under certain assumptions, it is unbiased
(that is, $E(\hat{\beta}_1) = \beta_1$)
o it has a tighter sampling distribution than some other
candidate estimators of $\beta_1$ (more on this later)
Moreover, this is what everyone uses: the common
"language" of linear regression
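The claim that OLS with only an intercept reduces to the sample average can be checked numerically. The sketch below is only an illustration: the data values and the helper name `ssr_intercept_only` are made up for the example and do not come from the course materials.

```python
# Hypothetical data, made up purely for illustration.
y = [3.0, 5.0, 4.0, 8.0, 10.0]


def ssr_intercept_only(b0, values):
    """Sum of squared differences between each value and the constant b0."""
    return sum((v - b0) ** 2 for v in values)


y_bar = sum(y) / len(y)

# Setting the derivative of the sum of squares with respect to b0 to zero
# gives sum(y_i - b0) = 0, whose solution is the sample average.
b0_ols = y_bar

# Moving b0 away from the sample average in either direction raises the
# sum of squared differences.
for candidate in (y_bar - 0.5, y_bar, y_bar + 0.5):
    print(candidate, ssr_intercept_only(candidate, y))
```

The middle candidate (the sample average) gives the smallest sum of squares of the three values printed.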

Application to the California test score-class size data

Estimated slope = $\hat{\beta}_1 = -2.28$
Estimated intercept = $\hat{\beta}_0 = 698.9$
Estimated OLS regression line:
$\widehat{\text{Test Score}} = 698.9 - 2.28 \times \text{STR}$

Interpretation of the estimated slope and intercept


$\widehat{\text{Test Score}} = 698.9 - 2.28 \times \text{STR}$
Districts with one more student per teacher on average have
test scores that are 2.28 points lower
That is, $\Delta\text{Test Score} / \Delta\text{STR} = -2.28$
The intercept (taken literally) means that, according to this
estimated line, districts with zero students per teacher would
have a (predicted) test score of 698.9
This interpretation of the intercept makes no sense: it
extrapolates the line outside the range of the data. In this
application, the intercept is not itself economically meaningful

Predicted values & residuals

One of the districts in the data set is Antelope, CA, for which STR
= 19.33 and Test Score = 657.8

For Antelope, the predicted value is
$\hat{Y}_{\text{Antelope}} = 698.9 - 2.28 \times 19.33 = 654.8$
and the residual is
$\hat{u}_{\text{Antelope}} = 657.8 - 654.8 = 3.0$
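The Antelope arithmetic can be reproduced in a few lines. This is a minimal sketch that uses the rounded coefficients quoted in the text; the function name `predict_test_score` is an illustrative choice, not part of the course materials.

```python
# Rounded OLS estimates quoted in the text.
b0_hat = 698.9   # estimated intercept
b1_hat = -2.28   # estimated slope on STR


def predict_test_score(str_ratio):
    """Predicted district test score for a given student-teacher ratio."""
    return b0_hat + b1_hat * str_ratio


# Antelope, CA values quoted in the text.
str_antelope = 19.33
score_antelope = 657.8

predicted = predict_test_score(str_antelope)
residual = score_antelope - predicted  # actual minus predicted

print(round(predicted, 1), round(residual, 1))  # 654.8 3.0
```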

OLS regression: Stata output


regress testscr str, robust

Regression with robust standard errors        Number of obs =     420
                                              F(  1,   418) =   19.26
                                              Prob > F      =  0.0000
                                              R-squared     =  0.0512
                                              Root MSE      =  18.581

-------------------------------------------------------------------------
        |               Robust
testscr |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
--------+----------------------------------------------------------------
    str |  -2.279808   .5194892    -4.39    0.000   -3.300945   -1.258671
  _cons |    698.933   10.36436    67.44    0.000    678.5602    719.3057
-------------------------------------------------------------------------

$\widehat{\text{Test Score}} = 698.9 - 2.28 \times \text{STR}$

(We'll discuss the rest of the output later)
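As a preview of how the columns of the output relate to each other, the t statistic is the coefficient divided by its standard error, and a 95% interval is roughly the coefficient plus or minus about two standard errors. The sketch below assumes the large-sample critical value 1.96; Stata uses a t critical value with 418 degrees of freedom, so the interval bounds differ slightly in the third decimal place. The function names are illustrative only.

```python
# Coefficients and robust standard errors copied from the regression output.
coef_str, se_str = -2.279808, 0.5194892
coef_cons, se_cons = 698.933, 10.36436


def t_ratio(coef, se):
    """t statistic for the hypothesis that the coefficient is zero."""
    return coef / se


def approx_ci95(coef, se, crit=1.96):
    """Approximate 95% interval; crit = 1.96 is an assumed normal critical value."""
    half_width = crit * se
    return coef - half_width, coef + half_width


t_str = t_ratio(coef_str, se_str)
t_cons = t_ratio(coef_cons, se_cons)
ci_str = approx_ci95(coef_str, se_str)

print(round(t_str, 2), round(t_cons, 2))
print(ci_str)
```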

The OLS regression line is an estimate, computed using our
sample of data
A different sample would have given a different value of $\hat{\beta}_1$
How can we:
o quantify the sampling uncertainty associated with $\hat{\beta}_1$?
o use $\hat{\beta}_1$ to test hypotheses such as $\beta_1 = 0$?
o construct a confidence interval for $\beta_1$?
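The idea that a different sample gives a different slope estimate can be illustrated with a small simulation. Everything in this sketch is assumed for illustration: the population values `BETA0` and `BETA1`, the error standard deviation, and the range of X are invented and are not estimates from the California data.

```python
import random


def ols(x, y):
    """Return (b0, b1) from the simple-regression OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical population parameters (assumptions, not real estimates).
BETA0, BETA1 = 700.0, -2.0


def draw_sample(rng, n=420):
    """Draw one sample of n observations from the assumed population."""
    x = [rng.uniform(14.0, 26.0) for _ in range(n)]
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, 18.0) for xi in x]
    return x, y


rng = random.Random(0)
slopes = []
for _ in range(200):
    x, y = draw_sample(rng)
    slopes.append(ols(x, y)[1])

# The estimates vary from sample to sample but center near BETA1.
mean_slope = sum(slopes) / len(slopes)
spread = (sum((s - mean_slope) ** 2 for s in slopes) / (len(slopes) - 1)) ** 0.5
print(min(slopes), max(slopes), mean_slope, spread)
```

The spread of the 200 slope estimates is a simulated counterpart to the sampling uncertainty that the standard error is meant to measure.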

2. Defining the linear regression model


$Y_i = \beta_0 + \beta_1 X_i + u_i$, for $i = 1, \ldots, n$
We have n observations, $(X_i, Y_i)$, $i = 1, \ldots, n$
X is the independent variable or regressor
Y is the dependent variable
$\beta_0$ = intercept
$\beta_1$ = slope
u = the regression error

Definition of the simple linear regression model

Explains variable y in terms of variable x:
$y = \beta_0 + \beta_1 x + u$
o $y$: dependent variable, explained variable, response variable, ...
o $\beta_0$: intercept
o $\beta_1$: slope parameter
o $x$: independent variable, explanatory variable, regressor, ...
o $u$: error term, disturbance, unobservables, ...

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

TABLE 2.1 Terminology for Simple Regression

y                      x
Dependent variable     Independent variable
Explained variable     Explanatory variable
Response variable      Control variable
Predicted variable     Predictor variable
Regressand             Regressor

© Cengage Learning, 2013

Yet another name for an X variable is covariate
The terms "dependent variable" and "independent variable" are frequently used in econometrics, but the label "independent" here does not refer to the statistical notion of independence between random variables (see Appendix B)

Population regression function (PRF)

The first part of the linear regression model (without the
error term) is the population regression line or
population regression function
linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
population regression function: $E(Y \mid X) = \beta_0 + \beta_1 X$
The population regression function is a description
of the process that we believe is generating the data

$E(Y \mid X) = \beta_0 + \beta_1 X$
(or $Y_i = \beta_0 + \beta_1 X_i$)
It can also be thought of as a statement of the true
relationship between X and Y
The PRF says that the average value of the dependent variable
can be expressed as a linear function of the explanatory
variable
$E(Y \mid X)$ is a linear function of X
Two pictures of this follow
[Figures: two illustrations of the population regression function $E(Y \mid X)$]

What is the regression error term?


$Y_i = \beta_0 + \beta_1 X_i + u_i$, for $i = 1, \ldots, n$
It consists of factors that are omitted or unobservable
In general, these omitted factors are factors that influence Y,
other than the variable X
The regression error also includes error in the measurement
of Y (measurement error, which we will talk about later in the
course)
What are some of the omitted factors in the class size / test
score example?

Interpretation of the simple linear regression model
Omitted variables create problems for regression analysis
Studies how y varies with changes in x:
$\Delta y / \Delta x = \beta_1$ as long as $\Delta u / \Delta x = 0$
By how much does the dependent variable change if the independent
variable is increased by one unit?
Interpretation only correct if all other things remain equal when the
independent variable is increased by one unit
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons

Examples

Example: Soybean yield and fertilizer
$yield = \beta_0 + \beta_1\,fertilizer + u$
o $\beta_1$ measures the effect of fertilizer on yield, holding all other factors fixed
o $u$ contains rainfall, land quality, presence of parasites, ...

Example: A simple wage equation
$wage = \beta_0 + \beta_1\,educ + u$
o $\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed
o $u$ contains labor force experience, tenure with current employer, work ethic, intelligence, ...

Beware: Correlation does not imply causation!


(Say it 10 times each day!)
Suppose you compared two groups of cities:
One has a high number of ashtrays per capita
The other has a low number of ashtrays per capita
You observe that the mortality rate is higher in the first group
of cities than in the second
Would you conclude that ashtrays cause people to die at a
higher rate?


Interpretation of $\beta_1$

$\beta_1 = \Delta Y / \Delta X$ if $\Delta u = 0$
So it has to be the case that X and u are independent (actually
mean independence will do, but don't worry about that yet):
$E(u \mid X) = 0$
What does this mean?

Meaning of E(u | X) = 0

This means that u and X cannot be systematically related


If we are modeling earnings and education:
$earnings_i = \beta_0 + \beta_1\,education_i + u_i$,
then it seems likely that ability and motivation are in u
And if E(ability | education = 16 years) >
E(ability | education = 12 years)
then $E(u \mid X) = 0$ is violated
People with more schooling have more ability, and they would
have earned more even without the additional schooling

Again
E(ability | education = 16 years) >
E(ability | education = 12 years)
means $E(u \mid X) = 0$ is violated
$\hat{\beta}_1$ will estimate the additional earnings resulting from the
combination of (a) an increase in schooling and (b) the added
ability that goes with the additional schooling
It will not identify a "clean" or "pure" return to one more year
of education
Footnote: corr(u,X) = 0 is not enough (although it is a good start)
because correlation is a measure only of linear association
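The ability story above can be made concrete with a simulation in which ability sits in the error term and is positively related to education. This is only a sketch under invented assumptions: the true return `BETA1`, the effect of ability `ABILITY_EFFECT`, and the way education depends on ability are all made-up numbers chosen for illustration.

```python
import random


def ols_slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(1)
BETA1 = 1.0           # assumed "pure" return to one more year of education
ABILITY_EFFECT = 3.0  # assumed direct effect of ability on earnings
n = 20000

ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
# More able people get more education, so E(ability | education) rises with education.
education = [12.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in ability]
# Ability is unobserved, so it ends up in the error term u of the simple regression.
earnings = [5.0 + BETA1 * e + ABILITY_EFFECT * a + rng.gauss(0.0, 1.0)
            for e, a in zip(education, ability)]

b1 = ols_slope(education, earnings)
print(b1)  # noticeably larger than BETA1
```

Under these assumptions the slope converges to BETA1 + ABILITY_EFFECT * cov(education, ability) / var(education) = 1 + 3 * 2/5 = 2.2, so the estimate mixes the return to schooling with the return to ability.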

When is there a causal interpretation?

Conditional mean independence assumption:
$E(u \mid x) = 0$
A causal interpretation is possible when the zero conditional mean assumption is satisfied
The explanatory variable must not contain information about the mean
of the unobserved factors
Example: wage equation
$wage = \beta_0 + \beta_1\,educ + u$ (u includes, e.g., intelligence)
For example, with a wage equation, the conditional mean independence assumption
is unlikely to hold because individuals with more education will
also be more intelligent (or more able) on average

So the simple regression model is rarely applicable in
practice
The only time it really makes sense is with randomized trials
Other possible examples:
o Capital Asset Pricing Model
o Phillips Curve
o Okun's Law
But even with these, you would want to consider a more
elaborate model
But we need to walk before we run, so simple regression is the
place to start

3. OLS estimator of the linear regression model

The next question is, how can we estimate $\beta_0$ and $\beta_1$ from
data?
Again we will focus on the ordinary least squares (OLS)
estimator, which minimizes the average squared difference
between the actual values of $Y_i$ and the predicted values, based
on the estimated line:
$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$

So we want to minimize the sum of squared residuals
Remember what a residual is: the difference between the actual
value of $Y_i$ and the predicted value
$\hat{u}_i = Y_i - \hat{Y}_i$
$\hat{u}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$
because the fitted or predicted value of $Y_i$ is:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
Here is a picture

The name "ordinary least squares" comes from the fact that
these estimates minimize the sum of squared residuals.
[Figure 2.4: Fitted values and residuals. Axes: x (horizontal), y (vertical). Shows the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, a fitted value $\hat{y}_i$, and a residual $\hat{u}_i$. © Cengage Learning, 2013]
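The definitions of fitted values, residuals, and the sum of squared residuals can be sketched in code. The data below are hypothetical and the function names are illustrative; the point is only that the OLS line gives a smaller sum of squared residuals than some other line through the same points.

```python
# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fitted_values(b0, b1, xs):
    """Predicted Y for each X on the line b0 + b1 * X."""
    return [b0 + b1 * xi for xi in xs]


def residuals(b0, b1, xs, ys):
    """Actual Y minus predicted Y for each observation."""
    return [yi - fi for yi, fi in zip(ys, fitted_values(b0, b1, xs))]


def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the line b0 + b1 * X."""
    return sum(r ** 2 for r in residuals(b0, b1, xs, ys))


def ols(xs, ys):
    """OLS intercept and slope for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0_hat, b1_hat = ols(x, y)
ssr_ols = ssr(b0_hat, b1_hat, x, y)
# An arbitrary alternative line, shifted away from the OLS line.
ssr_other = ssr(b0_hat + 0.5, b1_hat - 0.1, x, y)
print(b0_hat, b1_hat, ssr_ols, ssr_other)
```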

Solution by the method of moments


There are three ways of solving this, and the most
straightforward is the method of moments approach used by
Wooldridge (section 2.2)
Here it is, in outline (you can see Wooldridge for all the steps)
Remember that the population regression model is:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
We will impose two restrictions (assumptions) on this model
that will give us a solution

Write down the restrictions


First, the average value of u in the population is 0:
$E(u) = 0$
Second, remember the average value of u doesn't depend on X:
$E(u \mid X) = E(u)$
But $E(u) = 0$, so:
$E(u \mid X) = 0$
This is the zero conditional mean assumption, and it implies that u
and X are uncorrelated:
$\text{cov}(X, u) = E(Xu) = 0$

Impose the restrictions


Remember that $Y_i = \beta_0 + \beta_1 X_i + u_i$
It follows that $u_i = Y_i - \beta_0 - \beta_1 X_i$
Substitute this expression for u into the two restrictions:
$E(u) = 0 \Rightarrow E(Y_i - \beta_0 - \beta_1 X_i) = 0$
$E(Xu) = 0 \Rightarrow E[X_i (Y_i - \beta_0 - \beta_1 X_i)] = 0$

Get the results


This gives us two equations in two unknowns ($\beta_0$ and $\beta_1$), so if we
have a sample of data, we can choose estimates of $\beta_0$ and $\beta_1$ that
solve the two equations; we will call the estimates $b_0$ and $b_1$
So the sample counterparts to the two restrictions are:
$\frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$
and
$\frac{1}{n} \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$
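The two sample moment conditions are linear in the two unknowns, so they can be solved directly as a 2-by-2 system. The sketch below does this with Cramer's rule on hypothetical data and then checks that both conditions hold at the solution; the function names and the data are illustrative assumptions.

```python
# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def method_of_moments(xs, ys):
    """Solve the two sample moment conditions for (b0, b1).

    Multiplying both conditions by n and rearranging gives
        n * b0      + sum(X)   * b1 = sum(Y)
        sum(X) * b0 + sum(X^2) * b1 = sum(X * Y)
    which is solved here by Cramer's rule.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(xi * xi for xi in xs)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    det = n * sxx - sx * sx  # nonzero whenever the X values are not all equal
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


b0, b1 = method_of_moments(x, y)

# Check the two sample moment conditions at the solution.
n = len(x)
moment_1 = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y)) / n
moment_2 = sum(xi * (yi - b0 - b1 * xi) for xi, yi in zip(x, y)) / n
print(b0, b1, moment_1, moment_2)
```

Both moments come out at zero up to floating-point rounding, which is exactly what the two restrictions require.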

Ordinary least squares (OLS) estimator of $\beta_0$

If you work through some algebra (see Wooldridge), it turns out
that
$\bar{Y} = b_0 + b_1 \bar{X}$
so
$b_0 = \bar{Y} - b_1 \bar{X}$
so we have our estimator of $\beta_0$

Ordinary least squares (OLS) estimator of $\beta_1$

A little more algebra gives you the estimator of $\beta_1$:
$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
and we have triumphed!
Note: $b_0$ and $b_1$ are also denoted $\hat{\beta}_0$ and $\hat{\beta}_1$
because they are estimators of $\beta_0$ and $\beta_1$
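For readers who want the "little more algebra" spelled out, here is one way the steps can go. It is a sketch of the standard argument, not necessarily the exact route taken in Wooldridge: substitute $b_0 = \bar{Y} - b_1 \bar{X}$ into the second sample moment condition and rearrange.

```latex
\begin{align*}
\sum_{i=1}^{n} X_i \left[ Y_i - (\bar{Y} - b_1 \bar{X}) - b_1 X_i \right] &= 0 \\
\sum_{i=1}^{n} X_i (Y_i - \bar{Y}) &= b_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) \\
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) &= b_1 \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{align*}
```

The last line uses $\sum_{i=1}^{n} \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^{n} \bar{X}(X_i - \bar{X}) = 0$, since deviations from a sample mean sum to zero. Dividing by $\sum_{i=1}^{n} (X_i - \bar{X})^2$, which requires that the $X_i$ are not all equal, gives the formula for $b_1$.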

So what?
This matters for a couple of reasons
First, it gives you (or Stata) a way to actually compute $b_0$ and $b_1$
Second, the formula for $b_1$ says that
$b_1 = \text{cov}(Y, X) / \text{var}(X) = s_{XY} / s_X^2$
(the sample covariance of X and Y divided by the sample variance of X)
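The equivalence between the sum formula for the slope and the ratio of the sample covariance to the sample variance can be checked directly. The data and function names in this sketch are hypothetical; the n - 1 divisors cancel in the ratio, which is why the two expressions agree.

```python
# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance s_XY with the usual n - 1 divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)) / (len(xs) - 1)


def sample_var(xs):
    """Sample variance s_X^2 with the usual n - 1 divisor."""
    return sample_cov(xs, xs)


# Slope from the sum formula.
x_bar, y_bar = mean(x), mean(y)
b1_sums = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
           / sum((xi - x_bar) ** 2 for xi in x))

# Slope as sample covariance over sample variance.
b1_ratio = sample_cov(x, y) / sample_var(x)

# Intercept from b0 = Y-bar minus b1 times X-bar.
b0 = y_bar - b1_ratio * x_bar
print(b1_sums, b1_ratio, b0)
```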

Example 1: CEO salary and return on equity


$salary = \beta_0 + \beta_1\,roe + u$
salary is in thousands of dollars, and roe is return on equity (in %) at the CEO's
firm (CEOSAL1.dta)
The fitted regression is:
$\widehat{salary} = 963.191 + 18.501\,roe$
If the return on equity increases by 1 percentage point, then the
CEO's salary is predicted to increase by 18.501 thousand dollars (\$18,501)
Would you give this a causal interpretation?
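The fitted CEO salary line can be used to compute predictions. In this sketch the coefficients are the ones reported in the text, while the roe values plugged in (0 and 30) and the function name are illustrative choices.

```python
# Fitted coefficients reported in the text (salary in thousands of dollars).
b0_hat = 963.191
b1_hat = 18.501


def predict_salary(roe):
    """Predicted CEO salary (in $1,000s) for a given return on equity (in %)."""
    return b0_hat + b1_hat * roe


# Illustrative inputs: roe = 0 gives the intercept; roe = 30 is a made-up example.
print(predict_salary(0.0))
print(predict_salary(30.0))

# A one percentage point increase in roe changes the prediction by the slope,
# whatever the starting value of roe.
one_point_change = predict_salary(11.0) - predict_salary(10.0)
print(one_point_change)
```

Because salary is measured in thousands of dollars, a slope of 18.501 corresponds to the 18,501 dollar change described above.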

[Figure 2.5: The OLS regression line $\widehat{salary} = 963.191 + 18.501\,roe$ and the (unknown) population regression function $E(salary \mid roe) = \beta_0 + \beta_1\,roe$. Axes: roe (horizontal), salary (vertical); the fitted line has intercept 963.191. © Cengage Learning, 2013]

Example 2: Earnings and education


$wage = \beta_0 + \beta_1\,educ + u$
wage is hourly wage in dollars, and educ is years of education
(WAGE1.dta)
What is cov(X,Y)?
What is var(X)?
What is the estimate of the coefficient on educ?
Would you give this a causal interpretation?

Notation and terminology again


The linear regression model written with population parameters:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
The population regression line (or population regression function,
PRF):
$Y_i = \beta_0 + \beta_1 X_i$
or
$E(Y \mid X) = \beta_0 + \beta_1 X$

The OLS regression line (or sample regression function, SRF):

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ (this is a fitted value for Y)
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$ (the actual value of Y)
$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$ (this is a residual)
The first line (the fitted value) can also be written:
$\hat{Y}_i = b_0 + b_1 X_i$ (just different notation)