7,8-Simple Regression PDF

Economics 420
Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Simple Regression (Regression with One Regressor)
1. Introduction to linear regression
2. Defining the linear regression model
3. Estimating the linear regression model: method of moments
4. Algebraic Properties and measures of fit of OLS
5. Sampling distribution of the OLS estimator
6. Hypothesis testing and confidence intervals for $\beta_0$ and $\beta_1$
1. Introduction to linear regression

Empirical problem: Class size and educational output
Policy question: What is the effect of reducing class size by one
student per class? by 8 students/class?
What is the right output (performance) measure?
o parent satisfaction
o student personal development
o future adult welfare
o future adult earnings
o performance on standardized tests
What do data say about class sizes and test scores?

The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420)
Variables:
5th grade test scores (Stanford-9 achievement test, combined
math and reading), district average
Student-teacher ratio (STR) = number of students in the
district divided by the number of full-time-equivalent teachers
An initial look at the California test score data
Do districts with smaller classes (lower STR) have higher test scores?
The class size/test score policy question

What is the effect on test scores of reducing STR by one
student/class?
We want to know:
$\Delta \text{Test Score} / \Delta STR$ = the slope of the line relating test score and STR
This suggests that we want to draw a line through the Test
Score v. STR scatterplot
But how?
Notation and Terminology

We can write the population regression line (or population
regression function):
$\text{Test Score} = \beta_0 + \beta_1 STR$
This represents the true relationship between STR and Test
Score
$\beta_1$ = slope of population regression line
= change in test score for a unit change in STR
Why are $\beta_0$ and $\beta_1$ population parameters?
We would like to know the population value of $\beta_1$
But we don't know $\beta_1$, so we must estimate it using data
How can we estimate $\beta_0$ and $\beta_1$ from data?

We will focus on the least squares (ordinary least squares or
OLS) estimator of the unknown parameters $\beta_0$ and $\beta_1$, which
solves
$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$
The OLS estimator minimizes the average squared difference
between the actual values of $Y_i$ and the values of $Y_i$ predicted
by the estimated line
This minimization problem can be solved using calculus (but
the OLS estimator can be derived in two other ways, one of
which we will use)
The result is the OLS estimators of $\beta_0$ and $\beta_1$
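The calculus route can be sketched as follows. Here $S(b_0, b_1)$ is a name introduced only for this sketch, and $b_0$, $b_1$ stand for candidate values of the intercept and slope. Setting the two partial derivatives of the sum of squared differences to zero gives the first-order conditions:

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 X_i) \right]^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0
\end{aligned}
```

Dividing either condition by $-2n$ does not change its solution, so minimizing the sum and minimizing the average of the squared differences lead to the same estimates.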
Why use OLS, rather than some other estimator?

OLS is a generalization of the sample average
If the line is just an intercept (no X), then the OLS estimator
is just the sample average of $Y_1, \ldots, Y_n$ (or $\bar{Y}$)
Like $\bar{Y}$, the OLS estimator has some desirable properties:
o under certain assumptions, it is unbiased; that is, $E(\hat{\beta}_1) = \beta_1$
o it has a tighter sampling distribution than some other
candidate estimators of $\beta_1$ (more on this later)
Moreover, this is what everyone uses: the common
language of linear regression
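The claim that OLS generalizes the sample average can be checked numerically. The sketch below uses a small made-up data set (not from the lecture) and a brute-force grid search, chosen only for transparency, to find the constant that minimizes the sum of squared differences when there is no X in the model.

```python
# Illustrative data, made up for this sketch (not from the lecture).
y = [2.0, 4.0, 9.0, 5.0]


def sum_sq(b0, ys):
    """Sum of squared differences between each Y_i and a constant b0."""
    return sum((yi - b0) ** 2 for yi in ys)


y_bar = sum(y) / len(y)  # sample average of Y

# Search a fine grid of candidate intercepts for the least-squares minimizer.
grid = [i / 100 for i in range(0, 1001)]  # 0.00, 0.01, ..., 10.00
b0_best = min(grid, key=lambda b0: sum_sq(b0, y))

print(y_bar, b0_best)
```

For this data set both printed values are 5.0: the intercept-only least squares estimate coincides with the sample average.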
Application to the California test score-class size data
Estimated slope = $\hat{\beta}_1 = -2.28$
Estimated intercept = $\hat{\beta}_0 = 698.9$

Estimated OLS regression line:
$\widehat{TestScore} = 698.9 - 2.28 \times STR$
Interpretation of the estimated slope and intercept

$\widehat{TestScore} = 698.9 - 2.28 \times STR$
Districts with one more student per teacher on average have
test scores that are 2.28 points lower
That is, $\Delta \text{Test Score} / \Delta STR = -2.28$
The intercept (taken literally) means that, according to this
estimated line, districts with zero students per teacher would
have a (predicted) test score of 698.9
This interpretation of the intercept makes no sense: it
extrapolates the line outside the range of the data. In this
application, the intercept is not itself economically meaningful
Predicted values & residuals
One of the districts in the data set is Antelope, CA, for which STR
= 19.33 and Test Score = 657.8
For Antelope, the predicted value is
$\hat{Y}_{Antelope} = 698.9 - 2.28 \times 19.33 = 654.8$
and the residual is
$\hat{u}_{Antelope} = 657.8 - 654.8 = 3.0$
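The Antelope arithmetic can be reproduced in a few lines. This is a minimal sketch that uses the rounded estimates quoted in the lecture; the variable names are chosen here for illustration.

```python
# Rounded OLS estimates reported in the lecture.
b0_hat = 698.9   # estimated intercept
b1_hat = -2.28   # estimated slope

# Antelope, CA (values from the lecture).
str_antelope = 19.33     # student-teacher ratio
score_antelope = 657.8   # actual test score

predicted = b0_hat + b1_hat * str_antelope   # fitted value on the OLS line
residual = score_antelope - predicted        # actual minus fitted

print(round(predicted, 1), round(residual, 1))
```

The residual is positive, so Antelope's actual test score lies above the estimated regression line.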
OLS regression: Stata output

regress testscr str, robust

Regression with robust standard errors          Number of obs =     420
                                                F(  1,   418) =   19.26
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0512
                                                Root MSE      =  18.581

-------------------------------------------------------------------------
        |               Robust
testscr |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
--------+----------------------------------------------------------------
    str | -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
  _cons |   698.933   10.36436    67.44   0.000     678.5602    719.3057
-------------------------------------------------------------------------

$\widehat{TestScore} = 698.9 - 2.28 \times STR$

(We'll discuss the rest of the output later)
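One column of the Stata output can already be checked by hand. Assuming the values -4.39 and 67.44 are t-statistics (each coefficient divided by its robust standard error, which is the usual layout of this table), the sketch below reproduces them from the reported coefficients and standard errors.

```python
# Coefficients and robust standard errors copied from the Stata output.
coef_str, se_str = -2.279808, 0.5194892
coef_cons, se_cons = 698.933, 10.36436

# Assumed definition: t-statistic = coefficient / standard error.
t_str = coef_str / se_str
t_cons = coef_cons / se_cons

print(round(t_str, 2), round(t_cons, 2))
```

Both ratios round to the values shown in the table, which supports reading that column as coefficient over standard error.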
The OLS regression line is an estimate, computed using our
sample of data
A different sample would have given a different value of $\hat{\beta}_1$
How can we:
o quantify the sampling uncertainty associated with $\hat{\beta}_1$?
o use $\hat{\beta}_1$ to test hypotheses such as $\beta_1 = 0$?
o construct a confidence interval for $\beta_1$?
2. Defining the linear regression model

$Y_i = \beta_0 + \beta_1 X_i + u_i$, for $i = 1, \ldots, n$
We have n observations, $(X_i, Y_i)$, $i = 1, \ldots, n$
X is the independent variable or regressor
Y is the dependent variable
$\beta_0$ = intercept
$\beta_1$ = slope
$u_i$ = the regression error
Definition of the simple linear regression model
Explains variable $y$ in terms of variable $x$:
$y = \beta_0 + \beta_1 x + u$
$y$: dependent variable, explained variable, response variable
$\beta_0$: intercept
$\beta_1$: slope parameter
$x$: independent variable, explanatory variable, regressor
$u$: error term, disturbance, unobservables
2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Table 2.1 Terminology for Simple Regression (Chapter 2, The Simple Regression Model; Cengage Learning, 2013)

y                     x
Dependent variable    Independent variable
Explained variable    Explanatory variable
Response variable     Control variable
Predicted variable    Predictor variable
Regressand            Regressor

Yet another name for an X variable is covariate
The terms dependent variable and independent variable are frequently used in
econometrics, but the label independent here does not refer to the statistical
notion of independence between random variables (see Appendix B)
Population regression function (PRF)
The first part of the linear regression model (without the
error term) is the population regression line or
population regression function
linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
population regression function: $E(Y \mid X) = \beta_0 + \beta_1 X$
The population regression function is a description
of the process that we believe is generating the data
$E(Y \mid X) = \beta_0 + \beta_1 X$
(or $Y_i = \beta_0 + \beta_1 X_i$, leaving out the error term)
It can also be thought of as a statement of the true
relationship between X and Y
The PRF says that the average value of the dependent variable
can be expressed as a linear function of the explanatory
variable
$E(Y \mid X)$ is a linear function of X
[Figures: two pictures of the population regression function]
What is the regression error term?

$Y_i = \beta_0 + \beta_1 X_i + u_i$, for $i = 1, \ldots, n$
It consists of factors that are omitted or unobservable
In general, these omitted factors are factors that influence Y,
other than the variable X
The regression error also includes error in the measurement
of Y (measurement error, which we will talk about later in the
course)
What are some of the omitted factors in the class size / test
score example?
Interpretation of the simple linear regression model
Omitted variables create problems for regression analysis
Studies how $y$ varies with changes in $x$:
$\Delta y / \Delta x = \beta_1$ as long as $\Delta u = 0$
By how much does the dependent
variable change if the independent
variable is increased by one unit?
Interpretation only correct if all other
things remain equal when the independent variable is increased by one unit
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons
Examples
Example: Soybean yield and fertilizer
$yield = \beta_0 + \beta_1 fertilizer + u$
$\beta_1$ measures the effect of fertilizer on yield, holding all other factors fixed
$u$ contains rainfall, land quality, presence of parasites, and so on
Example: A simple wage equation
$wage = \beta_0 + \beta_1 educ + u$
$\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed
$u$ contains labor force experience, tenure with current employer, work ethic, intelligence, and so on
Beware: Correlation does not imply causation!

(Say it 10 times each day!)
Suppose you compared two groups of cities:
One has a high number of ashtrays per capita
The other has a low number of ashtrays per capita
You observe that the mortality rate is higher in the first group
of cities than in the second
Would you conclude that ashtrays cause people to die at a
higher rate?
Interpretation of $\beta_1$
Since $\Delta Y = \beta_1 \Delta X + \Delta u$,
$\beta_1 = \Delta Y / \Delta X$ if $\Delta u = 0$
So it has to be the case that X and u are independent (actually
mean independence will do, but don't worry about that yet):
$E(u \mid X) = 0$
What does this mean?
Meaning of $E(u \mid X) = 0$
This means that u and X cannot be systematically related
If we are modeling earnings and education:
$earnings_i = \beta_0 + \beta_1 education_i + u_i$,
then it seems likely that ability and motivation are in u
And if E(ability | education = 16 years) >
E(ability | education = 12 years)
then $E(u \mid X) = 0$ is violated
People with more schooling have more ability, and they would
have earned more even without the additional schooling
Again
E(ability | education = 16 years) >
E(ability | education = 12 years)
means $E(u \mid X) = 0$ is violated
$\hat{\beta}_1$ will estimate the additional earnings resulting from the
combination of (a) an increase in schooling and (b) the added
ability that goes with the additional schooling
It will not identify a clean or pure return to one more year
of education
Footnote: corr(u,X) = 0 is not enough (although it is a good start)
because correlation is a measure only of linear association
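A small simulation makes the ability problem concrete. Every number below is invented for illustration (the true return of 1.0, the ability effect of 3.0, the way education depends on ability), and the slope is computed with the covariance-over-variance formula that section 3 derives. It is a sketch of the mechanism, not an estimate of any real return to schooling.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_RETURN = 1.0      # assumed true effect of one more year of education
ABILITY_EFFECT = 3.0   # assumed direct effect of ability on earnings

n = 20000
educ, earnings = [], []
for _ in range(n):
    ability = random.gauss(0, 1)               # unobserved, so it ends up in u
    e = 12 + 2 * ability + random.gauss(0, 1)  # more able people get more schooling
    y = 10 + TRUE_RETURN * e + ABILITY_EFFECT * ability + random.gauss(0, 1)
    educ.append(e)
    earnings.append(y)


def ols_slope(x, y):
    """Slope from regressing y on x: sample cov(x, y) / sample var(x)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slope = ols_slope(educ, earnings)
print(slope)
```

Under these assumptions the estimated slope settles near 2.2 rather than the assumed true return of 1.0, because it bundles the effect of schooling with the effect of the ability that travels with schooling.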
When is there a causal interpretation?
Conditional mean independence assumption
When is a causal interpretation possible?
When the zero conditional mean assumption is satisfied:
$E(u \mid x) = 0$
The explanatory variable must not contain information about the mean
of the unobserved factors
Example: wage equation
$wage = \beta_0 + \beta_1 educ + u$ (where $u$ contains, e.g., intelligence)
For example, with a wage equation, the conditional mean independence assumption
is unlikely to hold because individuals with more education will
also be more intelligent (or more able) on average
So the simple regression model is rarely applicable in practice
The only time it really makes sense is with randomized trials
Other possible examples:
o Capital Asset Pricing Model
o Phillips Curve
o Okun's Law
But even with these, you would want to consider a more
elaborate model
But we need to walk before we run, so simple regression is the
place to start
3. OLS estimator of the linear regression model
The next question is, how can we estimate $\beta_0$ and $\beta_1$ from
data?
Again we will focus on the ordinary least squares (OLS)
estimator, which minimizes the average squared difference
between the actual values of $Y_i$ and the predicted values, based
on the estimated line:
$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$
So we want to minimize the sum of squared residuals

Remember what a residual is: the difference between the actual
value of $Y_i$ and the predicted value
$\hat{u}_i = Y_i - \hat{Y}_i$
$\hat{u}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$
because the fitted or predicted value of $Y_i$ is:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
Here is a picture
The name ordinary least squares comes from the fact that
these estimates minimize the sum of squared residuals.
[Figure 2.4: Fitted values and residuals. Axes: x (horizontal), y (vertical). Shows the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, the fitted value $\hat{y}_i$, and the residual $\hat{u}_i$.]
Solution by the method of moments

There are three ways of solving this, and the most
straightforward is the method of moments approach used by
Wooldridge (section 2.2)
Here it is, in outline (you can see Wooldridge for all the steps)
Remember that the population regression model is:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
We will impose two restrictions (assumptions) on this model
that will give us a solution
Write down the restrictions

First, the average value of u in the population is 0:
$E(u) = 0$
Second, remember the average value of u doesn't depend on X:
$E(u \mid X) = E(u)$
But $E(u) = 0$, so:
$E(u \mid X) = 0$
This is the zero conditional mean assumption, and it implies that u
and X are uncorrelated:
$cov(X, u) = E(Xu) = 0$
Impose the restrictions

Remember that $Y_i = \beta_0 + \beta_1 X_i + u_i$
It follows that $u_i = Y_i - \beta_0 - \beta_1 X_i$
Substitute this expression for u into the two restrictions:
$E(u) = 0 \Rightarrow E(Y_i - \beta_0 - \beta_1 X_i) = 0$
$E(Xu) = 0 \Rightarrow E[X_i (Y_i - \beta_0 - \beta_1 X_i)] = 0$
Get the results

This gives us two equations in two unknowns ($\beta_0$ and $\beta_1$), so if we
have a sample of data, we can choose estimates of $\beta_0$ and $\beta_1$ that
solve the two equations. We will call the estimates $b_0$ and $b_1$
So the sample counterparts to the two restrictions are:
$(1/n) \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$
and
$(1/n) \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$
Ordinary least squares (OLS) estimator of $\beta_0$

If you work through some algebra (see Wooldridge), it turns out
that
$\bar{Y} = b_0 + b_1 \bar{X}$
so
$b_0 = \bar{Y} - b_1 \bar{X}$
so we have our estimator of $\beta_0$
Ordinary least squares (OLS) estimator of $\beta_1$

A little more algebra gives you the estimator of $\beta_1$:
$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
and we have triumphed!

Note: $b_0$ and $b_1$ are also denoted $\hat{\beta}_0$ and $\hat{\beta}_1$
because they are estimators of $\beta_0$ and $\beta_1$
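For readers who want the "little more algebra", one way to fill in the steps (a sketch only; Wooldridge section 2.2 has the full version) is to substitute $b_0 = \bar{Y} - b_1 \bar{X}$ into the second sample restriction and collect terms:

```latex
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} X_i \left( Y_i - (\bar{Y} - b_1 \bar{X}) - b_1 X_i \right) &= 0 \\
\sum_{i=1}^{n} X_i (Y_i - \bar{Y}) &= b_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) \\
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) &= b_1 \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{aligned}
```

The last line uses $\sum_i \bar{X} (Y_i - \bar{Y}) = 0$ and $\sum_i \bar{X} (X_i - \bar{X}) = 0$, so subtracting $\bar{X}$ inside each sum changes nothing. Dividing by $\sum_i (X_i - \bar{X})^2$, which requires that the $X_i$ are not all equal, gives the expression for $b_1$.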
So what?
This matters for a couple of reasons
First, it gives you (or Stata) a way to actually compute b0 and b1
Second, the formula for $b_1$ says that
$b_1 = cov(Y, X) / var(X) = s_{XY} / s_X^2$
(the sample covariance of X and Y divided by the sample variance of X)
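The covariance-over-variance formula is easy to compute directly. The sketch below uses a tiny made-up sample (not one of the course data sets) to compute $b_1$ and $b_0$, and then checks that the residuals satisfy the two sample restrictions from the method of moments.

```python
# Made-up sample for illustration (not from the lecture's data sets).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variance; both divide by n - 1, which cancels in the ratio.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = s_xy / s_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate

# Residuals and the two sample moment conditions.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
moment_1 = sum(residuals) / n                                  # (1/n) * sum of residuals
moment_2 = sum(xi * ui for xi, ui in zip(x, residuals)) / n    # (1/n) * sum of X_i * residual_i

print(b0, b1, moment_1, moment_2)
```

Up to floating-point rounding, the slope is 0.6, the intercept is 2.2, and both moment conditions are zero, which is exactly what the two sample restrictions require of the chosen estimates.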
Example 1: CEO salary and return on equity

$salary = \beta_0 + \beta_1 roe + u$
salary is measured in thousands of dollars, and roe is return on equity (in %) at the CEO's
firm (CEOSAL1.dta)
The fitted regression is:
$\widehat{salary} = 963.191 + 18.501 \, roe$
If the return on equity increases by 1 percentage point, then the
CEO's salary is predicted to increase by 18,501 dollars
Would you give this a causal interpretation?
[Figure 2.5: The OLS regression line $\widehat{salary} = 963.191 + 18.501 \, roe$ and the (unknown) population regression function $E(salary \mid roe) = \beta_0 + \beta_1 roe$. Axes: roe (horizontal), salary (vertical); the OLS line has intercept 963.191.]
Example 2: Earnings and education

$wage = \beta_0 + \beta_1 educ + u$
wage is hourly wage in dollars, and educ is years of education
(WAGE1.dta)
What is cov(X,Y)?
What is var(X)?
What is the estimate of the coefficient on educ?
Would you give this a causal interpretation?
Notation and terminology again

The linear regression model written with population parameters:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
The population regression line (or population regression function,
PRF):
$Y_i = \beta_0 + \beta_1 X_i$
or
$E(Y \mid X) = \beta_0 + \beta_1 X$
The OLS regression line (or sample regression function, SRF):
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ (this is a fitted value for Y)
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$ (the actual value of Y)
$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$ (this is a residual)
The first line (the fitted value) can also be written:
$\hat{Y}_i = b_0 + b_1 X_i$ (just different notation)
