
1

Properties of the OLS Estimator
Quantitative Methods 2
Lecture 5
Edmund Malesky, Ph.D., UCSD
2
Solutions for β̂₀ and β̂₁
OLS chooses the values of β̂₀ and β̂₁ that minimize the unexplained sum of squares:

SSR = \sum_{i=1}^{n} \hat{u}_i^2
SSR = \sum (y_i - \hat{y}_i)^2
SSR = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2

To find the minimum, take partial derivatives with respect to β̂₀ and β̂₁.
3
Solutions for β̂₀ and β̂₁
The derivatives were transformed into the normal equations:

\sum y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum x_i
\sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2

Solving the normal equations for β̂₀ and β̂₁ gives us our OLS estimators.
4
Solutions for β̂₀ and β̂₁
Our estimate of the slope of the line is:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}

And our estimate of the intercept is:

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
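These closed-form solutions are easy to compute directly. A minimal sketch in Python (not from the lecture; the data and parameter values are simulated for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)             # simulated regressor
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)    # hypothetical DGP: beta0 = 2, beta1 = 0.5

# The closed-form OLS solutions from the slide above
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)  # should land near 2 and 0.5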
5
Estimators and the True Coefficients
β̂₀ and β̂₁ would be the true coefficients if we only wanted to describe the data we have observed.
We are almost ALWAYS using data to draw conclusions about cases outside our data.
Thus β̂₀ and β̂₁ are estimates of some true set of coefficients (β₀ and β₁) that exist beyond our observed data.
6
Some Terminology for Labeling Estimators
Various conventions are used to distinguish the true coefficients from the estimates that we observe.
We will use the beta versus beta-hat distinction from Wooldridge:
Wooldridge: β and u for the population values; β̂ and û for the estimates.
Others: B, β, or A for population values; b, b̂, or a for the estimates; ε or u for the errors and e for the residuals.
But other authors, textbooks, or websites may use different terms.
Think of this as the same distinction between population values and sample-based estimates.
7
Gauss-Markov Theorem:
Under the 5 Gauss-Markov assumptions, the OLS estimator is the best linear unbiased estimator of the true parameters (the βs), conditional on the sample values of the explanatory variables. In other words, the OLS estimator is BLUE.
Carl Friedrich Gauss
Andrey Markov
8
5 Gauss-Markov Assumptions for the Simple Linear Model (Wooldridge, p. 65)

1. Linear in parameters: y = \beta_0 + \beta_1 x_1 + u
2. Random sampling of n observations: (x_i, y_i), i = 1, 2, ..., n
3. Sample variation in the explanatory variable: the x_i are not all the same value, i.e. \neg(x_1 = x_2 = ... = x_n = \bar{x})
4. Zero conditional mean: the error u has an expected value of 0, given any value of the explanatory variable: E(u \mid x) = 0
5. Homoskedasticity: the error has the same variance given any value of the explanatory variable: Var(u \mid x) = \sigma^2
9
The Linearity Assumption
Key to understanding OLS models.
The restriction is that our model of the population must be linear in the parameters:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k + u

A model cannot be non-linear in the parameters. For example, OLS cannot estimate:

y = \beta_0 + \beta_1^2 x_1
y = \beta_0 + \ln(\beta_1) x_1

Non-linear in the variables (the x's), however, is fine and quite useful. OLS can estimate:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2
y = \beta_0 + \beta_1 \ln(x_1)
10
[Figure: graph of y = ln(x), crossing y = 0 at x = 1]
11
Interpretation of Logs in Regression Analysis

Model         Dependent Variable   Independent Variable   Interpretation of β₁
Level-Level   y                    x                      Δy = β₁ Δx
Level-Log     y                    ln(x)                  Δy = (β₁/100) %Δx
Log-Level     ln(y)                x                      %Δy = (100 β₁) Δx
Log-Log       ln(y)                ln(x)                  %Δy = β₁ %Δx
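As a quick numeric check of the log-level row (an illustrative sketch, not from the lecture): the 100·β₁ rule is an approximation to the exact percent change, and it is close for small coefficients.

import numpy as np

beta1 = 0.05  # hypothetical log-level coefficient
exact = (np.exp(beta1) - 1) * 100   # exact % change in y per one-unit change in x
approx = 100 * beta1                # the approximation from the table
print(exact, approx)                # about 5.13 vs 5.0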
12
[Figure: quadratic function y = 6 + 8x − 2x², with y-intercept 6 and maximum value 14 at x = 2]
13
Demonstration of the Homoskedasticity Assumption
Predicted Line Drawn Under Homoskedasticity
[Figure: conditional densities f(y|x) at x₁, x₂, x₃, x₄ around the predicted line ŷ = β₀ + β₁x]
Variance across values of x is constant.
14
Demonstration of the Homoskedasticity Assumption
Predicted Line Drawn Under Heteroskedasticity
[Figure: conditional densities f(y|x) at x₁, x₂, x₃, x₄ around the predicted line ŷ = β₀ + β₁x]
Variance differs across values of x.
15
How Good are the Estimates?
Properties of Estimators
Small Sample Properties
True regardless of how much data we have
Most desirable characteristics
Unbiased
Efficient
BLUE (Best Linear Unbiased Estimator)
16
Second-Best Properties of Estimators
Asymptotic (or large sample) properties:
True in the hypothetical instance of infinite data
In practice, applicable if N > 50 or so
Asymptotically unbiased
Consistency
Asymptotic efficiency
17
Bias
An estimator is unbiased if

E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, ..., k

In other words, the average value of the estimator in repeated sampling equals the true parameter.
Note that whether an estimator is biased or not implies nothing about its dispersion.
18
Efficiency
An estimator is efficient if its variance is smaller than that of any other estimator of the parameter.
This criterion is only useful in combination with others (e.g., β̂ = 2 is low variance, but biased).
β̂_j is the best unbiased estimator if

Var(\hat{\beta}_j) \leq Var(\tilde{\beta}_j)

where \tilde{\beta}_j is any other unbiased estimator of β_j.
We might want to choose a biased estimator if it has a smaller variance.
19
[Figure: sampling distributions f(x) over estimates of β: a biased estimator of β centered at β + bias; an unbiased and efficient estimator of β; and an estimator of β with high sampling variance, i.e. inefficient]
20
BLUE (Best Linear Unbiased Estimate)
An estimator β̂_j is BLUE if:
β̂_j is a linear function
β̂_j is unbiased: E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, ..., k
β̂_j is the most efficient: Var(\hat{\beta}_j) \leq Var(\tilde{\beta}_j)
21
Large Sample Properties
Asymptotically unbiased: as n becomes larger, E(β̂_j) trends toward β_j.
Consistency: if the bias and variance both decrease as n gets larger, the estimator is consistent.
Asymptotic efficiency: the asymptotic distribution has finite mean and variance, the estimator is consistent, and no estimator has a smaller asymptotic variance.
22
Demonstration of Consistency
[Figure: sampling distributions f(x) around the true β for n = 4, n = 16, and n = 50; the distributions tighten around β as n grows]
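This tightening can be checked by simulation. A minimal sketch (illustrative; the data-generating values are hypothetical):

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5  # hypothetical true parameters

def slope_estimates(n, reps=5000):
    """Draw `reps` samples of size n and return the OLS slope estimate from each."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 10, size=n)
        y = beta0 + beta1 * x + rng.normal(0, 1, size=n)
        est[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return est

for n in (4, 16, 50):
    print(n, slope_estimates(n).std())  # the spread shrinks as n grows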
23
Let's Show that OLS is Unbiased
Begin with our equation: y_i = \beta_0 + \beta_1 x_i + u
u ~ N(0, \sigma^2) and y_i ~ N(\beta_0 + \beta_1 x_i, \sigma^2)
A linear function of a normal random variable is also a normal random variable.
Thus β̂₀ and β̂₁ are normal random variables.
24
The Robust Assumption of Normality
Even if we do not know the distribution of y, β̂₀ and β̂₁ will behave like normal random variables.
The Central Limit Theorem says that estimates of the mean of any random variable will approach normality as n increases.
This assumes cases are independent (errors not correlated) and identically distributed (i.i.d.).
This is critical for hypothesis testing: the β̂'s are normal regardless of y.
25
Showing β̂₁ is Unbiased
Recall the formula for β̂₁:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}

From the rules of summation, this reduces to:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}
26
Showing β̂₁ is Unbiased
Now we substitute for y_i to yield:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum (x_i - \bar{x})^2}

This expands to:

\hat{\beta}_1 = \frac{\beta_0 \sum (x_i - \bar{x}) + \beta_1 \sum (x_i - \bar{x}) x_i + \sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}
27
Showing β̂₁ is Unbiased
Now, we can separate terms to yield:

\hat{\beta}_1 = \beta_0 \frac{\sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} + \beta_1 \frac{\sum (x_i - \bar{x}) x_i}{\sum (x_i - \bar{x})^2} + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}

Now, we need to rely on two more rules of summation:

6. \sum_{i=1}^{n} (x_i - \bar{x}) = 0
7. \sum_{i=1}^{n} (x_i - \bar{x}) x_i = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2
28
Showing β̂₁ is Unbiased
By the first summation rule, the first term = 0.
By the second summation rule, the second term = β₁.
This leaves:

\hat{\beta}_1 = \beta_1 + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}
29
Showing β̂₁ is Unbiased
Expanding the summations yields:

\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2} + ... + \frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}

To show that β̂₁ is unbiased, we must show that the expectation of β̂₁ equals β₁:

E(\hat{\beta}_1) = \beta_1 + E\left[\frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2}\right] + E\left[\frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2}\right] + ... + E\left[\frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}\right]
30
Showing β̂₁ is Unbiased
Now we need Gauss-Markov assumption 4, that the expected value of the error term is 0:

E(u \mid x) = 0

Then all terms after β₁ are equal to 0:

E(\hat{\beta}_1) = \beta_1 + 0 + 0 + ... + 0

This reduces to:

E(\hat{\beta}_1) = \beta_1

Two assumptions were needed to get this result:
1. The x's are fixed (measured without error)
2. The expected value of the error is zero
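Both the algebraic decomposition and the unbiasedness result can be checked by simulation, where (unlike with real data) β₁ and the uᵢ are known. A minimal sketch with hypothetical values:

import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n = 2.0, 0.5, 50     # hypothetical true parameters
x = rng.uniform(0, 10, size=n)     # x is held fixed across repeated samples

def ols_slope(y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# 1. Verify the decomposition: beta1_hat = beta1 + sum((x - xbar) * u) / sum((x - xbar)^2)
u = rng.normal(0, 1, size=n)
y = beta0 + beta1 * x + u
rhs = beta1 + np.sum((x - x.mean()) * u) / np.sum((x - x.mean()) ** 2)
print(np.isclose(ols_slope(y), rhs))   # True: the identity holds exactly

# 2. Unbiasedness in repeated sampling: the average estimate lands near beta1
estimates = [ols_slope(beta0 + beta1 * x + rng.normal(0, 1, size=n))
             for _ in range(10000)]
print(np.mean(estimates))              # close to 0.5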
31
Showing β̂₀ Is Unbiased
Begin with the equation for β̂₀:

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

Since y_i = \beta_0 + \beta_1 x_i + u_i, then \bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}.
Substituting for the mean of y (ȳ):

\hat{\beta}_0 = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x}
32
Showing β̂₀ Is Unbiased
Take the expected value of both sides:

E(\hat{\beta}_0) = \beta_0 + \beta_1 \bar{x} + E(\bar{u}) - E(\hat{\beta}_1) \bar{x}

We just showed that E(\hat{\beta}_1) = \beta_1, thus the β₁x̄ terms cancel each other out. This leaves:

E(\hat{\beta}_0) = \beta_0 + E(\bar{u})

Again, since E(u) = 0, then:

E(\hat{\beta}_0) = \beta_0
33
Notice the Assumptions
Two key assumptions were needed to show that β̂₀ and β̂₁ are unbiased:
x is fixed (meaning it is measured without error)
E(u) = 0
Unbiasedness tells us that OLS will give us a best guess at the slope and intercept that is correct on average.
34
OK, but is it BLUE?
Now we have an estimator (β̂₁).
We know that β̂₁ is unbiased.
We can calculate the variance of β̂₁ across samples.
But is β̂₁ the Best Linear Unbiased Estimator?
35
The Variance of the Estimator and Hypothesis Testing
36
The Variance of the Estimator and Hypothesis Testing
We have derived an estimator for the slope of a line through data: β̂₁.
We have shown that β̂₁ is an unbiased estimator of the true relationship β₁.
We must assume x is measured without error.
We must assume the expected value of the error term is zero.
37
Variance of β̂₀ and β̂₁
Even if β̂₀ and β̂₁ are right on average, we still want to know how far off they might be in a given sample.
Hypotheses are actually about β₁, not β̂₁.
Thus we need to know the variance of β̂₀ and β̂₁.
We use probability theory to draw conclusions about β₁, given our estimate of β̂₁.
38
Variances of β̂₀ and β̂₁
Conceptually, the variances of β̂₀ and β̂₁ are the expected distances from their individual values to their mean values:

Var(\hat{\beta}_0) = E[(\hat{\beta}_0 - E(\hat{\beta}_0))^2]
Var(\hat{\beta}_1) = E[(\hat{\beta}_1 - E(\hat{\beta}_1))^2]

We can solve these based on our proof of unbiasedness. Recall from above:

\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2} + ... + \frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}
39
The Variance of β̂₁
If a random variable (β̂₁) is a linear combination of other independently distributed random variables (the u's),
then the variance of β̂₁ is the sum of the variances of those terms.
Note the assumption of independent observations.
Applying this principle to the previous equation yields:
40
The Variance of β̂₁

Var(\hat{\beta}_1) = \frac{(x_1 - \bar{x})^2 \sigma_{u_1}^2}{[\sum (x_i - \bar{x})^2]^2} + \frac{(x_2 - \bar{x})^2 \sigma_{u_2}^2}{[\sum (x_i - \bar{x})^2]^2} + ... + \frac{(x_n - \bar{x})^2 \sigma_{u_n}^2}{[\sum (x_i - \bar{x})^2]^2}

Now we need another Gauss-Markov assumption, assumption 5:

Var(u \mid x) = \sigma^2

That is, we must assume that the variance of the errors is constant:

\sigma_{u_1}^2 = \sigma_{u_2}^2 = ... = \sigma_{u_n}^2 = \sigma_u^2

This yields:

Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma_u^2 \sum (x_i - \bar{x})^2}{[\sum (x_i - \bar{x})^2]^2}
41
The Variance of β̂₁!
OR:

Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma_u^2}{\sum (x_i - \bar{x})^2}

That is, the variance of β̂₁ is a function of the variance of the errors (σ²ᵤ) and the variation of x.
But what is the true variance of the errors?
The Estimated Variance of
1
hat

We do not observe the o
u
2
- because we
dont observe
0
and
1


0
hat
and
1
hat
are unbiased, so we use the
variance of the observed residuals as an
estimator of the variance of the true errors
We lose 2 degrees of freedom by
substituting in estimators
0
hat
and
1
hat

43
The Estimated Variance of β̂₁
Thus:

\hat{\sigma}_u^2 = \frac{\sum \hat{u}_i^2}{n - 2}

This is an unbiased estimator of σ²ᵤ. Thus the final equation for the estimated variance of β̂₁ is:

Var(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_u^2}{\sum (x_i - \bar{x})^2}

New assumptions: independent observations and constant error variance.
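Putting the pieces together, a sketch of these formulas on simulated data (hypothetical values; the result can be checked against any packaged OLS routine):

import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)    # hypothetical DGP

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
sigma2_u_hat = np.sum(resid ** 2) / (n - 2)             # estimated error variance
var_b1_hat = sigma2_u_hat / np.sum((x - x.mean()) ** 2)
print(b1, np.sqrt(var_b1_hat))   # slope estimate and its standard error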
44
The Estimated Variance of β̂₁
The estimated variance of β̂₁ has nice intuitive qualities:
As the size of the errors decreases, the estimated variance of β̂₁ decreases. The line fits tightly through the data; few other lines could fit as well.
As the variation in x increases, the estimated variance of β̂₁ decreases. Few lines will fit without large errors for extreme values of x.

Var(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_u^2}{\sum (x_i - \bar{x})^2}
45
The Estimated Variance of β̂₁
Because the variance of the estimated errors has n in the denominator, as n increases, the variance of β̂₁ decreases:

\hat{\sigma}_u^2 = \frac{\sum \hat{u}_i^2}{n - 2}, \qquad Var(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_u^2}{\sum (x_i - \bar{x})^2}

The more data points we must fit to the line, the smaller the number of lines that fit with few errors.
We have more information about where the line must go.
46
Variance of β̂₁ is Important for Hypothesis Testing
F-test: tests the hypothesis that the null model does better.
Log-likelihood test: tests the joint significance of variables in an MLE model.
t-test: tests that individual coefficients are not zero.
The t-test is the central task for testing most policy theories.
47
T-Tests
In general, our theories give us hypotheses that β₀ > 0 or β₁ < 0, etc.
We can estimate β̂₁, but we need a way to assess the validity of statements that β₁ is positive or negative, etc.
We can rely on our estimate of β̂₁ and its variance, using probability theory, to test such statements.
48
Z Scores & Hypothesis Tests
We know that \hat{\beta}_1 \sim N(\beta_1, \sigma_{\hat{\beta}_1}^2).
Subtracting β₁ from both sides, we can see that (\hat{\beta}_1 - \beta_1) \sim N(0, \sigma_{\hat{\beta}_1}^2).
Then, if we divide by the standard deviation, we can see that:

(\hat{\beta}_1 - \beta_1) / \sigma_{\hat{\beta}_1} \sim N(0, 1)

To test the null hypothesis that β₁ = 0, we can see that:

\hat{\beta}_1 / \sigma_{\hat{\beta}_1} \sim N(0, 1)
49
Z-Scores & Hypothesis Tests
This variable is a z-score based on the standard normal distribution:
95% of cases are within 1.96 standard deviations of the mean.
If \hat{\beta}_1 / \sigma_{\hat{\beta}_1} > 1.96, then in a series of random draws there is a 95% chance that β₁ > 0.
The problem is that we don't know \sigma_{\hat{\beta}_1}.
50
Z-Scores and t-scores
The obvious solution is to substitute \hat{\sigma}_{\hat{\beta}_1} in place of \sigma_{\hat{\beta}_1}.
Problem: \hat{\beta}_1 / \hat{\sigma}_{\hat{\beta}_1} is the ratio of two random variables, and this will not be normally distributed.
Fortunately, an employee of the Guinness Brewery figured out this distribution in 1908.
51
The t-statistic
The statistic is called Student's t, and the t-distribution looks similar to a normal distribution.
Thus \hat{\beta}_1 / \hat{\sigma}_{\hat{\beta}_1} \sim t_{(n-2)} for bivariate regression.
More generally, \hat{\beta}_1 / \hat{\sigma}_{\hat{\beta}_1} \sim t_{(n-k)}, where k is the number of parameters estimated.
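A sketch of the test in code (illustrative data; scipy's t distribution supplies the two-sided p-value):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)    # hypothetical data

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se_b1 = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum((x - x.mean()) ** 2))

t_stat = b1 / se_b1                                 # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-sided p-value
print(t_stat, p_value)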
52
The t-statistic
Note the addition of a degrees-of-freedom constraint.
Thus, the more data points we have relative to the number of parameters we are trying to estimate, the more the t-distribution looks like the z-distribution.
When n > 100 the difference is negligible.
53
Limited Information in Statistical Significance Tests
Results are often illustrative rather than precise.
A significance test only tests the "not zero" hypothesis; it does not measure the importance of the variable (look at the confidence interval).
It generally reflects confidence that results are robust across multiple samples.
54
For Example: Presidential Approval and the CPI
. reg approval cpi

Source | SS df MS Number of obs = 148
---------+------------------------------ F( 1, 146) = 9.76
Model | 1719.69082 1 1719.69082 Prob > F = 0.0022
Residual | 25731.4061 146 176.242507 R-squared = 0.0626
---------+------------------------------ Adj R-squared = 0.0562
Total | 27451.0969 147 186.742156 Root MSE = 13.276

------------------------------------------------------------------------------
approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
cpi | -.1348399 .0431667 -3.124 0.002 -.2201522 -.0495277
_cons | 60.95396 2.283144 26.697 0.000 56.44168 65.46624
------------------------------------------------------------------------------

. sum cpi

Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
cpi | 148 46.45878 25.36577 23.5 109
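A quick check of the reported t-statistic from the table above: t = −.1348399 / .0431667 ≈ −3.12, matching the printed −3.124 up to rounding. With 146 degrees of freedom this comfortably exceeds the 1.96 threshold, so the CPI coefficient is statistically distinguishable from zero.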

55
So the distribution of β̂₁ is:
[Figure: histogram (fraction) of simulated cpi parameter estimates, centered at −.135, ranging from about −.3 to .1]
56
Now Let's Look at Approval and the Unemployment Rate
. reg approval unemrate

Source | SS df MS Number of obs = 148
---------+------------------------------ F( 1, 146) = 0.85
Model | 159.716707 1 159.716707 Prob > F = 0.3568
Residual | 27291.3802 146 186.927262 R-squared = 0.0058
---------+------------------------------ Adj R-squared = -0.0010
Total | 27451.0969 147 186.742156 Root MSE = 13.672

------------------------------------------------------------------------------
approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
unemrate | -.5973806 .6462674 -0.924 0.357 -1.874628 .6798672
_cons | 58.05901 3.814606 15.220 0.000 50.52003 65.59799
------------------------------------------------------------------------------

. sum unemrate

Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
unemrate | 148 5.640541 1.744879 2.6 10.7
57
Now the Distribution of β̂₁ is:
[Figure: histogram (fraction) of simulated unemrate parameter estimates, centered at −.597, ranging from about −3 to 3]
