
ECO391 Lecture Handout Over 15.3
Spring 2003, G. Hoyt
I. What is the Method of Least Squares (Ordinary Least Squares)
A. Theory
B. Formula
C. Application
II. Standard Error of the Estimate
I. Ordinary Least Squares (OLS) (also called the Method of Least Squares)
A. The Theory
Ordinary least squares is a statistical technique that uses sample data to estimate the true population relationship between two variables.
Recall that:
1) \( E(Y_i \mid X_i) = \beta_o + \beta_1 X_i \) is the population regression line
2) \( \hat{Y}_i = b_o + b_1 X_i \) is the sample regression equation
OLS allows us to find \( b_o \) and \( b_1 \).
Consider the following scatter plot diagram that shows the actual, observed data points in a sample:
[Scatter plot with Y on the vertical axis and X on the horizontal axis]
Many lines could fit through these data points. We want to determine the line with the "best fit."
What does it mean to say a line fits the data the best?
Recall that \( \hat{e}_i \), the residual, represents the distance between the sample regression line and the observed data point, \( (X_i, Y_i) \). The line that minimizes the sum of these distances is the one that gives us the best fit.
However, some of the values of the residuals are negative in sign while others are positive. If we sum the residuals, positive values will cancel out negative values, so the sum will not accurately reflect the total amount of error.
To solve this problem we square the residuals before we add them together.
The Method of Least Squares (OLS) produces a line that minimizes the sum of the squared vertical distances from the line to the observed data points.
i.e. it minimizes
\[ \sum \hat{e}_i^2 = \hat{e}_1^2 + \hat{e}_2^2 + \hat{e}_3^2 + \dots + \hat{e}_n^2 \]
where n is the sample size.
The sum of the residuals (unsquared) is exactly zero. (Later, you can use this bit of information to check your work.)
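To see why the residuals are squared before they are added, here is a minimal sketch. The five observations and the two candidate lines are made up for illustration and are not data from this handout. Both lines have residuals that sum to zero, so the plain sum cannot tell them apart, while the sum of squares can.

```python
# Hypothetical sample (assumed for illustration): five (X, Y) observations.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def residuals(b_o, b_1):
    """Residuals Y_i - (b_o + b_1 * X_i) for a candidate line."""
    return [y - (b_o + b_1 * x) for x, y in zip(X, Y)]

def sse(b_o, b_1):
    """Sum of squared residuals for a candidate line."""
    return sum(e ** 2 for e in residuals(b_o, b_1))

flat_line = (4.0, 0.0)    # horizontal line through the mean of Y
sloped_line = (2.2, 0.6)  # a second candidate line

for b_o, b_1 in (flat_line, sloped_line):
    # Print: intercept, slope, sum of residuals, sum of squared residuals.
    print(b_o, b_1, round(sum(residuals(b_o, b_1)), 10), round(sse(b_o, b_1), 10))
```

In this made-up sample the sloped line has the smaller sum of squared residuals, so it is the better fit of the two by the least squares criterion.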
B. Formulas - How does OLS get estimates of the coefficients?
\( \sum \hat{e}_i^2 \) is also called the residual sum of squares (SSE). This is the amount that we want to minimize.
\[
\begin{aligned}
SSE &= \sum \hat{e}_i^2 && (1) \\
    &= \sum (Y_i - \hat{Y}_i)^2 && (2) \\
    &= \sum (Y_i - b_o - b_1 X_i)^2 && (3)
\end{aligned}
\]
Now consider equation (3). I am going to ask you to try to remember a little calculus. We can consider (3) as a mathematical function of \( b_o \) and \( b_1 \), \( f(b_o, b_1) \).
We want to minimize the sum of the squared error terms so we want to minimize equation (3). In terms of calculus this means we want to find the critical points of a function. We want to find the values of \( b_o \) and \( b_1 \) that minimize the function. To do this we take a first derivative of function (3) with respect to \( b_o \) and set it equal to zero and solve for \( b_o \). When we do this we get the function:
\[ b_o = \frac{\sum Y_i - b_1 \sum X_i}{n} \qquad (4) \]
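For readers who want the calculus step spelled out, the following is a sketch of how setting the first derivative of (3) with respect to \( b_o \) equal to zero leads to (4). It uses only the symbols already defined in this handout.

```latex
\begin{aligned}
\frac{\partial}{\partial b_o} \sum (Y_i - b_o - b_1 X_i)^2
  &= -2 \sum (Y_i - b_o - b_1 X_i) = 0 \\
\sum Y_i - n b_o - b_1 \sum X_i &= 0 \\
b_o &= \frac{\sum Y_i - b_1 \sum X_i}{n}
\end{aligned}
```

The first line also says that the residuals of the fitted line sum to zero, which is the check on your work mentioned earlier.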
If we take the first derivative of (3) with respect to \( b_1 \) and set it equal to zero and then solve for \( b_1 \) we get the following equation:
\[ b_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} \qquad (5) \]
Equations (4) and (5) give us the formulas that we need to find the values of \( b_o \) and \( b_1 \) that estimate the true population relationship between X and Y. If we plug (5), the formula for \( b_1 \), into (4), the formula for \( b_o \), we may also write \( b_o \) as follows:
\[ b_o = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n \sum X_i^2 - (\sum X_i)^2} \qquad (6) \]
(Note that \( \sum X_i^2 \neq (\sum X_i)^2 \).) Equations (4) through (6) give us \( b_o \) and \( b_1 \) solely in terms of X and Y (sample data).
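As a rough illustration of equations (4) through (6), the sketch below computes the required sums and plugs them into the formulas. The function name `ols_coefficients` and the sample are hypothetical and are not the handout's data.

```python
def ols_coefficients(X, Y):
    """Return (b_o, b_1, b_o_alt) computed from the sums of the sample data."""
    n = len(X)
    sum_x = sum(X)
    sum_y = sum(Y)
    sum_xy = sum(x * y for x, y in zip(X, Y))
    sum_x2 = sum(x * x for x in X)

    denom = n * sum_x2 - sum_x ** 2                      # shared denominator of (5) and (6)
    b_1 = (n * sum_xy - sum_x * sum_y) / denom           # equation (5)
    b_o = (sum_y - b_1 * sum_x) / n                      # equation (4)
    b_o_alt = (sum_x2 * sum_y - sum_x * sum_xy) / denom  # equation (6)
    return b_o, b_1, b_o_alt

# Hypothetical sample (assumed for illustration).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b_o, b_1, b_o_alt = ols_coefficients(X, Y)
print(round(b_o, 4), round(b_1, 4), round(b_o_alt, 4))
```

Equations (4) and (6) should agree on \( b_o \), which gives a quick way to catch arithmetic slips.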
C. EXAMPLE:
Let's try an example. Consider the following dependent and independent variables:
\( X_i \) = the number of children in a family
\( Y_i \) = the number of loaves of bread consumed by a family in a given three week period
Family #, n = 5    \( X_i \)    \( Y_i \)    \( X_i Y_i \)    \( X_i^2 \)    \( \hat{Y}_i \)    \( \hat{e}_i \)    \( \hat{e}_i^2 \)
1 2 4
2 / =
/ 1 /
4 5 >
5 > 1=
n = 5    \( \sum X_i = \)    \( \sum Y_i = \)    \( \sum X_i Y_i = \)    \( \sum X_i^2 = \)    \( \sum \hat{e}_i = \)    \( \sum \hat{e}_i^2 = \)
1) Find \( b_o \):
2) Find \( b_1 \):
3) Write out the full sample regression line. Interpret the coefficients, \( b_o \) and \( b_1 \).
4) Prediction:
Given that \( X_i = 6 \), predict the value that we expect \( Y_i \) to take given our sample regression line (i.e. find \( \hat{Y}_i \)). Complete the sixth column of the table.
5) Calculate the residuals:
Recall \( \hat{e}_i = Y_i - \hat{Y}_i \). (Fill in the seventh column of the table.)
Check: The sum of the residuals should be approximately zero, \( \sum \hat{e}_i = 0 \).
6) Find \( \sum \hat{e}_i^2 \) or SSE: (Complete the eighth column of the table.)
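Steps 4 through 6 can be sketched in a few lines. The sample and the coefficients below are hypothetical (they are not the bread data from the table), and the helper name `predict` is an assumption made for this illustration.

```python
# Hypothetical sample and its fitted line (assumed, not the handout's data).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b_o, b_1 = 2.2, 0.6  # OLS estimates for this hypothetical sample

def predict(x):
    """Fitted value from the sample regression line."""
    return b_o + b_1 * x

# Step 4: predict Y for a new X value, and fill in the fitted-value column.
prediction_at_6 = predict(6)
y_hat = [predict(x) for x in X]

# Step 5: residuals are observed Y minus fitted Y; their sum should be about zero.
e_hat = [y - yh for y, yh in zip(Y, y_hat)]
residual_sum = sum(e_hat)

# Step 6: SSE is the sum of the squared residuals.
sse = sum(e ** 2 for e in e_hat)

print(round(prediction_at_6, 4), round(residual_sum, 10), round(sse, 4))
```

Because of floating point rounding, the residual sum may print as a very small number rather than exactly zero, which is why the check asks for "approximately zero."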
Alternative Formulas:
The formulas for the estimated coefficients can be manipulated and written in a variety of ways. Here are a few other alternatives. One set of alternatives is the formulas given in the text.
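\[ b_1 = \frac{\sum \left[ (X_i - \bar{X})(Y_i - \bar{Y}) \right]}{\sum (X_i - \bar{X})^2} \]
\[ b_1 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}, \quad \text{and} \quad \bar{X} = \frac{\sum X_i}{n} \]
\[ b_o = \bar{Y} - b_1 \bar{X}, \quad \text{and} \quad \bar{Y} = \frac{\sum Y_i}{n} \]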
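A quick way to convince yourself that the alternative formulas are equivalent is to compute them side by side. The sketch below does this for a hypothetical sample; the function name `ols_deviation_form` is an assumption made for this illustration.

```python
def ols_deviation_form(X, Y):
    """Return (b_o, b_1, b_1_alt) using the mean-based alternative formulas."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n

    # Slope from deviations about the means.
    b_1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
           / sum((x - x_bar) ** 2 for x in X))

    # Same slope written with raw sums and the means.
    b_1_alt = ((sum(x * y for x, y in zip(X, Y)) - n * x_bar * y_bar)
               / (sum(x * x for x in X) - n * x_bar ** 2))

    # Intercept: the fitted line passes through the point of means.
    b_o = y_bar - b_1 * x_bar
    return b_o, b_1, b_1_alt

# Hypothetical sample (assumed for illustration).
print(ols_deviation_form([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```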
15.3 The Standard Error of the Estimate (\( S_e \))
\[ S_e = \sqrt{\frac{\sum \hat{e}_i^2}{n - 2}} \]
This measure approximates the average distance of the real data points from the estimated regression line.
i) \( S_e \) is measured in the same unit of measure as the Y variable. So if the Y variable is
measured in dollars and \( S_e \) = $9.23, on average, our actual data points vary from their estimated values by about $9.23.
ii) \( S_e \) can be used as a measure of the quality of fit of the sample regression line. The smaller the \( S_e \), the better the fit.
iii) An alternative formula for \( S_e \):
\[ S_e = \sqrt{\frac{\sum Y_i^2 - b_o \sum Y_i - b_1 \sum X_i Y_i}{n - 2}} \]
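The two ways of computing the standard error of the estimate can be compared with a short sketch. The sample and the function name `standard_error` are hypothetical. The alternative formula is only expected to match when \( b_o \) and \( b_1 \) are the OLS estimates for the same sample.

```python
import math

def standard_error(X, Y, b_o, b_1):
    """Return (se_direct, se_alt): S_e from the residuals and from the sums."""
    n = len(X)

    # Direct definition: square root of SSE divided by (n - 2).
    sse = sum((y - (b_o + b_1 * x)) ** 2 for x, y in zip(X, Y))
    se_direct = math.sqrt(sse / (n - 2))

    # Alternative formula using sums of Y^2, Y, and XY.
    sum_y2 = sum(y * y for y in Y)
    sum_y = sum(Y)
    sum_xy = sum(x * y for x, y in zip(X, Y))
    se_alt = math.sqrt((sum_y2 - b_o * sum_y - b_1 * sum_xy) / (n - 2))
    return se_direct, se_alt

# Hypothetical sample with its OLS estimates (assumed for illustration).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
se_direct, se_alt = standard_error(X, Y, 2.2, 0.6)
print(round(se_direct, 4), round(se_alt, 4))
```

The result is in the same units as Y, so for this made-up sample it reads as a typical miss of a bit under one unit of Y.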