
1 The Idea

The idea is to find a line of best fit. Consider the following regression function,
which is linear in the parameters (and, to keep things simple, also linear in the
variable):

$$Y_i = b_0 + b_1 X_i + e_i \tag{1}$$

The idea behind the method of least squares is to find the parameters $b_0$
and $b_1$ that minimize the sum of squared residuals (SSR).
Three questions:

1. Why is choosing $b_0$ and $b_1$ the goal?

2. Why do we minimize the sum of squared residuals, rather than the sum of residuals, or the sum of absolute residuals?

3. Is there a drawback to using squares?

2 Formal Statement of the Problem


$$\min_{b_0, b_1} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \tag{2}$$
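
As a concrete illustration of this objective, here is a minimal sketch, assuming NumPy and made-up example data, of SSR as a function of the two parameters:

```python
import numpy as np

# Made-up example data, purely for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def ssr(b0, b1):
    """Sum of squared residuals at candidate parameters (b0, b1)."""
    residuals = Y - b0 - b1 * X
    return np.sum(residuals ** 2)

print(ssr(0.0, 2.0))  # SSR at one candidate point on the surface
```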

3 Solution Concept
What we have here is a 3-D surface, SSR as a function of $(b_0, b_1)$, that is shaped
like a bowl. Our goal is to find the pair $(b_0, b_1)$ that corresponds to the lowest
point in this bowl. An intuitive way to do this would be to construct a numerical
algorithm that starts at any point on the surface and keeps moving along it
until the minimum is found, as sketched below.
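
That intuition can be made concrete with a simple gradient-descent sketch. This is one illustrative choice, not something the notes prescribe; the data, step size, and iteration count are all arbitrary assumptions:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = 0.0, 0.0   # start anywhere on the surface
rate = 0.01         # arbitrary step size
for _ in range(10_000):
    resid = Y - b0 - b1 * X
    # Gradients of SSR with respect to b0 and b1
    g0 = -2 * np.sum(resid)
    g1 = -2 * np.sum(resid * X)
    b0 -= rate * g0
    b1 -= rate * g1

print(b0, b1)  # converges toward the bottom of the bowl
```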

However, the linearity in the parameters of the linear regression function
allows us to solve this problem analytically using basic calculus. The
analytical idea is to find the point where the slopes in both directions, $b_0$
and $b_1$, are zero, and to ensure that this point is a global minimum.

4 The derivation of coefficient estimates


Take the partial derivatives of SSR with respect to the two parameters $b_0$
and $b_1$:

$$\frac{\partial SSR}{\partial b_0} = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \tag{3}$$

$$\frac{\partial SSR}{\partial b_1} = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) X_i = 0 \tag{4}$$

These two first-order conditions simplify as follows, and are called the
normal equations:

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \tag{5}$$

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) X_i = 0 \tag{6}$$
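
These conditions can be verified mechanically with a computer algebra system; a minimal sketch assuming SymPy and made-up data:

```python
import sympy as sp

b0, b1 = sp.symbols("b0 b1")
# Made-up data, purely for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 11]

SSR = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))
# Setting both partial derivatives to zero gives the normal equations
normal_eqs = [sp.Eq(sp.diff(SSR, b0), 0), sp.Eq(sp.diff(SSR, b1), 0)]
print(sp.solve(normal_eqs, [b0, b1]))
```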

Using simple algebra, the normal equations reduce to

$$\sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} b_0 - b_1 \sum_{i=1}^{n} X_i = 0 \tag{7}$$

$$\sum_{i=1}^{n} X_i Y_i - b_0 \sum_{i=1}^{n} X_i - b_1 \sum_{i=1}^{n} X_i^2 = 0 \tag{8}$$

$$n(\bar{Y} - b_0 - b_1 \bar{X}) = 0 \tag{9}$$

$$\sum_{i=1}^{n} X_i Y_i - b_0 n \bar{X} - b_1 \sum_{i=1}^{n} X_i^2 = 0 \tag{10}$$

From the first FOC, we get

$$b_0 = \bar{Y} - b_1 \bar{X} \tag{11}$$

Substituting this into the second FOC,

$$\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y} = b_1 \sum_{i=1}^{n} X_i^2 - n b_1 \bar{X}^2 \tag{12}$$

Solving for $b_1$,

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} \tag{13}$$
With a bit of algebraic manipulation, we can write this in a more elegant
form. To do this, let's first focus on the numerator, add and subtract
$n\bar{X}\bar{Y}$, and manipulate the resulting expression as follows:

$$\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} - n\bar{X}\bar{Y} \tag{14}$$

$$\sum_{i=1}^{n} X_i Y_i + n\bar{X}\bar{Y} - \bar{X}\sum_{i=1}^{n} Y_i - \bar{Y}\sum_{i=1}^{n} X_i \tag{15}$$

$$\sum_{i=1}^{n} X_i Y_i + \sum_{i=1}^{n} \bar{X}\bar{Y} - \sum_{i=1}^{n} \bar{X} Y_i - \sum_{i=1}^{n} \bar{Y} X_i \tag{16}$$

$$\sum_{i=1}^{n} (X_i Y_i - \bar{X} Y_i - \bar{Y} X_i + \bar{X}\bar{Y}) \tag{17}$$

$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \tag{18}$$

Similarly, for the denominator, add and subtract $n\bar{X}^2$:

$$\sum_{i=1}^{n} X_i^2 - n\bar{X}^2 + n\bar{X}^2 - n\bar{X}^2 \tag{19}$$

$$\sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \bar{X}^2 - \bar{X}\sum_{i=1}^{n} X_i - \bar{X}\sum_{i=1}^{n} X_i \tag{20}$$

$$\sum_{i=1}^{n} (X_i^2 + \bar{X}^2 - 2 X_i \bar{X}) \tag{21}$$

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{22}$$

Therefore,

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{23}$$

In deviations form, i.e. $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$,

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \tag{24}$$
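
The closed-form results (11) and (24) translate directly into code; a minimal sketch assuming NumPy and made-up data, cross-checked against numpy.polyfit:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Deviations form: x_i = X_i - Xbar, y_i = Y_i - Ybar
x = X - X.mean()
y = Y - Y.mean()

b1 = np.sum(x * y) / np.sum(x ** 2)  # equation (24)
b0 = Y.mean() - b1 * X.mean()        # equation (11)

print(b0, b1)
print(np.polyfit(X, Y, 1))  # returns [slope, intercept]; should match (b1, b0)
```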

5 Properties of the OLS estimator


1. The OLS regression line always passes through the centroid $(\bar{X}, \bar{Y})$: see the first FOC.

2. The sum of the OLS residuals is always zero: see the first FOC.

3. The residuals are orthogonal to the independent variable, i.e. $X$ and $e$ are uncorrelated: see the second FOC.

Each of these properties can be confirmed numerically, as sketched below.
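
A minimal check, assuming NumPy and the same made-up data as before:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - b0 - b1 * X  # OLS residuals

print(np.isclose(b0 + b1 * X.mean(), Y.mean()))  # line passes through centroid
print(np.isclose(np.sum(e), 0.0))                # residuals sum to zero
print(np.isclose(np.sum(X * e), 0.0))            # residuals orthogonal to X
```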

6 Ensuring the global minimum


We need to verify that the solution we found is a minimum and that it is
a global minimum.

1. To show that the solution is a minimum, all we need to do is to show that both second derivatives are positive:

$$\frac{\partial^2 SSR}{\partial b_0^2} = 2n > 0 \tag{25}$$

$$\frac{\partial^2 SSR}{\partial b_1^2} = 2 \sum_{i=1}^{n} X_i^2 > 0 \tag{26}$$

2. To show that the solution is a global minimum, we need to check that SSR is convex in $(b_0, b_1)$. Since SSR is a sum of squares of terms that are linear in the parameters, it is convex, so the stationary point above is the global minimum. Equivalently, the Hessian, whose cross-partial is $\partial^2 SSR / \partial b_0 \partial b_1 = 2\sum_{i=1}^{n} X_i$, is positive semidefinite everywhere (and positive definite whenever the $X_i$ are not all equal).
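
As an illustration, here is a sketch, assuming NumPy and the example data used earlier, that checks positive definiteness of the (constant) Hessian:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(X)

# Hessian of SSR in (b0, b1); it does not depend on the parameters
H = np.array([[2 * n,          2 * np.sum(X)],
              [2 * np.sum(X),  2 * np.sum(X ** 2)]])

print(np.linalg.eigvalsh(H))  # both eigenvalues positive => positive definite
```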

7 Some useful measures


Start with
$$Y_i = \hat{Y}_i + e_i \tag{27}$$

Subtract $\bar{Y}$ from both sides,

$$Y_i - \bar{Y} = \hat{Y}_i - \bar{Y} + e_i \tag{28}$$

Square both sides and sum across all observations,

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y} + e_i)^2 \tag{29}$$

Expand the right-hand side:

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 + 2 \sum (\hat{Y}_i - \bar{Y}) e_i$$

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 + 2 \left( b_0 \sum e_i + b_1 \sum X_i e_i - \bar{Y} \sum e_i \right)$$

Because the last three terms are all zero (by the FOCs), this expression reduces to

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2$$

This allows us to interpret a regression as a useful decomposition of the
total variation, or the total sum of squares, of the Y variable. Define
$\sum (Y_i - \bar{Y})^2$ as the total sum of squares (SST) and
$\sum (\hat{Y}_i - \bar{Y})^2$ as the explained sum of squares (SSE).
We already defined $\sum e_i^2$ as the sum of squared residuals (SSR).
Therefore, what we have is the following identity,

$$SST = SSE + SSR \tag{30}$$

A useful related measure is R-squared: it expresses the explained sum of
squares as a proportion of the total sum of squares,

$$R^2 = \frac{SSE}{SST} \tag{31}$$
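
The decomposition and $R^2$ can be computed directly; a minimal sketch assuming NumPy and the same made-up data:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat

SST = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
SSE = np.sum((Y_hat - Y.mean()) ** 2)  # explained sum of squares
SSR = np.sum(e ** 2)                   # sum of squared residuals

print(np.isclose(SST, SSE + SSR))      # the decomposition identity (30)
print(SSE / SST)                       # R-squared, equation (31)
```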
