
Chapter 2

Ordinary Least Squares

Copyright © 2011 Pearson Addison-Wesley. All rights reserved.

Slides by Niels-Hugo Blunch


Washington and Lee University

Estimating Single-Independent-Variable Models with OLS


Recall that the objective of regression analysis is to start from:
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$    (2.1)
And, through the use of data, to get to:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$    (2.2)
Recall that equation (2.1) is purely theoretical, while equation (2.2) is its empirical counterpart
How do we move from (2.1) to (2.2)?

Estimating Single-Independent-Variable Models with OLS (cont.)


One of the most widely used methods is Ordinary Least Squares (OLS)
OLS minimizes:
$\sum_{i=1}^{N} e_i^2$  (i = 1, 2, ..., N)    (2.3)
That is, OLS minimizes the sum of the squared residuals, where each residual $e_i = Y_i - \hat{Y}_i$ (the estimated error term) is the vertical distance between an observed point and the estimated regression line
We also denote this term the Residual Sum of Squares (RSS)
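As a rough illustration of the minimization criterion, the sketch below computes the RSS for a few candidate lines through a small made-up data set. The data, the function name `rss`, and the candidate (intercept, slope) pairs are all assumptions chosen for illustration; none of them come from the chapter. The last candidate happens to be the least-squares line for these points, so it has the smallest RSS of the three.

```python
# Hypothetical data for illustration only (not taken from the chapter's tables).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def rss(x, y, b0, b1):
    """Residual sum of squares for the candidate line Y-hat = b0 + b1 * X."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(e ** 2 for e in residuals)

# Three candidate lines, written as (intercept, slope) pairs.
candidates = [(1.0, 1.0), (2.0, 0.5), (2.2, 0.6)]
for b0, b1 in candidates:
    print(f"b0={b0}, b1={b1}: RSS={rss(X, Y, b0, b1):.2f}")

# The candidate with the smallest RSS fits these points most closely.
best = min(candidates, key=lambda c: rss(X, Y, *c))
print("smallest RSS among the candidates:", best)
```

OLS does not search over a list like this; it solves for the pair that minimizes RSS over all possible lines.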


Estimating Single-Independent-Variable Models with OLS (cont.)


Equivalently, OLS minimizes: $\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$
Why use OLS?
Relatively easy to use
The goal of minimizing RSS is intuitively and theoretically appealing
This basically says we want the estimated regression equation to be as close as possible to the observed data
OLS estimates have a number of useful characteristics



Estimating Single-Independent-Variable Models with OLS (cont.)


OLS estimates have at least two useful characteristics:
The sum of the residuals is exactly zero
OLS can be shown to be the "best" estimator when certain specific conditions hold (we'll get back to this in Chapter 4)
Ordinary Least Squares (OLS) is an estimator
A given $\hat{\beta}$ produced by OLS is an estimate
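The zero-sum property and the estimator/estimate distinction can both be seen in a small sketch. It assumes a one-variable model that includes an intercept and uses made-up data; the helper name `fit_line` is hypothetical. The function plays the role of the estimator (a rule), and the numbers it returns for a particular sample are estimates.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Hypothetical sample (not from the chapter).
X = [3.0, 7.0, 8.0, 12.0, 15.0]
Y = [10.0, 14.0, 13.0, 20.0, 24.0]

b0, b1 = fit_line(X, Y)  # estimates produced by the estimator for this sample
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]
residual_sum = sum(residuals)
print(residual_sum)  # zero up to floating-point rounding
```

The individual residuals are not zero here; only their sum is.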


Estimating Single-Independent-Variable Models with OLS (cont.)


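How does OLS work?
First recall from (2.3) that OLS minimizes the sum of the squared residuals
Next, it can be shown (see Exercise 12) that the coefficients that achieve this minimum for the case of just one independent variable are:
$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$    (2.4)
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$    (2.5)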
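The standard single-variable formulas can be sketched in a few lines: the slope is the sum of cross-deviations of X and Y from their means divided by the sum of squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. The data and the function name `ols_simple` below are assumptions for illustration, not values from the chapter's tables.

```python
# Hypothetical data (not from the chapter).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for a one-variable regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    # The intercept makes the line pass through the point of means.
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

beta0_hat, beta1_hat = ols_simple(X, Y)
print(beta0_hat, beta1_hat)  # approximately 2.2 and 0.6
```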

Estimating Multivariate Regression Models with OLS
In the real world, one explanatory variable is often not enough
The general multivariate regression model with K independent variables is:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki} + \epsilon_i$  (i = 1, 2, ..., N)    (1.13)
The biggest difference from the single-explanatory-variable regression model is in the interpretation of the slope coefficients
Now a slope coefficient indicates the change in the dependent variable associated with a one-unit increase in the explanatory variable in question, holding the other explanatory variables in the equation constant
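To make the "holding the other explanatory variables constant" interpretation concrete, here is a minimal sketch with two regressors. It solves the normal equations (X'X)b = X'y with a small hand-written elimination routine, which is one common way to compute OLS and not necessarily how any particular package does it. The data are generated from a known equation with no error term, so the fit should recover the coefficients 1, 2 and -3 up to rounding. All names (`solve`, `ols`, `predict`) and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """OLS coefficients via the normal equations; an intercept column is added."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]  # each entry is one column of X
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in design]
    return solve(xtx, xty)

# Hypothetical data generated exactly by Y = 1 + 2*X1 - 3*X2 (no error term).
X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
Y = [1 + 2 * a - 3 * b for a, b in zip(X1, X2)]

b0, b1, b2 = ols([X1, X2], Y)

def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Raise X1 by one unit while holding X2 fixed at 4: the change in Y-hat is b1.
effect_of_x1 = predict(4.0, 4.0) - predict(3.0, 4.0)
print(b0, b1, b2, effect_of_x1)
```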

Estimating Multivariate Regression Models with OLS (cont.)
Omitted (and relevant!) variables are therefore not held constant
The intercept term, $\beta_0$, is the value of Y when all the Xs and the error term equal zero
Nevertheless, the underlying principle of minimizing the summed squared residuals remains the same


Example: financial aid awards at a liberal arts college
Dependent variable:
FINAID_i: financial aid (measured in dollars of grant) awarded to the i-th applicant


Example: financial aid awards at a liberal arts college
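Theoretical Model:
$\text{FINAID}_i = f(\text{PARENT}_i, \text{HSRANK}_i)$    (2.9)
$\text{FINAID}_i = \beta_0 + \beta_1 \text{PARENT}_i + \beta_2 \text{HSRANK}_i + \epsilon_i$    (2.10)

where:
PARENT_i: The amount (in dollars) that the parents of the i-th student are judged able to contribute to college expenses
HSRANK_i: The i-th student's GPA rank in high school, measured as a percentage (i.e. between 0 and 100)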

Example: financial aid awards at a liberal arts college (cont.)
Estimate model using the data in Table 2.2 to get:
(2.11)
Interpretation of the slope coefficients?
Graphical interpretation in Figures 2.1 and 2.2


Figure 2.1 Financial Aid as a Function of Parents' Ability to Pay


Figure 2.2 Financial Aid as a Function of High School Rank


Total, Explained, and Residual Sums of Squares

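$\text{TSS} = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$    (2.12)
$\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{N} e_i^2$    (2.13)
TSS = ESS + RSS
This is usually called the decomposition of variance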
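The decomposition can be checked numerically. The sketch below assumes a one-variable OLS fit with an intercept and uses made-up data; TSS is the sum of squared deviations of Y from its mean, ESS the sum of squared deviations of the fitted values from that mean, and RSS the sum of squared residuals.

```python
# Hypothetical data (not from the chapter).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# One-variable OLS estimates.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
      / sum((xi - x_bar) ** 2 for xi in X))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in X]

tss = sum((yi - y_bar) ** 2 for yi in Y)                # total
ess = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
rss = sum((yi - yh) ** 2 for yi, yh in zip(Y, y_hat))   # residual

print(tss, ess, rss)  # approximately 6.0, 3.6 and 2.4
print(abs(tss - (ess + rss)) < 1e-9)
```

The identity relies on the coefficients being the OLS estimates from an equation with an intercept; for an arbitrary line the three sums generally do not add up this way.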


Figure 2.3 Decomposition of the Variance in Y


Evaluating the Quality of a Regression Equation
Checkpoints here include the following:
1. Is the equation supported by sound theory?
2. How well does the estimated regression fit the data?
3. Is the data set reasonably large and accurate?
4. Is OLS the best estimator to be used for this equation?
5. How well do the estimated coefficients correspond to the expectations developed by the researcher before the data were collected?
6. Are all the obviously important variables included in the equation?
7. Has the most theoretically logical functional form been used?
8. Does the regression appear to be free of major econometric problems?
*These numbers roughly correspond to the relevant chapters in the book


Describing the Overall Fit of the Estimated Model
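The simplest commonly used measure of overall fit is the coefficient of determination, $R^2$:
$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$    (2.14)
Since OLS selects the coefficient estimates that minimize RSS, OLS provides the largest possible $R^2$ (within the class of linear models)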
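A small sketch of the coefficient of determination, computed as one minus RSS over TSS for a one-variable OLS fit with an intercept. The three made-up data sets are chosen to give a fit of zero, a partial fit, and a perfect fit; the helper name `r_squared` and all numbers are assumptions for illustration.

```python
def r_squared(x, y):
    """R^2 = 1 - RSS/TSS for a one-variable OLS fit with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

X = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical regressor values

no_fit = r_squared(X, [3.0, 1.0, 2.0, 1.0, 3.0])        # X explains nothing
partial_fit = r_squared(X, [2.0, 4.0, 5.0, 4.0, 5.0])   # X explains part of Y
perfect_fit = r_squared(X, [3.0, 5.0, 7.0, 9.0, 11.0])  # Y = 1 + 2X exactly

print(no_fit, partial_fit, perfect_fit)  # approximately 0.0, 0.6 and 1.0
```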


Figure 2.4 Illustration of Case Where $R^2$ = 0


Figure 2.5 Illustration of Case Where $R^2$ = .95


Figure 2.6 Illustration of Case Where $R^2$ = 1


The Simple Correlation Coefficient, r
This is a measure related to $R^2$
r measures the strength and direction of the linear relationship between two variables:
r = +1: the two variables are perfectly positively correlated
r = −1: the two variables are perfectly negatively correlated
r = 0: the two variables are totally uncorrelated
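The slide does not spell out a formula for r, so the sketch below assumes the usual sample (Pearson) correlation: the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. With made-up data it shows a perfectly negative case and illustrates the link to fit: in a regression with a single independent variable and an intercept, the coefficient of determination equals r squared.

```python
import math

def correlation(x, y):
    """Sample (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the chapter).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(X, Y)
r_negative = correlation(X, [-xi for xi in X])  # Y falls one-for-one as X rises

print(r, r ** 2)    # r is about 0.775; r squared is about 0.6
print(r_negative)   # approximately -1
```

For these points a one-variable OLS fit gives a coefficient of determination of about 0.6, matching r squared.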

The adjusted coefficient of determination
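A major problem with $R^2$ is that it can never decrease if another independent variable is added
An alternative to $R^2$ that addresses this issue is the adjusted $R^2$, or $\bar{R}^2$:
$\bar{R}^2 = 1 - \frac{\sum e_i^2 / (N - K - 1)}{\sum (Y_i - \bar{Y})^2 / (N - 1)}$    (2.15)
Where N − K − 1 = degrees of freedom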
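A brief sketch of why the adjustment matters, assuming the standard form of the adjusted measure: one minus RSS divided by its degrees of freedom (N - K - 1), over TSS divided by N - 1. The sums of squares below are made-up numbers for two nested regressions, not results from the chapter; the second regression adds a variable that lowers RSS only slightly.

```python
def r2(rss, tss):
    """Unadjusted coefficient of determination."""
    return 1 - rss / tss

def adjusted_r2(rss, tss, n, k):
    """Adjusted R^2: 1 - [RSS / (N - K - 1)] / [TSS / (N - 1)]."""
    return 1 - (rss / (n - k - 1)) / (tss / (n - 1))

# Hypothetical sums of squares for N = 5 observations.
N = 5
TSS = 6.0
before = {"k": 1, "rss": 2.4}  # one independent variable
after = {"k": 2, "rss": 2.3}   # a second variable trims RSS only slightly

r2_before, r2_after = r2(before["rss"], TSS), r2(after["rss"], TSS)
adj_before = adjusted_r2(before["rss"], TSS, N, before["k"])
adj_after = adjusted_r2(after["rss"], TSS, N, after["k"])

print(r2_before, r2_after)    # unadjusted measure rises: about 0.600 to 0.617
print(adj_before, adj_after)  # adjusted measure falls: about 0.467 to 0.233
```

The unadjusted measure rises because RSS fell, while the adjusted measure falls because the small gain in fit does not offset the lost degree of freedom.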


The adjusted coefficient of determination (cont.)
So, $\bar{R}^2$ measures the share of the variation of Y around its mean that is explained by the regression equation, adjusted for degrees of freedom
$\bar{R}^2$ can be used to compare the fits of regressions with the same dependent variable and different numbers of independent variables
As a result, most researchers automatically use $\bar{R}^2$ instead of $R^2$ when evaluating the fit of their estimated regression equations


Table 2.1a
The Calculation of Estimated Regression
Coefficients for the Weight/Height Example


Table 2.1b
The Calculation of Estimated Regression
Coefficients for the Weight/Height Example


Table 2.2a
Data for the Financial Aid Example


Table 2.2b
Data for the Financial Aid Example


Table 2.2c
Data for the Financial Aid Example


Table 2.2d
Data for the Financial Aid Example


Key Terms from Chapter 2


Ordinary Least Squares (OLS)
Interpretation of a multivariate regression coefficient
Total sum of squares
Explained sum of squares
Residual sum of squares
Coefficient of determination, $R^2$
Simple correlation coefficient, r
Degrees of freedom
Adjusted coefficient of determination, $\bar{R}^2$
