
Least Squares Regression

Alan T. Arnholt
Department of Mathematical Sciences, Appalachian State University
arnholt@math.appstate.edu

Spring 2006 R Notes
Copyright © 2006 Alan T. Arnholt

Outline

Least Squares Regression
  Overview of Regression
The R Script


Least Squares Regression

When a linear pattern is evident from a scatter plot, the relationship between the two variables is often modeled with a straight line.

When modeling a bivariate relationship, $Y$ is called the response or dependent variable, and $x$ is called the predictor or independent variable.

The simple linear regression model is written

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$    (1)
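Not part of the original slides: a short simulation sketch of model (1), using assumed values $\beta_0 = 1$, $\beta_1 = 0.5$, and normal errors, just to show the kind of data the model describes.

> set.seed(13)                              # assumed values, for illustration only
> x <- seq(2, 4, length.out = 20)           # predictor values
> Y <- 1 + 0.5*x + rnorm(20, sd = 0.2)      # beta0 = 1, beta1 = 0.5, random errors
> plot(x, Y)                                # a roughly linear pattern with scatter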


OLS

The goal is to estimate the coefficients $\beta_0$ and $\beta_1$ in (1). The most well known method of estimating the coefficients $\beta_0$ and $\beta_1$ is ordinary least squares (OLS). OLS provides estimates of $\beta_0$ and $\beta_1$ by minimizing the sum of the squared deviations of the $Y_i$'s for all possible lines. Specifically, the sum of the squared residuals ($e_i = Y_i - \hat{Y}_i$) is minimized when the OLS estimators of $\beta_0$ and $\beta_1$ are

$b_0 = \bar{y} - b_1 \bar{x}$    (2)

$b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$    (3)

respectively. Note that the estimated regression function is written as $\hat{Y}_i = b_0 + b_1 x_i$.
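Not in the original slides: since the $(n-1)$ factors cancel, Equation (3) is also the sample covariance of $x$ and $y$ divided by the sample variance of $x$. Using the simulated x and Y from the sketch above, both expressions give the same slope:

> cov(x, Y) / var(x)                                        # Equation (3), rewritten
> sum((x - mean(x))*(Y - mean(Y))) / sum((x - mean(x))^2)   # Equation (3), as stated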


Figure: Graph depicting residuals. The vertical distances, shown with a dotted line between the $Y_i$'s (depicted with solid circles) and the $\hat{Y}_i$'s (depicted with clear squares), are the residuals.


Example

Use the data frame Gpa from the BSDA package to:
1. Create a scatterplot of CollGPA versus HSGPA.
2. Find the least squares estimates of $\beta_0$ and $\beta_1$ using Equations (2) and (3), respectively.
3. Find the least squares estimates of $\beta_0$ and $\beta_1$ using the R function lm().
4. Add the least squares line to the scatterplot created in 1 using the R function abline().


R Code

Code for part 1.

> library(BSDA)
> attach(Gpa)
> Y <- CollGPA
> x <- HSGPA
> plot(x, Y, col = "blue",
+      main = "Scatterplot of College Versus High School GPA",
+      xlab = "High School GPA", ylab = "College GPA")


Scatterplot of GPA

[Figure omitted: Scatterplot of College Versus High School GPA, with College GPA on the vertical axis and High School GPA on the horizontal axis.]

Figure: Scatterplot requested in part 1


Using Equations (2) and (3) to Find b0 and b1

Using Equations (2) and (3) to answer part 2.

> b1 <- sum( (x - mean(x)) * (Y - mean(Y)) ) /
+       sum( (x - mean(x))^2 )
> b0 <- mean(Y) - b1 * mean(x)
> c(b0, b1)
[1] -0.950366  1.346999
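The code for part 3 does not appear in these notes, but the calls to abline(model), predict(model), fitted(model), resid(model), and anova(model) on the following pages all require a fitted lm object. A minimal sketch, assuming the fit is stored under the name model:

> model <- lm(Y ~ x)    # least squares fit of College GPA on High School GPA
> coef(model)           # should agree with b0 and b1 computed above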


Using abline()

Using the R function abline() to add the least squares regression line to Figure 2 on page 13.

abline() adds one or more straight lines to the current plot. The intercept and slope can be supplied through the arguments a=b0 and b=b1, or a fitted lm object can be passed directly:

> abline(model, col = "blue", lwd = 2)

Note: the object model contains b0 and b1.
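Equivalently (not shown in the original slides), the same line can be drawn from the coefficients computed in part 2:

> abline(a = b0, b = b1, col = "blue", lwd = 2)   # identical line, using b0 and b1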


Scatterplot of GPA with Superimposed Least Squares Regression Line

[Figure omitted: Scatterplot of College Versus High School GPA with the least squares regression line superimposed; College GPA on the vertical axis, High School GPA on the horizontal axis.]

Figure: Scatterplot requested in part 4


Residuals and Predicted (Fitted) Values

The $i$th residual is defined to be $e_i = Y_i - \hat{Y}_i$.

The resulting value $\hat{Y}_i$ from Equation (1) given an $x_i$ is referred to as the predicted value, as well as the fitted value.

The R functions predict() and fitted() can be used on lm objects.


Using fitted() and predict()


> yhat <- b0 + b1*x
> yhatRp <- predict(model)
> yhatRf <- fitted(model)
> e <- Y - yhat
> eR <- resid(model)
> COMPARE <- rbind(yhat, yhatRp, yhatRf, e, eR)
> COMPARE[ , 1:4]   # all rows, columns 1:4
               1          2         3         4
yhat     2.68653  3.2253294 1.8783309 3.3600293
yhatRp   2.68653  3.2253294 1.8783309 3.3600293
yhatRf   2.68653  3.2253294 1.8783309 3.3600293
e       -0.48653 -0.4253294 0.5216691 0.4399707
eR      -0.48653 -0.4253294 0.5216691 0.4399707
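A possible follow-up, assuming model was fit as lm(Y ~ x): predict() can also return estimated College GPA values at High School GPA values that are not in the data set.

> predict(model, newdata = data.frame(x = c(2.5, 3.0)))   # Yhat at new x values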


Sum of Squares Due to Error

The sum of squares due to error (also called the residual sum of squares) is defined as

$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$    (4)

Use the definition in (4) and the R function anova() to compute the SSE for the regression of Y on x (Gpa).
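Two further ways to obtain the same quantity, assuming the lm fit is stored in model as above (not part of the original slides):

> sum(resid(model)^2)   # Equation (4) applied to the lm residuals
> deviance(model)       # for lm objects, this is the residual sum of squares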


R Code
> SSE <- sum(e^2)
> SSE
[1] 1.502284
> anova(model)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value   Pr(>F)
x          1 3.7177  3.7177  19.798 0.002141 **
Residuals  8 1.5023  0.1878
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(model)[2,2]
[1] 1.502284
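A slightly more readable equivalent of anova(model)[2,2], indexing the ANOVA table by its row and column names (a sketch, not from the original slides):

> anova(model)["Residuals", "Sum Sq"]
[1] 1.502284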


Pretty ANOVA Table

            Df   Sum Sq   Mean Sq   F value   Pr(>F)
x            1    3.718     3.718    19.798    0.002
Residuals    8    1.502     0.188
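A quick check of the table's internal arithmetic (not part of the original slides): each Mean Sq is Sum Sq divided by Df (3.718/1 = 3.718 and 1.502/8 ≈ 0.188), and the F value is the ratio of the two mean squares, 3.718/0.188 ≈ 19.8.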


Link to the R Script

Go to my web page: Script for Regression.

Homework: problems 2.35-2.40 and 2.42-2.46.

See me if you need help!
