The R Script
Outline
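The relationship between the two variables is often modeled with a straight line.
When modeling a bivariate relationship, Y is called the response (dependent) variable and x is called the predictor (independent) variable. The simple linear regression model is
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (1)$$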
OLS
The goal is to estimate the coefficients $\beta_0$ and $\beta_1$ in (1). The most well-known method of estimating the coefficients $\beta_0$ and $\beta_1$ is to use ordinary least squares (OLS). OLS provides estimates of $\beta_0$ and $\beta_1$ by minimizing the sum of the squared deviations of the $Y_i$'s for all possible lines. Specifically, the sum of the squared residuals ($\hat{\varepsilon}_i = e_i = Y_i - \hat{Y}_i$) is minimized when the OLS estimators of $\beta_0$ and $\beta_1$ are
$$b_0 = \bar{y} - b_1 \bar{x} \qquad (2)$$
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (3)$$
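As a brief sketch of where (2) and (3) come from (a standard calculus argument, not taken from the slides): write the sum of squared residuals as a function $S(b_0, b_1)$ of the candidate intercept and slope, set both partial derivatives to zero, and solve. The symbol $S$ is introduced here only for illustration.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0 \\
\text{substituting } b_0:\quad
  & \sum_{i=1}^{n} x_i \bigl[(y_i - \bar{y}) - b_1 (x_i - \bar{x})\bigr] = 0 \\
\Longrightarrow\; b_1 &= \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{align*}
```

The last equality holds because $\sum_{i=1}^{n} \bar{x}\,(y_i - \bar{y}) = 0$ and $\sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x}) = 0$, so subtracting these terms from the numerator and denominator changes nothing.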
Figure: Graph depicting residuals. The vertical distances, shown with dotted lines, between the $Y_i$'s (depicted with solid circles) and the $\hat{Y}_i$'s (depicted with clear squares) are the residuals; for example, $\hat{\varepsilon}_2 = Y_2 - \hat{Y}_2$.
Example
Use the data frame Gpa from the BSDA package to:
1. Create a scatterplot of CollGPA versus HSGPA.
2. Find the least squares estimates of $\beta_0$ and $\beta_1$ using Equations (2) and (3), respectively.
3. Find the least squares estimates of $\beta_0$ and $\beta_1$ using the R function lm().
4. Add the least squares line to the scatterplot created in 1 using the R function abline().
R Code
Code for part 1.
> library(BSDA)
> attach(Gpa)
> Y <- CollGPA
> x <- HSGPA
> plot(x, Y, col="blue",
+      main="Scatterplot of College Versus High School GPA",
+      xlab="High School GPA", ylab="College GPA")
Scatterplot of GPA
Figure 2: Scatterplot of College Versus High School GPA (College GPA plotted against High School GPA).
Using Equations (2) and (3) to answer part 2.
> b1 <- sum( (x-mean(x))*(Y-mean(Y)) ) /
+       sum( (x-mean(x))^2 )
> b0 <- mean(Y)-b1*mean(x)
> c(b0,b1)
[1] -0.950366  1.346999
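For part 3, a minimal sketch of the lm() approach. It uses small made-up vectors rather than the Gpa data so that it runs without BSDA; the numbers are hypothetical and are not meant to reproduce the estimates above.

```r
# Hypothetical illustrative data (NOT the Gpa data from BSDA)
x <- c(2.0, 2.3, 2.6, 2.9, 3.1, 3.4)
Y <- c(1.6, 2.1, 2.4, 3.0, 3.1, 3.7)

# Closed-form OLS estimates from Equations (2) and (3)
b1 <- sum((x - mean(x)) * (Y - mean(Y))) / sum((x - mean(x))^2)
b0 <- mean(Y) - b1 * mean(x)

# The same estimates from lm(); coef() returns c(intercept, slope)
model <- lm(Y ~ x)
coef(model)

# The two approaches should agree up to floating-point error
all.equal(unname(coef(model)), c(b0, b1))
```

With the objects defined in part 1, the analogous call would be model <- lm(Y ~ x), or lm(CollGPA ~ HSGPA, data = Gpa) without attach().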
Using abline()
Use the R function abline() to add the least squares regression line to the scatterplot in Figure 2.
abline() adds one or more straight lines to the current plot. Here the arguments to abline() are the intercept a=b0 and the slope b=b1.
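A minimal sketch of the abline() call for part 4, again on made-up vectors (hypothetical data, not Gpa). The pdf(NULL) and dev.off() lines are only there so the sketch also runs in a non-interactive session; in an interactive session they can be dropped.

```r
# Hypothetical illustrative data (NOT the Gpa data from BSDA)
x <- c(2.0, 2.3, 2.6, 2.9, 3.1, 3.4)
Y <- c(1.6, 2.1, 2.4, 3.0, 3.1, 3.7)

model <- lm(Y ~ x)
b0 <- unname(coef(model)[1])   # intercept
b1 <- unname(coef(model)[2])   # slope

pdf(NULL)                      # null graphics device: nothing is written to disk
plot(x, Y, col = "blue", xlab = "x", ylab = "Y")
abline(a = b0, b = b1)                 # line given by intercept and slope
abline(model, lty = 2, col = "red")    # same line, taken from the lm object
abline(h = mean(Y), lty = 3)           # a horizontal line at the mean of Y
dev.off()
```

Passing the lm object directly avoids copying the coefficients by hand; both calls draw the same line.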
Figure: College GPA versus High School GPA with the least squares regression line added.
The ith residual is defined to be $e_i = Y_i - \hat{Y}_i$. The value $\hat{Y}_i = b_0 + b_1 x_i$, obtained from Equation (1) with the OLS estimates for a given $x_i$, is referred to as the predicted value, as well as the fitted value.
The R functions predict() and fitted() can be used on lm objects.
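A short sketch of fitted() and predict() on an lm object, with resid() added to extract the residuals. The data are made-up vectors (hypothetical, not Gpa), and the names Yhat, e, and new_x are chosen here for illustration.

```r
# Hypothetical illustrative data (NOT the Gpa data from BSDA)
x <- c(2.0, 2.3, 2.6, 2.9, 3.1, 3.4)
Y <- c(1.6, 2.1, 2.4, 3.0, 3.1, 3.7)
model <- lm(Y ~ x)

Yhat <- fitted(model)   # fitted values at the observed x's
e <- resid(model)       # residuals e_i = Y_i - Yhat_i

# Residuals match the definition
all.equal(unname(e), unname(Y - Yhat))

# With no newdata, predict() returns the fitted values
all.equal(predict(model), fitted(model))

# Predictions at new x values: the column name must match the predictor name
new_x <- data.frame(x = c(2.5, 3.0))
predict(model, newdata = new_x)
```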
The sum of squares due to error (also called the residual sum of squares) is defined as
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (4)$$
Use the definition in (4) and the R function anova() to compute the SSE for the regression of Y on x (Gpa).
R Code
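> model <- lm(Y ~ x)    # assumed definition: the fit from part 3
> e <- resid(model)     # assumed definition: residuals e_i = Y_i - Yhat_i
> SSE <- sum(e^2)
> SSE
[1] 1.502284
> anova(model)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value   Pr(>F)
x          1 3.7177  3.7177  19.798 0.002141 **
Residuals  8 1.5023  0.1878
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(model)[2,2]
[1] 1.502284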
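A self-contained variant of the same check on made-up vectors (hypothetical data, not Gpa). It also indexes the ANOVA table by name rather than by position, and uses deviance(), which for an lm fit returns the residual sum of squares.

```r
# Hypothetical illustrative data (NOT the Gpa data from BSDA)
x <- c(2.0, 2.3, 2.6, 2.9, 3.1, 3.4)
Y <- c(1.6, 2.1, 2.4, 3.0, 3.1, 3.7)
model <- lm(Y ~ x)

# SSE from the definition in (4)
SSE <- sum(resid(model)^2)

# SSE from the ANOVA table, indexed by row and column name
SSE_anova <- anova(model)["Residuals", "Sum Sq"]

# SSE from deviance()
SSE_dev <- deviance(model)

c(SSE, SSE_anova, SSE_dev)   # all three should agree
```

Indexing by name is a little more robust than [2,2] if the model later gains more terms and the Residuals row moves.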
            Df   F value   Pr(>F)
x            1    19.798    0.002
Residuals    8
Go to my web page: Script for Regression
Homework: problems 2.35 - 2.40, 2.42 - 2.46
See me if you need help!