Josefina V. Almeda
College Secretary/Asst. Prof. 7
School of Statistics
UP Diliman
2016
Objectives
After this, you should be able to:
• Calculate and interpret the simple correlation
between two variables
• Determine whether the correlation is significant
• Calculate and interpret the simple linear
regression equation for a set of data
• Understand the assumptions behind
regression analysis
• Determine whether a regression model is
significant
5/4/2018
Objectives
(continued)
After this, you should be able to:
• Calculate and interpret confidence
intervals for the regression coefficients
• Recognize regression analysis
applications for purposes of prediction
and description
• Recognize some potential problems if
regression analysis is used incorrectly
• Recognize nonlinear relationships
between two variables
Correlation Coefficient
(continued)
Features of ρ and r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the
negative linear relationship
• The closer to 1, the stronger the positive
linear relationship
• The closer to 0, the weaker the linear
relationship
Examples of Approximate
r Values
[Figure: five scatter plots of y against x, with r = -1, r = -.6, r = 0, r = +.3 and r = +1]
Calculating the
Correlation Coefficient
Sample correlation coefficient:
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} \]
Calculation Example
Tree Height   Trunk Diameter
y             x                xy      y2       x2
35            8                280     1225     64
49            9                441     2401     81
27            7                189     729      49
33            6                198     1089     36
60            13               780     3600     169
21            7                147     441      49
45            11               495     2025     121
51            12               612     2601     144
Σ = 321       Σ = 73           Σ = 3142   Σ = 14111   Σ = 713
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} \]
\[ = \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886 \]
r = 0.886 → relatively strong positive
linear association between x and y
[Figure: scatter plot of Tree Height, y, against Trunk Diameter, x]
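The tree example can be checked with a short script. This is a minimal sketch using only the Python standard library. The function name sample_correlation is an illustrative choice, not something from the slides, and the data are the eight height and diameter pairs from the calculation table.

```python
import math

# Tree height (y) and trunk diameter (x) from the calculation example.
heights = [35, 49, 27, 33, 60, 21, 45, 51]
diameters = [8, 9, 7, 6, 13, 7, 11, 12]


def sample_correlation(x, y):
    """Sample correlation coefficient r, computed from the column sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = sample_correlation(diameters, heights)
print(round(r, 3))  # 0.886
```

The intermediate sums match the totals row of the table (Σx = 73, Σy = 321, Σxy = 3142).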
Excel Output
Excel Correlation Output
Tools / data analysis / correlation…
Correlation between
Tree Height and Trunk Diameter
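• Test statistic
\[ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} \]
(with n - 2 degrees of freedom)
For the tree example, with r = .886 and n = 8:
\[ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\frac{1 - .886^2}{8 - 2}}} = 4.68 \]
Decision: Reject H0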
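The test statistic for the tree example can be reproduced as follows. This is a sketch under stated assumptions: the slides as extracted do not show the hypotheses or the significance level, so the code assumes H0: ρ = 0 against a two-tailed alternative at α = 0.05. The critical value 2.4469 is the standard t-table entry for 6 degrees of freedom, and the function name is illustrative.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a correlation, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.886, 8
t_stat = correlation_t_statistic(r, n)

# Assumption (not stated in the slides): two-tailed test at alpha = 0.05.
# 2.4469 is the t-table value for 0.025 in each tail with 8 - 2 = 6 d.f.
T_CRITICAL = 2.4469
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(t_stat, 2), reject_h0)  # 4.68 True
```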
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
[Figure: the population regression line y = β0 + β1x + ε plotted against x, marking the observed value of y for xi, the predicted value of y for xi, the random error εi for this x value, the slope β1 and the intercept β0]
\[ \hat{y}_i = b_0 + b_1 x \]
where x is the independent variable
\[ \sum e^2 = \sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2 \]
algebraic equivalent:
\[ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \]
and
\[ b_0 = \bar{y} - b_1 \bar{x} \]
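The formulas for b1 and b0 translate directly into code. The sketch below is an illustration only: it reuses the tree height and trunk diameter data from the correlation example because the raw house price data are not listed in these slides. The function name least_squares is an assumption of this example.

```python
def least_squares(x, y):
    """Return (b0, b1) using the sums form of the least squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y-bar - b1 * x-bar
    return b0, b1


# Illustration data: trunk diameter (x) and tree height (y).
diameters = [8, 9, 7, 6, 13, 7, 11, 12]
heights = [35, 49, 27, 33, 60, 21, 45, 51]

b0, b1 = least_squares(diameters, heights)
print(round(b0, 3), round(b1, 3))
```

For these data the slope works out to 212.875 / 46.875, about 4.54, so each additional unit of trunk diameter is associated with roughly 4.5 more units of estimated height.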
Interpretation of the
Slope and the Intercept
• b0 is the estimated average value of
y when the value of x is zero
• b1 is the estimated change in the average
value of y for a one-unit change in x
Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
Graphical Presentation
• House price model: scatter plot and
regression line
Interpretation of the
Intercept, b0
Interpretation of the
Slope Coefficient, b1
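house price = 98.24833 + 0.10977 (square feet)
• b1 = 0.10977: the estimated average house price rises by 0.10977 ($1,000s), about $109.77, for each additional square foot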
Coefficient of Determination, R2
• The coefficient of determination is the
portion of the total variation in the
dependent variable that is explained by
variation in the independent variable
\[ R^2 = \frac{SSR}{SST} \]
Coefficient of Determination, R2
(continued)
Coefficient of determination
\[ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}} \]
\[ R^2 = r^2 \]
where:
R2 = Coefficient of determination
r = Simple correlation coefficient
Examples of Approximate
R2 Values
[Figure: scatter plots of y against x for three cases]
• R2 = 1
• 0 < R2 < 1
• R2 = 0: no linear relationship between x and y
Excel Output
\[ R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]
58.08% of the variation in house prices is explained by
variation in square feet
(SSR and SST are the Regression and Total sums of squares in the ANOVA table of the Excel output; R Square = 0.58082 and Multiple R = 0.76211 appear under Regression Statistics.)
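The R Square figure can be recomputed from the ANOVA sums of squares. This is a small sketch using only values printed in the Excel output; the variable names are illustrative.

```python
# Values taken from the ANOVA table and Regression Statistics of the Excel output.
ssr = 18934.9348   # Regression sum of squares
sst = 32600.5000   # Total sum of squares
multiple_r = 0.76211

r_squared = ssr / sst
sse = sst - ssr    # should equal the Residual sum of squares, 13665.5652

print(round(r_squared, 5))        # 0.58082
print(round(sse, 4))              # 13665.5652
print(round(multiple_r ** 2, 4))  # 0.5808, consistent with R2 = r2
```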
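\[ s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} \]
where:
sb1 = Estimate of the standard error of the least squares slope
\[ s_\varepsilon = \sqrt{\frac{SSE}{n - 2}} \]
sε = Sample standard error of the estimate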
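As a quick check, the standard error of the estimate can be recovered from the Residual row of the ANOVA table (SSE = 13665.5652, n = 10). This is a minimal sketch; the variable names are illustrative.

```python
import math

sse = 13665.5652  # Residual sum of squares from the ANOVA table
n = 10            # Observations

# Standard error of the estimate: square root of SSE / (n - 2).
s_eps = math.sqrt(sse / (n - 2))

print(round(s_eps, 4))  # 41.3303, the "Standard Error" in Regression Statistics
```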
Excel Output
sε = 41.33032 (Standard Error under Regression Statistics)
sb1 = 0.03297 (Standard Error of the Square Feet coefficient)
\[ t = \frac{b_1 - \beta_1}{s_{b_1}} \]
d.f. = n - 2
where:
b1 = Estimated slope
β1 = Hypothesized slope
sb1 = Estimator of the standard error of the slope
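Applied to the house price output, the slope test and its 95% interval can be sketched as below. The hypothesized slope of 0 is an assumption of this example (the usual "no linear relationship" null), and 2.3060 is the standard two-tailed 95% t-table value for 8 degrees of freedom. b1 and sb1 are the rounded values from the Excel output, so the results agree with Excel only up to rounding.

```python
b1 = 0.10977    # Square Feet coefficient from the Excel output
s_b1 = 0.03297  # its standard error
n = 10

# Assumption: H0 is beta1 = 0 (no linear relationship).
beta1_hypothesized = 0.0
t_stat = (b1 - beta1_hypothesized) / s_b1

# t-table value for a two-tailed 95% interval with n - 2 = 8 d.f.
T_CRITICAL = 2.3060
lower = b1 - T_CRITICAL * s_b1
upper = b1 + T_CRITICAL * s_b1

print(round(t_stat, 2))                  # 3.33
print(round(lower, 4), round(upper, 4))  # 0.0337 0.1858
```

The interval matches the Lower 95% and Upper 95% columns for Square Feet, and it excludes zero, which agrees with rejecting a zero slope at the 5% level.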
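Confidence interval for the mean of y, given xp:
\[ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} \]
Prediction interval for an individual y, given xp:
\[ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} \]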
Interval Estimates
for Different Values of x
[Figure: interval bands around the regression line plotted against x, marking x̄ and xp, with the Prediction Interval for an individual y, given xp, and the Confidence Interval for the mean of y, given xp]
house price = 98.25 + 0.1098(2000)
= 317.85
The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850
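The same prediction as a small function, using the rounded coefficients from the slide. The function name predict_price is illustrative.

```python
B0 = 98.25   # intercept, rounded as on the slide
B1 = 0.1098  # slope, rounded as on the slide


def predict_price(square_feet):
    """Predicted house price in $1,000s for a given size in square feet."""
    return B0 + B1 * square_feet


predicted = predict_price(2000)
print(round(predicted, 2))      # 317.85 ($1,000s)
print(round(predicted * 1000))  # 317850 dollars
```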
Estimation of Individual
Values: Example
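Prediction Interval Estimate for y|xp
Find the 95% prediction interval for an individual
house with 2,000 square feet
Predicted Price ŷi = 317.85 ($1,000s)
\[ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} \]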
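The house price data needed for x̄ and Σ(x - x̄)² are not listed in these slides, so the sketch below illustrates the two interval formulas on the tree height and trunk diameter data from the correlation example instead. The function name, the choice of xp = 10 and the use of the tree data are all assumptions of this illustration. 2.4469 is the standard two-tailed 95% t-table value for 6 degrees of freedom.

```python
import math


def interval_half_widths(x, y, x_p, t_critical):
    """Return (y_hat, mean_half, individual_half) for a simple linear fit.

    mean_half is the half-width of the confidence interval for the mean of y
    at x_p; individual_half is the half-width of the prediction interval for
    an individual y at x_p.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_p
    core = 1 / n + (x_p - x_bar) ** 2 / s_xx
    mean_half = t_critical * s_eps * math.sqrt(core)
    individual_half = t_critical * s_eps * math.sqrt(1 + core)
    return y_hat, mean_half, individual_half


# Illustration data (not the house price data): trunk diameter and tree height.
diameters = [8, 9, 7, 6, 13, 7, 11, 12]
heights = [35, 49, 27, 33, 60, 21, 45, 51]

y_hat, mean_half, individual_half = interval_half_widths(
    diameters, heights, x_p=10, t_critical=2.4469
)
print(round(y_hat, 2), round(mean_half, 2), round(individual_half, 2))
```

The prediction interval is always wider than the confidence interval for the mean, because of the extra 1 under the square root.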
Residual Analysis
• Purposes
– Examine for linearity assumption
– Examine for constant variance for all
levels of x
– Evaluate normal distribution
assumption
• Graphical Analysis of Residuals
– Can plot residuals vs. x
– Can create histogram of residuals to
check for normality
[Figure: scatter plots of y against x with the corresponding plots of residuals against x; the first pair of panels is labelled Not Linear and Linear]
Excel Output
RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
9             254.6674                64.33264
10            284.85348               -29.85348
[Figure: House Price Model Residual Plot]
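Since each residual is the observed value minus the predicted value (e = y - ŷ), the observed prices can be recovered from the rows of the residual output that survive here. This is a minimal sketch; observations 6 to 8 are not shown in the output above and are left out rather than guessed.

```python
# (predicted house price, residual) for the observations shown in the output.
residual_output = {
    1: (251.92316, -6.923162),
    2: (273.87671, 38.12329),
    3: (284.85348, -5.853484),
    4: (304.06284, 3.937162),
    5: (218.99284, -19.99284),
    9: (254.6674, 64.33264),
    10: (284.85348, -29.85348),
}

# residual = observed - predicted, so observed = predicted + residual.
observed = {
    obs: predicted + residual
    for obs, (predicted, residual) in residual_output.items()
}

for obs, price in observed.items():
    print(obs, round(price))  # observed price in $1,000s
```

Each recovered price comes out as a whole number of $1,000s (245, 312, 279, 308, 199, 319, 255), which is a useful sanity check on the reconstructed table.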