Regressand (effect variable): y    Regressor (causal variable): x

y = β0 + β1x + ε

where β0 + β1x is the linear component and ε is the random error component.
Linear Regression Assumptions
[Figure: a fitted line through a scatter of points for y = β0 + β1x + ε.
For a given xi, the observed value of y differs from the predicted value
of y on the line by the random error εi. Slope = β1, Intercept = β0.]
Estimated Regression Model
ŷi = b0 + b1x

where x is the independent variable and ŷi is the estimated value of y.
The estimates b0 and b1 are chosen to minimize the sum of squared errors:

Σe² = Σ(y − ŷ)² = Σ(y − (b0 + b1x))²
The Least Squares Equation
b1 = (Σxy − (Σx)(Σy)/n) / (Σx² − (Σx)²/n)

or, in terms of the means,

b1 = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)

and

b0 = ȳ − b1·x̄
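As a quick check of these formulas, here is a minimal sketch in Python (the data values in the example call are illustrative, not from the text):

```python
def least_squares(x, y):
    """Estimate b0 and b1 by the computational least-squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # b1 = (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # b0 = ȳ − b1·x̄
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Illustrative data lying exactly on y = 2x: expect b0 = 0, b1 = 2.
b0, b1 = least_squares([1, 2, 3, 4], [2, 4, 6, 8])
```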
Interpretation of the Slope and the Intercept

b0 is the estimated average value of y when x = 0, and b1 is the estimated
change in the average value of y for each one-unit increase in x.

Coefficient of Determination R²

R² = RSS / TSS

where RSS = sum of squares explained by regression and
TSS = total sum of squares.
Examples of Approximate R² Values

[Figures:
R² = 1: all points fall exactly on the line (a perfect linear relationship,
whether the slope is positive or negative).
0 < R² < 1: points scatter around the line; a weaker linear relationship.
R² = 0: no linear relationship between x and y. The value of y does not
depend on x; none of the variation in y is explained by variation in x.]
  y      x      ŷ          (ŷ−ȳ)²      (y−ȳ)²
 245   1400   251.8283    1202.127     1722.25
 312   1600   273.7683     162.0962     650.25
 279   1700   284.7383       3.103587    56.25
 308   1875   303.9358     304.0071     462.25
 199   1100   218.9183    4567.286     7656.25
 219   1550   268.2833     331.8482    4556.25
 405   2350   356.0433    4836.271    14042.25
 324   2450   367.0133    6482.391     1406.25
 319   1425   254.5708    1019.474     1056.25
 255   1700   284.7383       3.103587   992.25

Totals: Σy = 2865, Σx = 17150, Σŷ = 2863.838,
RSS = Σ(ŷ−ȳ)² = 18911.71, TSS = Σ(y−ȳ)² = 32600.5
Coefficient of determination:

R² = RSS / TSS = 18911.71 / 32600.5 ≈ 0.580
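The division can be reproduced directly from the column totals of the table:

```python
# RSS and TSS are the column totals Σ(ŷ−ȳ)² and Σ(y−ȳ)² from the table above.
rss = 18911.71
tss = 32600.5
r_squared = rss / tss  # about 0.58: ~58% of the variation in y is explained by x
```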
2. Correlation analysis
Correlation is a technique used to
measure the strength of the relationship
between two variables.
The stronger the correlation, the better the regression line fits the
data, and vice versa.
Scatter Plot Examples

[Figures: scatter plots illustrating a high degree of correlation, a low
degree of correlation, and no relationship between x and y.]
The correlation coefficient (r)
r = Σ(x − x̄)(y − ȳ) / √( [Σ(x − x̄)²]·[Σ(y − ȳ)²] )

or, in computational form,

r = (nΣxy − (Σx)(Σy)) / √( [nΣx² − (Σx)²]·[nΣy² − (Σy)²] )

or, in terms of the means,

r = (Σxy − n·x̄·ȳ) / √( (Σx² − n·x̄²)·(Σy² − n·ȳ²) )
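A minimal sketch of the computational form of r (the data in the example call are illustrative, not from the text):

```python
import math

def pearson_r(x, y):
    """r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²]·[nΣy² − (Σy)²])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

# Perfectly linear data with positive slope gives r = 1.
r = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
```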
Note: in the single independent variable case, the coefficient of
determination is

R² = r²

where r is the simple correlation coefficient.
Features of r
Unit free
Range between -1 and 1
The closer to -1, the stronger the
negative linear relationship
The closer to 1, the stronger the positive
linear relationship
The closer to 0, the weaker the linear
relationship
Examples of Approximate r Values

[Figures: scatter plots illustrating r = −1, r = −0.6, r = 0, r = +0.3,
and r = +1.]
Example calculation

r = (Σxy − (Σx)(Σy)/n) / √( [Σx² − (Σx)²/n]·[Σy² − (Σy)²/n] )
Example: the data below relate the working experience (years) to the
productivity (items/h) of 10 workers in a small firm.

Working experience (x):  1   3   4   5   6   7   9  12  14  15
Productivity (y):        2   8   9  15  15  20  23  25  22  36
Example calculation

Σx = 76,  Σx² = 782,  Σy = 175,  Σy² = 3932,  Σxy = 1722,  n = 10

Estimate b0 and b1 and write down the linear regression equation.
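Plugging the given sums into the least-squares formulas (a sketch; the coefficients follow directly from the totals above):

```python
# Sums from the experience/productivity example.
n = 10
sum_x, sum_x2 = 76, 782
sum_y = 175
sum_xy = 1722

# b1 = (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n) = 392 / 204.4
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# b0 = ȳ − b1·x̄
b0 = sum_y / n - b1 * sum_x / n
# Estimated regression equation: ŷ ≈ 2.925 + 1.918x
```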
Simple Regression 39
Steps in Regression
3- Calculate the coefficient of determination: r² = (r)²

0 ≤ r² ≤ 1

This is the proportion of the variation in the dependent variable (Yi)
explained by the independent variable (Xi).

Note that you have already calculated the numerator and the denominator as
parts of r. Other than a single division, no new calculations are required.

Note also that r and b1 are related: if the correlation is negative, the
slope must be negative; a positive slope means a positive correlation.
Steps in Regression
6- The regression equation (a straight line) is:

Ŷi = b0 + b1Xi

To test the significance of the correlation, use

t(n−2) = r·√(n − 2) / √(1 − r²)
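A small helper for this test statistic (the r and n values in the example call are illustrative, not from the text):

```python
import math

def t_statistic(r, n):
    """t with n − 2 degrees of freedom: t = r·√(n − 2) / √(1 − r²)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_statistic(0.5, 27)  # illustrative: r = 0.5 computed from n = 27 pairs
```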
Example: Water and Tomato Yield

n = 5 pairs of (X, Y) observations. The independent variable (X) is the
amount of water (in gallons) used on the crop; the dependent variable (Y)
is the yield (bushels of tomatoes).

 Yi   Xi   XiYi   Xi²   Yi²
  2    1     2     1      4
  5    2    10     4     25
  8    3    24     9     64
 10    4    40    16    100
 15    5    75    25    225
 40   15   151    55    418   (totals)
Example: Water and Tomato Yield

Step 4- b1 = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) = 155 / 50 = 3.1

The slope is positive: there is a positive relationship between water and
crop yield. The intercept is b0 = Ȳ − b1X̄ = 8 − 3.1(3) = −1.3.
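Step 4 can be reproduced from the column totals of the table (155 = nΣXY − ΣXΣY and 50 = nΣX² − (ΣX)²):

```python
# Totals from the water/tomato table.
n = 5
sum_x, sum_y = 15, 40     # ΣX, ΣY
sum_xy, sum_x2 = 151, 55  # ΣXY, ΣX²

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 155 / 50 = 3.1
b0 = sum_y / n - b1 * sum_x / n                                # 8 − 3.1·3 = −1.3
```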
Example: Water and Tomato Yield

 Yi   Xi    Ŷi     ei     ei²
  2    1    1.8    0.2   0.04
  5    2    4.9    0.1   0.01
  8    3    8.0    0     0
 10    4   11.1   -1.1   1.21
 15    5   14.2    0.8   0.64
                 Σei = 0  Σei² = 1.90
Total Variation:       Σ(Yi − Ȳ)² = ΣYi² − (ΣYi)²/n

Explained Variation:   Σ(Ŷi − Ȳ)² = b0ΣYi + b1ΣXiYi − (ΣYi)²/n

Unexplained Variation: Σ(Yi − Ŷi)² = ΣYi² − b0ΣYi − b1ΣXiYi
From our previous problem:

Total variation in Y = 418 − (40)²/5 = 98
Explained variation (explained by X) = −1.3(40) + 3.1(151) − (40)²/5 = 96.10
Unexplained variation = 418 − (−1.3)(40) − 3.1(151) = 1.90
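These three quantities can be checked in a few lines (all values taken from the water/tomato example):

```python
# Totals and fitted coefficients from the water/tomato example.
n = 5
sum_y, sum_y2, sum_xy = 40, 418, 151
b0, b1 = -1.3, 3.1

total = sum_y2 - sum_y ** 2 / n                        # 418 − 320 = 98
explained = b0 * sum_y + b1 * sum_xy - sum_y ** 2 / n  # 96.10
unexplained = sum_y2 - b0 * sum_y - b1 * sum_xy        # 1.90
r_squared = explained / total                          # ≈ 0.981 for this fit
```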
The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (y)
and two or more independent variables (xi).

Population model, with y-intercept β0, population slopes β1, …, βk, and
random error ε:

y = β0 + β1x1 + β2x2 + … + βkxk + ε

Estimated multiple regression model, with estimated intercept b0,
estimated slope coefficients b1, …, bk, and estimated (predicted) value ŷ:

ŷ = b0 + b1x1 + b2x2 + … + bkxk
The estimates b0, b1, b2, …, bk solve the normal equations:

Σy   = n·b0   + b1Σx1   + b2Σx2    + … + bkΣxk
Σx1y = b0Σx1  + b1Σx1²  + b2Σx1x2  + … + bkΣx1xk
Σx2y = b0Σx2  + b1Σx1x2 + b2Σx2²   + … + bkΣx2xk
…
Σxky = b0Σxk  + b1Σx1xk + b2Σx2xk  + … + bkΣxk²
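As a sketch, the system above can be built and solved for two predictors with plain Gaussian elimination (the data are illustrative and chosen to lie exactly on y = 1 + 2x1 + x2):

```python
def solve(A, b):
    """Solve the square linear system A·x = b by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

x1 = [1, 2, 3, 4]
x2 = [0, 1, 0, 1]
y = [3, 6, 7, 10]  # illustrative: exactly y = 1 + 2·x1 + 1·x2
n = len(y)

# Build the normal equations for ŷ = b0 + b1x1 + b2x2.
s = sum
A = [[n,     s(x1),                          s(x2)],
     [s(x1), s(a * a for a in x1),           s(a * b for a, b in zip(x1, x2))],
     [s(x2), s(a * b for a, b in zip(x1, x2)), s(b * b for b in x2)]]
rhs = [s(y), s(a * c for a, c in zip(x1, y)), s(b * c for b, c in zip(x2, y))]
b0, b1, b2 = solve(A, rhs)  # expect b0 = 1, b1 = 2, b2 = 1
```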
Interpretation of Estimated Coefficients

Slope (bi)
◦ The average value of y is estimated to change by bi units for each
  one-unit increase in xi, holding all other variables unchanged.

Intercept (b0)
◦ The estimated average value of y when all xi = 0.
Multiple Regression Model

Two variable model

[Figure: the fitted equation ŷ = b0 + b1x1 + b2x2 is a plane in
(x1, x2, y) space. For a sample observation (x1i, x2i, yi), the residual
is e = (yi − ŷi). The best fit equation ŷ is found by minimizing the sum
of squared errors, Σe².]
Multiple Regression Assumptions

The errors are e = (y − ŷ), where ŷ = b0 + b1x1 + b2x2.

The estimates b0, b1, b2 solve the normal equations:

Σy   = n·b0   + b1Σx1   + b2Σx2
Σx1y = b0Σx1  + b1Σx1²  + b2Σx1x2
Σx2y = b0Σx2  + b1Σx1x2 + b2Σx2²
Multiple Coefficient of Determination

Reports the proportion of total variation in y explained by all x
variables taken together:

R² = RSS / TSS
Residual analysis

Define residuals, create residual plots, and interpret the plots.

Notation: e = residual, Y = observed value, Y' = predicted value.
Properties: Σe = 0 and ē = 0.

What is a residual?
Residual = observed value − predicted value
Typical patterns for residual plots answer the question: is linear
regression appropriate?

Random pattern: use linear regression.
Non-random pattern: consider another technique.
Basic Transformations

Variable   Transformation   Example
X          Addition         Xt = X + 5
Y          Multiplication   Yt = 2·Y
A          Square root      At = √A
B          Reciprocal       Bt = 1/B

Do you know?

Linear transformations (Xt = c·X, Xt = X/c, Xt = X + c) have no effect on
correlation; non-linear transformations (anything else) change the
correlation.
Application to Regression

EXAMPLE

Suppose we repeat the analysis using a quadratic model to transform the
dependent variable. For a quadratic model, we use the square root of y,
rather than y, as the dependent variable. Using the transformed data, our
regression equation is:

y't = b0 + b1x

where:
yt = transformed dependent variable, equal to the square root of y
y't = predicted value of the transformed dependent variable yt
x = independent variable
b0 = y-intercept of the transformation regression line
b1 = slope of the transformation regression line
The table below shows the transformed data we analyzed.

x:   1     2     3     4     5     6     7     8     9
yt:  1.14  1     2.45  3.74  3.87  5.48  6.32  8.6   8.66
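Fitting y't = b0 + b1x to the transformed values in this table with the earlier least-squares formulas gives roughly b1 ≈ 1.04 and b0 ≈ −0.61 (a sketch; predictions on the original scale would then be squared back):

```python
# Transformed data from the table: yt = √y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
yt = [1.14, 1, 2.45, 3.74, 3.87, 5.48, 6.32, 8.6, 8.66]

n = len(x)
sx, sy = sum(x), sum(yt)
sxy = sum(a * b for a, b in zip(x, yt))
sx2 = sum(a * a for a in x)

b1 = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)
b0 = sy / n - b1 * sx / n
# To predict y on the original scale, square the fitted value: ŷ = (b0 + b1x)²
```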