Chapter 6: Regression Analysis
Part 1: Simple Linear Regression
Regression Analysis
Building models that characterize the relationship between a dependent variable and one (simple regression) or more (multiple regression) independent variables, all of which are numerical, for example:
Sales = a + b*Price + c*Coupons + d*Advertising + e*Price*Advertising
Cross-sectional data
Time series data (forecasting)
Simple Linear Regression
Single independent variable
Linear relationship
SLR Model
Y = β₀ + β₁X + ε
β₀ is the intercept, β₁ is the slope, and ε is the error term.
The regression line gives E(Y|X), the mean of Y at each value of X; f(Y|X) describes the distribution of Y about that mean.
Error Terms (Residuals)
εᵢ = Yᵢ − β₀ − β₁Xᵢ
Estimation of the Regression Line
True regression line (unknown): Y = β₀ + β₁X + ε
Estimated regression line: Ŷ = b₀ + b₁X
Observed errors: eᵢ = Yᵢ − b₀ − b₁Xᵢ
Least Squares Regression
Choose b₀ and b₁ to minimize the sum of squared errors:
minimize Σᵢ₌₁ⁿ [Yᵢ − (b₀ + b₁Xᵢ)]²
The solution is:
b₁ = (Σᵢ₌₁ⁿ XᵢYᵢ − nX̄Ȳ) / (Σᵢ₌₁ⁿ Xᵢ² − nX̄²)
b₀ = Ȳ − b₁X̄
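The least-squares formulas above can be sketched directly in Python; the data set here is invented for illustration.

```python
# Least-squares slope and intercept computed from the closed-form
# formulas: b1 = (sum(XiYi) - n*Xbar*Ybar) / (sum(Xi^2) - n*Xbar^2),
# b0 = Ybar - b1*Xbar. The data are made up for demonstration.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(X)
b1 = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
b0 = Y.mean() - b1 * X.mean()

print(b0, b1)
```

The same coefficients come out of `np.polyfit(X, Y, 1)`, which minimizes the identical sum of squared errors.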
Excel Trendlines
Construct a scatter diagram
Method 1: Select Chart > Add Trendline
Method 2: Select the data series; right-click and choose Add Trendline
Example: at X = 6000, Ŷ = 0.0854(6000) − 108.59 = 403.81
Regression and Investment Risk
Systematic risk: variation in a stock's price explained by the market
Measured by beta
Beta = 1: stock moves in step with the market
Beta < 1: stock is less volatile than the market
Beta > 1: stock is more volatile than the market
Systematic Risk (Beta)
Beta is the slope of the regression line
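Since beta is the slope of the regression of the stock's returns on the market's returns, it can be estimated with a simple linear fit. The return series below are invented for illustration.

```python
# Estimating a stock's beta: regress the stock's returns on the
# market's returns; the fitted slope is beta.
# Both return series are invented for illustration.
import numpy as np

market_returns = np.array([0.010, -0.020, 0.015, 0.030, -0.010, 0.020])
stock_returns = np.array([0.015, -0.025, 0.020, 0.040, -0.012, 0.028])

# polyfit returns [slope, intercept] for degree 1
beta, alpha = np.polyfit(market_returns, stock_returns, 1)
print(f"beta = {beta:.2f}")  # beta > 1 => stock is more volatile than the market
```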
Theory Without Regression
Without regression, the best estimate for Y is the mean Ȳ, independent of the value of X.
A measure of total variation is SST = Σ(Yᵢ − Ȳ)², the unexplained variation.
Theory With Regression
Fitted line: Ŷ = b₀ + b₁X
Observed values Yᵢ; fitted values Ŷᵢ
Variation explained by regression: Ŷ − Ȳ
Variation unexplained after regression: Y − Ŷ
Sums of Squares
SST = Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)² = SSR + SSE
SSR is the explained variation; SSE is the unexplained variation.
Coefficient of Determination
R² = SSR/SST = (SST − SSE)/SST = 1 − SSE/SST
The coefficient of determination is the proportion of variation explained by the independent variable (regression model); 0 ≤ R² ≤ 1.
Adjusted R² incorporates sample size and the number of explanatory variables (in multiple regression models). It is useful for comparing models with different numbers of explanatory variables.
R²adj = 1 − (1 − R²)(n − 1)/(n − 2)
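The sums-of-squares decomposition and both R² measures can be verified numerically; the data below are invented for illustration.

```python
# Decompose total variation into explained and unexplained parts,
# then compute R^2 and adjusted R^2 for a simple linear fit.
# The data are invented for illustration.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(X, Y, 1)
Y_hat = b0 + b1 * X

SST = np.sum((Y - Y.mean())**2)      # total variation
SSR = np.sum((Y_hat - Y.mean())**2)  # explained by regression
SSE = np.sum((Y - Y_hat)**2)         # unexplained (residual)

R2 = SSR / SST                       # equivalently 1 - SSE/SST
n = len(X)
R2_adj = 1 - (1 - R2) * (n - 1) / (n - 2)  # simple regression: one predictor
```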
Correlation Coefficient
Sample correlation coefficient: R = ±√R² (R takes the sign of b₁)
Properties:
−1 ≤ R ≤ 1
R = 1: perfect positive correlation
R = −1: perfect negative correlation
R = 0: no correlation
Standard Error of the Estimate
MSE = SSE/(n − 2), an unbiased estimate of the variance of the errors about the regression line.
The standard error of the estimate, S_YX = √(SSE/(n − 2)), measures the spread of the data about the line.
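A minimal sketch of computing S_YX from the residuals of a fit; the data are invented for illustration.

```python
# Standard error of the estimate: S_YX = sqrt(SSE / (n - 2)),
# the spread of the data about the fitted line.
# The data are invented for illustration.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

b1, b0 = np.polyfit(X, Y, 1)
residuals = Y - (b0 + b1 * X)
SSE = np.sum(residuals**2)
S_YX = np.sqrt(SSE / (n - 2))  # n - 2 df: two estimated coefficients
```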
Confidence Interval for Mean Value of Y
Confidence intervals depend on the value of the independent variable:
Ŷᵢ ± t_{α/2, n−2} · S_YX · √hᵢ
where the leverage is
hᵢ = 1/n + (Xᵢ − X̄)² / Σᵢ₌₁ⁿ (Xᵢ − X̄)²
Prediction Intervals
Prediction intervals apply to individual (not mean) values of the dependent variable:
Ŷᵢ ± t_{α/2, n−2} · S_YX · √(1 + hᵢ)
t-Test for the Slope
t = (b₁ − β₁) / (S_YX / √Σᵢ₌₁ⁿ (Xᵢ − X̄)²)
with n − 2 degrees of freedom
This allows you to test
H₀: slope = β₁
H₁: slope ≠ β₁
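The interval formulas and the slope test can be combined in one short sketch. The data, the evaluation point x0 = 3.5, and the table value t_{0.025,3} = 3.182 are assumptions for illustration.

```python
# Confidence interval for the mean of Y, prediction interval for an
# individual Y, and the t-statistic for H0: slope = 0.
# Data and the evaluation point are invented for illustration.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

b1, b0 = np.polyfit(X, Y, 1)
SSE = np.sum((Y - (b0 + b1 * X))**2)
S_YX = np.sqrt(SSE / (n - 2))
Sxx = np.sum((X - X.mean())**2)

x0 = 3.5                                # point at which to form intervals
y0 = b0 + b1 * x0
h = 1 / n + (x0 - X.mean())**2 / Sxx    # leverage at x0
t_crit = 3.182                          # t_{0.025, 3} from a t-table (alpha = 0.05)

ci = (y0 - t_crit * S_YX * np.sqrt(h), y0 + t_crit * S_YX * np.sqrt(h))
pi = (y0 - t_crit * S_YX * np.sqrt(1 + h), y0 + t_crit * S_YX * np.sqrt(1 + h))

# t-statistic for testing H0: slope = 0 against H1: slope != 0
t_stat = (b1 - 0) / (S_YX / np.sqrt(Sxx))
```

The prediction interval is always wider than the confidence interval because of the extra 1 under the square root; if |t_stat| exceeds t_crit, H₀ is rejected.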
Excel Regression Tool
Excel menu > Tools > Data Analysis >
Regression
Input variable ranges
Check appropriate boxes
Select output options
Regression Output
Correlation coefficient
S_YX
b₀ and b₁
p-value for significance of regression
t-test for slope
Confidence interval for slope
Residuals
Standardized residuals are residuals divided by their standard error, expressed in units independent of the units of the data.
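A minimal sketch of standardizing residuals by the standard error of the estimate, assuming invented data; values beyond roughly ±2 or ±3 merit a closer look.

```python
# Standardized residuals: each residual divided by the standard error
# of the estimate, making the values unit-free.
# The data are invented for illustration.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

b1, b0 = np.polyfit(X, Y, 1)
residuals = Y - (b0 + b1 * X)
S_YX = np.sqrt(np.sum(residuals**2) / (n - 2))
std_residuals = residuals / S_YX  # unit-free; large magnitudes flag outliers
```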
Assumptions Underlying Regression
Linearity
Check with a scatter diagram of the data or the residual plot
Normally distributed errors for each X, with mean 0 and constant variance
Examine a histogram of the standardized residuals or use goodness-of-fit tests
Homoscedasticity: constant variance about the regression line for all values of the independent variable
Examine by plotting residuals and looking for differences in variance at different values of X
No autocorrelation: residuals should be independent for each value of the independent variable. Important if the independent variable is time (e.g., forecasting models).
Residual Plot
Histogram of Residuals
Evaluating Homoscedasticity
(left plot: OK; right plot: heteroscedastic)
Autocorrelation
Durbin-Watson statistic:
D = Σᵢ₌₂ⁿ (eᵢ − eᵢ₋₁)² / Σᵢ₌₁ⁿ eᵢ²
D < 1 suggests (positive) autocorrelation
1.5 < D < 2.5 suggests no autocorrelation
D > 2.5 suggests negative autocorrelation
The PHStat tool calculates this statistic
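The Durbin-Watson statistic is easy to compute directly from a residual series; the residuals below are invented for illustration.

```python
# Durbin-Watson statistic: D = sum of squared successive residual
# differences divided by the sum of squared residuals.
# The residual series is invented for illustration.
import numpy as np

e = np.array([0.5, 0.3, -0.2, -0.4, 0.1, 0.6, -0.3, 0.2])

D = np.sum(np.diff(e)**2) / np.sum(e**2)
# D near 2 => no autocorrelation; near 0 => positive; near 4 => negative
print(f"D = {D:.3f}")
```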