2007 Pearson Education

Chapter 6: Regression Analysis
Part 1: Simple Linear Regression
Regression Analysis
Building models that characterize the
relationships between a dependent variable
and one (single) or more (multiple)
independent variables, all of which are
numerical, for example:
Sales = a + b*Price + c*Coupons +
d*Advertising + e*Price*Advertising
Cross-sectional data
Time series data (forecasting)
Simple Linear Regression
Single independent variable
Linear relationship
SLR Model
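$Y = \beta_0 + \beta_1 X + \varepsilon$
($\beta_0$: intercept, $\beta_1$: slope, $\varepsilon$: error)
[Figure: regression line $E(Y|X)$ with conditional distributions $f(Y|X)$]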
Error Terms (Residuals)
$\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$
Estimation of the Regression Line
True regression line (unknown):
$Y = \beta_0 + \beta_1 X + \varepsilon$
Estimated regression line:
$\hat{Y} = b_0 + b_1 X$
Observed errors:
$e_i = Y_i - b_0 - b_1 X_i$
Least Squares Regression

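Choose $b_0$ and $b_1$ to minimize the sum of squared errors:
$$\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( Y_i - [b_0 + b_1 X_i] \right)^2$$
The minimizing values are
$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X}$$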
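The least squares formulas for $b_1 = (\sum X_i Y_i - n\bar{X}\bar{Y}) / (\sum X_i^2 - n\bar{X}^2)$ and $b_0 = \bar{Y} - b_1\bar{X}$ can be checked numerically. The following is a minimal sketch in plain Python; the function name `fit_line` and the five data points are made up for illustration and do not come from the chapter.

```python
def fit_line(x, y):
    """Return (b0, b1) from the least squares formulas for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = (sum of X_i*Y_i - n*Xbar*Ybar) / (sum of X_i^2 - n*Xbar^2)
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    denominator = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    b1 = numerator / denominator
    # b0 = Ybar - b1*Xbar
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, chosen only so the arithmetic is easy to follow by hand.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

For these points the sums are $\sum X_i Y_i = 66$ and $\sum X_i^2 = 55$ with $\bar{X} = 3$ and $\bar{Y} = 4$, so $b_1 = 6/10 = 0.6$ and $b_0 = 4 - 0.6(3) = 2.2$.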
Excel Trendlines
Construct a scatter diagram
Method 1: Select Chart/Add Trendline
Method 2: Select data series; right click
Prediction at X = 6000:
$\hat{Y} = 0.0854(6000) - 108.59 = 403.81$
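Once a trendline has been fitted, a prediction is just an evaluation of the line. The sketch below reuses the slope 0.0854 and the value 108.59 from the slide, assuming the intercept is negative (which is what the stated result 403.81 implies); the function name `predict` is hypothetical.

```python
def predict(x, b0=-108.59, b1=0.0854):
    """Evaluate the fitted line Y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Prediction at X = 6000, matching the arithmetic on the slide.
y_hat = predict(6000)
print(round(y_hat, 2))
```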
Regression and Investment Risk
Systematic risk: variation in stock price explained by the market
Measured by beta
Beta = 1: perfect match to market movements
Beta < 1: stock is less volatile than market
Beta > 1: stock is more volatile than market
Systematic Risk (Beta)
Beta is the slope of the
regression line
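Because beta is the slope of a simple regression, it can be estimated with the same slope formula used for any fitted line. This is a rough sketch only: the return series below are invented for illustration, and the function name `beta` is hypothetical.

```python
def beta(stock_returns, market_returns):
    """Slope of the regression of stock returns (Y) on market returns (X)."""
    n = len(market_returns)
    m_bar = sum(market_returns) / n
    s_bar = sum(stock_returns) / n
    # Slope = sum of cross-deviations / sum of squared market deviations.
    sxy = sum((m - m_bar) * (s - s_bar)
              for m, s in zip(market_returns, stock_returns))
    sxx = sum((m - m_bar) ** 2 for m in market_returns)
    return sxy / sxx


# Made-up period returns (not real market data).
market = [0.01, -0.02, 0.03, 0.00, 0.02]
stock = [0.02, -0.03, 0.04, 0.00, 0.03]
b = beta(stock, market)
print(f"beta = {b:.2f}")
```

With these invented numbers the slope is greater than 1, which under the interpretation above would describe a stock that is more volatile than the market.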
Theory Without Regression
Best estimate for Y is the mean $\bar{Y}$, independent of the value of X
A measure of total variation is
$SST = \sum (Y_i - \bar{Y})^2$
[Figure: plot of $Y_i$ against $X_i$ showing the unexplained variation]
Theory With Regression
Observed values $Y_i$
Fitted values $\hat{Y}_i$
Fitted line: $\hat{Y} = b_0 + b_1 X$
Variation unexplained after regression: $Y - \hat{Y}$
Variation explained by regression: $\hat{Y} - \bar{Y}$
Sums of Squares
$$SST = \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 = SSR + SSE$$
SSR: explained variation; SSE: unexplained variation
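The decomposition SST = SSR + SSE can be verified directly on a small data set. The sketch below fits the least squares line and computes the three sums; the data and the function name `sums_of_squares` are assumptions made for illustration.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
    return sst, ssr, sse


# Hypothetical data.
sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```

For this data SST = 6, SSR = 3.6 and SSE = 2.4. Note that the identity holds for the least squares line specifically; an arbitrary line would not split the total variation this way.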
Coefficient of Determination
$R^2 = SSR/SST = (SST - SSE)/SST = 1 - SSE/SST$ is the coefficient of determination: the proportion of variation explained by the independent variable (regression model)
$0 \le R^2 \le 1$
Adjusted $R^2$ incorporates sample size and number of explanatory variables (in multiple regression models). Useful for comparing models with different numbers of explanatory variables.
$$R^2_{adj} = 1 - \left[ (1 - R^2) \, \frac{n - 1}{n - 2} \right]$$
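As a quick numerical illustration, $R^2 = 1 - SSE/SST$ and the simple-regression form of the adjustment, $1 - (1 - R^2)(n-1)/(n-2)$, can be computed as follows. The input values (SSE = 2.4, SST = 6.0, n = 5) are hypothetical, as are the function names.

```python
def r_squared(sse, sst):
    """Proportion of total variation explained by the regression."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n):
    """Adjusted R^2 for simple linear regression (one explanatory variable)."""
    return 1 - (1 - r2) * (n - 1) / (n - 2)


# Hypothetical sums of squares and sample size.
r2 = r_squared(sse=2.4, sst=6.0)
r2_adj = adjusted_r_squared(r2, n=5)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```

The adjusted value is smaller than $R^2$ here because the sample is tiny; the gap shrinks as n grows.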
Correlation Coefficient
Sample correlation coefficient
$R = \pm\sqrt{R^2}$, taking the sign of the slope $b_1$
Properties
$-1 \le R \le 1$
R = 1 => perfect positive correlation
R = -1 => perfect negative correlation
R = 0 => no correlation


Standard Error of the Estimate
MSE = SSE/(n-2) is an unbiased estimate of the variance of the errors about the regression line
Standard error of the estimate is
$$S_{YX} = \sqrt{\frac{SSE}{n - 2}}$$
Measures the spread of data about the line
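The standard error of the estimate is the square root of MSE = SSE/(n-2). A minimal sketch, with a hypothetical SSE and sample size:

```python
import math


def standard_error_of_estimate(sse, n):
    """S_YX = sqrt(SSE / (n - 2)) for simple linear regression."""
    mse = sse / (n - 2)  # estimate of the error variance
    return math.sqrt(mse)


# Hypothetical values: SSE = 2.4 from n = 5 observations.
s_yx = standard_error_of_estimate(sse=2.4, n=5)
print(f"S_YX = {s_yx:.4f}")
```

The result is in the same units as Y, which is what makes it readable as a typical distance of the data from the line.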
Confidence Interval for Mean Value of Y
Confidence intervals depend on the value of the independent variable
$$\hat{Y}_i \pm t_{\alpha/2,\, n-2} \, S_{YX} \sqrt{h_i}$$
where
$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
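A confidence interval for the mean of Y at a chosen X takes the form $\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_{YX} \sqrt{h}$ with $h = 1/n + (X - \bar{X})^2 / \sum (X_j - \bar{X})^2$. The sketch below is one way to compute it, under several assumptions: the data are invented, the function name `mean_confidence_interval` is hypothetical, and the critical value is supplied by hand from a t table (3.182 for a 95% interval with 3 degrees of freedom) rather than computed.

```python
import math


def mean_confidence_interval(x, y, x0, t_crit):
    """Interval for the mean of Y at x0: y_hat +/- t * S_YX * sqrt(h)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    h = 1 / n + (x0 - x_bar) ** 2 / sxx  # grows as x0 moves away from x_bar
    y_hat = b0 + b1 * x0
    half_width = t_crit * s_yx * math.sqrt(h)
    return y_hat - half_width, y_hat + half_width


# Hypothetical data; T_CRIT is t_{0.025, 3} read from a t table.
T_CRIT = 3.182
ci_low, ci_high = mean_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 4, T_CRIT)
print(f"95% CI for mean Y at X = 4: ({ci_low:.3f}, {ci_high:.3f})")
```

Because h is smallest at $X = \bar{X}$, the interval is narrowest at the center of the data and widens toward the extremes.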
Prediction Intervals
Prediction intervals apply to individual (not mean) values of the dependent variable
$$\hat{Y}_i \pm t_{\alpha/2,\, n-2} \, S_{YX} \sqrt{1 + h_i}$$
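A prediction interval for an individual value uses $\sqrt{1 + h}$ in place of $\sqrt{h}$, where $h = 1/n + (X - \bar{X})^2 / \sum (X_j - \bar{X})^2$. The following sketch makes the same assumptions as any hand calculation would: invented data, a hypothetical function name `prediction_interval`, and a critical value (3.182 for 95% with 3 degrees of freedom) taken from a t table.

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Interval for an individual Y at x0: y_hat +/- t * S_YX * sqrt(1 + h)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    y_hat = b0 + b1 * x0
    # The extra 1 accounts for the variability of a single observation.
    half_width = t_crit * s_yx * math.sqrt(1 + h)
    return y_hat - half_width, y_hat + half_width


# Hypothetical data; T_CRIT is t_{0.025, 3} read from a t table.
T_CRIT = 3.182
pi_low, pi_high = prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 4, T_CRIT)
print(f"95% PI for Y at X = 4: ({pi_low:.3f}, {pi_high:.3f})")
```

The prediction interval is always wider than the confidence interval for the mean at the same X, since it must cover the scatter of individual points as well as the uncertainty in the line.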

PHStat Simple Linear Regression
Define range of dependent and independent variables
Choose regression options
Choose output options
Regression Statistics and Confidence Interval Outputs
Regression as ANOVA
Testing for significance of regression
$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$
SST = SSR + SSE
The null hypothesis implies that SST = SSE, or SSR = 0
MSR = SSR/1 = variance explained by regression
F = MSR/MSE
If F > critical value, it is likely that $\beta_1 \ne 0$, that is, the regression line is significant
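The F statistic for simple regression is MSR/MSE with MSR = SSR/1 and MSE = SSE/(n-2). Here is a small sketch using hypothetical sums of squares; the critical value 10.13 is the tabulated 5% point of the F distribution with 1 and 3 degrees of freedom, entered by hand rather than computed.

```python
def f_statistic(ssr, sse, n):
    """F = MSR / MSE for simple linear regression."""
    msr = ssr / 1        # one explanatory variable
    mse = sse / (n - 2)  # n - 2 error degrees of freedom
    return msr / mse


# Hypothetical values: SSR = 3.6, SSE = 2.4, n = 5.
F_CRIT = 10.13  # F_{0.05; 1, 3} from an F table
f_value = f_statistic(ssr=3.6, sse=2.4, n=5)
significant = f_value > F_CRIT
print(f"F = {f_value:.2f}, significant at 5%: {significant}")
```

In this made-up case F is below the critical value, so the null hypothesis of zero slope would not be rejected, which is unsurprising with only five observations.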

t-test for Significance of Regression
$$t = \frac{b_1 - \beta_1}{S_{YX} \Big/ \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
with n-2 degrees of freedom
This allows you to test
$H_0$: slope = $\beta_1$
$H_1$: slope $\ne \beta_1$
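The t statistic divides the gap between the estimated slope and the hypothesized slope by the standard error of the slope, $S_{YX} / \sqrt{\sum (X_i - \bar{X})^2}$. The sketch below computes it for invented data; the function name `slope_t_statistic` and its `beta1` parameter (the hypothesized slope, defaulting to 0) are assumptions for illustration.

```python
import math


def slope_t_statistic(x, y, beta1=0.0):
    """t = (b1 - beta1) / (S_YX / sqrt(Sxx)), with n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    return (b1 - beta1) / (s_yx / math.sqrt(sxx))


# Hypothetical data; testing H0: slope = 0.
t_value = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"t = {t_value:.4f}, t^2 = {t_value ** 2:.4f}")
```

When the hypothesized slope is 0, the square of this t statistic equals the F statistic MSR/MSE for the same data, so the two tests agree in simple linear regression.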

Excel Regression Tool
Excel menu > Tools > Data Analysis > Regression
Input variable ranges
Check appropriate boxes
Select output options
Regression Output
Correlation coefficient
$S_{YX}$
$b_0$
$b_1$
p-value for significance of regression
t-test for slope
Confidence interval for slope
Residuals
Standard residuals are residuals divided by their standard
error, expressed in units independent of the units of the
data.
Assumptions Underlying Regression
Linearity
Check with scatter diagram of the data or the residual plot
Normally distributed errors for each X with mean 0 and constant variance
Examine histogram of standardized residuals or use goodness-of-fit tests
Homoscedasticity: constant variance about the regression line for all values of the independent variable
Examine by plotting residuals and looking for differences in variances at different values of X
No autocorrelation: residuals should be independent for each value of the independent variable. Important if the independent variable is time (e.g., forecasting models).
Residual Plot
Histogram of Residuals
Evaluating Homoscedasticity
[Figure: residual plots labeled "OK" and "Heteroscedastic"]
Autocorrelation
Durbin-Watson statistic
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
D < 1 suggests autocorrelation
D > 1.5 suggests no autocorrelation
D > 2.5 suggests negative autocorrelation
PHStat tool calculates this statistic
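The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows, assuming the residuals are already in time order; the residual values and the function name `durbin_watson` are made up for illustration.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, listed in time order.
e = [-0.8, 0.6, 1.0, -0.6, -0.2]
d = durbin_watson(e)
print(f"D = {d:.3f}")
```

Here the numerator is 4.84 and the denominator is 2.4, giving D of about 2.02, which by the rules of thumb above would suggest no autocorrelation.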