SSE = Σ(yi − ŷi)², i = 1, . . . , n
Linear Regression
Assumptions
Y can be predicted from X
A graph of X & Y is a straight line
The line extends infinitely in both directions
Model explains only the variability in Y
Each XY pair was randomly sampled
Each XY pair was selected independently
Linear Regression Assumptions
For each x value, y is a random variable having a normal (bell-shaped) distribution.
All of these y distributions have the same variance. Also, for a given value of x, the distribution of y values has a mean that lies on the regression line. (Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)
The Normality of y
y is normally distributed with mean E(y) = β0 + β1x, and a constant standard deviation σε
[Figure: normal distributions of y at x1, x2, x3, with means E(y|x1) = β0 + β1x1, E(y|x2) = β0 + β1x2, E(y|x3) = β0 + β1x3 lying on the regression line]
The standard deviation remains constant, but the mean value changes with x
Standard Error of Estimate
The mean error is equal to zero.
If σε is small, the errors tend to be close to zero (close to the mean error). Then, the model fits the data well.
Therefore, we can use σε as a measure of the suitability of using a linear model.
An estimator of σε is given by sε:
Standard Error of Estimate = sε = √(SSE / (n − 2))
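As a concrete sketch, the standard error of estimate can be computed after fitting a simple regression by least squares; the data values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
# Fit the simple linear regression by least squares
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)     # sum of squared errors
s_eps = np.sqrt(sse / (n - 2))     # standard error of estimate
print(round(s_eps, 4))
```

A small value of s_eps relative to the scale of y indicates that the fitted line describes the data well.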
Multivariate Data
Analysis
Selecting a Multivariate Technique
Dependency
Dependent (criterion) variables and
independent (predictor) variables are
present
Interdependency
Variables are interrelated without
designating some dependent and others
independent
Dependency Techniques
Multiple regression (Univariate and
multivariate)
Conjoint analysis
Discriminant analysis
Multivariate analysis of variance
(MANOVA)
Linear structural relationships (LISREL)
Interdependency Techniques
Factor analysis
Cluster analysis
Multidimensional Scaling (MDS)
Multiple Regression Model
The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . xp and an error term is called the multiple regression model.
The multiple regression model is:
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
β0, β1, β2, . . . , βp are the parameters.
ε is a random variable called the error term.
In simple linear regression, the conditional mean of Y depends on X. The multiple regression model extends this idea to include more than one independent variable.
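The model above can be made concrete by simulating data from it; the parameter values, sample size, and error standard deviation below are hypothetical assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values (illustrative only)
beta = np.array([5.0, 2.0, -1.5])   # beta0, beta1, beta2
sigma = 1.0                          # standard deviation of the error term

n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
eps = rng.normal(0.0, sigma, n)      # error term with mean zero

# y = beta0 + beta1*x1 + beta2*x2 + eps
y = beta[0] + beta[1] * x1 + beta[2] * x2 + eps
print(y[:3])
```

Each simulated y value is its conditional mean β0 + β1x1 + β2x2 plus a random error, exactly as the model states.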
Multiple Regression Equation
The equation that describes how the mean value of y is related to x1, x2, . . . xp is called the multiple regression equation.
The multiple regression equation is:
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Estimated Multiple Regression Equation
A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.
The estimated multiple regression equation is:
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Estimation Process
Multiple Regression Model
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
Multiple Regression Equation
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Unknown parameters are β0, β1, β2, . . . , βp
Sample Data: x1, x2, . . . xp, y
Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
b0, b1, b2, . . . , bp are sample statistics
b0, b1, b2, . . . , bp provide estimates of β0, β1, β2, . . . , βp
Least Squares Method
Least Squares Criterion: min Σ(yi − ŷi)²
Computation of Coefficient Values
The formulas for the regression coefficients b0, b1, b2, . . . bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
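A minimal sketch of the matrix-algebra computation those packages perform, solving the normal equations b = (X'X)⁻¹X'y; the data with two predictors are hypothetical, chosen so the fit is exact:

```python
import numpy as np

# Hypothetical data with two predictors (illustrative only);
# constructed so that y = 2 + 1*x1 + 2*x2 exactly
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 6.0]])
y = np.array([7.0, 6.0, 13.0, 12.0, 19.0])

# Add a column of ones so b0 is estimated along with b1, b2
X = np.column_stack([np.ones(len(y)), X_raw])

# Normal equations: solve (X'X) b = X'y for b
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)
```

Solving the linear system directly, rather than explicitly inverting X'X, is the numerically preferred way to apply the same formula.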
A Note on Interpretation of Coefficients
bi represents an estimate of the change in y corresponding to a one-unit change in xi when all other independent variables are held constant.
Relationship Among SST, SSR, SSE
SST = SSR + SSE
Multiple Coefficient of Determination
R² = SSR/SST
If R² is significantly greater than zero, we reject the null hypothesis of no relationship.
Adjusted Multiple Coefficient of Determination
Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1)
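Both quantities can be sketched directly from the sums of squares; the observed and fitted values below are hypothetical, for illustration only:

```python
import numpy as np

# Hypothetical observations and fitted values (illustrative only)
y     = np.array([3.0, 5.0, 7.0, 6.0, 9.0, 11.0])
y_hat = np.array([3.4, 4.6, 6.8, 6.4, 9.2, 10.6])
p = 2                               # number of independent variables
n = len(y)

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum((y - y_hat) ** 2)      # error sum of squares
ssr = sst - sse                     # regression sum of squares (SST = SSR + SSE)

r2 = ssr / sst                      # R^2 = SSR/SST
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2, 4), round(r2_adj, 4))
```

The adjusted value is always at most R², since it penalizes adding predictors that do not reduce SSE enough to justify the lost degrees of freedom.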
Model Assumptions
Assumptions About the Error Term ε
1. The error ε is a random variable with a mean of zero. Implication: For given values of the independent variables, the expected or average value of y is given by E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
2. The variance of ε, denoted by σ², is the same for all values of the independent variables. Implication: The variance of y equals σ² and is the same for all values of x1, x2, . . . xp
3. The values of ε are independent. Implication: The size of the error for a particular set of values of the independent variables is not related to the size of the error for any other set of values.
4. The error ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp. Implication: y is also a normally distributed random variable for given values of x1, x2, . . . xp
Testing for Significance
In simple linear regression, the F and t tests provide the same conclusion.
In multiple regression, the F and t tests have different purposes.
The F test is used to determine whether a significant linear relationship exists between the dependent variable and the set of all the independent variables.
The F test is referred to as the test for overall significance.
Testing for Significance: t Test
If the F test shows overall significance, the t test is used to determine whether each of the individual independent variables is significant.
A separate t test is conducted for each of the independent variables in the model.
We refer to each of these t tests as a test for individual significance.
Testing for Significance: F Test
Hypotheses:
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test Statistic: F = MSR/MSE
Rejection Rule: Reject H0 if F > Fα
where Fα
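The test statistic and rejection rule can be sketched as follows; the ANOVA quantities (n, p, SSR, SSE) are hypothetical, and the critical value Fα is taken from SciPy's F distribution with (p, n − p − 1) degrees of freedom:

```python
from scipy import stats

# Hypothetical ANOVA quantities from a fitted model (illustrative only)
n, p = 30, 3              # sample size, number of independent variables
ssr, sse = 480.0, 120.0   # regression and error sums of squares

msr = ssr / p             # mean square regression
mse = sse / (n - p - 1)   # mean square error
f_stat = msr / mse        # test statistic: F = MSR/MSE

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, p, n - p - 1)   # F_alpha with (p, n-p-1) d.f.
print(f_stat > f_crit)    # reject H0 if F > F_alpha
```

Here the large F statistic exceeds the critical value, so H0 would be rejected and the overall relationship judged significant.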