Multiple Linear Regression and the General Linear Model
Outline
1. Introduction to Multiple Linear Regression
2. Statistical Inference
3. Topics in Regression Modeling
4. Example
5. Variable Selection Methods
6. Regression Diagnostics and Strategy for Building a Model
1. Introduction to Multiple Linear Regression
Multiple Linear Regression
• Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables
• Multiple linear regression extends the simple linear regression model to the case of two or more predictor variables
Example:
A multiple regression analysis might show us that the demand for a product varies directly with changes in the demographic characteristics (age, income) of a market area.
Historical Background
• Francis Galton started using the term regression in his biology research
• Karl Pearson and Udny Yule extended Galton’s work to the statistical
context
• Legendre and Gauss developed the method of least squares used in
regression analysis
• Ronald Fisher developed the maximum likelihood method used in the related statistical inference (tests of the significance of regression, etc.)
History
• Carl Friedrich Gauss (1777/4/30 – 1855/4/23) developed the fundamentals of least-squares analysis in 1795, at the age of eighteen. He published Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum in 1809, and in 1821 a further development, Theoria combinationis observationum erroribus minimis obnoxiae, which includes the Gauss–Markov theorem.
• Adrien-Marie Legendre (1752/9/18 – 1833/1/10) published in 1805 an article named Nouvelles méthodes pour la détermination des orbites des comètes, in which he introduced the Method of Least Squares to the world.
• Francis Galton (1822/2/16 – 1911/1/17) found in his research that tall parents usually have shorter children, and vice versa: human height has the tendency to regress to its mean. He was the first person to use the word "regression" for this phenomenon.
• Karl Pearson, regarded as the founder of Biostatistics, published the first article on the earliest form of regression. Pearson and Ronald Aylmer Fisher both developed regression theory after Galton.
The random errors ε_i are assumed to be independent r.v.'s with mean 0 and variance σ².
Thus the Y_i are independent r.v.'s with mean μ_i and variance σ², where
  μ_i = E(Y_i) = β_0 + β_1 x_i1 + … + β_k x_ik
Fitting the model
• The least squares (LS) method is used to fit the model
  Y_i = β_0 + β_1 x_i1 + … + β_k x_ik + ε_i,  i = 1, 2, …, n
• Specifically, LS provides estimates of the unknown model parameters by minimizing the sum of squared differences between the observed values y_i and the corresponding points on the fitted surface with the same x's:
  Q = Σ_{i=1}^{n} [y_i − (β_0 + β_1 x_i1 + β_2 x_i2 + … + β_k x_ik)]²
• The resulting solutions β̂_0, β̂_1, …, β̂_k are the least squares (LS) estimators of β_0, β_1, …, β_k, respectively.
The multiple correlation coefficient:
  R = +√R²
• Only the positive square root is used.
• R is a measure of the strength of the association between the predictors (x's) and the one response variable Y.
Multiple Regression Model in Matrix Notation
The multiple regression model can be represented in a compact form using matrix notation.
Let:
  Y = (Y_1, Y_2, …, Y_n)′,  y = (y_1, y_2, …, y_n)′,  ε = (ε_1, ε_2, …, ε_n)′
be the n × 1 vectors of the r.v.'s Y_i, their observed values y_i, and the random errors ε_i, respectively, for all n observations.
Let:
  X = [ 1  x_11 … x_1k
        1  x_21 … x_2k
        ⋮    ⋮        ⋮
        1  x_n1 … x_nk ]
be the n × (k + 1) matrix of the values of the predictor variables for all n observations (the first column corresponds to the constant term β_0).
Let:
  β = (β_0, β_1, …, β_k)′ and β̂ = (β̂_0, β̂_1, …, β̂_k)′
be the (k + 1) × 1 vectors of unknown model parameters and their LS estimates, respectively.
The model can then be written compactly as:
  Y = Xβ + ε
• The simultaneous linear equations whose solution yields the LS estimates:
  X′X β̂ = X′y
• If the inverse of the matrix X′X exists, then the solution is given by:
  β̂ = (X′X)⁻¹ X′y
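The matrix solution above can be sketched in a few lines of NumPy. This is a minimal illustration, not from the slides: the data are simulated, and the normal equations X′X β̂ = X′y are solved directly.

```python
import numpy as np

# Simulated data, purely for illustration: n = 50 observations, k = 2 predictors.
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # first column: constant term
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations X'X beta_hat = X'y.
# np.linalg.solve avoids forming the explicit inverse of X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With this little noise, `beta_hat` lands very close to the true coefficient vector (1, 2, −3).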
2. Statistical Inference
Determining the statistical significance of the predictor variables:
• For statistical inferences, we need the assumption that the ε_i are i.i.d. N(0, σ²) (i.i.d. means independent & identically distributed).
• Under this assumption, each LS estimator β̂_j is normally distributed with mean β_j and variance σ²·v_jj, where v_jj is the jth diagonal entry of the matrix V = (X′X)⁻¹.
Deriving a pivotal quantity for the inference on β_j
• Recall that β̂_j ~ N(β_j, σ² v_jj).
• The unbiased estimator of the unknown error variance σ² is given by
  S² = SSE / [n − (k + 1)] = Σ e_i² / [n − (k + 1)] = MSE, with n − (k + 1) d.o.f.
• We also know that W = [n − (k + 1)] S² / σ² = SSE / σ² ~ χ²_{n−(k+1)}, and that W is independent of β̂_j.
• The resulting pivotal quantity is T = (β̂_j − β_j) / SE(β̂_j) ~ t_{n−(k+1)}, which yields the (1 − α) confidence interval:
  β̂_j ± t_{α/2, n−(k+1)} SE(β̂_j),  where SE(β̂_j) = s√v_jj
Derivation of the Hypothesis Test for β_j at the significance level α
Hypotheses:
  H_0: β_j = 0
  H_a: β_j ≠ 0
Test statistic: T = β̂_j / (S√v_jj) ~ t_{n−(k+1)} under H_0.
Now consider the overall test:
  H_0: β_j = 0 for all j, 1 ≤ j ≤ k
  H_a: β_j ≠ 0 for at least one j, 1 ≤ j ≤ k
When H_0 is true, the test statistic F_0 = MSR / MSE ~ f_{k, n−(k+1)}.
• An alternative and equivalent way to make a decision for a statistical test is through the p-value, defined as the probability, computed assuming H_0 is true, of obtaining a test statistic at least as extreme as the one observed.
• For the inference on the mean response μ* = x*′β at a given point x*, the pivotal quantity is
  T = (Ŷ* − μ*) / (s√(x*′V x*)) ~ t_{n−(k+1)}
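The t- and F-tests above can be sketched numerically. This is a hedged illustration on simulated data (the coefficients and sample size are made up, not from the slides): each β̂_j gets a t statistic β̂_j / SE(β̂_j), and the overall significance of the regression is judged by F_0 = MSR/MSE.

```python
import numpy as np
from scipy import stats

# Simulated data: intercept 0.5, one strong predictor (1.5), one null predictor (0.0).
rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.5, 0.0]) + rng.normal(size=n)

V = np.linalg.inv(X.T @ X)                  # V = (X'X)^{-1}
beta_hat = V @ X.T @ y
resid = y - X @ beta_hat
dof = n - (k + 1)
s2 = resid @ resid / dof                    # S^2 = MSE = SSE / (n - (k+1))
se = np.sqrt(s2 * np.diag(V))               # SE(beta_hat_j) = s * sqrt(v_jj)
t_stats = beta_hat / se                     # t statistic for H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

# Overall F-test: F0 = MSR / MSE ~ f_{k, n-(k+1)} under H0
sst = np.sum((y - y.mean()) ** 2)
msr = (sst - resid @ resid) / k
F0 = msr / s2
p_F = stats.f.sf(F0, k, dof)
```

Here the p-value for the strong predictor and for the overall F-test come out tiny, while the null predictor's p-value is typically large.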
3.1 Multicollinearity
How does multicollinearity cause difficulties?
1. β̂ is the solution to the equation X′X β̂ = X′y; thus X′X must be invertible in order for β̂ to be unique and computable.
2. When the predictors are nearly collinear, the matrix V = (X′X)⁻¹ has very large elements. Therefore Var(β̂_j) = σ² v_jj is inflated.
Measures of Multicollinearity
1. The correlation matrix R.
   Easy, but it can't reflect linear relationships among more than two variables.
2. The determinant of R can be used as a measure of the singularity of X′X.
3. Variance Inflation Factors (VIF): the diagonal elements of R⁻¹. VIF > 10 is regarded as unacceptable.
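Measure 3 can be sketched directly: the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The data below are simulated for illustration, with x1 and x2 nearly collinear and x3 unrelated.

```python
import numpy as np

# Simulated predictors: x2 is x1 plus a little noise, x3 is independent.
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                    # unrelated predictor

# VIFs = diagonal elements of R^{-1}, R = correlation matrix of the predictors
R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
vif = np.diag(np.linalg.inv(R))
```

Here the VIFs of x1 and x2 blow up far past the VIF > 10 threshold, while the VIF of x3 stays near 1.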
3.2 Polynomial Regression
A special case of a linear model:
  y = β_0 + β_1 x + … + β_k x^k + ε
Problems:
1. The powers of x, i.e., x, x², …, x^k, tend to be highly correlated.
2. If k is large, the magnitudes of these powers tend to vary over a rather wide range.
So, keeping k ≤ 3 is a good idea, and almost never use k > 5.
Solutions
1. Centering the x-variable:
  y = β*_0 + β*_1 (x − x̄) + … + β*_k (x − x̄)^k + ε
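The effect of centering can be checked numerically. A minimal sketch on simulated data (the range 10–12 is an arbitrary choice to put x far from zero): x and x² are nearly perfectly correlated before centering, and nearly uncorrelated after.

```python
import numpy as np

# x far from zero: x and x^2 are then nearly collinear.
rng = np.random.default_rng(3)
x = rng.uniform(10, 12, size=200)
corr_raw = np.corrcoef(x, x ** 2)[0, 1]

# Centering the x-variable removes most of that correlation.
xc = x - x.mean()
corr_centered = np.corrcoef(xc, xc ** 2)[0, 1]
```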
Why don’t we just use c indicator variables:
x1 , x2 ,..., xc ?
Example of the dummy variables
For instance, suppose we have four years of quarterly sales data for a certain brand of soda. How can we model the time trend by fitting a multiple regression equation?
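One hypothetical coding for this example (the trend term and baseline choice are illustrative assumptions, not from the slides): the c = 4 quarters are coded with c − 1 = 3 indicator variables, with quarter 4 as the baseline, plus a linear time trend over the 16 quarters.

```python
import numpy as np

quarters = np.tile([1, 2, 3, 4], 4)        # four years of quarterly data
t = np.arange(1, 17)                       # time index 1..16
d1 = (quarters == 1).astype(float)         # indicator for quarter 1
d2 = (quarters == 2).astype(float)         # indicator for quarter 2
d3 = (quarters == 3).astype(float)         # indicator for quarter 3
X = np.column_stack([np.ones(16), t, d1, d2, d3])   # 16 x 5 design matrix
```

Using all c = 4 indicators together with the constant column would make the columns of X linearly dependent (the four indicators sum to the intercept column), so X′X would be singular; that is why only c − 1 indicators are used.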
3. Once the dummy variables are included, the
resulting regression model is referred to as a
“General Linear Model”.
Y = height of child
x1 = height of father
x2 = height of mother
x3 = gender of child
Example
In matrix notation:
  Y = Xβ + ε
  β̂ = (X′X)⁻¹ X′y
Example
The first rows of the design matrix and response (constant term, father's height, mother's height, gender, child's height):

  1  Father  Mother  Gender | Child
  1  78.5    67      1      | 73.2
  1  78.5    67      1      | 69.2
  1  78.5    67      0      | 69
  1  78.5    67      0      | 69
  1  75      64      1      | 71
  1  75      64      0      | 68
  …  …       …       …      | …
Example
Important calculations:
  SSE = Σ (y_i − ŷ_i)² = 4149.162
  SST = Σ (y_i − ȳ)² = 11,515.060
  ŷ_i = X_i β̂ is the predicted height of each child given a set of predictor variables
  r² = 1 − SSE/SST = 0.6397
  MSE = SSE / [n − (k + 1)] = 4.64
Example
Are these values significantly different from zero?
  H_0: β_j = 0
  H_a: β_j ≠ 0
Reject H_0 if
  |t_j| = |β̂_j / SE(β̂_j)| > t_{n−(k+1), α/2} = t_{894, 0.025} = 2.245

  V = (X′X)⁻¹ =
    [ 1.63     0.0118     0.0125     0.00596
      0.0118   0.000184   0.0000143  0.0000223
      0.0125   0.0000143  0.000211   0.0000327
      0.00596  0.0000223  0.0000327  0.00447  ]

  SE(β̂_j) = √(MSE · v_jj)
Example
[Selected SAS output: β-estimates, standard errors, and t-values; the Analysis of Variance table (DF, Sum of Squares, Mean Square, F Value, Pr > F); and the Parameter Estimates table (DF, Estimate, Standard Error, t Value, Pr > |t|). The variance inflation factors are:]

Variable    Label      DF  Variance Inflation
Intercept   Intercept  1   0
father      father     1   1.00607
mother      mother     1   1.00660
gender                 1   1.00188
EXAMPLE
By Gary Bedford & Christine Vendikos
5. Variable Selection Methods
A. Stepwise Regression
Variable selection methods
(1) Why do we need to select the variables?
Stepwise Regression
(p − 1)-variable model:
  Y_i = β_0 + β_1 x_i,1 + … + β_{p−1} x_i,p−1 + ε_i
p-variable model:
  Y_i = β_0 + β_1 x_i,1 + … + β_{p−1} x_i,p−1 + β_p x_i,p + ε_i
Partial F-test
Hypothesis test:
  H_0p: β_p = 0
  H_1p: β_p ≠ 0
F statistic:
  F_p = [(SSE_{p−1} − SSE_p) / 1] / {SSE_p / [n − (p + 1)]};  reject H_0p when F_p > f_{1, n−(p+1), α}
Equivalent t-test:
  t_p = β̂_p / SE(β̂_p);  reject H_0p if |t_p| > t_{n−(p+1), α/2}
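The partial F-test above can be sketched numerically. This is an illustration on simulated data (the variable names and coefficients are made up): the (p − 1)-variable model drops x_p, and F_p compares the two SSEs.

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """SSE of the least-squares fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

# Simulated data where x2 genuinely contributes.
rng = np.random.default_rng(4)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + 1.5 * x2 + rng.normal(size=n)

X_reduced = np.column_stack([np.ones(n), x1])     # (p-1)-variable model
X_full = np.column_stack([np.ones(n), x1, x2])    # p-variable model
p = 2

# F_p = (SSE_{p-1} - SSE_p) / (SSE_p / [n - (p+1)]) ~ f_{1, n-(p+1)} under H_0p
Fp = (sse(X_reduced, y) - sse(X_full, y)) / (sse(X_full, y) / (n - (p + 1)))
p_value = stats.f.sf(Fp, 1, n - (p + 1))
```

Since x2 really belongs in the model, F_p is large and H_0p is rejected at any usual α.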
Partial correlation coefficients
5. Variable Selection Methods
A. Stepwise Regression: SAS Example
Example 11.5 (T&D pg. 416), 11.9 (T&D pg. 431)
The following table shows data on the heat evolved in calories during the hardening of
cement on a per gram basis (y) along with the percentages of four ingredients: tricalcium
aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium
silicate (x4).
No. X1 X2 X3 X4 Y
1 7 26 6 60 78.5
2 1 29 15 52 74.3
3 11 56 8 20 104.3
4 11 31 8 47 87.6
5 7 52 6 33 95.9
6 11 55 9 22 109.2
7 3 71 17 6 102.7
8 1 31 22 44 72.5
9 2 54 18 22 93.1
10 21 47 4 26 115.9
11 1 40 23 34 83.8
12 11 66 9 12 113.3
13 10 68 8 12 109.4
SAS Program (stepwise variable selection is used)
data example115;
input x1 x2 x3 x4 y;
datalines;
7 26 6 60 78.5
1 29 15 52 74.3
11 56 8 20 104.3
11 31 8 47 87.6
7 52 6 33 95.9
11 55 9 22 109.2
3 71 17 6 102.7
1 31 22 44 72.5
2 54 18 22 93.1
21 47 4 26 115.9
1 40 23 34 83.8
11 66 9 12 113.3
10 68 8 12 109.4
;
run;
proc reg data=example115;
model y = x1 x2 x3 x4 /selection=stepwise;
run;
Selected SAS output
[Parameter estimates table: Variable, Parameter Estimate, Standard Error, Type II SS, F Value, Pr > F]
SAS Output (cont)
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
5. Variable Selection Methods
B. Best Subsets Regression
Best Subsets Regression
Best Subsets Regression
Optimality Criteria
• r²_p-Criterion:
  r²_p = SSR_p / SST = 1 − SSE_p / SST
• Adjusted r²_p-Criterion:
  r²_adj,p = 1 − MSE_p / MST
• C_p-Criterion (recommended for its ease of computation and its ability to judge the predictive power of a model). The criterion is based on the standardized total mean square error of prediction,
  Γ_p = (1/σ²) Σ_{i=1}^{n} [E(Ŷ_ip) − E(Y_i)]²,
estimated by
  C_p = SSE_p / σ̂² + 2(p + 1) − n
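The C_p computation above can be sketched as follows. This is an illustration on simulated data (three candidate predictors, only the first of which matters), with σ̂² taken as the MSE of the full model:

```python
import numpy as np

def sse(X, y):
    """SSE of the least-squares fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(5)
n = 80
Z = rng.normal(size=(n, 3))                # three candidate predictors
y = 1 + 2 * Z[:, 0] + rng.normal(size=n)   # only the first predictor matters
ones = np.ones((n, 1))

# sigma_hat^2 = MSE of the full model (intercept + all 3 predictors)
sigma2_hat = sse(np.hstack([ones, Z]), y) / (n - 4)

def cp(cols):
    """C_p = SSE_p / sigma_hat^2 + 2(p + 1) - n for the subset `cols`."""
    p = len(cols)
    return sse(np.hstack([ones, Z[:, cols]]), y) / sigma2_hat + 2 * (p + 1) - n
```

For the full model, C_p = (n − 4) + 2·4 − n = 4 = p + 1 by construction; a good subset is one whose C_p is close to p + 1, while subsets missing an important predictor (e.g. dropping the first) show a much larger C_p.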
Best Subsets Regression
Algorithm
Note that our problem is to find the minimum of a given criterion function. Two approaches:
• Use the stepwise regression algorithm, replacing the partial F criterion with another criterion such as C_p.
• Enumerate all possible cases and find the minimum of the criterion function.
Other possibilities?
Best Subsets Regression & SAS
proc reg data=example115;
model y = x1 x2 x3 x4 /selection=ADJRSQ;
run;
For the selection option, SAS has implemented 9 methods
in total. For best subset method, we have the following
options:
Maximum R2 Improvement (MAXR)
Minimum R2 Improvement (MINR)
R2 Selection (RSQUARE)
Adjusted R2 Selection (ADJRSQ)
Mallows' Cp Selection (CP)
6. Building A Multiple
Regression Model
Steps and Strategy
Modeling is an iterative process. Several cycles of the steps may be needed before arriving at the final model.
The basic process consists of seven steps.
Get started and Follow the Steps

Step III
Step IV
Divide the data:
• Training set: for building the model
• Test set: for checking the model
How to divide?
• Large sample: half and half
• Small sample: make the size of the training set > 16
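The half-and-half split for a large sample can be sketched in a couple of lines (a minimal illustration; n = 100 is an arbitrary choice):

```python
import numpy as np

# Randomly permute the observation indices, then split half-and-half:
# the first half for the training set, the second half for the test set.
rng = np.random.default_rng(6)
n = 100
idx = rng.permutation(n)
train_idx, test_idx = idx[: n // 2], idx[n // 2 :]
```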
Step V

Step VI

Step VII

Regression Diagnostics (Step VI)

Linear Regression Assumptions
Residual Plot for Functional Form (Linearity)
[Plots of residuals e versus X]
Residual Plot for Equal Variance
[Plots of standardized residuals SR versus X; a fan-shaped pattern indicates unequal variance]
Standardized residuals are typically used (the residual divided by the standard error of prediction).
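The standardized residuals used in these plots can be sketched numerically. One common definition (an assumption here, consistent with "residual divided by standard error of prediction") is SR_i = e_i / (s·√(1 − h_ii)), where h_ii is the ith diagonal element of the hat matrix H = X(X′X)⁻¹X′; the data below are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
e = y - H @ y                              # residuals
s = np.sqrt(e @ e / (n - 2))               # sqrt(MSE), one predictor (k = 1)
sr = e / (s * np.sqrt(1 - np.diag(H)))     # standardized residuals
```

Plotting `sr` against X (or against the fitted values) gives the diagnostic plots sketched above; when the model assumptions hold, the points scatter randomly within roughly ±3.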
Residual Plot for Independence
[Plots of standardized residuals SR versus X]
Questions?
www.ams.sunysb.edu/~zhu
zhu@ams.sunysb.edu