Advanced regression methods
1. Reminders on linear regression

V. Lefieux
Table of contents

Case study

Simple linear regression

Gaussian simple linear regression

Multiple linear regression

Gaussian multiple linear regression

Regression in practice

References
Forecast the ozone concentration

112 records from Rennes (summer 2001; source: Laboratoire de mathématiques appliquées, Agrocampus Ouest), containing:

- MaxO3: daily maximum of the ozone concentration (µg/m³).
- T9, T12, T15: temperature (at 09:00, 12:00 and 15:00).
- Ne9, Ne12, Ne15: cloud cover (at 09:00, 12:00 and 15:00).
- Vx9, Vx12, Vx15: east-west component of the wind (at 09:00, 12:00 and 15:00).
- MaxO3v: daily maximum of the ozone concentration of the previous day.
- wind: wind direction at 12:00.
- rain: rainy or dry day.
A linear link?

[Figure omitted in this text version: scatterplot suggesting a linear link between MaxO3 and an explanatory variable.]
Terminology

The word regression was introduced by Francis Galton in Regression towards mediocrity in hereditary stature (Galton, 1886), available at http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf.

He observed that extreme heights in parents are not completely passed on to their offspring.

The best-known regression model is the linear model, simple or multivariate (Cornillon and Matzner-Løber, 2010), but there are many other models, including non-linear models (Antoniadis et al., 1992).
Assumptions

$Y$: dependent variable, random.
$X$: explanatory variable, deterministic.

Simple linear regression assumes:
\[
Y = \beta_1 + \beta_2 X + \varepsilon
\]
where:

- $\beta_1$ and $\beta_2$ are unknown parameters (unobserved),
- $\varepsilon$, the model error, is a centered random variable with variance $\sigma^2$:
\[
\mathbb{E}(\varepsilon) = 0, \qquad \mathrm{Var}(\varepsilon) = \sigma^2 .
\]
Sample

Let $(x_i, y_i)_{i \in \{1,\dots,n\}}$ be $n$ realizations of $(X, Y)$:
\[
\forall i \in \{1,\dots,n\} : \quad y_i = \beta_1 + \beta_2 x_i + \varepsilon_i .
\]

One assumes that:

- $(y_i)_{i \in \{1,\dots,n\}}$ is an i.i.d. sample of $Y$.
- $(x_i)_{i \in \{1,\dots,n\}}$ are deterministic (and observed).
- $(\varepsilon_i)_{i \in \{1,\dots,n\}}$ is an i.i.d. sample of $\varepsilon$ (unobserved).

The $(\varepsilon_i)_{i \in \{1,\dots,n\}}$ satisfy, for $(i, j) \in \{1,\dots,n\}^2$:

- $\mathbb{E}(\varepsilon_i) = 0$,
- $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
- $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ if $i \neq j$.
Matrix form 1/3

\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} .
\]
Matrix form 2/3

Or:
\[
\mathbf{Y} = \mathbf{X}\beta + \varepsilon
\]
where:
\[
\mathbf{Y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} .
\]
Matrix form 3/3

We have:
\[
\mathbb{E}(\varepsilon) = 0, \qquad \Sigma_{\varepsilon} = \sigma^2 I_n ,
\]
where $\Sigma_{\varepsilon}$ is the variance-covariance matrix of $\varepsilon$.
Ordinary least squares estimator 1/5

The ordinary least squares (OLS) estimators of $\beta_1$ and $\beta_2$, $\hat\beta_1$ and $\hat\beta_2$, minimize:
\[
S(\beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2
\]
so:
\[
\left(\hat\beta_1, \hat\beta_2\right)
= \operatorname*{arg\,min}_{(\beta_1, \beta_2)} S(\beta_1, \beta_2)
= \operatorname*{arg\,min}_{(\beta_1, \beta_2)} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2 .
\]

$\sigma^2$ must also be estimated.
Ordinary least squares estimator 2/5

The regression line is:

[Figure omitted: scatterplot with the fitted regression line.]

where $\hat y_i = \hat\beta_1 + \hat\beta_2 x_i$.
Ordinary least squares estimator 3/5

It isn't the (orthogonal) distance between the points and the regression line that we minimize, but the vertical deviations:

[Figure omitted: scatterplot showing the vertical deviations from the line.]
Ordinary least squares estimator 4/5

It's possible to consider other distances, for example:
\[
S(\beta_1, \beta_2) = \sum_{i=1}^{n} \left| y_i - \beta_1 - \beta_2 x_i \right| .
\]

Ordinary least squares estimators are sensitive to outliers, but they are easily obtained (by differentiation) and are unique (if the $x_i$ aren't all equal).
Ordinary least squares estimator 5/5

We assume that the $(x_i)_{i \in \{1,\dots,n\}}$ aren't all equal (otherwise the columns of $\mathbf{X}$ are collinear and the solution isn't unique).

The ordinary least squares estimator of $(\beta_1, \beta_2)$ is:
\[
\hat\beta_2 = \frac{s_{X,Y}}{s_X^2}, \qquad
\hat\beta_1 = \bar y - \hat\beta_2 \bar x .
\]
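As a quick numerical check of these closed-form expressions, here is a minimal sketch (not from the slides) assuming Python with numpy; the simulated data and all variable names are illustrative assumptions.

```python
# Illustrative sketch: closed-form OLS for simple linear regression.
import numpy as np

rng = np.random.default_rng(0)
n = 112
x = rng.uniform(15, 35, size=n)                  # deterministic-like regressor values
eps = rng.normal(0, 15, size=n)                  # centered errors with variance sigma^2
y = 40 + 2.5 * x + eps                           # y_i = beta1 + beta2 * x_i + eps_i

s_xy = np.mean((x - x.mean()) * (y - y.mean()))  # empirical covariance s_{X,Y}
s_xx = np.mean((x - x.mean()) ** 2)              # empirical variance s_X^2

beta2_hat = s_xy / s_xx                          # slope: s_{X,Y} / s_X^2
beta1_hat = y.mean() - beta2_hat * x.mean()      # intercept: y_bar - beta2_hat * x_bar
print(beta1_hat, beta2_hat)                      # close to the true (40, 2.5)
```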
Proof 1/3

We search for $\hat\beta_1$ and $\hat\beta_2$ which minimize:
\[
S(\beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2 .
\]

The Hessian matrix of the function
\[
S : \mathbb{R}^2 \to \mathbb{R}, \quad (\beta_1, \beta_2) \mapsto S(\beta_1, \beta_2)
\]
is positive definite.

The Hessian matrix, $H(S)$ (or $\nabla^2 S$), is:
\[
H(S) =
\begin{pmatrix}
\dfrac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_1^2} & \dfrac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_1 \partial \beta_2} \\[2ex]
\dfrac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_2 \partial \beta_1} & \dfrac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_2^2}
\end{pmatrix} .
\]
Proof 2/3

We search for the values such that
\[
\nabla S(\beta_1, \beta_2) =
\begin{pmatrix}
\dfrac{\partial S(\beta_1,\beta_2)}{\partial \beta_1} \\[2ex]
\dfrac{\partial S(\beta_1,\beta_2)}{\partial \beta_2}
\end{pmatrix} = 0 .
\]

We have:
\[
\frac{\partial S(\beta_1,\beta_2)}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i), \qquad
\frac{\partial S(\beta_1,\beta_2)}{\partial \beta_2} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_1 - \beta_2 x_i) .
\]

So:
\[
\begin{cases}
\sum_{i=1}^{n} \left( y_i - \hat\beta_1 - \hat\beta_2 x_i \right) = 0 \\[1ex]
\sum_{i=1}^{n} x_i \left( y_i - \hat\beta_1 - \hat\beta_2 x_i \right) = 0 .
\end{cases}
\]
Proof 3/3

The first equation gives:
\[
n \hat\beta_1 + \hat\beta_2 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i
\;\Longleftrightarrow\;
\hat\beta_1 + \hat\beta_2 \bar x = \bar y .
\]

The second equation gives:
\[
\hat\beta_1 \sum_{i=1}^{n} x_i + \hat\beta_2 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i .
\]

So:
\[
\left( \bar y - \hat\beta_2 \bar x \right) \sum_{i=1}^{n} x_i + \hat\beta_2 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
\]
\[
\hat\beta_2 \left( \sum_{i=1}^{n} x_i^2 - \bar x \sum_{i=1}^{n} x_i \right) = \sum_{i=1}^{n} x_i y_i - \bar y \sum_{i=1}^{n} x_i
\]
\[
\hat\beta_2 = \frac{\frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar x \bar y}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar x^2} = \frac{s_{X,Y}}{s_X^2} .
\]
Regression line

The regression line is:
\[
y = \hat\beta_1 + \hat\beta_2 x .
\]

The barycenter $(\bar x, \bar y)$ belongs to the regression line.
Fitted value

The fitted value of the $i$-th observation is:
\[
\hat y_i = \hat\beta_1 + \hat\beta_2 x_i .
\]
Residual

The residual of the $i$-th observation is:
\[
e_i = y_i - \hat y_i .
\]
$e_i$ is an estimation of $\varepsilon_i$.

We have:
\[
S\left(\hat\beta_1, \hat\beta_2\right) = \sum_{i=1}^{n} e_i^2
\qquad \text{and} \qquad
S(\beta_1, \beta_2) = \sum_{i=1}^{n} \varepsilon_i^2 .
\]
Properties of fitted values

- $\displaystyle \sum_{i=1}^{n} \hat y_i = \sum_{i=1}^{n} y_i .$
- $\displaystyle \mathrm{Var}(\hat y_i) = \sigma^2 \left( \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_{j=1}^{n} (x_j - \bar x)^2} \right) .$
Properties of residuals

\[
\sum_{i=1}^{n} e_i = 0 .
\]
Properties of $\hat\beta_1$ and $\hat\beta_2$

- $\hat\beta_1$ and $\hat\beta_2$ are unbiased estimators of $\beta_1$ and $\beta_2$:
\[
\forall i \in \{1, 2\} : \quad \mathbb{E}\left(\hat\beta_i\right) = \beta_i .
\]
- $\displaystyle \mathrm{Var}\left(\hat\beta_1\right) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2} .$
- $\displaystyle \mathrm{Var}\left(\hat\beta_2\right) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} .$
- $\displaystyle \mathrm{Cov}\left(\hat\beta_1, \hat\beta_2\right) = \frac{-\sigma^2 \, \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2} .$
Linear estimators

It's possible to find $(\lambda_1, \lambda_2) \in (\mathbb{R}^n)^2$ such that $\hat\beta_1 = \lambda_1^\top \mathbf{Y}$ and $\hat\beta_2 = \lambda_2^\top \mathbf{Y}$:
\[
\hat\beta_1 = \bar y - \hat\beta_2 \bar x = \sum_{i=1}^{n} \left( \frac{1}{n} - \frac{\bar x \, (x_i - \bar x)}{\sum_{j=1}^{n} (x_j - \bar x)^2} \right) y_i ,
\]
\[
\hat\beta_2 = \sum_{i=1}^{n} \frac{x_i - \bar x}{\sum_{j=1}^{n} (x_j - \bar x)^2} \, y_i .
\]
Gauss-Markov theorem

The theorem states that the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ have the smallest variances among linear unbiased estimators.

They are BLUE: Best Linear Unbiased Estimators.
Residual variance

We consider the following unbiased estimator of $\sigma^2$, called the residual variance:
\[
\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 .
\]
Forecast

For a new observation $x_{n+1}$, we call forecast (or prediction):
\[
\hat y^{\,p}_{n+1} = \hat\beta_1 + \hat\beta_2 x_{n+1} .
\]

It's a forecast of:
\[
y_{n+1} = \beta_1 + \beta_2 x_{n+1} + \varepsilon_{n+1} .
\]

We have:
\[
\mathrm{Var}\left(\hat y^{\,p}_{n+1}\right) = \sigma^2 \left( \frac{1}{n} + \frac{(x_{n+1} - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right) .
\]
Forecast error

The forecast error is:
\[
e^{\,p}_{n+1} = y_{n+1} - \hat y^{\,p}_{n+1} .
\]

We have:
\[
\mathbb{E}\left(e^{\,p}_{n+1}\right) = 0
\]
and:
\[
\mathrm{Var}\left(e^{\,p}_{n+1}\right) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_{n+1} - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right) .
\]
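The following is a minimal numerical sketch (not from the slides) of the forecast and of the forecast-error variance above, assuming Python with numpy; the simulated data and the value x_new are illustrative assumptions.

```python
# Illustrative sketch: forecast for a new point and the variance of the forecast error.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=n)

beta2_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.mean((x - x.mean()) ** 2)
beta1_hat = y.mean() - beta2_hat * x.mean()
e = y - (beta1_hat + beta2_hat * x)              # residuals
sigma2_hat = e @ e / (n - 2)                     # unbiased residual variance

x_new = 7.0                                      # hypothetical new observation x_{n+1}
y_pred = beta1_hat + beta2_hat * x_new
sxx = np.sum((x - x.mean()) ** 2)
var_forecast_error = sigma2_hat * (1 + 1 / n + (x_new - x.mean()) ** 2 / sxx)
print(y_pred, np.sqrt(var_forecast_error))       # forecast and its standard error
```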
Geometric interpretation 1/2

We have:
\[
\mathbf{Y} = \beta_1 \mathbf{1}_n + \beta_2 \mathbf{X} + \varepsilon
\]
where
\[
\mathbf{X} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} .
\]

We can rewrite:
\[
S(\beta_1, \beta_2) = \sum_{i=1}^{n} \left( y_i - (\beta_1 + \beta_2 x_i) \right)^2 = \left\| \mathbf{Y} - (\beta_1 \mathbf{1}_n + \beta_2 \mathbf{X}) \right\|^2
\]
where $\|x\|$ is the Euclidean norm of $x \in \mathbb{R}^p$.
Geometric interpretation 2/2

We have:
\[
\left\| \mathbf{Y} - \hat{\mathbf{Y}} \right\|^2 = \min_{(\beta_1, \beta_2)} \left\| \mathbf{Y} - (\beta_1 \mathbf{1}_n + \beta_2 \mathbf{X}) \right\|^2
\]
with $\hat{\mathbf{Y}} = \hat\beta_1 \mathbf{1}_n + \hat\beta_2 \mathbf{X}$.

$\hat{\mathbf{Y}}$ is the orthogonal projection of $\mathbf{Y}$ onto the subspace spanned by $\mathbf{1}_n$ and $\mathbf{X}$ ($\mathrm{sp}\{\mathbf{1}_n, \mathbf{X}\}$).

$\hat\beta_1$ and $\hat\beta_2$ are the coordinates of the projection of $\mathbf{Y}$.
Introduction to the coefficient of determination

We want to explain the variations of $Y$ by the variations of $X$.
The variations of $Y$ are the differences between the $y_i$ and their average $\bar y$: $y_i - \bar y$.

We can write:
\[
y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i)
\]
where $\hat y_i - \bar y$ is the variation explained by the model.
Analysis of variance

ANOVA (ANalysis Of VAriance) states:
\[
\mathrm{SST} = \mathrm{SSM} + \mathrm{SSR}
\]
\[
\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2
\]
\[
\left\| \mathbf{Y} - \bar y \mathbf{1}_n \right\|^2 = \left\| \hat{\mathbf{Y}} - \bar y \mathbf{1}_n \right\|^2 + \left\| \mathbf{Y} - \hat{\mathbf{Y}} \right\|^2
\]
where SST is the Sum of Squares Total, SSM is the Sum of Squares Model and SSR is the Sum of Squares Residual.
Coefficient of determination 1/3

The coefficient of determination $\mathrm{R}^2$ is defined by:
\[
\mathrm{R}^2 = \frac{\mathrm{SSM}}{\mathrm{SST}} = \frac{\left\| \hat{\mathbf{Y}} - \bar y \mathbf{1}_n \right\|^2}{\left\| \mathbf{Y} - \bar y \mathbf{1}_n \right\|^2} .
\]

$\mathrm{R}^2 \in [0, 1]$ because:
\[
0 \leq \mathrm{SSM} \leq \mathrm{SST} .
\]

If $\mathrm{R}^2 = 1$ then $\mathrm{SSM} = \mathrm{SST}$.
If $\mathrm{R}^2 = 0$ then $\mathrm{SSR} = \mathrm{SST}$.
Coefficient of determination 2/3

For the simple linear regression:
\[
\frac{1}{n} \sum_{i=1}^{n} (\hat y_i - \bar y)^2
= \frac{1}{n} \sum_{i=1}^{n} \left( \bar y - \frac{s_{X,Y}}{s_X^2} \bar x + \frac{s_{X,Y}}{s_X^2} x_i - \bar y \right)^2
= \left( \frac{s_{X,Y}}{s_X^2} \right)^2 \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2
= \left( \frac{s_{X,Y}}{s_X^2} \right)^2 s_X^2
= \left( \frac{s_{X,Y}}{s_X} \right)^2 .
\]

So:
\[
\mathrm{R}^2 = \frac{\mathrm{SSM}}{\mathrm{SST}}
= \frac{s_{\hat Y}^2}{s_Y^2}
= \frac{\frac{1}{n} \sum_{i=1}^{n} (\hat y_i - \bar y)^2}{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar y)^2}
= \left( \frac{s_{X,Y}}{s_X s_Y} \right)^2
= r_{X,Y}^2 .
\]
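A small numerical check of this identity (not from the slides), assuming Python with numpy; the simulated data and names are illustrative.

```python
# Illustrative sketch: in simple linear regression, R^2 equals the squared correlation r_{X,Y}^2.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 3 - 2 * x + rng.normal(scale=2, size=200)

beta2 = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)   # s_{X,Y} / s_X^2
beta1 = y.mean() - beta2 * x.mean()
y_hat = beta1 + beta2 * x

r2_anova = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)   # SSM / SST
r2_corr = np.corrcoef(x, y)[0, 1] ** 2                                     # r_{X,Y}^2
print(r2_anova, r2_corr)                                                   # the two values coincide
```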
Coefficient of determination 3/3

Don't over-interpret the coefficient of determination:

- For a good model, $\mathrm{R}^2$ is close to 1.
- But an $\mathrm{R}^2$ close to 1 doesn't indicate that a linear model is adequate.
- An $\mathrm{R}^2$ close to 0 indicates that a linear model isn't adequate (some non-linear models could be adequate).
Limitations

It's impossible to build confidence intervals on the parameters, or to test their significance: it becomes possible if we add a distributional assumption.

The Gaussian simple linear regression assumes that the error $\varepsilon$ is Gaussian.
Assumptions

We add:
\[
\varepsilon \sim \mathcal{N}\left(0, \sigma^2\right) .
\]

So $(\varepsilon_i)_{i \in \{1,\dots,n\}}$ is an i.i.d. sample with distribution $\mathcal{N}\left(0, \sigma^2\right)$.
Distribution of Y

We have:
\[
\mathbf{Y} \sim \mathcal{N}_n \left(
\begin{pmatrix} \beta_1 + \beta_2 x_1 \\ \vdots \\ \beta_1 + \beta_2 x_n \end{pmatrix},
\; \sigma^2 I_n \right) .
\]
Maximum likelihood estimation 1/3

The likelihood of $(\varepsilon_1, \dots, \varepsilon_n)$ is:
\[
p\left(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2\right)
= \frac{1}{\left(\sigma \sqrt{2\pi}\right)^n} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{y_i - \beta_1 - \beta_2 x_i}{\sigma} \right)^2 \right)
= \frac{1}{\left(\sigma \sqrt{2\pi}\right)^n} \exp\left( -\frac{1}{2\sigma^2} S(\beta_1, \beta_2) \right) .
\]

So the log-likelihood is:
\[
\ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2\right)
= -n \log\left(\sqrt{2\pi}\right) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} S(\beta_1, \beta_2) .
\]
Maximum likelihood estimation 2/3

We obtain the maximum by solving:
\[
\frac{\partial \ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2\right)}{\partial \beta_1} = 0
\;\Longleftrightarrow\;
\frac{\partial S\left(\hat\beta_{1,\mathrm{ML}}, \hat\beta_{2,\mathrm{ML}}\right)}{\partial \beta_1} = 0 ,
\]
\[
\frac{\partial \ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2\right)}{\partial \beta_2} = 0
\;\Longleftrightarrow\;
\frac{\partial S\left(\hat\beta_{1,\mathrm{ML}}, \hat\beta_{2,\mathrm{ML}}\right)}{\partial \beta_2} = 0 ,
\]
\[
\frac{\partial \ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2\right)}{\partial \sigma^2} = 0
\;\Longleftrightarrow\;
-\frac{n}{2 \hat\sigma^2_{\mathrm{ML}}} + \frac{1}{2 \hat\sigma^4_{\mathrm{ML}}} S\left(\hat\beta_{1,\mathrm{ML}}, \hat\beta_{2,\mathrm{ML}}\right) = 0 .
\]

The maximum likelihood and OLS estimators of $\beta_1$ and $\beta_2$ are the same.
Maximum likelihood estimation 3/3

We have:
\[
\hat\sigma^2_{\mathrm{ML}} = \frac{S\left(\hat\beta_1, \hat\beta_2\right)}{n}
= \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat\beta_1 - \hat\beta_2 x_i \right)^2
= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat y_i)^2
= \frac{1}{n} \sum_{i=1}^{n} e_i^2 .
\]

The maximum likelihood estimator of $\sigma^2$ is biased (but asymptotically unbiased).
Parameters distribution 1/6

With $\beta = (\beta_1, \beta_2)^\top$ and $\hat\beta = \left(\hat\beta_1, \hat\beta_2\right)^\top$, we have:
\[
\hat\beta \sim \mathcal{N}_2\left(\beta, \sigma^2 V\right)
\]
where:
\[
V = \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2}
\begin{pmatrix}
\frac{1}{n} \sum_{i=1}^{n} x_i^2 & -\bar x \\
-\bar x & 1
\end{pmatrix} .
\]
Parameters distribution 2/6

So:
\[
\hat\beta_1 \sim \mathcal{N}\left( \beta_1, \; \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2} \right), \qquad
\hat\beta_2 \sim \mathcal{N}\left( \beta_2, \; \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right) .
\]

The two estimators aren't independent.
Parameters distribution 3/6

The Cochran theorem gives:
\[
(n-2) \frac{\hat\sigma^2}{\sigma^2} = \frac{\sum_{i=1}^{n} e_i^2}{\sigma^2} \sim \chi^2_{n-2}
\]
and:
\[
\hat\beta \perp \hat\sigma^2 .
\]
Cochran theorem

Consider $X \sim \mathcal{N}_n(\mu, \sigma^2 I_n)$.

Let $H$ be the orthogonal projection matrix onto a $p$-dimensional subspace of $\mathbb{R}^n$.

The Cochran theorem states that:

- $H X \sim \mathcal{N}_n(H\mu, \sigma^2 H)$.
- $H X \perp X - H X$.
- $\dfrac{\| H (X - \mu) \|^2}{\sigma^2} \sim \chi^2_p$.
Parameters distribution 4/6

With:
\[
\sigma^2_{\hat\beta_1} = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\sigma^2_{\hat\beta_2} = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\]
we have for $i \in \{1, 2\}$:
\[
\hat\beta_i \sim \mathcal{N}\left( \beta_i, \sigma^2_{\hat\beta_i} \right)
\]
and:
\[
\frac{\hat\beta_i - \beta_i}{\sigma_{\hat\beta_i}} \sim \mathcal{N}(0, 1) .
\]
Parameters distribution 5/6

We can estimate $\sigma^2$ by:
\[
\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 .
\]

So:
\[
\hat\sigma^2_{\hat\beta_1} = \frac{\hat\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\hat\sigma^2_{\hat\beta_2} = \frac{\hat\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} .
\]
Parameters distribution 6/6

We have for $i \in \{1, 2\}$:
\[
\frac{\hat\beta_i - \beta_i}{\hat\sigma_{\hat\beta_i}} \sim \mathcal{T}_{n-2}
\]
and:
\[
\frac{1}{2 \hat\sigma^2} \left( \hat\beta - \beta \right)^\top V^{-1} \left( \hat\beta - \beta \right) \sim \mathcal{F}(2, n-2) .
\]
Test on $\beta_1$ and $\beta_2$

For $i \in \{1, 2\}$, we test:
\[
\begin{cases}
H_0 : \beta_i = c \\
H_1 : \beta_i \neq c
\end{cases}
\]
where $c \in \mathbb{R}$ (the test is called a significance test for $c = 0$).

We use the test statistic:
\[
T_i = \frac{\hat\beta_i - c}{\hat\sigma_{\hat\beta_i}} .
\]

We reject $H_0$ at level $\alpha$ if:
\[
|t_i| > t_{n-2, 1-\frac{\alpha}{2}}
\]
where $t_{n-2, 1-\frac{\alpha}{2}}$ is the $\left(1-\frac{\alpha}{2}\right)$-quantile of $\mathcal{T}(n-2)$.
Confidence intervals for $\beta_1$ and $\beta_2$

The confidence interval for $\beta_i$ with level $1-\alpha$ is:
\[
\left[ \hat\beta_i - t_{n-2, 1-\frac{\alpha}{2}} \, \hat\sigma_{\hat\beta_i} \; ; \; \hat\beta_i + t_{n-2, 1-\frac{\alpha}{2}} \, \hat\sigma_{\hat\beta_i} \right] .
\]

It's also possible to build a confidence region for the vector $(\beta_1, \beta_2)$.
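As a sketch of the significance test and confidence interval above (not from the slides), assuming Python with numpy and scipy.stats for the Student quantile; the simulated data and variable names are illustrative assumptions.

```python
# Illustrative sketch: t test of H0: beta_2 = 0 and confidence interval for beta_2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 1, size=n)
y = 0.5 + 1.2 * x + rng.normal(0, 0.3, size=n)

sxx = np.sum((x - x.mean()) ** 2)
beta2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
beta1 = y.mean() - beta2 * x.mean()
e = y - (beta1 + beta2 * x)
sigma2_hat = e @ e / (n - 2)
se_beta2 = np.sqrt(sigma2_hat / sxx)             # estimated standard deviation of beta2_hat

t_stat = beta2 / se_beta2                        # statistic for H0: beta_2 = 0
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # quantile t_{n-2, 1-alpha/2}
reject = abs(t_stat) > t_crit
ci = (beta2 - t_crit * se_beta2, beta2 + t_crit * se_beta2)
print(t_stat, reject, ci)
```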
Introduction

Multiple linear regression assumes that there is a linear link between the dependent variable $Y$ and $p$ explanatory variables $(X_1, \dots, X_p)$:
\[
Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
\]
where $\varepsilon$ is the error of the model.
Sample

Let $(x_{i1}, \dots, x_{ip}, y_i)_{i \in \{1,\dots,n\}}$ be $n$ observations of $(X_1, \dots, X_p, Y)$:
\[
\forall i \in \{1,\dots,n\} : \quad y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i .
\]

One assumes that:

- $(y_i)_{i \in \{1,\dots,n\}}$ is an i.i.d. sample of $Y$.
- $(x_{ij})_{i \in \{1,\dots,n\}, j \in \{1,\dots,p\}}$ are deterministic (and observed).
- $(\varepsilon_i)_{i \in \{1,\dots,n\}}$ is an i.i.d. sample of $\varepsilon$ (unobserved).

The $(\varepsilon_i)_{i \in \{1,\dots,n\}}$ satisfy, for $(i, j) \in \{1,\dots,n\}^2$:

- $\mathbb{E}(\varepsilon_i) = 0$,
- $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
- $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ if $i \neq j$.
Matrix form 1/2

We can write:
\[
\mathbf{Y} = \mathbf{X}\beta + \varepsilon
\]
with:
\[
\mathbf{Y} = \begin{pmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix}
x_{11} & \dots & x_{1p} \\
\vdots & \ddots & \vdots \\
x_{i1} & \dots & x_{ip} \\
\vdots & \ddots & \vdots \\
x_{n1} & \dots & x_{np}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{pmatrix} .
\]
Matrix form 2/2

We have:
\[
\mathbb{E}(\varepsilon) = 0, \qquad \Sigma_{\varepsilon} = \sigma^2 I_n .
\]

So:
\[
\mathbb{E}(\mathbf{Y}) = \mathbf{X}\beta, \qquad \Sigma_{\mathbf{Y}} = \sigma^2 I_n .
\]
Multiple linear regression with or without constant

If the constant is in the model, we take $X_1$ equal to 1:
\[
\forall i \in \{1,\dots,n\} : \quad x_{i1} = 1 .
\]

There are then only $(p-1)$ explanatory variables.
Linearization of problems

It's possible to consider as explanatory variables functions of $X_1, \dots, X_p$ (power, exponential, logarithm, ...).
Ordinary least squares estimator 1/4

The ordinary least squares (OLS) estimator of $\beta = (\beta_1, \dots, \beta_p)^\top$, $\hat\beta = \left(\hat\beta_1, \dots, \hat\beta_p\right)^\top$, minimizes:
\[
S(\beta) = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
= \left\| \mathbf{Y} - \mathbf{X}\beta \right\|^2
= (\mathbf{Y} - \mathbf{X}\beta)^\top (\mathbf{Y} - \mathbf{X}\beta)
= \mathbf{Y}^\top \mathbf{Y} - 2 \beta^\top \mathbf{X}^\top \mathbf{Y} + \beta^\top \mathbf{X}^\top \mathbf{X} \beta .
\]
Ordinary least squares estimator 2/4

So:
\[
\hat\beta = \operatorname*{arg\,min}_{\beta} \left\| \mathbf{Y} - \mathbf{X}\beta \right\|^2
= \operatorname*{arg\,min}_{\beta} \left( \mathbf{Y}^\top \mathbf{Y} - 2 \beta^\top \mathbf{X}^\top \mathbf{Y} + \beta^\top \mathbf{X}^\top \mathbf{X} \beta \right) .
\]

If the rank of the matrix $\mathbf{X}$ is $p$, then $\mathbf{X}^\top \mathbf{X}$ is invertible. This is the case if the columns of $\mathbf{X}$ aren't collinear.
Ordinary least squares estimator 3/4

We minimize $S$ defined by:
\[
S : \mathbb{R}^p \to \mathbb{R}^+, \quad \beta \mapsto S(\beta) = \mathbf{Y}^\top \mathbf{Y} - 2 \beta^\top \mathbf{X}^\top \mathbf{Y} + \beta^\top \mathbf{X}^\top \mathbf{X} \beta .
\]

The gradient of $S$ is:
\[
\nabla S(\beta) = -2 \mathbf{X}^\top \mathbf{Y} + 2 \mathbf{X}^\top \mathbf{X} \beta .
\]

Note that the gradient of $x \mapsto a^\top x$ is $a$ and that the gradient of $x \mapsto x^\top A x$ is $A x + A^\top x$ ($2 A x$ if the matrix $A$ is symmetric).
Ordinary least squares estimator 4/4

The Hessian matrix of $S$ is $2 \mathbf{X}^\top \mathbf{X}$, which is positive definite.

We need to solve:
\[
-\mathbf{X}^\top \mathbf{Y} + \mathbf{X}^\top \mathbf{X} \beta = 0
\;\Longleftrightarrow\;
\mathbf{X}^\top \mathbf{X} \beta = \mathbf{X}^\top \mathbf{Y}
\;\Longleftrightarrow\;
\hat\beta = \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{Y} .
\]
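A minimal sketch of the normal equations (not from the slides), assuming Python with numpy; the design and names are simulated/illustrative, and the linear system is solved rather than inverting X^T X explicitly for numerical stability.

```python
# Illustrative sketch: beta_hat from the normal equations, residual variance and Cov(beta_hat).
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # constant + 2 regressors
beta_true = np.array([1.0, 2.0, -1.5])
y = X @ beta_true + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # solves (X^T X) beta = X^T y
e = y - X @ beta_hat                             # residuals
sigma2_hat = e @ e / (n - p)                     # SSR / (n - p)
cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)

print(beta_hat)                                  # close to beta_true
print(np.sqrt(np.diag(cov_beta_hat)))            # standard errors of the coefficients
```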
Fitted values and residuals

The fitted values are:
\[
\hat{\mathbf{Y}} = \mathbf{X} \hat\beta .
\]

The residuals are:
\[
e = \mathbf{Y} - \hat{\mathbf{Y}} .
\]
Geometric interpretation

$\hat{\mathbf{Y}}$ is the orthogonal projection of $\mathbf{Y}$ onto the subspace spanned by the columns of $\mathbf{X}$: $\mathrm{sp}\{X_1, \dots, X_p\}$.

The projection matrix (hat matrix) is:
\[
H = \mathbf{X} \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top .
\]

We can check that:
\[
\hat{\mathbf{Y}} = \mathbf{X} \hat\beta = \mathbf{X} \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{Y} = H \mathbf{Y} .
\]
Properties of fitted values

- $\mathbb{E}(\hat{\mathbf{Y}}) = \mathbf{X}\beta$ .
- $\Sigma_{\hat{\mathbf{Y}}} = \sigma^2 H$ .
Properties of residuals

- $e \perp \mathrm{sp}\{X_1, \dots, X_p\}$ .
- $e = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - H\mathbf{Y} = H^{\perp} \mathbf{Y}$ where $H^{\perp} = I_n - H$ .
- $\mathbf{X}^\top e = 0$ .
- If the constant is in the model then $\langle e, \mathbf{1}_n \rangle = 0$, so $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} \hat y_i = \sum_{i=1}^{n} y_i$ .
- $\| e \|^2 = \mathbf{Y}^\top H^{\perp} \mathbf{Y}$ .
- $\mathbb{E}(e) = 0$ .
- $\Sigma_e = \sigma^2 H^{\perp}$ .
Properties of $\hat\beta$

- $\hat\beta$ is an unbiased estimator of $\beta$:
\[
\mathbb{E}\left(\hat\beta\right) = \beta .
\]
- $\Sigma_{\hat\beta} = \sigma^2 \left( \mathbf{X}^\top \mathbf{X} \right)^{-1}$ .
Gauss-Markov theorem

$\hat\beta$ is the best linear unbiased estimator (BLUE) of $\beta$.
Estimation of $\sigma^2$

We consider the residual variance as an estimator of $\sigma^2$:
\[
\hat\sigma^2 = \frac{\| e \|^2}{n-p} = \frac{\mathrm{SSR}}{n-p} .
\]

Thus, for the variance-covariance matrix of $\hat\beta$:
\[
\hat\Sigma_{\hat\beta} = \hat\sigma^2 \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} = \frac{\mathrm{SSR}}{n-p} \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} .
\]
ANOVA

With a constant in the model:
\[
\mathrm{SST} = \mathrm{SSM} + \mathrm{SSR}
\]
\[
\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2
\]
\[
\left\| \mathbf{Y} - \bar y \mathbf{1}_n \right\|^2 = \left\| \hat{\mathbf{Y}} - \bar y \mathbf{1}_n \right\|^2 + \left\| \mathbf{Y} - \hat{\mathbf{Y}} \right\|^2 .
\]
Variance decomposition (ANOVA)

Without a constant in the model:
\[
\mathrm{SST} = \mathrm{SSM} + \mathrm{SSR}
\]
\[
\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat y_i^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2
\]
\[
\left\| \mathbf{Y} \right\|^2 = \left\| \hat{\mathbf{Y}} \right\|^2 + \left\| \mathbf{Y} - \hat{\mathbf{Y}} \right\|^2 .
\]
Coefficient of determination

The coefficient of determination $\mathrm{R}^2 \in [0, 1]$ is defined by:
\[
\mathrm{R}^2 = \frac{\mathrm{SSM}}{\mathrm{SST}} .
\]

$\mathrm{R}^2$ is also equal to $\cos^2(\theta)$ where $\theta$ is the angle between:

- Regression with constant: $(\mathbf{Y} - \bar y \mathbf{1}_n)$ and $(\hat{\mathbf{Y}} - \bar y \mathbf{1}_n)$.
- Regression without constant: $\mathbf{Y}$ and $\hat{\mathbf{Y}}$.

Interpretation is easier in the case of a regression with constant.
Adjusted coefficient of determination

The coefficient of determination increases with $p$.

The adjusted coefficient of determination is defined by:

- Regression with constant:
\[
\mathrm{R}^2_{\mathrm{adjusted}} = 1 - \frac{n-1}{n-p} \left( 1 - \mathrm{R}^2 \right) .
\]
- Regression without constant:
\[
\mathrm{R}^2_{\mathrm{adjusted}} = 1 - \frac{n}{n-p} \left( 1 - \mathrm{R}^2 \right) .
\]
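A short sketch of the adjusted R² (regression with constant, not from the slides), assuming Python with numpy; the simulated data and names are illustrative.

```python
# Illustrative sketch: R^2 and adjusted R^2 for a regression with constant.
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 4                                     # p parameters, constant included
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.8, 0.0, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)   # SSM / SST
r2_adj = 1 - (n - 1) / (n - p) * (1 - r2)                            # adjusted R^2
print(r2, r2_adj)                                                    # r2_adj <= r2
```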
Assumption

We add:
\[
\varepsilon \sim \mathcal{N}\left(0, \sigma^2\right) .
\]

So $(\varepsilon_i)_{i \in \{1,\dots,n\}}$ is an i.i.d. sample with distribution $\mathcal{N}\left(0, \sigma^2\right)$.
Distribution of Y

We have:
\[
\mathbf{Y} \sim \mathcal{N}_n\left( \mathbf{X}\beta, \sigma^2 I_n \right) .
\]
Maximum likelihood estimation 1/3

The likelihood of $(\varepsilon_1, \dots, \varepsilon_n)$ is:
\[
p\left(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2\right)
= \frac{1}{\left(\sigma \sqrt{2\pi}\right)^n} \exp\left( -\frac{1}{2} \frac{\|\varepsilon\|^2}{\sigma^2} \right)
= \frac{1}{\left(\sigma \sqrt{2\pi}\right)^n} \exp\left( -\frac{1}{2\sigma^2} S(\beta) \right) .
\]

Thus the log-likelihood is:
\[
\ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2\right)
= -n \log\left(\sqrt{2\pi}\right) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} S(\beta) .
\]
Maximum likelihood estimation 2/3

We obtain the maximum by solving:
\[
\nabla_{\beta}\, \ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2\right) = 0
\;\Longleftrightarrow\;
\nabla S\left(\hat\beta_{\mathrm{ML}}\right) = 0 ,
\]
\[
\frac{\partial \ell\left(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2\right)}{\partial \sigma^2} = 0
\;\Longleftrightarrow\;
-\frac{n}{2 \hat\sigma^2_{\mathrm{ML}}} + \frac{1}{2 \hat\sigma^4_{\mathrm{ML}}} S\left(\hat\beta_{\mathrm{ML}}\right) = 0 .
\]

The maximum likelihood and OLS estimators of $\beta$ are the same.
Maximum likelihood estimation 3/3

We have:
\[
\hat\sigma^2_{\mathrm{ML}} = \frac{S\left(\hat\beta\right)}{n} = \frac{\| e \|^2}{n} .
\]

The maximum likelihood estimator of $\sigma^2$ is biased (but asymptotically unbiased).
Parameters distribution

The Cochran theorem gives:

- $\hat\beta \sim \mathcal{N}_p\left( \beta, \sigma^2 \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \right)$,
- $(n-p) \dfrac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}$,
- $\hat\beta \perp \hat\sigma^2$ .
Test

We test:
\[
\begin{cases}
H_0 : R\beta = r \\
H_1 : R\beta \neq r
\end{cases}
\]
with $\dim(R) = (q, p)$ and $\mathrm{rank}(R) = q$.

The statistic used is:
\[
F = \frac{n-p}{q} \,
\frac{\left( R\hat\beta - r \right)^\top \left[ R \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} R^\top \right]^{-1} \left( R\hat\beta - r \right)}{\left\| \mathbf{Y} - \mathbf{X}\hat\beta \right\|^2} .
\]

Under $H_0$: $F \sim \mathcal{F}(q, n-p)$.
We reject $H_0$ at level $\alpha$ if $f > f_{(q, n-p), 1-\alpha}$.
Overall significance test

For a regression with constant, we consider:
\[
\begin{cases}
H_0 : \beta_2 = \dots = \beta_p = 0 \\
H_1 : \exists i \in \{2, \dots, p\} \;/\; \beta_i \neq 0 .
\end{cases}
\]

The statistic used is:
\[
F = \frac{n-p}{p-1} \, \frac{\mathrm{SSM}}{\mathrm{SSR}} = \frac{n-p}{p-1} \, \frac{\mathrm{R}^2}{1 - \mathrm{R}^2} .
\]

Under $H_0$: $F \sim \mathcal{F}(p-1, n-p)$.
We reject $H_0$ at level $\alpha$ if $f > f_{(p-1, n-p), 1-\alpha}$.
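A minimal sketch of the overall F test computed from R² (not from the slides), assuming Python with numpy and scipy.stats for the Fisher quantile; data and names are simulated/illustrative.

```python
# Illustrative sketch: overall significance test F = (n-p)/(p-1) * R^2 / (1 - R^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

f_stat = (n - p) / (p - 1) * r2 / (1 - r2)
f_crit = stats.f.ppf(0.95, dfn=p - 1, dfd=n - p)   # quantile f_{(p-1, n-p), 1-alpha}
print(f_stat, f_crit, f_stat > f_crit)             # reject H0 if f > f_crit
```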
ANOVA

We define:
\[
\mathrm{MSM} = \frac{\mathrm{SSM}}{p-1}, \qquad \mathrm{MSR} = \frac{\mathrm{SSR}}{n-p} .
\]

Source | df    | SS  | MS  | F         | p-value
M      | p - 1 | SSM | MSM | MSM / MSR | P(F(p-1, n-p) > f)
R      | n - p | SSR | MSR |           |
T      | n - 1 | SST |     |           |
Test on a parameter

Consider:
\[
\begin{cases}
H_0 : \beta_i = c \\
H_1 : \beta_i \neq c
\end{cases}
\]
for $i \in \{1, \dots, p\}$.

The statistic used is:
\[
T = \frac{\hat\beta_i - c}{\hat\sigma \sqrt{\left[ \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \right]_{ii}}}
\]
where $\left[ \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \right]_{ii}$ is the $i$-th diagonal value of the matrix $\left( \mathbf{X}^\top \mathbf{X} \right)^{-1}$.

Under $H_0$: $T \sim \mathcal{T}(n-p)$.
We reject $H_0$ at level $\alpha$ if $|t| > t_{n-p, 1-\frac{\alpha}{2}}$.
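A sketch of this single-coefficient test (not from the slides), assuming Python with numpy and scipy.stats; the simulated data, the tested index and the names are illustrative assumptions (numpy indexing is 0-based).

```python
# Illustrative sketch: t test on one coefficient using the diagonal of (X^T X)^{-1}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.0, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma_hat = np.sqrt(e @ e / (n - p))

i, c = 1, 0.0                                    # test H0: beta for the 2nd column equals 0
t_stat = (beta_hat[i] - c) / (sigma_hat * np.sqrt(XtX_inv[i, i]))
t_crit = stats.t.ppf(0.975, df=n - p)            # t_{n-p, 1-alpha/2} with alpha = 0.05
print(t_stat, abs(t_stat) > t_crit)
```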
Atypical explanatory variables 1/2

The $i$-th diagonal value of $H$ is called the leverage of point $i$:
\[
h_{ii} = X_i^\top \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} X_i
\]
where $X_i = (x_{i1}, \dots, x_{ip})^\top$.

We have $0 \leq h_{ii} \leq 1$ because $H^\top = H$ and:
\[
H^2 = H \;\Longrightarrow\; h_{ii} = \sum_{j=1}^{n} h_{ij}^2 = h_{ii}^2 + \sum_{j=1, j \neq i}^{n} h_{ij}^2 .
\]
Atypical explanatory variables 2/2

If $h_{ii}$ is close to 1 then the $h_{ij}$ are close to 0, and $\hat y_i$ will be almost entirely determined by $y_i$.

Moreover $\mathrm{Tr}(H) = \mathrm{rank}(H)$, so $\sum_{i=1}^{n} h_{ii} = p$.

Belsley proposed to consider observation $i$ as atypical if:
\[
h_{ii} > 2 \frac{p}{n} .
\]
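A sketch of the leverage computation and of Belsley's rule (not from the slides), assuming Python with numpy; the simulated design, the deliberately extreme row and the names are illustrative.

```python
# Illustrative sketch: leverages h_ii from the hat matrix and the 2p/n flagging rule.
import numpy as np

rng = np.random.default_rng(8)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1:] = 8.0                                   # make observation 0 atypical in the X-space

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
h = np.diag(H)                                   # leverages h_ii
print(h.sum())                                   # equals p = trace(H)
print(np.where(h > 2 * p / n)[0])                # observations flagged by Belsley's rule
```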
Atypical dependent variable: principle

We use the residuals:
\[
e_i = y_i - \hat y_i .
\]

We have:
\[
\Sigma_e = \sigma^2 (I_n - H) .
\]

Thus:
\[
\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii}), \qquad
\mathrm{Cov}(e_i, e_j) = -\sigma^2 h_{ij} \quad \text{for } i \neq j .
\]

We use studentized residuals to evaluate whether an observation is atypical.
Atypical dependent variable: internal studentized residuals

For $i \in \{1, \dots, n\}$, the internal studentized residuals are:
\[
r_i = \frac{e_i}{\sqrt{\widehat{\mathrm{Var}}(e_i)}} = \frac{y_i - \hat y_i}{\hat\sigma \sqrt{1 - h_{ii}}} .
\]

$r_i$ approximately follows the distribution $\mathcal{T}(n-p-1)$ (an approximation used when $n-p-1 > 30$).
Atypical dependent variable: external studentized residuals

We use a cross-validation criterion.

Consider the leave-one-out calculations, where the superscript $(i)$ denotes quantities computed without observation $i$:
\[
\hat y_i^{(i)} = X_i^\top \hat\beta^{(i)} ,
\]
\[
\hat\beta^{(i)} = \left( \mathbf{X}^{(i)\top} \mathbf{X}^{(i)} \right)^{-1} \mathbf{X}^{(i)\top} \mathbf{Y}^{(i)} ,
\]
\[
\hat\sigma^2_{(i)} = \frac{\left\| \mathbf{Y}^{(i)} - \hat{\mathbf{Y}}^{(i)} \right\|^2}{n-p-1} .
\]

The external studentized residuals are:
\[
R_i = \frac{y_i - \hat y_i^{(i)}}{\hat\sigma_{(i)} \sqrt{1 + X_i^\top \left( \mathbf{X}^{(i)\top} \mathbf{X}^{(i)} \right)^{-1} X_i}} .
\]

We have: $R_i \sim \mathcal{T}(n-p-1)$.
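A brute-force sketch of both kinds of studentized residuals, with the external ones computed by the leave-one-out definition above (not from the slides; Python with numpy, simulated data and illustrative names).

```python
# Illustrative sketch: internal and external (leave-one-out) studentized residuals.
import numpy as np

rng = np.random.default_rng(9)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(size=n)
y[5] += 6.0                                      # inject one outlying response

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
e = y - H @ y
sigma_hat = np.sqrt(e @ e / (n - p))
h = np.diag(H)
r_int = e / (sigma_hat * np.sqrt(1 - h))          # internal studentized residuals

r_ext = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Xi, yi = X[mask], y[mask]                     # data without observation i
    beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
    s2_i = np.sum((yi - Xi @ beta_i) ** 2) / (n - p - 1)
    pred_var = s2_i * (1 + X[i] @ np.linalg.inv(Xi.T @ Xi) @ X[i])
    r_ext[i] = (y[i] - X[i] @ beta_i) / np.sqrt(pred_var)

print(np.argmax(np.abs(r_int)), np.argmax(np.abs(r_ext)))   # both point at observation 5
```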


Observations influence

The Cook distance measures the difference between $\hat\beta$ and $\hat\beta^{(i)}$.

For observation $i$:
\[
D_i = \frac{\left( \hat\beta^{(i)} - \hat\beta \right)^\top \left( \mathbf{X}^\top \mathbf{X} \right) \left( \hat\beta^{(i)} - \hat\beta \right)}{p \, \hat\sigma^2} .
\]

Cook proposed to consider observation $i$ as influential if:
\[
D_i > \frac{4}{n-p} .
\]
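A sketch of the Cook distance computed directly from its definition with the leave-one-out estimates (not from the slides; Python with numpy, simulated data and illustrative names).

```python
# Illustrative sketch: Cook's distance D_i and the 4/(n-p) flagging rule.
import numpy as np

rng = np.random.default_rng(10)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([0.5, 1.0, 1.0]) + rng.normal(size=n)
y[3] += 5.0                                      # one influential response

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = e @ e / (n - p)

D = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    beta_i = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
    diff = beta_i - beta_hat
    D[i] = diff @ (X.T @ X) @ diff / (p * sigma2_hat)

print(np.where(D > 4 / (n - p))[0])              # observations flagged as influential
```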
Collinearity detection: problem

If the columns of $\mathbf{X}$ are collinear then the OLS estimator isn't unique.
Collinearity detection: variance inflation factor and tolerance

We compute the regression of $X_j$, for $j \in \{1, \dots, p\}$, on the $p-1$ other variables (with the constant) and we calculate its coefficient of determination $\mathrm{R}^2_j$.

The variance inflation factor (VIF) of $X_j$, $j \in \{1, \dots, p\}$, is:
\[
\mathrm{VIF}_j = \frac{1}{1 - \mathrm{R}^2_j} .
\]

The tolerance (TOL) is:
\[
\mathrm{TOL}_j = \frac{1}{\mathrm{VIF}_j} .
\]

In practice, we consider that there is a collinearity problem if $\mathrm{VIF}_j > 10$ ($\mathrm{TOL}_j < 0.1$).
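A sketch of the VIF obtained exactly as described above, by auxiliary regressions of each X_j on the other regressors plus a constant (not from the slides; Python with numpy, simulated collinear design, illustrative names).

```python
# Illustrative sketch: VIF_j via the auxiliary-regression definition.
import numpy as np

rng = np.random.default_rng(11)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)         # nearly collinear with x1
x3 = rng.normal(size=n)
Z = np.column_stack([x1, x2, x3])                # the p explanatory variables

def vif(Z, j):
    ones = np.ones((len(Z), 1))
    others = np.column_stack([ones, np.delete(Z, j, axis=1)])   # constant + the other regressors
    coef, *_ = np.linalg.lstsq(others, Z[:, j], rcond=None)
    fitted = others @ coef
    r2_j = 1 - np.sum((Z[:, j] - fitted) ** 2) / np.sum((Z[:, j] - Z[:, j].mean()) ** 2)
    return 1 / (1 - r2_j)

print([round(vif(Z, j), 1) for j in range(Z.shape[1])])   # VIF > 10 flags x1 and x2
```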
Collinearity detection: explanatory variables structure 1/5

In the case of a regression without constant, consider:
\[
\widetilde{\mathbf{X}} = \left[ \frac{X_1 - \bar X_1 \mathbf{1}_n}{\left\| X_1 - \bar X_1 \mathbf{1}_n \right\|}, \dots, \frac{X_p - \bar X_p \mathbf{1}_n}{\left\| X_p - \bar X_p \mathbf{1}_n \right\|} \right] .
\]
Collinearity detection: explanatory variables structure 2/5

The correlation matrix of the explanatory variables is $R = \widetilde{\mathbf{X}}^\top \widetilde{\mathbf{X}}$.

It's a nonnegative and symmetric matrix, thus diagonalizable. Let $(\lambda_1, \dots, \lambda_p)$ be the $p$ eigenvalues of $R$ sorted in decreasing order.

The condition numbers, for $j \in \{1, \dots, p\}$, are defined by:
\[
\mathrm{CN}_j = \sqrt{\frac{\lambda_1}{\lambda_j}} .
\]

In practice, a value $\mathrm{CN}_j > 30$ indicates a possible collinearity problem.
Collinearity detection: explanatory variables structure 3/5

We now seek to determine the groups of variables concerned.

Let $V$ be the orthogonal change-of-basis matrix ($V V^\top = I_p$) of $R$:
\[
R = V
\begin{pmatrix}
\lambda_1 & 0 & \dots & \dots & 0 \\
0 & \ddots & & & \vdots \\
\vdots & & \lambda_i & & \vdots \\
\vdots & & & \ddots & 0 \\
0 & \dots & \dots & 0 & \lambda_p
\end{pmatrix}
V^\top .
\]

We have, for $k \in \{1, \dots, p\}$:
\[
\widehat{\mathrm{Var}}\left( \hat\beta_k \right) = \frac{\hat\sigma^2}{\left\| X_k - \bar X_k \mathbf{1}_n \right\|^2} \sum_{j=1}^{p} \frac{v_{kj}^2}{\lambda_j} .
\]
Collinearity detection: explanatory variables structure 4/5

We compute the proportion $\pi_{lk}$ of the variance of $\hat\beta_k$ associated with the $l$-th eigenvalue $\lambda_l$:
\[
\pi_{lk} = \frac{\dfrac{v_{kl}^2}{\lambda_l}}{\displaystyle\sum_{j=1}^{p} \dfrac{v_{kj}^2}{\lambda_j}} .
\]

We have:

Eigenvalues | CN                      | Var(beta_1) | ... | Var(beta_k) | ... | Var(beta_p)
lambda_1    | 1                       | pi_11       | ... | pi_1k       | ... | pi_1p
...         | ...                     | ...         | ... | ...         | ... | ...
lambda_j    | sqrt(lambda_1/lambda_j) | pi_j1       | ... | pi_jk       | ... | pi_jp
...         | ...                     | ...         | ... | ...         | ... | ...
lambda_p    | sqrt(lambda_1/lambda_p) | pi_p1       | ... | pi_pk       | ... | pi_pp
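A sketch of the condition indices and of the variance-decomposition proportions table above (not from the slides; Python with numpy, simulated collinear design, illustrative names).

```python
# Illustrative sketch: CN_j and the pi_{lk} proportions from the eigendecomposition of R.
import numpy as np

rng = np.random.default_rng(12)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)         # nearly collinear pair
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

Xc = X - X.mean(axis=0)
Xt = Xc / np.linalg.norm(Xc, axis=0)             # centered columns scaled to unit length
R = Xt.T @ Xt                                    # correlation matrix of the regressors

lam, V = np.linalg.eigh(R)                       # eigenvalues (ascending) and eigenvectors
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]                 # reorder in decreasing order

cn = np.sqrt(lam[0] / lam)                       # condition indices; CN_j > 30 is suspect
phi = V ** 2 / lam                               # phi[k, l] = v_{kl}^2 / lambda_l
pi = (phi / phi.sum(axis=1, keepdims=True)).T    # pi[l, k]: share of Var(beta_k) due to lambda_l
print(np.round(cn, 1))
print(np.round(pi, 2))                           # the row with a large CN shows two high proportions
```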
Collinearity detection: explanatory variables structure 5/5

In practice, we must study the components $j$ with a high condition number $\mathrm{CN}_j$.

For such a $j$, if there are at least two explanatory variables $X_k$ and $X_{k'}$ such that $\pi_{jk}$ and $\pi_{jk'}$ are high (more than 0.5 in practice), then a collinearity problem is suspected between these variables $k$ and $k'$.
Complementary analysis

- Homoscedasticity.
  There are tests, but it's also possible to plot the studentized residuals as a function of the fitted values.

- Normality of the error distribution.
  It's possible to use tests, for example a Shapiro-Wilk test.
Model selection: principle

Assume that we are searching for $k$ predictors among $K$ possible predictors.

There are $\binom{K}{k}$ potential models.

We need:

- a criterion to compare two models,
- a search strategy.
Model selection: possible criteria

- Adjusted coefficient of determination,
- Kullback information criterion (AIC or BIC for example),
- Mallows coefficient.
Model selection: Kullback-Leibler information criterion

Let $f_0$ be a probability density that we want to estimate within a parameterized set $\mathcal{F}$.

The Kullback-Leibler divergence is a measure of distance between the true and the estimated probability densities:
\[
I(f_0, \mathcal{F}) = \min_{f \in \mathcal{F}} \int \log\left( \frac{f_0(x)}{f(x)} \right) f_0(x) \, dx .
\]

This quantity is always nonnegative and is zero if $f_0 \in \mathcal{F}$.

Kullback-Leibler estimators have the following form:
\[
\hat I(M_k) = n \log\left( \hat\sigma^2_{M_k} \right) + \phi(n) \, k
\]
where $\hat\sigma^2_{M_k}$ is the residual variance of $M_k$ (one of the models with $k$ predictors).
Model selection: AIC and BIC criteria

There are many choices for $\phi$:

- The Akaike criterion (AIC: Akaike Information Criterion).
  With $\phi(n) = 2$:
\[
\mathrm{AIC} = n \log\left( \hat\sigma^2_{M_k} \right) + 2k .
\]

- The Schwarz criterion (BIC: Bayesian Information Criterion, equivalent to SBC: Schwarz Bayesian Criterion).
  With $\phi(n) = \log(n)$:
\[
\mathrm{BIC} = n \log\left( \hat\sigma^2_{M_k} \right) + k \log(n) .
\]
Model selection: Mallows criterion

The Mallows criterion $C_p$ ($p$ doesn't refer to the number of predictors here) for a model $M_k$ with $k$ predictors is defined by:
\[
C_p = \frac{\mathrm{SSR}(M_k)}{\hat\sigma^2} - n + 2k .
\]
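A sketch of the three criteria for one candidate model (not from the slides; Python with numpy). Assumptions: the residual variance in AIC/BIC is taken as SSR/n, the σ̂² in Cp comes from the full model, and the data, subset and names are illustrative.

```python
# Illustrative sketch: AIC, BIC and Mallows' Cp for a candidate subset of predictors.
import numpy as np

rng = np.random.default_rng(13)
n = 100
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X_full @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + rng.normal(size=n)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

sigma2_full = rss(X_full, y) / (n - X_full.shape[1])   # residual variance of the full model

cols = [0, 1, 4]                                       # candidate model: constant + 2 predictors
k = len(cols)
ssr_k = rss(X_full[:, cols], y)
aic = n * np.log(ssr_k / n) + 2 * k                    # residual variance taken as SSR/n here
bic = n * np.log(ssr_k / n) + k * np.log(n)
cp = ssr_k / sigma2_full - n + 2 * k
print(aic, bic, cp)
```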
Model selection: search strategy

- Forward procedure:
  - Start with no variables in the model.
  - Add the variable whose inclusion gives the most statistically significant improvement (given the criterion chosen).
  - Repeat until no variable's inclusion improves the model.

- Backward procedure:
  - Start with all the variables in the model.
  - Remove the variable whose exclusion gives the most statistically significant improvement (given the criterion chosen).
  - Repeat until no variable's exclusion improves the model.

- Stepwise procedure:
  It's a combination of the forward and backward procedures. A sketch of a criterion-driven forward search is given below.
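This is a minimal sketch of a forward procedure driven by AIC, one of the possible criteria listed above (not from the slides; Python with numpy, greedy add-one-at-a-time search, simulated data and illustrative names).

```python
# Illustrative sketch: forward selection, adding the predictor that most improves AIC.
import numpy as np

rng = np.random.default_rng(14)
n = 100
Z = rng.normal(size=(n, 5))                       # 5 candidate predictors
y = 1 + 2 * Z[:, 0] - 1.5 * Z[:, 3] + rng.normal(size=n)

def aic(cols):
    X = np.column_stack([np.ones(n)] + [Z[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)
    return n * np.log(ssr / n) + 2 * X.shape[1]

selected, remaining = [], list(range(Z.shape[1]))
current = aic(selected)                           # start with the constant-only model
while remaining:
    best_j = min(remaining, key=lambda j: aic(selected + [j]))
    best = aic(selected + [best_j])
    if best >= current:                           # stop when no inclusion improves the criterion
        break
    selected.append(best_j)
    remaining.remove(best_j)
    current = best
print(selected)                                   # typically recovers predictors 0 and 3
```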
References

Antoniadis, A., Berruyer, J., and Carmona, R. (1992). Régression non linéaire et applications. Economica.

Cornillon, P.-A. and Matzner-Løber, E. (2010). Régression avec R. Springer.

Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15:246-263.

McCullagh, P. and Nelder, J. A. (1989). Generalized linear models. Monographs on Statistics & Applied Probability (37). Chapman & Hall.
