Review of Linear Model Basics
Linear Regression Lecture [1]
Gauss-Markov Assumptions for Classical Linear Regression
Functional Form: $Y_{(n \times 1)} = X_{(n \times k)}\beta_{(k \times 1)} + \epsilon_{(n \times 1)}$ (X is full rank and has a leading column of 1s)
Mean Zero Errors: $E[\epsilon|X] = 0$
Homoscedasticity: $\mathrm{Var}[\epsilon] = \sigma^2 I$
Non-Correlated Errors: $\mathrm{Cov}[\epsilon_i, \epsilon_j] = 0, \; i \neq j$
Exogeneity of Explanatory Variables: $\mathrm{Cov}[\epsilon_i, X] = 0, \; \forall i$
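A minimal R sketch of data generated under these assumptions (the values of n, beta, and sigma below are hypothetical):
# simulate data satisfying the Gauss-Markov assumptions (hypothetical n, beta, sigma)
set.seed(1)
n <- 100
X <- cbind(1, runif(n), rnorm(n))      # full rank, leading column of 1s
beta <- c(2, -1, 0.5)                  # hypothetical true coefficients
sigma <- 1
eps <- rnorm(n, mean=0, sd=sigma)      # mean-zero, homoscedastic, uncorrelated errors
Y <- drop(X %*% beta) + eps            # functional form: Y = X beta + eps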
Linear Regression Lecture [2]
Other Considerations
Requirements: conformability, X is rank k
Freebie: eventual normality... $\epsilon|X \sim N(0, \sigma^2 I)$
Linear Regression Lecture [3]
Least Squares Estimation (OLS)
Define the following function:
$$S(\beta) = \epsilon'\epsilon = (Y - X\beta)'(Y - X\beta) = \underset{(1 \times n)(n \times 1)}{Y'Y} - \underset{(1 \times n)(n \times k)(k \times 1)}{2Y'X\beta} + \underset{(1 \times k)(k \times n)(n \times k)(k \times 1)}{\beta'X'X\beta}$$
Take the derivative of $S(\beta)$ with respect to $\beta$:
$$\frac{\partial}{\partial\beta}S(\beta) = 0 \quad\Longrightarrow\quad -2\underset{(k \times n)(n \times 1)}{X'Y} + 2\underset{(k \times n)(n \times k)(k \times 1)}{X'X\beta} \equiv 0$$
So there exists a solution at some value $\hat\beta$ of $\beta$: $X'Xb = X'Y$, which is the Normal Equation.
Premultiplying the Normal Equation by $(X'X)^{-1}$ gives:
$$\hat\beta = (X'X)^{-1}X'Y,$$
where we can call $\hat\beta$ as $b$ for notational convenience (this is where the requirement for $X'X$ to be nonsingular comes from).
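A quick sketch showing that solving the normal equation directly reproduces lm() (reuses the simulated X and Y from the sketch above):
b <- solve(t(X) %*% X, t(X) %*% Y)     # solve the normal equations X'Xb = X'Y
cbind(b, coef(lm(Y ~ X - 1)))          # same answer as lm(); X already carries the 1s column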
Linear Regression Lecture [4]
Implications
Normal Equation: $X'Xb - X'Y = -X'(Y - Xb) = -X'e \equiv 0$ (by construction), so $\sum e_i \equiv 0$
the regression hyperplane passes through the means: $\bar Y = \bar{X}b$, so $\mathrm{mean}(\hat Y) = \mathrm{mean}(Y)$
the hat matrix ($H$, $P$, or $(I - M)$):
$$e = Y - Xb = Y - X((X'X)^{-1}X'Y) = Y - (X(X'X)^{-1}X')Y = Y - HY = (I - H)Y = MY$$
where M is symmetric and idempotent.
Linear Regression Lecture [5]
The HAT Matrix
the name is because $\hat Y = Xb = X((X'X)^{-1}X'Y) = (X(X'X)^{-1}X')Y = HY$, but "projection matrix" is better for geometric reasons.
properties of interest:
$I - M = P$, $\quad I - P = M$
$PX = X$
$PM = MP = 0$ (orthogonality)
$e'e = Y'M'MY = Y'MY = Y'e$ (sum of squares)
Using the Normal Equation, $X'Xb = X'Y$:
$$e'e = (Y - Xb)'(Y - Xb) = Y'Y - Y'Xb - b'X'Y + b'X'Xb$$
$$= Y'Y - (X'Xb)'b - b'(X'Xb) + b'X'Xb$$
$$= Y'Y - (b'X')(Xb)$$
$$= Y'Y - \hat Y'\hat Y$$
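A short numerical check of these identities (sketch; reuses the simulated X and Y from above):
H <- X %*% solve(t(X) %*% X) %*% t(X)      # hat/projection matrix
M <- diag(nrow(X)) - H                     # residual-maker matrix
all.equal(M %*% M, M)                      # M is idempotent
all.equal(drop(M %*% Y), unname(residuals(lm(Y ~ X - 1))))   # e = MY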
Linear Regression Lecture [6]
Fit & Decomposition
Definitions:
$$SST = \sum_{i=1}^{n}(Y_i - \bar Y)^2 \qquad SSR = \sum_{i=1}^{n}(\hat Y_i - \bar Y)^2 \qquad SSE = \sum_{i=1}^{n}(\hat Y_i - Y_i)^2$$
Linear Regression Lecture [7]
Fit & Decomposition
Interesting manipulations of the sum of squares total:
$$SST = \sum_{i=1}^{n}(Y_i^2 - 2Y_i\bar Y + \bar Y^2)$$
$$= \sum_{i=1}^{n}Y_i^2 - 2\sum_{i=1}^{n}Y_i\bar Y + n\bar Y^2$$
$$= \sum_{i=1}^{n}Y_i^2 - 2n\bar Y^2 + n\bar Y^2$$
$$= \sum_{i=1}^{n}Y_i^2 - n\bar Y^2 \quad \text{(scalar description)}$$
$$= \sum_{i=1}^{n}Y_i^2 - n\left(\frac{1}{n}\sum_{i=1}^{n}Y_i\right)^2$$
$$= Y'Y - \frac{1}{n}Y'JY \quad \text{(matrix algebra description)}$$
where $J$ is an $n \times n$ matrix of all 1s.
Linear Regression Lecture [8]
More Decomposition
Sum of Squares Regression:
$$SSR = \sum_{i=1}^{n}(\hat Y_i^2 - 2\hat Y_i\bar Y + \bar Y^2)$$
$$= \hat Y'\hat Y - 2\bar Y\sum_{i=1}^{n}\hat Y_i + n\bar Y^2$$
$$= (b'X')Xb - 2n\bar Y^2 + n\bar Y^2$$
$$= (b'X')Xb - n\bar Y^2$$
$$= b'X'Y - \frac{1}{n}Y'JY$$
Linear Regression Lecture [9]
Sum of Squares Error:
$$SSE = \sum_{i=1}^{n}(\hat Y_i - Y_i)^2 = e'e = (Y - Xb)'(Y - Xb)$$
$$= Y'Y - Y'Xb - b'X'Y + b'X'Xb$$
$$= Y'Y - b'X'Y + (b'X' - Y')Xb$$
$$= Y'Y - b'X'Y + (0)Xb$$
$$= Y'Y - b'X'Y$$
Linear Regression Lecture [10]
Magic
SSR + SSE:
$$SSR + SSE = \left(b'X'Y - \frac{1}{n}Y'JY\right) + \left(Y'Y - b'X'Y\right) = Y'Y - \frac{1}{n}Y'JY = SST$$
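A quick numerical check of the decomposition (sketch; reuses the simulated X and Y from above):
fit  <- lm(Y ~ X - 1)                      # X already includes the constant column
Yhat <- fitted(fit)
SST <- sum((Y - mean(Y))^2)
SSR <- sum((Yhat - mean(Y))^2)
SSE <- sum((Y - Yhat)^2)
all.equal(SST, SSR + SSE)                  # the decomposition holds with a constant in the model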
Linear Regression Lecture [11]
A Measure of Fit
The R-Square or R-Squared measure:
$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{e'e}{Y'M^oY} = \frac{b'X'M^oXb}{Y'M^oY}$$
where $M^o = I - \frac{1}{n}ii'$, $i = c(1, 1, \ldots, 1)$.
Note: $M^o$ is idempotent and transforms means to deviances for the explanatory variables.
Warning: $R^2$ is not a statistic and does not have quite the meaning that some expect.
There is another version that accounts for sample size and the number of explanatory variables:
$$R^2_{adj} = 1 - \frac{e'e/(n-k)}{Y'M^oY/(n-1)}$$
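These quantities computed directly (sketch; reuses the simulated X and Y, with lm() supplying its own intercept so that summary() reports the centered R-squared):
fit2 <- lm(Y ~ X[, -1])                    # let lm() supply the intercept
n <- length(Y); k <- ncol(X)
e  <- residuals(fit2)
Mo <- diag(n) - matrix(1, n, n)/n          # M^o: deviations-from-mean operator
yMy <- drop(t(Y) %*% Mo %*% Y)
c(R2     = 1 - sum(e^2)/yMy,
  R2.adj = 1 - (sum(e^2)/(n - k)) / (yMy/(n - 1)),
  lm.R2  = summary(fit2)$r.squared,
  lm.adj = summary(fit2)$adj.r.squared)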
Linear Regression Lecture [12]
Properties of OLS
b is unbiased for $\beta$:
$$b = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \epsilon) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\epsilon = \beta + (X'X)^{-1}X'\epsilon$$
So $E[b] = \beta$.
Linear Regression Lecture [13]
Properties of OLS
The variance of b is:
$$\mathrm{Var}[b] = E[(b - \beta)(b - \beta)'] - (E[b - \beta])(E[b - \beta])'$$
$$= E[(b - \beta)(b - \beta)']$$
$$= E[((X'X)^{-1}X'\epsilon)((X'X)^{-1}X'\epsilon)']$$
$$= (X'X)^{-1}X'E[\epsilon\epsilon']X(X'X)^{-1}$$
$$= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}$$
Linear Regression Lecture [14]
Properties of OLS
Given the Gauss-Markov assumptions, b is BLUE for $\beta$ if calculated from OLS.
$b|X \sim N(\beta, \sigma^2(X'X)^{-1})$.
$\widehat{\mathrm{Var}}[b] = s^2(X'X)^{-1}$, where $s^2 = e'e/(n-k)$, the squared standard error of the regression.
Most econometric texts make a distinction between non-stochastic explanatory variables (those set in experimental settings) and stochastic explanatory variables, which are set by nature/survey/others.
Perspective here: X is fixed once observed and $\epsilon$ is the random variable:
since $\mathrm{Var}[\epsilon] = \sigma^2 I$, no single term dominates and we get the Lindeberg-Feller CLT result.
so the $\epsilon_i$ are IID normal and we write the joint PDF as:
$$f(\epsilon) = \prod_{i=1}^{n}f(\epsilon_i) = (2\pi\sigma^2)^{-\frac{n}{2}}\exp[-\epsilon'\epsilon/2\sigma^2]$$
(motivating estimation of $\sigma^2$ from $e'e$).
Linear Regression Lecture [15]
Estimating From Sample Quantities
Population derived variance/covariance matrix: $\mathrm{Var}[b] = \sigma^2(X'X)^{-1}$.
We also know that each residual $e_i$ estimates $\epsilon_i$.
And: $E[\epsilon_i^2] = \mathrm{Var}[\epsilon_i] + (E[\epsilon_i])^2 = \sigma^2$, so $E[\epsilon'\epsilon] = \mathrm{tr}(\sigma^2 I) = n\sigma^2$.
So why not use: $\hat\sigma^2 = \frac{1}{n}\sum e_i^2$?
But:
$$e_i = y_i - X_i b \quad \text{(now insert population values)}$$
$$= (X_i\beta + \epsilon_i) - X_i b$$
$$= \epsilon_i - X_i(b - \beta),$$
so $e_i \neq \epsilon_i$ in general.
Linear Regression Lecture [16]
Estimating From Sample Quantities
Recall that $M = I - H = I - X(X'X)^{-1}X'$. So that:
$$My = (I - X(X'X)^{-1}X')y = y - X(X'X)^{-1}X'y = y - Xb = e$$
So $e = My$ since $e = y - \hat y$.
And therefore:
$$My = M[X\beta + \epsilon] = MX\beta + M\epsilon = (I - X(X'X)^{-1}X')X\beta + M\epsilon = X\beta - X(X'X)^{-1}X'X\beta + M\epsilon = M\epsilon$$
So $e = M\epsilon$ and $e'e = (M\epsilon)'M\epsilon = \epsilon'M'M\epsilon = \epsilon'M\epsilon$.
Linear Regression Lecture [17]
Estimating From Sample Quantities
So we can use this:
$$E[e'e|X] = E[\epsilon'M\epsilon|X] \qquad (\text{since } e = M\epsilon)$$
$$= E[\mathrm{tr}(\epsilon'M\epsilon)|X] \qquad (\text{a scalar equals its own trace})$$
$$= E[\mathrm{tr}(M\epsilon\epsilon')|X] \qquad (\text{property of traces: } \mathrm{tr}(ABC) = \mathrm{tr}(CAB))$$
$$= \mathrm{tr}(M E[\epsilon\epsilon'|X]) \qquad (M \text{ is fixed for observed } X)$$
$$= \mathrm{tr}(M)\sigma^2 \qquad (\text{Gauss-Markov assumption: } E[\epsilon\epsilon'|X] = \sigma^2 I)$$
$$= [\mathrm{tr}(I) - \mathrm{tr}(X(X'X)^{-1}X')]\sigma^2$$
$$= [n - k]\sigma^2$$
Tricks:
$\mathrm{tr}[X(X'X)^{-1}X'] = \mathrm{tr}[(X'X)^{-1}X'X] = k$
$\mathrm{rank}[A] = \mathrm{tr}[A]$ for symmetric idempotent $A$
$\mathrm{rank}[ABC] = \mathrm{rank}[B]$ if $A$, $C$ are nonsingular
so $\mathrm{tr}[H] = \mathrm{rank}[X(X'X)^{-1}X'] = \mathrm{rank}[X] = k$
Linear Regression Lecture [18]
Estimating From Sample Quantities
Because the naive estimator $\frac{1}{n}\sum e_i^2$ is biased and $E[e'e|X] = (n-k)\sigma^2$, we use:
$$\hat\sigma^2 = \frac{e'e}{n-k} = s^2,$$
so that an estimator of $\mathrm{Var}[b]$ is:
$$\widehat{\mathrm{Var}}[b] = s^2(X'X)^{-1}.$$
This sets up Wald-style traditional linear inference:
$$z_k = \frac{b_k - \beta_k^{null}}{\sqrt{\sigma^2(X'X)^{-1}_{kk}}} \;\overset{\text{asym.}}{\sim}\; N(0, 1),$$
provided that we know $\sigma^2$ (which we usually do not).
But we know that:
$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2_{n-k}, \qquad \text{and} \qquad \frac{z_k}{\sqrt{\chi^2/df}} \sim t_{(n-k)}$$
if the random variables $z_k$ and $\chi^2$ are independent.
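These quantities by hand (sketch; reuses the simulated X and Y from above):
n <- nrow(X); k <- ncol(X)
b  <- solve(t(X) %*% X, t(X) %*% Y)
e  <- Y - drop(X %*% b)
s2 <- sum(e^2)/(n - k)                     # sigma-hat^2 = e'e/(n - k)
Vb <- s2 * solve(t(X) %*% X)               # estimated Var[b]
cbind(est = drop(b), se = sqrt(diag(Vb)), t = drop(b)/sqrt(diag(Vb)))   # matches summary(lm(Y ~ X - 1))$coefficients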
Linear Regression Lecture [19]
Estimating From Sample Quantities
Making the obvious substitution gives:
$$t_{(n-k)} = \frac{(b_k - \beta_k^{null})\Big/\sqrt{\sigma^2(X'X)^{-1}_{kk}}}{\sqrt{\dfrac{(n-k)s^2}{\sigma^2(n-k)}}} = \frac{b_k - \beta_k^{null}}{\sqrt{s^2(X'X)^{-1}_{kk}}}$$
Typical (Wald) regression test:
$$H_0: \beta_k = 0 \qquad H_1: \beta_k \neq 0$$
making:
$$t_{(n-k)} = \frac{b_k - \beta_k^{null}}{\sqrt{s^2(X'X)^{-1}_{kk}}} = \frac{b_k}{SE(b_k)}$$
Alternatives usually look like:
$$H_0: \beta_k < 1 \qquad H_1: \beta_k \geq 1$$
making:
$$t_{(n-k)} = \frac{b_k - 1}{SE(b_k)}$$
Linear Regression Lecture [20]
Summary Statistics
$(1 - \alpha)$ Confidence Interval for $b_k$:
$$\Big[\, b_k - SE(b_k)\,t_{\alpha/2, df} \;:\; b_k + SE(b_k)\,t_{\alpha/2, df} \,\Big]$$
$(1 - \alpha)$ Confidence Interval for $\sigma^2$:
$$\left[\, \frac{(n-k)s^2}{\chi^2_{\alpha/2}} \;:\; \frac{(n-k)s^2}{\chi^2_{1-\alpha/2}} \,\right]$$
F-statistic test that all coefficients except $b_0$ are zero:
$$F = \frac{SSR/(k-1)}{SSE/(n-k)} \sim F_{k-1, n-k} \quad \text{under the null}$$
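A sketch of these summary quantities (reuses the simulated X and Y; the 95% level is a hypothetical choice):
n <- nrow(X); k <- ncol(X); alpha <- 0.05
fit <- lm(Y ~ X - 1)
s2  <- sum(residuals(fit)^2)/(n - k)
se  <- sqrt(diag(s2 * solve(t(X) %*% X)))
cbind(lower = coef(fit) - qt(1 - alpha/2, n - k)*se,
      upper = coef(fit) + qt(1 - alpha/2, n - k)*se)        # CIs for each b_k
c((n - k)*s2/qchisq(1 - alpha/2, n - k),
  (n - k)*s2/qchisq(alpha/2, n - k))                        # CI for sigma^2
SSE <- sum(residuals(fit)^2); SST <- sum((Y - mean(Y))^2); SSR <- SST - SSE
(SSR/(k - 1)) / (SSE/(n - k))                               # overall F statistic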
Linear Regression Lecture [21]
Multicollinearity Issues
If one explanatory variable is a linear combination of another, then $\mathrm{rank}(X) = k - 1$.
Therefore $\mathrm{rank}(X'X) = k - 1$ (a $k \times k$ matrix), and it is singular and non-invertible.
Now no parameter estimates are possible, and the model is unidentified.
More typically: 2 explanatory variables are highly but not perfectly correlated.
Symptoms:
small changes in the data give large changes in parameter estimates.
coefficients have large standard errors and poor t-statistics even if F-statistics and $R^2$ are okay.
coefficients seem illogical (wrong sign, huge magnitude)
Linear Regression Lecture [22]
Multicollinearity Remedies
respecify model (if reasonable)
center explanatory variables, or standardize
ridge regression (add a little bias), sketched below:
$$b = [X'X + RI]^{-1}X'y$$
such that the $[\;]$ part barely inverts.
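A minimal sketch of the ridge estimator (reuses the simulated X and Y; the value of the ridge constant here is purely hypothetical):
R.const <- 0.1                                      # hypothetical small ridge constant
b.ridge <- solve(t(X) %*% X + R.const*diag(ncol(X)), t(X) %*% Y)
cbind(OLS = drop(solve(t(X) %*% X, t(X) %*% Y)), ridge = drop(b.ridge))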
Linear Regression Lecture [23]
Is b Unbiased?
Starting with b:
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \epsilon) = \beta + (X'X)^{-1}X'\epsilon,$$
and take expected values:
$$E[b] = E[\beta + (X'X)^{-1}X'\epsilon] = E[\beta] + E[(X'X)^{-1}X'\epsilon] = \beta + E[K\epsilon] = \beta,$$
where $K = (X'X)^{-1}X'$ is fixed given X.
Linear Regression Lecture [24]
What is the Variance of b?
By definition:
$$\mathrm{Var}[b|X] = E[(b - \beta)(b - \beta)'|X] - (E[b - \beta|X])(E[b - \beta|X])'.$$
Since $b - \beta = (X'X)^{-1}X'\epsilon$,
$$\mathrm{Var}[b|X] = E\Big[((X'X)^{-1}X'\epsilon)((X'X)^{-1}X'\epsilon)'\,\Big|\,X\Big]$$
$$= E\Big[(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}\,\Big|\,X\Big]$$
$$= (X'X)^{-1}X'E[\epsilon\epsilon'|X]X(X'X)^{-1}$$
$$= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}X'X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}$$
Linear Regression Lecture [25]
Testing Linear Restrictions
A theory has testable implications if it implies some testable restrictions on the model definition:
$$H_0: \beta_k = 0 \qquad \text{versus} \qquad H_1: \beta_k \neq 0$$
for example.
Most restrictions involve a nested parameter space:
unrestricted: $[\beta_0, \beta_1, \beta_2, \beta_3]$
restricted: $[\beta_0, 0, \beta_2, \beta_3]$.
Note that non-nested comparisons cause problems for non-Bayesians.
Linear Regression Lecture [26]
Testing Linear Restrictions
Linear restrictions for regression are clear when:
$$r_{11}\beta_1 + r_{12}\beta_2 + \ldots + r_{1k}\beta_k = q_1$$
$$r_{21}\beta_1 + r_{22}\beta_2 + \ldots + r_{2k}\beta_k = q_2$$
$$\vdots$$
$$r_{J1}\beta_1 + r_{J2}\beta_2 + \ldots + r_{Jk}\beta_k = q_J$$
or in more succinct matrix algebra form: $R_{(J \times k)}\beta = q$.
Notes:
Each row of R is one restriction.
$J < k$ for R to be full rank.
This setup imposes J restrictions on k parameters, so there are $k - J$ free parameters left.
We are still assuming that $\epsilon_i \sim N(0, \sigma^2)$.
General test:
$$H_0: R\beta - q = 0 \qquad H_1: R\beta - q \neq 0$$
Linear Regression Lecture [27]
Testing Linear Restrictions, Examples
One of the coefficients is zero, $\beta_j = 0$, $J = 1$:
$$R = [0, 0, 0, \underset{j}{1}, \ldots, 0, 0, 0], \qquad q = 0$$
Two coefficients are equal, $\beta_j = \beta_k$ (one restriction, $J = 1$):
$$R = [0, 0, 0, \underset{j}{1}, \ldots, \underset{k}{-1}, \ldots, 0, 0, 0], \qquad q = 0$$
Three coefficients are zero, $J = 3$:
$$R = \begin{bmatrix} 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & \ldots & 0 \\ 0 & 0 & 1 & 0 & \ldots & 0 \end{bmatrix} = [I : 0], \qquad q = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
Linear Regression Lecture [28]
Testing Linear Restrictions, Examples
A set of coefficients sum to one, $\beta_2 + \beta_3 + \beta_4 = 1$:
$$R = [0, 1, 1, 1, 0, \ldots, 0], \qquad q = 1$$
Several restrictions, $\beta_2 + \beta_3 = 4$, $\beta_4 + \beta_6 = 0$, $\beta_5 = 9$:
$$R = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad q = \begin{bmatrix} 4 \\ 0 \\ 9 \end{bmatrix}$$
All coefficients except the constant are zero:
$$R = [0 : I], \qquad q = [0]$$
Linear Regression Lecture [29]
Testing Linear Restrictions, Examples
Define now the discrepancy vector dictated by the null hypothesis:
$$Rb - q = m \approx 0,$$
which asks whether m is sufficiently different from zero. Note that m is a linear function of b and therefore also normally distributed.
This makes it easy to think about:
$$E[m|X] = R\,E[b|X] - q = R\beta - q = 0$$
$$\mathrm{Var}[m|X] = \mathrm{Var}[Rb - q|X] = R\,\mathrm{Var}[b|X]\,R' = \sigma^2 R(X'X)^{-1}R'$$
Wald Test:
$$W = m'[\mathrm{Var}[m|X]]^{-1}m = (Rb - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb - q) \sim \chi^2_J$$
where J is the number of rows of R, i.e. the number of restrictions.
Linear Regression Lecture [30]
Testing Linear Restrictions, Examples
Unfortunately we do not have $\sigma^2$, so we use the test:
$$F = \frac{(Rb - q)'(\sigma^2 R(X'X)^{-1}R')^{-1}(Rb - q)\big/J}{\big[(n-k)s^2/\sigma^2\big]\big/(n-k)} = \frac{\chi^2_J/J}{\chi^2_{n-k}/(n-k)} \sim F_{J, n-k}$$
that's the distributional interpretation; now simplify:
$$F = \frac{1}{J}(Rb - q)'(s^2 R(X'X)^{-1}R')^{-1}(Rb - q)$$
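A sketch of this F statistic computed directly (reuses the simulated X and Y; the particular restriction tested is a hypothetical choice):
n <- nrow(X); k <- ncol(X)
b  <- solve(t(X) %*% X, t(X) %*% Y)
s2 <- sum((Y - drop(X %*% b))^2)/(n - k)
R  <- rbind(c(0, 1, 0),                            # hypothetical restriction: beta_2 = 0
            c(0, 0, 1))                            #                           beta_3 = 0
q  <- c(0, 0)
J  <- nrow(R)
m  <- R %*% b - q                                  # discrepancy vector
F.stat <- drop(t(m) %*% solve(s2 * R %*% solve(t(X) %*% X) %*% t(R)) %*% m)/J
c(F = F.stat, p.value = pf(F.stat, J, n - k, lower.tail = FALSE))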
Linear Regression Lecture [31]
Testing Linear Restrictions, Examples
Example with 1 linear restriction; start with the definition:
$$H_0: r_1\beta_1 + r_2\beta_2 + \ldots + r_k\beta_k = r'\beta = q$$
$$F_{1, n-k} = \frac{\left(\sum_j r_j b_j - q\right)^2}{\sum_j\sum_k r_j r_k\,\mathrm{Est.Cov.}[b_j, b_k]}$$
and be more specific: $\beta_j = 0$, so $R = [0, 0, \ldots, 0, 1, 0, \ldots, 0, 0]$, $q = [0]$,
so that $R(X'X)^{-1}R'$ simplifies to the jth diagonal element of $(X'X)^{-1}$. Now:
$$Rb - q = b_j - q, \qquad F_{1, n-k}^{\frac{1}{2}} = \frac{b_j - q}{\big(\mathrm{Est.Var.}[b_j]\big)^{\frac{1}{2}}}.$$
Linear Regression Lecture [32]
Non-Normal Errors
Without $\epsilon_i \sim N(0, \sigma^2)$ we do not have the F, t, and $\chi^2$ results (bummer).
Despite that, we know that the asymptotic distribution of b is:
$$b \sim N\left(\beta, \frac{\sigma^2}{n}Q^{-1}\right), \quad \text{where } Q = \mathrm{plim}\left(\frac{X'X}{n}\right) \text{ and } \mathrm{plim}\,s^2 = \sigma^2 \text{ with } s^2 = \frac{e'e}{n-k}.$$
So the test statistic is:
$$t_k = \frac{\sqrt{n}(b_k - \beta_k^{null})}{\sqrt{s^2(X'X/n)^{-1}_{kk}}} \sim t_{n-k}$$
provided that $\epsilon_i \sim N(0, \sigma^2)$.
Since the denominator converges to $\sqrt{\sigma^2 Q^{-1}_{kk}}$, then:
$$\tau_k = \frac{\sqrt{n}(b_k - \beta_k^{null})}{\sqrt{\sigma^2 Q^{-1}_{kk}}} = \frac{\sqrt{n}(b_k - \beta_k^{null})}{\big(\mathrm{Asym.Var.}[b_k]\big)^{\frac{1}{2}}}$$
thus asymptotically justifying a t-test without the assumption of normality.
Linear Regression Lecture [33]
Non-Normal Errors Summary
Major theorem: if
$$\sqrt{n}(b - \beta) \overset{d}{\longrightarrow} N(0, \sigma^2 Q^{-1})$$
and if
$$H_0: R\beta - q = 0$$
is true, then
$$W = (Rb - q)'[R\,s^2(X'X)^{-1}R']^{-1}(Rb - q) = JF \overset{d}{\longrightarrow} \chi^2_J$$
Linear Regression Lecture [34]
Testing Nonlinear Restriction
$H_0: c(\beta) = q$, where $c(\beta)$ is some nonlinear function.
Simple 1-restriction case:
$$z = \frac{c(\hat\beta) - q}{\mathrm{est.SE}} \sim t_{n-k}$$
(or equivalently $z^2 \sim F_{1, n-k}$).
But getting est.SE is hard, so use the first two terms of a Taylor series expansion to get an estimate:
$$f(b) = f(a) + f'(a)\frac{(b-a)^1}{1!} + f''(a)\frac{(b-a)^2}{2!} + f'''(a)\frac{(b-a)^3}{3!} + \ldots$$
meaning:
$$c(\hat\beta) \approx c(\beta) + \left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)$$
so $\mathrm{plim}\,\hat\beta = \beta$ justifies using $c(\hat\beta)$ instead of $c(\beta)$.
Linear Regression Lecture [35]
Testing Nonlinear Restriction
Now we can calculate the needed variance term:
$$\mathrm{Var}(c(\hat\beta)) = E\big[c(\hat\beta)^2\big] - \big(E[c(\hat\beta)]\big)^2$$
$$= E\left[\left(c(\beta) + \left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)\right)^2\right] - \big(E[c(\hat\beta)]\big)^2$$
$$= E\left[c(\beta)^2 + 2c(\beta)\left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta) + \left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)(\hat\beta - \beta)'\left(\frac{\partial c(\beta)}{\partial\beta}\right)\right] - \big(E[c(\hat\beta)]\big)^2$$
$$= q^2 + 2q\left(\frac{\partial c(\beta)}{\partial\beta}\right)'(0) + E\left[\left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)(\hat\beta - \beta)'\left(\frac{\partial c(\beta)}{\partial\beta}\right)\right] - q^2$$
$$= \left(\frac{\partial c(\beta)}{\partial\beta}\right)'\mathrm{Var}(\hat\beta)\left(\frac{\partial c(\beta)}{\partial\beta}\right)$$
since $E\big[(\hat\beta - \beta)(\hat\beta - \beta)'\big] = \mathrm{Var}(\hat\beta)$.
All this means that we can use sample estimates for $\partial c(\beta)/\partial\beta$ and plug in $s^2(X'X)^{-1}$ for $\mathrm{Var}(\hat\beta)$, and then test with a normal distribution.
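A sketch of this delta-method calculation (reuses the simulated X and Y; the nonlinear function c(beta) = beta_2 * beta_3 is a hypothetical example):
n <- nrow(X); k <- ncol(X)
b  <- drop(solve(t(X) %*% X, t(X) %*% Y))
s2 <- sum((Y - drop(X %*% b))^2)/(n - k)
Vb <- s2 * solve(t(X) %*% X)
c.hat <- b[2]*b[3]                                  # c(beta-hat) for the hypothetical c()
grad  <- c(0, b[3], b[2])                           # gradient of c() evaluated at beta-hat
se.c  <- sqrt(drop(t(grad) %*% Vb %*% grad))
c(estimate = c.hat, se = se.c, z = c.hat/se.c)      # test of H0: c(beta) = 0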
Linear Regression Lecture [36]
Linear Model Predictions
We want the predicted value for $x^0$ not in the sample:
$$y^0 = x^{0\prime}\beta + \epsilon^0 \qquad\qquad \hat y^0 = x^{0\prime}b$$
since $\hat y^0$ is the LMVUE of $E[y^0|x^0]$.
The prediction error is:
$$e^0 = y^0 - \hat y^0 = x^{0\prime}(\beta - b) + \epsilon^0.$$
The prediction variance is:
$$\mathrm{Var}[e^0|X, x^0] = \sigma^2 + \mathrm{Var}[x^{0\prime}(\beta - b)|X, x^0] = \sigma^2 + x^{0\prime}\sigma^2(X'X)^{-1}x^0$$
and if we have a constant term in the regression, this is equivalent to:
$$\sigma^2\left[1 + \frac{1}{n} + \sum_{j=1}^{K-1}\sum_{k=1}^{K-1}(x_j^0 - \bar x_j)(x_k^0 - \bar x_k)\,(X_1'M^0X_1)^{jk}\right],$$
where $X_1$ is X omitting the first column, K is the number of explanatory variables (including the constant), $M^0 = I - \frac{1}{n}ii'$, and the superscript $jk$ denotes the $jk$th element of the inverse matrix.
Linear Regression Lecture [37]
Linear Model Predictions
This shows that the variance is greater the further away $x^0$ is from $\bar x$. Classic diagram: prediction intervals are modeled as
$$\hat y^0 \pm t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(e^0)}$$
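A sketch of a prediction interval at a new point (reuses the simulated X and Y; the new point x0 is a hypothetical value):
n <- nrow(X); k <- ncol(X)
b   <- drop(solve(t(X) %*% X, t(X) %*% Y))
s2  <- sum((Y - drop(X %*% b))^2)/(n - k)
x0  <- c(1, 0.9, 1.5)                               # hypothetical new point (leading 1 for the constant)
y0.hat <- sum(x0*b)
var.e0 <- s2 * (1 + drop(t(x0) %*% solve(t(X) %*% X) %*% x0))
y0.hat + c(-1, 1)*qt(0.975, n - k)*sqrt(var.e0)     # 95% prediction interval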
Linear Regression Lecture [38]
Running Lowess
postscript("Class.Multilevel/trends1.ps")
par(bg="lightgrey")
x <- seq(1,25,length=600)
y <- (2/(pi*x))^(0.5)*(1-cos(x)) + rnorm(100,0,1/10)
par(mar=c(3,3,2,2))
plot(x,y,pch="+")
ols.object <- lm(y~x)
abline(ols.object,col="blue")
lo.object <- lowess(y~x,f=2/3)
lines(lo.object$x,lo.object$y,lwd=2,col="red")
lo.object <- lowess(y~x,f=1/5)
lines(lo.object$x,lo.object$y,lwd=2,col="purple")
dev.off()
Linear Regression Lecture [39]
Running Lowess
[Figure: the scatterplot produced by the code above, with the simulated points (plotted as "+"), the straight OLS line (blue), and the two lowess fits (red: f=2/3, purple: f=1/5); x axis roughly 5 to 25, y axis roughly -0.2 to 1.0.]
Linear Regression Lecture [40]
Example: Poverty Among the Elderly, Europe
Governments often worry about the economic condition of senior citizens for political and social
reasons.
Typically in a large industrialized society, a substantial portion of these people obtain the bulk of
their income from government pensions.
An important question is whether there is enough support through these payments to provide
subsistence above the poverty rate.
To see if this is a concern, the European Union (EU) looked at this question in 1998 for the (then)
15 member countries with two variables:
1. the median (EU standardized) income of individuals age 65 and older as a percentage of the
population age 0-64,
2. the percentage with income below 60% of the median (EU standardized) income of the national
population.
Linear Regression Lecture [41]
Example: Poverty Among the Elderly, Europe
The data from the European Household Community Panel Survey are:
Nation             Relative Income   Poverty Rate
Netherlands 93.00 7.00
Luxembourg 99.00 8.00
Sweden 83.00 8.00
Germany 97.00 11.00
Italy 96.00 14.00
Spain 91.00 16.00
Finland 78.00 17.00
France 90.00 19.00
United.Kingdom 78.00 21.00
Belgium 76.00 22.00
Austria 84.00 24.00
Denmark 68.00 31.00
Portugal 76.00 33.00
Greece 74.00 33.00
Ireland 69.00 34.00
Linear Regression Lecture [42]
Example: Poverty Among the Elderly, Europe
eu.pov<-read.table("http://MS-702-2009/data/inc.pov.dat",row.names=1)
names(eu.pov) <- c("relative income", "poverty rate")
eu.pov <- eu.pov[-1,]
postscript("Class.Multilevel/trends2.ps")
par(mfrow=c(1,1),mar=c(4,4,2,2),lwd=5,bg="lightgrey")
plot(eu.pov,pch=15,xlab="",ylab="",ylim=c(2,37),xlim=c(61,104))
lines(lowess(eu.pov),col="purple",lwd=3)
text.loc <- cbind(eu.pov[,1],(eu.pov[,2]-1))
text.loc[14,2] <- text.loc[14,2] +2
text.loc[10,2] <- text.loc[10,2] +2
text(text.loc,dimnames(eu.pov)[[1]],cex=1.2)
mtext(side=1,cex=1.3,line=2,"Relative Income, Over 65")
mtext(side=2,cex=1.3,line=2,"Poverty rate, Over 65")
dev.off()
Linear Regression Lecture [43]
Example: Poverty Among the Elderly, Europe
[Figure: scatterplot of the EU data with each country labeled and a lowess fit (purple); x axis: Relative Income, Over 65 (roughly 60 to 100); y axis: Poverty rate, Over 65 (roughly 5 to 35).]
Linear Regression Lecture [44]
Example: Poverty Among the Elderly, Europe
postscript("Class.Multilevel/trends3.ps")
par(mfrow=c(1,1),mar=c(4,4,2,2),lwd=5,bg="lightgrey")
plot(eu.pov,pch=15,xlab="",ylab="",ylim=c(2,37),xlim=c(61,104))
x.y.fit <- lm(eu.pov[,2] ~ eu.pov[,1])
abline(x.y.fit$coefficients,col="forest green")
text.loc <- cbind(eu.pov[,1],(eu.pov[,2]-1))
text.loc[14,2] <- text.loc[14,2] +2
text.loc[10,2] <- text.loc[10,2] +2
text(text.loc,dimnames(eu.pov)[[1]],cex=1.2)
mtext(side=1,cex=1.3,line=2,"Relative Income, Over 65")
mtext(side=2,cex=1.3,line=2,"Poverty rate, Over 65")
dev.off()
Linear Regression Lecture [45]
Example: Poverty Among the Elderly, Europe
[Figure: the same labeled scatterplot with the fitted OLS line (forest green); x axis: Relative Income, Over 65; y axis: Poverty rate, Over 65.]
Linear Regression Lecture [46]
Example: Poverty Among the Elderly, Europe
summary(x.y.fit)
Call:
lm(formula = eu.pov[, 2] ~ eu.pov[, 1])
Residuals:
Min 1Q Median 3Q Max
-12.224 -3.312 1.482 3.923 7.424
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.6928 12.2526 6.831 1.21e-05
eu.pov[, 1] -0.7647 0.1458 -5.246 0.000158
Residual standard error: 5.611 on 13 degrees of freedom
Multiple R-Squared: 0.6792, Adjusted R-squared: 0.6545
F-statistic: 27.52 on 1 and 13 DF, p-value: 0.0001580
Linear Regression Lecture [47]
More on Linear Models
In this example the slope of the line is -0.7647, which tells us how much the poverty rate changes
for a one unit increase in relative income.
The intercept is 83.6928, which is what the poverty rate would be at zero income.
The standard error is a measure of how reliable these estimates are. One common rule of thumb
is to see whether the standard error is half the coefficient estimate or less.
R-Squared tells us how much of the variance of the outcome variable can be explained by the
explanatory variable.
The F-statistic tells us how significant the overall model fit is.
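A quick interval summary of these estimates (sketch; assumes the fitted object x.y.fit from the earlier slide):
confint(x.y.fit, level = 0.95)   # 95% confidence intervals for the intercept and slope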
Linear Regression Lecture [48]
OECD Example
The data are from the Organization for Economic Cooperation and Development (OECD) and
highlight the relationship between
commitment to employment protection measured on an interval scale (0-4) indicating the
quantity and extent of national legislation to protect jobs,
the total factor productivity difference in growth rates between 1980-1990 and 1990-1998,
for 19 countries.
For details, see The Economist, September 23, 2000.
Linear Regression Lecture [49]
OECD Example
Prot. Prod.
United States 0.2 0.5
Canada 0.6 0.6
Australia 1.1 1.3
New Zealand 1.0 0.4
Ireland 1.0 0.1
Denmark 2.0 0.9
Finland 2.2 0.7
Austria 2.4 -0.1
Belgium 2.5 -0.4
Japan 2.6 -0.4
Sweden 2.9 0.5
Netherlands 2.8 -0.5
France 2.9 -0.9
Germany 3.2 -0.2
Greece 3.6 -0.3
Portugal 3.9 0.3
Italy 3.8 -0.3
Spain 3.5 -1.5
Linear Regression Lecture [50]
OECD Example
oecd<-read.table("http://MS-702-2009/data/oecd.data",header=TRUE,row.names=1)
postscript("Class.Multilevel/trends4.ps")
par(bg="lightgrey")
plot(oecd$Prot,oecd$Prod,xlim=c(-0.2,4.2),ylim=c(-1.7,1.7),pch=15,xlab="",ylab="")
x.y.fit <- lsfit(oecd$Prot,oecd$Prod)
abline(x.y.fit$coefficients,col="firebrick")
text(oecd$Prot,(oecd$Prod-0.1),dimnames(oecd)[[1]])
mtext(side=1,cex=1.3,line=2,"Employment Protection Scale")
mtext(side=2,cex=1.3,line=2,"Total Factor Productivity Difference")
dev.off()
Linear Regression Lecture [51]
OECD Example
[Figure: scatterplot of the OECD data with each country labeled and the fitted OLS line (firebrick); x axis: Employment Protection Scale (0 to 4); y axis: Total Factor Productivity Difference (-1.5 to 1.5).]
Linear Regression Lecture [52]
OECD Example
oecd.fit <- lm(oecd$Prod~oecd$Prot)
summary(oecd.fit)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.8591 0.3174 2.706 0.0156
oecd$Prot -0.3496 0.1215 -2.878 0.0109
---
Residual standard error: 0.5761 on 16 degrees of freedom
Multiple R-Squared: 0.3411, Adjusted R-squared: 0.2999
F-statistic: 8.284 on 1 and 16 degrees of freedom, p-value: 0.01093
Linear Regression Lecture [53]
So What Does a Bad Regression Model Look Like?
Run the following....
time = c(0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25, 0.5,
0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25, 0.5, 0.75, 1, 1.25, 2, 3,
4, 5, 6, 8, 0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25,
0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25, 0.5, 0.75, 1, 1.25,
2, 3, 4, 5, 6, 8)
conc = c(1.5, 0.94, 0.78, 0.48, 0.37, 0.19, 0.12, 0.11,
0.08, 0.07, 0.05, 2.03, 1.63, 0.71, 0.7, 0.64, 0.36, 0.32, 0.2,
0.25, 0.12, 0.08, 2.72, 1.49, 1.16, 0.8, 0.8, 0.39, 0.22, 0.12,
0.11, 0.08, 0.08, 1.85, 1.39, 1.02, 0.89, 0.59, 0.4, 0.16, 0.11,
0.1, 0.07, 0.07, 2.05, 1.04, 0.81, 0.39, 0.3, 0.23, 0.13, 0.11,
0.08, 0.1, 0.06, 2.31, 1.44, 1.03, 0.84, 0.64, 0.42, 0.24, 0.17,
0.13, 0.1, 0.09)
#postscript("Class.Multilevel/conc.ps"); par(bg="lightgrey")
conc.fit <- lm(conc~time)
plot(time,conc,pch=5)
abline(conc.fit,col="steelblue4",lwd=3)   # the straight line misses the sharp nonlinear decay in the data
#dev.off()
Linear Regression Lecture [54]
Extended Policy Example
# SIMPLE REGRESSION EXAMPLE USING THE DETROIT MURDER DATASET, YEARS 1961-1974
# R CODE HERE RUNS A BASIC MODEL AND VARIOUS DIAGNOSTICS, EMAIL QUESTIONS
#FTP - Full-time police per 100,000 population
#UEMP - % unemployed in the population
#MAN - number of manufacturing workers in thousands
#NMAN - Number of non-manufacturing workers in thousands
#GOV - Number of government workers in thousands
#LIC - Number of handgun licences per 100,000 population (you can buy one)
#GR - Number of handgun registrations per 100,000 population (you own one)
#CLEAR - % homicides cleared by arrests
#WM - Number of white males in the population
#HE - Average hourly earnings
#WE - Average weekly earnings
Linear Regression Lecture [55]
#HOM - Number of homicides per 100,000 of population
#ACC - Death rate in accidents per 100,000 population
#ASR - Number of assaults per 100,000 population
# LOAD DATA FILE, CREATE DATA FRAME
count.fields("public_html/data/detroit.data")
count.fields("http://jgill.wustl.edu/data/etroit.data")
detroit.df <- read.table("public_html/data/detroit.data",header=TRUE)
detroit.df <- read.table("http://jgill.wustl.edu/data/detroit.data",header=TRUE)
# ATTACH DATA FRAME AND CREATE A SUB-MATRIX
attach(detroit.df)
detroit.mat <- cbind(FTP,UEMP,LIC,CLEAR,HE,HOM)
# LOOK AT DATA WITH SCATTERPLOTS
# postscript("S.Dir/detroit.fig1.ps")
par(oma=c(1,1,4,1),bg="lightgrey")
pairs(detroit.mat)
# dev.off()
Linear Regression Lecture [56]
# RUN A MODEL, SUMMARIZE THE RESULTS
detroit.ols <- lm(HOM~FTP+UEMP+LIC+CLEAR+HE)
summary(detroit.ols)
detroit.table <- cbind(summary(detroit.ols)$coef[,1:2],
(summary(detroit.ols)$coef[,1] - 1.96*summary(detroit.ols)$coef[,2]),
(summary(detroit.ols)$coef[,1] + 1.96*summary(detroit.ols)$coef[,2]))
dimnames(detroit.table)[[2]] <-
c("Estimate","Std. Error","95% CI Lower","95% CI Upper")
detroit.table
# QQNORM DIAGNOSTIC ON RESIDUALS
# postscript("S.Dir/detroit.fig2.ps")
par(mfrow=c(1,1),oma=c(4,4,4,4),bg="lightgrey")
qqnorm(detroit.ols$residuals)
qqline(detroit.ols$residuals)
# dev.off()
# MISC DIAGNOSTIC VALUES
detroit.ols$residuals; detroit.ols$residual; residuals(detroit.ols)
Linear Regression Lecture [57]
rstudent(detroit.ols)
dfbetas(detroit.ols)
dffits(detroit.ols)
covratio(detroit.ols)
cooks.distance(detroit.ols)
X <- detroit.mat[,-6]
diag(X%*%solve(t(X)%*%X)%*%t(X))
# COOKS D DIAGNOSTIC
# postscript("S.Dir/detroit.fig3.ps")
cooks.vals <- cooks.distance(detroit.ols)
R.vals <- rstudent(detroit.ols)   # studentized residuals, to match the axis label below
par(mfrow=c(2,1),mar=c(0,5,0,1),oma=c(5,5,5,5),bg="lightgrey")
plot(seq(1,13,length=13),rep(max(cooks.vals),13),type="n",xaxt="n",
ylab="Studentized Residuals by Year",xlab="",ylim=c(-3.7,3.7))
abline(h=0)
for (i in 1:length(R.vals)) segments(i,0,i,R.vals[i])
plot(seq(1,13,length=13),rep(max(cooks.vals),13),type="n",xaxt="n",
ylab="Cooks Distance by Year",
ylim=c(0,6), xlab="")
abline(h=0)
Linear Regression Lecture [58]
for (i in 1:length(cooks.vals))
segments(i,0,i,cooks.vals[i])
mtext(side=3,cex=1.3,"Leverage and Influence: Detroit Murder Data",outer=T,line=2)
# dev.off()
# JACKKNIFE OUT THE 8TH CASE AND RERUN MODEL
detroit2.df <- detroit.df[-8,]
detroit2.ols <- lm(HOM~FTP+UEMP+LIC+CLEAR+HE,data=detroit2.df)
detroit2.table <- cbind(summary(detroit2.ols)$coef[,1:2],
(summary(detroit2.ols)$coef[,1] - 1.96*summary(detroit2.ols)$coef[,2]),
(summary(detroit2.ols)$coef[,1] + 1.96*summary(detroit2.ols)$coef[,2]))
dimnames(detroit2.table)[[2]] <-
c("Estimate","Std. Error","95% CI Lower","95% CI Upper")
detroit2.table
