Review of Linear Model Basics
Linear Regression Lecture [1]
Gauss-Markov Assumptions for Classical Linear Regression
Functional Form: $Y_{(n \times 1)} = X_{(n \times k)}\beta_{(k \times 1)} + \epsilon_{(n \times 1)}$ (X is full rank and has a leading column of 1s)
Mean Zero Errors: $E[\epsilon|X] = 0$
Homoscedasticity: $\mathrm{Var}[\epsilon] = \sigma^2 I$
Non-Correlated Errors: $\mathrm{Cov}[\epsilon_i, \epsilon_j] = 0, \; i \neq j$
Exogeneity of Explanatory Variables: $\mathrm{Cov}[\epsilon_i, X] = 0, \; \forall i$
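A minimal R sketch of data generated under these assumptions (the values of n, beta, and sigma below are hypothetical):
# simulate data satisfying the Gauss-Markov assumptions (hypothetical n, beta, sigma)
set.seed(1)
n <- 100
X <- cbind(1, runif(n), rnorm(n))      # full rank, leading column of 1s
beta <- c(2, -1, 0.5)                  # hypothetical true coefficients
sigma <- 1
eps <- rnorm(n, mean=0, sd=sigma)      # mean-zero, homoscedastic, uncorrelated errors
Y <- drop(X %*% beta) + eps            # functional form: Y = X beta + eps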
Linear Regression Lecture [2]
Other Considerations
Requirements: conformability, X is rank k
Freebie: eventual normality... $\epsilon|X \sim N(0, \sigma^2 I)$
Linear Regression Lecture [3]
Least Squares Estimation (OLS)
Define the following function:
$$S(\beta) = \epsilon'\epsilon = (Y - X\beta)'(Y - X\beta) = \underset{(1 \times n)(n \times 1)}{Y'Y} - \underset{(1 \times n)(n \times k)(k \times 1)}{2Y'X\beta} + \underset{(1 \times k)(k \times n)(n \times k)(k \times 1)}{\beta'X'X\beta}$$
Take the derivative of $S(\beta)$ with respect to $\beta$:
$$\frac{\partial}{\partial\beta}S(\beta) = 0 \quad\Longrightarrow\quad -2\underset{(k \times n)(n \times 1)}{X'Y} + 2\underset{(k \times n)(n \times k)(k \times 1)}{X'X\beta} \equiv 0$$
So there exists a solution at some value $\hat\beta$ of $\beta$: $X'Xb = X'Y$, which is the Normal Equation.
Premultiplying the Normal Equation by $(X'X)^{-1}$ gives:
$$\hat\beta = (X'X)^{-1}X'Y,$$
where we can call $\hat\beta$ as $b$ for notational convenience (this is where the requirement for $X'X$ to be nonsingular comes from).
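A quick sketch showing that solving the normal equation directly reproduces lm() (reuses the simulated X and Y from the sketch above):
b <- solve(t(X) %*% X, t(X) %*% Y)     # solve the normal equations X'Xb = X'Y
cbind(b, coef(lm(Y ~ X - 1)))          # same answer as lm(); X already carries the 1s column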
Linear Regression Lecture [4]
Implications
Normal Equation: $X'Xb - X'Y = -X'(Y - Xb) = -X'e \equiv 0$ (by construction), so $\sum e_i \equiv 0$
the regression hyperplane passes through the means: $\bar Y = \bar{X}b$, so $\mathrm{mean}(\hat Y) = \mathrm{mean}(Y)$
the hat matrix ($H$, $P$, or $(I - M)$):
$$e = Y - Xb = Y - X((X'X)^{-1}X'Y) = Y - (X(X'X)^{-1}X')Y = Y - HY = (I - H)Y = MY$$
where M is symmetric and idempotent.
Linear Regression Lecture [5]
The HAT Matrix
the name is because $\hat Y = Xb = X((X'X)^{-1}X'Y) = (X(X'X)^{-1}X')Y = HY$, but "projection matrix" is better for geometric reasons.
properties of interest:
$I - M = P$, $\quad I - P = M$
$PX = X$
$PM = MP = 0$ (orthogonality)
$e'e = Y'M'MY = Y'MY = Y'e$ (sum of squares)
Using the Normal Equation, $X'Xb = X'Y$:
$$e'e = (Y - Xb)'(Y - Xb) = Y'Y - Y'Xb - b'X'Y + b'X'Xb$$
$$= Y'Y - (X'Xb)'b - b'(X'Xb) + b'X'Xb$$
$$= Y'Y - (b'X')(Xb)$$
$$= Y'Y - \hat Y'\hat Y$$
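A short numerical check of these identities (sketch; reuses the simulated X and Y from above):
H <- X %*% solve(t(X) %*% X) %*% t(X)      # hat/projection matrix
M <- diag(nrow(X)) - H                     # residual-maker matrix
all.equal(M %*% M, M)                      # M is idempotent
all.equal(drop(M %*% Y), unname(residuals(lm(Y ~ X - 1))))   # e = MY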
Linear Regression Lecture [6]
Fit & Decomposition
Definitions:
$$SST = \sum_{i=1}^{n}(Y_i - \bar Y)^2 \qquad SSR = \sum_{i=1}^{n}(\hat Y_i - \bar Y)^2 \qquad SSE = \sum_{i=1}^{n}(\hat Y_i - Y_i)^2$$
Linear Regression Lecture [7]
Fit & Decomposition
Interesting manipulations of the sum of squares total:
$$SST = \sum_{i=1}^{n}(Y_i^2 - 2Y_i\bar Y + \bar Y^2)$$
$$= \sum_{i=1}^{n}Y_i^2 - 2\sum_{i=1}^{n}Y_i\bar Y + n\bar Y^2$$
$$= \sum_{i=1}^{n}Y_i^2 - 2n\bar Y^2 + n\bar Y^2$$
$$= \sum_{i=1}^{n}Y_i^2 - n\bar Y^2 \quad \text{(scalar description)}$$
$$= \sum_{i=1}^{n}Y_i^2 - n\left(\frac{1}{n}\sum_{i=1}^{n}Y_i\right)^2$$
$$= Y'Y - \frac{1}{n}Y'JY \quad \text{(matrix algebra description)}$$
where $J$ is an $n \times n$ matrix of all 1s.
Linear Regression Lecture [8]
More Decomposition
Sum of Squares Regression:
$$SSR = \sum_{i=1}^{n}(\hat Y_i^2 - 2\hat Y_i\bar Y + \bar Y^2)$$
$$= \hat Y'\hat Y - 2\bar Y\sum_{i=1}^{n}\hat Y_i + n\bar Y^2$$
$$= (b'X')Xb - 2n\bar Y^2 + n\bar Y^2$$
$$= (b'X')Xb - n\bar Y^2$$
$$= b'X'Y - \frac{1}{n}Y'JY$$
Linear Regression Lecture [9]
Sum of Squares Error:
$$SSE = \sum_{i=1}^{n}(\hat Y_i - Y_i)^2 = e'e = (Y - Xb)'(Y - Xb)$$
$$= Y'Y - Y'Xb - b'X'Y + b'X'Xb$$
$$= Y'Y - b'X'Y + (b'X' - Y')Xb$$
$$= Y'Y - b'X'Y + (0)Xb$$
$$= Y'Y - b'X'Y$$
Linear Regression Lecture [10]
Magic
SSR + SSE:
$$SSR + SSE = \left(b'X'Y - \frac{1}{n}Y'JY\right) + \left(Y'Y - b'X'Y\right) = Y'Y - \frac{1}{n}Y'JY = SST$$
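A quick numerical check of the decomposition (sketch; reuses the simulated X and Y from above):
fit  <- lm(Y ~ X - 1)                      # X already includes the constant column
Yhat <- fitted(fit)
SST <- sum((Y - mean(Y))^2)
SSR <- sum((Yhat - mean(Y))^2)
SSE <- sum((Y - Yhat)^2)
all.equal(SST, SSR + SSE)                  # the decomposition holds with a constant in the model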
Linear Regression Lecture [11]
A Measure of Fit
The R-Square or R-Squared measure:
$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{e'e}{Y'M^oY} = \frac{b'X'M^oXb}{Y'M^oY}$$
where $M^o = I - \frac{1}{n}ii'$, $i = c(1, 1, \ldots, 1)$.
Note: $M^o$ is idempotent and transforms means to deviances for the explanatory variables.
Warning: $R^2$ is not a statistic and does not have quite the meaning that some expect.
There is another version that accounts for sample size and the number of explanatory variables:
$$R^2_{adj} = 1 - \frac{e'e/(n-k)}{Y'M^oY/(n-1)}$$
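These quantities computed directly (sketch; reuses the simulated X and Y, with lm() supplying its own intercept so that summary() reports the centered R-squared):
fit2 <- lm(Y ~ X[, -1])                    # let lm() supply the intercept
n <- length(Y); k <- ncol(X)
e  <- residuals(fit2)
Mo <- diag(n) - matrix(1, n, n)/n          # M^o: deviations-from-mean operator
yMy <- drop(t(Y) %*% Mo %*% Y)
c(R2     = 1 - sum(e^2)/yMy,
  R2.adj = 1 - (sum(e^2)/(n - k)) / (yMy/(n - 1)),
  lm.R2  = summary(fit2)$r.squared,
  lm.adj = summary(fit2)$adj.r.squared)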
Linear Regression Lecture [12]
Properties of OLS
b is unbiased for $\beta$:
$$b = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \epsilon) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\epsilon = \beta + (X'X)^{-1}X'\epsilon$$
So $E[b] = \beta$.
Linear Regression Lecture [13]
Properties of OLS
The variance of b is:
$$\mathrm{Var}[b] = E[(b - \beta)(b - \beta)'] - (E[b - \beta])(E[b - \beta])'$$
$$= E[(b - \beta)(b - \beta)']$$
$$= E[((X'X)^{-1}X'\epsilon)((X'X)^{-1}X'\epsilon)']$$
$$= (X'X)^{-1}X'E[\epsilon\epsilon']X(X'X)^{-1}$$
$$= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}$$
Linear Regression Lecture [14]
Properties of OLS
Given the Gauss-Markov assumptions, b is BLUE for $\beta$ if calculated from OLS.
$b|X \sim N(\beta, \sigma^2(X'X)^{-1})$.
$\widehat{\mathrm{Var}}[b] = s^2(X'X)^{-1}$, where $s^2 = e'e/(n-k)$, the squared standard error of the regression.
Most econometric texts make a distinction between non-stochastic explanatory variables (those set in experimental settings) and stochastic explanatory variables, which are set by nature/survey/others.
Perspective here: X is fixed once observed and $\epsilon$ is the random variable:
since $\mathrm{Var}[\epsilon] = \sigma^2 I$, no single term dominates and we get the Lindeberg-Feller CLT result.
so the $\epsilon_i$ are IID normal and we write the joint PDF as:
$$f(\epsilon) = \prod_{i=1}^{n}f(\epsilon_i) = (2\pi\sigma^2)^{-\frac{n}{2}}\exp[-\epsilon'\epsilon/2\sigma^2]$$
(motivating estimation of $\sigma^2$ from $e'e$).
Linear Regression Lecture [15]
Estimating From Sample Quantities
Population derived variance/covariance matrix: $\mathrm{Var}[b] = \sigma^2(X'X)^{-1}$.
We also know that each residual $e_i$ estimates $\epsilon_i$.
And: $E[\epsilon_i^2] = \mathrm{Var}[\epsilon_i] + (E[\epsilon_i])^2 = \sigma^2$, so $E[\epsilon'\epsilon] = \mathrm{tr}(\sigma^2 I) = n\sigma^2$.
So why not use: $\hat\sigma^2 = \frac{1}{n}\sum e_i^2$?
But:
$$e_i = y_i - X_i b \quad \text{(now insert population values)}$$
$$= (X_i\beta + \epsilon_i) - X_i b$$
$$= \epsilon_i - X_i(b - \beta),$$
so $e_i \neq \epsilon_i$ in general.
Linear Regression Lecture [16]
Estimating From Sample Quantities
Recall that $M = I - H = I - X(X'X)^{-1}X'$. So that:
$$My = (I - X(X'X)^{-1}X')y = y - X(X'X)^{-1}X'y = y - Xb = e$$
So $e = My$ since $e = y - \hat y$.
And therefore:
$$My = M[X\beta + \epsilon] = MX\beta + M\epsilon = (I - X(X'X)^{-1}X')X\beta + M\epsilon = X\beta - X(X'X)^{-1}X'X\beta + M\epsilon = M\epsilon$$
So $e = M\epsilon$ and $e'e = (M\epsilon)'M\epsilon = \epsilon'M'M\epsilon = \epsilon'M\epsilon$.
Linear Regression Lecture [17]
Estimating From Sample Quantities
So we can use this:
$$E[e'e|X] = E[\epsilon'M\epsilon|X] \qquad (\text{since } e = M\epsilon)$$
$$= E[\mathrm{tr}(\epsilon'M\epsilon)|X] \qquad (\text{a scalar equals its own trace})$$
$$= E[\mathrm{tr}(M\epsilon\epsilon')|X] \qquad (\text{property of traces: } \mathrm{tr}(ABC) = \mathrm{tr}(CAB))$$
$$= \mathrm{tr}(M E[\epsilon\epsilon'|X]) \qquad (M \text{ is fixed for observed } X)$$
$$= \mathrm{tr}(M)\sigma^2 \qquad (\text{Gauss-Markov assumption: } E[\epsilon\epsilon'|X] = \sigma^2 I)$$
$$= [\mathrm{tr}(I) - \mathrm{tr}(X(X'X)^{-1}X')]\sigma^2$$
$$= [n - k]\sigma^2$$
Tricks:
$\mathrm{tr}[X(X'X)^{-1}X'] = \mathrm{tr}[(X'X)^{-1}X'X] = k$
$\mathrm{rank}[A] = \mathrm{tr}[A]$ for symmetric idempotent $A$
$\mathrm{rank}[ABC] = \mathrm{rank}[B]$ if $A$, $C$ are nonsingular
so $\mathrm{tr}[H] = \mathrm{rank}[X(X'X)^{-1}X'] = \mathrm{rank}[X] = k$
Linear Regression Lecture [18]
Estimating From Sample Quantities
Because the naive estimator $\frac{1}{n}\sum e_i^2$ is biased and $E[e'e|X] = (n-k)\sigma^2$, we use:
$$\hat\sigma^2 = \frac{e'e}{n-k} = s^2,$$
so that an estimator of $\mathrm{Var}[b]$ is:
$$\widehat{\mathrm{Var}}[b] = s^2(X'X)^{-1}.$$
This sets up Wald-style traditional linear inference:
$$z_k = \frac{b_k - \beta_k^{null}}{\sqrt{\sigma^2(X'X)^{-1}_{kk}}} \;\overset{\text{asym.}}{\sim}\; N(0, 1),$$
provided that we know $\sigma^2$ (which we usually do not).
But we know that:
$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2_{n-k}, \qquad \text{and} \qquad \frac{z_k}{\sqrt{\chi^2/df}} \sim t_{(n-k)}$$
if the random variables $z_k$ and $\chi^2$ are independent.
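These quantities by hand (sketch; reuses the simulated X and Y from above):
n <- nrow(X); k <- ncol(X)
b  <- solve(t(X) %*% X, t(X) %*% Y)
e  <- Y - drop(X %*% b)
s2 <- sum(e^2)/(n - k)                     # sigma-hat^2 = e'e/(n - k)
Vb <- s2 * solve(t(X) %*% X)               # estimated Var[b]
cbind(est = drop(b), se = sqrt(diag(Vb)), t = drop(b)/sqrt(diag(Vb)))   # matches summary(lm(Y ~ X - 1))$coefficients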
Linear Regression Lecture [19]
Estimating From Sample Quantities
Making the obvious substitution gives:
$$t_{(n-k)} = \frac{(b_k - \beta_k^{null})\Big/\sqrt{\sigma^2(X'X)^{-1}_{kk}}}{\sqrt{\dfrac{(n-k)s^2}{\sigma^2(n-k)}}} = \frac{b_k - \beta_k^{null}}{\sqrt{s^2(X'X)^{-1}_{kk}}}$$
Typical (Wald) regression test:
$$H_0: \beta_k = 0 \qquad H_1: \beta_k \neq 0$$
making:
$$t_{(n-k)} = \frac{b_k - \beta_k^{null}}{\sqrt{s^2(X'X)^{-1}_{kk}}} = \frac{b_k}{SE(b_k)}$$
Alternatives usually look like:
$$H_0: \beta_k < 1 \qquad H_1: \beta_k \geq 1$$
making:
$$t_{(n-k)} = \frac{b_k - 1}{SE(b_k)}$$
Linear Regression Lecture [20]
Summary Statistics
$(1 - \alpha)$ Confidence Interval for $b_k$:
$$\Big[\, b_k - SE(b_k)\,t_{\alpha/2, df} \;:\; b_k + SE(b_k)\,t_{\alpha/2, df} \,\Big]$$
$(1 - \alpha)$ Confidence Interval for $\sigma^2$:
$$\left[\, \frac{(n-k)s^2}{\chi^2_{\alpha/2}} \;:\; \frac{(n-k)s^2}{\chi^2_{1-\alpha/2}} \,\right]$$
F-statistic test that all coefficients except $b_0$ are zero:
$$F = \frac{SSR/(k-1)}{SSE/(n-k)} \sim F_{k-1, n-k} \quad \text{under the null}$$
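A sketch of these summary quantities (reuses the simulated X and Y; the 95% level is a hypothetical choice):
n <- nrow(X); k <- ncol(X); alpha <- 0.05
fit <- lm(Y ~ X - 1)
s2  <- sum(residuals(fit)^2)/(n - k)
se  <- sqrt(diag(s2 * solve(t(X) %*% X)))
cbind(lower = coef(fit) - qt(1 - alpha/2, n - k)*se,
      upper = coef(fit) + qt(1 - alpha/2, n - k)*se)        # CIs for each b_k
c((n - k)*s2/qchisq(1 - alpha/2, n - k),
  (n - k)*s2/qchisq(alpha/2, n - k))                        # CI for sigma^2
SSE <- sum(residuals(fit)^2); SST <- sum((Y - mean(Y))^2); SSR <- SST - SSE
(SSR/(k - 1)) / (SSE/(n - k))                               # overall F statistic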
Linear Regression Lecture [21]
Multicollinearity Issues
If one explanatory variable is a linear combination of another, then $\mathrm{rank}(X) = k - 1$.
Therefore $\mathrm{rank}(X'X) = k - 1$ (a $k \times k$ matrix), and it is singular and non-invertible.
Now no parameter estimates are possible, and the model is unidentified.
More typically: 2 explanatory variables are highly but not perfectly correlated.
Symptoms:
small changes in the data give large changes in parameter estimates.
coefficients have large standard errors and poor t-statistics even if F-statistics and $R^2$ are okay.
coefficients seem illogical (wrong sign, huge magnitude)
Linear Regression Lecture [22]
Multicollinearity Remedies
respecify model (if reasonable)
center explanatory variables, or standardize
ridge regression (add a little bias), sketched below:
$$b = [X'X + RI]^{-1}X'y$$
such that the $[\;]$ part barely inverts.
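A minimal sketch of the ridge estimator (reuses the simulated X and Y; the value of the ridge constant here is purely hypothetical):
R.const <- 0.1                                      # hypothetical small ridge constant
b.ridge <- solve(t(X) %*% X + R.const*diag(ncol(X)), t(X) %*% Y)
cbind(OLS = drop(solve(t(X) %*% X, t(X) %*% Y)), ridge = drop(b.ridge))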
Linear Regression Lecture [23]
Is b Unbiased?
Starting with b:
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \epsilon) = \beta + (X'X)^{-1}X'\epsilon,$$
and take expected values:
$$E[b] = E[\beta + (X'X)^{-1}X'\epsilon] = E[\beta] + E[(X'X)^{-1}X'\epsilon] = \beta + E[K\epsilon] = \beta,$$
where $K = (X'X)^{-1}X'$ is fixed given X.
Linear Regression Lecture [24]
What is the Variance of b?
By definition:
$$\mathrm{Var}[b|X] = E[(b - \beta)(b - \beta)'|X] - (E[b - \beta|X])(E[b - \beta|X])'.$$
Since $b - \beta = (X'X)^{-1}X'\epsilon$,
$$\mathrm{Var}[b|X] = E\Big[((X'X)^{-1}X'\epsilon)((X'X)^{-1}X'\epsilon)'\,\Big|\,X\Big]$$
$$= E\Big[(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}\,\Big|\,X\Big]$$
$$= (X'X)^{-1}X'E[\epsilon\epsilon'|X]X(X'X)^{-1}$$
$$= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}X'X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}$$
Linear Regression Lecture [25]
Testing Linear Restrictions
A theory has testable implications if it implies some testable restrictions on the model definition:
$$H_0: \beta_k = 0 \qquad \text{versus} \qquad H_1: \beta_k \neq 0$$
for example.
Most restrictions involve a nested parameter space:
unrestricted: $[\beta_0, \beta_1, \beta_2, \beta_3]$
restricted: $[\beta_0, 0, \beta_2, \beta_3]$.
Note that non-nested comparisons cause problems for non-Bayesians.
Linear Regression Lecture [26]
Testing Linear Restrictions
Linear restrictions for regression are clear when:
$$r_{11}\beta_1 + r_{12}\beta_2 + \ldots + r_{1k}\beta_k = q_1$$
$$r_{21}\beta_1 + r_{22}\beta_2 + \ldots + r_{2k}\beta_k = q_2$$
$$\vdots$$
$$r_{J1}\beta_1 + r_{J2}\beta_2 + \ldots + r_{Jk}\beta_k = q_J$$
or in more succinct matrix algebra form: $R_{(J \times k)}\beta = q$.
Notes:
Each row of R is one restriction.
$J < k$ for R to be full rank.
This setup imposes J restrictions on k parameters, so there are $k - J$ free parameters left.
We are still assuming that $\epsilon_i \sim N(0, \sigma^2)$.
General test:
$$H_0: R\beta - q = 0 \qquad H_1: R\beta - q \neq 0$$
Linear Regression Lecture [27]
Testing Linear Restrictions, Examples
One of the coefficients is zero, $\beta_j = 0$, $J = 1$:
$$R = [0, 0, 0, \underset{j}{1}, \ldots, 0, 0, 0], \qquad q = 0$$
Two coefficients are equal, $\beta_j = \beta_k$ (one restriction, $J = 1$):
$$R = [0, 0, 0, \underset{j}{1}, \ldots, \underset{k}{-1}, \ldots, 0, 0, 0], \qquad q = 0$$
Three coefficients are zero, $J = 3$:
$$R = \begin{bmatrix} 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & \ldots & 0 \\ 0 & 0 & 1 & 0 & \ldots & 0 \end{bmatrix} = [I : 0], \qquad q = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
Linear Regression Lecture [28]
Testing Linear Restrictions, Examples
A set of coefficients sum to one, $\beta_2 + \beta_3 + \beta_4 = 1$:
$$R = [0, 1, 1, 1, 0, \ldots, 0], \qquad q = 1$$
Several restrictions, $\beta_2 + \beta_3 = 4$, $\beta_4 + \beta_6 = 0$, $\beta_5 = 9$:
$$R = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad q = \begin{bmatrix} 4 \\ 0 \\ 9 \end{bmatrix}$$
All coefficients except the constant are zero:
$$R = [0 : I], \qquad q = [0]$$
Linear Regression Lecture [29]
Testing Linear Restrictions, Examples
Define now the discrepancy vector dictated by the null hypothesis:
$$Rb - q = m \approx 0,$$
which asks whether m is sufficiently different from zero. Note that m is a linear function of b and therefore also normally distributed.
This makes it easy to think about:
$$E[m|X] = R\,E[b|X] - q = R\beta - q = 0$$
$$\mathrm{Var}[m|X] = \mathrm{Var}[Rb - q|X] = R\,\mathrm{Var}[b|X]\,R' = \sigma^2 R(X'X)^{-1}R'$$
Wald Test:
$$W = m'[\mathrm{Var}[m|X]]^{-1}m = (Rb - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb - q) \sim \chi^2_J$$
where J is the number of rows of R, i.e. the number of restrictions.
Linear Regression Lecture [30]
Testing Linear Restrictions, Examples
Unfortunately we do not have $\sigma^2$, so we use the test:
$$F = \frac{(Rb - q)'(\sigma^2 R(X'X)^{-1}R')^{-1}(Rb - q)\big/J}{\big[(n-k)s^2/\sigma^2\big]\big/(n-k)} = \frac{\chi^2_J/J}{\chi^2_{n-k}/(n-k)} \sim F_{J, n-k}$$
that's the distributional interpretation; now simplify:
$$F = \frac{1}{J}(Rb - q)'(s^2 R(X'X)^{-1}R')^{-1}(Rb - q)$$
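A sketch of this F statistic computed directly (reuses the simulated X and Y; the particular restriction tested is a hypothetical choice):
n <- nrow(X); k <- ncol(X)
b  <- solve(t(X) %*% X, t(X) %*% Y)
s2 <- sum((Y - drop(X %*% b))^2)/(n - k)
R  <- rbind(c(0, 1, 0),                            # hypothetical restriction: beta_2 = 0
            c(0, 0, 1))                            #                           beta_3 = 0
q  <- c(0, 0)
J  <- nrow(R)
m  <- R %*% b - q                                  # discrepancy vector
F.stat <- drop(t(m) %*% solve(s2 * R %*% solve(t(X) %*% X) %*% t(R)) %*% m)/J
c(F = F.stat, p.value = pf(F.stat, J, n - k, lower.tail = FALSE))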
Linear Regression Lecture [31]
Testing Linear Restrictions, Examples
Example with 1 linear restriction; start with the definition:
$$H_0: r_1\beta_1 + r_2\beta_2 + \ldots + r_k\beta_k = r'\beta = q$$
$$F_{1, n-k} = \frac{\left(\sum_j r_j b_j - q\right)^2}{\sum_j\sum_k r_j r_k\,\mathrm{Est.Cov.}[b_j, b_k]}$$
and be more specific: $\beta_j = 0$, so $R = [0, 0, \ldots, 0, 1, 0, \ldots, 0, 0]$, $q = [0]$,
so that $R(X'X)^{-1}R'$ simplifies to the jth diagonal element of $(X'X)^{-1}$. Now:
$$Rb - q = b_j - q, \qquad F_{1, n-k}^{\frac{1}{2}} = \frac{b_j - q}{\big(\mathrm{Est.Var.}[b_j]\big)^{\frac{1}{2}}}.$$
Linear Regression Lecture [32]
Non-Normal Errors
Without $\epsilon_i \sim N(0, \sigma^2)$ we do not have the F, t, and $\chi^2$ results (bummer).
Despite that, we know that the asymptotic distribution of b is:
$$b \sim N\left(\beta, \frac{\sigma^2}{n}Q^{-1}\right), \quad \text{where } Q = \mathrm{plim}\left(\frac{X'X}{n}\right) \text{ and } \mathrm{plim}\,s^2 = \sigma^2 \text{ with } s^2 = \frac{e'e}{n-k}.$$
So the test statistic is:
$$t_k = \frac{\sqrt{n}(b_k - \beta_k^{null})}{\sqrt{s^2(X'X/n)^{-1}_{kk}}} \sim t_{n-k}$$
provided that $\epsilon_i \sim N(0, \sigma^2)$.
Since the denominator converges to $\sqrt{\sigma^2 Q^{-1}_{kk}}$, then:
$$\tau_k = \frac{\sqrt{n}(b_k - \beta_k^{null})}{\sqrt{\sigma^2 Q^{-1}_{kk}}} = \frac{\sqrt{n}(b_k - \beta_k^{null})}{\big(\mathrm{Asym.Var.}[b_k]\big)^{\frac{1}{2}}}$$
thus asymptotically justifying a t-test without the assumption of normality.
Linear Regression Lecture [33]
Non-Normal Errors Summary
Major theorem: if
$$\sqrt{n}(b - \beta) \overset{d}{\longrightarrow} N(0, \sigma^2 Q^{-1})$$
and if
$$H_0: R\beta - q = 0$$
is true, then
$$W = (Rb - q)'[R\,s^2(X'X)^{-1}R']^{-1}(Rb - q) = JF \overset{d}{\longrightarrow} \chi^2_J$$
Linear Regression Lecture [34]
Testing Nonlinear Restriction
$H_0: c(\beta) = q$, where $c(\beta)$ is some nonlinear function.
Simple 1-restriction case:
$$z = \frac{c(\hat\beta) - q}{\mathrm{est.SE}} \sim t_{n-k}$$
(or equivalently $z^2 \sim F_{1, n-k}$).
But getting est.SE is hard, so use the first two terms of a Taylor series expansion to get an estimate:
$$f(b) = f(a) + f'(a)\frac{(b-a)^1}{1!} + f''(a)\frac{(b-a)^2}{2!} + f'''(a)\frac{(b-a)^3}{3!} + \ldots$$
meaning:
$$c(\hat\beta) \approx c(\beta) + \left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)$$
so $\mathrm{plim}\,\hat\beta = \beta$ justifies using $c(\hat\beta)$ instead of $c(\beta)$.
Linear Regression Lecture [35]
Testing Nonlinear Restriction
Now we can calculate the needed variance term:
$$\mathrm{Var}(c(\hat\beta)) = E\big[c(\hat\beta)^2\big] - \big(E[c(\hat\beta)]\big)^2$$
$$= E\left[\left(c(\beta) + \left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)\right)^2\right] - \big(E[c(\hat\beta)]\big)^2$$
$$= E\left[c(\beta)^2 + 2c(\beta)\left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta) + \left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)(\hat\beta - \beta)'\left(\frac{\partial c(\beta)}{\partial\beta}\right)\right] - \big(E[c(\hat\beta)]\big)^2$$
$$= q^2 + 2q\left(\frac{\partial c(\beta)}{\partial\beta}\right)'(0) + E\left[\left(\frac{\partial c(\beta)}{\partial\beta}\right)'(\hat\beta - \beta)(\hat\beta - \beta)'\left(\frac{\partial c(\beta)}{\partial\beta}\right)\right] - q^2$$
$$= \left(\frac{\partial c(\beta)}{\partial\beta}\right)'\mathrm{Var}(\hat\beta)\left(\frac{\partial c(\beta)}{\partial\beta}\right)$$
since $E\big[(\hat\beta - \beta)(\hat\beta - \beta)'\big] = \mathrm{Var}(\hat\beta)$.
All this means that we can use sample estimates for $\partial c(\beta)/\partial\beta$ and plug in $s^2(X'X)^{-1}$ for $\mathrm{Var}(\hat\beta)$, and then test with a normal distribution.
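A sketch of this delta-method calculation (reuses the simulated X and Y; the nonlinear function c(beta) = beta_2 * beta_3 is a hypothetical example):
n <- nrow(X); k <- ncol(X)
b  <- drop(solve(t(X) %*% X, t(X) %*% Y))
s2 <- sum((Y - drop(X %*% b))^2)/(n - k)
Vb <- s2 * solve(t(X) %*% X)
c.hat <- b[2]*b[3]                                  # c(beta-hat) for the hypothetical c()
grad  <- c(0, b[3], b[2])                           # gradient of c() evaluated at beta-hat
se.c  <- sqrt(drop(t(grad) %*% Vb %*% grad))
c(estimate = c.hat, se = se.c, z = c.hat/se.c)      # test of H0: c(beta) = 0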
Linear Regression Lecture [36]
Linear Model Predictions
We want the predicted value for $x^0$ not in the sample:
$$y^0 = x^{0\prime}\beta + \epsilon^0 \qquad\qquad \hat y^0 = x^{0\prime}b$$
since $\hat y^0$ is the LMVUE of $E[y^0|x^0]$.
The prediction error is:
$$e^0 = y^0 - \hat y^0 = x^{0\prime}(\beta - b) + \epsilon^0.$$
The prediction variance is:
$$\mathrm{Var}[e^0|X, x^0] = \sigma^2 + \mathrm{Var}[x^{0\prime}(\beta - b)|X, x^0] = \sigma^2 + x^{0\prime}\sigma^2(X'X)^{-1}x^0$$
and if we have a constant term in the regression, this is equivalent to:
$$\sigma^2\left[1 + \frac{1}{n} + \sum_{j=1}^{K-1}\sum_{k=1}^{K-1}(x_j^0 - \bar x_j)(x_k^0 - \bar x_k)\,(X_1'M^0X_1)^{jk}\right],$$
where $X_1$ is X omitting the first column, K is the number of explanatory variables (including the constant), $M^0 = I - \frac{1}{n}ii'$, and the superscript $jk$ denotes the $jk$th element of the inverse matrix.
Linear Regression Lecture [37]
Linear Model Predictions
This shows that the variance is greater the further away $x^0$ is from $\bar x$. Classic diagram: prediction intervals are modeled as
$$\hat y^0 \pm t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(e^0)}$$
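A sketch of a prediction interval at a new point (reuses the simulated X and Y; the new point x0 is a hypothetical value):
n <- nrow(X); k <- ncol(X)
b   <- drop(solve(t(X) %*% X, t(X) %*% Y))
s2  <- sum((Y - drop(X %*% b))^2)/(n - k)
x0  <- c(1, 0.9, 1.5)                               # hypothetical new point (leading 1 for the constant)
y0.hat <- sum(x0*b)
var.e0 <- s2 * (1 + drop(t(x0) %*% solve(t(X) %*% X) %*% x0))
y0.hat + c(-1, 1)*qt(0.975, n - k)*sqrt(var.e0)     # 95% prediction interval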
Linear Regression Lecture [38]
Running Lowess
postscript("Class.Multilevel/trends1.ps")
par(bg="lightgrey")
x <- seq(1,25,length=600)
y <- (2/(pi*x))^(0.5)*(1-cos(x)) + rnorm(100,0,1/10)
par(mar=c(3,3,2,2))
plot(x,y,pch="+")
ols.object <- lm(y~x)
abline(ols.object,col="blue")
lo.object <- lowess(y~x,f=2/3)
lines(lo.object$x,lo.object$y,lwd=2,col="red")
lo.object <- lowess(y~x,f=1/5)
lines(lo.object$x,lo.object$y,lwd=2,col="purple")
dev.off()
Linear Regression Lecture [39]
Running Lowess
[Figure: the scatterplot produced by the code above, with the simulated points (plotted as "+"), the straight OLS line (blue), and the two lowess fits (red: f=2/3, purple: f=1/5); x axis roughly 5 to 25, y axis roughly -0.2 to 1.0.]
Linear Regression Lecture [40]
Example: Poverty Among the Elderly, Europe
Governments often worry about the economic condition of senior citizens for political and social
reasons.
Typically in a large industrialized society, a substantial portion of these people obtain the bulk of
their income from government pensions.
An important question is whether there is enough support through these payments to provide
subsistence above the poverty rate.
To see if this is a concern, the European Union (EU) looked at this question in 1998 for the (then)
15 member countries with two variables:
1. the median (EU standardized) income of individuals age 65 and older as a percentage of the
population age 0-64,
2. the percentage with income below 60% of the median (EU standardized) income of the national
population.
Linear Regression Lecture [41]
Example: Poverty Among the Elderly, Europe
The data from the European Household Community Panel Survey are:
Nation             Relative Income   Poverty Rate
Netherlands 93.00 7.00
Luxembourg 99.00 8.00
Sweden 83.00 8.00
Germany 97.00 11.00
Italy 96.00 14.00
Spain 91.00 16.00
Finland 78.00 17.00
France 90.00 19.00
United.Kingdom 78.00 21.00
Belgium 76.00 22.00
Austria 84.00 24.00
Denmark 68.00 31.00
Portugal 76.00 33.00
Greece 74.00 33.00
Ireland 69.00 34.00
Linear Regression Lecture [42]
Example: Poverty Among the Elderly, Europe
eu.pov<-read.table("http://MS-702-2009/data/inc.pov.dat",row.names=1)
names(eu.pov) <- c("relative income", "poverty rate")
eu.pov <- eu.pov[-1,]
postscript("Class.Multilevel/trends2.ps")
par(mfrow=c(1,1),mar=c(4,4,2,2),lwd=5,bg="lightgrey")
plot(eu.pov,pch=15,xlab="",ylab="",ylim=c(2,37),xlim=c(61,104))
lines(lowess(eu.pov),col="purple",lwd=3)
text.loc <- cbind(eu.pov[,1],(eu.pov[,2]-1))
text.loc[14,2] <- text.loc[14,2] +2
text.loc[10,2] <- text.loc[10,2] +2
text(text.loc,dimnames(eu.pov)[[1]],cex=1.2)
mtext(side=1,cex=1.3,line=2,"Relative Income, Over 65")
mtext(side=2,cex=1.3,line=2,"Poverty rate, Over 65")
dev.off()
Linear Regression Lecture [43]
Example: Poverty Among the Elderly, Europe
[Figure: scatterplot of the EU data with each country labeled and a lowess fit (purple); x axis: Relative Income, Over 65 (roughly 60 to 100); y axis: Poverty rate, Over 65 (roughly 5 to 35).]
Linear Regression Lecture [44]
Example: Poverty Among the Elderly, Europe
postscript("Class.Multilevel/trends3.ps")
par(mfrow=c(1,1),mar=c(4,4,2,2),lwd=5,bg="lightgrey")
plot(eu.pov,pch=15,xlab="",ylab="",ylim=c(2,37),xlim=c(61,104))
x.y.fit <- lm(eu.pov[,2] ~ eu.pov[,1])
abline(x.y.fit$coefficients,col="forest green")
text.loc <- cbind(eu.pov[,1],(eu.pov[,2]-1))
text.loc[14,2] <- text.loc[14,2] +2
text.loc[10,2] <- text.loc[10,2] +2
text(text.loc,dimnames(eu.pov)[[1]],cex=1.2)
mtext(side=1,cex=1.3,line=2,"Relative Income, Over 65")
mtext(side=2,cex=1.3,line=2,"Poverty rate, Over 65")
dev.off()
Linear Regression Lecture [45]
Example: Poverty Among the Elderly, Europe
[Figure: the same labeled scatterplot with the fitted OLS line (forest green); x axis: Relative Income, Over 65; y axis: Poverty rate, Over 65.]
Linear Regression Lecture [46]
Example: Poverty Among the Elderly, Europe
summary(x.y.fit)
Call:
lm(formula = eu.pov[, 2] ~ eu.pov[, 1])
Residuals:
Min 1Q Median 3Q Max
-12.224 -3.312 1.482 3.923 7.424
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.6928 12.2526 6.831 1.21e-05
eu.pov[, 1] -0.7647 0.1458 -5.246 0.000158
Residual standard error: 5.611 on 13 degrees of freedom
Multiple R-Squared: 0.6792, Adjusted R-squared: 0.6545
F-statistic: 27.52 on 1 and 13 DF, p-value: 0.0001580
Linear Regression Lecture [47]
More on Linear Models
In this example the slope of the line is -0.7647, which tells us how much the poverty rate changes
for a one unit increase in relative income.
The intercept is 83.6928, which is what the poverty rate would be at zero income.
The standard error is a measure of how reliable these estimates are. One common rule of thumb
is to see whether the standard error is half the coefficient estimate or less.
R-Squared tells us how much of the variance of the outcome variable can be explained by the
explanatory variable.
The F-statistic tells us how significant the overall model fit is.
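A quick interval summary of these estimates (sketch; assumes the fitted object x.y.fit from the earlier slide):
confint(x.y.fit, level = 0.95)   # 95% confidence intervals for the intercept and slope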
Linear Regression Lecture [48]
OECD Example
The data are from the Organization for Economic Cooperation and Development (OECD) and
highlight the relationship between
commitment to employment protection measured on an interval scale (0-4) indicating the
quantity and extent of national legislation to protect jobs,
the total factor productivity difference in growth rates between 1980-1990 and 1990-1998,
for 19 countries.
For details, see The Economist, September 23, 2000.
Linear Regression Lecture [49]
OECD Example
Prot. Prod.
United States 0.2 0.5
Canada 0.6 0.6
Australia 1.1 1.3
New Zealand 1.0 0.4
Ireland 1.0 0.1
Denmark 2.0 0.9
Finland 2.2 0.7
Austria 2.4 -0.1
Belgium 2.5 -0.4
Japan 2.6 -0.4
Sweden 2.9 0.5
Netherlands 2.8 -0.5
France 2.9 -0.9
Germany 3.2 -0.2
Greece 3.6 -0.3
Portugal 3.9 0.3
Italy 3.8 -0.3
Spain 3.5 -1.5
Linear Regression Lecture [50]
OECD Example
oecd<-read.table("http://MS-702-2009/data/oecd.data",header=TRUE,row.names=1)
postscript("Class.Multilevel/trends4.ps")
par(bg="lightgrey")
plot(oecd$Prot,oecd$Prod,xlim=c(-0.2,4.2),ylim=c(-1.7,1.7),pch=15,xlab="",ylab="")
x.y.fit <- lsfit(oecd$Prot,oecd$Prod)
abline(x.y.fit$coefficients,col="firebrick")
text(oecd$Prot,(oecd$Prod-0.1),dimnames(oecd)[[1]])
mtext(side=1,cex=1.3,line=2,"Employment Protection Scale")
mtext(side=2,cex=1.3,line=2,"Total Factor Productivity Difference")
dev.off()
Linear Regression Lecture [51]
OECD Example
[Figure: scatterplot of the OECD data with each country labeled and the fitted OLS line (firebrick); x axis: Employment Protection Scale (0 to 4); y axis: Total Factor Productivity Difference (-1.5 to 1.5).]
Linear Regression Lecture [52]
OECD Example
oecd.fit <- lm(oecd$Prod~oecd$Prot)
summary(oecd.fit)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.8591 0.3174 2.706 0.0156
oecd$Prot -0.3496 0.1215 -2.878 0.0109
---
Residual standard error: 0.5761 on 16 degrees of freedom
Multiple R-Squared: 0.3411, Adjusted R-squared: 0.2999
F-statistic: 8.284 on 1 and 16 degrees of freedom, p-value: 0.01093
Linear Regression Lecture [53]
So What Does a Bad Regression Model Look Like?
Run the following....
time = c(0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25, 0.5,
0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25, 0.5, 0.75, 1, 1.25, 2, 3,
4, 5, 6, 8, 0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25,
0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8, 0.25, 0.5, 0.75, 1, 1.25,
2, 3, 4, 5, 6, 8)
conc = c(1.5, 0.94, 0.78, 0.48, 0.37, 0.19, 0.12, 0.11,
0.08, 0.07, 0.05, 2.03, 1.63, 0.71, 0.7, 0.64, 0.36, 0.32, 0.2,
0.25, 0.12, 0.08, 2.72, 1.49, 1.16, 0.8, 0.8, 0.39, 0.22, 0.12,
0.11, 0.08, 0.08, 1.85, 1.39, 1.02, 0.89, 0.59, 0.4, 0.16, 0.11,
0.1, 0.07, 0.07, 2.05, 1.04, 0.81, 0.39, 0.3, 0.23, 0.13, 0.11,
0.08, 0.1, 0.06, 2.31, 1.44, 1.03, 0.84, 0.64, 0.42, 0.24, 0.17,
0.13, 0.1, 0.09)
#postscript("Class.Multilevel/conc.ps"); par(bg="lightgrey")
conc.fit <- lm(conc~time)
plot(time,conc,pch=5)
abline(conc.fit,col="steelblue4",lwd=3)   # the straight line misses the sharp nonlinear decay in the data
#dev.off()
Linear Regression Lecture [54]
Extended Policy Example
# SIMPLE REGRESSION EXAMPLE USING THE DETROIT MURDER DATASET, YEARS 1961-1974
# R CODE HERE RUNS A BASIC MODEL AND VARIOUS DIAGNOSTICS, EMAIL QUESTIONS
#FTP - Full-time police per 100,000 population
#UEMP - % unemployed in the population
#MAN - number of manufacturing workers in thousands
#NMAN - Number of non-manufacturing workers in thousands
#GOV - Number of government workers in thousands
#LIC - Number of handgun licences per 100,000 population (you can buy one)
#GR - Number of handgun registrations per 100,000 population (you own one)
#CLEAR - % homicides cleared by arrests
#WM - Number of white males in the population
#HE - Average hourly earnings
#WE - Average weekly earnings
Linear Regression Lecture [55]
#HOM - Number of homicides per 100,000 of population
#ACC - Death rate in accidents per 100,000 population
#ASR - Number of assaults per 100,000 population
# LOAD DATA FILE, CREATE DATA FRAME
count.fields("public_html/data/detroit.data")
count.fields("http://jgill.wustl.edu/data/etroit.data")
detroit.df <- read.table("public_html/data/detroit.data",header=TRUE)
detroit.df <- read.table("http://jgill.wustl.edu/data/detroit.data",header=TRUE)
# ATTACH DATA FRAME AND CREATE A SUB-MATRIX
attach(detroit.df)
detroit.mat <- cbind(FTP,UEMP,LIC,CLEAR,HE,HOM)
# LOOK AT DATA WITH SCATTERPLOTS
# postscript("S.Dir/detroit.fig1.ps")
par(oma=c(1,1,4,1),bg="lightgrey")
pairs(detroit.mat)
# dev.off()
Linear Regression Lecture [56]
# RUN A MODEL, SUMMARIZE THE RESULTS
detroit.ols <- lm(HOM~FTP+UEMP+LIC+CLEAR+HE)
summary(detroit.ols)
detroit.table <- cbind(summary(detroit.ols)$coef[,1:2],
(summary(detroit.ols)$coef[,1] - 1.96*summary(detroit.ols)$coef[,2]),
(summary(detroit.ols)$coef[,1] + 1.96*summary(detroit.ols)$coef[,2]))
dimnames(detroit.table)[[2]] <-
c("Estimate","Std. Error","95% CI Lower","95% CI Upper")
detroit.table
# QQNORM DIAGNOSTIC ON RESIDUALS
# postscript("S.Dir/detroit.fig2.ps")
par(mfrow=c(1,1),oma=c(4,4,4,4),bg="lightgrey")
qqnorm(detroit.ols$residuals)
qqline(detroit.ols$residuals)
# dev.off()
# MISC DIAGNOSTIC VALUES
detroit.ols$residuals; detroit.ols$residual; residuals(detroit.ols)
Linear Regression Lecture [57]
rstudent(detroit.ols)
dfbetas(detroit.ols)
dffits(detroit.ols)
covratio(detroit.ols)
cooks.distance(detroit.ols)
X <- detroit.mat[,-6]
diag(X%*%solve(t(X)%*%X)%*%t(X))
# COOKS D DIAGNOSTIC
# postscript("S.Dir/detroit.fig3.ps")
cooks.vals <- cooks.distance(detroit.ols)
R.vals <- rstudent(detroit.ols)   # studentized residuals, to match the axis label below
par(mfrow=c(2,1),mar=c(0,5,0,1),oma=c(5,5,5,5),bg="lightgrey")
plot(seq(1,13,length=13),rep(max(cooks.vals),13),type="n",xaxt="n",
ylab="Studentized Residuals by Year",xlab="",ylim=c(-3.7,3.7))
abline(h=0)
for (i in 1:length(R.vals)) segments(i,0,i,R.vals[i])
plot(seq(1,13,length=13),rep(max(cooks.vals),13),type="n",xaxt="n",
ylab="Cooks Distance by Year",
ylim=c(0,6), xlab="")
abline(h=0)
Linear Regression Lecture [58]
for (i in 1:length(cooks.vals))
segments(i,0,i,cooks.vals[i])
mtext(side=3,cex=1.3,"Leverage and Influence: Detroit Murder Data",outer=T,line=2)
# dev.off()
# JACKKNIFE OUT THE 8TH CASE AND RERUN MODEL
detroit2.df <- detroit.df[-8,]
detroit2.ols <- lm(HOM~FTP+UEMP+LIC+CLEAR+HE,data=detroit2.df)
detroit2.table <- cbind(summary(detroit2.ols)$coef[,1:2],
(summary(detroit2.ols)$coef[,1] - 1.96*summary(detroit2.ols)$coef[,2]),
(summary(detroit2.ols)$coef[,1] + 1.96*summary(detroit2.ols)$coef[,2]))
dimnames(detroit2.table)[[2]] <-
c("Estimate","Std. Error","95% CI Lower","95% CI Upper")
detroit2.table
