Estimation
January 20, 2010
Tiejun (Ty) Tong
Department of Applied Mathematics
Simple Linear Regression
A simple linear regression model is defined as
$$Y_i = a_0 + a_1 x_i + \varepsilon_i,$$
where $Y_i$ are the response values, $x_i$ are the predictor values, $a_0$ is the intercept, $a_1$ is the slope, and the $\varepsilon_i$ are i.i.d. random variables from $N(0, \sigma^2)$.
For ease of notation, denote
$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i, \quad \bar Y = \frac{1}{n}\sum_{i=1}^n Y_i, \quad S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2, \quad \text{and} \quad S_{xy} = \sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y).$$
Least Squares Estimation
The LS estimates of $a_0$ and $a_1$ are defined to be the values $\hat a_0$ and $\hat a_1$ such that the line $\hat a_0 + \hat a_1 x$ minimizes the residual sum of squares (RSS):
$$(\hat a_0, \hat a_1) = \operatorname*{argmin}_{c,\,d} \sum_{i=1}^n \big(Y_i - (c + d x_i)\big)^2.$$
The LS estimators of $a_0$ and $a_1$ are
$$\hat a_1 = S_{xy}/S_{xx}, \qquad \hat a_0 = \bar Y - \hat a_1 \bar x.$$
Given $\hat a_0$ and $\hat a_1$, the fitted linear regression model is
$$\hat Y = \hat a_0 + \hat a_1 x.$$
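As a numerical sketch of the formulas above, the LS estimates can be computed directly from $S_{xx}$ and $S_{xy}$; the small data set below is hypothetical, used only for illustration:

```python
import numpy as np

# Hypothetical example data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

x_bar, Y_bar = x.mean(), Y.mean()
S_xx = np.sum((x - x_bar) ** 2)           # sum of (x_i - x_bar)^2
S_xy = np.sum((x - x_bar) * (Y - Y_bar))  # sum of (x_i - x_bar)(Y_i - Y_bar)

a1_hat = S_xy / S_xx                      # slope estimate: S_xy / S_xx
a0_hat = Y_bar - a1_hat * x_bar           # intercept estimate: Y_bar - a1_hat * x_bar
Y_fit = a0_hat + a1_hat * x               # fitted values on the estimated line
```

The same estimates are returned by `np.polyfit(x, Y, 1)`, which lists the highest-degree coefficient (the slope) first.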
Least Squares Estimation
The difference between the observed value $Y_i$ and the fitted value $\hat Y_i$ is called a residual. We denote it as
$$e_i = Y_i - \hat Y_i = Y_i - (\hat a_0 + \hat a_1 x_i), \quad i = 1, \ldots, n.$$
An unbiased estimator of $\sigma^2$ is given by
$$\hat\sigma^2 = \frac{\mathrm{RSS}}{n-2} = \frac{1}{n-2}\sum_{i=1}^n e_i^2.$$
The coefficient of determination, denoted by $r^2$, is given by
$$r^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{SST}} = 1 - \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{\sum_{i=1}^n (Y_i - \bar Y)^2}.$$
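These quantities can be sketched numerically as well, again on a small hypothetical data set (illustration only, not from the notes):

```python
import numpy as np

# Hypothetical example data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# LS fit: slope S_xy / S_xx, intercept Y_bar - slope * x_bar.
S_xx = np.sum((x - x.mean()) ** 2)
a1_hat = np.sum((x - x.mean()) * (Y - Y.mean())) / S_xx
a0_hat = Y.mean() - a1_hat * x.mean()

e = Y - (a0_hat + a1_hat * x)        # residuals e_i
RSS = np.sum(e ** 2)                 # residual sum of squares
SST = np.sum((Y - Y.mean()) ** 2)    # total sum of squares

sigma2_hat = RSS / (n - 2)           # unbiased estimator of sigma^2
r2 = 1 - RSS / SST                   # coefficient of determination
```

Note that the LS residuals always sum to zero, a consequence of the first normal equation.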
Maximum Likelihood Estimation
The least squares method can be used to estimate $a_0$ and $a_1$ regardless of the distributional form of the error term $\varepsilon$ (either normal or non-normal errors).
For inference problems such as hypothesis testing and confidence interval construction, we need to assume that the distribution of the errors is known.
For a simple linear regression model, we assume that
$$\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, n.$$
Thus for fixed design points $x_i$, the observations $Y_i$ are independent r.v.s with distribution
$$Y_i \sim N(a_0 + a_1 x_i,\; \sigma^2), \quad i = 1, \ldots, n.$$
Maximum Likelihood Estimation
Under the normal errors assumption, the joint pdf of $Y_1, \ldots, Y_n$ is
$$f(Y_1, \ldots, Y_n \mid a_0, a_1, \sigma^2) = \prod_{i=1}^n f(Y_i \mid a_0, a_1, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - a_0 - a_1 x_i)^2 \right\}.$$
The log-likelihood function is
$$\log L(a_0, a_1, \sigma^2 \mid Y_1, \ldots, Y_n) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - a_0 - a_1 x_i)^2.$$
Maximum Likelihood Estimation
Taking the first partial derivatives of the log-likelihood function with respect to $a_0$, $a_1$ and $\sigma^2$, and setting them to zero, we have
$$\sum_{i=1}^n (Y_i - a_0 - a_1 x_i) = 0, \qquad \sum_{i=1}^n x_i (Y_i - a_0 - a_1 x_i) = 0, \qquad \sum_{i=1}^n (Y_i - a_0 - a_1 x_i)^2 = n\sigma^2.$$
Solving the above equations leads to
$$\hat a_{1,\mathrm{ML}} = S_{xy}/S_{xx}, \qquad \hat a_{0,\mathrm{ML}} = \bar Y - \hat a_{1,\mathrm{ML}}\, \bar x, \qquad \text{and} \qquad \hat\sigma^2_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^n e_i^2.$$
Note that the ML estimators of $a_0$ and $a_1$ are identical to the LS estimators of $a_0$ and $a_1$.
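A quick numerical check of this claim (a sketch on hypothetical data, not part of the original notes): the closed-form LS estimates, together with $\hat\sigma^2_{\mathrm{ML}} = \mathrm{RSS}/n$, maximize the normal log-likelihood, so perturbing any parameter can only lower it.

```python
import numpy as np

# Hypothetical example data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Closed-form LS estimates of a0 and a1 (identical to the ML estimates).
S_xx = np.sum((x - x.mean()) ** 2)
a1_hat = np.sum((x - x.mean()) * (Y - Y.mean())) / S_xx
a0_hat = Y.mean() - a1_hat * x.mean()
sigma2_ml = np.sum((Y - a0_hat - a1_hat * x) ** 2) / n   # ML estimate: RSS / n

def loglik(b0, b1, s2):
    """Normal log-likelihood of the simple linear regression model."""
    rss = np.sum((Y - b0 - b1 * x) ** 2)
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(s2) - rss / (2 * s2)

# Log-likelihood at the closed-form solution.
best = loglik(a0_hat, a1_hat, sigma2_ml)
```

Evaluating `loglik` at nearby parameter values (e.g. `a0_hat + 0.05`) always yields a value no larger than `best`.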
Properties of $\hat a_0$ and $\hat a_1$
First, $\hat a_0$ and $\hat a_1$ can be represented as linear combinations of the observations $Y_i$:
$$\hat a_1 = \frac{1}{S_{xx}}\sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y) = \sum_{i=1}^n c_i Y_i,$$
$$\hat a_0 = \frac{1}{n}\sum_{i=1}^n Y_i - \sum_{i=1}^n c_i \bar x\, Y_i = \sum_{i=1}^n \left(\frac{1}{n} - c_i \bar x\right) Y_i,$$
where $c_i = (x_i - \bar x)/S_{xx}$.
Second, $\hat a_0$ and $\hat a_1$ are unbiased estimators of $a_0$ and $a_1$, respectively. For example,
$$E(\hat a_1) = \sum_{i=1}^n c_i (a_0 + a_1 x_i) = a_0 \sum_{i=1}^n c_i + a_1 \sum_{i=1}^n c_i x_i = a_1,$$
where $\sum_{i=1}^n c_i = 0$ and $\sum_{i=1}^n c_i x_i = 1$.
Properties of $\hat a_0$ and $\hat a_1$
The variances of $\hat a_0$ and $\hat a_1$ are
$$\mathrm{Var}(\hat a_1) = \sum_{i=1}^n c_i^2\, \mathrm{Var}(Y_i) = \frac{\sigma^2}{S_{xx}},$$
$$\mathrm{Var}(\hat a_0) = \mathrm{Var}(\bar Y) + \bar x^2\, \mathrm{Var}(\hat a_1) = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right),$$
where $\sum_{i=1}^n c_i^2 = 1/S_{xx}$, and the covariance of $\bar Y$ and $\hat a_1$ is zero.
Lastly, it can be shown that $\hat a_0$ and $\hat a_1$ are the Best Linear Unbiased Estimators (BLUE) of $a_0$ and $a_1$, where "best" means minimum variance. This result is called the Gauss-Markov Theorem.
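The unbiasedness and variance results can be checked by Monte Carlo simulation (a sketch with hypothetical parameter values): over repeated samples from the model, the empirical mean and variance of $\hat a_1$ should match $a_1$ and $\sigma^2/S_{xx}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters and a fixed design (illustration only).
a0_true, a1_true, sigma2 = 1.0, 2.0, 0.25
x = np.linspace(0.0, 4.0, 9)
n = len(x)
S_xx = np.sum((x - x.mean()) ** 2)

# Repeatedly simulate Y from the model and re-estimate the slope.
B = 20000
a1_draws = np.empty(B)
for b in range(B):
    Y = a0_true + a1_true * x + rng.normal(0.0, np.sqrt(sigma2), n)
    a1_draws[b] = np.sum((x - x.mean()) * (Y - Y.mean())) / S_xx

emp_mean = a1_draws.mean()   # should be close to a1_true (unbiasedness)
emp_var = a1_draws.var()     # should be close to sigma2 / S_xx
```

With this design, $S_{xx} = 15$, so the theoretical variance of $\hat a_1$ is $0.25/15 \approx 0.0167$.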
Distributions of the Estimators
Theorem: Let $Z_1, \ldots, Z_n$ be mutually independent random variables with $Z_i \sim N(\mu_i, \sigma_i^2)$. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be fixed constants. Then
$$Z = \sum_{i=1}^n (a_i Z_i + b_i) \sim N\left(\sum_{i=1}^n (a_i \mu_i + b_i),\; \sum_{i=1}^n a_i^2 \sigma_i^2\right).$$
The distributions of $\hat a_0$ and $\hat a_1$ are
$$\hat a_0 \sim N\left(a_0,\; \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)\right), \qquad \hat a_1 \sim N\left(a_1,\; \frac{\sigma^2}{S_{xx}}\right).$$
Furthermore, $(\hat a_0, \hat a_1)$ and $\hat\sigma^2$ (the unbiased estimator) are independent, and
$$\frac{(n-2)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}.$$