
Econometrics [EM2008]

Lecture 2
The k-variable linear regression model

Irene Mammi

irene.mammi@unive.it

Academic Year 2018/2019

outline

I the k-variable linear regression model


I matrix formulation
I partial correlation coefficients
I inference
I prediction

I References:
I Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th Edition, McGraw-Hill, New York, Chapter 3.

the multivariate model

I the bivariate framework is too restrictive for realistic analysis of economic phenomena
I generally more useful to specify multivariate relations
I restrict the analysis to a single equation which now includes k variables
I the specification of such a relationship is

    Yt = β1 + β2X2t + β3X3t + · · · + βkXkt + ut    t = 1, . . . , n

which identifies k − 1 explanatory variables, namely X2 , X3 , . . . , Xk , that are thought to influence the dependent variable
I nb: the X’s may be transformations of other variables, but the relationship is linear in the β coefficients
I assume that the disturbances are white noise
I k + 1 parameters to estimate, the β’s and the disturbance variance σ2
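As a concrete illustration (not part of the original slides), the sketch below simulates one sample from a 4-variable version of this specification with white-noise disturbances; the parameter values and variable names are arbitrary.

```python
import numpy as np

# minimal sketch: simulate Y_t = b1 + b2*X2_t + b3*X3_t + b4*X4_t + u_t
# with white-noise disturbances (illustrative parameter values only)
rng = np.random.default_rng(0)
n = 200
beta = np.array([1.0, 0.5, -0.3, 2.0])         # beta_1, ..., beta_4
X = np.column_stack([np.ones(n),               # column of ones for the intercept
                     rng.normal(size=(n, 3))]) # X2, X3, X4
sigma = 1.5                                    # disturbance standard deviation
u = rng.normal(scale=sigma, size=n)            # white-noise disturbances
y = X @ beta + u                               # k = 4 coefficients; k + 1 = 5 parameters with sigma^2
```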

matrix formulation of the k-variable model
I matrices indicated by uppercase bold letters, vectors by lowercase
bold letters
I vectors generally taken as column vectors
I for example,
   
    y = [ Y1  Y2  · · ·  Yn ]′        x2 = [ X21  X22  · · ·  X2n ]′

are n × 1 vectors, also referred to as n-vectors, containing the sample observations on Y and X2
I the n sample observations on the k-variable model can be written as

    y = β1 x1 + β2 x2 + · · · + βk xk + u
matrix formulation of the k-variable model (cont.)

I the y vector is expressed as a linear combination of the x vectors plus the disturbance vector u
I the x1 vector is a column of ones to allow for the intercept term
I collecting all the x vectors into a matrix X and the β coefficients into a vector β, we can write

    y = Xβ + u

where

    X = [ 1  X21  · · ·  Xk1 ]              [ β1 ]
        [ 1  X22  · · ·  Xk2 ]              [ β2 ]
        [ ...                ]    and   β = [ ... ]
        [ 1  X2n  · · ·  Xkn ]              [ βk ]

the algebra of least squares

I if the unknown vector β is replaced by some guess or estimate b, this defines a vector of residuals e,

    e = y − Xb

I the least squares principle is to choose b to minimize the residual sum of squares e′e, namely,

    RSS = e′e
        = (y − Xb)′(y − Xb)
        = y′y − b′X′y − y′Xb + b′X′Xb
        = y′y − 2b′X′y + b′X′Xb

the algebra of least squares (cont.)
I the first-order conditions for the minimization are

    ∂(RSS)/∂b = −2X′y + 2X′Xb = 0

giving the normal equations

    (X′X)b = X′y

I if y is replaced by Xb + e the result is

    (X′X)b = X′(Xb + e) = (X′X)b + X′e

thus
    X′e = 0
which is another fundamental least-squares result
I the first element in this equation gives ∑ et = 0, that is,

    ē = Ȳ − b1 − b2X̄2 − · · · − bkX̄k = 0
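A quick numerical check of these results (a sketch with simulated data, not part of the original slides): solve the normal equations (X′X)b = X′y and verify that X′e = 0, so the residuals have zero mean and the fitted plane passes through the point of means.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

# normal equations (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(X.T @ e)       # ~ 0: each regressor is orthogonal to the residuals
print(e.mean())      # ~ 0: residuals have zero mean (first normal equation)
print(y.mean() - (b[0] + b[1] * X[:, 1].mean() + b[2] * X[:, 2].mean()))  # ~ 0: plane through point of means
```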
the algebra of least squares (cont.)

I ⇒ the residuals have zero mean, and the regression plane passes through the point of means in k-dimensional space
I the remaining elements are of the form

    ∑t Xit et = 0    i = 2, . . . , k

which implies that each regressor has zero sample correlation with the residuals
I this, in turn, implies that ŷ (= Xb), the vector of the regression values for Y , is uncorrelated with e, for

    ŷ′e = (Xb)′e = b′X′e = 0

the algebra of least squares (cont.)

normal equations for the two-variable case

I here, k = 2 and the model of interest is Y = β 1 + β 2 X + u


I the X matrix is

    X = [ 1  X1 ]
        [ 1  X2 ]
        [ ...   ]
        [ 1  Xn ]

thus,

    X′X = [ 1   1   · · ·  1  ]  X  =  [  n    ∑X  ]
          [ X1  X2  · · ·  Xn ]        [ ∑X   ∑X²  ]

the algebra of least squares (cont.)

and

    X′y = [ 1   1   · · ·  1  ]  y  =  [ ∑Y  ]
          [ X1  X2  · · ·  Xn ]        [ ∑XY ]

giving

    [  n    ∑X  ] [ b1 ]  =  [ ∑Y  ]
    [ ∑X   ∑X²  ] [ b2 ]     [ ∑XY ]

or

    n b1 + b2 ∑X = ∑Y
    b1 ∑X + b2 ∑X² = ∑XY

the algebra of least squares (cont.)

normal equations for the three-variable case

I in a similar way, it may be shown that the normal equations for fitting
a three-variable equation by least squares are

    n b1 + b2 ∑X2 + b3 ∑X3 = ∑Y
    b1 ∑X2 + b2 ∑X2² + b3 ∑X2X3 = ∑X2Y
    b1 ∑X3 + b2 ∑X2X3 + b3 ∑X3² = ∑X3Y

decomposition of the sum of squares

I the zero covariances between regressors and the residuals underlie the
decomposition of the sum of squares
I decomposing the y vector into the part explained by the regression
and the unexplained part, we have

y = ŷ + e = Xb + e

from which it follows that

    y′y = (ŷ + e)′(ŷ + e) = ŷ′ŷ + e′e = b′X′Xb + e′e

I however, y′y = ∑t Yt² is the sum of squares of the actual Y values; interest normally centers on analyzing the variation in Y , measured by the sum of the squared deviations from the sample mean, namely,

    ∑t (Yt − Ȳ)² = ∑t Yt² − nȲ²

decomposition of the sum of squares (cont.)

I subtracting nȲ 2 from each side of the previous decomposition gives

    (y′y − nȲ²) = (b′X′Xb − nȲ²) + e′e


TSS = ESS + RSS

where TSS indicates the total sum of squares in Y , and ESS and
RSS the explained and residual (unexplained) sum of squares
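The decomposition can be verified numerically; this sketch (simulated data, not part of the original slides) computes TSS, ESS and RSS and checks that TSS = ESS + RSS.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, -1.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

TSS = y @ y - n * y.mean() ** 2             # y'y - n*Ybar^2
ESS = b @ X.T @ X @ b - n * y.mean() ** 2   # b'X'Xb - n*Ybar^2
RSS = e @ e
print(np.isclose(TSS, ESS + RSS))           # True: TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)             # both equal R^2
```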

equation in deviation form
I alternatively, express all the data in the form of deviations from the
sample mean
I the least-squares equation is

Yt = b1 + b2 X2t + b3 X3t + · · · + bk Xkt + et t = 1, . . . , n

I averaging over the sample observations gives

Ȳ = b1 + b2X̄2 + b3X̄3 + · · · + bkX̄k

which contains no term in e, since ē is zero


I subtracting the second equation from the first gives

yt = b2 x2t + b3 x3t + · · · + bk xkt + et t = 1, . . . , n

I the intercept b1 disappears, but it may be recovered from

b1 = Ȳ − b2 X̄2 − · · · − bk X̄k
equation in deviation form (cont.)
I nb: the least-squares slope coefficients b2 , . . . , bk are identical in both forms of the regression equation, and so are the residuals
I collecting all n observations, the deviation form of the equation may
be written compactly using a transformation matrix
 
    A = In − (1/n) ii′

where i is a column vector of n ones


I it follows that Ae = e and Ai = 0
I write the least-squares equation as
 
    y = Xb + e = [ i  X2 ] [ b1 ]  + e
                           [ b2 ]

where X2 is the n × (k − 1) matrix of observations on the regressors and b2 is the (k − 1)-element vector containing the coefficients b2 , b3 , . . . , bk
equation in deviation form (cont.)
I premultiplying by A gives

    Ay = [ 0  AX2 ] [ b1 ]  + Ae = (AX2)b2 + e
                    [ b2 ]

or
    y∗ = X∗b2 + e
where y∗ = Ay and X∗ = AX2 give the data in deviation form. Since X′e = 0, it follows that X∗′e = 0.
I premultiplying the previous equation by X∗′ gives

    X∗′y∗ = (X∗′X∗)b2

which are the familiar normal equations, except that now the data have all been expressed in deviation form and the b2 vector contains the k − 1 slope coefficients and excludes the intercept term
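A sketch of this result (simulated data, not part of the original slides): demeaning y and the non-constant regressors with A = I − (1/n)ii′ and solving X∗′X∗b2 = X∗′y∗ reproduces the slope coefficients of the full regression, and the intercept is recovered from the means.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
X2 = rng.normal(size=(n, 2))                       # regressors other than the column of ones
X = np.column_stack([np.ones(n), X2])
y = X @ np.array([4.0, 1.5, -0.7]) + rng.normal(size=n)

b_full = np.linalg.solve(X.T @ X, X.T @ y)         # intercept + slopes

A = np.eye(n) - np.ones((n, n)) / n                # A = I - (1/n) i i'
y_star, X_star = A @ y, A @ X2                     # data in deviation form
b2 = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

print(np.allclose(b2, b_full[1:]))                 # True: same slope coefficients
print(y.mean() - X2.mean(axis=0) @ b2, b_full[0])  # b1 = Ybar - b2*X2bar - ... (same value)
```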

equation in deviation form (cont.)
I the decomposition of the sum of squares may be expressed as

    y∗′y∗ = b2′X∗′X∗b2 + e′e
    TSS  =  ESS  +  RSS

I the coefficient of multiple correlation R is defined as the positive square root of

    R² = ESS/TSS = 1 − RSS/TSS

I the adjusted R², R̄², is defined as

    R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)]
I the numerator and the denominator on the RHS are unbiased
estimators of the disturbance variance and the variance of Y

equation in deviation form (cont.)
I the relation between the adjusted and unadjusted coefficients is

    R̄² = 1 − (1 − R²)(n − 1)/(n − k)
        = (1 − k)/(n − k) + [(n − 1)/(n − k)] R²

I two alternative criteria for comparing the fit of specifications are the Schwarz criterion

    SC = ln(e′e/n) + (k/n) ln n

and the Akaike information criterion

    AIC = ln(e′e/n) + 2k/n

generalizing partial correlation
I the normal equations solve for b = (X′X)⁻¹X′y
I the residuals from the LS regression may be expressed as

    e = y − Xb = y − X(X′X)⁻¹X′y = My

where
    M = I − X(X′X)⁻¹X′
I M is a symmetric, idempotent matrix; it also has the properties that MX = 0 and Me = e
I now write the general regression in partitioned form as

    y = [ x2  X∗ ] [ b2   ]  + e
                   [ b(2) ]

I in this partitioning, x2 is the n × 1 vector of observations on X2 , with coefficient b2 , and X∗ is the n × (k − 1) matrix of all the other variables (including the column of ones) with coefficient vector b(2)

generalizing partial correlation (cont.)
I the normal equations for this setup are

    [ x2′x2   x2′X∗ ] [ b2   ]  =  [ x2′y ]
    [ X∗′x2   X∗′X∗ ] [ b(2) ]     [ X∗′y ]

I the solution for b2 is

    b2 = (x2′M∗x2)⁻¹(x2′M∗y)

where
    M∗ = I − X∗(X∗′X∗)⁻¹X∗′
M∗ is a symmetric, idempotent matrix with the properties M∗X∗ = 0 and M∗e = e
I we have that
    M∗y is the vector of residuals when y is regressed on X∗
    M∗x2 is the vector of residuals when x2 is regressed on X∗

generalizing partial correlation (cont.)
I regressing the first vector on the second gives a slope coefficient which, using the symmetry and idempotency of M∗, gives the b2 coefficient defined above
I a simpler way to prove the same result is as follows: write the partitioned regression as

    y = x2b2 + X∗b(2) + e

I premultiplying by M∗, we obtain

    M∗y = (M∗x2)b2 + e

I finally, premultiplying by x2′ gives

    x2′M∗y = (x2′M∗x2)b2
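This is the Frisch-Waugh-Lovell result; the sketch below (simulated data, not part of the original slides) checks it numerically: regressing the residuals M∗y on the residuals M∗x2 reproduces the coefficient on x2 from the full regression.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 150
x2 = rng.normal(size=n)
X_other = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])    # X*: column of ones + other regressors
y = 1.0 + 2.0 * x2 + X_other[:, 1] - 0.5 * X_other[:, 2] + rng.normal(size=n)

# full regression: y on [x2, X*]
X = np.column_stack([x2, X_other])
b = np.linalg.solve(X.T @ X, X.T @ y)

# partial out X*: M* = I - X*(X*'X*)^{-1}X*'
M_star = np.eye(n) - X_other @ np.linalg.inv(X_other.T @ X_other) @ X_other.T
b2_partial = (x2 @ M_star @ y) / (x2 @ M_star @ x2)   # b2 = (x2'M*x2)^{-1}(x2'M*y)

print(np.isclose(b[0], b2_partial))                   # True: same b2
```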

inference in the k-variables equation
assumptions

1. X is nonstochastic and has full rank k


2. the errors have the properties

E(u ) = 0

and
    var(u) = E(uu′) = σ²I

I since the expected value operator is applied to every element of a vector or matrix, we have

    E(u) = E[ u1  u2  · · ·  un ]′ = [ E(u1)  E(u2)  · · ·  E(un) ]′ = [ 0  0  · · ·  0 ]′ = 0

inference in the k-variables equation (cont.)
    E(uu′) = E{ [ u1  u2  · · ·  un ]′ [ u1  u2  · · ·  un ] }

           = [ E(u1²)     E(u1u2)    · · ·   E(u1un)  ]
             [ E(u2u1)    E(u2²)     · · ·   E(u2un)  ]
             [ ...                               ...  ]
             [ E(unu1)    E(unu2)    · · ·   E(un²)   ]

           = [ var(u1)       cov(u1, u2)   · · ·   cov(u1, un) ]
             [ cov(u2, u1)   var(u2)       · · ·   cov(u2, un) ]
             [ ...                                        ...  ]
             [ cov(un, u1)   cov(un, u2)   · · ·   var(un)     ]

           = [ σ²   0    · · ·   0  ]
             [ 0    σ²   · · ·   0  ]   =  σ²I
             [ ...              ...  ]
             [ 0    0    · · ·   σ²  ]

inference in the k-variables equation (cont.)

I the previous matrix is the variance-covariance matrix of the error term
I this matrix embodies two strong assumptions: homoskedasticity and no serial correlation

inference in the k-variables equation (cont.)
Mean and Variance of b

I write the normal equations as

    b = (X′X)⁻¹X′y

I substitute for y to get

    b = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u

from which
    b − β = (X′X)⁻¹X′u
I take expectations (moving the expectation operator to the right past non-stochastic terms such as X)

    E(b − β) = (X′X)⁻¹X′E(u) = 0

giving
    E(b) = β
inference in the k-variables equation (cont.)
I under the assumptions of the model, the LS estimators are
unbiased estimators of the β parameters
I to obtain the variance-covariance matrix of the LS estimators,
consider
    var(b) = E[(b − β)(b − β)′]

and substituting for b − β we get

    E[(b − β)(b − β)′] = E[(X′X)⁻¹X′uu′X(X′X)⁻¹]
                       = (X′X)⁻¹X′E[uu′]X(X′X)⁻¹
                       = σ²(X′X)⁻¹

thus

    var(b) = σ²(X′X)⁻¹

inference in the k-variables equation (cont.)
Estimation of σ2

I the variance-covariance matrix of LS estimators involves the error variance σ², which is unknown
I it is reasonable to base an estimate on the residual sum of squares from the fitted regression
I write e = My = M(Xβ + u) = Mu since MX = 0, so that

    E(e′e) = E(u′M′Mu) = E(u′Mu)

I exploiting the fact that the trace of a scalar is the scalar, write

    E(u′Mu) = E[tr(u′Mu)]
            = E[tr(uu′M)]
            = σ² tr(M)
            = σ² tr(I) − σ² tr[X(X′X)⁻¹X′]
            = σ² tr(I) − σ² tr[(X′X)⁻¹(X′X)]
            = σ²(n − k)
inference in the k-variables equation (cont.)

I thus

    s² = e′e/(n − k)

defines an unbiased estimator of σ²
I the square root s is the standard deviation of the Y values about the regression plane; it is referred to as the standard error of the estimate or the standard error of the regression (SER)
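A sketch computing s² and the estimated standard errors of the coefficients, s√cii , from one fitted regression (simulated data, not part of the original slides).

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=2.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - k)                  # unbiased estimator of sigma^2
var_b = s2 * np.linalg.inv(X.T @ X)     # estimated var-cov matrix s^2 (X'X)^{-1}
se_b = np.sqrt(np.diag(var_b))          # standard errors of b1, ..., bk
print(np.sqrt(s2), se_b)                # SER and coefficient standard errors
```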

inference in the k-variables equation (cont.)
Gauss-Markov theorem

I this is the fundamental LS theorem
I G-M states that, conditional on the assumptions made, no other linear, unbiased estimator of the β coefficients can have smaller sampling variances than those of the least-squares estimator

1. each LS estimator bi is a best linear unbiased estimator of the population parameter βi
2. the BLUE of any linear combination of the β’s is that same linear combination of the b’s
3. the BLUE of E(Ys) is

    Ŷs = b1 + b2X2s + b3X3s + · · · + bkXks

which is the value found by inserting a relevant vector of X values into the regression model

testing linear hypotheses about β

I we have established the properties of the LS estimators of β
I now we show how to test hypotheses about β
I consider, for example
  (i) H0 : βi = 0
  (ii) H0 : βi = βi0
  (iii) H0 : β2 + β3 = 1
  (iv) H0 : β3 = β4 , or β3 − β4 = 0
  (v) H0 : (β2 , β3 , . . . , βk)′ = (0, 0, . . . , 0)′
  (vi) H0 : β2 = 0, where β2 is a subvector of β

testing linear hypotheses about β (cont.)
I all these examples fit into the general linear framework

    Rβ = r

where R is a q × k matrix of known constants, with q < k, and r is a q-vector of known constants. Each null hypothesis determines the relevant elements in R and r
I for the previous examples we have
  (i)   R = [ 0  · · ·  0  1  0  · · ·  0 ]   r = 0     q = 1      (with 1 in the ith position)
  (ii)  R = [ 0  · · ·  0  1  0  · · ·  0 ]   r = βi0   q = 1      (with 1 in the ith position)
  (iii) R = [ 0  1  1  0  · · ·  0 ]          r = 1     q = 1
  (iv)  R = [ 0  0  1  −1  0  · · ·  0 ]      r = 0     q = 1
  (v)   R = [ 0   Ik−1 ]                      r = 0     q = k − 1  (where 0 is a vector of k − 1 zeros)
  (vi)  R = [ 0k2×k1   Ik2 ]                  r = 0     q = k2

testing linear hypotheses about β (cont.)
I we now derive a general testing procedure for the general linear
hypothesis
H0 : R β − r = 0
I given the LS estimator, we can compute the vector (Rb − r ), which
measures the discrepancy between expectation and observation
I if this vector is “large”, it casts doubt on the null hypothesis
I the distinction between “large” and “small” is determined from the
sampling distribution under the null, in this case, the distribution of
Rb when R β = r
I from the unbiasedness result, it follows that

E(Rb ) = R β

I therefore

    var(Rb) = E[R(b − β)(b − β)′R′]
            = R var(b)R′
            = σ²R(X′X)⁻¹R′
testing linear hypotheses about β (cont.)
I we know the mean and the variance of the vector Rb
I need a further assumption to determine the form of the sampling
distribution: since b is a function of the u vector, the sampling
distribution of Rb will be determined by the distribution of u
I assume that the ui are normally distributed so that,

u ∼ N (0, σ2 I )

I it follows that
    b ∼ N[β, σ²(X′X)⁻¹]
then
    Rb ∼ N[Rβ, σ²R(X′X)⁻¹R′]
and so
    R(b − β) ∼ N[0, σ²R(X′X)⁻¹R′]
I if the null hypothesis Rβ = r is true, then

    (Rb − r) ∼ N[0, σ²R(X′X)⁻¹R′]


testing linear hypotheses about β (cont.)
I this equation gives the sampling distribution of Rb, and we may
derive a χ2 variable, namely

    (Rb − r)′[σ²R(X′X)⁻¹R′]⁻¹(Rb − r) ∼ χ²(q)

I σ² is unknown but it can be shown that

    e′e/σ² ∼ χ²(n − k)

and that this statistic is distributed independently of b
I a computable test statistic, which has an F distribution under the null, is

    [(Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/q] / [e′e/(n − k)] ∼ F(q, n − k)
I the test procedure is to reject R β = r if the computed F value
exceeds the relevant critical value
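A sketch of this test for example (iii), H0 : β2 + β3 = 1; R, r and the simulated data are illustrative and not from the original slides, and the resulting F value would be compared with the critical value of F(q, n − k).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.6, 0.4]) + rng.normal(size=n)   # true beta2 + beta3 = 1

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

R = np.array([[0.0, 1.0, 1.0]])    # picks out beta2 + beta3
r = np.array([1.0])
q = R.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
d = R @ b - r                      # discrepancy Rb - r
num = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / q
den = e @ e / (n - k)
F = num / den
print(F)                           # reject H0 if F exceeds the F(q, n - k) critical value
```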

testing linear hypotheses about β (cont.)

I it could be helpful to write

    (Rb − r)′[s²R(X′X)⁻¹R′]⁻¹(Rb − r)/q ∼ F(q, n − k)

thus, s²(X′X)⁻¹ is the estimated variance-covariance matrix of b.
I if we let cij denote the (i, j)th element in (X′X)⁻¹, then

    s²cii = var(bi)   and   s²cij = cov(bi , bj)      i, j = 1, 2, . . . , k

testing linear hypotheses about β (cont.)
I going back to the previous examples. . .
  (i) H0 : βi = 0: Rb picks out bi and R(X′X)⁻¹R′ picks out cii , the ith diagonal element in (X′X)⁻¹. Thus we have

    F = bi²/(s²cii) = bi²/var(bi) ∼ F(1, n − k)

or, taking the square root,

    t = bi/(s√cii) = bi/s.e.(bi) ∼ t(n − k)

  (ii) H0 : βi = βi0 : this hypothesis is tested by

    t = (bi − βi0)/s.e.(bi) ∼ t(n − k)

One may also compute a 95% confidence interval for βi :

    bi ± t0.025 s.e.(bi)
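A sketch of the t test and 95% confidence interval for a single coefficient (simulated data, not part of the original slides; 1.96 is used as an approximation to t0.025 for a large n − k).

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, 0.8]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = (e @ e) / (n - k)
c = np.diag(np.linalg.inv(X.T @ X))        # the c_ii elements

i = 1                                      # test H0: beta_i = 0 for the second coefficient
se_bi = np.sqrt(s2 * c[i])
t = b[i] / se_bi                           # ~ t(n - k) under H0
ci = (b[i] - 1.96 * se_bi, b[i] + 1.96 * se_bi)   # approximate 95% interval, t_0.025 ≈ 1.96
print(t, ci)
```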

testing linear hypotheses about β (cont.)
  (iii) H0 : β2 + β3 = 1: Rb gives the sum of the two estimated coefficients, b2 + b3 . Premultiplying (X′X)⁻¹ by R gives a row vector whose elements are the sums of the corresponding elements in the second and third rows of (X′X)⁻¹. Forming the inner product with R′ gives the sum of the second and third elements of that row vector, that is, c22 + 2c23 + c33 , noting that c23 = c32 . Thus

    s²R(X′X)⁻¹R′ = s²(c22 + 2c23 + c33)
                 = var(b2) + 2 cov(b2 , b3) + var(b3)
                 = var(b2 + b3)

The test statistic is then

    t = (b2 + b3 − 1)/√var(b2 + b3) ∼ t(n − k)

Alternatively, one may compute, say, a 95% confidence interval for the sum (β2 + β3) as

    (b2 + b3) ± t0.025 √var(b2 + b3)
testing linear hypotheses about β (cont.)

  (iv) H0 : β3 = β4 : the test statistic here is

    t = (b3 − b4)/√var(b3 − b4) ∼ t(n − k)

  (v) H0 : β2 = β3 = · · · = βk = 0: this case involves a composite hypothesis about all k − 1 coefficients. The F statistic for testing the joint significance of the complete set of regressors is

    F = [ESS/(k − 1)] / [RSS/(n − k)] ∼ F(k − 1, n − k)

This statistic may also be expressed as

    F = [R²/(k − 1)] / [(1 − R²)/(n − k)] ∼ F(k − 1, n − k)
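A sketch of the joint-significance F statistic, computed both from ESS and RSS and from R², confirming the two expressions agree (simulated data, not part of the original slides).

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.4, -0.6, 0.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
RSS = e @ e
TSS = np.sum((y - y.mean()) ** 2)
ESS = TSS - RSS
R2 = ESS / TSS

F1 = (ESS / (k - 1)) / (RSS / (n - k))
F2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))
print(F1, F2, np.isclose(F1, F2))    # identical; compare with the F(k - 1, n - k) critical value
```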

testing linear hypotheses about β (cont.)

  (vi) H0 : β2 = 0: this hypothesis postulates that a subset of coefficients is a zero vector. Partition the regression equation as follows:

    y = [ X1  X2 ] [ b1 ]  + e = X1b1 + X2b2 + e
                   [ b2 ]

where X1 has k1 columns, including a column of ones, X2 has k2 = k − k1 columns, and b1 and b2 are the corresponding subvectors of regression coefficients. The hypothesis may be tested by running two separate regressions. First regress y on X1 and denote the RSS by e∗′e∗ . Then run the regression on all the Xs, obtaining the RSS, denoted by e′e. The test statistic is

    F = [(e∗′e∗ − e′e)/k2] / [e′e/(n − k)] ∼ F(k2 , n − k)

restricted and unrestricted regressions

I examples (v) and (vi) may be interpreted as the outcome of two separate regressions
I recall that ESS may be expressed as ESS = y∗′y∗ − e′e, where y∗ = Ay
I it may be shown that y∗′y∗ is the RSS when y∗ is regressed on x1 (= i)
I in both cases (v) and (vi) the first regression may be regarded as a restricted regression and the second as an unrestricted regression
I e∗′e∗ is the restricted RSS and e′e is the unrestricted RSS

fitting the restricted regressions
I question: how to fit the restricted regression?
I answer: 1) either work out each specific case from first principles; 2)
or derive a general formula into which specific cases can be fitted
I (1) as for the first approach, consider example (iii) with the regression in deviation form,

    y = b2x2 + b3x3 + e

I want to impose the restriction that b2 + b3 = 1. Substituting the restriction in the regression gives

    y = b2x2 + (1 − b2)x3 + e∗    or    (y − x3) = b2(x2 − x3) + e∗

so as to form two new variables (y − x3) and (x2 − x3): the simple regression of the first on the second (without the constant) gives the restricted estimate of b2 ; the RSS from this regression is the restricted RSS, e∗′e∗ .
fitting the restricted regressions (cont.)
I (2) the general approach requires a b∗ vector that minimizes the RSS subject to the restrictions Rb∗ = r . To do so set up the function

    φ = (y − Xb∗)′(y − Xb∗) − 2λ′(Rb∗ − r)

where λ is a q-vector of Lagrange multipliers
I the first-order conditions are

    ∂φ/∂b∗ = −2X′y + 2(X′X)b∗ − 2R′λ = 0
    ∂φ/∂λ  = −2(Rb∗ − r) = 0

I the solution for b∗ is

    b∗ = b + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rb)

where b is the unrestricted LS estimator (X′X)⁻¹X′y.
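A sketch of the restricted estimator for the restriction b2 + b3 = 1 (example (iii) again; simulated data, not part of the original slides), checking that Rb∗ = r holds exactly and computing the restricted RSS e∗′e∗.

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.7, 0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)      # unrestricted estimator
e = y - X @ b

R = np.array([[0.0, 1.0, 1.0]])            # restriction beta2 + beta3 = 1
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_star = b + XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ (r - R @ b)
e_star = y - X @ b_star

print(R @ b_star)                          # = r: the restriction is satisfied exactly
print(e_star @ e_star - e @ e)             # restricted RSS minus unrestricted RSS (>= 0)
```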

fitting the restricted regressions (cont.)
I the residuals from the restricted regression are

    e∗ = y − Xb∗
       = y − Xb − X(b∗ − b)
       = e − X(b∗ − b)

I transposing and multiplying, we obtain

    e∗′e∗ = e′e + (b∗ − b)′X′X(b∗ − b)

I the process of substituting for (b∗ − b) and simplifying gives

    e∗′e∗ − e′e = (r − Rb)′[R(X′X)⁻¹R′]⁻¹(r − Rb)

where, apart from q, the expression on the RHS is the same as the numerator in the F statistic
I thus an alternative expression of the test statistic for H0 : Rβ = r is

    F = [(e∗′e∗ − e′e)/q] / [e′e/(n − k)] ∼ F(q, n − k)
prediction

I suppose that we have fitted a regression model, and we now consider some specific vector of regressor values,

    c′ = [ 1  X2f  · · ·  Xkf ]

I we wish to predict the value of Y conditional on c
I a point prediction is obtained by inserting the given X values into the regression equation, giving

    Ŷf = b1 + b2X2f + · · · + bkXkf = c′b

I the Gauss-Markov theorem shows that c′b is a BLUE of c′β; here c′β = E(Yf ), so that Ŷf is an optimal predictor of E(Yf )
I as var(Rb) = R var(b)R′, replacing R by c′ gives

    var(c′b) = c′var(b)c

prediction (cont.)
I if we assume normality for the error term, it follows that

    (c′b − c′β)/√var(c′b) ∼ N(0, 1)

I when the unknown σ² in var(b) is replaced by s², we have

    [Ŷf − E(Yf )] / [s√(c′(X′X)⁻¹c)] ∼ t(n − k)

from which a 95% confidence interval for E(Yf ) is

    Ŷf ± t0.025 s√(c′(X′X)⁻¹c)

I to obtain a confidence interval for Yf rather than E(Yf ), note that they differ only by the error uf that appears in the prediction period
I the point prediction is the same as before, but the uncertainty of the prediction increases
prediction (cont.)

I we have Ŷf = c′b as before and now Yf = c′β + uf , so that the prediction error is

    ef = Yf − Ŷf = uf − c′(b − β)

I squaring both sides and taking expectations gives the variance of the prediction error

    var(ef ) = σ² + c′var(b)c
             = σ²(1 + c′(X′X)⁻¹c)

from which we derive a t statistic

    (Ŷf − Yf ) / [s√(1 + c′(X′X)⁻¹c)] ∼ t(n − k)
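A sketch of point prediction, the confidence interval for E(Yf ) and the wider prediction interval for Yf (simulated data and regressor vector c are illustrative, not from the original slides; 1.96 approximates t0.025).

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.4]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s = np.sqrt(e @ e / (n - k))
XtX_inv = np.linalg.inv(X.T @ X)

c = np.array([1.0, 0.2, -1.0])                   # regressor values for the prediction period
y_hat = c @ b                                    # point prediction c'b

h = c @ XtX_inv @ c                              # c'(X'X)^{-1}c
ci_mean = (y_hat - 1.96 * s * np.sqrt(h),        # interval for E(Y_f)
           y_hat + 1.96 * s * np.sqrt(h))
ci_pred = (y_hat - 1.96 * s * np.sqrt(1 + h),    # wider interval for Y_f itself
           y_hat + 1.96 * s * np.sqrt(1 + h))
print(y_hat, ci_mean, ci_pred)
```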

