
Econometrics [EM2008]

Lecture 2
The k-variable linear regression model

Irene Mammi

irene.mammi@unive.it

Academic Year 2018/2019

outline

I the k-variable linear regression model


I matrix formulation
I partial correlation coefficients
I inference
I prediction

I References:
I Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th Edition, McGraw-Hill, New York, Chapter 3.

the multivariate model

I the bivariate framework is too restrictive for realistic analysis of economic phenomena
I generally more useful to specify multivariate relations
I restrict the analysis to a single equation which now includes k variables
I the specification of such a relationship is

    Yt = β1 + β2X2t + β3X3t + · · · + βkXkt + ut    t = 1, . . . , n

which identifies k − 1 explanatory variables, namely X2 , X3 , . . . , Xk , that are thought to influence the dependent variable
I nb: the X’s may be transformations of other variables, but the relationship is linear in the β coefficients
I assume that the disturbances are white noise
I k + 1 parameters to estimate, the β’s and the disturbance variance σ2
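As a concrete illustration (not part of the original slides), the sketch below simulates one sample from a 4-variable version of this specification with white-noise disturbances; the parameter values and variable names are arbitrary.

```python
import numpy as np

# minimal sketch: simulate Y_t = b1 + b2*X2_t + b3*X3_t + b4*X4_t + u_t
# with white-noise disturbances (illustrative parameter values only)
rng = np.random.default_rng(0)
n = 200
beta = np.array([1.0, 0.5, -0.3, 2.0])         # beta_1, ..., beta_4
X = np.column_stack([np.ones(n),               # column of ones for the intercept
                     rng.normal(size=(n, 3))]) # X2, X3, X4
sigma = 1.5                                    # disturbance standard deviation
u = rng.normal(scale=sigma, size=n)            # white-noise disturbances
y = X @ beta + u                               # k = 4 coefficients; k + 1 = 5 parameters with sigma^2
```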

matrix formulation of the k-variable model
I matrices indicated by uppercase bold letters, vectors by lowercase
bold letters
I vectors generally taken as column vectors
I for example,
   
    y = [ Y1  Y2  · · ·  Yn ]′        x2 = [ X21  X22  · · ·  X2n ]′

are n × 1 vectors, also referred to as n-vectors, containing the sample observations on Y and X2
I the n sample observations on the k-variable model can be written as

    y = β1 x1 + β2 x2 + · · · + βk xk + u
matrix formulation of the k-variable model (cont.)

I the y vector is expressed as a linear combination of the x vectors plus the disturbance vector u
I the x1 vector is a column of ones to allow for the intercept term
I collecting all the x vectors into a matrix X and the β coefficients into a vector β, we can write

    y = Xβ + u

where

    X = [ 1  X21  · · ·  Xk1 ]              [ β1 ]
        [ 1  X22  · · ·  Xk2 ]              [ β2 ]
        [ ...                ]    and   β = [ ... ]
        [ 1  X2n  · · ·  Xkn ]              [ βk ]

the algebra of least squares

I if the unknown vector β is replaced by some guess or estimate b, this defines a vector of residuals e,

    e = y − Xb

I the least squares principle is to choose b to minimize the residual sum of squares e′e, namely,

    RSS = e′e
        = (y − Xb)′(y − Xb)
        = y′y − b′X′y − y′Xb + b′X′Xb
        = y′y − 2b′X′y + b′X′Xb

the algebra of least squares (cont.)
I the first-order conditions for the minimization are

    ∂(RSS)/∂b = −2X′y + 2X′Xb = 0

giving the normal equations

    (X′X)b = X′y

I if y is replaced by Xb + e the result is

    (X′X)b = X′(Xb + e) = (X′X)b + X′e

thus
    X′e = 0
which is another fundamental least-squares result
I the first element in this equation gives ∑ et = 0, that is,

    ē = Ȳ − b1 − b2X̄2 − · · · − bkX̄k = 0
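A quick numerical check of these results (a sketch with simulated data, not part of the original slides): solve the normal equations (X′X)b = X′y and verify that X′e = 0, so the residuals have zero mean and the fitted plane passes through the point of means.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

# normal equations (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(X.T @ e)       # ~ 0: each regressor is orthogonal to the residuals
print(e.mean())      # ~ 0: residuals have zero mean (first normal equation)
print(y.mean() - (b[0] + b[1] * X[:, 1].mean() + b[2] * X[:, 2].mean()))  # ~ 0: plane through point of means
```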
the algebra of least squares (cont.)

I ⇒ the residuals have zero mean, and the regression plane passes through the point of means in k-dimensional space
I the remaining elements are of the form

    ∑t Xit et = 0    i = 2, . . . , k

which implies that each regressor has zero sample correlation with the residuals
I this, in turn, implies that ŷ (= Xb), the vector of the regression values for Y , is uncorrelated with e, for

    ŷ′e = (Xb)′e = b′X′e = 0

the algebra of least squares (cont.)

normal equations for the two-variable case

I here, k = 2 and the model of interest is Y = β 1 + β 2 X + u


I the X matrix is

    X = [ 1  X1 ]
        [ 1  X2 ]
        [ ...   ]
        [ 1  Xn ]

thus,

    X′X = [ 1   1   · · ·  1  ]  X  =  [  n    ∑X  ]
          [ X1  X2  · · ·  Xn ]        [ ∑X   ∑X²  ]

the algebra of least squares (cont.)

and

    X′y = [ 1   1   · · ·  1  ]  y  =  [ ∑Y  ]
          [ X1  X2  · · ·  Xn ]        [ ∑XY ]

giving

    [  n    ∑X  ] [ b1 ]  =  [ ∑Y  ]
    [ ∑X   ∑X²  ] [ b2 ]     [ ∑XY ]

or

    n b1 + b2 ∑X = ∑Y
    b1 ∑X + b2 ∑X² = ∑XY

the algebra of least squares (cont.)

normal equations for the three-variable case

I in a similar way, it may be shown that the normal equations for fitting
a three-variable equation by least squares are

    n b1 + b2 ∑X2 + b3 ∑X3 = ∑Y
    b1 ∑X2 + b2 ∑X2² + b3 ∑X2X3 = ∑X2Y
    b1 ∑X3 + b2 ∑X2X3 + b3 ∑X3² = ∑X3Y

decomposition of the sum of squares

I the zero covariances between regressors and the residuals underlie the
decomposition of the sum of squares
I decomposing the y vector into the part explained by the regression
and the unexplained part, we have

y = ŷ + e = Xb + e

from which it follows that

    y′y = (ŷ + e)′(ŷ + e) = ŷ′ŷ + e′e = b′X′Xb + e′e

I however, y′y = ∑t Yt² is the sum of squares of the actual Y values; interest normally centers on analyzing the variation in Y , measured by the sum of the squared deviations from the sample mean, namely,

    ∑t (Yt − Ȳ)² = ∑t Yt² − nȲ²

decomposition of the sum of squares (cont.)

I subtracting nȲ 2 from each side of the previous decomposition gives

    (y′y − nȲ²) = (b′X′Xb − nȲ²) + e′e


TSS = ESS + RSS

where TSS indicates the total sum of squares in Y , and ESS and
RSS the explained and residual (unexplained) sum of squares
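The decomposition can be verified numerically; this sketch (simulated data, not part of the original slides) computes TSS, ESS and RSS and checks that TSS = ESS + RSS.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, -1.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

TSS = y @ y - n * y.mean() ** 2             # y'y - n*Ybar^2
ESS = b @ X.T @ X @ b - n * y.mean() ** 2   # b'X'Xb - n*Ybar^2
RSS = e @ e
print(np.isclose(TSS, ESS + RSS))           # True: TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)             # both equal R^2
```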

equation in deviation form
I alternatively, express all the data in the form of deviations from the
sample mean
I the least-squares equation is

Yt = b1 + b2 X2t + b3 X3t + · · · + bk Xkt + et t = 1, . . . , n

I averaging over the sample observations gives

Ȳ = b1 + b2X̄2 + b3X̄3 + · · · + bkX̄k

which contains no term in e, since ē is zero


I subtracting the second equation from the first gives

yt = b2 x2t + b3 x3t + · · · + bk xkt + et t = 1, . . . , n

I the intercept b1 disappears, but it may be recovered from

b1 = Ȳ − b2 X̄2 − · · · − bk X̄k
equation in deviation form (cont.)
I nb: the least-squares slope coefficients b2 , . . . , bk are identical in both forms of the regression equation, and so are the residuals
I collecting all n observations, the deviation form of the equation may
be written compactly using a transformation matrix
 
    A = In − (1/n) ii′

where i is a column vector of n ones


I it follows that Ae = e and Ai = 0
I write the least-squares equation as
 
    y = Xb + e = [ i  X2 ] [ b1 ]  + e
                           [ b2 ]

where X2 is the n × (k − 1) matrix of observations on the regressors and b2 is the (k − 1)-element vector containing the coefficients b2 , b3 , . . . , bk
equation in deviation form (cont.)
I premultiplying by A gives

    Ay = [ 0  AX2 ] [ b1 ]  + Ae = (AX2)b2 + e
                    [ b2 ]

or
    y∗ = X∗b2 + e
where y∗ = Ay and X∗ = AX2 give the data in deviation form. Since X′e = 0, it follows that X∗′e = 0.
I premultiplying the previous equation by X∗′ gives

    X∗′y∗ = (X∗′X∗)b2

which are the familiar normal equations, except that now the data have all been expressed in deviation form and the b2 vector contains the k − 1 slope coefficients and excludes the intercept term
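A sketch of this result (simulated data, not part of the original slides): demeaning y and the non-constant regressors with A = I − (1/n)ii′ and solving X∗′X∗b2 = X∗′y∗ reproduces the slope coefficients of the full regression, and the intercept is recovered from the means.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
X2 = rng.normal(size=(n, 2))                       # regressors other than the column of ones
X = np.column_stack([np.ones(n), X2])
y = X @ np.array([4.0, 1.5, -0.7]) + rng.normal(size=n)

b_full = np.linalg.solve(X.T @ X, X.T @ y)         # intercept + slopes

A = np.eye(n) - np.ones((n, n)) / n                # A = I - (1/n) i i'
y_star, X_star = A @ y, A @ X2                     # data in deviation form
b2 = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

print(np.allclose(b2, b_full[1:]))                 # True: same slope coefficients
print(y.mean() - X2.mean(axis=0) @ b2, b_full[0])  # b1 = Ybar - b2*X2bar - ... (same value)
```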

equation in deviation form (cont.)
I the decomposition of the sum of squares may be expressed as

    y∗′y∗ = b2′X∗′X∗b2 + e′e
    TSS  =  ESS  +  RSS

I the coefficient of multiple correlation R is defined as the positive square root of

    R² = ESS/TSS = 1 − RSS/TSS

I the adjusted R², R̄², is defined as

    R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)]
I the numerator and the denominator on the RHS are unbiased
estimators of the disturbance variance and the variance of Y

equation in deviation form (cont.)
I the relation between the adjusted and unadjusted coefficients is

    R̄² = 1 − (1 − R²)(n − 1)/(n − k)
        = (1 − k)/(n − k) + [(n − 1)/(n − k)] R²

I two alternative criteria for comparing the fit of specifications are the Schwarz criterion

    SC = ln(e′e/n) + (k/n) ln n

and the Akaike information criterion

    AIC = ln(e′e/n) + 2k/n

generalizing partial correlation
I the normal equations solve for b = (X′X)⁻¹X′y
I the residuals from the LS regression may be expressed as

    e = y − Xb = y − X(X′X)⁻¹X′y = My

where
    M = I − X(X′X)⁻¹X′
I M is a symmetric, idempotent matrix; it also has the properties that MX = 0 and Me = e
I now write the general regression in partitioned form as

    y = [ x2  X∗ ] [ b2   ]  + e
                   [ b(2) ]

I in this partitioning, x2 is the n × 1 vector of observations on X2 , with coefficient b2 , and X∗ is the n × (k − 1) matrix of all the other variables (including the column of ones) with coefficient vector b(2)

generalizing partial correlation (cont.)
I the normal equations for this setup are

    [ x2′x2   x2′X∗ ] [ b2   ]  =  [ x2′y ]
    [ X∗′x2   X∗′X∗ ] [ b(2) ]     [ X∗′y ]

I the solution for b2 is

    b2 = (x2′M∗x2)⁻¹(x2′M∗y)

where
    M∗ = I − X∗(X∗′X∗)⁻¹X∗′
M∗ is a symmetric, idempotent matrix with the properties M∗X∗ = 0 and M∗e = e
I we have that
    M∗y is the vector of residuals when y is regressed on X∗
    M∗x2 is the vector of residuals when x2 is regressed on X∗

generalizing partial correlation (cont.)
I regressing the first vector on the second gives a slope coefficient which, using the symmetry and idempotency of M∗, gives the b2 coefficient defined above
I a simpler way to prove the same result is as follows: write the partitioned regression as

    y = x2b2 + X∗b(2) + e

I premultiplying by M∗, we obtain

    M∗y = (M∗x2)b2 + e

I finally, premultiplying by x2′ gives

    x2′M∗y = (x2′M∗x2)b2
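This is the Frisch-Waugh-Lovell result; the sketch below (simulated data, not part of the original slides) checks it numerically: regressing the residuals M∗y on the residuals M∗x2 reproduces the coefficient on x2 from the full regression.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 150
x2 = rng.normal(size=n)
X_other = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])    # X*: column of ones + other regressors
y = 1.0 + 2.0 * x2 + X_other[:, 1] - 0.5 * X_other[:, 2] + rng.normal(size=n)

# full regression: y on [x2, X*]
X = np.column_stack([x2, X_other])
b = np.linalg.solve(X.T @ X, X.T @ y)

# partial out X*: M* = I - X*(X*'X*)^{-1}X*'
M_star = np.eye(n) - X_other @ np.linalg.inv(X_other.T @ X_other) @ X_other.T
b2_partial = (x2 @ M_star @ y) / (x2 @ M_star @ x2)   # b2 = (x2'M*x2)^{-1}(x2'M*y)

print(np.isclose(b[0], b2_partial))                   # True: same b2
```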

inference in the k-variables equation
assumptions

1. X is nonstochastic and has full rank k


2. the errors have the properties

E(u ) = 0

and
    var(u) = E(uu′) = σ²I

I since the expected value operator is applied to every element of a vector or matrix, we have

    E(u) = E[ u1  u2  · · ·  un ]′ = [ E(u1)  E(u2)  · · ·  E(un) ]′ = [ 0  0  · · ·  0 ]′ = 0

inference in the k-variables equation (cont.)
    E(uu′) = E{ [ u1  u2  · · ·  un ]′ [ u1  u2  · · ·  un ] }

           = [ E(u1²)     E(u1u2)    · · ·   E(u1un)  ]
             [ E(u2u1)    E(u2²)     · · ·   E(u2un)  ]
             [ ...                               ...  ]
             [ E(unu1)    E(unu2)    · · ·   E(un²)   ]

           = [ var(u1)       cov(u1, u2)   · · ·   cov(u1, un) ]
             [ cov(u2, u1)   var(u2)       · · ·   cov(u2, un) ]
             [ ...                                        ...  ]
             [ cov(un, u1)   cov(un, u2)   · · ·   var(un)     ]

           = [ σ²   0    · · ·   0  ]
             [ 0    σ²   · · ·   0  ]   =  σ²I
             [ ...              ...  ]
             [ 0    0    · · ·   σ²  ]

inference in the k-variables equation (cont.)

I the previous matrix is the variance-covariance matrix of the error term
I this matrix embodies two strong assumptions: homoskedasticity and no serial correlation

inference in the k-variables equation (cont.)
Mean and Variance of b

I write the normal equations as

    b = (X′X)⁻¹X′y

I substitute for y to get

    b = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u

from which
    b − β = (X′X)⁻¹X′u
I take expectations (moving the expectation operator to the right past non-stochastic terms such as X)

    E(b − β) = (X′X)⁻¹X′E(u) = 0

giving
    E(b) = β
inference in the k-variables equation (cont.)
I under the assumptions of the model, the LS estimators are
unbiased estimators of the β parameters
I to obtain the variance-covariance matrix of the LS estimators,
consider
    var(b) = E[(b − β)(b − β)′]

and substituting for b − β we get

    E[(b − β)(b − β)′] = E[(X′X)⁻¹X′uu′X(X′X)⁻¹]
                       = (X′X)⁻¹X′E[uu′]X(X′X)⁻¹
                       = σ²(X′X)⁻¹

thus

    var(b) = σ²(X′X)⁻¹

inference in the k-variables equation (cont.)
Estimation of σ2

I the variance-covariance matrix of LS estimators involves the error variance σ², which is unknown
I it is reasonable to base an estimate on the residual sum of squares from the fitted regression
I write e = My = M(Xβ + u) = Mu since MX = 0, so that

    E(e′e) = E(u′M′Mu) = E(u′Mu)

I exploiting the fact that the trace of a scalar is the scalar, write

    E(u′Mu) = E[tr(u′Mu)]
            = E[tr(uu′M)]
            = σ² tr(M)
            = σ² tr(I) − σ² tr[X(X′X)⁻¹X′]
            = σ² tr(I) − σ² tr[(X′X)⁻¹(X′X)]
            = σ²(n − k)
inference in the k-variables equation (cont.)

I thus

    s² = e′e/(n − k)

defines an unbiased estimator of σ²
I the square root s is the standard deviation of the Y values about the regression plane; it is referred to as the standard error of the estimate or the standard error of the regression (SER)
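A sketch computing s² and the estimated standard errors of the coefficients, s√cii , from one fitted regression (simulated data, not part of the original slides).

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=2.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - k)                  # unbiased estimator of sigma^2
var_b = s2 * np.linalg.inv(X.T @ X)     # estimated var-cov matrix s^2 (X'X)^{-1}
se_b = np.sqrt(np.diag(var_b))          # standard errors of b1, ..., bk
print(np.sqrt(s2), se_b)                # SER and coefficient standard errors
```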

inference in the k-variables equation (cont.)
Gauss-Markov theorem

I this is the fundamental LS theorem
I G-M states that, conditional on the assumptions made, no other linear, unbiased estimator of the β coefficients can have smaller sampling variances than those of the least-squares estimator

1. each LS estimator bi is a best linear unbiased estimator of the population parameter βi
2. the BLUE of any linear combination of the β’s is that same linear combination of the b’s
3. the BLUE of E(Ys) is

    Ŷs = b1 + b2X2s + b3X3s + · · · + bkXks

which is the value found by inserting a relevant vector of X values into the regression model

testing linear hypotheses about β

I we have established the properties of the LS estimators of β
I now we show how to test hypotheses about β
I consider, for example
  (i) H0 : βi = 0
  (ii) H0 : βi = βi0
  (iii) H0 : β2 + β3 = 1
  (iv) H0 : β3 = β4 , or β3 − β4 = 0
  (v) H0 : (β2 , β3 , . . . , βk)′ = (0, 0, . . . , 0)′
  (vi) H0 : β2 = 0, where β2 is a subvector of β

testing linear hypotheses about β (cont.)
I all these examples fit into the general linear framework

    Rβ = r

where R is a q × k matrix of known constants, with q < k, and r is a q-vector of known constants. Each null hypothesis determines the relevant elements in R and r
I for the previous examples we have
  (i)   R = [ 0  · · ·  0  1  0  · · ·  0 ]   r = 0     q = 1      (with 1 in the ith position)
  (ii)  R = [ 0  · · ·  0  1  0  · · ·  0 ]   r = βi0   q = 1      (with 1 in the ith position)
  (iii) R = [ 0  1  1  0  · · ·  0 ]          r = 1     q = 1
  (iv)  R = [ 0  0  1  −1  0  · · ·  0 ]      r = 0     q = 1
  (v)   R = [ 0   Ik−1 ]                      r = 0     q = k − 1  (where 0 is a vector of k − 1 zeros)
  (vi)  R = [ 0k2×k1   Ik2 ]                  r = 0     q = k2

testing linear hypotheses about β (cont.)
I we now derive a general testing procedure for the general linear
hypothesis
H0 : R β − r = 0
I given the LS estimator, we can compute the vector (Rb − r ), which
measures the discrepancy between expectation and observation
I if this vector is “large”, it casts doubt on the null hypothesis
I the distinction between “large” and “small” is determined from the
sampling distribution under the null, in this case, the distribution of
Rb when R β = r
I from the unbiasedness result, it follows that

E(Rb ) = R β

I therefore

    var(Rb) = E[R(b − β)(b − β)′R′]
            = R var(b)R′
            = σ²R(X′X)⁻¹R′
testing linear hypotheses about β (cont.)
I we know the mean and the variance of the vector Rb
I need a further assumption to determine the form of the sampling
distribution: since b is a function of the u vector, the sampling
distribution of Rb will be determined by the distribution of u
I assume that the ui are normally distributed so that,

u ∼ N (0, σ2 I )

I it follows that
    b ∼ N[β, σ²(X′X)⁻¹]
then
    Rb ∼ N[Rβ, σ²R(X′X)⁻¹R′]
and so
    R(b − β) ∼ N[0, σ²R(X′X)⁻¹R′]
I if the null hypothesis Rβ = r is true, then

    (Rb − r) ∼ N[0, σ²R(X′X)⁻¹R′]


testing linear hypotheses about β (cont.)
I this equation gives the sampling distribution of Rb, and we may
derive a χ2 variable, namely

    (Rb − r)′[σ²R(X′X)⁻¹R′]⁻¹(Rb − r) ∼ χ²(q)

I σ² is unknown but it can be shown that

    e′e/σ² ∼ χ²(n − k)

and that this statistic is distributed independently of b
I a computable test statistic, which has an F distribution under the null, is

    [(Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/q] / [e′e/(n − k)] ∼ F(q, n − k)
I the test procedure is to reject R β = r if the computed F value
exceeds the relevant critical value
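A sketch of this test for example (iii), H0 : β2 + β3 = 1; R, r and the simulated data are illustrative and not from the original slides, and the resulting F value would be compared with the critical value of F(q, n − k).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.6, 0.4]) + rng.normal(size=n)   # true beta2 + beta3 = 1

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

R = np.array([[0.0, 1.0, 1.0]])    # picks out beta2 + beta3
r = np.array([1.0])
q = R.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
d = R @ b - r                      # discrepancy Rb - r
num = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / q
den = e @ e / (n - k)
F = num / den
print(F)                           # reject H0 if F exceeds the F(q, n - k) critical value
```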

testing linear hypotheses about β (cont.)

I it could be helpful to write

    (Rb − r)′[s²R(X′X)⁻¹R′]⁻¹(Rb − r)/q ∼ F(q, n − k)

thus, s²(X′X)⁻¹ is the estimated variance-covariance matrix of b.
I if we let cij denote the (i, j)th element in (X′X)⁻¹, then

    s²cii = var(bi)   and   s²cij = cov(bi , bj)      i, j = 1, 2, . . . , k

testing linear hypotheses about β (cont.)
I going back to the previous examples. . .
  (i) H0 : βi = 0: Rb picks out bi and R(X′X)⁻¹R′ picks out cii , the ith diagonal element in (X′X)⁻¹. Thus we have

    F = bi²/(s²cii) = bi²/var(bi) ∼ F(1, n − k)

or, taking the square root,

    t = bi/(s√cii) = bi/s.e.(bi) ∼ t(n − k)

  (ii) H0 : βi = βi0 : this hypothesis is tested by

    t = (bi − βi0)/s.e.(bi) ∼ t(n − k)

One may also compute a 95% confidence interval for βi :

    bi ± t0.025 s.e.(bi)
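A sketch of the t test and 95% confidence interval for a single coefficient (simulated data, not part of the original slides; 1.96 is used as an approximation to t0.025 for a large n − k).

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, 0.8]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = (e @ e) / (n - k)
c = np.diag(np.linalg.inv(X.T @ X))        # the c_ii elements

i = 1                                      # test H0: beta_i = 0 for the second coefficient
se_bi = np.sqrt(s2 * c[i])
t = b[i] / se_bi                           # ~ t(n - k) under H0
ci = (b[i] - 1.96 * se_bi, b[i] + 1.96 * se_bi)   # approximate 95% interval, t_0.025 ≈ 1.96
print(t, ci)
```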

testing linear hypotheses about β (cont.)
  (iii) H0 : β2 + β3 = 1: Rb gives the sum of the two estimated coefficients, b2 + b3 . Premultiplying (X′X)⁻¹ by R gives a row vector whose elements are the sums of the corresponding elements in the second and third rows of (X′X)⁻¹. Forming the inner product with R′ gives the sum of the second and third elements of that row vector, that is, c22 + 2c23 + c33 , noting that c23 = c32 . Thus

    s²R(X′X)⁻¹R′ = s²(c22 + 2c23 + c33)
                 = var(b2) + 2 cov(b2 , b3) + var(b3)
                 = var(b2 + b3)

The test statistic is then

    t = (b2 + b3 − 1)/√var(b2 + b3) ∼ t(n − k)

Alternatively, one may compute, say, a 95% confidence interval for the sum (β2 + β3) as

    (b2 + b3) ± t0.025 √var(b2 + b3)
testing linear hypotheses about β (cont.)

  (iv) H0 : β3 = β4 : the test statistic here is

    t = (b3 − b4)/√var(b3 − b4) ∼ t(n − k)

  (v) H0 : β2 = β3 = · · · = βk = 0: this case involves a composite hypothesis about all k − 1 coefficients. The F statistic for testing the joint significance of the complete set of regressors is

    F = [ESS/(k − 1)] / [RSS/(n − k)] ∼ F(k − 1, n − k)

This statistic may also be expressed as

    F = [R²/(k − 1)] / [(1 − R²)/(n − k)] ∼ F(k − 1, n − k)
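A sketch of the joint-significance F statistic, computed both from ESS and RSS and from R², confirming the two expressions agree (simulated data, not part of the original slides).

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.4, -0.6, 0.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
RSS = e @ e
TSS = np.sum((y - y.mean()) ** 2)
ESS = TSS - RSS
R2 = ESS / TSS

F1 = (ESS / (k - 1)) / (RSS / (n - k))
F2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))
print(F1, F2, np.isclose(F1, F2))    # identical; compare with the F(k - 1, n - k) critical value
```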

testing linear hypotheses about β (cont.)

  (vi) H0 : β2 = 0: this hypothesis postulates that a subset of coefficients is a zero vector. Partition the regression equation as follows:

    y = [ X1  X2 ] [ b1 ]  + e = X1b1 + X2b2 + e
                   [ b2 ]

where X1 has k1 columns, including a column of ones, X2 has k2 = k − k1 columns, and b1 and b2 are the corresponding subvectors of regression coefficients. The hypothesis may be tested by running two separate regressions. First regress y on X1 and denote the RSS by e∗′e∗ . Then run the regression on all the Xs, obtaining the RSS, denoted by e′e. The test statistic is

    F = [(e∗′e∗ − e′e)/k2] / [e′e/(n − k)] ∼ F(k2 , n − k)

restricted and unrestricted regressions

I examples (v) and (vi) may be interpreted as the outcome of two separate regressions
I recall that ESS may be expressed as ESS = y∗′y∗ − e′e, where y∗ = Ay
I it may be shown that y∗′y∗ is the RSS when y∗ is regressed on x1 (= i)
I in both cases (v) and (vi) the first regression may be regarded as a restricted regression and the second as an unrestricted regression
I e∗′e∗ is the restricted RSS and e′e is the unrestricted RSS

fitting the restricted regressions
I question: how to fit the restricted regression?
I answer: 1) either work out each specific case from first principles; 2)
or derive a general formula into which specific cases can be fitted
I (1) as for the first approach, consider example (iii) with the regression in deviation form,

    y = b2x2 + b3x3 + e

I want to impose the restriction that b2 + b3 = 1. Substituting the restriction in the regression gives

    y = b2x2 + (1 − b2)x3 + e∗    or    (y − x3) = b2(x2 − x3) + e∗

so as to form two new variables (y − x3) and (x2 − x3): the simple regression of the first on the second (without the constant) gives the restricted estimate of b2 ; the RSS from this regression is the restricted RSS, e∗′e∗ .
fitting the restricted regressions (cont.)
I (2) the general approach requires a b∗ vector that minimizes the RSS subject to the restrictions Rb∗ = r . To do so set up the function

    φ = (y − Xb∗)′(y − Xb∗) − 2λ′(Rb∗ − r)

where λ is a q-vector of Lagrange multipliers
I the first-order conditions are

    ∂φ/∂b∗ = −2X′y + 2(X′X)b∗ − 2R′λ = 0
    ∂φ/∂λ  = −2(Rb∗ − r) = 0

I the solution for b∗ is

    b∗ = b + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rb)

where b is the unrestricted LS estimator (X′X)⁻¹X′y.
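A sketch of the restricted estimator for the restriction b2 + b3 = 1 (example (iii) again; simulated data, not part of the original slides), checking that Rb∗ = r holds exactly and computing the restricted RSS e∗′e∗.

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.7, 0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)      # unrestricted estimator
e = y - X @ b

R = np.array([[0.0, 1.0, 1.0]])            # restriction beta2 + beta3 = 1
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_star = b + XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ (r - R @ b)
e_star = y - X @ b_star

print(R @ b_star)                          # = r: the restriction is satisfied exactly
print(e_star @ e_star - e @ e)             # restricted RSS minus unrestricted RSS (>= 0)
```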

fitting the restricted regressions (cont.)
I the residuals from the restricted regression are

    e∗ = y − Xb∗
       = y − Xb − X(b∗ − b)
       = e − X(b∗ − b)

I transposing and multiplying, we obtain

    e∗′e∗ = e′e + (b∗ − b)′X′X(b∗ − b)

I the process of substituting for (b∗ − b) and simplifying gives

    e∗′e∗ − e′e = (r − Rb)′[R(X′X)⁻¹R′]⁻¹(r − Rb)

where, apart from q, the expression on the RHS is the same as the numerator in the F statistic
I thus an alternative expression of the test statistic for H0 : Rβ = r is

    F = [(e∗′e∗ − e′e)/q] / [e′e/(n − k)] ∼ F(q, n − k)
prediction

I suppose that we have fitted a regression model, and we now consider some specific vector of regressor values,

    c′ = [ 1  X2f  · · ·  Xkf ]

I we wish to predict the value of Y conditional on c
I a point prediction is obtained by inserting the given X values into the regression equation, giving

    Ŷf = b1 + b2X2f + · · · + bkXkf = c′b

I the Gauss-Markov theorem shows that c′b is a BLUE of c′β; here c′β = E(Yf ), so that Ŷf is an optimal predictor of E(Yf )
I as var(Rb) = R var(b)R′, replacing R by c′ gives

    var(c′b) = c′var(b)c

prediction (cont.)
I if we assume normality for the error term, it follows that

    (c′b − c′β)/√var(c′b) ∼ N(0, 1)

I when the unknown σ² in var(b) is replaced by s², we have

    [Ŷf − E(Yf )] / [s√(c′(X′X)⁻¹c)] ∼ t(n − k)

from which a 95% confidence interval for E(Yf ) is

    Ŷf ± t0.025 s√(c′(X′X)⁻¹c)

I to obtain a confidence interval for Yf rather than E(Yf ), note that they differ only by the error uf that appears in the prediction period
I the point prediction is the same as before, but the uncertainty of the prediction increases
prediction (cont.)

I we have Ŷf = c′b as before and now Yf = c′β + uf , so that the prediction error is

    ef = Yf − Ŷf = uf − c′(b − β)

I squaring both sides and taking expectations gives the variance of the prediction error

    var(ef ) = σ² + c′var(b)c
             = σ²(1 + c′(X′X)⁻¹c)

from which we derive a t statistic

    (Ŷf − Yf ) / [s√(1 + c′(X′X)⁻¹c)] ∼ t(n − k)
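A sketch of point prediction, the confidence interval for E(Yf ) and the wider prediction interval for Yf (simulated data and regressor vector c are illustrative, not from the original slides; 1.96 approximates t0.025).

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.4]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s = np.sqrt(e @ e / (n - k))
XtX_inv = np.linalg.inv(X.T @ X)

c = np.array([1.0, 0.2, -1.0])                   # regressor values for the prediction period
y_hat = c @ b                                    # point prediction c'b

h = c @ XtX_inv @ c                              # c'(X'X)^{-1}c
ci_mean = (y_hat - 1.96 * s * np.sqrt(h),        # interval for E(Y_f)
           y_hat + 1.96 * s * np.sqrt(h))
ci_pred = (y_hat - 1.96 * s * np.sqrt(1 + h),    # wider interval for Y_f itself
           y_hat + 1.96 * s * np.sqrt(1 + h))
print(y_hat, ci_mean, ci_pred)
```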

