17/02/2010
Notation
y = (y_1, y_2, …, y_n)′ : column vector containing the n sample observations on the dependent variable y. (1)
x_k = (x_1k, x_2k, …, x_nk)′ : column vector containing the n sample observations on the independent variable x_k, with k = 1, 2, …, K. (2)
X = [x_1 x_2 … x_K] : n × K data matrix containing the n sample observations on the K independent variables. (3)
Usually the first column x_1 is assumed to be a column of 1s (constant).
The K unknown parameters of the model can be collected in a column vector β = (β_1, β_2, …, β_K)′, and the model can be rewritten in compact form: y = x_1 β_1 + x_2 β_2 + … + x_K β_K + ε = Xβ + ε. (6)
Strict exogeneity assumption: E[ε_i | X] = 0, i = 1, 2, …, n. (7)
Full column rank assumption: rank(X) = K. (8)
First implication of strict exogeneity. The unconditional mean of the error term is also zero. In fact, by the Law of Total Expectations: E[ε_i] = E_X[ E[ε_i | X] ] = 0, i = 1, 2, …, n. (10)
Second implication of strict exogeneity. The regressors are orthogonal to the error term for all observations: E[ε_i x_jk] = 0, i, j = 1, 2, …, n; k = 1, 2, …, K. (11)
Third implication of strict exogeneity. The orthogonality conditions are equivalent to zero-correlation conditions: Cov(x_jk, ε_i) = E[x_jk ε_i] − E[x_jk] E[ε_i] = 0. (12)
No-correlation assumption. The conditional second cross-moment between ε_i and ε_j is zero for all i ≠ j: E[ε_i ε_j | X] = 0 for i ≠ j. (14)
Since Var(ε_i | X) = E[ε_i² | X] − (E[ε_i | X])² = E[ε_i² | X] = σ², and Cov(ε_i, ε_j | X) = E[ε_i ε_j | X] − E[ε_i | X] E[ε_j | X] = E[ε_i ε_j | X] = 0, the two assumptions can be written compactly as E[εε′ | X] = Var(ε | X) = σ² I, where I is the n × n identity matrix.
Given an arbitrary choice b_0 for the coefficient vector, the minimization problem is to choose the b_0 that minimizes the function S(b_0), where:
S(b_0) = (y − Xb_0)′(y − Xb_0). (19)
Note that (y − Xb_0)′(y − Xb_0) = Σ_i (y_i − x_i′b_0)². (20)
The set of first-order conditions is: ∂S(b_0)/∂b_0 = −2X′y + 2X′Xb_0 = 0. (21)
Let b be the solution. Then b satisfies the least squares normal equations: X′Xb = X′y. (22)
If the inverse of X′X exists (which follows from the full column rank assumption), the solution is: b = (X′X)⁻¹X′y, and the least squares residuals can be written as: e = y − Xb.
For this solution to minimize the sum of squares, ∂²S(b)/∂b∂b′ = 2X′X must be a positive definite matrix, which holds when X has full column rank.
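A minimal numerical sketch of the normal-equation solution, using NumPy and simulated data (all variable names are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3

# Design matrix with a constant in the first column
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# Solve the normal equations X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares residuals
e = y - X @ b
```

Solving the linear system directly is numerically preferable to forming the explicit inverse (X′X)⁻¹.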
The normal equations X′Xb = X′y imply that X′(y − Xb) = X′e = 0. (30)
Therefore, for every column x_k of X, x_k′e = 0 and, if the first column of X is a column of 1s, x_1′e = Σ_i e_i = 0. (31)
In words, the least squares residuals sum to zero.
The OLS estimator can equivalently be written as b = S_XX⁻¹ s_XY, where:
S_XX = (1/n) X′X = (1/n) Σ_i x_i x_i′ (32)
s_XY = (1/n) X′y = (1/n) Σ_i x_i y_i (33)
Intuition: S_XX and s_XY can be thought of as sample averages of x_i x_i′ and x_i y_i respectively. This form is utilized in large sample theory.
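The equivalence between the sample-average form and the usual OLS formula can be checked directly; a sketch with simulated data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 50, 2
X = rng.normal(size=(n, K))
y = rng.normal(size=n)

# S_XX = (1/n) X'X  and  s_XY = (1/n) X'y
S_XX = (X.T @ X) / n
s_XY = (X.T @ y) / n

# b = S_XX^{-1} s_XY coincides with the normal-equation solution,
# since the two factors of 1/n cancel
b_avg = np.linalg.solve(S_XX, s_XY)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
```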
Projection matrix: P = X(X′X)⁻¹X′. (35)
Annihilator matrix: M = I − P. (36)
Both P and M are n × n, symmetric and idempotent, and satisfy PX = X, MX = 0. (37)
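The stated properties of P and M can be verified numerically; a sketch with an arbitrary full-rank design matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 20, 3
X = rng.normal(size=(n, K))

# Projection matrix P = X (X'X)^{-1} X' and annihilator M = I - P
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P
```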
Sum of Squared Residuals (RSS): RSS = e′e = ε′Mε.
In fact, e = y − Xb = y − X(X′X)⁻¹X′y = My = M(Xβ + ε) = Mε (since MX = 0) and, squaring both sides and using the symmetry and idempotency of M, we obtain: e′e = ε′M′Mε = ε′Mε.
Estimate of the Variance of the Error Term. The OLS estimate of σ², denoted s², is the sum of squared residuals divided by n − K: s² = e′e / (n − K). (41)
Standard Error of the Regression (SER). The square root of s², s, is called the standard error of the regression.
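A quick simulation sketch of the variance estimate (true error variance set to 1; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.ones(K) + rng.normal(size=n)  # errors have variance 1

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# s^2 = e'e / (n - K); its square root s is the SER
s2 = (e @ e) / (n - K)
ser = np.sqrt(s2)
```

With the true σ² equal to 1, s² should be close to 1 in a sample of this size.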
Sampling Error: b − β = (X′X)⁻¹X′y − β = (X′X)⁻¹X′(Xβ + ε) − β = β + (X′X)⁻¹X′ε − β = (X′X)⁻¹X′ε. (42)
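The sampling-error identity b − β = (X′X)⁻¹X′ε holds exactly in every sample, not just on average; a simulation sketch (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([0.5, 1.0, -2.0])
eps = rng.normal(size=n)
y = X @ beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)

# The sampling error equals (X'X)^{-1} X' eps exactly
sampling_error = np.linalg.solve(X.T @ X, X.T @ eps)
```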
Starting from y − ȳ1 = M⁰y (43), we obtain Σ_i (y_i − ȳ)² = (M⁰y)′(M⁰y) = y′M⁰′M⁰y = y′M⁰y (44), where M⁰ = I − (1/n)11′ is an n × n symmetric idempotent matrix that transforms observations into deviations from sample means. Its diagonal elements are all 1 − 1/n and its off-diagonal elements are −1/n.
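The demeaning matrix M⁰ and its properties can be checked in a few lines (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
y = rng.normal(size=n)

# M0 = I - (1/n) 1 1' transforms observations into deviations from means
M0 = np.eye(n) - np.ones((n, n)) / n
```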
Derivation of the coefficient of determination. Starting from y = Xb + e and subtracting ȳ1 from both sides, we obtain y − ȳ1 = Xb − ȳ1 + e, which can be rewritten as M⁰y = M⁰Xb + M⁰e = M⁰Xb + e (since M⁰e = e when the regression contains a constant, the residuals already having zero mean). The total sum of squares is (M⁰y)′(M⁰y) = (M⁰Xb + e)′(M⁰Xb + e) which, because the cross terms vanish (X′e = 0), can be rewritten as: y′M⁰y = b′X′M⁰Xb + e′e.
The total sum of squares (TSS) can be decomposed into two parts, measuring respectively the proportion of the TSS that is accounted for by variation in the regressors (ESS) and the proportion that is not (RSS): TSS = ESS + RSS. (50)
The standard measure of the goodness of fit of a regression is simply the ratio between ESS and TSS. This measure, called the coefficient of determination R², is bounded between zero and one:
R² = ESS/TSS = b′X′M⁰Xb / (y′M⁰y) = 1 − e′e / (y′M⁰y). (51)
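The decomposition and both forms of R² can be verified numerically; a sketch with a regression that includes a constant (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# Demeaning matrix and the three sums of squares
M0 = np.eye(n) - np.ones((n, n)) / n
tss = y @ M0 @ y
ess = b @ X.T @ M0 @ X @ b
rss = e @ e

r2 = ess / tss
```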
Problems with the use of the coefficient of determination. It never decreases when an additional explanatory variable is added to the regression. For this reason alternative measures have been proposed, including the so-called adjusted R² (adjusted for degrees of freedom), which is computed as follows:
R̄² = 1 − [e′e/(n − K)] / [y′M⁰y/(n − 1)]. (52)
The adjusted measure can decrease when a variable is added, if the contribution of the additional variable to the fit of the regression is relatively low. If the constant term is not included in the model, the coefficient of determination is no longer bounded between zero and one and can indeed turn negative.
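A sketch of the degrees-of-freedom adjustment (names illustrative); note that the adjusted measure is always below the unadjusted one when K > 1 and the fit is imperfect:

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
M0 = np.eye(n) - np.ones((n, n)) / n

r2 = 1 - (e @ e) / (y @ M0 @ y)

# Adjusted R^2 penalizes for the degrees of freedom used by the regressors
r2_adj = 1 - (e @ e / (n - K)) / (y @ M0 @ y / (n - 1))
```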