Mechanics and Interpretation of Multiple Regression Analysis

Outline: Motivation for Multiple Regression; Mechanics and Interpretation of OLS; Expected Values of OLS Estimators; Variance of the OLS Estimators; Efficiency of OLS
Chapter 3 Multiple Regression Analysis: Estimation

Le Van Chon
University of Economics Ho Chi Minh City
June 2012
Based on Introductory Econometrics: A Modern Approach by Wooldridge
Applied Econometrics
Motivation for Multiple Regression

The primary drawback of simple regression analysis is that its key assumption, SLR.3, is often unrealistic. Multiple regression analysis allows us to control explicitly for many other factors that simultaneously affect the dependent variable, so it is more amenable to ceteris paribus analysis.
(1) If we add more factors that are useful for explaining $y$, then more of the variation in $y$ can be explained.
(2) Multiple regression analysis can incorporate fairly general functional forms, so it allows for much more flexibility.
Model with 2 Independent Variables

Some examples show how multiple regression analysis can solve problems that simple regression cannot.

$$\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + u \tag{1}$$

wage is determined by two independent variables, education and experience, and by other factors contained in $u$. We are still primarily interested in the effect of educ on wage. Equation (1) takes exper out of $u$ and puts it explicitly in the equation, so we can measure the effect of educ on wage holding exper fixed. We don't need to assume that exper is uncorrelated with educ. $\beta_2$ measures the ceteris paribus effect of exper on wage.
Model with 2 Independent Variables (cont.)

Suppose the average test score (avgscore) depends on per-student spending (expend), average family income (avginc), and other variables:

$$\mathit{avgscore} = \beta_0 + \beta_1\,\mathit{expend} + \beta_2\,\mathit{avginc} + u \tag{2}$$

The coefficient of interest is $\beta_1$, the ceteris paribus effect of expend on avgscore. In a simple regression, avginc would be included in $u$, which would then likely be correlated with expend, so the OLS estimator of $\beta_1$ would be biased.

A general model with 2 independent variables:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \tag{3}$$

It can also generalize functional relationships between variables:

$$\mathit{cons} = \beta_0 + \beta_1\,\mathit{inc} + \beta_2\,\mathit{inc}^2 + u \tag{4}$$

Consumption depends on only one observed factor, income. This model is nevertheless not a simple regression model because it has 2 independent variables, $x_1 = \mathit{inc}$ and $x_2 = \mathit{inc}^2$. In (1), $\beta_1$ is the ceteris paribus effect of educ on wage. In (4), the marginal propensity to consume is approximated by

$$\frac{\Delta\mathit{cons}}{\Delta\mathit{inc}} \approx \beta_1 + 2\beta_2\,\mathit{inc}$$
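The approximation for the marginal propensity to consume in the quadratic model can be recovered in one step. The following is a short sketch that treats inc as continuous and holds $u$ fixed; it shows why the effect of income cannot be read off $\beta_1$ alone.

```latex
\frac{\partial\,\mathit{cons}}{\partial\,\mathit{inc}}
  = \frac{\partial}{\partial\,\mathit{inc}}
    \left(\beta_0 + \beta_1\,\mathit{inc} + \beta_2\,\mathit{inc}^2 + u\right)
  = \beta_1 + 2\beta_2\,\mathit{inc}
\quad\Longrightarrow\quad
\Delta\mathit{cons} \approx \left(\beta_1 + 2\beta_2\,\mathit{inc}\right)\Delta\mathit{inc}
```

The effect of an extra unit of income therefore depends on the level of income at which it is evaluated.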
The key assumption about how $u$ is related to $x_1$ and $x_2$ is

$$E(u \mid x_1, x_2) = 0$$

For any values of $x_1$ and $x_2$, the average of the unobservables is equal to 0. E.g., in (1), this assumption is $E(u \mid \mathit{educ}, \mathit{exper}) = 0$: other factors affecting wage are not related on average to educ and exper.
Model with k Independent Variables

The multiple linear regression model in the population is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \tag{5}$$

$\beta_0$ is the intercept, $\beta_1$ to $\beta_k$ are all called slope parameters, and $u$ is the error term or disturbance. The key assumption is the zero conditional mean assumption:

$$E(u \mid x_1, x_2, \dots, x_k) = 0 \tag{6}$$

(6) requires that all factors in the unobserved error term be uncorrelated with the explanatory variables.
Obtaining the OLS estimates

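Sample regression function (SRF): $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k$

The OLS estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are chosen to minimize the SSR:

$$\sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}\right)^2$$

This gives $k + 1$ first-order conditions in the $k + 1$ unknowns $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$:

$$\sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}\right) = 0 \tag{7}$$

$$\sum_{i=1}^{n} x_{ij}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}\right) = 0, \quad j = 1, 2, \dots, k \tag{8}$$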
Obtaining the OLS estimates (cont.)
The OLS first-order conditions can also be obtained by the method of moments. Under assumption (6), $E(u) = 0$ and $E(x_j u) = 0$, $j = 1, 2, \dots, k$. Equations (7) and (8) are the sample counterparts of these population moments.
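In matrix form the $k + 1$ first-order conditions say that the design matrix is orthogonal to the residuals, $X'(y - X\hat\beta) = 0$, which is the linear system $X'X\hat\beta = X'y$. The sketch below solves that system with NumPy on simulated data; the sample size, the coefficient values and the variable names are all illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Simulated data; the "true" coefficients are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, k = 200, 2
x = rng.normal(size=(n, k))
u = rng.normal(size=n)
y = 1.0 + 0.5 * x[:, 0] - 2.0 * x[:, 1] + u

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x])

# First-order conditions X'(y - X b) = 0  <=>  X'X b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Evaluate the sample moment conditions at the solution:
# element 0 is the sum of residuals, elements 1..k are sum_i x_ij * residual_i.
resid = y - X @ beta_hat
foc = X.T @ resid

print("beta_hat:", beta_hat)
print("first-order conditions (numerically zero):", foc)
```

Solving the normal equations directly is fine for a small illustration; for real work a least-squares routine such as `np.linalg.lstsq` is usually the numerically safer choice.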
Interpreting the OLS Regression Equation

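The case of 2 independent variables:

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 \tag{9}$$

$\hat\beta_0$ is the predicted value of $y$ when $x_1 = 0$ and $x_2 = 0$. $\hat\beta_1$ and $\hat\beta_2$ have partial effect, or ceteris paribus, interpretations. From (9),

$$\Delta\hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2$$

so $\Delta\hat y = \hat\beta_1 \Delta x_1$ when $x_2$ is held fixed, and $\Delta\hat y = \hat\beta_2 \Delta x_2$ when $x_1$ is held fixed. This allows us to control for $x_2$ explicitly when estimating the partial effect of $x_1$ on $y$.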
Interpreting the OLS Regression Equation (cont.)

E.g., GPA1 contains college grade point average (colGPA), high school GPA (hsGPA), and achievement test score (ACT) for 141 students from a large university. OLS regression line:

$$\widehat{\mathit{colGPA}} = 1.29 + .453\,\mathit{hsGPA} + .0094\,\mathit{ACT}$$

The intercept is not meaningful, since no one attending college has either a zero hsGPA or a zero ACT. Holding ACT fixed, another point on hsGPA is associated with .453 of a point on colGPA. Holding hsGPA fixed, an additional 10 points on ACT changes predicted colGPA by .094, less than one-tenth of a point.
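The two ceteris paribus statements can be checked by plugging numbers into the reported regression line. The snippet below is only a sketch: the coefficients are the ones quoted above, while the function name and the student profiles (hsGPA of 3.0 or 4.0, ACT of 20 or 30) are hypothetical values picked for illustration.

```python
def predict_col_gpa(hs_gpa, act):
    """Fitted college GPA from the reported OLS regression line."""
    return 1.29 + 0.453 * hs_gpa + 0.0094 * act

# Same hsGPA, ACT scores 10 points apart: the difference is 10 * 0.0094.
act_effect = predict_col_gpa(hs_gpa=3.0, act=30) - predict_col_gpa(hs_gpa=3.0, act=20)

# Same ACT, hsGPA one point apart: the difference is the hsGPA coefficient.
hs_effect = predict_col_gpa(hs_gpa=4.0, act=20) - predict_col_gpa(hs_gpa=3.0, act=20)

print(round(act_effect, 4), round(hs_effect, 4))
```

Because the fitted line is linear, these differences do not depend on which baseline values are chosen for the variable that is held fixed.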
Interpreting the OLS Regression Equation (cont.)
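The general case with k independent variables:

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k \tag{10}$$

$$\Delta\hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k$$

Holding $x_2, \dots, x_k$ fixed implies that $\Delta\hat y = \hat\beta_1 \Delta x_1$: $\hat\beta_1$ measures the change in $\hat y$ due to a one-unit increase in $x_1$, holding all other independent variables fixed. Each $\hat\beta_j$ has a ceteris paribus interpretation.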
Holding Other Factors Fixed

Power of multiple regression analysis: it provides a ceteris paribus interpretation even though the data have not been collected in a ceteris paribus fashion. In giving $\hat\beta_{ACT}$ a partial effect interpretation, it may seem as if we actually went out and sampled people with the same hsGPA but different ACT scores. This is not the case, as the data are a random sample. Multiple regression allows us to do in nonexperimental environments what natural scientists can do in a controlled lab setting: keep other factors fixed.
OLS Fitted Values and Residuals

After obtaining the OLS regression line, we can obtain a fitted or predicted value for each observation $i$:

$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \dots + \hat\beta_k x_{ik} \tag{11}$$

Residual for observation $i$: $\hat u_i = y_i - \hat y_i$. If $\hat u_i > 0$, $y_i$ is underpredicted. If $\hat u_i < 0$, $y_i$ is overpredicted.
The OLS $\hat y_i$ and $\hat u_i$ have the same properties as in simple regression:
(1) The sample average of the $\hat u_i$ is zero.
(2) The sample covariance between each $x_j$ and $\hat u$ is zero.
(3) The OLS regression line always goes through the point of sample means $(\bar x_1, \bar x_2, \dots, \bar x_k, \bar y)$.
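Fitted values, residuals and the point-of-means property are easy to inspect on a tiny data set. The numbers below are made up purely for illustration, and the variable names are assumptions rather than anything used in the slides.

```python
import numpy as np

# Small made-up data set (illustrative values only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.0, 4.0, 8.0, 8.0, 13.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat      # fitted values
u_hat = y - y_hat         # residuals

# True where y_i lies above its fitted value, i.e. y_i is underpredicted.
underpredicted = u_hat > 0

# Evaluating the fitted equation at the sample means of x1 and x2
# reproduces the sample mean of y.
y_at_means = beta_hat @ np.array([1.0, x1.mean(), x2.mean()])

print(u_hat.round(3), underpredicted)
print(y_at_means, y.mean())
```

The zero-mean residual property relies on the intercept being included; it is not guaranteed once the constant is dropped from the regression.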
A Partialling Out Interpretation

Consider the case with 2 independent variables: $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$. $\hat\beta_1$ can be computed in the following way (no need to prove):
(1) Simple regression of $y$ on $x_2$: $y = \hat\gamma_0 + \hat\gamma_1 x_2 + \hat u_{y2}$
(2) Simple regression of $x_1$ on $x_2$: $x_1 = \hat\delta_0 + \hat\delta_1 x_2 + \hat u_{12}$
(3) Simple regression of $\hat u_{y2}$ on $\hat u_{12}$: $\hat u_{y2} = \hat\beta_1 \hat u_{12} + \hat u_{y1.2}$
$\hat\beta_1$ measures the sample relationship between $y$ and $x_1$ after $x_2$ has been partialled out.
A Partialling Out Interpretation (cont.)
E.g., WAGE1: $\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + u$. Compute $\hat\beta_1$ in the above way.
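The three partialling-out steps can be run mechanically and compared with the coefficient from the full multiple regression. The WAGE1 data are not reproduced here, so the sketch below uses simulated stand-ins for wage, educ and exper; the data-generating numbers and the helper `simple_ols` are assumptions made only for illustration.

```python
import numpy as np

def simple_ols(y, x):
    """Return (intercept, slope) from a simple regression of y on x."""
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - slope * x.mean(), slope

# Simulated stand-ins for the WAGE1 variables (not the real data).
rng = np.random.default_rng(1)
n = 500
exper = rng.uniform(0, 30, size=n)
educ = 16 - 0.1 * exper + rng.normal(size=n)   # correlated with exper by construction
wage = 1.0 + 0.6 * educ + 0.07 * exper + rng.normal(size=n)

# Step 1: regress y on x2 and keep the residuals.
a0, a1 = simple_ols(wage, exper)
r_y = wage - a0 - a1 * exper

# Step 2: regress x1 on x2 and keep the residuals.
d0, d1 = simple_ols(educ, exper)
r_x1 = educ - d0 - d1 * exper

# Step 3: regress the step-1 residuals on the step-2 residuals.
_, b1_partial = simple_ols(r_y, r_x1)

# Benchmark: coefficient on educ from the multiple regression.
X = np.column_stack([np.ones(n), educ, exper])
b_multiple = np.linalg.lstsq(X, wage, rcond=None)[0]

print(b1_partial, b_multiple[1])
```

The two numbers agree up to floating-point error, which is the content of the partialling-out result; the agreement is algebraic and does not depend on how the data were generated.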
Simple vs Multiple Regression Estimates

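Simple regression of $y$ on $x_1$: $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
Multiple regression of $y$ on $x_1$ and $x_2$: $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$
$\tilde\beta_1$ does not usually equal $\hat\beta_1$. The two are related by $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\tilde\delta_1$ is the slope from the simple regression of $x_2$ on $x_1$, so $\tilde\beta_1 = \hat\beta_1$ in two distinct cases:
(1) $\hat\beta_2 = 0$: the partial effect of $x_2$ on $\hat y$ is zero in the sample.
(2) $\tilde\delta_1 = 0$: $x_1$ and $x_2$ are uncorrelated in the sample.
E.g., GPA1: regress colGPA on hsGPA alone and compare the slope with .453 from the multiple regression.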
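A standard algebraic link between the two slopes is $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\tilde\delta_1$ is the slope from regressing $x_2$ on $x_1$; both special cases listed above follow from it. The sketch below checks the identity numerically on simulated data. The helper `ols` and all numeric values are illustrative assumptions.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)      # correlated with x1 by construction
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

b_tilde = ols(y, x1)          # simple regression of y on x1
b_hat = ols(y, x1, x2)        # multiple regression of y on x1 and x2
delta_tilde = ols(x2, x1)     # simple regression of x2 on x1

# Difference between the simple slope and (multiple slope + beta2_hat * delta1).
gap = b_tilde[1] - (b_hat[1] + b_hat[2] * delta_tilde[1])

print(b_tilde[1], b_hat[1], gap)
```

Here the simple-regression slope differs from the multiple-regression slope because neither $\hat\beta_2$ nor $\tilde\delta_1$ is zero in the simulated sample, yet `gap` is zero up to rounding.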
Goodness-of-Fit
As with simple regression, we can define
$SST = \sum_{i=1}^{n} (y_i - \bar y)^2$, the total sum of squares,
$SSE = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$, the explained sum of squares,
$SSR = \sum_{i=1}^{n} \hat u_i^2$, the residual sum of squares.
Then

$$SST = SSE + SSR \tag{12}$$

Assuming that the total variation in $y$ is nonzero, we can divide (12) by SST to get

$$1 = SSE/SST + SSR/SST$$
Goodness-of-Fit (cont.)
As in simple regression, the R-squared is

$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$

$R^2$ is the proportion of the sample variation in $y$ that is explained by the model, so $0 \le R^2 \le 1$. $R^2$ never decreases, and usually increases, when another independent variable is added to a regression, because SSR never increases when additional regressors are added to the model.
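The claim that R-squared cannot fall when a regressor is added holds even for a regressor that has nothing to do with $y$. The sketch below illustrates this with simulated data; the helper `r_squared`, the variable `junk` and every numeric value are assumptions for illustration.

```python
import numpy as np

def r_squared(y, X):
    """R^2 = 1 - SSR/SST for an OLS fit of y on X (X must contain the constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - ssr / sst

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
y = 0.5 + 1.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)      # generated independently of y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, junk])

r2_small = r_squared(y, X_small)
r2_big = r_squared(y, X_big)

print(r2_small, r2_big)
```

This is why a higher R-squared on its own is weak evidence that an added variable belongs in the model.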
Goodness-of-Fit (cont.)
E.g., GPA1. hsGPA and ACT together explain about 17.6% of the variation in colGPA for this sample. We must remember that many other factors (family background, personality, quality of high school education, affinity for college) contribute to a student's college performance. A low $R^2$ does not mean that the equation is useless. It is still possible that the OLS estimates are reliable estimates of the ceteris paribus effects of the $x_j$ on $y$.
Regression Through the Origin

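Sometimes economic theory or common sense suggests that $\beta_0 = 0$. The estimated equation then has the form

$$\tilde y = \tilde\beta_1 x_1 + \tilde\beta_2 x_2 + \dots + \tilde\beta_k x_k \tag{13}$$

The OLS estimates in (13) still minimize SSR, but with the intercept set to zero the OLS properties derived earlier no longer hold:
In general, $\sum_{i=1}^{n} \tilde u_i \neq 0$.
$R^2 = 1 - SSR/SST$ can be negative.
Serious drawback: if $\beta_0 \neq 0$ in the population, the OLS estimators of the slope parameters will be biased.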
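A four-observation example makes both failures concrete. The data below are made up so that the true relationship has a large intercept ($y = 11 - x$ exactly); forcing the fitted line through the origin then produces residuals that do not sum to zero and a negative value of $1 - SSR/SST$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 9.0, 8.0, 7.0])    # generated by y = 11 - x

# One regressor, no intercept: minimizing SSR gives sum(x*y) / sum(x^2).
b1 = np.sum(x * y) / np.sum(x ** 2)

u = y - b1 * x                          # residuals from the through-origin fit
ssr = np.sum(u ** 2)
sst = np.sum((y - y.mean()) ** 2)       # variation around the sample mean of y
r2 = 1.0 - ssr / sst

print("slope:", b1)
print("sum of residuals:", u.sum())
print("1 - SSR/SST:", r2)
```

A negative value means that the sample mean of $y$ alone fits these data better than the line through the origin does.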
Unbiasedness of OLS
We turn to the statistical properties of OLS, which concern not a particular sample but the behavior of the estimators when random sampling is done repeatedly. Unbiasedness of OLS is established under a set of assumptions.
Assumption MLR.1 (Linear in Parameters) The population model is linear in parameters:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \tag{14}$$

where $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are unknown parameters of interest, and $u$ is an unobservable random error term.
Unbiasedness of OLS (cont.)

Assumption MLR.2 (Random Sampling) We have a random sample of $n$ observations, $\{(x_{i1}, x_{i2}, \dots, x_{ik}, y_i) : i = 1, 2, \dots, n\}$, from the population model. We can write (14) in terms of the random sample as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i, \quad i = 1, 2, \dots, n \tag{15}$$

To obtain unbiased estimators of the $\beta_j$, we need to impose
Assumption MLR.3 (Zero Conditional Mean) $E(u \mid x_1, x_2, \dots, x_k) = 0$

Assumption MLR.3 can fail if the functional relationship between $y$ and the $x$'s is misspecified. E.g., we forget to include $\mathit{inc}^2$, or we use wage instead of $\log(\mathit{wage})$. When Assumption MLR.3 holds, we often say that we have exogenous explanatory variables. If $x_j$ is correlated with $u$, then $x_j$ is said to be an endogenous explanatory variable.

Assumption MLR.4 (No Perfect Collinearity) None of the independent variables is constant, and there are no exact linear relationships among the independent variables.
While Assumption MLR.3 describes the relationship between $u$ and the $x$'s, Assumption MLR.4 concerns the relationships among the $x$'s themselves. If an $x_j$ is an exact linear combination of the other $x_m$'s, the model suffers from perfect collinearity and cannot be estimated by OLS. Why? Assumption MLR.4 does allow the $x_j$'s to be correlated; they just cannot be perfectly correlated.
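One way to see why perfect collinearity rules out OLS: the design matrix loses column rank, so $X'X$ is singular and the first-order conditions no longer pin down a unique solution. The sketch below builds a hypothetical regressor `x3` as the exact sum of two others and compares matrix ranks; all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
x3 = x1 + x2                      # exact linear combination of x1 and x2

X_bad = np.column_stack([np.ones(n), x1, x2, x3])   # 4 columns, x3 redundant
X_ok = np.column_stack([np.ones(n), x1, x2])        # redundant column dropped

rank_bad = np.linalg.matrix_rank(X_bad)
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns / rank with x3:   ", X_bad.shape[1], rank_bad)
print("columns / rank without x3:", X_ok.shape[1], rank_ok)
```

With `x3` included the rank is one less than the number of columns, so infinitely many coefficient vectors give the same fitted values; dropping any one of the three collinear variables restores full column rank.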

E.g., we have the relationship $\mathit{cons} = \beta_0 + \beta_1\,\mathit{inc} + u$. Is Assumption MLR.4 violated when we add income measured in thousands of dollars, or $\mathit{inc}^2$?
E.g., VOTE1. We want to estimate the effect of campaign spending on campaign outcomes. Let voteA be the percent of the vote for Candidate A, expendA be campaign expenditures by Candidate A, expendB be campaign expenditures by Candidate B, and totexpend be total campaign expenditures.

$$\mathit{voteA} = \beta_0 + \beta_1\,\mathit{expendA} + \beta_2\,\mathit{expendB} + \beta_3\,\mathit{totexpend} + u$$

This model violates Assumption MLR.4 because $x_3 = x_1 + x_2$. Solution: drop any one of the three variables from the model. Assumption MLR.4 also fails if the sample size, $n$, is smaller than the number of parameters being estimated, $k + 1$.

Theorem 3.1 (Unbiasedness of OLS) Under Assumptions MLR.1 through MLR.4,

$$E(\hat\beta_j) = \beta_j, \quad j = 0, 1, 2, \dots, k \tag{16}$$

That is, the OLS estimators are unbiased estimators of the population parameters. When we say OLS is unbiased, we mean that the procedure by which the OLS estimates are obtained is unbiased.
Including Irrelevant Variables

One (or more) independent variable is included in the model even though it has no partial effect on $y$ in the population. In the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$, $x_3$ has no effect on $y$ after $x_1$ and $x_2$ have been controlled for, that is, $\beta_3 = 0$. Because we don't know that $\beta_3 = 0$, we estimate the equation including $x_3$:

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$$

Does including $x_3$ affect the unbiasedness of $\hat\beta_1$ and $\hat\beta_2$? No effect. (This follows from Theorem 3.1.) However, including irrelevant variables can have undesirable effects on the variances of the OLS estimators.
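A small Monte Carlo experiment illustrates both halves of this statement: the estimator of $\beta_1$ stays centered on the truth when an irrelevant $x_3$ is added, but its sampling variance rises when $x_3$ is correlated with $x_1$. Everything in the sketch (sample size, number of replications, coefficient values, the 0.8 loading of `x3` on `x1`) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 2000
beta0, beta1, beta2 = 1.0, 2.0, -1.0      # beta3 = 0: x3 is irrelevant

# Regressors are drawn once and held fixed across replications.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1, no effect on y

X_short = np.column_stack([np.ones(n), x1, x2])   # excludes x3
X_long = np.column_stack([X_short, x3])           # includes x3

# Each column of Y is one simulated sample of y given the fixed regressors.
U = rng.normal(size=(n, reps))
Y = (beta0 + beta1 * x1 + beta2 * x2)[:, None] + U

# lstsq accepts a matrix of right-hand sides: one coefficient column per sample.
B_short = np.linalg.lstsq(X_short, Y, rcond=None)[0]
B_long = np.linalg.lstsq(X_long, Y, rcond=None)[0]

mean_short, mean_long = B_short[1].mean(), B_long[1].mean()
var_short, var_long = B_short[1].var(), B_long[1].var()

print("mean of beta1_hat  (without x3, with x3):", mean_short, mean_long)
print("variance of beta1_hat (without x3, with x3):", var_short, var_long)
```

Both averages sit close to the assumed value of 2, while the variance is visibly larger in the regression that carries the irrelevant regressor.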
Omitted Variable Bias: Simple Case

What if we exclude a variable that does belong in the population model? Suppose the true model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \tag{17}$$

Assume the model satisfies Assumptions MLR.1 through MLR.4. Suppose our primary interest is in $\beta_1$, the partial effect of $x_1$ on $y$. E.g., $y$ is hourly wage, $x_1$ is education, $x_2$ is a measure of innate ability. To get an unbiased estimator of $\beta_1$, we should run a regression of $y$ on $x_1$ and $x_2$:

$$\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{abil} + u \tag{18}$$
Omitted Variable Bias: Simple Case (cont.)

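However, due to our ignorance or data unavailability, we exclude $x_2$ and estimate $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$. From (12) in Chapter 2:

$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar x_1)\, y_i}{\sum_{i=1}^{n} (x_{i1} - \bar x_1)^2} \tag{19}$$

Since (17) is the true model, we write $y$ for each observation $i$ as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i \tag{20}$$

Plug (20) into the numerator of (19):

$$\sum_{i=1}^{n} (x_{i1} - \bar x_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i) = \beta_1 \sum_{i=1}^{n} (x_{i1} - \bar x_1)^2 + \beta_2 \sum_{i=1}^{n} (x_{i1} - \bar x_1)\, x_{i2} + \sum_{i=1}^{n} (x_{i1} - \bar x_1)\, u_i \tag{21}$$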
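Divide (21) by $SST_1 = \sum_{i=1}^{n} (x_{i1} - \bar x_1)^2$ and take the conditional expectation; the term involving $u_i$ has conditional expectation zero under Assumption MLR.3:

$$E(\tilde\beta_1 \mid x_1, x_2) = \beta_1 + \beta_2\, \frac{\sum_{i=1}^{n} (x_{i1} - \bar x_1)\, x_{i2}}{\sum_{i=1}^{n} (x_{i1} - \bar x_1)^2} \tag{22}$$

Thus, $E(\tilde\beta_1)$ does not generally equal $\beta_1$: $\tilde\beta_1$ is biased for $\beta_1$. The ratio multiplying $\beta_2$ is the slope coefficient from the regression of $x_2$ on $x_1$: $\tilde x_2 = \tilde\delta_0 + \tilde\delta_1 x_1$.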
We can write (22) as

$$E(\tilde\beta_1 \mid x_1, x_2) = \beta_1 + \beta_2 \tilde\delta_1 \tag{23}$$

The bias in $\tilde\beta_1$ is $E(\tilde\beta_1) - \beta_1 = \beta_2 \tilde\delta_1$, called the omitted variable bias. (23) implies there are 2 cases where $\tilde\beta_1$ is unbiased:
(i) $\beta_2 = 0$, that is, $x_2$ does not appear in the true model;
(ii) $\tilde\delta_1 = 0$, that is, $x_1$ and $x_2$ are uncorrelated in the sample.
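Because the simple-regression slope is a linear function of the $y_i$, its conditional mean can be computed exactly by replacing each $y_i$ with $E(y_i \mid x_1, x_2)$, with no simulation of the errors needed. The sketch below does this for hypothetical population values and confirms that the resulting bias equals $\beta_2 \tilde\delta_1$. All numbers and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
beta0, beta1, beta2 = 1.0, 0.5, 2.0       # hypothetical population values

x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # x1 and x2 correlated in the sample

d1 = x1 - x1.mean()
sst1 = np.sum(d1 ** 2)

# Slope from the simple regression of x2 on x1.
delta_tilde1 = np.sum(d1 * x2) / sst1

# The short-regression slope is sum(d1 * y) / SST1, which is linear in y, so its
# conditional mean replaces y by E(y | x1, x2) = beta0 + beta1*x1 + beta2*x2.
expected_beta_tilde1 = np.sum(d1 * (beta0 + beta1 * x1 + beta2 * x2)) / sst1

bias = expected_beta_tilde1 - beta1

print("bias:", bias)
print("beta2 * delta_tilde1:", beta2 * delta_tilde1)
```

With a positive assumed $\beta_2$ and positively correlated regressors, the bias is positive, matching the sign pattern implied by the omitted variable bias formula.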

In the wage equation (18), more ability leads to higher productivity and therefore higher wages: $\beta_2 > 0$. It is believed that educ and abil are positively correlated. Thus, the OLS estimate $\tilde\beta_1$ from the simple regression $\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + v$ is on average too large.

Table 3.2 Summary of Bias in $\tilde\beta_1$ when $x_2$ is omitted

                        beta_2 > 0       beta_2 < 0
Corr(x1, x2) > 0        positive bias    negative bias
Corr(x1, x2) < 0        negative bias    positive bias
Omitted Variable Bias: General Cases

Suppose the population model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$ satisfies Assumptions MLR.1 through MLR.4, but we omit $x_3$ and estimate the model

$$\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1 + \tilde\beta_2 x_2$$

Suppose $x_2$ and $x_3$ are uncorrelated, but $x_1$ is correlated with $x_3$. It is tempting to think that $\tilde\beta_2$ is unbiased while $\tilde\beta_1$ is probably biased. Not right: both $\tilde\beta_1$ and $\tilde\beta_2$ are normally biased. Why? $\tilde\beta_2$ is unbiased only when $x_1$ and $x_2$ are also uncorrelated.
Variance of the OLS Estimators

Assumption MLR.5 (Homoskedasticity) $Var(u \mid x_1, x_2, \dots, x_k) = \sigma^2$
It means that the variance of $u$, conditional on the $x$'s, is the same for all combinations of outcomes of the $x$'s. If this variance changes with any $x_j$, then heteroskedasticity is present. Assumptions MLR.1 through MLR.5 are collectively known as the Gauss-Markov assumptions. We will use the symbol $x$ to denote the set of all independent variables, $(x_1, x_2, \dots, x_k)$.
Variance of the OLS Estimators (cont.)
Theorem 3.2 (Sampling Variances of OLS Slope Estimators) Under Assumptions MLR.1 through MLR.5,

$$Var(\hat\beta_j \mid x) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)} \quad \text{for } j = 1, 2, \dots, k \tag{24}$$

where $SST_j = \sum_{i=1}^{n} (x_{ij} - \bar x_j)^2$ and $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables (and including an intercept). Now we discuss the elements comprising (24).
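Formula (24) can be cross-checked against the matrix expression $\sigma^2 (X'X)^{-1}$, whose diagonal elements give the same conditional variances. The sketch below assumes a known error variance and simulated regressors; the helper `var_beta_j` and all numeric values are illustrative assumptions, and the matrix expression is a standard result that the slides do not state.

```python
import numpy as np

def var_beta_j(X, j, sigma2):
    """Var(beta_hat_j | x) from (24): sigma^2 / (SST_j * (1 - R_j^2))."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)           # still contains the intercept column
    fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - np.sum((xj - fitted) ** 2) / sst_j
    return sigma2 / (sst_j * (1.0 - r2_j))

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)             # correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])
sigma2 = 4.0                                   # error variance, assumed known here

var_formula = np.array([var_beta_j(X, j, sigma2) for j in (1, 2, 3)])
var_matrix = sigma2 * np.diag(np.linalg.inv(X.T @ X))[1:]   # slope entries only

print(var_formula)
print(var_matrix)
```

The two vectors coincide, which shows that (24) simply breaks each diagonal element of the matrix formula into the total variation in $x_j$ and its correlation with the other regressors.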
Components of the OLS Variances

(24) shows that the variance of $\hat\beta_j$ depends on 3 factors: $\sigma^2$, $SST_j$, and $R_j^2$.
Error variance, $\sigma^2$. A larger $\sigma^2$ means larger variances for the $\hat\beta_j$: more "noise" in the equation makes it more difficult for OLS to estimate the partial effect of any $x_j$ on $y$. For a given $y$, there is only one way to reduce the error variance: add more explanatory variables to the equation. This is not always possible, nor always desirable, for reasons discussed later.
Components of the OLS Variances (cont.)
Total sample variation in $x_j$, $SST_j$. The larger the total variation in $x_j$, the smaller is $Var(\hat\beta_j)$. (This is discussed in Chapter 2.) One way to increase $SST_j$ is to increase the sample size. In fact, $SST_j$ increases without bound as the sample size gets larger and larger. The extreme case of no sample variation in $x_j$, $SST_j = 0$, is not allowed by Assumption MLR.4.
Components of the OLS Variances (cont.)

Linear relationships among the independent variables, $R_j^2$. $R_j^2$ is obtained from a regression of $x_j$ on all other $x_l$'s. $R_j^2$ is the proportion of the total variation in $x_j$ that can be explained by the other $x_l$'s appearing in the equation. For a given $\sigma^2$ and $SST_j$, the smallest $Var(\hat\beta_j)$ is obtained when $R_j^2 = 0$, which happens if and only if $x_j$ has zero sample correlation with every other $x_l$. The other extreme case, $R_j^2 = 1$, is ruled out. Why? As $R_j^2 \to 1$, $Var(\hat\beta_j) \to \infty$. High (but not perfect) correlation between two or more of the independent variables is called multicollinearity.
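For fixed $\sigma^2$ and $SST_j$, the variance of $\hat\beta_j$ is proportional to $1/(1 - R_j^2)$, a quantity often called the variance inflation factor (a name the slides do not use). The sketch below tabulates it for a few hypothetical values of $R_j^2$ to show how quickly the variance grows as $R_j^2$ approaches 1.

```python
def variance_inflation(r2_j):
    """Return 1 / (1 - R_j^2), the multiplier on sigma^2 / SST_j."""
    if not 0.0 <= r2_j < 1.0:
        # R_j^2 = 1 corresponds to perfect collinearity, excluded by MLR.4.
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)

r2_values = [0.0, 0.5, 0.9, 0.99]           # illustrative values only
factors = [variance_inflation(r2) for r2 in r2_values]

for r2, f in zip(r2_values, factors):
    print(f"R_j^2 = {r2:4.2f}  ->  variance multiplied by {f:7.2f}")
```

Moving $R_j^2$ from 0.5 to 0.9 raises the multiplier fivefold, so how much multicollinearity matters always depends on the size of $\sigma^2$ and $SST_j$ as well.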
Variances in Misspecified Models

Whether or not to include a variable in a regression model can be decided by analyzing the tradeoff between bias and variance. In Section 3.3, we derived the bias. Suppose the true model, which satisfies the Gauss-Markov assumptions, is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$

We consider two estimators of $\beta_1$:

$$\text{multiple regression:} \quad \hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 \tag{25}$$

$$\text{simple regression:} \quad \tilde y = \tilde\beta_0 + \tilde\beta_1 x_1 \tag{26}$$

When $\beta_2 \neq 0$, (26) excludes a relevant variable and, as shown in Section 3.3, this induces a bias in $\tilde\beta_1$ unless $x_1$ and $x_2$ are uncorrelated.
Variances in Misspecified Models (cont.)

If bias is the only criterion, $\hat\beta_1$ is preferred to $\tilde\beta_1$. This conclusion does not hold when variance is taken into account. We know that

$$Var(\hat\beta_1 \mid x) = \frac{\sigma^2}{SST_1\,(1 - R_1^2)} \tag{27}$$

$$Var(\tilde\beta_1 \mid x) = \frac{\sigma^2}{SST_1} \tag{28}$$

(27) and (28) show that $Var(\tilde\beta_1 \mid x) < Var(\hat\beta_1 \mid x)$ unless $x_1$ and $x_2$ are uncorrelated in the sample.
Variances in Misspecified Models (cont.)

Assuming that $x_1$ and $x_2$ are correlated, it follows that:
1) When $\beta_2 \neq 0$, $\tilde\beta_1$ is biased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$.
2) When $\beta_2 = 0$, $\tilde\beta_1$ and $\hat\beta_1$ are both unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$.
From 2), $\tilde\beta_1$ is preferred if $\beta_2 = 0$: including $x_2$ in the model can then only exacerbate the multicollinearity problem. A higher $Var(\hat\beta_1)$ is the cost of including an irrelevant variable in a model.
When $\beta_2 \neq 0$, we should include $x_2$ in the model because:
i) any bias in $\tilde\beta_1$ does not shrink as the sample size grows;
ii) $Var(\tilde\beta_1)$ and $Var(\hat\beta_1)$ both shrink to zero as $n$ gets large;
iii) excluding $x_2$ increases the error variance (a more subtle point).
Estimating $\sigma^2$: Standard errors of $\hat\beta_j$

What we observe are the residuals, $\hat u_i$. We can use the residuals to form an estimate of the error variance. An unbiased estimator of $\sigma^2$ is

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - k - 1} = \frac{SSR}{n - k - 1} \tag{29}$$

The term $n - k - 1$ in (29) is the degrees of freedom (df): $n$ observations minus $k + 1$ estimated parameters.
Theorem 3.3 (Unbiased estimation of $\sigma^2$) Under the Gauss-Markov Assumptions MLR.1 through MLR.5, $E(\hat\sigma^2) = \sigma^2$.
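Estimating $\sigma^2$: Standard errors of $\hat\beta_j$ (cont.)

$\hat\sigma = \sqrt{\hat\sigma^2}$ is the standard error of the regression (SER).
Standard deviation of $\hat\beta_j$:

$$sd(\hat\beta_j) = \frac{\sigma}{\sqrt{SST_j\,(1 - R_j^2)}}$$

Substituting $\hat\sigma$ for $\sigma$ gives us the standard error of $\hat\beta_j$:

$$se(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{SST_j\,(1 - R_j^2)}} \tag{30}$$

Note that because $se(\hat\beta_j)$ relies on Assumption MLR.5, (30) is not a valid estimator of $sd(\hat\beta_j)$ if the errors exhibit heteroskedasticity.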
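Putting the pieces together, the sketch below goes from residuals to the degrees of freedom, the error-variance estimate, the SER and finally the standard error of one slope via formula (30). The data are simulated and homoskedastic by construction; the sample size, coefficients and variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 120, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=1.5, size=n)  # constant error variance

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat

df = n - k - 1                                # degrees of freedom
sigma2_hat = np.sum(u_hat ** 2) / df          # SSR / (n - k - 1)
ser = np.sqrt(sigma2_hat)                     # standard error of the regression

# Ingredients of (30) for j = 1: SST_1 and R_1^2 from regressing x1 on x2.
Z = np.column_stack([np.ones(n), x2])
x1_resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - np.sum(x1_resid ** 2) / sst1

se_beta1 = ser / np.sqrt(sst1 * (1.0 - r2_1))

print("df:", df, " SER:", ser, " se(beta1_hat):", se_beta1)
```

If the error variance depended on `x1` or `x2`, this number would still be computable but would no longer estimate the true sampling standard deviation.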
The Gauss-Markov Theorem
Theorem 3.4 (Gauss-Markov Theorem) Under Assumptions MLR.1 through MLR.5, $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are the best linear unbiased estimators (BLUEs) of $\beta_0, \beta_1, \dots, \beta_k$, respectively.
Best: smallest variance.
Linear: a linear function of the sample data on the dependent variable, $y$.
Unbiased: $E(\hat\beta_j) = \beta_j$.
If the assumptions hold, use OLS.