Matthias Parey
Outline
More on interpretation
The Expected Value of the OLS Estimators: Assumptions; Result: OLS is unbiased; Potential bias from misspecification
The Variance of the OLS Estimators: Homoscedasticity assumption; Result; Interpretation; Variance in misspecified models; Estimating σ²
Efficiency of OLS: The Gauss-Markov Theorem

Reading: Wooldridge (2009), Introductory Econometrics, Chapter 3.
Units of Measurement
What is the effect of changing the units of x or y on our results?
If the dependent variable y is multiplied by some constant c, then the OLS intercept and slope estimates are also multiplied by c.
If the independent variable x is multiplied by some nonzero constant c, then the OLS slope coefficient is divided by c; the intercept is not affected.
What happens to R² when the unit of measurement of either the independent or the dependent variable changes?
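These rescaling rules are easy to verify numerically. The sketch below uses made-up data and a hand-rolled simple-regression helper; none of the numbers come from the slides.

```python
# Numerical check of the rescaling rules (illustrative data, not from the slides).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 2, 200)
y = 3 + 0.5 * x + rng.normal(0, 1, 200)

def ols(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b1 * x.mean(), b1

c = 100.0
b0, b1 = ols(x, y)
b0_y, b1_y = ols(x, c * y)   # dependent variable rescaled
b0_x, b1_x = ols(c * x, y)   # independent variable rescaled

# y*c scales intercept and slope by c; x*c divides the slope by c
# and leaves the intercept unchanged. R^2 is unaffected either way.
r2 = lambda x, y: np.corrcoef(x, y)[0, 1] ** 2
print(np.allclose([b0_y, b1_y], [c * b0, c * b1]))   # True
print(np.allclose([b0_x, b1_x], [b0, b1 / c]))       # True
print(np.isclose(r2(x, y), r2(c * x, y)))            # True
```

The last check answers the question on this slide: R² is invariant to rescaling either variable.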
An additional year of schooling induces an approximate percentage change in wages of 100·β₁. We write this as follows:

%Δwage ≈ (100·β₁)·Δschooling
Now consider the case where both x and y are transformed into logs:

log(y) = β₀ + β₁·log(x) + u

Here we have

β₁ = Δlog(y)/Δlog(x) = (Δy/y)/(Δx/x)

This is known as the constant-elasticity model: β₁ measures the percentage change of y in response to a one-percent change in x.
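A small simulation illustrates the point: if the data are generated by a constant-elasticity model, the slope of a regression of log(y) on log(x) recovers the elasticity. The data-generating process below is made up for illustration.

```python
# Simulate a constant-elasticity model and recover beta_1 from a
# regression of log(y) on log(x). Illustrative DGP, not from the slides.
import numpy as np

rng = np.random.default_rng(1)
beta1 = 0.7                                   # true elasticity
x = rng.uniform(1, 10, 5000)
log_y = 2.0 + beta1 * np.log(x) + rng.normal(0, 0.05, 5000)

lx = np.log(x)
slope = np.cov(lx, log_y, ddof=1)[0, 1] / np.var(lx, ddof=1)
# slope is close to 0.7: a one-percent increase in x raises y by about 0.7 percent
print(slope)
```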
There are other ways of incorporating non-linearities: e.g. interactions (set x₃ = x₁·x₂). Thus, in principle the formulation is very general. Nonetheless: the requirement is that the functional relation we end up choosing is appropriate. This is difficult to judge: economic theory rarely provides insights on what the exact functional form should be.
MLR.3 rules out the following cases:
one variable is a constant multiple of another, e.g. x₁ = c·x₂;
one variable can be expressed as an exact linear function of two or more of the other variables, e.g. x₁ = λ₀ + λ₂x₂ + λ₃x₃ + ... + λₖxₖ.
Intuition: if there is an exact linear relationship, it is impossible to tell apart the effect of one variable from the other: we have no variation to separate out the effects. There are many combinations of (β₀, β₁, ..., βₖ) which all deliver the same value of the loss function (sum of squared residuals). Given the exact restriction on the relationship between the different covariates, the ceteris paribus notion is meaningless.
Assumption MLR.3 also fails if the sample size, n, is too small in relation to the number of parameters being estimated, i.e. if n < k + 1, or if, in a particular sample, the linear relationship above happens to hold by chance (a very-small-sample problem).
Example (continued)
Note that the following relation is true for all individuals in the sample:

cohort + age = 2009

Conclusion: we have to drop either the age effect or the cohort effect from the specification. That is, we effectively have to assume that either β₁ = 0 or β₂ = 0.

In this example, both age and vintage are potentially important: both factors may be genuinely relevant, but the data do not allow us to separate out the two effects.
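The failure of MLR.3 here can be seen directly from the rank of the design matrix. The ages below are simulated; the identity cohort + age = 2009 is taken from the slide.

```python
# Exact collinearity from the slide's identity: cohort + age = 2009.
import numpy as np

rng = np.random.default_rng(2)
n = 100
age = rng.integers(20, 60, n).astype(float)
cohort = 2009.0 - age                      # holds for everyone in the sample
X = np.column_stack([np.ones(n), age, cohort])
# The cohort column is an exact linear function of the constant and age,
# so X has rank 2 instead of 3 and the OLS normal equations have no
# unique solution.
print(np.linalg.matrix_rank(X))
```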
When Assumption MLR.4 holds, we say we have exogenous explanatory variables. If xj is correlated with u for any reason, then xj is said to be an endogenous explanatory variable.
A list of cases in which assumption MLR.4 can fail:
1. misspecified functional form (misspecification);
2. omission of a variable that is correlated with any of the independent variables (omitted variable problem);
3. specific forms of measurement error in an explanatory variable (measurement error problems);
4. one or more of the explanatory variables is determined jointly with y (simultaneity problems).
Two remarks
1. Unbiasedness is a statement about the sampling distribution of an estimator
If we kept drawing fresh samples from the population, what would the distribution of the estimator look like? Thus, unbiasedness says nothing about how a particular realization relates to the true parameter value. In a particular sample, the estimated coefficient may be far away from the true value even though the estimator is unbiased.
2. Unbiasedness is a statement about the expected value, and not about dispersion
An estimator can be unbiased but still have a large dispersion around the true value. Unbiasedness says nothing about the probability of being close to the true parameter value.
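Both remarks can be illustrated with a small Monte Carlo experiment; the parameter values are hypothetical and the regressor is held fixed across replications.

```python
# Repeated sampling: the estimator's mean is at the truth (unbiasedness),
# but any single draw can be far from it (dispersion). Illustrative DGP.
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 1.0, 2.0
x = rng.uniform(0, 5, 30)                 # fixed design
draws = []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0, 1, 30)
    draws.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
draws = np.array(draws)
print(draws.mean())   # close to the true value 2.0
print(draws.std())    # clearly positive: individual estimates scatter
```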
We now ask whether we can describe what happens in specific cases of misspecification. What is the effect of including irrelevant variables? What is the effect of excluding a relevant variable?
Suppose the true population model is:

y = β₀ + β₁x₁ + β₂x₂ + u

and assume that this model satisfies Assumptions MLR.1 through MLR.4. However, due to data availability, we instead estimate the model excluding x₂:

ỹ = β̃₀ + β̃₁x₁
We use the following fact:

β̃₁ = β̂₁ + β̂₂·δ̃₁

where β̂₁ and β̂₂ are the slope estimators from the multiple regression of yᵢ on xᵢ₁ and xᵢ₂, and δ̃₁ is the slope from the simple regression of xᵢ₂ on xᵢ₁. Now compute the expected value:

E(β̃₁) = E(β̂₁ + β̂₂·δ̃₁) = E(β̂₁) + E(β̂₂)·δ̃₁ = β₁ + β₂·δ̃₁

which implies that the bias in β̃₁ (the omitted variable bias) is:

Bias(β̃₁) = E(β̃₁) − β₁ = β₂·δ̃₁

Conclusion: there are two cases in which β̃₁ is unbiased:
1. if the unobserved covariate is irrelevant for y: β₂ = 0;
2. if x₁ and x₂ are uncorrelated: δ̃₁ = 0.
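The decomposition of the short-regression slope into the long-regression slopes is an exact algebraic identity in any sample, which a quick check confirms (data simulated for illustration):

```python
# Verify b1_tilde = b1_hat + b2_hat * delta1_tilde exactly in one sample.
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)        # x1 and x2 correlated
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

def slope(a, b):
    """Slope from a simple regression of b on a."""
    return np.cov(a, b, ddof=1)[0, 1] / np.var(a, ddof=1)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]  # long regression: b[1], b[2]
b1_tilde = slope(x1, y)                   # short regression omitting x2
delta1 = slope(x1, x2)                    # regression of x2 on x1
print(np.isclose(b1_tilde, b[1] + b[2] * delta1))   # True
```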
The sign of the bias in β̃₁ depends on the signs of both β₂ and δ̃₁:

            Corr(x₁, x₂) > 0    Corr(x₁, x₂) < 0
β₂ > 0      positive bias       negative bias
β₂ < 0      negative bias       positive bias
Terminology: If E(β̃₁) > β₁, then β̃₁ has an upward bias. If E(β̃₁) < β₁, then β̃₁ has a downward bias. The phrase "biased towards zero" refers to cases where E(β̃₁) is closer to zero than β₁:
if β₁ > 0, then β̃₁ is biased towards zero if it has a downward bias;
if β₁ < 0, then β̃₁ is biased towards zero if it has an upward bias.
Assumptions MLR.1 through MLR.5 are collectively known as the Gauss-Markov assumptions. Assumptions MLR.1 and MLR.4 can be written as:

E(y|x) = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ

Assumption MLR.5 can be written as:

Var(y|x) = σ²

where x is the set of all independent variables, (x₁, ..., xₖ).
Under the Gauss-Markov assumptions,

Var(β̂ⱼ) = σ² / [SSTⱼ·(1 − Rⱼ²)]

for j = 1, 2, ..., k, where SSTⱼ = Σᵢ₌₁ⁿ (xᵢⱼ − x̄ⱼ)² is the total sample variation in xⱼ, and Rⱼ² is the R-squared from regressing xⱼ on all other independent variables (including an intercept). The size of Var(β̂ⱼ) is important: a larger variance means a less precise estimator, larger confidence intervals, and less powerful hypothesis tests.
Interpretation
The variance of β̂ⱼ depends on three factors:
1. the error variance σ²;
2. the linear relationship among the independent variables, Rⱼ²;
3. the total sample variation in xⱼ, SSTⱼ.
1. The error variance σ²: a larger σ² means larger variances for the OLS estimators. This reflects more noise in the data. To reduce the error variance for a given y, add more explanatory variables to the equation.

2. The total sample variation in xⱼ, SSTⱼ: the larger the total variation in xⱼ, the smaller Var(β̂ⱼ). To increase the sample variation in each of the independent variables, increase the sample size.

3. The linear relationships among the independent variables, Rⱼ²: a larger Rⱼ² means a larger Var(β̂ⱼ). High (but not perfect) correlation between two or more independent variables is called multicollinearity.
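The role of Rⱼ² is easy to see in a simulation: the sampling variance of β̂₁ blows up as the correlation between the regressors rises. The data-generating process below is illustrative.

```python
# Monte Carlo: higher correlation between x1 and x2 inflates Var(b1_hat).
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 3000

def mc_sd(rho):
    """Monte Carlo sd of b1_hat when corr(x1, x2) = rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    ests = []
    for _ in range(reps):
        X12 = rng.multivariate_normal([0, 0], cov, n)
        y = 1 + X12[:, 0] + X12[:, 1] + rng.normal(0, 1, n)
        X = np.column_stack([np.ones(n), X12])
        ests.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(ests)

sd_low, sd_high = mc_sd(0.0), mc_sd(0.95)
print(sd_high > 2 * sd_low)   # near-collinearity inflates the variance
```

With corr = 0.95, the theoretical inflation factor for the standard deviation is 1/√(1 − 0.95²) ≈ 3.2, which the simulation reproduces.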
Remarks: What ultimately matters for statistical inference is how big β̂ⱼ is in relation to its standard deviation. Also note that a high degree of correlation between certain independent variables can be irrelevant to how well we can estimate other parameters in the model.
Conditional on the sample values of the regressors, the short regression on x₁ alone gives Var(β̃₁) = σ²/SST₁, while the regression on both x₁ and x₂ gives

Var(β̂₁) = σ² / [SST₁(1 − R₁²)].

Assuming that x₁ and x₂ are correlated, we can draw the following conclusions:

1. When β₂ ≠ 0, β̃₁ is biased, β̂₁ is unbiased, and Var(β̃₁) < Var(β̂₁). ⇒ When β₂ ≠ 0, there are two reasons for including x₂ in the model:
any bias in β̃₁ does not shrink as the sample size grows, but the variance does;
the variance of β̃₁ conditional only on x₁ is larger than the one shown above, where both regressors are treated as nonrandom.

2. When β₂ = 0, β̃₁ and β̂₁ are both unbiased, and Var(β̃₁) < Var(β̂₁). ⇒ β̃₁ is preferred if β₂ = 0.
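The two conditional variance formulas, Var(β̃₁) = σ²/SST₁ for the short regression and Var(β̂₁) = σ²/[SST₁(1 − R₁²)] for the long one, can be compared directly on simulated regressors (illustrative values; σ² set to 1):

```python
# Compare Var(b1_tilde) = s2/SST1 with Var(b1_hat) = s2/(SST1*(1 - R1^2)).
import numpy as np

rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)     # correlated regressors
sigma2 = 1.0
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = np.corrcoef(x1, x2)[0, 1] ** 2     # R^2 of x1 on the other regressor
var_tilde = sigma2 / sst1                 # short regression
var_hat = sigma2 / (sst1 * (1 - r2_1))    # long regression
print(var_tilde < var_hat)   # True whenever x1 and x2 are correlated
```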
Estimating σ²

The unbiased estimator of σ² in the multiple regression case is:

σ̂² = SSR / (n − k − 1) = Σᵢ₌₁ⁿ ûᵢ² / (n − k − 1)

where the degrees of freedom are:

df = n − (k + 1) = (no. observations) − (no. estimated parameters)

Recall: the division by n − k − 1 comes from the fact that, in obtaining the OLS estimates, k + 1 restrictions are imposed on the OLS residuals, so that there are only n − k − 1 df in the residuals.
σ̂ is called the standard error of the regression (SER), the standard error of the estimate, or the root mean squared error. It is an estimator of the standard deviation of the error term. Note that σ̂ can either decrease or increase when another independent variable is added to a regression. Why?
numerator: SSR goes down;
denominator: k increases, so n − k − 1 goes down;
overall effect unclear.
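The two opposing forces can be seen by adding an irrelevant regressor (simulated data): the SSR can only fall weakly, while the degrees of freedom fall too, so σ̂ can move either way.

```python
# Adding a regressor: SSR weakly falls, df falls, sigma_hat can go either way.
import numpy as np

rng = np.random.default_rng(7)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                   # irrelevant regressor
y = 1 + 2 * x1 + rng.normal(0, 1, n)

def ssr(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ssr_small = ssr(np.column_stack([np.ones(n), x1]), y)
ssr_big = ssr(np.column_stack([np.ones(n), x1, x2]), y)
sigma_small = np.sqrt(ssr_small / (n - 2))
sigma_big = np.sqrt(ssr_big / (n - 3))
print(ssr_big <= ssr_small)               # the numerator always (weakly) falls
print(sigma_small, sigma_big)             # the net effect on sigma_hat is ambiguous
```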
For constructing confidence intervals and conducting tests, we need to estimate the standard deviation of β̂ⱼ:

sd(β̂ⱼ) = σ / [SSTⱼ(1 − Rⱼ²)]^(1/2)

Since σ is unknown, we replace it with its estimator σ̂. This gives us the standard error of β̂ⱼ:

se(β̂ⱼ) = σ̂ / [SSTⱼ(1 − Rⱼ²)]^(1/2)
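This formula can be checked against the usual matrix expression σ̂²(X'X)⁻¹ on simulated data; with a single other regressor, R₁² is just the squared sample correlation between x₁ and x₂.

```python
# se(b1_hat) from the slide's formula equals the matrix-based standard error.
import numpy as np

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
sigma2_hat = e @ e / (n - 3)              # SSR / (n - k - 1), with k = 2

sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = np.corrcoef(x1, x2)[0, 1] ** 2
se_formula = np.sqrt(sigma2_hat / (sst1 * (1 - r2_1)))
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
print(np.isclose(se_formula, se_matrix))  # True
```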
Efficiency of OLS
Gauss-Markov Theorem
Under the following set of assumptions:
MLR.1 (Linear in parameters)
MLR.2 (Random sampling)
MLR.3 (No perfect collinearity)
MLR.4 (Zero conditional mean)
MLR.5 (Homoscedasticity)
the OLS estimators (β̂₀, β̂₁, ..., β̂ₖ) are the best linear unbiased estimators (BLUEs) of (β₀, β₁, ..., βₖ), respectively. We say that OLS is BLUE.
Linear: an estimator β̃ⱼ is linear if, and only if, it can be expressed as a linear function of the data on the dependent variable:

β̃ⱼ = Σᵢ₌₁ⁿ wᵢⱼ·yᵢ

where each wᵢⱼ can be a function of the sample values of all the independent variables.

Unbiased: β̃ⱼ is an unbiased estimator of βⱼ if E(β̃ⱼ) = βⱼ.
The criterion "best" means smallest variance. ⇒ Under Assumptions MLR.1-MLR.5, for any estimator β̃ⱼ that is linear and unbiased,

Var(β̂ⱼ) ≤ Var(β̃ⱼ)

where β̂ⱼ is the OLS estimator.
Keep in mind: If any of the Gauss-Markov assumptions fail, then this theorem no longer holds.
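A Monte Carlo comparison with another linear unbiased estimator illustrates the theorem. The competitor below, a "grouping" estimator that connects the mean points of the low-x and high-x halves of the sample, is a hypothetical example, not from the slides; for a fixed design it is linear in y and unbiased.

```python
# OLS vs. an alternative linear unbiased slope estimator (group means).
import numpy as np

rng = np.random.default_rng(9)
n = 40
x = np.linspace(0, 10, n)                 # fixed design across replications
hi = x > np.median(x)
ols_draws, grp_draws = [], []
for _ in range(20000):
    y = 1 + 2 * x + rng.normal(0, 1, n)
    ols_draws.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
    # grouping estimator: slope between the two half-sample mean points
    grp_draws.append((y[hi].mean() - y[~hi].mean())
                     / (x[hi].mean() - x[~hi].mean()))
ols_draws, grp_draws = np.array(ols_draws), np.array(grp_draws)
print(abs(grp_draws.mean() - 2) < 0.01)   # the competitor is unbiased too
print(ols_draws.var() < grp_draws.var())  # but OLS has the smaller variance
```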
Effects of heteroscedasticity

Under heteroscedasticity, the Gauss-Markov theorem no longer applies:
MLR.5 does not hold any longer;
the OLS estimator is still unbiased (MLR.5 is not required for Theorem 3.1), but the Gauss-Markov theorem does not apply.
Intuition for why OLS may not be efficient: heteroscedasticity means that some observations are more informative (contain less noise) than others, but the OLS objective function puts equal weight on all squared residuals ûᵢ². Thus OLS does not exploit the fact that we can extract more information from some observations, so it is not surprising that there may be a more efficient estimator.
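A sketch of this intuition: when the error standard deviation is assumed known to grow with x (a hypothetical form of heteroscedasticity), weighting each observation by the inverse error sd (weighted least squares) beats equally-weighted OLS.

```python
# Under heteroscedasticity, OLS is unbiased but no longer efficient:
# weighting by 1/sd (WLS) yields a smaller sampling variance. Illustrative DGP.
import numpy as np

rng = np.random.default_rng(10)
n = 50
x = np.linspace(1, 10, n)
sd = x                                    # error sd grows with x (assumed known)
X = np.column_stack([np.ones(n), x])
w = 1 / sd
ols_draws, wls_draws = [], []
for _ in range(20000):
    y = 1 + 2 * x + rng.normal(0, sd)
    ols_draws.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    # WLS = OLS on the reweighted data, which restores homoscedasticity
    wls_draws.append(np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0][1])
print(abs(np.mean(ols_draws) - 2) < 0.05)   # OLS still unbiased
print(np.var(wls_draws) < np.var(ols_draws))
```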
Next lecture: Multiple Regression Analysis: Inference.
This continues the numerical exercise for the simple linear regression model (from week 3).
We estimated the coefficients as follows (n = 10):

ŵage = −3.569 + 0.8597·schooling

We can predict the residuals ûᵢ in the sample, and estimate the variance of the error term:

σ̂² = (1/(n − 2))·Σᵢ₌₁ⁿ ûᵢ² = 95.54/(10 − 2) = 11.94
id         1   2   3   4   5   6   7   8   9  10
schooling  8  12  16  18  12  12  17  16  13  12
The estimated variance of the slope estimator is:

Var(β̂₁) = σ̂² / Σᵢ(xᵢ − x̄)²
We can summarize what we have learned so far about this regression by writing:

ŵage = −3.569 + 0.8597·schooling,   n = 10, R² = 0.395
        (5.23)    (0.376)

(standard errors in parentheses)
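The reported standard error of the slope can be reproduced from the schooling column of the table and the estimate σ̂² = 11.94; the wage column is not needed for this check.

```python
# Reproduce se(b1_hat) = 0.376 from the table's schooling values and
# the estimated error variance sigma2_hat = 11.94.
import numpy as np

schooling = np.array([8, 12, 16, 18, 12, 12, 17, 16, 13, 12], dtype=float)
sigma2_hat = 11.94
sst = np.sum((schooling - schooling.mean()) ** 2)   # total variation in x
se_slope = np.sqrt(sigma2_hat / sst)
print(round(se_slope, 3))   # 0.376, matching the value in parentheses
```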