
Multiple Regression

So far we have only examined the case where there is a single explanatory variable. Often the variable we are interested in is related to more than one variable, and we want to capture their joint effects on Y.

E.g. 1: A firm's share price may depend on the firm's sales this year, the number of employees, where they sell their product, etc.
E.g. 2: A house price depends on location, number of rooms, job opportunities in the region, population, etc.

Multiple Regression

The multiple regression model extends our analysis to more than one explanatory variable:

Yi = β1X1i + β2X2i + β3X3i + … + βkXki + εi

Where X1i is a vector equal to unity [X1i = 1] (i.e. β1 is an intercept) and the Xji (j = 2, 3, …, k) are a set of explanatory variables. k is the number of parameters to be estimated, so the degrees of freedom are n − k.

Multiple Regression

Again we obtain fitted values


Ŷi = β̂1X1i + β̂2X2i + β̂3X3i + … + β̂kXki

Again the aim is to minimise

min over β̂1, …, β̂k of

Σi ûi² = Σi (Yi − Ŷi)²

(with the sum running over i = 1, …, n)

Steps are same as before:


Differentiate with respect to each β̂ and set each derivative equal to 0. Solve the k first-order conditions simultaneously to find the β̂s.
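As a concrete illustration (not part of the original notes), here is a minimal Python sketch of fitting a multiple regression on simulated data. The variable names and "true" coefficients are made up; statsmodels' OLS routine performs exactly the minimisation described above.

```python
# A minimal sketch: fitting a multiple regression in Python on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X2 = rng.normal(size=n)                               # e.g. sales (illustrative)
X3 = rng.normal(size=n)                               # e.g. number of employees (illustrative)
y = 1.0 + 2.0 * X2 - 0.5 * X3 + rng.normal(size=n)    # true betas chosen arbitrarily

X = sm.add_constant(np.column_stack([X2, X3]))        # X1 = column of ones (intercept)
results = sm.OLS(y, X).fit()                          # solves the k first-order conditions
print(results.params)                                 # beta-hats: intercept, slope on X2, slope on X3
print(results.resid[:5])                              # residuals y - y-hat
```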

Interpretation of Coefficients in a Multiple regression

Yi = β1X1i + β2X2i + β3X3i + … + βkXki + εi

Recall X1i here is just 1 so:

Yi = β1 + β2X2i + β3X3i + … + βkXki + εi

Interpretation:

Intercept (β1) → the value of Y we would expect to observe if X2i, X3i, …, Xki were ALL zero!

Slopes:

β2 → the change in Y we would expect to observe if X2 increased by one unit and all other Xs remained unchanged!
β3 → the change in Y we would expect to observe if X3 increased by one unit and all other Xs remained unchanged!
βk → the change in Y we would expect to observe if Xk increased by one unit and all other Xs remained unchanged!

Goodness of Fit and Hypothesis Testing

We can decompose the deviation of Y from its mean value (Ȳ) into a part explained by the fact that X is not at its average value, and a part which we cannot explain.

[Figure: deviation of Yi from Ȳ plotted against Xi, split into the part explained by variation in Xi from its mean and the unexplained part]

Naturally we would like to explain as much of the variation in Y as possible.



Goodness of Fit

We should be a little suspicious of models which appear to fit the data too well (e.g. an R² of 0.9 means we can explain 90% of the variation in Y using only the Xs!)

R² and adjusted R²

R² measures the closeness of fit in the regression model. Comparing two regressions with different numbers of explanatory variables causes difficulties.

Additional independent variables will always result in a higher R², regardless of their importance [because by chance they will appear to explain some variation]. Instead, we use the adjusted R² (R̄²):

R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)]

The model selection criterion is to include an extra explanatory variable only if it increases the adjusted R² (R̄²).
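A small sketch, again on simulated data, showing that the adjusted R² formula above matches what statsmodels reports; all names and numbers are illustrative.

```python
# Sketch: adjusted R-squared computed directly from the formula above,
# checked against statsmodels' built-in value. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
X2, X3 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.5 * X2 + rng.normal(size=n)          # X3 is irrelevant by construction

X = sm.add_constant(np.column_stack([X2, X3]))
res = sm.OLS(y, X).fit()
k = X.shape[1]                                   # number of estimated parameters
rss = res.ssr                                    # residual sum of squares
tss = res.centered_tss                           # total sum of squares
adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))
print(adj_r2, res.rsquared_adj)                  # the two values should match
```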

Other Model Selection Criteria


(These are more commonly used than the adjusted R² in time series work)

Akaike Information Criterion (AIC):

AIC = (RSS/n) · e^(2k/n)

Schwarz Bayesian Criterion (SBC):

SBC = (RSS/n) · n^(k/n)

Others include the Finite Prediction Error (FPE) and the Hannan-Quinn Criterion (HQC).

Other Model Selection Criteria

Ideally you should select the model which minimizes the AIC and SBC. The SBC has a higher penalty for including more explanatory variables. In small samples the AIC can work better than the SBC as a selection criterion. You can be quite confident if both the AIC and SBC select the same model.
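A sketch of the RSS-based AIC and SBC formulas shown above. The RSS values plugged in are hypothetical placeholders; note that statistical packages usually report log-likelihood-based versions of these criteria, which take different numerical values but rank models in the same way.

```python
# Sketch: the RSS-based AIC and SBC from the formulas above, applied to
# two hypothetical candidate models. Smaller values are preferred.
import numpy as np

def aic_rss(rss: float, n: int, k: int) -> float:
    return (rss / n) * np.exp(2 * k / n)

def sbc_rss(rss: float, n: int, k: int) -> float:
    return (rss / n) * n ** (k / n)

# hypothetical fit results: model A has k = 3 parameters, model B has k = 5
print(aic_rss(rss=420.0, n=100, k=3), sbc_rss(rss=420.0, n=100, k=3))
print(aic_rss(rss=415.0, n=100, k=5), sbc_rss(rss=415.0, n=100, k=5))
```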

More Hypothesis Testing

Can use a t test to test the significance of individual coefficients as before

with n − k degrees of freedom

Reject H0 if |t| > t critical value

This is for testing the significance of a single variable
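For illustration, a sketch (simulated data, arbitrary coefficients) of computing the t statistic by hand and comparing it with the critical value and with statsmodels' own output.

```python
# Sketch: a t-test on a single coefficient, done by hand from the regression
# output and compared with what statsmodels reports. Simulated data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 80
X2 = rng.normal(size=n)
y = 1.0 + 0.8 * X2 + rng.normal(size=n)

X = sm.add_constant(X2)
res = sm.OLS(y, X).fit()
k = X.shape[1]

t_stat = res.params[1] / res.bse[1]              # beta-hat / its standard error
t_crit = stats.t.ppf(0.975, df=n - k)            # 5% two-sided critical value
print(t_stat, t_crit, abs(t_stat) > t_crit)      # True => reject H0: beta2 = 0
print(res.tvalues[1], res.pvalues[1])            # statsmodels' own t and p-value
```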

But: Sometimes we wish to test whether there are particular relationships between the estimated coefficients

Imposing Linear Restrictions

Given the unrestricted equation

Yi = β1X1i + β2X2i + β3X3i + β4X4i + β5X5i + εi


εi is a random shock with mean 0

Impose the restriction β2 = β5

Our equation is then:

Yi = β1 + β2X2i + β3X3i + β4X4i + β2X5i + ei
Yi = β1 + β2(X2i + X5i) + β3X3i + β4X4i + ei
Yi = β1 + β2X*i + β3X3i + β4X4i + ei

where X*i = X2i + X5i
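A sketch of how the restriction β2 = β5 could be imposed in practice by constructing X* = X2 + X5; the data and coefficient values are simulated for illustration only.

```python
# Sketch: imposing the restriction beta2 = beta5 by regressing on X* = X2 + X5.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 150
X2, X3, X4, X5 = (rng.normal(size=n) for _ in range(4))
y = 1 + 0.6 * X2 + 0.3 * X3 - 0.2 * X4 + 0.6 * X5 + rng.normal(size=n)  # true beta2 = beta5

X_unrestricted = sm.add_constant(np.column_stack([X2, X3, X4, X5]))
X_star = X2 + X5                                          # the combined regressor X*
X_restricted = sm.add_constant(np.column_stack([X_star, X3, X4]))

res_u = sm.OLS(y, X_unrestricted).fit()
res_r = sm.OLS(y, X_restricted).fit()
print(res_u.ssr, res_r.ssr)    # RSS_U and RSS_R, the inputs to the F test below
```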

Testing Linear Restrictions

Need to test whether your imposed restrictions are valid

The key is to test the difference between the unrestricted and restricted models. If the restriction does not affect the fit of the model very much, then we can accept the restriction as being valid. How do we compare the restricted and unrestricted least-squares regressions?

Testing Linear Restrictions

The most common method is to estimate both the restricted and unrestricted equations and apply an F test

The difference between RSSR and RSSU should be minimal if the restrictions are valid

F = [(RSS_R − RSS_U)/(k_U − k_R)] / [RSS_U/(n − k_U)]

Where the subscripts U and R stand for the unrestricted and restricted equations respectively, and k is the number of parameters in each model.

Testing Linear Restrictions


Applying the Test:

1. H0: the restrictions are valid
2. Estimate the restricted and unrestricted models and calculate RSS_R and RSS_U
3. Calculate the F statistic
4. Find the F critical value for (k_U − k_R, n − k_U) degrees of freedom from the F tables
5. If the F statistic > F critical, reject H0

Testing the joint significance of all the explanatory variables


Yi = β1 + β2X2i + β3X3i + β4X4i + β5X5i + ei

Test the restriction that β2 = β3 = β4 = β5 = 0. In other words, test the null hypothesis that none of the coefficients in the model apart from the intercept are statistically significant. The F statistic for this test is calculated automatically by most statistical packages!
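As an illustration, a sketch of this joint significance test on simulated data; statsmodels reports the regression F statistic and its p-value directly.

```python
# Sketch: the regression F test of H0: all slope coefficients are zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 120
X = sm.add_constant(rng.normal(size=(n, 4)))       # intercept plus X2..X5
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.3])         # arbitrary true coefficients
y = X @ beta + rng.normal(size=n)

res = sm.OLS(y, X).fit()
print(res.fvalue, res.f_pvalue)   # F test of H0: beta2 = beta3 = beta4 = beta5 = 0
```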

Adding or Deleting explanatory variables

When only a single variable is being considered, it is safe to check its t-ratio. When more than one variable is involved, you can apply the F test by estimating the restricted and unrestricted equations as discussed.

Wald and LM Procedures

Wald and LM (Lagrange Multiplier) procedures are other ways of testing the validity of restrictions

Wald procedure: estimate the unrestricted equation and apply a t-test to the restriction (e.g. the t-test on a single coefficient).
Lagrange Multiplier test: estimate the restricted model only, then test for a relaxation of the restrictions by applying a formula.
These are both explained in Asteriou & Hall, pp. 74-75.

Some other considerations

Type of Data used

We will distinguish between 3 types of data:

Scale data (e.g. X = 1, 1.5, 2.7, 3.1, 4.6, 5, …)

the numbers are measuring something, e.g. age, weight, turnover, share price etc.

Ordinal data (e.g. X: 1 = very good, 2 = good, 3 = ok, 4 = bad, 5 = very bad):

The numbers suggest some ordering but do not have a true meaning. Education example: if 1 = primary, 2 = secondary, 3 = third level, a person with third-level education doesn't have three times as much education!!

Categorical data:

Any numbers assigned here have no real meaning. Gender example: 1 = male, 2 = female. Think of the numbers as just labels!!

Non-scale dependent variables (Y)

Our Y variable (dependent variable) must be a scale variable for the results of the models we use to have any (correct) meaning.

However, if we are interested in non-scale dependent variables there are methods to deal with this; we won't cover them here (e.g. probit models if our dependent variable has just 2 categories). If, in future work, you are unsure what sort of models to consider, I can point you in (hopefully!) the right direction.

Non-scale Explanatory variables (Xs)

For ordinal variables:

The sign of the coefficient is reliable, but the size of the coefficient is only likely to be reliable if the levels are evenly spaced. (This is often not the case, e.g. the effect of moving from primary education to secondary is not the same as moving from secondary to tertiary.)

For categorical variables (e.g. County: 1 = Dublin, 2 = Wicklow, 3 = …):

Often the sign has no meaning AND the size has no meaning either. However, it is a common mistake for people starting out in econometrics to include these variables directly, so be careful!!!

The correct approach for dealing with categorical data (and for ordinal data, though less commonly used):

Dummy Variables

Dummy Variables

A dummy variable is a variable which takes a value of 1 for a certain group of observations and 0 for all other observations, e.g. male = 1, female = 0. Adding a dummy variable to our regression allows the groups to have different intercepts. Suppose we have data on earnings and experience and we fit a regression:

Yi = β1 + β2X2i + εi

Not using a Dummy Variable


[Figure: scatter of Y against X with the single fitted line Yi = β1 + β2X2i + εi]

It seems to fit OK, but few of the observations lie very close to the line: there seems to be a group above and a group below.

Dummy variable for Gender


Now suppose we also know gender: Create a dummy variable, D:

D=0 for females and D=1 for males


Our regression is now:

Yi = β1 + β2X2i + β3D + εi

So:

For females (D = 0): Yi = β1 + β2X2i + εi
For males (D = 1): Yi = β1 + β2X2i + β3 + εi

Rearranging this:

Yi = (β1 + β3) + β2X2i + εi

We can see each group has a different intercept!!
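A sketch of the intercept dummy in practice, using simulated earnings/experience/gender data; the coefficient on D estimates the gap between the two intercepts.

```python
# Sketch: an intercept dummy on simulated data. The coefficient on D is beta3,
# the difference between the male and female intercepts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
experience = rng.uniform(0, 30, size=n)
D = rng.integers(0, 2, size=n)                      # 1 = male, 0 = female
earnings = 20 + 0.8 * experience + 5.0 * D + rng.normal(scale=2, size=n)

X = sm.add_constant(np.column_stack([experience, D]))
res = sm.OLS(earnings, X).fit()
print(res.params)   # [beta1, beta2, beta3]; females' intercept is beta1, males' is beta1 + beta3
```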

Regression including a Dummy variable

[Figure: scatter of Y against X with two parallel fitted lines, one with intercept β1 and one with intercept β1 + β3]

Seems to fit better than just one line for both groups!!

Dummy Variable continued

Note: we could also use a dummy variable to allow the X variable to have a different impact on Y. We do this by including an interaction term, i.e. D·X. Our regression is now (letting both groups have the same intercept in this case, for clarity):

Yi = β1 + β2X2i + β3(D·X2i) + εi

So:

If D = 0: Yi = β1 + β2X2i + εi
If D = 1: Yi = β1 + β2X2i + β3X2i + εi = β1 + (β2 + β3)X2i + εi
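A sketch of the interaction (slope) dummy on simulated data; the variable names and true slopes are made up.

```python
# Sketch: a slope dummy via the interaction term D*X2, so the two groups share
# an intercept but have slopes beta2 and beta2 + beta3.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
X2 = rng.uniform(0, 10, size=n)
D = rng.integers(0, 2, size=n)
y = 1 + 0.5 * X2 + 0.7 * (D * X2) + rng.normal(size=n)

X = sm.add_constant(np.column_stack([X2, D * X2]))   # include the interaction D*X2
res = sm.OLS(y, X).fit()
b1, b2, b3 = res.params
print("slope when D=0:", b2, "  slope when D=1:", b2 + b3)
```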

Regression including an Interaction between Dummy variable and X2


[Figure: two fitted lines from a common intercept; for D = 1 the slope is β2 + β3, for D = 0 the slope is β2]

What if we think a variable doesn't always have the same effect? E.g. for wages, an extra year of experience may increase wages a lot at first, but after 20 years an extra year is unlikely to make much difference!

Wages and Experience


[Figure: wages against experience with the straight fitted line Yi = β1 + β2X2i + εi]

We predict wages that are too high for low (and for high) experience but predict wages that are too low for medium experience workers

Fitting a curve

In algebra we would write the equation for a straight line as:

Y = c + mX

In econometric terms: Yi = β1 + β2X2i + εi

In algebra the equation for a curve is:

Y = c + mX + nX²

So the equivalent in econometrics is:

Yi = β1 + β2X2i + β3X2i² + εi
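A sketch of fitting such a curve by adding the squared term as an extra regressor, on simulated wage/experience data; it also shows how the effect of an extra unit of X varies with the level of X.

```python
# Sketch: fitting a curve by adding X squared as an extra regressor.
# The marginal effect of X then depends on both beta2 and beta3.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
experience = rng.uniform(0, 40, size=n)
wages = 15 + 1.2 * experience - 0.02 * experience**2 + rng.normal(scale=2, size=n)

X = sm.add_constant(np.column_stack([experience, experience**2]))
res = sm.OLS(wages, X).fit()
b1, b2, b3 = res.params
# effect of one more year of experience, evaluated at different experience levels
for x0 in (5, 20, 35):
    print(x0, b2 + 2 * b3 * x0)    # derivative of b1 + b2*x + b3*x^2 with respect to x
```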

Wages and Experience


[Figure: wages against experience with the fitted curve]

Yi = β1 + β2X2i + β3X2i² + εi

This fits better. However, when we look at the effect of X, we must now consider both β2 and β3, since the effect of a one-unit change in X depends on the level of X.

Violations of Assumptions of OLS

The Linear Regression Model: The Assumptions


1. Linearity: the dependent variable is a linear function of the independent variables
2. Xt has some variation, i.e. Var(X) is not 0
3. Xt is non-stochastic and fixed in repeated samples
4. E(ut) = 0
5. Homoskedasticity: Var(ut) = σ² = constant for all t
6. Cov(ut, us) = 0, serial independence
7. ut ~ N(0, σ²)
8. No multicollinearity (i.e. no explanatory variable can be a linear combination of others in the model)

This week we will only consider violations of Assumption 6, and come back to the others towards the end of the course!

Autocorrelation

Autocorrelation is when the error terms of different observations are not independent of each other, i.e. they are correlated with each other.

Consider the time series regression

Yt = β1X1t + β2X2t + β3X3t + … + βkXkt + ut

but in this case there is some relation between the error terms across observations:

E(ut) = 0
Var(ut) = σ²
But: Cov(ut, us) ≠ 0

Thus the error covariances are not zero. This means that one of the assumptions that makes OLS BLUE does not hold. Autocorrelation is most likely to occur in a time series framework.

Likely causes of Autocorrelation


1. Omitting a variable that ought to be included.
2. Misspecification of the functional form. This is most obvious where a straight line is put through a curve of dots. This would clearly show up in plots of the residuals.
3. Errors of measurement in the dependent variable. If the errors are not random then the error term will pick up any systematic mistakes.

The Problem with Autocorrelation

OLS estimators will be inefficient and no longer BLUE. The estimated variances of the regression coefficients will be biased and inconsistent, therefore hypothesis testing will no longer be valid. R² will tend to be overestimated and t-statistics will tend to be higher.

Autocorrelation

Focus on the simplest form of relation over time: first-order autocorrelation, which can be written as

ut = ρut−1 + εt

where |ρ| < 1 is the parameter depicting the relationship between ut and ut−1, and εt is a new error term.

The current observation of the error term is a function of the previous observation of the error term. First order serial correlation.
Higher-order serial correlation can be modelled with ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + … + ρput−p + εt
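A sketch simulating first-order autocorrelated errors with an arbitrary ρ, just to show that successive errors end up correlated by roughly ρ.

```python
# Sketch: simulating AR(1) errors u_t = rho*u_{t-1} + eps_t.
import numpy as np

rng = np.random.default_rng(8)
T, rho = 200, 0.7                 # rho chosen arbitrarily, |rho| < 1
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]

# the sample correlation between u_t and u_{t-1} should be close to rho
print(np.corrcoef(u[1:], u[:-1])[0, 1])
```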

Detecting Autocorrelation

By observation:

Plot the residuals against time
Plot ût against ût−1

Limitations:

Only good for first-order serial correlation
It can give inconclusive results
It is not applicable when a lagged dependent variable is included in the regression

The Durbin-Watson Test

The Durbin-Watson statistic is the standard test for first-order autocorrelation, but Breusch and Godfrey (1978) developed a more general alternative test

Breusch-Godfrey Test
This is an example of an LM (Lagrange Multiplier) type test where only the restricted form of the model is estimated. We then test for a relaxation of these restrictions by applying a formula. Consider the model

Yt = β1X1t + β2X2t + β3X3t + … + βkXkt + ut

where

ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + … + ρput−p + εt

Combining these two equations gives

Yt = β1X1t + β2X2t + β3X3t + … + βkXkt + ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + … + ρput−p + εt

Test the following H0 and Ha:

H0: ρ1 = ρ2 = ρ3 = … = ρp = 0
Ha: at least one of the ρs is not zero, thus there is serial correlation

Breusch-Godfrey Test
This two-stage test begins by considering the model

Yt = β1X1t + β2X2t + β3X3t + … + βkXkt + ut   (1)

Estimate model (1) and save the residuals, ût. Then run the following model, with the number of lags used determined by the order of serial correlation you are willing to test:

ût = α0 + α1X2t + α2X3t + … + αkXkt + αk+1ût−1 + αk+2ût−2 + … + αk+pût−p   (auxiliary regression)

Breusch-Godfrey Test
The test statistic may be written as an LM statistic = (n − p)·R², where R² relates to this auxiliary regression. The statistic is distributed asymptotically as chi-square (χ²) with p degrees of freedom. If the LM statistic is bigger than the χ² critical value then we reject the null hypothesis of no serial correlation and conclude that serial correlation exists.
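A sketch of running the Breusch-Godfrey test with statsmodels on a regression whose errors are simulated to follow an AR(1) process, so the null of no serial correlation should be rejected; the data and ρ value are illustrative.

```python
# Sketch: the Breusch-Godfrey LM test via statsmodels, applied to a regression
# with simulated AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(9)
T = 200
x = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + eps[t]        # AR(1) errors
y = 1 + 2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pval)    # a small p-value => reject H0 of no serial correlation
```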

H0: No autocorrelation. Here we have autocorrelation (Prob. F < 0.05).

Also note the previous residual is significant in the regression.

[Figure: scatter plot of RESID against PREVRESID, with both axes running from −30 to 30]

Solutions to Autocorrelation
1. Find the cause
2. Increase the number of observations
3. Specify the model correctly
4. E-views provides a number of procedures, e.g. Cochrane-Orcutt (a last resort)
5. Most important: it is easy to confuse misspecified dynamics with serial correlation in the errors. In fact it is best to always start from a general dynamic model and test the restrictions before applying the tests for serial correlation.

That's all for today!


There are some questions on Blackboard. You should try the five questions with a * beside them and e-mail your answers to me. No need to write too much: one page in total is fine (or less!).
