
TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Test 1: Are Any of the xs Useful in Predicting y?


We are asking: Can we conclude that at least one of the βs (other than β0) ≠ 0?
H0: β1 = β2 = β3 = β4 = 0
HA: At least one of these βs ≠ 0
α = .05

Idea of the Test


Measure the overall average variability due to changes in the xs.
Measure the overall average variability that is due to randomness (error).
If the overall average variability due to changes in the xs IS A LOT LARGER than the average variability due to error, we conclude at least one β is non-zero, i.e. at least one factor (x) is useful in predicting y.

Total Variability
Just like with simple linear regression, we have the total sum of squares due to regression, SSR, and the total sum of squares due to error, SSE, which are printed on the EXCEL output.
The formulas are more complicated (they involve matrix operations).

Average Variability
Average variability (Mean variability) for a group is defined as the Total Variability divided by the degrees of freedom associated with that group:
Mean Squares Due to Regression: MSR = SSR/DFR
Mean Squares Due to Error: MSE = SSE/DFE

Degrees of Freedom
Total number of degrees of freedom DF(Total) always = n-1
Degrees of freedom for regression (DFR) = the number of factors in the regression (i.e. the number of xs in the linear regression)
Degrees of freedom for error (DFE) = difference between the two = DF(Total) -DFR
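
The degrees-of-freedom bookkeeping above can be sketched in a few lines of Python. The function name and the example values (n = 38 observations, 3 xs) are assumptions chosen only for illustration.

```python
def degrees_of_freedom(n, k):
    """Return (DF_total, DFR, DFE) for a regression with n observations and k xs."""
    if k < 1 or n - 1 - k < 1:
        raise ValueError("need at least one x and at least one error degree of freedom")
    df_total = n - 1       # DF(Total) always = n - 1
    dfr = k                # one degree of freedom per x in the regression
    dfe = df_total - dfr   # whatever is left over belongs to error
    return df_total, dfr, dfe


# Assumed example: 38 observations and 3 xs
df_total, dfr, dfe = degrees_of_freedom(38, 3)
print(df_total, dfr, dfe)
```

With these assumed inputs the three values are 37, 3 and 34.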

The F-Statistic
The F-statistic is defined as the ratio of two measures of variability. Here,

F = MSR / MSE

Recall we are saying that if MSR is large compared to MSE, at least one β ≠ 0. Thus if F is large, we conclude that HA is true, i.e. at least one β ≠ 0.

The F-test
Large compared to what? F-tables give critical values for given values of α.
TEST: REJECT H0 (Accept HA) if: F = MSR/MSE > Fα,DFR,DFE
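
As a rough sketch of the whole test, the code below turns SSR and SSE into mean squares, forms F = MSR/MSE, and compares it with a critical value that the user supplies from an F table. The function name and the SSR, SSE and n values are hypothetical. The critical value 2.88 is an assumed table value for α = .05 with 3 and 34 degrees of freedom.

```python
def overall_f_test(ssr, sse, n, k, f_critical):
    """Return (F, reject_H0) for the overall F-test with k xs and n observations."""
    dfr = k                  # degrees of freedom for regression
    dfe = (n - 1) - k        # degrees of freedom for error
    msr = ssr / dfr          # mean squares due to regression
    mse = sse / dfe          # mean squares due to error
    f_stat = msr / mse
    return f_stat, f_stat > f_critical


# Hypothetical sums of squares: MSR = 600/3 = 200, MSE = 340/34 = 10
f_stat, reject = overall_f_test(ssr=600.0, sse=340.0, n=38, k=3, f_critical=2.88)
print(f_stat, reject)
```

The critical value is passed in rather than computed because the Python standard library has no F distribution. In practice it would come from a table or from a spreadsheet function.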

RESULTS
If we do not get a large F statistic
We cannot conclude that any of the variables in this model are significant in predicting y.

If we do get a large F statistic


We can conclude at least one of the variables is significant for predicting y.
NATURAL QUESTION: WHICH ONES?

DFR = #xs DFE = Total DF- DFR Total DF = n-1

SSR, SSE, Total SS = Σ(yi - ȳ)²

MSR = SSR/DFR MSE = SSE/DFE

F = MSR/MSE

P-value for the F test

Results
We see that the F statistic is 20.89762. This would be compared to F.05,3,34.
From the F.05 Table, the value of F.05,3,34 is not given. But F.05,3,30 = 2.92 and F.05,3,40 = 2.84, and 20.89762 is greater than either of these numbers. The actual value of F.05,3,34 can be calculated in Excel: FINV(.05,3,34) = 2.882601
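
When the table skips the needed degrees of freedom, the bracketing argument above can be sketched as follows. Linear interpolation between the two table entries is only an approximation, used here for illustration, and the function name is hypothetical. The safe check is to compare F with the larger of the two bracketing values.

```python
def interpolate_critical_value(df_target, df_low, value_low, df_high, value_high):
    """Linearly interpolate between two table entries (an approximation only)."""
    fraction = (df_target - df_low) / (df_high - df_low)
    return value_low + fraction * (value_high - value_low)


f_stat = 20.89762                                  # F statistic from the printout
approx = interpolate_critical_value(34, 30, 2.92, 40, 2.84)
conservative = max(2.92, 2.84)                     # larger bracketing table value

print(round(approx, 3))          # rough estimate of F.05,3,34
print(f_stat > conservative)     # exceeding the larger value is enough to reject H0
```

The interpolated value (about 2.888) is close to the 2.882601 that Excel reports, and the conclusion is the same either way.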

USE SIGNIFICANCE F
This is the p-value for the F-Test.
Significance F = 7.46 x 10^-8 = .0000000746 < .05
We can conclude that at least one x is useful in predicting y.

Test 2: Which Variables Are Significant IN THIS MODEL?


The question we are asking is: taking all the other factors (xs) into consideration, does a change in a particular x (x3, say) significantly affect y?
This is another hypothesis test (a t-test). To test if the age of the house is significant:
H0: β3 = 0 (x3 is not significant in this model)
HA: β3 ≠ 0 (x3 is significant in this model)

The t-test for a particular factor IN THIS MODEL


Reject H0 (Accept HA) if:

t = (β̂3 - 0) / s(β̂3) > t.025,DFE or t < -t.025,DFE
where β̂3 is the estimated coefficient of x3 and s(β̂3) is its standard error.
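
A minimal sketch of this two-sided t-test is shown below. The t statistic is the estimated coefficient (minus the hypothesized value 0) divided by its standard error, and H0 is rejected when its absolute value exceeds the critical value. The coefficient and standard error are made-up numbers, and 2.032 is an assumed approximate table value for t.025 with 34 degrees of freedom.

```python
def coefficient_t_test(coefficient, std_error, t_critical):
    """Return (t, reject_H0) for the two-sided test that a coefficient equals 0."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    t_stat = (coefficient - 0.0) / std_error
    # Two-sided rule: reject if t > t_critical or t < -t_critical
    return t_stat, abs(t_stat) > t_critical


# Hypothetical estimate and standard error for one x
t_stat, reject = coefficient_t_test(coefficient=-1.2, std_error=0.5, t_critical=2.032)
print(t_stat, reject)
```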

t-value for test of β3 = 0

p-value for test of β3 = 0

Reading Printout for the t-test


Simply look at the p-value.
p-value for β3 = 0 is .02194 < .05
Thus the age of the house is significant in this model.

The other variables


p-value for β1 = 0 is .0000839 < .05
Thus square feet is significant in this model.

p-value for β2 = 0 is .15503 > .05
Thus the land (acres) is not significant in this model.
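
Reading the p-values off the printout amounts to comparing each one with α = .05. The sketch below uses the three p-values quoted above. The dictionary layout and function name are illustrative choices, not part of the Excel output.

```python
ALPHA = 0.05

# p-values for the individual t-tests, as read from the printout
p_values = {
    "square feet": 0.0000839,
    "land (acres)": 0.15503,
    "age": 0.02194,
}


def significant_variables(p_values, alpha=ALPHA):
    """Return the names of the variables whose p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


significant = significant_variables(p_values)
print(significant)
```

Square feet and age pass the check, while land does not, which matches the conclusions above.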

Does A Poor t-value Imply the Variable is not Useful in Predicting y?


NO. It says the variable is not significant IN THIS MODEL when we consider all the other factors. In this model land is not significant when included with square footage and age. But if we had run this model without square footage, we would have gotten the output on the next slide.

p-value for land is .00000717. In this model Land is significant.

Can it even happen that F says at least one variable is significant, but none of the ts indicate a useful variable?
YES. EXAMPLES IN WHICH THIS MIGHT HAPPEN:
Miles per gallon vs. horsepower and engine size
Salary vs. GPA and GPA in major
Income vs. age and experience
HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND

There is a relation between the xs


Multicollinearity
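
One simple way to spot a relation between two xs is to compute their correlation. The sketch below does this with a hand-written Pearson correlation. The house data are made up for illustration, and a high correlation is only a warning sign, not a formal test for multicollinearity.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: bigger houses tend to sit on bigger lots
square_feet = [1500, 1800, 2100, 2400, 3000]
acres = [0.20, 0.25, 0.30, 0.32, 0.45]

r = pearson_correlation(square_feet, acres)
print(round(r, 2))
```

When two xs move together this closely, each one carries much of the same information about y, so neither may look significant once the other is already in the model.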

Approaches That Could Be Used When Multicollinearity Is Detected


Eliminate some variables and run again
Stepwise regression
This is discussed in a future module.

Test 3: What Proportion of the Overall Variability in y Is Due to Changes in the xs?
R²
R² = .442197. Overall, 44% of the total variation in sales price is explained by changes in square footage, land, and age of the house.

What is Adjusted R²?

Adjusted R² adjusts R² to take into account degrees of freedom. By assuming a higher order equation for y, we can force the curve to fit this one set of data points, eliminating much of the variability (see next slide). But this is not what is going on!
R² might be higher, but adjusted R² might be much lower.

Adjusted R² takes this into account.
Adjusted R² = 1 - MSE/MST, where MST = SST/DF(Total) = SST/(n-1)
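
The difference between the two measures can be sketched numerically. The code uses the standard definitions R² = SSR/SST and adjusted R² = 1 - (SSE/DFE)/(SST/(n-1)), with SST = SSR + SSE. The sums of squares and sample size are hypothetical values chosen to make the arithmetic easy.

```python
def r_squared(ssr, sse):
    """Proportion of total variation explained by the regression."""
    return ssr / (ssr + sse)


def adjusted_r_squared(ssr, sse, n, k):
    """R-squared adjusted for the error degrees of freedom used by k xs."""
    sst = ssr + sse
    mse = sse / (n - 1 - k)   # mean squares due to error
    mst = sst / (n - 1)       # total sum of squares per total degree of freedom
    return 1.0 - mse / mst


# Hypothetical: SSR = 600, SSE = 400, n = 21 observations
r2 = r_squared(600.0, 400.0)
adj_3 = adjusted_r_squared(600.0, 400.0, n=21, k=3)
adj_4 = adjusted_r_squared(600.0, 400.0, n=21, k=4)  # one more x, no extra fit
print(r2, adj_3, adj_4)
```

Adding a fourth x that explains nothing extra leaves R² unchanged at 0.6 but lowers adjusted R² from about 0.529 to 0.5, which is the penalty described above.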

[Scatterplot: Sales vs Ad Dollars. x-axis: Ad Dollars; y-axis: Sales. Annotation on the plot: "This is not what is really going on."]

Review
Are any of the xs useful in predicting y IN THIS MODEL?
Look at the p-value for the F-test (Significance F).
F = MSR/MSE would be compared to Fα,DFR,DFE

Which variables are significant in this model?


Look at p-values for the individual t-tests

What proportion of the total variance in y can be explained by changes in the xs?
R²
Adjusted R² takes into account the reduced degrees of freedom for the error term caused by including more terms in the model.

4 Places to Look on Excel Printout

1- Regression equation

2- Significance F: Are any variables useful?

3- p-values for t-tests: Which variables are significant in this model?

4- R²: What proportion of the variation in y can be explained by changes in the xs?
