
Partial & Semi-Partial Correlation and Multiple Regression

Relationships among > 2 variables

Correlation & Regression


Both test simple linear relationships between 2 variables
Correlation: non-directional
Regression: directional

Both can be extended to more than 2 variables


Partial correlation: non-directional
Semi-partial correlation: directional
Multiple regression: directional

Dealing with Data


Imagine the ETS calls you up and says they think there is a relationship between the hours a student spends preparing for the SAT and the score on the SAT. They have asked recent SAT-takers to provide an estimate of the hours spent preparing (including classes). They provide you with these data as well as each student's GPA and the final score on the SAT.

ETS Example
What data do you have?
Hours of prep, GPA, SAT score

What kinds of predictions might you make about the relationship between hours of preparation and SAT score? How can you examine the relationship(s)?

Simple Correlation
Goal: determine the relationship between 2 variables (e.g., y and x1)
r²_yx1 is the shared variance between y and x1

(Venn diagram: circles for Y and X1; the overlapping region is r²_yx1)

ETS Example
Can look at simple correlation between each pair of variables
prep hours & SAT
prep hours & GPA
GPA & SAT


ETS Example
Prep hours x SAT (scatter plot of prep hours against SAT score for the 20 students below)

Prep hours   SAT score   GPA
15           1040        2.8
6            1450        3.75
12           1000        2.6
2            1510        3.8
18           1230        3.2
30           1160        2.75
26           1580        3.15
15           1240        2.4
10           1329        3.3
20           1470        3.5
5            1460        3.4
30           1020        2.4
12           1390        3.6
16           1200        2.87
25           1060        2.9
7            1040        2.65
24           1340        2.67
10           1280        3.5
14           1290        3.23
22           1450        3.0
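As a small illustration (my own sketch, not part of the slides), the three pairwise Pearson correlations can be computed from the table above with numpy, assuming the row pairing shown:

import numpy as np

prep = np.array([15, 6, 12, 2, 18, 30, 26, 15, 10, 20,
                 5, 30, 12, 16, 25, 7, 24, 10, 14, 22])
sat  = np.array([1040, 1450, 1000, 1510, 1230, 1160, 1580, 1240, 1329, 1470,
                 1460, 1020, 1390, 1200, 1060, 1040, 1340, 1280, 1290, 1450])
gpa  = np.array([2.8, 3.75, 2.6, 3.8, 3.2, 2.75, 3.15, 2.4, 3.3, 3.5,
                 3.4, 2.4, 3.6, 2.87, 2.9, 2.65, 2.67, 3.5, 3.23, 3.0])

# Pairwise Pearson correlations; later slides quote roughly -0.22, 0.72, and -0.54
print(np.corrcoef(prep, sat)[0, 1])   # prep hours & SAT
print(np.corrcoef(gpa, sat)[0, 1])    # GPA & SAT
print(np.corrcoef(prep, gpa)[0, 1])   # prep hours & GPA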

ETS Example
GPA x SAT (scatter plot of GPA against SAT score for the same 20 students listed above)

ETS Example
GPA x prep hours (scatter plot of GPA against prep hours for the same 20 students listed above)

ETS Example
GPA & SAT: not surprising
Prep hours & SAT: huh?
GPA & Prep hours: people with lower GPAs prep more (why?)
This could explain the odd Prep hours & SAT relationship

(Path diagrams: Prep Hrs related to SAT directly, vs. GPA related to both Prep Hrs and SAT)

Three (or more) Variables


3 variables = 3 relationships
Each can affect the other two
Partial & semi-partial correlation: remove the contributions of the 3rd variable

(Venn diagram: overlapping circles for Y, X1, and X2)

Partial Correlation
Find the correlation between two variables with the third held constant in BOTH
That is, we remove the effect of x2 from both y and x1
r²_yx1.x2 is the shared variance of y & x1 with x2 removed

(Venn diagram: circles for Y, X1, X2; r²_yx1.x2 is the Y-X1 overlap excluding X2)

Partial Correlation
Correlate y without x2 with x1 without x2 (i.e., the residuals)
We can put this in terms of simple correlation coefficients:

r_yx1.x2 = (r_yx1 - r_yx2 r_x1x2) / sqrt((1 - r²_yx2)(1 - r²_x1x2))

Numerator: the simple correlation between y and x1, minus the product of the correlation of y & x2 and the correlation of x1 & x2
Denominator: the variance that remains once the partialled-out relationships are removed
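A minimal sketch (mine, not from the slides) of the formula above in Python; the function name is arbitrary:

import math

def partial_corr(r_yx1, r_yx2, r_x1x2):
    """Correlation of y and x1 with x2 partialled out of both."""
    numerator = r_yx1 - r_yx2 * r_x1x2
    denominator = math.sqrt((1 - r_yx2**2) * (1 - r_x1x2**2))
    return numerator / denominator

# With the ETS values quoted on later slides (-0.21, 0.71, -0.54) this returns
# roughly 0.29, matching the slide's 0.28 up to rounding.
print(partial_corr(-0.21, 0.71, -0.54))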

Partial Correlation
The significance of r_yx1.x2 can be calculated using t

H0: ρ_yx1.x2 = 0 (no relationship)
H1: ρ_yx1.x2 ≠ 0 (either positive or negative correlation)

t(N-3) = (r_yx1.x2 - ρ_yx1.x2) / sqrt((1 - r²_yx1.x2)/(N-3))

1 - r²_yx1.x2 is the unexplained variance
N-3 = degrees of freedom (three variables)
sqrt((1 - r²_yx1.x2)/(N-3)) = standard error of r_yx1.x2

ETS Example
Correlation between prep hours and SAT score with GPA partialled out:

r_yx1.x2 = (r_yx1 - r_yx2 r_x1x2) / sqrt((1 - r²_yx2)(1 - r²_x1x2))
         = (-0.21 - (-0.54 * 0.71)) / sqrt((1 - (-0.54)²)(1 - 0.71²))
         = 0.28

ETS Example
The partial correlation between prep hours and SAT score with the effect of GPA removed: r_yx1.x2 = 0.28, r²_yx1.x2 = 0.08

t(N-3) = t(17) = r_yx1.x2 / sqrt((1 - r²_yx1.x2)/(N-3))
               = 0.28 / sqrt((1 - 0.08)/17)
               = 1.23

Significant? t_0.05(17) = 2.11, so t(17) = 1.23 is not significant
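A sketch (not from the slides) of the same t test using scipy for the two-tailed p value; with the slide's rounded r = 0.28 the t comes out near 1.2 rather than exactly 1.23:

import math
from scipy import stats

r, N = 0.28, 20                       # partial correlation and sample size
t = r / math.sqrt((1 - r**2) / (N - 3))
p = 2 * stats.t.sf(abs(t), df=N - 3)  # two-tailed p on N-3 = 17 df
print(t, p)                           # t is well below the 2.11 cutoff, so not significant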

Semi-Partial Correlation
Find the correlation between two variables with the third held constant in only ONE of them
That is, we remove the effect of x2 from x1 (but not from y)
r²_y(x1.x2) is the shared variance of y & x1 with x2 removed from x1

(Venn diagram: circles for Y, X1, X2; r²_y(x1.x2) is the Y-X1 overlap after X2 is removed from X1 only)

Semi-Partial Correlation
Why "semi-partial"? Generally used with multiple regression to remove the effect of one predictor from another predictor without removing that variability from the predicted variable
NOT typically reported as the only analysis

Semi-Partial Correlation
Correlate y with x1-without-x2 (the residual of x1)
Put in terms of simple correlation coefficients:

r_y(x1.x2) = (r_yx1 - r_yx2 r_x1x2) / sqrt(1 - r²_x1x2)

Numerator: the simple correlation between y and x1, minus the product of the correlation of y & x2 and the correlation of x1 & x2
Same as the partial correlation except the shared variance of y & x2 is left in (the (1 - r²_yx2) term is dropped from the denominator)

Semi-Partial Correlation
Which will be larger, the partial or the semi-partial correlation?

partial:       r_yx1.x2  = (r_yx1 - r_yx2 r_x1x2) / sqrt((1 - r²_yx2)(1 - r²_x1x2))

semi-partial:  r_y(x1.x2) = (r_yx1 - r_yx2 r_x1x2) / sqrt(1 - r²_x1x2)

(The numerators are identical; the partial's denominator contains the extra factor (1 - r²_yx2) ≤ 1, so the partial correlation can never be smaller in absolute value.)

ETS Example
Going back to the SAT example, suppose we partial GPA out of hours of prep only:

r_y(x1.x2) = (r_yx1 - r_yx2 r_x1x2) / sqrt(1 - r²_x1x2)
           = (-0.21 - (-0.54 * 0.71)) / sqrt(1 - (-0.54)²)
           = 0.20

Significance of Semi-Partial
Same as for the partial correlation, just substitute r_y(x1.x2); df = N-3

t(N-3) = r_y(x1.x2) / sqrt((1 - r²_y(x1.x2))/(N-3))

ETS Example
The semi-partial correlation between prep hours and SAT score with the effect of GPA removed from prep hours: r_y(x1.x2) = 0.20, r²_y(x1.x2) = 0.04

t(N-3) = t(17) = r_y(x1.x2) / sqrt((1 - r²_y(x1.x2))/(N-3))
               = 0.20 / sqrt((1 - 0.04)/17)
               = 0.84

Significant? t_0.05(17) = 2.11, so t(17) = 0.84 is not significant
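A companion sketch (again mine, not the slides') for the semi-partial correlation and its t test with the same quoted correlations; small rounding differences from the slide's 0.20 and 0.84 are expected:

import math
from scipy import stats

def semipartial_corr(r_yx1, r_yx2, r_x1x2):
    """Correlation of y and x1 with x2 removed from x1 only."""
    return (r_yx1 - r_yx2 * r_x1x2) / math.sqrt(1 - r_x1x2**2)

sr = semipartial_corr(-0.21, 0.71, -0.54)    # about 0.21
t = sr / math.sqrt((1 - sr**2) / (20 - 3))   # about 0.87; t_0.05(17) = 2.11, not significant
p = 2 * stats.t.sf(abs(t), df=17)
print(sr, t, p)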

Multiple Regression
Simple regression: y = a + bx
Multiple regression: the General Linear Model
y = a + b1 x1 + b2 x2 (2 predictors)
Therefore, the general formula: y = a + b1 x1 + ... + bk xk (k predictors)
The problem is to solve for k+1 coefficients
k predictors (regressors) + the intercept
We are most concerned with the predictors
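A minimal sketch of fitting such a model by ordinary least squares with numpy; the arrays are placeholder values (the first rows of the ETS table), and lstsq returns the intercept plus the raw b weights:

import numpy as np

x1 = np.array([15.0, 6.0, 12.0, 2.0, 18.0, 30.0])               # placeholder predictor 1
x2 = np.array([2.8, 3.75, 2.6, 3.8, 3.2, 2.75])                 # placeholder predictor 2
y  = np.array([1040.0, 1450.0, 1000.0, 1510.0, 1230.0, 1160.0]) # placeholder outcome

X = np.column_stack([np.ones_like(x1), x1, x2])  # constant column for the intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares solution
a, b1, b2 = coef                                 # y' = a + b1*x1 + b2*x2
print(a, b1, b2)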

ETS Example
Prep hours (x1), GPA (x2), & SAT (y)
Use Prep hours and GPA to predict SAT score

Simple regressions
y = -4.79 x1 + 1353
y = 300 x2 + 355

ETS Example
Use both prep hours and GPA to predict SAT score
Now find the equation for the 3-D relationship

Finding Regression Weights


What do we minimize?
Σ(y - y')² (least squares principle)

For multiple regression, it is easier to think in terms of standardized regression coefficients*

Finding Regression Weights


What do we minimize?
Σ(y - y')² (least squares principle)

For multiple regression, it is easier to think in terms of standardized regression coefficients*


z'_y = β1 z_x1 + β2 z_x2
The goal is to find the βs that minimize:

(1/N) Σ(z_y - z'_y)² = (1/N) Σ(z_y - β1 z_x1 - β2 z_x2)²

Finding Regression Weights


Using differential calculus, we find 2 normal equations for 2 regressors:

β1 + r_x1x2 β2 - r_x1y = 0
r_x1x2 β1 + β2 - r_x2y = 0

These can be converted to:

β1 = (r_x1y - r_x2y r_x1x2) / (1 - r²_x1x2)
β2 = (r_x2y - r_x1y r_x1x2) / (1 - r²_x1x2)

Notice that these are like the semi-partial correlation

Finding Regression Weights


In practice, the raw scores are used:

(y' - ȳ)/s_y = β1 (x1 - x̄1)/s_x1 + β2 (x2 - x̄2)/s_x2

(where s_y, s_x1, s_x2 are the estimated standard deviations and ȳ, x̄1, x̄2 the means)

which is equivalent to:

y' = β1 (s_y/s_x1) x1 + β2 (s_y/s_x2) x2 + [ȳ - β1 (s_y/s_x1) x̄1 - β2 (s_y/s_x2) x̄2]

Finding Regression Weights


Look at each segment...

y' = [β1 (s_y/s_x1)] x1 + [β2 (s_y/s_x2)] x2 + [ȳ - β1 (s_y/s_x1) x̄1 - β2 (s_y/s_x2) x̄2]
   =       b1 x1      +       b2 x2       +                       a

We have the regression equation with the RAW regression weights:

b1 = β1 (s_y/s_x1)
b2 = β2 (s_y/s_x2)
a = ȳ - b1 x̄1 - b2 x̄2

ETS Example
Use the r's to get the β's:
r_x1x2 = -0.54, r_x1y = -0.22, r_x2y = 0.72

β1 = (r_x1y - r_x2y r_x1x2) / (1 - r²_x1x2) = 0.24
β2 = (r_x2y - r_x1y r_x1x2) / (1 - r²_x1x2) = 0.84

Use the β's to get the raw coefficients:
b1 = β1 (s_y/s_x1) = 5.16
b2 = β2 (s_y/s_x2) = 353
a = ȳ - b1 x̄1 - b2 x̄2 = 110
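A sketch (my own) that plugs the quoted correlations into the two-regressor formulas above; converting to raw weights would additionally need the sample standard deviations and means, so those lines are only shown symbolically in the comments:

r_x1x2, r_x1y, r_x2y = -0.54, -0.22, 0.72

beta1 = (r_x1y - r_x2y * r_x1x2) / (1 - r_x1x2**2)   # about 0.24
beta2 = (r_x2y - r_x1y * r_x1x2) / (1 - r_x1x2**2)   # about 0.85
print(beta1, beta2)

# Raw weights (would need the sample SDs and means from the data):
# b1 = beta1 * s_y / s_x1;  b2 = beta2 * s_y / s_x2;  a = mean_y - b1*mean_x1 - b2*mean_x2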

Finding Regression Weights


For >2 predictors, the same principles apply
The normal equations minimize Σ(y - y')² (deviation of actual from predicted)
The equations can be expressed in matrix form as: R_ij B_j - R_jy = 0
R_ij = k × k matrix of the correlations among the different independent variables (x's)
B_j = a column vector of the k unknown β values (one for each x)
R_jy = a column vector of the correlation coefficients between each of the k predictors and the dependent variable (y)

Finding Regression Weights


R_ij B_j - R_jy = 0; R_ij and R_jy are known
(each r_xixj and each r_yxi)

Therefore, we can solve for B_j: B_j = R_ij⁻¹ R_jy (in matrix form, this is really easy!)
Don't worry about actually calculating these, but be sure you understand the equation!
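The matrix solution B_j = R_ij⁻¹ R_jy, sketched in numpy under the assumption of the two-predictor ETS correlations (any k works the same way):

import numpy as np

R_ij = np.array([[1.0, -0.54],      # correlations among the predictors
                 [-0.54, 1.0]])
R_jy = np.array([-0.22, 0.72])      # correlations of each predictor with y

B_j = np.linalg.solve(R_ij, R_jy)   # solves R_ij @ B_j = R_jy
print(B_j)                          # standardized weights, roughly [0.24, 0.85]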

Finding Regression Weights


For each independent variable, we can use the relationship of b to β:
b_j = β_j (s_y / s_xj)

The same principle for obtaining the intercept in simple regression applies as well:
a = ȳ - Σ b_j x̄_j

Explained Variance (Fit)


For 2 predictors, the equation defines a plane
y = 5.16 x1 + 353 x2 + 110 (ETS example)

How far are the points in 3-D space from the plane defined by the equation?

Explained Variance
In addition to simple (r_xy), partial (r_yx1.x2), & semi-partial (r_y(x1.x2)) correlation coefficients, we can have a multiple correlation coefficient (R_y.x1x2)
R_y.x1x2 = correlation between the observed value of y and the predicted value of y
It can be expressed in terms of beta weights and simple correlation coefficients:

R_y.x1x2 = sqrt(β1 r_yx1 + β2 r_yx2)    OR    R²_y.x1x2 = β1 r_yx1 + β2 r_yx2

Explained Variance
R²_y.x1x2 = β1 r_yx1 + β2 r_yx2
Any βi represents the contribution of variable xi to predicting y
The more general version of this equation is simply R² = Σ βj r_yxj, or in matrix form R² = B_j R_jy
(Just add up the products of the β's and the r's)
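As a small check of this formula (my sketch; R² is not quoted on the slides, so the printed value is only illustrative):

betas = [0.24, 0.84]     # standardized weights from the ETS example
r_y   = [-0.22, 0.72]    # simple correlations of each predictor with y

R2 = sum(b * r for b, r in zip(betas, r_y))   # R² = Σ βj * r_yxj
print(R2)                                     # roughly 0.55 with these rounded inputs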

How are the βi's and R² related to the simple correlation coefficients?

Explained Variance
R²_y.x1x2 = β1 r_yx1 + β2 r_yx2
If x1 and x2 are uncorrelated:
β1 = r_yx1 and β2 = r_yx2
so R²_y.x1x2 = r_yx1 r_yx1 + r_yx2 r_yx2 = r²_yx1 + r²_yx2

(Venn diagram: X1 and X2 each overlap Y but not each other)

Explained Variance
R²_y.x1x2 = β1 r_yx1 + β2 r_yx2
If x1 and x2 are correlated:
the βi's are corrected so that the overlap is not counted twice

(Venn diagram: X1 and X2 overlap each other as well as Y)

Adjusted R2
R² is a biased estimate of the population R² value
If you want to estimate the population value, use Adjusted R²
Most stats packages calculate both R² and Adjusted R²; if not, the value can be obtained from R²:

Adj R² = R² - [k (1 - R²) / (N - k - 1)]
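A one-line version of the adjustment formula above (the helper name is my own):

def adjusted_r2(R2, N, k):
    """Shrink R² toward the population value given N cases and k predictors."""
    return R2 - (k * (1 - R2)) / (N - k - 1)

print(adjusted_r2(0.55, N=20, k=2))   # illustrative values based on the ETS example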

Significance Tests
In multiple regression, there are 3 different statistical tests that are of interest
Significance of R2
Is the fit of the regression model significant?

Significance for increments to R2


How much does adding a variable improve the fit of the regression model?

Significance of the regression coefficients


βj is the contribution of xj. Is this different from 0?


Partitioning Variance
Σ(y - ȳ)² = Σ(y - y')² + Σ(y' - ȳ)²

Total variance in y (aka SStotal) = Unexplained variance (aka SSres) + Explained variance (aka SSreg)

Same as in simple regression! The only difference is that y' is generated by a linear function of several independent variables (k predictors)
Note: SStotal = SSregression + SSresidual

Significance of R²
Need a ratio of variances (F value):

F = MSreg / MSres = (SSreg / dfreg) / (SSres / dfres)

Where do these values come from?
SSreg = Σ(y' - ȳ)²; dfreg = k (the number of regressors)
SSres = Σ(y - y')²; dfres = N - k - 1 (observations minus regressors minus 1)
The F for the overall model reflects this ratio
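A sketch of the overall F test, assuming you already have the two sums of squares; scipy's F distribution supplies the p value (function name is my own):

from scipy import stats

def overall_F(SS_reg, SS_res, N, k):
    """F = MSreg / MSres with dfreg = k and dfres = N - k - 1."""
    MS_reg = SS_reg / k
    MS_res = SS_res / (N - k - 1)
    F = MS_reg / MS_res
    p = stats.f.sf(F, k, N - k - 1)   # upper-tail p value
    return F, p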

Significant Increments to R²
As variables (predictors) are added to the regression, R² can:
stay the same (the additional variable has NO contribution)
increase (the additional variable has some contribution)

If R2 increases, we want to know if that increase is significant


Significant Increments to R²
Use an F test on the change in R²:

F_ΔR² = [(R²_L - R²_S) / (k_L - k_S)] / [(1 - R²_L) / (N - k_L - 1)]

Making sense of the equation:
L = larger model; S = smaller model
ALL variables in the smaller model (S) must also be in the larger model (L)
Therefore, L is model S plus one or more additional variables
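The same idea as a small helper, with the larger and smaller model R² values as assumed inputs (a sketch, not a library routine):

from scipy import stats

def increment_F(R2_L, R2_S, k_L, k_S, N):
    """F test for the increase in R² from the smaller (S) to the larger (L) model."""
    F = ((R2_L - R2_S) / (k_L - k_S)) / ((1 - R2_L) / (N - k_L - 1))
    p = stats.f.sf(F, k_L - k_S, N - k_L - 1)
    return F, p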

Significance of Coefficients
Think about bj in t terms: bj / est(bj)
bj / est(bj) is distributed as t with N - k - 1 degrees of freedom, where...

est(bj) = sqrt[ (SSres / (N - k - 1)) / (SSj (1 - R²j)) ]

SSj = sum of squares for variable xj
R²j = squared multiple correlation for predicting xj from the remaining k - 1 predictors (treating xj as the predicted variable)

Significance of Coefficients
est(bj) = sqrt[ (SSres / (N - k - 1)) / (SSj (1 - R²j)) ]

As R²j increases, the (1 - R²j) term in the denominator approaches 0; that is, est(bj) becomes larger
As the remaining x's account for more of xj, bj is less likely to reach significance
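A sketch of the coefficient test: the standard error of bj built from the residual sum of squares, the predictor's own sum of squares, and R²j (xj regressed on the other predictors); all arguments are assumed inputs:

import math
from scipy import stats

def coefficient_t(b_j, SS_res, SS_j, R2_j, N, k):
    """t = b_j / est(b_j), with N - k - 1 degrees of freedom."""
    est_bj = math.sqrt((SS_res / (N - k - 1)) / (SS_j * (1 - R2_j)))
    t = b_j / est_bj
    p = 2 * stats.t.sf(abs(t), df=N - k - 1)   # two-tailed
    return t, p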


Importance of IVs (xs)


Uncorrelated IVs: the simple r_yxj's work
Correlated IVs:
simple correlation coefficients include variance shared among the IVs (over-estimated)
regression weights can involve predictor intercorrelations or suppressors (more later)
Best measure: the squared semi-partial correlation sr²j
BUT sr²j comes in different forms for different types of regression

Multiple Regression Types


Several types of regression are available. How do they differ?
Method for entering variables
What variables are in the model; what variables are held constant
Use of different types of R² values
Use of different measures to assess the importance of IVs

Multiple Regression Types


Simultaneous Regression (most common)
Single regression model with all variables
All predictors are entered simultaneously
All variables treated equally

Each predictor is assessed as if it were entered last

Each predictor is evaluated in terms of what it adds to the prediction of the dependent variable, over and above the other variables
Key test: sr²j for each xj with all other x's held constant


Multiple Regression Types


Hierarchical Regression
Multiple models calculated
Start with one predictor
Add predictors
Order specified by the researcher

Each predictor is assessed in terms of what it adds at the time it is entered


Each predictor is evaluated in terms of what it adds to the prediction of the dependent variable, over and above the other variables that have already been entered
Key test: the change in R² at each step

Multiple Regression Types


Hierarchical Regression
Used when the researcher has a priori reasons for entering variables in a certain order
Specific hypotheses about the components of theoretical models
Practical concerns about what it is important to know

Multiple Regression Types


Stepwise & Setwise Regressions
Multiple models calculated (like hierarchical)
Use statistical criteria to determine order
Limit the final model to meaningful regressors

Recommended for exploratory analyses of very large data sets (> 30 predictors)
With lots of predictors, keeping all but one constant may make it difficult to find any significant predictors
These procedures capitalize on chance to find the meaningful variables


Multiple Regression Types


Stepwise Regression: Forward
Step 1: enter the xj with the largest simple r_yxj
Step 2: partial out the first variable and choose the xj with the highest partial r_yxj.x1
Step 3: partial out x1 and x2, and so on
Stop when the resulting model reaches some criterion (e.g., a minimum R²)
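A toy sketch of the forward idea, using the squared correlation with the current residual as the entry criterion; real packages use F-to-enter rules and true partial correlations, so treat this only as an illustration:

import numpy as np

def forward_select(X, y, n_steps):
    """Greedily add the predictor most correlated with the current residual."""
    chosen, residual = [], y - y.mean()
    for _ in range(n_steps):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        # pick the remaining column with the largest |r| against the residual
        best = max(remaining, key=lambda j: abs(np.corrcoef(X[:, j], residual)[0, 1]))
        chosen.append(best)
        # refit with the chosen columns (plus intercept) and update the residual
        Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in chosen])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        residual = y - Z @ coef
    return chosen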

Multiple Regression Types


Stepwise Regression: Backward
Step 1: start with the complete model (all xj's)
Step 2: remove an xj based on some criterion
smallest R²
smallest F

Stop removing variables when some criterion is reached:
all remaining regressors significant
minimum R²

Multiple Regression Types


Setwise Regression
Test several simultaneous models
Finds the best possible subset of variables
Setwise(#): for a given set size
Setwise Full: for all possible set sizes

For example, with 8 variables:
Look at all possible combinations of, say, 5 variables
Figure out which combo has the largest R²
Can be done for sets of 2, 3, 4, 5, 6, or 7 variables
In each case, find the set with the largest R²


Importance of Regressors
βi's primarily serve to help define the equation for predicting y
The squared semi-partial correlation (sr²) is more appropriate for practical importance
Put in terms of the variance explained by each regressor
Compare how much variance each regressor explains

Importance of IVs (xs)


For simultaneous or setwise regression
sr²j is the amount R² would be reduced if variable xj were not included in the regression equation
In terms of the regression statistics:

sr²j = Fj (1 - R²) / dfres

When the IVs are correlated, the sr²j's for all of the xj's will not sum to the R² for the full model
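The identity above as a tiny helper (a sketch; Fj is the F, or squared t, for predictor j in the full simultaneous model):

def squared_semipartial(F_j, R2_full, df_res):
    """sr²_j = F_j * (1 - R²_full) / df_res for simultaneous regression."""
    return F_j * (1 - R2_full) / df_res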

Importance of IVs (xs)


For hierarchical or stepwise regression
sr²j is the increment to R² added when xj is entered into the equation
Because each variable is added separately, sr²j will reflect that variable's contribution AT A PARTICULAR POINT in the model
The sum of the sr²j values WILL sum to R²
The importance of the different variables may vary depending on the order in which the variables are entered


Potential Problems
Several assumptions
(see Berry & Feldman pp. 10-11 in book)

Random variables, interval scale
No perfect collinear relationships

Also practical concerns
Focus on most relevant/prevalent

Multicollinearity
Perfect collinearity: when one independent variable is perfectly linearly related to one or more of the other regressors
x1 = 2.3 x2 + 4: x1 is perfectly predicted by x2
x1 = 4.1 x3 + 0.45 x4 + 11.32: x1 is perfectly predicted by a linear combination of x3 and x4
Any case where there is an R² value of 1.00 among the regressors (NOT including y)
Why might this be a problem?

Multicollinearity
Perfect collinearity (simplest case)
One variable is a linear function of another
We'd be thrilled (and skeptical) to see this in a simple regression
However...


Multicollinearity
Perfect collinearity (simplest case)
Problem in multiple regression: the y values will line up in a single plane rather than varying about a plane

Multicollinearity
Perfect collinearity (simplest case)
No way to determine the plane that fits the y values best
Many possible planes

Multicollinearity
In practice
perfect collinearity violates the assumptions of regression
less-than-perfect collinearity is more common
not an all-or-nothing situation; can have varying degrees of multicollinearity
dealing with multicollinearity depends on what you want to know


Multicollinearity
Consequences
If the only goal is prediction, not a problem
plugging in known numbers will give you the unknown value
although specific regression weights may vary, the final outcome will not

We usually want to explain the data


can identify the contributions of the regressors that are NOT collinear
cannot identify the contributions of the regressors that are collinear, because regression weights will change from sample to sample

Multicollinearity
Detecting collinearities
Some clues
full model is significant but none of the individual regressors reach significance
instability of weights across multiple samples
look at simple regression coefficients for all pairs
cumbersome way: regress each independent variable on all other independent variables to see if any R² values are close to 1

Multicollinearity
What can you do about it?
Increase the sample size
reduce the error
offset the effects of multicollinearity

If you know the relationship, you can use that information to offset the effect (yeah, right!)
Delete one of the variables causing the problem
which one? If one is predicted by a group of others
logical rationale? presumably, the variables were there for theoretical reasons


Multicollinearity
Detecting collinearities
SPSS: Collinearity diagnostics & follow-up
Tolerance: 1-R2 for the regression of each IV against the remaining regressors.
Collinearity: tolerance close to 0
Use this to locate the collinearity

VIF: variance inflation factor = instability of the regression weights (reciprocal of Tolerance)

To locate the collinearity: run a Forward regression on the variable with the lowest tolerance (predicting it from the remaining IVs)

To resolve: re-run the original regression, removing the variable with the lowest tolerance
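A sketch of computing tolerance and VIF by hand (my own helper, not SPSS output): regress each predictor on the others and take 1 - R²; the predictor matrix X is an assumed input:

import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) for each column of the predictor matrix X."""
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        # regress column j on all the other columns (plus an intercept)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        ss_tot = np.sum((X[:, j] - X[:, j].mean())**2)
        r2_j = 1 - np.sum(resid**2) / ss_tot
        tol[j] = 1 - r2_j          # tolerance: near 0 signals collinearity
    return tol, 1 / tol            # VIF is the reciprocal of tolerance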

Suppression
Special case of multicollinearity
Suppressor variables are variables that increase the value of R² by virtue of their correlations with other predictors and NOT with the dependent variable
The best way to explain this is by way of an example...

Suppression Example
Predicting course grade in a multivariate statistics course with GRE verbal and quantitative
The multiple correlation R was 0.62 (reasonable, right?)
However, the β's were 0.58 for GRE-Q and -0.24 for GRE-V
Does this mean that higher GRE-V scores were associated with lower course performance? Not exactly


Suppression Example
Why was the β for GRE-V negative?
The GRE-V alone actually had a small positive correlation with course grade
The GRE-V and GRE-Q are highly correlated with each other
The regression weights indicate that, for a given score on the GRE-Q, the lower a person scores on the GRE-V, the higher the predicted course grade

Suppression Example
Another way to put it...
The GRE-Q is a good predictor of course grade, but part of the performance on GRE-Q is determined by GRE-V, so it favors people of high verbal ability. Suppose we have 2 people who score equally on GRE-Q but differently on GRE-V
Bob scores 500 on GRE-Q and 600 on GRE-V
Jane scores 500 on GRE-Q and 500 on GRE-V
What happens to the predictions about course grade?

Suppression Example
Another way to put it...
Bob: 500 on GRE-Q and 600 on GRE-V
Jane: 500 on GRE-Q and 500 on GRE-V
Based on the verbal scores, we would predict that Bob should have better quantitative skills than Jane, but he does not score better
Thus, Bob must actually have LESS quantitative knowledge than Jane, so we would predict his course grade to be lower
This is equivalent to giving GRE-V a negative regression weight, despite its positive simple correlation with course grade


Suppression
More generally...
If x2 is a better measure of the source of errors in x1 than in y, then giving x2 a negative regression weight will improve our predictions of y
x2 subtracts out / corrects for (i.e., suppresses) sources of error in x1
Suppression seems counterintuitive, but actually improves the model

Suppression
More generally...
Suppressor variables are usually considered bad: they can cause misinterpretation (GRE example)
However, careful exploration can
enlighten understanding of the interplay of variables
improve our prediction of y

Easy to identify:
significant regression weights where b/β (regression) and r (simple correlation) have opposite signs

Practical Issues
Number of cases
Must exceed the number of predictors (N > k)
Acceptable N/k ratio depends on
reliability of the data
the researcher's goals

Larger samples required for:


more specific conclusions (vs. vague conclusions)
post-hoc vs. a priori tests
designs with interactions
collinear predictors

Generally, more is better


Practical Issues
Outliers
Correlation is extremely sensitive to outliers
Easiest to show with simple correlation (e.g., paired scatter plots where a single outlier shifts r_xy between +0.59 and -0.03)
Outliers should be assessed for the DV and all IVs
Ideally, we would identify multivariate outliers, but this is not practical

Practical Issues
Linearity
Multiple regression assumes a linear relationship between the DV and each IV
If relationships are non-linear, multiple regression may not be appropriate
Transformations may rectify non-linearity
logs
reciprocals

Practical Issues
Normality
Residuals (y - y') should be normally distributed across the predicted (y') values
Violation affects statistical conclusions, but not the validity of the model

Homoscedasticity
multivariate version of homogeneity of variance
violation affects statistical conclusions, but not the validity of the model


(Figure: four plots of residuals (y - y') against y' values, illustrating Assumptions Met, Normality Violated, Linearity Violated, and Homoscedasticity Violated)

What do you report?


Correlation analyses
Always state what's happening in the data
Report the r value and its corresponding p value (either the actual p or p < criterion)
Qualify simple correlations with partial correlation coefficients if there are multiple variables
Authors may include r² values, stating that xx% of the variance was accounted for by the relationship

What do you report?


Regression analyses
Report the correlations first
For simple regression: state the equation, the r² value, and the significance of the regression weight (sometimes a table will work)
For multiple regression:
state the equation (not always in manuscripts)
state the practical importance of each regressor (sr²)
state the relative relationship among regressors
state the significance of each regressor

