Producing and Interpreting Residuals Plots in SPSS
In a linear regression analysis it is assumed that the distribution of residuals,
$(Y - \hat{Y})$, is, in the population, normal at every level of predicted Y and constant in
variance across levels of predicted Y. I shall illustrate how to check that assumption. Although I shall use a bivariate regression, the same technique would work for a multiple regression.

Start by downloading Residual-Skew.dat and Residual-Hetero.dat from my StatData page and ANOVA1.sav from my SPSS data page. Each line of data in Residual-Skew.dat has four scores: X, Y, X2, and Y2. The delimiter is a blank space. Create the new variable Y2_SQRT, the square root of Y2, this way: Transform, Compute, OK.

First some descriptive statistics on the variables:

Descriptive Statistics

                      N    Minimum  Maximum  Mean     Std. Deviation  Skewness               Kurtosis
                                                                      Statistic  Std. Error  Statistic  Std. Error
X                     200  16       74       48.59    9.985           -.053      .172        .161       .342
Y                     200  22       77       49.66    9.847           -.069      .172        -.244      .342
X2                    200  1        144      45.68    27.123          1.046      .172        1.038      .342
Y2                    200  3        163      48.43    27.474          .948       .172        1.250      .342
Y2_SQRT               200  1.73     12.77    6.6751   1.97299         .171       .172        -.196      .342
Valid N (listwise)    200

Notice that variables X and Y are not skewed; I generated them with a normal random number generator. Notice that X2 and Y2 are skewed and that taking the square root of Y2 reduces its skewness greatly.

Here we predict Y from X, produce a residuals plot, and save the residuals.
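The effect of a square-root transformation on skewness can also be sketched outside SPSS. The Python below is only an illustration under stated assumptions: the scores are synthetic stand-ins that I generate here (they are not the Residual-Skew.dat data), and the skewness statistic is the adjusted Fisher-Pearson estimate, which I take to be the one SPSS reports. The function names are my own.

```python
import math
import random
import statistics


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_std_error(n):
    """Standard error of G1 under normality; it depends only on n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Synthetic stand-in for a skewed score (assumption, not the handout's data):
# squaring a positive, roughly normal score produces positive skew.
rng = random.Random(2007)
base = [max(rng.gauss(7, 2), 0.1) for _ in range(200)]
y2 = [b * b for b in base]
y2_sqrt = [math.sqrt(v) for v in y2]  # the transformation undoes the squaring

print("skewness of skewed score:     ", round(skewness(y2), 3))
print("skewness after square root:   ", round(skewness(y2_sqrt), 3))
print("std. error of skewness, n=200:", round(skewness_std_error(200), 3))  # 0.172
```

A skewness statistic within about two standard errors of zero is a common rough guide for "not noticeably skewed", but the residuals plots remain the better diagnostic.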
Copyright 2007, Karl L. Wuensch - All rights reserved.
Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .450(a)  .203      .199               8.815

a. Predictors: (Constant), X2
b. Dependent Variable: Y

Here is a histogram of the residuals with a normal curve superimposed. The residuals look close to normal.

Here is a plot of the residuals versus predicted Y. The pattern shown here indicates no problems with the assumption that the residuals are normally distributed at each level of Y and constant in variance across levels of Y. SPSS does not automatically draw in the regression line (the horizontal line at residual = 0). I double-clicked the chart and then selected Elements, Fit Line at Total to get that line.

SPSS has saved the residuals, unstandardized (RES_1) and standardized (ZRE_1), to the data file. Analyze, Explore ZRE_1 to get a better picture of the standardized residuals. The plots look fine. As you can see, the skewness and kurtosis of the residuals are about what you would expect if they came from a normal distribution:

Descriptives: ZRE_1

           Statistic
Mean       .0000000
Minimum    -2.55481
Maximum    2.65518
Range      5.20999
Skewness   -.074
Kurtosis   -.264

Now predict Y from the skewed X2. You conduct this analysis with the same plots and saved residuals as above. You will notice that the residuals plots and exploration of the saved residuals indicate no problems for the regression model. The skewness of X2 may be troublesome for the correlation model, but not for the regression model.

Next, predict skewed Y2 from X.

Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .452(a)  .204      .200               24.581

a. Predictors: (Constant), X
b. Dependent Variable: Y2

Notice that the residuals plot shows the residuals not to be normally distributed; they are pulled out (skewed) towards the top of the plot.
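For readers who want to see what the saved residuals are, here is a minimal Python sketch of unstandardized and standardized residuals from a bivariate regression. It rests on assumptions: the data are synthetic (generated here, not read from the handout's files), the function names are hypothetical, and I assume the standardized residual is the raw residual divided by the standard error of the estimate, sqrt(SSE / (n - 2)).

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals_from_fit(x, y):
    """Return (predicted scores, raw residuals, standardized residuals)."""
    intercept, slope = fit_line(x, y)
    predicted = [intercept + slope * a for a in x]
    raw = [b - p for b, p in zip(y, predicted)]
    # Standard error of the estimate with one predictor: sqrt(SSE / (n - 2)).
    std_error = math.sqrt(sum(r * r for r in raw) / (len(raw) - 2))
    return predicted, raw, [r / std_error for r in raw]


# Synthetic data (assumption): normal X and normal errors, n = 200.
rng = random.Random(1)
x = [rng.gauss(50, 10) for _ in range(200)]
y = [25 + 0.5 * a + rng.gauss(0, 9) for a in x]

predicted, res, zre = residuals_from_fit(x, y)
print("mean of standardized residuals:", round(statistics.fmean(zre), 7))
print("minimum, maximum:", round(min(zre), 5), round(max(zre), 5))
print("range:", round(max(zre) - min(zre), 5))
```

Plotting `zre` against `predicted` would give the residuals plot; with well-behaved data the points scatter evenly about zero at every level of predicted Y.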
Explore also shows trouble:

Descriptives: ZRE_1

                      Statistic
Mean                  .0000000
Minimum               -1.87474
Maximum               3.61399
Range                 5.48873
Interquartile Range   1.34039
Skewness              .803
Kurtosis              .965

Notice the outliers in the boxplot. Maybe we can solve this problem by taking the square root of Y2. Predict the square root of Y2 from X.

Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .459(a)  .211      .207               1.75738

a. Predictors: (Constant), X
b. Dependent Variable: Y2_SQRT

Descriptives: ZRE_1

                      Statistic   Std. Error
Mean                  .0000000    .07053279
Minimum               -2.21496
Maximum               2.82660
Range                 5.04156
Interquartile Range   1.42707
Skewness              .133        .172
Kurtosis              -.240       .342

Notice that the transformation did wonders, reducing the skewness of the residuals to a comfortable level.

We are done with the Residual-Skew data set now. Read into SPSS the ANOVA1.sav data file. Conduct a linear regression analysis to predict illness from dose of drug. Save the standardized residuals and obtain the same plots that we produced above.

Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .110(a)  .012      .002               12.113

a. Predictors: (Constant), dose
b. Dependent Variable: illness

Look at the residuals plot. Oh my. Notice that the residuals are not symmetrically distributed about zero. They are mostly positive with low and high values of predicted Y and mostly negative with medium values of predicted Y. If you were to find the means of the residuals at each level of Y and connect those means with a line, you would get a curve with one bend. This strongly suggests that the relationship between X and Y is not linear and you should try a nonlinear model. Notice that the problem is not apparent when we look at the marginal distribution of the residuals.

Produce the new variable Dose_SQ by squaring Dose, OK.

Now predict Illness from a combination of Dose and Dose_SQ. Ask for the usual plots and save residuals and predicted scores.

Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .657(a)  .431      .419               9.238

a. Predictors: (Constant), Dose_SQ, dose
b. Dependent Variable: illness

Notice that the R has gone up a lot and is now significant, and the residuals plot looks fine.

Let us have a look at the regression line. We saved the predicted scores (PRE_1), so we can plot their means against dose of the drug: Click Graphs, Line, Simple, Define. Select Line Represents Other statistic and scoot PRE_1 into the variable box. Scoot Dose into the Category Axis box. OK.

Wow, that is certainly no straight line. What we have done here is a polynomial regression, fitting the data with a quadratic line. A quadratic line can have one bend in it.

Let us get a scatter plot with the data and the quadratic regression line. Click Graphs, Scatter, Simple Scatter, Define. Scoot Illness into the Y-axis box and Dose into the X-axis box. OK. Double-click the graph to open the chart editor and select Elements, Fit Line at Total. SPSS will draw a nearly flat, straight line. In the Properties box change Fit Method from Linear to Quadratic. Click Apply and then close the chart editor.

We are done with the ANOVA1.sav data for now. Bring into SPSS the Residual-Hetero.dat data. Each case has two scores, X and Y. The delimiter is a blank space. Conduct a regression analysis predicting Y from X. Create residuals plots and save the standardized residuals as we have been doing with each analysis.

As you can see, the residuals plot shows clear evidence of heteroscedasticity. In this case, the error in predicted Y increases as the value of predicted Y increases. I have been told that transforming one of the variables sometimes reduces heteroscedasticity, but in my experience it often does not help.
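The fan-shaped pattern of heteroscedasticity can be put into numbers with an informal check: fit the line, sort the cases by predicted Y, and compare the spread of the residuals in the lower and upper halves. The Python below is only a sketch of that idea, not a formal test and not something SPSS produces. The data are synthetic (generated here so that the error grows with X, not read from Residual-Hetero.dat), and the function names are my own.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def spread_by_half(x, y):
    """SD of residuals for the lower and upper halves of predicted Y."""
    intercept, slope = fit_line(x, y)
    # Pair each predicted score with its residual, then sort by predicted score.
    pairs = sorted(
        (intercept + slope * a, b - (intercept + slope * a)) for a, b in zip(x, y)
    )
    half = len(pairs) // 2
    low = [resid for _, resid in pairs[:half]]
    high = [resid for _, resid in pairs[half:]]
    return statistics.stdev(low), statistics.stdev(high)


# Synthetic heteroscedastic data (assumption): error SD is proportional to X.
rng = random.Random(14)
x = [rng.uniform(10, 100) for _ in range(200)]
y = [5 + 0.5 * a + rng.gauss(0, 0.2 * a) for a in x]

sd_low, sd_high = spread_by_half(x, y)
print("residual SD, lower half of predicted Y:", round(sd_low, 2))
print("residual SD, upper half of predicted Y:", round(sd_high, 2))
print("ratio (upper / lower):", round(sd_high / sd_low, 2))
```

With homoscedastic data the ratio should be near 1; a ratio well above 1 corresponds to the residuals fanning out as predicted Y increases.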