
Producing and Interpreting Residuals Plots in SPSS

In a linear regression analysis it is assumed that the distribution of residuals,
$(Y - \hat{Y})$, is, in the population, normal at every level of predicted Y and constant in
variance across levels of predicted Y. I shall illustrate how to check that assumption.
Although I shall use a bivariate regression, the same technique would work for a
multiple regression.
Start by downloading Residual-Skew.dat and Residual-Hetero.dat from my
StatData page and ANOVA1.sav from my SPSS data page. Each line of data has four
scores: X, Y, X2, and Y2. The delimiter is a blank space.
Create new variable SQRT_Y2 this way: Transform, Compute, OK.
First some descriptive statistics on the variables:

Descriptive Statistics

Variable            N    Minimum  Maximum  Mean     Std. Deviation  Skewness  Std. Error  Kurtosis  Std. Error
X                   200  16       74       48.59    9.985           -.053     .172        .161      .342
Y                   200  22       77       49.66    9.847           -.069     .172        -.244     .342
X2                  200  1        144      45.68    27.123          1.046     .172        1.038     .342
Y2                  200  3        163      48.43    27.474          .948      .172        1.250     .342
Y2_SQRT             200  1.73     12.77    6.6751   1.97299         .171      .172        -.196     .342
Valid N (listwise)  200

Notice that variables X and Y are not skewed; I generated them with a normal
random number generator. Notice that X2 and Y2 are skewed and that taking the
square root of Y2 reduces its skewness greatly.
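
The effect of a square root transformation on skewness can also be sketched outside SPSS. The Python below is illustrative only: the scores are simulated (they are not the values in Residual-Skew.dat), skewness and skewness_se are hypothetical helper names, and the formulas are the bias-adjusted skewness and its standard error that I understand SPSS to report. For N = 200 this standard error is about .172.

```python
import math
import random


def skewness(values):
    """Bias-adjusted sample skewness (the G1 form)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_se(n):
    """Standard error of the skewness statistic; it depends only on n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Simulated, positively skewed scores standing in for a variable like Y2.
rng = random.Random(1)
y2 = [rng.gammavariate(3.0, 16.0) for _ in range(200)]
y2_sqrt = [math.sqrt(v) for v in y2]

raw_skew = skewness(y2)
sqrt_skew = skewness(y2_sqrt)

print(f"skewness before transformation: {raw_skew:.3f}")
print(f"skewness after square root:     {sqrt_skew:.3f}")
print(f"standard error for N = 200:     {skewness_se(200):.3f}")
```

Dividing a skewness statistic by its standard error gives a rough sense of whether the departure from symmetry is larger than sampling error alone would produce.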
Here we predict Y from X, produce a residuals plot, and save the residuals.

Copyright 2007, Karl L. Wuensch - All rights reserved.

Residual-Plots-SPSS.doc
Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .450(a)  .203      .199               8.815

a. Predictors: (Constant), X2
b. Dependent Variable: Y

Here is a histogram of the residuals with a normal curve superimposed. The
residuals look close to normal.
Here is a plot of the residuals versus predicted Y. The pattern shown here
indicates no problems with the assumption that the residuals are normally distributed at
each level of Y and constant in variance across levels of Y. SPSS does not
automatically draw in the regression line (the horizontal line at residual = 0). I double
clicked the chart and then selected Elements, Fit Line at Total to get that line.
SPSS has saved the residuals, unstandardized (RES_1) and standardized
(ZRE_1) to the data file:
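
What gets saved can be mimicked in a few lines. This is a minimal sketch with simulated data, assuming that the standardized residual is the raw residual divided by the standard error of the estimate. The helper fit_line is hypothetical, and the lists residuals and z_residuals only play the roles of RES_1 and ZRE_1.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulated scores: X is normal, Y is a linear function of X plus normal error.
rng = random.Random(2)
x = [rng.gauss(50, 10) for _ in range(200)]
y = [25 + 0.5 * a + rng.gauss(0, 9) for a in x]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * a for a in x]

# Unstandardized residuals: observed minus predicted.
residuals = [b - p for b, p in zip(y, predicted)]

# Standard error of the estimate: square root of SSE / (n - 2).
see = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))

# Standardized residuals: each residual divided by the standard error of the estimate.
z_residuals = [r / see for r in residuals]

print(f"standard error of the estimate: {see:.3f}")
print(f"largest |standardized residual|: {max(abs(z) for z in z_residuals):.3f}")
```

Because the residuals from a least squares fit sum to zero, their mean is zero by construction, which is why the mean of the saved residuals carries no diagnostic information.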
Analyze, Explore ZRE_1 to get a better picture of the standardized residuals.
The plots look fine. As you can see, the skewness and kurtosis of the residuals are about
what you would expect if they came from a normal distribution:
Descriptives

ZRE_1     Statistic
Mean      .0000000
Minimum   -2.55481
Maximum   2.65518
Range     5.20999
Skewness  -.074
Kurtosis  -.264

Now predict Y from the skewed X2.
You conduct this analysis with the same plots and saved residuals as above.
You will notice that the residuals plots and exploration of the saved residuals
indicate no problems for the regression model. The skewness of X2 may be
troublesome for the correlation model, but not for the regression model.
Next, predict skewed Y2 from X.
Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .452(a)  .204      .200               24.581

a. Predictors: (Constant), X
b. Dependent Variable: Y2

Notice that the residuals plot shows the residuals not to be normally distributed; they
are pulled out (skewed) towards the top of the plot. Explore also shows trouble:
Descriptives

ZRE_1                Statistic
Mean                 .0000000
Minimum              -1.87474
Maximum              3.61399
Range                5.48873
Interquartile Range  1.34039
Skewness             .803
Kurtosis             .965

Notice the outliers in the boxplot.
Maybe we can solve this problem by taking the square root of Y2. Predict the
square root of Y2 from X.
Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .459(a)  .211      .207               1.75738

a. Predictors: (Constant), X
b. Dependent Variable: Y2_SQRT

Descriptives

ZRE_1                Statistic  Std. Error
Mean                 .0000000   .07053279
Minimum              -2.21496
Maximum              2.82660
Range                5.04156
Interquartile Range  1.42707
Skewness             .133       .172
Kurtosis             -.240      .342

Notice that the transformation did wonders, reducing the skewness of the
residuals to a comfortable level.
We are done with the Residual-Skew data set now. Read into SPSS the
ANOVA1.sav data file. Conduct a linear regression analysis to predict illness from dose
of drug. Save the standardized residuals and obtain the same plots that we produced
above.
Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .110(a)  .012      .002               12.113

a. Predictors: (Constant), dose
b. Dependent Variable: illness

Look at the residuals plot. Oh my. Notice that the residuals are not
symmetrically distributed about zero. They are mostly positive with low and high values
of predicted Y and mostly negative with medium values of predicted Y. If you were to
find the means of the residuals at each level of Y and connect those means with a line
you would get a curve with one bend. This strongly suggests that the relationship
between X and Y is not linear and you should try a nonlinear model. Notice that the
problem is not apparent when we look at the marginal distribution of the residuals.
Produce the new variable Dose_SQ by squaring Dose, OK.
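
The check described above, averaging the residuals at each level and looking for a bend, can be sketched as follows. The data are simulated with a deliberately U-shaped relationship and are not the ANOVA1.sav scores. The five dose levels, the noise level, and the helper fit_line are all assumptions made for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical design: five dose levels, 20 simulated cases at each level.
rng = random.Random(3)
dose_levels = [0, 10, 20, 30, 40]
dose, illness = [], []
for d in dose_levels:
    for _ in range(20):
        dose.append(float(d))
        # U-shaped relationship plus normal error (an assumption, not real data).
        illness.append(0.05 * (d - 20) ** 2 + rng.gauss(0, 3))

# Fit the (wrong) straight-line model and compute its residuals.
intercept, slope = fit_line(dose, illness)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(dose, illness)]

# Mean residual at each dose level; a linear model that fits shows means near zero.
mean_residual = {}
for d in dose_levels:
    group = [r for xi, r in zip(dose, residuals) if xi == d]
    mean_residual[d] = sum(group) / len(group)

for d in dose_levels:
    print(f"dose {d:2d}: mean residual {mean_residual[d]:6.2f}")
```

With data like these the means run positive, negative, positive across the dose levels, which is the one-bend pattern that points to a quadratic term.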
Now predict Illness from a combination of Dose and Dose_SQ. Ask for the usual
plots and save residuals and predicted scores.
Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .657(a)  .431      .419               9.238

a. Predictors: (Constant), Dose_SQ, dose
b. Dependent Variable: illness

Notice that the R has gone up a lot and is now significant, and the residuals plot
looks fine.
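
Adding the squared predictor turns the analysis into an ordinary multiple regression with two predictors. The sketch below uses simulated U-shaped data rather than the ANOVA1.sav scores and compares R squared for the linear and the quadratic models. The helpers solve and r_squared are hypothetical and fit the models through the normal equations, which is adequate for a small example like this one.

```python
import random


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R squared from regressing y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated data: five hypothetical dose levels, 20 cases each, U-shaped relationship.
rng = random.Random(4)
dose, illness = [], []
for d in [0, 10, 20, 30, 40]:
    for _ in range(20):
        dose.append(float(d))
        illness.append(0.05 * (d - 20) ** 2 + rng.gauss(0, 3))

dose_sq = [d * d for d in dose]          # plays the role of Dose_SQ

r2_linear = r_squared([dose], illness)
r2_quadratic = r_squared([dose, dose_sq], illness)

print(f"R squared, dose only:        {r2_linear:.3f}")
print(f"R squared, dose and dose_sq: {r2_quadratic:.3f}")
```

The quadratic model can never fit worse than the linear one, since the linear model is the special case in which the coefficient for the squared term is zero. What matters is how much the fit improves.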
Let us have a look at the regression line. We saved the predicted scores
(PRE_1), so we can plot their means against dose of the drug:
Click Graphs, Line, Simple, Define.
Select Line Represents Other statistic and scoot PRE_1 into the variable box.
Scoot Dose into the Category Axis box. OK.
Wow, that is certainly no straight line. What we have done here is a polynomial
regression, fitting the data with a quadratic line. A quadratic line can have one bend in
it.
Let us get a scatter plot with the data and the quadratic regression line. Click
Graphs, Scatter, Simple Scatter, Define. Scoot Illness into the Y-axis box and Dose into
the X-axis box. OK. Double-click the graph to open the graph editor and select
Elements, Fit line at total. SPSS will draw a nearly flat, straight line. In the Properties
box change Fit Method from Linear to Quadratic.
Click Apply and then close the chart editor.
We are done with the ANOVA1.sav data for now. Bring into SPSS the
Residual-HETERO.dat data. Each case has two scores, X and Y. The delimiter is a blank space.
Conduct a regression analysis predicting Y from X. Create residuals plots and save the
standardized residuals as we have been doing with each analysis.
As you can see, the residuals plot shows clear evidence of heteroscedasticity. In
this case, the error in predicted Y increases as the value of predicted Y increases. I
have been told that transforming one of the variables sometimes reduces
heteroscedasticity, but in my experience it often does not help.
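
One rough numeric companion to the plot is to compare the spread of the residuals at low and at high predicted values. This sketch uses simulated data in which the error standard deviation grows with X. That is an assumption chosen to produce heteroscedasticity, and the scores are not those in the data file. The helpers fit_line and sd are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Simulated heteroscedastic data: the error SD is proportional to X.
rng = random.Random(5)
x = [rng.uniform(10, 100) for _ in range(200)]
y = [5 + 0.8 * xi + rng.gauss(0, 0.2 * xi) for xi in x]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Sort cases by predicted Y and split them into a lower and an upper half.
pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low_half = [r for _, r in pairs[:half]]
high_half = [r for _, r in pairs[half:]]

sd_low = sd(low_half)
sd_high = sd(high_half)

print(f"residual SD, lower half of predicted Y: {sd_low:.2f}")
print(f"residual SD, upper half of predicted Y: {sd_high:.2f}")
```

Under homoscedasticity the two standard deviations should be similar. A clearly larger value in one half corresponds to the fan shape seen in the residuals plot.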
Return to Wuensch's SPSS Lessons Page