
Instructions for Conducting Multiple Linear Regression Analysis in SPSS

Multiple linear regression analysis is used to examine the relationship between two or more independent variables and one dependent variable. The independent variables can be measured at any level (i.e., nominal, ordinal, interval, or ratio). However, nominal- or ordinal-level IVs that have more than two values or categories (e.g., race) must be recoded prior to conducting the analysis, because linear regression procedures can only handle interval- or ratio-level IVs and nominal- or ordinal-level IVs with a maximum of two values (i.e., dichotomous). The dependent variable MUST be measured at the interval or ratio level.

In this demonstration we use base year standardized reading score (BY2XRSTD) as the dependent variable and socio-economic status (BYSES), family size (BYFAMSIZ), self-concept (BYCNCPT1), urban residence (URBAN), rural residence (RURAL), and sex (GENDER) as independent variables. In order to conduct the analysis, we have to recode two variables from the original data set [i.e., urbanicity (G8URBAN) into urban residence (URBAN) and rural residence (RURAL), and sex (SEX) into sex (GENDER)]. We include instructions for this step as well. Although not used in the analysis, the data set also includes two identification variables [student ID (STU_ID) and school ID (SCH_ID)] and one weight variable (F2PNWLWT).

Copies of the data set and output are available on the companion website. The data set file is entitled "REGRESSION.SAV". The output file is entitled "Multiple Linear Regression results.spv". The following instructions are divided into three sets of steps:

1. Recode G8URBAN and SEX into new dichotomous variables (i.e., URBAN, RURAL, & GENDER)

2. Conduct preliminary analyses
   a. Examine descriptive statistics of the continuous variables
   b. Check the normality assumption by examining histograms of the continuous variables
   c. Check the linearity assumption by examining correlations between continuous variables and scatter diagrams of the dependent variable versus independent variables

3. Conduct multiple linear regression analysis
   a. Run model with dependent and independent variables
   b. Model check
      i. Examine collinearity diagnostics to check for multicollinearity
      ii. Examine residual plots to check error variance assumptions (i.e., normality and homogeneity of variance)
      iii. Examine influence diagnostics (residuals, dfbetas) to check for outliers
      iv. Examine significance of coefficient estimates to trim the model
   c. Revise the model and rerun the analyses based on the results of steps i-iv
   d. Write the final regression equation and interpret the coefficient estimates

To get started, open the SPSS data file entitled REGRESSION.SAV.

STEP I: Recode SEX and G8URBAN into dichotomous variables

Run frequencies for SEX and G8URBAN. This will be helpful later for verifying that the dichotomous variables were created correctly. At the Analyze menu, select Descriptive Statistics. Click Frequencies. Click SEX and G8URBAN and move them to the Variables box. Click OK.

Prior to recoding SEX into a dichotomous variable, we need to determine what the numeric values are for Male and Female. To do this, click on SPSS Variable View at the bottom left corner of your SPSS screen. Click the Values column for the SEX variable. Click on the grey bar to reveal how Male and Female are coded. We see that Male is coded as "1" and Female is coded as "2". Click Cancel.

To run multiple regression analysis in SPSS, the values for the SEX variable need to be recoded from '1' and '2' to '0' and '1'. Because the value for Male is already coded 1, we only need to recode the value for Female, from '2' to '0'. Open the Transform menu at the top of the SPSS menu bar. Select and click Recode into Different Variables.

On the variables list in the box on the left, click SEX and move it to the center box.

Within the Output Variable Name box, type "GENDER" to rename the SEX variable. Click Change.

Click Old and New Values. The "Old Value" is the value for the level of the categorical variable (SEX) to be changed. The "New Value" is the value for the level on the recoded variable (GENDER).

The value for Male is "1" on both SEX and GENDER. Type "1" in the Old Value box; type "1" in the New Value box. Click Add. The value for Female is recoded from "2" on SEX to "0" on GENDER. Type "2" in the Old Value box; type "0" in the New Value box. Click Add.

Click Continue. Click OK.

The variable GENDER will be added to the dataset.

Next we label the values for the GENDER categories (Male and Female). Click Variable View on the bottom left corner of the data spreadsheet.

Click the grey bar on the Values column for GENDER.

Type "1" into the Value box; type "Male" into the Label box. Click Add. Type "0" into the Value box; type "Female" into the Label box. Click Add. Click OK.

To determine whether GENDER was created correctly, run a frequency on the newly created GENDER variable.

The counts for Males and Females should match those in the SEX variable.
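The recode and the frequency check can also be sketched outside SPSS. The following Python lines are only an illustration under stated assumptions: the six SEX codes are made-up values, not records from REGRESSION.SAV, and the mapping simply mirrors the Old and New Values entries described above.

```python
from collections import Counter

# Hypothetical SEX codes for six students (1 = Male, 2 = Female); not real data.
sex = [1, 2, 2, 1, 2, 1]

# Same mapping as the Old and New Values dialog: 1 stays 1, 2 becomes 0.
recode_map = {1: 1, 2: 0}
gender = [recode_map[value] for value in sex]

# Frequency check: the Male and Female counts must agree before and after.
sex_counts = Counter(sex)
gender_counts = Counter(gender)

print(gender)
print("Males match:", sex_counts[1] == gender_counts[1])
print("Females match:", sex_counts[2] == gender_counts[0])
```

If either comparison prints False, the recode mapping was entered incorrectly.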

We now need to examine G8URBAN to determine its categories and how it should be recoded. Click on SPSS Variable View. Click the Values column for the G8URBAN variable. Click on the grey bar to reveal how the G8URBAN categories are coded. We see that G8URBAN has three categories.

Recall that in multiple regression analysis, independent variables must be measured as either interval/ratio or nominal/ordinal with only two values (i.e., dichotomous). Thus, we need to recode G8URBAN. A rule of thumb in creating dichotomous variables is that for categorical independent variables with more than two categories (i.e., k), we create k minus 1 dichotomous variables. In this case, G8URBAN has 3 categories; thus, we will create 2 dichotomous variables (3 - 1 = 2). The category that is not included in the dichotomous variables is referred to as the reference category. In this example, SUBURBAN is the reference category. The two new dichotomous variables are named URBAN and RURAL. The recoding is done in the same way that we recoded SEX. The following table shows the values of the old variable (i.e., G8URBAN) and the values for each of the new dichotomous variables (URBAN and RURAL):

Variables        Values
                 Urban    Suburban              Rural
Old: G8URBAN     1        2                     3
New: URBAN       1        Reference Category    0
New: RURAL       0        Reference Category    1

To create URBAN from G8URBAN, open the Transform menu at the top of the SPSS menu bar. Select and click Recode into Different Variables.

If necessary, click Reset to clear previous selections. On the variables list in the box on the left, click G8URBAN and move it to the center box. Within the Output Variable Name box, type URBAN. Click Change.

Click Old and New Values. Type "1" in the Old Value box; type "1" in the New Value box. Click Add. Type "2" in the Old Value box; type "0" in the New Value box. Click Add. Type "3" in the Old Value box; type "0" in the New Value box. Click Add. Click Continue.

Click OK.

A new variable URBAN will be added to the dataset. Run frequencies on URBAN to verify that the number of 1's in the URBAN category matches the number of URBAN cases in the original G8URBAN variable.


To create RURAL from G8URBAN, open the Transform menu at the top of the SPSS menu bar. Select and click Recode into Different Variables.

Click Reset to clear the previous selections. On the variables list in the box on the left, click G8URBAN and move it to the center box. Within the Output Variable Name box, type RURAL. Click Change.

Click Old and New Values. Type "1" in the Old Value box; type "0" in the New Value box. Click Add. Type "2" in the Old Value box; type "0" in the New Value box. Click Add. Type "3" in the Old Value box; type "1" in the New Value box. Click Add. Click Continue.

Click OK. A new variable RURAL will be added to the dataset. Run frequencies on RURAL to verify that the number of 1's in the RURAL category matches the number of RURAL cases in the original G8URBAN variable.
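The k minus 1 rule can be illustrated with a short Python sketch. It is not part of the SPSS procedure; the G8URBAN codes below are invented for the example, and make_dummy is a hypothetical helper name. Suburban (code 2) is the reference category, so it receives 0 on both new variables.

```python
# Hypothetical G8URBAN codes (1 = Urban, 2 = Suburban, 3 = Rural); not real data.
g8urban = [1, 2, 3, 3, 2, 1, 2]


def make_dummy(values, target):
    """Return 1 where a value equals the target code and 0 otherwise."""
    return [1 if value == target else 0 for value in values]


# Three categories give 3 - 1 = 2 dichotomous variables.
urban = make_dummy(g8urban, 1)
rural = make_dummy(g8urban, 3)

# Verification, as with the frequency tables: the 1's must match the original counts.
print(urban, sum(urban) == g8urban.count(1))
print(rural, sum(rural) == g8urban.count(3))
```

A suburban student is identified by URBAN = 0 and RURAL = 0, which is why no third variable is needed.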


STEP II: PRELIMINARY ANALYSES

Dependent Variable: BY2XRSTD
Independent Variables: BYSES, BYFAMSIZ, BYCNCPT1, GENDER, URBAN, RURAL

***NOTE: The list of independent variables does not include the SEX and G8URBAN variables. This is because we replaced them with the recoded and dichotomous variables.***

Examine Descriptive Statistics

In conducting preliminary analyses, the first step is to examine various descriptive statistics of the continuous variables (i.e., BY2XRSTD, BYSES, BYFAMSIZ, and BYCNCPT1). On the Analyze menu, select Descriptive Statistics. Click Descriptives. Click on BY2XRSTD, BYSES, BYFAMSIZ, and BYCNCPT1, one at a time, and click the arrow button to add them to the Variables box. Click Options. Check Mean, Std. deviation, Minimum, Maximum, Kurtosis, and Skewness. Click Continue. Click OK.
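For readers who want to see what the skewness and kurtosis indices measure, here is a minimal Python sketch. It uses the common sample-adjusted formulas (the same ones behind Excel's SKEW and KURT functions); that SPSS reports exactly these versions is an assumption of this sketch, and the five scores are made-up values.

```python
import statistics


def skewness(values):
    """Sample-adjusted skewness; 0 for a perfectly symmetric sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    total = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * total


def kurtosis(values):
    """Sample-adjusted excess kurtosis; near 0 for normally distributed data."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    total = sum(((v - mean) / sd) ** 4 for v in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return lead * total - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


# Hypothetical scores; not values from the data set.
scores = [1, 2, 3, 4, 5]
print(min(scores), max(scores), statistics.mean(scores), statistics.stdev(scores))
print(skewness(scores), kurtosis(scores))
```

Values near zero on both indices are what the Descriptives output is being inspected for.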

The values for the Skewness and Kurtosis indices are very small, which indicates that the variables most likely do not include influential cases or outliers.

Check Normality Assumption

As the second step, we examine the distributions of the dependent variable and of the independent variables measured at the interval/ratio level to check the normality assumption. To check BYSES, click the Graphs menu. Select Legacy Dialogs. Click Histograms. Click and move BYSES to the Variable box. Check Display normal curve. Click OK.

As the graph shows, BYSES is normally distributed.

To check BYFAMSIZ, click the Graphs menu. Select Legacy Dialogs. Click Histograms. Click Reset to clear the variable box. Click and move BYFAMSIZ to the Variable box. Check Display normal curve. Click OK.

Although BYFAMSIZ is slightly skewed in a positive direction, the normality assumption is not violated. To check BYCNCPT1, click the Graphs menu. Select Legacy Dialogs. Click Histograms. Click Reset to clear the variable box. Click and move BYCNCPT1 to the Variable box. Check Display normal curve. Click OK.

Sometimes the way in which SPSS determines the number of intervals for histograms results in a histogram with a shape that is difficult to interpret (as in the preceding histogram for self-concept). When this happens, we can manually adjust the number of intervals to improve interpretation. Between 15 and 20 intervals is recommended for a data set with more than 250 cases. The following revised histogram includes 15 intervals.

As the graph indicates, self-concept is negatively skewed. The distribution of self-concept scores is not normal and thus violates the normality assumption.

To check BY2XRSTD, click the Graphs menu. Select Legacy Dialogs. Click Histograms. Click Reset to clear the variable box. Click and move BY2XRSTD to the Variable box. Check Display normal curve. Click OK. The number of intervals for this histogram has also been adjusted to 15 using the same procedure as for self-concept.

The standardized reading score is slightly skewed in a positive direction, but not enough to violate the normality assumption.

Check Linearity Assumption

Another linear regression assumption is that the relationship between the dependent and independent variables is linear. We can check this assumption by examining scatterplots of the dependent and independent variables. First, we calculate Pearson correlation coefficients between the DV and the IVs measured at the interval/ratio level to get an indication of the magnitude of the relationship between variable pairs. Click Analyze. Select Correlate. Click Bivariate. Click and move the dependent and the continuous independent variables to the Variables box. Check Pearson. Select Two-tailed significance. Check Flag significant correlations. Click OK.
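As a reminder of what the Pearson coefficient in the Bivariate output represents, the following Python sketch computes it directly from its definition. The two short lists are hypothetical values chosen for easy arithmetic, not BYSES or BY2XRSTD scores, and pearson_r is an illustrative helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-deviation of x and y scaled by their spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores; not values from the data set.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

r = pearson_r(predictor, outcome)
print(round(r, 3))
```

The sign gives the direction of the relationship and the absolute value its strength. The coefficient only summarizes linear association, which is why the scatter diagrams are examined as well.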

The results indicate a moderate positive correlation between socio-economic status and reading scores, a weak positive correlation between self-concept and reading scores, and a weak negative correlation between family size and reading scores.

To create a scatter diagram of BY2XRSTD by BYSES, click the Graphs menu. Select Legacy Dialogs. Click Scatter/Dots. Select Simple scatter. Click Define. Click and move BY2XRSTD to the Y Axis. Click and move BYSES to the X Axis. Click OK.

The graph shows a positive linear relationship between reading scores and socio-economic status. The correlation between the two variables is significant, so we can conclude that there is a linear relationship between BY2XRSTD and BYSES, thus not violating the linearity assumption.

To create a scatter diagram of BY2XRSTD by BYCNCPT1, click the Graphs menu. Select Legacy Dialogs. Click Scatter/Dots. Select Simple scatter. Click Define. Click Reset to clear the axes boxes. Click and move BY2XRSTD to the Y Axis. Click and move BYCNCPT1 to the X Axis. Click OK.

Although this graph suggests that the linearity assumption may be violated here, we will keep self-concept in the model because its correlation with reading scores is significant. However, this violation must be noted as a limitation of the model.

To create a scatter diagram of BY2XRSTD by BYFAMSIZ, click the Graphs menu. Select Legacy Dialogs. Click Scatter/Dots. Select Simple scatter. Click Define. Click Reset to clear the axes boxes. Click and move BY2XRSTD to the Y Axis. Click and move BYFAMSIZ to the X Axis. Click OK.

This graph suggests that the linearity assumption may be violated here as well. In fact, the graph shows similar reading scores among students in families with two to seven family members. The reading scores of students from families with more than seven members appear to be lower. We could recode family size to a dichotomous variable (0 = less than 8 family members, 1 = 8 or more family members) as an alternative way to evaluate its impact. This violation must also be noted as a limitation of the model. Based on the outcome of this assessment, we retain all continuous independent variables in the model because they are significantly correlated with the dependent variable.

STEP III: MULTIPLE LINEAR REGRESSION ANALYSIS

Dependent Variable: BY2XRSTD
Independent Variables: BYSES, BYFAMSIZ, BYCNCPT1, GENDER, URBAN, RURAL

***NOTE: The list of independent variables does not include the SEX and G8URBAN variables. This is because we replaced them with the recoded and dichotomous variables.***

In addition to providing instructions for running the multiple regression analysis, we also include instructions for checking model requirements (i.e., outliers and influential cases and multicollinearity among the continuous independent variables) and the homogeneity of variance assumption. Recall that outliers and influential cases can negatively affect the fit of the model or the parameter estimates. We calculate residuals and dfbetas using the Influence Diagnostics procedure to check for outliers and influential cases. We calculate the Variance Inflation Factor (VIF) and Tolerance statistics to check for multicollinearity. Finally, we create homogeneity of error variance plots to check the homogeneity of variance. The dataset used to conduct these analyses is included on the companion website. It is entitled REGRESSION.SAV.

The first step is to enter the dependent and independent variables into the SPSS model. Click on Analyze in the menu bar at the top of the data view screen. A drop-down menu will appear. Move your cursor to Regression. Another drop-down menu will appear to the right of the first menu. Click on Linear to open the Linear Regression dialog box.

Click to highlight BY2XRSTD in the box on the left side. Click the top arrow to move it to the Dependent box. Click to highlight BYSES, BYFAMSIZ, BYCNCPT1, GENDER, URBAN, and RURAL and move each of them one at a time to the Independent(s) box.

Next, we program SPSS to provide a check for multicollinearity. Click Statistics in the upper right corner to open the Linear Regression: Statistics dialogue box. Make sure that Estimates, Model fit, and Collinearity Diagnostics are checked. Click Continue to return to the Linear Regression dialogue box.

To create residual plots to check the homogeneity of variance and normality assumptions, click Plots just below Statistics to open the Linear Regression: Plots dialogue box. Click and move "*ZRESID" to the Y box. Click and move "*ZPRED" to the X box. Check "Histogram". Click Continue to return to the Linear Regression dialogue box.

To check for outliers and influential cases, click Save just below Plots to open the Linear Regression: Save dialogue box. Under Residuals, check "Standardized". Under Influence Statistics, check "Standardized DfBetas". Click Continue to return to the Linear Regression dialogue box.

Click OK at the bottom of the Linear Regression dialogue box to run the multiple linear regression analysis.

We now examine the output, including findings with regard to multicollinearity, whether the model should be trimmed (i.e., removing insignificant predictors), violation of the homogeneity of variance and normality assumptions, and outliers and influential cases.

The R² is .225. This means that the independent variables explain 22.5% of the variation in the dependent variable.

The p value for the F statistic is < .05. This means that at least one of the independent variables is a significant predictor of the DV (standardized reading scores). The "Sig." column in the Coefficients table shows which variables are significant.
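To make the meaning of R² concrete, the sketch below fits a small least-squares model in plain Python and computes R² as 1 minus the ratio of residual to total variation. Everything here is an assumption for illustration: the five cases are invented, solve and fit_ols are hypothetical helper names, and the normal-equations solver is a generic routine rather than the algorithm SPSS uses.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return (coefficients with intercept first, R squared)."""
    x = [[1.0] + list(row) for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Hypothetical cases built so that y = 1 + 2*x1 + 3*x2 exactly.
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
outcome = [1, 3, 4, 6, 8]

coefs, r_squared = fit_ols(predictors, outcome)
print([round(c, 6) for c in coefs], round(r_squared, 6))
```

Because the toy outcome is an exact linear function of the predictors, R² comes out at 1; real data such as the reading scores leave unexplained variation, which is why the value in the output is far lower.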

It appears multicollinearity is not a concern because the VIF scores are less than 3. In terms of model trimming, the results also show that BYFAMSIZ, URBAN, and RURAL are not significant predictors of the standardized reading scores. We will remove these variables from the model and rerun the analysis.

The histogram of residuals allows us to check the extent to which the residuals are normally distributed. The residuals histogram shows a fairly normal distribution. Thus, based on these results, the normality of residuals assumption is satisfied.
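The Tolerance and VIF statistics used in the multicollinearity check follow from a simple relationship: Tolerance for a predictor is 1 minus the R² obtained when that predictor is regressed on the other predictors, and VIF is the reciprocal of Tolerance. The Python sketch below illustrates this for the special case of two predictors, where that R² equals their squared correlation. The correlation of .5 is a hypothetical value, and tolerance_and_vif is an illustrative name.

```python
def tolerance_and_vif(r_squared_j):
    """Tolerance = 1 - R^2_j and VIF = 1 / Tolerance for one predictor."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R squared must be in [0, 1)")
    tolerance = 1 - r_squared_j
    return tolerance, 1 / tolerance


# Hypothetical correlation between two predictors; not from the output.
r_between_predictors = 0.5

# With exactly two predictors, R^2_j is the squared correlation between them.
tolerance, vif = tolerance_and_vif(r_between_predictors ** 2)
print(tolerance, round(vif, 3), vif < 3)
```

The comparison against 3 mirrors the cutoff applied to the VIF scores in the text.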

We examine a scatter plot of the residuals against the predicted values to evaluate whether the homogeneity of variance assumption is met. If it is met, there should be no pattern to the residuals plotted against the predicted values. In the following scatter plot, we see a slanting pattern, which suggests heteroscedasticity (i.e., violation of the homogeneity of variance assumption).

Finally, we examine the values of the standardized DfBetas and standardized residual values to identify outliers and influential cases. Large values suggest outliers or influential cases. Note that the results thus far (histograms and scatter plots of the continuous variables and residuals) showed no data point(s) that stood out as outliers. Thus, it is unlikely that we will find large standardized DfBetas or standardized residual values. Nonetheless, the standardized DfBeta values can verify this. The values of the standardized DfBetas have been added as seven additional variables in the data set (SDB0_1 through SDB6_1). Outliers or influential cases have large (< -2 or > 2) standardized DfBetas. Instead of manually scrolling through the values of each variable to check this, we can calculate maximum and minimum values.

Click Analyze in the menu bar at the top of the data view screen. A drop-down menu will appear. Move your cursor to Descriptive Statistics. Another drop-down menu will appear to the right of the first menu. Click on Descriptives to open the Descriptives dialog box. Highlight the seven SDB variables and move them to the Variables box. Click OK.

The results show no standardized DfBeta values < -2 or > 2. We can conclude that the dataset does not include outliers or influential cases.
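The minimum and maximum screening of the saved DfBeta variables amounts to the check sketched below in Python. The values are invented for the illustration (including one deliberately large value), only three of the seven variable names are shown, and the cutoff of 2 is the one stated in the text.

```python
# Hypothetical standardized DfBeta values for three saved variables; not real output.
dfbetas = {
    "SDB0_1": [0.12, -0.40, 0.05],
    "SDB1_1": [1.10, -0.30, 0.25],
    "SDB2_1": [0.30, 2.40, -0.10],  # contains one deliberately large value
}

CUTOFF = 2.0  # cases below -2 or above 2 are treated as outliers or influential

# Equivalent of reading the Minimum and Maximum columns of the Descriptives table.
ranges = {name: (min(values), max(values)) for name, values in dfbetas.items()}

flagged = [name for name, (low, high) in ranges.items()
           if low < -CUTOFF or high > CUTOFF]
print(ranges)
print(flagged)
```

An empty flagged list corresponds to the conclusion that no outliers or influential cases are present.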

A copy of the output for the analyses we just conducted is provided on the companion website. It is entitled Multiple Linear Regression Results_prelim.spv.

Fitting a Final Model

Given that BYFAMSIZ, URBAN, and RURAL are not significant, we remove them from the analysis and refit the model. The revised list of variables is:

Dependent Variable: BY2XRSTD
Independent Variables: BYSES, BYCNCPT1, GENDER

First, we must delete the Residual and DfBeta variables from the dataset. This can easily be done by highlighting them, right-clicking your mouse, and then clicking Clear.

Next, we rerun the model following the same steps used in conducting the preliminary analyses. Click Analyze in the menu bar at the top of the data view screen. A drop-down menu will appear. Move your cursor to Regression. Another drop-down menu will appear to the right of the first menu. Click Linear to open the Linear Regression dialog box. Click to highlight BY2XRSTD in the box on the left side. Click the top arrow to move it to the Dependent box. Click to highlight BYSES, BYCNCPT1, and GENDER and move each of them one at a time to the Independent(s) box. Click Statistics in the upper right corner to open the Linear Regression: Statistics dialogue box. Make sure that Estimates and Model fit are checked. Click Continue to return to the Linear Regression dialogue box.

To check the homogeneity of variance and normality assumptions, click Plots just below Statistics to open the Linear Regression: Plots dialogue box. Click and move "*ZRESID" to the Y box. Click and move "*ZPRED" to the X box. Check "Histogram". Click Continue to return to the Linear Regression dialogue box. Click OK at the bottom of the Linear Regression dialogue box to run the revised analyses. The output is provided as follows:

The R² = .224. This means that the independent variables explain 22.4% of the variation in the dependent variable. This value is almost the same as the R² value from the preliminary model, which confirms that the variables removed from the preliminary model contributed almost nothing to predicting reading scores.

The p value for the F statistic is < .05. This means that at least one independent variable is a significant predictor of reading scores. The Sig. column in the Coefficients table shows which variables are significant.

Plots of residuals and homogeneity of error variance look identical to the plots from the preliminary model, indicating that the normality of residuals assumption is met but the homogeneity of variance assumption is not met.

We examine the coefficients table to interpret the results. The prediction equation is based on the unstandardized coefficients, as follows:

BY2XRSTD_i = 52.51 + 5.9 BYSES_i + 1.336 BYCNCPT1_i - 2.346 GENDER_i,

where i = 1, ..., 1500 and GENDER = 1 for Males and 0 for Females. The BYSES coefficient is shown rounded to one decimal place.

We can use the unstandardized coefficients to interpret the results. The Constant is the predicted value of the dependent variable when all of the independent variables have a value of zero. In the context of this analysis, the predicted reading score for females with zero self-concept and zero socio-economic status score is 52.51.

The slope of socio-economic status (BYSES) is approximately 5.9. This means that for every one unit increase in socio-economic status, predicted reading scores increase by about 5.9 units, after controlling for self-concept and gender.

The slope of self-concept (BYCNCPT1) is 1.336. This means that for every one unit increase in self-concept, predicted reading scores increase by 1.336, after controlling for socio-economic status and gender.

The slope of gender (GENDER) is -2.346. This means that, on average, predicted reading scores for males are 2.346 points lower than for females, after controlling for socio-economic status and self-concept.
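The interpretation of the constant and the slopes can be checked by plugging values into the prediction equation. The Python sketch below does this under stated assumptions: the intercept and the self-concept and gender slopes are the values reported in the text, the BYSES slope is entered as 5.9 (a value rounded to one decimal place, used here only for illustration), and predict_reading is a hypothetical function name.

```python
# Unstandardized coefficients from the final model as reported in the text.
INTERCEPT = 52.51
B_BYSES = 5.9        # rounded to one decimal place; an assumption of this sketch
B_BYCNCPT1 = 1.336
B_GENDER = -2.346    # GENDER: 1 = Male, 0 = Female


def predict_reading(byses, bycncpt1, gender):
    """Predicted standardized reading score from the fitted equation."""
    return (INTERCEPT + B_BYSES * byses
            + B_BYCNCPT1 * bycncpt1 + B_GENDER * gender)


# The constant: a female student with zero SES and zero self-concept scores.
female_baseline = predict_reading(0, 0, 0)

# The gender slope: the male minus female difference with other predictors held equal.
male_baseline = predict_reading(0, 0, 1)
gender_gap = male_baseline - female_baseline

# The self-concept slope: the change for a one-unit increase, others held equal.
self_concept_step = predict_reading(0, 1, 0) - predict_reading(0, 0, 0)

print(female_baseline, round(gender_gap, 3), round(self_concept_step, 3))
```

Each difference reproduces the corresponding slope, which is what "after controlling for" the other variables means in this model.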
