Stepwise binary logistic regression is very similar to stepwise multiple regression in terms
of its advantages and disadvantages.
Stepwise logistic regression is designed to find the most parsimonious set of predictors
that are most effective in predicting the dependent variable.
Variables are added to the logistic regression equation one at a time, using the statistical
criterion of reducing the -2 Log Likelihood error for the included variables.
After each variable is entered, each of the included variables is tested to see whether the
model would be better off if that variable were excluded. This does not happen often.
The process of adding more variables stops when all of the available variables have been
included or when it is not possible to make a statistically significant reduction in -2 Log
Likelihood using any of the variables not yet included.
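The entry-and-stop logic described above can be sketched in Python. This is an illustration under stated assumptions, not SPSS's actual procedure: the entry test is a likelihood-ratio chi-square with 1 degree of freedom (one metric variable per step), the removal check is omitted, and the names `lr_p_value`, `forward_select`, and `TOY_NEG2LL`, as well as every -2LL value, are hypothetical.

```python
import math

def lr_p_value(drop):
    """p-value for a drop in -2LL, assuming 1 degree of freedom."""
    # For 1 df, the chi-square tail probability equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(drop, 0.0) / 2.0))

def forward_select(candidates, neg2ll, alpha=0.05):
    """Greedy forward entry; neg2ll maps a frozenset of variables to -2LL."""
    included, remaining = [], list(candidates)
    current = neg2ll(frozenset())
    while remaining:
        # Candidate whose entry yields the lowest -2LL.
        best = min(remaining, key=lambda v: neg2ll(frozenset(included + [v])))
        new = neg2ll(frozenset(included + [best]))
        if lr_p_value(current - new) >= alpha:
            break  # no significant reduction is available, so stop
        included.append(best)
        remaining.remove(best)
        current = new
    return included

# Hypothetical -2LL values for illustration only (not from any real output).
TOY_NEG2LL = {
    frozenset(): 200.0,
    frozenset({"a"}): 180.0,
    frozenset({"b"}): 190.0,
    frozenset({"c"}): 199.0,
    frozenset({"a", "b"}): 174.0,
    frozenset({"a", "c"}): 179.5,
    frozenset({"a", "b", "c"}): 173.8,
}

entered = forward_select(["a", "b", "c"], TOY_NEG2LL.__getitem__)
print(entered)  # ['a', 'b']
```

With these toy values, `a` and `b` enter, and `c` is left out because adding it lowers -2LL by only 0.2.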
Nonmetric variables are added to the logistic regression as a group. It is possible, and often
likely, that not all of the individual dummy-coded variables will have a statistically
significant individual relationship with the dependent variable. We limit our interpretation
to the dummy-coded variables that do have a statistically significant individual
relationship.
SPSS provides a table of variables included in the analysis and a table of variables
excluded from the analysis. It is possible that none of the variables will be included. It is
possible that all of the variables will be included.
The order of entry of the variables can be used as a measure of relative importance.
Once a variable is included, its interpretation in stepwise logistic regression is the same as
it would be using other methods for including variables.
The number of cases required for stepwise logistic regression is greater than the number
for the other forms. We will use the norm of 20 cases for each independent variable,
double the recommendation of Hosmer and Lemeshow.
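The 20-cases-per-variable norm is simple arithmetic, sketched below. The function name `minimum_cases` is hypothetical, and the sketch assumes each independent variable counts once, which the text does not specify for dummy-coded nonmetric variables.

```python
CASES_PER_VARIABLE = 20  # norm stated in the text

def minimum_cases(n_independent_variables, per_variable=CASES_PER_VARIABLE):
    """Minimum sample size under a cases-per-variable rule of thumb."""
    return n_independent_variables * per_variable

# Candidate predictors as named in the NMISS condition later in this document.
predictors = ["age", "educ", "sex", "rincom98", "fund", "sei"]
print(minimum_cases(len(predictors)))  # 6 * 20 = 120
```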
Stepwise logistic regression can be used when the goal is to produce a predictive model
that is parsimonious and accurate because it excludes variables that do not contribute to
explaining differences in the dependent variable.
Stepwise logistic regression is less useful for testing hypotheses about statistical
relationships. It is widely regarded as atheoretical and its usage is not recommended.
Stepwise logistic regression can be useful in finding relationships that have not been
tested before. Its findings invite one to speculate on why an unusual relationship makes
sense.
It is not legitimate to do a stepwise logistic regression and present the results as though
one were testing a hypothesis that included the variables found to be significant in the
stepwise logistic regression.
Using statistical criteria to determine relationships is vulnerable to over-fitting the data set
used to develop the model at the expense of generalizability.
When stepwise logistic regression is used, some form of validation analysis is a necessity.
We will use 75/25% cross-validation.
75/25% Cross-validation
To do cross-validation, we randomly split the data set into a 75% training sample and a
25% validation sample. We use the training sample to develop the model, and we then
apply the model to the validation sample to test how well it applies to cases not
used to develop it.
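A minimal Python sketch of the random 75/25 split follows. It assumes an exact 75% draw with a fixed seed, and it codes training cases as 1 and validation cases as 0. The 0 code, the seed, and the function name `make_split` are assumptions for illustration, and SPSS may assign cases differently.

```python
import random

def make_split(n_cases, train_fraction=0.75, seed=12345):
    """Return a list of 1 (training) / 0 (validation) flags, one per case."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    n_train = round(n_cases * train_fraction)
    train_ids = set(rng.sample(range(n_cases), n_train))
    return [1 if i in train_ids else 0 for i in range(n_cases)]

split = make_split(200)
print(sum(split), len(split) - sum(split))  # 150 training, 50 validation
```

Fixing the seed matters here because the validation result is only reproducible if the same cases land in the same sample each time.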
Note: shrinkage (the accuracy rate for the training sample minus the accuracy rate for the
validation sample) may be a negative value, indicating that the accuracy rate for the
validation sample is larger than the accuracy rate for the training sample. Negative
shrinkage (increase in accuracy) is evidence of a successful validation analysis.
If the validation is successful, we base our interpretation on the model that included all
cases.
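The shrinkage comparison can be written as a one-line calculation. The sketch below assumes shrinkage is the training accuracy rate minus the validation accuracy rate, which is what the note on negative values implies. The accuracy rates shown are hypothetical.

```python
def shrinkage(train_accuracy, validation_accuracy):
    """Training accuracy minus validation accuracy (negative = validation higher)."""
    return train_accuracy - validation_accuracy

# Hypothetical accuracy rates, for illustration only.
print(round(shrinkage(0.70, 0.66), 2))  # 0.04: accuracy dropped on validation
print(round(shrinkage(0.70, 0.73), 2))  # -0.03: negative shrinkage
```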
Click on the Continue button to close the dialog box.
Click on the OK button to request the output.
While optional statistical output is available, we do not need to request any optional statistics.
The statement about the relationship between education and abortion for any reason
Having satisfied the criteria for the stepwise relationship, we examine the findings for
individual relationships with the dependent variable. If the overall relationship were not
significant, we would not interpret the individual relationships.
Output for the relationship between education and abortion for any reason
Marking the statement for relationship between education and abortion for any reason
Statement for relationship between fundamentalism and abortion for any reason
Output for relationship between fundamentalism and abortion for any reason
Marking the relationship between fundamentalism and abortion for any reason
Statement for relationship between socioeconomic index and abortion for any reason
Output for relationship between socioeconomic index and abortion for any reason
Marking the relationship between socioeconomic index and abortion for any reason
At Block 0 with no independent variables in the model, all of the cases are predicted to
be members of the modal group, 0=NO in this example.
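The Block 0 prediction amounts to assigning every case to the most frequent group, so its accuracy is the modal group's share of the cases. The sketch below illustrates this with hypothetical outcome data. The function name `block0_baseline` and the 60/40 split are assumptions, not values from the output.

```python
from collections import Counter

def block0_baseline(outcomes):
    """Return (modal_group, accuracy) when every case is assigned to the mode."""
    modal_group, modal_count = Counter(outcomes).most_common(1)[0]
    return modal_group, modal_count / len(outcomes)

# Hypothetical outcomes coded 0=NO, 1=YES (not the actual data).
outcomes = [0] * 60 + [1] * 40
print(block0_baseline(outcomes))  # (0, 0.6)
```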
Second, select the option button for a Fixed Value.
Click on the OK button to create the variable.
An Additional Task before Running the Stepwise Logistic Regression on Training Sample
Before we run the regression on the training sample, we need an additional step that will
enable us to compare the accuracy of the model for the training sample to the accuracy of
the model for the validation sample, using the R2 for each as our measure of accuracy.
We need to exclude from the analysis cases that are missing data for any of the variables
that we have designated as candidates for inclusion. If we do not specifically do this, SPSS
may include different cases in predicting values for the dependent variable than it does in
determining which variables to include in the model.
In model building, SPSS does listwise exclusion of missing data and omits any cases that
have missing data for any variable. In predicting scores on the dependent variable, it
excludes cases that are missing data for only the variables included in the stepwise model.
Thus, when selecting variables, SPSS assumes that only respondents who answer all
questions are valid cases; in predicting scores, it assumes that failing to answer a question
on a variable that is not included has no importance in the analysis.
Selecting Cases with Valid Data for All Variables in the Analysis - 1
Selecting Cases with Valid Data for All Variables in the Analysis - 2
First, mark the option button for If condition is satisfied.
Second, click on the If button to add the condition.
Selecting Cases with Valid Data for All Variables in the Analysis - 3
Type
NMISS(abany,age,educ,sex,rincom98,fund,sei) = 0
in the condition textbox. In the parentheses, we type the names of the dependent variable
and all of the independent variables.
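For readers who want to see what the NMISS condition does, here is a rough Python analogue. It assumes each case is a dictionary and that missing values are stored as `None`. The helper names `nmiss` and `complete_cases` and the two example cases are hypothetical.

```python
ANALYSIS_VARS = ["abany", "age", "educ", "sex", "rincom98", "fund", "sei"]

def nmiss(case, variables=ANALYSIS_VARS):
    """Count how many of the listed variables are missing (None) for a case."""
    return sum(1 for name in variables if case.get(name) is None)

def complete_cases(cases, variables=ANALYSIS_VARS):
    """Keep only cases with valid data on every listed variable."""
    return [case for case in cases if nmiss(case, variables) == 0]

# Two hypothetical cases; the second is missing rincom98.
cases = [
    {"abany": 1, "age": 34, "educ": 16, "sex": 2, "rincom98": 15, "fund": 2, "sei": 63.5},
    {"abany": 0, "age": 51, "educ": 12, "sex": 1, "rincom98": None, "fund": 1, "sei": 38.4},
]
print(len(complete_cases(cases)))  # 1
```

Filtering once, up front, means the same cases are used both to select variables and to compute predictions.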
Selecting Cases with Valid Data for All Variables in the Analysis - 4
Click on the Continue button to close the dialog box.
Selecting Cases with Valid Data for All Variables in the Analysis - 5
Click on the OK button to execute the command.
Selecting Cases with Valid Data for All Variables in the Analysis - 6
Click on the Continue button to close the dialog box.
First, highlight the split variable.
First, type 1 in the Value text box. Recall that this is the value of split indicating
training cases.
Click on the OK button to produce the output.
If the number of steps were different, the validation would fail.
If the variables included were different, the validation would fail.
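These two checks can be expressed as a small comparison between the full-sample run and the training-sample run. The sketch assumes each run is summarized as a list of the variables entered, one per step. The function name `same_model` and the entry lists are hypothetical.

```python
def same_model(full_sample_steps, training_steps):
    """True when both runs entered the same variables in the same number of steps."""
    same_step_count = len(full_sample_steps) == len(training_steps)
    same_variables = set(full_sample_steps) == set(training_steps)
    return same_step_count and same_variables

# Hypothetical entry orders, for illustration only.
print(same_model(["educ", "fund", "sei"], ["educ", "sei", "fund"]))  # True
print(same_model(["educ", "fund", "sei"], ["educ", "fund"]))         # False
```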
[Flowchart: decision points and actions]
Level of measurement ok?
Stop
Mark check box for level of measurement
Consider limitation in discussion of findings
Multicollinearity/numerical problems (S.E. > 2.0)?
1+ variables entered in model?
Parsimonious subset of variables correctly identified?
Correct interpretation of direction and strength of relationship?
Additional individual relationships to interpret?
Classification accuracy = or > 1.25 x by chance accuracy rate?
Mark check box for classification accuracy
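The classification accuracy criterion in the flowchart can be checked with a few lines of Python. The text here does not say how the by-chance accuracy rate is computed, so this sketch assumes the proportional by-chance rate, which is the sum of the squared proportions of cases in each group. The function names and outcome data are hypothetical.

```python
from collections import Counter

def by_chance_accuracy(outcomes):
    """Proportional by-chance accuracy: sum of squared group proportions."""
    n = len(outcomes)
    return sum((count / n) ** 2 for count in Counter(outcomes).values())

def meets_accuracy_criterion(model_accuracy, outcomes, factor=1.25):
    """True when model accuracy is at least factor x the by-chance rate."""
    return model_accuracy >= factor * by_chance_accuracy(outcomes)

# Hypothetical outcomes: 60 cases coded 0 and 40 coded 1.
outcomes = [0] * 60 + [1] * 40
print(round(by_chance_accuracy(outcomes), 2))    # 0.52
print(meets_accuracy_criterion(0.70, outcomes))  # True, since 0.70 >= 0.65
```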