
Logistic regression

Markus Kemmelmeier, Ph.D.


Dept of Sociology & Interdisciplinary Ph.D. Program in Social Psychology

markusk@unr.edu

What it is

The statistical technique of choice for working with a dichotomous (binary) dependent variable


Guilty vs. not guilty
Passed vs. failed
Long-term resident vs. short-term resident

Just like OLS regression
…though substantially different in some respects

Regression technique

Based on maximum likelihood estimation

What it is good for

Prediction

Finding the most efficient model
Finding the most comprehensive model
Does X predict Y (or not)?

Hypothesis testing

X is a continuous variable (e.g., age)
X is a categorical variable (e.g., acad. discipline)

Do X & Z predict Y better than X by itself?
Does Z mediate the effect of X on Y?

The logic of logistic regression

Relies on probabilities, odds


Odds = probability (p) / (1 - p)
When rolling a die:
  Probability = 1 / 6 = .1667
  Odds = 1 / (6 - 1) = 1 / 5 = .20
MK hitting the trash can:
  Probability = 8 / 10 = .80
  Odds = 8 / 2 = 4
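The probability-to-odds conversion above can be sketched in a few lines of Python; the die and trash-can numbers are the slide's own examples:

```python
def odds_from_prob(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1.0 - p)

# Rolling a six on a fair die: probability 1/6, odds 1/5
p_die = 1 / 6
print(round(odds_from_prob(p_die), 2))   # 0.2

# MK hitting the trash can: 8 successes out of 10
p_trash = 8 / 10
print(round(odds_from_prob(p_trash), 2))  # 4.0
```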

Odds ratio

Relationship between odds


Ex. MK and the trash can

              Probability        Odds
Left-hand     8 of 10 = .80      8 over 2 = 4
Right-hand    6 of 10 = .60      6 over 4 = 1.5

Odds ratio = Odds (left hand) / Odds (right hand)
         or  Odds (right hand) / Odds (left hand)

In the printout, the odds ratio appears as Exp(B)

Odds ratio

4 / 1.5 = 2.667
1.5 / 4 = 0.375
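As a quick check of the arithmetic, the two odds ratios can be computed directly from the slide's hypothetical throws:

```python
# Odds for left- vs. right-hand throws (slide's hypothetical data)
odds_left = 8 / 2    # probability .80 -> odds 4
odds_right = 6 / 4   # probability .60 -> odds 1.5

print(round(odds_left / odds_right, 3))  # 2.667, left relative to right
print(odds_right / odds_left)            # 0.375, right relative to left
```

Note that the two ratios are reciprocals: which one SPSS reports as Exp(B) depends on which category is coded as the reference.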

Logit

What logistic regression does:

Predict the logit of p from the predictor variables xk: logit(p) = ln(p / (1 - p)) = b0 + b1x1 + ... + bkxk

The logistic function

p = 1 / (1 + e^-(b0 + b1x1 + ... + bkxk))

Nonlinear function

Each coefficient bk indicates how much the log odds of the event change per one-unit change in xk.
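A minimal sketch of the logit and its inverse, the logistic function (standard definitions, not tied to any particular data set):

```python
import math

def logit(p):
    """Log odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def logistic(x):
    """Inverse of the logit: maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The two functions invert each other
p = 0.8
assert abs(logistic(logit(p)) - p) < 1e-12

# p = .5 corresponds to a log odds of 0
print(logit(0.5))     # 0.0
print(logistic(0.0))  # 0.5
```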

Why not just use ANOVA/linear (OLS) regression?

Answer #1

Computer simulations show that you may often draw the right conclusions even when using ANOVA instead of logistic regression (Lunney, 1970). But how do you know that you are drawing the right conclusion this time? You only know that you have the right results if you have run the right analysis.

Answer #2

Why not just use ANOVA/linear (OLS) regression?

Answer #3

You would be violating ANOVA and regression assumptions, which can lead to misleading conclusions

Data workup

Before you run your actual analyses

Check nature of predictor variables

Is it a categorical or continuous predictor? What do high/low values mean? What value do I want to use as the reference category? (categorical predictor variables)

Data coding

Data workup

Before you run your actual analyses

Check distributions

Are there outliers?


Univariate outliers (EXPLORE)
Multivariate outliers: Mahalanobis distances
  Run an OLS regression (!), choose option SAVE Mahalanobis; ignore the results
  The data set now includes a Mahalanobis value for each case
  Are there any Mahalanobis values greater than the chi-square critical value for df = k, with k being the number of predictors/terms in your logistic regression model?
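For illustration, here is a rough pure-Python version of the squared Mahalanobis distance for two hypothetical predictors; SPSS's SAVE Mahalanobis computes the same quantity for each case. The data values below are made up.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def mahalanobis_sq(x, y, xs, ys):
    """Squared Mahalanobis distance of (x, y) from the sample centroid."""
    sxx, syy, sxy = cov(xs, xs), cov(ys, ys), cov(xs, ys)
    det = sxx * syy - sxy * sxy          # invert the 2x2 covariance matrix
    dx, dy = x - mean(xs), y - mean(ys)
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

xs = [1, 2, 3, 4, 5, 20]   # hypothetical predictor values
ys = [2, 1, 4, 3, 5, 19]
d2 = [mahalanobis_sq(x, y, xs, ys) for x, y in zip(xs, ys)]
# Compare each d2 against the chi-square critical value for df = 2 predictors
# (e.g., 13.82 at alpha = .001); larger values flag multivariate outliers.
```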

How to run a logistic regression in SPSS

Going through the menu (GUI)
Syntax (preferred)


verdict: not guilty vs. guilty
responsible: 1-9 rating (continuous predictor)

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER responsible
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
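For readers outside SPSS, here is a rough sketch of what the command estimates: maximum-likelihood coefficients, found here by simple gradient ascent on a tiny made-up data set (the variable names mirror the example, but the values are hypothetical):

```python
import math

# Hypothetical data: responsibility rating (1-9), verdict (0 = not guilty, 1 = guilty)
ratings  = [1, 2, 3, 4, 5, 6, 7, 8, 9]
verdicts = [0, 0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = 0.0, 0.0
for _ in range(20000):                    # crude gradient ascent on the log-likelihood
    g0 = g1 = 0.0
    for x, y in zip(ratings, verdicts):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # current predicted probability
        g0 += y - p                              # gradient w.r.t. the constant
        g1 += (y - p) * x                        # gradient w.r.t. the slope
    b0 += 0.01 * g0
    b1 += 0.01 * g1

# b1 should come out positive: higher ratings predict a guilty verdict
print(b1 > 0)
```

SPSS uses a Newton-type algorithm rather than plain gradient ascent, but it maximizes the same likelihood.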

SPSS Output

Different components

Case processing summary
Dependent variable coding
Beginning model (Block 0)
Statistics for overall model fit
Classification tables
Summary of model variables

SPSS Output

Dependent variable coding

Automatic recoding to 0/1

Block 0 (not useful)

Constant-only model

Block 1 (here: final model)

Statistics for overall model fit

Significant chi-square: Your model is better at predicting than constant-only model (Block 0)

-2 LL score: the lower the better

Important fit index

Two pseudo-r² indices (Cox & Snell; Nagelkerke) mimic the OLS r²

Block 1 (final model)

Classification tables: % predicted correctly


Cases with predicted probability >= .50 -> guilty
Cases with predicted probability < .50 -> not guilty

Crude measure, because it ignores
  base rates
  how well cases are being discriminated

Block 1 (final model)

Summary of model variables


Significance of estimates B, based on S.E. and Wald test
A continuous predictor variable has df = 1
Positive B = Exp(B) > 1
Negative B = Exp(B) < 1

Computation of predicted probabilities

Interpretation of the B coefficients is not intuitive! Use the B coefficients to compute predicted values

See chapter by Norusis for an example
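A sketch of the computation, using made-up coefficients (B0 and B1 below are hypothetical, not the slide's actual output):

```python
import math

# Hypothetical fitted model: logit(p) = B0 + B1 * responsible
B0, B1 = -4.0, 0.8

def predicted_probability(responsible):
    """Invert the logit to get the predicted probability of a guilty verdict."""
    z = B0 + B1 * responsible
    return 1 / (1 + math.exp(-z))

# A one-unit increase in the rating multiplies the odds by Exp(B1)
print(round(math.exp(B1), 3))              # 2.226
# At responsible = 5, z = -4 + 0.8*5 = 0, so p = .5
print(predicted_probability(5))            # 0.5
```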

Expanding the model

Add a categorical predictor variable to previous model

Account of the defendant:

Justification vs. Excuse vs. Denial vs. No explanation

Account needs to be defined as categorical (so-named button in the logistic regression menu of SPSS)

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER Responsible Account
  /CONTRAST (Account)=Indicator
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Coding & Block 0

Automatic coding of multi-category predictor variable into dummy variables

Block 0 identical

Block 1 (final model)

Model fit

Model variables

Comparative model testing

Likelihood ratio test

Tests whether a (set of) predictor(s) improves model fit


Likelihood ratio = -2LL(constant-only model) - (-2LL(model with predictors))
Likelihood ratio = -2LL(reduced model) - (-2LL(full model))

The likelihood ratio has a chi-square distribution

Evaluate based on the number of different parameters in the models, k

k = df(full model) - df(reduced model)

Critical value = chi-square(k)


Only works if the to-be-compared models are nested.
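As a sketch, the p-value for a likelihood ratio with df = 3 can be computed from the closed-form chi-square survival function for three degrees of freedom (this closed form holds only for df = 3; other df need a general routine):

```python
import math

def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with df = 3 (closed form)."""
    z = math.sqrt(x)
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF
    return 2 * (1 - phi) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Likelihood ratio from the slides, df = 3
print(round(chi2_sf_df3(7.558), 3))   # ~ .056, matching the slide's p-value
```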

Comparative model testing

1-predictor model

2-predictors model

Likelihood ratio: 190.200 - 182.242 = 7.558

k = df(full model) - df(reduced model) = 4 - 1 = 3

chi-square(df = 3) = 7.558, p = .056

The hypothesis that the account variable's coefficients are 0 cannot quite be rejected at the .05 level (p = .056).

Comparing the models

1-predictor model

2-predictors model

No change in % classified correctly!

More on fit: Model discrimination

ROC (receiver operating characteristic) curve

Matching actual category membership against predicted probabilities
Step 1: Save predicted probabilities for the models
Step 2: Use ROC to plot actual category membership against predicted probabilities

More on fit: Model discrimination

ROC (receiver operating characteristic) curve

STEP 1: SAVE predicted probabilities for the models (first model: PRE_1; second model: PRE_2)

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER Responsible
  /SAVE=PRED
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER Account Responsible
  /CONTRAST (Account)=Indicator(1)
  /SAVE=PRED
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

Syntax for ROCs


Step 2: Plot Model 1


ROC PRE_1 BY verdict (1)
  /PLOT=CURVE(REFERENCE)
  /PRINT=SE
  /CRITERIA=CUTOFF(INCLUDE) TESTPOS(LARGE) DISTRIBUTION(FREE) CI(95)
  /MISSING=EXCLUDE.

Model 2

ROC PRE_2 BY verdict (1)
  /PLOT=CURVE(REFERENCE)
  /PRINT=SE
  /CRITERIA=CUTOFF(INCLUDE) TESTPOS(LARGE) DISTRIBUTION(FREE) CI(95)
  /MISSING=EXCLUDE.

Comparing the ROC curves

1-predictor model

2-predictors model

C-statistic (area under the curve): .82 vs. .84
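The c-statistic has a simple rank interpretation: the probability that a randomly chosen guilty case receives a higher predicted probability than a randomly chosen not-guilty case. A minimal sketch with made-up predicted probabilities:

```python
def c_statistic(probs, labels):
    """Area under the ROC curve via pairwise rank comparisons."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    # Count wins; ties between a positive and a negative count half
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical saved predicted probabilities and actual verdicts
probs  = [0.2, 0.4, 0.6, 0.7, 0.9, 0.3]
labels = [0,   1,   1,   0,   1,   0]
print(round(c_statistic(probs, labels), 3))  # 0.778
```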

Linear regression vs. logistic regression

Linear regression                Logistic regression
Hierarchical regression; r²      Hierarchical model comparisons; likelihood ratio test
Stepwise regression              Stepwise logistic regression

Stepwise logistic regression


I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression. I will never use stepwise regression.

Stepwise logistic regression

Letting the program decide which variables to include in the model, based on pre-defined statistical criteria
Iterative, automated process
Incompatible with theory-guided hypothesis testing, but used in applied research

Efficient prediction (non-inclusion, pruning of redundant or low-performing predictors)


Stepwise logistic regression: Methods

Forward selection

In every step, include the next best predictor
Stop when no other variable meets the criterion

Backward (de)selection

In every step, exclude the next worst predictor
Stop when no other variable meets the criterion

Pre-defined criteria:

Based on variable significance (non-significance)
Does -2LL change significantly?
Yes -> include the variable in forward selection
No -> exclude the variable in backward (de)selection
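The forward-selection loop can be sketched schematically. Here a placeholder score function stands in for the real criterion (the change in -2LL), and the per-variable gain values are invented; only the greedy add-until-no-improvement structure is the point:

```python
def forward_select(candidates, score, threshold):
    """Greedily add the best-scoring predictor until none passes the threshold."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda v: score(selected + [v]))
        if score(selected + [best]) - score(selected) < threshold:
            break                      # no candidate improves the model enough
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical "fit gain" for each predictor (stand-in for the -2LL change)
gains = {"Honest8": 9.0, "Credible10": 5.0, "Believable11": 0.5, "Deceitful12": 0.2}
score = lambda chosen: sum(gains[v] for v in chosen)

# 3.84 is the chi-square(1) critical value at alpha = .05
print(forward_select(gains, score, threshold=3.84))  # ['Honest8', 'Credible10']
```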

Stepwise logistic regression: Methods

Different selection methods may converge, but they don't have to

Syntax: Forward Stepwise: Likelihood Ratio

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=FSTEP(LR) Honest8 Credible10 Believable11 Deceitful12
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Model calibration

Hosmer & Lemeshow test

Select under options

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER Account Responsible
  /CONTRAST (Account)=Indicator(1)
  /PRINT=GOODFIT
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Model calibration

Hosmer & Lemeshow test

Convergence between expected and observed probabilities
Should be non-significant!
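The idea behind the test can be sketched as follows: group cases by predicted probability, then compare observed and expected event counts per group. The probabilities below are made up, and real implementations use 10 groups and handle ties more carefully:

```python
def hosmer_lemeshow(probs, labels, groups=4):
    """Rough Hosmer-Lemeshow statistic over equal-size probability groups."""
    pairs = sorted(zip(probs, labels))
    size = len(pairs) // groups
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * size:(g + 1) * size] if g < groups - 1 else pairs[g * size:]
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        n = len(chunk)
        mean_p = expected / n
        stat += (observed - expected) ** 2 / (n * mean_p * (1 - mean_p))
    return stat   # compare against chi-square with (groups - 2) df

# Hypothetical predicted probabilities and outcomes
probs  = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,   1,   0,   1,   1,   1]
hl = hosmer_lemeshow(probs, labels)
print(round(hl, 3))   # small value = good calibration
```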

Interaction terms

2 categorical predictor variables of verdict


Did the defendant show remorse? (yes, no) Gender of mock juror? (m, f)

Syntax (both predictors need to be defined as categorical)

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER remorse gender gender*remorse
  /CONTRAST (remorse)=Indicator(1)
  /CONTRAST (gender)=Indicator(1)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

or

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER remorse gender gender*remorse
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

Interaction terms

2 categorical predictor variables of verdict


Did the defendant show remorse? (yes=1, no=2) Gender of mock juror? (m=1, f=2)

Syntax (both predictors need to be defined as categorical)

LOGISTIC REGRESSION VARIABLES verdict
  /METHOD=ENTER remorse gender gender*remorse
  /CONTRAST (remorse)=Indicator(1)
  /CONTRAST (gender)=Indicator(1)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

Interaction terms

Internal recoding using first value as reference category

Interaction model: Results

Interaction model: Pattern in the data

Specific uses of multiple regression analysis

Regression analysis is sometimes used for specific purposes

Mediational modeling

What is mediation?
Definition: Mediation occurs when the effect of a variable A on an outcome variable C is transmitted through another variable B. In other words, A has an indirect effect on C.
Full mediation: A -> B -> C

Partial mediation: A -> B -> C, plus a direct path A -> C

More on mediational analysis

Baron & Kenny three-step procedure

1) A -> C
2) A -> B
3) (A), B -> C

Allowing for partial mediation


Testing whether the coefficient of A has decreased significantly when B is included in the equation -> partial mediation
Compare confidence intervals of A in the first and third equations
No overlap = at least partial mediation

More on mediational analysis

Great if the effect of A becomes nonsignificant in the third equation, but...

that does not necessarily imply that there is now a significant indirect effect

The Sobel test:


www.quantpsy.org

Sobel test equation: z = a*b / SQRT(b²*sa² + a²*sb²)
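Plugging hypothetical values of a, b, and their standard errors into the equation (all four numbers below are invented for illustration):

```python
import math

def sobel_z(a, sa, b, sb):
    """Sobel test: z = a*b / sqrt(b^2*sa^2 + a^2*sb^2)."""
    return (a * b) / math.sqrt(b * b * sa * sa + a * a * sb * sb)

# Hypothetical path coefficients and standard errors
z = sobel_z(a=0.5, sa=0.1, b=0.4, sb=0.1)
print(round(z, 3))   # 3.123; compare |z| against 1.96 for p < .05
```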
