
Generalized Multilevel Models

Today's Topics:
- Introduction to Generalized Multilevel Models
- Models for Binary Outcomes
- Other Generalized Models
- Estimation in Generalized Multilevel Models


The Two Sides of ANY Model


Model for the Means:
- Aka Fixed Effects, the Structural Part of the Model
- What you are used to caring about for testing hypotheses
- How the expected outcome for a given observation varies as a function of values on predictor variables

Model for the Variances:


- Aka Random Effects and Residuals, the Stochastic Part of the Model
- What you are used to making assumptions about instead
- How model residuals are related and distributed across cases
- Validity of the tests of the predictors depends on having the right model for the variances (where "right" means "least wrong")
- Estimates will usually be OK because they come from the model for the means
- Standard errors (and p-values) of estimates can be compromised

A World View of Models


Statistical models can be broadly organized as:
- General (normal outcome) vs. Generalized (non-normal outcome)
- One dimension of sampling (one variance term per outcome) vs. multiple dimensions of sampling (multiple variance terms)
- Fixed effects only vs. mixed (fixed and random effects = multilevel)

All models have fixed effects, and then:
- General Linear Models: fixed effects, no random effects
- General Linear Mixed Models: fixed and random effects
- Generalized Linear Models: fixed effects through link functions, no random effects
- Generalized Linear Mixed Models: fixed and random effects through link functions
- "Linear" means the fixed effects predict the link-transformed DV as a linear combination: (effect*predictor) + (effect*predictor) + ...


What kind of outcome? Generalized vs. General


Generalized Linear Models = General Linear Models whose residuals follow some non-normal distribution and in which a link-transformed Y is predicted instead of Y.
Many kinds of non-normally distributed outcomes have some kind of generalized linear model to go with them:
- Binary (dichotomous)
- Unordered categorical (nominal)
- Ordered categorical (ordinal)
  (the two categorical types are often both called "multinomial", inconsistently)
- Counts (discrete, positive values)
- Censored (piled up and cut off at one end, left or right)
- Zero-inflated (pile of 0s, then some distribution after)
- Continuous but skewed data (pile on one end, long tail)


3 Parts of a Generalized Linear Model


Link Function (main difference from GLM):
- How a non-normal outcome gets transformed into something we can predict that is more continuous (unbounded)
- For outcomes that are already normal, general linear models are just a special case with an identity link function (Y multiplied by 1, i.e., left untransformed)

Model for the Means (Structural Model):


- How predictors linearly relate to the link-transformed outcome
- New link-transformed Yi = β0 + β1Xi + β2Zi

Model for the Variance (Sampling/Stochastic Model):


- If the errors aren't normally distributed, then what are they?
- A family of alternative distributions is at our disposal that maps onto what the distribution of errors could possibly look like


Examples of Generalized Models


- Binary (dichotomous): Logit, Probit, (Complementary) Log-Log
- Unordered categorical (nominal): Baseline Category Logit
- Ordered categorical (ordinal): Cumulative or Adjacent Category Logit
- Counts (discrete, positive values): Poisson or Negative Binomial (for too little/much variance)
- Zero-inflated (pile of 0s, then some distribution after):
  - ZIP/ZINB: Poisson/Negative Binomial + an excess of 0s
  - Two-Part Model: Logit for "Whether", Normal/Log for "How Much"
- Continuous but skewed: Log-Normal or Gamma
- Censored (piled up and cut off at one end): Tobit

Means and Variances by DV Type


Means:
- Quantitative DV: mean = sum of items / n = Ȳ
- Binary DV: mean = number of 1s / n = py

Variances:
- Quantitative DV: Var(Y) = Σ(Yi − Ȳ)² / (n − 1), summing over i = 1 to n
  - If 3+ options get used, the variance is NOT determined by the mean
- Binary DV: Var(Y) = py(1 − py) = pyqy = σy²
  - In binary DVs, the variance IS determined by the mean (py); it is largest at py = .5 and shrinks as py approaches 0 or 1

A General Linear Model for Binary Outcomes?


If Yi is a binary (0 or 1) outcome:
- The expected mean is the proportion of people who have a 1 (or p, the probability of Yi = 1 in the sample)
- The probability of having a 1 is what we're trying to predict for each person, given the values of his/her predictors
- General linear model: Yi = β0 + β1Xi + β2Zi + ei
  - β0 = expected probability when all predictors are 0
  - βs = expected change in probability for a one-unit change in the predictor
  - ei = difference between observed and predicted values
- The model becomes Yi = (predicted probability of 1) + ei



A General Linear Model With Binary Outcomes?


But if Yi is binary, then ei can only be 2 things:
- ei = observed Yi minus predicted Yi
  - If Yi = 0, then ei = (0 − predicted probability)
  - If Yi = 1, then ei = (1 − predicted probability)
- The mean of the errors would still be 0...
- ...but the variance of the errors can't possibly be constant over levels of X like we assume in general linear models
- The mean and variance of a binary outcome are dependent! Because the conditional mean of Y (p, the predicted probability that Y = 1) depends on X, so does the error variance.

A General Linear Model With Binary Outcomes?


How can we have a linear relationship between X and Y?
- The probability of a 1 is bounded between 0 and 1, but predicted probabilities from a linear model aren't bounded, so impossible values result
- The linear relationship needs to be "shut off" somehow, i.e., made nonlinear
[Figure: two plots of Prob(Y=1) against the X predictor; a straight regression line escapes the 0-1 bounds into impossible values, so the relationship must bend to stay within range]

3 Problems with General* Linear Models for Binary Outcomes


*General = model for a continuous, normal outcome

1. Restricted range (e.g., 0 to 1 for a binary item)
- Predictors should not be linearly related to the observed outcome
- Effects of predictors need to be "shut off" at some point to keep predicted values of the binary outcome within range

2. Variance is dependent on the mean, and is not estimated
- Fixed and random parts are related
- So the residuals can't have constant variance

3. Residuals have a limited number of possible values
- Predicted values can each only be "off" in two ways
- So the residuals can't be normally distributed

Generalized Models for Binary Outcomes


Rather than modeling the probability of a 1 directly, we need to transform it into a more continuous variable with a link function. For example:
- We could transform probability into odds: odds = p / (1 − p) = prob(1) / prob(0)
  - If p = .7, then Odds(1) = 2.33 and Odds(0) = 0.429
  - The odds scale is badly skewed, asymmetric, and ranges from 0 to +∞... nope, that's not helpful
- So take the natural log of the odds, called the "logit" link:
  - LN[p / (1 − p)] = natural log of [prob(1) / prob(0)]
  - If p = .7, then LN[Odds(1)] = 0.847 and LN[Odds(0)] = −0.847
  - The logit scale is now symmetric about 0. DING!
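A quick numeric sketch of this transformation chain (not from the class files; the DATA step and the example probabilities are just for illustration):

DATA logit_demo;
  DO p = .1 TO .9 BY .2;                      * some example probabilities;
    odds   = p / (1 - p);                     * odds = prob(1) / prob(0);
    logit  = LOG(odds);                       * natural-log link;
    p_back = EXP(logit) / (1 + EXP(logit));   * inverse link recovers p;
    OUTPUT;
  END;
RUN;
PROC PRINT DATA=logit_demo; RUN;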

Turning Probability into Logits


The logit is a nonlinear transformation of probability:
- Equal intervals in logits are NOT equal intervals in probability
- The logit runs from −∞ to +∞ and is symmetric about prob = .5 (logit = 0)
- This solves the problem of using a linear model: the model will be linear with respect to the logit, which translates into nonlinear with respect to probability (i.e., the effect shuts off as needed)
- Zero point on each scale: Probability = .5, Odds = 1, Logit = 0
- Logit = LN[p / (1 − p)]


Transforming Probabilities to Logits


[Figure: s-shaped curve mapping the logit (x-axis, −4 to +4) onto probability (y-axis, 0 to 1)]

Probability → Logit:  0.99 → 4.6;  0.90 → 2.2;  0.50 → 0.0;  0.10 → −2.2

Can you guess what a probability of .01 would be on the logit scale?

Nonlinearity in Prediction
The relationship between X and the probability of response = 1 is nonlinear: an s-shaped logistic curve whose shape and location are dictated by the estimated fixed effects.
The model is linear with respect to the logit, but nonlinear with respect to probability (shown here for β0 = 0, β1 = 1).

[Figure: predicted logit (a straight line) and predicted probability (an s-shaped curve) as a function of Predictor X]

The logit version of the model will be easier to explain; the probability version of the prediction will be easier to show.

Predicting Logits, Odds, & Prob:


Different but Equivalent Equations
Coefficients for each form of the model:
- Logit: LN[pi / (1 − pi)] = β0 + β1Xi + β2Zi
  - Predictor effects are linear and additive like in regression, but what does a change in the logit mean, anyway?
- Odds: pi / (1 − pi) = exp(β0) * exp(β1Xi) * exp(β2Zi)
  - A compromise: the effects of predictors are multiplicative
- Prob: P(Yi = 1) = [exp(β0) * exp(β1Xi) * exp(β2Zi)] / [1 + exp(β0) * exp(β1Xi) * exp(β2Zi)]
  - Effects of predictors on probability are nonlinear and non-additive (no "one-unit change" language allowed)

Example: Categorical Predictor


DV = Admission (0 = no, 1 = yes); IV = Sex (0 = F, 1 = M)

Variables in the Equation (Step 1; variable entered: Gender)
            B        S.E.    Wald    df   Sig.   Exp(B)
Gender      1.695    .976    3.015   1    .082   5.444
Constant    -.847    .690    1.508   1    .220   .429

- Logit: LN[p / (1 − p)] = β0 + β1Xs
  - Logit(Y) = −.847 + 1.695(M); note that it is additive
  - Log odds for women = −.847, for men = .847
- Odds: p / (1 − p) = exp(β0) * exp(β1Xs)
  - Odds(Y) = exp(−.847) * exp(1.695*M); note that it is multiplicative
  - Exponentiate, then multiply: Odds = .429 * 5.444, giving .429 for women and 2.333 for men
- Prob: P(Yi = 1) = exp(β0)*exp(β1Xs) / [1 + exp(β0)*exp(β1Xs)]
  - Prob(Yi = 1 | male) = (.429 * 5.444) / [1 + (.429 * 5.444)] = .70
- So, for men: probability(admit) = .70, odds = 2.334, log odds = 0.847
  For women: probability(admit) = .30, odds = 0.429, log odds = −0.847
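A hedged sketch of the single-level logistic regression that would produce output like the table above; the data set name and variable coding are assumptions, not taken from the class files:

PROC LOGISTIC DATA=admitdata;
  MODEL admit (EVENT='1') = gender;   * admit coded 0=no 1=yes and gender coded 0=F 1=M;
RUN;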

Example: Continuous Predictor


DV = Senility (0 = no, 1 = yes); IV = WAIS (centered so that 0 = 10)

Variables in the Equation (Step 1; variable entered: WAIS)
            B        S.E.     Wald    df   Sig.   Exp(B)
WAIS        -.275    .103     7.120   1    .008   .760
Constant    1.776    1.067    2.771   1    .096   5.907

- Logit: LN[p / (1 − p)]s = β0 + β1Xs
  - Logit(Ys) = 1.776 − .275(WAIS = 1) = 1.50; note that it is additive
  - For every one-unit change in WAIS, the log odds go down by .275
- Odds: (p / (1 − p))s = exp(β0 + β1Xs) or exp(β0) * exp(β1Xs)
  - Odds(Ys) = exp(1.776) * exp(−.275(WAIS = 1)); note that it is multiplicative
  - Exponentiate, then multiply: Odds(Ys) = 5.907 * .760 = 4.49
- Prob: P(1)s = exp(β0 + β1Xs) / [1 + exp(β0 + β1Xs)]
  - Prob(Y = 1)s = (5.907 * .760) / [1 + (5.907 * .760)] = .82
- So, if WAIS = 10, prob(senility) = .86, odds = 5.91, log odds = 1.78
  If WAIS = 11, prob(senility) = .82, odds = 4.49, log odds = 1.50
  If WAIS = 16, prob(senility) = .53, odds = 1.13, log odds = 0.13
  If WAIS = 17, prob(senility) = .46, odds = 0.86, log odds = −0.15
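A minimal sketch (the coefficients come from the table above; everything else is assumed) that traces the predicted logit, odds, and probability of senility across the WAIS range:

DATA wais_pred;
  b0 = 1.776;  b1 = -0.275;           * estimates from the output above;
  DO wais = 10 TO 20;
    logit = b0 + b1*(wais - 10);      * additive on the logit scale;
    odds  = EXP(logit);               * multiplicative on the odds scale;
    prob  = odds / (1 + odds);        * nonlinear on the probability scale;
    OUTPUT;
  END;
RUN;
PROC PRINT DATA=wais_pred; RUN;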


More Models for Binary Outcomes: Logit vs. Probit (Ogive)


2 alternative link functions (both new Ys run from −∞ to +∞):
- Logit link: binary Y ⇒ LN[p / (1 − p)]; the logit is the new transformed Y
- Probit link: binary Y ⇒ Φ⁻¹(p), the value of the standard normal curve below which the observed proportion is found (the z-score for the area to the left for that probability); the z-score is the new Y

Same Model for the Means:
- Main effects and interactions of predictors as desired
- No analog to odds ratios in probit, however

Different Model for the Variances: in both models ei ~ Bernoulli, but the new transformed Ys have a known residual variance of:
- Logit: π²/3, or 3.29 (from the logistic distribution)
- Probit: 1.00 (from the standard normal distribution)

Threshold Model Concept


[Figure: distributions of the transformed y*, Probit (SD = 1) and Logit (SD = 1.8), with a threshold marking where y = 0 becomes y = 1]
- Rescale to equate model coefficients: βL ≈ 1.7*βP (you'd think the factor would be 1.8, but it's 1.7)
- Another way these models are explained is with the threshold concept: underlying the observed 0/1 response is a pretend continuous variable called y*, such that if y* < 0 then y = 0 and if y* ≥ 0 then y = 1, so the model predicts the probability that y* exceeds the threshold at which 0 becomes 1
- Accordingly, the difference between logit and probit is that the continuous underlying variable y* has a variance of 3.29 (SD = 1.8) in logit or 1.0 in probit
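Stated compactly (a restatement of the threshold idea above in equations, with τ standing in for the threshold):

y_i = \begin{cases} 0 & \text{if } y_i^* < \tau \\ 1 & \text{if } y_i^* \ge \tau \end{cases},
\qquad
y_i^* = \beta_0 + \beta_1 x_i + e_i^*,
\qquad
e_i^* \sim \begin{cases} \text{Logistic}(0,\; \pi^2/3 = 3.29) & \text{logit} \\ \text{Normal}(0,\; 1) & \text{probit} \end{cases},
\qquad
\beta_{\text{logit}} \approx 1.7\,\beta_{\text{probit}}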


More Models for Binary DVs:


- Logit = Probit*1.7
  - LINK=LOGIT, DIST=BINARY in SAS GLIMMIX, but no probit (both are in Mplus via CATEGORICAL ARE)
- Log-Log is for DVs in which 1 is more frequent
  - LINK=LOGLOG, DIST=BINARY in SAS GLIMMIX (not in Mplus)
- Complementary Log-Log is for DVs in which 0 is more frequent
  - LINK=CLOGLOG, DIST=BINARY in SAS GLIMMIX (not in Mplus)

Cumulative Logit Model for c Ordered Categories (Ordinal)


Models the probability of lower vs. higher cumulative categories via c − 1 submodels (e.g., if c = 4):
- Submodel 1: 0 vs. 1, 2, 3
- Submodel 2: 0, 1 vs. 2, 3
- Submodel 3: 0, 1, 2 vs. 3

Submodels can be phrased with intercepts or thresholds:
- Intercept = logit of the probability of a higher response (= threshold * −1)
- Threshold = logit of the probability of a lower response (= intercept * −1)

Logit of 0 = 1 − int1       or  = thres1 − 0
Logit of 1 = int1 − int2    or  = thres1 − thres2
Logit of 2 = int2 − int3    or  = thres2 − thres3
Logit of 3 = int3 − 0       or  = 1 − thres3



Cumulative Logit Model for c Ordered Categories (Ordinal)


- Because the middle probabilities are not given directly, it is called an "indirect" or "difference" model
- Can be done in SAS GLIMMIX (LINK=CLOGIT, DIST=MULT) and in Mplus with CATEGORICAL ARE (= graded response model in IRT); a GLIMMIX sketch follows below
- SAS also has cumulative versions of log-log (LINK=CUMLOGLOG) and complementary log-log (LINK=CUMCLL), each with DIST=MULT
- Cumulative submodels are each estimated using all the data (good), but ordering is assumed (maybe bad)
- Effects of predictors are assumed the same across submodels: the "proportional odds" assumption
  - Is testable in Mplus or SAS PROC GLIMMIX via nominal models
  - Is testable more flexibly in SAS PROC NLMIXED
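A hedged GLIMMIX sketch of a multilevel cumulative logit (proportional odds) model; the data set and variable names are assumptions, not the class example:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL yord = time / SOLUTION LINK=CUMLOGIT DIST=MULTINOMIAL;   * c-1 cumulative submodels;
  RANDOM INTERCEPT / SUBJECT=id;                                 * person random intercept;
RUN;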


Adjacent Category Logit Model for c Ordered Categories (Ordinal)


Models the probability of a response in any category vs. the next via c − 1 submodels (e.g., if c = 4):
- Submodel 1: 0 vs. 1
- Submodel 2: 1 vs. 2
- Submodel 3: 2 vs. 3

Submodels can be phrased with intercepts or thresholds:
- Intercept = logit of the probability of a higher response (= threshold * −1)
- Threshold = logit of the probability of a lower response (= intercept * −1)

Logit of 0 = 1 − int1     or  = thres1 − 0
Logit of 1 = 1 − int2     or  = thres1 − 0
Logit of 2 = 1 − int3     or  = thres2 − 0
Logit of 3 = int3 − 0     or  = 1 − thres3



Adjacent Category Logit Model for c Ordered Categories (Ordinal)


- Because the middle probabilities are given directly, it is called a "direct" or "divide-by-total" model
- Not available in Mplus or SAS PROC GLIMMIX, but you can specify it yourself using NLMIXED (= partial credit model in IRT)
- Submodels are each estimated using only the data for those adjacent categories (bad), but ordering is not assumed (good)
- Effects of predictors are assumed the same across submodels: the "proportional odds" assumption
  - Is testable more flexibly in SAS PROC NLMIXED

Baseline Category Logit Model for c Unordered Categories (Nominal)


Models the probability of a response in any category vs. the reference category via c − 1 submodels (e.g., if c = 4):
- Submodel 1: 0 vs. 1
- Submodel 2: 0 vs. 2
- Submodel 3: 0 vs. 3

- Assumes no order whatsoever; is a "direct" model
- Can be done in SAS GLIMMIX (LINK=GLOGIT, DIST=MULT) and in Mplus with NOMINAL ARE (= nominal response model in IRT)
- Useful for testing the proportional odds assumption because all fixed and random effects are estimated separately per submodel, but it may be hard to estimate as a result

Generalized Multilevel Models for Non-Normal Outcomes


Same components as generalized models:
- A link function to transform the DV into something more continuous
- A linear predictive model (linear for the link-transformed DV)
- An alternative distribution assumed for the errors

The difference is that we will add random effects (i.e., additional piles of variance) to address dependency in longitudinal or clustered data:
- Piles of variance are ADDED TO, not EXTRACTED FROM, the original residual variance pile when that residual variance is fixed to a known value
- Thus, some concepts translate exactly from general multilevel models, but some don't

GEE: An Alternative Approach


GEE = Generalized Estimating Equations, a "marginal" or "population average" model:
- Assumes the outcome at each time point is univariate normal, but does not assume multivariate normality of the outcomes over time
- Uses link functions for discrete outcomes (logit, probit, etc.)
- Treats the variance model as a nuisance: it deals with the effects of correlation on the SEs instead of modeling it (no random effects; everything is in the R matrix)
- Will provide accurate tests of fixed effects EVEN IF the variance model is way off, so it works well for non-normally distributed outcomes (no assumptions made about the distribution of residuals)...
- EXCEPT that it assumes MCAR (which never holds) instead of MAR
- And it can't handle unbalanced time
- And it's based on quasi-likelihood, so model comparisons are problematic (stay tuned)
- So never mind.

Empty Multilevel Logistic Model for Binary Outcomes


Level 1:    Logit(yti) = β0i
Level 2:    β0i = γ00 + U0i
Composite:  Logit(yti) = γ00 + U0i

Note what's NOT in level 1:
- An eti residual variance is not estimated; it is fixed at π²/3 = 3.29
- The (known) residual is in the model for the actual Y, not for prob(Y) or logit(Y)

Logistic ICC = Var(U0i) / [Var(U0i) + Var(eti)] = Var(U0i) / [Var(U0i) + 3.29]

Can do -2LL difference test to see if Var(U0i) > 0 (although the ICC is somewhat problematic due to non-constant residual variance)
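A hedged GLIMMIX sketch of this empty model; the data set and variable names are assumptions. The ICC is then computed by hand from the random intercept variance in the output:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;   * true ML so -2LL tests are possible;
  CLASS id;
  MODEL y (EVENT='1') = / SOLUTION LINK=LOGIT DIST=BINARY;   * intercept-only model;
  RANDOM INTERCEPT / SUBJECT=id;                             * estimates Var(U0i);
RUN;
* ICC = Var(U0i) / (Var(U0i) + 3.29), computed from the covariance parameter estimate;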

Random Linear Time Model for Binary Outcomes


Level 1:    Logit(yti) = β0i + β1i(timeti)
Level 2:    β0i = γ00 + U0i
            β1i = γ10 + U1i
Composite:  Logit(yti) = (γ00 + U0i) + (γ10 + U1i)(timeti)

- The eti residual variance is still not estimated; it is fixed at π²/3 = 3.29
- Can test new fixed or random effects with −2LL difference tests (or Wald test p-values for fixed effects); a GLIMMIX sketch follows below
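A hedged sketch of this random linear time model using adaptive quadrature (the names and the number of quadrature points are assumptions):

PROC GLIMMIX DATA=longdata METHOD=QUAD(QPOINTS=15);
  CLASS id;
  MODEL y (EVENT='1') = time / SOLUTION LINK=LOGIT DIST=BINARY;
  RANDOM INTERCEPT time / SUBJECT=id TYPE=UN;   * Var(U0i), Var(U1i), and their covariance;
RUN;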


Random Linear Time Model for Ordinal Outcomes (c=3)


Level 1:  Logit(yti1) = β0i1 + β1i1(timeti)
          Logit(yti2) = β0i2 + β1i2(timeti)
Level 2:  β0i1 = γ001 + U0i1    β1i1 = γ101 + U1i1
          β0i2 = γ002 + U0i2    β1i2 = γ102 + U1i2

- Assumes proportional odds: γ001 ≠ γ002, but γ101 = γ102, U0i1 = U0i2, and U1i1 = U1i2
- Testable via a nominal model (all unequal) or by using NLMIXED to write a custom model in which only some of these are constrained
- The eti residual variance is still not estimated; it is fixed at π²/3 = 3.29

New Interpretation of Fixed Effects


In general linear mixed models, the fixed effects are interpreted as the average effect for the sample:
- γ00 is the sample average intercept
- U0i is an individual's deviation from the sample average

What "average" means in generalized linear mixed models is different, because the natural log is a nonlinear function:
- The mean of the logs ≠ the log of the means
- Therefore, the fixed effects are not the sample-average effects; they are the effects specifically for a unit with Ui = 0
- Fixed effects are conditional on the random effects
- This gets called a "unit-specific" or "subject-specific" model
- This distinction does not exist for normally distributed outcomes

New Comparisons across Models without Estimated Residual Variance


NEW RULE: Coefficients cannot be compared across models, because they are not on the same scale! (see Bauer, 2009)
e.g., if the residual variance is fixed at 3.29 in binary models:
- When adding a random intercept to an empty model, the total variation in the outcome has increased, so the fixed effects will increase in size because they are unstandardized slopes:

  γmixed ≈ γfixed * sqrt[(Var(U0i) + 3.29) / 3.29]

- The residual variance can't decrease due to the effects of level-1 predictors, so all other estimates have to go up to compensate
  - If Xti is uncorrelated with the other Xs and is a pure level-1 variable (ICC ≈ 0), then the fixed effects and SD(U0i) will increase by the same factor
- Random effects variances can decrease, though, so level-2 effects should be on the same scale across models if level 1 is the same
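A worked instance of that rescaling (the numbers are invented for illustration): if the empty model gives Var(U0i) = 2, then

\gamma_{\text{mixed}} \approx \gamma_{\text{fixed}} \sqrt{\frac{\operatorname{Var}(U_{0i}) + 3.29}{3.29}}
 = \gamma_{\text{fixed}} \sqrt{\frac{2 + 3.29}{3.29}} \approx 1.27\,\gamma_{\text{fixed}}

so a fixed slope from the single-level model would be expected to grow by roughly 27% once the random intercept is added.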

A Little Bit about Estimation


Goal: end up with maximum likelihood estimates for all model parameters (because they are consistent and efficient):
- When we have a V matrix based on multivariate normally distributed eti residuals at level 1 and multivariate normally distributed Ui terms at level 2, ML is easy
- When we have a V matrix based on multivariate Bernoulli distributed eti residuals at level 1 and multivariate normally distributed Ui terms at level 2, ML is much harder
- Same with any other kind of model for non-normal level-1 residuals
- ML does not assume normality unless you fit a normal model!

3 main families of estimation approaches:
- Quasi-likelihood methods (marginal/penalized quasi-ML)
- Numerical integration (adaptive Gaussian quadrature)
- Also Bayesian methods (MCMC, newly available in SAS or Mplus)

2 Main Types of Estimation


Quasi-likelihood methods: older methods
- "Marginal" QL: approximation around the fixed part of the model
- "Penalized" QL: approximation around the fixed + random parts
- Both underestimate variances (MQL more so than PQL)
- 2nd-order PQL is supposed to be better than 1st-order MQL
- QL methods DO NOT PERMIT MODEL −2LL COMPARISONS
- The HLM program adds a Laplace approximation to QL, which then does permit −2LL comparisons (also in SAS GLIMMIX and Stata xtmelogit)

ML via numerical integration: more available nowadays
- Better estimates, but can take for-freaking-ever
- DOES permit regular model −2LL comparisons
- Will blow up with many random effects (which make the model exponentially more complex, especially in these models)
- Good idea to use PQL to get start values first, then use them in the integration

ML via Numerical Integration


Gold standard of estimation for non-normal outcomes. Relies on two assumptions of independence:
- Level-1 residuals are independent after controlling for level-2 random effects ("local independence" in CFA/IRT contexts)
  - This means the joint probability (likelihood) of two observations is just the probability of each multiplied together
- Level-2 units are independent (no additional clustering or nesting)
  - You can add level-3 random effects to handle such dependence, but then the assumption is independence given the level-3 random effects

Doesn't assume it knows the individual U values, but does assume that the U values have a multivariate normal distribution (cannot be modified in most programs)

ML via Numerical Integration


Step 1: Select starting values for all fixed effects.
Step 2: Compute the likelihood of each observation given the current parameter values.
- The model tells you probability(y = 1) given the model parameters
- Likelihood of binary data given the parameters = p^y (1 − p)^(1−y)
- But what about the Us? We don't have them yet... no problem!
- Computing the likelihood value for each set of possible parameters involves removing the individual U values from the equation by integrating across possible U values for each level-2 unit
  - Integration is like summing the area under the curve
  - Integration is accomplished by Gaussian quadrature: summing up rectangles that approximate the integral for each level-2 unit

ML via Numerical Integration


Step 2 (still): Divide the U distribution into rectangles.
- Gaussian quadrature (# rectangles = # quadrature points)
- Can either divide the whole distribution into rectangles, or take the most likely section for each level-2 unit and rectangle that
  - This is "adaptive" quadrature: computationally more demanding, but it gives more accurate results with fewer rectangles
- The likelihood of each level-2 unit's outcomes at each U rectangle is then weighted by that rectangle's probability of being observed (from the multivariate normal distribution). The weighted likelihoods are then summed across all rectangles... ta-da! Numerical integration.


Example of Numeric Integration:


Binary DV, Fixed Linear Time, Random Intercept Model

1. Start with values for the fixed effects: intercept γ00 = 0.5, time slope γ10 = 1.5.
2. Compute the likelihood for the real data based on the fixed effects and plausible U0i values (−2, 0, 2) using the model: Logit(yti = 1) = γ00 + γ10(timeti) + U0i.
   Here for one person with yti = 1 at both of two occasions:

U0i = −2:  Time 0: logit = 0.5 + 1.5(0) − 2 = −1.5, Prob = 0.18;  Time 1: logit = 0.5 + 1.5(1) − 2 = 0.0, Prob = 0.50
           Likelihood if both y = 1: 0.0912;  theta prob = .05, theta width = 2, product = 0.00912
U0i =  0:  Time 0: logit = 0.5 + 1.5(0) + 0 = 0.5, Prob = 0.62;  Time 1: logit = 0.5 + 1.5(1) + 0 = 2.0, Prob = 0.88
           Likelihood if both y = 1: 0.5483;  theta prob = .40, theta width = 2, product = 0.43861
U0i = +2:  Time 0: logit = 0.5 + 1.5(0) + 2 = 2.5, Prob = 0.92;  Time 1: logit = 0.5 + 1.5(1) + 2 = 4.0, Prob = 0.98
           Likelihood if both y = 1: 0.9075;  theta prob = .05, theta width = 2, product = 0.09075

Overall likelihood (sum of products over all thetas): 0.53848
(Do this for each occasion, then multiply this whole thing over all people; repeat with new values of the fixed effects until the highest overall likelihood is found.)
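A minimal DATA-step sketch that reproduces the hand calculation above; the three U0i rectangles, their heights, and the width of 2 are taken straight from the table, and nothing else is from the class files:

DATA quad_demo;
  b0 = 0.5;  b1 = 1.5;                         * fixed effects being tried;
  ARRAY u[3] _TEMPORARY_ (-2, 0, 2);           * quadrature points for U0i;
  ARRAY w[3] _TEMPORARY_ (.05, .40, .05);      * height (prob) of each rectangle;
  width = 2;                                   * rectangle width;
  marginal = 0;
  DO q = 1 TO 3;
    p0 = EXP(b0 + b1*0 + u[q]) / (1 + EXP(b0 + b1*0 + u[q]));   * P(y=1) at time 0;
    p1 = EXP(b0 + b1*1 + u[q]) / (1 + EXP(b0 + b1*1 + u[q]));   * P(y=1) at time 1;
    marginal = marginal + p0*p1*w[q]*width;    * weighted likelihood if both y=1;
  END;
  PUT marginal=;                               * writes the overall likelihood (about 0.5385) to the log;
RUN;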


ML via Numerical Integration


Step 3: Decide whether the algorithm should stop.
- The algorithm should stop once the values of the likelihood don't change much from iteration to iteration
- The convergence criterion is typically a tiny number like 0.000001
- However, if the change in the likelihood value from the previous iteration to the current iteration is larger than your criterion...

Step 4: Choose new parameter values (repeat as needed).
- Many methods are available to make the selection:
  - Newton-Raphson: uses the shape of the likelihood function to find parameter values closer to the maximum; provides SEs based on observed information (which comes from the second derivatives of the LL)
  - Fisher scoring: same idea, but SEs are based on expected information (which is generally not recommended)
  - EM algorithm: the Us are treated as missing data (SEs based on expected information)
  - Mplus alternates between approaches to maximize efficiency

Tobit Models: Censored Outcomes


For outcomes with ceiling or floor effects:
- Can be right-censored and/or left-censored
- Also "inflated" or not; inflation = a binary variable in which 1 = censored, 0 = not censored
- The model assumes a continuous normal distribution for y* instead (y without the missing responses; no transformation otherwise)
- In Mplus (not in GLIMMIX), use CENSORED ARE (with options):
  CENSORED ARE y1 (a) y2 (b) y3 (ai) y4 (bi);
  - y1 is censored from above (right); y2 is censored from below (left)
  - y3 is censored from above (right) and has an inflation variable (inflated: y3#1)
  - y4 is censored from below (left) and has an inflation variable (inflated: y4#1)
- So you can predict the distribution of y1-y4, as well as whether or not y3 and y4 are censored (inflation), as separate outcomes:
  y3 ON x;      x predicts the value of Y if at the censoring point or above
  y3#1 ON x;    x predicts whether Y is censored (1) or not (0)

Models for Proportions


- DVs may be bounded from 0 to 1 even if they aren't binary (e.g., % correct)
- Need a logit link to prevent predicted Y from going out of the 0-1 bounds
  - Predicted Y "shuts off" as it approaches 0 or 1, same as for binary DVs
- But because Y is not binary, the binomial distribution is used for its residuals, such that Y is modeled as Y = #events / #trials
- Can be predicted as 2 variables in SAS GLIMMIX when #trials varies (a fuller multilevel sketch follows below):
  MODEL events/trials = (predictors) / SOLUTION LINK=LOGIT DIST=BIN;
- Or can be predicted directly as a proportion in SAS GLIMMIX otherwise:
  MODEL accuracy = (predictors) / SOLUTION LINK=LOGIT DIST=BIN;
- A normal distribution for the residuals only works ok if the mean is near .50
- In single-level models, you can also estimate a dispersion or scale parameter that allows the residual variance to exceed what it is predicted to be (from p*[1 − p]), but this doesn't seem to be possible using true ML in GLIMMIX for multilevel models
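A hedged multilevel version of the events/trials specification above, with a person random intercept; the data set and variable names are assumptions:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL ncorrect/ntrials = time / SOLUTION LINK=LOGIT DIST=BINOMIAL;   * proportion as events over trials;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;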

Models for Skewed Continuous DVs.

Log-Normal:
- y* = LN(yi), so LN(ei), but not ei itself, is assumed normally distributed
- LINK=IDENTITY, DIST=LOGNORMAL in SAS GLIMMIX (not in Mplus)
- Or just log-transform Y and fit the exact same model using MIXED! (a sketch of this route follows below)
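A minimal sketch of that second route (log-transform, then use MIXED); the data set, ID, and time variable names are assumptions, and only RT_sec comes from the class example:

DATA rtlog;
  SET rtdata;
  logRT = LOG(RT_sec);        * natural-log transform of the outcome;
RUN;

PROC MIXED DATA=rtlog METHOD=REML;
  CLASS id;
  MODEL logRT = time / SOLUTION;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;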

Models for Skewed Continuous DVs.

Gamma:
- y* = LN(y), and then LN(e) is assumed gamma distributed, with shape and scale parameters
- LINK=LOG, DIST=GAMMA in SAS GLIMMIX (not in Mplus)

RT Data from Chapter 15


SAS Code to Make this Type of Plot


ODS RTF FILE="&filesave./RT Histogram.rtf";
PROC UNIVARIATE DATA=&datafile.;
  VAR RT_sec;
  HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistN  NORMAL(COLOR=red W=5 MU=est SIGMA=est) HAXIS=AXIS2;
  HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistLN LOGNORMAL(COLOR=purple W=5 SIGMA=est ZETA=est) HAXIS=AXIS2;
  HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistG  GAMMA(COLOR=blue W=5 ALPHA=est SIGMA=est) HAXIS=AXIS2;
  * Settings for X-axis;
  AXIS2 LABEL=(HEIGHT=1.5 "RT in Seconds") ORDER=(0 TO 60 BY 5);
RUN;
ODS RTF CLOSE;

* Rename predicted curve variable, merge into single data set, export to excel;
DATA RTdistN;  SET RTdistN;  Normal=_EXPPCT_;    DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdistLN; SET RTdistLN; LogNormal=_EXPPCT_; DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdistG;  SET RTdistG;  Gamma=_EXPPCT_;     DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdist;   MERGE RTdistN RTdistLN RTdistG;   BY _MIDPT_; RUN;


Models for Count Outcomes


Counts: non-negative integer, unbounded responses
- e.g., how many cigarettes did you smoke this week?

Poisson and negative binomial models:
- y* = LN(y), which keeps the predicted count positive
- Residuals follow 1 of 2 distributions:
  - A Poisson distribution, in which the mean = the variance
  - A negative binomial distribution, which includes a new scaling or dispersion parameter k that allows the variance to be bigger than the mean: variance = mean*(1 + k*mean) when k > 0
- The negative binomial mixes a gamma distribution with the Poisson
- Poisson is nested within negative binomial (can test whether the dispersion differs from 0)
- In Mplus: COUNT ARE y1 (p) y2 (nb); in SAS GLIMMIX: LINK=LOG, DIST=POISSON or DIST=NEGBIN (sketches follow below)
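Hedged GLIMMIX sketches of the two count models; only the DIST= value differs between them, and all names are assumptions. Because both runs use true ML, their −2LL values can be compared to judge whether the extra dispersion parameter is needed:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL count = time / SOLUTION LINK=LOG DIST=POISSON;   * mean = variance;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL count = time / SOLUTION LINK=LOG DIST=NEGBIN;    * adds a dispersion parameter;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;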

Models for Count DVs.

Poisson: y* = LN(y), and LN(e) is assumed Poisson distributed such that mean = variance (dispersion = 0).
Negative binomial: y* = LN(y), and LN(e) is assumed gamma distributed, such that varianceNB = varianceP*(1 + varianceP*k).

Modeling Zeros in Count Data


No zeros → zero-truncated negative binomial
- e.g., how many days were you in the hospital? (has to be > 0)
- In Mplus only: COUNT ARE y1 (nbt);

Too many zeros → zero-inflated Poisson or NB
- e.g., # of cigarettes smoked, when non-smokers are asked too
- In Mplus only: COUNT ARE y2 (pi) y3 (nbi);
  - Refer to the inflation variable as y2#1 or y3#1
- Tries to distinguish two kinds of zeros:
  - "Structural" zeros: would never do it; inflation is predicted as the logit of being a structural zero
  - "Expected" zeros: could do it, just didn't (part of the regular count); the count with expected zeros is predicted by Poisson or NB
- Poisson or NB without inflation is nested within the same model with inflation (and Poisson is nested within NB)

Zero-Inflated Poisson and Negative Binomial Distributions


Zero-inflated distributions have extra "structural" zeros not expected from the Poisson or NB ("stretched" Poisson) distributions. This can be tricky to estimate and interpret because the model distinguishes between kinds of zeros rather than just modeling zero or not... (Image borrowed from Atkins & Gallop, 2007.)

Modeling Zeros More Generally


Other, more direct ways of dealing with too many zeros: split the distribution into (0 or not) and (if not 0, how much):
- Negative binomial "hurdle" (or zero-altered negative binomial)
  - In Mplus: COUNT ARE y1 (nbh);
  - 0 or not: predicted by the logit of being a 0 (0 is the "higher" category)
  - How much: predicted by a non-zero NB for the counts (integers only)
- In Mplus, the two-part model uses the DATA TWOPART: command
  - NAMES ARE y1-y4;        list the outcomes to be split into 2 parts
  - CUTPOINT IS 0;          where to split the observed outcomes
  - BINARY ARE b1-b4;       create names for the "0 or not" part
  - CONTINUOUS ARE c1-c4;   create names for the "how much" part
  - TRANSFORM IS LOG;       transformation of the continuous part
  - 0 or not: predicted by the logit of being NOT 0 ("something" is the 1)
  - How much: predicted as a continuous distribution (like log or normal)

Summary: Differences in Generalized Mixed Models


- Analyze a link-transformed DV (e.g., via logit, log, log-log):
  - Linear relationship between the Xs and the transformed Y
  - Nonlinear relationship between the Xs and the original Y
- The original e residuals are assumed to follow some non-normal distribution
- In models for binary or count data, the level-1 residual variance is fixed (set) to a constant:
  - So it can't go down after adding level-1 predictors, which means that the scale of everything else has to go UP to compensate
  - The scale of the model will also be different after adding random effects for the same reason: the total variation in the model is now bigger
  - Fixed effects may not be comparable across models as a result
- Estimation is trickier and takes longer:
  - Numerical integration is best, but may blow up in complex models
  - Start values are often essential (can get those with the MSPL estimator)
