
Generalized Multilevel Models

Today's Topics:
- Introduction to Generalized Multilevel Models
- Models for Binary Outcomes
- Other Generalized Models
- Estimation in Generalized Multilevel Models


The Two Sides of ANY Model


Model for the Means:
- Aka Fixed Effects, the Structural Part of the Model
- What you are used to caring about for testing hypotheses
- How the expected outcome for a given observation varies as a function of values on predictor variables

Model for the Variances:


- Aka Random Effects and Residuals, the Stochastic Part of the Model
- What you are used to making assumptions about instead
- How model residuals are related and distributed across cases
- Validity of the tests of the predictors depends on having the right model for the variances (where "right" means "least wrong")
- Estimates will usually be OK because they come from the model for the means
- Standard errors (and p-values) of estimates can be compromised

A World View of Models


Statistical models can be broadly organized as:
- General (normal outcome) vs. Generalized (non-normal outcome)
- One dimension of sampling (one variance term per outcome) vs. multiple dimensions of sampling (multiple variance terms)
- Fixed effects only vs. mixed (fixed and random effects = multilevel)

All models have fixed effects, and then:
- General Linear Models: fixed effects, no random effects
- General Linear Mixed Models: fixed and random effects
- Generalized Linear Models: fixed effects through link functions, no random effects
- Generalized Linear Mixed Models: fixed and random effects through link functions
- "Linear" means the fixed effects predict the link-transformed DV as a linear combination: (effect*predictor) + (effect*predictor) + ...


What kind of outcome? Generalized vs. General


Generalized Linear Models = General Linear Models whose residuals follow some non-normal distribution and in which a link-transformed Y is predicted instead of Y.
Many kinds of non-normally distributed outcomes have some kind of generalized linear model to go with them:
- Binary (dichotomous)
- Unordered categorical (nominal)
- Ordered categorical (ordinal)
  (the two categorical types are often both called "multinomial", inconsistently)
- Counts (discrete, positive values)
- Censored (piled up and cut off at one end, left or right)
- Zero-inflated (pile of 0s, then some distribution after)
- Continuous but skewed data (pile on one end, long tail)


3 Parts of a Generalized Linear Model


Link Function (main difference from GLM):
- How a non-normal outcome gets transformed into something we can predict that is more continuous (unbounded)
- For outcomes that are already normal, general linear models are just a special case with an identity link function (Y multiplied by 1, i.e., left untransformed)

Model for the Means (Structural Model):


- How predictors linearly relate to the link-transformed outcome
- New link-transformed Yi = β0 + β1Xi + β2Zi

Model for the Variance (Sampling/Stochastic Model):


- If the errors aren't normally distributed, then what are they?
- A family of alternative distributions is at our disposal that maps onto what the distribution of errors could possibly look like


Examples of Generalized Models


- Binary (dichotomous): Logit, Probit, (Complementary) Log-Log
- Unordered categorical (nominal): Baseline Category Logit
- Ordered categorical (ordinal): Cumulative or Adjacent Category Logit
- Counts (discrete, positive values): Poisson or Negative Binomial (for too little/much variance)
- Zero-inflated (pile of 0s, then some distribution after):
  - ZIP/ZINB: Poisson/Negative Binomial + an excess of 0s
  - Two-Part Model: Logit for "Whether", Normal/Log for "How Much"
- Continuous but skewed: Log-Normal or Gamma
- Censored (piled up and cut off at one end): Tobit

Means and Variances by DV Type


Means:
- Quantitative DV: mean = sum of items / n = Ȳ
- Binary DV: mean = number of 1s / n = py

Variances:
- Quantitative DV: Var(Y) = Σ(Yi − Ȳ)² / (n − 1), summing over i = 1 to n
  - If 3+ options get used, the variance is NOT determined by the mean
- Binary DV: Var(Y) = py(1 − py) = pyqy = σy²
  - In binary DVs, the variance IS determined by the mean (py); it is largest at py = .5 and shrinks as py approaches 0 or 1

A General Linear Model for Binary Outcomes?


If Yi is a binary (0 or 1) outcome:
- The expected mean is the proportion of people who have a 1 (or p, the probability of Yi = 1 in the sample)
- The probability of having a 1 is what we're trying to predict for each person, given the values of his/her predictors
- General linear model: Yi = β0 + β1Xi + β2Zi + ei
  - β0 = expected probability when all predictors are 0
  - βs = expected change in probability for a one-unit change in the predictor
  - ei = difference between observed and predicted values
- The model becomes Yi = (predicted probability of 1) + ei



A General Linear Model With Binary Outcomes?


But if Yi is binary, then ei can only be 2 things:
- ei = observed Yi minus predicted Yi
  - If Yi = 0, then ei = (0 − predicted probability)
  - If Yi = 1, then ei = (1 − predicted probability)
- The mean of the errors would still be 0...
- ...but the variance of the errors can't possibly be constant over levels of X like we assume in general linear models
- The mean and variance of a binary outcome are dependent! Because the conditional mean of Y (p, the predicted probability that Y = 1) depends on X, so does the error variance.

A General Linear Model With Binary Outcomes?


How can we have a linear relationship between X and Y?
- The probability of a 1 is bounded between 0 and 1, but predicted probabilities from a linear model aren't bounded, so impossible values result
- The linear relationship needs to be "shut off" somehow, i.e., made nonlinear
[Figure: two plots of Prob(Y=1) against the X predictor; a straight regression line escapes the 0-1 bounds into impossible values, so the relationship must bend to stay within range]

3 Problems with General* Linear Models for Binary Outcomes


*General = model for a continuous, normal outcome

1. Restricted range (e.g., 0 to 1 for a binary item)
- Predictors should not be linearly related to the observed outcome
- Effects of predictors need to be "shut off" at some point to keep predicted values of the binary outcome within range

2. Variance is dependent on the mean, and is not estimated
- Fixed and random parts are related
- So the residuals can't have constant variance

3. Residuals have a limited number of possible values
- Predicted values can each only be "off" in two ways
- So the residuals can't be normally distributed

Generalized Models for Binary Outcomes


Rather than modeling the probability of a 1 directly, we need to transform it into a more continuous variable with a link function. For example:
- We could transform probability into odds: odds = p / (1 − p) = prob(1) / prob(0)
  - If p = .7, then Odds(1) = 2.33 and Odds(0) = 0.429
  - The odds scale is badly skewed, asymmetric, and ranges from 0 to +∞... nope, that's not helpful
- So take the natural log of the odds, called the "logit" link:
  - LN[p / (1 − p)] = natural log of [prob(1) / prob(0)]
  - If p = .7, then LN[Odds(1)] = 0.847 and LN[Odds(0)] = −0.847
  - The logit scale is now symmetric about 0. DING!
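A quick numeric sketch of this transformation chain (not from the class files; the DATA step and the example probabilities are just for illustration):

DATA logit_demo;
  DO p = .1 TO .9 BY .2;                      * some example probabilities;
    odds   = p / (1 - p);                     * odds = prob(1) / prob(0);
    logit  = LOG(odds);                       * natural-log link;
    p_back = EXP(logit) / (1 + EXP(logit));   * inverse link recovers p;
    OUTPUT;
  END;
RUN;
PROC PRINT DATA=logit_demo; RUN;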

Turning Probability into Logits


The logit is a nonlinear transformation of probability:
- Equal intervals in logits are NOT equal intervals in probability
- The logit runs from −∞ to +∞ and is symmetric about prob = .5 (logit = 0)
- This solves the problem of using a linear model: the model will be linear with respect to the logit, which translates into nonlinear with respect to probability (i.e., the effect shuts off as needed)
- Zero point on each scale: Probability = .5, Odds = 1, Logit = 0
- Logit = LN[p / (1 − p)]


Transforming Probabilities to Logits


[Figure: s-shaped curve mapping the logit (x-axis, −4 to +4) onto probability (y-axis, 0 to 1)]

Probability → Logit:  0.99 → 4.6;  0.90 → 2.2;  0.50 → 0.0;  0.10 → −2.2

Can you guess what a probability of .01 would be on the logit scale?

Nonlinearity in Prediction
The relationship between X and the probability of response = 1 is nonlinear: an s-shaped logistic curve whose shape and location are dictated by the estimated fixed effects.
The model is linear with respect to the logit, but nonlinear with respect to probability (shown here for β0 = 0, β1 = 1).

[Figure: predicted logit (a straight line) and predicted probability (an s-shaped curve) as a function of Predictor X]

The logit version of the model will be easier to explain; the probability version of the prediction will be easier to show.

Predicting Logits, Odds, & Prob:


Different but Equivalent Equations
Coefficients for each form of the model:
- Logit: LN[pi / (1 − pi)] = β0 + β1Xi + β2Zi
  - Predictor effects are linear and additive like in regression, but what does a change in the logit mean, anyway?
- Odds: pi / (1 − pi) = exp(β0) * exp(β1Xi) * exp(β2Zi)
  - A compromise: the effects of predictors are multiplicative
- Prob: P(Yi = 1) = [exp(β0) * exp(β1Xi) * exp(β2Zi)] / [1 + exp(β0) * exp(β1Xi) * exp(β2Zi)]
  - Effects of predictors on probability are nonlinear and non-additive (no "one-unit change" language allowed)

Example: Categorical Predictor


DV = Admission (0 = no, 1 = yes); IV = Sex (0 = F, 1 = M)

Variables in the Equation (Step 1; variable entered: Gender)
            B        S.E.    Wald    df   Sig.   Exp(B)
Gender      1.695    .976    3.015   1    .082   5.444
Constant    -.847    .690    1.508   1    .220   .429

- Logit: LN[p / (1 − p)] = β0 + β1Xs
  - Logit(Y) = −.847 + 1.695(M); note that it is additive
  - Log odds for women = −.847, for men = .847
- Odds: p / (1 − p) = exp(β0) * exp(β1Xs)
  - Odds(Y) = exp(−.847) * exp(1.695*M); note that it is multiplicative
  - Exponentiate, then multiply: Odds = .429 * 5.444, giving .429 for women and 2.333 for men
- Prob: P(Yi = 1) = exp(β0)*exp(β1Xs) / [1 + exp(β0)*exp(β1Xs)]
  - Prob(Yi = 1 | male) = (.429 * 5.444) / [1 + (.429 * 5.444)] = .70
- So, for men: probability(admit) = .70, odds = 2.334, log odds = 0.847
  For women: probability(admit) = .30, odds = 0.429, log odds = −0.847
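A hedged sketch of the single-level logistic regression that would produce output like the table above; the data set name and variable coding are assumptions, not taken from the class files:

PROC LOGISTIC DATA=admitdata;
  MODEL admit (EVENT='1') = gender;   * admit coded 0=no 1=yes and gender coded 0=F 1=M;
RUN;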

Example: Continuous Predictor


DV = Senility (0 = no, 1 = yes); IV = WAIS (centered so that 0 = 10)

Variables in the Equation (Step 1; variable entered: WAIS)
            B        S.E.     Wald    df   Sig.   Exp(B)
WAIS        -.275    .103     7.120   1    .008   .760
Constant    1.776    1.067    2.771   1    .096   5.907

- Logit: LN[p / (1 − p)]s = β0 + β1Xs
  - Logit(Ys) = 1.776 − .275(WAIS = 1) = 1.50; note that it is additive
  - For every one-unit change in WAIS, the log odds go down by .275
- Odds: (p / (1 − p))s = exp(β0 + β1Xs) or exp(β0) * exp(β1Xs)
  - Odds(Ys) = exp(1.776) * exp(−.275(WAIS = 1)); note that it is multiplicative
  - Exponentiate, then multiply: Odds(Ys) = 5.907 * .760 = 4.49
- Prob: P(1)s = exp(β0 + β1Xs) / [1 + exp(β0 + β1Xs)]
  - Prob(Y = 1)s = (5.907 * .760) / [1 + (5.907 * .760)] = .82
- So, if WAIS = 10, prob(senility) = .86, odds = 5.91, log odds = 1.78
  If WAIS = 11, prob(senility) = .82, odds = 4.49, log odds = 1.50
  If WAIS = 16, prob(senility) = .53, odds = 1.13, log odds = 0.13
  If WAIS = 17, prob(senility) = .46, odds = 0.86, log odds = −0.15
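A minimal sketch (the coefficients come from the table above; everything else is assumed) that traces the predicted logit, odds, and probability of senility across the WAIS range:

DATA wais_pred;
  b0 = 1.776;  b1 = -0.275;           * estimates from the output above;
  DO wais = 10 TO 20;
    logit = b0 + b1*(wais - 10);      * additive on the logit scale;
    odds  = EXP(logit);               * multiplicative on the odds scale;
    prob  = odds / (1 + odds);        * nonlinear on the probability scale;
    OUTPUT;
  END;
RUN;
PROC PRINT DATA=wais_pred; RUN;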


More Models for Binary Outcomes: Logit vs. Probit (Ogive)


2 alternative link functions (both new Ys run from −∞ to +∞):
- Logit link: binary Y ⇒ LN[p / (1 − p)]; the logit is the new transformed Y
- Probit link: binary Y ⇒ Φ⁻¹(p), the value of the standard normal curve below which the observed proportion is found (the z-score for the area to the left for that probability); the z-score is the new Y

Same Model for the Means:
- Main effects and interactions of predictors as desired
- No analog to odds ratios in probit, however

Different Model for the Variances: in both models ei ~ Bernoulli, but the new transformed Ys have a known residual variance of:
- Logit: π²/3, or 3.29 (from the logistic distribution)
- Probit: 1.00 (from the standard normal distribution)

Threshold Model Concept


[Figure: distributions of the transformed y*, Probit (SD = 1) and Logit (SD = 1.8), with a threshold marking where y = 0 becomes y = 1]
- Rescale to equate model coefficients: βL ≈ 1.7*βP (you'd think the factor would be 1.8, but it's 1.7)
- Another way these models are explained is with the threshold concept: underlying the observed 0/1 response is a pretend continuous variable called y*, such that if y* < 0 then y = 0 and if y* ≥ 0 then y = 1, so the model predicts the probability that y* exceeds the threshold at which 0 becomes 1
- Accordingly, the difference between logit and probit is that the continuous underlying variable y* has a variance of 3.29 (SD = 1.8) in logit or 1.0 in probit
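Stated compactly (a restatement of the threshold idea above in equations, with τ standing in for the threshold):

y_i = \begin{cases} 0 & \text{if } y_i^* < \tau \\ 1 & \text{if } y_i^* \ge \tau \end{cases},
\qquad
y_i^* = \beta_0 + \beta_1 x_i + e_i^*,
\qquad
e_i^* \sim \begin{cases} \text{Logistic}(0,\; \pi^2/3 = 3.29) & \text{logit} \\ \text{Normal}(0,\; 1) & \text{probit} \end{cases},
\qquad
\beta_{\text{logit}} \approx 1.7\,\beta_{\text{probit}}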


More Models for Binary DVs:


- Logit = Probit*1.7
  - LINK=LOGIT, DIST=BINARY in SAS GLIMMIX, but no probit (both are in Mplus via CATEGORICAL ARE)
- Log-Log is for DVs in which 1 is more frequent
  - LINK=LOGLOG, DIST=BINARY in SAS GLIMMIX (not in Mplus)
- Complementary Log-Log is for DVs in which 0 is more frequent
  - LINK=CLOGLOG, DIST=BINARY in SAS GLIMMIX (not in Mplus)

Cumulative Logit Model for c Ordered Categories (Ordinal)


Models the probability of lower vs. higher cumulative categories via c − 1 submodels (e.g., if c = 4):
- Submodel 1: 0 vs. 1, 2, 3
- Submodel 2: 0, 1 vs. 2, 3
- Submodel 3: 0, 1, 2 vs. 3

Submodels can be phrased with intercepts or thresholds:
- Intercept = logit of the probability of a higher response (= threshold * −1)
- Threshold = logit of the probability of a lower response (= intercept * −1)

Logit of 0 = 1 − int1       or  = thres1 − 0
Logit of 1 = int1 − int2    or  = thres1 − thres2
Logit of 2 = int2 − int3    or  = thres2 − thres3
Logit of 3 = int3 − 0       or  = 1 − thres3



Cumulative Logit Model for c Ordered Categories (Ordinal)


- Because the middle probabilities are not given directly, it is called an "indirect" or "difference" model
- Can be done in SAS GLIMMIX (LINK=CLOGIT, DIST=MULT) and in Mplus with CATEGORICAL ARE (= graded response model in IRT); a GLIMMIX sketch follows below
- SAS also has cumulative versions of log-log (LINK=CUMLOGLOG) and complementary log-log (LINK=CUMCLL), each with DIST=MULT
- Cumulative submodels are each estimated using all the data (good), but ordering is assumed (maybe bad)
- Effects of predictors are assumed the same across submodels: the "proportional odds" assumption
  - Is testable in Mplus or SAS PROC GLIMMIX via nominal models
  - Is testable more flexibly in SAS PROC NLMIXED
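A hedged GLIMMIX sketch of a multilevel cumulative logit (proportional odds) model; the data set and variable names are assumptions, not the class example:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL yord = time / SOLUTION LINK=CUMLOGIT DIST=MULTINOMIAL;   * c-1 cumulative submodels;
  RANDOM INTERCEPT / SUBJECT=id;                                 * person random intercept;
RUN;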


Adjacent Category Logit Model for c Ordered Categories (Ordinal)


Models the probability of a response in any category vs. the next via c − 1 submodels (e.g., if c = 4):
- Submodel 1: 0 vs. 1
- Submodel 2: 1 vs. 2
- Submodel 3: 2 vs. 3

Submodels can be phrased with intercepts or thresholds:
- Intercept = logit of the probability of a higher response (= threshold * −1)
- Threshold = logit of the probability of a lower response (= intercept * −1)

Logit of 0 = 1 − int1     or  = thres1 − 0
Logit of 1 = 1 − int2     or  = thres1 − 0
Logit of 2 = 1 − int3     or  = thres2 − 0
Logit of 3 = int3 − 0     or  = 1 − thres3



Adjacent Category Logit Model for c Ordered Categories (Ordinal)


- Because the middle probabilities are given directly, it is called a "direct" or "divide-by-total" model
- Not available in Mplus or SAS PROC GLIMMIX, but you can specify it yourself using NLMIXED (= partial credit model in IRT)
- Submodels are each estimated using only the data for those adjacent categories (bad), but ordering is not assumed (good)
- Effects of predictors are assumed the same across submodels: the "proportional odds" assumption
  - Is testable more flexibly in SAS PROC NLMIXED

Baseline Category Logit Model for c Unordered Categories (Nominal)


Models the probability of a response in any category vs. the reference category via c − 1 submodels (e.g., if c = 4):
- Submodel 1: 0 vs. 1
- Submodel 2: 0 vs. 2
- Submodel 3: 0 vs. 3

- Assumes no order whatsoever; is a "direct" model
- Can be done in SAS GLIMMIX (LINK=GLOGIT, DIST=MULT) and in Mplus with NOMINAL ARE (= nominal response model in IRT)
- Useful for testing the proportional odds assumption because all fixed and random effects are estimated separately per submodel, but it may be hard to estimate as a result

Generalized Multilevel Models for Non-Normal Outcomes


Same components as generalized models:
- A link function to transform the DV into something more continuous
- A linear predictive model (linear for the link-transformed DV)
- An alternative distribution assumed for the errors

The difference is that we will add random effects (i.e., additional piles of variance) to address dependency in longitudinal or clustered data:
- Piles of variance are ADDED TO, not EXTRACTED FROM, the original residual variance pile when that residual variance is fixed to a known value
- Thus, some concepts translate exactly from general multilevel models, but some don't

GEE: An Alternative Approach


GEE = Generalized Estimating Equations, a "marginal" or "population average" model:
- Assumes the outcome at each time point is univariate normal, but does not assume multivariate normality of the outcomes over time
- Uses link functions for discrete outcomes (logit, probit, etc.)
- Treats the variance model as a nuisance: it deals with the effects of correlation on the SEs instead of modeling it (no random effects; everything is in the R matrix)
- Will provide accurate tests of fixed effects EVEN IF the variance model is way off, so it works well for non-normally distributed outcomes (no assumptions made about the distribution of residuals)...
- EXCEPT that it assumes MCAR (which never holds) instead of MAR
- And it can't handle unbalanced time
- And it's based on quasi-likelihood, so model comparisons are problematic (stay tuned)
- So never mind.

Empty Multilevel Logistic Model for Binary Outcomes


Level 1:    Logit(yti) = β0i
Level 2:    β0i = γ00 + U0i
Composite:  Logit(yti) = γ00 + U0i

Note what's NOT in level 1:
- An eti residual variance is not estimated; it is fixed at π²/3 = 3.29
- The (known) residual is in the model for the actual Y, not for prob(Y) or logit(Y)

Logistic ICC = Var(U0i) / [Var(U0i) + Var(eti)] = Var(U0i) / [Var(U0i) + 3.29]

Can do -2LL difference test to see if Var(U0i) > 0 (although the ICC is somewhat problematic due to non-constant residual variance)
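A hedged GLIMMIX sketch of this empty model; the data set and variable names are assumptions. The ICC is then computed by hand from the random intercept variance in the output:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;   * true ML so -2LL tests are possible;
  CLASS id;
  MODEL y (EVENT='1') = / SOLUTION LINK=LOGIT DIST=BINARY;   * intercept-only model;
  RANDOM INTERCEPT / SUBJECT=id;                             * estimates Var(U0i);
RUN;
* ICC = Var(U0i) / (Var(U0i) + 3.29), computed from the covariance parameter estimate;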

Random Linear Time Model for Binary Outcomes


Level 1:    Logit(yti) = β0i + β1i(timeti)
Level 2:    β0i = γ00 + U0i
            β1i = γ10 + U1i
Composite:  Logit(yti) = (γ00 + U0i) + (γ10 + U1i)(timeti)

- The eti residual variance is still not estimated; it is fixed at π²/3 = 3.29
- Can test new fixed or random effects with −2LL difference tests (or Wald test p-values for fixed effects); a GLIMMIX sketch follows below
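A hedged sketch of this random linear time model using adaptive quadrature (the names and the number of quadrature points are assumptions):

PROC GLIMMIX DATA=longdata METHOD=QUAD(QPOINTS=15);
  CLASS id;
  MODEL y (EVENT='1') = time / SOLUTION LINK=LOGIT DIST=BINARY;
  RANDOM INTERCEPT time / SUBJECT=id TYPE=UN;   * Var(U0i), Var(U1i), and their covariance;
RUN;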


Random Linear Time Model for Ordinal Outcomes (c=3)


Level 1:  Logit(yti1) = β0i1 + β1i1(timeti)
          Logit(yti2) = β0i2 + β1i2(timeti)
Level 2:  β0i1 = γ001 + U0i1    β1i1 = γ101 + U1i1
          β0i2 = γ002 + U0i2    β1i2 = γ102 + U1i2

- Assumes proportional odds: γ001 ≠ γ002, but γ101 = γ102, U0i1 = U0i2, and U1i1 = U1i2
- Testable via a nominal model (all unequal) or by using NLMIXED to write a custom model in which only some of these are constrained
- The eti residual variance is still not estimated; it is fixed at π²/3 = 3.29

New Interpretation of Fixed Effects


In general linear mixed models, the fixed effects are interpreted as the average effect for the sample:
- γ00 is the sample average intercept
- U0i is an individual's deviation from the sample average

What "average" means in generalized linear mixed models is different, because the natural log is a nonlinear function:
- The mean of the logs ≠ the log of the means
- Therefore, the fixed effects are not the sample-average effects; they are the effects specifically for a unit with Ui = 0
- Fixed effects are conditional on the random effects
- This gets called a "unit-specific" or "subject-specific" model
- This distinction does not exist for normally distributed outcomes

New Comparisons across Models without Estimated Residual Variance


NEW RULE: Coefficients cannot be compared across models, because they are not on the same scale! (see Bauer, 2009)
e.g., if the residual variance is fixed at 3.29 in binary models:
- When adding a random intercept to an empty model, the total variation in the outcome has increased, so the fixed effects will increase in size because they are unstandardized slopes:

  γmixed ≈ γfixed * sqrt[(Var(U0i) + 3.29) / 3.29]

- The residual variance can't decrease due to the effects of level-1 predictors, so all other estimates have to go up to compensate
  - If Xti is uncorrelated with the other Xs and is a pure level-1 variable (ICC ≈ 0), then the fixed effects and SD(U0i) will increase by the same factor
- Random effects variances can decrease, though, so level-2 effects should be on the same scale across models if level 1 is the same
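A worked instance of that rescaling (the numbers are invented for illustration): if the empty model gives Var(U0i) = 2, then

\gamma_{\text{mixed}} \approx \gamma_{\text{fixed}} \sqrt{\frac{\operatorname{Var}(U_{0i}) + 3.29}{3.29}}
 = \gamma_{\text{fixed}} \sqrt{\frac{2 + 3.29}{3.29}} \approx 1.27\,\gamma_{\text{fixed}}

so a fixed slope from the single-level model would be expected to grow by roughly 27% once the random intercept is added.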

A Little Bit about Estimation


Goal: end up with maximum likelihood estimates for all model parameters (because they are consistent and efficient):
- When we have a V matrix based on multivariate normally distributed eti residuals at level 1 and multivariate normally distributed Ui terms at level 2, ML is easy
- When we have a V matrix based on multivariate Bernoulli distributed eti residuals at level 1 and multivariate normally distributed Ui terms at level 2, ML is much harder
- Same with any other kind of model for non-normal level-1 residuals
- ML does not assume normality unless you fit a normal model!

3 main families of estimation approaches:
- Quasi-likelihood methods (marginal/penalized quasi-ML)
- Numerical integration (adaptive Gaussian quadrature)
- Also Bayesian methods (MCMC, newly available in SAS or Mplus)

2 Main Types of Estimation


Quasi-likelihood methods: older methods
- "Marginal" QL: approximation around the fixed part of the model
- "Penalized" QL: approximation around the fixed + random parts
- Both underestimate variances (MQL more so than PQL)
- 2nd-order PQL is supposed to be better than 1st-order MQL
- QL methods DO NOT PERMIT MODEL −2LL COMPARISONS
- The HLM program adds a Laplace approximation to QL, which then does permit −2LL comparisons (also in SAS GLIMMIX and Stata xtmelogit)

ML via numerical integration: more available nowadays
- Better estimates, but can take for-freaking-ever
- DOES permit regular model −2LL comparisons
- Will blow up with many random effects (which make the model exponentially more complex, especially in these models)
- Good idea to use PQL to get start values first, then use them in the integration

ML via Numerical Integration


Gold standard of estimation for non-normal outcomes. Relies on two assumptions of independence:
- Level-1 residuals are independent after controlling for level-2 random effects ("local independence" in CFA/IRT contexts)
  - This means the joint probability (likelihood) of two observations is just the probability of each multiplied together
- Level-2 units are independent (no additional clustering or nesting)
  - You can add level-3 random effects to handle such dependence, but then the assumption is independence given the level-3 random effects

Doesn't assume it knows the individual U values, but does assume that the U values have a multivariate normal distribution (cannot be modified in most programs)

ML via Numerical Integration


Step 1: Select starting values for all fixed effects.
Step 2: Compute the likelihood of each observation given the current parameter values.
- The model tells you probability(y = 1) given the model parameters
- Likelihood of binary data given the parameters = p^y (1 − p)^(1−y)
- But what about the Us? We don't have them yet... no problem!
- Computing the likelihood value for each set of possible parameters involves removing the individual U values from the equation by integrating across possible U values for each level-2 unit
  - Integration is like summing the area under the curve
  - Integration is accomplished by Gaussian quadrature: summing up rectangles that approximate the integral for each level-2 unit

ML via Numerical Integration


Step 2 (still): Divide the U distribution into rectangles.
- Gaussian quadrature (# rectangles = # quadrature points)
- Can either divide the whole distribution into rectangles, or take the most likely section for each level-2 unit and rectangle that
  - This is "adaptive" quadrature: computationally more demanding, but it gives more accurate results with fewer rectangles
- The likelihood of each level-2 unit's outcomes at each U rectangle is then weighted by that rectangle's probability of being observed (from the multivariate normal distribution). The weighted likelihoods are then summed across all rectangles... ta-da! Numerical integration.


Example of Numeric Integration:


Binary DV, Fixed Linear Time, Random Intercept Model

1. Start with values for the fixed effects: intercept γ00 = 0.5, time slope γ10 = 1.5.
2. Compute the likelihood for the real data based on the fixed effects and plausible U0i values (−2, 0, 2) using the model: Logit(yti = 1) = γ00 + γ10(timeti) + U0i.
   Here for one person with yti = 1 at both of two occasions:

U0i = −2:  Time 0: logit = 0.5 + 1.5(0) − 2 = −1.5, Prob = 0.18;  Time 1: logit = 0.5 + 1.5(1) − 2 = 0.0, Prob = 0.50
           Likelihood if both y = 1: 0.0912;  theta prob = .05, theta width = 2, product = 0.00912
U0i =  0:  Time 0: logit = 0.5 + 1.5(0) + 0 = 0.5, Prob = 0.62;  Time 1: logit = 0.5 + 1.5(1) + 0 = 2.0, Prob = 0.88
           Likelihood if both y = 1: 0.5483;  theta prob = .40, theta width = 2, product = 0.43861
U0i = +2:  Time 0: logit = 0.5 + 1.5(0) + 2 = 2.5, Prob = 0.92;  Time 1: logit = 0.5 + 1.5(1) + 2 = 4.0, Prob = 0.98
           Likelihood if both y = 1: 0.9075;  theta prob = .05, theta width = 2, product = 0.09075

Overall likelihood (sum of products over all thetas): 0.53848
(Do this for each occasion, then multiply this whole thing over all people; repeat with new values of the fixed effects until the highest overall likelihood is found.)
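A minimal DATA-step sketch that reproduces the hand calculation above; the three U0i rectangles, their heights, and the width of 2 are taken straight from the table, and nothing else is from the class files:

DATA quad_demo;
  b0 = 0.5;  b1 = 1.5;                         * fixed effects being tried;
  ARRAY u[3] _TEMPORARY_ (-2, 0, 2);           * quadrature points for U0i;
  ARRAY w[3] _TEMPORARY_ (.05, .40, .05);      * height (prob) of each rectangle;
  width = 2;                                   * rectangle width;
  marginal = 0;
  DO q = 1 TO 3;
    p0 = EXP(b0 + b1*0 + u[q]) / (1 + EXP(b0 + b1*0 + u[q]));   * P(y=1) at time 0;
    p1 = EXP(b0 + b1*1 + u[q]) / (1 + EXP(b0 + b1*1 + u[q]));   * P(y=1) at time 1;
    marginal = marginal + p0*p1*w[q]*width;    * weighted likelihood if both y=1;
  END;
  PUT marginal=;                               * writes the overall likelihood (about 0.5385) to the log;
RUN;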


ML via Numerical Integration


Step 3: Decide whether the algorithm should stop.
- The algorithm should stop once the values of the likelihood don't change much from iteration to iteration
- The convergence criterion is typically a tiny number like 0.000001
- However, if the change in the likelihood value from the previous iteration to the current iteration is larger than your criterion...

Step 4: Choose new parameter values (repeat as needed).
- Many methods are available to make the selection:
  - Newton-Raphson: uses the shape of the likelihood function to find parameter values closer to the maximum; provides SEs based on observed information (which comes from the second derivatives of the LL)
  - Fisher scoring: same idea, but SEs are based on expected information (which is generally not recommended)
  - EM algorithm: the Us are treated as missing data (SEs based on expected information)
  - Mplus alternates between approaches to maximize efficiency

Tobit Models: Censored Outcomes


For outcomes with ceiling or floor effects:
- Can be right-censored and/or left-censored
- Also "inflated" or not; inflation = a binary variable in which 1 = censored, 0 = not censored
- The model assumes a continuous normal distribution for y* instead (y without the missing responses; no transformation otherwise)
- In Mplus (not in GLIMMIX), use CENSORED ARE (with options):
  CENSORED ARE y1 (a) y2 (b) y3 (ai) y4 (bi);
  - y1 is censored from above (right); y2 is censored from below (left)
  - y3 is censored from above (right) and has an inflation variable (inflated: y3#1)
  - y4 is censored from below (left) and has an inflation variable (inflated: y4#1)
- So you can predict the distribution of y1-y4, as well as whether or not y3 and y4 are censored (inflation), as separate outcomes:
  y3 ON x;      x predicts the value of Y if at the censoring point or above
  y3#1 ON x;    x predicts whether Y is censored (1) or not (0)

Models for Proportions


- DVs may be bounded from 0 to 1 even if they aren't binary (e.g., % correct)
- Need a logit link to prevent predicted Y from going out of the 0-1 bounds
  - Predicted Y "shuts off" as it approaches 0 or 1, same as for binary DVs
- But because Y is not binary, the binomial distribution is used for its residuals, such that Y is modeled as Y = #events / #trials
- Can be predicted as 2 variables in SAS GLIMMIX when #trials varies (a fuller multilevel sketch follows below):
  MODEL events/trials = (predictors) / SOLUTION LINK=LOGIT DIST=BIN;
- Or can be predicted directly as a proportion in SAS GLIMMIX otherwise:
  MODEL accuracy = (predictors) / SOLUTION LINK=LOGIT DIST=BIN;
- A normal distribution for the residuals only works ok if the mean is near .50
- In single-level models, you can also estimate a dispersion or scale parameter that allows the residual variance to exceed what it is predicted to be (from p*[1 − p]), but this doesn't seem to be possible using true ML in GLIMMIX for multilevel models
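A hedged multilevel version of the events/trials specification above, with a person random intercept; the data set and variable names are assumptions:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL ncorrect/ntrials = time / SOLUTION LINK=LOGIT DIST=BINOMIAL;   * proportion as events over trials;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;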

Models for Skewed Continuous DVs.

Log-Normal:
- y* = LN(yi), so LN(ei), but not ei itself, is assumed normally distributed
- LINK=IDENTITY, DIST=LOGNORMAL in SAS GLIMMIX (not in Mplus)
- Or just log-transform Y and fit the exact same model using MIXED! (a sketch of this route follows below)
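A minimal sketch of that second route (log-transform, then use MIXED); the data set, ID, and time variable names are assumptions, and only RT_sec comes from the class example:

DATA rtlog;
  SET rtdata;
  logRT = LOG(RT_sec);        * natural-log transform of the outcome;
RUN;

PROC MIXED DATA=rtlog METHOD=REML;
  CLASS id;
  MODEL logRT = time / SOLUTION;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;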

Models for Skewed Continuous DVs.

Gamma:
- y* = LN(y), and then LN(e) is assumed gamma distributed, with shape and scale parameters
- LINK=LOG, DIST=GAMMA in SAS GLIMMIX (not in Mplus)

RT Data from Chapter 15


SAS Code to Make this Type of Plot


ODS RTF FILE="&filesave./RT Histogram.rtf";
PROC UNIVARIATE DATA=&datafile.;
  VAR RT_sec;
  HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistN  NORMAL(COLOR=red W=5 MU=est SIGMA=est) HAXIS=AXIS2;
  HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistLN LOGNORMAL(COLOR=purple W=5 SIGMA=est ZETA=est) HAXIS=AXIS2;
  HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistG  GAMMA(COLOR=blue W=5 ALPHA=est SIGMA=est) HAXIS=AXIS2;
  * Settings for X-axis;
  AXIS2 LABEL=(HEIGHT=1.5 "RT in Seconds") ORDER=(0 TO 60 BY 5);
RUN;
ODS RTF CLOSE;

* Rename predicted curve variable, merge into single data set, export to excel;
DATA RTdistN;  SET RTdistN;  Normal=_EXPPCT_;    DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdistLN; SET RTdistLN; LogNormal=_EXPPCT_; DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdistG;  SET RTdistG;  Gamma=_EXPPCT_;     DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdist;   MERGE RTdistN RTdistLN RTdistG;   BY _MIDPT_; RUN;


Models for Count Outcomes


Counts: non-negative integer, unbounded responses
- e.g., how many cigarettes did you smoke this week?

Poisson and negative binomial models:
- y* = LN(y), which keeps the predicted count positive
- Residuals follow 1 of 2 distributions:
  - A Poisson distribution, in which the mean = the variance
  - A negative binomial distribution, which includes a new scaling or dispersion parameter k that allows the variance to be bigger than the mean: variance = mean*(1 + k*mean) when k > 0
- The negative binomial mixes a gamma distribution with the Poisson
- Poisson is nested within negative binomial (can test whether the dispersion differs from 0)
- In Mplus: COUNT ARE y1 (p) y2 (nb); in SAS GLIMMIX: LINK=LOG, DIST=POISSON or DIST=NEGBIN (sketches follow below)
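Hedged GLIMMIX sketches of the two count models; only the DIST= value differs between them, and all names are assumptions. Because both runs use true ML, their −2LL values can be compared to judge whether the extra dispersion parameter is needed:

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL count = time / SOLUTION LINK=LOG DIST=POISSON;   * mean = variance;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;

PROC GLIMMIX DATA=longdata METHOD=LAPLACE;
  CLASS id;
  MODEL count = time / SOLUTION LINK=LOG DIST=NEGBIN;    * adds a dispersion parameter;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;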

Models for Count DVs.

Poisson: y* = LN(y), and LN(e) is assumed Poisson distributed such that mean = variance (dispersion = 0).
Negative binomial: y* = LN(y), and LN(e) is assumed gamma distributed, such that varianceNB = varianceP*(1 + varianceP*k).

Modeling Zeros in Count Data


No zeros → zero-truncated negative binomial
- e.g., how many days were you in the hospital? (has to be > 0)
- In Mplus only: COUNT ARE y1 (nbt);

Too many zeros → zero-inflated Poisson or NB
- e.g., # of cigarettes smoked, when non-smokers are asked too
- In Mplus only: COUNT ARE y2 (pi) y3 (nbi);
  - Refer to the inflation variable as y2#1 or y3#1
- Tries to distinguish two kinds of zeros:
  - "Structural" zeros: would never do it; inflation is predicted as the logit of being a structural zero
  - "Expected" zeros: could do it, just didn't (part of the regular count); the count with expected zeros is predicted by Poisson or NB
- Poisson or NB without inflation is nested within the same model with inflation (and Poisson is nested within NB)

Zero-Inflated Poisson and Negative Binomial Distributions


Zero-inflated distributions have extra "structural" zeros not expected from the Poisson or NB ("stretched" Poisson) distributions. This can be tricky to estimate and interpret because the model distinguishes between kinds of zeros rather than just modeling zero or not... (Image borrowed from Atkins & Gallop, 2007.)

Modeling Zeros More Generally


Other, more direct ways of dealing with too many zeros: split the distribution into (0 or not) and (if not 0, how much):
- Negative binomial "hurdle" (or zero-altered negative binomial)
  - In Mplus: COUNT ARE y1 (nbh);
  - 0 or not: predicted by the logit of being a 0 (0 is the "higher" category)
  - How much: predicted by a non-zero NB for the counts (integers only)
- In Mplus, the two-part model uses the DATA TWOPART: command
  - NAMES ARE y1-y4;        list the outcomes to be split into 2 parts
  - CUTPOINT IS 0;          where to split the observed outcomes
  - BINARY ARE b1-b4;       create names for the "0 or not" part
  - CONTINUOUS ARE c1-c4;   create names for the "how much" part
  - TRANSFORM IS LOG;       transformation of the continuous part
  - 0 or not: predicted by the logit of being NOT 0 ("something" is the 1)
  - How much: predicted as a continuous distribution (like log or normal)

Summary: Differences in Generalized Mixed Models


- Analyze a link-transformed DV (e.g., via logit, log, log-log):
  - Linear relationship between the Xs and the transformed Y
  - Nonlinear relationship between the Xs and the original Y
- The original e residuals are assumed to follow some non-normal distribution
- In models for binary or count data, the level-1 residual variance is fixed (set) to a constant:
  - So it can't go down after adding level-1 predictors, which means that the scale of everything else has to go UP to compensate
  - The scale of the model will also be different after adding random effects for the same reason: the total variation in the model is now bigger
  - Fixed effects may not be comparable across models as a result
- Estimation is trickier and takes longer:
  - Numerical integration is best, but may blow up in complex models
  - Start values are often essential (can get those with the MSPL estimator)
