Today's Topics:
Introduction to Generalized Multilevel Models
Models for Binary Outcomes
Other Generalized Models
Estimation in Generalized Multilevel Models
Continuous but skewed: Log-Normal or Gamma
Censored (piled up and cut off at one end): Tobit
Psyc 945 Class 14a 6 of 52
Variances:
Quantitative DV: Var(Y) = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² / (n − 1)
Binary DV: Var(Y) = p(1 − p)
Mean of errors would still be 0, but the variance of the errors can't possibly be constant over levels of X, as we assume in general linear models
The mean and variance of a binary outcome are dependent! This means that because the conditional mean of Y (p, the predicted probability that Y = 1) depends on X, so does the error variance
Psyc 945 Class 14a 9 of 52
[Figure: linear-model predictions of Prob(Y = 1) across Predictor X = 1 to 11; predicted values range from about −0.40 to 1.40, falling outside the 0–1 probability bounds]
Logit: LN(p / (1 − p))
[Figure: probability plotted against the logit; the s-shaped curve maps logits of about −4 to 4 onto probabilities of 0 to 1, e.g., p = .50 → logit 0, p = .10 → logit ≈ −2.2]
Can you guess what a probability of .01 would be on the logit scale?
Nonlinearity in Prediction
The relationship between X and the probability of response = 1 is nonlinear: an s-shaped logistic curve whose shape and location are dictated by the estimated fixed effects
Linear with respect to the logit, nonlinear with respect to probability (example: B0 = 0, B1 = 1)
[Figure: the same model plotted twice against Predictor X — a straight line on the logit scale, an s-shaped curve on the probability scale]
The logit version of the model will be easier to explain; the probability version of the prediction will be easier to show.
Prob: P(yᵢ = 1) = exp(β0 + β1xᵢ) / [1 + exp(β0 + β1xᵢ)]
Effects of predictors on probability are nonlinear and non-additive (no one-unit change language allowed)
Prob: P(yₛ = 1) = exp(β0)·exp(β1xₛ) / [1 + exp(β0)·exp(β1xₛ)]
Prob(Yₛ = 1) = (.429 × 5.444) / (1 + .429 × 5.444) = .70
So, for men, probability(admit) = .70, odds = 2.334, log odds = 0.847 for women, probability(admit) = .30, odds = 0.429, log odds = -0.847
Logit: Yₛ = 1.776 − .275(WAISₛ = 1) = 1.50 → note the model is additive: for every one-unit change in WAIS, the log odds go down by .275
Odds: (p / (1 − p))ₛ = exp(β0 + β1xₛ) or exp(β0)·exp(β1xₛ)    Prob: P(1)ₛ = exp(β0 + β1xₛ) / [1 + exp(β0 + β1xₛ)]
Odds: Yₛ = exp(1.776) · exp(−.275(WAISₛ = 1)) → note the model is multiplicative: exponentiate, then multiply: Odds Yₛ = 5.907 × .760 = 4.49
So, if WAIS=10, prob(senility) = .86, odds = 5.91, log odds = 1.78 So, if WAIS=11, prob(senility) = .82, odds = 4.49, log odds = 1.50 So, if WAIS=16, prob(senility) = .53, odds = 1.13, log odds = 0.13 So, if WAIS=17, prob(senility) = .46, odds = 0.86, log odds = -0.15
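The conversions above can be checked directly. The following Python sketch assumes the slide's estimates (intercept 1.776, slope −.275, with WAIS centered at 10); the helper names are illustrative:

```python
import math

# Reproduce the WAIS example on all three scales: additive in logits,
# multiplicative in odds, nonlinear in probability.
b0, b1 = 1.776, -0.275   # estimates from the slide; WAIS centered at 10

def logit(wais):
    return b0 + b1 * (wais - 10)

def odds(wais):
    return math.exp(logit(wais))

def prob(wais):
    o = odds(wais)
    return o / (1 + o)

for w in (10, 11, 16, 17):
    print(f"WAIS={w}: prob={prob(w):.2f}, odds={odds(w):.2f}, logit={logit(w):.2f}")
```

Note how each extra WAIS unit subtracts .275 from the logit but multiplies the odds by exp(−.275) = .760.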
Different model for the variances: in both models, eᵢ ~ Bernoulli, but the new transformed y*s have a known variance of:
Logit: π²/3 = 3.29 (from the logistic distribution)
Probit: 1 (from the standard normal distribution)
Another way these models are explained is with the threshold concept. Underlying the observed 0/1 response is really a pretend continuous variable called y*, such that: if y* < the threshold then y = 0, and if y* ≥ the threshold then y = 1, so the model predicts the probability that y* exceeds the threshold at which 0 becomes 1. Accordingly, the difference between logit and probit is that the continuous underlying variable y* has a variance of 3.29 (SD = 1.8, logit) or 1.0 (probit)
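A small simulation can illustrate the threshold idea (Python for illustration only; the sample size, seed, and logit value are arbitrary):

```python
import math, random

# Sketch of the threshold concept: draw a latent logistic y* around the
# predicted logit; observe y = 1 whenever y* crosses the threshold (0).
# The resulting proportion of 1s matches the model-implied probability.
random.seed(42)

def simulate_p(xb, n=200_000):
    ones = 0
    for _ in range(n):
        u = random.random()
        y_star = xb + math.log(u / (1 - u))   # standard logistic error (variance 3.29)
        ones += y_star >= 0                   # threshold at 0
    return ones / n

xb = 1.0                                      # illustrative predicted logit
expected = 1 / (1 + math.exp(-xb))            # model-implied P(y = 1)
print(simulate_p(xb), expected)
```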
Submodels for a 4-category outcome (0, 1, 2, 3), each comparing adjacent categories:
Submodel 1: 0 vs. 1    Submodel 2: 1 vs. 2    Submodel 3: 2 vs. 3
(Intercepts int1–int3 mark the boundaries between categories.)
Submodels are each estimated using only the data for those adjacent categories (bad), but ordering is not assumed (good)
Effects of predictors are assumed the same across submodels: the proportional odds assumption, which is testable more flexibly in SAS PROC NLMIXED
Submodels comparing each category against the reference category (0):
Submodel 1: 0 vs. 1    Submodel 2: 0 vs. 2    Submodel 3: 0 vs. 3
Useful for testing proportional odds assumption because all fixed and random effects are estimated separately per submodel, but may be hard to estimate as a result
The difference is that we will add random effects (i.e., additional piles of variance) to address dependency in longitudinal or clustered data
Piles of variance are ADDED TO, not EXTRACTED FROM, the original residual variance pile when it is fixed to a known value. Thus, some concepts translate exactly from general multilevel models, but some don't
Logistic ICC = Var(U0i) / [Var(U0i) + (π²/3)] = Var(U0i) / [Var(U0i) + 3.29]
Can do -2LL difference test to see if Var(U0i) > 0 (although the ICC is somewhat problematic due to non-constant residual variance)
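The logistic ICC can be sketched as follows (Python for illustration; the random-intercept variance of 1.0 is an arbitrary example value):

```python
import math

# Sketch: the logistic ICC uses the fixed level-1 residual variance
# pi^2 / 3 = 3.29 in place of an estimated sigma^2.
def logistic_icc(var_u0):
    """ICC for a random-intercept logistic model."""
    return var_u0 / (var_u0 + math.pi**2 / 3)

print(logistic_icc(1.0))   # a random-intercept variance of 1 gives ICC of about .23
```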
Combined:
The e_ti residual variance is still not estimated; it is fixed to π²/3 = 3.29. Can test new fixed or random effects with -2LL difference tests (or Wald test p-values for fixed effects)
What average means in generalized linear mixed models is different, because the natural log is a nonlinear function:
So the mean of the logs ≠ the log of the means. Therefore, the fixed effects are not the sample-average effect; they are the effect specifically for Ui = 0
Fixed effects are conditional on the random effects This gets called a unit-specific or subject-specific model This distinction does not exist for normally distributed outcomes
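A two-number example shows why the mean of the logs differs from the log of the means (Python for illustration; the values are arbitrary):

```python
import math

# Sketch: the log is nonlinear, so averaging on the log scale is not the
# same as averaging on the raw scale and then taking the log.
values = [1.0, 10.0]                    # illustrative outcome values
mean_of_logs = sum(math.log(v) for v in values) / len(values)
log_of_mean = math.log(sum(values) / len(values))
print(mean_of_logs, log_of_mean)
# The two quantities disagree, which is why fixed effects in a GLMM
# describe a unit with Ui = 0, not the sample-average effect.
```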
Residual variance cant decrease due to effects of level-1 predictors, so all other estimates have to go up to compensate
If Xti is uncorrelated with other Xs and is a pure level-1 variable (ICC ≈ 0), then the fixed effects and SD(U0i) will increase by the same factor
Random effects variances can decrease, though, so level-2 effects should be on the same scale across models if level-1 is the same
Doesn't assume it knows the individual U values, but does assume that the U values have a multivariate normal distribution (an assumption that cannot be modified in most programs)
However, if the change in likelihood value from the previous iteration to the current iteration is larger than your criterion, the model has not yet converged, and iteration continues
Model assumes continuous normal distribution for y* instead (y without missing responses; no transformation otherwise) In Mplus (not in GLIMMIX), use CENSORED ARE (with options):
CENSORED ARE y1 (a) y2 (b) y3 (ai) y4 (bi);
y1 is censored from above (right); y2 is censored from below (left)
y3 is censored from above (right) and has an inflation variable (inflated: y3#1)
y4 is censored from below (left) and has an inflation variable (inflated: y4#1)
So, can predict distribution of y1-y4, as well as whether or not y3 and y4 are censored (inflation) as separate outcomes
y3 ON x;   → x predicts the value of Y if at the censoring point or above
y3#1 ON x; → x predicts whether Y is censored (1) or not (0)
But because Y is not binary, the model uses the binomial distribution for its residuals, such that Y is modeled as Y = #events / #trials
Can be predicted as 2 variables in SAS GLIMMIX when #trials varies:
MODEL events/trials = (predictors) / SOLUTION LINK=LOGIT DIST=BIN;
In single-level models, you can also estimate a dispersion or scale parameter that allows the residual variance to exceed what it is predicted to be (from p*[1−p]), but this doesn't seem to be possible using true ML in GLIMMIX for multilevel models
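The implied binomial variance, and how a dispersion parameter scales it, can be sketched numerically (Python for illustration; n, p, and phi are arbitrary example values):

```python
# Sketch: with Y = #events / #trials, the binomial model implies
# Var(events) = n * p * (1 - p); a dispersion/scale parameter phi
# simply multiplies that implied variance (phi > 1 = overdispersion).
def binomial_variance(n_trials, p, phi=1.0):
    return phi * n_trials * p * (1 - p)

print(binomial_variance(20, 0.3))           # model-implied variance: 4.2
print(binomial_variance(20, 0.3, phi=1.5))  # overdispersed variance: 6.3
```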
Log-Normal:
y* = LN(yi), so LN(ei), but not ei itself, is assumed normally distributed
-- LINK=IDENTITY, DIST=LOGNORMAL in SAS GLIMMIX (not in Mplus)
-- Or just log-transform Y and fit the exact same model using MIXED!
Gamma:
y* = LN(y), and then LN(e) is assumed gamma distributed with shape and scale parameters
-- LINK=LOG, DIST=GAMMA in SAS GLIMMIX (not in Mplus)
HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistN NORMAL(COLOR=red W=5 MU=est SIGMA=est) HAXIS=AXIS2;
HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistLN LOGNORMAL(COLOR=purple W=5 SIGMA=est ZETA=est) HAXIS=AXIS2;
HISTOGRAM RT_sec / OUTHISTOGRAM=RTdistG GAMMA(COLOR=blue W=5 ALPHA=est SIGMA=est) HAXIS=AXIS2;
* Settings for X-axis;
AXIS2 LABEL=(HEIGHT=1.5 "RT in Seconds") ORDER=(0 TO 60 BY 5);
RUN; ODS RTF CLOSE;
* Rename predicted curve variable, merge into single data set, export to excel;
DATA RTdistN;  SET RTdistN;  Normal=_EXPPCT_;    DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdistLN; SET RTdistLN; LogNormal=_EXPPCT_; DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdistG;  SET RTdistG;  Gamma=_EXPPCT_;     DROP _CURVE_ _EXPPCT_; RUN;
DATA RTdist;   MERGE RTdistN RTdistLN RTdistG; BY _MIDPT_; RUN;
In Mplus: COUNT ARE y1 (p) y2 (nb); SAS GLIMMIX: LINK=LOG, DIST=Poisson or DIST=Negbin
Poisson: y* = LN(y), and LN(e) is assumed Poisson distributed such that mean = variance (dispersion = 0). Negative binomial: y* = LN(y), and LN(e) is assumed gamma distributed, such that varianceNB = varianceP(1 + varianceP·k), where k is the estimated dispersion parameter
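The Poisson-versus-negative-binomial variance relationship can be sketched numerically (Python for illustration; mu and k are arbitrary example values):

```python
# Sketch: for a given predicted mean mu, Poisson forces variance = mu,
# while the negative binomial adds gamma heterogeneity so that
# var_NB = var_P * (1 + var_P * k), with k the dispersion parameter.
def poisson_variance(mu):
    return mu

def negbin_variance(mu, k):
    var_p = poisson_variance(mu)
    return var_p * (1 + var_p * k)

mu, k = 4.0, 0.5                      # illustrative mean and dispersion
print(poisson_variance(mu))           # 4.0
print(negbin_variance(mu, k))         # 4 * (1 + 4 * 0.5) = 12.0
# Setting k = 0 collapses the negative binomial back to the Poisson,
# which is why the Poisson model is nested within the NB model.
```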
Poisson or NB without inflation is nested within the same models with inflation (and Poisson is nested within NB)