
Once you know that hierarchies exist, you see them everywhere.

(Kreft and de Leeuw, 1998)

Multilevel/Hierarchical Linear Modeling
Jasmina Tacheva

What is Multilevel Modeling (MM)?

MM: more than a statistical technique; a way of understanding the types of relationships that can be examined.

The multilevel perspective changes how we think about research designs: we can ask different research questions.

Appropriate for populations exhibiting structures and dependencies; in essence, a random effects modeling technique.

A method developed to deal with the problems of single-level analysis, cross-level inferences, and the ecological fallacy.

More generally, a multilevel model is a regression (a linear or generalized linear model) in which the parameters (the regression coefficients) are given a probability model. This second-level model has parameters of its own (the hyperparameters of the model), which are also estimated from data.

Key elements

Varying coefficients

A model for those varying coefficients (which can itself include group-level predictors); a minimal sketch follows below.
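As a minimal sketch, in the notation of Gelman and Hill (2006), a varying-intercept model gives the group intercepts their own probability model, whose parameters (the hyperparameters $\mu_\alpha$ and $\sigma_\alpha$) are estimated from the data:

$$y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i, \qquad \epsilon_i \sim \mathrm{N}(0, \sigma_y^2)$$
$$\alpha_j \sim \mathrm{N}(\mu_\alpha, \sigma_\alpha^2), \qquad j = 1, \ldots, J$$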

Basic Multilevel Data Structure: multiple levels of sampling

Observations at one level of analysis are nested within observations at another.

Persons within groups, e.g. workers in organizations, work groups, classrooms, families.

Observations within persons, e.g. ambulatory assessment, repeated measures in experiments.

Data requirements

Individual units with their group indicators

One or more response variables

In general, more than one unit per group is needed, and at least 10 groups

When the number of groups is small (less than 5), there is typically not enough information to accurately estimate group-level variation; multilevel models in this setting typically gain little beyond classical varying-coefficient models.

Multilevel Analyses

Datasets where observations at one level of analysis are nested within observations at another level are referred to as nested or hierarchically nested.

Some datasets are collected with an inherent multilevel structure, for example, students within schools, patients within hospitals, or data from cluster sampling.

If nesting is not taken into account when the data are analyzed, important
assumptions about the independence of errors are violated.

Independence of errors

Nested (within-group) observations are not independent of each other, but between-group observations are!

Even if a simpler method (e.g. OLS) happens to produce the same result as a multilevel model, it is still wrong to use it when analyzing multilevel data.

Children in the same classroom have the same teacher, so they cannot be treated as independent observations. Students in another class also have the same teacher, but that teacher is different from the one in class #1; these differences have to be controlled for.

Multilevel modeling does this in the most accurate way currently available.

Advantages of Multilevel Analyses

Multilevel models can be used for a variety of inferential goals, including causal inference, prediction, and descriptive modeling.

Parameter estimates incorporate effects of hierarchies.

Analyze means, variances, covariances at multiple levels simultaneously.

Covariances can differ across levels of analysis (which can lead to a statistical aggregation paradox).

Multilevel modeling allows the estimation of group averages and group-level effects, compromising between the overly noisy within-group estimate and the oversimplified regression estimate that ignores group indicators.

Traditional alternatives to multilevel modeling

Complete pooling: differences between groups are ignored;

No pooling: data from different sources are analyzed separately.

Both approaches have problems: no pooling ignores information and can give
unacceptably variable inferences, and complete pooling suppresses variation
that can be important or even the main goal of a study.

These extreme alternatives can be useful as preliminary estimates, but ultimately we prefer the partial pooling that comes out of a multilevel analysis (a sketch of the partial-pooling estimate follows below).
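As a concrete illustration of partial pooling (following Gelman and Hill, 2006), the multilevel estimate of a group's average is approximately a precision-weighted compromise between the no-pooling estimate (the raw group mean $\bar{y}_j$) and the complete-pooling estimate (the grand mean $\bar{y}_{\mathrm{all}}$):

$$\hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma_y^2}\,\bar{y}_j + \frac{1}{\sigma_\alpha^2}\,\bar{y}_{\mathrm{all}}}{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}}$$

Groups with many observations (large $n_j$) are pulled only slightly toward the grand mean; small groups are shrunk more heavily.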

Examples of alternative techniques for multilevel data

Accuracy defined in terms of Monte Carlo studies comparing different techniques

OLS vs. multilevel analyses:

aggregated means: ignore within-group relationships, producing inaccurate parameter estimates;

dummy-coded least squares (LSDV): takes between-group differences in means into account, but does not model error properly;

sub-group analyses: can analyze within-group relationships, but do not account for reliability and treat parameters as fixed;

ANCOVA: assumes slopes are the same; differences in slopes violate this assumption (heteroscedasticity); MLM can test differences in slopes without violating assumptions.

Classical regression can sometimes accommodate varying coefficients by using indicator variables. The feature that distinguishes multilevel models from classical regression is the modeling of the variation between groups.

When to use Multilevel Modeling

When data are naturally nested or hierarchical

When data have influences occurring at different levels (individual, over time, over domains, etc.)

When enough groups are present at upper levels (typically, more than 5)

More accurate than ICCs (intraclass correlations), which only deal with the distributions of means but not with the relationships themselves

Way to find middle ground between the "individualist fallacy" and the
"ecological fallacy"

Multilevel modeling assumptions

A multilevel model requires additional assumptions beyond those of classical regression: each level of the model corresponds to its own regression with its own set of assumptions, such as additivity, linearity, independence, equal variance, and normality.

One of the main purposes of multilevel models is to deal with cases where the assumption of independence is violated; multilevel models do, however, assume that (1) the level-1 and level-2 residuals are uncorrelated and (2) the errors (as measured by the residuals) at the highest level are uncorrelated.

Comparing the assumptions for HLM with OLS models

OLS assumptions:

Linearity: the functional form is linear

Normality: residuals are normally distributed

Homoscedasticity: residual variance is constant

Independence: observations are independent of each other

HLM assumptions:

Linearity: the functional forms are linear at each level

Normality: level-1 residuals are normally distributed and level-2 random effects (the u's) have a multivariate normal distribution

Homoscedasticity: level-1 residual variance is constant

Independence: level-1 residuals and level-2 residuals are uncorrelated

Independence: observations at the highest level are independent of each other

Example

Consider data from students in many schools, predicting in each school the students' grades y on a standardized test given their scores on a pre-test x and other information.

Specifically, we assume just one student-level predictor x (for example, a pre-test score) and one school-level predictor u (for example, average parents' incomes).

In this situation, two multilevel model scenarios can arise:

Varying-intercept model

Regressions have the same slope in each of the schools, and only the intercepts vary (a sketch in model notation follows below).

We use i for individual students and j[i] for the school of student i.

The number of data points J (here, schools) in the higher-level regression is typically much less than n, the sample size of the lower-level model (for students in this example).
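In the notation of Gelman and Hill (2006), the varying-intercept model for this example can be written as:

$$y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i$$
$$\alpha_j = a + b\,u_j + \eta_j$$

where $\epsilon_i$ is the student-level error and $\eta_j$ is the school-level error.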

Varying-intercept, varying-slope model

Compared to the varying-intercept model, this has twice as many vectors of varying coefficients (α, β), twice as many vectors of second-level coefficients (a, b), and potentially correlated second-level errors η1, η2 (see the sketch below).
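A sketch in the same notation, where both the intercepts and the slopes get second-level regressions:

$$y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i$$
$$\alpha_j = a_0 + b_0 u_j + \eta_{j1}, \qquad \beta_j = a_1 + b_1 u_j + \eta_{j2}$$

with the school-level errors $\eta_{j1}$ and $\eta_{j2}$ allowed to be correlated.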

Steps in MM illustrated with an example in R


Step 1. Because multilevel modeling involves predicting variance at different levels, one often begins a multilevel analysis by determining the levels at which significant variation exists.

In the case of a two-level model, one generally assumes that there is significant variation in σ², that is, one assumes that within-group variation is present. One does not necessarily assume, however, that there will be significant intercept variation (τ00) or between-group slope variation (τ11).

In Step 1, one explores the group-level properties of the outcome variable to determine three things: (1) the ICC(1) associated with the outcome variable (how much of the variance in the outcome can be explained by group membership); (2) whether the group means of the outcome variable are reliable (around .70; Bliese, 2000); (3) whether the variance of the intercept (τ00) is significantly larger than zero. These three aspects of the outcome variable are examined by estimating an unconditional means model.

In the two-stage HLM notation, the model is:

Yij = β0j + rij
β0j = γ00 + u0j

In combined form, the model is:

Yij = γ00 + u0j + rij

This model states that the dependent variable is a function of a common intercept, γ00, and two error terms: the between-group error term, u0j, and the within-group error term, rij. The model essentially states that any Y value can be described in terms of an overall mean plus some error associated with group membership and some individual error. Bryk and Raudenbush (1992) note that this model is directly equivalent to a one-way random effects ANOVA. In the unconditional means model, the fixed portion of the model is γ00 (an intercept term) and the random component is u0j + rij. The random portion of the model states that intercepts will be allowed to vary among groups.

Steps in MM illustrated with an example in R


We begin the analysis by attaching the multilevel package (which also loads
the nlme package) and making the bh1996 data set in the multilevel package
available for analysis.
> library(multilevel)
> data(bh1996)
> Null.Model<-lme(WBEING~1,random=~1|GRP,data=bh1996,
control=list(opt="optim"))

In the model, the fixed formula is WBEING~1. This states that the only
predictor of well-being is an intercept term. One can think of this model as
stating that in the absence of any predictors, the best estimate of any
specific outcome value is the mean value on the outcome. The random
formula is random=~1|GRP. This specifies that the intercept can vary as a
function of group membership. This is the simplest random formula that one
will encounter, and in many situations a random intercept model may be all
that is required to adequately account for the nested nature of the grouped
data. The option control=list(opt="optim") in the call to lme instructs the
program to use R's general-purpose optimization routine.

Steps in MM illustrated with an example in R

Estimating the ICC. The unconditional means model provides between-group and within-group variance estimates in the form of τ00 and σ², respectively. As with the ANOVA model, it is useful to determine how much of the total variance is between groups. This can be accomplished by calculating the Intraclass Correlation Coefficient (ICC) using the formula:

ICC = τ00/(τ00 + σ²)
The VarCorr function provides estimates of variance for an lme object:
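A minimal sketch of the call:

> VarCorr(Null.Model)

Its output lists an (Intercept) row (the between-group variance, τ00) and a Residual row (the within-group variance, σ²).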

The estimate of τ00 (between-group or intercept variance) is 0.036, and the estimate of σ² (within-group or residual variance) is 0.789. The ICC estimate is therefore 0.036/(0.036 + 0.789) = .04. To verify that the ICC results from the random coefficient modeling are similar to those from an ANOVA model and the ICC1 function, one can perform an ANOVA analysis on the same data:
> tmod<-aov(WBEING~as.factor(GRP),data=bh1996)
> ICC1(tmod)
[1] 0.04336905

Steps in MM illustrated with an example in R


Estimating group-mean reliability. The reliability of group means often affects one's ability to detect emergent phenomena. In other words, a prerequisite for detecting emergent relationships at the aggregate level is to have reliable group means (Bliese, 1998).
By convention, estimates around .70 are considered reliable. Group-mean reliability estimates are a function of the ICC and group size (see Bliese, 2000; Bryk & Raudenbush, 1992).
The GmeanRel function from the multilevel package calculates the ICC, the
group size, and the group mean reliability for each group. When we apply the
GmeanRel function to our Null.Model based on the 99 groups in the bh1996 data
set, we are interested in two things:

First, we are interested in the average reliability of the 99 groups.

Second, we are interested in determining whether or not there are specific groups that have particularly low reliability (a sketch of the call follows below).
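A minimal sketch, assuming the GmeanRel interface from the multilevel package (the object name GREL.DAT and the .50 cutoff are our own choices):

> GREL.DAT<-GmeanRel(Null.Model)  # ICC, group sizes, per-group reliability
> mean(GREL.DAT$MeanRel)  # average group-mean reliability
> GREL.DAT$Group[GREL.DAT$MeanRel<.50]  # flag groups with low reliability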

The overall group-mean reliability is acceptable at .73, but several groups have quite low reliability estimates. Specifically, group 72 and group 98 have reliability estimates below .50.

Steps in MM illustrated with an example in R


Determining whether τ00 is significant. Returning to our original analysis involving well-being from the bh1996 data set, we might be interested in knowing whether the intercept variance (i.e., τ00) estimate of 0.036 is significantly different from zero. To do this we compare −2 log likelihood values between (1) a model with a random intercept, and (2) a model without a random intercept (a sketch follows below):
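A minimal sketch of the comparison, using gls from the nlme package to fit the model without a random intercept (the object name Null.Model.2 is our own choice):

> Null.Model.2<-gls(WBEING~1,data=bh1996,control=list(opt="optim"))
> anova(Null.Model.2,Null.Model)  # likelihood-ratio test of the random intercept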

The −2 log likelihood value for the gls model without the random intercept is 19536.17. The −2 log likelihood value for the model with the random intercept is 19347.34. The difference of 188.8 is significant on a chi-squared distribution with one degree of freedom (one model estimated a variance term associated with a random intercept, the other did not, resulting in the one-df difference). These results suggest that there is significant intercept variation.

Steps in MM illustrated with an example in R


Step 2. At this point in our example we have two sources of variation that we can attempt to explain in subsequent modeling: within-group variation (σ²) and between-group intercept (i.e., mean) variation (τ00). In many cases, these may be the only two sources of variation we are interested in explaining, so let us begin by building a model that predicts these two sources of variation. The form of the model, using Bryk and Raudenbush's (1992) notation, is:

WBEINGij = β0j + β1j(HRSij) + rij
β0j = γ00 + γ01(G.HRSj) + u0j
β1j = γ10
The first row states that individual well-being is a function of the group's intercept plus a component that reflects the linear effect of individual reports of work hours plus some random error. The second line states that each group's intercept is a function of some common intercept (γ00) plus a component that reflects the linear effect of average group work hours plus some random between-group error. The third line states that the slope between individual work hours and well-being is fixed: it is not allowed to randomly vary across groups. Stated another way, we assume that the relationship between work hours and well-being varies by no more than chance levels among groups.
When we combine the three rows into a single equation, we get an equation that looks like a common regression equation with an extra error term (u0j). This error term indicates that WBEING intercepts (i.e., means) can randomly differ across groups. The combined model is:

WBEINGij = γ00 + γ10(HRSij) + γ01(G.HRSj) + u0j + rij

Steps in MM illustrated with an example in R

This model is specified in lme as:
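A minimal sketch of the call, using the bh1996 variable names introduced in this example (HRS for individual work hours, G.HRS for group-average work hours):

> Model.1<-lme(WBEING~HRS+G.HRS,random=~1|GRP,data=bh1996,
control=list(opt="optim"))
> summary(Model.1)  # the text reports slopes of -.046 for HRS and -.127 for G.HRS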

Notice that work hours are significantly negatively related to individual well-being. Furthermore, after controlling for the individual-level relationship, average work hours (G.HRS) are related to the average well-being in a group. The interpretation of this model, like the interpretation of the contextual effect model (section 3.4.1), indicates that the slope at the group level significantly differs from the slope at the individual level. Indeed, in this example, each hour increase at the group level is associated with a -.163 (-.046 + -.127) decrease in average well-being. The coefficient of -.127 reflects the degree of difference between the two slopes.

Steps in MM illustrated with an example in R


Step 3. Let us continue our analysis by trying to explain the third source of variation, namely, variation in our slopes (τ11, τ12, etc.). To do this, we examine another variable from the Bliese and Halverson (1996) data set. This variable represents Army company members' ratings of leadership consideration (LEAD). Generally, individual soldiers' ratings of leadership are related to well-being. In this analysis, however, we will consider the possibility that the strength of the relationship between individual ratings of leadership consideration and well-being varies among groups. We begin by examining slope variation among the first 25 groups. Visually we can do this using xyplot from the lattice package.
> library(lattice)
> xyplot(WBEING~LEAD|as.factor(GRP),data=bh1996[1:1582,],
type=c("p","g","r"),col="dark blue",col.line="black",
xlab="Leadership Consideration",
ylab="Well-Being")

Steps in MM illustrated with an example in R

From the plot of the first 25 groups in the bh1996 data set, it seems likely that
there is some slope variation. The plot, however, does not tell us whether or
not this variation is significant. Thus, the first thing to do is to determine
whether the slope variation differs by more than chance levels.

Is slope variation significant? We begin our formal analysis of slope variability by adding leadership consideration to our model and testing whether or not there is significant variation in the leadership consideration and well-being slopes across groups. The model that we test is:

WBEINGij = β0j + β1j(HRSij) + β2j(LEADij) + rij
β0j = γ00 + γ01(G.HRSj) + u0j
β1j = γ10
β2j = γ20 + u2j

The last line of the model includes the error term u2j. This term indicates that the leadership consideration and well-being slope is permitted to randomly vary across groups. The variance term associated with u2j is τ12. It is this variance term that interests us in the cross-level interaction hypothesis. Note that we have not permitted the slope between individual work hours and individual well-being to randomly vary across groups. In combined form, the model is:

WBEINGij = γ00 + γ10(HRSij) + γ20(LEADij) + γ01(G.HRSj) + u2j(LEADij) + u0j + rij

Steps in MM illustrated with an example in R


In R this model is designated as:
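A minimal sketch of the call; the random formula ~LEAD|GRP allows both the intercept and the LEAD slope to vary across groups:

> Model.2<-lme(WBEING~HRS+LEAD+G.HRS,random=~LEAD|GRP,data=bh1996,
control=list(opt="optim"))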

Goodness of fit tests

Wald test

Chi-square: Snijders and Bosker (2012, p. 99) recommend using a "mixture distribution" (or "chi-bar distribution") when comparing models via the chi-square difference.

Likelihood ratio test: a comparison of two nested models via the likelihood ratio, i.e. deviance statistics. The deviance is usually not interpreted directly, but rather compared to the deviance(s) from other models fitted to the same data (a sketch follows below).
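A minimal sketch of such a deviance comparison in this example, assuming a hypothetical Model.2a that is identical to Model.2 except that the LEAD slope is not allowed to vary:

> Model.2a<-lme(WBEING~HRS+LEAD+G.HRS,random=~1|GRP,data=bh1996,
control=list(opt="optim"))
> anova(Model.2a,Model.2)  # deviance (likelihood-ratio) test of the random slope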

Multilevel modelling software reviews


http://www.bristol.ac.uk/cmm/learning/mmsoftware/

aML

EGRET

GENSTAT

GLLAMM

HLM

MIXREG

MLwiN

SAS

S-Plus

SPSS

Stata

SYSTAT

WinBUGS

References

Bliese, P. (2013). Multilevel modeling in R (2.5). Retrieved July 20, 2013.

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Kreft, I. G. G., & de Leeuw, J. (1998). Introducing multilevel modeling. Newbury Park, CA: Sage Publications.

Kwok, O. M., Underhill, A. T., Berry, J. W., Luo, W., Elliott, T. R., & Yoon, M. (2008). Analyzing longitudinal data with multilevel models: An example with individuals living with lower extremity intra-articular fractures. Rehabilitation Psychology, 53(3), 370.

Nezlek, J. B. (2008). An introduction to multilevel modeling for social and personality psychology. Social and Personality Psychology Compass, 2(2), 842-860.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). Newbury Park, CA: Sage Publications.
