Publication details
https://www.routledgehandbooks.com/doi/10.4324/9780203848852.ch12
J. Kyle Roberts, James P. Monaco, Holly Stovall, Virginia Foster
Published online on: 19 Jul 2010
How to cite: J. Kyle Roberts, James P. Monaco, Holly Stovall, & Virginia Foster. (19 Jul 2010). Explained Variance in Multilevel Models. In Handbook of Advanced Multilevel Analysis. Routledge.
Accessed on: 28 Jan 2017
Downloaded By: University College London At: 14:31 28 Jan 2017; For: 9780203848852, chapter12, 10.4324/9780203848852.ch12
Explained Variance in Multilevel Models
J. Kyle Roberts
Annette Caldwell Simmons School of Education and Human
Development, Southern Methodist University, Texas
James P. Monaco
Laboratory for Computational Imaging and
Bioinformatics, Rutgers University, New Jersey
With the rise of the use and utility of multilevel modeling (MLM), one question has consistently been posed to authors and on listservs: "How much variance does my model explain?" Answering this question within the MLM framework is not an easy task, since it is actually possible to explain "negative variance" when the addition of explanatory variables increases the corresponding variance components (Snijders & Bosker, 1999). Because previously proposed effect size measures consider variance at each level separately, a single measure is needed that helps researchers interpret the strength of the model as a whole. The purpose of this paper is to provide a history of past MLM effect sizes and to present three new measures that consider "whole model" effects.
The utility of effect sizes in research interpretation has generated considerable discussion, much of which centers on the role and function of effect sizes, especially concerning their relationship to statistical significance tests (cf. Harlow, Mulaik, & Steiger, 1997). Many authors agree that effect sizes can serve a valuable function in helping to evaluate the magnitude of a difference or relationship (cf. Cohen, 1994; Kirk, 1996; Schmidt, 1996; Shaver, 1985; Thompson, 1996; Wilkinson & APA Task Force on Statistical Inference, 1999). Their articles, along with more recent publications (cf. Knapp & Sawilowsky, 2001a; Roberts & Henson, 2002), continue to debate both the use and utility of measures of effect size, considered both in conjunction with and apart from statistical significance testing.

One positive thing that has occurred while researchers began to debate the issue of effect size reporting (e.g., Knapp & Sawilowsky, 2001b; Thompson, …
    The general principle to be followed, however, is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (p. 26)

12.1 Catalog of Effect Sizes in MLM

A history of effect sizes has been dealt with exhaustively by Huberty (2002) and does not bear repeating here. One thing absent from Huberty's catalog was the use of effect size indices in multilevel analysis. It was wisely absent from Huberty's manuscript, since there is much misconception, and even disagreement, as to the interpretation of these effects. We will quickly list some of the proposed effect size indices for use in MLM and give brief explanations as to their utility.

12.1.1 Intraclass Correlation

Intraclass correlation (ICC) is generally thought of as the degree of dependence of individuals upon a higher structure to which they belong; or, the proportion of total variance that is between the groups of the regression equation. Put more succinctly, it is the ratio

ρ = τ0² / (τ0² + σ²),  (12.1)

where the numerator is represented by the variance at the second level of the hierarchy (τ0²), and the denominator represents the total variation in the model at both level-2 and level-1 (σ²).

Although the ICC actually is not a measure of the effect size of an MLM model, it bears mentioning here because it sometimes is wrongly thought of as a measure of the "power" or strength of MLM over ordinary least squares (OLS) regression. This type of thinking is commonly illustrated in passages like the following:

    Determining the proportion of the total variance that lies systematically between schools, called the intraclass correlation (ICC), constitutes the first step in an HLM analysis. We conduct this analysis with a fully unconditional model, which means that no student or school characteristics are considered. This first step can also indicate whether HLM is needed or whether a single level analytic method is appropriate. Only when the ICC is more than trivial (i.e., greater than 10% of the total variance in the outcome) would the analyst need to consider multilevel methods [emphasis mine]. Ignoring this step (i.e., assuming an ICC of either 0 or 1) would be inappropriate if the research question were multilevel. Investigation of contextual effects, I argue, is by nature a multilevel question. (Lee, 2000, p. 128)
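The ICC can be illustrated with a short calculation from the two variance components of the fully unconditional (null) model. This is a minimal sketch in Python; the variance estimates are hypothetical, not values from the chapter.

```python
# Intraclass correlation from a fully unconditional (null) model:
# tau00 is the level-2 (between-group) variance component and
# sigma2 is the level-1 (within-group) variance component.
def icc(tau00: float, sigma2: float) -> float:
    """Proportion of total variance lying between level-2 groups."""
    return tau00 / (tau00 + sigma2)

# Hypothetical null-model estimates.
print(icc(tau00=15.0, sigma2=85.0))  # 0.15
```

By the rule of thumb quoted from Lee (2000), this hypothetical ICC of 0.15 would already suggest that a multilevel analysis is warranted.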
Roberts (2002) has rightly pointed out that it would be incorrect to interpret this statistic as a measure of the magnitude of difference between OLS and MLM estimates.

… is reflected in many forms by Hox (2002, p. 64), or conversely as:

R2² = (τ00(null) − τ00(full)) / τ00(null).  (12.5)

Table 12.1
Model                                         σ̂e²      σ̂u0²
M0: scienceij = γ00 + u0j + eij               1.979    23.923
M1: scienceij = γ00 + γ10(ses) + u0j + eij    0.651    80.890

In M1, just a single level-1 predictor is included in the model, with variance estimates of σ̂e² = 0.651 and σ̂u0² = 80.890. If we consider this in terms of Equation 12.4, we see that we actually would be decreasing the variance explained (or increasing σ̂u0²) at the second level with the introduction of this predictor (hence negative variance explained between the two models).

Although the amount of variance explained is noteworthy at level-1 (R1² = (1.979 − 0.651)/1.979 = 0.671), the amount of variance explained at the second level is actually −2.381 (R2² = (23.923 − 80.890)/23.923 = −2.381). Not only is this number troubling, but it is counter-intuitive to the way most researchers think about the effectiveness of a model. If we were to interpret this model without previous knowledge of multilevel models, we might be inclined to say that adding the predictor "ses" makes for worse prediction of "math achievement" at the second level than having no predictor at all, since it explains −238% of the variance!

12.1.3 Explained Variance as a Reduction in Mean Square Prediction Error

Snijders and Bosker (1999) argue for a slightly different approach to computing R² values in multilevel models, based on the model's associated mean square prediction error. The R² for level-1 is then computed as:

R1² = 1 − var(Yij − Σh γh · Xhij) / var(Yij)
    = 1 − (σ̂²(full) + τ̂0²(full)) / (σ̂²(null) + τ̂0²(null)),  (12.6)

where Yij is the outcome variable, γh represents the coefficient for predictor variable Xhij for all h variables, σ̂² is an estimate of the variance at the first level, and τ̂0² is an estimate of the variance at the second level. The level-2 R² is then found by dividing the σ̂² by the group cluster size (B), or by the average cluster size for unbalanced data, such that:

R2² = 1 − var(Ȳ.j − Σh γh · X̄h.j) / var(Ȳ.j)
    = 1 − (σ̂²(full)/B + τ̂0²(full)) / (σ̂²(null)/B + τ̂0²(null)).  (12.7)

In this formula, it is easy to see that the R² estimate at level-2 is similar to the R² for level-1, having just reduced the level-1 variance to represent an average variance for each group. Although this estimation differs from the previous definition of R² (Equations 12.3 and 12.4), it is still possible to obtain "negative" values for R².

Using Table 12.1, R² at level-1 is:

R1² = 1 − (0.651 + 80.890) / (1.979 + 23.923) = −2.148,

and for level-2 is:

R2² = 1 − (0.651/10 + 80.890) / (1.979/10 + 23.923) = −2.356.
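The level-specific computations above can be verified in a few lines. This sketch in Python uses the Table 12.1 variance estimates and, following the chapter's level-2 computation, a cluster size of B = 10.

```python
# Reproducing the chapter's Table 12.1 example: adding the level-1
# predictor "ses" lowers the level-1 variance but raises the level-2
# variance, so the level-2 "explained variance" comes out negative.
sigma2_null, tau00_null = 1.979, 23.923   # M0 (null model) estimates
sigma2_full, tau00_full = 0.651, 80.890   # M1 (one predictor) estimates

# Proportional-reduction R^2 at each level.
r2_level1 = (sigma2_null - sigma2_full) / sigma2_null
r2_level2 = (tau00_null - tau00_full) / tau00_null
print(round(r2_level1, 3), round(r2_level2, 3))   # 0.671 -2.381

# Snijders and Bosker's mean-square-prediction-error versions
# (Equations 12.6 and 12.7), with cluster size B = 10.
B = 10
r1_sb = 1 - (sigma2_full + tau00_full) / (sigma2_null + tau00_null)
r2_sb = 1 - (sigma2_full / B + tau00_full) / (sigma2_null / B + tau00_null)
print(round(r1_sb, 3), round(r2_sb, 3))           # -2.148 -2.356
```

Both formulations agree that the level-2 summary is badly behaved here, which is the chapter's motivation for a whole-model measure.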
Figure 12.1
Graphical representation of correlation between y, ŷ, and the predictor variables x1, x2, and x3.
In the case of hierarchical linear modeling, these weights are derived through maximum likelihood estimates of the fixed effects, with the individual estimates being the product of empirical Bayesian estimates.

Although it would seem that a researcher could simply correlate these ŷ and y values to obtain an estimate of R², we must remember and maintain in MLM that we wish to honor the procedure by which the estimates were obtained. In OLS regression, we can typically compute the total variance in the model as:

Σi=1..n (Yi − Ȳ)².  (12.9)

In a multilevel model, we must remember that we typically define the null model as having both fixed effects (the grand mean for the dependent variable) and random effects (the …

… where ȳj is the random estimate for group j and yij is the original outcome score for person i in group j. This number is the same value as the sum of the squares of all of the residual values eij, but is distinctly different from the variance estimate from the unbiased estimator of σ², which contains a correction factor for the Q + 1 regression parameters, such that:

σ̂² = Σ eij² / (n − Q − 1).  (12.12)

As noted in the OLS model, the error variance in a model can be viewed as the distance between ŷ and y. Likewise, in MLM, after the ŷ are calculated using the full model, the error variance for the total model can be explained as: …
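The OLS quantities in Equations 12.9 and 12.12 can be sketched directly; the outcome scores and residuals below are hypothetical, and Q is the number of predictors in the regression.

```python
# Equation 12.9: total variance as a sum of squared deviations from
# the mean. Equation 12.12: the unbiased OLS error-variance estimate,
# which corrects for the Q + 1 estimated regression parameters.
def total_sum_of_squares(y):
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)

def unbiased_error_variance(residuals, Q):
    n = len(residuals)
    return sum(e ** 2 for e in residuals) / (n - Q - 1)

y = [10.0, 12.0, 9.0, 14.0, 10.0]   # hypothetical outcome scores
e = [0.5, -1.2, 0.3, 0.9, -0.5]     # hypothetical residuals e_ij
print(total_sum_of_squares(y))      # 16.0
print(unbiased_error_variance(e, Q=1))
```

The denominator n − Q − 1 is what distinguishes the unbiased estimator from a raw mean of squared residuals, the point the passage above is making.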
… that they do not use the unbiased estimator for variance. However, consider if we were to aggregate this formula to the level-2 grouping structure such that we gain an R² value for each level-2 group and then average across all groups. Doing so would further enhance the above formulas such that the estimate of variance explained would be defined by:

(Σj R²j) / k,  (12.16)

where k is the number of level-2 groups and, with σij² representing the variance of the i individuals around their jth group mean for the full model, σj² is the variance of β0j around γ00, and e″ij and u″j are the residual scores for the level-1 and level-2 estimates, respectively. The final estimate of variance …

… where e′ij is an estimate of the residual for each person i in the jth group in the null model, and:

p(yij) = p(dij | sj) · p(sj),  (12.19)

where p(dij | sj) is the probability of person i, given that they belong to the jth group, and p(sj) is the probability of group j, given the entire sample of level-2 units. In extrapolating Equation 12.19 further and applying the probability density function from a Gaussian distribution, it can be shown that: …

… [one can use this] statistic to interpret just how well a model is performing. Since the goal of most research is to find variables that fully describe the variation in the dependent variable, a measure like this could potentially prove very useful in helping the researcher make judgments about the effectiveness of an MLM model.

12.2.2 Group Initiated R² Based on Weighted Least Squares

In addition to the alternative ways of computing an effect size mentioned above, another type of effect size can be conceived through maximum likelihood methods. In a typical multilevel ANOVA, the grand estimate for the slope coefficient is simply the weighted least squares estimator (or maximum likelihood estimate) γ00, where:

γ̂00 = (Σj Δj⁻¹ · Ȳ•j) / (Σj Δj⁻¹).  (12.24)

In OLS regression, the familiar measure is:

R² = 1 − Σi=1..n (yi − ŷi)² / Σi=1..n (yi − ȳ)²,  (12.25)

where yi is any given individual's score on the dependent variable, ŷi is that individual's predicted score from the linear regression equation, and ȳ is the mean of all individuals on the dependent variable. For any given group within a set of level-2 units, Equation 12.25 could be considered the mathematical equivalent of:

R²j = 1 − (1/nj) Σi=1..nj (ŷij − yij)² / σij²,  (12.26)
where nj is the number of people in group j and σij² is the variance of the individuals in group j. For simplification purposes, we will define the latter part of Equation 12.26 as an error term corresponding to a normalized error for a given group. Representing this with the term Ei, Equation 12.26 can be thought of as:

R²j = 1 − (1/nj) Σi=1..nj Ei.  (12.27)

… we are producing an equation for each second-level group based on weighted least squares estimators. It seems appropriate, then, to produce an entire model R² that is also weighted for the probability of the group from which the estimate was drawn. Here p(sj) is the probability of group j occurring given the sample of all j groups, such that:

p(sj) = (1 / √(2πσj²)) · exp(−(u′j)² / (2σj²)).  (12.29)

In expanding Equation 12.27 to include all groups and also to reflect the need to use a weighted estimator, R²T could be thought of as:

R²T = 1 − [1 / (Σj=1..J p(sj) · nj)] · Σj=1..J p(sj) · Ej
    = 1 − [1 / (Σj=1..J p(sj) · nj)] · Σj=1..J Σi=1..nj p(sj) · Ei,  (12.30)

where Ej = Σi=1..nj Ei is the total normalized error for group j.
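Equations 12.26 through 12.30 can be pulled together in a short sketch. It assumes, for simplicity, a single within-group variance per group in place of the individual σij² terms, and all data are hypothetical.

```python
import math

# Group-level R^2 (Equations 12.26-12.27): one minus the mean of the
# normalized squared prediction errors E_i within group j.
def group_r2(y, y_hat, var_j):
    errors = [(yh - yo) ** 2 / var_j for yh, yo in zip(y_hat, y)]
    return 1 - sum(errors) / len(y), errors

# Equation 12.29: Gaussian weight p(s_j) from the group's level-2
# residual u'_j and the variance of the group intercepts sigma2_j.
def group_weight(u_j, sigma2_j):
    return math.exp(-u_j ** 2 / (2 * sigma2_j)) / math.sqrt(2 * math.pi * sigma2_j)

# Two hypothetical groups: (observed y, predicted y-hat, var_j, u'_j).
groups = [
    ([10.0, 12.0, 9.0], [10.5, 11.5, 9.5], 4.0, -0.4),
    ([20.0, 18.0, 21.0, 19.0], [19.0, 18.5, 20.5, 19.5], 4.0, 0.6),
]
sigma2_j = 1.0

# Equation 12.30: whole-model R^2_T as one minus the weighted sum of
# normalized errors over the weighted total group size.
num = den = 0.0
for y, y_hat, var_j, u_j in groups:
    _, errors = group_r2(y, y_hat, var_j)
    w = group_weight(u_j, sigma2_j)
    num += w * sum(errors)   # p(s_j) * sum_i E_i
    den += w * len(y)        # p(s_j) * n_j
r2_total = 1 - num / den
print(round(r2_total, 3))    # 0.912
```

Because each group's error is normalized by its own variance and then weighted by p(sj), the result lands on an OLS-like scale, which is the interpretive advantage the chapter claims for this family of measures.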
And since we already have defined R²j in Equation 12.26, we can then interpret:

R²T = (Σj=1..J nj · p(sj) · R²j) / (Σj=1..J nj · p(sj)),  (12.31)

which, theoretically, is simply the weighted least squares average of all of the R² values from each group. What is present in Equation 12.31 is a solution that will produce estimates similar in interpretation to OLS R² measures. Equation 12.31 seems considerably more appropriate than Equations 12.15 and 12.23, since it honors both the nesting structure of the data and the fact that the model was derived through weighted least squares estimates.

… explanation, it is hoped that the continued development of these models will help in their proliferation.

There is a caution, however, in making these models more accessible. Just because a researcher has the software and programming skills to utilize complicated techniques does not mean that the technique is warranted. With the growth of a likewise complicated field of statistics, MLM, Goldstein (1995) voiced similar concerns:

    There is a danger, and this paper reminds us of it, that multilevel modeling will become so fashionable that its use will be a requirement of journal editors, or even worse, that the mere fact of having fitted a multilevel model will become a certificate of statistical probity. That would be a great pity. These models are as good as the data they fit; they are powerful tools, not universal panaceas. (p. 202)
References

Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hox, J. (1995). Applied multilevel analysis. Amsterdam: TT-Publikaties.
Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Erlbaum.
Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62(2), 227–240.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Knapp, T. R., & Sawilowsky, S. S. (2001a). Constructive criticisms of methodological and editorial practices. The Journal of Experimental Education, 70, 65–79.
Knapp, T. R., & Sawilowsky, S. S. (2001b). Strong arguments: Rejoinder to Thompson. The Journal of Experimental Education, 70, 94–95.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage.
Lee, V. E. (2000). Using hierarchical linear modeling to study social contexts: The case of school effects. Educational Psychologist, 35(2), 125–141.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Roberts, J. K. (2002). The importance of intraclass correlation in multilevel and hierarchical linear modeling designs. Multiple Linear Regression Viewpoints, 28(2), 19–31.
Roberts, J. K. (2004). An introductory primer on multilevel and hierarchical linear modeling. Learning Disabilities: A Contemporary Journal, 2(1), 30–38.
Roberts, J. K., & Henson, R. K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement, 62(2), 241–253.
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan, 67, 57–60.
Snijders, T., & Bosker, R. (1999). Multilevel analysis. Thousand Oaks, CA: Sage.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26–30.
Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. Journal of Experimental Education, 70, 80–93.
Wilkinson, L., & American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanation. American Psychologist, 54, 594–604. [Reprint available through the APA Home Page: http://www.apa.org/journals/amp/amp548594.html]
Xu, R. (2003). Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22, 3527–3541.