Professional Documents
Culture Documents
Author(s): R. J. A. Little
Source: Journal of the American Statistical Association, Vol. 88, No. 423 (Sep., 1993), pp.
1001-1012
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: http://www.jstor.org/stable/2290792
Accessed: 06-04-2018 10:21 UTC
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://about.jstor.org/terms
American Statistical Association, Taylor & Francis, Ltd. are collaborating with JSTOR to
digitize, preserve and extend access to Journal of the American Statistical Association
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
Post-Stratification: A Modeler's Perspective
R. J. A. LITTLE *
Post-stratification is a common technique in survey analysis for incorporating population distributions of variables into survey
estimates. The basic technique divides the sample into post-strata, and computes a post-stratification weight wh = rPh/rh for each
sample case in post-stratum h, where rh is the number of survey respondents in post-stratum h, Ph is the population proportion from
a census, and r is the respondent sample size. Survey estimates, such as functions of means and totals, then weight cases by Wh.
Variants and extensions of the method include truncation of the weights to avoid excessive variability and raking to a set of two or
more univariate marginal distributions. Literature on post-stratification is limited and has mainly taken the randomization (or design-
based) perspective, where inference is based on the sampling distribution with population values held fixed. This article develops
Bayesian model-based theory for the method. A basic normal post-stratification model is introduced which yields the post-stratified
mean as the posterior mean, and a posterior variance that incorporates adjustments for estimating variances. Modifications are then
proposed for small sample inference, based on (a) changing the Jeffreys prior for the post-stratum parameters to borrow strength
across post-strata, and (b) ignoring partial information about the post-strata. In particular, practical rules for collapsing post-strata to
reduce posterior variance are developed and compared with frequentist approaches. Methods for two post-stratifying variables are
also considered. Raking sample counts and respondent counts is shown to provide approximate Bayesian inferences when the margins
of the two post-stratifiers are available but their joint distribution is not. When the joint distribution is available, raking effectively
ignores the information it contains, and hence can be compared with other techniques that ignore information such as collapsing.
For inference about means, it is suggested that raking is most appropriate when post-stratum means have an additive or near-additive
structure, whereas collapsing is indicated when interactions are present.
KEY WORDS: Collapsing strata; Raking; Stratification; Superpopulation models; Survey inference; Weighting.
1001
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
1002 Journal of the American Statistical Association, September 1993
From the randomization perspective it is natural to view sparse cells and as such can be compared with alternative
post-stratification as a method of weighting adjustment, methods, such as collapsing. Section 4 suggests some topics
which is the form in which the method typically appears to for future research.
the survey analyst. A weight is attached to each sample unit,
that is proportional to the number of population units the 2. POST-STRATIFICATION TO A SINGLE MARGIN
unit "represents" (Fig. 1 a). From a predictive modeling per-
spective, the data are more accurately depicted as in Figure 2.1 The Post-Stratified Mean
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
Little: Post-Stratification 1003
2.2 The Randomization Variance of the Post- refined, but it is of secondary importance; the main features
Stratified Mean of (6) are the inclusion of a distinct mean and variance in
each post-stratum and the iid assumption within post-strata,
The appropriate randomization variance for Y-ps is con-
which requires modification for designs with clustering and
troversial. The unconditional sampling vaiiance is
differential selection rates within post-strata.
var(yJS) = E{varQ(i7pIr) } + var{ E(Yp, Ir)} A standard Bayesian analysis under (6) yields the posterior
distribution of Y as a mixture of t distributions with mean
= E{var(jYps Ir)}, (2) and variance
which is analogous to the estimator in randomization infer- when comparing Yps with y. The unconditional sampling
ence and the posterior variance variance of y 1is
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
1004 Journal of the American Statistical Association, September 1993
bution of Y in post-strata where rh is small; the posterior post-strata by a single mean and variance for the combined
mean Yh is poorly determined, and the posterior variance post-strata. The posterior mean under this collapsed model
VPS requires at least four respondents in each post-stratum
is
(yiwy, U2) -ind G(,u a2); p(,U, log a) = const, (11) (Pi + Pj)2(l j-f) 2
j (13)
which again yields E( Y I Ys) = - and var( Y I Ys) = v. Note ri + rj
that (10) implies (1 1), but not vice versa. Under MCAR the
where a? and a? are the expected values of s? and s .. To
distinction between (10) and (1 1) is of no practical import, express (13) in terms of sample weights, note that if Z and
because they yield the same inference. When data are not
Yare independent, then = = i= say, and (13)
MCAR, inference under (10) is unaffected, but ( 11) can yield
reduces to
different answers because it does not assume independence
of Y and Z.
E i/U21 W}
2(r, +-2(r+
r) Wi-j } 2
Bayesian analysis under (6) leads to posterior mean Yps
and variance vps; Bayesian analysis under (10) or ( 11) leads
Hence in this case the variance is always reduced by col-
to posterior mean - and variance v. The comparison of pre-
lapsing, provided that the weights wi and wj in the collapse
cisions thus is between vps and v, which correspond to the post-strata are unequal. But when Z and Y are associated,
conditional variance of -ps given { r } and the unconditional this reduction is counteracted by an increase in variance
variance of y in the randomization framework. Thus the
U?. from collapsing. (The increase in variance is squared bias
Bayesian approach leads to the two natural measures of pre- from the conditional frequentist perspective of Section 2.2.)
cision (vps and v) and also provides small-sample corrections The following collapsing algorithm based on (13) is sug-
(6, ah) for estimating the variance. gested:
2.5 Algorithms for Collapsing Post-strata 1. Order the post-strata so that neighbors are a priori rel-
atively homogeneous. If they are based on an ordered variable
2.5.1 An Algorithm Assuming Missing Completely at
(such as grouped age), then this step is not needed.
Random. A useful compromise between using and ignoring
2. Collapse the post-stratum pair (i, j) that maximizes
post-stratum counts is to collapse small post-strata that con-
(13), subject to the restriction that j = i + 1, that is, only
tribute excessively to the variance. Frequentist strategies for
neighboring pairs are considered. (This restriction is designed
collapsing were considered by Tremblay (1986) and by Kal-
to limit the increase in the variance from collapsing, in that
ton and Maligalig (199 1). This subsection proposes practical
neighboring post-strata are more likely to be homogeneous
criteria for collapsing under MCAR, aimed at reducing the
with respect to outcomes.)
expected value of the posterior variance vsp associated with
3. Proceed sequentially until a reasonable number of
the BNPM (6). The next subsection proposes a modification
pooled post-strata remain, or (13) becomes noticeably neg-
when MCAR is not assumed, and hence bias may be an
ative.
issue.
Collapsing two post-strata Z = i and Z = j is interpreted The main task in operationalizing this algorithm is to
as combining the population proportions Pi and Pj and choose forms for updating the variances termls C2 and o]J,
modifying (6) by replacing the means and variance in these which depend on the form of association between Z and Y.
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
Little: Post-Stratification 1005
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
1006 Journal of the American Statistical Association, September 1993
Table 1. Los Angeles ECA: (A) Sample and (B) Census Counts by Age in Eight Stratum Groups
Stratum group
AGE (A) (B) (A) (B) (A) (B) (A) (B) (A) (B) (A) (B) (A) (B) (A) (B)
(continued)
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
Little: Post-Stratification 1007
Table 1. (continued)
Stratum group
AGE (A) (B) (A) (B) (A) (B) (A) (B) (A) (B) (A) (B) (A) (B) (A) (B)
85 2 74 1 140 0 66 0 12 1 20 0 44 0 55 0 8
86 0 57 0 112 1 55 1 12 2 33 0 48 0 25 0 4
87 0 55 0 96 1 47 0 5 1 21 0 37 1 34 0 22
88 1 48 0 89 0 34 0 5 0 17 1 32 0 14 0 7
89 1 29 0 74 0 48 0 10 0 15 1 28 1 20 0 2
90 0 30 0 75 0 29 0 3 2 12 0 24 0 13 0 5
91 0 30 0 53 0 31 0 5 0 17 0 21 0 9 0 2
92 1 21 0 45 0 24 0 5 0 10 0 11 0 5 0 2
93 0 15 0 32 0 18 0 1 0 8 0 6 0 10 0 1
94 0 21 0 33 0 10 0 2 0 2 0 7 0 5 0 1
95 0 11 0 23 0 7 0 2 0 1 0 5 0 3 0 1
96 0 9 0 12 1 8 0 4 0 3 0 4 0 2 0 2
97 0 6 0 8 0 6 0 0 0 0 0 4 0 3 0 1
98 0 2 0 6 0 4 0 0 0 0 0 0 0 5 0 1
99 0 4 0 3 0 16 0 3 0 4 0 0 0 12 0 4
NOTE: F = Female, M = Male, N = Non-Hispanic, H = Hispanic; E = East Los Angeles, W = West Los Angeles.
(r, /PI )/ (rl/Pj) in this expression yields the following bias:yielding a random effects model. With normal specifications,
the prior:
E{L J}=(PI+PJ)((ri + r) (P + P))( i) P(Ah I Oh) ind G(A, T2); a = a2 for all h;
The reduction in expected MSE from collapsing is thus yields estimates of Y that smooth Yps toward yj (Scott a
Smith 1969; Holt and Smith 1979; Little 1986). If variances
E(MSE,J) = E(/v\I) - El{ Lm }2, (17) are not assumed constant:
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
1008 Journal of the American Statistical Association, September 1993
130 -
SUBGROUP 1: FNE SUBGROUP 2: FNW
120 -
R
E 110-
V60 -10
A
R
90 -
A ....... ..... ........ ... ...-............
C 80 .
E--
70
L~~~~~~~~~~~~~~~~~~~~~~
60 -
130-
SUBGROUP 3: FHE SUBGROUP 4: FHW
120 -
R
E 110-
L
V 100 _
A
R
I90
N
C 80
E
70-
60-
0 10 20 30 40 50 60 700 0 10 20 30 40 50 60 70
130 -
SUBGROUPS5: MNE SUB3GROUP 6:MN
120-
R
E 110
L
V 100
R
A
N
C 80
E
70-
60-
130-
SUBGROUP 7: MHE SUB3GROUP 8: MHW
120-
R
E110
L
V 100
A
R
I90
A
N
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
Little: Post-Stratification 1009
4- 0
w~~~~~~~~~~~~~~
E~~~~~~~~~~~~~~
H S S
l l l l l l l l
1 2 24 36 48 60 72 84 96 1 6 24 32 4'0 48 56 64 72 80
POST-STRATUM, COLLAPSING ONLY ZERO POST-STRATA POST-STRATUM, COLLAPSING To T
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
1010 Journal of the American Statistical Association, September 1993
(6) for Y given Z1, Z2 is the basic normal two-way post- that is, the response rate is a product of row and column
stratification model effects (see also Binder and Theberge 1988, sec. 3). Little
and Wu showed that the assumptions about the 'Phk are un-
(yi Izl = h, Z21 = k, A1hk, Uhlk) -ind G(/Ihk, Ohk); testable from the data, and that other choices lead to other
P(/Ahk, log Ohk) = const. (19) estimation methods.
Combining these raking estimates with estimates of { Yjk }
The next section discusses inferences under this model. Sec-
from (19) yields the posterior mean of Y as
tion 3.3 considers modifications for small cells.
H K
3.2.1 The Posterior Mean of Y. The BNPM model (19) where { P,k } is given by (22) or (24), depending on
at hand.
with { Phk } known yields
3.2.2 The Posterior Variance of Y. If {Phk} is known,
E(YIZ, Ys) = s=
h k
E PhY, (20) then the posterior variance of Y under (19) is
{Phk} = Rake[{nhk}; {Ph.}, {P-k}], (22) PI,, I d) can be approximated by the asymptotic expressions
derived in Binder and Theberge (1988). Another approach,
which denotes the result of raking the sample counts { nhk
which} does not rely on asymptotics, is to simulate the pos-
to the known margins. Raking means iterative proportional
terior covariance matrix by computing T draws { Pt) } (t
fitting (Deming and Stephan 1940) of { nhk } to match suc- = 1, ..., T) from the posterior distribution of { Phk }, com-
cessively the row and column marginal distributions { Ph. } puting post-stratified means (ti) by substituting { P t') } for
and { P. k }; (22) approximates the posterior mean of { Phk } {Phk } in Yp, and then estimating (29) by the sample variance
because it is asymptotically equivalent to the maximum of y(t) over t = 1, .. . , T. For the model (23) that leads to
likelihood (ML) estimate (Brackstone and Rao 1979; Ireland raking the respondent counts, the appropriate draws are
and Kullback 1968), which is in turn asymptotically equiv- computed as
alent to the posterior mean. Note that (22) involves the sam-
ple counts, n, which do not enter into inferences about Y Phk = Rake[ { phk }; {Ph-, Pk }], (30)
for the-case of a single post-stratifier.
where {Phtk} are drawn from a Dirichlet distribution
If either Z1 or Z2 is subject to nonresponse as well as Y,
parameters { rhk + 1/ 2 }. For the model (21) that leads to
then the sample counts { nhk } are not known. A model is raking the sample counts { nhk }, an approximate procedure
then required to relate the respondent counts { rhk } to the
is to apply (24) where {p (t)h} are drawn from a Dirichlet
target population. Suppose that distribution with parameters { nhk + 1 / 2 . (Unfortunately,
{rhk} I r - MNOM[{ Phkhk/o }, r], (23) in this case the method does not yield exact draws from the
posterior distribution.) These simulation methods avoid
where Phk iS the probability of response in cell (h, k) and
computing and inverting the covariance matrix of { Phk } and
( = 2h 2k Phksohk. This model is underidentified given the
hence may be useful when this matrix has high dimension,
available data. But Little and Wu (1991) showed that the
as in large tables.
estimates based on raking { rhk } to the margins, namely
3.3 Treatment of Small Cells
{j5k } = Rake[{rhk}; {Ph.,'P.k}I, (24)
In this section I outline modifications of the inferences
are ML under the assumption that
under the BNPM for two post-stratifiers. A principled
'Phk = thf3k, (25) Bayesian approach is to change the flat prior in (19) for Y
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
Little: Post-Stratification 1011
given Z1 and Z2 to borrow strength across post-strata. Mod- assess precision of the post-stratified mean, to suggest some
ifications should depend on context. One approach of interest approaches for the problem of sparse cells, and to provide a
is to model high-order interactions of Y on Z as random Bayesian perspective on raking. Many important topics are
effects. For example, write left untouched. Inference about other quantities, such as
population slopes, has not been considered. The survey
Ahk = A + Ah + 3k + 'Yhk
practitioner can rightly complain that deviations from simple
random sampling are restricted to nonresponse and coverage
and replace the prior for { /Ahk } in (19) by a prior that is flat
for { IA, ah, 1k } but models the interactions 'Yhk as normalerror-models need to be adapted to complex survey designs.
with mean 0 and variance a2. The result is a mixed ANOVA Another topic of real practical interest concerns post-
model with fixed main effects and random interactions. The stratification to margins that are subject to error, as in an
resulting prediction for Yhk (ignoring finite population cor- income classification where the census and survey definitions
rections) has the form differ modestly. I believe that the Bayesian approach provides
a framework for addressing this problem, because progress
Yhk = Uhkyhk + (1 - Uhk)Yhk,
is hard without some working model for the pattern of errors.
where Yhk = y + Ah + k is an additive fit and UhkAnother interesting topic untouched here is raking to totals
= nhkU(J/(nhkU1 + ohk) is a shrinkage factor, with estimates of an auxiliary variable (such as dollar amounts) rather than
&rz and &hk of the random effiects arz and ahk computed by to sample counts. The algebra for this form of raking is closely
ML or the method of moments. related to that for raking to sample counts (see, for example,
Modifications of the prior in ( 19) should be tuned to each Binder and Theberge 1988), but the underpinning models
outcome, and hence they may be impractical in large surveys appear quite different. Even statisticians unpersuaded by the
with many variables. Simpler strategies are based on ignoring modeling view of surveys can benefit from a study of these
partial information in the post-strata. A common strategy is underlying models, because they case light on how to choose
to rake sample or respondent counts to the margins, effiec- between alternative analysis approaches, which in the absence
tively ignoring information on the joint distribution of Z1 of models can appear capricious. For modelers, I suggest
and Z2 when this is available; raking the sample counts seems that post-stratification in particular, and survey methods in
preferable, because no model is implied for the response general, remain rich and somewhat neglected areas with
probabilities. Raking is well justified when the means of many problems still to be resolved.
outcome variables have an approximately additive struc-
[Received October 1991. Revised September 1992.]
ture, because if Yhk = yt + Ah + 1k, then E( Y Idata) =
+ zh Ph.- h + Ik P kAk involves only the marginal distri- REFERENCES
butions of Z1 and Z2, SO raking does not involve a loss of
Binder, D. A., and Theberge, A. (1988), "Estimating the Variance of Raking
information. Another way of seeing this is to note that Equa- Ratio Estimators Under Simple Random Sampling," Canadian Journal
tion (29) for the added uncertainty from raking remains valid of Statistics, 16, 47-55.
Brackstone, G. J., and Rao, J. N. K. (1979), "An Investigation of Raking-
with { Yhk } replaced by { Yhk -Yhk }, where { Yhk } are predicted Ratio Estimators," Sankhya, C, 41, 97-114.
values from any additive fit. If the additive fit is good, then Cochran, W. G. (1977), Sampling Techniques (3rd ed.), New York: John
the added variance (29) must be small. Wiley.
If interactions are expected, then a better approach may Deming, W. E., and Stephan, F. F. (1940), "On a Least Squares Adjustment
of a Sample Frequency Table When the Expected Marginal Totals Are
be to apply collapsing strategies as discussed in Section 2. Known," Annals of Mathematical Statistics, 11, 427-444.
Let Z1 denote a "primary" stratifier most closely related to W. W., and Kessler, L. G., eds. (1985), Epidemiologic Field Methods
Eaton,
survey outcomes, and let Z2 denote a secondary stratifier. in Psychiatry: The NIMH Epidemiologic Catchment Area Program, New
York: Academic Press.
Then the collapsing methods of Section 2 could be applied Ericson, W. A. (1969), "Subjective Bayesian Models in Sampling Finite
to Z2 classes separately for each value of Z, (as in Example
Populations, I," Journal of the Royal Statistical Society, Ser. B, 31, 195-
234.
1) or to the single classification of Z1 and Z2 obtained by
Goksel, H., Judkins, D. R., and Mosher, W. D. (1991), "Nonresponse Ad-
laying out the cells in serpentine order, with the Z2 index justments for a Telephone Follow-up to a National In-Person Survey,"
changing fastest. in Proceedings of the Survey Research Methods Section, American Sta-
tistical Association, pp. 581-586.
If the MCAR assumption is violated, as in the case of
Harte, J. M. (1982), "Post-Stratification Approaches in the Corporate Sta-
nonresponse, then collapsing strategies should focus on lim- tistics of Income Program," in Proceedings of the Survey Research Methods
iting nonresponse bias. One way of achieving this is to regressSection, American Statistical Association 1982, pp. 250-253.
Hanson, R. H. (1978), The Current Population Survey: Design and Meth-
response rates on the post-stratifying variables and then form
odology, Technical Paper No. 40, U.S. Bureau of the Census.
collapsed post-strata that are homogeneous with respect to Holt, D., and Smith, T. M. F. (1979), "Post Stratification," Journal of the
the estimated response rate (Little 1986; Rosenbaum and Royal Statistical Society, Ser. A, 142, 33-46.
Ireland, C. T., and Kullback, S. (1968), "Contingency Tables With Given
Rubin 1983). This strategy seems particularly useful when
Marginals," Biometrika, 55, 179-188.
the number of post-stratifiers is large. For a recent applica- Kalton, G., and Maligalig, D. S. (1991), "A Comparison of Methods of
tion, see Goksel, Judkins, and Mosher (1991). Weighting Adjustment for Nonresponse," Proceedings of the 1991 Annual
Research Conference, U.S. Bureau of the Census, pp. 409-428.
4. CONCLUSION Little, R. J. A. (1982), "Models for Nonresponse in Sample Surveys," Journal
of the American Statistical Association, 77, 237-250.
The main aims of this article are to lay out the Bayesian (1986), "Survey Nonresponse Adjustments for Estimates of Means,"
International Statistical Review, 54, 139-157.
predictive modeling framework for post-stratification, to
(199 la), "Inference With Survey Weights," Journal of Official Sta-
provide a Bayesian interpretation of the problem of how to tistics, 7, 405-424.
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms
1012 Journal of the American Statistical Association, September 1993
This content downloaded from 132.64.29.203 on Fri, 06 Apr 2018 10:21:44 UTC
All use subject to http://about.jstor.org/terms