
This article was downloaded by: [Michigan State University]

On: 04 January 2015, At: 02:42


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41
Mortimer Street, London W1T 3JH, UK

The American Statistician


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/utas20

A Note on Sample Size Estimation for Multinomial Populations

Robert D. Tortora (a)

(a) Statistical Research Division of the Economics, Statistics, and Cooperatives Service, U.S. Department of Agriculture, Washington, DC 20250, USA
Published online: 12 Mar 2012.

To cite this article: Robert D. Tortora (1978) A Note on Sample Size Estimation for Multinomial Populations, The American
Statistician, 32:3, 100-102

To link to this article: http://dx.doi.org/10.1080/00031305.1978.10479265

In this section, The American Statistician publishes, in addition to articles of interest to teachers of statistics, announcements and selected reviews of Teaching Materials of general use to the statistical field. These may include (but will not necessarily be restricted to) curriculum material, collections of teaching examples or case studies, modular instructional material, transparency sets, films, film-strips, video-tapes, probability devices, audio-tapes, slides, and data deck sets (with complete documentation).

Authors, producers, or distributors wishing to have such materials announced or reviewed should submit a single copy (three copies of printed material) to Associate Editor Harry O. Posten, Statistics Department, University of Connecticut, Storrs, Connecticut 06268. A statement of intention that the material will be available to all requestors for a minimum of a two-year period should be provided, along with information on the cost (including postage) and special features of the material. Information on classroom experience may also be included. All materials submitted must be of general use for teaching purposes in the area of probability and statistics. Articles should be sent to the Editor.

A Note on Sample Size Estimation for Multinomial Populations


ROBERT D. TORTORA *

* Robert D. Tortora is a mathematical statistician in the Statistical Research Division of the Economics, Statistics, and Cooperatives Service, U.S. Department of Agriculture, Washington, DC 20250. The author wishes to thank G. David Faulkenberry for several valuable comments.

A method is described for determining the sample size required for a simultaneous confidence statement of specified precision about the parameters of a multinomial population. The method is based on a simultaneous confidence interval procedure due to Goodman, and the results are compared with those obtained by separately considering each cell of the multinomial population as a binomial.

KEY WORDS: Multinomial populations; Sample size estimation; Simultaneous confidence intervals.

1. Introduction

The many variables measured in a survey and the multiple uses of survey data make the determination of a reasonable sample size for a sample survey an onerous task for the statistician. Frequently the problem is restricted to the investigation of the effect of the sample size on the estimates associated with one or two key variables. For a classification variable, Cochran (1963, p. 71) gives an illustration where an anthropologist studying the inhabitants of an island is interested in taking a sample survey to estimate the percentage of the inhabitants that belong to blood group O. The observations to be obtained are treated as a simple random sample of Bernoulli trials, and the sample-size effect on the estimate is measured in terms of the standard error of the estimate or in terms of a confidence interval width. This is a practical and accepted approach to the determination of a sample size. Suppose, however, that the anthropologist specifically wants to make a joint statement about the distribution of all blood types. That is, he asks how large a sample he should take to be reasonably confident that the observed percentages of each blood type, A, B, O, and AB, will all be within a specified range of the true percentages. This is clearly a simultaneous statement. If this is the primary consideration, then the appropriate sample size is different from that based on the consideration of only one blood group at a time.

In this note, a procedure is given for determining the sample size required for a simultaneous confidence statement of specified precision about the parameters of a multinomial population. The situation considered is that of a simple random sample from a large population where each population unit is classified into one of k mutually exclusive categories. For other sample designs, the design effects must be considered.

The procedure is based on a simultaneous confidence interval procedure due to Goodman (1965). The resulting sample sizes are compared with those that would be obtained by considering a confidence interval for a binomial parameter.

2. Methodology

Consider a population of units divided into k mutually exclusive and exhaustive categories. Let $\Pi_i$, $i = 1, \ldots, k$, be the proportion of the population in the ith category. Let $n_i$, $i = 1, \ldots, k$, be the frequency observed in the ith category in a simple random sample of size n from the population.

For a specified value of $\alpha$, we wish to obtain a set of intervals $S_i$, $i = 1, \ldots, k$, such that

$$\Pr\Big\{\bigcap_{i=1}^{k} (\Pi_i \in S_i)\Big\} \ge 1 - \alpha;$$

that is, we require the probability that every interval $S_i$ contains $\Pi_i$ to be at least $1 - \alpha$. Goodman (1965) gives the approximate large-sample confidence interval bounds (when $n \to \infty$) as $\Pi_i^-$ and $\Pi_i^+$, where

$$\Pi_i^- = \Pi_i - [B\,\Pi_i(1 - \Pi_i)/n]^{1/2}, \tag{2.1}$$

$$\Pi_i^+ = \Pi_i + [B\,\Pi_i(1 - \Pi_i)/n]^{1/2}, \tag{2.2}$$

100 The American Statistician, August 1978, Vol. 32, No.3


and B is the upper $(\alpha/k) \times 100$th percentile of the $\chi^2$ distribution with 1 degree of freedom.

Examining equations (2.1) and (2.2), we see that $[\Pi_i(1 - \Pi_i)/n]^{1/2}$ is the standard deviation for the ith cell of the multinomial population. Recall also that each marginal probability mass function is binomial. Let N be the total population size. Using the finite population correction factor (fpc) and the variance for each $\Pi_i$ (Cochran 1963, p. 50), we obtain approximate confidence bounds of

$$\Pi_i^- = \Pi_i - [B(N - n)\Pi_i(1 - \Pi_i)/\{(N - 1)n\}]^{1/2}, \tag{2.3}$$

$$\Pi_i^+ = \Pi_i + [B(N - n)\Pi_i(1 - \Pi_i)/\{(N - 1)n\}]^{1/2}. \tag{2.4}$$

Note that as $N \to \infty$, (2.3) and (2.4) converge to (2.1) and (2.2), respectively.

To determine the required sample size, the precision for each parameter in the multinomial population must be specified. Suppose we require an absolute precision of $b_i$ for each cell. Then (2.1) and (2.2) become

$$\Pi_i - b_i = \Pi_i - [B\,\Pi_i(1 - \Pi_i)/n]^{1/2}, \tag{2.5}$$

$$\Pi_i + b_i = \Pi_i + [B\,\Pi_i(1 - \Pi_i)/n]^{1/2}, \tag{2.6}$$

respectively. Similar results are obtained when the fpc is included. Equations (2.5) and (2.6) imply

$$b_i = [B\,\Pi_i(1 - \Pi_i)/n]^{1/2}. \tag{2.7}$$

Squaring (2.7) and solving for n gives

$$n = B\,\Pi_i(1 - \Pi_i)/b_i^2. \tag{2.8}$$

Alternatively, with the fpc, we solve $b_i^2 = B(N - n)\Pi_i(1 - \Pi_i)/\{(N - 1)n\}$, which follows from (2.3) and (2.4) in the same way. This gives

$$n = \frac{B N \Pi_i(1 - \Pi_i)}{(N - 1)b_i^2 + B\,\Pi_i(1 - \Pi_i)}. \tag{2.9}$$

Therefore, one should make k calculations, one for each pair $(b_i, \Pi_i)$, $i = 1, \ldots, k$, and select the largest n as the desired sample size. As functions of $\Pi_i$ and $b_i$, (2.8) and (2.9) show that n increases as $\Pi_i \to 1/2$ or $b_i \to 0$. When $b_i = b$ for all i, the only calculation required is for the $\Pi_i$ closest to 1/2. If there is no prior knowledge about the values of the $\Pi_i$'s, a "worst" case calculation of sample size can be made assuming some $\Pi_i = 1/2$ and $b_i = b$ for $i = 1, \ldots, k$. It is $n = B/4b^2$.

Often a relative precision $b_i'$ is specified for each cell. Here $b_i = b_i'\Pi_i$. Substituting this in (2.8) gives

$$n = B(1 - \Pi_i)/\Pi_i b_i'^2. \tag{2.10}$$

Of course, a sample size calculation including the fpc can be computed as above. Here again, one should make k calculations, one for each pair $(b_i', \Pi_i)$, $i = 1, \ldots, k$. The largest n computed is selected as the desired sample size. As $\Pi_i \to 0$ or $b_i' \to 0$, the sample size increases according to (2.10). If $b_i' = b'$ for all i, then the largest sample size is $n = B(1 - \Pi)/\Pi b'^2$, where $\Pi = \min(\Pi_1, \ldots, \Pi_k)$.

We illustrate this approach with a numerical example. Suppose the anthropologist wishes to estimate the proportion of inhabitants on an island having blood types A, O, B, and AB. He knows from previous work on similar islands that approximately 27 percent have blood type A, 43 percent have blood type O, 19 percent have blood type B, and 11 percent have blood type AB. He requires an absolute precision of 5 percent for each proportion and a confidence coefficient of .95. In our notation, $b_i = .05$, $i = 1, \ldots, 4$, and $\alpha = .05$. We assume the island has a large enough population to ignore the fpc. Since $b_i = .05$ for each blood type, only one calculation is needed: the one for the proportion closest to 1/2 ($\Pi_i = .43$, blood type O). Using (2.8) we obtain a sample size of 624 inhabitants.

3. Comparison with Binomial Sample Sizes

Cochran (1963) presents an approach that can be modified to determine sample size in a multinomial interval estimation setting. This approach treats each cell versus the remaining cells as a binomial distribution and obtains a set of binomial sample size estimates for the individual cell proportions. One would use the largest estimated sample size for the survey. The same course is followed in confidence interval estimation: binomial confidence interval estimates are made for the individual cell proportions. The difficulty is that one cannot assess the value of the confidence coefficient for the entire set of intervals, or even make statements concerning the relative values of the proportions.

For completeness, recall that for the binomial case an estimate of sample size is given by

$$n' = t^2\,\Pi(1 - \Pi)/b^2, \tag{3.1}$$

where t is the abscissa of the normal curve that cuts off a total area $\alpha$ at the tails, $\Pi$ is the binomial proportion, and b is the absolute precision required. Using the example of Section 2, we get from (3.1) sizes of 303, 377, 236, and 150 for blood types A, O, B, and AB, respectively. The one-cell-at-a-time approach indicates that 377 is the desired sample size, compared with a sample size of 624 for the simultaneous confidence interval approach.

A direct comparison of the binomial with the simultaneous confidence interval approach can be obtained. Ignoring the fpc, we get from (2.8) and (3.1), for the same precision,

$$n/n' = B/t^2. \tag{3.2}$$

The following tabulation gives values of $n/n'$ for various values of k and $\alpha$ for the same precision:

              k = 3    k = 4    k = 5    k = 10
    α = .1    1.71     1.84     2.04     2.44
    α = .05   1.53     1.66     1.73     2.05

Suppose we are to estimate proportions associated with a four-parameter multinomial distribution. The tabulation then says that if the binomial estimate of sample size is 100, the multinomial estimate would be 166 for a confidence coefficient of 0.95. The tabulation summarizes the comparison for either absolute or relative precision, since for relative precision one also obtains equation (3.2).

[Received April 7, 1977. Revised April 10, 1978.]

References

Cochran, William G. (1963), Sampling Techniques, 2nd ed., John Wiley & Sons.

Goodman, Leo A. (1965), "On Simultaneous Confidence Intervals for Multinomial Proportions," Technometrics, 7, 247-254.
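The arithmetic of Sections 2 and 3 can be checked with a minimal Python sketch that uses only the standard library. The sketch computes B, the upper $(\alpha/k) \times 100$th percentile of $\chi^2$ with 1 degree of freedom, as the square of the standard normal point that cuts off $\alpha/(2k)$ in the upper tail. This is valid because a $\chi^2$ variable with 1 degree of freedom is the square of a standard normal variable. The function names (`goodman_B`, `n_absolute`, `n_absolute_fpc`, `n_relative`, `n_binomial`, and so on) are hypothetical. Sample sizes are rounded to the nearest integer because that reproduces the printed binomial sizes; rounding up is the more conservative choice in practice.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def chi2_1df_upper(p):
    """Upper p-point of chi-square with 1 df, computed as (z_{p/2})**2."""
    z = _STD_NORMAL.inv_cdf(1 - p / 2)
    return z * z


def goodman_B(alpha, k):
    """B: upper (alpha/k)*100th percentile of chi-square with 1 df."""
    return chi2_1df_upper(alpha / k)


def n_absolute(pi, b, B):
    """Eq. (2.8): n = B*pi*(1 - pi)/b**2, fpc ignored."""
    return B * pi * (1 - pi) / b**2


def n_absolute_fpc(pi, b, B, N):
    """Eq. (2.9): the n solving b**2 = B*(N - n)*pi*(1 - pi)/((N - 1)*n)."""
    v = B * pi * (1 - pi)
    return N * v / ((N - 1) * b**2 + v)


def n_relative(pi, b_rel, B):
    """Eq. (2.10): n = B*(1 - pi)/(pi*b_rel**2), using b_i = b_i' * pi_i."""
    return B * (1 - pi) / (pi * b_rel**2)


def n_worst_case(b, B):
    """No prior knowledge: take some pi_i = 1/2, giving n = B/(4*b**2)."""
    return B / (4 * b**2)


def n_binomial(pi, b, alpha):
    """Eq. (3.1): n' = t**2*pi*(1 - pi)/b**2, t cutting off total area alpha."""
    t = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return t * t * pi * (1 - pi) / b**2


def simultaneous_n(pis, bs, alpha, B=None):
    """Largest of the k per-cell values of (2.8); B defaults to the exact quantile."""
    if B is None:
        B = goodman_B(alpha, len(pis))
    return max(n_absolute(pi, b, B) for pi, b in zip(pis, bs))


def ratio_multinomial_to_binomial(alpha, k):
    """Eq. (3.2): n/n' = B/t**2 for the same precision."""
    t = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return goodman_B(alpha, k) / (t * t)


if __name__ == "__main__":
    # Blood types A, O, B, AB; absolute precision .05 each; alpha = .05.
    pis = [0.27, 0.43, 0.19, 0.11]
    bs = [0.05] * 4
    print(round(simultaneous_n(pis, bs, 0.05)))               # 612 (exact B)
    print(round(simultaneous_n(pis, bs, 0.05, B=6.3665)))     # 624 (table-interpolated B)
    print([round(n_binomial(pi, 0.05, 0.05)) for pi in pis])  # [303, 377, 236, 150]
```

The binomial sizes agree with Section 3. With the exact quantile (B ≈ 6.24), the simultaneous size is about 612 rather than the printed 624. The printed value is consistent with B ≈ 6.37. That is what linear interpolation between the .025 point (5.024) and the .01 point (6.635) of a printed $\chi^2$ table gives. This explanation is an inference; the note does not say how B was obtained. The same inference would explain why the α = .05, k = 4 entry of the tabulation (1.66) exceeds the exact ratio $B/t^2$ (about 1.62). The exact ratios for k = 5 and k = 10 at α = .05 round to the tabulated 1.73 and 2.05.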

Yarnold's Criterion and Minimum Sample Size


P. W. EATON*
* P. W. Eaton is Associate Professor, Finance Department, Northern Illinois University, DeKalb, IL 60115. The author wishes to thank the referees for their suggestions for improving the clarity of the exposition.

A simple procedure for establishing minimum sample size in $\chi^2$ goodness-of-fit tests is presented. Samples of this size will automatically satisfy Yarnold's criterion.

KEY WORDS: Minimum sample size; Goodness-of-fit; Yarnold's criterion; Multinomial distribution.

Over the years various rules of thumb have been proposed for determining the adequacy of the chi-squared distribution as an approximation to the distribution of

$$X^2 = \sum_{h=1}^{k} \frac{(r_h - np_h)^2}{np_h},$$

where $\{r_h\}$ is distributed as $(n!/\prod_h r_h!)\prod_h p_h^{r_h}$ ($0 \le r_h \le n$; $\sum_h r_h = n$; $0 < p_h < 1$; $\sum_h p_h = 1$; $n = 1, 2, \ldots$). These rules have been formulated with safety as the dominant consideration and emphasize simplicity at the expense of sensitivity. They have usually been expressed solely in terms of sample size (n must be at least 20, 10, 5, etc.). Occasionally the $p_h$ values themselves have been incorporated (e.g., $np_h \ge 5$ for all h). In other cases, the number of $r_h$-expectations falling below some standard has been used (e.g., one expectation may be as low as .5 provided the remainder exceed 5). A difficulty with these rules is that, on the one hand, they appear overly conservative. On the other hand, they ask the user to believe that the richness and complexity of the multinomial can be adequately allowed for in so simple a fashion.

Yarnold (1970), after detailed study of the problem, proposed a criterion more closely matched to the nature of the multinomial. It applies when none of the $p_h$ need be estimated, and it has received some attention in the literature (Kendall and Stuart 1973, p. 457). Yarnold's criterion states that for satisfactory approximation of $P(X^2 \ge c)$ by the chi-squared distribution, one should have

$$np_h \ge 5q \quad \text{for all } h \in \{1, 2, \ldots, k\},\ k \ge 3,$$

where q is the proportion of classes for which $np_h < 5$.

This rule raises no particular problems when used with samples that have already been drawn. Students, however, have considerable difficulty in determining how large samples yet to be drawn should be to satisfy this rule. They tend, especially in elementary courses, to adopt a trial-and-error approach. Because of the nature of this approach, they waste considerable time in settling upon a sample size and then select a size which is far too large.

For this reason a systematic procedure is needed that selects, in a few steps, the minimum sample size $n^*$ consistent with Yarnold's criterion. I propose the following procedure. Suppose the $p_h$ are ordered from largest to smallest: $p_{(k)} \ge \cdots \ge p_{(i)} \ge \cdots \ge p_{(1)}$. Let m denote the maximum i such that $p_{(i)} < kp_{(1)}$. (Since $p_{(1)} \le 1/k$, such an i exists.) Define $kp_{(1)}/0$ to be $+\infty$ and let $C_{(i)} = kp_{(1)}/(i - 1)$.

Procedure: With the $p_h$ ordered $p_{(k)} \ge \cdots \ge p_{(i)} \ge \cdots \ge p_{(1)}$, begin with $i = m$ and successively compare $p_{(i)}$ with $C_{(i)}$ until $p_{(i)} \le C_{(i)}$ is satisfied. Let s be the first such i. Then (except for rounding upwards where necessary)

$$n^* = \begin{cases} 5/C_{(s+1)} & \text{if } p_{(s)} \text{ is less than } p_{(s+1)},\ C_{(s+1)}, \text{ and } C_{(s)};\\ 5/p_{(s)} & \text{otherwise.} \end{cases}$$

Illustration 1: The marketing department of a magazine believes that the proportions of advertising companies falling into each of ten mutually exclusive and exhaustive categories (e.g., ten asset classes) are as follows: $p_{(10)} = .3$, $p_{(9)} = .2$, $p_{(8)} = .2$, $p_{(7)} = .1$, $p_{(6)} = .062$, $p_{(5)} = .04$, $p_{(4)} = .03$, $p_{(3)} = .025$, $p_{(2)} = .023$, $p_{(1)} = .02$. To find the minimum sample size satisfying Yarnold's criterion in preparation for testing this hypothesis, it first calculates $kp_{(1)} = .2$. So $m = 7$. Beginning the procedure, it finds $p_{(7)} > .2/6$ and $p_{(6)} > .2/5$, but $p_{(5)} < .2/4$. So $n^* = 5/.04 = 125$. (Since $n^*p_{(i)} < 5$ for $i = 1, 2, 3, 4$, we have $q = .4$. Since $n^*p_{(1)} = 2.5$, $n^*p_{(i)} \ge 5q = 2$ for $i = 1, 2, \ldots, 10$, as required.)
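The procedure can be expressed as a minimal Python sketch. It assumes the probabilities are given exactly, here as `fractions.Fraction` values, so that ties such as $p_{(5)} = C_{(6)} = .04$ in Illustration 1 are compared without floating-point error. The function names are hypothetical. Besides the procedure, the sketch checks Yarnold's criterion directly and finds the minimum n by brute-force search, which gives an independent cross-check. The search relies on the criterion being monotone in n: as n grows, every $np_h$ grows while q cannot increase. One detail is an assumption not stated in the text: when $s = k$ there is no $p_{(s+1)}$, and the sketch then uses $5/p_{(s)}$.

```python
import math
from fractions import Fraction


def yarnold_ok(n, probs):
    """Yarnold's criterion: n*p_h >= 5q for every class h, where q is the
    proportion of classes whose expectation n*p_h is below 5."""
    expectations = [n * p for p in probs]
    q = Fraction(sum(e < 5 for e in expectations), len(probs))
    return all(e >= 5 * q for e in expectations)


def brute_force_min_n(probs, n_max=10**6):
    """Smallest n meeting the criterion, by direct search (for cross-checking)."""
    for n in range(1, n_max + 1):
        if yarnold_ok(n, probs):
            return n
    raise ValueError("no n <= n_max satisfies the criterion")


def eaton_min_n(probs):
    """Eaton's procedure; probs should be exact (e.g. Fractions) and sum to 1."""
    ps = sorted(probs)          # ps[0] is p_(1), the smallest; ps[-1] is p_(k)
    k = len(ps)

    def p(i):                   # 1-based access, so p(i) is p_(i)
        return ps[i - 1]

    kp1 = k * p(1)

    def C(i):                   # C_(i) = k p_(1)/(i - 1), with k p_(1)/0 = +inf
        return math.inf if i == 1 else kp1 / (i - 1)

    # m: the largest i with p_(i) < k p_(1) (i = 1 always qualifies).
    m = max(i for i in range(1, k + 1) if p(i) < kp1)
    # Step down from i = m until p_(i) <= C_(i); C_(1) = +inf guarantees a stop.
    s = next(i for i in range(m, 0, -1) if p(i) <= C(i))
    # Assumption: if s == k there is no p_(s+1), so fall back to 5/p_(s).
    if s < k and p(s) < p(s + 1) and p(s) < C(s + 1) and p(s) < C(s):
        bound = C(s + 1)
    else:
        bound = p(s)
    return math.ceil(5 / bound)  # "rounding upwards where necessary"


if __name__ == "__main__":
    # Illustration 1: ten categories, probabilities given as exact decimals.
    probs = [Fraction(x) for x in ("0.3", "0.2", "0.2", "0.1", "0.062",
                                   "0.04", "0.03", "0.025", "0.023", "0.02")]
    print(eaton_min_n(probs), brute_force_min_n(probs))  # 125 125
```

For Illustration 1 both functions return 125, and `yarnold_ok(124, probs)` is false. This confirms that 125 is the smallest sample size meeting the criterion for these proportions.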
