To cite this article: Robert D. Tortora (1978) A Note on Sample Size Estimation for Multinomial Populations, The American
Statistician, 32:3, 100-102
In this section, The American Statistician publishes, in addition to articles of interest to teachers of statistics, announcements and selected reviews of Teaching Materials of general use to the statistical field. These may include (but will not necessarily be restricted to) curriculum material, collections of teaching examples or case studies, modular instructional material, transparency sets, films, film-strips, video-tapes, probability devices, audio-tapes, slides, and data deck sets (with complete documentation). Authors, producers, or distributors wishing to have such materials announced or reviewed should submit a single copy (three copies of printed material) to Associate Editor Harry O. Posten, Statistics Department, University of Connecticut, Storrs, Connecticut 06268. A statement of intention that the material will be available to all requestors for a minimum of a two-year period should be provided, along with information on the cost (including postage) and special features of the material. Information on classroom experience may also be included. All materials submitted must be of general use for teaching purposes in the area of probability and statistics. Articles should be sent to the Editor.
A method is described for determining the sample size required for a specified precision simultaneous confidence statement about the parameters of a multinomial population. The method is based on a simultaneous confidence interval procedure due to Goodman, and the results are compared with those obtained by separately considering each cell of the multinomial population as a binomial.

KEY WORDS: Multinomial populations; Sample size estimation; Simultaneous confidence intervals.

1. Introduction

The many variables measured in a survey and the multiple uses of survey data make the determination of a reasonable sample size to use for a sample survey an onerous task for the statistician. Frequently the problem is restricted to the investigation of the effect of the sample size on the estimates associated with one or two key variables. For a classification variable, Cochran (1963, p. 71) gives an illustration where an anthropologist studying the inhabitants of an island is interested in taking a sample survey to estimate the percentage of the inhabitants that belong to blood group O. The observations to be obtained are treated as a simple random sample of Bernoulli trials, and the sample-size effect on the estimate is measured in terms of the standard error of the estimate or in terms of a confidence interval width. This is a practical and accepted approach to the determination of a sample size; however, suppose the anthropologist specifically wants to make a joint statement about the distribution of all blood types. That is, he asks how large a sample he should take to be reasonably confident that the observed percentages of each blood type, A, B, O, and AB, will all be within a specified range of the true percentages. This is clearly a simultaneous statement, and if this is the primary consideration, then the appropriate sample size is different from that based on the consideration of only one blood group at a time.

In this note, a procedure is given for determining the sample size required for a simultaneous confidence statement of specified precision about the parameters of a multinomial population. The situation considered is that of a simple random sample from a large population where each population unit is classified into one of k mutually exclusive categories. For other sample designs, the design effects must be considered. The procedure is based on a simultaneous confidence interval procedure due to Goodman (1965), and the resulting sample sizes are compared with those that would be obtained by considering a confidence interval for a binomial parameter.

2. Methodology

Consider a population of units divided into k mutually exclusive and exhaustive categories. Let $\Pi_i$, $i = 1, \ldots, k$, be the proportion of the population in the ith category, and let $n_i$, $i = 1, \ldots, k$, be the frequency observed in the ith category in a simple random sample of size n from the population.

For a specified value of $\alpha$, we wish to obtain a set of intervals $S_i$, $i = 1, \ldots, k$, such that

$$\Pr\Big\{ \bigcap_{i=1}^{k} (\Pi_i \in S_i) \Big\} \ge 1 - \alpha;$$

that is, we require the probability that every interval $S_i$ contains $\Pi_i$ to be at least $1 - \alpha$. Goodman (1965) gives the approximate large-sample confidence interval bounds (when $n \to \infty$) as
Of course a sample size calculation including the fpc can be computed as above.

Here again, one should make k calculations, one for each pair $(b_i', \Pi_i)$, $i = 1, \ldots, k$. The largest n computed is selected as the desired sample size. As $\Pi_i \to 0$ or $b_i' \to 0$, the sample size increases according to (2.10). If $b_i' = b'$ for all i, then the largest sample size is $n = B(1 - \Pi)/(\Pi b'^2)$, where $\Pi = \min(\Pi_1, \ldots, \Pi_k)$.

We illustrate this approach with a numerical example. Suppose the anthropologist wishes to estimate the proportion of inhabitants on an island having blood types A, O, B, and AB. The anthropologist

The following tabulation gives values of n/n' for various values of k and $\alpha$ for the same precision:

                      k
  alpha      3      4      5      10
  .1       1.71   1.84   2.04   2.44
  .05      1.53   1.66   1.73   2.05

Suppose we are to estimate proportions associated with a four-parameter multinomial distribution. Then the tabulation says that if the binomial estimate of sample size is 100, the multinomial estimate would be 166 for
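The rule described above (one calculation per cell, keep the largest n) can be sketched in Python. This is only an illustration under stated assumptions: the excerpt does not define B, so the sketch assumes B is the upper $\alpha/k$ percentile of the chi-square distribution with one degree of freedom, and it assumes the binomial comparison uses the upper $\alpha$ percentile. The function names and the proportions in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_1_upper(p):
    """Upper 100*p percentile of chi-square with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal, so the
    percentile is the square of the normal quantile at 1 - p/2.
    """
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def multinomial_sample_size(proportions, rel_precision, alpha):
    """Largest n over the k cells of n_i = B (1 - Pi_i) / (Pi_i * b'^2).

    ASSUMPTION (not stated in the excerpt): B is the upper alpha/k
    percentile of chi-square(1). The same relative precision b' is
    used for every cell, so the smallest proportion drives the result.
    """
    k = len(proportions)
    B = chi2_1_upper(alpha / k)
    sizes = [B * (1 - p) / (p * rel_precision ** 2) for p in proportions]
    return math.ceil(max(sizes))


def size_ratio(k, alpha):
    """Ratio of multinomial to binomial sample size at equal precision.

    ASSUMPTION: the binomial size uses the upper alpha percentile.
    """
    return chi2_1_upper(alpha / k) / chi2_1_upper(alpha)
```

For example, `multinomial_sample_size([0.5, 0.3, 0.2], 0.1, 0.05)` evaluates the formula for three made-up proportions at 10 percent relative precision. Under these assumptions `size_ratio(k, alpha)` comes out close to the tabulated values, though it need not reproduce every printed figure exactly.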
KEY WORDS: Minimum sample size; Goodness-of-fit; Yarnold's criterion; Multinomial distribution.

Over the years various rules of thumb have been proposed for determining the adequacy of the chi-squared distribution as an approximation to the distribution of

$$X^2 = \sum_{h=1}^{k} \frac{(r_h - np_h)^2}{np_h}$$

where $\{r_h\}$ is distributed as $(n!/\prod r_h!)\prod p_h^{r_h}$ ($0 \le r_h \le n$; $\sum r_h = n$; $0 < p_h < 1$; $\sum p_h = 1$; $n = 1, 2, \ldots$). These rules have been formulated with safety as the dominant consideration and emphasize simplicity at the expense of sensitivity. They have usually been expressed solely in terms of sample size (n must be at least 20, 10, 5, etc.), but occasionally the $p_h$ values themselves have been incorporated (e.g., $np_h \ge 5$ for all h), or the number of $r_h$-expectations falling below some standard (e.g., one expectation may be as low as .5 provided the remainder exceed 5) has been used. A difficulty with these rules is that, on the one hand, they appear overly conservative, while on the other hand, they ask the user to believe that the richness and complexity of the multinomial can be adequately allowed for in so simple a fashion.

Yarnold (1970), after detailed study of the problem, proposed a criterion more closely matched to the nature of the multinomial. It applies when none of the $p_h$ need be estimated, and has received some attention in the literature (Kendall and Stuart 1973, p. 457). Yarnold's criterion states that for satisfactory approximation of $P(X^2 \ge c)$ by the chi-squared distribution, one should have

$$np_h \ge 5q \quad \text{for all } h \in \{1, 2, \ldots, k\},\ k \ge 3$$

with samples that have already been drawn, but students have considerable difficulty in determining how large samples yet to be drawn should be to satisfy this rule. They tend, especially in elementary courses, to adopt a trial-and-error approach and, because of the nature of this approach, to waste considerable time in settling upon a sample size, and then to select a size which is far too large.

For this reason a systematic procedure is needed which would select in a few steps the minimum sample size n* consistent with Yarnold's criterion. I propose the following procedure.

Suppose the $p_h$ are ordered from largest to smallest: $p_{(k)} \ge \cdots \ge p_{(i)} \ge \cdots \ge p_{(1)}$. Let m denote the maximum i such that $p_{(i)} < kp_{(1)}$. (Since $p_{(1)} \le 1/k$, such an i exists.) Define $kp_{(1)}/0$ to be $+\infty$ and let $c_{(i)} = kp_{(1)}/(i - 1)$.

Procedure: With the $p_h$ ordered $p_{(k)} \ge \cdots \ge p_{(i)} \ge \cdots \ge p_{(1)}$, begin with i = m and successively compare $p_{(i)}$ with $c_{(i)}$ until $p_{(i)} \le c_{(i)}$ is satisfied. Let s be the first such i. Then (except for rounding upwards where necessary),

$$n^* = \begin{cases} 5/c_{(s+1)} & \text{if } p_{(s)} \text{ is less than } p_{(s+1)},\ c_{(s+1)}, \text{ and } c_{(s)}; \\ 5/p_{(s)} & \text{otherwise.} \end{cases}$$

Illustration 1: The marketing department of a magazine feels that the proportions of advertising companies falling into each of ten mutually exclusive and exhaustive categories (e.g., ten asset classes) are as follows: $p_{(10)} = .3$, $p_{(9)} = .2$, $p_{(8)} = .2$, $p_{(7)} = .1$, $p_{(6)} = .062$, $p_{(5)} = .04$, $p_{(4)} = .03$, $p_{(3)} = .025$, $p_{(2)} = .023$, $p_{(1)} = .02$. To find the minimum sample size satisfying Yarnold's criterion in preparation for testing this hypothesis, it first calculates $kp_{(1)} = .2$. So m = 7. Beginning the procedure, it finds $p_{(7)} > .2/6$ and $p_{(6)} > .2/5$, but $p_{(5)} < .2/4$. So n* = 5/.04 = 125. (Since $n^*p_{(i)} < 5$ (i = 1, 2, 3, 4), q = .4. Since $n^*p_{(1)} = 2.5$, $n^*p_{(i)} \ge 5q = 2$ for i = 1, 2, ..., 10 as required.)

* P. W. Eaton is Associate Professor, Finance Department, Northern Illinois University, Dekalb, IL 60115. The author wishes to thank the referees for their suggestions for improving the clarity of the exposition.
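The procedure and Illustration 1 can be traced with a short Python sketch. It is a hedged reading of the steps above, not the author's own code. The function names are hypothetical. Exact fractions are used so that ties such as .04 versus .2/5 are compared exactly. Two assumptions are made: q is taken to be the fraction of cells whose expectation falls below 5 (as the illustration's q = .4 suggests), and $p_{(k+1)}$ is treated as $+\infty$ when s = k.

```python
import math
from fractions import Fraction


def satisfies_yarnold(n, probs):
    """Check n*p_h >= 5q for every cell.

    ASSUMPTION: q is the fraction of cells with expectation n*p_h < 5.
    """
    expected = [n * Fraction(x) for x in probs]
    q = Fraction(sum(e < 5 for e in expected), len(expected))
    return all(e >= 5 * q for e in expected)


def yarnold_min_n(probs):
    """Follow the stated procedure to get n* (intended for k >= 3 cells)."""
    p = sorted(Fraction(x) for x in probs)  # p[0] is p_(1), the smallest
    k = len(p)
    kp1 = k * p[0]

    def p_at(i):
        # ASSUMPTION: p_(k+1) is treated as +infinity when s == k.
        return p[i - 1] if i <= k else math.inf

    def c_at(i):
        # c_(i) = k p_(1) / (i - 1), with division by zero defined as +infinity.
        return kp1 / (i - 1) if i > 1 else math.inf

    # m is the largest i with p_(i) < k p_(1).
    m = max(i for i in range(1, k + 1) if p_at(i) < kp1)
    # s is the first i, counting down from m, with p_(i) <= c_(i).
    s = next(i for i in range(m, 0, -1) if p_at(i) <= c_at(i))

    if p_at(s) < min(p_at(s + 1), c_at(s + 1), c_at(s)):
        return math.ceil(5 / c_at(s + 1))
    return math.ceil(5 / p_at(s))


# Proportions from Illustration 1, passed as strings for exact arithmetic.
illustration = ["0.3", "0.2", "0.2", "0.1", "0.062",
                "0.04", "0.03", "0.025", "0.023", "0.02"]
n_star = yarnold_min_n(illustration)
```

With the Illustration 1 proportions the sketch gives m = 7 and s = 5 and takes the "otherwise" branch, so `n_star` is 125, in agreement with the text. `satisfies_yarnold(125, illustration)` holds, while a sample of 124 fails because a fifth expectation drops below 5 and raises 5q to 2.5.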