The Sampling Formula $n = \dfrac{N}{Ne^2 + 1}$

By Eddie Seva See, Mary Ann Musni See

Research and statistics textbooks present and discuss five different sample size formulae.
The one presented most frequently is the formula for estimating a population mean from an
infinite population, given below.

$$n = \frac{z^2 \sigma^2}{e^2}$$

where n is the sample size, z = standard score based on an assumed confidence level,
$\sigma$ = standard deviation, and e = margin of error (in ratio/interval measure)

(Pegula, 2008; Lind et al., ___; Berenson et al., 2006; Weirs, 2005; Moore, 2004; Larson & Faber, 2003;
McClave & Sincich, 2003; Moore et al., 2003; Davis et al., 2002; Pelosi et al., 2001; Triola, 2001;
Cooper & Schindler, 1998; Healey, 1996; Sincich, 1996; Anderson et al., 1996; Hamilton, 1996; Triola, 1995;
Mendenhall & Reinmuth, 1982; Hertzman & Mueller, 1980)
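
As a rough illustration of the formula above, the following sketch computes the sample size for estimating a mean from an infinite population. The function name and the input values (z = 1.96, a standard deviation of 15, a margin of error of 2) are hypothetical, and rounding up to the next whole respondent is an assumption the text does not state.

```python
import math


def sample_size_mean_infinite(z: float, sigma: float, e: float) -> int:
    """n = z^2 * sigma^2 / e^2, rounded up (rounding up is an assumption)."""
    return math.ceil((z ** 2) * (sigma ** 2) / (e ** 2))


# Hypothetical inputs: z = 1.96 (about 95% confidence), sigma = 15, e = 2
n_mean = sample_size_mean_infinite(z=1.96, sigma=15, e=2)
print(n_mean)
```

Note that sigma must be known or guessed before the study begins, which is the practical difficulty the article raises later.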

This is followed by the formula that estimates population proportion with infinite population.

$$n = \frac{z^2 PQ}{e^2}$$

where n is the sample size, z = standard score based on an assumed confidence level,
P is the assumed proportion in decimal, Q = 1 - P, and e = margin of error (in decimal)

(Berenson et al., 2006; Weirs, 2005; Larson & Faber, 2003; McClave & Sincich, 2003; Duckworth, 2003;
Davis et al., 2002; Pelosi et al., 2001; Anderson et al., 1996; Sincich, 1996; Bordens & Abbot, 1996;
O'Sullivan & Rassel, 1995; Mendenhall & Reinmuth, 1982)
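
The proportion formula above can be sketched the same way. The function name and inputs (z = 1.96, P = 0.5, a 5% margin of error) are hypothetical, and rounding up is again an assumption.

```python
import math


def sample_size_proportion_infinite(z: float, p: float, e: float) -> int:
    """n = z^2 * P * Q / e^2 with Q = 1 - P, rounded up (an assumption)."""
    q = 1 - p
    return math.ceil((z ** 2) * p * q / (e ** 2))


# Hypothetical inputs: z = 1.96, P = 0.5, e = 0.05
n_prop = sample_size_proportion_infinite(z=1.96, p=0.5, e=0.05)
print(n_prop)
```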

Third is the formula for estimating population mean with finite population size given below.

$$n = \frac{N\sigma^2}{(N-1)\dfrac{e^2}{z^2} + \sigma^2}$$

where N = population size; $\sigma^2$ = variance of the variable being measured; e = margin of error in
terms of the value of the variable being measured; z = standard score based on an assumed
confidence level

(Lind et al.; Berenson et al., 2006; Weirs, 2005; Newbold et al., 2003; Davis et al., 2002;
Anderson et al., 1996; Madsen & Moeschberger, 1983; Mendenhall & Reinmuth, 1982; Neter et al., 1966)
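
A minimal sketch of the finite-population mean formula follows. It assumes that the e² term is divided by z², as in the derivation later in the article. The function name and the inputs are hypothetical, and rounding up is an assumption.

```python
import math


def sample_size_mean_finite(N: int, sigma: float, e: float, z: float) -> int:
    """n = N*sigma^2 / ((N - 1) * e^2 / z^2 + sigma^2), rounded up (an assumption)."""
    variance = sigma ** 2
    return math.ceil(N * variance / ((N - 1) * e ** 2 / z ** 2 + variance))


# Hypothetical inputs: N = 1000, sigma = 15, e = 2, z = 1.96
n_mean_finite = sample_size_mean_finite(N=1000, sigma=15, e=2, z=1.96)
print(n_mean_finite)
```

With the same sigma, e, and z, the infinite-population formula would call for a larger sample, so the finite-population version reduces the requirement when N is modest.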

Fourth is the formula for estimating population proportion with finite population size,
given as follows.

$$n = \frac{NPQ}{(N-1)\dfrac{e^2}{z^2} + PQ}$$

where P = proportion between two variables of nominal measure in decimal form; Q = 1 - P;
e = margin of error in decimal form; z = standard score based on an assumed confidence level

(Weirs, 2005; Newbold et al., 2003; Davis et al., 2002; Anderson et al., 1996; Mendenhall & Reinmuth,
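
The finite-population proportion formula can be sketched as below, again assuming that the e² term is divided by z². The function name and the inputs are hypothetical, and rounding up is an assumption.

```python
import math


def sample_size_proportion_finite(N: int, p: float, e: float, z: float) -> int:
    """n = N*P*Q / ((N - 1) * e^2 / z^2 + P*Q), rounded up (an assumption)."""
    pq = p * (1 - p)
    return math.ceil(N * pq / ((N - 1) * e ** 2 / z ** 2 + pq))


# Hypothetical inputs: N = 1000, P = 0.5, e = 0.05, z = 1.96
n_prop_finite = sample_size_proportion_finite(N=1000, p=0.5, e=0.05, z=1.96)
print(n_prop_finite)
```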

The least frequent (fifth) among the books is the formula that requires only population size and
margin of error.

$$n = \frac{N}{Ne^2 + 1}$$

(Nuque & Feliciano, 1984; Mendenhall & Reinmuth, 1982; and Pagoso et al., 1978)
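
Because this fifth formula needs only two inputs, it reduces to a one-line computation. The sketch below uses a hypothetical function name and hypothetical inputs (N = 1000, a 5% margin of error), and rounding up is an assumption.

```python
import math


def sample_size_simple(N: int, e: float) -> int:
    """n = N / (N * e^2 + 1), rounded up (rounding up is an assumption)."""
    return math.ceil(N / (N * e ** 2 + 1))


# Hypothetical inputs: N = 1000, e = 0.05
n_simple = sample_size_simple(N=1000, e=0.05)
print(n_simple)
```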

The sampling formula for estimating population means, while the most frequently discussed in
statistics books, is seldom used in student research. Perhaps the method seems unattractive because it
requires the variance, $\sigma^2$, which is, after all, one of the unknowns the research seeks to measure.
For studies involving new populations, the technique appears absurd. The formula would be most
useful when probing populations that have already been studied in the past.

In many sampling computations involving the same population but measured for different variables
that cover both ratio/interval and categorical measures, the formula that needs only population size and
margin of error is the most popular. This could be credited to the few mathematical steps involved in
its calculation and to population size being the only input required besides the chosen margin of error.
It is unexpected that a formula rarely found in statistics and research textbooks would be so popular
among research students. Likewise, it is startling that a formula so widely known to and used by
researchers is not taught in books designed for future researchers.

$$n = \frac{N}{Ne^2 + 1}$$

The formula was provided by Yamane in 1967 (Kasiuleviciusi et al., 2006), and many researchers wonder
where it came from. Unknown to many, it is derived from the formula that estimates population
proportion with finite population.

$$n = \frac{NPQ}{(N-1)\dfrac{e^2}{z^2} + PQ}$$

How P, Q, and z became invisible in the formula is discussed in the following derivation steps.

$$n = \frac{NPQ}{(N-1)\dfrac{e^2}{z^2} + PQ}$$

The proportion (P), Q, and z disappeared because they were replaced by actual values. The
proportion, P, is assumed to be 0.5, which automatically results in a Q value of 0.5 since Q = 1-P = 1-0.5 = 0.5.
The number 0.5 is the P value that yields the highest possible
sample size, as determined by this author and as explained by Madsen & Moeschberger (1983, p. 314):

The quantity of PQ always lies between 0 and 0.25. It assumes a maximum value of
0.25 when P = 0.5. Consequently the largest value of n is at this value. To be on the safe side, we can use
this large value.
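
The claim that PQ peaks at P = 0.5 can be checked numerically. The sketch below evaluates PQ on a grid of proportions; the grid spacing of 0.01 and the variable names are arbitrary choices for illustration.

```python
# Evaluate P * Q = P * (1 - P) on a grid of proportions from 0.00 to 1.00
grid = [i / 100 for i in range(101)]
pq_values = {p: p * (1 - p) for p in grid}

# Find the proportion that gives the largest product
best_p = max(pq_values, key=pq_values.get)
print(best_p, pq_values[best_p])
```

Any other choice of P gives a smaller product and therefore a smaller sample size, which is why P = 0.5 is the conservative choice.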

The standard score (z) of 2 arises from a confidence level of 95.44 per cent. It is determined
as follows. Set the confidence level at 95.44% or 0.9544. From this value, define the level of
significance: 1 - 0.9544 = 0.0456. The interval is two-tailed, so divide 0.0456 by 2, resulting in 0.0228.
Subtract this value from 0.500: 0.500 - 0.0228 = 0.4772. Locate the z value that corresponds to an
area of 0.4772 in the z table. The critical value is z = 2. How the minus 1 in N-1 disappeared is explained below.
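
The table lookup described above can be reproduced with the standard library. This sketch follows the same steps and uses `statistics.NormalDist` in place of a printed z table; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.9544
alpha = 1 - confidence           # level of significance, about 0.0456
tail = alpha / 2                 # two-tailed, so about 0.0228 in each tail
area_from_mean = 0.5 - tail      # about 0.4772, the area a z table lists

# A z table maps this area to z; inv_cdf needs the cumulative area instead
z = NormalDist().inv_cdf(0.5 + area_from_mean)
print(round(z, 2))
```

The computed value is very close to 2 rather than exactly 2, which is consistent with the rounding in a printed table.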

When N is very large the sample size formula reduces 1 to zero... (Mendenhall & Reinmuth, 1982, p. 72)

Substituting the assumed values for P, Q, and z and dropping the 1 from N-1 thus transforms
our equation into the following.

$$n = \frac{NPQ}{(N-1)\dfrac{e^2}{z^2} + PQ} = \frac{N(0.5)(0.5)}{\dfrac{Ne^2}{2^2} + (0.5)(0.5)}$$

$$n = \frac{0.25N}{\dfrac{Ne^2}{4} + 0.25} = \frac{0.25N}{\dfrac{Ne^2 + (4)0.25}{4}} = \frac{(4)0.25N}{Ne^2 + 1}$$

$$n = \frac{N}{Ne^2 + 1}$$
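
The algebra above can be checked numerically. The sketch below evaluates the finite-population proportion formula with P = 0.5, z = 2, and N in place of N-1, then compares it with the simplified formula. The function names and test values are hypothetical.

```python
import math


def finite_proportion_n(N, e, P=0.5, z=2.0):
    """Finite-population proportion formula with N - 1 approximated by N."""
    Q = 1 - P
    return N * P * Q / (N * e ** 2 / z ** 2 + P * Q)


def simplified_n(N, e):
    """The simplified formula n = N / (N * e^2 + 1)."""
    return N / (N * e ** 2 + 1)


# Illustrative population sizes and margins of error (not from the article)
checks = [(N, e) for N in (500, 10_000, 1_000_000) for e in (0.01, 0.05, 0.10)]
all_match = all(
    math.isclose(finite_proportion_n(N, e), simplified_n(N, e), rel_tol=1e-9)
    for N, e in checks
)
print(all_match)
```

Changing P or z in `finite_proportion_n` breaks the agreement, which shows that the simplified formula carries those two assumptions implicitly.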

While the formula seems to depend only on the known size of the population and the
assumed margin of error in decimal, its derivation shows that it presupposes the following: a very
large population size (N), a confidence level of 95.44 per cent, and a proportion (P) of 0.5. It was originally
designed for research variables with categorical measure. Under this derived formula, each margin of
error has a maximum sample size that is never exceeded, no matter how big the population is.
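
The maximum mentioned above can be made concrete. Dividing the numerator and denominator of n = N / (Ne² + 1) by N gives n = 1 / (e² + 1/N), which approaches 1/e² as N grows. The sketch below illustrates this for a hypothetical 5% margin of error; the function name and population sizes are illustrative.

```python
def simplified_n(N, e):
    """The simplified formula n = N / (N * e^2 + 1)."""
    return N / (N * e ** 2 + 1)


e = 0.05                 # illustrative margin of error
ceiling = 1 / e ** 2     # limiting sample size as N grows without bound

sizes = {N: simplified_n(N, e) for N in (1_000, 10_000, 1_000_000, 1_000_000_000)}
for N, n in sizes.items():
    print(f"N = {N:>13,}  n = {n:8.2f}")
print(f"ceiling 1/e^2 = {ceiling:.2f}")
```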

It must also be noted that when the size of N does not make 1 negligible (or N is not very large),
the derived formula is

$$n = \frac{N}{(N-1)e^2 + 1}$$
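
To see how much the N-1 term matters in practice, the two versions can be compared side by side. This is a minimal sketch with hypothetical function names, an illustrative 5% margin of error, and arbitrary population sizes.

```python
def n_large_population(N, e):
    """Simplified formula that treats N - 1 as N."""
    return N / (N * e ** 2 + 1)


def n_with_correction(N, e):
    """Version that keeps the N - 1 term."""
    return N / ((N - 1) * e ** 2 + 1)


e = 0.05  # illustrative margin of error
for N in (20, 100, 1_000, 100_000):
    a = n_large_population(N, e)
    b = n_with_correction(N, e)
    print(f"N = {N:>7,}  without N-1: {a:7.2f}  with N-1: {b:7.2f}")
```

The version that keeps N-1 always gives the slightly larger value, and for these inputs the gap stays below one respondent.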


