
Proceedings of the Twenty-sixth (2016) International Ocean and Polar Engineering Conference
www.isope.org
Rhodes, Greece, June 26-July 1, 2016
Copyright 2016 by the International Society of Offshore and Polar Engineers (ISOPE)
ISBN 978-1-880653-88-3; ISSN 1098-6189

Sample Size Determination and Confidence Interval Derivation for Exponential Distribution

Fuqing Yuan¹*, Jinmei Lu¹ and Bjørn Morten Batalden¹

¹Department of Engineering and Safety, University of Tromsø, Tromsø, Norway

ABSTRACT
The exponential distribution is the simplest and most commonly used distribution for risk and reliability data analysis. This paper investigates the distribution of the estimator obtained by maximum likelihood estimation (MLE). This estimator distribution shows that the MLE estimator is biased and should be corrected when the data size is small. The derived distribution is applied to determine the optimal sample size for experiment design, and to derive a confidence interval. This confidence interval is compared with the confidence intervals derived from the Fisher information matrix and the likelihood ratio method.

KEY WORDS: Exponential Distribution; Sample Size; Confidence Interval; Estimator Distribution.

NOMENCLATURE & NOTATION

$\lambda$              Failure rate
$n$                    Sample size
$t$                    Time to failure
$\mathrm{Var}$         Variance
$f$                    PDF
$\hat{\lambda}$        Estimator of failure rate
$L$                    Likelihood function
$\mathrm{Gamma}$       Gamma distribution
$\mathrm{InvGamma}$    Inverse Gamma distribution
PDF                    Probability density function
CDF                    Cumulative distribution function

INTRODUCTION

In risk and reliability analysis, the exponential distribution is an important model for failure data (Meeker and Escobar, 1998; Yang and Sirvanci, 1977). This simple model with a constant failure rate has been widely applied in life testing and spare part management (Barlow et al., 1996). The lives of electronic systems and of some complex engineering systems tend to follow the exponential distribution. Moreover, its parameter estimation is simpler than that of other distributions.

Given failure data, MLE is the common method for parameter estimation. It is well known that the MLE estimator is consistent: as the sample size becomes large, the estimator asymptotically approaches the true parameter. In terms of bias, however, the maximum likelihood method does not perform well, because the MLE estimator is biased (Kendall et al., 2004). For small data sizes, the bias means that the MLE tends to deviate from the true parameter, so using this estimator carries a high risk of making a wrong decision. The bias problem should therefore be a concern in risk and reliability engineering. Figure 1 shows how the estimator is distributed when the sample size is only 5, which illustrates the risk of relying on any single estimate: the estimator is widely dispersed around the true parameter (dashed line).

[Figure 1. Estimator Distribution: Count against Failure Rate]

This bias problem of the MLE estimator has been ignored by many practitioners in risk and reliability. Although some publications try to develop methods to correct the bias, most current applications still use the MLE for small sample size problems. This practice is not wrong, but it is not reasonable.

The exponential distribution is the simplest distribution, and the exact distribution of its MLE estimator can be derived. The derived estimator distribution can be used to find the mean value and the exact variance of the estimator. For other distributions, for example the Weibull distribution, one has to use simulation or an approximate method to estimate the variance. This paper derives the exact distribution of the MLE estimator of the exponential distribution. This exact distribution is used to determine the sample size for experiment design and to correct the MLE estimator. Later on, this exact distribution is used to find a confidence interval for the estimator.

EXACT DISTRIBUTION OF EXPONENTIAL ESTIMATOR

Although the MLE estimator is biased, the bias can be corrected if the expectation of the estimator, or the distribution of the estimator, is known. Unfortunately, in reliability engineering the MLE for most distributions does not have an explicit expression, so the expectation of the estimator cannot be obtained. Fortunately, for the exponential distribution the distribution of the MLE estimator can be obtained by means of the Gamma distribution. Some publications in statistics have obtained similar results using the moment generating function method for censored data (Childs et al., 2012; Cheng et al., 2013; Gupta and Kundu, 2007). This paper considers complete data. The probability density function of the Gamma distribution with shape $k$ and scale $\theta$ is

$$ f(x; k, \theta) = \frac{x^{k-1}\,e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0 \tag{1} $$

If the samples $X_1, \dots, X_n$ are drawn independently from $\mathrm{Gamma}(k, \theta)$, their summation still follows a Gamma distribution:

$$ X_i \sim \mathrm{Gamma}(k, \theta) \ \Rightarrow\ \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(nk, \theta) \tag{2} $$

It is straightforward to see that the exponential distribution is the special case of the Gamma distribution with $k = 1$ and $\theta = 1/\lambda$. Therefore, the summation of $n$ times to failure $t_1, \dots, t_n$ from the exponential distribution is

$$ T = \sum_{i=1}^{n} t_i \sim \mathrm{Gamma}\left(n, \frac{1}{\lambda}\right) \tag{3} $$

with probability density function

$$ f(T) = \frac{\lambda^{n}\,T^{n-1}\,e^{-\lambda T}}{\Gamma(n)}, \quad T > 0 \tag{4} $$

The maximum likelihood estimator of the exponential distribution for $n$ observed samples is

$$ \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i} = \frac{n}{T} \tag{5} $$

The expectation of $\hat{\lambda}$ is

$$ E[\hat{\lambda}] = \int_{0}^{\infty} \frac{n}{T}\, f(T)\, dT \tag{6} $$

which can be rewritten as

$$ E[\hat{\lambda}] = n\, E\left[\frac{1}{T}\right] \tag{7} $$

Since $T$ follows $\mathrm{Gamma}(n, 1/\lambda)$, $1/T$ follows the inverse Gamma distribution $\mathrm{InvGamma}(n, \lambda)$, whose mean is $\lambda/(n-1)$ for $n > 1$. Therefore, the exact expectation of the estimator is

$$ E[\hat{\lambda}] = \frac{n}{n-1}\,\lambda \tag{8} $$

From (8), the maximum likelihood estimator for the exponential distribution is readily seen to be biased. For small sample sizes, the estimator can be corrected by multiplying it by the correction factor $(n-1)/n$, i.e.

$$ \tilde{\lambda} = \frac{n-1}{n}\,\hat{\lambda} = \frac{n-1}{\sum_{i=1}^{n} t_i} \tag{9} $$

Similarly, the variance of the estimator can be derived:

$$ \mathrm{Var}(\hat{\lambda}) = \frac{n^{2}\lambda^{2}}{(n-1)^{2}(n-2)}, \quad n > 2 \tag{10} $$

The Cramér-Rao (CR) lower bound for the exponential distribution is

$$ \mathrm{Var}_{CR}(\hat{\lambda}) = \frac{\lambda^{2}}{n} \tag{11} $$

Since $\lambda^{2}/n < n^{2}\lambda^{2}/((n-1)^{2}(n-2))$ for every $n > 2$, the CR lower bound is lower than the true variance of $\hat{\lambda}$. The CR variance is derived from the Fisher information matrix.

The true distribution of the maximum likelihood estimator can be derived from the distribution of $T$ in (2). This paper omits the detailed procedure, which can be found in textbooks of probability and statistics (Rohatgi, 1975). The estimator distribution is

$$ f(\hat{\lambda}) = \frac{(n\lambda)^{n}}{\Gamma(n)}\,\hat{\lambda}^{-(n+1)}\,e^{-n\lambda/\hat{\lambda}}, \quad \hat{\lambda} > 0 \tag{12} $$

i.e. $\hat{\lambda}$ follows $\mathrm{InvGamma}(n, n\lambda)$. The CDF of (12) is not easy to compute. One can instead use (2) to obtain the CDF of (12) indirectly, since $P(\hat{\lambda} \le x) = P(T \ge n/x)$.

SAMPLE SIZE DETERMINATION

Equation (12) gives the distribution of the estimator. When the life of the product is known to follow the exponential distribution, this distribution can be used to determine the number of samples in an experiment design. However, (12) depends on the unknown parameter $\lambda$. If the criterion for choosing the sample size is instead based on the ratio

$$ \eta = \frac{\hat{\lambda}}{\lambda} \tag{13} $$

the sample size can be determined without requiring knowledge of $\lambda$. The probability density function of $\eta$ can be derived as

$$ f(\eta) = \frac{n^{n}}{\Gamma(n)}\,\eta^{-(n+1)}\,e^{-n/\eta}, \quad \eta > 0 \tag{14} $$

The only parameter in (14) is $n$, the sample size, i.e. (14) gives the distribution of $\eta$ for a given $n$. Figure 2 shows the PDF of $\eta$ for several values of $n$.
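Because $\eta = \hat{\lambda}/\lambda = n/(\lambda T)$ and $\lambda T \sim \mathrm{Gamma}(n, 1)$, the distribution of $\eta$ should not change with $\lambda$, and the exact moments imply $E[\eta] = n/(n-1)$ and $\mathrm{Var}(\eta) = n^{2}/((n-1)^{2}(n-2))$, which exceeds the CR value $1/n$. A minimal Monte Carlo sketch in Python (standard library only) can check these claims; the function names and seeds are illustrative assumptions, and the rates 0.1, 1 and 2 are borrowed from the simulation study.

```python
import random
import statistics


def simulate_ratio(lam, n, reps, seed=1):
    """Monte Carlo draws of eta = lambda_hat / lambda, with lambda_hat = n / T."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Total time on test T; it follows Gamma(n, 1/lam).
        total = sum(rng.expovariate(lam) for _ in range(n))
        draws.append((n / total) / lam)
    return draws


def exact_ratio_mean(n):
    """E[eta] = n / (n - 1); needs n > 1."""
    return n / (n - 1)


def exact_ratio_variance(n):
    """Var(eta) = n^2 / ((n - 1)^2 (n - 2)), i.e. Var(lambda_hat) / lambda^2; needs n > 2."""
    return n ** 2 / ((n - 1) ** 2 * (n - 2))


if __name__ == "__main__":
    n = 5
    for seed, lam in enumerate((0.1, 1.0, 2.0), start=1):
        eta = simulate_ratio(lam, n, reps=50_000, seed=seed)
        corrected = [(n - 1) / n * e for e in eta]  # apply the factor (n - 1) / n
        print(f"lambda={lam}: mean eta={statistics.fmean(eta):.4f} "
              f"(exact {exact_ratio_mean(n):.4f}), "
              f"mean corrected={statistics.fmean(corrected):.4f}, "
              f"var eta={statistics.variance(eta):.4f} "
              f"(exact {exact_ratio_variance(n):.4f}, CR value {1 / n:.4f})")
```

For $n = 5$ the simulated mean of $\eta$ should sit near 1.25 for every $\lambda$, and multiplying by the correction factor should bring it near 1.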

[Figure 2. Distribution of Estimator Ratio: PDF against Efficiency for n = 5, 10, 20, 50 and 400]

[Figure 3. Distribution of ns: Gamma PDF against ns; the shaded area is X(L) < P < X(U)]
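The probability that $\hat{\lambda}$ lies within a relative error $\varepsilon$ of $\lambda$ can be computed without knowing $\lambda$: since $\lambda/\hat{\lambda} = \lambda T/n$ and $\lambda T \sim \mathrm{Gamma}(n, 1)$, it is a difference of two $\mathrm{Gamma}(n, 1)$ CDF values. A minimal Python sketch (standard library only) of the resulting search for the smallest $n$ follows; it anticipates the criterion (15)-(17) developed next, taking $X_{(L)} = 1/(1+\varepsilon)$ and $X_{(U)} = 1/(1-\varepsilon)$ and requiring the total probability between them to reach the confidence level, which is our reading of the procedure. The Gamma CDF uses the Poisson identity for integer shape, and all function names are hypothetical. Individual results may differ slightly from Table 1, for example if the paper's computation rounded the bounds.

```python
import math


def gamma_cdf_int(x, n):
    """CDF of Gamma(shape=n, scale=1) at x, for integer n >= 1.

    Poisson identity: P(Y <= x) = 1 - sum_{j=0}^{n-1} exp(-x) x^j / j!.
    Each term is built in log space so large n and x do not overflow.
    """
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    tail = sum(math.exp(j * log_x - x - math.lgamma(j + 1)) for j in range(n))
    return max(0.0, 1.0 - tail)


def within_error_probability(n, eps):
    """P(X(L) <= lambda/lambda_hat <= X(U)) for sample size n, with 0 < eps < 1.

    lambda/lambda_hat = lambda*T/n and lambda*T ~ Gamma(n, 1), so lambda is not needed.
    """
    x_low = 1.0 / (1.0 + eps)  # X(L): lambda_hat equal to (1 + eps) * lambda
    x_up = 1.0 / (1.0 - eps)   # X(U): lambda_hat equal to (1 - eps) * lambda
    # P(n*X(L) <= lambda*T <= n*X(U)) under Gamma(n, 1)
    return gamma_cdf_int(n * x_up, n) - gamma_cdf_int(n * x_low, n)


def min_sample_size(eps, level=0.95, n_max=100_000):
    """Smallest n, counting up from 1, whose probability reaches `level`."""
    for n in range(1, n_max + 1):
        if within_error_probability(n, eps) >= level:
            return n
    raise ValueError("no n up to n_max reaches the requested level")


if __name__ == "__main__":
    for eps in (0.5, 0.2, 0.1):
        print(f"relative error {eps:.0%}: minimal n = {min_sample_size(eps)}")
```

Counting up from $n = 1$ mirrors the iteration described in the text. Each CDF evaluation costs $O(n)$, so a 5% relative error (with $n$ in the thousands) takes noticeably longer than the other cases.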
As an alternative to the ratio $\eta$ in (13), and for convenience of computation, one can choose the criterion as

$$ \frac{1}{\eta} = \frac{\lambda}{\hat{\lambda}} \tag{15} $$

It can be deduced from (12) that

$$ \frac{\lambda}{\hat{\lambda}} = \frac{\lambda T}{n} \sim \mathrm{Gamma}\left(n, \frac{1}{n}\right) \tag{16} $$

(16) is still a Gamma distribution, with $\theta = 1/n$ and $k = n$ in the Gamma distribution (1). It is easier to compute than (12).

Suppose the tolerated relative error is $\varepsilon$, i.e. $(1-\varepsilon)\lambda \le \hat{\lambda} \le (1+\varepsilon)\lambda$, which is equivalent to $X_{(L)} = 1/(1+\varepsilon) \le \lambda/\hat{\lambda} \le X_{(U)} = 1/(1-\varepsilon)$. If the sample size used is $n$, then by (16) the probability that the estimated failure rate is near the true value is given by the Gamma distribution

$$ P\left(X_{(L)} \le \frac{\lambda}{\hat{\lambda}} \le X_{(U)}\right) = F\left(X_{(U)}; n, \tfrac{1}{n}\right) - F\left(X_{(L)}; n, \tfrac{1}{n}\right) \tag{17} $$

where $F(\cdot\,; k, \theta)$ is the CDF of (1). This probability is the shaded area in Figure 3.

Given a certain confidence level, e.g. 95%, we start from $n = 1$ and evaluate (17) with the bounds $X_{(L)}$ and $X_{(U)}$. If the resulting probability does not reach the required level, we increase $n$ and repeat the procedure until it does. This procedure requires iterative computation. Table 1 tabulates the minimum required sample size for various relative errors. Although the method requires this iterative computation, its advantage is evident: it is exact, without requiring any asymptotic distribution, and it does not require knowing the parameter value.

Table 1. Minimal sample size at 95% level

No.   Relative Error   X(U)   X(L)    n
1     5%               1.05   0.95    1541
2     10%              1.11   0.91    385
3     20%              1.25   0.833   101
4     50%              2      0.67    21

CONFIDENCE INTERVAL DERIVATION

The distribution of the estimator can be applied to find a confidence interval. In general, confidence interval derivation in statistics uses the asymptotic normality of the estimator, i.e. according to the central limit theorem, the distribution of the estimator approaches a normal distribution as the sample size tends to infinity. The Fisher information matrix method and the likelihood ratio method are based on this, so they require the sample size to be sufficiently large. In contrast to the Fisher and likelihood ratio methods, this section derives a confidence interval from the estimator distribution.

Estimator Distribution Based Method

Since the exact distribution of the estimator is known from (12), the confidence interval of the estimator can readily be derived. Instead of using (12), this paper uses (2) to derive the confidence interval. Since the failure rate estimator is the inverse of $T$ in (2), up to the factor $n$, we first find the interval for $T$. It is known that $T$ follows the Gamma distribution $\mathrm{Gamma}(n, 1/\lambda)$. For $\lambda$ one can use the maximum likelihood estimator

$$ \lambda \approx \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i} \tag{18} $$

or the alternative bias-corrected estimator in (9). A confidence interval with level $1-\alpha$ is then

$$ \left[\frac{n}{T_{1-\alpha/2}},\ \frac{n}{T_{\alpha/2}}\right] \tag{19} $$

where $T_{p}$ is the $p$-quantile of $\mathrm{Gamma}(n, 1/\lambda)$ evaluated with $\lambda$ replaced by the chosen estimator.
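A minimal Python sketch (standard library only) of the estimator distribution interval, assuming it takes the form $[n/T_{1-\alpha/2},\ n/T_{\alpha/2}]$ with $T_p$ the $p$-quantile of $\mathrm{Gamma}(n, 1/\lambda)$ evaluated at the MLE or at the bias-corrected estimator $(n-1)/\sum t_i$. The helper names are hypothetical; the $\mathrm{Gamma}(n, 1)$ quantile is found by bisection on a Poisson-identity CDF (valid for integer shape), and `coverage` is a small simulation in the spirit of the study reported later, not the authors' code.

```python
import math
import random


def gamma_cdf_int(x, n):
    """CDF of Gamma(shape=n, scale=1) at x, integer n >= 1 (Poisson identity)."""
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    tail = sum(math.exp(j * log_x - x - math.lgamma(j + 1)) for j in range(n))
    return max(0.0, 1.0 - tail)


def gamma_quantile_int(p, n, tol=1e-12):
    """p-quantile of Gamma(shape=n, scale=1), found by bisection on the CDF."""
    lo, hi = 0.0, 2.0 * n
    while gamma_cdf_int(hi, n) < p:  # widen the bracket until it holds the quantile
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimator_distribution_ci(times, alpha=0.05, corrected=False):
    """Interval [n / T_{1-alpha/2}, n / T_{alpha/2}], T ~ Gamma(n, 1/lam_est).

    lam_est is the MLE n/sum(times) or, with corrected=True, the bias-corrected
    estimator (n-1)/sum(times), which needs n >= 2.
    """
    n = len(times)
    total = sum(times)
    lam_est = (n - 1) / total if corrected else n / total
    # Quantiles of Gamma(n, scale 1/lam_est) are Gamma(n, 1) quantiles / lam_est.
    t_low = gamma_quantile_int(alpha / 2, n) / lam_est
    t_up = gamma_quantile_int(1 - alpha / 2, n) / lam_est
    return n / t_up, n / t_low


def coverage(lam, n, reps=20_000, alpha=0.05, corrected=False, seed=1):
    """Share of simulated samples whose interval contains the true lam."""
    rng = random.Random(seed)
    # The Gamma(n, 1) quantiles do not depend on the data, so compute them once.
    q_low = gamma_quantile_int(alpha / 2, n)
    q_up = gamma_quantile_int(1 - alpha / 2, n)
    hits = 0
    for _ in range(reps):
        total = sum(rng.expovariate(lam) for _ in range(n))
        lam_est = (n - 1) / total if corrected else n / total
        # Same interval as estimator_distribution_ci, with the quantiles reused.
        lower, upper = n * lam_est / q_up, n * lam_est / q_low
        hits += lower <= lam <= upper
    return hits / reps


if __name__ == "__main__":
    # Hypothetical times to failure, for illustration only.
    print(estimator_distribution_ci([1.2, 0.4, 2.3, 0.9, 0.6]))
    print(coverage(1.0, 5), coverage(1.0, 5, corrected=True))
```

For $n = 5$ and $\lambda = 1$ the coverage should come out at roughly 0.90 with the MLE plug-in and roughly 0.945 with the corrected plug-in, the same pattern as in Tables 2 and 3.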

Using the estimator as the true parameter is risky: the performance of the confidence interval depends on how close the estimator is to the true parameter. This is a disadvantage of this method.

Fisher Information Matrix Method

The variance of the estimator can be obtained from the Fisher information matrix (Lehmann and Casella, 1998). This method is general and not specific to the exponential distribution. The log-likelihood function of the exponential distribution is

$$ \ln L(\lambda) = n \ln\lambda - \lambda \sum_{i=1}^{n} t_i \tag{20} $$

The second derivative of the log-likelihood function is

$$ \frac{\partial^{2} \ln L}{\partial \lambda^{2}} = -\frac{n}{\lambda^{2}} \tag{21} $$

Thus the variance is

$$ \mathrm{Var}(\hat{\lambda}) = \left[-\frac{\partial^{2} \ln L}{\partial \lambda^{2}}\right]^{-1}_{\lambda = \hat{\lambda}} = \frac{\hat{\lambda}^{2}}{n} \tag{22} $$

According to the central limit theorem, as the sample size tends to infinity the estimator follows a normal distribution:

$$ Z = \frac{\hat{\lambda} - \lambda}{\hat{\lambda}/\sqrt{n}} \sim N(0, 1) \tag{23} $$

$Z$ follows the standard normal distribution. Let $z_{1-\alpha/2}$ be the $(1-\alpha/2)$ quantile of the standard normal distribution. The confidence interval of $\lambda$ can then be derived as

$$ \hat{\lambda} - z_{1-\alpha/2}\,\frac{\hat{\lambda}}{\sqrt{n}} \;\le\; \lambda \;\le\; \hat{\lambda} + z_{1-\alpha/2}\,\frac{\hat{\lambda}}{\sqrt{n}} \tag{24} $$

The lower bound of (24) can be negative, which is not realistic. In practice, one can use $\ln\hat{\lambda}$ instead of $\hat{\lambda}$ to estimate the interval (Meeker and Escobar, 1998). By the Delta method, the variance of $\ln\hat{\lambda}$ is

$$ \mathrm{Var}(\ln\hat{\lambda}) \approx \frac{\mathrm{Var}(\hat{\lambda})}{\hat{\lambda}^{2}} = \frac{1}{n} \tag{25} $$

Hence $\ln\hat{\lambda}$ is approximately normal with variance $1/n$, and a confidence interval of $\lambda$ can be derived as

$$ \hat{\lambda}\,\exp\!\left(-\frac{z_{1-\alpha/2}}{\sqrt{n}}\right) \;\le\; \lambda \;\le\; \hat{\lambda}\,\exp\!\left(\frac{z_{1-\alpha/2}}{\sqrt{n}}\right) \tag{26} $$

Like the estimator distribution method, this method also uses the estimator as the true parameter.

Likelihood Ratio Method

As the sample size tends to infinity, twice the logarithm of the likelihood at the estimator divided by the likelihood at the true parameter follows a $\chi^{2}$ distribution with one degree of freedom. More details can be found in Lehmann and Romano (2005). This ratio can also be used to derive a confidence interval, as some commercial software does. The likelihood ratio method is claimed to be more suitable for small sample sizes than the Fisher method. Let $L(\lambda)$ denote the likelihood function. For the exponential distribution, the interval consists of all $\lambda$ satisfying

$$ 2 \ln \frac{L(\hat{\lambda})}{L(\lambda)} \le \chi^{2}_{1,\,1-\alpha} \tag{27} $$

Since $L(\lambda) = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} t_i}$ and $L(\hat{\lambda}) = \hat{\lambda}^{n} e^{-n}$, (27) can be rewritten as

$$ \lambda^{n} e^{-\lambda \sum_{i=1}^{n} t_i} \;\ge\; \hat{\lambda}^{n} e^{-n}\, e^{-\chi^{2}_{1,1-\alpha}/2} \tag{28} $$

This is a nonlinear equation. Consider the function

$$ g(\lambda) = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} t_i} - \hat{\lambda}^{n} e^{-n}\, e^{-\chi^{2}_{1,1-\alpha}/2} \tag{29} $$

It is easy to see that (29) has only one maximum, since for the exponential distribution the likelihood function has only one maximum. This maximum is at the MLE $\hat{\lambda}$, so it is the global maximum of (29). This property facilitates finding the upper and lower bounds of $\lambda$. Figure 4 shows $g(\lambda)$ for an exponential distribution; $g(\lambda) = 0$ has two solutions. A confidence interval of $\lambda$ can be derived by taking the lower solution as the lower bound and the higher solution as the upper bound. The disadvantage of the method is that it requires a numerical method to find the roots of (28). The numerical method has difficulties when the sample size is large, because $\lambda^{n}$ and $e^{-\lambda \sum t_i}$ become extremely large or small when $n$ is large.

[Figure 4. Numerical plot of likelihood function: Likelihood against Lambda]

SIMULATION STUDY

The Fisher-based and likelihood-ratio-based intervals are the two most common confidence intervals in applications. The estimator distribution based method derived in this paper is rarely applied or discussed in the current literature. Of the three methods, the estimator distribution method relies least on the normality assumption. This simulation study examines the performance of the three methods.

The simulation uses three exponential distributions, with failure rates greater than 1, equal to 1 and less than 1 (2, 1 and 0.1 in Table 2). The sample size ranges from 5 to 500. Larger samples are not necessary, as the performance will

become similar when the data size is sufficiently large. For each sample size, a total of 50,000 iterations were run.

The first simulation tests the bias result (8) and the correction (9). When the true parameter is 1, over the 50,000 iterations for sample size 5, the mean of the adjusted estimator is 1.0051, whereas that of the unadjusted estimator is 1.2564. The adjusted estimator thus clearly outperforms the original MLE estimator. Figure 5 shows the distributions of the two estimators: the distribution of the adjusted estimator is shifted proportionally to the left. For extremely small sample sizes, the bias correction is necessary.

Figure 5. Adjusted Estimator vs the original

For the different confidence intervals, performance is defined as the percentage of the 50,000 iterations in which the interval covers the true parameter. This number of iterations is large enough; more iterations give similar results. The simulation results are shown in Table 2.

Table 2. CI Simulation Results (0.95 confidence level)

Exponential Distribution   Size   Estimator Distribution Method   Lik Ratio   Fisher
2                          5      0.8963                          *           0.9310
2                          10     0.9231                          0.9493      0.9429
2                          20     0.9360                          0.9479      0.9437
2                          50     0.9442                          0.9487      0.9446
2                          100    0.9470                          0.9495      0.9490
2                          500    0.9492                          0.9505      0.9500
1                          5      0.8989                          0.9454      0.9316
1                          10     0.9234                          0.9422      0.9504
1                          20     0.9374                          0.9500      0.9468
1                          50     0.9444                          0.9489      0.9477
1                          100    0.9470                          0.9493      0.9494
1                          500    0.9492                          *           0.9493
0.1                        5      0.9006                          0.9485      0.9339
0.1                        10     0.9235                          0.9464      0.9402
0.1                        20     0.9363                          0.9446      0.9451
0.1                        50     0.9449                          0.9458      0.9490
0.1                        100    0.9474                          0.9477      0.9494
0.1                        500    0.9493                          1           0.9494

1. * denotes that, for this method, the computation overflowed on the computer running the simulation.
2. The last digit of each number varies between simulation runs; this randomness is due to the precision limitation of the simulation software.

As shown in Table 2, for extremely small sample sizes the estimator distribution method shows the worst performance. As the sample size increases, its performance improves, and the methods perform similarly at a data size of 500. The likelihood ratio method shows the best performance: when the data size is extremely small, its coverage percentage is higher than that of the other two methods. When the data size is large, the three methods show almost equal performance. The Fisher method is very stable, although for extremely small data sizes it does not outperform the likelihood ratio method. In some cases the likelihood ratio method does not work because the computation overflows: $\lambda^{n}$ or $e^{-\lambda \sum t_i}$ is too large or too small for the numerical software to handle.

When the true parameter is used in (19), the coverage of the estimator distribution method almost reaches 1, i.e. the interval always covers the true parameter. The estimator distribution method therefore relies too heavily on how close the estimator is to the true parameter. When the sample size is large, the estimator is close to the true parameter, so the confidence interval becomes better; when the estimator deviates greatly from the true parameter, the method performs poorly.

The next simulation uses the unbiased estimator in the estimator distribution method. It is run only for small data sizes, since for large sizes the biased estimator works well. The results in Table 3 show that the confidence interval then works well: it is much better than the interval based on the biased MLE estimator, and its performance approaches that of the likelihood ratio method.

Table 3. CI using Bias-Corrected Estimator (0.95 confidence level)

Exponential Distribution   Size   Estimator Dis.   Unbiased Estimator
2                          5      0.8995           0.9451
2                          20     0.9356           0.9494
1                          5      0.8958           0.9434
1                          20     0.9356           0.9470
0.1                        5      0.9009           0.9476
0.1                        20     0.9378           0.9513

The simulation shows that, for small data sizes, using the bias-corrected estimator is necessary both for point estimation and for confidence interval derivation. In reliability and risk data analysis, when the data size is small, the MLE should be corrected to be unbiased; otherwise, there is a high risk of making a wrong decision.

CONCLUSION

The exact distribution of the estimator can help to define the optimal sample size for experiment design. However, as the simulation implies, the confidence interval derived from the estimator distribution does not outperform the Fisher information matrix method and the likelihood ratio method when the original maximum likelihood estimator is used. The confidence interval is significantly improved by using the bias-corrected estimator. The paper shows the importance of bias correction for small sample size problems. Using the biased maximum likelihood estimator carries a high risk of leading to wrong decisions in risk and reliability data analysis.

ACKNOWLEDGMENT

The authors would like to thank Professor Gilberto for reconsidering and reviewing this paper. Special thanks are owed to the lab staff of the Department of Engineering, University of Tromsø, for motivating this paper.

REFERENCES

Barlow, RE., Proschan, F. & Hunter, LC. (1996). Mathematical theory of reliability. Series: Classics in applied mathematics, vol. 17. Philadelphia: SIAM.
Cheng, CH., Chen, JY. & Bai, JM. (2013). Exact inferences of the two-parameter exponential distribution and Pareto distribution with censored data. Journal of Applied Statistics, 40(7), pp. 1464-1479.
Childs, A., Balakrishnan, N. & Chandrasekar, B. (2012). Exact distribution of the MLEs of the parameters and of the quantiles of two-parameter exponential distribution under hybrid censoring. Statistics, 46(4), pp. 441-458.
Gupta, RD. & Kundu, D. (2007). Generalized exponential distribution: Existing results and some recent developments. Journal of Statistical Planning and Inference, 137(11), pp. 3537-3547.
Kendall, MG., O'Hagan, A. & Forster, J. (2004). Kendall's advanced theory of statistics. 2nd ed. Series: Kendall's library of statistics. London: Arnold.
Lehmann, EL. & Casella, G. (1998). Theory of point estimation. 2nd ed. Series: Springer texts in statistics. New York: Springer.
Lehmann, EL. & Romano, JP. (2005). Testing statistical hypotheses. 3rd ed. Series: Springer texts in statistics. New York: Springer.
Meeker, WQ. & Escobar, LA. (1998). Statistical methods for reliability data. Series: Wiley series in probability and statistics, Applied probability and statistics section. New York: Wiley.
Rohatgi, VK. (1975). An introduction to probability theory and mathematical statistics. Series: Wiley series in probability and mathematical statistics. New York: Wiley.
Yang, G. & Sirvanci, M. (1977). Estimation of a time-truncated exponential parameter used in life testing. Journal of the American Statistical Association, 72(358), pp. 444-447. doi: 10.2307/2286816.
