Indeed, there are many statistics that can be used to estimate any population parameter; therefore we will discuss the criteria for preferring one estimator over another and how to obtain the best estimator, if it exists.
An estimator θ̂ is called an unbiased estimator for θ iff E(θ̂) = θ. If the previous condition holds only as the sample size grows large, θ̂ is said to be asymptotically unbiased.
Definition (2-3) Relatively Efficient Estimator:
Suppose θ̂₁ and θ̂₂ are two estimators for θ. If

MSE(θ̂₁) / MSE(θ̂₂) < 1

then θ̂₁ is said to be more efficient than θ̂₂, where MSE refers to the Mean Square Error. If the previous condition holds only for large sample sizes, then θ̂₁ is asymptotically more efficient than θ̂₂.
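As an illustrative simulation sketch (the normal population, sample size and number of replications are hypothetical choices, not from the text), Definition (2-3) can be checked empirically by comparing the sample mean and the sample median as estimators of a normal mean:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, reps = 5.0, 2.0, 30, 20000    # hypothetical true mean, sd, sample size, replications

samples = rng.normal(theta, sigma, size=(reps, n))
mean_est = samples.mean(axis=1)                # estimator 1: sample mean
median_est = np.median(samples, axis=1)        # estimator 2: sample median

mse = lambda est: np.mean((est - theta) ** 2)  # MSE = E[(estimator - theta)^2]
ratio = mse(mean_est) / mse(median_est)
print(f"MSE(mean)/MSE(median) = {ratio:.3f}")  # ratio < 1, so the mean is relatively more efficient
```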
A statistic θ̂ is said to be sufficient for θ iff the conditional distribution of the random sample given θ̂ is independent of θ. It is obvious that the previous definition does not show how to obtain the sufficient statistic; the following results overcome this problem.
Definition (2-6) The likelihood function is the joint density function of the whole sample, taking the following formula:

l(x₁, …, xₙ; θ) = ∏_{i=1}^{n} f(xᵢ; θ)

If the likelihood function can be factored as l(x₁, …, xₙ; θ) = g(θ̂; θ) h(x₁, …, xₙ), where h does not depend on θ, then θ̂ is a sufficient statistic for θ.
Theorem (2-2): if the density function of the random sample can be expressed as

f(x; θ) = a(θ) b(x) exp{ c(θ) d(x) }

i.e., it belongs to the exponential family, then ∑ d(xᵢ) is a sufficient statistic for θ. The sufficient statistic for θ is not unique: any one-to-one transformation of a sufficient statistic leads to another sufficient statistic.
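As a standard illustration of Theorem (2-2), the Poisson density f(x; θ) = e^(−θ) θ^x / x! can be written as a(θ) b(x) exp{ c(θ) d(x) } with a(θ) = e^(−θ), b(x) = 1/x!, c(θ) = ln θ and d(x) = x; hence ∑ xᵢ is a sufficient statistic for θ, and so is the one-to-one transformation x̄ = ∑ xᵢ / n.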
So far it is not obvious how to choose between two estimators when, for example, one is unbiased but not efficient, or not sufficient; the following theorem gives some guidance for the comparison.
Theorem (2-3) Rao-Blackwell: let θ̂₁ be a sufficient statistic for θ and θ̂₂ (not a function of θ̂₁) be an unbiased estimator for θ. If we form

L(θ̂₁) = E(θ̂₂ | θ̂₁)

then L(θ̂₁) is an unbiased estimator for θ, L(θ̂₁) is a sufficient estimator for θ, and V(L(θ̂₁)) ≤ V(θ̂₂).
Cramér-Rao proposed an inequality that gives a lower bound for the variance of an unbiased estimator and indicates whether the UMVUE (uniformly minimum variance unbiased estimator) exists, as follows. Assume θ̂ is an unbiased estimator for θ; then

V(θ̂) ≥ 1 / E[ ( d ln L(x; θ) / dθ )² ]          (1-1)
If the two sides coincide, then θ̂ is the best estimator for θ; indeed, the UMVUE for θ exists whenever the likelihood function can be expressed as follows:

d ln L(x; θ) / dθ = b(θ) (θ̂ − θ)
From inequality (1-1), we notice the following points:
1. The lower bound for the UMVUE can take another formula:

   V(θ̂) ≥ 1 / ( −E[ d² ln L(x; θ) / dθ² ] )
2. The denominator of inequality (1-1) is called the Fisher Information I(θ); it is an index of the amount of information about θ contained in the sample. Obviously, more information leads to more accuracy, i.e., less variability.
3. If θ̂ is an unbiased estimator for θ and its variance attains the Cramér-Rao bound, then f(x; θ) belongs to the exponential family.
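As a brief simulation sketch (the parameter values are hypothetical), the bound can be checked for the mean of a normal population with known variance, for which the Fisher information of the sample is n/σ² and the sample mean attains the bound (1-1):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 1.5, 50, 20000        # hypothetical parameter values

samples = rng.normal(mu, sigma, size=(reps, n))
var_xbar = samples.mean(axis=1).var()           # simulated variance of the sample mean

fisher_info = n / sigma**2                      # I(mu) for n iid N(mu, sigma^2) observations
cramer_rao_bound = 1.0 / fisher_info            # lower bound from inequality (1-1): sigma^2 / n

print(f"simulated V(theta_hat) = {var_xbar:.5f}")
print(f"Cramer-Rao lower bound = {cramer_rao_bound:.5f}")  # the two values nearly coincide
```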
2.1.2 Interval Estimation
In interval estimation, an interval that is likely to include the parameter is given. Thus, confidence intervals are used to indicate the reliability of an estimate; a shorter confidence interval implies a better estimator.
Typically, (L, U) are functions of the point estimator that was preferred for estimating θ. A confidence interval is a range of values that has a high probability of containing the estimated parameter, and it is considered good if its length is short.
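A minimal sketch of this idea (assuming a normal population with known standard deviation; all numbers are hypothetical) builds a 95% confidence interval (L, U) around the sample mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sigma, n = 2.0, 40                       # assumed known sd and hypothetical sample size
x = rng.normal(10.0, sigma, size=n)      # hypothetical sample

z = stats.norm.ppf(0.975)                # critical value for a 95% interval
half_width = z * sigma / np.sqrt(n)
L, U = x.mean() - half_width, x.mean() + half_width
print(f"95% CI for the mean: ({L:.3f}, {U:.3f}), length = {U - L:.3f}")
```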
It is difficult to trace who discovered this tool, but Daniel Bernoulli (1700-1782) was the first to report on it; see Gelder (2001). The idea of the method is that the observed sample should be given a high probability of being drawn, so we search for the parameter values that achieve this goal, that is, that maximize the likelihood of the observed sample.
The likelihood function is the joint density function of the whole random sample, taking the following formula:

l(x₁, …, xₙ; θ) = ∏_{i=1}^{n} f(xᵢ; θ)

It can be put in another form, the log-likelihood, which makes the calculations easier with no loss of information:

ln l(x₁, …, xₙ; θ) = ∑_{i=1}^{n} ln f(xᵢ; θ)
The idea of the method of maximum likelihood is to estimate θ by the value θ̂ that maximizes the likelihood function; θ̂ is obtained, in many cases, by solving the following equation:

d ln l(θ) / dθ = 0          (1-2)
The maximum likelihood method can also be used for estimating more than one unknown parameter. It can be shown that θ̂ cannot be obtained from equation (1-2) if the following conditions (often called regularity conditions) are not valid:
1. The first and second derivatives of the log-likelihood function must be defined.
2. The range of the X's must not depend on the unknown parameter.
3. The Fisher information matrix must not be zero.
According to Gelder (1995), the properties of MLEs are the following:
1. The MLE estimates are asymptotically unbiased (consistent) for the true parameter.
2. The MLE has a powerful property called invariance: if θ̂ is the MLE for θ, then g(θ̂) is the MLE for g(θ).
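A minimal sketch of the method (the data are simulated and the sample size is hypothetical) estimates the rate λ of an exponential distribution: equation (1-2) gives n/λ − ∑xᵢ = 0, i.e. λ̂ = n/∑xᵢ, which is checked against a numerical maximization; by the invariance property, 1/λ̂ is then the MLE of the mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
lam_true, n = 0.8, 200                        # hypothetical true rate and sample size
x = rng.exponential(1.0 / lam_true, size=n)   # numpy parameterizes by the scale 1/lambda

def neg_log_likelihood(lam):
    # ln l(lambda) = n ln(lambda) - lambda * sum(x)
    return -(n * np.log(lam) - lam * x.sum())

closed_form = n / x.sum()                     # solution of equation (1-2)
numeric = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10), method="bounded").x
print(f"closed-form MLE = {closed_form:.4f}, numerical MLE = {numeric:.4f}")
print(f"MLE of the mean 1/lambda (by invariance) = {1 / closed_form:.4f}")
```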
The principle of maximum entropy (POME) approach is a flexible and powerful tool for multiple purposes, for instance estimating a distribution's parameters, testing the goodness of fit of a null hypothesis, or deriving a probability distribution subject to given conditions. To understand POME, the statistical entropy of information theory should first be discussed. For a discrete random variable taking values x₁, …, xₙ with probabilities p(xᵢ), the entropy is

H(X) = − ∑_{i=1}^{n} p(xᵢ) ln p(xᵢ)

Among its properties:
1. The quantity H(X) reaches a minimum, equal to zero, when one of the
events is a certainty.
2. If some events have zero probability, they can just as well be left out of
the entropy when we evaluate the uncertainty.
The concept of POME is to enlarge the parameter space and maximize the entropy subject to the constraints, treating the parameters and the Lagrange multipliers together; an important consequence of enlarging the parameter space is that the method is applicable to any distribution with any number of parameters. Formally, the entropy H(X) is maximized
Subject to
∑_{i=1}^{n} p(xᵢ) = 1

∑_{i=1}^{n} g_j(xᵢ) p(xᵢ) = c_j ,   j = 1, …, m
Introducing Lagrange multipliers λ and µ_j and setting the derivative of the Lagrangian with respect to each p(xᵢ) to zero gives

∂L/∂p(xᵢ) = − ln p(xᵢ) − λ − ∑_{j=1}^{m} µ_j g_j(xᵢ) = 0

so that

p′(xᵢ) = exp( −λ − ∑_{j=1}^{m} µ_j g_j(xᵢ) )
Hence it is required to know the relation between the Lagrange multipliers and the parameters of the data's distribution, and then to substitute p′(xᵢ) into the entropy function.
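As a minimal numerical sketch of this procedure (the single constraint g(x) = x and its value are hypothetical), the entropy of a distribution on the outcomes 1, …, 6 is maximized subject to ∑ p(xᵢ) = 1 and a fixed mean, by solving for the Lagrange multiplier µ in p′(xᵢ) ∝ exp(−µ xᵢ):

```python
import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)          # support of the distribution
c = 4.5                      # hypothetical constraint value: required mean sum(x_i * p(x_i))

def mean_given_mu(mu):
    p = np.exp(-mu * x)
    p /= p.sum()             # normalization absorbs the multiplier lambda
    return (x * p).sum()

# find mu so that the mean constraint is satisfied
mu = brentq(lambda m: mean_given_mu(m) - c, -10, 10)
p = np.exp(-mu * x)
p /= p.sum()
entropy = -(p * np.log(p)).sum()
print(f"mu = {mu:.4f}, POME probabilities = {np.round(p, 4)}, entropy = {entropy:.4f}")
```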
2.3 Hypotheses Testing
A statistical hypothesis test is a method of making statistical decisions using sample data, and it is considered a key technique of statistical inference. The aim of hypothesis testing is to determine whether the information in the sample leads us to accept or reject the doubtful hypothesis, called the null hypothesis H₀. In fact, there are two types of hypotheses:
1. Parametric hypothesis: concerned with one or more constraints imposed upon the parameters of a certain distribution.
2. Non-parametric hypothesis: a statement about the form of the cumulative distribution function or probability function of the distribution from which the sample is drawn.
Hypotheses can be classified as follows:
1. Simple hypothesis: the statistical hypothesis specifies the probability distribution completely, e.g.

   H₀ : f(x) = f₀(x)

2. Composite hypothesis: the statistical hypothesis does not specify the probability distribution completely.
Any decision based on a sample is subject to two types of error:
1. Type I Error α: this error is made when we reject H₀ although it is true.
2. Type II Error β: this error is made when we accept H₀ although it is false; the complement of this error, 1 − β, is called the Power of the Test.
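As a small numerical sketch of the two errors just defined (all values hypothetical, assuming a one-sided test on a normal mean with known variance): fixing α determines the critical value, and the Type II error β, and hence the power, then follows from the alternative mean:

```python
import numpy as np
from scipy import stats

mu0, mu1, sigma, n = 0.0, 0.5, 1.0, 25     # hypothetical null mean, alternative mean, sd, sample size
alpha = 0.05                               # fixed Type I error

se = sigma / np.sqrt(n)
critical = mu0 + stats.norm.ppf(1 - alpha) * se     # reject H0 if the sample mean exceeds this
beta = stats.norm.cdf(critical, loc=mu1, scale=se)  # Type II error under the alternative
print(f"critical value = {critical:.3f}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```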
Hence, we need a statistical test that keeps the errors of the decision as small as possible. Unfortunately, for a fixed sample size, if one of the errors is minimized the other is maximized, so there is a negative relation between the two errors.
To overcome this problem we can fix the more serious error, the Type I Error, and search for the statistical test that has the minimum Type II Error, i.e., the Most Powerful Test.
Theorem (2-5): In the case of a simple hypothesis versus a simple alternative hypothesis, the most powerful test among all tests of size α or less takes the following form:

Reject H₀ if λ < k ; accept H₀ if λ > k

where λ = ∏_{i=1}^{n} f₀(xᵢ) ÷ ∏_{i=1}^{n} f₁(xᵢ) and k is a positive constant.
The idea is that we calculate the ratio between the likelihood function under H₀ and under H₁. When the hypotheses are not both simple, the same idea leads to the ratio

Λ = ∏_{i=1}^{n} f₀(xᵢ) ÷ ∏_{i=1}^{n} f_Ω(xᵢ)

where c is a positive constant such that H₀ is rejected if Λ < c, and f_Ω(xᵢ) denotes the likelihood over the whole parameter space of θ. This ratio is typically called the Generalized Likelihood Ratio.
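A minimal sketch of the generalized likelihood ratio (simulated exponential data and a hypothetical null value λ₀; the χ² approximation for −2 ln Λ described next supplies the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam0, n = 1.0, 100                       # hypothetical null value and sample size
x = rng.exponential(1.0 / 1.3, size=n)   # data actually generated with lambda = 1.3

def log_lik(lam):
    return n * np.log(lam) - lam * x.sum()

lam_hat = n / x.sum()                            # MLE over the whole parameter space
log_Lambda = log_lik(lam0) - log_lik(lam_hat)    # ln(Lambda) = ln L(H0) - ln L(Omega)
stat = -2 * log_Lambda
p_value = stats.chi2.sf(stat, df=1)              # one restricted parameter under H0
print(f"-2 ln(Lambda) = {stat:.3f}, p-value = {p_value:.4f}")
```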
For any particular null and alternative hypothesis, (−2 ln Λ) has approximately a χ² distribution with degrees of freedom equal to the number of parameters tested in the null hypothesis.
Goodness-of-fit tests can be classified into two types: the first is based on the differences between the observed frequencies and the expected frequencies; the second is based on the discrepancy between the Empirical Distribution Function (EDF) and the given distribution function. Typically, applied research needs to test hypotheses related to the population's distribution; the test concerned with the agreement between the distribution of the sample values and the hypothesized distribution is called a "test of goodness of fit". The hypothesis related to the goodness of fit is

H₀ : F(x) = F₀(x)   versus   H₁ : F(x) ≠ F₀(x)
In 1900, Pearson was among the first to test whether a drawn sample follows a specific distribution. He proposed squaring the differences between the observed and the expected frequencies to overcome the cancellation between positive and negative differences; the test statistic takes the following formula:

χ²_{r−1} = ∑_{i=1}^{r} (Oᵢ − Eᵢ)² / Eᵢ

where (O, E) refer to the observed and expected frequencies respectively.
The exact distribution of χ²_{r−1} has a complicated form, so it has been proved that it converges to a χ² distribution with (r − 1) degrees of freedom as the sample size increases. When k parameters are estimated from the sample, the expected frequencies Êᵢ are computed from the estimates and the statistic becomes

χ²_{r−k} = ∑_{i=1}^{r} (Oᵢ − Êᵢ)² / Êᵢ
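An illustrative sketch of Pearson's statistic (the observed counts and the uniform model over r = 6 classes are hypothetical); scipy's chisquare implements the same formula, and its ddof argument accounts for k estimated parameters:

```python
import numpy as np
from scipy import stats

# hypothetical observed counts in r = 6 classes and expected counts under the hypothesized model
observed = np.array([18, 22, 16, 14, 17, 13])
expected = np.full(6, observed.sum() / 6)              # e.g. a uniform model over the classes

stat = ((observed - expected) ** 2 / expected).sum()   # Pearson's statistic
p_value = stats.chi2.sf(stat, df=len(observed) - 1)    # r - 1 degrees of freedom (no estimated parameters)
print(f"chi2 = {stat:.3f}, p-value = {p_value:.4f}")

# same result via scipy; ddof would be set to k if k parameters were estimated from the sample
print(stats.chisquare(observed, expected))
```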
Although applied research in the literature tends to prefer Pearson's test, Steele (2002) presents a subset of alternative Chi-Square type tests in Table (2-1). Some remarks on the Chi-Square test are the following:
1. There is considerable debate about the number of classes that must be taken into consideration to guarantee the limiting Chi-Square distribution; as reviewed in Mann et al. (1942), the following equation is suggested:

   m = 4 (2n² ÷ Z_α)^{1/5}
EDF-based tests fall into supremum tests (the Kolmogorov-Smirnov D, Kuiper's V) and quadratic tests (the Anderson-Darling and Cramér-von Mises tests); see Thode (2002).
According to Thode (2002), the empirical distribution function is a step function calculated from the sample and is commonly used to estimate the population's cumulative distribution function; it has the following formula:

Fₙ(x) = (number of observations ≤ x) / n

where Fₙ(x) is the empirical distribution function for a sample of size n and F(x) is the population's cumulative distribution function.
It is concluded that, with probability one, the convergence of Fₙ(x) to F(x) is uniform for all values of x. According to Mood (1978), the EDF has a limiting Normal distribution with mean F(x) and variance F(x)(1 − F(x))/n. Hence D = (Fₙ(x) − F(x)) has a Normal distribution with zero mean and variance F(x)(1 − F(x))/n, and Kolmogorov (1933) proposed his test, which takes the following formula:

D = sup_x | Fₙ(x) − F(x) |
The distribution of this statistic has been tabulated by simulation for various sample sizes and different levels of significance; also, Gibbons (1971) gives its limiting distribution for various levels of significance. Comparing the K-S test with the Chi-Square test, the following points can be noted:
1. The K-S test does not require observations to be grouped, as in the case of the Chi-Square test, so the K-S test uses all the information in the sample.
2. The K-S test can be used for any sample size, in contrast to the Chi-Square test, because the exact distribution of the K-S statistic is well known.
3. The K-S test is not applicable when parameters have to be estimated from the sample, in contrast to the Chi-Square test, which can be used in this case with the reduction of the degrees of freedom mentioned above.
4. The K-S test is used for continuous distributions, but if it is applied to discrete distributions it becomes a conservative test; see Abdel Assis (2005).
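A short sketch of the K-S procedure (simulated data; the hypothesized N(0, 1) is fully specified, in line with point 3 above): the statistic D = sup|Fₙ(x) − F(x)| is computed both directly from the EDF and with scipy's kstest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=60)        # hypothetical sample, tested against a fully specified N(0, 1)

xs = np.sort(x)
F = stats.norm.cdf(xs)                   # hypothesized CDF at the ordered sample
n = len(xs)
# D = sup |Fn(x) - F(x)|, checking the EDF just before and just after each jump
D = np.max(np.maximum(np.arange(1, n + 1) / n - F, F - np.arange(0, n) / n))

result = stats.kstest(x, "norm", args=(0.0, 1.0))
print(f"manual D = {D:.4f}, scipy D = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```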
In this section, some famous distributions which will be used in this thesis are briefly presented.
2.5.1 Normal Distribution
f(x) = (1 / (σ√(2π))) exp( −(x − µ)² / (2σ²) ) ,   −∞ < x < ∞
Table (2-4) summarizes the most important characteristics of the normal distribution.

Mean: µ
Median: µ
Mode: µ
Variance: σ²
Skewness: 0
Excess kurtosis: 0
Entropy: ln(σ√(2πe))
Moment-generating function (mgf): M_X(t) = exp(µt + σ²t²/2)
2.5.2 Uniform Distribution
f(x) = 1/(b − a) ,   a < x < b
f(x) = 0 ,   otherwise
Table (2-5) summarizes the most important characteristics of the uniform distribution.

Mean: (a + b)/2
Median: (a + b)/2
Mode: any value in (a, b)
Variance: (b − a)²/12
Skewness: 0
Excess kurtosis: −6/5
Entropy: ln(b − a)
Moment-generating function (mgf): (e^{tb} − e^{ta}) / (t(b − a))
2.5.3 Exponential Distribution
In probability theory and statistics, the exponential distributions are a class of continuous probability distributions. They describe the times between events in a Poisson process; indeed, the exponential distribution is a special case of the Gamma distribution, and it is widely applied in lifetime models, biology, mechanics, and hydrology. Its density function is

f(x) = λ e^{−λx} ,   x ≥ 0 , λ > 0
Table (2-6) summarizes the most important characteristics of the exponential distribution.

Mean: 1/λ
Median: ln(2)/λ
Mode: 0
Variance: 1/λ²
Skewness: 2
Excess kurtosis: 6
Entropy: 1 − ln λ
Moment-generating function (mgf): (1 − t/λ)^{−1}