
688 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 35, NO. 3, MAY 1989

Entropy Expressions and Their Estimators
for Multivariate Distributions

NABIL ALI AHMED AND D. V. GOKHALE, MEMBER, IEEE

Abstract - Entropy expressions for several continuous multivariate distributions are derived. Point estimation of entropy for the multinormal distribution and for the distribution of order statistics from Weinman's exponential distribution is considered. The asymptotic distribution of the uniformly minimum variance unbiased estimator for multinormal entropy is obtained. Simulation results on convergence of the means and variances of these estimators are provided.

INTRODUCTION

The entropy of a random variable X having density f(x) absolutely continuous with respect to the Lebesgue measure on R^p is given by

    H(f) = -∫_{R^p} f(x) ln f(x) dx,

where x = (x_1, x_2, ..., x_p)'.

In recent years the concept of entropy has been used increasingly in inferential statistics. Vasicek [12] constructed an entropy estimate and a test of goodness of fit of univariate normality based on that estimate. Dudewicz and Van der Meulen [3] proposed a goodness-of-fit test of the uniform distribution based on maximum entropy criteria. Gokhale [4] proposed a general form of a goodness-of-fit test statistic for families of maximum entropy distributions; particular cases of such families are the normal, exponential, double-exponential, gamma, and beta distributions.

Lazo and Rathie [7] derived and tabulated entropies for various univariate continuous probability distributions. This correspondence extends their table and results to the entropy of several families of multivariate distributions (Section I). Section II deals with point estimation for the multivariate normal and for the distribution of the order statistics from the multivariate exponential distribution proposed by Weinman [13]. Two types of estimators are considered: one, denoted by H_c(f), is a consistent estimator, and the second, denoted by H_*(f), is the uniformly minimum variance unbiased estimator. Section III deals with the asymptotic distribution of the entropy estimators H_*(f) and H_c(f) for the multivariate normal distribution. To compare their performances, means and variances obtained by Monte Carlo simulation are tabulated.

I. PARAMETRIC ENTROPY

The expressions of entropy derived in this section often involve the digamma and trigamma functions, ψ(a) = (d/da) ln Γ(a) and ψ'(a) = (d/da) ψ(a), respectively (Magnus, [8]), and Euler's constant γ = 0.5772156649.

The density function of the multivariate normal distribution is given by

    f(x) = (2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2)(x - μ)' Σ^{-1} (x - μ)},    x ∈ R^p, μ ∈ R^p,

with Σ positive definite. The entropy, denoted by H(NORM), is given by

    H(NORM) = p/2 + (p/2) ln(2π) + (1/2) ln|Σ|.    (1.1)

Let H(LNORM) be the entropy of the multivariate lognormal distribution given by

    f(x) = (2π)^{-p/2} |Σ|^{-1/2} (x_1 x_2 ... x_p)^{-1} exp{-(1/2)(ln x - μ)' Σ^{-1} (ln x - μ)},    x_i > 0,

where ln x = (ln x_1, ..., ln x_p)' and x = (x_1, ..., x_p)'. Then

    H(LNORM) = H(NORM) + Σ_{i=1}^p μ_i.    (1.2)

Let H(LOGST) denote the entropy of the multivariate logistic distribution given by

    f(x) = p! [Π_{i=1}^p (1/β_i)] exp{-Σ_{i=1}^p (x_i - μ_i)/β_i} [1 + Σ_{i=1}^p exp{-(x_i - μ_i)/β_i}]^{-(p+1)},

for -∞ < x_i < +∞, β_i > 0, -∞ < μ_i < +∞. Then

    H(LOGST) = Σ_{i=1}^p ln(β_i) - ln(p!) + (p+1) A(p),    (1.3)

where A(p) depends only on p, with A(1) = 1; for p = 1, (1.3) reduces to the univariate logistic entropy ln(β) + 2.

Let H(PARETO) denote the entropy of the multivariate Pareto distribution of Type II (Arnold, [2]) given by

    f(x) = [Π_{i=1}^p (a+i-1)/θ_i] [1 + Σ_{i=1}^p (x_i - μ_i)/θ_i]^{-(a+p)},    for x_i > μ_i, i = 1, 2, ..., p.

Then

    H(PARETO) = Σ_{i=1}^p ln(θ_i) - Σ_{i=1}^p ln(a+i-1) + (a+p) Σ_{i=1}^p (a+i-1)^{-1}.

Manuscript received July 3, 1987; revised July 22, 1988. This correspondence was presented in part at the International Symposium on Information and Coding Theory, University of Campinas, Campinas, Brazil, July 1987.
N. Ali Ahmed is with Bell Communications Research, 3 Corporate Place, Piscataway, NJ 08854, USA.
D. V. Gokhale is with the Department of Statistics, University of California, Riverside, CA 92521, USA.
IEEE Log Number 8927888.

0018-9448/89/0500-0688$01.00 © 1989 IEEE
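As a numerical sanity check on the closed-form expression (1.1), the following sketch (ours, not the authors'; it assumes NumPy is available, and the covariance matrix is an arbitrary example) computes H(NORM) and compares it with a Monte Carlo estimate of -E[ln f(X)]:

```python
import numpy as np

# Closed-form entropy (1.1) of a p-variate normal, in nats.
def normal_entropy(sigma):
    p = sigma.shape[0]
    return 0.5 * p + 0.5 * p * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(sigma)[1]

rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
h = normal_entropy(sigma)

# Monte Carlo cross-check: H = -E[ln f(X)] under f itself.
x = rng.multivariate_normal(np.zeros(2), sigma, size=200_000)
inv = np.linalg.inv(sigma)
quad = np.einsum('ij,jk,ik->i', x, inv, x)            # x' Sigma^{-1} x per row
logf = -0.5 * quad - 0.5 * np.log((2 * np.pi) ** 2 * np.linalg.det(sigma))
print(h, -logf.mean())    # the two values agree closely
```

For p = 1 the function reduces to the familiar (1/2) ln(2πeσ²).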

Authorized licensed use limited to: UNIVERSITY OF SUSSEX. Downloaded on October 13, 2008 at 11:45 from IEEE Xplore. Restrictions apply.

Let H(MEXP) denote the entropy of the multivariate exponential probability density (Arnold, [2]) given by

    f(x) = [Π_{i=1}^p (a+i-1)] (e^{x_1} + ... + e^{x_p} - p + 1)^{-(a+p)} Π_{i=1}^p e^{x_i},    for x_i > 0, i = 1, 2, ..., p.

Then

    H(MEXP) = -Σ_{i=1}^p ln(a+i-1) + (a+p) Σ_{i=1}^p (a+i-1)^{-1} - p/a.

For the multivariate exponential distribution (Weinman, [13]), the joint density function of X_1, X_2, ..., X_p over all the p! regions of the sample space is given by

    f(x) = C exp{-[(p/θ_0) x_{(1)} + ((p-1)/θ_1)(x_{(2)} - x_{(1)}) + ... + (1/θ_{p-1})(x_{(p)} - x_{(p-1)})]},    (1.6)

where x_{(1)} ≤ x_{(2)} ≤ ... ≤ x_{(p)} are the ordered coordinates of x and

    C = 1/(θ_0 θ_1 ... θ_{p-1}).

The entropy H(EXPW) for the foregoing probability density function is

    H(EXPW) = p + Σ_{i=0}^{p-1} ln(θ_i).

Let X_1, ..., X_p be the i.i.d. failure times from the multivariate exponential distribution due to Weinman [13] and Y_1, ..., Y_p be the corresponding order statistics. The joint density of the order statistics of the multivariate exponential distribution (Weinman, [13]) consists of the joint density g(y_1, y_2, ..., y_p) of the random variables {Y_i}. Using the fact that (Y_i - Y_{i-1}), i = 1, ..., p (with Y_0 = 0), has an exponential distribution with scale parameter θ_{i-1}/(p+1-i), the entropy H(OMEXP) of g(y_1, y_2, ..., y_p) is given by

    H(OMEXP) = Σ_{i=1}^p [1 + ln(θ_{i-1}/(p+1-i))] = H(EXPW) - ln(p!),    (1.8)

where 1 + ln(θ) is the entropy of the univariate exponential distribution with scale parameter θ.

II. PARAMETRIC ENTROPY ESTIMATORS

In this section we consider two point estimators for the parametric entropy; the first is a consistent estimator denoted by H_c(f), and the second is the uniformly minimum variance unbiased estimator (UMVUE) denoted by H_*(f).

Consistent estimators for the parametric entropy of all the multivariate distributions considered in the previous section can be formed by replacing the parameters with their consistent estimators. Consistent estimators for the parameters of the multivariate normal, lognormal, logistic, and Pareto distributions are well-known (see, for example, Rao [11], Press [10], Johnson and Kotz [6], and Arnold [2] for the respective families of distributions), and those for the distributions studied by Weinman [13] are given in his reference. Hence, for the sake of brevity, we will not pursue this topic here in more detail.

We first consider the UMVUE for H(NORM).

Definition 1: Let V: (p × p) be symmetric and positive definite. The random matrix V is said to follow the nonsingular p-dimensional Wishart distribution with scale matrix Σ and n degrees of freedom, n ≥ p, if the joint distribution of the distinct elements of V is continuous with the density function of V given by

    p(V) = c |Σ|^{-n/2} |V|^{(n-p-1)/2} exp{-(1/2) tr(Σ^{-1} V)},    Σ, V positive definite,

and p(V) = 0 otherwise, where c is a numerical constant defined as

    c^{-1} = 2^{np/2} π^{p(p-1)/4} Π_{i=1}^p Γ((n+1-i)/2).


Theorem 1: For X_i: p × 1, let X_1, X_2, ..., X_n be mutually independent; let X_i be normally distributed with parameters 0 and Σ, Σ > 0, and let X = (X_1, X_2, ..., X_n): p × n, n ≥ p. Define V = XX'; then V > 0 and V will have a Wishart distribution with parameters Σ, p, and n.

Proof: See Press [10].

Lemma 1 (Wijsman, [14]): The distribution of |V|/|Σ| is the distribution of the product of p independent chi-squared variables with n, n-1, ..., n-p+1 degrees of freedom, respectively.

Theorem 2: The uniformly minimum variance unbiased estimator for the parametric entropy of the multivariate normal distribution is

    H_*(NORM) = p/2 + (p/2) ln(2π) + (1/2) ln|V| - (1/2) Σ_{i=1}^p ψ((n+1-i)/2) - (p/2) ln(2).

Proof: From Wijsman's lemma we have

    |V|/|Σ| = Π_{i=1}^p χ²_{(n+1-i)}.

Taking logarithms, we get

    ln|V| = ln|Σ| + Σ_{i=1}^p ln[χ²_{(n+1-i)}].

The mean and the variance of ln[χ²_{(n+1-i)}] are given by

    E{ln[χ²_{(n+1-i)}]} = ψ((n+1-i)/2) + ln(2),
    var{ln[χ²_{(n+1-i)}]} = ψ'((n+1-i)/2),

where ψ(·) and ψ'(·) are the digamma and trigamma functions, respectively (see Johnson and Kotz, [6]). The expected value of ln|V| and its variance can be found as

    E{ln|V|} = ln|Σ| + Σ_{i=1}^p [ψ((n+1-i)/2) + ln(2)],
    var{ln|V|} = Σ_{i=1}^p ψ'((n+1-i)/2).

Then it is easy to show that

    E{H_*(NORM)} = H(NORM)

and

    var{H_*(NORM)} = (1/4) Σ_{i=1}^p ψ'((n+1-i)/2).    (2.6)

Since the Wishart distribution is a member of the exponential family, S is a complete and sufficient statistic for Σ. By the Lehmann-Scheffe theorem (Graybill, [5]), H_*(NORM) is the UMVUE for H(NORM).

We can express H_*(NORM) in terms of H_c(NORM), the plug-in estimator obtained by evaluating (1.1) at S, as follows:

    H_*(NORM) = H_c(NORM) + B_n.    (2.7)

The following lemma shows that the limit of the bias term in H_c(NORM) is equal to zero. Note that from (2.7) the bias B_n equals

    B_n = (p/2) ln(n-1) - (1/2) Σ_{i=1}^p [ψ((n+1-i)/2) + ln(2)].    (2.8)

Lemma 2: lim_{n→∞} B_n = 0.

Proof: We have

    lim_{n→∞} B_n = lim_{n→∞} {(p/2) ln(n-1) - (1/2) Σ_{i=1}^p [ψ((n+1-i)/2) + ln(2)]}.

The inequality e^x - 1 ≥ x for x ≥ 0 will be needed to show that the limit of ψ((n+1-i)/2) is the limit of {ln(n+1-i) - ln(2)} for i = 1, 2, ..., p. Consider the equality

    ψ(x) = ln(x) - 1/(2x) - 2 ∫_0^∞ t [(t² + x²)(e^{2πt} - 1)]^{-1} dt.    (2.9)

Let y = 2πt; then e^y - 1 ≥ y, so the integrand in the last term of (2.9) is bounded by (1/(2π))(t² + x²)^{-1}. The term in the bracket is proportional to a Cauchy density with location 0 and scale x, so the last term of (2.9) is less than or equal to 1/(2x). With x = (n+1-i)/2 this yields

    ln((n+1-i)/2) - 2/(n+1-i) ≤ ψ((n+1-i)/2) ≤ ln((n+1-i)/2),

and hence

    lim_{n→∞} ψ((n+1-i)/2) = lim_{n→∞} {ln(n+1-i) - ln(2)}.

Substituting in (2.8),

    lim_{n→∞} B_n = lim_{n→∞} {(p/2) ln(n-1) - (1/2) Σ_{i=1}^p ln(n+1-i)} = 0.

Therefore, the limit of (2.8) is equal to zero.
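The unbiasedness argument of Theorem 2 is easy to check by simulation. The sketch below (ours; it assumes NumPy and SciPy, samples with a known zero mean as in Theorem 1, and uses an arbitrary covariance matrix) computes the digamma-corrected estimator and verifies that its Monte Carlo mean is close to H(NORM):

```python
import numpy as np
from scipy.special import digamma

# UMVUE of multivariate normal entropy: digamma-corrected log-determinant
# of V = XX', with a known zero mean as in Theorem 1.
def h_star_norm(sample):
    n, p = sample.shape
    v = sample.T @ sample                              # V ~ Wishart(Sigma, n)
    i = np.arange(1, p + 1)
    return (0.5 * p + 0.5 * p * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(v)[1]
            - 0.5 * digamma((n + 1 - i) / 2).sum() - 0.5 * p * np.log(2))

rng = np.random.default_rng(2)
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
true_h = 1.0 + np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(sigma)[1]
reps = [h_star_norm(rng.multivariate_normal(np.zeros(2), sigma, size=50))
        for _ in range(2000)]
print(true_h, np.mean(reps))   # Monte Carlo mean is close to H(NORM)
```

The correction term is deterministic, so the estimator's variance is exactly that of (1/2) ln|V|, in agreement with (2.6).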


Lemma 3: var{H_*(NORM)}, given by (2.6), converges to zero as n → ∞.

Proof: From (2.6) we have to show that lim_{n→∞} ψ'(n) = 0. Consider the equality

    ψ'(n) = Σ_{k=0}^∞ (n+k)^{-2} ≤ ∫_{n-1}^∞ t^{-2} dt = (n-1)^{-1}.

Thus lim_{n→∞} ψ'(n) = 0.

Theorem 3: Both H_c(NORM) and H_*(NORM) converge in probability to H(NORM) as n tends to infinity.

Proof: Since S is a consistent estimator of Σ and the log function is continuous, the statement of the theorem follows.

Now we consider the UMVUE for H(OMEXP). Let X_1, ..., X_n be a random sample from the multivariate exponential distribution given by (1.6), and let Y_1, ..., Y_n be the corresponding random sample of order statistics.

Theorem 4: The uniformly minimum variance unbiased estimator of the parametric entropy of the joint density of the order statistics from the multivariate exponential distribution (1.6) (Weinman, [13]) is

    H_*(OMEXP) = Σ_{i=1}^p ln(n D̄_i) + p - p ψ(n) - ln(p!),

where D̄_i is the sample mean of the n observed spacings Y_i - Y_{i-1}, with

    var{H_*(OMEXP)} = p ψ'(n).

Proof: Equation (1.8) for H(OMEXP) involves the sum of the parametric entropies of the univariate exponential distributions of the spacings, with scale parameters θ_{i-1}/(p+1-i), i = 1, ..., p. The UMVUE for the entropy of a univariate exponential distribution with scale parameter θ, based on a sample of size n with mean θ̂, is ln(n θ̂) + 1 - ψ(n), with variance ψ'(n). The result follows.

III. ASYMPTOTIC DISTRIBUTION OF H_*(NORM)

Note that from (2.1), (2.2), and the fact that ln|S| = ln|V| - p ln(n-1), the random variable H_*(NORM) is distributed as

    H_*(NORM) ~ H(NORM) + (1/2) Σ_{i=1}^p {ln[χ²_{(n+1-i)}] - ψ((n+1-i)/2) - ln(2)}.

The distribution of ln[χ²_{(n+1-i)}] is well approximated by the normal distribution with appropriate mean and variance (Olshen, [9]). Hence H_*(NORM) is also well approximated by the normal distribution with mean H(NORM), given by (1.1), and the variance (2.6).

To study the convergence properties of H_c(NORM) and H_*(NORM), we generated 1000 samples of different sizes from bivariate normal distributions with different covariance matrices Σ. For each sample the estimates H_c(NORM) and H_*(NORM) were calculated. Their means and variances for different sample sizes are given in Tables I-IV. From the tables it is seen that H_*(NORM) converges slightly faster to the true value than H_c(NORM). The simulated variance of H_*(NORM) is slightly smaller than that of H_c(NORM), as expected. This convergence behavior does not seem to be affected much by the increasing correlation between the two variables.

TABLE I
THEORETICAL (t) AND SIMULATED (s) MEANS AND VARIANCES OF H_c(NORM) AND H_*(NORM)*

    Size   Type      Mean      Variance
    10     H_c (t)   2.78132   0.1175100
    10     H_* (t)   2.83790   0.1175100
    10     H_c (s)   2.53974   0.1375068
    10     H_* (s)   2.59632   0.1375051
    30     H_c (t)   2.82056   0.0350800
    30     H_* (t)   2.83790   0.0350800
    30     H_c (s)   2.73749   0.0469360
    30     H_* (s)   2.75847   0.0469300
    50     H_c (t)   2.82766   0.0206200
    50     H_* (t)   2.83790   0.0206200
    50     H_c (s)   2.79402   0.0207231
    50     H_* (s)   2.80426   0.0207220
    100    H_c (t)   2.83284   0.0101500
    100    H_* (t)   2.83790   0.0101500
    100    H_c (s)   2.80969   0.0101093
    100    H_* (s)   2.81475   0.0101048

    * Σ = [[1, 0], [0, 1]]; theoretical entropy: 2.8379.

TABLE II
THEORETICAL (t) AND SIMULATED (s) MEANS AND VARIANCES OF H_c(NORM) AND H_*(NORM)*

    Size   Type      Mean      Variance
    10     H_c (t)   2.73412   0.1175100
    10     H_* (t)   2.79070   0.1175100
    10     H_c (s)   2.50423   0.1316368
    10     H_* (s)   2.56082   0.1316243
    30     H_c (t)   2.77336   0.0350800
    30     H_* (t)   2.79070   0.0350800
    30     H_c (s)   2.70396   0.0331871
    30     H_* (s)   2.72130   0.0331804
    50     H_c (t)   2.78046   0.0206200
    50     H_* (t)   2.79070   0.0206200
    50     H_c (s)   2.73135   0.0241806
    50     H_* (s)   2.74159   0.0230754
    100    H_c (t)   2.78564   0.0101500
    100    H_* (t)   2.79070   0.0101500
    100    H_c (s)   2.76253   0.0112372
    100    H_* (s)   2.76760   0.0109676

    * Σ = [[1, 0.3], [0.3, 1]]; theoretical entropy: 2.7907.

TABLE III
THEORETICAL (t) AND SIMULATED (s) MEANS AND VARIANCES OF H_c(NORM) AND H_*(NORM)*

    Size   Type      Mean      Variance
    10     H_c (t)   2.55852   0.1175100
    10     H_* (t)   2.61510   0.1175100
    10     H_c (s)   2.32824   0.1316455
    10     H_* (s)   2.38483   0.1316268
    30     H_c (t)   2.59776   0.0350800
    30     H_* (t)   2.61510   0.0350800
    30     H_c (s)   2.52798   0.0331916
    30     H_* (s)   2.54532   0.0331788
    50     H_c (t)   2.60486   0.0206200
    50     H_* (t)   2.61510   0.0206200
    50     H_c (s)   2.55536   0.0206548
    50     H_* (s)   2.56500   0.0206537
    100    H_c (t)   2.61004   0.0101500
    100    H_* (t)   2.61510   0.0101500
    100    H_c (s)   2.58655   0.0104487
    100    H_* (s)   2.59161   0.0104384

    * Σ = [[1, 0.6], [0.6, 1]]; theoretical entropy: 2.6151.
TABLE IV
THEORETICAL (t) AND SIMULATED (s) MEANS AND VARIANCES OF H_c(NORM) AND H_*(NORM)*

    Size   Type      Mean      Variance
    10     H_c (t)   1.95132   0.1175100
    10     H_* (t)   2.00790   0.1175100
    10     H_c (s)   1.72102   0.1316378
    10     H_* (s)   1.77761   0.1316303
    30     H_c (t)   1.99056   0.0350800
    30     H_* (t)   2.00790   0.0350800
    30     H_c (s)   1.92075   0.0331787
    30     H_* (s)   1.93810   0.0331827
    50     H_c (t)   1.99766   0.0206200
    50     H_* (t)   2.00790   0.0206200
    50     H_c (s)   1.94814   0.0206687
    50     H_* (s)   1.95838   0.0206672
    100    H_c (t)   2.00284   0.0101500
    100    H_* (t)   2.00790   0.0101500
    100    H_c (s)   1.97933   0.0104350
    100    H_* (s)   1.98438   0.0104406

    * Σ = [[1, 0.9], [0.9, 1]]; theoretical entropy: 2.0079.

REFERENCES

[1] N. A. Ahmed, "Goodness-of-fit tests for testing multivariate families of distributions," Ph.D. dissertation, University of California, Riverside, 1987.
[2] B. C. Arnold, Pareto Distributions. Burtonsville, MD: International Co-operative Publishing House, 1985.
[3] E. J. Dudewicz and E. C. Van der Meulen, "Entropy based statistical inference, I: Testing hypotheses on continuous probability densities, with reference to uniformity," Mededelingen Uit Het Wiskundig Instituut, Katholieke Universiteit, Leuven, Belgium, 120, 1979.
[4] D. V. Gokhale, "On entropy-based goodness-of-fit tests," Comput. Statist. Data Anal., vol. 1, pp. 157-165, 1983.
[5] F. A. Graybill, A. M. Mood, and D. C. Boes, Introduction to the Theory of Statistics. New York: McGraw-Hill Series in Probability and Statistics, 1974.
[6] N. L. Johnson and S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions. New York: Wiley, 1972.
[7] C. G. Verdugo Lazo and P. N. Rathie, "On the entropy of continuous probability distributions," IEEE Trans. Inform. Theory, vol. IT-24, no. 1, 1978.
[8] W. Magnus, F. Oberhettinger, and R. P. Soni, Formulas and Theorems for the Special Functions of Mathematical Physics. New York: Springer-Verlag, 1966.
[9] A. C. Olshen, "Transformations of the Pearson Type III distributions," Ann. Math. Statist., vol. 8, pp. 176-200, 1937.
[10] S. J. Press, Applied Multivariate Analysis. Malabar, FL: Krieger, 1982.
[11] C. R. Rao, Linear Statistical Inference and Its Applications. New York: Wiley, 1976.
[12] O. Vasicek, "A test for normality based on sample entropy," J. Roy. Statist. Soc., Series B, vol. 36, pp. 54-59, 1976.
[13] D. G. Weinman, "A multivariate extension of the exponential distribution," Ph.D. dissertation, Arizona State University, Tempe, 1966.
[14] R. A. Wijsman, "Random orthogonal transformations and their use in some classical distribution problems in multivariate analysis," Ann. Math. Statist., vol. 28, no. 2, 1957.


The Power Spectral Density of Maximum Entropy
Charge Constrained Sequences

KENNETH J. KERPEZ, MEMBER, IEEE

Abstract - A limit on the absolute value of the running digital sum of a sequence is known as the charge constraint. Such a limit imposes a spectral null at dc. The maximum entropy distribution of a charge constrained sequence is presented. A closed-form expression for the power spectral density of maximum entropy charge constrained sequences is given and plotted.

I. INTRODUCTION

A binary magnetic recording system is considered. Due to transformer coupling, the magnetic recorder cannot pass dc. To avoid errors, the message sequence is mapped by an invertible encoder into a sequence {x_n} that has no dc component and can safely pass through the magnetic recorder. Pierobon [2] showed that the charge constraint is a necessary and sufficient condition for a null in the spectrum of a binary sequence at dc. The charge constraint maintains a bound on the absolute value of the running sum of {x_n}.

For every m input bits, the encoder outputs n bits. The code rate m/n is maximized to pass the largest amount of information possible. The maximization occurs when {x_n} has the maximum entropy probability distribution; the rate is then the capacity [11]. The maximum entropy distribution of a charge constrained sequence is presented in Section II. A closed-form solution is given in Section III for the power spectral density (spectrum) of the associated state sequence, from which it is easy to find the spectrum of {x_n}. The spectrum shows the relation between the charge constraint and the shape of the null at dc. As the constraint becomes looser, the capacity increases and the null becomes less pronounced.

Let ξ_n ∈ {+1, -1}. Here ξ_n is the same as A_n given by Gallopoulos et al. [8], the charge associated to a {0,1} run-length sequence. Define the state as the accumulated charge S_n = Σ_{i=0}^n ξ_i. Note that

    ξ_n = S_n - S_{n-1}.

The charge constraint C makes |S_n| ≤ C for all n. This state sequence has the (2C+1) × (2C+1) adjacency matrix A, with ones on the first super- and subdiagonals and zeros elsewhere:

        | 0 1 0 ... 0 0 |
        | 1 0 1 ... 0 0 |
    A = | 0 1 0 ... 0 0 |
        | . . . ... . . |
        | 0 0 0 ... 0 1 |
        | 0 0 0 ... 1 0 |

II. THE MAXIMUM ENTROPY DISTRIBUTION

The following theorem of Shannon's about general constrained sequences is employed to find the maximum entropy distribution of charge constrained sequences. Many results in this section can be found in [1], [3], [5], and [7].

Theorem 1 [3]: Given a constrained sequence with an irreducible state transition diagram and an n × n adjacency matrix A with entries a_{ij}, let the maximum eigenvalue of A be λ. Find a left eigenvector w^T and a right eigenvector u of A associated with λ such that w^T u = 1. Then the capacity of the sequence is log_2(λ) and the maximum entropy distribution is Markov with transition probability matrix P. The entries of P are p_{ij} = (a_{ij} u_j)/(λ u_i). The stationary distribution of P is π with π_i = w_i u_i, i, j = 1, 2, ..., n.

For the charge constraint, the symmetry of the matrix A implies that w = u. The maximum eigenvalue of A, λ, was found by Chien [1] to be

    λ = 2 cos(γ)    with γ = π/(2(C+1)).

Thus the capacity of the charge constrained sequence is given by log_2(2 cos(π/(2(C+1)))).

Manuscript received June 13, 1988; revised July 14, 1988. This work was supported in part by the National Science Foundation under Grant ECS-8352220, and in part by IBM, CDC, and AT&T.
The author is with the Department of Electrical Engineering, Cornell University, Ithaca, NY 14853.
IEEE Log Number 8928198.

0018-9448/89/0500-0692$01.00 © 1989 IEEE
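Theorem 1 [3] and Chien's eigenvalue formula are easy to verify numerically. The sketch below (ours; it assumes NumPy, and the function name is illustrative) builds the tridiagonal adjacency matrix of the charge state diagram, confirms λ = 2 cos(π/(2(C+1))), and forms the maximum entropy transition matrix P:

```python
import numpy as np

def adjacency(C):
    """(2C+1) x (2C+1) adjacency matrix of the charge state diagram:
    from accumulated charge S the next symbol moves to S-1 or S+1, |S| <= C."""
    m = 2 * C + 1
    a = np.zeros((m, m))
    idx = np.arange(m - 1)
    a[idx, idx + 1] = 1.0      # S -> S+1 (input +1)
    a[idx + 1, idx] = 1.0      # S -> S-1 (input -1)
    return a

for C in (1, 2, 3):
    a = adjacency(C)
    vals, vecs = np.linalg.eigh(a)            # A is symmetric, so eigh suffices
    lam = vals[-1]                            # maximum eigenvalue
    u = np.abs(vecs[:, -1])                   # Perron eigenvector (w = u by symmetry)
    P = a * u[None, :] / (lam * u[:, None])   # maximum entropy transition matrix
    closed_form = 2 * np.cos(np.pi / (2 * (C + 1)))
    print(C, lam, closed_form, np.log2(lam))  # lam agrees with Chien's closed form
```

Each row of P sums to one because Au = λu, which is exactly why the Theorem 1 construction yields a valid Markov transition matrix.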

